Synopsis - A Graph-Based Software Watermarking Framework Applicable to Various Robust and High-Bit-Rate Encoding Schemes

Chapter 1 Introduction

1.3 Synopsis

In the next chapter, we introduce related work on software watermarking frameworks and algorithms. The proposed framework and its three graph encodings are presented in Chapter 3.

Analysis and comparison are given in Chapters 4 and 5, respectively. Finally, we give the conclusion in Chapter 6.

Chapter 2 Related Work

In this chapter, related software watermarking algorithms are introduced. First, we describe a software watermarking framework, published by Ramarathnam Venkatesan et al. [5], that can accommodate different graph encodings. Static and dynamic software watermarking algorithms each have their own graph-based encodings. The QP algorithm introduces the important idea of embedding a watermark through a graph-based encoding. The QPS algorithm uses color modification to improve the QP algorithm. The CT algorithm embeds and recognizes the watermark at runtime.

2.1 Graph Theoretic Approach for Software Watermarking

The Graph Theoretic Approach proposed by Venkatesan, Vazirani, and Sinha [5] provides a tool for software tamper resistance and for resisting graph-based attacks. In this approach, a weak connection means that the link or function call between program P and watermark W is only a single edge between the two subgraphs parsed from P and W. To avoid being identified through a weak connection, the graphs are separated into subgraphs, which are then merged by adding edges so that the resulting graph is well connected. Subgraphs of W must also be locally indistinguishable from P. The steps of the graph algorithm, which rely on these two properties (good connectivity and local indistinguishability), are shown in Figure 2.1. The inputs are a program P, watermark code W, secret keys ω1, ω2 and ω3, and an integer m:

Graph step: The flow graph G, which has basic blocks as nodes and control flow or function calls as edges, is computed from P, and similarly for W. G and W are both digraphs.

Clustering step: Using ω1 as the random seed, G is partitioned into n clusters so that the number of edges straddling clusters is minimized. Let Gc be the graph in which each node corresponds to a cluster in G and there is an edge between two nodes if the corresponding clusters in G have an edge going across them. Wc is obtained from W in the same way, so that Gc and Wc are undirected graphs of small order.

Merging step: Edges are added between Gc and Wc by a random process. When the current node is v, let $d_{gg}$ and $d_{gw}$ be the numbers of nodes adjacent to v in Gc and Wc, respectively, and let

$$P_{gg} = \frac{d_{gg}}{d_{gg} + d_{gw}}, \qquad P_{gw} = \frac{d_{gw}}{d_{gg} + d_{gw}}.$$

The next random node is chosen in Gc with probability $P_{gw}$ or in Wc with probability $P_{gg}$, and the secret key ω2 drives the random choices. An edge is added between node v and its next random node.

This step is repeated until the resultant graph H is obtained.

Recovery step: Finally, Wc is computed and encrypted with the secret key ω3.

Figure 2.1 Graph Theoretic Approach
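The merging step can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation of [5]: the names `merge_graphs`, `gc_adj` and `wc_adj` are hypothetical, the walk is assumed to run for a fixed number of steps (the stopping rule is not spelled out above), the degrees are assumed to be recomputed on the growing graph H, and a `random.Random` generator seeded with ω2 stands in for the keyed choices.

```python
import random


def merge_graphs(gc_adj, wc_adj, key, steps):
    """Random-walk merge of the cluster graphs Gc and Wc into H (illustrative).

    gc_adj and wc_adj map each node to the set of its neighbours. The two
    graphs must use disjoint node labels and have at least two nodes each.
    """
    rng = random.Random(key)            # assumption: omega_2 seeds a PRNG
    g_nodes, w_nodes = sorted(gc_adj), sorted(wc_adj)
    # H starts as the disjoint union of Gc and Wc (sets are copied).
    h = {v: set(nb) for v, nb in list(gc_adj.items()) + list(wc_adj.items())}
    added = []
    v = rng.choice(g_nodes)             # assumption: the walk starts in Gc
    for _ in range(steps):
        d_gg = sum(1 for u in h[v] if u in gc_adj)   # neighbours of v in Gc
        d_gw = sum(1 for u in h[v] if u in wc_adj)   # neighbours of v in Wc
        total = d_gg + d_gw
        p_gw = d_gw / total if total else 0.5        # assumption for isolated v
        # As stated in the text: go to Gc with probability P_gw, else to Wc.
        pool = g_nodes if rng.random() < p_gw else w_nodes
        nxt = rng.choice([u for u in pool if u != v])
        if nxt not in h[v]:
            h[v].add(nxt)
            h[nxt].add(v)
            added.append((v, nxt))
        v = nxt
    return h, added


if __name__ == "__main__":
    gc = {"g0": {"g1"}, "g1": {"g0", "g2"}, "g2": {"g1"}}
    wc = {"w0": {"w1"}, "w1": {"w0"}}
    merged, new_edges = merge_graphs(gc, wc, key=2024, steps=6)
    print(new_edges)
```

A node that has no neighbours in Wc yet has $P_{gw} = 0$, so the walk moves into Wc from it; this is how the rule pushes the two graphs toward being well connected.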

2.2 Static Watermark Algorithms

In static watermarking algorithms, the watermark is stored in the application executable itself. A static watermark may exist as code or data stored in a section of the program. QP and QPS are classified into this group.

2.2.1 QP Algorithm

Qu and Potkonjak proposed the QP algorithm for embedding a watermark. The QP algorithm contains three kinds of watermarking algorithms; adding edges is the one we choose to study and improve. Consider a given graph G (V, E) and a message M to be embedded in G. Let the vertex set be V = (v0, v1, …, vn-1) and the edge set be E, and let the message be encrypted into a binary string M = m0 m1… (by stream ciphers, block ciphers or cryptographic hash functions).

Embedding phase: First, a vertex vi is selected from the given graph G (V, E), and the nearest two vertices vi1 and vi2 that are not connected to vi are found, with i < i1 < i2 (mod n); that is, (vi, vi1), (vi, vi2) ∉ E. The rule for embedding bit mi is as follows:

If mi = 0, (vi, vi1) is put into E’; that is, the edge between vi and vi1 is added.

If mi = 1, (vi, vi2) is put into E’; that is, the edge between vi and vi2 is added.

After the message M = m0 m1… has been entirely embedded, a new graph G’ (V, E’) with the new edge set is reported. For example, in Figure 2.2, the message M = 5 (decimal) = 101 (binary) has been embedded into a 6-vertex graph by 3 edges, and each edge represents one bit of M. The essence of this algorithm is to add an extra edge between two vertices, which forces these two vertices to be colored with different colors.
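The embedding rule can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions rather than the original QP implementation: the function names are hypothetical, bit k is assumed to be embedded at vertex v(k mod n) (the text does not say how vi is chosen), and the graph used in the demonstration is made up and is not the graph of Figure 2.2.

```python
def nearest_unconnected(adj, i, count=2):
    """Return the first `count` vertices after i (cyclically) not adjacent to i."""
    n = len(adj)
    found = []
    for step in range(1, n):
        j = (i + step) % n
        if j not in adj[i]:
            found.append(j)
            if len(found) == count:
                break
    return found


def qp_embed(n, edges, bits):
    """Embed `bits` by adding one edge per bit; returns the added edges."""
    adj = [set() for _ in range(n)]
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    added = []
    for k, bit in enumerate(bits):
        i = k % n                       # assumption: bit k is carried by vertex k
        candidates = nearest_unconnected(adj, i)
        if len(candidates) < 2:
            raise ValueError(f"vertex {i} has fewer than two unconnected vertices")
        target = candidates[bit]        # bit 0 -> v_i1, bit 1 -> v_i2
        adj[i].add(target)
        adj[target].add(i)
        added.append((i, target))
    return added


if __name__ == "__main__":
    # Six vertices, one existing edge, message 101.
    print(qp_embed(6, [(0, 1)], [1, 0, 1]))   # [(0, 3), (1, 2), (2, 4)]
```

Because v1 is already connected to v0 in the demonstration, the candidates for v0 are v2 and v3, so the first bit (1) adds the edge (v0, v3).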

Figure 2.2 An Example of QP Algorithm

Figure 2.3 QP Recognition Algorithm

Recognition phase: In the given graph G’ (V, E’), each (vi, vj) is a vertex pair from which one bit of the embedded message can be obtained. For each (vi, vj), j > i (mod n), the bit is extracted by counting the vertices between vi and vj that are not connected to vi. There are three cases to consider:

Case Ⅰ: If there is no vertex that is not connected to vi, a 0 bit is extracted. An example is shown in Figure 2.3 (a).

Case Ⅱ: If there is only one vertex that is not connected to vi, a 1 bit is extracted. An example is shown in Figure 2.3 (b).

Case Ⅲ: If there is more than one vertex that is not connected to vi, the order of vi and vj is reversed and the extraction process is repeated.
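The three cases can be sketched as follows. This is a minimal Python sketch under stated assumptions: the function names are hypothetical, the recognizer is assumed to know which vertex pairs carry a bit, and the example graphs are made up for illustration. When both orders give more than one unconnected vertex, the sketch returns `None`, since the rule above does not define a bit for that situation.

```python
def count_unconnected_between(adj, i, j):
    """Count vertices strictly between i and j (cyclically) not adjacent to i."""
    n = len(adj)
    count = 0
    k = (i + 1) % n
    while k != j:
        if k not in adj[i]:
            count += 1
        k = (k + 1) % n
    return count


def qp_extract_bit(adj, i, j):
    """Apply Cases I-III to the pair (v_i, v_j); None means no bit is defined."""
    for a, b in ((i, j), (j, i)):       # Case III: retry with the order reversed
        unconnected = count_unconnected_between(adj, a, b)
        if unconnected == 0:            # Case I
            return 0
        if unconnected == 1:            # Case II
            return 1
    return None


def build_adjacency(n, edges):
    adj = [set() for _ in range(n)]
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


if __name__ == "__main__":
    watermarked = build_adjacency(6, [(0, 1), (0, 3), (1, 2), (2, 4)])
    pairs = [(0, 3), (1, 2), (2, 4)]
    print([qp_extract_bit(watermarked, i, j) for i, j in pairs])   # [1, 0, 1]
```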

2.2.2 QPS Algorithm

Myles and Collberg pointed out an error in the QP algorithm: recognition failure. They provide two examples of recognition failure, as follows:

Example 1: Consider the graph G (V, E) in Figure 2.4 (a) and the message M = m1m2 = 00. The embedding phase is illustrated in Figure 2.4 (b). First, vi = v0, vi1 = v2 and vi2 = v3 are selected to embed m1 = 0 by adding an edge between v0 and v2. Then vi = v3, vi1 = v0 and vi2 = v1 are selected to embed m2 = 0 by adding an edge between v3 and v0. The new graph G’ (V, E’) is shown in Figure 2.4 (c).

In the recognition phase, m1 = 0 is found by counting the vertices not connected to v0 between v0 and v2, and m2 = 1 is found by counting the vertices not connected to v3 between v0 and v3. The inaccurate message 01 is extracted although 00 was the embedded message.

Figure 2.4 Failure Recognition of QP Algorithm

Example 2: When we embed the message 101 in the graph G (V, E) of Figure 2.5 (a), the new graph G’ (V, E’) of Figure 2.5 (b) is obtained. Following the recognition algorithm, the message 1001 is recovered.

Figure 2.5 Failure Recognition of QP Algorithm

They attributed the inaccurate message recognition of the QP algorithm to the unpredictability of the vertex coloring. To eliminate this unpredictability, the QPS algorithm places additional constraints on which vertices can be selected for a triple. For a given graph G = (V, E), a set of three vertices {v1, v2, v3} is considered a triple if

1. v1, v2, v3 ∈ V, and

2. (v1, v2), (v1, v3), (v2, v3) ∉ E.

For a given n-colorable graph G = (V, E), a set of three vertices {v1, v2, v3} is considered a colored triple if

1. v1, v2, v3 ∈ V,

2. (v1, v2), (v1, v3), (v2, v3) ∉ E, and

3. v1, v2 and v3 are all colored the same color.

Embedding phase: Select a vertex vi that is not already in a triple. Find the nearest two vertices vi1 and vi2 that have the same color as vi and are not already in a triple. An additional register allocator is used to record the colors of the selected triple as (v'i, v'i1, v'i2). The rule for embedding bit mi is as follows:

If mi = 0, add the edge (vi, vi1) and change the coloring so that v'i and v'i1 differ.

If mi = 1, add the edge (vi, vi2).

The key idea of the QPS embedding algorithm is to select the triples so that they are isolated units that do not affect other vertices in the graph. In addition, it uses a specially designed register allocator that changes the coloring of only one of the two vertices involved in the added edge and of no other vertex in the graph.

Recognition phase: The QPS recognition algorithm works by identifying the triples that were selected in the embedding phase. Once a triple (vi, vi1, vi2) has been identified, its related colored triple (v'i, v'i1, v'i2) is examined. If v'i and v'i1 have different colors, a 0 was embedded; otherwise a 1 was embedded.
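The colored-triple test and the QPS bit rule can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of Myles and Collberg: the function names and the `fresh_color` parameter are hypothetical, the triples are assumed to be already chosen, and the recoloring of the edge partner stands in for the specially designed register allocator (only one endpoint of the added edge changes color).

```python
def is_colored_triple(triple, edges, color):
    """True if the three vertices are pairwise non-adjacent and share one color."""
    a, b, c = triple
    independent = all(frozenset(p) not in edges for p in ((a, b), (a, c), (b, c)))
    return independent and color[a] == color[b] == color[c]


def qps_embed_bit(triple, bit, edges, color, fresh_color):
    """Embed one bit into a colored triple, updating edges and color in place."""
    if not is_colored_triple(triple, edges, color):
        raise ValueError("not a colored triple")
    vi, vi1, vi2 = triple
    if fresh_color == color[vi]:
        raise ValueError("fresh_color must differ from the triple's color")
    partner = vi1 if bit == 0 else vi2      # bit 0 -> (vi, vi1), bit 1 -> (vi, vi2)
    edges.add(frozenset((vi, partner)))
    color[partner] = fresh_color            # assumption: only the partner is recolored


def qps_recognize_bit(triple, color):
    """Different colors on v'_i and v'_i1 mean 0; equal colors mean 1."""
    vi, vi1, _ = triple
    return 0 if color[vi] != color[vi1] else 1


if __name__ == "__main__":
    edges = set()
    color = {v: "r0" for v in range(6)}     # register names used as colors
    qps_embed_bit((0, 1, 2), 0, edges, color, fresh_color="r1")
    qps_embed_bit((3, 4, 5), 1, edges, color, fresh_color="r1")
    print(qps_recognize_bit((0, 1, 2), color), qps_recognize_bit((3, 4, 5), color))
```

Because each triple is independent and initially monochrome, recoloring one endpoint inside a triple does not disturb the bits carried by other triples, which is the isolation property described above.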

2.2.3 Problems

The colors of the vertices provide a starting point for discussion. In the QP embedding algorithm, an extra edge that may not be necessary in the original graph G is added between two vertices, and these two vertices are then colored with different colors. In the QP recognition algorithm, we find that the colors of two vertices are not an accurate feature for deciding whether one bit of the message has been embedded.

The coloring feature in the QP algorithm is ambiguous. This results in poor performance in embedding and recognizing watermark information.

In Example 1, when vi = v3, vi1 = v0 and vi2 = v1 are selected to embed m2 = 0, the requirement i < i1 < i2 (mod n) of the QP embedding algorithm has been misapplied. Unpredictable errors can occur when we embed the message and neglect this requirement. To illustrate, in Figure 2.6 the message M = 1 is embedded into the original graph of Figure 2.6 (a). In the embedding phase, vi = v2, vi1 = v3 and vi2 = v0 are selected to embed M = 1, as in Figure 2.6 (b). In the recognition phase, the message is identified by counting the vertices not connected to v0 between v0 and v2, and there are two cases to consider:

Case 1: v3 is taken as the vertex not connected to v0, and the message 1 is identified.

Case 2: v1 is taken as the vertex not connected to v0, and the message 0 is identified.

A different message is identified depending on which vertex is considered, so it is uncertain whether the extracted message is correct. This makes it clear that Example 1 is not a proper example to illustrate the problem of the QP algorithm.

Figure 2.6 Example of Recognition Problem in QP Algorithm

Some problems occur even when the requirement is followed. Examples 3 and 4 illustrate this; there is no need to go into detail about the colors of the vertices:

Figure 2.7 Example of Recognition Problem in QP Algorithm

Example 3: Consider the graph G (V, E) in Figure 2.7 (a) and the message M = 0. The embedding phase is illustrated in Figure 2.7 (b): vi = v0, vi1 = v2 and vi2 = v3 are selected to embed M = 0 by adding an edge between v0 and v2. The new graph G’ (V, E’) is shown in Figure 2.7 (c). In the recognition phase, M = 1 is found by counting the vertices not connected to v0 between v0 and v2. The inaccurate message 1 is extracted although 0 was the embedded message.

Figure 2.8 Example of Recognition Problem in QP Algorithm

Example 4: Consider the graph G (V, E) in Figure 2.8 (a) and the message M = 1. The embedding phase is illustrated in Figure 2.8 (b): vi = v0, vi1 = v2 and vi2 = v3 are selected to embed M = 1 by adding an edge between v0 and v3. The new graph G’ (V, E’) is shown in Figure 2.8 (c). In the recognition phase, the number of vertices not connected to vi = v0 between vi = v0 and vj = v3 is 2, and the result is the same when we reverse the order to vi = v3 and vj = v0. This is an undefined case: no message can be determined when the number is 2.

The cause of the problem can be traced back to the assumption in the embedding phase of the QP algorithm: find the nearest two vertices vi1 and vi2 which are not connected to vi. We define a “vertices pair” as a pair of vertices vi1 and vi2 near vi that are not connected to vi. When vi is selected, there may be many vertices pairs {vi1, vi2} that can be selected to embed the message. In Figure 2.6, when vi = v0, there are two vertices pairs, {v1, v2} and {v2, v3}, that can be selected to embed the message. The correct message is extracted if {v1, v2} is selected. In Example 3, an inaccurate message 1 is extracted when {v2, v3} is selected. In the same way, in Figure 2.8, the correct message is extracted if {v1, v2} is selected and an inaccurate message is extracted if {v2, v3} is selected.

Looking more carefully at the selection of vertices pairs in the embedding phase, we find that the correct message is guaranteed to be extracted only if selection is restricted to the vertices pair nearest to vi. This restriction can be treated as part of the definition of the QP embedding phase.

2.3 Dynamic Watermark Algorithm

The basic concept of a dynamic watermarking algorithm is to embed the watermark information in the running state of a program. In other words, the watermark information is embedded and extracted at runtime, which makes disclosure more difficult. However, dynamic watermarking algorithms are difficult to implement in most programming languages; in general, they are most easily implemented in Java-based languages. The algorithm can be divided into four phases:

Annotation: Annotation (or mark) points are added to the application to be watermarked before the watermark can be embedded. Functions are inserted at these annotation points. These functions perform no action and simply indicate the locations where the watermark can be inserted.

Preferred mark locations are those that allocate objects and manipulate pointers and that depend directly on user input. Hot spots and non-deterministically executed code should be avoided as mark locations.

Tracing: A tracing run of the program is performed after the application has been annotated. Some annotation points are selected after the tracing run. These annotation points are the locations where watermark-building code will later be inserted.

Embedding: Embedding of the watermark starts once the application has been traced. The input is converted into an integer, and from the integer a graph G is generated for embedding. The embedding of the watermark is divided into four steps:

1. Watermark W is encoded by generating graph G.

2. Intermediate code that builds this graph is generated and translated into a Java method.

3. The functions inserted in the annotation phase are replaced with these new functions.

4. The new Java method is executed to build the graph G on the heap.

In the CT algorithm, a single graph encoding method is not expected to satisfy all requirements (high bit rate, high resilience to attack, etc.). Instead, a library of graph encodings is developed for building watermark graphs with different characteristics. Five kinds of graph encoding methods are implemented: permutation encoding, radix encoding, parent-pointer trees, reducible permutation graphs and planted plane cubic trees.

Extraction: The watermarked application is run as a sub-process under debugging. The secret input sequence is entered exactly as in the tracing run, so that the same mark locations are hit. Once the last part of the input has been entered, the heap is examined for graphs that could potentially be complete watermark graphs. The watermark number is extracted from these graphs and reported.

Of the five graph encoding algorithms implemented in the CT algorithm, permutation, radix and parent-pointer tree encoding have higher data rates than the others. The details are given below:

Figure 2.9 Example of Permutation Encoding Algorithm

Permutation Encoding: A watermark integer in the range [0, n!−1] can be represented by a permutation of the numbers 0, …, n−1; the permutation mapping represents the number. For example, if the original sequence is A = <0,1,2,3,4,5,6,7> and the watermark is 1000, the permuted sequence A = <0,7,2,1,4,3,6,5> represents the watermark. The permutation can be constructed with a singly-linked, circular list data structure, as in Figure 2.9.
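One way to map an integer to a permutation and back is the factorial number system, sketched below in Python. This is an illustrative assumption, not necessarily the mapping used by the CT algorithm: the text does not specify the integer-to-permutation rule, so the permutation produced here need not coincide with the one in the example above. The function names are hypothetical. The sketch relies only on the fact that n elements have n! distinct permutations.

```python
import math


def int_to_permutation(watermark, n):
    """Map an integer in [0, n! - 1] to a permutation of 0..n-1 (factoradic)."""
    if not 0 <= watermark < math.factorial(n):
        raise ValueError("watermark out of range for n elements")
    items = list(range(n))
    perm = []
    for k in range(n - 1, -1, -1):
        index, watermark = divmod(watermark, math.factorial(k))
        perm.append(items.pop(index))   # pick the index-th remaining element
    return perm


def permutation_to_int(perm):
    """Inverse of int_to_permutation."""
    items = sorted(perm)
    value = 0
    for k, element in zip(range(len(perm) - 1, -1, -1), perm):
        index = items.index(element)
        value += index * math.factorial(k)
        items.pop(index)
    return value


if __name__ == "__main__":
    p = int_to_permutation(1000, 8)
    print(p, permutation_to_int(p))
```

In a linked-list realization, the i-th node's data pointer would point at node perm[i], so the permutation is stored purely in pointer structure.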

Radix Encoding: A radix graph is a circular linked list of length n in which the position of a node gives the exponent of a base-n digit and the data pointer gives the coefficient. A null pointer encodes a 0, a self-pointer a 1, a pointer to the next node a 2, and so on. For example, the graph in Figure 2.10 represents 3×6⁴ + 2×6³ + 3×6² + 4×6¹ + 1×6⁰.

Figure 2.10 Example of Radix Encoding Algorithm
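The pointer rule of radix encoding can be sketched in Python with list indices standing in for heap pointers. This is a minimal sketch under stated assumptions: the function names are hypothetical, node i is assumed to hold the coefficient of the i-th power, and a list of k nodes is assumed to use base k + 1 (null plus k possible targets), which matches the five base-6 digits of the example 3×6⁴ + 2×6³ + 3×6² + 4×6¹ + 1×6⁰ = 4453.

```python
def radix_encode(value, k):
    """Encode `value` as k data pointers: None is 0, self is 1, next is 2, ..."""
    base = k + 1
    if not 0 <= value < base ** k:
        raise ValueError("value does not fit in k base-(k+1) digits")
    pointers = []
    for i in range(k):
        value, digit = divmod(value, base)      # digit for exponent i
        pointers.append(None if digit == 0 else (i + digit - 1) % k)
    return pointers


def radix_decode(pointers):
    """Recover the integer from the pointer targets."""
    k = len(pointers)
    base = k + 1
    value = 0
    for i in reversed(range(k)):
        target = pointers[i]
        digit = 0 if target is None else (target - i) % k + 1
        value = value * base + digit
    return value


if __name__ == "__main__":
    ptrs = radix_encode(4453, 5)
    print(ptrs, radix_decode(ptrs))   # [0, 4, 4, 4, 1] 4453
```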

Parent-pointer tree encoding (PP trees): This encoding can be described in terms of enumerations of graphs. The idea is that the watermark number n is represented by the index of the watermark graph in an enumeration table, so the number of nodes depends on the watermark. Figure 2.11 is an example that represents the numbers 1, 2, 20 and 21.

Figure 2.11 Example of PP Tree Encoding Algorithm

2.4 Summary

In this chapter, we introduced the advantages and disadvantages of some well-known frameworks and algorithms. Based on these characteristics, we design and modify our proposed software watermarking framework and algorithms.

Chapter 3 The Proposed Software Watermarking Framework

In the previous chapter, we explained the problems of recognition failure, low bit rate and coloring in the QP and QPS algorithms. We propose static software watermarking algorithms that leverage the concepts of the CT dynamic algorithm. The proposed static watermarking algorithms improve on the following characteristics:

Easier construction and extraction

Higher bit rates

Better robustness against common attacks

Support for more programming languages

We propose a flexible framework that can adopt not only the proposed graph encoding algorithms but also other graph algorithms. To make watermark construction and extraction easy, we modify the heavy random-walk procedure of the Graph Theoretic Approach. The framework can also be implemented easily in more kinds of programming languages.

3.1 Framework

The framework is divided into two phases, embedding and recognition, shown in Figure 3.1 and Figure 3.3, respectively.

In the embedding phase, the process proceeds through three steps:

Transformation step: Program P is parsed into a graph G consisting of vertices and edges. Each vertex of the graph represents a basic block of instructions. Each edge is a directed edge representing a function call between two basic blocks. The vertices are ordered by a depth-first search (DFS). Message M can be a statement or a special integer in binary format. For the binary conversion, a hash function or another transform function can also be adopted to achieve higher security.
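The DFS ordering of vertices in the transformation step can be sketched as follows. This is a minimal Python sketch under stated assumptions: the function name `dfs_order`, the dictionary representation of the call graph and the block names are hypothetical, and successors are assumed to be visited in the order in which they are listed.

```python
def dfs_order(graph, entry):
    """Return the vertices reachable from `entry` in DFS preorder.

    `graph` maps each basic block to the list of blocks it calls.
    """
    order, seen = [], set()
    stack = [entry]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        order.append(v)
        # Push successors in reverse so the first listed one is visited first.
        stack.extend(reversed(graph.get(v, [])))
    return order


if __name__ == "__main__":
    call_graph = {"main": ["f", "g"], "f": ["h"], "g": ["h"], "h": []}
    order = dfs_order(call_graph, "main")
    index = {v: i for i, v in enumerate(order)}   # vertex -> position v0, v1, ...
    print(order, index)
```

A fixed, reproducible ordering matters here because embedding and recognition must agree on which vertex is v0, v1, and so on.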

To increase complexity and improve privacy, graph G is segmented into n subgraphs $\{g_1, g_2, \ldots, g_n\}$, where $g_i$ is one of the subgraphs of G; that is, $G = \{g_i \mid 1 \le i \le n\}$, where n is the number of subgraphs. Message M is also segmented into n fragmental messages $\{m_1, m_2, \ldots, m_n\}$ using the random seed ω, where $m_i$ is one of the fragmental messages of M; that is, $M = \{m_i \mid 1 \le i \le n\}$, where n is the number of subgraphs. Each subgraph is constructed according to the vertices selecting rule, so that the corresponding fragmental message, which determines the subgraph, can be embedded successfully.

Figure 3.1 Embedding Phase of Proposed Framework

The vertices selecting rule is defined as follows:

1. Select the vertex at the higher level of the subgraph as the first-selected vertex.

2. Select vertices that are not children of the first vertex of the subgraph.

Each subgraph should have these characteristics:

1. The size of the subgraph is decided by the corresponding fragmental message.

2. A subgraph should contain at least three vertices.

3. If a subgraph is constructed from k vertices, the fragmental message should be k−2 bits.
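The relation between fragmental messages and subgraph sizes can be sketched in Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, the random seed ω is assumed to drive a `random.Random` generator that picks the cut positions, and every fragment is assumed to be non-empty so that each subgraph has at least three vertices.

```python
import random


def segment_message(bits, n, seed):
    """Split the bit string into n non-empty fragments at seeded random cuts."""
    if not 1 <= n <= len(bits):
        raise ValueError("need between 1 and len(bits) fragments")
    rng = random.Random(seed)                       # assumption: omega seeds a PRNG
    cuts = sorted(rng.sample(range(1, len(bits)), n - 1))
    bounds = [0] + cuts + [len(bits)]
    return [bits[a:b] for a, b in zip(bounds, bounds[1:])]


def required_subgraph_sizes(fragments):
    """A (k-2)-bit fragment needs a subgraph with k vertices."""
    return [len(fragment) + 2 for fragment in fragments]


if __name__ == "__main__":
    fragments = segment_message("1011001110", n=3, seed=42)
    print(fragments, required_subgraph_sizes(fragments))
```

Since the same seed reproduces the same cut positions, the recognizer can regroup the extracted fragments in the same way without any extra side information beyond ω.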

Embed step: Once the fragmental messages and subgraphs have been generated, the next move is to select the graph encoding algorithm. There are three kinds: link encoding (LE), color encoding (CE) and link color encoding (LCE). If LE or LCE is selected, path analysis proceeds. During path analysis, as in Figure 3.2, the subgraph is analyzed with a tree diagram to find the path that has more levels for embedding. The first vertex is selected according to the result of the path analysis. Starting from the first vertex, the fragmental message is embedded into its corresponding subgraph by adding directed edges between pairs of vertices.

Figure 3.2 Example of Path Analysis

Merge step: The last step merges the original graph G and each subgraph into a new graph. A directed edge is added between two vertices in the original graph if the same vertices in the subgraph have had a bit of the fragmental message embedded. The colors of vertices in the original graph are altered according to the same vertices in the subgraph. Finally, a new graph G’ containing the embedded message is generated.

Figure 3.3 Recognition Phase of Proposed Framework

In the recognition phase, there are two steps:

Recognition step: When the watermarked graph G’ is received, the information about the subgraphs, the vertices and their corresponding colors, and the edges is also known. The graph encoding algorithm that was used to embed the message is also identified. In light of this information, graph G’ is segmented into n subgraphs $\{g'_1, g'_2, \ldots, g'_n\}$, where $g'_i$ is one of the subgraphs of the watermarked graph G’; that is, $G' = \{g'_i \mid 1 \le i \le n\}$, where n is the number of subgraphs.

According to the recognition algorithm of that graph encoding, the fragmental messages $\{m'_1, m'_2, \ldots, m'_n\}$ are extracted from the subgraphs, where $m'_i$ is one fragment of message M’; that is, $M' = \{m'_i \mid 1 \le i \le n\}$, where n is the number of subgraphs.

Message processing step: The fragmental messages are combined into the new message M’. The new message M’ need not be equal to the original message M, as long as the correct hidden information can be verified from M’.

Figure 3.4 is an example of the proposed framework in the embedding phase:
