Chapter 1 Introduction
1.5 Thesis Organization
In the remainder of this thesis, related work on data hiding in text documents and the specifications of the XPS document format are reviewed in Chapter 2. In Chapter 3, the proposed method for covert communication via XPS documents is described. In Chapter 4, the proposed authentication method for verifying the integrity and fidelity of XPS document contents is described. In Chapter 5, the proposed steganographic method for data hiding by adjusting the advance widths of some special ASCII codes is presented. Finally, conclusions and some suggestions for future work are given in Chapter 6.
Chapter 2
Review of Related Works and XPS Document Format
2.1 Previous Studies on Data Hiding Techniques in Text Documents
Fewer studies exist on data hiding in text documents than on data hiding in images or videos, because texts contain little redundant information for embedding data. However, text documents are widely used in daily communication, so developing data hiding techniques that use text documents as cover media is both needed and useful. It is also a greater challenge for researchers.
In recent years, several data hiding techniques applied to different text document formats have been proposed. A survey of them is conducted in this chapter.
2.1.1 Review of Data Hiding Techniques in HTML Files
HTML files are used widely on the Internet because it is convenient for people to obtain information directly through web pages. Some techniques for data hiding in HTML files and their applications have been proposed. Wu and Lai [1] hid binary data in HTML files using attributes of tags for bit encoding. Wu, et al. [2] designed a fast fragile watermarking method for copyright protection of web pages based on a hash function to prevent web pages from being tampered with. Chang and Tsai [3] used pseudo-spaces, the specific string “ ,” to encode the copyright data into the text of an HTML file and duplicated the copyright data to enhance the robustness against HTML manipulations. Lee and Tsai [4] proposed a technique for secret communication by embedding special codes in HTML files as substitutes for the original white spaces.
2.1.2 Review of Data Hiding Techniques in Microsoft Word Documents
The Microsoft Word document is one of the most popular text document formats so far. Because Microsoft Word documents are so common, people are unlikely to suspect the existence of secret data hidden in them, so it is appropriate to use them as covert communication channels for secret hiding. Also, it is important to verify the integrity and fidelity of the contents of Microsoft Word documents.
Liu and Tsai [5] utilized a change tracking technique in Microsoft Word documents to disguise a stego-document as a normal collaborative document. The secret data are embedded by degenerating the contents of a cover document. Liu and Tsai [6] also designed a data hiding technique in Microsoft Word documents by using block signatures to authenticate messages quoted from credible sources. Moreover, Microsoft Word 2007 uses a new file format that differs from those of previous versions of Microsoft Word. Park, et al. [7] concealed secret data in Microsoft Word 2007 files by inserting unknown parts and relationships which still satisfy the Microsoft Word 2007 standard but are not shown on the display.
2.1.3 Review of Data Hiding Techniques in PDF Files
PDF is another popular file format used in network communication because of its independence from different computer platforms. In recent years, several data hiding techniques using PDF files as cover documents have been proposed. Zhong, et al. [8] inserted secret data between indirect objects and modified the cross reference table in PDF files for data hiding. They also proposed another data hiding method in [9] by adjusting the positions of the text characters slightly to embed the secret data. Wang and Tsai [10] achieved authentication of PDF files by embedding authentication signals using the modified values of PDF object parameters, resulting in a slight difference in the PDF appearance that is hard for human eyes to notice. Liu, et al. [11] did not insert extra data or slightly change the appearance of the original PDF file; instead, they presented a data hiding algorithm that rearranges the order of the sequence of elements in PDF files, resulting in a better data embedding capacity. Data hiding via PDF files can also be achieved by using equivalent white space codes or invisible ASCII codes, as proposed by Lai and Tsai [12] and Lee and Tsai [13].
2.1.4 Review of Other Techniques and a Summary
For the email format, Lee and Tsai [14] proposed using special ASCII control codes to embed secret data at the line ends of email text. These special ASCII control codes are invisible when displayed on the screen and so will not affect a user’s reading of the resulting email. For the XML document format, which is described by the XML markup language, five techniques to embed secret data into XML documents have been proposed by Inoue et al. [15], including 1) using different representations of an empty XML element, 2) inserting white spaces in tags, 3) exchanging the order of XML elements, 4) exchanging the order of attributes in XML elements, and 5) exchanging inner-tags and outer-tags.
In conclusion, text documents are a good choice as covert channels for data hiding because they are common files used for information exchange in daily work and for communication on the Internet. Some data hiding techniques applied to different kinds of text document formats have been proposed over the past decade. However, no study on data hiding via XPS documents has been found yet, so we propose new data hiding techniques and their applications in this study.
2.2 Review of XPS Document Format
2.2.1 Overview
XML Paper Specification (XPS) is a new document format designed to provide a fixed page layout regardless of where and how the document is viewed or printed.
XPS documents are described by an XML-based language [16]. Physically, the XPS document is in fact a compressed ZIP archive called a package, which consists of an XML markup file for each page and other resources including fonts, images, thumbnails, etc. Every component or file stored in the XPS document is called a part of the package, and the connections between parts and the document are called relationships. Figure 2.1 shows the basic concept of the XPS document format [16].
Figure 2.1 Basic concept of the XPS document format [16].
The document hierarchy and the composition of graphics and texts using the XPS document format are introduced in detail as follows.
2.2.2 Logical and Physical Hierarchy of an XPS Document
XPS documents contain clear logical and physical hierarchies, compared with other similar document formats. The logical hierarchy of an XPS document is illustrated in Figure 2.2. This example shows that the root of the XPS document references two separate fixed documents. Each fixed document also references a set of fixed pages. Each fixed page is described by the XML markup language and contains resources as references. These resources such as fonts or images can be shared by different pages.
FixedDocumentSequence
FixedDocument 1, FixedDocument 2
FixedPage 1 … FixedPage N (under each FixedDocument)
Figure 2.2 Logical hierarchy of an XPS document.
As mentioned previously, an XPS document is a compressed file package. After decompressing the package, we can see the physical organization of an XPS document. An example is shown in Figure 2.3. An XPS document consists of hierarchical folders and document parts such as XML markup files, embedded fonts, images, etc. inside the package. The _rels folders contain files that specify the relationships between resources and pages.
The most important part of an XPS document is the fixed page part, namely, the .fpage file, because it describes how a page is rendered using the XML markup language. The page description used in XPS documents is discussed in more detail in the next section.
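The physical organization described above can be inspected with any ZIP library. The following is a minimal sketch in Python, not part of the proposed method. The toy package and its part names are assumptions made for illustration only (apart from the image path, which follows the markup shown later in this thesis), and the helper names list_parts and make_toy_package are hypothetical.

```python
import io
import zipfile


def list_parts(xps_bytes: bytes) -> dict[str, list[str]]:
    """Group the part names of an XPS package (a ZIP archive) by kind."""
    groups = {"pages": [], "relationships": [], "other": []}
    with zipfile.ZipFile(io.BytesIO(xps_bytes)) as package:
        for name in package.namelist():
            if name.endswith(".fpage"):
                groups["pages"].append(name)          # fixed page parts
            elif "/_rels/" in "/" + name:
                groups["relationships"].append(name)  # files inside _rels folders
            else:
                groups["other"].append(name)          # fonts, images, etc.
    return groups


def make_toy_package() -> bytes:
    """Build a tiny stand-in package in memory (hypothetical part names)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("_rels/.rels", "<Relationships/>")
        package.writestr("Documents/1/Pages/1.fpage", "<FixedPage/>")
        package.writestr("Documents/1/Pages/_rels/1.fpage.rels", "<Relationships/>")
        package.writestr("Documents/1/Resources/Images/1.JPG", b"")
    return buffer.getvalue()


parts = list_parts(make_toy_package())
for kind, names in parts.items():
    print(kind, names)
```

For a real document, the bytes of the .xps file would be passed to list_parts instead of the toy package.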
Figure 2.3 Physical hierarchy of an XPS document viewed in Open XML Editor.
2.2.3 Composition of an XPS Document using XML Markup Language
XML is used to describe every page included in an XPS document and results in a fixed-layout document. All rules and elements used to compose an XPS document are specified in the XPS Specification [16]. The <FixedPage> element is the root element of a page. Other elements are contained within the <FixedPage> element. The size of a page is defined by the Width and Height attributes of the <FixedPage> element.
In addition, the <Glyphs> and the <Path> are two major elements used to create graphics and texts. Figure 2.4 is an example of using the XML to compose a simple page. The <Glyphs> element is used to create text segments. A set of attributes is available to describe the characteristics of the text segment, such as the position, the font type, the font size, and the font color. Similarly, the <Path> element is used to create vector graphics and the <Path.Fill> property element specifies the object including images, gradients, or drawing patterns to fill the geometric area described by the Data attribute. For example, the <ImageBrush> element is used to describe an image to fill the area. The rendered page layout corresponding to Figure 2.4 is shown in Figure 2.5.
Consequently, each page of an XPS document is composed using the XML markup language. By modifying the attributes of elements or appending elements, the page layout can be changed. In this study, we utilize the above-mentioned XPS document features to develop new data hiding techniques.
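To make the page composition concrete, the following minimal Python sketch parses a small fixed page and lists its <Glyphs> and <Path> elements. The page markup, the font path, and the helper names local_name and summarize are assumptions made for illustration; only the <Path> part follows the markup used later in this thesis.

```python
import xml.etree.ElementTree as ET

# A hypothetical one-page markup with one text segment and one image area.
PAGE = """<FixedPage xmlns="http://schemas.microsoft.com/xps/2005/06" Width="816" Height="1056">
  <Glyphs OriginX="96" OriginY="96" FontUri="/Resources/Fonts/font.odttf"
          FontRenderingEmSize="16" Fill="#FF000000" UnicodeString="Hello" />
  <Path Data="M 120,100 L 632,100 632,484 120,484 z">
    <Path.Fill>
      <ImageBrush ImageSource="/Documents/1/Resources/Images/1.JPG"
                  Viewbox="0,0,1024,768" TileMode="None" ViewboxUnits="Absolute"
                  ViewportUnits="Absolute" Viewport="120,100,512,384" />
    </Path.Fill>
  </Path>
</FixedPage>"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts in front of tags."""
    return tag.rsplit("}", 1)[-1]


def summarize(page_xml: str) -> dict:
    """Collect the page size, the text segments, and the path geometries."""
    root = ET.fromstring(page_xml)
    summary = {
        "size": (float(root.get("Width")), float(root.get("Height"))),
        "texts": [],
        "paths": [],
    }
    for element in root.iter():
        name = local_name(element.tag)
        if name == "Glyphs":
            summary["texts"].append(element.get("UnicodeString"))
        elif name == "Path":
            summary["paths"].append(element.get("Data"))
    return summary


print(summarize(PAGE))
```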
Figure 2.4 A simple page viewed in Open XML Editor.
Figure 2.5 The layout corresponding to Figure 2.4 viewed in XPS viewer.
Chapter 3
Covert Communication by
Hierarchical Division of Images in XPS Documents
3.1 Introduction
Since XPS documents are increasingly used in daily communication, they are good cover media for covert communication. The proposed data hiding method for covert communication via XPS documents is described in this chapter. In Section 3.1.2, the basic idea of the proposed method is described. Detailed data embedding and extraction algorithms are presented in Section 3.2. In addition, some security enhancement measures for the proposed method are proposed in Section 3.3. Experimental results showing the feasibility of the method are given in Section 3.4. Finally, a brief summary is given in the last section of this chapter.
3.1.1 Problem Definition
Covert communication via XPS documents is the first issue we deal with in this study. The aim is to embed a given message secretly into an XPS document so that the stego-document can pass as a normal document, which can then be transmitted to a receiver. It is hoped that other people will not be suspicious of the document. The receiver can easily get the secret message by extracting it from the stego-document with a secret key. Thus, the problem of achieving covert communication here is how to find a good “channel” in the XPS document to embed data so that people cannot detect the secret from the document appearance. Even if an observer knows the extraction algorithm, he or she still cannot extract the secret without the secret key.
3.1.2 Major Idea of Proposed Method by Hierarchical Division of Images
In the XPS specification, as mentioned in Chapter 2, the <Path> element can be used to create an area to display an image on the XPS document and the Data attribute can be used to describe the area of the image. Figure 3.1(a) shows an example where the XML markup describes an area filled with an image and the area is drawn from the start point (120,100) to the specified points (632,100), (632,484), and (120,484) sequentially. The corresponding rendered result is shown in Figure 3.1(b).
Figure 3.1 An example of an XPS document with an image. (a) The XML markup describing an area filled with an image. (b) The corresponding result of (a).
<Path Data="M 120,100 L 632,100 632,484
Accordingly, an image can be partially displayed by narrowing the area described by the Data attribute. With this capability, an image can be hierarchically divided into blocks using multiple <Path> elements. For example, if we divide an image into a pattern consisting of a left block and a right block that is further split into an upper block and a lower block, we can use three <Path> elements to describe it, as illustrated in Figure 3.2. Note that we do not really divide the image itself into pieces; we only change the Data attribute to display the image block by block.
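The block-by-block display can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper path_for_block is hypothetical, and the block coordinates and the <ImageBrush> attributes are copied from the markup of Figure 3.2. Every block reuses the same brush, so only the Data attribute differs between blocks.

```python
import xml.etree.ElementTree as ET

# The same brush is reused by every block; values follow the markup of Figure 3.2.
IMAGE_BRUSH = (
    '<ImageBrush ImageSource="/Documents/1/Resources/Images/1.JPG" '
    'Viewbox="0,0,1024,768" TileMode="None" ViewboxUnits="Absolute" '
    'ViewportUnits="Absolute" Viewport="120,100,512,384" />'
)


def path_for_block(left: int, top: int, right: int, bottom: int) -> str:
    """Return a <Path> element that shows only one rectangular block of the image."""
    data = f"M {left},{top} L {right},{top} {right},{bottom} {left},{bottom} z"
    return f'<Path Data="{data}"><Path.Fill>{IMAGE_BRUSH}</Path.Fill></Path>'


# One left block and a right block split into an upper and a lower block.
BLOCKS = [
    (120, 100, 366, 484),
    (365, 100, 632, 292),
    (365, 292, 632, 484),
]

markup = [path_for_block(*block) for block in BLOCKS]
for element in markup:
    ET.fromstring(element)  # raises ParseError if the markup is not well-formed
    print(element)
```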
<Path Data="M 120,100 L 366,100 366,484 120,484 z" >
<Path.Fill>
<ImageBrush ImageSource="/Documents/1/Resources/Images/1.JPG"
Viewbox="0,0,1024,768" TileMode="None" ViewboxUnits="Absolute"
ViewportUnits="Absolute" Viewport="120,100,512,384" />
</Path.Fill>
</Path>
<Path Data="M 365,100 L 632,100 632,292 365,292 z" >
<Path.Fill>
<ImageBrush ImageSource="/Documents/1/Resources/Images/1.JPG"
Viewbox="0,0,1024,768" TileMode="None" ViewboxUnits="Absolute"
ViewportUnits="Absolute" Viewport="120,100,512,384" />
</Path.Fill>
</Path>
<Path Data="M 365,292 L 632,292 632,484 365,484 z" >
<Path.Fill>
<ImageBrush ImageSource="/Documents/1/Resources/Images/1.JPG"
Viewbox="0,0,1024,768" TileMode="None" ViewboxUnits="Absolute"
ViewportUnits="Absolute" Viewport="120,100,512,384" />
</Path.Fill>
</Path>
Figure 3.2 The XML markup describing an image divided into a three-block pattern.
According to the above finding about the XPS document property, we may generate block patterns with two levels of divisions to encode message bits (discussed in more detail later). The difference in appearance between the original cover image and the resulting stego-image will be imperceptible. This is the idea behind the proposed new method for covert communication.
The data hiding process in the proposed method is based on the use of a table designed in this study, which includes a list of block patterns obtained by two-level image divisions as mentioned above and a set of corresponding 3-bit codes, as shown in Table 3.1. We call it the division pattern encoding table in the sequel. Note that the sizes of all the division patterns are the same; we call this size the unit size of division blocks, and an image block of this size is called a unit block. We let the unit size of the division blocks be dynamic, meaning that it is determined by the message length and the cover image (the details are described later in this chapter).
Accordingly, a message can be embedded into an XPS document by dividing an image in it into blocks with their division patterns corresponding to the message bits. Figure 3.3(a) shows an example where an image is divided into a set of block patterns encoding a certain message. In this example, part of the message embedded in the image is S = 100 001 010 000, which corresponds to the four division patterns enclosed by the red rectangle shown in Figure 3.3(b). The hidden message can be extracted simply by looking up the division pattern encoding table to find the corresponding codes and concatenating them.
Table 3.1 A division pattern encoding table used for message embedding.
[Table body not recoverable here: each row pairs a two-level division pattern with its 3-bit code.]
In summary, by using the function of the <Path> element in the XPS document, we can embed a message into an image in an XPS document by dividing the image into 2-level division patterns which correspond to the message bits according to Table 3.1. The image is not really divided; the division is only conducted in the XML markup of the XPS document. The appearances of the image and the resulting XPS document are therefore unaffected and will not attract the attention of any observer of the image.
Figure 3.3 An image in an XPS document with a message embedded in it. (The edges of the blocks are emphasized on purpose in order to show the result.) (a) The entire image with division patterns superimposed (not seen in real appearance). (b) The enlarged partial view of (a), in which the red rectangular part corresponds to a partial message of S = 100 001 010 000.
3.2 Data Embedding and Extraction Processes
Detailed embedding and extraction algorithms of the proposed data hiding method are described in this section. The embedding process is illustrated in Figure 3.4. First, an input secret message, after being randomized with a secret key, is transformed into a sequence of 3-bit segments. After these segments are mapped into a set of corresponding division patterns, a selected image in the input XPS document is divided into blocks of these division patterns. Eventually, we get a stego-XPS document with the secret message embedded in it.
Figure 3.4 Flowchart of the proposed data embedding process.
People who want to send a secret message to others can use this message embedding process to produce a stego-XPS document and deliver it as a normal XPS document to other people. The receiver can extract the secret message correctly using the same secret key. The extraction process is similar to the embedding process but conducted essentially in a reverse order, as illustrated in Figure 3.5. A series of division patterns are extracted from the stego-XPS document and transformed into the corresponding binary values. After concatenating these values and reordering them using the same secret key, the receiver gets the original secret message.
Figure 3.5 Flowchart of the proposed data extraction process.
3.2.1 Proposed Algorithm for Data Embedding
The details of the proposed algorithm for data embedding are described in the following. In this algorithm, we add a secret key as input to prevent someone who knows the algorithm from extracting the embedded secret. We also calculate the number of characters in the secret message so that we can estimate how many division blocks we need to hide the secret message. Because the unit size of division blocks is dynamic as mentioned previously, we need to embed a unit block before creating the required division patterns. Also, we add an “ending signal” at the end of the secret message, instead of embedding the length of the secret message, to mark the end of the embedded message bits. Both the unit block and the ending signal are used in the extraction process to recognize the division patterns. The reason for using them will become clear in the algorithm.
Algorithm 3.1. Data embedding for covert communication.
Input: A secret message S, a cover XPS document D, and a secret key K.
Output: A stego-XPS document D′.
Steps:
1. Use the secret key K as a seed to generate a sequence of random numbers Q.
2. Randomize the characters of the input secret message S with the random numbers Q to get a randomized message S′ and let l be the number of characters in S′.
3.3 Add a 9-bit ending signal consisting of three 3-bit segments, s_{k+1} = 000, s_{k+2} = 000, s_{k+3} = 001, at the end of s_k, where k = 3×l.
4. Map the 3-bit segments s_1, s_2, …, s_{k+3} into a series of division patterns p_1, p_2, …, p_{k+3} according to Table 3.1.
5. Add a unit block denoted by p_0 at the beginning of the series of division patterns.
6. Perform the following steps on the cover XPS document D.
6.1 Decompress the XPS document D.
6.2 Find the minimum number N such that N^2 ≥ k+4.
6.3 Modify the XML markup file in D which describes an image I in the following way.
6.3.1 Divide image I into N×N blocks, n_0, n_1, …, n_{N×N−1}.
6.3.2 Divide each block n_i, 0 ≤ i ≤ N×N−1, into the corresponding division pattern p_i until the ending signal is embedded.
6.3.3 Call the final divided image I the stego-image, and denote it by I′.
7. Recompress D (with I′ in it) with the modified XML file to get a stego-XPS document D′.
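The message preparation part of Algorithm 3.1 can be sketched as follows in Python. This is a sketch under stated assumptions, not the implementation used in this study: the randomization is assumed to be a key-seeded reordering of the characters, each character is assumed to be encoded with 9 bits so that it yields exactly three 3-bit segments (consistent with k = 3×l), and the function names are hypothetical. The mapping to division patterns is omitted because it depends on Table 3.1.

```python
import math
import random

ENDING_SIGNAL = ["000", "000", "001"]  # the 9-bit ending signal of the algorithm


def permute(message: str, key: int) -> str:
    """Reorder the characters with a key-seeded permutation (assumed randomization)."""
    order = list(range(len(message)))
    random.Random(key).shuffle(order)
    return "".join(message[i] for i in order)


def to_segments(randomized: str) -> list[str]:
    """Encode each character with 9 bits (assumption) and cut into 3-bit segments."""
    bits = "".join(format(ord(c), "09b") for c in randomized)
    return [bits[i:i + 3] for i in range(0, len(bits), 3)]


def grid_size(k: int) -> int:
    """Smallest N with N*N >= k + 4: k message segments, 3 ending segments, 1 unit block."""
    return math.isqrt(k + 3) + 1


def plan_embedding(message: str, key: int) -> tuple[list[str], int]:
    """Return the segments s_1..s_{k+3} to embed and the grid size N."""
    segments = to_segments(permute(message, key))
    k = len(segments)  # k = 3 * l for an l-character message
    return segments + ENDING_SIGNAL, grid_size(k)


segments, n = plan_embedding("Hi", key=7)
print(len(segments), n)  # 9 segments (6 message + 3 ending) on a 4x4 grid
```

With this encoding, a character whose code is 1 would collide with the ending signal, so the sketch is only suitable for printable text.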
3.2.2 Proposed Algorithm for Data Extraction
The details of the proposed algorithm for data extraction are described in the following. First, we extract the unit block embedded at the beginning of the stego-image to get the block size of the division patterns so that we can decode all the following division patterns in the stego-image. The decoding process stops when the ending signal is extracted. Hence, even when we do not know the length of the secret message, we still know where the end of the message is in the stego-image. By using the same secret key, we can recover the correct message.
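The decoding side can be sketched in the same way. The following Python sketch assumes that the division patterns have already been converted into 3-bit segments by looking up Table 3.1, that each character occupies three segments (9 bits), and that the randomization is a key-seeded reordering of characters. These are assumptions made for illustration, and the function names are hypothetical.

```python
import random

ENDING_SIGNAL = ["000", "000", "001"]  # marks the end of the embedded message


def from_segments(segments: list[str]) -> str:
    """Read three segments per character and stop at the ending signal."""
    chars = []
    for i in range(0, len(segments) - 2, 3):
        group = segments[i:i + 3]
        if group == ENDING_SIGNAL:
            break  # segments after the ending signal are ignored
        chars.append(chr(int("".join(group), 2)))
    return "".join(chars)


def unpermute(randomized: str, key: int) -> str:
    """Invert the key-seeded reordering assumed for the embedding side."""
    order = list(range(len(randomized)))
    random.Random(key).shuffle(order)  # same key gives the same permutation
    original = [""] * len(randomized)
    for position, source in enumerate(order):
        original[source] = randomized[position]
    return "".join(original)


def extract(segments: list[str], key: int) -> str:
    """Recover the secret message from the decoded 3-bit segments."""
    return unpermute(from_segments(segments), key)
```

A receiver without the correct key still obtains the randomized characters, but not their original order.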