A Study on Information Sharing of Text-type Documents with Steganography and Authentication Capabilities

Master's Thesis, Department of Computer and Information Science, National Chiao Tung University

A Study on Information Sharing of Text-type Documents with Steganography and Authentication Capabilities

Student: Kuei-Li Huang
Advisor: Prof. Wen-Hsiang Tsai

June 2004

A Study on Information Sharing of Text-type Documents with Steganography and Authentication Capabilities

Student: Kuei-Li Huang
Advisor: Wen-Hsiang Tsai

A Thesis Submitted to the Institute of Computer and Information Science, College of Electrical Engineering and Computer Science, National Chiao Tung University, in Partial Fulfillment of the Requirements for the Degree of Master in Computer and Information Science

June 2004
Hsinchu, Taiwan, Republic of China

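A Study on Information Sharing of Text-type Documents with Steganography and Authentication Capabilities

Student: Kuei-Li Huang
Advisor: Prof. Wen-Hsiang Tsai
Institute of Computer and Information Science, College of Electrical Engineering and Computer Science, National Chiao Tung University

ABSTRACT (translated from Chinese)

Secret sharing is an information hiding technique that transforms a piece of secret data into multiple shares and distributes them to participants for safekeeping. Later, the original secret data can be recovered by collecting at least a certain number of shares. Steganography is another information hiding technique that encodes or converts data into a file of a certain format, or into part of such a file, in order to protect the data. Authentication is an information hiding technique that guarantees the integrity and fidelity of data. This study proposes several information sharing methods for text-type documents with steganographic effects and authentication capabilities.

First, an information sharing method for pure text documents with steganographic effects and authentication capability is proposed. It shares a secret document by logical operations and converts the shares into English documents composed of meaningful sentences to achieve the steganographic effect. Authentication signals are also hidden in these English documents for the purpose of authentication.

Next, a secret sharing method for HTML documents with steganographic effects is proposed, which disguises shares as HTML documents, together with a new method for authenticating these disguised HTML documents. Together, the two methods share the contents of a secret HTML document by cooperative sharing operations and disguise the shares as HTML documents in the same style as the secret document, while the contents are replaced with other contents in which share data and authentication signals are hidden.

Finally, for documents in e-mail format, a hierarchical secret sharing method with steganographic effects and a method for authenticating shares in e-mail form are proposed. These methods apply secret sharing, steganography, and authentication to every component of a secret e-mail document while retaining the component framework of the original secret document. According to the format and content of each component, the methods replace the original components with components in which share data and authentication signals are hidden, forming the individual shares. Experimental results demonstrate the feasibility of these methods.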

A Study on Information Sharing of Text-type Documents with Steganography and Authentication Capabilities

Student: Kuei-Li Huang
Advisor: Dr. Wen-Hsiang Tsai
Department of Computer and Information Science, National Chiao Tung University

ABSTRACT

Secret sharing is a data hiding technique that transforms secret data into shares, which are distributed to participants for safekeeping and later collected to recover the original secret. Steganography is another data hiding technique that translates digital data into certain formats with different appearances to protect the original data. Authentication is a technique for assuring the fidelity and integrity of protected digital data. In this study, secret sharing methods for text-type documents with steganography and authentication capabilities are proposed.

A secret sharing method for pure texts with steganography and authentication capabilities is proposed first. The method shares a secret text by exclusive-OR operations. For steganographic effects, the shares are translated into meaningful sentences, in which authentication signals encoded by spaces are hidden for authentication.

Next, a method for sharing HTML documents with steganographic effects is proposed, and a new authentication method for verifying HTML shares is described. The two methods together share the contents of a secret HTML document by cooperative sharing operations, and retain the style of the secret HTML document in the shares by replacing its contents with fake contents that imperceptibly contain share data and authentication signals.

Finally, a hierarchical secret sharing method for e-mails and an authentication method for e-mail shares are proposed. The methods are applied to the components of a secret e-mail and make the framework of the e-mail shares identical to that of the secret e-mail. The methods substitute fake components containing share data for the original e-mail components to form shares. Depending on the content of each component, corresponding authentication signals are embedded and an appropriate authentication process is conducted. Experimental results are also included, which show the feasibility of all the proposed methods.

ACKNOWLEDGEMENTS

The author expresses his hearty appreciation for the continuous guidance, discussions, support, and encouragement received from his advisor, Dr. Wen-Hsiang Tsai, not only in the development of this thesis but also in every aspect of his personal growth.

Thanks are due to Mr. Chih-Hsuan Tzeng, Mr. Chang-Chou Lin, Mr. Chih-Jen Wu, Mr. Tsung-Yuan Liu, Mr. Cheng-Jyun Lai, Mr. Yen-Chung Chiu, Miss Yen-Lin Chen, Mr. Wei-Liang Lin, Mr. Yi-Chieh Chen and Mr. Kuei-Li Huang for their valuable discussions, suggestions, and encouragement. Appreciation is also given to the colleagues of the Computer Vision Laboratory in the Department of Computer and Information Science at National Chiao Tung University for their suggestions and help during his thesis study.

Finally, the author extends his profound thanks to his family for their lasting love, care, and encouragement. He dedicates this dissertation to his parents.

TABLE OF CONTENTS

ABSTRACT (in Chinese)
ABSTRACT (in English)
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF FIGURES

Chapter 1 Introduction
    1.1 Motivation
    1.2 Review of Related Works
        1.2.1 Review of Secret Sharing Methods
        1.2.2 Review of Steganographic Methods for Text-Type Documents
    1.3 Overview of Proposed Methods
        1.3.1 Definitions of Terms
        1.3.2 Brief Descriptions of Proposed Methods
    1.4 Thesis Organization

Chapter 2 Proposed Techniques of Information Sharing of Text-Type Documents with Steganographic Effects and Authentication Capabilities
    2.1 Introduction
    2.2 Review of Adopted Techniques
        2.2.1 Secret Sharing Technique by Exclusive-OR Operations
        2.2.2 Secret Sharing Technique by Cooperative Sharing Operations
        2.2.3 Hierarchical Secret Sharing Technique
        2.2.4 Data Hiding for Text Documents
    2.3 Proposed Techniques and Contributions
        2.3.1 Data Magnitude Control by Modulus Adjustment
        2.3.2 Steganographic Technique for Share Data by Use of Simple Sentences
        2.3.3 Steganographic Technique for Text Components in HTML Documents
        2.3.4 Steganographic Techniques for Non-text Components in HTML Documents
        2.3.5 Steganographic and Authentication Techniques for Header Components in E-Mails
        2.3.6 Authentication Technique for Verifying Share Data

Chapter 3 Secret Sharing Method with Steganographic Effects and Authentication Capability for Pure Text Documents
    3.1 Introduction
    3.2 Overview of Proposed Method
        3.2.1 Secret Pure Text Sharing Process
        3.2.2 Secret Pure Text Recovery Process
    3.3 Proposed Detailed Processes for Sharing Secret Pure Texts
        3.3.1 Use of Exclusive-OR Operations for Sharing
        3.3.2 Steganographic Technique for Last Piece of Share Data
        3.3.3 Steganographic Technique for Authentication of Text Shares
    3.4 Experimental Results
    3.5 Discussions and Summary

Chapter 4 Secret Sharing Method with Steganographic Effects for HTML Documents
    4.1 Introduction
        4.1.1 Properties of HTML Documents
            4.1.1.1 In-tag Text
            4.1.1.2 Outside-tag Text
        4.1.2 Processes of Proposed Method
            4.1.2.1 Secret HTML Sharing Process
            4.1.2.2 Secret HTML Recovery Process
    4.2 Proposed Detailed Processes for Sharing Secret HTML Documents
        4.2.1 Process for Sharing the Text Component in Secret HTML
        4.2.2 Process for Sharing Non-Text Components in Secret HTML
    4.3 Experimental Results
    4.4 Discussions and Summary

Chapter 5 Steganographic Method for Tamper Proofing of HTML Shares
    5.1 Introduction
        5.1.1 Idea of Proposed Approach
        5.1.2 Overview of Processes
            5.1.2.1 Authentication Signal Embedding Process
            5.1.2.2 Authentication Signal Extraction and Verification Process
    5.2 Proposed Detailed Processes for Authentication of HTML Shares
        5.2.1 Non-text Component Authentication
        5.2.2 Text Component Authentication
    5.3 Experimental Results
    5.4 Discussions and Summary

Chapter 6 Hierarchical Secret Sharing with Steganographic Effects for E-mail Documents
    6.1 Introduction
        6.1.1 Properties of E-mail Documents
            6.1.1.1 Overview of E-mail Format: Multipurpose Internet Mail Extensions (MIME)
            6.1.1.2 Main Components: E-mail Header, Content, and Attachment
            6.1.1.3 Content Transfer Encoding Methods
        6.1.2 Overview of Proposed Processes
            6.1.2.1 Secret E-mail Sharing Process
            6.1.2.2 Secret E-mail Recovery Process
    6.2 Proposed Detailed Processes for Sharing Secret E-mail
        6.2.1 Sharing E-mail Header
        6.2.2 Sharing Content Component
        6.2.3 Sharing Attachments
    6.3 Experimental Results
    6.4 Discussions and Summary

Chapter 7 Steganographic Method for Tamper Proofing of E-mail Shares
    7.1 Introduction
        7.1.1 Proposed Ideas
        7.1.2 Processes of Proposed Method
            7.1.2.1 Authentication Signal Embedding Process
            7.1.2.2 Authentication Process
    7.2 Proposed Detailed Processes for Authenticating E-mail Shares
        7.2.1 E-mail Header Authentication
        7.2.2 Content and Attachment Authentication
    7.3 Experimental Results
    7.4 Discussions and Summary

Chapter 8 Conclusions and Suggestions for Future Works
    8.1 Conclusions
    8.2 Suggestions for Future Works

REFERENCES

LIST OF FIGURES

Figure 2.1 A hierarchical secret sharing example.
Figure 2.2 An example of hiding data in a text document.
Figure 2.3 A hierarchical sharing example after applying data magnitude control.
Figure 2.4 Flowchart of share data translation process.
Figure 2.5 Flowchart of creating authentication capability.
Figure 2.6 Flowchart of authenticating process.
Figure 3.1 Flowchart of secret pure text sharing process.
Figure 3.2 Flowchart of process of authentication signal embedding.
Figure 3.3 Flowchart of authenticating process.
Figure 3.4 Flowchart of process of recovering secret pure text.
Figure 3.5 Flowchart of process of embedding authentication signals into a stego-text.
Figure 3.6 Flowchart of verification process of share text.
Figure 3.7 The secret pure text.
Figure 3.8 Stego-texts. (a) through (b) articles selected from an article database.
Figure 3.9 The last stego-text.
Figure 3.10 Shares. (a) the last share; (b) one of the other shares.
Figure 3.11 Recovered secret pure text.
Figure 4.1 An HTML document. (a) the source code; (b) the display on a browser.
Figure 4.2 Flowchart of component extraction and sharing.
Figure 4.3 Flowchart of the steganography processes for the share data of components.
Figure 4.4 Flowchart of the process of stego-HTML document creation.
Figure 4.5 Flowchart of the process of share data extraction of components.
Figure 4.6 Flowchart of secret component recovery processes.
Figure 4.7 Flowchart of secret HTML recovery process.
Figure 4.8 A secret HTML document.
Figure 4.9 Stego-HTML documents. (a) through (b) stego-HTML documents generated from the secret HTML document in Figure 4.8.
Figure 4.10 Recovered secret HTML document.
Figure 4.11 Image components. (a) secret image component; (b) the corresponding stego-image component of the first stego-HTML document; (c) the corresponding stego-image component of the second share HTML.
Figure 5.1 Flowchart of process of embedding authentication signals.
Figure 5.2 Flowchart of share authenticating process.
Figure 5.3 A stego-HTML document.
Figure 5.4 A share of the HTML document in Figure 5.3.
Figure 5.5 A verified share of a tampered HTML share.
Figure 6.1 Conceptual expression of MIME format.
Figure 6.2 Flowchart of component extraction and sharing process.
Figure 6.3 Flowchart of steganographic effect creation process.
Figure 6.4 Flowchart of process of share data extraction.
Figure 6.5 Flowchart of recovery process of secret e-mail.
Figure 6.6 A secret e-mail.
Figure 6.7 Hierarchical framework relationship among four participants.
Figure 6.8 Stego-e-mails. (a) through (b) two of four stego-e-mails.
Figure 6.9 Recovered secret e-mail.
Figure 7.1 Flowchart of share data extraction process.
Figure 7.2 Flowchart of processes of authentication signal embedding.
Figure 7.3 Flowchart of process of share creation.
Figure 7.4 Flowchart of component extraction process.
Figure 7.5 Flowchart of authenticating process.
Figure 7.6 Flowchart of the process of verified share e-mail creation.
Figure 7.7 A stego-e-mail.
Figure 7.8 The share of the stego-e-mail in Figure 7.7.
Figure 7.9 The contents of the HTML attachment.
Figure 7.10 A successfully verified share.
Figure 7.11 A verified share with an improper key.
Figure 7.12 The contents of the HTML attachment in the verified share in Figure 7.11.
Figure 7.13 The verified share of the share in Figure 7.8, which is tampered.

Chapter 1 Introduction

1.1 Motivation

With the advance of the Internet and computer technologies, more and more documents are converted into digital form and transmitted over the Internet. For an organization, it is convenient to exchange messages and documents via the Internet. However, because digital documents can be copied easily and quickly, important documents such as contracts, conference records, technical reports, source code, and strategic decisions, which should be held only by part of the organization's staff, must be handled carefully. One way to manage such documents is secret sharing. Although secret sharing has been applied to various kinds of files, such as images and videos, it is also worth applying to text-type documents because they are used so frequently.

Secret sharing transforms secret data into many meaningless parts, which are assigned to participants, and recovers the secret data by collecting a sufficient number of parts from the participants. Because each part is meaningless, participants will worry about where to hide their parts to keep curious users from accessing them, and about whether the network is secure enough to transmit these important parts. If each part can be covered by or hidden in other meaningful media, the parts are less likely to attract the attention of other users. This effect can be achieved by applying steganographic techniques to the meaningless parts. How to accomplish this goal is an interesting issue.

Although each part is then meaningful and escapes, to a certain degree, the notice of curious users, we cannot know whether the part has been changed. The part may be changed accidentally, for example while being transmitted over an unstable network, or it may be modified intentionally by a malicious user. Therefore, authenticating the parts is necessary to ensure the integrity of each part. How to authenticate the parts is another interesting issue investigated in this study.

1.2 Review of Related Works

1.2.1 Review of Secret Sharing Methods

Secret sharing encrypts and distributes secret data into several parts so that each part, kept by a participant, contains only partial information about the secret. The secret data can be recovered if a pre-defined group of parts is collected; collecting a group of parts different from the pre-defined group cannot recover the secret information. Shamir [1] was the first to propose the concept of secret sharing in his (k, n)-threshold method, where n denotes the number of participants and the threshold k specifies the minimum number of parts in the pre-defined group. By this method, secret information is encrypted and then distributed into n parts, which are assigned to n participants, respectively. The secret information can be recovered if and only if k or more participants pool their parts. Subsequently, many related topics were studied [2]-[7] and various kinds of secret sharing methods were proposed [8]-[13]. Nevertheless, these methods are only suitable for short data, such as passwords and encryption and decryption keys.
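A minimal sketch of a (k, n)-threshold scheme in the style of Shamir [1] may make the threshold idea concrete. The secret is the constant term of a random polynomial of degree k-1 over a prime field, each part is one point on that polynomial, and any k points recover the constant term by Lagrange interpolation. The modulus `PRIME`, the function names `split_secret` and `recover_secret`, and the use of Python's `secrets` module are assumptions made for illustration; they are not taken from [1] or from this thesis.

```python
import secrets

# Illustrative assumption: a Mersenne prime larger than any secret shared here.
PRIME = 2**127 - 1


def split_secret(secret, k, n):
    """Split an integer secret into n points; any k of them recover it."""
    if not (1 <= k <= n) or not (0 <= secret < PRIME):
        raise ValueError("need 1 <= k <= n and 0 <= secret < PRIME")
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Lagrange interpolation at x = 0 over the integers modulo PRIME."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8 or later).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

For example, with k = 3 and n = 5, any three of the five points returned by `split_secret` reproduce the secret through `recover_secret`. Because the secret must be smaller than the modulus, a sketch like this fits short data such as keys, which matches the limitation noted above.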

Many secret sharing methods for images, based on the (k, n)-threshold method, have also been proposed [14]-[18]. In particular, Lin and Tsai [19] proposed an efficient secret sharing method using exclusive-OR operations. This method is an (n, n)-threshold method. It applies the exclusive-OR operation to a secret image and n-1 other images, all of the same size as the secret image, to generate the nth image. The n-1 images and the nth image together are regarded as shares and are distributed to n participants, respectively. The secret image can be quickly recovered by exclusive-ORing the n images held by the n participants.

Based on ideas from company organizations, Lin and Tsai [20] proposed a method of hierarchical secret sharing, a new concept of sharing a secret among groups of participants. Three types of secret sharing, namely cooperative sharing, independent sharing, and dominant sharing, were proposed to realize the concept. The method first specifies a hierarchical structure as a tree, in which each non-leaf node denotes one of the three operations and each leaf node denotes one of the parts. According to the tree, secret information can be encrypted and distributed into the parts corresponding to the leaf nodes, and recovered as well.

1.2.2 Review of Steganographic Methods for Text-Type Documents

A steganographic method for text-type documents embeds message data into a text-type document so that the message goes unnoticed. Text-type documents contain less redundant information for hiding data than images or videos do. Methods for hiding data in text-type documents may embed data in the text itself or in the language describing the text format.
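The exclusive-OR (n, n)-threshold sharing of Lin and Tsai [19] reviewed above can be sketched on byte strings rather than image pixels, which is closer to the text-type setting of this study. In the sketch below the n-1 other inputs are random byte strings, and the nth piece is the exclusive-OR of the secret with all of them. The function names and the choice of random inputs are assumptions for illustration, not the procedure of [19].

```python
import secrets
from functools import reduce


def xor_bytes(a, b):
    """Exclusive-OR two byte strings of equal length."""
    return bytes(x ^ y for x, y in zip(a, b))


def xor_share(secret, n):
    """Produce n pieces of share data; all n are needed for recovery."""
    if n < 2:
        raise ValueError("n must be at least 2")
    # Assumption: the n-1 other pieces are random bytes of the secret's length.
    others = [secrets.token_bytes(len(secret)) for _ in range(n - 1)]
    # The last piece is the secret exclusive-ORed with every other piece.
    last = reduce(xor_bytes, others, secret)
    return others + [last]


def xor_recover(shares):
    """Exclusive-OR all n pieces together to get the secret back."""
    return reduce(xor_bytes, shares)
```

Recovery works because every one of the n-1 other pieces appears twice in the final exclusive-OR and cancels out, leaving only the secret. Omitting any single piece leaves an uncancelled term, which is why the scheme is (n, n) rather than (k, n).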

Wayner [21] proposed a method that creates cover texts from the secret message using context-free grammars and selects “proper” productions that are relatively more meaningful than the improper ones, to achieve a steganographic effect for covert communication. A cover text is therefore itself the encoded secret message, and the secret message can be decoded by parsing the transmitted cover text.

As for hiding data in a cover text, Maxemchuk et al. [22]-[23] described a steganographic method for text-type documents. Secret data are embedded by adjusting the spacing between successive words or between successive lines. For example, if the spacing between two successive lines is larger than a threshold, the embedded bit is “1”; otherwise it is “0.” For a soft-copy text, Bender et al. [24] proposed a similar method that exploits inter-sentence spacing, end-of-line spacing, and between-word spacing to embed secret information. For instance, a single between-word space encodes “0” and a double between-word space encodes “1”.

For documents of complex formats, Chang and Tsai [25] proposed a method for covert communication using HTML documents. Because a web browser displays neither the spaces that follow a leading space nor the tags in an HTML document, the method can encode secret information by adjusting the size of between-word spaces and the expression of tags.

1.3 Overview of Proposed Methods

1.3.1 Definitions of Terms

1. Document: A document is a text-type file, such as a piece of pure text, an HTML file, or an e-mail.
2. Component: A component of a document is an independent part of the document for which another part of the same format can be substituted directly. For instance, the text and non-text parts of an HTML document and the headers and attachments of an e-mail are called components.
3. Secret: A secret is certain information that is important and should be preserved properly.
4. Share data: Share data are the result of applying secret sharing to a secret.
5. Stego-document: A stego-document is a document with share data embedded in it. For example, an HTML document with share data embedded in it is called a stego-HTML document, and an e-mail with share data embedded in it is called a stego-e-mail.
6. Share: A share is the result of applying steganographic and authentication techniques to a piece of share data.

1.3.2 Brief Descriptions of Proposed Methods

A. Proposed Techniques of Information Sharing of Text-Type Documents with Steganographic Effects and Authentication Capabilities

Some possible techniques for sharing text-type documents with steganographic effects and authentication capabilities have been reviewed previously. Accordingly, several information sharing, steganographic effect creation, and authentication techniques for text-type documents are proposed independently and in detail. These techniques will later be selected and applied in combination to documents of three text types, namely pure text, HTML, and e-mail.

B. Proposed Method for Sharing Pure Text Documents with Steganographic Effect and Authentication Capability

(17) Several techniques are combined in this study for sharing pure text documents with steganographic effects and authentication capabilities. The first proposed technique of sharing secrets is based on exclusive-OR operations and encodes and distributes a secret pure text document into several pieces of share data. The second proposed technique translates a piece of share data into several simple sentences to form a meaningful text, a stego text. The purpose of this technique is to make each piece of share data meaningful. Then, by inter-word, inter-sentence and inter-line spacing, authentication signals can be hidden in the stego text by the third proposed technique to generate a share, and accordingly the share can be authenticated later. By combining the three techniques together, a secret pure text document can be shared among the participants of the secret pure text document and each share can be verified. C.. Proposed Secret Sharing Method for HTML Documents with Steganographic Effect One secret sharing technique for HTML documents and two steganographic. methods for share data are combined to create a secret sharing method for HTML documents in this study. The method parses and locates components in an HTML document and then encrypts them into pieces of share data by the use of a certain modified version of the cooperative sharing operation mentioned previously. Since users of web browsers cannot be aware of certain symbols, such as spaces, new-lines, tabs, and text inside tags in HTML’s, a steganographic method for HTML shares is also proposed and transforms the pieces of share data into HTML documents of the style of the secret HTML documents by exploiting some properties mentioned above. To simulate the style of the secret HTML, components of the secret HTML, such as texts, images, videos, etc, are replaced by cover ones of the same type, respectively. For instance, in a HTML document, an image link, which can show an image on 6.

(18) browsers, is substituted by another link which will display another image on the browsers. In addition, each cover component also contains the corresponding piece of share data. D.. Proposed Steganographic Method for Tamper Proofing of HTML Share Documents Two steganographic methods are proposed in this study for authenticating. HTML share documents. Authentication signals of share data are generated by segmenting share data into several strings of one length according to the size of the hiding space and exclusive-ORing the strings into a string. The properties of HTML documents are used again to hide authentication signals in tags, and inter-word, inter-line, and inter-sentence spaces. For text components, their authentication signals are embedded in the between-word spaces, where the corresponding share data are also embedded. By specifying the number of authentication signals embedded in each between-word space and the position relation between share data and authentication signals, the authentication technique can work well. For non-text components, the authentication signals of share data are hidden in tags of the HTML document with share data embedded in. E.. Proposed Hierarchical Secret Sharing Method for E-mail with Steganographic Effect A hierarchical secret sharing method for secret e-mails and three cooperative. steganographic methods for e-mail shares are proposed in this study. A secret e-mail contains many components which can be sorted the components into three kinds, header, body, and attachment. For header components, important information in each component is extracted and shared by a modified hierarchical secret sharing technique. Body components can be classified into two types: pure text and binary stream. As for attachment components, there are three types of the components, e.g. e-mail, pure text, 7.

and binary stream. By applying the modified method of hierarchical secret sharing, body and attachment components of the pure-text type and the binary-stream type can be shared as well. An attachment of the e-mail type can be treated as a new secret e-mail and processed by the proposed method. The steganographic method refers to the framework of the secret e-mail, and replaces a header component with a cover header, a component of the pure-text type with another pure text, and a component of the binary-stream type with an HTML document. Each cover component contains its corresponding share data, so that illicit users opening the stego e-mail made up of these components cannot notice anything unusual.

F. Proposed Steganographic Method for Tamper Proofing of E-mail Share Documents

Three steganographic methods for tamper proofing of e-mail share documents are combined and proposed in this study. Each component's authentication signals are generated independently by the authentication signal generation method mentioned previously for stego HTML documents, and are embedded into the cover host of the component. For header components, because an e-mail parser discards the string of space and tab symbols at the rear of lines in a header, the authentication signals are encoded in this study into several strings of tab and space symbols, which are concatenated at the rear of the lines. For pure-text type components, the between-word spacing scheme of the data hiding method proposed by Bender et al. [24] is used to embed authentication signals. For binary-stream components, a component's authentication signals are inserted in the tags of the corresponding cover host, an HTML document. After the authentication signals are hidden in the e-mail in which share data are embedded, a share is generated. During verification, the method examines a share component by component and figures out which components have been tampered with.
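The header-line encoding described above can be illustrated with a minimal Python sketch. It assumes that a tab stands for bit 1 and a space for bit 0, that the signal bits are split evenly over the header lines, and that the original lines carry no trailing whitespace; the function names `embed_bits` and `extract_bits` are hypothetical and are not part of the proposed method.

```python
def embed_bits(lines, bits):
    """Append signal bits to header lines as trailing whitespace.

    Assumed mapping: tab = '1', space = '0'. The bits are split evenly,
    so len(bits) must be a multiple of len(lines).
    """
    if len(bits) % len(lines) != 0:
        raise ValueError("bit string must divide evenly over the lines")
    per_line = len(bits) // len(lines)
    out = []
    for i, line in enumerate(lines):
        chunk = bits[i * per_line:(i + 1) * per_line]
        out.append(line + "".join("\t" if b == "1" else " " for b in chunk))
    return out


def extract_bits(lines):
    """Read the trailing tab/space run of every line back into a bit string."""
    bits = []
    for line in lines:
        visible = line.rstrip(" \t")
        tail = line[len(visible):]
        bits.append("".join("1" if c == "\t" else "0" for c in tail))
    return "".join(bits)
```

Because only trailing whitespace is touched, stripping it again returns the visible header lines unchanged, which is why a mail reader shows nothing unusual.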

Thesis Organization

In the remaining chapters of this thesis, the adopted techniques are reviewed and the proposed techniques are described briefly in Chapter 2. The proposed secret sharing method for pure text documents with steganographic effects and authentication capabilities is described in Chapter 3. The proposed secret sharing method for HTML documents with steganographic effects is described in Chapter 4. The proposed steganographic method for assuring the fidelity of HTML shares is described in Chapter 5. In Chapter 6, the proposed methods for sharing e-mail documents by a modified hierarchical secret sharing method and for creating steganographic effects on the share data of secret e-mails are described. In Chapter 7, the proposed steganographic method for tamper proofing of e-mail shares is described. Finally, in Chapter 8, conclusions are drawn and topics for further study are suggested.

Chapter 2 Proposed Techniques of Information Sharing of Text-Type Documents with Steganographic Effects and Authentication Capabilities

2.1 Introduction

In order to achieve the goal of this study, several related techniques are required. In this chapter, the adopted techniques and the proposed ones for realizing information sharing of text-type documents of different formats with steganographic effects and authentication capabilities are described. In Section 2.2, the adopted techniques are reviewed in detail. These techniques are not necessarily suitable for text-type documents as they stand; with modifications, however, they work well on them. In Section 2.3, the proposed techniques are described. Some of them are derived from the adopted techniques. Some of the proposed techniques are useful for general text-type documents and the others are suitable for texts of special types.

2.2 Review of Adopted Techniques

Four techniques will be reviewed in the following. The first technique shares secret images by the exclusive-OR operation. The feasibility of applying the technique to texts will also be discussed. The second and third ones concern the so-called hierarchical sharing. Brief discussions on their applications to text

sharing will be presented. The last one concerns data hiding in text documents.

2.2.1 Secret Sharing Technique by Exclusive-OR Operations

In this section, a technique of secret sharing using the exclusive-OR operation proposed by Lin and Tsai [19] is reviewed. Originally, this technique was applied to share images. A secret image is exclusive-ORed pixel by pixel with some randomly selected images, all of the same size as the secret image. The selected images and the resulting image, which is meaningless, are regarded as shares and distributed to the secret sharing participants. Because the selected images must be of the same size as the secret image, the technique can be used for texts only when the secret text and the selected texts have the same length. This requirement can be relaxed for pure texts, as will be described in Chapter 3.

2.2.2 Secret Sharing Technique by Cooperative Sharing Operations

The cooperative sharing operation is one of the three main operations of hierarchical secret sharing proposed by Lin [20]. A property of the operation is that the secret can be recovered only when all shares are collected. In other words, secret sharing by cooperative sharing operations is an (n, n)-threshold method. The function of cooperative sharing for two participants is as follows:

$$f(x) = (x - a) + (x - b) + s,$$

where $s$ is the secret, and $a$ and $b$ are randomly selected integers. The pairs $(a, f(a))$ and $(b, f(b))$ are distributed to the two participants as their respective share data of $s$. The secret $s$ can be recovered only when the two participants cooperate in the following way. From the viewpoint of the participant who keeps $(a, f(a))$, $s$ equals $f(a) - a + b$; from the other participant's point of view, $s$ can be obtained by computing $f(b) - b + a$. For more than two participants, the function is generalized as follows:

$$f(x) = \sum_{i=1}^{n} (x - a_i) + s,$$

where $n$ is the number of participants, $s$ is the secret, and all $a_i$ are randomly selected integers, for $i = 1, 2, \ldots, n$. Hence, the $i$th participant keeps share data $(a_i, f(a_i))$, and for each participant, $s$ can be revealed by the following cooperative recovery formula:

$$s = f(a_i) - (n - 1) \times a_i + \sum_{k=1;\, k \neq i}^{n} a_k.$$

An illustrative example is as follows. Let the number of participants $n$ be 3, the secret $s$ be 24, and the randomly selected integers $a_i$ be 129, 3, and 10, for $i = 1, 2, 3$, respectively. The three participants' share data are (129, 269), (3, −109), and (10, −88), respectively. For the participant who keeps the share data (3, −109), according to the cooperative recovery formula, the computed value is $-109 - [(3 - 1) \times 3] + [129 + 10] = 24$, which equals $s$.

While trying to apply the technique to texts, a critical issue is encountered. As seen in the example above, the minimal storage size of a piece of share data is not constant. For share data (129, 269), three bytes will be used; for either of the remaining ones, two bytes will be used. How to limit the size of each

piece of share data by controlling its magnitude is described in Section 2.3.1.

2.2.3 Hierarchical Secret Sharing Technique

Hierarchical secret sharing is a new concept of secret sharing. From a behavioral point of view, a senior person initiates the secret sharing activity among participants. The secret is first shared among several groups formed by the participants. Each piece of share data is then regarded as the group secret of the corresponding group and is shared in turn among the smaller groups formed by the participants of that group, until each participant gets his/her own share data. Moreover, by applying three different sharing operations, namely, the cooperative sharing operation, the independent sharing operation, and the dominant sharing operation, the concept of "hierarchical" secret sharing can be truly realized. To explain the technique of hierarchical secret sharing, the three main sharing operations are described first, followed by a description of how the hierarchical secret sharing technique works.

The cooperative sharing operation has been described in Section 2.2.2 and is skipped here. The concept of the independent sharing operation is that each participant knows the secret. Its corresponding function is as follows:

$$f(x) = \prod_{i=1}^{n} (x - a_i) + s,$$

where $s$ is the secret, $n$ denotes the number of participants, and $a_i$ denotes a randomly selected integer for the $i$th participant. The $i$th participant keeps share data $(a_i, f(a_i))$, where $f(a_i)$ equals $s$. The concept of the dominant sharing operation is that only one "dominant"

participant can know the secret and the others can get the secret only with the permission of the dominant participant. The function of dominant sharing is as follows:

$$f(x) = \prod_{i=1}^{n} (x - a_i) + (x - a_1) + s,$$

where $s$ is the secret, $n$ denotes the number of participants, and $a_i$ denotes a randomly selected integer, for $i = 1$ through $n$. The first participant, who keeps share data $(a_1, f(a_1))$, knows the secret because $f(a_1)$ equals $s$, while the other participants must get the permission of the first participant to know the secret. Note that the $i$th participant, other than the first one, can compute the secret by the formula $s = f(a_i) - a_i + a_1$. That is, if these participants are trustworthy, the first participant can transmit just $a_1$ to the $i$th participant via the Internet, and the $i$th participant can then obtain the secret without worrying about the security of the transmission.

According to the requirement of a group of participants, the group secret is processed by one of the three sharing operations. An illustrative example is presented in Figure 2.1. Suppose that Participants 1 and 2 are two managers of a company, Participant 3 is the president of the company, and Participant 4 is the secretary of the president. Now, assume that a secret of the company is 195. The secret can be known only with the agreement of the president and of one of the two managers. To avoid a failure of secret recovery when the president is absent and the secret must be revealed urgently, the secretary stands by for such a situation. Usually, the president, Participant 3, and one of the two managers, for example, Participant 1, can cooperate to get the secret after three recovery operations.
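The three sharing functions reviewed above can be sketched in a few lines of Python. This is only an illustration over ordinary integers, before any magnitude control is applied; the function names are hypothetical, and the list `a` holds the randomly selected integers with `a[0]` playing the role of $a_1$.

```python
from math import prod


def cooperative(x, a, s):
    """f(x) = sum_i (x - a_i) + s: every participant is needed."""
    return sum(x - ai for ai in a) + s


def independent(x, a, s):
    """f(x) = prod_i (x - a_i) + s: f(a_i) = s for every participant."""
    return prod(x - ai for ai in a) + s


def dominant(x, a, s):
    """f(x) = prod_i (x - a_i) + (x - a_1) + s: only f(a_1) equals s."""
    return prod(x - ai for ai in a) + (x - a[0]) + s


def recover_cooperative(i, a, f_ai):
    """s = f(a_i) - (n - 1) * a_i + sum of the other participants' a_k."""
    n = len(a)
    return f_ai - (n - 1) * a[i] + sum(a[:i] + a[i + 1:])


# The worked example of Section 2.2.2: n = 3, s = 24, a = (129, 3, 10).
a = [129, 3, 10]
shares = [(ai, cooperative(ai, a, 24)) for ai in a]
secret = recover_cooperative(1, a, shares[1][1])
```

With these inputs, `shares` reproduces the three pairs of the worked example and `secret` is the value recovered by the second participant. In the dominant case, a non-dominant participant would compute `f_ai - a_i + a[0]` once the dominant participant has released `a[0]`.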

Figure 2.1 A hierarchical secret sharing example.
Top level (cooperative sharing): secret 195.
Group of Participants 1 and 2 (independent sharing): S12 = (71, 256), group secret s12 = 256710.
Participant 1: S1 = (22, 256710). Participant 2: S2 = (12, 256710).
Group of Participants 3 and 4 (dominant sharing): S34 = (10, 134), group secret s34 = 134100.
*Participant 3: S3 = (17, 134100). Participant 4: S4 = (58, 134141).

Firstly, Participant 1 can recover the composite secret of Participant 1 and Participant 2 independently by simply extracting the second part of his/her pair. Secondly, Participant 3 can recover the composite secret of Participant 3 and Participant 4 alone by directly taking out the second part of his/her own share data. Finally, Participant 1 and Participant 3 can use the secrets just recovered as shares to recover the secret by the cooperative recovery formula. In the situation mentioned above, where the president is absent, the president can send the first part of his/her share data to the secretary, who can then compute their composite secret as well.

As discussed in Section 2.2.2, the data magnitude control problem also exists in the hierarchical secret sharing technique in real-world applications. The method in Section 2.3.1 is proposed to solve this problem.

2.2.4 Data Hiding for Text Documents

The three data hiding techniques reviewed here were proposed by Bender et al. [24]. These techniques hide data in text documents through manipulations of three kinds of spaces: inter-sentence spaces, end-of-line spaces, and inter-word spaces. For instance, one space between two

successive words, sentences, or lines may be regarded as representing a "0", while two spaces represent a "1". Therefore, the article in Figure 2.2, for example, contains the 8-bit data "01011110."

Figure 2.2 An example of hiding data in a text document.

2.3 Proposed Techniques and Contributions

2.3.1 Data Magnitude Control by Modulus Adjustment

In order to apply the cooperative sharing technique and the hierarchical secret sharing technique to text-type documents, the functions of the three operations are modified in this study. As a result, the storage size of a share is exactly twice that of the corresponding secret: a share is stored in two bytes, while the secret is stored in one byte. In the following, the encountered problem is first described and investigated in detail. Then a technique for controlling the magnitude of shares is proposed. Finally, an example is given.

A piece of share data has the form of a pair $(x, f(x))$, where $x$ is randomly selected and the function $f$ is one of the three functions of the three sharing operations. By inspection of these functions, the range of the first part of a pair can be controlled easily, while the value range of $f(x)$ depends on the number of participants, the secret

value, and the first parts of the shares. Moreover, the value range of $f(x)$ grows with the number of participants. Although the second parts of the shares can be controlled by selecting the first parts carefully, so that the distance between every two first parts is as short as possible, doing so defeats the purpose of randomization in secret sharing. Furthermore, the probability of guessing the secret correctly by a brute-force method will increase. How to control the magnitudes of share data without giving up randomization is described in the next paragraph.

In algebra, the set of all possible remainders modulo a prime divisor, together with an addition operator and a multiplication operator, forms a field, which has some good properties: (1) each element of the set has one and only one additive inverse, and each nonzero element has one and only one multiplicative inverse; (2) there is one and only one additive identity element, and one and only one multiplicative identity element. Let the prime be 7, for example. The remainder set of the prime divisor is {0, 1, 2, 3, 4, 5, 6}. Also let the addition and multiplication operators be the ordinary addition and multiplication operators followed by reduction modulo 7, with 0 as the additive identity and 1 as the multiplicative identity. Some examples are listed as follows:

1. $(2 + 5) \bmod 7 = 0$;
2. $(4 \times 2) \bmod 7 = 1$.

In the first example, 5 is called the additive inverse of 2 and vice versa; in the second example, 4 is called the multiplicative inverse of 2 and vice versa. Therefore, the magnitudes of share data can be restricted by modulus adjustment, and the computation for recovering a secret can still be performed correctly according to these properties of the field. The functions of the three sharing operations are revised

accordingly in the following:

1. Revised Cooperative Operation: $f(x) = \left( \sum_{i=1}^{n} (x - a_i) + s \right) \bmod p$,
2. Revised Independent Operation: $f(x) = \left( \prod_{i=1}^{n} (x - a_i) + s \right) \bmod p$,
3. Revised Dominant Operation: $f(x) = \left( \prod_{i=1}^{n} (x - a_i) + (x - a_1) + s \right) \bmod p$,

where $s$ is the secret, $n$ is the number of participants, $p$ is a prime number, and $a_i$ is a randomly selected integer from the remainder set of the divisor $p$. In the next paragraph, the proposed hierarchical secret sharing technique for texts will be described in detail.

Because a text is a byte-based file, the integer 257 is chosen as the divisor $p$ in this study. The value of the second part, $f(x)$, of a piece of share data now ranges from 0 to 256 after the modulo $p$ operation, while the value of a byte ranges from 0 to 255. By dealing with the special case of value 256 in the second part of share data, a two-byte space can store all possible values of a pair. In cooperative sharing, because the function is linear in the set of selected random integers, two pairs with identical second parts must also have identical first parts. Therefore, a constraint for the special case is specified as follows:

$$a_1 + \cdots + a_n = [(n - 1) \times 256 + s] \bmod 257,$$

where $s$ is the one-byte secret, $n$ is the number of participants of $s$, and $a_1, \ldots, a_n$ are the first parts of the $n$ pairs, which are different from each other and range from 0 to 255. Under this constraint, the function value of $a_i$, that is, the second part of a pair, is limited to the interval from 0 to 255.
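The revised cooperative operation and its constraint can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the implementation used in this study: the constraint on $a_1 + \cdots + a_n$ is interpreted modulo 257, the last first part is derived from the others and redrawn until it is a distinct byte value, and the names `f_coop`, `share_byte`, and `recover_byte` are hypothetical.

```python
import random

P = 257  # prime divisor chosen in the text for byte-based data


def f_coop(x, a, s):
    """Revised cooperative operation: (sum_i (x - a_i) + s) mod p."""
    return (sum(x - ai for ai in a) + s) % P


def share_byte(s, n, rng=random):
    """Share one byte s among n participants (2 <= n < 257).

    n - 1 first parts are drawn at random; the last one is fixed by the
    constraint sum(a) = ((n - 1) * 256 + s) mod 257, and the draw is
    repeated until it is a byte value distinct from the others.
    """
    while True:
        a = rng.sample(range(256), n - 1)
        last = ((n - 1) * 256 + s - sum(a)) % P
        if last < 256 and last not in a:
            a.append(last)
            return [(ai, f_coop(ai, a, s)) for ai in a]


def recover_byte(i, shares):
    """Cooperative recovery formula, evaluated modulo 257."""
    n = len(shares)
    ai, f_ai = shares[i]
    others = sum(ak for k, (ak, _) in enumerate(shares) if k != i)
    return (f_ai - (n - 1) * ai + others) % P
```

Under the constraint every second part stays within 0 to 255, so each pair fits into two bytes, and any participant who learns the other first parts can recover the byte.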

In dominant sharing, by inspection, the value of the corresponding function at each $a_i$ is linear in the randomly selected integers $\{a_1, a_2, \ldots, a_n\}$. Moreover, changing the value of $a_1$ influences the function values of all $a_i$, while modifying the value of another randomly selected integer, say $a_i$, only changes the function value of $a_i$. Therefore, after $a_1$ is specified, the other integers $a_2, \ldots, a_n$ can be randomly selected under the condition that $a_i$ must not be $(256 - s + a_1) \bmod 257$, so that the function value does not become 256. As for independent sharing, the special case cannot appear unless the secret value is 256, and a one-byte secret can never be 256.

Using the example in Section 2.2.3 again, Figure 2.3 shows how the revised hierarchical secret sharing technique works and how it differs from the original one. In this example, the secret is first shared between two groups by cooperative sharing into two pieces of share data: (100, 201) and (94, 189). In turn, independent sharing is applied byte by byte to $s_{12}$, which is the composite secret of Participant 1 and Participant 2 and is regarded as a two-byte secret. Next, the two-byte composite secret $s_{34}$ of Participant 3 and Participant 4 is processed byte by byte by dominant sharing. Finally, each participant gets a four-byte piece of share data. Therefore, the hierarchical secret sharing technique works for texts, and the recovery processes of the three sharing operations also run correctly.

2.3.2 Steganographic Technique for Share Data by Use of Simple Sentences

In this section, a technique of translating a piece of share data, which is almost

always meaningless, into several meaningful simple sentences is proposed. Firstly, a steganography procedure is shown. Secondly, the technique is described in detail. Finally, an illustrative example is described.

Figure 2.3 A hierarchical sharing example after applying data magnitude control.
Top level (cooperative sharing): secret [195].
Group of Participants 1 and 2 (independent sharing): S12 = (100, 201), group secret s12 = [100, 201].
Participant 1: S1 = (22, 100), (6, 201). Participant 2: S2 = (12, 100), (2, 201).
Group of Participants 3 and 4 (dominant sharing): S34 = (94, 189), group secret s34 = [94, 189].
Participant 3: S3 = (6, 94), (20, 189). Participant 4: S4 = (19, 107), (10, 179).

Figure 2.4 Flowchart of share data translation process. Share data S is fed into the sentence translation machine, which uses the name database, the verb database, and the sentence pattern database (pattern A) to output sentences such as "Tim likes May. Woods hits John."

In Figure 2.4, share data is first fed into a simple sentence translation machine designed in this study. The machine divides the share data into several parts and encodes each part into a sentence according to its sentence pattern and its corresponding databases by an indexing technique. When the share data is recovered, the sentences are decoded by inverse indexing.

The technique uses one sentence pattern, the subject-verb-object pattern, and two corresponding databases, a name database and a verb database. Each database

contains 257 words, where the first 256 words are used for encoding and the last one is used as the end-of-share-data word. The translation procedure for encoding text share data is described in detail as follows.

Algorithm 2.1 Translation from share data to simple sentences.
Input: share data S.
Output: an encoded string Se.
Steps.
1. Pad S with the "super" byte of the value 257, until the length of S is divisible by 3.
2. Divide S into three-byte strings.
3. For each string, replace the first byte and the third byte with names in the name database and the second byte with a verb in the verb database, all by indexing.
4. Append an end-of-sentence symbol to each word that encodes a third byte.
5. Append a space to each word.
6. Concatenate all sentences into a string as Se.

The corresponding recovery procedure is as follows.

Algorithm 2.2 Translation from simple sentences to share data.
Input: a string of sentences S'.
Output: decoded share data Sd.
Steps.
1. Divide S' into an ordered list of sentences.
2. For each sentence, decode the first and the third words by inverse indexing using the name database and the second word by inverse indexing using

the verb database.
3. Concatenate these bytes according to the order of the sentences into a decoded string.
4. Remove the "super" bytes at the tail of the decoded string to get Sd.

Now, an example is given. For convenience, let share data S be "HELLO." First, S is divided into "HEL" and "LOθ", where θ denotes the "super" byte. After encoding by indexing, they become sentences such as "Tim likes May." and "Woods hits John." The name "John" is the end-of-share-data word. In the end, the encoded string Se is "Tim likes May. Woods hits John." In the decoding stage, Se is first divided into two single sentences, "Tim likes May." and "Woods hits John.", and then, by inverse indexing and concatenation, the string "HELLOθ" is acquired. Finally, the decoded share data Sd, "HELLO", is obtained by eliminating the "super" byte.

2.3.3 Steganographic Technique for Text Components in HTML Documents

A steganographic technique for text components in HTML documents is proposed. According to a property of HTML, text components that can be seen on browsers can be substituted by fake text components, and the data of the original text components can be translated and hidden in the substitutes without arousing any awareness of the hidden data. First, the behavior of text contents on a browser is described. Then, the proposed steganographic technique is described.

The text components in an HTML document are the texts outside the tags and can be displayed on a browser. Only one space symbol between two successive words is displayed on a browser even when, actually, a sequence of tab symbols, new-line

symbols, and space symbols is packed into the position between the two consecutive words in an HTML document. An additional symbol of ANSI code 0x0C behaves like these three symbols in the Internet Explorer (IE) browser. For other browsers, some different symbols have the same property and can be used like the three symbols. These symbols can be put to use in the proposed steganographic technique. In this study, the Internet Explorer browser is used as the HTML browser.

Let a space symbol denote a "0," a tab symbol denote a "1," a new-line symbol denote a "2," and the symbol with ASCII code 0x0C denote a "3." Also let $L(x)$ be the length of text $x$ and $N(x)$ be the number of inter-word, inter-line, and inter-sentence spaces in text $x$. The proposed procedure for creating steganographic effects on the important parts of an HTML document, yielding a fake HTML document, is as follows.

Algorithm 2.3 Steganographic effect creation for text components.
Input: an HTML document H and an English article A.
Output: an HTML document Hs.
Steps.
1. Extract every text content segment $S_i$ between two successive tags in H.
2. For each text content segment $S_i$, perform the following steps.
(i) Cut an appropriate number of sentences from the front of the remainder of A as $C_i$ such that the difference between $L(C_i)$ and $L(S_i)$ is minimal.
(ii) Translate $S_i$ into a string $E_i$ made up of tab, new-line, and space symbols and the symbol of ANSI code 0x0C.
3. For each inter-word, inter-line, or inter-sentence space in $C_i$, embed a space symbol and $\lceil N(C_i) / N(E_i) \rceil$ symbols of $E_i$ into $C_i$ to generate $D_i$.
4. Assign $D_i$ to $S_i$.
5. Put back each extracted and processed segment $S_i$ to form the result Hs.
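The embedding of one text segment, together with its inverse, can be sketched in Python as follows. The sketch rests on several assumptions that the algorithm does not fix: each byte of the hidden segment is written as four base-4 digits, most significant first, using the digit-to-symbol assignment given above; the symbols are spread over the gaps of the cover text in chunks of ceil(len(E) / number of gaps); and the cover text has at least one gap. The names `to_symbols`, `from_symbols`, `embed`, and `extract` are hypothetical.

```python
import math
import re

SYMBOLS = " \t\n\x0c"               # digits 0, 1, 2, 3 as assigned in the text
GAP = re.compile(r"[ \t\n\x0c]+")   # any run of the four symbols


def to_symbols(data: bytes) -> str:
    """Assumed translation: each byte becomes four base-4 digits."""
    return "".join(SYMBOLS[(b >> sh) & 3] for b in data for sh in (6, 4, 2, 0))


def from_symbols(sym: str) -> bytes:
    d = [SYMBOLS.index(c) for c in sym]
    return bytes((d[i] << 6) | (d[i + 1] << 4) | (d[i + 2] << 2) | d[i + 3]
                 for i in range(0, len(d), 4))


def embed(cover: str, secret: bytes) -> str:
    """Hide secret in the gaps of cover: a leading space, then a chunk of E."""
    words = GAP.split(cover.strip())
    gaps = len(words) - 1
    if gaps < 1:
        raise ValueError("cover text needs at least two words")
    e = to_symbols(secret)
    k = math.ceil(len(e) / gaps)    # symbols per gap (assumption of this sketch)
    out = [words[0]]
    for j, word in enumerate(words[1:]):
        out.append(" " + e[j * k:(j + 1) * k] + word)
    return "".join(out)


def extract(stego: str) -> bytes:
    """Drop the leading symbol of every gap and decode the rest."""
    return from_symbols("".join(run[1:] for run in GAP.findall(stego)))
```

Since a browser collapses each run of these symbols into a single displayed space, the stego segment shows the same words as the cover text.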

The corresponding recovery procedure is described in the following.

Algorithm 2.4 Recovery process of Algorithm 2.3.
Input: an HTML document H'.
Output: a recovered HTML document Hr.
Steps.
1. Extract every text content segment $S_i$ between two successive tags in H'.
2. Create an empty string S.
3. For each text content segment $S_i$, perform the following steps.
(i) Clean up S.
(ii) Extract all inter-word, inter-line, and inter-sentence sequences $L_j$ of space symbols, new-line symbols, symbols of ANSI code 0x0C, and tab symbols, excluding the leading symbol of each sequence.
4. For each sequence $L_j$, append $L_j$ to the rear of S.
5. Assign S to $S_i$.
6. Replace the original segments in H' with the extracted and processed segments $S_i$ to form the result Hr.

Although this technique is described for the text components of an HTML document, it can also be applied to the share data of the text components without any modification.

2.3.4 Steganographic Techniques for Non-text Components in HTML Documents

A steganographic technique for non-text components in HTML documents is proposed. Non-text contents that can be displayed on browsers are of the form of links

in the tags of an HTML document. According to the concept of "dynamic" links, the link of a non-text content can be replaced with a fake link and become part of the fake link. In the following, how a dynamic link works on the Internet is introduced first, how to replace a link with a working dynamic one that contains the original link is proposed next, and, in the end, an illustrative example is given.

A "static" link is a link which directly indicates the address of the corresponding source on the Internet. A "dynamic" link, in contrast, is a link which contains the address of an agent in a server on the Internet, followed by information for the agent. A dynamic link, for example, is of the form "http://www.cis.nctu.edu.tw/~gis91568/agent?123456", where the string before the question mark is the address of the agent and the string following the question mark is the information for the agent. After a browser gets the address of the agent from the link and passes the information in the link to the agent, the agent returns the corresponding source according to the information. For example, a dynamic image link containing the address of an image database agent and an image index in the database lets a browser know where the image database agent is on the Internet and which image should be retrieved from the database according to the image index. Finally, the agent returns the image of that index in the image database.

Suppose that a multimedia database MD and an agent A of MD are created and that the address of the agent on the Internet is ADD. The procedure of the proposed steganographic technique is described as an algorithm as follows.

Algorithm 2.5 Steganographic effect creation for non-text components.
Input: a secret link L, ADD.
Output: a dynamic link DL.

Steps.
1. Translate L into a string S in hexadecimal form byte by byte.
2. Create a dynamic link DL.
3. Set DL = ADD?S, where '?' is the special symbol for separating the address ADD and its information S.

The string S is used as an index in the above algorithm. A dynamic link created in this way can be understood by browsers and works well on the Internet. For instance, an image link "http://tw.yahoo.com/a.jpg" is first translated into "687474703A2F2F74772E7961686F6F2E636F6D2F612E6A7067" in hexadecimal, and the corresponding dynamic link is "http://www.nctu.edu.tw/agent?687474703A2F2F74772E7961686F6F2E636F6D2F612E6A7067", where the address of the image database agent is "http://www.nctu.edu.tw/agent". The corresponding recovery procedure is as follows.

Algorithm 2.6 Recovery process of Algorithm 2.5.
Input: a dynamic link DL.
Output: a secret link L.
Steps.
1. Extract a string S from the parameter part of DL.
2. Translate S back to a string TS of symbols.
3. Set L = TS.

This technique can also be applied directly to the share data of non-text components in a secret HTML document.
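Algorithms 2.5 and 2.6 amount to a hexadecimal encoding and its inverse, which can be sketched in Python as follows. The sketch assumes that the secret link contains only ASCII characters and that the agent address itself contains no question mark; the function names are hypothetical.

```python
def to_dynamic_link(secret_link: str, agent_address: str) -> str:
    """Algorithm 2.5: hex-encode the secret link and append it after '?'."""
    s = secret_link.encode("ascii").hex().upper()
    return f"{agent_address}?{s}"


def from_dynamic_link(dynamic_link: str) -> str:
    """Algorithm 2.6: take the part after the first '?' and decode it."""
    _, _, s = dynamic_link.partition("?")
    return bytes.fromhex(s).decode("ascii")


# Round trip with the link and agent address used in the example above.
dl = to_dynamic_link("http://tw.yahoo.com/a.jpg", "http://www.nctu.edu.tw/agent")
recovered = from_dynamic_link(dl)
```

Each byte of the secret link becomes two hexadecimal characters, so the parameter part of the dynamic link is exactly twice as long as the link it hides.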

2.3.5 Steganographic and Authentication Techniques for Header Components in E-Mails

A steganographic technique and an authentication technique for header components in e-mails are proposed. By utilizing certain properties of the e-mail standard format, an e-mail header can be hidden in another e-mail header generated in this study, together with authentication signals, to achieve steganography and authentication effects. In the following paragraphs, some properties of e-mail headers are first introduced, and the steganographic technique and the authentication technique are then proposed in turn.

A header in an e-mail contains many kinds of information, such as the receivers' e-mail addresses, the sender's e-mail address, the e-mail subject, the sending time, etc. According to the e-mail standard format, every item of information in a header is expressed as a pair of a title and a value. Some titles are constant and must be in a header, while other titles can be defined by an e-mail programmer with a leading string "X-". For example, the title "From" is constant and necessary for identifying the source of an e-mail in an e-mail header, and "X-SMART" is a programmer-defined title. In addition, in order to transmit e-mails correctly on the Internet, the values of some titles, such as "Subject", may be encoded by one of two coding methods, the base-64 and the quoted-printable methods, which are described in the e-mail format standard.

The values of constant titles can be seen in e-mail software. For a secret e-mail, the values of the constant titles are important. The proposed procedure to embed these important values into a pseudo header to create steganographic effects is described as an algorithm as follows.

Algorithm 2.7 Steganographic effect creation for e-mail headers.

Input: an e-mail header H.
Output: a pseudo e-mail header H'.
Steps.
1. Let the titles "Subject", "From", "To", "Date", "Cc" and "Bcc" be denoted as $T_1$, $T_2$, $T_3$, $T_4$, and $T_5$, respectively.
2. For $i$ = 1 to 5, extract all the values $V_{i,1}, \ldots, V_{i,t_i}$ of $T_i$ in H, where $t_i$ denotes the number of the values of $T_i$.
3. For each value $V_{i,j}$, encode $V_{i,j}$ by the base-64 encoding method into $v_{i,j}$.
4. Create a string S of the following form:
$$t_1 \mid v_{1,1} \mid \cdots \mid v_{1,t_1} \mid t_2 \mid v_{2,1} \mid \cdots \mid v_{2,t_2} \mid \cdots \mid t_5 \mid v_{5,1} \mid \cdots \mid v_{5,t_5}$$
5. Create an attribute A with title "X-scanInfo" and value S.
6. For $i$ = 1 to 5, randomly select a pseudo value $v'_{i,1}$ of $T_i$ from the corresponding database.
7. Create an empty pseudo e-mail header H'.
8. For $i$ = 1 to 5, add an attribute of $T_i$ into H'.
9. Add attribute A with title "X-scanInfo" and value S into H'.

The corresponding recovery procedure is as follows.

Algorithm 2.8 Recovery process of Algorithm 2.7.
Input: an e-mail header H'.
Output: a recovered e-mail header Hr.
Steps.
1. Extract the value S of the attribute of title "X-scanInfo."
2. Separate S into $v_{1,1}, \ldots, v_{1,t_1};\ v_{2,1}, \ldots, v_{2,t_2};\ \ldots;\ v_{5,1}, \ldots, v_{5,t_5}$.
3. For each $v_{i,j}$, decode $v_{i,j}$ into $V_{i,j}$ by a base-64 decoding method

4. Create Hr.
5. For $i$ = 1 to 5, add to Hr an attribute with title $T_i$ and values $V_{i,1}, \ldots, V_{i,t_i}$.

This steganographic effect creation technique will be applied later to the share data of the important parts of an e-mail header without modification.

In order to embed the authentication signals of the important parts of an e-mail header into the pseudo header, in which the important values are embedded, one property of e-mail headers is utilized. The property is that a string of tab and space symbols concatenated at the rear of a header line does not influence the interpretation of the attributes of an e-mail header, except for the attribute of title "Subject". Therefore, by fixing in advance the number of tab and space symbols appended to each line, the total number of authentication signal bits is determined, and the authentication signals can then be generated, divided, and concatenated at the rear of the lines.

Let H be an e-mail header with important information S inside, Ha be the e-mail header after the authentication signals of S are embedded, a tab symbol denote '1', and a space symbol denote '0'. Also let AUTHEN_GEN(x, l, k) denote the l-bit string of authentication signals generated from a string x with a key k. In the following, l also denotes the number of bits concatenated at the rear of each line in H. The proposed authentication signal embedding procedure is as follows.

Algorithm 2.9 Authentication signal embedding process.
Input: H, K.
Output: Ha.
Steps.
1. Extract the value S of the attribute of title "X-scanInfo" from H.
2. Locate all lines $L_1, \ldots, L_m$ in H which are not used to express the attribute of title "Subject", where m is the number of such lines in H.

(41) 3. Compute the authentication signal string AS = AUTHEN_GEN(S, l × m, K).
4. Divide AS into m l-bit strings $AS_1, AS_2, \dots, AS_m$.
5. Set $H_a$ = H.
6. For i = 1 to m, do the following three steps.
   (i) Translate $AS_i$ into a string $TAS_i$ made up of space and tab symbols.
   (ii) Append $TAS_i$ to the end of $L_i$ to form $TASL_i$.
   (iii) Replace $L_i$ in $H_a$ with $TASL_i$.

Let $H'$ be an e-mail header and $K'$ be a key for authentication. The corresponding authenticating procedure is as follows.

Algorithm 2.10 Authenticating process.
Input: $H'$, $K'$.
Output: a Boolean value b, which denotes the authentication result of $H'$.
Steps.
1. Extract the value S of the attribute with title “X-scanInfo” from $H'$.
2. Locate all lines $L_1, \dots, L_n$ in $H'$ that do not express the attribute with title “Subject”, where n is the number of such lines in $H'$.
3. Set CAS = AUTHEN_GEN(S, l × n, $K'$).
4. Set the extracted authentication signal string EAS to be empty.
5. For i = 1 to n, do the following three steps.
   (i) Extract all tab and space symbols at the end of $L_i$ to form a string $TEAS_i$.
   (ii) Translate $TEAS_i$ back into a binary string $EAS_i$.
   (iii) Append $EAS_i$ to the end of EAS.
6. If EAS equals CAS, set b to TRUE; otherwise, set b to FALSE.

The procedure adopted for authentication signal generation will be

(42) described in the following section.

2.1.10 Authentication Technique for Verifying Share Data

An authentication technique for verifying share data is proposed in this section. A way to generate authentication signals is proposed first, and an authentication procedure for verifying share data is then described.

Let S be a piece of share data, H be the size in bits of the space available for hiding authentication signals, and K be the corresponding key. Let R(s, l) be a random number of size l generated from a seed s, L(x) denote the size of data x in bits, and y(i) denote the ith bit of a data string y. The authentication signals of S can be generated by the following procedure.

Algorithm 2.11 Authentication signal generation.
Input: S, H, and K.
Output: a string of authentication signals AS.
Steps.
1. Create AS of size H.
2. Set AS to R(K, H).
3. For i = 1 to L(S), set $AS([(i - 1) \bmod H] + 1) = AS([i \bmod H] + 1) \oplus S(i)$, where $\oplus$ denotes the exclusive-OR operator.

Because only exclusive-OR operations are used, authentication signals can be generated quickly by the above procedure. The authentication capability creation procedure is shown in Figure 2.5.
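The steps of Algorithm 2.11 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the names `authen_gen` and `to_bits` are hypothetical, Python's seeded pseudo-random generator stands in for R(K, H), whose construction is not specified here, and bits are taken most significant bit first. The index arithmetic follows step 3 exactly as stated, shifted from 1-based to 0-based positions.

```python
import random


def to_bits(data):
    """Expand a bytes object into a list of bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def authen_gen(share_bits, size, key):
    """Return `size` authentication bits for `share_bits` (sketch of Algorithm 2.11)."""
    # Steps 1-2: AS = R(K, H). A seeded PRNG is an assumed stand-in for R.
    rng = random.Random(key)
    signals = [rng.getrandbits(1) for _ in range(size)]
    # Step 3: AS([(i-1) mod H] + 1) = AS([i mod H] + 1) XOR S(i), for i = 1..L(S).
    # The 1-based positions of the text become 0-based list indices here.
    for i in range(1, len(share_bits) + 1):
        signals[(i - 1) % size] = signals[i % size] ^ share_bits[i - 1]
    return signals
```

For example, `authen_gen(to_bits(b"share data"), 16, 42)` yields a 16-bit signal list that is reproducible for the same key and that changes when a bit of the share data is flipped.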

(43) The authentication signals of the share data are generated by the authentication signal generation procedure described above and then embedded into the host to produce the result. The authenticating procedure is shown in Figure 2.6. To authenticate a text, the share data and the authentication signals are first extracted from the text. Then, a second set of authentication signals is generated by applying the authentication signal generation procedure to the extracted share data. By comparing the generated authentication signals with the extracted ones, the fidelity of the text can be authenticated.

These two procedures do not specify how data are embedded and extracted. The pair of procedures is therefore regarded as a template authentication technique: various kinds of data hiding techniques can be plugged in as the embedding and extraction processes.

[Figure 2.5 Flowchart of creating authentication capability. Inputs: share data, key, host. Stages: generating authentication signals, embedding authentication signals. Output: result.]

(44) [Figure 2.6 Flowchart of authenticating process. Inputs: text, key. Stages: extracting share data, generating authentication signals from the extracted share data, extracting authentication signals, comparing the generated and extracted authentication signals. Output: authentication result (true or false).]
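One possible instantiation of this template plugs in the trailing tab and space embedding of Algorithms 2.9 and 2.10. The sketch below is illustrative only: the function names, the value `BITS_PER_LINE` (the parameter l), and the representation of a header as a list of lines without trailing whitespace are assumptions, and `stand_in_signals` uses a keyed hash from the Python standard library as a stand-in for AUTHEN_GEN rather than the generator of Algorithm 2.11.

```python
import hashlib
import hmac

BITS_PER_LINE = 8  # l: assumed number of signal bits appended to each line


def stand_in_signals(data, nbits, key):
    """Stand-in for AUTHEN_GEN(x, l, k): the first nbits (at most 256) of a keyed hash."""
    digest = hmac.new(key, data, hashlib.sha256).digest()
    return "".join(f"{byte:08b}" for byte in digest)[:nbits]


def carrier_indices(header):
    """Indices of the lines that do not express the Subject attribute."""
    return [i for i, line in enumerate(header) if not line.startswith("Subject:")]


def scan_info(header):
    """Value S of the X-scanInfo attribute, ignoring surrounding whitespace."""
    for line in header:
        if line.startswith("X-scanInfo:"):
            return line[len("X-scanInfo:"):].strip()
    raise ValueError("no X-scanInfo attribute found")


def embed(header, key):
    """Algorithm 2.9 sketch: append tab ('1') and space ('0') runs to carrier lines."""
    idx = carrier_indices(header)
    signals = stand_in_signals(scan_info(header).encode(), BITS_PER_LINE * len(idx), key)
    result = list(header)
    for n, i in enumerate(idx):
        chunk = signals[n * BITS_PER_LINE:(n + 1) * BITS_PER_LINE]
        result[i] = header[i] + chunk.replace("1", "\t").replace("0", " ")
    return result


def authenticate(header, key):
    """Algorithm 2.10 sketch: regenerate the signals and compare with the embedded ones."""
    idx = carrier_indices(header)
    expected = stand_in_signals(scan_info(header).encode(), BITS_PER_LINE * len(idx), key)
    extracted = ""
    for i in idx:
        line = header[i]
        tail = line[len(line.rstrip(" \t")):]  # trailing tab and space symbols
        extracted += tail.replace("\t", "1").replace(" ", "0")
    return extracted == expected
```

Because the embedding and extraction functions are separate from the signal generator, a different data hiding technique could be substituted without changing the comparison step.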

(45) Chapter 3 Secret Sharing Method with Steganographic Effects and Authentication Capability for Pure Text Documents

Introduction

In this chapter, a secret sharing method for pure text documents with steganographic effects and authentication capability is proposed. The remainder of this section introduces some properties of pure text documents. An overview of the method is given in Section 3.2, and the details of the method are described in Section 3.3. Some experimental results are shown in Section 3.4. Finally, some discussions and a summary of this chapter are given in Section 3.5.

For pure text documents in English, the hexadecimal values of meaningful ASCII symbols range from 0x20 to 0x7E. However, for pure text documents in character-based languages, such as Chinese, Japanese, and Korean, the bytes of a document may take any hexadecimal value from 0x00 to 0xFF. Therefore, in general, the value of a symbol in a pure text document ranges from 0x00 to 0xFF, and this is the range that secret sharing must handle. In addition, because every symbol in a pure text document is visible, one can identify every symbol by inspection. This is also the reason why the capacity of the

(46) redundant information of pure text documents for hiding data is much smaller than that of images or videos.

Overview of Proposed Method

This section focuses on the processes of sharing and recovery. In Section 3.2.1, a secret pure text sharing process is shown, and in Section 3.2.2, the corresponding secret recovery process is presented.

3.1.1 Secret Pure Text Sharing Process

[Figure 3.1 Flowchart of secret pure text sharing process. Stego-texts $T_1, \dots, T_{n-1}$ are taken from the text database; non-space symbol extracting yields share data $D_1, \dots, D_{n-1}$; exclusive-OR sharing with the secret text S yields share data $D_n$; simple sentence translation yields stego-text $T_n$.]

Figure 3.1 shows the process of sharing the secret text S among participants and creating steganographic effects on the last piece of share data $D_n$. Suppose that the number of participants is n and that each participant $P_i$ holds a key $K_i$. First, n − 1 stego-texts are selected from the text database. Second, S is exclusive-ORed with the n − 1 stego-texts to generate share data $D_n$, which is a sequence of meaningless symbols. Share data $D_1, D_2, \dots, D_{n-1}$ are the results of extracting the non-space symbols from $T_1, T_2, \dots, T_{n-1}$, respectively. Third, $D_n$ is translated into

(47) simple sentences to form a stego-text $T_n$, which is meaningful, while $T_1, T_2, \dots, T_{n-1}$ are themselves the n − 1 stego-texts selected from the text database.

[Figure 3.2 Flowchart of process of authentication signal embedding. Inputs: share data $D_i$, key $K_i$, stego-text $T_i$. Stages: generating authentication signals $A_i$, embedding authentication signals. Output: share $AT_i$.]

Finally, as shown in Figure 3.2, the authentication signals of share data $D_i$ are generated with the corresponding key $K_i$ and then embedded into stego-text $T_i$ to form share $AT_i$, for i = 1 through n. In this way, n shares with authentication capability are created, one for each of the n participants.

3.1.2 Secret Pure Text Recovery Process

The secret pure text recovery process consists of two stages. The first stage is the authentication process, which verifies the integrity of the given shares. The second stage is the recovery process, which recovers the secret pure text from the given shares. The authentication process for shares is shown in Figure 3.3. Suppose that n shares with n corresponding keys are used for recovering the secret pure text. Because the share data extraction process for the last share differs from that for the other shares, one authentication process is used for the last share and another is used for the other shares. If all of the n shares are authenticated successfully, the second stage begins. If not, an authentication failure report about the texts is issued.
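The exclusive-OR sharing step of Section 3.1.1 and the two-stage recovery described above can be sketched as follows. This is a minimal sketch under assumptions that the text does not state: all function names are hypothetical, each stego-text is assumed to be ASCII and to supply at least as many non-space symbols as the secret has bytes, and the share data are truncated to the length of the secret. The translation of $D_n$ into simple sentences and the authentication itself are not modeled; the first stage is represented only by a list of Boolean results.

```python
from functools import reduce


def non_space_symbols(text, length):
    """Share data D_i: the first `length` non-space bytes of a stego-text."""
    data = bytes(b for b in text.encode("ascii") if b != 0x20)
    if len(data) < length:
        raise ValueError("stego-text is too short for this secret")
    return data[:length]


def xor_all(blocks):
    """Byte-wise exclusive-OR of equally long byte strings."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def share(secret, stego_texts):
    """Return D_1..D_n, where D_n = S xor D_1 xor ... xor D_(n-1)."""
    cover = [non_space_symbols(t, len(secret)) for t in stego_texts]
    return cover + [xor_all([secret] + cover)]


def recover(shares, authenticated):
    """Second stage: runs only if every share passed the first (authentication) stage."""
    if not all(authenticated):
        raise ValueError("authentication failed for at least one share")
    return xor_all(shares)
```

Since exclusive-OR is its own inverse, combining all n pieces of share data cancels $D_1, \dots, D_{n-1}$ and leaves S. The secret is handled as raw bytes, so the full range 0x00 to 0xFF is covered.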