
National Chiao Tung University

Institute of Multimedia Engineering

Master's Thesis

A Study on New Data Hiding Techniques Using Special Character Codes and Their Applications on the Internet

Student: Yi-An Wang (王以安)

Advisor: Prof. Wen-Hsiang Tsai (蔡文祥)

A Thesis

Submitted to Institute of Multimedia Engineering College of Computer Science

National Chiao Tung University in partial Fulfillment of the Requirements

for the Degree of Master

in

Computer Science

June 2011

Hsinchu, Taiwan, Republic of China

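ABSTRACT (Chinese version, translated)

With the development of computer and network technologies, more and more people communicate over networks, so it is necessary to protect the security of messages communicated on the Internet. Accordingly, this thesis proposes four data hiding methods for covert communication or article authentication on three popular Internet applications: the blog, the bulletin board system (BBS), and email.

For the blog, this study uses authentication signals composed of invisible special ASCII (American Standard Code for Information Interchange) control codes to identify whether a blog article has been altered, achieving article authentication. For the BBS, this study proposes two data hiding methods, which compose the secret information from invisible Big-5 character codes and from special Big-5 space codes, respectively; with the BBS as the medium, both methods can be used for covert communication as well as article authentication. Finally, for email, this study proposes a data hiding technique using special UTF-8 space codes, which can be used to authenticate emails and to detect any malicious tampering with a protected email.

All the applications above exploit the invisibility of these special character codes on the respective application platforms to hide secret information without being noticed. The secret message, composed of special character codes, is embedded into the original article to achieve data hiding, and the hidden secret information can be recovered correctly afterwards.

Finally, this thesis also provides experimental results to demonstrate the feasibility of the proposed methods.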


A Study on New Data Hiding Techniques Using

Special Character Codes and Their Applications on

the Internet

Student: Yi-An Wang

Advisor: Wen-Hsiang Tsai

Institute of Multimedia Engineering, College of Computer Science

National Chiao Tung University

ABSTRACT

With the progress of computer and networking technologies, communication via the Internet has become increasingly popular, and protecting the security of messages communicated on the Internet has become a necessity. In this study, four data hiding methods for covert communication or article authentication are proposed for use on three Internet applications, namely, the blog, the BBS, and email.

For the blog, a new article authentication method is proposed, which uses invisible ASCII control codes to construct authentication signals for verifying whether a blog article has been tampered with. For the BBS, two data hiding techniques are proposed: one uses invisible Big-5 codes, and the other uses special Big-5 space codes, for embedding message data imperceptibly. Each of the two techniques may be used to accomplish covert communication as well as BBS article authentication. Finally, for email, a data hiding technique using special UTF-8 space codes is proposed for webmail authentication, so that malicious tampering with a protected email can be detected.

In all the applications mentioned above, the invisibility of the proposed special codes on appropriate platforms of Internet applications is utilized to achieve the aim of hiding data imperceptibly. A stego-article is generated by embedding the secret message information, composed of the adopted special codes, into the ends of the text lines of a cover article. The hidden secret information can be recovered correctly later.

Experimental results showing the feasibility of the proposed methods are also included.


ACKNOWLEDGEMENTS

The author is in hearty appreciation of the continuous guidance, discussions, and support from his advisor, Dr. Wen-Hsiang Tsai, not only in the development of this thesis, but also in every aspect of his personal growth.

Appreciation is also given to the colleagues of the Computer Vision Laboratory in the Institute of Computer Science and Engineering at National Chiao Tung University for their suggestions and help during his thesis study.

Finally, the author also extends his profound thanks to his dear family for their lasting love, care, and encouragement.

CONTENTS

ABSTRACT (in Chinese)
ABSTRACT (in English)
ACKNOWLEDGEMENTS
CONTENTS
LIST OF FIGURES
LIST OF TABLES

Chapter 1 Introduction
1.1 Motivation and Background
1.1.1 Motivation of Study
1.1.2 Introduction to Used Media
1.2 Overview of Related Works
1.3 Overview of Proposed Methods
1.3.1 Definitions of Terms
1.3.2 Brief Description of Proposed Methods
1.4 Contributions
1.5 Thesis Organization

Chapter 2 Review of Related Works and Character Coding Formats
2.1 Previous Studies on Data Hiding Techniques Using Special Character Codes
2.1.1 Review of Data Hiding Techniques via Text Documents
2.1.2 Review of Data Hiding Techniques for Internet Applications
2.1.3 Review of Other Techniques and Summary
2.2 Review of Related Character Coding Formats
2.2.1 Review of ASCII Format
2.2.2 Review of Big-5 Format
2.2.3 Review of UTF-8 Format

Chapter 3 Authentication of Blog Articles by Invisible ASCII Control Codes
3.1 Introduction and Problem Definition
3.2 Major Idea of Proposed Method by Use of Invisible ASCII Control Codes
3.2.1 Use of Special Character Codes
3.2.2 Necessity of Distributing Embedded Codes Evenly at Line Ends to Reduce Suspicion
3.2.3 Construction of End Signals
3.3 Authentication Signal Generation and Embedding Process
3.4 Authentication Signal Extraction and Blog Verification Process
3.5 Experimental Results
3.6 Summary

Chapter 4 Covert Communication via the BBS Using Special Big-5 Codes
4.1 Introduction and Problem Definition
4.2 Major Ideas of Proposed Methods by Use of Special Big-5 Codes
4.2.1 Data Hiding by Invisible Big-5 Codes
4.2.2 Data Hiding by Special Big-5 Space Codes
4.3 Proposed Algorithm for Data Embedding
4.4 Proposed Algorithm for Data Extraction
4.5 Experimental Results
4.6 Summary

Chapter 5 BBS Article Authentication by Special Big-5 Codes
5.1 Introduction and Problem Definition
5.2 Major Idea of Proposed Method by Use of Special Big-5 Codes
5.3 Authentication Signal Generation and Embedding Process
5.4 Authentication Signal Extraction and Verification Process
5.5 Experimental Results
5.6 Summary

Chapter 6 Email Authentication by Special UTF-8 Space Codes
6.1 Introduction and Problem Definition
6.2 Major Idea of Proposed Method by Use of Special UTF-8 Codes
6.3 Authentication Signal Generation and Embedding Process
6.4 Authentication Signal Extraction and Verification Process
6.5 Experimental Results
6.6 Adaptability of Proposed Method for Authentication of Blog Articles
6.7 Summary

Chapter 7 Conclusions and Suggestions for Future Works
7.1 Conclusions

LIST OF FIGURES

Figure 1.1 The login screen and a normal article on a BBS.
Figure 1.2 An instance of blogs.
Figure 1.3 Two popular email systems. (a) G-mail. (b) Hotmail.
Figure 2.1 Example of data hidden using white space [3]. (a) Normal text. (b) White space encoded text.
Figure 2.2 An experimental result found in Lee and Tsai [10]. (a) Cover file seen in Adobe Reader 8.1.2 window. (b) Stego-file seen in Adobe Reader 8.1.2 window with message “This is a covert communication method” embedded.
Figure 2.3 An experimental result found in Lee and Tsai [12]. (a) Cover text seen in the window of the IE. (b) Stego-text (with message about “Cartesian coordinates” embedded) seen in the window of the IE.
Figure 2.4 Big-5 coding format.
Figure 2.5 An example of UTF-8 coding [17].
Figure 3.1 Some examples of highlighted blog articles with secret codes embedded in them on (a) Mozilla Firefox and (b) Google Chrome.
Figure 3.2 Some examples of highlighted blog articles with secret codes embedded in them in IE. (a) The stego-article with secret codes embedded in order. (b) The stego-article with secret codes embedded evenly using the proposed method.
Figure 3.3 Flowchart of proposed authentication signal generation and embedding process.
Figure 3.4 Flowchart of the proposed authentication signal extraction and blog article verification process.
Figure 3.5 An example of experimental results. (a) An original blog article. (b) A user interface used to generate authentication signal and protected blog article. (c) The protected blog article with an authentication signal embedded. (d) The authentication report with the message “authentication is successful.” (e) A protected blog article with a tampered word. (f) The verification result of the tampered blog article.
Figure 3.6 Another example of experimental results. (a) An original blog article. (b) The user interface with a secret key “nctu”. (c) The protected blog article with an authentication signal embedded. (d) The authentication result by using the correct secret key. (e) The authentication result by using a wrong secret key.
Figure 4.1 Stego-articles with some embedded invisible Big-5 symbols displayed on (a) PCMan, (b) KKMan, (c) Pietty, and (d) the telnet connection program, respectively.
Figure 4.2 Stego-articles with secret messages embedded displayed on some well-known BBS browsers. (a) On PCMan. (b) On KKMan. (c) On Pietty. (d) On the telnet connection program.
Figure 4.3 Flowchart of proposed process of embedding secret messages.
Figure 4.4 Flowchart of proposed data extraction process.
Figure 4.5 An example of experimental results. (a) A normal article displayed on the PCMan with our program in the upper right. (b) Data embedding process: type a secret key and a secret message, select a hiding method, highlight a cover article, and press the hiding-button to generate a stego-article with the secret message embedded. (c) The displayed stego-article with the secret message embedded on the PCMan. (d) Data extraction process: extract the secret message by using the correct secret key, select the same method, and press the extraction-button. (e) Result of using a wrong key to extract the secret message. (f) An extracted wrong message.
Figure 4.6 Another example of experimental results. (a) Another normal article. (b) Embedding a secret message by method 2, using special Big-5 space codes. (c) Stego-article with the secret message embedded. (d) Extracted correct secret message.
Figure 5.1 Flowchart of proposed authentication signal generation process.
Figure 5.2 Flowchart of proposed BBS article verification process.
Figure 5.3 An example of experimental results. (a) A generation and sending process of a protected BBS mail. (b) A protected BBS mail displayed on the PCMan. (c) A protected BBS article authenticated with a correct secret key. (d) A protected BBS article authenticated with a wrong key.
Figure 5.4 Another example of experimental results. (a) A generation and sending process of a protected BBS article. (b) A protected BBS article displayed on the PCMan. (c) An authentication result of a protected BBS article with a correct secret key. (d) An authentication result of a protected BBS article tampered by replacing a word.
Figure 5.5 An experimental result displayed on the KKMan. (a) A normal BBS mail. (b) A protected BBS mail.
Figure 5.6 An experimental result displayed on the telnet connection program. (a) A normal BBS mail. (b) A protected BBS mail.
order or (b) evenly using the proposed method.
Figure 6.2 Flowchart of proposed authentication signal generation and embedding process.
Figure 6.3 Flowchart of the proposed authentication signal extraction and email verification process.
Figure 6.4 An example of experimental results. (a) An original email to be sent through the G-mail webmail platform. (b) Our program with a secret key typed. (c) A protected email. (d) A protected email highlighted by a mouse. (e) The authentication result by using the correct secret key. (f) The authentication result by using a wrong secret key.
Figure 6.5 The appearances of (a) a protected email and (b) its highlighted form displayed on IE.
Figure 6.6 An example of experimental results. (a) An original article to be published on a blog. (b) Our program with a secret key typed. (c) A protected blog article. (d) The authentication result by using the correct secret key. (e) A protected blog article tampered by replacing a word. (f) The authentication result of the tampered blog article.

LIST OF TABLES

Table 2.1 ASCII code chart.
Table 2.2 UTF-8 encoding format.
Table 3.1 ASCII control codes and description [1].
Table 3.2 Encoding table for used invisible ASCII control codes.
Table 4.1 Encoding table for used invisible Big-5 codes.
Table 4.2 Encoding table for used special Big-5 space codes.

Chapter 1

Introduction

1.1 Motivation and Background

1.1.1 Motivation of Study

Good media for information communication are indispensable to social development. Ever since paper was invented, knowledge dissemination and scientific advancement have progressed steadily. Today, at a time of breakthroughs in information communication, the Internet has become an extremely important medium.

However, the Internet is a double-edged sword. When we communicate information on the Internet, we benefit from the remarkable propagation speed and capability of the network, but we also put the information in danger. One way to exchange information safely on the Internet is to use data hiding techniques, which have been studied intensively in the past decade.

One of the applications of data hiding is steganography, which is a form of covert communication. Unlike cryptography, steganography conceals the very act of secret transmission through its imperceptibility, so that the risk of the secret being detected by malicious users decreases. Authentication is another application of data hiding; it aims to verify the integrity and fidelity of data. As it is hard to prevent malicious users from intercepting and tampering with information on the Internet, an authentication process is necessary for many communication applications.


In previous studies, the proposed data hiding techniques were mostly applied to files such as pictures, videos, and text documents. These methods hide information in controllable items like data structures, special file syntax, and even file headers. However, studies on data hiding for popular Internet applications such as the BBS, blog, and email are few, if any. Because these Internet applications use non-conventional media, the above-mentioned controllable items are not found in them; they accept only typed words or uploaded pictures given by users. For these reasons, we design new data hiding techniques for them in this study. Specifically, we develop covert communication or authentication techniques for these Internet applications.

1.1.2 Introduction to Used Media

In this study, it is desired to propose data hiding techniques for three kinds of Internet applications, including the BBS, blog, and email. We briefly introduce them subsequently.

1.1.2.1 Introduction to the BBS

The BBS (bulletin board system) is a kind of Internet forum; it is popular in Taiwan, Hong Kong, and China. An example of a BBS is shown in Figure 1.1.

In Taiwan, the number of users on the most popular BBS site, PTT [2], can reach 60 to 150 thousand at any given time. The BBS is a text-based Internet forum, and its screen view consists simply of monochrome or colored text. Users can ask various questions, discuss any matter, interact with one another, and even send BBS mails to one another. Furthermore, journalists also write reports by adopting some conspicuous


Figure 1.1 The login screen and a normal article on a BBS.

1.1.2.2 Introduction to Blog

Weblog is a term coined by combining two words, web and log. Later, after someone jokingly broke the word weblog into the phrase we blog, the term “blog” was coined. A blog is a type of website or part of a website, and is usually maintained by an individual or a management team with regular entries of commentary, descriptions of events, or other material such as graphics or video. Visitors can interact on a blog by leaving comments and even messaging each other via widgets, and it is this interactivity that distinguishes blogs from static websites. A typical blog combines text, images, and links to other blogs, web pages, and other media related to its topic. A typical blog is shown in Figure 1.2.


Figure 1.2 An instance of blogs.

1.1.2.3 Introduction to Email

Electronic mail, commonly called email or e-mail, is a method of exchanging digital messages from an author to one or more recipients. Today, almost everyone has not only a physical address for sending and receiving mail but also a virtual address for email. Existing email systems are based on a store-and-forward model: email servers are responsible for accepting, forwarding, storing, and delivering messages, so that users can send or receive emails at any time. An email can include a message subject, a message body, and some small attached files like pictures and text documents. There are many popular email websites on the Internet, such as G-mail, Hotmail, and Yahoo Mail; two instances are shown in Figure 1.3.

(a) (b)

Figure 1.3 Two popular email systems. (a) G-mail. (b) Hotmail.

1.2 Overview of Related Works

not notice the existence of the hidden data. Many techniques have been proposed in recent years for hiding data. However, in this study, we consider hiding data by special character codes to be a more appropriate method for Internet applications such as the BBS, blog, and email. In Chapter 2, we will review some techniques of data hiding using special character codes for text documents and Internet applications. In addition, we will also review the related character coding formats there, including the ASCII code, Big-5 code, and Unicode formats.

1.3 Overview of Proposed Methods

1.3.1 Definitions of Terms

The definitions of some related terminologies used in this study are described as follows.

1. Cover media: cover media, such as images, documents, or videos, are files into which data can be embedded.

2. BBS: the BBS is a popular Internet forum for people.

3. Cover article: a cover article is an article into which data can be embedded.

4. Stego-article: a stego-article is an article with some data embedded in it.

5. Protected article: a protected article is an article into which an authentication signal is embedded.

6. Cover email: a cover email is an email into which data can be embedded.

7. Stego-email: a stego-email is an email with some data embedded in it.

8. Protected email: a protected email is an email into which an authentication signal is embedded.


1.3.2 Brief Description of Proposed Methods

1.3.2.1 Proposed Method for Authentication of Blog Articles

An authentication method for verifying the integrity and fidelity of blog articles via the use of invisible ASCII control codes is proposed in this study. The idea of embedding data in emails was first introduced by Lee and Tsai [1], using Outlook Express and IE under the traditional Chinese version of Microsoft Windows XP, service pack 2, 2002. Their method is based on the use of unused ASCII codes: secret data are encoded by these special ASCII control codes and embedded into cover emails by inserting them at the text line ends of the emails. We use this idea to hide data in blog articles under the same operating system, except that the service pack is version 3 instead of 2. It was discovered that these special ASCII codes, when displayed in many kinds of blog articles by the most popular browsers such as Google Chrome, Mozilla Firefox, and IE, are invisible or look just like spaces, achieving an effect of steganography. Such invisible ASCII control codes were identified in this study by a systematic test of all the ASCII codes on various Internet browsers and blogs.

To apply the idea to authentication of blog articles, we use a given cover blog article to produce an authentication signal and hide the signal in the original cover blog article, resulting in a stego-article. Through the authentication process, we can verify whether the protected blog article has been tampered with by comparing the authentication signal extracted from the stego-article with the one computed from the article text contained in the stego-article. The detailed authentication process and the related data embedding and extraction algorithms will be described in Chapter 3.
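The embed-and-verify cycle outlined above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the algorithm of Chapter 3: the four control codes in `CODES`, the HMAC-SHA256 keyed digest, the 32-bit signal length, and the round-robin spreading of symbols over the line ends are all hypothetical stand-ins chosen for the sketch.

```python
import hashlib
import hmac

# Hypothetical 2-bit code table: four ASCII control codes standing in for the
# invisible codes selected in the thesis (the real table is not reproduced here).
CODES = ["\x1c", "\x1d", "\x1e", "\x1f"]
SIGNAL_SYMBOLS = 16  # 16 symbols x 2 bits = 32-bit signal (arbitrary choice)


def strip_codes(line: str) -> str:
    """Return the visible text of a line, with any embedded codes removed."""
    return "".join(ch for ch in line if ch not in CODES)


def make_signal(lines: list[str], key: str) -> str:
    """Derive the signal from the visible text and a secret key (assumed HMAC)."""
    text = "\n".join(lines).encode("utf-8")
    digest = hmac.new(key.encode("utf-8"), text, hashlib.sha256).digest()
    symbols = []
    for byte in digest[: SIGNAL_SYMBOLS // 4]:
        for shift in (6, 4, 2, 0):  # four 2-bit symbols per byte
            symbols.append(CODES[(byte >> shift) & 0b11])
    return "".join(symbols)


def protect(article: str, key: str) -> str:
    """Append the signal symbols to the line ends, one line after another."""
    lines = article.split("\n")  # not splitlines(): it would split on the codes
    out = list(lines)
    for i, symbol in enumerate(make_signal(lines, key)):
        out[i % len(out)] += symbol
    return "\n".join(out)


def verify(protected: str, key: str) -> bool:
    """Extract the embedded signal and compare it with a recomputed one."""
    raw = protected.split("\n")
    visible = [strip_codes(line) for line in raw]
    tails = ["".join(ch for ch in line if ch in CODES) for line in raw]
    if sum(len(t) for t in tails) != SIGNAL_SYMBOLS:
        return False
    n = len(raw)
    try:
        # Undo the round-robin spreading used in protect().
        extracted = "".join(tails[i % n][i // n] for i in range(SIGNAL_SYMBOLS))
    except IndexError:
        return False
    return hmac.compare_digest(extracted, make_signal(visible, key))
```

Under these assumptions, `verify(protect(article, key), key)` succeeds, while a changed word or a wrong key yields a recomputed signal that no longer matches the embedded one.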


1.3.2.2 Proposed Method for Covert Communication via the BBS

Two new methods for data hiding using special Big-5 codes via the BBS with Big-5 servers are proposed for covert communication in this study. One is implemented under the traditional Chinese version of Microsoft Windows XP, service pack 3, 2002, and the other can be applied under most general operating systems.

Because the ASCII control codes are utilized to implement some system functions, we cannot hide data by the same method used for blog articles mentioned above. Through continual testing and observation in our experiments, we discovered that, owing to the transcoding between the Big-5 and Unicode text coding systems, some special Big-5 codes are invisible when they are displayed on popular BBS browsers such as PCMan, KKMan, and Pietty. We therefore use these special Big-5 codes for data hiding in the BBS in the first method. More specifically, we insert the invisible Big-5 codes at the text line ends, which neither changes the meanings of the sentences in the cover article nor causes any noticeable difference to the reader.

Furthermore, we develop a second new method to hide data by the use of some special Big-5 space codes. These codes are defined in the Big-5 standard format, so the proposed method can be used generally in most operating systems.

With both methods, the embedded secret data are hard to discern. Hence, the proposed methods can be used for covert communication. The detailed embedding and extraction processes of the proposed methods will be described in Chapter 4.
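To make the line-end embedding idea concrete, the following minimal sketch hides message bits as trailing space characters and serializes the result in Big-5. The choice of codes is an assumption made for illustration only: the ASCII space (0x20) stands for bit 0 and the Big-5 full-width space (0xA140, U+3000) for bit 1, with a hypothetical capacity of four bits per line. The codes actually adopted in the thesis are those of its Tables 4.1 and 4.2, which are not reproduced here.

```python
# Hypothetical bit encoding (an assumption for this sketch, not the thesis's table).
BIT_TO_SPACE = {"0": " ", "1": "\u3000"}
SPACE_TO_BIT = {space: bit for bit, space in BIT_TO_SPACE.items()}
BITS_PER_LINE = 4  # arbitrary per-line capacity


def embed(cover: str, message: bytes) -> bytes:
    """Append message bits as space codes to the line ends; return Big-5 bytes."""
    bits = "".join(f"{byte:08b}" for byte in message)
    lines = cover.split("\n")
    if len(bits) > BITS_PER_LINE * len(lines):
        raise ValueError("cover article has too few lines for this message")
    out = []
    for i, line in enumerate(lines):
        chunk = bits[i * BITS_PER_LINE:(i + 1) * BITS_PER_LINE]
        # Existing trailing spaces are dropped so they cannot be read as data.
        out.append(line.rstrip(" \u3000") + "".join(BIT_TO_SPACE[b] for b in chunk))
    return "\n".join(out).encode("big5")


def extract(stego: bytes) -> bytes:
    """Read the trailing space codes of every line back into message bytes."""
    bits = []
    for line in stego.decode("big5").split("\n"):
        body = line.rstrip(" \u3000")
        bits.extend(SPACE_TO_BIT[ch] for ch in line[len(body):])
    usable = len(bits) - len(bits) % 8
    return bytes(int("".join(bits[i:i + 8]), 2) for i in range(0, usable, 8))
```

In this sketch a two-byte message occupies the ends of four lines, and the visible text of each line is unchanged apart from its trailing spaces.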


1.3.2.3 Proposed Method for BBS Authentication

We also propose a technique for BBS authentication in this study, based on the idea used in the two above-mentioned methods of data hiding in the BBS. Much important information, such as goods orders, meeting places, and business transactions, appears on the BBS or in BBS mails, so it is sometimes necessary to authenticate such BBS articles or mails. This activity is called BBS authentication in this study. The proposed method for this purpose will be described in Chapter 5.

1.3.2.4 Proposed Method for Email Authentication

A new method for email authentication by the use of special UTF-8 space codes is proposed in this study. To reach the goal of email authentication, we first tried to implement the method proposed by Lee and Tsai [1], i.e., data hiding in emails by using the unused ASCII control codes. However, webmail services like G-mail, Hotmail, and Yahoo Mail remove the unused ASCII control codes or change them into general white spaces when an email is sent. Because this makes the method unsuitable for data hiding in webmail, we propose the use of special UTF-8 space codes instead. The idea is inspired by one of the above-mentioned methods: the data hiding technique for the BBS using special Big-5 space codes. Today, almost all webmail servers use the Unicode UTF-8 format as their text coding system. Unicode is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world’s writing systems, and it has codes corresponding to the special Big-5 space codes used in the proposed method for data hiding in the BBS mentioned previously. Thus, we can find special UTF-8 space codes to hide data in webmail and use the result for email authentication. The detailed processes will be described in Chapter 6.
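As a small illustration of what UTF-8 space codes look like at the byte level, the sketch below maps 2-bit symbols to four Unicode space separators and shows their UTF-8 encodings. The four code points in `SPACE_CODES` are an assumption made for this sketch; the space codes actually used for email authentication are those selected in Chapter 6.

```python
import unicodedata

# Hypothetical 2-bit table (00, 01, 10, 11): SPACE, EN SPACE, EM SPACE,
# IDEOGRAPHIC SPACE. All four belong to Unicode category "Zs" (space separator).
SPACE_CODES = ["\u0020", "\u2002", "\u2003", "\u3000"]


def describe(ch: str) -> tuple[str, str, str]:
    """Return the code point, Unicode name, and UTF-8 bytes (hex) of a character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), ch.encode("utf-8").hex(" "))


def bits_to_spaces(bits: str) -> str:
    """Encode a bit string, two bits at a time, as a run of space characters."""
    if len(bits) % 2:
        raise ValueError("bit string length must be even")
    return "".join(SPACE_CODES[int(bits[i:i + 2], 2)] for i in range(0, len(bits), 2))


def spaces_to_bits(spaces: str) -> str:
    """Decode a run of space characters back into the bit string."""
    return "".join(f"{SPACE_CODES.index(ch):02b}" for ch in spaces)


for ch in SPACE_CODES:
    print(*describe(ch))
```

The ideographic space U+3000 is also the Unicode counterpart of the Big-5 full-width space 0xA140, which illustrates the correspondence between Big-5 and Unicode space codes mentioned in the text.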


1.4 Contributions

Some contributions made in this study are listed in the following.

1. For the first time, blog articles are used as cover media for data hiding applications.

2. For the first time, the BBS is used as a cover medium for data hiding applications.

3. An authentication method for verification of integrity and fidelity of blog articles by the use of invisible ASCII control codes is proposed.

4. A new data hiding technique using invisible Big-5 codes is proposed for the two applications of covert communication via the BBS as well as BBS authentication.

5. A new data hiding technique using special Big-5 space codes is proposed for the two applications of covert communication via the BBS as well as BBS authentication.

6. A new data hiding technique using special UTF-8 space codes is proposed for email authentication.

1.5 Thesis Organization

In the remainder of this thesis, related works about data hiding using special character codes and the character coding formats used are reviewed in Chapter 2. In Chapter 3, the proposed authentication method for verification of integrity and fidelity of blog articles is described. The proposed methods for covert communication via the BBS and for BBS authentication are described in Chapters 4 and 5, respectively. In Chapter 6, the proposed method for email authentication by the use of special UTF-8 space codes is described. Finally, conclusions and some suggestions for future works are given in Chapter 7.


Chapter 2

Review of Related Works and Character Coding Formats

2.1 Previous Studies on Data Hiding Techniques Using Special Character Codes

With the prosperity of computer networks, the Internet has become a very popular medium for information communication. A lot of important information is interchanged on the Internet all the time. In particular, on text-based Internet applications like the BBS, blog, and email, people communicate messages, discuss private matters, publish articles, and even do business. However, articles created or sent in these activities might be tampered with illegally by hackers on the line. Therefore, it is necessary to protect these articles. In this study, we design data hiding techniques to achieve this purpose by covert communication and authentication on these Internet applications.

Articles on these Internet applications are soft-copy texts [3]. In recent years, some methods of data hiding via text documents have been proposed, using file headers [4], file structures, and file features [5]. However, the Internet applications are not conventional media, so these controllable items are not found in them. Hence, we implement data hiding techniques on them by embedding special character codes. In the past, some data hiding methods using character codes have been proposed. Wayner [6] proposed a method that uses a context-free grammar to create secret text messages in cover files for covert communication; the secret message is not embedded in the cover file directly, and a receiver extracts the hidden message by parsing. A constraint is that the cover text should be a meaningful message; otherwise, a reader will become suspicious of it. Bender et al. [3] proposed the use of infrequent additional spaces to form secret data and transmit them in soft-copy texts, including inter-sentence spacing, end-of-line spacing, and inter-word spacing. For example, one space between words is taken to represent a “0” and two spaces a “1.” An illustration of the method is shown in Figure 2.1.
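The inter-word spacing rule just described (one space for “0”, two spaces for “1”) can be sketched in a few lines. This is a simplified illustration for a single line of cover text, not the implementation of Bender et al. [3]; the function names are hypothetical, and the receiver is assumed to know the number of embedded bits.

```python
import re


def embed_bits(cover: str, bits: str) -> str:
    """Encode each bit in the gap after a word: one space = 0, two spaces = 1."""
    words = cover.split()
    if len(bits) > len(words) - 1:
        raise ValueError("cover text has too few word gaps")
    out = [words[0]]
    for i, word in enumerate(words[1:]):
        gap = "  " if i < len(bits) and bits[i] == "1" else " "
        out.append(gap + word)
    return "".join(out)


def extract_bits(stego: str, n_bits: int) -> str:
    """Read the first n_bits gaps back into a bit string."""
    gaps = re.findall(r" +", stego)
    return "".join("1" if len(gap) == 2 else "0" for gap in gaps[:n_bits])


stego = embed_bits("the quick brown fox jumps", "1010")
print(repr(stego))             # 'the  quick brown  fox jumps'
print(extract_bits(stego, 4))  # 1010
```

Gaps beyond the end of the message are left as single spaces, which is why the extractor needs the message length; a practical scheme would signal the end of the data in some other way.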

Data hiding techniques via special character codes are also used for some popular text documents and Internet applications. A survey of them is conducted in this chapter.

(a) (b)

Figure 2.1 Example of data hidden using white space [3]. (a) Normal text. (b) White space encoded text.

2.1.1 Review of Data Hiding Techniques via Text Documents

Every day, numerous text documents are interchanged on the Internet. It is hard to prevent malicious users from intercepting and tampering with them, so developing data hiding techniques to protect important information on them is necessary.

For XML, which is a set of rules for encoding documents in machine-readable form, Inoue et al. [7] proposed a technique to embed secret data by inserting white spaces in tags. A tag is written either with some white space before the closing bracket or with none [8]. By inserting or deleting such spaces, data can be embedded while preserving all meanings of the original documents. For PDF, which is a popular file format that is independent of computer platforms, data hiding can also be attained by using equivalent white space codes or invisible ASCII codes, as proposed by Lai and Tsai [9] and Lee and Tsai [10]. An experimental result found in Lee and Tsai [10] is shown in Figure 2.2. Even for software programs like Visual C++ and C++ Builder, three ways to hide data using invisible ASCII control codes were proposed by Lee and Tsai [11], including 1) alternative space coding, 2) line-end space coding, and 3) null space coding.
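The white-space-in-tag idea for XML can be illustrated with a short sketch. It is a simplified reading of the description above rather than the method of Inoue et al. [7]: each start or end tag carries one bit (a space before the closing bracket means 1, none means 0), self-closing tags are skipped, and the regular expression is a rough tag matcher that does not handle comments, CDATA sections, or a `>` inside attribute values. The function names are hypothetical.

```python
import re

# Rough matcher for start and end tags: tag body, optional white space, then ">".
TAG = re.compile(r"<(/?[A-Za-z][^<>]*?)(\s*)>")


def embed_in_tags(doc: str, bits: str) -> str:
    """Write one bit per tag as the presence or absence of a space before '>'."""
    remaining = iter(bits)

    def rewrite(match: re.Match) -> str:
        body = match.group(1)
        if body.endswith("/"):  # leave self-closing tags untouched
            return match.group(0)
        bit = next(remaining, "0")  # tags beyond the message carry 0
        return f"<{body} >" if bit == "1" else f"<{body}>"

    stego = TAG.sub(rewrite, doc)
    if next(remaining, None) is not None:
        raise ValueError("document has too few tags for this message")
    return stego


def extract_from_tags(doc: str, n_bits: int) -> str:
    """Read the first n_bits tag bits back from the document."""
    bits = []
    for match in TAG.finditer(doc):
        if match.group(1).endswith("/"):
            continue
        bits.append("1" if match.group(2) else "0")
    return "".join(bits[:n_bits])


stego = embed_in_tags("<note><to>Bob</to><body>Hi</body></note>", "101101")
print(stego)                        # <note ><to>Bob</to ><body >Hi</body></note >
print(extract_from_tags(stego, 6))  # 101101
```

Because XML permits optional white space before the closing bracket of both start and end tags, the stego document parses to the same element tree as the cover document.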

(a) (b)

Figure 2.2 An experimental result found in Lee and Tsai [10]. (a) Cover file seen in Adobe Reader 8.1.2 window. (b) Stego-file seen in Adobe Reader 8.1.2 window with message “This is a covert communication method” embedded.

2.1.2 Review of Data Hiding Techniques for Internet Applications

decades, methods for hiding data directly in Internet applications are very few. Lee and Tsai [12] and Huang and Tsai [13] proposed techniques for data hiding that embed special codes in HTML files as substitutes for the original white spaces in the files; an experimental result found in Lee and Tsai [12] is shown in Figure 2.3. In these cases, message data were hidden in HTML files so that these files became stego-media for secret communication or secret sharing when displayed on the Internet. However, these methods are indirect data hiding techniques for Internet applications. In another paper by the same authors, Lee and Tsai [1], a direct data hiding technique was proposed to embed secret data at email text line ends using special ASCII control codes. These special ASCII control codes are invisible when displayed on the screen and so do not affect a user’s reading of the resulting email.

(a)

(b)

Figure 2.3 An experimental result found in Lee and Tsai [12]. (a) Cover text seen in the window of the IE. (b) Stego-text (with message about “Cartesian coordinates” embedded) seen in the window of the IE.


2.1.3 Review of Other Techniques and Summary

For some text documents, data hiding methods using not only special character codes but also special file syntax or file features have been proposed. Chang and Tsai [14] used pseudo-spaces, the specific string “&nbsp,” to encode copyright data into the text of an HTML file; duplicated the copyright data to enhance the robustness against HTML manipulations; and combined the blank character code and the HTML special syntax to hide data. Zhong, et al. [15] proposed a data hiding method for PDF documents by adjusting the positions of the text characters slightly to embed the secret data. They also hid data by combining character codes and certain special file features.

In conclusion, text-based Internet applications and text documents are good choices as covert channels for data hiding because they are commonly used for information exchange in daily work and for communication on the Internet. Some data hiding techniques for different kinds of text document formats have been proposed over the past decade. However, studies on data hiding via Internet applications like the BBS, blog, and email are very few, if any, so we propose new data hiding techniques and related applications for them in this study.

2.2 Review of Related Character Coding Formats

2.2.1 Review of ASCII Format

character-encoding scheme based on the ordering of the English alphabet. ASCII codes are used to represent text in computers, communications equipment, and other devices that use text. A standard ASCII code is composed of seven bits and is usually expressed as two hexadecimal digits. The first edition of the standard was published in 1963, a major revision followed in 1967, and the most recent update was made in 1986 [16]. The ASCII code set includes 128 characters: 33 non-printing control characters (now mostly obsolete), 94 printable characters, and the space, which is considered an invisible graphic. All the ASCII codes are listed in Table 2.1.

As computer technology spreads throughout the world, many coding standards have been developed to facilitate the expression of non-English alphabets. However, these character coding standards, such as the Unicode and Big-5, all include the ASCII codes as the kernel set. Now, almost all web servers are built with the Unicode. Therefore, in many current Internet applications, the properties of the ASCII codes are still preserved.

Table 2.1 ASCII code chart.

     0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
0    NUL  SOH  STX  ETX  EOT  ENQ  ACK  BEL  BS   HT   LF   VT   FF   CR   SO   SI
1    DLE  DC1  DC2  DC3  DC4  NAK  SYN  ETB  CAN  EM   SUB  ESC  FS   GS   RS   US
2    SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
3    0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
4    @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
5    P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
6    `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
7    p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~    DEL
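As a quick check of the character counts stated earlier in this section, the following Python sketch partitions the 128 ASCII codes into control codes, printable characters, and the space. The names CONTROL_CODES, PRINTABLE_CODES, SPACE_CODE, and ascii_hex are illustrative only and are not part of the original work.

```python
# Partition the 128 seven-bit ASCII codes as described in the text.
# Control codes: 00 through 1F plus DEL (7F).
CONTROL_CODES = [c for c in range(128) if c < 0x20 or c == 0x7F]
# The space (20) is counted separately as an invisible graphic.
SPACE_CODE = 0x20
# Printable characters: '!' (21) through '~' (7E).
PRINTABLE_CODES = list(range(0x21, 0x7F))


def ascii_hex(code: int) -> str:
    """Return the two-hexadecimal-digit form of an ASCII code, as used in Table 2.1."""
    if not 0 <= code < 128:
        raise ValueError("not a 7-bit ASCII code")
    return f"{code:02X}"


if __name__ == "__main__":
    print(len(CONTROL_CODES), len(PRINTABLE_CODES))
    print(ascii_hex(0x0A), ascii_hex(ord("A")))
```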

2.2.2 Review of Big-5 Format

and Macau for Traditional Chinese characters. Before the Unicode was developed, many different language coding standards existed in various countries under old operating systems like MS-DOS. Now, under current operating systems such as Windows XP and Windows 7, all internal messages are interchanged using the Unicode, and the different language coding standards such as Big-5 and ASCII are all supported. In recent years, the most often used Big-5 version has been the one defined in Microsoft Windows Codepage 950 (CP950) [17]. Big-5 is a double-byte character set, and the structure of a standard Big-5 code is shown in Figure 2.4.

First byte (“lead byte”)    Second byte
0x81 to 0xfe                0x40 to 0x7e, 0xa1 to 0xfe

Figure 2.4 Big-5 coding format.
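The byte ranges in Figure 2.4 can be expressed as a small validity check. The following Python sketch is only an illustration under the ranges given in the figure; the function names are hypothetical, and the check says nothing about whether a character is actually assigned to a given code.

```python
def is_big5_lead_byte(b: int) -> bool:
    """First byte ("lead byte") range from Figure 2.4."""
    return 0x81 <= b <= 0xFE


def is_big5_second_byte(b: int) -> bool:
    """Second byte ranges from Figure 2.4."""
    return 0x40 <= b <= 0x7E or 0xA1 <= b <= 0xFE


def is_big5_code(pair: bytes) -> bool:
    """Return True if the two bytes fall inside the Big-5 double-byte ranges."""
    return (
        len(pair) == 2
        and is_big5_lead_byte(pair[0])
        and is_big5_second_byte(pair[1])
    )


if __name__ == "__main__":
    print(is_big5_code(bytes([0xFE, 0xAE])))  # structurally valid double-byte code
    print(is_big5_code(b"AB"))                # two ASCII letters, not a Big-5 code
```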

Though most current Internet applications do not support Big-5 coding, Big-5 is still the major character coding system for the servers of the BBS, which is a very popular kind of forum in Taiwan, China, and Hong Kong.

2.2.3 Review of UTF-8 Format

UTF-8 (8-bit Unicode Transformation Format) is a multibyte character encoding for the Unicode. The Unicode is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world’s writing systems. It is developed in conjunction with the Universal Character Set standard and published in book form as The Unicode Standard [18]. The success of the Unicode in unifying character set coding has led to its widespread and predominant use in the internationalization and localization of computer software. The standard has been implemented in many recent technologies, including XML, the Java programming language, the Microsoft .NET Framework, and many modern operating systems. The Unicode can be implemented by different character encodings, and UTF-8 is the most commonly used one. Unlike the original Unicode encoding, which is a double-byte character set, UTF-8 is a variable-length encoding scheme, with each character represented by one to four bytes. The UTF-8 encoding format is shown in Table 2.2 and an example is shown in Figure 2.5 [17].

Table 2.2 UTF-8 encoding format.

Code point range        Binary code point             UTF-8 bytes                            Annotations
U+0000 to U+007F        0xxxxxxx                      0xxxxxxx                               the range of ASCII; the highest bit of the byte is 0
U+0080 to U+07FF        00000yyy yyxxxxxx             110yyyyy 10xxxxxx                      the first byte starts with 110; the following byte starts with 10
U+0800 to U+FFFF        zzzzyyyy yyxxxxxx             1110zzzz 10yyyyyy 10xxxxxx             the first byte starts with 1110; the following bytes start with 10
U+010000 to U+10FFFF    000wwwzz zzzzyyyy yyxxxxxx    11110www 10zzzzzz 10yyyyyy 10xxxxxx    the first byte starts with 11110; the following bytes start with 10
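The bit layout in Table 2.2 can be followed by hand with a few shifts and masks. The following Python sketch, with the hypothetical function name utf8_encode, applies the four rows of the table directly; it is meant only to illustrate the format and, for brevity, does not reject the surrogate range that a full encoder must exclude.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point by following the four rows of Table 2.2."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point out of range")
    if cp <= 0x7F:
        # 0xxxxxxx: identical to the ASCII code
        return bytes([cp])
    if cp <= 0x7FF:
        # 110yyyyy 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:
        # 1110zzzz 10yyyyyy 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 11110www 10zzzzzz 10yyyyyy 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


if __name__ == "__main__":
    # Compare the hand-written encoder with Python's built-in UTF-8 codec.
    for cp in (0x1F, 0x41, 0xE9, 0x4E2D, 0x1F600):
        print(f"U+{cp:04X}", utf8_encode(cp).hex(), chr(cp).encode("utf-8").hex())
```

The first two test values show the property used later in this study: code values smaller than 128, including the control codes, are encoded as a single byte equal to the ASCII code.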

UTF-8 has many advantages. For example, every valid ASCII character is also a valid UTF-8 encoded Unicode character with the same binary value, so old systems and software that use the ASCII encoding format can continue to be used with no change or with just a few slight modifications. Therefore, UTF-8 is gradually becoming a top-priority character coding system for e-mail, websites, and other Internet applications.


Chapter 3

Authentication of Blog Articles by Invisible ASCII Control Codes

3.1 Introduction and Problem Definition

3.1.1 Introduction

In this chapter, we will specifically introduce the proposed data hiding method for authentication of blog articles. In Section 3.1.2, the problem definition is described, and the major idea of the proposed data hiding method for authentication is described in Section 3.2. In Section 3.3, we present the technique we propose to generate an authentication signal and the process to embed it. In Section 3.4, a process for extraction and verification of the authentication signal is proposed. Experimental results showing the feasibility of the proposed method are shown in Section 3.5. Finally, a brief summary is given in Section 3.6.

3.1.2 Problem Definition

The blog is a virtual channel that allows people to express their feelings. Many public figures like politicians and entertainers also use blogs to advertise their political philosophies or raise awareness by interacting with their fans. Some blog articles are important messages worth long-term preservation. Hence, it is important to check the integrity and fidelity of articles on the blog to see whether they have been attacked. For this purpose, we propose a data hiding technique for authentication of blog articles in this study.

For public figures, the proposed method is especially useful. Their blogs are often managed by a management team, and everyone in the team has the authority to publish or modify the contents of the blogs. As a result, if a malicious person tampers with the content of an article or even posts a fake message on a celebrity’s blog, the members of the management team may not take action promptly, because they may think that the change was made by another member of the team. With the proposed method, they can rapidly authenticate the integrity and fidelity of the blog articles, preventing the negative publicity brought by malicious tampering. For people who manage their blogs by themselves, the method not only saves the time spent on tedious manual checking of the authenticity of previously published articles, but also lets other people who do not have the authority to modify the blog articles help authenticate them, provided that they are given the keys to enter the system. The major idea of the proposed method is introduced in the next section.

3.2 Major Idea of Proposed Method by Use of Invisible ASCII Control Codes

3.2.1 Use of Special Character Codes

On the blog, as mentioned in Chapter 2, users can only type simple words and upload some small files like pictures and short videos. Many controllable items facilitating data hiding, like data structures, special file syntax, and file headers, cannot be found on the blog, so we use some special character codes to achieve the goal of data hiding in this study.

Specifically, we hide data in the blog using invisible ASCII control codes. The ASCII codes from 00 through 1F, namely, the ASCII control codes, were originally designed to control computer peripheral devices such as teletypes, tape drives, and printers. All the ASCII control codes are listed in Table 3.1 [1]. Because of the rapid development of new peripheral hardware technologies, however, the ASCII control codes are now rarely used for their original purposes, except the codes for text display control, like 0A and 08, which mean line feed and backspace, respectively.

Table 3.1 ASCII control codes and description [1].

Dec  Hex  Char  Description            Dec  Hex  Char  Description
0    00   NUL   null character         16   10   DLE   data link escape
1    01   SOH   start of header        17   11   DC1   device control 1
2    02   STX   start of text          18   12   DC2   device control 2
3    03   ETX   end of text            19   13   DC3   device control 3
4    04   EOT   end of transmission    20   14   DC4   device control 4
5    05   ENQ   enquiry                21   15   NAK   negative acknowledge
6    06   ACK   acknowledge            22   16   SYN   synchronize
7    07   BEL   bell (ring)            23   17   ETB   end transmission block
8    08   BS    backspace              24   18   CAN   cancel
9    09   HT    horizontal tab         25   19   EM    end of medium
10   0A   LF    line feed              26   1A   SUB   substitute
11   0B   VT    vertical tab           27   1B   ESC   escape
12   0C   FF    form feed              28   1C   FS    file separator
13   0D   CR    carriage return        29   1D   GS    group separator
14   0E   SO    shift out              30   1E   RS    record separator
15   0F   SI    shift in               31   1F   US    unit separator

Furthermore, through continuous tests and observations in our experiments, we discovered that some of the control codes are invisible or just shown as white spaces when they appear in a blog article on many popular web browsers under certain software environments. In addition, almost all current web servers use the UTF-8 standard as the system text coding format, and the properties and features of the ASCII codes are completely included in it as the kernel set; the UTF-8 codes are exactly equal to the ASCII codes for code values smaller than 128.

In this study, it is desired to use the invisible ASCII control codes in the UTF-8 to embed data in blog articles without causing noticeable artifacts under the popular software environments of Google Chrome, Mozilla Firefox, IE, and the operating system of the traditional Chinese version of Microsoft Windows XP, service pack 3, 2002. To achieve this aim, we divide a secret message into 2-bit segments, then map them to the corresponding special ASCII control codes which are discovered through continuous testing mentioned previously, and embed them as well as some end signals into a cover article. Table 3.2 shows the used special ASCII codes and the mapping relationship devised in this study for this purpose.

In more detail, we use the four ASCII codes 1C, 1D, 1E, and 1F, which, when displayed, are not visible at all on the web browsers Mozilla Firefox and Google Chrome. Two examples are shown in Figure 3.1, where Figure 3.1(a) shows a result displayed in Firefox in which the embedded codes at the line ends cannot be highlighted, while Figure 3.1(b) shows a result displayed in Chrome in which the spaces between the ends of all text lines and the right end of the blog article window are all highlighted when the entire article is highlighted, no matter whether there are embedded codes at the line ends or not. In both cases, there is no risk of revealing the embedded characters at the text line ends. However, in IE, these codes are displayed as white spaces when the text is highlighted, which threatens the security of the embedded message using the codes. A solution is proposed in this study, which is discussed next.

Table 3.2 Encoding table for used invisible ASCII control codes.

Bit stream (binary)         00    01    10    11    End signal
ASCII code (hexadecimal)    1C    1D    1E    1F    1F0A
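The mapping in Table 3.2 can be written as a pair of lookup tables. The following Python sketch is a minimal illustration; the names BITS_TO_CODE, CODE_TO_BITS, END_SIGNAL, bits_to_codes, and codes_to_bits are hypothetical, and the bit stream is represented as a string of '0' and '1' characters purely for readability.

```python
# Table 3.2: each 2-bit segment maps to one invisible ASCII control code.
BITS_TO_CODE = {"00": 0x1C, "01": 0x1D, "10": 0x1E, "11": 0x1F}
CODE_TO_BITS = {code: bits for bits, code in BITS_TO_CODE.items()}
# End signal: 1F followed by the line feed 0A.
END_SIGNAL = bytes([0x1F, 0x0A])


def bits_to_codes(bits: str) -> bytes:
    """Map a bit string of even length to a sequence of control codes."""
    if len(bits) % 2 != 0:
        raise ValueError("bit stream must split into 2-bit segments")
    return bytes(BITS_TO_CODE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def codes_to_bits(codes: bytes) -> str:
    """Inverse mapping: recover the bit string from the control codes."""
    return "".join(CODE_TO_BITS[c] for c in codes)


if __name__ == "__main__":
    encoded = bits_to_codes("00011011")
    print(encoded.hex())           # one code per 2-bit segment
    print(codes_to_bits(encoded))  # round trip back to the bit string
```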

Figure 3.1 Some examples of highlighted blog articles with secret codes embedded in them on (a) Mozilla Firefox and (b) Google Chrome.

Evenly at Line Ends to Reduce Suspicion

As mentioned previously, in IE, when all the text of a blog article is highlighted, the embedded special character codes appear as white spaces, thus leaking the secret hidden in the blog article. This leakage is even clearer when all the special codes are embedded at the line ends in sequential order, starting from the first line, because long blank spaces then appear at the ends of the beginning lines, like the example shown in Figure 3.2(a). This phenomenon tends to arouse an attacker’s suspicion.

One way to solve this problem, as proposed in this study, is to distribute the embedded codes evenly over all the line ends so as not to create long blank spaces at the ends of the beginning lines. More specifically, we embed different numbers of special ASCII codes in accordance with the variable lengths of the text lines in the blog article. That is, if a text line is longer, then we embed fewer codes, and vice versa.

To implement this idea, we estimate the size of the data embedding slot at each text line end, and compute accordingly the required number of special codes embeddable at the end of each line. Then, we sequentially embed the codes into the blog article according to the computed numbers. A stego-article with special codes embedded in this way is shown in Figure 3.2(b), in which the special codes are seen to have been embedded evenly at the line ends. The detailed hiding method will be described in Section 3.3.
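One possible reading of this even distribution, consistent with Step 7 of Algorithm 3.1, is a greedy rule that repeatedly gives the next code to the line whose slot is currently the longest. The following Python sketch assumes the slot sizes are already known and passed in as a list, since the way they are computed is not reproduced here; the function name allocate_codes and its return convention are hypothetical.

```python
def allocate_codes(slot_sizes, total_codes=64):
    """Greedily decide how many codes each line end receives.

    slot_sizes[i] is the size of the data embedding slot of line i (assumed
    given). Returns (counts, leftover), where counts[i] is the number of
    codes for line i and leftover is the number of codes that did not fit.
    """
    remaining_slots = list(slot_sizes)
    counts = [0] * len(remaining_slots)
    leftover = total_codes
    while leftover > 0 and remaining_slots:
        # Scan from the first line for the longest remaining slot;
        # max() returns the first index among equal candidates.
        i = max(range(len(remaining_slots)), key=lambda j: remaining_slots[j])
        if remaining_slots[i] <= 1:
            break  # one position per line is kept, as in the stop condition of Step 7
        counts[i] += 1
        remaining_slots[i] -= 1
        leftover -= 1
    return counts, leftover


if __name__ == "__main__":
    # Three lines with slots of 8, 20, and 12 units and ten codes to place:
    # the codes go to the shorter lines (longer slots) first.
    print(allocate_codes([8, 20, 12], total_codes=10))
```

Any codes reported as leftover would have to be placed on extra blank lines appended to the article, which is how the embedding algorithm handles short articles.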

Figure 3.2 Some examples of highlighted blog articles with secret codes embedded in them in IE. (a) The stego-article with secret codes embedded in order. (b) The stego-article with secret codes embedded evenly using the proposed method.

3.2.3 Construction of End Signals

The previously-mentioned end signal used in this study is composed of two ASCII control codes, 1F and 0A, which together specify a unique signal for unambiguous identification of the line end. The code 0A appears at each text line end, originally purely for the use as a line feed signal. However, because in the proposed method end signals are only embedded at text line ends, we can just create them by merging the two existing codes 1F and 0A instead of finding a new one.

In summary, by the use of a secret key and the proposed data hiding technique, we can produce a protected blog article with a nearly imperceptible authentication signal embedded in it for detection of any falsification by malicious users.

3.3 Authentication Signal Generation and Embedding Process

signal and embed it into a blog article. An illustration of the process is shown in Figure 3.3. When the stego-article is displayed on the blog, it is desired that the article body fit the width of the blog article window. For this, we first fold longer article lines into shorter ones, leaving at least eight characters at each line end as a data embedding slot. Next, we remove all the line feed signals from the folded blog article so that the verification process described in the next section is not interfered with by redundant line feed signals. The modified blog article and a secret key are then used to generate an authentication signal using a hash function and the exclusive-OR operation. Subsequently, the authentication signal is divided into 2-bit segments. Finally, we map the segments into the corresponding special ASCII control codes and embed the codes into the text line ends, accompanied by end signals, to obtain a protected blog article. The details are described in the following.

Algorithm 3.1 Authentication signal generation and embedding process.
Input: a secret key K, a hash function f (such as MD5), and a blog article B to be protected.
Output: a protected blog article B′.
Steps.
1. Fold sequentially each text line l_i with a length larger than 60 units (with a unit meaning the length of an ASCII code displayed on the blog) in blog article B into a 60-unit line by inserting a line feed, denoted as LF, occupying zero units and represented by ASCII code 0A, after the original 60th character in l_i to generate a folded article, denoted as F.
2. Compute the size L_i of the data embedding slot at the end of each text line l_i in F, which means the maximum number of characters that can be inserted at the end of l_i.
3. Remove all the line feed signals in F, use the result and the secret key K as inputs to the hash function f to generate two 128-bit digests F′ and K′, respectively, and return all the removed LF signals to their original positions in F.
4. Compute the exclusive-OR value F′ ⊕ K′ to obtain a 128-bit authentication signal S.
5. Separate the bits of S into 64 two-bit segments t_1, t_2, …, t_64.
6. Map t_1 through t_64 into invisible ASCII control codes p_1 through p_64, respectively, according to Table 3.2, and let N_1 = N_2 = 64 for use as parameters in subsequent steps.
7. Scan F from the first line to find the line, say the i-th, with the longest slot, and calculate the number n_i of invisible ASCII control codes embeddable in the i-th line in the following way:
   (1) if L_i > 1, then increment n_i by 1, decrement L_i by 1, decrement N_1 by 1, and perform Step 7 again;
   (2) if L_i = 1 or N_1 = 0, then perform Step 8.
8. Embed the symbols p_1, p_2, …, p_64 sequentially into F, starting from the first line, in the following way:
   (1) Scan l_i to find the line feed LF, remove it, sequentially embed n_i symbols at the end of l_i, decrement N_2 by n_i, and append an end signal, in which an LF is included, to the end of the embedded symbols.
   (2) If the last line is processed and N_2 > 0, then embed the remaining symbol or symbols below F as one or more blank lines in the following way.
       8.1 Embed as many symbols as possible sequentially into a new line, and then append an end signal to the line end.
       8.2 If all symbols are embedded, then continue; otherwise, repeat Step 8.1.
9. Take the final version of F as the desired protected blog article B′.

It is possible that the symbols of the authentication signal cannot be embedded in the text line ends completely. This might happen when the blog article is too short. In this case, more lines are appended to the end of the article, all being empty, and the remaining symbols then are all embedded into them, as done in Step 8 of the above algorithm.
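Steps 1 and 3 through 6 of the above algorithm can be sketched in Python as follows. This is only an illustration under several assumptions that the algorithm does not fix: MD5 from the standard hashlib module is used as the hash function f, the article and the key are converted to bytes with UTF-8 before hashing, and the length of a line is approximated by its number of characters rather than by display units. All function names are hypothetical.

```python
import hashlib

# Table 3.2 mapping from 2-bit segments to invisible ASCII control codes.
BITS_TO_CODE = {"00": 0x1C, "01": 0x1D, "10": 0x1E, "11": 0x1F}


def fold_lines(article, width=60):
    """Step 1: fold every line longer than `width` characters (assumed unit)."""
    folded = []
    for line in article.split("\n"):
        while len(line) > width:
            folded.append(line[:width])
            line = line[width:]
        folded.append(line)
    return folded


def authentication_signal(folded_lines, key):
    """Steps 3 and 4: hash the article without line feeds, hash the key, XOR."""
    text = "".join(folded_lines)  # joining without "\n" removes the line feeds
    digest_f = hashlib.md5(text.encode("utf-8")).digest()  # 128-bit digest F'
    digest_k = hashlib.md5(key.encode("utf-8")).digest()   # 128-bit digest K'
    return bytes(a ^ b for a, b in zip(digest_f, digest_k))


def signal_to_codes(signal):
    """Steps 5 and 6: split the signal into 2-bit segments and map them."""
    bits = "".join(f"{byte:08b}" for byte in signal)
    return bytes(BITS_TO_CODE[bits[i:i + 2]] for i in range(0, len(bits), 2))


if __name__ == "__main__":
    lines = fold_lines("An example blog article. " * 6)
    codes = signal_to_codes(authentication_signal(lines, "NCTU"))
    print(len(lines), len(codes))  # number of folded lines, number of codes
```

A 128-bit signal always yields 64 codes, which is why the embedding steps start with N_1 = N_2 = 64.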

Figure 3.3 Flowchart of proposed authentication signal generation and embedding process.


and Blog Verification Process

The proposed authentication signal verification scheme can be used to verify the integrity and fidelity of a protected blog article, and the detailed secret data extraction process for blog authentication is illustrated in Figure 3.4. First, we extract the special ASCII codes embedded in the protected blog article and transform them to be an authentication signal S. Then, we use the same secret key and hash function as those used in Algorithm 3.1 to transform the blog article, in which all the secret data and the line feed signals are removed, into a verification signal T. Finally, by comparing the two signals S and T, we can decide whether the protected blog article has been modified or not. The detailed algorithm is described in the following.

Figure 3.4 Flowchart of the proposed authentication signal extraction and blog article verification process.

Input: a secret key K and a hash function f, both being the same as those used in Algorithm 3.1; and a protected blog article B′.
Output: an authentication report R.
Steps.
1. Check each line l_i in the protected blog article B′ sequentially, starting from the first line, and extract the special ASCII control codes embedded in front of the end signal in l_i.
2. Concatenate all the extracted special ASCII codes sequentially into a sequence of 64 codes p_1, p_2, …, p_64.
3. Map p_1 through p_64 to the corresponding two-bit segments t_1, t_2, …, t_64 according to Table 3.2.
4. Concatenate t_1 through t_64 into a 128-bit authentication signal S.
5. Use the secret key K as an input to the hash function f to generate a 128-bit digest K′.
6. Remove all the secret data and line feed signals from the blog article, and use the result as an input to the hash function f to generate a 128-bit digest B″.
7. Compute the exclusive-OR value B″ ⊕ K′ to get a 128-bit verification signal T.
8. Compare S and T, resulting in the following two cases.
   (1) If S = T, then regard the input B′ as unmodified and mark it so in the authentication report R.
   (2) If S ≠ T, then regard B′ as modified and mark it so in R.
9. Output the authentication report R.
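The verification steps above can be sketched in Python as follows. The sketch assumes MD5 from hashlib as the hash function, UTF-8 for converting text to bytes, a protected article in which every line ends with the codes, the code 1F, and a line feed, and article text that does not itself contain the codes 1C through 1F. The names split_protected and verify are hypothetical, and the report R is reduced to a Boolean value.

```python
import hashlib

# Inverse of Table 3.2: control code -> 2-bit segment.
CODE_TO_BITS = {"\x1c": "00", "\x1d": "01", "\x1e": "10", "\x1f": "11"}


def split_protected(protected):
    """Steps 1, 2, and 6: separate embedded codes from the visible text.

    Returns (text, codes), where text has all secret data and line feeds
    removed and codes is the concatenation of the extracted control codes.
    """
    text_parts, codes = [], []
    for line in protected.split("\n"):
        if line.endswith("\x1f"):
            line = line[:-1]  # drop the 1F that belongs to the end signal
            body = line.rstrip("\x1c\x1d\x1e\x1f")
            codes.extend(line[len(body):])  # codes in front of the end signal
            line = body
        text_parts.append(line)
    return "".join(text_parts), "".join(codes)


def verify(protected, key):
    """Steps 3 through 8: rebuild S, compute T, and compare them."""
    text, codes = split_protected(protected)
    if len(codes) != 64:
        return False  # the signal is incomplete, so treat the article as modified
    bits = "".join(CODE_TO_BITS[c] for c in codes)
    s = int(bits, 2).to_bytes(16, "big")                   # authentication signal S
    digest_b = hashlib.md5(text.encode("utf-8")).digest()  # digest of the cleaned article
    digest_k = hashlib.md5(key.encode("utf-8")).digest()   # digest K'
    t = bytes(a ^ b for a, b in zip(digest_b, digest_k))   # verification signal T
    return s == t
```

Because the line feeds are removed before hashing on both sides, the signal depends only on the visible text and the key, not on how the lines were folded.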

3.5 Experimental Results


articles have been conducted. The algorithms were implemented using the language of Microsoft Visual C#. We wrote blog articles on a public blog system, and these articles can be displayed on popular web browsers such as Google Chrome, Mozilla Firefox, and IE. In this section, we show some experimental results displayed on Google Chrome.

In Figure 3.5(a), an original blog article displayed on the web browser Google Chrome is shown, and the user interface is shown in Figure 3.5(b). By entering a secret key, we generated an authentication signal and embedded it in the original article, resulting in a protected blog article as shown in Figure 3.5(c). If the protected article is not modified, the message “Authentication is successful.” is shown on the authentication report, as in Figure 3.5(d). However, if a malicious user tampers with the protected article, yielding a modified blog article, people who have the same secret key can verify its integrity and fidelity. A tampered article and the corresponding verification result are shown in Figures 3.5(e) and 3.5(f), respectively. Figure 3.6 shows another example of our experimental results, displayed on the web browser Mozilla Firefox. In this figure, we obtain authentication reports by using the correct secret key and a wrong secret key, respectively, and get different results. These experimental results show that, using the proposed method, we can successfully verify whether a blog article has been tampered with.

3.6 Summary

In this chapter, a method for authentication of the integrity and fidelity of blog articles using a new data hiding technique has been proposed. Authentication signals in the form of invisible ASCII control codes are generated using a folded version of a given blog article. They are embedded sequentially in the folded article according to pre-computed numbers of secret symbols in the lines. Even in IE, the least favorable web browser, the embedding result is good enough to arouse no suspicion. A secret key is also used to randomize the content of the authentication signal so that malicious users cannot easily forge the text content and the corresponding authentication signal. The proposed method is reliable for protecting blog articles against tampering, as shown by the experimental results.

Figure 3.5 An example of experimental results. (a) An original blog article. (b) A user interface used to generate an authentication signal and a protected blog article. (c) The protected blog article with an authentication signal embedded. (d) The authentication report with the message “Authentication is successful.” (e) A protected blog article with a tampered word. (f) The verification result of the tampered blog article.

Figure 3.6 Another example of experimental results. (a) An original blog article. (b) The user interface with a secret key “NCTU”. (c) The protected blog article with an authentication signal embedded. (d) The authentication result by using the correct secret key. (e) The authentication result by using a wrong secret key.

Chapter 4

Covert Communication via the BBS Using Special Big-5 Codes

4.1 Introduction and Problem Definition

4.1.1 Introduction

In this chapter, we introduce the proposed data hiding methods for covert communication via the BBS. The problem definition is described in Section 4.1.2. In Section 4.2, the basic ideas of the proposed methods are described. Detailed data embedding and extraction algorithms are presented in Sections 4.3 and 4.4, respectively. In Section 4.5, experimental results showing the feasibility of the methods are given. Lastly, we briefly summarize the work in Section 4.6.

4.1.2 Problem Definition

The BBS (bulletin board system) is a popular interaction platform for discussions, entertainment, shopping, etc. Every day, numerous articles are published on BBSs. Thus, the BBS is an appropriate channel for covert communication. Furthermore, BBS administrators have the supreme authority to read or delete any article and even to read private mails or messages on the BBS, so covert communication via the BBS is not only appropriate but also necessary. The aim of this kind of covert communication is to send secret messages through the articles published on the BBS without arousing the suspicion of hackers. Accordingly, we develop two techniques for covert communication via the BBS by the use of special Big-5 codes in this study. They are introduced in the following sections.

4.2 Major Ideas of Proposed Methods by Use of Special Big-5 Codes

In this study, we propose two data hiding methods for covert communication via the BBS. One is to use invisible Big-5 codes; the other is to use special Big-5 space codes, and we generally call the two kinds of codes we use special Big-5 codes.

4.2.1 Data Hiding by Invisible Big-5 Codes

To achieve the goal of data hiding in BBS articles, at the beginning of this study we tried the technique of using invisible ASCII codes described in Chapter 3, because the Big-5 codes include the ASCII codes as the kernel set. However, invisible ASCII control codes are used to implement some system functions on the BBS, so we had to develop new data hiding techniques.

The first proposed new data hiding method via the BBS uses invisible Big-5 codes. In Taiwan, many BBSs like the PTT and school BBS sites are built on servers with the Big-5 coding format, so the first proposed technique is appropriate for them. Nowadays, most popular operating systems such as Windows XP and Windows 7 use the Unicode format as their text coding systems, because the Unicode is a universal and complete standard format. No matter what coding format is used for text, the text is transformed into the Unicode format by these operating systems when it is displayed on the screen. Take Windows XP as an example. This operating system contains many different conversion tables for transcoding between various text coding formats and the Unicode, and all text with the Big-5 format on the BBS is displayed on the screen in the Unicode format by referring to CodePage 950, which is a transcoding table between the Big-5 and the Unicode [17].

For this reason, we examined the mapping relationship between all Big-5 codes and Unicode codes, and discovered that some special Big-5 codes, which originally represent certain rarely-used Chinese or Japanese characters, are invisible and look just like white spaces when they are transcoded into the Unicode format and displayed on the BBS. This phenomenon results from the fact that the corresponding Unicode codes are located in the Unicode Private Use Area, which ranges from code E000 to code F8FF and does not contain any character assignment, so that no character code chart is provided for this area.
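Whether a transcoded character falls in a private-use area can be tested with the standard unicodedata module, whose general category for such characters is "Co". The following Python sketch assumes that a Big-5 to Unicode mapping is available as a dictionary; the function names and the two dictionary entries in the usage example are illustrative placeholders and are not taken from the experiments of this study.

```python
import unicodedata


def is_private_use(ch):
    """Return True if the character has the Unicode category Co (private use)."""
    return unicodedata.category(ch) == "Co"


def invisible_candidates(big5_to_unicode):
    """List the Big-5 codes whose Unicode counterparts are private-use characters.

    big5_to_unicode maps an integer Big-5 code to a one-character string
    (an assumed representation of a transcoding table).
    """
    return sorted(code for code, ch in big5_to_unicode.items() if is_private_use(ch))


if __name__ == "__main__":
    # Illustrative excerpt only: one ordinary character and one placeholder
    # entry pointing into the private-use area.
    sample = {0xA440: "\u4e00", 0xFA40: "\ue000"}
    print([f"{code:04X}" for code in invisible_candidates(sample)])
```

Codes found this way are only candidates; as the text goes on to explain, their actual appearance still has to be checked on each BBS browser.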

However, on some popular BBS browsers such as PCMan and Pietty, to help users read and type some special characters, certain of the above-mentioned special Big-5 codes are presented in their original appearances through the simulated Unicode compensation plan implemented by the BBS browser software. So, through continuous tests and observations in our experiments on popular BBS browsers, including PCMan, KKMan, Pietty, and the basic telnet connection program provided by Windows XP, we found 185 special Big-5 codes useful for our study, and we extended the 185 codes to a total of 256 symbols by padding a white space after each of the first 71 of them. The appearances of some of these symbols embedded in BBS articles on the above-mentioned browsers are shown in Figure 4.1, and the codes are listed in Table 4.1. Note that we have created an end signal which is composed of a special Big-5 code, FEAE, and the original white space.
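The arithmetic behind the 256 symbols is 185 + 71 = 256, so each symbol can carry one byte of secret data. The following Python sketch shows one way such a symbol table could be built. It assumes that the padded white space is the ASCII space 20 and that the padded symbols are placed after the 185 unpadded ones; neither the ordering nor the function name build_symbol_table is specified in the text, and the codes in the usage example are placeholders rather than the codes of Table 4.1.

```python
def build_symbol_table(codes, space=b" "):
    """Extend 185 two-byte codes to 256 symbols.

    The first 185 symbols are the codes themselves; the remaining 71 are the
    first 71 codes, each followed by a white space (assumed to be ASCII 20).
    """
    if len(codes) != 185:
        raise ValueError("expected exactly 185 special codes")
    padded = [code + space for code in codes[:71]]
    return list(codes) + padded


if __name__ == "__main__":
    # Placeholder two-byte values standing in for the real special codes.
    placeholder_codes = [(0xFA40 + i).to_bytes(2, "big") for i in range(185)]
    table = build_symbol_table(placeholder_codes)
    print(len(table))         # one symbol for every possible byte value
    print(table[0x41].hex())  # symbol that would represent the byte 0x41
```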

Figure 4.1 Stego-articles with some embedded invisible Big-5 symbols displayed on (a) PCMan, (b) KKMan, (c) Pietty, and (d) the telnet connection program, respectively.
