
Chapter 3 Authentication of Blog Articles by Invisible ASCII Control Codes

3.6 Summary

In this chapter, a method for authenticating the integrity and fidelity of blog articles by a new data hiding technique has been proposed. Authentication signals in the form of invisible ASCII control codes are generated from a folded version of a given blog article. They are embedded sequentially in the folded article according to pre-computed numbers of secret symbols in the lines. Even in the most unfavorable web browser, IE, the embedding result arouses no suspicion. A secret key is also used to randomize the content of the authentication signal, so that malicious users cannot easily forge the text content together with the corresponding authentication signal.

The proposed method reliably protects blog articles against tampering, as shown by the experimental results.

Figure 3.5 An example of experimental results. (a) An original blog article. (b) A user interface used to generate the authentication signal and the protected blog article. (c) The protected blog article with an authentication signal embedded. (d) The authentication report with the message “Authentication is successful.” (e) A protected blog article with a tampered word. (f) The verification result of the tampered blog article.

Figure 3.6 Another example of experimental results. (a) An original blog article. (b) The user interface with a secret key “NCTU”. (c) The protected blog article with an authentication signal embedded. (d) The authentication result obtained with the correct secret key. (e) The authentication result obtained with a wrong secret key.

Chapter 4 Covert Communication via the BBS Using Special Big-5 Codes

4.1 Introduction and Problem Definition

4.1.1 Introduction

In this chapter, we introduce the proposed data hiding methods for covert communication via the BBS. The problem definition is given in Section 4.1.2. In Section 4.2, the basic ideas of the proposed methods are described. Detailed data embedding and extraction algorithms are presented in Sections 4.3 and 4.4, respectively. In Section 4.5, experimental results showing the feasibility of the methods are given. Lastly, we briefly summarize our work in Section 4.6.

4.1.2 Problem Definition

The BBS (bulletin board system) is a popular interaction platform for discussion, entertainment, shopping, etc. Every day, numerous articles are published on BBS’s. Thus, the BBS is an appropriate channel for covert communication. Furthermore, BBS administrators have the supreme authority to read or delete any article, and even to read private mails or messages on the BBS, so covert communication via the BBS is not only appropriate but also necessary. The aim of this kind of covert communication is to send secret messages through the articles published on the BBS without arousing the suspicion of hackers. Accordingly, we develop in this study two techniques for covert communication via the BBS by the use of special Big-5 codes. They are introduced in the following sections.

4.2 Major Ideas of Proposed Methods by Use of Special Big-5 Codes

In this study, we propose two data hiding methods for covert communication via the BBS. One uses invisible Big-5 codes; the other uses special Big-5 space codes. We refer to the two kinds of codes collectively as special Big-5 codes.

4.2.1 Data Hiding by Invisible Big-5 Codes

To achieve the goal of data hiding in BBS articles, we first tried the technique of using invisible ASCII control codes described in Chapter 3, because the ASCII codes are compatible with the Big-5 codes as the kernel set. However, invisible ASCII control codes are used to implement some system functions on the BBS, so we had to develop a new data hiding technique.

The first proposed data hiding method for the BBS uses invisible Big-5 codes. In Taiwan, many BBS’s, such as the PTT and school BBS sites, are built on servers with the Big-5 coding format, so the first proposed technique is appropriate for them. Nowadays, most popular operating systems, such as Windows XP and Windows 7, use the Unicode format as their text coding system, because the Unicode is a universal and complete standard format. No matter what coding format is used for a text, these operating systems transform it into the Unicode format when it is displayed on the screen. Take Windows XP as an example. This operating system contains many conversion tables for transcoding between various text coding formats and the Unicode, and all text with the Big-5 format on the BBS is displayed on the screen in the Unicode format by referring to CodePage 950, which is a transcoding table between the Big-5 and the Unicode [17].

For this reason, we examined the mapping between all Big-5 codes and Unicode codes, and discovered that some special Big-5 codes, which originally represent certain rarely-used Chinese or Japanese characters, are invisible and look just like white spaces when they are transcoded into the Unicode format and displayed on the BBS. This phenomenon results from the fact that the corresponding Unicode codes are located in the Unicode Private Use Area, which ranges from code E000 to code F8FF and contains no character assignments, so that no character code chart is provided for this area.
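The check described above can be sketched in Python as follows. This is only an illustrative sketch: the helper names are hypothetical, and the decoding relies on Python's built-in cp950 codec, whose coverage of user-defined Big-5 codes may differ from the Windows CodePage 950 table used in this study, so a code that fails to decode is reported rather than treated as an error.

```python
import unicodedata


def is_private_use(ch: str) -> bool:
    """Return True if ch is a private-use code point (general category 'Co')."""
    return unicodedata.category(ch) == "Co"


def classify_big5(code: bytes) -> str:
    """Decode one two-byte Big-5 code and report where it lands in Unicode."""
    try:
        ch = code.decode("cp950")
    except UnicodeDecodeError:
        # ASSUMPTION: Python's codec may simply not map this code.
        return "not mapped by this codec"
    if len(ch) != 1:
        return "not a single character"
    if is_private_use(ch):
        return f"private use (U+{ord(ch):04X})"
    return f"assigned character (U+{ord(ch):04X})"


if __name__ == "__main__":
    # A440 is an ordinary Chinese character; FEAE is the code used for the end signal.
    for code in (b"\xa4\x40", b"\xfe\xae"):
        print(code.hex().upper(), "->", classify_big5(code))
```

A code whose Unicode counterpart is reported as private use is a candidate invisible code; whether it is actually rendered as a blank still has to be verified on each BBS browser.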

However, on some popular BBS browsers such as PCMan and Pietty, certain of the above-mentioned special Big-5 codes are presented in their original appearances through the simulated Unicode compensation plan implemented by the BBS browser software, in order to help users read and type some special characters. Therefore, through repeated tests and observations on popular BBS browsers, including PCMan, KKMan, Pietty, and the basic telnet connection program provided by Windows XP, we found 185 special Big-5 codes that are useful for our study. We extended the 185 codes to a total of 256 symbols by padding a white space after each of the first 71 of them. The appearances of some of these symbols embedded in BBS articles on the above-mentioned browsers are shown in Figure 4.1, and the codes are listed in Table 4.1. Note that we have created an end signal which is composed of a special Big-5 code, FEAE, and the original white space.

Figure 4.1 Stego-articles with some embedded invisible Big-5 symbols displayed on (a) PCMan, (b) KKMan, (c) Pietty, and (d) the telnet connection program, respectively.

Table 4.1 Encoding table for the invisible Big-5 codes used. (Note: the end signal is composed of the special Big-5 code FEAE and the original white space.)
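The extension from 185 codes to 256 symbols, so that every 8-bit segment has its own symbol, can be sketched in Python as follows. The entries of Table 4.1 are not reproduced here: the list PLACEHOLDER_CODES is synthetic, and the ordering (the 185 bare codes first, then the 71 padded ones) as well as all function names are assumptions made only for illustration.

```python
SPACE = b" "  # the original white space code


def build_symbol_table(codes):
    """Extend 185 two-byte codes to 256 symbols (185 + 71 = 256).

    ASSUMPTION: symbols 0..184 are the bare codes and symbols 185..255 are
    the first 71 codes, each followed by one white space.
    """
    if len(codes) != 185 or len(set(codes)) != 185:
        raise ValueError("expected 185 distinct codes")
    padded = [code + SPACE for code in codes[:256 - len(codes)]]
    return list(codes) + padded


def encode_bytes(data: bytes, table) -> bytes:
    """Map every byte (8-bit segment) to its invisible symbol."""
    return b"".join(table[byte] for byte in data)


def decode_symbols(stream: bytes, table) -> bytes:
    """Inverse mapping: read a two-byte code plus an optional padding space."""
    index = {symbol: value for value, symbol in enumerate(table)}
    out = bytearray()
    pos = 0
    while pos < len(stream):
        symbol = stream[pos:pos + 2]
        pos += 2
        if stream[pos:pos + 1] == SPACE:
            symbol += SPACE
            pos += 1
        out.append(index[symbol])
    return bytes(out)


# Synthetic stand-ins, NOT the real codes of Table 4.1.
PLACEHOLDER_CODES = [bytes([0xFA, 0x40 + i]) for i in range(185)]

if __name__ == "__main__":
    table = build_symbol_table(PLACEHOLDER_CODES)
    print(len(table))
    print(decode_symbols(encode_bytes(b"secret", table), table))
```

Because a white space never stands alone as a symbol, a space that follows a code can always be attributed to that code, which keeps the decoding unambiguous in this sketch.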

4.2.2 Data Hiding by Special Big-5 Space Codes

We have also proposed another data hiding technique for the BBS by the use of some special Big-5 space codes. In this method, two kinds of Big-5 codes are used, one being the original white space code and the other a Big-5 space code. Because the two codes are both included in the Big-5 standard, and appear to be invisible when they are displayed on BBS browsers, we can use them to achieve the aim of data hiding in the BBS under most general operating systems by assembling them in a proper order.

On the BBS, many users are accustomed to publishing articles with alternate blank lines, and this habit makes it easier for us to hide secret messages before the line feed code at the end of each line. So, we tried to devise a scheme that uses the two mentioned invisible codes efficiently, in order to obtain the largest utilization ratio of the blank spaces in each line.

More specifically, there are two kinds of character lengths on the BBS. One occupies a single unit, which is defined to be the same as the unit mentioned in Algorithm 3.1, like the original white space; the other is two units long, like the special Big-5 space code. For the variability and efficiency of using the Big-5 symbols for data hiding, we allow a symbol to be a special Big-5 space code alone, a combination of a special Big-5 space code and one white space, or a combination of a special Big-5 space code and several white spaces, as shown in Table 4.2, which serves as the encoding table for the special Big-5 space codes used. Here, efficiency is judged by the average number of units required for hiding one bit.

In our study, we use only four types of symbols (apart from the end signal): type 1, a special Big-5 space code alone; type 2, a special Big-5 space code followed by one white space; type 3, a special Big-5 space code followed by two white spaces; and type 4, a special Big-5 space code followed by three white spaces.

Although more types of symbols can be created by the same logic, namely by padding more white space codes after the special Big-5 space code, we can prove that the four symbols listed in Table 4.2 have the largest efficiency, as discussed next.

Table 4.2 Encoding table for used special Big-5 space codes.

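Assume that all symbols appear with identical probability and that each symbol represents n embedded bits, so that 2^n types of symbols are needed. Let u_i and p_i denote the number of occupied units and the appearance probability of the i-th type of symbol, as defined in Table 4.2, respectively. Then the efficiency E of the symbols is measured by the average number of units f(n) required to hide one bit (the smaller f(n) is, the larger the efficiency):

$$ f(n) = \frac{1}{n} \sum_{i=1}^{2^n} u_i p_i. \qquad (1) $$

Also, under the assumption that all the symbols have equal appearance probabilities, we may substitute the real values u_i = i + 1 and p_i = 1/2^n into (1) above to obtain

$$ f(n) = \frac{1}{n} \sum_{i=1}^{2^n} \frac{i + 1}{2^n}, \qquad (2) $$

which can easily be reduced to

$$ f(n) = \frac{2^n + 3}{2n}. \qquad (3) $$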

Bit stream (binary) | 00 | 01 | 10 | 11 | End signal
Special Big-5 codes (embedded symbol) | S | S W | S W W | S W W W | S LF
Occupied units | two | three | four | five | two

(Note: S: special Big-5 space code. W: original white space code. LF: line feed code.)

Differentiating f(n) in (3) with respect to n gives

$$ g(n) = \frac{d f(n)}{d n} = \frac{2^{n+1} n \ln 2 - 2^{n+1} - 6}{4 n^2}. \qquad (4) $$

Setting g(n) = 0 in (4) above results in the following equation:

$$ 2^{n+1} n \ln 2 - 2^{n+1} - 6 = 0, \qquad (5) $$

which can be solved to get the extreme value for n. However, because the number of bits must be an integer and no integer satisfies Eq. (5), we take n to be 2, for which g(n) is closest to zero. Alternatively, from Eq. (3) we have f(n) = 5/2, 7/4, and 11/6 for n = 1, 2, and 3, respectively. Since 5/2 > 7/4 < 11/6, we see that n = 2 is indeed the optimal value that minimizes f(n), i.e., that yields the smallest average number of units, 7/4, required for hiding one bit. This completes the proof that the four symbols (apart from the end signal) listed in Table 4.2 are optimal, i.e., that they yield the largest coding efficiency.
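The values quoted above can be checked numerically with a short Python sketch. It assumes, following the pattern of Table 4.2, that the i-th of the 2^n equally likely symbols occupies i + 1 units; the function name is ours and serves only as an illustration.

```python
from fractions import Fraction


def average_units_per_bit(n: int) -> Fraction:
    """f(n): average number of units needed to hide one bit with 2**n symbols.

    ASSUMPTION: the i-th symbol (i = 1 .. 2**n) occupies i + 1 units and all
    symbols are equally likely, as in Table 4.2.
    """
    num_symbols = 2 ** n
    total_units = sum(i + 1 for i in range(1, num_symbols + 1))
    return Fraction(total_units, num_symbols * n)


if __name__ == "__main__":
    for n in range(1, 6):
        f_n = average_units_per_bit(n)
        # The direct sum agrees with the closed form (2**n + 3) / (2 * n).
        assert f_n == Fraction(2 ** n + 3, 2 * n)
        print(n, f_n)
```

Exact fractions are used instead of floating-point numbers so that the comparison between 5/2, 7/4, and 11/6 is free of rounding effects.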

Some examples of BBS articles in which secret messages are embedded by the proposed method are given in Figure 4.2. Note that the end signal is composed of a special Big-5 space code and a line feed code rather than a white space code and a line feed code, since all white spaces between any other code and the line feed are removed when an article is published on the BBS.

Because secret messages embedded by the two proposed methods are almost imperceptible on the BBS, even when a user highlights BBS articles with the mouse, we can use the methods to achieve the goal of covert communication on the BBS. The detailed algorithms for embedding and extracting the secret message are described in the subsequent sections.

Figure 4.2 Stego-articles with secret messages embedded, displayed on some well-known BBS browsers. (a) On PCMan. (b) On KKMan. (c) On Pietty. (d) On the telnet connection program.

4.3 Proposed Algorithm for Data Embedding

In this section, the process of covert communication via the BBS by the two proposed methods is first illustrated by the flow chart shown in Figure 4.3. In the process, we first fold longer article lines into shorter ones, leaving at least eight characters at each line end as a data embedding slot. Next, we use a secret key as a seed to randomize the content of the secret message which we want to embed in a cover article for covert communication. Then, we map the randomized message to the corresponding invisible symbols according to the user’s choice. If method 1 as mentioned previously is chosen, we conduct the mapping by referring to Table 4.1; otherwise, when method 2 is chosen, we replace each special Big-5 space code in the cover article with two original white space codes, so that the process of data extraction in the next section can be performed correctly, and conduct the mapping by referring to Table 4.2. Finally, we sequentially embed the symbols obtained from the mapping into the folded article to obtain a stego-article with the randomized secret message hidden imperceptibly.

The algorithm for conducting this process is described in the following, in which a line in a BBS article means a number of characters in a row with an LF appended to the end of the line.

Algorithm 4.1 Data embedding for covert communication.

Input: a secret message S, a secret key K, and a cover BBS article B.

Output: a stego-article.

Steps.

Figure 4.3 Flow chart of proposed process of embedding secret messages.

1. Fold sequentially each text line l_i with a length larger than 70 units (a unit being the length of an ASCII code displayed on the BBS) in BBS article B into 70-unit lines by inserting a line feed, denoted as LF and occupying zero units, after the original 70th character in l_i, to generate a folded article, denoted as F.

2. Compute the size L_i of the data embedding slot at the end of each text line l_i in F by:

L_i = 78 − the length of l_i,

which is the maximum number of units that can be inserted at the end of l_i.

3. Use the secret key K as a seed to generate a sequence Q of random numbers.

4. Randomize the input secret message S with Q to get a randomized message S

5. Choose a method to hide S:

(1) If method 1 is chosen, then perform Step 6.

(2) If method 2 is chosen, then go to Step 9.

6. (Method 1) Separate the bits of S into 8-bit segments and map them to invisible symbols t_1, t_2, …, t_k, respectively, according to Table 4.1.

7. Let |l| be the total number of lines, |T| be the total number of symbols t_1 through t_k, and U_{t_1}, U_{t_2}, …, U_{t_k} be the numbers of units occupied by t_1 through t_k, respectively.

8. Embed the invisible symbols obtained in Step 6 sequentially into the folded article F from the first line (that is, take the index number i of l_i and the index number k of t_k both to be 1 initially), and then conduct the following steps.

8.1 If i ≤ |l|, then perform one of the following three operations at the end of l_i; otherwise, perform Step 8.2.

(1) If k ≤ |T| and L_i − U_{t_k} ≥ 2, then embed t_k in the data embedding slot of l_i, decrement L_i by U_{t_k}, increment k by 1, and repeat Step 8.1.

(2) If k ≤ |T| and L_i − U_{t_k} < 2, then scan l_i to find the line feed LF, remove it, embed an end signal in the data embedding slot of l_i, increment i by 1, and repeat Step 8.1.

(3) If k > |T|, then embed an end signal in the data embedding slot of l_i, and go to Step 13.

8.2 Embed the remaining symbol or symbols below F as one or more blank lines with an end signal appended at each line end, and go to Step 13.

9. (Method 2) Replace each special Big-5 space code in the folded article F with two white space codes.

10. Separate the bits of S into 2-bit segments and map them to invisible symbols p_1, p_2, …, p_k, respectively, according to Table 4.2.

11. Let |l| be the total number of lines, |P| be the total number of symbols p_1 through p_k, and U_{p_1}, U_{p_2}, …, U_{p_k} be the numbers of units occupied by p_1 through p_k, respectively.

12. Embed the invisible symbols sequentially into the folded article F from the first line (that is, take the index number i of l_i and the index number k of p_k both to be 1 initially), and then conduct the following steps.

12.1 If i ≤ |l|, then perform one of the following three operations at the end of l_i; otherwise, perform Step 12.2.

(1) If k ≤ |P| and L_i − U_{p_k} ≥ 2, then embed p_k in the data embedding slot of l_i, decrement L_i by U_{p_k}, increment k by 1, and repeat Step 12.1.

(2) If k ≤ |P| and L_i − U_{p_k} < 2, then scan l_i to find the line feed LF, remove it, embed an end signal in the data embedding slot of l_i, increment i by 1, and repeat Step 12.1.

(3) If k > |P|, then embed an end signal in the data embedding slot of l_i, and perform Step 13.

12.2 Embed the remaining symbol or symbols below F as one or more blank lines with an end signal at each line end, and continue with Step 13.

13. Take the final version of F as the desired stego-BBS article.
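As a rough illustration of Algorithm 4.1 for method 2, a minimal Python sketch is given below. It is not the implementation used in our experiments and rests on several assumptions: the cover text consists of one-unit characters, the character U+3000 stands in for the special Big-5 space code, the randomization is realized as a key-seeded permutation of the message bits, and the slot test reserves two units for the end signal. All function and constant names are hypothetical.

```python
import random

SPECIAL = "\u3000"  # ASSUMPTION: stand-in for the special Big-5 space code (2 units)
SPACE = " "         # original white space code (1 unit)
SYMBOLS = {"00": SPECIAL, "01": SPECIAL + SPACE,
           "10": SPECIAL + 2 * SPACE, "11": SPECIAL + 3 * SPACE}
END_SIGNAL = SPECIAL  # followed by the line feed that closes the line
FOLD_WIDTH, LINE_WIDTH = 70, 78


def units(text: str) -> int:
    """Number of BBS units occupied by text (special space code: 2, others: 1)."""
    return sum(2 if ch == SPECIAL else 1 for ch in text)


def fold(lines):
    """Step 1: break every line longer than 70 units into 70-unit lines."""
    folded = []
    for line in lines:
        while len(line) > FOLD_WIDTH:
            folded.append(line[:FOLD_WIDTH])
            line = line[FOLD_WIDTH:]
        folded.append(line)
    return folded


def randomize_bits(message: bytes, key: str) -> str:
    """Steps 3 and 4: permute the message bits with a key-seeded generator."""
    bits = "".join(f"{byte:08b}" for byte in message)
    order = list(range(len(bits)))
    random.Random(key).shuffle(order)
    return "".join(bits[j] for j in order)


def embed(message: bytes, key: str, cover_lines):
    """Steps 9 to 13 (method 2): return the lines of the stego-article."""
    # Step 9 is applied before folding here so that len() measures units.
    lines = fold([ln.replace(SPECIAL, 2 * SPACE) for ln in cover_lines])
    bits = randomize_bits(message, key)
    symbols = [SYMBOLS[bits[j:j + 2]] for j in range(0, len(bits), 2)]  # Step 10
    stego, i, k = [], 0, 0
    while True:
        line = lines[i] if i < len(lines) else ""  # Step 12.2: extra blank lines
        slot = LINE_WIDTH - len(line)              # Step 2
        hidden = ""
        while k < len(symbols) and slot - units(symbols[k]) >= 2:
            hidden += symbols[k]
            slot -= units(symbols[k])
            k += 1
        stego.append(line + hidden + END_SIGNAL)
        i += 1
        if k == len(symbols):
            return stego + lines[i:]  # remaining cover lines stay unchanged


if __name__ == "__main__":
    for line in embed(b"Hi", "NCTU", ["x" * 100, "short line"]):
        print(repr(line))
```

Each line that carries hidden symbols ends with the end signal, so a receiver can tell where the embedded data of a line stops, and no stego line exceeds 78 units.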

4.4 Proposed Algorithm for Data Extraction

In this section, we introduce the process for extraction of secret data. A flow chart of the process is shown in Figure 4.4. First, we extract the invisible symbols embedded in a stego-article. Next, we conduct different processes according to the method adopted for embedding the invisible symbols. If method 1 was used, we map the symbols into 8-bit segments by referring to Table 4.1; otherwise, when method 2 was used, we map the symbols into 2-bit segments by referring to Table 4.2. Then, we concatenate the segments into a random message. Finally, by using the same key that was used for embedding the message, we can recover the correct secret message. The detailed algorithm for extraction of the secret message is described in the following.

Figure 4.4 Flow chart of proposed data extraction process

Algorithm 4.2 Data extraction for covert communication.

Input: a stego-BBS article B and the secret key K used in Algorithm 4.1.

Output: a secret message S.

Steps.

1. Check each line l_i in the stego-BBS article B sequentially, starting from the first line, and extract the invisible symbols embedded in front of the end signal in l_i.

2. Transform the extracted symbols according to the method used for embedding the secret message to be extracted.

(1) If method 1 is used, map them into 8-bit segments t_1, t_2, …, t_k, respectively, by referring to Table 4.1.

(2) If method 2 is used, map them into 2-bit segments p_1, p_2, …, p_k, respectively, by referring to Table 4.2.

3. Concatenate the extracted segments into a random message Q.

4. Use the secret key K to reorder Q to obtain a result as the desired secret message S.
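The extraction steps for method 2 can be sketched in Python in the same spirit. The sketch assumes that U+3000 stands in for the special Big-5 space code, that the special code occurs only in embedded data (which Step 9 of the embedding algorithm ensures), and that the randomization was a key-seeded permutation of the message bits; all names are hypothetical.

```python
import random

SPECIAL = "\u3000"  # ASSUMPTION: stand-in for the special Big-5 space code
SPACE = " "         # original white space code
BITS_BY_PADDING = {0: "00", 1: "01", 2: "10", 3: "11"}  # Table 4.2, read backwards


def symbols_in_line(line: str):
    """Step 1 and Step 2(2): 2-bit segments hidden in front of the end signal."""
    start = line.find(SPECIAL)
    if start < 0:
        return []  # nothing is embedded in this line
    # Text that follows each special code; the last special code is the end signal.
    paddings = line[start:].split(SPECIAL)[1:-1]
    segments = []
    for padding in paddings:
        if padding.strip(SPACE) or len(padding) > 3:
            raise ValueError("malformed embedded symbol")
        segments.append(BITS_BY_PADDING[len(padding)])
    return segments


def unrandomize_bits(bits: str, key: str) -> str:
    """Step 4: undo the key-seeded permutation applied before embedding."""
    order = list(range(len(bits)))
    random.Random(key).shuffle(order)
    original = [""] * len(bits)
    for position, source in enumerate(order):
        original[source] = bits[position]
    return "".join(original)


def extract(stego_lines, key: str) -> bytes:
    """Steps 1 to 4 for method 2: recover the secret message as bytes."""
    bits = "".join(seg for line in stego_lines for seg in symbols_in_line(line))
    plain = unrandomize_bits(bits, key)  # Step 3 is the join above
    return bytes(int(plain[j:j + 8], 2) for j in range(0, len(plain), 8))


if __name__ == "__main__":
    # Two symbols (00 and 01) followed by the end signal.
    print(symbols_in_line("abc" + SPECIAL + SPECIAL + SPACE + SPECIAL))
```

Since the permutation depends on the number of extracted bits as well as on the key, a wrong key or a damaged stego-article generally yields a different bit order and therefore an unreadable message.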

4.5 Experimental Results

A series of experiments have been conducted to test the proposed algorithms for covert communication via the BBS under the popular software environments of PCMan, KKMan, and Pietty, and the operating system of the traditional Chinese version of Microsoft Windows XP, Service Pack 3, 2002. In the following, we show some of the experimental results. To embed a message, we typed a secret key and a secret message, as shown in Figure 4.5(b). Then, we just had to select a hiding method, highlight a cover article with the mouse, and press the hiding button, as shown in Figure 4.5(c). In this way, we obtained a stego-BBS article with a secret message embedded, and the appearance of the stego-article is shown in Figure 4.5(d). Later, we extracted the secret message by typing the correct secret key, selecting the same method, highlighting the stego-article, and pressing the extraction button, as shown in Figure 4.5(e). On the other hand, as shown in Figure 4.5(f), when the typed secret key was wrong, the correct secret message could not be obtained.

Another example of our experimental results is shown in Figure 4.6.

4.6 Summary

In this chapter, two new methods of data hiding using special Big-5 codes in BBS articles have been proposed for covert communication. One is appropriate for the operating systems with the Unicode standard as the kernel set and using the
