Security Consideration and Experimental Results

Secret Communication through Web Pages and Automatic Authentication of Web Pages

Algorithm 10.2. Extraction of a secret message

10.5 Security Consideration and Experimental Results

The simple processes described above, however, have several security weaknesses from the viewpoint of automatic authentication without human involvement. These weaknesses are discussed in the following, together with proposed solutions for removing them.

(1) Word position disordering and replacement of entire web page contents --- A hacker who knows the above processes (including the function h used), as is usually assumed in information hiding studies, may corrupt the web page content simply by exchanging the order of the words (each word being assumed to include the embedded space code next to it). Because each authentication signal depends only on its own word, such a falsified web page still passes the authentication process. Even worse, the hacker may replace the entire text content of a web page and recompute and embed the authentication signals for all the new words. Such a fake web page obviously also passes the above authentication. We propose to solve these problems by first putting the words into a certain order and then generating a series of corresponding random numbers, one for each word, to compute the authentication signals by the mapping $s_i = h(w_i, k_i)$, where $k_i$ is the random number generated for the word $w_i$. The random numbers are generated by a function which takes a secret key as the seed. In this way, a web page with a changed word order cannot pass the authentication process, because a word w whose position has changed is now given a different random number, so that the computed numerical authentication signal will in general differ from the previously embedded one. Likewise, a hacker’s replacement of the entire text content of a web page, with embedded authentication signals computed without the key, will no longer pass the authentication process.
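The position-dependent scheme of item (1) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the use of Python's `random.Random` as the key-seeded generator, and the choice of a truncated SHA-256 digest as the 3-bit mapping h are all assumptions made for the example.

```python
import hashlib
import random


def position_keys(secret_key, count):
    """Generate one random number k_i per word position from the secret key.

    Assumption: random.Random stands in for the key-seeded generator; a real
    deployment would want a cryptographically strong generator.
    """
    rng = random.Random(secret_key)
    return [rng.getrandbits(32) for _ in range(count)]


def h(word, k):
    """Hypothetical mapping h(w, k): a 3-bit signal in the range 0..7."""
    digest = hashlib.sha256(f"{k}:{word}".encode("utf-8")).digest()
    return digest[0] & 0b111


def sign(words, secret_key):
    """Compute s_i = h(w_i, k_i) for every word in its given order."""
    keys = position_keys(secret_key, len(words))
    return [h(w, k) for w, k in zip(words, keys)]


def verify(words, signals, secret_key):
    """Return, per position, whether the embedded signal matches h(w_i, k_i)."""
    expected = sign(words, secret_key)
    return [s == e for s, e in zip(signals, expected)]
```

Because k_i depends on the position i, a word moved to another position is checked against a different random number, and a party without the secret key cannot recompute the signals for replaced text. With only three bits per word an accidental match remains possible, which is the weakness taken up next.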

(2) Guessing of authentication signals without a key --- The above modified process of authentication signal generation still has a weakness: the generated authentication signal for each word is a 3-bit number, encoded into one of the eight space codes, so the probability of guessing it correctly is 1/8. That is, after inserting a replacement word, the hacker needs at most eight guesses of its authentication signal to pass the authentication of that word. This is not secure enough. As a remedy, we propose to allow the mapping function h(w, k) to yield a numerical authentication signal which, when expressed in binary, has more than three bits. For example, if we allow h to yield 12 bits, which are encoded three at a time into four space codes, then we may use four words, whose right-hand white spaces hold the four space codes. This multiple word encoding is equivalent to regarding four words as a single one by concatenating them. More generally, to yield 3n bits as the numerical authentication signal, we regard every n words as a single word w in computing the authentication signal s = h(w, k). The signal s is encoded into n space codes, which are then embedded at the right-hand sides of the n words. Additionally, the mapping h may be taken to be any reasonable function, such as one of the various existing hashing algorithms. We may even adopt the well-known Secure Hash Algorithm SHA-1 as h with 54 words as input, and use a secret key as the seed to generate random numbers as its initial values. The algorithm yields 160 bits as output, to which we may append two 0 bits. We then encode the resulting 162 bits into 54 space codes (54 = 162/3) and embed the codes at the right-hand sides of the 54 words. The security of the protected 54 words will then be very high.
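The 54-word SHA-1 example can be sketched as below. Python's `hashlib` does not expose SHA-1's initial values, so this sketch substitutes HMAC-SHA1 as the way of keying the hash; that substitution, the function name, and joining the words with single spaces are assumptions of the example and not part of the method described above.

```python
import hashlib
import hmac

GROUP_SIZE = 54  # 54 words carry 54 three-bit space-code indices


def group_signal(words, key):
    """Map a group of 54 words to 54 three-bit values (0..7).

    Assumption: HMAC-SHA1 replaces "SHA-1 with key-derived initial values".
    """
    if len(words) != GROUP_SIZE:
        raise ValueError("expected exactly 54 words")
    # Treat the whole group as a single word by concatenation.
    message = " ".join(words).encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha1).digest()  # 160 bits
    # Append two 0 bits so that the length becomes 162 = 3 * 54.
    bits = int.from_bytes(digest, "big") << 2
    # Split into 54 groups of three bits, most significant group first.
    return [(bits >> (3 * (GROUP_SIZE - 1 - i))) & 0b111
            for i in range(GROUP_SIZE)]
```

Each returned value would then be translated into a space code by Table 10.1 and embedded to the right of the corresponding word. Since the last group holds one hash bit followed by the two appended zeros, its value is always 0 or 4.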

For illustration, we report one simple experiment among those we have conducted, in which random numbers were not used in computing the authentication signals.

The text in the HTML file to be protected includes three text lines: “Personal Data:”, “Name: I-Shi Lee, Mr.”, and “Tel: (09)8672555.” The corresponding web page seen in the IE window is shown in Figure 10.4(a). We regard a punctuation mark following a word as part of the word, and adopted a simple mapping function h which treats two words as a single one, adds up the decimal values of the ASCII codes of all the characters in them to obtain a sum S, takes the modulo-64 value M of S as a 6-bit numerical authentication signal s, and encodes M as two 3-bit numbers into two space codes by Table 10.1 as the symbolic authentication signal. These two space codes finally replace the two normal space codes 20 (hexadecimal) located to the right of the two words. That is, if the two words are $w_1 = c_{11}c_{12} \ldots c_{1n_1}$ and $w_2 = c_{21}c_{22} \ldots c_{2n_2}$, with the $c_{ij}$ being their ASCII codes and the $d_{ij}$ the corresponding decimal values, then we compute s as

$$s = h(w_1, w_2) = (d_{11} + d_{12} + \cdots + d_{1n_1} + d_{21} + d_{22} + \cdots + d_{2n_2}) \bmod 64 = b_1 b_2 \ldots b_6,$$

with $b_1 b_2 b_3$ encoded into one space code and $b_4 b_5 b_6$ into another. After all the symbolic authentication signals for the word pairs were computed in this way and embedded appropriately, the resulting web page, as viewed in the IE window, appears as in Figure 10.4(b), which looks no different from that shown in Figure 10.4(a). Figure 10.4(c) shows the corresponding stego-HTML codes in the FrontPage window, which can be seen to include all the space codes. To simulate web page intrusion and modification, the last name “Lee” in the second line was replaced by another, “Lin.” After the authentication process was performed, the word pair “Lin, Mr.” was found to have been tampered with, and so was marked in bold italic, as shown in Figure 10.4(d). Our other experimental results likewise show the feasibility of the proposed method.
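The word-pair mapping used in this experiment is simple enough to check by hand. The sketch below (the function name is ours, and the translation of the 3-bit values into actual space codes by Table 10.1 is left out) computes the modulo-64 signal for the original and the tampered word pair.

```python
def pair_signal(word1, word2):
    """Sum the ASCII values of both words, reduce modulo 64, split into 3+3 bits."""
    total = sum(ord(c) for c in word1 + word2)
    m = total % 64                # 6-bit numerical authentication signal
    return m, (m >> 3, m & 0b111)  # (b1b2b3, b4b5b6) as two 3-bit numbers


# Original pair: "Lee," sums to 322 and "Mr." to 237, so S = 559.
print(pair_signal("Lee,", "Mr."))  # (47, (5, 7))
# Tampered pair: "Lin," sums to 335, so S = 572.
print(pair_signal("Lin,", "Mr."))  # (60, (7, 4))
```

The space codes embedded for “Lee, Mr.” encode (5, 7), whereas the tampered pair “Lin, Mr.” maps to (7, 4), so the mismatch is detected for this pair.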

A problem mentioned previously which needs to be solved is that the two space codes &#32 and &#160, after being inserted, should not be followed by digits; otherwise, they will be read as codes with more digits instead of 32 and 160. One way out is to append to either of &#32 and &#160 one additional space code other than these two to remove this ambiguity, and to decode the resulting code pair as the first code only, which can still be done uniquely.
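The ambiguity and the proposed remedy can be illustrated with a short sketch. The helper names are hypothetical, the terminating code `&nbsp;` is an assumed choice (Table 10.1 is not reproduced here), and Python's `html.unescape` is used only as a stand-in for how a browser reads character references.

```python
import html

AMBIGUOUS = ("&#32", "&#160")  # codes written without a closing semicolon
TERMINATOR = "&nbsp;"          # assumed: any space code other than the two above


def emit_space_code(code, following_text):
    """Append a terminating space code when a digit would extend the reference."""
    if code in AMBIGUOUS and following_text[:1].isdigit():
        return code + TERMINATOR
    return code


def strip_terminator(embedded):
    """Decode a code pair as its first code only."""
    for code in AMBIGUOUS:
        if embedded == code + TERMINATOR:
            return code
    return embedded


def decoded_text(markup):
    """Stand-in for browser parsing of character references."""
    return html.unescape(markup)
```

For instance, `decoded_text("&#32" + "5")` reads the reference as character 325 and not as a space followed by “5”, while `decoded_text(emit_space_code("&#32", "5") + "5")` keeps the space, the terminating code, and the digit apart.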

(a) Original web page seen in IE.

(b) Web page with embedded authentication signals (space codes) seen in IE.

Figure 10.4 An experimental result of authentication of a modified web page.

(d) Web page with detected modified word pair “Lee, Mr.” marked as bold italic.

Figure 10.4 An experimental result of authentication of a modified web page (continued).

10.6 Concluding Remarks

A new secret communication method via web pages using special space codes in HTML files has been proposed. These codes appear as white spaces in the web page, and so may be used to encode secret message bits with a steganographic effect. The codes are the result of a thorough investigation of all possible coding systems which can be applied in HTML files. The character string of each message, before being embedded, is randomized with a secret key to enhance the security against illegal interception and extraction. The original message embedded in the HTML text cannot be destroyed unless the web page server is intruded upon. Our experimental results show that the proposed method is feasible.

Also, an automatic authentication method for verifying a web page against illegal modifications of the words in the text of the web page has been proposed. The special space codes are used to encode binary mapping results from the word contents as authentication signals, and are embedded at between-word spaces in the HTML codes. Security enhancement techniques to prevent illicit word tampering and guessing of authentication signals have also been proposed, including the use of secret keys and the scheme of multiple word encoding. Experimental results show the feasibility of the proposed method.

Future research may be directed to utilizing the space codes in other data hiding applications, further improving the security of the proposed method, and applying the space codes to other purposes, such as copy protection.

From: A Study on Techniques and Applications of Data Hiding in Image and Text Files (pp. 195-200)