Major Idea of Proposed Method by Use of Special UTF-8 Codes

Chapter 6 Email Authentication by Special UTF-8 Space Codes

6.2 Major Idea of Proposed Method by Use of Special UTF-8 Codes

In Chapter 3, we described how to authenticate blog articles on popular web browsers by using special ASCII control codes, which become invisible after being embedded into blog articles. Given this success, we naturally tried to apply the same data hiding technique to webmail for email authentication.

However, on some popular webmail platforms such as G-mail, when we send a stego-email with special ASCII control codes embedded, all the codes are removed, so nothing can be extracted later to verify the mail.

Nevertheless, we went on to try the method proposed in Chapters 4 and 5, which uses special invisible Big-5 codes to achieve covert communication via BBS articles and authentication of BBS articles. This time the special codes are preserved through the mail sending and receiving processes conducted on webmail platforms. Unfortunately, on some popular web browsers such as Mozilla Firefox and Google Chrome, these codes are revealed and appear as special patterns supplied by the browsers. For example, the special Big-5 code “FDEA” is transcoded to the corresponding Unicode code “E25F” and displayed graphically as a browser-specific glyph on Firefox and on Chrome. Thus neither of the above two data hiding methods is appropriate for webmail authentication.

Finally, we tried using some special UTF-8 space codes for email authentication, and this time we succeeded. The idea is inspired by the data hiding technique for the BBS that uses special Big-5 space codes. Specifically, we found the UTF-8 code “E38080” useful for our purpose. It is transcoded from a Big-5 space code and is a standard Unicode code. Because it is located in the normal character area, is rendered as a space, and is invisible when displayed on browsers, we can combine it with white spaces to form special symbols for data hiding in emails. The UTF-8 codes used and the devised code mapping are listed in Table 6.1.
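The relationship between the byte sequence “E38080” and its Unicode code point can be checked with a short Python sketch. The Big-5 byte pair A1 40 used below is an assumption about which Big-5 space code is meant; the text above does not name it.

```python
import unicodedata

# U+3000 is the code point whose UTF-8 encoding is the byte sequence E3 80 80.
UTF8_SPACE = "\u3000"

utf8_bytes = UTF8_SPACE.encode("utf-8")
print(utf8_bytes.hex().upper())          # hex form of the UTF-8 encoding
print(unicodedata.name(UTF8_SPACE))      # official Unicode character name
print(UTF8_SPACE.isspace())              # whether Python treats it as whitespace

# Assumption: the Big-5 full-width space is the byte pair A1 40.
big5_space = bytes.fromhex("A140").decode("big5")
print(big5_space == UTF8_SPACE)
```

Because the character is an ordinary Unicode space rather than a control code or a private-use code, it is plausible that webmail platforms and browsers leave it intact and render it as blank.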

Table 6.1 Encoding table for used UTF-8 codes.

The process of embedding these secret symbols is similar to the hiding procedure used for blog authentication presented in Chapter 3, except that we embed an initial special space before all symbols in each data embedding slot as a start signal. It helps us find the starting position of the secret symbols embedded in each line when we conduct the verification process.

We also have to use the distributional embedding method mentioned in Chapter 3 to disperse all secret symbols over the ends of the text lines. The reason is that, when a protected email is read on a web browser and its content is highlighted with the mouse, the embedded secret symbols appear at the text line ends as white spaces. This is an undesirable leakage of the embedded authentication signal. Two examples of highlighting a stego-email in IE are shown in Figure 6.1, where Figure 6.1(a) shows the case of hiding the secret symbols at the line ends simply in order and Figure 6.1(b) shows the case of embedding the symbols evenly at the line ends.
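The difference between the two cases in Figure 6.1 can be illustrated with a minimal sketch. The round-robin rule below is an assumption made for illustration only; the exact distribution rule of Chapter 3 is not reproduced in this section, and slot capacities are counted in symbols rather than display units to keep the example short.

```python
from typing import List


def fill_in_order(total_symbols: int, capacities: List[int]) -> List[int]:
    """Fill each line's slot completely before moving on to the next line."""
    counts = []
    remaining = total_symbols
    for cap in capacities:
        take = min(cap, remaining)
        counts.append(take)
        remaining -= take
    return counts


def fill_evenly(total_symbols: int, capacities: List[int]) -> List[int]:
    """Hypothetical round-robin rule: hand out one symbol per line per pass."""
    counts = [0] * len(capacities)
    remaining = total_symbols
    progressed = True
    while remaining > 0 and progressed:
        progressed = False
        for i, cap in enumerate(capacities):
            if remaining == 0:
                break
            if counts[i] < cap:
                counts[i] += 1
                remaining -= 1
                progressed = True
    return counts


capacities = [5, 5, 5, 5]
print(fill_in_order(10, capacities))  # long runs at the first line ends
print(fill_evenly(10, capacities))    # short runs spread over all line ends
```

With the even rule, no single line end carries a conspicuously long run of highlighted white space.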

The imperceptibility of these symbols composed of UTF-8 space codes is somewhat lower than that of the secret symbols used in the previous chapters. However, these symbols are still suitable for email authentication. They do not interfere with reading emails at all, and even if malicious users work out our hiding method, they still cannot arbitrarily tamper with a protected email and escape our verification. Furthermore, this method is appropriate for most operating systems and is even compatible with other text-based Internet applications, because all the secret symbols used are composed of UTF-8 codes. These codes are defined in the standard Unicode format with normal charts, and nowadays almost all websites use UTF-8 as their text encoding format. Thus, in the proposed method special UTF-8 space codes are used to embed data for email authentication. The detailed processes implementing the method are described in the next sections.

Figure 6.1 A highlighted stego-email with secret symbols embedded (a) just in order or (b) evenly using the proposed method.

6.3 Authentication Signal Generation and Embedding Process

The proposed process for generating an authentication signal and embedding it into an email is described in this section. An illustration of the process is shown in Figure 6.2; the process is appropriate for most popular webmail platforms. First, we fold longer email article lines into shorter ones, leaving some character spaces at each line end as a data embedding slot, and replace each UTF-8 space code (if it exists) in the article with an approximately equal-length combination of four white space codes. (The length of a UTF-8 space is approximately equal to the length of four white spaces displayed in webmails.) Next, we remove from the folded email article all the line feed signals so that the verification process described in the next section will not be disturbed by redundant line feed signals. The modified email article and a secret key are then used to generate an authentication signal using a hash function and the exclusive-OR operation. Subsequently, we map each bit of the authentication signal into our devised secret symbols one by one according to Table 6.1. Finally, these secret symbols are embedded at the text line ends, accompanied by end signals, to obtain a protected email. The detailed procedure is described in Algorithm 6.1 below.
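The signal generation part of this process can be sketched as follows, assuming MD5 as the hash function (the example given for f in Algorithm 6.1) and UTF-8 as the byte encoding of both the article and the key. The function names `normalize` and `authentication_signal` are hypothetical and chosen only for this illustration.

```python
import hashlib

UTF8_SPACE = "\u3000"  # the UTF-8 space code E3 80 80


def normalize(article: str) -> str:
    """Replace UTF-8 spaces by four white spaces and drop all line feeds."""
    text = article.replace(UTF8_SPACE, " " * 4)
    return text.replace("\r", "").replace("\n", "")


def authentication_signal(article: str, key: str) -> bytes:
    """Return the 128-bit signal: MD5(normalized article) XOR MD5(key)."""
    text_digest = hashlib.md5(normalize(article).encode("utf-8")).digest()
    key_digest = hashlib.md5(key.encode("utf-8")).digest()
    return bytes(a ^ b for a, b in zip(text_digest, key_digest))


signal = authentication_signal("Dear Bob,\nsee you at noon.", "secret key")
print(len(signal) * 8)  # number of bits in the signal
```

Because all line feeds are removed before hashing, folding the article into shorter lines does not change the signal, which is why the verifier can recompute it from the received text.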

Algorithm 6.1 Authentication signal generation and embedding process.

Input: a secret key K, a hash function f (such as MD5), and an email E to be protected.

Output: a protected email E′.

Steps.

Figure 6.2 Flowchart of proposed authentication signal generation and embedding process.

1. Fold sequentially each text line l_i with a length larger than 60 units (with a unit meaning the length of an ASCII code displayed on web browsers) in email E into a 60-unit line by inserting a line feed, denoted as LF and occupying zero units, after the original 60th character in l_i to generate a folded article, denoted as F.

2. Replace each UTF-8 space code (if it exists) in F with an approximately equal-length combination of four white space codes.

3. Compute the size L_i of the data embedding slot at the end of each text line l_i in F by:

L_i = 90 − (the length of l_i),

which means the maximum number of characters that can be inserted at the end of l_i.

4. Remove all the line feed signals in F, use the result and the secret key K as inputs to the hash function f to generate two 128-bit digests F′ and K′, respectively, and return all the removed LF signals to their original positions in F.

5. Compute the exclusive-OR value F′ ⊕ K′ to obtain a 128-bit authentication signal S, and let N₁ and N₂ both denote the total number of bits of S.

8. Embed the symbols p1 through p128 sequentially into F, starting from the first line, in the following way.

(1) Scan li to find the line feed LF, replace it with a start signal, sequentially embed n_i symbols in l_i at the end, decrement N₂ by n_i, and append an end signal, in which an LF is included, to the end of the embedded symbols.

(2) If the last line has been processed and N₂ > 0, then embed the remaining symbol or symbols below F as one or more blank lines in the following way.

8.1 Embed as many symbols as possible into a new line sequentially before the length of the line (in units) becomes larger than 87, insert a start signal at the line start, and append an end signal to the line end.

8.2 If all symbols are embedded, then continue; otherwise, repeat Step 8.1 again.

9. Take the final version of F as the desired protected email E′.

In Step 2 of the above algorithm, because the UTF-8 space may appear in an original cover email, we need to replace it with a combination of four white space codes, which is approximately equal in length to a UTF-8 space, so that the verification process described in the next section can be conducted correctly.
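Steps 1 through 3 of Algorithm 6.1 can be sketched as below. The sketch assumes ASCII-only cover text, so that one character occupies one display unit; wide characters would need a proper width measure. The function names are hypothetical.

```python
from typing import List

LINE_WIDTH = 60        # fold threshold of Step 1, in units
SLOT_LIMIT = 90        # constant in the slot-size formula of Step 3
UTF8_SPACE = "\u3000"  # the UTF-8 space code E3 80 80


def fold(article: str) -> List[str]:
    """Step 1: break every line longer than 60 units into 60-unit lines."""
    folded = []
    for line in article.split("\n"):
        while len(line) > LINE_WIDTH:
            folded.append(line[:LINE_WIDTH])
            line = line[LINE_WIDTH:]
        folded.append(line)
    return folded


def replace_utf8_spaces(lines: List[str]) -> List[str]:
    """Step 2: swap each UTF-8 space in the cover text for four white spaces."""
    return [line.replace(UTF8_SPACE, " " * 4) for line in lines]


def slot_sizes(lines: List[str]) -> List[int]:
    """Step 3: L_i = 90 - (the length of l_i) for every line l_i."""
    return [SLOT_LIMIT - len(line) for line in lines]


lines = replace_utf8_spaces(fold("x" * 130 + "\nshort line"))
print([len(line) for line in lines])
print(slot_sizes(lines))
```

After Step 2 no UTF-8 space remains in the cover text, so any UTF-8 space found later in a protected email can only belong to the embedded data.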

6.4 Authentication Signal Extraction and Verification Process

The proposed email authentication signal extraction and verification process is described in this section. First, we extract the secret symbols embedded in the protected email and transform them into an authentication signal S. Then, we use the same secret key and hash function as those used in Algorithm 6.1 to transform the email, from which all the secret data and the line feed signals have been removed, into a verification signal T. Finally, we verify the integrity and fidelity of the email by comparing the two signals S and T. The main process is similar to the authentication process for blog articles given in Algorithm 3.2, except that, when extracting the secret symbols, we must first find the start signal that precedes the other symbols in each text line. This lets us tell whether a white space code appearing in the email was typed in the original cover email or was appended as part of a secret symbol. A flowchart of the proposed process is shown in Figure 6.3, and the detailed algorithm is given in Algorithm 6.2.

Algorithm 6.2 Authentication signal extraction and email verification.

Input: a secret key K and a hash function f, both being the same as those used in Algorithm 6.1; and a protected email E′.

Output: an authentication report R.

Steps.

Figure 6.3 Flowchart of the proposed authentication signal extraction and email verification process.

1. Check each line l_i in the protected email E′ sequentially, starting from the first line; find the start signal; and extract the subsequent secret symbols embedded in front of the end signal in l_i.

2. Concatenate all the extracted secret symbols sequentially into a set of 128 codes, p₁, p₂, …, p₁₂₈.

3. Map each of p₁ through p₁₂₈ to a corresponding bit 0 or 1, according to Table 6.1.

4. Concatenate these bits into a 128-bit authentication signal S.

5. Use the secret key K as an input to the hash function f to generate a 128-bit digest K′.

6. Remove all the secret data and line feed signals from the email, and use the result as an input to the hash function f to generate a 128-bit digest E″.

7. Compute the exclusive-OR value E″ ⊕ K′ to get a 128-bit verification signal T.

8. Compare S and T, resulting in the following two cases.

(1) If S = T, then regard the input E′ as unmodified and mark it so in the authentication report R.

(2) If S ≠ T, then regard E′ as modified and mark it so in R.

9. Output the authentication report R.
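The steps of Algorithm 6.2 can be sketched as a round trip in Python. Several parts are assumptions rather than the method itself: the symbol table `BIT_TO_SYMBOL` is a hypothetical stand-in for Table 6.1, the end signal is reduced to the line feed alone, MD5 is taken as f, and the helper `protect` is a simplified embedder (no folding, no slot limits, no extra blank lines) included only to produce an input for `verify`.

```python
import hashlib
from typing import Tuple

UTF8_SPACE = "\u3000"
START = UTF8_SPACE  # start signal: a special space placed before the symbols
# Hypothetical stand-in for Table 6.1: two-character symbols built from
# a UTF-8 space and a white space.
BIT_TO_SYMBOL = {"0": " " + UTF8_SPACE, "1": UTF8_SPACE + " "}
SYMBOL_TO_BIT = {symbol: bit for bit, symbol in BIT_TO_SYMBOL.items()}


def signal(text: str, key: str) -> str:
    """128-bit string: MD5(text) XOR MD5(key)."""
    d1 = hashlib.md5(text.encode("utf-8")).digest()
    d2 = hashlib.md5(key.encode("utf-8")).digest()
    return "".join(f"{a ^ b:08b}" for a, b in zip(d1, d2))


def protect(article: str, key: str) -> str:
    """Simplified embedder: split the signal bits over the existing lines."""
    lines = article.replace(UTF8_SPACE, " " * 4).split("\n")
    bits = signal("".join(lines), key)
    per_line = -(-len(bits) // len(lines))  # ceiling division
    out = []
    for i, line in enumerate(lines):
        chunk = bits[i * per_line:(i + 1) * per_line]
        if chunk:
            line += START + "".join(BIT_TO_SYMBOL[b] for b in chunk)
        out.append(line)
    return "\n".join(out)


def split_line(line: str) -> Tuple[str, str]:
    """Split a line into (cover text, embedded symbols) at the start signal."""
    cover, found, payload = line.partition(START)
    return (cover, payload) if found else (line, "")


def verify(protected: str, key: str) -> bool:
    """Extract S, recompute T from the cover text, and compare (Steps 1-8)."""
    covers, bits = [], []
    for line in protected.split("\n"):
        cover, payload = split_line(line)
        covers.append(cover)
        for i in range(0, len(payload), 2):
            bits.append(SYMBOL_TO_BIT.get(payload[i:i + 2], "?"))
    extracted = "".join(bits)                 # signal S
    recomputed = signal("".join(covers), key)  # signal T
    return extracted == recomputed


protected = protect("Dear Bob,\nThe meeting is at noon.", "secret")
print(verify(protected, "secret"))
print(verify(protected.replace("noon", "nine"), "secret"))
print(verify(protected, "wrong key"))
```

The first UTF-8 space in each line acts as the start signal, so white spaces typed in the cover text before it are never mistaken for secret symbols.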

6.5 Experimental Results

Some examples of our experimental results are given in this section. We generated protected emails using the proposed method on several popular webmail platforms, such as G-mail, Hotmail, and Yahoo Mail, and all the experimental results show that the proposed email authentication scheme is feasible.

In Figure 6.4(a), we sent an email through the webmail platform G-mail on Chrome, and used Algorithm 6.1 with a secret key as input to transform the email into a protected email. The interface of the algorithm and the protected email are shown in Figures 6.4(b) and 6.4(c), and the highlighted protected email is shown in Figure 6.4(d). Later, when the protected email was received, we verified its integrity and fidelity using the correct secret key, as shown in Figure 6.4(e). However, if the wrong key is typed, as shown in Figure 6.4(f), then the

New Information Hiding Techniques Using Special Character Codes in Documents and Their Applications on the Internet (pages 84-93)