A Covert Communication Method via Spreadsheets by Secret Sharing with a Self-Authentication Capability


Che-Wei Lee¹ and Wen-Hsiang Tsai¹,²

¹Department of Computer Science and Information Engineering

National Chiao Tung University, Hsinchu, Taiwan 30010

²Department of Information Communication

Asia University, Taichung, Taiwan 41354

Tel: +886-3-5728368 Fax: +886-3-5734935

E-mails: paradiserlee@gmail.com & whtsai@cis.nctu.edu.tw

 This work is supported financially by the National Science Council, Taiwan, ROC under Project No. 99-2631-H-009-001.

 To whom all correspondence should be sent.


Abstract

A new covert communication method with a self-authentication capability is proposed for hiding secret data in spreadsheets by the information sharing technique. At the sender site, a secret message is transformed into shares by Shamir’s (k, n)-threshold secret sharing scheme with n = k + 1, and the generated k + 1 shares are embedded into the numeric items of a spreadsheet as if they were part of the spreadsheet content. At the receiver site, every combination of k shares among the k + 1 ones extracted from the stego-spreadsheet is used to recover a copy of the secret, yielding k + 1 copies in total, and the consistency of the k + 1 copies in value is checked to determine whether the embedded shares are intact, achieving a new type of blind self-authentication of the embedded secret. By dividing the secret message into segments and applying the secret sharing scheme to each segment, the integrity and fidelity of the hidden secret message can be verified, achieving a covert communication process with the double functions of information hiding and self-authentication. Experimental results and discussions on data embedding capacity, authentication precision, and steganalysis issues are also included to show the feasibility of the proposed method.

Key words: covert communication, secret sharing, information hiding, self-authentication, spreadsheet.


1. INTRODUCTION

Covert communication is a technique of concealing secret information in a cover medium in an imperceptible way or with a camouflage effect such that only the sender and the intended receiver know of the existence of the hidden data in the resulting stego-medium. In the literature, emphasis has been placed on the use of multimedia like images, videos, and audio [1-4] because these media in general provide larger embeddable spaces and cause less suspicion due to their wide distribution. Weaknesses in human visual capabilities are often exploited to design effective covert communication methods. For example, the methods proposed in [5-7] replace the least-significant bits of pixels in cover images to embed information, and that of [8] uses the parities of palette colors, composed of similar colors, to represent hidden message bits.

In addition to methods developed for multimedia, several others [9-12] used cover media of text, PDF, or Word documents for covert communication. In Brassil and Maxemchuk [9], data are embedded by slightly adjusting the lines, tabs, or characters in text files. Lee and Tsai [10] used special ASCII codes in PDF files to embed data between characters. Liu and Tsai [12] made use of the change tracking function in Microsoft Word to embed data imperceptibly by a document degeneration technique.

In this study, we propose a new covert communication method which applies Shamir’s (k, n)-threshold secret sharing scheme with n = k + 1 to a given secret item to yield k + 1 shares, which are then embedded into the numeric items of a spreadsheet as if they were part of the spreadsheet content. The purpose of transforming the secret data into shares by the (k, k + 1)-threshold secret sharing scheme is not to enforce robustness, but to yield a blind self-authentication capability for the embedded secret. Conventionally, (k, n)-threshold secret sharing is applied to provide destruction-tolerant capabilities; that is, any k shares collected from the n ones may be processed to reveal the shared secret even though up to (n − k) shares are destroyed. In the proposed method, the (k, k + 1)-threshold scheme is instead used, for the first time, to provide a self-authentication capability by checking the value-consistency of the k + 1 results coming from all k + 1 combinations of k shares to determine whether the extracted secret is intact. That is, only when the results computed from every choice of k shares out of the k + 1 shares are all identical in value can the extracted secret be decided to be intact. Fig. 1 illustrates these core ideas of the proposed method.

Moreover, to conceal the presence of hidden data, the secret shares are spread sparsely throughout the cover spreadsheet. A spreadsheet containing numeric items with a high scatter level is more suitable for use as a cover spreadsheet because it offers better concealment. Merits of the proposed method include the following. (1) A receiver can confirm the correctness of the extracted secret message. (2) Compared with methods using hash codes or parity bits as redundant data to ensure the authenticity of retrieved data, only a minor redundancy, i.e., the (k + 1)-th share, is needed in the proposed method. (3) By adaptively choosing the parameters involved, i.e., the value of p used in the polynomial of Shamir’s method, for the selected spreadsheet, the values of the numeric items generated by the method will fall into a reasonable range, arousing little suspicion during covert communication. (4) With spreadsheets as cover media, the proposed method is free from unintentional destruction of hidden data by operations like data compression during secret transmission or data keeping, in contrast with cover media like images or videos, which are often compressed in such processes without regard to any hidden data. Two examples of spreadsheet documents, created with Microsoft Excel and Google Docs, are shown in Fig. 2.

The remainder of this paper is organized as follows. In Section 2, the Shamir method on which the proposed method is based is reviewed first. In Section 3, the details of the proposed method, including secret message embedding, secret message extraction, and self-authentication of the extracted message, are described. In Section 4, discussions on related issues about the proposed method are given. Experimental results are presented in Section 5, followed by conclusions in Section 6.

2. REVIEW OF SHAMIR’S METHOD FOR SECRET SHARING

In the (k, n)-threshold secret sharing scheme proposed by Shamir [13] with k ≤ n, a secret d in the form of an integer is transformed into shares which are then distributed to n participants to keep; and as long as at least k of the n shares can be collected, the original secret can be recovered. The details of the scheme may be described as the two algorithms below.

Algorithm 1: (k, n)-threshold secret sharing.

Input: a secret d in the form of an integer, the number n of participants, and a threshold k not larger than n.

Output: n shares in the form of integers for n participants to keep.

Steps.

1. Choose randomly a prime number p which is larger than the secret d.

2. Select k − 1 integer values c₁, c₂, …, c_{k−1} within the range of 0 through p − 1.

3. Select n distinct integer values within the range of 1 through p − 1 for the variables x₁, x₂, …, x_n.

4. Use the following (k − 1)-degree polynomial to compute n function values F(x_i), called partial shares:

$$F(x_i) = (d + c_1 x_i + c_2 x_i^2 + \cdots + c_{k-1} x_i^{k-1}) \bmod p, \tag{1}$$

for i = 1, 2, …, n.

5. Deliver the 2-tuple (x_i, F(x_i)) as a share to the i-th participant, where i = 1, 2, …, n.

Since there are k coefficients, including d and c₁ through c_{k−1}, in (1) above, it is necessary to collect at least k shares from the n participants to form k equations of the form of (1) to solve these k coefficients in order to recover the secret d. This explains the term, threshold, for k and the name, (k, n)-threshold, for the Shamir method [13].
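The share-generation steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `make_shares` and the optional `rng` argument are hypothetical, and the x values are simply taken to be 1 through n (as is done later in Algorithm 3).

```python
import random

def make_shares(d, k, n, p, rng=None):
    """Return n shares (x_i, F(x_i)) of the secret d under a (k, n)-threshold scheme."""
    if not (0 <= d < p):
        raise ValueError("secret d must satisfy 0 <= d < p")
    if not (1 <= k <= n < p):
        raise ValueError("need 1 <= k <= n < p")
    if rng is None:
        rng = random.SystemRandom()
    # Coefficients d, c_1, ..., c_{k-1}; each c_i is drawn from 0..p-1 (Step 2).
    coeffs = [d] + [rng.randrange(p) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):          # Step 3 (assumption: x_i = i)
        y = 0
        for c in reversed(coeffs):     # Horner evaluation of Eq. (1) modulo p
            y = (y * x + c) % p
        shares.append((x, y))          # Step 5: the share is the pair (x_i, F(x_i))
    return shares
```

For example, `make_shares(42, 3, 5, 101)` yields five pairs whose second components lie in 0 through 100; any three of them suffice to determine the three coefficients.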

Below is a description of the equation-solving process for secret recovery.

Algorithm 2: secret recovery.

Input: k shares collected from the n participants where k is the threshold mentioned in Algorithm 1; and the prime number p which was chosen in Step 1 of Algorithm 1.

Output: the secret d hidden in the shares and the coefficients c_i used in the equations described by (1) in Algorithm 1, where i = 1, 2, …, k − 1.

Steps.

1. Use the k shares (x₁, F(x₁)), (x₂, F(x₂)), …, (x_k, F(x_k)) to set up the following equations:

$$F(x_j) = (d + c_1 x_j + c_2 x_j^2 + \cdots + c_{k-1} x_j^{k-1}) \bmod p, \tag{2}$$

where j = 1, 2, …, k.

2. Solve the k equations above by Lagrange’s interpolation to obtain the desired secret value d [16] as follows:

$$d = (-1)^{k-1}\left[F(x_1)\frac{x_2 x_3 \cdots x_k}{(x_1 - x_2)(x_1 - x_3)\cdots(x_1 - x_k)} + F(x_2)\frac{x_1 x_3 \cdots x_k}{(x_2 - x_1)(x_2 - x_3)\cdots(x_2 - x_k)} + \cdots + F(x_k)\frac{x_1 x_2 \cdots x_{k-1}}{(x_k - x_1)(x_k - x_2)\cdots(x_k - x_{k-1})}\right] \bmod p.$$

3. Compute the values c₁ through c_{k−1} by expanding the following equality and comparing the result with (2) in Step 1 while regarding the variable x in the equality below to be x_j in (2):

$$F(x) = \left[F(x_1)\frac{(x - x_2)(x - x_3)\cdots(x - x_k)}{(x_1 - x_2)(x_1 - x_3)\cdots(x_1 - x_k)} + F(x_2)\frac{(x - x_1)(x - x_3)\cdots(x - x_k)}{(x_2 - x_1)(x_2 - x_3)\cdots(x_2 - x_k)} + \cdots + F(x_k)\frac{(x - x_1)(x - x_2)\cdots(x - x_{k-1})}{(x_k - x_1)(x_k - x_2)\cdots(x_k - x_{k-1})}\right] \bmod p.$$

Step 3 in the above algorithm is included for the purpose of computing the values of the parameters c_i in the proposed method. In other applications, if only the secret value d needs to be recovered, this step may be omitted.
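The recovery of all k coefficients by Lagrange interpolation can be sketched as below. The function name `recover_coefficients` is hypothetical, and the sketch assumes that the divisions in the interpolation formula are carried out as modular inverses modulo the prime p (computed here with Python's three-argument `pow`, available from Python 3.8).

```python
def recover_coefficients(shares, p):
    """Return [d, c_1, ..., c_{k-1}] of the polynomial through the k shares, modulo p."""
    k = len(shares)
    coeffs = [0] * k
    for j, (xj, yj) in enumerate(shares):
        basis = [1]      # numerator polynomial, lowest-degree coefficient first
        denom = 1        # product of (x_j - x_i) over i != j
        for i, (xi, _) in enumerate(shares):
            if i == j:
                continue
            # Multiply the numerator polynomial by (x - x_i).
            basis = [(a - xi * b) % p for a, b in zip([0] + basis, basis + [0])]
            denom = denom * (xj - xi) % p
        # Division by the denominator is a modular inverse because p is prime.
        scale = yj * pow(denom, -1, p) % p
        coeffs = [(c + scale * b) % p for c, b in zip(coeffs, basis)]
    return coeffs

# F(x) = 5 + 3x + 2x^2 mod 101 evaluated at x = 1, 2, 3 gives 10, 19, 32.
print(recover_coefficients([(1, 10), (2, 19), (3, 32)], 101))  # [5, 3, 2]
```

The first element of the returned list is the secret d; the remaining elements are c₁ through c_{k−1}, which the proposed method also uses to carry message bits.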

3. PROPOSED COVERT COMMUNICATION METHOD USING SPREADSHEETS

3.1 Generation of a Stego-Spreadsheet

In the proposed method, an appropriate cover spreadsheet S which contains numeric data for disguising generated secret shares is prepared first. Next, a secret message M to be hidden is divided into several segments, which are taken as input to Shamir’s (k, n)-threshold secret sharing scheme [13] with carefully chosen parameters to generate secret shares. Then, numeric items in S which are selected by a secret key are replaced with the shares to generate a stego-spreadsheet S′. In this process, the parameters involved in Eq. (1) of Algorithm 1 are adjusted to suit the characteristics of the input secret message and the prepared cover spreadsheet. These parameters include: (a) the number m of bits in each message segment, which is also taken to be the number of bits in each of the coefficients d and c₁ through c_{k−1}; (b) the number k of message segments processed by the Shamir scheme each time, which is also the minimum number of secret shares that must be collected to recover the secret; (c) the total number n of generated shares, which is set to be k + 1 specifically; and (d) the prime number p, which is the smallest prime number larger than all the values of the coefficients d and c₁ through c_{k−1} and of the variables x₁ through x_n used in Eq. (1) [13].

A detailed algorithm describing the process is presented in the following.

Algorithm 3: generation of a stego-spreadsheet.

Input: a binary secret message M divided into m-bit segments, a spreadsheet S, a secret key K, and three pre-selected integers k, n (= k + 1), and m.

Output: a stego-spreadsheet S′.

Steps.

Stage 1: share generation.

Step 1. Choose the smallest prime number p which is larger than 2^m − 1.

Step 2. Take sequentially k unprocessed m-bit segments from M to form a group G, called a segment group, and perform the following steps to transform the segment group into partial shares.

2.1 Transform the k m-bit message segments in G into integers and take the results to be d, c₁, c₂, …, c_{k−1}, respectively.

2.2 Take x₁ through x_n to be the integers 1 through n, respectively, where n = k + 1.

2.3 Use the following (k − 1)-degree polynomial to compute n partial shares F(x_i):

$$F(x_i) = (d + c_1 x_i + c_2 x_i^2 + \cdots + c_{k-1} x_i^{k-1}) \bmod p, \tag{3}$$

where i = 1, 2, …, n.

2.4 Save all F(x_i) in order into a partial-share set F_ps.

Step 3. If the message segments in M are not exhausted, then go to Step 2 to process another segment group; otherwise, continue.

Stage 2: partial share embedding.

Step 4. Take an unprocessed partial share F(x_i) from F_ps, and perform the following steps.

4.1 Use the secret key K to randomly select a numeric item I in S.

4.2 Replace I with F(x_i).

Step 5. If there exist unprocessed partial shares in F_ps, go to Step 4; otherwise, take the final S as the output S′.
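A compact Python sketch of Algorithm 3 is given below under several stated assumptions: the cover spreadsheet is modeled as a flat list of integers, the secret message is a bit string already padded to a multiple of k·m bits, and the key-driven selection of Step 4.1 is imitated by a key-seeded `random.Random` that picks distinct positions. The names `embed_message` and `smallest_prime_above` are hypothetical and do not come from the paper.

```python
import random

def smallest_prime_above(v):
    """Smallest prime strictly greater than v (trial division; adequate for small m)."""
    n = v + 1
    while n < 2 or any(n % q == 0 for q in range(2, int(n ** 0.5) + 1)):
        n += 1
    return n

def embed_message(bits, items, key, k, m):
    """Return (stego_items, p): a copy of `items` with the partial shares embedded."""
    n = k + 1
    p = smallest_prime_above(2 ** m - 1)                     # Step 1
    if len(bits) % (k * m) != 0:
        raise ValueError("pad the bit string to a multiple of k*m bits")
    shares = []
    for g in range(0, len(bits), k * m):                     # Step 2: one segment group
        group = bits[g:g + k * m]
        # Step 2.1: k m-bit segments become the coefficients d, c_1, ..., c_{k-1}.
        coeffs = [int(group[s:s + m], 2) for s in range(0, k * m, m)]
        for x in range(1, n + 1):                            # Steps 2.2 and 2.3
            shares.append(sum(c * pow(x, e, p) for e, c in enumerate(coeffs)) % p)
    if len(shares) > len(items):
        raise ValueError("cover spreadsheet has too few numeric items")
    # Stage 2 (assumption): the key seeds a PRNG that picks distinct item positions.
    positions = random.Random(key).sample(range(len(items)), len(shares))
    stego = list(items)
    for pos, share in zip(positions, shares):
        stego[pos] = share
    return stego, p
```

In practice p may instead be chosen to suit the value range of the cover spreadsheet, as is done in the experiments reported later, provided that it remains a prime larger than 2^m − 1.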

3.2 Algorithm for Data Extraction and Authentication

The proposed blind self-authentication capability for verifying a recovered secret message is provided by the (k, k + 1)-threshold secret sharing scheme. In the past, (k, n)-threshold secret sharing has often been applied to develop methods for secret image sharing [14-16] or image repairing [17] with destruction-tolerant capabilities: any k shares collected from the n ones may be processed to reveal the shared secret even though up to (n − k) shares are destroyed. In the proposed method, however, the (k, k + 1)-threshold scheme is used to provide a self-authentication capability for verifying the correctness of a recovered segment group in the secret message: any k shares collected from the k + 1 ones should, after the secret recovery process of Algorithm 2 is conducted, reveal the same secret value in normal cases, meaning that no damage has occurred to the k + 1 shares; otherwise, it can be decided that some shares must have been damaged. By making use of this characteristic, blind self-authentication of each segment group in the recovered secret message is carried out, and verification of the integrity and fidelity of the secret message is thus achieved. A detailed algorithm for secret message recovery and self-authentication is described in the following.

Algorithm 4: secret data recovery and self-authentication.

Input: a stego-spreadsheet S′; the prime number p, the three integers k, n (= k + 1), and m, and the secret key K used in Algorithm 3.

Output: a secret message M presumably hidden in S′, and a report about the authenticity of the segments within M.

Steps.

Stage 1: message segment computation.

Step 1. Use the secret key K to select randomly numeric items in S′; take out their values, which presumably are the partial shares F(x_i) embedded by Algorithm 3; and put the values sequentially into a set F_ps as a partial-share set.

Step 2. Take out in order n partial shares from F_ps, set their corresponding x values as 1 through n, respectively, and perform the following steps to recover a binary segment M_i of the secret message M, if possible.

2.1 For every k partial shares F₁, F₂, …, F_k in the n ones and their corresponding x values x₁, x₂, …, x_k, perform the following steps.

2.1.1 Use the k shares (x₁, F₁), (x₂, F₂), …, (x_k, F_k) to set up the following equations:

$$F_j = F(x_j) = (d + c_1 x_j + c_2 x_j^2 + \cdots + c_{k-1} x_j^{k-1}) \bmod p, \tag{4}$$

where j = 1, 2, …, k.

2.1.2 Compute the values d and c₁ through c_{k−1} by expanding the following equality and comparing the result with (4) in Step 2.1.1 above while regarding the variable x in the equality below to be x_j in (4):

$$F(x) = \left[F(x_1)\frac{(x - x_2)(x - x_3)\cdots(x - x_k)}{(x_1 - x_2)(x_1 - x_3)\cdots(x_1 - x_k)} + F(x_2)\frac{(x - x_1)(x - x_3)\cdots(x - x_k)}{(x_2 - x_1)(x_2 - x_3)\cdots(x_2 - x_k)} + \cdots + F(x_k)\frac{(x - x_1)(x - x_2)\cdots(x - x_{k-1})}{(x_k - x_1)(x_k - x_2)\cdots(x_k - x_{k-1})}\right] \bmod p.$$

2.1.3 Put the computed values of d and c₁ through c_{k−1} as a set into a buffer B. (There will be n = k + 1 sets of values of d and c₁ through c_{k−1} at the end of Step 2.)

Stage 2: self-authentication of the computed message segment.

Step 3. Take out the n sets of the coefficient values of d and c₁ through c_{k−1} in B and perform the following operations.

3.1 Transform the coefficients d and c₁ through c_{k−1} into k binary segments, and concatenate them as a message segment M_i.

3.2 If all the n sets of the coefficient values are identical to one another, then mark M_i as authentic and append it to the end of the desired secret message M; else, mark M_i as having been damaged and continue.

Step 4. If all shares embedded in S′ are processed, then take the final M as the output; otherwise, go to Step 2.
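The recovery and self-authentication steps of Algorithm 4 can be sketched as follows. The sketch makes the same modeling assumptions as a list-based embedding would: the stego-spreadsheet is a flat list of integers, and a key-seeded `random.Random` reproduces the positions chosen at the sender site. The parameter `num_groups` (how many segment groups were embedded) is an assumption, since the algorithm above does not say how the receiver learns it; the names `interpolate` and `extract_message` are hypothetical.

```python
import random
from itertools import combinations

def interpolate(shares, p):
    """Coefficients [d, c_1, ..., c_{k-1}] of the polynomial through `shares`, mod p."""
    coeffs = [0] * len(shares)
    for j, (xj, yj) in enumerate(shares):
        basis, denom = [1], 1
        for i, (xi, _) in enumerate(shares):
            if i != j:
                # Multiply the Lagrange numerator by (x - x_i), lowest degree first.
                basis = [(a - xi * b) % p for a, b in zip([0] + basis, basis + [0])]
                denom = denom * (xj - xi) % p
        scale = yj * pow(denom, -1, p) % p      # modular inverse (Python 3.8+)
        coeffs = [(c + scale * b) % p for c, b in zip(coeffs, basis)]
    return coeffs

def extract_message(stego, key, num_groups, k, m, p):
    """Return one (bits, authentic) pair per segment group; bits is None if damaged."""
    n = k + 1
    positions = random.Random(key).sample(range(len(stego)), num_groups * n)  # Step 1
    report = []
    for g in range(num_groups):                                               # Step 2
        ys = [stego[pos] for pos in positions[g * n:(g + 1) * n]]
        shares = [(x, y % p) for x, y in zip(range(1, n + 1), ys)]
        # Step 2.1: one coefficient set for each choice of k shares out of k + 1.
        results = [interpolate(list(sub), p) for sub in combinations(shares, k)]
        # Step 3.2: the group is authentic only if all k + 1 sets are identical.
        if all(r == results[0] for r in results):
            bits = "".join(format(c, "0{}b".format(m)) for c in results[0])
            report.append((bits, True))
        else:
            report.append((None, False))
    return report
```

Because k intact shares already determine the polynomial, altering any single share of a group moves it off that polynomial and makes at least two of the k + 1 recovered coefficient sets disagree, so the group is reported as damaged.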

4. DISCUSSIONS ON RELATED ISSUES ABOUT PROPOSED METHOD

4.1 Statistical Undetectability

A statistical anomaly caused by information embedding is a reliable clue for detecting the presence of steganographic content [18]. To resist such statistical analysis, two strategies are used in the proposed method. One is to spread secret shares throughout the cover spreadsheet in a sparsely and randomly distributed fashion so that the statistical properties of the cover spreadsheet are less affected by information embedding. This way of achieving undetectability for a hidden message follows the concept of the frequency-hopping spread spectrum technique [19], in which radio signals are transmitted over many frequency channels selected according to a pseudorandom sequence known to the sender and the receiver. The other strategy is to choose comparatively insignificant parts of the numeric data in the spreadsheet for embedding secret shares in order to keep the embedding strength low and so maintain the statistical properties in the stego-spreadsheet. For example, we may choose the decimal fractions of the numbers in a cover spreadsheet and replace their values with those of the secret shares, resulting in insignificant alterations to the statistical properties of the stego-spreadsheet.

4.2 Active Security Consideration

The proposed method not only can passively protect the stego-spreadsheet from detection but also can actively ensure the fidelity and integrity of the transmitted secret. In the active attack model [12], if an adversary subtly modifies passing-by stego-spreadsheets for the purpose of misleading a receiver, the blind self-authentication capability provided by the proposed method can be used to check the authenticity of the retrieved secret message. A failed authenticity check reveals that the communication between the two sides has been compromised and that appropriate measures should be adopted.

4.3 Embedding Capacity Analysis

The value k mentioned in Step 2 of Algorithm 3 determines the number of message segments, or equivalently, the total number of bits, in each segment group processed by the algorithm. When the same number of numeric items in a spreadsheet is used for data embedding, a larger k implies a larger embedding capacity but a coarser integrity check in the later process of self-authentication, while a smaller k means the reverse. There is thus a tradeoff.

Specifically, assume that 10 numeric items in a cover spreadsheet are to be replaced with secret shares, and a (k, n)-threshold secret sharing scheme with k = 9 and n = k + 1 = 10 is adopted. In this case, the 9 coefficients d, c₁, c₂, …, c₈, each being an m-bit segment of the secret message, form the coefficients of the 8-degree polynomial described by (3), and so provide 9 × m = 9m bits as the embedding capacity by generating 10 secret shares and embedding them into the cover spreadsheet. As a comparison, under the same condition but with (k, n) = (k, k + 1) = (4, 5), a 3-degree polynomial with four m-bit coefficients is formed, providing a data embedding capacity of 4 × m = 4m bits after 5 partial shares are generated and embedded. Therefore, if 10 numeric items of a cover spreadsheet are provided as before, the 10 items can be used to embed 2 sets of 5 secret shares generated from 2 distinct segment groups in the secret message, yielding a total of 2 × 4m = 8m bits as the data embedding capacity. As can be observed from the two cases, the former provides a larger embedding capacity of 9m secret message bits, yet with a segment group of 9m bits as the unit for later self-authentication. By contrast, the latter provides a smaller embedding capacity of 8m secret message bits but a finer authentication unit, a 4m-bit segment group in the secret message.
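The two cases compared above can be checked numerically with a few lines of Python. The helper name `embedding_capacity` is hypothetical; it counts the whole segment groups that fit into the available items, that is, the integer part of I / n with n = k + 1, and multiplies by the k·m message bits carried by each group. The value m = 6 is chosen only for illustration.

```python
def embedding_capacity(num_items, k, m):
    """Message bits that fit into num_items numeric items with a (k, k + 1) scheme."""
    n = k + 1                     # shares generated per segment group
    groups = num_items // n       # only complete groups of n shares can be embedded
    return groups * k * m         # each group carries k segments of m bits

m = 6
print(embedding_capacity(10, 9, m))  # 54, i.e. 9m bits, authenticated in one 9m-bit unit
print(embedding_capacity(10, 4, m))  # 48, i.e. 8m bits, authenticated in two 4m-bit units
```

The sketch also makes visible that items left over after the last complete group (for example, 10 items with k = 5) contribute nothing to the capacity.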

From the above discussions, a general conclusion about the data embedding capacity of the proposed method is made as follows: if I denotes the total number of numeric items in a cover spreadsheet available for embedding secret shares, then the embedding capacity C of the proposed method based on a (k, n)-threshold secret sharing scheme with n = k + 1 is:

$$C = \left\lfloor \frac{I}{n} \right\rfloor \times m \times k, \tag{5}$$

where $\lfloor I/n \rfloor$ denotes the number of segment groups in the secret message M and m is the number of bits in a segment of M.

5. EXPERIMENTAL RESULTS

5.1 Experimental Results Using Spreadsheets Recording Students’ Scores

One of the experiments we conducted with the proposed method used a cover spreadsheet recording 300 students’ scores saved as an Excel file, as shown in Fig. 3. Note that this is just an example; the type and content of the cover spreadsheet need not be restricted to this.

The values of the involved parameters p, m, and k in Eq. (3) of the Shamir method were set to be 101, 6, and 7, respectively. The value of the prime number p was taken to be 101 because it is the smallest prime number larger than the full mark of 100 of the students’ test scores. The value of m = 6 means that the length of each segment of the input secret message M was taken to be 6 bits, which satisfies the requirement of 2^m − 1 = 63 < p mentioned in Step 1 of Algorithm 3. Each message segment in M was transformed into an integer for use as one of the coefficients d, c₁, c₂, …, c_{k−1} in Eq. (3). As for k = 7, it means that n = k + 1 = 8 in the applied (k, n)-threshold secret sharing scheme, and that every 7 message segments in M are used as the coefficients d, c₁, c₂, …, c₆ of the polynomial in Eq. (3). Then, a total of 8 (= 7 + 1) secret shares were generated by Algorithm 3 for each segment group, yielding a self-authentication capability of checking every 7 message segments in M.

Furthermore, as shown in Fig. 4, the input secret message M was taken to be the note: “password: 19841221”. In this case, the 18 characters of the message were transformed into a binary string with 18×7 = 126 bits (7 bits per ASCII-coded character). The 126 bits then were divided into 3 segment groups with each group composed of 7 segments and each segment consisting of m = 6 bits. The three segment groups correspond to the following three message sections:

Group 1: “Passwo”; Group 2: “rd: 19”; Group 3: “841221”.

In total, the 3 segment groups generated 3 × 8 = 24 secret shares, which finally, by the use of a secret key, were randomly embedded into the cover spreadsheet to yield a stego-spreadsheet. We list the first 36 items in the stego-spreadsheet in Fig. 5(a), where items that have been replaced with secret shares are marked in blue. A list of the first 36 items in the cover spreadsheet is given in Fig. 5(b) for comparison.
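The segmentation arithmetic of this experiment can be reproduced with the short Python sketch below, which uses the message as quoted in the text and assumes plain 7-bit ASCII coding. The variable names are illustrative only.

```python
message = "password: 19841221"
m, k = 6, 7                      # bits per segment, segments per group
n = k + 1                        # shares generated per segment group

# 7 bits per ASCII-coded character.
bits = "".join(format(ord(ch), "07b") for ch in message)
segments = [bits[i:i + m] for i in range(0, len(bits), m)]
groups = [segments[i:i + k] for i in range(0, len(segments), k)]

# Each group holds k * m = 42 bits, i.e. exactly six 7-bit characters.
group_text = []
for group in groups:
    gbits = "".join(group)
    group_text.append("".join(chr(int(gbits[i:i + 7], 2)) for i in range(0, len(gbits), 7)))

num_shares = len(groups) * n
print(len(bits), len(segments), len(groups), num_shares)  # 126 21 3 24
print(group_text)
```

That each group covers a whole number of characters is a coincidence of the chosen parameters (k·m = 42 is a multiple of 7); in general a segment group boundary may fall inside a character.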

If the stego-spreadsheet is tampered with, Algorithm 4 will detect the tampering by the self-authentication operation (see Step 3). Besides, if some embedded secret shares survive the modification, Algorithm 4 can reconstruct the partially correct secret message from them by the recovery steps (Steps 2 through 4). Some experimental results of these functions are described now. Fig. 6 shows a modified stego-spreadsheet where items 16 through 26 were altered by replacing them with other numbers. Within the 11 modified items, items 15 and 17 include two embedded secret shares. The secret message extracted from such a modified spreadsheet using Algorithm 4 is shown in Fig. 7. As can be seen, segment groups 2 and 3 of the secret message were reconstructed correctly, while segment group 1 was authenticated to have been modified and marked by the algorithm with asterisk symbols “*”.

In this case, the strategy of keeping a low embedding rate mentioned previously is used to make the stego-spreadsheet undetectable. To check that this strategy works, the two-sample Kolmogorov-Smirnov test (KS test), a non-parametric statistical test for checking whether two data samples come from the same probability distribution, is used to compare quantitatively the probability distribution of the numeric data in a stego-spreadsheet with that in a cover spreadsheet. The null hypothesis is that the two data samples come from the same underlying distribution, and the alternative hypothesis is that they come from different distributions; the test is conducted at the 5% significance level. The result of applying the test to the contents of the cover spreadsheet and the stego-spreadsheet shown in Fig. 5 is given in Table 1, in which a resulting hypothesis value of 0 means that the test cannot reject the null hypothesis; that is, a third party cannot conclude from the test that the probability distribution of the stego-spreadsheet differs from that of the cover spreadsheet.

According to our experiments, the limit of the embedding rate at which the two-sample KS test will reject the null hypothesis is 50.67% in this case. This means that the embedding rate should be smaller than 50.67% in order to keep the undetectability property of the stego-spreadsheet when a steganalyst has the information of the probability distribution related to the stego-spreadsheet.
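For readers who wish to repeat this kind of check, a pure-Python sketch of the two-sample KS test is given below. It is not the tool used in the experiments, which the text does not name: the function name `ks_two_sample` is hypothetical, and the decision rule uses the standard large-sample critical value c(α)·sqrt((n₁ + n₂)/(n₁n₂)) with c(α) = sqrt(−ln(α/2)/2), which is only an approximation for small samples.

```python
import math
from bisect import bisect_right

def ks_two_sample(a, b, alpha=0.05):
    """Return (D, critical_value, reject) for the two-sample KS test (asymptotic rule)."""
    a, b = sorted(a), sorted(b)
    na, nb = len(a), len(b)
    # D is the largest gap between the two empirical CDFs; it is attained at a data point.
    d = max(abs(bisect_right(a, v) / na - bisect_right(b, v) / nb) for v in a + b)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))      # about 1.358 for alpha = 0.05
    crit = c_alpha * math.sqrt((na + nb) / (na * nb))
    # reject == False corresponds to a test outcome of 0 (null hypothesis not rejected).
    return d, crit, d > crit
```

Applied to the numeric items of a cover spreadsheet and of the corresponding stego-spreadsheet, a result with `reject` equal to `False` plays the role of the hypothesis value 0 reported in the tables.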

How to choose an embedding rate which is secure against such a statistical test depends on the scatter level of the chosen numeric data of the cover spreadsheet. Here, the scatter level is computed as the variance of the numeric data values. In terms of this parameter, three spreadsheets, Scores1, Scores2, and Scores3, with scatter levels from high to low were tested further in our experiments using the same parameter settings. Scores1 is the one used in the first experiment mentioned above, and the corresponding statistics are shown in Table 1. The results of using Scores2 and Scores3 are shown in Tables 2 and 3, respectively. From Table 2, the limit of the embedding rate using Scores2 is seen to be 26%, which is lower than that using Scores1. As for Scores3, the corresponding limit of the embedding rate drops to 6.04%, as seen in Table 3. These experimental statistics indicate that the numeric data of a cover spreadsheet with a higher scatter level can accommodate a higher embedding rate without causing statistical anomalies. This fact can also be seen from the message embedding bit rate per numeric item, also shown in the tables. Specifically, the upper bound of the embedding bit rate per numeric item in Scores1 is 2.66 bits, which is higher than those in Scores2 (1.36 bits) and Scores3 (0.32 bits).

5.2 Experimental Results Using a Spreadsheet of a Financial Statement

Another experimental result, using the Microsoft Excel file of a financial statement of a company as the cover spreadsheet, is shown in Figs. 8 through 12. Fig. 8 shows the cover spreadsheet with 32 candidate numeric items for data embedding. In this case, the strategy of choosing insignificant parts of the numeric data in the cover spreadsheet for embedding secret shares is used to keep the embedding strength low for the sake of the undetectability of the generated stego-spreadsheet.

Fig. 9 shows the input secret message, which was transformed into 32 shares by Algorithm 3. The decimal fractions of all 32 numeric items in the cover spreadsheet of Fig. 8 were used to embed the shares: each share was transformed into two digits and embedded to the right of the decimal point of a numeric item. The resulting stego-spreadsheet, shown in Fig. 10, looks like a common spreadsheet. As in the previous experiment, the two-sample Kolmogorov-Smirnov test was used, and the result, shown in Table 4, supports the use of the strategy for achieving statistical undetectability in the stego-spreadsheet.
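One possible way to place a share in the two digits to the right of the decimal point is sketched below. The details are assumptions, since the text does not give them: shares are taken to lie in 0 through 99 (which would require a prime p below 100), items are taken to be non-negative, and `decimal.Decimal` is used so that binary floating-point rounding cannot corrupt the embedded digits. The function names are hypothetical.

```python
from decimal import Decimal, ROUND_DOWN

def embed_in_fraction(value, share):
    """Replace the two digits after the decimal point of `value` with `share` (0..99)."""
    if not 0 <= share <= 99:
        raise ValueError("share must fit in two decimal digits")
    value = Decimal(str(value))
    if value < 0:
        raise ValueError("this sketch handles non-negative items only")
    whole = value.to_integral_value(rounding=ROUND_DOWN)   # keep the integer part
    return whole + Decimal(share) / Decimal(100)

def extract_from_fraction(value):
    """Read back the two digits after the decimal point as an integer share."""
    value = Decimal(str(value))
    whole = value.to_integral_value(rounding=ROUND_DOWN)
    return int((value - whole) * 100)

stego_item = embed_in_fraction("1234.56", 7)
print(stego_item, extract_from_fraction(stego_item))  # 1234.07 7
```

Only the fractional part of each item changes, which is why this strategy alters the statistics of the numeric data far less than replacing whole items.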

Fig. 11 shows the stego-spreadsheet with 3 numeric items (highlighted) modified. The secret message extracted from the modified stego-spreadsheet is shown in Fig. 12(b), in which the damaged part of the secret message is marked by asterisk symbols. As a comparison, the secret message extracted from the intact stego-spreadsheet of Fig. 10 is shown in Fig. 12(a).

5.3 Comparison with Existing Methods

For the purpose of presenting the contributions made in this study, a comparison of the capabilities of the proposed method with those of some existing covert communication methods is given in Table 5.

Most existing information hiding methods for covert communication [5-8, 10-12] were developed based on the premise that an adversary always works in the passive mode. However, as the active attack model [12] mentioned previously indicates, an adversary may introduce subtle modifications to stego-objects passing between the two parties in practical covert communication. In contrast with the existing methods, the proposed method is the only one in the comparison which can resist active attacks and simultaneously takes passive steganalytic attacks into consideration. Furthermore, the damaged part of a secret message can be localized by the proposed method to the level of a segment group; this modification localization capability is useful for verifying the integrity of the secret message.

In addition, auxiliary information for message decoding is required in some methods like [12]. Extra storage space is thus required to save the information for both parties in the communication, adding a burden to the system in practical use. By contrast, like the methods of [5-8, 10-11], the proposed method does not need any auxiliary information. Moreover, the methods in [10, 12] increase the size of the generated stego-file because they add encoding codes or change-tracking records for data embedding. The substitution or replacement operations used for data embedding in the methods of [5-8], as well as in the proposed method, keep the size of a cover file unchanged after it is transformed into a stego-version.

The embedding bit rate of the proposed method is smaller than that yielded by the methods of [5-7], which use images as cover media. However, it is noted that these methods are vulnerable to the well-known RS steganalysis [20]. This study aims at providing a new way of covert communication, and the issue of improving the embedding capacity deserves further investigation in the future.

6. CONCLUSIONS

A new covert communication method with a self-authentication capability via spreadsheets using Shamir’s (k, k + 1)-threshold secret sharing scheme has been proposed in this study. The segment groups of a secret message are transformed into secret shares and then embedded as if they were part of the content of a cover spreadsheet, yielding a camouflage effect and generating a self-authentication capability. Each segment group of the secret message extracted from a stego-spreadsheet can be blindly authenticated by checking the results computed from all the k + 1 possible combinations of k shares out of the k + 1 ones: if the resulting k + 1 copies of the recovered secret are all identical to one another, then the corresponding part of the stego-spreadsheet is decided to be intact. In case the stego-spreadsheet is authenticated to have been modified, the altered part of the hidden secret message may be identified, and the undamaged part recovered correctly.

Experimental results have been shown to demonstrate the feasibility and effectiveness of the proposed method. Derivations of the data embedding capacity and authentication precision have also been conducted, and discussions on the steganalysis issue included. Future studies may be directed to applications of the proposed method to multimedia protection in the field of fragile watermarking.

REFERENCES

[1] M. Wu, H. Yu, A. Gelman, Multi-level data hiding for digital image and video, Proc. of SPIE Photonics East, Boston, MA, USA, 1999, pp. 10-21.

[2] K. Gopalan, et al., Covert speech communication via cover speech by tone insertion, Proc. of the 2003 IEEE Aerospace Conference, Big Sky, MT, USA, March 2003.

[3] J. J. Chae, B. S. Manjunath, Data hiding in Video, Proc. of 1999 IEEE International Conference on Image Processing, Kobe, Japan, 1999, pp. 243-246.

[4] A. Cheddad, J. Condell, K. Curran, P. Mc Kevitt, Digital image steganography: Survey and analysis of current methods, Signal Processing, 90 (3) (2010) 727-752.

[5] W. Bender, D. Gruhl, N. Morimoto, A. Lu, Techniques for data hiding, IBM Syst. J., 35 (3-4) (1996) 313-336.

[6] D. C. Wu, W. H. Tsai, A steganographic method for images by pixel-value differencing, Pattern Recognition Letters, 24 (9-10) (2003) 1613-1626.

[7] C. H. Yang, C. Y. Weng, S. J. Wang, H. M. Sun, Adaptive data hiding in edge areas of images with spatial LSB domain systems, IEEE Transactions on Information Forensics and Security, 3 (3) (2008) 488-497.

[8] J. Fridrich, R. Du, Secure steganographic methods for palette images, Proc. of 3rd International Workshop Information Hiding, Dresden, Germany, Sept. 1999; also in Lecture Notes in Computer Science, Springer-Verlag, Berlin, 2000, pp. 61-76.

[9] J. T. Brassil, N. F. Maxemchuk, Copyright protection for the electronic distribution of text documents, Proc. IEEE, 87 (7) (1999) 1181-1196.

[10] I. S. Lee, W. H. Tsai, A new approach to covert communication via PDF files, Signal Processing, 90 (2) (2010) 557-565.

[11] S. Zhong, X. Cheng, T. Chen, Data hiding in a kind of PDF texts for secret communication, International Journal of Network Security, 4 (1) (2007) 17-26.

[12] T. Y. Liu, W. H. Tsai, A new steganographic method for data hiding in Microsoft Word documents by a change tracking technique, IEEE Transactions on Information Forensics and Security, 2 (1) (2007) 24-30.

[13] A. Shamir, How to share a secret, Communications of the ACM, 22 (11) (1979) 612-613.

[14] C. C. Lin, W. H. Tsai, Secret image sharing with steganography and authentication, Journal of Systems and Software, 73 (3) (2004) 405-414.

[15] C. C. Thien, J. C. Lin, Secret image sharing, Computers and Graphics, 26 (1) (2002) 765-770.

[16] L. S. T. Chen, J. C. Lin, Multithreshold progressive image sharing with compact shadows, Journal of Electronic Imaging, 19 (1) (2010) 013003.

[17] C. W. Lee, W. H. Tsai, Authentication of binary document images in PNG format based on a secret sharing technique, Proceedings of 2010 International Conference on System Science and Engineering, Taipei, Taiwan, 2010, pp. 133-138.

[18] N. Provos, P. Honeyman, Hide and seek: an introduction to steganography, IEEE Security and Privacy Magazine, 1 (3) (2003) 32-44.


[19] R. L. Pickholtz, D. L. Schilling, L. B. Milstein, Theory of spread spectrum communications - a tutorial, IEEE Transactions on Communications, 30 (5) (1982) 855-884.

[20] H. Wang, S. Wang, Cyber warfare: steganography vs. steganalysis, Communications of the ACM, 47 (10) (2004) 76-82.


Figure 1. Illustration of proposed covert communication method via spreadsheets by secret sharing. (a) Generation of a stego-spreadsheet. (b) Self-authentication of the extracted message.


Figure 2. Examples of spreadsheets. (a) Microsoft Excel. (b) Google Docs.


Figure 3. A cover spreadsheet with 300 numeric items of students’ test scores. (a) List of the first 36 items in the spreadsheet. (b) List of the last 34 items in the spreadsheet.


Figure 4. A dialogue for entering input secret message.


Figure 5. Comparison of a cover spreadsheet and the stego-spreadsheet generated from it. (a) The stego-spreadsheet. (b) The cover spreadsheet.

Figure 6. An altered spreadsheet with fake items 16 through 26.


Figure 7. An extracted secret message with a message segment retrieved from tampered items in the stego-spreadsheet marked by symbols “*.”


Figure 8. A cover spreadsheet of financial statement with 32 numeric items.


Figure 9. A dialogue with the input secret message.


Figure 10. A stego-spreadsheet in which the decimal fractions of the numeric items have been modified by embedded shares.


Figure 11. A stego-spreadsheet with 3 numeric items (highlighted) being modified.


Figure 12. The recovered secret message. (a) Message extracted from the intact stego-spreadsheet shown in Fig. 10. (b) Message extracted from the modified stego-spreadsheet shown in Fig. 11.


Table 1. Experimental results of using strategy 1 with a cover spreadsheet with high scatter level of numeric data.

| Scores 1 (300 numeric items with variance 917.76 and size 25K) | # of replaced numeric items I | Resulting hypothesis (5%) | p value | Capacity = (I/n)×m×k (bits) | Embedding bit rate per numeric item | Embedding bit rate |
|---|---|---|---|---|---|---|
| Embedding rate 5% | 16 | 0 (cannot reject) | 1 | 2×6×7 = 84 | 0.28 b | 1/298 |
| Embedding rate limit 50.67% | 152 | 1 (reject) | 0.0309 | 19×6×7 = 798 | 2.66 b | 1/31 |
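The capacity and per-item bit-rate columns of Table 1 can be re-derived with a few lines of arithmetic. The sketch below is only a check of the printed figures: it assumes, from the products shown in the table, that n = 8 (that is, k + 1), m = 6, and k = 7, and it reads the per-item bit rate as the capacity divided by the 300 numeric items of the cover spreadsheet. The function name `capacity_bits` is hypothetical.

```python
def capacity_bits(replaced_items, n=8, m=6, k=7):
    """Capacity = (I / n) * m * k, where I is the number of replaced items.

    n, m, and k default to the values inferred from the products
    printed in Table 1 (assumption, not restated in the table itself).
    """
    groups = replaced_items // n  # each group of n items holds n shares
    return groups * m * k


TOTAL_ITEMS = 300  # numeric items in the Table 1 cover spreadsheet

for replaced in (16, 152):
    bits = capacity_bits(replaced)
    rate = bits / TOTAL_ITEMS
    print(f"I = {replaced}: capacity = {bits} bits, {rate:.2f} b per item")
```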


Table 2. Experimental results of using strategy 1 with a cover spreadsheet with medium scatter level of numeric data.

| ... and size 105K) | # of replaced numeric items I | Resulting hypothesis (5%) | p value | Capacity = (I/n)×m×k (bits) | Embedding bit rate per numeric item | Embedding bit rate |
|---|---|---|---|---|---|---|
| Embedding rate 5% | 64 | 0 (cannot reject) | 0.9999 | 8×6×7 = 336 | 0.26 b | 1/313 |
| Embedding rate limit 26% | 336 | 1 (reject) | 0.049 | 42×6×7 = 1764 | 1.36 b | 1/60 |


Table 3. Experimental results of using strategy 1 with a cover spreadsheet with low scatter level of numeric data.

| ... and size 31K) | # of replaced numeric items I | Resulting hypothesis (5%) | p value | Capacity = (I/n)×m×k (bits) | Embedding bit rate per numeric item | Embedding bit rate |
|---|---|---|---|---|---|---|
| Embedding rate 5% | 112 | 0 (cannot reject) | 0.3557 | 14×6×7 = 588 | 0.26 b | 1/53 |
| Embedding rate limit 6.04% | 136 | 1 (reject) | 0.0383 | 17×6×7 = 714 | 0.32 b | 1/43 |


Table 4. Experimental results of using strategy 2 for a cover spreadsheet of a financial statement.

| Financial statement (32 numeric items and size 15K) | # of replaced numeric items | Result of hypothesis (5%) | p value | Capacity (bits) | Embedding bit rate per numeric item | Embedding bit rate |
|---|---|---|---|---|---|---|
| Embedding rate 100% | 32 | 0 (cannot reject) | 1 | 4×6×7 = 168 | 5.25 b | 1/89 |


Table 5. Comparison of existing steganographic methods and proposed method.

| Method | Manipulation of data embedding | Against active attack | Modification localization capability | Free from need of auxiliary information for message extraction | Keeping the size of a cover file after transformation into stego-version |
|---|---|---|---|---|---|
| [5, 6, 7] | LSB-based (image) | No | No | Yes | Yes |
| [8] | Parities of palette colors (image) | No | No | Yes | Yes |
| [10] | Certain ASCII codes (PDF) | No | No | Yes | No |
| [11] | Character space varying (PDF) | No | No | Yes | Yes |
| [12] | Change tracking technique (MS Word document) | No | No | No | No |
| Proposed method | Partial replacement of numeric items (spreadsheet) | Yes | Yes | Yes | Yes |
