Chapter 3 Secret Transmission via PDF Files by Space Coding and Insertion of
3.6 Discussions and Summary
In this chapter, we have proposed two data hiding techniques via PDF files for secret transmission. The first is based on a space coding scheme and the second is based on a scheme of inserting invisible texts into a PDF file.
For the first technique, the capacity of the embedded data is limited by the number of usable white-space characters in the cover PDF, but the size of the cover PDF does not change after data embedding because no other data are inserted into it. For the second technique, the capacity of the embedded data is unlimited, so we can embed a large amount of secret data, which can even be another PDF file. This is precisely the property we utilize to implement secret PDF document sharing, which is described in Chapter 5.
Furthermore, we can combine these two techniques for secret transmission. If the secret message is short enough, we use only the first technique, which does not change the size of the cover PDF; otherwise, the second technique is used subsequently to embed the remaining secret data.
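The combination rule above can be sketched as follows. This is only a minimal illustration, assuming that the space-coding capacity of the cover PDF has already been computed in bytes; the function name `split_payload` and the parameter `space_capacity` are hypothetical names introduced for this sketch.

```python
from typing import Tuple


def split_payload(secret: bytes, space_capacity: int) -> Tuple[bytes, bytes]:
    """Split a secret into the part carried by space coding and the remainder.

    space_capacity is an assumed, precomputed number of bytes that the usable
    white-space characters of the cover PDF can carry.
    """
    if space_capacity < 0:
        raise ValueError("space_capacity must be non-negative")
    by_space_coding = secret[:space_capacity]    # does not change the file size
    by_invisible_text = secret[space_capacity:]  # empty if the message is short
    return by_space_coding, by_invisible_text
```

When the second element of the result is empty, only the first technique is needed and the size of the cover PDF stays unchanged.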
No matter which technique is used for secret transmission, or whether both are used, the display of the PDF file is not affected, so people cannot become aware of the existence of the hidden data. Even if an illicit user knows that there is a secret message in the PDF file, the covert message is protected by a user key, and the illicit user still cannot extract the original secret message. This idea has been verified by our experiments.
Chapter 4
Authentication of Secret Messages for Fidelity Verification in PDF Files
4.1 Introduction
It is not safe for people to transmit messages on the Internet because the messages might be intercepted and tampered with by illicit users. Even if data hiding techniques are used to embed a secret message, the cover medium in which the secret message is embedded may still be altered. We therefore designed a method in this study to authenticate the secret message and check whether it is believable.
More specifically, the proposed method can be used to authenticate a secret message embedded in a PDF file for fidelity verification. The basic idea will be described in Section 4.2. Basically, we modify the generation numbers in the PDF structure as authentication signals. In Section 4.3, the proposed authentication algorithm for secret messages in PDF files will be stated. Several experimental results will be shown in Section 4.4, and then a summary of the proposed method and some discussions are given in Section 4.5.
4.2 Idea of Secret Message Authentication by Modifying Generation Numbers in the PDF File
Since what we want to authenticate is a secret message, we need to embed the authentication signals in a way different from the one used to embed the secret message. The proposed data hiding technique for embedding authentication signals is to modify the generation numbers in the PDF file.
As mentioned in Section 2.2, an identifier of an indirect object in a PDF file consists of two parts: a positive integer object number and a non-negative integer generation number. The generation number is used to keep track of the number of times the object has been updated. It also appears in the cross-reference table.
Figure 4.1 The example of generation numbers.
More specifically, Figure 4.1 shows a cross-reference table. The second object in the table is object 7, its offset is 719, and its generation number is 0, which is highlighted by a red rectangle. Besides the cross-reference table, the generation number of object 7 also appears in other places, such as “7 0 R”, which is a reference to object 7, and “7 0 obj”, which is the identifier of object 7. All generation numbers of the same indirect object in the PDF file must have the same value; otherwise, the PDF document may be displayed incorrectly.
After some experiments, we find out that the value of a generation number of an indirect object has nothing to do with the display of the PDF document. So we can modify some generation numbers to embed authentication signals. First, we use the secret message to generate authentication signals by applying exclusive-OR operations on each byte of the secret message. Then, we transform the authentication signal into a bitstream and then use each bit in the bitstream to replace the generation numbers of some indirect objects in the PDF file.
For example, suppose the authentication signal is “10011000.” Then, we use each bit of the authentication signal to replace the generation numbers of object 1 to 8, respectively. So the modified generation numbers of these indirect objects are 1, 0, 0, 1, 1, 0, 0 and 0, respectively. Note that all the generation numbers of the same indirect object in the PDF file must be the same value, so we need to modify all generation numbers of the indirect objects which we use to embed the authentication signal.
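The consistent replacement of generation numbers can be sketched as follows. This is a minimal illustration on uncompressed PDF text, not a full PDF parser: it assumes that object identifiers and references are written with single spaces as in “7 0 obj” and “7 0 R”, and that no stream data happens to contain the same patterns. The function names `set_generation` and `embed_bits` are hypothetical. The entries of the cross-reference table would need the same change and are omitted here.

```python
import re


def set_generation(pdf_text: str, obj_num: int, new_gen: int) -> str:
    """Rewrite the generation number of one object in 'N G obj' and 'N G R'."""
    pattern = re.compile(rf"\b{obj_num} \d+ (obj|R)\b")
    return pattern.sub(lambda m: f"{obj_num} {new_gen} {m.group(1)}", pdf_text)


def embed_bits(pdf_text: str, bits: str) -> str:
    """Use the i-th bit of the signal as the generation number of object i."""
    for obj_num, bit in enumerate(bits, start=1):
        pdf_text = set_generation(pdf_text, obj_num, int(bit))
    return pdf_text


# Toy fragment with two indirect objects that refer to each other.
toy = (
    "1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    "2 0 obj\n<< /Type /Pages /Parent 1 0 R >>\nendobj\n"
)
stego = embed_bits(toy, "10")  # object 1 gets generation 1, object 2 keeps 0
```

Every occurrence of the identifier and of the references to object 1 is changed together, which keeps the generation numbers of the same indirect object equal.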
When we receive a stego-PDF, extract a secret message from it, and want to know whether the secret message is believable, we use the extracted secret message to generate an authentication signal in the same way. Then we compare it with the signal extracted from the generation numbers in the PDF file. If they are the same, the authentication is successful. Otherwise, the secret message is not believable and is judged to have been tampered with by illicit users.
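The generation and comparison of the authentication signal can be sketched as follows. The exact way of folding the message bytes into a two-byte signal is an assumption made only for this sketch: bytes at even positions are combined by exclusive-OR into the first signal byte, and bytes at odd positions into the second. The function names `auth_signal` and `verify` are hypothetical.

```python
def auth_signal(message: bytes) -> str:
    """Return a 16-bit authentication signal as a string of '0' and '1'.

    Assumed folding rule (illustrative only): XOR of the even-indexed bytes
    gives the first signal byte, XOR of the odd-indexed bytes the second.
    """
    first, second = 0, 0
    for index, byte in enumerate(message):
        if index % 2 == 0:
            first ^= byte
        else:
            second ^= byte
    return f"{first:08b}{second:08b}"


def verify(extracted_message: bytes, embedded_bits: str) -> bool:
    """Compare the regenerated signal with the one read from the PDF file."""
    return auth_signal(extracted_message) == embedded_bits
```

For instance, a signal computed from the original message no longer matches after a single byte of the extracted message has been changed, so `verify` returns `False` in that case.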
4.3 Proposed Authentication Processes for Secret Messages in PDF files
In the proposed authentication process, we use the generation numbers of objects 1 through 16 to embed every two bytes of the authentication signal. We separate the authentication process into two parts, which are described here. The first part is the embedding of the authentication signal. Its flowchart is illustrated by Figure 4.2 and the detail is described below as an algorithm.
Algorithm 4.1. Generating and embedding authentication signals in PDF files.
Input: a secret message S and a cover PDF P.
Output: a stego-PDF P′′ with an authentication signal for the secret message embedded.
Steps:
1 Apply exclusive-OR operations on all bytes of S to get an authentication signal A by the following way.
16.
3.2 Replace the last digit of the generation numbers of objects 1 through 16 in the cross-reference table by a1, a2, …and a16, respectively.
3.3 Replace all generation numbers of object i in P′ by the modified generation numbers, where i is from 1 to 16.
4 Take the resulting stego-PDF as the desired output P′′.
The second part of the proposed authentication process is extracting the authentication signal and verifying the secret message for fidelity. Its flowchart is illustrated by Figure 4.3 and the detail is described below as an algorithm.
Algorithm 4.2. Extracting authentication signals and verifying the secret message in a stego-PDF.
Input: a stego-PDF P′′.
Output: an authentication report R of the secret message in P′′.
Steps:
1 Extract the secret message S in P′′.
2 Apply exclusive-OR operations on all bytes of S to get an authentication signal A′ by the following way.
3 Extract the embedded authentication signal A in P′′ by the following way.
3.1 Find out the last digit of the generation numbers of objects 1 through 16 in the cross-reference table to get a1, a2,…and a16, respectively, where ai = 0 or 1 with i from 1 to 16.
3.2 Concatenate a1 through a16 to get A.
4 Compare A and A′.
4.1 If A = A′, regard the authentication to be successful and mark it so in R.
4.2 Else, A ≠ A′, decide that the secret message has been tampered with by illicit users and mark it so in R.
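Step 3 of Algorithm 4.2 can be sketched as follows. This is a minimal illustration, assuming a classic cross-reference table with a single subsection whose entries have a 10-digit offset, a 5-digit generation number, and a keyword n or f; the function name `read_signal_bits` is hypothetical.

```python
import re

# One classic cross-reference entry: offset, generation number, n or f.
XREF_ENTRY = re.compile(r"^(\d{10}) (\d{5}) ([nf])\s*$")


def read_signal_bits(xref_text: str, count: int = 16) -> str:
    """Read the last digit of the generation numbers of objects 1..count."""
    lines = xref_text.strip().splitlines()
    if not lines or lines[0].strip() != "xref":
        raise ValueError("text does not start with an xref keyword")
    first_obj, n_entries = map(int, lines[1].split())
    bits = []
    for obj_num in range(1, count + 1):
        index = obj_num - first_obj
        if not 0 <= index < n_entries:
            raise ValueError(f"object {obj_num} is not in this subsection")
        match = XREF_ENTRY.match(lines[2 + index])
        if match is None:
            raise ValueError(f"malformed entry for object {obj_num}")
        last_digit = match.group(2)[-1]
        if last_digit not in "01":
            raise ValueError(f"object {obj_num} carries no signal bit")
        bits.append(last_digit)
    return "".join(bits)


# Toy table: objects 1, 2 and 3 carry the bits 1, 0 and 1.
toy_xref = (
    "xref\n0 4\n"
    "0000000000 65535 f \n"
    "0000000015 00001 n \n"
    "0000000074 00000 n \n"
    "0000000120 00001 n \n"
)
signal = read_signal_bits(toy_xref, count=3)
```

The resulting bit string plays the role of A and can be compared with A′ as in step 4.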
4.4 Experimental Results
We use the same program which was described in Chapter 3 for secret transmission to implement the authentication of the secret message in a stego-PDF. In Figure 4.4, we embed a secret message and its authentication signal into the cover PDF. In Figure 4.5, we extract the secret message first and then authenticate it to get the result of successful authentication. Figures 4.6 and 4.7 show the cover PDF and the stego-PDF with the secret message and authentication signal embedded, respectively. Comparing the two figures, no change can be seen on the display of the PDF document. If the secret message has been tampered with by illicit users, we will extract the wrong secret message. The result of the authentication is shown in Figure 4.8. It warns us that the secret message is not believable.
Figure 4.2 The flowchart of embedding authentication signals.
Figure 4.3 The flowchart of the authentication processes.
Figure 4.4 The window of the user interface for secret authentication
Figure 4.5 The result of successful authentication
Figure 4.6 The cover PDF.
Figure 4.7 The stego-PDF.
Figure 4.8 The result of failed authentication.
4.5 Discussions and Summary
We have proposed a method for authentication of secret messages for fidelity verification in PDF files in this chapter. We generate authentication signals by applying exclusive-OR operations on each byte of the secret message and embed it into the secret-embedded PDF file by modifying generation numbers. It has no influence on the display of the PDF file. We use two bytes to embed the authentication signal in this study; however, if the secret message is long, we can generate more bytes of the authentication signals for secret authentication.
If we want to authenticate the secret message extracted from a stego-PDF, we just need to compare the authentication signal embedded in the stego-PDF with the one generated from the extracted secret message. The comparison gives the authentication result. This idea has been verified by our experiments.
Chapter 5
Secret Sharing via PDF Documents by Data Hiding Techniques
5.1 Introduction
Secret sharing is an interesting subject. For example, suppose that a treasure map is divided into several parts and distributed to some participants. Only when all the participants bring their partial maps together can the complete treasure map be recovered so that they can go to find the treasure. This prevents anyone from finding and taking the treasure alone. Although a treasure map may only occur in stories, many things in the real world nowadays need protection just like the treasure map, for example, certain copyrighted products which are devised not by a single person but by a group of people in cooperation.
Many secret sharing methods have been proposed for different kinds of digital formats [8-10]. Because the PDF has become a very popular file format nowadays, many research papers and publications are saved as PDF documents. In addition to being used as cover files, the PDF document can be treated as a secret file to share as well.
In this chapter, the proposed method of secret sharing via PDF documents is described. An overview of the proposed method is given in Section 5.2. The proposed secret sharing and recovery algorithms are described in Sections 5.3 and 5.4, respectively. In Section 5.5, some experimental results of the proposed method are shown. Finally, a summary of the proposed method and some discussions are given in Section 5.6.
5.2 Overview of Proposed Method of Secret Sharing via PDF Documents
In this section, the idea of the proposed method of secret sharing via PDF documents by data hiding techniques is described. The secret which we want to share is a PDF document, and the size of the secret PDF document is big in general. We use the data hiding technique based on the scheme of inserting invisible texts into cover PDFs, which is discussed in Section 3.2.2, to implement the proposed method.
The general idea of the proposed method is illustrated in Figure 5.1. We have a secret PDF document and select n cover PDFs randomly, denoted as P1 through Pn, respectively. After applying the proposed secret sharing process, we transform them into n stego-PDFs which are regarded as shares and denoted as P1′ through Pn′, respectively. Then we distribute the n shares to a group of participants of the same number, each participant with a share.
When the participants want to recover the secret PDF document, they should bring their shares all together. After applying the proposed secret recovery process, they can get the original secret PDF document.
From each cover PDF, we extract a portion of data with the same size as that of the secret PDF file. Every portion of data is regarded as a preliminary share. Let the length of the secret PDF document in bytes be denoted as l. The idea of extracting data from each cover PDF to prepare the preliminary share is described below.
Figure 5.1 Illustration of proposed method of secret sharing via PDF documents.
1. If the size of the cover PDF is larger than or equal to that of the secret PDF document, we extract data whose length is l from the beginning of the cover PDF.
2. If the size of the cover PDF is smaller than that of the secret PDF document, we extract data from the beginning of the cover PDF to its end and repeat this operation, until the total length of the extracted data is l.
3. As mentioned in Section 3.2.2, after embedding data into an indirect object by the data hiding technique of inserting invisible texts, we need to update the cross-reference table and the trailer in the cover PDF. This means that the indirect object in which the data are embedded, the cross-reference table, and the trailer in the stego-PDF differ from those in the cover PDF. Because the same technique is used in the later processes to embed the intermediate secret message, we skip these three parts when extracting data from each cover PDF to get the preliminary shares. This guarantees that the preliminary shares extracted from the stego-PDFs are identical to those extracted from the cover PDFs, so that the recovered PDF document is correct.
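The three rules above can be sketched as follows. This is a minimal illustration, assuming that the parts to be skipped (the indirect object used for embedding, the cross-reference table, and the trailer) have already been located as byte ranges; the function name `preliminary_share` and the parameter `skip` are hypothetical.

```python
def preliminary_share(cover: bytes, length: int, skip=()) -> bytes:
    """Extract `length` bytes from a cover PDF, repeating it if it is short.

    skip is an assumed list of (start, end) byte ranges, end exclusive,
    covering the parts of the file that change during embedding.
    """
    usable = bytearray()
    position = 0
    for start, end in sorted(skip):
        usable += cover[position:start]  # keep the bytes before the range
        position = max(position, end)    # jump over the skipped range
    usable += cover[position:]
    if not usable:
        raise ValueError("no usable bytes in the cover PDF")
    repeats = -(-length // len(usable))  # ceiling division
    return bytes((usable * repeats)[:length])
```

For example, a 3-byte cover `b"abc"` and l = 7 give `b"abcabca"`, which corresponds to rule 2, while a cover longer than l is simply truncated as in rule 1.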
After the above process, we can get n preliminary shares with the same size. We then use them and the secret PDF document to generate the intermediate secret message by using exclusive-OR and coincidence operators. Which operator should be used is decided by a key K generated from the user keys selected and kept by the participants, respectively. Let the user key of participant Pi be denoted as Ki, where i is from 1 to n. Then, we generate K as K = K1⊕K2…⊕ Kn.
We know that p⊙q = ¬(p⊕q), where “⊕” is the exclusive-OR operator, “⊙” is the coincidence operator, and “¬” denotes the complement. Because “⊕” is commutative, p⊙q = ¬(p⊕q) = ¬(q⊕p) = q⊙p, so “⊙” is also commutative. We can therefore use exclusive-OR and coincidence operators to generate the intermediate secret message S′.
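The two operators and the key combination K = K1⊕K2…⊕Kn can be sketched on bytes as follows. This is a minimal illustration, assuming that the user keys are represented as non-negative integers, which is a choice made only for this sketch; the function names are hypothetical.

```python
from functools import reduce


def xor(p: int, q: int) -> int:
    """Exclusive-OR of two bytes."""
    return (p ^ q) & 0xFF


def coincidence(p: int, q: int) -> int:
    """Coincidence of two bytes: the bitwise complement of their XOR."""
    return ~(p ^ q) & 0xFF


def combine_keys(keys) -> int:
    """Combine the user keys K1..Kn into K by exclusive-OR."""
    return reduce(lambda a, b: a ^ b, keys, 0)


# Commutativity of the coincidence operator, checked over all byte pairs.
is_commutative = all(
    coincidence(p, q) == coincidence(q, p)
    for p in range(256)
    for q in range(256)
)
```

The exhaustive check over all 256 × 256 byte pairs agrees with the commutativity argument given above.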
Denote each preliminary share as Si = si1si2…sil where i is from 1 to n and si1
After we get the intermediate secret message S′, we separate S′ into n parts, each with a length of l/n bytes. Then, we embed the parts into P1 through Pn, respectively, by the scheme of inserting invisible texts discussed in Section 3.2.2, to get the stego-PDFs P1′ through Pn′, respectively. Figure 5.2 shows an illustration of this secret sharing process.
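The separation of S′ into n parts can be sketched as follows. How the parts are sized when l is not a multiple of n is an assumption made only for this sketch: the part length is rounded up and the last parts may be shorter or empty. The function name `split_parts` is hypothetical.

```python
def split_parts(data: bytes, n: int) -> list:
    """Split the intermediate secret message into n consecutive parts."""
    if n <= 0:
        raise ValueError("n must be positive")
    size = -(-len(data) // n)  # ceiling of l / n (assumed rounding rule)
    return [data[i * size:(i + 1) * size] for i in range(n)]


parts = split_parts(b"abcdefg", 3)
rejoined = b"".join(parts)  # concatenation restores the original message
```

Concatenating the parts in the same order gives back S′, which is what the recovery process relies on.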
Figure 5.2 Illustration of proposed secret sharing process.
If we want to recover the secret PDF document, the participants should bring the n stego-PDFs all together. First, we extract n parts of the intermediate secret message from the stego-PDFs and concatenate them to get S′ with length l. Then, we get n
preliminary shares from the stego-PDFs by a reverse process of the way mentioned in the bits of K. Then, we recover S by the following way:
s1 = s′1 ○1 s11 ○2 s21 ○3 s31… ○n sn1
Finally, we recover the secret PDF document S. An illustration of the proposed secret recovery process is shown in Figure 5.3.
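The sharing and recovery computations can be sketched together as follows. The way the bits of K select the operators is an assumption made only for this sketch: bit i of K (counted from the least significant bit) chooses the coincidence operator for the i-th preliminary share when it is 1, and the exclusive-OR operator when it is 0. The function name `apply_chain` is hypothetical.

```python
def apply_chain(base: bytes, shares, key: int) -> bytes:
    """Combine `base` with every preliminary share, byte by byte.

    Assumed rule (illustrative only): bit i of `key` selects the coincidence
    operator for share i when set, and exclusive-OR otherwise.
    """
    out = bytearray(base)
    for i, share in enumerate(shares):
        if len(share) != len(out):
            raise ValueError("all shares must have the length of the secret")
        use_coincidence = (key >> i) & 1
        for j in range(len(out)):
            value = out[j] ^ share[j]
            out[j] = (~value & 0xFF) if use_coincidence else value
    return bytes(out)


secret = b"PDF!"
shares = [b"abcd", b"wxyz", b"1234"]  # stand-ins for preliminary shares
key = 0b101                           # stand-in for the combined key K

intermediate = apply_chain(secret, shares, key)     # sharing
recovered = apply_chain(intermediate, shares, key)  # recovery
```

Because a ⊙ b equals a ⊕ b with every bit complemented, applying the same chain of operators a second time cancels all the shares and complements, so the same function serves both directions in this sketch.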
5.3 Proposed Secret Sharing Algorithm
The idea of the proposed secret PDF document sharing has been described in the last section and the detail is described below as an algorithm.
Algorithm 5.1. Sharing a secret PDF document.
Input: a secret PDF document S and n cover PDFs P1, P2,…, Pn with n user-selected keys K1, K2, …, Kn.
Output: n stego-PDFs P1′, P2′,…, Pn′.
Figure 5.3 Illustration of proposed secret recovery process.
Algorithm 5.2. Recovering a secret PDF document.
Input: n stego-PDFs P1′, P2′,…, Pn′ with n user-selected keys K1, K2, …, Kn.
Output: a secret PDF document S.
Steps:
1 Extract the embedded data S′1, S′2,…, S′n from P1′, P2′,…, Pn′, respectively.
2 Concatenate S′1 through S′n to get intermediate secret message S′ with length l.
3 Transform S′ into S′ = s′1s′2…s′l , where s′1 through s′l are the bytes of S′.
4 Extract S1, S2,…, Sn from P1′, P2′,…, Pn′, respectively, by the way mentioned in language of Java. It supports both the secret sharing and recovery functions. In one of our experiments, we used three cover PDFs to share a secret PDF document.
Figure 5.4 shows the dialog window for keying in the number of participants in the secret sharing activity. The number of cover PDFs is decided according to this number of participants. We used three cover PDFs in this experiment. Figure 5.5 shows the window of the user interface, and Figure 5.6 shows this window with a secret PDF document and three cover PDFs with user keys as input. The three cover PDFs are shown in Figures 5.7, 5.8 and 5.9, respectively, and the secret PDF document is shown in Figure 5.10.
After the proposed secret sharing process was conducted, we obtained three stego-PDFs, as shown in Figures 5.11, 5.12 and 5.13, respectively. No change can be seen in the displays of the stego-PDF documents, compared with those of the original cover PDFs. We can then distribute them to the three participants, each participant with a share.
Figure 5.14 shows the window of the user interface with the three stego-PDFs and user keys as input to recover the secret PDF document. The recovered PDF document is shown in Figure 5.15. It is identical to the original secret PDF document shown in Figure 5.10. If any one of the shares or user keys is incorrect, the recovered PDF document cannot be opened, as shown in Figure 5.16.
Figure 5.4 The window of inputting the number of PDF documents.
Figure 5.5 The window of the user interface.
Figure 5.6 The window of the user interface with a secret PDF document and three cover PDFs with user keys as input.
Figure 5.7 The first cover PDF
Figure 5.8 The second cover PDF
Figure 5.9 The third cover PDF
Figure 5.10 The secret PDF document.
Figure 5.11 The first stego-PDF.
Figure 5.12 The second stego-PDF.
Figure 5.13 The third stego-PDF.
Figure 5.14 The window of the user interface with three stego-PDFs and user keys as input.
Figure 5.15 The recovered PDF document.
Figure 5.16 The incorrect secret PDF document.
5.6 Discussions and Summary
We have proposed a method for secret sharing via PDF documents in this chapter.
We use exclusive-OR and coincidence operators to share the secret PDF document and embed the intermediate secret message into cover PDFs by the data hiding technique which is mentioned in Chapter 3. By the proposed secret sharing method, we can select cover PDFs and user keys randomly. After applying the proposed sharing method, we can get shares and distribute them to a group of participants of the same number, each participant with a share. Only when all the participants bring their