Surveys of Related Studies and Brief Descriptions of Proposed Methods
2.1 Survey of Related Studies
Many data hiding techniques have been proposed in the literature, and this dissertation study is dedicated to developing new data hiding techniques for various applications. Related studies on data hiding are surveyed first in the following, followed by brief descriptions of the proposed methods.
2.1.1. Survey of Data Hiding in Binary Images
Many data hiding techniques have been proposed for a variety of applications of digital images in recent years [1-22]. Most of the techniques were proposed for color and grayscale images because pixels in such images take a wide range of values and so are more suitable for data hiding. One simple approach to data hiding in grayscale images is to use the LSB replacement technique to hide secret data or authentication signals. However, data hiding in binary images is more challenging. Because binary image pixels take only two sharply contrasting values, it is easier for human eyes to notice pixel value changes in binary images. Therefore, it is more difficult to hide data in binary images than in color or grayscale images. Wu et al. [12] embedded secret data in specific image blocks that are selected with higher “flippability” scores by pattern matching. Flippable pixels on image region boundaries are then manipulated to embed a significant amount of data without causing noticeable artifacts. Pan et al. [6] changed pixel values in image blocks, mapped block contents into the secret data, and used a secret key and a weight matrix to protect the hidden data. Given an image block of size m×n, the scheme can conceal up to ⌊log₂(m×n + 1)⌋ bits of data in the image by changing, at most, two bits in an image block. Tseng and Pan [8] proposed a technique to alter an image bit into a new value identical to a neighboring one, which yields a better hiding effect within a binary image. Koch and Zhao [2] embedded a bit 0 or 1 in a block by changing the number of black pixels in the block to be larger or smaller than that of white ones, respectively. In [5, 11], secret data are concealed in dithered images by manipulating dithering patterns. Tzeng and Tsai [9] encoded the edge features of binary images into 4×4 block patterns, and authenticated the images by pattern matching. Tzeng and Tsai [10] also proposed a new feature, called surrounding edge count, for measuring the structural randomness in a 3×3 image block, and defined “pixel embeddability” from the viewpoint of minimizing image distortion. Accordingly, embeddable image pixels suitable for hiding secret data can be selected. Wu et al. [14] used even-odd relationships of lengths of run pairs to embed information in binary images, and adjusted the length of each run to an even or odd value to represent the embedded bit value.
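As an illustration of the block-based idea attributed to Koch and Zhao [2] above, the following is a minimal Python sketch, not the authors' implementation. It assumes a block is a NumPy array with 1 for black and 0 for white, and it flips pixels in plain raster order, whereas a practical scheme would choose which pixels to flip so that visual distortion is kept small. The function names `embed_bit` and `extract_bit` are hypothetical.

```python
import numpy as np

def embed_bit(block, bit):
    """Return a copy of a binary block (1 = black, 0 = white) that encodes `bit`.

    Convention taken from the text: bit 0 -> more black than white pixels,
    bit 1 -> fewer black than white pixels.
    """
    flat = np.asarray(block, dtype=np.uint8).flatten()  # flatten() returns a copy
    want_black_majority = (bit == 0)
    while True:
        black = int(flat.sum())
        white = flat.size - black
        if want_black_majority and black > white:
            break
        if not want_black_majority and black < white:
            break
        # Flip the first pixel (raster order) of the colour we need fewer of.
        # Assumption: no attempt is made here to pick visually "safe" pixels.
        source = 0 if want_black_majority else 1
        index = np.flatnonzero(flat == source)[0]
        flat[index] = 1 - source
    return flat.reshape(np.shape(block))

def extract_bit(block):
    """Read the bit back by comparing black and white pixel counts."""
    flat = np.asarray(block, dtype=np.uint8).flatten()
    black = int(flat.sum())
    white = flat.size - black
    return 0 if black > white else 1
```

For a 4×4 all-white block, embedding a 0 in this sketch flips pixels until nine are black, which is the smallest black majority; embedding a 1 leaves the block unchanged.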
2.1.2. Survey of Data Hiding in Grayscale Images
Wang et al. [15] embedded an image in the fifth LSB bit plane of a cover grayscale image, and employed an optimal substitution process based on a genetic algorithm and a local pixel adjustment method to lower the distortion in the stego-image. Chang et al. [16] used dynamic programming to obtain an optimal solution for the LSB substitution method. Chan and Cheng [17, 18] presented an optimal pixel adjustment process to improve the image quality of the stego-image acquired by Wang’s schemes. Thien and Lin [19] proposed a method for hiding data in images digit by digit using a modulus function. The method is better than simple LSB substitution not only in eliminating false contours but also in reducing image distortion. Lee and Chen [20] applied variable-sized LSB insertion to estimate the maximum embedding capacity by a human visual system (HVS) property, and to maintain image fidelity by removing false contours in smooth image regions. Liu et al. [21] presented a novel bit plane-wise data hiding scheme using variable-depth LSB substitution and employed post-processing to eliminate the resulting noticeable artifacts.
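The basic LSB substitution that several of the above studies build on can be sketched as follows. This is a minimal Python illustration under stated assumptions, not a reproduction of any cited method: the message is given as a list of k-bit integers, pixels are 8-bit, and the function names are hypothetical. The `adjust_pixels` step is a simplified adjustment in the spirit of the pixel adjustment idea mentioned above: it adds or subtracts 2^k when that brings a stego pixel closer to the cover pixel, which leaves the k embedded bits untouched.

```python
import numpy as np

def lsb_embed(cover, symbols, k):
    """Replace the k least significant bits of the first pixels with `symbols`."""
    flat = np.asarray(cover, dtype=np.uint8).flatten()  # copy of the cover
    symbols = np.asarray(symbols, dtype=np.uint8)
    if symbols.size > flat.size:
        raise ValueError("message is longer than the cover image")
    if symbols.size and int(symbols.max()) >= 2 ** k:
        raise ValueError("each symbol must fit in k bits")
    keep_mask = np.uint8((0xFF << k) & 0xFF)  # keeps the upper 8 - k bits
    flat[:symbols.size] = (flat[:symbols.size] & keep_mask) | symbols
    return flat.reshape(np.shape(cover))

def lsb_extract(stego, count, k):
    """Read back `count` k-bit symbols from the first pixels."""
    flat = np.asarray(stego, dtype=np.uint8).flatten()
    return flat[:count] & np.uint8(2 ** k - 1)

def adjust_pixels(cover, stego, k):
    """Shift stego pixels by +/- 2**k when that reduces the embedding error."""
    c = np.asarray(cover, dtype=np.int16)
    s = np.asarray(stego, dtype=np.int16).copy()
    delta = s - c
    step, half = 2 ** k, 2 ** (k - 1)
    too_high = (delta > half) & (s >= step)        # room to subtract 2**k
    too_low = (delta < -half) & (s < 256 - step)   # room to add 2**k
    s[too_high] -= step
    s[too_low] += step
    return s.astype(np.uint8)
```

With k = 3, a cover pixel of 8 carrying the symbol 7 becomes 15 after plain substitution (error 7); the adjustment step moves it to 7 (error 1) while the three embedded bits stay the same.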
Most of the above methods do not consider precise human visual models in improving the data hiding effect. In contrast, Wu and Tsai [13] presented a method based on the HVS, which modifies quantization scales according to variation insensitivity from smooth to contrastive to improve stego-image quality. Lie and Chang [22] presented an adjusted LSB technique in which the number of LSBs adapts to pixels of different grayscales.
On the other hand, some steganalysis techniques were developed to detect secret messages in stego-images. Lyu and Farid [23] developed a universal blind detection scheme to detect hidden messages in stego-images, which uses wavelet-like decomposition to build higher-order statistical models of natural images and adopts the support vector machine as an optimal classifier to separate stego-images from cover images. The method demonstrates good performance on JPEG images, and the selected statistics are rich enough to detect hidden data in the results yielded by a very wide range of steganographic methods. In addition, to detect data hidden in LSBs in the spatial domain, it is observed that the basic LSB substitution method changes pixel values only between 2i and 2i + 1 in the i-th bit plane of the pixel value. This leads to an effective steganalytic technique, the RS method proposed by Fridrich et al. [24], which not only can expose the presence of secret data but also can estimate the length of the embedded data.
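The observation about value pairs can be checked directly for plain replacement of the single least significant bit. The short Python sketch below is only a demonstration of that property, not of the RS method itself; the helper names are hypothetical.

```python
def replace_lsb(value, bit):
    """Overwrite the least significant bit of an integer pixel value."""
    return (value & ~1) | bit

def pair_of(value):
    """Return the pair (2i, 2i + 1) that contains `value`."""
    even = value - value % 2
    return (even, even + 1)

# Every 8-bit value stays inside its own pair, whichever bit is embedded.
stays_in_pair = all(
    replace_lsb(v, b) in pair_of(v) for v in range(256) for b in (0, 1)
)
```

Because a value 2i can only stay at 2i or move to 2i + 1 (and never to 2i − 1), embedding leaves a statistical trace in the relation between the two members of each pair, which is the kind of regularity that detectors for LSB embedding exploit.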
2.1.3. Survey of Data Hiding in Color Images
Many techniques for data hiding in color images have been proposed in the past decade [1, 7, 27], which may be categorized into two major types: the spatial-domain method and the frequency-domain method. In the former, secret data are directly embedded in the characteristics of the pixels of the cover image, and in the latter, the cover image is transformed first into frequency-domain coefficients, into which secret data are embedded. In general, the frequency-domain method is more robust against attacks while the spatial-domain method can hide more data. The previously-surveyed methods for data hiding in binary and grayscale images operate in the spatial domain. For the other method, related papers are very few unless the previously-surveyed methods are adapted to be applicable to color images, for example, by considering each color channel as a grayscale image. Tsai and Wang [28] proposed a data hiding technique for color images using a binary space partitioning tree, which partitions the RGB color space into voxels and embeds three message bits into each voxel.
2.1.4. Survey of Data Hiding in Text Documents
In contrast with other multimedia, digital texts contain less redundant information for embedding data. Most data hiding methods for digital text documents try to encode information directly into the text itself or into the text format. One way of into-text hiding is to exploit the natural redundancy of languages, and one way of into-format hiding is to adjust inter-word or inter-line space [29]. On the other hand, from the steganographic point of view, digital text documents can be classified into two types: hard-copy and soft-copy [27]. A hard-copy text may be treated as a binary image resulting from scanning a text document, while a soft-copy text may be regarded as an American Standard Code for Information Interchange (ASCII) text that can be edited by text editing software such as Notepad.
For a hard-copy text, which is interpreted as a highly-structured image, information can be embedded into the layout or format of the image. Low et al. and Brassil et al. [30-31] presented text-based steganographic methods which use the distances between consecutive lines of texts or between consecutive words to hide information. If the space between two lines is smaller than a threshold, a “0” is represented; otherwise, a “1.”
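The threshold rule for line spacing can be sketched in a few lines of Python. This is only an illustration of the decoding rule stated above: it assumes the vertical positions of the text-line baselines have already been measured from the scanned image, and the function name, the units, and the threshold value are all hypothetical.

```python
def decode_line_spacing(baselines, threshold):
    """Decode one bit per gap between consecutive text lines.

    `baselines` holds the vertical positions of the text lines, top to bottom.
    A gap smaller than `threshold` is read as "0"; otherwise it is read as "1".
    """
    gaps = [lower - upper for upper, lower in zip(baselines, baselines[1:])]
    return "".join("0" if gap < threshold else "1" for gap in gaps)
```

For example, baselines at 0, 10, 22, 32, and 44 with a threshold of 11 give the gaps 10, 12, 10, 12, which this sketch decodes as the bit string 0101.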
In contrast with hard-copy texts and other digital media, it is more difficult to hide data in soft-copy texts due to the lack of redundant information. Even a slight modification, like rewriting a letter, may be noticed by a reader. However, the huge amounts of text documents that people deal with daily on the Internet are essentially soft-copy texts. Thus, the protection of digital rights of this type of text document becomes more and more important.
Bender et al. [27] proposed the use of infrequent additional spaces to form secret data and transmit them in soft-copy texts, including inter-sentence spacing, end-of-line spacing, and inter-word spacing in texts. For example, one space between words is taken to represent a “0” and two spaces a “1.” Wayner [32] proposed a method to use the context-free grammar to create secret text messages in cover files for covert communication; the secret message is not embedded in the cover file directly. And a receiver extracts the hidden message by parsing. A constraint is that the cover text should be a meaningful message; otherwise, a reader will doubt it.
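The inter-word spacing rule described above (one space for a “0” and two spaces for a “1”) can be sketched as follows. This is a minimal Python illustration, not the cited authors' implementation. It assumes the cover text is normalized to single spaces before embedding and that the receiver knows the number of embedded bits; the function names are hypothetical.

```python
import re

def embed_in_spaces(cover_text, bits):
    """Encode `bits` (a string of "0"/"1") in the gaps between words."""
    words = cover_text.split()
    if len(bits) > len(words) - 1:
        raise ValueError("cover text has too few word gaps for the message")
    pieces = [words[0]]
    for i, word in enumerate(words[1:]):
        # Two spaces encode "1"; one space encodes "0" or an unused gap.
        gap = "  " if i < len(bits) and bits[i] == "1" else " "
        pieces.append(gap + word)
    return "".join(pieces)

def extract_from_spaces(stego_text, n_bits):
    """Read `n_bits` bits back from the first word gaps."""
    gaps = re.findall(r" +", stego_text.strip())
    return "".join("1" if len(gap) >= 2 else "0" for gap in gaps[:n_bits])
```

Unused gaps after the message are single spaces and therefore look like “0” bits, which is why this sketch passes the message length to the extractor explicitly.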
Cantrell and Dampier [33] proposed to embed data into unused spaces in file headers. These spaces are invisible to ordinary users because they are disregarded when the files are opened. The spaces can be seen when examined at the byte level, but few users would do so. Johnson et al. [34] proposed another way to embed information in unused spaces that are imperceptible to an observer, which is based on the fact that operating systems usually allocate more space than a file needs, leaving some unused space in which information can be hidden. A third method is to create a hidden partition in a file system to embed information. The partition is not visible under normal use. This concept has been expanded in a steganographic file system [35]. If a user knows the file name and the password, access to the file will be granted; otherwise, no evidence of the file will be revealed in the system of the hidden files.
Characteristics inherent in network protocols may also be taken advantage of to hide information [36]. For example, TCP/IP packets can be used to transmit secret messages across the Internet by embedding the messages in unused spaces in the packet headers.
Finally, Chang and Tsai [37] proposed a special space encoding to embed copyright information into the HTML text content. The bit “1” is encoded by inserting a so-called pseudo-space string “ ” before a real space, while the bit “0” is represented by a normal space between two words or sentences.
2.1.5. Survey of Data Hiding and Sharing in Software Programs
A survey about watermarking in programs can be found in Zhu, et al. [54]. Two methods have been identified: static and dynamic. The former inserts and extracts watermarks in program codes without running the program while the latter does the same in the execution state of a software object. Two respective examples are Venkatesan, et al. [55] and Collberg and Thomborson [56]. There exist other methods with digital text, sentence syntax, text typos, e-mails [1, 27, 53, 57-59] as cover media.
The concept of secret sharing was proposed first by Shamir [46]. By a so-called (k, n)-threshold scheme, the idea is to encode a secret data item into n shares for n participants to keep, such that any k or more of the shares can be collected to recover the original secret, but any (k − 1) or fewer of them reveal no information about it. A similar scheme, called visual cryptography, was proposed by Naor and Shamir [46] for sharing an image. The scheme provides an easy and fast decryption process consisting of photocopying the shares onto transparencies and stacking them to reveal the original image for visual inspection. This technique has been investigated further in [48-50], though it is suitable for binary images only. Verheul and van Tilborg [51] extended the visual cryptography technique for processing images with small numbers of gray levels or colors. Lin and Tsai [52] proposed a digital version of the visual cryptography scheme for color images with no limit on the number of colors. The n shares obtained from a color image are hidden in n camouflage images which may be selected to have well-known contents, like famous characters or paintings, to create additional steganographic effects for security protection of the shares.
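The (k, n)-threshold idea described above can be illustrated with a small Python sketch of polynomial-based secret sharing over a prime field. The field modulus, the function names, and the choice of x-coordinates 1 to n are assumptions made for this illustration; the secret is taken to be a non-negative integer smaller than the modulus, and Python 3.8 or later is assumed for the modular inverse via `pow`.

```python
import secrets

PRIME = 2 ** 127 - 1  # a Mersenne prime, used here as the field modulus (assumption)

def make_shares(secret, k, n):
    """Split `secret` into n shares so that any k of them recover it."""
    if not (1 <= k <= n):
        raise ValueError("need 1 <= k <= n")
    if not (0 <= secret < PRIME):
        raise ValueError("secret must lie in [0, PRIME)")
    # Random polynomial of degree k - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def recover_secret(shares):
    """Lagrange interpolation at x = 0 from k or more distinct shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With k = 3 and n = 5, any three of the five shares interpolate the same degree-2 polynomial and so return the same constant term, while two shares alone are consistent with every possible secret.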
2.1.6. Survey of Data Hiding in PDF Documents
Portable Document Format (PDF) files [63] are popular nowadays, and so using them as carriers of secret messages for covert communication is convenient. Though there are some techniques of embedding data in text files [57-58], studies of using PDF files as cover media are very few, except Zhong et al. [64] in which integer numerals specifying the positions of the text characters in a PDF file are used to embed secret data.
For security, it is necessary to verify the authenticity of a file received from another party or kept for a long time in a certain environment, before the file is used for various purposes. This is the authentication problem of the file, which should be solved for protection of the file against unintentional changes and malicious manipulations. In the past, the information hiding method [1] has been adopted to solve this problem, but most studies were about images [10, 66-71]. There is as yet no investigation on the authentication of PDF files, though a related study about data hiding in PDF files can be found in Zhong et al. [64]. Hiding data in documents other than PDF files has also been investigated [61-62].
2.1.7. Survey of Data Hiding in HTML Documents
Regarding data hiding in HTML, Shirali-Shahreza [72] protected a Java applet in an HTML file from being copied by hiding a special 8-character string with a key within the Java applet. Wu and Lai [60] hid binary data in HTML files using various properties of tags, like attributes. Wu et al. [73] used hash functions to compute digests of web page contents as watermarks. Chang and Tsai [37] inserted extra white spaces in HTML text to encode bits for watermarking, as done by some commercial software [74].
There are very few studies on web page authentication using data hiding techniques so far. Zhao and Lu [75] generated watermarks of web pages based on principal component analysis and embedded them by using upper and lower cases of letters in HTML tags. The watermark was used to check the integrity of the entire web page.
Wu et al. [73] designed fast fragile watermarks for web pages based on hash functions which generate digests of web pages quickly with case insensitivity. Two related studies can be found in [37, 60], which utilize properties of spaces, tabs, tags, attributes, etc., to encode and hide data bits in HTML for purposes other than web page authentication. Some more general studies about data hiding can be found in [1, 76].
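The idea of carrying bits in the letter case of HTML tags, mentioned above for Zhao and Lu [75], can be sketched as follows. This Python illustration covers only the case-based embedding step and not the watermark generation; it is not the cited authors' algorithm. It assumes an uppercase letter encodes “1” and a lowercase letter encodes “0”, uses only tag names (not attributes), and relies on a simple regular expression rather than a full HTML parser. All names are hypothetical.

```python
import re

# Matches "<" or "</" followed by a tag name; a simplification of real HTML parsing.
TAG_NAME = re.compile(r"(</?)([A-Za-z][A-Za-z0-9]*)")

def embed_in_tag_case(html, bits):
    """Encode `bits` (a string of "0"/"1") in the case of tag-name letters."""
    remaining = iter(bits)

    def recase(match):
        letters = []
        for ch in match.group(2):
            bit = next(remaining, None) if ch.isalpha() else None
            if bit is None:
                letters.append(ch)  # digit, or message exhausted: keep as is
            else:
                letters.append(ch.upper() if bit == "1" else ch.lower())
        return match.group(1) + "".join(letters)

    stego = TAG_NAME.sub(recase, html)
    if next(remaining, None) is not None:
        raise ValueError("not enough tag letters to carry the message")
    return stego

def extract_from_tag_case(html, n_bits):
    """Read `n_bits` bits back from the case of tag-name letters."""
    bits = []
    for match in TAG_NAME.finditer(html):
        bits.extend("1" if ch.isupper() else "0"
                    for ch in match.group(2) if ch.isalpha())
    return "".join(bits[:n_bits])
```

Since HTML tag names are case-insensitive, a page rewritten this way is expected to render the same in a browser, while any later edit that normalizes or retypes the tags disturbs the embedded bits.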