Chapter 1 Introduction
1.5 Thesis Organization
The remainder of this thesis is organized as follows. Chapter 2 gives an introduction to the Kinect sensor and the structures of the image data it captures, and reviews the applications and methods of image authentication, covert communication via images, and visible watermarking in images. The proposed method for authentication of KINECT images is described in Chapter 3. The proposed method for covert communication via KINECT images by interpolation at depth holes is described in Chapter 4. In Chapter 5, the proposed method for copyright protection of KINECT images by 3D visible watermarking is described. Finally, conclusions and some suggestions for future work are given in Chapter 6.
Chapter 2
Review of Related Works and KINECT Image Structures
In this chapter, we review the KINECT sensor and the structures of the depth and color images taken by the KINECT device in Sections 2.1 and 2.2, respectively. We then review the existing data hiding techniques for image authentication, covert communication via images, and copyright protection of images in Sections 2.3 through 2.5, respectively.
2.1 Previous Studies of 3D Image Acquisition Using KINECT Devices
The release of the Microsoft Kinect sensor has probably had one of the biggest impacts on the research field of computer vision. Kinect sensors have created many opportunities for multimedia computing. In this section, we review the hardware of the KINECT sensor and the principles of the operations that can be conducted using it.
The Kinect sensor includes a color VGA video camera, a depth sensor, a multi-array microphone, and a tilt motor for sensor operations and adjustments. The horizontal field of view of the KINECT device is 57 degrees, the vertical field of view is 43 degrees, and the physical tilt range is ± 27 degrees. The color camera aids in detecting three color components: red, green and blue, like other cameras. The main difference between commonly-seen cameras and the Kinect sensor is that the Kinect
device has an extra depth detection sensor. The depth detection sensor is composed of an infrared projector and a monochrome Complementary Metal-Oxide Semiconductor (CMOS) sensor, which work together to capture the distance information between the depth sensor and the objects in front of the Kinect device. The Kinect sensor can acquire the depth information because it uses PrimeSense’s patented light coding technology. The light coding technology works by coding the scene with near-IR light, which is invisible to the human eye. The CMOS image sensor and the chips then execute sophisticated parallel computational algorithms to decipher the received light-coding infrared patterns and produce a VGA-size depth image of the scene.
The key to the light coding technique is the use of laser speckles. When a laser is projected onto objects with rough surfaces or through frosted glass, it generates random reflected speckles, collectively called the speckle pattern. The speckle pattern is highly randomized and the pattern image changes with distance. The speckle pattern images captured by the Kinect sensor at any two places in the real space are different. Using the speckle patterns, all the places in the real space can therefore be marked. Then, the depth information can be obtained by decoding the laser speckle pattern on the object.
Figure 2.1 Hardware of the KINECT device.
2.2 Previous Studies of Structures of Depth and Color Images Taken by KINECT Devices
In this section, we review the functions of the KINECT sensor and the structures of the depth and color images taken by the KINECT sensor. The KINECT sensor has had an impact not only on various fields of research but also on our daily life.
Because of the popularity of the KINECT sensor, people can easily use it to develop various kinds of applications, for example, detecting the positions of the user’s hands so that websites can be browsed by the user’s movements alone.
Many developers use the Open Natural Interaction (OpenNI) software development kit (SDK) for the KINECT sensor to develop related applications. The OpenNI framework is an open-source SDK useful for development of 3D sensing middleware libraries and applications. In addition, there are other SDKs like the Kinect-for-Windows SDK, which is also used by many developers for R&D. In this study, we use the OpenNI SDK to acquire data with the KINECT sensor. The acquired data include depth and color information, with the color information being like a commonly-seen color image with resolution 640×480, and the depth information being a range image also with resolution 640×480. The depth range provided by the KINECT sensor using the Kinect-for-Windows SDK is from 800mm to 4000mm, but that provided by the OpenNI SDK extends up to a maximum of about 8000mm. As the depth distance detected by the KINECT sensor increases, more depth values are missing from the detected result. For example, the missing depth values start from 611mm, followed by 622mm, 631mm, 638mm, etc. When the depth value reaches 7960mm, the values from 7961mm to 8146mm are missing, so the inaccuracy of the detected value is 185mm. From this phenomenon, we can see that the larger the depth value
captured by the KINECT sensor, the lower the accuracy of the detected depth values. If a user wants to interact with the KINECT sensor, the recommended range of distances between the KINECT sensor and the user is from 1200mm to 3600mm, as advised by the official KINECT development website.
The data acquired by the KINECT sensor include depth and color images, as mentioned previously, and either a depth image or a color image will be called a KINECT image in this study.
2.3 Review of Techniques for Image Authentication
With the advance of computer and Internet technologies, the security of digital image data is considered a significant issue today. Thus, many techniques for embedding authentication signals for the purpose of image authentication have been proposed in the past. Specifically, the methods proposed in [2-4] authenticate images at the pixel level such that any pixel of a tampered image part can be identified, and then the result of detailed tampering localization is reported.
The method proposed by Liu et al. [2] generates a binary image that is mapped from the difference image computed from the cover image and its so-called chaotic pattern. The least-significant-bit (LSB) plane is then used to accommodate the binary image as a fragile watermark for use in later image authentication.
In Lee and Tsai [3, 4], a grayscale image authentication method was proposed. The method is based on a bin-mapping scheme which divides the grayscale range into two parts, the five MSBs and the remaining three LSBs. The former is used to generate a 3-bit bin code as the authentication signal for each pixel in the input cover image. Then, the authentication signals are embedded randomly into the other pixels of the image.
The authentication signals are utilized not only for detecting and localizing tampered pixels but also for generating representative values for repairing the tampered pixels.
2.4 Review of Techniques for Covert Communication via Images
Many data hiding techniques have been proposed in the past for various purposes such as authentication and covert communication. To achieve the goal of hiding data imperceptibly, data hiding techniques that exploit the weaknesses of the human visual system have been investigated intensively. A widely known method is least significant bit (LSB) modification, which changes the LSBs of the pixel values in an image to embed information. For example, Chan and Cheng [5] presented a data hiding method by simple LSB substitution. On the other hand, data hiding techniques using pairs of image pixels to hide information have also been proposed. For instance, Tian [6] proposed a reversible data embedding method using a difference expansion scheme which is based on simple reversible integer transformations. This method calculates the differences of neighboring pixel values, and selects some difference values of the pixel pairs for the difference expansion. Then, the information is embedded into the expanded differences of the pixel pairs. Since the modified values are generated from the differences between manipulated pixel pairs, the original pixel values can be recovered easily.
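To make the difference expansion idea concrete, the following is a minimal sketch of embedding one bit into a single pixel pair and recovering both the bit and the original pair. It follows the commonly cited form of the integer transform (integer average and difference of the pair); the function names de_embed and de_extract are hypothetical, and the checks for overflow and underflow and the selection of expandable pairs that a complete scheme requires are omitted here.

```python
def de_embed(x, y, bit):
    """Embed one bit into the pixel pair (x, y) by difference expansion.

    Sketch only: no check that the results stay within [0, 255].
    """
    avg = (x + y) // 2            # integer average of the pair
    diff = x - y                  # difference of the pair
    diff_exp = 2 * diff + bit     # expanded difference carrying the bit
    x_new = avg + (diff_exp + 1) // 2
    y_new = avg - diff_exp // 2
    return x_new, y_new


def de_extract(x_new, y_new):
    """Recover the original pair and the embedded bit."""
    avg = (x_new + y_new) // 2    # the integer average is unchanged by embedding
    diff_exp = x_new - y_new
    bit = diff_exp % 2            # the embedded bit is the LSB of the expanded difference
    diff = diff_exp // 2          # floor division restores the original difference
    x = avg + (diff + 1) // 2
    y = avg - diff // 2
    return x, y, bit


if __name__ == "__main__":
    marked = de_embed(206, 201, 1)
    print(marked)                 # (209, 198)
    print(de_extract(*marked))    # (206, 201, 1)
```

Because the integer average of the pair is preserved and the embedded bit sits in the least significant bit of the expanded difference, the extraction needs no side information for pairs that were actually expanded.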
2.5 Review of Techniques for Copyright Protection of Images
Because of the popularity of the Internet, acquiring digital information from the network is becoming easier and easier. Thus, protection of the copyright of digital information on the Internet is more important than ever before. For copyright protection of images, digital watermarking has been used widely in many applications [7-12]. Digital watermarking means embedding certain kinds of information, such as ownership information or a company logo, into the digital images that should be protected.
In general, digital watermarking techniques for images can be categorized into two types: visible and invisible. The first type of technique, visible watermarking, is to embed clearly visible marks into images. The embedded visible watermark is usually irremovable and leaves permanent distortion to the original image. The technique of the second type aims to embed the copyright information imperceptibly into images so that in case of copyright infringement, the hidden information can be retrieved to identify the ownership of the protected host image.
Both visible and invisible watermarking techniques distort the host image in the embedding process. A group of techniques, named reversible watermarking [7-10], allow authorized users to remove the embedded watermark and restore the original content of images as needed. Besides, some reversible watermarking techniques guarantee lossless image recovery, which means that the recovered image is identical to the original image. Lossless recovery is important in some applications, for example, images for military uses, historical art imaging, or medical image analysis. In these applications, any permanent distortion generated by watermarking is not allowed.
In [7], Alattar extended Tian’s algorithm [6] to utilize the difference expansion of vectors, instead of pairs of pixels, to increase the hiding capacity and the computational efficiency.
In [8], Coltuc and Chassery presented a high-capacity data embedding scheme without using any additional data compression operation. This scheme is based on a reversible contrast mapping (RCM) technique, which is a simple integer transform defined on pairs of pixels. It partitions the cover image into pairs of pixels and divides the pairs into three groups, and then conducts a respective embedding process on each pair group. This RCM scheme provides embedding bit-rates close to those of the difference expansion approach, while it has a considerably lower mathematical complexity.
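As a small illustration of why such an integer transform on pixel pairs is invertible, the sketch below uses the commonly cited form of the RCM forward transform, x' = 2x − y and y' = 2y − x, together with its exact inverse. The function names are hypothetical, and the parts of the scheme that actually carry data (the use of the LSBs of the transformed pair, the restriction to pairs whose transformed values stay within [0, 255], and the three pair groups) are deliberately left out.

```python
def rcm_forward(x, y):
    """Forward reversible contrast mapping of a pixel pair (sketch, no range check)."""
    return 2 * x - y, 2 * y - x


def rcm_inverse(xp, yp):
    """Exact inverse for an unmodified transformed pair.

    Since 2*xp + yp == 3*x and xp + 2*yp == 3*y, both divisions are exact.
    """
    return (2 * xp + yp) // 3, (xp + 2 * yp) // 3


if __name__ == "__main__":
    pair = rcm_forward(100, 90)
    print(pair)                   # (110, 80)
    print(rcm_inverse(*pair))     # (100, 90)
```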
Chapter 3
Authentication of KINECT Images by Data Hiding Technique
3.1 Introduction
Because of the growing popularity of the KINECT device, the depth and color images acquired by KINECT devices, also called KINECT images in this study, are more and more commonly seen in various applications. Like other digital images, KINECT images also need to be verified for their correctness so that tampering by malicious persons during transmission or storage can be detected. For this reason, we propose a method for authentication of KINECT images in this study.
The detail of the method will be described in this chapter.
First, the definition of the problem and the idea of the proposed method are given in Section 3.1. In Section 3.2, we will describe the process of generating authentication signals and embedding them into the KINECT images. Then, in Section 3.3, the process of extracting the embedded authentication signals from the protected KINECT images will be described. In Section 3.4, the recovery of the original depth and color images, and repairing of the possibly tampered versions of them will be described. Experimental results showing the feasibility of the method are given in Section 3.5. Finally, some discussions and a brief summary are given in the last section of this chapter.
3.1.1 Problem Definition
As 3D sensing technologies are growing vigorously, related applications of the KINECT sensor are also increasing in various fields. Since the KINECT sensor can be used to acquire the depth information, the data can be used more extensively in applications than the 2D data can, for example, as the 3D information of marble sculptures created by famous artists and the models of historical architectures or objects encountered in daily life. These kinds of information are often very important, so the correctness of them must be guaranteed.
The range of the depth values provided by the KINECT device is different from that of the general values of color-image pixels. In addition, the pair of depth and color images acquired by the KINECT device at the same instant should be protected together to keep their relation in time unchanged. Whether it is the color image or the depth image that is tampered with by malicious persons, the tampered region should be detected by an authentication method, and preferably repaired. These requirements should be considered when designing an authentication method for KINECT images, as is done in this study.
3.1.2 Proposed Ideas
The major idea of the proposed authentication method for KINECT images was inspired by the concept involved in the authentication method proposed by Lee and Tsai [4] as well as by some properties of KINECT images. The proposed method aims to authenticate, as a whole, every pair of color and depth images taken by a KINECT device at the same instant. To achieve this goal, it utilizes the features of KINECT images and the ranges of the pixel values in them.
As an inherent feature of KINECT images, a pixel value in the depth image is represented by a 13-bit binary string, as mentioned in Chapter 2, in order to cover the entire range of the depth values. And the value of a pixel in the color image, including the information of red, green, and blue, is represented by three 8-bit binary strings, respectively.
Furthermore, in the proposed method for KINECT image authentication, as shown in Figure 3.1, the 13-bit depth value D of each pixel Pd in a depth image is divided into two parts: the nine MSBs of D and the three LSBs. The former is used to generate an authentication signal for the depth pixel Pd itself. The authentication signal is then embedded into five LSBs of a color-image pixel Pc randomly, where the five LSBs of Pc include two LSBs of the red color value, two LSBs of the green color value, and one LSB of the blue color value, or equivalently, the (2, 2, 1) LSBs of the (R, G, B) values of Pc. On the other hand, the nine MSBs of the color-image pixel Pc are used to generate an authentication signal for the color-image pixel Pc itself, and the signal is embedded into the three LSBs of the depth pixel Pd randomly, where the nine MSBs of Pc include the (3, 3, 3) MSBs of the (R, G, B) values of Pc. In addition, the generated signals can be used not only to verify the correctness of a pixel P (Pd or Pc) in the color or depth image, but also to repair part of the value of P when P is authenticated to have been tampered with.
The detailed algorithms about the proposed methods and the related processes of authentication of KINECT images are presented in the following sections.
3.2 Generation and Embedding of Authentication Signals
In this section, we introduce the details of the processes of authentication signal generation and embedding according to the proposed method. An illustration of the processes is shown in Figure 3.1. The detailed process of generating the authentication signals is described in Section 3.2.1, and the detailed process of embedding the generated authentication signals into KINECT images is described in Section 3.2.2.
[Figure 3.1: the nine MSBs [r7r6r5g7g6g5b7b6b5] of the color-image pixel Pc are bin-mapped into a 3-bit authentication signal; the nine MSBs [d12d11d10d9d8d7d6d5d4] of the depth-image pixel Pd are bin-mapped into a 5-bit authentication signal; the original bits [g1r1b0g0r0d2d1d0] are saved to the alpha channels.]
Figure 3.1 Illustration of the authentication signal generation and embedding.
3.2.1 Authentication Signal Generation
In the proposed authentication signal generation process, at first we transform the value of a depth pixel Pd in the depth image into a 13-bit binary string, d12, d11, d10, …, d0. Then, we also transform the (R, G, B) values of each color-image pixel Pc into three 8-bit binary strings, r7, r6, r5, …, r0, g7, g6, g5, …, g0, and b7, b6, b5, …, b0, respectively.
Next, we transform the nine MSBs of Pd into an integer m, and the (3, 3, 3) MSBs of the (R, G, B) values of Pc, taken as a whole, into an integer n. Then, we apply a modified version of the bin-mapping scheme of Lee and Tsai [4] to compress the information in these MSBs before embedding it. Specifically, we map the depth-value range specified by the nine MSBs into 32 equal-length intervals, called bins. Each bin is indexed by a decimal integer called a bin number, which corresponds to a 5-bit binary number called a bin code. The 32 bins and their corresponding bin numbers and bin codes are shown in Table 3.1. On the other hand, the color-value range specified by the nine MSBs is mapped into eight bins, and the eight bins and their corresponding bin numbers and bin codes are shown in Table 3.2.
Finally, the bin code of each pixel is taken to be the authentication signal of the pixel.
The proposed technique above for generating authentication signals using parts of depth pixel values and color pixel values is described as an algorithm, Algorithm 3.2.1, as follows.
Table 3.1 Bins, bin numbers, bin codes, and representative values of bins used in the generation of authentication signals for depth image pixels in this study.
Bin        Bin number   Bin code   Representative value of bin
[0, 31]    0            00000      16
Table 3.2 Bins, bin numbers, bin codes, and representative values of bins used in the generation of authentication signals for color image pixels in this study.
Bin        Bin number   Bin code   Representative value of bin
[0, 63]    0            000        32
Step 1. Transform the value of the depth pixel Pd into a 13-bit binary string, d12, d11, d10, …, d0; and transform the color values R, G, and B of the color-image pixel Pc into three eight-bit strings, r7, r6, r5, …, r0, g7, g6, g5, …, g0, and b7, b6, b5, …, b0.
Step 2. Transform the nine MSBs, d12, d11, d10, …, d4, into an integer m; and concatenate the three binary sub-strings, r7r6r5, g7g6g5, and b7b6b5, and transform the result into an integer n.
Step 3. Map the integer m into a bin indexed by a bin number Bm computed by the function Bm = ⌊m/16⌋; and map also the integer n into a bin indexed by a bin number Bn computed by the function Bn = ⌊n/64⌋.
Step 4. Transform Bm into a 5-bit bin code t = e4e3e2e1e0 for use as the authentication signal for Pc; and transform also Bn into a 3-bit bin code s = h2h1h0 for use as the authentication signal for Pd.
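The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in this study: the function name authentication_signals is hypothetical, the nine MSBs of the depth value are taken to be d12 through d4 as drawn in Figure 3.1, and the bin numbers are computed by integer division by 16 and 64 as in Step 3.

```python
def authentication_signals(depth, rgb):
    """Return the bin codes (t, s) for one depth value and one (R, G, B) triple.

    Assumptions of this sketch: depth is a 13-bit integer (0..8191), and
    R, G, B are 8-bit integers (0..255).
    """
    r, g, b = rgb

    # Step 2: nine MSBs of the depth value (assumed d12..d4) as an integer m,
    # and the (3, 3, 3) MSBs of R, G, B concatenated into an integer n.
    m = (depth >> 4) & 0x1FF
    n = ((r >> 5) << 6) | ((g >> 5) << 3) | (b >> 5)

    # Step 3: bin numbers by integer (floor) division.
    bin_m = m // 16   # 32 bins for the depth MSBs
    bin_n = n // 64   # 8 bins for the color MSBs

    # Step 4: bin codes as fixed-width binary strings.
    t = format(bin_m, "05b")   # 5-bit code e4e3e2e1e0
    s = format(bin_n, "03b")   # 3-bit code h2h1h0
    return t, s


if __name__ == "__main__":
    print(authentication_signals(5530, (200, 100, 50)))   # ('10101', '110')
```

With these divisors, the 5-bit code coincides with the five most significant bits of the 13-bit depth value and the 3-bit code coincides with the three most significant bits of the red value, which shows how much of each pixel the bin code can represent.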
3.2.2 Embedding of Authentication Signals
Because the color and depth images are saved as PNG images, their alpha channels are used to hide the original bits for the process of original-image recovery. The original bits of a pair of a depth pixel and a color pixel include the (2, 2, 1) LSBs of the original color-image pixel and the three LSBs of the original depth-image pixel. These original bits will be replaced with the authentication signals in this process. Therefore, the original bits of the depth and color images are hidden in the alpha channels of the depth and color images, respectively.
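The bit-level replacement described above can be sketched for a single pair of pixels as follows. This is a hedged illustration only: the function names embed_pair and restore_pair are hypothetical, the order in which the five signal bits are distributed over the (2, 2, 1) LSBs of (R, G, B) is an assumption, the saved original bits are returned as plain integers rather than written into an actual PNG alpha channel, and the random selection of the pixels that receive each signal is not modeled.

```python
def embed_pair(depth, rgb, t, s):
    """Replace LSBs with authentication signals and return the saved original bits.

    t is the 5-bit signal (0..31) placed in the (2, 2, 1) LSBs of (R, G, B);
    s is the 3-bit signal (0..7) placed in the three LSBs of the depth value.
    """
    r, g, b = rgb

    # Original bits that are about to be overwritten (to be kept in alpha channels).
    saved_color = ((r & 0b11) << 3) | ((g & 0b11) << 1) | (b & 0b1)
    saved_depth = depth & 0b111

    # Assumed bit order: two bits to R, two bits to G, one bit to B.
    t_r, t_g, t_b = (t >> 3) & 0b11, (t >> 1) & 0b11, t & 0b1
    new_rgb = ((r & ~0b11) | t_r, (g & ~0b11) | t_g, (b & ~0b1) | t_b)
    new_depth = (depth & ~0b111) | s
    return new_depth, new_rgb, saved_depth, saved_color


def restore_pair(new_depth, new_rgb, saved_depth, saved_color):
    """Put the saved original bits back to recover the original pixel values."""
    r, g, b = new_rgb
    o_r, o_g, o_b = (saved_color >> 3) & 0b11, (saved_color >> 1) & 0b11, saved_color & 0b1
    rgb = ((r & ~0b11) | o_r, (g & ~0b11) | o_g, (b & ~0b1) | o_b)
    depth = (new_depth & ~0b111) | saved_depth
    return depth, rgb


if __name__ == "__main__":
    marked = embed_pair(5530, (203, 101, 51), t=21, s=6)
    print(marked)                    # (5534, (202, 102, 51), 2, 27)
    print(restore_pair(*marked))     # (5530, (203, 101, 51))
```

Only the LSBs change, so the MSBs from which the authentication signals were generated remain intact after embedding.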
To embed the authentication signals generated as described above, we first select a pixel Pd from the depth image I and a pixel Pc from the color image J