Thesis Organization - A Study on Information Hiding in H.264/AVC Videos and Its Applications

Chapter 1 Introduction

1.5 Thesis Organization

In the remainder of this thesis, Chapter 2 reviews related works on video data hiding, visible watermarking, and information sharing, as well as the H.264/AVC standard. Chapter 3 describes the proposed method for video data hiding for covert communication. Chapter 4 describes the proposed removable visible watermarking method. Chapter 5 describes the proposed method for video sharing. Finally, conclusions and some suggestions for future research are given in Chapter 6.

Chapter 2 Review of Related Works and H.264/AVC Standard

2.1 Review of Techniques for Video Data Hiding

Video data hiding techniques embed secret data in a video so that the data can be transmitted covertly. Many approaches to hiding data in videos have been proposed [1-3]. Yang and Bourbakis [1] proposed a scheme for embedding data in the DCT coefficients by means of vector quantization.

Hu et al. [2] proposed a method for hiding data in H.264/AVC videos based on intra-prediction modes. The basic idea is to modify 4×4 intra-prediction modes based on a mapping between 4×4 intra-modes and hidden bits. Their method uses only intra-coded macroblocks to hide data. Kapotas et al. [3] proposed a method for embedding data into encoded video sequences, which hides the secret data by modulating the partition size. This method can only be used for embedding information in inter-coded macroblocks.

2.2 Review of Techniques for Visible Watermarking in Videos

Visible watermarking is a technique for copyright protection [4]. The owner of a video can embed a visible watermark representing copyright information into the video, and this embedded watermark can be removed when proving his ownership.

Bhattacharya et al. [5] surveyed different video watermarking techniques and compared them with reference to H.264/AVC. Mohanty et al. [6] proposed a DCT-domain visible watermarking technique for images. In their method, visible watermarks are embedded in the DCT coefficients according to a mathematical model developed by exploiting the texture sensitivity of the human visual system (HVS).

Chien and Tsai [7] proposed an active watermarking method for MPEG-4 videos with a scheme that limits the number of times a video can be displayed. The basic idea is to use an active agent to check the available play count. If the play count of the video is not zero, the active agent removes the visible watermark embedded in the video; otherwise, the visible watermark appears promptly to state the copyright.

2.3 Review of Techniques for Secret Sharing

Secret sharing is a technique for transforming secret data into multiple shares, with each share kept by a participant. When a pre-defined group of shares is collected, the secret data can be recovered. Shamir [8] proposed the concept of secret sharing in his (k, n)-threshold method, in which n is the number of participants and k is a threshold giving the minimum number of shares in the pre-defined group.

Lin and Tsai [9] proposed an efficient (n, n)-threshold secret sharing method using exclusive-OR operations. This method simply applies the exclusive-OR operation to a secret image and uses n-1 images to generate the nth image. The n-1 images and the nth image are taken as shares and are distributed to n participants separately. By applying exclusive-OR operations to the n images held by the n participants, the secret image can be recovered quickly. Zou and Sun [10] proposed an approach which combines secret sharing and information hiding for covert communication.
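The (n, n) exclusive-OR idea described above can be illustrated with a minimal Python sketch that works on byte strings instead of images. The function names `make_shares` and `recover` are hypothetical, and drawing the first n-1 shares at random is an assumption made here for illustration; it is not taken from [9].

```python
import secrets


def make_shares(secret: bytes, n: int) -> list[bytes]:
    """Split `secret` into n shares; all n shares are needed to recover it."""
    if n < 2:
        raise ValueError("need at least two shares")
    # Assumption: the first n-1 shares are random byte strings of equal length.
    shares = [secrets.token_bytes(len(secret)) for _ in range(n - 1)]
    # The nth share is the secret XORed with every other share.
    last = bytearray(secret)
    for share in shares:
        for idx, byte in enumerate(share):
            last[idx] ^= byte
    shares.append(bytes(last))
    return shares


def recover(shares: list[bytes]) -> bytes:
    """XOR all shares together to rebuild the secret."""
    out = bytearray(len(shares[0]))
    for share in shares:
        for idx, byte in enumerate(share):
            out[idx] ^= byte
    return bytes(out)


shares = make_shares(b"secret image bytes", 4)
print(recover(shares) == b"secret image bytes")
```

Because every random share is XORed in twice (once when building the last share and once during recovery), the random parts cancel and only the secret remains.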

2.4 Review of H.264/AVC Standard

In this study, all the proposed information hiding, watermarking and video sharing techniques employ H.264/AVC videos as carrier media for hiding information.

Richardson described in detail the H.264/AVC standard in his book [11]. We will give a brief review of the H.264/AVC standard in this section. In Section 2.4.1, the structure of the H.264/AVC standard will be described. In Section 2.4.2 and Section 2.4.3, the encoding and decoding processes in the H.264/AVC standard will be described.

2.4.1 Structure of H.264/AVC Standard

The H.264/AVC standard defines a set of three profiles: Baseline, Main, and Extended, which support different functions and suit different environments. Figure 2.1 shows the relationship between the three profiles and the coding tools supported by the standard. The H.264/AVC video has a hierarchical structure, as illustrated in Figure 2.2. A video sequence is composed of a series of pictures (frames). Each picture is coded as one or more slices. In general, there are three main slice types in the H.264/AVC standard: intra-slice (I), predictive slice (P), and bi-predictive slice (B). Each slice consists of a number of macroblocks. There are four types of macroblocks: I macroblocks, P macroblocks, B macroblocks, and skipped macroblocks. I macroblocks are predicted from previously coded data within the same slice. P macroblocks are predicted from one reference picture. B macroblocks are predicted from two reference pictures. Skipped macroblocks of a P slice are encoded with a motion vector and no transform coefficients, whereas skipped macroblocks of a B slice are encoded with neither motion vectors nor transform coefficients. Each slice type has its own macroblock types. The relationships between the slice types and the macroblock types are listed in Table 2.1.

Figure 2.1 Relation between the Baseline, Main and Extended profiles.

Figure 2.2 Hierarchical structure of the H.264/AVC video.

Table 2.1 Relationships between slice types and macroblock types.

             I macroblock   P macroblock   B macroblock   Skipped macroblock
I slice           ●
P slice           ●              ●                                ●
B slice           ●                             ●                 ●

2.4.2 Process of Encoding

A flow diagram of the encoding process is shown in Figure 2.3. The encoding process has two data flow paths: forward (left to right) and reconstruction (right to left). In the forward path, each 16×16 macroblock is encoded in intra-mode or inter-mode, and a prediction (marked as P in Figure 2.3) is calculated from reconstructed data. In the intra-mode, the encoder determines the best intra-prediction mode from reconstructed data in the current slice, and then computes the intra-prediction. In the inter-mode, the encoder determines the best motion vector based on the reconstructed data in one or two reference picture(s), and then computes the motion-compensated prediction. The prediction is subtracted from the current block to produce a residual block (marked as Dn in Figure 2.3). A DCT-based transform is performed on each residual block. After that, each 4×4 block of the transform coefficients is quantized. Each resulting block (marked as X in Figure 2.3) is scanned in a zig-zag order and entropy encoded. An entropy coding technique is used to compress the quantized coefficient data and the other information required to decode each block within the macroblock, forming the compressed bitstream. Finally, the compressed bitstream is passed to the network abstraction layer (NAL) for transmission or storage.

In the reconstruction path, the encoder decodes (reconstructs) each block in a macroblock so that it can serve as a reference for further prediction. The quantized coefficients are scaled and inverse-transformed to produce a difference block (marked as D'n in Figure 2.3), and then the prediction is added to the difference block to produce a reconstructed block (marked as uF'_n in Figure 2.3). Finally, a filter is used to reduce the effects of blocking distortion, and the reconstructed reference picture is created from a series of blocks.
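Three steps of the two paths described above can be sketched in Python: forming the residual block, scanning a 4×4 block in zig-zag order, and adding the prediction back to a difference block. This is only an illustration; the transform, quantization, and entropy coding stages are omitted, and the function names are hypothetical.

```python
# Zig-zag scan order for a 4x4 block, as (row, column) pairs.
ZIGZAG_4X4 = [
    (0, 0), (0, 1), (1, 0), (2, 0),
    (1, 1), (0, 2), (0, 3), (1, 2),
    (2, 1), (3, 0), (3, 1), (2, 2),
    (1, 3), (2, 3), (3, 2), (3, 3),
]


def residual_block(current, prediction):
    """Forward path: residual = current block minus prediction."""
    return [[c - p for c, p in zip(c_row, p_row)]
            for c_row, p_row in zip(current, prediction)]


def reconstruct_block(prediction, difference):
    """Reconstruction path: prediction plus decoded difference block."""
    return [[p + d for p, d in zip(p_row, d_row)]
            for p_row, d_row in zip(prediction, difference)]


def zigzag_scan(block):
    """Read a 4x4 block in zig-zag order, lowest frequency first."""
    return [block[r][c] for r, c in ZIGZAG_4X4]


# Each entry holds its own scan position, so the scan returns 0..15.
positions = [[0, 1, 5, 6], [2, 4, 7, 12], [3, 8, 11, 13], [9, 10, 14, 15]]
print(zigzag_scan(positions))
```

The last element of the scan, at row 3 and column 3, is the highest-frequency coefficient of the block.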

2.4.3 Process of Decoding

A flow diagram of the decoding process is shown in Figure 2.4. The decoder receives a compressed bitstream from the NAL and entropy decodes the data to get the quantized coefficients. Through scaling and inverse transformation, the decoder obtains a difference block. Using the header information from the bitstream, the decoder creates a prediction identical to the original prediction formed in the encoder. The prediction is added to the difference block to produce the reconstructed block, which is then filtered to create a decoded block.

Figure 2.3 Flow diagram of H.264/AVC encoding process.

Figure 2.4 Flow diagram of H.264/AVC decoding process.

Chapter 3 Data Hiding in H.264/AVC Videos for Covert Communication

3.1 Introduction

With the growth of computer networks and audio/video compression technologies, many applications of digital media have emerged on the network, but many new problems have also arisen. The preservation and transmission of secret information has recently become an active topic, and using data hiding techniques for covert communication is a good solution. In this way, we can hide secret data in other cover data so that the hidden information is imperceptible. Videos are suitable for use as cover media because they are widely used and offer a large hiding capacity. We therefore propose a data hiding method via H.264/AVC videos for covert communication in this study.

In Section 3.1.1, some relevant definitions are given, and in Section 3.1.2 the basic ideas of the proposed method are presented. Section 3.2 reviews the related techniques, and Section 3.3 describes the proposed data hiding method. The corresponding data extraction method, experimental results that demonstrate the feasibility of the proposed method, and some discussions and a summary are given in the remaining sections of this chapter.

3.1.1 Problem Definition

Traditionally, when applying video data hiding techniques for covert communication, the data hiding capacity and the imperceptibility of the hidden data are two of the major concerns. Therefore, the problem is how to hide data with large-volume capacity and imperceptibility.

In addition, with the popularity of web applications, people pay more and more attention to low bit rate videos. Therefore, an additional problem is how to hide data in videos with optimal results that take data hiding capacity, imperceptibility, and low bit rate into consideration.

3.1.2 Proposed Ideas

There are two macroblock types for use in the baseline profile of the H.264/AVC standard, which are I macroblock and P macroblock. We propose data hiding techniques for the two macroblock types, respectively, in this study.

Two methods are proposed for hiding data into I macroblocks based on the intra-prediction mode, which is a new coding method proposed in the H.264/AVC standard. In the first method, we transform the data to be hidden into novenary data and encode them by the use of the prediction modes.

In the second method, we start from the way an encoder selects the best prediction mode for each block in the H.264/AVC standard, namely by a Lagrangian cost function [12] that minimizes the rate and the distortion simultaneously, which is formulated as follows:

$$\arg\min_{M_k} \bigl( D(S_k, M_k) + \lambda R(S_k, M_k) \bigr) \qquad (3.1)$$

where

(1) M_k denotes a candidate prediction mode;

(2) λ represents the Lagrange multiplier;

(3) S_k denotes the block being processed;

(4) D is a distortion function whose value is computed as the sum of the squared differences (SSD) between the reconstructed block S_k' and the original one S_k;

(5) R denotes the number of bits used for encoding the block S_k using the prediction mode M_k.

In our approach, the block S_k is fixed to be 4×4 which yields higher data embedding rates. Furthermore, we add the hiding capacity as a new parameter to the Lagrangian cost function described by (3.1), resulting in:

$$\arg\min_{M_k} \bigl( D(S_k, M_k) + \lambda R(S_k, M_k) - \gamma_1 N_i \bigr) \qquad (3.2)$$

where the new parameter γ₁ is a multiplier for the hiding capacity N_i (in bits) in the 4×4 block. By this function, we can get the best result as a tradeoff among the data hiding capacity, the bit rate, and the resulting distortion.
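To illustrate how the capacity term changes the encoder's decision, the following Python sketch selects a mode by minimizing the distortion plus λ times the rate, minus γ₁ times the hiding capacity. The `Candidate` class, the function name `select_mode`, and all numeric values are hypothetical and chosen only to show the tradeoff; they are not measurements from this study.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    mode: int            # prediction mode index
    distortion: float    # D: SSD between reconstructed and original block
    rate_bits: float     # R: bits needed to encode the block with this mode
    capacity_bits: int   # N_i: bits that can be hidden with this mode


def select_mode(candidates, lam, gamma1):
    """Return the candidate with the lowest capacity-aware Lagrangian cost."""
    def cost(c):
        return c.distortion + lam * c.rate_bits - gamma1 * c.capacity_bits
    return min(candidates, key=cost)


# Hypothetical candidates for one 4x4 block.
candidates = [
    Candidate(mode=0, distortion=100.0, rate_bits=10.0, capacity_bits=0),
    Candidate(mode=3, distortion=110.0, rate_bits=10.0, capacity_bits=2),
]

print(select_mode(candidates, lam=1.0, gamma1=0.0).mode)   # plain rate-distortion choice
print(select_mode(candidates, lam=1.0, gamma1=10.0).mode)  # capacity-aware choice
```

With γ₁ = 0 the function reduces to the ordinary rate-distortion decision; a larger γ₁ favors modes that can carry more hidden bits at the expense of some rate or distortion.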

The idea of hiding data in the P macroblocks proposed in this study is to modify the variable partition size of the tree structured motion compensation, a feature that distinguishes the H.264/AVC standard from earlier standards. Tree structured motion compensation is a method of partitioning macroblocks into motion-compensated sub-blocks of varying sizes. The encoder selects the partition size for each macroblock by a Lagrangian cost function described as follows:

$$\arg\min_{P_k \in \omega} \bigl( D(S_k, P_k) + \lambda R(S_k, P_k) \bigr) \qquad (3.3)$$

where ω denotes the set of all alternative partition sizes, and P_k denotes the current partition size. Similarly, we add hiding capacity as a new parameter to the Lagrangian cost function, resulting in:

$$\arg\min_{P_k \in \omega} \bigl( D(S_k, P_k) + \lambda R(S_k, P_k) - \gamma_2 N_i \bigr) \qquad (3.4)$$

where the new parameter γ₂ is the multiplier for the hiding capacity N_i (in bits) in P macroblocks. By this formula, we can get the best result as a tradeoff among the data hiding capacity, the bit rate, and the resulting distortion.

3.2 Review of Related Techniques

3.2.1 Intra-prediction

For each I macroblock of an H.264/AVC video, a 4×4 prediction block as shown in Figure 3.1 includes 16 samples a, b, ..., p whose values are computed from samples of previously encoded and reconstructed blocks (A, B, C, D in the top row from the upper neighboring block; E, F, G, H from the upper right block; I, J, K, L in the leftmost column from the left neighboring block; and M from the upper left block, as shown in Figure 3.1). The resulting prediction block is subtracted from the current block prior to encoding. There are nine possible prediction modes for a luminance 4×4 block (abbreviated as a luma block in the sequel), as illustrated in Figure 3.2(a). In prediction mode 2, all samples take the same value, which is computed as the mean of A through D and I through L. In the remaining eight modes, the sample values are computed from the values of A through M along the eight directions illustrated in Figure 3.2(b). An encoder typically adopts, among the nine modes, the one with the lowest rate-distortion cost computed by the Lagrangian cost function described by Eq. (3.1).
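As a minimal illustration of how prediction samples are derived from the neighboring samples, the following Python sketch implements three of the nine 4×4 luma modes: mode 0 (vertical), mode 1 (horizontal), and mode 2 (the mean-value mode). The function names are hypothetical, and the sketch assumes that all of the neighboring samples A through D and I through L are available; the six remaining directional modes are omitted.

```python
def predict_vertical(above):
    """Mode 0: each column repeats the sample above it (A, B, C, D)."""
    return [list(above) for _ in range(4)]


def predict_horizontal(left):
    """Mode 1: each row repeats the sample to its left (I, J, K, L)."""
    return [[value] * 4 for value in left]


def predict_dc(above, left):
    """Mode 2: all samples equal the rounded mean of A..D and I..L."""
    dc = (sum(above) + sum(left) + 4) >> 3  # +4 rounds before dividing by 8
    return [[dc] * 4 for _ in range(4)]


above = [10, 20, 30, 40]  # A, B, C, D
left = [50, 60, 70, 80]   # I, J, K, L
print(predict_vertical(above)[0])
print(predict_horizontal(left)[1])
print(predict_dc(above, left)[0])
```

An encoder would compute such a prediction block for every candidate mode, evaluate the cost of each, and keep the mode with the lowest cost.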

3.2.2 Tree Structured Motion Compensation

A P macroblock may be split and motion compensated in four ways: (1) one 16×16 macroblock partition; (2) two 16×8 partitions; (3) two 8×16 partitions; or (4) four 8×8 partitions, as shown in Figure 3.3. If the 8×8 partitions are selected, each of the four 8×8 sub-macroblocks may be split further in four ways: (1) one 8×8 sub-macroblock partition; (2) two 8×4 partitions; (3) two 4×8 partitions; or (4) four 4×4 partitions, as illustrated in Figure 3.4. An encoder selects the best partition size, which has the lowest rate-distortion cost computed by the Lagrangian cost function given in Section 3.1.2.

Figure 3.1 Samples a to p of a luma 4×4 prediction block are calculated based on the sample values of A to M in neighboring prediction blocks.

(a) Nine prediction modes for 4×4 prediction blocks.

(b) Directions for computing samples of eight prediction modes.

Figure 3.2 Prediction modes for luma 4×4 prediction.

Figure 3.3 Macroblock partitions: 16×16, 8×16, 16×8, and 8×8.

Figure 3.4 Sub-macroblock partitions: 8×8, 4×8, 8×4, and 4×4.

3.3 Hiding Secret Data into H.264/AVC Videos

In this section, the proposed methods of hiding data into different types of macroblocks of H.264/AVC videos will be described. An illustration of the hiding method is shown in Figure 3.5. In Section 3.3.1, the proposed method for hiding large-volume data in I macroblocks based on the use of the nine intra-prediction modes is described. In Section 3.3.2, the proposed method for hiding data in I macroblocks based on optimal choice of an intra-prediction mode is described. Finally, the proposed method for hiding data in P macroblocks optimally based on tree structured motion compensation is described in Section 3.3.3.

Figure 3.5 Illustration of the proposed hiding method.

3.3.1 Process for Hiding Large-Volume Data into I Macroblocks Based on Intra-Prediction Mode

In this section, we describe the proposed method for hiding secret data based on the direct use of the nine prediction modes. To take full advantage of the nine prediction modes, we transform the binary data to be hidden into novenary ones, and then encode the result by the prediction modes. In addition, we also combine the user’s secret key and the secret data by exclusive-OR operations for the purpose of ensuring that the hidden data can be extracted only by a user who has the correct key.

A detailed algorithm of the process is described in the following.

Algorithm 3.1: large-volume data hiding process using I macroblocks.

Input: a user’s key R, a secret data file D, and the 4×4 luma prediction mode M.

Output: a stego-macroblock I'.

Steps:

1. For each character Di of the secret data D, perform the following steps.

1.1 Compute the remainder R' of dividing R by 256.

1.2 Transform each character D_i of the secret data D in the following way to form encrypted data E:

E_i = D_i ⊕ R'. (3.5)

2. Transform E into novenary form by converting every nineteen bits of E into a six-digit novenary number N. Each 4×4 luma prediction mode can therefore be used to hide 19/6 bits of data in this method.

3. Encode each digit N_i of N with magnitude i by the corresponding prediction mode M_i.

For example, if the user key is R = 3725, then R' = 3725 mod 256 = 141₁₀ = 10001101₂. Now, suppose that a secret message character D₁ = ‘a’ is to be embedded, whose binary form is 01100001₂. Then the encrypted form of D₁ is E₁ = 01100001 ⊕ 10001101 = 11101100₂. Similarly, if D₂ = ‘b’ and D₃ = ‘c’, with binary forms 01100010₂ and 01100011₂, respectively, then E₂ = 01100010 ⊕ 10001101 = 11101111₂ and E₃ = 01100011 ⊕ 10001101 = 11101110₂. Together, we get E = E₁E₂E₃ = 111011001110111111101110₂, whose first 19 bits, 1110110011101111111₂, when converted into novenary, become the novenary number 818563₉, and so may be encoded by the prediction modes M₁ = 8, M₂ = 1, M₃ = 8, M₄ = 5, M₅ = 6, M₆ = 3.
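The steps of Algorithm 3.1 and the worked example above can be checked with a short Python sketch. The function names are hypothetical, and discarding leftover bits that do not fill a complete 19-bit group is an assumption made here for brevity; the handling of such bits is not specified in the algorithm. The key value 141 is used directly because only the remainder R' = 10001101₂ affects the result.

```python
def encrypt(data: bytes, key: int) -> bytes:
    """Steps 1.1 and 1.2: XOR every character with R' = key mod 256."""
    r_prime = key % 256
    return bytes(byte ^ r_prime for byte in data)


def to_novenary_digits(bits19: str) -> list[int]:
    """Step 2: convert a 19-bit group into six base-9 digits."""
    value = int(bits19, 2)
    digits = []
    for _ in range(6):
        value, digit = divmod(value, 9)
        digits.append(digit)
    return digits[::-1]  # most significant digit first


def hide_modes(data: bytes, key: int) -> list[int]:
    """Step 3: one prediction mode index (0..8) per novenary digit."""
    bits = "".join(f"{byte:08b}" for byte in encrypt(data, key))
    modes = []
    # Assumption: trailing bits that do not fill a 19-bit group are dropped.
    for start in range(0, len(bits) - 18, 19):
        modes.extend(to_novenary_digits(bits[start:start + 19]))
    return modes


print(hide_modes(b"abc", 141))  # [8, 1, 8, 5, 6, 3]
```

Six novenary digits suffice for nineteen bits because 9⁶ = 531441 is at least 2¹⁹ = 524288, which is where the rate of 19/6 bits per 4×4 block comes from.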

3.3.2 Process for Hiding Data Optimally into I Macroblocks Based on Intra-Prediction Mode

In this section, we describe how we hide secret data optimally, in the sense mentioned previously, based on the use of the nine prediction modes. Each 4×4 luma prediction mode in the I macroblock can be used to hide zero to four bits of data by this method, and the method does not affect the imperceptibility. In addition, we record the number of bits so hidden in the highest-frequency quantized coefficient of the 4×4 block, as shown in Figure 3.6. We also use the user’s secret key to encrypt the secret data to enhance the security. A detailed algorithm of the process is described in the following.

Figure 3.6 The highest-frequency quantized coefficient of a 4×4 block.

Algorithm 3.2: optimal data hiding process for I macroblocks.

Input: an I macroblock in the spatial domain, I, a user’s key R, and a secret data file D.

Output: a stego-macroblock I'.

Steps:

1. For each character D_i of the secret data D, perform the following steps.

1.1 Compute the remainder R' of dividing R by 256.

1.2 Transform each character D_i of the secret data D according to Eq. (3.5) to form encrypted data E.

2. For each luma 4×4 block B of I, perform the following operations.

2.1 For each luma 4×4 prediction mode M_i, perform intra-prediction, DCT-based transform, and quantization in the video coding process, and then match four bits of E, E_3E_2E_1E_0, with the 4-bit numeral value I_0I_1I_2I_3 of the mode index i.

2.2 Replace the highest-frequency quantized coefficient C, as shown in Figure 3.6, by a new value according to the following mapping rules:

if 0 then set 0; number of bits which can be hidden in this block. Take away from E these bits.

4. Repeat the above steps to encode more bits in the remaining portion of E until no more is left.

For example, as in Section 3.3.1, if R' = 10001101₂ and D₁ = ‘a’ with binary form 01100001₂, then E₁ = 01100001 ⊕ 10001101 = 11101100₂. Similarly, if D₂ = ‘b’ with binary form 01100010₂, then E₂ = 01100010 ⊕ 10001101 = 11101111₂. Together, we get E = E₁E₂ = 1110110011101111₂, whose first 4 bits are then matched to the binary equivalent of the index i of each prediction mode M_i. Suppose the best mode selected using the Lagrangian cost function is M₃, whose binary index is 3 = 0011₂ (bits from right to left correspond to bits of E from left to right). We then get two matching bits, which can be hidden in M₃, and so we set C = -1 and store it in the highest-frequency quantized coefficient.
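The bit-matching step in this example can be sketched in Python as follows. The function name is hypothetical, and the rule that matching stops at the first mismatch is an assumption inferred from the example (the first four bits of E are 1110, and mode index 3 yields two hidden bits); the mapping from the number of hidden bits to the value of C is not reproduced here.

```python
def hidden_bit_count(e_bits: str, mode_index: int) -> int:
    """Count how many leading bits of E match the 4-bit mode index.

    The index bits are read from right to left (least significant first)
    and compared with the bits of E read from left to right.
    Assumption: matching stops at the first mismatch.
    """
    index_bits = f"{mode_index:04b}"[::-1]
    count = 0
    for e_bit, i_bit in zip(e_bits[:4], index_bits):
        if e_bit != i_bit:
            break
        count += 1
    return count


# First four bits of E from the example, tested against all nine modes.
for mode in range(9):
    print(mode, hidden_bit_count("1110", mode))
```

For the bits 1110, mode 3 carries two bits as in the example, while mode 7 (0111₂) would carry all four; the capacity-aware cost function then decides whether the extra capacity is worth the change in rate and distortion.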

A flowchart of the optimal data hiding process for I macroblocks is shown in Figure 3.7.

3.3.3 Process for Hiding Data Optimally into P Macroblocks Based on Tree Structured Motion Compensation

In this section, we hide secret data based on variable partition sizes of 16×16 macroblocks. Each 16×16 P macroblock can be used to hide one or four bit(s) of data by modifying the partition size. To allow better choices of sizes that reduce the rate-distortion cost, we encode hidden data by the partition size with multiple choices for 0 or 1 according to Table 3.1, in which two groups of sizes are used to encode 0 and 1, respectively. In addition, we use the user’s secret key to encrypt the secret data. A detailed algorithm of the process is described in the following.

Figure 3.7 Flowchart of the optimal data hiding process for I macroblocks.

Table 3.1 Relations between hidden data and partition sizes.

Algorithm 3.3: optimal data hiding process for P macroblocks.

Input: a P macroblock in the spatial domain P, a user’s key R, a secret data file D, and the macroblock partition size K.

Output: a stego-macroblock P'.

Steps:

1. For each character D_i of the secret data D, perform the following steps.

1.1 Compute the remainder R' of dividing R by 256.

1.2 Transform each character D_i of the secret data D according to Eq. (3.5) to form encrypted data E.

2. According to Table 3.1, hide one bit e₁ or four bits e_j of E into the macroblock partition according to the following rules for the macroblock
