
Chapter 2 Review of Related Works and WebM Standard

2.5 Review of WebM Standard

2.5.6 VP8 Intra Prediction and Inter Prediction

To encode a video frame, a block-based video codec such as the VP8 video codec first decomposes the frame into smaller segments called macroblocks. For each macroblock, the VP8 encoder predicts redundant motion and color information based on previously processed blocks. The prediction is subtracted from the macroblock, and the remaining residual is transformed, resulting in more efficient compression. The VP8 encoder uses two prediction types: intra prediction and inter prediction. Intra prediction uses data from already encoded macroblocks within the current frame, so it does not reference any previously encoded frames, whereas inter prediction uses data from previously encoded frames. In both cases, the residual signal data are then encoded using other techniques, such as transform coding.

(A) VP8 Intra Prediction Modes 

The VP8 video codec uses three block types for intra prediction: 4×4 luma, 16×16 luma, and 8×8 chroma. Five intra prediction modes are shared by these blocks. The first is H_PRED (horizontal prediction), which fills each column of the block with a copy of the left column, L. The second is V_PRED (vertical prediction), which fills each row of the block with a copy of the row above, A. The third is DC_PRED (DC prediction), which fills the block with a single value, the average of the pixels in the row above, A, and the column to the left, L (see Fig. 2.6). The fourth is B_PRED, which divides a macroblock into sixteen 4×4 subblocks, each having its own prediction mode. The last is TM_PRED (TrueMotion prediction), a compression prediction technique developed by On2 Technologies. TrueMotion prediction is described in more detail below.
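The first three modes are simple enough to sketch directly. The Python below is a minimal illustration under stated assumptions, not VP8 reference code: both the row above (A) and the column to the left (L) are assumed available (VP8's special handling of blocks on the frame edge is omitted), blocks are lists of rows, and the function names are hypothetical.

```python
def h_pred(left, n):
    """H_PRED: row r is filled with the left-neighbor pixel L[r]."""
    return [[left[r]] * n for r in range(n)]


def v_pred(above, n):
    """V_PRED: every row is a copy of the row above, A."""
    return [list(above[:n]) for _ in range(n)]


def dc_pred(above, left, n):
    """DC_PRED: one value, the rounded average of the n pixels of A and the n pixels of L."""
    total = sum(above[:n]) + sum(left[:n])
    dc = (total + n) // (2 * n)  # average over 2n pixels, rounded half up
    return [[dc] * n for _ in range(n)]


if __name__ == "__main__":
    A = [10, 20, 30, 40]
    L = [50, 60, 70, 80]
    print(h_pred(L, 4)[1])      # [60, 60, 60, 60]
    print(v_pred(A, 4)[3])      # [10, 20, 30, 40]
    print(dc_pred(A, L, 4)[0])  # [45, 45, 45, 45]
```

The same functions serve 4×4, 8×8, and 16×16 blocks because only `n` changes; for n = 16 the expression `(total + 16) // 32` averages the 32 neighboring pixels.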

In addition to the row A and the column L, TrueMotion prediction uses the pixel C above and to the left of the block. Horizontal differences between pixels in A (starting from C) are propagated using the pixels from L to start each row. As mentioned above, the TM_PRED mode is unique to the VP8 video codec. Figure 2.6 uses an example 4×4 block of pixels to illustrate how the TM_PRED mode works, where C, A_x, and L_x (x = 0, 1, 2, 3) represent reconstructed pixel values from previously encoded blocks, and X₀₀ through X₃₃ represent predicted values for the current block. The TM_PRED mode uses the following equation to calculate X_ij:

$$X_{ij} = L_i + A_j - C, \qquad i, j = 0, 1, 2, 3$$

Figure 2.6 An example of a 4×4 block of pixels.

Although the above example uses a 4×4 block, the TM_PRED mode for 8×8 and 16×16 blocks works in the same way. The TM_PRED prediction mode is one of the more frequently used intra prediction modes in the VP8 video codec. Generally speaking, together with other intra prediction modes, the TM_PRED prediction mode helps the VP8 video codec to achieve very good compression efficiency, especially for key frames, which can only use intra modes.
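The TM_PRED equation can be sketched for any square block size as below. The clamping of each predicted value to the 8-bit range 0 to 255 follows the VP8 specification (RFC 6386) rather than the text above; the function name `tm_pred` is hypothetical.

```python
def tm_pred(above, left, corner):
    """TM_PRED: X[i][j] = L[i] + A[j] - C, clamped to 0..255.

    above:  the n reconstructed pixels A_0..A_{n-1} of the row above
    left:   the n reconstructed pixels L_0..L_{n-1} of the column to the left
    corner: the pixel C above and to the left of the block
    """
    n = len(above)
    if len(left) != n:
        raise ValueError("A and L must have the same length")
    return [[min(255, max(0, left[i] + above[j] - corner)) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    block = tm_pred([100, 110, 120, 130], [90, 95, 100, 250], 100)
    for row in block:
        print(row)
```

Every row repeats the horizontal differences of A (steps of 10 here) shifted by L_i - C, which is the propagation described above; the last row, [250, 255, 255, 255], shows the clamping.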

(B) VP8 Inter Prediction Modes 

In the VP8 video codec, inter prediction modes are used only in inter frames (non-key frames). For any VP8 inter frame, three previously coded reference frames (the last frame, the golden frame, and the alternate reference frame) can be used for prediction. A typical prediction block is constructed by using a motion vector to copy a block from one of the three frames; the motion vector points to the location of the pixel block to be copied. In most video compression schemes, a good portion of the bits is spent on encoding motion vectors, and this portion can be especially large for videos encoded at lower data rates. The VP8 video codec encodes motion vectors very efficiently by reusing motion vectors from neighboring macroblocks, and it uses a similar strategy in the overall design of its inter prediction modes. For example, the prediction modes "NEARESTMV" and "NEARMV" make use of the last and second-to-last non-zero motion vectors from neighboring macroblocks, and the prediction mode "ZEROMV" uses a zero motion vector for the macroblock. These inter prediction modes can be used in combination with any of the three different reference frames.
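A prediction block built from a motion vector can be sketched as a plain block copy. This is a simplified illustration under two assumptions: motion vectors are whole-pixel offsets (VP8's motion vectors have sub-pixel precision and are interpolated, which is omitted here), and the displaced block lies entirely inside the reference frame. The reference-frame names follow VP8's last, golden, and alternate reference frames; the dictionary layout and the function name are hypothetical.

```python
REFERENCE_FRAMES = ("last", "golden", "altref")  # VP8's three reference frames


def predict_inter(refs, ref_name, mb_row, mb_col, mv, size=16):
    """Build a prediction block by copying from a reference frame.

    refs: dict mapping a reference-frame name to a frame (list of pixel rows)
    mv:   (dy, dx) motion vector in whole pixels; (0, 0) is the ZEROMV case
    """
    if ref_name not in REFERENCE_FRAMES:
        raise ValueError(f"unknown reference frame {ref_name!r}")
    ref = refs[ref_name]
    y0 = mb_row * size + mv[0]
    x0 = mb_col * size + mv[1]
    if y0 < 0 or x0 < 0 or y0 + size > len(ref) or x0 + size > len(ref[0]):
        raise ValueError("motion vector points outside the reference frame")
    return [row[x0:x0 + size] for row in ref[y0:y0 + size]]


if __name__ == "__main__":
    frame = [[100 * y + x for x in range(8)] for y in range(8)]
    refs = {"last": frame, "golden": frame, "altref": frame}
    print(predict_inter(refs, "last", 0, 0, (2, 3), size=4)[0])    # [203, 204, 205, 206]
    print(predict_inter(refs, "golden", 1, 1, (0, 0), size=4)[0])  # [404, 405, 406, 407]
```

NEARESTMV and NEARMV would differ from this sketch only in where the vector comes from (a neighboring macroblock), not in how the block is copied.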

In addition, the VP8 video codec has a complex but very flexible inter prediction mode called SPLITMV. It is another unique compression prediction technique developed by On2 Technologies. This prediction mode was designed to enable flexible partitioning of a macroblock into sub-blocks to achieve better inter prediction.

The SPLITMV prediction mode is very useful when objects within a macroblock have different motion characteristics. Within a macroblock encoded in the SPLITMV prediction mode, each sub-block can have its own motion vector. Similar to the strategy of reusing motion vectors at the macroblock level, a sub-block can also use the motion vector of the neighboring sub-block above it or to its left. This strategy is very flexible and can encode any shape of sub-macroblock partitioning efficiently. Figures 2.7 and 2.8 illustrate an example of a macroblock using the SPLITMV prediction mode. In Figure 2.7, NEW represents a 4×4 block encoded with a new motion vector, and LEFT and ABOVE represent a 4×4 block encoded using the motion vector of the block to the left and above, respectively. As can be seen from Figure 2.8, the macroblock has three different colors, each representing a segment with a different motion vector, so there are three different motions in this macroblock.

NEW LEFT LEFT NEW

ABOVE LEFT LEFT ABOVE

ABOVE NEW LEFT ABOVE

ABOVE ABOVE LEFT LEFT

Figure 2.7 An example of the SPLITMV prediction mode.

Figure 2.8 An example of the SPLITMV prediction mode.
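The label grid of Figure 2.7 can be turned into per-sub-block motion vectors with a single raster-order pass, because a LEFT or ABOVE sub-block always refers to a neighbor that has already been resolved. The sketch below is a hypothetical illustration of this reuse idea only; the actual VP8 bitstream signals partitions and sub-block motion vectors with its own syntax, which is not reproduced here. Neighbors outside the macroblock are passed in as arguments, and the example vectors are made up.

```python
FIGURE_2_7 = [
    ["NEW", "LEFT", "LEFT", "NEW"],
    ["ABOVE", "LEFT", "LEFT", "ABOVE"],
    ["ABOVE", "NEW", "LEFT", "ABOVE"],
    ["ABOVE", "ABOVE", "LEFT", "LEFT"],
]


def resolve_split_mvs(labels, new_mvs, left_neighbor, above_neighbor):
    """Return the 4x4 grid of sub-block motion vectors of a SPLITMV macroblock.

    labels:         4x4 grid of "NEW", "LEFT" or "ABOVE"
    new_mvs:        vectors for the NEW sub-blocks, consumed in raster order
    left_neighbor:  vectors of the right-most sub-blocks of the macroblock to the left (rows 0-3)
    above_neighbor: vectors of the bottom sub-blocks of the macroblock above (columns 0-3)
    """
    fresh = iter(new_mvs)
    mvs = [[None] * 4 for _ in range(4)]
    for r in range(4):          # raster order: left and above are resolved first
        for c in range(4):
            label = labels[r][c]
            if label == "NEW":
                mvs[r][c] = next(fresh)
            elif label == "LEFT":
                mvs[r][c] = mvs[r][c - 1] if c > 0 else left_neighbor[r]
            elif label == "ABOVE":
                mvs[r][c] = mvs[r - 1][c] if r > 0 else above_neighbor[c]
            else:
                raise ValueError(f"unknown label {label!r}")
    return mvs


if __name__ == "__main__":
    a, b, c = (1, 2), (-3, 0), (0, 4)  # hypothetical (dy, dx) vectors of the NEW sub-blocks
    grid = resolve_split_mvs(FIGURE_2_7, [a, b, c], [(0, 0)] * 4, [(0, 0)] * 4)
    for row in grid:
        print(row)
    print(len({mv for row in grid for mv in row}))  # 3 distinct motions
```

With three NEW vectors, the grid resolves into three regions of identical motion, consistent with the three differently colored segments described for Figure 2.8.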

Chapter 3 Data Hiding in WebM Videos for Covert Communication by Frequency Coefficient Modifications

3.1 Introduction

With the growth of computer networks and audio/video compression technologies, many applications of digital media have emerged on the network. The preservation and transmission of secret information are interesting research topics, and data hiding techniques are a good solution to such covert communication problems: secret data are hidden in cover media, and the hidden information is desirably imperceptible. Videos are suitable cover media for this purpose because more data can be hidden in videos than in images or other documents. In addition, because of the efficiency and good quality of WebM videos, some popular video sharing websites, such as YouTube, already use WebM videos widely for user communications. Considering this popularity of WebM videos, we propose in this study a data hiding method via WebM videos for covert communication, which is described in this chapter.

In Section 3.1.1, some relevant definitions are given, and in Section 3.1.2, the basic ideas of the proposed method are presented. In Section 3.2, the proposed data hiding method is described in detail, and the corresponding data extraction method is presented in Section 3.3. In Section 3.4, some experimental results are shown to demonstrate the feasibility of the proposed method. Finally, discussions and a summary of the proposed method are given in the last section of this chapter.

3.1.1 Problem Definition

When data hiding techniques via videos are applied for covert communication, the amount and the imperceptibility of the hidden data are two major concerns.

Furthermore, with the popularity of web applications, increasing attention is being paid to low-bit-rate videos. Therefore, an additional problem is how to hide data in videos while minimizing the increase in the bit rate of the stego-video.

Finally, the security of the hidden secret should also be taken into consideration.

3.1.2 Proposed Ideas

In the method proposed in this study for hiding data via WebM videos for covert communication, because the transform coding scheme in the VP8 video codec always conducts compression at the 4×4 resolution, we modify the frequency coefficients of the chroma color channels in the compressed WebM data to generate data patterns for data hiding. In addition, PSNR values are computed and compared with a threshold to control these changes so that the video quality and the bit rate are maintained. For secret security enhancement, we first calculate the total number of macroblocks that can be used to embed data and the total size of the secret message. Then, we use a key together with a random number generator to randomly select data hiding positions in the video frames, preventing a malicious user from figuring out the locations where the secret data are embedded.

There are two frame types in WebM videos, namely, key frames (I frames) and prediction frames (P frames). The data hiding technique proposed in this study utilizes the prediction frames.

3.2 Embedding of Secret Data into WebM Videos

In this section, the proposed method for embedding secret data into the frequency coefficients of WebM videos is described in detail. An illustration of the embedding process is shown in Figure 3.1. In Section 3.2.1, the idea of the proposed data embedding scheme is given, and in Section 3.2.2, the details of the corresponding process are described.

[Block diagram: Input frame → Prediction → Transform → Entropy coding → Bitstream, with an added Data hiding stage]

Figure 3.1 Illustration of the proposed data hiding method.

3.2.1 Idea of Proposed Method

Two main features of the proposed method are the region-of-interest (ROI) map and frequency-coefficient patterns; their roles in this study are described below.

(A) Region-of-Interest Map

As mentioned in Section 2.5.4, the VP8 video codec supports up to four maps for each frame. Each macroblock has its own map index, which is also encoded into the bitstream by tree coding. Here, we propose a scheme that assigns a map index as a data extraction mark to label the macroblocks whose coefficients are modified for embedding secret information. As a result, the proposed scheme indicates the positions of the macroblocks in the frames where the secret information exists.

(B) Frequency Coefficient Patterns

As mentioned in Section 2.5.1, the macroblock-level data in a compressed frame of a WebM video are processed in raster-scan order, and a macroblock is a square array of pixels whose Y components are 16×16 and whose U and V components are 8×8.

Each macroblock is further decomposed into 4×4 subblocks, so that every macroblock has sixteen Y subblocks, four U subblocks, and four V subblocks. Figure 3.2 shows one of these 4×4 subblocks. In the VP8 video codec, the DCT (discrete cosine transform) and the WHT (Walsh-Hadamard transform) are always performed at the 4×4 resolution. After the DCT is conducted, the pixel values in a subblock are transformed into frequency-domain coefficients, and the energy of the coefficients is “clumped” at the upper-left corner of the subblock.

In addition, after quantization with an adaptive quantization level, a mix of zero and non-zero coefficients appears in the middle area of a quantized subblock. The coefficients in this area can be replaced automatically with pre-defined data patterns for imperceptible data hiding. Figure 3.3 shows an example of a subblock after the DCT and quantization.
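The energy compaction and the zeros left by quantization can be reproduced with a short sketch. It uses a floating-point orthonormal DCT-II and a single uniform quantization step of 10, both assumptions made for illustration: VP8 uses an integer approximation of the DCT, applies it to prediction residuals rather than raw pixels, and has its own quantizer tables.

```python
import math


def dct_matrix(n=4):
    """Orthonormal DCT-II basis; row k is the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * x + 1) * k * math.pi / (2 * n))
                     for x in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def dct2(block):
    """2-D DCT: C * block * C^T (row index = vertical frequency, column = horizontal)."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def quantize(coeffs, step):
    """Uniform scalar quantization: divide by the step and round to the nearest integer."""
    return [[int(round(v / step)) for v in row] for row in coeffs]


if __name__ == "__main__":
    # A smooth 4x4 block whose brightness rises gently to the right and downward.
    block = [[100 + 4 * x + 2 * y for x in range(4)] for y in range(4)]
    for row in quantize(dct2(block), step=10):
        print(row)
```

Almost all of the energy lands in the upper-left coefficient (436 before quantization, 44 after), and only three of the sixteen quantized coefficients are non-zero.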

Furthermore, according to the research results of color theory [18], human eyes are less sensitive to high-frequency signals and to chrominance than to low-frequency signals and luminance.

Based on the above discussions, we propose in this study a data hiding scheme based on the DCT at the 4×4 resolution, which modifies up to four coefficients on the “positive-sloped diagonal line” of each 4×4 subblock of quantized frequency coefficients, using sixteen pre-defined 4×4 patterns to represent the message information to be embedded. Here, by the positive-sloped diagonal line, we mean the yellow-colored squares in the 4×4 coefficient matrix (corresponding to a subblock) shown in Fig. 3.2, or the red-colored ones shown in Fig. 3.3.

There are two reasons why we choose neither the coefficients in the upper-left part nor those in the lower-right part of the coefficient matrix for pattern replacement. First, if the quantization level used for the coefficients in the lower-right part is too large, modifying these coefficients will cause so much distortion in the resulting image that the modification becomes perceptible, and imperceptibility would not be achieved.

Second, the coefficients in the upper-left part yielded by the DCT and the quantization process are usually non-zero; therefore, the message data would almost never match the pre-defined patterns there without modifying the coefficients.

Considering the data hiding capacity and the above reasons, the proposed method uses the positive-sloped diagonal lines of all the subblocks of the chroma color channels for data embedding.

0 1 2 3

4 5 6 7

8 9 10 11

12 13 14 15

Figure 3.2 An example of a 4×4 subblock, with the yellow coefficients composing a positive-sloped diagonal line.

81 20 6 -2

11 4 0 0

1 0 0 0

0 0 0 0

Figure 3.3 An example of a subblock after the DCT and quantization, with the red coefficients composing a positive-sloped diagonal line.
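Reading the diagonal out of a subblock stored in raster order is a one-line lookup. Because the colors of Figures 3.2 and 3.3 are not reproduced here, the sketch assumes the usual reading of "positive-sloped": the line rising from the lower-left to the upper-right corner of the matrix, i.e. positions 12, 9, 6, and 3 in the numbering of Figure 3.2. The names `DIAGONAL` and `diagonal` are hypothetical.

```python
# Raster-order positions (numbering of Figure 3.2) on the positive-sloped diagonal,
# from the lower-left corner to the upper-right corner: row r holds column 3 - r.
DIAGONAL = tuple(4 * r + (3 - r) for r in (3, 2, 1, 0))  # (12, 9, 6, 3)


def diagonal(subblock):
    """Return the diagonal coefficients of a 16-element raster-order subblock."""
    if len(subblock) != 16:
        raise ValueError("expected 16 coefficients")
    return [subblock[p] for p in DIAGONAL]


if __name__ == "__main__":
    figure_3_3 = [81, 20, 6, -2,
                  11, 4, 0, 0,
                  1, 0, 0, 0,
                  0, 0, 0, 0]
    print(DIAGONAL)              # (12, 9, 6, 3)
    print(diagonal(figure_3_3))  # [0, 0, 0, -2]
```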

3.2.2 Process for Embedding Secret Data

In this section, we describe the detailed algorithm of the proposed method for hiding secret message data in cover videos by changing the frequency coefficients into pre-defined patterns. A flowchart of the proposed data embedding process is shown in Figure 3.5.

First, we define the aforementioned 16 data patterns for use in the proposed algorithms, where the notations N and 0 denote “non-zero” and “zero,” respectively.

Data pattern i (i = 0 to 15): a 4×4 block whose positive-sloped diagonal line is filled with four symbols S₄S₃S₂S₁, each an N or a 0, which correspond to the binary value b₄b₃b₂b₁ of i in the following way:

if b_j = 0, then S_j = N; otherwise, S_j = 0, for j = 1, 2, 3, 4.

Figure 3.4 illustrates the 16 data patterns. For example, when i = 3, the corresponding binary value is 3₁₀ = 0011₂, so pattern 3 is defined as the 4×4 block whose positive-sloped diagonal line is filled with the four symbols S₄S₃S₂S₁ = NN00.

And when i = 10, the corresponding binary value is 10₁₀ = 1010₂, so pattern 10 is defined as the 4×4 block whose positive-sloped diagonal line is filled with S₄S₃S₂S₁ = 0N0N.

Figure 3.4 The sixteen data patterns used to embed message data.
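The pattern definition can be checked mechanically. The sketch below follows the rule exactly as stated (b_j = 0 gives N, b_j = 1 gives 0), so data pattern 0 has four non-zero diagonal coefficients and data pattern 15 has four zeros. Which symbol occupies which diagonal position is not stated in the recoverable text, so mapping S₄ through S₁ onto raster positions 12, 9, 6, and 3 (the lower-left to upper-right diagonal of Figure 3.2) is a hypothetical choice, as are the function names.

```python
# Assumed raster positions of S4, S3, S2, S1 (hypothetical assignment).
DIAGONAL = (12, 9, 6, 3)


def pattern_symbols(i):
    """Return the string S4 S3 S2 S1 of data pattern i, using b_j = 0 -> 'N', b_j = 1 -> '0'."""
    if not 0 <= i <= 15:
        raise ValueError("pattern index must be in 0..15")
    bits = [(i >> (j - 1)) & 1 for j in (4, 3, 2, 1)]  # b4 b3 b2 b1
    return "".join("N" if b == 0 else "0" for b in bits)


def matches(subblock, i):
    """True if the diagonal of a 16-coefficient raster-order subblock fits pattern i."""
    return all((subblock[p] != 0) == (s == "N")
               for p, s in zip(DIAGONAL, pattern_symbols(i)))


if __name__ == "__main__":
    for i in (0, 3, 10, 15):
        print(i, pattern_symbols(i))
    figure_3_3 = [81, 20, 6, -2, 11, 4, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
    print([i for i in range(16) if matches(figure_3_3, i)])  # [14]
```

Under this assumed mapping, the diagonal of the subblock in Figure 3.3 reads 000N, which is data pattern 14.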

Algorithm 3.1 Process for computing the data hiding capacity of a video sequence.

Input: a video sequence V and a pre-selected threshold T.

Output: the number C of macroblocks in V that may be used to embed data patterns without causing intolerable distortion.

Steps.

1. Perform the following steps for each macroblock MB in the prediction frames of V.

1.1 Save the original quantized coefficients of the chroma color channels (including the U channel and the V channel) in the macroblock MB.

1.2 Check the coefficients of each subblock SB of the chroma color channels:

if the original coefficients of SB do not satisfy data pattern 0, then modify them to do so by changing every 0 among them to 1;

else, do nothing.

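1.3 (Computing the resulting distortion) Calculate the mean square quantization error (MSQE) between the saved content MB_o of the original macroblock MB and the content MB_o' of the modified macroblock MB' of the chroma color channels:

$$MSQE_i = \frac{1}{64}\sum_{k=1}^{64}\left(MB_{o,i}(k) - MB'_{o,i}(k)\right)^2$$

where i = 1 and 2 represent the U channel and the V channel, respectively, and MB_{o,i} and MB'_{o,i} are the 8×8 vectors of coefficients of channel i in MB_o and MB_o'.

1.4 Calculate the peak signal-to-noise ratio (PSNR) PSNR_i of the chroma color channels for i = 1 and 2:

$$PSNR_i = 10\log_{10}\frac{S_{peak}^2}{MSQE_i}$$

where S_peak means the maximum possible pixel value of the image.

1.5 Calculate the average PSNR value PSNR_avg of the chroma color channels:

$$PSNR_{avg} = \frac{PSNR_1 + PSNR_2}{2}$$

1.6 Use the ROI map index to label the modified macroblock MB' in the following way: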

if PSNR_avg is not smaller than the pre-selected threshold T, then set the ROI map index value to 1, meaning that macroblock MB is data-embeddable;

else, use the default value of the ROI map index which is 0, meaning that macroblock MB is non-data-embeddable.

2. Increment the value C (initially 0) by one if the ROI map index is set to 1.

3. Repeat Steps 1 and 2 until all macroblocks are processed.
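Algorithm 3.1 can be sketched in Python as follows. The macroblock layout (a dictionary holding four 16-coefficient subblocks per chroma channel), S_peak = 255, and the diagonal positions are assumptions, and the macroblocks of all prediction frames are simply given as one list. The sketch treats a macroblock as data-embeddable when PSNR_avg ≥ T, because a larger PSNR means a smaller distortion.

```python
import math

DIAGONAL = (12, 9, 6, 3)  # assumed raster positions of the positive-sloped diagonal
S_PEAK = 255              # assumed 8-bit samples


def force_pattern0(subblock):
    """Step 1.2: make all four diagonal coefficients non-zero by turning 0s into 1s."""
    out = list(subblock)
    for p in DIAGONAL:
        if out[p] == 0:
            out[p] = 1
    return out


def msqe(original, modified):
    """Step 1.3: mean square error over the 64 coefficients (four subblocks) of one channel."""
    a = [v for sb in original for v in sb]
    b = [v for sb in modified for v in sb]
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(err):
    """Step 1.4: 10*log10(S_peak^2 / MSQE); infinite when nothing was changed."""
    return math.inf if err == 0 else 10 * math.log10(S_PEAK ** 2 / err)


def roi_index(mb, threshold):
    """Steps 1.1-1.6 for one macroblock {"U": [4 subblocks], "V": [4 subblocks]}."""
    psnrs = []
    for channel in ("U", "V"):
        original = mb[channel]                              # 1.1 keep the originals
        modified = [force_pattern0(sb) for sb in original]  # 1.2
        psnrs.append(psnr(msqe(original, modified)))        # 1.3 and 1.4
    psnr_avg = (psnrs[0] + psnrs[1]) / 2                    # 1.5
    return 1 if psnr_avg >= threshold else 0                # 1.6


def capacity(macroblocks, threshold):
    """Steps 2 and 3: C = number of macroblocks whose ROI map index is 1."""
    return sum(roi_index(mb, threshold) for mb in macroblocks)


if __name__ == "__main__":
    empty = [0] * 16          # all-zero subblock: four coefficients must change
    ready = [0] * 16
    for p in DIAGONAL:
        ready[p] = 5          # diagonal already non-zero: nothing changes
    mbs = [{"U": [empty] * 4, "V": [empty] * 4},
           {"U": [ready] * 4, "V": [ready] * 4}]
    print(round(psnr(16 / 64), 2))      # 54.15
    print(capacity(mbs, threshold=60))  # 1
    print(capacity(mbs, threshold=50))  # 2
```

An all-zero chroma channel needs sixteen coefficients changed, giving MSQE = 16/64 = 0.25 and about 54.15 dB, so such a macroblock passes a 50 dB threshold but not a 60 dB one.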

In Step 2 above, an ROI map index of 1 means that this macroblock can be used to embed data without causing intolerable distortion in the resulting macroblock, and the number C specifies the data hiding capacity of this video sequence in units of macroblocks. In addition, the Parseval theorem [19] states that the mean square error (MSE) in the pixel domain is equal to the mean square quantization error (MSQE) in the DCT domain, because the DCT is a normalized orthogonal transformation. So in Steps 1.3 and 1.4 above, we may also use the original PSNR definition described by Eq. (3.3) below to calculate the PSNR values:

$$PSNR = 10\log_{10}\frac{S_{peak}^2}{MSE} \qquad (3.3)$$

where S_peak means the maximum possible pixel value of the image.
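The Parseval argument can be checked numerically with an orthonormal 4×4 DCT: adding 1 to the four diagonal coefficients changes the block by exactly the same mean square error in the pixel domain. The sketch assumes a floating-point orthonormal DCT-II. VP8's integer transform is only approximately orthonormal, and for quantized coefficients a change of 1 corresponds to a change of one quantization step at the transform's own scale, so the equality carries over only after accounting for that step.

```python
import math
import random


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix; its transpose is its inverse."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos((2 * x + 1) * k * math.pi / (2 * n)) for x in range(n)]
            for k in range(n)]


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def dct2(block):
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2(coeffs):
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)


def mse(a, b):
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


rng = random.Random(7)
block = [[rng.randint(0, 255) for _ in range(4)] for _ in range(4)]
coeffs = dct2(block)
changed = [row[:] for row in coeffs]
for r in range(4):
    changed[r][3 - r] += 1.0                # perturb the positive-sloped diagonal
coef_mse = mse(coeffs, changed)             # error measured in the DCT domain
pixel_mse = mse(block, idct2(changed))      # error measured in the pixel domain
print(coef_mse, pixel_mse)                  # equal up to floating-point rounding
```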

With the data embedding capacity C computed, we can now describe the proposed method for data embedding as an algorithm in the following.

Algorithm 3.2 Process for embedding secret data into a WebM video.

Input: a video V, a secret key K, a random number generator f, and a secret message S.

2. (Randomizing the secret message) Transform the secret message S into a binary string B, use the secret key K as a seed to generate a sequence Q of random numbers using the random number generator f, and randomize B with Q to get a randomized binary string B'.

3. Calculate the total number N of macroblocks needed for embedding B', each of which carries eight 4-bit segments (32 bits):

$$N = \left\lceil \frac{\text{the length of } B'}{32} \right\rceil \qquad (3.5)$$

4. Use the secret key K and the random number generator f to generate a sequence RS of N distinct random integers between 1 and C, and sort them in ascending order.

5. Divide B' into a linear array A of 4-bit segments.

Stage 2 --- embedding message data into the video.

6. Perform the following steps to embed message S into each unprocessed macroblock MB in every prediction frame of V, assuming V is large enough to embed the entire message.

6.1 Save the original quantized coefficients of the chroma color channels in MB.

6.2 Check the coefficients of each subblock SB of the chroma color channels:

if the original coefficients of SB do not satisfy data pattern 0, then modify them to do so by changing every 0 among them to 1;

else, do nothing.

6.3 Calculate the mean square quantization error (MSQE) between the saved content MB_o of the original macroblock MB and the content MB_o' of the modified macroblock MB' of the chroma color channels:

$$MSQE_i = \frac{1}{64}\sum_{k=1}^{64}\left(MB_{o,i}(k) - MB'_{o,i}(k)\right)^2 \qquad (3.6)$$

where i = 1 and 2 represent the U channel and the V channel, respectively, and MB_{o,i} and MB'_{o,i} are the 8×8 vectors of coefficients of channel i in MB_o and MB_o'.

6.4 Calculate the peak signal-to-noise ratio (PSNR) PSNR_i of the chroma color channels for i = 1 and 2:

$$PSNR_i = 10\log_{10}\frac{S_{peak}^2}{MSQE_i}$$

where S_peak means the maximum possible pixel value of the image.

6.5 Calculate the average PSNR value PSNR_avg of the chroma color channels:

$$PSNR_{avg} = \frac{PSNR_1 + PSNR_2}{2}$$

6.6 Use the ROI map index to label the modified macroblock MB' in the following way:

if PSNR_avg is not smaller than the pre-selected threshold T, then set the ROI map index value to 1, meaning that macroblock MB is data-embeddable;

else, use the default value of the ROI map index which is 0, meaning that macroblock MB is non-data-embeddable.

6.7 Increment a pre-defined random-selection counter C_v (initially 0) by one if the ROI map index of this macroblock is set to 1.

6.8 (Embedding the secret data patterns at random locations in the input video) If C_v is equal to the next unprocessed random number in RS (meaning that the current macroblock is chosen randomly for message hiding), then conduct the following steps to embed eight 4-bit segments (32 bits) of the message data; if not, go to Step 6.9.

(1) Take eight unprocessed 4-bit elements from A, denoted as A_1 through A_8, and for each element A_i define a corresponding data pattern P_i such that if the 4-bit binary value of A_i equals d in decimal, then P_i is data pattern d (e.g., if A_i = 1001₂ = 9₁₀, then P_i = data pattern 9).

(2) Check the coefficients of each of the eight corresponding subblocks of the chroma color channels, four in the U channel and the other four in the V channel, denoted as SB_j, j = 1, 2, …, 8:

if the original coefficients of SB_j do not match those of data pattern P_j, then modify them to match by changing the mismatching coefficients in SB_j to the corresponding ones of data pattern P_j; else, do nothing.

6.9 If the entire secret message in array A has been embedded (i.e., if all elements in A have been processed), embed an ending pattern in the next macroblock by setting its ROI map index to 1; otherwise, go to Step 6 to repeat the above process.
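Steps 6.7 and 6.8 can be sketched as follows, assuming the ROI map indices from Steps 6.1 to 6.6 have already been computed. The macroblock layout, the diagonal positions, the rule that an N symbol turns a zero into 1 (borrowed from Step 6.2), and the integer representation of segments are all assumptions, and the function names are hypothetical; the ending pattern of Step 6.9 is omitted.

```python
DIAGONAL = (12, 9, 6, 3)  # assumed raster positions of S4, S3, S2, S1


def apply_pattern(subblock, d):
    """Step 6.8(2): force the diagonal of a 16-coefficient subblock to data pattern d.

    Bit b_j = 0 requires a non-zero coefficient (a zero becomes 1, as in Step 6.2);
    bit b_j = 1 requires a zero coefficient.
    """
    out = list(subblock)
    for j, p in zip((4, 3, 2, 1), DIAGONAL):
        if (d >> (j - 1)) & 1:
            out[p] = 0
        elif out[p] == 0:
            out[p] = 1
    return out


def embed(macroblocks, roi, rs, segments):
    """Steps 6.7 and 6.8 for macroblocks given as {"U": [4 subblocks], "V": [4 subblocks]}.

    roi:      ROI map index (0 or 1) of each macroblock, from Steps 6.1-6.6
    rs:       sorted random positions from Step 4
    segments: the 4-bit segments of A as integers 0..15, eight per selected macroblock
    """
    if len(segments) != 8 * len(rs):
        raise ValueError("need exactly eight 4-bit segments per selected macroblock")
    pending = list(rs)
    used = 0
    c_v = 0                                        # random-selection counter C_v
    stego = []
    for mb, flag in zip(macroblocks, roi):
        if flag == 1:
            c_v += 1                               # Step 6.7
            if pending and c_v == pending[0]:      # Step 6.8: chosen macroblock
                pending.pop(0)
                subblocks = mb["U"] + mb["V"]      # SB_1..SB_4 in U, SB_5..SB_8 in V
                new = [apply_pattern(sb, d)
                       for sb, d in zip(subblocks, segments[used:used + 8])]
                used += 8
                mb = {"U": new[:4], "V": new[4:]}
        stego.append(mb)
    if pending:
        raise ValueError("not enough data-embeddable macroblocks")
    return stego


if __name__ == "__main__":
    zero = [0] * 16
    mbs = [{"U": [zero] * 4, "V": [zero] * 4} for _ in range(3)]
    out = embed(mbs, roi=[1, 0, 1], rs=[2], segments=[0, 3, 10, 15, 1, 2, 4, 8])
    print(out[0] is mbs[0])  # True: the first embeddable macroblock was not selected
```

Unselected macroblocks pass through unchanged, and C_v advances only on macroblocks whose ROI map index is 1, which is what lets the sorted positions in RS be matched in a single pass.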

The random sequence RS generated in Step 4 above represents the positions where the secret message is embedded, and the value N used in that step is the number of macroblocks needed to embed the entire secret message. Using the N elements of RS in Step 6.8 means that the message data are embedded at random positions. In this way, we reduce the chance for an attacker to obtain the secret message. Also, because the VP8 encoder always encodes the macroblocks of a frame in raster-scan order, we sort the N elements in ascending order.
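Steps 2 to 5 of Algorithm 3.2 can be sketched with Python's `random` module standing in for the generator f. Several choices here are assumptions rather than details given in the text: the randomization of B is an XOR with a key-seeded bit sequence Q, the same key K seeds both Q and RS, and B' is zero-padded to a multiple of 32 bits so that every selected macroblock receives eight full segments. Python's `random` is not cryptographically secure, so it only illustrates the data flow; the function names are hypothetical.

```python
import math
import random


def to_bits(message):
    """Step 2: the secret message (bytes) as a binary string B, most significant bit first."""
    return [(byte >> (7 - k)) & 1 for byte in message for k in range(8)]


def randomize(bits, key):
    """Step 2: B' = B XOR Q, where Q is a bit sequence from a generator seeded with K.

    XOR is an assumed choice; applying the function twice with the same key restores B.
    """
    rng = random.Random(key)
    return [b ^ rng.getrandbits(1) for b in bits]


def macroblocks_needed(n_bits):
    """Step 3, Eq. (3.5): each selected macroblock carries 8 x 4 = 32 bits."""
    return math.ceil(n_bits / 32)


def select_positions(key, n, c):
    """Step 4: N distinct random integers from 1..C, sorted in ascending order."""
    if n > c:
        raise ValueError("the video cannot hold the whole message")
    rng = random.Random(key)
    return sorted(rng.sample(range(1, c + 1), n))


def to_segments(bits):
    """Step 5: split B' into 4-bit segments, zero-padded to a multiple of 32 bits."""
    padded = bits + [0] * (-len(bits) % 32)
    return [padded[k:k + 4] for k in range(0, len(padded), 4)]


if __name__ == "__main__":
    key, capacity_c = 2024, 10
    b_prime = randomize(to_bits(b"Hi"), key)
    n = macroblocks_needed(len(b_prime))
    print(n, select_positions(key, n, capacity_c), len(to_segments(b_prime)))
```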

In Step 5, the proposed hiding method is based on the use of the 16 pre-defined

From the document: A Study of Information Hiding Using WebM Videos and Its Applications (pp. 34-0)