National Chiao Tung University

Institute of Computer Science and Engineering

Master's Thesis

A Study on Data Hiding in H.264/AVC Videos and Its Applications

Student: Guan-Lin Huang

Advisor: Prof. Wen-Hsiang Tsai

June 2008 (Republic of China year 97)

A Study on Data Hiding in H.264/AVC Videos and Its Applications

Student: Guan-Lin Huang

Advisor: Wen-Hsiang Tsai

A Thesis Submitted to the Institute of Computer Science and Engineering, College of Computer Science, National Chiao Tung University, in Partial Fulfillment of the Requirements for the Degree of Master in Computer Science

June 2008

Hsinchu, Taiwan, Republic of China

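A Study on Data Hiding in H.264/AVC Videos and Its Applications

Student: Guan-Lin Huang

Advisor: Dr. Wen-Hsiang Tsai

Institute of Computer Science and Engineering, National Chiao Tung University

ABSTRACT (English translation of the Chinese abstract)

With the advance of information technology, more and more Internet video applications have been developed, and H.264/AVC is currently the most widely used video format. This thesis studies the use of data hiding and digital watermarking techniques on H.264/AVC video files for covert communication, copyright protection, and video sharing. For covert communication, we propose a large-volume method and an optimal method, both of which exploit the properties of H.264/AVC video files to hide data. The optimal method achieves the best tradeoff among the amount of hidden data, the visual impact, and the size of the compressed file. For copyright protection of H.264/AVC video files downloaded by clients, we propose a method that uses an active visible watermarking technique and restricts playback of the video to specified computers. For the application of secret sharing, we use logical operations to share secret video data, hide the resulting share data back into videos, and finally distribute the share videos to the participants for safekeeping. After the videos held by all participants are collected, the original secret video can be recovered. Finally, experimental results demonstrate the feasibility of the proposed methods.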


A Study on Data Hiding in H.264/AVC Videos

and Its Applications

Student: Guan-Lin Huang

Advisor: Wen-Hsiang Tsai

Institute of Computer Science and Engineering

National Chiao Tung University

ABSTRACT

With the advance of information technologies, more and more digital video applications on the Internet have been proposed. H.264/AVC videos are used in a wide variety of applications. In this study, several methods for data hiding applications, namely, covert communication, copyright protection, and video sharing, are proposed using H.264/AVC videos as cover media. For covert communication, we propose a large-volume method and an optimal method for hiding secret data in H.264/AVC videos, based on the use of prediction modes and tree structured motion compensation. The optimal method achieves a tradeoff among hiding capacity, imperceptibility, and low bit rate. For copyright protection, in order to protect the ownership of downloaded videos at the client site, a method using a removable visible watermarking technique with a scheme for display control on specified computers is proposed. For the application of secret sharing, we share the data of the prediction modes of a secret video based on exclusive-OR operations and hide the resulting share data into the prediction modes of cover videos. The resulting share videos are then distributed to participants to keep. By collecting the share videos owned by all participants, the secret video can be recovered. Good experimental results show the feasibility of the proposed methods.


ACKNOWLEDGEMENTS

I am in hearty appreciation of the continuous guidance, discussions, support, and encouragement received from my advisor, Dr. Wen-Hsiang Tsai, not only in the development of this thesis, but also in every aspect of my personal growth.

Thanks are due to Miss Shung-Yung Tsai, Mr. Jiun-Tsung Wang, and Hsing-Chia Chen for their valuable discussions, suggestions, and encouragement. Appreciation is also given to the colleagues of the Computer Vision Laboratory in the Institute of Computer Science and Engineering at National Chiao Tung University for their suggestions and help during my thesis study.

Finally, I also extend my profound thanks to my family for their lasting love, care, and encouragement. I dedicate this dissertation to my beloved parents and friend.


LIST OF FIGURES

Figure 2.1 Relation between the Baseline, Main and Extended profiles.

Figure 2.2 Hierarchical structure of the H.264/AVC video.

Figure 2.3 Flow diagram of H.264/AVC encoding process.

Figure 2.4 Flow diagram of H.264/AVC decoding process.

Figure 3.1 Samples a to p of a luma 4×4 prediction block are calculated based on the sample values of A to M in neighboring prediction blocks.

Figure 3.2 Prediction modes for luma 4×4 prediction.

Figure 3.3 Macroblock partitions.

Figure 3.4 Sub-macroblock partitions.

Figure 3.5 Illustration of the proposed hiding method.

Figure 3.6 The quantized coefficient in the high-frequency.

Figure 3.7 Flowchart of the optimal data hiding process for I macroblocks.

Figure 3.8 Flowchart of the hiding process for P macroblocks.

Figure 3.9 Illustration of the proposed extraction method.

Figure 3.10 Flowchart of the extraction process for I macroblocks of optimal method.

Figure 3.11 Flowchart of the extraction process for P macroblocks.

Figure 3.12 The secret data file.

Figure 3.13 The 1st to 4th frames (IIII) of original video (left) and stego-video (right).

Figure 3.14 The extracted data file.

Figure 3.15 The 4th to 7th frames (PPIP) of original video (left) and stego-video (right).

Figure 3.16 The extracted data file.

Figure 4.1 Illustration of the proposed idea.

Figure 4.2 The checking process of video display control on specified computers.

Figure 4.3 Illustration of drift problem.

Figure 4.4 Types of macroblock to slice group maps (type 6 is “Explicit” which is user-defined).

Figure 4.5 Flowchart of the embedding process for I and P frames.

Figure 4.6 Flowchart of the recovery process for I and P frames.

Figure 4.7 A watermark binary image with size 16×16.

Figure 4.8 Six frames of the original video. (a) The first frame (I frame). (b) The second frame (P frame). (c) The third frame (P frame). (d) The 4th frame (P frame). (e) The 5th frame (P frame). (f) The 6th frame (P frame).

Figure 4.9 Six frames of the watermarked video. (a) The first frame (I frame). (b) The second frame (P frame). (c) The third frame (P frame). (d) The 4th frame (P frame). (e) The 5th frame (P frame). (f) The 6th frame (P frame).

Figure 4.10 Six frames of the recovered video. (a) The first frame (I frame). (b) The second frame (P frame). (c) The third frame (P frame). (d) The 4th frame (P frame). (e) The 5th frame (P frame). (f) The 6th frame (P frame).

Figure 5.1 An illustration of the proposed idea.

Figure 5.2 Illustration of the secret sharing method.

Figure 5.3 Illustration of the prediction error problem.

Figure 5.4 Flowchart of the process of creating steganographic effects.

Figure 5.5 Flowchart of the recovering process.

Figure 5.6 The four frames of the original video (left) and randomized video (right).

Figure 5.7 The four frames of the first cover video (left) and the share video (right).

Figure 5.8 The four frames of the second cover video (left) and the share video (right).

Figure 5.9 The four frames of the recovered secret video.


LIST OF TABLES

Table 2.1 Relationships between slice types and macroblock types.

Table 3.1 Relations between hidden data and partition sizes.

Table 3.2 Configuration parameters.

Table 3.4 NBH, PSNRI and BRI values for several video sequences (γi, γp = 250).


Chapter 1 Introduction

1.1 Motivation

With the advance of the Internet and multimedia technologies, more and more people transmit videos through the Internet. H.264/AVC is the latest video compression standard, and contains a lot of new features that allow it to compress videos more effectively than older standards. H.264/AVC also provides more flexibility for applications on a wide variety of networks, and so it is suitable for use as a kind of carrier for information hiding investigated in this study.

Data hiding techniques can be used to hide secret data in a video, resulting in a so-called stego-video. In this way, stego-videos instead of secret data may be transmitted through the network. Because the secret data hidden in a video are invisible, users other than the owner usually do not know of the existence of the hidden information and so will not try to get it. It is desired in this study to develop a data hiding method via H.264/AVC videos for covert communication.

Copyright protection of videos becomes more and more important nowadays because videos communicated on the Internet might be copied or misused. For video copyright protection, one approach is to utilize digital visible watermarking techniques. The main advantage of using this approach to certify the copyright of a video is that the watermark conveys a straightforward claim of the ownership of the video. However, the video content might be partially occluded by an embedded visible watermark. In order to solve this problem, it is desired to develop a removable visible watermarking technique in this study, by which videos can be protected by the use of visible watermarks. When a video is displayed on a specified computer, it is also hoped that the watermark can be removed before the video is watched.

Secret sharing is a technique for transforming secret data into multiple shares, with each share kept by a participant. Each share may be created to have a meaningless content. By collecting a sufficient number of shares, we can recover the secret data. Because a meaningless share might easily arouse suspicion, it is a good idea to hide each share in another meaningful media file. This effect can be accomplished by applying the technique of steganography to the meaningless shares.

Secret sharing techniques have been applied to various kinds of files, such as image and video. Though studies on the secret image sharing technique are getting intensive now, there are very few studies on secret video sharing yet. So, it is desired in this study to develop a secret video sharing method which creates steganographic effects.

1.2 General Review of Related Works

Although all the above-mentioned goals of this study are related to the technique of hiding information within videos, different methods should be adopted for different applications. A detailed review of video data hiding, visible watermarking, and secret sharing techniques which have been developed in recent years will be introduced in Chapter 2. In addition, because the proposed data hiding and watermarking techniques are applied to H.264/AVC videos, we will also make a review of the H.264/AVC standard in Chapter 2.


1.3.1 Terminologies

The definitions of some related terminologies used in this study are described as follows.

1. Secret: a secret is a piece of information that is important and should be preserved properly and not revealed to unauthorized people.

2. Stego-video: a stego-video is a video in which some digital information is embedded.

3. Watermarked video: a watermarked video is a video in which a visible watermark has been embedded.

4. Recovered video: a recovered video is a video obtained by removing the embedded visible watermark from a watermarked video.

5. Cover video: a cover video is an input video used for secret sharing and steganography.

6. Share video: a share video is one of the results of applying secret sharing to a secret video.

1.3.2 Brief Descriptions of Proposed Methods

A. A Data Hiding Method for Covert Communication

A method using a data hiding technique for covert communication via H.264/AVC videos is proposed in this study. An H.264/AVC video consists of a series of I and P image frames. We propose hiding techniques suited to the macroblocks in I and P frames by utilizing their respective properties. We hide data into the intra-prediction mode of the I macroblock, and into the tree structured motion compensation of the P macroblock. Briefly, we modify the prediction mode and the variable partition size of the tree structured motion compensation to hide data while maintaining the imperceptibility of the hidden data. In addition, we also use the Lagrange optimization technique to optimize the changes yielded by the data hiding process.

B. A Visible Watermarking Method for Copyright Protection

A method for copyright protection of videos using a removable visible watermarking technique and a scheme for video display control on specified computers is also proposed in this study. By performing the encoding process on a given video, the quantized transform coefficients of all the 16×16 luminance macroblocks of each frame of the video are obtained. We embed one pixel of a visible watermark into the direct current (DC) coefficient of the luminance (luma) macroblock.

On the other hand, a software player is designed to obtain certain hardware identification information (such as the number of the CPU and the volume serial number) from the user’s computer when a video is displayed. If the computer information is correct, the player will remove the visible watermark embedded in the watermarked video; otherwise, a visible watermark will appear promptly to state the copyright and prohibit the viewer from enjoying the video content.

C. A Video Sharing Method with Steganographic Effects

A method using the secret sharing and steganography techniques for sharing secret videos is proposed in this study. The secret sharing technique is based on the exclusive-OR operation, by which we encode the prediction mode of secret videos and the upper half of the prediction mode of a cover video into several pieces of share data. Then we hide the resulting share data into the lower half of the prediction mode of the cover video. The share videos are distributed to the participants for safekeeping. After collecting all the share videos from the participants, we can extract the hidden data and recover the secret video.

1.4 Contributions

The contributions made in this study are summarized in the following.

1. A data hiding method based on some properties of H.264/AVC is proposed for covert communication.

2. A removable visible watermarking method with a scheme for video display control on specified computers is proposed for protecting the copyright of an H.264/AVC video.

3. A method of video sharing with steganographic effects is proposed for protecting secret videos systematically and securely.

1.5 Thesis Organization

In the remainder of this thesis, a review of related works about techniques of video data hiding, visible watermarking, and information sharing, as well as the H.264/AVC standard, is given in Chapter 2. In Chapter 3, the proposed method for video data hiding for covert communication is described. In Chapter 4, the proposed removable visible watermarking method is described. In Chapter 5, the proposed method for video sharing is described. Finally, conclusions and some suggestions for future research are made in Chapter 6.


Chapter 2 Review of Related Works and H.264/AVC Standard

2.1 Review of Techniques for Video Data Hiding

Techniques of video data hiding are developed for hiding secret data into a video. In this way, secret data can be transmitted covertly. Many approaches to hiding data into a video have been proposed [1-3]. Yang and Bourbakis [1] proposed a scheme for embedding data in the DCT coefficients by means of vector quantization. Hu et al. [2] proposed a method for hiding data in H.264/AVC videos based on intra-prediction modes. The basic idea is to modify 4×4 intra-prediction modes based on the mapping between 4×4 intra-modes and hidden bits. Their method uses only intra-coded macroblocks to hide data. Kapotas et al. [3] proposed a method for embedding data into encoded video sequences, in which the partition size is modulated to hide the secret data. This method can only be used for embedding information in inter-coded macroblocks.


2.2 Review of Techniques for Visible Watermarking in Videos

Visible watermarking is a technique for copyright protection [4]. The owner of a video can embed a visible watermark representing copyright information into the video, and this embedded watermark can be removed when proving his ownership. Bhattacharya et al. [5] surveyed different video watermarking techniques and compared them with reference to H.264/AVC. Mohanty et al. [6] proposed a DCT-domain visible watermarking technique for images. In their method, embedding visible watermarks in the DCT coefficients is based on a mathematical model developed by exploiting the texture sensitivity of the human visual system (HVS). Chien and Tsai [7] proposed an active watermarking method for MPEG-4 videos with a scheme for limiting the number of times a video can be displayed. The basic idea is to use an active agent to check the available play count. If the play count of the video is not zero, the active agent will remove the visible watermark embedded in the video; otherwise, the visible watermark will appear promptly to state the copyright.

2.3 Review of Techniques for Secret Sharing

Secret sharing is a technique for transforming secret data into multiple shares, with each share kept by a participant. When a pre-defined group of shares is collected, the secret data can be recovered. Shamir [8] proposed the concept of secret sharing in his (k, n)-threshold method, in which n indicates the number of participants and k is a threshold specifying the minimum number of shares in the pre-defined group. Lin and Tsai [9] proposed an efficient (n, n)-threshold secret sharing method using exclusive-OR operations. This method simply applies the exclusive-OR operation to a secret image and n-1 other images to generate the nth image. The n-1 images and the nth image are taken as shares and are distributed to n participants separately. By applying exclusive-OR operations to the n images held by the n participants, the secret image can be recovered quickly. Zou and Sun [10] proposed an approach which combines secret sharing and information hiding for covert communication.
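The (n, n)-threshold idea based on exclusive-OR operations can be illustrated with a minimal sketch. It is not the implementation of [9]: it works on byte strings rather than images, and it assumes that the first n-1 shares are random byte strings. The function names `make_shares` and `recover_secret` are hypothetical.

```python
import secrets
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise exclusive-OR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_shares(secret: bytes, n: int) -> list:
    """Create n shares; all n are needed to recover the secret."""
    if n < 2:
        raise ValueError("at least two shares are required")
    # Assumption: the first n-1 shares are random data of the same length.
    shares = [secrets.token_bytes(len(secret)) for _ in range(n - 1)]
    # The nth share is the XOR of the secret with the other n-1 shares.
    last = reduce(xor_bytes, shares, secret)
    return shares + [last]


def recover_secret(shares: list) -> bytes:
    """XOR all n shares together to obtain the secret again."""
    return reduce(xor_bytes, shares)
```

Because every random share appears exactly twice in the combined exclusive-OR, it cancels out, and only the secret remains. Any subset of fewer than n shares lacks at least one term and therefore reveals nothing about the secret in this sketch.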

2.4 Review of H.264/AVC Standard

In this study, all the proposed information hiding, watermarking and video sharing techniques employ H.264/AVC videos as carrier media for hiding information. Richardson described in detail the H.264/AVC standard in his book [11]. We will give a brief review of the H.264/AVC standard in this section. In Section 2.4.1, the structure of the H.264/AVC standard will be described. In Section 2.4.2 and Section 2.4.3, the encoding and decoding processes in the H.264/AVC standard will be described.

2.4.1 Structure of H.264/AVC Standard

The H.264/AVC standard defines a set of three Profiles: Baseline, Main and Extended, which support different functions and suit different environments. Figure 2.1 shows the relationship between the three profiles and the coding tools supported by the standard. The H.264/AVC video has a hierarchical structure as illustrated in Figure 2.2. A video sequence is composed of a series of pictures (frames). Each picture is coded as one or more slices. In general, there are three main slice types for use in the H.264/AVC standard, including intra-slice (I), predictive slice (P), and bi-predictive slice (B). Each slice consists of a number of macroblocks. There are four different types of macroblocks, including I macroblock, P macroblock, B macroblock, and skipped macroblock. I macroblocks are predicted from previously coded data within the same slice. P macroblocks are predicted from one reference picture. B macroblocks are predicted from two reference pictures. Skipped macroblocks of the P slice are encoded with a motion vector and no transform coefficients, and skipped macroblocks of the B slice are encoded without motion vectors or transform coefficients. Each slice type has its own macroblock types. The relationships between the slice types and the macroblock types are listed in Table 2.1.


Figure 2.2 Hierarchical structure of the H.264/AVC video.

Table 2.1 Relationships between slice types and macroblock types.

           I macroblock   P macroblock   B macroblock   Skipped macroblock
I slice         ●
P slice         ●              ●                                ●


2.4.2 Process of Encoding

A flow diagram of the encoding process is shown in Figure 2.3. In the encoding process, there are two data flow paths, forward (left to right) and reconstruction (right to left). In the forward path, each 16×16 macroblock is encoded in intra-mode or inter-mode, and a prediction (marked as P in Figure 2.3) is calculated from reconstructed data. In the intra-mode, the encoder determines the best intra-prediction mode from reconstructed data in the current slice, and then computes the intra-prediction. In the inter-mode, the encoder determines the best motion vector based on the reconstructed data in one or two reference picture(s), and then computes the motion-compensated prediction. The prediction is subtracted from the current block to produce a residual block (marked as Dn in Figure 2.3). A DCT-based transform is performed on each residual block. After that, each 4×4 block of the transform coefficients is quantized. Each resulting block (marked as X in Figure 2.3) is scanned in a zig-zag order and entropy encoded. An entropy coding technique is used to compress the quantized coefficient data and the other information required to decode each block within the macroblock, forming the compressed bitstream. Finally, the compressed bitstream is passed to the network abstraction layer (NAL) for transmission or storage. In the reconstruction path, the encoder decodes (reconstructs) each block in a macroblock so that it can serve as a reference for further prediction. The quantized coefficients are scaled and inverse-transformed to produce a difference block (marked as D'n in Figure 2.3), and then the prediction is added to the difference block to produce a reconstructed block (marked as uF'n in Figure 2.3). Finally, a filter is used to reduce the effects of blocking distortion, and the reconstructed reference picture is created from a series of blocks.
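As an illustration of the zig-zag scan step mentioned above, the following sketch reorders a 4×4 block of quantized coefficients into a one-dimensional list. The scan table is the commonly used 4×4 zig-zag order for frame coding; the names `ZIGZAG_4X4` and `zigzag_scan` are illustrative and are not taken from the thesis or from any codec library.

```python
# (row, column) positions of a 4x4 block, listed in zig-zag scan order.
ZIGZAG_4X4 = [
    (0, 0), (0, 1), (1, 0), (2, 0),
    (1, 1), (0, 2), (0, 3), (1, 2),
    (2, 1), (3, 0), (3, 1), (2, 2),
    (1, 3), (2, 3), (3, 2), (3, 3),
]


def zigzag_scan(block):
    """Return the 16 coefficients of a 4x4 block in zig-zag order.

    The first element is the DC (lowest-frequency) coefficient and the
    last element is the highest-frequency coefficient.
    """
    return [block[row][col] for row, col in ZIGZAG_4X4]
```

The scan places low-frequency coefficients first, so the trailing entries of the list are the high-frequency coefficients, which are frequently zero after quantization.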


2.4.3 Process of Decoding

A flow diagram of the decoding process is shown in Figure 2.4. The decoder receives a compressed bitstream from the NAL and entropy decodes the data to get the quantized coefficients. Through scaling and inverse transformation, the decoder obtains a difference block. Using the header information from the bitstream, the decoder creates a prediction identical to the original prediction formed in the encoder. The prediction is added to the difference block to produce the reconstructed block, which is then filtered to create a decoded block.

Figure 2.3 Flow diagram of H.264/AVC encoding process.


Chapter 3 Data Hiding in H.264/AVC Videos for Covert Communication

3.1 Introduction

Due to the growth of computer network and audio/video compression technologies, many applications of digital media have emerged on the network, but many new problems have also arisen. The preservation and transmission of secret information has recently become an active topic. Using data hiding techniques for covert communication is a good solution: we can hide secret data in other cover data so that the hidden information is imperceptible. Videos are suitable for use as cover media because they are widely used and offer a large hiding capacity. So we propose a data hiding method via H.264/AVC videos for covert communication in this study.

In Section 3.1.1, some relevant definitions are given, and in Section 3.1.2 the basic ideas of the proposed method are presented. In Section 3.2, the proposed data hiding method is described, and the corresponding data extraction method is stated in Section 3.3. In Section 3.4, several experimental results are shown to prove the feasibility of the proposed method. Finally, some discussions and a summary of the proposed method are made in the last section of this chapter.


Traditionally, when applying video data hiding techniques for covert communication, the data hiding capacity and the imperceptibility of the hidden data are two of the major concerns. Therefore, the problem is how to hide data with large-volume capacity and imperceptibility.

In addition, with the popularity of web applications, people pay more and more attention to low bit rate videos. Therefore, an additional problem is how to hide data into videos and get optimal results which take data hiding capacity, imperceptibility, and low bit rate into consideration.

3.1.2 Proposed Ideas

There are two macroblock types for use in the baseline profile of the H.264/AVC standard, which are I macroblock and P macroblock. We propose data hiding techniques for the two macroblock types, respectively, in this study.

Two methods are proposed for hiding data into I macroblocks based on the intra-prediction mode, which is a new coding method proposed in the H.264/AVC standard. In the first method, we transform the data to be hidden into novenary data and encode them by the use of the prediction modes.

In the second method, an encoder selects the best prediction mode for each block by a Lagrangian cost function [12] to minimize simultaneously the rate and distortion in the H.264/AVC standard, which is formulated as follows:

\[ J = \arg\min_{M_k \in \tau} \bigl( D(S_k, M_k) + \lambda R(S_k, M_k) \bigr) \tag{3.1} \]

where

(1) τ denotes the set of all the nine prediction modes, i.e., τ = {M1, M2, ..., M9};

(2) λ represents the Lagrange multiplier;


(4) D is a distortion function whose value is computed as the sum of the squared differences (SSD) between the reconstructed block Sk' and the original one Sk;

(5) R denotes the used bits for encoding the block Sk using the prediction mode Mk.

In our approach, the block Sk is fixed to be 4×4, which yields higher data embedding rates. Furthermore, we add the hiding capacity as a new parameter to the Lagrangian cost function described by (3.1), resulting in:

\[ J = \arg\min_{M_i \in \tau} \bigl( D(S_i, M_i) + \lambda R(S_i, M_i) - \gamma_1 \cdot N_i \bigr) \tag{3.2} \]

where the new parameter γ1 is a multiplier for the hiding capacity Ni (in bits) in the 4×4 block. By this function, we can get the best result as a tradeoff among the data hiding capacity, the bit rate, and the resulting distortion.
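The effect of the added capacity term can be seen in a small sketch of the mode decision. The distortion, rate, and capacity values are assumed to be already measured for every candidate mode; the `Candidate` record and the function names are hypothetical and only illustrate the cost D + λR − γ1·N.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """Measurements for one candidate prediction mode (illustrative)."""
    mode: int          # index of the prediction mode
    distortion: float  # D: SSD between reconstructed and original block
    rate_bits: float   # R: bits needed to encode the block with this mode
    hidden_bits: int   # N: secret bits that this mode would carry


def capacity_aware_cost(c: Candidate, lam: float, gamma: float) -> float:
    """Cost D + lambda * R - gamma * N; capacity lowers the cost."""
    return c.distortion + lam * c.rate_bits - gamma * c.hidden_bits


def select_mode(candidates, lam: float, gamma: float) -> int:
    """Return the mode index with the smallest capacity-aware cost."""
    best = min(candidates, key=lambda c: capacity_aware_cost(c, lam, gamma))
    return best.mode
```

With gamma set to zero the selection reduces to the ordinary rate-distortion decision; a larger gamma makes the encoder accept more distortion or more bits in exchange for modes that carry more hidden data.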

The idea of hiding data in the P macroblocks proposed in this study is to modify the variable partition size of the tree structured motion compensation, a feature which distinguishes the H.264/AVC standard from earlier standards. Tree structured motion compensation is a method of partitioning macroblocks into motion compensated sub-blocks of varying sizes. The encoder selects the partition size for each macroblock by a Lagrangian cost function described as follows:

\[ J = \arg\min_{P_k \in \omega} \bigl( D(S_k, M_k) + \lambda R(S_k, M_k) \bigr) \tag{3.3} \]

where ω denotes the set of all alternative partition sizes, and Pk denotes the current partition size. Similarly, we add the hiding capacity as a new parameter to the Lagrangian cost function, resulting in:

\[ J = \arg\min_{P_i \in \omega} \bigl( D(S_i, M_i) + \lambda R(S_i, M_i) - \gamma_2 \cdot N_i \bigr) \tag{3.4} \]

where the new parameter γ2 is the multiplier for the hiding capacity (in bits) in the P macroblock. By this function, we can get the best result as a tradeoff among the data hiding capacity, the bit rate, and the resulting distortion.

3.2 Review of Related Techniques

3.2.1 Intra-prediction

For each I macroblock of an H.264/AVC video, a 4×4 prediction block as shown in Figure 3.1 includes 16 samples a, b, ..., p whose values are computed from some samples of previously encoded and reconstructed blocks (A, B, C, D in the bottom row of the upper neighboring block; E, F, G, H from the upper right block; I, J, K, L in the rightmost column of the left neighboring block; and M from the upper left block, as shown in Figure 3.1). The resulting prediction block is subtracted from the current block prior to encoding. To compute the values of the prediction block samples, it is noted first that there are nine possible prediction modes for a luminance 4×4 block (abbreviated as a luma block in the sequel). The nine prediction modes are illustrated in Figure 3.2(a). Except for prediction mode 2, whose samples all have the same value computed as the mean of A through D and I through L, the values of the samples of the remaining eight modes are computed from the values of A through M according to eight directions as illustrated in Figure 3.2(b). An H.264/AVC encoder may select, among the nine modes, the best one with the lowest rate-distortion cost computed by the Lagrangian cost function described by Eq. (3.1).
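Three of the nine modes are simple enough to sketch directly. The following example, which assumes that all neighboring samples are available, computes the vertical (mode 0), horizontal (mode 1), and DC (mode 2) predictions from the samples A to D and I to L. The function name `predict_4x4` is illustrative, and the six remaining directional modes are deliberately left out.

```python
def predict_4x4(mode, top, left):
    """Return a 4x4 prediction block as a list of four rows.

    top  = [A, B, C, D], the samples above the current block
    left = [I, J, K, L], the samples to the left of the current block
    Only modes 0 to 2 are sketched; availability checks are omitted.
    """
    if mode == 0:
        # Vertical: each column repeats the sample above it.
        return [list(top) for _ in range(4)]
    if mode == 1:
        # Horizontal: each row repeats the sample to its left.
        return [[left[row]] * 4 for row in range(4)]
    if mode == 2:
        # DC: every sample is the rounded mean of A..D and I..L.
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise NotImplementedError("directional modes 3 to 8 are not sketched")
```

An encoder would form such a prediction for every candidate mode, subtract it from the current block, and compare the resulting costs.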

3.2.2 Tree Structured Motion Compensation

A P macroblock may be split and motion compensated in one of four ways: (1) one 16×16 macroblock partition; (2) two 16×8 partitions; (3) two 8×16 partitions; or (4) four 8×8 partitions, as shown in Figure 3.3. If the 8×8 partitions are selected, each of the four 8×8 sub-macroblocks may be split further in one of four ways: (1) one 8×8 sub-macroblock partition; (2) two 8×4 partitions; (3) two 4×8 partitions; or (4) four 4×4 partitions, as illustrated in Figure 3.4. An encoder selects the best partition size which has the lowest rate-distortion cost computed by the Lagrangian cost function (3.3).

Figure 3.1 Samples a to p of a luma 4×4 prediction block are calculated based on the sample values of A to M in neighboring prediction blocks.

(a) Nine prediction modes for 4×4 prediction blocks.

(b) Directions for computing samples of eight prediction modes.

Figure 3.2 Prediction modes for luma 4×4 prediction.

Figure 3.3 Macroblock partitions (16×16, 8×16, 16×8, 8×8).

Figure 3.4 Sub-macroblock partitions (8×8, 4×8, 8×4, 4×4).

3.3 Hiding Secret Data into H.264/AVC Videos

In this section, the proposed methods of hiding data into different types of macroblocks of H.264/AVC videos will be described. An illustration of the hiding method is shown in Figure 3.5. In Section 3.3.1, the proposed method for hiding large-volume data in I macroblocks based on the use of the nine intra-prediction modes is described. In Section 3.3.2, the proposed method for hiding data in I macroblocks based on optimal choice of an intra-prediction mode is described. Finally, the proposed method for hiding data in P macroblocks optimally based on tree structured motion compensation is described in Section 3.3.3.

Figure 3.5 Illustration of the proposed hiding method.

3.3.1 Process for Hiding Large-Volume Data into I Macroblocks Based on Intra-Prediction Mode

In this section, we describe the proposed method for hiding secret data based on the direct use of the nine prediction modes. To take full advantage of the nine prediction modes, we transform the binary data to be hidden into novenary ones, and then encode the result by the prediction modes. In addition, we also combine the user’s secret key and the secret data by exclusive-OR operations for the purpose of ensuring that the hidden data can be extracted only by a user who has the correct key.


A detailed algorithm of the process is described in the following.

Algorithm 3.1: large-volume data hiding process using I macroblocks.

Input: a user’s key R, a secret data file D, and the 4×4 luma prediction mode M.

Output: a stego-macroblock I'.

Steps:

1. For each character Di of the secret data D, perform the following steps.

1.1 Compute the remainder R' of dividing R by 256.

1.2 Transform each character Di of the secret data D in the following way to form encrypted data E:

\[ E_i = D_i \oplus R' \tag{3.5} \]

2. Transform E into a novenary number N by converting every nineteen bits of E into six novenary digits, which is possible because 2^19 = 524,288 does not exceed 9^6 = 531,441. So each 4×4 luma prediction mode in this method can be used to hide 19/6 bits of data.

3. Encode each digit Ni of N with magnitude i by the corresponding prediction mode Mi.
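The data preparation steps of Algorithm 3.1 can be sketched as follows. The sketch covers only the encryption and the conversion into novenary digits, not the video coding itself. Zero-padding of the last incomplete 19-bit group is an assumption made here, since the treatment of leftover bits is not specified above; the function names are hypothetical.

```python
def encrypt(data: bytes, key: int) -> bytes:
    """XOR every byte of the secret data with R' = key mod 256."""
    r_prime = key % 256
    return bytes(byte ^ r_prime for byte in data)


def to_novenary_digits(encrypted: bytes) -> list:
    """Convert each group of 19 bits into six base-9 digits.

    Each returned digit (0 to 8) would select one of the nine 4x4 luma
    prediction modes. Assumption: the final group is padded with zero
    bits when fewer than 19 bits remain.
    """
    bits = "".join(f"{byte:08b}" for byte in encrypted)
    bits += "0" * (-len(bits) % 19)
    digits = []
    for start in range(0, len(bits), 19):
        value = int(bits[start:start + 19], 2)
        group = []
        for _ in range(6):
            value, digit = divmod(value, 9)
            group.append(digit)
        # divmod yields the least significant digit first, so reverse.
        digits.extend(reversed(group))
    return digits
```

For instance, encrypting the characters 'a', 'b', 'c' with a key whose remainder modulo 256 is 141 yields the bytes 11101100, 11101111, 11101110, and the first 19 bits of this sequence convert to the six digits 8, 1, 8, 5, 6, 3.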

For example, if the user key R is such that R' = R mod 256 = 141₁₀ = 10001101₂, suppose that a secret message character D1 = ‘a’ is to be embedded, whose corresponding binary form is 01100001₂. Then, the encrypted form of D1 is E1 = 01100001 ⊕ 10001101 = 11101100₂. Similarly, if D2 = ‘b’ and D3 = ‘c’, with binary forms 01100010₂ and 01100011₂, respectively, then E2 = 01100010 ⊕ 10001101 = 11101111₂ and E3 = 01100011 ⊕ 10001101 = 11101110₂. Together, we get E = E1E2E3 = 111011001110111111101110₂, whose first 19 bits, 1110110011101111111₂, when converted into novenary, become the novenary number 818563₉, and so may be

(31)
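The steps of Algorithm 3.1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the function names are hypothetical, the key value 141 is chosen only because its remainder modulo 256 equals the $R' = 10001101_2$ used in the example, and zero-padding of the final incomplete 19-bit group is an assumption that the text does not specify.

```python
def encrypt(data: bytes, key: int) -> bytes:
    """XOR every byte with R' = key mod 256, as in Eq. (3.5)."""
    r_prime = key % 256
    return bytes(b ^ r_prime for b in data)


def to_bits(data: bytes) -> str:
    """Concatenate the 8-bit binary forms of all bytes."""
    return "".join(f"{b:08b}" for b in data)


def bits_to_modes(bits19: str) -> list:
    """Convert one 19-bit group into six novenary digits (mode indices 0 to 8)."""
    value = int(bits19, 2)
    digits = []
    for _ in range(6):
        value, digit = divmod(value, 9)
        digits.append(digit)
    return digits[::-1]  # most significant digit first


def hide_large_volume(data: bytes, key: int) -> list:
    """Return the sequence of prediction mode indices that encodes the data."""
    bits = to_bits(encrypt(data, key))
    bits += "0" * (-len(bits) % 19)  # assumption: zero-pad the last group
    modes = []
    for start in range(0, len(bits), 19):
        modes.extend(bits_to_modes(bits[start:start + 19]))
    return modes


print(hide_large_volume(b"abc", 141)[:6])  # [8, 1, 8, 5, 6, 3]
```

The first six mode indices reproduce the novenary digits of the example above. In an encoder, each index would then replace the 4×4 luma prediction mode chosen for the corresponding block.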

3.3.2 Process for Hiding Data Optimally into I Macroblocks Based on Intra-Prediction Mode

In this section, we describe how secret data are hidden optimally, in the sense mentioned previously, based on the use of the nine prediction modes. With this method, each 4×4 luma prediction mode in an I macroblock can be used to hide zero to four bits of data, and the method does not affect the degree of imperceptibility. In addition, we record the number of bits so hidden in the highest-frequency quantized coefficient of the 4×4 block, as shown in Figure 3.6. We also use the user’s secret key to encrypt the secret data to enhance security. A detailed algorithm of the process is described in the following.

Figure 3.6 The highest-frequency quantized coefficient of a 4×4 block (the figure shows the scan indices 0 to 15 of the sixteen coefficients on a grid with x, y = 0, 1, 2, 3).

Algorithm 3.2: optimal data hiding process for I macroblocks.

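Input: an I macroblock I in the spatial domain, a user’s key R, and a secret data file D.

Output: a stego-macroblock I'.

Steps:

1. For each character $D_i$ of the secret data D, perform the following steps.

1.1 Compute the remainder R' of dividing R by 256.

1.2 Transform each character $D_i$ of the secret data D according to Eq. (3.5) to form the encrypted data E.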

2. For each luma 4×4 block B of I, perform the following operations.

2.1 For each luma 4×4 prediction mode $M_i$, perform intra-prediction, DCT-based transform, and quantization as in the video coding process, and then match four bits $E_3E_2E_1E_0$ of E with the 4-bit binary value $I_0I_1I_2I_3$ of the index i of $M_i$ to obtain the number of bits $N_i$ which can be hidden in $M_i$ in the following way (when several conditions hold, the one matching the most bits applies):

if $E_3 = I_3$, then set $N_i = 1$;

if $E_3 = I_3$ and $E_2 = I_2$, then set $N_i = 2$;

if $E_3 = I_3$, $E_2 = I_2$, and $E_1 = I_1$, then set $N_i = 3$;

if all $E_j$ are equal to $I_j$, then set $N_i = 4$;

otherwise, set $N_i = 0$.

2.2 Replace the highest-frequency quantized coefficient C as shown in Figure 3.4 by a new value according to the following mapping rule:

$$C = \begin{cases} 0, & \text{if } N_i = 0; \\ 1, & \text{if } N_i = 1; \\ -1, & \text{if } N_i = 2; \\ 2, & \text{if } N_i = 3; \\ -2, & \text{if } N_i = 4. \end{cases} \tag{3.6}$$

3. Select the best prediction mode according to Eq. (3.2), and so decide the number of bits which can be hidden in this block. Take away from E these bits.

4. Repeat the above steps to encode more bits in the remaining portion of E until no more is left.

For example, with $R' = 10001101_2$ and a secret message character $D_1$ = ‘a’ whose binary form is $01100001_2$, as in the example of Section 3.3.1, the encrypted form of $D_1$ is $E_1 = 01100001_2 \oplus 10001101_2 = 11101100_2$. Similarly, if $D_2$ = ‘b’ with binary form $01100010_2$, then $E_2 = 01100010_2 \oplus 10001101_2 = 11101111_2$. Together, we get $E = E_1E_2 = 1110110011101111_2$, whose first 4 bits, 1110, are then matched to the binary equivalent of the index i of each prediction mode $M_i$. Suppose the best mode selected using the Lagrangian cost function is $M_3$, whose binary index is $3 = 0011_2$ (bits of the index from right to left correspond to bits of E from left to right). Then we get two matching bits which can be hidden in $M_3$, and so we set C = -1 and hide it in the highest-frequency quantized coefficient.
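The bit matching of Step 2.1 and the coefficient mapping of Eq. (3.6) can be sketched as below. This is an illustrative sketch only: the function and table names are hypothetical, the mode index is assumed to lie in the range 0 to 8, and the selection of the best mode by the Lagrangian cost of Eq. (3.2) is left out.

```python
# Mapping of Eq. (3.6): number of hidden bits N_i -> coefficient value C.
N_TO_C = {0: 0, 1: 1, 2: -1, 3: 2, 4: -2}


def hidden_bit_count(e_bits: str, mode_index: int) -> int:
    """Count how many leading bits of E match the mode index read right to left."""
    index_bits = f"{mode_index:04b}"[::-1]  # I3, I2, I1, I0
    count = 0
    for e, i in zip(e_bits[:4], index_bits):
        if e != i:
            break
        count += 1
    return count


def coefficient_for(count: int) -> int:
    """Value written into the highest-frequency quantized coefficient."""
    return N_TO_C[count]


n = hidden_bit_count("1110", 3)
print(n, coefficient_for(n))  # 2 -1
```

With the first four bits 1110 of E and mode $M_3$, the sketch yields two hidden bits and C = -1, in agreement with the example above.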

A flowchart of the optimal data hiding process for I macroblocks is shown in Figure 3.7.

3.3.3 Process for Hiding Data Optimally into P Macroblocks Based on Tree Structured Motion Compensation

In this section, we hide secret data based on the variable partition sizes of 16×16 macroblocks. Each 16×16 P macroblock can be used to hide one or four bit(s) of data by modifying the partition size. To allow better choices of sizes that reduce the rate-distortion cost, we encode hidden data by the partition size with multiple choices for 0 or 1 according to Table 3.1, in which two groups of sizes are used to encode 0 and 1, respectively. In addition, we use the user’s secret key to encrypt the secret data. A detailed algorithm of the process is described in the following.

Table 3.1 Relations between hidden data and partition sizes.

Partition size    Hidden data
16×16             1
8×16              0
16×8              0
8×8               1
4×8               0
8×4               0
4×4               1

Algorithm 3.3: optimal data hiding process for P macroblocks.

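Input: a P macroblock P in the spatial domain, a user’s key R, a secret data file D, and the macroblock partition size K.

Output: a stego-macroblock P'.

Steps:

1. For each character $D_i$ of the secret data D, perform the following steps.

1.1 Compute the remainder R' of dividing R by 256.

1.2 Transform each character $D_i$ of the secret data D according to Eq. (3.5) to form the encrypted data E.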

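2. According to Table 3.1, hide one bit $e_1$ or four bits $e_j$ of E into the macroblock partition according to the following rules for the macroblock partition size K.

2.1 When the partition size K is 16×16:

$$\begin{cases} \text{if } e_1 = 0, \text{ then proceed to the next partition size;} \\ \text{if } e_1 = 1, \text{ then perform Step 3.} \end{cases} \tag{3.7}$$

2.2 When the partition size K is 8×16 or 16×8:

$$\begin{cases} \text{if } e_1 = 0, \text{ then perform Step 3;} \\ \text{if } e_1 = 1, \text{ then perform the next step.} \end{cases} \tag{3.8}$$

2.3 When the partition size K is 8×8, split P into the four 8×8 sub-macroblocks and, for j = 1 to 4, apply the following rules to the partition size of the j-th sub-macroblock:

i. when the partition size is 4×8 or 8×4:

$$\begin{cases} \text{if } e_j = 0, \text{ then perform Step 3;} \\ \text{if } e_j = 1, \text{ then perform the next step;} \end{cases} \tag{3.9}$$

ii. when the partition size is 8×8 or 4×4:

$$\begin{cases} \text{if } e_j = 0, \text{ then perform the next step;} \\ \text{if } e_j = 1, \text{ then perform Step 3.} \end{cases} \tag{3.10}$$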

3. According to Eq. (3.4), compute the best partition size, and decide the number of bits which can be hidden in this block. Take away from E these bits.

4. Repeat the above steps to encode more bits in the remaining portion of E until no more is left.

For example, with $R' = 10001101_2$ and a secret message character $D_1$ = ‘a’ whose binary form is $01100001_2$, as in the example of Section 3.3.1, the encrypted form of $D_1$ is $E_1 = 01100001_2 \oplus 10001101_2 = 11101100_2$. Similarly, if $D_2$ = ‘b,’ with binary form $01100010_2$, then $E_2 = 01100010_2 \oplus 10001101_2 = 11101111_2$. Together, we get $E = E_1E_2 = 1110110011101111_2$. We may choose the macroblock partition size of 16×16 to hide the first bit, or further choose the four sub-macroblock partition sizes 8×8, 4×4, 8×8, and 4×8 to hide the first four bits of E. We use the Lagrangian cost function to decide the best partition size. If the best partition size is 16×16, we hide one bit in this block.
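The choice among admissible partition sizes can be sketched as follows. The sketch enumerates every partition that encodes the next bit(s) of E according to Table 3.1 and picks the cheapest one. The names `candidate_partitions` and `choose_partition` are hypothetical, and the `cost` argument is a stand-in supplied by the caller for the Lagrangian cost of Eq. (3.4), which is not reproduced here.

```python
from itertools import product

# Table 3.1, split into macroblock-level and sub-macroblock-level sizes.
MB_SIZES = {"1": ["16x16"], "0": ["8x16", "16x8"]}
SUB_SIZES = {"1": ["8x8", "4x4"], "0": ["4x8", "8x4"]}


def candidate_partitions(bits: str) -> list:
    """List (partition, hidden_bit_count) pairs that encode the leading bits."""
    candidates = []
    if len(bits) >= 1:
        # One bit hidden by the macroblock partition size itself.
        for size in MB_SIZES[bits[0]]:
            candidates.append(((size,), 1))
    if len(bits) >= 4:
        # Four bits hidden by the sizes of the four 8x8 sub-macroblocks.
        for subs in product(*(SUB_SIZES[b] for b in bits[:4])):
            candidates.append((("8x8",) + subs, 4))
    return candidates


def choose_partition(bits: str, cost) -> tuple:
    """Pick the admissible partition with the smallest cost."""
    return min(candidate_partitions(bits), key=lambda c: cost(c[0]))


# Toy cost for illustration only: prefer fewer partitions.
print(choose_partition("1110", cost=len))  # (('16x16',), 1)
```

For the leading bits 1110, the candidates include both the 16×16 partition (one bit) and the sub-macroblock combination 8×8, 4×4, 8×8, 4×8 (four bits) mentioned in the example. After the choice, the number of bits actually hidden is removed from E, as in Step 3.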

A flowchart of the optimal data hiding process for P macroblocks is shown in Figure 3.8.

Figure 3.8 Flowchart of the optimal data hiding process for P macroblocks.

3.4 Extraction of Secret Data from H.264/AVC Videos

In this section, the proposed methods for extracting the hidden data from an input H.264/AVC stego-video are described. The proposed data extraction method is illustrated in Figure 3.9. Sections 3.4.1 and 3.4.2 describe the two processes for extracting data from I macroblocks, and Section 3.4.3 describes the process for extracting data from P macroblocks.

Figure 3.9 Illustration of the proposed extraction method.

3.4.1 Process for Extraction of Data from I Macroblocks by Proposed Large-Volume Method

The proposed data extraction process retrieves the adopted prediction modes for the I macroblocks from the bitstream of the stego-video. The prediction mode and the user’s key then are taken as input to the extraction process for I macroblocks of the proposed large-volume data hiding method. The detailed algorithm is described in the following.

Algorithm 3.4: extraction process for I macroblocks of the large-volume method.

Input: the prediction mode M and a user’s key R.

Output: an extracted data file D.

Steps:

1. Extract the prediction modes M from the blocks.

2. Transform every six prediction modes (six novenary digits) into nineteen bits of the encrypted data E.

3. For every eight bits $E_i$ of E, perform the following steps.

3.1 Compute the remainder R' of dividing R by 256.

3.2 Compute the 8-bit code of each character $D_i$ of the secret data D as follows, and transform the obtained codes into the original characters to get the embedded message:

$$D_i = E_i \oplus R'. \tag{3.11}$$

For example, suppose that the user key gives $R' = 10001101_2$, as in the example of Section 3.3.1. Suppose the extracted six prediction modes are $M_1 = 8$, $M_2 = 1$, $M_3 = 8$, $M_4 = 5$, $M_5 = 6$, and $M_6 = 3$, which form the novenary number $818563_9$. After being converted into binary, it becomes the binary data $1110110011101111111_2$, whose first eight bits $E_1$ yield the secret message character ‘a’ through the exclusive-OR operation $E_1 \oplus R' = 11101100_2 \oplus 10001101_2 = 01100001_2$, which is the binary code of ‘a.’
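The extraction steps of Algorithm 3.4 can be sketched as the inverse of the embedding computation. This is a hedged sketch with hypothetical function names. It assumes that each group consists of exactly six valid mode indices produced by the hiding process, so that the group value fits in 19 bits, and it simply drops trailing bits that do not fill a whole byte.

```python
def modes_to_bits(modes: list) -> str:
    """Convert six novenary digits (mode indices) back into a 19-bit group."""
    value = 0
    for digit in modes:
        value = value * 9 + digit
    return f"{value:019b}"


def decrypt(bits: str, key: int) -> bytes:
    """Apply Eq. (3.11) to every complete 8-bit group of the bit string."""
    r_prime = key % 256
    usable = len(bits) - len(bits) % 8  # ignore an incomplete final byte
    return bytes(int(bits[i:i + 8], 2) ^ r_prime for i in range(0, usable, 8))


bits = modes_to_bits([8, 1, 8, 5, 6, 3])
print(bits)                 # 1110110011101111111
print(decrypt(bits, 141))   # b'ab'
```

The key value 141 is again chosen only so that $R' = 10001101_2$. The first decoded byte is the character ‘a’ of the example; the second complete byte in the same 19-bit group decodes to ‘b.’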

3.4.2 Process for Extraction of Data from I Macroblocks by Proposed Optimal Method

The proposed data extraction process retrieves the adopted prediction modes of I macroblocks from the bitstream of the stego-video first. After entropy-decoding the stego-video, the highest-frequency quantized coefficient of each luma 4×4 block of I macroblocks is retrieved. Then we take these coefficients, the prediction modes and the user’s key as input to the data extraction process for I macroblocks. The detailed algorithm is described in the following.

Algorithm 3.5: extraction process for I macroblocks for the proposed optimal method.

Input: a quantized luma 4×4 block I of the I macroblock in the frequency domain, the prediction mode M, and a user’s key R.

Steps:

1. Obtain the number of bits $N_i$ hidden in the mode M from the highest-frequency coefficient C of I according to the following rule:

$$N_i = \begin{cases} 0, & \text{if } C = 0; \\ 1, & \text{if } C = 1; \\ 2, & \text{if } C = -1; \\ 3, & \text{if } C = 2; \\ 4, & \text{if } C = -2. \end{cases} \tag{3.12}$$

2. Extract the last $N_i$ bit(s) of the binary index of the prediction mode M, read from right to left, as part of the encrypted data E.

3. For every eight bits $E_i$ of E, perform the following steps.

3.1 Compute the remainder R' of dividing R by 256.

3.2 Set each character $D_i$ of the secret data D according to Eq. (3.11).

For example, if the prediction mode is $M_7$, whose index is $0111_2$ in binary, and $N_i = 4$, then we get the message data $e = 1110_2$ by reading the four bits of the index from right to left.
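The per-block extraction for the optimal method can be sketched as below. The names are hypothetical, and the sketch assumes, following the examples of this chapter, that the hidden bits are the last $N_i$ bits of the 4-bit mode index read from right to left.

```python
# Inverse of the coefficient mapping, Eq. (3.12): coefficient C -> bit count N_i.
C_TO_N = {0: 0, 1: 1, -1: 2, 2: 3, -2: 4}


def extract_bits(mode_index: int, coefficient: int) -> str:
    """Return the encrypted bits carried by one 4x4 luma block."""
    count = C_TO_N[coefficient]
    index_bits = f"{mode_index:04b}"[::-1]  # read the index right to left
    return index_bits[:count]


print(extract_bits(7, -2))  # 1110
print(extract_bits(3, -1))  # 11
```

The first call reproduces the example above; the second corresponds to the embedding example of Section 3.3.2, where mode $M_3$ carries two bits and C = -1. The extracted bits of successive blocks are concatenated to form E before Eq. (3.11) is applied.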

A flowchart of the extraction process is shown in Figure 3.10.

Figure 3.10 Flowchart of the extraction process for I macroblocks of the optimal method.

3.4.3 Process for Extraction of Data from P Macroblocks

The proposed data extraction process first retrieves the macroblock partition sizes of the P macroblocks from the stego-video. If the macroblock partition size is 8×8, the sub-macroblock partition sizes of the P macroblock are retrieved as well. The partition sizes and the user’s key are then taken as input to the data extraction process for P macroblocks. A flowchart of the data extraction process for P macroblocks is shown in Figure 3.11, and the detailed algorithm is described in the following.

Algorithm 3.6: data extraction process for P macroblocks.

Input: a macroblock partition size P (and four sub-macroblock partition sizes $P_j$) and a user’s key R.

Steps:

1. Extract a bit $e_1$ or four bits $e_j$ as part of the encrypted data E from P according to the following rules.

1.1 When P is 16×16, 8×16, or 16×8, extract a bit $e_1$ from the macroblock partition size, denoted $P_1$, in the following way:

$$e_1 = \begin{cases} 1, & \text{if } P_1 \text{ is } 16\times16; \\ 0, & \text{if } P_1 \text{ is } 8\times16 \text{ or } 16\times8. \end{cases} \tag{3.13}$$

1.2 When P is 8×8, extract four bits $e_j$ from the four sub-macroblock partition sizes $P_j$ in the following way, for j = 1 to 4:

$$e_j = \begin{cases} 1, & \text{if } P_j \text{ is } 8\times8 \text{ or } 4\times4; \\ 0, & \text{if } P_j \text{ is } 4\times8 \text{ or } 8\times4. \end{cases} \tag{3.14}$$

2. For every 8 bits $E_i$ of E, perform the following steps.

2.1 Compute the remainder R' of dividing R by 256.

2.2 Set each character $D_i$ of the secret data D according to Eq. (3.11).

For example, if the extracted macroblock partition size is P = 8×8 and the four sub-macroblock partition sizes are $P_1$ = 8×8, $P_2$ = 4×8, $P_3$ = 4×4, and $P_4$ = 8×4, then we get the four message bits $e_1 = 1$, $e_2 = 0$, $e_3 = 1$, and $e_4 = 0$.
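The rules of Eqs. (3.13) and (3.14) amount to two small lookup tables, as in the following sketch. The function name and the string representation of partition sizes are illustrative assumptions.

```python
# Eq. (3.13): bit carried by the macroblock partition size.
MB_BIT = {"16x16": "1", "8x16": "0", "16x8": "0"}
# Eq. (3.14): bit carried by each sub-macroblock partition size.
SUB_BIT = {"8x8": "1", "4x4": "1", "4x8": "0", "8x4": "0"}


def extract_bits_p(mb_size: str, sub_sizes=()) -> str:
    """Return one bit, or four bits when the macroblock is split into 8x8 parts."""
    if mb_size == "8x8":
        if len(sub_sizes) != 4:
            raise ValueError("an 8x8 macroblock partition needs four sub-sizes")
        return "".join(SUB_BIT[size] for size in sub_sizes)
    return MB_BIT[mb_size]


print(extract_bits_p("8x8", ["8x8", "4x8", "4x4", "8x4"]))  # 1010
print(extract_bits_p("16x16"))                              # 1
```

The first call reproduces the four message bits of the example above. The bits collected from successive P macroblocks form E, which is then decrypted byte by byte with Eq. (3.11).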

Figure 3.11 Flowchart of the extraction process for P macroblocks.

3.5 Experimental Results

3.5.1 Experimental Results of Large-Volume Method

In our experiments, the proposed large-volume data hiding algorithm was integrated into the H.264 reference software JM12.4 [13]. The most important configuration parameters of JM12.4 are shown in Table 3.2; the other parameters retain their default values. An H.264/AVC video in CIF (352×288 pixels) format was used in our experiments.

Table 3.2 Configuration parameters.

Parameter                       Value
Profile                         Baseline
Number of frames to be coded    5
Period of I-pictures            1

The secret data with the size of 2262 bytes used in the experiments are shown in Figure 3.12. Four of five frames of the input video and the resulting stego-video are shown in Figure 3.13. The extracted data are shown in Figure 3.14.


Figure 3.14 The extracted data file.

3.5.2 Experimental Results of Optimal Method

The proposed optimal data hiding algorithm was integrated into the H.264 reference software JM12.4. The most important configuration parameters of the JM12.4 are shown in Table 3.3; other parameters are kept to retain their default values. Several video sequences, Foreman, Football, Mobile and Tempete, in CIF (352×288 pixels) format were used in our experiments.

Table 3.3 Configuration parameters.

Parameter                       Value
Profile                         Baseline
Number of frames to be coded    10
Period of I-pictures            5
RD Optimization                 High complexity mode

The performance of the optimal data hiding algorithms was evaluated in terms of the number of bits hidden (NBH), the peak signal-to-noise ratio increase in the Y color space (PSNRI), the bit rate increase (BRI), and subjective perception testing by comparing the frames of the original video with those of the stego-video containing hidden secret data. The secret data with the size of 2262 bytes used in the experiments are shown in Figure 3.12. Four of the ten frames of the input video and the resulting stego-video are shown in Figure 3.15. The extracted data are shown in Figure 3.16. Finally, the NBH, PSNRI, and BRI values for several video sequences are shown in Table 3.4. From the above results, it can be observed that the proposed method can embed the secret data into H.264/AVC videos imperceptibly with only a slight increase in bit rate.
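As a rough illustration of how such measures can be computed, the sketch below evaluates the luma PSNR between two frames and a relative increase in percent. The exact definitions of PSNRI and BRI used in the experiments are not stated in this section, so the formulas here are assumptions: the standard PSNR definition with a peak value of 255, and a plain percentage change between the original and the stego values. The function names are hypothetical.

```python
import math


def psnr_y(reference, test, peak=255):
    """PSNR in dB between two equally sized luma planes given as lists of rows."""
    squared_error = 0
    samples = 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            squared_error += (r - t) ** 2
            samples += 1
    if squared_error == 0:
        return math.inf  # identical planes
    mse = squared_error / samples
    return 10 * math.log10(peak * peak / mse)


def relative_increase(original, stego):
    """Percentage change of a measure (for example PSNR or bit rate)."""
    return 100.0 * (stego - original) / original
```

Under these assumptions, a PSNRI value would be obtained by applying `relative_increase` to the Y-PSNR of the original and the stego encodings, and a BRI value by applying it to their bit rates; dividing by NBH gives per-bit figures of the kind reported in Table 3.4.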

Figure 3.15 The 4th to 7th frames (PPIP) of the original video (left) and the stego-video (right).

Table 3.4 NBH, PSNRI and BRI values for several video sequences (γi, γp = 250).

Measure                                      football     foreman      mobile       tempete
Average of PSNRI/NBH in I macroblocks (%)    -0.031761    -0.052796    -0.024492    -0.030125
Average of BRI/NBH in I macroblocks          7.668156     13.382799    4.941370     6.188712
Average of NBH in I macroblocks              1567         593          3650         2321
Average of PSNRI/NBH in P macroblocks (%)    -0.189027    -0.162934    -0.235225    -0.250322
Average of BRI/NBH in P macroblocks          27.370086    11.071856    17.341488    13.365207
Average of NBH in P macroblocks              1043         835          981          868

3.6 Discussions and Summary

In this chapter, we proposed a large-volume data hiding method and an optimal data hiding method that can be used to hide data into I macroblocks and P macroblocks. The optimal data hiding method not only considers hiding capacity of secret data and imperceptibility, but also considers bit rates. Therefore, the method is suitable for covert communication applications, especially when we need to transmit an H.264/AVC video in a low bit rate network.


Chapter 4 Copyright Protection of H.264/AVC Videos by Watermarking and Display Control on Specified Computers

4.1 Introduction

With the fast development of network techniques, Web 2.0 has become popular in recent years. Everyone can share videos on the Internet in various forms. However, these videos on the web can be easily downloaded and might be distributed to other people illegally. Hence, the development of methods for protecting the copyright of videos is essential. In this study, a removable visible watermarking method with a scheme for video display control on specified computers is proposed for copyright protection of H.264/AVC videos, and is described in this chapter.

In Section 4.1.1, some related problem definitions are given, and in Section 4.1.2, the basic idea of the proposed method is presented. In Section 4.2, the proposed visible watermark embedding method with a scheme for video display control on specified computers is described, and a corresponding visible watermark removal method is stated in Section 4.3. In Section 4.4, several experimental results of the proposed method are shown. Finally, some discussions and a summary are given in the last section of this chapter.

4.1.1 Problem Definition

To solve the problem of protecting the copyright of videos mentioned above, two important issues are how to prevent a video from being illegally distributed and how to implement a scheme for display control on specified computers. Another issue is how to embed a visible watermark in an H.264/AVC video with a secret key, with the visible watermark being sufficiently robust against attacks from unauthorized users. A related issue is how to remove the embedded visible watermark to recover the original video.

4.1.2 Proposed Ideas

To deal with all the above-mentioned issues, it is proposed in this study to proceed in the following way:

1. A user downloads an active program from a server site to read the identification information of the local computer (called computer information below) and uploads it together with a selected secret key to the server site.

2. The server embeds a visible watermark together with the received computer information into a video selected by the user, and sends the resulting stego-video to the user site.

3. The user downloads an active player to display the protected video after removing the embedded visible watermark using the key he/she provides.

4. If the user distributes the stego-video together with the key and the active player to a third-party user, the third-party user cannot remove the embedded visible watermark to view the video clearly, because the active player checks the correctness of not only the key but also the local computer information to decide whether or not to remove the visible watermark.

An illustration of the proposed idea is shown in Figure 4.1.

4.2 Proposed Scheme for Display Control on Specified Computers and Embedding Visible Watermarks in H.264/AVC Videos

In this section, the proposed scheme of video display control on specified computers and embedding visible watermarks into H.264/AVC videos will be described. In Section 4.2.1, the process for video display control on specified computers will be described. In Section 4.2.2, the process of embedding visible watermarks is stated.

4.2.1 Process for Video Display Control on Specified Computers

In order to protect the copyright of an H.264/AVC video and prevent the video from being distributed illegally, we propose the idea of allowing video display only on pre-specified computers. When a user requests a download service from the video supplier, the active program sent to the user accesses the CPU and disk information of the user’s computer through the Windows API, which is Microsoft’s core set of application programming interfaces (APIs), and then sends the information to the server site, where the server embeds a visible watermark together with this information into the video selected by the user. The video is then sent to the user together with an active video player.

Figure 4.1 Illustration of the proposed idea.

When the user plays the video with the active player, the player accesses the CPU and disk information of the local computer, and removes the visible watermark if this information is the same as the computer information embedded in the stego-video. Otherwise, the video remains covered with the previously embedded visible watermark. Therefore, if the user duplicates the stego-video and sends the copy to other users, the computer information checking process will fail on the other computers and the watermark will not be removed, thus claiming the ownership of the video and in the meantime protecting the video from being viewed. An illustration of this checking process is shown in Figure 4.2.

Figure 4.2 The checking process of video display control on specified computers.
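The checking process can be sketched in a few lines. This is a simplified illustration under assumptions that go beyond the text: the computer information is condensed into a SHA-256 fingerprint, the key and the fingerprint are compared with constant-time comparisons, and all function and parameter names are hypothetical. How the CPU and disk information is read through the Windows API, and how the information is embedded in the video, are outside the scope of the sketch.

```python
import hashlib
import hmac


def fingerprint(cpu_id: str, disk_serial: str) -> str:
    """Condense the computer information into one string (assumed scheme)."""
    material = f"{cpu_id}|{disk_serial}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()


def may_remove_watermark(local_fp: str, embedded_fp: str,
                         user_key: str, embedded_key: str) -> bool:
    """Allow removal only when both the computer information and the key match."""
    same_computer = hmac.compare_digest(local_fp.encode("utf-8"),
                                        embedded_fp.encode("utf-8"))
    same_key = hmac.compare_digest(user_key.encode("utf-8"),
                                   embedded_key.encode("utf-8"))
    return same_computer and same_key


owner = fingerprint("CPU-A", "DISK-1")
other = fingerprint("CPU-B", "DISK-2")
print(may_remove_watermark(owner, owner, "key", "key"))  # True
print(may_remove_watermark(other, owner, "key", "key"))  # False
```

The second call corresponds to a copied stego-video played on another computer: even with the correct key, the fingerprints differ, so the visible watermark stays in place.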

4.2.2 Process for Embedding Visible Watermarks

A watermark to be embedded into a video is assumed to be a binary image which has only black and white pixels. The embedding process for I and P macroblocks in a given video utilizes a 16×16 luminance macroblock of the video to embed a watermark pixel. A new technique of using intra-prediction modes is adopted in the H.264/AVC standard and results in higher compression efficiency than previous standards. A detailed introduction can be found in Section 3.2.1.

The sample values of a prediction block are computed from those of previously encoded and reconstructed blocks, as mentioned previously. Therefore, we cannot modify the transform coefficients directly to embed a watermark pixel as traditional methods do, because such modification causes prediction errors, resulting in the so-called drift problem. That is, when a block which is not watermarked refers to a watermarked block, this block will also become watermarked. An illustration of the drift problem is shown in Figure 4.3. We therefore have to not only embed the visible watermark but also prevent the drift problem.

Figure 4.3 Illustration of drift problem.

We propose the use of multiple slice groups (described as Flexible Macroblock Ordering, or FMO, in the H.264/AVC standard). Slice groups make it possible to map coded macroblocks to slice groups in a number of flexible ways. Figure 4.4 lists the seven different types of macroblock-to-slice-group mappings. The macroblocks in different slice groups do not refer to one another. Therefore, we can use this feature to place watermarked and non-watermarked macroblocks in different slice groups to avoid the drift problem, and use the DC coefficient to embed the watermark pixel. A flowchart of the embedding process for I and P frames is shown in Figure 4.5, and the detailed algorithm of the proposed process is described in the following.

Figure 4.4 Types of macroblock to slice group maps (type 6 is “Explicit” which is user-defined).
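The separation of watermarked and non-watermarked macroblocks into different slice groups can be sketched as the construction of an explicit (type 6) macroblock-to-slice-group map. The sketch relies on the one-watermark-pixel-per-macroblock correspondence described in Section 4.2.2. Which pixel value marks a macroblock as watermarked, the group numbers 0 and 1, and the function names are assumptions made for illustration; no particular encoder configuration format is implied.

```python
def macroblock_grid(width: int, height: int) -> tuple:
    """Number of 16x16 macroblocks per row and per column of a frame."""
    return width // 16, height // 16


def slice_group_map(watermark, marked_value=1) -> list:
    """Assign each macroblock to group 1 (watermarked) or group 0 (not watermarked).

    `watermark` is a binary image given as a list of rows, with one pixel per
    macroblock; the result lists the group of every macroblock in raster order.
    """
    return [1 if pixel == marked_value else 0
            for row in watermark
            for pixel in row]


print(macroblock_grid(352, 288))               # (22, 18)
print(slice_group_map([[0, 1, 1], [0, 0, 1]]))  # [0, 1, 1, 0, 0, 1]
```

For a CIF frame, the watermark image would thus have 22 × 18 pixels. Because macroblocks in different slice groups do not refer to one another, changes made to the macroblocks in group 1 do not propagate into group 0 through intra-prediction.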
