
Chapter 3 Searches of Video Contents for Scene Surveillance by Novel Uses of

3.7 Discussions and Summary

In this chapter, we have proposed a motion detection method using tree structured macroblock decomposition information, a data hiding method suitable for H.264 videos, and a surveillance video search system. The proposed motion detection method fully utilizes the encoding information generated during the encoding process. Thus, the method can detect motion regions quickly. The proposed data hiding method avoids losing hidden data due to changes of intra prediction modes. The proposed surveillance video search system provides an easy way to search activities in a surveillance video based on the techniques mentioned above. Experimental results show the feasibility of the proposed method.

Figure 3.6 Ten representative frames of the resulting stego-video. (a) The first frame. (b) The second frame. (c) The third frame. (d) The 4th frame. (e) The 5th frame. (f) The 6th frame. (g) The 7th frame. (h) The 8th frame. (i) The 9th frame. (j) The 10th frame.

Figure 3.7 The proposed user interface for searching suspicious activities in a surveillance video.

Figure 3.8 The search result of the bookshelf which is specified by a black rectangle.

Interface callouts: 1. Input file. 2. Select a specific region. 3. Start to search. 4. Search result.

Figure 3.9 Some resulting video clips of the search in Figure 3.8. (a) The first video clip. (b) The second video clip. (c) The third video clip. (d) The 4th video clip. (e) The 5th video clip. (f) The 6th video clip.

Chapter 4 Authentication of Surveillance Videos by Hiding Tree-Structured Macroblock Decomposition Information

4.1 Introduction

With the progress of video compression technology and efficient video coding standards, digital videos have become far more popular than in the past. The H.264 standard in particular is widely used in video applications such as surveillance video systems. This convenience raises a problem: digital videos are easier to modify with video editing software than traditional ones recorded on tapes. Moreover, with the progress of Internet technology, digital videos are often transmitted over the Internet, so malicious users may acquire and tamper with them easily. In the case of a surveillance video system in particular, tampering with the stored videos may cause serious legal disputes. Therefore, it is necessary to authenticate the integrity and fidelity of surveillance videos. In this chapter, a method for authentication of surveillance video sequences and their contents is proposed.

In Section 4.1.1, the related problem definitions are given. In Section 4.1.2, the idea of the proposed method is presented. In Section 4.2, the proposed process for generating authentication signals is described. In Section 4.3, the proposed process for embedding authentication signals in surveillance videos is described. The proposed process for authentication of video sequences and contents is stated in Section 4.4. Some experimental results are shown in Section 4.5. Finally, some discussions and a summary are given in the last section of this chapter.

4.1.1 Problem Definition

The main task of a video authentication system is to verify whether a video has been tampered with or not. Tampering operations can be categorized into two types: spatial and temporal. Spatial tampering means modifications made to the contents of video frames, and temporal tampering means modifications made to the sequence of video frames.

Temporal tampering can be categorized further into three types: replacement, cropping, and insertion. Replacement means substituting fake video frames for some of the original video frames, one for one. In this way, the number of video frames does not change, and the difference in size between the original video and the fake one is too small to detect. For example, a malicious user may want to replace a frame including a suspicious activity with a non-suspicious one. An illustration of replacement is shown in Figure 4.1.

Cropping means deleting some video frames from the original video sequence. For instance, a malicious user may want to eliminate the evidence of his or her crime by cropping some video frames from the original video sequence. An illustration of cropping is shown in Figure 4.2.

Insertion means placing some fake video frames between frames of the original video sequence. If a malicious user wants to impute his or her criminal activities to someone who is innocent, he or she may try to insert fake video frames into the original video sequence. An illustration of insertion is shown in Figure 4.3.

The main task of the proposed authentication system is not only to detect whether a surveillance video has been tampered with, but also to recognize the tampering type and to mark the altered frames.

4.1.2 Proposed Idea

In the proposed method, we divide a video sequence into several frame groups, with each group being composed of some P frames and one I frame. In order to detect spatial and temporal tampering, authentication signals are generated for each frame group G and hidden into the DCT coefficients of each macroblock within the I frame in G. The authentication signals of G are composed of two types of features, as proposed in this study. The first is the tree structured macroblock decomposition information of a P frame in G, which can be used to detect spatial tampering. The second is the index of G, which can be used to detect temporal tampering.

In Chapter 2, we have reviewed the tree structured motion compensation technique used in the H.264 standard. The basic idea is that a macroblock in a P slice can be partitioned into sub-macroblocks and each of these sub-macroblocks is motion compensated individually. The way of partitioning the macroblock is usually adapted to the video content. Because different video contents result in different sub-macroblock partitions, the partition modes of the sub-macroblocks, called tree structured macroblock decomposition information, are suitable for generating authentication signals.

Figure 4.1 An illustration of replacement.

Figure 4.2 An illustration of cropping.

Figure 4.3 An illustration of insertion.

4.2 Generation of Authentication Signals

In this section, the proposed technique for composition of authentication signals is described. In Section 4.2.1, the principle is described first, and in Section 4.2.2, the proposed process for generation of authentication signals is presented.

4.2.1 Principle of Authentication Signal Generation

In this study, a frame group is treated as a unit for authentication. Therefore, each frame group G of an input video has its own authentication signals, which comprise two parts. The first part is the index of G, which is used to detect temporal tampering. The second part is the tree structured macroblock decomposition information T of the motion regions of a P frame within G, where the P frame is selected randomly by a key.

Since surveillance videos usually contain some suspicious activities in motion regions, we use the motion regions to generate the authentication signals. More specifically, we record T and quantify it to form a string I. Then, I together with the index of G comprise the authentication signals S_b. Finally, S_b is embedded into the I frame of G for the authentication use.
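The key-driven choice of the P frame can be illustrated with a minimal sketch. It assumes that Python's `random.Random` stands in for the random number generator f of Algorithm 4.1, that the secret key K is an integer seed, and that one random number is drawn per frame group; the function name and parameters are hypothetical and are not part of the proposed method.

```python
import random


def select_p_frames(key: int, num_groups: int, p_frames_per_group: int) -> list[int]:
    """Return, for each frame group, the index of the P frame used to build T.

    Assumptions: random.Random stands in for the generator f, the key K is
    an integer seed, and one random number is drawn per frame group.
    """
    if p_frames_per_group < 1:
        raise ValueError("each frame group needs at least one P frame")
    f = random.Random(key)  # K seeds f, so the sequence Q is reproducible
    return [f.randrange(p_frames_per_group) for _ in range(num_groups)]


if __name__ == "__main__":
    print(select_p_frames(key=2024, num_groups=5, p_frames_per_group=14))
```

Because the sequence depends only on the key, a verifier holding the same key can recompute which P frame was used in each group, while someone without the key cannot predict it.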

4.2.2 Process for Generation of Authentication Signals

In Chapter 2, we have reviewed the tree structured motion compensation technique of the H.264 standard. A 16×16 macroblock can be partitioned in one of the following ways: one 16×16 macroblock partition, two 16×8 partitions, two 8×16 partitions, and four 8×8 partitions. If the 8×8 macroblock partition mode is chosen, each of the four 8×8 sub-macroblocks in the macroblock may be further partitioned in four ways: one 8×8 sub-macroblock partition, two 8×4 sub-macroblock partitions, two 4×8 sub-macroblock partitions, and four 4×4 sub-macroblock partitions. Therefore, a macroblock partition mode and four sub-macroblock partition modes are used to describe the partition of a 16×16 macroblock in the H.264 standard.

Besides the index of a frame group G, we also need the tree structured macroblock decomposition information T of a randomly selected P frame within G to generate the authentication signals. Specifically, we select one of the P frames, F_p, in G and apply the motion detection algorithm introduced in Chapter 3 to F_p to obtain a set R of the motion regions in it. For each 16×16 macroblock M of each region R_i of R, denote its macroblock partition mode as P_m. If P_m is a large partition mode, that is, any mode other than the 8×8 macroblock partition, we use the bit 0 to represent M. If P_m is the 8×8 macroblock partition and each sub-macroblock partition mode of M is the 8×4, 4×8, or 4×4 mode, we use the bit 1 to represent M. If P_m is the 8×8 macroblock partition mode and all the sub-macroblock partition modes of M are the 8×8 sub-macroblock partition mode, then we treat M as a special case which will be described later.
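The three cases above can be written as a small classifier. This is only a sketch: the mode strings, the function name, and the extra label `mixed` (for macroblocks whose four sub-macroblock modes mix 8×8 with smaller modes, a case the text does not address) are assumptions made for illustration. The bit values follow this paragraph; step 4.1.5 of Algorithm 4.1 assigns them the other way round, so the mapping is kept in a single table that can be flipped.

```python
LARGE_MODES = {"16x16", "16x8", "8x16"}
SMALL_SUB_MODES = {"8x4", "4x8", "4x4"}
BIT_FOR = {"large": 0, "small": 1}  # values as stated in the paragraph above


def classify_macroblock(p_m: str, sub_modes: tuple = ()) -> str:
    """Label a 16x16 macroblock 'large', 'small', 'special', or 'mixed'."""
    if p_m in LARGE_MODES:
        return "large"
    if p_m != "8x8" or len(sub_modes) != 4:
        raise ValueError("expected an 8x8 partition with four sub-macroblock modes")
    if all(s in SMALL_SUB_MODES for s in sub_modes):
        return "small"
    if all(s == "8x8" for s in sub_modes):
        return "special"  # decided later from the neighboring macroblocks
    return "mixed"  # not covered by the text; left for the caller to handle


if __name__ == "__main__":
    label = classify_macroblock("8x8", ("4x4", "8x4", "4x8", "4x4"))
    print(label, BIT_FOR[label])
```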

Next, we form a binary string S′ with a pre-defined length L_T to represent the tree structured macroblock decomposition information of R_i by assigning each macroblock a representing bit in the above-mentioned way. We then combine S′ with the binary form G_b of the index of G and the binary form R_ib of the coordinates of R_i to compose a binary string S_i with length L_R, which is the sum of L_T, the length of G_b, and the length of R_ib. We call S_i the region signal of R_i. Finally, we combine all region signals of R to produce the authentication signals S_b of G. The following algorithm describes the details of the above-mentioned process.
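The composition of one region signal can be sketched as follows. The concatenation order (S′, then G_b, then R_ib), the bit widths, the use of four coordinate values per region, and the cyclic reuse of macroblock bits when a region has fewer than L_T macroblocks are all assumptions chosen for illustration; the text fixes only that L_R is the sum of the three lengths.

```python
def to_bits(value: int, width: int) -> str:
    """Fixed-width binary form of a non-negative integer."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in {width} bits")
    return format(value, f"0{width}b")


def region_signal(mb_bits, group_index, coords, l_t=16, index_width=16, coord_width=8):
    """Build S_i = S' + G_b + R_ib for one motion region R_i.

    mb_bits: representing bits B(M) of the macroblocks in R_i.
    coords:  coordinate values of R_i (assumed here to be four integers).
    """
    if not mb_bits:
        raise ValueError("a motion region needs at least one macroblock")
    # Reuse macroblocks cyclically when the region has fewer than L_T of them.
    s_prime = "".join(str(mb_bits[i % len(mb_bits)]) for i in range(l_t))
    g_b = to_bits(group_index, index_width)
    r_ib = "".join(to_bits(c, coord_width) for c in coords)
    return s_prime + g_b + r_ib


if __name__ == "__main__":
    s_i = region_signal([1, 0, 1], group_index=2, coords=(1, 2, 3, 4))
    print(s_i, len(s_i))  # len(s_i) plays the role of L_R
```

With fixed widths, every region signal has the same length L_R, which is what later allows the extracted bit stream to be cut into region signals.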

Algorithm 4.1. Process for generating authentication signals.

Input: a frame group G in a video, a secret key K, and a random number generator f.

Output: authentication signals Sb to be embedded.

Steps:

1. Use the input secret key K as a seed for f and use f to generate a sequence of random numbers, Q.

2. Select randomly a P frame F_p in G according to Q.

3. Apply the motion detection process (Algorithm 3.1) to F_p to obtain a set of the motion regions R in F_p.

4. For each region R_i within R, perform the following steps.

4.1 For each 16×16 macroblock M in R_i, perform the following steps.

4.1.1 Denote the macroblock partition mode of M as P_m and the sub-macroblock partition mode as P_s.

4.1.2 If P_m is the 16×16, 16×8, or 8×16 mode, mark M as a large partition macroblock.

4.1.3 If P_m is the 8×8 mode and each P_s of M is the 8×4, 4×8, or 4×4 mode, mark M as a small partition macroblock.

4.1.4 For the case that both P_m and P_s are the 8×8 mode, decide whether M is a large partition macroblock or a small partition macroblock according to the following rules.

4.1.4.1 Evaluate the partition score of M according to the following rules.

4.1.4.1.1 Name the eight neighboring macroblocks as A through H, as depicted in Figure 4.4.

4.1.4.1.2 Define the macroblock gain G_i for each neighboring macroblock i of A through H in the following way, where P_i denotes the macroblock partition mode of i.

(1) For A, B, C, and D, if P_i is the 8×8 mode, set the value of G_i to 1; otherwise, to 0.

(2) For E, F, G, and H, if P_i is the 8×8 mode, set the value of G_i to 0.5; otherwise, to 0.

4.1.4.1.3 Calculate the partition score according to the following equation:

4.1.4.2 If the partition score is larger than a pre-defined threshold T, mark M as a large partition macroblock; otherwise, as small.

4.1.5 If M is a large partition macroblock, set B(M) to 1; otherwise, to 0.

4.2 For each R_i, select L_T 16×16 macroblocks M_1 through M_{L_T}, each denoted as M_i, and combine all B(M_i) to form a binary string S′, where L_T is a pre-defined length of signals. If the total number of macroblocks in R_i is smaller than L_T, allow repetition of using macroblocks in R_i.

4.3 Transform the coordinate information of R_i into the binary form and combine it with S′ to form a new binary string S_i.

In the case that the macroblock partition mode is 8×8 and the sub-macroblock partition modes are also 8×8, M is treated as either a large partition macroblock or a small one. A partition score is calculated for M based on the eight neighboring macroblocks to decide which case M belongs to. The macroblocks which are in direct contact with M (macroblocks A, B, C, and D in Figure 4.4) have more influence on M than macroblocks E through H.
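A sketch of the neighbor-based decision is given below. The equation for the partition score is not reproduced in the text, so the score is assumed here to be the plain sum of the eight gains; neighbors that are missing at a frame border are assumed to contribute 0. The function names, the mode strings, and the dictionary layout are hypothetical.

```python
DIRECT_NEIGHBORS = ("A", "B", "C", "D")  # in direct contact with M, gain 1
OTHER_NEIGHBORS = ("E", "F", "G", "H")   # remaining neighbors, gain 0.5


def partition_score(neighbor_modes: dict) -> float:
    """Sum of the gains G_i over the neighbors (assumed form of the score)."""
    score = 0.0
    for name in DIRECT_NEIGHBORS:
        if neighbor_modes.get(name) == "8x8":
            score += 1.0
    for name in OTHER_NEIGHBORS:
        if neighbor_modes.get(name) == "8x8":
            score += 0.5
    return score


def resolve_special(neighbor_modes: dict, threshold: float) -> str:
    """Apply step 4.1.4.2: 'large' if the score exceeds the threshold."""
    return "large" if partition_score(neighbor_modes) > threshold else "small"


if __name__ == "__main__":
    modes = {"A": "8x8", "B": "16x16", "C": "8x8", "D": "8x16",
             "E": "8x8", "F": "8x8", "G": "16x16", "H": "8x8"}
    print(partition_score(modes), resolve_special(modes, threshold=3.0))
```

Under the summation assumption the score ranges from 0 to 6, so a usable threshold would lie inside that interval.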

4.3 Embedding and Extracting of Authentication Signals in Surveillance Videos

In this section, the proposed methods of embedding and extracting authentication signals are introduced. In Section 4.3.1, the proposed technique of embedding authentication signals is described, and in Section 4.3.2 the proposed technique of extracting authentication signals is presented.

Figure 4.4 The notations of the eight neighboring macroblocks of M.

4.3.1 Embedding of Authentication Signals

In this section, the proposed technique of embedding authentication signals is described. In Section 4.3.1.1, the proposed idea is presented. In Section 4.3.1.2, the detailed steps of the embedding process are described.

4.3.1.1 Proposed Idea

We divide a video into several frame groups, and each of them is treated as a unit of authentication, as mentioned previously. After generating the authentication signals S_b for each group G, we embed S_b into the only I frame in G for authentication use, resulting in a protected video.

A protected video V_p might be tampered with and recompressed by a malicious user. If the method used for embedding authentication signals is not resilient to recompression, the authentication signals hidden in V_p may be lost and the protected video can no longer be authenticated. The data hiding method applied in the embedding process therefore needs to be robust with respect to H.264 recompression, so the robust data hiding method introduced in Chapter 3 is utilized in the proposed embedding process described next.

4.3.1.2 Process for Embedding Authentication Signals in I Frames

After obtaining the authentication signals S_b, we duplicate S_b into several copies, where the total length of these copies is set smaller than the capacity of an I frame. The main purpose of this duplication is to allow the authentication signals to be extracted more precisely by a voting technique in the later authentication process, which reduces the probability of extracting wrong bits. Then, the signals are embedded into the I frame using the secret-key-based data hiding method introduced previously in Chapter 3. The details are described in the following algorithm.

Algorithm 4.2. Process for embedding authentication signals.

Input: authentication signals Sb, an I frame F, a secret key K, and a random number generator f.

Output: a protected I frame F′.

Steps:

1. Denote the length of Sb as L(Sb). Duplicate Sb k times and concatenate them in order to form a new binary string Sb′, where k is such that L(Sb)×k is smaller than the capacity of an I frame.

2. For each 16×16 macroblock M in F, perform the following steps before M is encoded.

2.1 Take the next 16 consecutive bits of S_b′ which have not been hidden yet, and denote these data bits as S_b16′.

2.2 Take K, S_b16′, M, and f as the input to the data hiding method (Algorithm 3.2) introduced in Chapter 3, and perform the data hiding process.

3. Repeat Step 2 until all macroblocks in F are processed.
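Steps 1 and 2.1 of the algorithm above can be sketched as follows. The sketch assumes that k is taken as the largest value satisfying L(S_b)×k < capacity (the algorithm only requires that the inequality hold) and that a final chunk shorter than 16 bits is passed on as is; the actual hiding of each chunk in a macroblock (Algorithm 3.2) is outside the sketch, and the function name is hypothetical.

```python
def build_payload(s_b: str, capacity_bits: int, chunk: int = 16):
    """Duplicate S_b and cut the result into per-macroblock chunks.

    Returns (k, chunks), where k is the number of copies and chunks is the
    list of bit strings handed to the data hiding step one at a time.
    """
    if not s_b:
        raise ValueError("S_b must not be empty")
    k = (capacity_bits - 1) // len(s_b)  # largest k with len(S_b) * k < capacity
    if k < 1:
        raise ValueError("the I frame cannot hold even one copy of S_b")
    s_b_dup = s_b * k  # S_b' = S_b repeated k times
    chunks = [s_b_dup[i:i + chunk] for i in range(0, len(s_b_dup), chunk)]
    return k, chunks


if __name__ == "__main__":
    k, chunks = build_payload("1011001", capacity_bits=50)
    print(k, chunks)
```

A larger k gives the later voting step more copies to compare, at the cost of modifying more macroblocks of the I frame.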

An example of embedding authentication signals is illustrated in Figure 4.5. Figures 4.5(a) through 4.5(c) comprise the first frame group G_1 in an input video. Figures 4.5(a) and 4.5(b) are two P frames of G_1, and Figure 4.5(c) is the I frame of G_1. We selected one frame F, either the first P frame in Figure 4.5(a) or the second P frame in Figure 4.5(b), to construct the authentication signals of G_1. More specifically, there are two motion regions, R_1 and R_2, within G_1, and the region signals of R_1 and R_2 are combined to comprise the authentication signals of G_1 and embedded into the I frame of G_1. Figures 4.5(d) through 4.5(f) comprise the second frame group G_2 in the video, and the embedding process for G_2 is the same as for G_1. A comparison between the original I frame and the stego-I frame is illustrated in Figure 4.6. Figure 4.6(a) is the original I frame and Figure 4.6(b) is the stego-I frame. The comparison shows that the data hiding process does not introduce much noise into the stego-I frame.

4.3.2 Extraction of Authentication Signals

In this section, the proposed technique of extracting authentication signals is described. In Section 4.3.2.1, the proposed idea is presented, and in Section 4.3.2.2, the detailed steps of the extraction process are described.

4.3.2.1 Proposed Idea

If a protected video is re-encoded, the original DCT coefficients may be slightly changed. Some of the authentication signals hidden in the video may also be changed by the recompression process. For this reason, we duplicate the signals several times and embed all of the copies, as mentioned previously. In the data extraction process, we recover them by the voting technique in order to increase the precision of the extracted authentication signals. Furthermore, if the protected video is tampered with, we can still extract the correct signals as long as the untampered area of an I frame is spatially larger than the tampered area.

Figure 4.5 An example of embedding authentication signals. The region signals of one of the two P frames form authentication signals, and the authentication signals are hidden into the following I frame. (a) The first P frame of the first frame group. (b) The second P frame of the first frame group. (c) The I frame of the first frame group. (d) The first P frame of the second frame group. (e) The second P frame of the second frame group. (f) The I frame of the second frame group.

Figure 4.6 A comparison between the original I frame and the stego-I frame. (a) The original I frame. (b) The stego-I frame.

4.3.2.2 Process for Extracting Authentication Signals by Voting Technique

We extract the signals hidden in a video by the data extraction method introduced in Chapter 3, and obtain the authentication signals by the voting technique described below. In Section 4.2, we have introduced the process for generation of authentication signals. For each frame group G, the signals are produced from the motion detection result of the selected P frame in G. Since several motion regions may be detected and a region signal is produced for each of them, the length of S_b is not fixed for every frame group in the video. As a result, the length L_s of S_b needs to be decided first, so that the voting process can be performed based on L_s.

In the proposed voting process, each bit of the extracted data takes one of two possible values, 0 and 1, so every bit B_i of S_b is associated with two scores: Score-0 and Score-1. For each extracted copy, if the value of B_i is 0, one vote is added to Score-0; otherwise, one vote is added to Score-1. The value with the higher score is then regarded as the correct value of B_i.
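The per-bit vote can be sketched as follows for the simple case in which the signal length is already known. Two details are assumptions, since the text does not state them: the extracted bit stream is treated as back-to-back copies of S_b (a trailing partial copy still votes), and a tie is resolved in favor of 0. The function name is hypothetical.

```python
def vote(extracted: str, length: int) -> str:
    """Recover a signal of the given length from repeated, possibly noisy copies."""
    if length < 1:
        raise ValueError("length must be positive")
    score0 = [0] * length  # Score-0 for every bit position
    score1 = [0] * length  # Score-1 for every bit position
    for pos, bit in enumerate(extracted):
        i = pos % length  # position of this bit inside its copy
        if bit == "0":
            score0[i] += 1
        else:
            score1[i] += 1
    # The value with more votes wins; ties fall back to 0 (assumption).
    return "".join("1" if score1[i] > score0[i] else "0" for i in range(length))


if __name__ == "__main__":
    # Three copies of 1011, with one bit of the second copy flipped.
    print(vote("1011" + "1001" + "1011", length=4))
```

A bit is recovered correctly as long as fewer than half of its copies are corrupted, which is why the number of copies matters for robustness against recompression.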

Algorithm 4.3. Process for extracting authentication signals.

Input: a protected I frame F, a secret key K, and a random number generator f.

Output: authentication signals S_b.

Steps:

1. For each macroblock M_i of F, take M_i, K, and f as the input to the data extraction method (Algorithm 3.3) to get the hidden data Di of Mi.

2. Combine all D_i to form a binary string S.

3. Perform the following steps on S to get the authentication signals S_b.

3.1 Denote the length of S as L.

3.2 Divide L by L_R, and denote the resulting number of segments as T, where L_R is the length of a region signal as mentioned in Section 4.2.2.

3.3 Generate T candidate authentication signals according to the following steps and denote them as S₁ through S_T.

3.3.1 Denote the currently-processed candidate authentication signal as Sj, where 1≤ j ≤ T.

3.3.2 Divide S into several segments of signals, with each of them being of the length L_R×j.

3.3.3 Transform each segment S′ of S into the binary form as S′ = b₁b₂b₃…b_l, where l is the length of S′. Associate each bit of S′ with two vote scores V₀[m] and V₁[m], where 1 ≤ m ≤ l. Calculate the score of each bit of the […]

[…] comparing the two scores of each bit of S′ according to the following […]

3.3.6 Calculate the average distribution rate P_j of S_j by the following rule: […]

4. Select the candidate authentication signals with the highest average distribution rate as the output authentication signals S_b.

In the above process for extracting authentication signals, we first divide the length of the extracted signals S by L_R in order to know how many region signals S can hold, and the division result is denoted as T. Since we do not know how many region signals comprise the desired authentication signals, we check each possible number j of region signals, where 1 ≤ j ≤ T, and construct the corresponding candidate authentication signals S_j by the voting technique. Based on the voting result which yields S_j, we calculate the distribution rate of each bit of S_j. The candidate with the highest average distribution rate over all bits is recognized as the desired authentication signals.

4.4 Authentication of Surveillance Videos

In this section, the proposed detection and verification techniques for spatial and temporal tampering are introduced. In Section 4.4.1, the process for detection and verification of spatial tampering is described, and in Section 4.4.2, the process for detection and verification of temporal tampering is presented.

4.4.1 Detection and Verification of Spatial Tampering

A frame group G is treated as a unit of authentication. The authentication signals for G are embedded in the I frame of G. The first step of authentication of spatial tampering is to perform authentication on the I frame. If any region in the I frame is marked as a suspected region, G is also marked as a suspected frame group. For a suspected frame group G′, we perform authentication on P frames of G′ to get more
