Detection and Verification of Temporal Tampering

Chapter 4 Authentication of Surveillance Videos by Hiding Tree-Structured

4.4 Authentication of Surveillance Videos

4.4.2 Detection and Verification of Temporal Tampering

In the proposed method, temporal tampering is categorized into three types: replacement, cropping, and insertion. The method detects all three types and also locates the starting frame and the ending frame of each tampered span.

We use the index I_i′ of a frame group G_i, extracted from the corresponding authentication signals, to verify the correctness of a video sequence. We compare I_i′ with the actual index of G_i, denoted as I_i, to detect temporal tampering.

Algorithm 4.5: Process for temporal tampering detection of a video sequence.

Input: a video sequence V, and authentication signals S of each frame group of V.

Output: a report R of the detection result.

Steps:

1. Denote the total number of frame groups in V as N, each frame group as G_i, and the index of each frame group as I_i, where 1 ≤ i ≤ N.

2. For each frame group Gi in V, extract the index Ii′ hidden in the corresponding authentication signals, where 1 ≤ i ≤ N.

3. Create a flag bit B to indicate the occurrence of tampering, and initialize B to 0.

4. Create a flag bit F to indicate the occurrence of replacement, and initialize F to 0.

5. Subtract I_i from I_i′, and denote the result as D_i.

6. If D_i ≠ 0, perform the following steps.

6.1 If B is equal to 0, set B to 1, and record the index n_s of the I frame in G_i.

6.2 Otherwise, if D_i is not equal to D_{i−1}, set F to 1.

7. If Di = 0, perform the following steps.

7.1 If B is equal to 1, record the index nf of the I frame in Gi, and perform the following steps.

7.1.1 If F is equal to 1, decide the tampering type as replacement.

7.1.2 If F is equal to 0, decide the tampering type as cropping and insertion.

7.1.3 Store the tampering type, ns and nf, into R.

7.1.4 Set B, F, n_s, and n_f to 0.

8. Repeat Steps 5 through 7 for each frame group until reaching the end of V.

9. If B is equal to 1, perform the following steps.

9.1 If F is equal to 1, recognize the tampering type as replacement.

9.2 If F is equal to 0, perform the following steps.

9.2.1 If DN > 0, decide the tampering type as cropping.

9.2.2 If DN < 0, decide the tampering type as insertion.

9.2.3 Store the tampering type, ns and the index of the last I frame of V into R.

Some steps of the algorithm are explained here. The basic idea is to detect tampering from the difference D_i = I_i′ − I_i between the extracted index and the actual index of a frame group. In a video sequence without temporal tampering, every difference is zero. If frames before a frame group G are cropped, the difference of G is larger than zero. If frames are inserted before a frame group G, the difference of G is smaller than zero.
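The sign convention can be checked with a small Python sketch. The function name and the toy index lists below are illustrative assumptions and are not part of the proposed system. The sketch only assumes that frame groups are numbered from 1 and that the hidden index of each group has already been extracted.

```python
def index_differences(extracted_indices):
    """Return [D_1, ..., D_N], where D_i = I_i' - I_i and I_i = i (1-based)."""
    return [extracted - actual
            for actual, extracted in enumerate(extracted_indices, start=1)]

# Untampered: every extracted index matches its position in the sequence.
untouched = index_differences([1, 2, 3, 4, 5, 6])

# Cropping: original groups 3 and 4 were removed, so the groups that
# followed them now sit two positions earlier than their hidden indices.
cropped = index_differences([1, 2, 5, 6])

print(untouched)  # [0, 0, 0, 0, 0, 0]
print(cropped)    # [0, 0, 2, 2]
```

Conversely, an original group with hidden index 3 that is pushed to position 5 by two inserted groups gives a difference of 3 − 5 = −2.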

Once a frame group G_i with a non-zero difference D_i is found in Step 6, we mark G_i as the start n_s of the tampering. Then, if a later frame group G_j in the video sequence has a zero difference D_j, G_j is marked as the end n_f of the tampering in Step 7.1. For convenience, we call this tampering T.

Next, in Steps 7.1.1 through 7.1.4, we decide the type of T from the sequence of differences of the frame groups between G_i and G_j. If the differences in the sequence are all equal, the type of T is marked as cropping and insertion, which means that a cropping operation removed a number of frames and an insertion operation inserted the same number of frames. If the differences are not all equal, that is, the extracted indices are not consecutive, then the frame groups between G_i and G_j are not the original frame groups of the video. Therefore, we mark the type of T as replacement.

If a frame group G_i with a non-zero index difference is found, but no later frame group G_j with a zero index difference is found before the end of the video, we decide the type of T by the following rules in Step 9. If the differences are not all equal, the type is regarded as replacement. If the differences are all equal and larger than zero, the type is regarded as cropping. Otherwise, the type is regarded as insertion.
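The decision logic described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in this study. It follows the explanation that unequal differences within a tampered span indicate replacement. It reports frame-group positions in place of the I-frame indices n_s and n_f. It also resets the replacement flag after each reported span. The function and variable names are hypothetical.

```python
def detect_temporal_tampering(extracted_indices):
    """Return a list of (type, start_group, end_group) tuples.

    extracted_indices[k] is the hidden index I' of the (k+1)-th frame group.
    Frame-group positions stand in for I-frame indices (an assumption).
    """
    report = []
    in_tampering = False    # flag B
    unequal_seen = False    # flag F
    start = None            # n_s
    prev_diff = 0           # D_{i-1}

    for position, extracted in enumerate(extracted_indices, start=1):
        diff = extracted - position                  # Step 5: D_i
        if diff != 0:                                # Step 6
            if not in_tampering:
                in_tampering, start = True, position
            elif diff != prev_diff:
                unequal_seen = True
        elif in_tampering:                           # Step 7.1
            kind = "replacement" if unequal_seen else "cropping and insertion"
            report.append((kind, start, position))
            # Resetting the replacement flag here is an assumption.
            in_tampering, unequal_seen, start = False, False, None
        prev_diff = diff

    if in_tampering:                                 # Step 9
        if unequal_seen:
            kind = "replacement"
        elif prev_diff > 0:                          # prev_diff is D_N here
            kind = "cropping"
        else:
            kind = "insertion"
        report.append((kind, start, len(extracted_indices)))
    return report

print(detect_temporal_tampering([1, 2, 5, 6]))        # [('cropping', 3, 4)]
print(detect_temporal_tampering([1, 2, 9, 7, 5, 6]))  # [('replacement', 3, 5)]
```

In the second call the differences are 0, 0, 6, 3, 0, 0, so the span starts at group 3, contains unequal differences, and ends at group 5, where the difference returns to zero.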

4.5 Experimental Results

In our experiments, the size of each video frame is 352×288. The input video is a surveillance video of the Computer Vision Lab at National Chiao Tung University, where this study was conducted. In this video, a person takes a book from the table, and a malicious user tries to hide this person by cropping the part containing the person out of all frames of the input video. Each row in Figure 4.7 through Figure 4.10 shows a frame group G_i of the input video. The left image of the row is a representative P frame of G_i, and the right image is the I frame of G_i. Three consecutive frame groups of the original video are shown in Figure 4.7. Three consecutive frame groups of the protected video yielded by the proposed method are shown in Figure 4.8.

Three consecutive frame groups of an attacked version of the video are shown in Figure 4.9. The malicious user crops the area containing the person in each frame and replaces it with the background image. The three corresponding consecutive frame groups of the authenticated video are shown in Figure 4.10.

In Figure 4.10, the green areas in the right figures are the suspicious areas of the I frames, which are attacked. The black rectangles in the left figures are the results of authentication on P frames. These rectangles reveal the information of the original video contents in the attacked areas, and the tree-structured macroblock decomposition information of the contents. Based on the concept of tree-structured motion compensation, the areas with small rectangles may contain some moving objects. The areas with small rectangles are distributed around the table. If we compare the areas with the background image, we may guess that the book on the table is moved by someone.

This experiment shows that the proposed authentication method not only can detect whether a video has been tampered with or not, but also can specify which part of the image frame is tampered with.

4.6 Discussions and Summary

In this chapter, we have proposed an authentication method that can detect and verify tampering in a suspicious video. The proposed method uses the tree-structured macroblock decomposition information in H.264 codes as authentication signals and embeds the authentication signals into the I frames of the input video. In order to extract the authentication signals more reliably, we use a voting technique so that the correct signal can still be extracted as long as most regions of a suspicious frame are not tampered with. The correct signals are used to detect both temporal tampering and spatial tampering and to verify the suspicious regions and frames.

Therefore, the proposed authentication system not only checks if a protected video has been tampered with or not, but also further shows where and how the tampering occurs.

Figure 4.7 Three consecutive frame groups of the original video. (a) A representative P frame of G_1. (b) The I frame of G_1. (c) A representative P frame of G_2. (d) The I frame of G_2. (e) A representative P frame of G_3. (f) The I frame of G_3.

Figure 4.8 Three consecutive frame groups of the protected video. (a) A representative P frame of G_1. (b) The I frame of G_1. (c) A representative P frame of G_2. (d) The I frame of G_2. (e) A representative P frame of G_3. (f) The I frame of G_3.

Figure 4.9 Three consecutive frame groups of the tampered video. (a) A representative P frame of G_1. (b) The I frame of G_1. (c) A representative P frame of G_2. (d) The I frame of G_2. (e) A representative P frame of G_3. (f) The I frame of G_3.

Figure 4.10 Three consecutive frame groups of the authenticated video. The green areas in the right figures are suspicious areas of the I frame. The black rectangles in the left figures are the tree-structured macroblock decomposition information of the suspicious areas. (a) A representative P frame of G_1. (b) The I frame of G_1. (c) A representative P frame of G_2. (d) The I frame of G_2. (e) A representative P frame of G_3. (f) The I frame of G_3.

Chapter 5 Protection of Personal Privacy in Surveillance Videos

5.1 Introduction

Surveillance systems have spread along with the development of society, and they raise a number of issues that have to be considered. Privacy protection is one of these issues. Since a video surveillance system usually monitors a public space for long periods of time, it may record information that violates personal privacy. Therefore, we propose a method for privacy protection, which is described in this chapter.

In Section 5.1.1, the related problem definitions are given. In Section 5.1.2, the idea of the proposed method is described. In Section 5.2, the proposed process for embedding decoding information into videos is presented. In Section 5.3, the proposed process for extracting decoding information from videos is presented. Some experimental results are shown in Section 5.4. Finally, some discussions and a summary are given in the last section of this chapter.

5.1.1 Problem Definition

In the privacy protection problem dealt with in this study, an authorized user can specify a protected region R in an input video. The video contents in R are then removed and replaced with the background image in order not to reveal sensitive privacy information in R. Also, the privacy information of R is hidden into the video to produce a privacy video. Thereafter, once the privacy information needs to be recovered, the data hidden in the privacy video is extracted and used to recover R.

Two main issues are involved in this problem. The first is how to replace the video contents in R with the background image and to embed the information about the contents in R into the video. The second is how to extract the data from the privacy video and to recover the original contents of the protected region.

5.1.2 Proposed Idea

A video can be decoded correctly based on the decoding information generated during the encoding process. Therefore, in order to remove sensitive video contents of a region R, which is specified by an authorized user, in an input video, we set the decoding information of R to some pre-defined values, so that the video contents are removed and replaced with the background image. The decoding information of R is then hidden into the input video. If the video contents of R need to be recovered, the decoding information of R hidden in the video is extracted and used to recover the contents of R.

5.2 Hiding of Privacy Information

In this section, the proposed process for hiding privacy information is introduced. In Section 5.2.1, the proposed idea of the process is stated. In Section 5.2.2, the proposed process for hiding privacy information is described.

5.2.1 Proposed Idea

In Chapter 2, we have reviewed the concept of motion compensation. Motion compensation is the process of finding the best prediction block in inter mode. A motion vector is used to indicate the location of the best prediction block. The difference between the best prediction block and the currently-processed block is transformed by a DCT-based transform into a set of frequency coefficients. Motion vectors and frequency coefficients are used in the decoding process to decode the corresponding block.

A P frame can be decoded correctly based on correct decoding information which includes motion vectors, frequency coefficients, partition modes, etc. In order to remove the privacy information in the user-specified region R and replace the privacy information with the background image, we first use the proposed motion detection method to detect if there are any activities in R. If any motion region is detected in R, we denote the resulting motion region as a replaced region R′. We modify the motion vectors of R′ in order to change the original prediction blocks into the corresponding blocks in the background image, and set all frequency coefficients of R′ to zero. Therefore, the video contents in R′ turn into the corresponding part of the background image. The original motion vectors and frequency coefficients of R′ are embedded into the input video for recovery use.

5.2.2 Process for Hiding Privacy Information

The proposed process is applied on P frames of input H.264 videos. The proposed motion detection method introduced in Chapter 3 is applied on the P frame to detect motions in a user-specified region R and get the replaced region R′. When encoding macroblocks within R′, the motion vectors and frequency coefficients of the currently-processed macroblock M are all set to zero. Therefore, the video contents of R′ become the corresponding part of the background image which has appeared in the previous frames of the input video. This also imposes the restriction that the first frame of the input video must be a background frame.

The values of the original motion vectors and frequency coefficients of macroblocks of R′ are then hidden into the remaining region of the P frame. We use a secret key to randomize the hiding order of macroblocks for the security protection purpose.

In Chapter 2, we have reviewed the process for encoding an H.264 video. In the prediction procedure of an H.264 encoding process, all sample values of a prediction block are computed from those of previously encoded and reconstructed blocks. Therefore, if we modify the motion vectors and frequency coefficients of R′ during a traditional encoding process, it will cause prediction errors on macroblocks which have referenced the macroblocks in R′. In more detail, assume that some macroblocks M of the following frames have referenced the modified macroblocks in R′. Once the video contents within R′ are recovered, the prediction blocks of M used in the decoding process will be different from the prediction blocks computed in the previous encoding process, which causes decoding errors on M. Therefore, we introduce the use of multiple slice groups to solve this problem. The details are described in the following algorithm.

Algorithm 5.1. Process for removing and hiding privacy information with a user-specified region.

Input: an H.264 video V, a secret key K, a random number generator f, and a region R specified by an authorized user.

Output: an H.264 video V′ with privacy information in R removed and hidden.

Steps:

1. Use explicit mapping, which is Type 6 of the multiple slice group map types, to set the slice group number N_ij of each macroblock M_ij of each frame of the input H.264 video V according to the following rule: N_ij = 0 if M_ij lies in R, and N_ij = 1 otherwise.

2. For each P frame F of V, perform the following steps.

2.1 Take the currently-processed P frame F as input to the proposed motion detection algorithm (Algorithm 3.1), denote the resulting motion region as R′, and regard R′ as a replaced region for removing privacy information.

2.2 For each macroblock Mij in R′ of F, if the corresponding Nij is equal to 0, store the motion vector and the frequency coefficients of Mij in a report E and set the motion vector and frequency coefficients of M_ij to zero.

2.3 Denote the total number of macroblocks in R′ as N.

2.4 Use the input secret key K as a seed for f and use f to generate a sequence of random numbers Q ={i1, i2, …, iN} in the range of {1, 2, …, N} without repetitive values.

2.5 For each number in Q, get the motion vector and the DC coefficient of the frequency coefficients of the corresponding macroblock in R′, and transform them into a binary string Sk.

2.6 Combine all Sk and the binary form of the coordinate information of R′ to form a binary string S, and denote it as S = s1s2s3…sL, where L is the length of S.

2.7 For each macroblock Mij of F, if the corresponding Nij is equal to 1, perform the following steps.

2.7.1 For each 4×4 sub-macroblock of M_ij, denote the corresponding frequency coefficients as Coeff.

2.7.2 Modify Coeff in order to hide an un-hidden bit B of S according to the following rules.

2.7.2.1 Select the coefficient pair C_1(0, 3) and C_2(3, 0) in Coeff.

2.7.2.2 Modify C_1 and C_2 according to the following equations.

(1) if B = 0:

2.8 Repeat Step 2.7 until reaching the end of S or the last macroblock of F.

3. Repeat Step 2 until reaching the end of V.
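Steps 2.2 through 2.6 can be illustrated with the following Python sketch. It is a sketch under several assumptions that the algorithm does not specify. Each macroblock of R′ is modelled as a dictionary with a motion vector and a coefficient list whose first entry is the DC coefficient. Values are written as fixed-width two's-complement fields of assumed widths. Python's `random.Random` stands in for the generator f. The coordinate information of R′ is left out of S. The embedding rule of Step 2.7.2 is not sketched.

```python
import random

def to_bits(value, width):
    """Two's-complement bit string of a signed integer (assumed field width)."""
    return format(value & ((1 << width) - 1), f"0{width}b")

def remove_and_serialize(region_mbs, key, mv_bits=8, dc_bits=12):
    """Zero the macroblocks of R' in place and return (S, Q).

    region_mbs: list of {"mv": (dx, dy), "coeffs": [DC, ...]} (hypothetical layout).
    """
    n = len(region_mbs)
    # Step 2.2: keep the original values in a report E, then zero them.
    report = [(mb["mv"], list(mb["coeffs"])) for mb in region_mbs]
    for mb in region_mbs:
        mb["mv"] = (0, 0)
        mb["coeffs"] = [0] * len(mb["coeffs"])
    # Step 2.4: key-seeded order without repetition (0-based here).
    order = random.Random(key).sample(range(n), n)
    # Steps 2.5 and 2.6: one binary string S_k per macroblock, concatenated.
    parts = []
    for k in order:
        (dx, dy), coeffs = report[k]
        parts.append(to_bits(dx, mv_bits) + to_bits(dy, mv_bits)
                     + to_bits(coeffs[0], dc_bits))
    return "".join(parts), order

mbs = [{"mv": (1, -2), "coeffs": [5] + [0] * 15},
       {"mv": (-3, 4), "coeffs": [-6] + [1] * 15}]
s, q = remove_and_serialize(mbs, key=2024)
print(len(s), q)
```

Because the order depends only on the key and on the number of macroblocks, a receiver holding the same key can regenerate the same sequence Q.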

We have mentioned the concept “slice” in Chapter 2. A slice is composed of macroblocks. During an H.264 encoding process, macroblocks are predicted from samples in the same slice. In other words, macroblocks in different slices do not refer to each other. It also implies that an H.264 encoder does not process the next slice until all macroblocks in the currently-processed slice are encoded.

We solve the prediction problem by the use of multiple slice groups, which are introduced in the H.264 standard. A slice group may contain one or more slices. Multiple slice groups define a number of flexible ways to map coded macroblocks to slice groups. There are seven types of slice group maps in total. The first six types are illustrated in Figure 5.1. The last type, called explicit mapping, is entirely user-defined.

Since macroblocks in different slices will not refer to each other, macroblocks in different slice groups will not refer to each other, either. Therefore, we can solve the prediction error problem by using the explicit mapping to set the user-specified region and the remaining region in different slice groups. Then, the decoding of macroblocks of the remaining region will not be affected by the modified macroblocks of the user-specified region.

Another benefit of multiple slice groups is that we can control the encoding order of the slice groups by setting the slice group identifier. As a consequence, the process of removing privacy information and the process of embedding privacy information can be done at the same time. In other words, we do not have to perform the encoding process twice. Without multiple slice groups, the macroblocks are encoded in a raster scan order. Then, we have to perform an encoding process to remove the privacy information and get the decoding information, and then perform another encoding process to hide the decoding information into the video. This leads to another problem: the decoding information stored in the first encoding process may not be the same as the one generated in the second encoding process. The mismatched decoding information may result in decoding errors and cause the recovery process, which will be introduced later, to fail.

That is why the slice group identifier of R is set to 0, and the remaining region is set to 1 in Step 1 of the proposed algorithm. We can remove the privacy information in the encoding of the first slice group and hide it into the video in the second slice group during the same encoding process.
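As an illustration of this assignment, the following Python sketch builds an explicit slice group map in which macroblocks inside R receive identifier 0 and all other macroblocks receive identifier 1. The function name and the rectangular, macroblock-aligned shape of R are assumptions made for the example. The 22×18 grid corresponds to a 352×288 frame divided into 16×16 macroblocks.

```python
def explicit_slice_group_map(width_mbs, height_mbs, region):
    """Return a height_mbs x width_mbs grid of slice group identifiers.

    region = (left, top, right, bottom) in macroblock units, inclusive.
    A rectangular region R is an assumption of this sketch.
    """
    left, top, right, bottom = region
    return [[0 if left <= x <= right and top <= y <= bottom else 1
             for x in range(width_mbs)]
            for y in range(height_mbs)]

# 352x288 frame -> 22 x 18 macroblocks; R covers columns 2..5 and rows 3..6.
group_map = explicit_slice_group_map(22, 18, (2, 3, 5, 6))
for row in group_map[2:8]:
    print("".join(str(g) for g in row))
```

With this map, slice group 0 holds the user-specified region and slice group 1 holds the remaining region.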

5.3 Recovery of Privacy Information

In this section, the proposed process for recovery of privacy information is introduced. In Section 5.3.1, the proposed idea is described, and the process for recovery of privacy information is presented in Section 5.3.2.

Figure 5.1 Types of multiple slice groups. The numbers in these figures are the slice group identifiers. There is another type, Type 6 - explicit mapping, which is entirely user-defined. (a) Type 0 - interleaved mapping (three slice groups). (b) Type 1 - dispersed mapping (three slice groups). (c) Type 2 - foreground and background mapping (four slice groups). (d) Type 3 - box-out mapping (two slice groups). (e) Type 4 - raster mapping (two slice groups). (f) Type 5 - wipe mapping (two slice groups).

5.3.1 Proposed Idea

We use a secret key to extract the coordinate information, the motion vectors, and the frequency coefficients of each macroblock of a replaced region R′ from an input privacy video. Once the privacy information in R′ needs to be recovered, the extracted information is used to recover the video contents in R′.
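The role of the secret key in this extraction can be sketched in Python as follows. The sketch assumes a hypothetical record layout that is not taken from this study: each macroblock contributes two 8-bit motion vector components and one 12-bit DC coefficient in two's complement, written in a key-seeded random order. Python's `random.Random` stands in for the random number generator, and the coordinate information is omitted.

```python
import random

def from_bits(bits):
    """Decode a two's-complement bit string into a signed integer."""
    value = int(bits, 2)
    if bits[0] == "1":
        value -= 1 << len(bits)
    return value

def recover_records(bitstring, key, n, mv_bits=8, dc_bits=12):
    """Return [((dx, dy), dc), ...] indexed by macroblock position in R'.

    The field widths and the record layout are assumptions of this sketch.
    """
    # Regenerate the hiding order from the key, as on the embedding side.
    order = random.Random(key).sample(range(n), n)
    size = 2 * mv_bits + dc_bits
    records = [None] * n
    for slot, k in enumerate(order):
        chunk = bitstring[slot * size:(slot + 1) * size]
        dx = from_bits(chunk[:mv_bits])
        dy = from_bits(chunk[mv_bits:2 * mv_bits])
        dc = from_bits(chunk[2 * mv_bits:])
        records[k] = ((dx, dy), dc)
    return records
```

A receiver that uses a different key generates a different order and assigns the extracted values to the wrong macroblocks, which is the purpose of randomizing the hiding order.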

5.3.2 Process for Recovery of Privacy Information

In the proposed process for removing privacy information, we have mentioned that the replaced region and the remaining region are in different slice groups. We call the slice group of the replaced region the privacy slice, and the slice group of the remaining region the remaining slice. Because of the slice group identifiers, during an H.264 decoding process, the privacy slice is decoded first and then the remaining slice.

Therefore, there are two phases in the proposed process for recovery of privacy information in an input video. The first is to extract the decoding information of the
