
Chapter 2 Background Knowledge

2.4 Fast Motion Estimation Algorithms

2.4.4 Modified Window Follower Algorithm (MWFA)

The window follower algorithm (WFA) [26] uses the maximum MV displacement in the previous frame plus one unit as the search range for the current frame. The algorithm is as follows.

Window Follower Algorithm [27]

Step 1: For the kth frame, compute the maximum horizontal and vertical displacement over all MVs in the (k−1)th frame. The maximum value D is defined in equation (2.15), where d^t is the maximum displacement of the two components of the MV of the tth block, as defined in equation (2.16).

$$D = \max_{t \in \text{frame } k-1}\left[d^{t}\right] \qquad (2.15)$$

$$d^{t} = \max\left[\,|MV_x^{t}|,\ |MV_y^{t}|\,\right] \qquad (2.16)$$

Step 2: Perform motion estimation for the kth frame with search range P = D + 1. For the first frame, P is set to the maximum search range.
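For illustration, the two steps can be sketched in Python. This is a minimal sketch assuming integer-pel MVs given as (mvx, mvy) pairs; the function name is hypothetical, and clipping P to the maximum search range is an added assumption, since WFA as described above does not state an upper limit.

```python
def wfa_search_range(prev_frame_mvs, max_search_range):
    """Search range P for frame k under the window follower algorithm (WFA).

    prev_frame_mvs: integer-pel (mvx, mvy) pairs, one per block of frame k-1,
    or None when frame k is the first frame.
    """
    if prev_frame_mvs is None:
        # First frame: no previous motion field, so use the maximum search range.
        return max_search_range
    # Equations (2.15)/(2.16): d^t = max(|MVx|, |MVy|), D = max over all blocks.
    d_max = max(max(abs(mvx), abs(mvy)) for mvx, mvy in prev_frame_mvs)
    # Step 2: P = D + 1, clipped to the maximum search range (assumption).
    return min(d_max + 1, max_search_range)


print(wfa_search_range(None, 16))               # 16
print(wfa_search_range([(1, -3), (2, 0)], 16))  # 4
```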

WFA assumes that [26]:

(1) The change of motion content between frames is gradual and not sudden.

(2) The motion content is constant over a large number of successive frames.

However, motion in natural video sequences is diverse and hard to predict, so the assumptions of WFA often do not hold. MWFA [27] modifies WFA by exploiting both temporal and spatial information and by adopting SAD as a measure of MV accuracy. The MWFA algorithm is as follows.

Modified Window Follower Algorithm [28]

Step 1: For the kth frame, compute the displacement D as defined in WFA.

Step 2: Perform motion estimation for each block in the kth frame, using search range P_t for the tth block. P_t is determined by the following mutually exclusive rules.

(1) If SAD_min^(t−1) ≥ TH1: P_t = P_max and F = 1.

(2) If TH2 < SAD_min^(t−1) < TH1: P_t = max(D, d^(t−1)) + 1 if F = 1; P_t = D + 1 if F = 0.

(3) If SAD_min^(t−1) ≤ TH2: P_t = max(D, d^(t−1)) if F = 1; P_t = D if F = 0.

Here SAD_min^(t−1) and d^(t−1) denote the minimum SAD and the maximum MV displacement of the (t−1)th block in the current frame, respectively. The flag F is reset to zero at the beginning of each frame. While F is zero, only temporal information (D) is considered; once F is set to one, both temporal and spatial information (d^(t−1)) are taken into account. The thresholds TH1 and TH2 are set to 4096 and 2048, respectively, based on simulations of typical video sequences.
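The per-block rules can be sketched in Python as follows. This is a minimal sketch, assuming integer-pel displacements and reading rule (2) as TH2 < SAD_min^(t−1) < TH1 so that the three rules do not overlap; the function name and arguments are hypothetical, and the first block of a frame, which has no (t−1)th block, is left to the caller.

```python
TH1 = 4096  # at or above this SAD, fall back to the maximum search range
TH2 = 2048  # at or below this SAD, the "+1" margin is dropped


def mwfa_block_range(sad_min_prev, d_prev, D, flag, p_max):
    """Return (P_t, F) for block t under MWFA.

    sad_min_prev: minimum SAD of block t-1 in the current frame
    d_prev:       max(|MVx|, |MVy|) of block t-1
    D:            maximum MV displacement of the previous frame (as in WFA)
    flag:         F, reset to 0 at the start of every frame
    """
    if sad_min_prev >= TH1:             # rule (1): poor match, search widely
        return p_max, 1
    if sad_min_prev > TH2:              # rule (2): TH2 < SAD < TH1
        p = max(D, d_prev) + 1 if flag else D + 1
        return p, flag
    p = max(D, d_prev) if flag else D   # rule (3): SAD <= TH2
    return p, flag


print(mwfa_block_range(5000, 3, 2, 0, 16))  # (16, 1)
print(mwfa_block_range(3000, 5, 2, 1, 16))  # (6, 1)
print(mwfa_block_range(1000, 5, 2, 0, 16))  # (2, 0)
```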

Chapter 3 Content-Aware Fast Motion Estimation Algorithm

In this chapter, we present our proposed Content-Aware Fast Motion Estimation Algorithm (CAFME), which consists of SDSR, SEAIF, and ETA. First, Section 3.1 presents observations and analyses of the search range in motion estimation. The simple dynamic search range algorithm (SDSR) and SEA with integral frame (SEAIF) are presented in Sections 3.2 and 3.3, respectively. Finally, our early termination algorithm (ETA) is given in Section 3.4.

3.1 Analysis of Search Range

In this section, we explore the relationships among the parameters involved in motion estimation. Since adjusting the search range requires information about the video and the encoder, these relationships can guide the design of a good algorithm. We conducted experiments to analyze the relationships between the search range (SR) and the frame rate, frame resolution, motion activity, quantization parameter (QP), and SAD of the best-matched block. The experimental environment is as follows:

Platform: H.264/AVC reference software JM 9.4 [32]
Machine: Athlon XP 1700+ with 512 MB memory
Profile: baseline
Level: 3.0
Block matching algorithm (BMA): full search
Group of pictures (GOP): 15
Frame structure: IPPP
Number of reference frames: 1
Hadamard transform: enabled
All block sizes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8, and 4x4): enabled
Rate-distortion optimization (RDO): enabled
Fast ME (UMHexagonS) [33]: disabled
Fast mode selection [34]: disabled
Rate control (RC): disabled
Observed block: 16x16 MB

3.1.1 Search Range and Frame Rate

Since the frame rate affects the difference between successive frames, we observe the relationship between SR and frame rate. The test data are the foreman sequence at FPS=30 and FPS=15, whose temporal distances between frames are 1/30 second and 1/15 second, respectively. In theory, a higher frame rate requires a smaller search range, because an object moves a shorter distance between consecutive frames.

The quantization parameter (QP) is mapped to a quantization step size and affects the bitrate significantly. In our experiments, the QP is fixed and RC is disabled, so we only need to observe the bitrate for different search ranges: if the SR is too small to find the true MVs, the prediction residual and hence the bitrate increase. In Table 3-1, the gray areas indicate that the bitrates are stable, i.e., that the search ranges are large enough to find the true MVs. The bitrates are approximately stable when SR ≥ 4 at FPS=30 and when SR ≥ 8 at FPS=15. These results show that a larger search range is required to find the true MV when the frame rate is lower, which conforms to the theory.

Table 3-1 The relation between SR and FPS (Foreman, QCIF 176 x 144, QP=36; FPS=30: 100 frames; FPS=15: 50 frames)

3.1.2 Search Range and Frame Resolution

We test the coastguard sequence at QCIF and CIF resolutions. In Table 3-2, the gray areas indicate that the bitrate changes only slightly when the SR ranges from 2 to 32 at QCIF resolution and from 4 to 32 at CIF resolution. These observations show that SR ≥ 2 is large enough to find the true MVs at QCIF resolution, while SR ≥ 4 is needed at CIF resolution. This is expected, since the same motion spans twice as many pixels in each dimension at CIF (352 x 288) as at QCIF (176 x 144). Therefore, we conclude that the search range should be increased adaptively for larger resolutions.

Table 3-2 The relation between SR and resolution (Coastguard, QP=36, FPS=30, 90 encoded frames)

      QCIF (176 x 144)              CIF (352 x 288)
SR    SNR (dB)   Bitrate (Kbps)     SNR (dB)   Bitrate (Kbps)
1     29.107     76.120             29.225     420.099
0     28.799     113.893            28.920     639.923

3.1.3 Search Range and Motion Activity

We divide the foreman sequence into two parts, representing low and high motion. The first part consists of the first 90 frames (frames 0 to 89), and the second part consists of frames 151 to 240. In Table 3-3, the bitrates are approximately stable when SR ≥ 4 in the low motion part and when SR ≥ 8 in the high motion part. As expected, the search range should be increased adaptively for high motion sequences.

Table 3-3 The relation between SR and motion activity (Foreman, QCIF, QP=36, FPS=30)

      Frames 0~89 (low motion)      Frames 151~240 (high motion)
SR    SNR (dB)   Bitrate (Kbps)     SNR (dB)   Bitrate (Kbps)
32    31.478     69.528             31.245     84.747
16    31.469     69.365             31.242     84.949
8     31.448     69.400             31.223     85.307
4     31.443     69.224             31.166     89.931
2     31.424     69.771             31.024     112.832
1     31.406     71.155             30.954     130.109
0     31.227     79.122             30.633     185.883

3.1.4 Search Range, QP, and SAD of Best-matched Block

In block matching, SAD is used as the matching criterion. If the SR is too small, the true MV may not be found, and the SAD of the best-matched block will be large. The QP also clearly affects the SAD, so this experiment considers both factors. In Table 3-4, the field "SAD best average" is the average SAD of the best-matched blocks, i.e., the blocks that yield the minimum rate-distortion cost in the H.264/AVC encoder. The results show that the true MVs can be found as long as SR ≥ 8, regardless of QP, while the QP affects only the magnitude of the SAD. Figure 3-1 also shows the SAD best average frame by frame. In the foreman sequence, the motion from frame 150 to 220 is higher than in the rest of the sequence, so SR ≤ 4 is not large enough to find the true MVs there.
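Why a too-small SR inflates the best SAD can be reproduced with a minimal integer-pel full search in Python. The sketch assumes grayscale frames stored as lists of rows and a single square block size; the function names and the synthetic frames (the current frame is the reference shifted so that the true MV is (3, 2)) are illustrative and not taken from JM 9.4.

```python
def block_sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block at (bx, by) in cur and the block displaced by (dx, dy) in ref."""
    return sum(
        abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
        for j in range(n)
        for i in range(n)
    )


def full_search(cur, ref, bx, by, n, sr):
    """Exhaustive search over all integer displacements in [-sr, sr]^2.

    Returns (best_sad, (dx, dy)); candidates falling outside ref are skipped.
    """
    h, w = len(ref), len(ref[0])
    best = (float("inf"), (0, 0))
    for dy in range(-sr, sr + 1):
        for dx in range(-sr, sr + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + n > w or y + n > h:
                continue
            s = block_sad(cur, ref, bx, by, dx, dy, n)
            if s < best[0]:
                best = (s, (dx, dy))
    return best


# Synthetic frames: the 8x8 block at (8, 8) has true MV (3, 2).
W = H = 32
ref = [[(17 * x + 31 * y + 7 * x * y) % 256 for x in range(W)] for y in range(H)]
cur = [[ref[min(y + 2, H - 1)][min(x + 3, W - 1)] for x in range(W)] for y in range(H)]
print(full_search(cur, ref, 8, 8, 8, 4))         # (0, (3, 2))
print(full_search(cur, ref, 8, 8, 8, 2)[0] > 0)  # True
```

With SR = 4 the search reaches the true MV and the SAD drops to zero; with SR = 2 the best candidate inside the window still has a nonzero SAD.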

Table 3-4 The relation between SR, QP, and SAD (Foreman, QCIF, 300 frames, FPS=30)

SAD best average
SR    QP=18    QP=24    QP=30    QP=36
32    921.2    1024.0   1221.3   1577.5
16    924.7    1027.8   1226.0   1584.9
8     942.9    1045.6   1244.6   1605.3
4     1068.8   1165.5   1353.8   1702.9
2     1252.9   1344.9   1524.4   1860.9
1     1413.7   1503.7   1585.9   2001.9
0     1831.9   1911.4   2081.7   2330.1

Figure 3-1 SR and motion activity in foreman QCIF frame by frame (Foreman, QCIF, QP=36)

In summary, if we can find, for each frame or block, a search range just large enough for the true MVs to be found, the local minimum problem can be avoided and the computational cost of motion estimation can be reduced dramatically, since the number of candidate positions grows with the square of the SR.

Our experiments show that the search range should be adapted to the motion activity of the video and to the encoder parameters. However, the encoder parameters can only be used by comparing them with one another. Hence, we develop SDSR to adjust the SR dynamically based on motion activity.

3.2 Simple Dynamic Search Range (SDSR)

In this section, we present our proposed Simple Dynamic Search Range algorithm (SDSR). Several approaches for adjusting the search range in motion estimation have already been proposed, such as DSWA [21], AFSBM [22], MWFA [27], and MAS [28]. These approaches can be classified as either block-matching-error based or motion-vector based.

The block matching error is usually measured by MSD, MAD, or SAD, and represents the degree of matching between the current block and a candidate block. Its value depends on many factors, including motion activity, texture, and the quantization parameter. Figure 3-2 gives an example: from frame 220 onward, the SAD values are much higher than in the rest of the sequence because of complicated texture, not motion activity, whereas from frame 150 to 170 the SAD values rise sharply because of a sudden change in motion rather than texture. Consequently, approaches based on block matching error are usually unsuitable for evaluating motion activity.

In contrast, motion vectors represent motion activity more precisely [28]. For this reason, our proposed approach is based on motion vector information. Because motion activity varies widely across video sequences and differs among regions within a single frame, we adjust the search range at both the frame level and the block level, based on the temporal and spatial correlation of the motion field, respectively.

Figure 3-2 SAD of foreman CIF frame by frame

The proposed Simple Dynamic Search Range algorithm is described as follows.

Simple Dynamic Search Range Algorithm

Step 1: Determine the search range at the frame level. The frame-level search range SR_FRAME^k is the maximum horizontal and vertical displacement over all MVs in the (k−1)th frame plus one unit, defined as:

$$\mathrm{SR\_FRAME}^{k} = \max_{t \in \{\text{all blocks in the }(k-1)\text{th frame}\}} \max\left[\,|MV_x^{t}|,\ |MV_y^{t}|\,\right] + 1 \qquad (3.1)$$

Step 2: Adjust the search range at the macroblock level. Let MV_MAX^t denote the maximum displacement over the two MV components of the neighbor blocks s of the tth block, where

s ∈ {left, above-left, above, above-right blocks of the tth block}

If any of the neighbor blocks is not available (the maximum over s is then taken over the available neighbor blocks):

$$\mathrm{MV\_MAX}^{t} = \max\left[\max_{s}\max\left[\,|MV_x^{s}|,\ |MV_y^{s}|\,\right],\ \mathrm{SR\_FRAME}^{k}\right]$$

Else:

$$\mathrm{MV\_MAX}^{t} = \max_{s}\max\left[\,|MV_x^{s}|,\ |MV_y^{s}|\,\right]$$

Step 3: Determine the final search range SR_BLOCK^t for the tth block by the following rules.

SR_BLOCK^t = SR_FRAME^k
// Adjust SR in block level
If (MV_MAX^t ≥ SR_FRAME^k)
    SR_BLOCK^t = MV_MAX^t
If (SR_BLOCK^t > max search range)
    SR_BLOCK^t = max search range

Because the search window is centered at the predicted MV, which is not necessarily the zero MV, the displacement of the resulting MV may be larger than the SR. Hence, the frame-level SR may increase by more than one unit between frames. The block-level adjustment ensures that the SR is large enough to find the true MV.
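Steps 1 to 3 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: MVs are integer-pel (mvx, mvy) pairs, the first frame uses the maximum search range, an unavailable neighbor is passed as None, and the block-level rule is read as keeping SR_FRAME^k unless MV_MAX^t is at least as large, then clipping to the maximum search range. The function names are hypothetical.

```python
def frame_search_range(prev_frame_mvs, max_search_range):
    """SR_FRAME^k from equation (3.1); the first frame uses the maximum range (assumption)."""
    if not prev_frame_mvs:
        return max_search_range
    return max(max(abs(mvx), abs(mvy)) for mvx, mvy in prev_frame_mvs) + 1


def block_search_range(neighbor_mvs, sr_frame, max_search_range):
    """SR_BLOCK^t for one block.

    neighbor_mvs: MVs of the left, above-left, above and above-right blocks,
    with None for a neighbor that is not available.
    """
    available = [mv for mv in neighbor_mvs if mv is not None]
    mv_max = max((max(abs(mvx), abs(mvy)) for mvx, mvy in available), default=0)
    if len(available) < len(neighbor_mvs):
        # Step 2: some neighbor is missing, so SR_FRAME^k also enters MV_MAX^t.
        mv_max = max(mv_max, sr_frame)
    # Step 3 (assumed reading): enlarge to MV_MAX^t when it reaches SR_FRAME^k,
    # then clip to the maximum search range.
    sr_block = mv_max if mv_max >= sr_frame else sr_frame
    return min(sr_block, max_search_range)


sr_frame = frame_search_range([(1, -3), (2, 0)], 16)                       # 4
print(block_search_range([(5, 1), (0, 0), (2, 2), (1, 1)], sr_frame, 16))  # 5
print(block_search_range([None, None, (0, 1), None], sr_frame, 16))        # 4
```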

Note that in H.264/AVC, a neighbor block of the current block may not be a complete 16x16 macroblock, as shown in Figure 3-3.

Figure 3-3 Current and neighbor blocks (variable block size; e.g., neighbor blocks D and B of size 8x4)

3.3 Successive Elimination Algorithm with Integral Frame (SEAIF)

SEA and the integral frame technique were introduced in Sections 2.4.2 and 2.3, respectively. In this section, we integrate them into a new SEA, called SEAIF, for the H.264/AVC standard. In H.264/AVC, rate-distortion optimization (RDO) is recommended for mode selection. The modes include nine intra modes and seven inter modes (see Figure 2-2). In inter coding, a total of 41 motion estimations is required for a 16x16 macroblock when RDO is enabled (one for 16x16, two for 16x8, two for 8x16, four for 8x8, eight for 8x4, eight for 4x8, and sixteen for 4x4). Therefore, the ME cost increases dramatically.

To reduce the intensive computation caused by RDO, the H.264/AVC reference software JM 9.4 [32] implements a Fast Full Pel Search algorithm that reuses the SAD values of the smallest (4x4) blocks. Before a new macroblock is motion estimated, it computes the SAD values of all 4x4 blocks at all search points within the search window, and then merges them to obtain the SAD values of larger blocks. In this way, computing the SAD for a macroblock with all block sizes enabled costs about the same as computing the SAD for a single 16x16 block.

We adopt the concept of reusing SAD values and integrate it into our proposed SEAIF, whose main idea for H.264/AVC is to reuse both sea values and SAD values. The following subsections present the design in detail: Sections 3.3.1 and 3.3.2 present the techniques for reusing sea and SAD values, Section 3.3.3 presents the spiral search pattern used by SEAIF, and Section 3.3.4 analyzes the complexity of SEAIF.

3.3.1 Reuse of sea Values

For each search point, the sea values of the sixteen 4x4 blocks of the current macroblock are calculated using the integral frame technique. These 4x4 sea values serve as the basis from which the sea values of larger blocks are derived, as follows.

For 8x4 or 4x8 block, sum up sea values of two 4x4 blocks.

For 8x8 block, sum up sea values of two 8x4 blocks.

For 16x8 or 8x16 block, sum up sea values of two 8x8 blocks.

For 16x16 block, sum up sea values of two 16x8 blocks.

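In this way, the sea values of all blocks are obtained. Since |Σ a_i| ≤ Σ |a_i|, the sea value of a larger block derived from its 4x4 sea values is always equal to or larger than the sea value computed directly from the block sums (BS) of the whole block. Moreover, because each 4x4 sea value is a lower bound of the SAD of that 4x4 block, their sum is still a lower bound of the SAD of the larger block. Therefore, the sea values of larger blocks derived from 4x4 sea values are tighter lower bounds of the SAD, and more SAD computations can be skipped.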
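The merging hierarchy and the lower-bound chain it relies on can be checked with a small Python sketch. To stay short, it computes block sums by direct summation rather than from an integral frame, covers only the 4x4 → 8x4 → 8x8 → 16x8 → 16x16 chain (4x8 and 8x16 would be merged analogously), and uses synthetic frames; all names are illustrative.

```python
def block_sum(frame, x, y, w, h):
    """Sum of the pixel values of the w x h block whose top-left corner is (x, y)."""
    return sum(frame[y + j][x + i] for j in range(h) for i in range(w))


def sea_values(cur, ref, bx, by, dx, dy):
    """sea values of the 16x16 MB at (bx, by) for candidate (dx, dy).

    The sixteen 4x4 sea values |BS_cur - BS_ref| are merged bottom-up:
    4x4 -> 8x4 -> 8x8 -> 16x8 -> 16x16.
    """
    sea4 = [[abs(block_sum(cur, bx + 4 * c, by + 4 * r, 4, 4)
                 - block_sum(ref, bx + dx + 4 * c, by + dy + 4 * r, 4, 4))
             for c in range(4)] for r in range(4)]
    sea8x4 = [[sea4[r][2 * c] + sea4[r][2 * c + 1] for c in range(2)] for r in range(4)]
    sea8x8 = [[sea8x4[2 * r][c] + sea8x4[2 * r + 1][c] for c in range(2)] for r in range(2)]
    sea16x8 = [sea8x8[r][0] + sea8x8[r][1] for r in range(2)]
    return {"4x4": sea4, "8x4": sea8x4, "8x8": sea8x8,
            "16x8": sea16x8, "16x16": sea16x8[0] + sea16x8[1]}


W = H = 32
ref = [[(37 * x + 91 * y + 13 * x * y) % 256 for x in range(W)] for y in range(H)]
cur = [[(53 * x + 29 * y + 7 * (x ^ y)) % 256 for x in range(W)] for y in range(H)]
bx = by = 8
dx, dy = 2, -1
merged = sea_values(cur, ref, bx, by, dx, dy)["16x16"]
direct = abs(block_sum(cur, bx, by, 16, 16) - block_sum(ref, bx + dx, by + dy, 16, 16))
sad = sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
          for j in range(16) for i in range(16))
print(direct <= merged <= sad)  # True
```

The final check holds for any frames and any candidate, because it follows from the triangle inequality.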

3.3.2 Reuse of SAD Values

In SEAIF, a complete SAD calculation is performed only if the sea value is less than the current minimum SAD; otherwise, the candidate is skipped. In H.264/AVC, the blocks of different sizes within a macroblock overlap in motion estimation. To avoid computing the same pixel differences repeatedly, we use the 4x4 block SAD values as the basis of the SAD values of larger blocks, as follows.

Reusing SAD value algorithm

Regardless of block size, the SAD of a block is calculated as follows:

Step 1: Find all 4x4 blocks within the block.

Step 2: Check the SAD values of these 4x4 blocks. If the SAD value of any of these 4x4 blocks is not yet available, compute it.

Step 3: Obtain the SAD value of the target block by adding up the SAD values of these 4x4 blocks.

In this way, there is no redundant computation of SAD.
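A lazily filled 4x4 SAD table is one way to realize the three steps above. In this minimal Python sketch, which is an illustration rather than the JM 9.4 or SEAIF source, the class name, the dictionary key layout, and the `computed` counter are hypothetical; offsets and sizes must be multiples of 4.

```python
class Sad4x4Cache:
    """Lazily computed 4x4 SADs of one 16x16 macroblock, shared by all block sizes."""

    def __init__(self, cur, ref, mb_x, mb_y):
        self.cur, self.ref = cur, ref
        self.mb_x, self.mb_y = mb_x, mb_y
        self.table = {}    # (dx, dy, col, row) -> SAD of that 4x4 block
        self.computed = 0  # number of 4x4 SADs actually evaluated

    def _sad4(self, dx, dy, col, row):
        key = (dx, dy, col, row)
        if key not in self.table:  # Step 2: compute only if not yet available
            x, y = self.mb_x + 4 * col, self.mb_y + 4 * row
            self.table[key] = sum(
                abs(self.cur[y + j][x + i] - self.ref[y + dy + j][x + dx + i])
                for j in range(4) for i in range(4))
            self.computed += 1
        return self.table[key]

    def sad(self, dx, dy, off_x, off_y, w, h):
        """SAD of the w x h sub-block at (off_x, off_y) inside the MB, for candidate (dx, dy)."""
        # Step 1: enumerate the covered 4x4 blocks; Step 3: add up their SADs.
        return sum(self._sad4(dx, dy, col, row)
                   for row in range(off_y // 4, (off_y + h) // 4)
                   for col in range(off_x // 4, (off_x + w) // 4))


W = H = 32
ref = [[(37 * x + 91 * y + 13 * x * y) % 256 for x in range(W)] for y in range(H)]
cur = [[(53 * x + 29 * y + 7 * (x ^ y)) % 256 for x in range(W)] for y in range(H)]
cache = Sad4x4Cache(cur, ref, 8, 8)
full = cache.sad(1, 1, 0, 0, 16, 16)
halves = cache.sad(1, 1, 0, 0, 16, 8) + cache.sad(1, 1, 0, 8, 16, 8)
print(full == halves, cache.computed)  # True 16
```

Evaluating the 16x16 SAD and then both 16x8 halves at the same candidate computes each 4x4 SAD only once.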

3.3.3 Spiral Search

In JM 9.4, the spiral search used by full search does not actually follow a spiral shape. Therefore, we modified it into a true spiral shape; the two patterns are shown in Figure 3-4 and Figure 3-5, respectively. Chapter 4 will show the

Figure 3-4 Spiral search in JM 9.4

Figure 3-5 Real spiral search pattern
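A true spiral order can be generated by walking outward from the search center with leg lengths 1, 1, 2, 2, 3, 3, and so on; for an odd-sized square window this visits every position exactly once, ring by ring. Visiting candidates near the center first presumably lets SEAIF find a small minimum SAD early, so that more candidates are pruned. The sketch below is a hypothetical illustration: its starting direction (right, then down, with y growing downward) is an arbitrary choice and may differ from Figure 3-5.

```python
def spiral_order(sr):
    """Search offsets (dx, dy) in [-sr, sr]^2, visited in a square spiral from the center.

    Walks right 1, down 1, left 2, up 2, right 3, ... (y grows downward);
    the first (2r + 1)^2 offsets cover exactly the square of radius r.
    """
    total = (2 * sr + 1) ** 2
    directions = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # right, down, left, up
    x = y = 0
    points = [(0, 0)]
    step, d = 1, 0
    while len(points) < total:
        for _ in range(2):  # each leg length is used for two legs
            dx, dy = directions[d % 4]
            for _ in range(step):
                x, y = x + dx, y + dy
                if len(points) < total:
                    points.append((x, y))
            d += 1
        step += 1
    return points


print(spiral_order(1))
# [(0, 0), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]
```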

3.3.4 Analysis of complexity

The purpose of adopting SEA is to reduce the computational cost of block matching. However, the overhead of SEA, which consists mainly of computing block sums, must also be considered and analyzed. In SEA [16], Salari et al. proposed a fast algorithm to compute the block sums. We compare three approaches and analyze their overhead as follows.

Let W denote image width, H image height, M block width, and N block height. Operations required for block sums of all M x N blocks in a reference frame are:

Straightforward approach:

Number of block sums in a frame: (W − M + 1)(H − N + 1)

Operations required for a block sum: MN − 1

Total cost: (MN − 1)(W − M + 1)(H − N + 1)

Approximate cost: MNWH

SEA approach in [16]:

Total cost: 4WH − (H − N)(M + 3) − 3W(N + 1)

Approximate cost: 4WH

Integral frame approach:

Operations required for an integral frame: 2WH

Operations required for all block sums: ≈ 2(W − M + 1)(H − N + 1) ¹

Total cost: 2WH + 2(W − M + 1)(H − N + 1)

Approximate cost: 4WH
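The three cost expressions can be evaluated for a concrete case, for example a QCIF frame (W = 176, H = 144) with 16x16 blocks. The helper below simply transcribes the formulas above into Python; its name is hypothetical.

```python
def block_sum_costs(W, H, M, N):
    """Operation counts for all M x N block sums of a W x H frame, per the formulas above."""
    positions = (W - M + 1) * (H - N + 1)  # number of block positions in the frame
    return {
        "straightforward": (M * N - 1) * positions,
        "sea_[16]": 4 * W * H - (H - N) * (M + 3) - 3 * W * (N + 1),
        "integral_frame": 2 * W * H + 2 * positions,
    }


print(block_sum_costs(176, 144, 16, 16))
# {'straightforward': 5296095, 'sea_[16]': 89968, 'integral_frame': 92226}
```

Both the SEA approach in [16] and the integral frame approach stay below the 4WH approximation (101,376 operations here), compared with about 5.3 million operations for the straightforward approach.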

Although the integral frame approach and the SEA approach in [16] have approximately the same complexity, the integral frame approach has one advantage: it can flexibly obtain the block sum of any rectangular block.
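A minimal integral frame sketch in Python, assuming frames stored as lists of rows: the integral frame is built with two additions per pixel, matching the 2WH count above, and the block sum of any rectangle then needs three additions/subtractions. The names `integral_frame` and `rect_sum` are illustrative.

```python
def integral_frame(frame):
    """(H+1) x (W+1) integral frame: I[y][x] = sum of frame[0..y-1][0..x-1].

    Each pixel costs two additions (running row sum + value from the row above).
    """
    h, w = len(frame), len(frame[0])
    I = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += frame[y][x]
            I[y + 1][x + 1] = I[y][x + 1] + row_sum
    return I


def rect_sum(I, x, y, w, h):
    """Block sum of any w x h rectangle with top-left corner (x, y)."""
    return I[y + h][x + w] - I[y][x + w] - I[y + h][x] + I[y][x]


frame = [[(3 * x + 5 * y) % 17 for x in range(8)] for y in range(6)]
I = integral_frame(frame)
print(rect_sum(I, 2, 1, 4, 3) == sum(frame[y][x] for y in range(1, 4) for x in range(2, 6)))  # True
```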

For example, if we want to use multilevel SEA for each block size in H.264/AVC, the implementation is easier with the integral frame approach. (Note that our approach uses the tighter lower bound in SEA, not multilevel SEA.) Computing the msea value of a 16x16 block at level L=0 needs only 5 operations (3 to obtain the BS, 1 subtraction, and 1 absolute value). In contrast, merging sixteen 4x4 sea values into the sea value of a 16x16 block needs 15 additions, but yields a tighter lower bound. The trade-off is therefore between the tightness of the lower bound and the computational complexity.

3.4 Early Termination Algorithm (ETA)

In this section, we present our proposed Early Termination Algorithm (ETA) in detail. In [29], Siou-Shen Lin et al. introduce the variance of motion vectors. They report a probability of about 79% on average when the variance of the motion vectors of the current block and its neighbor blocks is smaller than 3, and consider that when the variance of the motion vectors in the neighbor blocks is small, the current block and its neighbor blocks are highly likely to belong to the same object.

We adopt and modify the variance of motion vectors proposed in [29] to classify the motion activity of the current block and its neighbor blocks as either simple motion or complex motion. The variance of motion vectors is defined in equation (3.3).

¹ In [4], Viet Anh Nguyen and Yap-Peng Tan proposed a fast approach to calculate block sums by exploiting the adjacency of the blocks.

( ) / 4

If any of the neighbor blocks is not available, MVvar is set to a large value (999999). For accuracy, we compare MVvar with 5 instead of 3 to classify the motion activity, as shown in equation (3.4).

If (MVvar ≦ 5)

Mactivity = simple_motion (3.4)

Else

Mactivity = complex_motion

If the motion activity is simple, we consider the current block and its neighbor blocks to belong to the same object; otherwise, they are considered not to belong to the same object. The SAD values of blocks within the same object should be similar, whereas those of blocks in different objects may differ considerably. Based on this concept, the threshold for the termination condition is determined by equation (3.5).

If (Mactivity == simple_motion)

SAD_threshold = SAD_prediction

Else

SAD_threshold = SAD_prediction − SAD_standard_deviation    (3.5)

SAD_prediction and SAD_standard_deviation denote the predicted SAD of the current block and the standard deviation of the SADs of all blocks in the previous frame, respectively. They are defined in equations (3.6) and (3.8):

( ) / 4

SAD_t is the SAD value of the tth block in a frame, and Number_MB is the total number of MBs in a frame. If no neighbor block of the current block is available, SAD_prediction is set to a small value (−999999), so the termination condition can never be met. Note that SAD_prediction and SAD_standard_deviation are calculated for 16x16 macroblocks, whereas the H.264/AVC standard uses seven block sizes in motion estimation.
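Equations (3.4), (3.5), and (3.8) can be sketched in Python as follows. MVvar and SAD_prediction are taken as inputs because they depend on the neighbor-block definitions of equations (3.3) and (3.6); the population form of the standard deviation (division by Number_MB) is an assumption, and the function names are hypothetical.

```python
import math

SIMPLE_MOTION_MAX_VAR = 5  # equation (3.4): MVvar <= 5 -> simple motion


def sad_standard_deviation(prev_frame_sads):
    """Standard deviation of the 16x16 SADs of all MBs in the previous frame.

    Assumes the population form, i.e. division by Number_MB.
    """
    n = len(prev_frame_sads)
    mean = sum(prev_frame_sads) / n
    return math.sqrt(sum((s - mean) ** 2 for s in prev_frame_sads) / n)


def sad_threshold(mv_var, sad_prediction, sad_std):
    """Equation (3.5) for a 16x16 MB.

    The defaults for unavailable neighbors (MVvar = 999999, SAD_prediction =
    -999999) pass through unchanged and yield a negative, never-met threshold.
    """
    if mv_var <= SIMPLE_MOTION_MAX_VAR:  # simple motion: same object as neighbors
        return sad_prediction
    return sad_prediction - sad_std      # complex motion: stricter threshold


sd = sad_standard_deviation([2, 4, 4, 4, 5, 5, 7, 9])
print(sd, sad_threshold(3, 1000, sd), sad_threshold(12, 1000, sd))  # 2.0 1000 998.0
```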

We determine the SAD_prediction and SAD_standard_deviation for other block size according to the area occupied by the block. The calculations are shown in the following rules.

Adjustment of SAD_prediction and SAD_standard_deviation for the H.264/AVC standard

If (block size == 16x8 or 8x16)
    SAD_prediction = SAD_prediction / 2
    SAD_standard_deviation = SAD_standard_deviation / 2
Else if (block size == 8x8)
    SAD_prediction = SAD_prediction / 4
    SAD_standard_deviation = SAD_standard_deviation / 4
Else if (block size == 8x4 or 4x8)
    SAD_prediction = SAD_prediction / 8
    SAD_standard_deviation = SAD_standard_deviation / 8
Else if (block size == 4x4)
    SAD_prediction = SAD_prediction / 16
    SAD_standard_deviation = SAD_standard_deviation / 16

Finally, the termination condition is tested whenever a new best-matched block is found during the search: if its SAD value is equal to or smaller than SAD_threshold, the motion estimation of the current block is terminated.
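The block-size adjustment and the termination test can be combined into a small Python sketch. Because equation (3.5) is linear in SAD_prediction and SAD_standard_deviation, dividing both by the area factor is equivalent to dividing the 16x16 threshold itself, which is what the hypothetical `block_threshold` helper does. The `cost` callback stands in for the SAD of a candidate, and the candidate order would typically be a spiral order; all names here are illustrative.

```python
# Divisors from the adjustment rules: area of a 16x16 MB over the block area.
AREA_DIVISOR = {(16, 16): 1, (16, 8): 2, (8, 16): 2, (8, 8): 4,
                (8, 4): 8, (4, 8): 8, (4, 4): 16}


def block_threshold(mb_threshold, width, height):
    """Scale a 16x16 SAD_threshold to a (width x height) block."""
    return mb_threshold / AREA_DIVISOR[(width, height)]


def search_with_early_termination(candidates, cost, threshold):
    """Visit candidates in order; stop once a new best has SAD <= threshold.

    Returns (best_mv, best_sad, number_of_points_checked).
    """
    best_mv, best_sad, checked = None, float("inf"), 0
    for mv in candidates:
        checked += 1
        s = cost(mv)
        if s < best_sad:                # a new up-to-date best-matched block
            best_mv, best_sad = mv, s
            if best_sad <= threshold:   # termination is tested only on a new best
                break
    return best_mv, best_sad, checked


sads = {(0, 0): 500, (1, 0): 150, (0, 1): 100, (-1, 0): 50}
thr = block_threshold(800, 8, 8)  # 200.0
print(search_with_early_termination(list(sads), sads.get, thr))  # ((1, 0), 150, 2)
```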

Chapter 4 Experimental Results and Discussions

In this chapter, we present the experimental results of the proposed approaches, including the simple dynamic search range algorithm, the successive elimination algorithm with integral frame, and the early termination algorithm. Finally, the experimental results of the integrated algorithm, called the Content-Aware Fast Motion Estimation Algorithm (CAFME), are presented.

We modified the H.264/AVC reference software JM 9.4 to implement the proposed algorithms. In the experiments, we compare the proposed algorithms with Full Search (FS) and use the number of search points per block to measure their performance.

We also measure the coding efficiency by comparing the bitrates of the encoded sequences at the same quantization parameter with rate control disabled.

In addition, we use the SAD value as a criterion to measure whether the determined search range is
