Organization - Content-Aware Fast Motion Estimation Algorithm

Chapter 1 Introduction

1.3 Organization

The paper is organized as follows. Chapter 2 introduces the related background knowledge, including motion estimation, the integral frame, and related algorithms. Chapter 3 presents the design and development of the Content-Aware Fast Motion Estimation Algorithm. Chapter 4 reports the significant experimental results. Finally, the conclusions and future work are given in Chapter 5.

Table 1-1 Advantages and drawbacks of fast motion estimation algorithms (drawback entries: unsuitable for high motion; coding efficiency degradation; unsuitable for sudden motion change; substantial overhead)

Chapter 2 Background Knowledge

In this chapter, we introduce some background knowledge related to our proposed approaches.

First, we describe block motion estimation and compensation. Second, the matching criteria for motion estimation are described briefly. Next, the integral frame is presented. Finally, some fast motion estimation algorithms are presented in detail.

2.1 Block Motion Estimation and Compensation

Motion estimation and compensation techniques are used to remove the temporal redundancy between inter frames. An ideal approach is to segment the frame into objects, both moving and stationary. However, the segmentation of objects is difficult and impractical. A practical and widely used method of motion compensation is to compensate for the movement of blocks of the current frame. We call this method block-based motion estimation and compensation. Usually the block is a 16x16-pixel region of a frame, called a macroblock (MB). The MB is the basic unit for motion-compensated prediction in many visual coding standards, including MPEG-1, MPEG-2, MPEG-4, H.263, and H.264.

Motion estimation of a macroblock involves finding a 16x16-pixel block in a reference frame that closely matches the current macroblock. The reference frame may be before or after the current frame in display order. An area in the reference frame centered on the search center is searched and the 16x16-pixel block within the search area that minimizes the matching criterion is chosen as the best-matched block. The height and width of the search area are considered as the size of search window as shown in Figure 2-1.

Figure 2-1 Motion estimation (labels: reference frame, current frame, current MB, best-matched block, search range, MV)

The new visual coding standard H.264/AVC introduces the overlapped variable block size to improve coding efficiency. There are seven block sizes, 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, and 4x4, which form the partitions of a 16x16 macroblock depicted in Figure 2-2. When the P8x8 type is considered, the 8x8, 8x4, 4x8, and 4x4 types must be considered for each of the four individual 8x8 sub-blocks. Note that each partition has its own motion vector [30].

Figure 2-2 Different partition sizes in a macroblock (macroblock types: 16×16, 16×8, 8×16, P8×8; partition sizes for a macroblock subtype in P8×8 mode: 8×8, 8×4, 4×8, 4×4)

2.2 Matching Criterion

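To choose the best-matched block, a matching criterion is needed. Mean square difference (MSD), mean absolute difference (MAD), and sum of absolute differences (SAD) are frequently used criteria. They are defined by the following equations.

$$\mathrm{MSD}(f_c, f_r(m, n)) = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left[ f_c(i, j) - f_r(i + m, j + n) \right]^2 \quad (2.1)$$

$$\mathrm{MAD}(f_c, f_r(m, n)) = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left| f_c(i, j) - f_r(i + m, j + n) \right| \quad (2.2)$$

$$\mathrm{SAD}(f_c, f_r(m, n)) = \sum_{i=1}^{M} \sum_{j=1}^{N} \left| f_c(i, j) - f_r(i + m, j + n) \right| \quad (2.3)$$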

M and N are the width and height of the block, respectively. m and n are the horizontal and vertical components of the motion vector, respectively. f_c and f_r are the current and reference blocks, respectively.

MSD, MAD, and SAD all achieve very high accuracy in block matching. SAD, however, needs no multiplication operations. Therefore, SAD is the most popular criterion in the international video coding standards.
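As an illustration of the three criteria, the following minimal Python sketch evaluates them for two small blocks. The function name `block_errors` and the representation of a block as a list of pixel rows are assumptions made for this example, not part of the original text.

```python
def block_errors(cur_block, ref_block):
    """Return (MSD, MAD, SAD) for two equally sized blocks given as lists of rows."""
    diffs = [c - r
             for cur_row, ref_row in zip(cur_block, ref_block)
             for c, r in zip(cur_row, ref_row)]
    count = len(diffs)                         # M * N pixels
    sad = sum(abs(d) for d in diffs)           # additions and absolute values only
    mad = sad / count                          # SAD normalised by the block area
    msd = sum(d * d for d in diffs) / count    # one multiplication per pixel
    return msd, mad, sad


cur = [[10, 12], [14, 16]]
ref = [[11, 12], [10, 19]]
print(block_errors(cur, ref))  # (6.5, 2.0, 8)
```

Because MAD is only SAD divided by the constant block area, both criteria rank candidates identically, which is one reason the cheaper SAD is preferred.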

Unlike other video coding standards, H.264 uses the Lagrange multiplier to compute the rate distortion cost for each partition within a macroblock. The best-matched block is selected by minimizing the following Lagrange cost.

$$J(MV, \lambda_{motion}) = \mathrm{SAD}(f_c, f_r(m, n)) + \lambda_{motion} \cdot \mathrm{Rate}(MV - MV_P) \quad (2.4)$$

MV = (m, n) is the motion vector, MV_P = (m_P, n_P) is the prediction of the motion vector, and λ_motion is the Lagrange multiplier. The function Rate(MV − MV_P) represents the cost of the motion prediction error and is implemented by a lookup table [31].

2.3 Integral Frame

In this section, we introduce the integral frame technique, which is used in our Successive Elimination Algorithm (SEA) to compute the sum of pixel values in a block efficiently. We denote the sum of pixel values in a block as the block sum (BS). Viola et al. [31] proposed the integral frame technique for computing the sum of pixel values within any rectangular area in a frame. Given a video frame f, the integral frame I_f at position (p, q) is the sum of all pixel values of f above and to the left of (p, q), inclusive:

$$I_f(p, q) = \sum_{p' \le p} \sum_{q' \le q} f(p', q')$$

The integral frame is shown in Figure 2-3.

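The computational cost of an integral frame is described as follows. Let R_f(p, q) be the cumulative row sum of pixel values in frame f. The definitions are:

$$R_f(p, q) = R_f(p - 1, q) + f(p, q) \quad (2.9)$$

$$I_f(p, q) = I_f(p, q - 1) + R_f(p, q) \quad (2.10)$$

with R_f(−1, q) = 0 and I_f(p, −1) = 0.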

By using equations (2.9) and (2.10) recursively, one can compute the integral frame I_f in one pass. For a frame with W x H pixels, 2WH additions are required to compute an integral frame. The sum of pixel values in any rectangular block in a frame can then be computed with three arithmetic operations. For example, as illustrated in Figure 2-4, the BS of block D can be computed by equation (2.11).

Figure 2-4 Computation of block sum
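The one-pass construction and the three-operation block sum can be sketched as follows. This is a minimal illustration, assuming the frame is stored as a list of pixel rows (indexed `frame[row][column]`); the names `integral_frame` and `block_sum` are hypothetical and the corner handling at the frame border is one possible choice.

```python
def integral_frame(frame):
    """One-pass integral frame: integral[q][p] is the sum of frame[0..q][0..p]."""
    height, width = len(frame), len(frame[0])
    integral = [[0] * width for _ in range(height)]
    for q in range(height):
        row_sum = 0                                    # cumulative row sum R_f
        for p in range(width):
            row_sum += frame[q][p]                     # first addition per pixel
            above = integral[q - 1][p] if q > 0 else 0
            integral[q][p] = above + row_sum           # second addition per pixel
    return integral


def block_sum(integral, left, top, width, height):
    """Sum of the width x height block whose top-left pixel is (left, top)."""
    right, bottom = left + width - 1, top + height - 1
    total = integral[bottom][right]
    if top > 0:
        total -= integral[top - 1][right]              # remove the area above
    if left > 0:
        total -= integral[bottom][left - 1]            # remove the area to the left
    if top > 0 and left > 0:
        total += integral[top - 1][left - 1]           # add back the doubly removed corner
    return total


frame = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
table = integral_frame(frame)
print(block_sum(table, 1, 1, 2, 2))  # 5 + 6 + 8 + 9 = 28
```

Once the table is built, every block sum costs at most three additions or subtractions regardless of the block size.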

2.4 Fast Motion Estimation Algorithms

In this section, we introduce some fast motion estimation algorithms, including Diamond Search [2], Successive Elimination Algorithm [16], Partial Distortion Elimination [17], and modified Window Follower Algorithm [27].

2.4.1 Diamond Search (DS)

Like other conventional fast motion estimation algorithms, DS [2] is designed to reduce the number of search points in motion estimation. DS performs very well compared with the Three-Step Search (TSS), New Three-Step Search (NTSS), and Four-Step Search (4SS). However, DS is still often trapped in a local minimum. DS employs two search patterns in motion estimation, as illustrated in Figure 2-5.

The first pattern, called the large diamond search pattern (LDSP), is used repeatedly until the step in which the minimum block distortion occurs at the center point. After that, the second pattern, called the small diamond search pattern (SDSP), is used as the final step. The minimum block distortion point found with the SDSP is the final MV, which points to the best-matched block. See Figure 2-6 for an example of the search process.

Figure 2-5 Diamond search patterns: large diamond search pattern (LDSP) and small diamond search pattern (SDSP)

Figure 2-6 Example for search process of Diamond Search
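The two-pattern search described above can be sketched in Python as follows. This is a minimal illustration rather than the implementation of [2]: the function names, the frame layout (`frame[row][column]`), the tie-breaking toward the center, and the treatment of out-of-range candidates as infinitely costly are all assumptions of this sketch.

```python
# Offsets (dx, dy) of the nine LDSP points and the five SDSP points; center first.
LDSP = [(0, 0), (0, -2), (1, -1), (2, 0), (1, 1), (0, 2), (-1, 1), (-2, 0), (-1, -1)]
SDSP = [(0, 0), (0, -1), (1, 0), (0, 1), (-1, 0)]


def sad(cur, ref, x, y, m, n, size):
    """SAD between the current block at (x, y) and the reference block displaced by (m, n)."""
    return sum(abs(cur[y + j][x + i] - ref[y + n + j][x + m + i])
               for j in range(size) for i in range(size))


def diamond_search(cur, ref, x, y, size, search_range):
    """Return the MV (m, n) found by LDSP steps followed by one SDSP step."""
    height, width = len(ref), len(ref[0])

    def cost(mv):
        m, n = mv
        inside = (abs(m) <= search_range and abs(n) <= search_range
                  and 0 <= x + m <= width - size and 0 <= y + n <= height - size)
        return sad(cur, ref, x, y, m, n, size) if inside else float("inf")

    def best_around(center, pattern):
        candidates = [(center[0] + dx, center[1] + dy) for dx, dy in pattern]
        return min(candidates, key=cost)      # on ties the center (listed first) wins

    center = (0, 0)
    while True:
        best = best_around(center, LDSP)
        if best == center:                    # minimum at the center: stop using LDSP
            break
        center = best
    return best_around(center, SDSP)          # final refinement with SDSP


# Synthetic ramp frames: the current frame is the reference shifted by MV (3, -1).
ref = [[10 * c + r for c in range(16)] for r in range(16)]
cur = [[10 * (c + 3) + (r - 1) for c in range(16)] for r in range(16)]
print(diamond_search(cur, ref, 6, 6, 4, 4))  # (3, -1)
```

The loop always terminates because the center only moves to a strictly cheaper point. On less regular content than this ramp, the same loop can stop at a local minimum, which is the weakness noted above.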

2.4.2 Successive Elimination Algorithm (SEA)

In motion estimation, the SAD of each block in the search window is compared with the current minimum SAD. If the SAD of the current candidate block is smaller than the current minimum SAD, the block becomes the up-to-date best-matched block. To reduce the computation of SAD, the Successive Elimination Algorithm (SEA) [16] was proposed. The SEA is a lossless fast motion estimation algorithm based on a mathematical inequality. The main idea of SEA is shown in equation (2.12).

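$$\mathrm{sea}(f_c, f_r(m, n)) = \left| BS_c - BS_r(m, n) \right| = \left| \sum_{i=1}^{M} \sum_{j=1}^{N} f_c(i, j) - \sum_{i=1}^{M} \sum_{j=1}^{N} f_r(i + m, j + n) \right| \le \mathrm{SAD}(f_c, f_r(m, n)) \quad (2.12)$$

In equation (2.12), BS_c and BS_r are the block sums of the current block and the candidate block, respectively. Because SAD(f_c, f_r(m, n)) is equal to or larger than sea(f_c, f_r(m, n)), if sea(f_c, f_r(m, n)) is larger than the current minimum SAD, SAD(f_c, f_r(m, n)) must also be larger than the current minimum SAD. Therefore, the computation of SAD(f_c, f_r(m, n)) can be skipped.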

Computing the sea value is cheaper than computing the SAD, because BS_c has to be calculated only once and BS_r(m, n) can be derived from the previous value BS_r(m－1, n). Hence, SEA reduces the computation of SAD efficiently.
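A full search with the SEA test can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the frame layout (`frame[row][column]`) are hypothetical, and for brevity each candidate block sum is recomputed directly rather than derived from the previous one or from an integral frame.

```python
def block_sum(frame, x, y, size):
    """Sum of the size x size block whose top-left pixel is (x, y)."""
    return sum(frame[y + j][x + i] for j in range(size) for i in range(size))


def sad(cur, ref, x, y, m, n, size):
    """SAD between the current block at (x, y) and the reference block displaced by (m, n)."""
    return sum(abs(cur[y + j][x + i] - ref[y + n + j][x + m + i])
               for j in range(size) for i in range(size))


def sea_full_search(cur, ref, x, y, size, search_range):
    """Return (best MV, minimum SAD, number of SADs actually computed)."""
    height, width = len(ref), len(ref[0])
    bs_cur = block_sum(cur, x, y, size)           # BS_c is computed only once
    best_mv, best_sad, sads_computed = None, float("inf"), 0
    for n in range(-search_range, search_range + 1):
        for m in range(-search_range, search_range + 1):
            if not (0 <= x + m <= width - size and 0 <= y + n <= height - size):
                continue                          # candidate lies outside the frame
            sea = abs(bs_cur - block_sum(ref, x + m, y + n, size))
            if sea > best_sad:
                continue                          # SAD >= sea > minimum: skip the SAD
            sads_computed += 1
            candidate = sad(cur, ref, x, y, m, n, size)
            if candidate < best_sad:
                best_mv, best_sad = (m, n), candidate
    return best_mv, best_sad, sads_computed


# Synthetic ramp frames: the current frame is the reference shifted by MV (3, -1).
ref = [[10 * c + r for c in range(16)] for r in range(16)]
cur = [[10 * (c + 3) + (r - 1) for c in range(16)] for r in range(16)]
mv, best, count = sea_full_search(cur, ref, 6, 6, 4, 4)
print(mv, best, count < 81)  # (3, -1) 0 True
```

A skipped candidate could never have replaced the current best, so the result is the same as that of an exhaustive search; only the number of SAD evaluations changes.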

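Multilevel SEA (MSEA), proposed in [20], is a generalized SEA. MSEA divides a macroblock into sub-blocks and calculates the BS of each sub-block. The sum of absolute differences of the corresponding BSs is then computed as msea(f_c, f_r(m, n)). The msea(f_c, f_r(m, n)) is always equal to or larger than sea(f_c, f_r(m, n)) and never exceeds the SAD. Consequently, msea(f_c, f_r(m, n)) is a tighter lower bound of the SAD. Equation (2.13) describes the idea.

$$\mathrm{sea}(f_c, f_r(m, n)) \le \mathrm{msea}_L(f_c, f_r(m, n)) = \sum_{k=1}^{4^L} \left| BS_c^{k} - BS_r^{k}(m, n) \right| \le \mathrm{SAD}(f_c, f_r(m, n)) \quad (2.13)$$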

In equation (2.13), k is the index of the sub-block and L is the level of division. For example, when N=16 and M=16, msea with level L=0 reduces to sea, and msea with level L=4 is the same as SAD. The bound becomes tighter as the level increases; however, the computational cost also becomes higher.
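The effect of the division level can be checked numerically with the following minimal sketch. It assumes square blocks whose side is a power of two and that each level halves the sub-block side (so level L yields 4^L sub-blocks); the function name `msea` and the list-of-rows block layout are choices of this example.

```python
def msea(cur_block, ref_block, level):
    """Sum of |BS_c^k - BS_r^k| over the 4**level sub-blocks of two square blocks."""
    size = len(cur_block)
    sub = size >> level                      # side length of one sub-block
    if sub == 0:
        raise ValueError("level is too deep for this block size")
    total = 0
    for top in range(0, size, sub):
        for left in range(0, size, sub):
            bs_cur = sum(cur_block[top + j][left + i]
                         for j in range(sub) for i in range(sub))
            bs_ref = sum(ref_block[top + j][left + i]
                         for j in range(sub) for i in range(sub))
            total += abs(bs_cur - bs_ref)
    return total


cur = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
ref = [row[::-1] for row in cur]             # same pixels, columns mirrored
print([msea(cur, ref, level) for level in range(3)])  # [0, 32, 32]
```

For this 4x4 example, level 0 is the plain sea value and level 2 (one pixel per sub-block) equals the SAD, mirroring the L=0 and L=4 cases of a 16x16 macroblock.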

2.4.3 Partial Distortion Elimination (PDE)

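PDE [17] uses the partial sum of absolute differences to eliminate impossible candidates before the SAD is completely calculated. The basic concept is shown in equation (2.14), where SAD_min is the current minimum SAD and k is the number of rows accumulated so far.

$$\sum_{j=1}^{k} \sum_{i=1}^{M} \left| f_c(i, j) - f_r(i + m, j + n) \right| \ge \mathrm{SAD}_{min}, \quad k = 1, 2, \ldots, N \quad (2.14)$$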

In the process of computation of SAD, we compute the partial SAD and compare the partial SAD with the current minimum SAD. If the partial SAD is equal to or larger than the current minimum SAD, the calculation of SAD can be terminated and the search point can be skipped.

Because testing the inequality has its own overhead, the test is performed once per row. As with SEA, the earlier a small SAD is found, the more candidates can be skipped.
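The row-by-row early termination can be sketched as follows. The function name `sad_with_pde` and the convention of returning `None` for a rejected candidate are assumptions of this example.

```python
def sad_with_pde(cur_block, ref_block, current_min):
    """Return the SAD of two blocks, or None once the partial SAD reaches current_min."""
    partial = 0
    for cur_row, ref_row in zip(cur_block, ref_block):
        partial += sum(abs(c - r) for c, r in zip(cur_row, ref_row))
        if partial >= current_min:           # tested once per row
            return None                      # candidate cannot beat the current minimum
    return partial


cur = [[1, 2], [3, 4]]
ref = [[2, 2], [3, 8]]
print(sad_with_pde(cur, ref, 10))  # 5: the full SAD is computed
print(sad_with_pde(cur, ref, 1))   # None: rejected after the first row
```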

2.4.4 Modified Window Follower Algorithm (MWFA)

Window follower algorithm (WFA) [26] takes the maximum displacement of MV in previous frame plus one unit as the search range for the current frame. The algorithm is presented as follows.

Window Follower Algorithm [26]

Step 1: For the kth frame, compute the maximum horizontal and vertical displacement from all MVs in the (k－1)th frame. The maximum value D is defined in equation (2.15), where d_t represents the maximum displacement of the two components of the MV of the tth block, as defined in equation (2.16).

$$D = \max_{t} \left[ d_t \right] \quad (2.15)$$

$$d_t = \max\left[ \left| MV_x^t \right|, \left| MV_y^t \right| \right] \quad (2.16)$$

Step 2: Perform motion estimation for kth frame with search range P=D+1. For the first frame, the search range P is set to max search range.
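The two steps can be sketched as follows. The function name `wfa_search_range`, the representation of MVs as `(x, y)` tuples, and the clamp to the maximum search range are assumptions of this example; the clamp in particular is not stated in the description above.

```python
def wfa_search_range(prev_frame_mvs, max_search_range):
    """Search range P for the current frame from the MVs of the previous frame."""
    if not prev_frame_mvs:                     # first frame: no previous MVs exist
        return max_search_range
    # d_t is the larger component magnitude of each MV; D is the largest d_t.
    big_d = max(max(abs(mx), abs(my)) for mx, my in prev_frame_mvs)
    # P = D + 1, clamped to the maximum search range (the clamp is an assumption).
    return min(big_d + 1, max_search_range)


print(wfa_search_range([(1, -3), (0, 2)], 16))  # 4
print(wfa_search_range([], 16))                 # 16
```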

WFA assumes that [26]:

(1) The change of motion content between frames is gradual and not sudden.

(2) The motion content is constant over a large number of successive frames.

However, the characteristics of motion in natural video sequences vary widely and are hard to predict, so the assumptions of WFA may not hold for natural video sequences. MWFA [27] modifies WFA by exploiting both temporal and spatial information and by adopting SAD as a measure of the accuracy of the MV. The MWFA algorithm is presented as follows.

Modified Window Follower Algorithm [27]

Step 1: For the kth frame, compute the displacement D as defined in WFA.

Step 2: Perform motion estimation for each block in kth frame with search range P_t, for tth block. P_t is determined by the following mutually exclusive rules.

(1) If (SADmin_{t-1} >= TH1): P_t = P_max, F = 1
(2) If (SADmin_{t-1} <= TH1 and F == 1): P_t = max(D, d_{t-1}) + 1
    If (SADmin_{t-1} <= TH1 and F == 0): P_t = D + 1
(3) If (SADmin_{t-1} <= TH2 and F == 1): P_t = max(D, d_{t-1})
    If (SADmin_{t-1} <= TH2 and F == 0): P_t = D

SADmin_{t-1} and d_{t-1} represent the minimum SAD and the maximum MV displacement of the (t－1)th block in the current frame, respectively. The flag F is set to zero at the beginning of each frame. When F is zero, only temporal information is considered; when F is one, both temporal and spatial information are taken into account. The thresholds TH1 and TH2 are set to 4096 and 2048, respectively, derived from simulations of typical video sequences.
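The per-block rule selection can be sketched as follows. As written, rules (2) and (3) overlap whenever the SAD is at most TH2, so this sketch assumes a precedence that is not stated in the source: rule (1) is tested first, then rule (3), and rule (2) covers the remaining cases. The function name and argument names are also hypothetical.

```python
TH1, TH2 = 4096, 2048  # thresholds quoted in the text


def mwfa_search_range(sad_min_prev, d_prev, big_d, flag, p_max):
    """Return (P_t, F) for the t-th block from the (t-1)-th block's result.

    Assumed precedence: rule (1), then rule (3), then rule (2).
    """
    if sad_min_prev >= TH1:                    # rule (1): poor match, widen fully
        return p_max, 1
    # With F == 1 the spatial term d_{t-1} is used together with the temporal D.
    base = max(big_d, d_prev) if flag == 1 else big_d
    if sad_min_prev <= TH2:                    # rule (3): good match, no margin
        return base, flag
    return base + 1, flag                      # rule (2): add a one-unit margin


print(mwfa_search_range(5000, 3, 2, 0, 16))  # (16, 1)
print(mwfa_search_range(3000, 5, 2, 1, 16))  # (6, 1)
print(mwfa_search_range(1000, 5, 2, 0, 16))  # (2, 0)
```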

Chapter 3 Content-Aware Fast Motion Estimation Algorithm

In this chapter, we present our proposed Content-Aware Fast Motion Estimation Algorithm (CAFME), which consists of SDSR, SEAIF, and ETA. First, Section 3.1 presents some observations and analyses of the search range in motion estimation. The simple dynamic search range algorithm (SDSR) and SEA with integral frame (SEAIF) are presented in Sections 3.2 and 3.3, respectively. Finally, our early termination algorithm (ETA) is given in Section 3.4.

3.1 Analysis of Search Range

In this section, we explore the relationships among the parameters of motion estimation. Because adjusting the search range requires some information, these relationships help us to develop a good algorithm. We ran experiments to observe and analyze the relationships between search range (SR) and frame rate, frame resolution, motion activity, quantization parameter (QP), and the SAD of the best-matched block. The experimental environment is as follows.

Platform: H.264/AVC reference software JM 9.4 [32]
Machine: Athlon XP 1700+ with 512 MB memory
Profile: baseline
Level: 3.0
Block match algorithm (BMA): full search
Group of picture (GOP): 15
Frame structure: IPPP
Number of reference frame: 1
Hadamard transform: enable
All block size (16x16, 16x8, 8x16, 8x8, 8x4, 4x8, and 4x4): enable
Rate-distortion optimized (RDO): enable
Fast ME (UMHexagonS) [33]: disable
Fast mode selection [34]: disable
Rate control (RC): disable
16x16 MB observed

3.1.1 Search Range and Frame Rate

Since the frame rate affects the difference between successive frames, we observe the relationship between SR and frame rate. The test data are the foreman sequence at FPS=30 and FPS=15. The temporal distance between frames is 1/30 second at FPS=30 and 1/15 second at FPS=15. In theory, the higher the frame rate, the smaller the search range that motion estimation needs.

The quantization parameter (QP) is mapped to a quantization step and affects the bitrate significantly. In our experiments, the QP is fixed and RC is disabled. Therefore, we only need to observe the bitrate field for different search ranges. In Table 3-1, the gray areas mark where the bitrate is stable, that is, where the search range is large enough to find the true MVs in motion estimation. The bitrates are approximately stable when SR≧4 at FPS=30 and when SR≧8 at FPS=15. The experimental results show that a larger search range is required to find the true MV when the frame rate is lower, which conforms to the theory.

Table 3-1 The relation between SR and FPS (Foreman, QCIF (176 x 144), QP=36; FPS=30 with 100 frames, FPS=15 with 50 frames)

3.1.2 Search Range and Frame Resolution

We test the coastguard sequence in QCIF and CIF resolution. In Table 3-2, the gray areas show that the bitrate changes only slightly when the SR ranges from 2 to 32 in QCIF resolution and from 4 to 32 in CIF resolution. These observations indicate that SR≧2 is large enough to find the true MVs in QCIF resolution and SR≧4 is large enough to find the true MVs in CIF resolution. Therefore, we conclude that the search range must increase adaptively for larger resolutions.

Table 3-2 The relation between SR and resolution (Coastguard, QP=36, FPS=30, Encoded frames=90)

      QCIF (176 x 144)          CIF (352 x 288)
1     29.107     76.120         29.225     420.099
0     28.799     113.893        28.920     639.923

3.1.3 Search Range and Motion Activity

We divide the foreman sequence into two parts, which represent low and high motion. The first part consists of the first 90 frames and the second part consists of frames 151 to 240. In Table 3-3, we observe that the bitrates are approximately stable when SR≧4 in the low motion sequence and when SR≧8 in the high motion sequence. As expected, the search range should be increased adaptively for high motion sequences.

Table 3-3 The relation between SR and motion activity (Foreman, QCIF, QP=36, FPS=30)

      Frame 0~89 (low motion)        Frame 151~240 (high motion)
SR    SNR (dB)   Bitrate (Kbps)      SNR (dB)   Bitrate (Kbps)
32    31.478     69.528              31.245     84.747
16    31.469     69.365              31.242     84.949
8     31.448     69.400              31.223     85.307
4     31.443     69.224              31.166     89.931
2     31.424     69.771              31.024     112.832
1     31.406     71.155              30.954     130.109
0     31.227     79.122              30.633     185.883

3.1.4 Search Range, QP, and SAD of Best-matched Block

In block matching, SAD is used as the matching criterion. If the SR is too small, the true MV may not be found and the SAD of the best-matched block will be large. The QP also clearly affects the SAD. Therefore, this experiment considers both factors. In Table 3-4, the field "SAD best average" is the average SAD of the blocks that yield the minimum rate-distortion cost in the H.264/AVC encoder. The experimental results show that the true MVs can be found as long as SR≧8 regardless of QP, while QP only affects the magnitude of the SAD. We also show the SAD best average frame by frame in Figure 3-1. In the foreman sequence, the motion from frame 150 to 220 is higher than in the rest of the sequence. Therefore, SR≦4 is not large enough to find the true MVs in those frames.

Table 3-4 The relation between SR, QP, and SAD (Foreman, QCIF, 300 Frames, FPS=30)

      SAD best average
SR    QP=18     QP=24     QP=30     QP=36
32    921.2     1024.0    1221.3    1577.5
16    924.7     1027.8    1226.0    1584.9
8     942.9     1045.6    1244.6    1605.3
4     1068.8    1165.5    1353.8    1702.9
2     1252.9    1344.9    1524.4    1860.9
1     1413.7    1503.7    1585.9    2001.9
0     1831.9    1911.4    2081.7    2330.1

Figure 3-1 SR and motion activity in foreman QCIF frame by frame (Foreman, QCIF, QP=36)

In summary, if we can find an SR for a frame or a block such that the true MVs can be found, the local minimum problem can be avoided and the computational cost of motion estimation can be reduced dramatically.

Our experiments show that the search range should be changed adaptively according to the motion activity of the video and the parameters of the encoder. However, the encoder parameters must be weighed against one another before they can be used. Hence, we develop SDSR to adjust the SR dynamically based on motion activity.

3.2 Simple Dynamic Search Range (SDSR)

In this section, we present our proposed Simple Dynamic Search Range algorithm (SDSR). In order to adjust search range for motion estimation, some approaches have already been implemented in DSWA [21], AFSBM [22], MWFA [27], and MAS [28]. These approaches may be classified into block matching error based and motion vector based.

The block matching error is usually measured as MSD, MAD, or SAD. It represents the degree of matching between the current block and the candidate block. Its value is determined by many factors, including motion activity, texture, and quantization parameter. See Figure 3-2 for an example. From frame 220 onward, the SAD values are much higher than in the rest of the sequence. The reason is the complicated video texture, not the motion activity. However, from frame 150 to 170, the SAD values rise sharply because of sudden motion change rather than video texture. Consequently, approaches based on block matching error are usually unsuitable for evaluating motion activity.

In contrast, the motion vector represents the motion activity more precisely [28]. For this reason, our proposed approach is based on motion vector information. Because motion activity varies widely between video sequences and between areas within a single frame, we adjust the search range at both the frame level and the block level. The adjustments of SR at the frame level and the block level are based on the temporal correlation and the spatial correlation of the motion field, respectively.

Figure 3-2 SAD of foreman CIF frame by frame

The proposed Simple Dynamic Search Range algorithm is described as follows.

Simple Dynamic Search Range Algorithm

Step 1: Determine the search range at the frame level. The search range, called SR_FRAME_k, is the maximum horizontal and vertical displacement of all MVs in the (k－1)th frame plus one unit. The definition is:

$$SR\_FRAME_k = \max_{t} \left\{ \max\left[ \left| MVx_t \right|, \left| MVy_t \right| \right] \right\} + 1, \quad t \in \text{all blocks in the } (k-1)\text{th frame} \quad (3.1)$$

Step 2: Adjust the search range at the macroblock level. Let MV_MAX_t denote the maximum displacement of the two components of the MVs in the neighbor blocks of the tth block, determined by the following rules.

s ∈ {the left, above-left, above, and above-right blocks of the tth block}

If any of the neighbor blocks is not available

$$MV\_MAX_t = \max\left[ \max\left[ \left| MVx_s \right|, \left| MVy_s \right| \right], SR\_FRAME_k \right]$$

Else

$$MV\_MAX_t = \max\left[ \left| MVx_s \right|, \left| MVy_s \right| \right]$$

Step 3: Determine the final search range for tth block, called SR_BLOCK_t by the following rules.

//Adjust SR in block level

If ( _ _ )

SR BLOCK MV MAX SR FRAME MV MAX

≥

Else if ( _ max search range) _ max search range

Because the prediction of MV may not be zero MV in motion estimation, the displacement of MV may be larger than the SR. Hence the SR in frame level may increase more than one unit between frames. The adjustment of SR in block level ensures that the SR is large enough to find the true MV.
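Steps 1 and 2 of SDSR can be sketched as follows; Step 3 is not reproduced. The function names, the representation of MVs as `(x, y)` tuples, the use of `None` for an unavailable neighbor, and the use of component magnitudes are assumptions of this example, as is the requirement that the previous frame supplies at least one MV.

```python
def sr_frame(prev_frame_mvs):
    """Step 1: largest MV component magnitude in frame k-1, plus one unit."""
    return max(max(abs(mx), abs(my)) for mx, my in prev_frame_mvs) + 1


def mv_max(neighbor_mvs, sr_frame_k):
    """Step 2: maximum displacement over the neighbor blocks of the t-th block.

    neighbor_mvs lists the MVs of the left, above-left, above, and above-right
    blocks; an unavailable neighbor is given as None (an assumed convention).
    """
    available = [mv for mv in neighbor_mvs if mv is not None]
    largest = max((max(abs(mx), abs(my)) for mx, my in available), default=0)
    if len(available) < len(neighbor_mvs):     # some neighbor is not available
        return max(largest, sr_frame_k)        # fall back on the frame-level range
    return largest


sr_k = sr_frame([(1, -2), (3, 0)])
print(sr_k)                                             # 4
print(mv_max([(1, 0), (0, -5), (2, 2), (1, 1)], sr_k))  # 5
print(mv_max([(1, 0), None, (2, 2), None], sr_k))       # 4
```

The second call shows a neighbor whose displacement exceeds the frame-level range, which is the case the block-level adjustment is meant to catch.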

Note that the neighbor block of current block may not be a complete macroblock (16x16) in H.264/AVC video compression standard, shown in Figure 3-3.

Figure 3-3 Current and neighbor blocks (variable block size), with neighbor blocks such as D (8x4) and B (8x4)

3.3 Successive Elimination Algorithm with Integral Frame (SEAIF)

The SEA and the integral frame technique were introduced in Sections 2.4.2 and 2.3. In this section, we integrate them to form a new SEA, called SEAIF, for the H.264/AVC standard. In the H.264/AVC standard, rate-distortion optimization (RDO) is recommended for mode selection. The modes include nine intra modes and seven inter modes (see Figure 2-2). In inter coding, a total of 41 motion estimations are required for a 16x16 macroblock when RDO is enabled (one for 16x16, two for 16x8, two for 8x16, four for 8x8, eight for 8x4, eight for 4x8, and sixteen for 4x4). Therefore, the ME cost increases dramatically.

To reduce the intensive computation caused by RDO, the H.264/AVC reference software JM 9.4 [32] implements a Fast Full Pel Search algorithm that reuses the SAD values of the smallest 4x4 blocks. Before motion estimation of a new macroblock, it computes the SAD values of all 4x4 blocks at all search points within the search window. It then merges these SAD values to obtain the SAD values of the larger blocks. In this way, the SAD computation for a macroblock with all block sizes enabled is about equal to the SAD computation for a single 16x16 block.
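The SAD-reuse idea can be sketched for a single search point as follows. This is an illustration of the merging principle only, not the JM 9.4 code; the function names and the list-of-rows macroblock layout are assumptions of this example.

```python
def sad_4x4_grid(cur_mb, ref_mb):
    """4x4 grid of SADs, one per 4x4 sub-block of a 16x16 macroblock."""
    return [[sum(abs(cur_mb[4 * by + j][4 * bx + i] - ref_mb[4 * by + j][4 * bx + i])
                 for j in range(4) for i in range(4))
             for bx in range(4)]
            for by in range(4)]


def merge_sads(grid, block_w, block_h):
    """SADs of all block_w x block_h partitions, obtained by adding 4x4 SADs."""
    bw, bh = block_w // 4, block_h // 4        # partition size in 4x4 units
    return [[sum(grid[gy + j][gx + i] for j in range(bh) for i in range(bw))
             for gx in range(0, 4, bw)]
            for gy in range(0, 4, bh)]


cur_mb = [[1] * 16 for _ in range(16)]
ref_mb = [[0] * 16 for _ in range(16)]
grid = sad_4x4_grid(cur_mb, ref_mb)            # pixel differences are computed once
sizes = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]
merged = {size: merge_sads(grid, *size) for size in sizes}
print(merged[(16, 8)])                                              # [[128], [128]]
print(sum(len(rows) * len(rows[0]) for rows in merged.values()))    # 41
```

The pixel-level work is done once for the sixteen 4x4 blocks; the 41 partition SADs then follow from additions alone.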

We take the concept of reusing SAD and integrate it into our proposed SEAIF. The main idea of the SEAIF for H.264/AVC is to reuse sea values and SAD values. The following sub-sections
