
PAPER Special Section on Image Media Quality

Projection Based Adaptive Window Size Selection for Efficient Motion Estimation in H.264/AVC

Anand PAUL†a), Jhing-Fa WANG††, Nonmembers, Jia-Ching WANG, Member, An-Chao TSAI, and Jang-Ting CHEN, Nonmembers

SUMMARY This paper introduces a block-based motion estimation algorithm that combines projection with adaptive window size selection. Two blocks cannot match well if their corresponding 1D projections do not match well. On this foundation, the 2D block matching problem is translated into a simpler 1D matching problem, which removes the majority of pixels from the matching computation. The projection method is combined with adaptive window size selection, in which an appropriate search window for each block is determined from the motion vectors and prediction errors obtained for the previous block. This makes the method several times faster than exhaustive search with negligible performance degradation. Encoding QCIF video with the proposed method reduces the computational complexity of motion estimation by roughly 45% and of overall encoding by 23%, while maintaining image/video quality.

key words: block based motion estimation, 1D projection, adaptive window size selection

1. Introduction

The recently established H.264/AVC is the newest video coding standard. The main goals of the H.264/AVC standardization effort have been to enhance compression performance and to provide a “network-friendly” video representation. In an H.264 encoder, motion estimation is the major source of complexity [1]. It is well known that the motion search range is an important parameter in determining both the coding efficiency and the computational cost of encoding. Most video coding standards, including MPEG-1/2/4, H.261, H.263, and H.264/AVC, use block motion estimation and compensation to remove temporal redundancy. This is one of the most important parts of a video encoder, and newer standards achieve better video quality at constant bit rate by allowing subdivision of 16×16 pixel macroblocks (MBs) into smaller blocks. The encoder may then select whether to use large blocks with only a few motion vectors (MVs), or more accurate motion estimation (ME) with smaller blocks but more motion vectors to transmit. For example, the newest and most efficient coding standard, H.264/AVC, allows subdividing the MBs into 16 × 8, 8 × 16, or 8 × 8 pixel blocks, and when the smallest size is chosen, the block may be further subdivided in a treelike fashion into 4 × 8, 8 × 4, or 4 × 4 pixel blocks [2], [3]. A good-quality encoder must then handle seven different block sizes, as shown in Fig. 1. Since there are a vast number of possibilities for selecting the subdivision of blocks, the encoder must intelligently decide which block subdivision and which MV to use for each block.

Fig. 1 Different partition sizes in a macroblock.

Manuscript received December 7, 2005.

Manuscript revised April 28, 2006.

Final manuscript received July 31, 2006.

†The authors are with Multimedia and Communication IC Lab, EE Department, NCKU, Tainan-701, Taiwan, Republic of China.

††The author is Chair Professor in EE Department, NCKU, Tainan-701, Taiwan, Republic of China.

a) E-mail: wangjf@csie.ncku.edu.tw

DOI: 10.1093/ietfec/e89-a.11.2970

Earlier encoders typically computed the sum of absolute differences (SAD) between the current block and candidate blocks and simply selected the MV yielding the least distortion. However, this often does not give the best image quality for a given bit rate, because it may select long motion vectors that need many bits to transmit. It also does not help in determining how the subdivision should be performed, because the smallest blocks will always minimize the distortion, even though the multiple MVs may use more bits and increase the bit rate. This paper does not address how the subdivision is performed; rather, it focuses on determining the search range for each subsequent macroblock in a given frame and applies a projection technique to determine the motion vector.

Motion estimation is computationally the most demanding part of a typical video encoder. The multiple block sizes available in newer standards only increase the computation. Many fast motion estimation methods have been developed, but many of them compute the criterion only at a few possible MVs. This degrades video quality, because the best MV might not be found. Other alternatives are the fast full search methods. A lot of research has been carried out to theoretically improve the various fast search methods, but practical implementation has received relatively little attention. It is important that a given ME algorithm can be applied in practical encoders, taking into account such aspects as RD-optimization and multiple block sizes, or it will remain unused. In this paper we propose a novel motion estimation method based on projection with adaptive window selection, which reduces computation while maintaining image quality.

Copyright © 2006 The Institute of Electronics, Information and Communication Engineers

1.1 Motivation

The computational complexity of block-based motion estimation is the direct consequence of the expensive 2D block matching process. The relationship between motion and projection is well established [4]. Reference [5] was the first to introduce fast feature-based motion estimation using integral projection, in which most candidate blocks are eliminated by matching the 1D projections of the blocks. Matching in 1D is much faster than matching 2D blocks. The basic projection scheme is shown in Fig. 2. Projection is a greedy approach. In the extreme case where only the sums of the two blocks are matched, the sum contains too little information about the block, and such DC matching may yield too many mismatches. An adaptive scheme is therefore incorporated to improve the matching [6].

It is well known that the motion search window strongly affects both the coding efficiency and the computational cost of encoding [4]. Because the total encoding workload is large, a motion search window decision algorithm [6] is needed to reduce encoding time. After the motion vector of the current block is obtained by the projection method, the Adaptive Window Size Selection (AWSS) technique fixes the window size for succeeding blocks in the frame so that accurate motion vectors can be obtained. An appropriate search window for each block is determined on the basis of the motion vectors and prediction errors obtained for the previous block.

In this work, we aim to develop a simple yet efficient algorithm in which a projection-based block matching method [7] is combined with an adaptive window size selection scheme [8]. The combination makes the overall approach efficient for motion estimation. The resulting Projection with Adaptive Window Size Selection (PAWSS) method was implemented by modifying the encoder of the JM 10.1 (Joint Model) reference software [9]. It showed performance similar to that of the fast full search algorithm with less computation time.

The paper is organized as follows. Section 2 reviews the projection scheme, along with fast projection and the exclusion of candidates by 1D matching. Section 3 introduces Adaptive Window Size Selection (AWSS) and describes how projection and the AWSS algorithm are combined. Simulation and experimental results of the proposed method are presented in Sect. 4, and Sect. 5 concludes the paper.

Fig. 2 Basic projection scheme. (a) reference frame, (b) current frame.

2. Projection

2.1 Definition

Suppose the frame size is $W \times H$, the block size is $B_h \times B_v$, and the search window size is $W_h \times W_v$. Then, to predict one block, there are $W_h \times W_v$ candidates to search. A common block size of 16×16 (the macroblock size) is used here for illustration.

Let $B_{x,y}$ denote the 2D block of pixels with top-left position at $(x, y)$ in a video frame, and let $B_{x,y}^{i,j}$, where $0 \le i < B_h$ and $0 \le j < B_v$, be the pixel value at the $j$th row and $i$th column of the block. Define the vertical (column) projection of $B_{x,y}$ to be $PB_{x,y}$, a 1D row vector whose $i$th value is the sum of the $i$th column of $B_{x,y}$:

$$PB_{x,y}^{i} = \sum_{j=0}^{B_v-1} B_{x,y}^{i,j}, \quad 0 \le i < B_h. \tag{1}$$

By projection, a $B_h \times B_v$ 2D block is reduced to a $B_h$-component 1D vector, and only the DC information of each column is preserved, as shown in Fig. 3.
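As a minimal sketch of the definition in (1), the column projection can be written directly as a list of column sums. The function name `column_projection` and the toy 4×4 frame are assumptions for illustration only; the frame is stored as a list of rows, so `frame[y][x]` is the pixel at column `x` and row `y`.

```python
def column_projection(frame, x, y, bh, bv):
    """Return PB_{x,y}: the column sums of the bh-wide, bv-tall block at (x, y)."""
    # Entry i is the sum over the bv rows of column x + i, as in eq. (1).
    return [sum(frame[y + j][x + i] for j in range(bv)) for i in range(bh)]


# Toy frame (hypothetical data): pixel value = 4 * row + column.
frame = [[r * 4 + c for c in range(4)] for r in range(4)]
proj = column_projection(frame, 0, 0, 4, 4)
print(proj)  # [24, 28, 32, 36]
```

The 16 pixels of the toy block collapse to 4 values, one per column, which is the reduction from a $B_h \times B_v$ block to a $B_h$-component vector.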

2.2 Fast Projection

In the current frame, there are only $(W \times H)/(B_h \times B_v)$ non-overlapping blocks, and the computational load of their projections is small ($O(W \times H)$ operations). By contrast, in the reference frame there are $W \times H$ different blocks (one starting at each pixel). $B_{x,y}$ and its direct right (lower) neighbor $B_{x+1,y}$ ($B_{x,y+1}$) share all pixels except two columns (rows). If $PB_{x,y}^{i}$ is known, for example, then $PB_{x,y+1}^{i}$ and $PB_{x+1,y}^{i}$ can be updated efficiently (see Fig. 4), as follows:

$$PB_{x,y+1}^{i} = PB_{x,y}^{i} - B_{x,y}^{i,0} + B_{x,y+1}^{i,B_v-1}, \tag{2}$$

and

$$PB_{x+1,y}^{i} = \begin{cases} PB_{x,y}^{i+1}, & i < B_h - 1, \\[4pt] \displaystyle\sum_{j=0}^{B_v-1} B_{x+1,y}^{i,j}, & i = B_h - 1. \end{cases} \tag{3}$$

Fig. 3 The projection of 2D block.


Starting with $B_{0,0}$, and with proper buffering, only 2 operations per pixel are required on average to compute the projection of a block in this updating manner. The cost of projection is thus only $O(2W \times H)$ operations.
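The two update rules (2) and (3) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the helper names `shift_down` and `shift_right` are hypothetical, and the new rightmost column in `shift_right` is summed directly here, whereas a buffered implementation would also obtain it incrementally.

```python
def column_projection(frame, x, y, bh, bv):
    """Direct evaluation of eq. (1), used here only as a reference."""
    return [sum(frame[y + j][x + i] for j in range(bv)) for i in range(bh)]


def shift_down(frame, proj, x, y, bh, bv):
    """Eq. (2): projection of the block at (x, y+1) from the one at (x, y)."""
    # Each column sum drops the old top pixel and gains the new bottom pixel.
    return [proj[i] - frame[y][x + i] + frame[y + bv][x + i] for i in range(bh)]


def shift_right(frame, proj, x, y, bh, bv):
    """Eq. (3): projection of the block at (x+1, y) from the one at (x, y)."""
    # Columns 1..bh-1 are reused; only the new rightmost column is summed.
    new_col = sum(frame[y + j][x + bh] for j in range(bv))
    return proj[1:] + [new_col]


# Hypothetical 8x8 test frame with an arbitrary deterministic pattern.
frame = [[(7 * r + 3 * c) % 11 for c in range(8)] for r in range(8)]
p = column_projection(frame, 2, 1, 4, 4)
print(shift_down(frame, p, 2, 1, 4, 4) == column_projection(frame, 2, 2, 4, 4))
print(shift_right(frame, p, 2, 1, 4, 4) == column_projection(frame, 3, 1, 4, 4))
```

Both comparisons check the incremental result against the direct column sums for the neighboring block.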

2.3 Excluding Candidates by 1D Matching

Two blocks cannot match well if their projections do not match well [7]. For a block $C_{x,y}$ in the current frame, block-based motion estimation searches all displaced blocks $R_{x+dx,y+dy}$ in the search window of the reference frame for the best-matched block. The commonly used matching error metric, called maximum amplitude difference (MAD), is given by:

$$\mathrm{MAD}(dx, dy) = \sum_{i=0}^{B_h-1} \sum_{j=0}^{B_v-1} \left| C_{x,y}^{i,j} - R_{x+dx,y+dy}^{i,j} \right|. \tag{4}$$

By contrast, the matching error of the 1D projections of blocks $C_{x,y}$ and $R_{x+dx,y+dy}$ (projection maximum amplitude difference, PMAD) is

$$\mathrm{PMAD}(dx, dy) = \sum_{i=0}^{B_h-1} \left| PC_{x,y}^{i} - PR_{x+dx,y+dy}^{i} \right|. \tag{5}$$

The MAD in (4) requires more operations than the PMAD in (5). A pictorial representation of excluding candidates by 1D matching is shown in Fig. 5. For a 16×16 block, the 256 operations of 2D matching are reduced to 16 operations by 1D matching.

Fig. 4 Incremental calculation of block projection.

Fig. 5 Comparing 2D matching cost vs the 1D matching cost (MAD vs PMAD).
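The following sketch evaluates (4) and (5) and shows one way the 1D cost can exclude candidates. By the triangle inequality, PMAD never exceeds MAD for the same displacement, so a candidate whose PMAD is already no better than the best MAD found so far can be skipped without a 2D match. This pruning rule is a standard use of the bound and an assumption here, not necessarily the authors' exact procedure; the function names and the toy frames are hypothetical.

```python
def mad(cur, ref, x, y, dx, dy, bh, bv):
    """Eq. (4): sum of absolute pixel differences over the 2D block."""
    return sum(abs(cur[y + j][x + i] - ref[y + dy + j][x + dx + i])
               for j in range(bv) for i in range(bh))


def pmad(cur, ref, x, y, dx, dy, bh, bv):
    """Eq. (5): sum of absolute differences of the column projections."""
    total = 0
    for i in range(bh):
        pc = sum(cur[y + j][x + i] for j in range(bv))
        pr = sum(ref[y + dy + j][x + dx + i] for j in range(bv))
        total += abs(pc - pr)
    return total


def search(cur, ref, x, y, bh, bv, rng):
    """Search +/-rng pixels, skipping the 2D match when PMAD rules it out."""
    best, best_mv, full_matches = None, None, 0
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            # Skip displaced blocks that fall outside the reference frame.
            if not (0 <= y + dy <= len(ref) - bv and 0 <= x + dx <= len(ref[0]) - bh):
                continue
            if best is not None and pmad(cur, ref, x, y, dx, dy, bh, bv) >= best:
                continue  # the lower bound cannot beat the current best
            full_matches += 1
            cost = mad(cur, ref, x, y, dx, dy, bh, bv)
            if best is None or cost < best:
                best, best_mv = cost, (dx, dy)
    return best_mv, best, full_matches


# Toy data (hypothetical): the current frame is the reference shifted by (2, 1).
ref = [[(r * 31 + c * 17 + (r * c) % 7) % 256 for c in range(16)] for r in range(16)]
cur = [[ref[(r + 1) % 16][(c + 2) % 16] for c in range(16)] for r in range(16)]
mv, cost, full_matches = search(cur, ref, 4, 4, 4, 4, 3)
print(mv, cost, full_matches)
```

In a practical encoder the projections would come from the buffered incremental updates rather than being recomputed per candidate as they are in this sketch.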

2.4 Buffering Scheme

To search for the motion vectors of a strip of blocks with their top corners at $y$ in the current frame, only the blocks with their top corners within $[y - W_h/2,\ y + W_h/2]$ are involved in the reference frame, which is a $W \times W_v$ strip. So a $W \times W_v$ buffer, instead of one for $W \times H$ (usually $H \gg W_v$), is sufficient to store all reusable projections. When moving to the next strip, we slide the buffer by $B_v$ lines, discard the $B_v$ lines moving out, and update the $B_v$ lines moving in using fast projection. An additional $B_v$-point buffer is needed for the current frame.
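As a rough worked example of the saving, assume a QCIF frame ($W = 176$, $H = 144$, the size used in Sect. 4) and a vertical window extent of $W_v = 32$ lines. The value of $W_v$ is an assumption chosen for illustration, not a figure taken from the paper.

```latex
\begin{align*}
W \times H   &= 176 \times 144 = 25344 \quad \text{entries (full-frame buffer)},\\
W \times W_v &= 176 \times 32 = 5632 \quad \text{entries (strip buffer)},\\
\frac{W \times H}{W \times W_v} &= \frac{144}{32} = 4.5 .
\end{align*}
```

Under these assumptions the strip buffer holds 4.5 times fewer projection entries than a full-frame buffer, and the ratio grows with frame height for a fixed window.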

2.5 Discussion

a) Although only vertical projection is discussed in this manuscript, an entirely analogous algorithm ensues from using horizontal projections. We believe that vertical projection may be generally preferred, since video sequences tend to have more horizontal motion in their scenes. However, horizontal projection may be more suitable for certain sequences.

b) By using both vertical and horizontal projections, a few more candidate blocks can be eliminated by 1D matching. However, the saving is just enough to compensate for the cost of the extra projection. Besides, extra buffers are necessary. Hence, the combination of two 1D projections is not better than using a single projection.

c) An even greedier approach, based on matching just the total sum of each block (the sum of all components of $PB_{x,y}$), does not work well in practice. This is DC matching, and it is certainly very fast: we can derive fast methods both to compute the DC value and to manage the 1D buffer. Although DC matching is very efficient (only one operation per match), the sum contains too little information about a block. Our analysis indicates that DC matching yields too many mismatches to be effective.

Thus an adaptive scheme is incorporated to increase the probability of matching a block with less computation and better performance than existing methods.

3. Adaptive Window Size Selection

In motion estimation, motion vectors that exceed the search window cannot be detected. When this happens, sufficient motion compensation efficiency cannot be obtained, and video quality is degraded in the encoding process. Although PSNR degradation might be avoided with a wider search window, the extra computation this window creates in scenes with only small motion would be wasted [10].

3.1 Motion Vector Thresholding

To avoid this problem in choosing the search window, we have developed a method that makes an adaptive search window decision for each block after the first best match has been found by the projection method. The search window is modified on the basis of the motion estimation results for the previous block. The search window for any given block is chosen from among three candidates: ±32, ±16, and ±8 pixels. In search window selection, the sum of absolute motion vector values (SumMV) is calculated first.

The prediction error for each block is calculated as the PMAD between that block and the reference block. When PMAD and SumMV exceed their upper threshold values, the PMAD threshold [8] and the motion vector threshold (PMADTh1, MVTh1), the largest search window (±32 pixels) is chosen for the next block. When both PMAD and SumMV are smaller than the lower threshold values (PMADTh2, MVTh2), the narrowest search window (±8 pixels) is chosen for the next block. That is, each search window is determined by comparing PMAD and, at times, SumMV with predetermined threshold values. The flow diagram is shown in Fig. 6. The conditions for each flow, which are based on the threshold values, are given in Sect. 3.3.

The thresholds for SumMV are MVTh1 and MVTh2 (MVTh1 > MVTh2), and the thresholds for PMAD are PMADTh1 and PMADTh2 (PMADTh1 > PMADTh2). The values used are (MVTh1, MVTh2) = $(15 \times 10^4,\ 10 \times 10^4)$ and (PMADTh1, PMADTh2) = $(35 \times 10^5,\ 25 \times 10^5)$.

Fig. 6 Adaptive window size selection scheme.

3.2 Projection and AWSS Mapping

The fast algorithm presented in this paper avoids most of the expensive 2D matching. 1D projections of 2D blocks are used to eliminate the majority of candidates by matching in 1D, which is much faster than matching 2D blocks. Computational scalability can be achieved by controlling how many candidates are excluded by 1D matching.

Initially, column projection is applied in both the reference frame and the current frame to project each 2D block into a 1D row; PMAD is calculated and compared for all 1D projected rows, and the best-matching block is found. To find the next best-matching block, we use the AWSS algorithm to fix the search window size; PMAD is then computed for all blocks within this window, and the best match is chosen. This reduces the computational complexity compared with existing fast full search methods. Our results show negligible PSNR degradation and a slightly reduced bit rate.

3.3 AWSS Algorithm

The search window is modified on the basis of the motion estimation results for the previous block and is chosen from among three candidates: ±32, ±16, and ±8 pixels. More specifically, the window size is determined on the basis of both the sum of the absolute values of the motion vector (SumMV) and the sum of the prediction errors for the previous block. The proposed AWSS algorithm thus varies the area of each search window adaptively. The AWSS algorithm flow is described below.

Flow 1:

If the current window size is 8 or 16, PMAD > PMADTh1, and SumMV > MVTh1, then choose the wider (±32) window size.

When PMAD and SumMV are larger than PMADTh1 and MVTh1, respectively, fast motion is making the prediction inaccurate, so a larger window size should be chosen for the next block.

Flow 2:

If the current window size is 8 and PMADTh2 < PMAD < PMADTh1, then choose the medium (±16) window size.

Flow 3:

If the current window size is 16 or 32, PMAD < PMADTh2, and SumMV < MVTh2, then choose the smaller (±8) window size.

When PMAD and SumMV are less than PMADTh2 and MVTh2, respectively, the motion is slow and a larger search window wastes computation, so a smaller window size should be chosen for the next block.

Flow 4:

If the current window size is 32, PMADTh2 < PMAD < PMADTh1, and SumMV < MVTh1, then choose the medium (±16) window size.
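The four flows can be sketched as a single decision function. This is an illustrative reading of the rules, not the authors' code: the function name `next_window` is hypothetical, the threshold constants are the values as read from Sect. 3.1, and keeping the current window when no flow applies is an assumption, since the text does not state that case.

```python
# Threshold values as read from Sect. 3.1.
PMAD_TH1, PMAD_TH2 = 35 * 10**5, 25 * 10**5
MV_TH1, MV_TH2 = 15 * 10**4, 10 * 10**4


def next_window(window, pmad, sum_mv):
    """Return the search range (8, 16 or 32 pixels) for the next block."""
    if window in (8, 16) and pmad > PMAD_TH1 and sum_mv > MV_TH1:
        return 32  # Flow 1: fast motion, widen the window
    if window == 8 and PMAD_TH2 < pmad < PMAD_TH1:
        return 16  # Flow 2: moderate error, step up to the medium window
    if window in (16, 32) and pmad < PMAD_TH2 and sum_mv < MV_TH2:
        return 8   # Flow 3: slow motion, narrow the window
    if window == 32 and PMAD_TH2 < pmad < PMAD_TH1 and sum_mv < MV_TH1:
        return 16  # Flow 4: moderate error, step down to the medium window
    return window  # Assumption: no flow applies, so keep the current window


print(next_window(8, 36 * 10**5, 16 * 10**4))   # Flow 1 applies
print(next_window(32, 30 * 10**5, 12 * 10**4))  # Flow 4 applies
```

Checking the flows in a fixed order makes the function deterministic even if the conditions were ever to overlap; with the thresholds above they are mutually exclusive for a given current window.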


3.4 Flow Chart

Figure 7 depicts the overall algorithm flow of the projection-based adaptive window size selection scheme. Initially, column projection is applied in the reference frame and in the current frame to project each 2D block into a 1D row; PMAD is calculated and compared for all 1D projected rows, and the best-matching block is found for the initial frame. To find the next best-matching block, we use the AWSS algorithm to fix the search window size; after column projection, PMAD is calculated for all blocks within this window, and the best match is chosen. AWSS is repeated in this way until the last frame in the sequence is reached.

Fig. 7 Flow chart for the overall algorithm.

Fig. 8 (a) Original frame, (b) after PAWSS, (c) pixel difference of (a) and (b).
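The per-block loop described above can be sketched end to end for one frame pair. This is a simplified sketch under stated assumptions, not the JM-based implementation: blocks are square, the starting window of ±16, the tie-break toward shorter vectors, and the toy thresholds passed in `th` are all hypothetical choices, and projections are recomputed per candidate instead of being buffered.

```python
def pmad(cur, ref, x, y, dx, dy, b):
    """PMAD of eq. (5) for a square b x b block."""
    total = 0
    for i in range(b):
        pc = sum(cur[y + j][x + i] for j in range(b))
        pr = sum(ref[y + dy + j][x + dx + i] for j in range(b))
        total += abs(pc - pr)
    return total


def best_match(cur, ref, x, y, b, rng):
    """Return (cost, dx, dy) minimising PMAD within +/-rng pixels."""
    best = None
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            if not (0 <= y + dy <= len(ref) - b and 0 <= x + dx <= len(ref[0]) - b):
                continue  # candidate lies outside the reference frame
            cost = pmad(cur, ref, x, y, dx, dy, b)
            key = (cost, abs(dx) + abs(dy))  # assumption: prefer shorter vectors on ties
            if best is None or key < best[0]:
                best = (key, dx, dy)
    return best[0][0], best[1], best[2]


def next_window(window, cost, sum_mv, th):
    """Flows 1 to 4 of Sect. 3.3; th = (PMADTh1, PMADTh2, MVTh1, MVTh2)."""
    pm1, pm2, mv1, mv2 = th
    if window in (8, 16) and cost > pm1 and sum_mv > mv1:
        return 32
    if window == 8 and pm2 < cost < pm1:
        return 16
    if window in (16, 32) and cost < pm2 and sum_mv < mv2:
        return 8
    if window == 32 and pm2 < cost < pm1 and sum_mv < mv1:
        return 16
    return window  # assumption: otherwise keep the current window


def estimate_frame(cur, ref, b, th, start=16):
    """Raster-scan the blocks, adapting the window from the previous block."""
    window, result = start, []
    for y in range(0, len(cur) - b + 1, b):
        for x in range(0, len(cur[0]) - b + 1, b):
            cost, dx, dy = best_match(cur, ref, x, y, b, window)
            result.append(((x, y), (dx, dy), window))
            window = next_window(window, cost, abs(dx) + abs(dy), th)
    return result


# Static toy scene (hypothetical): current frame equals the reference frame.
frame = [[(r * 31 + c * 17) % 256 for c in range(16)] for r in range(16)]
result = estimate_frame(frame, frame, 4, (40, 10, 4, 2))
print(result[:3])
```

For a static scene every block matches at zero displacement, so the window narrows to ±8 after the first block and stays there, which is the case where the adaptive window saves the most computation.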

4. Simulation Results

The proposed motion estimation algorithm was implemented in the JM reference software, version 10.1 [9]. We tested the proposed method on the standard QCIF (176 × 144), 30 frames per second “Foreman” test sequence, with slice mode turned off and the YUV 4:2:0 format. Results of PAWSS compared with the original video frame, together with their pixel difference, are shown in Fig. 8: Fig. 8(a) is the original frame, Fig. 8(b) is the frame obtained after PAWSS, and Fig. 8(c) gives the error (pixel difference) between the two.

Figure 9 compares the computation time of the fast full search used in the JM reference software [9] with that of the proposed PAWSS. For this experiment, we used the full Foreman sequence. The quantization parameter was set to 20 and 10 for I and P frames, respectively. The CAVLC entropy coding scheme was selected for our simulation.

The total motion estimation time is reduced by 45% compared with the fast full search method, and the total encoding time is reduced by about 23%.

4.1 Simulation Environment

The simulation was run on a 2.8 GHz Pentium 4 workstation with 512 MB RAM running Windows XP Professional with Service Pack 2. The reference encoder was modified to include the PAWSS method. It was written in plain C, without any platform-dependent optimizations, and compiled with the VC++ compiler, version 6.0.

Fig. 9 Computation time complexity: PAWSS compared with fast full search method. (1) Total motion estimation time. (2) Total encoding time.

Table 1 Bitrate comparison.

Table 2 PSNR comparison.

Table 1 compares the bit rate of fast full search and PAWSS. PAWSS slightly reduces the bit rate, from 2268604 to 2266884 (about 0.08%). The bits per picture are also slightly lower for both P and I frames with PAWSS than with the fast full search scheme. This shows that PAWSS does not change the bit rate significantly; rather, it reduces computational complexity and finds motion vectors quickly, which in turn increases encoding speed.

Table 2 tabulates the PSNR of fast full search and PAWSS. It shows negligible PSNR degradation of about 0.088 dB for I frames and 0.057 dB for P frames.

5. Conclusion

We have presented projection-based motion estimation with Adaptive Window Size Selection. The method greatly reduces computational complexity while maintaining prediction integrity, since most candidates can be quickly eliminated by matching 1D projections, which is faster than matching 2D blocks. Moreover, the adaptive scheme that fixes the window size for succeeding blocks avoids unneeded computation. In encoding QCIF video, our method reduces the computational complexity of block-based motion estimation by 45% and of overall encoding by 23% with negligible PSNR degradation, which in effect maintains image/video quality.

Acknowledgments

This work is supported by the National Science Council, Republic of China, under research grant NSC93-2215-E-006-019.

References

[1] P. Kuhn, Algorithms, Complexity Analysis and VLSI Architectures for MPEG-4 Motion Estimation, Kluwer, 1999.

[2] Joint Video Team of ITU-T and ISO/IEC JTC 1, “Draft ITU-T recommendation and final draft international standard of joint video specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC),” JVT of ISO/IEC MPEG and ITU-T VCEG, JVT-G050, March 2003.

[3] T. Wiegand, G.J. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the H.264/AVC video coding standard,” IEEE Trans. Circuits Syst. Video Technol., vol.13, no.7, pp.560–576, 2003.

[4] P. Milanfar, “A model of the effect of image motion in the radon transform domain,” IEEE Trans. Image Process., vol.8, no.9, pp.1276–1281, Sept. 1999.

[5] J. Kim and R. Park, “A fast feature-based block matching algorithm using integral projections,” IEEE J. Sel. Areas Commun., vol.10, no.5, pp.968–971, June 1992.

[6] K.L. Chung and L.C. Chang, “A new predictive search area approach for fast block motion estimation,” IEEE Trans. Image Process., vol.12, no.6, pp.648–652, June 2003.

[7] C. Tu, T. Tran, and P. Topiwala, “A hybrid feature/image block motion estimation approach,” ITU-T/VCEG M.26.doc, Austin Meeting, April 2001.

[8] T. Yamada, M. Ikekawa, and I. Kuroda, “Fast and accurate motion estimation algorithm by adaptive search range and shape selection,” Proc. ICASSP, vol.2, pp.897–900, 2005.

[9] Joint Video Team of ISO/IEC MPEG and ITU-T VCEG, H.264/AVC Reference Software JM10.1 (online), http://bs.hhi.de/~suehring/tml/download/

[10] J. Mitchell, W. Pennebaker, C. Fogg, and D. LeGall, MPEG Video Compression Standard, Chapman and Hall, 1997.

Anand Paul is currently pursuing the Ph.D. degree in electrical engineering at National Cheng Kung University, Taiwan, R.O.C. His research interests include algorithms and architectures for motion estimation in video, and digital video SoC design for H.264/AVC.

Jhing-Fa Wang is now a Chair Professor at National Cheng Kung University, Tainan, Taiwan. He received his Bachelor and Master degrees from the Department of Electrical Engineering, National Cheng Kung University, Taiwan, in 1973 and 1979, respectively, and the Ph.D. degree from the Department of Computer Science and Electrical Engineering, Stevens Institute of Technology, U.S.A., in 1983. He was elected an IEEE Fellow in 1999 and is now the Chairman of the IEEE Tainan Section. He received outstanding awards from the Institute of Information Industry in 1991 and from the National Science Council of Taiwan in 1990, 1995, and 1997. He has been invited to give a keynote speech at PACLIC 12 (Pacific Asia Conference on Language, Information and Computation), Singapore, and served as the general chairman of ISCOM 2001 (International Symposium on Communication), Taiwan. He has developed a Mandarin speech recognition system called Venus-Dictate, known as a pioneering system in Taiwan. He was an associate editor for IEEE Transactions on Neural Networks and VLSI Systems. He is currently leading a multidisciplinary research group for the development of “Advanced Ubiquitous Media for Created Cyberspace.” He has published about 91 journal papers and 217 conference papers and has obtained 5 patents since 1983. His research areas include wireless content-based media processing, image processing, speech recognition, and natural language understanding.

Jia-Ching Wang received the M.S. and Ph.D. degrees in electrical engineering from National Cheng Kung University, Tainan, Taiwan, in 1997 and 2002, respectively. His research interests include signal processing and VLSI architecture design. Dr. Wang is a member of the Phi Tau Phi Scholastic Honor Society. He is also a member of IEEE, ACM, and IEICE.

An-Chao Tsai is currently pursuing the Ph.D. degree in electrical engineering at National Cheng Kung University, Taiwan, R.O.C. His research interests include architecture for entropy coding, video processing, and digital video SW/HW co-design.

Jang-Ting Chen received his B.S. degree in electronics engineering from National Taiwan University of Science and Technology, Taipei, Taiwan, R.O.C., in 1998. His research interests include intelligent video coding technology, H.264/AVC video coding, and associated VLSI architecture.
