
Efficient hierarchical motion estimation algorithm based on visual pattern block segmentation


1997 IEEE International Symposium on Circuits and Systems, June 9-12, 1997, Hong Kong

EFFICIENT HIERARCHICAL MOTION ESTIMATION ALGORITHM BASED ON

VISUAL PATTERN BLOCK SEGMENTATION

Mei-Juan Chen, Liang-Gee Chen, Ro-Min Weng and Yung-Pin Lee

Department of Electrical Engineering, National Taiwan University

Taipei, Taiwan, R.O.C.

ABSTRACT

A new hierarchical block-matching motion estimation algorithm with segmentation by variable block size and visual pattern is presented. Block partition with visual patterns, which exploits the properties of the human visual system (HVS), is employed to get a more precise motion estimate for detailed regions of an image with complex motion. The hierarchical search expedites the motion estimation and simplifies the processing of stationary areas. Only simple control overhead and little side information are required, owing to the regular decomposition of the visual pattern structures. The proposed method outperforms conventional full search block-matching motion estimation while drastically reducing computational complexity. Extensive experimental results are included in this paper.

1. INTRODUCTION

Motion estimation is a powerful technique for eliminating temporal redundancy in video compression. The block-matching approach [1] is popular and has been widely adopted by several video coding standards, such as H.261 [2], H.263 [3] and MPEG [4]. Because the removal of temporal redundancy between successive image frames relies heavily on block-matching motion estimation, the performance and efficiency of a video coding system depend on the accuracy and speed of the block-matching motion estimation algorithm (BMA).

Block-matching algorithms find the motion vector by block-by-block matching. Jain and Jain [1] originally proposed motion estimation that divides the image into fixed-size blocks. The motion vector is the location that has the maximum correlation value between blocks in temporally adjacent frames. Each equal-size block is compared against candidate blocks in a search area of the previous frame to find the best match. The success of a BMA lies in its prediction ability. Among the various BMAs, the full search (or exhaustive search) is the most popular one. However, it usually requires a massive computation effort: in a typical hybrid coding system, 51% of the time is spent on motion estimation. For applications where power consumption and processing speed are critical, fast algorithms are clearly necessary. Various fast search algorithms [6][7] have been proposed to alleviate the computational burden imposed by full search. The majority of these fast search algorithms greatly reduce the number of searched positions, based on the assumption that the matching criterion increases monotonically as the search position moves away from the best match position, and so reduce the computation time at the expense of accuracy.
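The full search described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes frames stored as lists of rows of luminance values, uses the mean absolute error (the criterion named in Section 3) as the matching measure, and the names `mae` and `full_search` are hypothetical.

```python
def mae(cur, ref, bx, by, dx, dy, n):
    """Mean absolute error between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total / (n * n)


def full_search(cur, ref, bx, by, n, p):
    """Test every displacement in [-p, p] x [-p, p]; return (best_mv, best_mae)."""
    h, w = len(ref), len(ref[0])
    best_mv, best_err = (0, 0), mae(cur, ref, bx, by, 0, 0, n)
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            # Skip candidates that fall outside the reference frame.
            if not (0 <= bx + dx and bx + dx + n <= w
                    and 0 <= by + dy and by + dy + n <= h):
                continue
            err = mae(cur, ref, bx, by, dx, dy, n)
            if err < best_err:
                best_mv, best_err = (dx, dy), err
    return best_mv, best_err


# Synthetic example: the current frame is the reference shifted right by 2 pixels,
# so the best match for a block lies 2 pixels to the left in the reference.
ref = [[(3 * x + 7 * y + x * y) % 251 for x in range(16)] for y in range(16)]
cur = [[ref[y][max(x - 2, 0)] for x in range(16)] for y in range(16)]
mv, err = full_search(cur, ref, bx=4, by=4, n=4, p=3)
```

The cost grows as (2p + 1)^2 block comparisons per block, which is the burden the fast and hierarchical searches try to reduce.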

For BMA, there is an implicit assumption that the motion within each block is uniform. This is not always valid if the fixed block size is too large compared to the real objects in an image. The block effect due to moving edges then becomes noticeable and the quality of the prediction suffers. However, decreasing the block size increases the number of motion vectors to be transmitted, and the large overhead hurts coding efficiency. Quad-tree segmentation has been used in image processing [12]. In [9][10][11], the image is segmented into variable block sizes based on bin-tree or quad-tree decompositions before BMA, as a compromise between performance and bit-rate. Quad-tree segmentation is one of the popular techniques for variable block size coding. According to [10], the quad-tree is a tree structure in which each node, unless it is a leaf, generates four children. Each child occupies a quarter of the area of its parent. The subdivision of a parent node into its four children is guided by a homogeneity test, which decides whether the four subblocks of the temporal difference are homogeneous. If the test fails, the four subblocks are generated and regarded as four new independent parent blocks to be tested further. After the quad-tree decomposition, block-matching motion estimation is conducted on the variable-size blocks.

0-7803-3583-X/97 $10.00 © 1997 IEEE
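The top-down quad-tree decomposition can be sketched recursively. The exact homogeneity test of [10] is not given here, so the sketch assumes one: the four subblocks count as homogeneous when their mean absolute temporal differences differ by at most `thresh`. The function names, `thresh` and `min_size` are illustrative.

```python
def mean_abs_diff(cur, prev, x, y, n):
    """Mean absolute temporal difference of the n x n block at (x, y)."""
    return sum(abs(cur[y + j][x + i] - prev[y + j][x + i])
               for j in range(n) for i in range(n)) / (n * n)


def quadtree_split(cur, prev, x, y, n, min_size, thresh):
    """Return the leaf blocks (x, y, size) of a top-down quad-tree."""
    if n <= min_size:
        return [(x, y, n)]
    h = n // 2
    children = [(x, y), (x + h, y), (x, y + h), (x + h, y + h)]
    diffs = [mean_abs_diff(cur, prev, cx, cy, h) for cx, cy in children]
    # Assumed homogeneity test: keep the parent if the subblocks behave alike.
    if max(diffs) - min(diffs) <= thresh:
        return [(x, y, n)]
    leaves = []
    for cx, cy in children:
        # Each child becomes a new independent parent to be tested further.
        leaves.extend(quadtree_split(cur, prev, cx, cy, h, min_size, thresh))
    return leaves


# Synthetic example: only a 2x2 patch in the top-left corner changes.
prev = [[0] * 8 for _ in range(8)]
cur = [[100 if x < 2 and y < 2 else 0 for x in range(8)] for y in range(8)]
leaves = quadtree_split(cur, prev, 0, 0, 8, min_size=2, thresh=5)
```

In this example the changed corner is refined down to 2x2 leaves while the three static quadrants stay as single 4x4 leaves, giving seven blocks in total.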

Natural video images typically consist of regions with widely varying content and activity. Most motion estimation algorithms treat every block equally, regardless of the different complexity of motion in the various blocks. To use computing resources efficiently, different computation effort should be spent on blocks containing different amounts of motion. Thus the video image can be segmented into regions having widely different degrees of motion. Especially in a typical videoconferencing sequence, certain regions, like the speaker's eyes and mouth in a head-and-shoulders sequence, are critical to our subjective evaluation of quality, and relatively small errors can have a major perceptual degrading effect on the overall quality. Such regions tend to dominate the viewer's attention and are intrinsically more difficult to compensate than the 'background' of the sequence.

Motivated by the above considerations, it is desirable to find a segmentation that allocates less computation effort and bit-rate to homogeneous regions and more to areas with complex motion. In this paper, we devise a hierarchical BMA in which visual pattern structures are used to describe the segmentation of the blocks. The hierarchical procedure simplifies the motion estimation of blocks with slow motion. For regions experiencing complicated motion, the algorithm provides precise motion estimation by splitting them into visual pattern blocks, which are designed to take advantage of human visual system characteristics for image coding [13]. The visual pattern design is developed using relevant psychophysical and physiological data. The objective of our approach is to obtain uniform quality over the entire image, and especially to reduce the large degradation in areas with quick motion.


2. PROPOSED HIERARCHICAL ALGORITHM

Hierarchical processing is being proposed for a growing number of applications in image processing and video coding [8][11]. Its success may be mainly attributed to the similar level of efficiency achieved in each hierarchy. In our proposed algorithm, the hierarchy is defined by the block size. To obtain the flexibility of visual pattern block partition while avoiding the excessive overhead or side information needed to characterize more sophisticated image segmentation, we use top-down (splitting) bit-quads and visual pattern blocks for image decomposition. From the top to the bottom level, the block sizes are 32x32, 16x16, 8x8 and the visual pattern block (VPB). The image segmentation does not involve upsampling or downsampling because the image has a single resolution in every hierarchy.

The encoding procedure takes place in three stages. In the first stage, an initial segmentation of the current frame into 32x32 blocks is performed; the procedure starts from the largest possible block (32x32). The first stage determines which bit-quads pattern (see Figure 1) each 32x32 block should be mapped to. For each of the four 16x16 subblocks in a 32x32 block, the average absolute frame difference (FD) is used as the segmentation rule. If this value is greater than some threshold, bit 1 is assigned to indicate that the subblock needs further processing to reach satisfactory quality, and half of the average frame difference of this subblock becomes the adaptive threshold for the test at the next hierarchy. In general, video statistics are not known a priori. For this reason, we develop an adaptive algorithm that is capable of adjusting the thresholds to the statistics of the video activity. To avoid unnecessary division of blocks due to small luminance changes or random noise, the adaptive threshold is compared against a minimum threshold (min_thresh). If the adaptive threshold is smaller than min_thresh, min_thresh replaces it. If, however, the threshold test is satisfied for the 16x16 block, the operation stops and bit 0 is assigned to the subblock. The bit-quads pattern, which is used in binary image processing [5], describes the condition of the four 16x16 subblocks in a 32x32 block. If a subblock has bit 0, the 16x16 block is treated as 'background' and does not need the second stage. Otherwise, bit 1 indicates that the 16x16 block requires processing at the next hierarchy (stage 2). Since natural images usually contain numerous large, approximately static regions, larger blocks are adequate there. Four bits of pattern index are needed for each 32x32 block at stage 1.
The purpose of the first stage of the coder is to find the stationary areas in order to save further processing effort; no motion estimation operation is involved.
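The stage 1 decision can be sketched as follows. This is an illustration under assumptions: the raster order of the four bits is a guess (Figure 1 is not reproduced here), the threshold values in the example are arbitrary, and `stage1_bit_quads` is a hypothetical name.

```python
def stage1_bit_quads(cur, prev, x0, y0, thresh, min_thresh):
    """Classify the four 16x16 subblocks of the 32x32 block at (x0, y0).

    Returns (bits, next_thresh): bits[i] is 1 when subblock i needs stage 2,
    and next_thresh[i] is the adaptive threshold handed to stage 2
    (None for subblocks marked 0).
    """
    bits, next_thresh = [], []
    for oy in (0, 16):          # assumed raster order: top-left, top-right,
        for ox in (0, 16):      # bottom-left, bottom-right
            fd = sum(abs(cur[y0 + oy + j][x0 + ox + i]
                         - prev[y0 + oy + j][x0 + ox + i])
                     for j in range(16) for i in range(16)) / 256.0
            if fd > thresh:
                bits.append(1)
                # Half of the average frame difference, floored at min_thresh.
                next_thresh.append(max(fd / 2.0, min_thresh))
            else:
                bits.append(0)
                next_thresh.append(None)
    return bits, next_thresh


# Synthetic example: only the top-right 16x16 subblock changes between frames.
prev = [[0] * 32 for _ in range(32)]
cur = [[0] * 32 for _ in range(32)]
for y in range(16):
    for x in range(16, 32):
        cur[y][x] = 10
bits, next_thresh = stage1_bit_quads(cur, prev, 0, 0, thresh=4.0, min_thresh=2.0)
```

Note that no block matching is performed here: stage 1 only compares co-located pixels, which is why it is cheap.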

For each 16x16 block with bit 1 in the bit-quads pattern decided at stage 1, a full search block matching is performed at stage 2. A search interval of only -2 to +2 pixels is sufficient. For each of the four 8x8 blocks, the displaced frame difference (DFD) is used to determine whether the block should be processed further. If the adaptive threshold obtained at stage 1 is satisfied, the operation stops for the 8x8 block. Otherwise, half of the DFD becomes the adaptive threshold for the next hierarchy. The minimum-threshold check is the same as in the first stage. As in stage 1, a bit-quads pattern indicates the decomposition structure of stage 2.

At stage 3, for each 8x8 block with bit 1 in the bit-quads pattern, a full search block matching with a -2 to +2 search range is performed around the motion vector obtained at stage 2. The displaced frame difference (DFD) of each 8x8 block is checked to see whether the threshold test is satisfied. If it is not, a full search block matching with a -3 to +3 search range is carried out around the motion vector obtained so far. In this stage, we use the visual pattern blocks as the segmentation model (see Figure 2). The size of the large block is 8x8 and that of the small block is 2x2. The visual pattern blocks in an 8x8 block are designed around orientation. An 8x8 block can be uniform (index 0), partitioned into two regions (index 1-14) with orientation 90°, 45°, 0° or -45°, or decomposed into four subblocks (index 15). The differently colored regions in Figure 2 represent different objects, each with homogeneous motion, within the block. The algorithm computes the DFD of every 2x2 subblock for every displacement in the search range. The minimum DFD of a pattern is defined as the sum of the individual minimum DFDs of the different parts of that visual pattern block. Although there are 16 possibilities for segmenting the block into separate objects, the computational complexity is the same as that of an 8x8 block matching, at the expense of additional memory elements. Pattern 15 needs more bit-rate for coding motion vectors, so there is a compromise between performance and bit-rate for pattern 15. The index between 0 and 15 (see Figure 2) of the pattern whose minimum DFD is optimal is transmitted. The patterns require only a small overhead rate because the shape and possible number of the final regions are restricted to a predetermined set of options. The residual image is considerably concentrated at the boundaries of moving objects. The visual patterns are selected using a simple viewing geometry model in conjunction with measured properties of biological vision, and are suitable for describing the boundaries of objects in a small block. The visual pattern decomposition provides an effective and economical solution to the problem of object segmentation in block-matching motion estimation.
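The minimum-DFD computation for a visual pattern can be sketched as below. The sketch assumes the sum of absolute differences as the DFD measure and represents a pattern as a 4x4 grid of region labels, one per 2x2 cell. The two patterns in the example are illustrative stand-ins, not the 16 patterns of Figure 2, and all names are hypothetical. The caller must keep the displaced block inside the reference frame.

```python
def cell_sads(cur, ref, bx, by, center, p):
    """SAD of each 2x2 cell of the 8x8 block at (bx, by), for every displacement
    within +/-p of `center`. Returns {(dx, dy): 4x4 grid of cell SADs}."""
    table = {}
    for dy in range(center[1] - p, center[1] + p + 1):
        for dx in range(center[0] - p, center[0] + p + 1):
            grid = [[0] * 4 for _ in range(4)]
            for y in range(8):
                for x in range(8):
                    diff = cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x]
                    grid[y // 2][x // 2] += abs(diff)
            table[(dx, dy)] = grid
    return table


def region_sad(grid, labels, region):
    """Sum the cell SADs that belong to one region of a pattern."""
    return sum(grid[j][i] for j in range(4) for i in range(4)
               if labels[j][i] == region)


def pattern_cost(table, labels):
    """Sum over regions of each region's minimum SAD; also return the vectors."""
    cost, vectors = 0, {}
    for region in sorted({r for row in labels for r in row}):
        best_d = min(table, key=lambda d: region_sad(table[d], labels, region))
        vectors[region] = best_d
        cost += region_sad(table[best_d], labels, region)
    return cost, vectors


# Synthetic example: the left half of the block moves one way, the right half the other.
ref = [[x * y + 2 * x + 3 * y for x in range(16)] for y in range(16)]
cur = [[ref[y][x + 1] if x < 8 else ref[y][x - 1] for x in range(16)]
       for y in range(16)]
table = cell_sads(cur, ref, bx=4, by=4, center=(0, 0), p=1)

UNIFORM = [[0] * 4 for _ in range(4)]                # one region
VERTICAL_SPLIT = [[0, 0, 1, 1] for _ in range(4)]    # two regions, vertical boundary
cost_uniform, _ = pattern_cost(table, UNIFORM)
cost_split, vectors = pattern_cost(table, VERTICAL_SPLIT)
```

The cell SAD table is built once per displacement and reused for every pattern, which is consistent with the remark that evaluating all patterns costs about as much as one 8x8 block matching plus storage. A pattern with more regions can never have a larger summed minimum than a coarser one, so a practical selection rule would also have to weigh the extra motion vectors, as the text notes for pattern 15.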

The bit-quads and visual patterns themselves are not transmitted but are built independently by the receiver and the transmitter; only a few indexes must be communicated from the transmitter to the receiver to define the shape. The small overhead of the proposed method is due to its structural decomposition property, i.e. partitioning the picture frame into subblocks whose sizes, shapes and locations are predetermined and thus are not transmitted. In the hierarchical search, matching is first performed on the 16x16 block to obtain an initial estimate of the vector field; the computed vector field is then propagated to the next stage, where it is corrected and propagated again until the necessary stage is reached. At each stage, the vector field to be transmitted is composed of small vectors, which can be coded efficiently using an MPEG-like Huffman coder. The hierarchical search combined with bit-quads and visual patterns thus reduces the overhead for coding motion vectors.
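As a rough arithmetic check on the hierarchical search, the per-stage refinement ranges quoted in this section can be accumulated. The sketch assumes the ranges simply add, since each stage searches around the vector passed down from the previous one; the dictionary keys are illustrative labels.

```python
# Per-stage refinement ranges from this section. The visual pattern search
# at stage 3 only runs when the 8x8 threshold test fails.
stage_ranges = {
    "stage 2 (16x16)": 2,
    "stage 3 (8x8)": 2,
    "stage 3 (visual pattern)": 3,
}

# Largest displacement reachable after all refinements.
total_range = sum(stage_ranges.values())

# Candidate positions tested per stage, and by a single-level full search
# covering the same total range.
candidates_per_stage = {name: (2 * r + 1) ** 2 for name, r in stage_ranges.items()}
hierarchical_candidates = sum(candidates_per_stage.values())
full_search_candidates = (2 * total_range + 1) ** 2
```

The accumulated range of 7 pixels agrees with the -7 to +7 final search range stated in Section 3. The candidate counts (99 against 225) are only indicative, because the stages operate on different block sizes and many blocks stop before the last stage.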

3. SIMULATION RESULTS

The performance of the proposed algorithm is demonstrated on five video sequences: Claire, Miss America, Susie, Salesman and Table Tennis. These sequences were chosen for their different motions and characteristics. Claire and Miss America are typical head-and-shoulders sequences. Susie and Salesman involve local motions. These four sequences show a speaker superimposed on a static background, whereas Table Tennis includes zooming and large displacements. Each sequence has 60 frames at 30 frames/sec, luminance only. The proposed method is evaluated and compared with the conventional full search (block size 16x16), the three-step fast search (block size 16x16) and variable block size quad-tree segmentation block matching [10]. For the quad-tree segmentation block-matching scheme, the largest possible block has 32x32 pixels, whereas the smallest one consists of 4x4 pixels; the full search is then applied. All the schemes use the mean absolute error (MAE) as the matching criterion for simplicity, and the final search ranges of motion estimation are all set to -7 to +7 with integer precision.

The methods are also compared in terms of PSNR, computation complexity, block number, total bit-rate and subjective quality. The PSNR gives an objective measure of the quality of the reconstructed image. The computation complexity, which is important for real-time video coding, is expressed in units of one 16x16 block matching. The total bit-rate consists of two components. One is the entropy of the prediction error image. The other is the bit-rate for coding the motion vectors, which is obtained by multiplying the motion vector entropy by the total block number. The motion vector bit-rate also includes the overhead for transmitting the pattern indexes used in our strategy, or the code addressing the quad-tree structure for the quad-tree decomposition method.
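The evaluation measures described above can be sketched as follows. Two points are assumptions rather than statements from the paper: the entropy is taken as first-order entropy, and the motion vector bits are divided by the pixel count so that both components are in bits/pixel, the unit used in Table 1. Function names are illustrative.

```python
import math
from collections import Counter


def psnr(orig, recon, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    n = len(orig) * len(orig[0])
    mse = sum((a - b) ** 2
              for row_o, row_r in zip(orig, recon)
              for a, b in zip(row_o, row_r)) / n
    return float("inf") if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def entropy(symbols):
    """First-order entropy in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def total_bit_rate(residual, motion_vectors, overhead_bits=0):
    """Return (DFD rate, MV + overhead rate, total), all in bits per pixel."""
    pixels = len(residual) * len(residual[0])
    dfd_rate = entropy([v for row in residual for v in row])
    # MV entropy times block count, plus pattern-index overhead, per pixel.
    mv_bits = entropy(motion_vectors) * len(motion_vectors) + overhead_bits
    mv_rate = mv_bits / pixels
    return dfd_rate, mv_rate, dfd_rate + mv_rate


# Small synthetic example.
orig = [[100] * 4 for _ in range(4)]
recon = [[110] * 4 for _ in range(4)]
quality = psnr(orig, recon)

residual = [[0, 1, 0, 1] for _ in range(4)]
mvs = [(0, 0), (0, 0), (1, 0), (1, 0)]
dfd_rate, mv_rate, total = total_bit_rate(residual, mvs)
```

The "Total" column of Table 1 is the sum of the "DFD" and "MV+Over." columns in the same way, for example 3.52 + 0.023 = 3.543.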

Table 1 compares the average performance of the proposed hierarchical algorithm with the other approaches. The superiority over the previous methods is evident. Regardless of picture type, in terms of objective quality (PSNR) our proposed method outperforms the conventional schemes, even full search. A massive computation effort is needed by the full search algorithm. Even for the fast-moving sequence (Table Tennis), the computation of our method is lower than that of the three-step search. The performance improvement and computation reduction of our proposed algorithm arise because large blocks that exhibit low picture activity, like backgrounds, are usually compensated quite successfully with only small vectors; for regions with complicated motion, however, the algorithm divides the blocks down to a lower hierarchy or into visual pattern blocks to estimate motion accurately, and thus improves the accuracy of the reconstructed image and lowers the bit-rate of the error images. Thanks to the hierarchical procedure, no large amount of computation is needed for the visual pattern blocks, and the vector distribution is smoother, so a lower bit-rate is needed for coding motion vectors. With respect to bit-rate, the proposed algorithm requires very little overhead for coding the bit-quads and visual pattern indexes. With fewer blocks and little overhead, the bit-rate of the proposed method is comparable to that of the other methods. In [14], the edge information in a block is extracted to guide block matching. Although that scheme makes use of edge features, its performance is inferior to full search.

Figure 3 illustrates the subjective quality of the pictures reconstructed with the motion-compensated vectors of the proposed algorithm and of full search for the Table Tennis sequence. The figure shows that the proposed algorithm applied to visual pattern blocks compensates the motion so accurately that there is little need for coding the error signal. Both uniform areas and the many edges and details in the image are well compensated. The image quality is clearly better when the proposed algorithm is used. Moreover, the blocking artifact present with fixed-size block matching is almost nonexistent. In particular, observe small details such as the player's hands, the boundary of the banner and the corner of the table. The quad-tree segmentation using the homogeneity test on the temporal difference tends to produce clusters of smaller blocks. Although the use of smaller blocks results in higher adaptivity, the correlation among blocks cannot be exploited, which limits the accuracy and the compression ratio achieved. The bit-quads and visual patterns, by contrast, give a segmentation that fits the real objects. In particular, observe the partition of visual patterns used at the player's left ear, hands and the ball. In the simulation, our method gives more reliable motion estimation with much lower computation.

4. CONCLUSION

A new hierarchical algorithm for block-matching motion estimation has been presented, which combines the concepts of variable block size and visual pattern segmentation. By using variable block size segmentation, different computational effort can be spent on regions having different complexity of motion. In particular, we apply the visual patterns of single-image coding to the decomposition of the smaller blocks to get very accurate motion compensation for regions with complex motion. We develop an adaptive algorithm that is capable of adjusting the thresholds to the statistics of the video activity. Extensive experiments have shown that the proposed method reduces computation complexity while providing better performance than conventional block-matching methods.

REFERENCES

[1] J.R. Jain and A.K. Jain, "Displacement measurement and its application in interframe image coding", IEEE Trans. Commun., Vol. COM-29, No. 12, pp. 1799-1808, December 1981.

[2] "CCITT Standard H.261, Video codec for audio visual services at p x 64 kbps", July 1990.

[3] ITU-T Study Group 15, Working Party 15/1, "Draft ITU-T Recommendation H.263 - Video Coding for Low Bit-Rate Communication", July 1995.

[4] "ISO/IEC 13818-2 Generic Coding of Moving Pictures and Associated Audio: Recommendation H.262", November 1993.

[5] W.K. Pratt, Digital Image Processing, A Wiley-Interscience Publication, 1991.

[6] T. Koga, K. Iinuma, A. Hirano, Y. Iijima, and T. Ishiguro, "Motion compensated interframe coding for video conferencing", Proc. NTC81, pp. G5.3.1-G5.3.5, New Orleans, LA, December 1981.

[7] M.J. Chen, L.G. Chen and T.D. Chiueh, "One-dimensional Motion Estimation Algorithm for Video Coding", IEEE Trans. CAS for Video Technology, Vol. 4, No. 5, pp. 504-509, October 1994.

[8] M. Bierling, "Displacement estimation by hierarchical block-matching", SPIE, Vol. 1001 Visual Communications and Image Processing, pp. 942-951, 1988.

[9] M.H. Chan, Y.B. Yu, A.G. Constantinides, "Variable size block matching motion compensation with applications to video coding", IEE Proceedings, Vol. 137, No. 4, August 1990.

[10] V. Seferidis, M. Ghanbari, "Generalised block-matching motion estimation using quad-tree structured spatial decomposition", IEE Proceedings - Vision, Image and Signal Processing, Vol. 141, No. 6, pp. 446-452, December 1994.

[11] X. Xia and Y.Q. Shi, "A thresholding hierarchical block matching algorithm for motion estimation", ISCAS, pp. 624-627, 1996.

[12] E. Shusterman and M. Feder, "Image compression via improved quadtree decomposition algorithms", IEEE Trans. Image Processing, Vol. 3, No. 2, pp. 207-215, March 1994.

[13] D. Chen and A.C. Bovik, "Visual pattern image coding", IEEE Trans. Communications, Vol. 38, No. 12, pp. 2137-2145, December 1990.

[14] S. Zhong, F. Chin, Y.S. Cheung and D. Kwan, "Hierarchical motion estimation based on visual patterns for video coding", ICASSP, pp. 2323-2326, 1996.


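Table 1. The performance comparison for various algorithms and video sequences

Sequence / Algorithm    PSNR (dB)   Match    Block Number   Entropy (bits/pixel)
                                                            DFD     MV+Over.   Total

Miss America
  Proposed              38.78        4118    218            3.52    0.023      3.543
  Full search           38.52       64148    308            3.50    0.022      3.522
  Three-step            38.15        7931    308            3.56    0.021      3.581
  Quadtree              38.28       13739    154            3.56    0.010      3.570

(sequence name not legible)
  Proposed              42.95        1216    124            2.15    0.010      2.160
  Full search           42.28       64148    308            2.17    0.006      2.176
  Three-step            42.21        7954    308            2.18    0.006      2.186
  Quadtree              42.43       15392    249            2.17    0.012      2.182

(sequence name not legible)
  Proposed              36.18        6123    325            3.82    0.039      3.859
  Full search           35.65       64148    308            3.84    0.012      3.852
  Three-step            35.14        7902    308            3.90    0.012      3.912
  Quadtree              35.45       19093    437            3.87    0.029      3.899

(sequence name not legible)
  Proposed              -           -        -              3.98    0.024      4.004
  Full search           -           -        -              4.02    0.007      4.027
  Three-step            -           -        -              4.04    0.008      4.048
  Quadtree              -           -        -              4.07    0.017      4.087

(sequence name not legible)
  Proposed              -           -        -              5.35    0.069      5.419
  Full search           -           -        -              5.28    0.023      5.303
  Three-step            -           -        -              5.49    0.025      5.515
  Quadtree              -           -        -              5.37    0.062      5.432

[Row grouping is inferred from the value order in the source. The label "Susie" and an isolated Match value of 22969 also appear in the source but cannot be assigned to a specific row.]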

Figure 1. The bit-quads pattern used in stage 1 (segmenting a 32x32 block into four 16x16 blocks) and stage 2 (segmenting a 16x16 block into four 8x8 blocks)

Figure 2. Set of visual pattern blocks used in stage 3 (segmenting an 8x8 block into visual pattern blocks)

Figure 3. Subjective comparison of proposed method and other approaches for Table Tennis sequence. Panels: Original Picture; Proposed Algorithm; Segmentation of Proposed Algorithm; Quad-tree Segmentation.
