A hierarchical decimation lattice based on N-queen with an application for motion estimation


228 IEEE SIGNAL PROCESSING LETTERS, VOL. 10, NO. 8, AUGUST 2003


Chung-Neng Wang, Shin-Wei Yang, Chi-Min Liu, and Tihao Chiang, Senior Member, IEEE

Abstract—We present a novel technique, the N-queen lattice, to spatially subsample a block of pixels. Although this lattice is pertinent to many applications, we present an application to speed up motion estimation with minimal loss of coding efficiency. The N-queen lattice is constructed to characterize spatial features in all directions. It can be hierarchically organized for motion estimation with variable nonsquare block size. Despite the randomized lattice structure, we demonstrate that it is possible to achieve a compact data storage architecture for efficient memory access and simple hardware implementation. Our simulations show that the N-queen lattice is superior to several existing sampling techniques, with an improvement in speed of about N times and a small loss in peak SNR.

Index Terms—Fast motion search, N-queen lattice, pixel decimation.

I. INTRODUCTION

Several video coding standards, including MPEG-1/2/4, contain block motion estimation as the most computationally intensive task. Approaches to speeding up motion estimation fall into three categories: reducing the number of search points [1], [2], reducing the load of measuring the distortion [3], [4], and reducing the number of matching pixels from a block [5]–[10]. The MPEG-4 reference software has provided two fast algorithms that significantly reduce the number of search points [1], [2]. The bit truncation or one-bit algorithms reduce the complexity by modifying the bit depth and the distortion measure [3], [4]. When the pixels are represented in a binary format, the block matching can use exclusive-OR Boolean operators and table lookup techniques. The pixel decimation approaches can be easily combined with approaches from the first two categories. Thus, we will focus on pixel decimation to achieve further improvement.

Pixel decimation can be achieved with either fixed [5]–[8] or adaptive patterns [9], [10]. As shown in Fig. 1(b), Bierling used an orthogonal sampling lattice with 4 : 1 subsampling [5], which is referred to as the “quarter pattern” here. Liu and Zaccarin implemented pixel decimation similar to Bierling’s approach, with four alternating subsampling patterns selected for each step so that all the pixels in the current block are visited [6]. Pixel decimation can also be adapted based on the spatial luminance variation within a picture [9], [10]. Adaptive techniques can achieve better coding efficiency than the uniform subsampling schemes [5]–[8], at the cost of an overhead in deciding which pattern is more representative. Because of mispredicted branches, irregular or adaptive structures are difficult to implement in a pipelined fashion.

Manuscript received September 9, 2002; revised October 4, 2002. This work was supported by the National Science Council, Taiwan, R.O.C. under Contract NSC 91-2218-E 009-005.

C.-N. Wang, S.-W. Yang, and C.-M. Liu are with the Department of Computer Science and Information Engineering, National Chiao-Tung University (NCTU), Hsinchu, 30050 Taiwan, R.O.C.

T. Chiang is with the Department and Institute of Electronics Engineering, National Chiao-Tung University (NCTU), Hsinchu, 30050 Taiwan, R.O.C.

Digital Object Identifier 10.1109/LSP.2003.814403

Fig. 1. Pixel patterns for decimation. (a) Full pattern with N × N pixels selected. (b) Quarter pattern uses 4 : 1 subsampling. (c) Four-queen pattern is tiled with four identical patterns. (d) Eight-queen pattern. (c) and (d) are derived from the N-queen approach with N = 4 and N = 8, respectively.

The quarter pattern has advantages in pipelining and memory access but fails to represent half of the lines in the horizontal, vertical, and diagonal directions. To represent key features and maintain pipelined memory access, we will construct a family of lattices that maintain the regularity and characterize more directional features.

II. N-QUEEN PIXEL DECIMATION

Pixel decimation is used to reduce the computation for measuring the distortion for each block during the search [5]. The most representative sampling lattice is selected based on how much of the texture and edge information is retained with a minimal number of pixels. The sampling lattice is analyzed in terms of spatial homogeneity and directional coverage.

WANG et al.: HIERARCHICAL DECIMATION LATTICE BASED ON N-QUEEN WITH AN APPLICATION FOR MOTION ESTIMATION 229

TABLE I

COMPARISON OF THE SAMPLING LATTICES FOR AN 8 × 8 BLOCK. IN MEASURING THE DIRECTIONAL COVERAGE, THE FOUR ORIENTATIONS DESCRIBED IN FIG. 1(d) ARE USED. FOR THE HORIZONTAL AND VERTICAL DIRECTIONS, THERE ARE EIGHT POSSIBLE EDGES EACH, WHILE FOR EACH OF THE TWO DIAGONAL DIRECTIONS, THERE ARE 15 POSSIBLE EDGES

The spatial homogeneity is measured by the average and variance of spatial distances from each skipped pixel to its nearest selected pixel

$$\bar{D} = \frac{1}{N^2 - M} \sum_{(x, y)\ \text{skipped}} \sqrt{(x - x_s)^2 + (y - y_s)^2} \tag{1}$$

$$\sigma_D^2 = \frac{1}{N^2 - M} \sum_{(x, y)\ \text{skipped}} \left( \sqrt{(x - x_s)^2 + (y - y_s)^2} - \bar{D} \right)^2 \tag{2}$$

where $N$ is the dimension of the block, and $(x_s, y_s)$ indicates the coordinates of the selected pixel nearest to the pixel at the position $(x, y)$. $M$ is the number of the selected pixels, so the sums run over the $N^2 - M$ skipped pixels. Smaller $\bar{D}$ and $\sigma_D^2$ indicate a more spatially homogeneous sampling lattice. An edge is defined as a line passing through the sampling grid in any of the 0°, 45°, 90°, and 135° directions, as shown in Fig. 1(d). The directional coverage is measured as the percentage of edges that contain at least one selected pixel. Table I shows that the quarter pattern has less spatial homogeneity and lacks half of the coverage in the specified directions. To address the issues of spatial homogeneity and directional coverage, we construct a new N-queen sampling lattice.
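As a rough illustration of the two measures described above, the following Python sketch computes the mean and variance of the nearest-selected-pixel distance and the per-direction coverage for a single block. The function names, the use of Euclidean distance, the normalization by the number of skipped pixels, and the treatment of the block in isolation (no tiling or wrap-around) are assumptions made for this example rather than details confirmed by the letter. The 8-queen placement used here is one arbitrary solution.

```python
import math

def homogeneity(selected, n):
    """Mean and variance of the distance from each skipped pixel
    to its nearest selected pixel (Euclidean distance assumed)."""
    chosen = set(selected)
    dists = [
        min(math.hypot(x - xs, y - ys) for xs, ys in chosen)
        for x in range(n)
        for y in range(n)
        if (x, y) not in chosen
    ]
    mean = sum(dists) / len(dists)
    var = sum((d - mean) ** 2 for d in dists) / len(dists)
    return mean, var

def directional_coverage(selected, n):
    """Fraction of edges containing at least one selected pixel,
    reported separately for the four edge orientations."""
    rows = {y for _, y in selected}        # horizontal edges
    cols = {x for x, _ in selected}        # vertical edges
    diag = {x - y for x, y in selected}    # one diagonal family
    anti = {x + y for x, y in selected}    # the other diagonal family
    return {
        "horizontal": len(rows) / n,
        "vertical": len(cols) / n,
        "diagonal": len(diag) / (2 * n - 1),
        "antidiagonal": len(anti) / (2 * n - 1),
    }

N = 8
# Quarter pattern: every second pixel in both directions.
quarter = [(x, y) for x in range(0, N, 2) for y in range(0, N, 2)]
# One 8-queen placement, given as the queen's column for each row.
eight_queen = [(c, r) for r, c in enumerate([0, 4, 7, 5, 2, 6, 1, 3])]

for name, pattern in (("quarter", quarter), ("8-queen", eight_queen)):
    print(name, homogeneity(pattern, N), directional_coverage(pattern, N))
```

Under these assumptions the quarter pattern covers only half of the rows and columns of an isolated 8 × 8 block, while the 8-queen placement covers every row and column and 8 of the 15 lines in each diagonal family.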

To fully represent the spatial information of a block, at least one pixel should be selected for each row, column, and diagonal. Satisfying this constraint is identical to the problem of placing N queens on an N × N chessboard so that no two queens attack each other, and the resulting lattice is referred to as the N-queen pattern. For an N × N block, as shown in Fig. 1(c) and (d), every pixel of the N-queen pattern occupies a dominant position located at the center of four lines in the vertical, horizontal, and diagonal directions. All the other pixels located on these four lines are removed from the list of the selected pixels. With this elimination process, exactly one pixel is selected for each row, column, and (not necessarily main) diagonal of the block. Thus, the N-queen patterns present a subsampling lattice that selects N of the N × N pixels and can provide a speedup of N times.
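The construction above can be sketched with a standard backtracking search for N-queen placements. This is a generic textbook method, not necessarily how the authors generated their patterns, and the names `n_queen_patterns` and `pattern_mask` are hypothetical.

```python
def n_queen_patterns(n):
    """Yield every N-queen placement as a list `cols`,
    where cols[r] is the column of the selected pixel in row r."""
    cols = []
    used_cols, used_diag, used_anti = set(), set(), set()

    def place(row):
        if row == n:
            yield list(cols)
            return
        for c in range(n):
            # Skip positions attacked along a column or either diagonal.
            if c in used_cols or (row - c) in used_diag or (row + c) in used_anti:
                continue
            cols.append(c)
            used_cols.add(c)
            used_diag.add(row - c)
            used_anti.add(row + c)
            yield from place(row + 1)
            cols.pop()
            used_cols.discard(c)
            used_diag.discard(row - c)
            used_anti.discard(row + c)

    yield from place(0)

def pattern_mask(cols):
    """Turn a placement into an N x N binary mask (1 = selected pixel)."""
    n = len(cols)
    return [[1 if cols[r] == c else 0 for c in range(n)] for r in range(n)]

first = next(n_queen_patterns(4))
for row in pattern_mask(first):
    print(row)
```

Each mask has exactly one selected pixel per row and per column, so an N × N block is represented by N pixels.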

The N-queen patterns are not unique. For example, there are 92 8-queen patterns for an 8 × 8 block. The remaining issue is to identify which one provides a better representation. By (1), the average distances of these 92 patterns are distributed between 1.29 and 1.37 pixels. Thus, the variation in average distances is only 0.08 pixels. We find that the 92 8-queen patterns have almost identical performance, with peak SNR (PSNR) varying by less than 0.1 dB.

Fig. 2. Row and column alignment approaches for transforming a two-dimensional 4 × 4 block into a one-dimensional vector of four pixels.

To minimize the memory access bandwidth, a group number (one to four) is used to index each group of pixels, and each group is placed in a separate memory buffer, as shown in Fig. 2. A separate frame buffer is allocated for each of the groups based on the N-queen lattice. For example, the 4-queen lattice stores the nonoverlapping reference pixels in four smaller buffers. One of the special properties of this storage technique is that a macroblock resides in a contiguous memory space for easy access. For example, the selected pixels are grouped together to fully exploit the single-instruction multiple-data architecture, as proposed by Moschettie et al. [8]. If we use a pipelined memory access strategy, a shift of one pixel in each frame buffer represents a spatial shift of N pixels in the original frame. Thus, this data storage architecture can easily facilitate a search strategy. Another interesting observation is that each pixel is sequentially accessible even though the search strategy is hierarchical. This provides an elegant solution to improve both search strategy and memory access.
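One possible reading of this storage scheme is sketched below. It assumes a row alignment in which group g holds the lattice translated horizontally by g pixels, so that the groups together cover every pixel exactly once. The function name `split_into_group_buffers`, the zero-based group index, and the exact buffer layout are assumptions for illustration, since the letter does not spell them out.

```python
def split_into_group_buffers(frame, cols):
    """Split a frame (list of rows) into N smaller buffers.

    cols[r] is the selected column in row r of the N-queen pattern.
    Buffer g stores, for every row y and every N-wide tile k, the pixel
    at column k*N + (cols[y % N] + g) % N (assumed row alignment).
    """
    n = len(cols)
    height, width = len(frame), len(frame[0])
    if width % n != 0:
        raise ValueError("frame width must be a multiple of N")
    buffers = []
    for g in range(n):
        buffers.append([
            [frame[y][k * n + (cols[y % n] + g) % n] for k in range(width // n)]
            for y in range(height)
        ])
    return buffers

# Toy 8 x 8 frame whose pixel value encodes its position (y * width + x).
W, H = 8, 8
frame = [[y * W + x for x in range(W)] for y in range(H)]
buffers = split_into_group_buffers(frame, [1, 3, 0, 2])
print(buffers[0][0])  # pixels from columns 1 and 5 of row 0
```

With this layout, neighboring entries in a buffer row come from columns N apart in the original frame, which matches the statement that a one-pixel shift in a buffer corresponds to an N-pixel spatial shift.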

To compute the full-pixel motion vectors for blocks of sizes 16 × 16 and 8 × 8, the motion vectors of the 16 × 16 blocks are computed first. Each 8 × 8 block uses the motion vector of its 16 × 16 block as an initial position and fine-tunes the search in a window of ±2. The coding mode is decided based on a tradeoff between the distortion and the bits required for encoding the motion vectors. The half-pixel motion vector is found by searching the eight points surrounding the best full-pixel motion vector for the 16 × 16 and 8 × 8 modes, respectively. In the fine-tuning process, we classify the eight points into two sets. The first set includes the search points on the diagonals, and the other set covers the remainder. For each set, we perform the motion estimation with various sampling patterns. We then find the best half-pixel motion vector by comparing the distortion of the two candidates with the minimum distortion of the best full-pixel motion vector.

TABLE II

PERFORMANCE OF THE FOUR PIXEL PATTERNS, THE TWO SEARCH STRATEGIES, AND THE VARIOUS VIDEO SEQUENCES ON DIFFERENT TESTING CONDITIONS. FOR EACH METHOD, THE FIRST SYMBOL DENOTES THE SEARCH STRATEGY, AND THE REMAINING SYMBOL DENOTES THE SAMPLING PATTERN. FOR EXAMPLE, THE NOTATION “PMVFAST_8” REPRESENTS MOTION ESTIMATION USING PMVFAST AND THE EIGHT-QUEEN PATTERN. THE COLUMN “PSNRY” DENOTES THE AVERAGE PSNR FOR THE LUMINANCE COMPONENT, AND THE COLUMN “CHECKING POINTS” INDICATES THE ACTUAL NUMBER FOR CALCULATING THE SAD CRITERION OF 16 × 16 PIXELS

III. EXPERIMENTAL RESULTS

In our simulation, we use the MPEG-4 reference software, and the distortion measure is the sum of absolute differences (SAD), which is computed for a macroblock of size 16 × 16 and various search ranges. The coding efficiency is analyzed based on three factors: sampling patterns, search strategies, and testing conditions.
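To make the role of the sampling pattern in the SAD computation concrete, here is a minimal sketch of an exhaustive search whose distortion is evaluated only at a given list of pixel offsets. It is not the MPEG-4 reference software. The function names, the list-of-lists frame representation, the toy 4 × 4 block, and the synthetic frames are all assumptions chosen to keep the example small.

```python
def sad(cur, ref, bx, by, dx, dy, offsets):
    """SAD between the block at (bx, by) in `cur` and the block displaced
    by (dx, dy) in `ref`, evaluated only at the given (ox, oy) offsets."""
    return sum(
        abs(cur[by + oy][bx + ox] - ref[by + dy + oy][bx + dx + ox])
        for ox, oy in offsets
    )

def full_search(cur, ref, bx, by, block, search_range, offsets):
    """Exhaustive search; returns (best_sad, dx, dy)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x0, y0 = bx + dx, by + dy
            # Skip candidates that fall outside the reference frame.
            if x0 < 0 or y0 < 0 or x0 + block > width or y0 + block > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, offsets)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best

# Synthetic 12 x 12 reference frame and a current frame shifted by (2, 1).
W = H = 12
ref = [[x * x + 100 * y * y for x in range(W)] for y in range(H)]
cur = [[ref[min(y + 1, H - 1)][min(x + 2, W - 1)] for x in range(W)]
       for y in range(H)]

full = [(ox, oy) for oy in range(4) for ox in range(4)]   # all 16 pixels
queen = [(1, 0), (3, 1), (0, 2), (2, 3)]                  # 4-queen: 4 pixels

print(full_search(cur, ref, 4, 4, 4, 2, full))
print(full_search(cur, ref, 4, 4, 4, 2, queen))
```

On this synthetic input both calls find the same displacement, while the 4-queen version sums four absolute differences per candidate instead of sixteen. Real video content will not generally give identical results for the two patterns.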

As for the sampling patterns, we use five patterns, four of which are described in Fig. 1. The full pattern (“F”) selects all of the pixels in the current block. The quarter pattern (“Q”) and the hexagonal pattern (“H”) are described in [5] and [7], respectively. The 4-queen (“4”) pattern is constructed by tiling multiple small 4-queen patterns for each macroblock. The 8-queen (“8”) pattern is constructed by tiling similarly to the 4-queen pattern.
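The tiling step can be sketched as follows, assuming the small pattern is simply repeated without rotation or offset across the 16 × 16 macroblock. The function name `tile_pattern` and the two particular placements are illustrative choices, not taken from the letter.

```python
def tile_pattern(cols, block=16):
    """Tile an N-queen placement over a block x block macroblock.

    cols[r] is the selected column in row r of the small pattern.
    Returns the (x, y) offsets of all selected pixels.
    """
    n = len(cols)
    if block % n != 0:
        raise ValueError("block size must be a multiple of N")
    return [
        (x, y)
        for y in range(block)
        for x in range(block)
        if x % n == cols[y % n]
    ]

four_queen = tile_pattern([1, 3, 0, 2])
eight_queen = tile_pattern([0, 4, 7, 5, 2, 6, 1, 3])
print(len(four_queen), len(eight_queen))  # 64 and 32 of the 256 pixels
```

The selected-pixel counts correspond to 4 : 1 and 8 : 1 decimation of the macroblock, in line with the N-fold reduction in matching pixels.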

As for the search strategies, we tested full search and the fastest approach, the predictive motion vector field adaptive search technique (PMVFAST) [1], as recommended by the MPEG-4 committee. We follow the recommended testing conditions as prescribed by the MPEG committee [1]. As shown in Table II and Fig. 3, we reach the following conclusions.

1) The N-queen patterns cause negligible video quality degradation. With PMVFAST, the loss in PSNR is less than 0.36 dB for slow-motion video such as “Container.” The loss in PSNR is less than 0.53 dB at worst for fast-motion video. The 4-queen pattern is better than the quarter and hexagonal patterns by about 0.05 to 0.25 dB for the frame-coded sequences.

2) When we compare algorithms with the same pattern but different search strategies, the degradation with full search is larger than that with PMVFAST for all patterns. This may be caused by the predictive nature of PMVFAST.

3) It is advantageous to use the N-queen patterns for the CCIR-601 interlaced sequences because the N-queen patterns retain all the horizontal spatial information within a block, while the quarter and hexagonal patterns lose half of the horizontal spatial information.

Fig. 3. PSNR comparisons for the motion estimation based on exhaustive search strategy and various subsampling lattices. The Foreman sequence is in common intermediate format and is encoded with 512 kb/s and 15 frames/s. The search range for motion estimation is 16 for encoding.

IV. CONCLUSION

This letter has presented a novel and simple pixel decimation technique using the N-queen lattice, with an application to block-based motion estimation. The complexity and memory bandwidth can be arbitrarily reduced by a factor of N. The lattice is superior in terms of spatial homogeneity and directional coverage. The hierarchical N-queen sampling lattice is flexible when the block size is variable, including the nonsquare blocks used in H.26L.

ACKNOWLEDGMENT

The authors wish to thank the anonymous reviewers for their insightful comments to improve the initial draft of this letter.

REFERENCES

[1] A. M. Tourapis, O. C. Au, and M. L. Liou, “Fast block-matching motion estimation using predictive motion vector field adaptive search technique (PMVFAST),” ISO/IEC, Noordwijkerhout, NL, ISO/IEC JTC1/SC29/WG11 MPEG2000/M5866, 2000.

[2] S. Zhu and K.-K. Ma, “A new diamond search algorithm for fast block-matching motion estimation,” IEEE Trans. Image Processing, vol. 9, pp. 287–290, Feb. 2000.

[3] B. Natarajan, V. Bhaskaran, and K. Konstantinides, “Low-complexity block-based motion estimation via one-bit transforms,” IEEE Trans. Circuits Syst. Video Technol., vol. 7, pp. 702–706, Aug. 1997.

[4] X. Song, T. Chiang, X. Lee, and Y.-Q. Zhang, “New fast binary pyramid motion estimation for MPEG2 and HDTV encoding,” IEEE Trans. Circuits Syst. Video Technol., vol. 10, pp. 1015–1028, Oct. 2000.

[5] M. Bierling, “Displacement estimation by hierarchical block matching,” Proc. SPIE, vol. 1001, pp. 942–951, 1988.

[6] B. Liu and A. Zaccarin, “New fast algorithms for the estimation of block motion vector,” IEEE Trans. Circuits Syst. Video Technol., vol. 3, pp. 148–157, Apr. 1993.

[7] K. T. Choi, S. C. Chan, and T. S. Ng, “A new fast motion estimation algorithm using hexagonal subsampling pattern and multiple candidate search,” in Proc. ICIP, 1996, pp. 497–500.

[8] F. Moschettie and E. Debes, “About macroblock subsampling for motion estimation,” Proc. ICME, Aug. 2001.

[9] Y.-L. Chan and W.-C. Siu, “New adaptive pixel decimation for block motion vector estimation,” IEEE Trans. Circuits Syst. Video Technol., vol. 6, pp. 113–118, Feb. 1996.

[10] Y. K. Wang, Y. Q. Wang, and H. Kuroda, “A globally adaptive pixel-decimation algorithm for block-motion estimation,” IEEE Trans. Circuits Syst. Video Technol., vol. 10, pp. 1006–1011, Sept. 2000.