1994 International Symposium on Speech, Image Processing and Neural Networks, 13-16 April 1994, Hong Kong
Modifications and Performance Improvements of 3-Step Search
Block-Matching Algorithm for Video Coding
Her-Ming Jong, Liang-Gee Chen, and Tzi-Dar Chiueh Department of Electrical Engineering National Taiwan University, Taipei, China.
Abstract
This paper proposes three modifications to the 3-step hierarchical search block-matching algorithm for video coding: a multiple-winner search that improves the estimation accuracy, a method of subsampling that reduces computation and input data amount, and an overlapping strategy that improves the accuracy of large-area search. Experimental results show that combining these techniques provides high-speed and high-precision motion estimators with reduced on-chip buffers and lowered input bandwidth requirements.
1 Introduction
Block-matching motion estimation/compensation is widely used in several video coding standards. In motion-compensated video coding systems, the present frame of the video sequence is divided into square blocks. Block matching finds a candidate block, within a search area in the previous frame, that is most similar to the current block in the present frame. Full search (FS) block matching exhaustively evaluates all possible displacements and provides the optimal solution. Its massive computation has motivated the development of many fast block-matching algorithms (BMAs) [1]-[5]. These fast algorithms mainly use heuristics to decrease the number of locations checked, or use subsampling to reduce the computation required to evaluate each candidate location. They significantly reduce the computation cost of block matching, and some of them provide sub-optimal accuracy close to that of full search.
Among these BMAs, the 3-step hierarchical search (3SHS) [1] performs well and is recommended by RM8 of H.261 [6] and SM3 of MPEG [7]. The purpose of this paper is to present three methods for improving the accuracy and reducing the input bandwidth and computation cost of this algorithm: multiple-winner hierarchical search (MWHS), search-area subsampling, and overlapped search for cases of large search area. These techniques can also be applied to other BMAs such as 2-D LOG [2] and CS [3], which use coarse-to-fine hierarchical searching schemes similar to that of 3SHS. The techniques are then combined to provide high-precision, large-search-area motion estimators at costs much lower than that of full search.
This work is supported by the National Science Council, Republic of China, under Grant NSC 82-0404-E-002-198.
0-7803-1865-X/94/$3.00 © 1994 IEEE
2 Multiple-winner hierarchical search
Several hierarchical BMAs that follow a coarse-to-fine approach have been proposed [1]-[4]. In each step of these algorithms, candidate locations around the winner of the previous step are checked, and the one with the least distortion is chosen as the center of the next step's check points. This approach assumes that the distortion increases monotonically as the checked point moves away from the location of minimum distortion. However, this assumption does not always hold for real-world sequences, so an inappropriate choice in an early step can exclude the optimal solution. One way to overcome this drawback is to retain more than one winner for the next step: two or more low-distortion candidate locations are picked as centers of the next-step search. The trade-off between cost and performance is determined from the experimental results presented in the following sections, which show that keeping the two least-distortion locations in each step is appropriate for general video sequences.
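The multiple-winner idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `multi_winner_search`, the nine-point pattern with step sizes 4, 2, 1, the tie-breaking rule, and the synthetic distortion surface `toy_cost` are all hypothetical choices made for the example.

```python
from typing import Callable, Dict, Tuple

Vec = Tuple[int, int]


def multi_winner_search(cost: Callable[[Vec], float], winners: int = 2,
                        steps: Tuple[int, ...] = (4, 2, 1)) -> Tuple[Vec, int]:
    """Coarse-to-fine search that keeps `winners` centers per step.

    Returns the best displacement found and the number of candidates evaluated.
    With winners=1 this reduces to the conventional 3-step search.
    """
    centers = [(0, 0)]
    cache: Dict[Vec, float] = {}  # distortion of every candidate checked so far
    for step in steps:
        # Nine-point pattern (center plus eight neighbors) around each kept center.
        candidates = {(cx + ox, cy + oy)
                      for cx, cy in centers
                      for oy in (-step, 0, step)
                      for ox in (-step, 0, step)}
        for p in candidates:
            if p not in cache:
                cache[p] = cost(p)
        # Keep the lowest-distortion candidates; ties are broken by coordinates
        # (an arbitrary choice made for this sketch).
        ranked = sorted(candidates, key=lambda p: (cache[p], p))
        centers = ranked[:winners]
    return centers[0], len(cache)


def toy_cost(p: Vec) -> int:
    """Hypothetical non-monotonic surface: decoy minimum at (-4, -4), true minimum at (6, 1)."""
    decoy = 4 * (abs(p[0] + 4) + abs(p[1] + 4)) + 2
    true_min = 3 * (abs(p[0] - 6) + abs(p[1] - 1))
    return min(decoy, true_min)


print(multi_winner_search(toy_cost, winners=1))
print(multi_winner_search(toy_cost, winners=2))
```

On this synthetic surface the single-winner search locks onto the decoy in the first step and never leaves it, whereas keeping two winners carries the second-best first-step location forward and reaches the true minimum, at the price of more evaluated candidates.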
3 Search-area subsampling
Subsampling is a well-known technique that reduces the computation needed to evaluate the distortion of a candidate block. However, direct subsampling (pixel decimation in both the current block and the search area) is prone to losing spatial resolution. The alternative subsampling technique [5] was proposed to improve the accuracy. It preserves spatial details by using all pixels of the current blocks and the search area to calculate motion vectors. From the viewpoint of hardware implementation, however, this method saves computation but leaves the requirements for I/O bandwidth and on-chip buffering unchanged.
Our observation suggests that, for general video sequences, the interframe correlation is high enough that the details in two successive frames can be captured by preserving all pixels of only one frame, while the other frame is appropriately subsampled. Because the search area is usually much larger than a current block, subsampling the search area while keeping all current-block pixels can significantly reduce I/O and buffer requirements and still preserve the accuracy. Fig. 1 illustrates the subsampling strategy, in which different sampling masks are applied to the current block when evaluating adjacent candidate locations.
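The following Python sketch illustrates how a subsampled search area leads to a displacement-dependent mask on the current block. It is only an illustration: the 1:2 checkerboard pattern in `kept`, the function names, and the synthetic frames are assumptions made here, since the exact pattern of Fig. 1 is not reproduced in the text.

```python
from typing import List

Frame = List[List[int]]


def kept(x: int, y: int) -> bool:
    """Assumed 1:2 checkerboard: a search-area pixel is stored only if x + y is even."""
    return (x + y) % 2 == 0


def block_mask(bx: int, by: int, dx: int, dy: int, n: int) -> List[List[bool]]:
    """Current-block pixels that line up with stored search-area pixels for (dx, dy)."""
    return [[kept(bx + dx + x, by + dy + y) for x in range(n)] for y in range(n)]


def subsampled_sad(cur: Frame, ref: Frame, bx: int, by: int,
                   dx: int, dy: int, n: int) -> int:
    """Sum of absolute differences using only the stored search-area pixels."""
    mask = block_mask(bx, by, dx, dy, n)
    total = 0
    for y in range(n):
        for x in range(n):
            if mask[y][x]:
                total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


# Synthetic frames: the current frame is the previous frame shifted by (2, 1).
size = 12
ref = [[(7 * x + 13 * y + x * y) % 31 for x in range(size)] for y in range(size)]
cur = [[ref[min(y + 1, size - 1)][min(x + 2, size - 1)] for x in range(size)]
       for y in range(size)]
print(subsampled_sad(cur, ref, 4, 4, 2, 1, 4))
```

With this assumed pattern, every current-block pixel is available but only half of them contribute to any one candidate, and horizontally or vertically adjacent candidates use complementary masks, so all current-block pixels take part across the search.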
4 Overlapped search for large search range
As proposed by MPEG SM3, the 3SHS can be directly extended to more than three steps to cover a larger search range. But as the number of steps increases, the distances between checked points in the early steps grow exponentially. Although the computation cost is much lower than that of FS over the same search range, the probability of being trapped in a local minimum rises significantly and the accuracy is dramatically reduced. To overcome this drawback, we propose to use several independent 3SHSs, each of which inherently has a search range of -7 to +7 pixels, to cover the required larger search range. For example, a search range of -15 to +14 pixels can be covered by four 3SHSs. A weakness of this approach is that small motions cannot be detected well: small motion vectors lie near the boundaries of the four constituent 3SHSs, but a 3SHS performs better for locations near the center of its search range. The proposed solution is to apply an additional 3SHS that covers a search range centered at (0,0), as shown in Fig. 2. Although its search range overlaps those of the other 3SHSs and adds some overhead, it significantly improves performance because small motion vectors occur frequently in typical video sequences.
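The overlapped scheme for the -15 to +14 example can be sketched as below. The tile centers (-8 and +7 along each axis) are an inference from the stated ranges rather than values given in the text, and the function names, the L1 test cost, and the rule of taking the lowest-distortion result among the five searches are assumptions of this sketch. The helper `tile_centers` only handles ranges that are an exact multiple of one 15-pixel search width.

```python
from typing import Callable, List, Tuple

Vec = Tuple[int, int]


def tile_centers(lo: int, hi: int, reach: int = 7) -> List[int]:
    """Centers along one axis of side-by-side searches, each covering center-reach..center+reach."""
    width = 2 * reach + 1
    span = hi - lo + 1
    if span % width != 0:
        raise ValueError("range is not an exact multiple of one search width")
    return [lo + reach + k * width for k in range(span // width)]


def three_step(cost: Callable[[Vec], float], center: Vec) -> Tuple[Vec, float]:
    """Conventional 3-step search (step sizes 4, 2, 1) started at `center`."""
    best, best_cost = center, cost(center)
    for step in (4, 2, 1):
        cx, cy = best  # the pattern is centered on the winner of the previous step
        for oy in (-step, 0, step):
            for ox in (-step, 0, step):
                p = (cx + ox, cy + oy)
                c = cost(p)
                if c < best_cost:
                    best, best_cost = p, c
    return best, best_cost


def overlapped_search(cost: Callable[[Vec], float],
                      lo: int = -15, hi: int = 14) -> Tuple[Vec, float]:
    """Run one 3-step search per tile plus an extra one at the origin; keep the best."""
    axis = tile_centers(lo, hi)
    centers = [(x, y) for y in axis for x in axis] + [(0, 0)]
    results = [three_step(cost, c) for c in centers]
    return min(results, key=lambda r: (r[1], r[0]))


def l1_cost_to(target: Vec) -> Callable[[Vec], int]:
    """Hypothetical distortion: L1 distance to a known true displacement."""
    return lambda p: abs(p[0] - target[0]) + abs(p[1] - target[1])


print(tile_centers(-15, 14))
print(overlapped_search(l1_cost_to((1, -1))))
print(overlapped_search(l1_cost_to((12, -10))))
```

The extra search at (0,0) costs one more 3SHS per block but places small displacements near the center of a search rather than at the seam between tiles.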
5 Simulation Results
To decide the number of kept winners for MWHS, the percentage improvement, defined as
\[
\text{Percentage Improvement (\%)} = \frac{\mathrm{PSNR(MWHS)} - \mathrm{PSNR(3SHS)}}{\mathrm{PSNR(FS)} - \mathrm{PSNR(3SHS)}} \times 100\%
\]
is evaluated for different winner numbers. Fig. 3 shows the results averaged over three and five benchmark sequences for H.261 and MPEG respectively: "salesman" (sales), "Miss America" (miss), "Claire" (cl), "Susie", "windmill" (wm), "table tennis" (tt), "football" (fb), and "mobile and calendar" (mob). Considering the trade-off between cost and improvement, these results suggest that two is an appropriate number of winners.
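As a small worked illustration of the percentage-improvement measure, the Python function below evaluates it for a set of PSNR values. The function name and the numbers in the usage line are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from this paper.

```python
def percentage_improvement(psnr_mwhs: float, psnr_3shs: float, psnr_fs: float) -> float:
    """Share of the 3SHS-to-FS PSNR gap (in dB) that MWHS recovers, in percent."""
    gap = psnr_fs - psnr_3shs
    if gap == 0:
        raise ValueError("FS and 3SHS have the same PSNR; the measure is undefined")
    return 100.0 * (psnr_mwhs - psnr_3shs) / gap


# Hypothetical values (dB), not taken from the paper's tables.
print(percentage_improvement(psnr_mwhs=35.5, psnr_3shs=35.0, psnr_fs=36.0))
```

A value of 0% means MWHS does no better than 3SHS, and 100% means it matches full search.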
Tables 1-3 present the performance and complexity of the proposed methods and their combinations, together with some traditional approaches for comparison. In Table 1, all sequences are at a frame rate of 30 frames/s, and the search range is -8 to +7 pixels for FS and -7 to +7 pixels for the other algorithms. "Sub." stands for 1:2 subsampling of search-area pixels, and the MWHS keeps two winners at both step 1 and step 2 as centers of the next-step search. Fig. 4 gives an illustration; it shows that the performance improvement provided by MWHS is more significant than the degradation resulting from subsampling.
Tables 2 and 3 present results of large-range searches for sequences at lower frame rates. In Table 2, test frames are temporally skipped by a factor of two to magnify the interframe displacement, and the search ranges of all algorithms are doubled in both the horizontal and vertical directions. The algorithm "4x3SHS" merely contains four 3SHSs that cover the whole search range, and OHS16 is the proposed overlapped scheme that adds a 3SHS to deal with small vectors. The direct expansion to four steps is also evaluated and is labelled "4SHS". Subsampling and MWHS can also be applied to each 3SHS in OHS16 to provide further improvement (OHS16+Sub.MWHS). The results in Table 3 are similar to those in Table 2, except that the interframe interval and search range are both enlarged by a factor of four, so 16 3SHSs are required to cover the search range. The results in Tables 2-3 and Figs. 5-6 show that the proposed OHS is robust and performs close to full search. Computational complexities (relative to that of FS) are calculated by counting the number of required additions/subtractions (as shown in Tables 1-3), which is a useful criterion when dedicated hardware is used. Simulation results show that the proposed methods provide a good trade-off between performance and cost.
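A rough way to reason about relative complexity is to count candidate locations and the fraction of pixels used per candidate. The Python sketch below does this under explicit assumptions: the add/subtract count is taken to be proportional to (candidates) x (pixels per candidate), FS over -8 to +7 checks 16 x 16 = 256 locations, 3SHS checks 9 + 8 + 8 locations, and two-winner MWHS checks at most 9 + 16 + 16. These counts are illustrative estimates and are not the figures tabulated in the paper.

```python
def relative_complexity(candidates: int, fs_candidates: int = 256,
                        pixel_fraction: float = 1.0) -> float:
    """Percent of full-search add/subtract work under the proportionality assumption."""
    return 100.0 * candidates * pixel_fraction / fs_candidates


THREE_STEP = 9 + 8 + 8            # center + 8 neighbors, then 8 new points per step
TWO_WINNER = 9 + 2 * 8 + 2 * 8    # upper bound when the two patterns do not overlap

print(relative_complexity(THREE_STEP))                      # plain 3SHS
print(relative_complexity(TWO_WINNER))                      # MWHS, two winners
print(relative_complexity(TWO_WINNER, pixel_fraction=0.5))  # MWHS with 1:2 subsampling
```

Under these assumptions the extra candidates of the two-winner search are roughly offset by halving the pixels per candidate, which is in line with the qualitative trade-off described above.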
6 Conclusion
The multiple-winner hierarchical search raises the accuracy of the conventional 3-step hierarchical search block-matching algorithm. The search-area subsampling reduces the computation and the amount of input data. The proposed overlapped search enlarges the practical search range of hierarchical search algorithms. Combinations of these three methods provide high-speed motion estimators that perform close to the optimal full search, even when the motion in the image sequences and the required search ranges are enlarged.
References
[1] T. Koga, K. Iinuma, A. Hirano, Y. Iijima, and T. Ishiguro, "Motion compensated interframe coding for video conferencing," in Proc. Nat. Telecommun. Conf., New Orleans, LA, Nov. 29-Dec. 3, 1981, pp. G5.3.1-G5.3.5.
Figure 1: Subsampling the search area pixels. (Panels show the current block matched to two adjacent candidate locations.)
Figure 2: Search range overlapping of the proposed OHS16. (Axes span -15 to 14; regions labelled 3SHS-1 to 3SHS-4.)
[2] S. Kappagantula and K. R. Rao, "Motion compensated interframe image prediction," IEEE Trans. Commun., vol. COM-33, no. 9, pp. 1011-1015, Sep. 1985.
[3] M. Ghanbari, "The cross-search algorithm for motion estimation," IEEE Trans. Commun., vol. 38, no. 7, pp. 950-953, July 1990.
[4] L. G. Chen, W. T. Chen, Y. S. Jehng, and T. D. Chiueh, "An efficient parallel motion estimation algorithm for digital image processing," IEEE Trans. Circuits Syst. Video Technol., vol. 1, no. 4, pp. 378-385, Dec. 1991.
[5] B. Liu and A. Zaccarin, "New fast algorithms for the estimation of block motion vectors," IEEE Trans. Circuits Syst. Video Technol., vol. 3, no. 2, pp. 148-157, Apr. 1993.
[6] CCITT SGXV, "Description of reference model 8 (RM8)," Document 525, Working Party XV/4, Specialists Group on Coding for Visual Telephony, June 1989.
[7] MPEG, "ISO CD 11172-2: Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s," Nov. 1991.
Figure 3: Percentage improvements of different winner numbers. (Curves: H.261 sequences, MPEG sequences; x-axis: number of kept winners, 1 to 9.)
Figure 4: Comparison of various algorithms' PSNRs for normal frame rate and search range of -8 to 7 pixels. (Legend includes FS and Sub. 3SHS.)
Figure 5: Comparison of various algorithms' PSNRs for 1/2 frame rate and search range of -16 to 15 pixels. (x-axis: frame number, "Susie"; legend: FS, OHS16+Sub.MWHS, 4x3SHS, OHS16.)
Figure 6: Comparison of various algorithms' PSNRs for 1/4 frame rate and search range of -32 to 31 pixels. (x-axis: frame number, "Susie"; legend: OHS32+Sub.MWHS, OHS32, 16x3SHS.)
Table 1: Simulation results of normal frame rate and search range of -8 to 7 pixels.
Table 2: Simulation results of 1/2 frame rate and search range of -16 to 15 pixels.
Algorithms | PSNR (dB): sales, miss, cl, susie, wm, tt, fb, mob | Complexity (%)
Table 3: Simulation results of 1/4 frame rate and search range of -32