Block-matching Algorithm - Motion Estimation in H.264/AVC

Low Power Algorithms

3.2 Motion Estimation in H.264/AVC

3.2.1 Block-matching Algorithm

Related studies (not specific to the video coding field) can be classified into two major categories: time-domain algorithms and frequency-domain algorithms [20,21]. The time-domain methods include feature-matching, recursive, and gradient-based algorithms. The frequency-domain algorithms comprise phase-correlation and transform-based (e.g. Fourier transform, DCT, wavelet) algorithms.

Most of these schemes are based on block matching, i.e. matching all or some pels of the current block against a candidate block in the search area. The block-matching algorithm (BMA) is one of the most popular techniques for motion estimation. The basic concept of BMA is to represent a block in the current frame by a block in the reference frame together with a motion vector indicating the displacement between them. The best-matching block in the reference picture is the one that minimizes the matching cost. A search range confines the search to an area that is more likely to contain a good match. Various derivative strategies have been proposed to further reduce the search time or search cost.
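The basic BMA procedure described above can be sketched as an exhaustive (full) search. This is a minimal illustration, not the method of any particular encoder: the function names `sad` and `full_search`, the frame layout (lists of rows of integer pels), and the use of SAD as the matching cost are all assumptions made for the example.

```python
def sad(cur, ref, cy, cx, ry, rx, n):
    """Sum of absolute differences between two n x n blocks.

    (cy, cx) is the top-left pel of the block in the current frame,
    (ry, rx) the top-left pel of the candidate block in the reference frame.
    """
    return sum(
        abs(cur[cy + i][cx + j] - ref[ry + i][rx + j])
        for i in range(n)
        for j in range(n)
    )


def full_search(cur, ref, top, left, n=8, search_range=4):
    """Test every displacement inside the search range (hypothetical helper).

    Returns ((dy, dx), cost) for the candidate with the lowest matching cost.
    """
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ry, rx = top + dy, left + dx
            # Skip candidates that fall partly outside the reference frame.
            if ry < 0 or rx < 0 or ry + n > height or rx + n > width:
                continue
            cost = sad(cur, ref, top, left, ry, rx, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost
```

With a search range of ±4 pels the loop evaluates up to 81 candidates per block, which is why faster derivative strategies aim to visit only a subset of these positions.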

Matching Criterion

The motion vector is the displacement between the current macroblock and the block that minimizes the matching cost. Various matching costs have been adopted. These cost functions vary in implementation complexity and efficiency. The following cost functions are commonly used in implementations [21]:

MSE (Mean Squared Error): The MSE criterion selects the block that minimizes the mean squared error between the two blocks, which is close to human visual perception. In addition, the square term appears in both the MSE and the PSNR equations, so MSE generally produces excellent PSNR results. However, the MSE also suffers from high computational complexity due to the square term of the cost function.

MAE (Mean Absolute Error): The mean absolute error is similar to the MSE function but replaces the squaring operation with an absolute-value operation. It has much lower computational complexity than MSE, which makes it a popular cost function, especially for VLSI design. One problem with the MAE function is that small and large errors are treated equally, whereas the MSE emphasizes large errors by squaring them.

SAD (Sum of Absolute Differences): The SAD is very similar to the MAE function. The SAD does not divide the sum of errors by the total number of pixels in the block, which is a constant throughout the encoding process. This makes SAD more practical than MAE for implementation. The SAD criterion is the most popular in hardware implementations of block-matching algorithms. The block-based nature leads to regularity and parallelism, which suit hardware realization.

MME (Minimized Maximum Error): The MME function inspects the maximum error of a block and selects the block with the smallest maximum error. The advantage is that only an 8-bit comparator is required to track the maximum error, instead of the 16-bit accumulator needed to sum the differences for MAE.

TRANS (Transformation of Block): The block is first transformed to a block with lower bit resolution. One bit is used to represent each pel of the block after the transform. A technique similar to SAD is then applied to the reduced-resolution block to find the motion vector. Computation is saved in the motion vector search, but the transformation itself adds overhead.
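The cost functions listed above can be compared side by side in a few lines. The sketch below is illustrative only: blocks are assumed to be equally sized lists of rows of integer pels, and all function names are hypothetical. The one-bit transform in particular is an assumption, since the text does not specify the transform; here each pel is simply thresholded against the block mean.

```python
def _diffs(a, b):
    """Pel-wise differences between two equally sized blocks."""
    return [pa - pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b)]


def mse(a, b):
    """Mean squared error: large errors are emphasized by squaring."""
    d = _diffs(a, b)
    return sum(e * e for e in d) / len(d)


def mae(a, b):
    """Mean absolute error: every unit of error counts the same."""
    d = _diffs(a, b)
    return sum(abs(e) for e in d) / len(d)


def sad(a, b):
    """Sum of absolute differences: MAE without the division."""
    return sum(abs(e) for e in _diffs(a, b))


def mme(a, b):
    """Maximum absolute error; the search keeps the candidate minimizing it."""
    return max(abs(e) for e in _diffs(a, b))


def one_bit(block):
    """ASSUMPTION: 1-bit transform by thresholding against the block mean."""
    flat = [p for row in block for p in row]
    mean = sum(flat) / len(flat)
    return [[1 if p >= mean else 0 for p in row] for row in block]


def trans_cost(a, b):
    """SAD on the 1-bit planes, i.e. the number of mismatching bits."""
    return sad(one_bit(a), one_bit(b))
```

For a flat 2 × 2 block of value 10, a candidate with a single error of 4 and a candidate with four errors of 1 have the same SAD and MAE, whereas MSE and MME both prefer the second candidate. This is the "small and large errors are treated equally" behaviour noted for MAE.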

3.2.2 Motion Estimation in H.264/AVC

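This section briefly describes several advanced motion estimation features of the H.264 codec and their software realization in the JVT Joint Model (JM), including variable block size motion estimation, enhanced fractional resolution, and multiple reference frame motion estimation.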

Variable Block Size Motion Estimation. Accurate prediction reduces the displaced difference between the current macroblock and the predicted region.

In block-based motion compensation, a macroblock may contain more than one moving object, each moving in a different direction. Traditional video coding, which performs compensated prediction at macroblock size only, therefore does not fit such objects well. For this reason, multi-block motion estimation is adopted in H.264/AVC for better compression performance. The luminance component of each macroblock can be split in four ways, as shown in Figure 3.1(a): one macroblock partition (16 × 16 pixels), two half-width partitions (each 8 × 16 pixels), two half-height partitions (each 16 × 8 pixels), or four quarter-size square partitions (each 8 × 8 pixels). Each 8 × 8 partition can be further divided in four ways, as shown in Figure 3.1(b): one 8 × 8 sub-block, two 4 × 8 sub-blocks, two 8 × 4 sub-blocks, or four equal partitions of 4 × 4 pixels each. The associated chroma block sizes are given in Table 3.1. The permitted macroblock segmentation leads to a large number of combinations. Each block in the coded area is compensated by an individual motion vector. This method of partitioning a macroblock into motion-compensated sub-blocks is known as tree-structured motion estimation.

(b) Sub-macroblock partitions: one partition of 8×8, two of 8×4, two of 4×8, or four of 4×4 luma samples, each with associated chroma samples

Figure 3.1: Partitioning of a MB for motion compensation

Table 3.1: Chroma block sizes associated with luminance partitions

Block size (full pixel)
Luma     16×16   16×8   8×16   8×8   8×4   4×8   4×4
Chroma   8×8     8×4    4×8    4×4   4×2   2×4   2×2
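To make the "large number of combinations" concrete, the partition tree can be enumerated directly. The sketch below is illustrative: the names are hypothetical, and the chroma sizes assume each luma dimension is halved, which is the relationship shown in Table 3.1.

```python
# Luma partition shapes, written in the same order as Table 3.1.
MB_PARTITIONS = [(16, 16), (16, 8), (8, 16), (8, 8)]
SUB_MB_PARTITIONS = [(8, 8), (8, 4), (4, 8), (4, 4)]


def chroma_size(luma):
    """Chroma block size for a luma partition (each dimension halved)."""
    a, b = luma
    return (a // 2, b // 2)


def vectors_per_region(region, part):
    """Number of partitions of shape `part` that tile a region x region area."""
    return (region * region) // (part[0] * part[1])


def count_segmentations():
    """Distinct ways to segment one macroblock.

    The three non-8x8 modes give one segmentation each. In 8x8 mode, each of
    the four 8x8 blocks independently chooses one of four sub-partitions.
    """
    return (len(MB_PARTITIONS) - 1) + len(SUB_MB_PARTITIONS) ** 4


def motion_vector_count(mb_part, sub_parts=None):
    """Motion vectors needed for one macroblock, one per partition."""
    if mb_part != (8, 8):
        return vectors_per_region(16, mb_part)
    return sum(vectors_per_region(8, p) for p in sub_parts)
```

Under these assumptions a macroblock can be segmented in 3 + 4^4 = 259 ways, and the finest segmentation (four 8 × 8 blocks, each split into 4 × 4 sub-blocks) carries 16 motion vectors.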

Enhanced Fractional Resolution. The earlier ITU-T H.263 standard supports motion-compensated prediction at half-pel accuracy for both luma and chroma MVs [22]. In H.264, luma MVs are transmitted at 1/4-pixel resolution, and chroma MVs at 1/8-pixel resolution are derived from them. Fractional-pel motion compensation extends the search points from integer resolution to half-pel or even quarter-pel resolution, which further reduces the displaced-frame difference (DFD) and improves coding efficiency. The values of fractional pixels are interpolated, and the matching cost is calculated at the fractional search points using these interpolated pixels. Since an exhaustive fractional search is computationally too expensive, block matching at fractional-pel positions is usually separated from integer motion estimation; i.e. the fractional motion estimation is performed on the fractional search points around the integer position just found.
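The two-stage refinement around the integer position can be sketched as follows. This is a simplified illustration: the names are hypothetical, motion vectors are expressed in quarter-pel units, and `cost` is assumed to be a caller-supplied function that interpolates the reference block at the given fractional position and returns its matching cost (the interpolation itself is not shown).

```python
def refine(center, step, cost):
    """Test the 8 neighbours of `center` at distance `step` (quarter-pel units).

    Returns (best_position, best_cost); the centre is kept on ties.
    """
    best, best_cost = center, cost(center)
    cy, cx = center
    for dy in (-step, 0, step):
        for dx in (-step, 0, step):
            if dy == 0 and dx == 0:
                continue  # centre already evaluated
            cand = (cy + dy, cx + dx)
            c = cost(cand)
            if c < best_cost:
                best, best_cost = cand, c
    return best, best_cost


def fractional_search(int_mv, cost):
    """Refine an integer-pel MV to half-pel, then to quarter-pel accuracy.

    `int_mv` is (dy, dx) in integer pels; the result is in quarter-pel units.
    """
    start = (int_mv[0] * 4, int_mv[1] * 4)
    half_best, _ = refine(start, 2, cost)   # 8 half-pel neighbours
    return refine(half_best, 1, cost)       # 8 quarter-pel neighbours
```

This evaluates 17 positions (the integer point plus 8 half-pel and 8 quarter-pel candidates) rather than every fractional position in the search window.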

Multiple Reference Frame Motion Estimation. Multiple reference frame motion estimation is a feature adopted by H.264. In previous video coding standards, motion estimation finds the best-matching block in a single previous and/or future reference frame. This approach works well for smoothly moving sequences. For high-motion sequences, however, the best-matching block may occur in a reference frame that is not adjacent to the current one. For this reason, H.264 takes multiple reference frames into consideration for motion estimation. Instead of searching only the one previous reference frame, H.264 is capable of searching up to five previous frames for the best match.

However, adopting multiple reference frames increases the access frequency according to a linear model, with complexity increasing by 25% for each added frame. With multiple reference frame motion estimation, a negligible gain (less than 2%) in bit rate is observed for low- and medium-motion sequences, but more significant savings (up to 10%) can be achieved for active-motion sequences [23].
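The trade-off can be illustrated with a short sketch. The names are hypothetical, and the complexity model is one reading of the text: it assumes the 25% increase per added frame is measured relative to the single-reference baseline. `search_one` stands for any per-frame search that returns a motion vector and its cost.

```python
def relative_complexity(num_refs, per_frame_increase=0.25):
    """Linear model: baseline 1.0 plus a fixed increase per added frame.

    ASSUMPTION: the increase is relative to the single-reference case.
    """
    return 1.0 + per_frame_increase * (num_refs - 1)


def multi_ref_search(search_one, refs):
    """Search every reference frame and keep the overall best match.

    `search_one(ref)` must return (mv, cost) for that reference frame.
    Returns (ref_idx, mv, cost); the earliest frame wins on ties.
    """
    best = None
    for ref_idx, ref in enumerate(refs):
        mv, cost = search_one(ref)
        if best is None or cost < best[2]:
            best = (ref_idx, mv, cost)
    return best
```

Under this reading, searching five reference frames costs about twice as much as searching one, which has to be weighed against the bit-rate savings reported in [23].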
