Downsampling Methods - Hierarchical Motion Estimation Algorithm


2.2. Hierarchical Motion Estimation Algorithm

2.2.1. Downsampling Methods

HMEA comprises three resolution levels, numbered 0 to 2. Level 0 is the top (coarsest) level, and level 2 is the bottom level, which holds the input frame. Because both the width and the height are halved from one level to the next, each level has one quarter as many pixels as the level below it. Figure 2-1 shows the hierarchical frame structure, where W and H are the width and the height of the image, respectively. The macroblock (MB) size changes from 16×16, through 8×8, to 4×4 at levels 2, 1, and 0, respectively.

Fig. 2-1. The hierarchical frame structure
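As a quick check of these sizes, the sketch below lists the frame and MB dimensions at each level. It assumes, purely as an example, a CIF frame (352×288) whose width and height are divisible by 4; the function name `level_geometry` is hypothetical.

```python
def level_geometry(width, height, levels=3):
    """Return (level, frame_w, frame_h, mb_size) for each HMEA level.

    Level 2 is the full-resolution frame with 16x16 macroblocks. Each step
    up to a coarser level halves both dimensions, so the frame keeps one
    quarter of the pixels and the macroblock shrinks to 8x8, then 4x4.
    Assumes width and height are divisible by 2 ** (levels - 1).
    """
    rows = []
    for level in range(levels - 1, -1, -1):  # 2, 1, 0
        shift = (levels - 1) - level          # 0 at level 2, 2 at level 0
        rows.append((level, width >> shift, height >> shift, 16 >> shift))
    return rows


if __name__ == "__main__":
    for level, w, h, mb in level_geometry(352, 288):
        print(f"level {level}: {w}x{h} frame, {mb}x{mb} MB, {w * h} pixels")
```

For CIF input this gives 352×288, 176×144, and 88×72 frames at levels 2, 1, and 0, each with one quarter of the pixels of the level below.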

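In block-matching algorithms, the sum of absolute differences (SAD) is an important matching criterion, and its value at level l can be defined as

$$ \mathrm{SAD}^{(l)}(u,v) = \sum_{i=0}^{N_l-1} \sum_{j=0}^{N_l-1} \left| I_k^{(l)}(x+i,\, y+j) - I_{k-1}^{(l)}(x+i+u,\, y+j+v) \right| \tag{2-1} $$

where $(x, y)$ is the position of the top-left pixel of the current MB, $(u, v)$ is the candidate motion vector, $I_{k-1}^{(l)}$ is the reference frame at level l, and $N_l$ is the MB size at level l (16, 8, and 4 for l = 2, 1, and 0, respectively).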

Because the block over which the SAD is accumulated shrinks at each upper level, the computational complexity of the matching process is greatly reduced. Evaluating one SAD at level 1 costs only one quarter as much as at level 2, and at level 0 only one quarter as much as at level 1, since each block contains one quarter as many pixels.
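The SAD can be sketched directly from its definition. The sketch assumes frames stored as Python lists of rows (`frame[row][col]`) and a candidate displacement that keeps the reference block inside the frame; `sad`, `OPS_PER_SAD`, and the parameter names are illustrative, not taken from the original design.

```python
def sad(cur, ref, row, col, drow, dcol, n):
    """Sum of absolute differences for one n x n block.

    cur, ref   : current and reference frames at the same level (lists of rows)
    row, col   : top-left position of the block in the current frame
    drow, dcol : candidate motion vector; the displaced block must lie inside ref
    """
    total = 0
    for i in range(n):
        for j in range(n):
            total += abs(cur[row + i][col + j] - ref[row + drow + i][col + dcol + j])
    return total


# Absolute differences per SAD at each level (MB sizes 16, 8, 4):
# each step up the pyramid needs one quarter of the work.
OPS_PER_SAD = {level: n * n for level, n in ((2, 16), (1, 8), (0, 4))}
```

`OPS_PER_SAD` evaluates to 256, 64, and 16 absolute differences for levels 2, 1, and 0, which is the quarter-per-level reduction described above.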

Numerous approaches are available for reducing an image. In this paper, three methods are adopted: left-top, the 2D discrete wavelet transform (2D-DWT), and the averaging filter. They are compared in terms of computational complexity, performance, and hardware implementation. Bicubic interpolation, which can provide better quality, is not considered because its complexity is much higher than that of the other methods. Moreover, when an image is reduced by 50% in both width and height, the quality advantage of bicubic interpolation is smaller than when a frame is enlarged.

A. Left-top method

The left-top method is one of the simplest approaches for subsampling an image.

For the k-th input frame, $I_k^{(2)}(\cdot)$, the upper-level images are computed by the following downsampling:

$$ I_k^{(l-1)}(i,j) = I_k^{(l)}(2i,\, 2j), \quad \text{for } l = 1, 2 \tag{2-2} $$

where $I_k^{(l-1)}(i,j)$ represents the gray-level value at position $(i, j)$ of the k-th frame at level l−1.

In the hardware implementation, no arithmetic operations are necessary: the output image is generated simply by reading the pixels of the original image in a specific order. The only cycles required are for moving data, and no extra hardware is needed.
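Equation (2-2) amounts to a strided read of the input, which is why no arithmetic is needed. The sketch below assumes frames stored as lists of rows with even dimensions; `left_top_downsample` and the pyramid helper `build_pyramid` are hypothetical names for illustration.

```python
def left_top_downsample(frame):
    """Keep the top-left pixel of every 2x2 block: out[i][j] = frame[2i][2j].

    No arithmetic is performed; the output is a subset of the input read
    in a fixed order (every second row, every second column).
    """
    return [row[::2] for row in frame[::2]]


def build_pyramid(frame, downsample, levels=3):
    """Return {level: image}, storing the input frame at the bottom level."""
    pyramid = {levels - 1: frame}
    for level in range(levels - 1, 0, -1):  # 2 -> 1, then 1 -> 0
        pyramid[level - 1] = downsample(pyramid[level])
    return pyramid
```

`build_pyramid` takes the downsampling function as a parameter, so the same helper can be reused with any of the methods compared in this section.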

B. 2D-DWT

In image processing, most of the power of natural image signals tends to lie in the low-frequency band. Accordingly, the low-frequency band must be analyzed more extensively than the high-frequency band. In practical applications, the low-frequency band produced by the DWT is further decomposed by a second-level DWT to reveal more detail of the signal at lower frequencies; such analysis is referred to as multi-resolution analysis. The Haar and Antonini 9/7 wavelet transforms are used to increase the execution speed of the wavelet transform [64]. The 2D-DWT is implemented as a one-dimensional DWT in the horizontal direction followed by another in the vertical direction.
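To make the separable structure concrete, here is a sketch of the one-level Haar low-pass (LL) path only, since the LL band is the downsampled image. It assumes the orthonormal Haar low-pass filter with scaling factor 1/√2 and even frame dimensions; the Antonini 9/7 filter, with its longer taps and boundary extension, is not shown. The names `haar_lowpass_1d` and `haar_ll` are hypothetical. Applying `haar_ll` to its own output gives the second-level LL band.

```python
import math

INV_SQRT2 = 1 / math.sqrt(2)  # orthonormal Haar scaling factor (assumed)


def haar_lowpass_1d(seq):
    """Low-pass half of a one-level 1D Haar DWT: (x[2n] + x[2n+1]) / sqrt(2)."""
    return [(seq[2 * n] + seq[2 * n + 1]) * INV_SQRT2 for n in range(len(seq) // 2)]


def haar_ll(frame):
    """LL band of a one-level 2D Haar DWT (high-pass bands are discarded)."""
    rows = [haar_lowpass_1d(row) for row in frame]             # horizontal pass
    cols = [haar_lowpass_1d(list(col)) for col in zip(*rows)]  # vertical pass
    return [list(row) for row in zip(*cols)]                   # transpose back
```

Each LL sample equals (a + b + c + d)/2 for its 2×2 block (up to floating-point rounding), so a uniform block of 255 yields 510, outside the 8-bit range.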

Figure 2-2(a) shows the locations of the frequency bands produced by the 2D-DWT decomposition. Fig. 2-2(c) and Fig. 2-2(d) show the subsampled results obtained from the 'Akiyo' image, displayed in Fig. 2-2(b), after two levels of DWT processing. For display, values above 255 or below 0 are clipped in Fig. 2-2(c) and Fig. 2-2(d), but the unclipped values are retained in the evaluation.

As shown in Fig. 2-3, for the k-th frame, $I_k^{(1)}(i,j)$ and $I_k^{(0)}(i,j)$ are the LL bands of the first- and second-order decompositions, respectively.

Fig. 2-2. Examples of 2D-DWT downsampling: (a) frequency bands after two-level DWT decomposition; (b) original Akiyo image; (c) Akiyo image reduced to 50% of its width and height; (d) Akiyo image reduced to 25% of its width and height

Fig. 2-3. The relationship between the 2D-DWT and the downsampled images

There are some problems with this downsampling method. First, the computational complexity is heavy, because more additions and multiplications than in the other methods are required to subsample each pixel. Second, in a typical hardware design, 8 bits are used to store the gray-level value of a pixel, from 0 to 255. However, pixels downsampled by the 2D-DWT fall outside the range 0 to 255, so more bits are necessary, which increases both the memory bandwidth of the hardware architecture and the chip area. Although the pixel values can be normalized into 0 to 255 to reduce the bandwidth, extra hardware is then required for normalization, which also increases the computational complexity and the die size.
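The dynamic-range growth can be quantified with a small sketch. It assumes the orthonormal Haar low-pass path, for which each decomposition level computes (a + b + c + d)/2 on a 2×2 block, so the peak value doubles per level. The Antonini 9/7 filter also has negative taps and can produce values below 0, which this sketch does not model. The helper names are hypothetical.

```python
def haar_ll_peak(levels, pixel_max=255):
    """Largest LL value after `levels` orthonormal Haar decompositions.

    One level maps a 2x2 block to (a + b + c + d) / 2, so a block of
    pixel_max values grows by a factor of 2 per level.
    """
    return pixel_max * 2 ** levels


def bits_needed(max_value):
    """Unsigned bits required to hold integers 0..max_value."""
    return max(1, max_value.bit_length())


for levels in (0, 1, 2):
    peak = haar_ll_peak(levels)
    print(f"{levels} level(s): peak {peak}, {bits_needed(peak)} bits")
```

The loop reports 8, 9, and 10 bits for zero, one, and two decomposition levels, i.e., the storage for every downsampled pixel grows by one bit per level unless the values are normalized.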

C. Averaging filter

This method is equivalent to bilinear interpolation that rescales the image by 50% in both width and height, so the quality of the reduced image is assured. For the k-th input frame, $I_k^{(2)}(\cdot)$, the upper-level images are computed by the following downsampling:

$$ I_k^{(l-1)}(i,j) = \frac{1}{4} \sum_{p=0}^{1} \sum_{q=0}^{1} I_k^{(l)}(2i+p,\, 2j+q), \quad \text{for } l = 1, 2 \tag{2-3} $$

The hardware implementation of the averaging filter is simple: only three additions and one bit-shift operation are required per subsampled pixel.
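The operation count above maps directly onto integer code: three additions sum the four pixels of a 2×2 block, and a right shift by 2 divides by 4. The sketch assumes frames stored as lists of rows with even dimensions, and it assumes truncation (a plain shift) rather than rounding, since adding a rounding offset would cost a fourth addition. `average_downsample` is a hypothetical name.

```python
def average_downsample(frame):
    """2x2 averaging filter: each output pixel is (sum of a 2x2 block) >> 2."""
    out = []
    for i in range(len(frame) // 2):
        top, bottom = frame[2 * i], frame[2 * i + 1]
        row = []
        for j in range(len(frame[0]) // 2):
            # Three additions combine the four neighbouring pixels ...
            s = top[2 * j] + top[2 * j + 1] + bottom[2 * j] + bottom[2 * j + 1]
            # ... and one right shift by 2 divides by 4 (truncating).
            row.append(s >> 2)
        out.append(row)
    return out
```

Because the result is an average of 8-bit values, every output pixel stays within 0 to 255, so no extra bits or normalization hardware are needed.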

D. Comparisons

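The main purpose of motion estimation (ME) is to eliminate the temporal redundancy between adjacent frames. Therefore, the estimation quality improves when the successive downsampled images have a higher correlation coefficient. The correlation coefficient, ρ, is defined as:

$$ \rho = \frac{\sum_{i}\sum_{j} \left(I_k^{(l)}(i,j) - \bar{I}_k^{(l)}\right)\left(I_{k-1}^{(l)}(i,j) - \bar{I}_{k-1}^{(l)}\right)}{\sqrt{\sum_{i}\sum_{j} \left(I_k^{(l)}(i,j) - \bar{I}_k^{(l)}\right)^2 \, \sum_{i}\sum_{j} \left(I_{k-1}^{(l)}(i,j) - \bar{I}_{k-1}^{(l)}\right)^2}} $$

where $\bar{I}_k^{(l)}$ is the mean gray level of the k-th frame at level l.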
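A direct implementation of ρ is sketched below, assuming the standard Pearson definition computed over all pixel pairs of two equal-sized downsampled frames. `correlation_coefficient` is a hypothetical name, and the sketch does not guard against a constant frame (zero variance), for which ρ is undefined.

```python
import math


def correlation_coefficient(a, b):
    """Pearson correlation coefficient between two equal-sized frames.

    a, b: frames as lists of rows. Raises ZeroDivisionError if either
    frame is constant, since rho is undefined in that case.
    """
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)
```

In the comparison that follows, this would be evaluated on frames k and k−1 after each downsampling method, so a higher ρ indicates that the method preserves more of the similarity that block matching relies on.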

Figure 2-4 shows the correlation coefficients between adjacent downsampled images. The correlation coefficients of the Haar DWT and the averaging filter are almost the same and much greater than those of the left-top and Antonini 9/7 DWT methods, especially in video sequences with more complex backgrounds. Table 2-1 lists the estimation results: the average PSNRs of the Haar DWT and the averaging filter are similar, the image quality of the left-top method is poor, and the Antonini 9/7 DWT is even worse than the left-top method in some cases. These results indicate that the downsampling method plays a very important role in HMEA. The estimation performance of the averaging filter significantly exceeds that of the left-top method, which considers only the top-left pixel, and the averaging filter can be used to design an efficient downsampling hardware architecture.

Table 2-1 Comparison of video quality (PSNR, in dB) between the left-top, Haar DWT, Antonini 9/7 DWT, and averaging-filter downsampling methods.

Sequence   Left-top   Antonini 9/7 DWT   Haar DWT   Averaging filter
garden     21.43      24.23              26.55      26.62
Foreman    30.53      28.74              33.14      33.18

Fig. 2-4. The correlation coefficients between adjacent images for the averaging filter, left-top method, Haar DWT, and Antonini 9/7 DWT in (a) Flower garden and (b) Stefan

It is unexpected that the Antonini 9/7 DWT performs worse than the Haar DWT. Theoretically, the Antonini 9/7 DWT preserves more information in the low-frequency band because it adopts higher-order filters. However, this holds only when the inverse transform is performed, and the inverse transform is not executed in the downsampling procedure. Therefore, the Antonini 9/7 DWT requires more computational power yet provides poor quality in the downsampling stage of HMEA. Moreover, if the scaling factor of the Haar DWT is replaced by 1/2, its results are exactly the same as those of the averaging filter, and the dynamic-range problem disappears. The averaging filter outperforms the Haar DWT because the Haar DWT uses 1/√2 as its scaling factor, which makes the values of the downsampled pixels inaccurate. Considering both the coding performance and the hardware design, the averaging filter is chosen to downsample the image in HMEA.
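The effect of the scaling factor can be checked on a single 2×2 block. In a separable low-pass pass, the horizontal step scales each pair sum by the factor s and the vertical step scales again, so the LL output is (a + b + c + d)·s². The sketch below assumes this low-pass-only view; `separable_ll` is a hypothetical name. With s = 1/2 the result is the 2×2 average of (2-3), which hardware computes with a shift, whereas s = 1/√2 cannot be represented exactly in binary fixed point and doubles the output range.

```python
import math


def separable_ll(block, scale):
    """LL output for one 2x2 block [[a, b], [c, d]].

    The horizontal pass scales each pair sum by `scale`; the vertical pass
    adds the two results and scales again, giving (a + b + c + d) * scale**2.
    """
    (a, b), (c, d) = block
    horizontal = ((a + b) * scale, (c + d) * scale)
    return (horizontal[0] + horizontal[1]) * scale


white = [[255, 255], [255, 255]]
print(separable_ll(white, 1 / math.sqrt(2)))  # orthonormal Haar: about 510
print(separable_ll(white, 0.5))               # scale 1/2: 255.0, the 2x2 average
```

With s = 1/2, any block gives exactly the averaging-filter value, for example 25.0 for [[10, 20], [30, 40]], and the output never leaves the 0 to 255 range.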

From the document: Optimization of the MPEG-4/H.264 Video Compression Standards for Embedded Systems (MPEG-4/H.264視訊壓縮標準於嵌入式系統之最佳化研究), pp. 27-35