PROPOSED FRUC ALGORITHM - Increasing the Frame Rate Using Temporal- and Spatial-Domain Interpolation

3.2 Proposed FRUC algorithm

The proposed FRUC method aims to double the frame rate, that is, to convert the frame rate from n to 2n. Figure 3.13 shows the concept: the frames of the input sequence become the even frames of the up-converted sequence, and the proposed method must produce all of the odd frames. We call these not-yet-existing odd frames the to-be-interpolated frames.

Figure 3.13 Frame rate up-conversion from n to 2n
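The even/odd arrangement can be sketched as follows. This is a minimal illustration only: `up_convert` and its `interpolate` callback are hypothetical names, and the callback stands in for the whole interpolation procedure described in the following sections.

```python
def up_convert(frames, interpolate):
    """Interleave original frames (even indices) with interpolated frames (odd indices).

    `interpolate` is a placeholder callback that builds one in-between frame
    from two adjacent input frames.
    """
    out = []
    for k, frame in enumerate(frames):
        out.append(frame)  # output index 2k: original input frame
        if k + 1 < len(frames):
            # output index 2k + 1: the to-be-interpolated frame
            out.append(interpolate(frame, frames[k + 1]))
    return out
```

Because the last input frame has no successor, this sketch turns n input frames into 2n - 1 output frames.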

The proposed FRUC algorithm is summarized in the flow chart in Figure 3.14. The algorithm is divided into two main parts. The first part, comprising steps 1, 2, and 3, is a frame-based temporal interpolation that generates the initial interpolated frame. The second part, step 4, is a pixel-based spatial interpolation that improves visual quality by choosing among several non-linear interpolation methods according to the pixel gradients. Each step is described in detail in the following sections.

Figure 3.14 Flow chart of proposed FRUC algorithm

3.2.1 ME & Motion Vector Merging

Step 1 of the proposed method is a block-based unidirectional motion estimation (ME) process. As Figure 3.15 shows, it is applied to every pair of adjacent frames of the input sequence and uses a block matching algorithm to obtain the best motion vector (MV) for each block.

Figure 3.15 Motion Estimation in proposed method
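As a rough illustration of block matching, the sketch below performs an exhaustive (full) search that minimizes the sum of absolute differences (SAD). The full-search strategy, the search range, and the names `sad` and `block_match` are assumptions made for this example; the text does not specify which block matching variant is used. Frames are plain lists of rows of luma values.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """SAD between the block at (bx, by) in `cur` and the block displaced by (dx, dy) in `ref`."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def block_match(cur, ref, bx, by, size=8, search=4):
    """Return (dx, dy, cost) of the best match in `ref` for the block at (bx, by) in `cur`."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidate blocks that fall partly outside the reference frame.
            if bx + dx < 0 or by + dy < 0:
                continue
            if bx + dx + size > width or by + dy + size > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            # Strict '<' keeps the first candidate found when costs tie.
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best
```

The zero displacement is always a valid candidate, so the function always returns a vector for a block that lies inside the frame.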

Step 2 of the proposed method is an MV merging process. In the video coding standard H.264, block sizes vary from 16x16 to 4x4. In our approach, any block smaller than 8x8 is merged with its neighboring blocks into an 8x8 block. The MV of the merged 8x8 block is chosen as the median of the motion vectors of all its sub-blocks. Because the median MV should lie closest to the other three MVs, we define the median MV as the one with the minimum SAD (sum of absolute differences) between itself and the other three neighboring MVs. This differs from traditional MV merging, which adopts the average MV as the merged MV. Figure 3.16 shows an example of our MV merging with four neighboring 4x4 sub-blocks. The proposed median function selects the median among the four motion vectors MV0 to MV3, that is, $MV_{median} = \operatorname{median}_{i=0\sim3}(MV_i)$. If MVi is selected, it becomes the MV of the resulting 8x8 block. MV merging not only reduces the computation of motion-compensated interpolation (MCI), because the number of MVs is reduced, but can also increase video quality. This will be illustrated in the section on experimental results.

Figure 3.16 MV merging by median selection
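The median selection described above can be sketched as follows. The function name `median_mv` is hypothetical, and the sketch assumes that the SAD between two MVs is the sum of the absolute differences of their horizontal and vertical components.

```python
def median_mv(mvs):
    """Return the MV with the smallest summed absolute difference to all the other MVs."""
    def cost(i):
        return sum(
            abs(mvs[i][0] - mvs[j][0]) + abs(mvs[i][1] - mvs[j][1])
            for j in range(len(mvs))
            if j != i
        )

    # min() returns the first index with the lowest cost if several MVs tie.
    best = min(range(len(mvs)), key=cost)
    return mvs[best]
```

For the four sub-block vectors (1, 1), (2, 1), (3, 2), and (10, -8), this selects (2, 1), whereas averaging would give (4, -1): the outlier pulls the average away from the three consistent vectors but does not affect the median choice.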

3.2.2 Temporal Interpolation model

In step 3 of the proposed algorithm, the initial interpolated frame is generated by two bi-directional MCI methods. The non-aligned bi-directional MCI (NA-BDMCI) is performed first; it produces interpolated pixels by averaging the pixels of the adjacent frames along the real motion trajectory. The real motion trajectory is derived from the motion vectors between adjacent frames, which are obtained by the motion estimation process in step 1. Figure 3.17 illustrates NA-BDMCI.

Figure 3.17 non-aligned bi-directional MCI (NA-BDMCI)
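A minimal sketch of the NA-BDMCI idea follows, under several assumptions that the text does not fix: each MV (dx, dy) maps a block of the later frame `cur` onto the earlier frame `prev`, the trajectory crosses the interpolated frame at half the MV (rounded down with integer division), overlapping contributions are averaged, and positions that no trajectory reaches are marked as holes with `None`. The name `na_bdmci` and the dictionary layout of `mvs` are illustrative only.

```python
def na_bdmci(prev, cur, mvs, size):
    """Interpolate a frame along motion trajectories; unreached pixels stay None.

    mvs maps the top-left corner (bx, by) of each block in `cur` to its
    MV (dx, dy), pointing to the matching block in `prev`.
    """
    h, w = len(cur), len(cur[0])
    acc = [[0.0] * w for _ in range(h)]  # sum of candidate values per pixel
    cnt = [[0] * w for _ in range(h)]    # number of candidates per pixel
    for (bx, by), (dx, dy) in mvs.items():
        for y in range(size):
            for x in range(size):
                # Where the trajectory crosses the interpolated frame (midpoint).
                ix, iy = bx + x + dx // 2, by + y + dy // 2
                # Where the trajectory ends in the earlier frame.
                px, py = bx + x + dx, by + y + dy
                if 0 <= ix < w and 0 <= iy < h and 0 <= px < w and 0 <= py < h:
                    acc[iy][ix] += (cur[by + y][bx + x] + prev[py][px]) / 2
                    cnt[iy][ix] += 1
    return [
        [acc[y][x] / cnt[y][x] if cnt[y][x] else None for x in range(w)]
        for y in range(h)
    ]
```

Because the blocks are displaced before being written, some positions of the interpolated frame receive several candidates while others receive none.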

After NA-BDMCI, some holes may remain in the interpolated frame because no motion trajectory passes through them. An aligned bi-directional MCI (A-BDMCI) is therefore performed to overcome this problem. Unlike NA-BDMCI, which uses the real motion trajectory, A-BDMCI uses the motion vectors of the co-located blocks on the adjacent frames as the motion vectors of the interpolated frame, so every aligned block in the interpolated frame has a motion vector. In our approach, A-BDMCI is used only to produce the pixels in the holes of the interpolated frame generated by NA-BDMCI. Figure 3.18 illustrates A-BDMCI.

Figure 3.18 aligned bi-directional MCI (A-BDMCI)
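The hole-filling role of A-BDMCI can be sketched as below. The sketch assumes that holes are marked with `None`, that the MV of the co-located block points from the later frame `cur` to the earlier frame `prev`, that half the MV is applied in each direction, and that coordinates are clamped at the frame border. None of these details is stated in the text, and the name `fill_holes_a_bdmci` is hypothetical.

```python
def fill_holes_a_bdmci(frame, prev, cur, mvs, size):
    """Fill None pixels of `frame` using the MV of the co-located (aligned) block."""
    h, w = len(frame), len(frame[0])

    def clamp(v, lo, hi):
        return max(lo, min(hi, v))

    out = [row[:] for row in frame]
    for y in range(h):
        for x in range(w):
            if out[y][x] is not None:
                continue  # pixel already produced by the first pass
            # Top-left corner of the aligned block that contains (x, y).
            bx, by = (x // size) * size, (y // size) * size
            dx, dy = mvs[(bx, by)]
            hx, hy = dx // 2, dy // 2
            # Follow half the MV towards each adjacent frame.
            px, py = clamp(x + hx, 0, w - 1), clamp(y + hy, 0, h - 1)
            cx, cy = clamp(x - hx, 0, w - 1), clamp(y - hy, 0, h - 1)
            out[y][x] = (prev[py][px] + cur[cy][cx]) / 2
    return out
```

Pixels that already have a value are left untouched, which matches the statement that A-BDMCI is used only for the holes.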

Producing pixels with BDMCI methods typically suffers from the pixel overlapping problem, in which multiple pixels are interpolated at the same location. Two remedies are commonly used: average selection and minimum absolute difference (MAD) selection. Average selection uses the average value of all the overlapped pixels, whereas MAD selection takes the pixel value from the candidate with the minimum absolute difference between its motion-compensated pixels on the two adjacent frames. Our solution uses the average method.
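The two selection rules can be compared with a small sketch. Here each overlapping candidate is assumed to be a pair (a, b) of motion-compensated pixel values, one from each adjacent frame, and its interpolated value is assumed to be their mean; the function names are hypothetical.

```python
def resolve_average(candidates):
    """Average selection: mean of the interpolated values of all overlapping candidates."""
    return sum((a + b) / 2 for a, b in candidates) / len(candidates)


def resolve_mad(candidates):
    """MAD selection: keep the candidate whose two compensated pixels differ least."""
    a, b = min(candidates, key=lambda c: abs(c[0] - c[1]))
    return (a + b) / 2
```

With the candidates (100, 104) and (90, 130), average selection yields 106.0, while MAD selection keeps only the first, better-matched candidate and yields 102.0.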

3.2.3 Spatial Interpolation model

In step 4 of the proposed algorithm, a pixel-based spatial interpolation is adopted. Its flow chart is shown in Figure 3.19.

Figure 3.19 Flow chart of pixel-based spatial interpolation model

First, the gradients in both the temporal and spatial domains (the latter covering 6 different directions) are calculated for each pixel of the initial interpolated frame produced by step 3. Second, pixels are classified as reliable or unreliable according to the temporal gradient threshold (GT_TH), a value predefined by a statistical method.

Pixels identified as unreliable are modified by the spatial interpolation method, because GT_TH indicates that their initial values produced by the temporal method are not good enough.

Figure 3.20 shows how the threshold value of GT is defined. It is obtained by applying temporal interpolation and spatial interpolation separately to all the pixels of each frame of eight training sequences. The corresponding PSNR values and gradient values (both frame-based) are presented in ascending order of the PSNR of spatial interpolation and descending order of the PSNR of temporal interpolation. Figure 3.20 shows that, to the left of the intersection of the two PSNR curves, temporal interpolation gives better results than spatial interpolation. Frames whose temporal gradients (GT) fall in this region are therefore regarded as reliable when temporal interpolation is used. Hence, we use the average of the GTs in this region as GT_TH and use equation (3.17) to determine whether a pixel p at location (x, y) of frame n is reliable, where '1' means the pixel is reliable and '0' means unreliable.

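$$R(p^{n}_{x,y}) = \begin{cases} 1, & GT(p^{n}_{x,y}) \le GT\_TH \\ 0, & \text{otherwise} \end{cases} \qquad (3.17)$$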

Figure 3.20 Temporal gradient threshold for reliable pixel
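The threshold derivation and the per-pixel test can be sketched as follows. This is one possible reading, not a definitive procedure: it assumes that the region to the left of the intersection consists of the training frames for which temporal interpolation achieves the higher PSNR, and that a pixel counts as reliable when its temporal gradient does not exceed GT_TH. The function names and the tuple layout are hypothetical, and the computation of the temporal gradient itself is taken as given.

```python
def gt_threshold(training_frames):
    """Average GT over the training frames where temporal interpolation wins.

    training_frames: list of (psnr_temporal, psnr_spatial, gt) tuples, one per frame.
    """
    region = [gt for psnr_t, psnr_s, gt in training_frames if psnr_t > psnr_s]
    return sum(region) / len(region)


def reliability_map(gt_map, gt_th):
    """Return 1 for reliable pixels (GT <= GT_TH, an assumed comparison) and 0 otherwise."""
    return [[1 if gt <= gt_th else 0 for gt in row] for row in gt_map]
```

Pixels marked 0 in the resulting map are the ones that the spatial interpolation would subsequently modify.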

Chapter 4 Experimental Results

To examine the performance of the proposed methods, we use four test video sequences at QCIF (176x144) resolution and split each test sequence into two subsequences, one consisting of all odd frames and the other of all even frames. We then obtain reconstructed even frames by encoding the even sequence with the H.264/AVC reference software JM 16.0 [9] and apply the proposed FRUC algorithm to the reconstructed even frames to generate all the odd frames. Performance is evaluated by comparing the interpolated odd frames with the original odd frames. The proposed methods are compared with the MCI method in terms of both objective and subjective visual quality. Objective quality is measured using the Peak Signal-to-Noise Ratio (PSNR), which is defined by Equation (4.1):

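$$PSNR = 10 \log_{10} \frac{255^{2}}{MSE} \qquad (4.1)$$

where

$$MSE = \frac{1}{height \times width} \sum_{i=1}^{height} \sum_{j=1}^{width} \left( f_{i,j} - \hat{f}_{i,j} \right)^{2} \qquad (4.2)$$

where height and width are the frame resolution in the vertical and horizontal directions, respectively; $f_{i,j}$ is the pixel value of the original sequence and $\hat{f}_{i,j}$ is the pixel value generated (or interpolated) by the decoder.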
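For reference, the PSNR measurement can be computed with a short sketch such as the one below. It assumes 8-bit samples (peak value 255) and frames stored as lists of rows; the function names are illustrative.

```python
import math


def mse(original, interpolated):
    """Mean squared error between two frames of equal size."""
    height, width = len(original), len(original[0])
    total = 0
    for i in range(height):
        for j in range(width):
            diff = original[i][j] - interpolated[i][j]
            total += diff * diff
    return total / (height * width)


def psnr(original, interpolated, peak=255):
    """PSNR in dB; returns infinity when the two frames are identical."""
    error = mse(original, interpolated)
    if error == 0:
        return math.inf
    return 10 * math.log10(peak * peak / error)
```

In the experimental setup described above, `original` would be an original odd frame and `interpolated` the corresponding frame produced by the FRUC algorithm.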
