
Chapter 3 Data and Computation Efficient Inter Predictor Design for H.264/AVC Scalable

3.5. Combination of All Low Bandwidth Inter-Layer Prediction Algorithms

3.5.2. Algorithms Combination

Fig. 3.18 shows the overall flowchart of the proposed data efficient algorithm. First, the frame size is checked to decide which data efficient algorithm to apply: if the frame size is smaller than 480P, the Level C based data efficient algorithm is applied; otherwise, the multi-level based data efficient algorithm is applied. Fig. 3.19 and Fig. 3.20 show the detailed flowcharts of the proposed multi-level based and Level C based data efficient algorithms, respectively. As the figures show, the only major difference between the two algorithms lies in how the reference data are reused. In the Level C data reuse scheme, all of the reference data can be reused for the other Inter-layer prediction modes. In the multi-level data efficient algorithm, however, the reference data memory of the Inter prediction mode holds the reference data of Level 0, Level 1, and Level 2, and only the reference data of Level 0 can be reused for the other Inter-layer prediction modes, because the reference data of Level 1 and Level 2 are sub-sampled versions and are therefore unsuitable for Inter-layer prediction. In addition, the residual information is loaded to compute the corresponding RDCosts of Inter-layer residual prediction.
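The switching rule and the reuse constraint described above can be sketched as follows. This is only an illustration under stated assumptions: the frame size is compared by its height in lines (480 standing for 480P), and the names `SWITCH_HEIGHT`, `ReusePlan`, and `select_reuse_plan` are hypothetical and do not come from the proposed design.

```python
from dataclasses import dataclass

# Assumption: "frame size" is compared by frame height, with 480 lines for 480P.
SWITCH_HEIGHT = 480


@dataclass(frozen=True)
class ReusePlan:
    algorithm: str          # "level_c" or "multi_level"
    reusable_levels: tuple  # reference data shared with the Inter-layer modes
    load_residual: bool     # residual information is loaded in both cases


def select_reuse_plan(frame_height: int) -> ReusePlan:
    """Pick the data efficient algorithm from the frame size."""
    if frame_height < SWITCH_HEIGHT:
        # Level C: the overall reference data can be reused.
        return ReusePlan("level_c", ("all",), True)
    # Multi-level: Level 1 and Level 2 are sub-sampled, so only Level 0 is reused.
    return ReusePlan("multi_level", (0,), True)


if __name__ == "__main__":
    for height in (144, 288, 480, 1080):
        print(height, select_reuse_plan(height))
```

Under these assumptions, QCIF and CIF frames take the Level C path, while 480P and larger frames take the multi-level path and share only Level 0 data.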

Fig. 3.18. Flowchart of proposed data efficient algorithm

[Flowchart; recoverable steps: Start; derive MBPartition, n = 0; derive MVPInter; test DiffMV < th2; load residual information; stored data: residual information and reference data of Level 0]

Fig. 3.19. Flowchart of proposed multi-level based data efficient algorithm

[Flowchart; recoverable steps: Start; derive MBPartition, n = 0; derive MVPInter; test DiffMV < th2; load reference data of Inter by using the Level C data reuse scheme; load residual information]

Fig. 3.20. Flowchart of proposed Level C based data efficient algorithm


3.6. Simulation Results

In this subsection, several simulations are conducted to demonstrate the efficiency of the proposed data efficient Inter-layer prediction algorithms. The simulation settings are listed in Table 3-V. Fig. 3.21 and Fig. 3.22 compare the rate distortion curves for different frame resolutions. In Fig. 3.21, the frame sizes of the spatial base and enhancement layers are QCIF and CIF, respectively. For the non-dyadic spatial resolution relationship between spatial layers, Fig. 3.22 shows the case in which the spatial base layer is QCIF and the spatial enhancement layer is 480P. In these results, the multi-level based and Level C based motion estimation algorithms, each combined with the proposed Inter-layer data efficient algorithm, are individually compared to the JSVM reference software 9.14 [52]. Although many test sequences were used in the simulations, only the rate distortion curves of a few of them are shown in this dissertation. Nevertheless, the curves shown cover motion behavior ranging from slow to high motion. The comparisons show that both the multi-level based and the Level C based data efficient motion estimation algorithms achieve nearly the same rate distortion performance as the JSVM reference software.

Table 3-VI and Table 3-VII give the detailed rate distortion comparisons for the different spatial configurations. In Table 3-VI, the frame sizes of the base and enhancement layers are QCIF and CIF, respectively. In Table 3-VII, they are QCIF and 480P, respectively. Two observations can be made from these tables. First, in the small frame resolution configuration, the rate distortion performance of the multi-level based data efficient motion estimation algorithm is worse than that of the Level C based algorithm, because the Level C data reuse scheme can easily find the best result within the limited search area when the frame size is small. For the larger frame resolution, however, the multi-level based data efficient algorithm achieves much better rate distortion performance than the Level C based algorithm. Second, Table 3-VII shows that the Level C based data efficient algorithm suffers a very significant rate distortion performance drop. Fortunately, thanks to the proposed adaptive motion estimation switching algorithm, the Level C based data efficient motion estimation algorithm is not applied in the Inter prediction operation when the frame size is 480P or larger. Therefore, this performance drop does not appear in the proposed algorithm for larger frame resolutions.

On average, the proposed Level C based data efficient motion estimation algorithm incurs only a 0.161% bitrate increase and a 0.002 dB PSNR decrease for the CIF frame resolution. For the higher-resolution 480P sequences, the proposed multi-level based data efficient motion estimation algorithm incurs only a 0.91% bitrate increase and a 0.09 dB PSNR decrease.

Table 3-V Encoding setting configuration

Codec                   JSVM 9.14
Test sequences          Akiyo, Coastguard, Container, Foreman, Mobile, Silent, Blue_sky, Tractor, Station2, Pedestrian_area, Rush_hour, Riverbed
QP                      16, 20, 24, 28, 32
Resolution              QCIF and CIF, CIF and 480p
Search range            ±16 and ±32
Frame rate              15 Hz
GOP                     8
Inter-layer prediction  Adaptive Inter-layer prediction
Frames to be encoded    100

Fig. 3.21. RD curves comparisons (a) Foreman, (b) Stefan, and (c) Akiyo

Fig. 3.22. RD curves comparison (a) Tractor, (b) Pedestrian_area, and (c) Riverbed

Table 3-VI Detailed rate distortion comparisons for different encoding configuration settings (BL:QCIF, EL:CIF)

PMRME average : ΔBitrate = 1.123%, ΔPSNR = -0.073 dB


Table 3-VII Detailed rate distortion comparisons for different encoding configuration settings (BL:QCIF, EL:480P)

Table 3-VIII compares the data access bandwidth savings for different algorithms and frame sizes. On average, the proposed data efficient Inter-layer prediction algorithm achieves data access bandwidth savings of 50.55%, 64.78%, 80.07%, 80.06%, 90.18%, and 95.00% for frame resolutions of QCIF, CIF, 480P, 4CIF, 720P, and 1080P, respectively.

Table 3-VIII Data access bandwidth savings comparison (%)

Proposal               QCIF    CIF     480P    4CIF    720P    1080P
Level C based (1)      54.74   67.28   79.29   79.26   88.07   93.42
Multi-level based (2)  46.35   62.27   80.85   80.85   92.28   96.57
Average                50.55   64.78   80.07   80.06   90.18   95.00

(1) Compared to Level C data reuse scheme without our proposed data efficient IL algorithm

(2) Compared to multi-level scheme without our proposed data efficient IL algorithm
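The Average row of Table 3-VIII appears to be the plain mean of the Level C based and multi-level based rows. The short check below reproduces it from the table values. Rounding half up to two decimals is an assumption made here so that the output matches the printed figures, and the function name `average_savings` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

RESOLUTIONS = ("QCIF", "CIF", "480P", "4CIF", "720P", "1080P")
# Values copied from Table 3-VIII (percent bandwidth saving).
LEVEL_C = ("54.74", "67.28", "79.29", "79.26", "88.07", "93.42")
MULTI_LEVEL = ("46.35", "62.27", "80.85", "80.85", "92.28", "96.57")


def average_savings():
    """Mean of the two rows per resolution, rounded half up to 0.01 (assumed)."""
    cent = Decimal("0.01")
    return {
        res: ((Decimal(a) + Decimal(b)) / 2).quantize(cent, rounding=ROUND_HALF_UP)
        for res, a, b in zip(RESOLUTIONS, LEVEL_C, MULTI_LEVEL)
    }


if __name__ == "__main__":
    for res, value in average_savings().items():
        print(f"{res:>5}: {value}%")
```

Exact decimal arithmetic is used because several of the means fall exactly on a rounding boundary (for example 50.545), where binary floating point could round either way.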

3.7. Summary

In this chapter, we proposed several data efficient Inter-layer prediction algorithms to reduce the data access bandwidth overhead. For Inter-layer residual prediction, we proposed an efficient data reuse scheme that helps derive the RDCosts of Inter-layer residual prediction. With this scheme, not only can the reference data be reused, but the RDCosts of Inter/Inter-layer motion and Inter+residual/Inter-layer motion+residual can also be derived by a single motion estimation module. For the Inter-layer motion and InterBL prediction algorithms, we proposed an efficient data reuse scheme that saves data bandwidth by reusing the reference data of the Inter prediction mode. In addition, we proposed an adaptive motion estimation switching algorithm that dynamically selects the motion estimation algorithm according to the frame size. Simulation results show that the proposed Level C based data efficient motion estimation algorithm incurs only a 0.161% bitrate increase and a 0.002 dB PSNR decrease for the CIF frame resolution, and that the proposed multi-level based data efficient motion estimation algorithm incurs only a 0.91% bitrate increase and a 0.09 dB PSNR decrease. On average, the proposed data efficient Inter-layer prediction algorithm achieves data access bandwidth savings of 50.55%, 64.78%, 80.07%, 80.06%, 90.18%, and 95.00% for frame resolutions of QCIF, CIF, 480P, 4CIF, 720P, and 1080P, respectively.

Chapter 4 An Efficient Mode Pre-Selection Algorithm for Fractional Motion Estimation in H.264/AVC Scalable Video Extension

4.1. Introduction

To efficiently exploit the temporal relationship between successive frames, video coding systems usually adopt Inter prediction to remove temporal redundancy. In older video coding systems, Inter prediction usually means integer-pixel motion estimation, based on the assumption that objects move only by integer pixel positions. However, the literature has shown that taking sub-pixel positions into account in Inter prediction further improves coding performance, since object motion is not always of integer accuracy. Fractional-accuracy motion estimation has therefore become popular; [53] shows that a PSNR improvement of 4+ dB can be achieved by adopting fractional motion estimation (FME). As a result, FME has been widely adopted in existing video coding standards [7].

Fig. 4.1 illustrates the concept of FME, whose operation is explained as follows. FME consists mainly of two steps: half-pixel and quarter-pixel motion estimation. After integer-pixel motion estimation (IME) has decided the best integer position (circle symbols), half-pixel motion estimation is executed around that position. It checks eight additional candidate positions (square symbols) to find the best result. During this check, the missing pixels at the half-pixel positions are interpolated by a six-tap interpolator with the coefficients [1, -5, 20, 20, -5, 1]. Once the best half-pixel position has been decided, quarter-pixel motion estimation is executed around it. Like half-pixel motion estimation, it checks eight additional candidate positions (triangle symbols) to derive the final result of Inter prediction. Unlike half-pixel motion estimation, however, the missing pixels at the quarter-pixel positions are generated by a bilinear interpolator. Interpolation is therefore involved in both the half-pel and the quarter-pel search stages.
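The two interpolators can be sketched in one dimension as follows. The tap coefficients come from the text above; the rounding offset of 16, the shift by 5 (division by 32), and the clipping to 8 bits follow the usual H.264 luma interpolation and are not spelled out in this chapter. The function names are hypothetical, and the sketch ignores the two-dimensional case, where the centre half-pel sample is computed from unrounded intermediate values.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # six-tap filter coefficients, sum = 32


def clip8(value: int) -> int:
    """Clip to the 8-bit sample range."""
    return max(0, min(255, value))


def half_pel(row, i: int) -> int:
    """Half-pel sample between row[i] and row[i + 1].

    Uses the six integer samples row[i - 2] .. row[i + 3].
    """
    acc = sum(tap * row[i - 2 + k] for k, tap in enumerate(TAPS))
    return clip8((acc + 16) >> 5)  # round and divide by 32


def quarter_pel(a: int, b: int) -> int:
    """Bilinear quarter-pel sample between two neighbouring samples."""
    return (a + b + 1) >> 1


if __name__ == "__main__":
    row = [10, 20, 30, 40, 50, 60]
    h = half_pel(row, 2)        # between 30 and 40
    q = quarter_pel(row[2], h)  # between 30 and the half-pel sample
    print(h, q)
```

Each half-pel sample needs six neighbouring integer samples and each quarter-pel sample needs an already interpolated half-pel neighbour, which is why the interpolation cost appears in both search stages.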

[Figure legend: integer pixel position, half pixel position, quarter pixel position]

Fig. 4.1. Illustration of fractional motion estimation

Although FME only checks several positions around the best motion vector found by integer motion estimation, the computational complexities of IME and FME are almost equal, especially in hardware realizations, because of the complex interpolation process and the many prediction modes that FME has to check [53][54]. To measure the computational complexity of fractional motion estimation, Fig. 4.2 shows the CPU usage profile for the Mobile sequence reported in [55]. The figure shows that the CPU usage of sub-pixel motion estimation is much higher than that of integer-pixel motion estimation; in total, 76% of the CPU usage is occupied by the fractional motion estimation component. As a result, the IME and FME operations are usually assigned to two different pipeline stages in hardware designs [56] to balance the computational load of the pipeline stages and thus achieve higher coding performance.

Fig. 4.2. Percentage of CPU usage profile of H.264 encoder. For Mobile sequence at CIF size, using CAVLC, ±16 search range, fast motion estimation, and no RD mode selection.

In the H.264 video coding standard [7], variable block size motion estimation is supported in the Inter prediction mode, and each partition size has to be checked by IME and FME in turn to select the best prediction result, as shown in Fig. 4.3. Thus, 41 blocks have to go through the IME and FME operations. In addition to the inherent prediction modes of H.264, the Inter-layer prediction modes adopted in SVC [11], including Inter-layer motion prediction (ILM) and Inter-layer motion+residual prediction (ILM+R), significantly increase the computational complexity of FME, as shown in Fig. 4.4. As a result, 41×4 = 164 blocks have to be examined by FME in SVC.

[Figure: partition sizes 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4; IME for Inter mode; FME for Inter mode; best mode selection; best mode]

Fig. 4.3. Illustration of mode selection process of H.264


Fig. 4.4. Illustration of mode selection process of SVC

To simplify the hardware design, the small blocks ranging from 8×8 to 4×4 are decided early in the IME stage to derive a Submode, so that only the partition sizes of 16×16, 16×8, 8×16, and Submode (a minimum of 9 and a maximum of 21 blocks) have to be examined by the FME operation, as shown in Fig. 4.5(a). The same Submode early decision can also be applied to SVC to ease the overhead of hardware implementation. As a result, only 36 to 84 blocks have to be examined by FME in SVC, as shown in Fig. 4.6.
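The block counts quoted in this section can be reproduced with a few lines of arithmetic. The sketch below assumes that the factor of four in SVC corresponds to the four prediction modes analyzed in Section 4.2.1 (Inter, Inter+residual, ILM, and ILM+R), and that the minimum and maximum Submode cases correspond to all four 8×8 sub-macroblocks using 8×8 and 4×4 partitions, respectively. The names used are hypothetical.

```python
# Number of blocks per macroblock for each partition size.
BLOCKS = {"16x16": 1, "16x8": 2, "8x16": 2, "8x8": 4, "8x4": 8, "4x8": 8, "4x4": 16}

# Assumed set of modes behind the "x4" factor in SVC.
SVC_MODES = ("Inter", "Inter+R", "ILM", "ILM+R")


def full_blocks() -> int:
    """Blocks checked when every partition size goes through FME."""
    return sum(BLOCKS.values())


def submode_blocks(sub_partition: str) -> int:
    """Blocks checked when 8x8..4x4 are decided early as one Submode."""
    return BLOCKS["16x16"] + BLOCKS["16x8"] + BLOCKS["8x16"] + BLOCKS[sub_partition]


if __name__ == "__main__":
    n_modes = len(SVC_MODES)
    print("H.264, all partitions:", full_blocks())
    print("SVC, all partitions:  ", full_blocks() * n_modes)
    print("H.264, Submode range: ", submode_blocks("8x8"), "to", submode_blocks("4x4"))
    print("SVC, Submode range:   ", submode_blocks("8x8") * n_modes,
          "to", submode_blocks("4x4") * n_modes)
```

Under these assumptions the counts are 41 and 164 without the Submode early decision, and 9 to 21 and 36 to 84 with it, which matches the figures in the text.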

Although the Submode early decision efficiently reduces the overhead of FME, the computational complexity of FME is still high. Several works [57]-[60] have been proposed to increase the speed of FME through hardware implementation. Instead of checking all prediction modes, [61] and [62] proposed a mode pre-selection method, shown in Fig. 4.5(b), that pre-selects the potentially skippable prediction modes before the FME prediction process in H.264. However, none of the above literature addresses SVC. This chapter therefore proposes an efficient mode pre-selection algorithm that lightens the computational complexity of FME for SVC. In the proposed algorithm, the rate distortion cost relationship between different prediction modes is first analyzed to find clues for pre-selecting the potentially ignorable prediction modes. Based on the analytical results, several mode pre-selection rules are proposed in this chapter.

[Figure: FME candidate partitions 16×16, 16×8, 8×16, and Submode]

Fig. 4.5. Illustration of (a) mode selection process for H.264/AVC and (b) mode pre-selection concept for H.264/AVC

Fig. 4.6. Illustration of mode selection process for SVC

The rest of this chapter is organized as follows. Section 4.2 presents observations on the rate distortion cost relationship between different prediction modes. Section 4.3 proposes the mode pre-selection algorithms based on these observations. Section 4.4 shows simulation results that demonstrate the efficiency of the proposed algorithms. Section 4.5 presents the hardware architecture design. Finally, Section 4.6 concludes this chapter.

4.2. Analysis of Rate Distortion Cost between Prediction Modes

To find useful clues for deriving the FME mode pre-selection algorithm, we analyze the rate distortion costs of IME and FME across different prediction modes and then carry out several simulations to confirm the observations. We choose the rate distortion cost as the quantity to observe because the best prediction mode is decided by the magnitude of the rate distortion cost in the mode selection process, which makes it a natural choice.

4.2.1. Observing the RDCost Relationship between Different Prediction Modes

In this subsection, we analyze the rate distortion cost (RDCost) relationship between IME and FME for four combinations of prediction modes: "Inter versus Inter-layer motion (Type1)", "Inter+residual versus Inter-layer motion+residual (Type2)", "Inter versus Inter+residual (Type3)", and "Inter-layer motion versus Inter-layer motion+residual (Type4)". Fig. 4.7 to Fig. 4.9, Fig. 4.10 to Fig. 4.12, Fig. 4.13 to Fig. 4.15, and Fig. 4.16 to Fig. 4.18 show the RDCost relationship between IME and FME for the Type1, Type2, Type3, and Type4 combinations, respectively, with different partition sizes. In these figures, the vertical axis is the RDCost and the horizontal axis is the macroblock index. The terms I, M, and R stand for the Inter, Inter-layer motion (ILM), and Inter-layer residual prediction modes, respectively. From these figures, we can observe two properties, which we call spatial locality and no spatial locality. A macroblock is roughly defined as having the spatial locality property if its IME and FME rate distortion costs are very close to each other. For example, in the Type1 combination, the IME RDCost is very close to the FME RDCost for the Inter mode, and the same holds for the ILM mode. Consequently, if the IME RDCost of the Inter mode is sufficiently larger than the IME RDCost of the ILM mode, the FME RDCost of the Inter mode is very likely to be larger than the FME RDCost of the ILM mode as well, and vice versa. Macroblocks that do not satisfy this condition are said to have the no spatial locality property.

These figures show that most macroblocks have the spatial locality property for the block sizes of 16×16, 16×8, and 8×16. For the Submode block size, however, the spatial locality property cannot be easily observed.

These observations lead to two conclusions. First, the mode pre-selection process should take the prediction block size into consideration. The block sizes of 16×16, 16×8, and 8×16 are considered jointly when deriving the FME mode pre-selection algorithm because they behave very similarly, whereas the Submode block size should be treated separately for better prediction performance because it behaves differently from them. Second, the macroblocks should be handled separately according to the property they exhibit. In other words, the macroblocks that have the spatial locality property should be processed by a mode pre-selection algorithm that exploits this property, and the other macroblocks should be processed by a different mode pre-selection algorithm.
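The idea behind these observations can be illustrated with a small sketch. It is not the pre-selection rule proposed in Section 4.3. It only shows how the spatial locality property could be tested and how a mode could be dropped before FME when its IME RDCost is sufficiently larger than the best one. The relative tolerance `tol`, the `margin` threshold, and both function names are hypothetical placeholders.

```python
def has_locality(ime_cost: float, fme_cost: float, tol: float = 0.1) -> bool:
    """True if the IME and FME RDCosts are 'very close'.

    Closeness is measured as a relative gap of at most `tol` (assumed value).
    """
    return abs(ime_cost - fme_cost) <= tol * max(ime_cost, 1)


def preselect(ime_costs: dict, margin: float) -> list:
    """Modes kept for FME, given their IME RDCosts.

    A mode is skipped when its IME RDCost exceeds the best IME RDCost by more
    than `margin`, i.e. when it is 'sufficiently larger' (assumed threshold).
    """
    best = min(ime_costs.values())
    return [mode for mode, cost in ime_costs.items() if cost - best <= margin]


if __name__ == "__main__":
    # Type1 example: Inter (I) versus Inter-layer motion (M) for one macroblock.
    print(preselect({"I": 1500, "M": 1000}, margin=200))  # Inter is skipped
    print(preselect({"I": 1100, "M": 1000}, margin=200))  # both go through FME
    print(has_locality(ime_cost=1000, fme_cost=950))
```

Such a rule is only reliable for macroblocks that have the spatial locality property, which is why the 16×16, 16×8, and 8×16 partitions and the Submode partition call for different treatment.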

Fig. 4.7. Relationship between RDCosts of IME and FME of Football sequence for Type1 prediction mode combination (a) 16×16, (b) 16×8, (c) 8×16, and (d) Submode

Fig. 4.8. Relationship between RDCosts of IME and FME of Foreman sequence for Type1 prediction mode combination (a) 16×16, (b) 16×8, (c) 8×16, and (d) Submode

Fig. 4.9. Relationship between RDCosts of IME and FME of Soccer sequence for Type1 prediction mode combination (a) 16×16, (b) 16×8, (c) 8×16, and (d) Submode

Fig. 4.10. Relationship between RDCosts of IME and FME of Football sequence for Type2 prediction mode combination (a) 16×16, (b) 16×8, (c) 8×16, and (d) Submode

Fig. 4.11. Relationship between RDCosts of IME and FME of Foreman sequence for Type2 prediction mode combination (a) 16×16, (b) 16×8, (c) 8×16, and (d) Submode

Fig. 4.12. Relationship between RDCosts of IME and FME of Soccer sequence for Type2 prediction mode combination (a) 16×16, (b) 16×8, (c) 8×16, and (d) Submode

Fig. 4.13. Relationship between RDCosts of IME and FME of Football sequence for Type3 prediction mode combination (a) 16×16, (b) 16×8, (c) 8×16, and (d) Submode

Fig. 4.14. Relationship between RDCosts of IME and FME of Foreman sequence for Type3 prediction mode combination (a) 16×16, (b) 16×8, (c) 8×16, and (d) Submode

Fig. 4.15. Relationship between RDCosts of IME and FME of Soccer sequence for Type3 prediction mode combination (a) 16×16, (b) 16×8, (c) 8×16, and (d) Submode

Fig. 4.16. Relationship between RDCosts of IME and FME of Football sequence for Type4 prediction mode combination (a) 16×16, (b) 16×8, (c) 8×16, and (d) Submode

Fig. 4.17. Relationship between RDCosts of IME and FME of Foreman sequence for Type4 prediction mode combination (a) 16×16, (b) 16×8, (c) 8×16, and (d) Submode

Fig. 4.18. Relationship between RDCosts of IME and FME of Soccer sequence for Type4 prediction mode combination (a) 16×16, (b) 16×8, (c) 8×16, and (d) Submode

As mentioned above, if the macroblocks with the spatial locality property can be detected, the unnecessary FME mode checks can be skipped, which reduces the computational complexity. However, quantitatively defining the "sufficiently large" threshold is not easy, since different video sequences exhibit different rate distortion cost behavior. There are two ways to define this threshold. The first is to run a large number of simulations and pick a suitable fixed value. However, this approach lacks the flexibility to sense the content variation of the target images and thus result