Experiments - Fast Encoding Algorithm Design for the Scalable Video Coding Extension of the H.264/MPEG-4 AVC Standard

In the following experiments, the proposed algorithm is implemented on JSVM 8 [77] and evaluated on test video sequences covering a broad range of visual characteristics. We test CGS, dyadic spatial scalability, and the combination of the two. The layer dependency structures (a)-(c) used in the three tests are specified in Fig. 4-5. The miscellaneous encoder settings are RDO, CABAC, and adaptive inter-layer prediction.

In Table 4-3, the proposed schemes are compared with JSVM 8 [77] in terms of the average Y-PSNR loss (∆P), the average bit rate increase (∆R), and the overall time saving (TS). Our scheme saves up to 64% of the overall encoding time relative to JSVM 8 [77], which makes the encoding process about 3 times faster. Moreover, the average bit rate increase is less than 0.9% and the PSNR loss is no more than 0.05 dB. These changes in PSNR and bit rate are so small that the output sequences of the proposed scheme cannot be distinguished from those of JSVM 8 [77]. In Table 4-4, the complexity ratio is defined as the ratio of the enhancement-layer encoding time to the base-layer encoding time. As shown, the proposed scheme saves up to 69% of the time spent encoding the enhancement layers.
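The three metrics and the relation between a time saving and a speed-up factor can be sketched as follows. This is a minimal illustration, not the evaluation script used in the experiments: the function names are hypothetical, and the formulas (in particular the sign convention for ∆P and the normalization by the reference encoder) are the commonly used definitions, assumed here because the text does not spell them out.

```python
def delta_psnr_db(psnr_ref_db: float, psnr_test_db: float) -> float:
    """Y-PSNR change in dB; a negative value is a loss (assumed sign convention)."""
    return psnr_test_db - psnr_ref_db


def delta_rate_percent(rate_ref: float, rate_test: float) -> float:
    """Bit rate increase relative to the reference encoder, in percent (assumed definition)."""
    return 100.0 * (rate_test - rate_ref) / rate_ref


def time_saving_percent(time_ref: float, time_test: float) -> float:
    """Encoding time saved relative to the reference encoder, in percent (assumed definition)."""
    return 100.0 * (time_ref - time_test) / time_ref


def speedup_factor(ts_percent: float) -> float:
    """Speed-up factor implied by a time saving: t_ref / t_test = 1 / (1 - TS)."""
    return 1.0 / (1.0 - ts_percent / 100.0)


if __name__ == "__main__":
    # A 64% time saving leaves 36% of the original encoding time.
    print(f"{speedup_factor(64.0):.2f}x")
```

Under these definitions, a 64% time saving corresponds to a speed-up of 1/(1 - 0.64), or roughly 2.8 times, which is in line with the "about 3 times faster" figure quoted above.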

In addition to JSVM 8 [77], we compare our scheme with two other state-of-the-art fast algorithms [78][79], in which a reduced mode set offers a 58% reduction in encoding complexity.

However, because they do not consider the Intra16x16 mode, these two schemes incur a higher Y-PSNR loss and bit rate increase, especially for the CREW and ICE sequences.

It is also interesting to note from Fig. 4-5(b) and Table 4-3 that the time saving under spatial scalability is only slightly higher than 50%. This is mostly because the spatial interpolation required by spatial scalability involves computation-intensive filtering operations that are not accelerated in the proposed scheme. Another factor is that the rate-distortion estimation is currently applied to the CGS case only. As a result, the gain in the spatial case comes solely from our layer-adaptive mode selection.

Table 4-3 Performance comparisons

Sequence | Xiong’s method [78]: ∆P (dB), ∆R (%), TS (%) | Yang’s method [79]: ∆P (dB), ∆R (%), TS (%) | Proposed: ∆P (dB), ∆R (%), TS (%)

Table 4-4 Complexity ratio of enhancement-layer encoding time to base-layer encoding time

Structure | JSVM 8 [77] | Xiong’s method [78] | Yang’s method [79] | Proposed
(a) | 3.99 | 1.53 | 2.21 | 1.25
(b) | 25.18 | 13.52 | 16.38 | 12.02
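The complexity ratios in Table 4-4 can be converted into time savings with a short calculation. The sketch below assumes that the base-layer encoding time is the same for every encoder, so that the ratios of two encoders can be compared directly; this assumption and the function names are ours and are not stated in the text.

```python
# Complexity ratios (enhancement-layer time / base-layer time) copied from Table 4-4.
COMPLEXITY_RATIO = {
    "(a)": {"JSVM 8": 3.99, "Xiong": 1.53, "Yang": 2.21, "Proposed": 1.25},
    "(b)": {"JSVM 8": 25.18, "Xiong": 13.52, "Yang": 16.38, "Proposed": 12.02},
}


def el_time_saving_percent(ratio_ref: float, ratio_test: float) -> float:
    """Enhancement-layer time saving, assuming an identical base-layer time."""
    return 100.0 * (1.0 - ratio_test / ratio_ref)


def overall_time_saving_percent(ratio_ref: float, ratio_test: float) -> float:
    """Overall time saving; total time is (1 + ratio) base-layer time units."""
    return 100.0 * (ratio_ref - ratio_test) / (1.0 + ratio_ref)


if __name__ == "__main__":
    for structure, row in COMPLEXITY_RATIO.items():
        ref = row["JSVM 8"]
        for name in ("Xiong", "Yang", "Proposed"):
            el = el_time_saving_percent(ref, row[name])
            total = overall_time_saving_percent(ref, row[name])
            print(f"{structure} {name}: EL saving {el:.1f}%, overall saving {total:.1f}%")
```

Under this assumption, structure (a) gives an enhancement-layer saving of about 69% for the proposed scheme, and structure (b) gives an overall saving of slightly more than 50%, which agrees with the figures quoted in the text.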

Chapter 5 Fast Mode Selection and Motion Search for Scalable Video Coding with Combined Coarse Granular Scalability (CGS) and Temporal Scalability

To speed up the H.264/SVC encoder [2], we propose a layer-adaptive intra/inter mode decision algorithm and a motion search scheme for the hierarchical-B frames in H.264/SVC [2] with combined coarse-grain quality scalability (CGS) and temporal scalability. To reduce computation while maintaining the same level of coding efficiency, we examine the rate-distortion performance contributed by different coding modes at the enhancement layers and the mode conditional probabilities at different temporal layers. For the intra prediction on inter frames, we can reduce the number of Intra4x4/Intra8x8 prediction modes by 50% or more, based on the reference/base-layer intra prediction directions. For the enhancement-layer inter prediction, look-up tables containing inter prediction candidate modes are designed to exploit the dependence of the macroblock coding mode on the temporal layer and on the reference/base-layer quantization parameter (QP). In addition, to avoid checking all motion estimation reference frames, the base-layer reference frame index is selectively reused. Furthermore, depending on the enhancement-layer macroblock partition, the base-layer motion vector can be used as the initial search point for the enhancement-layer motion estimation. Compared with JSVM 9.11 [10], our proposed algorithm provides a 20x speedup in encoding the enhancement layers and an 85% time saving on the entire encoding process with negligible loss in coding efficiency. Moreover, compared with other fast mode decision algorithms, our scheme demonstrates a 7% to 41% complexity reduction on the overall encoding process.

The rest of this chapter is organized as follows. Section 5.1 briefly reviews prior work on fast mode decision algorithms for both the H.264/AVC [4] and H.264/SVC [2] coding structures. Section 5.2 analyzes the correlation between the mode distributions of the base layer and enhancement layers. Section 5.3 describes our context-adaptive mode decision algorithm and presents our motion search strategy. Section 5.4 compares the proposed schemes with JSVM and other state-of-the-art algorithms in terms of complexity reduction and rate-distortion performance.

Section 5.1 Literature Review

An effective way to reduce the encoding complexity is to restrict the number of candidate modes. A large body of literature is devoted to mode reduction for H.264/AVC [4]. For example, Tsai et al. [83] design a set of gradient filters to extract the edge direction, which determines the intra prediction mode and thus avoids testing all possible directions. They further improve the mode detection accuracy in texture areas by computing the intensity difference at both sub-block and pixel levels [84]. Another example of using macroblock features to predict mode sets can be found in [85]. They first classify macroblocks into three categories according to their inter, intra, and motion [...] Bayesian rules. Similarly, Zeng et al. [69] select the mode set for each macroblock based on its motion activity. Other mode reduction approaches exploit the spatial and temporal correlation between partition modes. They usually predict the most probable macroblock mode by observing the coding mode of its nearby macroblocks [61] or of its co-located macroblock in the previous frame [68]. Similar concepts are adopted to develop early termination conditions in the mode decision process. For example, a Skip decision scheme is designed based on the conditions of evaluating various inter/intra modes [86]. This type of technique has often been generalized to a hierarchical decision process with multiple termination criteria, as in [68], [65], and [87].

All these methods are equally applicable to the intra-layer mode reduction in H.264/SVC [2].

Thus far, little research has been devoted to fast mode decision for H.264/SVC [2]. Most published articles use the inter-layer correlation to confine the mode search at the enhancement layers. Li et al. [80][88], for example, observe that owing to the Lagrangian rate-distortion optimization process, the inter macroblock motion partition at enhancement layers tends to be the same as or smaller than that of its corresponding base-layer macroblock. This observation is used in conjunction with the base-layer mode decision to design a fast mode search for the enhancement layers. In [72], the complexity reduction is taken a step further by considering both the spatial homogeneity of the mode distribution and its consistency across temporal layers. In [89], Ren et al. notice that a high correlation exists between spatially neighboring macroblocks. They therefore develop an intra-layer fast algorithm that does not consider the inter-layer relationship. For each coding layer, their method collects the best partitions of the local area, together with their rate-distortion costs, and progressively performs the mode search for each macroblock until an early termination condition is satisfied. Other previous work addresses the intra macroblock mode reduction. Yang et al. [79] show that the inter-layer intra prediction can effectively replace the Intra16x16 and Intra8x8 modes. On top of that, Xiong [78] makes an additional simplification by restricting the Intra4x4 prediction to three options only: the Vertical, Horizontal, and DC modes. Through the effective use of the inter- and/or intra-layer correlation between coding modes, an average computing time saving of 40% to 60% (in comparison with JSVM 9.11 [10]) has been reported at the cost of a 1% to 4% bit-rate increase for typical test sequences.

However, in determining the reduced candidate mode set for enhancement layers, most existing approaches have not yet considered the following issues, leading to a loss of rate-distortion performance and/or a waste of computational power.

1. The effect of layer settings on the mode distribution at enhancement layers. In our previous studies [81][82], we noticed that the quality of the base layer affects the reliability of the candidate mode prediction, and that an enhancement layer, when coded at a much higher bit-rate than its base layer, may behave completely differently in mode selection. The candidate mode set must therefore be adaptively adjusted for different layer settings. The need for this adjustment becomes most obvious in multi-layer coding scenarios, where the QP values and the inter-layer dependency change on a layer-to-layer basis.

2. The correlation between the motion parameters of base layer and enhancement layer. As also shown in our previous studies [81][82], an enhancement-layer (inter) macroblock usually has the same reference frame index and prediction direction as its co-located macroblock at the base layer, especially when both are coded with the same macroblock partition. In this regard, the exhaustive motion search (adopted by most previous researchers) may not be needed for reaching the target rate-distortion performance.

Based on the above observations, we propose in this paper a fast context-adaptive mode decision algorithm and a reduced-complexity motion search strategy for H.264/SVC [2] with combined CGS and temporal scalability. Our scheme differs from the other approaches in two significant ways: (1) the candidate mode set for each enhancement-layer macroblock is chosen according to both local and global contexts, including the coding mode adopted by its co-located macroblock at the base layer, the QP values assigned to the base layer and enhancement layers, and its temporal layer index; and (2) the search for motion parameters, for a particular candidate mode, is conducted only when the base-layer motion information is not reusable. That is, the exhaustive motion search is performed only when the base-layer motion information is judged unreliable for that enhancement layer. Compared with JSVM 9.11 [10], our method shows an overall time reduction of 65% to 85% with a minor bit-rate increase of less than 1%. The computational complexity for coding the enhancement layers alone is reduced to 10% of that of the JSVM implementation [10]. Compared with the state-of-the-art fast algorithms [80][88][89], an improvement of up to 41% can be achieved solely by the use of inter-layer correlation; further improvement is expected when the intra-layer correlation is also incorporated.

Section 5.2 Correlations between Base and Enhancement Layers

In this section, we investigate the relationship between the base-layer coding modes and the enhancement-layer coding modes, with a focus on the CGS configuration. Through statistical analysis, we want to determine (1) which intra/inter modes dominate at the enhancement layer; (2) how these modes are distributed when the base-layer mode is given; and (3) which coding modes are most critical to the enhancement-layer rate-distortion performance. In addition, we examine the statistics of the reference frame selection and the efficiency of the inter-layer residual predictor. Our codec contains one base layer and one CGS enhancement layer and is tested on six video sequences: AKIYO (QCIF), STEFAN (QCIF), FOREMAN (CIF), MOBILE (CIF), CITY (4CIF), and CREW (4CIF). The notations $QP_{BL}$ and $QP_{EL}$ denote the quantization parameters of the base layer and enhancement layers, respectively, and AVG. shows the averaged behavior of all six test sequences.

5.2.1 Distributions of Intra Prediction Mode in CGS

Our first study explores the effect of the QP value on the correlation of intra prediction types/modes between coding layers. In Fig. 5-1, the distribution of the enhancement-layer intra modes is displayed as a function of $QP_{BL}$ and $QP_{EL}$. We can see that the distribution is highly dependent on the quality of the base layer and enhancement layers. When the base layer is coded with good quality (using a small $QP_{BL}$), most of the intra macroblocks are coded in the IntraBL type, whose predictor comes from the base-layer intra-coded macroblock. However, as the enhancement-layer quality gradually improves, the intra predictor switches from the base layer to the enhancement layer. In particular, the Intra4x4 percentage increases more noticeably than that of the other two types, Intra8x8 and Intra16x16; together with IntraBL, it makes up 80% or more of the intra prediction types at enhancement layers. This combined share can exceed 90%, especially in complex-texture sequences such as MOBILE and STEFAN. Our results agree with the findings reported in [79]. In addition, Intra16x16 is preferred for smooth areas, but its share is usually less than 10% at the CGS enhancement layers because it must compete with the IntraBL mode, which is chosen more often in smooth areas due to its lower overhead. As the base-layer quality improves, Intra8x8 and Intra16x16 do not seem to offer any benefit in coding efficiency.

In addition to the intra prediction type, we compare the nine prediction directions in intra coding when both layers are coded by either Intra4x4 or Intra8x8. Specifically, an enhancement-layer coding block is said to have a similar prediction mode to its counterpart at the base layer if its best prediction comes from the same or a neighboring direction, or if it uses the DC mode. For instance, if the coding block at the base layer selects the Vertical mode and the one at the enhancement layer selects the Vertical, Vertical Right, Vertical Left, or DC prediction, these two blocks are called similar in prediction direction. The similarity check requires locating the base-layer counterpart of a coding block. As shown in Fig. 5-2, this can be implemented by a one-to-one block address mapping in the CGS configuration.
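The similarity test described above can be sketched as follows. The mode numbering follows the H.264 Intra4x4 prediction modes; the angular ordering of the eight directional modes, the non-circular treatment of the two end directions, and the handling of a DC base-layer block (only DC counts as similar) are assumptions made for this illustration, as are the names `IntraMode`, `similar_modes`, and `is_similar`.

```python
from enum import IntEnum


class IntraMode(IntEnum):
    """H.264 Intra4x4 prediction modes with their standard mode numbers."""
    VERTICAL = 0
    HORIZONTAL = 1
    DC = 2
    DIAG_DOWN_LEFT = 3
    DIAG_DOWN_RIGHT = 4
    VERTICAL_RIGHT = 5
    HORIZONTAL_DOWN = 6
    VERTICAL_LEFT = 7
    HORIZONTAL_UP = 8


# Directional modes sorted by prediction angle (assumed non-circular ordering).
_ANGULAR_ORDER = [
    IntraMode.HORIZONTAL_UP, IntraMode.HORIZONTAL, IntraMode.HORIZONTAL_DOWN,
    IntraMode.DIAG_DOWN_RIGHT, IntraMode.VERTICAL_RIGHT, IntraMode.VERTICAL,
    IntraMode.VERTICAL_LEFT, IntraMode.DIAG_DOWN_LEFT,
]


def similar_modes(bl_mode: IntraMode) -> set:
    """Modes counted as similar to the base-layer mode: same, neighbors, or DC."""
    similar = {IntraMode.DC, bl_mode}
    if bl_mode != IntraMode.DC:
        i = _ANGULAR_ORDER.index(bl_mode)
        if i > 0:
            similar.add(_ANGULAR_ORDER[i - 1])
        if i < len(_ANGULAR_ORDER) - 1:
            similar.add(_ANGULAR_ORDER[i + 1])
    return similar


def is_similar(bl_mode: IntraMode, el_mode: IntraMode) -> bool:
    """True if the enhancement-layer mode is similar to the base-layer mode."""
    return el_mode in similar_modes(bl_mode)
```

For a Vertical base-layer block, `similar_modes` returns Vertical, Vertical Right, Vertical Left, and DC, which matches the example in the text. Under these assumptions the similar set never contains more than four of the nine modes.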

Fig. 5-1 Distribution of intra prediction types at CGS enhancement layers (axis ticks: 20, 25, 30, 35, 40)

Fig. 5-2 One-to-one block address mapping of CGS

Fig. 5-3 shows the probability of the base layer and the enhancement layer having similar intra prediction modes for a fixed $QP_{BL}$ and a range of $QP_{EL}$ values. From these data we can conclude that the intra prediction modes of the base layer and the enhancement layer are strongly correlated and that, on average, 75% or more of the block pairs adopt similar prediction modes. Moreover, this correlation becomes even stronger when $QP_{EL}$ is closer to $QP_{BL}$, and this tendency does not seem to be affected by the base-layer quality or the test sequence.

5.2.2 Distributions of Inter Prediction Mode in CGS

Next, we investigate the correlation of the motion partition between the base layer and the enhancement layer under different QP values and prediction distances. To this end, we collect the conditional probability of partition modes at different temporal enhancement layers, with QP values between 20 and 40. This conditional probability is defined by

$P\big(M_{EL}^{T_k} = m_j \mid M_{BL}^{T_k} = m_i\big)$, (5.1)

where $M_{BL}^{T_k}$ denotes the best mode selected by the base layer with $QP_{BL}$ at temporal layer $T_k$; $M_{EL}^{T_k}$ is the optimal mode at the enhancement layer with $QP_{EL}$ at temporal layer $T_k$; $m_i \in$ {B_Direct/Skip, Inter16x16, Inter16x8, Inter8x16, Inter8x8}; and $m_j \in$ {B_Direct/Skip, Inter16x16, Inter16x8, Inter8x16, Inter8x8, Intra, BLSkip}. The collected statistics are given in Fig. 5-4(a) to Fig. 5-4(e). In addition, in Fig. 5-4(f), a different conditional probability is defined as

$P\big(S_{EL}^{T_k} = s_j \mid M_{EL}^{T_k} = \text{Inter8x8}\big)$, (5.2)

where $S_{EL}^{T_k}$ is the sub-macroblock partition mode at the enhancement layer and $s_j \in$ {B_Direct8x8, Inter8x8, Inter8x4, Inter4x8, Inter4x4}. This conditional probability presents the distribution of the finer partitions, including 8x8 and those smaller than 8x8.
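As an illustration of how such conditional probabilities can be collected, the sketch below estimates the probability of each enhancement-layer mode given the base-layer mode from a list of co-located (base-layer mode, enhancement-layer mode) pairs. The function name and the toy data are hypothetical; in practice the pairs would be logged by the encoder for one temporal layer and one QP setting at a time.

```python
from collections import Counter, defaultdict


def conditional_mode_probabilities(mode_pairs):
    """Estimate P(EL mode | BL mode) from (bl_mode, el_mode) pairs by relative frequency."""
    pairs = list(mode_pairs)
    pair_counts = Counter(pairs)
    bl_counts = Counter(bl for bl, _ in pairs)
    table = defaultdict(dict)
    for (bl, el), count in pair_counts.items():
        table[bl][el] = count / bl_counts[bl]
    return dict(table)


if __name__ == "__main__":
    # Toy data for illustration only; these are not measured statistics.
    toy_pairs = (
        [("Inter16x16", "BLSkip")] * 6
        + [("Inter16x16", "Inter16x16")] * 3
        + [("Inter16x16", "Inter8x8")] * 1
        + [("Inter8x8", "Inter8x8")] * 2
    )
    for bl_mode, row in conditional_mode_probabilities(toy_pairs).items():
        print(bl_mode, row)
```

Each row of the resulting table sums to one, so a row can be read directly as the distribution of enhancement-layer modes for one base-layer mode.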

[Fig. 5-4 panels: (a) to (e) conditional probability of the enhancement-layer mode for a given base-layer mode; (f) distribution of sub-partition at enhancement layers; axis ticks: 20, 24, 28, 32, 36, 40]

Fig. 5-4 Conditional probability of inter partition mode at CGS enhancement layers for QP values between 20 and 40 and GOP size = 16

From Fig. 5-4, several important observations can be made:

More than 50% of macroblock pairs choose the same motion partition for both base layer and enhancement layer, namely,

$P\big(M_{EL}^{T_k} \in \{m_i, \text{BLSkip}\} \mid M_{BL}^{T_k} = m_i\big) > 0.5$. (5.3)

Among them, the enhancement-layer macroblock can be coded in either BLSkip mode or one of the other inter modes, which may or may not use inter-layer motion prediction. The BLSkip mode is chosen most often, especially at higher temporal enhancement layers. The second and third most probable modes are B_Direct/Skip and Inter16x16, respectively.

This observation is slightly different from those in [72][80][88], which suggest that the enhancement-layer candidate mode generally does not have a partition size larger than that of its co-located base-layer macroblock mode. Interestingly, if the base-layer macroblock chooses the Inter8x8 mode, the enhancement-layer macroblock is also likely (>70%) to make the same choice. These results seem to be independent of the QP difference between the two layers.

When a base-layer macroblock is coded in B_Direct/Skip mode, its co-located enhancement-layer macroblock is often coded in either B_Direct/Skip or Inter16x16.

If a base-layer macroblock is coded with the 8x16 (or 16x8) partition, it is unlikely that its enhancement-layer counterpart will choose the 16x8 (or 8x16) partition.

The probability for an enhancement-layer macroblock to be coded in BLSkip mode is greater than 0.5 at the two highest temporal layers, $T_{N-1}$ and $T_N$.

The probability for an enhancement-layer sub-macroblock to have a sub-partition finer than 8x8 is usually less than 0.2. Even though MOBILE and STEFAN have more macroblocks coded with finer partitions, on average 70% of the sub-macroblocks still select B_Direct8x8 or Inter8x8 as their sub-partition mode.

Our experimental data reveal that when an enhancement-layer macroblock is further partitioned into sub-partitions smaller than 8x8, the conditional probability of Inter4x4 is typically less than 0.05, whereas it can increase to 0.1 for the sequences MOBILE and STEFAN.

Fig. 5-4(a) to (e) also show that the most probable mode in the hierarchical-B frames is the BLSkip mode. This is a direct consequence of the Lagrangian rate-distortion optimization process, which seeks a balanced compromise between distortion and coding rate. To achieve a better quality, an enhancement-layer macroblock may search for new motion vectors with the same-size partition or for additional motion vectors offered by finer partitions. However, these two alternatives may require extra coding bits. Statistically, using the lower-layer information as much as possible seems to be a good policy for the mode decision at enhancement layers, especially in the CGS configuration, because it reduces the number of candidate modes. This is most obvious when the base layer is coded with good quality using a small $QP_{BL}$. In such a case, the conditional probability

$P\big(M_{EL}^{T_k} \in \{m_i, \text{BLSkip}\} \mid M_{BL}^{T_k} = m_i\big)$ (5.4)

can go higher than 0.9, making it possible to skip more coding modes whose partition size differs from that at the base layer. Furthermore, the inter-layer relation represented by (5.4) becomes stronger as the temporal layer index increases. We thus divide the mode conditional probabilities into four regions along two dimensions, the temporal layer and the quantization parameter of the reference layer, as illustrated in Fig. 5-5. High conditional probabilities appear at small $QP_{BL}$ and higher temporal layers. In our scheme, $T_{N-1}$ and $T_N$ refer to the highest two temporal enhancement layers in a GOP. For a small GOP size, such as four, it is possible that all the temporal enhancement layers belong to the $T_{N-1}$~$T_N$ category.

[Fig. 5-5 layout: temporal-layer groups $T_{N-1}$~$T_N$ and $T_1$~$T_{N-2}$; axis tick: 30; region labels: High, High, Low, Medium]

Fig. 5-5 Four regions representing different degrees of mode correlations between coding layers
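The two-dimensional partition into four regions can be sketched as a small lookup. Several details are assumptions made for illustration and are not confirmed by the text: the value 30 that survives from the figure is taken as the QP threshold, a dyadic hierarchical-B GOP of size 2^N is taken to have N temporal enhancement layers numbered 1 to N, and the names `num_temporal_enhancement_layers` and `correlation_region` are hypothetical. The sketch deliberately returns only the region coordinates and does not assign a correlation level to each region.

```python
def num_temporal_enhancement_layers(gop_size: int) -> int:
    """Number of temporal enhancement layers N for a dyadic GOP of size 2**N (assumed)."""
    if gop_size < 2 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two and at least 2")
    return gop_size.bit_length() - 1


def correlation_region(temporal_layer: int, qp_ref: int, gop_size: int,
                       qp_threshold: int = 30):
    """Return (QP group, temporal-layer group) for one enhancement-layer macroblock."""
    n = num_temporal_enhancement_layers(gop_size)
    if not 1 <= temporal_layer <= n:
        raise ValueError("temporal_layer must be between 1 and N")
    # The two highest temporal enhancement layers form their own group.
    temporal_group = "T_{N-1}~T_N" if temporal_layer >= n - 1 else "T_1~T_{N-2}"
    # Threshold and comparison direction are assumptions, see the lead-in.
    qp_group = "small QP" if qp_ref <= qp_threshold else "large QP"
    return qp_group, temporal_group


if __name__ == "__main__":
    for layer in range(1, 5):
        print(layer, correlation_region(layer, qp_ref=24, gop_size=16))
```

With a GOP size of four there are only two temporal enhancement layers under this numbering, so both fall into the $T_{N-1}$~$T_N$ group, which is consistent with the remark about small GOP sizes.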

In summary, the base-layer coding information can be a good reference for predicting the enhancement-layer coding mode in the CGS configuration. Generally, which coding mode would be the best for a base-layer macroblock depends highly on the image texture. However, the conditional probabilities of the enhancement-layer modes do not vary drastically with video content. In other words, the inter-layer mode correlation is nearly content-independent in the sense that when conditioned by the base-layer modes, the distribution of the enhancement-layer modes has a weak dependency on video content. Therefore, in Section 5.3 we will use these observations to design our fast enhancement-layer mode decision algorithm.

5.2.3 Temporal Reference Frames between Coding Layers

As described before, the motion estimation operation in the hierarchical-B frames needs to find the
