In our efforts to devise efficient search strategies for optimal/near-optimal extraction paths, we discovered that the global condition can be satisfied by maintaining the convexity of R-D curves across spatial/quality and temporal layers during SVC encoding.
With hierarchical and dyadic temporal dependencies, the cascading QP assignment in the current JSVM [11] can already make the R-D curves across temporal layers convex in most cases, especially when MSE is used as the distortion measure. This is because higher temporal layers are coded with larger QP values, which inherently leads to diminishing R-D improvement with increasing temporal level.
On the other hand, among the spatial and quality layers, the convexity of their R-D curves can be guaranteed by satisfying the following criterion.
Criterion 6 Convexity of Rate-Distortion Curves across Spatial and Quality Layers.
An SVC encoder should produce an SVC bitstream according to a well-adapted inter-layer (spatial and quality) dependence relation that ensures every successive refinement of scalable layer representations exhibits a monotonic decrease in MSE distortion, $d(L_i, b_T) > d(L_{i+1}, b_T)$, as well as a monotonic decrease of R-D improvement, $\gamma_L(L_i, b_T) > \gamma_L(L_{i+1}, b_T) > 0$.
Figure 5.1: R-D performance of SVC bitstreams with different inter-layer dependency settings. Labels A, B, C, D, and E denote five coding layers of different SNR levels with E being the target layer for reconstruction.
This criterion forbids the slope of the R-D curves from steepening (or, equivalently, their R-D improvement from rising) as a viewing device takes in a sequence of coding layers in successive refinement steps. Its practical implication can be explained using the example shown in Figure 5.1. In the example, each layer (from B to E) in Setting #1 depends on its previous layer; hence, the reconstruction of layer E requires the decoding of all its dependent layers from A to D. However, because the R-D improvement produced by D is not as good as the one produced by E, Setting #1 cannot maintain the R-D convexity. In contrast, Setting #2, which links C directly to E by skipping D, is a well-adapted dependency setting.
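The check implied by Criterion 6 can be sketched in a few lines of Python. This is only an illustration: it assumes that the R-D improvement of a refinement step is the MSE reduction divided by the added rate, and the names `rd_improvements` and `first_violation`, as well as all numbers, are hypothetical and not measured data.

```python
from typing import List, Optional, Tuple

RDPoint = Tuple[float, float]  # (cumulative rate, MSE distortion)


def rd_improvements(points: List[RDPoint]) -> List[float]:
    """Distortion reduction per unit of added rate for each refinement step."""
    gains = []
    for (r0, d0), (r1, d1) in zip(points, points[1:]):
        if r1 <= r0:
            raise ValueError("cumulative rate must strictly increase")
        gains.append((d0 - d1) / (r1 - r0))
    return gains


def first_violation(points: List[RDPoint]) -> Optional[int]:
    """Index of the first refinement step that breaks the criterion, else None.

    Step i is the refinement from points[i] to points[i + 1].
    """
    gains = rd_improvements(points)
    for i, gain in enumerate(gains):
        if gain <= 0:
            return i  # distortion failed to decrease
        if i > 0 and gain >= gains[i - 1]:
            return i  # R-D improvement failed to shrink
    return None


# Made-up numbers shaped like the Figure 5.1 example (assumption, not data).
setting_1 = [(100, 100), (200, 70), (300, 50), (400, 45), (500, 30)]  # A..E
setting_2 = [(100, 100), (200, 70), (300, 50), (460, 30)]  # A, B, C, E

print(first_violation(setting_1))  # the step from D to E is flagged
print(first_violation(setting_2))  # no step is flagged
```

With these illustrative points, the successive path of Setting #1 is flagged at the step from D to E, whereas the path that skips D passes the check.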
We must advise readers to exercise caution when setting up a well-adapted inter-layer dependence relation because the adaptation can easily be overdone. In Figure 5.1, although Setting #2 (which ensures R-D convexity along the spatial/quality dimension) produces a better R-D performance for a single viewing device even if it takes layer E at one moment and layer D at another, Setting #1 (which fails to maintain R-D convexity) consumes less bandwidth when serving two viewing devices in the same network. This observation confirms the well-known fact that the SVC coding gain over simulcasting comes at the cost of the R-D performance of individual layers. Our advice of caution can be summarized in the following proposition.
Proposition 1 Minimal Adaptation of Successive Inter-layer Dependencies.
An SVC encoder should choose a successive inter-layer dependence relation, which usually produces the lowest bit rates, to be the default dependency setting. The dependence relation should only be modified at the refinement steps that produce non-convex R-D improvements. At those refinement steps, the reference layers should be chosen to be the nearest spatial/quality layers that can produce convex R-D improvements.
Again using the example in Figure 5.1, a proper adjustment of inter-layer dependencies is to make layer E depend on layer C, the nearest layer that restores convexity, rather than on the more distant layer B. This minimal adjustment of inter-layer dependence relations should cause only a small increase in the total data rate of the SVC bitstream. We would like to emphasize that this strategy is meant to ensure the global condition rather than to optimize the R-D performance of individual layers. For the latter case, readers are referred to the paper by Yao and Li [17] for a more complete discussion.
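The selection rule of Proposition 1 can be sketched as follows. This is a hedged illustration rather than the encoder's actual procedure: the table `candidates`, the helper names, and every number are assumptions, and the R-D improvement is again taken to be MSE reduction per unit of added rate.

```python
from typing import Dict, List, Optional, Tuple

RDPoint = Tuple[float, float]  # (cumulative rate, MSE distortion)


def is_convex(points: List[RDPoint]) -> bool:
    """True if every step lowers distortion with a strictly shrinking gain."""
    previous_gain = float("inf")
    for (r0, d0), (r1, d1) in zip(points, points[1:]):
        if r1 <= r0:
            return False
        gain = (d0 - d1) / (r1 - r0)
        if gain <= 0 or gain >= previous_gain:
            return False
        previous_gain = gain
    return True


def choose_references(
    layers: List[str],
    candidates: Dict[str, Dict[Optional[str], RDPoint]],
) -> Dict[str, Optional[str]]:
    """Pick, for each layer, the nearest lower layer that keeps its path convex.

    candidates[layer][ref] is the (hypothetical) R-D point reached at `layer`
    when it is predicted directly from `ref`; the base layer uses ref None.
    The successive reference must always be present, since it is the fallback.
    """
    base = layers[0]
    refs: Dict[str, Optional[str]] = {base: None}
    paths: Dict[str, List[RDPoint]] = {base: [candidates[base][None]]}
    for idx in range(1, len(layers)):
        layer = layers[idx]
        chosen = layers[idx - 1]  # default: successive dependence
        # Try the successive reference first, then ever more distant ones.
        for ref in reversed(layers[:idx]):
            if ref in candidates[layer] and is_convex(
                paths[ref] + [candidates[layer][ref]]
            ):
                chosen = ref
                break
        refs[layer] = chosen
        paths[layer] = paths[chosen] + [candidates[layer][chosen]]
    return refs


# Made-up R-D points shaped like the Figure 5.1 example (assumption, not data).
candidates = {
    "A": {None: (100, 100)},
    "B": {"A": (200, 70)},
    "C": {"B": (300, 50)},
    "D": {"C": (400, 45)},
    "E": {"D": (500, 30), "C": (460, 30), "B": (480, 30)},
}
refs = choose_references(["A", "B", "C", "D", "E"], candidates)
print(refs)
```

In this example only layer E departs from the successive default, and it is re-linked to C because C is the nearest layer whose path stays convex. The sketch keeps the successive reference whenever no candidate restores convexity, which is one possible reading of "minimal adaptation".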
CHAPTER 6
Experiments
6.1 Implementation of Well-adapted SVC Bitstream
Having described our criteria for well-adapted bitstreams, this section further presents a practical approach for generating well-adapted inter-layer dependencies.
6.1.1 Prediction of R-D Convexity
To predict the R-D performance of SVC along the spatial/quality dimension, one effective approach is to uniformly add 10% or more rate redundancy¹ to the R-D points of H.264/AVC [13]. The results generally hold when multi-loop encoder control and fixed-quality configurations are used [6][13]. Moreover, the predictability remains valid with the bottom-up encoding process [11], provided one takes into account that the enhancement layers usually suffer greater coding efficiency losses than the base layer. These observations enable us to predict the R-D convexity of SVC without the need for exhaustive encoding.
¹ Compared with single-layer coding, the coding efficiency loss of SVC is generally proportional to the number of coding layers. In some cases, the R-D gap between H.264/AVC and SVC can be much greater than 10%.
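The prediction step can be illustrated with a short Python sketch. It is written under stated assumptions: each H.264/AVC R-D point is inflated by a per-layer fractional rate overhead, PSNR is used as the quality measure as in the figures, and a dependency setting is called well-adapted when the predicted points along its extraction path are convex. The function names and all numbers are hypothetical and were not read from any figure.

```python
from typing import Dict, List, Tuple

RQPoint = Tuple[float, float]  # (rate in kbit/s, PSNR in dB)


def predict_svc_point(avc_point: RQPoint, overhead: float) -> RQPoint:
    """Inflate the rate of a single-layer R-D point by a fractional overhead."""
    rate, psnr = avc_point
    return (rate * (1.0 + overhead), psnr)


def path_is_convex(points: List[RQPoint]) -> bool:
    """True if PSNR rises at every step while the gain per kbit/s shrinks."""
    previous_gain = float("inf")
    for (r0, q0), (r1, q1) in zip(points, points[1:]):
        if r1 <= r0:
            return False
        gain = (q1 - q0) / (r1 - r0)
        if gain <= 0 or gain >= previous_gain:
            return False
        previous_gain = gain
    return True


def well_adapted_settings(
    avc: Dict[str, RQPoint],
    overhead: Dict[str, float],
    settings: Dict[str, List[str]],
) -> List[str]:
    """Names of the settings whose predicted extraction path is convex."""
    result = []
    for name, path in settings.items():
        predicted = [predict_svc_point(avc[l], overhead[l]) for l in path]
        if path_is_convex(predicted):
            result.append(name)
    return result


# Made-up single-layer R-D points (assumption, not measured data).
avc = {
    "A0": (100.0, 24.0), "A1": (200.0, 25.0), "A2": (300.0, 25.5),
    "B0": (600.0, 30.0), "B1": (1200.0, 34.0),
}
overhead = {layer: 0.10 for layer in avc}  # uniform 10% redundancy
settings = {
    "#1": ["A0", "A1", "A2", "B0", "B1"],  # B0 predicted from A2
    "#2": ["A0", "A1", "B0", "B1"],        # B0 predicted from A1
    "#3": ["A0", "B0", "B1"],              # B0 predicted from A0
}
print(well_adapted_settings(avc, overhead, settings))
```

Note that a strictly uniform overhead scales every slope by the same factor, so it leaves the convexity verdict unchanged; passing a larger overhead for the enhancement layers than for the base layer is what can alter the prediction.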
Figure 6.1: Comparison of SVC dependency settings: (a) Mobile and (b) Foreman. The results were produced with bottom-up encoding process and fixed-quality configurations. [Plot omitted: PSNR-Y (CIF) versus bit rate (Kbits/s).]
For validation, several SVC bitstreams, each corresponding to one of the following dependency settings, were encoded using bottom-up encoder control and fixed-quality configurations. In particular, Setting #1 denotes the default dependency setting (which yields a minimal total bit rate), whereas Settings #2 and #3 adapt the default setting by merely changing the reference layer of layer B0. The R-D performance of each dependency setting is compared with that of H.264/AVC in Figure 6.1.
• Setting #1: (QCIF A0←A1←A2), (CIF A2←B0←B1). (Default Setting)
• Setting #2: (QCIF A0←A1←A2), (CIF A1←B0←B1).
• Setting #3: (QCIF A0←A1←A2), (CIF A0←B0←B1).
Looking at the R-D points of H.264/AVC in Figure 6.1, one can readily predict that Setting #3 would be a well-adapted setting for the Mobile sequence, and the prediction was confirmed by the corresponding SVC R-D curve. Likewise, for the Foreman sequence, both Settings #2 and #3 are likely to ensure R-D convexity. Although Setting #3 has better R-D performance, we choose Setting #2 because, as will be seen in the next section, it minimizes the increase in total bit rate.
In Figure 6.2 we further present the results with fixed-rate configurations, in which the quality (and the QP) of each layer is not fixed; rather, the cumulative rate up to each layer is kept constant regardless of dependency settings. Compared with H.264/AVC, the coding efficiency loss of SVC can be seen from the drop of the R-D curves.
Similar to the bit rate increase in fixed-quality configurations, the distribution of PSNR drops helps to predict the R-D convexity of SVC. From Figure 6.2, we obtain exactly the same dependency settings as with fixed-quality configurations. Interestingly, in the Foreman sequence there is a “bump” in the R-D curve with Setting #1. This is because the QP value of layer B0 is improperly chosen to meet the bit rate constraint. The result stresses the importance of proper QP settings.
Figure 6.2: Comparison of SVC dependency settings: (a) Mobile and (b) Foreman. The results were produced with bottom-up encoding process and fixed-rate configurations. [Plot omitted: PSNR-Y (CIF) versus bit rate (Kbits/s).]
The preceding discussions assume the availability of H.264/AVC R-D points. The assumption does not generally hold unless each layer is pre-encoded with H.264/AVC.
Collecting these R-D data is indeed time-consuming, but performing exhaustive SVC encoding is even more so. In addition, our approach guarantees R-D convexity only at full frame rate, whereas the global condition requires R-D convexity at all possible frame rates. We have found empirically that convexity at full frame rate is also likely to ensure convexity at lower frame rates. After all, the R-D behavior at full frame rate represents the average performance of all video frames.
6.1.2 Degradation in Coding Efficiency
The previous section has analyzed the SVC R-D convexity under various dependency settings. We now turn our attention to the overall coding efficiency, which is characterized by the total bit rate of an SVC bitstream. As described previously, long-term inter-layer reference may be needed for the sake of R-D convexity. It is natural then to question whether and to what extent the total bit rate will increase. The answers can be found by the comparison shown in Figure 6.3. From there it can be seen that the well-adapted dependency settings (Setting #2 for Foreman; Setting #3 for Mobile) incur, on average, 15 to 20% bit rate increase in comparison with Setting #1 (default setting). The penalty arises mostly because layers A1 and A2 are not utilized for the inter-layer prediction of layer B0 in Settings #2 and #3.
Figure 6.3: Comparison of total bit rate for different dependence settings. Fixed-quality (FQ) and fixed-rate (FR) configurations were used. [Plot omitted: normalized total bit rate (%) of Settings 1, 2, and 3 for Foreman-FQ, Foreman-FR, Mobile-FQ, and Mobile-FR.]
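For readers who wish to reproduce this kind of comparison, the normalization can be sketched as below. It assumes that the normalized total bit rate is each setting's total rate expressed as a percentage of the default Setting #1; that reading is inferred from the text and is not stated explicitly. The function name and the totals are hypothetical.

```python
from typing import Dict


def normalized_total_bit_rate(
    totals: Dict[str, float], baseline: str
) -> Dict[str, float]:
    """Express each setting's total bit rate as a percentage of the baseline."""
    base = totals[baseline]
    if base <= 0:
        raise ValueError("baseline bit rate must be positive")
    return {name: 100.0 * rate / base for name, rate in totals.items()}


# Made-up totals in kbit/s (assumption, not taken from Figure 6.3).
totals = {"Setting 1": 1000.0, "Setting 2": 1150.0, "Setting 3": 1200.0}
normalized = normalized_total_bit_rate(totals, "Setting 1")
print(normalized)
```

Under this convention the default setting always maps to 100%, and the excess over 100% is the bit rate penalty of an adapted setting.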