6.3 Performance of Greedy Heuristic Scheme
6.3.2 Computational Complexity
The most computationally demanding part of the search for optimal extraction paths is collecting the R-D data associated with each decodable NAL set. While the exhaustive search must actually decode all possible representations, the greedy heuristic scheme reduces the computation by lazy evaluation. On average, only about half (42-58%) of the decodable NAL sets need to be evaluated to achieve the same or similar performance. The gain is most obvious when an SVC bitstream contains a large number of decodable NAL sets.
[Figure 6.7 panels; axis: Temporal (T); legend: Exhaustive search, Steepest-descent]
Figure 6.7: Comparison of extraction paths for the steepest-descent method and exhaustive search: (a) R-D trellis diagram and (b) R-D curves.
Table 6.2: Comparison of extraction paths with MSE (Exh.: exhaustive search; S.D.: steepest-descent; XOR: bitwise difference between the two paths).
MSE + F.R.
          CIF30                  CIF15                  QCIF30                 QCIF15
          Exh.    S.D.    XOR    Exh.    S.D.    XOR    Exh.    S.D.    XOR    Exh.    S.D.    XOR
Akiyo     110100  110100  0      11010   11010   0      11000   10100   01100  1100    1100    0
Foreman   000111  000111  0      00111   00111   0      00011   00011   0      0011    0011    0
Mobile    00011   00011   0      0011    0011    0      00011   00011   0      0011    0011    0
Football  00011   00011   0      0011    0011    0      00011   00011   0      0011    0011    0
          4CIF30                 4CIF15                 CIF30                  CIF15
Harbor    00011   00011   0      0011    0011    0      00011   00011   0      0011    0011    0
ICE       000111  000111  0      001111  001111  0      00011   00011   0      0011    0011    0
MSE + B.Direct
          CIF30                  CIF15                  QCIF30                 QCIF15
          Exh.    S.D.    XOR    Exh.    S.D.    XOR    Exh.    S.D.    XOR    Exh.    S.D.    XOR
Akiyo     111000  111000  0      11100   11100   0      11000   11000   0      1100    1100    0
Foreman   000111  000111  0      00111   00111   0      00011   00011   0      0011    0011    0
Mobile    00011   00011   0      0011    0011    0      00011   00011   0      0011    0011    0
Football  00011   00011   0      0011    0011    0      00011   00011   0      0011    0011    0
          4CIF30                 4CIF15                 CIF30                  CIF15
Harbor    00011   00011   0      0011    0011    0      00011   00011   0      0011    0011    0
ICE       000111  000111  0      001111  001111  0      00011   00011   0      0011    0011    0
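The source of this saving can be illustrated with a minimal Python sketch that contrasts the two search strategies on a toy grid of operating points. It is an illustration under stated assumptions, not the implementation used in the experiments: the grid size, the synthetic `evaluate` model that stands in for decoding one representation, the encoding of a path as a bit string (1 for a spatial/quality step, 0 for a temporal step), and all function names are hypothetical. The exhaustive search evaluates every point and selects the path with the minimal area under the R-D curve, whereas the greedy search evaluates only the neighbours of the points it actually visits.

```python
# Toy grid: S spatial/quality steps and T temporal steps (assumed sizes).
S, T = 3, 3

def make_evaluator():
    """Return a synthetic R-D evaluator and the set of points it was asked for."""
    calls = set()
    def evaluate(s, t):
        # Stand-in for decoding a representation; the formulas are made up.
        calls.add((s, t))
        rate = 100.0 * (1 + s) ** 1.5 * (1 + t) ** 0.5
        dist = 400.0 / ((1 + s) * (1 + 0.5 * t))
        return rate, dist
    return evaluate, calls

def greedy_path(evaluate):
    """Steepest descent: always take the step with the largest -dD/dR."""
    s = t = 0
    rate, dist = evaluate(s, t)
    bits = []
    while (s, t) != (S, T):
        steps = []
        if s < S:
            steps.append(("1", s + 1, t))
        if t < T:
            steps.append(("0", s, t + 1))
        best = None
        for bit, ns, nt in steps:
            r, d = evaluate(ns, nt)          # only neighbours are evaluated
            slope = (dist - d) / (r - rate)  # distortion drop per unit of rate
            if best is None or slope > best[0]:
                best = (slope, bit, ns, nt, r, d)
        _, bit, s, t, rate, dist = best
        bits.append(bit)
    return "".join(bits)

def exhaustive_path(evaluate):
    """Evaluate every point, then find the minimal-area path by dynamic programming."""
    rd = {(s, t): evaluate(s, t) for s in range(S + 1) for t in range(T + 1)}
    def edge_area(a, b):
        (ra, da), (rb, db) = rd[a], rd[b]
        return 0.5 * (da + db) * (rb - ra)   # trapezoid under one R-D segment
    best = {(S, T): (0.0, "")}
    for s in range(S, -1, -1):
        for t in range(T, -1, -1):
            if (s, t) == (S, T):
                continue
            options = []
            if s < S:
                area, bits = best[(s + 1, t)]
                options.append((edge_area((s, t), (s + 1, t)) + area, "1" + bits))
            if t < T:
                area, bits = best[(s, t + 1)]
                options.append((edge_area((s, t), (s, t + 1)) + area, "0" + bits))
            best[(s, t)] = min(options)
    return best[(0, 0)][1]

def path_area(evaluate, bits):
    """Area under the piecewise-linear R-D curve traced by a path."""
    s = t = 0
    rate, dist = evaluate(s, t)
    area = 0.0
    for b in bits:
        s, t = (s + 1, t) if b == "1" else (s, t + 1)
        r, d = evaluate(s, t)
        area += 0.5 * (dist + d) * (r - rate)
        rate, dist = r, d
    return area

def xor_paths(a, b):
    """Bitwise XOR of two equal-length path strings; all zeros means identical."""
    return "".join("1" if x != y else "0" for x, y in zip(a, b))

if __name__ == "__main__":
    ev_g, calls_g = make_evaluator()
    ev_e, calls_e = make_evaluator()
    g, e = greedy_path(ev_g), exhaustive_path(ev_e)
    print("greedy:", g, "evaluations:", len(calls_g))
    print("exhaustive:", e, "evaluations:", len(calls_e))
    print("XOR:", xor_paths(e, g))
```

In this sketch the greedy search touches at most 1 + 2(S + T) points, while the exhaustive search touches all (S + 1)(T + 1), so the gap widens as the grid grows. The `xor_paths` helper mirrors the XOR column of Table 6.2.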
6.4 Comparisons with Other Extraction Schemes
We conducted experiments to compare our adaptation scheme with the Quality-Layers-based approach [2] and the Basic Extraction implemented in JSVM [11][8]. In our experiments, we examine two types of scalability: (1) QCIF SNR and (2) QCIF/CIF combined scalability. Two quality enhancements from the base quality are encoded for QCIF SNR scalability, while each spatial resolution in QCIF/CIF combined scalability is encoded with a base quality and one quality enhancement. Both experiments use the MGS vector mode {3, 3, 4, 6} without key pictures. In addition, each layer is simply predicted from the previous layer, and the Quality Layers are assigned independently across spatial layers, i.e., the QCIF substreams must be entirely extracted prior to the extraction of the CIF layers.
Chapter 6. Experiments
Figure 6.8: R-D performance comparison of the proposed scheme with the Quality Layer and Basic extractions in JSVM 9: (a) QCIF SNR Scalability, (b) QCIF/CIF Combined Scalability.
Figure 6.8 shows that the proposed scheme is far superior to the other two approaches for the Akiyo sequence while performing comparably for the Foreman sequence. The reasons are twofold. Firstly, our scheme allows optimal extraction paths to preferentially improve spatial quality without extracting the entire base layer, whereas both the Quality-Layers-based extraction and the Basic Extraction must initially extract the base layer at full frame rate. Secondly, our extraction paths are derived from the real R-D costs of scalable layers. In contrast, the Quality Layers are computed by estimating the R-D information.
Figure 6.9: Bitstream extraction (a) with and (b) without successive refinement. R1-R4 indicate the extracted NAL sets associated with increasing bit rate.
Finally, we compare and contrast our proposed scheme with previous works, including the Basic Extraction in JSVM [11][8], the Quality Information Table [4], the Quality Index [7], and the Quality-Layers-based approach [2].
• Applications: The Quality-Layers-based extraction [2] aims at medium-grain quality adaptation, while the other schemes focus on multi-dimensional adaptation with combined scalability. In particular, the Quality-Layers-based approach [2] is conditioned on the full extraction of the base layer, whereas the others, including ours, allow R-D optimal extraction without the entire base layer being present.
• Extraction Constraints: Both our scheme and the Quality-Layers-based extraction must incrementally extract NAL units for successive refinement, while the others allow discretionary extraction. Through successive refinement, coarser representations are always embedded in finer ones, which leads to more efficient use and sharing of extracted NAL sets among viewing devices. The differences in bitstream extraction with and without successive refinement are shown in Figure 6.9 using Venn diagrams.
• Extraction Criteria: All schemes perform bitstream extraction based on the R-D performance of NAL units except the Basic Extraction approach, which extracts the bitstream whose bit rate is closest to but not greater than the target bit rate. As shown in our R-D analysis, decoding a substream with a higher bit rate does not necessarily produce better playback quality, especially when spatiotemporal interpolation is involved.
• Distortion Measurement: Both our scheme and the Quality-Index-based approach compute the R-D data with respect to interpolated videos rather than decoded videos. As indicated in our analysis, the interpolated videos reflect playback quality on viewing devices more realistically. An even more direct approach is to acquire the perceptual preference, as used in the Quality Information Table. However, it would be impossible to conduct a subjective evaluation for every video sequence.
• Rate-Distortion Performance: While most previous works simply try to construct an R-D optimized extraction path for pre-encoded SVC bitstreams, in this thesis we further recommend a set of criteria for generating well-adapted bitstreams, which, together with the strong or weak local condition, ensure the R-D convexity of optimal extraction paths.
• Search Strategy and Complexity: Through the use of well-adapted settings, our greedy heuristic scheme can very often find the optimal or near-optimal candidates while reducing the complexity by 50% or more in comparison with the exhaustive search adopted by most previous works.
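Two of the differences listed above, the extraction criterion and the successive-refinement constraint, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `Substream` type, the function names, and all rate and distortion figures are hypothetical and were chosen only so that a higher-rate substream has worse quality, as can happen when spatiotemporal interpolation is involved.

```python
from typing import NamedTuple, Optional, Sequence, Set

class Substream(NamedTuple):
    name: str
    rate_kbps: float
    distortion: float  # e.g. MSE measured on the interpolated video (assumed)

def basic_extraction(cands: Sequence[Substream], target: float) -> Optional[Substream]:
    """Rate-only rule: highest bit rate that does not exceed the target."""
    feasible = [c for c in cands if c.rate_kbps <= target]
    return max(feasible, key=lambda c: c.rate_kbps) if feasible else None

def rd_extraction(cands: Sequence[Substream], target: float) -> Optional[Substream]:
    """R-D rule: lowest distortion among substreams within the target rate."""
    feasible = [c for c in cands if c.rate_kbps <= target]
    return min(feasible, key=lambda c: c.distortion) if feasible else None

def is_successive_refinement(nal_sets: Sequence[Set[int]]) -> bool:
    """True if every extracted NAL set is contained in the next, finer one."""
    return all(a <= b for a, b in zip(nal_sets, nal_sets[1:]))

# Made-up operating points, for illustration only.
CANDIDATES = [
    Substream("QCIF15", 100.0, 60.0),
    Substream("QCIF30", 180.0, 40.0),
    Substream("CIF15", 220.0, 55.0),
]

if __name__ == "__main__":
    print(basic_extraction(CANDIDATES, 250.0))
    print(rd_extraction(CANDIDATES, 250.0))
    print(is_successive_refinement([{1}, {1, 2}, {1, 2, 3}]))
```

With these made-up numbers the rate-only rule and the R-D rule select different substreams for the same target, which is the situation the Extraction Criteria item describes. The subset test corresponds to case (a) of Figure 6.9, where each coarser NAL set is embedded in the finer ones.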
Conclusions
In our work, we attempted to approach the task of rate-distortion (R-D) optimized SVC bitstream extraction from a new direction. Our approach was characterized by three unique considerations: (1) the combined effect of proper encoder setting coupled with matching bitstream extraction and decoding mechanisms, (2) the computational efficiency of search strategies for R-D optimized extraction paths, and (3) the choice of extraction paths amenable to successive refinement of SVC bitstreams.
Through theoretical analysis of SVC inter-layer dependence relations and empirical study of the R-D performance of different encoded/extracted bitstreams, we made the following findings:
1. An optimal extraction path (corresponding to a convex R-D curve with minimal underlying area) can be found for an SVC bitstream if convex R-D performance can be maintained at every spatial/quality layer as well as at the temporal layers (referred to as the global conditions) and in every pair of successive refinement steps (referred to as the local conditions). If the convexity of R-D performance is violated only by minor deviations occurring in a small fraction of all refinement steps, then a near-optimal extraction path can still be found.
Chapter 7. Conclusions
2. Convex R-D performance can be maintained across spatial/quality layers by adapting the inter-layer dependencies between different layers and the quantization parameter (QP) of each individual layer during SVC encoding. The R-D convexity of SVC layers (especially the spatial layers) can be predicted by referring to the R-D performance of corresponding H.264/AVC bitstreams encoded with fixed-quality or fixed-rate settings. On the other hand, convex R-D performance across temporal layers can be ensured by a proper cascade of QP values over the hierarchy of temporal layers.
3. The greedy heuristic scheme can be employed to search for the unique optimal extraction path if the SVC bitstream satisfies both the global R-D conditions and the strong local R-D conditions. The scheme is computationally efficient, as it decodes only about half of the scalable layer representations in comparison with the exhaustive search strategy adopted by most previous works. Besides being efficient, the greedy heuristic strategy also proved relatively robust in our experiments. In those experiments, it found a sub-optimal extraction path close to the optimal path even under weak local R-D conditions. It can even find a near-optimal extraction path when the global and local R-D conditions are partially violated, as when a subjective quality measure such as the mean opinion score (MOS) is used to quantify R-D performance.
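The convexity conditions referred to in these findings can be checked numerically once the R-D points along an extraction path are known. The following Python sketch is a generic convexity test under stated assumptions, not the exact formulation of the global and local conditions: points are (rate, distortion) pairs sorted by strictly increasing rate, the function names are hypothetical, and the sample data are invented.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (rate, distortion), sorted by increasing rate

def slopes(points: Sequence[Point]) -> List[float]:
    """Slope dD/dR of each refinement step (negative when distortion drops)."""
    return [(d2 - d1) / (r2 - r1)
            for (r1, d1), (r2, d2) in zip(points, points[1:])]

def convexity_violations(points: Sequence[Point], tol: float = 0.0) -> List[int]:
    """Indices of interior points where consecutive slopes break convexity.

    A convex, decreasing R-D curve has non-decreasing slopes, i.e. each
    refinement step buys less distortion reduction per bit than the last.
    """
    s = slopes(points)
    return [i + 1 for i in range(len(s) - 1) if s[i + 1] < s[i] - tol]

def is_convex(points: Sequence[Point], tol: float = 0.0) -> bool:
    return not convexity_violations(points, tol)

def area_under_curve(points: Sequence[Point]) -> float:
    """Trapezoidal area under the piecewise-linear R-D curve."""
    return sum(0.5 * (d1 + d2) * (r2 - r1)
               for (r1, d1), (r2, d2) in zip(points, points[1:]))

if __name__ == "__main__":
    convex_curve = [(100, 60), (200, 30), (300, 15), (400, 10)]   # invented data
    dented_curve = [(100, 60), (200, 50), (300, 20), (400, 15)]   # invented data
    for curve in (convex_curve, dented_curve):
        bad = convexity_violations(curve)
        share = len(bad) / max(len(curve) - 2, 1)
        print(is_convex(curve), bad, share, area_under_curve(curve))
```

The share of violating steps gives a rough handle on the "small fraction of all refinement steps" mentioned in the first finding, and the `tol` parameter is one assumed way to ignore minor deviations.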
Our work is still in its early stage. We plan to extend our investigation in several directions: (1) to study R-D optimized encoding and bitstream extraction for SVC bitstreams with medium-grain scalability (MGS) support, (2) to conduct experiments with error concealment techniques, and finally, (3) to devise computationally efficient strategies to search for optimal or near-optimal extraction paths under weak or fractional violation of the global and local R-D conditions.
BIBLIOGRAPHY
[1] “ITS Video Quality Research,” http://www.its.bldrdoc.gov/n3/video/index.php.
[2] I. Amonou, N. Cammas, S. Kervadec, and S. Pateux, “Optimized Rate-Distortion Extraction With Quality Layers in the Scalable Extension of H.264/AVC,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, pp. 1186-1193, September 2007.
[3] H. C. Huang, W. H. Peng, T. Chiang, and H. M. Hang, “Advances in the Scalable Amendment of H.264/SVC,” IEEE Communications Magazine, vol. 45, pp. 68-76, 2007.
[4] Y. S. Kim, Y. J. Jung, T. C. Thang, and Y. M. Ro, “Bit-stream Extraction to Maximize Perceptual Quality Using Quality Information Table in SVC,” SPIE Conference on Visual Communications and Image Processing (VCIP), vol. 6077, January 2006.
[5] Z. La, W. Lin, B. C. Heng, S. Kato, S. Yao, and X. K. Yang, “Measuring the negative impact of frame dropping on perceptual visual quality,” Human Vision and Electronic Imaging X, SPIE-IST, vol. 5666, pp. 554-562, January 2005.
[6] Z. G. Li, S. Rahardja, and H. Sun, “Implicit Bit Allocation for Combined Coarse Granular Scalability and Spatial Scalability,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, no. 12, pp. 1449-1459, December 2006.
[7] J. Lim, M. Kim, S. Hahm, K. Lee, and K. Park, “An Optimization-theoretic Approach to Optimal Extraction of SVC Bitstreams,” ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6, JVT-U081, October 2006.
[8] H. Liu, H. Li, and Y. K. Wang, “Showcase of Scalability Information SEI Message,” ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6, JVT-Q067, October 2005.
[9] W. H. Peng, J. K. Zao, T. W. Wang, and H. T. Huang, “Multidimensional SVC Bitstream Adaptation and Extraction for Rate-Distortion Optimized Heterogeneous Multicasting and Playback,” IEEE International Conference on Image Processing (ICIP), October 2008.
[10] M. Pinson and S. Wolf, “A New Standardized Method for Objectively Measuring Video Quality,” IEEE Transactions on Broadcasting, vol. 50, no. 3, pp. 312-322, September 2004.
[11] J. Reichel, H. Schwarz, and M. Wien, “Joint Scalable Video Model JSVM-9,” ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6, JVT-V202, January 2007.
[12] H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the Scalable Video Coding Extension of the H.264/AVC Standard,” IEEE International Conference on Image Processing (ICIP), October 2006.
[13] H. Schwarz and T. Wiegand, “Further Results for an RD-optimized Multi-loop SVC Encoder,” ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6, JVT-W071, April 2007.
[14] D. Taubman, “High Performance Scalable Image Compression with EBCOT,” IEEE International Conference on Image Processing (ICIP), October 1999.
[15] Y.-K. Wang, M. Hannuksela, S. Pateux, A. Eleftheriadis, and S. Wenger, “System and Transport Interface of SVC,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 9, pp. 1149-1163, September 2007.
[16] T. Wiegand, G. Sullivan, J. Reichel, H. Schwarz, and M. Wien, “Joint Draft ITU-T Rec. H.264 | ISO/IEC 14496-10/Amd.3 Scalable Video Coding,” ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6, JVT-X201, July 2007.
[17] W. Yao, Z. G. Li, and S. Rahardja, “Balanced Inter-Layer Prediction for Combined Coarse Granular Scalability and Spatial Scalability,” IEEE International Symposium on Circuits and Systems (ISCAS), May 2007.