Specifically, our main contributions in this work include the following:
• We define the rate-distortion optimal bitstream extraction problem as a constrained optimization problem and create an R-D trellis diagram to model the bitstream extraction process.
• We employ a dynamic programming algorithm and propose a fast greedy heuristic search strategy for finding optimal extraction paths.
• We develop a set of adaptation rules for setting quantization parameters and inter-layer dependencies during SVC encoding.
• We analyze experimental results to determine how video content, device types, distortion measures and interpolation algorithms affect the optimal extraction paths.
Experimental results indicate that our optimization scheme significantly improves viewing quality. Our adaptation rules ensure the R-D convexity of optimal extraction paths and enable the greedy heuristic scheme to achieve the same or similar performance as the dynamic programming algorithm while reducing the complexity by 50% or more.
The remainder of this thesis is organized as follows. Chapter 2 reviews the SVC dependency structure and related work on finding optimal bitstream extraction schemes. Chapter 3 presents our R-D optimization model for bitstream extraction. Chapter 4 introduces and analyzes our strategies for finding an optimal or near-optimal extraction path. Chapter 5 describes the criteria that must be satisfied during SVC encoding to guarantee the existence of optimal or near-optimal extraction paths. Chapter 6 addresses the implementation issues of establishing well-adapted inter-layer dependencies, provides a detailed analysis of the optimal extraction paths, evaluates the performance of the greedy heuristic scheme in searching for the optimal path, and compares our extraction scheme with previous work. The thesis ends with a summary of our observations and a list of future work in the conclusion.
CHAPTER 2
Background
2.1 Scalable Video Coding
2.1.1 Concept
The scalable video coding (SVC) standard [3][12][16] is a scalable extension of the H.264/AVC standard developed by the Joint Video Team (JVT). It enables a single bitstream to provide multiple frame sizes, frame rates and quality levels while achieving reasonable coding efficiency. When resources such as network throughput or device power are constrained, a subset of an SVC bitstream can be extracted and decoded to produce a lower playback quality rather than failing to decode.
SVC supports three types of scalability: spatial, temporal and quality. An SVC bitstream that provides a given scalability is organized into one base layer and one or more enhancement layers in the corresponding dimension. Spatial scalability is based on multilayer coding, which uses separate encoder loops for the different spatial resolution layers and applies adaptive inter-layer prediction techniques to exploit correlations among the layers. For each coding layer, temporal scalability is provided by hierarchical temporal prediction structures. Quality scalability in SVC is provided by two approaches: coarse-grain quality scalable coding (CGS), which can be considered a special case of spatial scalability with identical frame sizes for the base and enhancement layers, and medium-grain quality scalable coding (MGS), which provides quality refinement layers inside each spatial layer and allows packet-based quality scalable coding.

Figure 2.1: SVC dependency structure (horizontal axis: playback order 0 to 8, with temporal IDs T = 0, 3, 2, 3, 1, 3, 2, 3, 0; vertical axis: dependency layers (D,Q) = (0,0) QCIF, (1,0) QCIF CGS, (2,0) CIF, (3,0) CIF CGS)
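The temporal identifiers listed along the playback axis of Figure 2.1 follow the pattern of a dyadic hierarchical prediction structure. The following minimal sketch reproduces that pattern under the assumption of a dyadic group of pictures of size 8; the function name `temporal_id` and the parameter `gop_size` are illustrative and do not come from the standard.

```python
def temporal_id(frame_index: int, gop_size: int = 8) -> int:
    """Temporal layer of a frame, assuming a dyadic hierarchical GOP."""
    if gop_size <= 0 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    max_tid = gop_size.bit_length() - 1        # log2(gop_size)
    pos = frame_index % gop_size
    if pos == 0:
        return 0                               # key pictures form the temporal base layer
    trailing_zeros = (pos & -pos).bit_length() - 1
    return max_tid - trailing_zeros            # odd positions land in the highest layer


# Temporal IDs for playback positions 0..8, as in Figure 2.1.
ids = [temporal_id(i) for i in range(9)]

# Dropping every frame with T > 1 keeps a quarter of the frame rate.
kept = [i for i in range(9) if temporal_id(i) <= 1]
```

Under this assumption, discarding the highest temporal layer halves the frame rate, which is what makes temporal scalability a simple filtering operation on T.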
Figure 2.1 depicts an example of an SVC dependency structure. Each block denotes a coded picture. The horizontal axis shows the playback order of the frames, and the vertical stack shows the coding layers, also known as dependency layers, for spatial/CGS scalability. The arrows represent the dependency relations imposed by the coding prediction structures. Each dependency layer may choose one of the lower layers as its reference layer for inter-layer prediction. To decode a target layer correctly, all lower layers on which it directly or indirectly depends must be present in the bitstream being decoded.
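The decoding requirement above amounts to following the chain of reference layers from the target layer down to the base layer. The sketch below illustrates this with a hypothetical reference-layer map; the map is an assumption chosen for illustration and is not read from Figure 2.1, and the name `required_layers` is likewise illustrative.

```python
def required_layers(target, ref_layer):
    """Return the target layer plus every layer it references, directly or indirectly."""
    needed = []
    layer = target
    while layer is not None:
        if layer in needed:
            raise ValueError("cyclic inter-layer reference")
        needed.append(layer)
        layer = ref_layer.get(layer)   # None marks a layer without a reference layer
    return sorted(needed)


# Hypothetical map: layers 1 and 2 both predict from layer 0, layer 3 predicts from layer 2.
ref_layer = {0: None, 1: 0, 2: 0, 3: 2}

layers_for_3 = required_layers(3, ref_layer)
layers_for_1 = required_layers(1, ref_layer)
```

In this hypothetical configuration, layer 1 is not needed to decode layer 3, so an extractor could discard it when layer 3 is the target.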
2.1.2 Transport Interface of SVC
The coded video data and other side information in SVC bitstreams are encapsulated in network abstraction layer (NAL) units. A NAL unit consists of a header followed by payload data. The SVC NAL header consists of the one-byte H.264/AVC header and a three-byte extended SVC header. The extended header includes the syntax elements dependency_id (D), temporal_id (T) and quality_id (Q), which identify the dependency layer, temporal layer and quality refinement layer respectively, as well as other auxiliary information that supports easy bitstream extraction. Another important syntax element is the priority identifier priority_id, which can be used to signal the importance of a NAL unit.

Figure 2.2: Scalable layers corresponding to Figure 2.1
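As an illustration of how an extractor might read these identifiers, the following sketch parses the four header bytes of an SVC NAL unit. The bit positions reflect our reading of the SVC NAL unit header extension (as laid out in H.264 Annex G and RFC 6190) and should be checked against the specification before use; the names `SvcNalHeader` and `parse_svc_nal_header` are illustrative.

```python
from typing import NamedTuple


class SvcNalHeader(NamedTuple):
    nal_ref_idc: int
    nal_unit_type: int
    priority_id: int
    dependency_id: int   # D
    quality_id: int      # Q
    temporal_id: int     # T


def parse_svc_nal_header(data: bytes) -> SvcNalHeader:
    """Parse the 1-byte H.264/AVC header plus the 3-byte SVC extension (assumed layout)."""
    if len(data) < 4:
        raise ValueError("need at least 4 header bytes")
    b0, b1, b2, b3 = data[:4]
    nal_unit_type = b0 & 0x1F
    if nal_unit_type not in (14, 20):          # prefix NAL unit / coded slice extension
        raise ValueError("NAL unit type carries no SVC extension header")
    return SvcNalHeader(
        nal_ref_idc=(b0 >> 5) & 0x03,
        nal_unit_type=nal_unit_type,
        priority_id=b1 & 0x3F,                 # low 6 bits of the first extension byte
        dependency_id=(b2 >> 4) & 0x07,        # 3 bits after no_inter_layer_pred_flag
        quality_id=b2 & 0x0F,                  # low 4 bits of the second extension byte
        temporal_id=(b3 >> 5) & 0x07,          # top 3 bits of the third extension byte
    )


# Hand-built example bytes: type 20, priority_id 5, D = 2, Q = 1, T = 3.
header = parse_svc_nal_header(bytes([0x74, 0x85, 0x21, 0x67]))
```

Because D, T, Q and priority_id all sit at fixed positions in the header, an extractor can decide whether to keep or drop a NAL unit without parsing its payload.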
The NAL units with identical D, T and Q values are organized into scalable layers. Here, the dependency and quality identifiers are combined into a coding layer identifier L. As shown in Figure 2.2, the NAL units of the SVC bitstream depicted in Figure 2.1 can be grouped into scalable layers using the coding layer identifier L and the temporal identifier T. The set of scalable layers required for decoding a given scalable layer is known as a scalable layer representation and is denoted S(L, T) in this thesis. For instance, S(3, 2) includes all scalable layers with identifiers L ≤ 3 and T ≤ 2 in Figure 2.2.
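The definition of S(L, T) used in the example above can be expressed as a simple filter over (L, T) pairs. The sketch below assumes four coding layers and four temporal layers, as suggested by Figure 2.1, and takes the coding layer identifier L as given; the function name is illustrative.

```python
def scalable_layer_representation(layers, L, T):
    """Return S(L, T): all scalable layers (l, t) with l <= L and t <= T."""
    return {(l, t) for (l, t) in layers if l <= L and t <= T}


# Assumed layer grid: coding layers L = 0..3 and temporal layers T = 0..3.
all_layers = {(l, t) for l in range(4) for t in range(4)}

s_3_2 = scalable_layer_representation(all_layers, 3, 2)
```

With this grid, S(3, 2) contains 12 of the 16 scalable layers: every layer except those with T = 3.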
SVC also defines scalability information supplemental enhancement information (SSEI) messages, which carry scalable layer information about the bitstream, such as the spatial resolution, bit rate and priority information of the layers, to assist bitstream adaptation processes.