Related Works - A Rate-Distortion Optimization Model for Cross-Layer Coding and Bitstream Extraction in Scalable Video Coding

2.2.1 Basic Extraction

Currently, the Joint Scalable Video Model (JSVM) [11][15] provides three different ways to perform bitstream extraction. The first one is to extract a substream according to a bit rate constraint. The scalable layer representation thus extracted will have a bit rate that is closest to but not greater than the target bit rate. The second one is to choose a target scalable layer. The extractor will return the layer representations on which the target layer directly or indirectly depends. The last one is to explicitly specify the desired frame rate, frame size, and bit rate. However, the current standard does not specify what to produce if there are several extraction possibilities.
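The first extraction mode can be illustrated with a minimal sketch. It is not the JSVM extractor itself; the `LayerRep` type, the field names, and the bit rates are hypothetical and serve only to show the rule "closest to but not greater than the target bit rate".

```python
from typing import NamedTuple, Optional, Sequence


class LayerRep(NamedTuple):
    """Hypothetical description of one scalable layer representation."""
    d: int        # spatial (dependency) level
    t: int        # temporal level
    q: int        # quality level
    kbps: float   # bit rate of the representation, including the layers it depends on


def extract_by_rate(reps: Sequence[LayerRep], target_kbps: float) -> Optional[LayerRep]:
    """Return the representation whose rate is closest to, but not above, the target."""
    feasible = [r for r in reps if r.kbps <= target_kbps]
    if not feasible:
        return None  # even the lowest representation exceeds the target
    # If several representations share the same rate, max() keeps the first one;
    # this tie-break is an arbitrary choice of the sketch.
    return max(feasible, key=lambda r: r.kbps)


reps = [LayerRep(0, 0, 0, 100.0), LayerRep(0, 1, 0, 180.0), LayerRep(1, 1, 0, 400.0)]
print(extract_by_rate(reps, 250.0))
```

The tie-break in the sketch is exactly the kind of decision that is left open when several extraction possibilities exist.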

In the following subsections, we review several approaches that have been proposed for finding optimal bitstream extraction schemes.

2.2.2 Quality Information Table (QIT)

Kim et al. [4] evaluated the perceptual preference for spatial and temporal quality over a range of bit rates to find preference paths of perceptual quality for bitstream extraction. The spatiotemporal switching points were recorded in Quality Information Tables (QITs), which were then provided to the extractor.

The main idea is to determine the optimal bit rate allocation strategy across the three scalabilities of SVC according to video class. First, video segments are classified and represented using semantic concepts. Then, for each semantic concept, the quality preference path across the multidimensional scalabilities is determined by subjective testing as the bit rate decreases. For example, Figure 2.3 [4] shows the preference paths of the scenery and active concepts in three-dimensional scalability. The quality preference path of each video class is recorded in a quality information table, which contains the scalable layer information and the relative bit rate of every switching point. Finally, the QITs are provided to the extractor for bitstream adaptation.
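How an extractor might consult such a table can be sketched as follows. The table layout, the absolute bit rates, and the function name `lookup_qit` are assumptions made for illustration and are not taken from [4], which records relative bit rates.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Layer = Tuple[int, int, int]  # (spatial, temporal, quality) levels

# Hypothetical QIT for one video class: switching points sorted by bit rate.
# Each entry gives the rate (kbps) at which the preference path switches to a layer.
QIT = List[Tuple[float, Layer]]


def lookup_qit(qit: QIT, available_kbps: float) -> Optional[Layer]:
    """Return the layer of the last switching point not above the available rate."""
    rates = [rate for rate, _ in qit]
    i = bisect_right(rates, available_kbps)  # number of switching points <= available rate
    return qit[i - 1][1] if i > 0 else None


scenery_qit: QIT = [(100.0, (0, 0, 0)), (200.0, (0, 1, 0)), (500.0, (1, 1, 0))]
print(lookup_qit(scenery_qit, 250.0))
```

In this sketch one table is kept per video class, so the extractor needs only the class label and the available bit rate at adaptation time.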

This approach can find quality preference paths of perceptual quality for different video classes. However, the display formats of target devices are not considered. Furthermore, subjective testing is time-consuming and can hardly be performed for all video sequences.

Chapter 2. Background

Figure 2.3: Preference path of perceptual quality [4]: (a) scenery concept, (b) action concept

2.2.3 Quality Index (QI)

Unlike the QIT approach, which relies on subjective testing, Lim et al. [7] defined an objective Quality Index (QI) to measure perceptual quality and performed bitstream extraction by maximizing the quality index of the resulting bitstream subject to a bit rate constraint. The total QI is a weighted combination of the quality indexes for the spatial, temporal, and quality scalabilities (denoted QI_SR, QI_FR, and QI_PSNR, respectively) of the extracted bitstream. Of these, the quality indexes for spatial scalability, QI_SR, and for quality scalability, QI_PSNR, can be measured by PSNR. When measuring QI_SR, video segments are first interpolated to match the playback format of the target device. The quality index for temporal scalability, QI_FR, on the other hand, employs an expo-logarithm function [5] as a model to estimate the subjective perceptual quality (MOS).

This scheme measures the QI of every scalable layer representation that can be extracted subject to the bit rate constraint and chooses the one with the maximum total QI. It thus obtains the sub-stream with the best viewing quality, as measured by QI, at any given bit rate. However, because the scalable layers are selected independently at each bit rate, a single extracted bitstream may not support multiple adaptations, which is an important feature in some network applications such as video multicasting.
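The selection rule can be sketched as an exhaustive search over the feasible layer representations. The equal weights, the dictionary keys, and the numeric values below are illustrative assumptions; the actual weights and the per-dimension index models are those defined in [7] and are not reproduced here.

```python
from typing import Dict, Optional, Sequence, Tuple

Weights = Tuple[float, float, float]  # hypothetical weights for (spatial, temporal, quality)


def total_qi(c: Dict[str, float], w: Weights) -> float:
    """Weighted sum of the spatial, temporal and quality indexes of one candidate."""
    w_sr, w_fr, w_psnr = w
    return w_sr * c["qi_sr"] + w_fr * c["qi_fr"] + w_psnr * c["qi_psnr"]


def extract_by_qi(candidates: Sequence[Dict[str, float]],
                  target_kbps: float,
                  w: Weights = (1 / 3, 1 / 3, 1 / 3)) -> Optional[Dict[str, float]]:
    """Pick the candidate with the largest total QI among those within the rate budget."""
    feasible = [c for c in candidates if c["kbps"] <= target_kbps]
    if not feasible:
        return None
    return max(feasible, key=lambda c: total_qi(c, w))


# Made-up candidates; each index is assumed to be normalized to [0, 1].
candidates = [
    {"id": 0, "kbps": 200.0, "qi_sr": 0.5, "qi_fr": 0.9, "qi_psnr": 0.6},
    {"id": 1, "kbps": 220.0, "qi_sr": 0.8, "qi_fr": 0.5, "qi_psnr": 0.6},
    {"id": 2, "kbps": 400.0, "qi_sr": 0.9, "qi_fr": 0.9, "qi_psnr": 0.9},
]
best = extract_by_qi(candidates, 250.0)
print(best["id"] if best else None)
```

Because each target bit rate is solved independently, nothing in this search forces the choice at a lower rate to be a subset of the choice at a higher rate.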

Figure 2.4: Quality-Layer-based extraction [2]

2.2.4 Quality Layer Optimized Extraction

Amonou et al. [2] formulated the problem as a rate-distortion (R-D) optimization process and shuffled the quality increments in an R-D sense for MGS/FGS enhancement layers. The idea is similar to Quality Layers in JPEG 2000 [14].

Priorities are assigned to the NAL units in an SVC bitstream to represent a virtual layered organization of the stream for further bitstream adaptation. First, R-D information is calculated for the quality increment of each picture at each quality refinement level using either independent or dependent distortion calculation. In dependent distortion calculation, the distortion of a picture and the distortion of the pictures predicted from it are all considered. That is, the impact of each quality increment on the total rate and on the global reconstruction quality is computed to measure its R-D performance (slope).

Based on the R-D information, the quality increments are sorted while respecting the constraints of temporal prediction dependency. Finally, Quality Layers are assigned to the quality increments according to the sorting results and stored either in the NAL unit header, using the priority_id field, or in SEI messages.
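One way to sort by R-D slope while respecting dependencies is a greedy, dependency-aware ordering: among the increments whose prerequisites have already been placed, the one with the steepest slope is taken next. This is a sketch of that idea under our own assumptions and is not claimed to be the exact procedure of [2]; the increment names, the dependency map, and the numbers are hypothetical.

```python
import heapq
from typing import Dict, List, Set, Tuple


def order_increments(increments: Dict[str, Tuple[float, float]],
                     deps: Dict[str, Set[str]]) -> List[str]:
    """Order quality increments by decreasing R-D slope subject to dependencies.

    increments: name -> (distortion reduction, rate increase), with rate increase > 0.
    deps: name -> names that must precede it (all assumed to be keys of increments).
    """
    slope = {n: dd / dr for n, (dd, dr) in increments.items()}
    pending = {n: set(deps.get(n, ())) for n in increments}
    dependents: Dict[str, List[str]] = {n: [] for n in increments}
    for n, prereqs in pending.items():
        for p in prereqs:
            dependents[p].append(n)

    # Max-heap on slope (negated for heapq), holding only increments that are ready.
    ready = [(-slope[n], n) for n, prereqs in pending.items() if not prereqs]
    heapq.heapify(ready)

    order: List[str] = []
    while ready:
        _, n = heapq.heappop(ready)
        order.append(n)
        for m in dependents[n]:
            pending[m].discard(n)
            if not pending[m]:
                heapq.heappush(ready, (-slope[m], m))
    return order


increments = {"a": (10.0, 2.0), "b": (9.0, 1.0), "c": (20.0, 1.0)}
deps = {"c": {"a"}}  # c is predicted from a, so a must come first
order = order_increments(increments, deps)
quality_layer = {name: rank for rank, name in enumerate(order)}  # 0 = most important (sketch convention)
print(order, quality_layer)
```

In the example, increment c has the steepest slope but is placed last because the increment it depends on has the lowest slope, which shows how the dependency constraint can override the pure slope ordering.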

The Quality Layer optimized extraction can even be applied to multiresolution bitstreams. Figure 2.4 [2] illustrates Quality-Layer-based extraction. Each big block represents a scalable layer referred to as (D_d, T_t, Q_q), where D_d indicates the spatial resolution, T_t the temporal layer, and Q_q the quality level. The small blocks represent the NAL units of the quality enhancement layers at different spatial resolutions: dark-gray blocks for D0 and gray ones for D1. The blocks are ordered according to their Quality Layer information rather than their quality levels. Therefore, NAL units with lower R-D performance are dropped first during bitstream extraction.
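The resulting extraction step can be sketched as a simple threshold filter on the priority of each enhancement NAL unit. The tuple layout, the convention that a larger number means a more important unit, and the byte budget are assumptions of this sketch; base-layer NAL units are assumed to be always kept and are not modeled.

```python
from typing import Dict, List, Tuple

NalUnit = Tuple[int, int]  # (importance level, size in bytes); hypothetical layout


def extract_by_quality_layer(units: List[NalUnit], budget_bytes: int) -> List[NalUnit]:
    """Keep whole importance levels, most important first, while they fit the budget."""
    size_per_level: Dict[int, int] = {}
    for level, size in units:
        size_per_level[level] = size_per_level.get(level, 0) + size

    kept_levels = set()
    used = 0
    for level in sorted(size_per_level, reverse=True):
        if used + size_per_level[level] > budget_bytes:
            break  # this level and every less important one are dropped
        used += size_per_level[level]
        kept_levels.add(level)

    # Stream order is preserved; only the level of each unit is inspected.
    return [u for u in units if u[0] in kept_levels]


units = [(3, 100), (2, 50), (3, 100), (1, 80), (2, 50)]
print(extract_by_quality_layer(units, 320))
```

Since the levels kept for a smaller budget are always a subset of those kept for a larger one, a stream extracted this way can be adapted again later.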

Quality Layer assignment ensures that quality increments are well prioritized, which allows the stream to be parsed simply during network transmission. Nevertheless, the trade-off between spatial and temporal scalabilities is not considered in this approach.

In summary, all of the prior studies except the Basic Extraction approach were designed to determine the bitstream extraction order through different optimization schemes. Among them, Quality-Layer-based extraction is the only approach that can produce extracted sub-streams supporting multiple adaptations. Moreover, all of them can be treated as post-processing of pre-encoded bitstreams. No suggestions have been made for parameter settings during SVC encoding that would benefit bitstream extraction.

Rate-Distortion Optimization of SVC Bitstream Extraction

Our investigation began with an attempt to devise strategies for finding an optimal extraction path of an SVC bitstream for a viewing device. The extraction path should be amenable to successive refinement of the SVC bitstream in order to support multiple adaptations. In this chapter, we describe the notion of successive refinement of optimal extraction paths and define the R-D optimization of SVC bitstream extraction as a constrained optimization problem. We further introduce an R-D trellis diagram to model the bitstream extraction process. Based on R-D trellis diagrams, we can employ a dynamic programming algorithm to find the solution, and we furthermore propose a greedy heuristic scheme that achieves the same or similar performance while reducing the complexity significantly.
