Overview of Dissertation - A Study of Advanced Fine Granularity Scalability for Scalable Video Coding

Scalable video coding (SVC) has attracted wide attention with the rapid growth of multimedia applications over the Internet and wireless channels. In such applications, the video may be transmitted over error-prone channels with fluctuating bandwidth. Moreover, the clients, consisting of different devices, may have different processing power and spatial resolutions. SVC was proposed to produce a single bit-stream that serves these varied purposes. The following list summarizes the major applications of SVC, and Figure 1.1 depicts a framework for them.

• Internet video streaming/broadcasting,

• Mobile streaming/broadcasting,

• Mobile interactive applications,

• Surveillance,

• Video archiving, and so on.

To serve video streaming in a heterogeneous environment, simulcast, which directly compresses the video into multiple bit-streams with distinct bit rates, is one of the most intuitive approaches. According to the available channel bandwidth, the transmission can be switched among the bit-streams that are coded at different bit rates. In particular, to avoid drifting errors, a switching frame can be periodically coded as a transition frame from one bit-stream to another [12]. Since simulcast encodes the video into a limited number of bit-streams, it cannot provide graceful variation of quality when the channel bandwidth fluctuates. Furthermore, from the compression perspective, simulcast is inefficient because the correlations among the bit-streams are not fully utilized.

Figure 1.1: Application framework for scalable video coding. (Figure labels: Server, Router, Ethernet, Wireless, Point-to-Point Transmission, Broadcasting; link rates from 32 kbps to 3 Mbps; axes: Bandwidth, Time.)

1.1.1 MPEG-4 Fine Granularity Scalability

To offer graceful variation of quality, the MPEG-4 streaming video profile [15] defines fine granularity scalability (FGS), which provides DCT-based quality (SNR) scalability using a layered approach. Specifically, the video is compressed into a base layer and an enhancement layer. The base layer offers a minimum guaranteed visual quality, and the enhancement layer refines the quality over that offered by the base layer. The base layer is coded by a non-scalable codec using conventional closed-loop prediction, whereas the enhancement layer is coded by embedded bit-plane coding with open-loop prediction. Figure 1.2 shows the prediction scheme of MPEG-4 FGS [15].

Chapter 1. Introduction

Figure 1.2: Prediction structure of MPEG-4 FGS. (Figure labels: I-frame, P-frame; base layer: non-scalable codec; enhancement layer: embedded bit-plane coding; truncated part.)

Figure 1.3: Example of fine granular SNR scalability. (Figure labels: base layer, enhancement layer; 40 kbits/s, 168 kbits/s, 296 kbits/s.)

With open-loop prediction and embedded bit-plane coding, the enhancement layer can be arbitrarily truncated and dropped to adapt to the channel bandwidth and the processing power. At the decoder side, the video quality depends on how much of the enhancement layer is received and decoded. An example of such quality (SNR) scalability is presented in Figure 1.3. As shown, the base layer provides a rough representation; as more of the enhancement layer is received, the decoded quality gradually improves.
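The truncation property described above can be illustrated with a minimal sketch. It assumes a toy residual of non-negative integer magnitudes and truncates only at bit-plane boundaries; the function names and the sample values are hypothetical, and sign bits, the DCT, and entropy coding are omitted.

```python
def to_bit_planes(magnitudes, num_planes):
    """Split non-negative integers into bit-planes, most significant first."""
    return [[(m >> p) & 1 for m in magnitudes]
            for p in range(num_planes - 1, -1, -1)]


def from_bit_planes(planes, num_planes, length):
    """Rebuild magnitudes from the leading bit-planes that were received."""
    values = [0] * length
    for i, plane in enumerate(planes):
        shift = num_planes - 1 - i
        values = [v | (bit << shift) for v, bit in zip(values, plane)]
    return values


# Hypothetical enhancement-layer residual magnitudes (not from the thesis).
residual = [5, 0, 3, 12, 1, 7]
NUM_PLANES = 4  # enough for magnitudes up to 15
planes = to_bit_planes(residual, NUM_PLANES)

# Truncate after 0, 1, ..., 4 bit-planes and measure the remaining error.
errors = []
for kept in range(NUM_PLANES + 1):
    approx = from_bit_planes(planes[:kept], NUM_PLANES, len(residual))
    errors.append(sum(abs(r - a) for r, a in zip(residual, approx)))

print(errors)  # [28, 20, 8, 4, 0]
```

Every prefix of the plane sequence is decodable on its own, and the absolute error shrinks as more planes arrive, which is the behaviour the embedded enhancement layer relies on.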

Figure 1.4 contrasts simulcast and MPEG-4 FGS [15] in terms of quality variation and channel bandwidth. With a limited number of bit-streams, the quality provided by simulcast varies in a stepwise manner; the number of quality levels is determined by the number of pre-encoded bit-streams. In contrast, MPEG-4 FGS [15] provides a practically continuous range of quality levels, because the enhancement-layer bit-stream can be truncated at any point. As the channel bandwidth fluctuates, MPEG-4 FGS [15] can therefore offer smooth variation of visual quality.

Figure 1.4: Comparison between simulcast and MPEG-4 FGS in terms of channel bandwidth and quality variation. (Axes: quality (PSNR) versus channel bandwidth; curves: simulcast bit-streams 1 to 3, MPEG-4 FGS, and a non-scalable codec, with a marked gap of 2 to 3 dB.)
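The difference between stepwise and smooth adaptation can be sketched as two rate-selection rules. This is an illustrative model only: the function names, the stream rates, and the base-layer and enhancement-layer rates are assumptions, and the mapping from rate to PSNR is left out.

```python
def simulcast_rate(bandwidth, stream_rates):
    """Highest pre-encoded rate that fits the channel, or None if none fits."""
    fitting = [r for r in stream_rates if r <= bandwidth]
    return max(fitting) if fitting else None


def fgs_rate(bandwidth, base_rate, max_enh_rate):
    """Base layer plus as much enhancement layer as the channel can carry."""
    if bandwidth < base_rate:
        return None
    return base_rate + min(bandwidth - base_rate, max_enh_rate)


# Hypothetical rates in kbps (not taken from the thesis).
STREAMS = [64, 128, 256, 512]
BASE, MAX_ENH = 64, 448

for bw in (50, 100, 300, 1000):
    print(bw, simulcast_rate(bw, STREAMS), fgs_rate(bw, BASE, MAX_ENH))
```

At 300 kbps, for instance, the simulcast rule falls back to the 256 kbps stream, while the FGS rule uses the full 300 kbps by truncating the enhancement layer.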

Although MPEG-4 FGS [15] offers scalability at fine granularity, its compression efficiency is often much lower than that of a non-scalable codec. On average, at the same bit rate, a PSNR loss of 2 to 3 dB or more is observed, as presented in Figure 1.4. The loss arises because the enhancement layer is predicted only from the base layer. As shown in Figure 1.3, the base layer is mostly encoded at a very low bit rate, so the base-layer frames often have poor visual quality. Since a poor-quality predictor cannot effectively remove the redundancy, the coding efficiency is inferior.

1.1.2 Enhanced Mode-Adaptive Fine Granularity Scalability

To improve the coding efficiency, we seek a better predictor by using the enhancement layer. In addition to the macroblock from the base-layer frame (Type B), we construct two more macroblock predictors: one from the previously reconstructed enhancement-layer frame (Type E) and one from the average of Type B and Type E (Type BE). These predictors are used adaptively to minimize the prediction residue. For example, because the base-layer frames are compressed at lower quality, the motion-compensated enhancement-layer frames generally provide better quality, and thus Type E can be used to improve the quality of the predictor. On the other hand, Type B is useful for regions where motion estimation cannot efficiently exploit the inter-frame correlation, e.g., fast-motion and occlusion regions. Additionally, the Type BE mode can improve the coding efficiency by taking the best of Type B and Type E. Figure 1.5 depicts the prediction structure of our enhanced mode-adaptive FGS algorithm (EMFGS). Compared with the prediction structure of MPEG-4 FGS [15] in Figure 1.2, the EMFGS provides better coding efficiency by using closed-loop prediction at the enhancement layer.

Figure 1.5: Prediction structure of enhanced mode-adaptive FGS.
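The choice among the three predictors can be sketched as a search for the smallest prediction residue. The sketch below assumes the sum of absolute differences (SAD) as the residue measure and uses made-up four-sample "macroblocks"; it ignores drift and bit cost, which the actual mode selection takes into account.

```python
def sad(block, predictor):
    """Sum of absolute differences between a block and its predictor."""
    return sum(abs(x - p) for x, p in zip(block, predictor))


def choose_predictor(block, base_pred, enh_pred):
    """Return the mode (B, E, or BE) with the smallest residue, and its SAD."""
    candidates = {
        "B": base_pred,
        "E": enh_pred,
        "BE": [(b + e) / 2 for b, e in zip(base_pred, enh_pred)],
    }
    mode = min(candidates, key=lambda m: sad(block, candidates[m]))
    return mode, sad(block, candidates[mode])


# Hypothetical sample values (not from the thesis).
block = [100, 102, 98, 101]

# Good motion compensation: the enhancement-layer predictor wins.
print(choose_predictor(block, [96, 100, 95, 99], [101, 102, 99, 100]))

# Occlusion: the enhancement-layer predictor fails, so Type B wins.
print(choose_predictor(block, [96, 100, 95, 99], [120, 80, 130, 70]))

# Errors of opposite sign cancel in the average, so Type BE wins.
print(choose_predictor(block, [96, 98, 94, 97], [104, 106, 102, 105]))
```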

Although the coding efficiency can be improved by using the enhancement layer for prediction, drifting errors can occur at lower bit rates. As shown in Figure 1.5, closed-loop prediction is introduced at the enhancement layer. During transmission, however, the enhancement layer is not guaranteed to be received as the encoder expects. The resulting predictor mismatch between the encoder and decoder produces drifting errors.

To minimize drifting errors, we develop an adaptive mode-selection algorithm at the encoder, which first estimates the possible drifting errors at the decoder and then chooses the best macroblock mode accordingly. In particular, we show that the Type BE predictor can be generalized to reduce drifting errors and that the Type B predictor can stop drifting errors completely. We therefore use the Type B and Type BE predictors adaptively to offer reset and fading mechanisms, respectively.
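The reset and fading behaviour can be illustrated with a simplified drift model. It assumes no motion, an identical base layer at both sides, and no new mismatch after the first frame; under these assumptions a reference mismatch is carried over unchanged by Type E, scaled by a weight alpha by a generalized Type BE (alpha = 0.5 for the plain average), and cancelled by Type B. The function name and the parameter alpha are hypothetical notation, not taken from the thesis.

```python
def propagate_drift(initial_error, modes, alpha=0.5):
    """Track the encoder/decoder reference mismatch over successive frames.

    Simplified model: the mismatch is multiplied by the weight that the
    chosen predictor gives to the enhancement-layer reference.
    """
    weight = {"E": 1.0, "BE": alpha, "B": 0.0}
    errors = [initial_error]
    for mode in modes:
        errors.append(errors[-1] * weight[mode])
    return errors


print(propagate_drift(8.0, ["E", "E", "E"]))     # drift persists
print(propagate_drift(8.0, ["BE", "BE", "BE"]))  # drift fades
print(propagate_drift(8.0, ["E", "B", "E"]))     # drift is reset by Type B
```

In this model a smaller alpha fades drift faster but leans more on the lower-quality base layer, which is the trade-off an adaptive mode selection has to balance.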

Compared with other advanced FGS schemes [10][32][33], our EMFGS algorithm shows a PSNR improvement of 0.3 to 0.5 dB with a less complex structure. Compared with MPEG-4 FGS [15], an improvement of more than 1 to 1.5 dB can be gained.

1.1.3 Context-Adaptive Bit-Plane Coding

In addition to constructing better predictors, we also propose a context-adaptive bit-plane coding (CABIC) to improve the coding efficiency of the enhancement layer.

Currently, in MPEG-4 FGS [15], the bit-plane coding at the enhancement layer is performed from the most significant bit-plane to the least significant one. For each bit-plane, the coding is conducted in a frame raster and coefficient zigzag scanning manner. Further, in each transform block, the coefficient bits are represented by (Run, EOP) symbols and coded with Huffman tables.
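The (Run, EOP) representation can be sketched for one bit-plane of one block. The sketch assumes the usual symbol definition, which is not spelled out in this section: Run counts the zeros preceding each 1 in zigzag order, EOP flags the last 1 of the bit-plane, and an all-zero bit-plane is signalled by a special symbol. The Huffman coding of the symbols is omitted, and the names are hypothetical.

```python
ALL_ZERO = "ALL_ZERO"  # assumed special symbol for an empty bit-plane


def run_eop_symbols(bits):
    """Convert one bit-plane of a block (zigzag order) to (Run, EOP) symbols."""
    ones = [i for i, b in enumerate(bits) if b]
    if not ones:
        return [ALL_ZERO]
    symbols = []
    prev = -1
    for pos in ones:
        run = pos - prev - 1               # zeros since the previous 1
        eop = 1 if pos == ones[-1] else 0  # 1 marks the last 1 in the plane
        symbols.append((run, eop))
        prev = pos
    return symbols


print(run_eop_symbols([0, 1, 0, 0, 1, 0, 0, 0]))  # [(1, 0), (2, 1)]
print(run_eop_symbols([0, 0, 0, 0]))              # ['ALL_ZERO']
```

Zeros after the last 1 need no symbol, since EOP = 1 already tells the decoder that the rest of the bit-plane is empty.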

While offering a good embedded property, the current approach suffers from poor coding efficiency and subjective quality. The poor coding efficiency is attributable to three factors. First, information with different weighting is jointly grouped by (Run, EOP) symbols and coded without differentiation. Second, existing correlations across bit-planes and among spatially adjacent blocks are not fully exploited. Last, the Huffman tables have limited ability to adapt to the statistics of different sequences.

In addition, the frame raster and coefficient zigzag scanning causes poor subjective quality. Since the enhancement layer could be truncated during transmission, the frame raster scanning may refine only the upper part of a frame with one extra bit-plane. The lower part of a decoded frame therefore usually has worse quality. Such uneven refinement degrades the subjective quality. An example of partial refinement is illustrated in Figure 1.6.

To improve the coding efficiency, our CABIC incorporates a context-adaptive binary arithmetic codec. The bit-planes are coded in a context-adaptive, bit-by-bit manner. To distinguish coefficient bits of different importance, we classify the coefficient bits into different types. For each type of bit, the context model is designed from different sources of correlation. Furthermore, to fully utilize the existing correlations, both the energy distribution in a block and the spatial correlations among adjacent blocks are considered in our context models. We also exploit the context across bit-planes to save side information and use estimated Laplacian models to maximize the efficiency of the binary arithmetic codec.
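The benefit of context modeling can be illustrated without a full arithmetic coder by summing ideal code lengths, -log2(p), under adaptive probability estimates. The sketch below is an assumption-laden toy: the count-based estimator, the class and function names, and the sample bits and contexts are all hypothetical and do not reproduce the CABIC context design.

```python
import math


class AdaptiveBitModel:
    """Adaptive estimate of P(bit) from running counts (starting at 1, 1)."""

    def __init__(self):
        self.counts = [1, 1]

    def cost_and_update(self, bit):
        """Return the ideal code length of `bit` in bits, then adapt."""
        p = self.counts[bit] / sum(self.counts)
        self.counts[bit] += 1
        return -math.log2(p)


def coding_cost(bits, contexts=None):
    """Total ideal code length, with one adaptive model per context."""
    models = {}
    total = 0.0
    for i, bit in enumerate(bits):
        ctx = contexts[i] if contexts is not None else 0
        total += models.setdefault(ctx, AdaptiveBitModel()).cost_and_update(bit)
    return total


# Hypothetical data: context 1 could mean "a neighbouring coefficient is
# already significant", which makes a 1 bit more likely.
bits = [1, 0, 1, 0, 1, 0, 0, 0]
contexts = [1, 0, 1, 0, 1, 0, 1, 0]

without_context = coding_cost(bits)
with_context = coding_cost(bits, contexts)
print(round(without_context, 3), round(with_context, 3))
```

When the context is correlated with the bit value, each per-context model becomes more skewed than a single shared model, so the estimated code length drops.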

Furthermore, to improve the subjective quality, we develop a stochastic bit reshuffling (SBR) scheme, which refines the base layer in a content-aware manner. Instead of using a deterministic coding order, our SBR employs a dynamic order that refines the regions of higher energy with higher priority. To achieve this, each coefficient bit is assigned a distortion reduction ∆D and a coding cost ∆R. With this information, the coefficient bits at the enhancement layer are reordered so that the associated ratio ∆D/∆R is in descending order. In particular, to avoid transmitting the exact coding order, both the encoder and the decoder model the transform coefficients with discrete Laplacian distributions and incorporate them into the context probability models for content-aware parameter estimation. In our scheme, the overhead is minimized since the coding order is implicitly known to both sides. Moreover, the bit reshuffling is content-aware because our parameter estimation considers the energy distribution in the spatial domain by referring to the context probability models.

Figure 1.6: Example of partial refinement due to the truncation of the enhancement layer. (Figure labels: MSB bit-plane, LSB bit-plane, raster scanning, truncated, reconstruction, blocking effect.)
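The reordering step can be sketched as a sort on the ratio ∆D/∆R. In this sketch the ∆D and ∆R values are simply given as hypothetical numbers; in the scheme described above they would be estimated identically at the encoder and decoder from the Laplacian and context models, which is what makes the order implicit. The type and function names are assumptions.

```python
from typing import NamedTuple


class CoefficientBit(NamedTuple):
    block: int      # index of the transform block
    index: int      # zigzag position inside the block
    delta_d: float  # estimated distortion reduction
    delta_r: float  # estimated coding cost in bits (assumed > 0)


def reshuffle(bits):
    """Order coefficient bits by descending delta_d / delta_r."""
    return sorted(bits, key=lambda b: b.delta_d / b.delta_r, reverse=True)


# Hypothetical estimates (not from the thesis).
candidates = [
    CoefficientBit(0, 0, 4.0, 2.0),  # ratio 2.0
    CoefficientBit(0, 1, 9.0, 1.5),  # ratio 6.0
    CoefficientBit(1, 0, 1.0, 1.0),  # ratio 1.0
    CoefficientBit(1, 1, 6.0, 2.0),  # ratio 3.0
]

order = [(b.block, b.index) for b in reshuffle(candidates)]
print(order)  # [(0, 1), (1, 1), (0, 0), (1, 0)]
```

If the bit-stream is truncated after any prefix of this order, the bits already sent are the ones with the largest estimated distortion reduction per bit spent.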

Compared with the bit-plane coding in MPEG-4 FGS [15], our CABIC improves the PSNR by 0.5 to 1.0 dB at medium and high bit rates. While maintaining similar or even higher coding efficiency, our SBR significantly improves the subjective quality.

In summary, compared with MPEG-4 FGS [15], our EMFGS together with CABIC can provide a PSNR improvement of 2 to 3 dB. On top of that, our SBR can significantly improve the subjective quality.

1.1.4 Applications in Scalable Video Coding Standard

Although the proposed EMFGS and SBR are mainly developed for MPEG-4 FGS [15], in this thesis, we also show that these techniques can be applied in the upcoming MPEG standard for scalable video coding (SVC) [23].

Specifically, our EMFGS can be used to improve the coding efficiency of anchor frames. In SVC [23], the anchor frames and their enhancement layer are coded in a way similar to that used by MPEG-4 FGS [15]. As shown above, using the base layer as the only predictor of the enhancement layer leads to poor coding efficiency. Thus, the techniques employed in the EMFGS can be applied to the coding of anchor frames. Moreover, we will show that this application matters more in low-delay applications, in which the coding efficiency of anchor frames is more critical to the overall performance.

In addition, we demonstrate that the idea of SBR can be extended to coding the FGS layers. Currently, the FGS layers are coded by cyclical block coding [26], in which each block is coded with one symbol per coding cycle. Using the concept of SBR, we propose a prioritized block coding in which the symbols with better rate-distortion performance are coded with higher priority. Moreover, by using explicit syntax for the priority information, the prioritized block coding can also provide region-of-interest functionality.
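The contrast between cyclical and prioritized block coding can be sketched as two symbol-ordering rules. This is a toy model under stated assumptions: each block holds an ordered list of (symbol, priority) pairs, the priority values are hypothetical stand-ins for rate-distortion estimates, and the prioritized rule always emits the pending symbol with the highest priority while keeping the order inside each block. How the priorities are actually derived is not covered in this section.

```python
import heapq


def cyclic_order(blocks):
    """One symbol per block per coding cycle, in block order."""
    order = []
    cycle = 0
    while any(cycle < len(symbols) for symbols in blocks.values()):
        for block_id, symbols in blocks.items():
            if cycle < len(symbols):
                order.append((block_id, symbols[cycle][0]))
        cycle += 1
    return order


def prioritized_order(blocks):
    """Always emit the pending symbol with the highest priority."""
    heap = []
    for seq, (block_id, symbols) in enumerate(blocks.items()):
        if symbols:
            # Negate the priority because heapq is a min-heap.
            heapq.heappush(heap, (-symbols[0][1], seq, block_id, 0))
    order = []
    while heap:
        _, seq, block_id, pos = heapq.heappop(heap)
        symbols = blocks[block_id]
        order.append((block_id, symbols[pos][0]))
        if pos + 1 < len(symbols):
            heapq.heappush(heap, (-symbols[pos + 1][1], seq, block_id, pos + 1))
    return order


# Hypothetical blocks: lists of (symbol, priority) pairs.
blocks = {
    "A": [("a0", 5.0), ("a1", 1.0)],
    "B": [("b0", 2.0), ("b1", 4.0)],
    "C": [("c0", 3.0)],
}

print(cyclic_order(blocks))
print(prioritized_order(blocks))
```

Raising the priorities of the blocks inside a chosen region would move their symbols to the front of the stream, which is one way the region-of-interest use mentioned above could be realized.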
