LIST OF TABLES - Optimal Bit Allocation for Scalable Video Coding

Table 4.1 Proposed Optimal Bit Allocation Algorithm for H.264/SVC
Table A.1 Setting 1 for the Third Experiment of the Wavelet Based Codec
Table A.2 Setting 2 for the Third Experiment of the Wavelet Based Codec
Table A.3 Setting 3 for the Third Experiment of the Wavelet Based Codec

CHAPTER 1 INTRODUCTION

Scalable video coding (SVC) facilitates the encoding of a bitstream containing representations with lower spatial resolutions, frame rates, and quality, which are designed to meet the requirements of the heterogeneous display and computational capabilities of target devices. A client with restricted resources (display resolution, processing power, and bandwidth) can decode only a part of the delivered bitstream. Thus, SVC can be used in a wide range of multicast applications, such as Internet and wireless applications, where scalability is necessary to deal with the variable transmission conditions to the end-users. Another benefit of SVC is that it can adapt to a network-aware environment on-the-fly [1,2] when feedback is provided by the network and the end-users.

An important issue in SVC is how to measure the relative importance of a resolution to the overall coding performance. Many researchers have employed a weighting coefficient to represent the relative importance of a resolution. For example, Ramchandran, Ortega, and Vetterli [3] modeled the distortion as a summation of the weighted mean-square errors (MSEs) on different resolutions, and proposed a bit-allocation algorithm based on the exhaustive search technique. Schwarz and Wiegand [4] adopted a similar approach by weighting the MSEs of the base layer and the enhancement layer, and demonstrated the effect of employing different weights on each layer on the overall coding performance. The above works do not explain the meaning of the weights or how to derive them. Since the peak signal-to-noise ratio (PSNR) is the most commonly used quality measurement of a coding system, in this Dissertation, instead of weighting the MSE of a resolution, we weight the PSNR as a measure of the relative importance of a resolution to the overall coding performance.
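To make the distinction concrete, the two weighting schemes can be written side by side. The notation is an assumption introduced here for illustration: $w_r$ is the weight of resolution $r$, $\mathrm{MSE}_r$ is its mean-square error, the peak value 255 assumes 8-bit video, and the normalization of the weights is a convenience, not something stated in the text.

```latex
% Weighted-MSE criterion, in the style the text attributes to [3,4]
D_{\mathrm{MSE}} = \sum_{r} w_r \, \mathrm{MSE}_r

% Weighted-PSNR criterion adopted in this Dissertation
\mathrm{PSNR}_r = 10 \log_{10} \frac{255^2}{\mathrm{MSE}_r}, \qquad
\overline{\mathrm{PSNR}} = \sum_{r} w_r \, \mathrm{PSNR}_r, \qquad
w_r \ge 0, \quad \sum_{r} w_r = 1
```

Because PSNR is a logarithmic function of MSE, the two criteria generally rank candidate bit allocations differently even when the same weights are used.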

A good coding performance metric for SVC should consider the subscriber preference for different resolutions. For example, suppose we want to produce bitstreams for two scenarios: one where all the subscribers prefer the QCIF display and the other where all the subscribers prefer the CIF display. The optimal bitstreams for the two scenarios should be different. In the first scenario, the optimal bit allocation can only be obtained by allocating all the bits to the subbands that support the QCIF display. Obviously, this allocation cannot be optimal for the second scenario, in which the optimal bit allocation must encode more spatial subbands to support a higher spatial resolution display with the CIF format.
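The two-scenario argument can be illustrated with a small sketch. The function names, the candidate allocations, and every PSNR value below are hypothetical numbers chosen only to show how a preference-weighted metric changes which allocation wins; none of them comes from the experiments.

```python
def overall_psnr(psnr_by_resolution, preference):
    """Preference-weighted average PSNR (weights are normalized here)."""
    total = sum(preference.values())
    return sum(preference[r] * psnr_by_resolution[r] for r in preference) / total


def best_allocation(candidates, preference):
    """Return the name of the candidate with the highest weighted PSNR."""
    return max(candidates, key=lambda name: overall_psnr(candidates[name], preference))


# Hypothetical per-resolution PSNRs (dB) for two candidate bit allocations.
candidates = {
    "all_bits_to_qcif_subbands": {"QCIF": 40.0, "CIF": 28.0},
    "bits_spread_over_cif_subbands": {"QCIF": 35.0, "CIF": 34.0},
}

all_qcif = {"QCIF": 1.0, "CIF": 0.0}  # scenario 1: every subscriber prefers QCIF
all_cif = {"QCIF": 0.0, "CIF": 1.0}   # scenario 2: every subscriber prefers CIF

print(best_allocation(candidates, all_qcif))
print(best_allocation(candidates, all_cif))
```

With these made-up numbers, the allocation that wins under the first preference loses under the second, which is the behavior the paragraph argues a good metric should capture.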

Currently, there exist two promising frameworks for SVC. One is the wavelet-based coding method [5–9] and the other is H.264/SVC [10,11]. In the wavelet-based coding method, the video is first decomposed into multiple subbands, and each subband is then coded by the EZBC entropy coder [12,13]. Since the subbands are encoded independently, the rate allocated to one subband does not affect the distortions of the others. This property decreases the complexity of the rate allocation problem, but results in lower coding efficiency than H.264/SVC, which employs several techniques to remove the redundancy among the layers. The rate allocation problem of H.264/SVC is therefore more complicated, because the dependency among the layers must be analyzed before the rate allocation problem can be solved.

We analyze and solve the rate allocation problem of the wavelet-based coding method as follows: (1) based on the resolution preference, we formulate the bit allocation problem for the wavelet-based SVC, and show that the weighting coefficients can be derived from the subscriber preferences on different resolutions in a motion-compensated temporal filtering (MCTF)-based 2D+t wavelet video codec [14]; and (2) we propose three bit-allocation algorithms to solve the problem. The first is an efficient Lagrangian-based method that solves the upper bound of the problem optimally, and the second is a less efficient dynamic programming method that solves the problem optimally. Both methods require knowledge of user preference. For the case where the user preference is unknown, the third algorithm solves the problem by a min-max approach, whose objective is to optimize the bit allocation for the worst possible preference distribution.
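As background for the Lagrangian-based method, the following is a minimal sketch of generic Lagrangian bit allocation over independently coded subbands. It is not the algorithm proposed in this Dissertation: the function names, the representation of each subband as a list of (rate, distortion) operating points, the per-subband weights, and the bisection bounds are all assumptions made for illustration.

```python
def pick_points(subbands, weights, lam):
    """For a fixed multiplier, each subband independently minimizes w*D + lam*R."""
    choice = []
    for points, w in zip(subbands, weights):
        choice.append(min(points, key=lambda p: w * p[1] + lam * p[0]))
    return choice


def lagrangian_allocate(subbands, weights, budget, iters=60):
    """Bisect on the multiplier until the total rate just fits the budget.

    Each subband is a list of (rate, distortion) operating points.
    If even the lowest-rate points exceed the budget, the lowest-rate
    choice is returned unchanged.
    """
    lo, hi = 0.0, 1e6  # assumed search range for the multiplier
    best = pick_points(subbands, weights, hi)
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        choice = pick_points(subbands, weights, lam)
        rate = sum(r for r, _ in choice)
        if rate <= budget:
            best, hi = choice, lam  # feasible: try a smaller multiplier
        else:
            lo = lam                # over budget: penalize rate more
    return best


# Hypothetical operating points for two subbands.
subbands = [
    [(0, 100.0), (1, 40.0), (2, 10.0)],
    [(0, 50.0), (1, 30.0), (2, 25.0)],
]
allocation = lagrangian_allocate(subbands, weights=[1.0, 1.0], budget=3)
print(allocation)
```

The decomposition into per-subband minimizations relies on the independence property noted earlier: the rate given to one subband does not change the distortion of another. A Lagrangian search of this kind only reaches operating points on the convex hull of the rate-distortion characteristics, which is one reason an exact method such as dynamic programming can still be useful.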

The overall performance of our approach is highly dependent on whether the preferences for resolutions are provided to the wavelet codec. If they are provided, then our methods can achieve an overall PSNR that is at least as good as that of the state-of-the-art 3D wavelet codec in [15]. The PSNR gain of our method over that in [15] depends on the subscriber preference patterns. Our experiments on various video sequences with known preferences demonstrate that the overall PSNR gain of our method over that in [15] can range from 0 to 25 dB when only spatial scalability is applied, from 0 to 5 dB when only temporal scalability is applied, and from 0 to 25 dB when both spatial and temporal scalability are applied. In fact, we show that the codec in [15] is a special case of our system in which all the subscribers prefer the highest spatial and temporal resolutions. As a consequence, our codec has 0 dB gain over that in [15] under this particular preference pattern.

In practice, the subscriber preference is inaccessible to many scalable coding applications. To address the problem, we propose an algorithm based on the min-max approach to derive the optimal bit allocation when the subscriber preference distribution is not provided. We show that, under the min-max approach, the least favorable user preference distribution occurs when all users subscribe to the highest spatial and temporal resolutions. In that case, the bit-allocation problem of SVC is exactly the same as that of non-scalable video coding, where the goal is to solve the problem for one particular resolution optimally. Our experimental results show that, for a scalable video codec, there is a significant PSNR gap between scenarios where the user preferences are known and scenarios where the preferences are not known. Finding an approach to reduce the gap is beyond the scope of the present study, so we defer the matter to future work.
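The structure of the min-max idea can be sketched as follows, assuming the overall quality is a preference-weighted average of per-resolution PSNRs. Such an average is linear in the preference distribution, so its minimum over all distributions is attained when every subscriber picks the single worst-served resolution. The function names and PSNR values are hypothetical; in this toy data the highest resolution has the lowest PSNR in every candidate, which agrees with the least favorable case described above but is built into the numbers, not derived.

```python
def worst_case_resolution(psnr_by_resolution):
    """Resolution on which an adversarial preference would put all its mass."""
    return min(psnr_by_resolution, key=psnr_by_resolution.get)


def worst_case_psnr(psnr_by_resolution):
    """Minimum weighted PSNR over all preference distributions."""
    return psnr_by_resolution[worst_case_resolution(psnr_by_resolution)]


def minmax_allocation(candidates):
    """Pick the candidate whose worst-case weighted PSNR is largest."""
    return max(candidates, key=lambda name: worst_case_psnr(candidates[name]))


# Hypothetical per-resolution PSNRs (dB) for two candidate bit allocations.
minmax_candidates = {
    "favor_low_resolution": {"QCIF@15": 40.0, "CIF@30": 29.0},
    "favor_high_resolution": {"QCIF@15": 36.0, "CIF@30": 33.0},
}
print(minmax_allocation(minmax_candidates))
```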

On the other hand, H.264/SVC is a state-of-the-art SVC codec that significantly reduces the gap in rate-distortion (R-D) efficiency between single-layer coding and scalable coding [10,11]. The performance of SVC depends to a large extent on the settings of several parameters [16]. The quantization parameters (QPs), the ratio of the I, P, and B frames, and the target bit rate have the most influence on the performance. In this Dissertation, we study the multiple-layer bit rate allocation problem in SVC, also known as the optimal QP assignment to each layer in SVC. With the objective of simplifying the analysis without affecting its generality, we use a fixed set of values for several SVC coding parameters. Specifically, we assume that the motion vectors have been acquired already. In addition, we use the hierarchical B-frame structure for temporal scalability and inter-layer residual prediction for spatial and coarse-grain quality scalabilities [11].
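Since the layer-wise problem is posed in terms of QP, it may help to recall how QP maps to the quantizer step size in H.264/AVC: the step size doubles for every increase of 6 in QP. The sketch below uses the commonly tabulated step sizes for QP 0 to 5. The function name is hypothetical, and this is background on the codec, not part of the proposed algorithm.

```python
# Commonly tabulated H.264/AVC quantizer step sizes for QP = 0..5.
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep_from_qp(qp):
    """Quantizer step size for an H.264/AVC QP in the range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    # The step size doubles every 6 QP values.
    return BASE_QSTEP[qp % 6] * (2 ** (qp // 6))


for qp in (22, 28, 34):
    print(qp, qstep_from_qp(qp))
```

A coarser step size lowers the bit rate of a layer and raises its distortion, which is why assigning a QP to each layer is equivalent to allocating bit rate among the layers.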

The optimal bit allocation of a rate-constrained encoder control system is usually derived by applying the Lagrangian technique [17]. In contrast to single-layer video coding, SVC requires that all users are served simultaneously by a single bitstream. Thus, the data items in an SVC bitstream are highly correlated with each other. This inter-dependency can cause a coding error in one layer to propagate to other layers and thereby complicate the bit allocation process. Another factor that affects bit allocation under SVC is the end-user preference. For example, the bit allocation scheme for a user subscribed to the highest resolution should be different from that for a user subscribed to the lowest resolution, since the latter only uses the base layer information. Hence, the preferences for some resolutions should also be considered by the bit allocation scheme. However, incorporating user preferences into the bit allocation process implies that the preference information should be acquired by the encoder through a feedback mechanism. This is usually considered a disadvantage in a broadcasting environment.

In [4], Schwarz and Wiegand proposed an encoder control mechanism that jointly optimizes the coding parameters of the base layer and enhancement layers under H.264/SVC. Their algorithm also utilizes a weighted combination of the distortions of all the layers to balance the coding efficiency of different layers. Although the above approaches demonstrated the correlation between the coding performance and the values of the weighting factors, they did not analyze how the weighting factors should be derived.

Recently, Koziri and Eleftheriadis [18] presented an interesting approach that models the distortion dependency between layers as a stochastic process for joint optimization of scalable coding. However, their analysis is limited to Gaussian sources and spatial dependency.

To solve the rate allocation problem of H.264/SVC, we propose a theoretical analysis of the weighting factors. We analyze the effect of a coding error in one layer on the other layers in terms of the residual prediction of temporal, spatial, and quality scalabilities under SVC. Then, we demonstrate that the weighting factor of a layer i is a function of all the layers affected by the coding error in layer i, and of the end-user preference for subscribing to the affected layers. Based on the analysis, we derive the main result, namely, that the average PSNR can be represented as a weighted combination of the bit rates assigned to each layer, where the coefficient of each bit rate is a weighting factor. We also propose an R-D optimization algorithm. Experiments on H.264/SVC JSVM 9.18 [19] demonstrate that our algorithm achieves a significant improvement in the average PSNR over that of the state-of-the-art methods in [3,4]. We also show that knowing the user’s preference can significantly improve the coding performance of a scalable video coder.
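To give a feel for how a layer's weight can depend on both the prediction structure and the subscriber preferences, here is a toy sketch. It assumes the simplest possible form, in which the weight of a layer is the total preference of every layer whose reconstruction depends on it. That additive form, the function names, the three-layer structure, and the preference values are all assumptions for illustration; the weighting factors actually derived in Chapter 4 may take a different form.

```python
def affected_layers(layer, references):
    """Layers whose reconstruction depends on `layer`, including itself.

    `references` maps each layer to the layers it predicts from.
    """
    affected = {layer}
    changed = True
    while changed:
        changed = False
        for child, parents in references.items():
            if child not in affected and affected.intersection(parents):
                affected.add(child)
                changed = True
    return affected


def layer_weights(references, preference):
    """Toy weight: summed preference of all layers affected by each layer."""
    return {
        layer: sum(preference[a] for a in affected_layers(layer, references))
        for layer in references
    }


# Hypothetical three-layer prediction chain and subscriber preferences.
references = {"base": [], "enh1": ["base"], "enh2": ["enh1"]}
preference = {"base": 0.5, "enh1": 0.3, "enh2": 0.2}
weights = layer_weights(references, preference)
print(weights)
```

In this toy chain a coding error in the base layer reaches every subscriber, so the base layer receives the largest weight, while the top enhancement layer matters only to the subscribers who decode it.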

The remainder of this Dissertation is organized as follows. In the next chapter, we consider several issues that are relevant to the performance measurement of SVC. In Chapter 3, we analyze and solve the rate allocation problem of the wavelet-based SVC. In Chapter 4, we analyze the dependency of the rate-distortion curves in the prediction structure of H.264/SVC. According to the results obtained, we also formulate and solve the rate allocation problem of H.264/SVC. Chapter 5 contains some concluding remarks.

CHAPTER 2
