Organization - Feature-Based Bit Allocation Structure for Video Coding

Chapter 1 Introduction

1.2 Organization

The rest of this thesis is organized as follows. Chapter 2 introduces the background and related work on video object segmentation and on rate control and bit allocation in the H.264 standard. Chapter 3 presents the details of our proposed object segmentation algorithm and our rate control and bit allocation strategy. Chapter 4 shows the experimental results, and Chapter 5 concludes the thesis.

Chapter 2 Background

The purpose of region-of-interest (ROI) coding is to identify the regions of a video sequence on which viewers focus and to improve the visual quality of these regions at the expense of the background. Video object segmentation is therefore first required to extract the moving objects that are likely to interest viewers. After segmentation, a bit allocation scheme distributes bits between the object regions and the background according to their characteristics. As a result, over a channel with a limited bit rate, users obtain better visual quality in the regions they are interested in.

In this chapter, we introduce related work on video object segmentation and on rate control/bit allocation. Section 2.1 reviews related work on video object segmentation. Since our rate control/bit allocation strategy is developed for H.264, Sections 2.2 and 2.3 review the H.264/MPEG-4 Part 10 standard and its rate control strategy. Section 2.4 then introduces related work on bit allocation algorithms.

2.1 Video Object Segmentation

Object segmentation has been studied extensively in the literature. Segmentation algorithms can generally be classified into two categories: change-detection-based methods and homogeneity-based methods.

The change-detection-based algorithms [4]-[6] segment objects by taking the difference between the current frame and the previous frame, and a binary mask indicating the shape and position of the moving objects is then decided with a

monitoring by combining background registration [7][8], but they are not suitable for video sequences with camera motion, such as movies.
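The core of the change-detection approach can be illustrated with a minimal sketch that thresholds the absolute frame difference to obtain a binary mask. The threshold value and the use of plain 2-D lists of luminance values are assumptions for illustration; the cited methods [4]-[6] add further steps (such as noise filtering and background registration) that are not shown here.

```python
def change_mask(prev, curr, threshold):
    """Binary change mask: 1 where |curr - prev| > threshold, else 0.

    prev and curr are equally sized 2-D lists of luminance values.
    """
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


prev = [[10, 10, 10],
        [10, 10, 10]]
curr = [[10, 60, 12],
        [10, 70, 10]]
print(change_mask(prev, curr, threshold=20))  # [[0, 1, 0], [0, 1, 0]]
```

A small change such as the 10 to 12 step stays below the threshold and is treated as noise rather than motion.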

The other category consists of homogeneity-based algorithms [9]-[12]. These algorithms segment moving objects based on the homogeneity of their color, texture, or motion information. Pixels with similar features are first grouped into small regions, and these regions are then merged into objects using additional features. However, the primary drawback of these and many other pixel-based approaches to object segmentation is the high computational cost of processing the video sequences.

Recently, several fast segmentation algorithms [13]-[15] have been proposed to segment objects in video sequences efficiently, without a large amount of computation. These algorithms operate in the compressed domain and exploit temporal and spatial features, such as the motion vectors and DCT coefficients in an MPEG bit-stream. After filtering the motion vectors and DCT coefficients, these methods use a watershed algorithm to cluster macroblocks that have similar features.

Since our system targets real-time video streaming applications, we build on the ideas of [13]-[15] and propose a simple, fast algorithm that segments objects by clustering blocks with similar motion vectors using a region growing algorithm.
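As a rough illustration of clustering blocks with similar motion vectors by region growing, the sketch below labels a block-level motion-vector field. It is not the algorithm proposed in Chapter 3: the background test (motion magnitude below `motion_thr`), the 4-connectivity, and both threshold values are assumptions chosen for the example.

```python
import math
from collections import deque


def grow_regions(mvs, motion_thr=1.0, sim_thr=1.5):
    """Label blocks of a motion-vector field by region growing.

    mvs[r][c] is the (dx, dy) motion vector of block (r, c).
    Blocks moving less than motion_thr stay background (label 0).
    A 4-connected neighbour joins a region when its motion vector differs
    from that of the block it is reached from by at most sim_thr.
    """
    rows, cols = len(mvs), len(mvs[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 1
    for r0 in range(rows):
        for c0 in range(cols):
            # Seed only from unlabelled blocks that are actually moving.
            if labels[r0][c0] or math.hypot(*mvs[r0][c0]) < motion_thr:
                continue
            labels[r0][c0] = next_label
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if not (0 <= nr < rows and 0 <= nc < cols) or labels[nr][nc]:
                        continue
                    if math.hypot(*mvs[nr][nc]) < motion_thr:
                        continue  # background block
                    du = mvs[nr][nc][0] - mvs[r][c][0]
                    dv = mvs[nr][nc][1] - mvs[r][c][1]
                    if math.hypot(du, dv) <= sim_thr:
                        labels[nr][nc] = next_label
                        queue.append((nr, nc))
            next_label += 1
    return labels


mvs = [[(0, 0), (4, 0), (4, 1)],
       [(0, 0), (4, 0), (0, 0)],
       [(-3, 2), (0, 0), (0, 0)]]
print(grow_regions(mvs))  # [[0, 1, 1], [0, 1, 0], [2, 0, 0]]
```

The two moving clusters receive different labels because they are not connected, even though both are distinguished from the static background.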

2.2 H.264/MPEG-4 Part 10

H.264 is a newer standard that promises to outperform the earlier MPEG-4 and H.263 standards by providing better compression of video images. The standard is entitled “Advanced Video Coding (AVC)” and is published jointly as MPEG-4 Part 10 and ITU-T Recommendation H.264.

2.2.1. New Features

We list some of the important features adopted in the H.264 standard below; more features are described in [3].

Variable block-size motion compensation with small block sizes: This standard supports more flexibility in the selection of motion compensation block sizes and shapes than any previous standard. A macroblock may be partitioned as one 16x16 partition, two 16x8 partitions, two 8x16 partitions, or four 8x8 partitions. If the 8x8 mode is chosen, each of the four 8x8 sub-macroblocks within the macroblock may be further split in one of four ways: one 8x8 sub-macroblock partition, two 8x4 sub-macroblock partitions, two 4x8 sub-macroblock partitions, or four 4x4 sub-macroblock partitions.
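The partition rules above can be checked by enumerating them. This sketch lists every luma partitioning of one macroblock as a list of block sizes (ignoring motion vectors and reference indices) and confirms that there are 3 + 4⁴ = 259 possible shapes, each covering the full 16x16 area.

```python
from itertools import product

# Macroblock partitions that do not use the 8x8 split, as (width, height) lists.
MB_PARTITIONS = [[(16, 16)], [(16, 8)] * 2, [(8, 16)] * 2]
# The four ways of splitting one 8x8 sub-macroblock.
SUB_PARTITIONS = [[(8, 8)], [(8, 4)] * 2, [(4, 8)] * 2, [(4, 4)] * 4]


def all_partitionings():
    """Yield every luma partitioning of a 16x16 macroblock as a list of block sizes."""
    for blocks in MB_PARTITIONS:
        yield list(blocks)
    # 8x8 mode: each of the four sub-macroblocks is split independently.
    for choice in product(SUB_PARTITIONS, repeat=4):
        yield [blk for sub in choice for blk in sub]


configs = list(all_partitionings())
print(len(configs))                                           # 259
print(all(sum(w * h for w, h in c) == 256 for c in configs))  # True
print(max(len(c) for c in configs))                           # 16
```

The largest count, 16 blocks, corresponds to splitting every 8x8 sub-macroblock into four 4x4 partitions.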

Multiple reference picture motion compensation: For motion compensation, the encoder can select reference frames from a large number of previously decoded pictures.

Weighted Prediction: This feature allows the motion-compensated prediction signal to be weighted and offset by parameters specified by the encoder.
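For a uni-predicted sample, explicit weighted prediction scales the reference sample by a weight w with a log2 denominator logWD, rounds, and adds an offset o. Below is a minimal sketch for 8-bit video, assuming the rounding form used for explicit weighted prediction in the H.264 specification as we understand it; bi-prediction, chroma, and higher bit depths are not covered, and the parameter names are our own.

```python
def weighted_pred(sample, w, o, log_wd):
    """Explicit weighted prediction of one uni-predicted 8-bit sample (sketch)."""
    if log_wd >= 1:
        val = ((sample * w + (1 << (log_wd - 1))) >> log_wd) + o
    else:
        val = sample * w + o
    return min(max(val, 0), 255)  # clip to the 8-bit sample range


# Fade example: gain 40/32 = 1.25 and offset -10.
print(weighted_pred(100, w=40, o=-10, log_wd=5))  # 115
print(weighted_pred(77, w=32, o=0, log_wd=5))     # 77 (unit gain, no offset)
```

A gain above 1 with a negative offset is the kind of correction useful for brightness changes such as fades, where a plain copy of the reference would leave a large residual.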

Directional spatial prediction for intra coding: A new technique of extrapolating the edges of the previously decoded parts of the current picture is applied in regions of pictures that are coded as intra. This improves the quality of the prediction signal and also allows prediction from neighboring areas that were not coded using intra coding.
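To make the idea of extrapolating neighboring edges concrete, the sketch below implements three of the nine 4x4 luma intra modes (vertical, horizontal, and DC) and picks the one with the smallest sum of absolute differences (SAD). It assumes that both the upper and the left neighbors are available; the remaining directional modes, the availability rules, and a rate term in the decision are omitted.

```python
def intra4x4_predict(top, left, mode):
    """Predict a 4x4 block from reconstructed neighbours.

    top:  the 4 samples directly above the block
    left: the 4 samples directly to its left
    """
    if mode == "vertical":      # copy the row above downwards
        return [list(top) for _ in range(4)]
    if mode == "horizontal":    # copy the left column rightwards
        return [[left[r]] * 4 for r in range(4)]
    if mode == "dc":            # rounded mean of the 8 neighbours
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"mode not covered by this sketch: {mode}")


def best_mode(block, top, left):
    """Return the sketched mode with the smallest SAD against the block."""
    def sad(pred):
        return sum(abs(a - b) for row_a, row_b in zip(block, pred)
                   for a, b in zip(row_a, row_b))
    return min(("vertical", "horizontal", "dc"),
               key=lambda m: sad(intra4x4_predict(top, left, m)))


top, left = [10, 20, 30, 40], [50, 60, 70, 80]
block = [[11, 19, 31, 40] for _ in range(4)]
print(best_mode(block, top, left))                  # vertical
print(intra4x4_predict(top, left, "dc")[0][0])      # 45
```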

2.2.2. Introduction to H.264 Encoder

Fig. 2-1 H.264 Encoder [1]

Fig. 2-1 shows the H.264 encoder. An input frame or field Fₙ is processed in units of a macroblock. Each macroblock is encoded in intra or inter mode and, for each block in the macroblock, a prediction PRED (marked as P in Fig. 2-1) is formed based on reconstructed picture samples. In intra mode, PRED is formed from samples in the current slice that have previously been encoded, decoded, and reconstructed (uF'ₙ). In inter mode, PRED is formed by motion-compensated prediction from one or two reference frames.

The prediction PRED is subtracted from the current block to produce a residual block Dₙ that is transformed and quantized to give X, a set of quantized transform coefficients, which are reordered and entropy encoded.

The encoder decodes a macroblock to provide a reference for further predictions.

The coefficients X are scaled (Q⁻¹) and inverse transformed (T⁻¹) to produce a difference block D'ₙ. The prediction block PRED is added to D'ₙ to create a reconstructed block uF'ₙ. A filter is applied to reduce the effects of blocking distortion, and the reconstructed reference frame is created from a series of blocks F'ₙ.
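The data flow of Fig. 2-1 can be traced on a single block. In this sketch the transform T and its inverse are replaced by the identity, and quantization is a plain uniform quantizer with step `qstep` (Python's `round`, which rounds halves to even); these are simplifications, not H.264's integer transform and scaling. What it shows is that the encoder reconstructs exactly what the decoder will see, PRED + D'ₙ, and keeps that as its reference rather than the original samples.

```python
def encode_block(cur, pred, qstep):
    """One pass of predict, quantise and reconstruct for a flattened block.

    Returns the quantised levels X and the reconstructed samples uF'.
    """
    residual = [c - p for c, p in zip(cur, pred)]        # D_n = F_n - PRED
    levels = [round(d / qstep) for d in residual]        # X (transform = identity)
    rec_residual = [x * qstep for x in levels]           # D'_n after rescaling
    recon = [p + d for p, d in zip(pred, rec_residual)]  # uF'_n = PRED + D'_n
    return levels, recon


cur, pred = [53, 57, 61, 71], [50, 50, 60, 60]
levels, recon = encode_block(cur, pred, qstep=4)
print(levels, recon)  # [1, 2, 0, 3] [54, 58, 60, 72]
```

The reconstruction differs from the input by at most half a quantization step, which is the distortion that the rate control later trades against bits.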

2.3 Rate Control for H.264

An encoder employs rate control to regulate the varying bit rate of the coded bit-stream so as to produce high-quality decoded frames at a given target bit rate. Rate control is thus a necessary part of an encoder and has been widely studied for standards such as MPEG-2, MPEG-4, and H.263 [18]-[22].

Rate-distortion optimization (RDO) aims to minimize the decoded distortion under a given rate constraint. The Lagrangian method finds the trade-off between rate and distortion efficiently. In H.264, the Lagrangian method is used for mode selection in motion compensation and intra prediction; in other words, it finds the motion vector and coding mode of a block that minimize the distortion under a given rate constraint. However, using the Lagrangian method makes rate control for JVT more difficult than for other standards [23]-[25], because the quantization parameter is used in both the rate control algorithm and RDO. This results in the following chicken-and-egg dilemma: to perform RDO for the macroblocks in the current frame, a quantization parameter (QP) must first be determined for each macroblock by using the mean absolute difference (MAD) of the current frame or macroblock [18][19]; however, the MAD of the current frame or macroblock is only available after RDO.

As described above, implementing rate control in H.264 raises two problems: (1) the MAD is unknown before RDO is performed; (2) although the MAD of each coding mode can be obtained after motion compensation, the best coding mode is still unknown, so we cannot decide which MAD should be used to estimate the QP.
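The Lagrangian mode decision, and why it ties RDO to the QP, can be sketched as follows. The distortion and rate numbers are invented for illustration, and the multiplier λ = 0.85 · 2^((QP − 12)/3) is an assumption modeled on the JM reference encoder. The best mode changes with QP, so the mode (and hence the residual whose MAD the rate control needs) depends on the very QP that rate control is trying to choose.

```python
def lambda_mode(qp):
    """Lagrange multiplier for mode decision (assumed JM-style formula)."""
    return 0.85 * 2 ** ((qp - 12) / 3)


def choose_mode(candidates, qp):
    """Return the mode minimising J = D + lambda * R.

    candidates maps a mode name to (distortion, rate_in_bits).
    """
    lam = lambda_mode(qp)
    return min(candidates, key=lambda m: candidates[m][0] + lam * candidates[m][1])


# Hypothetical (distortion, rate) pairs for one macroblock.
cands = {"16x16": (2600, 20), "8x8": (2000, 45), "intra4x4": (1500, 90)}
print(choose_mode(cands, qp=28))  # 16x16
print(choose_mode(cands, qp=16))  # intra4x4
```

In a real encoder the distortion and rate of each candidate also depend on the QP; keeping them fixed here isolates the effect of λ.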

H.264 adopts a single-pass rate control algorithm to solve the problems described above. The following sections describe the H.264 rate control scheme in detail, and Fig. 2-2 shows its elements.

Fig. 2-2 Elements of H.264 Rate Control [25]

2.3.1 Quadratic Rate Distortion Model

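The quadratic R-D model is adopted in MPEG-4 and H.264/AVC. To illustrate the rationale of the quadratic R-D model, we summarize the results of [18][19].

Assuming that the statistics of the input data are Laplacian distributed,

$$P(x) = \frac{\alpha}{2} e^{-\alpha |x|}, \quad -\infty < x < \infty$$

with the distortion measure $D(x, \hat{x}) = |x - \hat{x}|$, a closed-form solution for the R-D function is derived as

$$R(D) = \ln\frac{1}{\alpha D}, \quad 0 < D < \frac{1}{\alpha}$$

The R-D function is expanded into a Taylor series:

$$R(D) = \left(\frac{1}{\alpha D} - 1\right) - \frac{1}{2}\left(\frac{1}{\alpha D} - 1\right)^2 + R_3(D) = -\frac{3}{2} + \frac{2}{\alpha} D^{-1} - \frac{1}{2\alpha^2} D^{-2} + R_3(D)$$

Since the distortion is directly related to the quantization level, the new model is formulated as

$$R_i = X_1 Q_i^{-1} + X_2 Q_i^{-2}$$

where $R_i$ is the number of bits used for the current frame i and $Q_i$ is the quantization level used for the current frame i.

In order to consider the complexity of each frame and the overhead, including the video/frame syntax and motion vectors, the quadratic R-D model is modified as follows:

$$T_i = X_1 \frac{MAD_i}{Q_i} + X_2 \frac{MAD_i}{Q_i^2} \qquad (5)$$

where

$T_i$: total number of texture bits used for encoding the current frame i;

$MAD_i$: MAD of the current frame i, computed using the motion-compensated residual for the luminance component;

$X_1, X_2$: first- and second-order coefficients.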
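Given a target number of texture bits, the model T_i = X₁·MAD_i/Q_i + X₂·MAD_i/Q_i² becomes a quadratic equation in Q_i after multiplying by Q_i², so Q_i has a closed form. The sketch below takes the larger root and maps Q to a QP. Treating Q as the H.264 quantization step and using Qstep ≈ 0.625 · 2^(QP/6) are assumptions of the sketch (the standard's step table is close to, but not exactly, this formula); the fallback to the first-order model when X₂ = 0 or the discriminant is negative is also our choice.

```python
import math


def solve_qstep(target_bits, mad, x1, x2):
    """Solve T = X1*MAD/Q + X2*MAD/Q**2 for the quantisation step Q > 0."""
    if target_bits <= 0:
        raise ValueError("target_bits must be positive")
    # Multiply through by Q**2: T*Q**2 - X1*MAD*Q - X2*MAD = 0
    disc = (x1 * mad) ** 2 + 4 * target_bits * x2 * mad
    if x2 == 0 or disc < 0:
        return x1 * mad / target_bits  # fall back to the first-order model
    return (x1 * mad + math.sqrt(disc)) / (2 * target_bits)


def qstep_to_qp(qstep):
    """Nearest H.264 QP, using the approximation Qstep ~= 0.625 * 2**(QP / 6)."""
    qp = round(6 * math.log2(qstep / 0.625))
    return min(max(qp, 0), 51)


q = solve_qstep(target_bits=4000, mad=8, x1=1000, x2=5000)
print(round(q, 4), qstep_to_qp(q))  # 4.3166 17
```

With these numbers the equation reduces to Q² − 2Q − 10 = 0, whose positive root is 1 + √11.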

2.3.2 Terminology

A. Definition of Basic Unit

Suppose that a frame is composed of N_mbpic macroblocks. A basic unit is defined as a group of contiguous macroblocks consisting of N_mbunit macroblocks, where N_mbpic is an integer multiple of N_mbunit. Denote the total number of basic units in a frame by N_unit, which is computed by:

$$N_{unit} = \frac{N_{mbpic}}{N_{mbunit}} \qquad (6)$$

A basic unit can be, for example, a macroblock, a slice, a field, or a frame.

B. A Fluid Flow Traffic Model

Fig. 2-3 Fluid Flow Traffic Model

We now present a fluid flow traffic model to compute the target bit for the current frame. Let $N_{gop}$ denote the total number of frames in a group of pictures (GOP), and let $n_{i,j}$ ($i = 1, 2, \ldots$; $j = 1, 2, \ldots, N_{gop}$) denote the jth frame in the ith GOP,

C. A Linear Model for MAD Prediction

We now introduce a linear model to predict the MAD of the current basic unit in the current frame from the actual MAD of the basic unit in the same position of the previous frame. Suppose that the predicted MAD of the current basic unit in the current frame and the actual MAD of the basic unit in the same position of the previous frame are denoted by $MAD_{cb}$ and $MAD_{pb}$, respectively. The linear prediction model is then given by

$$MAD_{cb} = a_1 \times MAD_{pb} + a_2 \qquad (8)$$

where a₁ and a₂ are the two coefficients of the prediction model. The initial values of a₁ and a₂ are set to 1 and 0, respectively, and they are updated after each basic unit is coded.

The linear model (8) is proposed to solve the chicken-and-egg dilemma: it provides an estimate of the MAD before RDO is performed on the current basic unit.
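A minimal sketch of the linear MAD predictor (8). The text only says that a₁ and a₂ are updated after each basic unit; refitting them by ordinary least squares over a sliding window of recent (MAD_pb, actual MAD) pairs, and the window length of 20, are assumptions.

```python
class LinearMadModel:
    """MAD_cb = a1 * MAD_pb + a2, refitted by least squares after each basic unit."""

    def __init__(self, window=20):
        self.a1, self.a2 = 1.0, 0.0   # initial values given in the text
        self.window = window          # hypothetical sliding-window length
        self.history = []             # (MAD_pb, actual MAD_cb) pairs

    def predict(self, mad_pb):
        return self.a1 * mad_pb + self.a2

    def update(self, mad_pb, actual_mad):
        """Record the actual MAD of a coded basic unit and refit a1, a2."""
        self.history = (self.history + [(mad_pb, actual_mad)])[-self.window:]
        n = len(self.history)
        mx = sum(x for x, _ in self.history) / n
        my = sum(y for _, y in self.history) / n
        sxx = sum((x - mx) ** 2 for x, _ in self.history)
        if sxx == 0:
            # Previous-frame MADs are all equal: keep a1, refit the offset only.
            self.a2 = my - self.a1 * mx
            return
        sxy = sum((x - mx) * (y - my) for x, y in self.history)
        self.a1 = sxy / sxx
        self.a2 = my - self.a1 * mx


model = LinearMadModel()
for mad_pb, actual in [(10, 10), (20, 19), (30, 28)]:
    model.update(mad_pb, actual)
print(round(model.a1, 3), round(model.a2, 3))  # 0.9 1.0
print(round(model.predict(40), 3))             # 37.0
```

Before any update the model simply predicts that the co-located MAD repeats (a₁ = 1, a₂ = 0), which matches the initial values in the text.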

D. HRD Consideration

To place a practical limit on the size of the decoder buffer, a lower bound and an upper bound for the target bits of each frame are determined by considering the hypothetical reference decoder (HRD) [26]. Compliant encoders must generate bitstreams that meet the requirements of the HRD. The lower and upper bounds for the jth frame in the ith GOP are denoted by $L(n_{i,j})$ and $U(n_{i,j})$, respectively. It is also shown that the HRD requirements are met if the actual frame size always lies within the range

$$\left[L(n_{i,j}),\ U(n_{i,j})\right]$$

Let $t_r(n_{i,j})$ denote the removal time of the jth frame in the ith GOP. Also let $b(t)$ be the number of bits equivalent to a time t, with the conversion factor being the buffer arrival rate [40]. The initial values of the upper and lower bounds, and their iterative updates, are given as follows:

( ) ( ) ( ) ( )

2.3.3 Overview of the original H.264 Rate Control Scheme

Based on the concept of the basic unit and models (7) and (8), the steps of the H.264 rate control scheme are as follows:

1. Compute a target bit for the current frame by using the fluid traffic model (7), and bound it according to the HRD.

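2. Predict the MADs of the remaining basic units in the current frame by the linear model (8), using the actual MAD of the basic unit in the co-located position of the previous frame.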

3. Allocate the remaining bits to all non-coded basic units in the current frame equally.

( )

where T is the number of bits allocated for the current frame, BUMAD_i is the predicted MAD of the ith basic unit of the frame, MINVALUE is a constant, and K is the total number of basic units.

4. Compute the quantization parameter by using the quadratic R-D model (5).

5. Perform RDO for each macroblock in the current basic unit by the quantization parameter derived from step 4.
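Steps 2 and 3 can be sketched as a function that splits the remaining frame budget among the non-coded basic units. The exact form of function (11) is not given above, so two variants are shown: the equal split described in step 3 and, as an assumption, a split proportional to the squared predicted MAD (BUMAD_i) floored at MINVALUE. The value of MINVALUE here is hypothetical.

```python
MINVALUE = 4.0  # hypothetical floor so that a near-zero MAD still receives bits


def allocate_basic_units(remaining_bits, predicted_mads, weighting="mad2"):
    """Split the frame's remaining bits among its non-coded basic units.

    weighting="equal": every remaining unit gets the same share (step 3).
    weighting="mad2":  share proportional to max(MAD, MINVALUE)**2 (assumed form).
    """
    k = len(predicted_mads)
    if weighting == "equal":
        return [remaining_bits / k] * k
    weights = [max(m, MINVALUE) ** 2 for m in predicted_mads]
    total = sum(weights)
    return [remaining_bits * w / total for w in weights]


mads = [2.0, 6.0, 8.0]  # e.g. predicted by the linear model (8)
print(allocate_basic_units(9000, mads, "equal"))             # [3000.0, 3000.0, 3000.0]
print([round(b) for b in allocate_basic_units(9000, mads)])  # [1241, 2793, 4966]
```

Both variants spend exactly the remaining budget; they differ only in whether complex basic units receive a larger share.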

2.3.4 GOP Layer Rate Control

In this layer, we need to compute the total number of remaining bits for all non-coded frames in each GOP and determine the starting quantization parameter of each GOP. At the beginning of the ith GOP, the total number of bits allocated to it is computed as follows:

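$$T_r(n_{i,0}) = \frac{u(n_{i,1})}{F_r} \times N_{gop} - B_c(n_{i-1,N_{gop}})$$

where $u(n_{i,1})$ is the available channel bandwidth, $F_r$ is the frame rate, and $B_c(n_{i-1,N_{gop}})$ is the actual buffer occupancy after coding the last frame of the (i−1)th GOP.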

The starting quantization parameter of the first GOP is a predefined quantization parameter QP₀. The I-frame and the first P-frame of the GOP are coded by QP₀.

QP₀ is predefined based on the available channel bandwidth and the GOP length.

Normally, a small QP₀ should be chosen if the available channel bandwidth is high and a large QP₀ should be used if it is low.

The starting quantization parameter of other GOPs, QP_st, is computed by

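$$QP_{st} = \frac{SumPQP(i-1)}{N_p(i-1)} - \frac{8\,T_r(n_{i-1,N_{gop}})}{T_r(n_{i,0})} - \min\left\{2, \frac{N_{gop}}{15}\right\}$$

where $N_p(i-1)$ is the total number of P frames in the (i−1)th GOP and $SumPQP(i-1)$ is the sum of quantization parameters for all P frames in the previous GOP. Like QP₀, QP_st is adaptive to the GOP length and the available channel bandwidth.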
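The GOP-layer computations can be sketched as below, assuming the forms given in the JVT-G012 rate control proposal as we recall them: a GOP budget of u/F_r · N_gop minus the buffer occupancy B_c left by the previous GOP, and QP_st = SumPQP/N_p − 8·T_r(n_{i−1,N_gop})/T_r(n_{i,0}) − min{2, N_gop/15}, rounded and clipped to the H.264 QP range 0 to 51. All argument names are hypothetical, and the first GOP (which uses QP₀) is not covered.

```python
def gop_budget(bandwidth, frame_rate, n_gop, buffer_occupancy):
    """Bits for a new GOP: u / F_r * N_gop minus the occupancy left by the last GOP."""
    return bandwidth / frame_rate * n_gop - buffer_occupancy


def start_qp(sum_pqp, n_p, bits_left_prev, budget_new, n_gop):
    """Starting QP of a later GOP (assumed form), rounded and clipped to 0..51."""
    qp = sum_pqp / n_p - 8 * bits_left_prev / budget_new - min(2, n_gop / 15)
    return min(max(round(qp), 0), 51)


budget = gop_budget(bandwidth=64000, frame_rate=10, n_gop=30, buffer_occupancy=2000)
print(budget)  # 190000.0
# Previous GOP: 29 P frames with QP sum 870, 4750 bits left unspent.
print(start_qp(870, 29, 4750, budget, 30))  # 28
```

Unspent bits from the previous GOP lower the starting QP slightly, and the min{2, N_gop/15} term lowers it further for longer GOPs.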

2.3.5 Frame Layer Rate Control

The frame layer rate control scheme consists of two stages: pre-encoding and post-encoding.

2.3.5.1. Pre-Encoding Stage

A. Quantization parameters of B frames

Since B frames are not used to predict any other frame, their quantization parameters can be greater than those of the adjacent P or I frames, so that bits can be saved for the I and P frames. On the other hand, to maintain smooth visual quality, the difference between the quantization parameters of two adjacent frames should not be greater than 2.

Suppose that the number of successive B frames between two P frames is L and the quantization parameters of the two P frames are QP₁ and QP₂, respectively. The quantization parameter of the ith B frame is calculated according to the following two cases:

Case 1: L = 1, i.e., there is only one B frame between the two P frames. The quantization parameter of the B frame is computed by

Case 2: L > 1, i.e., there is more than one B frame between the two P frames. The quantization parameter of the ith B frame between the two P frames is computed by

where α is the difference between the quantization parameter of the first B frame and QP₁, and is given by where the video sequence switches from one GOP to another GOP.
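Case 1 can be sketched as follows, assuming the rule used in the JM reference encoder: if the two P frames share the same QP, the B frame uses that QP plus 2; otherwise it uses (QP₁ + QP₂ + 2)/2 with integer division. Case 2, which also involves α, is not sketched.

```python
def b_qp_single(qp1, qp2):
    """QP of the only B frame between two P frames (Case 1, L = 1), assumed rule."""
    if qp1 == qp2:
        qp = qp1 + 2               # same P-frame QPs: code the B frame 2 steps coarser
    else:
        qp = (qp1 + qp2 + 2) // 2  # otherwise the rounded-down mean, plus one
    return min(max(qp, 0), 51)


print(b_qp_single(30, 30), b_qp_single(30, 32), b_qp_single(30, 31))  # 32 32 31
```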

B. Quantization parameters of P frames

The quantization parameters of P frames are computed via the following two steps:

Step 1 Determine a target bit for each P frame.

Step 1.1 Determination of target buffer occupancy.

We predefine a target buffer level for each frame according to the frame sizes of the first I frame and the first P frame and the average complexity of previously coded frames. The target buffer level is used to compute a target bit for each P frame, which is then used to compute the quantization parameter. Since the quantization parameter of the first P frame is given at the GOP layer, we only need to predefine target buffer levels for the other P frames in each GOP.

After coding the first P frame in the ith GOP, we reset the initial value of the target buffer level as

The target buffer level for the subsequent P frames is determined by

$\bar{W}_p$ and $\bar{W}_b$ are the average complexity weights of P and B pictures, respectively.

In the case that there is no B frame between two P frames, Equation (19) can be simplified as

If the actual buffer fullness is exactly the same as the predefined target buffer level, it can be ensured that each GOP uses its own budget. However, since the rate-distortion (R-D) model and the MAD prediction model are not accurate [18][19], there is usually a difference between the actual buffer fullness and the target buffer level. We therefore need to compute a target bit for each frame to reduce the difference between the actual buffer fullness and the target buffer level.

Step 1.2 Microscopic Control (target bit rate computation).

The target bits allocated for the jth frame in the ith GOP are determined based on the target buffer level, the frame rate, the available channel bandwidth, and the actual buffer occupancy as follows:

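$$\tilde{T}(n_{i,j}) = \frac{u(n_{i,j})}{F_r} + \gamma\left(Tbl(n_{i,j}) - B_c(n_{i,j})\right)$$

where $u(n_{i,j})$ is the available channel bandwidth, $F_r$ is the frame rate, $Tbl(n_{i,j})$ is the target buffer level, $B_c(n_{i,j})$ is the actual buffer occupancy, and γ is a constant.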

The number of remaining bits should also be considered when the target bit is computed.

If the last frame is complex and uses excessive bits, more bits should be assigned to this frame. The target bit is a weighted combination of $\tilde{T}(n_{i,j})$ and the target computed from the number of remaining bits.
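The frame-layer target bit computation can be sketched as below. It assumes the commonly cited forms of the two targets: a buffer-based target u/F_r + γ(Tbl − B_c) and a remaining-bits target W_p·T_r/(W_p·N_p,r + W_b·N_b,r), combined as β·(remaining-bits target) + (1 − β)·(buffer-based target). The constants γ = β = 0.5 (the values we recall from JVT-G012 for sequences without B frames), the optional clamp to HRD-style bounds, and all parameter names are assumptions. The buffer update follows the fluid-flow idea: each coded frame adds its bits and the channel drains u/F_r bits per frame interval.

```python
def frame_target_bits(u, frame_rate, tbl, bc, remaining_bits, wp, wb,
                      np_left, nb_left, gamma=0.5, beta=0.5, bounds=None):
    """Target bits for the current P frame (sketch with assumed constants).

    u, frame_rate: channel bandwidth (bits/s) and frame rate
    tbl, bc: target buffer level and actual buffer occupancy
    remaining_bits: bits left for the GOP; wp, wb: complexity weights of P and B
    np_left, nb_left: numbers of non-coded P and B frames in the GOP
    bounds: optional (lower, upper) limits, e.g. from the HRD
    """
    t_buffer = u / frame_rate + gamma * (tbl - bc)  # buffer-based target
    t_remaining = wp * remaining_bits / (wp * np_left + wb * nb_left)
    target = beta * t_remaining + (1 - beta) * t_buffer
    if bounds is not None:
        lower, upper = bounds
        target = min(max(target, lower), upper)
    return target


def update_buffer(bc, bits_used, u, frame_rate):
    """Fluid-flow update: add the bits produced, drain u / F_r per frame."""
    return bc + bits_used - u / frame_rate


t = frame_target_bits(64000, 10, tbl=8000, bc=10000, remaining_bits=150000,
                      wp=1.0, wb=0.0, np_left=25, nb_left=0)
print(t)                                   # 5700.0
print(update_buffer(10000, t, 64000, 10))  # 9300.0
```

Because the buffer is above its target level in this example, the buffer-based term pulls the target below the nominal 6400 bits per frame.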

Step 2 Compute the quantization parameter and perform RDO.

The MAD of the current P frame is predicted by the linear model (8) using the actual MAD of the previous P frame. Then, the quantization parameter $\hat{Q}_{pc}$ corresponding to the target bit is computed by using the quadratic model (5).

The quantization parameter is then used to perform RDO for each macroblock in the current frame.

2.3.5.2. Post-Encoding Stage

Finally, there are three major tasks in this stage: updating the parameters a₁ and a₂ of the linear model (8), updating the parameters X₁ and X₂ of the quadratic R-D model (5), and determining the number of frames to be skipped.
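The post-encoding update of X₁ and X₂ can be sketched as a linear least-squares fit, since T/MAD = X₁·(1/Q) + X₂·(1/Q)² is linear in the unknowns. Plain least squares over all supplied samples (no sliding window or outlier removal) and the first-order fallback when all Q values coincide are assumptions of this sketch; the frame-skipping decision is not shown.

```python
def fit_quadratic_model(samples):
    """Least-squares fit of T/MAD = X1/Q + X2/Q**2.

    samples: (Q, texture_bits, MAD) triples from recently coded frames.
    Returns (X1, X2).
    """
    xs = [1.0 / q for q, _, _ in samples]
    ys = [t / mad for _, t, mad in samples]
    s2 = sum(x ** 2 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    b1 = sum(x * y for x, y in zip(xs, ys))
    b2 = sum(x * x * y for x, y in zip(xs, ys))
    det = s2 * s4 - s3 * s3        # normal equations [[s2, s3], [s3, s4]]
    if det <= 1e-12 * s2 * s4:     # all Q (nearly) equal: first-order fit
        return b1 / s2, 0.0
    x1 = (b1 * s4 - b2 * s3) / det
    x2 = (s2 * b2 - s3 * b1) / det
    return x1, x2


# Synthetic data generated with X1 = 1000, X2 = 5000 and MAD = 8.
samples = [(2, 14000, 8), (4, 4500, 8), (8, 1625, 8)]
x1, x2 = fit_quadratic_model(samples)
print(round(x1, 3), round(x2, 3))  # 1000.0 5000.0
```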

2.3.6 Basic Unit Layer Rate Control

When the basic unit is smaller than a frame (that is, it consists of a group of macroblocks), an additional basic unit layer rate control should be added to the scheme.

As at the frame layer, we first determine the target bit for each P frame, using the same process. The bits are then allocated to each basic unit. First, the MADs of all non-coded basic units in the current frame are predicted by the linear model (8) using the actual MAD of the basic unit in the same position of the previous frame, and the remaining bits are allocated to all non-coded basic units in the current frame by function (11) using these predicted MADs.

Then, we compute the quantization parameter of the current basic unit by using the quadratic R-D model (5), considering the following three cases:

Case 1: The quantization parameter of the first basic unit in the current frame is set to the average of the quantization parameters of all basic units in the previous frame.

Case 2: If the number of remaining bits for all non-coded basic units in the current frame is less than zero, the quantization parameter should be greater than that of the previous basic unit.

Case 3: Otherwise, the quantization parameter is computed by using the quadratic model.
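Putting the three cases together, a sketch of the basic-unit QP decision follows. The Case 2 increment (`step`), the clamp of the Case 3 result to within `max_delta` of the previous unit's QP, the treatment of a zero target like Case 2, and the Qstep ≈ 0.625 · 2^(QP/6) mapping are all assumptions not stated in the text; `target_bits` stands for the target of the current basic unit obtained from the bit allocation.

```python
import math


def basic_unit_qp(unit_index, prev_frame_qps, prev_unit_qp, remaining_bits,
                  target_bits, mad, x1, x2, step=1, max_delta=2):
    """QP of the current basic unit following the three cases in the text.

    step (Case 2 increment) and max_delta (Case 3 clamp) are hypothetical values.
    """
    if unit_index == 0:                          # Case 1: first unit of the frame
        return round(sum(prev_frame_qps) / len(prev_frame_qps))
    if remaining_bits < 0 or target_bits <= 0:   # Case 2 (a zero target is treated alike)
        return min(prev_unit_qp + step, 51)
    # Case 3: solve T = X1*MAD/Q + X2*MAD/Q**2 for Q, then map Q to a QP.
    disc = (x1 * mad) ** 2 + 4 * target_bits * x2 * mad
    if x2 == 0 or disc < 0:
        qstep = x1 * mad / target_bits
    else:
        qstep = (x1 * mad + math.sqrt(disc)) / (2 * target_bits)
    qstep = max(qstep, 0.625)                    # QP 0 is the finest step
    qp = round(6 * math.log2(qstep / 0.625))     # Qstep ~= 0.625 * 2**(QP / 6)
    qp = min(max(qp, prev_unit_qp - max_delta), prev_unit_qp + max_delta)
    return min(max(qp, 0), 51)


print(basic_unit_qp(0, [28, 30, 29], None, 0, 0, 0, 0, 0))   # 29
print(basic_unit_qp(3, [], 30, -500, 0, 8, 1000, 5000))      # 31
print(basic_unit_qp(3, [], 17, 9000, 4000, 8, 1000, 5000))   # 17
print(basic_unit_qp(3, [], 25, 9000, 4000, 8, 1000, 5000))   # 23
```

In the last call the model alone would choose QP 17, but the clamp keeps the change from the previous unit (QP 25) small, trading rate accuracy for smoother quality within the frame.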

Finally, RDO and the updates of the parameters of the linear model and the quadratic model are performed in the same way as at the frame layer.

2.4 Bit Allocation Strategy

In the previous section, we introduced the rate control strategy of H.264. Many other schemes have been proposed to improve it.

Pan et al. [28] proposed a new scheme for the bit allocation of each P frame to further improve the perceptual quality of the reconstructed video. A new least-mean-square estimation method for the R-D model parameters was developed by Nagn et al. [29]. However, in these schemes the target bit estimation, an important factor in determining the quantization parameter (QP), distributes bits to every basic unit equally without considering the complexity of the frame, which results in poor target bit estimates for different frames.

In [30][31], Ling et al. proposed a modified algorithm that uses a more accurate frame complexity measure to allocate bits. Since the MAD predicted by the linear model (8) is not very accurate, Yu et al. [32] used a measure named the motion complexity of the frame to distribute more bits to high-motion scenes. However, these methods only allocate more bits to complex frames, which yields only a generally better quality over the whole frame.

Since the human visual system (HVS) is more sensitive to moving regions, it is worthwhile to sacrifice the quality of the background regions in order to enhance that of the moving regions. Some research on region/content-based rate control has been reported [33][34]. These works adopt a heuristic approach to decide the quantization parameters for different regions in a frame: the region of interest (ROI) obtains a finer quantizer, and a coarser quantizer is used for the non-ROI. These methods
