

Chapter 2 Background

2.4 Bit Allocation Strategy

In the previous section, we introduced the rate control strategy in H.264. Many other schemes have been proposed to improve it.

Pan et al. [28] proposed a new scheme for the bit allocation of each P frame to further improve the perceptual quality of the reconstructed video, and a new least-mean-square estimation method for the R-D model parameters was developed by Ngan et al. [29]. However, these target bit estimation schemes, which are an important factor in determining the quantization parameter (QP), distribute bits to every basic unit equally without considering frame complexity, and this results in poor target bit estimates for different frames.

In [30][31], Ling et al. proposed a modified algorithm that uses a more accurate measure of frame complexity to allocate bits. Since the predicted MAD calculated by linear model (8) is not very accurate, Yu et al. [32] used a measure called the motion complexity of the frame to distribute more bits to high-motion scenes. However, these methods only try to allocate more bits to complex frames, which merely improves the overall quality of the whole frame.

Since the human visual system (HVS) is more sensitive to moving regions, it is worthwhile to sacrifice the quality of background regions in order to enhance that of the moving regions. Several works on region/content-based rate control have been reported [33][34]. They adopt a heuristic approach to decide the quantization parameters for the different regions in a frame: the region of interest (ROI) is given a finer quantizer, while a coarser quantizer is used for non-ROI regions. However, these methods [33][34] simply set the quantizers to constants and do not take the content of each region into consideration, which may cause improper QPs and an unreasonable bit distribution among regions. Other improved algorithms therefore try to adjust these factors adaptively. Lai et al. [35] proposed a scheme that uses a region-weighted rate-distortion model to calculate different QPs for different regions. Sun et al. [36] also proposed a scheme that allocates bits to foreground and background by utilizing a weighting function for the different regions. However, these algorithms [33]-[36] only use fixed values or a simple region-based weighting scheme to assign quantization parameters to these regions.

In [37][38], algorithms that take account of the size, motion, and priority of the foreground and background regions have been proposed. However, these methods adjust the quality of the foreground/background by treating the whole foreground as one part. Since there may be multiple objects in the foreground region, we propose an algorithm that utilizes the features of the individual objects to further differentiate the quality of these object regions.

Chapter 3

Motion-based Object Segmentation and Feature-based Bit Allocation Scheme

In this chapter, we present our methods for video object segmentation and rate control. In section 3.1, we first give a quick overview of the whole scheme. In section 3.2, we present the object segmentation algorithm, and in section 3.3, the bit allocation strategy for the background and the foreground objects.

3.1 Overview

Our proposed scheme contains two parts: the video object segmentation part and the bit allocation part. Since we focus on uncompressed video input sources, the object segmentation algorithm is only used with inter-coded frames. In the beginning, we use a multi-resolution algorithm to find the motion vectors. At the coarsest level, we establish an object mask and an object set using the coarse motion vectors generated by the motion estimation module. As the multi-resolution algorithm refines the motion vectors at every finer level, we also use these finer motion vectors to update our object mask and object set. The object set is then used by the bit allocation module. The bit allocation strategy uses the information of the objects to judge the importance of the foreground objects and the background, and different numbers of coding bits are then allocated to these regions to preserve the visual quality of the foreground objects. The flow of the whole system is illustrated in Fig. 3-1.

Fig. 3-1 System Overview

3.2 Motion-based Video Segmentation Algorithm

The video segmentation algorithm takes the raw video data directly as input, segments the object regions, and extracts the object mask for subsequent processing. A multi-resolution pyramid structure is adopted to find motion vectors and to segment objects by utilizing the motion vectors iteratively. In section 3.2.1, we present the multi-resolution motion estimation algorithm, and in section 3.2.2, the object localization algorithm. The algorithms for updating object regions and for the morphological operations are presented in sections 3.2.3 and 3.2.4, respectively.

3.2.1. Multi-Resolution Motion Estimation

To reduce the computational load of segmentation, a multi-resolution motion estimation algorithm is applied. The multi-resolution algorithm is chosen for its pyramid structure, its robustness, and its improvements over one-level schemes. Since motion clustering is time-consuming, we can utilize the iterative pyramid structure to decrease the complexity by generating a rough mask at the coarsest level and refining it at each finer level.

In the following, we will present the details of the multi-resolution motion estimation scheme that has been used in our system.

Fig. 3-2 Multi-Resolution frame structure

3.2.1.1 Multi-Resolution Frame Structure

The multi-resolution motion estimation we applied is a simple method. First we decompose the input frame into a three-layer pyramid by the following sub-sampling function:

I_k^(l+1)(i, j) = (1/4) Σ_{m=0}^{1} Σ_{n=0}^{1} I_k^(l)(2i + m, 2j + n)        (24)

where I_k^(l+1)(i, j) represents the intensity value at position (i, j) of the kth frame at level l + 1. The number of pixels at each upper level is thus reduced to one fourth of that of the level below. The multi-resolution frame structure is illustrated in Fig. 3-2.
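As an illustration, the sub-sampling in (24) can be sketched in Python; this is our own minimal sketch (a frame is modeled as a 2-D list of intensities, and the function names are ours, not from the reference software):

```python
def subsample(frame):
    """Average each non-overlapping 2x2 block to build the next pyramid
    level, as in sub-sampling function (24)."""
    h, w = len(frame), len(frame[0])
    return [[(frame[2 * i][2 * j] + frame[2 * i][2 * j + 1] +
              frame[2 * i + 1][2 * j] + frame[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(w // 2)]
            for i in range(h // 2)]

def build_pyramid(frame, levels=3):
    """Level 0 is the original frame; each upper level has 1/4 the pixels."""
    pyramid = [frame]
    for _ in range(levels - 1):
        pyramid.append(subsample(pyramid[-1]))
    return pyramid
```

For a 352 × 240 SIF frame this yields levels of 352 × 240, 176 × 120, and 88 × 60.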

The MB size becomes 16 × 16, 8 × 8, and 4 × 4 at levels 0, 1, and 2, respectively.

The sum of absolute differences (SAD) is used as the matching criterion for finding the best motion vector within a given search range.
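The SAD criterion itself is straightforward; a minimal sketch follows (the block position, displacement, and block-size parameters are our own naming):

```python
def sad(cur, ref, bx, by, dx, dy, size=16):
    """Sum of absolute differences between the block of `cur` at (bx, by)
    and the block of `ref` displaced by motion vector (dx, dy)."""
    total = 0
    for i in range(size):
        for j in range(size):
            total += abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
    return total
```

The motion vector minimizing this value over the search range is selected as the match.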

3.2.1.2 Motion Search Framework

1) Search at Level 2: We choose two candidates, i.e., {MV1^(1), MV2^(1)}, based on the spatial correlation in motion vector fields as well as the minimum SAD, and employ them as initial search centers at level 1. MV1^(1), the candidate with the minimum SAD, is found by a full search within a search range SR2, which is derived from w, the search range predefined by the encoder. MV2^(1) is predicted from the adjacent motion vectors at level 0 via a component-based median predictor.

2) Search at Level 1: Local searches are performed around the two candidates in order to find a motion vector candidate for the search at level 0.

3) Search at Level 0: The final motion vector is found by a local search around the candidate passed down from level 1.
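The three-level flow above can be summarized as follows. This is a simplified sketch under our own assumptions (a full search at the coarsest level, a ±1 local refinement at each finer level, and a generic `cost` callback standing in for the SAD of a candidate); it is not the exact search of the reference software:

```python
def full_search(cost, center, search_range):
    """Evaluate every displacement within +/-search_range of `center`
    and return the motion vector with the minimum cost (e.g. SAD)."""
    candidates = [(center[0] + dx, center[1] + dy)
                  for dx in range(-search_range, search_range + 1)
                  for dy in range(-search_range, search_range + 1)]
    return min(candidates, key=cost)

def multires_search(cost_per_level, w):
    """Full search at the coarsest level (assumed range w // 4), then a +/-1
    local refinement at each finer level; the vector is doubled when moving
    down one level because each level halves the resolution."""
    mv = full_search(cost_per_level[2], (0, 0), w // 4)
    for level in (1, 0):
        mv = (mv[0] * 2, mv[1] * 2)
        mv = full_search(cost_per_level[level], mv, 1)
    return mv
```

The coarse full search is cheap because level 2 has only 1/16 of the pixels, while the finer levels need only small local refinements.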

3.2.2. Object Localization

At the coarsest level, after multi-resolution motion estimation, the object localization algorithm is used to locate potential objects in a video sequence for the subsequent object-based bit allocation. Initially, we check each frame for camera motion and compensate the motion vectors with the global motion if camera motion is detected. Otherwise, noisy motion vectors are eliminated directly without motion compensation.

Subsequently, motion vectors of similar magnitude and direction are clustered together, and each group of associated macroblocks with similar motion vectors is regarded as an object. An overview of the object localization algorithm is shown in Fig. 3-3.

Fig. 3-3 Object localization algorithm

3.2.2.1. Global Motion Estimation

To correctly locate the positions of objects, global motion (camera motion), such as panning, zooming, and rotation, should be estimated and compensated. In this section, a fast and simplified global motion detection algorithm is proposed.

Many global motion estimation algorithms have been proposed, based on motion models with two (translational model), four (isotropic model), six (affine model), eight (perspective model), or twelve parameters (parabolic model). They can be classified into three types: frame matching, differential techniques, and feature-point-based algorithms.

Since all the model-based methods need heavy computation, we propose a simple algorithm that calculates the global motion using histograms to reduce the complexity.

The histograms of the magnitude and direction of the motion vectors are computed to acquire the dominant motion magnitude and the dominant motion direction, which are used to identify whether global motion (pan or tilt) occurs. With this histogram-based dominant motion computation, we avoid the computationally expensive matrix multiplications that are required when motion vectors are fitted to a motion model.

The magnitude and direction of the camera motion are obtained with the equations below:

DMH_i = argmax_j N(Bin_{j,i}) over the magnitude histogram
DAH_i = argmax_j N(Bin_{j,i}) over the direction histogram
SDMH_i = N(Bin_{DMH-1,i}) + N(Bin_{DMH,i}) + N(Bin_{DMH+1,i})
SDAH_i = N(Bin_{DAH-1,i}) + N(Bin_{DAH,i}) + N(Bin_{DAH+1,i})

where DMH_i and DAH_i are the dominant magnitude and the dominant direction of the motion vector histograms, respectively, SDMH_i is the summation of the three bins (Bin_{DMH-1,i}, Bin_{DMH,i}, Bin_{DMH+1,i}) of the magnitude histogram of the ith frame, SDAH_i is the summation of the three bins (Bin_{DAH-1,i}, Bin_{DAH,i}, Bin_{DAH+1,i}) of the direction histogram of the ith frame, and N(Bin_{j,i}) denotes the count of the jth bin in the ith frame.

In the ideal situation, the macroblocks in an object would all have the same motion magnitude and direction. However, even though an entire object moves in the same direction, some regions in the object may have slightly different motion magnitudes and directions, because objects in the real world are not rigid in shape and size. Consequently, to tolerate motion estimation errors, the values of Bin_{DMH-1,i}, Bin_{DMH,i}, and Bin_{DMH+1,i} of the magnitude histogram are summed to examine whether SDMH_i is larger than the threshold, and the values of Bin_{DAH-1,i}, Bin_{DAH,i}, and Bin_{DAH+1,i} of the direction histogram are summed to obtain SDAH_i. If SDMH_i and SDAH_i are both larger than the threshold T_global, global motion is detected, and DMH_i and DAH_i are identified as the magnitude and direction of the camera motion in the ith frame. The motion vectors are then compensated with the magnitude and direction of the global motion for further processing.
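A minimal sketch of this histogram test follows; the fixed-width bin layout, the bin count, and the value ranges are our assumptions for illustration, not values taken from the thesis:

```python
def detect_global_motion(magnitudes, directions, t_global, n_bins=16):
    """Histogram-based global-motion check: find the dominant magnitude and
    direction bins, sum each dominant bin with its two neighbours, and report
    camera motion only when both 3-bin sums exceed t_global."""
    def dominant(values, lo, hi):
        width = (hi - lo) / n_bins
        hist = [0] * n_bins
        for v in values:
            hist[min(int((v - lo) / width), n_bins - 1)] += 1
        d = max(range(n_bins), key=hist.__getitem__)   # dominant bin index
        s = sum(hist[j] for j in (d - 1, d, d + 1) if 0 <= j < n_bins)
        return lo + (d + 0.5) * width, s               # bin centre, 3-bin sum

    dmh, sdmh = dominant(magnitudes, 0.0, 16.0)        # assumed magnitude range
    dah, sdah = dominant(directions, -180.0, 180.0)    # directions in degrees
    if sdmh > t_global and sdah > t_global:
        return dmh, dah    # magnitude and direction of the camera motion
    return None
```

No matrix arithmetic is involved; the cost is linear in the number of motion vectors.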

3.2.2.2. Object Clustering

We use a region-growing approach to cluster macroblocks whose motion vectors have similar magnitude and direction, and each group of associated macroblocks with similar motion vectors is regarded as an object. The detailed algorithm is presented in the following.

Object Localization Algorithm

Input: Coarsest layer of the input frame

Output: Object set {Obj1, Obj2, …, Objn}, where n is the total number of objects in the frame. The size of each object is measured as the number of macroblocks it contains, and the centroid of the object is calculated by averaging the coordinates of all macroblocks inside the object region.

Step 1. Analyze the motion vectors of the inter-coded macroblocks in a frame to see whether there is any camera motion.

Step 2. If there is no global motion, go to step 3. If global motion is detected, the motion vectors that are not noisy are compensated with the camera motion magnitude and direction.

Step 3. Cluster motion vectors that are of similar magnitude and direction into the same group with region growing approach.

Step 3.1 Set the search window (W) size to 3 × 3 macroblocks.

Step 3.2 Search all macroblocks within W, and compute the differences (diffMag_k and diffAng_k) between the magnitude |MV| and direction ∠MV of the center motion vector MV_center and those of its eight neighboring motion vectors MV_k within W:

diffMag_k = | |MV_center| - |MV_k| |
diffAng_k = | ∠MV_center - ∠MV_k |

where MV_center is the motion vector at the center position of W and MV_k, k ∈ [1, 8], are the motion vectors within W other than MV_center.

For all k ∈ [1, 8], the flag F_k is set as:

F_k = 1, if diffMag_k ≤ T_Mag and diffAng_k ≤ T_Ang
F_k = 0, otherwise

where T_Mag is the predefined threshold for the motion vector magnitude and T_Ang is the threshold for the motion vector direction.

If Σ_{k=1}^{8} F_k ≥ 6, mark the flag F_center of MV_center as 1, where F_center is the flag of the center motion vector within W. Otherwise, set all flags within W to 0.

Step 3.3 Go to step 3.2 until all macroblocks are processed.

Step 3.4 Group macroblocks that are marked as 1 into the same cluster.

Step 3.5 Compute each object center and record its associated macroblocks.

Step 3.6 Generate one object set for each P-frame.
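The clustering in Step 3 can be sketched as follows. This simplified version flags a macroblock when at least 6 of its 8 neighbors have similar motion, then groups flagged macroblocks by 4-connectivity; the connectivity choice and the omission of the "set all flags within W to 0" rule are our simplifications:

```python
def localize_objects(mv, t_mag=1.0, t_ang=30.0):
    """`mv[r][c]` is a (magnitude, direction) pair per macroblock.
    Returns a list of clusters, each a list of (row, col) positions."""
    rows, cols = len(mv), len(mv[0])
    flag = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):          # 3x3 window needs interior cells
        for c in range(1, cols - 1):
            similar = sum(
                1
                for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
                and abs(mv[r][c][0] - mv[r + dr][c + dc][0]) <= t_mag
                and abs(mv[r][c][1] - mv[r + dr][c + dc][1]) <= t_ang
            )
            if similar >= 6:              # the sum-of-flags rule of Step 3.2
                flag[r][c] = 1
    objects, seen = [], set()
    for r in range(rows):                 # group flagged blocks into clusters
        for c in range(cols):
            if flag[r][c] and (r, c) not in seen:
                stack, cluster = [(r, c)], []
                seen.add((r, c))
                while stack:
                    y, x = stack.pop()
                    cluster.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and flag[ny][nx] and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
                objects.append(cluster)
    return objects
```

Each returned cluster corresponds to one object; its size and centroid (Steps 3.5 and 3.6) follow directly from the member positions.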

3.2.3. Update Object Regions in Finer Level

While the multi-resolution motion estimation algorithm iteratively refines the motion vectors of every macroblock in a frame at each finer level, the rough object mask generated at the coarsest level is also refined using these refined motion vectors. The details of the refining algorithm are presented as follows.

Object Sets Refining Algorithm

Input: Object set {Obj1, Obj2, …, Objn}.

Output: Refined object set {Obj1, Obj2, …, Objn}, where n is the total number of refined objects in the frame. The object size, the dominant motion vector magnitude/direction, and the centroid are measured as the number of macroblocks within the object, the average motion vector magnitude/direction, and the average of the macroblock coordinates, respectively.

Step 1. Calculate the motion vector magnitude and direction of the centroid macroblock.

Step 2. Search all macroblocks within the object region, and compute the differences diffMag and diffAng of the motion vector magnitude and direction between the centroid and each of these macroblocks. A block is excluded from the object if both diffMag > T_Mag and diffAng > T_Ang, where T_Mag and T_Ang are the predefined thresholds.

Step 3. Go to step 2 until all macroblocks are processed.

Step 4. Generate the object mask from the reformed object set, then refine the object mask by employing the morphological operations and regenerate the object set from the refined mask.
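The exclusion rule of Step 2 can be sketched as a small helper (the names and data layout are ours):

```python
def refine_object(obj_mvs, centroid_mv, t_mag=1.0, t_ang=30.0):
    """Drop a macroblock from the object only when BOTH its magnitude and
    its direction differ from the centroid's beyond the thresholds, per the
    rule in Step 2. `obj_mvs` pairs a position with its (magnitude, angle)."""
    kept = []
    for pos, (mag, ang) in obj_mvs:
        diff_mag = abs(mag - centroid_mv[0])
        diff_ang = abs(ang - centroid_mv[1])
        if not (diff_mag > t_mag and diff_ang > t_ang):
            kept.append(pos)
    return kept
```

Note that a block differing in only one of the two measures is kept, which tolerates non-rigid object motion.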

3.2.4. Morphological Operation

To smooth the boundaries of the regions of interest and remove noisy blocks, two kinds of morphological operations are used. The closing operation is first applied to fill the block holes inside the object mask, and the opening operation is then applied to remove the small noisy blocks that do not belong to the moving objects. In our algorithm, a structuring element of size 3 × 3 is used for both the closing and the opening operations.

After the morphological operations, the object mask is refined and indicates the shapes and the positions of all the moving objects in the current frame. Then, the individual objects can be extracted to generate the new object set.
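With a 3 × 3 structuring element, the closing and opening on the block-level binary mask can be sketched in pure Python (the clipped-window handling at the mask border is our assumption):

```python
def _transform(mask, hit):
    """Apply a 3x3 sliding-window test to a binary mask (list of lists)."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [mask[r + dr][c + dc]
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                      if 0 <= r + dr < rows and 0 <= c + dc < cols]
            out[r][c] = 1 if hit(window) else 0
    return out

def dilate(mask):  return _transform(mask, any)
def erode(mask):   return _transform(mask, all)
def closing(mask): return erode(dilate(mask))   # fills small holes
def opening(mask): return dilate(erode(mask))   # removes isolated noise blocks
```

Closing followed by opening thus yields a mask with filled interiors and without one-block noise, as described above.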

3.3 Feature-based Bit Allocation Strategy

Our proposed bit allocation method is based on the characteristics of the object regions, which include size and motion. In order to make a more accurate bit distribution, we will allocate the bits in the frame level first.

3.3.1. Frame Level Rate Control

It is well known that the MAD of the residual component is a good indication of encoding complexity, and in the quadratic R-D model the encoding complexity is usually represented by the MAD. To distribute the bits among different frames, we adopt the scheme in [30].

3.3.1.1. Measure of Frame Encoding Complexity

A MAD ratio is used to measure the complexity of a frame; it is the ratio of the predicted MAD of the current frame to the average MAD of all previously encoded frames. The MAD ratio of the ith frame is calculated as follows:

MADratio_i = MAD_i / ( (1 / (i - 1)) Σ_{j=1}^{i-1} MADavg_j )        (32)

where MAD_i is calculated by linear model (8), MADavg_j is the average MAD of the jth previously coded frame, and (i - 1) is the total number of previously coded frames.
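Equation (32) reduces to a one-line helper (illustrative naming):

```python
def mad_ratio(predicted_mad, previous_mads):
    """MAD ratio per (32): the predicted MAD of the current frame divided
    by the average MAD of all previously coded frames."""
    return predicted_mad / (sum(previous_mads) / len(previous_mads))
```

A ratio above 1 indicates a frame more complex than the running average.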

3.3.1.2. Adaptive Target Bit Estimation Control

We use the MAD ratio to control the target bit estimation for the frame. The bit distribution is scaled by a function of MADratio. The initial target bits Tr for a frame are adjusted as shown in the following pseudo code:

Calculate the average MAD of all previously inter-coded frames;
Calculate MADratio = (predicted MAD of the current frame) / (average MAD);
IF (MADratio < 0.9) THEN
    Tr = Tr * 0.5
ELSE IF (MADratio < 1.0) THEN
    Tr = Tr * MADratio * 0.6
ELSE IF (MADratio < 1.8) THEN
    Tr = Tr * MADratio * 0.7
ELSE IF (MADratio >= 1.8) THEN
    Tr = Tr * 1.8

The basic idea is to set Tr smaller if the current frame complexity is low and larger if it is high. The objective is to save bits on frames with relatively low complexity and to allocate more bits to frames with higher complexity due to high motion or scene changes.
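The pseudo code above maps directly to a small function:

```python
def adjust_target_bits(tr, mad_ratio):
    """Scale the initial frame target Tr by the MAD ratio: fewer bits for
    low-complexity frames, more (capped at 1.8x) for high-complexity ones."""
    if mad_ratio < 0.9:
        return tr * 0.5
    elif mad_ratio < 1.0:
        return tr * mad_ratio * 0.6
    elif mad_ratio < 1.8:
        return tr * mad_ratio * 0.7
    else:  # mad_ratio >= 1.8
        return tr * 1.8
```

For example, a frame whose predicted MAD is half the running average receives only half of its initial target, while a scene change capped at 1.8× receives almost twice as many bits.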

3.3.2. Macroblock Rate Control

At the macroblock level, a content-based bit allocation strategy is used in our scheme. We propose an approach in which the bit allocation to every region is determined by the characteristics of the different image regions, namely the object region size and the object's dominant motion.

‧Size: First, bit allocation is governed by the sizes of the object regions and the background region. The normalized size of the ith object region is determined by

S_f^i = N_f^i / N

and, for the background,

S_b = N_b / N

where N_f^i is the total number of macroblocks in the ith foreground object, N_b is the total number of macroblocks in the background, N_f is the total number of macroblocks in the foreground, and N is the total number of macroblocks in a frame.

‧Motion: Bit allocation is also performed according to the activity of each object region, which is measured by its motion. The normalized motion parameter for each object is derived as

M_i = |MV_dominant^i| / Σ_k |MV_dominant^k|

where MV_dominant^i is the dominant motion magnitude of the ith object.
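The two normalizations can be combined into one helper; this is a sketch, and the function and parameter names are ours:

```python
def region_features(object_mb_counts, total_mbs, dominant_motions):
    """Normalized sizes S_f^i = N_f^i / N (plus S_b for the background) and
    normalized motions M_i = |MV_dominant^i| / sum_k |MV_dominant^k|."""
    n_fg = sum(object_mb_counts)
    sizes = [n / total_mbs for n in object_mb_counts]
    size_bg = (total_mbs - n_fg) / total_mbs
    motion_sum = sum(dominant_motions)
    motions = [m / motion_sum for m in dominant_motions]
    return sizes, size_bg, motions
```

Both feature sets sum to at most 1, so they can be weighted directly when splitting the frame budget among regions.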

Based on the above characteristics, the amount of bits can be assigned to foreground objects and background region as follows:

Method 1: If S_b > TH_b, where TH_b is the threshold for the ratio of the number of macroblocks in the background to the number of macroblocks in the frame, then the bit allocation is performed as follows:

where MAD_b and MAD_f are the predicted MADs of the background and foreground macroblocks, respectively, Q_best is the quantization level determined by the QP of this frame as calculated in section 2.3.5.1, α_p denotes the portion of the bits that the background transfers to the foreground, and ω_S and ω_M are the respective weights of the size and motion parameters, with ω_S + ω_M = 1.

3.3.3. Post-Encoding Process

After encoding, the encoder updates the R-D model based on the encoding results. The first and second model parameters, X1 and X2, are updated using the linear regression technique [40], and the buffer fullness is updated after encoding with the fluid flow traffic model (7).

Chapter 4

Experimental Results

In this chapter, we present the experimental results and give some discussion. We describe the experiment environment in section 4.1 and list the results in section 4.2.

4.1 Experiment Environment

In this thesis, we implemented the object segmentation algorithm and the bit allocation method by modifying the H.264 reference software JM 9.5 [39]; the original version was used for comparison purposes. All experiments were conducted on a PC with an Intel Pentium 4 2.4 GHz CPU and 256 MB of RAM.

Our experimental work uses the following approach:

1) The first frame is intra-coded and all other frames are P-frames.

2) Only the 16 × 16 macroblock type is used.

3) The original version also adopts the multi-resolution motion estimation.

4.2 Experimental Results

We experimented with five sequences: “Football” and “Stefan” in SIF format (352 × 240) and “Foreman”, “Mother and daughter”, and “Hall” in CIF format (352 × 288). According to the sequence type, we encoded the Football and Stefan sequences at 500 kbps because of their high motion, Foreman and Mother and daughter at 100 kbps because of their obvious foreground regions, and the Hall sequence at 50 kbps because of its static scene with small foreground objects.

                       Original Version                  Modified Version
            AVG PSNR   AVG PSNR   Bitrate     AVG PSNR   AVG PSNR   Bitrate
            FG (dB)    BG (dB)    (kbps)      FG (dB)    BG (dB)    (kbps)
Football      23.73      25.52     513.33       25.73      24.44     557.58
Stefan        28.77      29.94     510.48       30.17      28.47     541.58
Foreman       27.73      27.87     111.01       28.85      26.82     114.47
Mother and
daughter      33.53      36.48     109.23       34.47      35.66     110.48
Hall          24.94      32.30      51.37       26.05      30.99      51.26

Table 4-1 Encoding results for the five sequences with the JM original version and the modified version

First, for a sequence that has only one obvious object, we use the CIF sequence “Foreman” as an example. We encoded the sequence at 100 kbps and compared the original version without bit allocation against the modified version with our proposed method. First, comparing the object quality, we can see in Fig. 4-1(a) that our method improves the average quality of the foreground object region. Second, comparing the overall quality, the average PSNR of the foreground is improved by 1.12 dB in Table 4-1, whereas the background quality is degraded by 1.05 dB. Finally, by comparing the two encoded images shown in Fig. 4-2, we can

