Consistent Picture Quality Control Strategy for Dependent Video Coding


Kao-Lung Huang and Hsueh-Ming Hang, Fellow, IEEE

Abstract—Typically, a video rate control algorithm minimizes the average distortion (denoted as MINAVE) at the cost of large temporal quality variation, especially for videos with high motion and frequent scene changes. To alleviate the negative effect on subjective video quality, another criterion that restricts a small amount of quality variation among adjacent frames is preferred for practical applications. As pointed out by [20], although some existing proposals can produce consistent quality videos, they often fail to fully utilize the available bits to minimize the global total distortion. In this paper, we would like to achieve the triple goal of consistent quality video, minimizing the total distortion, and meeting the bit budget strictly all at the same time on the interframe dependent coding structure. Two approaches are taken to accomplish this goal. In the first algorithm, a trellis-based framework is proposed. One of our contributions is to derive an equivalent condition between the distortion minimization problem and the budget minimization problem. Second, our trellis state (tree node) is defined in terms of distortion, which facilitates the consistent quality control. Third, by adjusting one key parameter in our algorithm, a solution in between the MINAVE and the constant quality criteria can be obtained. The second approach is to combine the Lagrange multipliers method together with the consistent quality control. The PSNR performance is degraded slightly but the computational complexity is significantly reduced. Simulation results show that both our approaches produce a much smaller PSNR variation at a slight average PSNR loss as compared to the MPEG JM rate control. When they are compared to the other consistent quality proposals, only the proposed algorithms can strictly meet the target bit budget requirement (no more, no less) and produce the largest average PSNR at a small PSNR variation.

Index Terms—Bit allocation, consistent quality control, H.264, quality smoothing, video rate-distortion control.

I. INTRODUCTION

VIDEO coding technologies have been progressing very fast in the past two decades. The MPEG-4 AVC/H.264 is the latest international video coding standard [1], [2]. To achieve the optimal rate-distortion (R-D) performance on a specific coding structure, the so-called rate control algorithms are proposed to determine the best quantization parameter (QP) for a coding unit (which can be a macroblock (MB) or a frame) and these algorithms should also prevent the buffer(s) from underflow or overflow in the environment of a constant bit rate (CBR) channel or a variable bit rate (VBR) channel. In this paper, we will focus on the frame-level bit rate control over a CBR channel.

Manuscript received April 08, 2008; revised December 03, 2008. First published March 24, 2009; current version published April 10, 2009. This work was supported in part by the National Science Council (Taiwan, R.O.C.) under Grant NSC 91-2219-E009-011. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Srdjan Stankovic.

K.-L. Huang is with the Department of Electrical Engineering, National Chiao-Tung University, Hsinchu, 30010, Taiwan, R.O.C. (e-mail: [email protected]).

H.-M. Hang is with the Department of Electrical Engineering, National Chiao-Tung University, Hsinchu, 30010, Taiwan, R.O.C., and also with the Department of Computer Science and Information Engineering, National Taipei University of Technology, Taipei 10061, Taiwan, R.O.C. (e-mail: [email protected]. tw).

Digital Object Identifier 10.1109/TIP.2009.2014259

Many types of rate control algorithms have been developed [3]. The rate control problem in video coding becomes highly complicated mainly due to the motion estimator and other interframe operations in the system. The parameters selected in encoding the current frame affect the parameter selection in the subsequent frames, which results in the so-called dependent coding structure [4]. Two approaches are taken to solve this problem. A sub-optimal approach simplifies the original formulation by adopting the independent coding structure, which picks the best parameters for the current frame without considering their effects on the future frames, for example, the method in [5]. Many practical one-pass or two-pass algorithms belong to this category and they include R-D models such as the classical statistical model [6], [7], the quadratic model [8], and the rho-domain model [9]. The R-D optimality is not guaranteed in this approach because of the unavailability of future frames. The other approach is to reduce the complexity of the parameter search so that a near-optimal solution can be obtained at a reasonable computational cost. Mathematically, the exhaustive search for the optimal frame QPs in a group of pictures (GOP) is equivalent to finding the optimal path in a tree. Potentially, this approach can identify the globally optimal solution, which cannot be accomplished by the first sub-optimal approach. However, the tree grows exponentially as more pictures are coded. Therefore, several methods are proposed to reduce the search complexity, for example, the monotonicity assumption [10], the node clustering [11], the steepest descent search [12], and the interframe R-D model [13]. The first two methods are adopted in this paper and will be elaborated in Section III.

There are two commonly used optimization criteria in designing a rate-control algorithm: minimum average distortion (the MINAVE criterion) and minimum maximum distortion (the MINMAX criterion). The MINAVE criterion [14] aims at minimizing the total distortion under a given bit budget. This optimal goal is widely adopted and is well studied in the literature. Examples are the algorithms mentioned in the previous paragraph. However, this MINAVE goal is often attained at the expense of a possibly larger frame-to-frame quality variation. From the perspective of the human visual system (HVS), a video sequence with nearly constant quality or consistent quality is more desirable [15]. Therefore, the MINMAX criterion [14] is proposed to minimize the maximal distortion for a given bit budget. Coupled with the dependent coding structure, achieving the MINMAX target becomes a very complicated issue. Several methods have been proposed such as dynamic programming [14], lexicographic algorithm [16], low-pass filtering of rate-distortion functions [17], minimum distortion variation [13], [18] and an iterative frame bit allocation algorithm [19]. Often, the final results produced by these proposals do not achieve strictly the global MINMAX target. They typically produce videos with a slowly varying quality, or in other words, with a consistent quality, and this is practically all we need.

Extensions of the one-pass algorithms are frequently adopted by the methods cited in the previous paragraph. However, their assumption of similar image statistics in nearby frames does not hold for videos with high motion and scene changes. Also, the one-pass approach may fail to achieve consistent quality on fast motion sequences due to limited bit budget and unavailability of future frames [18]. Furthermore, these methods often decrease the frame-to-frame quality variation without paying attention to the total distortion. Therefore, a hybrid MINMAX/MINAVE method was suggested to increase the overall quality after finding the MINMAX solution [20].

In this paper, we tackle the dependent MINAVE and consistent-quality problems simultaneously. More specifically, we would like to achieve the consistent quality goal across the entire sequence and, in the meantime, to meet the target bit rate accurately and to minimize the total distortion. The tradeoff between average distortion and consistent quality is controlled by one key parameter, namely, the maximal quality variation constraint. One method to solve the above optimization problem with a finite parameter set is the dynamic programming approach. By adopting the monotonicity and clustering concepts, the tree structure in the dependent video coding is converted into a trellis diagram. Thus, the Viterbi algorithm can be employed to find the truly optimal solution in this dependent coding problem. The trellis state (tree node) is defined in terms of distortion to facilitate the consistent quality control. In addition, a fast technique is proposed to decrease the computation in the branch expansion process. By adjusting the key parameters in our scheme such as cluster size, we can decrease the computational complexity at the cost of minor performance loss.

A second method is proposed based on the Lagrange multipliers. To ensure the global optimality on the dependent coding platform, an iterative scheme is designed to find the best lambda parameter (Lagrange multiplier) in the Lagrange cost or Lagrangian. This algorithm backtracks many times to narrow down a valid range containing the optimal $\lambda$. Then, the best $\lambda$ value is identified by a fast search algorithm. This scheme runs much faster than the trellis-based approach. Its performance is close to but slightly lower than that of the trellis-based approach.

Despite the optimality of both methods suggested in this paper, their real-time implementation is still beyond the current hardware capability. Thus, the proposed algorithms may be more suitable for off-line applications such as DVD playback when video quality is the major concern. We implement our algorithm on the new and very efficient H.264 coder and evaluate its performance.

This paper is organized as follows. In Section II, we introduce the rate-control problem in video coding and derive an equivalent condition between the distortion minimization problem and the budget minimization problem. Two proposed algorithms are described in Section III: a) the trellis-based algorithm with Viterbi search and b) a Lagrangian-based iterative algorithm with bisection search. Section IV presents the simulation results to show the effectiveness of our algorithm. These results are compared with existing MINAVE and MINMAX schemes. Also, the effect of control parameters on PSNR and complexity is studied. Section V summarizes the findings and their limitations in this paper.

II. PROBLEM FORMULATION AND DISTORTION-RATE FUNCTION

The frame-level bit allocation problem and the uniqueness property of the distortion-rate function are described in this section. In our selected structure, we encode a frame and all its macroblocks using the same QP. The notion of quality in this paper is the widely adopted objective image quality criterion, PSNR.

A. Dependent MINAVE Bit Allocation Problem

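In the (forward prediction) dependent coding formulation, the $i$th frame distortion and bits, i.e., $D_i(q_1, \ldots, q_i)$ and $R_i(q_1, \ldots, q_i)$, depend on the current and previous frame QP values. Let $Q$ be the set of quantization parameter values in the H.264 standard video codec. Given $N$ frames and a total bit budget $R_T$, our goal is to minimize the overall distortion by choosing the optimal frame-level QP values $\mathbf{q}^* = (q_1^*, \ldots, q_N^*)$ for all $N$ frames, where $q_i \in Q$, for $i = 1, \ldots, N$. That is

$$\mathbf{q}^* = \arg\min_{q_1, \ldots, q_N \in Q} \sum_{i=1}^{N} D_i(q_1, \ldots, q_i)$$

subject to the constraints

$$\sum_{i=1}^{N} R_i(q_1, \ldots, q_i) \le R_T \quad \text{and} \quad \left| \mathrm{PSNR}(D_i) - \mathrm{PSNR}(\bar{D}) \right| \le \Delta, \quad i = 1, \ldots, N \tag{1}$$

where $\bar{D} = \frac{1}{N} \sum_{i=1}^{N} D_i$ is the average distortion for all frames and $\mathrm{PSNR}(\cdot)$ is the PSNR function calculated by $\mathrm{PSNR}(D) = 10 \log_{10}\left(255^2 \cdot \mathrm{FPN} / D\right)$, where FPN is the pixel number in a frame. The second constraint in (1) is added to achieve the consistent quality video; that is, the difference between the frame PSNR and the average sequence PSNR is limited by $\Delta$.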
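As an illustration of the quality constraint in (1), the following minimal Python sketch evaluates the PSNR function and checks whether every frame stays within a given variation of the sequence average. It assumes that the distortion is a per-frame sum of squared errors (which is what the FPN factor suggests); the function names, the QCIF frame size, and the sample distortion values are hypothetical and serve only as an example.

```python
import math

def psnr(sse, fpn, peak=255.0):
    """PSNR in dB of a frame whose distortion is a sum of squared errors (assumed)."""
    return 10.0 * math.log10(peak * peak * fpn / sse)

def is_consistent(frame_sse, fpn, delta_db):
    """Check |PSNR(D_i) - PSNR(D_avg)| <= delta for every frame i."""
    avg_psnr = psnr(sum(frame_sse) / len(frame_sse), fpn)
    return all(abs(psnr(d, fpn) - avg_psnr) <= delta_db for d in frame_sse)

FPN = 176 * 144                                   # luma pixels in a QCIF frame
sse = [FPN * 10.0, FPN * 11.0, FPN * 9.5]         # hypothetical frames (MSE 10, 11, 9.5)

print(round(psnr(sse[0], FPN), 2))
print(is_consistent(sse, FPN, delta_db=0.5))
print(is_consistent(sse, FPN, delta_db=0.3))
```

A tighter variation limit rejects the same sequence that a looser limit accepts, which is the tradeoff controlled by the maximal quality variation constraint.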

Another important function of a rate control algorithm is to avoid the buffer underflow and overflow problems. The MPEG standard imposes a hypothetical decoder model on a legal bit stream, namely, the Video Buffering Verifier (VBV). There are three prescribed operation modes in VBV. In this study, we consider only the constant bit rate (CBR) mode; i.e., the channel rate is constant. We assume that the decoder buffer is large enough to eliminate the buffer overflow problem. In more detail, the buffer is initially empty. To avoid the buffer underflow problem, bits in the decoder buffer accumulate for a specific time before the bits of the first frame are removed. Afterwards, the decoder buffer continues receiving constant-rate bits from the channel and the decoder removes the bits in the buffer at regular frame-time intervals. Essentially, the buffer underflow problem imposes a delay on the decoder. For frame $i$, the buffer occupancy is

$$B_i = B_0 + (i - 1) R_c - \sum_{j=1}^{i-1} R_j \tag{2}$$

where $B_0$ is the initial buffer occupancy and $R_c =$ channel bit rate/frame rate. The buffer underflow is avoided if the constraint $B_i \ge R_i$ for all $i$ is satisfied. In other words, the decoder delay shall be selected to ensure that the buffer contains at least $R_i$ bits when the decoder starts decoding frame $i$, for all $i = 1, \ldots, N$.
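The underflow condition can be sketched in a few lines of Python. The sketch assumes the occupancy model $B_i = B_0 + (i-1) R_c - \sum_{j<i} R_j$, measured just before the bits of frame $i$ are removed; the function names and the frame sizes are hypothetical.

```python
def buffer_occupancy(frame_bits, initial_bits, bits_per_frame_time):
    """Occupancy B_i just before frame i is removed, for i = 1..N."""
    occupancy, removed = [], 0
    for i, bits in enumerate(frame_bits, start=1):
        occupancy.append(initial_bits + (i - 1) * bits_per_frame_time - removed)
        removed += bits
    return occupancy

def min_initial_occupancy(frame_bits, bits_per_frame_time):
    """Smallest B_0 such that B_i >= R_i holds for every frame."""
    need, removed = 0, 0
    for i, bits in enumerate(frame_bits, start=1):
        removed += bits
        # B_i >= R_i  <=>  B_0 >= sum_{j<=i} R_j - (i - 1) * R_c
        need = max(need, removed - (i - 1) * bits_per_frame_time)
    return need

frame_bits = [9000, 2000, 2500, 4000]   # hypothetical coded frame sizes in bits
rc = 3200                               # e.g. 96 kbps at 30 frames/s
b0 = min_initial_occupancy(frame_bits, rc)
print(b0, buffer_occupancy(frame_bits, b0, rc))
print("decoder delay (s):", b0 / (rc * 30))
```

Under this model the decoder delay is simply the time the channel needs to deliver the initial occupancy.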

B. Uniqueness of Distortion-Rate Function

In the conventional MINAVE problem, we minimize the total distortion subject to a given bit constraint. However, in order to achieve a consistent quality video, it is more convenient if the distortion, not the bit rate, is the controllable argument in our process. That is, we prefer $R(D)$ rather than $D(R)$.

In classical information theory, the distortion-rate function, $D(R)$, is a nonincreasing, convex function and its slope must be both nonpositive and nondecreasing. Then, the rate-distortion function, $R(D)$, the inverse of $D(R)$, is a legal nonincreasing, convex function too. As a result, the solution to the problem of minimizing $D$ subject to $R \le R_T$ is identical to that to the problem of minimizing $R$ subject to $D \le D^*$, where $R_T$ is the bit budget and $D^*$ is the minimum distortion attained in the first problem. However, these ideal properties of the rate-distortion function may not hold for the real-data case. Therefore, we study the relation of these two solutions in the operational sense and derive the proposition as follows.

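Proposition: Given a rate-distortion coder with control parameters $\mathbf{q}$ of discrete and finite values, we consider the operational $D(R)$ and $R(D)$ functions. In other words, $D(R)$ is the achievable distortion for the given bit rate $R$; $R(D)$ is similarly defined. Then, the optimal solution, $\mathbf{q}_D^*$, to the minimum distortion problem, i.e., $\min_{\mathbf{q}} D(\mathbf{q})$ subject to $R(\mathbf{q}) \le R_T$, is also the optimal solution, $\mathbf{q}_R^*$, to the minimum budget problem, i.e., $\min_{\mathbf{q}} R(\mathbf{q})$ subject to $D(\mathbf{q}) \le D(\mathbf{q}_D^*)$, if the optimal distortion function $D^*(R) = D(\mathbf{q}^*(R))$ is a one-to-one mapping, where $\mathbf{q}^*(R)$ is the solution set to the minimum distortion problem at the given budget of $R$ bits.

Proof: Since $\mathbf{q}_R^*$ is the optimal solution to the minimum budget problem, it implies $D(\mathbf{q}_R^*) \le D(\mathbf{q}_D^*)$. On the other hand, $\mathbf{q}_D^*$ is the optimal solution (least amount of distortion) to the minimum distortion problem, thus $D(\mathbf{q}_D^*) \le D(\mathbf{q}_R^*)$ (note that $\mathbf{q}_R^*$ also meets the budget, as shown next). Consequently, we have $D(\mathbf{q}_D^*) = D(\mathbf{q}_R^*)$. The optimal solution of the minimum distortion problem implies $R(\mathbf{q}_D^*) \le R_T$. In addition, $\mathbf{q}_R^*$ is the optimal solution (least amount of bits) of the minimum budget problem; it thus implies $R(\mathbf{q}_R^*) \le R(\mathbf{q}_D^*)$. Consequently, we have $R(\mathbf{q}_R^*) \le R(\mathbf{q}_D^*) \le R_T$.

Now, if $D^*(R)$ is a one-to-one function, the relation $R(\mathbf{q}_R^*) = R(\mathbf{q}_D^*)$ must be true because $D(\mathbf{q}_R^*) = D(\mathbf{q}_D^*)$. Therefore, the solutions to these two problems, $\mathbf{q}_D^*$ and $\mathbf{q}_R^*$, are identical if $D^*(R)$ is a one-to-one function.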
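The role of the one-to-one condition can be checked by brute force on a small set of operating points. The sketch below is only an illustration with made-up (distortion, bits) pairs: when two operating points share the same distortion, the minimum distortion search and the minimum budget search may return different points; when every distortion value is attained at a single rate, they agree.

```python
def min_distortion(points, budget):
    """Operating point of least distortion among those with bits <= budget."""
    feasible = [p for p in points if p[1] <= budget]
    return min(feasible, key=lambda p: p[0])

def min_budget(points, max_distortion):
    """Operating point of least bits among those with distortion <= max_distortion."""
    feasible = [p for p in points if p[0] <= max_distortion]
    return min(feasible, key=lambda p: p[1])

# Hypothetical (distortion, bits) pairs; two points share distortion 8.
tied = [(10, 100), (8, 150), (8, 120), (6, 200)]
d_sol = min_distortion(tied, budget=160)
r_sol = min_budget(tied, max_distortion=d_sol[0])
print(d_sol, r_sol)          # same distortion, different bits

# Every distortion value occurs at exactly one rate.
one_to_one = [(10, 100), (8, 120), (7, 150), (6, 200)]
d_sol2 = min_distortion(one_to_one, budget=150)
r_sol2 = min_budget(one_to_one, max_distortion=d_sol2[0])
print(d_sol2, r_sol2)        # identical operating point
```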

III. CONSISTENT QUALITY CONTROL ALGORITHM

Two approaches are chosen to solve the interframe dependent coding problem in this study. We start with the trellis-based approach. First, the tree structure inherent in dependent coding is reduced to the trellis structure. Then, the branch expansion process is described and the Viterbi search is used to solve the bit allocation problem. Next, a fast branch expansion algorithm extended from a previous proposal is presented. In the last subsection, we propose the Lagrange multipliers approach. An iterative structure is designed for finding the optimal lambda value in the Lagrange cost. To speed up this iterative process, a couple of the existing but independently proposed fast schemes are included with proper modifications.

A. Trellis Representation of the Tree Structure

In the dependent coding structure, the current frame distortion and bits depend not only on the current QP but also on the previous frame QPs. Given 52 possible QP values, there are 52 possible coded pictures (each coded using a different QP value) for the first frame. Each coded picture is associated with a (distortion, rate) pair after coding. Each of them leads to 52 possible second-frame pictures. Therefore, there are in total $52^2 = 2704$ possible pictures (or states, nodes in a tree) for the second picture. The picture (or state) number grows exponentially as more frames are coded. All the possible picture sequences thus form a tree structure. The computational complexity of finding the optimal solution in a tree becomes a serious issue.

Two approaches were suggested to reduce the growing number of states. The state pruning technique was proposed by [10] and the state clustering approach was proposed by [11]. In the first approach [10], the state in a tree is denoted by the accumulated frame coded bits. The theoretical basis of state pruning is the “monotonicity” assumption that a better current coding frame will lead to a more efficient coding in the future [10]. Although this monotonicity condition is not always guaranteed as pointed out by [21], our experimental results indicate this assumption is typically true. Therefore, the Markovian condition (the future optimal path depends only on the current state, not the previous one) is created. The Viterbi Algorithm (VA) can thus be applied. As a result, when multiple branches arrive at the same state, only one branch of the minimal accumulated distortion is selected as the survivor and the complexity is largely reduced. The second complexity reduction approach adopts the notion of “cluster” [11], which merges a few neighboring nodes (states) into one cluster because these nodes (in one cluster) have similar states (buffer level in [11]) and thus lead to similar final results.

We adopt both the concepts of monotonicity and cluster in this study. However, for the quality variation control purpose in this study, the distortion value (represented by PSNR) is used as the state variable. In addition, because the PSNR value is a real number, the problem of infinite states occurs in this formulation. Therefore, a cluster representing a distinct range of PSNR values is defined as a state. The cluster size parameter $\Delta_c$ is used to define the span of a cluster. To convert the tree structure into a trellis, it is necessary to restrict the dynamic range of admissible PSNR. It is set by the lowest quality, denoted by $P_{\min}$, and the highest quality, denoted by $P_{\max}$. This range should include all the PSNR values in the optimal solution and is chosen empirically. Consequently, the number of states equals $(P_{\max} - P_{\min}) / \Delta_c$. Because there are only a finite number of states, the tree structure degenerates into the trellis structure. The concept of cluster was proposed in [11] to reduce the tree search complexity; it is extended in this study for the purposes of both defining finite states and reducing complexity.

Fig. 1. Illustration of the cluster, node, and branch definitions and the associated parameters.

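The rest is the detailed description of our trellis structure. Fig. 1 illustrates the relation among the cluster, node, branch, and the associated parameters.

• Cluster: The notation $C_i^k$ represents a cluster with index $k$ at stage (frame) $i$, where $k = 1, \ldots, K$ and $i = 1, \ldots, N$. Here $K = (P_{\max} - P_{\min}) / \Delta_c$ is the number of clusters defined by the admissible PSNR range $[P_{\min}, P_{\max}]$ and the cluster size $\Delta_c$. The $k$th cluster PSNR range is $[P_{\min} + (k - 1) \Delta_c,\; P_{\min} + k \Delta_c)$. A cluster may contain a number of nodes in it. The best performing node (in the R-D sense) inside a cluster is chosen to be the representative node of this cluster.

• Node: A node $n_i^k(D_i^{\mathrm{acc}}, R_i^{\mathrm{acc}})$ represents a legal operating point of the coding result, whose PSNR value is in the $k$th cluster at frame $i$, where $D_i^{\mathrm{acc}} = \sum_{j=1}^{i} D_j$ and $R_i^{\mathrm{acc}} = \sum_{j=1}^{i} R_j$. $D_i^{\mathrm{acc}}$ and $R_i^{\mathrm{acc}}$ are the accumulated coded distortion and bits before encoding frame $i + 1$, respectively.

• Branch: A branch connects two nodes in the trellis diagram. The notation $b_i^{k,l}$ indicates that it stems from the representative node in cluster $k$ at frame $i$ and it ends at a node in cluster $l$ at frame $i + 1$. It uses $q_{i+1}$ to quantize frame $i + 1$. It produces a next stage node $n_{i+1}^l(D_{i+1}^{\mathrm{acc}}, R_{i+1}^{\mathrm{acc}})$, where $D_{i+1}^{\mathrm{acc}} = D_i^{\mathrm{acc}} + D_{i+1}$ and $R_{i+1}^{\mathrm{acc}} = R_i^{\mathrm{acc}} + R_{i+1}$, if the three conditions, $P_{\min} \le \mathrm{PSNR}(D_{i+1}) \le P_{\max}$, $R_{i+1}^{\mathrm{acc}} \le R_T$ (the total bit budget), and $|\mathrm{PSNR}(D_{i+1}) - \mathrm{PSNR}(\bar{D})| \le \Delta$ (the quality variation constraint), are all satisfied. A rate-distortion pair $(D_{i+1}, R_{i+1})$ is associated with this branch. Note that the average sequence PSNR value is not available until the end of the encoding process. It is thus approximated by its current value.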
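The mapping from a frame PSNR to its trellis state can be sketched as follows. This is a minimal illustration that assumes half-open clusters of equal width starting at the lowest admissible PSNR, with the top edge of the range assigned to the last cluster; the function names and the numeric range are hypothetical.

```python
import math

def num_clusters(p_min, p_max, cluster_size):
    """Number of trellis states covering the admissible PSNR range."""
    return math.ceil((p_max - p_min) / cluster_size)

def cluster_index(psnr_db, p_min, p_max, cluster_size):
    """1-based cluster index of a PSNR value, or None if it is out of range."""
    if not (p_min <= psnr_db <= p_max):
        return None
    k = int((psnr_db - p_min) // cluster_size) + 1
    # The upper edge of the range belongs to the last cluster.
    return min(k, num_clusters(p_min, p_max, cluster_size))

print(num_clusters(30.0, 40.0, 0.5))
print(cluster_index(34.2, 30.0, 40.0, 0.5))
print(cluster_index(29.9, 30.0, 40.0, 0.5))
```

A smaller cluster size gives more states and a finer search, at the cost of more survivors to expand per frame.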

B. Branch Expansion and Frame-Level Bit Allocation

Let two nodes $n_i^k$ and $n_{i+1}^l$ be connected by a branch $b_i^{k,l}$. In the branch expansion process for node $n_i^k$, all the QPs satisfying the following three constraints are examined (that is, they are used to quantize data in frame $i + 1$): a) PSNR range $P_{\min} \le \mathrm{PSNR}(D_{i+1}) \le P_{\max}$, b) bit budget $\sum_{j=1}^{i+1} R_j \le R_T$, and c) quality variation $|\mathrm{PSNR}(D_{i+1}) - \mathrm{PSNR}(\bar{D})| \le \Delta$. The previous frame QP value, $q_i$, is selected to be the center QP value, denoted by $\mathrm{QP}_c$, and the examined QPs are expanded from the center value gradually by $\mathrm{QP}_c \pm s$, where the step index $s$ is incremented by one until any of the above constraints is violated.
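The outward expansion from the center QP can be sketched as below. The sketch assumes that each direction stops independently at the first QP that violates a constraint, which is one possible reading of the stopping rule; `is_valid` stands in for the three constraint checks and is a hypothetical callback.

```python
def expand_qps(center_qp, is_valid, qp_min=0, qp_max=51):
    """Return the QPs examined around center_qp, in the order center, +1, -1, +2, -2, ..."""
    accepted = [center_qp] if is_valid(center_qp) else []
    open_dirs = {+1: True, -1: True}
    step = 1
    while any(open_dirs.values()):
        for sign in (+1, -1):
            if not open_dirs[sign]:
                continue
            qp = center_qp + sign * step
            if qp_min <= qp <= qp_max and is_valid(qp):
                accepted.append(qp)
            else:
                open_dirs[sign] = False   # stop expanding in this direction
        step += 1
    return accepted

# Hypothetical constraint check: only QPs 26..31 satisfy all three constraints.
examined = expand_qps(28, lambda qp: 26 <= qp <= 31)
print(examined)
```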

The first frame (I frame) in a sequence is by default the first active node. In the following frames (P frames), the number of branches and nodes grows exponentially if they are not eliminated or merged. Adopting the cluster concept allows nodes with similar distortion to be merged. A cluster containing at least one expanding node is called an active cluster. When a small cluster size (in dB) is in use, typically only the branch of least accumulated bits and its associated node will be the single survivor in this cluster. The survivor node in an active cluster is defined as an active node. The “monotonicity assumption” enables the elimination of weaker branches (branches with higher bit rates) ending at the same node (cluster). That is, in the backtracking process, only the active node with the smallest total distortion and permissible bit usage is selected. Therefore, the goal of minimizing the total distortion is achieved.

To accomplish the consistent quality video goal, a small value (in dB) of the quality variation constraint is usually adopted. Overall, the proposed quality control algorithm is summarized below.

Algorithm 1: Trellis-Based Consistent Quality Control (TCQC) Algorithm

Step 1: Initialize the values of the admissible PSNR range $[P_{\min}, P_{\max}]$, the cluster size $\Delta_c$, and the quality variation constraint $\Delta$.

Step 2: Encode the first I frame using all quantization values. Prune the branches that violate any of the two constraints: the PSNR range $P_{\min} \le \mathrm{PSNR}(D_1) \le P_{\max}$ and the bit budget $R_1 \le R_T$.

Step 3: If multiple branches merge at the same destination cluster, select the branch with the least accumulated bits and its corresponding node becomes the survivor. At the end of this step, each cluster contains only one active node, which is connected to only one surviving branch. Save the context information of the survivor nodes.

Step 4: Expand all active nodes for the next I- or P-frame. Encode the next frame (frame $i + 1$) using all allowable quantization scales. Prune the branches which violate any of the three constraints: the PSNR range $P_{\min} \le \mathrm{PSNR}(D_{i+1}) \le P_{\max}$, the bit budget $\sum_{j=1}^{i+1} R_j \le R_T$, and the quality variation constraint $|\mathrm{PSNR}(D_{i+1}) - \mathrm{PSNR}(\bar{D})| \le \Delta$.

Step 5: If the current frame is not the last frame in the sequence, go to Step 3. Otherwise, among all active clusters, choose the survivor node with the best overall quality as the final solution. Backtrack along the optimal path connecting to the starting frame of this sequence. We thus obtain the optimal frame-level QP and bits for each frame. This sequence is then done.
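The steps above can be sketched on a toy problem. The following Python code is only an illustration under stated assumptions: `toy_encode` is a made-up dependent R-D model standing in for the real encoder, distortion is tracked as MSE, the sequence-average PSNR is approximated by the average over the frames coded so far, and all names and numbers are hypothetical. It keeps one survivor per PSNR cluster (least accumulated bits) and picks the final survivor with the least total distortion.

```python
import math

def toy_encode(complexity, qp, prev_psnr):
    """Hypothetical dependent R-D model (not from the paper): a better
    reference frame slightly raises PSNR and lowers the bits."""
    psnr = 56.0 - 0.6 * qp + 0.05 * (prev_psnr - 36.0)
    bits = complexity * 2.0 ** (-qp / 6.0) * (1.0 - 0.01 * (prev_psnr - 36.0))
    return psnr, bits

def mse_from_psnr(psnr):
    return 255.0 ** 2 / 10.0 ** (psnr / 10.0)

def tcqc(complexities, qps, budget, p_min, p_max, cluster, delta):
    # One survivor per cluster: (acc_bits, acc_mse, frame_psnrs, qp_path).
    survivors = {0: (0.0, 0.0, [], [])}
    for complexity in complexities:
        expanded = {}
        for acc_bits, acc_mse, psnrs, path in survivors.values():
            prev = psnrs[-1] if psnrs else 36.0
            for qp in qps:
                psnr, bits = toy_encode(complexity, qp, prev)
                new_bits = acc_bits + bits
                new_mse = acc_mse + mse_from_psnr(psnr)
                # Average PSNR approximated over the frames coded so far.
                avg = 10.0 * math.log10(255.0 ** 2 * (len(psnrs) + 1) / new_mse)
                if not (p_min <= psnr <= p_max):       # PSNR range
                    continue
                if new_bits > budget:                   # bit budget
                    continue
                if abs(psnr - avg) > delta:             # quality variation
                    continue
                k = int((psnr - p_min) // cluster)      # destination cluster
                if k not in expanded or new_bits < expanded[k][0]:
                    expanded[k] = (new_bits, new_mse, psnrs + [psnr], path + [qp])
        survivors = expanded
    # Final choice: the survivor with the least total distortion.
    return min(survivors.values(), key=lambda s: s[1]) if survivors else None

result = tcqc([400000, 120000, 150000, 300000, 120000], range(20, 41),
              budget=45000, p_min=34.0, p_max=42.0, cluster=0.5, delta=1.0)
print(result[3] if result else "no feasible path")
```

Storing the QP path inside each survivor replaces the explicit backtracking step; a real implementation would instead save the encoder context of each survivor node.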


C. Fast Branch Expansion Process

Generally, a complete video encoding process is executed whenever a branch is expanded. In the MPEG JM reference software, the coding parameter selection is done by two components: the rate-control algorithm and the rate distortion optimization (RDO) process. This 2-stage coder control structure is well recognized for its efficiency for a highly complicated hybrid video coder such as H.264. But the RDO process is costly in computation. The RDO process needs a QP input value for its operation and it outputs the coding modes, distortion, header bits, and residual signals. On the other hand, a typical rate-control algorithm needs the mode information, among other data, to pick the best QP for quantizing the current MB or frame. Therefore, these two components depend on each other for supplying their inputs, a chicken and egg problem [22]. Let $\mathrm{QP}_1$ and $\mathrm{QP}_2$ denote the QPs used by the RDO process and the quantization process, respectively. The initial value of $\mathrm{QP}_1$ is generally not equal to the value of $\mathrm{QP}_2$. Therefore, an iterative procedure has been proposed for updating $\mathrm{QP}_1$ after the first set of $\mathrm{QP}_1$ and $\mathrm{QP}_2$ is obtained [22].

It is reported that the coding PSNR loss is less than 0.2 dB when $|\mathrm{QP}_2 - \mathrm{QP}_1| \le 3$ [22]. When a frame is encoded twice using two sets of QP values, namely, $(\mathrm{QP}_1, \mathrm{QP}_2)$ and $(\mathrm{QP}_1, \mathrm{QP}_2')$, separately, we run RDO only once with $\mathrm{QP}_1$. Using the aforementioned property, the same RDO outputs are used for quantization in both cases, $\mathrm{QP}_2$ and $\mathrm{QP}_2'$, if $\mathrm{QP}_2$ and $\mathrm{QP}_2'$ are sufficiently close to $\mathrm{QP}_1$. We thus save one RDO computation.

To lower the approximation error, we restrict the approximation range by $|\mathrm{QP}_2 - \mathrm{QP}_1| \le 2$. The fast branch expansion process now runs as follows. First, the current frame is encoded using the center QP defined in Section III-B, i.e., $\mathrm{QP}_1 = \mathrm{QP}_2 = \mathrm{QP}_c$. Then, the upper and lower two branch expansions can be easily generated by performing the quantization processes four times with $\mathrm{QP}_2 = \mathrm{QP}_c \pm 1$ and $\mathrm{QP}_c \pm 2$. As a result, five branch expansions are generated at the cost of computing one RDO process and five quantization processes. If more branch expansions are needed, another complete video encoding process, with a new $\mathrm{QP}_1$ above or below the range already covered, is needed. Finally, to prevent the approximation error propagation to the next stage, a complete video encoding process, i.e., running RDO and quantization with the chosen final QP, is executed again for each active cluster.
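The saving from sharing RDO outputs can be illustrated by grouping the quantization QPs into windows that reuse one RDO run each. The sketch below assumes windows of five QPs tiled around the center QP, so that every quantization QP stays within 2 of the RDO QP it reuses; this tiling rule and the function name are assumptions made for the example and are not taken from the paper or from the JM software.

```python
def plan_rdo_runs(center_qp, qps, reach=2):
    """Map each RDO QP to the quantization QPs that reuse its output."""
    width = 2 * reach + 1
    plan = {}
    for qp in sorted(qps, key=lambda q: abs(q - center_qp)):
        window = round((qp - center_qp) / width)   # index of the window holding qp
        rdo_qp = center_qp + window * width         # RDO QP at the window center
        plan.setdefault(rdo_qp, []).append(qp)
    return plan

plan = plan_rdo_runs(28, range(24, 34))
for rdo_qp, quant_qps in plan.items():
    print(rdo_qp, quant_qps)
print("RDO runs:", len(plan), "quantization passes:", sum(map(len, plan.values())))
```

In this example ten branch expansions need three RDO runs instead of ten, while each quantization QP remains within the restricted approximation range.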

D. Technique Based on the Lagrange Multipliers

Another optimization technique, the so-called Lagrange multipliers method, can also be used to find the optimal operation point on the rate-distortion curve [11]. We define the Lagrange cost to be $J(\lambda) = \sum_{i=1}^{N} D_i + \lambda \sum_{i=1}^{N} R_i$. The goal becomes

$$\mathbf{q}^*(\lambda) = \arg\min_{q_1, \ldots, q_N \in Q} J(\lambda) \tag{3}$$

It is well-known that the optimal solution to the minimizing distortion problem with budget constraint, denoted by $\mathbf{q}^*$, is equivalent to that of minimizing the Lagrange cost, $\mathbf{q}^*(\lambda)$ in (3), with $\lambda = \lambda^*$ [11]. The key step in finding the optimal solution is to identify $\lambda^*$, the optimal value of lambda. In general, this optimal solution can be iteratively solved [11]. However, in this study, we impose two additional constraints: the consistent quality constraint, $|\mathrm{PSNR}(D_i) - \mathrm{PSNR}(\bar{D})| \le \Delta$, and the PSNR range constraint, $[P_{\min}, P_{\max}]$. We develop an iterative process to solve this new and more complex problem as follows.

First, the budget constraint of $\sum_{i=1}^{N} R_i \le R_T$ is relaxed. We intend to find a proper lambda range, denoted by $[\lambda_l, \lambda_u]$, such that the solution to the problem in (3) with a lambda located inside this range shall satisfy all three constraints: the quality variation limit $\Delta$, the lower PSNR bound $P_{\min}$, and the upper PSNR bound $P_{\max}$. Therefore, the optimal lambda value is guaranteed to locate in the selected range. Next, a fast bisection algorithm in [23] is employed to find the solution to the problem in (1). That is, the lambda search process iterates until the predefined bit rate tolerance, i.e., $|\sum_{i=1}^{N} R_i - R_T| \le \epsilon$, is satisfied. And the (optimal) QPs are a byproduct in this process.

In the following, we describe how the constraints are satisfied in the aforementioned process of finding the lambda range. For a given frame, we examine only the valid QP values that satisfy the quality constraint, $|\mathrm{PSNR}(D_i) - \mathrm{PSNR}(\bar{D})| \le \Delta$. The picture coding process is similar to the fast branch expanding step described earlier. To satisfy the other two constraints, the lower bound $P_{\min}$ and the upper bound $P_{\max}$, we start with two initial lambda values, $\lambda_a$ and $\lambda_b$, that bound the search interval. Then, the center value in the current lambda interval is used as the test lambda to determine whether the solution to the problem in (3) satisfies the PSNR range constraint. If the current average PSNR is lower than $P_{\min}$, a smaller lambda should be used, and, thus, the lower subinterval is selected as the lambda interval for the next iteration. Equivalently, the test lambda value is decreased in the next iteration. On the other hand, if the current average PSNR is larger than $P_{\max}$, the upper subinterval is selected as the lambda interval for the next iteration, which increases the test lambda value in the next iteration. We check the average PSNR value whenever a frame is coded. If either of the above conditions happens, we need to re-encode the video sequence from the first frame again using the new lambda range. This process continues until the chosen $\lambda$ leads to a successful coding of the entire video sequence. At the end, if the resulting bits are smaller than the bit budget, the latest test lambda value is referred to as $\lambda_u$. The same process is performed in the lambda interval below $\lambda_u$ to obtain the $\lambda_l$ value, but note that the obtained value shall satisfy $\sum_{i=1}^{N} R_i \ge R_T$. Theoretically, if the values of $P_{\min}$, $P_{\max}$, and $\Delta$ are properly selected (so that the optimal solution exists), because the R-D curve is convex, this algorithm converges. Overall, the iterative lambda optimization steps are summarized below.


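Algorithm 2: Lagrangian-based Consistent Quality Control (LCQC) Algorithm

Step 0: Start with two values $\lambda_a$ and $\lambda_b$ that bound the search interval. Set the test value $\lambda = (\lambda_a + \lambda_b)/2$ and frame index $i = 1$.

Step 1: Given $\lambda$, use the fast branch expansion technique to examine all the QPs that satisfy $|\mathrm{PSNR}(D_i) - \mathrm{PSNR}(\bar{D})| \le \Delta$.

Step 2: If the current average PSNR lies within $[P_{\min}, P_{\max}]$, go to Step 3. Else if it is lower than $P_{\min}$, set $\lambda_b = \lambda$; otherwise (it is larger than $P_{\max}$), set $\lambda_a = \lambda$. Let $\lambda = (\lambda_a + \lambda_b)/2$ and $i = 1$ (start from the first frame again), go to Step 1.

Step 3: Encode the current frame again using the up-to-date QP value. If the current frame is not the last frame in the sequence, let $i = i + 1$, go to Step 1.

Step 4: If $\sum_{i=1}^{N} R_i < R_T$, set $\lambda_u = \lambda$. Else set $\lambda_l = \lambda$. If the lambda interval boundaries, $\lambda_l$ and $\lambda_u$, are both found, go to Step 5. Else let $\lambda$ be the center of the remaining lambda interval and $i = 1$, go to Step 1.

Step 5: Perform the fast bisection search algorithm [23] in the lambda range $[\lambda_l, \lambda_u]$ to find the optimal $\lambda^*$, i.e., the value whose total bits meet the budget $R_T$. The usual stop rule $|\sum_{i=1}^{N} R_i - R_T| \le \epsilon$ is adopted. A few assistant formulas are proposed in [23] so that this search process converges rather fast. Normally, this step takes 2 to 4 recursions. The final $\lambda^*$ and its associated QPs are our optimal solution to (1).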

Typically, Steps 1 and 3 require only one branch expansion process (to examine the valid QPs) and one complete encoding process (to prevent approximation error propagation), respectively. The computational complexity mainly comes from the number of iterations. It usually takes 5 to 8 iterations to complete this lambda search. Detailed simulation results including PSNR and computing time are discussed in Section IV.
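The lambda search at the core of this approach can be illustrated with a plain bisection on a toy problem. The sketch below is a simplification under stated assumptions: frames are treated as independent, the R-D model in `toy_rd` is made up, the PSNR range and quality variation constraints are omitted, and ordinary bisection is used in place of the fast search of [23]. All names and numbers are hypothetical.

```python
def toy_rd(complexity, qp):
    """Hypothetical independent R-D model: MSE grows and bits shrink with QP."""
    return 0.02 * 2.0 ** (qp / 3.0), complexity * 2.0 ** (-qp / 6.0)

def solve(complexities, qps, lam):
    """Per-frame minimizer of D + lam * R and the total bits it spends."""
    def cost(c, q):
        d, r = toy_rd(c, q)
        return d + lam * r
    choice = [min(qps, key=lambda q, c=c: cost(c, q)) for c in complexities]
    return choice, sum(toy_rd(c, q)[1] for c, q in zip(complexities, choice))

def bisect_lambda(complexities, qps, budget, lam_lo, lam_hi, iterations=40):
    """Assumes lam_lo overspends the budget and lam_hi does not."""
    best = solve(complexities, qps, lam_hi)
    for _ in range(iterations):
        mid = 0.5 * (lam_lo + lam_hi)
        trial = solve(complexities, qps, mid)
        if trial[1] <= budget:
            lam_hi, best = mid, trial      # feasible: try a smaller lambda
        else:
            lam_lo = mid                   # overspends: need a larger lambda
    return lam_hi, best

frames = [400000, 120000, 150000, 300000, 120000]
lam, (qp_choice, total_bits) = bisect_lambda(frames, range(20, 41), 45000, 0.0, 1e3)
print(lam, qp_choice, round(total_bits))
```

Because the total bits never increase as lambda grows, the interval always brackets the smallest lambda that meets the budget; with a discrete QP set the budget is generally approached from below rather than met exactly.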

IV. SIMULATION RESULTS

We have implemented the proposed quality control algorithm on the MPEG-4 AVC/H.264 video coder with the rate-distortion optimization (RDO) option turned on. Experiments are performed using the standard MPEG video sequences, Foreman, Table Tennis, News, and Stefan. All test videos are 300 frames in QCIF size. The GOP size is 30. Only I- and P-frames are in use. The PSNR range in each case is estimated from the minimum PSNR and the maximum PSNR obtained by applying JM 7.6 to the test video sequence. Simulations are performed on a 3-GHz Intel Pentium CPU.

We conduct four sets of simulations to evaluate the performance of the proposed TCQC and LCQC algorithms. In the first experiment, the TCQC algorithm is tested at different bit rates to show its effectiveness on bit allocation, as compared with the JM and the constant QP schemes. The JM 7.6 rate control scheme is unable to select a QP for the first frame. For fair comparison, the first QP is set to be identical to that of the TCQC algorithm. Also, the best constant QP case is shown, which is produced by using a single QP value for the entire sequence. In this experiment, all possible QP values are tried and the one which produces bits closest to the target bits is chosen. Next, the PSNR and complexity of the LCQC algorithm are compared with two published algorithms, LPF in [17] and MultiStage in [19]. In the third and fourth experiments, TCQC and LCQC are compared. Several values of the control parameters are tested to show the PSNR and complexity tradeoff.

Fig. 2. PSNR plots of the TCQC, JM 7.6, and Constant QP algorithms for News at two bit rates. (a) 24 kbps. (b) 112 kbps.
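The selection of the best constant QP described above amounts to a nearest-to-target search over the coded sizes. A minimal sketch follows, with hypothetical bit counts in place of actual encoder output.

```python
def best_constant_qp(bits_for_qp, target_bits):
    """QP whose total coded bits are closest to the target."""
    return min(bits_for_qp, key=lambda qp: abs(bits_for_qp[qp] - target_bits))

# Hypothetical total bits for a sequence coded with a single QP throughout.
bits_for_qp = {26: 130000, 27: 118000, 28: 104000, 29: 93000}
chosen = best_constant_qp(bits_for_qp, target_bits=100000)
print(chosen, bits_for_qp[chosen])
```

Since only a discrete set of QPs is available, the constant QP scheme generally lands near the target bits rather than exactly on it.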

A. Performance Comparison With Constant QP and JM The TCQC algorithm is evaluated on four different video se-quences at three different bit rates to show its effectiveness on bit allocation. The Foreman sequence contains mainly a talking head with a scene change near the end, the News sequence con-tains some amount of background changes, the Table Tennis quence has a scene change in the middle, and the Stefan se-quence has high motion. Two other schemes, namely, JM 7.6 and constant QP, are also applied to these sequences. The pa-rameters used in this experiment are the cluster size


TABLE I

COMPARISONS OF MINIMUM PSNR, MAXIMUM PSNR, AVERAGE PSNR, PSNR VARIANCE, BIT RATE, AND DECODING DELAY FOR JM 7.6, TCQC, AND CONSTANT QP SCHEMES ON THE FOREMAN SEQUENCE

TABLE II

COMPARISONS OF MINIMUM PSNR, MAXIMUM PSNR, AVERAGE PSNR, PSNR VARIANCE, BIT RATE, AND DECODING DELAY FOR JM 7.6, TCQC, AND CONSTANT QP SCHEMES ON THE NEWS SEQUENCE

curves and their relative merits of these three schemes show similar trends on all four test sequences; thus, only the Foreman and News sequences are displayed in Tables I and II. The News plot, which has the largest variation, is also displayed in Fig. 2.

As shown in Tables I and II, the TCQC scheme has the least PSNR variation as compared to JM and constant QP. It has the highest minimum PSNR and the lowest maximum PSNR. The constant QP method is conceptually the simplest, but its overall PSNR is often lower; its PSNR variation is fairly low but not the lowest. Generally, the complexity of the constant QP method is much lower than that of TCQC. To ensure a consistent frame-to-frame quality, TCQC has a lower PSNR than JM 7.6, but the difference is often less than 0.5 dB. As shown in Subsection IV-C, the average PSNR gets higher if the constraint is loosened. Also shown in Tables I and II is the decoder buffer delay [defined in (2)], which avoids buffer underflow. Simulation results also show that our minimum and maximum PSNR values are very close. Therefore, it is possible to narrow down the PSNR range for further complexity reduction in our algorithm. Empirically, the JM average PSNR is a good estimate for the TCQC average PSNR. Extensive simulation results indicate that the PSNR range can typically be approximated by a narrow interval around this estimate.

Fig. 2 depicts the frame-to-frame PSNR plots for the News sequence at two different bit rates. The TCQC PSNR curve has no drop at the GOP boundaries or at scene changes. It has the smoothest shape among these three curves. The overall PSNR performance of JM 7.6 is the best but it has a large swing of more than 3 dB in PSNR across the entire sequence. One may notice that the first few frames of the TCQC algorithm have higher PSNR. This agrees with the well-known observation that a good I frame leads to better P frames in a GOP. As discussed earlier, the Viterbi search provides the optimal solution under the given assumptions and constraints. Therefore, although the average PSNR of TCQC is slightly lower than that of JM, TCQC offers the best average PSNR under the consistent quality constraint.
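The quantities compared in Tables I and II and the PSNR swing visible in Fig. 2 can be derived from a per-frame PSNR trace. The sketch below is illustrative only: the function name `psnr_summary` is hypothetical, the sample values are invented, and the use of the population variance is an assumption, since the text does not state which variance estimator is used.

```python
from statistics import fmean, pvariance


def psnr_summary(frame_psnr):
    """Summarize a per-frame PSNR trace (values in dB)."""
    lowest, highest = min(frame_psnr), max(frame_psnr)
    return {
        "min": lowest,
        "max": highest,
        "avg": fmean(frame_psnr),
        "var": pvariance(frame_psnr),   # population variance (assumption)
        "swing": highest - lowest,      # peak-to-peak PSNR excursion
    }


# Invented trace, for illustration only.
summary = psnr_summary([34.0, 35.0, 36.0])
print(summary)
```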

B. LCQC Performance Comparison With LPF and MultiStage Algorithms

TABLE III

COMPARISONS OF PSNR, BIT RATE, AND COMPLEXITY FOR LCQC, MULTISTAGE, AND LPF ALGORITHMS ON NEWS AT THREE BIT RATES

TABLE IV

COMPARISONS OF PSNR, BIT RATE, AND COMPLEXITY FOR LCQC, MULTISTAGE, AND LPF ALGORITHMS ON TABLE TENNIS AT THREE BIT RATES

TABLE V

EFFECT OF QUALITY VARIATION CONSTRAINT ON PSNR AND COMPLEXITY FOR THE LCQC ALGORITHM ON THREE SEQUENCES, FOREMAN, TABLE TENNIS, AND NEWS, AT THREE QUALITY CONSTRAINTS, WITH A PSNR RANGE OF 2 dB

In this subsection, two recent, well-performing rate-control algorithms, LPF in [17] and MultiStage in [19], are simulated and compared to our LCQC algorithm. The basic idea behind LPF (low-pass filtering) is to smooth out the distortion curve by reallocating the bits of frames inside a moving time window. A fairly accurate model that relates the smoothed distortion and the smoothed bit rate is proposed in [17].
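To illustrate what low-pass filtering a distortion curve over a moving window means, the sketch below applies a plain moving average to a per-frame distortion trace. It is not the LPF algorithm of [17], which reallocates bits through a rate-distortion model; the centered window, the shrinking of the window at the sequence edges, and the sample values are all assumptions made for illustration.

```python
def moving_average(values, window):
    """Centered moving average; the window shrinks at the sequence edges.

    An odd window length is assumed so that the window is symmetric.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# Invented distortion trace with one spike (e.g., at a scene change).
trace = [1.0, 1.0, 10.0, 1.0, 1.0]
print(moving_average(trace, 3))
```

The spike is spread over its neighbors, which is the smoothing effect the LPF approach seeks on the distortion curve.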

The MultiStage algorithm aims at the constant quality target. A two-stage iterative procedure is proposed in [19]. Given a set of frame bits, the Target rate stage encodes each frame with the given bits. Given the average PSNR of all frames, the Constant quality stage tries to encode every frame to reach the average PSNR by adjusting QP. The algorithm terminates if either of the following two stop conditions is satisfied: a) for the quality stage, the difference between the maximal and the minimal PSNR values in a sequence falls below a threshold, or b) for the rate stage, the difference between the coded bits and the target bits falls below a threshold. In our experiment, the threshold values are 0.5 dB and 2% for the quality stage and the rate stage, respectively. The parameters used by the LCQC algorithm are: dB, and dB.
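The two stop conditions used for MultiStage in this experiment can be written as simple predicates. This is a sketch of the termination test only, not of the MultiStage algorithm in [19]; the function names are hypothetical, and treating the thresholds as inclusive is an assumption.

```python
def quality_stage_done(frame_psnr, max_spread_db=0.5):
    """Stop condition a): PSNR spread over the sequence is within the threshold."""
    return max(frame_psnr) - min(frame_psnr) <= max_spread_db


def rate_stage_done(coded_bits, target_bits, max_rel_error=0.02):
    """Stop condition b): relative bit mismatch is within the threshold."""
    return abs(coded_bits - target_bits) / target_bits <= max_rel_error


def should_stop(frame_psnr, coded_bits, target_bits):
    """Terminate when either stop condition holds."""
    return (quality_stage_done(frame_psnr)
            or rate_stage_done(coded_bits, target_bits))


# Invented values: spread is 0.4 dB, bit mismatch is about 2.1%.
print(should_stop([35.0, 35.3, 35.4], coded_bits=245_000, target_bits=240_000))
```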

As shown in Tables III and IV, the LCQC algorithm typically matches the bit budget very well. The MultiStage algorithm usually has a bit rate mismatch, especially at low rates, which is consistent with the report in [19]. The LPF algorithm has a bit rate mismatch too. As discussed in [17], the coding bits converge to the budget bits when the sequence length goes to infinity. Often, the LCQC algorithm has the largest average PSNR, and its PSNR variance is held at around 0.02 consistently at all rates because the frame quality variation is confined to the range set by the quality variation parameter. That is, the PSNR is accurately controlled by adjusting the quality variation parameter. In contrast, the LPF and the MultiStage algorithms try to achieve the constant quality goal only. The LCQC complexity (CPU time) lies in between those of MultiStage and LPF, whereas LPF has the smallest complexity. Both the LCQC and the MultiStage algorithms have a larger complexity at low bit rates due to the large number of iterations for convergence.

Fig. 3. LCQC results for News at 24 kbps with a PSNR range of 2.0 dB and a quality variation constraint of (a) 0.2 dB, (b) 0.4 dB, and (c) 1.0 dB.

TABLE VI

EFFECT OF CLUSTER SIZE ON THE PSNR LOSS FOR THE TCQC ALGORITHM ON FOREMAN AND TABLE TENNIS AT THREE CLUSTER SIZES AND A QUALITY VARIATION CONSTRAINT OF 0.4 dB

C. Effects of Quality Variation Constraint on PSNR and Complexity

One important feature of our schemes is the flexibility of adjusting the picture quality variation over time. Our schemes achieve the MINAVE goal in (1) when the quality variation constraint is made large enough to be inactive, and they produce constant quality pictures when the constraint approaches 0 dB. Generally, if the constraint is smaller than 0.4 dB, a consistent quality solution is practically obtained. By adjusting the constraint above 0.4 dB, we obtain a solution in between the constant quality and the MINAVE extremes.
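One way to inspect how consistent a coded sequence is involves measuring the largest PSNR change between adjacent frames and comparing it with a chosen bound. The sketch below does only that; it is an assumption that the adjacent-frame difference is the relevant measure, since the formal constraint is defined earlier in the paper and may be stated differently. The names `max_adjacent_variation` and `is_consistent` and the sample values are hypothetical.

```python
def max_adjacent_variation(frame_psnr):
    """Largest absolute PSNR change (dB) between consecutive frames."""
    return max(
        (abs(b - a) for a, b in zip(frame_psnr, frame_psnr[1:])),
        default=0.0,  # a trace with fewer than two frames has no variation
    )


def is_consistent(frame_psnr, bound_db):
    """True if no adjacent-frame PSNR change exceeds bound_db."""
    return max_adjacent_variation(frame_psnr) <= bound_db


# Invented trace: adjacent changes are 0.25, 0.25, and 0.5 dB.
trace = [35.0, 35.25, 35.0, 35.5]
print(max_adjacent_variation(trace), is_consistent(trace, 0.4))
```

A tight bound pushes the result toward constant quality, while a very loose bound never binds and leaves the allocation free to minimize average distortion.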

The disadvantage of using a large quality variation constraint is that it increases the number of branch expansions and active nodes in the trellis. It has much less impact on the LCQC algorithm since LCQC does not have the trellis structure. Table V shows the test results. Indeed, the computational load of LCQC increases only slightly from a small constraint to a large one.

Fig. 3 shows the frame PSNR plots for the News sequence at different quality variation constraints. Simulation results show that a larger constraint leads to larger picture quality variation but produces a higher PSNR. As shown in Fig. 3(b), the PSNR curve produced by LCQC has little variation at the beginning of the sequence as compared to that of TCQC in Fig. 2(a). This shows that the LCQC algorithm generates even smoother PSNR outputs.

D. Effects of Cluster Size on PSNR and Complexity

Table VI shows the TCQC results at various cluster sizes. As expected, the average PSNR value decreases when the cluster size gets larger. The granularity loss is defined as the absolute PSNR difference between the finest-granularity case and each of the larger cluster size cases. When the cluster size is very small, we essentially achieve the best possible results without PSNR loss due to the use of clusters. As expected, the granularity loss grows as the cluster size increases beyond 0.1 dB.
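The granularity loss defined above is a simple difference against the finest cluster size. A minimal sketch follows; the function name `granularity_loss`, the cluster sizes, and the PSNR values are invented for illustration and are not taken from Table VI.

```python
def granularity_loss(avg_psnr_by_cluster):
    """Map each non-reference cluster size to its absolute PSNR loss (dB).

    The smallest cluster size in the input is used as the
    finest-granularity reference case.
    """
    finest = min(avg_psnr_by_cluster)
    reference = avg_psnr_by_cluster[finest]
    return {
        size: abs(reference - psnr)
        for size, psnr in sorted(avg_psnr_by_cluster.items())
        if size != finest
    }


# Hypothetical results: cluster size (dB) -> average PSNR (dB).
losses = granularity_loss({0.125: 35.5, 0.25: 35.25, 0.5: 35.0})
print(losses)
```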

TABLE VII

COMPARISONS OF AVERAGE PSNR, BIT RATE, AND COMPLEXITY IN TCQC AND LCQC ALGORITHMS FOR THREE SEQUENCES

Since LCQC does not have the trellis structure, it has no granularity loss. On the other hand, the LCQC formulation is an approximation to the integer programming problem [11]. Also, in the lambda search procedure, we stop at a given tolerance. Therefore, there is a performance loss due to the use of the Lagrange cost and the tolerance. Table VII shows the test results of TCQC and LCQC at the same quality variation and the same PSNR range. The cluster size is 0.1 dB for TCQC. As expected, TCQC is slightly better, but the PSNR difference is typically less than 0.3 dB. Again, LCQC is much faster.
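The two sources of loss named above, the Lagrangian approximation and the search tolerance, can be seen in a generic lambda search. The sketch below is not the LCQC procedure: it assumes independent frames with a few discrete (bits, distortion) operating points each, omits the consistent quality constraint, and uses plain bisection. All names and numbers are hypothetical.

```python
def pick_points(frames, lam):
    """Per frame, choose the (bits, distortion) point minimizing D + lam * R."""
    return [min(points, key=lambda p: p[1] + lam * p[0]) for points in frames]


def total_bits(choice):
    return sum(bits for bits, _ in choice)


def lambda_search(frames, budget, tol=0.01, max_iter=50):
    """Bisection on lam so that the total bits do not exceed the budget.

    Stops early once the unused budget is within tol * budget.
    """
    if sum(min(b for b, _ in points) for points in frames) > budget:
        raise ValueError("budget is below the minimum achievable bits")
    lo, hi = 0.0, 1.0
    while total_bits(pick_points(frames, hi)) > budget:
        hi *= 2.0  # grow lam until the allocation fits the budget
    best = pick_points(frames, hi)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        choice = pick_points(frames, mid)
        if total_bits(choice) > budget:
            lo = mid                      # too many bits: penalize rate more
        else:
            hi, best = mid, choice        # feasible: try a smaller lam
            if budget - total_bits(choice) <= tol * budget:
                break
    return hi, best


# Two hypothetical frames with convex operating points (bits, distortion).
frames = [
    [(100, 1.0), (50, 4.0), (25, 9.0)],
    [(80, 2.0), (40, 6.0), (20, 12.0)],
]
lam, best = lambda_search(frames, budget=150)
print(lam, best, total_bits(best))
```

In this toy case the search settles on 130 bits although 150 are available, because no single multiplier selects a combination that lands closer to the budget. This gap is the kind of loss attributed above to using a Lagrange cost in place of the integer programming solution.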

V. CONCLUSION

In this paper, we realize the triple goal of producing consistent quality videos, minimizing the total distortion, and meeting the bit budget strictly. Moreover, this framework can flexibly provide a solution in between the MINAVE and constant quality extremes. Two algorithms are proposed to find the optimal and consistent quality solution. Inspired by previous work, a trellis-based quality control scheme is proposed first. This approach provides a nearly optimal solution (the resulting total distortion is minimized) for a given bit rate budget on a dependent coding platform. The second algorithm is developed based on the Lagrange multipliers method. We impose the consistent quality constraint on this formulation and also design a fast procedure to find the optimal solution. As compared to the trellis-based algorithm, it runs much faster and has a performance very close to that of the former. Simulation results show that both approaches have the largest average PSNR at a small PSNR variation as compared to the other published consistent quality proposals, and have a much smaller PSNR variation at a slight average PSNR loss as compared to the MPEG JM rate control. In addition, only the proposed algorithms can strictly meet the target bit budget requirement.

Because interframe dependency is taken into account, the two proposed schemes have rather high computational complexity. Therefore, they target off-line applications such as Internet video streaming and DVD playback, in which the coding performance has a higher priority than the complexity. More powerful techniques that reduce the computational complexity are under development.

REFERENCES

[1] Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, Joint Final Committee Draft (JFCD) of Joint Video Specification (ITU-T Rec. H.264—ISO/IEC 14496-10 AVC), JVT-D157, 4th Meet., Klagenfurt, Germany, Jul. 2002.

[2] T. Wiegand, G.-J. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the H.264/AVC video coding standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 560–576, Jul. 2003.

[3] A. Ortega and K. Ramchandran, “Rate-distortion methods for image and video compression,” IEEE Signal Process. Mag., Nov. 1998.

[4] A.-E. Mohr, “Bit allocation in sub-linear time and the multiple-choice knapsack problem,” in Proc. IEEE Data Compression Conf., Mar. 2002, pp. 352–361.

[5] Y. Shoham and A. Gersho, “Efficient bit allocation for an arbitrary set of quantizers,” IEEE Trans. Acoust., Speech, Signal Process., vol. 36, no. 9, pp. 1445–1453, Sep. 1988.

[6] H.-M. Hang and J.-J. Chen, “Source model for video transform coder and its application—Part I: Fundamental theory,” IEEE Trans. Circuits Syst. Video Technol., vol. 7, no. 4, pp. 287–298, Apr. 1997.

[7] J.-J. Chen and H.-M. Hang, “Source model for video transform coder and its application—Part II: Variable frame rate coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 7, no. 4, pp. 299–311, Apr. 1997.

[8] T. Chiang and Y.-Q. Zhang, “A new rate control scheme using quadratic rate distortion model,” IEEE Trans. Circuits Syst. Video Technol., vol. 7, no. 2, pp. 246–250, Feb. 1997.

[9] Z. He and S.-K. Mitra, “A unified rate-distortion analysis framework for transform coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 12, pp. 1221–1236, Dec. 2001.

[10] K. Ramchandran, A. Ortega, and M. Vetterli, “Bit allocation for dependent quantization with application to multi-resolution and MPEG video coders,” IEEE Trans. Image Process., vol. 3, no. 5, pp. 533–545, Sep. 1994.

[11] A. Ortega, K. Ramchandran, and M. Vetterli, “Optimal trellis-based buffered compression and fast approximations,” IEEE Trans. Image Process., vol. 3, no. 1, pp. 26–40, Jan. 1994.

[12] Y. Sermadevi and S.-S. Hemami, “Efficient bit allocation for dependent video coding,” in Proc. IEEE Data Compression Conf., Mar. 2004, pp. 232–241.

[13] L.-J. Lin and A. Ortega, “Bit-rate control using piecewise approximated rate-distortion characteristics,” IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 8, pp. 446–459, Aug. 1998.

[14] G.-M. Schuster, G. Melnikov, and A.-K. Katsaggelos, “A review of the minimum maximum criterion for optimal bit allocation among dependent quantizers,” IEEE Trans. Multimedia, vol. 1, no. 3, pp. 3–17, Mar. 1999.

[15] Y. Yu, J. Zhou, Y. Wang, and C.-W. Chen, “A novel two-pass VBR coding algorithm for fixed-size storage application,” IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 3, pp. 345–356, Mar. 2001.

[16] Y. Sermadevi and S.-S. Hemami, “Lexicographic bit allocation for MPEG video coding,” in Proc. IEEE Data Compression Conf., Mar. 1997, pp. 101–110.

[17] Z. He, W. Zeng, and C.-W. Chen, “Low-pass filtering of rate-distortion functions for quality smoothing in real-time video communication,” IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 8, pp. 973–981, Aug. 2005.

[18] B. Xie and W. Zeng, “A sequence-based rate control framework for consistent quality real-time video,” IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 1, pp. 56–71, Jan. 2006.


[19] N. Cherniavsky et al., “MultiStage: a MINMAX bit allocation algorithm for video coders,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 1, pp. 59–67, Jan. 2007.

[20] S.-Y. Lee and A. Ortega, “Optimal rate control for video transmission over VBR channels based on a hybrid MMAX/MMSE criterion,” in Proc. IEEE Int. Conf. Multimedia Expo, Aug. 2002, vol. 2, pp. 93–96.

[21] J.-J. Chen and D. W. Lin, “Optimal bit allocation for coding of video signals over ATM networks,” IEEE J. Sel. Areas Commun., vol. 3, no. 8, pp. 1002–1015, Aug. 1997.

[22] D.-K. Kwon, M.-Y. Shen, and C.-C. Jay Kuo, “Rate control for H.264 video with enhanced rate and distortion models,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 5, pp. 517–529, May 2007.

[23] W.-Y. Lee and J.-B. Ra, “Fast algorithm for optimal bit allocation in a rate-distortion sense,” Electron. Lett., vol. 32, no. 20, pp. 1871–1873, Sep. 1996.

Kao-Lung Huang received the B.S. and M.S. degrees from Chung Cheng Institute of Technology, Taoyuan, Taiwan, R.O.C., in 1985 and 1989, respectively, both in electrical engineering. He is currently pursuing the Ph.D. degree in electrical engineering at the National Chiao Tung University, Hsinchu, Taiwan.

Since 1989, he has been with the Chung Shan Institute of Science and Technology (CSIST) as a Member of Technical Staff. His research interests include video coding, image/radar signal processing algorithms, and multimedia communication systems.

Hsueh-Ming Hang (S’79–M’84–SM’91–F’02) received the B.S. and M.S. degrees from the National Chiao Tung University (NCTU), Hsinchu, Taiwan, in 1978 and 1980, respectively, and the Ph.D. degree in electrical engineering from Rensselaer Polytechnic Institute, Troy, NY, in 1984.

From 1984 to 1991, he was with AT&T Bell Laboratories, Holmdel, NJ, and then he joined the Electronics Engineering Department, NCTU, in December 1991. He took a leave from NCTU and has been the Dean of the Electrical Engineering and Computer Science College, National Taipei University of Technology (NTUT), since August 2006. He is a co-editor and contributor of the Handbook of Visual Communications (Academic). He holds 11 patents (R.O.C., U.S., and Japan) and has published over 150 technical papers related to image compression, signal processing, and video codec architecture. His research interests include multimedia compression, image/signal processing algorithms and architectures, and multimedia communication systems.

Dr. Hang was an associate editor for the IEEE TRANSACTIONS ON IMAGE PROCESSING (1992–1994) and the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (1997–1999), and is currently an associate editor for the IEEE TRANSACTIONS ON IMAGE PROCESSING again. He is a recipient of the IEEE Third Millennium Medal, a Fellow of IET, and a member of Sigma Xi.