A q-Domain Characteristic-Based Bit-Rate Model for Video Transmission


Chun-Yuan Chang, Student Member, IEEE, Cheng-Fu Chou, Din-Yuen Chan, Tsungnan Lin, Senior Member, IEEE,

and Ming-Hung Chen

Abstract—For low-delay video transmission, we introduce a q-domain characteristic-based bit-rate model. Specifically, three characteristics are efficiently extracted from the quantized DCT spectra to construct the bit-rate model. Extensive experimental results show that our rate model provides higher accuracy with lower complexity than existing models.

Index Terms—Rate control, rate-quantization model.

I. INTRODUCTION

THE rate control scheme, which adjusts the quantization parameters (QPs), plays an important role in packet video transmission since the communication channel often imposes stringent constraints on the transmission bandwidth. Thus, how to construct an accurate bit-rate model has become a major challenge for the rate control designer. In the classical MB-level R-D models in [1] and [2], the only characteristic that describes the input source data is the variance of the source input, but this approach cannot efficiently adapt to a dramatic variation of the input source. To tackle this issue, an improved variance-based R-Q model is proposed in [2]. However, a large bit-rate estimation error might occur in low-motion or low-bit-rate cases. On the other hand, Kim et al. [3] were the first to use the number of nonzero quantized transform coefficients, i.e., the number of codewords, as the main characteristic to model the bit rate for the rate controller. Furthermore, the authors in [4] defined ρ as the percentage of zeros among the quantized transform coefficients and found that there is a linear relationship between R and ρ within each frame. This linearity suggests that the R(ρ) curve can be modeled once its slope is predetermined. Accordingly, He et al. [4] attempt to compute some control points, which are regarded as pseudobit rates, to determine the slope of R(ρ); that is, they collected extensive actual bit-rate points from different frames and classified those bit-rate points according to different pseudo ρ. Then, within the same cluster of bit-rate points, the “pseudocoding” process¹ is applied to model the pseudobit rates. After that, the linear rate regulation approach [5] is applied to the pseudobit rates to determine the slope of R(ρ). Finally,

Manuscript received August 8, 2006; revised March 4, 2007 and July 17, 2007. First published April 30, 2008; current version published October 8, 2008. This paper was recommended by Z. He.

C.-Y. Chang is with the Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei 106, Taiwan, R.O.C.

C.-F. Chou and M.-H. Chen are with the Department of Computer Science and Information Engineering, National Taiwan University, Taipei 106, Taiwan, R.O.C. (e-mail: [email protected]).

D.-Y. Chan is with the Department of Computer Science and Information Engineering, National Chayi University, Chayi 600, Taiwan, R.O.C.

T. Lin is with the Graduate Institute of Communication Engineering, National Taiwan University, Taipei 106, Taiwan, R.O.C.

Digital Object Identifier 10.1109/TCSVT.2008.924103

¹ The “pseudocoding” process extracts some useful characteristics from the frequency domain to estimate the actual bit rate.

the R(ρ) curve is obtained and the R-Q model is constructed by a one-to-one mapping between ρ and q. Nevertheless, there are two main concerns for the ρ-domain-based R-Q model:

1) whether the accuracy of the “pseudocoding” process for a specified ρ is general across all types of videos;

2) whether the linearity between R and ρ still holds within each frame.

To cope with the above issues, in this letter we discuss the ρ-domain model and derive a new q-domain characteristic-based bit-rate model. In addition, we demonstrate a two-level greed-based rate controller [6] that uses the proposed rate model. Experimental results show that, thanks to the improved accuracy of our rate model, this controller meets rate-control targets, e.g., the target buffer delay and visual quality enhancement, better than existing rate controllers [1], [4].

II. FROM ρ-DOMAIN-BASED R-Q MODEL TO CHARACTERISTIC-BASED R-Q MODEL

A. Problems of ρ-Domain-Based R-Q Model

In [4], He et al. used a finite set of ρ values as control points to perform the “pseudocoding” process. For each control point, there are two characteristics, i.e., the average of the sizes of all nonzero coefficients and the average of the sizes of all run-length numbers, denoted as QNZ and QZ, respectively; they are used to measure the finite set of pseudobit rates. Herein, we write the estimated “pseudobit rate” as follows:

(1)

where the characteristic vector is defined as , and is the set of model coefficients obtained through the offline regression method. Furthermore, the two characteristics in (1) are computed by the linear and cubic functions [4], respectively, as follows:

(2)

(3)

Therefore, assuming that the linearity between R and ρ within the same frame holds, the ρ-domain-based R-Q model is proposed to model the source bit rate, which is shown as follows:

(4)

Specifically, we can compute the finite set of pseudobit rates by (1). Afterward, this set is fed into a linear rate regulation [5] to determine the slope. Accordingly, the R(ρ) curve can be constructed using the estimated slope. Finally, the R-Q model is obtained via the one-to-one mapping between ρ and q.
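The ρ-domain pipeline summarized above can be sketched in a few lines. This is a minimal illustration, not the implementation of [4]: it assumes the widely cited linear form R(ρ) = θ(1 − ρ), a toy dead-zone rule (a coefficient becomes zero when its magnitude is below 2q), and hypothetical helper names `rho_of_q`, `rho_domain_rate`, and `rq_curve`.

```python
def rho_of_q(coeffs, q, dead_zone=lambda q: 2 * q):
    """Fraction of coefficients that fall inside the dead zone at step q.

    The threshold rule 2*q is an illustrative assumption, not the exact
    quantizer of any particular codec.
    """
    threshold = dead_zone(q)
    zeros = sum(1 for c in coeffs if abs(c) < threshold)
    return zeros / len(coeffs)


def rho_domain_rate(rho, theta):
    """Linear rho-domain rate estimate: R = theta * (1 - rho)."""
    return theta * (1.0 - rho)


def rq_curve(coeffs, theta, qps=range(1, 32)):
    """Map every QP to an estimated rate through the rho(q) table."""
    return {q: rho_domain_rate(rho_of_q(coeffs, q), theta) for q in qps}


coeffs = [0, 1, 3, 5, 9, -12, 20, -40]   # toy transform coefficients
curve = rq_curve(coeffs, theta=100.0)    # theta is an arbitrary toy slope
```

The accuracy of such a curve depends entirely on how well a single slope describes the unit being coded, which is the concern examined next.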


Fig. 1. Relationship R–(1 − ρ) at MB level among the proposed two characteristic vectors Q1 and Q2, the ρ-domain-based R-Q model with optimal slope, and the actual rate curve.

We can see that the ρ-domain-based R-Q model relies on a strong assumption of linearity between R and ρ within each frame. Once this linearity is not strong enough, the estimated R-Q model could lose its accuracy. In Fig. 1, we plot some actual rate points, which are generated by encoding the same MB several times. We can observe that the actual rate curve is not always a straight line. In addition, we define an optimal slope² and use it to plot the estimated curves shown in Fig. 1. Fig. 1 indicates that a single optimal slope is insufficient to predict the actual rate points. In other words, the assumption of linearity between R and ρ does not always hold. Therefore, the estimated curve could lose its accuracy even if the optimal slope is employed to model the rate curve. Intuitively, one feasible solution to this problem is to perform the “pseudocoding” process directly in the q-domain instead of the ρ-domain, which avoids the unnecessary estimation error caused by the assumption of linearity between R and ρ within each MB/frame. Therefore, we argue that should be a function of as follows:

(5)

It also implies that itself could be an important characteristic for the “pseudocoding” process in the q-domain.

B. Characteristics Analysis and Characteristic-Based R-Q Model

Now we turn our attention to the “pseudocoding” process in (1). Since each ρ is a constant, according to regression theory [7], the characteristic vector in (1) can be equivalently extended from to .

This implies that the actual bit rate could be represented via the three characteristics QC,³ QNZ, and QZ.

To study the relationships between the actual bit rate and QC, QNZ, and QZ, we collect many rate points from different video sequences and plot them in Fig. 2. We calculate the correlation coefficient between the actual bit rate and each characteristic. We can see that the correlation coefficients of

² Optimal slope $= \arg\min \left| \bigl( R(\rho(q)) - \hat{R}(\rho(q)) \bigr) / R(\rho(q)) \right|$.

³ QC denotes the number of nonzeros among the quantized transform coefficients.

Fig. 2. (a)–(f) Plots of the relationships between the actual bit rate on the y-axis and the observed characteristics QC, QNZ, QZ, and QL, in (a), (b), (d), and (e), respectively. The relationships between the actual bit rate and the estimated bit rate using Q1 and Q2 are shown in (c) and (f), respectively.

and are larger than 0.99 on average and of is 0.82 on average. Hence, it is adequate to model the actual coding bit rate as a linear combination of QC, QNZ, and QZ. We denote this first characteristic vector as Q1.

In particular, in this work we study another characteristic, i.e., the sum of levels of quantized nonzero coefficients, denoted as QL. We note that its correlation coefficient with the actual bit rate is also more than 0.99. Hence, it is also possible to model the actual coding bit rate using a second characteristic vector, named Q2. Fig. 2(c) and (f) plot the relationship between the actual bit rate and the estimated bit rate modeled by Q1 and Q2, respectively. It is clear that such a multivariable modeling framework performs better than a single-variable modeling framework.
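Modeling the bit rate as a linear combination of several characteristics amounts to an ordinary least-squares fit. The sketch below is only an illustration under stated assumptions: the function names `fit_linear_model` and `predict` are hypothetical, the data are synthetic, an intercept term is added, and the coefficients have nothing to do with the offline-trained values used in the paper.

```python
def fit_linear_model(features, rates):
    """Least-squares fit of rate ~ w1*x1 + w2*x2 + w3*x3 + w0.

    Solves the normal equations (A^T A) w = A^T y by Gaussian elimination.
    """
    rows = [list(f) + [1.0] for f in features]        # append intercept column
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    aty = [sum(r[i] * y for r, y in zip(rows, rates)) for i in range(n)]
    aug = [ata[i] + [aty[i]] for i in range(n)]       # augmented matrix
    for col in range(n):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    weights = [0.0] * n
    for i in reversed(range(n)):                      # back substitution
        tail = sum(aug[i][j] * weights[j] for j in range(i + 1, n))
        weights[i] = (aug[i][n] - tail) / aug[i][i]
    return weights


def predict(weights, feature):
    """Estimated rate for one characteristic vector."""
    return sum(w * x for w, x in zip(weights, list(feature) + [1.0]))


# Synthetic samples: three characteristics per unit and the bits they produced.
features = [(1, 2, 0), (3, 1, 4), (5, 9, 2), (6, 5, 3), (2, 7, 8), (4, 4, 1)]
rates = [2 * a + 0.5 * b + 3 * c + 7 for a, b, c in features]
weights = fit_linear_model(features, rates)
```

With noise-free synthetic data the fit recovers the generating weights; with real coding data the residual of this fit is what the correlation figures quoted above summarize.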

The main idea of this work is to perform the pseudocoding process in the q-domain. Thus, we collect extensive rate points from different frames and classify those rate points according to different q. Within each cluster of rate points, we study the correlation coefficient between the actual bit rate and each of the four characteristics mentioned above. We observe that three of these correlation coefficients are very close to 1 and the fourth is more than 0.88. Recall that we have argued that the “pseudocoding” process should be performed directly in the q-domain instead of the ρ-domain. Therefore, we make a reasonable hypothesis: the “pseudocoding” process can be performed in the q-domain, i.e.,

with

(6)

To evaluate the modeling accuracy of the different models, we transform the rate points estimated using Q1 and Q2 from the q-domain to the ρ-domain and plot them in Fig. 1. From Fig. 1, we see that the proposed rate curves improve substantially on the ρ-domain-based R-Q model


Fig. 3. Plots of the relationships between the actual bit rate for different QP and the estimated bit rate using Q1 and Q2 at frame level, respectively.

with the optimal slope. This evidence shows that our proposed characteristic vectors can construct a better model of the actual bit rate. Further, we plot extensive pairs of actual and estimated bit rates for different QP using Q1 and Q2 in Fig. 3. The relationship between the actual bit rate and the bit rate estimated by (6) using Q1 and Q2 nearly converges to a straight line. The relative average estimation errors are about . Therefore, all of these results support our argument: the “pseudocoding” process can be completely transplanted from the ρ-domain to the q-domain.⁴
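The per-cluster correlation study described in this subsection can be reproduced with a short script. The following is a sketch only: the sample tuples are invented, and `pearson` and `correlation_by_qp` are hypothetical helper names rather than anything defined in the paper.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_by_qp(samples):
    """Group (qp, characteristic, bits) samples by QP and correlate each group."""
    clusters = defaultdict(lambda: ([], []))
    for qp, value, bits in samples:
        clusters[qp][0].append(value)
        clusters[qp][1].append(bits)
    return {qp: pearson(xs, ys) for qp, (xs, ys) in clusters.items()}


# Invented samples: (QP, characteristic value, actual bits).
samples = [(10, 1, 12), (10, 2, 14), (10, 3, 16),
           (20, 1, 9), (20, 2, 7), (20, 4, 3)]
by_qp = correlation_by_qp(samples)
```

A coefficient close to 1 within every QP cluster is the kind of evidence the text relies on when it moves the pseudocoding process into the q-domain.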

III. FAST EXTRACTION FRAMEWORK

Here, we provide a fast extraction framework for computing the three characteristics of Q2. In this framework, two of the characteristics are extracted by actual calculation and the third is obtained by a fast approximation method.

According to the definition of , we only pay attention to the coefficients outside the dead zone . To speed up the computation of , we introduce another temporary characteristic , the sum of all absolute values of nonzero transform coefficients outside the dead zone , into our extraction process. For simplicity, we assume that the two characteristics and are already known and that the processing unit in question is an MB. The computation of and is discussed in more detail later.

Based on the definition of Uniform Threshold Quantizer [8], could be calculated by

(7)

⁴ Note that, without any fast lookup-table (LUT) approach, the computation of Q (q ) in (6) is of high computational complexity. In the following, we choose the second characteristic vector Q2 to model the source bit rate and present a fast extraction framework for it.

Fig. 4. 1-D array B records the status of nonzero DCT coefficients and their relative positions during applying dead zone thresholding from1 to 1 . P (q ) records the position of the last nonzero coefficient after applying dead-zone thresholding1 .

where is a transform coefficient outside the dead zone and is the number of nonzero coefficients. Herein, we compute the item in advance and denote it as . Accordingly, we can approximate (7) by the following expression (details can be found in [6] and [8]):

(8)
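The idea of replacing a per-coefficient level computation with an expression in two aggregates can be illustrated as follows. This sketch is not (7) or (8): it assumes a plain uniform quantizer with step 2q whose level is the integer part of |c| / (2q), and it assumes that each integer truncation loses one half on average. The names `aggregates`, `exact_level_sum`, and `approx_level_sum` are hypothetical.

```python
def aggregates(coeffs, q):
    """Sum of magnitudes and count of coefficients outside the dead zone.

    Assumes the dead zone is |c| < 2*q (illustrative choice).
    """
    step = 2 * q
    outside = [abs(c) for c in coeffs if abs(c) >= step]
    return sum(outside), len(outside)


def exact_level_sum(coeffs, q):
    """Sum of quantized levels computed coefficient by coefficient."""
    step = 2 * q
    return sum(abs(c) // step for c in coeffs if abs(c) >= step)


def approx_level_sum(abs_sum, count, q):
    """Approximate level sum from the two aggregates only.

    Each truncation drops a fraction in [0, 1); 0.5 is used as its mean.
    """
    return abs_sum / (2 * q) - count / 2


coeffs = [0, 3, -7, 12, 25, -31, 64]     # toy transform coefficients
abs_sum, count = aggregates(coeffs, q=3)
exact = exact_level_sum(coeffs, q=3)
approx = approx_level_sum(abs_sum, count, q=3)
```

Under these assumptions the approximation error is bounded by half the number of nonzero coefficients, and the division by 2q can be replaced by a multiplication with a precomputed reciprocal, in the spirit of the precomputed item mentioned above.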

Now, we discuss how to compute , and . To reuse most of the computed results, the proposed fast extraction process is performed recursively from to . Herein, we involve an auxiliary status array B of an MB in order to extract the three rate characteristics , , and . To achieve this, after DCT and zigzag scan, we copy all absolute values of nonzero coefficients and their corresponding positions into the status array B, as shown in Fig. 4. We denote the initial status of B as and its size as . In addition, the histogram of the DCT coefficients is also built simultaneously, denoted as D(x). The computation of roughly needs additive operations.

Afterwards, when we successively apply a dead-zone threshold from to to B, the two rate characteristics , are generated progressively. Similarly, is also computed recursively according to the status of B.

Now, we compute the initial values and for the recursive computation of and . In (9) and (11), and represent the count of nonzero coefficients and the sum of the absolute values of those coefficients, respectively. Then, the results of and are reused to compute and , respectively. As (10) and (12) show, and are generated recursively from to .
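The recursive reuse described above can be sketched with a histogram of coefficient magnitudes. This is an illustration under assumptions, not a transcription of (9) to (12): thresholds are taken to be increasing integers, a coefficient survives when its magnitude is at least the threshold, and `build_histogram` and `recursive_count_and_sum` are hypothetical names.

```python
def build_histogram(coeffs):
    """hist[x] = number of coefficients whose absolute value equals x (x > 0)."""
    hist = {}
    for c in coeffs:
        magnitude = abs(c)
        if magnitude:
            hist[magnitude] = hist.get(magnitude, 0) + 1
    return hist


def recursive_count_and_sum(coeffs, thresholds):
    """Return [(threshold, count, abs_sum)] for increasing dead-zone thresholds.

    Each step only subtracts the histogram bins that newly fall inside the
    dead zone, so earlier results are reused instead of being recomputed.
    """
    hist = build_histogram(coeffs)
    count = sum(hist.values())                       # initial nonzero count
    abs_sum = sum(x * d for x, d in hist.items())    # initial magnitude sum
    results = []
    previous = 1                                     # bins below this are already removed
    for threshold in thresholds:
        for x in range(previous, threshold):
            occurrences = hist.get(x, 0)
            if occurrences:
                count -= occurrences
                abs_sum -= x * occurrences
        previous = max(previous, threshold)
        results.append((threshold, count, abs_sum))
    return results


coeffs = [0, 1, -3, 5, 9, -12, 20, -40, 3]           # toy integer coefficients
table = recursive_count_and_sum(coeffs, thresholds=[2, 4, 6, 8, 10])
```

Because every histogram bin is visited at most once over the whole sweep, the total work grows with the largest threshold rather than with the number of thresholds times the number of coefficients.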

It is clear that the computational complexity of (11)⁵ is about additive operations. Obviously, the computational complexity of (9) is not more than additive operations. Therefore, we conclude that, when we successively apply the dead-zone threshold from to to , the total computational

⁵ Since x and D(x) are integers, we can replace (11) entirely with additive operations. For each item D(x) · x, if D(x) = 0, we do nothing; otherwise, D(x) additions are performed.


(11) (12)

Now, we focus on the computation of . Here, we define a temporary variable P, which records the position of the last nonzero coefficient of B. So, when we know the information of and , can be calculated by subtracting from . The computation of can also be achieved recursively by successively applying the dead-zone threshold to B. As Fig. 4 shows, after we apply to B, will be obtained by moving the position of the last nonzero coefficient from to . Recursively, the results of can be reused to compute . Clearly, when we successively apply the dead-zone threshold from to to B, the total computational complexity of is about additive operations.
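The bookkeeping around the last nonzero position can be sketched as below. The interpretation is an assumption: the number of zeros is taken to be the zeros located before the last surviving coefficient in zigzag order, i.e., (last position + 1) minus the surviving count. The status array is modeled as a list of (zigzag position, absolute value) pairs, and `track_last_nonzero` is a hypothetical name.

```python
def track_last_nonzero(status, thresholds):
    """Follow the last nonzero position as the dead zone widens.

    status: (zigzag_position, abs_value) pairs of nonzero coefficients,
            sorted by position.
    Returns [(threshold, last_position, zeros_before_last)], where
    last_position is -1 when no coefficient survives.
    """
    survivors = list(status)
    results = []
    for threshold in thresholds:
        # Only entries that survived the previous threshold are re-examined.
        survivors = [(pos, val) for pos, val in survivors if val >= threshold]
        if survivors:
            last_position = survivors[-1][0]
            zeros = last_position + 1 - len(survivors)
        else:
            last_position, zeros = -1, 0
        results.append((threshold, last_position, zeros))
    return results


status = [(0, 40), (1, 12), (3, 5), (4, 3), (7, 1)]   # toy status array
trace = track_last_nonzero(status, thresholds=[1, 2, 4, 6, 13, 50])
```

As the threshold grows, the last position can only move toward the start of the scan, which is why the previous result is a valid starting point for the next one.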

In the following, we discuss the computational complexity of existing R-Q models using the QCIF format⁶ with the help of a fast LUT. As analyzed above, the total computational complexity of the recursive extraction of , , and is roughly additive operations. For the construction of the R-Q model, it is necessary to perform (8) and (6) with Q2 31 times. In (6) with Q2, there are three additive operations and three multiplications. In (8), two multiplications, one shift, and two additive operations are needed. Therefore, additions, 5 × 31 multiplications, and 31 shifts are needed in total for the construction of the whole R-Q model.
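The per-QP part of this tally can be checked with a few lines of arithmetic. The sketch below counts only the cost of evaluating (6) and (8) at each of the 31 QP values, using the per-equation figures quoted in the paragraph above; it deliberately leaves out the cost of the recursive extraction, and the names `OPS_EQ6`, `OPS_EQ8`, and `total_ops` are hypothetical.

```python
QP_LEVELS = 31                                      # number of QP values evaluated

# Per-evaluation operation counts quoted in the text.
OPS_EQ6 = {"add": 3, "mul": 3, "shift": 0}          # (6) with Q2
OPS_EQ8 = {"add": 2, "mul": 2, "shift": 1}          # (8)


def total_ops(per_equation_costs, levels):
    """Sum the per-equation costs and scale by the number of QP levels."""
    totals = {"add": 0, "mul": 0, "shift": 0}
    for costs in per_equation_costs:
        for kind, amount in costs.items():
            totals[kind] += amount * levels
    return totals


totals = total_ops([OPS_EQ6, OPS_EQ8], QP_LEVELS)
```

The multiplication and shift totals match the 5 × 31 multiplications and 31 shifts stated above; the addition total here covers only the 5 × 31 additions of the model evaluation.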

For the ρ-domain modeling framework, we first build the one-to-one mapping table for ρ and QP. This roughly costs additive operations for the construction of the histogram of DCT coefficients and the recursive computation of the one-to-one mapping between ρ and QP. To construct the rate curve R(ρ), at least one control point , which is defined in (13), is needed to construct the whole characteristic curve. Here, we apply another LUT, denoted as , to speed up the calculation of in (13). Therefore, the computation of takes about , i.e., , additive operations as follows:

(13)

In addition, we need to consider the computational complexity of (1), (2), and (3) for the finite set of pseudobit rates . In (1), two multiplications and two additive operations are required. In (2), we need to perform one multiplication. In (3), six

⁶ The number of pixels in a 16 × 16 MB is 16 × 16 × 1.5, i.e., six 8 × 8 blocks.

sion. Finally, the whole R-Q model is constructed by performing linear rate prediction 31 times; 31 × 10 additive operations and 31 × 2 multiplications are needed.

For the variance-based R-Q model in [1], we need to perform the quadratic form to obtain the R-Q model, which is shown as follows:

(14)

where is a constant value and is an adaptive factor which is updated using the method in [1]. Hence, its computational complexity includes the computation of the variance and the evaluation of (14) 31 times. The computation of the variance costs 4.5 additions and 1.5 multiplications. In addition, to compute (14), two multiplications are needed. Therefore, there are in total multiplications and 1.5 additive operations. Extensive experimental data show that 1.5 is three times , statistically. Thus, approaches 16 × 16 × 1.5 × 0.333. Note that the above analysis of the computational complexity uses an MB as the processing unit. If we use a frame as the processing unit, can be easily scaled to the size of .

Table I summarizes the computational complexity of the different R-Q models. We can see that, with the help of the fast LUT approach, the two characteristic-based source models perform better than the variance-based model in terms of computational complexity. Moreover, the computation of the ρ-domain source model can be sped up by a fast table-lookup approach, but a large LUT is required, e.g., the table for (13). On the other hand, our proposed q-domain model 1) requires only a small LUT and 2) keeps the computational complexity low even without the help of an LUT. Therefore, we believe our q-domain R-Q model is more suitable for environments equipped with a general-purpose processing unit or limited hardware.

IV. EXPERIMENT RESULTS

We implement the proposed rate model on H.263+ [12] and experiment on numerous typical QCIF-format videos with 300 frames. The encoding frame rate is fixed at 10 fps. The frame type is set to IPPP. The first I-frame is encoded with .

To explore the robustness of the R-Q model, we use a predefined QP assignment similar to that in [2]⁷ to offline encode each MB and then generate the actual bit rate and the estimated bit rate . The ratio of the accumulated estimation errors to the total bit rate of a frame is used to evaluate the accuracy at the MB

⁷ The QP assignment of each MB is progressively increased by 1 from 15 to


TABLE I

COMPARISON OF COMPUTATIONAL COMPLEXITY OF DIFFERENT MODELS

TABLE II

COMPARISON OF COMPUTATIONAL COMPLEXITY OF DIFFERENT MODELS

Fig. 5. Comparison of the number of bits in the encoder buffer when the proposed rate controller, TMN8rc, and ρ-rc are applied in the H.263+ encoder for VBR.

level, i.e., .

On the other hand, for frame-level, we will adopt .
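The MB-level accuracy measure described in words above (accumulated estimation errors divided by the total bit rate of the frame) can be sketched as follows. The exact expressions are not reproduced here; the absolute-error form, the frame-level variant, and the names `mb_level_error` and `frame_level_error` are assumptions made for illustration.

```python
def mb_level_error(actual_bits, estimated_bits):
    """Accumulated per-MB absolute error divided by the frame's total bits."""
    accumulated = sum(abs(a - e) for a, e in zip(actual_bits, estimated_bits))
    return accumulated / sum(actual_bits)


def frame_level_error(actual_bits, estimated_bits):
    """Relative error of the frame totals (assumed frame-level counterpart)."""
    actual_total = sum(actual_bits)
    return abs(actual_total - sum(estimated_bits)) / actual_total


actual = [100, 50, 50]        # toy per-MB actual bits of one frame
estimated = [90, 55, 50]      # toy per-MB estimated bits
mb_error = mb_level_error(actual, estimated)
frame_error = frame_level_error(actual, estimated)
```

Per-MB errors of opposite sign cancel in the frame total, so the frame-level figure is never larger than the MB-level one under these definitions.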

Comparisons of modeling accuracy using the predefined QP assignment with the other models [1], [2], and [4] are given in Table II. For the ρ-domain-based R-Q model, we use an optimal slope for each MB in the simulation. We can see that the proposed rate modeling function improves substantially on the other models at both the frame and MB levels. In the following, we compare our previous rate controller [6], which uses the proposed rate model, with other rate controllers in more detail. The CBR cases have been presented in [6]. In this study, we demonstrate the performance for VBR [10]. In Fig. 5, we plot the buffer fullness for each coded frame when our rate controller in [6], TMN8rc, and ρ-rc are applied. One can see that, when the bandwidth fluctuates from 66 to 22 kbps, our rate controller avoids many buffer underflows compared with TMN8rc and ρ-rc. In Table III, we list the average PSNR and the number of coded frames for different sequences with different CBR. The proposed rate controller also

TABLE III

COMPARISON OF AVERAGE PSNR WITH DIFFERENT CHANNEL RATE

can substantially improve the PSNR, by up to 0.70 and 0.56 dB compared with TMN8rc and ρ-rc, respectively. Consequently, the proposed rate controller can efficiently utilize the channel bandwidth, maintain the buffer fullness, and provide better visual quality compared with TMN8rc and ρ-rc.
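The buffer behavior plotted in Fig. 5 follows from a simple fullness update. The sketch below is a generic leaky-bucket model assumed for illustration, not the controller of [6]: each frame adds its coded bits, the channel drains rate / fps bits per frame interval, and an underflow is counted whenever the drain exceeds what the buffer holds. The function name `simulate_buffer` and the toy numbers are hypothetical.

```python
def simulate_buffer(frame_bits, channel_rates, fps):
    """Track encoder buffer fullness frame by frame.

    frame_bits:    bits produced for each coded frame
    channel_rates: channel rate in bits per second during each frame
    Returns (fullness_per_frame, underflow_count).
    """
    fullness = 0.0
    history = []
    underflows = 0
    for bits, rate in zip(frame_bits, channel_rates):
        fullness += bits - rate / fps     # add coded bits, drain one interval
        if fullness < 0:
            underflows += 1               # channel capacity was left unused
            fullness = 0.0
        history.append(fullness)
    return history, underflows


frame_bits = [8000, 3000, 1000]             # toy coded-frame sizes
channel_rates = [66000, 66000, 22000]       # bandwidth drops from 66 to 22 kbps
history, underflows = simulate_buffer(frame_bits, channel_rates, fps=10)
```

A more accurate rate model lets the controller choose QPs whose produced bits track the drain more closely, which is what keeps the fullness away from zero.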

V. CONCLUSION

In this paper, we introduce a new characteristic vector for the “pseudocoding” process to construct the bit-rate model in the q-domain. Experimental results show that, compared with existing models, the proposed characteristic-based bit-rate model not only substantially improves the accuracy of the estimated bit rate but also is beneficial for developing an efficient rate controller.

REFERENCES

[1] J. Ribas-Corbera and S. Lei, “Rate control in DCT video coding for low-delay communication,” IEEE Trans. Circuits Syst. Video Technol., vol. 9, no. 1, pp. 172–185, Feb. 1999.

[2] J. Wei, B. H. Soong, and Z. G. Li, “A new rate-distortion model for video transmission using multiple logarithmic functions,” IEEE Signal Process. Lett., vol. 11, pp. 694–697, Aug. 2004.

[3] T. Y. Kim and J. K. Kim, “An accurate rate control of MPEG video by rate-codewords modeling,” in Proc. IEEE Int. Symp. Circuits Systems (ISCAS’97), Hong Kong, Jun. 9–12, 1997, pp. 1261–1264.

[4] Z. He and S. K. Mitra, “Low-delay rate control for DCT video coding via ρ-domain source modeling,” IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 8, pp. 928–940, Aug. 2001.

[5] Z. He and S. K. Mitra, “A unified rate-distortion analysis framework for transform coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 12, pp. 1121–1235, Dec. 2001.

[6] C. Y. Chang, M. H. Chen, C. F. Chou, and D. Y. Chan, “A two-layer characteristic-based rate control framework for low delay video transmission,” in Proc. IEEE Int. Conf. Commun. (ICC’07), Jun. 2007, pp. 2699–2704.

[7] R. J. Freund and W. J. Wilson, Regression Analysis: Statistical Modeling of a Response Variable. New York: Academic, 1998.

[8] M. Ghanbari, Video Coding: An Introduction to Standard Codecs. London, U.K.: IEE Press, 1999.

[9] C. Y. Chang, L. Tsungnan, D. Y. Chan, and S. H. Hung, “A low complexity rate-distortion source modeling framework,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP’06), May 2006, pp. 929–932.

[10] Z. G. Li, C. Zhu, N. Ling, X. K. Yang, G. N. Feng, S. Wu, and F. Pan, “A unified architecture for real-time video-coding systems,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 6, pp. 472–486, Jun. 2003.

[11] Z. He and S. K. Mitra, “A linear source model and unified rate control algorithm for DCT video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 11, pp. 970–982, Nov. 2002.

[12] H.263+ Codec [Online]. Available: http://www.ece.ubc.ca/spmg/h263plus/h263plus.html