
Chapter 2 Advanced Audio Coding

2.9 Rate-Distortion Control Process

Motivated by the human auditory system, the spectral coefficients are grouped into a number of bands, called scale factor bands (SFBs). The spectral coefficients in one SFB are quantized by a non-uniform quantizer, which in AAC is formulated in (2.1). The common_scalefactor carries the quantizer step size information common to all SFBs. The quantizer step size, which determines the quantization distortion (noise-to-masking ratio, NMR), is controlled by the parameter Scale Factor (SF). Note that the parameter Scale Factor, here and in the following discussions, is equal to (scalefactor − common_scalefactor) in (2.1).

$$x_q = \mathrm{int}\left( \left|\mathrm{mdct\_line}\right|^{3/4} \times 2^{\frac{3}{16}\,(\mathrm{scalefactor} - \mathrm{common\_scalefactor})} + 0.4054 \right) \tag{2.1}$$
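As a rough illustration of the quantizer, the sketch below assumes that (2.1) has the form int(|mdct_line|^(3/4) × 2^((3/16)(scalefactor − common_scalefactor)) + 0.4054). The function name `quantize` is hypothetical, and the function returns only the magnitude, as the abs() in (2.1) suggests.

```python
def quantize(mdct_line, scalefactor, common_scalefactor):
    """Quantized magnitude of one MDCT coefficient, following the assumed form of (2.1)."""
    # Step-size gain controlled by SF = scalefactor - common_scalefactor.
    gain = 2.0 ** (3.0 / 16.0 * (scalefactor - common_scalefactor))
    # Power-law (non-uniform) companding, then rounding with the 0.4054 offset.
    return int(abs(mdct_line) ** 0.75 * gain + 0.4054)


print(quantize(16.0, 0, 0))   # 8
print(quantize(16.0, 16, 0))  # 64
```

Raising SF by 16 multiplies the gain by 2^3 = 8, so the same coefficient maps to a quantized value eight times larger.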

The quantized coefficients in one band are then entropy-coded by one of the twelve pre-designed Huffman codebooks (HCBs). Each SFB can have its own quantization step size and HCB. In addition, the indices of SFs and HCBs have to be coded and transmitted as side information. In AAC, the SFs are differentially coded relative to the previous SF and then Huffman coded using a pre-designed codebook [2]. Taking Fig. 2.8 as an example, instead of encoding the SF value of the 2nd SFB, 65, the difference between the 2nd SFB and the 1st SFB, 5, is coded. The indices of HCBs are coded by run-length codes [22]. A run-length code in AAC is 9 bits long and is composed of a 4-bit codebook index and a 5-bit run index. For example, as shown in Fig. 2.8, the 3rd HCB is used from the 1st SFB to the 10th SFB; therefore, these 10 HCB indices (same value) are coded together by one run-length code, in which the codebook index is 3 and the run index is 10. Clearly, the differential and run-length coding induce inter-band dependency in the coding process.

The R-D controller, our focus in this paper, determines two critical parameters, the values of SF and HCB, for each SFB so as to optimize the selected criterion under the given bit rate constraint. In the following discussions, if the context is clear, the abbreviation “SF” also refers to the value of SF and “HCB” also refers to the index of HCB.
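The side-information coding described above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, the SF and HCB values beyond those quoted from Fig. 2.8 (SF 60 then 65, HCB 3 for ten SFBs) are made up, and every run is assumed to fit into the 5-bit run index.

```python
from itertools import groupby

RUN_CODE_BITS = 4 + 5  # 4-bit codebook index + 5-bit run index


def sf_differences(sfs):
    """Differences between consecutive SF values (what is actually coded)."""
    return [cur - prev for prev, cur in zip(sfs, sfs[1:])]


def hcb_runs(hcbs):
    """Group consecutive equal HCB indices into (codebook, run_length) pairs."""
    return [(cb, len(list(group))) for cb, group in groupby(hcbs)]


def hcb_side_info_bits(hcbs):
    """Bits spent on HCB indices: one 9-bit run-length code per run."""
    return RUN_CODE_BITS * len(hcb_runs(hcbs))


sfs = [60, 65, 65, 63]        # first two values as in Fig. 2.8, rest hypothetical
hcbs = [3] * 10 + [5] * 2     # HCB 3 for SFBs 1..10, then a hypothetical HCB 5

print(sf_differences(sfs))       # [5, 0, -2]
print(hcb_runs(hcbs))            # [(3, 10), (5, 2)]
print(hcb_side_info_bits(hcbs))  # 18
```

Changing the HCB of a single SFB in the middle of a run splits it into three runs, which is the inter-band dependency the R-D controller has to account for.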

Fig. 2.8: An example of values of SF and HCB.

A typical rate-distortion (R-D) control process in the MPEG audio encoder has two nested iteration loops, the outer iteration loop and the inner iteration loop. Thus, it is often called the two-loop search (TLS). The outer iteration loop is the distortion control loop that handles the distortion associated with each band. The inner iteration loop, also called the rate control loop, adjusts coding bits to fit the target bit budget for a frame. The flowcharts of the outer loop and the inner loop specified in AAC are shown in Fig. 2.9.

Fig. 2.9: AAC (a) outer iteration loop and (b) inner iteration loop. [2]

Chapter 3

Cascaded Trellis-Based Rate-Distortion Control Algorithm

In this chapter, we describe the first type of proposed R-D control algorithm, called the cascaded trellis-based (CTB) scheme. The proposed CTB algorithm and its variations are described in Section 3.1. The proposed fast trellis search schemes are described in Section 3.2. The complexity analysis of the proposed R-D control algorithms and the simulation results with quality evaluation are summarized in Section 3.3.

3.1 Cascaded Trellis-Based Optimization Scheme

We start with the problem formulation of the R-D control algorithm for AAC in Section 3.1.1. The trellis-based (TB) procedures for SF optimization and HCB optimization in the CTB scheme are described in Sections 3.1.2 and 3.1.3, respectively. One key element in the trellis-based optimization process on SF, the so-called “pseudo HCB”, is explained in Section 3.1.4. Finally, the procedure of the complete CTB optimization scheme is summarized in Section 3.1.5.

3.1.1 Problem Formulation

For perceptual audio coders, the noise-to-masking ratio (NMR) is the most widely used objective measure in the R-D control module for modeling the subjective perceptual distortion. Based on NMR, there are two commonly used criteria for R-D optimization, the average noise-to-mask ratio (ANMR) and the maximum noise-to-mask ratio (MNMR) [23]. In AAC, the differential coding of SFs and the run-length coding of HCBs introduce inter-band dependence in parameter selection. To take this inter-band dependence into account, we need to consider all possible combinations of SFs and HCBs over all SFBs and examine the bits and distortion produced by each combination. If such inter-band dependence did not exist, we could decide SF and HCB for each SFB separately and add all bands together to find the globally optimal solution.

Mathematically, the R-D optimization problems for minimizing ANMR and MNMR under a given bit rate constraint are formulated by (3.1) and (3.2), respectively.

$$\min_{\{s_i, h_i\}} \sum_i w_i d_i \quad \text{subject to} \quad \sum_i \big( b_i + D(s_i - s_{i-1}) + R(h_{i-1}, h_i) \big) \le \mathrm{PB} \tag{3.1}$$

$$\min_{\{s_i, h_i\}} \max_i \, w_i d_i \quad \text{subject to} \quad \sum_i \big( b_i + D(s_i - s_{i-1}) + R(h_{i-1}, h_i) \big) \le \mathrm{PB} \tag{3.2}$$

where $i$ is the SFB index, $w_i$ is the inverse of the masking threshold, and $d_i$ is the quantization distortion, i.e., the mean squared quantization error. In (3.1), $\sum_i w_i d_i$ is the sum of NMR over all SFBs in a frame, and in (3.2), $\max_i w_i d_i$ is the maximum NMR in a frame. The parameter values of SF and HCB for the $i$th SFB are denoted by $s_i$ and $h_i$, respectively. Symbol $D()$ is a function of SF, representing the number of bits produced by differential coding of SF. Symbol $R()$ is a function of HCB, representing the number of bits produced by run-length coding of HCB. The returned function values in both cases are the numbers of bits needed to encode the arguments. Parameter $b_i$ is the number of bits for coding the quantized spectral coefficients (QSCs), and the parameter PB is the prescribed bit rate for an audio frame.

To solve (3.1) and (3.2), the straightforward joint optimization of SF and HCB for all SFBs is exorbitantly complex. For one frame in AAC, the number of SF values is 60, the number of HCB indices is 12, and there are 49 SFBs in total. Therefore, to find the optimal solution among all combinations, the complexity of brute-force search is $O((60 \cdot 12)^{49})$. In [7][8], a dynamic programming approach, called the joint trellis-based (JTB) scheme in this paper, is proposed to find the optimal SF and HCB for all SFBs jointly at a reduced complexity. As shown in [7][8], the problem of minimizing ANMR in (3.1) can be reformulated as minimizing the unconstrained cost function

$$C_{ANMR} = \sum_i w_i d_i + \lambda \sum_i \big( b_i + D(s_i - s_{i-1}) + R(h_{i-1}, h_i) \big) \tag{3.3}$$

Likewise, the problem of minimizing MNMR in (3.2) can be reformulated as minimizing the cost function, $C_{MNMR}$, under the constraint $w_i d_i \le \lambda$, $\forall i$, for a certain value of $\lambda$.

$$C_{MNMR} = \sum_i \big( b_i + D(s_i - s_{i-1}) + R(h_{i-1}, h_i) \big) \tag{3.4}$$

The research in [7][8] shows that the problem of minimizing $C_{ANMR}$ and $C_{MNMR}$ can be efficiently solved by the Viterbi search through the trellis, in which we compute only the legal transitions from the previous state to the current state [12][13]. Although the search complexity of the JTB scheme [8], $O((60 \cdot 12)^2 \cdot 49)$, is much lower than that of brute-force search, it is still extremely high for practical applications.

As shown in Fig. 3.1, a simplification of the JTB scheme is to search for the SF and the HCB values in two consecutive steps without going through all possible combinations. Ideally, the order of complexity of our CTB scheme goes down to $O((60^2 + 12^2) \cdot 49)$. However, because these two steps are strongly correlated, the cascaded algorithm must treat this correlation explicitly to limit the performance degradation. This is the main point of this section.
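To give a feel for the gap between the two complexity orders, the short calculation below evaluates both expressions with the numbers quoted in the text. The results are orders of magnitude for the number of state transitions, not measured operation counts, and the variable names are only illustrative.

```python
N_SF, N_HCB, N_SFB = 60, 12, 49  # SF values, HCB indices, SFBs per frame

jtb_transitions = (N_SF * N_HCB) ** 2 * N_SFB       # joint trellis
ctb_transitions = (N_SF ** 2 + N_HCB ** 2) * N_SFB  # two cascaded trellises

print(jtb_transitions)                               # 25401600
print(ctb_transitions)                               # 183456
print(round(jtb_transitions / ctb_transitions, 1))   # 138.5
```

Under these assumptions the cascaded search visits roughly two orders of magnitude fewer transitions than the joint search.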

Fig. 3.1: Joint trellis-based scheme vs. cascaded trellis-based scheme.

3.1.2 Trellis-Based Optimization on SF

In this sub-section, the procedures of trellis-based optimization on SF aiming at two criteria, ANMR and MNMR, are described.

1) Trellis-Based Procedure for ANMR Minimization:

The problem of minimizing ANMR in the JTB scheme is formulated as minimizing the unconstrained cost function, $C_{ANMR}$, in (3.3). However, to break the combined one step into two consecutive steps in our CTB scheme, this problem is reformulated as minimizing two unconstrained cost functions, $C_{SF\_ANMR}$ and $C_{HCB}$, as follows.

$$C_{SF\_ANMR} = \sum_i w_i d_i + \lambda \sum_i \big( b_i + D(s_i - s_{i-1}) \big) \tag{3.5}$$

$$C_{HCB} = \sum_i \big( b_i + R(h_{i-1}, h_i) \big) \tag{3.6}$$

The minimization of $C_{SF\_ANMR}$ is described in this sub-section, and the minimization of $C_{HCB}$ will be described in Section 3.1.3. Because $C_{SF\_ANMR}$ and $C_{HCB}$ are minimized in two separate steps, the global optimality of $C_{ANMR}$ is not guaranteed, although the computation is significantly reduced. Our contribution described hereafter is to develop techniques that come close to the global optimality.

Similar to the approach in the JTB scheme, the goal of finding proper SFs that minimize $C_{SF\_ANMR}$ can be achieved by looking for the optimal path through the trellis. Each stage in the trellis corresponds to an SFB and there are N_SFB stages in total. However, different from JTB, each state at the $i$th stage in our scheme only represents an SF candidate for the $i$th SFB. In other words, at the $i$th stage, if a path passes through the $m$th state, it means that the $m$th SF candidate is used to encode the $i$th SFB.

For a given value of $\lambda$, the Viterbi search procedure for finding a proper set of SFs that minimize $C_{SF\_ANMR}$ is outlined below. We denote $\Upsilon_{k,i}$ as the $k$th state at the $i$th stage and denote $C_{k,i}$ as the minimum accumulative-partial cost ending at $\Upsilon_{k,i}$. The state-transition cost, $T_{l,i-1 \to k,i}$, from $\Upsilon_{l,i-1}$ to $\Upsilon_{k,i}$ is $\lambda \cdot D(s_{k,i} - s_{l,i-1})$, where $s_{k,i}$ is the SF value associated with the state $\Upsilon_{k,i}$.

1) Initialize all the states and start the trellis search from the first stage: $C_{k,0} = 0$, $\forall k$, and $i = 1$.

2) For each state at the $i$th stage, find the best path from the previous stage by examining all the states at the $(i-1)$th stage leading to the current state. The best path ending at $\Upsilon_{k,i}$ is the one that has the minimum accumulative-partial cost $C_{k,i}$. That is, for all $k$ we compute

$$C_{k,i} = \min_l \{ C_{l,i-1} + w_i d_{k,i} + \lambda \cdot b_{k,i} + T_{l,i-1 \to k,i} \} \tag{3.7}$$

3) Check the index $i$. If $i <$ N_SFB, set $i = i+1$ and go to step 2.

2) Trellis-Based Procedure for MNMR Minimization:

The problem of minimizing MNMR in the JTB scheme is formulated as minimizing the cost function, $C_{MNMR}$, in (3.4). In our CTB scheme, this problem is reformulated as the minimization of two cost functions, $C_{SF\_MNMR}$ in (3.8) and $C_{HCB}$ in (3.6), under the constraint $w_i d_i \le \lambda$, $\forall i$, for a certain value of $\lambda$.

$$C_{SF\_MNMR} = \sum_i \big( b_i + D(s_i - s_{i-1}) \big) \tag{3.8}$$

Similar to the trellis-based ANMR optimization on selecting SF described above, an “SF trellis” is constructed for minimizing $C_{SF\_MNMR}$. For a given value of $\lambda$, the Viterbi search procedure for finding proper SFs that minimize $C_{SF\_MNMR}$ is outlined below. The state-transition cost, $T_{l,i-1 \to k,i}$, is $D(s_{k,i} - s_{l,i-1})$.

1) Initialize: $C_{k,0} = 0$, $\forall k$, and $i = 1$.

2) At the $i$th stage, only the states whose associated NMR, $w_i d_{k,i}$, is less than or equal to $\lambda$ are valid for the trellis search. Therefore, before starting the trellis search, we must find the valid states $\Upsilon_{k,i}$, $\forall k$, for the $i$th stage.

3) For each valid state at the $i$th stage, find the best path from the previous stage by examining all the valid states in the $(i-1)$th stage leading to the current state. That is, we compute

$$C_{k,i} = \min_l \{ C_{l,i-1} + b_{k,i} + T_{l,i-1 \to k,i} \} \tag{3.9}$$

4) If $i <$ N_SFB, set $i = i+1$ and go to step 2.

After completing the forward “search and expansion” step through the trellis, the optimal path in the trellis can be extracted by tracing backward from the state with minimum $C_{k,\mathrm{N\_SFB}}$ at the last stage. Consequently, the optimal SFs for all SFBs that minimize $C_{SF\_MNMR}$ (or $C_{SF\_ANMR}$) are determined.
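The forward search and backward trace described above can be sketched as a generic Viterbi routine that serves both criteria. This is a minimal sketch, not the thesis implementation: the function name `viterbi`, the toy numbers, and the bit-cost function `D` are hypothetical, and the first stage is assumed to carry no transition cost. For ANMR the node cost is $w_i d_{k,i} + \lambda b_{k,i}$ and the transition cost is $\lambda D(\cdot)$; for MNMR the node cost is $b_{k,i}$, with invalid states (NMR above the limit) marked by infinity.

```python
import math


def viterbi(node_cost, trans_cost):
    """Minimum-cost path through a trellis.

    node_cost[i][k]: cost of choosing candidate k at stage (SFB) i;
                     math.inf marks an invalid state.
    trans_cost(l, k): cost of moving from candidate l to candidate k.
    Returns (total_cost, path) with one candidate index per stage.
    """
    n_states = len(node_cost[0])
    cost = list(node_cost[0])  # assumption: no transition cost into stage 0
    back = []                  # back[i-1][k] = best predecessor of k at stage i
    for stage in node_cost[1:]:
        new_cost, pointers = [], []
        for k in range(n_states):
            best_l = min(range(n_states), key=lambda l: cost[l] + trans_cost(l, k))
            new_cost.append(cost[best_l] + trans_cost(best_l, k) + stage[k])
            pointers.append(best_l)
        cost = new_cost
        back.append(pointers)
    # Trace backward from the cheapest state at the last stage.
    k = min(range(n_states), key=lambda s: cost[s])
    total, path = cost[k], [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return total, path


# Toy data (all hypothetical): 3 SFBs, 2 SF candidates per SFB.
sf = [0, 1]                      # candidate SF values
wd = [[4, 1], [1, 5], [3, 1]]    # NMR w_i * d_{k,i}
b = [[2, 4], [2, 3], [1, 4]]     # QSC bits b_{k,i}


def D(delta):
    """Hypothetical bit cost of coding an SF difference."""
    return 1 + 2 * abs(delta)


# ANMR criterion, as in (3.7).
lam = 1.0
anmr_nodes = [[wd[i][k] + lam * b[i][k] for k in range(2)] for i in range(3)]
anmr = viterbi(anmr_nodes, lambda l, k: lam * D(sf[k] - sf[l]))
print(anmr)  # (15.0, [0, 0, 0])

# MNMR criterion, as in (3.9): only states with NMR <= nmr_limit are valid.
nmr_limit = 3
mnmr_nodes = [[b[i][k] if wd[i][k] <= nmr_limit else math.inf for k in range(2)]
              for i in range(3)]
mnmr = viterbi(mnmr_nodes, lambda l, k: D(sf[k] - sf[l]))
print(mnmr)  # (11, [1, 0, 0])
```

Marking invalid states with an infinite node cost keeps the MNMR constraint out of the search loop, so the same routine handles both cases.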

As described in [7][8], any value of SF can be assigned to a band below the masking threshold. Therefore, its associated state in the trellis is split into two consecutive states. At the first state, the spectral coefficients are quantized using the assigned valid SF, and at the second state, all quantized values of spectral coefficients are set to zero.

3.1.3 Trellis-Based Optimization on HCB

The HCB optimization is performed under the assumption that the SF (value) for each SFB has already been decided. In our CTB scheme, SF is determined by the trellis-based optimization on SF described in Section 3.1.2. With a determined SF, the QSCs for each SFB are fixed and thus the $b_i$ term in the cost function $C_{HCB}$ (see (3.6)) depends only on the selection of HCB. Therefore, $C_{HCB}$ can be restated as (3.10).

$$C_{HCB} = \sum_i \big( H_{h_i}(\mathbf{q}_i) + R(h_{i-1}, h_i) \big) \tag{3.10}$$

where $\mathbf{q}_i$ (vector) contains the QSCs for the $i$th SFB and symbol $H_h()$ is a function of QSCs, representing the number of bits produced by Huffman-coding of QSCs using the $h$th HCB. The goal of the optimization procedure here is to find the HCBs for all SFBs that minimize $C_{HCB}$, again by searching a trellis, with states now being HCB.

An “HCB trellis” is thus constructed for searching for the minimum $C_{HCB}$. Each state at the $i$th stage represents an HCB candidate for the $i$th SFB. The state-transition cost, $T_{n,i-1 \to m,i}$, from $\Upsilon_{n,i-1}$ to $\Upsilon_{m,i}$ is $R(h_{n,i-1}, h_{m,i})$, where $h_{m,i}$ is the HCB associated with the state $\Upsilon_{m,i}$. According to the run-length coding rule in AAC, $R(h_{n,i-1}, h_{m,i})$ is defined by (3.11). In other words, no extra bits are transmitted if the same HCB is used in two neighboring SFBs.

$$R(h_{n,i-1}, h_{m,i}) = \begin{cases} 0, & \text{if } h_{n,i-1} = h_{m,i} \\ 9, & \text{otherwise} \end{cases} \tag{3.11}$$

The Viterbi search procedure for finding proper HCBs that minimize CHCB is outlined below.

1) Initialize: $C_{m,0} = 0$, $\forall m$, and $i = 1$.

2) For each state at the $i$th stage, find the best path from the previous stage by examining all the states at the $(i-1)$th stage leading to the current state. That is, for each state $\Upsilon_{m,i}$ we compute

$$C_{m,i} = \min_n \{ C_{n,i-1} + H_{h_{m,i}}(\mathbf{q}_i) + T_{n,i-1 \to m,i} \} \tag{3.12}$$

where $\mathbf{q}_i$ is the vector of the QSCs in stage $i$.

3) If $i <$ N_SFB, set $i = i+1$ and go to step 2.

Similar to the trellis-based optimization on SF, after completing the forward search/expansion step through the trellis, the optimal path in the trellis can be extracted by tracing backward from the state with minimum $C_{m,\mathrm{N\_SFB}}$ at the last stage. Then, the optimal HCBs for all SFBs that minimize $C_{HCB}$ are determined.
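The HCB search can be sketched compactly because the transition cost in (3.11) takes only two values: the best predecessor of a state is either the same codebook (0 bits) or the cheapest state of the previous stage (9 bits). The sketch below relies on that observation, which is an implementation shortcut and not something stated in the text. The function name `optimize_hcb` and the bit counts are hypothetical, and the first SFB is assumed to carry no transition cost.

```python
RUN_BITS = 9  # cost of starting a new run-length section, as in (3.11)


def optimize_hcb(huff_bits):
    """Pick one HCB per SFB so that Huffman bits plus run-length bits are minimal.

    huff_bits[i][m]: bits needed to code the QSCs of SFB i with HCB m.
    Returns (total_bits, hcb_per_sfb).
    """
    n_hcb = len(huff_bits[0])
    cost = list(huff_bits[0])  # assumption: no transition cost into the first SFB
    back = []
    for bits in huff_bits[1:]:
        cheapest = min(range(n_hcb), key=lambda n: cost[n])
        new_cost, pointers = [], []
        for m in range(n_hcb):
            # Stay in the same codebook (0 bits) or switch from the
            # cheapest predecessor (9 bits), whichever is smaller.
            if cost[m] <= cost[cheapest] + RUN_BITS:
                prev, base = m, cost[m]
            else:
                prev, base = cheapest, cost[cheapest] + RUN_BITS
            new_cost.append(base + bits[m])
            pointers.append(prev)
        cost = new_cost
        back.append(pointers)
    # Trace backward from the cheapest state at the last stage.
    m = min(range(n_hcb), key=lambda s: cost[s])
    total, path = cost[m], [m]
    for pointers in reversed(back):
        m = pointers[m]
        path.append(m)
    path.reverse()
    return total, path


# Hypothetical Huffman bit counts for 3 SFBs and 2 candidate HCBs.
huff_bits = [[10, 30], [20, 8], [11, 9]]
print(optimize_hcb(huff_bits))  # (36, [0, 1, 1])
```

In this toy case switching codebooks after the first SFB costs 9 bits but saves 12 bits on the second SFB, so the optimal path changes HCB once and then stays.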

3.1.4 Pseudo HCB for SF Optimization

1) Motivation for Pseudo HCB:

We first look at the MNMR minimization case. The key problem in splitting (3.4) into (3.8) and (3.6) is to choose the correct (optimal) value of $b_{k,i}$ in (3.8). In (3.8), the $w_i d_{k,i}$ or $D(s_{k,i} - s_{l,i-1})$ term is unique for a given state or state transition in the SF trellis. However, the value of $b_{k,i}$ depends not only on the $s_{k,i}$ associated with the state in the SF trellis but also on the choice of HCB. In the JTB scheme, SF and HCB are chosen simultaneously. Therefore, for each candidate value of SF, all possible $b_{k,i}$ values, corresponding to the 12 candidate HCBs, are evaluated. In other words, the chosen value of $b_{k,i}$ for each state $\Upsilon_{k,i}$ in the trellis of the JTB optimization scheme is optimal [7][8]. In our sequential optimization scheme, however, the value of $b_{k,i}$ for the state $\Upsilon_{k,i}$ in (3.8) is estimated from the available information. The estimated value of $b_{k,i}$ may not be optimal, and this may in turn induce an incorrect (non-optimal) selection in SF optimization.

For example, two candidate paths in the SF trellis, A and B, are shown in (3.13). Path A is better than path B because $C^A_{SF\_MNMR} < C^B_{SF\_MNMR}$, where $C^A_{SF\_MNMR}$ and $C^B_{SF\_MNMR}$ are the $C_{SF\_MNMR}$ values of path A and path B, respectively. Note that $\hat{b}_i^A$ and $\hat{b}_i^B$ in (3.13) are the estimated values of $b_i$ for path A and path B. If the decision on SF is made at this point, path A is chosen. Now, let us go one step further. Based on the selected SF sets of path A and path B, we can find their optimal HCBs, $h_i^A$ and $h_i^B$, respectively, according to the HCB optimization procedure described in Section 3.1.3. Then, their actual bit counts, $b_i^A$ and $b_i^B$, for path A and path B, respectively, are obtained. Finally, the total costs $C^A_{MNMR}$ and $C^B_{MNMR}$ for the two candidate paths are shown in (3.14). The result in (3.14) indicates that path B is actually better than path A when the bit counts are correct. With a wrong estimate of $b_i$, our CTB algorithm would pick path A for the SFs and thus fail to find the overall optimal path B.

Clearly, with a more accurate estimate of $b_{k,i}$, we can select better SFs. For this aim, the concept of “pseudo HCB” is proposed for the trellis-based optimization on SF. The preceding discussion on choosing HCB also applies to the ANMR minimization case.

2) Design of Pseudo HCB:

When the trellis-based optimization on SF is performed in the pseudo HCB mode, a pseudo HCB with an index set $h^v_{k,i}$ needs to be constructed for the state $\Upsilon_{k,i}$ to produce $b_{k,i}$ in (3.5) and (3.8). It can be constructed in several ways. For example, $h^v_{k,i}$ may contain only one of the 12 candidate HCBs or several codebooks. In order to improve the accuracy of the estimated values of $b_{k,i}$ and $h^v_{k,i}$, we analyze the data collected from the JTB optimization scheme.

For a given value of $\lambda$, using the JTB scheme, we can find a set of optimal parameters, $s_{opt}^{JTB}$, $h_{opt}^{JTB}$ and $b_{opt}^{JTB}$, that minimizes the cost function, $C_{ANMR}$ in (3.3) or $C_{MNMR}$ in (3.4). For comparison purposes, we also construct a reference set of QSC bits, $b_{min}^{JTB}$, in the following way. For the $i$th SFB, $b_{min,i}^{JTB}$ is the minimum number of bits for encoding $\mathbf{q}_{opt,i}^{JTB}$ and is determined by $b_{min,i}^{JTB} = \min_m \{ H_m(\mathbf{q}_{opt,i}^{JTB}) \}$, where $\mathbf{q}_{opt,i}^{JTB}$ is the QSCs quantized by using $s_{opt,i}^{JTB}$. In other words, without considering the bits for coding the HCB indices, $b_{min,i}^{JTB}$ is the lowest number of bits produced by any of the 12 HCBs applied to the QSCs. Because the coded bits for HCB indices, $R(h_{i-1}, h_i)$, are also included in the overall optimization procedure, when comparing coding bits for QSCs only, $b_{opt}^{JTB}$ is higher than or equal to $b_{min}^{JTB}$.

By collecting the statistics from the simulations on ten audio sequences, the histogram of the differences between $b_{opt}^{JTB}$ and $b_{min}^{JTB}$, denoted by $\Delta b$, is shown in Fig. 3.2. We observe that over 91% of the $\Delta b$ values are less than 3 for both the ANMR and MNMR criteria. In general, we can therefore choose the HCB that produces the minimum QSC bits, $b_{min}^{JTB}$.

Fig. 3.2: Histogram on ∆b.

After examining these characteristics of $b_{opt}^{JTB}$, we derive a rule for determining $h^v_{k,i}$ and $b_{k,i}$. For the state $\Upsilon_{k,i}$, $h^v_{k,i}$ is the index set of HCBs that satisfies the proposed rule in (3.15); namely,

$$h^v_{k,i} = \{\, n \mid H_n(\mathbf{q}_{k,i}) \le \min_m \{ H_m(\mathbf{q}_{k,i}) \} + \delta, \; n \in \{0, \ldots, 11\} \,\} \tag{3.15}$$

The $\min_m \{ H_m(\mathbf{q}_{k,i}) \}$ term is the minimum number of bits for coding $\mathbf{q}_{k,i}$ without considering the coding bits for HCB indices, and $\delta$ is an offset parameter. For example, if only $H_1(\mathbf{q}_{k,i})$ and $H_3(\mathbf{q}_{k,i})$ are smaller than or equal to $\min_m \{ H_m(\mathbf{q}_{k,i}) \} + \delta$, then $h^v_{k,i}$ equals $\{1, 3\}$.
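The selection rule in (3.15) amounts to a threshold on the per-codebook bit counts. The following sketch illustrates it; the function name `pseudo_hcb` and the twelve bit counts are hypothetical and chosen so that the result matches the $\{1, 3\}$ example.

```python
def pseudo_hcb(huff_bits, delta):
    """Index set of HCBs whose bit count is within delta of the minimum, as in (3.15).

    huff_bits[n]: bits needed to code the QSCs of one state with HCB n.
    """
    threshold = min(huff_bits) + delta
    return {n for n, bits in enumerate(huff_bits) if bits <= threshold}


# Hypothetical values of H_0 ... H_11 for one state.
bits = [40, 31, 37, 32, 45, 50, 38, 41, 36, 44, 39, 60]
print(sorted(pseudo_hcb(bits, delta=1)))  # [1, 3]
print(sorted(pseudo_hcb(bits, delta=0)))  # [1]
```

With $\delta = 0$ the set collapses to the codebook(s) achieving the minimum; a larger $\delta$ admits more codebooks and makes zero-cost transitions between neighboring SFBs more likely.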

Although (3.5) and (3.8) do not include the number of bits for coding HCB indices, it is found from experiments that including this term leads to a better estimate of SF. Therefore, we expand (3.5) to approximate (3.3) and expand (3.8) to approximate (3.4) with additional terms. Based on the above observation, $b_{k,i}$ is rewritten as:

(3.16)

elements in $h^v_{k,i}$. The symbol $R^v$ is the run-length coding function performed on the pseudo HCB and is defined below.

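$$R^v(h^v_{l,i-1}, h^v_{k,i}) = \begin{cases} 0, & \text{if } h^v_{l,i-1} \cap h^v_{k,i} \neq \emptyset \\ 9, & \text{otherwise} \end{cases} \tag{3.17}$$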

Note that the $R^v()$ function is essentially the $R()$ function in (3.11). However, because $h^v_{k,i}$ and $h^v_{l,i-1}$ are index sets of HCB, the intersection is used in (3.17).
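The intersection rule can be sketched as follows. The exact form of (3.17) is assumed here: zero bits when the two index sets share at least one codebook and 9 bits otherwise, by analogy with (3.11). The function name `run_length_bits_pseudo` is hypothetical.

```python
def run_length_bits_pseudo(prev_set, cur_set, run_bits=9):
    """Run-length cost between two pseudo HCBs (assumed form of (3.17)).

    Returns 0 if the index sets share a codebook, otherwise run_bits.
    """
    return 0 if prev_set & cur_set else run_bits


print(run_length_bits_pseudo({1, 3}, {3, 5}))  # 0
print(run_length_bits_pseudo({1, 3}, {2}))     # 9
```

With single-element sets this reduces to the equality test of (3.11).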

After having derived (3.15) and (3.16), we still need to determine proper values for $\delta$ and $\alpha$. They can be determined by examining the difference between the JTB scheme and the CTB scheme at different values of $\delta$ and $\alpha$; the results are shown in Fig. 3.3 and Fig. 3.4. Note that $C_{ANMR}^{JTB}$ and $C_{ANMR}^{CTB}$ are the $C_{ANMR}$ (in (3.3)) derived using the JTB scheme and the CTB scheme, respectively, and $C_{MNMR}^{JTB}$ and $C_{MNMR}^{CTB}$ are the $C_{MNMR}$ (in (3.4)) derived using the JTB scheme and the CTB scheme, respectively. We find that for a wide range of $\delta$ values, we can achieve good performance when $R^v(h^v_{l,i-1}, h^v_{k,i})$ is included in $b_{k,i}$ ($\alpha > 0$). As Fig. 3.3 and Fig. 3.4 indicate, the case $\delta = 1$ and $\alpha = 0.5$ gives the best results. Hence, we choose 1 for $\delta$ and 0.5 for $\alpha$ in our implementation.

Fig. 3.3: Difference between $C_{MNMR}^{CTB}$ and $C_{MNMR}^{JTB}$ vs. ($\delta$, $\alpha$).

Fig. 3.4: Difference between $C_{ANMR}^{CTB}$ and $C_{ANMR}^{JTB}$ vs. ($\delta$, $\alpha$).

3.1.5 Cascaded Trellis-Based Optimization Procedure

The major steps in our CTB scheme have been described in detail in Sections 3.1.2 to 3.1.4.

The flowchart of the complete CTB optimization scheme is summarized in Fig. 3.5. Passing
