Rate-Distortion Control Process - Advanced Audio Coding

Chapter 2 Advanced Audio Coding

2.9 Rate-Distortion Control Process

Motivated by the human auditory system, the spectral coefficients are grouped into a number of bands, called scale factor bands (SFBs). The spectral coefficients in one SFB are quantized by a non-uniform quantizer, which in AAC is formulated in (2.1). The common_scalefactor carries the quantizer step size information common to all SFBs. The quantizer step size of each SFB, which determines the quantization distortion (noise-to-masking ratio, NMR), is controlled by the parameter Scale Factor (SF). Note that the parameter Scale Factor, here and in the following discussions, is equal to (scalefactor − common_scalefactor) in (2.1).

$$x_q = \mathrm{int}\left( \left| \mathit{mdct\_line} \right|^{3/4} \times 2^{\frac{3}{16}\left(\mathit{scalefactor} - \mathit{common\_scalefactor}\right)} + 0.4054 \right) \tag{2.1}$$
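As a rough illustration of the quantizer, the following Python sketch assumes that (2.1) has the form $x_q = \mathrm{int}(|\mathit{mdct\_line}|^{3/4} \times 2^{(3/16)(\mathit{scalefactor} - \mathit{common\_scalefactor})} + 0.4054)$. The function name `quantize` and the restoration of the sign are assumptions made for the example and are not taken from the text.

```python
import math

MAGIC = 0.4054  # rounding offset appearing in (2.1)

def quantize(mdct_line, scalefactor, common_scalefactor):
    """Quantize one MDCT coefficient, assuming the form of (2.1) stated above."""
    sf = scalefactor - common_scalefactor           # the "Scale Factor" of the text
    magnitude = abs(mdct_line) ** 0.75 * 2.0 ** (3.0 / 16.0 * sf)
    q = int(magnitude + MAGIC)                      # int() truncates toward zero
    return int(math.copysign(q, mdct_line))         # sign restoration is an assumption

# A larger Scale Factor gives a finer quantization of the same coefficient.
for sf in (-16, 0, 16):
    print(sf, quantize(16.0, sf, 0))
```

Under this assumed form, raising the Scale Factor by 16 multiplies the value before rounding by $2^3 = 8$.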

The quantized coefficients in one band are then entropy-coded by one of the twelve pre-designed Huffman codebooks (HCBs). Each SFB can have its own quantization step size and HCB. In addition, the indices of SFs and HCBs have to be coded and transmitted as side information. In AAC, the SFs are differentially coded relative to the previous SF and then Huffman coded using a pre-designed codebook [2]. Taking Fig. 2.8 as an example, instead of encoding the SF value of the 2nd SFB, 65, the difference between the 2nd SFB and the 1st SFB, 5, is coded. The indices of HCBs are coded by run-length codes [22]. A run-length code in AAC is 9 bits long and is composed of a 4-bit codebook index and a 5-bit run index. For example, as shown in Fig. 2.8, the 3rd HCB is used from the 1st SFB to the 10th SFB; therefore, these 10 HCB indices (same value) are coded together by one run-length code, in which the codebook index is 3 and the run index is 10. The differential and run-length coding therefore induce inter-band dependency in the coding process. The R-D controller, our focus in this paper, determines two critical parameters, the values of SF and HCB, for each SFB so as to optimize the selected criterion under the given bit rate constraint. In the following discussions, where the context is clear, the abbreviation “SF” also refers to the value of SF and “HCB” also refers to the index of HCB.

Fig. 2.8: An example of values of SF and HCB.
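The side-information coding described above can be sketched in Python as follows. The SF and HCB values are hypothetical (only the step from 60 to 65 and the run of ten SFBs using HCB 3 echo the example in the text), the function names are illustrative, and runs too long for a 5-bit run index are not handled.

```python
def differential_sf(sfs):
    """Keep the first SF and replace every later SF by its difference to the previous one."""
    return [sfs[0]] + [cur - prev for prev, cur in zip(sfs, sfs[1:])]

def hcb_sections(hcbs):
    """Group consecutive identical HCB indices into (codebook index, run) pairs."""
    sections = []
    for h in hcbs:
        if sections and sections[-1][0] == h:
            sections[-1][1] += 1          # extend the current run
        else:
            sections.append([h, 1])       # start a new run
    return [tuple(sec) for sec in sections]

def hcb_side_bits(hcbs, bits_per_run=9):
    """Bits spent on HCB indices: one 9-bit run-length code per section."""
    return bits_per_run * len(hcb_sections(hcbs))

sfs = [60, 65, 63, 63]        # hypothetical SF values
hcbs = [3] * 10 + [5] * 4     # hypothetical HCB indices for 14 SFBs

print(differential_sf(sfs))   # differences that would be Huffman coded
print(hcb_sections(hcbs))     # one (codebook, run) pair per section
print(hcb_side_bits(hcbs))    # total bits for the run-length codes
```

Changing a single SF or HCB in the middle of a frame alters the coded values of its neighbors, which is the inter-band dependency referred to above.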

A typical rate-distortion (R-D) control process in the MPEG audio encoder has two nested iteration loops, the outer iteration loop and the inner iteration loop. Thus, it is often called the two-loop search (TLS). The outer iteration loop is the distortion control loop that handles the distortion associated with each band. The inner iteration loop, also called the rate control loop, adjusts coding bits to fit the target bit budget for a frame. The flowcharts of the outer loop and the inner loop specified in AAC are shown in Fig. 2.9.

Fig. 2.9: AAC (a) outer iteration loop and (b) inner iteration loop. [2]

Chapter 3 Cascaded Trellis-Based Rate-Distortion Control Algorithm

In this chapter, we describe the first type of proposed R-D control algorithm, called the cascaded trellis-based (CTB) scheme. The proposed CTB algorithm and its variations are described in Section 3.1. The proposed fast trellis search schemes are described in Section 3.2. The complexity analysis of the proposed R-D control algorithms and the simulation results with quality evaluation are summarized in Section 3.3.

3.1 Cascaded Trellis-Based Optimization Scheme

We start with the problem formulation of the R-D control algorithm for AAC in Section 3.1.1. The trellis-based (TB) procedures for SF optimization and HCB optimization in the CTB scheme are described in Sections 3.1.2 and 3.1.3, respectively. One key element in the trellis-based optimization process on SF, the so-called “pseudo HCB”, is explained in Section 3.1.4. Finally, the procedure of the complete CTB optimization scheme is summarized in Section 3.1.5.

3.1.1 Problem Formulation

For perceptual audio coders, the noise-to-masking ratio (NMR) is the most widely used objective measure in the R-D control module for modeling the subjective perceptual distortion. Based on NMR, there are two commonly used criteria for R-D optimization, the average noise-to-mask ratio (ANMR) and the maximum noise-to-mask ratio (MNMR) [23]. In AAC, the differential coding of SFs and the run-length coding of HCBs introduce inter-band dependence in parameter selection. In order to take this inter-band dependence into account, we need to consider all possible combinations of SFs and HCBs for all SFBs and examine the bits and distortion produced by each combination. If such inter-band dependence did not exist, we could decide SF and HCB for each SFB separately and add all bands together to find the globally optimal solution.

Mathematically, the R-D optimization problems for minimizing ANMR and MNMR under a given bit rate constraint are formulated by (3.1) and (3.2), respectively.

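$$\min_{\{s_i, h_i\}} \sum_i w_i d_i \quad \text{subject to} \quad \sum_i \left( b_i + D(s_i - s_{i-1}) + R(h_{i-1}, h_i) \right) \le PB \tag{3.1}$$

$$\min_{\{s_i, h_i\}} \max_i \, w_i d_i \quad \text{subject to} \quad \sum_i \left( b_i + D(s_i - s_{i-1}) + R(h_{i-1}, h_i) \right) \le PB \tag{3.2}$$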

where $i$ is the SFB index, $w_i$ is the inverse of the masking threshold, and $d_i$ is the quantization distortion, measured as the mean squared quantization error. In (3.1), $\sum_i w_i d_i$ is the sum of NMR over all SFBs in a frame, and in (3.2), $\max_i w_i d_i$ is the maximum NMR in a frame. The parameter values of SF and HCB for the $i$th SFB are denoted by $s_i$ and $h_i$, respectively. $D()$ is a function of SF, representing the number of bits produced by differential coding of SF. $R()$ is a function of HCB, representing the number of bits produced by run-length coding of HCB. The returned function values in both cases are the numbers of bits needed to encode the arguments. Parameter $b_i$ is the number of bits for coding the quantized spectral coefficients (QSCs), and the parameter $PB$ is the prescribed bit budget for an audio frame.
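To make the quantities in (3.1) and (3.2) concrete, the following Python sketch evaluates the two objectives and the bit constraint for one candidate parameter set. All numbers are hypothetical, `D` is a stand-in for the real Huffman-table lookup, `R` follows the 9-bit run-length rule described in Section 2.9, and the side information of the first band is left out for simplicity.

```python
def total_bits(b, s, h, D, R):
    """Bits of a frame: sum over SFBs of b_i + D(s_i - s_{i-1}) + R(h_{i-1}, h_i)."""
    bits = b[0]  # assumption: side-information bits of the first band are left out
    for i in range(1, len(b)):
        bits += b[i] + D(s[i] - s[i - 1]) + R(h[i - 1], h[i])
    return bits

def anmr_cost(w, d):
    """Objective of (3.1): sum of NMR over all SFBs."""
    return sum(wi * di for wi, di in zip(w, d))

def mnmr_cost(w, d):
    """Objective of (3.2): maximum NMR over all SFBs."""
    return max(wi * di for wi, di in zip(w, d))

# Hypothetical stand-ins for the coding functions.
D = lambda delta: 1 + 2 * abs(delta)                 # not the real SF Huffman table
R = lambda h_prev, h_cur: 0 if h_prev == h_cur else 9

b = [10, 12, 8]        # QSC bits per SFB (hypothetical)
s = [60, 65, 65]       # SF values
h = [3, 3, 5]          # HCB indices
w = [0.5, 2.0, 1.0]    # inverse masking thresholds
d = [4.0, 1.0, 3.0]    # quantization distortions
PB = 60                # bit budget (hypothetical)

print(total_bits(b, s, h, D, R) <= PB, anmr_cost(w, d), mnmr_cost(w, d))
```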

To solve (3.1) and (3.2), the straightforward joint optimization of SF and HCB for all SFBs is exorbitantly complex. For one frame in AAC, the number of SF values is 60, the number of HCB indices is 12, and there are 49 SFBs in total. Therefore, to find the optimal solution among all combinations, the complexity of brute force search is $O((60 \cdot 12)^{49})$. In [7][8], a dynamic programming approach, called the joint trellis-based (JTB) scheme in this paper, is proposed to find the optimal SF and HCB for all SFBs jointly at a reduced complexity. As shown in [7][8], the problem of minimizing ANMR in (3.1) can be reformulated as minimizing the unconstrained cost function

$$C_{ANMR} = \sum_i \left[ w_i d_i + \lambda \cdot \left( b_i + D(s_i - s_{i-1}) + R(h_{i-1}, h_i) \right) \right] \tag{3.3}$$

Likewise, the problem of minimizing MNMR in (3.2) can be reformulated as minimizing the cost function $C_{MNMR}$ under the constraint $w_i d_i \le \lambda, \forall i$, for a certain value of $\lambda$.

$$C_{MNMR} = \sum_i \left( b_i + D(s_i - s_{i-1}) + R(h_{i-1}, h_i) \right) \tag{3.4}$$

The research in [7][8] shows that the problem of minimizing $C_{ANMR}$ and $C_{MNMR}$ can be efficiently solved by the Viterbi search through the trellis, in which we compute only the legal transitions from the previous state to the current state [12][13]. Although the search complexity of the JTB scheme [8], $O((60 \cdot 12)^2 \cdot 49)$, is much lower than that of brute force search, it is still extremely high for practical applications.

As shown in Fig. 3.1, a simplification of the JTB scheme is to search for the SF and the HCB values in two consecutive steps without going through all possible combinations. Ideally, the order of complexity of our CTB scheme goes down to O((60²+12²)⋅49). However, because these two steps are strongly correlated, we need to design the cascaded algorithm with special treatment on this issue to reduce performance degradation. This is the main point of this section.
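The three orders of complexity quoted in this section can be compared numerically. The short Python sketch below simply evaluates the expressions with the values given in the text (60 SF values, 12 HCB indices, 49 SFBs); it counts state combinations and is not a measure of actual run time.

```python
N_SF, N_HCB, N_SFB = 60, 12, 49   # values quoted in the text

brute_force = (N_SF * N_HCB) ** N_SFB          # all joint combinations
jtb = (N_SF * N_HCB) ** 2 * N_SFB              # joint trellis
ctb = (N_SF ** 2 + N_HCB ** 2) * N_SFB         # cascaded trellis (ideal case)

print(f"brute force: a number with {len(str(brute_force))} digits")
print(f"JTB: {jtb:,}")
print(f"CTB: {ctb:,}")
print(f"JTB / CTB: {jtb / ctb:.1f}")
```

With these values the ideal CTB count is more than two orders of magnitude below the JTB count.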

Fig. 3.1: Joint trellis-based scheme vs. cascaded trellis-based scheme.

3.1.2 Trellis-Based Optimization on SF

In this sub-section, the procedures of trellis-based optimization on SF for the two criteria, ANMR and MNMR, are described.

1) Trellis-Based Procedure for ANMR Minimization:

The problem of minimizing ANMR in the JTB scheme is formulated as minimizing the unconstrained cost function $C_{ANMR}$ in (3.3). To break the combined single step into two consecutive steps in our CTB scheme, this problem is reformulated as minimizing two unconstrained cost functions, $C_{SF\_ANMR}$ and $C_{HCB}$, as follows.

$$C_{SF\_ANMR} = \sum_i \left[ w_i d_i + \lambda \cdot \left( b_i + D(s_i - s_{i-1}) \right) \right] \tag{3.5}$$

$$C_{HCB} = \sum_i \left( b_i + R(h_{i-1}, h_i) \right) \tag{3.6}$$

The minimization of $C_{SF\_ANMR}$ is described in this sub-section, and the minimization of $C_{HCB}$ will be described in Section 3.1.3. Because $C_{SF\_ANMR}$ and $C_{HCB}$ are minimized in two separate steps, the global optimality of $C_{ANMR}$ is not guaranteed, although the computation is significantly reduced. Our contribution described hereafter is to develop techniques that come close to the global optimum.

Similar to the approach in the JTB scheme, the goal of finding proper SFs that minimize $C_{SF\_ANMR}$ can be achieved by looking for the optimal path through the trellis. Each stage in the trellis corresponds to an SFB and there are $N_{SFB}$ stages in total. However, different from JTB, each state at the $i$th stage in our scheme represents only an SF candidate for the $i$th SFB. In other words, at the $i$th stage, if a path passes through the $m$th state, the $m$th SF candidate is used to encode the $i$th SFB.

For a given value of $\lambda$, the Viterbi search procedure for finding a proper set of SFs that minimize $C_{SF\_ANMR}$ is outlined below. We denote by $\Upsilon_{k,i}$ the $k$th state at the $i$th stage and by $C_{k,i}$ the minimum accumulative-partial cost ending at $\Upsilon_{k,i}$. The state-transition cost $T_{l,i-1 \to k,i}$ from $\Upsilon_{l,i-1}$ to $\Upsilon_{k,i}$ is $\lambda \cdot D(s_{k,i} - s_{l,i-1})$, where $s_{k,i}$ is the SF value associated with the state $\Upsilon_{k,i}$.

1) Initialize all the states and start the trellis search from the first stage: $C_{k,0} = 0, \forall k$, and $i = 1$.

2) For each state at the $i$th stage, find the best path from the previous stage by examining all the states at the $(i-1)$th stage leading to the current state. The best path ending at $\Upsilon_{k,i}$ is the one that has the minimum accumulative-partial cost $C_{k,i}$. That is, for every $k$ we compute

$$C_{k,i} = \min_l \left\{ C_{l,i-1} + w_i d_{k,i} + \lambda \cdot b_{k,i} + T_{l,i-1 \to k,i} \right\} \tag{3.7}$$

3) Check the index $i$. If $i < N_{SFB}$, set $i = i+1$ and go to step 2.
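The three steps above can be sketched in Python as a standard Viterbi search with back pointers. This is a minimal illustration, not the implementation used in the experiments: the function name, the toy data, and `diff_bits` (a stand-in for $D()$) are assumptions, and the first SFB is given no transition cost.

```python
def viterbi_sf_anmr(wd, b, sf, lam, diff_bits):
    """Search the SF trellis for the path minimizing the cost accumulated as in (3.7).

    wd[i][k]  : NMR w_i * d_{k,i} of SF candidate k at SFB i
    b[i][k]   : (estimated) QSC bits of candidate k at SFB i
    sf[i][k]  : SF value of candidate k at SFB i
    diff_bits : hypothetical stand-in for D(), bits for coding an SF difference
    """
    # First SFB: no predecessor, so no transition cost (an assumption).
    cost = [wd[0][k] + lam * b[0][k] for k in range(len(wd[0]))]
    back = []  # back[i-1][k] = best predecessor of state k at stage i
    for i in range(1, len(wd)):
        new_cost, ptr = [], []
        for k in range(len(wd[i])):
            # Accumulated cost plus transition cost T = lam * D(s_k - s_l).
            reach = [cost[l] + lam * diff_bits(sf[i][k] - sf[i - 1][l])
                     for l in range(len(cost))]
            best_l = min(range(len(reach)), key=reach.__getitem__)
            new_cost.append(reach[best_l] + wd[i][k] + lam * b[i][k])
            ptr.append(best_l)
        cost = new_cost
        back.append(ptr)
    # Trace back from the cheapest state at the last stage.
    k = min(range(len(cost)), key=cost.__getitem__)
    total, path = cost[k], [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    return path[::-1], total

# Toy trellis: 3 SFBs, 2 SF candidates each (all values hypothetical).
sf = [[60, 62]] * 3
wd = [[4.0, 1.0], [1.0, 6.0], [4.0, 1.0]]
b = [[10, 14]] * 3
path, total = viterbi_sf_anmr(wd, b, sf, 0.5, lambda delta: 1 + 2 * abs(delta))
print(path, total)
```

In this toy case, choosing the cheapest candidate of each band independently would give the path [1, 0, 1], whose SF changes cost extra differential-coding bits; the trellis search accounts for those transitions and keeps the same candidate throughout.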

2) Trellis-Based Procedure for MNMR Minimization:

The problem of minimizing MNMR in the JTB scheme is formulated as minimizing the cost function $C_{MNMR}$ in (3.4). In our CTB scheme, this problem is reformulated as the minimization of two cost functions, $C_{SF\_MNMR}$ in (3.8) and $C_{HCB}$ in (3.6), under the constraint $w_i d_i \le \lambda, \forall i$, for a certain value of $\lambda$.

$$C_{SF\_MNMR} = \sum_i \left( b_i + D(s_i - s_{i-1}) \right) \tag{3.8}$$

Similar to the trellis-based ANMR optimization on selecting SF described above, an “SF trellis” is constructed for minimizing $C_{SF\_MNMR}$. For a given value of $\lambda$, the Viterbi search procedure for finding proper SFs that minimize $C_{SF\_MNMR}$ is outlined below. The state-transition cost $T_{l,i-1 \to k,i}$ is $D(s_{k,i} - s_{l,i-1})$.

1) Initialize: $C_{k,0} = 0, \forall k$, and $i = 1$.

2) For the $i$th stage, only the states whose associated NMR ($w_i d_{k,i}$) is less than or equal to $\lambda$ are valid for the trellis search. Therefore, before starting the trellis search, we must find the valid states $\Upsilon_{k,i}$, $\forall k$, for the $i$th stage.

3) For each valid state at the $i$th stage, find the best path from the previous stage by examining all the valid states in the $(i-1)$th stage leading to the current state. That is, we compute

$$C_{k,i} = \min_l \left\{ C_{l,i-1} + b_{k,i} + T_{l,i-1 \to k,i} \right\} \tag{3.9}$$

4) If $i < N_{SFB}$, set $i = i+1$ and go to step 2.

After completing the forward “search and expansion” step through the trellis, the optimal path in the trellis can be extracted by tracing backward from the state with minimum $C_{k,N_{SFB}}$ at the last stage. Consequently, the optimal SFs for all SFBs that minimize $C_{SF\_MNMR}$ (or $C_{SF\_ANMR}$) are determined.
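The MNMR variant differs from the ANMR search mainly in that states violating $w_i d_{k,i} \le \lambda$ are excluded before the search. A minimal Python sketch under the same assumptions as before (hypothetical `diff_bits` for $D()$, toy data, no transition cost for the first SFB) marks invalid states with an infinite cost and then traces back the best path:

```python
import math

def viterbi_sf_mnmr(wd, b, sf, lam, diff_bits):
    """Minimize the sum of b + D over states whose NMR does not exceed lam, as in (3.9)."""
    valid = [[x <= lam for x in stage] for stage in wd]
    if not all(any(stage) for stage in valid):
        return None, math.inf             # some SFB has no valid SF candidate
    # First SFB: no predecessor, so no transition cost (an assumption).
    cost = [b[0][k] if valid[0][k] else math.inf for k in range(len(b[0]))]
    back = []
    for i in range(1, len(wd)):
        new_cost, ptr = [], []
        for k in range(len(b[i])):
            if not valid[i][k]:
                new_cost.append(math.inf)  # invalid state: never selected
                ptr.append(-1)
                continue
            reach = [cost[l] + diff_bits(sf[i][k] - sf[i - 1][l])
                     for l in range(len(cost))]
            best_l = min(range(len(reach)), key=reach.__getitem__)
            new_cost.append(reach[best_l] + b[i][k])
            ptr.append(best_l)
        cost = new_cost
        back.append(ptr)
    # Backward tracing from the state with minimum cost at the last stage.
    k = min(range(len(cost)), key=cost.__getitem__)
    total, path = cost[k], [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    return path[::-1], total

# Toy trellis: 3 SFBs, 2 SF candidates each (all values hypothetical).
sf = [[60, 62]] * 3
wd = [[4.0, 1.0], [1.0, 6.0], [4.0, 1.0]]
b = [[10, 14]] * 3
D = lambda delta: 1 + 2 * abs(delta)
print(viterbi_sf_mnmr(wd, b, sf, 4.5, D))   # loose NMR bound
print(viterbi_sf_mnmr(wd, b, sf, 2.0, D))   # tight NMR bound forces SF changes
```

Tightening the bound removes the low-bit candidates, so the path is forced through states that need more bits.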

As described in [7][8], any value of SF can be assigned to a band that lies below the masking threshold. Therefore, its associated state in the trellis is split into two consecutive states. At the first state, the spectral coefficients are quantized using the assigned valid SF, and at the second state, all quantized values of the spectral coefficients are set to zero.

3.1.3 Trellis-Based Optimization on HCB

The HCB optimization is performed under the assumption that the SF (value) for each SFB has already been decided. In our CTB scheme, SF is determined by the trellis-based optimization on SF described in Section 3.1.2. With a determined SF, the QSCs for each SFB are fixed and thus the $b_i$ term in the cost function $C_{HCB}$ (see (3.6)) depends only on the selection of HCB. Therefore, $C_{HCB}$ can be restated as (3.10).

$$C_{HCB} = \sum_i \left( H_{h_i}(q_i) + R(h_{i-1}, h_i) \right) \tag{3.10}$$

where $q_i$ (a vector) contains the QSCs for the $i$th SFB and $H_h()$ is a function of the QSCs, representing the number of bits produced by Huffman coding of the QSCs using the $h$th HCB.

The goal of the optimization procedure here is to find the HCBs for all SFBs that minimize $C_{HCB}$. As in the SF optimization, this is done by searching a trellis, with the states now being HCBs.

An “HCB trellis” is thus constructed for searching for the minimum $C_{HCB}$. Each state at the $i$th stage represents an HCB candidate for the $i$th SFB. The state-transition cost $T_{n,i-1 \to m,i}$ from $\Upsilon_{n,i-1}$ to $\Upsilon_{m,i}$ is $R(h_{n,i-1}, h_{m,i})$, where $h_{m,i}$ is the HCB associated with the state $\Upsilon_{m,i}$. According to the run-length coding rule in AAC, $R(h_{n,i-1}, h_{m,i})$ is defined by (3.11). In other words, no extra bits are transmitted if the same HCB is used in two neighboring SFBs.

$$R(h_{n,i-1}, h_{m,i}) = \begin{cases} 0, & \text{if } h_{n,i-1} = h_{m,i} \\ 9, & \text{otherwise} \end{cases} \tag{3.11}$$

The Viterbi search procedure for finding proper HCBs that minimize $C_{HCB}$ is outlined below.

1) Initialize: $C_{m,0} = 0, \forall m$, and $i = 1$.

2) For each state at the $i$th stage, find the best path from the previous stage by examining all the states at the $(i-1)$th stage leading to the current state. That is, for every state $\Upsilon_{m,i}$ we compute

$$C_{m,i} = \min_n \left\{ C_{n,i-1} + H_{h_{m,i}}(q_i) + T_{n,i-1 \to m,i} \right\} \tag{3.12}$$

where $q_i$ is the vector of the QSCs at stage $i$.

3) If $i < N_{SFB}$, set $i = i+1$ and go to step 2.

Similar to the trellis-based optimization on SF, after completing the forward search/expansion step through the trellis, the optimal path in the trellis can be extracted by tracing backward from the state with minimum $C_{m,N_{SFB}}$ at the last stage. Then, the optimal HCBs for all SFBs that minimize $C_{HCB}$ are determined.
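The HCB search can be sketched in the same way. In the Python example below, `huff_bits[i][m]` plays the role of $H_m(q_i)$ and is filled with hypothetical numbers (three codebooks instead of twelve, to keep the example small); the transition cost follows the 0-or-9-bit rule of (3.11), and the first SFB is given no transition cost.

```python
def viterbi_hcb(huff_bits, switch_bits=9):
    """Find the HCB path minimizing Huffman bits plus run-length bits, as in (3.12).

    huff_bits[i][m] : bits for coding the QSCs of SFB i with codebook m
    switch_bits     : cost of starting a new run-length section
    """
    n_books = len(huff_bits[0])
    cost = list(huff_bits[0])   # first SFB: no transition cost (an assumption)
    back = []
    for stage in huff_bits[1:]:
        new_cost, ptr = [], []
        for m in range(n_books):
            # Staying in the same codebook is free, switching costs switch_bits.
            reach = [cost[n] + (0 if n == m else switch_bits)
                     for n in range(n_books)]
            best_n = min(range(n_books), key=reach.__getitem__)
            new_cost.append(reach[best_n] + stage[m])
            ptr.append(best_n)
        cost = new_cost
        back.append(ptr)
    # Backward tracing from the minimum-cost state at the last stage.
    m = min(range(n_books), key=cost.__getitem__)
    total, path = cost[m], [m]
    for ptr in reversed(back):
        m = ptr[m]
        path.append(m)
    return path[::-1], total

# Hypothetical Huffman bit counts: 4 SFBs, 3 candidate codebooks.
huff_bits = [[20, 22, 30],
             [18, 15, 30],
             [20, 21, 30],
             [25, 24, 30]]
print(viterbi_hcb(huff_bits))
```

Picking the cheapest codebook per band in this toy case would alternate between codebooks 0 and 1 and pay for three switches, whereas the trellis search stays in a single codebook.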

3.1.4 Pseudo HCB for SF Optimization

1) Motivation for Pseudo HCB:

We first look at the MNMR minimization case. The key problem in splitting (3.4) into (3.8) and (3.6) is to choose the correct (optimal) value of $b_{k,i}$ in (3.8). In (3.8), the $w_i d_{k,i}$ or $D(s_{k,i} - s_{l,i-1})$ term is unique for a given state or state transition in the SF trellis. However, the value of $b_{k,i}$ depends not only on $s_{k,i}$ associated with the state in the SF trellis; it also depends on the choice of HCB. In the JTB scheme, SF and HCB are chosen simultaneously. Therefore, for each candidate value of SF, all possible $b_{k,i}$ values, corresponding to the 12 candidate HCBs, are evaluated. In other words, the chosen value of $b_{k,i}$ for each state $\Upsilon_{k,i}$ in the trellis for the JTB optimization scheme is optimal [7][8]. But in our sequential optimization scheme, the value of $b_{k,i}$ for the state $\Upsilon_{k,i}$ in (3.8) is estimated based upon the available information. The estimated value of $b_{k,i}$ may not be the optimal value, and this may in turn induce an incorrect (non-optimal) selection in the SF optimization.

For example, two candidate paths in the SF trellis, A and B, are shown in (3.13). Path A appears better than path B because $C_{SF\_MNMR}^A < C_{SF\_MNMR}^B$, where $C_{SF\_MNMR}^A$ and $C_{SF\_MNMR}^B$ are the $C_{SF\_MNMR}$ values of path A and path B, respectively. Note that $\hat{b}_i^A$ and $\hat{b}_i^B$ in (3.13) are the estimated values of $b_i$ for path A and path B. If the decision on SF is made at this point, path A is chosen. Now, let us go one step further. Based on the selected SF sets of path A and path B, we can find their optimal HCBs, $h_i^A$ and $h_i^B$ respectively, according to the HCB optimization procedure described in Section 3.1.3. Then, their actual bit counts, $b_i^A$ and $b_i^B$ for path A and path B respectively, are obtained. Finally, the total costs $C_{MNMR}^A$ and $C_{MNMR}^B$ for the two candidate paths are shown in (3.14). The result in (3.14) indicates that path B is actually better than path A when the bit counts are correct. With a wrong estimate of $b_i$, our CTB algorithm would pick path A for the SFs and thus fail to find the overall optimal path B.

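$$C_{SF\_MNMR}^A = \sum_i \left( \hat{b}_i^A + D(s_i^A - s_{i-1}^A) \right) \; < \; C_{SF\_MNMR}^B = \sum_i \left( \hat{b}_i^B + D(s_i^B - s_{i-1}^B) \right) \tag{3.13}$$

$$C_{MNMR}^A = \sum_i \left( b_i^A + D(s_i^A - s_{i-1}^A) + R(h_{i-1}^A, h_i^A) \right) \; > \; C_{MNMR}^B = \sum_i \left( b_i^B + D(s_i^B - s_{i-1}^B) + R(h_{i-1}^B, h_i^B) \right) \tag{3.14}$$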

Clearly, with a more accurate estimate of $b_{k,i}$, we can select better SFs. For this aim, the concept of “pseudo HCB” is proposed for the trellis-based optimization on SF. The preceding discussions on choosing HCB can also be applied to the ANMR minimization case.

2) Design of Pseudo HCB:

When the trellis-based optimization on SF is performed in the pseudo HCB mode, a pseudo HCB with an index set $h^v_{k,i}$ needs to be constructed for the state $\Upsilon_{k,i}$ to produce $b_{k,i}$ in (3.5) and (3.8). It can be constructed in several ways. For example, $h^v_{k,i}$ may contain only one of the 12 candidate HCBs or several codebooks. In order to improve the accuracy of the estimated values of $b_{k,i}$ and $h^v_{k,i}$, we analyze the data collected from the JTB optimization scheme.

For a given value of $\lambda$, using the JTB scheme, we can find a set of optimal parameters, $s_{opt}^{JTB}$, $h_{opt}^{JTB}$ and $b_{opt}^{JTB}$, that minimizes the cost function $C_{ANMR}$ in (3.3) or $C_{MNMR}$ in (3.4).

For comparison purposes, we also construct a reference set of QSC bits, $b_{min}^{JTB}$, in the following way. For the $i$th SFB, $b_{min,i}^{JTB}$ is the minimum number of bits for encoding $q_{opt,i}^{JTB}$ and is determined by $b_{min,i}^{JTB} = \min_m \{ H_m(q_{opt,i}^{JTB}) \}$, where $q_{opt,i}^{JTB}$ denotes the QSCs quantized by using $s_{opt,i}^{JTB}$. In other words, without considering the bits for coding the HCB indices, $b_{min,i}^{JTB}$ is the lowest number of bits produced by any of the 12 HCBs applied to the QSCs. Because the coded bits for HCB indices, $R(h_{i-1}, h_i)$, are also included in the overall optimization procedure, when comparing coding bits for QSCs only, $b_{opt}^{JTB}$ is higher than or equal to $b_{min}^{JTB}$.

By collecting the statistics from the simulations on ten audio sequences, the histogram of the differences between $b_{opt}^{JTB}$ and $b_{min}^{JTB}$, denoted by $\Delta b$, is shown in Fig. 3.2. We observe that over 91% of the $\Delta b$ values are less than 3 for both the ANMR and MNMR criteria. In general, we can therefore choose the HCB that produces the minimum QSC bits, $b_{min}^{JTB}$.

Fig. 3.2: Histogram on ∆b.

After examining this characteristic of $b_{opt}^{JTB}$, we derive a rule for determining $h^v_{k,i}$ and $b_{k,i}$. For the state $\Upsilon_{k,i}$, $h^v_{k,i}$ is the index set of HCBs that satisfies the proposed rule in (3.15); namely,

$$h^v_{k,i} = \left\{ n \mid H_n(q_{k,i}) \le \min_m \{ H_m(q_{k,i}) \} + \delta, \; n \in \{0, \ldots, 11\} \right\} \tag{3.15}$$

The $\min_m \{ H_m(q_{k,i}) \}$ term is the minimum number of bits for coding $q_{k,i}$ without considering the coding bits for HCB indices, and $\delta$ is an offset parameter. For example, if $H_1(q_{k,i})$ and $H_3(q_{k,i})$ are the only values smaller than or equal to $\min_m \{ H_m(q_{k,i}) \} + \delta$, then $h^v_{k,i}$ equals $\{1, 3\}$.
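The rule in (3.15) amounts to keeping every codebook whose bit count lies within $\delta$ of the minimum. A minimal Python sketch, with hypothetical bit counts and an illustrative function name, is:

```python
def pseudo_hcb(huff_bits, delta=1):
    """Index set of codebooks n with H_n(q) <= min_m H_m(q) + delta, as in (3.15)."""
    floor = min(huff_bits)
    return {n for n, bits in enumerate(huff_bits) if bits <= floor + delta}

# Hypothetical bit counts H_0..H_4 for one state (the text uses 12 codebooks).
huff_bits = [31, 25, 40, 26, 28]
print(pseudo_hcb(huff_bits, delta=1))
```

With these numbers, codebooks 1 and 3 fall within one bit of the minimum, which mirrors the $\{1, 3\}$ example in the text. A larger $\delta$ admits more codebooks into the set.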

Although (3.5) and (3.8) do not include the number of bits for coding HCB indices, experiments show that including this term leads to a better estimate of SF. Therefore, we expand (3.5) to approximate (3.3) and expand (3.8) to approximate (3.4) with additional terms.

Based on the above observation, $b_{k,i}$ is rewritten as:

[Equation (3.16) is not recoverable from the source; its definition refers to the elements in $h^v_{k,i}$.]

The symbol $R_v$ is the run-length coding function performed on the pseudo HCB and is defined below.

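$$R_v(h^v_{l,i-1}, h^v_{k,i}) = \begin{cases} 0, & \text{if } h^v_{l,i-1} \cap h^v_{k,i} \neq \emptyset \\ 9, & \text{otherwise} \end{cases} \tag{3.17}$$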

Note that the $R_v()$ function is essentially the $R()$ function in (3.11). However, because $h^v_{k,i}$ and $h^v_{l,i-1}$ are index sets of HCBs, the intersection is used in (3.17).
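Assuming that (3.17) charges no bits when the two index sets share at least one codebook and 9 bits otherwise, which is how the description above reads, the function can be sketched in Python as follows. The function name and the example sets are illustrative.

```python
def run_length_bits_pseudo(prev_set, cur_set, switch_bits=9):
    """R_v on pseudo HCBs: free if the index sets intersect (assumed form of (3.17))."""
    return 0 if prev_set & cur_set else switch_bits

print(run_length_bits_pseudo({1, 3}, {3, 5}))   # codebook 3 is shared
print(run_length_bits_pseudo({1, 3}, {2}))      # no shared codebook
```

When both sets contain a single codebook, this reduces to the equality test of (3.11).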

After having derived (3.15) and (3.16), we still need to determine proper values for $\delta$ and $\alpha$. They can be determined by examining the difference between the JTB scheme and the CTB scheme at different values of $\delta$ and $\alpha$; the results are shown in Fig. 3.3 and Fig. 3.4. Note that $C_{ANMR}^{JTB}$ and $C_{ANMR}^{CTB}$ are the $C_{ANMR}$ values (in (3.3)) derived using the JTB scheme and the CTB scheme, respectively, and $C_{MNMR}^{JTB}$ and $C_{MNMR}^{CTB}$ are the $C_{MNMR}$ values (in (3.4)) derived using the JTB scheme and the CTB scheme, respectively. We find that, for a wide range of $\delta$ values, good performance is achieved when $R_v(h^v_{l,i-1}, h^v_{k,i})$ is included in $b_{k,i}$ ($\alpha > 0$). As Fig. 3.3 and Fig. 3.4 indicate, the case $\delta = 1$ and $\alpha = 0.5$ gives the best results. Hence, we choose 1 for $\delta$ and 0.5 for $\alpha$ in our implementation.

Fig. 3.3: $(C_{MNMR}^{CTB} - C_{MNMR}^{JTB})$ vs. $(\delta, \alpha)$.

Fig. 3.4: $(C_{ANMR}^{CTB} - C_{ANMR}^{JTB})$ vs. $(\delta, \alpha)$.

3.1.5 Cascaded Trellis-Based Optimization Procedure

The major steps in our CTB scheme have been described in detail in Sections 3.1.2 to 3.1.4.

The flowchart of the complete CTB optimization scheme is summarized in Fig. 3.5. Passing
