
Chapter 3 Cascaded Trellis-Based Rate-Distortion Control Algorithm

3.3 Simulation Results

3.3.3 Subjective Quality

Listening tests with human subjects are the traditional way to evaluate audio quality subjectively and remain the most widely recognized subjective quality measure. However, such tests are expensive, time-consuming, and difficult to reproduce. As described in Section 6.2 of [8], the subjective quality (mean opinion score, MOS) of the JTB-ANMR (or JTB-MNMR) scheme is significantly better than that of VM-TLS. The MOS is derived from the ITU (International Telecommunication Union) 5-grade absolute category rating (ACR) scheme [25]. Moreover, informal listening tests on the aforementioned schemes show that it is hard to tell the difference between the JTB and the various CTB schemes. In addition, a “simulated” subjective measure, the Objective Difference Grade (ODG), is used for audio quality evaluation. The ODG is generated by a procedure designed to be comparable to the Subjective Difference Grade (SDG) judged by human listeners. It is calculated from the difference between the quality ratings of the “reference” signal and the “test” signal. The ODG ranges over [-4, 0], where -4 stands for a very significant difference and 0 stands for an imperceptible difference between the reference and the test signal [26][27].

The ODG results for various R-D control schemes discussed in this paper are shown in Fig. 3.11, in which the reference signal is the original audio sequence. According to the collected test data (Fig. 3.11), the difference between JTB and CTB schemes is quite small.

The ODG results measured against the full search CTB-MNMR scheme are shown in Fig. 3.12. Note that the reference signal here is the coded audio sequence produced by the full search CTB-MNMR scheme. There is almost no difference between the GMNU algorithm and the full search CTB-MNMR, particularly at mid to high bit rates. Again, the non-uniform trellis algorithms perform better than the uniform trellis algorithms at the same computational complexity.

Fig. 3.11: ODG of VM-TLS, JTB and CTB.

Fig. 3.12: ODG of various fast SF search algorithms for CTB-MNMR.

Chapter 4 Enhanced BFOS Bit Allocation Algorithm

Using the available bits efficiently is a key issue in perceptual audio coding. The traditional bit allocation strategies, “allocate bits to the band with the largest NMR” or “allocate bits to the bands of which the distortion is larger than the masking threshold” [23][28], do not necessarily provide the best bit-use efficiency. Here, “bit-use efficiency” means the distortion improvement a band obtains from the bits it receives.

The previously proposed generalized BFOS algorithm is an efficient bit allocation algorithm for classified vector quantization or subband coding [9][10]. The research in [11] also shows that the generalized BFOS algorithm is a near-optimal bit allocation scheme for MPEG-1 Layer I/Layer II audio coding. However, our research shows that the generalized BFOS bit allocation algorithm becomes less efficient for MPEG-4 AAC, in which the coding process has inter-band dependency.

In this chapter, we describe the proposed second type of R-D control algorithms, the Enhanced BFOS (EBFOS) bit allocation algorithm. We first describe briefly the generalized BFOS bit allocation algorithm in Section 4.1. The bit allocation procedure of our EBFOS scheme for AAC and its fast version are described in Section 4.2. For comparison, we also propose an approach to integrate the generalized BFOS bit allocation algorithm in AAC in Section 4.3. Finally, the complexity analysis and the simulation results are presented in Section 4.4.

4.1 Generalized BFOS Algorithm

As illustrated in [10], the generalized Breiman, Friedman, Olshen, and Stone (BFOS) algorithm is an extension of an algorithm for optimal pruning in tree-structured classification

subtrees of a given tree-structured coder. Bit allocation for an audio coder using the generalized BFOS algorithm was first suggested in [11].

As described in [10][11], the BFOS algorithm allocates bits based on the “marginal return” provided by each allocation. The marginal return, MR, is defined as (4.1) [11].

$$ MR = \frac{\Delta D}{\Delta R} \tag{4.1} $$

∆D and ∆R in (4.1) are the changes in distortion and bit rate, respectively. For a subband coding application, we restate the basic idea of the generalized BFOS bit allocation algorithm as “allocate bits to the band with the maximum distortion decrease per bit” or “de-allocate bits from the band with the minimum distortion increase per bit”. However, as described in [10][11], the analysis of “distortion decrease per bit” or “distortion increase per bit” in the generalized BFOS bit allocation algorithm treats each band independently. This may result in less efficient bit allocation for a coder whose coding process has inter-band dependency, such as AAC.
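The idea of “allocate bits to the band with the maximum distortion decrease per bit” can be illustrated with a short Python sketch. It is a simplified greedy allocator over independent bands, not the full generalized BFOS tree-pruning procedure of [10]. The function name greedy_allocate and the per-band distortion tables are hypothetical and serve only as an example.

```python
def greedy_allocate(dist_tables, budget):
    """Greedy bit allocation by marginal return for independent bands.

    dist_tables[i][b] is the (hypothetical) distortion of band i when it
    holds b bits. Each step gives bits to the band with the largest
    distortion decrease per extra bit, as long as the budget allows it.
    """
    bits = [0] * len(dist_tables)
    spent = 0
    while True:
        best = None  # (marginal return, band index, new bit count)
        for i, table in enumerate(dist_tables):
            for b in range(bits[i] + 1, len(table)):
                extra = b - bits[i]
                if spent + extra > budget:
                    break  # larger allocations for this band cost even more
                mr = (table[bits[i]] - table[b]) / extra
                if best is None or mr > best[0]:
                    best = (mr, i, b)
        if best is None or best[0] <= 0:
            return bits, spent  # nothing affordable reduces the distortion
        _, i, b = best
        spent += b - bits[i]
        bits[i] = b


# Hypothetical distortion tables for two bands and a budget of 3 bits.
example_tables = [[10, 4, 3, 2.5], [8, 7, 2, 1]]
example_bits, example_spent = greedy_allocate(example_tables, 3)
```

Because each band is evaluated from its own table only, the sketch has the inter-band independence discussed above: changing one band never alters the cost of another.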

4.2 Enhanced BFOS Bit Allocation Algorithm for AAC (EBFOS)

In the proposed second type of R-D control algorithms, instead of performing the trellis search through the entire frame, we allocate the bits to the proper band step by step. If the coding process had no inter-band dependency, the bit allocation problem could be solved efficiently by the generalized BFOS algorithm. However, as described in Section 2.9, the differential and run-length coding introduce inter-band dependency into the coding process of AAC.

Similar to the generalized BFOS bit allocation algorithm, the basic idea behind our bit allocation scheme is “allocate bits to the band with the maximum NMR gain per bit” or “retrieve bits from the band with the maximum bits per NMR loss”. However, our approach also takes the inter-band dependency of the coding process into account. Therefore, our bit allocation approach is called the Enhanced BFOS algorithm. “NMR gain per bit” (NGPB) is the gain in NMR obtained by allocating one bit and is given by (4.2). “Bits per NMR loss” (BPNL) is the number of bits we save if we give away one unit of NMR and is given by (4.3). In (4.2) and (4.3), NMRref and bitsref are the original NMR value and number of bits, respectively. NMRnew is the new NMR value after allocating the new number of bits, bitsnew. In principle, our proposed scheme tries to reduce the total NMR of all bands. Hence, its performance is close to that of an algorithm that minimizes the average NMR criterion.

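$$ \text{NMR Gain}/\text{bit} = \frac{NMR_{ref} - NMR_{new}}{bits_{new} - bits_{ref}} \tag{4.2} $$

$$ \text{bits}/\text{NMR Loss} = \frac{bits_{ref} - bits_{new}}{NMR_{new} - NMR_{ref}} \tag{4.3} $$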
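The two measures can be written as small Python helpers. This is only a sketch following the verbal definitions above (NMR gain obtained per allocated bit, and bits saved per unit of NMR given away). The function names and the argument checks are assumptions made for illustration.

```python
def ngpb(nmr_ref, bits_ref, nmr_new, bits_new):
    """NMR gain per bit: NMR reduction divided by the extra bits spent."""
    if bits_new <= bits_ref:
        raise ValueError("NGPB is only defined when bits are added")
    return (nmr_ref - nmr_new) / (bits_new - bits_ref)


def bpnl(nmr_ref, bits_ref, nmr_new, bits_new):
    """Bits per NMR loss: bits saved divided by the NMR increase accepted."""
    if nmr_new <= nmr_ref:
        raise ValueError("BPNL is only defined when the NMR increases")
    return (bits_ref - bits_new) / (nmr_new - nmr_ref)


# Hypothetical numbers: spending 10 bits lowers the NMR from 1.0 to 0.5,
# while accepting an NMR of 2.0 instead of 1.0 saves 8 bits.
example_gain = ngpb(1.0, 100, 0.5, 110)
example_saving = bpnl(1.0, 100, 2.0, 92)
```

A larger NGPB marks a better place to spend bits, and a larger BPNL marks a better place to take bits back.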

4.2.1 Bit Allocation Procedure of EBFOS Scheme

As illustrated in Section 2.9, in AAC the NMR of each band is controlled by the SF value. In general, a larger SF value (corresponding to a larger quantizer step size) results in a larger NMR value in the band. After quantization, the quantized spectral coefficients in each band are entropy-coded with a suitably chosen HCB. In addition, the indices of the SFs and HCBs of all the bands are coded using differential and run-length codes, respectively. The total coding bits for a frame, TB, can be expressed as (4.4). The values of the SF and HCB for the ith SFB are denoted by si and hi, respectively. D() and R() represent the numbers of bits produced by differential coding and run-length coding, respectively. Parameter bi is the number of bits for coding the quantized spectral coefficients, and parameter Bi is the total coding bits for the ith SFB.

$$ TB = \sum_i B_i = \sum_i \left[ b_i + D(s_i - s_{i-1}) + R(h_{i-1}, h_i) \right] \tag{4.4} $$
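The inter-band dependency carried by the D() and R() terms can be seen in a toy Python sketch. The cost functions diff_bits and run_bits below are hypothetical stand-ins, not the actual AAC code tables, and the coefficient bits are held fixed so that only the side-information effect is visible.

```python
def diff_bits(prev_sf, sf):
    """Hypothetical stand-in for D(): cost grows with the SF difference."""
    return 1 + 2 * abs(sf - prev_sf)


def run_bits(prev_hcb, hcb):
    """Hypothetical stand-in for R(): starting a new HCB run costs bits."""
    return 0 if hcb == prev_hcb else 9


def total_bits(sf, hcb, coef_bits):
    """Frame total: coefficient bits plus SF and HCB side information."""
    tb = 0
    for i in range(len(sf)):
        prev_sf = sf[i - 1] if i > 0 else sf[0]     # first band: zero difference
        prev_hcb = hcb[i - 1] if i > 0 else None    # first band: always a new run
        tb += coef_bits[i] + diff_bits(prev_sf, sf[i]) + run_bits(prev_hcb, hcb[i])
    return tb


hcb = [1, 1, 2]
coef_bits = [20, 30, 25]
tb_before = total_bits([10, 10, 10], hcb, coef_bits)
tb_after = total_bits([10, 12, 10], hcb, coef_bits)  # only band 1 changes its SF
```

In this toy model, changing the SF of the middle band raises the total by 8 bits, and half of that increase is charged to the following band through its differential term. A per-band bit count would miss that part.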

Fig. 4.1: EBFOS bit allocation scheme.

The block diagram of the EBFOS bit allocation scheme is shown in Fig. 4.1. Each step in Fig. 4.1 is elaborated below.

1. Initialization. This step initializes the reference NMR for each band, NMRref,i, at the start of the maximum NGPB/BPNL analysis. We then determine the reference SF for each band, sref,i, and calculate the reference total coding bits for the frame, TBref, from the adopted reference NMR values. In general, a larger initial NMRref,i results in a smaller initial TBref. There seems to be no theoretically optimal choice for these values. In our implementation, we set the reference NMR to 1 (0 dB) for all the bands, NMRref,i = 1, ∀i. In other words, we target perceptually lossless coding at the beginning of processing a frame.

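2. Local Maximum NGPB/BPNL analysis. This step finds the local maximum NGPB and BPNL values for all bands. We can determine the local maximum NGPB and BPNL of the ith SFB, denoted by NGPBL,i and BPNLL,i, by computing:

$$ NGPB_{L,i} = \max_{s_{new,i}\ (n_i\ \text{candidates})} \left\{ \frac{NMR_{ref,i} - NMR_{new,i}}{TB_{new} - TB_{ref}} \right\} \tag{4.5} $$

$$ BPNL_{L,i} = \max_{s_{new,i}\ (n_i\ \text{candidates})} \left\{ \frac{TB_{ref} - TB_{new}}{NMR_{new,i} - NMR_{ref,i}} \right\} \tag{4.6} $$

TBnew and NMRnew,i are the new values of the total coding bits for the frame and the distortion for the ith SFB, respectively, when the SF value of the ith SFB is changed from sref,i to snew,i. The local optimal SF value of the ith SFB, sopt,i, is the SF value associated with the local maximum NGPB or BPNL. The ni in (4.5) (or (4.6)) is the number of candidate values of snew,i, which is approximately 12 on average according to the statistics of the coded data.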

Note that, in performing the local maximum NGPB or BPNL analysis for the ith SFB, only the SF value of the ith SFB is changed from sref,i to snew,i. The SF values of the other SFBs are kept unchanged (sj = sref,j , ∀j , j≠i).

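3. Global Maximum NGPB/BPNL analysis. We first find the global maximum NGPB and BPNL values, NGPBG and BPNLG, for a frame by computing:

$$ NGPB_G = \max_i \left\{ NGPB_{L,i} \right\} \tag{4.7} $$

$$ BPNL_G = \max_i \left\{ BPNL_{L,i} \right\} \tag{4.8} $$

The SFB that attains the global maximum is denoted by sfbG. Then we set the SF value of only the sfbG-th SFB to its local optimal SF value.

4. Update NMRref,i (as well as sref,i) of the sfbG-th SFB and TBref. Go to step 2 if the bit budget constraint is not met.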

In order to handle the inter-band dependency of the coding process, we use TB instead of Bi for the NGPB/BPNL analysis. Otherwise, the SF value change of the sfbG-th SFB in step 3 will

we have to perform the local maximum NGPB/BPNL analyses for all the bands in each iteration.

As mentioned in step 2 of the preceding procedure, after changing the SF value from sref,i to snew,i, we need to calculate TBnew and NMRnew,i. The value of NMRnew,i depends only on the value of snew,i. However, (4.4) indicates that the value of TBnew depends not only on the value of snew,i but also on the choice of HCB. In our bit allocation scheme, we adopt the trellis-based optimization algorithm for HCB decision proposed in Section 3.1.3.

In general, either the Maximum NGPB analysis or the Maximum BPNL analysis (but not both) has to be performed for each iteration. The Maximum NGPB analysis is used for spending the bit budget (when the bit budget is positive) and the Maximum BPNL analysis is used for recovering the bit budget (when the bit budget is negative).
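The iteration over steps 2 to 4 can be sketched in Python for the bit-spending (maximum NGPB) direction. This is a minimal sketch under stated assumptions, not the implementation used in the experiments: the callbacks nmr_of, total_bits and candidates are hypothetical, the toy cost model is invented for illustration, the HCB decision is omitted, and the loop simply stops when no affordable SF change remains.

```python
def ebfos_spend(sf_ref, budget, nmr_of, total_bits, candidates):
    """Repeatedly apply the single-band SF change with the largest NGPB.

    nmr_of(i, s)     -> NMR of band i at SF value s       (hypothetical)
    total_bits(sf)   -> frame total TB for the SF vector  (hypothetical)
    candidates(i, s) -> candidate new SF values of band i (hypothetical)
    """
    sf = list(sf_ref)
    tb_ref = total_bits(sf)
    while True:
        best = None  # (NGPB, band, new SF, new TB)
        for i in range(len(sf)):  # step 2: only band i changes in each trial
            for s_new in candidates(i, sf[i]):
                tb_new = total_bits(sf[:i] + [s_new] + sf[i + 1:])
                if tb_new <= tb_ref or tb_new > budget:
                    continue
                gain = (nmr_of(i, sf[i]) - nmr_of(i, s_new)) / (tb_new - tb_ref)
                if best is None or gain > best[0]:
                    best = (gain, i, s_new, tb_new)
        if best is None or best[0] <= 0:
            return sf, tb_ref
        _, band, s_new, tb_ref = best  # steps 3 and 4: commit and update
        sf[band] = s_new


# Toy model: all numbers below are invented for illustration.
WEIGHTS = [1.0, 3.0, 2.0]


def toy_nmr(i, s):
    return WEIGHTS[i] * s  # larger SF gives larger NMR


def toy_total_bits(sf):
    coef = sum(40 - 3 * s for s in sf)                       # coefficient bits
    side = sum(2 * abs(a - b) for a, b in zip(sf, sf[1:]))   # differential SF bits
    return coef + side


def toy_candidates(i, s):
    return [s - 1] if s > 0 else []


sf_final, tb_final = ebfos_spend([4, 4, 4], 100, toy_nmr, toy_total_bits, toy_candidates)
```

Because every trial is scored with the frame total rather than a per-band count, a change that brings a band closer to its neighbour's SF value is credited with the side-information bits it saves.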

4.2.2 Fast Algorithm for EBFOS Scheme

The complexity of our EBFOS scheme depends strongly on the number of times the NGPB/BPNL calculation in step 2 of the bit allocation procedure (Section 4.2.1) is performed. Taking the local maximum NGPB analysis as an example, we need to perform ni NGPB calculations to locate the local maximum NGPB of the ith SFB. Hence, the total number of calculations for finding the global maximum NGPB is $\sum_i n_i$.

The most effective way to reduce the computation is therefore to reduce the number of NGPB/BPNL calculations. From the statistics of the local optimal parameters, sopt,i and NGPBL,i (or BPNLL,i), collected from the coded data, we find some interesting properties, which are summarized in Table 4.1.

In Table 4.1, i is the SFB index and m is the index of the SF adjustment iteration. $sfbG^{m}$ is the global optimal SFB of the mth SF adjustment iteration, and $S = \{sfbG^{m}-1,\ sfbG^{m}+1\}$ is the set of the two SFBs adjacent to it. The first statistic in Table 4.1 is the probability that $s_{opt,i}^{m+1}$ differs from $s_{opt,i}^{m}$, denoted as $P_C(s_{opt,i}^{m+1} \neq s_{opt,i}^{m})$, where $s_{opt,i}^{m}$ is the local optimal SF value of the ith SFB for the mth SF adjustment iteration. The other statistic, taking the maximum NGPB analysis as an example, is the average value of the normalized differences between $NGPB_{L,i}^{m}$ and $NGPB_{L,i}^{m+1}$, denoted as $AD_C(NGPB_{L,i}^{m}, NGPB_{L,i}^{m+1})$, where $NGPB_{L,i}^{m}$ is the local maximum NGPB value of the ith SFB for the mth SF adjustment iteration.

Clearly, the changes in the local maximum NGPB/BPNL analyses between successive SF adjustment iterations are concentrated in the SFBs belonging to S. Using these properties, we can drastically reduce the number of calculations needed to determine sfbG and NGPBG (or BPNLG). After the first SF adjustment iteration, we only need to perform the local maximum NGPB/BPNL analysis on three SFBs, $\{sfbG^{m}-1,\ sfbG^{m},\ sfbG^{m}+1\}$. This is the fast version of our EBFOS algorithm.
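The bookkeeping of the fast version can be sketched in Python as follows. The function names, the cache list and the analyze callback are hypothetical. The sketch only shows which bands are re-analyzed and how the cached local maxima of the remaining bands are reused.

```python
def bands_to_reanalyze(g, num_bands, first_iteration):
    """Bands whose local maximum is recomputed in this iteration.

    g is the index of the band changed in the previous iteration (sfbG).
    """
    if first_iteration:
        return list(range(num_bands))
    return [i for i in (g - 1, g, g + 1) if 0 <= i < num_bands]


def refresh_local_maxima(cache, g, analyze, first_iteration=False):
    """Update cached local maxima and return the band with the global maximum.

    cache[i] holds the last local maximum (NGPB or BPNL) of band i and
    analyze(i) recomputes it; both are hypothetical placeholders.
    """
    for i in bands_to_reanalyze(g, len(cache), first_iteration):
        cache[i] = analyze(i)
    return max(range(len(cache)), key=lambda i: cache[i])


# Hypothetical cached values for four bands; band 1 was changed last.
example_cache = [0.1, 0.5, 0.2, 0.3]
example_best = refresh_local_maxima(example_cache, 1, lambda i: 0.05 * (i + 1))
```

After the first iteration, at most three local analyses are performed per iteration instead of one per band.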

Table 4.1: Statistics of the local optimal parameters in maximum NGPB/BPNL analysis.

Condition (C)    $i \in S$    $i \notin (S \cup sfbG^{m})$

4.3 Generalized BFOS Bit Allocation Algorithm for AAC

The generalized BFOS algorithm is an efficient bit allocation algorithm for subband coding. For the purpose of analysis and comparison, in this section we propose an approach to integrate the generalized BFOS bit allocation algorithm into AAC, based on the concepts described in [10] and [11]. The bit allocation procedure of the generalized BFOS scheme for AAC is similar to that of the EBFOS scheme (see Fig. 4.1). Each step in the generalized BFOS scheme for AAC is elaborated below.

1. Initialization. As in the initialization step in Section 4.2.1, we set the reference NMR to 1 (0 dB) for all the bands. Then, we determine the sref,i value and calculate the reference total coding bits for each band, Bref,i, based on the adopted reference NMR value, NMRref,i = 1, ∀i.

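2. Local Maximum NGPB/BPNL analysis. Unlike in the EBFOS scheme, the local maximum NGPB and BPNL of the ith SFB for the BFOS scheme are determined by (4.10) and (4.11), respectively.

$$ NGPB_{L,i} = \max_{s_{new,i}\ (n_i\ \text{candidates})} \left\{ \frac{NMR_{ref,i} - NMR_{new,i}}{B_{new,i} - B_{ref,i}} \right\} \tag{4.10} $$

$$ BPNL_{L,i} = \max_{s_{new,i}\ (n_i\ \text{candidates})} \left\{ \frac{B_{ref,i} - B_{new,i}}{NMR_{new,i} - NMR_{ref,i}} \right\} \tag{4.11} $$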

Bnew,i and NMRnew,i are the new values of the total coding bits and the distortion for the ith SFB, respectively, when the SF value of the ith SFB is changed from sref,i to snew,i. The local optimal SF value of the ith SFB, sopt,i, is the SF value associated with the local maximum NGPB or BPNL.

3. Global Maximum NGPB/BPNL analysis. As in step 3 in Section 4.2.1, we first find NGPBG (or BPNLG) for a frame by (4.7) (or (4.8)) and determine sfbG. Then we set the SF value of only the sfbG-th SFB to its local optimal SF value.

4. Update NMRref,i (as well as sref,i) and Bref,i of the sfbG-th SFB. Go to step 2 if the bit budget constraint is not met.

In the generalized BFOS bit allocation scheme here, we also adopt the trellis-based optimization algorithm for HCB decision. However, differing from the EBFOS scheme, we only perform the local maximum NGPB/BPNL analysis for the sfbG-th SFB.

As described in [10], the generalized BFOS bit allocation scheme can be performed with or without the convexity assumption. When the generalized BFOS scheme is performed with the convexity assumption, ni in (4.10) (or (4.11)) is equal to 1. When it is performed without the convexity assumption, ni is approximately 14 on average according to the statistics of the coded data.

4.4 Simulation Results

In this section, we evaluate the computational complexity and the coded audio quality in our experiments. Four types of bit allocation algorithms are simulated and compared as described below using the MPEG-4 AAC Verification Model (VM) as the test platform.

(1) The TLS algorithm in MPEG-4 AAC VM (VM-TLS).

(2) The BFOS algorithm for AAC with convexity assumption, BFOS-C, and without convexity assumption, BFOS-NC, which are described in Section 4.3.

(3) The trellis-based algorithm aiming at minimizing the average NMR, JTB-ANMR, and aiming at minimizing the maximum NMR, JTB-MNMR.

(4) The EBFOS scheme and its fast version, which are described in Section 4.2.

In order to focus only on the bit allocation performance, none of the optional tools in AAC, such as TNS and M/S stereo coding, are used in our simulations. Ten two-channel audio sequences with a sampling rate of 44.1 kHz are tested. Two of them are extracted from MPEG SQAM [6], and the rest are from EBU [24].

4.4.1 Complexity Analysis

The complexity analysis of the aforementioned bit allocation algorithms is summarized in Table 4.2. The “Computation” column is the average number of NGPB (or BPNL) calculations per frame. The values in the “Computation” column are derived from statistics collected from the simulations on the audio sequences. For convenience of comparison, the BFOS-NC scheme is chosen as the reference (ratio = 1), and all the other schemes are rated against it.

Table 4.2: Complexity analysis of the EBFOS scheme and the generalized BFOS scheme.

Scheme        Computation    Ratio
BFOS-C                119     0.27
BFOS-NC               444     1
Fast EBFOS           1145     2.58
EBFOS               11848    26.68

The experimental data indicate that the computation of the fast EBFOS scheme is approximately 2.6 times that of the BFOS-NC scheme. Moreover, the fast EBFOS scheme is approximately 10 times faster than the EBFOS scheme.
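The ratios quoted above follow directly from the “Computation” column. As a quick arithmetic check, the short Python sketch below recomputes them from the four counts reported in Table 4.2. The variable names are chosen here only for illustration.

```python
# Average NGPB/BPNL calculations per frame, as reported in Table 4.2.
computations = {
    "BFOS-C": 119,
    "BFOS-NC": 444,
    "Fast EBFOS": 1145,
    "EBFOS": 11848,
}

reference = computations["BFOS-NC"]  # the reference scheme (ratio = 1)
ratios = {name: round(count / reference, 2) for name, count in computations.items()}

# Speed-up of the fast version over the full EBFOS scheme.
speedup = computations["EBFOS"] / computations["Fast EBFOS"]
```

The recomputed ratios agree with the “Ratio” column, and the speed-up of the fast version is slightly above 10.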

4.4.2 Objective Quality

The rate-distortion curves of the aforementioned bit allocation schemes are shown in Fig. 4.2 and Fig. 4.3. Two common objective quality measurements, average NMR (ANMR) and maximum NMR (MNMR), are adopted in the objective performance comparison.

The research in [11] shows that the BFOS-C scheme is a near-optimal bit allocation scheme for MPEG-1 Layer I/Layer II audio coding, but our simulation results show that it becomes less efficient for AAC. The BFOS-NC scheme performs much better than the BFOS-C scheme, which indicates that the convexity assumption is not suitable for AAC. Nevertheless, both the ANMR and MNMR performances of the BFOS-NC scheme are approximately 1 dB worse than those of the JTB-ANMR scheme.

Clearly, the EBFOS scheme performs much better than VM-TLS and better than the BFOS-NC scheme. In the ANMR plot (Fig. 4.2), the EBFOS scheme is slightly worse than JTB-ANMR, but the two are very close. It is somewhat better than the JTB-MNMR scheme, since the latter is not optimized for the ANMR criterion. In the MNMR plot (Fig. 4.3), the EBFOS scheme is somewhat worse than JTB-MNMR but slightly better than JTB-ANMR. As stated earlier, the EBFOS scheme aims at reducing the overall NMR, which essentially amounts to minimizing the ANMR. As for the fast version, adopting the fast algorithm for EBFOS causes almost no loss of performance (less than 0.06 dB).

4.4.3 Subjective Quality

The informal listening tests on the aforementioned schemes show that it is hard to tell the difference between JTB-ANMR and the EBFOS scheme. In addition, a “simulated” subjective measure, Objective Difference Grade (ODG), is used in audio quality evaluation.

The ODG results of the aforementioned bit allocation schemes are shown in Fig. 4.4, in which the reference signal is the original audio sequence. Interestingly, JTB-ANMR is the best algorithm as judged by ODG. According to the collected test data (Fig. 4.4), the EBFOS scheme is better than the BFOS-NC and BFOS-C schemes. Moreover, the difference between the EBFOS and the JTB-ANMR schemes is rather small.

Fig. 4.2: ANMR rate-distortion comparison for various bit allocation schemes

Fig. 4.3: MNMR rate-distortion comparison for various bit allocation schemes

Fig. 4.4: ODG performance of various bit allocation schemes

Chapter 5 Perceptually Weighted Inter-Channel Prediction

Despite the success of current audio coding techniques, not much effort has been made to reduce the inter-channel redundancy inherent in multichannel audio. In order to remove the inter-channel redundancy more efficiently while maintaining good audio quality, an efficient inter-channel prediction algorithm, called perceptually weighted inter-channel prediction, is described in this chapter.

We first give a brief review of the linear prediction technique in Section 5.1. The proposed perceptually weighted inter-channel prediction scheme is described in Section 5.2. The experiments and simulation results are summarized in Section 5.3.

5.1 Linear Prediction

Linear prediction technique is an effective tool in speech coding and lossless audio coding and thus is commonly used in those coders [30]. However, as mentioned in [31], there has been a long debate whether or not the inter-channel linear prediction in time domain can further increase the compression rate of the multichannel audio coder. Theoretically, the coder
