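Appendix - Research and Design of a Wireless Multimedia Transceiver Based on Orthogonal Frequency Division Multiple Access---Subproject IV: Research on Streaming Audio over Wireless Networks and Integration of the Audio/Video Subsystem (I)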

1. C.-H. Yang and H.-M. Hang, “Efficient bit assignment strategy for perceptual audio coding,” ICASSP 2003, Hong Kong, April 2003.

2. C.-H. Yang and H.-M. Hang, “Cascaded trellis-based optimization for MPEG-4 Advanced Audio Coding,” to be presented at the Audio Engineering Society Convention 2003, New York, Oct. 2003.

3. Report on attending an international academic conference: IEEE ISCAS 2003

Paper presented (results of the previous project phase): K.-T. Shih, C.-Y. Tsai, and H.-M. Hang, “Real-time implementation of H.263+ using TI TMS320C6201 digital signal processor,” ISCAS 2003, Bangkok, Thailand, May 2003.

EFFICIENT BIT ASSIGNMENT STRATEGY FOR PERCEPTUAL AUDIO CODING

Cheng-Han Yang and Hsueh-Ming Hang**

Department of Electronics Engineering

National Chiao Tung University Hsinchu, Taiwan, R.O.C.

**hmhang@cc.nctu.edu.tw; Fax: (886)-3-5723283

ABSTRACT

For the purpose of efficient audio coding at low rates, a new bit allocation strategy is proposed in this paper. The basic idea behind this approach is “Give bits to the band with the maximum NMR-Gain/bit” or “Retrieve bits from the band with the maximum bits/NMR-Loss”. The notion of “bit-use efficiency” is suggested, and it can be employed to construct a bit assignment algorithm that operates at band level, in contrast to the traditional frame-level bit assignment methods. Based on this strategy, a new bit assignment scheme, called Max-BNLR, is designed for MPEG-4 AAC. Simulation results show that the performance of the Max-BNLR scheme is significantly better than that of the MPEG-4 AAC Verification Model (VM) and is close to that of TB-ANMR [3], which is the (nearly) optimal solution. Moreover, the Max-BNLR scheme has much lower computational complexity than TB-ANMR.

1. INTRODUCTION

Many highly efficient, high-quality audio coding schemes have been developed to meet the growing demand of multimedia applications. MPEG-4 Advanced Audio Coding (AAC) is one of the most recent audio coders specified by the ISO/IEC MPEG standards committee [1]. It is a very efficient audio compression algorithm aimed at a wide variety of applications, such as Internet, wireless, and digital broadcast arenas [2]. For applications where the bandwidth is very limited, low-rate audio coding with good quality becomes essential.

The procedure of bit assignment is one of the most important elements in audio coding. In particular, when bits are scarce, making the best use of the limited bits is critical to producing the best audio quality. Up to now, the popular bit assignment strategies have been the following ([2][3][5]).

1. “Give bits to the band which has the largest value of NMR (perceptual distortion).”

2. “Give bits to the bands of which the distortion is larger than the masking threshold”.

In these strategies, the bit-use (giving away bits) is considered at frame level, and only the value of the distortion is taken into consideration at band level. Hence, it is hard to control the bit-use efficiency (the NMR improvement due to adding one bit) at band level, which results in a less efficient compression scheme.

In this paper, we suggest the notion of bit-use efficiency and propose a new strategy to improve the bit-use efficiency, which can be evaluated at band level. Moreover, a new bit assignment scheme based on this new strategy is proposed for MPEG-4 AAC.

The organization of the paper is as follows. Section 2 describes the aforementioned new strategy. A new AAC bit assignment scheme is delineated in section 3. Finally, the complexity analysis and the simulation results are presented in section 4.

2. EFFICIENT BIT-USE STRATEGY

How to make use of the bits more efficiently is always the key issue in audio coding. The traditional strategies, “Giving bits to the band with the largest NMR” or “Giving bits to the bands of which the distortion is larger than the masking threshold”, do not necessarily provide the best bit-use efficiency. For example, suppose there are two candidate bands, A and B, whose NMR characteristics are listed in the table below. Which band should the first available bit be assigned to? In this table, NMR-Gain/bit means the gain in NMR obtained by allocating one bit to this particular band. A more precise definition of NMR-Gain/bit will be given in section 3.

Band    NMR (dB)    NMR-Gain/bit
A       3.5         0.5
B       3           1.5

Following the traditional strategy, we would assign this one bit to band A; however, considering the bit-use efficiency, this one bit should be assigned to band B so that the overall NMR reduction is maximized. The essence of this new strategy can be summarized by the following statements.

“Give bits to the band with the maximum NMR-Gain/bit” or “Retrieve bits from the band with the maximum bits/NMR-Loss”, where bits/NMR-Loss is the number of bits we save if we give away one unit of NMR.
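The two-band example can be checked with a few lines of Python. This is only an illustrative sketch: the band values come from the table above, while the function names and the idea of summing the band NMRs in dB to compare the two rules are assumptions made here, not part of the paper.

```python
# Toy comparison of the traditional rule and the bit-use-efficiency rule.
# Band values are taken from the two-band table; everything else is a
# hypothetical illustration.
bands = {
    "A": {"nmr_db": 3.5, "nmr_gain_per_bit": 0.5},
    "B": {"nmr_db": 3.0, "nmr_gain_per_bit": 1.5},
}


def pick_by_largest_nmr(bands):
    """Traditional rule: give the bit to the band with the largest NMR."""
    return max(bands, key=lambda name: bands[name]["nmr_db"])


def pick_by_bit_use_efficiency(bands):
    """Proposed rule: give the bit to the band with the largest NMR-Gain/bit."""
    return max(bands, key=lambda name: bands[name]["nmr_gain_per_bit"])


def total_nmr_after_one_bit(bands, chosen):
    """Sum of the band NMRs (dB) after spending one bit on `chosen`.

    Summing in dB is an assumption used only to compare the two choices.
    """
    return sum(
        b["nmr_db"] - (b["nmr_gain_per_bit"] if name == chosen else 0.0)
        for name, b in bands.items()
    )
```

With these numbers the traditional rule selects band A and leaves a total of 6.0 dB, whereas the efficiency rule selects band B and leaves 5.0 dB.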

3. MAX BITS/NMR-LOSS BIT ASSIGNMENT SCHEME

In this section, a new bit assignment scheme designed for MPEG-4 AAC based on our new strategy is described. First, we define NMR-Gain/bit and bits/NMR-Loss by the following equations. Figure 1 is the block diagram of the Max bits/NMR-Loss based bit assignment scheme. Each step in Figure 1 will be elaborated in the following sub-sections.

[Figure 1 block diagram: Pre-Processing; Bits/NMR-Loss Analysis; Adjust SF of the SFB with max Bits/NMR-Loss; decision “Total coding bits < prescribed bits” with No and Yes branches.]

Figure 1. Max bits/NMR-Loss bit assignment scheme

3.1. Pre-Processing

The pre-processing step is to initialize two of the major parameters in the bits/NMR-Loss analysis: the reference NMR and the reference bits. There are no particular values associated with these parameters, and thus the design of the pre-processing is case-dependent. In our implementation, we set the reference NMR=1 (0dB) for all the scale factor bands (SFB) at the beginning of processing a frame. After that, the reference scale factor (SF) for each SFB and the reference bits are calculated based on the input audio data.

3.2. Bits/NMR-Loss Analysis and SF Adjustment

In this scheme, only one SF value (of one SFB) is adjusted in one adjustment iteration. The detailed process is described below.

1. Initialization. Get the reference bits (B_ref), the reference SFs (sf_ref), and the reference NMRs (NMR_ref) for all SFBs (N_SFB SFBs in total) from the pre-processing step. Start the max bits/NMR-Loss analysis from the first SFB and thus set the SFB index i=1.

2. Find the local max bits/NMR-Loss ratio of the ith SFB, BNLR_i, by computing the bits/NMR-Loss ratio of the current frame when the SF value (of the ith SFB) is changed from sf_ref,i to each candidate value up to sf_max,i. Here sf_max,i is the SF value that quantizes all the spectral coefficients in the ith SFB to zero. The local optimal SF (of the ith SFB), sf_opt,i, is the SF with the maximum BNLR. The local optimal coding bits of the ith SFB, B_opt,i, is the coding bits obtained with sf_opt,i.

4. Find the global maximum bits/NMR-Loss ratio, BNLR_global, by computing

$$BNLR_{global} = \max_i \{ BNLR_i \}, \quad \forall i,\ 0 \le i < N_{SFB}.$$

The global optimal SFB, sfb_global, is the SFB that has BNLR_global. Then the global optimal SF, sf_global, is the local optimal SF of the sfb_global-th SFB. Similarly, the global optimal coding bits, B_global, is the coding bits of the sfb_global-th SFB.

5. Set the SF of the sfb_global-th SFB to sf_global. Update the parameters for the sfb_global-th SFB; that is, its reference values are replaced by the values obtained with sf_global.

Note that, in performing the local maximum bits/NMR-Loss ratio analysis in step 2, only the SF of the one SFB being examined is modified. The SFs of the other SFBs are kept unchanged.
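The iteration described above can be sketched as a greedy loop. The sketch below is a simplified illustration under stated assumptions, not the authors' implementation: each band is represented by a hypothetical precomputed list of (bits, NMR) operating points ordered by increasing SF, band bit counts are treated as independent (the HCB side information of section 3.3 is ignored), and the function name `max_bnlr_allocate` is invented here.

```python
def max_bnlr_allocate(candidates, bit_budget):
    """Greedy bit retrieval in the spirit of the Max-BNLR scheme.

    candidates[i] is a list of (bits, nmr) operating points for band i,
    ordered by increasing scale factor, so bits fall and NMR rises along
    the list. Index 0 is the reference point from pre-processing.
    Returns (chosen index per band, total bits).
    """
    current = [0] * len(candidates)
    total_bits = sum(points[0][0] for points in candidates)

    # Keep retrieving bits until the total is below the prescribed bits.
    while total_bits >= bit_budget:
        best = None  # (ratio, band, index)
        for band, points in enumerate(candidates):
            ref_bits, ref_nmr = points[current[band]]
            # Local analysis: only this band's SF is varied.
            for idx in range(current[band] + 1, len(points)):
                bits, nmr = points[idx]
                saved, loss = ref_bits - bits, nmr - ref_nmr
                if saved <= 0:
                    continue
                ratio = saved / loss if loss > 0 else float("inf")
                if best is None or ratio > best[0]:
                    best = (ratio, band, idx)
        if best is None:
            break  # no band can give up any more bits
        _, band, idx = best
        # Global step: adjust only the SFB with the maximum ratio.
        total_bits -= candidates[band][current[band]][0] - candidates[band][idx][0]
        current[band] = idx
    return current, total_bits
```

For example, with two bands `[(100, 1.0), (80, 2.0), (60, 4.0)]` and `[(100, 1.0), (70, 1.5), (50, 3.0)]` and a budget of 160 bits, the loop first retrieves 30 bits from the second band (ratio 60) and then 20 bits from the first band (ratio 20).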

3.3. Trellis-Based Optimization on HCB

The calculation of the total coding bits in step 2 of the Bits/NMR-Loss Analysis (sub-section 3.2) is one of the most computationally intensive processes. When the SF for each SFB is determined, the quantized spectral coefficients are also fixed. Before calculating the total coding bits, the Huffman codebook (HCB) for each SFB has to be chosen first. The MPEG-4 AAC Verification Model (VM) has a simple algorithm for this purpose; however, a more efficient algorithm is needed for the HCB decision. Thus, we adopt the Viterbi-based approach in this paper.

The problem of finding the optimal HCB can be reformulated as minimizing the following cost function:

$$C_{HCB} = \sum_i \left[ b_i + R(h_{i-1}, h_i) \right], \qquad (3)$$

where b_i is the coding bits of the quantized spectral coefficients for the ith SFB, h_i is the HCB for the ith SFB, and R is the run-length coding function (bits needed) for coding HCB. We find that the contribution of h_i to C_HCB depends only on the previous choice, h_{i−1}. Therefore, the minimization of C_HCB can be achieved by finding the optimal path through the trellis using the Viterbi algorithm.

A trellis is thus constructed for minimizing C_HCB. Each stage in the trellis corresponds to an SFB, and each state at the ith stage represents an HCB candidate for this scale factor band. In other words, for the ith stage, if a path passes through the mth state, the mth HCB is employed for encoding the ith SFB. The Viterbi search procedure is outlined below.

The kth state at the ith stage is denoted by S_k,i, and the minimum accumulated partial cost ending at S_k,i is denoted by C_k,i. The transition cost from S_n,i−1 to S_m,i is, by (3), the bits needed to code the ith SFB with the mth HCB plus the HCB coding cost R(n, m).
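A minimal Viterbi search over such a trellis might look as follows. It is a sketch under assumptions: `band_bits` and `switch_cost` are hypothetical inputs standing in for b_i and R in (3), the first stage is given no side-information cost, and the function name `viterbi_hcb` is not from the paper.

```python
def viterbi_hcb(band_bits, switch_cost):
    """Minimise sum_i [ b_i(h_i) + R(h_{i-1}, h_i) ] over codebook paths.

    band_bits[i][m]   bits to code SFB i with codebook m (hypothetical data)
    switch_cost(n, m) side-information bits for moving from codebook n in
                      the previous SFB to codebook m in the current one
    Returns (minimum total bits, list of chosen codebooks per SFB).
    """
    num_books = len(band_bits[0])
    cost = list(band_bits[0])  # C_{k,0}: first stage has no predecessor
    back = []                  # back[i-1][m] = best previous state for state m

    for bits in band_bits[1:]:
        new_cost, pointers = [], []
        for m in range(num_books):
            # Best predecessor n for state m at this stage.
            n_best = min(range(num_books),
                         key=lambda n: cost[n] + switch_cost(n, m))
            new_cost.append(cost[n_best] + switch_cost(n_best, m) + bits[m])
            pointers.append(n_best)
        cost = new_cost
        back.append(pointers)

    # Trace back from the cheapest final state.
    state = min(range(num_books), key=lambda k: cost[k])
    total = cost[state]
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return total, path
```

Each stage examines every (previous state, current state) pair once, so the work per SFB grows with the square of the number of codebook candidates.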

3.4. Fast algorithm for Bits/NMR-Loss Analysis

The most time-consuming computation in this bit assignment scheme is the trellis-based HCB optimization for the coding bits calculation in step 2 (Search). For each SF modification in step 2, the new value of the total coding bits needs to be recalculated. Therefore, for one SF adjustment iteration, we need to perform (sf_max,i − sf_ref,i) trellis-based HCB optimizations for the local bits/NMR-Loss analysis of the ith SFB. Hence, the total number of calculations for finding the global maximum bits/NMR-Loss is

$$\sum_{i=0}^{N_{SFB}-1} \left( sf_{max,i} - sf_{ref,i} \right).$$

There are at least two ways to reduce computations. One is to reduce the complexity of the trellis-based HCB optimization; the other is to reduce the number of trellis-based HCB optimizations.

By analyzing the local optimal parameters, sf_opt,i and BNLR_i, we find some interesting properties.

1. The average value of the difference between the local optimal SFs of the mth and the (m+1)th iterations, sf_diff_ave, is often close to zero. The average is taken over all SFBs other than the global optimal SFB of the mth SF adjustment iteration.

2. The average value of the absolute difference between the local max bits/NMR-Loss ratios of the mth and the (m+1)th iterations, BNLR_diff_ave, is typically quite small. It is taken over the same set of SFBs.

Using these two properties, we can drastically reduce the number of bits/NMR-Loss analyses (trellis-based HCB optimizations). We only need to perform the bits/NMR-Loss analysis on three SFBs after the first SF adjustment iteration.

4. SIMULATION RESULTS

The computational complexity and objective quality obtained in our simulations are summarized in this section. The bit assignment schemes used in the comparison are as follows.

(1) The MPEG-4 VM of AAC (VM-TLS) without modification.

(2) The modified MPEG-4 VM of AAC (VM-TLS-M), in which the HCB decision algorithm is replaced by the TB-HCB optimization procedure described in section 3.3.

(3) The trellis-based ANMR optimization (TB-ANMR) and the MNMR optimization (TB-MNMR), which are implemented as described in [3] and [4].

(4) The normal and fast max bits/NMR-Loss schemes (max-BNLR).

Ten audio files with a sampling rate of 44.1 kHz are used as test sequences. Two of them are extracted from MPEG SQAM [6], and the others are from EBU [7].

4.1. Computational complexity

The storage and computational complexity of one iteration in various schemes are summarized in Table 1.

Table 1. Complexity Analysis

Scheme            Search complexity                Storage
VM-TLS            1                                --
VM-TLS-M          12²×N_SFB                        12×N_SFB
TB-ANMR
Max-BNLR          N_SFB×Ave_SF×12²×N_SFB           12×N_SFB
Fast Max-BNLR ※   (a) N_SFB×Ave_SF×12²×N_SFB       12×N_SFB
                  (b) 3×Ave_SF×12²×N_SFB

※ (a) is only for the first iteration; all the rest are using (b)

In this table, Ave_SF is the average number of SF values tested in the max BNLR analysis for each SFB; its typical value is around 17. Table 2 gives the statistics collected from the simulations on the audio sequences. It is clear that, in terms of computational requirement:

Fast Max-BNLR << Max-BNLR << TB-ANMR (MNMR)

Table 2. Statistics on Computational Complexity
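To get a feel for the magnitudes, the Table 1 expressions can be evaluated numerically. The snippet below is an illustrative sketch: Ave_SF = 17 is the typical value quoted above, but N_SFB = 49 is an assumed band count chosen for the example, the parameter name `num_hcb` reflects an assumed reading of the constant 12 as the number of codebook candidates, and the TB-ANMR row is left out because its expression is not available here.

```python
def search_complexity(n_sfb, ave_sf, num_hcb=12):
    """Per-iteration search complexity from the Table 1 expressions."""
    trellis = num_hcb ** 2 * n_sfb   # one trellis-based HCB search (VM-TLS-M)
    full = n_sfb * ave_sf * trellis  # Max-BNLR, and Fast Max-BNLR case (a)
    fast = 3 * ave_sf * trellis      # Fast Max-BNLR case (b)
    return {"VM-TLS-M": trellis, "Max-BNLR": full, "Fast Max-BNLR (b)": fast}


# Assumed example values: 49 scale factor bands, Ave_SF = 17.
counts = search_complexity(n_sfb=49, ave_sf=17)

# The ratio between the normal and fast schemes reduces to N_SFB / 3.
speedup = counts["Max-BNLR"] / counts["Fast Max-BNLR (b)"]
```

Under these assumptions the fast scheme needs roughly one sixteenth of the search effort of the normal scheme in every iteration after the first.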

4.2. Objective results

Two common objective quality measurements, average noise to mask ratio (ANMR) and maximum noise to mask ratio (MNMR) [5], are adopted in the performance comparison. Note that, in evaluating distortion, the NMR is set to 0 dB if the original NMR value is less than 0 dB. The rate-distortion curves of six bit assignment schemes are shown in Figures 2 and 3. (Note: TB-ANMR and TB-MNMR are similar algorithms aiming at two different target NMRs.) We find that the ANMR performance of the Max-BNLR scheme is almost as good as that of TB-ANMR. There is almost no loss of ANMR performance in using the fast algorithm for Max-BNLR either.

The MNMR values of TB-ANMR, Max-BNLR and Fast Max-BNLR are also similar. The characteristic of the proposed Max-BNLR scheme is closer to that of TB-ANMR than to that of TB-MNMR. Again, TB-ANMR and TB-MNMR are the optimal solutions tuned for their target cost functions, ANMR and MNMR, respectively [3][4].
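The two measures, with the 0 dB floor mentioned above, can be sketched as follows. This is a hedged illustration: the paper does not state here whether the average is taken in the dB domain or over which bands and frames, so the code simply assumes a list of per-band NMR values in dB and averages them directly; the function names are invented.

```python
def clamp_nmr_db(nmr_db):
    """Count NMR values below 0 dB as 0 dB, as in the evaluation above."""
    return [max(0.0, value) for value in nmr_db]


def anmr_db(nmr_db):
    """Average NMR over the given bands (averaging in dB is an assumption)."""
    clamped = clamp_nmr_db(nmr_db)
    return sum(clamped) / len(clamped)


def mnmr_db(nmr_db):
    """Maximum NMR over the given bands after applying the 0 dB floor."""
    return max(clamp_nmr_db(nmr_db))
```

For per-band values of -2.0, 1.0, 3.0 and -0.5 dB, the floor turns the list into 0, 1, 3, 0, giving an ANMR of 1.0 dB and an MNMR of 3.0 dB.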

5. CONCLUSIONS

In this paper, we propose a new concept, bit-use efficiency, for improving audio coding performance. Furthermore, a new bit assignment scheme based on this new concept (strategy), named Max-BNLR, is proposed for MPEG-4 AAC. Simulation results show that the Max-BNLR scheme has a performance close to TB-ANMR and is much better than the MPEG VM. In addition, its computational complexity is much lower than that of TB-ANMR.

6. REFERENCES

[1] ISO/IEC JTC1/SC29, “Information technology – very low bitrate audio-visual coding,” ISO/IEC IS-14496 (Part 3, Audio), 1998.

[2] M. Bosi, et al., “ISO/IEC MPEG-2 advanced audio coding,” Journal of Audio Engineering Society, vol. 45, pp. 789-814, October 1997.

[3] A. Aggarwal, S.L. Regunathan, K. Rose, “Trellis-based optimization of MPEG-4 advanced audio coding,” Proc. IEEE Workshop on Speech Coding, pp. 142-144, 2000.

[4] A. Aggarwal, S.L. Regunathan, K. Rose, “Near-optimal selection of encoding parameters for audio coding,” Proc. of ICASSP, vol. 5, pp. 3269-3272, Jun 2001.

[5] H. Najafzadeh and P. Kabal, “Perceptual bit allocation for low rate coding of narrowband audio,” Proc. of ICASSP, vol. 2, pp. 893-896, 2000.

[6] “The MPEG audio web page.” http://www.tnt.uni-hannover.de/project/mpeg/audio.

[7] European Broadcasting Union, Sound Quality Assessment Material: Recordings for Subjective Tests, Brussels, Belgium, Apr. 1988.

Figure 2. ANMR rate-distortion analyses (horizontal axis: total bit rate, 10 to 80 kbps)

Figure 3. MNMR rate-distortion analyses (horizontal axis: total bit rate, 10 to 80 kbps)

___________________________________

Audio Engineering Society

Convention Paper

Presented at the 115th Convention

2003 October 10–13 New York, New York

This convention paper has been reproduced from the author's advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org.

___________________________________

Cascaded Trellis-Based Optimization For MPEG-4 Advanced Audio Coding

Cheng-Han Yang¹, Hsueh-Ming Hang¹

1 Department of Electronics Engineering, National Chiao Tung University, Hsinchu, Taiwan, R.O.C.

hmhang@mail.nctu.edu.tw; Fax: (886)-3-5723283
u8911831@cc.nctu.edu.tw; Fax: (886)-3-5731791

ABSTRACT

A low complexity and high performance scheme for choosing MPEG-4 Advanced Audio Coding (AAC) parameters is proposed. One key element in producing good quality compressed audio at low rates in particular is selecting proper coding parameter values. A joint trellis-based optimization approach has thus been previously proposed. It leads to a near-optimal selection of parameters at the cost of extremely high computational complexity. It is, therefore, very desirable to achieve a similar coding performance (audio quality) at a much lower complexity.

Simulation results indicate that our proposed cascaded trellis-based optimization scheme has a coding performance close to that of the joint trellis-based scheme, and it requires only 1/70 in computation.

1. INTRODUCTION

To meet the demand of various multimedia applications, many highly efficient audio coding schemes have been developed. MPEG-4 Advanced Audio Coding (AAC) is one of the most recent-generation audio coders specified by the ISO/IEC MPEG standards committee [1]. It is a very efficient audio compression algorithm aimed at a wide variety of applications, such as Internet, wireless, and digital broadcast arenas [2].

One key element in an AAC coder is properly selecting two sets of coding parameters, the scale factor (SF) and the Huffman codebook (HCB), in the rate-distortion (R-D) loop. Because encoding these parameters is inter-band dependent, i.e., the coded bits produced for the second band depend on the choice made for the first band, choosing their values so as to optimize the objective quality becomes fairly difficult. As discussed in [3][4], the poor choice of parameters for rate control is one shortcoming of the current MPEG-4 AAC Verification Model (VM), and therefore its compression efficiency is not as expected at low bit rates.


YANG AND HANG CASCADED TRELLIS-BASED OPTIMIZATION FOR AAC

AES 115TH CONVENTION, NEW YORK, NEW YORK, 2003 OCTOBER 10-13

Some methods, such as using vector quantizers rather than scalar quantizers, have been suggested to reduce the side information [5][6], but they would alter the syntax of the standard. In this paper, we focus on finding the parameters in the existing AAC standard that produce the (nearly) optimal compressed audio quality for a given bit rate.

In [3] and [4], a joint optimization scheme, which takes the inter-band dependence into account, is proposed for choosing the encoding parameters for all the frequency bands. This joint optimization is formulated as a trellis search and is, therefore, called trellis-based optimization. Although the complexity of this joint trellis-based optimization scheme can be reduced by adopting the Viterbi algorithm, its search complexity is still extremely high and is thus not suitable for practical applications.

In this paper, we propose a cascaded trellis-based (CTB) optimization scheme for selecting the proper encoding parameters. Our scheme retains the good audio quality offered by the joint trellis-based (JTB) optimization while its search complexity is drastically decreased.

The organization of this paper is as follows. In section 2, a brief overview of MPEG-4 AAC is provided. The proposed CTB scheme with several variations for choosing the optimal coding parameters is described in sections 3, 4 and 5. The algorithm complexity analysis and the simulation results are summarized in section 6.

2. OVERVIEW OF AAC ENCODER

The basic structure of the MPEG-4 AAC encoder is shown in Figure 1. The time domain signals are first converted into the frequency domain (spectral coefficients) by the modified discrete cosine transform (MDCT). To match the human auditory system, these spectral coefficients are grouped into a number of bands, called scale factor bands (SFB). The pre-process modules, which are optional, can help remove the time/frequency domain redundancies of the original signals. The psychoacoustic model calculates the masking threshold of the spectral coefficients, which is the basis for deciding the coding parameters in the R-D loop.

The R-D loop, our focus in this paper, is to determine two critical parameters, SF and HCB for each SFB so as to optimize the selected criterion under the given bit rate constraint. The SF is related to the step size of the quantizer, which determines the quantization noise-to-masking ratio (NMR) in each band. The quantized coefficients are then entropy-coded by one
