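Research and Design of an OFDMA-Based Wireless Multimedia Transceiver --- Subproject IV: Wireless Streaming Audio Research and Audio/Video Subsystem Integration (I)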

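National Science Council, Executive Yuan: Research Project Final Report

Subproject IV: Wireless Streaming Audio Research and Audio/Video Subsystem Integration (I)

Project type: Integrated project

Project number: NSC91-2219-E-009-011-

Project period: August 1, 2002 to July 31, 2003

Executing institution: Department of Electronics Engineering, National Chiao Tung University

Principal investigator: Hsueh-Ming Hang (杭學鳴)

Project participants: 楊政翰, 陳繼大, 蔡家揚, 侯思瑋, 李仰哲, 王俊能

Report type: Full report

Report attachments: Report on attending an international conference, and published papers

Availability: This project report is open to public inquiry

October 23, 2003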

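National Science Council, Executive Yuan: Subsidized Research Project

■ Final report

□ Interim progress report

Wireless Streaming Audio Research and Audio/Video Subsystem Integration (1/3)

Project type: □ Individual project ■ Integrated project

Project number: NSC 91-2219-E009-011

Project period: August 1, 2002 to July 31, 2003

Principal investigator: Hsueh-Ming Hang (杭學鳴)

Project participants: 楊政翰, 陳繼大, 蔡家揚, 侯思瑋, 李仰哲, 王俊能

Final report type (submitted as required by the approved funding list): □ Summary report ■ Full report

This final report includes the following required attachments:

□ One report on an overseas trip or study visit

□ One report on a trip or study visit to mainland China

■ One report on attending an international academic conference and one copy of each published paper

□ One overseas research report for an international cooperative research project

Availability: except for industry-academia cooperative research projects, projects for upgrading industrial technology and personnel training, controlled projects, and the cases below, the report may be opened to public inquiry immediately

□ Involves patents or other intellectual property rights; open to public inquiry after □ one year □ two years

Executing institution: Department of Electronics Engineering, National Chiao Tung University

October 15, 2003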

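National Science Council, Executive Yuan: Research Project Final Report

Wireless Streaming Audio Research and Audio/Video Subsystem Integration (1/3)

Project number: NSC 91-2219-E009-011

Project period: August 1, 2002 to July 31, 2003

Principal investigator: Hsueh-Ming Hang (杭學鳴), Professor, Department of Electronics Engineering, National Chiao Tung University

Project participants: 楊政翰, 陳繼大, 蔡家揚, 侯思瑋, 李仰哲, 王俊能

Institute of Electronics, National Chiao Tung University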

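Abstract (Chinese)

The main goal of this subproject is to study and implement a streaming audio system for broadband wireless networks. The primary objective is the effective transmission of audio and speech over broadband wireless networks, including research on scalable coding, joint source-channel coding, and error correction and concealment algorithms. We also consider channel error detection and error control techniques, as well as the influence of the system layer on meeting quality-of-service (QoS) targets. In coordination with the group project's DSP demonstration system, we concurrently carry out DSP implementations of MPEG-4 Audio Coding (AAC or its extensions) and 3GPP AMR speech coding. In the third year we will also assist the overall project by first integrating the video and audio DSP components within this subproject into a source subsystem.

Keywords: AMR, AAC, streaming audio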

Abstract

The goal of this research project is to study, simulate and design effective streaming audio algorithms and systems for transmission in the wideband wireless environment. This is often achieved by three means: scalable source coding schemes, adaptive channel coding techniques, and QoS protocol support from the network. We focus mostly on the former two methods.

In the first year, we study and design the MPEG AAC (Advanced Audio Coding) audio codec and the BSAC (Bit-Sliced Arithmetic Coding) scalable coding technique. In the following period, we will work on the DSP implementation of AAC and 3GPP AMR (adaptive multi-rate) coding. In the third year, we will integrate the audio and video components together with the system layer to form the source subsystem to be merged into the Group Project test bed.

Table of Contents

A. Background

B. Research Procedure

1. AAC General Coding

1.1 Efficient bit allocation

1.2 Fast search for bit allocation parameters

2. Bit-Sliced Arithmetic Coding (BSAC)

C. Experiments and Results

1. AAC Rate/Distortion Control

2. Bit-Sliced Arithmetic Coding

D. Conclusion

E. References

F. Self-Evaluation of Project Results

G. Appendix

1. C.-H. Yang and H.-M. Hang, “Efficient bit assignment strategy for perceptual audio coding,” ICASSP 2003, Hong Kong, April 2003.

2. C.-H. Yang and H.-M. Hang, “Cascaded trellis-based optimization for MPEG-4 Advanced Audio Coding,” to be presented in Audio Engineering Society Convention 2003, New York, Oct. 2003.

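A. Background

In the previous phase of the National Telecommunications Program we implemented several speech coding and convolutional coding algorithms. In the first year of the current phase we therefore begin with AAC audio coding and with scalable Bit-Sliced Arithmetic Coding (BSAC).

AAC first appeared as an audio coding standard in MPEG-2, finalized in December 1997. MPEG-2 AAC gives up compatibility with the MPEG-1 audio coding standard and adds two new independent modules, Temporal Noise Shaping (TNS) and Prediction, so AAC offers a better compression ratio and better audio quality than MP3 [1][2][3]. MPEG-4 AAC (version 2) is the newer-generation audio coding standard finalized by ISO/IEC MPEG in December 1999 [4][5]. It builds on MPEG-2 AAC and adds several independent new modules, such as Long Term Prediction (LTP) [6], Perceptual Noise Substitution (PNS) [7], and Transform-Domain Weighted Interleave Vector Quantization (Twin-VQ) [8]. These modules help audio compression at lower bit rates.

MPEG-4 AAC also introduces the concept of scalability. With the new Bit-Sliced Arithmetic Coding (BSAC) module, the encoder can adjust the compressed bit rate, the audio bandwidth, and the number of coded layers according to conditions such as the bandwidth of the transmission channel. The decoder can in turn adjust the number of decoded layers according to how much valid bitstream it has received, and so obtain different levels of audio quality. For example, if the encoder compresses audio at 128 kbps, the decoder can decode at 32 kbps, 64 kbps, 96 kbps, or 128 kbps depending on channel conditions [3]. MPEG-4 AAC therefore not only provides higher compression and better quality but is also better suited to transmission over networks or wireless channels. We intend to select suitable algorithms for DSP implementation.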

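B. Research Procedure

1. AAC General Coding

Multimedia compression standards, audio compression included, specify only the decoder. Choosing the many encoder parameters is the responsibility of the design engineer, and better parameter choices give better results. Figure 1 shows the overall structure of the MPEG-4 AAC audio coder. In MPEG-4 AAC encoding, the most important class of parameter choices is the one that controls the bit rate so as to reach better audio quality and a higher compression ratio; this is the Rate/Distortion (R/D) Control block in Figure 1. The spectral coefficients produced by the preceding processing stages are first quantized by the quantizer, and the quantized coefficients are then Huffman coded. The scale factor (SF) parameter controls the quantizer step size and therefore also determines the quantization error, measured as the noise-to-masking ratio (NMR).

MPEG-4 AAC also provides 12 Huffman code books (HCB) for coding. Rate/Distortion Control adjusts the coding bit rate and the audio quality by choosing different SF and HCB values. The SF and HCB parameters must themselves be coded and sent to the decoder. For the Rate/Distortion Control algorithm we tried two ideas, each published in a separate paper; parts of the two ideas can also be combined.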

[Figure 1: block diagram. Blocks: Input Time Signal, Filter Bank, TNS, Intensity/Coupling, LTP, Prediction, M/S Stereo, Perceptual Model, PNS, Rate/Distortion Control, Scale Factors, Quantizer, Noiseless Coding (Huffman), Twin VQ (Interleaving, Perceptual Weights, Vector Quantization), Bitstream Multiplex.]

Figure 1. Overall structure of the MPEG-4 AAC audio coder

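1.1 Efficient bit allocation

The traditional bit allocation method in audio compression inspects the NMR (error) of each band and gives bits first to the band with the largest NMR. From the standpoint of overall efficiency, however, this is not the most effective method, as the following example shows:

Band    NMR (dB)    NMR-Gain/bit (dB)
A       3.5         0.5
B       3           1.5

In the table above, Band A has the larger NMR, but giving Band A one bit lowers its NMR by only 0.5 dB, whereas giving Band B one bit lowers its NMR by 1.5 dB. If, perceptually, a 1.5 dB reduction is better than a 0.5 dB reduction regardless of the band, then choosing Band B is the more effective use of the bit.

Unlike the traditional method, we introduce the concept of "bit-use efficiency" and propose a new bit allocation principle based on it. The new principle controls bit-use efficiency more effectively than the traditional method.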
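The two-band example can be replayed in a few lines of Python. This is only a toy sketch: the dictionary layout and function names are hypothetical, and summing the per-band NMR values in dB is an illustrative simplification rather than the distortion measure used in the report.

```python
# Toy replay of the Band A / Band B example (values taken from the table above).
bands = {
    "A": {"nmr_db": 3.5, "nmr_gain_per_bit": 0.5},
    "B": {"nmr_db": 3.0, "nmr_gain_per_bit": 1.5},
}

def pick_by_largest_nmr(bands):
    """Traditional rule: give the bit to the band with the largest NMR."""
    return max(bands, key=lambda name: bands[name]["nmr_db"])

def pick_by_bit_use_efficiency(bands):
    """Proposed rule: give the bit to the band with the largest NMR-Gain/bit."""
    return max(bands, key=lambda name: bands[name]["nmr_gain_per_bit"])

def summed_nmr_after_one_bit(bands, chosen):
    """Sum of band NMRs (dB) after the chosen band receives one bit."""
    return sum(
        b["nmr_db"] - (b["nmr_gain_per_bit"] if name == chosen else 0.0)
        for name, b in bands.items()
    )

traditional = pick_by_largest_nmr(bands)        # Band A
efficient = pick_by_bit_use_efficiency(bands)   # Band B
```

Under these assumptions the efficiency rule leaves a lower summed NMR than the traditional rule, which is the point the example makes.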

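Based on this principle, we design a new bit allocation method for MPEG-4 AAC audio coding, the Max Bits/NMR-Loss (BNL) bit allocation method.

The Max Bits/NMR-Loss method consists of roughly four steps:

1. Pre-processing: initialize the parameters needed in the following steps, such as the reference bits ($bits_{ref}$) and the reference NMR ($NMR_{ref}$).

2. Bits/NMR-Loss analysis: for each band, adjusting the SF value yields a new bit count ($bits_{new}$) and a new NMR ($NMR_{new}$). After the analysis we can find, for each band, the maximum Bits/NMR-Loss value and the corresponding best SF value, where

$$\text{Bits/NMR-Loss} = \frac{bits_{ref} - bits_{new}}{NMR_{new} - NMR_{ref}}$$

3. Examine the Bits/NMR-Loss of all bands, select the band with the largest value, and set the SF of that band to its best value.

4. Compute the new total coding bits. If the total exceeds the prescribed coding bits, update all parameters (such as $bits_{ref}$ and $NMR_{ref}$) and return to step (2).

[Figure 2: flowchart. Pre-Processing → Bits/NMR-Loss Analysis → Adjust SF of the SFB with max Bits/NMR-Loss → decision "Total coding bits < prescribed bits": No returns to the analysis, Yes ends.]

Figure 2. Max Bits/NMR-Loss bit allocation method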

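Step (2) is the most computationally expensive part of the Max Bits/NMR-Loss method. For example, MPEG-4 AAC audio coding has 49 bands; if each band has 10 candidate SF values, the Bits/NMR-Loss computation must be carried out 490 times. To reduce the computation of the new bit allocation method, we also propose a fast algorithm for step (2). Statistical analysis shows that, except for a few specific bands, the "best SF" and "maximum Bits/NMR-Loss" values of the bands are highly similar from one iteration to the next. For these bands we can therefore skip most of the Bits/NMR-Loss computation. As the figure below illustrates, if the SF of Band A was adjusted in the current iteration, then in the next iteration we only need to re-analyze the Bits/NMR-Loss of Band A and its two neighboring bands (Band A-1 and Band A+1). The "best SF" and related parameters of the other bands are carried over to the next iteration, which greatly reduces the computation of step (2).

[Figure: consecutive bands Band A-2, Band A-1, Band A, Band A+1, Band A+2.]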
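The saving can be made concrete with a small count. The sketch below uses the 49 bands and 10 candidate SF values quoted above; the function names are hypothetical, and it assumes that every candidate SF of a re-analyzed band costs one Bits/NMR-Loss evaluation.

```python
def bands_to_reanalyze(adjusted_band, num_bands):
    """Bands whose Bits/NMR-Loss is recomputed after one band's SF changes:
    the adjusted band and its immediate neighbors (clipped at the edges)."""
    return [b for b in (adjusted_band - 1, adjusted_band, adjusted_band + 1)
            if 0 <= b < num_bands]

def analyses_full(num_bands, candidates_per_band):
    """Evaluations when every band is re-analyzed in each iteration."""
    return num_bands * candidates_per_band

def analyses_fast(adjusted_band, num_bands, candidates_per_band):
    """Evaluations in a later iteration of the fast algorithm."""
    return len(bands_to_reanalyze(adjusted_band, num_bands)) * candidates_per_band

full = analyses_full(49, 10)        # 490 evaluations, as in the text
fast = analyses_fast(20, 49, 10)    # 30 evaluations for an interior band
```

The first iteration still needs the full analysis, since there are no cached values to reuse yet.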

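1.2 Fast search for bit allocation parameters

The combination of the SF and HCB parameters controls the coding bit rate and the audio quality. The most direct way to obtain the best result is to compare all possible combinations of SF and HCB and select the best one, that is, exhaustive search. The computation of exhaustive search is so large, however, that it is unsuitable for practical use. Earlier researchers observed the recurrence relation between consecutive coding parameters and proposed a trellis-based search to reduce the computation. Because their method determines SF and HCB simultaneously in the trellis-based search, the algorithm is also called joint trellis-based (JTB), and its performance approaches that of exhaustive search [9][10].

Although JTB requires far less computation than exhaustive search, its cost is still too high. We therefore propose a faster algorithm with lower computation, cascaded trellis-based (CTB). CTB also uses trellis-based search to determine the parameters; its main difference from JTB is that SF and HCB are determined in two separate steps. Figure 4 shows the structure of the CTB algorithm, which consists of roughly four steps.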

[Figure 4: block diagram. Initialize λ → Virtual-HCB mode TB SF optimization (output $sf_{opt}$) → TB HCB optimization (output $hcb_{opt}$) → Fixed-HCB mode TB SF optimization (output $sf'_{opt}$) → Count total bits, comparison & adjust λ.]

Figure 4. Structure of the CTB algorithm

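1. With no HCB available for reference, use trellis-based search to determine a set of best SFs, $sf_{opt}$.

2. Given the set of SFs $sf_{opt}$ from step (1), use trellis-based search to determine a set of best HCBs, $hcb_{opt}$.

3. Given the set of best HCBs $hcb_{opt}$ from step (2), use trellis-based search to determine a new set of best SFs, $sf'_{opt}$. This step corrects the biased SF decisions made in step (1), where no HCB was available for reference.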

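The flow above is the full mode of the CTB algorithm. To reduce the computation further, step (3) can be omitted, which gives the simplified mode of CTB. Determining SF and HCB separately greatly reduces the computation but may also degrade quality significantly. The key is to find a suitable virtual Huffman code book model (virtual HCB model), which is used mainly in step (1) of CTB.

In the JTB algorithm, SF and HCB are determined together, so each candidate SF is paired with 12 HCBs to form 12 extended candidate combinations. In the CTB algorithm, each candidate SF is paired with only one HCB. In step (3) that HCB is the real HCB obtained in step (2). In step (1), however, we must use the virtual HCB model to decide on a virtual HCB to pair with, and how suitable this virtual HCB is affects the performance of the whole CTB algorithm. To build a suitable virtual HCB model, we extract from statistical data the regularities for pruning the number of candidate HCBs, and use them to determine the two important variables of the model: the coding bit offset ($\delta$) and the virtual HCB weight ($\alpha$). Figure 5 shows how the error between the CTB and JTB algorithms varies with $\delta$ and $\alpha$; a smaller error is better.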

[Figure 5: two panels, (a) and (b). Horizontal axis: $\alpha$ from 0.0 to 1.0. One curve for each of $\delta$ = 0, 1, 2, 3.]

Figure 5. Error between the CTB and JTB algorithms versus ($\delta$, $\alpha$)

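2. Bit-Sliced Arithmetic Coding (BSAC)

The bit-rate scalable coding method of MPEG-4 version 1 offers only a few coarse bit rates for coding (for example, a 24 kbps base layer plus one or two 16 kbps enhancement layers), which leaves considerable room for improvement. Following the concept of Fine Granularity Scalability (FGS), the MPEG-4 version 2 audio standard therefore provides a new tool with finely adjustable coding rate, Bit-Sliced Arithmetic Coding (BSAC) [11]. Each scalability step is about 1 kbit/s/ch. This feature is very useful for communication systems whose bandwidth varies easily, such as the Internet or mobile communications.

We first study the audio quality of BSAC and its sensitivity to transmission errors. We then propose two methods that attempt to improve the coding efficiency of BSAC.

BSAC and conventional AAC share largely the same coding structure; BSAC only replaces, in the final stage, the noiseless coding originally applied to the spectral coefficients and the SFs. We therefore compare the audio quality of BSAC and conventional AAC, analyze the experimental results, and suggest possible causes for the performance difference. Because arithmetic coding is very sensitive to transmission errors, we also study error propagation in BSAC.

The BSAC encoding process has two important steps. One partitions the spectral coefficients, from low to high frequency, into scalable layers. The other chooses a suitable probability model for the arithmetic coder according to the characteristics of the spectral coefficients. We try to improve the coding efficiency from these two aspects.

The probability models in MPEG-4 BSAC are organized roughly in pairs, and each time one of the two models in a pair can be chosen for arithmetic coding. The MPEG-4 BSAC Verification Model, however, always uses a fixed one. We therefore enable the model selection mechanism and try two ways of selecting the probability model: (1) selection outside the R/D loop, and (2) selection inside the R/D loop. We also design and test probability models generated from real audio signals as replacements for the original MPEG-4 BSAC models.

The other way to improve coding performance is to change the allocation of the available bits and the partition of the scalable layers. The change of partition is shown in the table below. The number of coefficients in the base layer (BL) is unchanged, while the partition rule of the enhancement layers (EL) changes from (12, 12, 8) to (16, 16). Analysis of the bit usage of the different scalable layers (Figure 6) shows that bit usage is concentrated in the low-frequency layers, and that the higher-frequency layers (above layer 35) need almost no bits. We therefore try several adjustments of the bit allocation, in two main directions: (1) increasing the bit allocation of the base layer, and (2) increasing the bit allocation of each enhancement layer. Because bits are allocated from the low-frequency layers to the high-frequency layers, with the total bit rate unchanged both approaches amount to giving more bits to the lower-frequency layers. In the table below, after the partition is changed, the enhancement layer bit allocation is adjusted from (1, 1, 1) to (1.5, 1.5) in order to achieve an effect similar to the original BSAC.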

Original                      BL     EL1    EL2    EL3
# coeff.                      160    12     12     8
Bit allocation (kbps/ch)      16     1      1      1

Modified                      BL     EL1    EL2
# coeff.                      160    16     16
Bit allocation (kbps/ch)      16     1.5    1.5

[Figure 6: bits (vertical axis, 0 to 60) versus layer (horizontal axis, 1 to 37); curves: frame1, frame2, frame3.]

Figure 6. Bit usage of the different scalable layers

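C. Experiments and Results

1. AAC Rate/Distortion Control

This section gives the complexity analysis and the experimental performance of the two Rate/Distortion Control algorithms we propose. They are compared against the MPEG-4 AAC Verification Model (VM-TLS) and against the JTB algorithm. Depending on the optimization criterion, JTB is further divided into (1) JTB-ANMR, optimized for the average NMR, and (2) JTB-MNMR, optimized for the maximum NMR.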

Table 1. Complexity analysis of the Max-BNL algorithm

Algorithm        Complexity Ratio    Storage Ratio
JTB              1                   1
Max-BNL          1/17                1/120
Fast Max-BNL     1/150               1/120

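The two objective quality measurements in Figure 7 show that the proposed (Fast) Max-BNL bit allocation method is about 3 dB better than the MPEG-4 AAC Verification Model and performs similarly to the JTB-ANMR algorithm. Table 1 lists the measured complexity relative to JTB: the computation of Fast Max-BNL is only 1/150 of that of JTB. The memory needed for the trellis-based search is also only 1/120 of that of JTB, which is an advantage for applications that allow only a small, limited amount of memory (such as DSP implementation).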

Table 2. Complexity analysis of the CTB algorithm

Algorithm                        Complexity Ratio    Storage Ratio
JTB                              1                   1
Full-mode CTB (Two-Loop)         1/72                1/12
Simplified-mode CTB (One-Loop)   1/142               1/12

[Figure 7: two panels. (a) ANMR (dB) versus total bit rate (kbps); (b) MNMR (dB) versus total bit rate (kbps). Curves: VM-TLS, JTB-ANMR, JTB-MNMR, Max-BNL, Fast Max-BNL.]

Figure 7. Rate-distortion analysis of the Max-BNL algorithm

[Figure 8: two panels. (a) ANMR (dB) versus total bit rate (kbps); (b) MNMR (dB) versus total bit rate (kbps). Curves: VM-TLS, One-Loop CTB-MNMR, Two-Loop CTB-MNMR, JTB-MNMR, One-Loop CTB-ANMR, Two-Loop CTB-ANMR, JTB-ANMR.]

Figure 8. Rate-distortion analysis of the CTB algorithm

[Figure 9: ODG versus total bit rate (kbps). Curves: VM-TLS, One-Loop CTB-MNMR, Two-Loop CTB-MNMR, JTB-MNMR, One-Loop CTB-ANMR, Two-Loop CTB-ANMR, JTB-ANMR.]

Figure 9. ODG analysis of the CTB algorithm

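The complexity of a trellis-based search depends on the number of stages in the trellis and on the number of states in each stage. A stage corresponds to a band in MPEG-4 AAC, and a state corresponds to a candidate parameter value. Table 2 compares the complexity of CTB and JTB for the same numbers of SF candidates and HCB candidates. The computation of full-mode CTB is only 1/72 of that of JTB, and that of simplified-mode CTB is only 1/142. The memory needed for the trellis-based search is only 1/12 of that of JTB. Figure 8 shows the results of two objective quality measurements, and Figure 9 shows the results measured with the Objective Difference Grade (ODG), which models subjective quality. All of the measurements show that the compression performance of CTB is essentially the same as that of JTB, and that both are much better than the MPEG-4 AAC Verification Model.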
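A rough way to see where the saving comes from is to count branch evaluations in a fully connected trellis, which grow with the number of stages times the square of the number of states. The sketch below is illustrative only: the count of 60 SF candidates is an assumed value, the cost model ignores everything except branch evaluations, and the resulting figures are not meant to reproduce the ratios in Table 2.

```python
def trellis_branches(num_stages, num_states):
    """Branch evaluations of a Viterbi search over a fully connected trellis."""
    return (num_stages - 1) * num_states * num_states

def jtb_branches(num_bands, n_sf, n_hcb):
    """Joint search: each state is one (SF, HCB) pair."""
    return trellis_branches(num_bands, n_sf * n_hcb)

def ctb_branches(num_bands, n_sf, n_hcb, two_loop):
    """Cascaded search: one or two SF passes plus one HCB pass."""
    sf_passes = 2 if two_loop else 1
    return (sf_passes * trellis_branches(num_bands, n_sf)
            + trellis_branches(num_bands, n_hcb))

# Assumed sizes: 49 bands, 60 SF candidates (hypothetical), 12 code books.
joint = jtb_branches(49, 60, 12)
one_loop = ctb_branches(49, 60, 12, two_loop=False)
two_loop = ctb_branches(49, 60, 12, two_loop=True)
```

Because the joint trellis multiplies the two candidate counts before squaring, while the cascaded search squares them separately, the cascaded count is smaller by roughly two orders of magnitude under these assumptions.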

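2. Bit-Sliced Arithmetic Coding

Figure 10 compares the audio quality of BSAC and AAC. The difference between the two is clearly visible; at higher bit rates in particular, their NMRs differ by about 1 dB. To find the cause of this difference, we analyzed the bit allocation of BSAC and AAC; the results are listed in Table 3. The side information in BSAC consists roughly of SF, stereo, and probability model information, while the side information in AAC consists of SF, HCB, and similar information. To achieve FGS, BSAC spends relatively more bits on side information, so the bits actually used for the more important spectral data are about 6.5% fewer than in AAC, and some audio quality is lost as a result.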

[Figure 10: two panels. (a) ODG versus bitrate (kbps/ch); (b) NMR versus bitrate (kbps/ch). Bitrates 24 to 64 kbps/ch. Curves: BSAC, AAC.]

Figure 10. Performance comparison of BSAC and AAC

Table 3. Bit allocation analysis of BSAC and AAC

                    BSAC      AAC
Spectral data       65.95%    72.60%
Side information    34.05%    27.40%

[Figure 11: two panels. (a) ODG versus bitrate (kbps/ch); (b) NMR versus bitrate (kbps/ch). Bitrates 16 to 64 kbps/ch. Curves: BER=0, BER=10e-3, BER=10e-4.]

Figure 11. Quality degradation caused by bit errors in BSAC

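Figure 11 shows that bit errors have a large effect on BSAC: the audio quality degrades severely. Moreover, because of error propagation, the degradation is worse at high bit rates than at low bit rates.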

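Table 4. Analysis of the different probability model selection procedures

                     No model selection    Selection outside the R/D loop    Selection inside the R/D loop
Bitstream size       240                   236                               240
Size reduction       ─                     1.66%                             0%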

[Figure 12: two panels. (a) ODG versus bitrate (kbps/ch); (b) NMR versus bitrate (kbps/ch). Bitrates 16 to 64 kbps/ch. Curves: original BSAC, models selection inside the R/D loop.]

Figure 12. Performance of probability model selection inside the R/D loop

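Selecting the probability model outside the R/D loop affects only the size of the compressed bitstream, and Table 4 shows that the resulting reduction in bitstream size is limited. Selecting the probability model inside the R/D loop affects quality instead, and Table 4 and Figure 12 show that the resulting improvement in overall audio quality is likewise very limited.

Figure 13 shows the performance with the new probability models. Even when the probability models used for arithmetic coding are generated from the test signals themselves, the improvement in performance is small.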

[Figure 13: two panels. (a) ODG versus bitrate (kbps/ch); (b) NMR versus bitrate (kbps/ch). Bitrates 16 to 64 kbps/ch. Curves: Original BSAC, New Models.]

Figure 13. Performance with the new probability models

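Figure 14 shows the result of changing the bit allocation of each enhancement layer while the base layer allocation is fixed at 16 kbps/ch. Because the total number of bits is fixed, increasing the allocation of each enhancement layer reduces the number of enhancement layers that can be coded. The results show that giving more bits to each enhancement layer yields better performance. Figure 15 shows the result of changing the base layer allocation while the allocation of each enhancement layer is fixed at 2 kbps/ch. The results show that giving more bits to the base layer yields better performance. Both results indicate that, for BSAC, allocating more bits to the lower-frequency scalable layers gives a more noticeable performance improvement.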
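The trade-off between per-layer allocation and the number of coded layers is simple arithmetic. The helper below is a hypothetical sketch that assumes the bits left after the base layer are spent on whole enhancement layers only.

```python
def codable_enhancement_layers(total_kbps, base_kbps, per_layer_kbps):
    """Number of whole enhancement layers that fit in the bit budget."""
    if total_kbps < base_kbps:
        return 0
    return int((total_kbps - base_kbps) // per_layer_kbps)

# Base layer fixed at 16 kbps/ch, total 64 kbps/ch (the Figure 14 setting).
layers_by_el_rate = {
    el: codable_enhancement_layers(64, 16, el) for el in (1, 2, 3, 4)
}
```

With a 64 kbps/ch total, doubling the per-layer allocation from 1 to 2 kbps/ch halves the number of enhancement layers that can be coded, so the bits are concentrated in fewer, lower-frequency layers.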

[Figure 14: two panels. (a) ODG versus bitrate (kbps/ch); (b) NMR versus bitrate (kbps/ch). Bitrates 16 to 64 kbps/ch. Curves: original bsac, EL=1kbps/ch, EL=2kbps/ch, EL=3kbps/ch, EL=4kbps/ch.]

Figure 14. Performance when the scalable layer bit allocation is changed

[Figure 15: two panels, (a) and (b); panel (a) plots ODG against bitrate, and panel (b) shares the bitrate axis (24 to 64 kbps/ch). Curves: BL=14kbps/ch, BL=16kbps/ch, BL=18kbps/ch, BL=20kbps/ch, BL=22kbps/ch, BL=24kbps/ch.]

Figure 15. Performance when the base layer bit allocation is changed

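D. Conclusion

For AAC rate control, we propose two efficient bit allocation methods. The first is based on the principle we propose:

“Give bits to the band with the maximum NMR-Gain/bit” or

“Retrieve bits from the band with the maximum bits/NMR-Loss”.

Experiments show that this method achieves results similar to exhaustive search with about one hundredth of the computation.

The second method is a fast search for the bit allocation parameters. We split the algorithm into stages; computing the stages separately greatly reduces the computation but may also degrade quality significantly. The key is to find a suitable bit-coding model (virtual Huffman code book model). The experimental data show that the computation can be reduced to about one hundredth of that of the trellis-based search with essentially no difference in compression performance.

For bit-sliced arithmetic coding, we first study its audio quality and its sensitivity to transmission errors. We then propose two methods that attempt to improve its coding efficiency. After comparing the audio quality of bit-sliced arithmetic coding with that of Advanced Audio Coding (AAC), we analyze the experimental results and suggest possible causes for the performance difference.

To improve coding efficiency, we study the probability models used in bit-sliced arithmetic coding, and we design and test probability models generated from real audio signals. The other way to improve coding performance is to change the number of bits allocated to each scalable layer. The main idea is to allocate more bits to the lower-frequency scalable layers, and this approach gives a more noticeable performance improvement.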

E. References

[1] ISO/IEC 13818-3: 1997, Information technology – Generic coding of moving pictures and associated audio information – Part 3: Audio.

[2] ISO/IEC 13818-7: 1997, Information technology – Generic coding of moving pictures and associated audio information – Part 7: Advanced Audio Coding (AAC).

[3] M. Bosi, et al., “ISO/IEC MPEG-2 Advanced Audio Coding”, J. Audio Eng. Soc., Vol. 45, No. 10, October 1997.

[4] ISO/IEC 14496-3, 1999, Information technology – Coding of audio-visual objects – Part 3: Audio.

[5] J. Herre and B. Grill, “Overview of MPEG-4 audio and its applications in mobile communications”, 5th International Conference on Signal Processing, Vol. 1, pp. 11-20, 2000.

[6] J. Ojanpera and M. Vaananen, “Long Term Predictor for Transform Domain Perceptual Audio Coding”, 107th AES Convention, New York, 1999.

[7] B. Schulz, “Improving Audio Codecs by Noise Substitution”, J. Audio Eng. Soc., Vol. 44, No. 7/8, July/August 1996.

[8] N. Iwakami, T. Moriya, and S. Miki, “High-quality audio-coding at less than 64 kbit/s by using transform-domain weighted interleave vector quantization (TwinVQ)”, IEEE Int’l Conf. Acoustics, Speech, and Signal Proc., Vol. 5, pp. 3095-3098, 1995.

[9] A. Aggarwal, S.L. Regunathan, and K. Rose, “Trellis-based optimization of MPEG-4 advanced audio coding,” Proc. IEEE Workshop on Speech Coding, pp. 142-4, 2000.

[10] A. Aggarwal, S.L. Regunathan, and K. Rose, “Near-optimal selection of encoding parameters for audio coding,” Proc. of ICASSP, vol. 5, pp. 3269-3272, Jun 2001.

[11] S. H. Park, et al., “Multi-layer bit-sliced bit rate scalable MPEG-4 audio coder”, 103rd Convention of the AES, New York, Sep. 1997.

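F. Self-Evaluation of Project Results

Wireless communication is a technology area of national priority, and multimedia services are the most important application of broadband wireless networks. Transmitting streaming multimedia data over wireless networks, however, faces many difficulties. This project builds on our past experience and on earlier work by others to design and develop solutions. The techniques, experience, and results developed are of considerable practical value and can promote R&D in domestic industry.

The participating personnel (graduate students and postdoctoral researchers) learned the theory of audio and speech coding techniques and the relevant international standards. By designing and developing scalable coding and related algorithms for broadband wireless networks, they gained experience and knowledge in research on this topic and in product development. After graduation they enter industry, where they directly help develop new products and raise the level of industrial technology in Taiwan, which fulfils the goal of personnel training. The research produced two papers during this period, both published at international conferences, and extensions into journal papers are in preparation.

Overall evaluation: the research content largely matches the original plan, and the intended goals of innovation in academic research and personnel training have been met. Overall results are good. The research results have both academic and practical value; two conference papers and one master's thesis have been published, as listed below.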

Publications:

[1] C.-H. Yang and H.-M. Hang, “Efficient bit assignment strategy for perceptual audio coding,” ICASSP 2003, Hong Kong, April 2003.

[2] C.-H. Yang and H.-M. Hang, “Cascaded trellis-based optimization for MPEG-4 Advanced Audio Coding,” to be presented in Audio Engineering Society Convention 2003, New York, Oct. 2003.

[3] Szu-Wei Hou (侯思瑋), Performance Analysis and Improvement on MPEG-4 Bit-Sliced Arithmetic Coding for Audio, MS Thesis, NCTU, June 2003.

G. Appendix

1. C.-H. Yang and H.-M. Hang, “Efficient bit assignment strategy for perceptual audio coding,” ICASSP 2003, Hong Kong, April 2003.

2. C.-H. Yang and H.-M. Hang, “Cascaded trellis-based optimization for MPEG-4 Advanced Audio Coding,” to be presented in Audio Engineering Society Convention 2003, New York, Oct. 2003.

3. Report on attending an international academic conference, IEEE ISCAS 2003. Paper presented (result of the previous project phase): K.-T. Shih, C.-Y. Tsai, and H.-M. Hang, “Real-time implementation of H.263+ using TI TMS320C6201 digital signal processor”, ISCAS 2003, Bangkok, Thailand, May 2003.

EFFICIENT BIT ASSIGNMENT STRATEGY FOR PERCEPTUAL AUDIO CODING

Cheng-Han Yang and Hsueh-Ming Hang**

Department of Electronics Engineering

National Chiao Tung University Hsinchu, Taiwan, R.O.C.

**

[email protected]

; Fax: (886)-3-5723283

ABSTRACT

For the purpose of efficient audio coding at low rates, a new bit allocation strategy is proposed in this paper. The basic idea behind this approach is “Give bits to the band with the maximum NMR-Gain/bit” or “Retrieve bits from the band with the maximum bits/NMR-Loss”. The notion of “bit-use efficiency” is suggested, and it can be employed to construct a bit assignment algorithm operating at band level, as opposed to the traditional frame-level bit assignment methods. Based on this strategy, a new bit assignment scheme called Max-BNLR is designed for MPEG-4 AAC. Simulation results show that the performance of the Max-BNLR scheme is significantly better than that of the MPEG-4 AAC Verification Model (VM) and is close to that of TB-ANMR [3], which is the (nearly) optimal solution. Moreover, the Max-BNLR scheme has the advantage of low computational complexity compared with TB-ANMR.

1. INTRODUCTION

Many highly efficient and high quality audio coding schemes have been developed and proposed to meet the growing demand of multimedia applications. MPEG-4 Advanced Audio Coding (AAC) is one of the most recent audio coders specified by the ISO/IEC MPEG standards committee [1]. It is a very efficient audio compression algorithm aimed at a wide variety of applications, such as Internet, wireless, and digital broadcast [2]. For applications where the bandwidth is very limited, low rate audio coding with good quality becomes essential.

The procedure of bit assignment is one of the most important elements in audio coding. In particular, when bits are scarce, making the best use of the limited bits is critical to producing the best quality audio. Up to now, the popular strategies for bit assignment have been as follows ([2][3][5]).

1. “Give bits to the band which has the largest value of NMR (perceptual distortion).”

2. “Give bits to the bands of which the distortion is larger than the masking threshold”.

In these strategies, the bit-use (giving away bits) is considered at frame level and only the value of the distortion is taken into consideration at band level. Hence, it is hard to control the bit-use efficiency (the NMR improvement due to adding one bit) at band level, which results in a less efficient compression scheme.

In this paper, we suggest the notion of bit-use efficiency and propose a new strategy to improve the bit-use efficiency, which can be evaluated at band level. Moreover, a new bit assignment scheme based on this new strategy is proposed for MPEG-4 AAC.

The organization of the paper is as follows. Section 2 describes the new strategy. A new AAC bit assignment scheme is delineated in Section 3. Finally, the complexity analysis and the simulation results are presented in Section 4.

2. EFFICIENT BIT-USE STRATEGY

How to make use of the bits more efficiently is always the key issue in audio coding. The traditional strategies, “Giving bits to the band with the largest NMR” or “Giving bits to the bands of which the distortion is larger than the masking threshold”, do not necessarily provide the best bit-use efficiency. For example, suppose there are two candidate bands, A and B, with the NMR characteristics listed in the table below. Which band should the first available bit be assigned to? In this table, NMR-Gain/bit means the gain in NMR obtained by allocating one bit to this particular band. A more precise definition of NMR-Gain/bit is given in Section 3.

Band    NMR (dB)    NMR-Gain/bit
A       3.5         0.5
B       3           1.5

Following the traditional strategy, we would assign this one bit to band A; however, considering the bit-use efficiency, this one bit should be assigned to band B so that the overall NMR reduction is maximized. The essence of this new strategy can be summarized by the following statements.

“Give bits to the band with the maximum NMR-Gain/bit” or “Retrieve bits from the band with the maximum bits/NMR-Loss”, where bits/NMR-Loss is the number of bits we save if we give away one unit of NMR.

3. MAX BITS/NMR-LOSS BIT ASSIGNMENT SCHEME

In this section, a new bit assignment scheme designed for MPEG-4 AAC based on our new strategy is described. First, we define NMR-Gain/bit and bits/NMR-Loss by the following equations.

$$\text{NMR-Gain/bit} = \frac{NMR_{ref} - NMR_{new}}{bits_{new} - bits_{ref}} \qquad (1)$$

and

$$\text{bits/NMR-Loss} = \frac{bits_{ref} - bits_{new}}{NMR_{new} - NMR_{ref}}. \qquad (2)$$

Figure 1 is the block diagram of the Max bits/NMR-Loss based bit assignment scheme. Each step in Figure 1 is elaborated in the following sub-sections.

[Figure 1: flowchart. Pre-Processing → Bits/NMR-Loss Analysis → Adjust SF of the SFB with max Bits/NMR-Loss → decision "Total coding bits < prescribed bits": No returns to the analysis, Yes ends.]

Figure 1. Max bits/NMR-Loss bit assignment scheme

3.1. Pre-Processing

The pre-processing step initializes two of the major parameters in the bits/NMR-Loss analysis: the reference NMR and the reference bits. There are no particular values associated with these parameters, and thus the design of the pre-processing is case-dependent. In our implementation, we set the reference NMR = 1 (0 dB) for all the scale factor bands (SFB) at the beginning of processing a frame. After that, the reference scale factor (SF) for each SFB and the reference bits are calculated based on the input audio data.

3.2. Bits/NMR-Loss Analysis and SF Adjustment

In this scheme, only one SF value (of one SFB) is adjusted in one adjustment iteration. The detailed process is described below.

1. Initialization. Get the reference bits ($B_{ref}$), and the reference SFs ($sf_{ref}$) and NMRs ($NMR_{ref}$) for all SFBs (N_SFB SFBs in total) from the pre-processing step. Start the max bits/NMR-Loss analysis from the first SFB and thus set the SFB index $i = 1$.

2. Find the local max bits/NMR-Loss ratio of the $i$th SFB, $BNLR_i$, by computing

$$BNLR_i = \max_{sf} \left\{ \frac{B_{ref} - B_{sf}}{NMR_{sf,i} - NMR_{ref,i}} \right\}, \quad \forall sf \ \text{and} \ sf_{ref,i} < sf \le sf_{max,i}$$

Here $B_{sf}$ is the new value of the total coding bits in the current frame if the SF value (of the $i$th SFB) is changed from $sf_{ref,i}$ to $sf$. The value $sf_{max,i}$ is the SF value that quantizes all the spectral coefficients in the $i$th SFB to zero. The local optimal SF (of the $i$th SFB), $sf_{opt,i}$, is the SF with the maximum BNLR. The local optimal coding bits of the $i$th SFB, $B_{opt,i} = B_{sf_{opt,i}}$, is also recorded.

3. If $i < N\_SFB$, update $i$ to $i+1$ and go to step 2.

4. Find the global maximum bits/NMR-Loss ratio, $BNLR_{global}$, by computing

$$BNLR_{global} = \max_i \{ BNLR_i \}, \quad \forall i, \ 0 \le i < N\_SFB$$

The global optimal SFB, $sfb_{global}$, is the SFB that has $BNLR_{global}$. Then, the global optimal SF, $sf_{global}$, is the local optimal SF of the $sfb_{global}$-th SFB. Similarly, the global optimal coding bits, $B_{global}$, is the coding bits of the $sfb_{global}$-th SFB.

5. Set the SF of the $sfb_{global}$-th SFB to $sf_{global}$. Update the parameters for the $sfb_{global}$-th SFB; that is, $sf_{ref,sfb_{global}} = sf_{global}$ and $NMR_{ref,sfb_{global}} = NMR_{sf_{global},sfb_{global}}$.

6. Compare $B_{global}$ to the prescribed rate, $B$. If $B_{global} > B$, update $B_{ref}$ to $B_{global}$ and go to step 2.

Note that, in performing the local maximum bits/NMR-Loss ratio analysis in step 2, only the SF of the one SFB that is being examined is modified. The SFs of the other SFBs are kept unchanged.
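The six steps above can be sketched as a greedy loop. This is a minimal illustration under strong assumptions, not the authors' implementation: each band is described by a hypothetical lookup table of (bits, NMR) per SF index, the total coding bits are taken as the plain sum of the per-band bits (so the HCB coding of Section 3.3 is ignored), and all names are invented for the example.

```python
def max_bnlr_allocate(band_tables, sf_ref, bit_budget):
    """Greedy Max bits/NMR-Loss allocation on toy per-band tables.

    band_tables[i][sf] = (bits, nmr) for band i at SF index sf; a larger
    index means coarser quantization (fewer bits, higher NMR).
    """
    sf = list(sf_ref)
    total_bits = sum(band_tables[i][sf[i]][0] for i in range(len(sf)))
    while total_bits > bit_budget:
        best = None  # (ratio, band index, candidate SF index)
        for i, table in enumerate(band_tables):
            bits_ref, nmr_ref = table[sf[i]]
            # Step 2: scan candidates with sf_ref < sf <= sf_max.
            for cand in range(sf[i] + 1, len(table)):
                bits_new, nmr_new = table[cand]
                saved, loss = bits_ref - bits_new, nmr_new - nmr_ref
                if saved <= 0 or loss <= 0:
                    continue
                ratio = saved / loss
                if best is None or ratio > best[0]:
                    best = (ratio, i, cand)
        if best is None:
            break  # no band can give back bits any more
        _, i, cand = best
        # Steps 4-5: apply the globally best adjustment and update references.
        total_bits -= band_tables[i][sf[i]][0] - band_tables[i][cand][0]
        sf[i] = cand
    return sf, total_bits

toy_tables = [
    [(10, 1.0), (8, 1.5), (4, 3.0)],
    [(10, 1.0), (9, 3.0), (5, 6.0)],
]
result_sf, result_bits = max_bnlr_allocate(toy_tables, [0, 0], bit_budget=15)
```

In the toy data the first band gives back bits at a much lower NMR cost than the second, so both adjustments are taken from the first band before the budget is met.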

3.3. Trellis-Based Optimization on HCB

Total coding bits calculation in step 2 in the Bits/NMR-Loss Analysis (in sub-section 3.2) is one of the most computational-intensive processes. When the SF for each SFB is determined, the quantized spectral coefficients are also fixed. Before calculating the total coding bits, the HCB for each SFB has to be chosen first. The MPEG-4 AAC Verification Model (VM) has a simple algorithm for this purpose; however, a more efficient algorithm is needed for HCB decision. Thus, we adopt the Viterbi-based approach in this paper.

The problem of finding the optimal HCB can be reformulated as minimizing the following cost function:

$$C_{HCB} = \sum_i \left[ b_i + R(h_{i-1}, h_i) \right], \qquad (3)$$

where $b_i$ is the coding bits of the quantized spectral coefficients for the $i$th SFB, $h_i$ is the HCB for the $i$th SFB, and $R$ is the run-length coding function (bits needed) for coding the HCB. We find that the contribution of $h_i$ to $C_{HCB}$ depends only on the previous choice, $h_{i-1}$. Therefore, the minimization of $C_{HCB}$ can be achieved by finding the optimal path through the trellis using the Viterbi algorithm.

A trellis is thus constructed for minimizing $C_{HCB}$. Each stage in the trellis corresponds to an SFB, and each state at the $i$th stage represents an HCB candidate for this scale factor band. In other words, for the $i$th stage, if a path passes through the $m$th state, the $m$th HCB is employed for encoding the $i$th SFB. The Viterbi search procedure is outlined below.

The $k$th state at the $i$th stage is denoted by $S_{k,i}$, and the minimum accumulative partial cost ending at $S_{k,i}$ is denoted by $C_{k,i}$. The transition cost from $S_{n,i-1}$ to $S_{m,i}$ is $R(h_{n,i-1}, h_{m,i})$.

1. Initialize $C_{m,0} = 0, \ \forall m$. Initialize $i = 1$.

2. Search. $\forall m$, the best path ending at $S_{m,i}$ is found by computing

$$C_{m,i} = \min_n \left\{ C_{n,i-1} + b_{m,i} + R(h_{n,i-1}, h_{m,i}) \right\}$$

3. If $i < N$, set $i = i+1$ (SFB) and go to step 2.

3.4. Fast algorithm for Bits/NMR-Loss Analysis

The most time-consuming computation in this bit assignment scheme is the trellis-based HCB optimization for the coding bits calculation in step 2 (Search). For each SF modification in step 2, the new value of the total coding bits needs to be recalculated. Therefore, for one SF adjustment iteration, we need to perform $(sf_{max,i} - sf_{ref,i})$ trellis-based HCB optimization processes for the local bits/NMR-Loss analysis. Hence, the total number of calculations for finding the global maximum bits/NMR-Loss is

$$\sum_{i=1}^{N\_SFB} (sf_{max,i} - sf_{ref,i}). \qquad (4)$$

There are at least two ways to reduce computations. One is to reduce the complexity of the trellis-based HCB optimization; the other is to reduce the number of trellis-based HCB optimizations.
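For orientation, the trellis-based HCB optimization of Section 3.3, which the counts above refer to, can be sketched as a standard Viterbi search. The sketch is illustrative only: the per-band bit table and the transition cost function are hypothetical stand-ins for the real Huffman and run-length bit counts, and the first band is given no transition cost, which simplifies the initialization described in the text.

```python
def viterbi_hcb(band_bits, transition_bits):
    """Minimize sum_i [ b_i(h_i) + R(h_{i-1}, h_i) ] over code book choices.

    band_bits[i][m]: bits to code band i's coefficients with code book m.
    transition_bits(prev, cur): side-information bits for going from code
    book prev in one band to code book cur in the next band.
    Returns (minimum total bits, list of chosen code books per band).
    """
    n_states = len(band_bits[0])
    cost = list(band_bits[0])   # accumulated cost per state at the first band
    back = []                   # back pointers for each later band
    for i in range(1, len(band_bits)):
        new_cost, ptr = [], []
        for m in range(n_states):
            best_n = min(range(n_states),
                         key=lambda n: cost[n] + transition_bits(n, m))
            new_cost.append(cost[best_n] + transition_bits(best_n, m)
                            + band_bits[i][m])
            ptr.append(best_n)
        cost = new_cost
        back.append(ptr)
    m = min(range(n_states), key=lambda s: cost[s])
    total, path = cost[m], [m]
    for ptr in reversed(back):  # trace the best path backwards
        m = ptr[m]
        path.append(m)
    path.reverse()
    return total, path

toy_bits = [[5, 9], [6, 2], [5, 9]]  # 3 bands, 2 candidate code books
cheap_switch = viterbi_hcb(toy_bits, lambda p, c: 0 if p == c else 1)
costly_switch = viterbi_hcb(toy_bits, lambda p, c: 0 if p == c else 4)
```

With a cheap switching cost the search changes code book for the middle band; with an expensive one it keeps a single code book throughout, which is the trade-off the cost function in (3) captures.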

By analyzing the local optimal parameters, $sf_{opt,i}$ and $BNLR_i$, we find some interesting properties.

1. The average value of the difference between the local optimal SFs of the $m$th and the $(m+1)$th iterations, $sfdiff_{ave}$, is often close to zero.

$$sfdiff_{ave} = \frac{1}{N\_SFB - 3} \times \sum_{i \notin S} \mathrm{abs}\left( sf^{m+1}_{opt,i} - sf^{m}_{opt,i} \right),$$

where $S = \{ sfb^{m}_{global} - 1, \ sfb^{m}_{global}, \ sfb^{m}_{global} + 1 \}$ and $sfb^{m}_{global}$ is the global optimal SFB of the $m$th SF adjustment iteration.

2. The average value of the difference between the local max bits/NMR-Loss ratios of the $m$th and the $(m+1)$th iterations, $BNLRdiff_{ave}$, is typically quite small.

$$BNLRdiff_{ave} = \frac{1}{N\_SFB - 3} \times \sum_{i \notin S} \mathrm{abs}\left( BNLR^{m+1}_{i} - BNLR^{m}_{i} \right)$$

Using these two properties, we can drastically reduce the number of bits/NMR-Loss analyses (trellis-based HCB optimizations). We only need to perform the bits/NMR-Loss analysis on three SFBs after the first SF adjustment iteration.

4. SIMULATION RESULTS

The computational complexity and objective quality based on our simulations are summarized in this section. The bit assignment schemes used in the comparison are as follows.

(1) The MPEG-4 VM of AAC (VM-TLS) without modification.

(2) The modified MPEG-4 VM of AAC (VM-TLS-M), in which the HCB decision algorithm is replaced by the TB-HCB optimization procedure described in Section 3.3.

(3) The trellis-based ANMR optimization (TB-ANMR) and the MNMR optimization (TB-MNMR), which are implemented as described in [3] and [4].

(4) The normal and fast max bits/NMR-Loss schemes (Max-BNLR).

Ten audio files with a sampling rate of 44.1 kHz are used as test sequences. Two of them are extracted from MPEG SQAM [6], and the others are from EBU [7].

4.1. Computational complexity

The storage and computational complexity of one iteration in various schemes are summarized in Table 1.

Table 1. Complexity Analysis

Scheme               Search complexity                               Storage
VM-TLS               1                                               --
VM-TLS-M             $12^2 \times N\_SFB$                            $12 \times N\_SFB$
TB-ANMR, TB-MNMR     $(60 \times 2)^2 \times 12^2 \times N\_SFB$     $60 \times 2 \times 12 \times N\_SFB$
Max-BNLR             $N\_SFB \times Ave\_SF \times 12^2 \times N\_SFB$    $12 \times N\_SFB$
Fast Max-BNLR ※      (a) $N\_SFB \times Ave\_SF \times 12^2 \times N\_SFB$    $12 \times N\_SFB$
                     (b) $3 \times Ave\_SF \times 12^2 \times N\_SFB$

※ (a) is only for the first iteration; all the rest use (b).

In this table, Ave_SF is the average number of SFs tested in the max BNLR analysis for each SFB, and its typical value is around 17. Table 2 gives the statistics collected from the simulations on audio sequences. It is clear that, in terms of computational requirement:

Fast Max-BNLR << Max-BNLR << TB-ANMR (MNMR)

Table 2. Statistics on Computational Complexity

Scheme            Average iterations/frame    Average TB HCB optimizations/frame    Average TB HCB optimizations/iteration    Complexity ratio
TB-ANMR (MNMR)    12                          14400*12                              14400                                     1
Max-BNLR          50                          10103                                 10103/12 = 842                            1/17
Fast Max-BNLR     50                          1153                                  1153/12 = 96                              1/150

4.2. Objective results

Two common objective quality measurements, average noise to mask ratio (ANMR) and maximum noise to mask ratio (MNMR) [5], are adopted in the performance comparison. Note that, in evaluating distortion, the NMR is set to 0 dB if the original NMR value is less than 0 dB. The rate-distortion curves of six bit assignment schemes are shown in Figures 2 and 3. (Note: TB-ANMR and TB-MNMR are similar algorithms aiming at two different target NMRs.) The ANMR performance of the Max-BNLR scheme is almost as good as that of TB-ANMR. There is almost no loss of ANMR performance in using the fast algorithm for Max-BNLR either. The MNMR values of TB-ANMR, Max-BNLR and Fast Max-BNLR are also similar. The characteristic of the proposed Max-BNLR scheme is closer to that of TB-ANMR than to that of TB-MNMR. Again, TB-ANMR and TB-MNMR are the optimal solutions tuned for their target cost functions, ANMR and MNMR, respectively [3][4].

4. CONCLUSIONS

In this paper, we propose a new concept, bit-use efficiency, for improving audio coding performance. Furthermore, a new bit assignment scheme based on this concept, named Max-BNLR, is proposed for MPEG-4 AAC. Simulation results show that the Max-BNLR scheme has a performance close to that of TB-ANMR and is much better than the MPEG VM. In addition, its computational complexity is much lower than that of TB-ANMR.

5. REFERENCES

[1] ISO/IEC JTC1/SC29, “Information technology – very low bitrate audio-visual coding,” ISO/IEC IS-14496 (Part 3, Audio), 1998.

[2] M. Bosi, et al., “ISO/IEC MPEG-2 advanced audio coding,” Journal of Audio Engineering Society, vol. 45, pp. 789-814, October 1997.

[3] A. Aggarwal, S.L. Regunathan, K. Rose, “Trellis-based optimization of MPEG-4 advanced audio coding,” Proc. IEEE Workshop on Speech Coding, pp. 142-144, 2000.

[4] A. Aggarwal, S.L. Regunathan, K. Rose, “Near-optimal selection of encoding parameters for audio coding,” Proc. of ICASSP, vol. 5, pp. 3269-3272, Jun 2001.

[5] H. Najafzadeh and P. Kabal, “Perceptual bit allocation for low rate coding of narrowband audio,” Proc. of ICASSP, vol. 2, pp. 893-896, 2000.

[6] “The MPEG audio web page.” http://www.tnt.uni-hannover.de/project/mpeg/audio.

[7] European Broadcasting Union, Sound Quality Assessment Material: Recordings for Subjective Tests, Brussels, Belgium, Apr. 1988.

[Plot: ANMR (dB) curves for VM-TLS, VM-TLS-M, TB-ANMR, TB-MNMR, Max-BNLR and Fast Max-BNLR]

Figure 2. ANMR rate-distortion analyses

[Plot: MNMR (dB) curves for VM-TLS, VM-TLS-M, TB-ANMR, TB-MNMR, Max-BNLR and Fast Max-BNLR]

Figure 3. MNMR rate-distortion analyses


___________________________________

Audio Engineering Society

Convention Paper

Presented at the 115th Convention

2003 October 10–13

New York, New York

This convention paper has been reproduced from the author's advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org.

Journal of the Audio Engineering Society.

___________________________________

Cascaded Trellis-Based Optimization For MPEG-4 Advanced Audio Coding

Cheng-Han Yang¹, Hsueh-Ming Hang¹

1 Department of Electronics Engineering, National Chiao Tung University, Hsinchu, Taiwan, R.O.C.

[email protected]; Fax: (886)-3-5723283

[email protected]; Fax: (886) -3-5731791

ABSTRACT

A low-complexity, high-performance scheme for choosing MPEG-4 Advanced Audio Coding (AAC) parameters is proposed. One key element in producing good-quality compressed audio, at low rates in particular, is selecting proper coding parameter values. A joint trellis-based optimization approach has been proposed previously for this purpose. It leads to a near-optimal selection of parameters at the cost of extremely high computational complexity. It is, therefore, very desirable to achieve a similar coding performance (audio quality) at a much lower complexity. Simulation results indicate that our proposed cascaded trellis-based optimization scheme has a coding performance close to that of the joint trellis-based scheme, and it requires only 1/70 of the computation.

1. INTRODUCTION

To meet the demand of various multimedia applications, many highly efficient audio coding schemes have been developed. The MPEG-4 Advanced Audio Coding (AAC) is one of the most recent-generation audio coders specified by the ISO/IEC MPEG standards committee [1]. It is a very efficient audio compression algorithm aiming at a wide variety of applications, such as Internet, wireless, and digital broadcast arenas [2]. One key element in an AAC coder is properly selecting two sets of coding parameters, the scale factor (SF) and the Huffman codebook (HCB), in the rate-distortion (R-D) loop. Because encoding these parameters is inter-band dependent, i.e., the coded bits produced for the second band depend on the choice made for the first band, choosing their values so as to optimize the objective quality becomes fairly difficult. As discussed in [3][4], the poor choice of parameters for rate control is one shortcoming of the current MPEG-4 AAC Verification Model (VM) and therefore its compression efficiency is not as expected at low bit rates.


YANG AND HANG

CASCADED TRELLIS-BASED OPTIMIZATION FOR AAC

AES 115TH CONVENTION, NEW YORK, NEW YORK, 2003 OCTOBER 10-13

Some methods, such as using vector quantizers rather than scalar quantizers, have been suggested to reduce the side information [5][6]. They would alter the syntax of the standard. In this paper, we focus on finding the parameters in the existing AAC standard that produce the (nearly) optimal compressed audio quality for a given bit rate.

In [3] and [4], a joint optimization scheme, which takes the inter-band dependence into account, is proposed for choosing the encoding parameters for all the frequency bands. This joint optimization is formulated as a trellis search and is, therefore, called trellis-based optimization. Although the complexity of this joint trellis-based optimization scheme can be reduced by adopting the Viterbi algorithm, its search complexity is still extremely high and is thus not suitable for practical applications.

In this paper, we propose a cascaded trellis-based (CTB) optimization scheme for selecting the proper encoding parameters. Our scheme retains the good audio quality offered by the joint trellis-based (JTB) optimization while its search complexity is drastically decreased.

The organization of this paper is as follows. In section 2, a brief overview of MPEG-4 AAC is provided. The proposed CTB scheme with several variations for choosing the optimal coding parameters is described in sections 3, 4 and 5. The algorithm complexity analysis and the simulation results are summarized in section 6.

2. OVERVIEW OF AAC ENCODER

The basic structure of the MPEG-4 AAC encoder is shown in Figure 1. The time-domain signals are first converted into the frequency domain (spectral coefficients) by the modified discrete cosine transform (MDCT). To match the human auditory system, these spectral coefficients are grouped into a number of bands, called scale factor bands (SFBs). The optional pre-process modules help remove the time/frequency-domain redundancies of the original signals. The psychoacoustic model calculates the masking threshold of the spectral coefficients, which is the basis for deciding the coding parameters in the R-D loop. The R-D loop, our focus in this paper, determines two critical parameters, SF and HCB, for each SFB so as to optimize the selected criterion under the given bit rate constraint. The SF is related to the step size of the quantizer, which determines the quantization noise-to-masking ratio (NMR) in each band. The quantized coefficients are then entropy-coded by one of the twelve pre-designed HCBs. In addition, the indices of SFs and HCBs are coded using differential and run-length codes, respectively, and are transmitted as side information.

[Block diagram: Psychoacoustic Model; Transform/Filter Bank; Pre-Process Modules; Rate-Distortion Control Process; Scale Factor; Quantizer; Noiseless Coding; Rate/Distortion Loop]

Fig. 1. Basic structure of the MPEG-4 AAC encoder

3. CASCADED TRELLIS-BASED OPTIMIZATION

The JTB optimization approach can substantially enhance the coding performance at low bit rates [3][4]. However, this approach also results in a very high computational complexity. The coding parameters in the JTB scheme, SF and HCB, are optimized simultaneously by using the trellis search. The states at the ith stage in the trellis for the JTB scheme represent all combinations of SF and HCB for the ith SFB. Unlike the JTB scheme, our cascaded trellis-based (CTB) scheme finds the proper coding parameters, SF and HCB, in two consecutive steps. The search complexity can thus be drastically reduced, while the advantage of trellis-based optimality is mostly retained.

How the trellis search proceeds depends on the optimization criterion adopted. There are two frequently used criteria, the average noise-to-mask ratio (ANMR) and the maximum noise-to-mask ratio (MNMR) [7]. Both criteria will be used in this paper.

3.1. Trellis-Based ANMR Optimization on SF

The constrained optimization problem for the ANMR criterion is formulated as below.

$$\min \sum_i w_i d_i \quad \text{s.t.} \quad \sum_i \left( b_i + D(sf_i - sf_{i-1}) + R(h_{i-1}, h_i) \right) \le B,$$

where $w_i$ is the inverse of the masking threshold and $d_i$ is the quantization distortion. Under this criterion, we minimize the sum of the perceptually weighted distortion. The coding parameters, SF and HCB, for the ith SFB are denoted by $sf_i$ and $h_i$. Symbol D is the differential coding function performed on SF and symbol R is the run-length coding function performed on HCB. The returned function values in both cases are the bits needed to encode the arguments. Parameter $b_i$ is the number of bits for coding the quantized spectral coefficients (QSCs) and the parameter B is the prescribed bit rate for a frame.

As described in [3], the ANMR optimization problem can be reformulated as minimizing the unconstrained cost function, $C_{ANMR}$, with the Lagrangian multiplier $\lambda$.

$$C_{ANMR} = \sum_i \left[ w_i d_i + \lambda \cdot \left( b_i + D(sf_i - sf_{i-1}) + R(h_{i-1}, h_i) \right) \right] \qquad (1)$$

Different from the original JTB scheme, the optimization problem in our CTB scheme is reformulated as minimizing two unconstrained cost functions, $C_{SF\_ANMR}$ and $C_{HCB}$, as follows.

$$C_{SF\_ANMR} = \sum_i \left[ w_i d_i + \lambda \cdot \left( b_i + D(sf_i - sf_{i-1}) \right) \right] \qquad (2)$$

$$C_{HCB} = \sum_i \left[ b_i + R(h_{i-1}, h_i) \right] \qquad (3)$$

The minimization of $C_{SF\_ANMR}$ is described in this sub-section, and the minimization of $C_{HCB}$ will be described in section 3.3.
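To make the relation between the three cost functions concrete, the following sketch evaluates Eqns. (1) to (3) for one fixed choice of parameters. It is only an illustration: the helper name `trellis_costs`, the toy coding-cost functions `D` and `R`, and the reference values `sf0` and `h0` used for the first band are assumptions, not part of the paper.

```python
def trellis_costs(w, d, b, sf, h, lam, D, R, sf0, h0):
    """Evaluate C_ANMR (1), C_SF_ANMR (2) and C_HCB (3) for one parameter choice."""
    c_anmr = c_sf = c_hcb = 0.0
    prev_sf, prev_h = sf0, h0  # assumed references for the first band
    for i in range(len(w)):
        sf_bits = D(sf[i] - prev_sf)  # differential coding of the SF index
        hcb_bits = R(prev_h, h[i])    # run-length coding of the HCB index
        c_anmr += w[i] * d[i] + lam * (b[i] + sf_bits + hcb_bits)
        c_sf += w[i] * d[i] + lam * (b[i] + sf_bits)
        c_hcb += b[i] + hcb_bits
        prev_sf, prev_h = sf[i], h[i]
    return c_anmr, c_sf, c_hcb


# Toy side-information costs (assumed): longer codes for larger SF differences,
# and a fixed penalty whenever the codebook changes.
D = lambda delta: 1 + 2 * abs(delta)
R = lambda h_prev, h_cur: 0 if h_prev == h_cur else 9

c_anmr, c_sf, c_hcb = trellis_costs(
    w=[1.0, 0.5, 2.0], d=[0.2, 0.4, 0.1], b=[10, 8, 12],
    sf=[3, 4, 4], h=[1, 1, 2], lam=0.1, D=D, R=R, sf0=3, h0=1,
)
print(c_anmr, c_sf, c_hcb)
```

For any fixed parameter choice, $C_{ANMR}$ exceeds $C_{SF\_ANMR}$ by exactly $\lambda \sum_i R(h_{i-1}, h_i)$, which is the HCB side-information term that the cascaded scheme defers to the second step.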

Similar to the approach in the JTB scheme, the goal of finding proper SFs that minimize $C_{SF\_ANMR}$ can be achieved by finding the optimal path through the trellis. Each stage in the trellis corresponds to an SFB. (There are $N_{SFB}$ stages in total.) However, different from JTB, each state at the ith stage in our scheme only represents an SF candidate for the ith SFB. In other words, at the ith stage, if a path passes through the mth state, it means that the mth SF candidate is employed to encode the ith SFB. For a given value of $\lambda$, the Viterbi search procedure described in [3] is modified as stated below.

The kth state at the ith stage is denoted by $S_{k,i}$ and the minimum accumulated partial cost ending at $S_{k,i}$ is denoted by $C_{k,i}$. The state-transition cost, $T_{l,i-1 \to k,i}$, from $S_{l,i-1}$ to $S_{k,i}$ is $\lambda \cdot D(sf_{k,i} - sf_{l,i-1})$.

1. Initialize $C_{k,0} = 0, \forall k$, and $i = 1$.

2. Search for, $\forall k$, the best path ending at $S_{k,i}$ by computing

$$C_{k,i} = \min_l \left\{ C_{l,i-1} + w_i d_{k,i} + \lambda \cdot b_{k,i} + T_{l,i-1 \to k,i} \right\} \qquad (4)$$

3. If $i < N_{SFB}$, set $i = i + 1$ and go to step 2.
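The three steps above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `viterbi_sf_anmr`, the array layout (`d[i][k]` and `b[i][k]` are the distortion and bits of band i under SF candidate k), the back-pointer bookkeeping and the toy differential-coding cost `D` are all assumptions made for the example.

```python
def viterbi_sf_anmr(w, d, b, sf, lam, D):
    """Find the SF path minimizing C_SF_ANMR with the recursion of Eqn. (4)."""
    n_state = len(sf)
    cost = [0.0] * n_state  # step 1: C_{k,0} = 0 for all k
    back = []
    for i in range(len(w)):  # steps 2 and 3: one pass per SFB
        new_cost, ptr = [], []
        for k in range(n_state):
            def arrive(l):
                # C_{l,i-1} plus the transition cost lam * D(sf_k - sf_l)
                return cost[l] + lam * D(sf[k] - sf[l])
            best_l = min(range(n_state), key=arrive)
            new_cost.append(arrive(best_l) + w[i] * d[i][k] + lam * b[i][k])
            ptr.append(best_l)
        cost = new_cost
        back.append(ptr)
    # Trace the best path backwards; back[0] points into the initial stage.
    k = min(range(n_state), key=cost.__getitem__)
    total, path = cost[k], [k]
    for ptr in reversed(back[1:]):
        k = ptr[k]
        path.append(k)
    path.reverse()
    return total, path


# Toy example: 3 bands, 2 SF candidates, assumed differential-coding cost D.
D = lambda delta: 1 + 2 * abs(delta)
total, path = viterbi_sf_anmr(
    w=[1.0, 1.0, 1.0],
    d=[[6, 1], [0, 5], [6, 1]],
    b=[[1, 3], [1, 3], [1, 3]],
    sf=[0, 1], lam=1.0, D=D,
)
print(total, path)  # 16.0 [1, 0, 1]
```

In this toy case the path switches to candidate 0 for the middle band because the distortion saving there outweighs the extra differential-coding bits.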

3.2. Trellis-Based MNMR Optimization on SF

The constrained optimization problem for the MNMR criterion is formulated below.

$$\min \max_i \left( w_i d_i \right) \quad \text{s.t.} \quad \sum_i \left( b_i + D(sf_i - sf_{i-1}) + R(h_{i-1}, h_i) \right) \le B,$$

where $\max_i w_i d_i$ is the maximum NMR in a frame. Again, using the unconstrained format, the cost function in the JTB scheme [4] becomes

$$C_{MNMR} = \sum_i \left[ b_i + D(sf_i - sf_{i-1}) + R(h_{i-1}, h_i) \right] \qquad (5)$$

Different from the cost function in the JTB scheme [4], the MNMR optimization problem in our CTB scheme is reformulated as the minimization of two cost functions, $C_{SF\_MNMR}$ and $C_{HCB}$ (Eqn. (3)), under the constraint $w_i d_i \le \lambda, \forall i$, for some constant value of $\lambda$,

$$C_{SF\_MNMR} = \sum_i \left[ b_i + D(sf_i - sf_{i-1}) \right] \qquad (6)$$

Similar to the trellis-based ANMR optimization on selecting SF, a trellis is constructed for minimizing $C_{SF\_MNMR}$ and each state at the ith stage only represents an SF candidate for the ith SFB. The Viterbi search procedure described in [4] is modified as stated below. The state-transition cost, $T_{l,i-1 \to k,i}$, from $S_{l,i-1}$ to $S_{k,i}$ is $D(sf_{k,i} - sf_{l,i-1})$.

1. Initialize $C_{k,0} = 0, \forall k$, and $i = 1$.

2. Find the valid states for the ith stage, $S_{k,i}, \forall k$. A state is valid if the NMR ($w_i d_{k,i}$) corresponding to that state parameter is $\le \lambda$.

3. Search for, $\forall k$, the best path ending at the valid state $S_{k,i}$ by computing

$$C_{k,i} = \min_l \left\{ C_{l,i-1} + b_{k,i} + T_{l,i-1 \to k,i} \right\} \qquad (7)$$

4. If $i < N_{SFB}$, set $i = i + 1$ and go to step 2.
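The MNMR variant of the search can be sketched in the same way. Again this is an illustrative sketch under assumptions: the function name `viterbi_sf_mnmr`, the array layout, the toy cost `D`, the use of an infinite cost to mark invalid states, and the choice to raise an error when a stage has no valid state are not taken from the paper.

```python
import math


def viterbi_sf_mnmr(w, d, b, sf, lam, D):
    """Minimize C_SF_MNMR over states whose NMR w_i * d_{k,i} is <= lam."""
    n_state = len(sf)
    cost = [0.0] * n_state  # step 1: C_{k,0} = 0 for all k
    back = []
    for i in range(len(w)):
        new_cost, ptr = [], []
        for k in range(n_state):
            if w[i] * d[i][k] > lam:  # step 2: state is not valid
                new_cost.append(math.inf)
                ptr.append(-1)
                continue
            def arrive(l):
                # C_{l,i-1} plus the transition cost D(sf_k - sf_l)
                return cost[l] + D(sf[k] - sf[l])
            best_l = min(range(n_state), key=arrive)  # step 3, Eqn. (7)
            new_cost.append(arrive(best_l) + b[i][k])
            ptr.append(best_l)
        if all(math.isinf(c) for c in new_cost):
            raise ValueError(f"no valid SF candidate for band {i}")
        cost = new_cost
        back.append(ptr)
    k = min(range(n_state), key=cost.__getitem__)
    total, path = cost[k], [k]
    for ptr in reversed(back[1:]):
        k = ptr[k]
        path.append(k)
    path.reverse()
    return total, path


# Toy example: 3 bands, 2 SF candidates, assumed differential-coding cost D.
D = lambda delta: 1 + 2 * abs(delta)
data = dict(w=[1.0, 1.0, 1.0], d=[[6, 1], [0, 5], [6, 1]],
            b=[[1, 3], [1, 3], [1, 3]], sf=[0, 1], D=D)
loose = viterbi_sf_mnmr(lam=5.0, **data)
tight = viterbi_sf_mnmr(lam=4.0, **data)
print(loose)  # (12.0, [1, 1, 1])
print(tight)  # (14.0, [1, 0, 1])
```

Tightening the NMR bound from 5 to 4 invalidates candidate 1 in the middle band, so the search is forced onto a path that costs more bits.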

As pointed out in [3][4], in the trellis for selecting the optimal SF (for both ANMR and MNMR), each state is further split into two consecutive states. In the first state, the spectral coefficients are quantized using the assigned valid SF, and in the second state, all quantized values of spectral coefficients are set to zero.

3.3. Trellis-Based Optimization on HCB

The HCB optimization is performed under the condition that the SF for each SFB has already been determined. With a determined SF, the QSCs (quantized spectral coefficients) for each SFB are fixed and thus the $b_i$ term in the cost function $C_{HCB}$ (Eqn. (3)) only depends on HCB. The optimization procedure here is to find the HCBs that minimize the cost function $C_{HCB}$, and this can be achieved again by finding the optimal path through the trellis.
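A trellis search over HCBs for fixed SFs could look like the sketch below. It is a hedged illustration only: the function name `viterbi_hcb`, the table `bits[i][h]` (bits needed to code the QSCs of band i with codebook h), the toy run-length cost `R` with a fixed penalty per codebook change, and the assumption that the first band pays no transition cost are all choices made for the example, not details given in the paper.

```python
def viterbi_hcb(bits, R):
    """Find the HCB path minimizing C_HCB = sum_i [b_i + R(h_{i-1}, h_i)]."""
    n_cb = len(bits[0])
    cost = list(bits[0])  # assumed: no transition cost for the first band
    back = []
    for i in range(1, len(bits)):
        new_cost, ptr = [], []
        for h in range(n_cb):
            def arrive(g):
                # Accumulated cost of codebook g plus the cost of moving g -> h
                return cost[g] + R(g, h)
            best_g = min(range(n_cb), key=arrive)
            new_cost.append(arrive(best_g) + bits[i][h])
            ptr.append(best_g)
        cost = new_cost
        back.append(ptr)
    h = min(range(n_cb), key=cost.__getitem__)
    total, path = cost[h], [h]
    for ptr in reversed(back):
        h = ptr[h]
        path.append(h)
    path.reverse()
    return total, path


# Toy run-length cost (assumed): 9 extra bits whenever the codebook changes.
R = lambda h_prev, h_cur: 0 if h_prev == h_cur else 9
hcb_bits, hcb_path = viterbi_hcb([[10, 12, 20], [15, 9, 20], [10, 13, 20]], R)
print(hcb_bits, hcb_path)  # 34 [1, 1, 1]
```

Picking the cheapest codebook independently in each band would give the path [0, 1, 0] at 47 bits in this example, so accounting for the inter-band term R changes the decision.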




new modules, Long Term Prediction (LTP) [6], Perceptual Noise Substitution (PNS) [7],