
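Research Project Final Report, Funded by the National Science Council, Executive Yuan

Research and Design of an OFDMA-Based Wireless Multimedia Transceiver: Overall Project (I)

OFDM-Based Mobile Wireless Multimedia Transceiver Research and Design

Project No.: NSC 91-2219-E-009-007

Project period: August 1, 2002 to July 31, 2003

Principal investigator: David W. Lin (林大衛), Professor, Department of Electronics Engineering, National Chiao Tung University

Co-principal investigators: W.-T. Chang (張文鐘), Professor, Department of Communication Engineering, National Chiao Tung University; S.-G. Chen (陳紹基) and Hsueh-Ming Hang (杭學鳴), Professors, Department of Electronics Engineering, National Chiao Tung University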


Abstract

This integrated project studies transceiver technologies for wireless mobile communication using orthogonal frequency-division multiple access (OFDMA), based on the IEEE 802.16a standard. The project is planned for three years: four subprojects were approved for the first year (2002/8-2003/7) and five for the second and third years (2003/8-2005/7). This report covers the first year, in which we investigated audio coding, filter design, OFDM receiver and FFT processor architectures, time and frequency synchronization of the transmitted signal, channel estimation, channel decoding, and channel quality prediction with transmission rate control. From the second year on, we will add research on video coding and decoding. In addition to continuing research on the individual transmission technologies, we also plan to use digital signal processors (DSPs) as the main platform to implement the transceiver components in software or hardware and to connect them together.

Keywords: Orthogonal Frequency-Division Multiple Access (OFDMA), Orthogonal Frequency-Division Multiplexing (OFDM), Audio Coding, Video Coding, Synchronization, Channel Estimation, Channel Coding, Channel Quality Estimation, Medium Access Control, Data Flow Control

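Table of Contents

I. Background and Objectives

II. Results and Discussion

A. Summary of Overall Results

B. Streaming Audio over Wireless Networks and Audio/Video Subsystem Integration (Subproject 4)

C. Transmission Signal Processing Techniques for Broadband OFDM Transmission Systems and Transmission Subsystem Integration (Subproject 3)

D. Channel Utilization Techniques for Wireless OFDMA and Overall System Integration (Subproject 1)

E. Rate Control for Real-Time Data Based on Predicted Wireless Channel Conditions (Subproject 2)

III. References

IV. Figures and Tables

V. Self-Evaluation of Project Results

VI. Research Results Suitable for Dissemination (see the subproject reports)

VII. Appendix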

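I. Background and Objectives

Third-generation (3G) mobile communication systems have been moving from research and development into commercial operation, and many expect the fourth generation (4G) to arrive in about ten years. In terms of transmission rate, 4G is expected to be roughly 10 to 100 times faster than 3G. In terms of modulation, orthogonal frequency-division multiplexing (OFDM) and its related techniques appear to be attracting the most attention. This project studies high-speed wireless mobile transmission techniques based on the orthogonal frequency-division multiple access (OFDMA) specifications in the recently established IEEE 802.16a standard [1]. IEEE 802.16a was originally developed for fixed broadband wireless communication. After about half a year of preparation, however, the IEEE 802.16 Working Group established the 802.16e Task Group in the last quarter of last year (2002), hoping that a small extension of the 802.16 family of standards would suffice for mobile transmission and could become a candidate technology for 4G mobile communication systems. The task group currently plans to complete its first draft standard (D1) in September of this year (2003), so this project runs concurrently with that effort. Interestingly, besides 802.16e, a newly formed (December 2002) 802.20 Working Group within IEEE is also studying mobile broadband wireless transmission.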

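The project was originally planned for three years, with two main goals:

1. Based on the IEEE 802.16a standard, study the transceiver system technologies and transmission performance of OFDMA transmission systems when used for wireless mobile communication services.

2. Implement the transceiver system components in software or hardware on digital signal processors (DSPs) or FPGAs, and connect them together.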

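Four subprojects were approved for the first year (2002/8-2003/7) and five for the second and third years (2003/8-2005/7). The results in this report cover only the four first-year subprojects, but the description of the project plan below covers all three years.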

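Figure 1 shows the outline architecture of the target system and the division of work among the subprojects. The two blocks on the left of the figure labeled "subproject x" (Video Encoder/Decoder and Error Resilience) are the fifth subproject approved for the second and third years. Besides cooperating to achieve the main goals above, each subproject may also pursue derivative research on the technical items it is responsible for, in search of further academic and technical innovation and progress. We briefly explain the contents of Figure 1 below.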

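The IEEE 802.16a standard allows both frequency-division duplexing (FDD) and time-division duplexing (TDD). Because future high-speed wireless communication is likely to have asymmetric transmission rates in the two directions, we decided to consider TDD and to study both uplink and downlink transmission (the two directions have different signal structures). We also intend to make the final DSP (or FPGA) implementation programmable, so that the option parameters to be used can be selected flexibly. For the MAC and link layers, our main concerns are efficient use of the channel and good quality of service.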

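As shown in Figure 1, the PHY-layer signal processing functions, from the FEC block (short for FEC encoder) at the top center of the figure to the FED block (short for FEC decoder) at the bottom center, are studied by Subprojects 4, 1, and 3, respectively. A simulated wireless channel (the Simulated Channel block at the right center of Figure 1) will also be built, so that system testing does not require a physical RF channel and becomes more convenient and flexible. Research on the MAC and link layers is carried out by Subproject 2 (see the central part of the figure), which studies how to change the modulation and data bandwidth of individual users according to channel conditions in order to obtain good quality of service. The IEEE 802.16a standard mainly specifies the operation of the MAC and PHY layers. However, since the development of 2.5G mobile communication systems began, multimedia mobile communication has become increasingly important and has become a major consideration in the design of 3G systems. In fact, one main function of so-called broadband wireless access is to provide multimedia audio and video transmission services. In studying the overall signal transmission system, this project therefore also considers audio and video coding suitable for wireless transmission, together with the handling of the associated transmission errors. As shown at the far left of Figure 1, Subprojects 4 and x investigate audio and video coding, respectively, along with the handling of transmission errors. One main research topic of Subproject 2, the control of audio and video data bandwidth, is also related.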

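Figure 1 also shows the plan for system integration, which is divided into three parts, indicated by the dashed frames, carried out by Subprojects 3, 4, and 1, respectively. We plan to use personal computers with DSP/FPGA plug-in boards as the implementation platform. In the integrated system, the transmitter (including the simulated wireless channel) and the receiver are each expected to need several boards, and each end is to be built on one personal computer.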

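Overall, the three-year work plan is roughly as follows:

1. Year 1: Algorithm research for each technical item.

2. Year 2: Continued algorithm research for each technical item; start of DSP/FPGA implementation.

3. Year 3: Completion of DSP/FPGA implementation; system integration; further research on each technical item.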

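In the first year, we wanted all participants to gain a good understanding of the IEEE 802.16a standard. The subprojects also should not confine themselves to the areas of responsibility shown in Figure 1 and remain ignorant of the blocks outside them, since that would harm communication among subprojects and the future system integration work. During the first year, the overall project promoted communication among the subprojects through biweekly meetings. We also did not prevent the subprojects (especially those responsible for subsystem or system integration) from studying techniques related to system blocks assigned to other subprojects. The effect of this approach can be seen in the summaries of subproject results in the next section.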

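II. Results and Discussion

We briefly discuss the results of the project below, first summarizing the overall results and then describing the results of each subproject. The subprojects are presented in the order of transmission signal processing in Figure 1 rather than in the order of the subproject numbers approved by the National Science Council, and the PHY layer is generally discussed before the MAC and link layers, for the sake of clarity.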

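A. Summary of Overall Results

Besides the training of personnel, the main results of the first year fall into three categories:

1. Understanding of the IEEE 802.16a specifications and of the techniques used in the standard. Examples of the former are the frame structure and the time-frequency positions of the pilot carriers; an example of the latter is the characteristics of the channel coding used in the standard. Because the standard was finalized only while the project was under way, we continually tracked its development to understand the differences between each new draft and the previous one. We are also following the development of the IEEE 802.16e and 802.20 standards to see whether related information can be applied to help this project.

2. Algorithm design for mobile wireless multimedia signal transmission under IEEE 802.16a or other OFDM transmission schemes. In the order of transmission signal processing in Figure 1, these include audio compression coding (Subproject 4), filter design (Subproject 3), time synchronization (Subproject 1), frequency synchronization (Subprojects 1 and 2), channel estimation (Subprojects 2 and 3), channel decoding (Subproject 1), and channel quality prediction and transmission rate control (Subproject 2).

3. Hardware architecture design for signal processing under IEEE 802.16a or other OFDM transmission schemes, including the OFDM receiver architecture and the FFT processor architecture (Subproject 3).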

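B. Streaming Audio over Wireless Networks and Audio/Video Subsystem Integration (Subproject 4)

The research topics of this subproject are (1) bit rate control for Advanced Audio Coding (AAC) and (2) Bit-Sliced Arithmetic Coding (BSAC). They are described in turn below.

Regarding AAC bit rate control, multimedia compression standards such as audio compression standards specify only the decoder. Many parameter choices at the encoder are left to the design engineer, and good choices produce better results. The most important class of parameters controls the bit rate so as to achieve better audio quality and a higher compression ratio. We tried two different ideas, described below.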

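1. Efficient bit allocation: The traditional bit allocation method in audio compression examines the noise-to-masking ratio (NMR) of each frequency band and gives bits first to the band with the largest NMR. From the viewpoint of overall efficiency, however, this is not necessarily the most effective choice, as in the following example:

Band    NMR (dB)    NMR-Gain/Bit (dB)
A       3.5         0.5
B       3           1.0

Band A has the larger NMR, but giving it 1 bit lowers its NMR by only 0.5 dB, whereas giving the bit to Band B lowers its NMR by 1 dB. If, perceptually, a 1 dB reduction is better than a 0.5 dB reduction regardless of the band, then choosing Band B is more effective. This is the principle we propose: "give bits to the band with maximum NMR-gain/bit" or "retrieve bits from the band with maximum bits/NMR-loss". Experiments show that this method achieves results similar to exhaustive search at one hundredth of the computation.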

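2. Fast search for bit allocation parameters: Exhaustive search achieves the best result, but its computational cost is far too high for practical use. Earlier researchers observed the dependence between successive coding parameters and proposed a trellis-based search, whose cost is much lower than that of exhaustive search but still too high. We therefore split the algorithm into stages. Although computing the stages separately greatly reduces the amount of computation, it may also greatly reduce the quality. The key is to find a suitable bit coding model (the virtual Huffman codebook model). In addition, we derived from statistical data rules for pruning the number of candidate parameters. Experimental data show that we can reduce the computation to one hundredth of that of the trellis-based search with little difference in compression performance.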
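The first principle above ("give bits to the band with maximum NMR-gain/bit") can be sketched as a greedy loop. This is only an illustration under stated assumptions: the two-band data come from the example table, while the function names and the per-band list of marginal gains (`marginal_gain_db`) are hypothetical and are not taken from the subproject's encoder.

```python
# Toy data from the example table above: NMR and NMR gain per extra bit, in dB.
BANDS = {
    "A": {"nmr_db": 3.5, "gain_db_per_bit": 0.5},
    "B": {"nmr_db": 3.0, "gain_db_per_bit": 1.0},
}


def pick_by_max_nmr(bands):
    """Traditional rule: the next bit goes to the band with the largest NMR."""
    return max(bands, key=lambda name: bands[name]["nmr_db"])


def pick_by_max_gain(bands):
    """Proposed rule: the next bit goes to the band with the largest NMR gain per bit."""
    return max(bands, key=lambda name: bands[name]["gain_db_per_bit"])


def allocate_bits(marginal_gain_db, total_bits):
    """Greedy allocation by marginal gain.

    marginal_gain_db[band][k] is the (hypothetical) NMR reduction in dB obtained
    by giving `band` its (k+1)-th bit.
    """
    bits = {band: 0 for band in marginal_gain_db}
    for _ in range(total_bits):
        # Bands that still have an entry in the toy gain table.
        open_bands = [b for b in bits if bits[b] < len(marginal_gain_db[b])]
        if not open_bands:
            break
        best = max(open_bands, key=lambda b: marginal_gain_db[b][bits[b]])
        bits[best] += 1
    return bits


print(pick_by_max_nmr(BANDS), pick_by_max_gain(BANDS))
print(allocate_bits({"A": [0.5, 0.5], "B": [1.0, 0.3]}, 2))
```

For the table's data the two rules disagree on the first bit, which is exactly the situation the example describes.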

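Regarding BSAC, the MPEG-4 Version 2 audio compression standard provides a new tool for fine-grain bit rate scalability, with a step size of about 1 kbit/s per channel. This capability is very useful for communication systems whose bandwidth varies easily, such as the Internet or mobile communication. We first studied the audio quality of BSAC and its sensitivity to transmission errors, and then proposed two methods that attempt to improve its coding efficiency. After comparing the audio quality of BSAC and AAC, we analyzed the experimental results and proposed possible causes of the performance difference between the two. Because arithmetic coding is very sensitive to transmission errors, we also studied error propagation in BSAC. To improve coding efficiency, we studied the probability models used in the BSAC coding process, and we designed and tested probability models generated from real audio signals. The other way to improve coding performance is to change the number of bits allocated to each scalable layer; the main idea is to allocate more bits to the lower-frequency layers. This method yields a more noticeable performance improvement.

This subproject published two conference papers this year; see Appendices A and B.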

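C. Transmission Signal Processing Techniques for Broadband OFDM Transmission Systems and Transmission Subsystem Integration (Subproject 3)

The research topics of this subproject are (1) OFDM receiver architecture and hardware design, (2) FFT processor design, (3) transmission filter design, and (4) channel estimation.

On OFDM receiver architecture and hardware design, in addition to studying 802.16a, this subproject investigates efficient, unified digital signal processing techniques for broadband OFDM-type transceivers. By this we mean programmable and adjustable designs that make efficient use of a common OFDM core hardware architecture and can be flexibly adapted to different communication system specifications, that is, designs that effectively achieve multi-mode or multi-standard operation. We have proposed a multi-mode, multi-standard OFDM inner receiver architecture. It has low hardware cost and can be flexibly adapted to different OFDM communication systems, such as 802.11a, 802.16a, DAB, and DVB. It has been synthesized and verified using the Verilog hardware description language.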

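On FFT processor design, we propose a variable-length FFT processor architecture that handles 256 to 8192 subcarriers. The architecture has the advantages of small area and high speed, and it has been synthesized and verified in Verilog. The processor includes a novel variable-length data address generator, a novel twiddle factor address generator, and a novel twiddle factor generator.

On transmission filter design, we propose a novel synthesis method for finite impulse response (FIR) filters. The method can produce low-complexity, high signal-to-noise ratio, cascade-form fixed-point FIR filters.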

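On channel estimation, we examined the performance (mean squared error, bit error rate, and computational complexity) of several existing channel estimation methods and their application in the 802.16a environment. The results show that DFT-based channel estimation and two-dimensional Wiener interpolation give better results, at the price of higher computational complexity. We also propose two DCT-based channel estimation methods, which the results show to outperform the DFT-based methods. We further consider channel estimation in fast fading channels and propose two methods that effectively reduce the inter-carrier interference (ICI) caused by fast fading.

This subproject published two conference papers this year; see Appendices C and D.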

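D. Channel Utilization Techniques for Wireless OFDMA and Overall System Integration (Subproject 1)

The research topics of this subproject are (1) synchronization for IEEE 802.16a OFDMA, (2) channel coding and decoding for IEEE 802.16a OFDMA, and (3) optimal multicarrier modulation.

On synchronization for IEEE 802.16a OFDMA, we first studied the signal structure of the standard. From this we learned that, because the downlink and uplink have different signal structures and operating conditions, they need different synchronization designs. For example, a subscriber station must acquire the frame timing and the exact carrier frequency of the downlink signal when it powers on, whereas the uplink signal must be transmitted at the time specified by the base station, with its carrier frequency controlled to within a certain error. As another example, the pilot signal structures and frequency-domain positions differ between downlink and uplink. As yet another, the downlink signal comes from a single base station, whereas the uplink signal may come from more than one subscriber station, possibly with considerable timing differences among them. Both downlink and uplink transmission need time synchronization to detect the signal arrival time. An estimation error reduces the ability of the guard interval (that is, the cyclic prefix) to prevent inter-symbol interference (ISI) caused by multipath delay. In addition, OFDM systems are very sensitive to carrier frequency offset: even a small offset may destroy the orthogonality among subcarriers, so frequency offset synchronization is especially necessary in downlink transmission.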

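Based on theoretical and simulation studies, we divide downlink synchronization into four stages. The first and second stages use the cyclic correlation of the guard intervals to estimate the OFDM symbol start time and the fractional frequency offset. The third stage uses the guard bands and some of the pilot carriers to determine the integer frequency offset. The last stage uses the downlink preamble to determine the frame start time. For uplink synchronization, we tried two methods of time synchronization, which correlate the received signal with the uplink preamble in the time domain and in the frequency domain, respectively, to find the time of maximum correlation.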
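The first two downlink stages rely on the fact that the guard interval repeats the tail of the OFDM symbol. A minimal sketch of that idea follows, assuming a noiseless signal, zero padding around a single symbol, and a textbook cyclic-prefix correlator; the FFT size, prefix length, offset value, and function names are illustrative choices and do not reproduce the subproject's actual design or the 802.16a parameters.

```python
import cmath
import math
import random


def ofdm_symbol(n_fft, cp_len, rng):
    """One time-domain OFDM symbol (random QPSK data) with its cyclic prefix."""
    qpsk = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(n_fft)]
    body = [
        sum(x * cmath.exp(2j * math.pi * k * n / n_fft) for k, x in enumerate(qpsk)) / n_fft
        for n in range(n_fft)
    ]
    return body[-cp_len:] + body  # prefix = copy of the last cp_len samples


def cp_sync(rx, n_fft, cp_len):
    """Return (start index of the cyclic prefix, fractional CFO in subcarrier spacings)."""
    best_d, best_gamma = 0, 0j
    for d in range(len(rx) - n_fft - cp_len + 1):
        # Correlate a cp_len window with the samples n_fft later.
        gamma = sum(rx[d + m].conjugate() * rx[d + m + n_fft] for m in range(cp_len))
        if abs(gamma) > abs(best_gamma):
            best_d, best_gamma = d, gamma
    # A frequency offset eps rotates samples n_fft apart by 2*pi*eps.
    return best_d, cmath.phase(best_gamma) / (2 * math.pi)


rng = random.Random(0)
N_FFT, CP_LEN, START, EPS = 64, 16, 23, 0.2
tx = [0j] * START + ofdm_symbol(N_FFT, CP_LEN, rng) + [0j] * 40
rx = [s * cmath.exp(2j * math.pi * EPS * n / N_FFT) for n, s in enumerate(tx)]
start_hat, eps_hat = cp_sync(rx, N_FFT, CP_LEN)
print(start_hat, round(eps_hat, 6))
```

The phase of the correlation only resolves offsets within half a subcarrier spacing, which is why a separate stage for the integer frequency offset is needed.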

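On channel coding and decoding for IEEE 802.16a OFDMA, the channel coding of IEEE 802.16a uses a concatenated code: the outer code is a shortened and punctured Reed-Solomon code, and the inner code is a tail-biting, punctured convolutional code. The concatenated code is followed by a bit interleaver and an M-ary quadrature amplitude modulation (M-ary QAM) modulator. We designed decoding algorithms for the entire channel coding scheme. We simulated the Reed-Solomon code, the convolutional code, and the concatenated code in additive white Gaussian noise (AWGN) channels and in fading channels, and compared the simulation results with several analytical results, including the coding gain requirement of IEEE 802.16a in the AWGN channel, the coding limit obtained from the Shannon limit, and the gain obtained from the minimum codeword distance. In the AWGN channel, the coding gains of the Reed-Solomon code and the convolutional code nearly reach the theoretical values, but the overall coding gain falls far short of both the theoretical value and the IEEE 802.16a coding gain requirement. This agrees with the documents we could find from the 802.16a Task Group. We predict that at higher signal-to-noise ratios the actual coding gain will be closer to the theoretically calculated coding gain.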

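On optimal multicarrier modulation, multicarrier modulation in practical use today is generally based on the DFT, but the literature discusses theoretically better methods, such as those using the singular value decomposition (SVD). The SVD, however, is computationally very expensive and must be adapted as the channel changes. Nevertheless, with the rapid development of digital circuits, using the SVD may become feasible in the near future. This subproject obtained some simulation and analysis results on the performance of SVD-based multicarrier modulation in multipath fading channels and on its comparison with the DFT-based method.

This subproject has several conference papers under submission.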

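E. Rate Control for Real-Time Data Based on Predicted Wireless Channel Conditions (Subproject 2)

The research topics of this subproject are (1) fading channel prediction, (2) carrier frequency offset estimation, and (3) channel estimation.

Regarding fading channel prediction, wireless channels often exhibit fading. Signals transmitted when the channel is bad suffer errors and must be retransmitted. If much information needs to be retransmitted, the operation of both the source encoder and the channel transmitter is adversely affected. If the channel quality can be predicted, the output bit rate of the source encoder or the channel transmitter can be adjusted flexibly to reduce the need for retransmission. A Markov chain with two states, good and bad, known as the Gilbert channel model, is often used to model fading channels. We study how to construct a Gilbert channel model from a Rayleigh fading channel and use it to predict future channel conditions, and from these the future bit error rate and the amount of data that will need retransmission. This amount can then be subtracted from the original transmission volume (for example, by lowering the output bit rate of the source encoder) to prevent problems such as buffer overflow.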
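As an illustration of how a two-state Gilbert model supports prediction, the sketch below propagates the probability of being in the bad state a few steps ahead and converts it into an expected loss rate. The transition probabilities and per-state loss rates are made-up numbers, and the function names are hypothetical; the report does not give the values or the mapping from the Rayleigh channel that the subproject used.

```python
def predict_bad_prob(p_gb, p_bg, currently_bad, steps):
    """Probability that the channel is in the bad state `steps` transitions ahead.

    p_gb: probability of moving good -> bad in one step.
    p_bg: probability of moving bad -> good in one step.
    """
    p_bad = 1.0 if currently_bad else 0.0
    for _ in range(steps):
        # Stay bad with probability (1 - p_bg), or enter bad from good with p_gb.
        p_bad = p_bad * (1.0 - p_bg) + (1.0 - p_bad) * p_gb
    return p_bad


def expected_loss_rate(p_bad, loss_good, loss_bad):
    """Expected fraction of data needing retransmission, given per-state loss rates."""
    return (1.0 - p_bad) * loss_good + p_bad * loss_bad


# Hypothetical parameters, for illustration only.
p_bad = predict_bad_prob(p_gb=0.1, p_bg=0.3, currently_bad=False, steps=2)
loss = expected_loss_rate(p_bad, loss_good=0.0, loss_bad=0.5)
print(round(p_bad, 6), round(loss, 6))
```

A rate controller could then lower the source encoder's output by roughly the predicted retransmission fraction, which is the adjustment the paragraph above describes.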

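Regarding carrier frequency offset estimation, current OFDM systems do not use all subcarriers to transmit data, in order to avoid interference between frequency bands. We call the subcarriers that carry no data null subcarriers. We exploit the orthogonality between these null subcarriers and the data-carrying subcarriers to derive a method for estimating the carrier frequency offset. That is, with no carrier frequency offset (and ignoring noise), the inner product of each null subcarrier with the received signal is zero. When a carrier frequency offset occurs, the inner product of the received signal with the null subcarriers shifted by the correct carrier frequency offset is again zero. Therefore, from the projection of the received signal onto the shifted null subcarriers, we can determine the carrier frequency offset correctly. When noise is considered, the carrier frequency offset that minimizes the projection is the value we want. Based on this principle we designed a carrier frequency offset estimator that uses parallel and iterative search to reduce the computation; the results show that the method performs very well.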
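The projection principle can be sketched with a plain grid search. This is a simplified stand-in for the parallel and iterative search mentioned above, whose details the report does not give; the 16-point FFT, the choice of null bins, the candidate grid, and all names are assumptions made for the example, and the signal is noiseless.

```python
import cmath
import math
import random

N_FFT = 16
NULL_BINS = (0, 7, 8, 9)  # hypothetical unused subcarriers


def make_symbol(rng):
    """Time-domain OFDM symbol with QPSK on data bins and zeros on the null bins."""
    freq = [
        0j if k in NULL_BINS else complex(rng.choice((-1, 1)), rng.choice((-1, 1)))
        for k in range(N_FFT)
    ]
    return [
        sum(x * cmath.exp(2j * math.pi * k * n / N_FFT) for k, x in enumerate(freq)) / N_FFT
        for n in range(N_FFT)
    ]


def null_energy(rx, eps):
    """Energy of rx projected onto the null subcarriers shifted by `eps`."""
    total = 0.0
    for k in NULL_BINS:
        proj = sum(
            r * cmath.exp(-2j * math.pi * (k + eps) * n / N_FFT) for n, r in enumerate(rx)
        )
        total += abs(proj) ** 2
    return total


def estimate_cfo(rx, candidates):
    """Pick the candidate offset whose shifted null subcarriers see the least energy."""
    return min(candidates, key=lambda eps: null_energy(rx, eps))


rng = random.Random(1)
TRUE_EPS = 0.2  # offset in units of the subcarrier spacing
rx = [s * cmath.exp(2j * math.pi * TRUE_EPS * n / N_FFT) for n, s in enumerate(make_symbol(rng))]
eps_hat = estimate_cfo(rx, [i / 20 for i in range(-9, 10)])
print(eps_hat)
```

The grid is restricted to less than half a subcarrier spacing on either side, since larger offsets would need a separate integer-offset step.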

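Regarding channel estimation, OFDM systems usually transmit pilot carriers, which can be used for channel estimation. Two pilot arrangements, comb type and block type, have been proposed in the literature. Through simulation we verified that the comb type is less affected by Doppler shift (and hence suits fast fading environments) but is more sensitive to frequency-selective fading, while the block type behaves the opposite way. We also used adaptive signal processing techniques to improve estimation performance. We further considered channel estimation for IEEE 802.16a downlink transmission and examined, by computer simulation, the channel estimation performance under different delay spreads and Doppler shifts.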
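For readers unfamiliar with comb-type pilots, the following sketch shows one common baseline: a least-squares estimate at each pilot bin followed by linear interpolation across the data bins. It is not the subproject's estimator (which also used adaptive techniques); the pilot spacing, the noiseless channel that varies linearly in frequency, and the function name are assumptions chosen so that the interpolation is exact.

```python
import random


def ls_comb_estimate(rx, pilot_idx, pilot_val):
    """LS channel estimates at pilot bins, linearly interpolated in between.

    Assumes pilot_idx is sorted and includes the first and last bin.
    """
    h_p = [rx[k] / pilot_val for k in pilot_idx]  # LS estimate: received / known pilot
    h_est = [0j] * len(rx)
    for j in range(len(pilot_idx) - 1):
        k0, k1 = pilot_idx[j], pilot_idx[j + 1]
        for k in range(k0, k1 + 1):
            t = (k - k0) / (k1 - k0)
            h_est[k] = h_p[j] + t * (h_p[j + 1] - h_p[j])
    return h_est


rng = random.Random(2)
N_BINS, PILOT_IDX, PILOT_VAL = 17, [0, 4, 8, 12, 16], 1 + 0j
# Hypothetical channel whose frequency response varies linearly across the bins.
true_h = [complex(1.0 + 0.05 * k, -0.02 * k) for k in range(N_BINS)]
tx = [
    PILOT_VAL if k in PILOT_IDX else complex(rng.choice((-1, 1)), rng.choice((-1, 1)))
    for k in range(N_BINS)
]
rx = [h * x for h, x in zip(true_h, tx)]
h_est = ls_comb_estimate(rx, PILOT_IDX, PILOT_VAL)
max_err = max(abs(a - b) for a, b in zip(h_est, true_h))
print(max_err < 1e-12)
```

With a strongly frequency-selective channel the response between pilots is no longer close to linear, which is one way to see the comb type's sensitivity noted above.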

This subproject published one conference paper this year; see Appendix E.

III. References

[1] IEEE Std 802.16a-2003 (Amendment to IEEE Std 802.16-2001), IEEE Standard for Local and Metropolitan Area Networks – Part 16: Air Interface for Fixed Broadband Wireless Access Systems – Amendment 2: Media Access Control Modifications and Additional Physical Layer Specifications for 2-11 GHz. New York: IEEE, Apr. 1, 2003.

IV. Figures and Tables

Figure 1: Project architecture

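V. Self-Evaluation of Project Results

Conformity of the research with the original plan: The main goals of the first year were achieved, namely understanding the IEEE 802.16a specifications and the techniques used in them, and designing algorithms for mobile wireless multimedia signal transmission under OFDMA.

Achievement of expected goals: Apart from equipment procurement and project coordination, the main accomplishment of the overall project is that, through its coordination work, it fostered the subprojects' results: new findings, theoretical derivations, improved technical capability, computer simulation software, and the training of personnel.

Academic and application value of the results: An important part of the value of the overall project itself is the experience accumulated and the experimental environment built during the project, which can serve follow-up research. The subproject results have high academic value; several academic papers have been published so far, and others are being submitted. As for application value, the algorithms and hardware architectures designed by the subprojects, together with the related simulation results, can serve as references for the development of IEEE 802.16a and related wireless multimedia OFDM transmission systems.

Overall evaluation: The overall project and the subprojects obtained a good number of results of academic and application value and succeeded in training personnel. Self-rating: "Good".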

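VI. Research Results Suitable for Dissemination (see the subproject reports)

VII. Appendix

This appendix contains five conference papers, listed below: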

A. C.-H. Yang and H.-M. Hang, "Efficient bit assignment strategy for perceptual audio coding," in Proc. IEEE Int. Conf. Acoust. Speech Signal Processing, pp. V-405–V-408, April 2003 (4 pages).

B. C.-H. Yang and H.-M. Hang, "Cascaded trellis-based optimization for MPEG-4 Advanced Audio Coding," in Audio Engineering Society 115th Convention, New York, Oct. 2003 (9 pages).

C. Y.-H. Yeh and S.-G. Chen, "Efficient channel estimation based on discrete cosine transform," in Proc. IEEE Int. Conf. Acoust. Speech Signal Processing, pp. IV-676–IV-679, April 2003 (4 pages).

D. C.-K. Chang, C.-P. Hung, and S.-G. Chen, "An efficient memory-based FFT architecture," in Proc. IEEE Int. Symp. Circuits Syst., pp. II-129–II-132, May 2003 (4 pages).

E. W.-T. Chang, "Rate control for real time media based on predictive wireless channel condition," poster presentation at the 2003 Fall Workshop on Information Theory & Communication and NSC Results Presentation (2003 年消息理論及通訊秋季講習會暨國科會成果發表會), Zhaofeng Leisure Farm (兆豐休閒農場), Hualien County, Aug. 26-27, 2003 (4 pages).

Papers A and B above were published by Subproject 4, C and D by Subproject 3, and E by Subproject 2.


___________________________________

Audio Engineering Society

Convention Paper

Presented at the 115th Convention

2003 October 10–13

New York, New York

This convention paper has been reproduced from the author's advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society.

___________________________________

Cascaded Trellis-Based Optimization for MPEG-4 Advanced Audio Coding

Cheng-Han Yang 1, Hsueh-Ming Hang 1

1 Department of Electronics Engineering, National Chiao Tung University, Hsinchu, Taiwan, R.O.C.

hmhang@mail.nctu.edu.tw; Fax: (886)-3-5723283

u8911831@cc.nctu.edu.tw; Fax: (886) -3-5731791

ABSTRACT

A low-complexity, high-performance scheme for choosing MPEG-4 Advanced Audio Coding (AAC) parameters is proposed. One key element in producing good-quality compressed audio, particularly at low rates, is selecting proper coding parameter values. A joint trellis-based optimization approach has been proposed previously. It leads to a near-optimal selection of parameters at the cost of extremely high computational complexity. It is therefore very desirable to achieve a similar coding performance (audio quality) at a much lower complexity. Simulation results indicate that our proposed cascaded trellis-based optimization scheme has a coding performance close to that of the joint trellis-based scheme while requiring only 1/70 of the computation.

1. INTRODUCTION

To meet the demands of various multimedia applications, many highly efficient audio coding schemes have been developed. MPEG-4 Advanced Audio Coding (AAC) is one of the most recent-generation audio coders specified by the ISO/IEC MPEG standards committee [1]. It is a very efficient audio compression algorithm aimed at a wide variety of applications, such as the Internet, wireless, and digital broadcast arenas [2]. One key element in an AAC coder is properly selecting two sets of coding parameters, the scale factor (SF) and the Huffman codebook (HCB), in the rate-distortion (R-D) loop. Because the encoding of these parameters is inter-band dependent, i.e., the coded bits produced for the second band depend on the choice made for the first band, choosing their values so as to optimize the objective quality becomes fairly difficult. As discussed in [3][4], the poor choice of parameters for rate control is one shortcoming of the current MPEG-4 AAC Verification Model (VM), and its compression efficiency is therefore not as expected at low bit rates.


YANG AND HANG

CASCADED TRELLIS-BASED OPTIMIZATION FOR AAC

AES 115TH CONVENTION, NEW YORK, NEW YORK, 2003 OCTOBER 10-13


Some methods, such as using vector quantizers rather than scalar quantizers, have been suggested to reduce the side information [5][6], but they would alter the syntax of the standard. In this paper, we focus on finding the parameters in the existing AAC standard that produce the (nearly) optimal compressed audio quality for a given bit rate.

In [3] and [4], a joint optimization scheme, which takes the inter-band dependence into account, is proposed for choosing the encoding parameters for all the frequency bands. This joint optimization is formulated as a trellis search and is, therefore, called trellis-based optimization. Although the complexity of this joint trellis-based optimization scheme can be reduced by adopting the Viterbi algorithm, its search complexity is still extremely high and is thus not suitable for practical applications.

In this paper, we propose a cascaded trellis-based (CTB) optimization scheme for selecting the proper encoding parameters. Our scheme retains the good audio quality offered by the joint trellis-based (JTB) optimization while its search complexity is drastically decreased.

The organization of this paper is as follows. In section 2, a brief overview of MPEG-4 AAC is provided. The proposed CTB scheme, with several variations for choosing the optimal coding parameters, is described in sections 3, 4 and 5. The algorithm complexity analysis and the simulation results are summarized in section 6.

2. OVERVIEW OF AAC ENCODER

The basic structure of the MPEG-4 AAC encoder is shown in Figure 1. The time domain signals are first converted into the frequency domain (spectral coefficients) by the modified discrete cosine transform (MDCT). To match the human auditory system, these spectral coefficients are grouped into a number of bands, called scale factor bands (SFB). The pre-process modules, which are optional, can help remove the time/frequency domain redundancies of the original signals. The psychoacoustic model calculates the masking threshold of the spectral coefficients, which is the basis for deciding the coding parameters in the R-D loop. The R-D loop, our focus in this paper, determines two critical parameters, SF and HCB, for each SFB so as to optimize the selected criterion under the given bit rate constraint. The SF is related to the step size of the quantizer, which determines the quantization noise-to-masking ratio (NMR) in each band. The quantized coefficients are then entropy-coded by one of the twelve pre-designed HCBs. In addition, the indices of SFs and HCBs are coded using differential and run-length codes, respectively, and are transmitted as side information.

[Figure 1: block diagram with the blocks Psychoacoustic Model, Transform/Filter Bank, Pre-Process Modules, Rate-Distortion Control Process, Scale Factor, Quantizer, Noiseless Coding, and Rate/Distortion Loop.]

Fig. 1. Basic structure of the MPEG-4 AAC encoder

3. CASCADED TRELLIS-BASED OPTIMIZATION

The JTB optimization approach can substantially enhance the coding performance at low bit rates [3][4]. However, this approach also results in a very high computational complexity. The coding parameters in the JTB scheme, SF and HCB, are optimized simultaneously by using the trellis search. The states at the ith stage in the trellis for the JTB scheme represent all combinations of SF and HCB for the ith SFB. Unlike the JTB scheme, our scheme, called the cascaded trellis-based (CTB) scheme, finds the proper coding parameters, SF and HCB, in two consecutive steps. The search complexity can thus be drastically reduced, while the advantage of trellis-based optimality is mostly retained.

The way that the trellis search performs depends on the optimization criterion it adopts. There are two frequently used criteria, the average noise-to-mask ratio (ANMR) and the maximum noise-to-mask ratio (MNMR) [7]. Both criteria will be used in this paper.

3.1. Trellis-Based ANMR Optimization on SF

The constrained optimization problem for the ANMR criterion is formulated as follows:

$$\min \sum_i w_i d_i \quad \text{s.t.} \quad \sum_i \bigl( b_i + D(sf_i - sf_{i-1}) + R(h_{i-1}, h_i) \bigr) \le B,$$

where $w_i$ is the inverse of the masking threshold and $d_i$ is the quantization distortion. Under this criterion, we minimize the sum of the perceptually weighted distortion. The coding parameters, SF and HCB, for the ith SFB are denoted by $sf_i$ and $h_i$. Symbol $D$ is the differential coding function performed on SF, and symbol $R$ is the run-length coding function performed on HCB. The returned function values in both cases are the bits needed to encode the arguments. Parameter $b_i$ is the number of bits for coding the quantized spectral coefficients (QSCs), and parameter $B$ is the prescribed bit rate for a frame.

As described in [3], the ANMR optimization problem can be reformulated as minimizing the unconstrained cost function $C_{ANMR}$ with the Lagrangian multiplier $\lambda$:

$$C_{ANMR} = \sum_i \Bigl[ w_i d_i + \lambda \cdot \bigl( b_i + D(sf_i - sf_{i-1}) + R(h_{i-1}, h_i) \bigr) \Bigr] \qquad (1)$$

Different from the original JTB scheme, the optimization problem in our CTB scheme is reformulated as minimizing two unconstrained cost functions, $C_{SF\_ANMR}$ and $C_{HCB}$, as follows:

$$C_{SF\_ANMR} = \sum_i \Bigl[ w_i d_i + \lambda \cdot \bigl( b_i + D(sf_i - sf_{i-1}) \bigr) \Bigr] \qquad (2)$$

$$C_{HCB} = \sum_i \bigl[ b_i + R(h_{i-1}, h_i) \bigr] \qquad (3)$$

The minimization of $C_{SF\_ANMR}$ is described in this subsection, and the minimization of $C_{HCB}$ is described in section 3.3.

Similar to the approach in the JTB scheme, the proper SFs that minimize $C_{SF\_ANMR}$ can be found by finding the optimal path through a trellis. Each stage in the trellis corresponds to an SFB (there are N_SFB stages in total). However, different from JTB, each state at the ith stage in our scheme represents only an SF candidate for the ith SFB. In other words, if a path passes through the mth state at the ith stage, the mth SF candidate is employed to encode the ith SFB. For a given value of $\lambda$, the Viterbi search procedure described in [3] is modified as stated below.

The kth state at the ith stage is denoted by $S_{k,i}$, and the minimum accumulated partial cost ending at $S_{k,i}$ is denoted by $C_{k,i}$. The state-transition cost $T_{l,i-1 \to k,i}$ from $S_{l,i-1}$ to $S_{k,i}$ is $\lambda \cdot D(sf_{k,i} - sf_{l,i-1})$.

1. Initialize $C_{k,0} = 0, \ \forall k$, and $i = 1$.

2. Search, $\forall k$, for the best path ending at $S_{k,i}$ by computing

$$C_{k,i} = \min_l \bigl\{ C_{l,i-1} + w_i d_{k,i} + \lambda \cdot b_{k,i} + T_{l,i-1 \to k,i} \bigr\} \qquad (4)$$

3. If i < N_SFB, set i = i+1 and go to step 2.
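The recursion in Eqn.(4) can be sketched as a small Viterbi search. This is an illustration only: `node_cost[i][k]` stands for the precomputed term $w_i d_{k,i} + \lambda \cdot b_{k,i}$, and `d_bits` is a hypothetical stand-in for the differential coding function $D$ (the real AAC table is not reproduced here). The exhaustive `path_cost` helper is included just to check the search on a tiny example.

```python
import itertools


def d_bits(delta):
    """Hypothetical stand-in for D(.): bits needed to code a scale-factor difference."""
    return 1 + 2 * abs(delta)


def viterbi_sf(node_cost, sf_values, lam):
    """Minimise sum_i node_cost[i][k_i] + lam * D(sf difference) over SF paths, as in Eqn.(4)."""
    n_states = len(sf_values)

    def trans(l, k):
        return lam * d_bits(sf_values[k] - sf_values[l])

    cost = [0.0] * n_states  # C_{k,0} = 0 for all k
    back = []
    for stage_cost in node_cost:
        new_cost, choice = [], []
        for k in range(n_states):
            l_best = min(range(n_states), key=lambda l: cost[l] + trans(l, k))
            new_cost.append(cost[l_best] + stage_cost[k] + trans(l_best, k))
            choice.append(l_best)
        back.append(choice)
        cost = new_cost
    # Trace back from the cheapest final state.
    k = min(range(n_states), key=lambda s: cost[s])
    best_cost, path = cost[k], []
    for choice in reversed(back):
        path.append(k)
        k = choice[k]
    return best_cost, path[::-1]


def path_cost(path, node_cost, sf_values, lam):
    """Cost of one full path, with a free choice of the virtual stage-0 state."""
    total = min(lam * d_bits(sf_values[path[0]] - s) for s in sf_values)
    for i, k in enumerate(path):
        total += node_cost[i][k]
        if i > 0:
            total += lam * d_bits(sf_values[k] - sf_values[path[i - 1]])
    return total


SF_VALUES = [0, 1, 2]
LAM = 0.5
NODE_COST = [[4.0, 2.5, 3.0], [1.0, 3.5, 6.0], [5.0, 2.0, 1.5]]  # toy values
best_cost, best_path = viterbi_sf(NODE_COST, SF_VALUES, LAM)
brute = min(
    path_cost(p, NODE_COST, SF_VALUES, LAM)
    for p in itertools.product(range(len(SF_VALUES)), repeat=len(NODE_COST))
)
print(best_path, best_cost, brute)
```

The search visits each pair of adjacent-stage states once, so its cost grows with the square of the number of SF candidates per stage rather than exponentially in the number of bands.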

3.2. Trellis-Based MNMR Optimization on SF

The constrained optimization problem for the MNMR criterion is formulated below:

$$\min \max_i (w_i d_i) \quad \text{s.t.} \quad \sum_i \bigl( b_i + D(sf_i - sf_{i-1}) + R(h_{i-1}, h_i) \bigr) \le B,$$

where $\max_i w_i d_i$ is the maximum NMR in a frame. Again, using the unconstrained format, the cost function in the JTB scheme [4] becomes

$$C_{MNMR} = \sum_i \bigl( b_i + D(sf_i - sf_{i-1}) + R(h_{i-1}, h_i) \bigr) \qquad (5)$$

Different from the cost function in the JTB scheme [4], the MNMR optimization problem in our CTB scheme is reformulated as the minimization of two cost functions, $C_{SF\_MNMR}$ and $C_{HCB}$ (Eqn.(3)), under the constraint $w_i d_i \le \lambda, \ \forall i$, for some constant value of $\lambda$:

$$C_{SF\_MNMR} = \sum_i \bigl( b_i + D(sf_i - sf_{i-1}) \bigr) \qquad (6)$$

Similar to the trellis-based ANMR optimization on selecting SF, a trellis is constructed for minimizing $C_{SF\_MNMR}$, and each state at the ith stage represents only an SF candidate for the ith SFB. The Viterbi search procedure described in [4] is modified as stated below. The state-transition cost $T_{l,i-1 \to k,i}$ from $S_{l,i-1}$ to $S_{k,i}$ is $D(sf_{k,i} - sf_{l,i-1})$.

1. Initialize $C_{k,0} = 0, \ \forall k$, and $i = 1$.

2. Find the valid states for the ith stage, $S_{k,i}, \ \forall k$. A state is valid if the NMR ($w_i d_{k,i}$) corresponding to that state parameter is $\le \lambda$.

3. Search, $\forall k$, for the best path ending at the valid state $S_{k,i}$ by computing

$$C_{k,i} = \min_l \bigl\{ C_{l,i-1} + b_{k,i} + T_{l,i-1 \to k,i} \bigr\} \qquad (7)$$

4. If i < N_SFB, set i = i+1 and go to step 2.

As pointed out in [3][4], in the trellis for selecting the optimal SF (for both ANMR and MNMR), each state is further split into two consecutive states. In the first state, the spectral coefficients are quantized using the assigned valid SF, and in the second state, all quantized values of spectral coefficients are set to zero.

3.3. Trellis-Based Optimization on HCB

The HCB optimization is performed under the condition that the SF for each SFB has already been determined. With a determined SF, the QSCs (quantized spectral coefficients) for each SFB are fixed, and thus the $b_i$ term in the cost function $C_{HCB}$ (Eqn.(3)) depends only on the HCB. The optimization procedure here is to find the HCBs that minimize the cost function $C_{HCB}$, which can again be achieved by finding the optimal path through the trellis.


A trellis is thus constructed for minimizing $C_{HCB}$. Each stage in this trellis corresponds to an SFB (there are N_SFB stages in total), and each state at the ith stage represents an HCB candidate for the ith SFB. In other words, if a path passes through the mth state at the ith stage, the mth HCB candidate is employed for encoding the ith SFB. The state-transition cost $T_{n,i-1 \to m,i}$ from $S_{n,i-1}$ to $S_{m,i}$ is $R(h_{n,i-1}, h_{m,i})$. The Viterbi search procedure for finding the optimal HCBs is as follows.

1. Initialize $C_{m,0} = 0, \ \forall m$. Initialize i = 1.

2. Search, $\forall m$, for the best path ending at $S_{m,i}$ by computing

$$C_{m,i} = \min_n \bigl\{ C_{n,i-1} + b_{m,i} + T_{n,i-1 \to m,i} \bigr\} \qquad (8)$$

3. If i < N_SFB, set i = i+1 and go to step 2.

3.4. Cascaded Trellis-Based Optimization

The block diagram of the CTB optimization scheme is shown in Figure 2, and the processing steps are described below.

1. Initialize $\lambda$.

2. For a given $\lambda$, a set of optimal SFs, $sf_{opt}$, is determined by the trellis-based SF optimization procedure using the Virtual-HCB Mode.

3. For the $sf_{opt}$ obtained from step 2, a set of optimal HCBs, $hcb_{opt}$, is determined by the trellis-based HCB optimization procedure.

4. For the $hcb_{opt}$ obtained from step 3 and $\lambda$, a set of recalculated optimal SFs, $sf'_{opt}$, is obtained from the Fixed-HCB Mode trellis-based SF optimization procedure.

5. Adjust the rate. For the optimal $sf'_{opt}$ (or $sf_{opt}$) and $hcb_{opt}$, the total coding bit rate is calculated and compared to the prescribed bit rate (B). Adjust $\lambda$ and go to step 2 if the constraint is not met.

In the preceding procedure, the trellis-based ANMR (MNMR) optimization on SF is applied in steps 2 and 4.

As described in the preceding procedure, the trellis-based optimization procedure on SF operates in two different modes, the Virtual-HCB Mode and the Fixed-HCB Mode. In these two modes, the value of $b_{k,i}$ in Eqn.(4) and Eqn.(7) is determined in different ways, which are described in section 4.

The preceding procedure is called the full optimization mode (or Two-Loop mode), because the optimization procedure on SF is done twice. The second optimization procedure on SF (step 4) can help correct some improper SF values determined in step 2. Furthermore, the CTB optimization can also be operated in a simpler optimization mode (the so-called One-Loop mode), in which step 4 is not included.

[Figure 2: block diagram. Initialize $\lambda$; Virtual-HCB mode TB SF optimization (output $sf_{opt}$); TB HCB optimization (output $hcb_{opt}$); Fixed-HCB mode TB SF optimization (output $sf'_{opt}$, $hcb_{opt}$); count total bits, compare, and adjust $\lambda$.]

Fig. 2. Cascaded trellis-based optimization scheme

4. FIXED AND VIRTUAL HCB MODE FOR SF OPTIMIZATION

For an identified $C_{k,i}$ in Eqn.(4) and Eqn.(7) in sections 3.1 and 3.2, $w_i d_{k,i}$ or $D(sf_{k,i} - sf_{l,i-1})$ is unique for a given state or state transition. However, the value of $b_{k,i}$ depends not only on the state parameter $sf_{k,i}$ but also on the choice of HCB. In the JTB optimization scheme, for each candidate value of SF, all possible b values, corresponding to the 12 pre-designed HCBs, are evaluated. In our SF optimization scheme, however, we have to determine one proper value of $b_{k,i}$ for the state $S_{k,i}$. In our implementation, the trellis-based ANMR or MNMR SF optimization scheme can operate in two modes, the Fixed-HCB mode and the Virtual-HCB mode, for determining the value of $b_{k,i}$.

In the Fixed-HCB mode, a set of fixed HCBs, $h^f = [h^f_1 \ h^f_2 \ \ldots \ h^f_{N\_SFB}]$, is determined beforehand. For all the states at the ith stage, the QSCs, $q_{k,i}$, are encoded using $h^f_i$; thus, $b_{k,i} = h^f_i(q_{k,i}), \ \forall k$.

In the Virtual-HCB mode, a Virtual HCB, $h^v_{k,i}$, is employed for state $S_{k,i}$. Thus, $h^v_{k,i}$ needs to be pre-constructed to help us determine $b_{k,i}$, and it can be constructed in several ways. For example, $h^v_{k,i}$ may be one of the 12 pre-designed HCBs or a compound codebook. Consequently, the more accurately we can estimate $b_{k,i}$ and $h^v_{k,i}$, the higher the accuracy of the SF optimization we can achieve. In order to improve the accuracy of the estimated values of $b_{k,i}$ and $h^v_{k,i}$, we did some analysis on the JTB optimization scheme.

For a given value of $\lambda$, by applying the JTB scheme, we can find a set of optimal parameters, $sf^{JTB}_{opt}$, $hcb^{JTB}_{opt}$, and $b^{JTB}_{opt}$, that minimize the cost function $C_{ANMR}$ (Eqn.(1)) or $C_{MNMR}$ (Eqn.(5)). For comparison purposes, we also construct an ideal set of bits for coding QSCs, $b^{JTB}_{min}$. For the ith SFB, $b^{JTB}_{min,i}$ is the minimum number of bits for coding $q^{JTB}_{opt,i}$ using the 12 pre-designed HCBs and is formulated as

$$b^{JTB}_{min,i} = \min_m \bigl\{ h_m(q^{JTB}_{opt,i}) \bigr\},$$

where $q^{JTB}_{opt,i}$ denotes the QSCs quantized by $sf^{JTB}_{opt,i}$. The histogram of the differences between $b^{JTB}_{opt}$ and $b^{JTB}_{min}$, denoted by $b^{JTB}_{opt-min}$, is shown in Figure 3. We find that over 91% of $b^{JTB}_{opt-min}$ is less than 3 for both the ANMR and MNMR criteria. In other words, the JTB scheme tends to choose the HCB that results in nearly the minimum QSC bits.

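[Figure 3: histogram (%) of $b^{JTB}_{opt-min}$ over the bins 0, 1, 2, 3, and ≥4 for the MNMR and ANMR criteria. Bar labels as extracted, one series after the other: 69.42, 12.03, 9.97, 3.74, 4.84 and 67.99, 13.67, 9.35, 3.80, 5.19.]

Fig. 3. Comparison against $b^{JTB}_{opt-min}$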

Observing this characteristic of $b^{JTB}_{opt}$, we derive a rule for determining $h^v_{k,i}$ and $b_{k,i}$. For state $S_{k,i}$, the candidate $h^v_{k,i}$ values are the set of HCBs that satisfy the proposed rule in Eqn.(9); namely,

$$h^v_{k,i} = \Bigl\{ h : h(q_{k,i}) \le \min_m \bigl\{ h_m(q_{k,i}) \bigr\} + \delta \Bigr\} \qquad (9)$$

Then, $b_{k,i}$ is formulated as

$$b_{k,i} = \frac{1}{|h^v_{k,i}|} \sum_{h_n \in h^v_{k,i}} h_n(q_{k,i}) + \alpha \cdot R_v(h^v_{l,i-1}, h^v_{k,i}) \qquad (10)$$

$R_v$ is the run-length coding function performed on the Virtual HCB, and it is similar to $R$ in our implementation:

$$R_v(h^v_{l,i-1}, h^v_{k,i}) = \begin{cases} 0, & \text{if } h^v_{l,i-1} \cap h^v_{k,i} \ne \emptyset \\ 9, & \text{else} \end{cases} \qquad (11)$$

where $\alpha$ is a weight for including $R_v(h^v_{l,i-1}, h^v_{k,i})$ in $b_{k,i}$. Finally, we have to determine suitable values for $\delta$ and $\alpha$. The simulation results of the normalized difference, ($C^{JTB}_{ANMR} - C^{CTB}_{ANMR}$) or ($C^{JTB}_{MNMR} - C^{CTB}_{MNMR}$), versus different values of $\delta$ and $\alpha$ are shown in Figures 4 and 5.

[Figure 4: curves for $\delta$ = 0, 1, 2, 3 plotted against $\alpha$ from 0.0 to 1.0.]

Fig. 4. ($C^{JTB}_{MNMR} - C^{CTB}_{MNMR}$) vs. ($\delta$, $\alpha$)

[Figure 5: curves for $\delta$ = 0, 1, 2, 3 plotted against $\alpha$ from 0.0 to 1.0.]

Fig. 5. ($C^{JTB}_{ANMR} - C^{CTB}_{ANMR}$) vs. ($\delta$, $\alpha$)

In this notation, $C^{JTB}_{ANMR}$ (or $C^{JTB}_{MNMR}$) is the minimal $C_{ANMR}$ (or $C_{MNMR}$) derived from the JTB scheme, and $C^{CTB}_{ANMR}$ (or $C^{CTB}_{MNMR}$) is the minimal $C_{ANMR}$ (or $C_{MNMR}$) derived from the CTB scheme. We find that for a wide range of $\delta$ values, we can achieve better performance when $R_v(h^v_{l,i-1}, h^v_{k,i})$ is included in $b_{k,i}$ ($\alpha > 0$). As shown in Figure 4, the CTB scheme achieves nearly the best performance when $\delta = 1$ and $\alpha = 0.5$. Therefore, we choose 1 for $\delta$ and 0.5 for $\alpha$ in our implementation.

5. FAST SEARCHING ALGORITHM

The computational complexity of the trellis-based optimization scheme depends on the searching range (number of states) of each stage in the trellis. Hence, reducing the candidate states at each stage is an effective way of reducing the complexity. Based on this idea, we propose fast searching algorithms for the trellis-based optimization schemes on SF and HCB.

5.1. Fast Searching Algorithm for HCB Optimization

In MPEG-4 AAC, SFs are differentially coded and HCBs are coded by run-length coding. Run-length coding can be viewed as a special case of differential coding; therefore, the procedure of trellis-based optimization on HCB is similar to that on SF. However, the output of run-length coding has only two possible values, either 0 or 9. In looking for Ck,i in Eqn.(8), the cost of run-length coding is as follows.

$$R(h_{n,i-1}, h_{m,i}) = \begin{cases} 0, & \text{if } n = m \\ 9, & \text{else} \end{cases} \qquad (12)$$

In HCB optimization, each state at the ith stage represents an HCB candidate. As shown in Figure 6(a), to find the optimal path ending at $S_{m,i}$, all the HCB candidates at the (i-1)th stage have to be taken into consideration. In MPEG-4 AAC, there are 12 pre-designed HCBs, so the searching complexity for finding all the optimal paths ending at the ith stage is 12×12.

The number next to each arrow in Figure 6 is the state-transition cost. Except for the path $S_{m,i-1} \to S_{m,i}$, the state-transition costs of the other paths ending at $S_{m,i}$ are all the same (equal to 9). Therefore, in calculating $C_{k,i}$ in Eqn.(8), among these 11 paths, the path with the smallest $C_{n,i-1}$ will result in the smallest $C_{m,i}$. Based on this property, a fast searching algorithm is proposed and is divided into two steps.

1. Among the 12 candidate states at the (i-1)th stage, the state with the minimum cost, $C_{min,i-1}$, is chosen and treated as the virtual thirteenth state, $S_{min,i-1}$:

$$C_{min,i-1} = \min_{n} \{ C_{n,i-1} \}$$

2. As shown in Figure 6(b), while finding the optimal path ending at $S_{m,i}$, we only have to consider two paths, path ($S_{m,i-1} \to S_{m,i}$) and path ($S_{min,i-1} \to S_{m,i}$). The rest of the searching procedure is the same as that in section 3.3.

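The searching complexity (in terms of branch metric calculations) of this fast algorithm is approximately 12 + 2×12. The first “12” term is the computational complexity of determining $S_{min,i-1}$. Note that the performance (accuracy) of the fast searching algorithm is the same as that of the full searching algorithm.

[Figure 6: (a) full search, with transitions from $S_{p,i-1}$, $S_{m,i-1}$, and $S_{o,i-1}$ into $S_{m,i}$; (b) fast search, with transitions from $S_{m,i-1}$ and $S_{min,i-1}$ into $S_{m,i}$. The same-state transition costs 0 and every other transition costs 9.]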

Fig. 6. Trellis representation of HCB optimization
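The equivalence of the full and fast HCB searches can be checked with a minimal Python sketch. It assumes that the accumulated cost of a path is the previous accumulated cost plus the run-length transition cost of Eqn.(12) plus a per-state local cost; the function names and the `local_cost` values are hypothetical and only stand in for the branch metric of Eqn.(8), which is not reproduced here.

```python
def full_search_step(prev_cost, local_cost, switch_cost=9):
    """One stage transition, examining every predecessor of every state."""
    n_states = len(prev_cost)
    new_cost = []
    for m in range(n_states):
        # Transition cost is 0 when the HCB stays the same, 9 otherwise (Eqn. 12).
        best = min(prev_cost[k] + (0 if k == m else switch_cost)
                   for k in range(n_states))
        new_cost.append(best + local_cost[m])
    return new_cost


def fast_search_step(prev_cost, local_cost, switch_cost=9):
    """One stage transition using only the same-state path and the
    minimum-cost predecessor (the 'virtual thirteenth state')."""
    c_min = min(prev_cost)  # step 1: one pass over the previous stage
    # Step 2: two candidate paths per state.
    return [min(prev_cost[m], c_min + switch_cost) + local_cost[m]
            for m in range(len(prev_cost))]


# Hypothetical accumulated costs and local costs for the 12 HCB candidates.
prev = [5, 20, 3, 17, 9, 30, 12, 8, 25, 14, 6, 11]
local = [4, 1, 7, 2, 9, 3, 8, 5, 6, 10, 12, 0]
print(full_search_step(prev, local) == fast_search_step(prev, local))
```

Both functions return the same accumulated costs because every predecessor other than the same-state one pays the same switching cost, so only the cheapest of them can win.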

5.2. Fast Searching Algorithm for SF Optimization

In SF optimization, each state in the trellis represents a candidate value of SF. Searching over a larger set of SF candidates can result in better performance, but at the cost of higher searching complexity.

In general, the state numbers (sn) for all the stages in the trellis are the same, and the searching complexity for each stage transition in this uniform sn trellis is sn × sn. In this section, we propose two non-uniform (adaptive) sn algorithms, in which the sn for each stage in the trellis can vary to reduce the searching complexity. The first one is called “global minimum reference SF restricted non-uniform trellis”, or “Gm_Nu” for short, and the second one is called “local minimum reference SF restricted non-uniform trellis”, or “Lm_Nu”. In both cases, a reference SF is first identified and then the number of candidates is reduced against this reference.

In the first step, we define the reference SF, $sf^{ref}_{i}$, for the ith SFB as the largest SF among all the valid states at the ith stage. Then we can find a global minimum reference SF, $sf^{ref}_{G\_Min}$, which is the minimum SF among all the reference SFs. In the Gm_Nu algorithm, we restrict the SF candidates at the ith stage to the range [$sf^{ref}_{i}$, $sf^{ref}_{G\_Min} - \varepsilon$]. Thus, the sn at the ith stage, $sn_{Gm,i}$, equals $(sf^{ref}_{i} - sf^{ref}_{G\_Min} + 1 + \varepsilon)$.

Next, we define the nth-order local minimum reference SF at the ith stage, $sf^{ref}_{L\_Min,i}$, where

$$sf^{ref}_{L\_Min,i} = \min_{i-n \le j \le i+n} \{ sf^{ref}_{j} \} \qquad (13)$$

In the Lm_Nu algorithm, we restrict the SF candidates at the ith stage to the range [$sf^{ref}_{i}$, $sf^{ref}_{L\_Min,i} - \varepsilon$]. Therefore, the sn for the ith stage, $sn_{Lm,i}$, equals $(sf^{ref}_{i} - sf^{ref}_{L\_Min,i} + 1 + \varepsilon)$. In both cases, ε is a parameter to control the searching range for all stages. In the simulations in section 6, the values of n in Eqn.(13) and ε are both set to 1.
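The per-stage state counts of the two non-uniform trellises can be illustrated with a short Python sketch. It assumes integer reference SFs and clips the local window of Eqn.(13) at the first and last SFB; the clipping rule, the function names, and the example values are assumptions made for illustration and are not taken from the paper.

```python
def gm_nu_state_counts(sf_ref, eps=1):
    """State count per stage when candidates span
    [global minimum reference - eps, sf_ref[i]]."""
    g_min = min(sf_ref)
    return [r - g_min + 1 + eps for r in sf_ref]


def lm_nu_state_counts(sf_ref, n=1, eps=1):
    """State count per stage when candidates span
    [local minimum reference - eps, sf_ref[i]], with an nth-order window."""
    counts = []
    for i, r in enumerate(sf_ref):
        # Window i-n .. i+n, clipped at the band edges (an assumption).
        window = sf_ref[max(0, i - n): i + n + 1]
        counts.append(r - min(window) + 1 + eps)
    return counts


# Hypothetical reference SFs for five scale factor bands.
sf_ref = [20, 18, 25, 30, 22]
print(gm_nu_state_counts(sf_ref))  # [4, 2, 9, 14, 6]
print(lm_nu_state_counts(sf_ref))  # [4, 2, 9, 10, 2]
```

Because a local minimum is never smaller than the global minimum, the Lm_Nu count at each stage is never larger than the Gm_Nu count, which is why Lm_Nu gives the smaller trellis.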

6. SIMULATION RESULTS

In this section, we discuss the computational complexity and the coded audio quality in our experiments. Three types of bit allocation algorithms have been tested and compared, as described below.

(1) The MPEG-4 VM of AAC (VM-TLS).

(2) The joint trellis-based ANMR and MNMR optimization schemes, abbreviated as JTB-ANMR and JTB-MNMR respectively, described in [3] and [4].

(3) The cascaded trellis-based ANMR and MNMR optimization schemes, abbreviated as CTB-ANMR and CTB-MNMR respectively, described in section 3.

Ten two-channel audio sequences with a sampling rate of 44.1 kHz are tested. Two of them are extracted from MPEG SQAM [8], and the others are from EBU [9].

6.1. Complexity Analysis

The computational complexity analyses of the aforementioned coding schemes are summarized in Table 1. The value in the “Computation” column is the searching complexity of one stage transition in the trellis, in terms of branch metric computations. For convenience of comparison, the full-search JTB is set as the reference (ratio = 1) and all the other schemes are rated relative to it. Also shown in Table 1 is the storage requirement, which is likewise measured in units of one branch metric.

We can find from Table 1 that the n-Loop CTB scheme is approximately (142/n) times faster than the JTB scheme. Moreover, the storage requirement for the trellis search in the CTB scheme is much smaller than that in the JTB scheme.

For the JTB scheme, the fast HCB optimization algorithm can reduce the complexity to 1/4. Note that $sn^{ave}_{Gm}$ and $sn^{ave}_{Lm}$ in Table 1 are the average sn in the Gm_Nu and Lm_Nu algorithms and are calculated by Eqn.(14) and Eqn.(15).

$$sn^{ave}_{Gm} = \left( \frac{1}{NB\_SFB} \sum_{i=1}^{NB\_SFB} sn_{Gm,i-1} \, sn_{Gm,i} \right)^{1/2} \qquad (14)$$

$$sn^{ave}_{Lm} = \left( \frac{1}{NB\_SFB} \sum_{i=1}^{NB\_SFB} sn_{Lm,i-1} \, sn_{Lm,i} \right)^{1/2} \qquad (15)$$

The simulation results show that a typical $sn^{ave}_{Gm}$ is approximately 12 and $sn^{ave}_{Lm}$ is about 5. Hence, the Gm_Nu algorithm can reduce the complexity to 1/25 and the Lm_Nu algorithm can reduce the complexity to 1/144.

Table 1. Complexity Analysis

| Scheme | Computation | Ratio | Storage |
|---|---|---|---|
| JTB | $(60 \times 2)^2 \times 12^2$ | 1 | 60×2×12 |
| n-Loop CTB | $n \times (60 \times 2)^2 + 12^2$ | n/142 | 60×2 |
| JTB + Fast HCB | $(60 \times 2)^2 \times 36$ | 1/4 | 60×2×12 |
| JTB + Fast HCB + Gm_Nu | $(sn^{ave}_{Gm} \times 2)^2 \times 36$ | 1/100 | 60×2×12 |
| JTB + Fast HCB + Lm_Nu | $(sn^{ave}_{Lm} \times 2)^2 \times 36$ | 1/576 | 60×2×12 |
| n-Loop CTB + Gm_Nu + Fast HCB | $n \times (sn^{ave}_{Gm} \times 2)^2 + 36$ | n/3600 | 60×2 |
| n-Loop CTB + Lm_Nu + Fast HCB | $n \times (sn^{ave}_{Lm} \times 2)^2 + 36$ | (n+0.4)/20736 | 60×2 |
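The ratios in Table 1 can be reproduced with a few lines of Python. The sketch below assumes the computation entries are read as SF cost times HCB cost for JTB and as n times the SF cost plus the HCB cost for n-Loop CTB, with average state counts of 12 (Gm_Nu) and 5 (Lm_Nu); the factor 2 is taken from the table as is, and the function names are hypothetical.

```python
HCB_FULL = 12 * 12       # full HCB search: 12 x 12 branch metrics
HCB_FAST = 12 + 2 * 12   # fast HCB search: 36 branch metrics


def sf_cost(sn):
    """SF search cost of one stage transition with sn states."""
    return (sn * 2) ** 2


def jtb(sn=60, hcb=HCB_FULL):
    """Joint trellis: SF and HCB searched together."""
    return sf_cost(sn) * hcb


def ctb(n, sn=60, hcb=HCB_FULL):
    """Cascaded trellis with n SF loops followed by the HCB search."""
    return n * sf_cost(sn) + hcb


reference = jtb()
print(reference)                       # 2073600
print(reference / ctb(1))              # roughly 142
print(reference / jtb(hcb=HCB_FAST))   # 4.0
print(reference / jtb(12, HCB_FAST))   # 100.0
print(reference / jtb(5, HCB_FAST))    # 576.0
print(reference / ctb(1, 12, HCB_FAST))
print(reference / ctb(1, 5, HCB_FAST))
```

The last two values come out somewhat below 3600 and 20736 because the additive HCB term is not negligible once the SF search has been reduced, which is consistent with the (n+0.4) numerator in the last row of the table.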

6.2. Objective Quality Analysis

The rate-distortion curves of these bit allocation schemes are shown in Figs. 7 and 8. Two distortion measures, ANMR and MNMR, are used. We can find that the performance of the CTB scheme is similar to that of the JTB scheme. The ANMR performance loss is less than 0.2 dB for One-Loop CTB-ANMR and less than 0.1 dB for Two-Loop CTB-ANMR (the lowest three curves in Fig. 7). The MNMR performance loss is less than 0.1 dB for both One- and Two-Loop CTB-MNMR (the lowest three curves in Fig. 8). Both of them are much better than the MPEG-4 VM (the top line).


The performance differences between the fast searching algorithms and the original CTB-MNMR scheme are shown in Figs. 9 and 10. In light of the complexity analyses of Gm_Nu and Lm_Nu, the uniform NB_SF fast algorithms with NB_SF = 12 and 5 are chosen for comparison, since these match the average state numbers of the two non-uniform algorithms. There is nearly no performance loss for the Gm_Nu algorithm (ANMR or MNMR difference ≈ 0). The advantage of the non-uniform algorithms over the uniform algorithms at about the same complexity is clearly shown in Figs. 9 and 10.

Fig. 7. ANMR Rate-Distortion Analysis (ANMR in dB versus total bit rate in kbps, 16 to 80 kbps; curves: VM-TLS, One-Loop CTB-MNMR, Two-Loop CTB-MNMR, JTB-MNMR, One-Loop CTB-ANMR, Two-Loop CTB-ANMR, JTB-ANMR)

Fig. 8. MNMR Rate-Distortion Analysis (MNMR in dB versus total bit rate in kbps, 16 to 80 kbps; curves: One-Loop CTB-MNMR, Two-Loop CTB-MNMR, JTB-MNMR, VM-TLS, One-Loop CTB-ANMR, Two-Loop CTB-ANMR, JTB-ANMR)

6.3. Subjective Quality Analysis

Listening tests by human listeners are the traditional method of subjectively evaluating audio quality and are also the most widely recognized subjective quality test. However, such subjective tests are expensive, time consuming, and difficult to reproduce. Informal listening tests on the aforementioned schemes show that it is hard to differentiate between the JTB and the various CTB schemes. In addition, a “simulated” subjective test, Objective Difference Grade (ODG), has been conducted. ODG is a measure of quality designed to be comparable to the Subjective Difference Grade (SDG). It is calculated based on the difference between the quality ratings of the reference and test (coded) signals. The ODG has a range of [-4, 0], in which -4 stands for a very annoying difference and 0 stands for an imperceptible difference between the reference and the test signals [10][11]. The ODG results for the various search schemes discussed in this paper are shown in Fig. 11, where the reference signal is the original audio sequence. According to the collected test data (Fig. 11), the difference between the JTB and CTB schemes is quite small. The ODG results, relative to the CTB-MNMR scheme, for the various fast searching algorithms are shown in Fig. 12. Again, the performance of the non-uniform NB_SF algorithms is better than that of the uniform NB_SF algorithms at about the same computational complexity.

Fig. 9. ANMR Difference Analysis (ANMR difference in dB versus total bit rate in kbps, 16 to 80 kbps; curves: Uniform (sn=5), Lm_Nu, Uniform (sn=12), Gm_Nu)

Fig. 10. MNMR Difference Analysis (MNMR difference in dB versus total bit rate in kbps, 16 to 80 kbps; curves: Uniform (sn=5), Lm_Nu, Uniform (sn=12), Gm_Nu)


Fig. 11. ODG for VM-TLS, JTB and CTB (ODG versus total bit rate in kbps, 16 to 80 kbps; curves: VM-TLS, One-Loop CTB-MNMR, Two-Loop CTB-MNMR, JTB-MNMR, One-Loop CTB-ANMR, Two-Loop CTB-ANMR, JTB-ANMR)

Fig. 12. ODG for Various Fast Searching Algorithms (ODG versus total bit rate in kbps, 16 to 80 kbps; curves: Uniform (sn=5), Lm_Nu, Uniform (sn=12), Gm_Nu)

7. CONCLUSIONS

In this paper, we propose a CTB optimization scheme for the MPEG-4 AAC coder, in which the optimization procedures for finding the coding parameters, SF and HCB, are separated into two consecutive steps. Based on the complexity analysis, the proposed CTB scheme is approximately 71 to 142 times faster than the JTB scheme. Moreover, the simulation results show that both the objective and the subjective quality of the CTB scheme are close to those of the JTB scheme. In addition, we propose a lossless fast searching algorithm for trellis-based HCB optimization, which is about 4 times faster than the full search. Furthermore, two non-uniform searching algorithms, Gm_Nu and Lm_Nu, are proposed for trellis-based SF optimization. The simulation results show that the non-uniform searching algorithms achieve better performance than uniform searching algorithms at about the same complexity.

8. ACKNOWLEDGMENT

This work was supported by the National Science Council, Taiwan, R.O.C., under Grant NSC-91-2219-E-009-011.

9. REFERENCES

[1] ISO/IEC JTC1/SC29, “Information technology – very low bitrate audio-visual coding,” ISO/IEC IS-14496 (Part 3, Audio), 1998.

[2] M. Bosi, et al., “ISO/IEC MPEG-2 advanced audio coding,” Journal of Audio Engineering Society, vol. 45, pp. 789-814, October 1997.

[3] A. Aggarwal, et al., “Trellis-based optimization of MPEG-4 advanced audio coding,” Proc. IEEE Workshop on Speech Coding, pp. 142-144, 2000.

[4] A. Aggarwal, et al., “Near-optimal selection of encoding parameters for audio coding,” Proc. of ICASSP, vol. 5, pp. 3269-3272, June 2001.

[5] T. V. Sreenivas and M. Dietz, “Vector quantization of scale factors in advanced audio coder (AAC),” Proc. of ICASSP, vol. 4, pp. 3641-3644, May 1998.

[6] H. Najafzadeh and P. Kabal, “Improving perceptual coding of narrowband audio signals at low rates,” Proc. of ICASSP, vol. 2, pp. 913-916, March 1999.

[7] H. Najafzadeh and P. Kabal, “Perceptual bit allocation for low rate coding of narrowband audio,” Proc. of ICASSP, vol. 2, pp. 893-896, 2000.

[8] “The MPEG audio web page,” http://www.tnt.uni-hannover.de/project/mpeg/audio.

[9] European Broadcasting Union, Sound Quality Assessment Material: Recordings for Subjective Tests, Brussels, Belgium, Apr. 1988.

[10] Draft ITU-R Recommendation BS.1387: “Method for objective measurements of perceived audio quality,” July 2001.

[11] A. Lerchs, “EAQUAL software,” Version 0.1.3 alpha, http://www.mp3-tech.org/


Rate control for real time media based on predictive wireless channel condition

Project number: NSC 91-2219-E-009-009

Project period: August 1, 2002 to July 31, 2003

Principal investigator: 張文鐘, Department of Communication Engineering, National Chiao Tung University

E-mail:wtchang@cc.nctu.edu.tw

I. Abstract

Keywords: channel model, ARQ, channel prediction

A two-state Markov chain is used to model the fading channel. In this model, the good state indicates correct transmission and the bad state indicates a bit error. The transition probabilities of the two-state model are derived as functions of the channel conditions, such as the Doppler frequency, the average SNR, and the symbol length. The purpose of representing the current channel condition with this state model is that the future channel condition can be predicted from the transition matrix. Based on this model, the future bit error rate can be predicted and the amount of ARQ retransmission can be determined in advance. This amount is then subtracted from the pre-allocated target bits for the real-time media to pre-compensate for future retransmission and to keep the buffer from filling up because of it.
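The prediction step described in the abstract can be sketched in Python as follows. The sketch takes the two transition probabilities as given inputs, because their derivation from the Doppler frequency, SNR, and symbol length is not reproduced here, and it treats the predicted fraction of time spent in the bad state as the fraction of the target bits to hold back for retransmission. That compensation rule, the function names, and the numeric values are assumptions made for illustration only.

```python
def step(dist, p_gb, p_bg):
    """Advance the (good, bad) state distribution by one symbol.
    p_gb: probability of good -> bad, p_bg: probability of bad -> good."""
    good, bad = dist
    return (good * (1 - p_gb) + bad * p_bg,
            good * p_gb + bad * (1 - p_bg))


def predicted_bad_fraction(start_good, p_gb, p_bg, horizon):
    """Average probability of being in the bad state over the next
    `horizon` symbols, given the current state."""
    dist = (1.0, 0.0) if start_good else (0.0, 1.0)
    total_bad = 0.0
    for _ in range(horizon):
        dist = step(dist, p_gb, p_bg)
        total_bad += dist[1]
    return total_bad / horizon


def compensated_target_bits(target_bits, bad_fraction):
    """Reduce the pre-allocated target by the share expected to be resent."""
    return target_bits * (1.0 - bad_fraction)


# Hypothetical transition probabilities and frame budget.
bad = predicted_bad_fraction(start_good=True, p_gb=0.1, p_bg=0.4, horizon=1000)
print(bad)
print(compensated_target_bits(8000, bad))
```

For a long horizon the predicted fraction approaches the stationary value p_gb / (p_gb + p_bg), while for a short horizon it depends strongly on the current state, which is what makes the prediction useful for per-frame rate control.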

II. Background and Purpose

Most transmission errors are due to channel fading. Current wireless systems employ an ARQ protocol to deal with erroneous packets. This kind of error control increases the transmission burden and poses a problem for real-time media data. The direct consequence is that the effective buffer output rate decreases because of the retransmissions. To prevent overflow, one method is to adapt the source coding rate according to the buffer condition. The buffer fullness reflects the channel condition prior to the current encoding moment. With such a strategy, the current coding rate is a function of the amount of ARQ that was issued before. However, if the channel condition can be predicted in advance, so that the transmission error rate during the source coding period can be estimated, the source coding rate can be adapted accordingly so that ARQ does not increase the buffer fullness, which prevents future buffer overflow and frame skipping. To achieve this goal, we propose a strategy that further reduces the bit rate allocated to each frame to be coded according to the channel prediction. This reduction pre-compensates for the amount of possible future ARQ retransmission.

In the original model, the buffer update rule is as follows:

$$W = \max(W_{prev} + B_{prev} - R/F,\ 0)$$

If W > M, where M is the threshold, the next frame would be skipped. Otherwise, the frame
