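Comparison with Other Processors

Chapter 5 Verification Results and Chip Implementation

5.3 Performance Comparison

5.3.3 Comparison with Other Processors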

Table 5-14 compares the 256-point FFT performance and clock speed of this design with those of other processors.

Table 5-14: 256-point FFT performance comparison [33]

Device               Speed (MHz)
ADSP-218X            75
DSP16410             170
TMS320C54X           160
320VC549             100
ARM7TDMI/Piccolo     not listed
DSP1620              120
This work            125

[Figure: 256-point FFT benchmark, bar chart of execution time (0 to 200 us) and execution cycles (0 to 20K cycles) for each device. The bar values are not recoverable from the source.]

A comparison with the TI C54X specifications is given in Table 5-15.

Table 5-15: Specification comparison with the TI C54X

Item                         TI C54X                  Proposed design
S-P Com MAC                  8 cycles                 1 cycle
S-P Real MAC                 1 cycle                  1/2 cycle
R2 Butterfly                 8 cycles                 9 cycles *
S-P Com Bit-Reverse          3 cycles                 2 cycles
Convolution                  1 cycle                  1/2 cycle
Hardware Accelerators        Image/Video Extension    MSVQ Paths
External RAM Type Support    Async SDRAM              SRAM

*: includes scale shift and load/store memory

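Compared with the processor reported in [2], the proposed processor with the intelligent DMA comes close in performance at a much lower cost.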

Table 5-16: Performance and cost comparison

Item                                  50-tap FIR,      50-tap complex FIR,    1K complex FFT    Gate count
                                      100 samples      100 samples
This work                             11200 cycles     24000 cycles           53427 cycles      136K (including VQ IP)
A 32-b RISC/DSP microprocessor [2]    12200 cycles     22000 cycles           45000 cycles      210K
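The figures in Table 5-16 can be put on a common footing with a little arithmetic. The sketch below is illustrative only. It computes cycle and gate-count ratios relative to [2] and converts a cycle count into execution time, assuming that the 125 MHz clock listed for this work in Table 5-14 also applies to these benchmarks. The helper names `ratio` and `exec_time_us` are introduced here and do not come from the thesis.

```python
# Cycle counts and gate counts copied from Table 5-16.
# Each tuple is (this work, processor in [2]).
CYCLES = {
    "50-tap FIR, 100 samples": (11200, 12200),
    "50-tap complex FIR, 100 samples": (24000, 22000),
    "1K complex FFT": (53427, 45000),
}
GATES = (136_000, 210_000)  # this work includes the VQ IP


def ratio(this_work, reference):
    """Return this work's figure relative to the reference (1.0 means equal)."""
    return this_work / reference


def exec_time_us(cycles, clock_mhz):
    """Convert a cycle count to microseconds for a clock given in MHz."""
    return cycles / clock_mhz


if __name__ == "__main__":
    for name, (ours, ref) in CYCLES.items():
        print(f"{name}: {ratio(ours, ref):.2f}x the cycles of [2]")
    print(f"gate count: {ratio(*GATES):.2f}x that of [2]")
    # Assumption: the 125 MHz clock from Table 5-14 applies to this benchmark.
    print(f"1K complex FFT at 125 MHz: {exec_time_us(53427, 125):.1f} us")
```

Read this way, the design needs somewhat fewer cycles than [2] on the real FIR and somewhat more on the complex FIR and the FFT, while using roughly two thirds of the gates.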

Chapter 6 Conclusion

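Integrating a general-purpose 32-bit RISC processor with a parallel 16-bit/32-bit DSP yields a single-core processor solution that balances cost and performance. This thesis proposes a speech-oriented intelligent DMA that, paired with the processor, forms a parallel RISC/DSP architecture. The attached speech vector quantization accelerator is designed for IP reuse and substantially improves processor performance on specific computations. The intelligent DMA has a transfer mode and a computation mode. For transfers, it supports multiple transfer combinations from peripherals to memory, along with several addressing modes that improve transfer efficiency. For computation, it has a built-in ALU whose datapaths handle operations such as complex multiplication and barrel shifting. Its instruction-set mode conforms to the RISC architecture and does not disturb the processor pipeline.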

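The distinguishing feature of the intelligent DMA is that it combines transfer and computation. It arranges data and operations efficiently during a calculation and so raises the processing efficiency of the processor. The two cooperate in three ways: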

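(1) The addressing modes used by the intelligent DMA for data transfer work together with the multiply-accumulator in the ALU to perform multiply-accumulate operations over fixed-length loops.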

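(2) The bit-reversed transfer mode of the intelligent DMA arranges the coefficients before a fast Fourier transform (FFT); the butterfly operations are then carried out on the datapath of the ALU inside the intelligent DMA.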

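(3) During codebook search, the intelligent DMA's ability to communicate with external memory and peripherals is used to transfer the codebook for the vector quantization accelerator, which computes the weighted mean squared error.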
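The three cooperation modes above can be sketched in software. This is a behavioural illustration only, not the hardware datapath of the thesis. All function names and parameters (`mac_loop`, `bit_reverse_copy`, `fft_radix2`, `search_codebook`, `start`, `step`) are hypothetical, and a radix-2 decimation-in-time FFT is assumed for mode (2). The codebook search minimises the weighted sum of squared errors, which differs from the weighted mean only by a constant factor.

```python
import cmath


def mac_loop(samples, coeffs, start, step=1):
    """Mode (1): fixed-length multiply-accumulate. The address generator walks
    samples[start], samples[start + step], ... while the MAC accumulates."""
    acc = 0
    addr = start
    for c in coeffs:
        acc += c * samples[addr]
        addr += step
    return acc


def bit_reverse_copy(data):
    """Mode (2a): copy data into bit-reversed order (length is a power of two)."""
    n = len(data)
    bits = n.bit_length() - 1
    out = [0] * n
    for i, x in enumerate(data):
        rev = int(format(i, f"0{bits}b")[::-1], 2) if bits else 0
        out[rev] = x
    return out


def fft_radix2(data):
    """Mode (2b): radix-2 butterflies applied to bit-reversed input."""
    a = bit_reverse_copy([complex(x) for x in data])
    n = len(a)
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)
        for base in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                top = a[base + k]
                bot = a[base + k + half] * w  # one complex multiplication
                a[base + k] = top + bot       # butterfly outputs
                a[base + k + half] = top - bot
                w *= w_step
        size *= 2
    return a


def search_codebook(target, weights, codebook):
    """Mode (3): return (index, error) of the codeword with the smallest
    weighted squared error against the target vector."""
    best_index, best_err = -1, float("inf")
    for index, codeword in enumerate(codebook):
        err = sum(w * (t - c) ** 2 for w, t, c in zip(weights, target, codeword))
        if err < best_err:
            best_index, best_err = index, err
    return best_index, best_err
```

In the architecture described here, the address stepping in `mac_loop`, the reordering in `bit_reverse_copy` and the codeword fetches in `search_codebook` correspond to work the intelligent DMA takes over from the processor.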

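The proposed RISC/DSP processor is particularly well suited to speech processing. Speech processing often requires a spectral transform of the signal for analysis, and linear prediction models are commonly used to synthesize speech signals. The processor handles both tasks well, for example through FFT and FIR implementations, and can be applied to speech compression, speech recognition, and similar applications.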
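The link between linear prediction and FIR filtering can be made concrete with a short sketch. It assumes the common convention in which the predictor estimates s[n] as the sum of a[k] * s[n-k] for k = 1..p, so that analysis is an FIR filter and synthesis is the matching all-pole filter. The function names and the sign convention are assumptions made for illustration, not taken from the thesis.

```python
def lpc_residual(signal, a):
    """FIR analysis filter: e[n] = s[n] - sum_{k=1..p} a[k-1] * s[n-k]."""
    residual = []
    for n, s in enumerate(signal):
        pred = sum(coeff * signal[n - k]
                   for k, coeff in enumerate(a, start=1) if n - k >= 0)
        residual.append(s - pred)
    return residual


def lpc_synthesize(excitation, a):
    """All-pole synthesis filter: s[n] = e[n] + sum_{k=1..p} a[k-1] * s[n-k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, coeff in enumerate(a, start=1):
            if n - k >= 0:
                acc += coeff * out[n - k]  # one multiply-accumulate per tap
        out.append(acc)
    return out
```

Feeding the residual from `lpc_residual` back through `lpc_synthesize` with the same coefficients reconstructs the original signal. Both loops consist of the fixed-length multiply-accumulate pattern that the intelligent DMA is designed to accelerate.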

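To handle other speech applications or image processing, the intelligent DMA can attach hardware accelerator IP tailored to a particular algorithm. A high-speed bus such as AHB could therefore be added in the future, so that the accelerator IPs and external memory all communicate with the processor core over the bus, with the intelligent DMA managing the priority order. As for the intelligent DMA's memory handling, a unit for dynamic random-access memory (DRAM) could be added to the intelligent DMA in the future to meet the cost-oriented goal. Compared with static random-access memory (SRAM), DRAM requires a mechanism for alternately accessing the high and low parts of the address. Although it is slower than SRAM, it offers high density and low cost.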
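One common reading of the high and low address handling mentioned above is row/column address multiplexing, in which a DRAM controller presents the upper address bits as a row address and the lower bits as a column address. The sketch below illustrates only that interpretation. The function names and the `col_bits` parameter are hypothetical, and the thesis does not specify a DRAM geometry.

```python
def split_dram_address(addr, col_bits):
    """Split a flat address into (row, column) for a multiplexed address bus.
    The high bits form the row address, the low col_bits form the column."""
    row = addr >> col_bits
    col = addr & ((1 << col_bits) - 1)
    return row, col


def same_row(addr_a, addr_b, col_bits):
    """True when two accesses fall in the same row, so the second access
    would not need a new row address."""
    return (addr_a >> col_bits) == (addr_b >> col_bits)
```

For example, with an assumed 8-bit column address, `split_dram_address(0x1234, 8)` gives row `0x12` and column `0x34`.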

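The design in this thesis is integrated with a general-purpose RISC processor and applied to the MELP speech compression algorithm, where it addresses the three most computation-intensive parts of MELP and reduces the computation by more than 70%. The design will be taped out through the National Chip Implementation Center (CIC). The resulting chip can serve as a low-cost single-core RISC/DSP processor, or it can be integrated as an IP into a larger system to form an SoC.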

References

[1] Vijay K. Madisetti, “VLSI Digital Signal Processors: An Introduction to Rapid Prototyping and Design Synthesis,” IEEE Press, 1995.

[2] Michael Dolle, Satwinder Jhand, Walter Lehner, Otto Müller and Manfred Schlett, “A 32-b RISC/DSP microprocessor with reduced complexity,” IEEE Journal of Solid-State Circuits, vol. 32, no. 7, 1997.

[3] Dave Comiskey, Sanjive Agarwala, and Charles Fuoco, “A scalable high-performance DMA architecture for DSP application,” Proceedings of the IEEE International Conference on Computer Design (ICCD), pp. 414-417, 2000.

[4] Luca Breveglieri and Luigi Dadda, “A VLSI inner product macrocell,” IEEE Trans. on VLSI, vol. 6, no. 2, pp. 292-298, 1998.

[5] Vassilios A. Chouliaras and Jose Nunez, “Scalar Coprocessors for Accelerating the G723.1 and G729A Speech Coders,” IEEE Transactions on Consumer Electronics, vol. 49, no. 3, 2003.

[6] V. A. Chouliaras, J. L. Nunez, K. Koutsomyti, D. J. Mulvaney, S. Datta and S. R. Parr, “Development of custom vector accelerator for high-performance speech coding,” Electronics Letters, vol. 40, no. 24, 2004.

[7] Han Qi, Zheng Jiang and Jia Wei, “IP reusable design methodology,” Proceedings of the 4th International Conference on ASIC, pp. 756-759, 2001.

[8] 戴顯權 (ed.), 資料壓縮 [Data Compression], 紳藍出版社, 2002.

[9] L. Supplee, R. Cohn, J. Collura, and A. McCree, “MELP: The New Federal Standard at 2400 bps,” IEEE ICASSP-97 Conference, pp. 1591-1594, 1997.

[10] Tian Wang, Kazuhito Koishida, Vladimir Cuperman, Allen Gersho and John S. Collura, “A 1200/2400 bps coding suite based on MELP,” Proceedings of the IEEE Workshop on Speech Coding, pp. 90-92, 2002.

[11] Grant Davidson and Allen Gersho, “Application of a VLSI Vector Quantization Processor to Real-Time Speech Coding,” IEEE Journal on Selected Areas in Communications, vol. SAC-4, no. 1, 1986.

[12] Chin-Liang Wang and Ker-Min Chen, “A New VLSI Architecture for Full-Search Vector Quantization,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 6, no. 4, 1996.

[13] Chen-Yi Lee, Shih-Chou Juan and Yen-Juan Chao, “Finite State Vector Quantization with Multipath Tree Search Strategy for Image Video Coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 6, no. 3, 1996.

[14] Massimiliano Bracco, Sandro Ridella and Rodolfo Zunino, “Digital Implementation of Hierarchical Vector Quantization,” IEEE Transactions on Neural Networks, vol. 14, no. 5, 2003.

[15] F. Lahouti and A. K. Khandani, “Reconstruction of Multi-Stage Vector Quantized Sources Over Noisy Channels - Applications to MELP Codec,” Technical Report, Department of Electrical & Computer Engineering, University of Waterloo, Waterloo, Ontario, Canada, 2004.

[16] Alan V. Oppenheim, Ronald W. Schafer, and John R. Buck, Discrete Time Signal Processing, 2nd Edition, Prentice Hall, 1999.

[17] ARM Ltd, AMBA Specification, rev. 2.0, http://www.arm.com, 1999.

[18] B. H. Juang and A. H. Gray, “Multiple stage vector quantization for speech coding,” IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '82), vol. 1, pp. 597-600, 1982.

[19] ... speech coding,” IEEE Transactions on Speech and Audio Processing, vol. 1, no. 4, pp. 373-385, 1993.

[20] John L. Hennessy and David A. Patterson, Computer Architecture, 3rd Edition, Morgan Kaufmann, 2003.

[21] John L. Hennessy and David A. Patterson, Computer Organization & Design: The Hardware/Software Interface, 2nd Edition, Morgan Kaufmann Publishers, 1998.

[22] Bill S.-H. Kwan, Bruce F. Cockburn, and Duncan G. Elliott, “Implementation of DSP-RAM: architecture for parallel digital signal processing in memory,” Proceedings of IEEE Canadian Conference on Electrical and Computer Engineering, pp. 341-346, 2001.

[23] Corinna G. Lee and Mark G. Stoodley, “Simple vector microprocessors for multimedia applications,” Proceedings of 31st Annual ACM/IEEE International Symposium, pp. 25-36, 1998.

[24] Asawaree Kalavade and Edward A. Lee, “A hardware-software codesign methodology for DSP application,” IEEE Design & Test of Computers, vol. 10, pp. 16-28, 1993.

[25] Steve Furber, ARM System-on-Chip Architecture, 2nd Edition, Addison-Wesley Professional, 2000.

[26] Hans-Joachim Stolberg, Mladen Berekovic, Lars Friebe, Sören Moch, Mark Bernd Kulaczewski and Peter Pirsch, “HiBRID-SoC: A Multi-Core System-on-Chip Architecture for Multimedia Signal Processing Applications,” Proceedings of International Conference on Very Large Scale Integration of System-on-Chip, pp. 155-160, 2003.

[27] Jae Sung Lee and Myung H. Sunwoo, “Design of new DSP instructions and their hardware architecture for high-speed FFT,” Kluwer Academic Publishers, pp. 247-254, 2003.

[28] M. Z. Markovic, “Speech compression - recent advances and standardization,” Proceedings of the 5th International Conference on Telecommunications in Modern Satellite, Cable and Broadcasting Service (TELSIKS 2001), vol. 1, pp. 235-244, 2001.

[29] Donald E. Thomas and Philip Moorby, The Verilog Hardware Description Language, Kluwer Academic Publishers, 1994.

[30] Pran Kurup, et al., Logic Synthesis Using Synopsys, 2nd Edition, Kluwer Academic Publishers, 1997.

[31] Nian Shyang Chang, Cell-Based IC Physical Design and Verification, Chip Implementation Center, July, 2004.

[32] GlobalUnichip, UAPC-5110 DMA Controller Lite, http://www.globalunichip.com.tw, 2002.

[33] W. Smith and J. Smith, Handbook of Real-Time Fast Fourier Transforms, New York: IEEE Press, 1995.

Excerpted from: 智慧型DMA 應用於成本導向之語音處理器晶片設計 (Intelligent DMA Applied to a Cost-Oriented Speech Processor Chip Design), pp. 85-93
