Sequential DVBT DSP assembly code generation

Chapter 6. DVBT DSP C Compiler Implementation

6.2 Sequential DVBT DSP assembly code generation

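This section describes how the C statements supported by the DVBT DSP compiler are translated into DVBT DSP assembly code. In the DVBT DSP architecture, every memory-access instruction must go through the RA registers: RA0, RA1, and RA2 are used by LOAD, and RA3 and RA4 are used by STORE. During sequential code generation it is therefore enough to give RA0 and RA3 an initial memory address in order to issue basic LOAD and STORE instructions. A variable declared in C thus corresponds to an offset relative to the address held in an RA register, not to an address. Take the FOR loop as an example: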

void main(void){

Figure 19. Execution result for the FOR loop

void main(void){
    int a;
    a=0;
    while(a<10){
        a++;
    }
}

mov RA3,0x0000
mov RA0,0x0000
mov R0,0
store R0,#0
b test
WLoop0:
load R0,#0
add R0,R0,1
store R0,#0
test:
load R0,#0
cmplt R1,R0,10
bnz R1,WLoop0

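Figure 20. DVBT DSP program example for the while structure

For comparison instructions, the DVBT DSP can compare a register with a register, or a register with an immediate value, and writes the result to a register. The branch instruction then uses the value of that register to decide whether to jump. The whole while structure is built from two labels, WLoop and test.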

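The if structure is built from three labels: if_true, else, and endif. If the condition evaluates to true, execution jumps to if_true; if it evaluates to false, execution jumps to else. For an incomplete if structure, that is, an if without an else, the compiler does not generate the else label, and the instruction b else becomes b endif. Figures 6-9 and 6-10 show the execution results for the if structure.

Figure 23. Execution result for the if else structure
Figure 24. Execution result for the if structure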

void main(void){
    int a;
    a=0;
    do{
        a++;
    }while(a<10);
}

mov RA3,0x0000
mov RA0,0x0000
mov R0,0
store R0,#0
DLoop0:
load R0,#0
add R0,R0,1
store R0,#0
load R0,#0
cmplt R1,R0,10
bnz R1,DLoop0

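Figure 25. DVBT DSP program example for the do while structure

The do while structure is much the same as the while structure, except that the condition is evaluated at the end of the structure, so the instructions inside a do while are executed at least once. Figure 6-12 shows the execution result for the do while structure.

Figure 26. Execution result for the do while structure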

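Array declarations are handled by the DVBT DSP compiler in the same way as variable declarations: once an array is declared, the compiler assigns offsets to its elements in order. Suppose the declared variables are int a,b,c[4],d. The offsets are then allocated as follows: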

Offset   Variable
0        a
1        b
2        c[0]
3        c[1]
4        c[2]
5        c[3]
6        d

Table 16. Offset allocation

Execution result

void main(){
    int a,b,c[4],d;
    a=d;
    b=c[1];
}

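Figure 27. Execution result for variable declarations

As Figure x-x shows, after c[4] is declared, the statement a=d is compiled to load R0,#6 followed by store R0,#0, which confirms that the compiler assigned offset 6 to d.

Initial value assignment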

void main(){
    int a=1,b,c[]={9,8,7,6};
    b=c[0];
}

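Figure 28. Execution result for initial value assignment

As Figure 6-14 shows, when a variable is given an initial value in its declaration, the program first stores those values into memory one by one at startup, and only then begins executing the other instructions.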

Running sort

main() {
    int A[]= {11, 2, 3, 6, 4, 20, 13, 8, 25, 7};
    int i, j, tmp;
    for(i=0; i<10; i++){
        for(j=0; j<9; j++){
            if(A[j]<A[j+1]){
                tmp=A[j];
                A[j]=A[j+1];
                A[j+1]=tmp;
            }
        }
    }
}

Figure 29. A bubble sort C program

Finally, as an overall verification of the compiler, we fed a bubble sort C program into it. The execution result is as follows:

Figure 30. Bubble sort execution result

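Results and Discussion

The first year of the project focused on designing the DVBT_DSP microarchitecture and defining its instruction set. At this point the DVBT_DSP hardware has been verified with simulation software and on an FPGA board, and its performance has been evaluated with several algorithms commonly used in digital signal processing. These evaluations give a more precise estimate of the clock and cost constraints of the implementation, feed back into improvements to the original design, and guide the system parameter tuning and structural revisions needed to overcome implementation limits and guarantee the results. Because the DVBT_DSP application analysis environment is still being built, the project is currently about 90 percent complete.

In the coming year the main goal is to verify the DVBT_DSP processing core and integrate it into a 400 MHz IC. The circuit size of each unit will be estimated precisely before entering the second round of floorplanning, in which the position and size of each unit on the chip are planned in more detail. Difficulties found during floorplanning will be fed back into the original design. Once the initial circuit integration is complete, integration tests with the other subprojects will be run in the overall project's integration and verification environment, using real applications, operating systems, and so on, to verify the compatibility of the design. If a compatibility problem appears, the original design will be revised accordingly.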

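For high-level language support of the DSP core, C programs can already be translated into DVBT_DSP object code. The second-year goal is an optimizing high-level language compiler for the DSP core: a C-compatible high-level language support environment, including a compiler and a debugging environment, with optimized instruction scheduling to improve DVBT_DSP execution performance. The following two items are the planned optimization approaches.

(1) For data-flow support: structural analysis of vectorized computing and data windowing computing.

(2) For instruction-level parallelism support: structural analysis of zero-overhead looping, software pipelining, and multi-threaded instruction scheduling.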


Publications Related to the Project

[1] Te-Shin Yang and Jih-Ching Chiu, "Improving ILP with the Vectorized Computing Mechanism in VLIW DSP Architectures," submitted to IEE Proc. Computers & Digital Techniques, Nov. 2003.

[2] Chih-Kang Wu and Jih-Ching Chiu, "Design of Buffering Mechanism for Improving Instruction and Data Stream," submitted to IEE Proc. Computers & Digital Techniques, Oct. 2003.

[3] Te-Shin Yang and Jih-Ching Chiu, "Vectorized Code Scheduling Method for the FFT Algorithm in VLIW Architecture," The Ninth Workshop on Compiler Techniques for High-Performance Computing, pp. 11-14, Mar. 2003.

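Project Self-Evaluation

The digital signal processor core plays a decisive role in implementing algorithm designs in system-on-chip design for digital wireless communication. Current high-performance DSP cores, such as TI's TMS320C6x series, Motorola's DSP563xx, Lucent Technologies' DSP16xx, and Analog Devices' ADSP-2116x and TigerSHARC processors, offer a range of high-performance technical references. Applying them directly to the development of OFDM-related technology, however, would not be an economical way to meet the computing demands of high data bandwidth. Building on this laboratory's experience in microprocessor design and implementation (participation in the national P7 advanced microprocessor design project, and completion for SiS of an ARM-instruction-compatible design with five pipeline stages ...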

Source document: DVB-T Digital TV Receiver Innovative IP Design and SoC Implementation, Subproject V: Design and Implementation of High Performance DSP Core in DVB-T System (I), pp. 38-61.