Chapter 6. DVBT DSP C Compiler Implementation
6.2 Sequential DVBT DSP assembly code generation
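This section describes how the C statements supported by the DVBT DSP Compiler are translated into DVBT DSP assembly code. In the DVBT DSP architecture, every memory-access instruction must go through the RA registers: RA0, RA1, and RA2 serve LOAD, while RA3 and RA4 serve STORE. When sequential DVBT DSP instructions are generated, it is therefore enough to give RA0 and RA3 an initial memory address in order to execute the basic LOAD and STORE instructions. A variable declared in C thus corresponds to an offset relative to the address held in an RA register, not to an address. Take the FOR loop as an example: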
void main(void){
Figure 19. FOR loop execution result
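The offset addressing described above can be modelled on a host machine. The following is a minimal sketch, not the DVBT DSP implementation: the `Machine` type, the function names, the 16-word memory, and the 32-bit word size are all assumptions made for illustration. It only shows that `load Rn,#off` and `store Rn,#off` resolve to the base held in RA0 or RA3 plus the variable's offset.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 16   /* size of the toy data memory (assumption) */

/* Toy machine state: data memory plus the address registers RA0..RA4. */
typedef struct {
    int32_t  mem[MEM_WORDS];
    uint16_t ra[5];
} Machine;

/* load Rn,#off : reads through RA0, one of the LOAD registers. */
static int32_t dsp_load(const Machine *m, unsigned off) {
    assert(m->ra[0] + off < MEM_WORDS);
    return m->mem[m->ra[0] + off];
}

/* store Rn,#off : writes through RA3, one of the STORE registers. */
static void dsp_store(Machine *m, unsigned off, int32_t value) {
    assert(m->ra[3] + off < MEM_WORDS);
    m->mem[m->ra[3] + off] = value;
}
```

With RA0 and RA3 both set to the same base, a value stored at offset 2 is read back by a load from offset 2, whatever the base happens to be.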
void main(void){
    int a;
    a=0;
    while(a<10){
        a++;
    }
}
mov RA3,0x0000
mov RA0,0x0000
mov R0,0
store R0,#0
b test
WLoop0:
load R0,#0
add R0,R0,1
store R0,#0
test:
load R0,#0
cmplt R1,R0,10
bnz R1,WLoop0
Figure 20. DVBT DSP program example for the while structure
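For comparison, the DVBT DSP can compare a register with a register or a register with an immediate value, and the result is written to a register. The branch instruction then uses the value in that register to decide whether to jump. The whole while structure is built from two labels, WLoop and test.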
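The label layout of the while structure can be sketched as a small code emitter. This is an illustration only: `emit` and `emit_while` are hypothetical helpers, not functions of the DVBT DSP Compiler, and the sketch assumes the condition code leaves its result in R1, as the code in Figure 20 does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Appends text to the NUL-terminated buffer out of capacity cap. */
static void emit(char *out, size_t cap, const char *text) {
    assert(strlen(out) + strlen(text) < cap);
    strcat(out, text);
}

/* Emits the while skeleton: jump to the test first, then loop back
   to WLoop<id> while R1 is non-zero. cond and body are instruction
   text, one instruction per line (hypothetical interface). */
static void emit_while(char *out, size_t cap, int id,
                       const char *cond, const char *body) {
    char label[32];
    snprintf(label, sizeof label, "WLoop%d", id);
    emit(out, cap, "b test\n");
    emit(out, cap, label);
    emit(out, cap, ":\n");
    emit(out, cap, body);
    emit(out, cap, "test:\n");
    emit(out, cap, cond);
    emit(out, cap, "bnz R1,");
    emit(out, cap, label);
    emit(out, cap, "\n");
}
```

Called with the condition `load R0,#0` / `cmplt R1,R0,10` and the body `load R0,#0` / `add R0,R0,1` / `store R0,#0`, the sketch reproduces the listing of Figure 20 from `b test` onward.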
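The if structure is built from three labels: if_true, else, and endif. If the condition evaluates to true, execution jumps to if_true; if it is false, execution jumps to else. For an incomplete if structure, that is, an if without an else, the compiler does not generate the else label, and the instruction b else becomes b endif. Figures 6-9 and 6-10 show the execution results for the if structure.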
Figure 23. if else structure execution result
Figure 24. if structure execution result
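The two if variants can be sketched in the same spirit. The generated listings are not reproduced in the text, so the exact instruction order here is an assumption: the sketch uses `bnz R1,if_true` for the true case, followed by `b else` (or `b endif` when there is no else). `emit_if` is a hypothetical helper, and the labels are left unnumbered as they are named in the text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes the if skeleton into out. cond is assumed to leave its result
   in R1. Pass else_body == NULL for an if without else: no else label
   is emitted and the false branch goes straight to endif. */
static void emit_if(char *out, size_t cap, const char *cond,
                    const char *then_body, const char *else_body) {
    const char *false_target = else_body ? "else" : "endif";
    int n = snprintf(out, cap,
                     "%s"
                     "bnz R1,if_true\n"
                     "b %s\n"
                     "if_true:\n"
                     "%s",
                     cond, false_target, then_body);
    assert(n >= 0 && (size_t)n < cap);
    if (else_body) {
        int k = snprintf(out + n, cap - n, "b endif\nelse:\n%s", else_body);
        assert(k >= 0 && (size_t)k < cap - n);
    }
    assert(strlen(out) + sizeof "endif:\n" <= cap);
    strcat(out, "endif:\n");
}
```

The `b endif` after the true body is needed only when an else body follows; without it, execution would fall through into the else code.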
void main(void){
    int a;
    a=0;
    do{
        a++;
    }while(a<10);
}
mov RA3,0x0000
mov RA0,0x0000
mov R0,0
store R0,#0
DLoop0:
load R0,#0
add R0,R0,1
store R0,#0
load R0,#0
cmplt R1,R0,10
bnz R1,DLoop0
Figure 25. DVBT DSP program example for the do while structure
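The do while structure is much like the while structure, except that the test is placed at the end of the structure, so the instructions inside a do while are executed at least once. Figure 6-12 shows the execution result for the do while structure.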
Figure 26. do while structure execution result
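The difference between the two loop layouts shows up when the condition is already false on entry. The sketch below rewrites both loops in C with `goto`, following the label placement of Figures 20 and 25; the function names and the iteration counter are additions for illustration only.

```c
#include <assert.h>

/* while (a < limit) a++;  laid out as in Figure 20:
   branch to the test first, so the body may run zero times. */
static int while_iterations(int a, int limit) {
    int count = 0;
    goto test;
WLoop0:
    a++;
    count++;
test:
    if (a < limit) goto WLoop0;
    return count;
}

/* do { a++; } while (a < limit);  laid out as in Figure 25:
   no initial branch, so the body runs before the first test. */
static int do_while_iterations(int a, int limit) {
    int count = 0;
DLoop0:
    a++;
    count++;
    if (a < limit) goto DLoop0;
    return count;
}
```

Starting from a = 0 with a limit of 10, both versions run the body ten times. Starting from a = 10, the while version runs it zero times and the do while version runs it once.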
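Array declarations are handled by the DVBT DSP compiler in the same way as variable declarations: once an array is declared, the compiler assigns offsets to its elements in order. Suppose the declared variables are int a,b,c[4],d; the offsets are then assigned as follows: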
Offset  Variable
0       a
1       b
2       c[0]
3       c[1]
4       c[2]
5       c[3]
6       d
Table 16. Offset allocation table
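The allocation rule behind Table 16 can be sketched as a small symbol table that hands out consecutive offsets in declaration order. The types and function names below are hypothetical and are not taken from the compiler's source.

```c
#include <assert.h>
#include <string.h>

#define MAX_SYMS 16

typedef struct {
    const char *name;
    int length;   /* 1 for a scalar, N for an array of N elements */
    int offset;   /* offset of the first element */
} Symbol;

typedef struct {
    Symbol syms[MAX_SYMS];
    int count;
    int next_offset;
} SymbolTable;

/* Declares a variable and gives it the next free offset(s). */
static int declare(SymbolTable *t, const char *name, int length) {
    assert(t->count < MAX_SYMS && length >= 1);
    Symbol *s = &t->syms[t->count++];
    s->name = name;
    s->length = length;
    s->offset = t->next_offset;
    t->next_offset += length;
    return s->offset;
}

/* Returns the offset of name[index]; use index 0 for a scalar.
   Returns -1 if the name was never declared. */
static int offset_of(const SymbolTable *t, const char *name, int index) {
    for (int i = 0; i < t->count; i++) {
        if (strcmp(t->syms[i].name, name) == 0) {
            assert(index >= 0 && index < t->syms[i].length);
            return t->syms[i].offset + index;
        }
    }
    return -1;
}
```

Declaring a, b, c with length 4, and d in that order gives c[1] the offset 3 and d the offset 6, matching Table 16.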
Execution result
void main(){
int a,b,c[4],d;
a=d;
b=c[1];
}
Figure 27. Execution result for variable declarations
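As Figure x-x shows, after c[4] has been declared, the statement a=d is executed as load R0,#6 followed by store R0,#0, which confirms that the compiler assigned offset 6 to d.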
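A memory-to-memory assignment therefore reduces to one load and one store through R0. The helper below is a hypothetical sketch of that step. The text confirms the pair for a=d only; the pair for b=c[1] is inferred from the offsets in Table 16 (c[1] at 3, b at 1) and is not taken from the compiler's output.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Appends the code for dst = src, where both operands are given by
   their offsets and R0 is the temporary register. out must hold a
   NUL-terminated string (possibly empty) on entry. */
static void emit_assign(char *out, size_t cap, int dst_off, int src_off) {
    size_t used = strlen(out);
    assert(used < cap);
    int n = snprintf(out + used, cap - used,
                     "load R0,#%d\nstore R0,#%d\n", src_off, dst_off);
    assert(n >= 0 && (size_t)n < cap - used);
}
```

`emit_assign(buf, sizeof buf, 0, 6)` appends the two instructions for a=d, and `emit_assign(buf, sizeof buf, 1, 3)` appends the inferred pair for b=c[1].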
Assigning initial values
void main(){
int a=1,b,c[]={9,8,7,6};
b=c[0];
}
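Figure 28. Execution result for initial value assignment
As Figure 6-14 shows, when initial values are given in the variable declarations, the program first stores the values into memory one by one at the very start, and only then begins executing the other instructions.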
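The initialization prologue can be sketched as one store per initial value. The generated listing is not reproduced in the text, so the `mov R0,<value>` / `store R0,#<offset>` pair is an assumption modelled on how Figure 20 handles a=0, and `emit_init` is a hypothetical helper. The offsets follow the sequential rule of Table 16, which places a at 0, b at 1, and c[0] to c[3] at 2 to 5.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Appends the stores that place n initial values at consecutive
   offsets starting at first_off. out must hold a NUL-terminated
   string (possibly empty) on entry. */
static void emit_init(char *out, size_t cap, int first_off,
                      const int *values, int n) {
    for (int i = 0; i < n; i++) {
        size_t used = strlen(out);
        assert(used < cap);
        int w = snprintf(out + used, cap - used,
                         "mov R0,%d\nstore R0,#%d\n",
                         values[i], first_off + i);
        assert(w >= 0 && (size_t)w < cap - used);
    }
}
```

For int a=1,b,c[]={9,8,7,6}; the sketch is called once for a at offset 0 and once for c at offset 2. The uninitialized b at offset 1 produces no code.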
Running a sort
main() {
    int A[]= {11, 2, 3, 6, 4, 20, 13, 8, 25, 7};
    int i, j, tmp;
    for(i=0; i<10; i++){
        for(j=0; j<9; j++){
            if(A[j]<A[j+1]){
                tmp=A[j];
                A[j]=A[j+1];
                A[j+1]=tmp;
            }
        }
    }
}
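Figure 29. A bubble sort C program
Finally, as a comprehensive verification of the compiler, we fed a bubble sort C program into the compiler. The execution result is as follows:
Figure 30. Bubble sort execution result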
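A host-compiled version of the same loops gives a reference against which the DSP memory contents can be compared. The sketch below keeps the loop structure of Figure 29 but takes the array as a parameter; the function name is an addition for illustration. Because the test is A[j] < A[j+1], larger values move toward the front and the array ends up in descending order.

```c
#include <assert.h>

/* Same loops as Figure 29, parameterised on the array and its length.
   Swapping when A[j] < A[j+1] yields descending order. */
static void bubble_sort_desc(int *A, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n - 1; j++) {
            if (A[j] < A[j + 1]) {
                int tmp = A[j];
                A[j] = A[j + 1];
                A[j + 1] = tmp;
            }
        }
    }
}
```

On a host compiler, the input of Figure 29 comes out as 25, 20, 13, 11, 8, 7, 6, 4, 3, 2.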
Results and Discussion
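The first year of the project focused on designing the DVBT_DSP microarchitecture and defining its instruction set. At present, the DVBT_DSP hardware has been fully verified with simulation software and on an FPGA board, and its performance has been evaluated with several algorithms commonly used in digital signal processing. These evaluations give a more accurate picture of the clock and cost constraints of the implementation, feed back into improvements to the original design, help overcome implementation limits and secure the expected results, and guide system parameter tuning and structural revisions. The DVBT_DSP application analysis environment is still being built, however, so the project is currently about 90 percent complete. The main goal for the next year is to verify the DVBT_DSP processing core and integrate it into a 400 MHz IC, and to estimate the size of each unit circuit accurately. This means entering the second round of circuit floorplanning, in which the position and size of each unit on the chip are planned in more detail and any placement difficulties are fed back into revisions of the original design. Once the preliminary implementation circuits have been integrated, integration tests with the other subprojects will be carried out in the overall project's integrated verification environment, using real applications, an operating system, and so on, to verify the compatibility of the design. Whenever a compatibility problem appears, it will be fed back into a revision of the original design.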
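For high-level language support on the DSP core, C can already be translated into DVBT_DSP object code. The second-year goal is an optimizing high-level language compiler for the DSP core. We will design a C-compatible high-level language support environment, comprising a compiler and a debugging environment, and optimize the instruction scheduling of the translated code to improve DVBT_DSP execution performance. The following two items are the planned optimization directions.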
(1) Data-flow support: structural analysis of vectorized computing and data windowing computing.
(2) Instruction-parallelism support: structural analysis of zero-overhead looping, software pipelining, and multi-threading instruction scheduling.
Publications Related to the Project
[1] Te-Shin Yang and Jih-Ching Chiu, "Improving ILP with the Vectorized Computing Mechanism in VLIW DSP Architectures," submitted to IEE Proc. Computers & Digital Techniques, Nov. 2003.
[2] Chih-Kang Wu and Jih-Ching Chiu, "Design of Buffering Mechanism for Improving Instruction and Data Stream," submitted to IEE Proc. Computers & Digital Techniques, Oct. 2003.
[3] Te-Shin Yang and Jih-Ching Chiu, "Vectorized Code Scheduling Method for the FFT Algorithm in VLIW Architecture," The Ninth Workshop on Compiler Techniques for High-Performance Computing, pp. 11-14, Mar. 2003.
Self-Evaluation of Project Results
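In the design of digital wireless communication systems-on-chip, the digital signal processor core plays a decisive role in realizing algorithm designs. Current high-performance DSP cores, such as TI's TMS320C6x family, Motorola's DSP563xx, Lucent Technologies' DSP16xx, and Analog Devices' ADSP-2116x and TigerSHARC processors, offer a range of high-performance techniques for reference. Applied directly to the development of OFDM-related technologies, however, they cannot economically meet the computation demands of high data bandwidth. Building on our laboratory's experience in microprocessor design and implementation (participation in the national P7 advanced microprocessor design project, and completion for SiS of an ARM instruction-compatible, five-pipeline-stage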