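For multimedia systems, the smart DMA can be developed further by integrating it with the VLIW processor into a complete system. Together with a software development environment, such a platform would be sufficient for building complete multimedia applications, with audio and video as the primary targets, so that the system can demonstrate full multimedia capability. One example is the MP3 encoder shown in Figure 5-1, which uses the VLIW processor of this thesis for multimedia processing performance together with the smart DMA architecture. The blocks enclosed by the dashed box can be handed to the smart DMA for assisted computation while the processor takes on the work of the remaining blocks. A processor that could not handle MP3 encoding on its own can then run the MP3 encoder in real time, which raises it to DSP-class capability.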
Figure 5-1: MP3 encoding flow. [Block diagram labels: Data input, 32-Channel Polyphase Analysis Filterbank, MDCT, Window Switching, FFT, Psychoacoustic Model II, SMR, Bit Allocation Loop, Quantization, Huffman Coding, Bit Stream Formatting, Data output]
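According to Figure 5-1, the main computational blocks are the 32-band filterbank, the modified discrete cosine transform (MDCT), and the fast Fourier transform (FFT). The functional verification in this thesis already covers implementations of the filter and the cosine transform, so these functions can be packaged as a single function and processed directly by the smart DMA. The FFT can likewise use the smart DMA to help arrange its data, while the processor handles flow control and the other [...]
[...] into the chip and on to internal memory. After processing by the MP3 encoder program, the data is passed to the smart DMA, which stores it over the I/O bus to external storage on a CF card. Finally, the MP3 audio file can be read from the CF card and played with Media Player.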
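As a hedged illustration of why the blocks identified from Figure 5-1 suit a MAC-equipped DMA: each MDCT output is one inner product between the input block and a row of cosine coefficients, so the whole transform is a sequence of multiply-accumulate passes. The sketch below uses the standard MDCT definition. It is written in Python purely for illustration, and the function names (`mdct_coefficients`, `mac`, `mdct`) are hypothetical; they do not describe the thesis hardware or its test programs.

```python
import math

def mdct_coefficients(n_out):
    """Build the n_out x 2*n_out cosine table of the standard MDCT."""
    n_in = 2 * n_out
    return [
        [math.cos(math.pi / n_out * (n + 0.5 + n_out / 2) * (k + 0.5))
         for n in range(n_in)]
        for k in range(n_out)
    ]

def mac(samples, coeffs):
    """One multiply-accumulate pass: the inner product of two streams."""
    acc = 0.0
    for x, c in zip(samples, coeffs):
        acc += x * c
    return acc

def mdct(block):
    """MDCT of a block of 2N samples, computed as N independent MAC passes."""
    n_out = len(block) // 2
    table = mdct_coefficients(n_out)
    # Each output coefficient reuses the same input block with a different
    # coefficient row, which is the access pattern a DMA with a MAC can stream.
    return [mac(block, row) for row in table]
```

For an MP3 long block, `mdct` would take 36 windowed samples and return 18 coefficients, that is, 18 MAC passes of length 36.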
Figure 5-2: Application: MP3 encoding and decoding. [Block diagram labels: Audio Codec, I2S, I2S Rxd, I2S Txd, SDMA, VLIW Processor, SRAM (x2), Flash, I/O Bus, CHIP]
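In complex multimedia applications, the on-chip memory capacity will not be sufficient. In the future, an external memory controller can be attached to the bus so that the design still delivers its performance in applications with large memory demands. Because digital signal processing is highly diverse, additional addressing modes and data arrangement schemes tailored to it can be developed to strengthen the smart DMA. In addition, existing microprocessors have been under development for a long time and offer modest performance at low cost, so a small microprocessor could be combined with the smart DMA to produce an even lower-cost solution that still has computational capability.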
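This thesis proposes a smart DMA controller. For data transfer, it supports all transfer modes. To select data efficiently as it is fetched, which reduces transfer bandwidth and increases speed, the controller implements four addressing methods: increasing/decreasing, circular addressing, mirror addressing, and index-based addressing. For digital signal processing, these four addressing methods work together with a multiply-accumulate (MAC) unit so that the controller can assist the processor with DSP functions. It also supports dual-channel data [...] bus standard, has a built-in I2S interface (a commonly used audio interface), and can attach up to eight external I/O devices. If needed, further APB-compliant interfaces can be added on the bus. Four transfer modes are provided: memory to memory, memory to peripheral, peripheral to memory, and peripheral to peripheral. Adding the MAC unit increases the gate count by only 10%, so the cost is low.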
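The four addressing methods can be pictured as address generators feeding a MAC. The following Python sketch is an illustration under stated assumptions, not the controller's register-level behavior: mirror addressing is assumed to mean reflecting off both ends of a buffer, and all names (`linear`, `circular`, `mirror`, `indexed`, `mac`) as well as the memory layout are hypothetical.

```python
from itertools import islice

def linear(base, step=1):
    """Increasing (step > 0) or decreasing (step < 0) addresses."""
    addr = base
    while True:
        yield addr
        addr += step

def circular(base, length, start=0, step=1):
    """Addresses that wrap around inside [base, base + length)."""
    offset = start % length
    while True:
        yield base + offset
        offset = (offset + step) % length

def mirror(base, length):
    """Addresses that bounce off both ends of [base, base + length).

    Assumption: 'mirror' is modelled as reflection at the buffer ends.
    """
    if length == 1:
        while True:
            yield base
    offset, step = 0, 1
    while True:
        yield base + offset
        if offset + step < 0 or offset + step >= length:
            step = -step  # reverse direction at either end
        offset += step

def indexed(base, index_table):
    """Addresses taken from a table of offsets (index-based addressing)."""
    for offset in index_table:
        yield base + offset

def mac(memory, addrs_a, addrs_b, count):
    """Multiply-accumulate `count` operand pairs read via two address streams."""
    acc = 0
    for a, b in islice(zip(addrs_a, addrs_b), count):
        acc += memory[a] * memory[b]
    return acc

# Hypothetical layout: 3 filter taps at addresses 0..2,
# a 4-entry circular sample buffer at addresses 8..11.
memory = [0] * 16
memory[0:3] = [1, 2, 3]
memory[8:12] = [10, 20, 30, 40]

# One FIR output: taps read upward, samples read backward from the newest
# sample (offset 2), wrapping inside the circular buffer when needed.
y = mac(memory, linear(0), circular(8, 4, start=2, step=-1), 3)
print(y)
```

Here the result is 1*30 + 2*20 + 3*10 = 100. The point of the design is that the processor only sets up the two address streams and the count; operand selection and accumulation proceed without per-sample address arithmetic on the processor.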
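For verification, a VLIW processor was developed; combined with the smart DMA, it brings the original processor's performance up to the level of a digital signal processor. The chip combining the VLIW processor and the smart DMA controller has been taped out through the National Chip Implementation Center (CIC). In the future, this chip can serve as a low-cost digital signal processor, or the design can be integrated as an IP block into an SoC.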
Appendix
A Test Programs
Memory A to Memory A
SDMAB 0,0_00000001_0000000
SDMAB 1,0_000000111111111
SDMAB 2,0_00000001_0000000
SDMAB 3,0_000000000000000
SDMAB 4,01_10_00_0100000000
SDMAB 5,0_1_000_000_00_0_000_1_1
DMAOK
MOVRC R2,2
LABEL:A2A GDMA R3
JNER R3,R2,A2A

Memory A to Memory B
SDMAB 0,1_00000001_0000000
SDMAB 1,0_000000111111111
SDMAB 2,0_00000001_0000000
SDMAB 3,1_000000000000000
SDMAB 14,00000000_00000000
SDMAB 4,01_10_00_1000000000
SDMAB 5,0_1_000_000_00_0_000_1_1
DMAOK
MOVRC R2,2
LABEL:RAM_COPY GDMA R3
JNER R3,R2,RAM_COPY

Memory A to Memory B (Circular)
SDMAB 0,0_00000111_0000000
SDMAB 1,0_000000000000000
SDMAB 2,0_00000001_0000000
SDMAB 3,1_000000000000000
SDMAB 14,00100000_00000000
SDMAB 4,10_10_00_1000000000
SDMAB 5,0_1_000_000_00_0_000_1_1
DMAOK
MOVRC R2,2
LABEL:A2B_circular GDMA R3
JNER R3,R2,A2B_circular

Memory A to Memory B (Mirror)
SDMAB 0,1_00000111_0000000
SDMAB 1,0_000000000000000
SDMAB 2,0_00000001_0000000
SDMAB 3,1_000000111111111
SDMAB 14,00100000_00000000
SDMAB 4,10_01_00_1000000000
SDMAB 5,0_1_000_000_00_0_000_1_1
DMAOK
MOVRC R2,2
LABEL:A2B_mirror GDMA R3
JNER R3,R2,A2B_mirror

Inner product
SDMAB 0,0_00000001_0000000
SDMAB 1,0_000000000000000
SDMAB 6,0_00000001_0000000
SDMAB 7,1_000000000000000
SDMAB 4,10_00_00_1000000000
SDMAB 10,10_00_00_1000000000
SDMAB 14,00000000_00000000
SDMAB 15,00000000_00000000
SDMAB 5,0_1_000_000_00_0_001_1_1
SDMAB 11,0_1_000_000_00_0_001_1_1
DMAOK
MOVRC R2,0
LABEL:Inner_product GDMA R3

Convolution
SDMAB 0,0_00000001_0000000
SDMAB 1,0_000000000000000
SDMAB 6,0_00000001_0000000
SDMAB 7,1_000000111111111
SDMAB 4,10_00_00_1000000000
SDMAB 10,01_00_00_1000000000
SDMAB 5,0_1_000_000_00_0_001_1_1
SDMAB 11,0_1_000_000_00_0_001_1_1
DMAOK
MOVRC R2,0
LABEL:CONVOLUTION GDMA R3
JNER R3,R2,CONVOLUTION

DCT0
SDMAB 0,0_00000001_0000000
SDMAB 1,0_000000000000000
SDMAB 6,1_00000000_0000000
SDMAB 7,1_000000000000000
SDMAB 4,10_00_00_0000100100
SDMAB 10,10_00_00_0000100100
SDMAB 14,00000000_00000000
SDMAB 5,0_1_000_000_00_0_001_1_1
SDMAB 11,0_1_000_000_00_0_001_1_1
DMAOK
MOVRC R2,0
LABEL:DCT0 GDMA R3
JNER R3,R2,DCT0

DCT4
SDMAB 0,0_00000001_0000000
SDMAB 1,0_000000000000000
SDMAB 6,1_00001000_0000100
SDMAB 7,1_000000000000100
SDMAB 4,10_00_00_0000100100
SDMAB 10,10_00_00_0000100100
SDMAB 15,01001001_00000000
SDMAB 5,0_1_000_000_00_0_001_1_1
SDMAB 11,0_1_000_000_00_0_001_1_1
DMAOK
MOVRC R2,0
LABEL:DCT4 GDMA R3
JNER R3,R2,DCT4

DWT_A0
SDMAB 15,00000101_00000000
SDMAB 0,0_00000001_0000000
SDMAB 1,0_000000111111000
SDMAB 6,1_00000001_0000000
SDMAB 7,1_000000010000000
SDMAB 4,10_00_00_0000001001
SDMAB 10,10_00_00_0000001001
SDMAB 5,0_1_000_000_00_0_001_1_1
SDMAB 11,0_1_000_000_00_0_001_1_1
DMAOK
MOVRC R2,0
LABEL:DWT_A0 GDMA R3
JNER R3,R2,DWT_A0

DWT_A4
SDMAB 1,0_000000000000000
SDMAB 7,1_000000010000000
SDMAB 4,10_00_00_0000001001
SDMAB 10,10_00_00_0000001001
SDMAB 5,0_1_000_000_00_0_001_1_1
SDMAB 11,0_1_000_000_00_0_001_1_1
DMAOK
MOVRC R2,0
LABEL:DWT_A4 GDMA R3
JNER R3,R2,DWT_A4

DWT_D0
SDMAB 15,00000100_00000000
SDMAB 1,0_000000111111011
SDMAB 7,1_000000011000000
SDMAB 4,10_00_00_0000000111
SDMAB 10,10_00_00_0000000111
SDMAB 5,0_1_000_000_00_0_001_1_1
SDMAB 11,0_1_000_000_00_0_001_1_1
DMAOK
MOVRC R2,0
LABEL:DWT_D0 GDMA R3
JNER R3,R2,DWT_D0

DWT_D6
SDMAB 1,0_000000000000111
SDMAB 7,1_000000011000000
SDMAB 4,10_00_00_0000000111
SDMAB 10,10_00_00_0000000111
SDMAB 5,0_1_000_000_00_0_001_1_1
SDMAB 11,0_1_000_000_00_0_001_1_1
DMAOK
MOVRC R2,0
LABEL:DWT_D6 GDMA R3
JNER R3,R2,DWT_D6
B Layout Verification Results
1. DRC
DRC verification passed with no errors
2. LVS
LVS verification passed with no errors
C Tapeout Review Form
1. Chip overview:
1-1. Project title: VLIW-architecture multimedia signal processor with Smart-DMA
1-2. Top cell name: SD297
1-3. Library used:
CIC_CBDK35, CIC_CBDK25, v CIC_CBDK18; version: v2
1-4. Is CIC-provided memory used? Yes
1-5. Operating frequency: 100 MHz
1-6. Power consumption: 400 mW
1-7. Chip area: 2375 X 2372
2. Design synthesis:
2-1. Synthesis software used? Synopsys Design Compiler
2-2. Boundary conditions added:
v input drive strength, v input delay, v output loading, v output delay
2-3. Timing constraints added:
v specify clock (sequential design)
max delay, min delay (combinational design)
2-4. Area constraint added? No
2-5. Does the post-synthesis report show any timing violation? No (other options: setup time violation, hold time violation)
2-6. Does the synthesized Verilog contain assign statements? No (fixed manually)
2-7. Does the synthesized Verilog contain instance names of the form *cell*? No
2-8. Does the synthesized Verilog contain instance names or net names with a backslash (\)? Yes
3. Design for testability (required for advanced chips):
3-0. Design software used? DFT Compiler
3-2. ATPG software used? TetraMAX
3-3. Number of embedded memories used: SRAM 2, ROM 0
Memory size: 512x32 (word x bit)
Test method: BIST Yes, or other test method N/A
If BIST is used, what is the test algorithm? Moving Inversion (13N March)
3-4. Scan chain information
Total number of flip-flops? 4970
Number of scan chains? 1
Scan chain length (Max.)? 137781.929
3-5. Does the uncollapsed fault coverage exceed 90%? Yes. Value? 98.45%
Number of ATPG patterns? 983
Note: If Synopsys TetraMAX is used to generate the ATPG patterns, use the set faults -fault_coverage command to make TetraMAX produce fault coverage information. If asicgen from SynTest TurboScan is used to generate the ATPG patterns, report the atpg pessimistic fault coverage value.
4. Pre-layout simulation
4-1. Does gate-level simulation show any timing violation? No (other options: setup time violation, hold time violation)
5. Physical layout
5-1. P&R software used? Apollo, v SE
5-2. Power ring width? 8
Has current density (1 mA/1 um) been considered? Yes
5-3. Has output loading been considered? Yes
5-4. Clock tree added? Yes
5-5. Corner pads added? Yes
5-6. IO filler added? Yes
5-7. Core filler added? Yes
5-8. Bonding pads added? Yes
The following item (A-1) applies only to Apollo users
A-1. Was the Fill Notch and Gap step run?
The following items (S-1 to S-2) apply only to SE users
S-1. Are there overlap vias on the power ring? No
S-2. Are the IO Row and Corner Row confirmed to abut each other? Yes
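6. Post-layout simulation
6-1. Was post-layout gate-level simulation performed? Yes
STA (static timing analysis) software? PrimeTime / ModelSim
6-2. Was post-layout transistor-level simulation performed? No
6-3. Simulated under the following conditions: SS, TT, FF
6-4. How will the chip be tested once it is received? IMS tester
6-5. Was the effect of output loading considered in simulation? Yes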
7. DRC/LVS verification
7-1. Any DRC errors? No. Cause of errors:
DRC software? Calibre
Any regions excluded from DRC? No
7-2. Any LVS errors? No
LVS software? Calibre
Any BlackBox not provided by CIC? No
Designer signatures: 蘇育緯/周經翔/鍾仁峰
Advisor signature: 林進燈
D Module I/O
• Chip I/O
Pin Name I/O Width Function
Global
CLK I 1 System Clock
Reset I 1 Synchronous Reset
Scan Chain
Scan_in I 1 Scan Chain Input
Scan_en I 1 Scan Chain Enable
Scan_out O 1 Scan Chain Output
Memory BIST
BistMode I 1 BIST Mode Select
BistFail O 1 BIST Failure
Finish O 1 BIST Finish
ErrMap_A O 1 Error Map of Ram A
ErrMap_B O 1 Error Map of Ram B
Program Rom
pro_mem_address_w O 16 Program Rom Address
pro_mem_data_w I 64 Program Rom Data
I2S interface
BCLK I 1 I2S Bit Clock
LRCLK I 1 I2S Word Select Clock
SDIN I 1 I2S Data Input
BCLK_out O 1 I2S Bit Clock Output
LRCLK_out O 1 I2S Word Select Clock Output
SDOUT O 1 I2S Data Output
• Processor Modules
1. PC Counter / Branch Protect Stage:
2. Instruction Fetch Stage:
I/O Type Pin Name Width
Input CLK 1
Input Reset 1
Input pc 16
Input j_bit 1
Input jump_done 1
Output pc_predec 16
Output pro_mem_address 16
3. Program Memory Stage:
4. Instruction Predecoder Stage:
I/O Type Pin Name Width
Input CLK 1
Input Reset 1
Input pc_predec 16
Input instruction 64
Input j_bit 1
Input jump_done 1
Output op_a 6
Output op_b 6
Output pre_instruction_a 32
Output pre_instruction_b 32
Output pc_dec 16
5. Instruction Decoder Stage:
I/O Type Pin Name Width
Input CLK 1
Input Reset 1
Input instruction 32
Input op 6
Input pc_dec 16
Input j_bit 1
Input jump_done 1
Output rd 6
Output direct 20
Output op_reg 6
Output pc_reg 16
Output address1 6
Output address2 6
Output DMA_address 4
6. Register File Stage:
I/O Type Pin Name Width
Input CLK 1
Input Reset 1
Input pc_reg 16
Input op 6
Input rd 6
Input flag 2
Input direct 20
Input address1 6
Input address2 6
Input address3 6
Input reg_w_data 32
Input Device_Int 4
Input int_done 1
Input dec_to_reg 6
Input DMA_address 4
Output pc_alu 16
Output op_alu 6
Output rd_alu 6
Output addr1_alu 6
Output addr2_alu 6
Output flag_alu 2
Output direct_alu 20
Output reg_r_data1 32
Output reg_r_data2 32
Output reg_to_alu 32
Output DMA_address_alu 4
7. ALU Stage:
I/O Type Pin Name Width
Input CLK 1
Input Reset 1
Input pc 16
Input op 6
Input rd 6
Input flag 2
Input direct 20
Input A 32
Input B 32
Input jump_done 1
Input mem_output_data 32
Input alu_work 1
Input addr1_alu 6
Input addr2_alu 6
Input reg_to_alu 32
Input Intn 2
Input RegOut 32
Input DMA_address 4
Output Cout 32
Output C_address 6
Output j_bit 1
Output jump_addr 32
Output mem_w_address 9
Output mem_input_data 32
Output wren 1
Output Pin0 16
Output Pin1 16
Output mem_r_address 9
Output int_done 1
Output mgrant_a 1
Output mgrant_b 1
Output Rcen 1
Output Rwen 1
• Smart DMA Modules
Pin Name I/O Width Function
Global
CLK I 1 System Clock
rstn I 1 Synchronous Reset
Setting DMA
mgrant0 I 1 Memory A Select
mgrant1 I 1 Memory B Select
rcen I 1 Register Chip Enable
rwen I 1 Register Write Enable
raddr I 4 Register Address
RegIn I 32 Register Data Input
RegOut O 32 Register Data Output
Intn O 2 Channel Controller Interrupt
Memory Interface
cen0 O 1 Chip Enable of Memory A
oen0 O 1 Output Enable of Memory A
wen0 O 1 Write Enable of Memory A
a0 O 9 Address of Memory A
d0 O 32 Data Input of Memory A
q0 I 32 Data Output of Memory A
cen1 O 1 Chip Enable of Memory B
oen1 O 1 Output Enable of Memory B
wen1 O 1 Write Enable of Memory B
a1 O 9 Address of Memory B
d1 O 32 Data Input of Memory B
q1 I 32 Data Output of Memory B
I2S interface
BCLK I 1 I2S Bit Clock
LRCLK I 1 I2S Word Select Clock
SDIN I 1 I2S Data Input
BCLK_out O 1 I2S Bit Clock Output
LRCLK_out O 1 I2S Word Select Clock Output
SDOUT O 1 I2S Data Output