
Chapter 4 Implementation

4.6 Summary

We have fully implemented the ALU Cluster architecture using currently available CAD tools and a cell-based design flow. The maximum clock rate, physical core size, and power dissipation are 100 MHz, 2.16 mm², and 312 mW, respectively. For the selected benchmark, an FIR filter system, the code utilization is 63.4% and the design takes 3.96 clock cycles per output result. The capacity of the IRF units and the SPRF unit is sufficient throughout the simulation, and the ratio of on-cluster to off-cluster memory references is 989 : 91. The high proportion of on-cluster references means that the arithmetic units are fed mainly from the ample local memory bandwidth, so the limited global memory bandwidth is not wasted on them.
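The figures above can be turned into a few derived quantities. The following is a minimal arithmetic sketch, not part of the original evaluation; it assumes the benchmark runs at the maximum clock rate of 100 MHz, and all variable names are illustrative.

```python
# Values taken from the summary above.
on_cluster_refs = 989      # references to on-cluster memory (IRF/SPRF)
off_cluster_refs = 91      # references to off-cluster memory
clock_hz = 100e6           # maximum clock rate (assumed to be the operating rate)
cycles_per_result = 3.96   # clock cycles per output result (FIR benchmark)

total_refs = on_cluster_refs + off_cluster_refs
on_fraction = on_cluster_refs / total_refs            # share served on-cluster
on_to_off_ratio = on_cluster_refs / off_cluster_refs  # on-cluster refs per off-cluster ref
results_per_second = clock_hz / cycles_per_result     # output rate under the assumption

print(f"total references:      {total_refs}")
print(f"on-cluster share:      {on_fraction:.1%}")
print(f"on:off ratio:          {on_to_off_ratio:.2f} : 1")
print(f"results per second:    {results_per_second / 1e6:.2f} M")
```

Under these assumptions, roughly nine out of ten memory references are served by on-cluster storage, which is the point the ratio 989 : 91 is meant to convey.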

Compared with recently reported related works, this work achieves performance that is quite competitive with application-specific architectures. It also outperforms the reconfigurable architectures and the FPGA architectures.

Furthermore, several members of the SoC Laboratory have developed an implementation of multiple ALU Clusters combined with power saving techniques. A separately developed architecture simulator can determine, once the application simulation has finished, how many ALU Clusters to use according to the required performance. Execution time and power dissipation are therefore scalable according to the requirement.
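The idea of choosing the number of ALU Clusters from a performance requirement can be sketched as follows. This is a hypothetical illustration, not the laboratory's simulator: it assumes throughput scales linearly with the number of clusters, and the function name, the per-cluster rate derived from 100 MHz and 3.96 cycles per result, and the limit of 8 clusters are all assumptions.

```python
import math


def clusters_needed(required_rate: float, per_cluster_rate: float, max_clusters: int) -> int:
    """Return the smallest cluster count that meets required_rate.

    Assumes (hypothetically) that throughput scales linearly with the
    number of active clusters.
    """
    if required_rate <= 0 or per_cluster_rate <= 0:
        raise ValueError("rates must be positive")
    count = max(1, math.ceil(required_rate / per_cluster_rate))
    if count > max_clusters:
        raise ValueError("requirement exceeds the available clusters")
    return count


# Per-cluster rate assumed from the FIR benchmark: 100 MHz / 3.96 cycles per result.
PER_CLUSTER_RATE = 100e6 / 3.96
MAX_CLUSTERS = 8  # hypothetical upper bound, not stated in the text

print(clusters_needed(60e6, PER_CLUSTER_RATE, MAX_CLUSTERS))
```

Clusters beyond the returned count could then be left idle or powered down, which is where the scalability of power dissipation would come from under this model.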

CHAPTER 5

CONCLUSION

An ALU Cluster for a media streaming processor architecture has been designed in this thesis. This work has also examined the implementation feasibility of each component. Back-end simulation results, based on the process technology and standard cell library, determined the optimized number and performance of each component. Combined with a memory bandwidth hierarchy, this streaming architecture handles the selected test bench efficiently without wasting much of the expensive, communication-limited global memory bandwidth on the function units. Additionally, the performance evaluation confirms that this work is competitive with, and has advantages over, recently reported related works. Finally, the prototype of this work has been fabricated in the UMC 0.18 µm 1P6M standard CMOS process technology.

The developed ALU Cluster has been integrated with power saving techniques, as shown in Figure 4.5.1. The results show that the power dissipation and energy consumption of the selected benchmarks for multimedia applications and baseband communication systems can be reduced significantly. Both power dissipation and energy consumption become scalable by dynamically selecting the number of ALU Clusters in use. The instantaneous performance and energy consumption of an entire workload can therefore be optimized for mobile systems. Thus, this design improves the operating time and power dissipation achievable within a limited battery life for similar architectures. Based on these results, the combination of streaming architectures and power saving techniques is a promising mainstream direction for the design of next-generation portable multimedia and communication systems, as depicted in Figure 4.5.5.

BIBLIOGRAPHY

[1] S. Rixner, W. J. Dally, U. J. Kapasi, B. Khailany, A. Lopez-Lagunas, P. R. Mattson, J. D. Owens, “A Bandwidth-Efficient Architecture for Media Processing,” Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture, pages 3-13, November 1998.

[2] J. L. Hennessy, D. A. Patterson, Computer Architecture: A Quantitative Approach, Third Edition, Morgan Kaufmann Publishers, 2003.

[3] W. Wolf, Modern VLSI Design: System-on-Chip Design, Third Edition, Prentice Hall Modern Semiconductor Design Series, 2002.

[4] J. Rabaey, A. Chandrakasan, B. Nikolic, Digital Integrated Circuits: A Design Perspective, Second Edition, Prentice Hall Electronics and VLSI Series, 2003.

[5] N. H. E. Weste, K. Eshraghian, Principles of CMOS VLSI Design: A Systems Perspective, Second Edition, Addison-Wesley VLSI Systems Series, 1993.

[6] B. Khailany, W. J. Dally, U. J. Kapasi, P. Mattson, J. Namkoong, J. D. Owens, B. Towles, A. Chang, S. Rixner, “Imagine: Media Processing with Streams,” IEEE Micro, pages 35-46, March-April 2001.

[7] U. J. Kapasi, S. Rixner, W. J. Dally, B. Khailany, J. H. Ahn, P. Mattson, J. D. Owens, “Programmable Stream Processors,” IEEE Computer, pages 54-62, August 2003.

[8] W. J. Dally, U. J. Kapasi, B. Khailany, J. H. Ahn, A. Das, “Stream Processors: Programmability with Efficiency,” ACM Queue, pages 52-62, March 2004.

[9] K. Mai, T. Paaske, N. Jayasena, R. Ho, W. J. Dally, M. Horowitz, “Smart Memories: A Modular Reconfigurable Architecture,” Proceedings of the 27th International Symposium on Computer Architecture, pages 161-171, June 2000.

[10] J. Draper, J. Chame, M. Hall, C. Steele, T. Barrett, J. LaCoss, J. Granacki, J. Shin, C. Chen, C. W. Kang, I. Kim, G. Daglikoca, “The Architecture of the DIVA Processing-In-Memory Chip,” Proceedings of the International Conference on Supercomputing, pages 14-25, June 2002.

[11] J. Draper, J. Sondeen, S. Mediratta, I. Kim, “Implementation of a 32-bit RISC Processor for the Data-Intensive Architecture Processing-In-Memory Chip,” Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures, and Processors, pages 163-172, July 2002.

[12] T. Sakurai, “Perspectives on Power-Aware Electronics,” IEEE International Solid-State Circuits Conference, pages 26-29, February 2003.

[13] J. Mitola, “The Software Radio Architecture,” IEEE Communications Magazine, pages 26-38, May 1995.

[14] E. Buracchini, “The Software Radio Concept,” IEEE Communications Magazine, pages 138-143, September 2000.

[15] M. Keating, P. Bricaud, Reuse Methodology Manual for System-on-Chip Designs, Third Edition, Kluwer Academic Publishers, 2002.

[16] J. D. Owens, S. Rixner, U. J. Kapasi, P. Mattson, B. Towles, B. Serebrin, W. J. Dally, “Media Processing Applications on the Imagine Stream Processor,” Proceedings of the IEEE International Conference on Computer Design, pages 295-302, September 2002.

[17] A. V. Oppenheim, R. W. Schafer, J. R. Buck, Discrete-Time Signal Processing, Second Edition, Prentice Hall Signal Processing Series, 1999.

[18] A. P. Chandrakasan, S. Sheng, R. W. Brodersen, “Low-Power CMOS Digital Design,” IEEE Journal of Solid-State Circuits, pages 473-484, April 1992.

[19] S. Rixner, Stream Processor Architecture, Kluwer Academic Publishers, 2002.

[20] B. Khailany, The VLSI Implementation and Evaluation of Area- and Energy-Efficient Streaming Media Processors, Ph.D. Dissertation, Stanford University, 2003.

[21] http://www.synopsys.com/

[22] http://www.umc.com/english/process/d.asp; http://www.umc.com/english/design/b_3.asp#Artisan

[23] R. Ho, K. Mai, M. Horowitz, “The Future of Wires,” Proceedings of the IEEE, pages 490-504, April 2001.

[24] http://www.cadence.com/

[25] http://www.novas.com/

[26] http://www.mentor.com/

[27] http://www.mathworks.com/

[28] H. Schmit, D. Whelihan, A. Tsai, M. Moe, B. Levine, R. R. Taylor, “PipeRench: A Virtualized Programmable Datapath in 0.18 Micron Technology,” Proceedings of the IEEE Custom Integrated Circuits Conference, pages 63-66, May 2002.

[29] E. F. Stefatos, H. Wei, T. Arslan, R. Thomson, “Low-Power Reconfigurable VLSI Architecture for the Implementation of FIR Filters,” Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium, pages 168b-168b, April 2005.

[30] C. H. Wang, A. T. Erdogan, T. Arslan, “Algorithmic Implementation of Low-Power High Performance FIR Filtering IP Cores,” Proceedings of the 18th IEEE International Conference on VLSI Design, pages 659-662, January 2005.

[31] R. B. Staszewski, K. Muhammad, P. Balsara, “A 550-MSample/s 8-Tap FIR Digital Filter for Magnetic Recording Read Channels,” IEEE Journal of Solid-State Circuits, pages 1205-1210, August 2000.

[32] http://www.atmel.com/dyn/resources/prod_documents/DOC0833.PDF

[33] T. W. Lin, M. C. Lee, F. J. Lin, H. Chiueh, “A Low Power ALU Cluster Design for Media Streaming Architecture,” to appear: IEEE 48th International Midwest Symposium on Circuits and Systems, August 2005.

[34] J. W. Tschanz, S. G. Narendra, Y. Ye, B. A. Bloechel, S. Borkar, V. De, “Dynamic Sleep Transistor and Body Bias for Active Leakage Power Control of Microprocessors,” IEEE Journal of Solid-State Circuits, pages 1838-1845, November 2003.

[35] D. E. Lackey, P. S. Zuchowski, T. R. Bednar, D. W. Stout, S. W. Gould, J. M. Cohn, “Managing Power and Performance for System-on-Chip Design Using Voltage Islands,” IEEE/ACM International Conference on Computer Aided Design, pages 195-202, November 2002.

APPENDIX A

SUMMARY OF THE DEFINED MICROCODE IN INSTRUCTION SET

I. The “SOURCE” field of the instruction set format

Source Register               Binary Code [1:0]
No operation                  00 b
Off-chip data memory (DM)     01 b
SPRF (SP)                     10 b
IRF (RF)                      11 b

II. The “DESTINATION” field of the instruction set format

Destination Register          Binary Code [3:0]
No operation                  0000 b
Off-chip data memory (DM)     0001 b
SPRF (SP)                     0010 b
Left IRF of ALU_0 (I9)        0011 b
Right IRF of ALU_0 (I8)       0100 b
Left IRF of ALU_1 (I7)        0101 b
Right IRF of ALU_1 (I6)       0110 b
Left IRF of MUL_0 (I5)        0111 b
Right IRF of MUL_0 (I4)       1000 b
Left IRF of MUL_1 (I3)        1001 b
Right IRF of MUL_1 (I2)       1010 b
Left IRF of DIV_0 (I1)        1011 b
Right IRF of DIV_0 (I0)       1100 b
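The destination encoding in table II can be expressed as a simple lookup. The sketch below only transcribes the 4-bit codes listed above; the function name and the error handling for the unlisted codes 1101 to 1111 are assumptions made for illustration.

```python
# 4-bit destination codes transcribed from table II.
DESTINATION_CODES = {
    0b0000: "None",
    0b0001: "DM",   # off-chip data memory
    0b0010: "SP",   # SPRF
    0b0011: "I9",   # left IRF of ALU_0
    0b0100: "I8",   # right IRF of ALU_0
    0b0101: "I7",   # left IRF of ALU_1
    0b0110: "I6",   # right IRF of ALU_1
    0b0111: "I5",   # left IRF of MUL_0
    0b1000: "I4",   # right IRF of MUL_0
    0b1001: "I3",   # left IRF of MUL_1
    0b1010: "I2",   # right IRF of MUL_1
    0b1011: "I1",   # left IRF of DIV_0
    0b1100: "I0",   # right IRF of DIV_0
}


def decode_destination(code: int) -> str:
    """Map a 4-bit destination field to its register mnemonic."""
    if code not in DESTINATION_CODES:
        # Codes 1101 b to 1111 b are not defined in the table.
        raise ValueError(f"undefined destination code: {code:04b}")
    return DESTINATION_CODES[code]


print(decode_destination(0b0110))
```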

III. The “OPERATION CODE” field of the instruction set format

Function Unit     Operation     OP code
ALU               None          0000 b
ALU               ADD           0001 b
ALU               SUB           0010 b
ALU               ABS           0011 b
ALU               AND           0100 b
ALU               OR            0101 b
ALU               XOR           0110 b
ALU               NOT           0111 b
ALU               SLL           1000 b
ALU               SRL           1001 b
ALU               SRA           1010 b
ALU               LT            1011 b
ALU               GT            1100 b
ALU               EQ            1101 b
MUL               None          0 b
MUL               MUL           1 b
DIV               None          00 b
DIV               DIV           01 b
DIV               REM           10 b
DIV               SQR           11 b
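Because the operation code width differs per function unit (4 bits for the ALU, 1 bit for the multiplier, 2 bits for the divider), an encoder has to be keyed by unit as well as by operation. The following sketch transcribes table III; the dictionary layout and the function name are illustrative assumptions.

```python
# Operation codes transcribed from table III, kept as bit strings so that
# the per-unit field width (4, 1, and 2 bits) is preserved.
OP_CODES = {
    "ALU": {
        "None": "0000", "ADD": "0001", "SUB": "0010", "ABS": "0011",
        "AND": "0100", "OR": "0101", "XOR": "0110", "NOT": "0111",
        "SLL": "1000", "SRL": "1001", "SRA": "1010", "LT": "1011",
        "GT": "1100", "EQ": "1101",
    },
    "MUL": {"None": "0", "MUL": "1"},
    "DIV": {"None": "00", "DIV": "01", "REM": "10", "SQR": "11"},
}


def encode_operation(unit: str, operation: str) -> str:
    """Return the OP code bit string for an operation on a function unit."""
    try:
        return OP_CODES[unit][operation]
    except KeyError:
        raise ValueError(f"unknown unit/operation: {unit}/{operation}") from None


print(encode_operation("ALU", "SRA"), encode_operation("DIV", "REM"))
```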

APPENDIX B

9 RF_00_RF_00_DM_00_ADD DM_12_DM_02_I6_05_MUL DM_13_DM_03_I7_06_MUL

10 RF_00_RF_00_SP_00_ADD DM_14_DM_04_I6_06_MUL DM_15_DM_05_I7_07_MUL

11 RF_01_RF_01_I9_02_ADD DM_16_DM_06_I6_07_MUL DM_10_DM_01_I7_08_MUL

12 RF_02_RF_02_I8_02_ADD DM_11_DM_02_I6_08_MUL DM_12_DM_03_I7_09_MUL

13 RF_03_RF_03_I9_03_ADD DM_13_DM_04_I6_09_MUL DM_14_DM_05_I7_10_MUL

14 RF_04_RF_04_I8_00_ADD DM_15_DM_06_I6_10_MUL DM_16_DM_07_I9_00_MUL

15 RF_01_SP_00_DM_01_ADD DM_09_DM_01_I7_00_MUL DM_10_DM_02_I6_00_MUL

16 RF_05_RF_05_I8_03_ADD DM_11_DM_03_I7_01_MUL DM_12_DM_04_I6_01_MUL

17 RF_02_RF_02_DM_02_ADD RF_06_RF_06_I9_04_ADD DM_13_DM_05_I7_02_MUL DM_14_DM_06_I6_02_MUL

18 RF_03_RF_01_I9_05_ADD RF_07_RF_07_I8_04_ADD DM_15_DM_07_I7_03_MUL DM_16_DM_08_I6_03_MUL

19 RF_08_RF_08_I8_05_ADD DM_08_DM_01_I7_04_MUL DM_09_DM_02_I6_04_MUL

20 RF_09_RF_09_I9_01_ADD DM_10_DM_03_I7_11_MUL DM_11_DM_04_I6_11_MUL

21 RF_10_RF_10_I8_06_ADD DM_12_DM_05_I7_05_MUL DM_13_DM_06_I6_05_MUL

22 RF_04_RF_03_I9_06_ADD RF_00_RF_00_I8_07_ADD DM_14_DM_07_I7_06_MUL DM_15_DM_08_I6_06_MUL

23 RF_05_RF_00_DM_03_ADD RF_01_RF_01_I9_02_ADD DM_16_DM_09_I8_01_MUL DM_07_DM_01_I7_07_MUL

24 RF_00_RF_05_I9_03_ADD RF_02_RF_02_I8_02_ADD DM_08_DM_02_I6_07_MUL DM_09_DM_03_I7_08_MUL

25 RF_03_RF_03_I9_07_ADD DM_10_DM_04_I6_08_MUL DM_11_DM_05_I7_09_MUL

26 RF_01_RF_06_I8_08_ADD RF_04_RF_04_I9_08_ADD DM_12_DM_06_I6_09_MUL DM_13_DM_07_I7_10_MUL

27 RF_06_RF_04_DM_04_ADD RF_11_RF_11_I8_03_ADD DM_14_DM_08_I6_10_MUL DM_15_DM_09_I7_00_MUL

28 RF_02_RF_07_I8_00_ADD RF_05_RF_05_I9_04_ADD DM_16_DM_10_I6_00_MUL DM_06_DM_01_I7_01_MUL

29 RF_06_RF_06_I8_05_ADD DM_07_DM_02_I6_01_MUL DM_08_DM_03_I7_02_MUL

30 RF_07_RF_02_I9_00_ADD DM_09_DM_04_I6_02_MUL DM_10_DM_05_I7_03_MUL

31 RF_08_RF_01_I9_01_ADD RF_07_RF_07_I8_06_ADD DM_11_DM_06_I6_03_MUL DM_12_DM_07_I7_04_MUL

32 RF_03_RF_08_DM_05_ADD RF_08_RF_08_I9_05_ADD DM_13_DM_08_I6_04_MUL DM_14_DM_09_I7_11_MUL

33 RF_04_RF_03_I9_02_ADD RF_09_RF_09_I8_04_ADD DM_15_DM_10_I6_11_MUL DM_05_DM_01_I7_05_MUL

34 RF_10_RF_10_I9_06_ADD DM_06_DM_02_I6_05_MUL DM_07_DM_03_I7_06_MUL

35 RF_00_RF_00_DM_06_ADD RF_00_RF_00_I8_02_ADD DM_08_DM_04_I6_06_MUL DM_09_DM_05_I7_12_MUL

36 RF_01_RF_05_I8_01_ADD RF_01_RF_01_I9_07_ADD DM_10_DM_06_I6_12_MUL DM_11_DM_07_I7_07_MUL

37 RF_05_RF_06_I9_03_ADD RF_02_RF_02_I8_07_ADD DM_12_DM_08_I6_07_MUL DM_13_DM_09_I7_08_MUL

38 RF_03_RF_03_I9_04_ADD DM_14_DM_10_I6_08_MUL DM_04_DM_01_I7_09_MUL

39 RF_06_RF_04_I9_08_ADD RF_04_RF_04_I8_03_ADD DM_05_DM_02_I6_09_MUL DM_06_DM_03_I7_10_MUL

40 RF_11_RF_11_I8_00_ADD DM_07_DM_04_I6_10_MUL DM_08_DM_05_I7_00_MUL

41 RF_02_RF_01_DM_07_ADD RF_05_RF_05_I9_00_ADD DM_09_DM_06_I6_00_MUL DM_10_DM_07_I7_01_MUL

42 RF_03_RF_02_SP_00_ADD RF_06_RF_06_I8_05_ADD DM_11_DM_08_I6_01_MUL DM_12_DM_09_I7_02_MUL

43 RF_07_RF_07_I9_01_ADD RF_12_RF_12_I8_06_ADD DM_13_DM_10_I6_02_MUL DM_03_DM_01_I7_03_MUL

44 RF_04_RF_03_I8_04_ADD RF_07_RF_07_I9_05_ADD DM_04_DM_02_I6_03_MUL DM_05_DM_03_I7_04_MUL

45 RF_08_RF_08_I9_06_ADD DM_06_DM_04_I6_04_MUL DM_07_DM_05_I7_11_MUL

46 RF_09_RF_09_I9_02_ADD DM_08_DM_06_I6_11_MUL DM_09_DM_07_I7_05_MUL

47 RF_08_SP_00_DM_08_ADD RF_10_RF_10_I8_01_ADD DM_10_DM_08_I6_05_MUL DM_11_DM_09_I7_06_MUL

48 RF_01_RF_00_I9_03_ADD RF_00_RF_00_I8_02_ADD DM_12_DM_10_I6_06_MUL DM_02_DM_01_I7_12_MUL

49 RF_00_RF_05_I8_03_ADD RF_01_RF_01_I9_04_ADD DM_03_DM_02_I6_12_MUL DM_04_DM_03_I7_07_MUL

50 RF_05_RF_06_I8_07_ADD RF_02_RF_02_I9_07_ADD DM_05_DM_04_I6_07_MUL DM_06_DM_05_I7_08_MUL

51 RF_03_RF_03_I9_09_ADD DM_07_DM_06_I6_08_MUL DM_08_DM_07_I7_09_MUL

52 RF_02_RF_01_I9_08_ADD RF_04_RF_04_I8_08_ADD DM_09_DM_08_I6_09_MUL DM_10_DM_09_I7_10_MUL

53 RF_03_RF_04_DM_09_ADD RF_11_RF_11_I9_01_ADD DM_11_DM_10_I6_10_MUL DM_01_DM_01_I7_00_MUL

54 RF_06_RF_03_I9_00_ADD RF_05_RF_05_I8_00_ADD DM_02_DM_02_I6_00_MUL DM_03_DM_03_I7_01_MUL

55 RF_04_RF_02_I8_05_ADD RF_06_RF_06_I9_05_ADD DM_04_DM_04_I6_01_MUL DM_05_DM_05_I7_02_MUL

56 RF_12_RF_12_I8_06_ADD DM_06_DM_06_I6_02_MUL DM_07_DM_07_I7_03_MUL

57 RF_09_RF_08_I8_01_ADD RF_07_RF_07_I9_02_ADD DM_08_DM_08_I6_03_MUL DM_09_DM_09_I7_04_MUL

58 RF_08_RF_08_I8_04_ADD DM_10_DM_10_I6_04_MUL DM_01_DM_02_I7_11_MUL

59 RF_00_RF_07_DM_10_ADD RF_09_RF_09_I9_03_ADD DM_02_DM_03_I6_11_MUL DM_03_DM_04_I7_05_MUL

60 RF_07_RF_05_I8_02_ADD RF_10_RF_10_I9_04_ADD DM_04_DM_05_I6_05_MUL DM_05_DM_06_I7_06_MUL

61 RF_01_RF_00_I8_03_ADD RF_00_RF_00_I9_06_ADD DM_06_DM_07_I6_06_MUL DM_07_DM_08_I7_12_MUL

62 RF_05_RF_01_I9_09_ADD RF_01_RF_01_I8_09_ADD DM_08_DM_09_I6_12_MUL DM_09_DM_10_I8_08_MUL

63 RF_02_RF_06_I8_10_ADD RF_02_RF_02_I9_10_ADD DM_01_DM_03_I7_07_MUL DM_02_DM_04_I6_07_MUL

64 RF_03_RF_04_I9_00_ADD RF_03_RF_03_SP_00_ADD DM_03_DM_05_I7_08_MUL DM_04_DM_06_I6_08_MUL

65 RF_08_RF_02_DM_11_ADD RF_04_RF_04_I9_07_ADD DM_05_DM_07_I7_09_MUL DM_06_DM_08_I6_09_MUL

66 RF_11_RF_11_I8_00_ADD DM_01_DM_04_I7_10_MUL DM_02_DM_05_I6_10_MUL

67 RF_09_RF_03_DM_12_ADD RF_05_RF_05_I9_01_ADD DM_03_DM_06_I7_00_MUL DM_04_DM_07_I6_00_MUL

68 RF_04_RF_10_I8_01_ADD RF_06_RF_06_I9_02_ADD DM_05_DM_08_I7_01_MUL DM_06_DM_09_I6_01_MUL

69 RF_06_RF_09_I8_04_ADD RF_12_RF_12_SP_01_ADD DM_07_DM_10_I9_03_MUL DM_01_DM_05_I7_02_MUL

70 RF_10_SP_00_I9_05_ADD RF_07_RF_07_I8_02_ADD DM_02_DM_06_I6_02_MUL DM_03_DM_07_I7_03_MUL

71 RF_08_RF_08_SP_02_ADD DM_04_DM_08_I6_03_MUL DM_05_DM_09_I7_04_MUL

72 RF_01_RF_00_I9_08_ADD RF_09_RF_09_I8_03_ADD DM_06_DM_10_I6_04_MUL DM_01_DM_06_I7_05_MUL

73 RF_00_RF_01_DM_13_ADD RF_10_RF_10_I8_05_ADD DM_02_DM_07_I6_05_MUL DM_03_DM_08_I7_06_MUL

74 RF_07_RF_04_I8_07_ADD RF_00_RF_00_I9_04_ADD DM_04_DM_09_I6_06_MUL DM_05_DM_10_I8_06_MUL

75 RF_02_SP_01_I9_06_ADD RF_01_RF_01_I8_09_ADD DM_01_DM_07_I7_07_MUL DM_02_DM_08_I6_07_MUL

76 SP_02_RF_02_I9_09_ADD DM_03_DM_09_I7_08_MUL DM_04_DM_10_I6_08_MUL

77 RF_02_RF_02_I8_00_ADD DM_01_DM_08_I7_09_MUL DM_02_DM_09_I6_09_MUL

78 RF_03_RF_05_I8_01_ADD RF_03_RF_03_I9_01_ADD DM_03_DM_10_I6_11_MUL DM_01_DM_09_I7_10_MUL

79 RF_05_RF_07_DM_14_ADD RF_04_RF_04_I8_04_ADD DM_02_DM_10_I6_10_MUL DM_01_DM_10_DM_20_MUL

80 RF_08_RF_08_I8_10_ADD RF_05_RF_05_SP_00_ADD DM_07_DM_09_I7_30_MUL DM_08_DM_10_I6_30_MUL

81 RF_04_RF_09_I9_02_ADD RF_06_RF_06_I8_02_ADD

82 RF_09_RF_03_SP_63_ADD RF_07_RF_07_I9_07_ADD

83 RF_01_RF_00_I9_03_ADD RF_08_RF_08_I8_05_ADD

84 RF_09_RF_09_I7_00_ADD

85 RF_06_RF_10_DM_15_ADD

86 SP_00_RF_06_I9_00_ADD

87 RF_10_RF_10_DM_01_ADD

88 RF_03_RF_04_DM_18_ADD RF_30_RF_30_I8_31_ADD

89 RF_07_RF_05_DM_20_ADD RF_00_RF_11_DM_00_ADD

90 RF_02_RF_01_DM_17_ADD

91 RF_00_RF_02_DM_19_ADD

92

93 SP_63_RF_31_DM_16_ADD
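The mnemonics in the listing above appear to follow the pattern source_index_source_index_destination_index_operation, using the register names of Appendix A. The parser below is a sketch under that assumption; the field interpretation, the reading of the leading number as a row index, and all names in the code are hypothetical rather than taken from the thesis.

```python
from typing import List, NamedTuple, Tuple


class MicroOp(NamedTuple):
    src_a: str        # first source register (DM, SP, or RF)
    src_a_index: int
    src_b: str        # second source register
    src_b_index: int
    dest: str         # destination register (DM, SP, or I0..I9)
    dest_index: int
    op: str           # operation mnemonic, e.g. ADD or MUL


def parse_micro_op(token: str) -> MicroOp:
    """Split one mnemonic such as 'DM_12_DM_02_I6_05_MUL' into its fields."""
    parts = token.split("_")
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}: {token!r}")
    src_a, a_idx, src_b, b_idx, dest, d_idx, op = parts
    return MicroOp(src_a, int(a_idx), src_b, int(b_idx), dest, int(d_idx), op)


def parse_row(line: str) -> Tuple[int, List[MicroOp]]:
    """Parse one listing row: a leading row number, then zero or more mnemonics."""
    fields = line.split()
    if not fields:
        raise ValueError("empty row")
    return int(fields[0]), [parse_micro_op(t) for t in fields[1:]]


row, ops = parse_row("15 RF_01_SP_00_DM_01_ADD DM_09_DM_01_I7_00_MUL DM_10_DM_02_I6_00_MUL")
print(row, [(o.op, o.dest) for o in ops])
```

A row with no mnemonics, such as row 92, parses to an empty list, which is consistent with reading it as a row in which no function unit issues an operation.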
