Reconfigurable Parameters Setup

Chapter 5 Application Notes

5.3 Reconfigurable Parameters Setup

The proposed design has been detailed in the previous sections; this section describes, in tabular form, how to reconfigure its parameters. Table 5.5 lists the I/O interface of the proposed design. Table 5.6, identical to Table 3.3, lists again the possible SW combination schemes of the proposed design.

Table 5.7 and Table 5.8 give some examples of configuring the kill and mode signals, respectively. Points and exceptions to be noted are listed as notes at the bottom of each table.

Table 5.5. Interface of the proposed design.

MAC           Scalar          16-bit          32-bit          64-bit
MULTIPLICAND  mcand[N-1:0]    mcand[15:0]     mcand[31:0]     mcand[63:0]
MULTIPLIER    mlier[N-1:0]    mlier[15:0]     mlier[31:0]     mlier[63:0]
ACCUMULATOR   accu[2N-1:0]    accu[31:0]      accu[63:0]      accu[127:0]
MODE          mode[1:0]       mode_v0[1:0]
RESULT        m_out[2N-1:0]   m_out[31:0]     m_out[63:0]     m_out[127:0]
CARRY-OUT     cout            cout_v0         cout

Note: N is the bit width of scalar input operands

Note: v0, v1, …, v7 indicate SWs in order; v7 aligns to MSB; v0, LSB

Note: For all MODE signal: 1?: mixed-mode; 00: unsigned; 01: signed

Note: kill is inserted between SWs; kill2 between v2 and v3, and the like
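The MODE encoding given in the notes of Table 5.5 can be illustrated with a minimal Python sketch. The function name `decode_mode` is hypothetical and is not part of the proposed design; the sketch only assumes that the left digit of the encoding is mode[1].

```python
def decode_mode(mode: int) -> str:
    """Map a 2-bit MODE value to the operand interpretation noted under Table 5.5.

    Illustrative helper only; assumes the left digit of the encoding is mode[1].
    """
    if not 0 <= mode <= 0b11:
        raise ValueError("mode must be a 2-bit value")
    if mode & 0b10:
        # "1?": the upper bit is set and the lower bit is a don't-care
        return "mixed-mode"
    # "01" selects signed operands, "00" unsigned operands
    return "signed" if mode & 0b01 else "unsigned"
```

Both 10 and 11 therefore select mixed-mode operation, since the lower bit is ignored when the upper bit is set.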

Table 5.6. Possible sub-word combinations of the proposed SWP MAC design.

A 64-bit SWP MAC is viewed as consisting of two independent 32-bit SWP MACs; it therefore has 5×5 = 25 possible combinations
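The count of 25 can be reproduced with a short Python sketch. It assumes that the five schemes of a 32-bit SWP MAC are exactly the partitions into 8-, 16-, and 32-bit sub-words in which every sub-word is aligned to its own width; four of them, (32), (16,16), (8,8,8,8), and (8,8,16), appear in Table 5.8, while (16,8,8) is inferred here. The helper name `aligned_partitions` is hypothetical.

```python
from itertools import product

def aligned_partitions(width: int, min_width: int = 8):
    """List every split of `width` bits into sub-words aligned to their own size.

    Assumption: sub-word widths are powers of two no smaller than `min_width`.
    Each partition is a tuple written MSB-first.
    """
    if width == min_width:
        return [(width,)]
    halves = aligned_partitions(width // 2, min_width)
    # Either keep the block whole, or split it into two independently
    # partitioned halves (upper half first).
    return [(width,)] + [hi + lo for hi, lo in product(halves, halves)]

schemes_32 = aligned_partitions(32)
# A 64-bit unit viewed as two independent 32-bit units: pair every scheme
# of the upper half with every scheme of the lower half.
combos_64 = [hi + lo for hi, lo in product(schemes_32, schemes_32)]
print(len(schemes_32), len(combos_64))  # 5 25
```

Note that the scalar (64) configuration is not produced by this pairing, because it is not a combination of two independent 32-bit halves.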

Table 5.7. Configuration example of KILL signal.

KILL kill6 kill5 kill4 kill3 kill2 kill1 kill0

Note: Only some of the possible conditions are listed

Note: An illegal input will be redirected to scalar mode by default
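The relation between the kill signals and the sub-word combination, including the fallback to scalar mode, can be sketched in Python as follows. The sketch rests on assumptions that are not confirmed by the table: a kill bit of 1 is taken to block the carry chain between v_i and v_(i+1), and a pattern is taken to be illegal when it yields a sub-word whose width is not a power of two or that is not aligned to its own width. The function name `decode_kill` is hypothetical.

```python
def decode_kill(kill, sw_bits=8):
    """Translate kill bits into an MSB-first sub-word combination.

    `kill[i]` is the bit between v_i and v_(i+1), so kill[0] is kill0.
    ASSUMPTION: kill[i] == 1 separates the two neighbouring sub-words.
    An illegal pattern falls back to scalar mode (one full-width word).
    """
    n_sw = len(kill) + 1
    total = n_sw * sw_bits
    widths = []          # collected LSB-first
    run_start = 0        # index of the lowest v_i in the current sub-word
    for i in range(n_sw):
        if i == n_sw - 1 or kill[i]:
            width = (i + 1 - run_start) * sw_bits
            lsb = run_start * sw_bits
            # Reject widths that are not powers of two, or misaligned words.
            if width & (width - 1) or lsb % width:
                return (total,)
            widths.append(width)
            run_start = i + 1
    return tuple(reversed(widths))

# kill6..kill0 = 1 1 0 1 0 0 0, stored here with kill0 first
print(decode_kill([0, 0, 0, 1, 0, 1, 1]))  # (8, 8, 16, 32)
```

Under these assumptions, a pattern such as kill2 = 1 alone would create a 24-bit sub-word and is therefore treated as scalar mode.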

Table 5.8. Configuration example of MODE signal.

MODE               SW_7  SW_6  SW_5  SW_4  SW_3  SW_2  SW_1  SW_0
16-bit SWP MAC
(16)               N/A   N/A   N/A   N/A   N/A   N/A   ○     ╳
(8,8)              N/A   N/A   N/A   N/A   N/A   N/A   ○     ○
32-bit SWP MAC
(32)               N/A   N/A   N/A   N/A   ○     ╳     ╳     ╳
(16,16)            N/A   N/A   N/A   N/A   ○     ╳     ○     ╳
(8,8,8,8)          N/A   N/A   N/A   N/A   ○     ○     ○     ○
(8,8,16)           N/A   N/A   N/A   N/A   ○     ○     ○     ╳
64-bit SWP MAC
(64)               ○     ╳     ╳     ╳     ╳     ╳     ╳     ╳
(32,32)            ○     ╳     ╳     ╳     ○     ╳     ╳     ╳
(16,16,16,16)      ○     ╳     ○     ╳     ○     ╳     ○     ╳
(8,8,8,8,8,8,8,8)  ○     ○     ○     ○     ○     ○     ○     ○
(16,16,32)         ○     ╳     ○     ╳     ○     ╳     ╳     ╳
(8,8,8,8,32)       ○     ○     ○     ○     ○     ╳     ╳     ╳
(8,8,16,32)        ○     ○     ○     ╳     ○     ╳     ╳     ╳

○: Configurable

╳: Can't configure; should be identical with the nearest ○ on the left side

Note: Incorrect assignment of mode may cause a wrong result
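The rule behind the ○/╳ pattern of Table 5.8 can be expressed as a small Python sketch: the most significant SW slot of each sub-word is configurable, and every remaining slot of that sub-word must copy it. The helper names `mode_pattern` and `expand_modes` are hypothetical, and combinations are written MSB-first as in the table.

```python
def mode_pattern(combo, sw_bits=8):
    """Return the ○/╳ flags for SW_(n-1) .. SW_0 of an MSB-first combination."""
    flags = []
    for width in combo:
        # The leftmost slot of each sub-word is configurable (○);
        # the slots to its right must follow it (╳).
        flags += ["○"] + ["╳"] * (width // sw_bits - 1)
    return flags

def expand_modes(combo, modes, sw_bits=8):
    """Replicate one mode per sub-word onto every SW slot that it spans."""
    if len(modes) != len(combo):
        raise ValueError("exactly one mode per sub-word is required")
    slots = []
    for width, mode in zip(combo, modes):
        slots += [mode] * (width // sw_bits)
    return slots

print("".join(mode_pattern((8, 8, 16, 32))))     # ○○○╳○╳╳╳
print(expand_modes((16, 16), [0b01, 0b00]))      # [1, 1, 0, 0]
```

Generating the per-slot mode values with a helper of this kind avoids the incorrect assignments mentioned in the note, since every ╳ slot is filled from its ○ slot by construction.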

Chapter 6 Conclusions

In this thesis, we present the design methodology of a high-performance reconfigurable modified Booth encoded MAC unit. It supports sub-word parallel (SWP) operation, which enhances the computational throughput of many DSP algorithms, especially those for multimedia applications. The scalar version of the proposed design comprises a high-speed, area-reduced, and race-free MBE; a speed-optimized Wallace PPRT using TDM; and a high-speed, area-minimized Fong adder. Using essentially the same hardware, SWP is performed on the scalar MAC by applying some preprocessing to the operands, together with a new arrangement of the SWPPA and carry-chain blocking when all partial products are accumulated. A novel full-adder carry-out masking concept is proposed to build the SWPPRT, facilitating the use of TDM. The SWP version of the Fong adder inherits the merits of its scalar counterpart and supports the same SW combinations that our design requires. The proposed SWP design innovatively features a flexible sub-word combination and mode assignment scheme, with nearly the same delay and a modest area overhead compared with the proposed scalar design. The proposed designs are fully synthesizable and written in a reusable and verifiable design style. Experimental results demonstrate that, in most cases, the proposed scalar and SWP designs outperform the designs of DesignWare® IP [38] and of [10] in terms of critical path delay, area cost, and power consumption.

Future Works

We are developing a generator that produces the RTL code of the proposed MAC designs in Verilog HDL. A testbench for verification, a synthesis script, and a user’s manual will also be generated. All output files depend on the user-reconfigurable inputs. We are also analyzing the pros and cons of replacing the scalar MAC units in multiple-MAC DSP processors with the proposed SWP MAC in order to design a high-performance MAC unit.

Bibliography

[1] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, p. 698, p. 484, p. 488, John Wiley & Sons, 1999.

[2] P. Lapsley, J. Bier, A. Shoham and E. Lee, DSP Processor Fundamentals: Architectures and Features, p. 9, p. 35, p. 47, Berkeley Design Technology Inc., 1996.

[3] B. Parhami, Computer Arithmetic Algorithms and Hardware Design, pp. 204-205, pp. 149-151, pp. 133-134, pp. 98-99, Oxford University Press, New York, 2000.

[4] O. L. MacSorley, “High-Speed Arithmetic in Binary Computers,” Proc. IRE, vol. 49, pp. 67-91, 1961.

[5] C. Wallace, “A Suggestion for a Fast Multiplier,” IEEE Trans. on Electronic Computers, vol. 13, pp. 14-17, 1964.

[6] S. Krithivasan and M. J. Schulte, “Multiplier Architectures for Media Processing,” Proc. 37th Asilomar Conf. Signals, Systems, and Computers, pp. 2193-2197, Nov. 2003.

[7] M. Keating and P. Bricaud, Reuse Methodology Manual for System-on-Chip Designs, Kluwer Academic Publishers, third edition, 2002.

[8] V. G. Oklobdzija, D. Villeger, and S. S. Liu, “A Method for Speed Optimized Partial Product Reduction and Generation of Fast Parallel Multipliers Using an Algorithmic Approach,” IEEE Trans. Computers, vol. 45, no. 3, pp. 294-305, March 1996.

[9] W.-C. Yeh and C.-W. Jen, “High-Speed Booth Encoded Parallel Multiplier Design,” IEEE Trans. Computers, vol. 49, no. 7, pp. 692-701, July 2000.

[10] A. Danysh and D. Tan, “Architecture and Implementation of a Vector/SIMD Multiply-Accumulate Unit,” IEEE Transactions on Computers, vol. 54, no. 3, pp. 284-293, Mar., 2005.

[11] D. Tan, A. Danysh, and M. Liebelt, “Multiple-Precision Fixed-Point Vector Multiply-Accumulator Using Shared Segmentation,” Proc. 16th IEEE Symposium on Computer Arithmetic (ARITH-16 '03), p. 12, 2003.

[12] G. W. Bewick, "Fast Multiplication: Algorithms and Implementation," PhD dissertation, pp. 14-16, appendix A, pp. 13-14, Stanford University, Department of Electrical Engineering, Feb., 1994.

[13] A. D. Booth, “A Signed Binary Multiplication Technique,” Quarterly J. Mechanical and Applied Math., vol. 4, pp. 236-240, 1951.

[14] L. Dadda, “Some Schemes for Parallel Multipliers,” Alta Frequenza, pages 349-356, March 1965.

[15] M. Santoro, “Design and Clocking of VLSI Multipliers”, PhD dissertation, Stanford University, Department of Electrical Engineering, 1989.

[16] R. Fried, “Minimizing Energy Dissipation in High-Speed Multipliers,” Proc. 1997 Int'l Symp. Low Power Electronics and Design, pp. 214-219, 1997.

[17] M. Annaratone and W. Z. Shen, “The Design of an LSI Booth Multiplier,” Carnegie Mellon University Thesis report (CS), no. 150, 1984.

[18] A. A. Farooqui and V. G. Oklobdzija, “General Data-Path Organization of a MAC Unit for VLSI Implementation of DSP Processors,” Proc. 1998 IEEE Int'l Symp. Circuits and Systems, vol. 2, pp. 260-263, 1998.

[19] S. Vassiliadis, E. M. Schwarz, and B. M. Sung, “Hard-Wired Multipliers with Encoded Partial Products,” IEEE Trans. Computers, vol. 40, no. 11, pp. 1181-1197, Nov. 1991.

[20] P. F. Stelling, C. U. Martel, V. G. Oklobdzija, and R. Ravi, “Optimal circuits for parallel multipliers,” IEEE Transactions on Computers, vol. 47, no. 3, pp. 273-285, Mar. 1998.

[21] D. A. Patterson and J. L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, pp. 241-249, Morgan Kaufmann Publishers, Inc., 2nd Edition, 1998.

[22] R. P. Brent and H. T. Kung, “A regular layout for parallel adders,” IEEE Transactions on Computers, vol. 31, no. 3, pp. 260-264, 1982.

[23] T. Han, D. A. Carlson, and S. P. Levitan, “Fast Area Efficient VLSI Adders,” IEEE International Conference on Computer Design, pages 418-422, October 1987.

[24] H. Ling, “High-Speed Binary Adder,” IBM J. Res. Develop., vol. 25, no. 3, pp. 156-166, May 1981.

[25] G. Dimitrakopoulos and D. Nikolos, “High-Speed Parallel-Prefix VLSI Ling Adders,” IEEE Trans. Computers, vol. 54, no. 2, Feb. 2005.

[26] Y.-C. Fong, “A High-Speed Area-Minimized Reconfigurable Adder Design,” Master’s thesis, National Chiao Tung University, Department of Electronics Engineering, Jul. 2006.

[27] Analog Devices, Blackfin® Processor Hardware Reference, revision 3.0, Sep., 2004. Available from www.analog.com.

[28] Texas Instruments, TMS320C6000 CPU and Instruction Set Reference Guide, revision F, Oct. 2000. Available from www.ti.com.

[29] C. G. Lee and M. G. Stoodley, “Simple Vector Microprocessors for Multimedia Applications,” Proc. 31st Ann. ACM/IEEE Int’l Symp. Microarchitecture, pp. 25-36, 1998.

[30] R. B. Lee, “Multimedia Extensions for General-Purpose Processors,” Proc. Signal Processing Systems (SIPS ’97), pp. 9-23, Nov. 1997.

[31] N. Burgess, “PAPA—Packed Arithmetic on a Prefix Adder For Multimedia Applications,” Proc. IEEE Int’l Conf. Application-Specific Systems, Architectures and Processors, pp. 197-207, July 2002.

[32] A. A. Farooqui, V. G. Oklobdzija, and F. Chehrazi, “Multiplexer Based Adder for Media Signal Processing,” Proc. 1999 Int’l Symp. VLSI Technology, Systems, and Applications, pp. 100-103, June 1999.

[33] C. R. Baugh and B. A. Wooley, “A two's complement parallel array multiplication algorithm,” IEEE Transactions on Computers, vol. 22, pp. 1045-1047, December 1973.

[34] M. J. Schulte, L. P. Marquette, S. Krithivasan, E. G. Walters, and J. Glossner, “Combined Multiplication and Sum-of-Squares Units,” Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures, and Processors, pp. 204-214, June, 2003.

[35] S. Krithivasan, M. J. Schulte, and J. Glossner, “A Subword-Parallel Multiplication and Sum-of-Squares Unit,” Proc. IEEE Computer Society Annual Symposium on VLSI: Emerging Trends in VLSI Systems Design (ISVLSI'04), p. 273, 2004.

[36] T. K. Callaway and E. E. Swartzlander, Jr., “Power-Delay Characteristics of CMOS Multipliers,” Proceedings of the 13th IEEE Symposium on Computer Arithmetic, pp. 26-32, 1997.

[37] Artisan Components, UMC 0.18μm L180 Process 1.8-Volt Sage-X™ Standard Cell Library Databook, release 2.0, pp. 32-33, Nov. 2003.

[38] Synopsys Inc., DesignWare® Building Block IP Documentation Overview, Jan. 17, 2005.

[39] Synopsys Inc., Design Compiler® User Guide, version W-2004.12, Dec., 2004.

[40] Synopsys Inc., PrimePower® Manual, version W-2004.12, Dec., 2004.

[41] Cadence Design Systems Inc., Verilog®-XL User Guide, version 3.4, Jan., 2002.

[42] Novas Software Inc., nLint® User Guide and Tutorial, version 2.2, Dec., 2004.

[43] TransEDA Technology Ltd., Verification Navigator® User Guide, version 2005.03, Mar., 2005.

[44] Cadence Design Systems Inc., Encounter™ Conformal® Equivalence Checking User Guide, version 5.1, June, 2005.
