Chapter 4 Implementation and Comparison
4.2 Application of Reconfigurable Fixed-Width Multiplier
Next, the proposed 8x8 reconfigurable fixed-width multiplier is applied to a 35-tap FIR filter for speech processing. The behavior of a digital FIR filter can be expressed as in (13).
$$O(i) = \sum_{m=0}^{L-1} H(m)\, S(i-m)$$
(13) where O(i), H(i), and S(i) denote the output sequence, the filter coefficient, and the input sequence at the ith discrete time, respectively, and L is the number of taps. To evaluate the various multipliers, we take 1,000 samples for the consonant part and the vowel part of the word “Chicken.” The floating-point output is regarded as the error-free reference output and is used to assess the FIR filter performance using the four proposed modes and an 8x8 full-precision multiplier mode. The error performance and power consumption are tabulated in Table 4.8, where the error performance is measured by signal-to-noise ratio (SNR) and the power consumption is still measured at 125 MHz. From the comparison results in Table 4.8, the CM1 mode achieves a higher SNR than the other three CMs, namely the CM2, CM4, and HPFM modes. However, the CM1 mode consumes more power than the other three proposed modes. Compared with the CM1 mode, the CM2 and CM4 modes achieve 53.20% and 51.48% power savings, respectively, at the cost of a 14.57 dB SNR loss in this benchmark. Because the CM1 mode and the 8x8 full-precision multiplier mode have the same input bit width, they can be compared fairly: the CM1 mode consumes 39.23% less power than the full-precision mode, with a 10.21 dB SNR loss. Under the same 4-bit input bit width, the CM2 and CM4 modes save 31.28% and 28.76% power, respectively, relative to the CM3 mode, with a 4.66 dB SNR loss. Since the CM2 and CM4 modes can generate two multiplication products concurrently, only L/2 nxn multipliers are needed to complete the FIR filter operation; that is, these modes save L/2 nxn multipliers. The SNR and power consumption of the CM3 mode lie between those of the CM1 mode and the CM2/CM4 modes.
To compare with the published FIR filter work [33-35], the normalized power dissipation per multiplier, P(mult), introduced in [33], is used. It is given by
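$$P(\mathrm{mult}) = \frac{\text{Total Power}}{\#\text{mults}} \times \frac{8}{\#\text{coeff bits}} \times \frac{8}{\#\text{sample bits}} \times \left(\frac{1.8}{V_{dd}}\right)^{2} \times \frac{0.18}{\text{Tech}} \times \frac{125}{\text{clk freq}}, \qquad (14)$$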
where Vdd and Tech denote the power supply voltage and the process technology, respectively. From the comparison results in Table 4.9, the FIR filter using the proposed reconfigurable multiplier has the lowest normalized power consumption. The main reason is that the fixed-width multiplier consumes less power than the full-precision multiplier, as shown in Table 4.8. Hence, in this benchmark, as listed in Tables 4.8 and 4.9, the proposed reconfigurable fixed-width Booth multiplier shows power-scalable capability and better power saving with satisfactory error performance compared with the other FIR filters.
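The evaluation flow described in this section (filter the samples according to (13), then compare against the floating-point output by SNR) can be sketched as follows. This is only an illustrative sketch: the names `fir_filter`, `snr_db`, and `truncating_multiply` are hypothetical, samples before time 0 are assumed to be zero, SNR is assumed to be the usual ratio of reference-signal power to error power, and the truncating multiplier is a simple stand-in that does not model the proposed compensation circuits.

```python
import math


def fir_filter(coeffs, samples, multiply=lambda h, s: h * s):
    """Direct-form FIR per (13): O(i) = sum_{m=0}^{L-1} H(m) * S(i - m).

    Samples before time 0 are treated as zero (an assumption).
    `multiply` is a hypothetical hook for plugging in a multiplier model.
    """
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for m, h in enumerate(coeffs):
            if i - m >= 0:
                acc += multiply(h, samples[i - m])
        out.append(acc)
    return out


def snr_db(reference, test):
    """SNR in dB: power of the reference over power of the error."""
    signal = sum(r * r for r in reference)
    noise = sum((r - t) ** 2 for r, t in zip(reference, test))
    if noise == 0:
        return math.inf
    return 10.0 * math.log10(signal / noise)


def truncating_multiply(h, s, frac_bits=7):
    """Illustrative stand-in: keep only `frac_bits` fractional bits of the
    product by truncation. NOT the thesis's fixed-width multiplier."""
    scale = 1 << frac_bits
    return math.floor(h * s * scale) / scale


# Toy data (assumed): a 3-tap smoothing filter and a synthetic sine input.
coeffs = [0.25, 0.5, 0.25]
samples = [math.sin(0.05 * k) for k in range(1000)]

reference = fir_filter(coeffs, samples)                      # floating-point
approx = fir_filter(coeffs, samples, truncating_multiply)    # truncated products
print(f"SNR of truncated-product filter: {snr_db(reference, approx):.2f} dB")
```

Passing the multiplier model as a function keeps the filter loop unchanged while different multiplier behaviors are swapped in, which mirrors how one filter structure is evaluated under several modes here.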
Table 4.8: Evaluation results of error signals and power consumption obtained by the proposed 8x8 reconfigurable fixed-width Booth multiplier for FIR filter application
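| Design | Mode | Power | SNR |
|---|---|---|---|
| FIR Filter Using Reconfigurable Fixed-Width Multiplier | CM1 Mode | 25.02 mW | 25.46 dB |
| FIR Filter Using Reconfigurable Fixed-Width Multiplier | CM2 Mode | 11.71 mW | 10.89 dB |
| FIR Filter Using Reconfigurable Fixed-Width Multiplier | CM3 Mode | 17.04 mW | 15.55 dB |
| FIR Filter Using Reconfigurable Fixed-Width Multiplier | CM4 Mode | 12.14 mW | 10.89 dB |
| FIR Filter Using 8x8 Non-Reconfigurable Full-Precision Multiplier | - | 41.17 mW | 35.67 dB |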
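The power-saving percentages and SNR losses quoted for Table 4.8 follow directly from the tabulated values. The short sketch below recomputes them; the helper names `power_saving_percent` and `snr_loss_db` are illustrative, and the numbers are copied from Table 4.8.

```python
# (power in mW, SNR in dB), copied from Table 4.8.
table_4_8 = {
    "CM1": (25.02, 25.46),
    "CM2": (11.71, 10.89),
    "CM3": (17.04, 15.55),
    "CM4": (12.14, 10.89),
    "Full-precision": (41.17, 35.67),
}


def power_saving_percent(mode, baseline):
    """Relative power saving of `mode` with respect to `baseline`, in percent."""
    p_mode = table_4_8[mode][0]
    p_base = table_4_8[baseline][0]
    return 100.0 * (p_base - p_mode) / p_base


def snr_loss_db(mode, baseline):
    """SNR degradation of `mode` with respect to `baseline`, in dB."""
    return table_4_8[baseline][1] - table_4_8[mode][1]


comparisons = [
    ("CM2", "CM1"),
    ("CM4", "CM1"),
    ("CM1", "Full-precision"),
    ("CM2", "CM3"),
    ("CM4", "CM3"),
]
for mode, baseline in comparisons:
    print(
        f"{mode} vs {baseline}: "
        f"{power_saving_percent(mode, baseline):.2f}% power saving, "
        f"{snr_loss_db(mode, baseline):.2f} dB SNR loss"
    )
```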
Table 4.9: Comparison results among FIR filters
| Ref. | Description | Tech. | Vdd | Power | P(mult) |
|---|---|---|---|---|---|
| [33] | 128-tap, 32 10x12 mult @ 80 MHz | 0.5 µm | 3.3 V | 415 mW | 1.16 mW |
| [34] | 12-tap, 12 10x8 mult @ 40 MHz | 0.5 µm | 1.8 V | 10.9 mW | 0.82 mW |
| [35] | 32-digit, 11.52 12x8 mult @ 100 MHz | 0.35 µm | 2.5 V | 80 mW | 1.54 mW |
| This work (CM1 mode) | 35-tap, 35 8x8 mult @ 125 MHz | 0.18 µm | 1.8 V | 25.02 mW | 0.71 mW |
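As a check on the normalization in (14), the P(mult) column of Table 4.9 can be recomputed from the other columns. The sketch below is illustrative: the function name `p_mult` and its parameter names are hypothetical, and which operand width of each multiplier is the coefficient and which is the sample is an assumption (it does not affect the result, since both factors enter symmetrically).

```python
def p_mult(total_power_mw, n_mults, coeff_bits, sample_bits,
           vdd_v, tech_um, clk_mhz):
    """Normalized power per multiplier per (14), scaled to an 8x8 multiplier
    at 1.8 V, 0.18 um, and 125 MHz. Result is in mW."""
    return (
        (total_power_mw / n_mults)
        * (8 / coeff_bits)
        * (8 / sample_bits)
        * (1.8 / vdd_v) ** 2
        * (0.18 / tech_um)
        * (125 / clk_mhz)
    )


# Rows of Table 4.9: (power mW, #mults, bits, bits, Vdd, Tech um, clk MHz).
designs = {
    "[33]": (415.0, 32, 10, 12, 3.3, 0.5, 80),
    "[34]": (10.9, 12, 10, 8, 1.8, 0.5, 40),
    "[35]": (80.0, 11.52, 12, 8, 2.5, 0.35, 100),
    "This work (CM1 mode)": (25.02, 35, 8, 8, 1.8, 0.18, 125),
}

for name, params in designs.items():
    print(f"{name}: P(mult) = {p_mult(*params):.2f} mW")
```

For this work every scaling factor equals 1, so P(mult) reduces to the total power divided by the 35 multipliers.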
Chapter 5
Conclusion and Future Work
This thesis proposed a framework for the reconfigurable fixed-width Booth multiplier and the reconfigurable fixed-width Baugh-Wooley multiplier to generate a family of fixed-width and full-precision multipliers. The implementation results show that the four configuration modes of the reconfigurable fixed-width multiplier can provide high-resolution, parallel, or full-precision multiplication for different computation demands. In addition, the proposed reconfigurable fixed-width Booth multiplier saves 14.0% power on average relative to the non-reconfigurable fixed-width Booth multiplier with n=16. Likewise, the proposed pipelined reconfigurable fixed-width Baugh-Wooley multiplier saves 12.56% power on average in comparison with the pipelined non-reconfigurable fixed-width Baugh-Wooley multiplier with n=16. Future work is to apply the developed multipliers to power-aware systems.
Bibliography
[1] A. D. Booth, “A signed binary multiplication technique,” Quart. J. Mech. Appl. Math., vol. 4, pp. 236-240, 1951.
[2] O. L. MacSorley, “High-speed arithmetic in binary computers,” Proc. IRE, vol. 49, pp. 67-91, 1961.
[3] H. Sam and A. Gupta, “A generalized multibit recoding of two’s complement binary numbers and its proof with applications in multiplier implementations,” IEEE Trans. Comput., vol. 39, pp. 1006-1015, Aug. 1990.
[4] C. R. Baugh and B. A. Wooley, “A two’s complement parallel array multiplication algorithm,” IEEE Trans. Comput., vol. C-22, no. 12, pp. 1045-1047, Dec. 1973.
[5] K. Hwang, Computer Arithmetic: Principles, Architecture, and Design. New York: John Wiley, 1979.
[6] F. Cavanagh, Digital Computer Arithmetic: Design and Implementation. New York: McGraw-Hill, 1984.
[7] M. D. Ercegovac and T. Lang, Digital Arithmetic. Morgan Kaufmann, 2004.
[8] S. L. Freeny, “Special-purpose hardware for digital filtering,” Proc. IEEE, vol. 63, no. 4, pp. 633-647, Apr. 1975.
[9] Y. C. Lim, “Single-precision multiplier with reduced circuit complexity for signal processing applications,” IEEE Trans. Comput., vol. 41, no. 10, pp. 1333-1336, Oct. 1992.
[10] M. J. Schulte and E. E. Swartzlander, Jr., “Truncated multiplication with correction constant,” VLSI Signal Processing, VI, New York: IEEE Press, 1993, pp. 388-396.
[11] S. S. Kidambi, F. El-Guibaly, and A. Antoniou, “Area-efficient multipliers for digital signal processing applications,” IEEE Trans. Circuits Syst. II, vol. CAS-43, no. 2, pp. 90-94, Feb. 1996.
[12] E. J. King and E. E. Swartzlander, Jr., “Data-dependent truncation scheme for parallel multipliers,” in Proc. 31st Asilomar Conference on Signals, Systems, and Computers, 1997, vol. 2, pp. 1178-1182, Pacific Grove, CA.
[13] E. E. Swartzlander, Jr., “Truncated multiplication with approximate rounding,” in Proc. 33rd Asilomar Conference on Signals, Systems, and Computers, 1999, vol. 2, pp. 1480-1483.
[14] J. M. Jou, S. R. Kuang, and R. D. Chen, “Design of low-error fixed-width multiplier for DSP applications,” IEEE Trans. Circuits Syst. II, vol. CAS-46, no. 6, pp. 836-842, Jun. 1999.
[15] L. D. Van, S. S. Wang, and W. S. Feng, “Design of the lower-error fixed-width multiplier and its application,” IEEE Trans. Circuits Syst. II, vol. 47, pp. 1112-1118, Oct. 2000.
[16] S. J. Jou, M. H. Tsai, and Y. L. Tsao, “Low-error reduced-width Booth multipliers for DSP applications,” IEEE Trans. Circuits Syst. I, vol. 50, pp. 1470-1474, Nov. 2003.
[17] K. J. Cho, K. C. Lee, J. G. Chung, and K. K. Parhi, “Design of low-error fixed-width modified Booth multiplier,” IEEE Trans. VLSI Syst., vol. 12, no. 5, pp. 522-531, May 2004.
[18] T. B. Juang and S. F. Hsiao, “Low-error carry-free fixed-width multipliers with low-cost compensation circuits,” IEEE Trans. Circuits Syst. II, vol. 52, pp. 299-303, Jun. 2005.
[19] L. D. Van and C. C. Yang, “Generalized low-error area-efficient fixed-width multipliers,” IEEE Trans. Circuits Syst. I, vol. 52, pp. 1608-1619, Aug. 2005.
[20] M. A. Song, L. D. Van, and S. Y. Kuo, “Adaptive low-error fixed-width Booth multipliers,” IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. E90-A, no. 6, pp. 1180-1187, Jun. 2007.
[21] S. Krithivasan and M. J. Schulte, “Multiplier architectures for media processing,” in Proc. IEEE Asilomar Conference on Signals, Systems, and Computers, Nov. 2003, vol. 2, pp. 2193-2197.
[22] Y.-H. Huang, H.-P. Ma, M.-L. Liou, and T.-D. Chiueh, “A 1.1 G MAC/s sub-word-parallel digital signal processor for wireless communication applications,” IEEE J. Solid-State Circuits, vol. 39, pp. 169-183, Jan. 2004.
[23] S. Krithivasan, M. J. Schulte, and J. Glossner, “A subword-parallel multiplication and sum-of-squares unit,” in Proc. IEEE Computer Society Annual Symposium on VLSI, Feb. 2004, pp. 273-274.
[24] Y.-L. Tsao, W.-H. Chen, M.-H. Tan, M.-C. Lin, and S.-J. Jou, “Low-power embedded DSP core for communication systems,” EURASIP Journal on Applied Signal Processing, pp. 1355-1370, Jan. 2003.
[25] D. Tan, A. Danysh, and M. Liebelt, “Multiple-precision fixed-point vector multiply-accumulator using shared segmentation,” in Proc. IEEE Symposium on Computer Arithmetic, Jun. 2003, pp. 12-19.
[26] C. L. Wey and J. F. Li, “Design of reconfigurable array multipliers and multiplier-accumulators,” in Proc. IEEE Asia-Pacific Conference on Circuits and Systems, Dec. 2004, pp. 37-40.
[27] R. Lin, “Reconfigurable parallel inner product processor architecture,” IEEE Trans. VLSI Syst., vol. 9, pp. 261-272, Apr. 2001.
[28] K. Tatas, G. Koutroumpezis, D. Soudris, and A. Thanailakis, “Architecture design of a coarse-grain reconfigurable multiply-accumulate unit for data-intensive applications,” Integration, the VLSI Journal, vol. 40, pp. 74-93, Feb. 2007.
[29] S. D. Haynes and P. Y. K. Cheung, “Configurable multiplier blocks for embedding in FPGAs,” Electronics Letters, vol. 34, no. 7, pp. 638-639, Apr. 1998.
[30] J. Di and J. S. Yuan, “Run-time reconfigurable power-aware pipelined signed array multiplier design,” in Proc. IEEE International Symposium on Signals, Circuits, and Systems, July 2003, vol. 2, pp. 405-406.
[31] M. Sjalander, M. Drazdziulis, P. Larsson-Edefors, and H. Eriksson, “A low-leakage twin-precision multiplier using reconfigurable power gating,” in Proc. IEEE International Symposium on Circuits and Systems, May 2005, vol. 2, pp. 1654-1657.
[32] S.-R. Kuang and J.-P. Wang, “Design of power-efficient pipelined truncated multipliers with various output precision,” IET Computers & Digital Techniques, vol. 1, pp. 129-136, Mar. 2007.
[33] C. J. Nicol, P. Larsson, K. Azadet, and J. H. O’Neill, “A low power 128-tap digital adaptive equalizer for broadband modems,” IEEE J. Solid-State Circuits, vol. 32, no. 11, pp. 1777-1789, Nov. 1997.
[34] C. Henning, R. Schwann, V. Gierenz, and T. G. Noll, “A low power reconfigurable 12-tap FIR interpolation filter with fixed coefficient sets,” in Proc. IEEE 26th European Solid-State Circuits Conference, Sep. 2000, pp. 81-84.
[35] K.-H. Chen and T.-D. Chiueh, “A low-power digit-based reconfigurable FIR filter,” IEEE Trans. Circuits Syst. II, vol. 53, pp. 617-621, Aug. 2006.
Biography
Jin-Hao Tu was born in Taipei, Taiwan, R.O.C., in 1984. He received the B.S. degree from National Changhua University of Education, Changhua, Taiwan, in 2006, and the M.S. degree from National Chiao Tung University (NCTU), Hsinchu, Taiwan, in 2008, both in computer science. His research interests are computer arithmetic and 3D graphics system design.