Low-cost Two's Complement Multipliers Using Signed Binary Digits for High-speed Digital Systems


Shiann-Rong Kuang, Shao-Hean Hsu, Shu-You Liu, and Kuo-Chin Huang
Department of Computer Science Engineering, National Sun Yat-Sen University

Abstract. Multiplication is the most important operation in many high-speed digital systems. The redundant binary number system has been used to design fast multipliers, but their area may be larger than that of other kinds of multipliers. In this paper, area-efficient two's complement multipliers using the binary signed-digit number system are designed for digital systems with constant data size by truncating the 2n-bit product into n bits. Based on the variable correction value scheme, a novel carry compensation formulation and the corresponding circuit are developed to greatly reduce the product error. Simulation results show that the proposed truncated multipliers are more accurate than other truncated architectures while maintaining high speed and small area. When applied to a discrete cosine transform (DCT), the proposed multiplier significantly reduces the area and power of the DCT circuit and still obtains good image quality.

Keywords: binary signed-digit number system, two's complement multiplier, discrete cosine transform

1. Introduction

Numerous multiplication schemes have been introduced to enhance the performance of multipliers. An efficient method to design a fast multiplier is to represent the partial products as redundant binary (RB) numbers and accumulate them with an RB adder tree. The RB multiplier not only improves speed, because it requires no continuous carry propagation, but also simplifies the interconnection. The literature [1] reports that the RB multiplier is well suited to VLSI design because of its regular layout and that it leads to high-speed circuit implementations.

Although the RB multiplier is very fast, its area can be large because of the redundant binary number representation and the number system conversion. Fortunately, the property of constant data size, which appears in many real applications and requires the 2n-bit product of a multiplication to be truncated into n bits, can be exploited to significantly reduce the area and power of the RB multiplier. The simplest way to obtain a truncated multiplier is to directly eliminate about half of the adder cells of the standard multiplier, but this introduces a large product error. Many papers [3]-[8] have proposed efficient methods and circuits to reduce the product error. However, most of them derive the low-error truncated multiplier from the Baugh-Wooley multiplier, and none addresses the redundant binary signed-digit (RBSD) multiplier [1, 2]. This paper focuses on the design of a low-cost, low-error truncated RBSD multiplier that reduces the area and power of the multiplier.

In the past, either a fixed constant [3]-[5] or a variable correction value [6]-[8] has been used to reduce the product error of truncated multipliers. The former adds a fixed constant, obtained from a statistical average, to the remaining adder cells of the truncated multiplier. The latter adds an input-dependent correction value to the remaining adder cells, so it usually works better than the former. Therefore, we develop low-cost truncated RBSD multipliers based on the variable correction value scheme. Simulation results show that the proposed truncated RBSD multipliers have lower product error than other architectures while maintaining high speed and small area.

The remainder of this paper is organized as follows. Section 2 briefly introduces the RBSD number representation and multiplier. The product error correction for truncated RBSD multipliers is described in Section 3. Section 4 provides error comparisons with previous designs and some applications. Finally, the conclusion is given in Section 5.

2. RBSD Number and Multiplier

The redundant binary signed-digit number system uses the digit set $\{\bar{1}, 0, 1\}$ to represent numbers, where $\bar{1}$ denotes the digit value −1. Each digit in the RBSD representation can be encoded with two bits if the positive-and-negative encoding is employed. The value of each digit is calculated as $x_i = (x_i^+, x_i^-) = x_i^+ - x_i^-$, where $(x_i^+, x_i^-)$ is one of the four forms (0, 0), (0, 1), (1, 0), and (1, 1), whose values are 0, $\bar{1}$, 1, and 0, respectively.

Fig. 1 shows the block diagram of an n×n-bit RBSD multiplier for n = 8. The multiplier essentially consists of RBSD Booth's encoders, an RB adder tree, and an RBSD-to-NB converter. Given two n-bit binary numbers X and Y in two's complement form, the RBSD Booth-2 (radix-4) encoders [2] generate multiples and $\lceil n/2 \rceil$ rows of RBSD partial products $PP_i$, where $\lceil x \rceil$ denotes the smallest integer that is larger than or equal to the real number x. The RBSD Booth's encoder uses the same encoding table as the modified Booth's encoding to generate RBSD partial products without any additional time delay and with almost no extra hardware. The RBSD partial products are then added up by a tree of redundant binary adders (RBA). The RBA tree operates faster when high-speed RBA cells are used. For example, the RBA presented in [1] (shown in Fig. 2) is optimized for speed and area efficiency by employing transmission gates. In [1], the inputs $(a_i^+, a_i^-)$ and $(b_i^+, b_i^-)$ of an RBA cell are assumed to take only the three states (0, 1), (0, 0), and (1, 0), never (1, 1), to simplify the design. The pair $(\beta_i, h_i)$ denotes the carry, where $h_i$ is defined to prevent continuous carry propagation by eliminating the collision of the sum and the carry from the lower digit. The final RBSD product R must be converted to a normal binary (NB) product N by an RBSD-to-NB converter [1].

3. Design of Truncated RBSD Multiplier

In an n×n standard RBSD multiplier, $\lceil n/2 \rceil$ rows of RBSD partial products are added up to generate the final RB product R[2n−1] to R[0]. Fig. 3 shows the case of n = 8. The 2n-bit RB product can be truncated to n bits by eliminating the n least significant columns (column 0 to column n−1) to form a truncated RBSD multiplier TRMH. In TRMH, the complexity of the RBSD Booth's encoders, the RBA tree, and the RBSD-to-NB converter is reduced by almost half, but a large error is introduced into the product.

Let $\sigma_{n-1}$ denote the sum of carries from column n−1. A good estimate of $\sigma_{n-1}$ can be used as a correction value to reduce the product error of TRMH. From Fig. 3, we have

\[
\sigma_{n-1} = \left\lfloor 2^{-1}\left(p_{0,n-1} + p_{1,n-3} + \cdots + p_{\lceil n/2\rceil-1,1}\right) + 2^{-2}\left(p_{0,n-2} + \cdots + p_{\lceil n/2\rceil-1,0}\right) + \cdots + 2^{-(n-1)}p_{0,1} + 2^{-n}p_{0,0} \right\rfloor = \left\lfloor 2^{-1}\left(\theta + \sigma_{n-2}\right) \right\rfloor \tag{1}
\]

where θ is the sum of the partial products in column n−1, and $\lfloor x \rfloor$ denotes the integer part of the real number x. When round-off is considered, the sum of carries from column n−1, denoted as $\delta_{n-1}$, becomes

\[
\delta_{n-1} = \sigma_{n-1} + \left(\theta + \sigma_{n-2}\right) \bmod 2 = \left\lfloor 2^{-1}\left(\theta + \sigma_{n-2}\right) \right\rfloor + \left(\theta + \sigma_{n-2}\right) \bmod 2 \tag{2}
\]

The next goal is to find a good estimate of $\delta_{n-1}$ that can serve as a compensation value for reducing the product error of TRMH. The fixed constant correction scheme does not work well for TRMH: since the partial products of an RBSD multiplier may be 0, 1, or $\bar{1}$, the average value of $\delta_{n-1}$ is always close to zero. That scheme would therefore use the constant 0 as the compensation value, and no improvement of the product error could be achieved. We therefore adopt the variable correction value scheme, derive two possible adaptive compensation formulations, and select the better one based on simulation results. The first candidate is θ.
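Eq. (1) says that the carry out of column n−1 can be obtained either directly, from the weighted digit sums of all eliminated columns, or recursively from θ and $\sigma_{n-2}$. The following Python sketch computes both forms, together with $\delta_{n-1}$ of Eq. (2), from hypothetical column sums $S_0, \dots, S_{n-1}$. It assumes that $\lfloor \cdot \rfloor$ rounds toward −∞ (floor); under that reading the two forms of Eq. (1) always agree, even when column sums are negative. The function names `sigma_direct` and `sigma_delta_recursive` are illustrative only.

```python
import math
from fractions import Fraction


def sigma_direct(col_sums):
    """Left-hand form of Eq. (1): floor(sum_j 2^-(n-j) * S_j) over columns j = 0 .. n-1."""
    n = len(col_sums)
    total = sum(Fraction(s, 2 ** (n - j)) for j, s in enumerate(col_sums))
    return math.floor(total)


def sigma_delta_recursive(col_sums):
    """Right-hand form of Eq. (1) plus the round-off bit of Eq. (2).

    col_sums[j] is the signed sum S_j of the digits in column j (digits are -1, 0, 1,
    so S_j may be negative). Returns (sigma_{n-1}, delta_{n-1}).
    """
    n = len(col_sums)
    sigma = 0  # sigma_{-1}: there is nothing below column 0
    for j in range(n - 1):  # builds sigma_0 .. sigma_{n-2}
        sigma = (col_sums[j] + sigma) // 2  # // is floor division, also for negatives
    v = col_sums[n - 1] + sigma  # theta + sigma_{n-2}
    return v // 2, v // 2 + v % 2  # Eq. (1) and Eq. (2); v % 2 is always 0 or 1 here


# Example with n = 4 and column sums S_0..S_3 = 1, -1, 2, 0.
sums = [1, -1, 2, 0]
print(sigma_direct(sums), sigma_delta_recursive(sums))  # prints: 0 (0, 0)
```

Exact fractions are used in the direct form so that no floating-point rounding can hide a disagreement between the two forms.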
θ has been used as the compensation value in other kinds of truncated multipliers. For example, the truncated array multiplier in [6] and the truncated Booth's multiplier in [8] use θ to reduce the product error. Therefore, θ is a possible approximation of $\delta_{n-1}$. The second candidate, denoted as λ, is derived from Eq. (2). As mentioned above, the partial products of an RBSD multiplier may be 0, 1, or $\bar{1}$, so the average value of $\sigma_{n-2}$ is also close to zero. Replacing $\sigma_{n-2}$ by 0, Eq. (2) is rewritten as

\[
\delta_{n-1} \cong \lambda = \left\lfloor 2^{-1}\theta \right\rfloor + \left(\theta \bmod 2\right) \tag{3}
\]

We apply all possible input combinations to the standard RBSD multiplier to examine whether θ or λ better approximates $\delta_{n-1}$. Let $\alpha_1 = \delta_{n-1} - \theta$ and $\alpha_2 = \delta_{n-1} - \lambda$; the probability distributions of $\alpha_1$ and $\alpha_2$ for n from 8 to 14 are shown in Table 1 and Table 2, respectively. Clearly, $\delta_{n-1}$ equals λ (i.e., $\alpha_2 = 0$) for most input combinations. Thus, λ is the better choice.

The subsequent challenge is to design a fast and simple circuit that performs Eq. (3). Since the partial products are represented as RBSD numbers, it is very difficult to design a general circuit that exactly matches the behavior of Eq. (3). In our design, the compensation circuit consists of three kinds of cells, RHA1, RHA2, and RHA3, as shown in Fig. 4. Taking the partial products of column n−1 as inputs, these cells generate the compensation value of Eq. (3) in a form similar to an adder tree. In the compensation tree, RHA1 and RHA3 are used in the first (top) level and the last (bottom) level, respectively, and RHA2 is used in the levels between them. The carries generated by these cells are then applied to the RB full adders of TRMH to form a low-error truncated multiplier TRMC, as shown in Fig. 5(a) and Fig. 5(b) for n = 8 and 12.

Table 3 shows the transistor ratio of the truncated RBSD multipliers TRMH and TRMC versus the standard RBSD multiplier MS. Compared with MS, the proposed truncated multiplier TRMC saves about 32% of the area. Moreover, TRMC needs only about 8% more area than TRMH (measured relative to MS) but has much lower product error than TRMH.

4. Experimental Results

To evaluate the accuracy of the proposed truncated multiplier TRMC, we compare it with the structure of Kidambi, El-Guibaly, and Antoniou (MK-G-A) [5], the structure of Jou, Kuang, and Chen (MJ-K-C) [7], the multiplier MBooth proposed in [8], TRMH, and TRMS (which truncates the n LSBs of the 2n-bit product of MS to obtain its n-bit product). Let ε, $\varepsilon_M$, $\bar{\varepsilon}$, and υ denote the absolute error, the maximal absolute error, the average error, and the variance of the error, respectively. That is,

\[
\varepsilon \equiv \left| M_S - F_P \right| \tag{4}
\]

\[
\bar{\varepsilon} \equiv E\{\varepsilon\} \tag{5}
\]

\[
\upsilon \equiv E\{(\varepsilon - \bar{\varepsilon})^2\} \tag{6}
\]

where $M_S$ is the output of the standard multiplier MS, $F_P$ represents the output value of each truncated multiplier, and E{•} is the expectation operator. The comparison results of $\bar{\varepsilon}$, $\varepsilon_M$, and υ for the different truncated multipliers are shown in Table 4 to Table 6. The results show that TRMC is more accurate than the other truncated multipliers.

The proposed multiplier is applied to the design of a discrete cosine transform (DCT) circuit for image processing. We use different 11×11-to-15 truncated RBSD multipliers to test the quality of the reconstructed images. Four 256×256 images are used in this experiment, and the quality of the different multipliers is compared by PSNR and RMSE. A larger PSNR and a smaller RMSE indicate better quality of the reconstructed image. The quality comparison reported in Table 7 shows that the proposed truncated multipliers obtain very good image quality.

5. Conclusion

This paper has proposed low-cost, low-error RBSD multipliers that save hardware area and power dissipation. The correction value for the product error depends on the input data and has been verified by simulation. Experimental results show that the product error of the proposed truncated RBSD multipliers is lower than that of other truncated multipliers.

Acknowledgment

This work was supported in part by the National Science Council, R.O.C., under Grant NSC-91-2215-E-110-027.

References

[1] H. Makino, Y. Nakase, H. Suzuki, H. Morinaka, H. Shinohara, and K. Mashiko, "An 8.8-ns 54×54-bit multiplier with high speed redundant binary architecture," IEEE Journal of Solid-State Circuits, vol. 31, no. 6, pp. 773-783, June 1996.
[2] N. Besli and R. G. Deshmukh, "A novel redundant binary signed-digit (RBSD) Booth's encoding," IEEE SoutheastCon, pp. 426-431, 2002.
[3] Y. C. Lim, "Single-precision multiplier with reduced circuit complexity for signal processing applications," IEEE Trans. on Computers, vol. 41, no. 10, pp. 1333-1336, 1992.
[4] M. J. Schulte and E. E. Swartzlander, Jr., "Truncated multiplication with correction constant," Workshop on VLSI Signal Processing VI, pp. 388-396, 1993.
[5] S. S. Kidambi, F. El-Guibaly, and A. Antoniou, "Area-efficient multipliers for digital signal processing applications," IEEE Trans. on Circuits & Systems II, vol. 43, no. 2, pp. 90-95, Feb. 1996.
[6] E. J. King and E. E. Swartzlander, Jr., "Data-dependent truncation scheme for parallel multipliers," 31st Asilomar Conference on Signals, Systems & Computers, vol. 2, pp. 1178-1182, 1997.
[7] J. M. Jou, S. R. Kuang, and R. D. Chen, "Design of low-error fixed-width multipliers for DSP applications," IEEE Trans. on Circuits & Systems II, vol. 46, no. 6, pp. 836-842, June 1999.
[8] S. J. Jou and H. H. Wang, "Fixed-width multiplier for DSP application," IEEE International Conference on Computer Design, pp. 318-322, 2000.

Fig. 1. The block diagram of an 8×8-bit RBSD multiplier (four RBSD Booth's encoders driven by Y[8..0] and X[2..0], X[4..2], X[6..4], X[8..6] produce the 9-digit partial products PP0-PP3; an RBA tree of three RBAs produces the 16-digit RB product R[15..0], which an RBSD-to-NB converter turns into N[15..0]).

Fig. 2. Redundant Binary Adder schematic diagram.
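Section 2 describes RBSD digits in the positive-and-negative encoding, carry-free accumulation in an RBA tree, and a final RBSD-to-NB conversion. The Python sketch below illustrates these ideas at the digit level only. For the addition it swaps in a generic textbook rule for carry-free redundant binary addition, in which each intermediate carry is chosen by checking whether the next lower digit pair is non-negative (a role similar to that of $h_i$); it is not claimed to be the transmission-gate RBA of [1]. The conversion simply evaluates $X^+ - X^-$ in two's complement. All function names are hypothetical.

```python
def encode_digit(d):
    """Positive-and-negative encoding (x+, x-) of an RBSD digit d in {-1, 0, 1}."""
    return {1: (1, 0), 0: (0, 0), -1: (0, 1)}[d]


def rb_value(digits):
    """Integer value of an RBSD number whose digits are listed LSB first."""
    return sum(d << i for i, d in enumerate(digits))


def rb_add(a, b):
    """Carry-free sum of two LSB-first RBSD numbers (generic rule, not the RBA of [1]).

    Position i splits a_i + b_i into 2*c_i + w_i. The choice of c_i depends only on
    whether both digits at position i-1 are non-negative, which guarantees that
    z_i = w_i + c_{i-1} stays in {-1, 0, 1}: no carry travels more than one digit.
    """
    n = max(len(a), len(b))
    a = list(a) + [0] * (n - len(a))
    b = list(b) + [0] * (n - len(b))
    carry, interim = [], []
    for i in range(n):
        s = a[i] + b[i]
        lower_nonneg = i == 0 or (a[i - 1] >= 0 and b[i - 1] >= 0)
        if s == 2:
            c, w = 1, 0
        elif s == -2:
            c, w = -1, 0
        elif s == 1:
            c, w = (1, -1) if lower_nonneg else (0, 1)
        elif s == -1:
            c, w = (0, -1) if lower_nonneg else (-1, 1)
        else:
            c, w = 0, 0
        carry.append(c)
        interim.append(w)
    z = [interim[i] + (carry[i - 1] if i > 0 else 0) for i in range(n)]
    return z + [carry[-1]]  # the last carry becomes one extra top digit


def rbsd_to_nb(digits, width):
    """RBSD-to-NB conversion: evaluate X+ - X- and read it as a width-bit two's complement value."""
    pairs = [encode_digit(d) for d in digits]
    x_plus = sum(p << i for i, (p, _) in enumerate(pairs))
    x_minus = sum(m << i for i, (_, m) in enumerate(pairs))
    raw = (x_plus - x_minus) & ((1 << width) - 1)  # the width-bit pattern of N
    return raw - (1 << width) if raw >> (width - 1) else raw


# Usage: 5 + 5 with the operands written in two different RBSD forms (LSB first).
a = [1, 0, -1, 1]  # 1 - 4 + 8 = 5
b = [-1, 1, 1, 0]  # -1 + 2 + 4 = 5
total = rb_add(a, b)
print(total, rb_value(total), rbsd_to_nb(total, 8))  # [0, 1, 0, 1, 0] 10 10
```

Because every output digit depends only on digits at positions i and i−1, the delay of `rb_add` in hardware would not grow with the word length, which is the property that makes an RBA tree fast; the carry-propagating step is deferred to the single RBSD-to-NB conversion.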

Fig. 3. Partial products of an 8×8-bit RBSD multiplier (rows PP0-PP3, labelled TP0-TP3, hold the digits $p_{i,8}, \dots, p_{i,0}$; the columns are summed into the product digits R[15] (MSB) down to R[0] (LSB)).

Fig. 4. The cells for generating the compensation value.

Fig. 5. Applying the compensation value to TRMH.
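To make Fig. 3 and the exhaustive comparison of θ and λ concrete, the Python sketch below builds the $\lceil n/2 \rceil$ shifted partial-product rows for every pair of n-bit inputs and tallies $\alpha_1 = \delta_{n-1} - \theta$ and $\alpha_2 = \delta_{n-1} - \lambda$. It rests on two assumptions that are not taken from the paper: each row is produced by a simple RBSD encoding of d·Y (d is a radix-4 Booth digit, and Y's two's complement bits are used directly as digits with the sign bit negated), which may differ from the encoder of [2], and $\lfloor \cdot \rfloor$ is treated as floor. The probabilities it prints therefore need not reproduce Table 1 and Table 2. The function names are hypothetical.

```python
from collections import Counter


def booth_radix4_digits(x, n):
    """Radix-4 modified Booth digits (each in -2..2) of an n-bit two's complement x, n even."""
    bits = [(x >> i) & 1 for i in range(n)]  # Python's >> is arithmetic, so negative x works
    digits, prev = [], 0  # prev holds x_{i-1}, with x_{-1} = 0
    for i in range(0, n, 2):
        digits.append(-2 * bits[i + 1] + bits[i] + prev)
        prev = bits[i + 1]
    return digits


def rbsd_row(d, y, n):
    """ASSUMED encoding: n+1 RBSD digits (LSB first) whose value is d*y."""
    t = [(y >> j) & 1 for j in range(n)]
    t[n - 1] = -t[n - 1]  # the two's complement sign bit weighs -2^(n-1)
    if abs(d) == 1:
        row = t + [0]
    elif abs(d) == 2:
        row = [0] + t  # multiply by 2: shift one digit left
    else:
        row = [0] * (n + 1)
    return [-v for v in row] if d < 0 else row  # negation flips every digit


def column_sums(x, y, n):
    """Signed digit sum of each of the 2n columns; row i is shifted left by 2i columns."""
    sums = [0] * (2 * n)
    for i, d in enumerate(booth_radix4_digits(x, n)):
        for j, digit in enumerate(rbsd_row(d, y, n)):
            sums[2 * i + j] += digit
    return sums


def alpha_distribution(n):
    """Exhaustive probabilities of alpha1 = delta - theta and alpha2 = delta - lambda."""
    a1, a2 = Counter(), Counter()
    values = range(-(1 << (n - 1)), 1 << (n - 1))
    for x in values:
        for y in values:
            s = column_sums(x, y, n)
            sigma = 0
            for j in range(n - 1):  # sigma_{n-2}, via the recursion behind Eq. (1)
                sigma = (s[j] + sigma) // 2
            theta = s[n - 1]
            delta = (theta + sigma) // 2 + (theta + sigma) % 2  # Eq. (2)
            lam = theta // 2 + theta % 2  # Eq. (3)
            a1[delta - theta] += 1
            a2[delta - lam] += 1
    total = len(values) ** 2
    return ({k: v / total for k, v in sorted(a1.items())},
            {k: v / total for k, v in sorted(a2.items())})


# Usage: n = 8 enumerates all 65,536 input pairs (a few seconds in CPython).
p1, p2 = alpha_distribution(8)
print("alpha1:", p1)
print("alpha2:", p2)
```

Swapping in the exact encoder of [2] would only require replacing `rbsd_row`; the column bookkeeping and Eqs. (1)-(3) would stay the same.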

Table 1. Probability distribution of α1 with different n

n     α1 = −2   α1 = −1   α1 = 0   α1 = 1   α1 = 2
8     0.12%     12%       75%      12%      0.12%
10    0.31%     14%       69%      15%      0.31%
12    0.61%     17%       64%      17%      0.61%
14    0.98%     19%       60%      19%      0.99%

α1 = −3 and α1 = 3: 0.04% (one entry in each column).

Table 2. Probability distribution of α2 with different n

n     α2 = −2   α2 = −1   α2 = 0   α2 = 1   α2 = 2
8     0.04%     8.2%      86%      5.3%     0.04%
10    0.13%     10.6%     82%      7.2%     0.10%
12    0.26%     12.6%     78%      8.8%     0.19%
14    0.45%     14.1%     75%      10.2%    0.32%

α2 = −3 and α2 = 3: -

Table 3. Transistor ratio for different multipliers

Multiplier   n=8    n=10   n=12   n=14
MS           1      1      1      1
TRMH         0.62   0.61   0.60   0.59
TRMC         0.71   0.69   0.68   0.67

Table 4. Comparison results of average error (ε̄)

Multiplier   n=8     n=10    n=12     n=14
MK-G-A       188.3   906.4   3842.1   17752.0
MJ-K-C       170.5   736.6   3094.2   12805.5
MBooth       106.2   455.7   1912.1   11488.3
TRMH         149.1   675.4   2982.9   12961.2
TRMS         124.6   507.7   2043.1   8185.6
TRMC         101.1   425.5   1786.0   7476.0

Table 5. Comparison results of the maximal absolute error (εM)

Multiplier   n=8    n=10   n=12    n=14
MK-G-A       1281   6145   32769   163841
MJ-K-C       515    2403   10979   49379
MBooth       441    2105   9785    44601
TRMH         938    4778   23210   109226
TRMS         255    1023   4095    16383
TRMC         459    2219   10923   51883
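Eqs. (4)-(6) can be evaluated exhaustively for any truncated design whose n-bit output is known as a function of its inputs. The Python sketch below does so for TRMS, the only design in Tables 4-6 that the text specifies completely (the n LSBs of the exact product are dropped). It assumes uniformly distributed n-bit two's complement inputs and compares the n-bit output N with the exact 2n-bit product $M_S$ as $F_P = N \cdot 2^n$. The names `trms` and `error_stats` are hypothetical.

```python
def trms(x, y, n):
    """TRMS: keep the n most significant bits of the exact 2n-bit product."""
    return (x * y) >> n  # arithmetic shift, i.e. two's complement truncation (floor)


def error_stats(n, truncated_product):
    """Average error (Eq. 5), maximal absolute error, and variance (Eq. 6) over all n-bit inputs.

    truncated_product(x, y, n) must return the n-bit output N; following Eq. (4), each
    error is |M_S - F_P| with M_S = x*y and F_P = N * 2**n (an assumed alignment).
    """
    values = range(-(1 << (n - 1)), 1 << (n - 1))
    errors = [abs(x * y - (truncated_product(x, y, n) << n)) for x in values for y in values]
    mean = sum(errors) / len(errors)
    variance = sum((e - mean) ** 2 for e in errors) / len(errors)
    return mean, max(errors), variance


mean, worst, variance = error_stats(8, trms)
print(mean, worst, variance)
```

For n = 8 the maximal absolute error reported for TRMS is 255 (that is, $2^8 - 1$), the value listed in Table 5; the average and the variance depend on the input distribution assumed here.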

Table 6. Comparison results of variance of error (υ)

Multiplier   n=8     n=10     n=12         n=14
MK-G-A       22959   416043   9204493      377915712
MJ-K-C       10159   190805   3417020      63473866
MBooth       6247    125055   2341510      41516237
TRMH         14400   289841   5576828      104164651
TRMS         5470    87463    1398529      22371591
TRMC         5470    97922    1747885.28   30981360

Table 7. Quality comparison of reconstructed images for different multipliers

Image    TRMH PSNR (dB)   TRMH RMSE   TRMS PSNR (dB)   TRMS RMSE   TRMC PSNR (dB)   TRMC RMSE
Lena     37.35            11.96       40.98            5.18        42.47            3.68
Baboon   37.10            12.68       40.20            6.21        43.51            2.90
Bear     36.14            15.83       41.12            5.04        43.55            2.87
F16      35.26            19.35       39.93            6.61        42.42            3.72
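Table 7 reports PSNR together with an error measure labelled RMSE. The sketch below gives the usual definitions for 8-bit images (a peak value of 255 is assumed; `mse`, `rmse`, and `psnr_from_mse` are hypothetical helper names). Applying `psnr_from_mse` to the values in the RMSE columns reproduces each PSNR entry of Table 7 to within about 0.01 dB, for example 42.47 dB from 3.68 for TRMC on Lena, which suggests that those columns hold mean squared errors rather than their square roots.

```python
import math


def mse(original, reconstructed):
    """Mean squared error between two equally long flat sequences of pixel values."""
    if len(original) != len(reconstructed):
        raise ValueError("images must have the same number of pixels")
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def rmse(original, reconstructed):
    """Root mean squared error."""
    return math.sqrt(mse(original, reconstructed))


def psnr_from_mse(m, peak=255):
    """PSNR in dB for a given mean squared error (peak = 255 for 8-bit images)."""
    return 10 * math.log10(peak ** 2 / m)


print(round(psnr_from_mse(3.68), 2))  # 42.47, the TRMC/Lena entry of Table 7
```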
