Chapter 2 Previous Works
2.2 Related Works
2.2.1 Partial Product Generation (PPG)
Partial product generation is divided into two parts: decoding the multiplicand according to the multiplier encoding produced by the MBE, and arranging and aligning the MBE outputs to form the PPA.
Concerning the MBE, [16] compares the energy dissipation of the standard, compact, and race-free MBE encoding schemes. The race-free scheme consumes the least power because it balances the delays of the internal signals and thus avoids glitches in the circuit. In [9], the race-free MBE is further optimized for timing and area. The idea of that implementation is to deliberately use “wrong” encoding signals at the intermediate gate levels and to correct the error at the final level. The temporarily “wrong” logic permits more logic optimization than the other encoding schemes, which reduces delay, area, and power consumption. Tables 2.2, 2.3, and 2.4 list the truth tables of the standard, compact, and race-free MBE schemes, respectively. Fig. 2.4 shows the improved encoder and decoder of the MBE in [9].
Table 2.2. Truth table of standard encoding.
Y2i+1 Y2i Y2i-1 P1 P2 Z M1 M2
Table 2.3. Truth table of compact encoding.
Y2i+1 Y2i Y2i-1 P1 P2 Neg
Table 2.4. Truth table of race-free encoding.
Y2i+1  Y2i  Y2i-1  P1  P2  Neg  Z
  0     0     0     0   1   0   1
  0     0     1     1   0   0   1
  0     1     0     1   0   0   0
  0     1     1     0   1   0   0
  1     0     0     0   1   1   0
  1     0     1     1   0   1   0
  1     1     0     1   0   1   1
  1     1     1     0   1   1   1
Fig. 2.4. The MBE encoder and decoder in [9].
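The rows of Table 2.4 follow a regular pattern that can be checked mechanically. The Python sketch below regenerates the table from four Boolean expressions. These expressions (P1 = Y2i XOR Y2i-1, P2 = NOT P1, Neg = Y2i+1, Z = Y2i+1 XNOR Y2i) are read off from the table rows and are an assumption of this illustration; they are not necessarily the gate-level structure used in [9] or [16]. The names race_free_encode, regenerate_table, and TABLE_2_4 are hypothetical.

```python
from itertools import product


def race_free_encode(y_hi, y_mid, y_lo):
    """Return (P1, P2, Neg, Z) for the triplet (Y2i+1, Y2i, Y2i-1).

    The expressions are inferred from the rows of Table 2.4 (assumption).
    """
    p1 = y_mid ^ y_lo          # 1 when exactly one of Y2i, Y2i-1 is set
    p2 = 1 - p1                # complement of P1
    neg = y_hi                 # follows Y2i+1 directly
    z = 1 - (y_hi ^ y_mid)     # XNOR of Y2i+1 and Y2i
    return (p1, p2, neg, z)


# Rows of Table 2.4, keyed by (Y2i+1, Y2i, Y2i-1).
TABLE_2_4 = {
    (0, 0, 0): (0, 1, 0, 1),
    (0, 0, 1): (1, 0, 0, 1),
    (0, 1, 0): (1, 0, 0, 0),
    (0, 1, 1): (0, 1, 0, 0),
    (1, 0, 0): (0, 1, 1, 0),
    (1, 0, 1): (1, 0, 1, 0),
    (1, 1, 0): (1, 0, 1, 1),
    (1, 1, 1): (0, 1, 1, 1),
}


def regenerate_table():
    """Evaluate the inferred expressions for all eight input triplets."""
    return {bits: race_free_encode(*bits) for bits in product((0, 1), repeat=3)}


if __name__ == "__main__":
    for bits, outputs in regenerate_table().items():
        print(bits, outputs)
```

Comparing regenerate_table() with TABLE_2_4 confirms that the inferred expressions reproduce every listed row.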
When the MBA is used, the PPs are treated as signed numbers, since three of the MBE outputs listed in Table 2.1 are negative. Sign extension must therefore be applied to every PP to ensure a correct result; however, sign extension requires considerable extra logic. To deal with this, [17], [18], and [19] provide a technique called sign encoding (SE) or sign generation, and [12] gives a general description of it. The concept of SE is depicted at the MSB end of Fig. 2.5. It begins by presuming that all PPs are negative, so that one-extension is applied, as shown in Fig. 2.5a. Since the extended ones are fixed in position, accumulating all of them in advance produces {1,1} in front of the first PP and {1,0} in front of the others, as shown in Fig. 2.5b. To correct the presumption, a one is added at the LSB of a sign-extension string whenever the corresponding PP is in fact non-negative, resulting in the logic in Fig. 2.5c. As a whole, SE exploits the predictability of sign extension and avoids the redundant extension bits that serve only to represent a signed number correctly. It takes only two or three SE bits in front of the original MSB of each PP, {p,n,n} for the first PP and {1,p} for the others, where n stands for the original sign of the PP and p for the negation of n.
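The equivalence between SE and full sign extension can be checked numerically. The following Python sketch is a behavioural model, not the circuit of Fig. 2.5. It assumes an even operand width n, (n+1)-bit PPs whose bit n is the sign n of the text, SE bits that occupy bit n and above, negative PPs formed by inversion plus a separately added hot-one, and no hot-one for a zero digit. The function names are hypothetical.

```python
def booth_digits(y, n):
    """Radix-4 Booth digits (each in -2..2) of an n-bit two's-complement y."""
    bits = [(y >> k) & 1 for k in range(n)]   # >> sign-extends negative ints
    digits, prev = [], 0
    for i in range(0, n, 2):
        digits.append(-2 * bits[i + 1] + bits[i] + prev)
        prev = bits[i + 1]
    return digits


def multiply_with_se(x, y, n):
    """Multiply two n-bit signed numbers using sign-encoded PP rows.

    Returns the product modulo 2**(2*n).
    """
    row_mask = (1 << (n + 1)) - 1
    total = 0
    for i, d in enumerate(booth_digits(y, n)):
        row = (abs(d) * x) & row_mask     # (n+1)-bit pattern of |d| * x
        hot = 0
        if d < 0:
            row = ~row & row_mask         # one's complement ...
            hot = 1                       # ... completed by a hot-one
        s = (row >> n) & 1                # sign bit of the row ("n" in the text)
        p = 1 - s                         # its negation ("p" in the text)
        low = row & ((1 << n) - 1)        # bits below the sign position
        if i == 0:
            se = (p << 2) | (s << 1) | s  # {p, n, n} for the first PP
        else:
            se = (1 << 1) | p             # {1, p} for the other PPs
        total += ((se << n) | low) << (2 * i)
        total += hot << (2 * i)
    return total & ((1 << (2 * n)) - 1)


if __name__ == "__main__":
    n = 8
    x, y = -77, 45
    print(multiply_with_se(x, y, n) == (x * y) & ((1 << (2 * n)) - 1))
```

The check works because the constants introduced by the SE bits sum to 2^(2n), which vanishes modulo the product width.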
We simulated a multiplier with and without SE. With SE, the power consumption of the PPG and PPRT is reduced by up to one-third compared with the design without SE, and the improvement grows as the bit width increases.
Another problem arises when the MBE selects a negative output. Since the MBA treats the operands as signed numbers in two’s complement (TC) format, a negative output requires negating (two’s-complementing) the bit stream, which implies a two’s complementer. To complete the negation, a one, also called a hot-one [19], must be added after inverting (one’s-complementing) the bit stream. It would be wasteful to make these ones, which serve only the TC operation, separate inputs to the PPRT. Fortunately, the MBA makes this avoidable: the least significant bit (LSB) of the present PP is aligned two bit positions to the left of the LSB of the preceding PP, so a two-bit space {h,h} is left free and can hold the hot-one of the preceding PP, as shown at the LSB end of Fig. 2.5a. The hot-one may also shift left by one bit if the MBE selects 2x or -2x from the encoding table, but this costs nothing since two bit positions are reserved.
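The two hot-one positions can be illustrated with plain integer arithmetic. The sketch below assumes a fixed word width and checks two identities: -x equals the inverted bits plus a one at weight 1 (the right h position), and -2x equals the inverted bits shifted left by one plus a one at weight 2 (the left h position). The function names are hypothetical and the model is only an arithmetic illustration, not the decoder circuit.

```python
def negate_with_hot_one(x, width):
    """Return -x modulo 2**width as one's complement plus a hot-one."""
    mask = (1 << width) - 1
    inverted = ~x & mask             # one's complement of x
    return (inverted + 1) & mask     # hot-one added at weight 1


def minus_2x_with_shifted_hot_one(x, width):
    """Return -2x modulo 2**width with the hot-one shifted left by one bit."""
    mask = (1 << width) - 1
    inverted = ~x & mask             # one's complement of x
    return ((inverted << 1) + 2) & mask   # hot-one added at weight 2


if __name__ == "__main__":
    width = 10
    x = 37
    print(negate_with_hot_one(x, width) == (-x) & ((1 << width) - 1))
    print(minus_2x_with_shifted_hot_one(x, width) == (-2 * x) & ((1 << width) - 1))
```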
When a random-valued multiplier is encoded, the hot-one bit may appear in either the left or the right h position. This irregularity increases the PPRT latency [9]. In [9], the authors also propose a technique, referred to as “hot-one modification” in this paper, to regularize the LSB end of all PPs. Observing that the hot-one logic is related to the LSB logic of the present PP, as shown in Fig. 2.5b, a truth table can be built as listed in Table 2.5; the new logic equations of the two signals LSB_new and hot2 can be expressed as:
\[
\begin{aligned}
\mathit{LSB\_new}_i &= x_{LSB} \cdot (y_{2i-1} \oplus y_{2i}) \\
\mathit{hot2} &= y_{2i+1} \cdot \overline{(y_{2i-1} + y_{2i}) \cdot (y_{2i-1} + x_{LSB}) \cdot (y_{2i} + x_{LSB})}
\end{aligned}
\tag{2.1}
\]
where the overbar denotes logical complement. Eq. (2.1) moves all hot-one bits to the left h position (hot2), together with a possible modification of the preceding LSB (LSB_new). As a result, Fig. 2.5c shows the rearranged PPA, which is shorter, parallelogram-shaped, and more regular, to be accumulated in the PPRT.
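One way to sanity-check Eq. (2.1) is to enumerate all input combinations. The Python sketch below reads the equation as LSB_new = x_LSB AND (y_{2i-1} XOR y_{2i}) and hot2 = y_{2i+1} AND NOT((y_{2i-1} OR y_{2i}) AND (y_{2i-1} OR x_LSB) AND (y_{2i} OR x_LSB)); the complement in hot2 is an assumption of this reading. The reference model is also an assumption: the unmodified LSB is x_LSB for a digit of magnitude 1 and 0 otherwise, inverted when the digit is negative, and a hot-one of weight 1 accompanies every negative digit while the zero digit (000 or 111) produces none. All function names are hypothetical.

```python
from itertools import product


def lsb_new(x_lsb, y_lo, y_mid):
    """Modified LSB; y_lo is y_{2i-1}, y_mid is y_{2i}."""
    return x_lsb & (y_lo ^ y_mid)


def hot2(x_lsb, y_lo, y_mid, y_hi):
    """Hot-one moved to the left h position (weight 2); y_hi is y_{2i+1}."""
    inner = (y_lo | y_mid) & (y_lo | x_lsb) & (y_mid | x_lsb)
    return y_hi & (1 - inner)


def original_lsb_and_hot_one(x_lsb, y_lo, y_mid, y_hi):
    """Assumed reference model of the unmodified LSB and its hot-one."""
    digit = -2 * y_hi + y_mid + y_lo           # Booth digit in -2..2
    neg = 1 if digit < 0 else 0                # zero digit yields no hot-one
    lsb = (x_lsb if abs(digit) == 1 else 0) ^ neg
    return lsb, neg


def check_all():
    """True if 2*hot2 + LSB_new equals LSB + hot-one for all 16 inputs."""
    for x_lsb, y_lo, y_mid, y_hi in product((0, 1), repeat=4):
        lsb, hot = original_lsb_and_hot_one(x_lsb, y_lo, y_mid, y_hi)
        new_value = 2 * hot2(x_lsb, y_lo, y_mid, y_hi) + lsb_new(x_lsb, y_lo, y_mid)
        if lsb + hot != new_value:
            return False
    return True


if __name__ == "__main__":
    print(check_all())
```

Under these assumptions the two-bit value {hot2, LSB_new} always equals the sum of the original LSB and its hot-one, so the modification changes only where the bits sit, not what they add up to.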
Fig. 2.5. Sign encoding and hot-one modification.
Table 2.5. Truth table of LSB_new and hot2.
In [8], Oklobdzija et al. present a three-dimensional method (TDM) for building a speed-optimized Wallace PPRT. The main idea of this speed optimization is depicted in Fig. 2.6. Fig. 2.6a shows a common logic implementation of an FA. Without loss of generality, assuming that a NAND gate has a delay of 1 and an XOR gate a delay of 2, the delay of each input-to-output path can be calculated as shown in Fig. 2.6b. The longest path runs from input a or input b to output sum; sum is therefore referred to as the “slow output”, in contrast with the “fast output”, cout.
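The effect of unequal path delays on the sum output can be sketched numerically. The Python fragment below assumes that sum is formed by two cascaded XOR gates, sum = (a XOR b) XOR cin, each with the delay of 2 stated above. The cout path is left out because it depends on the gate structure of Fig. 2.6a, and the function name sum_arrival is hypothetical.

```python
XOR_DELAY = 2  # XOR gate delay assumed in the text


def sum_arrival(t_a, t_b, t_cin):
    """Arrival time of sum for sum = (a XOR b) XOR cin.

    t_a, t_b, t_cin are the arrival times of the three FA inputs.
    """
    first_xor = max(t_a, t_b) + XOR_DELAY      # a XOR b is ready
    return max(first_xor, t_cin) + XOR_DELAY   # second XOR with cin


if __name__ == "__main__":
    first_level = sum_arrival(0, 0, 0)            # all inputs arrive at time 0
    late_into_a = sum_arrival(first_level, 0, 0)  # slow output drives input a
    late_into_cin = sum_arrival(0, 0, first_level)  # slow output drives cin
    print(first_level, late_into_a, late_into_cin)
```

With all first-level inputs arriving at time 0, routing the late sum signal into cin rather than a shortens the second-level sum arrival from 8 to 6 in this model.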
cin is the “slow input”, since it can tolerate the late arrival of a slow output. Connecting a “slow output” to a signal requiring a “fast input” (e.g., a) produces the critical path. Taking a two-level PPRT as an example, the latency of the left configuration in Fig. 2.6c is