CHAPTER 3. ANALYSIS AND IMPLEMENTATION
3.4 IMPLEMENTATION
3.4.3 Arithmetic operation
Figure 3-10 shows the 2-input dual-rail AND gate. […] become valid. When this happens, exactly one of the four C-elements goes high, which in turn drives the corresponding output wire high, so that the gate produces the desired valid output. When all inputs become empty, the C-elements are all set low and the output of the dual-rail AND gate becomes empty again. The same concept is used to construct the other dual-rail basic elements, including OR and XOR. These basic elements are in turn used to build the half adder, the full adder, and the other ALU blocks hierarchically.
In this thesis, the arithmetic operations include a 16*16-bit array multiplier and a 32*32-bit carry-lookahead adder. We introduce the 16*16-bit array multiplier first. A multiplier is an essential component of devices such as microprocessors and digital signal processors. It also takes the longest operation time, which is usually the decisive factor in the performance of a chip.
To date, several asynchronous designs have been proposed. Owing to their low power consumption, low average operation time, and ability to adapt to varying process and environmental conditions, asynchronous circuits have been used in VLSI designs for better performance.
In our design, the multiplier comprises a partial product generator, an addition array, a final-stage adder, and a completion detector. The partial product generator generates the intermediate partial products, and the addition array adds these partial products. The final-stage adder then combines the outputs of the addition array and produces the final sum. Finally, the completion detector checks that the result is valid and signals that it can be output. Figure 3-11 shows the work flow of a conventional asynchronous multiplier.
Figure 3-11: The work flow of a conventional asynchronous multiplier
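The work flow can be sketched in software as follows. This is a minimal illustration, not the circuit of this thesis: the function names are hypothetical, ordinary integers stand in for dual-rail signals, and the addition array is modeled in carry-save form. The carry-save form is an assumption made only so that the final-stage adder has visible work to do. Completion detection is omitted because it only makes sense on dual-rail signals.

```python
def partial_products(x, y, width=16):
    """Partial product generator: row j is x AND-ed with bit j of y, weighted by 2**j."""
    mask = (1 << width) - 1
    return [((x & mask) if (y >> j) & 1 else 0) << j for j in range(width)]


def addition_array(rows):
    """Addition array (carry-save form, an assumption): reduce all rows
    to one sum word and one carry word without propagating carries."""
    total, carry = 0, 0
    for row in rows:
        new_total = total ^ carry ^ row
        carry = ((total & carry) | (total & row) | (carry & row)) << 1
        total = new_total
    return total, carry


def multiply(x, y, width=16):
    """Generator -> addition array -> final-stage adder."""
    rows = partial_products(x, y, width)
    total, carry = addition_array(rows)
    return total + carry  # final-stage adder merges the two words
```

For example, `multiply(1234, 5678)` returns the same value as `1234 * 5678`.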
Figure 3-12 shows an 8*8-bit right-to-left array multiplier, comprising a partial product generator and a right-to-left addition array.
Figure 3-12: An 8*8-bit right-to-left array multiplier
In figure 3-12, “●” represents the generation of one bit product; the partial product generator is usually implemented with an AND gate. “♁” represents an adder. In the right-to-left addition array, the sum of each adder is propagated to the adder in the next level, and the carry of each adder is propagated to the higher-bit adder in the same level. The computation results of the adders enclosed by the gray dotted line are used to generate the completion-detection signal.
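The data flow of the right-to-left array can be sketched at the bit level as follows. This is a simplified model under stated assumptions: single-rail bits are used for brevity, the function names are hypothetical, and completion detection is left out. Within each level the carry ripples from the lower-bit adder to the higher-bit adder, and the sum bits are handed to the next level.

```python
def full_adder(a, b, cin):
    """Single-rail full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def array_multiply(x, y, n=8):
    """n*n-bit array multiplier; level j adds the partial-product row of y bit j."""
    xb = [(x >> i) & 1 for i in range(n)]
    yb = [(y >> j) & 1 for j in range(n)]
    acc = [0] * (2 * n)                  # running sum bits, LSB first
    for j in range(n):                   # one level per multiplier bit
        carry = 0
        for i in range(n):               # right to left within the level
            bit_product = xb[i] & yb[j]  # the AND gate of the generator
            acc[i + j], carry = full_adder(acc[i + j], bit_product, carry)
        acc[j + n] = carry               # carry out of the level's top adder
    return sum(bit << k for k, bit in enumerate(acc))
```

With `n = 8` the function agrees with ordinary integer multiplication for all operands from 0 to 255.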
The partial product generator is implemented with the DI AND gate illustrated in the previous section, so we do not describe the gate again here. Figure 3-13 is a schematic drawing of a single partial product generator in a single row.
In figure 3-13, the gray point represents a partial product, which is the product of the multiplier and one particular bit xj of the multiplicand.
Figure 3-13: A conventional single partial product generator scheme
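The behavior of one bit-product cell can be sketched as follows. This is a behavioral model only, assuming the four-C-element construction of the dual-rail AND gate and the encoding used in this section, where (X0, X1) = (1, 0) is a valid “0”, (0, 1) is a valid “1”, and (0, 0) is empty. The class names are hypothetical.

```python
class CElement:
    """Muller C-element: the output follows the inputs when they agree,
    and holds its previous value when they differ."""

    def __init__(self):
        self.out = 0

    def update(self, a, b):
        if a == b:
            self.out = a
        return self.out


class DualRailAnd:
    """Dual-rail AND gate built from four C-elements, one per input combination."""

    def __init__(self):
        self.c00, self.c01 = CElement(), CElement()
        self.c10, self.c11 = CElement(), CElement()

    def update(self, a, b):
        (a0, a1), (b0, b1) = a, b
        z00 = self.c00.update(a0, b0)  # A = 0, B = 0
        z01 = self.c01.update(a0, b1)  # A = 0, B = 1
        z10 = self.c10.update(a1, b0)  # A = 1, B = 0
        z11 = self.c11.update(a1, b1)  # A = 1, B = 1
        # Rail 0 is high for the three combinations whose AND is 0.
        return (z00 | z01 | z10, z11)
```

In this model the output stays empty until both inputs are valid, and it stays valid until both inputs have returned to empty, which is the hysteresis the C-elements provide.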
The partial products are then added by the addition array. In the conventional technique, the DI full adder serves as the basic unit of the addition array. The full adder uses dual-rail signals for its inputs, (A0, A1), (B0, B1), and (Cin0, Cin1), and for its outputs, the sum (S0, S1) and the carry (Cout0, Cout1). The sum and the carry are obtained from the following logic expressions:
Cout0 = A0 B0 + A0 Cin0 + B0 Cin0
Cout1 = A1 B1 + A1 Cin1 + B1 Cin1
S0 = A0 B0 Cin0 + A1 B1 Cin0 + A0 B1 Cin1 + A1 B0 Cin1
S1 = A1 B1 Cin1 + A1 B0 Cin0 + A0 B1 Cin0 + A0 B0 Cin1
For example, when A = 1, B = 0, and Cin = 1, then Cout = 1 and S = 0. Using dual-rail encoding, we represent A by (A0, A1) = (0, 1) (valid “1”), B by (B0, B1) = (1, 0) (valid “0”), and Cin by (Cin0, Cin1) = (0, 1) (valid “1”). Applying the above logic expressions, we obtain (Cout0, Cout1) = (0, 1), which represents a valid “1”, and (S0, S1) = (1, 0), which represents a valid “0”. The result agrees with our expectation. Figure 3-14 is a schematic drawing of the dual-rail symbol of a DI full adder.
Figure 3-14: A dual-rail symbol of a DI full adder
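The logic expressions of the DI full adder can be checked with a short script. The sketch below evaluates the four expressions directly as combinational logic; the helper names `encode` and `di_full_adder` are hypothetical, and the encoding follows the example above, with (1, 0) for a valid “0” and (0, 1) for a valid “1”.

```python
def encode(bit):
    """Dual-rail encoding (X0, X1): (1, 0) is valid 0, (0, 1) is valid 1."""
    return (0, 1) if bit else (1, 0)


def di_full_adder(a, b, cin):
    """Evaluate the dual-rail sum and carry expressions; returns (sum, cout)."""
    (a0, a1), (b0, b1), (c0, c1) = a, b, cin
    cout0 = (a0 & b0) | (a0 & c0) | (b0 & c0)
    cout1 = (a1 & b1) | (a1 & c1) | (b1 & c1)
    s0 = (a0 & b0 & c0) | (a1 & b1 & c0) | (a0 & b1 & c1) | (a1 & b0 & c1)
    s1 = (a1 & b1 & c1) | (a1 & b0 & c0) | (a0 & b1 & c0) | (a0 & b0 & c1)
    return (s0, s1), (cout0, cout1)


# Exhaustive check against ordinary binary addition.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            s, cout = di_full_adder(encode(a), encode(b), encode(c))
            assert s == encode((a + b + c) & 1)
            assert cout == encode((a + b + c) >> 1)
```

With A = 1, B = 0, and Cin = 1, the function returns the sum (1, 0) and the carry (0, 1), matching the worked example. When all inputs are empty, both outputs are empty as well.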
DI full adders are the units that make up the right-to-left carry-ripple array of the asynchronous multiplier shown in figure 3-12.
Having introduced the array multiplier, we now describe the DI carry-lookahead adder (DICLA). The DICLA is implemented with dual-rail signaling for the input bits, sum bits, and carry bits, together with carry-kill, carry-generate, and carry-propagate signals. A carry is said to be “generated” at a given bit position if the sum for that position produces a carry out independent of the carry in. A carry is said to be “killed” at a given bit position if a carry cannot propagate through the bit. The adder is therefore statistically faster than the ripple adder: carry-kill and carry-generate signals can arise at the middle bits, so a carry does not have to pass through all the carry logic from the least significant bit (LSB).
The terms propagate, generate, and kill also apply to blocks, which are formed by grouping several full adders together. A carry is said to propagate through a given block if a carry into the block’s LSB summation is followed by a carry out of the block’s MSB summation. A block is said to generate a carry if the block’s MSB summation produces a carry out independent of the carries into the block’s LSB. Figure 3-15 is a schematic drawing of a conventional DI carry-lookahead adder. The DICLA comprises the input bits (Ai, Bi), the output bits (Si, Ci), and the one-hot internal signals (ki, gi, pi). For simplicity, we use an 8-bit DICLA as an example.
Figure 3-15: An 8-bit DICLA scheme
The DICLA can be built with two basic modules: C and D modules, connected in a tree-like structure. The equations of the C module are defined as follows:
Carry-kill ki = Ai0 Bi0
Carry-generate gi = Ai1 Bi1
Carry-propagate pi = Ai0 Bi1 + Ai1 Bi0
Sum Si0 = Ai0 Bi0 Ci0 + Ai1 Bi1 Ci0 + Ai0 Bi1 Ci1 + Ai1 Bi0 Ci1
Sum Si1 = Ai1 Bi1 Ci1 + Ai1 Bi0 Ci0 + Ai0 Bi1 Ci0 + Ai0 Bi0 Ci1
where i = 0, 1, …, n-1. The C module is shown in figure 3-16. The dual-rail signals on the left side of figure 3-16 are grouped as Ai = (Ai0, Ai1), Bi = (Bi0, Bi1), Ci = (Ci0, Ci1), Si = (Si0, Si1), and Ii = (ki, gi, pi). The schematic symbol of the C module is shown on the right side of figure 3-16.
Figure 3-16: C-module
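The C-module equations can be sketched for a single bit position as follows. The function name `c_module` is hypothetical, and the sketch is a direct evaluation of the equations rather than a model of the gate-level circuit. It also illustrates why (ki, gi, pi) forms a one-hot code: for valid inputs exactly one of the three signals is high, and for empty inputs all three are low.

```python
def c_module(a, b, c):
    """One bit of the DICLA.

    a, b, c are dual-rail pairs (X0, X1). Returns the sum pair (Si0, Si1)
    and the internal signals (ki, gi, pi).
    """
    (a0, a1), (b0, b1), (c0, c1) = a, b, c
    k = a0 & b0                  # both inputs 0: carry is killed
    g = a1 & b1                  # both inputs 1: carry is generated
    p = (a0 & b1) | (a1 & b0)    # inputs differ: carry propagates
    s0 = (a0 & b0 & c0) | (a1 & b1 & c0) | (a0 & b1 & c1) | (a1 & b0 & c1)
    s1 = (a1 & b1 & c1) | (a1 & b0 & c0) | (a0 & b1 & c0) | (a0 & b0 & c1)
    return (s0, s1), (k, g, p)


ZERO, ONE, EMPTY = (1, 0), (0, 1), (0, 0)

# The internal signals depend only on A and B, so they become valid
# even while the carry input is still empty.
_, signals = c_module(ONE, ONE, EMPTY)
assert signals == (0, 1, 0)
```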
The equations for the D module are defined as follows:
Block-carry-propagate Pi,k = Pi,j Pj-1,k
Block-carry-kill Ki,k = Ki,j + Pi,j Kj-1,k
Block-carry-generate Gi,k = Gi,j + Pi,j Gj-1,k
Block-carry-out Cj0 = Kj-1,k + Pj-1,k Ck0
Block-carry-out Cj1 = Gj-1,k + Pj-1,k Ck1
Where i = 0, 1, …, n-2, n-1. The D module is shown in figure 3-17.
Figure 3-17: D-module
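The D-module equations can be sketched in the same way. In the sketch below, `hi` stands for the signals (Ki,j, Gi,j, Pi,j) of the upper block, `lo` stands for the signals (Kj-1,k, Gj-1,k, Pj-1,k) of the lower block, and `c_k` is the dual-rail carry into bit k. The function name and argument names are hypothetical.

```python
def d_module(hi, lo, c_k):
    """Merge two adjacent blocks and produce the carry between them.

    Returns the merged signals (Ki,k, Gi,k, Pi,k) and the carry (Cj0, Cj1)
    that leaves the lower block and enters bit j.
    """
    (k_hi, g_hi, p_hi), (k_lo, g_lo, p_lo) = hi, lo
    c_k0, c_k1 = c_k
    merged = (
        k_hi | (p_hi & k_lo),   # killed in hi, or killed in lo and passed by hi
        g_hi | (p_hi & g_lo),   # generated in hi, or generated in lo and passed by hi
        p_hi & p_lo,            # propagates only if both blocks propagate
    )
    c_j = (k_lo | (p_lo & c_k0), g_lo | (p_lo & c_k1))
    return merged, c_j


KILL, GENERATE, PROPAGATE = (1, 0, 0), (0, 1, 0), (0, 0, 1)

# An upper block that propagates on top of a lower block that generates
# behaves as one block that generates; the carry into bit j is a valid 1
# regardless of the incoming carry.
merged, c_j = d_module(PROPAGATE, GENERATE, (1, 0))
assert merged == GENERATE and c_j == (0, 1)
```

If both inputs are one-hot, the merged signals are one-hot as well, so D modules can be stacked in a tree without losing the encoding.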
The signals on the left side of figure 3-17 are grouped as Ii,j = (Ki,j, Gi,j, Pi,j) and Ci = (Ci0, Ci1).
The schematic symbol of the D module is shown on the right side of figure 3-17. Initially, all carries (Ci0, Ci1 for i = 1, 2, …, n) and the internal signals (Ki,j, Gi,j, Pi,j) are zero, because all primary inputs (Ai0, Ai1, Bi0, and Bi1 for i = 0, 1, …, n-1) and the input carry (C00, C01) are zero.
During the computation, the inputs Ai0, Ai1, Bi0, Bi1, C00, and C01 become valid, and the outputs (Ci0, Ci1) and (Si0, Si1) for i = 1, 2, …, n then become valid one after another. Finally, the completion detector checks all outputs and raises the completion signal, indicating that the operation is complete.
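The whole adder can be sketched by combining the C-module and D-module equations. The model below is a simplification under stated assumptions: it evaluates the equations once instead of simulating signal transitions, it splits each block in the middle to form the tree, and all function names are hypothetical. The bit numbering also differs slightly from the text, with sum bits indexed from 0 to n-1 and carry i entering bit i. The completion detector is modeled as a check that every sum bit and the carry out are valid.

```python
ZERO, ONE, EMPTY = (1, 0), (0, 1), (0, 0)


def to_rails(x, n):
    """Encode an n-bit integer as a list of dual-rail pairs, LSB first."""
    return [ONE if (x >> i) & 1 else ZERO for i in range(n)]


def kgp(a, b):
    """C module: internal signals (k, g, p) of one bit."""
    (a0, a1), (b0, b1) = a, b
    return (a0 & b0, a1 & b1, (a0 & b1) | (a1 & b0))


def sum_bit(a, b, c):
    """C module: dual-rail sum of one bit."""
    (a0, a1), (b0, b1), (c0, c1) = a, b, c
    s0 = (a0 & b0 & c0) | (a1 & b1 & c0) | (a0 & b1 & c1) | (a1 & b0 & c1)
    s1 = (a1 & b1 & c1) | (a1 & b0 & c0) | (a0 & b1 & c0) | (a0 & b0 & c1)
    return (s0, s1)


def merge(hi, lo):
    """D module: block signals of two adjacent blocks."""
    (kh, gh, ph), (kl, gl, pl) = hi, lo
    return (kh | (ph & kl), gh | (ph & gl), ph & pl)


def carry_out(block, c):
    """D module: carry leaving a block whose carry in is c."""
    (k, g, p), (c0, c1) = block, c
    return (k | (p & c0), g | (p & c1))


def dicla(a, b, c_in):
    """Return (sum pairs, carry-out pair) for dual-rail operands a and b."""
    n = len(a)
    leaf = [kgp(a[i], b[i]) for i in range(n)]
    carries = [EMPTY] * n

    def up(lo, hi):              # block signals for bits lo..hi
        if lo == hi:
            return leaf[lo]
        mid = (lo + hi + 1) // 2
        return merge(up(mid, hi), up(lo, mid - 1))

    def down(lo, hi, c):         # distribute the carry c that enters bit lo
        if lo == hi:
            carries[lo] = c
            return
        mid = (lo + hi + 1) // 2
        down(lo, mid - 1, c)
        down(mid, hi, carry_out(up(lo, mid - 1), c))

    down(0, n - 1, c_in)
    sums = [sum_bit(a[i], b[i], carries[i]) for i in range(n)]
    return sums, carry_out(up(0, n - 1), c_in)


def completed(sums, c_out):
    """Completion detector: every output pair has exactly one rail high."""
    return all(s0 ^ s1 for s0, s1 in sums + [c_out])


def to_int(sums, c_out):
    """Decode valid outputs back into an integer, carry out as the top bit."""
    return sum(s1 << i for i, (_, s1) in enumerate(sums)) + (c_out[1] << len(sums))
```

With empty inputs every output is empty and `completed` is false. If the operands are valid but the input carry is still empty, a generate or kill in the upper bits can already make the carry out valid, which is the source of the average-case speed advantage described above; completion is still not signaled until the remaining sum bits become valid.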