Bit Level Area Optimization Using CSA-Based Structure

Chapter 3. Proposed Algorithm

3.1 Bit Level Area Optimization Using CSA-Based Structure

Structure

Several algorithms of CSA-based MCM optimization are already proposed.

However they only count on the number of CSAs, neglecting the size difference of them. The fact that the word length of different CSAs are not identical makes these algorithms not obtaining minimum cost in bit-accuracy. An example is shown in Figure 19. The input is denoted by x and there are two solutions for implementing 51x:

13x<<2 − x and x<<6 − 13x. The constant 13x is implemented by 16x − 4x + x.

Implemented by our later proposed method the number of adder bits of 13x<<2 − x is 13 and x<<6 − 13x is 10 if we assume input's word length is 12. ΔAdder bit(%) is (13

− 10) / 13 =23.08 %. Thus only considering number of adder without considering the bit-level accuracy cannot obtain the smaller cost in this case.

Figure 19. Two different additions for implementing 51x have different the number of adder bits

However, the word length calculation of CSA involves the number system.

Two's complement is the most generally used binary signed number system in digital systems. For a N-bit number A, the MSB A[N − 1] is the sign bit and it represents a negative value –A[N – 1]2^{(N – 1)}. On the other hand, the remaining bit A[x] represents the positive value A[x]2^x, 𝑥 ∈ {0,1, … , 𝑁 − 2}. The corresponding value of the two's

For example, if A is an 8-bit two's complement number, the bit A[7] represents the value −A[7]2⁷. The corresponding value is –A[7]*2⁷+ A[6]*2⁶+ … + A[0]*2⁰.

Figure 20. An 8-bit two's complement number

In our proposed algorithm we will mark the sign bit by red square as in Figure 21 because it is different from the other bits. Due to the fact that the positions of sign bit in all three inputs may be different, we may face the problem of adding sign bit with unsigned bits. If we add sign bits with unsigned bits directly, the result will also contain sign bits. However, inserting a sign bit in the middle of a data word is not allowed in 2’s complement number system.

Figure 21. MSB is a sign bit while the other bits are unsigned bits

To deal with this problem, the inputs with smaller positions of MSB are needed to do the sign extension. Extend the two inputs with smaller positions of MSB is a trivial way to deal with the problem, but it comes with a high cost in terms of adder bits. An example is shown in Figure 22(a). Assume the inputs 1 and 2 of the addition 16 + 2 + 1 =19 do the trivial sign extension and word length of input variable is 4. The total number of adder bits is 4. But in our proposed method we adjust the sign extension as shown in Figure 22(b). Fewer bits are extended and the number of adder bits for the addition is less. The principle and detail of the algorithm will be discussed in this chapter.

Figure 22. (a) Trivial sign extension (b) Smart sign extension

3.2 Problem Formulation

As mentioned in 3.1, we have changed the optimization target from reduction of the number of CSA's to reduction of the number of adder bits for implementing all constants. The problem formulation is shown as below:

Area optimization of CSA-based MCM block Problem: Given a set of constants (positive and odd), find the minimum the number of adder bits for implementation of all constant multiplications.

Definition 1: A CSA-based MCM block has single input and multiple output ports that each output port has two data words (sum and carry of a CSA).

Definition 2: Adder bit is a bit which needs a full adder implementation in CSA.

The addition in adder bit is not replaced by simpler hardware like wiring and inverter.

3.3 Overall Flow

The overall flow of our proposed algorithm is shown in Figure 23. After the preprocessing of making the given set of constants positive and odd, the following four steps will be executed:

1. Find AGN representations for constants

2. Build Boolean network for adder sharing

3. Calculate the number of adder bits for each adder

4. Formulation of 0-1 ILP problem

In step 1, the AGN representations for constants are found. Then in step 2, the connections between constants and additions are built and a Boolean network is constructed to represent the connections. After step 2, the calculation of number of adder bits for each adder is performed in step 3. In step 4, the ILP constraints and optimization target function are transformed from the Boolean network generated in step 2 and calculated adder bits in step 3. Finally the ILP tool (in this thesis, Gurobi_5.0 is used) can be executed to find the optimal solution under the constraints in step 4. ILP-optimized solution is the additions that total number of adder bits is minimal. The details of all 4 steps will be discussed in 3.4~3.7.

Input: a set of constants

Make all constants positive and odd

1. Find AGN representations for

constants

2. Build Boolean network for adder sharing

3. Calculate #adder bits for each adder

4. Formulation of 0-1 ILP problem

Output: additions that corresponding

#adder bit is minimized Run ILP tool Gurobi_5.0

for optimal solution

Figure 23. Overall flow of our proposed algorithm

3.4 Find AGN representations for Constants

To avoid the exponentially increasing number of ILP constraints because the graph algorithms have an extremely large solution space, it is necessary to choose the candidates of additions carefully. We adopt the approximate general number in this step because it provides better solutions than CSE algorithms without adding too many extra constraints. Furthermore, it utilizes the characteristic that adding 1 to a CSA output only needs one CSA as shown in section 2.2. Compared to all previous works, it provides the minimum number of adders to the best of our knowledge.

For a target constant, we first find all combinations of adding or subtracting 2^N to another constant (can be negative). For example, 51 = 1<<3 + 43 means the target constant 51 can be implemented by adding 1<<3 = 2³ to constant 43. But there is an extra rule mentioned in 2.1. A set called N_set is built and stored in the ascending order based on the sorting result of all odd constants that can be represented in a given word length with their number of non-zero terms in CSD representation. According to the sorted order in Nset, only the constants with the smaller indices compared to the target constant is allowed (see Figure 17(b)). An example is showed in Figure 24. Suppose the 51 is the needed constant and the word length of the constant is 7, there are 14 possible candidates of AGN representations. But among these 14 candidates, three of them (−1<<1 + 53, −1<<5 + 83 and −1<<6 + 115) have inputs with larger indices compared to 51. So they won't be the legitimate AGN representations for generating 51x. Furthermore, there are 2 candidates with the same input constant but different shift and sign (1<<6 − 13 and −1 + 13<<2). If only counting number of adder as in [21], they have the same cost and one of them will be removed for reducing ILP

constraints. But in our work the number of adder bits of these two adders and corresponding position of MSB and LSB of sum and carry may be different, it is necessary for us to keep all of the candidates.

Figure 24. Legitimate AGN representations of 51

The overall process of finding legal AGN representations for all constants is shown in Table 3. First a set of constant C is inserted of all target constants and the word length of constant is assumed as n . For each element ck in C, it adds or subtracts a left shift (from 0 to n – 1) of 1 and the addition/subtraction results are saved in a temporary set temp_C. For every element in temp_C, it is transformed into a positive and odd constant, c3, and its corresponding AGN representation, o1: c_k = s1 * (c3<<m) + s2 * (1<<n), is found. Both s1 and s2 is 1 or – 1 and m as well as n are how many bits should c3 and 1 to be shifted, respectively. If (1) the index of c3 in N_set is smaller than the index of ck in Nset , (2) o1 is not found before and (3) the number of non-zero digits of c3 in CSD representation is greater than or equal to 3, o1 will be saved. If the number of non-zero digits of c3 in CSD representation is greater than 3, it will be inserted into C. Finally, c_k is removed from C and temp_C is set as an empty set. The process will be executed until C is an empty set.

Ignore one of

3.5 Construct Boolean Network

If all AGN representations of each constant are found, a Boolean network would be built to represent the connections of all constants and additions. First, we transform all AGN representations to an AND gate. For example, 51 = −1 + 13<<2 is shown in Figure 24. In Figure 25, every edge is either a constant or an adder output. Edge vX Table 3. Overall process of finding legal AGN representations for all constants

represents the constant X and edge vX_i represents the i_th AGN representation of constant X. Bold edge represents the sum and carry of an addition and thin edge represents the input variable. Each edge has a corresponding binary variable. The value of the binary variable represents whether we need the constants or addition implementation.

Figure 25. Corresponding AND gate of −1 + 13<<2

After AGN representations are transformed into the AND gates, all of the candidates for the same constants are connected to an OR gate as shown in Figure 26.

Only part of AGN representations are shown in Figure 26, but in practice all of the AGN representations should be connected to the same OR gate. The output of the OR gate is vX and the meaning of the OR gate is that if we need constant X, at least one of the additions should be implemented.

Figure 26. OR gate connected with all AGN representations for 51

The edges v13, v47 and v67 in Figure 26 are not only connected to one AND gate, of course. The sharing for the same input of different additions is common because the sharing is the key of the area minimization in MCM designs. Part of the sharing is shown in Figure 27. If v13 is not only used in −1 + 13<<2 but also in 1<<6 + 13, the branch of edge v13 will appear.

Figure 27. Part of the whole Boolean network

3.6 Calculation of Number of Adder Bits for All

Additions

3.6.1 Calculation of Number of Adder Bits

Unlike processor-based design, ASIC designs have the flexibility to adjust the position of MSB (most significant bits) and LSB (least significant bits) according to the dynamic range and resolution of a data word. In MCM design, because of bit-wise shifting, the bit position of every data word may be varied. In our work, calculation of number of adder bits depends on the bit positions of the data words and how we do

sign extension. The MSB's and LSB's of the three inputs determine the number of adder bits of an addition. LSB determines the starting point and MSB determines the end point of a series of the full adders in CSA. Besides knowing the adder bits in a CSA, in order to supply the MSB and LSB information to the fanout CSA, the position of the MSB and LSB of both output words (sum and carry) in current CSA would be the information we must know. So when we need to calculate the adder bits of all additions, the MSB and LSB of three inputs of a CSA and its MSB and LSB of both sum and carry are necessary. The properties of input of MCM block such as word length and number system will also affect the calculation. We assume input variable of the MCM block is under the 2's complement representation and the word length of MCM input is equal to the word length of constant.

In our proposed algorithm, the calculation of adder bits of all additions is actually finding the weighting of all AND gates in the Boolean network. An example is shown in Figure 28. First we define MSB(x) is the MSB position of x and LSB(x) is the LSB position of x. Assume MSB’s and LSB’s of sum and carry of v67 are known. In the figure we denote the position of MSB and LSB of sum add carry by ( [MSB(sum):LSB(sum)], [MSB(carry):LSB(carry)] ). And the positions of MSB’s as well as LSB’s of sum and carry of the output edge v51_3 are solved by the smart sign extension method we proposed (will be shown in next section). The number of adder bits in the addition is solved at the same time. Then the positions of MSB's and LSB's of sum and carry of v51_1~v51_n and calculation of number of adder bits for each addition implemented 51 are determined. How we obtain the bit information of v51 after the OR gate will be described in section 3.6.3.

Figure 28. Calculation of #adder bits of v51_3

3.6.2 Smart Sign Extension

The smart sign extension we propose is a systematic method to do sign extension for any input condition. The concept is similar to some booth multiplier, however, the booth multiplier is a regular structure and there is no systematic method proposed yet to the best of our knowledge. There are three cases are sorted out according to the inputs of the CSA. We sorted out all input combinations into three cases, with some common parts and some varied parts. We divide this method to two steps due to these three cases. The common first step is for the LSB handling. The second step handles sign extension for each case. Compared with the trivial sign extension, it has less number of adder bits.

First step:

We rename the three inputs to x, y and z in CSA according to their position of

MSB such that MSB(z)>= MSB(y) >= MSB(x). For input x, we extend its sign bit to 33

Figure 29. (a) Sign extension for input x (b) Direct wiring (c) First step complete

MSB(y) as shown in Figure 29(a). Then for those inputs whose LSB position it's not the largest, we can do direct wiring to the sum and carry. Because there are two outputs and inputs, no full adder is needed. This means from the bit 0 to max(LSB(x), LSB(y), LSB(z)) − 1, no full adder is needed. The starting point of the series of full adders is max(LSB(x), LSB(y), LSB(z)). In Figure 29(b), the LSB positions of input x and input y are not the largest. So for the bits below LSB(z) (i.e., 0~LSB(z) − 1), no full adder is needed. In implementation, the input with the smallest position of LSB will be wired to the sum and the other input will be wired to carry because we want to balance the word length of carry and sum. The smallest position of LSB is 0 since we only implement the odd constant. The position of LSB of sum and carry become 0 and median(LSB(x), LSB(y), LSB(z) ), respectively. The following procedures of proposed algorithm will depend on the largest position of MSB. In Figure 29(c), it is the MSB(z). We find there are three cases would generate different results and should be discussed separately.

Second step:

We will divide this step into 3 cases according to the relative position of MSB(z) and MSB(y). Case 1 is MSB(z) > MSB(y) + 1, case 2 is MSB(z) = MSB(y) + 1 and case 3 is MSB(z) = MSB(y). These three cases contained all possibility because MSB(z) ≥ MSB(y) by our definition.

Case 1: MSB(z) > MSB(y) + 1

In this case, as shown in Figure 30, MSB(z) is at least 2 bits higher than MSB(y).

The example used in Figure 30 is a CSA with three inputs and their word lengths are 6.

The addition implemented by this CSA is 1<<4 + 1<<2 + 1. So the input x is 1, input

y is 1<<2 and input z is 1<<4. The first 4 bits of sum and first 2 bits of carry are from

the first 4 bits of x and first 2 bits of y, respectively. In this case, even if we extend input x to the MSB(y), the bit at MSB(y) in input z is still not sign bit thus can’t be added to x and y directly. So an extra operation is needed. We will inverse the bits at MSB(y) of the input x and y and treat them as the unsigned bits. As shown in Figure 30(b), the equation of this operation is –x5 – y₅= –2 + x₅+ y₅. The logical value of a sign bit x_s is actually –x_s while a inversed Boolean variable x_s is denoted by x_s , To prove the equation is a tautology, the truth table of left part and right part of the equation is shown in Table 4.

After handling the sign bit of x and y, we can add these unsigned bits in the middle as shown in Figure 30(c). The end point of the series of the full adders becomes MSB(y) as shown in Figure 30(d). So in this case the number of adder bits of the addition is MSB(y) − max(LSB(x), LSB(y), LSB(z) ) + 1. In this example, the number of adder bits is 7 − 4 + 1 = 4. The bits s4~s7 in sum are the sum outputs and c₃~c₆ in carry are the carry outputs of the series of full adders. The bit c₂ in carry is

Table 4. The truth table of –x5– y5 and –2 + x5+ y5. x5 y5 –x5 – y5 –2 + x5+ y5

0 0 –0–0 = 0 −2+1+1=0

0 1 –0–1 = –1 −2+1+0=–1

1 0 –1–0 = –1 −2+0+1=–1

1 1 –1–1 = –2 −2+0+0=–2

actually 0, but we will treat it as a variable to make the carry to be a continuous vector for the simplicity of the whole process.

The −1 at MSB(y) + 1 can be used to transform the bit of input z at the same position from unsigned bit to sign bit. In this example, the equation is -z₄. = -1 +z₄ . The truth table of left part and right part of the equation is shown in Table 5 to prove the equation is also a tautology. The inversed z4 can be used as the sign bit of the sum and the remaining bit in the input z (i.e., the sign bit z₆) can be wired to the carry. So both sum and carry are in the format of two's complement. The Table 6 shows the MSB and LSB of both sum and carry.

Table 6. The MSB and LSB of both sum and carry in case 1

MSB LSB

Sum MSB(y) +1 0

Carry MSB(z) median(LSB(x), LSB(y), LSB(z) ) Table 5. The truth table of –1 +z₄ and –z₄

z4 –1 +z₄ –z₄

0 –1+0 = –1 –1

1 –1+1 = 0 0

Figure 30. Example of case 1 after sign extension

Case 2: MSB(z) = MSB(y) + 1

In this case, as shown in Figure 31, MSB(z) is at actually 1 bit higher than MSB(y).

The example used in Figure 31 is a CSA with three inputs is a CSA with three inputs and their word lengths are 6. The addition implemented by this CSA is 1<<3 + 1<<2 + 1. So the input x is 1, input y is 1<<2 and input z is 1<<3. The first 3 bits of sum and

first 1 bit of carry are from the first 3 bits of x and first 1 bits of y, respectively.

The following procedure is very similar to the procedure in case 1. We will inverse the bits at MSB(y) of the input x and y and treat them as the unsigned bits.

This operation will increase an extra −1 at MSB(y) + 1. In Figure 31(b), this means –x5 – y₅= –2 + x5+ y5. After adding the middle bits, the bits s3~s7 in sum are the sum outputs and c2~c6 in carry are the carry outputs as shown in Figure 31(d). The number of adder bits is also MSB(y) − max(LSB(x), LSB(y), LSB(z) ) + 1. In the example used in Figure 31, it is 7 − 3 + 1 = 5. The bit c1 in carry is actually 0, but we will treat it as a variable like in case 1.

The different point is the bit at MSB(y) + 1 of input z is the sign bit of the input z.

In this example as shown in Figure 31(d), it is −1 − z5, not −1 + z5 in case 1. We will do 1-bit sign extension for input z and inverse the bit at MSB(y) +1 of input z. The inversed bit is used as the sign bit of sum and the extended bit is used as the sign bit of carry. In the example, the sign bit of sum is z5 and the sign bit of carry is z5. This operation is actually the equation –1 – z5= –1 + z5-2z5 = – z5 -2z5 . Then we can take z5 as the sign bit of sum and take z5 as sign bit of carry. The Table 7 shows the MSB and LSB of both sum and carry.

Table 7. The MSB and LSB of both sum and carry in case 2

MSB LSB

Sum MSB(y) +1 0

Carry MSB(z) +1 median(LSB(x), LSB(y), LSB(z) )

Figure 31. Example of case 2 after sign extension

Case 3: MSB(z) = MSB(y)

In this case, as shown in Figure 32, MSB(z) equals MSB(y). The example used in

在文檔中應用於進位儲存加法器式多常數乘法設計面積最小化之智慧型正負號延伸及精確位元計算技術 (頁 28-0)