
A 1.5 V CMOS high-speed 16-bit÷8-bit divider using the quotient-select architecture and true-single-phase bootstrapped dynamic circuit techniques suitable for low-voltage VLSI


Academic year: 2021



C. C. Yeh, J. H. Lou and J. B. Kuo

Rm. 338, Dept. of Electrical Eng., National Taiwan University, Roosevelt Rd., Sec. 4, #1, Taipei, Taiwan 106-17

Abstract-This paper reports a 1.5V high-speed 16-bit÷8-bit divider circuit using the quotient-select architecture and true-single-phase bootstrapped dynamic circuit techniques. Based on a 0.8µm CMOS technology, the speed performance of this 16-bit÷8-bit divider circuit is improved by 45% as compared to the divider using the non-restoring iterative architecture and the domino dynamic logic circuits without the bootstrapped technique.

I. INTRODUCTION

Division is an important function in a CPU arithmetic unit. Enhancing the speed performance of a divider circuit is critical in raising the speed performance of a VLSI CPU [1]-[3]. Fig. 1 shows the block diagram of a 16-bit÷8-bit divider circuit using a non-restoring iterative architecture [1]. A 16-bit dividend and an 8-bit divisor are assumed to be positive and smaller than 1. As in a standard binary division operation, successively right-shifted values of the divisor are subtracted from or added to the dividend. Also, for next-generation deep-submicron CMOS VLSI technology, low supply voltage is the trend. For sub-0.1µm CMOS technology, 1.5V is necessary. At a supply voltage of 1.5V, the speed performance of CMOS dynamic logic circuits such as NORA [4], domino, and Zipper is better than that of CMOS static ones as a result of reduced internal parasitic capacitances. However, when the serial fan-in is large, the associated propagation delay may increase drastically, which is especially serious at a low supply voltage. In this paper, by using a 1.5V CMOS dynamic logic circuit with a bootstrapper technique, a 1.5V high-speed CMOS bootstrapped 16-bit÷8-bit divider using the quotient-select architecture is reported. It will be shown that the speed performance of this 16-bit÷8-bit divider circuit is improved by 45% as compared to the divider using the non-restoring iterative architecture and the domino dynamic logic circuits without the bootstrapped technique.

This work is supported under R.O.C. National Science Council Contracts #84-2622-E002-008, 011 & #85-2215-E0002-024.

Fig. 1. The architecture of a 16-bit÷8-bit non-restoring iterative divider.

II. CONVENTIONAL NON-RESTORING ITERATIVE DIVIDER

In a conventional 16-bit÷8-bit non-restoring iterative divider, eight 'quotient rows' are required. In each quotient row, there are A (adder) cells, an S (sign) cell, and a CLA (carry look-ahead) cell. Each A cell is composed of a full adder and a control signal implemented by an EXCLUSIVE-OR gate. The control signal decides whether an add or a subtract is to be performed. In the jth A cell in the (i-1)th row, using a full adder, its associated sum and carry signals at the output are related to the sum and carry signals of the previous row as:

Fig. 2. The functional blocks (A cell, S cell, and CLA cell) of the ith quotient row in the 16-bit÷8-bit non-restoring iterative divider.

where $\bar{Q}_{i-2}$ is the complement of the (i-2)th quotient bit, and $\bar{D}_j$ is the complement of the jth divisor bit. In addition, propagate and generate signals for the carry look-ahead circuit are produced:

$$P_{i,j} = \left[(\bar{Q}_{i-2} \oplus \bar{D}_j) \oplus S_{i-1,j+1} \oplus C_{i-1,j+2}\right] + C_{i,j+1}, \tag{3}$$

$$G_{i,j} = \left[(\bar{Q}_{i-2} \oplus \bar{D}_j) \oplus S_{i-1,j+1} \oplus C_{i-1,j+2}\right] \cdot C_{i,j+1}. \tag{4}$$
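For reference, the A-cell sum and carry relations can be sketched in a form that is consistent with the bracketed term of Eqs. (3) and (4). This assumes each A cell is a standard full adder whose three inputs are the XOR-controlled divisor bit and the sum and carry from the previous row. The carry expression in particular is the textbook majority function and is an assumption, not taken from the text.

```latex
% Assumed full-adder relations for the jth A cell of row i
% X_{i,j} is a shorthand introduced here for the controlled divisor bit.
X_{i,j} = \bar{Q}_{i-2} \oplus \bar{D}_j
\\
S_{i,j} = X_{i,j} \oplus S_{i-1,j+1} \oplus C_{i-1,j+2}
\\
C_{i,j} = X_{i,j}\, S_{i-1,j+1} + X_{i,j}\, C_{i-1,j+2} + S_{i-1,j+1}\, C_{i-1,j+2}
% With these, Eq. (3) reads P_{i,j} = S_{i,j} + C_{i,j+1}
% and Eq. (4) reads G_{i,j} = S_{i,j} \cdot C_{i,j+1}.
```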

The S cell, which contains the sum portion of the A cell, is used to compute the sign for each row:

$$S_i = (\bar{Q}_{i-2} \oplus S_{i-1,1} \oplus C_{i-1,2}) \oplus C_{i,1}. \tag{5}$$

Eqs. (1)-(4) are applicable for the cells not at the top and left boundaries ($2 \le i \le b_q + 1$ and $1 \le j \le b_d - 1$). For the cells at the top and left boundaries ($i = 1$ or $j = b_d$), where $b_q$ is the quotient bit number and $b_d$ is the divisor bit number, Eqs. (1)-(4) should be modified to include appropriate boundary conditions as shown in Fig. 1. Note that $N_i$ is the ith dividend bit.

As shown in Fig. 2, the CLA cell is used to compute the final carry signal according to the following formula:

$$Y_{i,j} = G_{i,j} + P_{i,j}\, Y_{i,j+1}, \quad j = 1, \ldots, b_d, \tag{6}$$

$$LA_i = Y_{i,1}. \tag{7}$$

Using an EXCLUSIVE-OR logic gate, the ith quotient bit at the output is low if either output from the S cell or the CLA cell is 1.
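The carry recurrence of Eq. (6) can be sketched behaviourally as follows. This is a minimal illustration, not the circuit itself: the function names are hypothetical, bit lists are ordered MSB first (index 0 corresponds to j = 1), and the boundary value of Y beyond the last position is passed in as an explicit `carry_in`, which is an assumption of the sketch.

```python
def propagate_generate(s_bits, c_bits):
    """Per-position propagate (OR) and generate (AND) signals.

    s_bits and c_bits stand for the two bit vectors being combined
    (a sum vector and an aligned carry vector), MSB first.
    """
    p = [s | c for s, c in zip(s_bits, c_bits)]
    g = [s & c for s, c in zip(s_bits, c_bits)]
    return p, g


def lookahead_carry(p, g, carry_in=0):
    """Evaluate Y_j = G_j + P_j * Y_{j+1} from the last position down to j = 1.

    carry_in plays the role of Y at position b_d + 1 (assumed boundary value).
    The returned value corresponds to LA = Y_1, the final carry.
    """
    y = carry_in
    for pj, gj in zip(reversed(p), reversed(g)):
        y = gj | (pj & y)
    return y
```

With propagate defined as an OR and generate as an AND, the value returned by `lookahead_carry` equals the carry out of adding the two bit vectors, which is what the XOR with the S-cell output then uses to resolve the sign.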

Fig. 3. Block diagram of the 1.5V 16-bit÷8-bit divider using the quotient-select architecture and true-single-phase bootstrapped dynamic circuit techniques. (Diagram labels: Blocks 1 to 3, each with Levels 1 to 3.)

The speed performance of a non-restoring iterative divider is determined by the speed of the propagate and generate signals of the A cells, the delay time of the $LA_i$ signal in the CLA cell, and the speed of producing the quotient bit $Q_i$ at the XOR in the quotient row. After the quotient bit of a quotient row ($Q_i$) is produced, its value is transferred to the next row. Then, the quotient bit of the next row ($Q_{i+1}$) is computed. No quotient bit of the next row can be computed until the quotient bit of the previous row is obtained.
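The row-to-row dependency of the non-restoring scheme can be seen in a short behavioural model. The sketch below works on unsigned integers rather than the fractional operands assumed in the paper, and the function name and the `n_bits` parameter are hypothetical. Each loop iteration stands for one quotient row: whether it adds or subtracts depends on the sign left by the previous iteration.

```python
def nonrestoring_divide(dividend, divisor, n_bits):
    """Unsigned non-restoring division, one quotient bit per iteration.

    Returns (quotient, remainder). Each iteration needs the sign of the
    previous partial remainder, so the iterations cannot overlap.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    r = 0  # partial remainder, may go negative
    q = 0
    for i in range(n_bits - 1, -1, -1):
        bit = (dividend >> i) & 1
        if r >= 0:
            r = 2 * r + bit - divisor  # previous quotient bit was 1: subtract
        else:
            r = 2 * r + bit + divisor  # previous quotient bit was 0: add
        q = (q << 1) | (1 if r >= 0 else 0)
    if r < 0:
        r += divisor  # final correction of a negative remainder
    return q, r
```

For example, `nonrestoring_divide(7, 2, 3)` walks through the partial remainders -1, 1, 1 and returns `(3, 1)`.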

III. THE PARALLEL-OUT QUOTIENT-SELECT DIVIDER

As shown in Fig. 3, in the 3-bit parallel-out quotient-select architecture, instead of waiting for the quotient bit from the previous row, three quotient blocks are used to produce the nine output quotient bits almost simultaneously. In each block, three levels of quotient rows are arranged. For example, under the quotient row of the first level, there are two quotient rows at the second level: a and b. At the third level, there are four quotient rows: w, x, y, and z. The second and the third levels of quotient rows are arranged so that the three output quotient bits Q0, Q1, and Q2 are produced simultaneously. In these seven quotient rows in three levels, the input quotient bits of rows 0, a, b, w, x, y, and z have been designated as 1, 1, 0, 1, 0, 1, and 0, respectively.

Fig. 4. The 1.5V CMOS bootstrapped dynamic logic circuits including the CMOS bootstrapper circuit. (Panels: bootstrapped OR gate, Output = A + B; bootstrapped AND gate, Output = A · B.)

The seven individual output quotient bits of each row in three levels are Q0, Q0a, Q0b, Q0w, Q0x, Q0y, and Q0z, respectively. The inputs to the first block are the dividend bits N0-N8. The output quotient bit of row 0, Q0, may be 0 or 1. The sum and the carry signals produced by row 0 are transferred to rows a and b such that Q0a and Q0b can be computed immediately without waiting for the generation of Q0. Another dividend bit, N9, is used as the input to both quotient rows a and b. Then, the output quotient bit Q1 is set equal to Q0a or Q0b by the multiplexer (MUX), depending on Q0. If Q0 is 1, MUX outputs Q0a as Q1. If Q0 is 0, MUX outputs Q0b as Q1. The outputs of the second-level quotient rows (Sa1-Sa8, Ca2-Ca8; Sb1-Sb8, Cb2-Cb8) are used as inputs to the third level. In addition, another dividend bit, N10, is used as an input to the third level.

The output quotient bit Q2 of the third level is determined by a similar decision criterion as in the second level. Under the third level of the quotient rows, a multiplexer is used to select the sum and the carry of the first block: S32-S38, C31-C38. The second block uses the outputs from the first block to generate the output quotient bits Q3, Q4, and Q5. The third block generates the output quotient bits Q6, Q7, and Q8.
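The select-after-speculate idea of one quotient block can be modelled in a few lines. This is a behavioural sketch under simplifying assumptions: operands are unsigned integers, each row is reduced to one add or subtract on an integer partial remainder (the carry-save sum and carry vectors of the real rows are not modelled), and all function names are hypothetical. Rows a, b, w, x, y, and z are evaluated with their input quotient bit fixed to 1 or 0 as in the text, and multiplexers pick the valid results afterwards.

```python
def row(r, next_bit, divisor, q_in):
    """One quotient row: subtract if the input quotient bit is 1, else add."""
    r_out = 2 * r + next_bit + (-divisor if q_in else divisor)
    return r_out, int(r_out >= 0)


def quotient_select_block(r, bits3, divisor, q_in):
    """Produce three quotient bits from seven rows arranged in three levels."""
    n0, n1, n2 = bits3
    r0, q0 = row(r, n0, divisor, q_in)    # level 1: row 0
    ra, q0a = row(r0, n1, divisor, 1)     # level 2: row a assumes Q0 = 1
    rb, q0b = row(r0, n1, divisor, 0)     #          row b assumes Q0 = 0
    rw, q0w = row(ra, n2, divisor, 1)     # level 3: rows w, x follow row a
    rx, q0x = row(ra, n2, divisor, 0)
    ry, q0y = row(rb, n2, divisor, 1)     #          rows y, z follow row b
    rz, q0z = row(rb, n2, divisor, 0)
    q1 = q0a if q0 else q0b               # MUX selects Q1 once Q0 is known
    if q0:
        q2, r_out = (q0w, rw) if q1 else (q0x, rx)
    else:
        q2, r_out = (q0y, ry) if q1 else (q0z, rz)
    return (q0, q1, q2), r_out


def quotient_select_divide(dividend, divisor, n_bits=9):
    """Chain blocks of three quotient bits; returns (quotient, remainder)."""
    if divisor <= 0 or n_bits % 3 != 0:
        raise ValueError("need a positive divisor and n_bits divisible by 3")
    r, q_in, q = 0, 1, 0                  # first row is given input bit 1
    for top in range(n_bits - 1, -1, -3):
        bits3 = [(dividend >> (top - k)) & 1 for k in range(3)]
        (q0, q1, q2), r = quotient_select_block(r, bits3, divisor, q_in)
        q = (q << 3) | (q0 << 2) | (q1 << 1) | q2
        q_in = q2
    if r < 0:
        r += divisor                      # final remainder correction
    return q, r
```

In this model the speculative rows cost extra work but remove the wait for the previous quotient bit, which mirrors the area-for-speed trade described for the architecture.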

The speed of the 16-bit÷8-bit divider with the quotient-select architecture is determined by the propagation delay of each of the three blocks as shown in Fig. 3. The propagation delay of producing the output quotient bits Q0, Q1, and Q2 of the first block is mainly determined by the propagation delay of the sum and the carry signals (S's, C's) associated with each quotient row in all three levels in the first block. Although there are three levels in the first block, the speed of producing Q1 and Q2 is not substantially slower than that of producing Q0, since the critical component of the propagation delay in producing the three output quotient bits Q0, Q1, and Q2 is in the adder circuit in the A cell. Therefore, the speeds of generating Q0, Q1, and Q2 are about identical. Similar situations exist for Q3, Q4, and Q5 in the second block and Q6, Q7, and Q8 in the third block.

As a result, the 16-bit÷8-bit divider with the quotient-select architecture is about three times faster than that with the conventional non-restoring iterative architecture.

Fig. 5. The 1.5V CMOS buffer circuit using the bootstrapper technique.

Fig. 4 shows the 1.5V CMOS bootstrapped dynamic logic circuits including the CMOS bootstrapper circuit. When CK is low, the bootstrapper circuit is in its precharge period. During the precharge period, the internal node (Vdo) is precharged to Vdd, and the output voltage Vout is predischarged to ground via MN. The bootstrap capacitor (Cb) is charged to Vdd: the left side is grounded and the right side is at Vb = Vdd. During the precharge period, the right side of the bootstrap capacitor is separated from the output since MPB is off. When CK turns high, the logic evaluation period begins. During the logic evaluation period, MPD, MP, and MN turn off, and the internal node voltage Vdo is determined by inputs A and B. If both A and B are high, Vdo is pulled low and VI is high. Owing to the charge in the bootstrap capacitor, Vb is bootstrapped to over Vdd (the internal voltage overshoot). In addition, as MPB turns on, Vout is pulled high to over Vdd. In the CLA cell, owing to the 1.5V bootstrapped CMOS dynamic logic circuit, the signal swing of the input signals (the P's and G's) exceeds 2V. As a result, the switching speed of the CLA cell is enhanced.
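A rough charge-conservation estimate shows how far the bootstrapped node can rise. The sketch assumes that the left plate of Cb is driven from 0 to Vdd during evaluation, and that everything else on the bootstrapped node is a single lumped parasitic capacitance C_p that was also precharged to Vdd. The symbol C_p is introduced here for illustration, and charge sharing with an initially discharged output load is ignored.

```latex
% Charge on the floating right-hand node is conserved between
% precharge (left plate at 0) and evaluation (left plate at V_{dd}):
C_b (V_{dd} - 0) + C_p V_{dd} = C_b (V_b - V_{dd}) + C_p V_b
\\
\Rightarrow \quad V_b = V_{dd} \left( 1 + \frac{C_b}{C_b + C_p} \right)
\\
% For V_{dd} = 1.5\,\mathrm{V}, exceeding 2 V requires
\frac{C_b}{C_b + C_p} > \frac{1}{3}
\quad \Longleftrightarrow \quad C_b > \frac{C_p}{2}
```

Under these assumptions the node approaches 2Vdd only when Cb dominates the parasitic load, so the bootstrap capacitor has to be sized against the capacitance it drives.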

Fig. 5 shows the 1.5V CMOS buffer (B) circuit using the bootstrapper technique [5]. During the pull-up transient, the operation of the full-swing bootstrapped CMOS buffer circuit is divided into two periods regarding the bootstrap capacitor Cbp: (1) the charge build-up period and (2) the bootstrap period. Prior to the pull-up transient, the input is at 0V and the output of the inverter, VI, is at 1.5V. Therefore, MN1b and MN1 are off; MN2b is on. At the output of the buffer, Vout is at 0V. On the other hand, MN2b and Cbp of the bootstrap segment are separated from MP2 and MP1 of the fundamental segment. As a result, the bootstrap capacitor Cbp holds a charge of 1.5Cbp coulombs. After the input ramp-up period, the right side of the bootstrap capacitor Cbp is disconnected from ground since MN2b is off. Instead, it is connected to the gate of MP1 since MN1b is on. Due to the voltage change at the left side of the bootstrap capacitor Cbp, the right side of the bootstrap capacitor Cbp changes to below 0V (the internal voltage undershoot). As a result, the output voltage can switch at a faster pace since the gate of MP1 is driven below 0V. The pull-down transient has a complementary configuration.

Fig. 6. Transient waveform of the 1.5V 16-bit÷8-bit divider using the quotient-select architecture and the true-single-phase bootstrapped dynamic circuit techniques. (Horizontal axis: time in ns.)

IV. RESULTS AND DISCUSSION

Fig. 6 shows the transient waveform of the 1.5V 16-bit÷8-bit divider using the quotient-select architecture and the true-single-phase bootstrapped dynamic circuit techniques. The load at the quotient bit output is 0.1pF. At a supply voltage of 1.5V, the propagation delay of the output quotient bit Q8 is 107ns for the 16-bit÷8-bit divider using the 3-bit parallel-out quotient-select architecture. Compared with the 192ns propagation delay of the divider using the conventional non-restoring iterative architecture, a speed enhancement of 1.8× has been reached, which is less than the expected 3×. This is due to the fact that the quotient-select architecture is not fully "parallel-processing". The propagation delays of two consecutive quotient rows of a block differ by the delay of a full adder. In addition, the extra delay due to the multiplexer and the buffer also contributes to the shrinkage in the speed enhancement.
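The quoted enhancement follows directly from the two delays. The short check below uses only the 107ns and 192ns figures from the text; the subscripts QS and conv are labels introduced here for the quotient-select and conventional dividers.

```latex
\frac{t_{\mathrm{conv}}}{t_{\mathrm{QS}}} = \frac{192\,\mathrm{ns}}{107\,\mathrm{ns}} \approx 1.79 \approx 1.8
\\
\frac{t_{\mathrm{conv}} - t_{\mathrm{QS}}}{t_{\mathrm{conv}}} = \frac{85\,\mathrm{ns}}{192\,\mathrm{ns}} \approx 0.44
```

Expressed as a reduction in delay, this is roughly 44%, which is in line with the improvement of about 45% quoted in the abstract if the same baseline is assumed.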

Fig. 7. Delay time vs. supply voltage of the true-single-phase CMOS bootstrapped divider. (Curves: conventional divider without bootstrapper circuit; 3-bit parallel-out quotient-select divider. Horizontal axis: supply voltage in V.)

Fig. 7 shows the delay time vs. supply voltage of the true-single-phase CMOS bootstrapped divider. As shown in the figure, regardless of the supply voltage, a consistent improvement in speed for the 16-bit÷8-bit divider using the 3-bit parallel-out quotient-select architecture over the one using the conventional non-restoring iterative architecture can be seen.

V. CONCLUSION

In this paper, a 3-bit parallel-out quotient-select architecture has been studied. In fact, for a large-size divider system such as a 64-bit÷32-bit divider, an 8-bit parallel-out quotient-select architecture can be used to further enhance the speed performance. The more bits used in the parallel-out quotient-select structure, the more improvement in speed can be expected. However, a larger die area is also needed.

REFERENCES

[1] M. Cappa and V. C. Hamacher, "An Augmented Iterative Array for High-Speed Binary Division," IEEE Trans. Computers, Vol. C-22, No. 2, pp. 172-175, Feb. 1973.

[2] R. Stefanelli, "A Suggestion for a High-Speed Parallel Binary Divider," IEEE Trans. Computers, Vol. C-21, No. 1, pp. 42-55, Jan. 1972.

[3] A. Vandemeulebroecke, E. Vanzieleghem, T. Denayer, and P. G. A. Jespers, "A New Carry-Free Division Algorithm and its Applications to a Single-Chip 1024-b RSA Processor," IEEE J. Solid-State Circuits, Vol. 25, No. 3, pp. 748-756, June 1990.

[4] N. F. Goncalves and H. J. DeMan, "NORA: A Racefree Dynamic CMOS Technique for Pipelined Logic Structures," IEEE J. Solid-State Circuits, Vol. 18, pp. 261-266, June 1983.

[5] J. H. Lou and J. B. Kuo, "A 1.5V Full-Swing Bootstrapped CMOS Large Capacitive-Load Driver Circuit Suitable for Low-Voltage CMOS VLSI," IEEE J. Solid-State Circuits, Vol. 32, No. 1, pp. 119-121, Jan. 1997.
