Summary - Background Review - Bootstrap Circuit Techniques for Near-Threshold-Voltage On-Chip Data Transmission

Chapter 2 Background Review

2.4. Summary

In this chapter, the background of the dissertation has been briefly reviewed. Owing to non-ideal effects that arise from the shrinking channel length and gate-oxide thickness, current variation caused by the environment makes circuit design more challenging. Additionally, nano-scale circuit design with a near-threshold supply has several detrimental impacts, so the trade-off between performance and energy efficiency must be handled carefully. The last part of this chapter introduced some popular low-voltage design techniques. Based on the concept of the bootstrap technique, we will develop several bootstrap circuits in the following chapters.

Chapter 3

Near-threshold Clock Network

A clock network needs a driver with strong driving current and little skew. As shown in Fig. 3-1(a), the conventional bootstrapped driver consists of a pull-up and a pull-down control pair that drive the PMOS and NMOS transistors, respectively. As mentioned in Chapter 2, the gate voltages of the PMOS and NMOS driver transistors are kept at VDD and 0, respectively, in the cut-off phase, and are driven to -VDD and 2VDD, respectively, in the driving phase to increase the current density. Despite a previous effort [35] to increase the boosting efficiency by rearranging the timing of the switching and boosting signals, reverse leakage current remains the main drawback of conventional bootstrapped drivers.

Among other bootstrapped circuits, single-capacitor designs reduce the hardware overhead [36-37]. However, their complex circuitry seriously aggravates charge sharing at the capacitor node, and their leakage current is problematic as well.

Fig. 3-1. (a) Conventional bootstrapped circuit. (b) Proposed bootstrapped circuit.

In this chapter, we present a sub-threshold clock network built on a bootstrapped CMOS inverter operated from a sub-threshold power supply. The bootstrapped CMOS inverter is introduced to achieve high boosting efficiency and improve speed. It both increases the driving ability, by boosting signals into the super-threshold region, and reduces the leakage current. Fig. 3-1(b) illustrates the circuit diagram. Theoretically, the PN bootstrap circuit produces an output swing from -VDD to 2VDD. The 2VDD (-VDD) level enhances the driving capability of the NMOS (PMOS) driver and suppresses the leakage of the PMOS (NMOS) driver. The PN bootstrap circuit provides VSG (VGS) = 2VDD to turn on the PMOS (NMOS) driver. In contrast, a negative VSG (VGS) = -VDD suppresses the leakage current while the PMOS (NMOS) driver is turned off. Moreover, compared with previous works, the proposed design scheme has fewer devices operating in the sub-threshold region, which explains why process variation affects it to a lesser extent.

3.1. Overview of On-chip Interconnect

Before introducing the proposed bootstrapped CMOS inverter, the fundamentals of on-chip interconnect are briefly reviewed. First, linear models of the interconnect and the repeater are adopted in this section according to VLSI parameter scaling. In addition, the speed and power consumption of on-chip interconnect circuits are defined. These parameters, derived from the linear models, are used to define the figure of merit (FoM), the index for optimal global on-chip interconnect design.

3.1.1. RC-Interconnect with Repeater Insertion

Fig. 3-2. Cross section of interconnect configurations. (Labels: Top Metal, Bottom Metal.)

In general, a global interconnect is assumed to lie between two adjacent orthogonal metal layers and between two coplanar wires, as shown in Fig. 3-2, where W and S are the interconnect width and spacing, T is the interconnect thickness, H is the dielectric height, Cf is the fringing-field capacitance, Ca is the parallel-plate capacitance to the top and bottom metal layers, and Cc is the coupling capacitance between neighboring interconnects. The interconnect resistance per unit length is given by (3-1).

$$ r_w = \frac{\rho}{W \cdot T} \tag{3-1} $$

where ρ is the metal resistivity; r_w follows from the sheet resistance (ρ/T) listed in the process data sheet.

With technology scaling and the growth of global interconnect, repeater insertion is widely used to reduce delay and power consumption. Several studies have addressed the optimization of global interconnect design with repeater insertion [29-33]. Since the interconnect parameters are determined by the width W, the spacing S, and so on, on-chip interconnects with repeater insertion can be analyzed with the Elmore RC delay model. According to the Elmore delay model, the time constant τ of the whole interconnect can be obtained from the model described in [29-31].

When a global interconnect is divided into several segments, the small delay penalty of the repeaters can be tolerated on these critical segments, and the time constant τ is dominated by the interconnect segment. However, if the segments are made too short, the driving capability of the repeaters decreases severely. Consequently, there is a trade-off between the time constant τ and power consumption.
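The segmentation trade-off can be illustrated with a minimal sketch. It uses the textbook lumped Elmore form for one repeater driving one wire segment, which is not necessarily the exact model of [29-31], and every numeric value (driver resistance, repeater input capacitance, wire resistance and capacitance per micrometre) is a hypothetical placeholder rather than a parameter from this work. Repeater intrinsic delay is ignored.

```python
def segment_tau(r_drv, c_in, r_wire, c_wire, h):
    """Elmore time constant of one repeater driving a wire segment of length h.

    r_drv: repeater output resistance (ohm), c_in: next repeater input cap (F),
    r_wire, c_wire: wire resistance and capacitance per unit length.
    """
    rw = r_wire * h  # total segment resistance
    cw = c_wire * h  # total segment capacitance
    return r_drv * (cw + c_in) + rw * (cw / 2.0 + c_in)


def total_tau(n_seg, length, r_drv, c_in, r_wire, c_wire):
    """Sum of the segment time constants for n_seg equal segments."""
    h = length / n_seg
    return n_seg * segment_tau(r_drv, c_in, r_wire, c_wire, h)


# Hypothetical values, chosen only to make the trend visible.
LENGTH_UM = 10_000.0  # 10 mm wire
R_DRV = 1_000.0       # ohm
C_IN = 2e-15          # F
R_WIRE = 0.2          # ohm per um
C_WIRE = 0.2e-15      # F per um

taus = {
    n: total_tau(n, LENGTH_UM, R_DRV, C_IN, R_WIRE, C_WIRE)
    for n in (1, 10, 32, 100, 1000)
}
for n, tau in taus.items():
    print(f"{n:5d} segments: tau = {tau * 1e9:.3f} ns")
```

Under these assumed values the total time constant first falls as segments are added, because the quadratic wire term shrinks, and then rises again once the repeater loading dominates. Each added repeater also adds switched capacitance, which is the power side of the trade-off.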

3.1.2. Time Constant, Power Dissipation and Figure of Merit

The data rate is related to the time constant. The rise and fall times can be estimated from the step response. The output rise time is defined from the 20% transition edge to the 80% transition edge, as shown in Eq. (3-2).

$$ t_r = t_{80\%} - t_{20\%} \cong 1.386\,\tau \tag{3-2} $$

Here t80% and t20% are the times at which the output voltage exceeds 80% and 20% of VDD, respectively, during the rising edge. The minimum rise time is specified as 0.125 unit interval (UI) in the SATA standard [34].
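The factor 1.386 in Eq. (3-2) is ln(4), which follows from a single-pole step response 1 - exp(-t/τ). The short check below assumes that first-order response; the function names are illustrative only.

```python
import math


def crossing_time(fraction, tau):
    """Time at which 1 - exp(-t / tau) reaches `fraction` of its final value."""
    return -tau * math.log(1.0 - fraction)


def rise_time_20_80(tau):
    """20%-to-80% rise time of a first-order step response."""
    return crossing_time(0.8, tau) - crossing_time(0.2, tau)


tau = 1.0
print(rise_time_20_80(tau) / tau)  # ratio equals ln(4)
```

Because the ratio does not depend on τ, a rise-time budget can be converted directly into an upper bound on the time constant.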

Besides speed, which is one of the most important factors in on-chip interconnect design, power consumption is another basic consideration. The total power consumption includes not only the switching power but also the short-circuit power and the leakage power, denoted PSW, PSC, and PLeakage, respectively. Detailed expressions and discussions are reported in [29-31]. The total power dissipation of each interconnect is written as in Eq. (3-3).

$$ P_T = \left(\frac{L}{h}\right) \times \left(P_{SW} + P_{SC} + P_{Leakage}\right) \tag{3-3} $$

where L is the total length of the interconnect and h is the segment length. Since switching power accounts for a large portion of the total power, PSW is expressed as in Eq. (3-4).

$$ P_{SW} = \alpha \cdot f \cdot \left[ m\,(c_{gs} + c_{db}) + h\,c_{Wire} \right] \cdot V_{DD}^{2} \tag{3-4} $$

where α represents the activity factor, that is, the probability of signal switching, and (cgs+cdb) is the parasitic capacitance of the repeater.

The performance of an interconnect is affected by many design parameters, most of which are discussed in the literature [32-33]. The FoM is used to compare performance. Here, FoM1 in Eq. (3-5) is defined as the total energy per bit to express the energy efficiency.

$$ \mathrm{FoM1} = E_T = \frac{P_T}{f} \approx \alpha\, C_{Total}\, V_{DD}^{2} \tag{3-5} $$

where ET represents the total energy. Fig. 3-3 shows the energy per bit ET as a function of the segment length h and the number of repeater fingers m for a total length L of 10 mm. The design becomes more energy-efficient as h increases and when the minimum m = 1 is used. Since the supply voltage VDD is set by the system requirement, the only way to improve energy efficiency is to use a long segment length h, which, however, incurs a large speed penalty. The highest energy efficiency therefore occurs with the maximum h and the minimum driver size, and the final choice is a trade-off that depends on the requirement.

Fig. 3-3. Effect of segment length and fingers of repeaters on the energy per bit. (Axes: finger m, segment length (um), energy per bit (pJ).)
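The trend in Fig. 3-3 can be sketched numerically. The snippet below assumes that the total switched capacitance is the number of segments L/h times the per-segment capacitance (m repeater fingers plus the segment wire), and that energy per bit is α·C_Total·VDD². The per-finger and per-micrometre capacitances and the activity factor are hypothetical placeholders, and short-circuit and leakage energy are ignored, so the numbers are illustrative only.

```python
def energy_per_bit(h_um, m, length_um, vdd, alpha, c_rep_per_finger, c_wire_per_um):
    """Switching energy per bit for a wire of length_um split into segments of h_um.

    Assumed model: C_total = (L / h) * (m * c_rep + h * c_wire), E = alpha * C_total * VDD^2.
    """
    n_seg = length_um / h_um
    c_segment = m * c_rep_per_finger + h_um * c_wire_per_um
    return alpha * n_seg * c_segment * vdd ** 2


# Hypothetical parameters (not taken from the test chip).
LENGTH_UM = 10_000.0  # 10 mm, as in Fig. 3-3
VDD = 0.2             # V
ALPHA = 0.5
C_REP = 1e-15         # F per repeater finger
C_WIRE = 0.2e-15      # F per um

for m in (1, 2, 4):
    row = [
        energy_per_bit(h, m, LENGTH_UM, VDD, ALPHA, C_REP, C_WIRE)
        for h in (100.0, 500.0, 1000.0, 2000.0)
    ]
    print(m, [f"{e * 1e15:.2f} fJ" for e in row])
```

In this simplified model the wire contribution is fixed by L, so only the repeater term changes: fewer, smaller repeaters lower the energy, which matches the qualitative conclusion that long segments and m = 1 are the most energy-efficient.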

3.2. Active Leakage Reduction Bootstrapped Inverter

Fig. 3-4 schematically depicts the proposed active leakage reduction bootstrapped inverter (ALBI), where CBP and CBN are the bootstrap capacitors; MP1 and MN1 are the transistors for CBP pre-charge and CBN pre-discharge; INV is the inverter that controls MP2 and MN2; MPD and MND are the output drivers for CL; and NP and NN are the boosted nodes. The node NB is boosted above VDD and below ground to enhance the driving capability. Fig. 3-5 and Fig. 3-6 show the operations with the input switching from H to L and from L to H, respectively. Fig. 3-7 shows the simulated transient waveforms of the ALBI with an output load of 0.5pF under a power supply of 200mV. According to this figure, before Vin switches from H to L, node NN has an initial voltage of 0V. After the H-to-L transition, NN is boosted below ground to -188mV. Meanwhile, MP2 is turned off and MN2 is turned on. Therefore, the boosted signal at NN passes through MN2 to NB to drive MPD, which pulls up the capacitive load CL. At this moment, MP1 is turned on to pre-charge NP to VDD (0.2V). However, MN1 conducts in the reverse direction, so a reverse current charges NN. At the end of the period in which Vin is L, NN still holds -90mV. When Vin goes from L to H, the operation is similar to the H-to-L case: NP is boosted above VDD to 389mV and has discharged to 303mV by the end of the period in which Vin is H.

Fig. 3-4. Proposed bootstrapped inverter.

Fig. 3-5. Proposed bootstrapped inverter operations (input H-to-L).

Fig. 3-6. Proposed bootstrapped inverter operations (input L-to-H).

Fig. 3-7. Simulated timing waveforms at 5 MHz at 200 mV VDD. (Annotations: MP1 pre-charges to 0.2V; MN1 pre-discharges to 0V; -90mV.)

3.3. Detail Evaluation and Discussion

The proposed ALBI is superior to previous designs in terms of leakage power and switching speed. In low-voltage circuit design, a decreasing Ion/Ioff ratio degrades the noise margin. In the proposed design, the boosted voltage is used in both the driving phase and the cut-off phase, and the Ion/Ioff ratio is improved by the active bootstrapped leakage reduction method. Moreover, using fewer components increases the speed of the bootstrapped circuit. Because fewer components operate in the sub-threshold region, the proposed design scheme performs better than previous works in Monte Carlo analysis.

To compare the performances of the proposed scheme and conventional ones more fairly, this work re-designed the conventional inverter and reported bootstrapped drivers by using the 90nm process. The sizes of the conventional inverter and the bootstrapped driver are designed to obtain the same rise/fall transient output waveforms. Their device sizes are listed in TABLE 3-1.

A 30fF boost capacitor is used to ensure that the boosting efficiency exceeds 80%. These features are evaluated in detail as follows.

TABLE 3-1 Device Sizing

3.3.1. Boosting Efficiency

Ideally, the boosted node NB generates a voltage swing from 2VDD to -VDD. However, the parasitic capacitance at node NB exhibits a charge-sharing effect with the bootstrap capacitance [17]. For example, when NB transitions above VDD, consider the equivalent circuit of the upper side shown in Fig. 3-4, where VBP and CPTP are the voltage and the total parasitic capacitance at NB, respectively.

To increase driving capability, the bootstrap capacitance is designed to be significantly larger than the parasitic capacitance at the node. As a result, (3-6) can be rewritten as (3-7),

2 2 .

βP is the boosting efficiency factor, or simply the boosting efficiency. Similarly, as VBN transits from VDD to below ground, the estimated VBN is

$$ V_{BN} \approx \frac{C_{BN}}{C_{BN} + C_{PTN}} \cdot (-V_{DD}) = \beta_N \cdot (-V_{DD}) \tag{3-8} $$

The larger the bootstrap capacitance, the better the boosting efficiency. To observe the leakage power and delay time in a more ideal case, we used a 100fF bootstrap capacitor. In our test chip, based on a trade-off between cost and performance, a 30fF boost capacitor is used to ensure that the boosting efficiency is at least 80%. As shown in Fig. 3-8, the boosting efficiency is 88% with a 30fF bootstrap capacitor.

Fig. 3-8. Boosting efficiency vs. bootstrap capacitor. (Axes: bootstrap capacitor (fF), boosting efficiency (%).)
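As a rough illustration of the capacitive-divider relation behind the boosting efficiency, β = C_B / (C_B + C_PT), the sketch below back-calculates the parasitic capacitance implied by the reported 88% at 30fF and then estimates the smallest bootstrap capacitor that would still reach 80%. It assumes that the parasitic capacitance stays constant as the bootstrap capacitor changes, which a real layout would not strictly satisfy; the function names are illustrative.

```python
def boosting_efficiency(c_boot, c_par):
    """Capacitive-divider estimate: beta = C_B / (C_B + C_PT)."""
    return c_boot / (c_boot + c_par)


def implied_parasitic(c_boot, beta):
    """Parasitic capacitance that yields efficiency beta for a given C_B."""
    return c_boot * (1.0 - beta) / beta


def min_boot_cap(c_par, beta_target):
    """Smallest C_B that reaches beta_target for a fixed parasitic capacitance."""
    return c_par * beta_target / (1.0 - beta_target)


# 88% at 30 fF is the value reported for Fig. 3-8; the rest is derived from it.
c_par = implied_parasitic(30e-15, 0.88)
print(f"implied parasitic capacitance: {c_par * 1e15:.2f} fF")
print(f"estimated efficiency at 100 fF: {boosting_efficiency(100e-15, c_par):.3f}")
print(f"minimum C_B for 80%: {min_boot_cap(c_par, 0.80) * 1e15:.1f} fF")
```

Under this fixed-parasitic assumption, 30fF leaves some margin above the 80% target, while larger capacitors give diminishing returns in efficiency at growing area cost.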

3.3.2. Reduction of Leakage Current

In the proposed design scheme, the boosted high (2VDD) at NB enhances the driving capability of MND and suppresses the leakage current of MPD. Similarly, the boosted low (-VDD) at NB enhances the driving of MPD and reduces the leakage of MND.

The Ioff current consists primarily of sub-threshold leakage current [38-39]. Hence, scaling the supply voltage lowers the Ion/Ioff ratio. In the previous literature, bootstrapped drivers improve the Ion/Ioff ratio only unidirectionally, by enhancing Ion. The proposed design effectively suppresses the leakage current of the PMOS (NMOS) by applying a potential of -VDD to VSG (VGS). According to the I-V formula in the sub-threshold region, our design reduces the leakage current exponentially.
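The exponential dependence can be made concrete with a minimal sketch based on the standard sub-threshold relation I ∝ exp(V_GS / (n·V_T)), where V_T = kT/q is the thermal voltage. The slope factor n = 1.5 and the temperature of 300 K are assumed values, not parameters of the 90nm process used here, and effects such as DIBL, gate leakage, and junction leakage are ignored.

```python
import math

BOLTZMANN = 1.380649e-23          # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temp_kelvin):
    """Thermal voltage kT/q in volts."""
    return BOLTZMANN * temp_kelvin / ELECTRON_CHARGE


def leakage_ratio(delta_vgs, n, temp_kelvin=300.0):
    """Sub-threshold current at V_GS = delta_vgs relative to V_GS = 0.

    Assumes I is proportional to exp(V_GS / (n * V_T)) and nothing else changes.
    """
    return math.exp(delta_vgs / (n * thermal_voltage(temp_kelvin)))


# Gate driven 0.2 V beyond cut-off (V_GS = -VDD with VDD = 0.2 V); n is assumed.
ratio = leakage_ratio(-0.2, n=1.5)
print(f"leakage scaled by {ratio:.2e} (about {1.0 / ratio:.0f}x lower)")
```

The actual reduction depends strongly on the slope factor and on the other leakage mechanisms, so this estimate only shows the order of the effect.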

Although HSPICE can simulate steady-state leakage power, characterizing the leakage power under dynamic operation is difficult. The leakage power of a periodic waveform can be estimated by separating it from the average total power. The total energy ET over a period T is

$$ E_T = P_T \cdot T \approx \left(P_{SW} + P_{SC} + P_{Leakage}\right) \cdot T = E_{SW} + E_{SC} + P_{Leakage} \cdot T, \tag{3-9} $$

where ET, ESW, ESC, and ELeakage represent the total energy, the switching energy, the short-circuit energy, and the leakage energy, respectively. The switching energy, the short-circuit energy, and the leakage current are assumed to remain constant under the same power supply. A long wire can be regarded as a large capacitive load in the pF range. When a CMOS driver drives a heavy capacitive load, the energy contribution of the short-circuit current can be ignored. ELeakage is proportional to T, and Erep is the total energy of the repeaters. Thus, Eq. (3-9) can be rewritten as

$$ E_T \approx \left(E_{rep} + \frac{\alpha}{2}\, C_{wire}\, V_{DD}^{2}\right) + P_{Leakage} \cdot T. \tag{3-10} $$

For two identical signals with different periods T1 and T2, the leakage power PLeakage is derived as

$$ P_{Leakage} = \frac{P_{T,1} \cdot T_1 - P_{T,2} \cdot T_2}{T_1 - T_2}. \tag{3-11} $$
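The two-period extraction of Eq. (3-11) can be checked with a small round-trip sketch: a synthetic average power is generated from a known per-cycle dynamic energy and a known leakage power, and the formula should recover the leakage. The numeric values are hypothetical and serve only to exercise the formula.

```python
def leakage_power(p_total_1, period_1, p_total_2, period_2):
    """Leakage power from average total power measured at two different periods."""
    if period_1 == period_2:
        raise ValueError("the two periods must differ")
    return (p_total_1 * period_1 - p_total_2 * period_2) / (period_1 - period_2)


# Synthetic example: per-cycle dynamic energy and leakage power are made up.
E_DYN = 4e-15   # J per cycle, independent of the period
P_LEAK = 2e-9   # W


def average_power(period):
    """Average total power of a periodic signal: E_dyn / T + P_leak."""
    return E_DYN / period + P_LEAK


t1, t2 = 1e-6, 1e-7
estimate = leakage_power(average_power(t1), t1, average_power(t2), t2)
print(f"recovered leakage power: {estimate:.3e} W")
```

The subtraction cancels the per-cycle dynamic energy, so the method only works when that energy really is the same at both periods, as the text assumes.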

Fig. 3-9 compares the leakage power as a function of frequency with a 0.2pF capacitive load at different temperature and process corners. The ratio of leakage power to total power is also shown in Fig. 3-9. Owing to the negative VGS control, the leakage power of the proposed bootstrapped inverter at 10MHz under 0.2V is 2pW, whereas it is 3.9nW for a conventional inverter, 0.15nW for [16], and 39nW for [17]. Although the PMOS (NMOS) transistor in [17] is turned off with the positive voltage VSG (VGS) = VDD, the leakage power in [17] is more than three orders of magnitude higher than in the proposed design scheme. As the operating frequency decreases from 10MHz to 100kHz, node leakage lets the boosted potential decay, which degrades the leakage performance. At 100kHz the potential of the boost node even returns to VDD or 0, so the leakage power becomes very close to that of the design in [16].

Fig. 3-9. Leakage power as a function of frequency from 10 MHz to 100 kHz in corners. (Axes: clock frequency (Hz), leakage power (W), PLeak/Ptotal (%); panel (c): FF corner, 125°C, VDD = 0.2V.)

3.3.3. Delay Time Analysis

Delay time is another important feature of bootstrapped circuits. Although the driving transistors operate in the triode region under the sub-threshold supply, the other devices remain in the sub-threshold region. The total delay time is thus the sum of the propagation delays of the INV and the driver, which is denoted as

$$ t_{P,BI} = t_{P,INV} + t_{P,Driver}, \tag{3-12} $$

where tP,BI, tP,INV, and tP,Driver are the delays of the bootstrapped inverter, the INV, and the driver, respectively.

Assume that the boost efficiency is the same for all bootstrapped drivers. Delay time of the INV becomes a dominant factor. The sub-threshold logic delay is derived in [9] as

$$ t_d = \frac{k_f \cdot C_L \cdot V_{DD}}{C_{dep}\,\dfrac{W}{L}\,V_T^{2}\,\exp\!\left(\dfrac{V_{DD} - V_{th}}{n V_T}\right)}. \tag{3-13} $$

where kf is a fitting parameter. However, the circuit delay time is also related to RC loading effects. The ALBI has the shortest delay time among the bootstrapped circuits because the INV is loaded only by the gate capacitances of MN2 and MP2.

Fig. 3-10 summarizes the comparison results for the delay time (from H to L) and the power consumption as a function of CL at 10 MHz with a supply of 200 mV. The proposed design is the lowest in power consumption and delay time.

Fig. 3-10. Delay time and power consumption versus capacitive loads at 10 MHz. (Axes: capacitive loading (pF), delay time (sec), power (W); VDD = 0.2V, 25°C, TT corner; curves: proposed, JSSC 1997 [16], TVLSI 2008 [17].)

The return of the boost node potential to VDD or 0 indeed degrades the leakage performance at low frequencies or in the fast process/temperature corners. In contrast, the other boost node is easily pre-charged to VDD or 0. As shown in Fig. 3-11, whether at the nominal 25°C TT corner, the -40°C SS corner, or the 125°C FF corner, the delay times of all designs are almost the same at frequencies from 1 MHz to 100 kHz.

Fig. 3-11. Delay time as a function of frequency in corners. (Axes: clock frequency (Hz), delay time (sec); VDD = 0.2V; corners: TT 25°C, SS -40°C, FF 125°C; curves: proposed, JSSC 1997 [16], TVLSI 2008 [17].)

3.3.4. Delay Time Analysis of Process Variation

Sub-threshold operation limits the yield because of its severe process variation. Although the boosted control signal pushes the driver transistors into the triode region, the remaining devices still suffer from the same variation problem. With fewer devices in the sub-threshold region, the proposed design is less affected by process variation.

The delay-time variability analysis is based on Monte Carlo simulations. Device mismatch, threshold voltage Vth, and process corner variation are assumed to follow Gaussian distributions. To cover the most critical process and temperature corners, the Monte Carlo simulations are run under 3σ process variation at 25°C, 125°C, and -40°C, as shown in Fig. 3-12. The supply voltage is 200mV and the clock rate is 1MHz. Each temperature corner has 1500 samples, for a total of 4500 samples. For the worst case at -40°C, a conventional inverter has an average delay of 15.1ns and a standard deviation of 26.4ns. The proposed design reduces not only the average delay to 6.9ns but also the standard deviation to 6.3ns, which is much better than [16] and [17]. The ALBI therefore has higher immunity to process and temperature variation.


Fig. 3-12. Monte Carlo simulation results under a power supply of 200 mV.
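Why sub-threshold delay spreads so widely can be seen in a toy Monte Carlo experiment. The sketch below is not the simulation setup used for Fig. 3-12: it models a single device whose delay scales as exp(Vth / (n·V_T)) and draws Vth from a Gaussian distribution. The standard deviation of 30mV, the slope factor n = 1.5, the nominal Vth of 0.3V, and the room-temperature thermal voltage are all assumed values.

```python
import math
import random
import statistics


def relative_delay(vth, vth_nominal, n, v_thermal):
    """Delay normalised to the nominal-Vth case, for delay ~ exp(Vth / (n * V_T))."""
    return math.exp((vth - vth_nominal) / (n * v_thermal))


def monte_carlo_delay(samples, sigma_vth, n=1.5, v_thermal=0.02585,
                      vth_nominal=0.3, seed=1):
    """Draw Gaussian Vth samples and return the normalised delays."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    return [
        relative_delay(rng.gauss(vth_nominal, sigma_vth), vth_nominal, n, v_thermal)
        for _ in range(samples)
    ]


delays = monte_carlo_delay(samples=4500, sigma_vth=0.03)
mean = statistics.mean(delays)
median = statistics.median(delays)
cv = statistics.pstdev(delays) / mean  # coefficient of variation
print(f"mean = {mean:.2f}, median = {median:.2f}, std/mean = {cv:.2f}")
```

Because a Gaussian Vth enters through an exponential, the delay is approximately log-normal: the distribution is skewed, the mean sits above the median, and the standard deviation can be comparable to the mean. Every additional device left in the sub-threshold region contributes a spread of this kind, which is the motivation for keeping that device count low.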

3.4. Implementation and Experimental Results

3.4.1. Implementation of the Bootstrap Capacitor

The value of the boost capacitor can be chosen to adjust the boosting efficiency. A large boost capacitor achieves high boosting efficiency. In addition, a larger boost capacitor stores more charge and so holds the node voltage against leakage even at low speed. However, area cost and power consumption are the design trade-offs. In our test chip, a 30fF boost capacitor is used to ensure that the boosting efficiency is at least 80% without occupying too much area.

MOSFET, MOM, and MIM capacitors are the three types of capacitors available in CMOS technology. Among them, the MOSFET capacitor has the highest capacitance per unit area. However, it also has several drawbacks. First, when the MOSFET capacitor operates in the sub-threshold region, its capacitance changes abruptly with the control voltage, as shown in Fig. 3-13. Second, the leakage current of nano-scale devices becomes more serious. Third, the MOSFET capacitor has a large parasitic capacitance from the Vctrl node to the bulk compared with the other capacitors, and this large parasitic capacitance requires a larger power budget in the driver.

The MIM capacitor has the least parasitic capacitance but the largest area. A 30fF MIM capacitor occupies 5.1um x 8.5um. Besides, the MIM capacitor needs an extra mask, which means extra cost.

As a result, we use a MOM capacitor as the boost capacitor, which needs no extra mask. A 30fF MOM capacitor occupies 3.7um x 8.6um and has a 1fF parasitic capacitance load at both nodes.


Fig. 3-13. MOSFET capacitor changes due to the control voltage.
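The area comparison between the MIM and MOM options can be worked out directly from the quoted footprints. The short calculation below only uses the dimensions and the 30fF value stated in the text; the function name is illustrative.

```python
def density_ff_per_um2(cap_ff, width_um, height_um):
    """Capacitance density in fF per square micrometre."""
    return cap_ff / (width_um * height_um)


mim_area = 5.1 * 8.5  # um^2, 30 fF MIM capacitor
mom_area = 3.7 * 8.6  # um^2, 30 fF MOM capacitor

mim_density = density_ff_per_um2(30.0, 5.1, 8.5)
mom_density = density_ff_per_um2(30.0, 3.7, 8.6)

print(f"MIM: {mim_area:.2f} um^2, {mim_density:.3f} fF/um^2")
print(f"MOM: {mom_area:.2f} um^2, {mom_density:.3f} fF/um^2")
print(f"MOM area saving: {(1.0 - mom_area / mim_area) * 100:.1f} %")
```

With these footprints the MOM capacitor is roughly a quarter smaller than the MIM capacitor for the same 30fF, in addition to avoiding the extra mask.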

3.4.2. Chip Implementation and Measurement

A test chip of bootstrapped CMOS inverters is implemented in a 90nm 1P9M SPRVT process to demonstrate the effectiveness of the proposed design scheme. The test circuits include the reported bootstrapped circuits of [16] and [17] and the proposed design. The circuits also contain test keys to verify the interconnection model. Each bootstrapped circuit is implemented as a 10-stage cascaded driver chain. In each stage, two 30fF MOM capacitors serve as bootstrap capacitors and a 200fF MOM capacitor serves as CL. Level shifters boost the 200mV internal signal to a 500mV chip I/O signal for the measurement. The total area is 958μm × 776μm, and the core area is 566μm × 102μm. Fig. 3-14 shows the die photograph. The layout area of the proposed bootstrapped inverter cell is 25.8μm × 4.1μm.

Fig. 3-14. Die photograph and cell layout. (Labels: test keys, bootstrapped test circuits, decoupling capacitors, proposed bootstrapped inverter cell.)

Fig. 3-15 Experimental environment.

Fig. 3-15 shows a photograph of our experimental environment. Fig. 3-16 shows the measured waveform. The cumulative clock peak-to-peak and RMS jitters are 3.6ns and 504ps, respectively. The measured average total power is 1.01μW. With the leakage power estimated in
