3.4 Simulation Results

3.4.3 Operation at Low Supply Voltages

Fig. 3.22 shows how the VBB generators operate under different supply voltages. They work well at higher input codes even when the supply voltage is as low as 0.5V. At lower supply voltages, however, VBB fails to reach the targets given in (3.9) and (3.10) because the pumping currents are small.

Fig. 3.22 Operations under different supply voltages (axis label: Input Binary Code).

3.5 Conclusion

In this chapter, the principles and operations of charge pumps are discussed. Both positive and negative pumping circuits are addressed, and several advanced, high-performance charge pumps are introduced. In addition, a novel scheme that generates multiple voltage levels through configurable control signals is realized.

Applying reversed body bias (RBB) is a popular technique for reducing subthreshold current, and it requires voltages lower than GND and higher than VDD. A charge pump is a simple and useful way to produce them. However, the performance of a charge pump degrades severely in applications with high load current.

The configurable scheme generates various voltages with different input settings. This feature is especially useful in SoC designs, which need several different supply voltages and body-bias voltages. Instead of designing several kinds of charge pump circuits, those voltages can be generated by the same circuit with different control signals.

Chapter 4

Variable-Threshold CMOS (VTCMOS) SRAM Cell Arrays With On-Chip Body-Bias Generators

Power consumption is becoming a critical issue in the design of processors, memories, and other logic circuits. Scaling down the supply and threshold voltages dramatically reduces the active power consumption of logic circuits while maintaining high performance. However, as technology scales down, the leakage current in standby mode can no longer be ignored.

In almost all VLSI and SoC (System-on-Chip) designs, memories of various kinds occupy most of the chip area, so the demand for low-power, low-voltage memories is urgent. Because memories take up such a large fraction of the chip area, their power consumption has a large effect on that of the whole chip. As noted above, active power can be significantly reduced by scaling down the supply voltage, but leakage current increases as technology scales.

Fig. 4.1 shows the leakage paths and the standby leakage equations of an SRAM cell.

Since subthreshold leakage is the dominant component of leakage current in deep-submicron and nano-scale technologies, many variable-threshold CMOS (VTCMOS) SRAMs have been proposed that reduce subthreshold current by dynamically applying reversed body bias (RBB).

Fig. 4.1 Leakage currents and standby current equations in an SRAM cell.

In Sec. 4.1, some representative VTCMOS SRAM architectures are introduced. An on-chip dual-level body-bias (VBB) generator is presented in Sec. 4.2 and applied to SRAM cell arrays to evaluate its effectiveness in saving leakage power. In Sec. 4.3, a time-out-policy controller for the VBB generator is described, and conclusions are given in Sec. 4.4.

4.1 Variable-Threshold CMOS SRAM

This section reviews several VTCMOS SRAM circuits that dynamically adjust the body bias to reduce subthreshold leakage current. Their operation and disadvantages are also discussed.

4.1.1 Dynamic Leakage Cut-off Scheme

Fig. 4.2 (a) shows the schematic diagram of the dynamic leakage cut-off (DLC) SRAM, Fig. 4.2 (b) shows the operating waveforms, and Fig. 4.2 (c) and (d) show the well bias drivers [4.2]. For selected rows, the n-well and p-well bias voltages are VDD and GND, respectively; for unselected rows they are 2VDD and –VDD, respectively.

Through the well bias drivers, the n- and p-well bias voltages can be dynamically adjusted. In this way, the selected memory cells maintain high performance while the unselected memory cells exhibit low subthreshold leakage.
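The bias assignment of the DLC scheme can be summarized in a short Python sketch. The function names and the example supply of 1.0 V are illustrative assumptions and are not taken from [4.2]. The sketch also assumes that PMOS sources sit at VDD and NMOS sources at GND, so the reversed body bias is the difference between the well and source potentials.

```python
def dlc_well_bias(vdd, selected):
    """Return (n_well, p_well) bias in volts for one row of the DLC SRAM."""
    if selected:
        return vdd, 0.0          # nominal bias: n-well at VDD, p-well at GND
    return 2.0 * vdd, -vdd       # standby bias: n-well at 2VDD, p-well at -VDD


def reverse_body_bias(vdd, selected):
    """Return (pmos_rbb, nmos_rbb), the reversed body-source bias in volts.

    Assumes PMOS sources at VDD and NMOS sources at GND (illustrative).
    """
    n_well, p_well = dlc_well_bias(vdd, selected)
    return n_well - vdd, 0.0 - p_well


if __name__ == "__main__":
    VDD = 1.0  # hypothetical supply voltage
    print("selected row  :", reverse_body_bias(VDD, selected=True))
    print("unselected row:", reverse_body_bias(VDD, selected=False))
```

Under these assumptions an unselected row sees a reversed body bias of VDD on both its PMOS and NMOS transistors, while a selected row sees none.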

However, this scheme raises some questions. First, Fig. 4.2 (b) shows VPWELL and VNWELL returning to GND and VDD, respectively, before VWL rises, so some extra logic is presumably needed to detect or predict the rise of VWL. Second, the substrate is a large capacitive load that takes a long time to charge and discharge.

Until VPWELL and VNWELL return to their nominal values, VWL and the input signals must be delayed to avoid incorrect operation. Finally, the scheme includes no VBB generators, which means that –VDD and 2VDD must come from external voltage sources.

The need for two additional external voltage sources makes this scheme impractical.

Fig. 4.2 (a) Dynamic leakage cut-off SRAM, (b) operating waveforms, and well bias drivers for (c) the n-well and (d) the p-well.

4.1.2 Preactivating Mechanism for VTCMOS Cache

The scheme in Fig. 4.3 [4.2] uses address prediction to solve the problem of the DLC circuit. It uses three address lines and two extra address decoders to predict wordline activity. Moreover, a reservation counter is included to indicate the number of reservations for line accesses. This method involves the processor architecture, and Fig. 4.4 [4.2] shows the processor organization with a preactivating DLC cache.

Fig. 4.3 Preactivating mechanism for a VTCMOS cache.

Fig. 4.4 Processor organization with a preactivating DLC cache.

4.1.3 Auto-Backgate-Controlled MT-CMOS

Fig. 4.5 shows the concept of the Auto-Backgate-Controlled MT-CMOS (ABC-MT-CMOS) circuit, which uses two distinct external voltage sources (VDD1 and VDD2) in different operating modes [4.3]. Q1 and Q2 are high-Vt transistors, and low-Vt transistors are used for the internal circuits. While the circuit is operating (active mode), Q1 and Q2 are turned on, so the virtual source line VVDD and the virtual ground line VGND are at 1V and 0V, respectively.

In the sleep mode, Q1 and Q2 are turned off and the other voltage source, VDD2 (3.3V), supplies the memory cells. VVDD is connected to VDD2 through diode D1, while VGND is connected to ground through diode D2. Note that D1 and D2 each consist of two diodes, and the forward voltage of one diode is 0.5V. Hence, VVDD and VGND are 2.3V and 1V, respectively, in the sleep mode.

Fig. 4.5 Concept of ABC-MT-CMOS.

The static leakage current drawn from VDD2 is significantly lower than that in the active mode because the reversed body-source voltage raises the threshold voltage of the internal transistors. As Fig. 4.5 shows, a 1V reversed body-source voltage is applied to the internal circuits.
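The sleep-mode levels quoted for ABC-MT-CMOS follow from simple diode-drop arithmetic, which the Python sketch below reproduces. The supply, diode voltage, and diode count come from the text; the function name is hypothetical, and the sketch assumes that in sleep mode the PMOS bodies sit at VDD2 and the NMOS bodies at ground.

```python
VDD2 = 3.3      # sleep-mode supply in volts (from the text)
V_DIODE = 0.5   # forward voltage of one diode in volts (from the text)
N_DIODES = 2    # diodes in series inside each of D1 and D2 (from the text)


def sleep_mode_levels(vdd2=VDD2, v_diode=V_DIODE, n_diodes=N_DIODES):
    """Return sleep-mode rail levels and reversed body-source biases in volts."""
    drop = n_diodes * v_diode      # total drop across D1 (and across D2)
    vvdd = vdd2 - drop             # virtual source line, below VDD2 by the D1 drop
    vgnd = drop                    # virtual ground line, above ground by the D2 drop
    pmos_rbb = vdd2 - vvdd         # assumption: PMOS body tied to VDD2 in sleep mode
    nmos_rbb = vgnd - 0.0          # assumption: NMOS body tied to ground in sleep mode
    return {"VVDD": vvdd, "VGND": vgnd, "PMOS_RBB": pmos_rbb, "NMOS_RBB": nmos_rbb}


if __name__ == "__main__":
    for name, volts in sleep_mode_levels().items():
        print(f"{name}: {volts:.2f} V")
```

With the values from the text this gives VVDD of about 2.3V, VGND of 1V, and a reversed body-source bias of about 1V for both transistor types.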

Fig. 4.6 Configuration of ABC-MT-CMOS.

Fig. 4.6 shows the actual configuration of the ABC-MT-CMOS circuit with two additional high-Vt transistors, Q3 and Q4. In the active mode, SL is low and its complement is high, so Q1, Q2, and Q3 are turned on. Hence, both VVDD and the substrate bias BP are 1V. In the sleep mode, on the other hand, SL is high and its complement is low, so only Q4 turns on and BP becomes 3.3V. The operations of Fig. 4.6 and Fig. 4.5 are equivalent.

However, if the 1V supply is generated internally, this scheme needs a voltage regulator or converter to step 3.3V down to 1V, which adds power and area overhead. In addition, VVDD and VGND are large capacitive nodes that can take a long time to charge and discharge. Q1-Q4, D1, and D2 must therefore be large to shorten the charging and discharging time. The area overhead is thus significant, and the energy needed to charge and discharge the virtual source lines is a further power overhead.

4.1.4 Dynamic-Vt SRAM

Fig. 4.7 shows a dynamic-Vt SRAM that reduces subthreshold leakage current [4.4]. The two NMOS transistors serve as voltage switches that dynamically adjust the substrate voltage in different operating modes. The substrate is switched to 0V in active mode for high performance and to Vbs (a negative value) in sleep mode to save leakage power.

Fig. 4.7 Schematic of a dynamic Vt SRAM set.

A time-based capacitor-discharging scheme for Vt control is shown in Fig. 4.8 [4.4]. The circuit consists of an RC decay circuit, a level converter, and Vsub switches. When the data line is accessed, Vcap is charged by WL and immediately switches Vsub to 0V. Vcap starts to discharge slowly once WL is pulled low, and it is recharged whenever WL is asserted again. After a sufficient idle period, Vcap is low enough to switch Vsub to –1.0V. Fig. 4.9 depicts the operating waveforms of these nodes.
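The time-out behavior of the capacitor-discharging scheme can be illustrated with a first-order RC model, Vcap(t) = V0 * exp(-t / RC). The Python sketch below is only an idealized illustration: the resistance, capacitance, initial voltage, and switching threshold are hypothetical values, and the level converter is reduced to a simple comparison.

```python
import math


def vcap(t, v0, r, c):
    """Capacitor voltage (V) after t seconds of idle time, first-order RC decay."""
    return v0 * math.exp(-t / (r * c))


def idle_time_to_sleep(v0, v_switch, r, c):
    """Idle time (s) after which Vcap has decayed to the switching threshold."""
    return r * c * math.log(v0 / v_switch)


def vsub(t_idle, v0, v_switch, r, c, v_sleep=-1.0):
    """Substrate voltage (V): 0 V while Vcap is above threshold, else v_sleep."""
    return 0.0 if vcap(t_idle, v0, r, c) > v_switch else v_sleep


if __name__ == "__main__":
    # Hypothetical component values, chosen only for illustration.
    V0, V_SWITCH, R, C = 1.5, 0.5, 1e6, 1e-9
    t_sleep = idle_time_to_sleep(V0, V_SWITCH, R, C)
    print(f"Vsub switches to -1.0 V after about {t_sleep * 1e3:.2f} ms of idle time")
```

A longer RC time constant postpones the switch to the sleep bias, which trades leakage saving against the cost of switching the substrate too often.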

There are some questionable points about the operation. First, Fig. 4.8 shows that the Vt control circuit needs three supply voltages: 1.5V, 1.0V, and –1.0V. Voltage converters or charge pump circuits are therefore indispensable if only one external voltage source is available. However, no voltage converters or charge pumps are mentioned in this scheme.

Fig. 4.8 Schematic of the Vt control circuit using capacitor-discharging scheme.

Fig. 4.9 Operating waveforms for Vt control circuit.

Second, the Vsub switches in Fig. 4.8 are not robust if V1 is generated by a charge pump instead of an ideal external source. When the switch connected to V1 turns on, –1.0V is passed to Vsub through the switch. Unfortunately, because V1 is driven by a charge pump rather than an ideal source, the charge stored at V1 redistributes between V1 and Vsub. Therefore, in the steady state Vsub and V1 both settle between –1.0V and 0V. Using an external voltage source for V1 would solve this problem, but logic chips generally have no negative supply voltage available.
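The charge-redistribution argument can be made concrete with the standard charge-sharing relation for two capacitors, V = (C1*V1 + C2*V2) / (C1 + C2). The Python sketch below applies it to the V1 and Vsub nodes. The capacitance values are hypothetical, and the model ignores any charge that the pump delivers after the switch closes.

```python
def shared_voltage(c_pump, v_pump, c_sub, v_sub):
    """Voltage (V) after two capacitors at different voltages are connected."""
    total_charge = c_pump * v_pump + c_sub * v_sub
    return total_charge / (c_pump + c_sub)


if __name__ == "__main__":
    # Hypothetical capacitances; -1.0 V and 0 V are the levels named in the text.
    C_PUMP = 10e-12   # reservoir capacitance at the charge pump output V1
    C_SUB = 10e-12    # substrate capacitance seen at Vsub
    v_final = shared_voltage(C_PUMP, -1.0, C_SUB, 0.0)
    print(f"Vsub and V1 settle at {v_final:.2f} V")
```

Only when the pump reservoir is much larger than the substrate capacitance does the shared voltage stay close to –1.0V.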

Finally, the operating waveforms in Fig. 4.9 do not account for the loading effect. The substrate is a large capacitive load that takes a long time to charge and discharge. Hence, the waveforms in Fig. 4.9 are too ideal, and the actual behavior is much more complicated.

4.1.5 Forward Body-Biased SRAM

In this subsection, a forward body-biased (FBB) SRAM scheme is described. In contrast to the previous schemes, an FBB SRAM aims at high-speed operation rather than at suppressing standby leakage. Instead, the scheme uses super-high-Vt devices to reduce subthreshold leakage in both active and standby modes, and the resulting performance degradation is reduced by forward biasing the body-source junctions.

Fig. 4.10 shows the schematic diagram of the FBB SRAM scheme with body bias drivers M1-M3 [4.5]. The SUBSL signal is generated by the decoder circuit, and each subarray has a dedicated SUBSL signal. When a subarray is accessed, its SUBSL is pulled high and the switches M1 and M2 are turned on. The p-well of the selected subarray is therefore charged to 0.5V, which increases the active current and speeds up operation. The p-well voltage of unselected subarrays, on the other hand, is switched to 0V through M3.

Fig. 4.11 shows the operating waveforms of the control signals. The scheme uses extra decoder circuits to decode the most significant address bits, ensuring that SUBSL is pulled high before the wordline signal. As Fig. 4.11 shows, SUBSL goes high before the wordline signal arrives, and VPWELL is switched to 0.5V before the wordline arrives as well.

The operating waveforms in Fig. 4.11 look ideal, but some problems exist. First, a voltage converter is needed to generate 0.5V, and this circuit adds power and area overhead. Next, the extra decoder circuits that generate SUBSL ahead of the wordline add further power and area overhead. Finally, it seems difficult to switch VPWELL to the FBB level before the wordline signal arrives. Fig. 4.10 shows that a subarray contains 1024 cells, so the capacitance at VPWELL probably exceeds the order of picofarads, whereas the interval between SUBSL and the wordline is on the order of nanoseconds. Comparing these two parameters, the correct operation of this scheme is questionable.
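A rough estimate shows why the timing is demanding. Charging a capacitance C by ΔV within a time Δt requires an average current of I = C * ΔV / Δt. The Python sketch below evaluates this for the orders of magnitude mentioned in the text; the specific values of 1 pF, 10 pF, and 1 ns are assumptions chosen to represent those orders, not figures from [4.5].

```python
def required_current(c_well, delta_v, t_available):
    """Average current (A) needed to move c_well by delta_v within t_available."""
    return c_well * delta_v / t_available


if __name__ == "__main__":
    DELTA_V = 0.5        # FBB level in volts (from the text)
    T_AVAILABLE = 1e-9   # assumed 1 ns between SUBSL and the wordline
    for c_well in (1e-12, 10e-12):   # assumed p-well capacitances
        i_avg = required_current(c_well, DELTA_V, T_AVAILABLE)
        print(f"C = {c_well * 1e12:.0f} pF -> average current {i_avg * 1e3:.1f} mA")
```

Currents in the milliampere range per subarray imply wide driver transistors for M1-M3, which adds to the area and power overhead noted above.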

Fig. 4.10 Schematic diagram of forward body-biased SRAM.

Fig. 4.11 Operating waveforms of FBB SRAM.

4.2 SRAM Cell Arrays With On-Chip VBB Generators

Section 4.1 introduced several VTCMOS SRAM schemes, including both RBB and FBB schemes. However, almost all of them rely on external voltage sources instead of on-chip voltage generators. For general digital circuits it is impractical to provide so many external voltage sources, so on-chip voltage generators are necessary. In this section, an on-chip dual-level body-bias generator is presented and applied to the design of SRAM cell arrays. The power overhead of the on-chip voltage generator is also taken into account.

4.2.1 On-Chip Dual-Level VBB Generator

Fig. 4.12 shows the schematic diagram of SRAM cells with the proposed VBB generator, which comprises two substrate bias generators, two recovery circuits, and a high/low control circuit. The substrate bias generators were introduced in Chapter 3, and the recovery circuits return VBBN and VBBP to their original values. The high/low control circuit controls the input pumping signals and thus the output voltages.

Fig. 4.12 Schematic diagram of SRAM cells with on-chip dual-level VBB generator.

4.2.1.1 Substrate Bias Generator

There are two substrate bias generators: a voltage doubler for VBBP and a negative charge pump for VBBN. Please refer to Chapter 3 for detailed schematics and operations.

4.2.1.2 High/Low Control

Fig. 4.13 shows the schematic and operating waveforms of the high/low control circuit. A clock signal is fed into the circuit, and the Low signal controls the swing of the output signals, Vout0 and Vout1. When Low is pulled low, both PMOS transistors are turned on and the input clock signals pass directly through them. When Low is pulled high, the PMOS transistors are off and the NMOS transistors are on. Consequently, the swing of the output signals is smaller than that of the input clock by Vt. The operating waveforms clearly illustrate these operations.
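The effect of the Low signal on the pumping clock swing can be written as a one-line rule, sketched below in Python. The function name and the threshold voltage of 0.3V are hypothetical, and the sketch ignores the body effect, which would make the actual drop across the NMOS pass transistor somewhat larger than the nominal Vt.

```python
def output_swing(vdd, vt, low):
    """Return the swing (V) of Vout0/Vout1 for a given state of the Low signal.

    low=False: PMOS pass transistors conduct and pass the full clock swing.
    low=True : NMOS pass transistors conduct and lose one threshold voltage.
    """
    if low:
        return vdd - vt
    return vdd


if __name__ == "__main__":
    VDD = 1.2   # supply voltage in volts, as used in the simulations of this chapter
    VT = 0.3    # hypothetical NMOS threshold voltage
    print("high condition swing:", output_swing(VDD, VT, low=False))
    print("low condition swing :", output_swing(VDD, VT, low=True))
```

Because the charge pump outputs scale with the swing of the pumping clock, the reduced swing yields the lower of the two body-bias levels.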

Fig. 4.13 Schematic of high/low control and operating waveforms.

4.2.1.3 Recovery Circuits

Reversed body bias is applied to unselected rows to reduce subthreshold leakage. Once a row is selected, the body bias must be cancelled. Fig. 4.14 shows the recovery circuits for both VBBN and VBBP together with their operation tables.

Fig. 4.14 Recovery circuits for VBBN and VBBP and operation tables.

4.2.2 Simulation Results

The SRAM cell arrays with on-chip VBB generators are simulated in TSMC 0.13um technology. The power consumption and power saving results are discussed below.

4.2.2.1 Waveform of VBB Generator

Fig. 4.15 shows the simulated waveforms of the VBB generator for both the high and low conditions. In the high condition, VBBN reaches –1.15V and VBBP reaches 2.35V with VDD at 1.2V. In the low condition, VBBN reaches –0.92V and VBBP reaches 2.12V. The pumping frequency and output load are 5kHz and 10pF, respectively.
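The simulated levels can be converted into the reversed body bias seen by each transistor type. The Python sketch below does this under the assumption that the NMOS sources sit at GND and the PMOS sources at VDD; the voltages are taken from the text, while the names are illustrative.

```python
VDD = 1.2  # supply voltage in volts (from the text)

# Simulated generator outputs in volts (from the text).
SIMULATED = {
    "high": {"VBBN": -1.15, "VBBP": 2.35},
    "low": {"VBBN": -0.92, "VBBP": 2.12},
}


def reverse_body_bias(levels, vdd=VDD):
    """Return (nmos_rbb, pmos_rbb) in volts, rounded to 10 mV.

    Assumes NMOS sources at GND and PMOS sources at VDD.
    """
    nmos_rbb = 0.0 - levels["VBBN"]
    pmos_rbb = levels["VBBP"] - vdd
    return round(nmos_rbb, 2), round(pmos_rbb, 2)


if __name__ == "__main__":
    for condition, levels in SIMULATED.items():
        nmos, pmos = reverse_body_bias(levels)
        print(f"{condition}: NMOS RBB = {nmos} V, PMOS RBB = {pmos} V")
```

Under this assumption the two generators apply matching reversed bias to the NMOS and PMOS transistors in each condition, about 1.15V in the high condition and about 0.92V in the low condition.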

Fig. 4.15 Simulated waveforms of VBB generator (axis: Time (ms)).

4.2.2.2 Average Power of VBB Generator

Fig. 4.16 shows the average power of the VBB generator and reveals that the average power converges over time, because the power consumption in the steady state is lower than in the transition state. That is, for an SRAM row that stays in standby long enough, the average power overhead of the VBB generator converges to about 1.6nW. Therefore, the length of the standby period must be taken into account when evaluating the net power saving.

Fig. 4.16 Average power of VBB generator versus time (axis: Time (ms)).

4.2.2.3 Net Power Saving of SRAM

Fig. 4.17 shows the net power saving of the SRAM versus the time spent in standby mode. The net power saving is defined as the original SRAM leakage power minus the remaining leakage power and the power overhead of the VBB generators. Fig. 4.17 illustrates that the net power saving increases with time, because the average power overhead of the VBB generator decreases with time.

Fig. 4.17 also shows the curves for different wordline lengths. Longer wordlines achieve larger net power savings and reach the break-even point in less time. The break-even point is the point at which the saved leakage power equals the power overhead of the VBB generator.

Fig. 4.17 confirms the statement above that the power saving is time dependent. If the standby period is 3 milliseconds, for example, a 64-bit row obtains a positive net power saving, but 32-bit and 16-bit rows obtain negative ones. This means that for 32-bit and 16-bit rows, the leakage power saved within 3 milliseconds is not enough to compensate for the power overhead.
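The definition of net power saving and the break-even point can be sketched in Python as follows. The sketch assumes a simple overhead model that is not taken from the simulations: the generator spends a fixed transition energy to reach its target levels and then draws a constant steady-state power, so its average power over a standby period T is P_steady + E_transition / T. Only the 1.6nW steady-state value comes from the text; the leakage powers and the transition energy are hypothetical.

```python
def average_overhead(t_standby, p_steady, e_transition):
    """Average generator power (W) over a standby period of t_standby seconds."""
    return p_steady + e_transition / t_standby


def net_saving(t_standby, p_leak_orig, p_leak_rbb, p_steady, e_transition):
    """Net power saving (W): original leakage minus remaining leakage and overhead."""
    overhead = average_overhead(t_standby, p_steady, e_transition)
    return p_leak_orig - p_leak_rbb - overhead


def break_even_time(p_leak_orig, p_leak_rbb, p_steady, e_transition):
    """Standby time (s) at which the net saving is zero, or None if never reached."""
    margin = p_leak_orig - p_leak_rbb - p_steady
    if margin <= 0:
        return None   # saved leakage never covers the steady-state overhead
    return e_transition / margin


if __name__ == "__main__":
    P_STEADY = 1.6e-9       # steady-state overhead in watts (from the text)
    P_LEAK_ORIG = 20e-9     # hypothetical row leakage without RBB
    P_LEAK_RBB = 5e-9       # hypothetical row leakage with RBB
    E_TRANSITION = 40e-12   # hypothetical start-up energy in joules
    t_be = break_even_time(P_LEAK_ORIG, P_LEAK_RBB, P_STEADY, E_TRANSITION)
    print(f"break-even after about {t_be * 1e3:.2f} ms of standby")
```

In this model a longer wordline raises the saved leakage roughly in proportion to its length while the transition energy grows only slightly, so the break-even time shortens, in line with the trend described for Fig. 4.17.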

Fig. 4.17 Net power saving of SRAM versus time period in standby mode (axis: time period in standby mode, 0 to 120 ms).

Fig. 4.18 Composition of power sources (legend entry recovered: VBB generator's power).

4.2.2.4 Composition of Power Sources

Fig. 4.18 depicts the composition of power sources of a wordline with VBB generators. The leakage power of the SRAM is proportional to the wordline length, whereas the power consumption of the VBB generator increases only slightly as the wordline length increases. Longer wordlines therefore save much more leakage power for only a small increase in overhead, so the net power saving grows with wordline length. Fig. 4.19 further shows that the fraction of power overhead becomes relatively smaller as the wordline length increases.

Fig. 4.19 Fraction of power overhead for different wordline lengths (normalized power for 16-bit, 32-bit, and 64-bit wordlines, split into VBB generator's power and SRAM leakage power; bar labels: 95%, 92%, 89%).

4.2.2.5 RBB VBBN or VBBP Alone

Fig. 4.20 and Fig. 4.21 show the power results of applying RBB through VBBN or VBBP alone for a 32-bit wordline. The net power saving with both VBBN and VBBP applied is about 64%, while the net power saving with VBBN alone is about 64.5%. This result demonstrates that the VBBP generator contributes less to leakage saving. The results for VBBP alone in Fig. 4.20 and Fig. 4.21 support this conclusion.

Fig. 4.20 Effectiveness of net power saving for VBBN or VBBP alone (32-bit wordline; axis: time period in standby mode, 0 to 120 ms).

Fig. 4.21 Power information for VBBN or VBBP alone (32-bit wordline; bars: Conventional, Both Vbb, Vbbn only, Vbbp only; labels: –64%, –64.5%, +9%).

Fig. 4.20 and Fig. 4.21 show that no positive net power saving is possible if RBB is applied through VBBP alone, because the remaining leakage power plus the power overhead exceeds the nominal leakage power. Compared with using the VBBN generator alone, using both VBB generators saves more leakage power but incurs more generator overhead. Applying RBB through VBBN alone is therefore an alternative solution, and its net power saving is slightly larger.

4.2.3 Triple-Well Layout for SRAM Cells and VBB Generator

Fig. 4.22 shows the layouts of conventional and triple-well SRAM cells, and the cross-sectional views show the difference. The triple-well structure uses an n-well ring and a deep n-well to form a p-well region, which serves as the substrate of the NMOS transistors. Voltages can be easily applied to the p-well and n-well through well contacts, as shown in Fig. 4.22. Fig. 4.23 further shows the layout and configuration of two 64-bit rows with a VBB generator. The two p-well regions for the NMOS transistors are connected together through well contacts and supplied by the VBB generator. Likewise, the two n-well regions for the PMOS transistors are connected together through well contacts and supplied by the VBB generator.

Fig. 4.22 Layout of conventional and triple-well SRAM cells.

Fig. 4.23 Layout and configuration of triple-well SRAM rows with a VBB generator.

4.3 Time-Out-Policy VBB Generator Controller

In this section a VBB generator controller is presented. It adopts the concept of time-out policy to determine the operation of VBB generator. Time-out policy is a commonly used technique for controlling operating modes in software. Here, a
