4.3.6 Power Reduction Due to Power-Gating

Without the power-gating scheme, the controller consumes tens of nanowatts of both active and standby power. Here ‘active’ means that the state machine is evaluating, and ‘standby’ means that the output signal is kept high.

Fig. 4.31 compares the power consumption with and without the power-gating scheme. Although some extra gates are added for power-down control, they are inactive except in standby mode and therefore consume almost no energy in active mode (no active power overhead). With the power-gating mechanism, the standby power is reduced to 6% of its value without power gating.

Fig. 4.31 Power comparison with and without the power-down scheme.

4.4 Conclusion

In this section, several VTCMOS SRAM designs are presented. These designs dynamically adjust the threshold voltage of the transistors to achieve either low standby power or high performance.

In addition, a VTCMOS SRAM scheme with on-chip VBB generators is proposed and discussed. The VBB generator produces two voltage levels and consumes power on the order of nanowatts. Because this overhead is insignificant, the scheme yields a large net power saving: simulation results show a net saving of about 75% for a 64-bit wordline and 64% for a 32-bit wordline. A significant power saving is therefore achieved even when the power overhead of the VBB generator is included.

A time-out-policy controller for the VBB generator that adopts a data-retention latch and a power-gating mechanism is also presented. Once the output of the controller enables the VBB generator, most of the controller is power-gated and a power saving of about 94% is achieved.

Chapter 5

Power-Gating Technique In Ultra-Low Power SRAM Cell Array Design

Power gating is a popular low-power technique for reducing leakage current in standby mode, and it has been widely used in logic circuits [5.1]. In recent years, several SRAM architectures adopting power gating have also been proposed.

In Sec. 5.1 the principle of the stacking effect is described. Design issues of power-gated (or gated-VDD) SRAM cells are discussed in Sec. 5.2, 5.3, and 5.4. Two power-gated SRAM architectures, the row-controlled and column-controlled schemes, are introduced in Sec. 5.5. In Sec. 5.6, a column/row co-controlled scheme is proposed, and comparisons with other schemes are shown in Sec. 5.7. Furthermore, the layout is implemented in 0.13 µm CMOS technology and an area comparison is made. Finally, conclusions and discussions are given in Sec. 5.8.

5.1 Stacking Effect

It has been observed that a stack of two off transistors has significantly smaller subthreshold leakage current than a single off transistor [5.2], [5.3]. This is called the stacking effect, and it is due to self-reverse biasing of the stacked transistors.

5.1.1 Self-Reverse Biasing

Fig. 5.1 explains the phenomenon of self-reverse biasing. On the left is an off NMOS transistor with leakage current I1, which is mainly composed of subthreshold leakage. On the right are two stacked off NMOS transistors with leakage current I2. In the steady state, the voltage at the intermediate node Vx is slightly higher than ground, so transistor M21 has a negative Vgs (gate-to-source voltage) and a reverse-biased source-to-body pn junction. Therefore, leakage current I2 is smaller than I1 because of the reverse bias on transistor M21.
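The self-reverse-biasing argument can be illustrated numerically. The following is a minimal sketch, not the simulation setup used in this work. It assumes a textbook first-order subthreshold model with DIBL and a linearized body effect, a stack connected between VDD and ground, and hypothetical parameter values (I0, VTH0, N_SLOPE, ETA, GAMMA) that are placeholders rather than 0.13 µm device data. It solves for the node voltage Vx at which both stacked transistors carry the same current and compares the resulting I2 with the single-transistor leakage I1.

```python
import math

VT = 0.0259      # thermal voltage kT/q near 300 K (V)
I0 = 1e-7        # hypothetical current prefactor (A)
VTH0 = 0.30      # hypothetical zero-bias threshold voltage (V)
N_SLOPE = 1.5    # hypothetical subthreshold slope factor
ETA = 0.10       # hypothetical DIBL coefficient (V/V)
GAMMA = 0.20     # hypothetical linearized body-effect coefficient (V/V)


def subthreshold_current(vgs, vds, vsb):
    """First-order subthreshold current of an NMOS transistor (assumed model)."""
    vth = VTH0 - ETA * vds + GAMMA * vsb
    return I0 * math.exp((vgs - vth) / (N_SLOPE * VT)) * (1.0 - math.exp(-vds / VT))


def single_leakage(vdd):
    """I1: one off transistor with gate, source, and body grounded."""
    return subthreshold_current(0.0, vdd, 0.0)


def stacked_leakage(vdd):
    """Return (Vx, I2) for two stacked off transistors, found by bisection."""
    lo, hi = 0.0, vdd
    for _ in range(100):
        vx = 0.5 * (lo + hi)
        # Device whose source is Vx: gate and body at 0 V, drain at VDD.
        i_top = subthreshold_current(-vx, vdd - vx, vx)
        # Device whose drain is Vx: gate, source, and body at 0 V.
        i_bottom = subthreshold_current(0.0, vx, 0.0)
        if i_bottom < i_top:
            lo = vx   # the node is still being charged, so Vx must rise
        else:
            hi = vx
    return vx, i_bottom


if __name__ == "__main__":
    vdd = 1.2
    i1 = single_leakage(vdd)
    vx, i2 = stacked_leakage(vdd)
    print(f"Vx = {vx * 1e3:.1f} mV, I1 = {i1:.3e} A, I2 = {i2:.3e} A")
```

With these placeholder values the solver settles at a small positive Vx and an I2 below I1, which is the qualitative behavior described above. The absolute numbers carry no meaning for a real process.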

5.1.2 Tradeoff Between Delay and Leakage

As mentioned above, two stacked off transistors have a smaller subthreshold current than one off transistor. However, the stacked devices also provide a smaller drive current, which increases delay. Fig. 5.2 shows the circuits used to observe the tradeoff between delay and leakage. In the middle of Fig. 5.2 is a normal inverter in which the channel widths of the NMOS and PMOS are W and 2W, respectively. At the far right is a modified inverter in which the NMOS is split into two half-sized NMOS transistors whose channel widths are W/2. Fig. 5.3 shows the simulated delay-leakage tradeoff: a smaller leakage current comes with a larger delay. Therefore, paths that are faster than required can adopt stacking to trade their timing slack for reduced leakage current.

Fig. 5.2 Using a stacked inverter to observe the tradeoff of delay and leakage current.

Fig. 5.3 Delay-leakage tradeoff of stacking effect (axes: Ioff in nA, delay in ps).

5.2 Gated-VDD SRAM Cell

Gated-VDD SRAM applies the same power-gating technique that is used in logic circuits. Adding a PMOS transistor between the virtual VDD and the actual VDD, or an NMOS transistor between the virtual GND and the actual GND, reduces standby leakage.

5.2.1 Virtual GND Node Fluctuation

However, SRAM differs from normal logic circuits because the internally stored data in SRAM will gradually be lost after the gating transistor is turned off. Fig. 5.4 (a) shows an SRAM cell with a gating device between the virtual GND and the actual GND. When the gating device is turned off, the virtual node vss0 is floating, so the cell leakage current charges it and raises its potential. The simulated result in Fig. 5.5 shows that vss0 can exceed VDD/2 (600 mV for a 1.2 V VDD) and thus disturb the stored data.

The voltage at the virtual GND node can be limited to a small value by adding a diode-connected NMOS transistor between the virtual GND node vss1 and the actual GND node [5.4], as in Fig. 5.4 (b). Fig. 5.5 shows that vss1 is limited to about 100 mV.

Fig. 5.4 Gated-VDD SRAM cells (a) without diode and (b) with diode.

5.2.2 Stability/Static Noise Margin

The stability of data in an SRAM or register file is a critical factor for satisfactory yield and low cost. Static noise is a DC disturbance such as mismatches and offsets due to processing and variations in operating conditions. The static noise margin (SNM) is the maximum DC disturbance that an SRAM cell can tolerate without flipping the stored data [5.5]. Fig. 5.6 (a) shows a latch that comprises two inverters and two static noise sources, and Fig. 5.6 (b) shows the graphical view of SNM.
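The SNM definition above can be turned into a small numerical experiment. The sketch below is an illustration under stated assumptions, not the method used to produce the results in this chapter: the inverter transfer curve is an idealized tanh shape with a hypothetical switching threshold vm and steepness k, and the two noise sources are assumed to have equal magnitude vn and the polarity that opposes the stored state. The SNM estimate is the largest vn for which the cross-coupled pair still holds its state.

```python
import math


def inverter_vtc(vin, vdd=1.2, vm=0.6, k=20.0):
    """Idealized inverter transfer curve (assumed tanh shape, not a device model).

    vm is the switching threshold (V) and k (1/V) sets the steepness.
    """
    return 0.5 * vdd * (1.0 - math.tanh(k * (vin - vm)))


def holds_state(vn, vdd=1.2, vm=0.6, k=20.0, iterations=5000):
    """Check whether the latch keeps V1 high when noise vn opposes the state."""
    v1, v2 = vdd, 0.0
    for _ in range(iterations):
        # One noise source lowers the high input seen by the second inverter.
        v2 = inverter_vtc(v1 - vn, vdd, vm, k)
        # The other noise source raises the low input seen by the first inverter.
        v1 = inverter_vtc(v2 + vn, vdd, vm, k)
    return v1 > v2


def static_noise_margin(vdd=1.2, vm=0.6, k=20.0, tol=1e-4):
    """Largest vn, found by bisection, for which the stored state is retained."""
    lo, hi = 0.0, vdd
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if holds_state(mid, vdd, vm, k):
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    for steepness in (5.0, 20.0):
        snm = static_noise_margin(k=steepness)
        print(f"k = {steepness:4.1f} 1/V -> SNM estimate = {snm * 1e3:.0f} mV")
```

In this idealized model a steeper transfer curve gives a larger SNM, and the estimate stays below VDD/2. A real cell would need its simulated or measured transfer curves in place of `inverter_vtc`.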

Fig. 5.5 Voltage of virtual GND increases after turning off gating device (x-axis: time in µs).

Fig. 5.6 (a) A latch with static noise sources and (b) Static noise margin.

5.2.3 SNM Issue of Gated-VDD SRAM Cell

Fig. 5.7 shows the circuit used to discuss the SNM of the gated-VDD SRAM cell. The two inverters are connected to a gating transistor and a diode-connected transistor, both of which have channel width Wg. The channel widths of the NMOS and PMOS transistors of the inverters are n*Wg and 2*n*Wg, respectively. The scale factor n is a positive number, and the behavior of SNM can be observed by varying n. If n is 1, for example, the gating device is the same size as the NMOS transistors of the inverters. The simulated result in Fig. 5.8 shows SNM as a function of n, with an optimum value of n of about 2. This means that the maximum SNM occurs when the width of the gating device is about half that of the NMOS transistors of the inverters.

Fig. 5.7 Circuit used to observe the SNM of gated-VDD SRAM cell.

Fig. 5.8 SNM versus scale factor n (normalized SNM plotted for n from 0 to 10; optimum n ~ 2).

5.3 Gate Leakage Reduction

As mentioned above, an additional stacking device reduces subthreshold leakage because of the stacking effect. Moreover, gate leakage current decreases as well, owing to the positive voltage at the virtual GND node. Gate leakage current increases exponentially as the oxide thickness decreases and as the voltage across the oxide increases [5.6], and it has been shown that gate leakage through a PMOS is smaller than through an NMOS. Fig. 5.9 shows the components of gate leakage in a gated-VDD SRAM cell in standby mode. Since gate leakage is a function of the voltage across the oxide, the gate leakage currents through M5 and M6 depend on the bitline precharge voltage. The dominant component is the one through M1, because it includes gate-to-source, gate-to-drain, and gate-to-substrate gate leakage currents.

After a gating device and a diode-connected transistor are placed between the virtual GND node and the actual GND, most of the gate leakage currents mentioned above are reduced because the virtual GND node rises. Note that the voltage at the node storing ‘0’ equals the voltage of the virtual GND node. Clearly, the gate leakage currents through M1, M2, M3, and M4 are reduced when the voltage of the virtual GND node rises to a positive value. In Fig. 5.9, the dotted lines represent the extra leakage currents induced by the two additional devices. However, these two components are negligible since both the VDS and VGS voltages of these devices are small.

Fig. 5.9 Gate leakage components and extra leakage currents in gated-VDD SRAM cell.

5.4 NMOS or PMOS Gating Device

Fig. 5.10 depicts the dominant leakage sources of an inactive SRAM cell. The bitlines are precharged to VDD and the gating device is turned off. The solid lines indicate the subthreshold leakage paths and the dotted line represents the bitline leakage path. Either an NMOS or a PMOS gating device can reduce the subthreshold leakage currents. A PMOS gating device, however, does not isolate the bitlines from ground as an NMOS gating device does. Therefore, an NMOS gating device saves more standby power than a PMOS gating device, since it also cuts off the bitline leakage path.

Fig. 5.10 Dominant leakage sources in an inactive SRAM cell.

5.5 Gated-VDD SRAM Architectures

Several gated-VDD SRAM architectures adopting the power-gating technique have been proposed in recent years. In this section, two of them are introduced, and their features and behaviors are discussed in detail.

5.5.1 Row (Wordline)-Controlled Architecture

Fig. 5.11 shows a gated-VDD (or gated-ground) SRAM in which the row decoder controls the gating devices [5.7]. In this architecture, all the SRAM cells on the same wordline share a common gating device. All the cells on a wordline are turned on when that row is selected, and the unselected rows remain in standby mode to save leakage power. Note that no diode device is included between the virtual GND and the actual GND. The authors carefully sized the gating devices to maintain data stability.

Since the row-controlled scheme shares a common gating device per wordline, the capacitance at the virtual GND node is quite large and it may take a long time to discharge this node. Consequently, the maximum operating clock frequency is limited by the extra time needed to discharge the virtual node. Moreover, not all the cells on a wordline are needed for each read/write operation, so turning on the unnecessary cells wastes a great amount of active power.

Fig. 5.11 Row-controlled SRAM architecture in which the row decoder controls the gating devices.

5.5.2 Column-Controlled Architecture

In contrast to the row-controlled scheme, the column-controlled scheme controls the gating devices with the column decoder. Fig. 5.12 shows the schematic diagram of the column-controlled SRAM architecture [5.8]. All the cells on the same bitline share a common gating device, and only the cells of the selected bitline are turned on. The virtual GND node is again a large capacitive node, and its capacitance depends on the number of wordlines.

In the column-controlled scheme, all the cells on a selected bitline are turned on, but only one of them is selected by the wordline. Consequently, less power is saved because of the cells that are active but unnecessary.

This scheme and the previous one share the same two drawbacks. First, they turn on many unnecessary cells for each read/write operation, which reduces the power saving. Second, the virtual nodes are large capacitive nodes, and the time needed to discharge them limits the maximum operating frequency.

Fig. 5.12 Column-controlled SRAM architecture in which the column decoder controls the gating devices.

5.6 Column/Row Co-Controlled Architecture

A new gated-VDD SRAM architecture is proposed in this section. This scheme overcomes the two drawbacks of the two previous schemes.

5.6.1 Schematic Diagram

Fig. 5.13 shows the schematic diagram of the proposed SRAM scheme. In contrast to the previous two schemes, this scheme controls the gating devices with signals from both the row and column decoders. The cells on the same wordline are grouped into blocks, and the block size depends on the number of I/O pins. Fig. 5.13 is an example with a 32-bit wordline and 8-bit I/O, in which the 32 cells are divided into four blocks. Note that the wordline signals from the row decoder are not connected directly to the cells but pass through AND gates. The AND gates receive signals from the row and column decoders and generate the control signals that serve as ‘local’ wordlines and as the control signals of the gating devices.

Fig. 5.13 shows that each block is turned on only when both its wordline and its selection signal (sel0, sel1, and so on) are pulled high. The reason for this scheme is that, for an SRAM core with 8-bit I/O, only 8 bits of data are read from or written into the SRAM per operation. This is why the block size depends on the I/O width. Because only one of the four blocks is turned on, the active power in this example is about 25% of that of the row-controlled scheme.
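The AND-gate decoding described above can be written out as a short behavioral sketch. It is not a circuit description of the proposed design: the function names, the one-hot decoder model, and the four-row array in the usage note are assumptions made for illustration. Each block enable is the logical AND of its row's wordline and its block select signal.

```python
def local_wordline_enables(wordline_bits, io_bits, selected_row, selected_block, num_rows):
    """Return enable[row][block], True only where wordline AND block select are high.

    The row and column decoders are modeled as one-hot outputs (an assumption).
    """
    if wordline_bits % io_bits != 0:
        raise ValueError("wordline length must be a multiple of the I/O width")
    num_blocks = wordline_bits // io_bits
    wordline = [row == selected_row for row in range(num_rows)]    # row decoder outputs
    sel = [blk == selected_block for blk in range(num_blocks)]     # sel0, sel1, ...
    # One AND gate per block drives both the local wordline and the gating device.
    return [[wl and s for s in sel] for wl in wordline]


def active_cells(enables, io_bits):
    """Count the cells that belong to a turned-on block."""
    return sum(io_bits for row in enables for on in row if on)


if __name__ == "__main__":
    enables = local_wordline_enables(32, 8, selected_row=2, selected_block=1, num_rows=4)
    print(active_cells(enables, 8), "of 32 cells on the selected wordline are active")
```

For the 32-bit wordline with 8-bit I/O, exactly one block (8 cells) is enabled per access, whereas a row-controlled array would turn on all 32 cells of the selected wordline.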

Fig. 5.13 Proposed column/row co-controlled SRAM scheme.

5.6.2 Effectiveness of Proposed Scheme

Fig. 5.14 is the test circuit used to observe the cell current in standby and active modes. It comprises eight SRAM cells, a gating device, and a diode-connected transistor. Table 5.1 lists the total cell current of the circuit in standby and active modes, and reveals that the cell current in active mode is more than 1000 times greater than in standby mode. Therefore, the more unnecessary cells are power-gated, the more power is saved.

Fig. 5.14 Test circuit to observe the cell current in standby and active modes.

Table 5.1 Cell current in standby and active modes.

Standby/Active        Icell (8-bit)
ctrl=0 (standby)      1.08 nA
ctrl=1 (active)       1.22 µA
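The values in Table 5.1 allow a quick back-of-the-envelope check. The sketch below is a first-order estimate only: it assumes that the cell current of a wordline is the sum of independent 8-bit blocks, each drawing either the standby or the active current from Table 5.1, and it ignores decoders, AND gates, and bitline activity. The helper name `row_cell_current` is hypothetical.

```python
I_STANDBY_8BIT = 1.08e-9   # A, ctrl=0, from Table 5.1
I_ACTIVE_8BIT = 1.22e-6    # A, ctrl=1, from Table 5.1


def row_cell_current(num_blocks, active_blocks):
    """Estimated cell current of one selected wordline built from 8-bit blocks."""
    idle_blocks = num_blocks - active_blocks
    return active_blocks * I_ACTIVE_8BIT + idle_blocks * I_STANDBY_8BIT


if __name__ == "__main__":
    ratio = I_ACTIVE_8BIT / I_STANDBY_8BIT
    all_on = row_cell_current(num_blocks=4, active_blocks=4)   # every block turned on
    one_on = row_cell_current(num_blocks=4, active_blocks=1)   # one block turned on
    print(f"active/standby ratio = {ratio:.0f}")
    print(f"estimated saving with one of four blocks on = {1.0 - one_on / all_on:.1%}")
```

The ratio evaluates to roughly 1130, consistent with the statement that the active current is more than 1000 times the standby current. Under the additive assumption, keeping three of four blocks gated saves just under 75% of the cell current; simulated values can differ from this estimate.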

5.7 Three Typical Schemes for Comparison

To evaluate the effectiveness of the proposed scheme, three test circuits are constructed. Fig. 5.15 shows the three SRAM architectures, each of which contains 32 SRAM cells per wordline. The performance and power consumption of these three circuits are compared in the following subsections.

In Fig. 5.15, the first circuit is a conventional SRAM, which contains no gating device. The second is the row-controlled scheme, with a single gating device for all the cells, controlled directly by the wordline signal. The last is the proposed column/row co-controlled scheme, which contains four 8-bit blocks. Each of the four 8-bit blocks has its own gating device, and the control signals come from both the column and row decoders. A block is active only when both its wordline and its byte selection signal (sel0, sel1, and so on) are high. Note that the gating device in (2) is four times larger than any one in (3).

5.7.1 Read-Out Delay

Fig. 5.16 shows the simulated waveforms of the data read-out delay. The curve ‘Dout (1)’ is the output of the conventional scheme, ‘Dout (2)’ that of the row-controlled scheme, and ‘Dout (3)’ that of the proposed scheme. The conventional scheme reads out data fastest. The read-out delay of the row-controlled scheme is slightly larger than that of the conventional scheme, for at least two reasons. First, the row-controlled scheme has a smaller active current because of the stacked gating device. Second, the row-controlled scheme needs extra time to discharge the virtual GND node.

The read-out delay of the proposed scheme is clearly larger than that of the other two schemes. This is mainly due to the gate delay of the AND gate used to generate the control signal for the gating device and the cells. According to Fig. 5.16, the delay is about 27% larger than that of the conventional scheme. Note that this value is the wordline-to-output time. The relative overhead would therefore be smaller if it were measured over one complete read/write operation (clock-to-output).

Fig. 5.15 Three SRAM test circuits to compare their performance and power consumption.

5.7.2 Cell Standby Power

Fig. 5.17 compares the cell standby power of the three test circuits. The conventional scheme consumes the most cell standby power since it has no gating device. The row-controlled and proposed schemes have almost the same cell standby power consumption. These two schemes are equivalent in standby mode because both turn off all the cells, which reduces cell standby power by about 60%.

Fig. 5.16 Simulated read-out delay (wordline-to-dataout) for the three test circuits.

Fig. 5.17 Cell standby power comparison (conventional, row-controlled, this work).

5.7.3 Active Power

Fig. 5.18 compares the normalized cell active power of the three SRAM architectures. The cell active power of the row-controlled scheme is slightly smaller than that of the conventional scheme. Because of the stacked gating device, the row-controlled scheme has a smaller active current and thus a smaller active power. As mentioned before, this smaller current also results in a larger read-out delay.

Compared with the row-controlled scheme, the proposed scheme achieves a 77% saving in active power. This is expected, because the proposed scheme turns on the gating device of only one block while the other three blocks remain power-gated, so a saving of about three-fourths is obtained.

Fig. 5.18 Cell active power comparison (conventional, row-controlled, this work).

5.7.4 Wordline Length, Block Size, and Cell Active Power

Fig. 5.19 depicts the cell active power for various wordline lengths and block sizes. The figure reveals that the cell active power of the conventional and row-controlled schemes is proportional to the wordline length, since both turn on all the cells for each operation. For the proposed scheme, however, the active power is almost constant for a fixed block size, regardless of the wordline length. This is because only one block of cells is active at a time, no matter how long the wordline is.

Fig. 5.19 Cell active power versus different wordline lengths and block sizes.

Fig. 5.20 Cell active power saving (%, versus row-controlled) for different wordline lengths and block sizes.

Fig. 5.19 also shows that a smaller block size gives lower cell power consumption. A smaller block size means fewer active cells per operation and thus less cell active power. However, the block size depends on the number of I/O pins, which is usually fixed.

Fig. 5.20 shows the cell active power saving of the proposed scheme for various wordline lengths and block sizes. The largest cell active power saving is achieved when the block size is the smallest. The circled points in Fig. 5.19 and Fig. 5.20 are of particular interest: there the wordline length and the block size are both 32 bits. In this situation, the proposed scheme degenerates to the row-controlled scheme since all the cells on the same wordline are turned on at the same time. Therefore, the proposed and row-controlled schemes consume the same amount of cell active power, as shown in Fig. 5.18. Accordingly, the circled point in Fig. 5.20 shows a 0% power saving in this condition.
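The trends in Fig. 5.19 and Fig. 5.20 follow from counting the cells that are turned on per access. The sketch below encodes only that first-order cell-count model. It treats the conventional and row-controlled schemes identically, ignores the standby current of gated blocks and all peripheral power, and uses hypothetical wordline lengths and block sizes in the usage loop, so it does not reproduce the simulated values.

```python
def active_cells_per_access(wordline_bits, block_bits, scheme):
    """Cells turned on per access under a first-order cell-count model."""
    if scheme in ("conventional", "row-controlled"):
        return wordline_bits                    # the whole wordline is turned on
    if scheme == "co-controlled":
        return min(block_bits, wordline_bits)   # only one block is turned on
    raise ValueError(f"unknown scheme: {scheme}")


def saving_vs_row_controlled(wordline_bits, block_bits):
    """Fractional reduction in active cells of the co-controlled scheme."""
    row = active_cells_per_access(wordline_bits, block_bits, "row-controlled")
    co = active_cells_per_access(wordline_bits, block_bits, "co-controlled")
    return 1.0 - co / row


if __name__ == "__main__":
    # Hypothetical sweep; the lengths and block sizes are illustrative only.
    for wordline_bits in (32, 64, 128):
        for block_bits in (8, 16, 32):
            saving = saving_vs_row_controlled(wordline_bits, block_bits)
            print(f"wordline={wordline_bits:3d} block={block_bits:2d} saving={saving:.0%}")
```

In this model the co-controlled cell count depends only on the block size, and the saving falls to zero when the block size equals the wordline length, matching the degenerate case at the circled points.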

5.7.5 Reduction of Wordline Loading

Although the proposed scheme introduces an extra gate delay and increases the wordline-to-output delay, this overhead is reduced by the division of the wordline loading. As in Fig. 5.15, the wordline of the conventional scheme connects to 64 NMOS
