Chapter 2 Background Overview
2.5 Summary
In this chapter, we have introduced the different kinds of power dissipation in CMOS circuits, the common format of standard cell libraries, and the content of the liberty file. We then explained the flow we use to design and characterize standard cells. Finally, we introduced the NLDM lookup tables of timing and power in the liberty file.
Chapter 3
Timing and Power Model Characterization Flow
3.1 Timing Characterization Flow
3.1.1 Transition Time and Propagation Delay Time
In this section, we introduce the flow for characterizing the propagation delay time and the transition time. Both quantities can be characterized in a single simulation. The procedure is as follows:
(1) The first step: Determining the size of the lookup table and choosing the ranges of the indices. The index ranges can be selected according to the following rules.
( i ) The minimum index of input transition time:
We use the cell with the largest driving ability in the standard cell library to drive the cell with the smallest driving ability, and obtain the output transition time of the driving cell. This value is defined as the minimum index of input transition time because it represents the best timing case in which one gate drives a load cell. Taking the inverter as an example, we use the inverter with the largest driving ability to drive the one with the smallest driving ability; the output transition time of the larger inverter is then defined as the minimum index of input transition time.
( ii ) The maximum index of input transition time:
As Fig. 3.1 shows, the curves of propagation delay time versus input transition time are nearly linear over the larger range of input transition time. Therefore, as long as the maximum index of input transition time is large enough, the propagation delay time beyond the maximum index can be calculated by linear extrapolation.
( iii ) The maximum index of output loading capacitance:
The maximum index of output loading capacitance is defined by the same rule as the maximum index of input transition time: the curve of propagation delay time versus output loading capacitance is also nearly linear over the larger range of output loading capacitance. We therefore define three times the input capacitance of the inverter with the largest driving ability as the maximum index of output loading capacitance. The propagation delay time beyond this maximum index can also be calculated by linear extrapolation.
Fig. 3.1 (a) Power vs. input transition time (b) Delay_rise vs. input transition time and (c) Delay_fall vs. input transition time with fixed output load capacitance of an inverter
The size of the lookup table is determined by the following observation.
From Fig. 3.1, the curves of propagation delay time versus input transition time are non-linear for small index values and linear for large index values, and the same holds for propagation delay time versus output loading capacitance. After determining the minimum and maximum index values, we therefore place more, finely spaced indices in the small-index region and fewer indices in the large-index region. In this way, the lookup table describes the curve more accurately.
(2) The second step: Determining and importing the input pattern according to the function of each kind of cell.
− We use the 3-input NAND gate shown in Fig. 3.2 to explain this step; Table 3.1 is its truth table. To measure the transition time or propagation delay time of a specific input pin, we toggle that pin and hold the other input pins high. For example, to characterize the timing related to input pin in1, we apply an input pattern in which Y switches as in1 transitions while the other inputs stay high. We can then measure the timing performance.
Fig. 3.2 3-input NAND schematic
Table 3.1 Truth table of 3-input NAND
In1 In2 In3 Y
0 0 0 1
0 0 1 1
0 1 0 1
0 1 1 1
1 0 0 1
1 0 1 1
1 1 0 1
1 1 1 0
Therefore, the input patterns that meet this requirement are the fourth (In1 In2 In3 = 0 1 1) and eighth (1 1 1) rows of Table 3.1.
(3) The third step: Running the SPICE simulation and collecting the results.
Before running the SPICE simulation, we need the following definitions of timing performance.
− Transition time: the time difference between the 10% VDD and 90% VDD crossings of the output signal. (Thresholds of 20% to 80% VDD or 30% to 70% VDD can also be used.)
− Delay time: the time difference between the 50% VDD crossing of the input signal and the 50% VDD crossing of the output signal.
Following the three steps and definitions above, we measure the timing performance.
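The three steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not the PAREX flow: `make_index` uses geometric spacing as one assumed way to place finer indices at small values, `extrapolate_linear` applies the linear-extrapolation rule from step (1), `sensitizing_patterns` searches a truth table for step (2), and the threshold-crossing helpers implement the 10%/90% and 50% definitions of step (3) on plain sampled waveforms that stand in for SPICE output. All function names are hypothetical.

```python
from itertools import product


def make_index(lo, hi, n):
    """n index values from lo to hi, geometrically spaced so that indices are
    dense at small values and sparse at large values (assumed spacing rule)."""
    ratio = (hi / lo) ** (1.0 / (n - 1))
    return [lo * ratio ** k for k in range(n)]


def extrapolate_linear(xs, ys, x):
    """Linear extrapolation beyond the last index from the last two table points."""
    x0, x1, y0, y1 = xs[-2], xs[-1], ys[-2], ys[-1]
    return y1 + (y1 - y0) * (x - x1) / (x1 - x0)


def sensitizing_patterns(func, n_inputs, pin):
    """Input rows in which toggling `pin` toggles the output (step 2)."""
    rows = []
    for bits in product((0, 1), repeat=n_inputs):
        flipped = list(bits)
        flipped[pin] ^= 1
        if func(bits) != func(tuple(flipped)):
            rows.append(bits)
    return rows


def crossing_time(t, v, level, rising):
    """First time a sampled waveform crosses `level`, linearly interpolated."""
    for k in range(1, len(t)):
        a, b = v[k - 1], v[k]
        if (rising and a < level <= b) or (not rising and a > level >= b):
            return t[k - 1] + (level - a) * (t[k] - t[k - 1]) / (b - a)
    raise ValueError("waveform never crosses the requested level")


def transition_time(t, v, vdd, rising, lo=0.1, hi=0.9):
    """10%-90% transition time (pass lo=0.2, hi=0.8 etc. for other definitions)."""
    return abs(crossing_time(t, v, hi * vdd, rising) - crossing_time(t, v, lo * vdd, rising))


def delay_time(t, vin, vout, vdd, in_rising, out_rising):
    """Time from the 50% VDD crossing of the input to that of the output."""
    return crossing_time(t, vout, 0.5 * vdd, out_rising) - crossing_time(t, vin, 0.5 * vdd, in_rising)


def nand3(bits):
    return int(not all(bits))


print(sensitizing_patterns(nand3, 3, 0))  # [(0, 1, 1), (1, 1, 1)]

# Toy waveforms (ns, V): rising input and falling output of an inverting cell.
t = [0.0, 1.0, 2.0, 3.0, 4.0]
vin = [0.0, 0.5, 1.0, 1.0, 1.0]
vout = [1.0, 1.0, 0.5, 0.0, 0.0]
print(delay_time(t, vin, vout, 1.0, True, False), transition_time(t, vout, 1.0, False))
```

The two NAND3 patterns found for in1 are the fourth and eighth rows of Table 3.1. In a real flow the waveforms would come from the simulator's measurement output rather than hand-written lists.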
3.1.2 Input Capacitance
In this section, we explain the flow we use to measure the input capacitance of the cells.
(1) Create a lookup table of output load capacitance vs. delay time:
In the first step, we create a lookup table of output load capacitance vs. delay time. We use an arbitrary inverter to drive many different capacitance values and record the propagation delay time of each simulation, which gives the desired lookup table. Fig. 3.3 and Table 3.2 illustrate this step.
(2) We use the same inverter as in the first step to drive the specific input pin of the cell under measurement and record the delay time of each simulation. Fig. 3.4 illustrates this step.
Fig. 3.3 Circuit diagram for creating the capacitance vs. delay time lookup table
Table 3.2 Capacitance vs. delay table
Capacitance 1fF 2fF 3fF 4fF
Delay time 1ps 2ps 3ps 4ps
Fig. 3.4 Using the inverter from step 1 to drive the circuit under measurement
(3) In this step, we compare the delay time measured in step 2 with the lookup table created in step 1, and read off the input capacitance of the cell under measurement from the corresponding delay time in the lookup table. If the measured delay time does not exactly match an entry in the lookup table, we obtain the input capacitance by interpolation.
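Step (3) amounts to reading the step-1 table backwards. The sketch below assumes the delay column is strictly increasing and uses linear interpolation between neighboring entries (the text only says "interpolation", so the linear form is an assumption). The function name is hypothetical, and the values of Table 3.2 are used only as sample data.

```python
from bisect import bisect_left


def input_capacitance(caps, delays, measured_delay):
    """Invert a monotonic capacitance-vs-delay table (step 1) at the delay
    measured in step 2, interpolating linearly between neighboring entries."""
    if not delays[0] <= measured_delay <= delays[-1]:
        raise ValueError("measured delay is outside the table; extend the table")
    k = bisect_left(delays, measured_delay)
    if delays[k] == measured_delay:
        return caps[k]  # exact table hit
    d0, d1 = delays[k - 1], delays[k]
    c0, c1 = caps[k - 1], caps[k]
    return c0 + (c1 - c0) * (measured_delay - d0) / (d1 - d0)


# Table 3.2 values: capacitance in fF, delay in ps.
caps_fF = [1.0, 2.0, 3.0, 4.0]
delays_ps = [1.0, 2.0, 3.0, 4.0]
print(input_capacitance(caps_fF, delays_ps, 2.5))  # 2.5 (fF)
```

Rejecting delays outside the table, rather than extrapolating, keeps the estimate within the range actually simulated in step 1.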
3.2 Power Characterization Flow
3.2.1 Internal Power
The flow for characterizing internal power is the same as that for transition time and propagation delay time, so they can be characterized at the same time. Note, however, that the values stored in the internal power table are energies, in joules, as expressed by the following equation.
$$E = P \cdot t = V(V_{DD}) \cdot I(V_{DD}) \cdot t$$
When the characterization tool uses SPICE to measure the power, it records the switching power together with the energy of VDD and VSS. It then subtracts the switching power from the total power to obtain the internal power, which avoids counting the same internal power consumption twice across pin groups. The tool also records the energy of VDD and VSS during each output transition and sums these two values to obtain the energy consumed in that transition.
3.2.2 Leakage Power
The leakage power of a cell varies considerably with the input vector applied to it, so we record the leakage power for every combination of static input vectors. The flow for characterizing the leakage power consists of the following steps.
(1) We apply every combination of static input vectors and measure the power consumption.
(2) We record the power consumption of each corresponding input vector.
Table 3.3 lists the leakage power of a 2-input NOR gate, and Fig. 3.5 shows an example of the input dependent leakage power format in the .lib file.
Table 3.3 Leakage power of a 2-input NOR gate
Input AB 00 01 10 11
Leakage power (pW) 353.42 231.18 2795.10 90.46
Fig. 3.5 Example of input dependent leakage power format (annotated entries: maximum leakage power of the cell; input dependent leakage power block; leakage power when the input is 11, 10, 01, and 00)
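The two leakage steps and the .lib layout of Fig. 3.5 can be sketched together. The code enumerates every static input vector, records one value per vector, and emits Liberty-style text: `cell_leakage_power` holding the maximum, followed by one `leakage_power` group with a `when` condition per vector, in the order 11, 10, 01, 00. The `measure` callback is hypothetical and stands in for a static SPICE run; here it simply looks up Table 3.3. Pin names A and B follow the table header, and the library's `leakage_power_unit` is assumed to be 1pW.

```python
from itertools import product


def characterize_leakage(n_inputs, measure):
    """Steps (1)-(2): apply every static input vector and record its leakage.
    `measure` is a hypothetical callback standing in for one SPICE run."""
    return {bits: measure(bits) for bits in product((0, 1), repeat=n_inputs)}


def when_condition(pins, bits):
    """Boolean condition for one input vector, e.g. (1, 0) -> 'A & !B'."""
    return " & ".join(pin if bit else "!" + pin for pin, bit in zip(pins, bits))


def leakage_block(pins, leakage):
    """Liberty-style text: maximum cell leakage plus one leakage_power group
    per input vector, ordered 11, 10, 01, 00 as in Fig. 3.5."""
    lines = [f"cell_leakage_power : {max(leakage.values()):.2f};"]
    for bits in sorted(leakage, reverse=True):
        lines += [
            "leakage_power () {",
            f'  when : "{when_condition(pins, bits)}";',
            f"  value : {leakage[bits]:.2f};",
            "}",
        ]
    return "\n".join(lines)


# Table 3.3 values (pW) stand in for the SPICE measurements.
table_3_3 = {(0, 0): 353.42, (0, 1): 231.18, (1, 0): 2795.10, (1, 1): 90.46}
leakage = characterize_leakage(2, table_3_3.__getitem__)
print(leakage_block(["A", "B"], leakage))
```

Because the number of vectors grows as 2^n, this exhaustive enumeration stays practical only for the small input counts typical of standard cells.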
3.3 Summary
In this chapter, we have introduced our characterization flow for the timing and power models. The main distinguishing feature of our flow is the input dependent leakage power model, which helps designers estimate leakage power more accurately, especially in 90 nm and smaller processes. We cooperated with ITRI-STC and used PAREX, the automatic characterization tool developed by ITRI-STC, to carry out our characterization flow.
Chapter 4
Low Power Standard Cell Library
4.1 Overview of Low Power Standard Cell Design Methodology
There are many methodologies for designing low power circuits. They can be classified into the circuit/logic level, technology level, system level, algorithm level, and architecture level. In this thesis, we focus on the circuit and logic levels to design a low power standard cell library. From Eqn. (2.1), the total power consumption includes both dynamic power and static power, so reducing either the dynamic or the static power consumption reduces the total power consumption. The dynamic power consumption can be expressed by the following equation:
$$P_{dynamic} = \alpha \cdot C_L \cdot V_{DD}^2 \cdot f \qquad (4.1)$$
where $\alpha$ is the toggle rate, $C_L$ is the output loading capacitance, $V_{DD}$ is the supply voltage, and $f$ is the operating frequency.
From this equation, dynamic power consumption can be reduced by lowering the switching activity of the nets, the output load capacitance, the supply voltage, or the operating frequency. In a standard cell library, the main ways to achieve this are shrinking the widths of devices and using low supply voltage cells. Shrinking the widths of devices also reduces the parasitic capacitance and gate capacitance of the devices, which is equivalent to reducing the output load capacitance CL of the cell. Using a low supply voltage, on the other hand, reduces both the performance of the cells and their dynamic power consumption, so designers can use multiple supply voltages in one chip. The final goal is to reduce the total power consumption while meeting the required performance. Power consumption can also be reduced by lowering the leakage power. Eqn. (2.3) shows that leakage power is directly related to leakage current, so leakage power can be decreased by reducing the leakage current. We introduce several techniques for reducing leakage current in the following sections.
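Eqn. (4.1) makes the benefit of a low supply voltage quadratic. The short sketch below evaluates the equation at a hypothetical operating point (10% toggle rate, 10 fF load, 1 GHz) and compares 1.0 V with 0.8 V; the numbers are illustrative only and are not taken from the library.

```python
def dynamic_power(alpha, c_load, vdd, freq):
    """Eqn. (4.1): P_dynamic = alpha * C_L * VDD^2 * f (F, V, Hz -> W)."""
    return alpha * c_load * vdd ** 2 * freq


# Hypothetical operating point: 10% toggle rate, 10 fF load, 1 GHz.
p_nominal = dynamic_power(0.1, 10e-15, 1.0, 1e9)
p_low_vdd = dynamic_power(0.1, 10e-15, 0.8, 1e9)
print(f"{p_nominal:.3e} W -> {p_low_vdd:.3e} W ({p_low_vdd / p_nominal:.0%})")
```

Lowering VDD by 20% cuts dynamic power to 64% of its original value, which is why low supply voltage cells are attractive on paths that can tolerate the slower cells.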
4.1.1 Multiple Threshold Voltage Circuit
A multiple-threshold CMOS circuit contains transistors with at least two different threshold voltages in a chip. Transistors with different threshold voltages have distinct characteristics. High threshold transistors suppress sub-threshold leakage current but seriously degrade performance. Low threshold transistors achieve high performance, but their sub-threshold leakage current is much greater than that of high threshold transistors. Standard threshold transistors lie between the low and high threshold transistors.
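How strongly a higher threshold suppresses leakage can be estimated with the first-order sub-threshold model, in which the sub-threshold current falls by one decade for every subthreshold swing S of additional threshold voltage, i.e. I_sub ∝ 10^(-Vth/S). The sketch assumes S = 100 mV/decade and hypothetical threshold offsets; it ignores DIBL, temperature, and the delay penalty of the higher threshold.

```python
def subthreshold_leakage_ratio(delta_vth, swing=0.1):
    """Factor by which raising Vth by delta_vth (V) cuts sub-threshold leakage,
    using the first-order model I_sub ~ 10**(-Vth / S); S in V/decade (assumed)."""
    return 10 ** (delta_vth / swing)


# Hypothetical: high-Vt device 0.1 V or 0.2 V above the low-Vt device, S = 100 mV/decade.
for dv in (0.1, 0.2):
    print(f"delta Vth = {dv:.1f} V -> leakage reduced about {subthreshold_leakage_ratio(dv):.0f}x")
```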
Based on these characteristics, several multiple-threshold CMOS design techniques have been proposed.
The first type is the multi-threshold-voltage CMOS (MTCMOS) circuit, which inserts high threshold devices in series with low-Vth circuitry [7]. Fig. 4.1(a) shows the schematic of an MTCMOS circuit.
Fig. 4.1 Schematic of MTCMOS circuits (a) Original MTCMOS (b) PMOS insertion MTCMOS and (c) NMOS insertion MTCMOS
The sleep control transistors provide efficient power management.
When the circuit is in active mode, pin SL is driven low and the high-Vt sleep control transistors (MP and MN) are turned on. Because the on-resistances of the sleep control transistors are very small, the virtual supply voltages (VDDV and VSSV) are quite close to the real ones. When the circuit enters standby mode, pin SL is driven high, and MP and MN are turned off, cutting off the leakage current efficiently. In practical designs, only one type of high-Vt transistor is needed for leakage control. Fig. 4.1(b) and (c) show the PMOS insertion and NMOS insertion schemes, respectively. Most designers prefer NMOS insertion because an NMOS has a much smaller on-resistance than a PMOS of the same size, so a smaller NMOS can serve as the sleep control transistor. MTCMOS can be implemented easily based on existing circuits. However, its main drawback is that it reduces only the standby leakage power. In addition, the large inserted MOSFETs increase area and delay significantly. Moreover, if data retention is required in standby mode, additional high-Vt memory circuits are needed to maintain the data [8].
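The two claims above, that a small on-resistance keeps VDDV and VSSV close to the real rails and that NMOS insertion needs a smaller device, can be checked with Ohm's law. The sketch assumes the sleep transistor behaves as a resistor R_on = r_unit / W in the linear region; the peak current, allowed rail drop, and per-micron resistances are hypothetical numbers chosen only to show the sizing ratio, not data from this library.

```python
def sleep_transistor_width(i_peak, max_drop, r_unit):
    """Minimum sleep-transistor width (um) so that the virtual-rail drop
    i_peak * R_on stays below max_drop, assuming R_on = r_unit / W
    (linear-region approximation; r_unit in ohm*um is hypothetical)."""
    return i_peak * r_unit / max_drop


# Hypothetical numbers: 1 mA peak current, 50 mV allowed drop on VSSV/VDDV,
# NMOS 2 kohm*um and PMOS 5 kohm*um (NMOS lower owing to higher electron mobility).
w_n = sleep_transistor_width(1e-3, 0.05, 2000.0)
w_p = sleep_transistor_width(1e-3, 0.05, 5000.0)
print(f"NMOS sleep transistor: {w_n:.0f} um, PMOS sleep transistor: {w_p:.0f} um")
```

Under these assumptions the NMOS footer needs only 40% of the PMOS header width for the same rail drop, which is the area argument for NMOS insertion.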
The second type of multiple-threshold CMOS circuit is super cut-off CMOS (SCCMOS). The schematics of PMOS and NMOS insertion SCCMOS circuits are shown in Fig. 4.2(a) and (b), respectively. Instead of the high-Vt sleep control transistors used in MTCMOS, SCCMOS uses low-Vth sleep control transistors whose gates are driven by an inserted gate bias generator [9].
Fig. 4.2 Schematic of SCCMOS circuits (a) PMOS insertion SCCMOS (gate of MP at VSS in active mode and VDD+0.4V in standby mode) and (b) NMOS insertion SCCMOS
In the PMOS insertion SCCMOS, the gate is set to VSS in active mode, turning on the low-Vt PMOS, so the virtual supply voltage (VDDV) is very close to the real power supply voltage. When the circuit enters standby mode, the gate is raised to VDD+0.4V to fully turn off the low-Vt PMOS. Because the gate of the PMOS is reverse biased, SCCMOS can fully cut off the leakage current. The NMOS insertion scheme operates in the same way: the NMOS gate is set to VDD in active mode and to VSS-0.4V in standby mode to fully cut off the leakage current. As with MTCMOS, only one type of insertion SCCMOS is needed for leakage control in practical designs.
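The bias levels can be summarized in a small sketch that returns the sleep-transistor gate voltage for each insertion type and mode, using the 0.4 V overdrive given above. It also estimates how much a 0.4 V reverse gate bias suppresses sub-threshold current relative to zero gate bias, using the first-order model I_sub ∝ 10^(V_GS/S) with an assumed swing S = 100 mV/decade. The function names and the 1.0 V supply are hypothetical.

```python
def sccmos_gate_voltage(insertion, mode, vdd, vss=0.0, overdrive=0.4):
    """Gate voltage (V) of the low-Vt sleep transistor in SCCMOS."""
    if mode not in ("active", "standby"):
        raise ValueError(f"unknown mode {mode!r}")
    if insertion == "pmos":
        # PMOS: on with gate at VSS, super cut-off with gate above VDD.
        return vss if mode == "active" else vdd + overdrive
    if insertion == "nmos":
        # NMOS: on with gate at VDD, super cut-off with gate below VSS.
        return vdd if mode == "active" else vss - overdrive
    raise ValueError(f"unknown insertion type {insertion!r}")


def reverse_bias_suppression(overdrive=0.4, swing=0.1):
    """Leakage reduction vs. zero gate bias, 10**(overdrive / S), first-order model."""
    return 10 ** (overdrive / swing)


vdd = 1.0  # hypothetical supply voltage
for insertion in ("pmos", "nmos"):
    for mode in ("active", "standby"):
        print(insertion, mode, sccmos_gate_voltage(insertion, mode, vdd))
print(f"standby leakage reduced by about {reverse_bias_suppression():.0e}x")
```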
The third type is dual threshold CMOS. As noted above, high threshold transistors suppress sub-threshold leakage current but seriously degrade performance. In a logic circuit, high threshold transistors can therefore be assigned to non-critical paths to reduce the leakage current, while low threshold transistors on the critical paths maintain the performance. In this way, high performance and low power are achieved simultaneously without any additional transistors. A dual threshold CMOS circuit is shown in Fig. 4.3. This technique reduces leakage power effectively in both standby and active modes. Its main difficulty is that not every transistor on a non-critical path can be replaced with a high threshold transistor: because of circuit complexity, the replacement may change the critical path of the circuit and thereby increase the critical delay [1]. As a result, it is hard for synthesis tools to take this method into account.
For this reason, designers must apply this technique carefully to avoid changing the critical path of the circuit. Note that this algorithm deals with circuits only at the gate level, so all transistors in a gate have the same threshold voltage.
Fig. 4.3 Dual-threshold CMOS circuit [1]
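As an illustration of the gate-level constraint described above, the sketch below assigns high-Vt to gates greedily and accepts a swap only if the critical delay of the circuit does not grow. It is a simple delay-preserving heuristic written for this illustration, not the algorithm of [1]; the 1.3x high-Vt delay penalty, the gate names, and the delays are assumptions. It requires Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter


def critical_delay(fanins, delay):
    """Longest-path delay of a combinational gate DAG.
    fanins maps each gate to the gates driving it; delay maps gate -> delay (ps)."""
    arrival = {}
    for gate in TopologicalSorter(fanins).static_order():
        arrival[gate] = delay[gate] + max((arrival[f] for f in fanins.get(gate, ())), default=0.0)
    return max(arrival.values())


def assign_dual_vth(fanins, low_vt_delay, high_vt_penalty=1.3):
    """Greedy gate-level dual-Vth assignment (illustrative only).
    Every gate starts low-Vt; a gate is switched to high-Vt only if its longer
    delay (low-Vt delay * high_vt_penalty) leaves the critical delay unchanged."""
    delay = dict(low_vt_delay)
    target = critical_delay(fanins, delay)
    vth = {gate: "low" for gate in fanins}
    for gate in sorted(fanins):
        trial = dict(delay)
        trial[gate] = low_vt_delay[gate] * high_vt_penalty
        if critical_delay(fanins, trial) <= target + 1e-9:
            delay, vth[gate] = trial, "high"
    return vth, target


# Critical path g1 -> g2 -> g4 (30 ps); g3 feeds g4 through a short path.
fanins = {"g1": [], "g2": ["g1"], "g3": [], "g4": ["g2", "g3"]}
low_vt_delay = {"g1": 10.0, "g2": 10.0, "g3": 5.0, "g4": 10.0}
vth, target = assign_dual_vth(fanins, low_vt_delay)
print(target, vth)  # only g3, which is off the critical path, becomes high-Vt
```

Because every trial swap is checked against the original critical delay, a replacement that would shift the critical path is rejected, which is exactly the failure mode noted for this technique.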
The next type is the mixed-Vth CMOS circuit scheme. Reference [10] introduced two types of mixed-Vth CMOS circuits, which allow different threshold voltages within a gate. The type I scheme (MVT1) does not allow transistors with different threshold voltages within the same pull-up (PMOS) or pull-down (NMOS) network. Designers first identify the transistors on the critical path. If a pull-up or pull-down network contains a transistor on the critical path, all transistors in that network are replaced with low threshold transistors to improve performance. For example, the transistors enclosed in the square in Fig. 4.4(a) are on the critical path. In the NOR gate, both the pull-up and the pull-down networks contain transistors on the critical path, so all the PMOS and NMOS transistors are changed to low threshold transistors. In the inverter, only the NMOS is on the critical path, so only the NMOS is replaced with a low threshold transistor, and the PMOS in the pull-up network keeps its high threshold voltage.
The other mixed-Vth scheme (MVT2) allows transistors with different threshold voltages anywhere except in series-connected networks, where all transistors must have the same threshold voltage. As in MVT1, designers first identify the transistors on the critical path. They then change all transistors in any series network that lies on the critical path to low threshold transistors. The main difference between MVT1 and MVT2 is that, in parallel networks, MVT2 replaces only the transistors on the critical path with low threshold transistors and keeps the others as high threshold transistors. For example, in the NOR gate of Fig. 4.4(b), both the series pull-up network and the parallel pull-down network contain transistors on the critical path. MVT2 therefore replaces all the PMOS transistors in the series pull-up network with low threshold transistors, replaces only the NMOS on the critical path with a low threshold transistor, and keeps the other NMOS transistors in the parallel network as high threshold transistors. The inverter is handled in the same way as in MVT1.
Fig. 4.4 MVT schemes of [10] (a) MVT1 scheme and (b) MVT2 scheme
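The MVT1 and MVT2 rules can be written as one small function over a gate's pull-up and pull-down networks. This sketch assumes each network is purely series or purely parallel (true for the NOR and inverter examples, not for general complex gates), and the transistor names and choice of critical path are hypothetical.

```python
def assign_mvt(networks, critical, scheme):
    """Per-transistor Vth under the MVT1/MVT2 rules described for [10].
    networks maps a network name to (topology, devices), topology being
    "series" or "parallel"; critical is the set of transistors on the critical path."""
    if scheme not in ("MVT1", "MVT2"):
        raise ValueError(f"unknown scheme {scheme!r}")
    vth = {}
    for topology, devices in networks.values():
        network_is_critical = any(d in critical for d in devices)
        for d in devices:
            if scheme == "MVT1" or topology == "series":
                # MVT1: the whole network follows; MVT2: a series stack shares one Vth.
                low = network_is_critical
            else:
                # MVT2 parallel network: only critical transistors become low-Vth.
                low = d in critical
            vth[d] = "low" if low else "high"
    return vth


# NOR2: series PMOS pull-up, parallel NMOS pull-down; input A assumed critical.
nor2 = {"pull_up": ("series", ["P_A", "P_B"]), "pull_down": ("parallel", ["N_A", "N_B"])}
critical = {"P_A", "N_A"}
print(assign_mvt(nor2, critical, "MVT1"))  # all four transistors low-Vth
print(assign_mvt(nor2, critical, "MVT2"))  # N_B stays high-Vth
```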
Reference [12] proposed a new mixed-Vth (MVT) CMOS design technique that reduces static power dissipation at the gate level. The goal of MVT-gates is to reduce the leakage within a gate without degrading its performance. This is achieved by replacing normal-Vth transistors with high-Vth and low-Vth transistors, under the constraint that optimizing a gate must not increase its worst-case delay.
In a logic cell, stacked transistors usually form the critical path, and the transistors on it must be low-Vth transistors. Transistors with different threshold voltages can be used in such a stack to reduce leakage while maintaining performance. In the MLVT-gates scheme (Fig. 4.5(b)), all transistors on the critical path are low-Vth transistors and the transistors on the non-critical path remain high-Vth transistors. Another scheme is called MVT-gates. The scheme of MVT-gates is the same as MLVT-gates on the