Chapter 1 Introduction
1.2 Organization
In this thesis, we investigate the feasibility of three low-power techniques (Voltage Separation, body bias, and power switch) for the cell-based design flow. This thesis presents a set of simple but effective low-power cell-based design flows built on existing commercial EDA tools. The following chapters present the detailed design process and principles.
Chapter 2 reviews related research and the main benefits of each low-power technique.
Chapter 3 presents the details and principles of the modified low-power cell-based design flows for Voltage Separation, body bias, and power switch. All implementations use TSMC 0.18 µm CMOS technology and the Artisan standard cell library, supported by the Taiwan Chip Implementation Center (CIC) [13].
Chapter 4 presents the results of these low-power techniques, showing their area cost as well as possible design issues.
Finally, Chapter 5 summarizes the conclusions and presents future work.
Chapter 2 Background
This chapter presents related research and the main benefits of each low-power technique. Section 2.1 introduces Voltage Island and Voltage Separation and discusses some considerations for Voltage Separation. Section 2.2 introduces the basic principle of body bias and its effectiveness in reducing leakage current and the impact of process variation. Finally, Section 2.3 introduces the power switch and MTCMOS and discusses the impact of power switch sizing on performance.
2.1 Voltage Separation
The Voltage Island concept was first proposed by IBM in 2002 [1]. Voltage Island is a system architecture that can dramatically reduce active and static power consumption in System-on-Chip (SoC) designs. Voltage Separation is a sub-step of Voltage Island and can be accomplished with existing EDA tools. In this thesis, we apply Voltage Separation to a SIMD MAC and treat each unit as an individual island.
The power supply of each island can therefore be managed individually. This section briefly explains Voltage Island and Voltage Separation.
In previous generations, large functional blocks were not integrated on the same chip, so the voltage level of each block could be chosen independently. As process technology advances and chip capacity grows, integrating more functional units into a single chip has become a popular design trend. However, traditional approaches to power distribution and performance optimization cannot provide the voltage and technology flexibility that the previously separate chips offered.
Figure 2.1 shows a Voltage Island example. The first step is to identify the minimum voltage level at which each element achieves its required performance. In general, the most performance-critical element of the design requires the full 1.2 V supported by the technology to maximize its performance. Other elements, such as memory or control logic, may require only 1.0 V, so significant active power can be saved by operating them at the lower voltage. In addition, voltage flexibility allows pre-designed standard elements to be reused in a new SoC application. Further, elements specified at a special voltage, such as an analog core, can easily be accommodated in a mixed-voltage system. However, peripheral circuits such as level converters are needed to translate signal levels between the voltage domains of a mixed-voltage system.
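The active-power saving from a lower island voltage follows from the standard switching-power relation P = αCV²f. The minimal Python sketch below estimates the saving when an element moves from 1.2 V to 1.0 V, assuming the island keeps the same activity factor α, switched capacitance C, and clock frequency f; the activity, capacitance, and frequency values are hypothetical.

```python
def dynamic_power(alpha: float, c_farads: float, vdd: float, f_hz: float) -> float:
    """Switching power P = alpha * C * Vdd^2 * f (watts)."""
    return alpha * c_farads * vdd ** 2 * f_hz


def relative_saving(v_high: float, v_low: float) -> float:
    """Fraction of dynamic power saved when a block moves from v_high to v_low,
    assuming activity, capacitance and frequency stay the same."""
    return 1.0 - (v_low / v_high) ** 2


if __name__ == "__main__":
    # Hypothetical block: 20% activity, 100 pF switched capacitance, 100 MHz.
    p_high = dynamic_power(0.2, 100e-12, 1.2, 100e6)
    p_low = dynamic_power(0.2, 100e-12, 1.0, 100e6)
    print(f"{p_high * 1e3:.2f} mW -> {p_low * 1e3:.2f} mW "
          f"({relative_saving(1.2, 1.0):.1%} saved)")
```

Running it prints `2.88 mW -> 2.00 mW (30.6% saved)`. In practice a lower supply also lowers the achievable frequency, which this sketch does not model.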
Another type of Voltage Island, shown in Figure 2.2, increases power savings in applications that are more sensitive to standby power, such as battery-powered devices.
Methods such as clock gating can limit the active power of idle functional units, but their leakage (standby) power remains. If the power supplies of these functional units are partitioned into islands, the units can be completely powered off by switches, eliminating both the active and standby components of their power. Figure 2.2 shows three islands. The first island contains the circuitry that listens for the signal to wake up the rest of the system.
Using this concept requires power management to be built into the architecture to handle power sequencing and communication issues. Each of the other two islands contains a switch. In sleep mode, the switches cut off the supply, eliminating both active and standby power.
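The power-sequencing role of the listen island can be illustrated with a small state model. The Python sketch below is hypothetical: the class `PowerSequencer`, its methods, and the island names are illustrative assumptions rather than part of the design in [1], and the sketch only tracks switch states without modeling supply ramp times.

```python
from enum import Enum
from typing import Iterable, List


class Mode(Enum):
    ACTIVE = "active"
    SLEEP = "sleep"


class PowerSequencer:
    """Hypothetical controller owned by the always-on listen island."""

    def __init__(self, gated_islands: Iterable[str]):
        # All gated islands start powered (switch closed).
        self.switch_on = {name: True for name in gated_islands}
        self.mode = Mode.ACTIVE

    def enter_sleep(self) -> None:
        # Open every switch: both active and standby power of the gated
        # islands are removed; only the listen island stays powered.
        for name in self.switch_on:
            self.switch_on[name] = False
        self.mode = Mode.SLEEP

    def wake_up(self, order: List[str]) -> None:
        # Close the switches in the requested order. A real controller would
        # wait for each supply to settle before enabling the next island.
        if sorted(order) != sorted(self.switch_on):
            raise ValueError("wake-up order must list every gated island exactly once")
        for name in order:
            self.switch_on[name] = True
        self.mode = Mode.ACTIVE


seq = PowerSequencer(["processor", "hibernate_island"])
seq.enter_sleep()
seq.wake_up(["hibernate_island", "processor"])
```

Keeping the wake-up order explicit reflects the point above that power sequencing must be designed into the architecture rather than left implicit.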
Like the first Voltage Island scenario in Figure 2.1, the Voltage Separation used in this thesis supplies each Group with a different supply voltage. The design flow for Voltage Separation includes the following steps:
• Functional partitioning
• Synthesis and timing consideration
• Floorplanning and physical design
• Logic verification
Implementing Voltage Separation adds requirements to each of these design phases. They are briefly described as follows.
• Functional partitioning: The designer should partition the functional units of the design into different islands according to their power characteristics and operation schedules. For example, if components A and B require the same voltage level and have the same operation schedule, they can be assigned to the same island. Each island should be written as an individual RTL module during RTL coding, which makes grouping the islands easy and clear.
• Synthesis and timing consideration: During synthesis, the effect of different signal voltage levels must be taken into account. Because the islands operate at different voltage levels, level shifters must be added to the design, and their additional delay must be accounted for during synthesis.
• Floorplanning and physical design: Voltage Separation requires complete isolation between islands to enable independent power sequencing. Floorplanning includes determining the number of power sources that meet each island's power requirements and how to place the islands efficiently [14]. Islands must be floorplanned close to their corresponding power pins.
• Logic verification: The functionality of each island must be verified, including the correctness of power management and of power control switching.
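The functional-partitioning rule above (same voltage level and same operation schedule implies the same island) amounts to grouping components by that pair of attributes. The Python sketch below assumes each component has already been annotated with its minimum voltage and a schedule label; the `Component` type and the labels are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Component:
    name: str
    voltage: float  # minimum supply (V) that meets its performance target
    schedule: str   # hypothetical label for its operation schedule


def partition_into_islands(components: List[Component]) -> Dict[Tuple[float, str], List[str]]:
    """Assign components sharing both voltage level and schedule to one island."""
    islands: Dict[Tuple[float, str], List[str]] = defaultdict(list)
    for comp in components:
        islands[(comp.voltage, comp.schedule)].append(comp.name)
    return dict(islands)


components = [
    Component("A", 1.0, "burst"),
    Component("B", 1.0, "burst"),
    Component("C", 1.2, "always_on"),
]
print(partition_into_islands(components))
```

Here A and B end up in one island and C in another. Each resulting key would then map to one RTL module, as recommended above.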
Figure 2.1 Timing-critical Voltage Island [1].
[Figure 2.2 contents: Hi-speed Processor (400 MHz, 5M gates, 10 Mbit SRAM, 0.9 V); Operational Island for Hibernate Mode (500K gates, 2 Mb SRAM, low performance, 0.9 V); Listen Island (50K gates, low performance, 0.7 V); two SWITCH blocks; transitions: Wake up, Hibernate, Sleep Mode.]
Figure 2.2 Voltage Island for Power-Sequencing [1].
Some of the considerations and issues described above remain unresolved and are still being investigated in the recent literature. In this thesis, we implement only Voltage Separation, which is a sub-step of Voltage Island and can be accomplished with existing EDA tools. We partition the design into two islands, each with its own power grid. Signal level shifting and the other issues are not considered in this research. The detailed design flow is introduced in the next chapter.
2.2 Body Bias
Body bias is a dynamic technique that has been used to reduce leakage power by dynamically changing the body bias applied to a circuit block (Figure 2.3). In general, forward body bias (FBB) is applied in active mode to increase the operating frequency. When the block enters idle mode, the forward bias is withdrawn, reducing leakage. In addition, reverse body bias (RBB) can be applied during idle mode for further leakage savings. Compared with a power switch, body bias can reduce leakage without any performance degradation [3].
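Why FBB speeds a block up and RBB cuts its leakage can be seen from the textbook body-effect equation, Vth = Vth0 + γ(√(2φF + VSB) − √(2φF)), combined with the exponential dependence of subthreshold leakage on Vth. The Python sketch below uses hypothetical NMOS parameters, not TSMC 0.18 µm values; in practice forward bias must also stay well below the source/body junction turn-on voltage.

```python
import math

# Hypothetical NMOS parameters, for illustration only.
VTH0 = 0.45       # zero-bias threshold voltage (V)
GAMMA = 0.4       # body-effect coefficient (V^0.5)
TWO_PHI_F = 0.8   # surface potential 2*phi_F (V)
N_SUB = 1.5       # subthreshold slope factor
V_T = 0.0259      # thermal voltage kT/q at about 300 K (V)


def threshold_voltage(v_sb: float) -> float:
    """Body-effect model. v_sb > 0 is reverse body bias, v_sb < 0 is forward body bias."""
    if TWO_PHI_F + v_sb <= 0:
        raise ValueError("forward bias too large for this model")
    return VTH0 + GAMMA * (math.sqrt(TWO_PHI_F + v_sb) - math.sqrt(TWO_PHI_F))


def relative_leakage(v_sb: float) -> float:
    """Subthreshold leakage relative to zero bias, using I ~ exp(-Vth / (n * V_T))."""
    return math.exp(-(threshold_voltage(v_sb) - VTH0) / (N_SUB * V_T))


for label, v_sb in [("FBB", -0.3), ("zero bias", 0.0), ("RBB", 0.5)]:
    print(f"{label:>9}: Vth = {threshold_voltage(v_sb):.3f} V, "
          f"leakage x{relative_leakage(v_sb):.2f}")
```

With these assumed values, FBB lowers Vth and raises leakage by several times, while RBB raises Vth and cuts leakage by roughly an order of magnitude.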
Figure 2.3 Outline of body bias.
Process parameter variations, which worsen as technology scales, affect the frequency and leakage distributions of microprocessor dies [15]. Because of these die-to-die and within-die variations, some dies cannot achieve the target frequency, while others exceed the maximum leakage power specification.
Therefore, adaptive body bias has been employed to reduce the impact of these variations and adjust the frequency [16], [17]. The adaptive body bias scheme is based on the frequency of the critical path and uses a phase detector that communicates with a central body bias generator. The central body bias generator thus determines the body bias to be applied from on-chip measurements. This method, therefore, ignores within-die variation.
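The selection performed by the adaptive body bias loop can be sketched as a search over the bias steps the central generator can supply: keep the steps at which the critical-path replica meets the target frequency, then pick the one with the least leakage within the specification. In the Python sketch below, the callbacks standing in for the phase-detector and leakage measurements, the bias steps, and the linear/exponential die models are all hypothetical.

```python
import math
from typing import Callable, Optional, Sequence


def select_body_bias(
    bias_steps: Sequence[float],
    measure_freq: Callable[[float], float],
    measure_leak: Callable[[float], float],
    f_target: float,
    leak_max: float,
) -> Optional[float]:
    """Return the bias (v_sb, negative = forward) that meets f_target with the
    least leakage not exceeding leak_max, or None if no step qualifies."""
    best: Optional[float] = None
    best_leak = float("inf")
    for v_sb in bias_steps:
        if measure_freq(v_sb) < f_target:
            continue  # critical-path replica too slow at this bias
        leak = measure_leak(v_sb)
        if leak <= leak_max and leak < best_leak:
            best, best_leak = v_sb, leak
    return best


# Hypothetical die models standing in for on-chip measurements.
def example_freq(v_sb: float) -> float:
    return 1.0e9 * (1.0 - 0.2 * v_sb)      # forward bias (v_sb < 0) speeds up


def example_leak(v_sb: float) -> float:
    return 1.0e-3 * math.exp(-5.0 * v_sb)  # forward bias raises leakage


steps = [-0.4, -0.2, 0.0, 0.2, 0.4]
print(select_body_bias(steps, example_freq, example_leak, f_target=1.02e9, leak_max=5e-3))
```

For this assumed slow die the search picks the smallest forward bias that reaches the target; for a fast, leaky die (a lower target) it would choose reverse bias, and a die that fails both limits returns `None`.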
In this thesis, the body bias technique is implemented with two design approaches.
First, body bias is realized with a conventional standard cell library. Because the body and source terminals of a conventional cell are tied together, they must be separated to realize body bias. The contacts and well patterns of the standard cells are therefore modified to isolate the body terminal from the source terminal. Apart from removing the contacts within the standard cells, the well patterns and metal lines created for the body bias signals can be implemented with existing commercial CAD tools, which significantly reduces design complexity and time.
Body bias can also be realized with dual-supply standard cells, whose layouts are outlined in Chapter 3. The dual-supply standard cells separate the body and source terminals within the cell layout. With this design, the body bias technique can be implemented with existing commercial EDA tools.
2.3 Power Switch
Power switches were first adopted in the MTCMOS (Multi-threshold CMOS) technique [18], [19]. MTCMOS is very effective at reducing leakage current in idle mode.
MTCMOS uses two types of transistors: high-VT and low-VT. High-VT devices reduce leakage current, while low-VT devices are used wherever high performance is required. The MTCMOS technique uses high-VT transistors to gate the power supplies of a low-VT logic block, as shown in Figure 2.4. When the high-VT transistors are turned on, the low-VT logic is connected to the virtual ground and power lines, and switching is performed through the fast devices. When the circuit enters idle mode, the high-VT gating transistors are turned off, resulting in a very low leakage current from VCC to ground [20]. MTCMOS circuits can reduce leakage current by several orders of magnitude through two effects. First, the total effective transistor width of the original CMOS circuit is reduced to the width of the single “off” transistor (provided it is smaller than the original width); second, the increased threshold voltage results in an exponential reduction in leakage current [18].
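The two effects described above can be quantified with a simplified subthreshold model, I_off ∝ W·exp(−Vth/(n·kT/q)). The Python sketch below assumes the single off high-VT switch alone limits the idle leakage path and ignores DIBL and stack effects; all widths and thresholds are hypothetical.

```python
import math

N_SUB = 1.5    # subthreshold slope factor (hypothetical)
V_T = 0.0259   # thermal voltage kT/q at about 300 K (V)


def subthreshold_leakage(width_um: float, vth: float, i0_per_um: float = 1e-6) -> float:
    """Simplified off-current I = I0 * W * exp(-Vth / (n * V_T)), in amps."""
    return i0_per_um * width_um * math.exp(-vth / (N_SUB * V_T))


def mtcmos_leakage_ratio(w_logic_um: float, vth_low: float,
                         w_sleep_um: float, vth_high: float) -> float:
    """Idle leakage of the gated block relative to the ungated low-VT block,
    assuming only the off high-VT sleep transistor limits the leakage path."""
    gated = subthreshold_leakage(w_sleep_um, vth_high)
    ungated = subthreshold_leakage(w_logic_um, vth_low)
    return gated / ungated


# Hypothetical: 1000 um of leaking low-VT width, one 100 um high-VT switch.
ratio = mtcmos_leakage_ratio(1000.0, 0.3, 100.0, 0.5)
print(f"idle leakage reduced by about {1 / ratio:.0f}x")
```

The width term contributes a factor of 10 and the 0.2 V threshold difference a further factor of about 170, giving roughly three orders of magnitude overall under these assumptions.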
Figure 2.4 MTCMOS circuit structure.
A power switch (or sleep transistor) connecting the power lines to the virtual power lines can be accurately modeled as a linear resistor. For a turned-on NMOS switch sized large enough to meet the performance requirement, the virtual ground voltage stays close to the actual ground. Power switch sizing is therefore a key design parameter that affects circuit performance. If the switch is sized too large, silicon area is wasted and the switching energy overhead between idle and active modes increases. On the other hand, if it is sized too small, the circuit becomes too slow because of the increased resistance to ground. Therefore, gate overdriving and under-driving are applied to the power switch [20], [21]. Overdriving is used in active mode to reduce the frequency penalty of the power switch. Gate under-driving is used in idle mode to further increase the leakage savings by reducing the leakage of the power switch itself. Despite a small performance penalty, the power switch is still a very attractive technique for leakage suppression.
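Treating the turned-on switch as a linear resistor gives a simple sizing rule: in deep triode, R_on ≈ 1/(k′(W/L)(VGS − Vth)), so keeping the virtual-ground rise I_peak·R_on below an allowed drop ΔV requires W/L ≥ I_peak/(k′(VGS − Vth)ΔV). The Python sketch below uses a hypothetical k′, peak current, and drop budget; it also shows why gate overdriving (larger VGS) permits a smaller switch.

```python
K_PRIME = 300e-6  # hypothetical mu_n * Cox of the high-VT switch (A/V^2)


def on_resistance(w_over_l: float, v_gs: float, v_th: float) -> float:
    """Deep-triode model of a sleep transistor: R = 1 / (k' * (W/L) * (Vgs - Vth))."""
    overdrive = v_gs - v_th
    if overdrive <= 0:
        raise ValueError("switch is not turned on")
    return 1.0 / (K_PRIME * w_over_l * overdrive)


def min_w_over_l(i_peak: float, v_gs: float, v_th: float, max_drop: float) -> float:
    """Smallest W/L that keeps the virtual-ground rise I_peak * R below max_drop."""
    return i_peak / (K_PRIME * (v_gs - v_th) * max_drop)


# Hypothetical block: 50 mA peak current, 1.8 V supply, 5% allowed drop.
vdd, vth_high, i_peak = 1.8, 0.5, 0.05
drop = 0.05 * vdd
w_nominal = min_w_over_l(i_peak, vdd, vth_high, drop)
w_overdriven = min_w_over_l(i_peak, 2.0, vth_high, drop)  # gate overdriven to 2.0 V
print(f"W/L nominal gate: {w_nominal:.0f}, overdriven gate: {w_overdriven:.0f}")
```

This first-order model ignores the switching-energy overhead and area cost discussed above, which set the practical upper bound on switch size.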
To insert power switches into a cell-based design without modifying the core design, an appropriate arrangement of the power switches is important. Figure 2.5 shows the power connection between the power grid and the standard cells. In our implementation, the power switch is inserted below the power delivery grid, between metal-1 and metal-2 (Figure 2.6). The power switches are distributed along the power ring in two columns to avoid current crowding. They are designed as large as possible to minimize the impact of switch sizing on performance.
Figure 2.5 Power connection in general physical design flow.
[Figure 2.6 labels: Metal 2 VSS; power switch; Metal 1 virtual VSS; standard cells]
Figure 2.6 Power connection for power switch.
Chapter 3 Implementation
This chapter presents each physical design flow. First, the general cell-based physical design flow is introduced in Section 3.1; it is divided into seven design phases, and the purpose of each phase is explained. The physical design flow for Voltage Separation is presented in Section 3.2, where a deep n-well (DNW) pattern is added to the digital circuit core to avoid substrate noise coupling. Body bias in the cell-based design flow is described in Section 3.3: for the dual-supply cell, the cell layout and the connections between ports and power nets are presented, and for the general standard cell, the modifications of the well and contacts are introduced. Finally, the power switch implementation is shown in Section 3.4.
3.1 General Automatic Physical Design Flow
Physical design translates the gate-level netlist into a physical representation. Because its major goal is the placement and routing of standard cells, physical design is also called Auto Place and Route (APR). The physical design flow includes power/ground line design, partitioning, floorplanning, placement, routing, and clock tree synthesis. A general automatic physical design flow is shown in Figure 3.1. From the gate-level netlist to the final GDSII file, the flow is divided into seven phases, each described below.
The first phase in the physical design flow is design setup, in which the technology file, reference libraries, gate-level netlist, and power connections are specified. The technology file contains layer definitions and process design rules and must be specified before a design library is created. The reference libraries include the standard cell library, memory library, and IO library. The gate-level netlist is the HDL code produced by logic synthesis, from which the EDA tool loads the corresponding standard cells. The power and ground ports of each standard cell must be associated with the corresponding global power and ground nets, respectively.
The second phase is floorplanning, in which the core area and the Power/Ground Grid are determined. Control parameters define the core aspect ratio and the standard cell placement direction. The routing channels and core utilization are also set in this phase; together they affect the total chip area and the probability of successful routing. A core power ring and power straps are created to form the Power/Ground Grid. A well-designed Power/Ground Grid balances power distribution and keeps current density within limits.
The third design phase is timing setup. In this phase, the EDA tool optimizes the logic gates and places and routes them in the smallest possible area while meeting all timing constraints, relying on static timing analysis and estimated parasitic extraction.
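The relation between core utilization, aspect ratio, and core size used during floorplanning can be sketched as follows. The Python function below is a minimal illustration that defines aspect ratio here as height/width; the tool's own parameter conventions may differ, and the cell area is hypothetical.

```python
import math
from typing import Tuple


def core_dimensions(total_cell_area_um2: float, utilization: float,
                    aspect_ratio: float) -> Tuple[float, float]:
    """Core width and height (um) for a target utilization.
    aspect_ratio is defined here as height / width."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    core_area = total_cell_area_um2 / utilization  # cells occupy `utilization` of the core
    width = math.sqrt(core_area / aspect_ratio)
    height = aspect_ratio * width
    return width, height


# Hypothetical: 490,000 um^2 of standard cells at 70% utilization, square core.
w, h = core_dimensions(490_000.0, 0.7, 1.0)
print(f"core {w:.1f} um x {h:.1f} um")
```

Lowering the utilization leaves more routing space and improves the chance of routing success, at the cost of a larger core, which is the trade-off noted above.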
The fourth phase is placement, in which the location of each standard cell is determined. Unsuitable placement causes congestion, where more nets must pass through a small area than its routing resources allow. During placement, the tool relieves congestion by spreading cells apart and detouring wires without hurting circuit performance. After placement, the power ports of each standard cell are connected to the Power/Ground Grid. The fifth design phase is clock tree synthesis (CTS). Multi-level buffer trees are added to the design according to the clock specification, reducing the clock skew so that it fits the timing specification of the design.
A side effect of clock tree synthesis is that some cells are displaced and congestion increases. The EDA tool then optimizes the placement of the standard cells and fixes the congestion problem.
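The clock skew that CTS tries to minimize is the spread of clock arrival times at the sinks. The Python sketch below represents a buffer tree as a hypothetical mapping from each node to its parent and the delay from that parent, accumulates arrival times from the clock root, and reports the global skew; the node names and delays are illustrative.

```python
from typing import Dict, Iterable, Tuple

Tree = Dict[str, Tuple[str, float]]  # node -> (parent, delay from parent in ps)


def arrival_times(tree: Tree, root: str) -> Dict[str, float]:
    """Clock arrival time at every node, accumulated from the root."""
    arrival: Dict[str, float] = {root: 0.0}

    def resolve(node: str) -> float:
        if node not in arrival:
            parent, delay = tree[node]
            arrival[node] = resolve(parent) + delay
        return arrival[node]

    for node in tree:
        resolve(node)
    return arrival


def clock_skew(arrival: Dict[str, float], sinks: Iterable[str]) -> float:
    """Global skew: latest minus earliest arrival over the given sinks."""
    times = [arrival[s] for s in sinks]
    return max(times) - min(times)


# Hypothetical two-level buffer tree driving three flip-flops.
tree: Tree = {
    "buf1": ("clk", 50.0), "buf2": ("clk", 60.0),
    "ff_a": ("buf1", 20.0), "ff_b": ("buf1", 25.0), "ff_c": ("buf2", 30.0),
}
print(clock_skew(arrival_times(tree, "clk"), ["ff_a", "ff_b", "ff_c"]))
```

Here the sinks see 70, 75, and 90 ps, so the skew is 20 ps; balancing the buffer delays on the two branches is what brings this value within the timing specification.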
The sixth design phase is routing. The goal of routing is to draw design-rule-correct metal shapes for all interconnect wires while maintaining circuit timing, clock skew, signal net transition, and capacitance limits. However, this built-in Design Rule Check (DRC) provides only basic verification; other tools must be used for sign-off. During routing, each metal layer has its own, possibly unique, grid and preferred routing direction. Each wire is therefore assigned to a track, and the router attempts to make long, straight routes. As in the placement phase, congestion is expected and is resolved by detour routing.
The seventh and final design phase is Design for Manufacture (DFM), which addresses manufacturability issues and increases manufacturing yield. Techniques such as antenna fixing, metal slotting, and metal fill are applied; slotting and fill control metal density and prevent metal liftoff and erosion. The final validation consists of detailed DRC and Layout Versus Schematic (LVS) verification. DRC checks that the physical layout satisfies the fabrication design rules, and LVS checks that the connectivity of the layout matches its schematic netlist. Finally, the error-free GDSII file is sent to the foundry for fabrication.
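Metal density control in the DFM phase is usually checked window by window. The Python sketch below divides the chip into square windows, computes the fraction of each window covered by metal rectangles on one layer, and flags windows below a minimum (candidates for metal fill) or above a maximum density. The thresholds and shapes are hypothetical, not foundry rules, and the shapes are assumed not to overlap each other.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in um


def overlap_area(a: Rect, b: Rect) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def density_report(shapes: List[Rect], chip: Rect, window: float,
                   min_density: float, max_density: float):
    """Return (window, density, action) for each window on a regular grid.
    Assumes the metal shapes on this layer do not overlap each other."""
    report = []
    x = chip[0]
    while x < chip[2]:
        y = chip[1]
        while y < chip[3]:
            win = (x, y, min(x + window, chip[2]), min(y + window, chip[3]))
            area = (win[2] - win[0]) * (win[3] - win[1])
            density = sum(overlap_area(s, win) for s in shapes) / area
            if density < min_density:
                action = "fill"
            elif density > max_density:
                action = "too_dense"
            else:
                action = "ok"
            report.append((win, density, action))
            y += window
        x += window
    return report


shapes = [(0.0, 0.0, 50.0, 45.0), (50.0, 50.0, 60.0, 60.0)]
for win, density, action in density_report(shapes, (0.0, 0.0, 100.0, 100.0), 50.0, 0.2, 0.8):
    print(win, f"{density:.2f}", action)
```

Production DFM tools use overlapping (stepped) windows and layer-specific rules; the non-overlapping grid here only illustrates the idea.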
Figure 3.1 General automatic physical design flow.
3.2 Physical Design Flow for Voltage Separation
This section presents the design flow for Voltage Separation, shown in Figure 3.2. Compared with the general physical design flow, the design phases that differ are design setup, floorplanning, and design for manufacturing. They are described in detail in the remainder of this section.
The floorplanning phase for Voltage Separation adds three steps:
• Partition into Groups
• Floorplanning of Groups
• Create Separated Power Rings
The first step is Partition into Groups. According to the system requirements, the designer partitions the circuit into several Groups, each supplied with a different voltage level. System power consumption can therefore be reduced by supplying each Group with the lowest voltage it needs. At the same time, however, the core area increases because of the individual power grids and the dead space left by floorplanning the Groups. To increase design flexibility and reduce the area penalty of Voltage Separation, the aspect ratio and core utilization of each Group can be adjusted by tuning design parameters.
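Part of the area penalty mentioned above comes from giving every Group its own power ring. The Python sketch below compares the footprint of separate Groups, each surrounded by a ring margin, with one merged core that has a single ring. The `ring_margin_um` parameter (width reserved per side for the VDD/VSS rings and spacing) and all numbers are hypothetical, and dead space between Groups is not modeled.

```python
import math
from typing import Sequence


def group_footprint(cell_area_um2: float, utilization: float,
                    aspect_ratio: float, ring_margin_um: float) -> float:
    """Area (um^2) of one Group, including its own ring on all four sides.
    aspect_ratio is height / width of the Group's placement area."""
    core = cell_area_um2 / utilization
    width = math.sqrt(core / aspect_ratio)
    height = aspect_ratio * width
    return (width + 2 * ring_margin_um) * (height + 2 * ring_margin_um)


def separation_penalty(group_areas: Sequence[float], utilization: float,
                       aspect_ratio: float, ring_margin_um: float) -> float:
    """Extra area of separate Groups versus one merged core with a single ring."""
    separate = sum(group_footprint(a, utilization, aspect_ratio, ring_margin_um)
                   for a in group_areas)
    merged = group_footprint(sum(group_areas), utilization, aspect_ratio, ring_margin_um)
    return separate - merged


# Hypothetical two-Group design, 70% utilization, square Groups, 20 um ring margin.
print(f"{separation_penalty([200_000.0, 150_000.0], 0.7, 1.0, 20.0):.0f} um^2")
```

Raising each Group's utilization shrinks its core and, with it, the ring perimeter, which is one way the parameter tuning described above reduces the penalty.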
After the Partition into Groups step, the Groups are floorplanned.
According to the power grid plan, the designer can place each Group in any region of the core. To supply each Group separately, and thereby reduce system power, a separate power ring must be created around each island. To prevent incorrect power ring connections, the connection of each power ring must be declared explicitly.
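Declaring each ring's connections explicitly makes them checkable. The Python sketch below is a hypothetical consistency check: given the Group of each cell, the supply nets of each Group, and the nets actually declared for each cell, it lists cells whose power pins are not tied to their own Group's ring. The net names such as `VDD_G1` and the cell instance names are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

Nets = Tuple[str, str]  # (power net, ground net)


def check_power_connections(
    cell_group: Dict[str, str],
    group_nets: Dict[str, Nets],
    connections: Dict[str, Nets],
) -> List[Tuple[str, Nets, Optional[Nets]]]:
    """Return (cell, expected nets, declared nets) for every mismatched cell."""
    errors = []
    for cell, group in cell_group.items():
        expected = group_nets[group]
        actual = connections.get(cell)  # None if no connection was declared
        if actual != expected:
            errors.append((cell, expected, actual))
    return errors


cell_group = {"u_mac0/add": "G1", "u_mac1/add": "G2"}
group_nets = {"G1": ("VDD_G1", "VSS"), "G2": ("VDD_G2", "VSS")}
connections = {"u_mac0/add": ("VDD_G1", "VSS"), "u_mac1/add": ("VDD_G1", "VSS")}
print(check_power_connections(cell_group, group_nets, connections))
```

In this example the second cell is flagged because it was declared on G1's supply although it belongs to G2.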
In this step, a deep N-well (DNW) can be added to each Group to reduce substrate noise coupling. Figure 3.3 shows how the DNW attenuates the noise reaching the common substrate: the DNW isolates the P-well, where the noise originates, from the P-substrate. Device characteristics are not affected by the DNW because its implant peak is deep enough, about 2 µm. A circuit-level method called substrate noise trapping, illustrated in Figure 3.4, achieves 70 dB of substrate noise isolation between integrated subsystems [22]. A DNW entirely covering the digital circuit section attenuates the substrate noise passing through the DNW walls toward the common substrate (substrate noise trapping). Once in the common substrate, the attenuated noise travels toward the DNW protecting the RF circuit section and changes the electric potential of that whole DNW uniformly.
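To put the 70 dB figure in perspective, assume it is quoted as a voltage-amplitude ratio (20 log₁₀). It then corresponds to the noise amplitude being reduced by a factor of about 3,160:

```latex
\text{Isolation} = 20\log_{10}\frac{V_{\text{noise,in}}}{V_{\text{noise,out}}} = 70\ \text{dB}
\quad\Longrightarrow\quad
\frac{V_{\text{noise,in}}}{V_{\text{noise,out}}} = 10^{70/20} = 10^{3.5} \approx 3162
```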