In this section, we review previous work on power optimization and introduce the method we adopt. Power dissipation in CMOS digital circuits consists of dynamic power, short-circuit power, and leakage power. Short-circuit power is usually negligible compared with the other two; therefore, most optimization work focuses on dynamic and leakage power. Dynamic power is generally insensitive to process variations and can be assumed to be deterministic [63]. Leakage power, in contrast, depends strongly on physical parameters that are uncertain because of manufacturing process variations, so it must be treated statistically. Existing circuit-level power reduction methods include supply voltage scaling, threshold voltage scaling, gate-oxide scaling, gate sizing, retiming, and combinations of these methods.
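For reference, the three components are commonly written in the following first-order form. The notation is standard textbook notation and an assumption here, not taken from the cited works: \alpha is the switching activity, C_L the load capacitance, V_{dd} the supply voltage, f the clock frequency, V_{th} the threshold voltage, n the subthreshold slope factor, and V_T = kT/q the thermal voltage.

```latex
\begin{align}
P_{\text{total}} &= P_{\text{dyn}} + P_{\text{sc}} + P_{\text{leak}}, \\
P_{\text{dyn}}   &= \alpha \, C_L \, V_{dd}^{2} \, f, \\
P_{\text{leak}}  &= V_{dd} \, I_{\text{leak}}, \qquad
I_{\text{sub}} \propto \exp\!\left( -\frac{V_{th}}{n V_T} \right).
\end{align}
```

Under this model, the subthreshold component of the leakage current depends exponentially on V_{th} and on temperature through V_T, which is consistent with treating leakage statistically, whereas P_{\text{dyn}} depends only polynomially on its parameters.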
2.5.1 Statistical Leakage Power Optimization and Deterministic Dynamic Power Optimization
A performance optimization based on criticality is proposed in [64]. By modeling the statistics of leakage and delay as posynomial functions, the authors of [65] formulate a geometric programming problem and solve it by convex optimization. In [66], the problem is formulated as an unconstrained nonlinear optimization and solved using efficient power and delay gradient computation. A statistical power optimization algorithm under a timing yield constraint is presented in [67], where second-order cone programming is employed.
Some sensitivity-based heuristic methods are proposed to reduce the leakage power [68, 69].
The above works utilize the techniques of gate-sizing and dual-threshold voltage to reduce the leakage power statistically. However, most of them neglect the importance of dynamic power.
On the other hand, many studies use multiple supply voltages for deterministic dynamic power optimization [70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84]; nevertheless, leakage power is ignored in their experiments. Although several studies consider dynamic and leakage power at the same time, they still have limitations. The authors of [85] propose an algorithm based on linear programming. A genetic algorithm is employed in [86] for power optimization. A two-phase flow is presented in [87] to minimize power consumption. Using retiming and Vdd/Vth scaling, the authors of [88] formulate the problem as an integer linear program. These studies are all deterministic methods. Although the work in [67] considers total power reduction statistically, it does not take the influence of temperature into account.
2.5.2 Multiple Supply Voltage (MSV) Technique
The idea of the MSV method is to assign lower supply voltages to gates on non-critical paths to save power and higher supply voltages to gates on critical paths to satisfy the timing constraint. It offers good power savings with little penalty [89]. The MSV method introduces two constraints: an electrical constraint and a physical constraint. In a voltage-scaled circuit, if a gate at a low supply voltage drives a gate at a high supply voltage, a level converter (LC) must be inserted to eliminate the undesirable static current. The additional level converters increase the cost in area, delay, and power; hence, the number of level converters must be controlled. Moreover, cells operating at different supply voltages should be placed carefully to facilitate the power network design and reduce the routing complexity. Previous efforts toward reducing the level-shifting overhead include 1) clustered voltage scaling (CVS) and 2) extended CVS (ECVS). CVS partitions a circuit into two clusters, one containing only cells operating at the high supply voltage and the other containing only cells operating at the low supply voltage. This partition precludes the case in which a cell at the low supply voltage directly feeds a cell at the high supply voltage. ECVS relaxes this topological constraint and allows a cell at the low supply voltage to feed a cell at the high supply voltage after its output has undergone level conversion. Thus, ECVS has more freedom in finding parts of the circuit that can be operated at the lower supply voltage and can potentially achieve higher power savings. However, the delay penalty tends to be larger too. An effective solution is to group cells of different supply voltages into a small number of "voltage islands", where each voltage island occupies a contiguous physical space, operates at a single supply voltage, and meets the performance requirement.
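The electrical constraint can be illustrated with a minimal sketch. It assumes a netlist given as driver-to-load edges and a per-gate supply voltage map; the function names, gate names, and voltage values are hypothetical and are not taken from the cited works.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (driver gate, load gate)


def find_level_converter_edges(edges: List[Edge],
                               vdd: Dict[str, float]) -> List[Edge]:
    """Return the edges on which a lower-Vdd gate drives a higher-Vdd gate.

    Each such edge needs a level converter (LC) under an ECVS-style
    assignment.
    """
    return [(u, v) for u, v in edges if vdd[u] < vdd[v]]


def satisfies_cvs(edges: List[Edge], vdd: Dict[str, float]) -> bool:
    """A CVS-style assignment allows no low-to-high edge inside the logic."""
    return not find_level_converter_edges(edges, vdd)


# Hypothetical five-gate netlist with two supply voltages (1.2 V and 1.0 V).
edges = [("g1", "g2"), ("g2", "g3"), ("g1", "g4"), ("g4", "g5"), ("g5", "g3")]
vdd = {"g1": 1.2, "g2": 1.2, "g3": 1.2, "g4": 1.0, "g5": 1.0}

lc_edges = find_level_converter_edges(edges, vdd)
print("edges needing a level converter:", lc_edges)
print("valid under CVS:", satisfies_cvs(edges, vdd))
```

In this toy netlist only the edge from g5 to g3 crosses from the low to the high supply, so an ECVS-style assignment would need one level converter there, while a CVS-style assignment would have to lower g3 or raise g4 and g5.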
Logic boundaries are widely used in this grouping process, mainly because they are the boundaries that designers are most familiar with. Nevertheless, these natural boundaries in a design are almost always nonoptimal boundaries for supply voltages. Fig. 2.3 illustrates why sticking to logic boundaries limits the solution space for an optimal MSV design. In the example, there are three modules, each containing only leaf cells, and both modules A and B contain some timing-critical cells that require the high voltage (Fig. 2.3(a)). Fig. 2.3(b) and (c) show designs based on logic boundaries. While the design in Fig. 2.3(b) guarantees the performance at the cost of high power, the design in Fig. 2.3(c) reduces the power consumption but does not meet the timing requirement.
Neither design is an optimal MSV solution. By using placement proximity (instead of logic) information, the optimal MSV design meets the power and timing requirements at the same time while keeping the number of power domains small, as shown in Fig. 2.3(d). This idea is called post-placement voltage island generation [72]. Because of these advantages, we employ this methodology in our experiments. In addition, the location of each gate is available at the post-placement stage, so spatial correlation can be considered in the SSTA.
Fig. 2.3: (a) Design with timing-critical cells (small purple cells). (b) Power consumption too high. (c) Timing requirement not met for small cells in module A. (d) Placement-proximity-based solution with nonlogical boundary.
2.5.3 Previous Works of Post-Placement Voltage Island
Post-placement voltage island design can be performed in two stages: supply voltage assignment [73] and voltage island generation [72]. Using the concept of the zero-slack algorithm and the Voronoi diagram, the authors of [73] propose a proximity-driven voltage assignment algorithm. Based on the placement and the voltage requirement of each cell, they then develop an efficient algorithm that finds the voltage islands with the best tradeoff between the total power and the number of islands [72]. The method developed in [74] allows the generated voltage islands to take any shape, rather than only the rectangular shapes of [72].
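To make the notion of a voltage island concrete, the following minimal sketch labels contiguous regions of equal supply voltage on a placement grid with a breadth-first flood fill. It is only an illustration under stated assumptions: the bin grid, the 4-neighbour adjacency, and the function name are hypothetical, and the sketch does not reproduce the algorithms of [72], [73], or [74].

```python
from collections import deque
from typing import List, Tuple

Cell = Tuple[int, int]  # (row, column) of a placement bin


def voltage_islands(grid: List[List[float]]) -> List[Tuple[float, List[Cell]]]:
    """Group 4-connected bins with the same supply voltage into islands.

    grid[r][c] is the supply voltage assigned to the bin at row r, column c.
    Returns one (voltage, bins) pair per island.
    """
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    islands = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            voltage = grid[r][c]
            bins = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                bins.append((y, x))
                # Visit the four neighbours that share the same voltage.
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == voltage):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            islands.append((voltage, bins))
    return islands


# Hypothetical 3x3 placement grid with two supply voltages.
grid = [
    [1.2, 1.2, 1.0],
    [1.0, 1.2, 1.0],
    [1.0, 1.0, 1.0],
]
islands = voltage_islands(grid)
for voltage, bins in islands:
    print(voltage, len(bins), "bins")
```

Counting the islands produced for a given voltage assignment gives one side of the tradeoff mentioned above; the other side, total power, depends on how many bins can stay at the lower voltage.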
Chapter 3
Statistical Power Optimization in 3D ICs
Fig. 3.1: Flowchart of the proposed statistical power optimization for 3D ICs
3.1 Problem Formulation and Flowchart
The proposed power reduction design flow for 3D ICs is shown in Fig. 3.1. Given a known placement, a design netlist, and a standard cell library, the flow first executes a statistical thermal simulation to obtain the statistical temperature distribution for the specified 3D IC. After that, a thermal-aware SSTA is performed with the statistical temperature distribution obtained in the previous step. The slack data provided by the SSTA is used to compute the power-delay sensitivity; next, a grid-based procedure is applied for the voltage assignment. After the assignment procedure, the power consumption and delay of the gates have changed, so we perform the thermal and timing analysis again and verify the timing of the circuit. If there is any timing violation, a rescue procedure is applied to restore timing correctness, and the iteration finishes. When the circuit satisfies the timing constraint, the program starts the assignment process again. The program also terminates the loop when a further iteration cannot provide more improvement. Then, a post-tuning step is employed to further lower the power consumption. In the last step, the proposed method uses grouping and extension to implement the voltage island generation. Each step is described in the following sections.

Fig. 3.2: The schematic diagram of a 3D IC with 3 chip layers
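The control flow of Section 3.1 can be summarized by the following minimal sketch. Only the ordering of the steps follows the text; every callable and the toy model used to exercise it are hypothetical placeholders, and the thermal and timing analyses are folded into a single `analyze` step for brevity.

```python
from typing import Callable, Dict, Tuple

State = Dict[str, object]


def optimize_power(state: State,
                   analyze: Callable[[State], Tuple[bool, float]],
                   assign_voltages: Callable[[State], None],
                   rescue: Callable[[State], None],
                   post_tune: Callable[[State], None],
                   make_islands: Callable[[State], None],
                   max_iters: int = 50) -> State:
    """Iterate assignment and re-analysis, then post-tune and build islands.

    analyze(state) stands for the thermal analysis followed by the
    thermal-aware timing analysis; it returns (meets_timing, total_power).
    """
    _, best_power = analyze(state)
    for _ in range(max_iters):
        assign_voltages(state)                # grid-based voltage assignment
        meets_timing, power = analyze(state)  # thermal and timing analysis again
        if not meets_timing:
            rescue(state)                     # restore timing, then leave the loop
            break
        if power >= best_power:               # no further improvement
            break
        best_power = power
    post_tune(state)
    make_islands(state)                       # grouping and extension
    return state


# Toy model: each assignment lowers one more "level"; timing fails beyond 3.
state = {"level": 0, "log": []}
result = optimize_power(
    state,
    analyze=lambda s: (s["level"] <= 3, 10.0 - s["level"]),
    assign_voltages=lambda s: s.update(level=s["level"] + 1),
    rescue=lambda s: s.update(level=s["level"] - 1),
    post_tune=lambda s: s["log"].append("post_tune"),
    make_islands=lambda s: s["log"].append("islands"),
)
print(result)
```

In the toy run, the fourth assignment violates timing, the rescue step undoes it, and the flow proceeds to post-tuning and island generation.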