國
立
交
通
大
學
電 信 工 程 學 系
碩 士 論 文
一個新穎且快速的調整睡眠電晶體尺寸演算法
經由使用積分型的靈敏度
A Novel and Fast Sleep Transistor Sizing Algorithm by Using
Integral Sensitivity
研 究 生:徐獻哲
指導教授:李育民 教授
一個新穎且快速的調整睡眠電晶體尺寸演算法經由使用積分型的靈敏度
A Novel and Fast Sleep Transistor Sizing Algorithm
by Using Integral Sensitivity
研 究 生:徐獻哲 Student:Shian-Je Shiu
指導教授:李育民 Advisor:Yu-Min Lee
國 立 交 通 大 學
電 信 工 程 學 系
碩 士 論 文
A ThesisSubmitted to Department of Communication Engineering College of Electrical and Computer Engineering
National Chiao Tung University in partial Fulfillment of the Requirements
for the Degree of Master
in
Communication Engineering January 2009
Hsinchu, Taiwan, Republic of China
一個新穎且快速的調整睡眠電晶體尺寸演算法經由
使用積分型的靈敏度
學生:徐獻哲
指導教授:李育民 博士
國立交通大學電信工程學系碩士班
摘
要
隨著製程進入深次微米時代,特別是在 90 奈米以下,因為互補式金屬氧化物半導體(CMOS) 隨科技進步尺寸縮小,次臨界電流及閘漏電流呈指數增加,漏電功率已經變成主要的議題。所 以,如何節省漏電功率變成很重要的議題。節省漏電功率的方法已經廣被發展,而電源閘是眾 多有效方法的其中之一個。 在本篇論文裡,我們發展出一個新穎且快速的方法經由利用積分型的靈敏度去調整睡眠電 晶體的尺寸。我們提出的演算法分為兩個階段。在第一個階段,我們利用改善卡洛夫子系統法 (IEKS)解睡眠電晶體上的跨壓,在第二個階段則用正確的時域解取代改善卡洛夫子系統法 (IEKS)。為了加速整個最佳化的過程,我們也發展出同時選取多個睡眠電晶體來調整。最後的 實驗結果顯示我們的方法勝過目前已知最好的方法,它可以有效且快速的達到我們要的目標。 i
A Novel and Fast Sleep Transistor Sizing Algorithm by
Using Integral Sensitivity
Student:Shian-Je Shiu
Advisor:Dr.
Yu-Min Lee
Department of Communication Engineering
National Chiao Tung University
ABSTRACT
As the process technology enter the deep sub-micro era, especially below 90nm, leakage power has become a major issue due to the exponential increase of sub-threshold and gate leakage current with CMOS technology scaling. So, how to save leakage power becomes a very important issue. Many methods of save leakage power have been general developed, and power gating is one of the most effective methods.
In this thesis, we developed a novel and fast method by using integral sensitivity to size the width of sleep transistors. The purposed algorithm has two stages. In the first stage, we solve the voltage drop of sleep transistors by using Improved Extended Krylov Subspace (IEKS), and in the second stage, we employ exact time domain solver to replace IEKS. In order to speed up the optimizing procedure, a multiple sleep transistor simultaneously sizing strategy is also developed. The experimental results demonstrate that our method outperforms a state-of-the-art method, it can effective and fast to reach our object.
誌
謝
這篇論文能順利完成,首先要感謝我的指導老師李育民教授。在我的研究生 涯裡,每當遇到瓶頸,李老師總能給出許多寶貴的建議,引導我了解問題的根源, 進而解決問題,讓我獲益良多。我相信這對我未來的成長是極有幫助的。另外在 實驗室裡,李老師也提供一個非常棒的研究環境,不論是硬體或是軟體資源都極 為豐富,所以實驗才能夠順利完成。 再者,要感謝至鴻學長、培育、柏毅,懷中在研究上給予的指點與建議。還 要感謝實驗室的學弟妹,及昔日曾陪伴過我的伙伴。最後要感謝父母與哥哥,因 為他們的支持與鼓勵,默默的陪伴我,讓我得以完成論文。最後由衷的感謝每位 幫助我及關懷我的人,希望你們永遠健康快樂,謝謝你們。
Contents
1 Introduction 1
1.1 Motivation . . . 1
1.2 Our Contributions . . . 5
1.3 Organization of the Thesis . . . 5
2 Background 6 2.1 Basic Concepts of Power Gating . . . 6
2.2 Basic Formulations of Power Gating . . . 7
2.3 Structures of Sleep Transistor . . . 8
2.4 Previous Works . . . 10
3 Sleep Transistor Sizing Algorithm 13 3.1 The Flowchart and Problem Formulation . . . 13
3.2 DSTN Modeling . . . 15
3.3 Definition of Cost . . . 17
3.4 Definition and Computation of Integral Sensitivity . . . 18
3.5 Model Order Reduction Method . . . 21
3.5.1 Improved Extended Krylov Subspace Method(IEKS) . . . 21
3.5.2 The Moment of Several Unit Pulses Waveform . . . 22
3.6 Two-Stage Sizing Algorithm . . . 23
3.7 Speed Up . . . 25
3.8 Discussion . . . 26
4 Experimental Results 30
List of Figures
1.1 Subthreshold leakage current and gate tunneling leakage current . . . 1
1.2 The relation between Isubthand Vgs . . . 2
1.3 Leakage power and dynamic power [5] . . . 3
2.1 A single sleep transistor . . . 6
2.2 Module based design . . . 8
2.3 Power management processor . . . 9
2.4 Cluster based design . . . 10
2.5 Distributed sleep transistor network (DSTN) . . . 11
3.1 The voltage waveform at node i. . . 14
3.2 The flowchart of the two-stage sizing algorithm. . . 15
3.3 The IR equivalent circuit of a DSTN structure . . . 16
3.4 DSTN equivalent circuit . . . 17
3.5 The voltage waveform across a sleep transistor. . . 18
3.6 A waveform contains several unit pulses . . . 22
3.7 Moment calculation algorithm for the pulse waveform. . . 24
3.8 (a)The voltage waveform of node i in the first sizing stage. (b) The Vmax vio i of node i was reduced. (c)The voltage waveform of node i in the second stage. (d) The Vmax vio i of node i is equal to zero. . . 28
3.9 The two-stage sizing algorithm. . . 29
4.1 Total width relation between the number of selected sleep transistors and the upper bound voltage. . . 34
4.2 Runtime relation between the number of selected sleep transistors and the upper bound voltage. The X is the execution time. . . 35
List of Tables
4.1 Total width comparison for two-stage method and [4] . . . 31
4.2 The ratio of width for two-stage method and [4]. . . 31
4.3 Runtime comparison for two-stage method and [4]. . . 32
4.4 The ratio of runtime for two-stage method and [4]. . . 32
4.5 Total width comparison for two-stage method and only execution in second stage. 33 4.6 The ratio of width for two-stage method and only execution in second stage. . . 33
4.7 Runtime comparison for two-stage method and only execution in second stage. . 34
4.8 The ratio of runtime for two-stage method and only execution in second stage. . 34
Chapter 1
Introduction
1.1
Motivation
Gate Drain Source N + N + P Substrate Isubth IgateFig. 1.1: Subthreshold leakage current and gate tunneling leakage current
As the process technology scales down to below 90nm, leakage power has become a major issue. Two major leakage currents are the subthreshold leakage current and the gate tunneling leakage current, as shown in Fig. 1.1.
Subthreshold leakage current is the drain-to-source current of a transistor when the transistor is in the weak inversion region [1, 2]. Taking a NMOS transistor for example, when Vgs < Vthand Vds = Vdd, there is still a current flowing in the channel of the NMOS
log
I
subthgs
V
th
V
Fig. 1.2: The relation between Isubthand Vgs
transistor due to the Vdspotential. The subthreshold leakage of a MOS can be modeled as
[3] Isubth = A × e1/mvT(V g−V s−V th0−γ 0×V s+ηV ds) × (1 − e−V ds/VT), (1.1) A = u0Cox W Lef f (VT)2e1.8e−∆V th/ηVT, (1.2)
where Vth0 is the zero bias threshold voltage, VT is the thermal voltage, γ0 is the
lin-earizeded body effect coefficient, Cox is the gate oxide capacitance, µ0 is the zero bias
mobility, and m is the subthreshold swing coefficient of the transistor. In Equation (1.1), the Vgs and Isubthare exponentially related. Fig. 1.2 also illustrates the relation between Vgsand Isubth[2].
Gate tunneling leakage current is due to an electric field across the oxide coupled with a low oxide thickness, resulting in the tunneling of electrons from substrate to gate. The magnitude of the gate tunneling leakage current increases exponentially with the decrease of gate oxide thickness Tox and the increase of supply voltage Vdd. In deep
Fig. 1.3: Leakage power and dynamic power [5]
submicro VLSI design, the gate oxide thickness scales down as the process technology scales down. As the gate oxide thickness decreases, it results in increased gate tunneling leakage current. In order to reduce the gate tunneling leakage current, we can use a high-k dielectric. In addition to using a high-high-k dielectric, several additional leahigh-kage power reduction techniques have been developed such as MTCMOS (Multi-Threshold CMOS) and power gating. Power gating is also called sleeping transistor.
According to [4, 6], leakage power is expected to reach more than 50% of the total power of a chip in the 65nm technology. Furthermore, in ASIC system-on-chip VLSI design, leakage power is catching up with dynamic power, as shown in Fig. 1.3 [5]. Thus the issue of how to reduce leakage power is becoming a major current issue. Several useful methods have been proposed, as follows [1].
• Power Gating: When the circuit is in the standby state, the power gating is in the
leakage power is reduced. Otherwise, as the circuit is in the active state, and the power gating is in the on state, it results in IR drop across the power gating. Hence, the performance of circuit is degraded.
• Multi-Threshold CMOS ( MTCMOS ): MTCMOS is a technology that uses
sev-eral types of transistors with different threshold voltage values. Sleep transistors have high threshold voltage and transistors in the cluster have low threshold volt-age. As we increase the threshold voltage of the sleep transistors, it reduces the diffusion of minority carriers, hence the leakage power is reduced.
• Body Bias: When the circuit is in the standby state, reverse-body voltage is applied
to transistors so as to increase the threshold voltage, and then leakage current is reduced. The threshold voltage of a transistor can be calculated as
VT = VT 0+ γ(
q
|−2ΦF + VSB| −
q
|−2ΦF|), (1.3)
where VT 0is the threshold voltage for VSB = 0, γ is the body-effect coefficient and ΦF is the substrate Fermi potential. According to Equation (1.3), it can be observed
that a reverse bias will increase a transistor’s threshold voltage, thus reducing leak-age current.
This thesis focuses on sleep transistors because they are a useful way to reduce leakage power. Sleep transistors are an effective method for reducing leakage power. However, sleep transistors have the drawback of increasing timing delay and the total area of a cir-cuit. Accordingly, it is desired to minimize the total width of the sleep transistors subject to meet other circuit constraints, which generates a trade off question. The question can be written as a problem formulation, which along with our method will be presented in chapter 3.
1.2
Our Contributions
In this thesis we propose a novel and fast sleep transistor sizing algorithm. The sizing algorithm has two stages. In the first stage, Improved Extended Krylov subspace (IEKS) method is utilized to obtain the voltage of the nodes [7, 8], which is a very effective way to speed up the circuit analysis. We can get a better solution for the width of each sleep transistor after finishing the first stage, and then enter the second stage. In the second stage, the exact time domain solver is employed. It is a more accurate solver, it provides the best sleep transistor candidates for each sizing step and avoids oversizing. During the sizing procedure we consider the global region in the time period, therefore introducing both cost and integral sensitivity and also utilizing them to size the sleep transistors. The experimental results will show that the performance of our method is better than the state-of-the-art method [4].
1.3
Organization of the Thesis
The rest of this thesis is organized as follows. Chapter 2 introduces the basic concepts of sleep transistors and reviews the previous works. Chapter 3 presents flowchart and problem formulation, definition of cost and integral sensitivity, model order reduction techniques, the moment of unit pulse waveform, the sizing algorithm and speed up pro-cedures. Experimental results and conclusions are presented in chapters 4 and 5, respec-tively.
Chapter 2
Background
In this chapter we first provide a basic background for power gating and several power gating structures. After that we survey the previous work.
2.1
Basic Concepts of Power Gating
VDD
Cluster
Sleep Transistor
Virtual Ground
Fig. 2.1: A single sleep transistor
Power gating also called sleep transistor . Generally, a sleep transistor is connected be-tween a virtual ground and a ground, as shown in Fig. 2.1. When the logic or clusters are in the standby mode, the sleep transistor is in the off state, thus blocking the leakage current of the logic or clusters. When the logic or clusters are in the active mode, the sleep transistor is in the on state so there is a current through the sleep transistor, and resulting in an IR drop across the sleep transistor. The IR drop across the sleep transistor increases the timing delay of circuit, hence the performance of circuit is degraded.
2.2
Basic Formulations of Power Gating
A sleep transistor operates in the linear region when it is in the on state. Because the voltage drop between gate and drain is bigger than threshold voltage, so a sleep transistor can be modeled as a resistance [13, 14]. In the linear region, the current through a sleep transistor can be formulated as
Iist ≈ µnCox( Wist
L )(VDD− Vth)V st
i , (2.1)
where µnis the N-mobility, Coxis the gate oxide capacitance, Vthis the threshold voltage
of a sleep transistor, Vst
i is the voltage across the drain node and source node of the
sleep transistor i, and Wst
i is the width of the sleep transistor i. For each sleep transistor,
Equation (2.1) can be reformulated as
Wist = 1 k( Ist i Vst i ), (2.2)
where k = µnCox(VDD − Vth)/L can be treated as a constant. In the DSTN design, we
wish the width of sleep transistor to be as small as possible, but if it is not large enough, the voltage drop across the sleep transistor will be larger than the maximum allowed voltage drop, and the discharge/charge current of the gate will be reduced. Hence, circuit performance will be degraded. This is a trade-off question between the sleep transistor width and the circuit delay. In this thesis, we propose a novel method that not only
minimizes the total width of sleep transistors but also meets the maximum allowed voltage drop constraint. The details will be described in chapter 3.
2.3
Structures of Sleep Transistor
Cluster
VDD
Module
Fig. 2.2: Module based design
In this session we introduce three different structures, namely module based design, cluster based design and distributed sleep transistor network(DSTN) design. In the mod-ule based design, a modmod-ule includes several clusters or logic, and the ends of the clusters are connected by a sleep transistor as shown in Fig. 2.2. The sleep transistors in the mod-ule based design are controlled by a power management processor (PMP), as shown in Fig. 2.3 [9]. The general design that can be traced back to [10, 11]. In the cluster based design, a cluster includes several cells, the end of the cluster is connected to a sleep tran-sistor as shown in Fig. 2.4. Cluster based design can be found in [12]. In the distributed sleep transistor network (DSTN) design, a sleep transistor is inserted under a cluster or a
Module
Module
Module
Module
Module
Module
PMP
Fig. 2.3: Power management processor
cell, and the drains of these sleep transistors are connected together as shown in Fig. 2.5. By comparing the module based design with the cluster based design, we can observe the maximum instantaneous current (MIC) of the module is actually less than summation MIC of the clusters in the module, so the total area of module based design is less than cluster based design if we do not consider the resistances of the virtual ground wires. However, the long virtual ground wires in the module based design increase the IR drop, which may result in functional failure. In order to avoid this situation, it is necessary to increase the width of the sleep transistors. On the other hand, in the DSTN design, since the MIC of each cluster usually does not occur in the same time and all the sleep transis-tors share all the currents of the clusters, therefore the total area of the sleep transistransis-tors is reduced. The previous works also show that the DSTN design is better than cluster based design and module based design. Hence, we employ the DSTN design to reduce leakage current in this thesis.
VDD
Cluster
VDD
Cluster
VDD
Cluster
Fig. 2.4: Cluster based design
2.4
Previous Works
In this section we review the previous work. In [11], module based design was proposed so as to use a single sleep transistor for the entire circuit. In [15], cluster based design was presented, whereby each cluster has an individual sleep transistor. In [9], the authors proposed a novel distributed sleep transistor network (DSTN) such that each cell or cluster was connected to an individual sleep transistor and the drains of the sleep transistors were connected together. The experimental results showed that the DSTN design reduced the area better than the cluster based design. In [9], the authors estimated the total width of the sleep transistors by Equation (2.3), where M IC(CKT ) is the maximum instantaneous current of the circuit and β is an empirical number to consider the effect of the resistances of the virtual ground lines. β can be calculated by Equation (2.4), where Nclu is the
VDD
VDD
Cluster
Cluster
VDD
Cluster
Fig. 2.5: Distributed sleep transistor network (DSTN)
number of clusters in the circuit.
Wst = 1 k(
(1 + β) ∗ M IC(CKT )
Vst∗ ), (2.3)
β = 0.002 ∗ Nclu. (2.4)
In [16], the authors presented a sleep transistor sizing algorithm which considered timing criticality and temporal currents. In [14], the authors utilized the DSTN structure and estimated a tight upper bound of the voltage drop. Recently, the authors in [4] partitioned the waveform of each cluster into many time frames and then estimated the MIC flowing through a sleep transistor at each time frame. Although the MIC can be accurately com-puted for the sizing step, it is time-consuming. Therefore the authors proposed a variable length partitioning scheme to speed up the procedure, but the total width of the sleep tran-sistors was increased. In [4], a state-of-the-art method was proposed. The authors just only consider the maximum instantaneous voltage violation, and reduces the violation.
However, our method consider a time period for each node. The different between us is we consider the global region, [4] just only consider the local region. Finally, the results show that our method outperform the state-of-the-art method, it will be shown in chapter 4.
Chapter 3
Sleep Transistor Sizing Algorithm
In this chapter we present algorithm flowchart and problem formulation, DSTN mod-eling, definition and calculation of cost and integral sensitivity, model order reduction techniques and the details of the two-stage sizing algorithm.
3.1
The Flowchart and Problem Formulation
In this these, we proposed the sleep transistor sizing algorithm, the algorithm is also called two-stage sizing algorithm. Before the beginning, we firstly define the drain of sleep transistor i as node i. Fig. 3.1 illustrates the voltage waveform at node i, where Vist∗ is the maximum allowed threshold voltage at node i, Vup is the upper bound voltage at node i, Vmax vio
i is the maximum instantaneous voltage violation at node i, and the yellow
parts of this figure are the cost of node i. The upper bound voltage Vupis employed in the
first stage, it will help us to speed up the sizing procedure and get the initial solution for second stage.
Generally, the execution time is proportional to the number of independent sources. when the circuit become large, the runtime will be increase. In order to handle large circuit design, two-stage sizing algorithm is proposed. The first stage provides a initial solution for second stage, and the upper bound voltage and model order reduction method are employed to speed up the all of the sizing procedure. In the second stage, the exact time domain solver is utilized, it provides the final solution and make sure results are not
* st i
V
( )
iV t
t
T
_ max vio iV
0
upV
Fig. 3.1: The voltage waveform at node i.
violation circuit design. The proposed two-stage sizing algorithm design flow is shown in Fig. 3.2. The design flow has two stages. In the first stage, we utilize model order reduction to solve the voltage at total nodes and then construct the sizing metric. Next we size the sleep transistors and update the conductance matrix G, repeating the above procedure until it meets the upper bound constraint. In the second stage, we employ an exact time domain solver to replace the model order reduction solver, again repeating the procedure until it meets the maximum allowed threshold voltage constraint, i.e. meets the cost constraint. The details will be described in the following sections. Our problem formulation is as follows:
• Input: Given a set of independent piece wise linear(PWL) current sources,
maxi-mum allowed threshold voltage at each node, the upper bound voltage, and mini-mum width of sleep transistors.
Generate circuit
Evaluate cost and sensitivity Solve voltage with
model order reduction solver
Size and update G Meet upper bound
constraint
No
Yes
Evaluate cost and sensitivity Solve voltage with exact
time domain solver Size and update G
Meet cost constraint
Yes
End
No
Fig. 3.2: The flowchart of the two-stage sizing algorithm.
• Object: Minimize the total width of sleep transistors subject to meet the maximum
allowed threshold voltage constraint.
• Output: The width of each sleep transistor and the total width of all the sleep
transistors.
3.2
DSTN Modeling
In the DSTN structure, the virtual ground line and sleep transistor can be modeled as equivalent resistances, and each cluster is modeled as a time-variant piece wise linear (PWL) waveform, so the DSTN structure can be transformed into an IR equivalent net-work as shown in Fig. 3.3. In the IR equivalent circuit, we employ Kirchhoff’s Current
VDD VDD Cluster Cluster VDD Cluster VDD : resistance of interconnect : resistance of sleep transistor
VDD VDD i
R
i R clusterI
clusterI
clusterI
stR
stR
iR
st R st RFig. 3.3: The IR equivalent circuit of a DSTN structure
Law (KCL) to the node 2 as show in Fig. 3.4, where Rsti is the resistance of sleep
tran-sistor i, Vi(t) is the voltage waveform at node i, Rij is the resistance between node i and
node j, and Ii(t) is a current injected into node i. We then get an equation as
I2(t) = (V2(t) − V1(t))/R21+ (V2(t) − V3(t))/R23+ (V2(t) − V6(t))/R26
+ (V2(t) − V5(t))/R25+ V2(t)/Rst2. (3.1)
Applying the KCL to each node of this equivalent IR circuit, the behavior of the system can be expressed by modified nodal analysis (MNA) formulation as
g11 g12 ... ... g1n g21 g22 ... ... g2n .. . ... . .. ... ... .. . ... ... . .. ... gn1 gn2 ... ... gnn V1(t) V2(t) .. . Vn(t) = I1(t) I2(t) .. . In(t) , (3.2)
where gij=(−1/Rij) and gii=(P1/Rij). Equation (3.2) can be reformulated as
21
R
25R
1 stR
5( )
V t
1( )
V t
1
( )
I t
2 stR
6( )
V t
2( )
V t
2( )
I t
3 stR
3( )
V t
3( )
I t
4 stR
4( )
V t
4( )
I t
26R
23R
Fig. 3.4: DSTN equivalent circuit
where G(Wst) is the conductance matrix, U(t) is the vector of current flowing through
sleep transistors, V(t, Wst) is the vector of voltage drop across sleep transistors and Wst
is the vector of width.
3.3
Definition of Cost
The integral of voltage waveform at node i above a maximum allowed threshold voltage
Vist∗ is an efficient metric for the performance of each node in the DSTN structure. On the other hand, the cost at node i includes a time period. If we sum the cost of each node, it can be viewed as global view, the results of global view point is better than local view point. As shown in Fig. 3.5, the cost of node i can be defined as
ci(Wst) 4 = Z T 0 maxVi(t, Wst) − Vist∗, 0 dt = Z T 0 qi(t)(Vi(t, Wst) − Vist∗)dt, (3.4)
* st i
V
( ,
st)
iV t W
22t
21t
12t
11t
T
t
( )
iq t
1
""""
Fig. 3.5: The voltage waveform across a sleep transistor.
where Vi(t, Wst) is voltage waveform at node i, and Wst is a vector of width. The qi(t)
is a waveform which indicates the voltage violation intervals; for example, there are two unit pulses within time intervals [t11, t12] and [t21, t22] in qi(t) as shown in Fig. 3.5.
The cost of each node has been defined in Equation (3.4). Hence, the total cost of a DSTN design can be defined as
c(Wst)=4 X i
ci(Wst). (3.5)
Given a set of Vist∗’s, each Vi(t, Wst) in the DSTN design must be always less than Vist∗,
i.e. the cost of each node must be equal to zero. Hence, Equation (3.5) be equal to zero.
3.4
Definition and Computation of Integral Sensitivity
Integral sensitivity is utilized to size the sleep transistors, because the sensitivity indicates the effectiveness of sizing the sleep transistor. From global view point, we combine cost and sensitivity, and then use them to size sleep transistors. Finally, it can get better results.
In this work, the integral sensitivity of sleep transistor i is defined as Si. It can be written as Si 4 = ∂c ∂Wst i (3.6)
where c is the total cost of DSTN.
To calculate each Si, firstly, a new function ˜c(t, Wst) is defined as
˜ c(t, Wst) = X i Z t 0 qi(τ )(Vi(τ, Wst) − Vist∗)dτ = X i Z t 0 hi(t − τ )(Vi(τ, Wst) − Vist∗)dτ, (3.7) where hi(t − τ ) 4
= qi(τ ) for each i. It can be observed that c(Wst) = ˜c(T, Wst).
Equa-tion (3.7) can be expressed as the follow vector form.
˜
c(t, Wst) = Z t
0
hT(t − τ )(V(t, Wst) − Vst∗)dτ, (3.8)
where h(t) is a functional vector with each entry i being hi(t), and Vst∗is a vector with
each entry equal to Vst∗
i . By taking Laplace transform on both sides of Equations (3.8)
and (3.3), we have G(Wst) ˜V(s, Wst) = ˜U(s) (3.9) ˆ c(s, Wst) = ˜hT(s) V(s, W˜ st) − V st∗ s ! , (3.10)
where ˆc(s, Wst), ˜h(s), ˜V(s) and ˜U(s) are Laplace transforms of ˜c(t, Wst), h(t), V(t)
and U(t), respectively. From Equation (3.9), we have ˜V = G−1U, so Equation (3.10)˜
can be formulated as ˆ c(s, Wst) = ˜hT(s) G−1(Wst) ˜U(s) − V st∗ s ! . (3.11) Therefore, ∂ˆc ∂Wst i = −˜hTG−1 ∂G ∂Wst i ˜ V = V˜aT ∂G ∂Wst i ˜ V, (3.12)
where ˜V is the Laplace transform of the solution of original MNA equations, and ˜Vais the solution of adjoint MNA equations in the s-domain, i.e.
G ˜V = ˜U, (3.13)
GTV˜a= −˜h. (3.14)
Finally, the sensitivity of c(Wst) with respect to an arbitrary parameter Wst
i is equal to ∂c(Wst) ∂Wst i = ∂˜c(t, W st) ∂Wst i t=T = Z T 0 VaT(t − τ ) ∂G ∂Wst i V(τ )dτ = κ Z T 0 Via(T − τ )Vi(τ )dτ (3.15)
In order to calculate each ∂c(Wst)/∂Wst
i , we need to know the waveforms of Va(t)
and V(t) which can be done by applying the trapezoidal integral approximation to Equa-tions (3.13) and (3.14) in the time domain, solving them easily with only two forward/backward substitutions at each time step. After solving the waveforms of Va(t) and V(t) at each
time step, we utilize Riemann’s sum to estimate the sleep transistor sensitivity. In this thesis, we employ the model order reduction method to solve the matrix problem in the first stage because it is faster than the exact time domain solver. Details will be described in the next section.
For the reader’s information, the ∂ci(Wst)/∂Wistcan be obtained by the above
proce-dure, i is from one to the total number of nodes. Because the ∂ci(Wst)/∂Wist is smaller
than ∂c(Wst)/∂Wist, if we employ the ∂ci(Wst)/∂Wistto size sleep transistors i, it might
result in oversizing, hence we utilize the ∂c(Wst)/∂Wist to size the sleep transistors, thereby obtaining better results.
3.5
Model Order Reduction Method
In this section we introduce model order reduction techniques which have proven to be very efficient for speeding up circuit analysis [7, 8]. IEKS (Improved Extended Krylov Subspace) method is one of these. IEKS was developed in [8], and it does not need to perform moment shifting for source waveform modeling. The major advantage of IEKS method is that its runtime is not proportional to the number of independent sources. Since the DSTN structure contains many current sources, our sensitivity computation is coupled with IEKS-based order reduction approach so that it can handle large scale DSTN design. In [7, 8], the authors analyzed circuits with piece wise linear (PWL) waveforms. In the present study we not only analyze circuits with PWL waveforms but also analyze circuits with several unit pulse waveforms. In section 3.5.2, we will describe how to calculate the moment with several unit pulse waveform.
3.5.1
Improved Extended Krylov Subspace Method(IEKS)
In Equation (3.3) we employ Laplace transformation on both sides and then obtain G ˜V(s) = ˜
U(s), where ˜V(s) and ˜U(s) are Laplace transform of V(t) and U(t). After calculating
the moment, the orthonormal basis X of its extended Krylov subspace can be calculated. We then employ this basic to construct its order-reduced model by projecting the original system onto this subspace via congruent transformation. The reduction matrix will be calculated by ˆG = XTGX, ˆV = XTV, ˆU = XTU. The dimension of this new system
is smaller than original system because the rank of ˆG is smaller than G. Therefore the
runtime is much less than the original circuit. Next we set up the system equations of re-duced circuit and utilize the fast simulation method in [17] to get the waveform of ˆV(t).
Then ˆV(t) is projected back to the original space to provide the approximate solution, V(t) ≈ X ˆV(t) . Details of IEKS reduction procedure can be found in [6].
T
)
(t
h
1
t
12t
11t
21t
t
22t
N1t
N2Fig. 3.6: A waveform contains several unit pulses
3.5.2
The Moment of Several Unit Pulses Waveform
In order to calculate the orthonormal basis X, we need to firstly compute the moment form waveform. In [7, 8], the authors analyzed the circuit with piece wise linear (PWL) waveform. Now, we analyze the circuit with several unit pulses waveform in this session. Given a circuit with several unit pulses waveform, we can derivate the moment of the waveform. A unit pulse after shifting can be viewed as a unit step function after shifting subtracts another unit step function after shifting. If h(t) is composed of N unit pulses as shown in Fig. 3.6, the details derivate as follows:
h(t) = N
X
i=1
where u(t) is a step function and u(t − ti) is a is a unit step function after shifting ti. We
take Laplace transformation on both sides and than have
L{h(t)} = N X i=1 e−sti1 s − e−sti2 s ! = N X i=1 1 s (∞ X l=0 (−1)l· t l i1 l!s l ! − ∞ X l=0 (−1)l· t l i2 l!s l !) = N X i=1 1 s (∞ X l=1 (−1)l· t l i1− tli2 l! s l !) = N X i=1 ∞ X l=1 (−1)l· t l i1− tli2 l! s l−1 ! = ∞ X l=1 (−1)l N X i=1 tl i1− tli2 l! s l−1 ! , (3.17)
whereupon we let l = m + 1 and thus obtain
∞ X l=1 (−1)l N X i=1 tl i1− tli2 l! s l−1 ! = ∞ X m=0 (−1)m+1 N X i=1 tm+1i1 − tm+1 i2 (m + 1)! s m ! . (3.18)
Therefore, the m-th moment of h(t) can be calculated by
hm = (−1)m+1 N X i=1 tm+1i1 − tm+1 i2 (m + 1)! ! , (3.19)
where m is the m-th moment. Fig. 3.7 shows the proposed moment calculation algorithm.
3.6
Two-Stage Sizing Algorithm
As shown in Fig. 3.4, the DSTN design optimization problem can be expressed as the following problem formulation.
“Given an IR circuit shown in Fig. 3.4 with a set of independent PWL current sources and the maximum allowed threshold voltage at each node, the problem is to simultane-ously minimize the total width of sleep transistors and meet the maximum allowed thresh-old voltage constraints.”
The flowchart of sizing algorithm is shown in Fig. 3.2, and its details are shown in Fig. 3.9. Our sizing algorithm has two stages. To speed up the optimization procedure,
Moment Calculation Algorithm
Input : A waveform h(t) contains
N unit pulses {(t11, t12), (t21, t22) · · · , (tN 1, tN 2)}
Output : hm= {h1, h2, ..., hm}, the first m moments of the source h(t).
1 Begin 2 For i = 1 : m 3 moment = 0 4 For j = 1 : N 5 moment+ = tm+1j1 − tm+1 j2 6 End For 7 For k = 1 : (m + 1) 8 moment =momentk 9 End For 10 If (m + 1) % 2 = 0 11 hi= moment 12 Else 13 hi= −moment 14 End For 15 End.
Fig. 3.7: Moment calculation algorithm for the pulse waveform.
in the first stage we use IEKS reduction method to approximate the solution of the equiv-alent DSTN circuit with many independent time-variant PWL current sources, and then calculate the solution of the adjoint system of this equivalent circuit with many indepen-dent pulse waveforms. After that, the cost of each node and the integral sensitivity of each sleep transistor are computed, as shown in sections 3.3 and sections 3.4. To potentially zero the cost of the selected sleep transistors and also minimize the increased width of a DSTN, Wiaddis computed as
Wiadd= ci Si
, (3.20)
where ci is the cost of node i, and Si is the integral sensitivity of sleep transistor i.
By the above sizing method the new width of sleep transistor i becomes
Winew = Wiold+ Wiadd. (3.21)
In the sizing procedure of transistor i, the influence of sizing sleep transistor i is greater than sizing the other sleep transistors. So if node i has maximum Vmax vio
i , the
we choose the sleep transistor with maximum Vimax vioto size, hence the sleep transistor oversizing issue is alleviated. However, the conductance matrix and its LU decomposi-tion need to be recalculated after each sizing step. This recalculadecomposi-tion procedure is time-consuming. To reduce the number of recalculation steps, several sleep transistors are simultaneously sized instead of only one sleep transistor. The details will be described in the next section. The above procedure is repeated until the maximum Vimax vioof the sleep transistors plus Vist∗is less than the upper bound voltage Vup.
In the second stage, an accurate and efficient time domain solver [17] is used to cal-culate the cost function and its sensitivity. The procedure presented in the first stage is repeated until the total cost is equal to zero. Fig. 3.8 shown the voltage waveform of node
i in the sizing procedure from beginning to end. Fig. 3.8 (a) and Fig. 3.8 (b) show the
voltage variable at node i in the first stage. Fig. 3.8 (c) illustrates it meets upper bound voltage constraint, and then starting the second stage. Fig. 3.8 (d) shows the cost of node
i is equal to zero.
The total width of sleep transistors and the execution time of the proposed two-stage sizing algorithm are dependent on the value of Vup in the first stage. From the experi-mental results shown that it can be observed that the execution time is reduced when we decrease Vup. This is because IEKS method is faster than the time domain solver [17]. On the other hand, the total width of sleep transistors is reduced when we choose a larger
Vup. This is because the time domain solver [17] is more accurate than IEKS method.
3.7
Speed Up
In the sizing procedure, as runtime is increased with the circuit become large. In order to decrease the runtime of circuit analysis, we also proposed the speed up method for sizing sleep transistors.
In our algorithm, after solving the voltage of nodes at each time step, we then choose the sleep transistor i with the maximum Vimax vio and size its width. Finally we update
conductance matrix G, then repeat above procedure until meet cost constraint or upper bound voltage constraint. When we solve voltage of nodes at each time step, we need to perform LU decomposition, this is time-consuming so we modify our algorithm. First, we calculate the Vimax vio of each sleep transistor. Then, these values are sorted into descending order. After that, the first N sleep transistors are chosen for sizing. It can be observed in the experimental results that increasing N can reduce the execution time, but the total width will increase a little.
3.8
Discussion
In the chapter, we have proposed the two-stage sizing algorithm by using cost and integral sensitivity. In the first stage, model order reduction mthod is empoly because it can speed up the analysis of large circuits. On the other hand, the upper bound voltage is employed in the first stage, it can decide the runtime of all sizing procedure and total width of sleep transistor. If the upper bound voltage is increased, the runtime of first stage is decrease, and the runtime of second stage is increased. This is because it quickly meets the upper bound constraint in the first stage, then break the first stage and starting the second stage. In the second stage, exact time domain is utilized, it can solve the exact solution and make sure that final solution is not violation the circuit constraint, but it is time-consuming. That is why increase upper bound voltage will increase total runtime. Comparing the performance of using two stage with using only the second stage, the runtime of the latter is more than the former, the experimental results will be seen in the next chapter, and it will show that our two-stage sizing algorithm can handle large circuits efficiently. In [4], a state-of-the-are method has been proposed, the authors just only consider the maximum
Vmax vio
i at node i and then reduce the maximum Vimax vio. In our method, on the other
hand, we consider not only the cost of time period T but also consider the total cost of the nodes, i.e. we consider the global region. From the global view point, our method is outperform [4]. The experimental results will show our method is better than [4] in next
Sizing
0
T
* st iV
( )
iV t
upV
Sizing
Sizing
(a)
_ max vio iV
t
(c)
upV
* st iV
_ max vio iV
( )
iV t
T
t
0
t
(d)
upV
* st iV
( )
iV t
T
0
(b)
( )
iV t
_ max vio iV
upV
* st iV
T
t
0
Fig. 3.8: (a)The voltage waveform of node i in the first sizing stage. (b) The Vmax vio i
of node i was reduced. (c)The voltage waveform of node i in the second stage. (d) The
Two-Stage Sizing Algorithm
Input : The numbers of sleep transistor K, the numbers of slected sleep transistors N, the upper bound voltage Vup,
the maximum allowed threshold voltage Vst∗
i for each node i,
and minimum width of sleep transistors. Output : Total width of sleep transistors.
1 Begin
2 Construct conductance matrix G 3 FIRST STAGE
4 While
5 Solve voltage at each time step with IEKS method 6 Sort Vmax vio
i from maximum to minimum
7 For i = 1 : N
8 Estimate cost and sensitivity for node i 9 End For
10 If don’t meet upper bound voltage constraint
11 Select first N nodes in this order 12 For i = 1 : N 13 Wadd i = ci Si=> W new i = Wiold+ Wiadd 14 End For
15 Update conductance matrix G 16 End If
17 Else Break
18 End While
19 SECOND STAGE
20 While
21 Solve voltage at each time step with exact time domain solver 22 Sort Vimax viofrom maximum to minimum
23 For i = 1 : N
24 Estimate cost and sensitivity for node i 25 End For
26 If don’t meet total cost constraint
27 Select first N nodes in this order 28 For i = 1 : N 29 Wadd i = ci Si=> W new i = Wiold+ Wiadd 30 End For
31 Update conductance matrix G 32 End If
33 Else Break
34 End While
35 End.
Chapter 4
Experimental Results
The two-stage sizing algorithm is implemented in C++ language on a processor 3.2GHZ HP workstation with 32GB memory. The test circuits are randomly generated, and the numbers of nodes are from 400 to 5000. In the test circuit, the waveform of each cur-rent source contains several triangle waves with peak value randomly generated between 0.1mA and 5mA. Our VDD is set to 1.3V, and the maximum allowed threshold voltage
is set to 0.065V, which is 5% of the VDD. The virtual ground resistances are randomly
generated from 10 to 20Ω. Time period T is partitioned into 100 time steps.
The plots shown in Fig. 4.1 and Fig. 4.2 are the results for the test circuit with 2500 nodes. Fig. 4.1 illustrates the relation between the total width of sleep transistors, the upper bound voltage Vup and the number N of selected sizing sleep transistors. It can be observed that the total width decreases as Vup increases or N decreases. Fig. 4.2 illustrates the relation between the runtime, the upper bound voltage Vupand the number
N of selected sizing sleep transistors. It can be observed that the runtime decreases as Vupdecreases or N increases.
Circuit Area (Width) µm
[4] Two-Stage Sizing Algorithm
no. of nodes no. of time frames no. of selected transistors (N)
TP V-TP 1 4% of sleep transistors 400 558.0900 1919.5600 556.3900 556.7100 900 1162.4500 4565.2600 1158.6300 1159.4300 1225 1611.5000 6258.1000 1607.6300 1609.2400 1600 2045.8400 8952.9100 2038.0000 2043.2000 2025 2595.4100 11137.3000 2588.6300 2591.5400 2500 3191.0800 14893.6000 3179.9300 3183.6300 5000 6179.0300 32326.2000 6158.7500 6173.1900 Total 17343.4100 80052.9300 17287.9600 17316.9400 Table 4.1: Total width comparison for two-stage method and [4]
Circuit The ratio of width
[4] Two-Stage Sizing Algorithm no. of nodes no. of time frames no. of selected transistors (N)
TP V-TP 1 4% of sleep transistors 400 1.0025 3.4481 0.9994 1.0000 900 1.0026 3.9375 0.9994 1.0000 1225 1.0014 3.8889 0.9990 1.0000 1600 1.0013 4.3818 0.9975 1.0000 2025 1.0015 4.2976 0.9989 1.0000 2500 1.0023 4.6782 0.9988 1.0000 5000 1.0009 5.2365 0.9977 1.0000 Avg. 1.0018 4.2669 0.9986 1.0000
Table 4.2: The ratio of width for two-stage method and [4].
In order to demonstrate that two-stage sizing algorithm is better than the state-of-the-art method [4], we also implement the method [4]. In Tables 4.1 and 4.2, the TP is the sizing method by using the uniform time frame partition proposed by [4] with 100 time frames, and the V-TP is the sizing method by using the variable time frame partition developed in [4] with 10 time frames. In our method, the Vupis set to be 0.2V in the first stage, and the number of selected sleep transistors is 4% of the total sleep transistors.
Table 4.1 and Table 4.2 demonstrates that the total width of the proposed sizing algo-rithm outperforms both the TP and V-TP methods. In fact, the total width of our two-stage sizing algorithm is slightly less than the TP for each test circuit, and the total width of V-TP averages over four times that of our proposed method. Table 4.3 and Table 4.4 show
Circuit Runtime (s)
[4] Two-Stage Sizing Algorithm
no. of nodes no. of time frames no. of selected transistors (N)
TP V-TP 1 4% of sleep transistors 400 7.4900 3.0400 102.7400 6.9500 900 42.2700 17.7900 665.3200 18.6400 1225 83.9400 34.9500 1279.7700 26.1600 1600 173.8900 80.5900 2601.6200 44.4700 2025 444.7400 173.3200 6353.0800 71.8400 2500 1068.8400 269.9600 14391.0000 101.2900 5000 9876.9500 2764.0900 153513.0000 749.2800 Total 11698.1200 3343.7400 178906.5000 1018.6300 Table 4.3: Runtime comparison for two-stage method and [4].
Circuit The ratio of runtime
[4] Two-Stage Sizing Algorithm no. of nodes no. of time frames no. of selected transistors (N)
TP V-TP 1 4% of sleep transistors 400 1.0777 0.4374 14.7827 1.0000 900 2.2677 0.9544 35.6931 1.0000 1225 3.2087 1.3360 48.9209 1.0000 1600 3.9103 1.8122 58.5028 1.0000 2025 6.1907 2.4126 88.4337 1.0000 2500 10.5523 2.6652 142.0772 1.0000 5000 13.1819 3.6889 204.8807 1.0000 Avg. 5.7699 1.9909 84.7558 1.0000
Table 4.4: The ratio of runtime for two-stage method and [4].
that two-stage sizing algorithm is faster than both the TP and V-TP methods, specifically averaging 5.77× faster than the TP and 1.99× faster than the V-TP.
In Table 4.4, comparing TP or V-TP with two-stage method, it can be observed as the circuit become large, the ratio of runtime increases. We moreover compare, using our method, sizing based on selecting only one sleep transistor relative to resizing based on selecting 4% of the total sleep transistors. It is found that the total width of the former is less than the latter, but the run time of the latter is 84.75× faster than the former. It is thus shown that selecting multiple sleep transistors for concurrent sizing is a very fast and reasonably effective methodology.
Circuit Area (Width) µm
Only execution in second stage Two-Stage Sizing Algorithm no. of nodes no. of selected transistors (N) no. of selected transistors (N)
1 4% of sleep transistors 1 4% of sleep transistors
400 556.0700 556.1500 556.3900 556.7100 900 1157.6400 1157.9000 1158.6300 1159.4300 1225 1605.8300 1606.1600 1607.6300 1609.2400 1600 2036.4800 2037.2300 2038.0000 2043.2000 2025 2586.2000 2586.6700 2588.6300 2591.5400 2500 3177.5600 3178.3000 3179.9300 3183.6300 Total 11119.7800 11122.4100 11129.2100 11143.7500
Table 4.5: Total width comparison for two-stage method and only execution in second stage.
Circuit The ratio of width
Only execution in second stage Two-Stage Sizing Algorithm no. of nodes no. of selected transistors (N) no. of selected transistors (N)
1 4% of sleep transistors 1 4% of sleep transistors
400 0.9988 0.9989 0.9994 1.0000 900 0.9984 0.9986 0.9993 1.0000 1225 0.9979 0.9981 0.9990 1.0000 1600 0.9967 0.9971 0.9975 1.0000 2025 0.9979 0.9981 0.9988 1.0000 2500 0.9981 0.9983 0.9988 1.0000 Avg. 0.9980 0.9982 0.9988 1.0000
Table 4.6:The ratio of width for two-stage method and only execution in second stage.
when using the full two stage sizing method. This is because the exact time domain solver provides a best candidate for choosing and sizing sleep transistors. Hence, the total width can be reduced. Table 4.7 shows that the two stage sizing method is faster than using only the second stage. It also reveals that model order reduction is an effective method for circuit analysis. In Table 4.8, it can be observe that comparing only execution in second stage with two-stage method, as the circuit become large, the ratio of runtime increases. Finally, memory usage of our method is compared with [4], showing that the memory usage of our method is more than [4], with the results shown in Table 4.9.
Circuit Runtime (s)
Only execution in second stage Two-Stage Sizing Algorithm no. of nodes no. of selected transistors (N) no. of selected transistors (N)
1 4% of sleep transistors 1 4% of sleep transistors
400 155.3800 8.7600 102.7400 6.9500 900 989.7800 27.9500 665.3200 18.6400 1225 1884.7800 39.5800 1279.7700 26.1600 1600 3512.2200 70.3600 2601.6200 44.4700 2025 7579.9400 150.9600 6353.0800 71.8400 2500 20668.500 213.8300 14391.0000 101.2900 Total 34790.6000 511.4400 25393.5300 269.3500
Table 4.7:Runtime comparison for two-stage method and only execution in second stage.
Circuit The ratio of runtime
Only execution in second stage Two-Stage Sizing Algorithm no. of nodes no. of selected transistors (N) no. of selected transistors (N)
1 4% of sleep transistors 1 4% of sleep transistors
400 22.3568 1.2604 14.7827 1.0000 900 53.0998 1.4995 35.6931 1.0000 1225 72.0481 1.5129 48.9208 1.0000 1600 78.9795 1.5821 58.5028 1.0000 2025 105.5114 2.1013 88.4337 1.0000 2500 204.0527 2.1111 142.0772 1.0000 Avg. 89.3414 1.6779 64.73508 1.0000
Table 4.8: The ratio of runtime for two-stage method and only execution in second stage.
20 40 60 80 100 0.1 0.15 0.2 3180 3200 3220 3240 3260
Number of selected nodes Upper bound voltage (V)
Total width (
μ
m)
Fig. 4.1: Total width relation between the number of selected sleep transistors and the upper bound voltage.
20 40 60 80 100 0.1 0.15 0.2 2 2.5 3 3.5 4
Number of selected nodes Upper bound voltage (V)
Time(log X) (s)
Fig. 4.2: Runtime relation between the number of selected sleep transistors and the upper bound voltage. The X is the execution time.
Circuit Memory (MB)
[4] Two-Stage Sizing Algorithm no. of nodes no. of time frames no. of selected transistors (N)
TP V-TP 1 4% of sleep transistors 400 1.18 2.68 3.82 3.98 900 4.62 3.92 6.45 6.54 1225 5.66 4.70 8.09 8.21 1600 7.06 5.81 10.51 10.06 2025 8.40 6.82 12.61 11.90 2500 10.09 8.12 14.56 14.76 5000 19.00 8.89 26.03 27.71
Chapter 5
Conclusion
In this thesis, we have presented a novel approach to size sleep transistors. In order to speed up circuit analysis, we employ model order reduction to solve voltage drop of sleep transistors, and select more than one sleep transistor to size in a loop, this can effective reduce execution time. On the other hand, we combine the cost and integral sensitivity and utilize them to size sleep transistor. In global view point,the answer of it is better than local view point, so the experimental results show that our total width of sleep transistors is less than [4], and execution time is faster than [4].
Bibliography
[1] F. Fallah, and M. Pedram, ”Standby and active leakage current control and min-imization in CMOS VLSI circuits,” IEICE Trans. on Electronics, vol. E88-C, pp. 509-519, April 2005.
[2] K. Roy, S. Mukhopadhyay, and H. Mahmoodi-Meimand, ”Leakage current mech-anisms and leakage reduction techniques in deep-submicrometer CMOS circuits,” Proc. of the IEEE vol. 91, no. 2, pp. 305 - 327, February 2003.
[3] V. De, Y. Ye, A. Keshavarzi, S. Narendra, J. Kao, D. Somasekhar, R. Nair, and S. Borkar, ”Techniques for leakage power reduction,” in Design of High-Performance Microprocessor Circuits, A. Chandrakasan, W. Bowhill, and F. Fox, Eds. Piscat-away, NJ: IEEE, 2001, ch. 3, pp. 52-55.
[4] D. S. Chiou, D. C. Juan, Y. T. Chen, and S. C. Chang, ”Fine-grained sleep transis-tor sizing algorithm for leakage power minimization,” Proc. of ACM/IEEE Design Automation Conference (DAC-2007), 2007, pp. 81-86,.
[5] http://asic-soc.blogspot.com/2008/03/leakage-power-trends.html
[6] H. Chang, and S. S. Sapatnekar, ”Full-chip analysis of leakage power under process variations, including spatial correlations,” Proc. of ACM/IEEE Design Automation Conference (DAC-2005), 2005, pp. 523-528.
[7] Y. M. Lee, Y. Cao, T. H. Chen, J. Wang, and Charlie C. P. Chen, ”HiPRIME: Hierar-chical and passivity preserved interconnect macromodeling engine for RLKC power
delivery,” IEEE Transactions on Computer-Aided Design of Integrated Circuits And Systems (TCAD), vol. 24, no. 6, pp. 797-806, June, 2005.
[8] Y. Cao, Y. M. Lee, T. H. Chen, and Charlie C. P. Chen, ”HiPRIME: Hierarchical and passivity reserved interconnect macromodeling engine for RLKC power delivery,” Proc. of ACM/IEEE Design Automation Conference (DAC-2002), 2002, pp. 379-384.
[9] C. Long, and L. He, ”Distributed sleep transistor network for power reduction,” IEEE Transaction on VLSI systems, vol. 12, No. 9, pp 181-186, September 2004.
[10] S. Mutoh, S. Shigematsu, Y. Matsuya, H. Fukuda, T. Kaneko, and J. Yamada, ”A 1-v multithreshold-voltage CMOS digital signal processor for mobile phone appli-cation,” IEEE J. Solid-State Circuits, vol. 31, issue 11, pp. 1795-1802, November 1996.
[11] J. Kao, S. Narendra, and A. Chandrakasan, ”MTCMOS hierarchical sizing based on mutual exclusive discharging patterns,” Proc. of ACM/IEEE Design Automation Conference (DAC-1998),1998, pp. 495-500.
[12] M. Anis, S. Areibi, M. Mahmoud, and M. Elmasry, ”Dynamic and leakage power reduction in MTCMOS circuits using an automated efficient gate clustering tech-nique,” Proc. of ACM/IEEE Design Automation Conference (DAC-2002), 2002, pp. 480-485.
[13] J. Kao, A. Chandrakasan, and D. Antoniadis, ”Transistor sizing issues and tool for multi-threshold CMOS technology,” Proc. of ACM/IEEE Design Automation Con-ference (DAC-1997) , 1997, pp. 409-414.
[14] D. S. Chiou, S. H. Chen, S. C. Chang, and C. Yeh, ”Timing driven power gating,” Proc. of ACM/IEEE Design Automation Conference (DAC-2006), 2006, pp. 121-124.
[15] M. Anis, S. Areibi, and M. Elmasry, ”Design and optimization of multithreshold CMOS (MTCMOS) circuits,” IEEE Transactions on Computer-Aided Design of In-tegrated Circuits and Systems (TCAD), Vol. 22, issue 10, pp. 1324-1342, June 2003.
[16] A. Ramalingam, B. Zhang, A. Devgan, and D. Z. Pan, ”Sleep transistor sizing using timing criticality and temporal currents,” Proc. of ACM/IEEE Asia and South Pa-cific Design Automation Conferernce (ASP-DAC-2005), vol. 2, pp. 1094-1097, Jan. 2005.
[17] T. H. Chen, and C. C. Chen, ”Efficient large-scale power grid analysis based on preconditioned Krylov subspace iterative methods,” Proc. Design Automation Con-ference (DAC-2001), 2001, pp. 559-562.