An Efficient Method for Analyzing On-Chip Thermal Reliability Considering Process Variations

(1)

41 Considering Process Variations

YU-MIN LEE, National Chiao Tung University

PEI-YU HUANG, Industrial Technology Research Institute

This work provides an efficient statistical electrothermal simulator for analyzing on-chip thermal reliability under process variations. Using the collocation-based statistical modeling technique, first, the statistical interpolation polynomial for on-chip temperature distribution can be obtained by performing deterministic electrothermal simulation very few times and by utilizing polynomial interpolation. After that, the proposed simulator not only provides the mean and standard deviation profiles of on-chip temperature distribution, but also innovates the concept of thermal yield profile to statistically characterize the on-chip temperature distribution more precisely, and builds an efficient technique for estimating this figure of merit. Moreover, a mixed-mesh strategy is presented to further enhance the efficiency of the developed statistical electrothermal simulator.

Experimental results demonstrate that (1) the developed statistical electrothermal simulator can obtain accurate approximations with orders of magnitude speedup over the Monte Carlo method; (2) comparing with a well-known cumulative distribution function estimation method, APEX [Li et al. 2004], the developed statistical electrothermal simulator can achieve 215_{× speedup with better accuracy; (3) the developed} mixed-mesh strategy can achieve an order of magnitude faster over our baseline algorithm and still maintain an acceptable accuracy level.

Categories and Subject Descriptors: B.7.2 [Integrated Circuits]: Design Aids; B.8.2 [Performance and Reliability]: Performance Analysis and Design Aids

General Terms: Design, Algorithms, Performance, Reliability

Additional Key Words and Phrases: Electrothermal simulation, thermal analysis, chip temperature, thermal reliability, process variation, simulation

ACM Reference Format:

Lee, Y.-M. and Huang, P.-Y. 2013. An efficient method for analyzing on-chip thermal reliability considering process variations. ACM Trans. Des. Autom. Electron. Syst. 18, 3, Article 41 (July 2013), 32 pages. DOI: http://dx.doi.org/10.1145/2491477.2491485

1. INTRODUCTION

As technology scales down to the sub-90nm node, on-chip power densities increase rapidly. Hence, power dissipation and thermal management have become important issues of VLSI design. High on-chip temperature distribution and thermal gradients

Preliminary versions of this article appeared in Proceedings of the IEEE International Systems-on-Chip

Conference (SoCC’10) [Chang et al. 2010] and in Proceedings of the IEEE/ACM International Asia and South Pacific Design Automation Conference (ASP-DAC’12) [Huang et al. 2012].

This work was supported in part by the National Science Council of Taiwan under grants NSC 99-2220-E-009-035, 100-2221-E-009-074, and 101-2221-E-009-168, and by the Industrial Technology Research Institute, Taiwan.

Authors’ addresses: Y.-M. Lee, Department of Electrical and Computer Engineering, National Chiao Tung University, HsinChu, Taiwan; email: yumin@nctu.edu.tw; P.-Y. Huang, Industrial Technology Research In-stitute, Taiwan.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax_{+1 (212)} 869-0481, or permissions@acm.org.

c

2013 ACM 1084-4309/2013/07-ART41 $15.00

(2)

drastically degrade the circuit performance and design reliability, operating tempera-tures seriously affect gate delays [Kumar and Kursun 2006], and nonuniform on-chip temperature distribution induces timing faults [Bota et al. 2004]. Since on-chip power consumption is proportional to operating temperatures, thermal runaway might occur if thermal-related issues are not carefully considered in package design [Vassighi and Sachdev 2006]. To ensure design qualities, such as performance, power consumption, and reliability, researchers have been devoted to dealing with thermal-related issues in physical design [Tsai et al. 2006; Liu et al. 2008]. To provide the related thermal cost for optimization engines of physical design, several efficient deterministic thermal simulators [Wang and Chen 2003; Huang et al. 2006; Yang et al. 2007; Huang and Lee 2009] have been developed to predict on-chip temperature profile. However, these thermal simulators only provide thermal information with deterministic on-chip power consumption.

As predicted by the international technology roadmap for semiconductors (ITRS), leakage power consumption increases dramatically and has become a dominant por-tion of total power consumppor-tion [ITRS 2010]. Moreover, the scaling down of technology causes physical parameter variations to be non-ignorable, and this fact leads to sub-stantial on-chip leakage power fluctuations. As pointed out by Pang and Nikolic, 8% of process variations can lead to about 25% of on-chip leakage power fluctuations [2009]. Therefore, physical parameter variations are essential to be considered for on-chip power estimation techniques [Chang and Sapatnekar 2007; Shen et al. 2010b, 2010a]. Since on-chip temperature is transfered from on-chip power consumption, its distribu-tion is impacted by process variadistribu-tions inducing leakage power fluctuadistribu-tions. However, deterministic thermal analyzers [Wang and Chen 2003; Huang et al. 2006; Yang et al. 2007; Huang and Lee 2009], which did not take process variations into account in their power models, are not adequate to precisely provide thermal reliability estima-tion under process variaestima-tions. Therefore, the on-chip temperature profile should be treated statistically under process variations, and statistical thermal simulation tech-niques are essential, especially for leakage dominated technologies [Huang et al. 2009; Jaffari and Anis 2008].

To provide the statistical characteristics of on-chip temperature distribution, Jaffari and Anis [2008] proposed a recursive log-normal approximation algorithm to obtain mean and standard deviation profiles of the on-chip temperature distribution. Com-pared with the Monte Carlo (MC) simulations, they have successfully demonstrated its efficiency and accuracy for estimating mean and standard deviation profiles of the temperature distribution in the macro-architectural level. Instead of constructing the different leakage power model for each different macro/gate type, they built the differ-ent leakage power model for each bin (grid) on a die. Since optimization engines, such as floorplanners or placers, might disturb positions of macros or gates, their related leakage power models need to be rebuilt after an optimization loop is executed. There-fore, the efficiency of their approach will be degraded while casting into thermal-aware design flows, because their approach needs to execute the time-consuming HSPICE simulation numerous times and fit curves for reestablishing leakage power models. In addition, their recursive log-normal approximation algorithm is restricted to the form of their proposed leakage power models. However, the leakage power model is getting more complicated for maintaining an acceptable accuracy level as the tech-nology continuously scales down. To overcome the leakage power model restriction, a statistical thermal simulation framework that has high capability of adopting complex and accurate power models for any technology generations is required.

Besides the leakage power model issues, the figure of merit for on-chip temperature distribution is still ambiguous if only its mean and standard deviation profiles are

(3)

reported.1 _{Therefore, instead of only reporting mean and standard deviation profiles,} a more precise figure of merit for the statistical characteristics of on-chip temperature distribution should be addressed to ensure the thermal reliability or to provide the thermal related cost for thermal-aware design engines.

In this work, a statistical simulation framework is developed for characterizing the on-chip temperature distribution. With the aim of dealing with the restriction issues of leakage power models, providing a more precise figure of merit for ensuring thermal reliability and being more easily incorporated into statistical performance analysis and design engines, technical key points and advantages of this work are summarized as follows.

(1) Compared with the bin-based model [Jaffari and Anis 2008], a cell-based model is adopted for characterizing leakage powers. With the precharacterizing property, the reestablishing process of leakage power models can be avoided, while macros or gates are exchanged by optimization engines, such as floorplanners or placers. (2) Adopting the concept of sparse collocation-based methods, a statistical

electrother-mal simulation framework is developed to generate the statistical polynomial ex-pression of on-chip temperature distribution. Compared with that of Jaffari and Anis [2008], the developed framework is more flexible for complex and precise leakage powers models.

(3) This work not only provides the mean and standard deviation profiles of on-chip temperature distribution, but also introduces the concept of thermal yield profile to statistically characterize the on-chip temperature distribution more precisely, and builds an efficient estimating technique for this figure of merit.

(4) Without sacrificing the accuracy, a mixed-mesh strategy is presented and integrated into the baseline method of our statistical electrothermal simulation engine to further enhance its efficiency.

The rest of this article is organized as follows. Section 2 motivates the concept and essentialness of on-chip thermal yield profile, investigates the accuracy of existing cell-based leakage power models, and indicates that complex leakage current models are required for maintaining acceptable accuracy level. After that, the problem formulation and the modeling technique of device parameters are described in Section 3. Then, the developed statistical electrothermal simulator is detailed in Section 4, and experimen-tal results are given in Section 5. Finally, the conclusion and potential applications of the developed simulation framework are presented in Section 6.

2. MOTIVATION ILLUSTRATIONS

2.1. Concept of On-Chip Thermal Yield Profile

Because of process variations, on-chip temperature at an arbitrary position r is a random variable. Therefore, the deterministic thermal analysis with nominal on-chip power profile can no longer be a good figure of merit for identifying the hotspot lo-cation of the chip. Moreover, even if the temperature at any arbitrary position r has been treated as a random variable, it will still be ambiguous if only the mean (μT(r))

and standard deviation (σT(r)) profiles of on-chip temperature distribution are

deliv-ered. For example, only using the mean profile of on-chip temperature distribution as a figure of merit is very likely (about 50%) to incorrectly indicate hotspot loca-tions. Furthermore, as bothμT(r) andσT(r) are delivered, by utilizing the Chebyshev

inequality, a large temperature value will be estimated to ensure the lower bound

(4)

Fig. 1. An example to demonstrating the gap between the temperature profile that really achieves 90% thermal reliability and the temperature profile that satisfies the lower bound of 90% thermal reliability in the Chebyshev inequality: (a) the temperature profile that really achieves 90% thermal reliability, T90

Real(r);

(b) the temperature profile that satisfies the lower bound of 90% thermal reliability in the Chebyshev inequality, T90

Chebyshev(r).

of 90% thermal reliability, that is, T90

Chebyshev(r) needs to be μT(r)+ 3σT(r) to ensure

Prob(T (r)≤ T90

Chebyshev(r))≥ 0.9. Here, T (r) is the on-chip temperature distribution. Since the Chebyshev inequality does not always get a tight lower bound for any type of random variables, there will be a gap between the temperature profile, T_Real90 (r), that really achieves 90% thermal reliability (i.e., Prob(T (r) ≤ T90

Real(r)) = 0.9), and

T_Chebyshev90 (r).2 _{As shown in Figure 1, the gap between T}90

Real(r) and T 90

Chebyshev(r) can achieve about 10◦C in our experimental results. Therefore, using the temperature profile of the Chebyshev bound might be an immoderately conservative constraint for thermal reliability. This undesirable phenomenon can result in immoderate guard-banding for circuit design.

On the other hand, from the aspect of circuit design, the specifications of circuit performance and the timing constrains of primary I/Os are usually specified in the system-level design stage. Moreover, in the timing and thermal cooptimization pro-cess, timing issues usually take higher priority than thermal issues. Designers try to minimize the circuit delay and meet the temperature requirement. Therefore, in the presence of process variations, to identify possible hotspot regions of a chip, the thermal

yield profile, T yield(r, Tspec(r)), can be defined as the probability profile of the on-chip temperature at an arbitrary position r being at or less than a specified temperature

Tspec(r).

The thermal yield profile can identify the hotspot regions and can also quantify the probability of a region that could be a hotspot. Therefore, it is a suitable figure of merit for the thermal-related cost in timing-thermal cooptimization physical design stages. 2.2. Leakage Power Modeling

Leakage currents of a gate not only depend on physical device parameters and oper-ating temperatures but also on its input patterns [Chang and Sapatnekar 2007]. To build leakage power models, different input patterns, physical parameters, and oper-ating temperatures are set for each gate in the cell library, and HSPICE simulation is

2_{For example, given a standard normal random variable x with Prob(x} _{≤ 1.28σ}

x) = 0.9, however, the

Chebyshev inequality requires a larger reference value to obtain the same probability as the lower bound, i.e., Prob(x≤ 3σx)≥ 0.9.

(5)

Table I. Accuracy Comparison of Igand Iswith HSPICE Simulation Results for an NAND Gate

maximum average

fg(L_{, t}ox, T ) error error error> 3%

Without tox_{, L, t}_{ox, L}2 2

6.48% 2.70% 4.37% temperature [Chang and Sapatnekar 2007; Shen et al. 2010b]

With L, tox, T 3.20% 0.97% 0.35%

temperature †L, tox, T, tox2 1.55% 0.29% 0.00%

maximum average

fs(L, tox, T ) error error error> 3%

L_{, t}ox, tox, t2 ox−1[Chang and Sapatnekar 2007] 347.32% 70.65% 98.27%

Without

L, tox, Ltox, L2_{, t}_{ox, t}2 −1

ox, Ltox−1, L−1tox

temperature

[Shen et al. 2010b, 2010a] 314.13% 70.52% 100.00%

L, T, tox[Yu et al. 2009] 32.23% 8.73% 76.62%

(L, tox, T ) are fully expanded to 2nd order =⇒

With L, tox, T, Ltox, toxT, T L, L2_{, t}_{ox, T}2 2 10.31% 1.53% 8.47%

temperature _{† (L, t}ox, T ) are fully expanded to 3rd order =⇒

L_{, t}ox, T, Ltox, toxT, T L, L2, tox, T2 2, LtoxT, L2tox, 1.31% 0.19% 0.00% t2

oxT, T2L, L3, tox3, T3 † The adoptive forms of fgand fsin this work.

Note: The second column shows the fitting components of fgand fsadopted by existing and proposed models. performed with the industry design kit to generate data of the leakage currents. After that, average leakage currents of the input patterns are fitted by the least squares method. With leakage currents exponentially relating with physical parameters and operating temperatures and using the least squares fitting method, two major leakage currents—gate tunneling leakage current (Ig) and subthreshold leakage current (Is)—

for each gate type can be fitted [Chang and Sapatnekar 2007; Shen et al. 2010a, 2010b; Yu et al. 2009].

Ig(L, tox, T ) = a0exp( fg(L, tox, T )), (1)

Is(L, tox, T ) = b0exp( fs(L, tox, T )). (2)

Here, a0and b0are fitting constants, L is the channel length, toxis the oxide thickness,

T is the operating temperature, and fg(·) and fs(·) are specific fitting forms.3

Basically, Igoccurs in both on and off states, and Isis the off-state leakage mechanism.

Therefore, the leakage power of a gate can be represented as follows [Chang and Sapatnekar 2007; Shen et al. 2010a, 2010b; Yu et al. 2009].

Pleak(L, tox, T ) = VddIg+ (1 − Sw)VddIs, (3)

where Vddis the supply voltage and Sw is the switching activity.

To adopt suitable leakage power models, different cell-based leakage current models are investigated [Chang and Sapatnekar 2007; Shen et al. 2010a, 2010b; Yu et al. 2009]. To examine their accuracy, we have implemented their proposed models and compared their results with that of HSPICE simulation under TSMC 65nm design kit. Since leakage currents are temperature-dependent, simulation results show that the ignorance of temperature effect in the models [Chang and Sapatnekar 2007; Shen et al. 2010b, 2010a] leads to considerable errors. As shown in the second row of Table I, the

3_{Variations of the device channel length and oxide thickness are considered in this work, since leakage power}

is more sensitive to these parameters [Chang and Sapatnekar 2007; Shen et al. 2010a, 2010b]. It should be noted that although only these two parameters are considered, the developed framework can be easily extended to include any other process variation types, such as the channel dopant variation.

(6)

Table II. Error of Leakage Current Models Proposed by [Jaffari and Anis 2008] for an NAND Gate under 65nm Technology Node Leakage Current maximum error average error error> 3%

Subthreshold 35_.53% 9_.82% 79_.34% Gate Tunneling 4.51% 1.07% 6.32%

model adopted by Chang and Sapatnekar [2007] and Shen et al. [2010b] can provide acceptable accuracy for the gate tunneling leakage current because of its insensitivity to operating temperatures. However, since the subthreshold leakage current is sensitive to operating temperatures, as shown in the sixth and seventh rows of Table I, the models [Chang and Sapatnekar 2007; Shen et al. 2010a, 2010b] are not adequate for preserving the accuracy.

To simultaneously take temperatures and process variations into account, Yu et al. proposed a first-order exponential model, b0exp(b1L+ b2tox+ b3T ), for the

sub-threshold leakage current [2009]. Their model can provide accurate results for the 90nm technology node. However, since the variability of subthreshold leakage current, because of operating temperatures and physical device parameters, will increase for more advanced technologies, considerable errors occur for simulation results under the 65nm technology node, as shown in the eighth row of Table I. To improve the accuracy for modeling leakage currents, the orders of fitting components for fg(L, tox, T ) and

fs(L, tox, T ) shown in Eqs. (1) and (2) are increased. As shown in the fourth and tenth

rows of Table I, compared with some models [Chang and Sapatnekar 2007; Shen et al. 2010a, 2010b; Yu et al. 2009], our models can present accurate results for both gate tunneling and subthreshold leakage currents. As demonstrating by the test results, exquisite approaches are still required for modern statistical power analyzers [Chang and Sapatnekar 2007; Shen et al. 2010a, 2010b] to refine the estimated result, while the temperature-dependent leakage power model is included.

Besides the cell-based leakage current models [Chang and Sapatnekar 2007; Shen et al. 2010a, 2010b; Yu et al. 2009], Jaffari and Anis proposed a bin (grid)-based leak-age power model that also simultaneously contains temperature and process variation effects [2008]. Adopting the bin (grid)-based leakage power model, Haghdad and Anis developed a power yield analysis engine that simultaneously considers temperature and process variation effects [2012]. However, the power dissipation of several bins might be changed because of disturbing macros/cells after each optimization itera-tion of thermal-aware design engines, such as floorplanners or placers. Therefore, the time-consuming HSPICE simulation and least-squares fitting process need to be re-performed for rebuilding leakage power models of the disturbed bins (grids). This will degrade its efficiency of providing thermal reliability information or thermal-related cost for thermal-aware design engines. We have implemented their leakage power mod-els for examining the accuracy. With the fitting forms adopted in Jaffari and Anis [2008] and Haghdad and Anis [2012], both subthreshold and gate tunneling leakage currents of each gate type in the cell library are fitted as a0(1+a1T+a2T2) exp (a3L+ a4tox). The

fitting results are also compared with those of HSPICE simulations under the TSMC 65nm model card, and listed in Table II. The results show that their adopted leakage power models present considerable errors under the 65nm technology node, although adequate accuracy of these models for the 90nm technology node has been reported [Jaffari and Anis 2008]. As shown in Table II, their adopted leakage power models result in the maximum error being 35.53% and the average error being 9.82% for the subthreshold leakage current of an NAND gate under the 65nm technology node.

Demonstrating results, more accurate leakage power models should be adopted in the electrothermal analysis frameworks [Jaffari and Anis 2008; Haghdad and Anis 2012] to refine the estimated results, since the temperature is transferred from power consumption. Although the electrothermal analysis frameworks proposed [Jaffari and

(7)

Fig. 2. Compact thermal model of physical design stages.

Anis 2008; Haghdad and Anis 2012] can be very efficient, their baseline tempera-ture calculation frameworks require exquisite extending strategies, because their log-normal temperature approximation algorithm can not be applied to the leakage power models that are not expressed as log-normal random variables, that is, the first-order regression models for fg(L, tox, T ) and fs(L, tox, T ) in Eqs. (1) and (2) are not sufficient

for the accuracy.

Compared with the framework in Jaffari and Anis [2008] and Haghdad and Anis [2012], our proposed thermal reliability estimator can handle accurate and more com-plicated leakage power models and present accurate estimated results.

3. PROBLEM FORMULATION AND PHYSICAL PARAMETER MODELING 3.1. Problem Formulation

The compact thermal model for physical design stages is shown in Figure 2 [Wang and Chen 2003; Huang et al. 2006; Yang et al. 2007; Huang and Lee 2009]. It consists of three portions. The primary heat flow path is composed of the thermal interface mate-rial, heat spreader, and heat sink. The secondary heat flow path contains interconnect layers, I/O pads, and the print circuit board. Functional blocks of the die are modeled as many power-generating sources attached to a thin layer close to the top surface of the die with the thickness being equal to the junction depth of device [Lallement et al. 2004]. Due to variations of physical parameters, the power consumption of func-tional blocks is treated statistically. Therefore, the profile of power generating sources,

p(r, L, tox, T ), shown in Figure 2 is modeled as a function of device channel length L,

oxide thickness tox, and on-chip temperature distribution T .

Combining the compact thermal model and the statistical power consumption of functional blocks, the on-chip temperature distribution T (r, L, tox) can be governed by

the statistical steady state heat transfer equation.4

∇ · (κ(r, T )∇T (r, L, tox))= −p(r, L, tox, T ), (4)

4_{Since the time constant of heat conduction is much larger than the clock period of the circuit [Wang}

and Chen 2003; Skadron et al. 2004], steady-state characteristics of the on-chip temperature distribution are more concerned in thermal-aware physical design engines [Han and Koren 2007; Chakraborty et al. 2008; Tsai et al. 2006]. The scope of this article is to provide a simulation framework for thermal-aware physical design engines, although temporary characteristics of the on-chip temperature distribution are also important for the post-floorplanning or placement real-time task scheduling or workload assignment [Reda et al. 2011].

(8)

Fig. 3. An iterative scheme for computing the appropriate thermal conductivity and approximated average of steady-state nominal temperature values. Tnomis the average of on-chip mean temperature values, and Pnom is the total nominal power consumption after executing an iteration. With nominal values of the

physical device parameters and Tnom, Pnomcan be obtained by summing up the power values of all gates in

the design.

subject to the boundary condition

κ(rbs, T )

∂T (rbs, L, tox)

∂ nbs

+ hbsT (rbs, L, tox)= fbs(rbs). (5)

Here, r= (x, y, z) ∈ D, D = (0, Lx)× (0, Ly)× (−Lz, 0) is the domain of die, Lx and Ly

are the lateral sizes of die, Lzis the thickness of die,κ(r, T ) is the thermal conductivity

(W/m ·◦C) of die, and ∇ is a diverge operator. bs is any specific boundary surfaces

of the die, rbs is the position on bs, hbs is the heat transfer coefficient on bs, fbs(rbs)

is the heat flux function on bs, and ∂/∂ nbs is the differentiation along the outward

direction which is normalized to bs. p(r, L, tox, T ) is the power density profile that

consists of the deterministic dynamic power density profile pd(r), the statistical gate

tunneling leakage power density profile pg(r, L, tox, T ), and the statistical subthreshold

leakage power density profile ps(r, L, tox, T ). Since the major part of device current

flows through the channel, power density distribution has its value only when r ∈ (0, Lx)× (0, Ly)× (− jd, 0). Here, jd is the junction depth of device [Lallement et al.

2004].

Generally, the values ofκ(r, T ) are temperature dependent. In practice, they can be treated as appropriate constant values while performing temperature-aware phys-ical design procedures [Tsai et al. 2006]. Given nominal values of the physphys-ical device parameters, the appropriate thermal conductivity can be computed by using the ap-proximated average of steady-state nominal temperatures calculated by an iterative computation scheme shown in Figure 3.

With the appropriate thermal conductivity, the statistical steady-state heat transfer equation can be rewritten as

κ∇2_{T (r}_{, L, t}

ox)= −p(r, L, tox, T ), (6)

κ∂T (rbs, L, tox)

∂ nbs

(9)

whereκ is the thermal conductivity of the die that is obtained by utilizing the procedure presented in Figure 3.

With Eqs. (6) and (7), the goals of this work are to estimate the mean profile, the stan-dard deviation profile, and the thermal yield profile of on-chip temperature distribution. 3.2. Physical Parameter Modeling

Generally, variations of physical parameters can be classified into two categories, die-to-die (D2D) variations and within-die (WID) variations. Due to different stages of the fabrication process, D2D and WID variations can be treated as two independent variation sources. Since D2D variations are smooth within a die, it is reasonable to model all devices having the same D2D variation. On the other hand, WID variations present considerable gradients within a die, and they are spatially correlated because spatial imperfection of the chemical-mechanical polishing and lithography processes. As indicated by the measured results of Cline et al. [2006] and Cheng et al. [2011], distributions of the physical parameters are similar to those of the Gaussian random variables; generally, WID variations are assumed to be a correlated Gaussian ran-dom process, and D2D variations are treated as a Gaussian ranran-dom variable for all devices [Bhardwaj et al. 2008; Chang and Sapatnekar 2007; Shen et al. 2010a, 2010b]. Combining the models of D2D and WID variations, the physical parameter Par(rxy)

with its nominal valueμPar(rxy) at position rxy= (x, y) ∈ (0, Lx)× (0, Ly) can be

repre-sented as

Par(rxy)= μPar(rxy)+ δW I D(rxy)+ δD2D, (8)

whereδW I D(rxy) is a Gaussian random process of WID variations, andδD2Dis a Gaussian

random variable of D2D variations.

Since the spatial correlation of δW I D(rxy) has different decreasing rates in the

x-direction and y-direction [Cline et al. 2006], the following spatial covariance function

[Bhardwaj et al. 2008] is adopted for modeling the spatial correlation ofδW I D(rxy).5

C(rx1y1, rx2y2)= σ 2_exp −|x1− x2| λx − |y1− y2| λy , (9)

where rx1y1 = (x1, y1) and rx1y2 = (x2, y2),λx andλy are correlation lengths ofδW I D in the x- and y-directions, respectively, andσ is the standard deviation of δW I D(rxy).

In this work, the Karhunen-Lo`eve (KL) expansion is utilized to simplifyδW I D(rxy),

since its number of transformed random variables is much smaller than that of princi-pal component analysis (PCA) [Bhardwaj et al. 2008]. By applying the KL expansion,

δW I D(rxy) with the spatial covariance function shown in Eq. (9) can be approximated as

δW I D(rxy)≈ NPar l=1 √_χ lϑl(rxy)ζl. (10)

Here, NParis the truncation number, each (χl, ϑl(rxy)) is an eigen-pair of C(rx1y1, rx2y2), andζl’s are independent standard normal random variables.

5_{Although this specific spatial covariance function is adopted, the Karhunen-Lo`eve expansion of a Gaussian}

random process with an arbitrary spatial covariance function can be efficiently obtained by a finite-element method [Schwab and Todor 2006]. Hence, more advanced spatial covariance functions [Gao et al. 2011; Liu 2007; Cheng et al. 2011] can also be incorporated into our framework.

(10)

The closed-form expressions of an eigen-pair (χl, ϑl(rxy)) for C(rx1y1, rx2y2) shown in Eq. (9) can be derived as follows [Zhang and Lu 2004].

χl = 4σ2_λ xλy λ2 xνx2,i+ 1 λ2 yν2y, j+ 1 , (11) ϑl(rxy) = ϑx,i(x)ϑy, j(y), (12)

where l, i, and j are indices, and the mapping between (i, j) and l is one to one.

ϑx,i(x) andϑy, j(y) are equal to

ϑx,i(x) = λxνx,icos(νx,ix)+ sin(νx,ix) λ2 xνx2,i+ 1 Lx/2 + λx , (13)

ϑy, j(y) = λyνy, jcos(νy, jy)+ sin(νy, jy)

λ2

yν2y, j + 1

Ly/2 + λy

. (14)

Here,νx,iandνy, j are positive values that satisfy

(λ2ν2− 1) sin(νγ ) = 2λν cos(νγ ), (15)

with (ν = νx,i, γ = Lx, λ = λx) and (ν = νy, j, γ = Ly, λ = λy), respectively.

To get reasonable truncation numbers of KL expansions for the physical parameters

L and tox, in this work, NPar for Par = L or Par = tox is decided by the following

criterion. χNPar+1 NPar+1 i=1 χi ≤ ε, (16) withε = 1%.

Generally, devices located adjacently have similar physical characteristics [Chang and Sapatnekar 2007; Shen et al. 2010b, 2010a]. Therefore, the active layer is par-titioned into several rectangular grids for modeling physical parameters. After that, with the KL expansion, the device channel length Lmand oxide thickness tox min the

mth modeling grid can be approximated as Lm = μLm+ g T LmηL, (17) t_{ox m} = μtox m+ g T tox mηtox. (18)

Here, μLm and μtox m are nominal values of Lm and tox m, respectively. gLm and gtox m

are coefficient vectors for η_L and η_t_ox, respectively. η_L = [ηL1, . . . , ηL_NL]

T _and _η

tox =

[ηtox 1, . . . , ηtox Ntox]

T _{are standard normal random vectors constituted by the KL expanded}

WID and D2D random variables for representing the device channel length and the oxide thickness in all modeling grids, respectively.

In the rest of this article,ξ is employed to represent [ηL1, . . . , ηL_NL, ηtox 1, . . . , ηtox Ntox]

T

for the sake of notation simplicity.

4. STATISTICAL ELECTRO-THERMAL SIMULATOR

The executing flow of the proposed statistical electrothermal simulator is summa-rized in Figure 4. Given the information of physical parameters, the KL expansion is performed to transform the spatial correlated physical parameters into a set of uncor-related random variables. Then, the statistical polynomial expression of the on-chip temperature distribution is generated by the developed stochastic collocation-based

(11)

Fig. 4. Flow of the developed statistical electrothermal simulator.

statistical interpolation polynomial generator. After that, the on-chip thermal yield profile is estimated by the developed thermal yield profile estimation engine.

The stochastic collocation-based statistical interpolation polynomial generator, the thermal yield profile estimation engine, and a mixed-mesh strategy for enhancing the statistical electrothermal simulator will be described in the following three sections. 4.1. Stochastic Collocation-Based Statistical Interpolation Polynomial Generator

The generator takes three steps to construct the statistical interpolation polynomial of the on-chip temperature distribution. First, the multidimensional sampling points of KL expanded random variables are generated by using the Smolyak sparse grid formula with sampling points being the roots of Hermite polynomials (HPs). Then, for each sampling point of the physical parameters, its corresponding temperature profile can be obtained by solving the corresponding deterministic heat transfer equation. Finally, the approximated expression of on-chip temperature distribution is built by utilizing Newton’s interpolation polynomial formula. The details are presented in the rest of this section.

4.1.1. Smolyak Sparse Grid Generation.The primary advantage of Smolyak sparse grid formulation is to construct an interpolating polynomial for approximating a multi-variate function u ∈ Cr _{by using much fewer samples of the desired function than}

(12)

Sampling Points of Monte Carlo

Sampling Points of Smolyak Sparse Grid

ξ1

ξ2

ξ1 ξ2

Fig. 5. The number of sampling random variables comparison between the Monte Carlo method and the Smolyak sparse grid formulation. Here, the samples of Smolyak sparse grid are adopted for achieving a level-two approximation.

maintain an acceptable error bound [Smolyak 1963; Barthelmann et al. 2000]. Here,

Cr_{is the set of all functions that have continuous derivatives of all orders up to r. With}

this stochastic collocation technique, the statistical interpolation polynomial of on-chip temperature distribution can be efficiently constructed.

The MC method randomly generates samples of the random variables and hence requires a large number of samples for achieving an accurate estimate. In contrast, the Smolyak sparse grid technique uses the roots of HPs or the extrema of the Chebyshev polynomial [Barthelmann et al. 2000] to generate samples of the random variables and employs these fewer samples to effectively interpolate the desired solution. For example, Figure 5 illustrates that the number of possible sample points of the MC method is much larger than that of the Smolyak sparse grid formulation for a two-dimensional random variable.

According to the Smolyak sparse grid formulation [Smolyak 1963], on-chip tempera-ture distribution can be explicitly approximated as follows [Barthelmann et al. 2000].

TNKL q (r, ξ) = q−NKL+1≤|i|≤q (−1)q−|i| NKL− 1 q− |i| Qi1_{(T )}⊗ · · · ⊗ Qin_{(T )}⊗ · · · ⊗ Qi_NKL_{(T )}. (19)

Here, NKL= Ntox+ NLis the number of random variables inξ, q = NKL+ l, l ≥ 1 is

the formulation level, and|i| = NKL

n=1in, with each in ≥ 1. Qin(T ) is an interpolating

polynomial of T (r, ξ) by only utilizing a single random variable ξn, in is the index to

decide the sample number (min) for Q

in_{(T ), and} ⊗ is the cross product operator for

functions. As suggested by Barthelmann et al. [2000], min=1is set to 1, and minis equal

to 2in−1+ 1 for i

n > 1. From Eq. (19), only the corresponding temperature values of

a small set of samples forξ need to be known [Barthelmann et al. 2000]. This set is called the sparse grid and can be represented as

H(q, NKL)= q−NKL+1≤|i|≤q ¯ hi1× · · · × ¯hin× · · · × ¯hi_NKL, ₍₂₀₎ where ¯hin= {ξ1 n, . . . , ξ min

n } is the set of sample points used by Qin(T ), and× is the cross

product operator for sets.

The number of sample points from the Smolyak sparse grid formulation is in the order of ONl

KL/l!

, and the runtime complexity for obtaining TNKL

q (r, ξ) is in the order

(13)

electrothermal simulation once. The Smolyak sparse grid formulation ensures a error bound, cNKL,r·(log NH)(r+1)(NKL−1)

Nr

H , for the function having bounded derivatives up to order

r [Barthelmann et al. 2000]. Here, N_H is the number of sample points in H(q, NKL),

and cNKL,r is a constant that depends on NKLand r. According to our experience, the

accurate estimation of the thermal yield profile can be obtained by setting level l to be 1. The number of sample points for the Smolyak sparse grid formulation can be much less than that of the MC method. A simple example is presented in Appendix A to illustrate the Smolyak sparse grid formulation.

The sampling values of ¯hin_{for each i}

nmust be properly decided. Adopting the roots

of H-PCs with its order corresponding to each incan achieve the most accurate result,

asξ is a normal random vector [Phillips 2003]. Choosing the extrema of the Chebyshev polynomial with its order corresponding to incan achieve the nested sparse grid

struc-ture, that is, ¯hin= j ⊂ ¯hin=k_{as j} < k, and the acceptable accuracy [Barthelmann et al.

2000]. In this work, we select the roots of H-PCs as the sampling values, since the re-sult is shown to be very accurate by using the low-level approximation, and the nested sparse grid structure is still preserved for q= NKL+ 1.6

4.1.2. Deterministic Electrothermal Solver: Temperature Profile Calculation for a Given Sample Point.After constructing the sparse gridH(q, NKL) ofξ, the samples of channel length

and oxide thickness in the mth modeling grid corresponding to the jth sampling vector ξj _in_{H(q, N}

KL) can be calculated by Eqs. (17) and (18), respectively, and the

determin-istic power density profile corresponding toξjcan also be obtained. Hence, we have the deterministic steady-state heat transfer equation as

κ∇2_{T (r, ξ}j

)= −p(r, ξj, T ), (21)

κ∂T (rbs, ξ j₎ ∂ nbs + hbsT (rbs, ξ j₎_{= f} bs(rbs). (22)

Here, T (r, ξj) and p(r, ξj, T ) are the deterministic temperature and power density profiles with the sampling point ξj, respectively. Since the power density profile in Eq. (21) is temperature dependent, a deterministic electrothermal solver is summarized in Figure 6 and built to obtain each T (r, ξj).

The implementation of the developed deterministic electrothermal solver is illus-trated in Figure 7. The accumulated area of each gate type in each simulating tem-perature grid can be precalculated and stored in the precalculation stage. With this precalculated data, the deterministic power density profile for each sampling point in

H(q, NKL) can be obtained and updated in the order of O(NxNyNtype) during the

elec-trothermal simulation loop. Here, Nxand Nyare the division numbers of the simulation

grid along x− and y−directions, respectively, and Ntypeis the number of gate types for

the given design. Generally, Ntypeis determined by the specific cell library, and it is far

less than the number of simulation grids, NxNy. After obtaining or updating the

deter-ministic power density profile, an efficient deterdeter-ministic thermal simulator [Huang and Lee 2009] is adopted to solve the deterministic heat transfer equation. These updating and solving procedures are repeated until the result is converged.

6_{If high-order approximation is needed for the accuracy, we suggest using the extrema of the Chebyshev}

polynomial because the nested sparse grid structure is always preserved. Hence, the number of sample points can be smaller.

(14)

Fig. 6. Deterministic electrothermal solver for each sampling point (_ξj) in the sparse grid. pleak, pd, and p

are the leakage, dynamic, and total power density profiles for each sampling point, respectively.

Placement of Gates

Pre-calculation Stage

Post-calculation Stage

Simulation Grids … …

Accumulate the area of each gate type in a grid

…

Obtain Power Profile

Obtain Temperature Profile

Power consumption in a grid is got by summing all products of the accumulated area of each gate type and its power density with the updated temperature and process parameters of sampling point

Temperature in a grid is calculated by an efficient generalized integral transform based analytical thermal simulator that can analyze one-million-grid temperature profile within 0.13 second

… …

Electro-thermal Loop

Fig. 7. Implementation of solving the deterministic heat transfer equation.

4.1.3. Temperature Profile Construction by Using Polynomial Interpolation.With each obtained

T (r, ξj), the polynomial interpolation technique can be applied to construct the in-terpolating polynomial for the statistical temperature. As suggested by Barthelmann et al. [2000], the Lagrange polynomial can be applied to construct the interpolating polynomialQi1_{(T )}⊗ · · · ⊗ QiNKL_{(T ) for each different}|i|. However, the suggested

inter-polation method requires performing the cross-product operation of functions; this can be slightly complicated for the implementation. Therefore, we adopt Newton’s inter-polating method to globally interpolate T (r, ξ), because it can be implemented more easily and can interpolate the same polynomial as Barthelmann’s method [Phillips 2003]. Therefore, the Smolyak’s error bound can still be preserved.

(15)

Fig. 8. Stochastic collocation-based statistical interpolation polynomial generation algorithm.

With Newton’s interpolation formula, the on-chip temperature at an arbitrary posi-tion r∗of the die can be approximated as

T (r∗, ξ) = NH−1 j=0 ˆ uj(r∗)φj(ξ). (23)

Here, eachφj(ξ) is a fundamental polynomial with respect to the jth sampling vector

ξj

, and the form of each φj(ξ) can be found in Phillips [2003]. NH is the number of

sampling vectors in the sparse gridH(q, NKL). Each ˆuj(r∗) is an unknown coefficient

that needs to be determined.

Based on the basic idea of interpolation that the approximated function must match each known data, the interpolated polynomial in Eq. (23) satisfies the following equa-tion for eachξn.

NH−1

j=0

ˆ

uj(r∗)φj(ξn)= T (r∗, ξn). (24)

With the property of fundamental polynomial described in [Phillips 2003], Eq. (24) can be rewritten as the matrix form for finding each ˆuj(r∗) at position r∗.

⎡ ⎢ ⎢ ⎢ ⎢ ⎣ φ0(ξ0) 0 · · · 0 φ0(ξ1) φ1(ξ1) · · · 0 .. . ... . .. ... φ0(ξNH−1) φ1(ξNH−1) · · · φNH−1(ξ NH−1₎ ⎤ ⎥ ⎥ ⎥ ⎥ ⎦ ⎡ ⎢ ⎢ ⎢ ⎣ ˆ u0(r∗) ˆ u1(r∗) .. . ˆ uNH−1(r∗) ⎤ ⎥ ⎥ ⎥ ⎦= ⎡ ⎢ ⎢ ⎢ ⎢ ⎣ T (r∗, ξ0) T (r∗, ξ1) .. . T (r∗, ξNH−1₎ ⎤ ⎥ ⎥ ⎥ ⎥ ⎦ (25)

Each ˆuj(r∗) can be calculated by using the forward substitution.

The algorithm for generating the statistical interpolation polynomial of on-chip tem-perature distribution is shown in Figure 8.

4.2. Thermal Yield Profile Estimation Engine

With the generated statistical interpolation polynomial of on-chip temperature distri-bution, the mean, standard deviation, and skewness profiles of on-chip temperature

(16)

distribution are computed. After that, the temperature at each arbitrary position is approximated to be a corresponding skew normal random variable by the moment matching technique. Finally, the on-chip thermal yield profile is estimated by looking up the cumulative distribution function (CDF) table of those corresponding skew nor-mal random variables. The detailed description of this thernor-mal yield profile estimation engine is shown next.

As mentioned in Section 2.1, the on-chip thermal yield profile at an arbitrary position r∗ of the die can be defined as

T yield(r∗, Tspec(r∗))def= Prob(T (r∗, ξ) ≤ Tspec(r∗)). (26) With the definition given in Eq. (26), our target is to approximate the CDF of T (r∗, ξ). To obtain the approximated expression of T (r∗, ξ), the formulation level l is set as 1 for generating the sparse gridH(q, NKL) with q= NKL+ l in Eq. (20). Then, applying

Newton’s interpolating method, the approximated expression of T (r∗, ξ), shown in Eq. (23), can be rewritten as7

T (r∗, ξ) = NKL k=1 ˆak(r∗)ξk2+ ˆbk(r∗)ξk + ˆc(r∗₎_, ₍₂₇₎

where ˆak(r∗), ˆbk(r∗), and ˆc(r∗) are the coefficients and can be obtained by performing

the algorithm shown in Figure 8.

After several manipulations, Eq. (27) can be rewritten as T (r∗, ξ) = NKL k=1 ˆak(r∗)χk(r∗, ξk)+ ˜c(r∗). (28) Here, eachχk(r∗, ξk)= (ξk+ ˆ bk(r∗) 2 ˆak(r∗))

2_{is a non-central chi-square random variable, because}

ξkis a normal random variable, ˜c(r∗)= ˆc(r∗)−

NKL

k=1 ˆ

b2

k(r∗)

4 ˆak(r∗) is a constant, andχk(r

∗_{, ξ}

k)’s

are independent becauseξk’s are independent. Therefore, T (r∗, ξ) is a weighted sum of

independent non-central chi-square random variables.

The estimation of Eq. (26) can be done by calculating the CDF of T (r∗, ξ) represented

in Eq. (28). Sinceξ is an independent normal random vector, theoretically, the PDF of T (r∗, ξ) could be obtained by convolving the PDFs of χk(r∗, ξk)’s. However, it is not

practical because of numerous numerical convolutions. The moment matching-based CDF estimation techniques are another choice for efficiently approximating the CDF of T (r∗, ξ). APEX [Li et al. 2004], a state-of-the-art method, approximates the CDF of

a random variable with the similar form of Eq. (27) by linearly combining exponential waveforms and can achieve an arbitrarily required matching order of statistical mo-ments. Pad´e approximation is essential during performing APEX, although it cannot guarantee being stable for obtaining poles/zeros, even in the low-order approximation. To remedy this unstable issue, the technique proposed by Tutuianu et al. [1996] can be adopted to obtain the first two dominated pole/zero pairs for APEX. However, the first two dominated pole/zero pairs only can construct an approximated CDF of T (r∗, ξ) that

7_{To get a more accurate approximated expression of T (r}∗_{, ξ), one can set the formulation level l as 2 to capture}

the cross-product terms ofξk’s in the second-order polynomial approximation. As shown in Appendix C, with cross-product terms of_ξk’s, the second-order polynomial approximated expression of T (r∗, ξ) can be

transformed to the form that is consistent with Eq. (27). Therefore, the proposed thermal yield profile estimation engine can be extended to the second-order polynomial approximation that has cross-product terms ofξk’s.

(17)

Fig. 9. Sketch of PDF for the weighted sum of two independent non-central chi-square random variables. Case 1: the convolution result of two right-skewed distributions or two left-skewed distributions. Case 2: the convolution result of one left-skewed distribution and one right-skewed distribution.

matches up to the first two statistical moments. Refer to Li et al. [2004] and Tutuianu et al. [1996] for the details of APEX and the stable two-pole technique, respectively.

Here, we are going to develop a moment matching-based technique to match the statistical moments of T (r∗, ξ) up to the third order for approximating the CDF of

T (r∗, ξ). The basic idea of this approach is to approximate a random variable with a

unimodal and skewed PDF by matching its mean, variance, and skewness to be a skew-normal random variable. To explain the unimodal PDF property of T (r∗, ξ), the sketches

of PDFs corresponding to two different cases for the weighted sum of two independent non-central chi-square random variables are shown in Figure 9. Case 1 shows the convolution result of two right-skewed distributions or two left-skewed distributions, and Case 2 presents the convolution result of one left-skewed distribution and one right-skewed distribution. Although, depending on the leading coefficients, the skewness of resulting random variables might increase or decrease, both resulting random variables have unimodal PDFs. Since T (r∗, ξ) is the weighted sum of independent non-central

chi-square random variables, its PDF can be obtained by performing the convolution of two random variables successively. Therefore, it still has a unimodal PDF.

As indicated in Azzalini [2005], the skew-normal random variable is suitable for approximating the random variable with unimodal and skewed PDF. Hence, the skew-normal random variable can be a suitable model for approximating T (r∗, ξ).

From the representation of Eq. (28), the thermal yield at an arbitrary locationr∗can be approximated as T yieldr∗, Tspec(r∗) ≈ Prob T (r∗, ξ) ≤ Tspec(r∗) = Prob T (r∗, ξ) ≤ ρ_T(r∗), (29) where ρ_T(r∗)= Tspec(r∗)−μT (r∗) σ T(r∗) , and  T (r ∗_{, ξ) =} T ( r∗,ξ)−μT (r∗) σT (r∗) . μT (r ∗_{) and} _σ T(r∗) are the

mean and the standard deviation of T (r∗, ξ), respectively; they can be computed in the

order of O(NKL), sinceχk(r∗, ξk)’s are independent.

To approximate T (r∗, ξ) to be a skew-normal random variable, Z ∼ SN(υr∗, ω_r∗, α_r∗), the parametersυr∗, ω_r∗, andα_r∗ [Azzalini 2005] need to be calculated. The first three

(18)

moments of Z can be formulated as follows with these parameters. E{Z} = υr∗+ ωr∗δr∗, (30) Var{Z} = ω2 r∗ 1− δ2_r∗ , (31) Skew{Z} = 4− π 2 δ3 r∗ 1− δ2 r∗ 3, (32) where δr∗ = 2 π αr∗ 1+ α2 r∗ . (33)

After matching the first three moments of T (r∗, ξ) with Eqs. (30)–(32), we have

υr∗ = −ωr∗δr∗, (34) ωr∗ = 1 1− δ_r2∗, (35) αr∗ = π 2 δr∗ 1−π₂δ2 r∗ , (36) where δr∗= γ 2 3  T(r∗) _4−π 2 2 3 + γ 2 3  T(r∗) . (37)

Here,γ_T(r∗) is the skewness of T (r∗, ξ), and the sign of δr∗ is the same as the sign

ofγ_T(r∗).8

To obtainγ_T(r∗), E{ T3_(r∗_{, ξ)} is needed and can be calculated as}

E{ T3(r∗, ξ)} =E{ T 3_(r∗_{, ξ)} − 3σ}2 T(r∗)μT (r∗)− μ 3 T(r∗) σ3 T(r∗) , (38)

whereμ_T(r∗) andσ_T(r∗) are the mean and standard deviation of T (r∗, ξ), respectively.

As shown in Eq. (38), E{ T3_(r∗_{, ξ)} is needed. However, the computational complexity} of obtaining its value is O(N3

KL) if the expression of T (r∗, ξ) shown in Eq. (28) is directly

used. To reduce the complexity, T (r∗, ξ) can be rewritten as

T (r∗, ξ) = NKL k=1 ˆak(r∗) ˆχk(r∗, ξk)+ ˆd(r∗), (39)

8_{Theoretically, the skew-normal random variable has an upper-bound skewness that can be achieved. In}

Appendix B, we examine the preceding stability issue, and address a method that can stably achieve the more higher-order approximation if more accurate approximation is required.

(19)

where ˆχk(r∗, ξk)= χk(r∗, ξk)− μχk(r∗), ˆd(r∗) = ˜c(r∗)−

NKL

k=1 ˆak(r∗)μχk(r∗), and μχk(r∗) =

E{χk(r∗, ξk)}. Since ˆχk(r∗, ξk)’s have zero mean and are independent, we have

E ⎧ ⎨ ⎩ _N KL k=1 ˆak(r∗) ˆχk(r∗, ξk) i⎫_⎬ ⎭ = NKL k=1 ˆa_ki(r∗)E χˆ_ki(r∗, ξk) ! , (40) for i = 1, 2, and 3.

Therefore, E{ T3_(r∗_{, ξ)} can be obtained in the order of O(N}

KL), and the computational

complexity of evaluating E{ T3_(r∗_{, ξ)} is also O(N}

KL).

Withυr∗,ωr∗, andαr∗, T yield(r∗, Tspec(r∗)) can be estimated by the CDF of the matched skew-normal random variable. Finally, we have

T yield(r∗, Tspec(r∗))≈ (βr∗)− 2TOwen(βr∗, αr∗). (41)

Here, (·) is the CDF of the standard normal random variable, T_Owen(·) is Owen’s T function [Azzalini 2005], andβr∗ = ρT (r

∗_)−υr∗

ωr∗ .

With Eqs. (34)–(36) and Eq. (41), T yield(r∗, Tspec(r∗)) can be efficiently evaluated by the lookup table method.

4.3. Mixed-Mesh Strategy for Enhancing the Statistical Electrothermal Simulator

As stated in Section 4.1.3, the deterministic electrothermal solver presented in Figure 6 needs to be executed N_Htimes to generate the statistical interpolation polynomial of on-chip temperature distribution shown in Eq. (23) with the level-1 Smolyak sparse grid formula. Hence, although the developed thermal yield profile estimation engine can be done efficiently, the total runtime for obtaining the thermal yield profile is still dominated by the statistical interpolation polynomial generation. Here, we will present a mixed-mesh strategy to speed up the statistical interpolation polynomial generator without sacrificing the accuracy of the estimated thermal yield profile.

The developed mixed-mesh strategy is inspired by the following observations. The statistical interpolation polynomial generator needs to perform the deterministic elec-trothermal solver once for calculating the temperature profile with nominal device parameters and execute the deterministic electrothermal solver N_H-1 times for ob-taining temperature variations corresponding to the nominal temperature profile. The temperature profile from the first part contributes the major portion of the mean pro-file of temperature distribution, and the temperature variations from the second part contribute a large portion of the variance and skewness profiles of temperature dis-tribution. In practice, the magnitude of the mean temperature profile is larger than those of the variance and skewness profiles of temperature distribution, since process variations of parameters are usually within a controllable range.

Based on the preceding observations, the mixed-mesh strategy for generating the statistical interpolation polynomial of on-chip temperature distribution is illustrated in Figure 10. Since the mean profile contributes a major portion to the thermal yield profile, the nominal temperature profile is built by the deterministic electrothermal solver with a fine mesh having NFNFgrids to preserve the estimation accuracy of the mean temperature profile. Then, the difference between the maximum and minimum temperature values,Tmax, of the nominal temperature profile is calculated, and an allowable temperature resolution, Tres, is chosen. After that, the remaining N_H− 1 de-terministic electrothermal simulations are executed with a coarse mesh having NCNC grids. Here, NC is equal toTmax/Tres. Finally, the mean, variance, and skewness profiles of on-chip temperature distribution can be approximated by the generated

(20)

Fig. 10. The sketch of mixed-mesh strategy for generating the statistical interpolation polynomial of on-chip temperature distribution.

statistical interpolation polynomial, and these temperature profiles are utilized to cal-culate the thermal yield profile.

With the proposed mixed-mesh strategy, the runtime for generating the statistical interpolation polynomial of on-chip temperature distribution can be significantly re-duced. In this work, an effective deterministic thermal simulator [Huang and Lee 2009] is adopted as the kernel of our developed deterministic electrothermal solver.9_The com-putational complexity of the deterministic thermal simulator presented in Huang and Lee [2009] is O(NMNMlog NBase). Here, NMNMis the mesh size, and NBaseis the number of bases for expressing the deterministic temperature profile. Generally, by using the average chip temperature calculated by the iterative computation scheme of the 1D thermal model shown in Figure 3 to be the initial operating temperature, the number of electrothermal loops for achieving convergence can be less than a small value. Hence, the computational complexity of the developed deterministic electrothermal solver is also O(NMNMlog NBase).

Therefore, the computational complexity of our baseline algorithm (the fine mesh is used for each deterministic electrothermal simulation) stated in Figure 4 is

O(N_HNFNFlog NBase), and the computational complexity by utilizing the developed mixed-mesh strategy can be reduced to O((NFNF+ (NH− 1)NCNC) log NBase).

The computational complexity ratio of the developed mixed-mesh strategy for gener-ating the statistical interpolation polynomial to the deterministic electrothermal solver with nominal device parameters is equal to 1+(N_H−1)×(NC/NF)2. In our experimental results, an accurate thermal yield profile can be estimated with N_H = 53, NF = 128,

NC = 16, and Tres = 0.65◦C. The computational complexity ratio is 1.8125. There-fore, the mixed-mesh strategy does enhance the efficiency of the developed statistical electrothermal simulator for catching up with that of a deterministic electro-thermal simulator.

5. EXPERIMENTAL RESULTS

The developed statistical electrothermal simulator is implemented in C++ language and tested on a Linux system with Intel Xeon 3.0-GHz CPU and 32GB memory. The die size is 2.5 mm× 2.5 mm × 0.5 mm. The junction depth is set to 20nm for the 65nm

9_{As reported in Huang and Lee [2009], it took only 0}_{.13 seconds to obtain the temperature profile of a chip}

(21)

Fig. 11. Power map of the test chip: (a) floorplan, (b) geometries of the die and package, (c) the mean profile of the power map, (d) the standard deviation profile of the power map. Here, Lx and Lyare the width and

length of the test die, respectively.

technology [Lallement et al. 2004], and the Debye length is 2nm [Bienacel et al. 2004]. The floorplan of a test chip having 1.2 million functional gates is shown in Figure 11(a), and the geometries of chip and package are shown in Figure 11(b).

By applying the modeling skill of thermal parameters mentioned in Figure 3 of Section 3.1 and the modeling skill for both heat transfer paths mentioned in Huang et al. [2006], the thermal conductivity and the equivalent heat transfer coefficients of the primary and secondary heat flow paths for executing the deterministic thermal simulator [Huang and Lee 2009] are summarized in Table III. The boundary condition of each vertical surface is set to be isothermal [Huang and Lee 2009].

The device parameters, the truncation points of KL expansions for the channel length (NL) and the oxide thickness (Ntox), and the number of device modeling grid (NKLg) are

summarized in Table IV. Both NLand Ntox are decided by using the criterion stated in

Eq. (16) withε = 1%. To model the spatial correlation, both ηx/Lx andηy/Lyare set to

0.98 for the correlation function shown in Eq. (9) [Cline et al. 2006]. The active layer of the test chip is divided into 128× 128 simulated grids.

The estimated mean and standard deviation profiles of the power map under the settings of 60% WID and 40% D2D variations to the total variations are shown in Figures 11(c)–(d), respectively.

(22)

Table III. Equivalent Thermal Parameters

Parameter Value

κ 104.6 W/(m·◦C)

hp 12,000 W/(m2_·◦_C) hs 2,017 W/(m2_·◦_C) κ: the thermal conductivity of the die.

hp: the equivalent primary heat transfer coefficient.

hs: the equivalent secondary heat transfer coefficient. Table IV. Parameters and Truncation Points for the Channel Length

and Oxide Thickness

Nominal L Nominal tox 3σL 3σtox NL Ntox NKLg

65nm 1.5nm 12% 5% 13 13 49

5.1. Electrothermal vs. Nonelectrothermal Simulations

With the number of grids being 100 for modeling device parameters and the ratios of WID and D2D variations to the total variations being 60% and 40%, respectively, the MC method with 2× 104_{samples is employed to demonstrate essentialness of the} electrothermal simulation loop. The estimated mean and standard deviation profiles of the on-chip temperature distribution with and without considering the temperature-dependent issue of leakage power are shown in Figures 12(a) and 12(c), and Fig-ures 12(b) and 12(d), respectively. For the mean profile estimation, the difference be-tween Figure 12(a) and Figure 12(b) is over 16%. For the standard deviation profile estimation, the difference between Figure 12(c) and Figure 12(d) is over 31%. These results indicate that statistical electrothermal analysis is essential.

5.2. Accuracy and Efficiency

This section is going to demonstrate the correctness and efficiency of the developed statistical electrothermal simulator and show its efficiency improvement by using the mixed-mesh strategy.

Given three different ratio pairs of the WID and D2D variations to the total varia-tions, (WID, D2D)= (40%, 60%), (50%, 50%), or (60%, 40%), the results from 2 × 104 MC simulations, which satisfy the maximum absolute relative error of variance is less than 1%, are utilized as the reference solution.

5.2.1. Mean and Standard Deviation Estimation.To demonstrate the accuracy and efficiency of the developed statistical interpolation polynomial generator, the level-1 Smolyak sparse grid formula with the sampling points being the roots of HPs is built. The deterministic electrothermal solver needs to be executed 53 times, since the number of sampling points (N_H) for physical parameters to obtain the level-1 Smolyak sparse grid formula is equal to 2× (NL+ Ntox)+ 1, and both NLand Ntox are calculated to be

13, as shown in Table IV. The size of each simulation mesh is 128× 128.

The maximum absolute errors of mean and standard deviation profiles are presented in Table V. The first two columns indicate the ratio pairs of WID and D2D variations to the total variations. Compared with 2× 104_{MC simulations, the maximum absolute} errors of the estimated mean and standard deviation profiles from the developed sta-tistical interpolation polynomial generator are shown in the fifth and sixth columns, respectively. As shown in Table V, the maximum absolute errors are less than 3.0% for all three different ratio pairs.

To fairly compare the runtime, the MC simulation is performed till achieving the same accuracy of standard deviation as the developed statistical interpolation polyno-mial generator. The number of MC simulations is shown in the #Samples column, and

(23)

Fig. 12. Results of the MC method with or without considering the electrothermal effect: (a) and (b) are the mean temperature profiles with and without considering the electrothermal effect, respectively; (c) and (d) are the standard deviation profiles of temperature distribution with and without considering the electrothermal effect, respectively.

the Runtime column denotes the runtime for both methods. According to the Speedu column, the developed statistical interpolation polynomial generator can be orders of magnitude faster than the MC method. Under the ratio pair (WID, D2D)= (60%, 40%), the developed statistical interpolation polynomial generator takes 2.74 seconds to gen-erate the interpolation polynomial of temperature profile for the 128× 128 simulation mesh. It contains 0.47 seconds for executing 53 deterministic electrothermal simula-tions, and 2.27 seconds to generate the interpolation polynomials after the 53 sampling temperature profiles are obtained.

With the ratio pair (WID, D2D)= (60%, 40%), the estimated mean and standard deviation profiles of on-chip temperature distribution are shown in Figures 13(a) and 13(b), respectively. The error distributions of the mean and standard deviation of on-chip temperature distribution are presented in Figures 13(c) and 13(d), respectively.

(24)

Table V. Accuracy and Efficiency Comparison of the Developed Statistical Interpolation Polynomial Generator and the MC Method

MC Method Developed Statistical Interpolation Polynomial Generator

WID W2D _{Maximum Maximum Standard} _{Speedup (}_×)

Ratio Ratio #Samples Runtime (s) Mean Error Deviation Error Runtime (s)

40% 60% 6,921 442.94 0.91% 2.70% 2.68 165.2

50% 50% 7,011 448.70 0.91% 2.68% 2.72 164.9

60% 40% 7,031 449.98 0.90% 2.72% 2.74 164.2

†The error is calculated by comparing with 2 × 104_{MC simulations.}

“Runtime” does not include the time for parsing input files.

Fig. 13. Estimated mean and standard deviation profiles of on-chip temperature distribution with the ratio pair (WID, D2D)_{= (60%, 40%): (a) and (b) are the estimated mean and standard deviation profiles,} respectively; (c) and (d) are the error distributions of the estimated mean and standard deviation profiles compared with the MC method, respectively.