Full-Chip Thermal Analysis for the Early Design Stage via Generalized Integral Transforms


Pei-Yu Huang and Yu-Min Lee, Member, IEEE

Abstract—The capability of predicting the temperature profile is critically important for timing estimation, leakage reduction, power estimation, hotspot avoidance, and reliability concerns during modern IC design. This paper presents an accurate and fast analytical full-chip thermal simulator for early-stage temperature-aware chip design. By using generalized integral transforms (GITs), an accurate formulation is derived to estimate the full-chip temperature distribution with a truncated set of spatial bases that needs only a small number of truncation points. We then develop a fast Fourier transform (FFT)-like evaluation algorithm to evaluate the derived formulation efficiently. Experimental results confirm that the proposed GIT-based analyzer achieves an order-of-magnitude speedup over a highly efficient Green’s function-based thermal simulator. Finally, we propose a 3-D IC thermal simulator and demonstrate its efficiency and accuracy.

Index Terms—Circuit simulation, generalized integral transforms (GITs), physical design, simulation, thermal analysis, 3-D IC.

I. INTRODUCTION

THE power density of VLSI circuits increases monotonically as CMOS technology scales down. The power dissipated by the circuits is converted into heat, which raises the die temperature and induces hot spots. These thermal phenomena significantly degrade the performance and reliability of circuits [1]–[16]. For example, the resistance of copper interconnect increases by 39% as the temperature rises from 20 °C to 120 °C, and the mean-time-to-failure of the interconnect decreases exponentially as the temperature increases [1]. To precisely predict the thermal impact on design performance, an efficient and accurate thermal analyzer is necessary in the temperature-aware design flow, because it is usually part of the simulation kernel in the optimization loop and needs to be executed numerous times.
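The 39% figure quoted above is consistent with the usual linear temperature-coefficient model of resistance. The sketch below is only an illustration: the coefficient value `ALPHA_CU_PER_C` is an assumed handbook-style number for copper near 20 °C, and the function name is hypothetical, not something taken from [1].

```python
# Linear model: R(T) = R_ref * (1 + alpha * (T - T_ref)).
# ALPHA_CU_PER_C is an assumed typical value for copper, not a value from the paper.
ALPHA_CU_PER_C = 0.0039


def resistance_ratio(temp_c, ref_temp_c=20.0, alpha=ALPHA_CU_PER_C):
    """Return R(temp_c) / R(ref_temp_c) under the linear coefficient model."""
    return 1.0 + alpha * (temp_c - ref_temp_c)


if __name__ == "__main__":
    increase = resistance_ratio(120.0) - 1.0
    print(f"relative resistance increase from 20 to 120 degC: {increase:.2f}")
```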

Thermal simulators can be categorized into two classes: numerical and analytical methods. The numerical methods use the finite difference method or the finite-element method (FEM) to transform the heat equations into resistance–capacitance (RC) network equations. Based on the RC network equations, several methods have been proposed to reduce the runtime. Wang et al. [2] utilized the alternating-direction-implicit method to split the equivalent RC system into different alternating directions, and alternately performed the line smooth scheme in each direction. In [3], the model order reduction technique was employed to improve the efficiency of transient analysis. Li et al. [4] applied the multi-grid method to speed up the convergence rate of iterative methods, and developed an order reduction scheme to save the runtime of dynamic thermal simulation. Because of its flexibility in dealing with complicated structures, the numerical framework is the mainstream in back-end design stages such as post-layout thermal verification.

Manuscript received August 22, 2007; revised December 31, 2007. First published March 10, 2009; current version published April 22, 2009. This work was supported in part by the National Science Council of Taiwan and by the SoC Research Center of NCTU.

The authors are with the Department of Communication Engineering, National Chiao Tung University, Hsinchu 300, Taiwan, R.O.C. (e-mail: [email protected]).

Digital Object Identifier 10.1109/TVLSI.2008.2006043

As pointed out in [1], [5], and [6], temperature-aware design should be brought to early design stages such as thermal-aware floorplanning and placement. To give a reasonably accurate temperature prediction with little computational effort, [1] proposed a compact thermal model that models the package and interconnect layers as effective heat transfer coefficients for the boundary conditions of the die. With the modeled heat transfer coefficients for the heat sink, prelayout interconnect, and package, an efficient numerical thermal simulator recently developed by Yong et al. [7] is well suited to early temperature-aware design stages. Because their simulator applies an adaptive discretization algorithm in the spatial and temporal domains to analyze the temperature profile without degrading the accuracy, the number of temperature variables and simulation time steps can be significantly reduced.

The other category of thermal simulators suitable for early design stages is the analytical method. The primary advantage of analytical approaches is that they avoid volume meshing of the entire substrate and have closed-form representations of the temperature distribution of the entire die. Hence, they can flexibly obtain the temperature distribution of user-specified regions without performing the thermal simulation for the entire die. Furthermore, based on the closed-form representations, fast temperature evaluation of the die can be achieved for early design stages.

One analytical thermal solver is the Green’s function-based method [6]. First, the steady-state Green’s function of the chip with a unit impulse power source is calculated. After that, the steady-state temperature distribution for an arbitrary power source map is obtained by convolving the Green’s function with the power density distribution using a table lookup method. To enhance the efficiency for a large number of power sources, the authors of [6] used a series of cosine waveforms to approximate the power density map, and the temperature map of all grid cells was cast into the form of a discrete cosine transform (DCT). Although their computational

cost is , where and are numbers of


lation. However, the dynamic thermal analysis is also necessary while performing the dynamic thermal management and run-time thermal analysis [1], [4], [7]. Moreover, as indicated in [6], a large number of truncation points for the Green’s function is usually required to achieve an accurate solution.

To overcome these shortcomings, our major contributions are as follows.

• Compared with a highly efficient Green’s function-based method [6], we improve the bound on the error decay rate of the analytical solution for the steady-state temperature distribution, and we provide a transient temperature simulation by utilizing generalized integral transforms (GITs) [17]–[19] to construct a set of spatial bases and calculate their time-varying coefficients. The proposed method can accurately estimate the full-chip temperature distribution with very few truncation points of the spatial bases. The experimental results presented in Section V show that the truncation points can be far fewer than the numbers of grid divisions without losing accuracy compared with [6].

• We develop a fast Fourier transform (FFT)-like evaluation algorithm to efficiently evaluate the temperature map of all grid cells, and its computational cost is in the order of , where and are truncation points of the bases in the x- and y-directions, respectively.

• We build an efficient 3-D IC thermal simulator by combining the GIT and numerical schemes, and its efficiency and accuracy are demonstrated by experimental results. Moreover, this hybrid scheme can be used to obtain a more accurate temperature distribution by considering the different thermal conductivity of each stacked layer for the primary and secondary heat flow paths.

This paper is organized as follows. First, the thermal model for early design stages is presented in Section II. The GIT-based computational formula for the full-chip thermal simulation and the proposed evaluating algorithms are described in Section III. After that, the hybrid scheme of GIT-based thermal simulation method for the 3-D ICs and the package structures is addressed in Section IV. Finally, the experimental results and conclusions are given in Sections V and VI, respectively.

II. THERMAL MODELING FOR EARLY DESIGN STAGES

A compact thermal structure of the chip, as illustrated in Fig. 1, can be used for early design stages. This model consists of three portions [1]: the primary heat flow path, the secondary heat flow path, and the heat transfer characteristic of each macro/block on the silicon die. The primary heat flow path is composed of the thermal interface material, heat spreader, and heat sink. The secondary heat flow path contains the interconnect layers, input/output (I/O) pads, and the printed circuit board (PCB). The functional blocks are modeled as many power generating sources attached to a thin layer close to the top surface of the die, with a thickness equal to the junction depth.¹ The major concerns of early-stage temperature-aware optimization are to reduce the temperature or the thermal gradient of the die. Here, we focus on estimating the temperature distribution.

¹ Because the major part of the current passes only through the channel, this approximation is more reasonable than distributing the power generating sources over the entire die.

Fig. 1. Compact thermal model for early design stages.

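Fig. 2. Energy conservation law and the heat conduction equation. $dE/dt$ is the energy change rate of the unit volume and is equal to $\sigma\,\Delta x\,\Delta y\,\Delta z\,\partial T/\partial t$. The conduction heat flowing into the unit volume is the sum of $q|_{x} = -\kappa\,\Delta y\,\Delta z\,\partial T/\partial x|_{x}$, $q|_{y} = -\kappa\,\Delta x\,\Delta z\,\partial T/\partial y|_{y}$, and $q|_{z} = -\kappa\,\Delta x\,\Delta y\,\partial T/\partial z|_{z}$. The conduction heat flowing out of the unit volume is the sum of $q|_{x+\Delta x} = -\kappa\,\Delta y\,\Delta z\,\partial T/\partial x|_{x+\Delta x}$, $q|_{y+\Delta y} = -\kappa\,\Delta x\,\Delta z\,\partial T/\partial y|_{y+\Delta y}$, and $q|_{z+\Delta z} = -\kappa\,\Delta x\,\Delta y\,\partial T/\partial z|_{z+\Delta z}$. $p\,\Delta x\,\Delta y\,\Delta z$ is the energy generation rate of the unit volume. $\kappa$ and $\sigma$ are the thermal conductivity and the product of the material density and the specific heat, respectively. $p$ is the power consumption density of the unit volume.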

According to the energy conservation law, the rate of change of energy in a unit volume of the substrate equals the net conduction heat through the unit volume plus the energy generated in it [17]. Fig. 2 illustrates this heat conduction mechanism. Based on this mechanism, the temperature of the die is governed by the following heat transfer equations [2], [4], [5], [7]:

(1)

(2)

Here, , is the thermal conductivity [W/(m·°C)] of the die, is the product of the material density and the specific heat [J/(m³·°C)] of the die, is the power density of the heat source (W/m³), is the dimension of the die, and are the lateral lengths of the die, is the thickness of the die, is any specific boundary surface of the die, is the heat-transfer coefficient on , is an arbitrary function on , and is the differentiation along the outward direction normal to .
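For orientation, the heat conduction equation with a convective boundary condition is commonly written in the form below, which matches the quantities defined in this paragraph. The symbols $\kappa$, $\sigma$, $p$, $h_b$, $f_b$, and $n_b$ are assumed notation chosen for this illustration and may differ from the symbols used in (1) and (2).

```latex
% Assumed notation (illustrative only):
%   T       temperature,            \kappa  thermal conductivity,
%   \sigma  density x specific heat, p       power density of the heat source,
%   h_b, f_b  heat-transfer coefficient and prescribed function on boundary surface b,
%   n_b     outward normal direction of boundary surface b.
\begin{align}
  \sigma \frac{\partial T(\mathbf{r},t)}{\partial t}
    &= \nabla \cdot \bigl(\kappa \nabla T(\mathbf{r},t)\bigr) + p(\mathbf{r},t), \\
  \kappa \frac{\partial T(\mathbf{r},t)}{\partial n_b} + h_b\, T(\mathbf{r},t)
    &= f_b(\mathbf{r},t) \quad \text{on boundary surface } b .
\end{align}
```

Setting $h_b = 0$ and $f_b = 0$ on a surface gives the adiabatic condition, which is the special case applied later to the vertical surfaces of the die.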


To provide reasonable accuracy of the temperature estimation with little computational effort during the early-stage temperature-aware optimization procedures, the heat-transfer coefficients on the boundary surfaces of the die should be appropriately modeled [1]. Based on the model proposed in [1], the heat transfer coefficients of the primary heat flow path can be combined into an effective heat transfer coefficient by merging the effect of each component on the primary heat flow path. Since the detailed layout of interconnects is not available in early design stages, each interconnect layer is modeled as an equivalent thermal resistance based on the densities and the regular-structure assumption of the metal and dielectric material [1]. Furthermore, the I/O pads and PCB can also be modeled as an effective thermal resistance by using the technique proposed in [20], [21]. With these modeled thermal resistors, the technique shown in [15] can be utilized to find the equivalent heat-transfer coefficient of these successively connected thermal resistors. After and have been obtained, ’s for the top and bottom surfaces are set to and , respectively [15], [20]. Here, is the ambient air temperature. Because of the chip and package structures, the area of the vertical surfaces is smaller than the area of the horizontal surfaces, and the thermal conductivity of air is much less than the thermal conductivities of the primary and secondary heat flow paths. Therefore, the boundary condition of each vertical surface can be set to be adiabatic [6].

Generally, the values of and are temperature dependent. The difference in peak temperature is about 5 °C between the result with temperature-dependent thermal parameters and the result with constant thermal parameters at 25 °C [7]. In current VLSI designs, the on-die temperature can be on the order of 100 °C. In this situation, the difference may lead to about 5% error in the peak temperature of the die. Since the effort to correct this error is relatively high,² for practical purposes these thermal parameters are usually treated as appropriate constants while performing temperature-aware floorplanning and placement [5], [11]–[14].

The value of each thermal parameter can be found by applying the simplified 1-D thermal model shown in Fig. 3 to estimate the approximate average temperature rise of the die with respect to the room temperature. After that, the thermal parameters are calculated at the average temperature. Please see Section V-A for the details of using the 1-D thermal model. By using these estimated thermal parameters, the error in the peak temperature can be reduced.
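The 1-D estimate can be sketched as a small resistance network. The topology below is an assumption made for illustration: the heat source is taken to sit on one face of the die, the secondary path leaves directly from that face, and the primary path is reached through the full die thickness. The function and parameter names are hypothetical, and the numbers in the usage lines are made-up inputs, not values from Section V-A.

```python
def average_rise_at_source(power_w, area_m2, thickness_m, kappa,
                           h_primary, h_secondary):
    """Average temperature rise above ambient at the heat-source plane.

    Assumed network: surface resistances 1/(A*h) on both faces and a die
    conduction resistance D/(A*kappa) in series with the primary path.
    """
    r_die = thickness_m / (area_m2 * kappa)        # conduction through the die
    r_primary = 1.0 / (area_m2 * h_primary)        # effective primary-path resistance
    g_primary_branch = 1.0 / (r_die + r_primary)   # series branch as a conductance
    g_secondary_branch = area_m2 * h_secondary     # equals 1 / R_secondary
    return power_w / (g_primary_branch + g_secondary_branch)


if __name__ == "__main__":
    # Illustrative inputs: 10 W, 1 cm^2 die, 0.5 mm thick, silicon-like conductivity.
    rise = average_rise_at_source(10.0, 1e-4, 5e-4, 150.0, 1e4, 1e3)
    print(f"estimated average rise: {rise:.2f} degC")
```

The thermal parameters would then be evaluated at ambient plus this rise, which is the step the paragraph above describes.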

With the above models, the heat diffusion equations for the rising temperature, , of the die in early design stages can be rewritten as

(3) (4) (5)

² To calibrate the difference caused by the temperature dependence of the thermal parameters, several iterations of the thermal simulation need to be executed.

Fig. 3. Simplified 1-D thermal model for estimating the approximate average temperature of the die. The modeled thermal resistance network is shown on the right-hand side. Each of the two surface resistors has the form 1/(A h), with h being the effective heat-transfer coefficient of the corresponding surface, and the die resistor is the conduction distance D divided by the product of A and the thermal conductivity. T(z) is the average rising temperature with respect to the room temperature on the lateral planes at an arbitrary z position of the die. Here, the die resistor can be viewed as a variable resistor when obtaining T(z) at a certain z position. P is the total average power consumption of the die. A is the cross-sectional area of the die normal to the z-direction, and D is the thickness of the die.

(6)

Here, and are the thermal conductivity and the product of the material density and the specific heat of the die, respectively, obtained by using the approximate average temperature, and the initial condition is .

By discretizing the power generating source of the die along the x- and y-directions into grid cells, as shown in Fig. 1, where and are the numbers of divisions in the x- and y-directions, respectively, the power density profile in (3) can be written as

; . (7)

Here, , is the junction depth of the device, is an indicator function with nonzero value being 1 only when is in , , , and are indices of divisions, and is the power density waveform of grid cell in the thin layer with thickness .

For the dynamic thermal simulation, is a user-specified time-interval function whose magnitude in each interval equals the average power density of that time interval. Note that the thermal time constant of heat conduction is much larger than the clock period of the circuit [2], [16]. As indicated in [16], the temperature takes at least 100 K cycles to rise 0.1 °C. In practice, the time interval specified by the user can therefore be much larger than the clock period of the circuit. Moreover, when calculating the steady-state temperature, the input power profile is usually set to the steady power profile (the average power profile estimated over a very long time period) [1], [2], [4], [6], [7]. Therefore, can be reasonably viewed as a step function whose magnitude equals its average power density over a long time period.
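A minimal sketch of how a fine-grained power trace for one grid cell could be reduced to the interval-averaged waveform described above. The function names and the sample trace are hypothetical; the paper does not prescribe this preprocessing step.

```python
def interval_average_power(trace, samples_per_interval):
    """Collapse a fine-grained power-density trace into per-interval averages."""
    if samples_per_interval <= 0:
        raise ValueError("samples_per_interval must be positive")
    averages = []
    for start in range(0, len(trace), samples_per_interval):
        window = trace[start:start + samples_per_interval]
        averages.append(sum(window) / len(window))
    return averages


def steady_state_power(trace):
    """Long-run average used as the magnitude of the steady-state step input."""
    return sum(trace) / len(trace)


if __name__ == "__main__":
    per_cycle = [1.0, 3.0, 2.0, 2.0, 5.0, 7.0]   # made-up samples for one grid cell
    print(interval_average_power(per_cycle, 2))
    print(steady_state_power(per_cycle))
```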


Fig. 4. Executing flow of the proposed GIT-based thermal simulator.

With the previous discussion and the governing equations (3)–(6), our goal is to obtain the distribution of the temperature rise of the die relative to the ambient temperature.

III. FULL-CHIP THERMAL SIMULATION

The execution flow of our GIT-based thermal simulator is summarized in Fig. 4. After the chip geometry, package configuration, and power density of grid cells are given, the compact thermal model described in Section II is built.

Then, the GIT-based computational formulas for the full-chip temperature distribution are derived. As shown in the first major block (computational formulas construction) of Fig. 4, three steps are involved in constructing the formulas. First, a set of appropriate bases is generated by a system-compatible auxiliary problem.³ After that, the temperature distribution can be expressed by these bases with suitable time-varying coefficients. With Galerkin’s scheme [17], [19], those time-varying coefficients can be found from an uncoupled system for estimating the temperature in the sense of least-squares residual approximation. Finally, the formula for the average temperature of each specific grid cell is obtained by averaging the temperatures over that grid area.

After the temperature computational formulas are derived, we develop two efficient FFT-like evaluation algorithms, 2D-LTS-FFT and 2D-STL-FFT, as shown in the second major block (fast temperature evaluating algorithm) of Fig. 4, to get the transformed coefficients for the power density map of grid cells and the desired temperature distribution, respectively.

In reality, the leakage power of the chip is temperature dependent. Although our simulation flow does not include this issue, it can be easily handled by combining the temperature-power iterative framework [1], [7], [10] with the proposed algorithm; the details are presented in Appendix III.

³ Several guidelines [17]–[19] need to be followed for choosing this auxiliary problem. First, the auxiliary problem should be as similar as possible to the original problem. Second, the generated bases have to be completely orthonormal to ensure the convergence in mean of the approximated temperature distribution. Finally, the orthonormal bases should be time independent for efficiency.

In the rest of this section, we discuss each sub-block of the two major blocks in Fig. 4, the error bound decay rates of [6] and of our GIT-based formula for the average steady-state temperature distribution, and the dynamic thermal simulation.

A. Auxiliary Problem for Generating Appropriate Spatial Bases

The auxiliary problem can be introduced by considering the homogeneous problem in which the temperature distribution satisfies (3)–(6) with . As stated in [17]–[19], the auxiliary problem can be set to be the following Sturm–Liouville problem with specific boundary conditions:

(8)

(9)

(10)

(11)

The solutions of the Sturm–Liouville problem form a set of completely orthonormal spatial bases for the die, and the general forms of and can be obtained as follows [17]:

(12)

(13)

where , and are non-negative integers, is the normalized value being equal to , , , with and , ,

(14)

and

(15)

Here, each is a positive value which satisfies

(16)

To obtain each , we apply the Newton–Raphson method [22] to (16) with the initial guess of each being

because the period of the right-hand side in (16) is equal to . Each is called an eigenfunction, is its eigenvalue, and , , and are the eigenvalues in the x-, y-, and z-directions, respectively. The physical meaning of is that it represents the free vibration of the system described by (8)–(16), and its vibration frequencies are , , and in the x-, y-, and z-directions, respectively. The physical meaning of is that it represents the spectral magnitude of .
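The root-finding step can be illustrated on a transcendental eigenvalue condition of the same kind. The sketch below assumes the textbook condition for a slab with convective boundaries on both faces, tan(b) = b (B1 + B2) / (b^2 - B1 B2), where b is the eigenvalue scaled by the slab thickness and B1, B2 are Biot numbers; this may not be the exact form of (16). It also uses a bracketed Newton iteration with a bisection fallback on intervals of width pi, which is a safeguarded variant and not necessarily the initial-guess strategy described above. All names are hypothetical.

```python
import math


def slab_eigenvalues(biot_a, biot_b, count, tol=1e-12, max_iter=200):
    """First `count` positive roots of tan(b) = b*(Ba+Bb) / (b^2 - Ba*Bb)."""
    s = biot_a + biot_b
    p = biot_a * biot_b
    if s <= 0.0:
        raise ValueError("at least one surface must have a positive Biot number")

    def g(b):
        # Pole-free form of the condition: g(b) = 0 at every eigenvalue.
        return (b * b - p) * math.sin(b) - b * s * math.cos(b)

    def dg(b):
        return (2.0 * b + b * s) * math.sin(b) + (b * b - p - s) * math.cos(b)

    roots = []
    for n in range(count):
        # g changes sign across (n*pi, (n+1)*pi); skip the trivial root at 0.
        lo = n * math.pi if n > 0 else 1e-9
        hi = (n + 1) * math.pi
        sign_lo = math.copysign(1.0, g(lo))
        x = 0.5 * (lo + hi)
        for _ in range(max_iter):
            gx = g(x)
            if gx == 0.0:
                break
            if gx * sign_lo > 0.0:
                lo = x
            else:
                hi = x
            slope = dg(x)
            x_new = x - gx / slope if slope != 0.0 else 0.5 * (lo + hi)
            if not (lo < x_new < hi):
                x_new = 0.5 * (lo + hi)   # Newton left the bracket: bisect instead
            if abs(x_new - x) < tol:
                x = x_new
                break
            x = x_new
        roots.append(x)
    return roots


if __name__ == "__main__":
    print(slab_eigenvalues(1.0, 0.0, 3))
```

With one adiabatic face (`biot_b = 0`) the condition reduces to b tan(b) = B, whose tabulated roots for B = 1 give a convenient sanity check.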

B. System Transformation for Time-Varying Coefficients

Since the generated bases are completely ortho-normal in the spatial domain of die, can be approximated as the following finite integral transform pair [17]–[19]:

(17)

(18)

where each is an unknown transformed time-varying coefficient, and , , and are the truncation points in the x-, y-, and z-directions, respectively.

After applying the energy conservation law and the divergence theorem [18] and carrying out a series of derivations, the following uncoupled system is established to find each time-varying coefficient function . The detailed derivation is given in Appendix I.

(19) where

(20)

Since (19) is uncoupled for different “ ”, each can be individually solved as

(21)

For the steady-state simulation, is a step function with its magnitude equal to the average power density of grid over a long time period. Thus, is set to infinity to find the steady-state value of , which is . Therefore, the steady-state temperature can be evaluated without any time stepping.
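The closed-form solution and its steady-state limit can be written out for a generic uncoupled first-order equation. The notation below ($\psi$, $a$, $b$) is assumed for illustration and stands in for a single coefficient, its decay rate, and its forcing term in (19); the zero initial value is also an assumption corresponding to a die that starts at ambient temperature.

```latex
% Assumed generic form of one uncoupled equation (illustrative notation).
\begin{align}
  \frac{d\psi(t)}{dt} + a\,\psi(t) &= b(t), \qquad \psi(0) = 0,\quad a > 0, \\
  \psi(t) &= \int_{0}^{t} e^{-a\,(t-\tau)}\, b(\tau)\, d\tau, \\
  b(t) = b_0 \ \text{(step input)} \;\Rightarrow\;
  \psi(t) &= \frac{b_0}{a}\bigl(1 - e^{-a t}\bigr)
  \;\xrightarrow[\,t \to \infty\,]{}\; \frac{b_0}{a}.
\end{align}
```

Because the limit is available in closed form, the steady-state coefficient follows from one division per basis function rather than from marching in time.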

C. Average Rising Temperature Evaluation of Grid Cells

Generally speaking, hot spots occur in regions close to power sources. Hence, we focus on evaluating the average temperature of each grid cell on the top surface of the die.⁴ First, we present the formulation for calculating the average rising temperature at steady state and discuss the decay rate of its truncation error. Then, the fast evaluation algorithms are developed for realizing the formulation. Finally, the dynamic thermal simulation is given.

⁴ Our method can be used to find the average temperature of each grid cell on an arbitrary lateral plane of the die by substituting a suitable z into the bases.

1) Steady-State Formulation: Plugging ’s and ’s into (17), the average rising temperature of steady state for each grid cell on the top surface is

(22)

where

(23)

(24)

and

(25)

(26)

where is the average power density of grid over a long time period, and and are the numbers of divisions in the x- and y-directions.

An error bound of calculated by (22) is given by Theorem 1 in Appendix II. Based on Theorem 1, the error decay rate of (22) is dominated by . To compare this error decay rate with that of the Green’s function-based method [6], we set the boundary conditions and power source location to be the same as in [6]. With these settings and Appendix II, the error decay rate of our GIT-based formulation is in the order of , and the error decay rate of [6] is in the order of . Therefore, the error decay rate of the proposed GIT-based method is faster than that of [6]. The reason is that the bases in the z-direction of the GIT-based method are different from those of [6], and our constructed bases can fully fill the eigenspace of the heat diffusion equation. This leads to different coefficients in the approximating form even though the bases in the x- and y-directions of our GIT-based method are the same as those of [6]. Furthermore, not only is the error decay rate of the proposed GIT-based method faster than that of [6], but the experimental results also show that it can maintain the same accuracy as [6] even if its truncation points, and , are far less than the numbers of divisions, and .


Fig. 5. Overview of using 2D-STL-FFT and 2D-LTS-FFT to evaluate the average rising temperature of grid cells.

Although the truncation points can be far less than the number of grid cells , there is no actual efficiency improvement over [6] if we directly apply the standard FFT to evaluate each . The reason is that the standard inverse fast Fourier transform (IFFT) needs to pad zeros to the input data when the dimension of the input data is less than the dimension of the output data, as in (22). Moreover, the dimension of the output data in the standard FFT is equal to the dimension of the input data, whereas the dimension of the output data in (25) is only , which is far less than its input dimension, . To overcome this limitation, we develop FFT-like fast evaluation algorithms for our GIT formulation in the next subsection.

2) Fast Evaluating Algorithms for GIT Formulation: To efficiently realize our formulation for the steady-state temperature distribution, we first derive a one-dimensional radix-two-based FFT-like algorithm for the case in which the output data are longer than the input data (1D-STL-FFT). Then, based on 1D-STL-FFT, we develop a 1-D FFT-like algorithm for the case in which the output data are shorter than the input data (1D-LTS-FFT). Next, we extend these 1-D algorithms to two 2-D algorithms by the row-column procedure, and we call them 2D-STL-FFT and 2D-LTS-FFT. Finally, these two algorithms are integrated to calculate (22) and (25). The computational complexity of our GIT-based thermal simulator can be analyzed to be only . An overview of these evaluation algorithms is shown in Fig. 5. Given the power density profile of the chip, 2D-LTS-FFT computes the transformed coefficients of the power density profile, and 2D-STL-FFT transforms these coefficients to obtain the average rising temperature of the grid cells.

a) 1D-STL-FFT: The prototype of 1D-STL-FFT is

(27)

where and are both powers of 2, , and ’s and ’s are complex input and output data with lengths equal to and , respectively.

Because the length of ’s is larger than the length of ’s, the zero-padding step for ’s used in the standard FFT algorithm needs to be avoided to save runtime. Therefore, the 1D-STL-FFT algorithm shown in Fig. 6 is developed to calculate (27) without zero padding.

Fig. 6. Procedure of 1D-STL-FFT. “Reverse-bit” means the reverse-bit algorithm [22].

In the beginning, the “ ” reorders the input data for those sub discrete Fourier transforms (DFTs) which will be generated by recursively performing the Danielson–Lanczos Lemma (DL-Lemma) [22] to the prototype of 1D-STL-FFT in (27). The DL-Lemma is used to rewrite the original DFT as the sum of two sub DFTs with half output length. One of the two is formed from the even-numbered points of the input data, and the other is formed from the odd-numbered points. In this step, the DL-Lemma is used recursively for these two sub DFTs. Because is less than , this bisecting procedure is executed only times, and we have bisecting levels.

After Line 2 in Fig. 6 is performed, the 1D-STL-FFT algorithm evaluates the output of those sub DFTs in the bottom level by using Lines 3–12, and performs Line 13 to get the output of the remaining levels.

An example with and is given in Fig. 7(a). There are three bisecting levels and four sub DFTs in the bottom level. After the reverse-bit algorithm is applied to the input data, two phases are executed. The first phase is done by using Lines 3–12 of Fig. 6. The second phase gets the output of the remaining levels by executing the bottom-up procedure of the standard FFT, as stated in Line 13 of Fig. 6.

The complexity of 1D-STL-FFT is since there are bisecting levels and each complexity is .
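The idea of computing a long output from a short input without zero padding can be illustrated with a generic input-pruned DFT. The sketch below is not the procedure of Fig. 6: it uses a different, well-known decomposition in which the output index is split as k = q * stride + r, so that each residue r needs one short FFT of twiddled input. The sign convention exp(-2*pi*i*n*k/M) and all names are assumptions.

```python
import cmath


def fft(x):
    """Plain recursive radix-2 FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def stl_dft(x, m_out):
    """m_out-point DFT of a short input (len(x) <= m_out), without zero padding."""
    m_in = len(x)
    if m_in & (m_in - 1) or m_out & (m_out - 1) or m_out < m_in:
        raise ValueError("lengths must be powers of 2 with m_out >= len(x)")
    stride = m_out // m_in
    out = [0j] * m_out
    for r in range(stride):
        # Pre-twiddle the input for output residue r, then run one short FFT.
        twisted = [x[n] * cmath.exp(-2j * cmath.pi * n * r / m_out)
                   for n in range(m_in)]
        sub = fft(twisted)
        for q in range(m_in):
            out[q * stride + r] = sub[q]
    return out


if __name__ == "__main__":
    print(stl_dft([1.0, 2.0, 3.0, 4.0], 8))
```

In this sketch the work is `stride` FFTs of the short length, so it grows with the output length times the logarithm of the input length rather than of the output length.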

b) 1D-LTS-FFT: The prototype of 1D-LTS-FFT is

(28)

where , and and are real input and complex output data with lengths being equal to and , respectively. Applying the DL-Lemma to the prototype of 1D-LTS-FFT for generating bisecting levels, can be written as the sum of sub DFTs. Each sub DFT has the same form as the 1D-STL-FFT with the lengths of input and output being equal to and , respectively.


Fig. 7. Computational flow graphs of 1D-STL-FFT and 1D-LTS-FFT withM = 8 and M = 16. (a) 1D-STL-FFT. (b) 1D-LTS-FFT. (c) 1D-LTS-FFT for negative frequencies.

Fig. 8. Procedure of 1D-LTS-FFT.

Two phases are utilized to evaluate , and the 1D-LTS-FFT algorithm is shown in Fig. 8. First, Line 2 applies the reverse-bit algorithm to the input data, and Lines 4–8 use the 1D-STL-FFT algorithm to obtain each bisected sub DFT. After each sub DFT has been computed, a bottom-up procedure is applied to the remaining bisecting levels to find , and the executing steps are from Line 9 to Line 24. An example with and is shown in Fig. 7(b). In the first phase, the input data are reordered by using the reverse-bit algorithm, and the reordered data are fed into the corresponding 1D-STL-FFT blocks. This can be done by using Lines 3–8 in Fig. 8. Then, the output of the top block in level 1 of the second phase is calculated by

(29)

and can be done in a similar way. Finally, is equal to

(30)

The second phase is summarized in Lines 9–24 of Fig. 8.

Fig. 9. Procedure of 2D-STL-FFT.

For the general case, the sub DFTs in each level of the second phase can be obtained by combining those sub DFTs of their previous level with a formula similar to (29), with 16 replaced by in each level. The computational complexity of the first phase is because the 1D-STL-FFT needs to be executed times, and each complexity is . The complexity is for the second phase. Hence, the computational complexity of 1D-LTS-FFT is .
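The opposite case, where only the first few output bins of a long transform are needed, can be sketched in the same spirit. The code below is a generic output-pruned DFT and not the procedure of Fig. 8: the long input is split into `stride` decimated subsequences, each is transformed by a short FFT, and the results are combined with twiddle factors. The sign convention and all names are assumptions.

```python
import cmath


def fft(x):
    """Plain recursive radix-2 FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def lts_dft(x, m_out):
    """First m_out bins of the len(x)-point DFT, with m_out <= len(x)."""
    m_in = len(x)
    if m_in & (m_in - 1) or m_out & (m_out - 1) or m_out > m_in:
        raise ValueError("lengths must be powers of 2 with m_out <= len(x)")
    stride = m_in // m_out
    out = [0j] * m_out
    for b in range(stride):
        # Subsequence x[b], x[b + stride], ... has exactly m_out samples.
        sub = fft(x[b::stride])
        for k in range(m_out):
            out[k] += cmath.exp(-2j * cmath.pi * b * k / m_in) * sub[k]
    return out


if __name__ == "__main__":
    samples = [float(i % 5) for i in range(16)]   # made-up real input
    print(lts_dft(samples, 4))
```

The first phase here is `stride` short FFTs and the second is a twiddled sum over them, which mirrors the two-phase structure described above without claiming to reproduce its exact data flow.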

c) Temperature Evaluation: The average rising temperature at steady state shown in (22) can be obtained as

(31) where is the real part operator, and

(32)

Here, , ,

, and each is equal to (23).

To obtain ’s, the values of ’s and ’s need to be known.

To calculate ’s, a row-column-based 2D-STL-FFT method is developed and shown in Fig. 9 by utilizing the


Fig. 10. Simulating algorithm of the proposed steady-state thermal simulator.

for each row of the input matrix, in which each entry is , and Lines 5–7 apply the 1D-STL-FFT to each column of the output matrix obtained from the row procedure to obtain the desired matrix, in which each entry is .

The complexity for obtaining ’s is because the complexities of the row and column procedures are and , respectively.
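The row-column procedure itself is independent of which 1-D routine is used. The sketch below shows the two passes with a naive partial DFT standing in for the fast 1-D algorithms; the function names, the sign convention, and the explicit `period` arguments are assumptions made for the illustration.

```python
import cmath


def dft_1d(values, n_out, period):
    """Naive partial DFT: n_out bins of a transform with the given period."""
    return [sum(v * cmath.exp(-2j * cmath.pi * n * k / period)
                for n, v in enumerate(values))
            for k in range(n_out)]


def row_column_dft(matrix, rows_out, cols_out, row_period, col_period):
    """2-D transform built from 1-D transforms: every row first, then every column."""
    # Row pass: transform along the column index of each input row.
    row_pass = [dft_1d(row, cols_out, col_period) for row in matrix]
    # Column pass: transform along the row index of each intermediate column.
    result = [[0j] * cols_out for _ in range(rows_out)]
    for c in range(cols_out):
        column = [row_pass[r][c] for r in range(len(matrix))]
        transformed = dft_1d(column, rows_out, row_period)
        for k in range(rows_out):
            result[k][c] = transformed[k]
    return result


if __name__ == "__main__":
    small = [[1.0, 2.0], [3.0, 4.0]]            # made-up 2 x 2 input
    print(row_column_dft(small, 4, 4, 4, 4))    # 4 x 4 output, no zero padding
```

Replacing `dft_1d` by a fast short-to-long or long-to-short routine changes the cost of each pass but not the structure of the two passes.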

To calculate each from (23), ’s need to be known from (25). Therefore, a 2-D prototype with a form similar to (28) is needed to get the related ’s for the input data being ’s. A row-column-based 2D-LTS-FFT algorithm can be constructed by using a procedure similar to that shown in Fig. 9, with the 1D-STL-FFT replaced by the 1D-LTS-FFT. The 2D-LTS-FFT method is then used to get those related ’s. However, (31) cannot be utilized to calculate ’s because the lengths of those related ’s in the row and column directions are less than and , respectively. Therefore, the complex conjugates of ’s are required to complete the calculation of ’s.

Fortunately, the complex conjugate of the output from each sub 1D-STL-FFT in calculating ’s can be directly obtained by reversing these sub DFTs. Therefore, the complex conjugates of ’s can be obtained by reversing the data of in Line 7 of Fig. 8 and performing Lines 9–24 in Fig. 8 during the row-column procedure of ’s.

The complexity of the row procedure for obtaining those related ’s is because the 1D-LTS-FFT needs to be executed times. The complexity of the column procedure is because the 1D-LTS-FFT needs to be executed times. Hence, the complexity for obtaining ’s is . The complexity for calculating the complex conjugates of ’s is since only the second phase needs to be recomputed. Therefore, the complexity for computing (25) is .

From the previous discussion, we conclude that the complexity of our GIT-based thermal simulator is . Finally, the complete proposed simulation algorithm is illustrated in Fig. 10.

3) Dynamic Thermal Simulation: While performing the dynamic thermal simulation, each can be modeled as a user-specified time-interval function with the magnitude in each interval equal to the average power in that time interval. By using (21), each time-varying coefficient is

(33)

where is the time step and is equal to the time interval of the power density waveforms, is equal to (20) with being equal to the average power density profile in the time interval , and .

After ’s are calculated, the average temperature of each grid cell at the sampling time can be obtained by (17) with the same evaluation method presented in Section III-C2. In addition, applying (33) to compute each does not induce any instability with a large because (33) is the exact solution of the system (19), i.e., it is free of the error caused by finite difference approximations such as the backward-Euler method, the trapezoidal method, and the Runge–Kutta method. Furthermore, since the thermal time constant of heat conduction is much larger than the clock period of the circuit [2], [16], the time step can be far larger than the clock period of the circuit to save runtime with acceptable errors.
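The benefit of an exact update can be seen on a single uncoupled equation. The sketch below assumes the generic form d(psi)/dt + a * psi = b with b held constant over each interval, which is a stand-in for one equation of (19) and not the paper's exact notation. It compares the exact exponential update with a forward-Euler step, chosen here only as a simple contrast; the names and numbers are hypothetical.

```python
import math


def exact_step(psi, decay_rate, forcing, dt):
    """Exact update of d(psi)/dt + a*psi = b over one interval with constant b."""
    decay = math.exp(-decay_rate * dt)
    return psi * decay + (forcing / decay_rate) * (1.0 - decay)


def forward_euler_step(psi, decay_rate, forcing, dt):
    """Explicit finite-difference update, shown only for comparison."""
    return psi + dt * (forcing - decay_rate * psi)


def simulate(step, decay_rate, forcings, dt, psi0=0.0):
    """Apply `step` once per interval; `forcings` holds one value per interval."""
    psi = psi0
    history = []
    for forcing in forcings:
        psi = step(psi, decay_rate, forcing, dt)
        history.append(psi)
    return history


if __name__ == "__main__":
    a, dt = 1000.0, 0.01              # a * dt = 10: a deliberately large step
    inputs = [2.0] * 5
    print(simulate(exact_step, a, inputs, dt))          # settles near b / a
    print(simulate(forward_euler_step, a, inputs, dt))  # oscillates and grows
```

Because the exact update composes, one step of length 2*dt gives the same value as two steps of length dt when the forcing is unchanged, so the step size can follow the power waveform rather than a stability limit.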

IV. THERMAL SIMULATION FOR 3-D ICS AND PACKAGE STRUCTURES

3-D ICs provide several advantages over 2-D ICs [8]. They provide flexibility in system design, placement, and routing, suitability for circuits operating at different supply voltages, and the capability of on-chip memory design. However, due to their high power density and poor heat-dissipation capability, the thermal issue is a major concern for 3-D ICs.

Recently, the tradeoff between circuit performance and the thermal issue in early-stage 3-D IC design has been studied by estimating the uniform average temperature of each layer [9], [10]. To take the thermal issue into account for early-stage 3-D IC design, Cong et al. [11], [12] applied the 1-D thermal model to predict the temperature cost for their floorplanning and placement algorithms, Goplen et al. [13] applied an FEM-based thermal simulator for their force-directed standard cell placement method, and Balakrishnan et al. [14] utilized the state-of-the-art numerical method provided in [2] to obtain the temperature distribution as the cost function for their global placement engine.

The 1-D thermal model can quickly capture the average temperature of the region close to cells for each active layer, but it loses the spatial temperature gradient. The numerical method must also compute the temperatures of unnecessary sampling points, such as points far from the device region, because the temperatures of sampling points depend on each other. To efficiently obtain the nonuniform temperature distribution without needing the temperatures of unnecessary sampling points, we develop a fast 3-D IC thermal simulator by combining the GIT and numerical schemes.

This hybrid scheme is developed for the structure of 3-D ICs in Fig. 11 by using the effective heat transfer coefficients for the primary and secondary heat flow paths. On the other hand, although different materials in the primary and secondary heat flow paths of Fig. 1 are described by effective heat transfer coefficients for fast temperature estimation, those materials should be modeled as an inhomogeneous structure when further accuracy is required. Since its structure is similar to the multilayer structure of 3-D ICs, its inhomogeneity can also be handled by the proposed 3-D IC thermal simulator.

Fig. 11. Schematic diagram of a 3-D IC with N chip layers.

As shown in Fig. 11, the structure of 3-D ICs is a multilayer structure formed by alternately stacking silicon and insulator layers [8]–[10]. The power sources are distributed in a thin layer close to the top surface of each active silicon layer in the -direction, and each insulator layer consists of Cu, ILD, and glue materials. The heat transfer equations of 3-D ICs can be built by combining the governing equations of each layer with suitable boundary conditions. The heat diffusion equation inside each layer is similar to (3) with the corresponding thermal parameters and . Here, is the layer index. The boundary conditions on the lateral surfaces are flux isolated, and the boundary conditions at and are of convection type with equivalent heat-transfer coefficients and for the primary and secondary heat flow paths, respectively. With the GIT technique, the governing equations of 3-D ICs can be transformed into a 1-D subproblem by utilizing the following ortho-normal spatial bases in the - and -directions

(34)

where , , , and with , . These ortho-normal spatial bases satisfy the following 2-D Sturm–Liouville problem:

(35)

where and . The boundary conditions of (35) are flux isolated and equal to (9) with replacing by .

Since ’s are ortho-normal spatial bases, the approximated rising temperature can be expressed as

(36)

where each is an unknown function, and needs to be found.

Combining the interface conditions (temperature continuity and heat flux conservation at the interface of two different layers), performing Galerkin’s scheme along the - and -directions, and using (35), each can be obtained by solving the following 1-D sub-problem:

(37) (38) (39) (40) (41)

where , is the layer index and , is the position of the th interface in the -direction, is the power density in the thin layer of the th active silicon substrate and is equal to zero when is even (insulator layer), and each .

Though the ortho-normal spatial bases in the -direction of the above 1-D sub-problem can be analytically solved by the sign-count method [17], or (37)–(41) can be directly solved by [23], their computational efforts⁵ are relatively high for practical purposes. Hence, we adopt the numerical scheme to obtain because its runtime is linear in the number of grid points along the -direction.

By discretizing this 1-D sub-problem along the -direction, the value of at each grid point in the -direction can be obtained by the following matrix equation:

(42)

where , ’s are positions of grid points in the -direction, , , and is the number of grid points. The ’s and are tri-diagonal and diagonal matrices, respectively, and is equal to .

When performing steady-state simulation, is a constant vector, and is a zero vector. Hence, can be obtained without the time step evaluation. Moreover, because each is tri-diagonal, each can be solved in linear time. After solving , the steady-state temperature of (36) at any position of grid point can be cast into the similar form developed for 2-D ICs, and the proposed fast evaluating method can be used to calculate the temperature.

⁵The complexity of sign-count method [17] for obtaining the ortho-normal spatial bases in the z-direction for each “il” is proportional to “#Layers × K ”. Here, N is the truncation number in the z-direction, and K is the sign-count iterations for obtaining the eigenvalue of each ortho-normal spatial basis. The complexity of using [23] to obtain the spatial bases in the z-direction for each “il” is extremely high because it needs a symbolic expression for the determinant of a #Layers × #Layers matrix and needs to perform the inverse Laplace transform.

Fig. 12. Accuracy and the maximum error trend of a test chip. (a) Floorplan; (b) geometries of the test chip; (c) power distribution; (d) the rising temperature distribution of the top surface of the die; (e) the relative error distribution; and (f) the maximum relative error versus truncation point.

The transient analysis can be done by applying time stepping to (42) to obtain the value of at each time step, and then using the proposed evaluating method to calculate the temperature at each time step. Note that each does not change after functional blocks are replaced. Hence, once the LU decompositions of ’s are done, they can be reused during the temperature-aware design flow.
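The linear-time solve and the reuse of the LU decomposition can be sketched as follows. The sketch factors a tri-diagonal matrix once without pivoting, which is assumed to be safe here because the example matrix is diagonally dominant, and then solves for two right-hand sides. The matrix entries and the function names are hypothetical and do not reproduce the matrices of (42).

```python
def factor_tridiagonal(lower, diag, upper):
    """LU-factor a tri-diagonal matrix without pivoting in O(n).

    lower[i] = A[i][i-1] (lower[0] unused), diag[i] = A[i][i],
    upper[i] = A[i][i+1] (upper[-1] unused).
    Returns the elimination multipliers and the pivots.
    """
    n = len(diag)
    mult, piv = [0.0] * n, [0.0] * n
    piv[0] = diag[0]
    for i in range(1, n):
        mult[i] = lower[i] / piv[i - 1]
        piv[i] = diag[i] - mult[i] * upper[i - 1]
    return mult, piv

def solve_factored(mult, piv, upper, rhs):
    """Solve A x = rhs in O(n) using a stored factorization."""
    n = len(piv)
    y = list(rhs)
    for i in range(1, n):                 # forward substitution
        y[i] -= mult[i] * y[i - 1]
    x = [0.0] * n
    x[-1] = y[-1] / piv[-1]
    for i in range(n - 2, -1, -1):        # backward substitution
        x[i] = (y[i] - upper[i] * x[i + 1]) / piv[i]
    return x

def tridiagonal_matvec(lower, diag, upper, x):
    """Multiply the tri-diagonal matrix by x (used only to check residuals)."""
    n = len(diag)
    out = []
    for i in range(n):
        value = diag[i] * x[i]
        if i > 0:
            value += lower[i] * x[i - 1]
        if i < n - 1:
            value += upper[i] * x[i + 1]
        out.append(value)
    return out

# Hypothetical diagonally dominant system with 8 grid points.
n = 8
lower = [0.0] + [-1.0] * (n - 1)
diag = [2.5] * n
upper = [-1.0] * (n - 1) + [0.0]

mult, piv = factor_tridiagonal(lower, diag, upper)   # done once
rhs_a = [1.0] * n                                    # e.g., one power profile
rhs_b = [float(i) for i in range(n)]                 # e.g., another power profile
sol_a = solve_factored(mult, piv, upper, rhs_a)      # factorization reused
sol_b = solve_factored(mult, piv, upper, rhs_b)
```

Only `solve_factored` has to be repeated when the right-hand side changes, which is the property exploited when the same matrices are reused across design iterations or time steps.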

V. EXPERIMENTAL RESULTS

We implement the proposed GIT-based thermal simulator and Algorithm II of a highly efficient Green’s function-based method [6] in C++. The state-of-the-art FFT package, FFTW3 [24], is used to realize the DCT and IDCT for [6]. All methods are tested on an HP xw9300 workstation with 16 GB of memory. The results are compared with those of ANSYS, a commercial computational fluid dynamics software package.

A. Accuracy and Fast Convergence of the GIT-Based Thermal Simulator

A chip, DEC Alpha 21264 [25], is employed to demonstrate the accuracy of our method, and its size is scaled down to 3.3 mm × 3.3 mm × 0.5 mm for the 65-nm technology. Its floorplan is shown in Fig. 12(a), and its die and package geometries are shown in Fig. 12(b). The equivalent thermal resistance of the package is set to be 45.5 °C/W [21].

The interconnect layer consists of 25% copper and 75% oxide with a thickness of 0.06 mm, and its effective thermal conductivity is 101 W/(m·°C). The thickness of the power source layer is set to be 20 nm, which is the nominal value of the device junction depth for the 65-nm technology [26]. The equivalent heat transfer coefficient of the primary heat flow path, , is 8700 W/(m²·°C) [6], and the equivalent heat transfer coefficient of the secondary heat flow path, , is 2017 W/(m²·°C).

To appropriately set the thermal conductivity of the die, we apply the 1-D thermal model shown in Fig. 3 to compute the average temperature of the die. To calculate the thermal resistance , we apply the formula stated in [2], [15] to obtain 10.55 °C/W. Here, is the cross-sectional area of the die along the -direction. The is the equivalent thermal resistance of the successively connected package and interconnect layers, which is equal to 45.52 °C/W. Initially, is calculated by using the thermal conductivity of the die at room temperature. After the thermal resistances , and are obtained, the average rising temperature of the die is equal to

(43)

where and can be obtained by using the 1-D thermal model.

The obtained from (43) is the exact average rising temperature of the die for its 1-D thermal model with the given thermal resistors. Once is calculated, is reset by using the thermal conductivity of the die at . This procedure is repeated until converges. Here, the room temperature is set to be 27 °C. With the above procedure, the average temperature is 90.9 °C, the thermal conductivity of the die is 113.5 W/(m·°C), and °C/W.
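The iterative procedure above is a fixed-point iteration on the die temperature, and it can be sketched as follows. Every modeling choice in the sketch is an assumption rather than the paper's formulation: the power-law conductivity fit in `silicon_conductivity`, the slab formula thickness/(k·area) for the die resistance, the parallel-network stand-in for (43) in `average_rise`, the assignment of the quoted 10.55 and 45.52 °C/W figures to the two heat flow paths, and the 7.5 W total power.

```python
def silicon_conductivity(temp_c):
    """Hypothetical power-law fit of silicon conductivity in W/(m*degC)."""
    temp_k = temp_c + 273.15
    return 150.0 * (300.0 / temp_k) ** (4.0 / 3.0)

def average_rise(power_w, r_die, r_primary, r_secondary):
    """Stand-in for (43): (R_die + R_primary) in parallel with R_secondary."""
    r_series = r_die + r_primary
    return power_w * r_series * r_secondary / (r_series + r_secondary)

def converge_die_temperature(power_w, thickness_m, area_m2, r_primary,
                             r_secondary, ambient_c=27.0, tol=1e-6, max_iter=100):
    """Repeat: conductivity at current temperature -> R_die -> new temperature."""
    temp_c = ambient_c
    for iteration in range(1, max_iter + 1):
        k = silicon_conductivity(temp_c)
        r_die = thickness_m / (k * area_m2)          # assumed slab resistance
        new_temp_c = ambient_c + average_rise(power_w, r_die, r_primary, r_secondary)
        if abs(new_temp_c - temp_c) < tol:
            return new_temp_c, k, iteration
        temp_c = new_temp_c
    raise RuntimeError("die temperature did not converge")

# Illustrative inputs: die geometry from the text, hypothetical total power.
die_temp_c, die_k, n_iter = converge_die_temperature(
    power_w=7.5, thickness_m=0.5e-3, area_m2=(3.3e-3) ** 2,
    r_primary=10.55, r_secondary=45.52)
```

Because the die resistance is small compared with the two path resistances in this stand-in network, the temperature depends only weakly on the conductivity and the iteration settles within a few passes.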

The top surface of the die is divided into 128 × 128 grid cells and the average power density profile is shown in Fig. 12(c). The average steady-state rising temperature distribution on the top surface of the die computed by the proposed method with the truncation points being 32 in each -, -, and -direction is shown in Fig. 12(d). The maximum relative error compared with the result of ANSYS is 0.24%, and its relative error distribution is shown in Fig. 12(e). The relative error of each grid cell is measured by

(44)

where is the average rising temperature of grid cell obtained by ANSYS. Note that the 65.14 °C obtained by the 1-D thermal model is consistent with the 65.15 °C obtained by the proposed GIT-based method. This verifies the ability of the 1-D thermal model to predict the average temperature of the entire die.

Fig. 13. Power density and temperature distribution of a 1 cm × 1 cm chip with one million functional blocks. (a) The power density distribution and (b) the rising temperature distribution.

To further demonstrate our fast error decaying rate, we plot the maximum relative errors with different truncation points in Fig. 12(f). The result shows that the proposed GIT-based analyzer can achieve an extremely accurate solution even when the truncation points are very small.

B. Thermal Simulation for the Full-Chip Containing Lots of Functional Blocks

To demonstrate the capability of the proposed GIT-based method for the thermal simulation of a full chip containing many functional blocks and the efficiency improvement over Algorithm II of [6], a test chip with dimensions of 1 cm × 1 cm × 0.5 mm and one million functional blocks is considered. The top surface of the chip is set to be adiabatic, and the power sources are assumed to be attached on the top surface of the die.⁶ The setting is consistent with the setting in [6]. Fig. 13(a) shows the power density distribution of the functional blocks in . The top surface of the chip is divided into 1024 × 1024 grid cells. The truncation points of our GIT-based method are and the truncation points of [6] are 2048 × 2048 to achieve the same maximum error level. The average rising temperature distribution of the top surface obtained by our GIT-based method is shown in Fig. 13(b), and the maximum error is 0.3576%, as presented in Table I.

The runtime comparison is shown in Table I. The runtime of the post-calculating stage in our method is 0.1312 s while the runtime of the post-calculating stage in [6] is 2.7642 s. The speedup of our method over [6] is 21.07× at the post-calculating stage. This result demonstrates the substantial efficiency improvement of our thermal analyzer over [6].

⁶The power sources which are attached on the top surface of the die can be easily handled by deriving the integral transform pair with the assumption of the plane-power density on the top surface of the die. The general solution can be found in [17]–[19].

TABLE I
ACCURACY AND RUNTIME COMPARISON OF THE PROPOSED GIT-BASED METHOD AND ALGORITHM II OF [6]

C. Accuracy and Efficiency of the GIT-Based Thermal Simulator for the 3-D IC Thermal Analysis

To demonstrate the accuracy of our GIT-based thermal simulator for 3-D ICs, three chip layers are stacked and the power sources are distributed in three thin layers with the thickness being equal to the device junction depth. The lateral dimension of each chip layer is 3.3 mm × 3.3 mm. The thicknesses of the insulator and silicon layers on both the top and middle chips are scaled down to 15 and 10 μm, respectively. The thicknesses of the insulator and silicon layers (including the substrate) for the bottom chip are 15 and 500 μm, respectively. The thermal parameters of each layer are taken from [9]. The top surface of each silicon layer is divided into 128 × 128 grid cells. The truncation point is 32 in each - and -direction, and the number of sampling points in the -direction is 10 for each layer. Compared with the result of ANSYS, our maximum error is 0.24%, which demonstrates the accuracy of our method for 3-D ICs.

To show the efficiency of our GIT-based method for the cell-level thermal analysis in 3-D ICs, the top surface of each silicon layer is divided into 1024 × 1024 grid cells to mimic 1.05 million power sources. The truncation point and the number of sampling points in the -direction are the same as in the case of 128 × 128 grid cells. The average power density profile of each silicon layer is shown in Fig. 14(a), (c), and (e). The estimated average steady-state rising temperature distribution on the top surface of each silicon layer is shown in Fig. 14(b), (d), and (f) from the top layer to the bottom layer. The runtime of our GIT-based method is 0.031 s for the pre-calculating stage (including the LU decomposition of each tri-diagonal matrix ). The runtime of the post-calculating stage is only 0.48 s (including 0.016 s for calculating each ).

VI. CONCLUSION

An accurate and efficient GIT-based thermal simulator has been presented. Experimental results confirm its theoretical property of achieving accurate results with sufficiently small truncation points. In the post-calculating stage, the proposed algorithm takes only 0.13 s for a chip with one million functional blocks and over one million grid cells, and 0.48 s for a 3-D IC with 3.146 million grid cells, to obtain an accurate steady-state temperature distribution. Therefore, the proposed GIT-based thermal simulator is very suitable for the thermal-aware design flow.

Fig. 14. Power density and temperature distribution of a test 3-D chip. (a)–(c) The power density distribution on the top surface of the top, middle, and bottom silicon layers, and (d)–(f) the temperature distribution on the top surface of the top, middle, and bottom silicon layers.

APPENDIX I

DERIVATION OF TIME-VARYING COEFFICIENTS FOR THE APPROXIMATED TEMPERATURE

To derive the uncoupled first-order differential equation (19) for each , both sides of (3) are multiplied by and integrated over the region of the die. After that, we have

(45)

where .

The inward and outward flows of (45) must be balanced to satisfy the energy conservation law. Therefore, by applying the Divergence Theorem [18] to the left-hand side of (45), we have

(46)

where is the surface integral of the boundary surface union of the die, and is the normal derivative on in the outward direction.

By applying the Divergence Theorem to the second term in the right side of (46), we have

(47)

Inserting (47) into (46), and then putting the result into (45), we have

(48)

By plugging (8), (18), (4)–(6), and (9)–(11) into (48), we have the uncoupled first-order differential equation (19) for each .

APPENDIX II

ERROR BOUND ANALYSIS OF THE GIT-BASED STEADY-STATE TEMPERATURE FORMULATION

To proceed with the error bound analysis of the GIT-based steady-state temperature formulation stated in Section III-C1, the following lemma is introduced.

Lemma 1: The magnitude of each time-varying coefficient at the steady state is bounded by

(49)

where is the total steady power consumption of the die and is the junction depth of the device.

Since the time domain waveform of the steady power profile can be treated as a step function, Lemma 1 can be easily proved by plugging (12) and (20) into (21), setting to be infinity, and performing several manipulations. With Lemma 1, an error bound of the GIT-based formulation is given by the following theorem.

Theorem 1: The absolute error of the average steady-state temperature for each grid cell by using the GIT-based formulation with truncation points of , and in -, -, and -directions is bounded by

(50)

where and , , , , , , , .

Proof: As pointed out in [18] and [19], (17) is convergent in mean as the truncation points tend to infinity. Hence, the absolute truncation error is bounded as

(51)

where . Plugging (12) into (51), utilizing Lemma 1, and performing several manipulations, we can get the error bound (50).

Since the decaying rate of is dominated by , the error decaying rate of the GIT-based steady-state temperature formulation stated in Section III-C1 is dominated by .

To compare the error bound of the GIT-based formulation with the error bound of [6] under the same boundary conditions and the plane-power source assumption, the error bound (50) can be simplified to

(52)

where , , , and .

The previous result shows that the error decaying rate of our GIT-based method can be in the order of .

On the other hand, the error bound of the Green’s function-based method shown in [6] can be similarly derived as

(53)

where , , , , , , and .

This bound shows that the error decaying rate of the Green’s function-based method [6] is in the order of .

Fig. 15. Temperature-power iterative framework for dealing with the temperature dependence issue of leakage power.

APPENDIX III

EXTENSION OF GIT FOR THE TEMPERATURE-DEPENDENCE ISSUE OF LEAKAGE POWER

By utilizing the temperature-power iterative frameworks [1], [7], [10], the proposed GIT-based method can be extended to consider the temperature-dependence issue of leakage power, as shown in Fig. 15. In the beginning, the power density profile is calculated at room temperature and is immediately updated by applying the temperature-power iterative framework to the 1-D thermal model of the chip before carrying out the detailed thermal simulation. Then, the temperature-power iterative framework is executed by repeatedly invoking the GIT thermal simulator and the power density calculator until they converge. Here, the precalculating stage only needs to be done once since it is independent of the power density profile.
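The loop of Fig. 15 can be sketched as a fixed-point iteration between a power calculator and a thermal solver. The sketch below is only illustrative: `thermal_solver` is a single lumped thermal resistance standing in for the GIT simulator, `leakage_power` is a hypothetical exponential leakage model, and all numerical values are assumptions.

```python
import math

def leakage_power(temp_c, leak_at_ref_w, ref_c=27.0, beta=0.02):
    """Hypothetical leakage model: grows as exp(beta * (T - T_ref))."""
    return leak_at_ref_w * math.exp(beta * (temp_c - ref_c))

def thermal_solver(total_power_w, r_th=8.0, ambient_c=27.0):
    """Stand-in for the thermal simulator: one lumped resistance in degC/W."""
    return ambient_c + r_th * total_power_w

def temperature_power_iteration(dynamic_w, leak_at_ref_w, tol=1e-4, max_iter=50):
    """Alternate power and temperature updates until the temperature settles."""
    temp_c = 27.0                       # start from room temperature
    for iteration in range(1, max_iter + 1):
        power_w = dynamic_w + leakage_power(temp_c, leak_at_ref_w)
        new_temp_c = thermal_solver(power_w)
        if abs(new_temp_c - temp_c) < tol:
            return new_temp_c, power_w, iteration
        temp_c = new_temp_c
    raise RuntimeError("no convergence (possible thermal runaway)")

# Hypothetical chip: 5 W dynamic power and 0.5 W leakage at room temperature.
final_temp_c, final_power_w, n_passes = temperature_power_iteration(5.0, 0.5)
```

In such a loop only the solver call is repeated with an updated power profile, which is consistent with the remark that the precalculating stage is independent of the power density and needs to run once.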

Remarks: By integrating (17) from to and converting the result to the form which is suitable for performing 2D-STL-FFT, a more accurate temperature estimation can be obtained. However, the difference between the top surface temperature distribution and the average temperature distribution for the power source layer is very small because the thickness of the power source layer is very small.

ACKNOWLEDGMENT

The authors would like to thank the National Center for High-Performance Computing of Taiwan for the computer time and facilities.

REFERENCES

[1] W. Huang, S. Ghosh, S. Velusamy, K. Sankaranarayanan, K. Skadron, and M. R. Stan, “HotSpot: A compact thermal modeling methodology for early-stage VLSI design,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 14, no. 5, pp. 501–513, May 2006.

[2] T.-Y. Wang and C. C.-P. Chen, “Thermal-ADI: A linear-time chip-level thermal simulation algorithm based on alternating-direction implicit (ADI) method,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 11, no. 4, pp. 691–700, Aug. 2003.

[3] T.-Y. Wang and C. C.-P. Chen, “SPICE-compatible thermal simulation with lumped circuit modeling for thermal reliability analysis based on model reduction,” in Proc. Int. Symp. Quality Electron. Des., 2004, pp. 357–362.

[4] P. Li, L. T. Pileggi, M. Asheghi, and R. Chandra, “IC thermal simulation and modeling via efficient multigrid-based approaches,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 25, no. 9, pp. 319–326, Sep. 2006.

[5] J.-L. Tsai, C. C.-P. Chen, G. Chen, B. Goplen, H. Qian, Y. Zhan, S.-M. Kang, M. D. F. Wong, and S. S. Sapatnekar, “Temperature-aware placement for SOCs,” Proc. IEEE, vol. 94, no. 8, pp. 1502–1518, Aug. 2006.

[6] Y. Zhan and S. S. Sapatnekar, “High efficiency Green function-based thermal simulation algorithms,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 26, no. 9, pp. 1661–1675, Sep. 2007.

[7] Y. Yang, Z. Gu, C. Zhu, R. P. Dick, and L. Shang, “ISAC: Integrated space-and-time-adaptive chip-package thermal analysis,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 26, no. 1, pp. 86–99, Jan. 2007.

[8] K. Banerjee, S. J. Souri, P. Kapur, and K. C. Saraswat, “3-D ICs: A novel chip design for improving deep-submicrometer interconnect performance and systems-on-chip integration,” Proc. IEEE, vol. 89, no. 5, pp. 602–633, May 2001.

[9] G. L. Loi, B. T. Agrawal, N. Srivastava, S.-C. Lin, T. Sherwood, and K. Banerjee, “A thermally-aware performance analysis of vertically integrated (3-D) processor-memory hierarchy,” in Proc. Des. Autom. Conf., 2006, pp. 991–996.

[10] H. Hua, C. Mineo, K. Schoenfliess, A. Sule, S. Melamed, R. Jenkal, and W. R. Davis, “Exploring compromises among timing, power and temperature in three-dimensional integrated circuits,” in Proc. Des. Autom. Conf., 2006, pp. 997–1002.

[11] J. Cong, G. Luo, J. Wei, and Y. Zhang, “Thermal-aware 3D IC placement via transformation,” in Proc. Asia South Pacific Des. Autom., 2006, pp. 780–785.

[12] J. Cong, J. Wei, and Y. Zhang, “A thermal-driven floorplanning algorithm for 3D ICs,” in Proc. Int. Conf. Comput.-Aided Des., 2004, pp. 306–313.

[13] B. Goplen and S. Sapatnekar, “Efficient thermal placement of standard cells in 3D ICs using a force directed approach,” in Proc. Int. Conf. Comput.-Aided Des., 2003, pp. 86–89.

[14] K. Balakrishnan, V. Nanda, S. Easwarm, and S. K. Lim, “Wire congestion and thermal aware 3D global placement,” in Proc. Asia South Pacific Des. Autom., 2005, pp. 1131–1134.

[15] Y.-K. Cheng, P. Raha, C.-C. Teng, E. Rosenbaum, and S.-M. Kang, “ILLIADS-T: An electrothermal timing simulator for temperature-sensitive reliability diagnosis of CMOS VLSI chips,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 17, no. 8, pp. 668–681, Aug. 1998.

[16] K. Skadron, M. R. Stan, K. Sankaranarayanan, W. Huang, S. Velusamy, and D. Tarjan, “Temperature-aware microarchitecture: Modeling and implementation,” ACM Trans. Arch. Code Opt., vol. 1, no. 1, pp. 94–125, Mar. 2004.

[17] M. D. Mikhailov and M. N. Ozisik, Unified Analysis and Solutions of Heat and Mass Diffusion. New York: Wiley, 1983.

[18] N. Y. Olcer, “On the theory of conductive heat transfer in finite region,” Int. J. Heat Mass Transfer, vol. 7, pp. 307–314, 1964.

[19] M. D. Mikhailov, “General solutions of the heat equation in finite regions,” Int. J. Eng. Sci., vol. 10, pp. 577–591, 1972.

[20] J. Parry, H. Rosten, and G. B. Kromann, “The development of component-level thermal compact models of a C4/CBGA interconnect technology: The Motorola PowerPC 603 and PowerPC 604 RISC microprocessors,” IEEE Trans. Compon., Packag., Manuf. Technol. A, vol. 21, no. 1, pp. 104–112, Mar. 1998.

[21] C. Lasance, H. Vinke, H. Rosten, and K.-L. Weiner, “A novel approach for the thermal characterization of electronic parts,” in Proc. IEEE Semi-Therm Symp., 1995, pp. 1–9.

[22] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C++. Cambridge, U.K.: Cambridge Univ. Press, 2002.

[23] X. Lu, P. Tervola, and M. Viljanen, “A novel and efficient analytical method for calculation of the transient temperature field in a multi-dimensional composite slab,” J. Phys. A: Math. Gen., vol. 38, pp. 8337–8351, 2005.

[24] M. Frigo and S. G. Johnson, “FFTW Version 3.1 Package,” 2004. [Online]. Available: http://www.fftw.org

[25] W. Liao, L. He, and K. Lepak, “Temperature-aware performance and power modeling,” UCLA, Los Angeles, CA, Tech. Rep. UCLA Eng. 04-250, 2004.

[26] F. Lallement, B. Duriee, A. Grouillet, F. Amaud, B. Tavel, F. Wacquant, P. Stalk, M. Woo, Y. Erokhin, J. Scheuer, L. Gadet, J. Weeman, D. Distaso, and D. Lenoble, “Ultra-low cost and high performance 65 nm CMOS device fabricated with plasma doping,” in Symp. VLSI Technol. Dig. Tech. Papers, 2004, pp. 178–179.

Pei-Yu Huang received the B.S. degree in electrical engineering from the National Taiwan University of Science and Technology, Taiwan, in 2004, and the M.S. degree from National Chiao-Tung University, Taiwan, in 2004, where he is pursuing the Ph.D. degree in the Department of Communication Engineering.

His research interests include computer-aided design of integrated circuits, thermal analysis, thermal optimization technique, and power grid analysis.

Yu-Min Lee (M’03) received the B.S. and M.S. degrees in communication engineering from the National Chiao-Tung University, Taiwan, in 1991 and 1993, respectively, and the Ph.D. degree in electrical and computer engineering from the University of Wisconsin-Madison, Madison, in 2003.

Since 2003, he has been an Assistant Professor with the Department of Communication Engineering, National Chiao-Tung University. His research interests include computer-aided design on VLSI circuits with emphases on interconnect analysis and optimization, and circuit/thermal/electro-thermal simulation.