Enhanced and Efficient DCT Based Thermal Model Construction and Its Application to Thermal-Aware Placement


National Chiao Tung University

Department of Electronics Engineering and Institute of Electronics

Master's Thesis

Enhanced and Efficient DCT Based Thermal Model Construction

and Its Application to Thermal-Aware Placement

Student: Yong-Zheng Wu

Advisor: Prof. Hung-Ming Chen


Student: Yong-Zheng Wu

Advisor: Hung-Ming Chen

National Chiao Tung University

Department of Electronics Engineering and Institute of Electronics

Master's Thesis

A Thesis

Submitted to Department of Electronics Engineering and Institute of Electronics

College of Electrical and Computer Engineering National Chiao Tung University

in Partial Fulfillment of the Requirements for the Degree of

Master of Science in

Electronics Engineering

July 2011

Hsinchu, Taiwan, Republic of China

July, Year 100 of the Republic of China (2011)

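ABSTRACT (Chinese)

In this thesis, we propose a fast and accurate thermal-aware analytical placer. The thermal model is built on the Green function and uses an enhanced DCT to generate the full-chip temperature profile. Compared with previous thermal-aware placers, our thermal model is tightly integrated with a force-directed placer. We propose a thermal spreading force based on a 2D Gaussian model, together with dynamic hot region control, to reduce the maximum chip temperature while optimizing both total half-perimeter wirelength (HPWL) and the on-chip temperature distribution.

Our thermal model has been verified against a recent commercial tool, with an average deviation within 6.5% and a 240x speedup. Compared with Capo and APlace2, our placer is 3-4x faster and achieves the same results. Experiments use the ISPD 2005 benchmarks, with designs of up to 2 million gates. The results are further evaluated with the GSRC evaluator for total HPWL and with the commercial tool ICEPAK for temperature distribution.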


ABSTRACT

In this thesis, we propose a fast and accurate thermal-aware analytical placer. The thermal model is constructed from the Green function and uses an enhanced DCT to generate the full-chip temperature profile. Unlike previous thermal-aware placers, our thermal model is tightly integrated with a flat force-directed placer. A thermal spreading force based on a 2D Gaussian model is proposed to reduce the maximum on-chip temperature with dynamic hot region size control, balancing total half-perimeter wirelength (HPWL) against the on-chip temperature distribution.

Our thermal model is verified against a recent commercial tool and has an average deviation within 6.5% with a 240x speedup. Our placer reaches the same quality as Capo and APlace2 with a 3-4x speedup. Experiments use the ISPD 2005 benchmarks, with designs of up to 2 million gates. The results are further evaluated using the GSRC Bookshelf Evaluator for total HPWL and ICEPAK for temperature distribution.


ACKNOWLEDGEMENTS

First and foremost, I would like to acknowledge the advice and guidance of my professor, Hung-Ming Chen. I also thank my lab mates for all their help and for taking care of me, especially Ph.D. Snieaulu for all his encouragement and suggestions. Special thanks go to Professor Yu-Min Lee and his Ph.D. student Pei-Yu Huang; without their knowledge and assistance this thesis would not have been successful.

The major experimental tools for this study were provided by the National Chip Implementation Center (CIC), Xin-Zhu, Taiwan. I would like to thank them for their support and cooperation, especially Mrs. Jing-Ru Chu, the research assistant for chip stack structure and thermal analysis; without her help and tutoring, the experimental results could not have been completed.

Finally, I thank my parents for supporting me throughout my studies, and I thank National Chiao Tung University (NCTU) for providing a very good and beautiful environment.


List of Tables

1.1 Previous work on thermal aware placement

4.1 Temperature profile comparison with Icepak

4.2 Placement results in ISPD 2005 benchmarks


List of Figures

2.1 Summary overall thermal analysis flow chart

2.2 Schematic of cross-sectional view of a VLSI chip with packaging

2.3 Illustration of the simplified chip model

2.4 The illustration of power density on meshing top surface of chip

2.5 The illustration of butterfly algorithm for DFT

2.6 (a) The original input sequence. (b) The even extension of original input.

2.7 The sequence of input patterns after reordering. (a) The original input sequence. (b) The sequence of vn and wn.

3.1 Placement of cells to demonstrate interleaving mechanism between legalization and CG for adaptec1.

3.2 Illustration of the legalization process.

3.3 Flow Diagram of our thermal aware placement.

3.4 Forces exerted on a cell by thermal anchor and legalizing anchor. Thermal anchor is placed on a circle and red rectangle is the identified hot region.

3.5 An illustration of 2-D Gaussian force spread model with center positioned at center of hot region.

4.1 (a) Power density of adaptec1. (b) Power density of bigblue3. (c) Temperature distribution of adaptec1. (d) Temperature distribution of bigblue3. (e) Temperature distribution of adaptec1 using Icepak before thermal aware placement. (f) Temperature distribution of adaptec1 using Icepak before thermal aware placement.


Chapter 1 Introduction

The demand to alleviate temperature variation and increase chip reliability grows as technology continues to advance. Recent studies have shown that chip performance and reliability can deteriorate significantly due to temperature variation. Evidence shows that a 10°C variation in temperature can induce a 5% increase in interconnect delay and a 25% increase in crosstalk-induced noise, and can reduce component lifetime by 50% [7][6].

On-chip temperature and its variation will only become more severe as technology advances. As node size scales down, components are placed more compactly, which entails higher current density per unit area. Shrinking interconnect dimensions increase interconnect resistance, which results in higher power and heat dissipation. Viewed in this light, the effect of chip temperature and its variation must be addressed when considering chip performance and reliability.

1.1 Previous Work

1.1.1 Placement

Current state-of-the-art placers fall into three categories: partition-based, linear force-directed and non-linear force-directed. Partition-based placers such as Capo [21] partition the given placement using the Fiduccia-Mattheyses algorithm in a top-down fashion. Although this approach does not require a model to approximate total HPWL, it trails linear and non-linear force-directed placers in both quality and run time.

Non-linear force-directed placers such as NTUPlace3 [9], mPL6 [5] and APlace2 [15] use the log-sum-exp model to approximate total HPWL. The log-sum-exp model is very accurate, but it is relatively complex, which lengthens run time, and it is patented, so a license is required to use it. To reduce run time, non-linear force-directed placers generally coarsen the given placement by clustering cells to reduce the problem size.

Linear force-directed placers such as RQL [24], simPL [16], FastPlace3 [25] and KraftWerk2 [22] use the bound-to-bound, star-net or clique model to approximate total HPWL. The simplicity of the linear force-directed model implies faster run time but lower accuracy than the log-sum-exp model. However, the recent placers in [24] and [16] slightly outperform all non-linear force-directed placers in total HPWL and are approximately 3-5 times faster.

1.1.2 Thermal Model

Recent thermal modeling approaches can be categorized as numerical or analytical. Numerical approaches generally use the finite-difference method (FDM) [12] or the finite-element method (FEM) [23]: the silicon substrate is meshed and a set of linear equations is solved to obtain the temperature profile. FDM discretizes the partial differential equation of heat conduction and uses the forward-difference method to approximate the temperature profile. FEM discretizes the temperature at each grid point in the design space, and the temperature at other points is calculated by interpolation. In general, numerical approaches achieve high accuracy at the expense of relatively long run time, which makes them suitable for post-layout thermal verification.

Analytical approaches solve the differential equation of the heat problem by first generating the fundamental solution for a unit heat source and then exploiting the linearity of the heat equation to obtain the general solution for the overall temperature distribution [27][26][13]. In contrast to numerical approaches, analytical approaches can quickly obtain an approximate solution in closed form without a large amount of meshing, which makes them suitable where an approximate solution is adequate.

Beyond these two approaches, there are two kinds of analysis to consider: transient-state and steady-state. Transient-state analysis is concerned with how temperature varies over time, whereas steady-state analysis is concerned with the temperature after it has stabilized in the long term. In this thesis, we consider the steady-state temperature.

1.1.3 Thermal Aware Placement

To implement a thermal aware placement, two main components are usually required: a thermal model to conduct full chip thermal analysis and a placement mechanism to consider the optimization between thermal effect and wirelength.

Table 1.1 summarizes previous work on thermal-aware placers by thermal model and placement algorithm. Tsai et al. [23] construct a lumped RC matrix to model substrate heat conduction and obtain the thermal profile using FDM; simulated annealing is then applied to spread out hot spots evenly. Kahng et al. [14] adopt the thermal model from [23] and integrate it into their previously published placer, APlace [15]. Chen et al. [8] simplify the model in [23] and apply partition-based placement to consider thermal effects. Goplen et al. [11] use FEM to conduct full-chip temperature analysis and use a linear force-directed placer based on the star-net model to mitigate thermal effects. Jing et al. [17] construct an RC equivalent matrix to model heat transfer and obtain the thermal profile using FDM; Fiduccia-Mattheyses partitioning is then applied as their placement strategy. Bernd et al. [19] propose a methodology that considers the impact of dynamic power density by adding a temperature cost obtained from FDM to the iterative annealing optimization.

Table 1.1: Previous work on thermal aware placement

Placement \ Thermal Model     FEM     FDM
Simulated Annealing           -       [23]
Partition-Based               [11]    [8],[17]
Linear Force Directed         -       [19]
Non-Linear Force Directed     -       [14]

1.2 Our Contributions

Despite past efforts, thermal-aware placement still falls short in terms of (i) accuracy of full-chip analysis, (ii) quality of placement and (iii) execution time. The accuracy of some thermal models is unknown because they have not been evaluated against an accurate model. Placement quality is deficient either because the algorithm greedily focuses on thermal distribution or because its placement methodology is relatively naive. Some methods are impractical for million-gate designs because simulated annealing or solving a complex thermal model requires long execution time. In addition, none of the thermal-aware placers we surveyed has been tested on a million-gate design.

In this thesis, we present a thermal-aware placer that reduces maximum temperature by combining an analytical thermal model with a linear force-directed placer. Both the thermal model and the placer are analytical, which makes the approach inherently fast. The run time bottleneck of Green function based thermal analysis is its DCT post-processing, and we reduce its complexity to O(N log N) by applying even extension and an input reordering algorithm. Our thermal model is verified against the commercial tool ANSYS Icepak; its deviation is within 6.5% and it is 242x faster.

The thermal model is integrated with a flat (no clustering required) global placer based on the linear force model, which is 3-4x faster than non-linear and partition-based placers with reasonable placement quality. We ran our thermal-aware placement on a set of open-source benchmarks released by IBM and evaluated the thermal profile using Icepak [1].

We propose the concept of a thermal anchor combined with dynamically resized scanning of the hot region to optimize the trade-off between maximum temperature and total HPWL. Experiments were performed with three kinds of weight adjustment (a linear force model, a quadratic force model and a quadratic Gaussian force model) to demonstrate the effectiveness of our thermal-aware placement.

The remaining chapters are organized as follows. Chapter 2 describes the analytical thermal model and Chapter 3 describes its application to placement. Chapter 4 presents the experimental results, and Chapter 5 concludes the thesis.


Chapter 2 Thermal Model

2.1 Analytical Thermal Model Based on Green Function

In the early stages of physical design, thermal analysis is concerned more with speed than with accuracy. We therefore base our thermal model on the Green function to perform thermal analysis quickly. The idea of the Green function approach is to first derive the fundamental solution and then use linear superposition to obtain the general solution for the overall chip temperature distribution.

(a) Thermal analysis flow

Figure 2.1: Summary overall thermal analysis flow chart

As shown in Figure 2.1, the thermal analysis flow is as follows. First, the fundamental basis solutions that satisfy the boundary conditions are generated from the homogeneous equation. Then, the general solution is obtained by applying the basis solutions to the Green function to express the power density and temperature distribution. Finally, the solution to heat conduction with many power sources takes the form of a DCT and an IDCT, so input reordering is adopted to reduce DCT and IDCT execution time.

The law of conservation of energy states that the total amount of energy in a system remains constant over time. The rate of change of heat energy in a unit volume therefore equals the sum of the conduction heat flowing into it and the power density of the heat sources within it. Hence, the temperature distribution inside the chip is defined as

$$\sigma \frac{\partial T(r, t)}{\partial t} = \nabla \cdot (\kappa \nabla T(r, t)) + p(r, t), \quad r \in D \tag{2.1}$$

which is also known as the heat diffusion equation. $T$ denotes temperature in Celsius (°C), $r = (x, y, z)$ is a point in the domain, $\kappa$ denotes thermal conductivity (W/(m·°C)) and $\sigma = c_p \rho$ has units of J/(m³·°C), where $c_p$ is the specific heat and $\rho$ is the density. $p(r, t)$ is the power density of the heat source (W/m³) and $D = (0, L_x) \times (0, L_y) \times (-L_z, 0)$ gives the dimensions of the die.

(a) Chip Model

Figure 2.2: Schematic of cross-sectional view of a VLSI chip with packaging

Figure 2.2 depicts a flip-chip model in which the top and bottom surfaces of the die are in contact with the printed circuit board (PCB) and the heat sink. $h_p$ denotes the primary heat flow to the heat sink and $h_s$ denotes the secondary heat flow to the PCB. The model is simplified as illustrated in Figure 2.3. The boundary conditions under this simplification are

$$\left.\frac{\partial T(r, t)}{\partial x}\right|_{x=0,L_x} = \left.\frac{\partial T(r, t)}{\partial y}\right|_{y=0,L_y} = 0 \tag{2.2}$$

$$\kappa \left.\frac{\partial T(r, t)}{\partial z}\right|_{z=-L_z} = h_p T(x, y, -L_z, t) \tag{2.3}$$

$$\kappa \left.\frac{\partial T(r, t)}{\partial z}\right|_{z=0} = -h_s T(x, y, 0, t) \tag{2.4}$$

Since steady-state temperature analysis is the primary concern in this thesis, temperature and power density are assumed to be stable. The temperature distribution equation (2.1) can then be written as the inhomogeneous equation

$$\nabla^2 T(r) = -\frac{p(r)}{\kappa} \tag{2.5}$$
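As a brief check of how (2.5) follows from (2.1), the step can be written out as below. It assumes that $\kappa$ is constant over the die, which the simplified model appears to imply but which is stated here as an assumption.

```latex
\begin{aligned}
\sigma \frac{\partial T}{\partial t} &= \nabla \cdot (\kappa \nabla T) + p
  && \text{(2.1)} \\
0 &= \kappa \nabla^2 T(r) + p(r)
  && \text{steady state: } \partial T / \partial t = 0,\ \kappa \text{ constant} \\
\nabla^2 T(r) &= -\frac{p(r)}{\kappa}
  && \text{(2.5)}
\end{aligned}
```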

Figure 2.3: Illustration of the simplified chip model

To solve the inhomogeneous equation subject to the boundary conditions, the basis solutions of the homogeneous equation that satisfy all boundary conditions are found first. The general solution for the temperature distribution in the space domain is then derived by substituting these basis solutions into the inhomogeneous equation (2.5).

2.1.1 Homogeneous Problem

In this subsection, the objective is to find the solution that satisfies the homogeneous equation and the inhomogeneous conditions at the top and bottom boundaries. The homogeneous equation is obtained by setting the external forcing term, which corresponds to the power density in (2.1), to zero. That is,

$$\sigma \frac{\partial T(r, t)}{\partial t} = \nabla \cdot (\kappa \nabla T(r, t)) \tag{2.6}$$

Let $\Theta(r, t)$ be the solution to the homogeneous equation. Substituting it into (2.6) gives

$$\frac{\partial \Theta(r, t)}{\partial t} = \gamma \frac{\partial^2 \Theta}{\partial r^2} \tag{2.7}$$

where $\gamma$ is a constant equal to $\kappa / \sigma$.

$\Theta(r, t)$ is a function of the space variable $r$ and the time variable $t$, which are independent of each other. Hence, $\Theta(r, t)$ can be separated into a function of $t$ and a function of $r$:

$$\Theta(r, t) = \tau(t)\,\upsilon(r) \tag{2.8}$$

Substituting (2.8) into (2.7) and gathering the time-dependent terms on one side and the space-dependent terms on the other gives

$$\frac{\tau'}{\tau} = \gamma \frac{\upsilon''}{\upsilon} \tag{2.9}$$

The left-hand side is a function of time $t$ only and the right-hand side is a function of space $r$ only, so both sides of (2.9) must equal the same constant. Denoting the constant as $-\alpha$ and letting $\alpha = \gamma \omega^2$ with $\omega > 0$ gives

$$\tau' + \alpha \tau = 0 \tag{2.10}$$

$$\upsilon'' + \omega^2 \upsilon = 0 \tag{2.11}$$

The fundamental solutions $\tau(t)$ and $\upsilon(r)$ that satisfy the homogeneous boundary conditions can then be obtained using the complex variable method.

$$\tau(t) = e^{-\alpha t} \tag{2.12}$$

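For $\upsilon(z)$, a combination of trigonometric functions is used to satisfy the inhomogeneous top and bottom boundary conditions. That is,

$$\upsilon(z) = \kappa \cos(\omega_z (z + L_z)) + \frac{h_p}{\omega_z} \sin(\omega_z (z + L_z)) \tag{2.14}$$

where $\omega_z$ satisfies

$$\frac{\kappa^2 \omega_z^2 - h_p h_s}{\kappa \omega_z (h_p + h_s)} = \cot(\omega_z L_z) \tag{2.15}$$

In the x and y directions, the boundary conditions (2.2) require $\omega L$ to be an integer multiple of $\pi$. The general solution of the homogeneous heat equation at infinite time is expressed as the sum of the basis solutions $\upsilon(r)$:

$$\Theta(x, y, z, \infty) = \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} \cos\left(\frac{i\pi}{L_x} x\right) \cos\left(\frac{j\pi}{L_y} y\right) \upsilon(z) \tag{2.16}$$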
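The condition (2.15) on $\omega_z$ has no closed-form solution, so in practice its roots would be found numerically. The following is a minimal sketch of one way to do that, not the procedure used in this thesis: it rewrites (2.15) in a pole-free form, scans for sign changes and refines each by bisection. The function names and all parameter values (`kappa`, `hp`, `hs`, `lz`, the scan range and step) are illustrative assumptions.

```python
import math


def eigen_residual(w, kappa, hp, hs, lz):
    """Residual of (2.15) multiplied by kappa*w*(hp+hs)*sin(w*lz), so it has no poles."""
    return ((kappa * w) ** 2 - hp * hs) * math.sin(w * lz) \
        - kappa * w * (hp + hs) * math.cos(w * lz)


def find_omega_z(kappa, hp, hs, lz, w_max, step):
    """Scan (0, w_max] for sign changes of the residual and refine each by bisection."""
    roots = []
    lo = step
    f_lo = eigen_residual(lo, kappa, hp, hs, lz)
    while lo < w_max:
        hi = lo + step
        f_hi = eigen_residual(hi, kappa, hp, hs, lz)
        if f_lo * f_hi < 0.0:
            a, b, fa = lo, hi, f_lo
            for _ in range(200):  # far more halvings than double precision needs
                mid = 0.5 * (a + b)
                fm = eigen_residual(mid, kappa, hp, hs, lz)
                if fa * fm <= 0.0:
                    b = mid
                else:
                    a, fa = mid, fm
            roots.append(0.5 * (a + b))
        lo, f_lo = hi, f_hi
    return roots


# Illustrative values only (not taken from the thesis):
# kappa in W/(m*C), hp and hs in W/(m^2*C), lz in m.
omega_z = find_omega_z(kappa=150.0, hp=8700.0, hs=2017.0, lz=5e-4,
                       w_max=20000.0, step=50.0)
print(omega_z)
```

The scan step has to be smaller than the spacing between neighboring roots (roughly $\pi / L_z$), otherwise two roots could fall into one interval and be missed.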

2.1.2 Inhomogeneous Problem

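To solve the inhomogeneous equation subject to specific initial and boundary conditions, the Green function $G(r, r')$ is applied. Let $G$ be the temperature distribution produced by a unit point power source $\delta(r, r')$. The complete equation for $G$ is

$$\nabla^2 G(r, r') = -\frac{\delta(r, r')}{\kappa} \tag{2.17}$$

and the boundary conditions are

$$\left.\frac{\partial G(r, r')}{\partial x}\right|_{x=0,L_x} = \left.\frac{\partial G(r, r')}{\partial y}\right|_{y=0,L_y} = 0 \tag{2.18}$$

$$\kappa \left.\frac{\partial G(r, r')}{\partial z}\right|_{z=-L_z} = h_p G(x, y, -L_z) \tag{2.19}$$

$$\kappa \left.\frac{\partial G(r, r')}{\partial z}\right|_{z=0} = -h_s G(x, y, 0) \tag{2.20}$$

where

$$\delta(r, r') = \begin{cases} 1, & r = r' \\ 0, & r \neq r' \end{cases} \tag{2.21}$$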

The basis solutions (2.16) obtained from the homogeneous equation form a set of orthogonal bases, which satisfy the characteristics of the Green function with truncation points $N_x$, $N_y$ and $N_z$. In addition, the integration of two orthogonal bases is equal to $\delta$. Therefore, (2.16) can be substituted into $G(r, r')$.

$$G(r, r') = \frac{1}{\kappa \omega^2} \sum_{i=0}^{N_x-1} \sum_{j=0}^{N_y-1} \sum_{k=0}^{N_z-1} \upsilon_{ij}(x', y')\, \upsilon_k(z') \tag{2.22}$$

The general solution $T(r)$ is obtained by superposition, integrating the power density distribution $P(r)$ against $G(r, r')$.

$$T(r, \infty) = \int_D G(r, r')\, P(r')\, dr' \tag{2.23}$$

Since power density is a function of x and y only, it can also be written as a superposition of the orthogonal bases in the x and y directions.

$$P(r) \approx P_{ij} = \int_0^{L_x} \int_0^{L_y} p(x', y')\, \upsilon_{ij}(x', y')\, dx'\, dy' \tag{2.24}$$

Figure 2.4: The illustration of power density on meshing top surface of chip

In Figure 2.4, the overall chip power distribution is approximated by dividing the chip into $M \times N$ grids with power density $p_{mn}$ in each grid. The power density in each bin is the sum of the power densities of the cells it contains. For a cell that straddles the boundary between bins, its power density is divided in proportion to the cell area covered by each bin. The following equation describes the approximated power density $P_{ij}$.

$$\begin{aligned} P_{ij} &= \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} p_{mn} \int_{mL_x/M}^{(m+1)L_x/M} \int_{nL_y/N}^{(n+1)L_y/N} \upsilon_{ij}(x', y')\, dx'\, dy' \\ &= \frac{4 L_x L_y}{ij\pi^2} \sin\left(\frac{i\pi}{2M}\right) \sin\left(\frac{j\pi}{2N}\right) \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} p_{mn} \cos\left(\frac{i\pi(2m+1)}{2M}\right) \cos\left(\frac{j\pi(2n+1)}{2N}\right) \end{aligned} \tag{2.25}$$

The double sum on the right-hand side of (2.25) has the form of a type-II DCT, so the formula can be simplified as

$$P_{ij} = C_{ij}\, \hat{P}_{ij} \tag{2.26}$$

where

$$C_{ij} = \frac{4 L_x L_y}{ij\pi^2} \sin\left(\frac{i\pi}{2M}\right) \sin\left(\frac{j\pi}{2N}\right) \tag{2.27}$$

$$\hat{P}_{ij} = DCT[p_{mn}] \tag{2.28}$$
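The binning step that produces $p_{mn}$ can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in this thesis: cells are assumed to be axis-aligned rectangles given as hypothetical `(x, y, width, height, power)` tuples, and each cell's power is split among bins in proportion to the overlapped area.

```python
def bin_power_density(cells, lx, ly, m_bins, n_bins):
    """Return an m_bins x n_bins grid of power density (power per unit area).

    cells: iterable of (x, y, width, height, power) with (x, y) the lower-left
    corner. Each cell's power is shared among the bins it overlaps in
    proportion to the overlapped area.
    """
    bin_w, bin_h = lx / m_bins, ly / n_bins
    bin_area = bin_w * bin_h
    grid = [[0.0] * n_bins for _ in range(m_bins)]
    for x, y, w, h, power in cells:
        cell_area = w * h
        m_lo = max(0, int(x // bin_w))
        m_hi = min(m_bins - 1, int((x + w) // bin_w))
        n_lo = max(0, int(y // bin_h))
        n_hi = min(n_bins - 1, int((y + h) // bin_h))
        for m in range(m_lo, m_hi + 1):
            overlap_x = min(x + w, (m + 1) * bin_w) - max(x, m * bin_w)
            if overlap_x <= 0.0:
                continue
            for n in range(n_lo, n_hi + 1):
                overlap_y = min(y + h, (n + 1) * bin_h) - max(y, n * bin_h)
                if overlap_y <= 0.0:
                    continue
                share = overlap_x * overlap_y / cell_area
                grid[m][n] += power * share / bin_area
    return grid


# A 2x2 cell centered on a 4x4 chip with 2x2 bins covers all four bins equally.
print(bin_power_density([(1.0, 1.0, 2.0, 2.0, 4.0)], 4.0, 4.0, 2, 2))
```

Summing the grid and multiplying by the bin area recovers the total cell power, which is a convenient sanity check for any binning routine of this kind.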

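Similar to the power density distribution, the temperature distribution can be approximated by dividing the chip into many small bins. To describe the temperature distribution $T_{mn}$ in each grid, (2.25) is substituted into (2.23).

$$T(r, \infty) \approx T(m, n) = \int_0^{N_r} \frac{1}{\kappa \omega^2} \sum_{i=0}^{N_x-1} \sum_{j=0}^{N_y-1} \sum_{k=0}^{N_z-1} \upsilon_{ij}(x', y')\, \upsilon_k(z')\, P_{ij}\, dr' \tag{2.29}$$

Because the basis in the z direction in (2.29) is independent of x and y, $T(m, n)$ can be derived as

$$\begin{aligned} T(m, n) &= K_z \sum_{i=0}^{N_x-1} \sum_{j=0}^{N_y-1} C_{ij} \hat{P}_{ij} \int_0^{N_x} \int_0^{N_y} \upsilon_{ij}(x', y')\, dx'\, dy' \\ &= K_z \sum_{i=0}^{N_x-1} \sum_{j=0}^{N_y-1} C_{ij} D_{ij} \hat{P}_{ij} \cos\left(\frac{i\pi(2m+1)}{2M}\right) \cos\left(\frac{j\pi(2n+1)}{2N}\right) \\ &= K_z \sum_{i=0}^{N_x-1} \sum_{j=0}^{N_y-1} E_{ij} \hat{P}_{ij} \cos\left(\frac{i\pi(2m+1)}{2M}\right) \cos\left(\frac{j\pi(2n+1)}{2N}\right) \\ &= K_z \cdot IDCT[E_{ij} \hat{P}_{ij}] \end{aligned} \tag{2.30}$$

where

$$K_z = \frac{1}{\kappa \omega^2} \int_0^{N_z} \upsilon_k(z')\, dz' \tag{2.31}$$

$$E_{ij} = C_{ij} D_{ij} \tag{2.32}$$

$$D_{ij} = \frac{4 N_x N_y}{ij\pi^2} \sin\left(\frac{i\pi}{2N_x}\right) \sin\left(\frac{j\pi}{2N_y}\right) \tag{2.33}$$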

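When $i = 0$ or $j = 0$, (2.32) cannot be evaluated directly. L'Hôpital's rule is then applied to obtain the value of $E_{ij}$ as $i \to 0$ and $j \to 0$. $E_{ij}$ can then be written as

$$E_{ij} = \begin{cases} \Delta x\, \Delta y, & i = 0,\ j = 0 \\[4pt] \dfrac{4 N L_y\, \Delta x\, \sin^2\left(\frac{j\pi}{2N}\right)}{j^2 \pi^2}, & i = 0,\ j \neq 0 \\[8pt] \dfrac{4 M L_x\, \Delta y\, \sin^2\left(\frac{i\pi}{2M}\right)}{i^2 \pi^2}, & i \neq 0,\ j = 0 \\[8pt] \dfrac{16 M N L_x L_y \sin^2\left(\frac{i\pi}{2M}\right) \sin^2\left(\frac{j\pi}{2N}\right)}{i^2 j^2 \pi^4}, & i \neq 0,\ j \neq 0 \end{cases} \tag{2.34}$$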

(2.30) likewise has the form of an inverse DCT (IDCT) applied to $E_{ij} \hat{P}_{ij}$. Evaluating the DCT and IDCT directly has complexity $O(N^4)$. However, $O(N \log N)$ can be achieved by applying input reordering. The run time can be reduced further by calculating $K_z$ and $E_{ij}$ beforehand, since both terms are independent of the power density distribution.

2.2 Enhancement of DCT

The DFT can achieve complexity $O(N \log N)$ by applying the butterfly algorithm, and the same technique can be applied to the DCT by operating on real data with even symmetry. Figure 2.5 is a schematic of the butterfly algorithm used in the DFT.

Figure 2.5: The illustration of butterfly algorithm for DFT
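To make the butterfly structure concrete, the sketch below shows a generic recursive radix-2 decimation-in-time FFT. It is a textbook formulation given for illustration and is not claimed to be the variant implemented in this thesis; the function name is hypothetical and the input length is assumed to be a power of two.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 FFT of a sequence whose length is a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])   # DFT of even-indexed samples
    odd = fft_radix2(x[1::2])    # DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: one twiddle multiplication feeds two outputs.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


print(fft_radix2([1.0, 0.0, 0.0, 0.0]))
```

Each level of recursion performs $N/2$ butterflies and there are $\log_2 N$ levels, which is where the $O(N \log N)$ cost comes from.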

Given $N$ real input values $x_n$, $0 \le n \le N - 1$, the DCT and DFT are defined as follows, where $C_k$ denotes the DCT and $F_k$ denotes the DFT.

$$C_k = DCT[x_n] = \sum_{n=0}^{N-1} x_n \cos\frac{\pi(2n+1)k}{2N}, \quad 0 \le k \le N - 1 \tag{2.35}$$

$$F_k = DFT[x_n] = \sum_{n=0}^{N-1} x_n \cdot e^{-j\frac{2\pi nk}{N}}, \quad 0 \le k \le N - 1 \tag{2.36}$$

Note that $e^{jn}$ follows Euler's formula, in which $j$ is the imaginary unit.

$$e^{jn} = \cos(n) + j\sin(n) \tag{2.37}$$

By defining $W_N = e^{-j2\pi/N}$ as the twiddle factor, the DFT can be expressed as

$$F_k = \sum_{n=0}^{N-1} x_n W_N^{nk} \tag{2.38}$$

In (2.35) and (2.36), 4N inputs are required in the DFT to generate N outputs for the DCT, with complexity $O(4N \log 4N)$. The even extension method [4] can be used to reduce the complexity to $O(2N \log 2N)$. A further reduction to $O(N \log N)$ can be achieved by reordering the input patterns [18].

2.2.1 Even Extension of DFT for DCT

The idea of even extension is to mirror the input of the DFT so that the imaginary part generated by the DFT itself is eliminated.

(a) input (b) even-extension

Figure 2.6: (a) The original input sequence. (b) The even extension of original input.

$y_n$ is a series of $2N$ points that is the even extension of $x_n$:

$$y_n = \begin{cases} x_n, & 0 \le n \le N - 1 \\ x_{2N-n-1}, & N \le n \le 2N - 1 \end{cases} \tag{2.39}$$

By substituting (2.39) into (2.38), the $2N$ output points $F_k$ of the DFT can be obtained.

$$F_k = \sum_{n=0}^{N-1} x_n W_{2N}^{nk} + \sum_{n=N}^{2N-1} x_{2N-n-1} W_{2N}^{nk}, \quad 0 \le k \le 2N - 1 \tag{2.40}$$

Changing the summation variable in the second term on the right-hand side and factoring out $W_{2N}^{-k/2}$ gives (2.41), noting that $W_{2N}^{2kN} = 1$ for integer $k$ and $W_{2N}^{2nk} = W_N^{nk}$.

$$F_k = W_{2N}^{-k/2} \sum_{n=0}^{N-1} x_n \left[ W_{2N}^{nk} W_{2N}^{k/2} + W_{2N}^{-nk} W_{2N}^{-k/2} \right] \tag{2.41}$$

which is

$$F_k = W_{2N}^{-k/2} \cdot 2\,\mathrm{Re}\left[ W_{2N}^{k/2} \sum_{n=0}^{N-1} x_n W_{2N}^{nk} \right] \tag{2.42}$$

or

$$F_k = W_{2N}^{-k/2} \cdot 2 \sum_{n=0}^{N-1} x_n \cos\frac{\pi(2n+1)k}{2N} \tag{2.43}$$

By substituting $C_k$ from (2.35) into (2.43), the relation between the DFT and the DCT is found to be

$$C_k = \frac{1}{2} W_{2N}^{k/2} F_k, \quad 0 \le k \le 2N - 1 \tag{2.44}$$
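The relation (2.44) can be checked numerically. The sketch below, with hypothetical function names, compares the direct sum (2.35) against the DCT recovered from a $2N$-point DFT of the even extension (2.39). The DFT here is evaluated directly for clarity; a fast FFT would take its place in practice.

```python
import cmath
import math


def dct_direct(x):
    """C_k as defined in (2.35), evaluated directly in O(N^2)."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
            for k in range(n)]


def dct_via_even_extension(x):
    """C_k = 1/2 * W_{2N}^{k/2} * F_k, with F_k the 2N-point DFT of (2.39)."""
    n = len(x)
    two_n = 2 * n
    y = list(x) + list(reversed(x))  # y_n = x_{2N-n-1} for n >= N
    out = []
    for k in range(n):
        f_k = sum(y[i] * cmath.exp(-2j * cmath.pi * i * k / two_n)
                  for i in range(two_n))
        half_twiddle = cmath.exp(-1j * cmath.pi * k / two_n)  # W_{2N}^{k/2}
        out.append((0.5 * half_twiddle * f_k).real)
    return out


sample = [0.5, -1.0, 2.0, 3.5, 1.0]
print(dct_direct(sample))
print(dct_via_even_extension(sample))
```

The imaginary part of the product is zero up to rounding, which is the cancellation the mirrored input is designed to produce.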

2.2.2 Reordering Input Data

From (2.42), applying even extension to the input data still has complexity $O(2N \log 2N)$. A further improvement can be achieved by input reordering, which is explained in this subsection. The idea of input reordering is to avoid extending the input to $2N$ points. Instead of mirroring the real input, the imaginary part is eliminated by reordering the input.

(a) input (b) reordering

Figure 2.7: The sequence of input patterns after reordering. (a) The original input sequence. (b) The sequence of vn and wn.

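The reordered sequences $v_n$ and $w_n$ are obtained by taking the even and odd elements of $y_n$, which also correspond to elements of the original input sequence $x_n$.

$$v_n = \begin{cases} x_{2n}, & 0 \le n \le \left\lfloor \frac{N-1}{2} \right\rfloor \\ x_{2N-2n-1}, & \left\lfloor \frac{N+1}{2} \right\rfloor \le n \le N - 1 \end{cases} \tag{2.45}$$

where $w_n$ is the reverse of $v_n$. $v_n$ and $w_n$ are then substituted into the DFT:

$$F_k = \sum_{n=0}^{N-1} v_n W_{2N}^{2nk} + \sum_{n=0}^{N-1} w_n W_{2N}^{(2n+1)k} \tag{2.46}$$

$$F_k = \sum_{n=0}^{N-1} v_n W_{2N}^{2nk} + \sum_{n=0}^{N-1} v_{N-n-1} W_{2N}^{2nk} W_{2N}^{k} \tag{2.47}$$

With a few manipulations of the summation variable and the relation between $C_k$ and $F_k$ in (2.44), (2.49) can be obtained.

$$C_k = \frac{1}{2} W_{2N}^{k/2} \left[ \sum_{n=0}^{N-1} v_n W_N^{nk} + \sum_{n=0}^{N-1} v_n W_N^{-nk} W_{2N}^{-k} \right] \tag{2.48}$$

which is

$$C_k = \frac{1}{2} \left[ \sum_{n=0}^{N-1} v_n W_N^{nk} W_{4N}^{k} + \sum_{n=0}^{N-1} v_n W_N^{-nk} W_{4N}^{-k} \right] \tag{2.49}$$

Since $v_n$ is real, the two sums are complex conjugates, so (2.49) can also be written as

$$C_k = \mathrm{Re}\left[ W_{4N}^{k} \sum_{n=0}^{N-1} v_n W_N^{nk} \right], \quad 0 \le k \le N - 1 \tag{2.50}$$

The DCT can therefore be computed from an $N$-point DFT of the reordered input using (2.50), with complexity $O(N \log N)$. The IDCT can be computed using the same approach as the forward DCT.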
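The reordering can also be checked numerically. The sketch below, with hypothetical function names, builds $v_n$ as in (2.45), takes a direct $N$-point DFT $V_k$ (a fast FFT would be used in practice) and compares the result against the direct sum (2.35). With the unnormalized definition (2.35), the match is $C_k = \mathrm{Re}[W_{4N}^{k} V_k]$; a DCT defined with a leading factor of 2 would carry that factor through.

```python
import cmath
import math


def dct_direct(x):
    """C_k as defined in (2.35), evaluated directly in O(N^2)."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
            for k in range(n)]


def reorder(x):
    """v_n of (2.45): even-indexed samples, then odd-indexed samples reversed."""
    n = len(x)
    split = (n + 1) // 2
    return [x[2 * i] for i in range(split)] + \
           [x[2 * n - 2 * i - 1] for i in range(split, n)]


def dct_via_reordering(x):
    """DCT from an N-point DFT of the reordered input."""
    n = len(x)
    v = reorder(x)
    out = []
    for k in range(n):
        v_k = sum(v[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
        twiddle = cmath.exp(-2j * cmath.pi * k / (4 * n))  # W_{4N}^k
        out.append((twiddle * v_k).real)
    return out


print(reorder([0, 1, 2, 3, 4]))
print(dct_via_reordering([0.5, -1.0, 2.0, 3.5]))
```

Only $N$ complex inputs enter the DFT here, compared with $2N$ for the even extension, which is the source of the saving.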


Chapter 3 Application to Thermal Aware Placement

(a) Iteration 1 (b) Legalization after Iteration 1 (c) Iteration 11 (d) Legalization after Iteration 11 (e) Iteration 21 (f) Legalization after Iteration 21 (g) Iteration 31 (h) Legalization after Iteration 31

Figure 3.1: Placement of cells to demonstrate interleaving mechanism between legalization and CG for adaptec1.

In this chapter, the application of thermal analysis to placement is presented; the general flow is depicted in Figure 3.3. The idea of force-directed placement is to model each net as a spring system. If two cells are connected, a force exists between them and pulls them toward each other. To construct the force model for a given placement, each multi-pin net is converted to several two-pin nets and the force is calculated for each two-pin net. The force of a two-pin net is defined as

$$F_{ij} = W_{ij} \left[ (x_i - x_j)^2 + (y_i - y_j)^2 \right] \tag{3.1}$$

where $(x_i, y_i)$ and $(x_j, y_j)$ are the coordinates of cell $i$ and cell $j$. The sum of the forces over all two-pin nets is

$$\phi(x, y) = \sum_{i,j} F_{ij} = \frac{1}{2} x^T Q x + d_x^T x + \frac{1}{2} y^T Q y + d_y^T y \tag{3.2}$$

The x and y terms can be solved separately, and force equilibrium is obtained by solving $Qx = -d_x$ and $Qy = -d_y$. The weight is

$$W_{ij} = \frac{2}{(p - 1)\,|x_i - x_j|} \tag{3.3}$$

After the matrix is built, we apply the Jacobi-preconditioned Conjugate Gradient method, with the Polak-Ribiere formula as the search direction parameter, to solve the problem.
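As a small illustration of solving $Qx = -d_x$, the sketch below sets up a toy one-dimensional problem and solves it with a Jacobi-preconditioned conjugate gradient loop. Everything here is an assumption made for illustration: the netlist (two movable cells chained between fixed pads at 0 and 10), the unit weights used in place of (3.3), the dense matrix storage and the function names. The Polak-Ribiere form of beta is used; for a linear system it coincides with the usual choice in exact arithmetic.

```python
def build_system(n_cells, cell_nets, pad_nets):
    """Assemble Q and rhs = -d for sum of w * (xi - xj)^2 over two-pin nets.

    cell_nets: (i, j, w) between movable cells.
    pad_nets:  (i, pos, w) between movable cell i and a fixed pin at pos.
    """
    q = [[0.0] * n_cells for _ in range(n_cells)]
    rhs = [0.0] * n_cells
    for i, j, w in cell_nets:
        q[i][i] += 2 * w
        q[j][j] += 2 * w
        q[i][j] -= 2 * w
        q[j][i] -= 2 * w
    for i, pos, w in pad_nets:
        q[i][i] += 2 * w
        rhs[i] += 2 * w * pos
    return q, rhs


def jacobi_pcg(q, rhs, tol=1e-10, max_iter=100):
    """Solve q x = rhs with a diagonal (Jacobi) preconditioner."""
    n = len(rhs)
    dot = lambda a, b: sum(u * v for u, v in zip(a, b))
    x = [0.0] * n
    r = list(rhs)                              # residual for x = 0
    inv_diag = [1.0 / q[i][i] for i in range(n)]
    z = [inv_diag[i] * r[i] for i in range(n)]
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        qp = [dot(row, p) for row in q]
        alpha = rz / dot(p, qp)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r_new = [r[i] - alpha * qp[i] for i in range(n)]
        z_new = [inv_diag[i] * r_new[i] for i in range(n)]
        # Polak-Ribiere form of the direction update parameter.
        beta = sum((r_new[i] - r[i]) * z_new[i] for i in range(n)) / rz
        p = [z_new[i] + beta * p[i] for i in range(n)]
        r, z = r_new, z_new
        rz = dot(r, z)
    return x


q, rhs = build_system(2, [(0, 1, 1.0)], [(0, 0.0, 1.0), (1, 10.0, 1.0)])
print(jacobi_pcg(q, rhs))  # the cells settle evenly between the two pads
```

A production placer would use sparse storage for $Q$, since each row has only as many nonzeros as the cell has connections.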

3.1 Placement Algorithm

The placement strategy implemented in this thesis is based on simPL [16]. As in simPL, two placement algorithms are implemented. One greedily minimizes total wirelength using the preconditioned Conjugate Gradient method (CG), and the other serves as a legalizer that quickly generates a roughly legalized placement. CG tries to pull cells together to minimize wirelength, and the legalizer tries to pull cells toward free area to remove overlap. The result generated by the legalizer acts as an anchor that pulls cells toward free area. The two algorithms interleave: the solution generated by one serves as input to the other. The process stops when the solutions generated by the two algorithms converge.

Figure 3.2: Illustration of the legalization process.

3.2 Considering Thermal Effect in Placement

To achieve a flat thermal profile, high-power cells should be spread evenly across the chip. However, greedily spreading out cells is impractical because it dramatically increases total HPWL, which can make the resulting placement unroutable. In this thesis, we address thermal effects by minimizing the maximum on-chip temperature. Experimental results show that a smoother temperature profile can be obtained by reducing the maximum on-chip temperature.

The placer implemented in this thesis is a flat placer, which requires no clustering of cells. A flat placer is well suited to considering thermal effects since each cell can be treated individually. To account for thermal effects, an additional anchor force is added to pull cells away from high-temperature regions. However, this additional anchor force increases total HPWL, so the trade-off between reducing maximum temperature and increasing total HPWL must be handled carefully.

Figure 3.3: Flow Diagram of our thermal aware placement.

3.2.1 Obtaining Thermal Anchor

After a hot region is identified, an imaginary circle of radius R is formed around the center of the hot region. Anchors are created on the perimeter of the circle to pull cells away from the hot region. Each cell is pulled by the anchor closest to it.

Without considering thermal effects, a cell is subject only to the forces exerted by the nodes it connects to and the force exerted by the pseudo anchor produced during the legalization stage. When thermal effects are considered, an additional thermal force is exerted on the cells within the hot region. We define the anchor that serves to reduce maximum temperature as the thermal anchor and the anchor that serves to remove overlap as the legalizing anchor. Figure 3.4 illustrates the concept of the thermal anchor and the legalizing anchor.

Figure 3.4: Forces exerted on a cell by thermal anchor and legalizing anchor. Thermal anchor is placed on a circle and red rectangle is the identified hot region.

To obtain the thermal anchor position for each cell, the vector directing the cell away from the center is calculated. $(x_i, y_i)$ are the coordinates of cell $i$ and $(x_o, y_o)$ are the coordinates of the center point of the hot region.

$$\vec{x} = x_i - x_o, \quad \vec{y} = y_i - y_o \tag{3.4}$$

The magnitude of the vector is calculated to position the thermal anchor on the perimeter of the circle. $R$ is the radius of the circle and $(x_t, y_t)$ are the coordinates of the thermal anchor.

$$mag = \sqrt{\frac{R}{\vec{x}^2 + \vec{y}^2}} \tag{3.5}$$

$$x_t = x_i + mag \cdot x_i \tag{3.6}$$

$$y_t = y_i + mag \cdot y_i \tag{3.7}$$

The radius of the circle and the force exerted by each anchor determine how strongly cells are perturbed and how far they move from their original positions. The larger the circle, the farther a cell is pulled from the center to reduce maximum temperature.

3.2.2 Determine Magnitude of Thermal Force

As iteration number increases, the general structure of placement begins to form as cells becomes less congested. The force exerted by thermal anchor begin to in-crease such that high power density cells can be spread out more evenly. However, as structure of placement begins to form, perturbing too many cells to reduce maximum temperature will deviate the general structure of placement. To avoid perturbing too many cell and deviate cell too much from its original position, the size of hot re-gion and radius of circle will gradually decrease in size as iteration number increase. We initially set the size of hot region to 13 bins in width and height and decrease by average of 2.5% in each iteration.

In early iterations, cells are congested in the center. The anchors generated by the legalizer, which pull cells from congested regions towards white space, also serve as a force that pulls cells away from the hot region. Thus, we let the overlap removal force dominate the thermal-driven force in the early iterations. To adjust the proportion of force between the legalizing anchor and the thermal anchor, the weight of the thermal anchor $W_{thermal}$ in iteration $n$ is set proportional to $n^2/N$ and the weight of the legalizing anchor $W_{legal}$ is set proportional to $n$, where $N$ is the total number of iterations. Thermal anchors therefore have a negligible effect on cells in early iterations, and their impact increases quadratically as the iteration number increases.
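The hot-region shrinkage and the two weight schedules described above can be sketched as follows. This is only an illustration: the function names, the total iteration count of 40, and the choice of a geometric 2.5% decay per iteration are assumptions, and the proportionality constants of both weights are taken as 1.

```python
def hot_region_size(n, initial_bins=13.0, shrink=0.025):
    """Hot-region width/height in bins at iteration n (assumed geometric decay)."""
    return initial_bins * (1.0 - shrink) ** n


def anchor_weights(n, total_iters):
    """Unscaled (W_thermal, W_legal) at iteration n: n^2/N and n."""
    return (n * n / total_iters, float(n))


N = 40  # hypothetical total number of iterations
for n in (1, 10, 20, 40):
    w_thermal, w_legal = anchor_weights(n, N)
    # Print iteration, hot-region size, both weights, and their ratio n/N.
    print(n, round(hot_region_size(n), 2), w_thermal, w_legal, w_thermal / w_legal)
```

Up to the proportionality constants, the ratio of the thermal weight to the legalizing weight is n/N, so the legalizing anchors dominate early and the two weights become comparable only in the final iterations.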

To minimize the perturbation of cells and to prevent a sharp increase in temperature at the perimeter of the hot region, the 2-D Gaussian model shown in Figure 3.5 is used to determine the magnitude of the force that each thermal anchor exerts on the cell it connects to.

Figure 3.5: An illustration of the 2-D Gaussian force spread model, with its center positioned at the center of the hot region.

The 2-D Gaussian has the form

$$g(x, y) = A \exp\left(-\left(\frac{(x_i - x_o)^2}{2\sigma_x^2} + \frac{(y_i - y_o)^2}{2\sigma_y^2}\right)\right) \tag{3.8}$$

Here, $x_o$ and $y_o$ are the coordinates of the center point of the hot region, and $\sigma_x$ and $\sigma_y$ are the standard deviations, in the $x$ and $y$ directions, of the positions of the cells contained within the hot region. Thus, the force exerted by each thermal anchor, $F_{therm}$, at iteration $n$ can be derived as follows:

$$F_{thermX} = g(x, y)\,\frac{n^2}{N}\,\frac{1}{|x_t - x_i|} \tag{3.9}$$

$$F_{thermY} = g(x, y)\,\frac{n^2}{N}\,\frac{1}{|y_t - y_i|} \tag{3.10}$$

where $(x_t, y_t)$ are the coordinates of the thermal anchor and $(x_i, y_i)$ are the coordinates of cell $i$.
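The Gaussian weighting of (3.8) and the force components of (3.9) and (3.10) can be sketched as below. This is a minimal illustration under stated assumptions rather than the thesis code: the function names are hypothetical, the amplitude A defaults to 1, and the small `eps` that prevents division by zero when the anchor shares a coordinate with the cell is an added safeguard that the text does not describe.

```python
import math


def gaussian_weight(cell, center, sigma, amplitude=1.0):
    """Eq. (3.8): 2-D Gaussian centered on the hot region, evaluated at the cell."""
    xi, yi = cell
    xo, yo = center
    sx, sy = sigma
    exponent = (xi - xo) ** 2 / (2.0 * sx ** 2) + (yi - yo) ** 2 / (2.0 * sy ** 2)
    return amplitude * math.exp(-exponent)


def thermal_force(cell, anchor, center, sigma, n, total_iters, eps=1e-9):
    """Eqs. (3.9)-(3.10): x and y force components at iteration n."""
    scale = gaussian_weight(cell, center, sigma) * n * n / total_iters
    # eps is a hypothetical guard against a zero anchor-to-cell distance.
    fx = scale / max(abs(anchor[0] - cell[0]), eps)
    fy = scale / max(abs(anchor[1] - cell[1]), eps)
    return fx, fy


fx, fy = thermal_force(cell=(13.0, 14.0), anchor=(19.0, 22.0),
                       center=(10.0, 10.0), sigma=(3.0, 4.0),
                       n=10, total_iters=10)
print(fx, fy)
```

A cell at the center of the hot region receives the full amplitude, and the weight falls off smoothly toward the perimeter, which is the behavior the Gaussian model is meant to provide.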

Chapter 4 Experimental Results

In this chapter, experimental results are presented in three sections. The first section presents the accuracy of the thermal model compared with Icepak. The second presents the results of our placer compared with a partition-based placer and a non-linear force-directed placer. The third presents the temperature distribution and total HPWL with and without considering the thermal effect.

The entire thermal-aware placer is written in standard C++ and compiled using g++ 4.1.2. The experiments are conducted on an Intel Xeon E5530 quad-core machine running at 2.4 GHz. The ISPD 2005 placement benchmark [2] is used as the input benchmark. Final legalization and detailed placement are delegated to an external binary, FastDP [20]. An open-source C-based FFT solver [10] is used to generate the temperature profile. All test cases are run with the same parameters, without tuning for any specific test case. The total HPWL is evaluated using the open-source GSRC bookshelf evaluator [3], and every temperature profile is evaluated with Icepak after legalization.

4.1 Thermal Model

To examine the accuracy of our thermal model, we take the parameters from [12] and apply them in our thermal model. The chip width and height are given in each test case, and the thickness of the chip is set to 0.5 mm. The thermal conductivity $\kappa$ of silicon is set to 148 W/(m·°C), and the heat transfer rates $h_p$ and $h_s$ are set to 8700 W/(m²·°C) and 2017 W/(m²·°C), respectively. The power density of each cell is a random value ranging from 0 to $2 \times 10^6$ W/m². The ambient temperature is set to 22.02 °C, and the grid numbers $N_x$ and $N_y$ used in the thermal model are set to 128.

Figure 4.1 compares the temperature profile obtained from our simulation model with that from ANSYS Icepak. The maximum and average temperatures obtained from our thermal simulation and from Icepak are presented in Table 4.1. On average, our thermal simulation is within 6.5% deviation of Icepak with a 240 times speedup. Note that the deviation of the temperature distribution for each test case is calculated by averaging, over all grids, the difference between the temperature calculated by our thermal simulation and the temperature obtained from Icepak.
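One possible reading of this deviation measure is sketched below. The text does not state whether the per-grid difference is absolute or how it is normalized, so taking the absolute difference relative to the Icepak value is an assumption, as are the function name and the toy 2x2 grids.

```python
def mean_grid_deviation(ours, reference):
    """Average per-grid deviation in percent between two temperature maps.

    Assumption: each grid contributes |T_ours - T_ref| / T_ref.
    """
    total = 0.0
    count = 0
    for row_ours, row_ref in zip(ours, reference):
        for t_ours, t_ref in zip(row_ours, row_ref):
            total += abs(t_ours - t_ref) / t_ref
            count += 1
    return 100.0 * total / count


# Toy temperature maps in degrees Celsius (illustrative values only).
ours = [[50.0, 55.0], [60.0, 66.0]]
icepak = [[50.0, 50.0], [60.0, 60.0]]
print(mean_grid_deviation(ours, icepak))
```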

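Table 4.1: Temperature profile comparison with Icepak

| # | Without Thermal / Our / Max. | Without Thermal / Our / Avg. | Without Thermal / Our / Time(s) | Without Thermal / Icepak / Max. | Without Thermal / Icepak / Avg. | Without Thermal / Icepak / Time(s) | Without Thermal / Deviation | With Thermal / Our / Max. | With Thermal / Our / Time(s) | With Thermal / Icepak / Max. | With Thermal / Icepak / Time(s) | With Thermal / Deviation | Max Temp. Red. (%) / Our | Max Temp. Red. (%) / Icepak |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| adaptec1 | 64.64 | 46.82 | 0.74 | 63.78 | 50.2 | 219.71 | 8.58 | 60.21 | 0.51 | 61.62 | 247.54 | 9.22 | 5.85 | 3.39 |
| adaptec2 | 52.67 | 35.05 | 0.64 | 48.02 | 35.71 | 235.02 | 5.75 | 50.45 | 0.43 | 48.38 | 235.82 | 5.37 | -0.02 | -0.75 |
| adaptec3 | 42.56 | 31.96 | 0.68 | 45.20 | 32.83 | 236.67 | 3.43 | 44.01 | 0.67 | 42.22 | 240.66 | 3.34 | 6.10 | 6.60 |
| adaptec4 | 47.57 | 32.73 | 0.71 | 46.82 | 32.65 | 242.68 | 3.13 | 45.18 | 0.71 | 45.86 | 250.83 | 3.13 | 2.98 | 2.05 |
| bigblue1 | 74.44 | 50.12 | 0.51 | 73.41 | 55.08 | 225.46 | 10.71 | 70.77 | 0.52 | 71.36 | 234.69 | 10.53 | 3.77 | 2.79 |
| bigblue2 | 55.07 | 39.82 | 0.75 | 53.56 | 40.82 | 209.26 | 6.11 | 51.03 | 0.75 | 53.25 | 247.39 | 6.36 | 2.03 | 0.58 |
| bigblue3 | 72.49 | 36.17 | 1.24 | 71.52 | 36.48 | 218.39 | 5.62 | 65.46 | 1.35 | 64.61 | 231.09 | 5.55 | 11.29 | 9.66 |
| bigblue4 | 75.35 | 43.02 | 2.19 | 77.67 | 45.06 | 222.37 | 6.76 | 72.37 | 2.42 | 75.49 | 234.55 | 8.41 | 3.95 | 2.81 |
| Avg. | - | - | 0.93 | - | - | 226.20 | 6.26 | - | 0.92 | - | 240.32 | 6.49 | 4.50 | 3.39 |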

4.2 Placement

In this section, placement results are compared with Capo [21], APlace2 [15], mPL6 [5] and simPL [16]. Capo is a partition-based placer, APlace2 is a non-linear force-directed placer, mPL6 is currently the strongest non-linear force-directed placer, and simPL is currently one of the strongest placers on the ISPD 2005 benchmark.


Our placer is on par with APlace2 in terms of total HPWL and slightly outperforms Capo 10.5, with a 3-4 times speedup over both. Compared with the state-of-the-art placers, our placer trails behind by about 10% in total HPWL; it is about 4 times slower than simPL and about 2 times faster than mPL6. Although its total HPWL trails the state-of-the-art placers, the results in Table 4.2 demonstrate that our placer achieves reasonable placement quality, within 10% of the best placers, with relatively fast execution compared with the partition-based and non-linear force-directed placers.

Table 4.2: Placement results in ISPD 2005 benchmarks

| Benchmark | Cell # | Our HPWL | Our Runtime | Capo 10.5 HPWL | Capo 10.5 Runtime | APlace 2.0 HPWL | APlace 2.0 Runtime | simPL HPWL | simPL Runtime | mPL6 HPWL | mPL6 Runtime |
|---|---|---|---|---|---|---|---|---|---|---|---|
| adaptec1 | 211K | 87.69 | 7.82 | 88.14 | 25.95 | 78.35 | 35.02 | 77.73 | 2.27 | 77.93 | 18.36 |
| adaptec2 | 255K | 98.22 | 12.33 | 100.25 | 36.06 | 95.70 | 50.57 | 90.36 | 3.48 | 92.04 | 19.91 |
| adaptec3 | 452K | 242.49 | 25.03 | 276.80 | 78.19 | 218.52 | 119.53 | 208.95 | 7.04 | 214.16 | 58.92 |
| adaptec4 | 496K | 212.09 | 24.40 | 231.30 | 79.32 | 209.28 | 131.57 | 187.40 | 5.30 | 193.89 | 55.95 |
| bigblue1 | 278K | 107.46 | 13.20 | 110.92 | 41.78 | 100.02 | 44.91 | 97.42 | 4.01 | 96.80 | 22.82 |
| bigblue2 | 558K | 155.53 | 22.57 | 162.81 | 80.55 | 153.75 | 100.96 | 145.78 | 8.28 | 152.34 | 61.55 |
| bigblue3 | 1110K | 378.09 | 79.05 | 405.40 | 182.94 | 411.59 | 209.24 | 339.78 | 13.79 | 344.10 | 85.23 |
| bigblue4 | 2180K | 897.69 | 194.00 | 1016.19 | 567.15 | 871.29 | 489.05 | 808.22 | 35.80 | 829.44 | 189.83 |
| Norm. | - | 1.00 | 1.00 | 1.07 | 3.07 | 0.97 | 3.97 | 0.90 | 0.26 | 0.92 | 1.89 |
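The Norm. row of Table 4.2 appears consistent with an arithmetic mean of per-benchmark ratios against our placer. The sketch below checks that reading for the HPWL columns of Capo 10.5 and simPL only; the averaging rule and the helper name `normalized` are assumptions, since the thesis does not state how the row was computed.

```python
# HPWL columns copied from Table 4.2, in benchmark order adaptec1..bigblue4.
hpwl = {
    "Ours": [87.69, 98.22, 242.49, 212.09, 107.46, 155.53, 378.09, 897.69],
    "Capo 10.5": [88.14, 100.25, 276.80, 231.30, 110.92, 162.81, 405.40, 1016.19],
    "simPL": [77.73, 90.36, 208.95, 187.40, 97.42, 145.78, 339.78, 808.22],
}


def normalized(values, baseline):
    """Arithmetic mean of per-benchmark ratios values[i] / baseline[i]."""
    ratios = [v / b for v, b in zip(values, baseline)]
    return sum(ratios) / len(ratios)


for name in ("Capo 10.5", "simPL"):
    print(name, round(normalized(hpwl[name], hpwl["Ours"]), 2))
```

Under this assumption the two values round to the 1.07 and 0.90 reported in the table.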

4.3 Thermal Aware Placement

In Table 4.3, when the weight of each thermal anchor increases linearly in each iteration, the maximum temperature is reduced by 2.2% at the expense of a 15% increase in HPWL. This is primarily due to the large perturbation of cells in early iterations. By adjusting the weight of the thermal anchor to increase quadratically, a better maximum temperature reduction can be obtained with a smaller increase in total HPWL. In addition, we observed that moving cells in the hottest region is the most effective way to reduce the maximum temperature; however, moving every single cell inside the hot region causes the temperature to increase around the perimeter of the hot region. Thus, we apply the Gaussian force model so that cells closer to the center of the hot region have more energy to move away, while cells farther from the center have less energy to move. With this additional thermal force adjustment, a 4.5% reduction in maximum temperature is obtained with a 7% increase in total HPWL.

Figure 4.1 is the temperature distribution of adaptec1 with and without considering the thermal effect. Note that the region in red only denotes the hottest region and does not correspond to a specific temperature. The maximum temperature in Figure 4.1(c) is 5.85% lower than in Figure 4.1(a), with a 3.5% increase in total HPWL. It can be observed that a smoother temperature distribution is obtained by considering the thermal effect. Note that in Table 4.3, LW denotes linear weight and QW denotes quadratic weight.

Table 4.3: Various approaches to reduce maximum temperature

| Benchmark | Cell # | Original HPWL | Original Temp. | LW (W = n) HPWL | LW (W = n) Temp. | LW (W = n) Red. (%) | QW (W = n²/N) HPWL | QW (W = n²/N) Temp. | QW (W = n²/N) Red. (%) | QW + Gaussian HPWL | QW + Gaussian Temp. | QW + Gaussian Red. (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| adaptec1 | 211K | 87.69 | 63.95 | 91.40 | 60.72 | 5.05 | 89.33 | 60.43 | 5.50 | 90.76 | 60.21 | 5.85 |
| adaptec2 | 255K | 98.22 | 50.44 | 115.86 | 49.25 | 2.36 | 109.94 | 52.96 | -5.00 | 105.76 | 50.45 | -0.02 |
| adaptec3 | 452K | 242.49 | 46.87 | 282.51 | 49.68 | -6.00 | 274.06 | 45.02 | 3.94 | 258.81 | 44.01 | 6.10 |
| adaptec4 | 496K | 212.09 | 46.57 | 219.20 | 44.43 | 4.59 | 228.01 | 45.33 | 2.66 | 219.12 | 45.18 | 2.98 |
| bigblue1 | 278K | 107.46 | 73.54 | 126.37 | 71.39 | 2.92 | 123.76 | 71.25 | 3.11 | 118.61 | 70.77 | 3.77 |
| bigblue2 | 558K | 155.53 | 52.09 | 160.73 | 53.22 | -2.17 | 166.15 | 51.46 | 1.21 | 163.46 | 51.03 | 2.03 |
| bigblue3 | 1110K | 378.09 | 73.79 | 524.75 | 67.42 | 8.63 | 474.70 | 62.28 | 15.60 | 439.27 | 65.46 | 11.29 |
| bigblue4 | 2180K | 897.69 | 75.35 | 935.53 | 66.55 | 2.05 | 947.73 | 66.88 | 11.24 | 909.50 | 72.37 | 3.95 |
| Norm. | - | 1.00 | 1.00 | 1.15 | 0.98 | 2.18 | 1.11 | 0.95 | 4.79 | 1.07 | 0.95 | 4.50 |
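The Red. (%) columns of Table 4.3 can be reproduced from the tabulated temperatures with the usual relative-reduction formula. The sketch below is illustrative; the function name is hypothetical and only three QW + Gaussian entries are checked.

```python
def reduction_percent(t_original, t_new):
    """Relative reduction of the maximum temperature, in percent."""
    return 100.0 * (t_original - t_new) / t_original


# (original temperature, QW + Gaussian temperature) copied from Table 4.3.
samples = {
    "adaptec1": (63.95, 60.21),
    "adaptec2": (50.44, 50.45),
    "bigblue3": (73.79, 65.46),
}
for name, (t_orig, t_new) in samples.items():
    print(name, round(reduction_percent(t_orig, t_new), 2))
```

A negative value, as for adaptec2, means the maximum temperature rose slightly after thermal-aware placement.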

Figure 4.1: (a) Power density of adaptec1. (b) Power density of bigblue3. (c) Temperature distribution of adaptec1. (d) Temperature distribution of bigblue3. (e) Temperature distribution of adaptec1 using Icepak before thermal-aware placement. (f) Temperature distribution of adaptec1 using Icepak before thermal-aware placement.

Chapter 5 Conclusions

In this thesis, we proposed a thermal-aware placement method that uses an analytical thermal model together with a force-directed placement algorithm. The analytical thermal model is implemented using the Green function, and the temperature profile is solved using the DCT with input reordering to enhance speed. The analytical thermal model offers a closed-form thermal representation with negligible run time. Although it is less accurate, it obtains a temperature profile much faster and within reasonable accuracy. This characteristic is well suited to thermal analysis during the placement or floorplanning stage, where thermal analysis does not need to be exact.
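As a reminder of what input reordering means for the DCT, the sketch below computes a one-dimensional DCT-II from a single FFT of the same length, following the well-known reordering of [18]. It is an assumption that this matches the scheme of Chapter 2 in every detail; the function names are hypothetical, the transform is unnormalized, and the small recursive FFT stands in for the FFT solver [10] used in the experiments.

```python
import cmath
import math


def fft(a):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2])
    odd = fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dct2_reordered(x):
    """Unnormalized DCT-II: X[k] = sum_n x[n] cos(pi (2n+1) k / (2N))."""
    n = len(x)
    # Reorder: even-indexed samples ascending, then odd-indexed samples descending.
    v = x[0::2] + x[1::2][::-1]
    spectrum = fft(v)
    # A quarter-sample phase shift turns the N-point FFT into the DCT-II.
    return [(cmath.exp(-1j * math.pi * k / (2 * n)) * spectrum[k]).real
            for k in range(n)]


def dct2_direct(x):
    """Reference O(N^2) evaluation of the same DCT-II definition."""
    n = len(x)
    return [sum(x[m] * math.cos(math.pi * (2 * m + 1) * k / (2 * n))
                for m in range(n)) for k in range(n)]


x = [1.0, 2.0, 3.0, 4.0, 0.5, -1.0, 2.5, 0.0]
print(max(abs(a - b) for a, b in zip(dct2_reordered(x), dct2_direct(x))))
```

The benefit of the reordering is that the FFT has length N instead of the 2N or 4N needed by an even extension of the input.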

Regarding our placement, our placer is based on [16] with a few modifications of our own. Although it trails behind the state-of-the-art placers, the experimental results show that the quality of our placer is within a reasonable range. The general structure of our placer is relatively simple, which allows the thermal model to be integrated easily. To account for the thermal effect, we proposed a new methodology that adds thermal anchors around the identified hot region to reduce the maximum temperature. The size of the hot region is dynamic, so that the perturbation of cells is limited without sacrificing too much HPWL. The magnitude of the force of each thermal anchor is calculated using a 2-D Gaussian model to obtain a smooth temperature profile.


Bibliography

[1] ANSYS Icepak. http://www.ansys.com/Products/Simulation+Technology/Fluid+Dynamics/ANSYS+Icepak.

[2] ISPD 2005 benchmark. http://archive.sigda.org/ispd2006/contest.html.

[3] S. N. Adya and I. L. Markov. Executable placement utilities. http://vlsicad.eecs.umich.edu/BK/PlaceUtils/.

[4] N. Ahmed, T. Natarajan, and K. Rao. Discrete cosine transform. volume C-23, pages 90–93, Jan. 1974.

[5] T. F. Chan, J. Cong, J. R. Shinnerl, K. Sze, and M. Xie. mPL6: Enhanced multilevel mixed-size placement. In International Symposium on Physical Design, pages 212–214, 2006.

[6] S. Chaudhury.

[7] D. Chen, E. Li, E. Rosenbaum, and S.-M. Kang. Interconnect thermal modeling for accurate simulation of circuit timing and reliability. volume 19, pages 197–205, Feb. 2000.

[8] G. Chen and S. Sapatnekar. Partition-driven standard cell thermal placement. In Proceedings of the 2003 International Symposium on Physical Design, ISPD '03, pages 75–80, New York, NY, USA, 2003. ACM.

[9] T.-C. Chen, Z.-W. Jiang, T.-C. Hsu, H.-C. Chen, and Y.-W. Chang. NTUplace3: An analytical placer for large-scale mixed-size designs with preplaced blocks and density constraints. volume 27, pages 1228–1240, July 2008.

[10] M. Frigo and S. G. Johnson. FFTW. http://www.fftw.org/links.html.

[11] B. Goplen and S. Sapatnekar. Efficient thermal placement of standard cells in 3D ICs using a force directed approach. In Proceedings of the 2003 IEEE/ACM International Conference on Computer-Aided Design, ICCAD '03, pages 86–, Washington, DC, USA, 2003. IEEE Computer Society.

[12] B. Goplen and S. S. Sapatnekar. Efficient thermal placement of standard cells in 3D ICs using a force directed approach. In Proc. Int. Conf. on Computer-Aided Design, pages 86–89, 2003.

[13] P.-Y. Huang, C.-K. Lin, and Y.-M. Lee. Full-chip thermal analysis for the early design stage via generalized integral transforms. In Design Automation Conference, 2008. ASPDAC 2008. Asia and South Pacific, pages 462–467, Mar. 2008.

[14] A. B. Kahng, S.-M. Kang, W. Li, and B. Liu. Analytical thermal placement for VLSI lifetime improvement and minimum performance variation. In International Conference on Computer Design, pages 71–77, 2007.

[15] A. B. Kahng and Q. Wang. A faster implementation of APlace. In International Symposium on Physical Design, pages 218–220, 2006.

[16] M.-C. Kim, D.-J. Lee, and I. Markov. SimPL: An effective placement algorithm. In Computer-Aided Design (ICCAD), 2010 IEEE/ACM International Conference on, pages 649–656, Nov. 2010.

[17] J. Li and H. Miyashita. Thermal-aware placement based on FM partition scheme and force-directed heuristic. volume E89-A, pages 989–995, Oxford, UK, Apr. 2006. Oxford University Press.

[18] J. Makhoul. A fast cosine transform in one and two dimensions. volume 28, pages 27–34, Feb. 1980.

[19] B. Obermeier and F. M. Johannes. Temperature-aware global placement. In Proceedings of the 2004 Asia and South Pacific Design Automation Conference, ASP-DAC '04, pages 143–148, Piscataway, NJ, USA, 2004. IEEE Press.

[20] M. Pan, N. Viswanathan, and C. C. N. Chu. An efficient and effective detailed placement algorithm. In International Conference on Computer Aided Design, pages 48–55, 2005.

[21] J. Roy. Capo: Robust and scalable open-source min-cut floorplacer. In International Symposium on Physical Design, pages 218–220, 2006.

[22] P. Spindler, U. Schlichtmann, and F. M. Johannes. Kraftwerk2 - a fast force-directed quadratic placement approach using an accurate net model. volume 27, pages 1398–1411, 2008.

[23] C. H. Tsai and S. M. Kang. Cell-level placement for improving substrate thermal distribution. In IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., volume 19, pages 253–266, Feb. 2000.

[24] N. Viswanathan, G.-J. Nam, C. Alpert, P. Villarrubia, H. Ren, and C. Chu. RQL: Global placement via relaxed quadratic spreading and linearization. In Design Automation Conference, 2007. DAC '07. 44th ACM/IEEE, pages 453–458, June 2007.

[25] N. Viswanathan, M. Pan, and C. C. N. Chu. FastPlace 3.0: A fast multilevel quadratic placement algorithm with placement congestion control. In Asia and South Pacific Design Automation Conference, pages 135–140, 2007.

[26] B. Wang and P. Mazumder. Accelerated chip-level thermal analysis using multilayer Green's function. volume 26, pages 325–344, Feb. 2007.

[27] Y. Zhan and S. Sapatnekar. High-efficiency Green function-based thermal simulation algorithms. Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, 26(9):1661–1675, Sept. 2007.
