Simultaneous Buffer-sizing and Wire-sizing for Clock Trees Based on Lagrangian Relaxation


YU-MIN LEE^a, CHARLIE CHUNG-PING CHEN^a,*, YAO-WEN CHANG^b and D.F. WONG^c

^a Department of Electrical and Computer Engineering, University of Wisconsin, Madison, WI 53706, USA; ^b Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan, ROC; ^c Department of Computer Sciences, University of Texas, Austin, TX 78712, USA

(Received 15 March 2001; Revised 30 January 2002)

Delay, power, skew, area and sensitivity are the most important concerns in current clock-tree design. We present in this paper an algorithm for simultaneously optimizing these objectives by sizing wires and buffers in clock trees. Our algorithm, based on the Lagrangian relaxation method, can optimally minimize delay, power and area simultaneously while achieving very low skew and sensitivity. With linear storage overall and linear runtime per iteration, our algorithm is extremely economical, fast and accurate; for example, it can solve a 6201-wire-segment clock-tree problem in about 1 minute of runtime with 1.3 MB of memory on an IBM RS/6000 workstation and still achieve picosecond precision.

Keywords: VLSI CAD; Interconnect optimization; Lagrangian relaxation; Buffer-sizing; Wire-sizing; Clock trees

INTRODUCTION

Delay, skew, power, area and skew sensitivity are the most important concerns in current clock-tree design. With the increasing complexity of synchronous ASICs, clock skew and clock-signal delay have become important factors in determining circuit performance [2,4,10,17]. Wire-width process variations during fabrication can significantly impact delay and skew; thus, it is important to consider the sensitivity of a design to inter-chip process variations [13]. As reported in Ref. [7], the power dissipation of a clock tree plays a key role in the overall power dissipation of a chip. Therefore, it is desirable to simultaneously consider delay, skew, power, area and sensitivity in clock-tree design.

Algorithms for routing-tree optimization have been proposed extensively in the recent literature [3,4,5,12,13,15,17]. The works in Refs. [3,5,12,15] are designed for general routing trees; hence, they cannot handle clock-tree issues such as skew and sensitivity. Although Refs. [4,13,14,17] consider sensitivity, skew and/or delay, most of these algorithms only size wires and do not minimize power and area. Moreover, existing algorithms suffer from long runtimes and large storage requirements. For example, Refs. [13,17] convert the skew minimization problem into a least-squares minimization problem. However, due to the storage and inversion of large gradient matrices, their runtime per iteration and storage requirements are about cubic and quadratic in the problem size, respectively.

We present in this paper an algorithm for simultaneously optimizing the above-mentioned objectives by sizing wires and buffers in clock trees. Our algorithm, based on the Lagrangian relaxation method, can simultaneously optimize delay, power and area with very low skew and sensitivity; it relaxes the constraints, scaled by Lagrange multipliers, into its objective function and then iteratively solves the subproblems that result from dynamically adjusting the multipliers. Our algorithm is extremely fast, economical and accurate; it requires only linear storage overall and linear runtime per iteration for adjusting wire and buffer sizes. For example, we can solve a 6201-wire-segment clock-tree problem in about 1 minute of runtime with 1.3 MB of memory on an IBM RS/6000 workstation and still guarantee picosecond precision.

ISSN 1065-514X print/ISSN 1563-5171 online © 2002 Taylor & Francis Ltd DOI: 10.1080/1065514021000012200


PRELIMINARIES

We use the following notations in this paper.

• $T$: A clock tree with a driver $w_0$ at the root (source) and a set of $s$ sinks $\{N_1, N_2, \ldots, N_s\}$.

• $w_i$: $i$-th wire segment or buffer; $w_i$ is a wire segment when $1 \le i \le n$, and a buffer when $n+1 \le i \le n+m$ or $i = 0$.

• $x_i$, $l_i$: Size and length of $w_i$, respectively.

• $\lambda$: $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_s)$ is the Lagrange-multiplier vector.

• $x$: $x = (x_0, x_1, x_2, \ldots, x_{n+m})$ is a wire- and buffer-sizing solution.

• $\rho_i$: Resistance of wire per unit length at unit width, when $1 \le i \le n$; resistance of a unit-size buffer, when $i = 0$ or $n+1 \le i \le n+m$.

• $\epsilon_i$: Area capacitance of wire per unit square, when $1 \le i \le n$; capacitance of a unit-size buffer, when $i = 0$ or $n+1 \le i \le n+m$.

• $r_i$: Resistance of $w_i$; $r_i \approx \rho_i l_i / x_i$ when $1 \le i \le n$, and $r_i \approx \rho_i / x_i$ when $n+1 \le i \le n+m$ or $i = 0$.

• $c_i$: Capacitance of $w_i$; $c_i \approx \epsilon_i l_i x_i$ when $1 \le i \le n$, and $c_i \approx \epsilon_i x_i$ when $n+1 \le i \le n+m$ or $i = 0$.

• $U_i$, $L_i$: Upper and lower bounds of the size of $w_i$, respectively, i.e. $L_i \le x_i \le U_i$, $0 \le i \le n+m$.

• $P_i$: All wires and buffers on the path from the source to sink $N_i$ (including $N_i$).

• $T_i$: All wires and buffers in the subtree of $T$ rooted at $w_i$ (excluding $w_i$).

• parent($w_i$): Parent of $w_i$.

• Child($w_i$): Set of $w_i$'s children.

• Ans($w_i$): All wires or buffers on the path from $w_i$ to the nearest upstream buffer or the root (excluding $w_i$).

• Dec($w_i$): All wires, buffers or sinks on the paths from $w_i$ to the neighboring downstream buffers or sinks (excluding $w_i$).

• $R_i$: Upstream resistance of $w_i$; $R_i = \sum_{w_j \in \mathrm{Ans}(w_i)} r_j$.

• $C_i$: Downstream capacitance of $w_i$; $C_i = \sum_{w_j \in \mathrm{Child}(w_i)} (C_j + c_j) + \sum_{N_j \in \mathrm{Child}(w_i)} \tilde{c}_j$, where $\tilde{c}_j$ is the capacitance of sink $N_j$, $1 \le j \le s$.

• $A$: Area of a clock tree; $A = \sum_{i=1}^{n} x_i l_i + \sum_{i=n+1}^{n+m} x_i + x_0$.

See Fig. 1 for an illustration of $R_i$ and $C_i$.
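The notation above can be made concrete with a small sketch. Assuming an unbuffered wire tree whose only buffer is the root driver $w_0$ (so Ans($w_i$) is the driver plus every wire above $w_i$), and assuming wire ids are numbered so that a parent always has a smaller id than its children, the hypothetical function `tree_rc` below evaluates $r_i$, $c_i$, the downstream capacitance $C_i$ and the upstream resistance $R_i$ in two linear passes. Parameter names and example values are illustrative assumptions, not taken from the paper.

```python
def tree_rc(parent, rho, eps, length, width, sink_wire, sink_cap, rho0, x0):
    """Hypothetical helper for an unbuffered wire tree driven by buffer w_0.

    parent[i]    -- parent wire of wire i (0 denotes the driver w_0); parent[i] < i
    sink_wire[k] -- wire whose far end drives sink N_k; sink_cap[k] is its capacitance
    rho0, x0     -- unit-size driver resistance and driver size, so r_0 = rho0 / x0
    """
    wires = sorted(parent)
    r = {i: rho[i] * length[i] / width[i] for i in wires}  # r_i ~ rho_i l_i / x_i
    c = {i: eps[i] * length[i] * width[i] for i in wires}  # c_i ~ eps_i l_i x_i

    # Downstream capacitance C_i: sinks first, then children before parents.
    C = {i: 0.0 for i in wires}
    for k, i in sink_wire.items():
        C[i] += sink_cap[k]
    C0 = 0.0                                               # load seen by the driver
    for i in reversed(wires):
        load = C[i] + c[i]                                 # what wire i adds to its parent
        if parent[i] == 0:
            C0 += load
        else:
            C[parent[i]] += load

    # Upstream resistance R_i: driver plus every wire above w_i (top-down pass).
    r0 = rho0 / x0
    R = {}
    for i in wires:
        p = parent[i]
        R[i] = r0 if p == 0 else R[p] + r[p]
    return r, c, C, C0, R
```

For example, a driver feeding wire 1 (length 2) that splits into wires 2 and 3 (length 1) loaded by sinks of capacitance 10 and 20 gives $C_1 = 32$, $C_0 = 34$ and $R_2 = R_3 = 3$ when every unit parameter and width is 1.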

We use a distributed resistance–capacitance (RC) segment to represent a branch of a clock tree (see Fig. 2(a)). The distributed RC segment can be modeled as an equivalent lumped π-circuit. The lumped resistance and capacitance of the π-model of an RC segment $w_i$ are approximated by $\rho_i l_i / x_i$ and $\epsilon_i l_i x_i$, respectively. We use the switch-resistor model to compute buffer delays (see Fig. 2(b)) and apply the Elmore delay model [8] to approximate signal delays in a subtree. Given a distributed RC routing tree $T$, its signal delay at sink $N_i$ is computed by

$$D_i = \sum_{w_j \in P_i,\, 1 \le j \le n} r_j \left( C_j + \frac{c_j}{2} \right) + \sum_{w_j \in P_i,\, n+1 \le j \le n+m} r_j C_j + r_0 C_0.$$
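As a sketch of how the delay formula above can be evaluated in linear time, the hypothetical function below handles a special case: an unbuffered wire tree whose only buffer is the driver $w_0$, so the buffer sum drops out and $D_i = \sum_{w_j \in P_i} r_j (C_j + c_j/2) + r_0 C_0$. It accumulates the delay to the far end of each wire top-down; the data layout (dicts keyed by wire id, parents numbered before children) is an assumption.

```python
def elmore_sink_delays(parent, r, c, sink_wire, sink_cap, r0):
    """Elmore delay to every sink of an unbuffered wire tree (hypothetical helper).

    parent[i]    -- parent wire of wire i (0 = driver w_0); ids satisfy parent[i] < i
    r[i], c[i]   -- lumped resistance and capacitance of wire i
    sink_wire[k] -- wire whose far end drives sink N_k; sink_cap[k] is its capacitance
    r0           -- driver resistance
    """
    wires = sorted(parent)
    C = {i: 0.0 for i in wires}
    for k, i in sink_wire.items():
        C[i] += sink_cap[k]
    C0 = 0.0
    for i in reversed(wires):                 # bottom-up: downstream capacitance C_i
        if parent[i] == 0:
            C0 += C[i] + c[i]
        else:
            C[parent[i]] += C[i] + c[i]
    t = {}
    for i in wires:                           # top-down: delay at the far end of wire i
        upstream = r0 * C0 if parent[i] == 0 else t[parent[i]]
        t[i] = upstream + r[i] * (C[i] + c[i] / 2)
    return {k: t[i] for k, i in sink_wire.items()}
```

For a driver with $r_0 = 1$ feeding a wire with $r = c = 2$ that splits into two wires with $r = c = 1$ loaded by sinks of capacitance 10 and 20, the sketch returns delays of 110.5 and 120.5 (arbitrary units).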

FIGURE 1 Upstream resistance and downstream capacitance.

In practical CMOS applications, capacitive dissipation (due to the charging and discharging of load capacitances) usually dominates the other types of power dissipation [5]. Hence, we consider only capacitive dissipation in this paper. Given a clock tree, its power dissipation $P$ can be approximated by $P \approx f C_{tot} V_{dd}^2$, where $f$ is the clock frequency and $C_{tot}$ is the total capacitance of the tree.

Clock skew is defined as the maximum difference in the delays from the clock source to the clock sinks; that is, the skew of a clock tree is $S = \max_{i,j} |D_i - D_j|$. Given wire width $w$, the skew sensitivity $\Delta$ is defined as the maximum difference between skews under varying values of $w$ due to process variations [4]. The goal of sensitivity minimization is to find an optimal $w$ such that $\Delta$ is minimized.
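Power and skew reduce to one-line computations. The sketch below (hypothetical helper names) evaluates $P \approx f C_{tot} V_{dd}^2$ and uses the fact that $\max_{i,j} |D_i - D_j|$ equals the largest sink delay minus the smallest, so skew can be computed in time linear rather than quadratic in the number of sinks; the pairwise version is a direct transcription of the definition, included only to confirm the equivalence. Skew sensitivity $\Delta$ depends on a process-variation model not specified here, so it is omitted.

```python
from itertools import combinations


def clock_power(f, c_total, vdd):
    """Capacitive power P ~ f * C_tot * V_dd^2 (the only dissipation modelled here)."""
    return f * c_total * vdd ** 2


def clock_skew(delays):
    """S = max_{i,j} |D_i - D_j|, computed in linear time as max - min."""
    return max(delays) - min(delays)


def clock_skew_pairwise(delays):
    """Direct transcription of the definition, quadratic in the number of sinks."""
    return max(abs(a - b) for a, b in combinations(delays, 2))
```

For instance, a 1 GHz clock driving 100 pF of total capacitance at 1 V dissipates roughly 0.1 W under this model (illustrative numbers).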

This paper addresses the clock-tree wire- and buffer-sizing problem, targeting multiple objectives such as delay, skew, power, area and sensitivity. We give the formulation for the wire- and buffer-sizing problem as follows:

The Clock-Tree Wire- and Buffer-Sizing Problem

Given: A clock tree $T$ with the source $N_0$ and sinks $\{N_1, N_2, \ldots, N_s\}$; wire segments $\{w_1, w_2, \ldots, w_n\}$; buffers $\{w_0, w_{n+1}, w_{n+2}, \ldots, w_{n+m}\}$; upper bounds $\{U_0, U_1, \ldots, U_{n+m}\}$; and lower bounds $\{L_0, L_1, \ldots, L_{n+m}\}$.

Objective: Find an $x$ that minimizes $\max_{1 \le i \le s} D_i$, $S$, $P$, $A$ and/or $\Delta$.

An Example of the Clock-Tree Wire- and Buffer-Sizing Problem

Figure 3 illustrates an example of a clock tree with source $N_0$. There are three sinks ($N_1$, $N_2$ and $N_3$), five wires ($w_1$, $w_2$, $w_3$, $w_4$ and $w_5$), and two buffers ($w_0$, $w_6$) in this clock tree. The goal is to find a set of wire and buffer sizes that minimizes $\max_{1 \le i \le s} D_i$, $S$, $P$, $A$ and/or $\Delta$.

DELAY/POWER/AREA MINIMIZATION

We formulate the wire- and buffer-sizing problem for simultaneous delay, power and area minimization as follows:

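$$\begin{aligned}
\mathrm{M}:\ \text{Minimize}\ & \alpha D_{max} + \beta P + \gamma A \\
\text{Subject to}\ & D_i(x) \le D_{max}, \quad 1 \le i \le s, \\
& L_i \le x_i \le U_i, \quad 0 \le i \le n+m, \\
& D_{max} > 0,
\end{aligned}$$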

where $\alpha$, $\beta$ and $\gamma$ are given constants. Note that $D_{max}$ is a variable we introduce to minimize the maximum delay. As shown above, there are two sets of inequalities. The first set of $s$ inequalities ensures that every sink satisfies its delay constraint. The second set ensures that the size of every wire segment and buffer satisfies its size constraints.

By dividing both sides of the delay, lower-bound and upper-bound constraints by $D_{max}$, $x_i$ and $U_i$, respectively, we can rewrite these constraints as $D_i(x)/D_{max} \le 1$, $L_i/x_i \le 1$ and $x_i/U_i \le 1$. Hence, M becomes a geometric programming problem, which can be reduced to a convex programming problem by an exponential transformation [6]. However, since general geometric programming solvers usually involve inverting gradient matrices, their storage and runtime requirements are at least quadratic and cubic in the problem size, respectively. Therefore, it is desirable to develop a more efficient algorithm for solving this problem.
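To make the exponential transformation concrete, here is a short worked derivation (standard geometric-programming background, added for illustration rather than taken from the paper). A posynomial in the sizes has positive coefficients $c_k$ and real exponents $a_{jk}$; substituting $x_j = e^{y_j}$ turns every monomial into the exponential of an affine function of $y$:

$$g(x) = \sum_{k} c_k \prod_{j} x_j^{a_{jk}}, \quad c_k > 0 \qquad \xrightarrow{\;x_j = e^{y_j}\;} \qquad g(e^{y}) = \sum_{k} \exp\Big( \sum_{j} a_{jk} y_j + \ln c_k \Big).$$

Each term is convex in $y$, so a constraint of the form $g(x) \le 1$, such as $D_i(x)/D_{max} \le 1$, becomes $\ln g(e^{y}) \le 0$, whose left-hand side is a log-sum-exp of affine functions and hence convex. For example, a wire term $\rho_i l_i \tilde{c}_j / x_i$ becomes $\rho_i l_i \tilde{c}_j \, e^{-y_i}$, and the bound $L_i/x_i \le 1$ becomes the linear constraint $y_i \ge \ln L_i$.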

Our approach for solving M is based on Lagrangian relaxation [1,9]. We relax the delay constraints into the objective function by introducing Lagrange multipliers $\lambda_i$, $1 \le i \le s$, one for each delay constraint $D_i(x) \le D_{max}$. We have the Lagrangian-relaxation subproblem for M as follows:

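$$\begin{aligned}
\mathrm{M}':\ \text{Minimize}\ & \alpha D_{max} + \beta P + \gamma A + \sum_{i=1}^{s} \lambda_i \left( D_i(x) - D_{max} \right) \\
\text{Subject to}\ & L_i \le x_i \le U_i, \quad 0 \le i \le n+m, \\
& D_{max} > 0.
\end{aligned}$$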

For each $\lambda \ge 0$, let $L(\lambda)$ be the optimal objective function value of M′. It is well known that $L(\lambda)$ is a lower bound of the optimal objective value of M [1,9]. On the other hand, the objective value of any feasible solution of M is an upper bound of the optimal objective value. Hence, we can use these two bounds to evaluate the quality of a current solution and to determine the termination criteria. By the Kuhn–Tucker theory [11] and the fact that M is equivalent to a convex programming problem, we have the following theorem.

Theorem 1 $(x^*, D_{max}^*)$ is an optimal solution if and only if there exists a vector $\lambda^* = (\lambda_1^*, \lambda_2^*, \ldots, \lambda_s^*)$ such that

(1) $\sum_{i=1}^{s} \lambda_i^* = \alpha$;

(2) $\lambda_i^* \left( D_i(x^*) - D_{max}^* \right) = 0$, $1 \le i \le s$;

(3) $D_i(x^*) - D_{max}^* \le 0$, $1 \le i \le s$;

(4) $\lambda_i^* \ge 0$, $1 \le i \le s$;

(5) $x_i^* = \min(U_i, \max(L_i, F_i))$, where

$$F_i = \sqrt{ \frac{\rho_i \mu_i C_i}{\beta p_i + \gamma + \epsilon_i \sum_{w_j \in \mathrm{Ans}(w_i)} r_j \mu_j} }, \quad p_i = f \epsilon_i V_{dd}^2, \quad \mu_i = \sum_{N_j \in T_i} \lambda_j, \quad 0 \le i \le n+m.$$
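The quantities in Theorem 1(5) can be gathered in two linear-time tree traversals, consistent with the linear per-iteration runtime claimed for the algorithm. The sketch below works under assumptions of our own: an unbuffered wire tree whose only buffer is the driver $w_0$ (so $\mathrm{Ans}(w_i)$ is the driver plus all wires above $w_i$, and $\mu_0 = \sum_j \lambda_j$), parents numbered before children, and the hypothetical helper names `multiplier_sums` and `optimal_size`. It returns $\mu_i$ and $S_i = \sum_{w_j \in \mathrm{Ans}(w_i)} r_j \mu_j$, then applies the clamped closed form for one size.

```python
import math


def multiplier_sums(parent, sink_wire, lam, r, r0):
    """Return (mu, mu0, S) for an unbuffered wire tree (hypothetical helper).

    mu[i] = sum of lambda_k over sinks N_k below wire i (mu0: over all sinks)
    S[i]  = sum over w_j in Ans(w_i) of r_j * mu_j, with the driver w_0 included
    """
    wires = sorted(parent)                    # assumes parent[i] < i
    mu = {i: 0.0 for i in wires}
    for k, i in sink_wire.items():
        mu[i] += lam[k]
    mu0 = 0.0
    for i in reversed(wires):                 # bottom-up accumulation of multipliers
        if parent[i] == 0:
            mu0 += mu[i]
        else:
            mu[parent[i]] += mu[i]
    S = {}
    for i in wires:                           # top-down accumulation of r_j * mu_j
        p = parent[i]
        S[i] = r0 * mu0 if p == 0 else S[p] + r[p] * mu[p]
    return mu, mu0, S


def optimal_size(rho_i, eps_i, mu_i, C_i, S_i, beta, gamma, f, vdd, lo, hi):
    """Theorem 1(5): x_i* = min(U_i, max(L_i, F_i)) with p_i = f * eps_i * V_dd^2."""
    p_i = f * eps_i * vdd ** 2
    F_i = math.sqrt(rho_i * mu_i * C_i / (beta * p_i + gamma + eps_i * S_i))
    return min(hi, max(lo, F_i))
```

Here `rho_i` and `eps_i` are the unit quantities $\rho_i$ and $\epsilon_i$, so the same `optimal_size` call serves wires and buffers.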

Proof Since the objective function is a posynomial, and the delay constraints are also posynomials after dividing both sides by $D_{max}$, M is a geometric programming problem, which is equivalent to a convex programming problem under the transformation $x_i = e^{y_i}$. Hence, a local minimum of M is a global minimum of M.

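We write down the Kuhn–Tucker conditions [11] for M′ as follows:

$$\frac{\partial L}{\partial D_{max}^*} = 0, \tag{1}$$

$$\frac{\partial L}{\partial x_i} = 0, \quad 0 \le i \le n+m, \tag{2}$$

$$\lambda_i \left( D_i(x^*) - D_{max}^* \right) = 0, \quad 1 \le i \le s, \tag{3}$$

$$D_i(x^*) - D_{max}^* \le 0, \quad 1 \le i \le s, \tag{4}$$

$$D_{max}^* > 0, \tag{5}$$

$$\lambda_i \ge 0, \quad 1 \le i \le s. \tag{6}$$

By Eq. (1), we get

$$\sum_{i=1}^{s} \lambda_i = \alpha. \tag{7}$$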

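We can also rewrite Eq. (2) as follows:

$$\begin{aligned}
\frac{\partial L}{\partial x_i}
&= \frac{\partial}{\partial x_i}\Big[ \beta P + \gamma A + \sum_{j=1}^{s} \lambda_j D_j + \Big( \alpha - \sum_{j=1}^{s} \lambda_j \Big) D_{max}^* \Big] \\
&= \frac{\partial}{\partial x_i}\Big[ \beta P + \gamma A + \sum_{j=1}^{s} \lambda_j \Big( \sum_{w_k \in P_j,\, 1 \le k \le n} r_k \Big( C_k + \frac{c_k}{2} \Big) + \sum_{w_k \in P_j,\, n+1 \le k \le n+m} r_k C_k + r_0 C_0 \Big) \Big] \\
&= \frac{\partial}{\partial x_i}\Big[ \beta P + \gamma A + \sum_{w_k \in T,\, 1 \le k \le n} r_k \Big( C_k + \frac{c_k}{2} \Big) \sum_{N_j \in T_k} \lambda_j + \sum_{w_k \in T,\, n+1 \le k \le n+m} r_k C_k \sum_{N_j \in T_k} \lambda_j + \mu_0 r_0 C_0 \Big] \\
&= \frac{\partial}{\partial x_i}\Big[ \beta P + \gamma A + \sum_{w_k \in T,\, 1 \le k \le n} r_k \mu_k \Big( C_k + \frac{c_k}{2} \Big) + \sum_{w_k \in T,\, n+1 \le k \le n+m} r_k \mu_k C_k + \mu_0 r_0 C_0 \Big].
\end{aligned}$$

Note that, apart from $\beta P + \gamma A$, the terms that are linear in $x_i$ come from $\sum_{w_k \in \mathrm{Ans}(w_i)} r_k \mu_k C_k$. In fact, only the term $\epsilon_i l_i x_i$ (the wire capacitance of $w_i$) or $\epsilon_i x_i$ (the buffer capacitance of $w_i$) in $C_k$ contributes to the terms with $x_i$; hence

$$A_i(x^*) = \begin{cases} l_i \Big( \beta p_i + \gamma + \epsilon_i \sum_{w_j \in \mathrm{Ans}(w_i)} r_j \mu_j \Big), & 1 \le i \le n, \\ \beta p_i + \gamma + \epsilon_i \sum_{w_j \in \mathrm{Ans}(w_i)} r_j \mu_j, & i = 0 \text{ or } n+1 \le i \le n+m. \end{cases}$$

Since the terms that involve $1/x_i$ come only from $r_i \mu_i C_i$, we have

$$B_i(x^*) = \begin{cases} l_i \rho_i \mu_i C_i, & 1 \le i \le n, \\ \rho_i \mu_i C_i, & i = 0 \text{ or } n+1 \le i \le n+m. \end{cases}$$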

It is clear that $A_i(x^*)$ and $B_i(x^*)$ are independent of $x_i$. Hence, we can rewrite $\partial L / \partial x_i$ as follows:

$$\frac{\partial L}{\partial x_i} = \frac{\partial}{\partial x_i}\Big[ A_i(x^*)\, x_i + \frac{B_i(x^*)}{x_i} + E_i(x^*) \Big] = A_i(x^*) - \frac{B_i(x^*)}{x_i^2},$$

where $E_i(x^*)$ is independent of $x_i$. Since, with all other variables fixed, $L$ is a convex function of the single variable $x_i$, the optimal $x_i^*$ satisfies the following equation:

$$x_i^* = \min\left( U_i, \max\left( L_i, \sqrt{ \frac{B_i(x^*)}{A_i(x^*)} } \right) \right), \quad 0 \le i \le n+m. \tag{8}$$

Theorem 1 thus follows. ∎
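Equation (8) is the closed-form minimizer of the one-dimensional function $A x + B/x$ on $[L_i, U_i]$ with $A, B > 0$. The following sketch checks that claim numerically by comparing the clamped square root against a dense grid search. The coefficient values are arbitrary illustrations, and the grid search is only a test harness, not part of the algorithm.

```python
import math


def eq8_size(A, B, lo, hi):
    """Closed form of Eq. (8): clamp sqrt(B / A) into [lo, hi]."""
    return min(hi, max(lo, math.sqrt(B / A)))


def grid_minimizer(A, B, lo, hi, steps=20000):
    """Brute-force minimizer of A*x + B/x over an evenly spaced grid on [lo, hi]."""
    best_x, best_val = lo, A * lo + B / lo
    for k in range(1, steps + 1):
        x = lo + (hi - lo) * k / steps
        val = A * x + B / x
        if val < best_val:
            best_x, best_val = x, val
    return best_x
```

Because $A x + B/x$ is convex for $x > 0$ with its unconstrained minimum at $\sqrt{B/A}$, clamping to the box is exact, which is why each resizing step costs O(1) once $A_i$ and $B_i$ are known.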

Based on the above analysis, we need to find $x^*$ and $\lambda^*$ to solve Problem M. Once the $\lambda_i$'s are assigned, we can compute $x^*$ based on Theorem 1(5). Hence, we can adopt a two-level approach to solve this problem: in the outer loop, we dynamically adjust the sink weights $\lambda_i$, where the weight associated with each sink is proportional to the signal delay of that sink; in the inner loop, we find an optimal wire- and buffer-sizing solution for the given $\lambda_i$'s. With this in mind, we present the Lagrangian-relaxation-based algorithm shown in Fig. 4; the algorithm iteratively adjusts the multipliers based on the delay information associated with the sinks and solves the corresponding Lagrangian relaxation subproblems. Our algorithm runs in $O(pqn)$ time using $O(n)$ storage, where $p$ is the number of iterations (A3–A6) in OWBA and $q$ is the number of iterations (S2–S3) in LRS. Empirically, the overall runtime approaches linear in the problem size. We have the following theorem.

Theorem 2 Algorithm OWBA converges to a globally optimal solution.
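The two-level structure described above can be sketched as follows. This is a minimal, hypothetical illustration rather than the paper's OWBA/LRS (Fig. 4 is not reproduced here). It assumes an unbuffered wire tree with a fixed-size driver and parents numbered before children, minimizes only $\alpha D_{max} + \gamma A$ (the power term and buffer sizing are omitted for brevity), and uses a multiplicative update $\lambda_k \leftarrow \lambda_k D_k / D_{max}$ followed by rescaling to $\sum_k \lambda_k = \alpha$, which is one common way to give longer-delay sinks larger weights. Each inner sweep visits wires top-down and applies Eq. (8), so it runs in linear time; we make no claim that this particular update reproduces the convergence guarantee of Theorem 2.

```python
import math


def owba_sketch(parent, rho, eps, length, lo, hi, sink_wire, sink_cap,
                r0, alpha=1.0, gamma=0.0, outer=50, inner=5):
    """Hypothetical two-level Lagrangian-relaxation loop (not the paper's code).

    Unbuffered wire tree: parent[i] < i, parent 0 = driver w_0 of fixed
    resistance r0. Objective: alpha * D_max + gamma * area (power omitted).
    """
    wires = sorted(parent)
    x = {i: lo[i] for i in wires}                         # start from minimum widths
    lam = {k: alpha / len(sink_wire) for k in sink_wire}  # satisfies sum(lam) = alpha

    def downstream_caps():
        C = {i: 0.0 for i in wires}
        for k, i in sink_wire.items():
            C[i] += sink_cap[k]
        C0 = 0.0
        for i in reversed(wires):                         # bottom-up
            load = C[i] + eps[i] * length[i] * x[i]
            if parent[i] == 0:
                C0 += load
            else:
                C[parent[i]] += load
        return C, C0

    def sink_delays():
        C, C0 = downstream_caps()
        t = {}
        for i in wires:                                   # top-down Elmore delays
            r = rho[i] * length[i] / x[i]
            c = eps[i] * length[i] * x[i]
            t[i] = (r0 * C0 if parent[i] == 0 else t[parent[i]]) + r * (C[i] + c / 2)
        return {k: t[i] for k, i in sink_wire.items()}

    for _ in range(outer):
        # mu_i = sum of the multipliers of the sinks below wire i (bottom-up).
        mu = {i: 0.0 for i in wires}
        for k, i in sink_wire.items():
            mu[i] += lam[k]
        mu0 = 0.0
        for i in reversed(wires):
            if parent[i] == 0:
                mu0 += mu[i]
            else:
                mu[parent[i]] += mu[i]

        # Inner loop: top-down sweeps; each step minimizes L over one x_i (Eq. (8)).
        for _ in range(inner):
            # C[i] stays exact when wire i is reached: its subtree is resized later.
            C = downstream_caps()[0]
            S = {}
            for i in wires:
                p = parent[i]
                S[i] = r0 * mu0 if p == 0 else S[p] + rho[p] * length[p] / x[p] * mu[p]
                A = gamma + eps[i] * S[i]                 # A_i / l_i
                B = rho[i] * mu[i] * C[i]                 # B_i / l_i
                x[i] = min(hi[i], max(lo[i], math.sqrt(B / A)))

        # Outer loop: hypothetical update giving longer-delay sinks more weight,
        # then rescale so that sum(lam) = alpha (Theorem 1(1)).
        D = sink_delays()
        d_max = max(D.values())
        lam = {k: lam[k] * D[k] / d_max for k in lam}
        total = sum(lam.values())
        lam = {k: alpha * v / total for k, v in lam.items()}

    return x, sink_delays()
```

For a three-wire tree whose sinks have capacitances 1 and 3, the minimum-width starting point has a maximum delay of 77.25 in the example's units; the sketch widens the trunk wire and returns a substantially smaller maximum delay, with exact values depending on the illustrative parameters.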

SKEW AND SENSITIVITY MINIMIZATION

By definition, the clock skew is $S = \max_{i,j} |D_i - D_j|$. To reduce clock skew, we need not only to reduce signal delays but also to balance them. We have the following formulation to minimize clock skew:

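$$\begin{aligned}
\mathrm{M1}:\ \text{Minimize}\ & \alpha D_{max} + \beta P + \gamma A + \delta (D_{max} - D_{min}) \\
\text{Subject to}\ & D_i(x) \le D_{max}, \quad 1 \le i \le s, \\
& D_i(x) \ge D_{min}, \quad 1 \le i \le s, \\
& L_i \le x_i \le U_i, \quad 0 \le i \le n+m, \\
& D_{max} > 0, \ D_{min} > 0.
\end{aligned}$$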

Since M1 introduces negative coefficients, it is no longer a geometric programming problem, and hence there is no guarantee of convexity. For a non-convex problem, a globally optimal solution may be hard to find. We therefore resort to the following heuristic approach. Following the Lagrangian relaxation procedure, we relax the delay constraints by bringing them into the objective function with the associated Lagrange multipliers $\lambda_i$ and $\sigma_i$, $1 \le i \le s$, where $\lambda_i$ and $\sigma_i$ are associated with the delay constraints $D_i(x) \le D_{max}$ and $D_i(x) \ge D_{min}$, respectively. We have the Lagrangian relaxation subproblem for M1 as follows:

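$$\begin{aligned}
\mathrm{M1}':\ \text{Minimize}\ & \alpha D_{max} + \beta P + \gamma A + \delta (D_{max} - D_{min}) \\
& + \sum_{i=1}^{s} \lambda_i \left( D_i(x) - D_{max} \right) + \sum_{i=1}^{s} \sigma_i \left( D_{min} - D_i(x) \right) \\
\text{Subject to}\ & L_i \le x_i \le U_i, \quad 0 \le i \le n+m, \\
& D_{max} > 0, \ D_{min} > 0.
\end{aligned}$$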

Hence, by repeatedly solving the Lagrangian relaxation subproblems, we can minimize clock skew.
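One consequence of M1′ is worth spelling out (our own derivation, by the same reasoning that gives Eq. (7) for M′). Setting the derivatives of the M1′ objective with respect to the free variables $D_{max}$ and $D_{min}$ to zero yields normalization conditions on both multiplier sets, and collecting the delay terms shows that each sink delay is weighted by $\lambda_i - \sigma_i$:

$$\alpha + \delta - \sum_{i=1}^{s} \lambda_i = 0 \;\Rightarrow\; \sum_{i=1}^{s} \lambda_i = \alpha + \delta, \qquad -\delta + \sum_{i=1}^{s} \sigma_i = 0 \;\Rightarrow\; \sum_{i=1}^{s} \sigma_i = \delta,$$

$$\sum_{i=1}^{s} \lambda_i D_i(x) - \sum_{i=1}^{s} \sigma_i D_i(x) = \sum_{i=1}^{s} (\lambda_i - \sigma_i)\, D_i(x).$$

Whenever some $\lambda_i - \sigma_i < 0$, the weighted delay is no longer a posynomial, which is one way to see why M1 loses the convexity that M enjoys.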

Sensitivity is used to measure the influence of production variations. It can be measured by the first derivative of the signal delay with respect to the wire (buffer) size, which can be shown to be $|\epsilon_i R_i - \rho_i l_i C_i / x_i^2|$ ($|\epsilon_i R_i - \rho_i C_i / x_i^2|$). Restricting the sensitivity of every wire (buffer) to be smaller than $\Delta_{max}$, we get $|\epsilon_i R_i - \rho_i l_i C_i / x_i^2| \le \Delta_{max}$ ($|\epsilon_i R_i - \rho_i C_i / x_i^2| \le \Delta_{max}$). In our algorithm, we dynamically add the following constraints into Step S3 of LRS during execution:


• For $1 \le i \le n$:

$$\begin{cases}
x_i \le \min\Big( U_i, \max\Big( L_i, \sqrt{\dfrac{\rho_i l_i C_i}{\epsilon_i R_i - \Delta_{max}}} \Big) \Big), & \text{if } \epsilon_i R_i - \dfrac{\rho_i l_i C_i}{x_i^2} \ge 0, \\[1ex]
x_i \ge \min\Big( U_i, \max\Big( L_i, \sqrt{\dfrac{\rho_i l_i C_i}{\epsilon_i R_i + \Delta_{max}}} \Big) \Big), & \text{if } \epsilon_i R_i - \dfrac{\rho_i l_i C_i}{x_i^2} < 0.
\end{cases}$$

• For $i = 0$ or $n+1 \le i \le n+m$:

$$\begin{cases}
x_i \le \min\Big( U_i, \max\Big( L_i, \sqrt{\dfrac{\rho_i C_i}{\epsilon_i R_i - \Delta_{max}}} \Big) \Big), & \text{if } \epsilon_i R_i - \dfrac{\rho_i C_i}{x_i^2} \ge 0, \\[1ex]
x_i \ge \min\Big( U_i, \max\Big( L_i, \sqrt{\dfrac{\rho_i C_i}{\epsilon_i R_i + \Delta_{max}}} \Big) \Big), & \text{if } \epsilon_i R_i - \dfrac{\rho_i C_i}{x_i^2} < 0.
\end{cases}$$
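The case analysis above translates directly into an interval update for $x_i$. The hypothetical helper below returns the tightened bounds for one wire or buffer, where `eps_R` stands for $\epsilon_i R_i$ and `rho_l` for $\rho_i l_i$ (wire) or $\rho_i$ (buffer). The branch for $\epsilon_i R_i \le \Delta_{max}$, where the square root above is undefined, is our own addition: in that case $\epsilon_i R_i - \rho_i l_i C_i / x_i^2 < \Delta_{max}$ holds for every $x_i > 0$, so no upper bound is imposed.

```python
import math


def sensitivity_bounds(eps_R, rho_l, C, x, delta_max, lo, hi):
    """Tightened (lo, hi) for x_i from |eps_R - rho_l * C / x^2| <= delta_max.

    eps_R = eps_i * R_i; rho_l = rho_i * l_i (wire) or rho_i (buffer).
    Mirrors the two cases in the text; names are illustrative.
    """
    def clamp(v):
        return min(hi, max(lo, v))

    slope = eps_R - rho_l * C / x ** 2           # sign at the current size picks the case
    if slope >= 0:
        if eps_R <= delta_max:                   # sqrt undefined; constraint always holds
            return lo, hi
        return lo, clamp(math.sqrt(rho_l * C / (eps_R - delta_max)))  # new upper bound
    return clamp(math.sqrt(rho_l * C / (eps_R + delta_max))), hi     # new lower bound
```

For instance, with $\epsilon_i R_i = 4$, $\rho_i l_i C_i = 4$ and $\Delta_{max} = 2$, a currently non-negative slope yields the upper bound $x_i \le \sqrt{2}$, exactly where $4 - 4/x_i^2 = 2$.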

While the above approaches reduce skew and sensitivity, they also tend to increase delay, power, area and runtime. In fact, we observe that Algorithm OWBA already significantly reduces skew and sensitivity while optimizing delay, power and/or area. This is because Algorithm OWBA tends to allocate higher weights to sinks with longer delays and lower weights to those with shorter delays; consequently, the longer paths get more resources than the shorter ones. This effect directly balances the delays among different sinks and hence reduces clock skew. We also observed that OWBA is a good heuristic for sensitivity minimization. To see this, let us consider delay minimization (i.e. $\alpha = 1$, $\beta = \gamma = 0$). Our algorithm essentially sizes all buffers and wire segments iteratively, one at a time (in Step S3 of LRS), while keeping the sizes of all other buffers and wire segments fixed. It can be proved that S3 not only optimally sizes a buffer or wire segment but also simultaneously minimizes the sensitivity with respect to the average delay.

EXPERIMENTAL RESULTS

We implemented our algorithm and tested it on the five circuits r1–r5 used in Ref. [16] on an IBM RS/6000 workstation. The per-micron resistance and capacitance used are 3 mΩ and 0.02 fF, respectively. The lower and upper bounds for wire widths are 1 and 10 μm, respectively. Table I lists the names of the circuits, the numbers of wire segments in the circuits, and the delays, skews, sensitivities, runtimes and storage requirements. It shows that our algorithm, on average, reduced the delay, skew and sensitivity by 595, 691 and 2582%, respectively, after wire-sizing, where each Red% value is Initial/Final × 100.

TABLE I Experimental results in delay, skew and sensitivity

Ckt | # Wire segments | Delay Initial (ns) | Delay Final (ns) | Delay Red% | Skew Initial (ps) | Skew Final (ps) | Skew Red% | Δmax Initial | Δmax Final | Δmax Red% | Runtime (s) | Stor (kB) | Err (ps)
r1 | 533 | 0.775 | 0.161 | 481 | 64 | 16 | 400 | 7.96 | 0.53 | 1501 | 3.50 | 148 | 0.2
r2 | 1195 | 2.108 | 0.379 | 556 | 221 | 12 | 1842 | 15.86 | 0.65 | 2436 | 13.38 | 280 | 0.4
r3 | 1723 | 3.376 | 0.572 | 590 | 154 | 36 | 427 | 20.58 | 0.68 | 3039 | 17.25 | 388 | 0.6
r4 | 3805 | 9.087 | 1.376 | 660 | 716 | 92 | 778 | 42.13 | 1.48 | 2850 | 54.87 | 812 | 1.4
r5 | 6201 | 15.864 | 2.312 | 686 | 974 | 102 | 955 | 63.51 | 2.06 | 3085 | 67.04 | 1300 | 2.3
Avg | – | – | – | 595 | – | – | 691 | – | – | 2582 | – | – | –

Δmax is given in units of 10^-15 s/μm; Red% = Initial/Final × 100.

Further, our algorithm is extremely fast and economical. For example, for the circuit r5 with 6201 wire segments, our algorithm needed only 67 seconds of runtime and 1.3 MB of storage to achieve 2.3-ps precision. In Fig. 5(a) and (b), the runtime and storage requirements, respectively (vertical axis), are plotted as a function of the number of wire segments in a circuit (horizontal axis). They show that the runtime and storage requirements of our algorithm grow approximately linearly with the number of wire segments. Figure 6 shows the relationship among the maximum delay ($D_{max}$), the value of $L(\lambda)$ and the clock skew at each iteration. The horizontal axis represents the number of iterations, and the vertical axis represents $D_{max}$, $L(\lambda)$ and skew (in picoseconds). The gap between $D_{max}$ and $L(\lambda)$ is the error bound of our algorithm.
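The Red% columns of Table I are ratios rather than percentage decreases: each equals Initial/Final × 100, so a delay reduction of 481% for r1 means the final delay is about 4.8 times smaller than the initial one. The short sketch below recomputes the delay column from the initial and final delays in the table, using only the table's own numbers.

```python
# Initial and final maximum delays (ns) copied from Table I.
DELAYS_NS = {
    "r1": (0.775, 0.161),
    "r2": (2.108, 0.379),
    "r3": (3.376, 0.572),
    "r4": (9.087, 1.376),
    "r5": (15.864, 2.312),
}


def red_percent(initial, final):
    """Table I's Red% metric: Initial / Final * 100 (a ratio, not a decrease)."""
    return 100.0 * initial / final


ratios = {ckt: red_percent(*vals) for ckt, vals in DELAYS_NS.items()}
average = sum(ratios.values()) / len(ratios)

for ckt, value in ratios.items():
    print(f"{ckt}: {value:.0f}%")
print(f"Avg: {average:.0f}%")
```

It prints 481, 556, 590, 660 and 686% for r1–r5 and an average of 595%, matching the delay columns of the table.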

Acknowledgements

The authors thank Prof. Leon S. Lasdon, Prof. Patrick Jaillet, Prof. Ross Baldick, Prof. Jonathan Bard and Prof. Jayant Rajgopal for their invaluable help and comments.

References

[1] Lasdon, Leon S. (1970) Optimization Theory for Large Systems (Macmillan Publishing Co., Inc., New York).

[2] Bakoglu, H. (1990) Circuits, Interconnections, and Packaging for VLSI (Addison-Wesley Publishing Company Inc., Reading, MA).

[3] Chen, C.-P. and Wong, D.F. (1996) "A fast algorithm for optimal wire-sizing under Elmore delay model", Proc. IEEE ISCAS.

[4] Chung, J. and Cheng, C.-K. (1994) "Skew sensitivity minimization of buffered clock tree", Proc. ICCAD, 280 – 283.

[5] Cong, J. and Leung, K.-S. (1995) “Optimal wiresizing under Elmore delay model”, IEEE TCAD 14(3), 321 – 336.

[6] Duffin, R.J., Peterson, E.L. and Zener, C. (1967) Geometric Programming—Theory and Application (John Wiley and Sons, Inc., New York).

[7] Dobberpuhl, D. and Witek, R. (1992) “A 200 MHz 64B dual-issue CMOS microprocessor”, Proc. IEEE ISSCC, 106 – 107.

[8] Elmore, W.C. (1948) “The transient response of damped linear networks with particular regard to wide band amplifiers”, J. Appl. Phys. 19(1).

[9] Fisher, M.L. (1985) “An applications oriented guide to Lagrangian relaxation”, Interfaces 15(2), 10 – 21.

[10] Jackson, M.A.B., et al. (1990) “Clock routing for high performance ICs”, Proc. DAC, 573 – 579.

[11] Luenberger, D.G. (1984) Linear and Nonlinear Programming (Addison-Wesley Pub. Company Inc., Reading, MA).

[12] Menezes, N., Baldick, R. and Pillage, L.T. (1995) “A sequential quadratic programming approach to concurrent gate and wire-sizing”, Proc. ICCAD.

[13] Menezes, N., Pullela, S., Dartu, F. and Pillage, L.T. (1994) "RC interconnect synthesis—a moment fitting approach", Proc. ICCAD.

[14] Pullela, S., Menezes, N. and Pillage, L.T. (1993) "Reliable non-zero skew clock trees using wire width optimization", Proc. 30th ACM/IEEE Design Automation Conf., 165 – 170.

[15] Sapatnekar, S.S. (1994) “RC interconnect optimization under the Elmore delay model”, Proc. ICCAD, 387 – 391.

[16] Tsay, R.-S. (1993) "Exact zero skew", IEEE TCAD.

[17] Zhu, Q., Dai, W.W.-M. and Xi, J.G. (1993) “Optimal sizing of high-speed clock networks based on distributed RC and lossy transmission line models”, Proc. ICCAD, 628 – 633.

Yu-Min Lee was born in Taiwan in 1969. He received the B.S. and M.S. degrees in communication engineering from the National Chiao-Tung University, Taiwan, in 1991 and 1993, respectively, and the M.S. degree in electrical and computer engineering from the University of Wisconsin at Madison in 2000. He is currently working towards the Ph.D. degree at the University of Wisconsin at Madison. His research interests include electrical design automation, low-power and high-performance circuit design, and signal integrity analysis and optimization.

Charlie Chung-Ping Chen received his B.S. degree in computer science and information engineering from the National Chiao-Tung University, Hsinchu, Taiwan, in 1990. He received his M.S. and Ph.D. degrees in computer science from the University of Texas at Austin in 1996 and 1998, respectively. From 1997 to 1999 he was with the Intel Corporation as a senior CAD engineer with Strategic CAD Labs, where he actively participated in several high-speed interconnect optimization and circuit synthesis projects. He received the D2000 Award from Intel Corp. in 1999. Currently, he is an assistant professor in the Electrical and Computer Engineering Department at the University of Wisconsin, Madison. His research interests are in the areas of computer-aided design and microprocessor circuit design, with an emphasis on interconnect and circuit optimization as well as signal integrity analysis and optimization. Dr Chen received the Faculty Early Career Development (CAREER) Award from the National Science Foundation in 2001 for his work on VLSI interconnect modeling, simulation and optimization.

Yao-Wen Chang received his B.S. degree in computer science and information engineering from the National Taiwan University, Taipei, Taiwan, in 1988, and the M.S. and Ph.D. degrees in computer science from the University of Texas at Austin in 1993 and 1996, respectively. He was with the IBM T.J. Watson Research Center, Yorktown Heights, NY, in the VLSI group, during the summer of 1994. Currently, he is an Associate Professor in the Department of Electrical Engineering at the National Taiwan University, Taipei, Taiwan. His research interests lie in design automation, architecture and systems for VLSI, and combinatorial optimization. Dr Chang received the Best Paper Award of the CAD track at the 1995 IEEE International Conference on Computer Design (ICCD-95) for his work on FPGA routing. He is a member of the IEEE Circuits and Systems Society, ACM, and ACM/SIGDA.

D.F. Wong received the B.Sc. degree in mathematics from the University of Toronto, Canada, and the M.S. degree in mathematics and the Ph.D. degree in computer science from the University of Illinois at Urbana-Champaign. He is currently a Professor in the Department of Computer Sciences at the University of Texas at Austin. His main research interest is the computer-aided design (CAD) of very-large-scale integration (VLSI) circuits. He has published more than 140 technical papers in this area. He is a coauthor of Simulated Annealing for VLSI Design (Norwell, MA: Kluwer, 1988). He was the Technical Program Chair of the 1998 ACM International Symposium on Physical Design (ISPD-98). He has also served on the Technical Program Committees of a number of other VLSI CAD conferences (e.g. ICCAD, ED&TC, ISCAS, FPGA). Dr Wong received the Best Paper Awards at DAC-86 and ICCD-95 for his work on floorplan design and field-programmable gate array routing, respectively. He is an Editor of IEEE Transactions on Computers.