MB^*-Tree: A Multilevel Floorplanner for Large-Scale Building-Module Design


Hsun-Cheng Lee, Yao-Wen Chang, Member, IEEE, and Hannah Honghua Yang

Abstract—In this paper, we present an agglomeratively multilevel floorplanning/placement framework based on the B∗-tree representation, called MB∗-tree, to handle the floorplanning and packing for large-scale building modules. The MB∗-tree adopts a two-stage technique, i.e., clustering followed by declustering. The clustering stage iteratively groups a set of modules based on a cost metric guided by area utilization and module connectivity and at the same time establishes the geometric relations for the newly clustered modules by constructing a corresponding B∗-tree for them. The declustering stage iteratively ungroups a set of the previously clustered modules (i.e., performs tree expansion) and then refines the floorplanning/placement solution by using a simulated annealing scheme. In particular, the MB∗-tree preserves the geometric relations among modules during declustering, which makes the MB∗-tree an ideal data structure for the multilevel floorplanning/placement framework. Experimental results show that the MB∗-tree obtains significantly better silicon area and wirelength than previous works. Further, unlike previous works, the MB∗-tree scales very well as the circuit size increases.

Index Terms—Floorplanning, layout, multilevel framework, physical design, placement.

I. INTRODUCTION

DESIGN complexities are growing at breathtaking speed with the continued improvement of nanometer IC technologies. On one hand, designs with hundreds of millions of transistors are already in production (ICs with billions of transistors are even expected within this decade), intellectual property (IP) modules are widely reused, and a large number of buffer blocks are used for delay optimization as well as noise reduction in nanometer interconnect-driven floorplanning [3], [11], [19], [20], [23], [35], all of which drive the need for a tool to handle large-scale building modules. On the other hand, the highly competitive IC market requires faster design convergence, faster incremental design turnaround, and better silicon area utilization. Efficient and effective design methodology and tools capable of placing and optimizing large-scale modules are essential for such large designs.

Manuscript received April 5, 2003; revised February 28, 2004 and August 23, 2005. The work of Y.-W. Chang was supported by the National Science Council of Taiwan under Grants NSC 2215-E-002-009, NSC 93-2220-E-002-001, and NSC 93-2752-E-002-008-PAE. This paper was presented at the 40th ACM/IEEE Design Automation Conference, June 2003. This paper was recommended by Associate Editor T. Yoshimura.

H.-C. Lee is with Synopsys, Taipei 110, Taiwan, R.O.C. (e-mail: gis88526@cis.nctu.edu.tw).

Y.-W. Chang is with the Department of Electrical Engineering and Graduate Institute of Electronics Engineering, National Taiwan University, Taipei 106, Taiwan, R.O.C., and also with Waseda University, Tokyo 169-8050, Japan (e-mail: ywchang@cc.ee.ntu.edu.tw).

H. H. Yang is with Strategic CAD Laboratories, Intel Corporation, Hillsboro, OR 97124 USA (e-mail: hyang@ichips.intel.com).

Digital Object Identifier 10.1109/TCAD.2007.891368

Many floorplan representations have been proposed [9], [15], [24]–[28], [31]–[33], [36], [37] in the literature. However, traditional floorplanning/placement algorithms do not scale well as the design size, complexity, and constraints increase, which is mainly due to their inflexibility in handling nonslicing floorplans and/or their intrinsically nonhierarchical data structures (representations). The B∗-tree, in contrast, has been shown to be an efficient, effective, and flexible data structure for nonslicing floorplans [9]. It is particularly suitable for representing a nonslicing floorplan with large-scale modules and for creating or incrementally updating a floorplan. More importantly, its binary-tree-based structure directly corresponds to the framework of a hierarchical divide-and-conquer scheme, and thus, the properties inherited from the structure can substantially facilitate the operations for multilevel large-scale building-module floorplanning/placement.

Based on the B∗-tree representation, we present an agglomeratively multilevel floorplanning/placement framework called MB∗-tree to handle the floorplanning and packing for large-scale building modules with high efficiency and quality. MB∗-tree is inspired by the success of the agglomeratively multilevel framework in graph/circuit partitioning such as Chaco [16], hMetis [21], and ML [4]; placement such as mPL [6]; hierarchical placement/floorplanning such as BEAR [13]; and routing such as MRS [10], MR [8], [29], MARS [12], and CMR [17], [18]. Unlike multilevel partitioners and placers, however, multilevel floorplanning poses unique difficulties as the shapes of modules to be clustered together can significantly affect the area utilization of a floorplan, and a floorplan design within a cluster needs to be explored along with the global floorplan optimization. The clustering approach also helps to directly address floorplan congestion and timing issues, since different clustering algorithms can be developed to localize intermodule communication and reduce the critical path length.

The MB∗-tree algorithm adopts a two-stage technique, i.e., clustering followed by declustering. (See Fig. 1 for an illustration of the multilevel framework.) The clustering stage iteratively groups a set of modules (which could be basic modules and/or previously clustered modules) based on a cost metric guided by area utilization and module connectivity and at the same time establishes the geometric relations for the newly clustered modules by constructing a corresponding B∗-tree. The clustering procedure repeats until a single cluster containing all modules is formed, which is denoted by a one-node B∗-tree that bookkeeps the entire multilevel clustering information. For soft modules, we apply Lagrangian relaxation during clustering to determine the module shapes. Then, the declustering stage iteratively ungroups a set of the previously clustered modules (i.e., expanding a node into a subtree according to the B∗-tree topology constructed at the clustering stage) and then applies simulated annealing to refine the floorplanning/placement solution based on a cost metric defined by area utilization and wirelength. The refinement shall lead to a “better” B∗-tree structure that guides the declustering at the next level. It is important to note that we always keep only one B∗-tree for processing at each iteration, and the MB∗-tree preserves the geometric relations among modules during declustering (i.e., the tree expansion), which makes the MB∗-tree an ideal data structure for the multilevel floorplanning/placement framework. Note that our multilevel framework agglomeratively clusters solutions (i.e., clusters modules one by one) with a postrefinement at each level of the hierarchy, resulting in a linear number of levels. This framework is different from classical multilevel frameworks that simultaneously cluster solutions throughout the design, resulting in a logarithmic number of levels.

Fig. 1. Multilevel framework.

Experimental results show that the MB∗-tree scales very well as the circuit size increases, while the well-known previous works, sequence pair (SP), O-tree, and B∗-tree alone, do not. For circuit sizes ranging from 49 to 9800 modules and from 408 to 81 600 nets, the MB∗-tree consistently obtains high-quality floorplans with dead spaces of less than 3.7% in empirically linear runtime, while SP, O-tree, and B∗-tree can handle only up to 196, 196, and 1960 modules in the same amount of runtime and result in dead spaces as large as 13.00% (at 196 modules), 9.86% (at 196 modules), and 27.33% (at 1960 modules), respectively. We also performed experiments based on a large industrial design with 189 modules and 9777 nets. The results show that our MB∗-tree algorithm obtained significantly better silicon area and wirelength than previous works.

The remainder of this paper is organized as follows: Section II formulates the module floorplanning/placement problem. Section III gives a brief overview of the B∗-tree representation. Section IV presents our two-stage algorithm, i.e., clustering followed by declustering, for the problem addressed in this paper. Section V presents our approach for handling soft modules. Section VI gives the experimental results, and finally, the concluding remarks are given in Section VII.

II. PROBLEM FORMULATION

Let M = {m_1, m_2, . . . , m_n} be a set of n rectangular modules. Each module m_i ∈ M is associated with a three tuple (h_i, w_i, a_i), where h_i, w_i, and a_i denote the height, width, and aspect ratio of m_i, respectively. The area A_i of m_i is given by h_i w_i, and the aspect ratio a_i of m_i is given by h_i/w_i. Let r_i,min and r_i,max be the minimum and maximum aspect ratios, i.e., h_i/w_i ∈ [r_i,min, r_i,max]. A placement (floorplan) P = {(x_i, y_i) | m_i ∈ M} is an assignment of rectangular modules m_i with the coordinates of their bottom-left corners being assigned to (x_i, y_i) so that no two modules overlap (and h_i/w_i ∈ [r_i,min, r_i,max] ∀i). In this paper, we consider both hard and soft modules. A hard module is not flexible in its shape but free to rotate. A soft module is free to rotate and change its shape within the range [r_i,min, r_i,max]. The objective of placement/floorplanning is to minimize a specified cost metric such as a combination of the area A_tot and wirelength W_tot induced by the assignment of m_i, where A_tot is measured by the final enclosing rectangle of P, and W_tot is the summation of half the bounding-box perimeter of the pins for each net (or the center-to-center interconnections among all modules).

Fig. 2. Admissible placement and its corresponding B∗-tree.
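As a concrete reading of the two objective terms, the following minimal sketch evaluates the enclosing-rectangle area and a half-perimeter wirelength for a small placement. The module names, sizes, and nets are hypothetical, and pins are assumed to sit at module centers (one of the two options mentioned above).

```python
# Illustrative sketch only: module data and nets are made up for the example.
# A rectangle is (x, y, w, h) with (x, y) the bottom-left corner.

def overlaps(a, b):
    """True if the interiors of rectangles a and b intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def enclosing_area(placement):
    """A_tot: area of the enclosing rectangle anchored at (0, 0)."""
    width = max(x + w for x, _, w, _ in placement.values())
    height = max(y + h for _, y, _, h in placement.values())
    return width * height


def half_perimeter_wirelength(placement, nets):
    """W_tot: sum over nets of half the bounding-box perimeter of the pins,
    with every pin assumed to be at its module's center."""
    total = 0.0
    for net in nets:
        xs = [placement[m][0] + placement[m][2] / 2 for m in net]
        ys = [placement[m][1] + placement[m][3] / 2 for m in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


placement = {"m1": (0, 0, 4, 2), "m2": (4, 0, 2, 3), "m3": (0, 2, 3, 2)}
nets = [["m1", "m2"], ["m1", "m2", "m3"]]

area = enclosing_area(placement)
module_area = sum(w * h for _, _, w, h in placement.values())
dead_space = area - module_area
wirelength = half_perimeter_wirelength(placement, nets)
```

A weighted sum of `area` and `wirelength` would then play the role of the cost metric; the weights are left to the user, as in the text.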

III. REVIEW OF THE B∗-TREE REPRESENTATION

As mentioned earlier, we apply the B∗-tree representation to handle the problem of multilevel large-scale building-module floorplanning/placement. Thus, we shall first give a review of the B∗-tree representation.

Given a compacted placement P in which no module can move down or left (called an admissible placement [15]), we can represent it by a unique B∗-tree T [9]. [See Fig. 2(b) for the B∗-tree representing the placement of Fig. 2(a).] A B∗-tree is an ordered binary tree (a restriction of O-tree with faster and more flexible operations) whose root corresponds to the module on the bottom-left corner. Using the depth-first search (DFS) procedure, the B∗-tree T for an admissible placement P can be constructed in a recursive fashion. Starting from the root, we first recursively construct the left subtree and then the right subtree. Let R_i denote the set of modules located on the right-hand side of and adjacent to m_i. The left child of the node n_i corresponds to the lowest module in R_i that is unvisited. The right child of n_i represents the lowest module located above m_i whose x-coordinate is equal to that of m_i.

As shown in Fig. 2, we make n_1 the root of T since m_1 is on the bottom-left corner. Constructing the left subtree of n_1 recursively, we make n_2 the left child of n_1. Since the left child of n_2 does not exist, we then construct the right subtree of n_2 (which is rooted by n_3). The construction is recursively performed in DFS order. After completing the left subtree of n_1, the same procedure applies to the right subtree of n_1.

Fig. 3. Cluster with the four primitive modules a, b, c, and d. The placement can be obtained by applying the clustering scheme {{m_1, m_2}, {m_3, m_4}}, resulting in a dead space of 36 units.

Fig. 2(b) illustrates the resulting B∗-tree for the placement shown in Fig. 2(a). The construction takes only linear time. The B∗-tree keeps the geometric relationship between two modules as follows: If node n_j is the left child of node n_i, module m_j must be located on the right-hand side of m_i, with x_j = x_i + w_i. Besides, if node n_j is the right child of n_i, module m_j must be located above module m_i, with the x-coordinate of m_j equal to that of m_i, i.e., x_j = x_i. Also, since the root of T represents the bottom-left module, the coordinate of the module is (x_root, y_root) = (0, 0).
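The two x-coordinate rules above are enough to recover a placement from a tree. The sketch below is one possible way to do so, not the authors' implementation: the x-coordinates follow the rules exactly, while the y-coordinate is assumed to be the lowest position that clears every module already placed in the same x-span (found here by a simple scan rather than an efficient contour structure). The `Node` class and the sample modules are hypothetical.

```python
# Illustrative sketch: turn a B*-tree into bottom-left coordinates.

class Node:
    def __init__(self, name, w, h, left=None, right=None):
        self.name, self.w, self.h = name, w, h
        self.left = left    # module to the right of this one: x = x_parent + w_parent
        self.right = right  # module above this one: x = x_parent


def pack(root):
    placed = []   # (x_start, x_end, top) of every module placed so far
    coords = {}
    stack = [(root, 0.0)]          # the root sits at x = 0
    while stack:
        node, x = stack.pop()
        # Assumption: y is the highest top edge among placed modules that
        # overlap [x, x + w) horizontally, or 0 if there is none.
        y = max((top for xs, xe, top in placed if xs < x + node.w and x < xe),
                default=0.0)
        coords[node.name] = (x, y)
        placed.append((x, x + node.w, y + node.h))
        # Push the right child first so the left subtree is visited first (DFS).
        if node.right is not None:
            stack.append((node.right, x))
        if node.left is not None:
            stack.append((node.left, x + node.w))
    return coords


root = Node("m1", 4, 2, left=Node("m2", 2, 3), right=Node("m3", 3, 2))
coords = pack(root)
```

Here `m2` is the left child of `m1` and so lands at x = 4, while `m3` is the right child and is stacked on top of `m1` at x = 0.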

Inheriting from the nice properties of ordered binary trees, the B∗-tree is simple, efficient, effective, and flexible for handling nonslicing floorplans. It is particularly suitable for representing a nonslicing floorplan with various types of modules and for creating or incrementally updating a floorplan. More importantly, its binary-tree-based structure directly corresponds to the framework of a hierarchical scheme, which makes it a superior data structure for multilevel large-scale building-module floorplanning/placement.

IV. MB∗-TREE ALGORITHM

In this section, we shall present our MB∗-tree algorithm for multilevel large-scale building-module floorplanning/placement. As mentioned earlier, the algorithm adopts a two-stage approach, i.e., clustering followed by declustering, by using the B∗-tree representation.

The clustering operation results in two types of modules, namely: 1) primitive modules and 2) cluster modules. A primitive module m is a module given as an input (i.e., m ∈ M), while a cluster module is created by grouping two or more primitive modules. Each cluster module is created by a clustering scheme {m_i, m_j}, where m_i (m_j) denotes a primitive or a cluster module. Fig. 3 shows a cluster module with four primitive modules; a possible way to form the cluster module is by the clustering scheme {{m_1, m_2}, {m_3, m_4}}.

In the following subsections, we detail the two-stage approach of clustering followed by declustering for hard modules.

A. Clustering

The clustering stage iteratively groups a set of (primitive or cluster) modules (say, two modules) based on a cost metric defined by area utilization, wirelength, and connectivity among modules, and at the same time establishes the geometric relations among the newly clustered modules by constructing a corresponding B∗-subtree. The clustering procedure repeats until a single cluster containing all modules is formed (or the number of modules is smaller than a predefined threshold), which is denoted by a one-node B∗-tree that bookkeeps the entire clustering scheme. We shall first consider the clustering metric.

Fig. 4. Example connectivity between each pair of modules. We apply the clustering scheme {{m_1, m_2}, {m_3, m_4}} based on connectivity density instead of {{{m_1, m_2}, m_3}, m_4} (based on connectivity).

1) Clustering Metric: The clustering metric is defined by the two criteria, namely: 1) area utilization (dead space) and 2) the connectivity density among modules.

1) Dead space: The area utilization for clustering two modules m_i and m_j can be measured by the resulting dead space s_ij, representing the unused area after clustering m_i and m_j. Let s_tot denote the dead space in the final floorplan P. We have s_tot = A_tot − Σ_{m_i ∈ M} A_i, where A_i denotes the area of module m_i, and A_tot denotes the area of the final enclosing rectangle of P. Since Σ_{m_i ∈ M} A_i is a constant, minimizing A_tot is equivalent to minimizing the dead space s_tot. For the example shown in Fig. 3, s_12 = 0, s_13 = 36, and s_tot = 36.

2) Connectivity density: Let the connectivity c_ij denote the number of nets between two (primitive or cluster) modules m_i and m_j. The connectivity density d_ij between two modules m_i and m_j is given by

$$d_{ij} = \frac{c_{ij}}{n_i + n_j} \tag{1}$$

where n_i (n_j) denotes the number of primitive modules in m_i (m_j). Often, a bigger cluster implies a larger number of connections. The connectivity density considers not only the connectivity but also the sizes of clusters between two modules to avoid possible biases. For the example shown in Fig. 4, we apply the clustering scheme {{m_1, m_2}, {m_3, m_4}} (based on connectivity density) instead of {{{m_1, m_2}, m_3}, m_4} (based on connectivity).
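The bias that connectivity density is meant to avoid can be seen with a few numbers. The sketch below uses hypothetical net counts (they are not taken from Fig. 4): a two-module cluster has more nets to m_3 than m_4 does, yet its density is lower once the cluster size is taken into account.

```python
# Illustrative sketch with made-up net counts.

def connectivity_density(c_ij, n_i, n_j):
    """d_ij = c_ij / (n_i + n_j), where n_i and n_j count primitive modules."""
    return c_ij / (n_i + n_j)


# Candidate 1: cluster {m1, m2} (2 primitive modules) with m3, 5 nets in total.
c_cluster_m3 = 5
d_cluster_m3 = connectivity_density(c_cluster_m3, 2, 1)

# Candidate 2: primitive modules m3 and m4, 4 nets.
c_m3_m4 = 4
d_m3_m4 = connectivity_density(c_m3_m4, 1, 1)

# Raw connectivity prefers candidate 1; connectivity density prefers candidate 2.
prefers_by_connectivity = "cluster+m3" if c_cluster_m3 > c_m3_m4 else "m3+m4"
prefers_by_density = "cluster+m3" if d_cluster_m3 > d_m3_m4 else "m3+m4"
```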

Obviously, the cost function of dead space is for area optimization while that of connectivity density is for timing and wiring area optimization. Therefore, the metric for clustering two (primitive or cluster) modules m_i and m_j, φ: {m_i, m_j} → R+ ∪ {0}, is then given by

$$\phi(\{m_i, m_j\}) = \alpha \hat{s}_{ij} + \beta \frac{K}{\hat{d}_{ij}} \tag{2}$$

where ŝ_ij and K/d̂_ij are the respective normalized costs for s_ij and K/d_ij, and α, β, and K are user-specified parameters/constants. We set K = s_ij/d_ij to normalize the dead space and the connectivity cost, i.e., to make the ranges of the two normalized costs about the same. Note that we shall normalize the dead space and connectivity density to equally weigh the two costs. To calculate the normalization factors for s_ij and d_ij, we can preprocess using simulated annealing to derive the initial temperature and then obtain the approximate ranges of the resulting area and connectivity density to normalize the costs. For example, we may perform 100 runs of simulated annealing to obtain the approximate ranges of the resulting costs (i.e., area and connectivity density here) and derive the factors (weights) to equally weigh the costs by making the ranges of the two costs about the same. By doing so, it is more meaningful to weigh the area and connectivity density costs through the controlling factors α and β.

Fig. 5. Relation of two modules and their clustering. (a) Two candidate modules m_i and m_j. (b) Clustering and corresponding B∗-subtree for the case where m_i is horizontally related to m_j. (c) Clustering and corresponding B∗-subtree for the case where m_i is vertically related to m_j.
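One simple way to make the two cost ranges comparable is to rescale each cost by the range observed in a set of sample runs. The sketch below assumes min-max scaling, which is only one possible reading of the normalization described above; the sample values, the helper names, and the parameter defaults are hypothetical.

```python
# Illustrative sketch: range-based normalization of the two clustering costs.

def make_normalizer(samples):
    """Map a raw cost onto roughly [0, 1] using the range seen in the samples."""
    lo, hi = min(samples), max(samples)
    span = (hi - lo) or 1.0   # guard against a zero range
    return lambda value: (value - lo) / span


def clustering_cost(s_ij, d_ij, norm_s, norm_kd, alpha=0.5, beta=0.5, K=1.0):
    """alpha * normalized dead space + beta * normalized K / d_ij (d_ij > 0)."""
    return alpha * norm_s(s_ij) + beta * norm_kd(K / d_ij)


# Hypothetical cost samples, e.g. collected from preprocessing runs.
norm_s = make_normalizer([0.0, 10.0, 40.0])
norm_kd = make_normalizer([0.5, 1.0, 2.5])

cost = clustering_cost(s_ij=20.0, d_ij=1.0, norm_s=norm_s, norm_kd=norm_kd)
```

With both terms on a similar scale, α and β express a genuine preference between area and connectivity rather than compensating for different units.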

2) Clustering Algorithm: Based on φ, we pick a set of modules (say, two modules) with the minimum clustering cost φ and cluster them into one. The procedure continues until a single cluster containing all primitive modules is formed or the number of modules is smaller than a given threshold (and thus can be easily handled by a classical floorplanner). During clustering, we shall record how two modules m_i and m_j are clustered into a new cluster module m_k. Fig. 5 shows two ways to cluster two modules m_i and m_j. If m_i is placed to the left of (below) m_j, then m_i is horizontally (vertically) related to m_j, which is denoted by m_i → (↑) m_j. If m_i → (↑) m_j, then n_j is the left (right) child of n_i in its corresponding B∗-tree. The relation for each pair of modules in a cluster is established and recorded in the corresponding B∗-subtree during clustering. It will be used for determining how to expand a node into a corresponding B∗-subtree during declustering.

Fig. 6 shows our two-way clustering algorithm. Line 1 computes the initial cost matrix Φ = (φ_ij), where φ_ij = αŝ_ij + βK/d̂_ij. Line 2 assigns to n the number of input primitive modules. Lines 3–9 perform step-by-step clustering (n − 1 steps in total). At Step k, we pick two modules m_i and m_j with the minimum φ_ij (Extract_Min(φ_ij) in Line 4) and then cluster them into a new cluster module m_n+k (cluster(m_i, m_j) in Line 5). Line 6 records the clustering scheme q_k for {m_i, m_j}. Line 7 randomly decides the relation of m_i and m_j, and constructs the corresponding B∗-subtree. We then update the set of modules to cluster modules (Line 8) and the entries associated with m_n+k in the cost matrix Φ (Line 9). We repeat the two-way clustering process n − 1 times until all modules are clustered into a single cluster. The clustering scheme q_n−1 for the last two modules bookkeeps the entire clustering scheme Q. Thus, we assign q_n−1 to Q (Line 10) and return the entire scheme (Line 11).

Fig. 6. Two-way clustering algorithm.
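The loop described for Fig. 6 can be sketched as follows. This is a simplified illustration rather than a transcription of the figure: costs are left unnormalized, the cost of every pair is recomputed at each step instead of being kept in a matrix, pairs with no connecting net get an infinite connectivity cost, and the relation (H for horizontal, V for vertical) is chosen to minimize the bounding box instead of being decided randomly. All data and names are hypothetical.

```python
# Illustrative sketch of agglomerative two-way clustering.
from itertools import combinations


def cluster_all(sizes, conn, alpha=1.0, beta=1.0, K=1.0):
    """sizes: {name: (w, h)}; conn: {(a, b): number of nets between a and b}.
    Returns the nested clustering scheme and the list of recorded relations."""
    mods = {m: dict(w=w, h=h, area=w * h, n=1, scheme=m)
            for m, (w, h) in sizes.items()}
    c = {frozenset(pair): nets for pair, nets in conn.items()}
    relations = []

    def candidate(a, b):
        ma, mb = mods[a], mods[b]
        # Two ways to abut the bounding boxes: side by side or stacked.
        shapes = [("H", ma["w"] + mb["w"], max(ma["h"], mb["h"])),
                  ("V", max(ma["w"], mb["w"]), ma["h"] + mb["h"])]
        rel, w, h = min(shapes, key=lambda s: s[1] * s[2])
        dead = w * h - (ma["area"] + mb["area"])          # s_ij
        d = c.get(frozenset((a, b)), 0) / (ma["n"] + mb["n"])  # d_ij
        cost = alpha * dead + (beta * K / d if d > 0 else float("inf"))
        return cost, a, b, rel, w, h

    step = 0
    while len(mods) > 1:                                   # n - 1 steps in total
        _, a, b, rel, w, h = min(candidate(a, b)
                                 for a, b in combinations(sorted(mods), 2))
        step += 1
        new = f"c{step}"
        ma, mb = mods.pop(a), mods.pop(b)
        mods[new] = dict(w=w, h=h, area=ma["area"] + mb["area"],
                         n=ma["n"] + mb["n"], scheme=(ma["scheme"], mb["scheme"]))
        relations.append((new, a, b, rel))                 # kept for declustering
        for x in mods:                                     # update connectivity
            if x != new:
                c[frozenset((new, x))] = (c.get(frozenset((a, x)), 0)
                                          + c.get(frozenset((b, x)), 0))
    return next(iter(mods.values()))["scheme"], relations


sizes = {"m1": (2, 2), "m2": (2, 2), "m3": (3, 1), "m4": (3, 1)}
conn = {("m1", "m2"): 3, ("m3", "m4"): 3, ("m2", "m3"): 1}
scheme, relations = cluster_all(sizes, conn)
```

The recorded `relations` list is what a declustering pass would replay in reverse order to expand each cluster node back into its two children.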

B. Declustering

The declustering stage iteratively ungroups a set of previously clustered modules (i.e., expanding a node into a subtree according to the B∗-tree topology constructed at the clustering stage) and then refines the floorplan solution based on a simulated annealing scheme. The refinement shall lead to a “better” B∗-tree structure that guides the declustering at the next level. It is important to note that we always keep only one B∗-tree for processing at each iteration, and the agglomeratively multilevel B∗-tree-based floorplanner preserves the geometric relations among modules during declustering (i.e., the tree expansion), which makes the B∗-tree an ideal data structure for the multilevel floorplanning framework.

We shall first introduce the metric used in simulated annealing for refining floorplan/placement solutions.

1) Declustering Metric: The declustering metric is defined by the two criteria, namely: 1) area utilization (dead space) and 2) the wirelength among modules.

1) Dead space: Same as that defined in Section IV-A.

2) Wirelength: The wirelength of a net is measured by half the bounding-box perimeter of all the pins of the net or by the length of the center-to-center interconnections between the modules if no pin positions are specified. The wirelength for clustering two modules m_i and m_j, i.e., w_ij, is measured by the total wirelength interconnecting the two modules. The total wirelength in the final floorplan P, i.e., w_tot, is the summation of the length of the wires interconnecting all modules.

Obviously, the cost function of dead space is for area optimization while that of wirelength is for timing and wiring area optimization. Therefore, the metric for refining a floorplan solution during declustering, ψ_tot : M → R+ ∪ {0}, is then given by

$$\psi_{tot} = \gamma \hat{s}_{tot} + \delta \hat{w}_{tot} \tag{3}$$

where ŝ_tot and ŵ_tot are the respective normalized costs for s_tot and w_tot, and γ and δ are user-specified parameters. The normalization procedure for s_tot and w_tot is similar to that described in Section IV-A1.

Fig. 7. Declustering algorithm.

2) Declustering Algorithm: The declustering stage iteratively ungroups a set of previously clustered modules (i.e., expands a node into a subtree according to the B∗-tree constructed at the clustering stage) and then refines the floorplan solution based on simulated annealing.

Fig. 7 shows the algorithm for declustering a cluster module m_k into two modules m_i and m_j that are clustered into m_k at the clustering stage. Without loss of generality, we place m_i to the left of or below m_j. In Algorithm Declustering (see Fig. 7), parent(n_i), right(n_i), and left(n_i) denote the parent, right child, and left child of node n_i in a B∗-tree, respectively. Line 1 sets the parent of n_i to that of n_k. Lines 2–5 make n_i a left (right) child if n_k is a left (right) child. Lines 6–13 deal with the case where m_i is horizontally related to m_j. If m_i → m_j, then n_j is the left child of n_i, and thus, we update the corresponding links in Line 7. Lines 8–10 (11–13) update the links associated with the right (left) child of n_k. Similarly, Lines 14–23 cope with the case where m_i is vertically related to m_j.
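The link updates can be sketched as a constant number of pointer assignments. Because Fig. 7 itself is not reproduced in the text, the choice of where the former children of n_k are re-attached is an assumption made here for illustration: in the horizontal case the old left child hangs off n_j and the old right child off n_i, and in the vertical case the old left child hangs off n_i and the old right child off n_j. The `Node` class and helper names are hypothetical.

```python
# Illustrative sketch: expand cluster node n_k into n_i and n_j.

class Node:
    def __init__(self, name):
        self.name = name
        self.parent = self.left = self.right = None


def attach(parent, child, side):
    """Set parent.<side> = child and fix the child's parent link."""
    setattr(parent, side, child)
    if child is not None:
        child.parent = parent


def decluster(nk, ni, nj, horizontal):
    old_left, old_right, parent = nk.left, nk.right, nk.parent
    # n_i takes over n_k's position under n_k's parent.
    ni.parent = parent
    if parent is not None:
        side = "left" if parent.left is nk else "right"
        setattr(parent, side, ni)
    if horizontal:                      # m_i -> m_j: n_j is the left child of n_i
        attach(ni, nj, "left")
        attach(ni, old_right, "right")  # assumed: modules above m_k stay above m_i
        attach(nj, old_left, "left")    # assumed: modules right of m_k go right of m_j
    else:                               # m_i below m_j: n_j is the right child of n_i
        attach(ni, nj, "right")
        attach(ni, old_left, "left")    # assumed: modules right of m_k go right of m_i
        attach(nj, old_right, "right")  # assumed: modules above m_k go above m_j
    return ni


# Tiny example: r has left child k, and k has children a (left) and b (right).
r, k, a, b = Node("r"), Node("k"), Node("a"), Node("b")
attach(r, k, "left")
attach(k, a, "left")
attach(k, b, "right")
i, j = Node("i"), Node("j")
decluster(k, i, j, horizontal=True)
```

Only links of the three involved nodes, their parent, and their direct children are touched, which is consistent with a constant-time declustering step.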

Fig. 8 gives an illustration of this algorithm. Fig. 8(a) shows an instance of clustering and its corresponding B∗-tree, for which we are preparing to decluster m_3 into m_6 and m_7 (i.e., the clustering scheme for m_3 is {m_6, m_7}). Fig. 8(b) shows four cases to decluster m_3, and their corresponding resulting B∗-trees are illustrated in Fig. 8(c). Cases 1 and 2 correspond to Lines 6–13 of Fig. 7, and Cases 3 and 4 correspond to Lines 14–23.

Theorem 1: Each declustering operation takes O(1) time, and the overall declustering stage takes O(|M|) time, where |M| is the number of input primitive modules.

Proof: As listed in Algorithm Declustering (see Fig. 7), each declustering operation requires updating only local links associated with the three involved modules (m_i, m_j, and m_k). Since there are only a constant number of such links, performing a declustering operation takes O(1) time. Further, we perform |M| − 1 declustering operations to ungroup all modules, and the overall declustering complexity thus follows.

We propose a simulated annealing-based algorithm to refine the solution at each level of declustering. We apply the following three operations to perturb a multilevel B∗-tree (a feasible solution) to another.

1) Op1: Rotate a module.
2) Op2: Move a module to another place.
3) Op3: Swap two modules.

Op1 exchanges the width and height of a module. Op2 deletes a node of a B∗-tree and inserts it into another position. Op3 deletes two nodes and inserts them into the corresponding positions in the B∗-tree. Obviously, Op2 and Op3 need to perform the deletion and insertion operations on a B∗-tree, which takes O(h) time, where h is the height of the B∗-tree.

The annealing procedure uses a parameter, i.e., temperature t, to control the probability of accepting an uphill move (an inferior solution). The initial temperature is t_0 = −∆_avg/ln(P), where ∆_avg is the average cost change for a set of randomly generated uphill moves, and P < 1 is the initial probability of accepting uphill moves; the minus sign keeps t_0 positive, since ln(P) < 0, and gives exp(−∆_avg/t_0) = P. The temperature t is then decreased by a factor r < 1 (i.e., the temperature for the next iteration is rt). We terminate the annealing process when the temperature cools down to a user-defined value ε.
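The temperature schedule can be written down in a few lines. The sketch below uses t_0 = −∆_avg/ln(P), so that the temperature is positive for P < 1 and an average uphill move is initially accepted with probability P; the sample cost changes and the values of P, r, and ε are hypothetical.

```python
# Illustrative sketch of the initial temperature and geometric cooling.
import math


def initial_temperature(uphill_deltas, p_accept):
    """t0 such that exp(-avg_delta / t0) equals p_accept (0 < p_accept < 1)."""
    avg_delta = sum(uphill_deltas) / len(uphill_deltas)
    return -avg_delta / math.log(p_accept)


def cooling_schedule(t0, r, eps):
    """Yield t0, r*t0, r^2*t0, ... while the temperature stays above eps."""
    t = t0
    while t > eps:
        yield t
        t *= r


def accept(delta, t, u):
    """Metropolis rule: always accept downhill moves; accept an uphill move
    when the uniform random number u in [0, 1) falls below exp(-delta / t)."""
    return delta <= 0 or u < math.exp(-delta / t)


t0 = initial_temperature([2.0, 4.0], p_accept=0.9)
temperatures = list(cooling_schedule(1.0, r=0.5, eps=0.1))
```

In a full annealer, `u` would come from a random number generator and each temperature would be held for a number of perturbation tries before cooling.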

The simulated annealing algorithm starts with a B∗-tree produced during declustering. Then, it perturbs a B∗-tree (a feasible solution) to another B∗-tree by Op1, Op2, and/or Op3 until a predefined “frozen” state is reached. At last, we transform the resulting B∗-tree to the corresponding final admissible placement.

C. Overall MB∗-Tree Algorithm

The MB∗-tree algorithm integrates the aforementioned three algorithms and is summarized in Fig. 9. In Line 1, we first perform clustering to reduce the problem size level by level based on the clustering metric described in Section IV-A1 and then enter the declustering stage. In the declustering stage, we perform floorplanning for the modules at each level using the simulated annealing-based B∗-tree algorithm. At level i, we perform the declustering i^2 times and then perform simulated annealing with i × p tries per iteration, where p is a user-specified parameter. Therefore, the number of tries for each iteration of simulated annealing is proportional to the number of (primitive and cluster) modules at the current level, leading to a better tradeoff between scalability and solution quality since the MB∗-tree can inherit a “good” solution from the previous level.


Fig. 8. Declustering m_3 into m_6 and m_7. (a) Configuration before declustering. (b) Four cases to decluster m_3 into m_6 and m_7. (c) Placement and corresponding B∗-tree topology after declustering.

Fig. 9. MB∗-tree algorithm.

Fig. 10 illustrates an execution of the MB∗-tree algorithm. For explanation, we cluster three modules each time in Fig. 10. Fig. 10(a) lists seven modules to be packed, m_i, 1 ≤ i ≤ 7. Fig. 10(b)–(d) illustrates the execution of the clustering algorithm. Fig. 10(b) shows the resulting configuration after clustering m_5, m_6, and m_7 into a new cluster module m_8 (i.e., the clustering scheme of m_8 is {{m_5, m_6}, m_7}). Similarly, we cluster m_1, m_2, and m_4 into m_9 by using the clustering scheme {{m_2, m_4}, m_1}. Finally, we cluster m_3, m_8, and m_9 into m_10 by using the clustering scheme {{m_3, m_8}, m_9}. The clustering stage is thus done, and the declustering stage begins, in which simulated annealing is applied to do the floorplanning. In Fig. 10(e), we first decluster m_10 into m_3, m_8, and m_9 [i.e., expand the node n_10 into the B∗-subtree illustrated in Fig. 10(e)]. We then move m_8 to the top of m_9 (perform Op2 for m_8) during simulated annealing [see Fig. 10(f)]. As shown in Fig. 10(g), we further decluster m_9 into m_1, m_2, and m_4, and then rotate m_2 and move m_3 on top of m_2 (perform Op1 on m_2 and Op2 on m_3), resulting in the configuration shown in Fig. 10(h). Finally, we decluster m_8 shown in Fig. 10(i) to m_5, m_6, and m_7, and move m_4 to the right of m_3 (perform Op2 for m_4), which results in the optimum placement shown in Fig. 10(j).

V. EXTENSION TO SOFT MODULE HANDLING

In this section, we present our approach for handling soft modules. We first apply Lagrangian relaxation [38] to cluster soft modules at the clustering stage while keeping declustering the same as before. We then propose a network-flow-based algorithm for projecting Lagrange multipliers to satisfy their optimality conditions.

A. Clustering Metric for Soft Modules

The clustering metric for soft modules is defined by the two criteria, namely: 1) area utilization (dead space) and 2) the distance between modules obtained from the computation of Lagrangian relaxation.

1) Dead space: Same as that defined in Section IV-A.

2) Distance: In Lagrangian relaxation, we formulate dead space and wirelength as the objective function. Thus, after the computation of Lagrangian relaxation to be described in Section V-C, we can obtain the distance between two cluster modules m_i and m_j, denoted by t_ij, via their coordinates computed by Lagrangian relaxation.

Therefore, the metric for clustering two soft (primitive or cluster) modules m_i and m_j, i.e., φ_s: {m_i, m_j} → R+ ∪ {0}, is then given by

$$\phi_s(\{m_i, m_j\}) = \alpha \hat{s}_{ij} + \beta \hat{t}_{ij} \tag{4}$$

where ŝ_ij and t̂_ij are the respective normalized costs for s_ij and t_ij, and α and β are the user-specified parameters. The procedure to normalize the s_ij and t_ij costs is similar to that described in Section IV-A1.

Fig. 10. (a) Given seven modules m_i, 1 ≤ i ≤ 7. (b) Cluster m_5, m_6, and m_7 into m_8. (c) Cluster m_1, m_2, and m_4 into m_9. (d) Cluster m_3, m_8, and m_9 into m_10. (e) Decluster m_10 to m_3, m_8, and m_9. (f) Perform Op2 for m_8. (g) Decluster m_9 to m_1, m_2, and m_4. (h) Perform Op1 and Op2 for m_2 and m_3, respectively. (i) Decluster m_8 to m_5, m_6, and m_7. (j) Perform Op2 for m_4.

Based on φ_s, we perform the clustering algorithm as before and then employ the simulated annealing-based algorithm, which is described in Section IV-B, for the floorplanning.

B. Formulation

Let M = {m_1, m_2, . . . , m_n} be a set of n primitive soft modules. Each primitive soft module m_i ∈ M is associated with a three tuple (h_i, w_i, a_i), where h_i, w_i, and a_i denote the height, width, and aspect ratio of m_i, respectively. The area A_i of m_i is given by h_i w_i, and the aspect ratio a_i of m_i is given by h_i/w_i ∈ [r_i,min, r_i,max]. Since w_i = √(A_i/a_i), let L_i = √(A_i/r_i,max) and U_i = √(A_i/r_i,min) denote the minimum and maximum widths of m_i, respectively. We have h_i = A_i/w_i and L_i ≤ w_i ≤ U_i.
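For reference, the width bounds follow directly from the two definitions A_i = h_i w_i and a_i = h_i/w_i. The short derivation below is a worked check under those definitions only; the numerical example (A_i = 16 with aspect ratios between 1/4 and 4) is hypothetical.

```latex
\begin{aligned}
\frac{A_i}{a_i} &= \frac{h_i w_i}{h_i / w_i} = w_i^2
  \quad\Longrightarrow\quad w_i = \sqrt{A_i / a_i}, \qquad h_i = \frac{A_i}{w_i},\\[4pt]
r_{i,\min} \le a_i \le r_{i,\max}
  &\quad\Longrightarrow\quad
  \sqrt{\frac{A_i}{r_{i,\max}}} \;\le\; w_i \;\le\; \sqrt{\frac{A_i}{r_{i,\min}}},\\[4pt]
\text{e.g. } A_i = 16,\ r_{i,\min} = \tfrac14,\ r_{i,\max} = 4
  &\quad\Longrightarrow\quad 2 \le w_i \le 8 .
\end{aligned}
```

The width is largest when the module is flattest (smallest aspect ratio), which is why the upper width bound pairs with r_i,min.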

A cluster module m_c is composed of a set of primitive soft modules M_p. m_c can be reshaped by reshaping the modules in M_p without violating the relations of the modules in M_p. We create two dummy modules m_s and m_t, and set x_s = 0, y_s = 0, w_s = 0, and h_s = 0. Then, we construct horizontal and vertical constraint subgraphs of m_c, denoted by G_hc and G_vc, respectively. G_hc and G_vc are constructed as follows.

1) For m_s and m_t, create two vertices v_s and v_t in both G_hc and G_vc.
2) For each m_p ∈ M_p, create a vertex v_p in G_hc and G_vc.
3) For each m_p, m_q ∈ M_p, if m_p is to the left of (below) m_q, create an edge e(v_p, v_q) from v_p to v_q in G_hc (G_vc).
4) For each m_p that is placed at the left boundary (bottom boundary), create an edge e(v_s, v_p) from v_s to v_p in G_hc (G_vc).
5) For each m_p that is placed at the right boundary (top boundary), create an edge e(v_p, v_t) from v_p to v_t in G_hc (G_vc).

Fig. 11. (a) Cluster module m_c with the cluster scheme {{m_1, m_2}, m_3}. (b) Corresponding constraint subgraphs G_hc and G_vc of m_c. (c) Constraints to ensure that no relation of modules is violated.

If x_p + w_p ≤ x_q ∀e(p, q) ∈ G_hc and y_p + (A_p/w_p) ≤ y_q ∀e(p, q) ∈ G_vc are satisfied, the relations of the modules in M_p will not be violated. Fig. 11 illustrates how to construct G_hc and G_vc, and what corresponding constraints must be satisfied. Fig. 11(a) shows a cluster module m_c with the cluster scheme {{m_1, m_2}, m_3}. Fig. 11(b) shows the corresponding constraint subgraphs G_hc and G_vc of m_c. Fig. 11(c) shows the constraints to ensure that no relation of modules is violated. Thus, it implies that w_c ≥ x_t and h_c ≥ y_t.
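To see how the constraint subgraphs bound the cluster size, the sketch below computes the smallest coordinates that satisfy every edge constraint by a longest-path pass over each graph. Longest-path evaluation is a standard technique for constraint graphs and is used here as an assumption, not as the authors' stated procedure. The module sizes and the relations (m_1 left of m_2, with m_3 stacked above both) are hypothetical and only loosely modeled on the scheme {{m_1, m_2}, m_3}.

```python
# Illustrative sketch: smallest coordinates satisfying the edge constraints.
from graphlib import TopologicalSorter  # Python 3.9+


def longest_path(edges, length):
    """Return coord with coord[p] + length[p] <= coord[q] for every edge (p, q),
    choosing the smallest such coordinates (source vertices at 0)."""
    preds = {}
    for p, q in edges:
        preds.setdefault(q, set()).add(p)
        preds.setdefault(p, set())
    coord = {}
    for v in TopologicalSorter(preds).static_order():   # predecessors come first
        coord[v] = max((coord[p] + length[p] for p in preds[v]), default=0.0)
    return coord


# Hypothetical soft modules: fixed areas, currently chosen widths.
area = {"m1": 4.0, "m2": 6.0, "m3": 8.0}
width = {"s": 0.0, "t": 0.0, "m1": 2.0, "m2": 3.0, "m3": 4.0}
height = {"s": 0.0, "t": 0.0, **{m: area[m] / width[m] for m in area}}  # h = A / w

G_h = [("s", "m1"), ("m1", "m2"), ("m2", "t"), ("s", "m3"), ("m3", "t")]
G_v = [("s", "m1"), ("s", "m2"), ("m1", "m3"), ("m2", "m3"), ("m3", "t")]

x = longest_path(G_h, width)
y = longest_path(G_v, height)
min_cluster_width, min_cluster_height = x["t"], y["t"]
```

The coordinates of the dummy sink give the lower bounds on the cluster width and height, matching w_c ≥ x_t and h_c ≥ y_t.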

At level i, let M^i = {m^i_1, m^i_2, . . . , m^i_{n_i}} denote the set of cluster modules. For each m^i_j ∈ M^i, (x^i_j, y^i_j) denotes the coordinate of its bottom-left corner, and h^i_j and w^i_j denote the height and width of m^i_j, respectively. Note that x^i_j, y^i_j, h^i_j, and w^i_j are nonnegative real numbers. For convenience, we additionally create two variables, namely: x^i_{n_i+1} and y^i_{n_i+1}, which denote the estimated width and height of the chip at level i, respectively. Thus, the estimated area of the chip at level i equals x^i_{n_i+1} y^i_{n_i+1}. To estimate wirelength, we adopt the squared length of the complete graph of pins in a net and take the center of a module as the location of a pin if the pins are not assigned during floorplanning. Let E^i denote the set of nets at level i. A net e^i_j ∈ E^i can be represented as the set of modules {m^i_k | e^i_j has a pin connecting to m^i_k}. Thus, the estimated wirelength ℓ^i_j of a net e^i_j ∈ E^i is defined by

$$\ell^i_j = \sum_{m^i_p,\, m^i_q \in e^i_j} \left[ \left( \left(x^i_p + \frac{w^i_p}{2}\right) - \left(x^i_q + \frac{w^i_q}{2}\right) \right)^2 + \left( \left(y^i_p + \frac{h^i_p}{2}\right) - \left(y^i_q + \frac{h^i_q}{2}\right) \right)^2 \right].$$

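We use the cost function $\phi$ to guide the clustering of soft modules:

$$\phi(\vec{x}, \vec{y}) = \alpha\, x^i_{n_i+1} y^i_{n_i+1} + \beta \sum_{e^i_j \in E^i} \ell^i_j. \tag{5}$$

Each module must lie within the chip (i.e., $x^i_j + w^i_j \le x^i_{n_i+1}$ and $y^i_j + h^i_j \le y^i_{n_i+1}$). Therefore, we can formulate the problem of clustering for soft modules, called CS, as follows:

$$\begin{aligned}
\text{Minimize} \quad & \alpha\, x^i_{n_i+1} y^i_{n_i+1} + \beta \sum_{e^i_j \in E^i} \ell^i_j \\
\text{subject to} \quad & x^i_j + w^i_j \le x^i_{n_i+1} && \forall\, 1 \le j \le n_i, \\
& y^i_j + h^i_j \le y^i_{n_i+1} && \forall\, 1 \le j \le n_i, \\
& x_{t_j} \le w^i_j, \quad y_{t_j} \le h^i_j && \forall\, 1 \le j \le n_i, \\
& x_p + w_p \le x_q && \forall\, e(p, q) \in G_{hj},\ \forall\, 1 \le j \le n_i, \\
& y_p + \frac{A_p}{w_p} \le y_q && \forall\, e(p, q) \in G_{vj},\ \forall\, 1 \le j \le n_i, \\
& L_i \le w_i \le U_i && \forall\, 1 \le i \le n
\end{aligned}$$

where $\alpha$ and $\beta$ are nonnegative user-defined parameters.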
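The objective of CS can be evaluated for a given placement with a few lines of Python. The sketch below is illustrative only: it assumes the wirelength sum runs over unordered pairs of modules in a net, takes the chip extent as the tightest bounding box of the modules, and uses hypothetical module data stored as (x, y, w, h) tuples.

```python
from itertools import combinations


def net_wirelength(net, modules):
    """Sum of squared center-to-center distances over all module pairs of a net."""
    total = 0.0
    for p, q in combinations(net, 2):
        xp, yp, wp, hp = modules[p]
        xq, yq, wq, hq = modules[q]
        dx = (xp + wp / 2) - (xq + wq / 2)
        dy = (yp + hp / 2) - (yq + hq / 2)
        total += dx * dx + dy * dy
    return total


def clustering_cost(modules, nets, alpha, beta):
    """alpha * (chip width * chip height) + beta * (total quadratic wirelength)."""
    chip_w = max(x + w for x, _, w, _ in modules.values())
    chip_h = max(y + h for _, y, _, h in modules.values())
    wirelength = sum(net_wirelength(net, modules) for net in nets)
    return alpha * chip_w * chip_h + beta * wirelength


# Hypothetical placement: name -> (x, y, width, height).
modules = {"a": (0.0, 0.0, 2.0, 2.0), "b": (2.0, 0.0, 2.0, 2.0), "c": (0.0, 2.0, 4.0, 2.0)}
nets = [["a", "b"], ["a", "b", "c"]]
cost = clustering_cost(modules, nets, alpha=1.0, beta=0.5)
```

In this example the chip is 4 by 4 (area 16), the two nets contribute squared lengths of 4 and 14, and the weighted cost is 16 + 0.5 × 18 = 25.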

C. Lagrangian Relaxation

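The Lagrangian relaxation subproblem associated with the multiplier $\vec{P} = (\vec{\kappa}, \vec{\eta}, \vec{\lambda}, \vec{\mu}, \vec{r}, \vec{s})$, denoted by LRS/($\vec{P}$), can be defined as follows:

$$\begin{aligned}
\text{Minimize} \quad & \alpha\, x^i_{n_i+1} y^i_{n_i+1} + \beta \sum_{e^i_j \in E^i} \ell^i_j + \sum_{j=1}^{n_i} \kappa_j \left( x^i_j + w^i_j - x^i_{n_i+1} \right) + \sum_{j=1}^{n_i} \eta_j \left( y^i_j + h^i_j - y^i_{n_i+1} \right) \\
& + \sum_{j=1}^{n_i} \sum_{e(p,q) \in G_{hj}} \lambda_{jpq} \left( x_p + w_p - x_q \right) + \sum_{j=1}^{n_i} \sum_{e(p,q) \in G_{vj}} \mu_{jpq} \left( y_p + \frac{A_p}{w_p} - y_q \right) \\
& + \sum_{j=1}^{n_i} \left[ r_j \left( x_{t_j} - w^i_j \right) + s_j \left( y_{t_j} - h^i_j \right) \right] \\
\text{subject to} \quad & L_i \le w_i \le U_i \quad \forall\, 1 \le i \le n.
\end{aligned}$$

Let $Q(\vec{P})$ denote the optimal value of LRS/($\vec{P}$). The Lagrangian dual problem (LDP) of CS can be defined as follows:

$$\begin{aligned}
\text{Maximize} \quad & Q(\vec{P}) \\
\text{subject to} \quad & \vec{P} \ge 0.
\end{aligned}$$

Since CS can be transformed into a convex problem, we can apply the theorem in [5, Th. 6.2.4]. This implies that if $\vec{P}$ is an optimal solution to LDP, the optimal solution of LRS/($\vec{P}$) will also optimize CS.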

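Consider the Lagrangian $\zeta$ of CS defined as follows:

$$\begin{aligned}
\zeta = {} & \alpha\, x^i_{n_i+1} y^i_{n_i+1} + \beta \sum_{e^i_j \in E^i} \ell^i_j + \sum_{j=1}^{n_i} \kappa_j \left( x^i_j + w^i_j - x^i_{n_i+1} \right) + \sum_{j=1}^{n_i} \eta_j \left( y^i_j + h^i_j - y^i_{n_i+1} \right) \\
& + \sum_{j=1}^{n_i} \sum_{e(p,q) \in G_{hj}} \lambda_{jpq} \left( x_p + w_p - x_q \right) + \sum_{j=1}^{n_i} \sum_{e(p,q) \in G_{vj}} \mu_{jpq} \left( y_p + \frac{A_p}{w_p} - y_q \right) \\
& + \sum_{j=1}^{n_i} \left[ r_j \left( x_{t_j} - w^i_j \right) + s_j \left( y_{t_j} - h^i_j \right) \right] + \sum_{i=1}^{n} u_i \left( L_i - w_i \right) + \sum_{i=1}^{n} v_i \left( w_i - U_i \right).
\end{aligned}$$

The Kuhn–Tucker conditions imply that the optimal solution of CS must be at $\partial\zeta/\partial x_p = 0$ and $\partial\zeta/\partial y_p = 0$. Thus, we only need to consider the multipliers $\vec{P}$ that satisfy these conditions. Therefore, for $1 \le p \le n$, we have

$$\frac{\partial \zeta}{\partial x_p} = \sum_{j=1}^{n_i} \left( \sum_{e(p,q) \in G_{hj}} \lambda_{jpq} - \sum_{e(q,p) \in G_{hj}} \lambda_{jqp} \right) = 0 \tag{6}$$

$$\frac{\partial \zeta}{\partial y_p} = \sum_{j=1}^{n_i} \left( \sum_{e(p,q) \in G_{vj}} \mu_{jpq} - \sum_{e(q,p) \in G_{vj}} \mu_{jqp} \right) = 0. \tag{7}$$

D. Solving LRS/($\vec{P}$) and LDP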

Let $\Omega$ denote the set of multipliers $\vec{P}$ satisfying (6) and (7). We now consider solving the Lagrangian relaxation subproblem LRS/($\vec{P}$) for a given $\vec{P} \in \Omega$, i.e., computing the dimension and coordinate of each module. First, we partially differentiate $\zeta$ with respect to $w_p$ to get an optimal value of $w_p$ such that $\zeta$ is minimized, i.e.,

$$\frac{\partial \zeta}{\partial w_p} = (v_p - u_p) + \sum_{j=1}^{n_i} \sum_{q \in \mathrm{out}_{G_{hj}}(v_p)} \lambda_{jpq} - \sum_{j=1}^{n_i} \sum_{q \in \mathrm{out}_{G_{vj}}(v_p)} \frac{\mu_{jpq} A_p}{w_p^2} = 0.$$

Thus, we have

$$w_p = \sqrt{\frac{\sum_{j=1}^{n_i} \sum_{q \in \mathrm{out}_{G_{vj}}(v_p)} \mu_{jpq} A_p}{(v_p - u_p) + \sum_{j=1}^{n_i} \sum_{q \in \mathrm{out}_{G_{hj}}(v_p)} \lambda_{jpq}}}$$

where $\mathrm{out}_G(v) = \{u \mid e(v, u) \in E(G)\}$. Recall that $L_p \le w_p \le U_p$, $1 \le p \le n$. Thus, the optimal $w^*_p$ can be computed by

$$w^*_p = \min\{U_p, \max\{L_p, w_p\}\}.$$
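The width computation can be sketched in Python as follows. The sketch assumes the closed form obtained by setting the derivative of the Lagrangian with respect to the width to zero, which gives a squared width equal to the area times the sum of outgoing vertical multipliers, divided by the bound term plus the sum of outgoing horizontal multipliers. The function name, the handling of a nonpositive denominator, and the sample values are hypothetical.

```python
import math


def optimal_width(area, lam_out, mu_out, lower, upper, v_minus_u=0.0):
    """Width minimizing the Lagrangian for one module, clamped to [lower, upper].

    lam_out: multipliers on outgoing horizontal-constraint edges of the module.
    mu_out:  multipliers on outgoing vertical-constraint edges of the module.
    """
    numerator = area * sum(mu_out)
    denominator = v_minus_u + sum(lam_out)
    if denominator <= 0.0:
        # Assumption: with no horizontal pressure, widening is free, so take the
        # widest shape if it lowers the vertical term, else the narrowest.
        unclamped = math.inf if numerator > 0.0 else lower
    else:
        unclamped = math.sqrt(numerator / denominator)
    return min(upper, max(lower, unclamped))


inside = optimal_width(12.0, [1.0, 2.0], [4.0], lower=1.5, upper=6.0)
clamped_low = optimal_width(12.0, [1.0, 2.0], [0.25], lower=1.5, upper=6.0)
clamped_high = optimal_width(12.0, [0.0], [4.0], lower=1.5, upper=6.0)
```

The first call yields an interior width of 4, while the other two show the clamping to the lower and upper bounds.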

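Since the dimension of each primitive module ($w_p$ and $h_p$) has been determined, the dimension of each cluster module ($w^i_j$ and $h^i_j$) can be computed by applying a longest path algorithm in $G_{hj}$ and $G_{vj}$. Then, we consider partial differentiation of $\zeta$ with respect to $x^i_j$ and $y^i_j$, giving the optimality conditions of CS. Therefore, for $1 \le j \le n_i$, we have

$$\frac{\partial \zeta}{\partial x^i_j} = \beta \left[ \sum_{e^i_k \supset \{m^i_j\}} 2\left( |e^i_k| - 1 \right) x^i_j - 2 \sum_{e^i_k \supset \{m^i_j\}} \sum_{m^i_l \in e^i_k \setminus \{m^i_j\}} x^i_l + \sum_{e^i_k \supset \{m^i_j\}} \sum_{m^i_l \in e^i_k \setminus \{m^i_j\}} \left( w^i_j - w^i_l \right) \right] + \kappa_j = 0 \tag{8}$$

$$\frac{\partial \zeta}{\partial y^i_j} = \beta \left[ \sum_{e^i_k \supset \{m^i_j\}} 2\left( |e^i_k| - 1 \right) y^i_j - 2 \sum_{e^i_k \supset \{m^i_j\}} \sum_{m^i_l \in e^i_k \setminus \{m^i_j\}} y^i_l + \sum_{e^i_k \supset \{m^i_j\}} \sum_{m^i_l \in e^i_k \setminus \{m^i_j\}} \left( h^i_j - h^i_l \right) \right] + \eta_j = 0 \tag{9}$$

where $|e^i_k|$ denotes the number of pins of $e^i_k$.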

In (8), there are $n_i$ equations with $n_i$ variables. Thus, we can apply Gaussian elimination to solve these $n_i$ equations to get the optimal value of $x^i_j$. In these $n_i$ equations, all the coefficients of the variables depend only on the net information (i.e., $e^i_k$). Since the net information is the same throughout the entire process, each variable can be solved by the same elimination steps. Hence, we can record the elimination steps during the first iteration (which takes cubic time), and each subsequent computation then takes only quadratic time by replaying the same steps. Similarly, we can compute the optimal value of $y^i_j$. After the dimensions and coordinates of all modules are computed, we obtain the dimensions of the chip, $x^i_{n_i+1}$ and $y^i_{n_i+1}$, since all modules are within the chip, i.e., $x^i_{n_i+1} = \max\{x^i_j + w^i_j\}$ and $y^i_{n_i+1} = \max\{y^i_j + h^i_j\}$ over all cluster modules $m^i_j$.
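One way to read "recording the process" is as an LU factorization: the coefficient matrix is factored once in cubic time, and every later right-hand side is solved in quadratic time by forward and back substitution. The Python sketch below illustrates only this reuse pattern. The 3 × 3 matrix is a hypothetical nonsingular placeholder, not a system derived from (8), and the function names are illustrative.

```python
def lu_factor(a):
    """Factor a square matrix once (cubic time) with partial pivoting.

    Returns (lu, perm): L (unit diagonal) and U packed into one matrix, plus the
    row permutation, so that row k of the factorization is original row perm[k].
    """
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(lu[r][k]))
        if abs(lu[pivot][k]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        if pivot != k:
            lu[k], lu[pivot] = lu[pivot], lu[k]
            perm[k], perm[pivot] = perm[pivot], perm[k]
        for r in range(k + 1, n):
            lu[r][k] /= lu[k][k]
            for c in range(k + 1, n):
                lu[r][c] -= lu[r][k] * lu[k][c]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A x = b in quadratic time by reusing a recorded factorization."""
    n = len(lu)
    x = [b[perm[i]] for i in range(n)]
    for i in range(n):  # forward substitution with the unit-diagonal L
        for j in range(i):
            x[i] -= lu[i][j] * x[j]
    for i in reversed(range(n)):  # back substitution with U
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


# Hypothetical coefficient matrix, fixed across iterations; only b changes.
A = [[4.0, -1.0, -1.0], [-1.0, 3.0, -1.0], [-1.0, -1.0, 5.0]]
lu, perm = lu_factor(A)                       # done once
x1 = lu_solve(lu, perm, [-1.0, 2.0, 12.0])    # first right-hand side
x2 = lu_solve(lu, perm, [0.0, 4.0, -6.0])     # later iteration, new right-hand side
```

Only `lu_solve` needs to be repeated when the multipliers, and hence the right-hand side, change between iterations.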

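Next, we use a subgradient optimization method to search for the optimal $\vec{P}$. Let $\vec{P}$ be a multiplier at step $k$. We move $\vec{P}$ to a new multiplier $\vec{P}'$ based on the subgradient direction

$$\begin{aligned}
\kappa'_j &= \left[ \kappa_j + \rho_k \left( x^i_j + w^i_j - x^i_{n_i+1} \right) \right]^+ \\
\eta'_j &= \left[ \eta_j + \rho_k \left( y^i_j + h^i_j - y^i_{n_i+1} \right) \right]^+ \\
\lambda'_{jpq} &= \left[ \lambda_{jpq} + \rho_k \left( x_p + w_p - x_q \right) \right]^+ \\
\mu'_{jpq} &= \left[ \mu_{jpq} + \rho_k \left( y_p + \frac{A_p}{w_p} - y_q \right) \right]^+
\end{aligned}$$

where $[x]^+ = \max\{x, 0\}$, and $\rho_k$ is a step size such that $\lim_{k \to \infty} \rho_k = 0$ and $\sum_{k=1}^{\infty} \rho_k = \infty$.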
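A single multiplier update can be sketched in Python as below. The step-size schedule ρ_k = ρ_0 / k is an assumption chosen because it tends to zero while its sum diverges; the paper does not state which schedule was used, and the dictionary keys are hypothetical labels for individual multipliers.

```python
def subgradient_step(multipliers, violations, k, rho0=1.0):
    """One projected subgradient update at step k (k >= 1).

    multipliers: dict name -> current nonnegative multiplier value.
    violations:  dict name -> constraint value g, where the constraint is g <= 0.
    """
    rho_k = rho0 / k  # rho_k -> 0 while the sum of rho_k diverges
    return {
        name: max(0.0, value + rho_k * violations[name])
        for name, value in multipliers.items()
    }


# Hypothetical state: one violated chip-width constraint, one slack adjacency constraint.
multipliers = {"kappa_1": 0.5, "lambda_1_ab": 0.2}
violations = {"kappa_1": 1.0, "lambda_1_ab": -3.0}
updated = subgradient_step(multipliers, violations, k=2)
```

The multiplier of the violated constraint grows, while the multiplier of the slack constraint is driven to zero by the projection onto the nonnegative orthant.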

We present a network-flow-based algorithm to check whether $\vec{P}$ belongs to $\Omega$ and to project $\vec{P}$ to $\vec{P}^* \in \Omega$ if $\vec{P} \notin \Omega$. Further, an incremental update technique is employed to make the maximum flow computation more efficient. For each cluster module $m_c$, we first create two networks $N_{hc}$ (for $G_{hc}$) and $N_{vc}$ (for $G_{vc}$) as follows.

1) For each $v_i \in V(G_{hc})$ ($V(G_{vc})$), create a vertex $v_i$ in $N_{hc}$ ($N_{vc}$) and make $v_s$ and $v_t$ the source and sink, respectively.

2) For each $e(p, q) \in E(G_{hc})$ ($E(G_{vc})$), create a corresponding edge $e(p, q)$ with capacity $\lambda_{cpq}$ ($\mu_{cpq}$) in $N_{hc}$ ($N_{vc}$).

We apply the maximum flow computation on the networks to check whether $\vec{P}$ belongs to $\Omega$. The maximum flow computation repeatedly finds an augmenting path from $v_s$ to $v_t$ and pushes flow along it until no augmenting path can be found. Let cap($v, u$) and flow($v, u$) denote the capacity and flow on the edge $e(v, u)$. An edge $e(v, u)$ is saturated if its capacity equals its flow (i.e., cap($v, u$) = flow($v, u$)).

Theorem 2: If all edges in the networks are saturated, $\vec{P} \in \Omega$.

Proof: After the maximum flow computation, for each $v_p$ in a network except the source and sink, the sum of the flows on the incoming edges of $v_p$ equals the sum of the flows on its outgoing edges (i.e., $\sum_{e(p,q) \in N_{hc}} \text{flow}(p, q) = \sum_{e(q,p) \in N_{hc}} \text{flow}(q, p)$ for each $N_{hc}$ and $\sum_{e(p,q) \in N_{vc}} \text{flow}(p, q) = \sum_{e(q,p) \in N_{vc}} \text{flow}(q, p)$ for each $N_{vc}$). Besides, cap($p, q$) = flow($p, q$) for all edges $e(p, q)$ (all edges are saturated), and cap($p, q$) of each edge $e(p, q)$ in $N_{hc}$ ($N_{vc}$) equals $\lambda_{cpq}$ ($\mu_{cpq}$). Hence, $\sum_{e(p,q) \in G_{hc}} \lambda_{cpq} = \sum_{e(q,p) \in G_{hc}} \lambda_{cqp}$ and $\sum_{e(p,q) \in G_{vc}} \mu_{cpq} = \sum_{e(q,p) \in G_{vc}} \mu_{cqp}$ for each cluster module $m_c$. Therefore, $\vec{P}$ belongs to $\Omega$.

If $\vec{P}$ does not belong to $\Omega$, we project $\vec{P}$ to $\vec{P}^*$ by setting $\lambda_{cpq}$ ($\mu_{cpq}$) to the flow flow($p, q$) on each edge $e(p, q)$ in $N_{hc}$ ($N_{vc}$) for each $m_c$.

Theorem 3: $\vec{P}^* \in \Omega$.
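The membership check and the projection can be sketched in Python with a plain Edmonds–Karp maximum flow. This is an illustrative stand-in: the paper does not say which maximum flow algorithm was used, the sketch assumes an acyclic network without antiparallel edges, it reads the projection as replacing each multiplier by the flow on its edge, and the two small networks are hypothetical.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp on {(u, v): cap}; returns the flow on each original edge."""
    residual, adj = {}, {}
    for (u, v), cap in capacity.items():
        residual[(u, v)] = cap
        residual.setdefault((v, u), 0.0)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:  # breadth-first search for a path
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and residual[(u, v)] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break  # no augmenting path remains
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[e] for e in path)
        for u, v in path:
            residual[(u, v)] -= push
            residual[(v, u)] += push
    return {e: cap - residual[e] for e, cap in capacity.items()}


def in_omega(capacity, source, sink):
    """True if every edge is saturated by a maximum flow (condition of Theorem 2)."""
    flow = max_flow(capacity, source, sink)
    return all(abs(flow[e] - capacity[e]) < 1e-9 for e in capacity)


def project(capacity, source, sink):
    """Projected multipliers: the maximum flow on each edge."""
    return max_flow(capacity, source, sink)


# Hypothetical multipliers used as capacities on a small constraint network.
balanced = {("s", "a"): 3.0, ("a", "b"): 2.0, ("a", "t"): 1.0, ("b", "t"): 2.0}
unbalanced = {("s", "a"): 3.0, ("a", "b"): 1.0, ("a", "t"): 1.0, ("b", "t"): 2.0}
projected = project(unbalanced, "s", "t")
```

In the second network the multipliers entering and leaving vertex `a` do not balance, so some edges stay unsaturated; the projected values satisfy flow conservation at every internal vertex, which is the property Theorem 3 relies on.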

The projection process greatly affects the efficiency of the entire optimization, since there may be $O(n^2)$ edges in the worst case. Thus, we employ an incremental flow update technique to speed up the max-flow computation after updating $\vec{P}$ and its corresponding capacity. Fig. 12 shows an algorithm for the incremental network update. Lines 1–2 check whether each edge $e(p, q)$ violates the capacity constraint (i.e., $0 \le \text{flow}(p, q) \le \text{cap}(p, q)$). Lines 3–9 fix the overflow on $e(p, q)$ if an edge $e(p, q)$ violates its capacity constraint. Finally, Line 10 computes a maximum flow again.

Note that, for efficiency, we may perform Lagrangian relaxation only at the higher levels of the agglomeratively multilevel framework (when the number of modules becomes small enough for Lagrangian relaxation). To do so, however, we still need to pass the aspect-ratio information of each soft module level by level.

Fig. 12. Incremental update algorithm.

TABLE I

BENCHMARK CIRCUITS USED IN OUR EXPERIMENT

VI. EXPERIMENTAL RESULTS

We implemented the MB∗-tree algorithm in the C++ language on a 450-MHz SUN Ultra 60 workstation with 2-GB memory. The package is available at http://eda.ee.ntu.edu.tw/research.htm.

Columns 1, 2, and 3 of Table I list the names of the benchmark circuits, the number of modules, and the number of nets, respectively. ami49 is the largest Microelectronics Center of North Carolina (MCNC) benchmark circuit used in the previous works [9], [15] for comparative study. To test the scalability of existing methods, we created ten synthetic circuits, named ami49_x, by duplicating the modules and nets of ami49 x times. The largest circuit, ami49_200, contains 9800 modules and 91 351 nets (specified by pin-to-pin interconnections). Note that the work in [22] simply duplicates all modules and nets of the circuit ami49. However, such synthetic circuits are not general since there is no interconnection between the duplicated copies of the circuit. To avoid possible biases, we also added interconnections among different copies of the duplicated circuits. For the circuit ami49_x, we duplicated each module/net x times. For each module $m_i$, we duplicated it as $m_{i,1}, m_{i,2}, \ldots, m_{i,x}$ and added $x - 1$ nets between $(m_{i,1}, m_{i,2}), (m_{i,1}, m_{i,3}), \ldots, (m_{i,1}, m_{i,x})$.

TABLE II

COMPARISONS FOR AREA, DEAD SPACE, RUNTIME, AND MEMORY AMONG MB∗-TREE, SP, O-TREE, AND B∗-TREE. NR: NO RESULT OBTAINED WITHIN 5-h CPU TIME ON SUN SPARC ULTRA 60. NOTE THAT MB∗-TREE, SP, AND B∗-TREE FINISHED THEIR MEMORY ALLOCATION IN THE VERY EARLY STAGE OF EXECUTION. THEREFORE, THEIR MEMORY CONSUMPTION FOR THE LISTED CIRCUIT SIZES CAN BE MEASURED. O-TREE PERFORMS MEMORY ALLOCATION AND DEALLOCATION DURING EXECUTION; THEREFORE, ONLY THE MEMORY REQUIREMENTS FOR THE SMALL CASES THAT FINISHED EXECUTION ARE AVAILABLE

Also, we divide the block widths/heights by 5 for the benchmarks to avoid overflows in computing the wirelength for ami49_x.
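The construction of the ami49_x circuits can be sketched in Python as follows. The sketch is illustrative: the data layout (a dictionary of module sizes and a list of nets) and the naming of copies are assumptions, and the base circuit below is a placeholder with the same module and net counts as ami49 (49 modules, 408 nets), not the real benchmark data.

```python
def replicate_circuit(modules, nets, x, scale=5.0):
    """Duplicate a circuit x times and link the copies of each module.

    modules: dict name -> (width, height); nets: list of lists of module names.
    Each copy c of module m is named "m,c"; widths and heights are divided by scale.
    """
    new_modules, new_nets = {}, []
    for c in range(1, x + 1):
        for name, (w, h) in modules.items():
            new_modules[f"{name},{c}"] = (w / scale, h / scale)
        for net in nets:
            new_nets.append([f"{m},{c}" for m in net])
    # Add x - 1 two-pin nets from the first copy of each module to its other copies.
    for name in modules:
        for c in range(2, x + 1):
            new_nets.append([f"{name},1", f"{name},{c}"])
    return new_modules, new_nets


# Placeholder base circuit with the module and net counts of ami49.
base_modules = {f"m{i}": (10.0, 5.0) for i in range(49)}
base_nets = [[f"m{i % 49}", f"m{(i + 1) % 49}"] for i in range(408)]
big_modules, big_nets = replicate_circuit(base_modules, base_nets, x=200)
```

With x = 200 this yields 49 × 200 = 9800 modules and 408 × 200 + 49 × 199 = 91 351 nets, which agrees with the counts quoted for ami49_200.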

Table II shows the results for ami49_x by optimizing area alone (γ = 1.0 and δ = 0.0). Columns 2, 3, 4, 5, and 6 give the total area of modules in the circuit, the resulting area, the dead space, the runtime, and the memory requirement for our MB∗-tree, respectively. The remaining columns list the results for the well-known previous works, SP [31], O-tree [15], and B∗-tree [9]. Note that the B∗-tree package we used here is the September 2000 version, B∗-tree-v1.0, also available at http://eda.ee.ntu.edu.tw/research.htm. It runs 50–100× faster and achieves better area utilization than the B∗-tree package reported in [9]. We also note that the tools we compared here are all variable-die floorplanners. The well-known floorplanner Parquet-3.1/-4.0 [1], [34] and the floorplacer Capo 9 [2] both target fixed-die floorplanning, a different floorplanning problem from the one we have solved in this paper. We have also tested publicly available mixed-size placers on the floorplan benchmarks, including Feng Shui 2.6/5.0 [14] and mGP [7], [30]. Feng Shui generated the floorplanning results directly using its legalizer without performing global placement. Thus, its results are far from optimal. mGP results in many overlaps and places some modules outside the chip boundary. Therefore, we do not compare our results with those of the mixed-size placers.

As shown in Table II, our MB∗-tree algorithm obtained a dead space of only 2.78% for ami49 in only 0.4-min runtime and 1.3-MB memory, while B∗-tree-v1.0 reported a dead space of 3.53% using 0.25-min runtime and 3.2-MB memory. Further, the experimental results for larger circuits show that the MB∗-tree scales very well as the circuit size increases, while the previous works, i.e., SP, O-tree, and B∗-tree, do not. For circuit sizes ranging from 49 to 9800 modules and from 408 to 81 600 nets, the MB∗-tree consistently obtains high-quality floorplans with dead spaces of less than 3.72% in empirically linear runtime, while SP, O-tree, and B∗-tree can handle only up to 196, 98, and 1960 modules in the same amount of time and result in dead spaces as large as 13.00% (at 196 modules), 12.29% (at 98 modules), and 27.33% (at 1960 modules), respectively. In Fig. 13, the resulting dead space and runtime are plotted as functions of the circuit size (in number of modules). As shown in Table II and Fig. 13(a), the resulting dead spaces for the MB∗-tree are almost independent of the circuit size, which demonstrates the high scalability of the MB∗-tree. In contrast, the dead spaces for the nonhierarchical previous works all grow dramatically as the circuit size increases. Fig. 13(b) shows the empirical runtime for the four algorithms. This figure reveals that the empirical runtime of the MB∗-tree is the best. In particular, the empirical runtime of the MB∗-tree approaches linear in the circuit size, while the previous works cannot handle large-scale designs. Fig. 14 shows the layout for the largest circuit, ami49_200, obtained by the MB∗-tree in 256-min CPU time. It has a dead space of only 3.44%. Note that this circuit is not feasible for the previous works [9], [15], [31].

Table III shows the comparisons on area, dead space, and runtime between running the complete MB∗-tree scheme and only clustering, based on the ami49 family of circuits for area optimization. The resulting areas right after clustering are on average about 1.78× the final areas, and the runtimes for clustering range from less than 1% to about 13% of the total runtime. As shown in the table, although several initial placements have around 60% dead space, our declustering consistently reduces the dead space to around 3%. The results show the effectiveness of the declustering scheme.

Fig. 13. (a) Comparison of dead space versus circuit size (number of modules). (b) Comparison of CPU time versus circuit size (number of modules).

Fig. 14. Layout of ami49_200 (9800 modules, 81 600 nets). Dead space = 3.44%.

We also tested the scalability of B∗-tree and MB∗-tree for wirelength optimization. (We omit the comparisons with SP and O-tree since they cannot handle this wirelength optimization problem well for more than 100 modules.) Table IV shows the results on half-perimeter wirelength (HPWL), dead space, and CPU time for B∗-tree and MB∗-tree based on the ami49_x circuits and within 24-h CPU time. As shown in the table, MB∗-tree results in significantly smaller wirelength and average dead space than B∗-tree for wirelength optimization. In particular, the MB∗-tree scales very well while the B∗-tree does not.

Table V shows the comparisons for area optimization alone (γ = 1.0, δ = 0.0), wirelength optimization alone (γ = 0.0, δ = 1.0), and simultaneous area and wirelength optimization (γ = 0.5, δ = 0.5) among SP, B∗-tree, and MB∗-tree based on the circuit industry (whose total area = 658.04 mm²). The circuit industry is a 0.18-µm 1-GHz industrial design with 189 modules, 20 million gates, and 9777 center-to-center interconnections. It is a large chip design and contains three “tough” modules with aspect ratios greater than 19 (and as large as 36). (Note that we do not have the results for O-tree for this experiment because the data of industry cannot be fed into the O-tree package.) In each entry of the table, we list the best/average values obtained in ten runs of simulated annealing using a random seed for each run. For the column “Time,” we report the runtime for obtaining the best value and the average runtime of the ten runs. As shown in the table, our MB∗-tree algorithm obtained significantly better silicon area and wirelength than SP and B∗-tree in all tests. For area optimization, MB∗-tree can obtain a dead space of only 2.11%, while SP (B∗-tree) results in a dead space of at least 28.1% (12.9%). For wirelength optimization, MB∗-tree can obtain a total wirelength of only 56 631 mm, while SP (B∗-tree) requires a total wirelength of at least 81 344 mm (113 216 mm). For simultaneous area and wirelength optimization, MB∗-tree also obtains the best area and wirelength. The results show the effectiveness of our MB∗-tree algorithm. The runtime of MB∗-tree is longer than those of B∗-tree and SP for wirelength optimization. (For area optimization, MB∗-tree runs faster than SP.) This is reasonable because obtaining significantly better results took much longer, and the multilevel process incurred some overhead. Nevertheless, as shown in Table II, both SP and B∗-tree do not scale well to instances with a large number of modules (and thus their runtimes increase dramatically when the number of modules grows into the hundreds). The resulting layout of industry for simultaneous area and wirelength optimization using MB∗-tree is shown in Fig. 15.

VII. CONCLUDING REMARKS

We have presented the MB∗-tree-based agglomeratively multilevel framework to handle the floorplanning and packing for large-scale modules. Experimental results have shown that the MB∗-tree scales very well as the circuit size increases. The capability of the MB∗-tree shows promise in handling large-scale designs with complex constraints. We propose to explore the floorplanning/placement problem with large-scale rectilinear and mixed-sized modules/cells as well as buffer-block planning for interconnect-driven floorplanning in the future.

TABLE III

COMPARISONS FOR AREA, DEAD SPACE, AND RUNTIME BETWEEN RUNNING THE COMPLETE MB∗-TREE SCHEME AND ONLY CLUSTERING

TABLE IV

COMPARISONS FOR HPWL, DEAD SPACE, AND CPU TIME BETWEEN B∗-TREE AND MB∗-TREE FOR THE ami49_x CIRCUITS. NR: NO RESULT OBTAINED WITHIN 24-h CPU TIME

TABLE V

COMPARISONS FOR AREA OPTIMIZATION ALONE, WIRELENGTH OPTIMIZATION ALONE, AND SIMULTANEOUS AREA AND WIRELENGTH OPTIMIZATION AMONG SP, B∗-TREE, AND MB∗-TREE BASED ON THE CIRCUIT INDUSTRY. IN EACH ENTRY, BOTH BEST/AVERAGE VALUES OBTAINED IN TEN RUNS OF

Fig. 15. Layout of industry by simultaneously optimizing area and wirelength (γ = 0.5, δ = 0.5). CPU time = 5234 s, Area = 716 263 680 µm², Total wirelength = 67 786 296 µm, Dead space = 8.14%.

REFERENCES

[1] S. Adya and I. Markov, “Fixed-outline floorplanning: Enabling hierarchical design,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 11, no. 6, pp. 1120–1135, Dec. 2003.

[2] S. N. Adya, S. Chaturvedi, J. A. Roy, D. A. Papa, and I. L. Markov, “Unification of partitioning, placement and floorplanning,” in Proc. IEEE/ACM Int. Conf. Comput.-Aided Des., Nov. 2004, pp. 550–557.

[3] C. J. Alpert, J. H. Hu, S. S. Sapatnekar, and P. G. Villarrubia, “A practical methodology for early buffer and wire resource allocation,” in Proc. ACM/IEEE Des. Autom. Conf., Jun. 2001, pp. 189–194.

[4] C. J. Alpert, J.-H. Huang, and A. B. Kahng, “Multilevel circuit partitioning,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 17, no. 8, pp. 655–667, Aug. 1998.

[5] M. S. Bazaraa, H. D. Sherali, and C. M. Shetty, Nonlinear Programming: Theory and Algorithms. Hoboken, NJ: Wiley, 1993.

[6] T. F. Chan, J. Cong, T. Kong, and J. R. Shinnerl, “Multilevel optimization for large-scale circuit placement,” in Proc. IEEE/ACM Int. Conf. Comput.-Aided Des., Nov. 2000, pp. 171–176.

[7] C.-C. Chang, J. Cong, and X. Yuan, “Multi-level placement for large-scale mixed-size IC designs,” in Proc. ACM/IEEE Asia and South Pac. Des. Autom., Jan. 2003, pp. 325–330.

[8] Y.-W. Chang and S.-P. Lin, “MR: A new framework for multilevel full-chip routing,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 23, no. 5, pp. 793–800, May 2004.

[9] Y.-C. Chang, Y.-W. Chang, G.-M. Wu, and S.-W. Wu, “B∗-trees: A new representation for non-slicing floorplans,” in Proc. ACM/IEEE Des. Autom. Conf., Jun. 2000, pp. 458–463.

[10] J. Cong, J. Fang, and Y. Zhang, “Multilevel approach to full-chip gridless routing,” in Proc. IEEE/ACM Int. Conf. Comput.-Aided Des., Nov. 2001, pp. 396–403.

[11] J. Cong, T. Kong, and D. Z. Pan, “Buffer block planning for interconnect-driven floorplanning,” in Proc. IEEE/ACM Int. Conf. Comput.-Aided Des., Nov. 1999, pp. 358–363.

[12] J. Cong, M. Xie, and Y. Zhang, “An enhanced multilevel routing system,” in Proc. IEEE/ACM Int. Conf. Comput.-Aided Des., Nov. 2002, pp. 51–58.

[13] W. W. Dai, B. Eschermann, E. Kuh, and M. Pedram, “Hierarchical placement and floorplanning in BEAR,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 8, no. 12, pp. 1335–1349, Dec. 1989.

[14] FengShui Standard Cell/Mixed Block/Structure Placer. [Online]. Available: http://vlsicad.cs.binghamton.edu/software.html

[15] P.-N. Guo, C.-K. Cheng, and T. Yoshimura, “An O-tree representation of non-slicing floorplan and its applications,” in Proc. ACM/IEEE Des. Autom. Conf., Jun. 1999, pp. 268–273.

[16] B. Hendrickson and R. Leland, “A multilevel algorithm for partitioning graph,” in Proc. Supercomputing, 1995, pp. 1–24.

[17] T.-Y. Ho, Y.-W. Chang, S.-J. Chen, and D. T. Lee, “A fast crosstalk- and performance-driven multilevel routing system,” in Proc. IEEE/ACM Int. Conf. Comput.-Aided Des., San Jose, CA, Nov. 2003, pp. 382–387.

planning and buffer block planning,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 23, no. 5, pp. 694–703, May 2004.

[21] G. Karypis and V. Kumar, “Multilevel k-way hypergraph partitioning,” in Proc. ACM/IEEE Des. Autom. Conf., Jun. 1999, pp. 343–348.

[22] H.-C. Lee, Y.-W. Chang, J.-M. Hsu, and H. Yang, “Multilevel floorplanning/placement for large-scale modules using B∗-trees,” in Proc.

[23] S.-M. Li, Y.-H. Cherng, and Y.-W. Chang, “Noise-aware buffer planning for interconnect-driven floorplanning,” in Proc. IEEE/ACM Asia and South Pac. Des. Autom. Conf., Jan. 2003, pp. 423–426.

[24] J.-M. Lin and Y.-W. Chang, “TCG: A transitive closure graph based representation for non-slicing floorplans,” in Proc. ACM/IEEE Des. Autom. Conf., Jun. 2001, pp. 764–769.

[25] J.-M. Lin and Y.-W. Chang, “TCG-S: Orthogonal coupling of P∗-admissible representations for general floorplans,” in Proc. ACM/IEEE Des. Autom. Conf., Jun. 2002, pp. 842–847.

[26] J.-M. Lin and Y.-W. Chang, “TCG-S: Orthogonal coupling of P∗-admissible representations for general floorplans,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 24, no. 6, pp. 968–980, Jun. 2004.

[27] J.-M. Lin and Y.-W. Chang, “TCG: A transitive closure graph based representation for general floorplans,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 2, pp. 288–292, Feb. 2005.

[28] J.-M. Lin, Y.-W. Chang, and S.-P. Lin, “Corner sequence: A P-admissible floorplan representation with a worst-case linear-time packing scheme,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 11, no. 4, pp. 679–686, Aug. 2003.

[29] S.-P. Lin and Y.-W. Chang, “A novel framework for multilevel routing considering routability and performance,” in Proc. IEEE/ACM Int. Conf.

[30] MGP: Multilevel Global Placer for large-scale standard-cell placement, mixed-sized (standard-cells mixed with macros) placement for wirelength minimization and routability optimization. [Online]. Available: http://ballade.cs.ucla.edu/mGP/

[31] H. Murata, K. Fujiyoshi, S. Nakatake, and Y. Kajitani, “Rectangle-packing based module placement,” in Proc. IEEE/ACM Int. Conf.

[32] S. Nakatake, K. Fujiyoshi, H. Murata, and Y. Kajitani, “Module placement on BSG-structure and IC layout applications,” in Proc. IEEE/ACM Int. Conf. Comput.-Aided Des., Nov. 1996, pp. 484–491.

[33] R. H. J. M. Otten, “Automatic floorplan design,” in Proc. ACM/IEEE Des. Autom. Conf., Jun. 1982, pp. 261–267.

[34] PARQUET: Floorplanner for fixed-outline floorplanning and classical min-area block packing. [Online]. Available: http://vlsicad.eecs.umich.edu/BK/parquet/

[35] P. Sarkar, V. Sundararaman, and C. K. Koh, “Routability-driven repeater block planning for interconnect-centric floorplanning,” in Proc. ACM Int. Symp. Phys. Des., Apr. 2000, pp. 186–191.

[36] T. C. Wang and D. F. Wong, “An optimal algorithm for floorplan area optimization,” in Proc. ACM/IEEE Des. Autom. Conf., 1990, pp. 180–186.

[37] D. F. Wong and C. L. Liu, “A new algorithm for floorplan design,” in Proc.

[38] F. Y. Young, C. C. N. Chu, W. S. Luk, and Y. C. Wong, “Floorplan area minimization using Lagrangian relaxation,” in Proc. ACM Int. Symp. Phys. Des., Apr. 2000, pp. 174–179.

Hsun-Cheng Lee received the B.S. degree in information computer engineering from Chung Yuan Christian University, Chungli, Taiwan, R.O.C., in 1999, and the M.S. degree in computer information science from the National Chiao Tung University, Hsinchu, Taiwan, in 2001.

He is currently a Software Engineer with Synopsys, Taipei, Taiwan. His research interests include physical design and DFM-related topics.

Yao-Wen Chang (S’94–M’96) received the B.S. degree from National Taiwan University, Taipei, Taiwan, in 1988, and the M.S. and Ph.D. degrees from the University of Texas at Austin in 1993 and 1996, respectively, all in computer science.

He is a Professor in the Department of Electrical Engineering and the Graduate Institute of Electronics Engineering, National Taiwan University. He is currently also a Visiting Professor at Waseda University, Japan. He was with the IBM T. J. Watson Research Center, Yorktown Heights, NY, in the summer of 1994. From 1996 to 2001, he was on the faculty of National Chiao Tung University, Taiwan. His current research interests lie in VLSI physical design, design for manufacturing, and FPGA. He has been working closely with industry on projects in these areas. He has coauthored one book on routing and over 120 ACM/IEEE conference/journal papers in these areas.

Dr. Chang received an award at the 2006 ACM ISPD Placement Contest, Best Paper Award at ICCD-1995, and nine Best Paper Award Nominations from DAC-2007, ISPD-2007 (two), DAC-2005, 2004 ACM TODAES, ASP-DAC-2003, ICCAD-2002, ICCD-2001, and DAC-2000. He has received many awards for research performance, such as the inaugural First-Class Principal Investigator Awards and the 2004 Mr. Wu Ta You Memorial Award from the National Science Council of Taiwan, the 2004 MXIC Young Chair Professorship from the MXIC Corp, and for excellent teaching from National Taiwan University and National Chiao Tung University. He is an editor of the Journal of Computer and Information Science. He has served on the ACM/SIGDA Physical Design Technical Committee and the technical program committees of ASP-DAC (topic chair), DAC, DATE, FPT (program co-chair), GLSVLSI, ICCAD, ICCD, IECON (topic chair), ISPD, SOCC (topic chair), TENCON, and VLSI-DAT (topic chair). He is currently an independent board member of Genesys Logic, Inc, the chair of the Design Automation and Testing (DAT) Consortium of the Ministry of Education, Taiwan, a member of the board of governors of the Taiwan IC Design Society, and a member of the IEEE Circuits and Systems Society, ACM, and ACM/SIGDA.

Hannah Honghua Yang received the B.S. degree from Peking University, Beijing, China, in 1988, and the M.S. and Ph.D. degrees from the University of Texas, Austin, in 1991 and 1995, respectively, all in computer science.

She is currently a Senior Staff CAD Researcher with Strategic CAD Laboratory, Intel Corporation, Hillsboro, OR. She is a technical lead on very large scale integration (VLSI) design automation for physical design and microarchitecture planning and exploration. She has published over 30 technical papers in the semiconductor design automation field in premium international conferences and journals.