Simultaneous Floorplan and Buffer-Block Optimization


Iris Hui-Ru Jiang, Yao-Wen Chang, Member, IEEE, Jing-Yang Jou, Senior Member, IEEE, and Kai-Yuan Chao

Abstract—As technology advances and the number of interconnections among modules rapidly increases, timing closure and design convergence become the most important concerns. Hence, it is desirable to consider interconnect optimization as early as possible. Previous work on this issue falls into two directions: wire planning and buffer-block planning for interconnect-driven floorplanning. Wire planning for interconnect-driven floorplanning does not consider buffer insertion, and buffer-block planning for interconnect-driven floorplanning cannot overcome the limitation of a bad initial floorplan. In this paper, we first address simultaneous floorplanning and buffer-block planning (i.e., integrating buffer-block planning into floorplanning) for interconnect optimization. We adopt simulated annealing to refine a floorplan so that buffers can be inserted more effectively. In each iteration, we construct a routing tree for each net, allocate buffers for all nets, introduce the corresponding buffer blocks into the intermediate floorplan, and invoke Lagrangian relaxation to optimize area and satisfy timing requirements. Further, in order to reduce the problem size, we present supermodule partitioning, which partitions modules into supermodules. Experimental results show that our method of integrating buffer-block planning into floorplanning can significantly improve the interconnect delay and reduce the number of buffers needed. Based on a set of MCNC benchmark circuits, our approach achieves an average success rate of 86.1% of nets meeting timing constraints, inserts only 272 buffers on average, and consumes an average extra area of only 0.28% over the given floorplan, compared with an average success rate of 62.6%, 1123 buffers, and an extra area of 1.05% obtained by a well-known recent work presented at ICCAD’99.

Index Terms—Floorplanning, interconnect optimization, layout, physical design.

I. INTRODUCTION

AS REVEALED by the 1999 International Technology Roadmap for Semiconductors [13], technology will soon shrink below 0.1 μm, and chip complexity will soon exceed 200 million transistors. For such large and complex designs, timing closure and design convergence are the most important concerns. Further, for deep submicron designs, interconnect dominates circuit performance. However, the conventional design flow deals with interconnect optimization at the routing or the postrouting stage. When the amount of communication among modules rapidly increases, it is almost impossible to remedy interconnect during or after routing, since most silicon and routing resources are occupied. Therefore, we should optimize interconnect as early as possible. Previous work on this issue can be classified into two directions: wire planning and buffer-block planning for interconnect-driven floorplanning.

Manuscript received July 8, 2002; revised January 29, 2003. This work was supported in part by the National Science Council of Taiwan, R.O.C., under Grant NSC 91-2215-E-002-038. This paper was recommended by Associate Editor M. D. F. Wong.

I. H.-R. Jiang is with VIA Technologies Inc., Taipei 231, Taiwan (e-mail: huiru@cis.nctu.edu.tw).

Y.-W. Chang is with the Department of Electrical Engineering and the Graduate Institute of Electronics Engineering, National Taiwan University, Taipei 106, Taiwan (e-mail: ywchang@cc.ee.ntu.edu.tw).

J.-Y. Jou is with the Department of Electronics Engineering, National Chiao Tung University, Hsinchu 300, Taiwan (e-mail: jyjou@ee.nctu.edu.tw).

K.-Y. Chao is with Intel Corporation, Hillsboro, OR 97124 USA (e-mail: kchao@ichips.intel.com).

Digital Object Identifier 10.1109/TCAD.2004.826582

Wire planning for interconnect-driven floorplanning tries to measure the impact of wiring or to plan interconnect at the floorplanning stage [4]. However, this method considers only wires; other useful techniques, e.g., buffer insertion, were not included. On the other hand, buffer-block planning for interconnect-driven floorplanning manages buffer-blocks for a given floorplan [5], [11], [14]. Previous work has shown that buffer insertion is an effective and widely used technique to improve interconnect delay, especially for global signals [1], [13]. (For example, over 85% of global nets in Intel Itanium microprocessors are buffered to reshape signals [9].) Because buffers consume silicon resource, it is too difficult to insert a large number of buffers individually after placement or routing when most silicon and routing resources are occupied. The induced area may significantly change the floorplan and placement, thus causing problems in timing closure and design convergence. To tackle this problem, researchers tried to consider buffer insertion during postfloorplanning (not during routing or postrouting) [5], [11], [14]. For a given floorplan, channels and dead spaces are used as buffer blocks, which accommodate buffers. Cong et al. first consider this issue in [5]; they derive feasible region formulas to determine where to insert buffers to meet timing requirements and propose a greedy algorithm to plan buffer blocks in a slicing floorplan. Sarkar et al. also consider routability and address the concept of independent feasible regions (feasible regions of buffers for a net do not influence each other) in [11]. Tang and Wong optimally plan as many buffers into buffer blocks as possible for all nets, each with one buffer in [14]. Moreover, [5] and [11] expand channels to provide more buffers, if necessary. However, if the given floorplan is not good enough, channel expansion would result in much area overhead. 
Hence, this kind of strategy is limited by the quality of the given floorplan. Although [5] claims that its approach can be applied to both slicing and nonslicing floorplans, channel expansion can be adopted only when the channel definition is certain. For slicing floorplans, each channel is explicitly shown in the representation, e.g., slicing floorplan trees [16]. However, a channel may be defined only implicitly in a nonslicing floorplan. Hence, channel expansion cannot easily be applied to nonslicing floorplans. Alpert et al. proposed a buffer-site methodology in [3], allocating buffers into empty silicon area inside macroblocks. However, placing buffers inside macroblocks requires one to consider the interaction between logic and interconnect. Therefore, buffers are typically inserted outside macroblocks [9].

Previous work for interconnect-driven floorplanning does not integrate buffer insertion into floorplanning. Existing work for buffer-block planning for interconnect-driven floorplanning cannot break through the limitation by a bad floorplan. In this paper, we first study simultaneous floorplanning and buffer-block planning (FBP) to conquer the weakness of the above. (In industry, this idea was considered for Intel Itanium microprocessor design [9].) We present an algorithm that simultaneously considers FBP for a general floorplan. Our method adopts the simulated annealing mechanism to refine the floorplan so that buffers can be inserted more effectively. In each iteration, we construct a routing tree for each net, allocate buffers for all nets, introduce corresponding buffer blocks into the intermediate floorplan, and invoke Lagrangian relaxation to optimize area and satisfy timing requirements. Further, in order to reduce the problem size, we present supermodule partitioning which partitions modules into supermodules.

Experimental results show that our method of integrating buffer-block planning into floorplanning can significantly improve the interconnect delay and reduce the number of buffers needed. Based on a set of MCNC benchmark circuits, our approach achieves an average success rate of 86.1% of nets meeting timing constraints, inserts only 272 buffers on average, and consumes an average extra area of only 0.28% over the given floorplan, compared with an average success rate of 62.6%, 1123 buffers, and an extra area of 1.05% obtained by the recent work in [5].

The rest of this paper is organized as follows. Section II gives the problem formulation of simultaneous FBP. Section III introduces the concept of a nonslicing floorplan representation, independent feasible regions, and basic buffer-block planning. We detail Lagrangian relaxation-based buffer-block planning and supermodule partitioning in Section IV and the simulated annealing algorithm in Section V. Experimental results are discussed in Section VI. Section VII concludes this paper.

II. PROBLEM FORMULATION

In this section, we give our problem formulation. We define the simultaneous FBP problem as follows.

• Problem: The simultaneous FBP problem.

• Objective: Minimize area overhead, subject to timing requirements.

• Inputs: An initial floorplan, multiterminal nets and their timing requirements, a buffer library, and a technology file.

• Outputs: A floorplan with buffer-block planning.

Table I lists the technology file and buffer library used in our experiments, which are based on the 0.18-μm technology in the NTRS’97 roadmap [12]. These parameters were also used in [5] and [11]. The notation is used throughout this paper.

TABLE I

PARAMETERS OF 0.18-μm TECHNOLOGY IN THE NTRS’97 ROADMAP

Fig. 1. (a) Packing of a sequence pair $(abcd, bacd)$ for modules $\{a, b, c, d\}$. (b) The corresponding horizontal and vertical constraint graphs.

III. PRELIMINARIES

This section first introduces the sequence-pair representation of a nonslicing floorplan [10] and the concepts of independent feasible regions [11]. We then propose our approach for buffer-block planning on two-terminal nets.

A. Sequence-Pair Representation

We adopt the sequence-pair representation [10] for a general floorplan. A sequence pair of a set of modules is a pair of sequences formed by module names. For example, given a set of modules $\{a, b, c, d\}$, $(abcd, bacd)$ is a sequence pair of these modules, as shown in Fig. 1(a). Based on the following properties, we can retrieve the topological relations between modules.

• H-constraint: If $(\ldots a \ldots b \ldots, \ldots a \ldots b \ldots)$, module $b$ is on the right side of module $a$.

• V-constraint: If $(\ldots a \ldots b \ldots, \ldots b \ldots a \ldots)$, module $b$ is below module $a$.
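The two constraints above are enough to compute a packing. The following is a minimal Python sketch, assuming hard rectangular modules and using a plain quadratic longest-path evaluation; the function names `pack` and `bounding_box` and the module sizes are hypothetical and chosen only for illustration.

```python
def pack(seq1, seq2, size):
    """Return bottom-left corners {module: (x, y)} for a sequence pair.

    size maps each module name to (width, height). Illustrative sketch only.
    """
    pos1 = {m: i for i, m in enumerate(seq1)}
    pos2 = {m: i for i, m in enumerate(seq2)}
    x, y = {}, {}
    # H-constraint: a precedes b in both sequences -> b is to the right of a.
    # Visiting modules in seq1 order guarantees every left neighbor is done.
    for b in seq1:
        x[b] = max((x[a] + size[a][0] for a in x if pos2[a] < pos2[b]),
                   default=0.0)
    # V-constraint: a precedes b in seq1 and follows b in seq2 -> b is below a.
    # Visiting modules in seq2 order guarantees every lower neighbor is done.
    for a in seq2:
        y[a] = max((y[b] + size[b][1] for b in y if pos1[a] < pos1[b]),
                   default=0.0)
    return {m: (x[m], y[m]) for m in seq1}


def bounding_box(coords, size):
    """Width and height of the packing (longest paths to the sink)."""
    width = max(x + size[m][0] for m, (x, _) in coords.items())
    height = max(y + size[m][1] for m, (_, y) in coords.items())
    return width, height


# Hypothetical module sizes for the sequence pair (abcd, bacd).
sizes = {"a": (2, 2), "b": (3, 1), "c": (1, 3), "d": (2, 2)}
coords = pack("abcd", "bacd", sizes)
```

With these assumed sizes, module `a` sits on top of `b`, and `c` and `d` follow to the right, which matches the relations the two constraints imply for $(abcd, bacd)$.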

We can accordingly construct the horizontal and vertical constraint graphs, $G_H$ and $G_V$. In $G_H$ ($G_V$), we construct a node for each module and two additional nodes, a source $s$ and a sink $t$. Except $s$ and $t$, whose weights are zero, each node in $G_H$ ($G_V$) is weighted as the width (height) of the corresponding module. The edges are constructed by the following rule.

• There exists an edge from $a$ to $b$ in $G_H$ ($G_V$) iff module $b$ must be placed to the right of (above) module $a$ according to the H-constraint (V-constraint).

In addition, edges from $s$ to zero-indegree nodes and from zero-outdegree nodes to $t$ are added. Fig. 1(b) illustrates the corresponding constraint graphs of the sequence pair $(abcd, bacd)$. The $x$-coordinate ($y$-coordinate) of the bottom-left corner of each module can be computed as the longest path length from $s$ to the module node in $G_H$ ($G_V$). Hence, if a dummy module replaces $t$ and an additional edge from the dummy module to $t$ is added in $G_H$ ($G_V$), the $x$-coordinate ($y$-coordinate) of the bottom-left corner of the dummy module equals the width (height) of the packing. By a sequence of the following two kinds of perturbations, an arbitrary sequence pair can be changed to a given one.

• Exchange two modules in the first sequence.

• Exchange two modules in both sequences.

B. Independent Feasible Region

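In this section, we present the computation of independent feasible regions proposed in [11]. The independent feasible region of a buffer is the region where the buffer can be placed to meet the timing requirement of the net, while the other buffers are placed within their respective independent feasible regions. Given a wire segment of length $l$ with driver resistance $R_d$, load capacitance $C_L$, wire resistance per unit length $r$, and wire capacitance per unit length $c$, its Elmore delay is calculated by

$$D(R_d, C_L, l) = \frac{1}{2}\, r c\, l^2 + (R_d c + r C_L)\, l + R_d C_L.$$

Assume that $R_b$ is the buffer output resistance, and $C_b$ is the buffer input capacitance. Let $D(x_1, \ldots, x_k, l)$ denote the Elmore delay of a two-terminal net $i$ of length $l$ with $k$ buffers inserted, where $x_j$ is the distance between the driver and the $j$th buffer. The buffer locations under the optimal delay $D_{opt}$ are

$$x_j^* = (j - 1)\, y^* + x^*, \quad j = 1, \ldots, k,$$

where

$$x^* = \frac{1}{k+1}\left(l + \frac{k\,(R_b - R_d)}{r} + \frac{C_L - C_b}{c}\right), \qquad y^* = \frac{1}{k+1}\left(l - \frac{R_b - R_d}{r} + \frac{C_L - C_b}{c}\right).$$

The width of the independent feasible region of a buffer is the maximum tolerable range around the optimum location of the buffer. In [11], the independent feasible region $IFR_j$ of width $W_{IFR}$ for the $j$th buffer of a net $i$ is defined as

$$IFR_j = \left(x_j^* - \frac{W_{IFR}}{2},\; x_j^* + \frac{W_{IFR}}{2}\right) \cap (0, l)$$

such that $D(x_1, \ldots, x_k, l) \le T_{req}$ for every $(x_1, \ldots, x_k) \in IFR_1 \times \cdots \times IFR_k$, where $T_{req}$ denotes the timing requirement associated with net $i$. Moreover, if $T_{req} \ge D_{opt}$, the width of the independent feasible region for each buffer of net $i$ is

$$W_{IFR} = 2 \sqrt{\frac{T_{req} - D_{opt}}{r c\,(2k - 1)}}.$$

On the other hand, in [5], the minimum number of buffers required to meet the timing requirement $T_{req}$ for a net of length $l$ is

$$k_{min} = \left\lceil \frac{-K_2 - \sqrt{K_2^2 - 4 K_1 K_3}}{2 K_1} \right\rceil,$$

where $K_1$, $K_2$, and $K_3$ are the coefficients of the quadratic inequality $K_1 k^2 + K_2 k + K_3 \le 0$ in $k$ that results from requiring $D_{opt} \le T_{req}$ (see [5]).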
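The formulas of this subsection can be checked numerically. The sketch below is a hedged Python illustration, not the authors' implementation: the parameter values are assumptions picked to resemble a 0.18-μm process, the intrinsic buffer delay `T_BUF` is added to each buffered stage, and `min_buffers` replaces the closed form of [5] with a simple upward search over $k$.

```python
import math

# Illustrative parameter values (assumptions for this sketch).
R_WIRE = 0.075      # wire resistance per unit length, ohm/um
C_WIRE = 0.118e-15  # wire capacitance per unit length, F/um
R_BUF = 180.0       # buffer output resistance, ohm
C_BUF = 23.4e-15    # buffer input capacitance, F
T_BUF = 36.4e-12    # intrinsic buffer delay, s


def elmore(R, C, l):
    """Elmore delay of a wire of length l with driver R and load C."""
    return 0.5 * R_WIRE * C_WIRE * l * l + (R * C_WIRE + R_WIRE * C) * l + R * C


def optimal_locations(l, k, Rd, CL):
    """Delay-optimal distances x_1*..x_k* from the driver."""
    x = (l + k * (R_BUF - Rd) / R_WIRE + (CL - C_BUF) / C_WIRE) / (k + 1)
    y = (l - (R_BUF - Rd) / R_WIRE + (CL - C_BUF) / C_WIRE) / (k + 1)
    return [x + j * y for j in range(k)]


def buffered_delay(xs, l, Rd, CL):
    """Elmore delay of a two-terminal net with buffers at distances xs."""
    pts = [0.0] + list(xs) + [l]
    total = T_BUF * len(xs)
    for j in range(len(pts) - 1):
        R = Rd if j == 0 else R_BUF              # stage driver
        C = CL if j == len(pts) - 2 else C_BUF   # stage load
        total += elmore(R, C, pts[j + 1] - pts[j])
    return total


def ifr_width(t_req, d_opt, k):
    """Width of the independent feasible region; 0 if timing cannot be met."""
    if t_req < d_opt:
        return 0.0
    return 2.0 * math.sqrt((t_req - d_opt) / (R_WIRE * C_WIRE * (2 * k - 1)))


def min_buffers(l, t_req, Rd, CL, k_max=50):
    """Smallest k whose optimally buffered delay meets t_req (linear search)."""
    for k in range(k_max + 1):
        xs = optimal_locations(l, k, Rd, CL) if k else []
        pts = [0.0] + xs + [l]
        if any(b <= a for a, b in zip(pts, pts[1:])):
            continue  # closed-form locations fall outside the wire
        if buffered_delay(xs, l, Rd, CL) <= t_req:
            return k
    return None
```

For a 9000-μm net driven and loaded by a buffer-sized gate, two buffers land at one third and two thirds of the wire. Moving them by half the region width in opposite directions consumes exactly the available slack, which is the worst case the width formula is designed for.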

C. Basic Buffer-Block Planning

In this section, we propose the basic idea of our buffer-block planning for two-terminal nets. (Multiterminal nets will be considered later.) Fig. 2(a) shows the independent feasible regions of two buffers on a two-terminal net $i$. Based on the formulas shown in the previous subsection, the routing of a two-terminal net should be a monotonic route restricted to the bounding box of its terminals. The independent feasible region of the $j$th buffer is a hexagon or a degenerate hexagon bounded by the bounding box and two parallel lines of slope $+1$ or $-1$. The respective distances from the source terminal to these parallel lines are $x_j^* - W_{IFR}/2$ and $x_j^* + W_{IFR}/2$, i.e., the optimal buffer distance minus and plus half the width of the independent feasible region.
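The geometric description above reduces to a simple membership test: a point lies in the two-dimensional region if it is inside the bounding box of the terminals and its Manhattan distance from the source falls within the allowed band. The following Python sketch illustrates this; the function name `in_ifr` is hypothetical, and the band is treated as closed for simplicity.

```python
def in_ifr(point, source, sink, x_opt, w_ifr):
    """True if point lies in the (closed) independent feasible region.

    x_opt is the optimal distance of the buffer from the source and
    w_ifr is the width of the region. Illustrative sketch only.
    """
    lo_x, hi_x = sorted((source[0], sink[0]))
    lo_y, hi_y = sorted((source[1], sink[1]))
    inside_box = lo_x <= point[0] <= hi_x and lo_y <= point[1] <= hi_y
    # On a monotonic route inside the bounding box, the wire length from the
    # source to the point equals their Manhattan distance.
    dist = abs(point[0] - source[0]) + abs(point[1] - source[1])
    return inside_box and x_opt - w_ifr / 2 <= dist <= x_opt + w_ifr / 2
```

The two distance bounds are exactly the two parallel lines of slope $\pm 1$; clipping them against the bounding box yields the hexagon (or a degenerate one).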

A buffer block is a rectangular region consisting of buffers, provided by dead spaces and/or channels. As shown in Fig. 2(a), each buffer is inserted into a buffer block with which its independent feasible region overlaps. For the first buffer, its independent feasible region intersects the dead space $f$; thus, it is assigned to the buffer block $f$. If there are several choices, we assign it to the one with the largest overlapping area. For the second buffer, there is no dead space intersecting its independent feasible region; thus, it is assigned to the channel $h$ (between modules $c$ and $d$), which is nearest to its independent feasible region. After all buffers for all nets are allocated, the region of each buffer block is determined as the bounding rectangle of the inserted buffers. We then treat a buffer block as a soft module, and insert the node into the constraint graphs accordingly. See Fig. 2(b) for an illustration. Since we remove all transitive edges before processing, inserting a buffer-block node into the constraint graph needs only linear time. We will reshape the floorplan by Lagrangian relaxation, as detailed in Section IV.

IV. LAGRANGIAN RELAXATION-BASED BUFFER-BLOCK PLANNING

Fig. 2. (a) Net $i$ requires two buffers. Each buffer can be inserted into its independent feasible region. In the case shown in this figure, one buffer is inserted into the dead space $f$, and the other is inserted into the channel $h$ (on the right side of module $c$). (b) The modified sequence pair with induced buffer blocks and its corresponding constraint graphs, where transitive edges are not shown, and induced buffer-block nodes are indicated by rectangles.

In this section, we detail buffer-block planning for an intermediate floorplan. We construct a routing tree for each net, assign buffer blocks (extended from the basic idea introduced in Section III), reshape the floorplan using the Lagrangian relaxation technique, partition the floorplan into supermodules, and, finally, summarize our buffer-block planning procedure.

A. Routing Tree Construction

For an intermediate floorplan, we first construct a routing tree for each multiterminal net. At the floorplanning stage, detailed timing information is not available. Thus, our goal is to construct a timing-aware routing tree for each net.

We adopt the AHHK heuristic presented in [2], which combines Dijkstra’s shortest path algorithm with Prim’s minimum spanning tree algorithm [6]. The generated tree directly trades off radius against wire length. The initial tree is then converted to a Steiner tree by removing overlapped edges based on the algorithm proposed in [7]. Fig. 3(a) shows an example of a multiterminal routing tree; the longest path (source → sink2 → sink3) is indicated by the bold line. (Alternative tree construction approaches can also be used instead.) Based on the formulas described in Section III-B, we can check whether an optimally buffered routing tree can satisfy its timing requirement, i.e., whether its optimal buffered delay does not exceed its timing requirement ($D_{opt} \le T_{req}$). We record the unsatisfied nets, which do not meet their timing requirements even with optimally inserted buffers, and do not plan buffers for them (since the timing of those nets cannot be satisfied).
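The Prim-Dijkstra tradeoff can be sketched in a few lines. The Python example below is a simplified illustration under stated assumptions: terminals are points connected by Manhattan distance, index 0 is the source, and the parameter `c` (a name chosen here) blends the two algorithms, with `c = 0` behaving like Prim and `c = 1` like Dijkstra. It does not include the Steiner conversion of [7].

```python
def ahhk_tree(terminals, c):
    """Grow a routing tree from terminals[0] with the Prim-Dijkstra tradeoff.

    Returns (parent, path_len): the tree as a parent map over terminal
    indices and the source-to-node path length of every terminal.
    """
    def dist(i, j):
        (xi, yi), (xj, yj) = terminals[i], terminals[j]
        return abs(xi - xj) + abs(yi - yj)

    parent = {0: None}
    path_len = {0: 0}
    while len(parent) < len(terminals):
        best = None
        for u in parent:                      # nodes already in the tree
            for v in range(len(terminals)):   # candidate nodes outside it
                if v in parent:
                    continue
                key = c * path_len[u] + dist(u, v)
                if best is None or key < best[0]:
                    best = (key, u, v)
        _, u, v = best
        parent[v] = u
        path_len[v] = path_len[u] + dist(u, v)
    return parent, path_len


# Hypothetical three-terminal net: source, then two sinks.
pins = [(0, 0), (4, 0), (3, 3)]
mst_parent, mst_len = ahhk_tree(pins, 0.0)   # minimum wire length
spt_parent, spt_len = ahhk_tree(pins, 1.0)   # minimum radius
```

On this small net, the `c = 0` tree chains the second sink behind the first (total wire length 8, longest path 8), while the `c = 1` tree connects it directly to the source (total wire length 10, longest path 6), which shows the radius versus wire-length tradeoff described above.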

B. Buffer-Block Planning

A multiterminal routing tree can be seen as a combination of several two-terminal routing segments. Hence, our buffer-block planning for multiterminal nets is extended from the basic buffer-block planning for two-terminal nets presented in Section III-C.

After checking whether a routing tree can satisfy its timing requirement, we record unsatisfied nets and do not plan buffers for them. For the rest of the nets, we process each routing tree path by path (from the longest to the shortest). Based on the formulas in Section III-B, we obtain the number of buffers needed for the longest path, the optimal distance from the source terminal to each buffer, and the width of the independent feasible region. We then determine the independent feasible region of each buffer on each path according to the above information.

Fig. 3(b) shows the independent feasible regions for buffer assignment on the routing tree given in Fig. 3(a). In this case, the longest path (source → sink2 → sink3) requires two buffers, and the path from the source to sink1 does not need buffers. To preserve the topology, the independent feasible region of each buffer is further restricted to the bounding box of the two nearest Steiner tree nodes. If the independent feasible region covers some tree node, the tree node plays the role of the buffer. As shown in Fig. 3(b), the independent feasible region of the first buffer is restricted by the nearest tree nodes, and the second buffer is replaced by the sink2 terminal. Similar to the basic buffer-block planning for two-terminal nets, we assign buffers into a dead space that intersects their independent feasible regions with the most area or into the nearest channel; as shown in Fig. 3(c), the first buffer is assigned to the buffer block $f$.

Fig. 3. (a) The routing tree for a multiterminal net, where the longest path (source → sink2 → sink3) is highlighted by the bold line. We process the tree path by path, from the longest to the shortest. (b) The path from the source to sink3 requires two buffers; the corresponding independent feasible region of the first buffer is shown by the shaded hexagon, and the second buffer is covered by sink2. (c) The resulting buffer assignment for the longest path; the first buffer is assigned to the buffer block $f$, and the second buffer is taken by sink2.

After allocating buffers for all nets, we introduce buffer blocks as soft modules into the constraint graphs. These buffer blocks may occupy dead spaces or be inserted into channels. Their areas equal the bounding areas of the inserted buffers. Previous work generates buffer blocks before buffer assignment; however, we generate buffer blocks after buffer assignment and, thus, the area of buffer blocks can properly be controlled, especially for the buffer blocks in channels.

C. Lagrangian Relaxation

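We adopt the Lagrangian relaxation technique to reshape the floorplan. After buffer allocation, $G_H$ ($G_V$) contains $m$ module nodes and $b$ buffer-block nodes. The first $m$ nodes indicate modules, and the other $b$ nodes indicate buffer blocks. Each module or buffer block $j$ has its bottom-left corner $x$-coordinate $x_j$, bottom-left corner $y$-coordinate $y_j$, area $a_j$, width $w_j$, height $h_j$, maximum width $w_j^{max}$, and minimum width $w_j^{min}$. In addition, inspired by [17] to facilitate area calculation, we add one dummy node labeled $m+b+1$ to $G_H$ and $G_V$. As indicated in Fig. 4(b), each edge directed to $t$ is redirected to the dummy node, and an additional edge from $m+b+1$ to $t$ is added. As mentioned in Section III-A, $x_{m+b+1}$ ($y_{m+b+1}$) equals the width (height) of the packing. There are $n$ multiterminal nets. $T_i$ denotes the timing requirement of net $i$, and $D_i$ denotes the longest path delay in the routing tree of net $i$.

Hence, we may formulate the geometric program (primal problem $PP$) to minimize the total area subject to timing requirements as follows.

$$\begin{aligned}
\text{Minimize} \quad & x_{m+b+1}\, y_{m+b+1} \\
\text{Subject to} \quad & x_j + w_j \le x_k, && \forall (j, k) \in G_H \\
& y_j + a_j / w_j \le y_k, && \forall (j, k) \in G_V \\
& D_i \le T_i, && i = 1, \ldots, n \\
& w_j^{min} \le w_j \le w_j^{max}, && j = 1, \ldots, m+b.
\end{aligned}$$

Because the objective function and the constraints are all posynomial [15], we can apply Lagrangian relaxation to solve the problem by introducing one nonnegative Lagrange multiplier for each constraint. Writing $\lambda_{jk}$, $\mu_{jk}$, and $\gamma_i$ for the multipliers of the horizontal, vertical, and timing constraints, respectively, the Lagrangian relaxation subproblem $LRS$ is given by

$$\begin{aligned}
\text{Minimize} \quad & x_{m+b+1}\, y_{m+b+1} + \sum_{(j,k) \in G_H} \lambda_{jk}\,(x_j + w_j - x_k) + \sum_{(j,k) \in G_V} \mu_{jk}\,(y_j + a_j / w_j - y_k) + \sum_{i=1}^{n} \gamma_i\,(D_i - T_i) \\
\text{Subject to} \quad & w_j^{min} \le w_j \le w_j^{max}, \quad j = 1, \ldots, m+b.
\end{aligned}$$

The objective function of $LRS$ is the Lagrangian function $L$. We have the following theorem to simplify the Lagrangian function.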


Fig. 4. (a) Original constraint graphs, $G_H$/$G_V$, where nodes 1–4 are modules, and nodes 5–8 are induced buffer blocks. (b) The modified constraint graphs with the added dummy node 9, where the modification is highlighted by bold lines.

Theorem 1: The optimality conditions for the Lagrange multipliers are given by

Proof: By Kuhn–Tucker conditions [15], the first order derivative of , with respect to each variable, equals 0 at the optimal solution of .

Rearranging , we have

By checking Kuhn–Tucker conditions, this theorem thus follows.

Applying the optimality conditions, we may further simplify as follows:


and are constant for a fixed vector of Lagrange multipliers.

Theorem 2: Let be a solution, then the optimal width of module or buffer block is given by

The optimal Manhattan distance between the buffer at and the buffer at of net , is constrained by

and are consecutive edges in the longest path of net .

Proof: Differentiating with respect to , we have

Applying the range constraints on width, we have the optimal width

The delay of net , is given by

where is the longest path in the routing tree of net has segments, thus, buffers inserted, is the driver resistance at is the load capacitance at is the Manhattan distance between the buffer at and the buffer at of net is the delay associated with the edge , and is the buffer delay.

We assume that a sink terminal of a net can be a driver for other sink terminals, and the driver delay of the sink terminal equals the buffer delay . Therefore, the timing constraints

can be rewritten as

where

For two consecutive edges and in

Since the first order derivative of with respect to equals 0, we have

and are consecutive edges in . This theorem thus follows.

The Lagrangian dual problem ($LDP$) is to find a vector of Lagrange multipliers such that the optimal solution of $LRS$ is also the optimal solution of $PP$.

Maximize

Subject to in the optimality conditions where

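We only need to consider those multipliers satisfying the optimality conditions. We iteratively adjust the multipliers by the subgradient optimization method as follows: each multiplier $\lambda$ associated with a relaxed constraint $g \le 0$ (e.g., $g = D_i - T_i$ for the timing constraint of net $i$) is updated by

$$\lambda^{(k+1)} = \left[\lambda^{(k)} + \rho_k\, g\right]^+,$$

where $[x]^+ = \max(x, 0)$ and $\rho_k$ is the step-size sequence that satisfies $\lim_{k \to \infty} \rho_k = 0$ and $\sum_{k=1}^{\infty} \rho_k = \infty$ (e.g., $\rho_k = 1/k$).

After a subgradient step, the Lagrange multipliers form a new vector that may violate the optimality conditions; this vector is therefore projected back to the nearest point (in the 2-norm) that meets the optimality conditions.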
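The subgradient update is easiest to see on a toy problem. The Python sketch below is not the floorplan formulation: it assumes the one-variable problem of minimizing $x^2$ subject to $1 - x \le 0$, whose Lagrangian $x^2 + \lambda (1 - x)$ is minimized at $x = \lambda / 2$, and it uses the step size $\rho_k = 1/k$. The projection onto the optimality conditions is omitted because this toy problem has none.

```python
def subgradient_dual_ascent(iterations=10000):
    """Solve min x^2 s.t. 1 - x <= 0 by subgradient ascent on the dual.

    Returns the final multiplier and the corresponding primal point.
    Toy illustration of the multiplier update only.
    """
    lam = 0.0
    for k in range(1, iterations + 1):
        x = lam / 2.0                 # minimizer of the Lagrangian for this lam
        violation = 1.0 - x           # subgradient of the dual function
        rho = 1.0 / k                 # rho_k -> 0 and sum of rho_k diverges
        lam = max(0.0, lam + rho * violation)
    return lam, lam / 2.0


lam, x = subgradient_dual_ascent()
```

The multiplier grows while the constraint is violated and settles near $\lambda = 2$, where the relaxed minimizer $x = 1$ is feasible and optimal for the original problem. With $\rho_k = 1/k$ the convergence is slow, which is why the iteration count in the sketch is large.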

D. Supermodule Partitioning

After Lagrangian relaxation, we partition the floorplan into supermodules to reduce the problem size for simulated annealing. At a high temperature, the size of a supermodule is small so that the simulated annealing can freely refine the floorplan. When the temperature is cooling down (the floorplan is settled down at a low temperature), the size of a supermodule is adjusted to a larger value. A supermodule holds the following two properties.

• A supermodule is a set of modules in the floorplan.

• The nets between any pair of modules in a supermodule meet timing requirements.

An extreme case is all modules in one supermodule, i.e., all nets meet timing requirements. Note that buffer blocks in a supermodule will be considered for buffer-block planning in the next iteration, and supermodules are considered as hard modules. Fig. 5 summarizes the procedure of supermodule partitioning.
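The two properties above can be illustrated with a greedy clustering. The Python sketch below is an assumption-laden illustration and not the procedure of Fig. 5: it treats nets as two-terminal, ignores module adjacency in the floorplan, and uses a hypothetical `max_size` parameter to stand in for the temperature-dependent supermodule size.

```python
def partition_supermodules(modules, nets, violated, max_size):
    """Greedily group modules so that no violated net joins two members.

    nets maps a net name to its pair of modules; violated is the set of
    net names that miss their timing requirements. Illustrative sketch.
    """
    bad_pairs = {frozenset(nets[name]) for name in violated}
    clusters = []
    for m in modules:
        for cluster in clusters:
            compatible = all(frozenset((m, other)) not in bad_pairs
                             for other in cluster)
            if len(cluster) < max_size and compatible:
                cluster.append(m)
                break
        else:
            clusters.append([m])  # no existing supermodule can take m
    return clusters


# Hypothetical data: net n1 between a and b misses timing.
nets = {"n1": ("a", "b"), "n2": ("a", "c"), "n3": ("b", "d")}
groups = partition_supermodules(["a", "b", "c", "d"], nets, {"n1"}, max_size=2)
```

Raising `max_size` lets more modules merge, mirroring the larger supermodules used at low temperature; modules `a` and `b` never share a supermodule as long as net `n1` misses its timing requirement.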

E. Summary on Buffer-Block Planning

Fig. 6 lists our buffer-block planning procedure. In lines 1 and 2, constraint graphs are extracted according to the given intermediate floorplan, and transitive edges are deleted. In lines 4–7, the routing trees are then constructed, and unsatisfied nets are recorded. In lines 8–10, buffer blocks are planned. In lines 11–19, the Lagrangian relaxation technique is invoked to reshape the floorplan. In line 20, unsatisfied nets are updated for the refined floorplan. In line 21, the resulting floorplan is partitioned into supermodules.


Fig. 5. Supermodule partitioning procedure.

Fig. 6. Buffer-block planning procedure.

V. SIMULTANEOUS FLOORPLANNING AND BUFFER-BLOCK PLANNING (FBP)

In this section, we present our simultaneous FBP algorithm for the FBP problem. The FBP algorithm is based on simulated annealing and provides a mechanism to refine the floorplan. After perturbing the floorplan, FBP invokes the buffer-block planning procedure to plan buffers.

A. Solution Perturbation

A feasible nonslicing floorplan, without overlapping modules, can be represented by a sequence pair. We adopt the following four operations to perturb one sequence pair into another.

• Op1: Exchange two modules in the first sequence.

• Op2: Exchange two modules in both sequences.

• Op3: Rotate a module.

• Op4: Relax a supermodule.
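The first three operations act directly on the sequence pair and the module dimensions. The Python sketch below is a simplified illustration: the function names are hypothetical, rotation is reduced to a 90-degree swap of width and height (pin-aware orientations are not modeled), and Op4 is not shown because it depends on the supermodule bookkeeping.

```python
def swap_first(seq1, seq2, a, b):
    """Op1: exchange modules a and b in the first sequence only."""
    s1 = list(seq1)
    i, j = s1.index(a), s1.index(b)
    s1[i], s1[j] = s1[j], s1[i]
    return s1, list(seq2)


def swap_both(seq1, seq2, a, b):
    """Op2: exchange modules a and b in both sequences."""
    s1, _ = swap_first(seq1, seq2, a, b)
    s2, _ = swap_first(seq2, seq1, a, b)
    return s1, s2


def rotate(size, m):
    """Op3 (simplified): swap the width and height of module m."""
    new_size = dict(size)
    w, h = new_size[m]
    new_size[m] = (h, w)
    return new_size
```

Each function returns new objects instead of mutating its arguments, so a rejected move in the annealing loop can simply be discarded.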

Op1 swaps two modules in the first sequence only. Op2 swaps two modules in both sequences. Op3 rotates a module; eight orientations (with pin considerations) are configured for each module. Op4 relaxes a supermodule (declusters some modules in a supermodule). We perturb a solution with the guidance of the current solution. Hence, with a probability adjusted by the temperature and the solution quality, the modules related to the unsatisfied nets are chosen as candidates for perturbation.

B. Cost Function

As given in Section II, the objective of the FBP problem is to find a floorplan with planned buffer blocks such that all timing requirements are satisfied and the area growth is minimized. Hence, a floorplan is evaluated by a cost that combines area and timing as follows:

$$\Phi = A + \beta \sum_{i \in N} [D_i - T_i]^+,$$

where $A$ is the area of the floorplan, $\beta$ is a user-specified parameter, $N$ is the set of nets, $D_i$ is the delay of net $i$ after buffer-block planning, $T_i$ is the timing requirement of net $i$, and $[x]^+$ denotes the positive part of $x$, i.e., $[x]^+ = \max(x, 0)$.

The first part of the cost is the area consumed by the floorplan, including currently existing buffer blocks. The second part of the cost reflects the timing penalty paid for unsatisfied nets. The multiplier $\beta$ is the area equivalent of time. In our experiments, $\beta$ is set to balance the area cost and the timing penalty. The simulated annealing process gradually minimizes the cost.
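The cost described in this subsection is a one-line computation. The Python sketch below uses hypothetical names (`fbp_cost`, `beta`) and made-up numbers purely to show how only the nets that miss their requirements contribute to the penalty.

```python
def fbp_cost(area, delays, requirements, beta):
    """Area plus beta times the total timing violation over all nets."""
    penalty = sum(max(delays[net] - requirements[net], 0.0) for net in delays)
    return area + beta * penalty


# Net n1 misses its requirement by 1 unit; net n2 has slack and adds nothing.
cost = fbp_cost(100.0, {"n1": 5.0, "n2": 3.0}, {"n1": 4.0, "n2": 4.0}, beta=10.0)
```

Slack on one net does not offset the violation of another, because each term is clipped at zero before summation.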

C. Annealing Schedule

The annealing schedule controls the acceptance rate of uphill moves, i.e., moves to neighboring solutions with higher costs. The initial temperature is set as $T_0 = -\Delta_{avg} / \ln P$, where $\Delta_{avg}$ is the average cost change of a random sequence of moves, and $P$ is the initial probability of accepting uphill moves. In the beginning, the temperature is high; hence, $P$ is initially set very close to 1. After each iteration, the temperature is reduced by a constant factor. The annealing process ends when the temperature cools down below a preset threshold.
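The schedule can be sketched as three small helpers. In the Python example below, the function names, the cooling factor, and the stopping threshold are assumptions made for illustration; the acceptance rule is the standard Metropolis criterion, which is consistent with choosing the initial temperature from a target acceptance probability.

```python
import math
import random


def initial_temperature(avg_uphill, p_accept):
    """Temperature at which an average uphill move is accepted with p_accept."""
    return -avg_uphill / math.log(p_accept)


def accept(delta, temperature, rng=random.random):
    """Always accept downhill moves; accept uphill ones with exp(-delta/T)."""
    return delta <= 0 or rng() < math.exp(-delta / temperature)


def cooling(t0, factor, t_min):
    """Yield temperatures t0, t0*factor, ... until they drop below t_min."""
    t = t0
    while t >= t_min:
        yield t
        t *= factor


t0 = initial_temperature(10.0, 0.99)
```

A probability close to 1 yields a high initial temperature, so almost every move is accepted at the start and the search becomes greedy only as the temperature falls.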

D. Overall Algorithm

The simulated annealing process begins from a random feasible floorplan. Buffer blocks are accordingly planned as described in Section IV. FBP then perturbs the floorplan using the aforementioned four operations. After each move, buffer blocks are planned according to the new floorplan. The process terminates when the solution is frozen, the temperature is too low, or the runtime is too long.

Fig. 7 summarizes the FBP algorithm. In line 1, the initial floorplan is extracted from the benchmark circuits. In lines 5–31, FBP perturbs the floorplan from one to another until any of the conditions given in line 31 is satisfied.

VI. EXPERIMENTAL RESULTS

We implemented the FBP algorithm in the C language on a 166-MHz Sun UltraSPARC I workstation. The parameters used in the experiments are based on 0.18-μm technology (see Table I). Note that this set of parameters was also used in [5].

The statistics of the benchmarks are outlined in Table II. It should be noted that, as presented earlier, our approach can handle multiterminal nets directly. For a comparative study, however, we used the two-terminal nets obtained in [5] by splitting multiterminal nets; the timing requirements were also generated by [5] within $1.05$–$1.20\,D_{opt}$. The experiments of [11] are based on different parameters and delay bounds (randomly generated within the same interval $1.05$–$1.20\,D_{opt}$), so we list the results of the RBP algorithm in [11] only for the reader’s reference. The experimental results are summarized in Table III. The second column shows the number of nets meeting timing requirements (# nets meet) and the total number of nets in a circuit (Tot. # nets). The third column gives the percentage of nets meeting the timing constraints. Column 4 lists the number of buffers inserted (# buffers). Column 5 gives the percentage of extra area over the given floorplan for buffer insertion. We compared our results with BBP [5]. In [5], BBP plans buffer blocks during postfloorplanning for two-terminal nets in a given slicing floorplan. (Note that FBP can handle multiterminal nets and general floorplans.) For a fair comparison, FBP adopts buffer-block planning for two-terminal nets. In addition, FBP converts the given slicing floorplan into the corresponding sequence-pair representation before processing. Runtime comparisons are not shown in this table because FBP not only planned buffer blocks but also refined floorplans. Further, BBP and FBP ran on different machines. For these benchmarks, the running times of FBP ranged from 1 min for the smallest circuit apte to about 35 min for the largest circuit playout. The results show that our method of integrating buffer-block planning into floorplanning can significantly improve the interconnect delay and reduce the number of buffers needed. FBP achieves an average success rate of 86.1% of nets meeting timing constraints, inserts only 272 buffers on average, and consumes an average extra area of only 0.28% over the given floorplan, compared with an average success rate of 62.6%, 1123 buffers, and an extra area of 1.05% obtained by BBP.

Fig. 7. Simulated annealing for simultaneous FBP (the FBP algorithm).

TABLE II

STATISTICS OF BENCHMARKS

TABLE III

RESULTS OF BBP, FBP, AND RBP. THE EXPERIMENTS OF RBP ARE BASED ON DIFFERENT PARAMETERS AND DELAY BOUNDS (RANDOMLY GENERATED WITHIN THE SAME INTERVAL $1.05$–$1.20\,D_{opt}$), SO WE LISTED THE RESULTS OF THE RBP ALGORITHM FOR THE READER’S REFERENCE

VII. CONCLUDING REMARKS

In this paper, we have addressed the issue of simultaneous FBP for interconnect optimization at the floorplanning stage. Experimental results have shown that our method can significantly improve the interconnect delay and reduce the number of buffers needed. For simultaneous FBP, besides interconnect delay, routing congestion and crosstalk could also be investigated in the future.


ACKNOWLEDGMENT

The authors would like to thank Prof. J. Cong, Dr. T. Kong, and Prof. D. Z. Pan for providing the benchmark circuits and their detailed explanations on the data. Thanks also go to the anonymous reviewers for their very constructive comments.

REFERENCES

[1] C. J. Alpert and A. Devgan, “Wire segmenting for improved buffer insertion,” in Proc. 34th Design Automation Conf., June 1997, pp. 588–593.

[2] C. J. Alpert, T. C. Hu, J. H. Huang, A. B. Kahng, and D. Karger, “Prim-Dijkstra tradeoffs for improved performance-driven routing tree design,” IEEE Trans. Computer-Aided Design, vol. 14, pp. 890–896, July 1995.

[3] C. J. Alpert, J. Hu, S. S. Sapatnekar, and P. G. Villarrubia, “A practical methodology for early buffer and wire resource allocation,” in Proc. 38th ACM/IEEE Design Automation Conf., June 2001, pp. 189–194.

[4] H.-M. Chen, H. Zhou, F. Y. Young, D. F. Wong, H. H. Yang, and N. Sherwani, “Integrated floorplanning and interconnect planning,” in Dig. Tech. Papers 1999 IEEE/ACM Int. Conf. Computer-Aided Design, Nov. 1999, pp. 354–357.

[5] J. Cong, T. Kong, and D. Z. Pan, “Buffer block planning for interconnect-driven floorplanning,” in Dig. Tech. Papers 1999 IEEE/ACM Int. Conf. Computer-Aided Design, Nov. 1999, http://cadlab.cs.ucla.edu/pan/publications/iccad99.ps, pp. 358–363, revised version.

[6] T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to Algorithms. Cambridge, MA: MIT Press, 1990.

[7] J.-M. Ho, G. Vijayan, and C. K. Wong, “A new approach to the rectilinear Steiner tree problem,” IEEE Trans. Computer-Aided Design, vol. 9, pp. 185–193, Feb. 1990.

[8] M. Lai and D. F. Wong, “Maze routing with buffer insertion and wire-sizing,” in Proc. 37th ACM/IEEE Design Automation Conf., June 2000, pp. 374–378.

[9] M. McInerney, K. Leeper, T. Hill, H. Chan, B. Basaran, and L. McQuiddy, “Methodology for repeater insertion management in the RTL, layout, floorplan and fullchip timing databases of the Itanium™ microprocessor,” in Proc. ACM Int. Symp. Phys. Design, Apr. 2000, pp. 99–104.

[10] H. Murata, K. Fujiyoshi, S. Nakatake, and Y. Kajitani, “Rectangle-packing-based module placement,” in Dig. Tech. Papers IEEE/ACM Int. Conf. Computer-Aided Design, Nov. 1995, pp. 472–479.

[11] P. Sarkar, V. Sundararaman, and C.-K. Koh, “Routability-driven repeater block planning for interconnect-centric floorplanning,” in Proc. ACM Int. Symp. Phys. Design, Apr. 2000, pp. 186–191.

[12] National Technology Roadmap for Semiconductors, 1997 ed: Semiconductor Industry Assoc.

[13] International Technology Roadmap for Semiconductors, 1999 ed: Semiconductor Industry Assoc.

[14] X. Tang and D. F. Wong, “Planning buffer locations by network flows,” in Proc. ACM Int. Symp. Phys. Design, Apr. 2000, pp. 180–185.

[15] W. L. Winston, Operations Research: Applications and Algorithms, 3rd ed. Toronto, Canada: Thomson, 1994.

[16] D. F. Wong and C. L. Liu, “A new algorithm for floorplan design,” in Proc. 23rd ACM/IEEE Design Automation Conf., 1986, pp. 101–107.

[17] F. Y. Young, C. C. N. Chu, W. S. Luk, and Y. C. Wong, “Floorplan area minimization using Lagrangian relaxation,” in Proc. ACM Int. Symp. Phys. Design, Apr. 2000, pp. 174–179.

[18] H. Zhou, D. F. Wong, I.-M. Liu, and A. Aziz, “Simultaneous routing and buffer insertion with restrictions on buffer locations,” in Proc. 36th ACM/IEEE Design Automation Conf., June 1999, pp. 96–99.

Iris Hui-Ru Jiang received the B.S. and Ph.D. degrees in electronics engineering from National Chiao Tung University, Hsinchu, Taiwan, in 1995 and 2002, respectively.

She is currently with VIA Technologies, Inc., Taipei, Taiwan. Her research interests focus on interconnect optimization in deep submicron technology. Dr. Jiang is a Member of the ACM and ACM/SIGDA.

Yao-Wen Chang (S’94–M’96) received the B.S. degree from National Taiwan University, Taipei, in 1988 and the M.S. and the Ph.D. degrees from the University of Texas, Austin, in 1993 and 1996, respectively, all in computer science.

Currently, he is an Associate Professor in the Department of Electrical Engineering and the Graduate Institute of Electronics Engineering, National Taiwan University. He was with the VLSI Design Group, IBM T. J. Watson Research Center, Yorktown Heights, NY, in the summer of 1994. From 1996 to 2001, he was on the faculty of the Department of Computer and Information Science, National Chiao Tung University, Hsinchu, Taiwan. His research interests lie in physical design automation, architectures, and systems for VLSI and combinatorial optimization.

Dr. Chang is a Member of IEEE Circuits and Systems Society, ACM, and ACM/SIGDA. He serves on the technical program committees of several international conferences on VLSI design automation, including ASP–DAC, ICCAD, ICCD, and APCCAS. He received a Best Paper Award at the 1995 IEEE International Conference on Computer Design (ICCD’95) for his work on FPGA routing, a reviewers’ Best Paper nomination at the 2000 ACM/IEEE Design Automation Conference (DAC’2000) for his work on the B*-tree floorplan representation, and a Best Paper nomination at the 2002 IEEE/ACM International Conference on Computer-Aided Design (ICCAD’2002) for his work on multilevel routing. He received an inaugural all-university Excellent Teaching Award from the Department of Computer and Information Science, National Chiao Tung University (ranked first in the Department) in 2000.

Jing-Yang Jou (S’82–M’83–SM’02) received the B.S. degree in electrical engineering from National Taiwan University, Taipei, and the M.S. and Ph.D. degrees in computer science from the University of Illinois, Urbana-Champaign, in 1979, 1983, and 1985, respectively.

He is currently a professor of the Department of Electronics Engineering at National Chiao Tung University, Hsinchu, Taiwan. He was previously with GTE Laboratories and Bell Laboratories. He has published more than 100 journal and conference papers. His research interests include behavioral and logic synthesis, VLSI designs and CAD for low power, design verification, and hardware/software codesign.

He is a Member of Tau Beta Pi. He served as the technical program chair of the Asia-Pacific Conference on Hardware Description Languages (APCHDL’97). He received the Distinguished Paper Award at the IEEE International Conference on Computer-Aided Design in 1990.

Kai-Yuan Chao received the B.S. degree in nuclear engineering from the National Tsing Hua University, Taipei, Taiwan, in 1986, the M.S. degree in medical engineering from the National Yang-Ming Medical College, Taipei, Taiwan, in 1988, and the M.S.E. and Ph.D. degrees in electrical and computer engineering from the University of Texas, Austin, in 1992 and 1995, respectively.

He presently manages floorplan and assembly design automation for major CPU development projects, including all Pentium 4 microprocessors, in the Desktop Platform Group, Intel Corporation, Hillsboro, OR, which he joined in 1995. He has published 19 technical papers and two book chapters in the research areas of VLSI/CAD, packaging, and radiology. His current research interests include architectural and design convergence, ECO methodology, and design collaboration.