MR: A new framework for multilevel full-chip routing


IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 23, NO. 5, MAY 2004 793

Short Papers


MR: A New Framework for Multilevel Full-Chip Routing

Yao-Wen Chang and Shih-Ping Lin

Abstract: In this paper, we propose a novel framework for multilevel full-chip routing considering both routability and performance, called MR. The two-stage multilevel framework consists of coarsening followed by uncoarsening. Unlike previous multilevel routing, MR integrates global routing, detailed routing, and resource estimation at each level of the framework, leading to more accurate routing resource estimation during coarsening and thus facilitating solution refinement during uncoarsening. Further, the exact routing information obtained at each level makes MR more flexible in dealing with various routing objectives (such as crosstalk, power, etc.). Experimental results show that MR obtains significantly better routing solutions than previous works. For example, for a set of 11 commonly used benchmark circuits, MR achieves 100% routing completion for all circuits, while the previous multilevel routing, the three-level routing, and the hierarchical routing can complete routing for only 2, 0, and 2 circuits, respectively. In particular, the number of routing layers used by MR is even smaller. We have also performed experiments on timing-driven routing, and the results are also very promising.

Index Terms: Detailed routing, estimation, global routing, layout, physical design, routing, timing optimization.

I. INTRODUCTION

Research in very large scale integrated (VLSI) routing has received much attention in the literature. Routing is typically a very complex combinatorial problem. To make it manageable, the routing problem is usually solved using the two-stage approach of global routing followed by detailed routing. Global routing first partitions the routing area into tiles and decides tile-to-tile paths for all nets, while detailed routing assigns actual tracks and vias for nets. Many routing algorithms adopt a flat framework of finding paths for all nets. Those algorithms can be classified into sequential and concurrent approaches. Early sequential routing algorithms include maze-searching approaches [17], [24] and line-searching approaches [14], which route net-by-net. Most concurrent algorithms apply a network-flow or linear-assignment formulation [1], [23] to route a set of nets at one time.

The major problem of the flat frameworks lies in their scalability for handling larger designs. As technology advances, technology nodes are getting smaller and circuit sizes are getting larger. To cope with the increasing complexity, researchers proposed hierarchical approaches to handle the problem: Marek-Sadowska proposed a hierarchical global router based on linear assignment [22]; Heisterman and Lengauer presented a hierarchical integer linear programming approach for global routing [13]; Wang and Kuh proposed a hierarchical algorithm for timing-driven multilayer MCM/IC routing [25]; Chang et al. applied linear assignment to develop a hierarchical, concurrent global and detailed router for field programmable gate arrays (FPGAs) [3].

Manuscript received August 16, 2002; revised February 1, 2003. This work was supported in part by the National Science Council of Taiwan, R.O.C., under Grant NSC 91-2215-E-002-038. A preliminary version of this paper was presented at the 2002 IEEE/ACM International Conference on Computer-Aided Design, San Jose, CA, November 2002, where it was nominated for Best Paper. This paper was recommended by Associate Editor T. Yoshimura.

Y.-W. Chang is with the Department of Electrical Engineering and Graduate Institute of Electronics Engineering, National Taiwan University, Taipei 106, Taiwan, R.O.C. (e-mail: [email protected]).

S.-P. Lin is with the Department of Electronics Engineering, National Chiao Tung University, Hsinchu 300, Taiwan, R.O.C. (e-mail: is85060@cis.nctu.edu.tw).

Digital Object Identifier 10.1109/TCAD.2004.826547

The two-level, hierarchical routing framework, however, is still limited in handling the dramatically growing complexity of current and future IC designs, which may contain hundreds of millions of gates in a single chip. As pointed out in [5], for a 0.07-μm process technology, a 2.5 × 2.5 cm² chip may contain over 360 000 horizontal and vertical routing tracks. To handle such high design complexity, the two-level, hierarchical approach becomes insufficient. Therefore, it is desirable to employ more levels of routing for larger IC designs.

The multilevel framework has attracted much attention in the literature recently. It employs a two-stage technique: coarsening followed by uncoarsening. The coarsening stage iteratively groups a set of circuit components (e.g., circuit nodes, cells, modules, routing tiles, etc.) based on a predefined cost metric until the number of components being considered is smaller than a threshold. Then, the uncoarsening stage iteratively ungroups a set of previously clustered circuit components and refines the solution by using a combinatorial optimization technique (e.g., simulated annealing, local refinement, etc.). The multilevel framework has been successfully applied to VLSI physical design. For example, the well-known multilevel partitioners ML [2], hMETIS [15], and HPM [8], the multilevel placer mPL [4], and the multilevel floorplanner/placer MB*-tree [20] all show the promise of the multilevel framework for large-scale circuit partitioning, placement, and floorplanning.

A framework similar to multilevel routing was presented in [12] and [18]. Lin et al. in [18] and Hayashi and Tsukiyama in [12] presented hybrid hierarchical global routers for multilayer VLSIs, in which both bottom-up (coarsening) and top-down (uncoarsening) techniques were used in global routing. Recently, Cong et al. proposed a pioneering multilevel approach for large-scale, full-chip, routability-driven (global) routing [5]. The framework starts by recursively coarsening routing tiles, and an estimation of routing resources is computed at each level. When the number of tiles is below a threshold, a multicommodity flow algorithm is used to obtain an initial routing solution. Then, the uncoarsening stage uses a modified maze-searching algorithm to further improve the routing solution, level by level. The final results of their multilevel algorithm are tile-to-tile paths for all the nets. The results are then fed into a detailed router to find the exact connection for each net. Their experimental results show better routing quality or running times than the traditional two-stage flat approach of global routing followed by detailed routing and the hierarchical approaches.

Inspired by the multilevel router presented in [5], we propose, in this paper, a novel framework for multilevel global and detailed routing considering both routability and performance, called MR. Unlike the work presented in [5], MR has the following distinguishing features.

• The previous works [5], [12], [18] are mainly for global routing, while our MR integrates global and detailed routing.

• MR integrates global routing, detailed routing, and resource estimation together at each level of the framework, leading to more accurate routing resource estimation during coarsening and thus facilitating the solution refinement during uncoarsening. Specifically, at each level of the coarsening stage, MR performs global routing to obtain a good initial solution for all nets inside the tiles being considered and then detailed routing to obtain the exact routing patterns for these nets. Since the exact routing patterns are known, resource estimation is more accurate. With these good properties, the refinement conducted at the uncoarsening stage becomes much easier. In contrast, the work [5] performs only resource estimation during the coarsening stage, and only global routing during the uncoarsening stage. After multilevel processing is finished, the final global routing result is then fed into a detailed router to obtain the final routing solution. It is obvious that MR can have better interaction among global routing, detailed routing, and resource estimation, since they are considered simultaneously. For example, global and detailed routers usually use rip-up and reroute to refine a routing solution based on the results of resource estimation. If the three tasks are performed separately, the rerouting process conducted at the global routing stage may be in vain since it does not know whether the rerouting is useful for the detailed router. Also, the detailed router may fail to find a path because of the low flexibility induced by the separated global routing. Therefore, making the three tasks interact with each other can significantly improve routing quality.

• A two-stage refinement method of Z-pattern routing, followed by maze routing, is used in our multilevel framework, which makes rerouting much more effective.

• Unlike the previous works [5], [12], [18] that consider routability alone, MR also applies a recalling modification method to perform timing-driven routing.

• MR is more flexible and, thus, different routing objectives (such as crosstalk, power, etc.) can be incorporated into our framework since exact track and wiring information at each level after detailed routing is known.

0278-0070/04$20.00 © 2004 IEEE

Fig. 1. Multilevel framework flow.

TABLE I

FRAMEWORK COMPARISON BETWEEN OURS AND CONG et al. [5]

Fig. 1 shows our multilevel framework, and Table I summarizes the differences between MR and that presented in [5].

Experimental results show that MR obtains significantly better routing solutions than the multilevel routing [5], the three-level routing [6], and the hierarchical approach [5]. For the 11 benchmark circuits provided by the authors of [5], MR obtains 100% routing completion for all circuits, while the multilevel routing, the three-level routing, and the hierarchical routing can complete routing for only 2, 0, and 2 circuits, respectively. In particular, the number of routing layers required for routing completion for MR is even smaller. We have also performed experiments on timing-driven routing, and the results are also very promising.

Fig. 2. Routing graph: (a) partitioned layout and (b) routing graph.

The rest of this paper is organized as follows. Section II presents the routing model and the multilevel routing framework. Section III presents our framework for routability and performance optimization. Experimental results are shown in Section IV. Finally, we give concluding remarks in Section V.

II. PRELIMINARIES

A. Routing Model

Routing in modern ICs is a very complex process and, thus, we can hardly obtain solutions directly. Our routing algorithm is based on a graph-search technique guided by the congestion and timing information associated with routing regions and topologies. The router assigns higher costs to routing nets through congested areas to balance the net distribution among routing regions. For performance-driven routing, additional costs are added to the routing topologies with longer critical path delays.

Before we can apply the graph-search technique to multilevel routing, we first need to model the routing resource as a graph, such that the graph topology can represent the chip structure. Fig. 2 illustrates the graph modeling. For the modeling, we first partition a chip into tiles. A node in the graph represents a tile in the chip, and an edge denotes the boundary between two adjacent tiles. Each edge is assigned a capacity according to the physical area or the number of tracks of a tile. The graph is used to represent the routing area and is called the multilevel routing graph $G_0$. A global router finds tile-to-tile paths for all nets on $G_0$ to guide the detailed router. The goal of global routing is to route as many nets as possible while meeting the capacity constraint of each edge and any other constraint, if specified. As the process technology advances, multiple routing layers are possible. The number of layers in a modern chip can be more than six [11]. Wires in each layer run either horizontally or vertically. We refer to the layer as a horizontal (H) or a vertical (V) routing layer.
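The graph model described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `build_routing_graph`, the uniform `tracks_per_boundary` capacity, and the dictionary representation are hypothetical choices, not taken from MR's implementation.

```python
def build_routing_graph(rows, cols, tracks_per_boundary):
    """Return {edge: capacity} for a rows x cols grid of tiles.

    A node is a tile (row, col); an edge joins two adjacent tiles and
    carries a capacity, assumed here to be a uniform track count.
    """
    capacity = {}
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # boundary shared with the tile to the right
                capacity[((r, c), (r, c + 1))] = tracks_per_boundary
            if r + 1 < rows:  # boundary shared with the tile below
                capacity[((r, c), (r + 1, c))] = tracks_per_boundary
    return capacity


g0 = build_routing_graph(4, 4, tracks_per_boundary=8)
print(len(g0), "edges in the level-0 routing graph")
```

In a real design the capacity would vary per boundary, depending on blockages and the number of available tracks.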

B. Multilevel Routing Model

As illustrated in Fig. 1, $G_0$ corresponds to the routing graph of level 0 of the multilevel coarsening stage. At each level, our global router first finds routing paths for the local nets (or local two-pin connections), i.e., those nets (connections) that sit entirely inside a tile, and then the detailed router is used to determine the exact wiring. After the global and detailed routing are performed, we merge four adjacent tiles of $G_0$ into a larger tile and at the same time perform resource estimation for use at the next level (i.e., level 1 here). Coarsening continues until the number of tiles at a level, say the $k$th level, is below a threshold. After coarsening finishes, the uncoarsening stage tries to refine the routing solution starting from the last level $k$, where coarsening stops. During uncoarsening, the nets that were unroutable during coarsening are considered, and maze routing and rip-up and reroute are performed to refine the routing solution. Then, we proceed to the next level (level $k-1$) of uncoarsening by expanding each tile into four finer tiles. The process continues down to level 0, when the final routing solution is obtained.
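To make the notion of a local net concrete, the sketch below computes the coarsening level at which a two-pin connection first falls entirely inside one tile. It assumes, hypothetically, that pins are given as level-0 tile coordinates and that each coarsening step merges aligned 2 × 2 blocks of tiles; the function names are illustrative only.

```python
def tile_at_level(tile, level):
    """Level-`level` tile that contains the given level-0 tile."""
    r, c = tile
    return (r >> level, c >> level)  # each level halves both coordinates


def first_local_level(pin_a, pin_b):
    """Lowest level at which both pins lie inside the same tile."""
    level = 0
    while tile_at_level(pin_a, level) != tile_at_level(pin_b, level):
        level += 1
    return level


for a, b in [((0, 0), (0, 0)), ((0, 0), (1, 1)), ((0, 3), (0, 4))]:
    print(a, b, "-> local at level", first_local_level(a, b))
```

Note that two pins in neighboring level-0 tiles, such as (0, 3) and (0, 4), can remain non-local for several levels when they straddle a coarse tile boundary. Connections that are still non-local when coarsening stops at level $k$ would be handled during uncoarsening.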

III. MULTILEVEL ROUTING FRAMEWORK

Our multilevel routing algorithm, MR, is inspired by the work in [5]. Nevertheless, MR is significantly different from [5]. During the coarsening stage in [5], instead of routing or planning wires, the authors only estimate routing resources by using a line-sweep algorithm and then recursively coarsen to the last level $k$. Since their coarsening stage does not perform real routing, it is hard to retrieve the routing information at the higher level, which may make routing resource estimation inaccurate. At the last level $k$, they apply a multicommodity flow algorithm to obtain an initial routing and avoid the net ordering problem. However, a router may encounter higher congestion when uncoarsening expands local nets. A bad initial routing at the higher level needs more time to reroute at the lower level because local routing information is lacking. The hierarchical approach has the same problem.

MR tends to route shorter nets first, since we route local nets at each level of coarsening. It is obvious that the local nets at the lower level (say, level 0) are usually shorter than those at a higher level (say, level $k$). Naturally, a shorter net enjoys less freedom when searching for a path to route it. This fact holds even during rip-up and reroute. Thus, this observation implicitly suggests that a shorter net has a higher priority than a longer net as far as routability is concerned. Kastner et al. in [16] also suggest this conclusion. Though this net-ordering scheme may not be the optimal solution for some routing problems (for example, when timing is considered, routing the most critical net first often leads to better timing performance), it is still a reasonable alternative.

A. Multilevel Routing for Routability

Given a netlist, MR first runs the minimum spanning tree (MST) algorithm to construct the topology for each net, and then decomposes each net into two-pin connections, with each connection corresponding to an edge of the minimum spanning tree. MR starts from coarsening the finest tiles of level 0. At each level, tiles are processed one by one, and only local nets (connections) are routed. At each level, the two-stage routing approach of global routing followed by detailed routing is applied [see Fig. 3(a)–(c) for an illustration]. The global routing is based on the approach used in the pattern router [16] and first routes local nets (connections) on the tiles of level 0. Let the multilevel routing graph of level $i$ be $G_i = (V_i, E_i)$. Let $R_e = \{e \in E_i \mid e \text{ is the edge chosen for routing}\}$. We apply the cost function $\alpha : E_i \to \Re$ to guide the routing

$$\alpha(R_e) = \sum_{e \in R_e} c_e \tag{1}$$

where $c_e$ is the congestion of edge $e$ and is defined by

$$c_e = \frac{1}{2^{(p_e - d_e)}}$$

where $p_e$ and $d_e$ are the capacity and density associated with $e$, respectively.
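As a small worked illustration of cost (1), the sketch below evaluates the congestion of single edges and the summed cost of a candidate route. The function names and the dictionary-based edge bookkeeping are assumptions made for this example.

```python
def congestion(capacity, density):
    """c_e = 1 / 2^(p_e - d_e): doubles with every additional track used."""
    return 1.0 / 2 ** (capacity - density)


def route_cost(route_edges, capacity, density):
    """Cost (1): sum of the congestion over the edges chosen for a route."""
    return sum(congestion(capacity[e], density[e]) for e in route_edges)


capacity = {"e1": 8, "e2": 8, "e3": 8}
density = {"e1": 2, "e2": 7, "e3": 8}  # tracks already used on each edge
for e in capacity:
    print(e, congestion(capacity[e], density[e]))
print("route e1+e2:", route_cost(["e1", "e2"], capacity, density))
```

The exponential form keeps the cost near zero on lightly used edges and makes it grow quickly as an edge approaches its capacity, which steers routes away from congested boundaries.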

After the global routing is completed, MR performs detailed routing with the guidance of the global-routing results and finds a real path in the chip. Our detailed router is based on the maze-searching algorithm and supports the local refinement illustrated in Fig. 3(d)–(f). Pattern routing uses an L- or Z-shaped route to make the connection, which gives the shortest path length between two points. Therefore, the wire length is minimum and, thus, we do not include wire length in the cost function at this stage. We measure the routing congestion based on the commonly used channel density. After the detailed router finishes routing a net, the channel density associated with an edge of a multilevel graph is updated accordingly. This is called resource estimation.

Fig. 3. Global routing, detailed routing, and local refinement. (a) Route the local connection $n$ in a tile of $G_0$. (b) Global route of $n$. (c) Detailed route of $n$ on the chip. (d) Route another local connection $m$ that belongs to the same net as $n$. (e) Detailed route of connection $m$. (f) Local refinement of the net.

Our global router first tries L-shaped pattern routing. If the routing fails, we try Z-shaped pattern routing. This can be considered a simple version of rip-up and reroute. If both pattern routes fail, we give up routing the connection, and an overflow occurs. We refer to a net (connection) that causes an overflow as a failed net (failed connection). The failed nets (connections) will be reconsidered (refined) at the uncoarsening stage. There are at least two advantages to this approach. First, routing resource estimation is more accurate than when performing global routing alone, since we can precisely evaluate the routing region. Second, we can obtain a good initial solution for the following refinement very efficiently, since pattern routing enjoys very low time complexity and uses fewer routing resources due to its simple L- and Z-shaped routing patterns. Fig. 3 shows an example of routing a local net in a tile.
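The L-then-Z strategy above can be sketched as follows. This is a minimal illustration, not MR's router: tiles are assumed to be (row, column) pairs, and the caller-supplied predicate `fits` (hypothetical) stands in for the capacity check along a candidate route.

```python
def straight(a, b):
    """Tiles on the axis-aligned segment from a to b, inclusive."""
    (r1, c1), (r2, c2) = a, b
    if r1 == r2:
        step = 1 if c2 >= c1 else -1
        return [(r1, c) for c in range(c1, c2 + step, step)]
    step = 1 if r2 >= r1 else -1
    return [(r, c1) for r in range(r1, r2 + step, step)]


def join(*corners):
    """Concatenate straight segments through the given corner tiles."""
    path = [corners[0]]
    for a, b in zip(corners, corners[1:]):
        path += straight(a, b)[1:]
    return path


def l_routes(s, t):
    """The one straight or two L-shaped routes between s and t."""
    (r1, c1), (r2, c2) = s, t
    if r1 == r2 or c1 == c2:
        return [join(s, t)]
    return [join(s, (r1, c2), t), join(s, (r2, c1), t)]


def z_routes(s, t):
    """All Z-shaped routes (two bends) inside the bounding box of s and t."""
    (r1, c1), (r2, c2) = s, t
    if r1 == r2 or c1 == c2:
        return []
    routes = []
    for c in range(min(c1, c2) + 1, max(c1, c2)):  # horizontal-vertical-horizontal
        routes.append(join(s, (r1, c), (r2, c), t))
    for r in range(min(r1, r2) + 1, max(r1, r2)):  # vertical-horizontal-vertical
        routes.append(join(s, (r, c1), (r, c2), t))
    return routes


def pattern_route(s, t, fits):
    """Try L-shaped routes first, then Z-shaped; None signals an overflow."""
    for route in l_routes(s, t) + z_routes(s, t):
        if fits(route):
            return route
    return None


blocked = {(0, 3), (2, 0)}  # tiles assumed to have no free capacity
print(pattern_route((0, 0), (2, 3), lambda rt: not blocked & set(rt)))
```

Every candidate has the same minimum length, which is consistent with leaving wire length out of the cost function at this stage.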

The uncoarsening stage starts to refine each local failed net (connection) left from the coarsening stage. The global router is now changed to the maze router with the following cost function $\beta : E_i \to \Re$:

$$\beta(R_e) = \sum_{e \in R_e} (a\,l_e + b\,c_e + c\,o_e) \tag{2}$$

where $a$, $b$, and $c$ are user-defined parameters, $l_e$ is the length of the net (connection), and $o_e \in \{0, 1\}$. If an overflow happens, $o_e$ is set to 1; otherwise, it is set to 0.
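A global maze router driven by cost (2) can be sketched with a standard Dijkstra search over the tile graph. Several details below are assumptions made for illustration rather than taken from MR: each tile-to-tile step contributes a length of 1, an edge counts as overflowed when its density has already reached its capacity, and the default weights are arbitrary.

```python
import heapq


def edge_cost(capacity, density, a=1.0, b=1.0, c=100.0, length=1.0):
    """Per-edge term of cost (2): a*l_e + b*c_e + c*o_e."""
    congestion = 1.0 / 2 ** (capacity - density)
    overflow = 1 if density >= capacity else 0  # assumed overflow test
    return a * length + b * congestion + c * overflow


def maze_route(rows, cols, capacity, density, s, t, **weights):
    """Cheapest tile-to-tile path from s to t under cost (2), or None."""
    def key(u, v):
        return (u, v) if u < v else (v, u)

    dist, prev, heap = {s: 0.0}, {}, [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        if u == t:
            break
        r, col = u
        for v in ((r + 1, col), (r - 1, col), (r, col + 1), (r, col - 1)):
            if not (0 <= v[0] < rows and 0 <= v[1] < cols):
                continue
            e = key(u, v)
            nd = d + edge_cost(capacity[e], density.get(e, 0), **weights)
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if t not in dist:
        return None
    path = [t]
    while path[-1] != s:
        path.append(prev[path[-1]])
    return path[::-1]


# 2 x 3 tile grid, two tracks per boundary; the edge (0,0)-(0,1) is full.
cap = {((r, c), (r, c + 1)): 2 for r in range(2) for c in range(2)}
cap.update({((0, c), (1, c)): 2 for c in range(3)})
den = {((0, 0), (0, 1)): 2}
print(maze_route(2, 3, cap, den, (0, 0), (0, 2), c=100.0))
```

With a large overflow weight the search detours around the full edge even though the direct route is shorter, which matches the stated priority of resolving overflow during uncoarsening.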

There is a tradeoff among minimizing wire length, congestion, and overflow. At the uncoarsening stage, we intend to resolve the overflow in a tile. Therefore, we let $c$ be much larger than $a$ or $b$. Also, a detailed maze routing is performed after the global maze routing. Iterative refinement of a failed net is stopped when a route is found or several tries (say, three) have been made. Uncoarsening continues until the first level $G_0$ is reached and the final solution is found. Note that the global maze routing here serves as an elaborate rip-up and reroute processor, in contrast to the simple L- and Z-shaped routing during coarsening. (By rip-up and reroute in MR, we mean the Z-shaped refinement at the coarsening stage or the maze routing at the uncoarsening stage. They are only applied to global routing for a better tradeoff between efficiency and quality.) This two-stage approach of global and local refinement of detailed routing gives our overall refinement scheme.

B. Multilevel Routing for Performance

1) Timing Optimization: In deep submicron IC designs, interconnection delay dominates the performance of a circuit. Therefore, improving the wire delay also improves the overall chip performance. The routing problem with timing constraints is much more complex, as not only must congestion be controlled but timing constraints must also be satisfied. Many techniques have been developed to facilitate high-performance IC designs. For example, the algorithms for performance-driven routing-tree topology construction have received much attention [7], [10], [19]. However, most existing works focus only on constructing a single routing tree. To employ the existing methods of tree construction, the congestion problem must be addressed. The MST topology leads to the minimum total wirelength and, thus, congestion is easier to control than with other topologies. However, its topology may result in longer critical paths and, thus, degrade circuit performance. Though a shortest path tree (SPT) may result in the best performance, its total wirelength (and congestion) may be significantly larger than that constructed by the MST algorithm [10]. In [10], researchers used the idea of incrementally modifying an MST to construct a routing tree for a better tradeoff between timing (SPT) and wirelength (MST). Our construction of a timing-driven routing tree is based on an idea similar to that used in [10]. We first construct an MST (for smaller wirelength and, thus, better routability) and then fix the timing violation, if any, by resorting to the SPT topology of the net. Performance optimization usually targets the minimization of the critical path delay (see Fig. 4), but determining a critical path in a circuit is an NP-hard problem due to the false path problem [9]. Therefore, for simplicity, we minimize the delay of the critical sink of a net. In the following, we present our framework for timing-driven multilevel routing, which is summarized in Fig. 5.
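The MST/SPT tradeoff described above can be seen on a small example. The sketch below builds an MST with Prim's algorithm under Manhattan distance and compares it with a star-shaped SPT that connects every sink directly to the source. The pin coordinates and function names are made up for illustration.

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def prim_mst(pins):
    """Parent index of each pin in an MST rooted at pins[0] (the source)."""
    n = len(pins)
    parent, best, in_tree = [None] * n, [float("inf")] * n, [False] * n
    best[0] = 0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        for v in range(n):
            d = manhattan(pins[u], pins[v])
            if not in_tree[v] and d < best[v]:
                best[v], parent[v] = d, u
    return parent


def star_spt(pins):
    """Shortest path tree that wires every sink straight to the source."""
    return [None] + [0] * (len(pins) - 1)


def wirelength(pins, parent):
    return sum(manhattan(pins[v], pins[p]) for v, p in enumerate(parent) if p is not None)


def radius(pins, parent):
    """Longest source-to-sink path length in the tree."""
    def depth(v):
        total = 0
        while parent[v] is not None:
            total += manhattan(pins[v], pins[parent[v]])
            v = parent[v]
        return total
    return max(depth(v) for v in range(len(pins)))


pins = [(0, 0), (0, 4), (3, 4), (3, 1)]  # pins[0] is the net source
for name, tree in (("MST", prim_mst(pins)), ("SPT", star_spt(pins))):
    print(name, "wirelength", wirelength(pins, tree), "radius", radius(pins, tree))
```

On this net the MST uses less wire but has the longer source-to-sink path, while the star SPT does the opposite, which is the tension the incremental modification is meant to balance.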


Fig. 4. Example of recalling modification. (a) Node I on the thick path violates the timing constraint. (b) Connect Node I to a new parent to satisfy the timing constraint and delete the corresponding edge. (c) Continue the modification until it meets the timing constraint.

Fig. 5. Algorithm for performance-driven multilevel routing.

Fig. 6. Algorithm to compute $b(v)$.

Just as in the framework for multilevel routing for routability, we first build an MST for each net. However, the MST here is directed, since timing analysis is conducted from the tree source to all sinks, in contrast to the multilevel routing for routability, which uses undirected trees. After the topologies of all nets are obtained, our multilevel framework starts from coarsening the finest tiles at level 0 and processes tiles one by one. Before we route a local net (connection), timing analysis, based on the Elmore delay model, is performed from the tree source to all sinks. If a target node violates the timing constraint, we modify the tree topology by recalling modification. That is, if a target node violates the timing constraint, we delete this local connection and then trace back from the target node to the tree source to find a new parent for the connection that can meet the timing constraint. (Although this process might increase the total wirelength and thus the total wire capacitance, the decrease of the path delay due to lower source-to-sink loading capacitance is even more significant.) Fig. 4 shows how to trace back the tree from the target node to the source to find a new node to satisfy the timing constraint. After a new path that meets the timing constraint is found, we start to route the net if it is a local net belonging to the current level. The routing process is the same as that for multilevel routing for routability. After detailed routing is done, the target node may again violate the timing constraint because the detailed route may run through a longer path or incur a larger load from other tree branches. We will fix the timing violation at the later uncoarsening stage. To alleviate this problem, we may keep a small timing slack when we estimate the path delay.
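The timing check and the recalling modification can be sketched as below. This is one possible reading of the description, not the authors' code: wire lengths are Manhattan distances between pins, and the unit resistance and capacitance, sink load, and driver resistance (`r_unit`, `c_unit`, `sink_cap`, `driver_res`) are hypothetical parameters with arbitrary defaults.

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def elmore_delays(pins, parent, r_unit=1.0, c_unit=1.0, sink_cap=0.0, driver_res=0.0):
    """Elmore delay from the source (index 0) to every node of a routing tree."""
    n = len(pins)
    children, length = [[] for _ in range(n)], [0.0] * n
    for v, p in enumerate(parent):
        if p is not None:
            children[p].append(v)
            length[v] = manhattan(pins[v], pins[p])
    order, stack = [], [0]  # top-down order: parents before children
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(children[u])
    cap = [0.0] * n  # capacitance hanging below each node
    for u in reversed(order):
        own = 0.0 if children[u] else sink_cap
        cap[u] = own + sum(cap[v] + c_unit * length[v] for v in children[u])
    delay = [0.0] * n
    delay[0] = driver_res * cap[0]
    for u in order:
        for v in children[u]:
            wire_r = r_unit * length[v]
            delay[v] = delay[u] + wire_r * (c_unit * length[v] / 2 + cap[v])
    return delay


def recalling_modification(pins, parent, target, bound, **rc):
    """Trace back from target toward the source; reconnect target to the first
    ancestor that meets the delay bound. Returns the new tree, or None."""
    candidate = parent[target]
    while candidate is not None:
        trial = list(parent)
        trial[target] = candidate
        if elmore_delays(pins, trial, **rc)[target] <= bound:
            return trial
        candidate = parent[candidate]
    return None


pins = [(0, 0), (0, 4), (3, 4), (3, 1)]
chain = [None, 0, 1, 2]  # source -> 1 -> 2 -> 3
print(elmore_delays(pins, chain)[3])
print(recalling_modification(pins, chain, target=3, bound=20.0))
```

In this example, reconnecting node 3 directly to the source raises the total wirelength from 10 to 11 but cuts its delay from 50 to 8 in these arbitrary units, mirroring the parenthetical remark about wirelength versus loading capacitance.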

After coarsening is done, MR performs timing analysis on all nets again to identify those nets that violate the timing constraints. Uncoarsening continues to refine those failed nets, if any, by maze routing. Also, the failed nets from the coarsening stage are refined. Since we iteratively fine-tune every local net, a topology of the net meeting the timing constraint and possessing good routability is gradually formed. As in [5], the iterative refinement provides a framework for seamless integration of different algorithms at different levels.

2) Via Minimization: Vias typically have significantly larger RC delay than metal wires and, thus, it is desirable to minimize the number of vias used in a routing path to optimize circuit performance. We apply the following algorithm, called Simultaneous Pathlength and Via Minimization (SPVM), to perform maze routing to find a shortest path with the minimum number of bends/vias (see Fig. 6). It associates each basic detailed routing region $u$ (which could be a grid cell in grid-based routing or a basic routing region defined by the wire pitch in gridless routing) with two labels, $d(u)$ and $b(u)$, where $d(u)$ is the distance of the shortest path from the source $s$ to $u$, and $b(u)$ is the minimum number of bends/vias along the shortest path from $s$ to $u$. Initialize $d(u) = \infty$ and $b(u) = \infty$ for all $u \neq s$, $d(s) = 0$, and $b(s) = 0$. Maze routing is a two-stage approach of wave propagation followed by backtracking [17]. In the wave-propagation stage of maze routing, the computation of the labels $d$ is the same as in the original maze-routing algorithm. Let $u$ be a basic routing region on the wavefront and $v$ a neighboring basic routing region of $u$. The predecessor routing region of $u$ is the region from which the wavefront was propagated for obtaining the minimum $b(u)$. The propagation direction of $u$ is the direction from the predecessor routing region of $u$ to $u$. The computation of $b(v)$ is as follows.


TABLE II

BENCHMARK CIRCUITS

TABLE III

COMPARISON AMONG (A) THE THREE-LEVEL ROUTING [6], (B) THE HIERARCHICAL ROUTING [5], (C) THE MULTILEVEL ROUTING [5], AND (D) OUR MULTILEVEL ROUTING MR. NOTE: (A)–(C) WERE RUN ON A 440-MHz SUN ULTRA-5 WITH 384 MB OF MEMORY; (D) WAS RUN ON A 450-MHz SUN SPARC ULTRA-60 WITH 2 GB OF MEMORY

The basic idea is to compare the distance labels $d$ first and then compare the bend/via number labels $b$. The value $b(v)$ of a neighboring routing region $v$ with $d(v) < d(u)$ remains unchanged because the path from $s$ through $u$ to $v$ is not the shortest path between $s$ and $v$. The backtracking stage is the same as that of the original maze-routing algorithm. Note that there may exist several shortest paths with different numbers of bends/vias. The wave-propagation stage always keeps track of the shortest path with the minimum bend/via number to allow the backtracking stage to find such a path. It is clear that the SPVM algorithm guarantees finding a shortest path with the minimum number of bends/vias, if such a path exists.
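The objective of SPVM, a shortest path that has the fewest bends among all shortest paths, can be illustrated with the sketch below. It does not reproduce the labeling procedure of Fig. 6; instead it swaps in a lexicographic Dijkstra search over (cell, incoming direction) states, which compares path length first and bend count second. The grid encoding (0 = free, 1 = blocked) and the function name are assumptions, and each bend is taken to stand for one via.

```python
from heapq import heappush, heappop

DIRS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up


def shortest_path_min_bends(grid, s, t):
    """Return (length, bends) of a shortest s-t path with the fewest bends,
    or None if t is unreachable. Labels are compared as (length, bends)."""
    rows, cols = len(grid), len(grid[0])
    best = {(s, -1): (0, 0)}  # state = (cell, incoming direction); -1 at source
    heap = [(0, 0, s, -1)]
    while heap:
        d, b, u, du = heappop(heap)
        if best.get((u, du)) != (d, b):
            continue  # stale heap entry
        if u == t:
            return d, b
        for k, (dr, dc) in enumerate(DIRS):
            r, c = u[0] + dr, u[1] + dc
            if not (0 <= r < rows and 0 <= c < cols) or grid[r][c]:
                continue
            # a bend occurs when the propagation direction changes
            nb = b + (1 if du not in (-1, k) else 0)
            state, cand = ((r, c), k), (d + 1, nb)
            if state not in best or cand < best[state]:
                best[state] = cand
                heappush(heap, (d + 1, nb, (r, c), k))
    return None


open_grid = [[0] * 3 for _ in range(3)]
print(shortest_path_min_bends(open_grid, (0, 0), (2, 2)))
walled = [[0, 1, 0],
          [0, 0, 0]]
print(shortest_path_min_bends(walled, (0, 0), (0, 2)))
```

Keeping one label per incoming direction costs a constant factor more memory than a single pair of labels per cell, but it makes the tie-breaking between equal-length paths straightforward to reason about.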

IV. EXPERIMENTAL RESULTS

We have implemented our multilevel routing system MR in the C++ language on a 450-MHz SUN Sparc Ultra-60 workstation with 2 GB of memory. (MR is available at the web site http://cc.ee.ntu.edu.tw/~ywchang/research.html.) We compared our results with [5] and [6] based on the 11 benchmark circuits provided by the authors. The design rules for wire/via widths and wire/via separation for detailed routing are the same as those used in [5] and [6]. The parameters $a$ and $b$ in the cost function were both set to one, while $c$ was initially set to one and was gradually increased when the router failed to refine the target net until a termination bound was reached.

Table II lists the set of benchmark circuits. In the table, “Ex.” gives the names of the circuits, “Size” gives the layout dimensions, “# of Layers” denotes the number of routing layers used, “#Nets” gives the number of two-pin connections after net decomposition, and “#Horizontal/Vertical tracks” gives the number of horizontal/vertical routing tracks per layer. Table III gives the comparison of our multilevel routing MR for routability with the three-level routing [6], the hierarchical routing [5], and the multilevel routing [5]. The three-level routing (A) first uses a performance-driven global router, then a noise-constrained wire spacing and track assignment algorithm, and a detailed router [6]. The hierarchical routing with rip-up and replan (B) is developed in [5] for comparative study. Since the hierarchical approach adopts the top-down process to handle designs, it has a more global view of the problem. However, as mentioned earlier, a hierarchical flow lacks local routing information and needs to refine more local congestion than a multilevel approach does. The multilevel routing (C) gives the main results from [5]. In the table, “Time (s)” represents the running times in seconds, “#Rtd. Nets” denotes the number of routed nets, “Comp. Rates” gives the routing completion rates, and “avg.” (bottom row) denotes the average routing completion rates.

As shown in the table, MR obtains significantly better routing solutions than the multilevel routing [5], the three-level routing [6], and the hierarchical approach [5]. (Note that the previous work [5] also made comparisons with earlier works such as Wang and Kuh [25], which is a maze-based router. As reported in [5], the simple net-by-net maze-based router cannot scale well to handle the circuits used in the experiments.) For the 11 benchmark circuits provided by the authors of [5], MR obtains 100% routing completion for all circuits, while the multilevel routing, the three-level routing, and the hierarchical routing can complete routing for only 2, 0, and 2 circuits, respectively.


TABLE IV

RESULTS OF OUR MULTILEVEL ROUTING FOR ROUTABILITY BY USING TWO AND THREE LAYERS. ( : EXCLUDE THE RATE FOR Mcc2)

TABLE V

RESULTS OF OUR TIMING-DRIVEN MULTILEVEL ROUTING WITH DIFFERENT CONSTRAINT RATIOS $k$

Note: Time (s) includes constraint calculation and timing-driven multilevel routing.

Since all examples are 100% routed by our system using the numbers of layers given in the test data, we show our superior performance by further reducing the numbers of available routing layers in the examples. Table IV shows that MR still obtains better routing completion rates even when using fewer layers. From Table IV, we can see that if we use only two layers, MR often needs more time for routing, since rip-up and reroute might occur more often as the routing resources become more restricted.

We also performed experiments on timing-driven routing (although no previous timing-driven routers are available to us for comparative studies). In the benchmark circuits, Mcc1, Mcc2, Prim1, and Prim2 do not have the information of net sources. Therefore, we cannot calculate the path delay for those benchmarks and, thus, only the results for the six examples listed in Table V are reported. To perform experiments on timing-driven routing, we used the same resistance, capacitance, and via parameters as those used in [11]. First, we constructed a shortest path tree for a net by connecting all sinks directly to their net source to obtain the timing constraints. We then assigned the timing bound of each sink as the product of the constant k and the shortest path delay of the net. We tried different values of k and used three layers for routing. As shown in Table V, as k approaches 2.5 (2.0), the routing completion rates obtained by our timing-driven MR system are higher than (comparable to) those obtained in [5], which considered only routability. Further, our timing-driven MR can dramatically reduce both the critical path delay (dmax) and the average net delay (davg). Therefore, the timing-driven multilevel router MR is very promising. Fig. 7 shows the two-layer routing solution for “S9234” obtained from our system with routability consideration alone (completion rate = 99.7%). Fig. 8 shows the three-layer routing solution for “S9234” from our timing-driven multilevel router with k = 2 (completion rate = 94.3%).

Fig. 7. Routing solution for “S9234” obtained from MR for routability (2 layers; completion rate = 99.7%).

Fig. 8. Routing solution for “S9234” obtained from our timing-driven MR (HVH routing model; k = 2; 3 layers; completion rate = 94.3%). (a) Routes on the third horizontal layer and (b) routes on the first and second layers.
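The timing-bound assignment described in this section (connect every sink directly to the net source, then scale the resulting delay by the constant k) can be sketched as follows. This is only an illustrative sketch: the delay model (Elmore delay of a single distributed RC wire over the Manhattan distance), the function names, and the parameters `r_unit`, `c_unit`, and `c_load` are assumptions made here and are not taken from the paper, which uses the parameters of [11]. The bound is computed per sink from its own direct-connection delay, which is one possible reading of "the shortest path delay of the net".

```python
from typing import Dict, Tuple

Point = Tuple[int, int]


def manhattan(a: Point, b: Point) -> int:
    """Rectilinear distance between two pins."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def direct_delay(source: Point, sink: Point,
                 r_unit: float, c_unit: float, c_load: float) -> float:
    """Delay of a direct source-to-sink wire.

    Assumed model (not from the paper): Elmore delay of one distributed
    RC wire, i.e. R_wire * (C_wire / 2 + C_load).
    """
    length = manhattan(source, sink)
    return r_unit * length * (c_unit * length / 2.0 + c_load)


def timing_bounds(source: Point, sinks: Dict[str, Point], k: float,
                  r_unit: float = 1.0, c_unit: float = 1.0,
                  c_load: float = 0.0) -> Dict[str, float]:
    """Timing bound of each sink = k * its direct-connection delay."""
    return {
        name: k * direct_delay(source, pin, r_unit, c_unit, c_load)
        for name, pin in sinks.items()
    }


if __name__ == "__main__":
    bounds = timing_bounds((0, 0), {"a": (2, 0), "b": (3, 1)}, k=2.0)
    for name, bound in bounds.items():
        print(name, bound)
```

A larger k loosens every bound, which is consistent with the reported trend that completion rates improve as k grows from 2.0 toward 2.5.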

The memory requirements ranged from 14 MB for S9234 to 496 MB for S38584 for two-layer routing and were proportional to the number of layers. For example, for three-layer routing, the circuit S38584 would need about (496/2) × 3 = 744 MB.
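The proportional scaling of memory with the number of layers can be checked with a short calculation. The helper below is a hypothetical illustration of that arithmetic only (the function name and signature are not from the paper); it assumes memory grows strictly linearly in the layer count, as the text states.

```python
def estimate_memory_mb(two_layer_mb: float, layers: int) -> float:
    """Scale a measured two-layer memory footprint to another layer count.

    Assumes memory is strictly proportional to the number of layers.
    """
    per_layer_mb = two_layer_mb / 2
    return per_layer_mb * layers


if __name__ == "__main__":
    # Reproduces the three-layer estimate for the largest circuit.
    print(estimate_memory_mb(496, 3))
```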

V. CONCLUSION

We have proposed a novel multilevel routing framework MR considering both routability and performance. Unlike the previous multilevel routing, MR integrates global routing, detailed routing, and resource estimation together at each level of the framework, leading to more accurate routing resource estimation during coarsening and thus facilitating the solution refinement during uncoarsening. The exact routing information at each level makes our framework more flexible in dealing with various routing objectives (such as crosstalk, power, etc.). Experimental results have shown that MR is very promising. Future work lies in the development of a timing-driven multilevel router considering signal integrity.

ACKNOWLEDGMENT

The authors would like to thank the authors of [5], Prof. J. Cong, J. Fang, and Y. Zhang, for providing the benchmark circuits. Special thanks go to Y. Zhang for her prompt explanations of their data and very helpful discussions. They also thank the anonymous reviewers for their very constructive comments.

REFERENCES

[1] C. Albrecht, “Global routing by new approximation algorithms for multicommodity flow,” IEEE Trans. Computer-Aided Design, vol. 20, pp. 622–632, May 2001.

[2] C. J. Alpert, J.-H. Huang, and A. B. Kahng, “Multilevel circuit partitioning,” IEEE Trans. Computer-Aided Design, vol. 17, pp. 655–667, Aug. 1998.

[3] Y.-W. Chang, K. Zhu, and D. F. Wong, “Timing-driven routing for symmetrical-array-based FPGAs,” ACM Trans. Design Automation Electron. Syst., vol. 5, no. 3, pp. 433–450, 2000.

[4] T. Chan, J. Cong, T. Kong, and J. Shinnerl, “Multilevel optimization for large-scale circuit placement,” in Proc. IEEE/ACM Int. Conf. Computer-Aided Design, Nov. 2000, pp. 171–176.

[5] J. Cong, J. Fang, and Y. Zhang, “Multilevel approach to full-chip gridless routing,” in Proc. IEEE/ACM Int. Conf. Computer-Aided Design, Nov. 2001, pp. 396–403.

[6] J. Cong, J. Fang, and K. Khoo, “DUNE: A multi-layer gridless routing system with wire planning,” in Proc. ACM Int. Symp. Physical Design, 2000, pp. 12–18.

[7] J. Cong, A. Kahng, and K. Leung, “Efficient algorithms for the minimum shortest path Steiner arborescence problem with applications to VLSI physical design,” IEEE Trans. Computer-Aided Design, vol. 17, pp. 24–39, Jan. 1998.

[8] J. Cong, S. Lim, and C. Wu, “Performance driven multilevel and multiway partitioning with retiming,” in Proc. ACM/IEEE Design Automation Conf., June 2000, pp. 274–279.

[9] J. Cong and P. H. Madden, “Performance driven global routing for standard cell design,” in Proc. ACM Int. Symp. Physical Design, Apr. 1997, pp. 73–80.

[10] J. Cong, A. B. Kahng, G. Robins, M. Sarrafzadeh, and C. K. Wong, “Provably good performance driven global routing,” IEEE Trans. Computer-Aided Design, vol. 11, pp. 739–752, June 1992.

[11] T. Deguchi, T. Koide, and S. Wakabayashi, “Timing-driven hierarchical global routing with wire-sizing and buffer-insertion for VLSI with multi-routing-layer,” in Proc. Asia South Pacific Design Automation Conf., June 2000, pp. 99–104.

[12] M. Hayashi and S. Tsukiyama, “A hybrid hierarchical global router for multi-layer VLSI’s,” IEICE Trans. Fundamentals, vol. E78-A, no. 3, pp. 337–344, 1995.

[13] J. Heisterman and T. Lengauer, “The efficient solutions of integer programs for hierarchical global routing,” IEEE Trans. Computer-Aided Design, vol. 10, pp. 748–753, June 1991.

[14] D. Hightower, “A solution to line routing problems on the continuous plane,” in Proc. Design Automation Workshop, 1969, pp. 1–24.

[15] G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar, “Multilevel hypergraph partitioning: application in VLSI domain,” IEEE Trans. VLSI Syst., vol. 7, pp. 69–79, Mar. 1999.

[16] R. Kastner, E. Bozorgzadeh, and M. Sarrafzadeh, “Predictable routing,” in Proc. IEEE/ACM Int. Conf. Computer-Aided Design, Nov. 2000, pp. 110–114.

[17] C.-Y. Lee, “An algorithm for path connection and its application,” IRE Trans. Comput., vol. EC-10, pp. 346–365, Sept. 1961.

[18] Y.-L. Lin, Y.-C. Hsu, and F.-S. Tsai, “Hybrid routing,” IEEE Trans. Computer-Aided Design, vol. 9, pp. 151–157, Feb. 1990.

[19] J. Lillis, C.-K. Cheng, T.-T. Y. Lin, and C.-Y. Ho, “New performance driven routing techniques with explicit area/delay tradeoff and simultaneous wiresizing,” in Proc. Design Automation Conf., June 1996, pp. 395–400.

[20] H.-C. Lee, Y.-W. Chang, J.-M. Hsu, and H. Yang, “Multilevel floorplanning/placement for large-scale modules using B*-trees,” in Proc. ACM/IEEE Design Automation Conf., Anaheim, CA, June 2003, pp. 812–817.

[21] S.-P. Lin and Y.-W. Chang, “Novel framework for multilevel routing considering routability and performance,” in Proc. IEEE/ACM Int. Conf. Computer-Aided Design, San Jose, CA, Nov. 2002, pp. 44–50.

[22] M. Marek-Sadowska, “Router planner for custom chip design,” in Proc. IEEE/ACM Int. Conf. Computer-Aided Design, Nov. 1986, pp. 246–249.

[23] G. Meixner and U. Lauther, “A new global router based on a flow model and linear assignment,” in Proc. IEEE/ACM Int. Conf. Computer-Aided Design, Nov. 1990, pp. 44–47.

[24] J. Soukup, “Fast maze router,” in Proc. ACM/IEEE Design Automation Conf., June 1978, pp. 100–102.

[25] D. Wang and E. Kuh, “A new timing-driven multilayer MCM/IC routing algorithm,” in Proc. Multi-Chip Module Conf., Feb. 1997, pp. 89–94.

Testing SoC Interconnects for Signal Integrity Using Extended JTAG Architecture

Mohammad H. Tehranipour, Nisar Ahmed, and Mehrdad Nourani

Abstract—As technology shrinks and working frequency reaches the multigigahertz range, designing and testing interconnects are no longer trivial issues. In this paper, we propose an enhanced boundary-scan architecture to test high-speed interconnects for signal integrity. This architecture includes: 1) a modified driving cell that generates patterns according to multiple transitions fault model and 2) an observation cell that monitors signal integrity violations. To fully comply with the conventional Joint Test Action Group Standard, two new instructions are used to control cells and scan activities in the integrity test mode.

Index Terms—Boundary-scan test, integrity loss, interconnect testing, Joint Test Action Group (JTAG) Standard, signal integrity, system-on-chip.

I. INTRODUCTION

A. Motivation

The number of cores in a system-on-chip (SoC) is rapidly growing, which leads to a significant increase in the number of interconnects. With fine miniaturization of the very large scale integrated (VLSI) circuits, existence of long interconnects in SoCs and rapid increase in the working frequency (currently in the gigahertz range), signal integrity

Manuscript received June 23, 2003. This work was supported in part by the National Science Foundation under CAREER Award #CCR-0130513. This paper was recommended by Associate Editor K. Chakrabarty.

The authors are with the Center for Integrated Circuits and Systems, The University of Texas at Dallas, Richardson, TX 75083-0688 USA (e-mail: [email protected]; [email protected]; [email protected]).