A Network-Flow-Based RDL Routing Algorithm for Flip-Chip Design

(1)

Abstract—The flip-chip package gives the highest chip density of any packaging method to support the pad-limited application-specific integrated circuit designs. In this paper, we propose the first router for the flip-chip package in the literature. The router can redistribute nets from wire-bonding pads to bump pads and then route each of them. The router adopts a two-stage technique of global routing followed by detailed routing. In global routing, we use the network flow algorithm to solve the assignment problem from the wire-bonding pads to the bump pads and then create the global path for each net. The detailed routing consists of three stages, namely: 1) cross-point assignment; 2) net ordering determination; and 3) track assignment, to complete the routing. Experimental results based on seven real designs from the industry demonstrate that the router can reduce the total wirelength by 10.2%, the critical wirelength by 13.4%, and the signal skews by 13.9%, as compared with a heuristic algorithm currently used in industry.

Index Terms—Detailed routing, global routing, physical design.

I. INTRODUCTION A. Flip-Chip Design

D

UE TO THE increasing complexity and decreasing fea-ture size of very large scale integration (VLSI) designs, the demand of more I/O pads has become a significant problem of package technologies. A relatively new packaging technol-ogy, i.e., the ﬂip-chip package, as shown in Fig. 1, is created for higher integration density and rising power consumption. Flip-chip bonding was first developed by IBM in the 1960s. It gives the highest chip density of any packaging method to support the pad-limited application-specific integrated circuit designs.

Flip-chip is not a specific package, or even a package type, e.g., pin grid array (PGA) or ball grid array (BGA). Flip-chip describes the method of electrically connecting the die to the package carrier. The package carrier, which is either a substrate or a lead frame, provides the connection from the die to the

Manuscript received January 4, 2006; revised September 17, 2006. This work was supported in part by Faraday Technology Corporation of Taiwan and in part by the National Science Council of Taiwan under Grants NSC 93-2215-E-002-009, NSC 93-2215-E-002-029, and NSC 93-2752-E-002-008-PAE. This paper was presented in part at the 2005 IEEE/ACM International Conference on Computer-Aided Design, San Jose, CA, November 2005. This paper was recommended by Associate Editor J. Hu.

J.-W. Fang and I-J. Lin are with the Graduate Institute of Electronics Engineering, National Taiwan University, Taipei 106, Taiwan, R.O.C. (e-mail: jiawei@eda.ee.ntu.edu.tw; waves@eda.ee.ntu.edu.tw).

Y.-W. Chang is with the Department of Electrical Engineering and Graduate Institute of Electronics Engineering, National Taiwan University, Taipei 106, Taiwan, R.O.C. (e-mail: ywchang@cc.ee.ntu.edu.tw).

J.-H. Wang is with Apache Design Solutions, Hsinchu 300, Taiwan, R.O.C. (e-mail : jhwang@apache-da.com)

Digital Object Identifier 10.1109/TCAD.2007.891364

Fig. 1. (a) Flip-chip. (b) Flip-chip package.

outside devices of the package. The die of a PGA/BGA package is attached to the carrier face up, and later, a wire is bonded first to the die, then looped and bonded to the carrier. In contrast, the interconnection between the die and carrier in the flip-chip package is made through a conductive bump ball that is placed directly on the die surface. Finally, the bumped die is flipped over and placed face down, with the bump balls connecting to the carrier directly. The flip-chip technology is the choice in high-speed applications because of the following advantages: reduced signal inductance (high speed), reduced power/ground inductance (low power), reduced package footprint, smaller die size, higher signal density, and lower thermal effect. However, in recent integrated circuit designs, the I/O pads are still placed along the boundary of the die. This placement does not suit the flip-chip package. As a result, we use the top metal or an extra metal layer, which is called a redistribution layer (RDL), as shown in Fig. 2, to redistribute the wire-bonding pads to the bump pads without changing the placement of the I/O pads. Since the RDL is the top metal layer of the die, the routing angle in an RDL cannot be any angle as in the PGA/BGA packages. Bump balls are placed on the RDL and use the RDL to connect to wire-bonding pads by bump pads.

The flip-chip package is generally classified into two types, namely: 1) the peripheral array, as shown in Fig. 3(a), and 2) the area array, as shown in Fig. 3(b). In the peripheral array, the bump balls are placed along the boundary of the flip-chip package. The disadvantage of the peripheral array is that we only have a limited number of bump balls. In the area array,

(2)

Fig. 2. Cross section of RDL.

Fig. 3. (a) Peripheral array. (b) Area array. (c) RDL routing result.

the bump balls are placed in the whole area of the flip-chip package. The advantage of the area array is that the number of bump balls is much more than that of the peripheral array; thus, it is more suitable for modern VLSI designs. Since the flip-chip design is for high-speed circuits, the issue of signal skews is also important. Thus, a special router, i.e., the RDL router [11], is needed to reroute the peripheral wire-bonding pads to the bump pads and then connect the bump pads to the bump balls. Consider that the routing of multipin nets and the minimization of the total wirelength and the signal skews are also needed for an RDL router. Fig. 3(c) shows one RDL routing result for an area-array flip-chip.

B. Previous Work

To the best knowledge of the authors, there is no previous work in the literature on the routing problem for flip-chip designs. Similar works are the routing for PGA packages, BGA packages, and planar graphs, including [1]–[4], [8]–[10], and [12]–[15]. Yu and Dai [14] used the geometric and symmetric attributes of the pin positions in the BGA packages to assign pins of the BGA packages. However, in flip-chip designs, the positions of wire-bonding pads and bump pads do not always have these geometric and symmetric attributes. PGA routers are presented in [3] and [10], whereas a BGA router is provided in [4]. These three routers are any-angle multilayer routers without considering the pin assignment problem, single-layer routing, and total wirelength minimization. Wang et al. [12] and Yu et al. [15] applied the minimum-cost network flow algorithm to solve the I/O pin routing problems. All these routers focused only on routability and did not consider multipin nets and signal skews. Wang et al. [12] also did not consider the routing congestion problem. Furthermore, they assumed that wires can be any angle; thus, their methods are not suitable for the RDL routing, typically with a 90◦ angle routing. For the previous works on the planar routing [1], [2], and [5], since the pins

can be placed anywhere in the chip, it is a nameplate-complete problem, and thus, most likely, there exists no efficient optimal algorithm for the planar routing. In the flip-chip routing, since wire-bonding pads and bump pads are placed in arrays, we can take the advantage of the regular structure to find an efficient algorithm for the RDL routing. Thus, the flip-chip routing problem is also different from the planar routing one.

C. Our Contributions

To our best knowledge, this paper is the first work in the literature to propose an RDL router to handle the routing problem of flip-chip designs with real industry applications. We present a unified network flow formulation to simultaneously consider the concurrent assignment of the wire-bonding pads to the bump pads and the routing between them. Our algorithm consists of two phases. The first phase is the global routing that assigns each wire-bonding pad to a unique bump pad. By formulating the assignment as a maximum-flow problem and applying the minimum-cost maximum-flow (MCMF) al-gorithm, we can guarantee 100% detailed routing completion after the assignment. The second phase is the detailed routing that efficiently distributes the routing points between two adja-cent wire-bonding (bump) pads and assigns wires into tracks. In addition to the traditional single-layer routing with only routability optimization, our RDL router also tries to optimize the total wirelength and the signal skews between a pair of signal nets under the 100% routing completion constraint. Ex-perimental results based on seven real designs from the industry demonstrate that the router can reduce the total wirelength by 10.2%, the critical wirelength by 13.4%, and the signal skews by 13.9%, as compared with a heuristic algorithm currently used in industry.

The rest of this paper is organized as follows: Section II gives the formulation of the RDL routing problem. Section III details our global and detailed routing algorithms. Section IV shows the experimental results. Finally, conclusions are given in Section V.

II. PROBLEMFORMULATION

We introduce the notations used in this paper and formally define the routing problem for flip-chip packages. Fig. 4 shows the modeling of the routing structure of the flip-chip package. Let P be the set of wire-bonding pads, and let B be the set of bump pads. For practical applications, the number of bump pads is larger than or equal to the number of wire-bonding pads, i.e., |B| ≥ |P |, and each bump pad can be assigned to more than one wire-bonding pad. Let Rb= {r1b, r2b, . . . , rbm}

be a set of m bump pad rings in the center of the package, and let Rp= {rp1, r2p, . . . , rpk} be a set of k wire-bonding

pad rings at the boundary of the package. Each bump pad ringrb

i consists of a set ofq bump pads {bi1, bi2, . . . , biq}, and

each wire-bonding pad ring rjp consists of l wire-bonding

pads {pj₁, pj₂, . . . , pj_l}. Let N be the set of nets (could be two-pin or multipin nets) for routing. Each multipin net n in N is defined by a set of wire-bonding pads and a set of bump pads that should be connected. Each two-pin net can

(3)

Fig. 4. Four sectors in a flip-chip package.

be assigned to a bump pad not included in the sets of bump pads for the multipin nets. Since the RDL routing for current technology is typically on a single layer, it does not allow wire crossings, for which two wires intersect each other in the routing layer. As shown in Fig. 4, based on the two diagonals of the flip-chip package, we partition the whole package into four sectors, namely: 1)N orth = {PN, BN, RpN, RNb }; 2) East =

{PE, BE, REp, REb }; 3) South = {PS, BS, RSp, RbS}; and

4)W est = {PW, BW, RWp , RWb }, where Pi(Bi) and Rip(Rib),

i ∈ {N, E, S, W } are the set of the wire-bonding (bump) pads and the set of the wire-bonding (bump) pad rings in thei sector, respectively. For practical applications, the wire-bonding pads in one sector only connect to the bump pads in the same sector. We define an interval to be the segment between two adjacent bump pads in the same ring rbi or the segment between two

adjacent wire-bonding pads in the same ringrjp. Given a

flip-chip routing instance, there are two types of routing, namely: 1) the monotonic routing and 2) the nonmonotonic routing. A monotonic routing can be formally defined as follows.

Deﬁnition 1: A monotonic routing is a routing such that for each netn connecting from a wire-bonding pad p to a bump pad b, n intersects exactly one interval in each ring rb

i and

exactly one interval in each ringrpj.

As shown in Fig. 5(a), the netsn2 and n4 are monotonic

routes. If we exchange the positions of two bump padsb2and

b4, the routings of n2 and n4 are nonmonotonic, as shown

in Fig. 5(b). The wirelengths of the nets n2 and net n4 are

increased. This shows a drawback of the nonmonotonic rout-ing. Since the nonmonotonic routing occupies more routing resource, it causes significant problems for the single-layer routing. Thus, a good flip-chip package routing should be a monotonic routing without detours, as shown in Fig. 6, because the monotonic routing results in smaller total wirelength and higher routing completion, as compared to the nonmonotonic

Fig. 5. (a) Monotonic routing. (b) Nonmonotonic routing.

Fig. 6. Monotonic routing with and without detours.

routing. Furthermore, the signal skew, i.e., the difference of wirelength between the longest net and the shortest one, should also be considered for routing on the flip-chip package.

Based on the aforementioned definition, the routing problem can be formally defined as follows.

Problem 1: The single-layer flip-chip routing problem is to connect a set ofp ∈ P and a set of b ∈ B so that no wire crosses each other, the routing is monotonic, and the total wirelength and the signal skew are minimized.

III. ROUTINGALGORITHM

In this section, we present our routing algorithm. First, we give the overview of our algorithm. Then, we detail the methods used in each phase.

A. Algorithm Overview

According to the routing flow shown in Fig. 7, our algorithm consists of two phases, namely: 1) global routing based on the MCMF algorithm [5] and 2) detailed routing based on the cross-point assignment, the net ordering determination, and the track assignment.

In the first phase, we construct four flow networks, namely: 1)GN; 2)GE; 3)GS; and 4)GW, one for each sector, to solve

the assignment of the wire-bonding pads to the bump pads. Since we have only one layer for routing, the assignment should not create any wire crossings. We avoid the wire crossings by restricting the edges in the networks not to intersect each other. We first consider two-pin nets and then multipin nets. The reason is that multipin nets allow more than one wire-bonding pad to connect to one bump pad. Thus, the multipin nets may block the two-pin nets. Under this condition, a wire-bonding pad may not find a global path. Thus, the two-pin nets need to be considered first. We will detail the reason in Section III-B4. After applying MCMF, we obtain the flows representing the routes from wire-bonding pads to bump pads for the nets. Those flows give the global paths for the nets.

In the second phase, we use the cross-point assignment, the net ordering determination, and the track assignment to determine detailed routes. A cross point is the point for a net to pass through an interval. First, we find the cross points for

(4)

Fig. 7. RDL routing flow.

Fig. 8. Overview of the RDL routing algorithm.

all nets passing through the same interval. For all nets that pass through the same interval, we evenly distribute these cross points. Second, we use the net ordering determination technique presented in [7] to create the routing sequence between two adjacent rings so that we can guarantee to route all nets. Finally, we assign at least one track to each net based on the routing sequence obtained from the net ordering determination algorithm. Fig. 8 summarizes our routing algorithm.

B. Global Routing

In this section, we first show the basic flow network formula-tion. Then, we detail the capacity of each edge, the intermediate nodes, the tile nodes, and the cost of each edge. Finally, we discuss how to handle the multipin nets.

1) Basic Network Formulation: We describe how to con-struct the flow network GS to perform the concurrent

as-signment for the South sector. The other three sectors can be processed similarly. As shown in Fig. 9(a), we define DS =

{dS

1, dS2, . . . , dSh} to be a set of h intermediate nodes. Each

intermediate node represents an interval(pj_y, pjy+1)((bix, bix+1))

in a wire-bonding (bump) pad ring. TS = {tS1, tS2, . . . , tSu}

is a set of u tile nodes. Each tile node represents a tile (pj

y, pjy+1, pj+1y , pj+1y+1)((bix, bix+1, bi+1x , bi+1x+1)) between two

adjacent wire-bonding (bump) pad rings. We construct a graph GS = (PS∪ DS∪ BS∪ TS, E) and add a source node s and

a sink nodet to GS. Each intermediate noded has a capacity

ofKd, whereKdrepresents the maximum number of nets that

are allowed to pass through an intervald. Each tile node t has a capacity ofLt, whereLtrepresents the maximum number of

nets that are allowed to pass through a tilet. We will detail how to handle the capacity of the intermediate nodes and the tile nodes so that MCMF can be applied in Section III-B2. There are 11 types of edges:

1) edges from a wire-bonding pad to a bump pad;

2) edges from a wire-bonding pad to an intermediate node; 3) edges from a wire-bonding pad to a tile node;

4) edges from an intermediate node to a bump pad;

5) edges from an intermediate node to another intermediate node;

6) edges from an intermediate node to a tile node; 7) edges from a tile node to a bump pad;

8) edges from a tile node to an intermediate node; 9) edges from a tile node to another tile node; 10) edges from the source node to a wire-bonding pad; 11) edges from a bump pad to the sink node.

Each edge is associated with a(cost, capacity) tuple to be described in the following sections. Recall that we do not allow wire crossings for all wires. Since E represents the possible global paths for all nets, we can guarantee that no wire crossings will occur if there are no crossings in edges. Thus, we construct all the edges and avoid crossings of all edges at the same time. Fig. 9(b) shows an example flow network GS for the South

sector. The last two types of edges are not shown here. Further-more, we do not construct edges between the two tile nodes in the center of the two wire-bonding pad rings because the place-ment of these tile nodes is symmetric. We can solve MCMF in time O(|V |2|E|) based on the network flow algorithm presented in [5], whereV is the vertex set in the flow network.

Theorem 1: Given a flow network with the vertex set V and edge set E, the global routing problem can be solved in O(|V |2|E|) time.

Proof: Immediate from the aforementioned discussions. 2) Capacity Assignment and Node Construction: Now, we introduce the capacity of each edge, the intermediate nodes, and the tile nodes. Fig. 10 shows the capacity and cost for all 11 types of edges in the complete flow network. For an edgee, ife is from a wire-bonding pad to a bump pad, an intermediate node, or a tile node, the capacity ofe is set to one. If e is from an intermediate node or a tile node to a bump padb, then the capacity ofe is set to Mb, whereMb is the maximum number

(5)

Fig. 9. (a) Intermediate nodes and tile nodes. (b) Flow network for the South sector.

Fig. 10. Capacity and cost on edges.

of nets that are allowed to connect to the bump padb. Recall that an intermediate noded has a capacity of Kd, whereKd is

the maximum number of nets that are allowed to pass through this intermediate noded. This means that the capacity of each incoming edge of an intermediate noded is equal to Kd. Ife

is an incoming edge of a tile nodet, then the capacity of e is set toLt, where Lt is the maximum number of nets that are

allowed to pass through the tile nodet. As shown in Fig. 11, in order to model this situation, we decompose each intermediate noded into two intermediate nodes d andd, and an edge is connected fromd tod, with a capacity ofKd. All outgoing

Fig. 11. (a) Capacity and cost on intermediate nodes. (b) Capacity and cost on tile nodes.

edges ofd are now connected from d, with a capacity of ¯Kd,

and all incoming edges of d are now connected to d, with a capacity ofKd. Each tile nodet is also decomposed into two

tile nodes t andt, and the capacity of a tile nodet is set to Lt, whereLtis the maximum number of nets that are allowed

to pass through this tile nodet. The capacity of the edges from the source node to the wire-bonding pads is set to one, and the capacity of the edges from each bump padb to the sink node is set toMb. There are three worst cases of congestion in a tile,

as shown in Fig. 12. The four nodes in the three figures are all bump pads. In Fig. 12(a) and (c), the maximum number of nets passing through the tile is 2K. In Fig. 12(b), the maximum number of nets passing through the tile is 3K. If we do not use the tile nodet, the maximum number of nets in Fig. 12(a)–(c)

(6)

Fig. 12. Three kinds of congestion in a tile.

Fig. 13. Adjustment ofα values on edges in the South sector.

could exceed the capacity of a tile (2K > Lt or 3K > Lt).

Since the capacity of each tile node is well modeled in our flow network, we can totally avoid this congestion problem.

3) Cost of Edges: The cost function of each edge is defined by the following equation:

Cost= α × WL (1)

where WL denotes the Manhattan distance between two

ter-minals of an edge, and α is an adaptive parameter to adjust the cost of different types of edges. By adjusting the value of α, we can control the wirelength of each net to avoid large signal skews among different nets. As an example shown in Fig. 13(a), we assign the smallestα to the dashed (red) edge that connects an intermediate node to a bump pad to assign the intermediate node to the bump pad first. By doing so, the rout-ing for a net startrout-ing from a precedrout-ing rrout-ing can be completed earlier to reduce its routing length (and, thus, signal skew). As an example shown in Fig. 13(b), the dashed (red) edge that connects one tile node to another tile node is also assigned the smallestα to assure that fewer bump pad rings are used. Since the wirelength between the tile nodet and the bump pad 1 is the same as that betweent and the bump pad 2, we have to assign the smallestα to the dashed (red) edge to make t connect the bump pad 2 first. Thus, we can reduce the number of long nets to reduce the signal skew by using fewer bump pad rings. If a wire-bonding pad is assigned to a bump pad directly, it might generate a very short net. Hence, we assign the largestα to the dotted (blue) edge that connects a wire-bonding pad to a bump pad to avoid too short connections between the two types of pads to reduce the signal skew. Finally, the solid (black) edge that connects two intermediate nodes, a tile node to a bump pad, or an intermediate node to a tile node is assigned a mediumα. Since the solid (black) edge does not influence the signal skew, the mediumα is set to one. The costs of the edges from the

source node to the wire-bonding pads and the costs of the edges from the bump pads to the sink node are both set to zero.

4) Multipin Net Handling: For practical flip-chip routing, if a bump pad can be assigned to any two or even more wire-bonding pads, we can just increase the capacity of the bump pad to connect more wire-bonding pads. Since we construct the edges for the two-pin nets and the multipin nets simultaneously, the global routing result is optimal. However, for other practical flip-chip routing, a net may connect multiple wire-bonding pads (which are assigned the same signal such as power or ground pads) to a bump pad. This bump pad cannot be assigned to other nets. As stated before, we first assign two-pin nets and then multipin nets. We only construct the edges associated with the two-pin nets and apply MCMF for the assignment. After the assignment, we delete all edges from the source node s and all edges to the sink node t. (However, the flows of the edge e, the intermediate node d, and the tile node t for each assigned two-pin net will be kept in the flow network.) The global paths of the assigned two-pin nets are not deleted and considered as blockagesF during the construction of the edges for the multipin nets. Recall that if there are no edge crossings in the flow network, then there are no wire crossings in the final routing solution. When we construct the edges for the multipin nets, an edgee exists only if e does not intersect any blockages or never crosses the assigned two-pin nets. Then, we add the edges from the source node to the wire-bonding pads associated with the multipin nets and the edges from the bump pads associated with the multipin nets to the sink node. Fig. 14(a) illustrates an example. We assume that a multipin netn consists of((p₂, p4, p5), (b3, b9)), which means that three wire-bonding

pads 2, 4, and 5 are only free to be assigned to one of the two bump pads 3 and 9. No other wire-bonding pads can be assigned to these two bump pads. Redundant edges are deleted by the blockagefi. For example, the edge fromp2to the intermediate

node between b8 and b9 is deleted because it intersects the

blockage (p₃, b8). By using MCMF, the wire-bonding pads

and bump pads are grouped into two sets: {p₂, b3} and

{p4, p5, b9}. Fig. 14(b) illustrates why we handle two-pin nets

first. In this example, we assume that only the bump pad 1 for two-pin nets and the bump pad 2 for multipin nets can be assigned to wire-bonding pads. If we handle the multipin net 2 first, then the two-pin net 1 cannot be assigned to the bump pad 1 to find a global path. The reason is that the multipin net 2 divides the region into two subregions and blocks the wire-bonding pad 1. In order to avoid this situation, we shall handle two-pin nets first. The similar idea is applied in the planar routing. As shown in Fig. 15(a), in a planar routing, if a net such as net 1 or net 2 is routed to divide the region into two subregions, it should be routed later. Otherwise, as shown in Fig. 15(b), other nets such as net 3 or net 4 may cross the net.

Based on the global routing algorithm, we have the following theorem.

Theorem 2: Given a set of wire-bonding pads, a set of bump pads, and a set of nets, if there exists a feasible solution computed by the MCMF algorithm, we can guarantee 100% detailed routing completion.

Proof: In our global routing model, MCMF is optimal for two-pin nets and suboptimal for multipin nets. Since we

(7)

Fig. 14. (a) Assign multipin nets. (b) Handle multipin nets first.

Fig. 15. (a) Routing sequence: {5, 3, 4, 1, 2}. (b) Routing sequence:

{1, 2, 3, 4, 5}.

consider the routing resource in the global routing stage and will never assign nets to exceed the capacity of an interval or a tile, we will never violate the design rules. Also, because we do not allow edge crossings during the flow network construction, the final routing solution will not generate wire crossings. Thus, after the assignment, all global paths are routable in the detailed

routing stage.

C. Detailed Routing

In this section, we explain the three methods used in our detailed routing. As shown in Fig. 16, after the global routing, each global path contains only wire-bonding pads, intermediate nodes, and bump pads. The two global paths dk, t, dl and

dy, t, bx, which pass through the tile node t, are remodeled

asdk, dl and dy, bx. Tile nodes are not needed for the final

representations of the global paths because a tile node is just used to avoid the congestion overflow.

1) Cross-Point Assignment: Based on the global routing re-sult (discussed in Section III-B), we use the cross-point assign-ment algorithm to evenly distribute nets that pass through the same interval (see Fig. 17 for an example). As shown in Fig. 17, the two nets from wire-bonding padsp2 andp3 pass through

the same intermediate node. Thus, we split the intermediate

Fig. 16. Redefined global paths.

Fig. 17. Cross-point assignment.

node into two cross points. Since the maximum number of intermediate nodes is ((|B| − |Rb|) + (|P | − |Rp|)), we have

the following theorem.

Theorem 3: The cross-point assignment problem can be solved inO(|B| + |P |) time.

Proof: If there areqibump pads in the bump pad ringrbiin

the South sector, there will be(q_i− 1) intervals of the bump pad ringrb

i. Hence, there will be(qi− 1) intermediate nodes. Since

the number of bump pad rings is|RS

b|, the maximum number

of intermediate nodes is|RSb|

i=0(qi− 1) = |BS| − |RSb|. As for

the wire-bonding pads, the condition is the same as that of the bump pads, and the conditions of the remaining three sectors are the same as those of the South sector. Thus, the maximum num-ber of intermediate nodes is ((|B| − |R_b|) + (|P | − |R_p|)), and the time complexity isO(|B| + |P |) (the upper bound of

(8)

Fig. 18. (a) Net segments between two adjacent rings. (b) Stack for net ordering determination.

((|B| − |Rb|) + (|P | − |Rp|))) to assign cross points to each

intermediate node.

2) Net Ordering Determination: After the assignment of cross points, each net has its path to cross each interval. For two adjacent rings, we can treat the routing between the two rings as a channel routing. Thus, we can use the net ordering determination algorithm presented in [7] to generate a routing sequence S = (ns

1, nd1), (ns2, nd2), . . . , (nsk, ndk), with k net

segments. Each net segment ni(j, j) is represented by a

source–destination pair/tuple (ns

i, ndi). We first determine

the source and destination for each net based on the counterclockwise traversing distance along the leftmost and rightmost boundaries. If the counterclockwise traversing distance along the leftmost boundary is shorter than the counterclockwise traversing distance along the rightmost boundary, the terminalj is a source, and the terminal j is a destination. Otherwise, the terminalj is a destination, and the terminaljis a source. For example, given the net 1 shown in Fig. 18(a), since the counterclockwise traversing distance along the leftmost boundary is shorter than the counterclockwise traversing distance along the rightmost boundary, we make the terminal 1 a source and the terminal 1 a destination. For the net 10, however, since the counterclockwise traversing distance along the leftmost boundary is longer than the counterclockwise traversing distance along the rightmost boundary, we make the terminal 10a source and the terminal 10 a destination. Starting from an arbitrary terminal, we then generate a circular list for all terminals ordered counterclockwise according to their positions on the boundaries. A stack is used to check if there exist crossovers among the net segments. For each terminal of net segmentni, if it is a source, then we push it into the

stack. If this terminal is a destination and the top element of the stack belongs to the same net segment, then net segment ni is matched, and the top element is popped. Otherwise, if

the stack is empty, or this terminal is a destination and the top element of the stack does not belong to the same net

segment, then we search the circular list for the next terminal. We keep searching the circular list until all nets are matched. As shown in Fig. 18(b), we start with the terminal 1. Since the terminal 1 is a source, we push it into the stack. Then, we search each terminal on the boundary counterclockwise. The terminal 1 is searched, and it is a destination; thus, we compare it with the top element of the stack. Because these two terminals belong to the same net segment, we pop the top element and determine the routing sequence of the net segment n1. Keeping on searching, since the terminals 2, 3, and 4are

all destinations, we do not push them into the stack. Since the terminals 5, 6, 7, 8, 9, and 10are all sources, we push them into the stack. Then, we process the terminal 10, which is a destination and matches the top element in the stack. Thus, we pop the net segmentn10and add it into the routing sequence.

Repeating this step, we can get the resulting routing sequence. With this sequenceS, we can guarantee that each net segment between two adjacent rings can be routed without intersecting each other. For example, given an instance shown in Fig. 18(a), according to the above-described net ordering determination algorithm, we can obtain the sequence S = (n1, n1), (n10,

n10), (n9, n9), (n8, n8), (n7, n7), (n6, n6), (n5, n5), (n2, n2),

(n3, n3), (n4, n4). According to the net ordering

determina-tion algorithm, we have the following theorem.

Theorem 4: Given a setN of nets, the net ordering determi-nation problem can be solved inO(|N |2) time.

Proof: According to the net ordering determination algo-rithm, the worst case happens when only one net is matched during each searching cycle. In this case, the total number of terminal searches is|N|−1_i=0 2(|N| − i) = |N|2+ |N|. Hence,

the time complexity isO(|N |2).

3) Track Assignment: With the net ordering, we can use maze routing to route all nets for any two adjacent rings. However, maze routing is quite slow and generates too many bends. (For example, for a small circuit with 513 nets, we need 25 min on a 1.2-GHz SUN Blade 2000 workstation with 8-GB

(9)

Fig. 19. (a) Example for track assignment. (b) Blocking point.

memory to complete the detailed routing.) Thus, we propose a track assignment algorithm to assign tracks to each net segment of any two adjacent rings. For each net segment ni in S,

according to the relative locations ofnsi andndi, we search a

track to be assigned toni from the top to the bottom or from

the bottom to the top. We search the tracks from the top to the bottom ifnsi is on the top-right side ofndi ornsi is on the

bottom-right side ofnd

i. Otherwise, we search the tracks from

the bottom to the top. If we find a trackh and it does not create any overlap with other wires, then we assignh to ni. As shown

in Fig. 19(a), we assign net segmentn1first. Since the terminal

1 is a source and the net ordering determination algorithm makes each net routed counterclockwise from the source to the destination along the boundary, we search from track 1 to track 6. Thus,n1is assigned to track 1 first. Since the terminal 5is a source, we search from track 6 to track 1. Thus,n5is assigned

to track 6 first. Also, we record the blocking pointsQ for ni. A

blocking segment is a wire on trackh + 1 (if we search from the top to the bottom) orh − 1 (if we search from the bottom to the top) to stopni from being assigned toh + 1 or h − 1 without

creating any overlap with it. A blocking pointqiis a terminal of

the blocking segment whose projection onh overlaps with ni.

As shown in Fig. 19(b), the pointq3on trackh2is the blocking point for netn3. If we cannot find suchh, we rip up and reroute

all net segments n1 toni−1. For each net segment nk to be

rerouted, we use the concept of the dogleg in the channel rout-ing to break a segment into two segments based on the blockrout-ing pointqk, such asq3in Fig. 19(b). Then, we assign the segment

that will not overlap withqkon the lowest possible track (if we

search from the top to the botto m) or on the highest possible track (if we search from the bottom to the top). After assigning tracks, we record the new blocking points for nk. Note that

Fig. 20. Algorithm for track assignment.

since, now, each net segment may be assigned with more than one track, we may have more than one blocking point for each net. Fig. 20 summarizes the track assignment algorithm. According to this algorithm, we have the following theorem.

(10)

Theorem 5: Given a set N of nets and the number of tracks H, the track assignment problem can be solved in O(|N |2H(|Rb| + |Rp|)) time.

Proof: The worst case of the track assignment algorithm is that we have to rip up and reroute every time while assigning the next net. Thus, the total number of times to assign nets to tracks is|N|_i=1i = |N |2+ |N|/2. When assigning a net, the maximum number of track searches isH, and the number of channels is(|Rp| + |Rb| − 1). Hence, the time complexity is

O(|N |2H(|Rb| + |Rp|)).

D. Complexity Analysis

If there are |B| bump pads, |P | wire-bonding pads, |D| intermediate nodes,|T | tile nodes, |Rb| bump pad rings, and

|Rp| wire-bonding pad rings, we can construct a flow network

composed of|V | vertices and |E| edges for the global routing, where |V | = |B| + |P | + |D| + |T | and |E| = edges among these vertices. In the detailed routing, there are |N| global paths, and each channel is divided into H tracks. Hence, the time complexity is as follows:O(|V |2|E|) (global routing)+ O(|B|+|P |) (cross-point assignment)+ O(|N |2)(net ordering determination) + O(|N|2_H(|R

b| + |Rp|)) (track assignment)

= O(|V |2₍_{|E| + H(|R}

b| + |Rp|))). The space complexity is

O(|E|) since we have O(|B| + |P | + |D| + |T |) nodes and O(|E|) edges. Thus, we can solve the RDL routing problem in polynomial time.

Theorem 6: Given a setP of wire-bonding pads, a set B of bump pads, and a setN of nets, if there exists a feasible solution computed by the RDL routing algorithm, the RDL routing problem can be solved in O(|V |2(|E| + H(|Rb| + |Rp|)))

time andO(|E|) space.

Proof: The time complexity analysis is immediate from Theorems 1, 3, 4, and 5. First, since|V |=|B|+|P |+|D|+|T |, the time complexity of the global routingO(|V |2|E|) domi-nates that of the cross-point assignmentO(|B| + |P |). Second, the time complexity of the track assignmentO(|N |2H(|Rb| +

|Rp|)) dominates that of the net ordering determination

O(|N |2). Finally, since |N| = |P |, the time complexity O(|V |2) dominates O(|N|2). Thus, the time complexity of the RDL routing algorithm is given byO(|V |2(|E| + H(|Rb| +

|Rp|))). The space complexity of pads O(|B| + |P |) dominates

that of nodes O(|D| + |T |). Since there is an edge from the source node s to every wire-bonding pad of P , and there is an edge from every bump pad of B to the sink node t, the space complexity of edgesO(|E|) dominates that of pads O(|B| + |P |). Thus, we can reduce the space complexity from O(|B| + |P | + |D| + |T | + |E|) to O(|E|).

IV. EXPERIMENTALRESULTS

We implemented our algorithm in the C++ programming language on a 1.2-GHz SUN Blade 2000 workstation with 8-GB memory. The benchmark circuits, which are listed in Table I, are real industry designs. In Table I, “Circuits” denotes the names of circuits; “#Nets” denotes the number of nets; “#R_p” denotes the number of wire-bonding pad rings; “#p” denotes the number of wire-bonding pads; “#R_b” denotes the

number of bump pad rings; and “#b” denotes the number of bump pads. In each of fs900, fs2116, and fs4096, the number of wire-bonding pads equals the number of bump pads. Thus, each wire-bonding pad needs to be assigned to exactly one bump pad. Hence, these three cases are more difficult for routing than the other four cases.

In Table II, we show how to calculate the values of Mb,

Kd, Hi, and Lt. As defined in the previous sections, Mb is

the maximum number of nets allowed to connect to a bump pad b, Kd is the maximum number of nets allowed to pass

through an intermediate noded, Hi is the maximum number

of tracks between two adjacent pad rings i and i + 1, and Lt

is the maximum number of nets allowed to pass through a tile nodet. All these variables can be expressed by the equations shown in the table. We calculate the values of these variables during the RDL routing process. The parameters used to cal-culate theMb,Kd,Hi, andLtvariables are listed in Table III.

They are all structure and design-rule related parameters. Since there are no flip-chip routing algorithms in the litera-ture, we compared our algorithm with the following heuristic currently used in industry. This heuristic is called the nearest node connection (NNC) algorithm. In NNC, the wires are routed sequentially. If a wire-bonding pad p can find a free bump padb in a restricted area of the nearest bump pad ring rbm, then it connectsp to b. If there are no free bump pads in rbm,

then we search for a free bump pad in the next bump pad ring rb

m+1. This process is repeated until we find a free bump pad.

The experimental results are shown in Table IV. We report the total wirelength, the critical wirelength (the wirelength of the longest net), the maximum signal skews, and the central processing unit times. Since the routability is guaranteed to be 100%, we do not report it. As compared with NNC, the ex-perimental results show that our network-flow-based algorithm reduces the total wirelength by 10.2%, the critical wirelength by 13.4%, and the signal skews by 13.9%, in reasonably longer running time. Note that for fs2116 and fs4096, NNC fails to find a routing solution. In Fig. 21, the running time of our algorithm is plotted as a function of the number of nets. Empirically, the running time of our RDL routing algorithm approaches quadratic (aboutN2.17) to the number of netsN , with the least square analysis for the log–log plot of the function. In Table V, we report the memory usage (in kilobytes) for each circuit for the RDL routing. In Fig. 22, the memory requirement of our algorithm is plotted as a function of the number of nets. The em-pirical memory complexity of our RDL routing algorithm is be-tween linear and quadratic (aboutN1.47) to the number of nets N , again with the least square analysis for the log–log plot of the function. The experimental results show that our network-flow-based RDL algorithm is effective and efficient for flip-chip designs. Fig. 23 shows the RDL routing result of fs900.

We also explore the effects of differentα on wirelength and skew. In Table II, we give the equations for the computation of the upper bound of the smallestα and the lower bound of the largestα. The two bounds come from the geometric relation of the pad placement. Nets that are composed of the wire-bonding pads of the inner wire-bonding pad ring and the bump pads of the outer bump pad ring are often short. Thus, by modelingdP B

(11)

TABLE II EXPERIMENTALVARIABLES

TABLE III

STRUCTURE ANDDESIGN-RULERELATEDPARAMETERS

we can avoid short nets. Furthermore, we want to make the nets of the outer wire-bonding pad rings be assigned to the outer bump pad rings; thus, we model dP P and dP B into

the equation of the upper bound of the smallestα. In order to

minimize the signal skew, the largest α has to be larger than the lower bound, and the smallest α has to be smaller than the upper bound. Furthermore, we also observe that when the largest (smallest)α is scaled up (down), the critical wirelength

(12)

TABLE IV

RDL ROUTINGRESULTS(N/A: NOTAVAILABLE)

Fig. 21. Running time for the RDL routing. TABLE V

MEMORYUSAGE FOREACHCIRCUIT

Fig. 22. Memory usage for the RDL routing.

and the signal skew may be further improved at the cost of larger total wirelength. Thus, we use this property to minimize the critical wirelength and the signal skew without increasing the total wirelength too much. We conducted an experiment to explore the effects of differentα, and the results are listed

Fig. 23. RDL routing result for fs900.

in Table VI. In this experiment, we tested three pairs of the smallestα and the largest α values on fs90b740. We first set the largest (smallest) α to 1 and then scaled it up (down) to see the effects of differentα to the total wirelength, the critical wirelength, and the signal skew. The percentages listed in the parentheses give the normalized ratios to that with the smallest and the largestα being set to 1. From the experimental results in Table VI, as the largest (smallest) α scales up (down), the total wirelength increases while the critical wirelength and the signal skew decrease.

V. CONCLUSION

In this paper, we have developed an RDL router for the flip-chip package. The RDL router consists of the two stages of global routing followed by detailed routing. The global routing applies the network flow algorithm to solve the assignment problem from the wire-bonding pads to the bump pads and then creates the global path for each net. The detailed routing applies the three-stage technique of cross-point assignment, net ordering determination, and track assignment to complete the routing. Experimental results demonstrate that our router can

(13)

achieve much better results in routability, wirelength, critical wirelength, and signal skews, as compared with a heuristic algorithm currently used in industry.

REFERENCES

[1] H. Cai, “Multi-pads, single layer power net routing in VLSI circuits,” in

Proc. ACM/IEEE Des. Autom. Conf., Jun. 1998, pp. 183–188.

[2] D.-S. Chen and M. Sarrafzadeh, “A wire-length minimization algorithm for single-layer layouts,” in Proc. IEEE/ACM Int. Conf. Comput.-Aided

Des., Nov. 1992, pp. 390–393.

[3] S.-S. Chen, J.-J. Chen, S.-J. Chen, and C.-C. Tsai, “An automatic router for the pin grid array package,” in Proc. ACM/IEEE Asia and South Pac.

Des. Autom. Conf., Jan. 1999, pp. 133–136.

[4] S.-S. Chen, J.-J. Chen, C.-C. Tsai, and S.-J. Chen, “An even wiring approach to the ball grid array package routing,” in Proc. IEEE Int. Conf.

Comput. Des., Oct. 1999, pp. 303–306.

[5] B. V. Cherkassky, “Efficient algorithms for the maximum flow problem,”

Math. Methods Solut. Econ. Probl., vol. 7, pp. 117–126, 1977.

[6] J.-W. Fang, I.-J. Lin, P.-H. Yuh, Y.-W. Chang, and J.-H. Wang, “A routing algorithm for flip-chip design,” in Proc. IEEE/ACM Int. Conf.

Comput.-Aided Des., Nov. 2005, pp. 753–758.

[7] C.-P. Hsu, “General river routing algorithm,” in Proc. ACM/IEEE Des.

Autom. Conf., Jun. 1983, pp. 578–583.

[8] K.-F. Liao, M. Sarrafzadeh, and C. K. Wong, “Single-layer global rout-ing,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 13, no. 1, pp. 38–47, Jan. 1994.

[9] A. Titus, B. Jaiswal, T. J. Dishongh, and A. N. Cartwright, “Innovative circuit board level routing designs for BGA packages,” IEEE Trans. Adv.

Packag., vol. 27, no. 4, pp. 630–639, Nov. 2004.

[10] C.-C. Tsai, C.-M. Wang, and S.-J. Chen, “News: A net-even-wiring sys-tem for the routing on a multilayer PGA package,” IEEE Trans.

Comput.-Aided Design Integr. Circuits Syst., vol. 17, no. 2, pp. 182–189, Feb. 1998.

[11] UMC, 0.13µm Flip-Chip Layout Guideline, p. 6, 2004.

[12] D. Wang, P. Zhang, C.-K. Cheng, and A. Sen, “A performance-driven I/O pin routing algorithm,” in Proc. ACM/IEEE Asia and South Pac. Des.

Autom. Conf., Jan. 1999, pp. 129–132.

[13] M.-F. Yu and W. W.-M. Dai, “Pin assignment and routing on a single-layer pin grid array,” in Proc. ACM/IEEE Asia and South Pac. Des. Autom.

Conf., Sep. 1995, pp. 203–208.

[14] M.-F. Yu and W. W.-M. Dai, “Single-layer fanout routing and routability analysis for ball grid arrays,” in Proc. IEEE/ACM Int. Conf.

[15] M.-F. Yu, J. Darnauer, and W. W.-M. Dai, “Interchangeable pin rout-ing with application to package layout,” in Proc. IEEE/ACM Int. Conf.

Jia-Wei Fang (S’05) received the B.S. degree in

electrical engineering from the National Cheng Kung University, Tainan, Taiwan, R.O.C., in 2003 and the M.S. degree in electronics engineering from the National Taiwan University, Taipei, Taiwan, in 2005. He is currently working toward the Ph.D. degree at the Graduate Institute of Electronics Engineering, National Taiwan University.

His current research interests include package routing and other package technologies.

I-Jye Lin (S’05) received the B.S. degree in

com-puter science from the National Tsing-Hua Uni-versity, Hsinchu, Taiwan, R.O.C., in 2004. She is currently working toward the Ph.D. degree at the Graduate Institute of Electronics Engineering, Na-tional Taiwan University, Taipei, Taiwan.

Her current research interests include package routing and design for manufacturing.

Yao-Wen Chang (S’94–M’96) received the B.S.

degree from National Taiwan University, Taipei, Taiwan, in 1988, and the M.S. and Ph.D. degrees from the University of Texas at Austin in 1993 and 1996, respectively, all in computer science.

He is a Professor in the Department of Electrical Engineering and the Graduate Institute of Electronics Engineering, National Taiwan University. He is cur-rently also a Visiting Professor at Waseda University, Japan. He was with the IBM T. J. Watson Research Center, Yorktown Heights, NY, in the summer of 1994. From 1996 to 2001, he was on the faculty of National Chiao Tung University, Taiwan. His current research interests lie in VLSI physical design, design for manufacturing, and FPGA. He has been working closely with industry on projects in these areas. He has coauthored one book on routing and over 120 ACM/IEEE conference/journal papers in these areas.

Dr. Chang received an award at the 2006 ACM ISPD Placement Contest, Best Paper Award at ICCD-1995, and nine Best Paper Award Nominations from DAC-2007, ISPD-2007 (two), DAC-2005, 2004 ACM TODAES, ASP-DAC-2003, ICCAD-2002, ICCD-2001, and DAC-2000. He has received many awards for research performance, such as the inaugural First-Class Principal Investigator Awards and the 2004 Mr. Wu Ta You Memorial Award from the National Science Council of Taiwan, the 2004 MXIC Young Chair Professor-ship from the MXIC Corp, and for excellent teaching from National Taiwan University and National Chiao Tung University. He is an editor of the Journal

of Computer and Information Science. He has served on the ACM/SIGDA

Physical Design Technical Committee and the technical program committees of ASP-DAC (topic chair), DAC, DATE, FPT (program co-chair), GLSVLSI, ICCAD, ICCD, IECON (topic chair), ISPD, SOCC (topic chair), TENCON, and VLSI-DAT (topic chair). He is currently an independent board member of Genesys Logic, Inc, the chair of the Design Automation and Testing (DAT) Consortium of the Ministry of Education, Taiwan, a member of the board of governors of the Taiwan IC Design Society, and a member of the IEEE Circuits and Systems Society, ACM, and ACM/SIGDA.

Jyh-Herng Wang (M’98) received the B.S. and

Ph.D. degrees in electrical engineering from Na-tional Taiwan University, Taipei, Taiwan, R.O.C., in 1987 and 1994, respectively.

From 1994 to 1999, he was an Associate Scientist with the electrical design group, National Center for High-Performance Computing, HsinChu, Taiwan. He then joined Faraday Technology Corporation, where he was a Staff Engineer, CAD Manager, and Director of the Design Development Division, in 1999, 2000, and 2003, respectively. He joined Apache Design Solutions, Hsinchu, Taiwan, as Director of the Taiwan R&D team in 2007. His research interests include timing analysis, SI analysis, and VLSI design flow.