
Chapter 3. An RDL Routing Algorithm

3.5 Outward Ring-by-Ring Routing

Based on the MCMF result, all connections of I/O pads are determined. In this stage, we will finish the routing topology for each net in the fan-out region, based on a ring-by-ring, chip-by-chip scheme. Starting from the I/O pads, for each ring formed by the bump pads, we will assign correct ring passing points for all I/O connections in the fan-out region. To correctly assign the ring passing points, we have to fix two types of violations between two adjacent rings: (1) the net order violation and (2) the tile capacity violation. Because net crossings are not allowed, the net order on any two adjacent rings should match. Besides, we have to check if a given set of currently assigned net segments can pass through a tile simultaneously.

First, we can fix net order violations in one of three ways: reassigning some I/O pads to bump pads, detouring the pre-assignment net, or reassigning some I/O pads to escaped routes or boundary bump pads in the fan-in region, as shown in Figure 3.9.

Pre-assignment nets break two adjacent rings into many segments. The net orders of two corresponding segments on the inner and outer rings should be the same. If all order violations involve only free-assignment nets, we can reassign them to correct the order. If a net order violation involves a pre-assignment net, we have to detour the pre-assignment net or reassign free-assignment I/O pads to the fan-in region with unassigned escaped routes or boundary bump pads.
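The segment-wise order check described above can be sketched as follows. This is a minimal illustration, not our implementation: the function names are hypothetical, each ring is assumed to be given as a list of net names read from a common cut point (so wrap-around is ignored), and only nets that appear on both rings are compared.

```python
def split_by_pre_assigned(ring, pre_assigned):
    """Split a ring's net sequence into segments delimited by pre-assignment nets."""
    segments, current = [], []
    for net in ring:
        if net in pre_assigned:
            segments.append(current)
            current = []
        else:
            current.append(net)
    segments.append(current)
    return segments


def find_order_violations(inner, outer, pre_assigned):
    """Return (pre_conflict, free_violations) for two adjacent rings.

    pre_conflict is True when the pre-assignment nets themselves appear in
    different orders (a detour or fan-in reassignment would be needed).
    free_violations is the set of free-assignment nets whose position differs
    between corresponding segments (these could simply be reassigned).
    """
    common = set(inner) & set(outer)
    inner = [n for n in inner if n in common]
    outer = [n for n in outer if n in common]

    pre_inner = [n for n in inner if n in pre_assigned]
    pre_outer = [n for n in outer if n in pre_assigned]
    if pre_inner != pre_outer:
        return True, set()

    free_violations = set()
    segments_in = split_by_pre_assigned(inner, pre_assigned)
    segments_out = split_by_pre_assigned(outer, pre_assigned)
    for seg_in, seg_out in zip(segments_in, segments_out):
        for i, net in enumerate(seg_in):
            if i >= len(seg_out) or seg_out[i] != net:
                free_violations.add(net)
        # Nets that moved into this segment from another one also violate the order.
        free_violations.update(set(seg_out) - set(seg_in))
    return False, free_violations
```

For example, with pre-assignment nets `P1` and `P2`, swapping two free-assignment nets between them on the outer ring is reported as a free-assignment violation, whereas swapping `P1` and `P2` themselves is reported as a pre-assignment conflict.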

After correcting all net order violations, we check the capacities of tiles with the tile routability analyzer proposed in [18]. The analyzer checks in constant time whether a given set of currently assigned net segments can pass through a tile simultaneously. If a tile capacity violation occurs, we cluster the tile with an adjacent tile to push some net segments into the adjacent tile, as shown in Figure 3.9. We may need to cluster more adjacent tiles to fix a tile capacity violation. We can also reassign some I/O pads to escaped routes or boundary bump pads in the fan-in region.
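The clustering step can be illustrated with a deliberately simplified sketch. It does not reproduce the routability analyzer of [18]; instead it assumes, purely for illustration, that each tile has an integer capacity (the number of net segments it can hold) and that tiles along a ring are stored in a linear list. The function name and the greedy choice of neighbor are hypothetical.

```python
def cluster_for_capacity(demand, capacity, tile):
    """Grow a cluster around `tile` until its total capacity covers its demand.

    demand[i]   : number of net segments currently assigned to tile i
    capacity[i] : number of segments tile i can hold (count-based stand-in
                  for the real routability check, which is an assumption here)
    Returns the list of clustered tile indices, or None if no cluster fits.
    """
    lo = hi = tile
    total_demand, total_capacity = demand[tile], capacity[tile]
    while total_demand > total_capacity:
        left = lo - 1 if lo > 0 else None
        right = hi + 1 if hi < len(demand) - 1 else None
        if left is None and right is None:
            return None  # every tile is already clustered and it still overflows
        spare_left = capacity[left] - demand[left] if left is not None else float("-inf")
        spare_right = capacity[right] - demand[right] if right is not None else float("-inf")
        # Greedily absorb the neighbor with more spare room.
        if spare_left >= spare_right:
            lo = left
            total_demand += demand[left]
            total_capacity += capacity[left]
        else:
            hi = right
            total_demand += demand[right]
            total_capacity += capacity[right]
    return list(range(lo, hi + 1))
```

When the function returns None, the remaining option mentioned above applies: some I/O pads would have to be reassigned to escaped routes or boundary bump pads in the fan-in region.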


Figure 3.9: An example for our outward ring-by-ring routing algorithm. (a) The global routing result with two types of violations after finishing the first ring. All the I/O pads are assigned to correct ring passing points on the first ring. The ring passing points on the first ring are connected to the targets. (b) The global routing result after finishing the second ring. All the violations in (a) are corrected by reassigning some I/O pads to bump pads or detouring the pre-assignment net.

Figure 3.10 summarizes our proposed global routing algorithm. Line 1 applies the layer assignment method for pre-assignment nets, which assigns them to appropriate layers to avoid long detours. Lines 2–6 perform escape routing for all chips to finish the routing in the fan-in region. Lines 7–10 apply the pad/layer assignment for free-assignment nets, which assigns free-assignment I/O pads to bump pads on specific layers. Line 11 applies the outward ring-by-ring method to complete the routing in the fan-out region. Note that we may need to add more RDLs if the algorithm fails to obtain a routing solution. If the routing failure is caused by pre-assignment nets, we move some of them to the new layer. Then, we repeat Lines 3–11.

Algorithm: The global routing algorithm

Input:

C: a set of chips

B: a set of bump pads

Q: a set of I/O pads

N: a set of pre-assignment nets

design rules

Output:

global routes

01. perform layer assignment for N; // see Figure 3.4

02. construct a bipartite graph to calculate the bipartite costs;

03. calculate pre-assignment costs by the locations of pre-assignment I/O pads;

04. construct flow networks for all chips based on the costs;

05. apply an MCMF algorithm on the flow networks;

06. obtain escaped routes based on the MCMF result;

07. construct concentric-circle models for all chips on all layers;

08. construct a flow network for the 3rd stage based on the obtained models;

09. apply an MCMF algorithm on the flow network;

10. obtain the pad/layer assignment for free-assignment I/O pads based on the MCMF result;

11. apply the outward ring-by-ring method for each chip on each layer;

Figure 3.10: The global routing algorithm.
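The retry behavior described for Figure 3.10 (add an RDL after a failure, then repeat Lines 3–11) can be sketched as a small driver loop. This is only an outline under stated assumptions: `route_once` is a hypothetical callable standing in for Lines 3–11, including any move of pre-assignment nets to the new layer, and `max_layers` is an assumed bound that the algorithm description itself does not specify.

```python
def route_with_added_layers(route_once, num_layers, max_layers):
    """Repeat the routing stages, adding one RDL after each failure.

    route_once(num_layers) is a hypothetical stand-in for Lines 3-11 of
    Figure 3.10; it returns the global routes, or None on a routing failure.
    Returns (layers_used, routes), or None if max_layers is exceeded.
    """
    while num_layers <= max_layers:
        routes = route_once(num_layers)
        if routes is not None:
            return num_layers, routes
        num_layers += 1  # add one more RDL and repeat the stages
    return None
```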

Experimental Results

In this chapter, we show the experimental results of our RDL routing algorithm for InFO WLCSPs to verify the effectiveness of our algorithm. We first introduce our experimental setup and benchmarks in Section 4.1. Then, in Section 4.2, we show the experimental results of our proposed algorithm.

4.1 Experimental Setup

We implemented our algorithm in the C++ programming language. All experiments were performed on an Intel Xeon 2.97 GHz Linux workstation with 48 GB memory. The benchmark circuits are listed in Table 4.1, where “#Chips”, “#I/O Pads”, “#Bump Pads”, and “#Pre-Assignment Nets” denote the numbers of chips, I/O pads, bump pads, and pre-assignment nets, respectively.

Table 4.1: Benchmark circuit statistics. “#Chips”, “#I/O Pads”, “#Bump Pads”, and “#Pre-Assignment Nets” denote the numbers of chips, I/O pads, bump pads, and pre-assignment nets, respectively.

Circuits   #Chips   #I/O Pads   #Bump Pads   #Pre-Assignment Nets
info1      3        672         672          44
info2      3        1275        1364         89
info3      3        1398        1600         107
info4      2        1672        1920         156
info5      4        3670        3720         376

4.2 Experimental Results and Comparisons

We compared our algorithm with a heuristic, FA-maze. FA-maze first routes all the free-assignment nets using the network-flow-based method proposed in [9] with the tile model proposed in [29]. The work [29] correctly models the diagonal capacity of a rectangular tile. Using the tile model, the space of a tile can be fully utilized under the routing-angle constraint mentioned in Section 2.2. Furthermore, we extend the work [9] to handle multiple layers with the technique presented in [5]. The technique modifies the flow network that is used to complete the global routing stage. For each layer, it duplicates all vertices and edges except the vertices for I/O pads and bump pads, because each pad can connect to only one net. After the free-assignment nets are routed, the pre-assignment nets are routed sequentially by maze routing. When FA-maze cannot route all the nets using the current number of layers, we add one more layer and perform FA-maze again. We limit the number of layers to at most three.
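The sequential maze-routing step of the baseline can be illustrated by a Lee-style breadth-first search on a grid. This is a generic sketch rather than the exact FA-maze implementation: the grid representation, the four-neighbor moves, and the function names are assumptions, and the routing-angle constraint is ignored.

```python
from collections import deque


def maze_route(grid, source, target):
    """Shortest path from source to target by BFS; grid[r][c] == 1 is blocked."""
    rows, cols = len(grid), len(grid[0])
    parent = {source: None}
    queue = deque([source])
    while queue:
        cell = queue.popleft()
        if cell == target:
            path = []
            while cell is not None:  # walk back to the source
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and (nr, nc) not in parent:
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None


def route_sequentially(grid, nets):
    """Route nets one by one; each routed path blocks its cells for later nets."""
    routed, failed = {}, []
    for name, (source, target) in nets:
        path = maze_route(grid, source, target)
        if path is None:
            failed.append(name)
            continue
        for r, c in path:
            grid[r][c] = 1
        routed[name] = path
    return routed, failed
```

Because each routed net blocks cells for the nets that follow, the outcome depends on the routing order, which is one reason a sequential scheme may leave later nets unroutable.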

The comparisons of FA-maze and our algorithm are shown in Table 4.2. The number of layers, the routability, the total wirelength, and the runtime are reported.

From the results, our algorithm achieves 100% routability for each circuit, while FA-maze fails to obtain a complete routing solution even though it uses at least as many layers as ours. Further, our algorithm runs 20.8X faster than FA-maze on average. The results show that our algorithm is effective and efficient for RDL routing for the InFO WLCSP. Figures 4.1 and 4.2 show the routing solutions of info3 on the first and second layers, respectively.

We analyze the runtime of the four stages in our algorithm. As shown in Table 4.3, the second and third stages take up a major portion of the runtime. The runtime of the second and third stages is dominated by the minimum-weight maximum-cardinality bipartite matching and the minimum-cost maximum-flow algorithms.

Table 4.2: Comparisons of RDL routing results (“N/A”: incomplete routing results).

Circuits      #Layers          Routability (%)    Total Wirelength (µm)   Runtime (sec)
              FA-maze  Ours    FA-maze  Ours      FA-maze  Ours           FA-maze  Ours
info1         3        2       95.4     100.0     N/A      142102         52       3
info2         3        2       93.5     100.0     N/A      514216         759      36
info3         3        2       92.4     100.0     N/A      611851         1117     41
info4         3        2       90.8     100.0     N/A      851808         1723     85
info5         3        3       89.3     100.0     N/A      1968153        15512    859
Comparisons   -        -       0.92     1.00      -        -              20.80    1.00
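The “Comparisons” row of Table 4.2 is consistent with averaging the per-circuit ratios of FA-maze to our algorithm. The short check below reproduces the reported 0.92 and 20.80 under that reading; the values are taken from Table 4.2, and the helper name `mean_ratio` is only illustrative.

```python
# Per-circuit values for info1..info5, as read from Table 4.2.
fa_routability = [95.4, 93.5, 92.4, 90.8, 89.3]
ours_routability = [100.0, 100.0, 100.0, 100.0, 100.0]
fa_runtime = [52, 759, 1117, 1723, 15512]
ours_runtime = [3, 36, 41, 85, 859]


def mean_ratio(values, baseline):
    """Average of the per-circuit ratios values[i] / baseline[i]."""
    return sum(v / b for v, b in zip(values, baseline)) / len(values)


routability_ratio = mean_ratio(fa_routability, ours_routability)
runtime_ratio = mean_ratio(fa_runtime, ours_runtime)
print(f"routability: {routability_ratio:.2f}, runtime: {runtime_ratio:.2f}")
```

Note that this is the mean of ratios rather than the ratio of total runtimes, so the large info5 runtime does not dominate the figure.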

Figure 4.1: The routing solution of info3 on the first layer. The red lines are the pre-assignment nets, and the black lines are the free-assignment ones.

Figure 4.2: The routing solution of info3 on the second layer. The red lines are the pre-assignment nets, and the black lines are the free-assignment ones.

Table 4.3: Runtime analysis. This table shows the runtime of the four stages in our global routing algorithm. The second and third stages take up the major portion of the runtime.

Circuits   Runtime (sec)
           First Stage   Second Stage   Third Stage   Fourth Stage   Total
info1      <1            <1             3             <1             3
info2      <1            16             15            5              36
info3      <1            17             18            6              41
info4      <1            38             39            8              85
info5      1             246            517           95             859
Average    <0.001        0.319          0.583         0.098          1.000
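The “Average” row of Table 4.3 is consistent with averaging, over the five circuits, each stage's share of that circuit's total runtime. The check below reproduces the row under one assumption that is not stated in the table: entries reported as “<1” are treated as 0 seconds.

```python
# Stage runtimes (sec) per circuit from Table 4.3; "<1" is treated as 0 (assumption).
stage_runtime = {
    "info1": [0, 0, 3, 0],
    "info2": [0, 16, 15, 5],
    "info3": [0, 17, 18, 6],
    "info4": [0, 38, 39, 8],
    "info5": [1, 246, 517, 95],
}


def average_share(runtimes):
    """Average over circuits of each stage's fraction of the circuit total."""
    shares = [[t / sum(row) for t in row] for row in runtimes.values()]
    return [sum(column) / len(shares) for column in zip(*shares)]


average = average_share(stage_runtime)
print([f"{share:.3f}" for share in average])
```

Under this reading, the second and third stages together account for roughly 90% of the runtime on average.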

Conclusions and Future Work

In this thesis, we have formulated the unified-assignment, multi-layer multi-chip RDL routing problem for the InFO WLCSP and proposed the first algorithm to solve this problem. To handle the layer-assignment problem for pre-assignment nets, we have proposed the concentric-circle model to model all the nets between one chip and all the other chips. Based on this model, we have assigned the pre-assignment nets to appropriate layers to avoid long detours. In addition, we have used this model to integrate the geometrical information of the pre-assignment nets between chips into a network-flow model, and the result can facilitate the following outward ring-by-ring stage and the detailed routing stage. Experimental results demonstrate the high quality and efficiency of our algorithm.

Some future research directions are provided as follows:

• An RDL router considering the package on package technology with InFO WLCSPs: The package on package (PoP) technology is a packaging method that combines packages vertically, as shown in Figure 5.1. With the increasing usage of mobile dynamic random access memory (DRAM), it is desirable to integrate DRAM and logic chips using the PoP technology to improve the circuit performance. Furthermore, the PoP technology with InFO WLCSPs can achieve a thinner package than the traditional one, as shown in Figure 5.1. Besides, the technology is more cost effective than the 3DIC technology. As a result, the technology is a great solution for the next generation of mobile devices [2].

Because there are predefined connections between two packages, performing RDL routing of the two packages individually may lead to an inferior solution or even a routing failure. To solve the RDL routing problem effectively, we have to consider the routing of the packages simultaneously. Furthermore, locations of through mold vias (TMVs), which are used to connect the packages, should be decided to maximize routability. Besides, we might need to consider the distribution of power/ground TMVs and thermal effects of TMVs while deciding the locations of TMVs.

• A chip-package-board co-design methodology for a multi-chip InFO WLCSP:

In modern IC design methodology, the design processes of chips, packages, and boards are typically separate. However, it may be hard to achieve the design convergence due to the lack of information from other domains. A co-design methodology can improve the design quality and reduce the time needed to achieve design convergence.

As shown in Figure 5.2, there are five processes: (1) chip floorplanning in a package, (2) I/O buffer placement, (3) RDL routing, (4) escape routing on a micro-bump array, and (5) printed circuit board (PCB) routing. Due to the difference of packages, the method proposed in [14] might not be applicable.

We introduce some additional issues which are not considered in [14]. The first one is that we have to decide the locations of chips considering the interaction between chips because there are multiple chips in an InFO WLCSP. The second one is the RDL routing for a multi-chip InFO WLCSP. Finally, we might need to modify the assignment techniques in each stage of the method considering the difference of packages.

Furthermore, we might consider multiple packages simultaneously. As shown in Figure 5.2, there are two packages on the PCB. If we can consider these two packages simultaneously, we might improve the design quality.

[Figure 5.1 content, image not reproduced:]

                     Package on Package (PoP)        PoP with InFO WLCSPs
Scheme (labels)      Substrate, Chip, Mobile DRAM    Chip1, Chip2, Mobile DRAM
Package Thickness    1X                              0.7X~0.8X
Substrate Cost       Needed                          Not Needed

Figure 5.1: Comparisons of the traditional package on package (PoP) technology and the PoP with InFO WLCSPs [2]. The PoP with InFO WLCSPs does not need a substrate and can obtain a thinner package thickness. Through mold vias (TMVs) are used to connect the packages.

[Figure 5.2 labels, image not reproduced: Chip1, Chip2, package 1, package 2, PCB, I/O buffer, micro-bump ball, micro-bump array boundary, other component, RDL routing, escape routing, PCB routing]

Figure 5.2: Main processes in a design flow. Besides the three processes shown in this figure, we might need to consider chip floorplanning, which decides the locations of the chips in a package. We might also need to handle I/O buffer placement. The overall target is connecting each I/O buffer in a package to the corresponding pin (the other components in this figure).

[1] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin, Network Flows: Theory, Algorithms, and Applications. Prentice Hall Englewood Cliffs, 1993.

[2] H. Chen, H.-C. Lin, C.-N. Peng, and M.-J. Wang, “Wafer level chip scale package copper pillar probing,” in Proceedings of International Test Conference (ITC), pp. 1–6, October 2014.

[3] T. Cormen, C. Leiserson, R. Rivest, and C. Stein, Introduction to Algorithms. MIT Press, 2009.

[4] J.-W. Fang and Y.-W. Chang, “Area-I/O flip-chip routing for chip-package co-design,” in Proceedings of IEEE/ACM International Conference on Computer-Aided Design, pp. 518–522, November 2008.

[5] ——, “Area-I/O flip-chip routing for chip-package co-design considering signal skews,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 29, no. 5, pp. 711–721, May 2010.

[6] J.-W. Fang, C.-H. Hsu, and Y.-W. Chang, “An integer linear programming based routing algorithm for flip-chip design,” in Proceedings of ACM/IEEE Design Automation Conference, pp. 606–611, June 2007.

[7] ——, “An integer-linear-programming-based routing algorithm for flip-chip designs,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 28, no. 1, pp. 98–110, January 2009.

[8] J.-W. Fang, I.-J. Lin, P.-H. Yuh, Y.-W. Chang, and J.-H. Wang, “A routing algorithm for flip-chip design,” in Proceedings of IEEE/ACM International Conference on Computer-Aided Design, pp. 753–758, November 2005.

[9] J.-W. Fang, I.-J. Lin, Y.-W. Chang, and J.-H. Wang, “A network-flow-based RDL routing algorithm for flip-chip design,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 26, no. 8, pp. 1417–1429, August 2007.

[10] J.-W. Fang, M. D. F. Wong, and Y.-W. Chang, “Flip-chip routing with unified area-I/O pad assignments for package-board co-design,” in Proceedings of ACM/IEEE Design Automation Conference, pp. 336–339, July 2009.

[11] Y.-K. Ho, H.-C. Lee, W. Lee, Y.-W. Chang, C.-F. Chang, I.-J. Lin, and C.-F. Shen, “Obstacle-avoiding free-assignment routing for flip-chip designs,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 33, no. 2, pp. 224–236, February 2014.

[12] Y.-C. Huang, B.-Y. Lin, C.-W. Wu, M. Lee, H. Chen, H.-C. Lin, C.-N. Peng, and M.-J. Wang, “Efficient probing schemes for fine-pitch pads of InFO wafer-level chip-scale package,” in Proceedings of ACM/IEEE Design Automation Conference, 2016.

[13] H. W. Kuhn, “The Hungarian method for the assignment problem,” Naval Research Logistics Quarterly, vol. 2, no. 1-2, pp. 83–97, 1955.

[14] H.-C. Lee and Y.-W. Chang, “A chip-package-board co-design methodology,” in Proceedings of ACM/IEEE Design Automation Conference, pp. 1082–1087, June 2012.

[15] H.-C. Lee, Y.-W. Chang, and P.-W. Lee, “Recent research development in flip-chip routing,” in Proceedings of IEEE/ACM International Conference on Computer-Aided Design, pp. 404–410, November 2010.

[16] P.-W. Lee, C.-W. Lin, Y.-W. Chang, C.-F. Shen, and W.-C. Tseng, “An efficient pre-assignment routing algorithm for flip-chip designs,” in Proceedings of IEEE/ACM International Conference on Computer-Aided Design, pp. 239–244, November 2009.

[17] P.-W. Lee, H.-C. Lee, Y.-K. Ho, Y.-W. Chang, C.-F. Chang, I.-J. Lin, and C.-F. Shen, “Obstacle-avoiding free-assignment routing for flip-chip designs,” in Proceedings of ACM/IEEE Design Automation Conference, pp. 1088–1093, November 2012.

[18] C.-W. Lin, P.-W. Lee, Y.-W. Chang, C.-F. Shen, and W.-C. Tseng, “An efficient pre-assignment routing algorithm for flip-chip designs,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 31, no. 6, pp. 878–889, June 2012.

[19] C. C. Liu, S.-M. Chen, F.-W. Kuo, H.-N. Chen, E.-H. Yeh, C.-C. Hsieh, L.-H. Huang, M.-Y. Chiu, J. Yeh, T.-S. Lin, T.-J. Yeh, S.-Y. Hou, J.-P. Hung, J.-C. Lin, C.-P. Jou, C.-T. Wang, S.-P. Jeng, and D. C. H. Yu, “High-performance integrated fan-out wafer level packaging (InFO-WLP): Technology and system integration,” in Proceedings of IEEE International Electron Devices Meeting (IEDM), pp. 14.1.1–14.1.4, December 2012.

[20] X. Liu, Y. Zhang, G. K. Yeap, and C. Chu, “Global routing and track assignment for flip-chip designs,” in Proceedings of ACM/IEEE Design Automation Conference, pp. 90–93, June 2010.

[21] L. Luo and M. D. F. Wong, “Ordered escape routing based on Boolean satisfiability,” in Proceedings of IEEE/ACM Asia and South Pacific Design Automation Conference, pp. 244–249, March 2008.

[22] D. Staepelaere, J. Jue, T. Dayan, and W. W.-M. Dai, “Surf: Rubberband routing system for multichip modules,” IEEE Design & Test of Computers, vol. 10, no. 4, pp. 18–26, December 1993.

[23] D. J. Staepelaere, Geometric Transformations for a Rubber-Band Sketch. University of California Santa Cruz, 1992.

[24] K. J. Supowit, “Finding a maximum planar subset of a set of nets in a channel,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 6, no. 1, pp. 93–94, January 1987.

[25] T. Szymanski, “A special case of the maximal common subsequence problem,” Technical Report TR-170, Computer Science Laboratory, Princeton University, 1975.

[26] J. T. Yan and Z. W. Chen, “IO connection assignment and RDL routing for flip-chip designs,” in Proceedings of IEEE/ACM Asia and South Pacific Design Automation Conference, pp. 588–593, January 2009.

[27] ——, “RDL pre-assignment routing for flip-chip design,” in Proceedings of the Great Lakes Symposium on VLSI, pp. 401–404, May 2009.

[28] ——, “Pre-assignment RDL routing via extraction of maximal net sequence,” in Proceedings of IEEE International Conference on Computer Design, pp. 65–70, October 2011.

[29] T. Yan and M. D. F. Wong, “Correctly modeling the diagonal capacity in escape routing,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 31, no. 2, pp. 285–293, February 2012.

[30] D. Yu, “A new integration technology platform: Integrated fan-out wafer-level-packaging for mobile applications,” in Proceedings of the Symposium on VLSI Technology, pp. T46–T47, June 2015.

[1] B.-Q. Lin, T.-C. Lin, and Y.-W. Chang, “Redistribution layer routing for integrated fan-out wafer-level chip-scale packages,” to appear in Proceedings of IEEE/ACM International Conference on Computer-Aided Design, November 2016.

[2] B.-Q. Lin, T.-C. Lin, and Y.-W. Chang, “An effective RDL routing algorithm for the InFO WLCSP technology,” to appear in the 27th VLSI Design/CAD Symposium, August 2016.

[3] C.-C. Huang, H.-Y. Lee, B.-Q. Lin, S.-W. Yang, C.-H. Chang, S.-T. Chen, and Y.-W. Chang, “Detailed-routability-driven analytical placement for mixed-size designs with technology and region constraints,” in Proceedings of IEEE/ACM International Conference on Computer-Aided Design, pp. 508–513, November 2015.

[4] C.-C. Huang, H.-Y. Lee, B.-Q. Lin, S.-W. Yang, C.-H. Chang, S.-T. Chen, and Y.-W. Chang, “A detailed-routability-driven analytical placer considering technology and region constraints,” in the 26th VLSI Design/CAD Symposium, August 2015.
