Optimizing Routability and Performance of Placement and Routing Flow for Nanometer Designs


Graduate Institute of Computer Science and Engineering

College of Computer Science

National Chiao Tung University

Doctoral Dissertation


Optimizing Routability and Performance of Placement and Routing Flow for Nanometer Designs

劉文皓

Wen-Hao Liu

Advisor: Yih-Lang Li, Ph.D.

January, 2013


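Abstract (Chinese version)

In recent years, with the advance of nanometer technology, chips contain more and more components, and routing has become increasingly difficult. Chip routability has therefore become an issue of wide concern, and global routing plays an important role in addressing it. In the physical design flow, global routing follows placement and precedes detailed routing. A fast global router can provide congestion information to the placer so that the placer produces a layout that is easier to route. Moreover, if the global router can effectively resolve routing congestion, the burden and runtime of detailed routing can be greatly reduced. When a global router cooperates with a placer, how the global router reports congestion information to the placer quickly and accurately is an important issue. On the other hand, when the global router serves as the guide for the detailed router, the quality of the global routing result becomes especially important. If the detailed router works from a high-quality global routing result, the quality of the detailed routing result improves and the detailed routing time is greatly shortened. This dissertation proposes two global routers: Grace, a fast global router suitable for serving as a congestion estimator, and NCTU-GR 2.0, which produces high-quality routing results with less congestion and shorter wirelength to guide detailed routing. In addition, to apply Grace in industry, we added many features to Grace to meet industrial requirements. Between placement and routing, we combine a placer and a global router into a routability optimizer. Given a placement, the optimizer re-places its cells to make the placement easier to route and thereby obtain better routing results. Finally, before detailed routing, we also propose a 3D routing refiner that further improves the global routing result, which further reduces the runtime and burden of the detailed router.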


Abstract

Routability has become one of the most critical issues in achieving design closure. To address this issue, global routing plays an important role in the placement and routing flow. During the placement stage, a fast global router can serve as a routing congestion estimator that guides placers to improve the routability of placement solutions; however, traditional global routers are too slow to offer quick yet accurate congestion estimation. In the routing stage, the duty of a global router is to identify a global routing result that guides downstream detailed routers. The runtime of the detailed router can be significantly reduced if the global routing result has well-optimized congestion and wirelength.

In this dissertation, two global routing engines are proposed: Grace and NCTU-GR 2.0. Grace is a fast global router that serves as a routing congestion estimator. It adopts the proposed unilateral monotonic routing and hybrid unilateral monotonic routing to replace time-consuming maze routing in its routing flow, and invokes a congestion-aware bounding box expansion scheme that avoids over-expanding the search regions, thereby achieving high speedup. Moreover, to use Grace in the industrial flow, Grace has been enhanced to handle layer directives and scenic constraints, which address timing issues.

The other proposed global router, NCTU-GR 2.0, generates high-quality global routing results to guide the downstream detailed router. The proposed bounded-length maze routing avoids producing redundant detours to save routing resources; the rectilinear Steiner minimum tree aware routing scheme guides NCTU-GR 2.0 to build a routing tree with shorter wirelength for each multi-pin net; and a dynamically adjusted history cost function highlights for NCTU-GR 2.0 which grid edges are critical routing resources, so that these edges can be allocated more carefully to the nets that really need them. Based on these innovations for carefully utilizing routing resources, NCTU-GR 2.0 obtains shorter total wirelength and lower


In addition, between the placement and routing stages, this dissertation presents an incremental place-and-route tool called Ropt that optimizes the routability of a given placement solution. Rather than minimizing HPWL, Ropt directly improves routability by minimizing the routing cost of nets, where the routing cost is defined in terms of global congestion, local congestion, and wirelength. In addition to using NCTU-GR 2.0 to evaluate the routability of the placement solutions, this work also uses Wroute to obtain detailed routing results of the optimized placement solutions to evaluate their real routability.

Finally, the proposed post-3D-global-router, called Post3DGR, further refines the wirelength, congestion, and via count of a given 3D global routing result. Post3DGR consists of a 3D post-routing stage and a negotiation-based layer assignment stage. The 3D post-routing stage adopts an inherited history cost function to guide the routing, which greedily reduces total wirelength and vias. The negotiation-based layer assignment stage reassigns the routing layer of each wire to reduce via count, and it can be extended to consider via overflow and the antenna effect. Considering these issues before detailed routing can ease the effort and runtime of subsequent detailed


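Acknowledgments

For the completion of this dissertation, I first want to thank my parents, Mr. 劉修添 and Ms. 蘇素華. Thank you for your support, which allowed me to pursue the research I love without worry. Your example and your teaching are the main reason I was able to complete my doctoral degree. Thank you. I dedicate this dissertation to my dearest father and mother.

Throughout my doctoral studies, I thank Professor Yih-Lang Li for his careful and patient guidance. How fortunate I am: teachers who pass on knowledge are many, but teachers who shape character are rare. Thank you for teaching me how to do research and how to conduct myself with others. Over these four years under your guidance, I have improved greatly in many respects, especially in writing papers. Four years ago, I could not write even one grammatically correct English sentence, and my papers often had to be heavily revised by you before anyone could understand them. Thank you for spending so much effort correcting my papers and teaching me how to write. Because of your guidance, this dissertation could be completed.

I thank Professor Cheng-Kok Koh for giving me the opportunity to visit Purdue University through a study-abroad program. During my time at Purdue, thank you for caring about my daily life and sharing your life experience with me. When we discussed my research progress, thank you for tolerating my poor English and correcting my pronunciation from time to time. I learned from you research methods different from those I had known, and you broadened my international perspective. I am deeply grateful and fortunate to have been advised by you. While I was in the United States, I thank Dr. Cliff Sze for giving me the opportunity to conduct short-term research at the IBM Austin Lab. The IBM Austin Lab had long been a first-class research institution I aspired to, and I was fortunate to work there with many well-known researchers. During my time at IBM, I learned how industry solves problems, and my English communication skills improved greatly.

There are too many people to thank along the way. I thank my elementary school teacher 林蘭春, my junior high school teacher 張澄仁, and my senior high school teachers 徐英珠 and 黃玉慧. You always forgave and corrected me when I did wrong, and encouraged and pushed me when I was discouraged. Without your teaching, I would not be who I am today. During my doctoral studies, I thank Professor Ting-Chi Wang and Professor Yao-Wen Chang for their care and for sharing their experience. I especially thank Professor Ting-Chi Wang, who often lent me a hand when I ran into difficulties in my research. I also thank my aunt, Ms. 劉虹君, for her care and affection; she played a very important role in my growing up. Finally, I thank my girlfriend, 林珈竹, for accompanying me through my graduate studies and for encouraging me and listening to my complaints when my research met setbacks. The time spent with you made my life more than just code and papers, and you gave my graduate years many beautiful memories. Thank you, and I love you.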


Chapter 1 Introduction

1.1 Overview of this Dissertation

1.2 Background

1.2.1 Building Grid Graph and the Objectives of Global Routing

1.2.2 Net Decomposition

1.2.3 Pattern Routing and Monotonic Routing

1.2.4 Negotiation-based Rip-up and Rerouting (NRR)

1.2.5 Layer Assignment

1.2.6 Comparison of Recent Global Routers

Chapter 2 Grace: A Fast Global-routing-based Routing Congestion Estimator

2.1 Introduction

2.2 Problem Description

2.3 The Proposed Algorithms for Accelerating Routing

2.3.1 Unilateral Monotonic Routing

2.3.2 Hybrid Unilateral Monotonic Routing

2.3.3 Congestion-aware Bounding Box Expansion

2.4 Design Flow of Grace

2.5 Experimental Results

2.5.1 Comparison between NCTU-GR 2.0 and Grace

2.5.2 Effectiveness of the Proposed Algorithms

2.6 Summary

Chapter 3 E-Grace: Enhancing Grace for Practically Industrial Designs

3.2 Preliminaries

3.2.1 Problem Description

3.2.2 Congestion Evaluation Metrics

3.2.3 Previous Works

3.3 Design Flow of E-Grace

3.3.1 Relaxation-Legalization Scenic Controlling Method

3.3.2 TCR-driven R&R Scheme

3.3.3 Throughput Controlling

3.4.1 Compare E-Grace with other industrial RCE tools

3.4.2 Effectiveness of each Innovation in E-Grace

Chapter 4 Ropt: Optimization of Placement Solutions for Routability

4.3 Case Study for Placement Solutions in DAC Contest

4.3.1 Framework for Performing Detailed Routing

4.3.2 Mismatch between Global and Detailed Routability

4.3.3 What Causes Routing Violations

4.4 Proposed Routability Optimizer

4.4.1 Local-Routability-Aware Global Routing Model

4.4.2 Routing-cost-driven Global Re-Placement

4.4.3 Legalization with Global Routing Preserved

4.4.4 Local Detailed Placement

4.5.1 Global Routability: Evaluation by NCTU-GR 2.0

4.5.2 Effective Routability: Evaluation by Wroute

4.5.3 Comparison between Abacus and Our Legalizer

4.6 Summary

Chapter 5 NCTU-GR 2.0: Global Routing with Bounded-Length Maze Routing

5.3 Proposed Approaches to Improve Routing Quality

5.3.1 BLMR

5.3.2 RSMT-Aware Routing Scheme

5.3.3 Dynamically Adjusted History Cost Function

5.4 Design Flow of NCTU-GR 2.0

5.5.1 Comparing Traditional Maze Routings with BLMR

5.5.2 The Effectiveness of RSMT-aware Routing

5.5.3 Comparison of Optimal-BLMR and Heuristic-BLMR

5.5.4 Routing Result Comparison of Sequential Routers

5.6 Summary

Chapter 6 Post3DGR: Post Optimization of 3D Global Routing Results

6.3 Design Flow of Post3DGR

6.4 Negotiation-based Layer Assignment (NLA)

6.4.2 MCSNLA: Minimum-cost Single Net Layer Assignment

6.4.3 Congestion Cost Formulations

6.5.1 Effectiveness of NLA

6.5.2 Effectiveness of Post3DGR

6.5.3 Consideration of Antenna Effect

6.6 Summary

Chapter 7 Conclusions and Future Works

7.1 Conclusions

7.2 Future Works


List of Figures

Fig. 1.1 (a) A placement solution; (b) a global routing result; (c) a detailed routing result of a design.

Fig. 1.2 Routability-driven placement and routing (P&R) flow.

Fig. 1.3 (a) Partitioning a placement into a 3D array of G-cells; (b) modeling the 3D array of G-cells as a grid graph; (c) a typical global routing flow.

Fig. 1.4. Modern 3D global routing flow.

Fig. 1.5. Four-pin net decomposition.

Fig. 2.1. (a) Vertically monotonic routing path; (b) horizontally monotonic routing path; (c) routing path combining vertically and horizontally monotonic routing.

Fig. 2.2. Example of VM routing.

Fig. 2.3. The proposed vertically monotonic routing algorithm.

Fig. 2.4. Pseudo code of the hybrid unilateral monotonic routing algorithm.

Fig. 2.5. Four path types in B with congested regions.

Fig. 2.6. (a) Routing result of 3-bend routing with two overflows; (b) routing result of escaping routing with one overflow; (c) routing result of the proposed HUM routing without overflows.

Fig. 2.7. Example of congestion-aware bounding box expansion.

Fig. 2.8. Design flow of Grace.

Fig. 2.9. Overflow convergence curves of Grace and NCTU-GR 2.0.

Fig. 2.10. Congestion maps of s4_Ripple generated by (a) NCTU-GR 2.0 and (b) Grace.

Fig. 3.1. Congestion maps obtained by different RCEs.

Fig. 3.2. Design flow of E-Grace.

Fig. 3.3. (a) The initial routing result of a net; (b) a segment exhausts all detour quotas, leaving another segment with no detour quota to bypass congested regions; (c) the routing result of the proposed R&R stage; (d) the routing result after the scenic legalization stage.

Fig. 3.4. The routing result obtained by (a) the R&R stage; (b) the first iteration of the soft legalization phase; (c) the second iteration of the soft legalization phase.

Fig. 3.5. Curve of cc(e) in Eq. (3.5).

Fig. 3.6. Congestion maps of Ind4 obtained by (a) CAind, (b) EGall, (c) GRind.

Fig. 3.7. Routing results of Ind11. (a) Minimizing overflows; (b) minimizing congestion ratio to approach 80%; (c) color scheme.

Fig. 3.8. Routing results of Ind11 (a) without extra blockages; (b) with 2% extra blockages; (c) with 5% extra blockages.

Fig. 4.1. Placement solutions of s19 obtained by (a) Ripple; (b) mPL; (c) SimPLR; (d) NTUplace.

Fig. 4.2. The local views of the most congested region in the placement solution of (a) mPL; (b) Ripple; (c) NTUplace.

Fig. 4.3. Design flow of the proposed routability optimizer Ropt.

Fig. 4.4. Pseudo code of the algorithm for the FOPG problem.

Fig. 4.5. An example of the proposed heuristic algorithm for the FOPG problem.

Fig. 4.6. Pseudo code of local detailed placement.

Fig. 5.1. (a) Maze routing within a bounding box; (b) maze routing without a bounding box.

Fig. 5.2. (a) The search region of the net when L is set to 9; (b) two path candidates P1 and P2 from s to v; (c) $ew_k(v, t)$ denotes the estimated wirelength from v to t in iteration k.

Fig. 5.3. Relationship between the routing iteration number and the scaling factor.

Fig. 5.4. Example of RSMT-aware routing scheme.

Fig. 5.5. Design flow of NCTU-GR 2.0.

Fig. 6.1. Gap of the recognition of good results between 2D routing with layer assignment and 3D routing.

Fig. 6.2. MGR flow.

Fig. 6.3. (a) Design flow of Post3DGR. (b) Design flow of NLA.

Fig. 6.4. Example of the quality improvement in Post3DGR.

Fig. 6.5. The comparison between existing layer assignments and NLA.

Fig. 6.6. An example of single net layer assignment.

Fig. 6.7. The pseudo code of MCSNLA.

Fig. 6.8. Procedure InitSol of MCSNLA.

Fig. 6.9. Procedure EnumSol of MCSNLA.

Fig. 6.10. An example of constructing a 3D tree $t_{i,3}$.

Fig. 6.11. An example of EnumSol.

Fig. 6.12. The overflow reduction stage of NLA resolves the overflow (OF) of adaptec2 at the cost of increasing vias.


List of Tables

Table 1.1 Recent global routing researches

Table 1.2 The issues discussed in recent global routing researches

Table 2.1 Comparison of total overflows between NCTU-GR 2.0 and Grace in a given time budget

Table 2.2 Comparison of the routing results between NCTU-GR 2.0 and Grace without time limitation

Table 2.3 The usage of the proposed unilateral monotonic routing and HUM routing algorithms

Table 2.4 Comparison between the congestion-aware box expansion scheme and the traditional scheme

Table 3.1 Design Information

Table 3.2 Routing results comparison between E-Grace, industrial congestion analyzer and real router

Table 3.3 Design Information

Table 3.4 Different versions of E-Grace

Table 3.5 Effectiveness of using relaxation-legalization method to handle the Scenic Constraint

Table 3.6 Route Ind11 with different objectives

Table 3.7 Effectiveness of using throughput controlling method to trade off runtime and routing quality

Table 3.8 Route Ind11 with extra blockages

Table 4.1 Comparing the placement solutions in DAC12 contest based on the DAC12 metric, violations and TLMW

Table 4.2 Benchmarks' information

Table 4.3 Ropt with different features

Table 4.4 Global routing result comparison between NTUplace, Ropt1, Ropt2 and Ropt3

Table 4.5 Detailed routing results of NTUplace

Table 4.6 Detailed routing result comparison between NTUplace, Ropt3, Ropt4, Ropt5 and Ropt6

Table 4.7 Comparing detailed routing results of mPL and Ropt6

Table 4.9 Comparing detailed routing results of SimPLR and Ropt6

Table 4.10 Comparison between the detailed routing results of the placement solutions in DAC12 contest

Table 4.11 Comparison between Abacus and our legalizer

Table 5.1 Main features of modern global routers

Table 5.2 Net ordering methods comparison

Table 5.3 Routing result comparison between maze routing w/ and w/o bounding box and bounded-length maze routing

Table 5.4 Comparison of the routing result of H-BLMR-GR with and without RSMT-aware routing scheme

Table 5.5 Comparison of the routing result of global routers with heuristic-BLMR, Optimal-BLMR and [27]

Table 5.6 Comparison between NCTU-GR 2.0 and the other routers on overflow-free cases

Table 5.7 Comparison between NCTU-GR 2.0 and the other routers on hard-to-route benchmarks

Table 6.1 Comparing NLA with previous layer assignment works on via count minimization problem

Table 6.2 The quality variations of NLA and [17] with different assignment ordering sequences

Table 6.3 The wirelength improvement and runtime of Post3DGR

Table 6.4 Comparing NCTU-3D-GR 2.0 with other 3D routers


Chapter 1 Introduction

1.1 Overview of this Dissertation

As semiconductor technology continues to scale, routing becomes increasingly challenging; the main contributing factors include the large number of metal layers, the wide range of metal thicknesses, and complex design rules. Routability has therefore become a critical issue in the VLSI physical design flow. To address the routability issue, global routing plays an important role because it bridges the gap between placement and detailed routing. In the traditional physical design flow, the global routing stage follows the placement stage and yields a rough routing result for most nets; the detailed routing stage then uses this rough result to complete physical routes for every net and finally realizes a detailed routing result. For example, Figs. 1.1(a), (b), and (c) show a placement solution, a global routing result, and a detailed routing result of a design, respectively, in which the gray rectangles denote macros, the small rectangles denote cells, and the red circles denote the pins of a net. Figure 1.1 illustrates that global routing identifies a set of routing regions through which the net should pass, and detailed routing finds a physical routing path within these regions.

Fig. 1.1 (a) A placement solution; (b) a global routing result; (c) a detailed routing result of a design.

In this dissertation, a routability-driven placement and routing (P&R) flow (Fig. 1.2) is presented based on the proposed global routing engines and post-placement routability optimizer. The proposed global routing engines not only cooperate with placers to obtain placement solutions with better routability, but also yield high-quality global routing results that detailed routers can follow to generate good detailed routing results. Compared with the traditional placement and routing flow, the flow shown in Fig. 1.2 pays more attention to the interaction between placement and routing and optimizes the main factors that influence routability, such as global congestion, local congestion, wirelength, and via count. As a result, congestion problems can be detected and fixed in the early stages, which contributes to faster design closure. The red boxes in Fig. 1.2 highlight the contributions of this dissertation to the routability issue, which are introduced in the following four paragraphs.

To avoid wasting time on routing unroutable designs, a routing congestion estimator (RCE) can help designers quickly judge whether a design is routable in the early stages, which speeds up design closure. An RCE can also cooperate with placers to optimize routability. Thus, this dissertation presents a global-routing-based RCE that cooperates with placers to improve routability. A global-routing-based RCE can offer more accurate routing congestion estimation than probabilistic RCEs [35, 36] because it better captures actual routing behavior. However, a global-routing-based RCE is typically slower than a probabilistic one. Because an RCE may be launched frequently during the placement stage, placement slows down if the RCE is not fast enough. Accordingly, the objective of a global-routing-based RCE is to identify an accurate congestion map as fast as possible. This dissertation presents an accurate and fast global-routing-based RCE called Grace. In addition to testing Grace on academic benchmarks, we also enhance Grace to fulfill industrial requirements and then apply the enhanced Grace in the industrial design flow.

Fig. 1.2 Routability-driven placement and routing (P&R) flow.

In the flow of Fig. 1.2, the RCE reports the congestion information of a placement solution to the placer, and the placer then moves cells based on the congestion map to improve the routability of the placement solution. However, as cells move, the congestion map also changes, which reduces the effectiveness of this routability improvement. To resolve this problem, we develop a routability optimizer, Ropt, that takes a placement solution and optimizes its routability by incremental place-and-route. Ropt always maintains a global routing instance based on the current placement solution. The global routing instance is built on a local-routability-aware model, so it provides both global and local congestion information to guide the placement algorithms. In addition, the placement algorithms in Ropt invoke a global routing engine to decide the locations of movable cells.

After a placement solution is optimized by Ropt, the proposed global router NCTU-GR 2.0 is used to identify a high-quality global routing result for the placement solution, which provides a good blueprint for detailed routing. Generally, the detailed routing stage takes hundreds or even thousands of times longer than the global routing stage. Good global routing results can reduce the detailed routing time and significantly improve the final interconnection quality. Because the quality of a global routing result is measured by the total overflow and total wirelength, minimizing overflow and wirelength is the major objective of global routing research [3-21], where overflow means that the routing demand of a region exceeds its routing capacity. Compared with other state-of-the-art global routers [6, 11, 16, 17], the proposed NCTU-GR 2.0 obtains global routing results with fewer overflows and shorter wirelength. Note that although Grace can be regarded as a lightweight global router, the algorithms used in Grace and NCTU-GR 2.0 differ substantially because they serve different purposes. Chapter 5 details the algorithmic differences between designing a global-routing-based RCE and designing a global router.

As semiconductor technology scales, the number of metal layers keeps increasing. Thus, the layer planning of routing wires affects the via count, timing, and many manufacturing issues such as the antenna effect and double vias. However, the typical global routing stage ignores these issues and leaves them to detailed routing, which may make detailed routers struggle with them. Accordingly, we develop a post-3D-global-routing tool, Post3DGR, which runs between global routing and detailed routing to refine a given 3D global routing result; this eases the effort of detailed routers and speeds up design closure. The proposed Post3DGR reduces the vias, congestion, and wirelength of a given 3D global routing result by rerouting nets and re-planning the layers of wires. With some modifications, Post3DGR can also take the antenna effect into account.

The rest of this dissertation is organized as follows. The remainder of Chapter 1 introduces the problem formulation and background of global routing. Chapter 2 presents a fast global-routing-based RCE called Grace, whose goal is to identify a satisfactory global routing result that predicts routing congestion as fast as possible. Chapter 3 presents an enhanced Grace applied in the industrial flow that considers timing, local congestion, and a target congestion ratio. Chapter 4 introduces the proposed incremental place-and-route tool Ropt, which optimizes the routability of a given placement solution. Chapter 5 presents a global router called NCTU-GR 2.0, whose objective is to obtain high-quality global routing results in a reasonable runtime to guide detailed routers. The post-3D-global-routing tool Post3DGR is detailed in Chapter 6. Finally, Chapter 7 draws conclusions.

1.2 Background

In the global routing problem, the given placement solution is typically partitioned into a three-dimensional (3D) array of global cells (Fig. 1.3(a)), and the array of global cells is then modeled as a 3D grid graph (Fig. 1.3(b)). Generally, there are two strategies for solving the global routing problem on the 3D grid graph. The first performs global routing directly on the 3D grid graph [3-6]. Although routing directly on a 3D grid graph may achieve a better result, it is time-consuming. Thus, the mainstream approach first condenses the 3D grid graph into a 2D grid graph, then performs 2D global routing, and finally assigns each 2D routing wire to the corresponding metal layers to obtain a 3D global routing result [7-21]. Figure 1.3(c) shows the general flow adopted by most global routers to tackle the 3D global routing problem; the function of each stage is detailed in the following subsections.

1.2.1 Building Grid Graph and the Objectives of Global Routing

In the grid graph, each grid node corresponds to a global cell (G-cell), and each grid edge corresponds to the boundary between two abutting G-cells in the same layer. Meanwhile, each via edge connects two abutting G-cells in two adjacent layers. The number of routing tracks that can cross the abutting boundary is defined as the capacity c(e) of grid edge e, and the number of wires that pass through e is called its demand d(e). The overflow of grid edge e is defined as max(d(e) - c(e), 0); the total overflow is the sum of the overflows of all grid edges, and the maximum overflow is the largest overflow among all grid edges. For simplicity, the capacity of each via edge is unlimited, an assumption also adopted in most global routing research [3-21]. Given the pin locations of each net on the grid graph, the objective of the global routing problem is to identify a highly routable global path connecting the pins of each net. The quality of a global routing result is generally measured by the total overflow and the total wirelength.
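To make these definitions concrete, the following Python sketch computes d(e), the total overflow, the maximum overflow, and a grid-edge wirelength from a set of routed 2D paths. It is an illustrative sketch under stated assumptions, not code from any router discussed here: G-cells are (x, y) tuples, a grid edge is a pair of abutting G-cells, via edges are ignored (their capacity is unlimited, as stated above), and the helper names `norm`, `edge_demands`, and `overflow_report` are hypothetical.

```python
from collections import Counter

Cell = tuple[int, int]      # (x, y) of a G-cell in the 2D grid graph
Edge = tuple[Cell, Cell]    # grid edge between two abutting G-cells


def norm(e: Edge) -> Edge:
    """Store an edge with sorted endpoints so (a, b) and (b, a) are the same key."""
    a, b = e
    return (a, b) if a <= b else (b, a)


def edge_demands(paths: list[list[Edge]]) -> Counter:
    """d(e): number of nets whose global path passes through grid edge e."""
    demand: Counter = Counter()
    for path in paths:
        for e in {norm(edge) for edge in path}:  # count each net once per edge
            demand[e] += 1
    return demand


def overflow_report(demand: Counter, capacity: dict[Edge, int]) -> tuple[int, int, int]:
    """Return (total overflow, maximum overflow, wirelength in grid edges)."""
    overflows = [max(demand[norm(e)] - c, 0) for e, c in capacity.items()]
    wirelength = sum(demand.values())  # one unit per net per used grid edge
    return sum(overflows), max(overflows, default=0), wirelength


# Three G-cells in a row; the left edge has capacity 1, the right edge capacity 2.
e1, e2 = ((0, 0), (1, 0)), ((1, 0), (2, 0))
capacity = {e1: 1, e2: 2}
paths = [[e1, e2], [e1], [((2, 0), (1, 0))]]  # the third net lists e2 reversed
print(overflow_report(edge_demands(paths), capacity))  # (1, 1, 4)
```

Here e1 carries two nets against a capacity of 1, so it contributes the only unit of overflow.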

Fig. 1.3 (a) Partitioning a placement into a 3D array of G-cells; (b) modeling the 3D array of G-cells as a grid graph; (c) a typical global routing flow.

Figure 1.4 shows how the capacity of 2D grid edges is computed in the mainstream flow of 2D global routing with layer assignment, in which the numbers next to the 3D grid edges denote their capacities. After the 3D grid graph is compacted into a 2D grid graph, the capacity of a 2D grid edge is obtained by adding up the capacities of its corresponding 3D grid edges.
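A minimal sketch of this 3D-to-2D capacity compaction follows, assuming that 3D capacities are stored in a dictionary keyed by (layer index, 2D edge id). The edge ids and capacity values are illustrative only; a layer whose preferred direction is perpendicular to an edge is assumed to contribute zero tracks to it.

```python
from collections import defaultdict


def compact_to_2d(cap3d: dict[tuple[int, str], int]) -> dict[str, int]:
    """Sum the capacities of all 3D grid edges that project onto the same 2D edge.

    cap3d maps (layer index, 2D edge id) -> capacity of that 3D grid edge.
    """
    cap2d = defaultdict(int)
    for (_layer, edge_id), cap in cap3d.items():
        cap2d[edge_id] += cap
    return dict(cap2d)


# Illustrative values for two horizontal 2D edges over three layers; layer 2 is
# assumed to be a vertical layer, so it offers no tracks to horizontal edges.
cap3d = {
    (1, "ea"): 10, (2, "ea"): 0, (3, "ea"): 5,
    (1, "eb"): 1, (2, "eb"): 0, (3, "eb"): 10,
}
print(compact_to_2d(cap3d))  # {'ea': 15, 'eb': 11}
```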

1.2.2 Net Decomposition

Most global routers decompose each multi-pin net into two-pin subnets, because net decomposition simplifies a multi-terminal routing problem into two-terminal routing problems. Before the routing stages, rectilinear Steiner minimal tree (RSMT) or rectilinear minimum spanning tree (RMST) construction algorithms are commonly used to generate the initial topology of each multi-pin net, and each multi-pin net is then decomposed into two-pin subnets based on its topology. For example, Figs. 1.5(a) and 1.5(b) show the initial topologies of a four-pin net generated by RSMT and RMST, respectively, in which the green rectangle denotes a Steiner point; the topologies in Figs. 1.5(a) and 1.5(b) can be decomposed into four and three two-pin subnets, respectively. Because an RSMT has a shorter wirelength than an RMST, net decomposition by RSMT is popular in the literature. FLUTE [23] is a very fast and accurate RSMT construction tool that is widely used by modern global routers. FLUTE not only quickly constructs a good RSMT for a multi-pin net, but also obtains optimal RSMTs for nets with nine or fewer pins. However, FGR [3] indicates that an RSMT has less routing flexibility than an RMST because it contains Steiner points and generates more flat segments, and that the data structure of RSMTs is more complex than that of RMSTs. In contrast, with an RMST, each subnet can simply be routed with pattern routing or monotonic routing to avoid congested regions. Considering both wirelength and routing flexibility, an ideal topology is an RMST that encourages multiple two-pin routings to merge, so that multiple paths pass through the same grid edges (Fig. 1.5(c)). Such a solution avoids congested regions while using a shorter total wirelength than an RMST that does not encourage joint wires. However, identifying an RMST with joint wires is challenging.
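RMST-based decomposition can be sketched with Prim's algorithm under the Manhattan distance: each pin attached to the growing tree yields one two-pin subnet, so an n-pin net gives n - 1 subnets, consistent with the three subnets described for Fig. 1.5(b). This is a generic O(n²) sketch for illustration; it does not reproduce FLUTE [23] or the decomposition used in FGR [3], and the function names are hypothetical.

```python
Point = tuple[int, int]


def manhattan(a: Point, b: Point) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def rmst_subnets(pins: list[Point]) -> list[tuple[Point, Point]]:
    """Decompose a multi-pin net into two-pin subnets along an RMST (Prim's algorithm)."""
    n = len(pins)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [float("inf")] * n   # cheapest known connection of each pin to the tree
    parent = [-1] * n
    best[0] = 0
    subnets = []
    for _ in range(n):
        # Attach the closest pin that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            subnets.append((pins[parent[u]], pins[u]))
        # Update the connection costs of the remaining pins.
        for v in range(n):
            d = manhattan(pins[u], pins[v])
            if not in_tree[v] and d < best[v]:
                best[v], parent[v] = d, u
    return subnets


pins = [(0, 0), (4, 0), (4, 3), (0, 3)]
subnets = rmst_subnets(pins)
print(len(subnets), sum(manhattan(a, b) for a, b in subnets))  # 3 10
```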

1.2.3 Pattern Routing and Monotonic Routing

Pattern routing adopts specific routing patterns to connect two pins; the most common patterns are L-shaped and Z-shaped. The main advantage of pattern routing is that it completes the path search in a very short time, but its solution space is very small. To bridge the large performance gap between pattern routing and maze routing, Pan et al. [14] present monotonic routing to enrich the solution space. Monotonic routing uses a dynamic-programming technique to identify a routing path from the source to the target without any detour. The time complexity of monotonic routing on an m × n grid graph is O(mn), which is the same as that of Z-shaped pattern routing.
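The dynamic-programming idea behind monotonic routing can be sketched as follows. This minimal sketch assumes that the source is G-cell (0, 0) and the target is (m-1, n-1), so a monotonic path only moves in the +x or +y direction (other source/target placements could be handled by mirroring the grid); the per-edge costs `h_cost` and `v_cost` are hypothetical stand-ins for whatever routing cost a router uses, and the exact formulation of Pan et al. [14] may differ. Each G-cell is relaxed once from its left and lower neighbors, which gives the O(mn) runtime.

```python
def monotonic_route(h_cost, v_cost, m, n):
    """Minimum-cost monotonic path from G-cell (0, 0) to G-cell (m-1, n-1).

    h_cost[x][y]: cost of the edge (x, y)-(x+1, y); shape (m-1) x n.
    v_cost[x][y]: cost of the edge (x, y)-(x, y+1); shape m x (n-1).
    Returns (path cost, list of G-cells on the path).
    """
    INF = float("inf")
    cost = [[INF] * n for _ in range(m)]
    prev = [[None] * n for _ in range(m)]
    cost[0][0] = 0
    # Sweep x, then y: the left and lower neighbors of (x, y) are final first.
    for x in range(m):
        for y in range(n):
            if x > 0 and cost[x - 1][y] + h_cost[x - 1][y] < cost[x][y]:
                cost[x][y] = cost[x - 1][y] + h_cost[x - 1][y]
                prev[x][y] = (x - 1, y)
            if y > 0 and cost[x][y - 1] + v_cost[x][y - 1] < cost[x][y]:
                cost[x][y] = cost[x][y - 1] + v_cost[x][y - 1]
                prev[x][y] = (x, y - 1)
    # Backtrack from the target to recover the path.
    path = [(m - 1, n - 1)]
    while path[-1] != (0, 0):
        x, y = path[-1]
        path.append(prev[x][y])
    return cost[m - 1][n - 1], path[::-1]


m, n = 3, 3
h_cost = [[1] * n for _ in range(m - 1)]
v_cost = [[1] * (n - 1) for _ in range(m)]
h_cost[0][0] = 10  # a congested edge between (0, 0) and (1, 0)
print(monotonic_route(h_cost, v_cost, m, n))
# (4, [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)])
```

The returned path still has the minimum possible length (four grid edges) but steers around the expensive edge, which is the behavior monotonic routing adds over a fixed L- or Z-pattern.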

Fig. 1.5. Four-pin net decomposition by (a) RSMT; (b) RMST; (c) RMST with a joint wire, where subnets n1 and n2 share a joint wire.

1.2.4 Negotiation-based Rip-up and Rerouting (NRR)

Rip-up and rerouting is widely used in both global and detailed routing. Given an illegal routing solution, rip-up and rerouting iteratively removes the nets with violations and reroutes them sequentially to eliminate the violations. In the global routing problem, a violation occurs when an overflow is produced. In modern global routers, the negotiation technique proposed in PathFinder [22] is widely combined with rip-up and rerouting (NRR) to improve overflow removal. The main idea of NRR is to increase, in the current iteration, the penalty of each grid edge that overflowed in the previous iteration, so that path searching tends to avoid previously overflowed grid edges. PathFinder [22] formulates the negotiation-based routing cost of a grid edge e as follows:

$$c_e = (b_e + h_e) \times p_e, \tag{1.1}$$

where $c_e$ represents the routing cost of e, $b_e$ denotes the base cost, $h_e$ denotes the history cost, and $p_e$ denotes the congestion penalty. The history cost $h_e$ increases as overflow occurs. The value of $h_e$ in the (k+1)-th iteration is given by

$$h_e^{k+1} = \begin{cases} h_e^{k} + h_{inc}, & \text{if } e \text{ is overflowed} \\ h_e^{k}, & \text{otherwise} \end{cases} \tag{1.2}$$

where $h_e^1 = 1$, $h_{inc}$ is a constant, and $h_e^k$ is updated in every iteration. In addition, FGR [3] presents another formula that preserves the base cost:

$$c_e = b_e + h_e \times p_e. \tag{1.3}$$

Several variations of negotiation-based cost functions have been discussed in [10-12, 16-17].
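The following sketch evaluates Eqs. (1.1) to (1.3) for a single grid edge across a few NRR iterations. The base cost $b_e = 1$ and the congestion penalty $p_e$ used here are assumptions for illustration only, since the text does not fix a formula for $p_e$; the history update follows Eq. (1.2) with $h_e^1 = 1$. The class `GridEdge` and its methods are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class GridEdge:
    capacity: int
    demand: int = 0
    base: float = 1.0     # b_e (assumed constant)
    history: float = 1.0  # h_e, starting from h_e^1 = 1

    def overflowed(self) -> bool:
        return self.demand > self.capacity

    def penalty(self) -> float:
        # Hypothetical p_e: 1 while one more wire still fits, growing linearly beyond.
        return 1.0 + max(0, self.demand + 1 - self.capacity)

    def update_history(self, h_inc: float) -> None:
        # Eq. (1.2): add h_inc only if the edge is overflowed in this iteration.
        if self.overflowed():
            self.history += h_inc

    def pathfinder_cost(self) -> float:
        return (self.base + self.history) * self.penalty()  # Eq. (1.1)

    def fgr_cost(self) -> float:
        return self.base + self.history * self.penalty()    # Eq. (1.3)


hot, cool = GridEdge(capacity=2, demand=3), GridEdge(capacity=4, demand=1)
for _ in range(3):  # three NRR iterations in which the demands stay unchanged
    hot.update_history(h_inc=0.5)
    cool.update_history(h_inc=0.5)
print(hot.history, hot.pathfinder_cost(), hot.fgr_cost())     # 2.5 10.5 8.5
print(cool.history, cool.pathfinder_cost(), cool.fgr_cost())  # 1.0 2.0 2.0
```

The repeatedly overflowed edge accumulates history cost, so a path search using either formula becomes increasingly reluctant to pass through it, while the uncongested edge keeps its initial cost. Note that Eq. (1.3) keeps $b_e$ outside the product, so the base cost is not amplified by the penalty.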

1.2.5 Layer Assignment

The goal of layer assignment in global routing is to translate a 2D global routing result into a 3D result that minimizes the number of vias without changing the routing topology or increasing any overflow; this is called the congestion-constrained layer assignment problem. Congestion-constrained layer assignment for via minimization has been proven NP-complete [34] and has been extensively studied. BoxRouter 2.0 [9] adopted integer linear programming for via count minimization. FGR [3] greedily assigned net edges to the corresponding metal layers by heuristics. Lee et al. proposed


order at first and then assigning each net to the appropriate layer by a dynamic-programming technique. FastRoute 4.0 [16] decomposes multi-pin nets into two-pin nets and then uses a dynamic-programming algorithm to assign the two-pin nets one by one. Dai et al. [17] presented a congestion-relaxed layer assignment with a layer-shifting algorithm, followed by net rip-up and reassignment to further reduce the number of vias. In addition, some researchers have extended the layer assignment problem to consider via overflow [28-29], double patterning [30], timing [31], and the antenna effect [32-33].
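As an illustration of dynamic-programming layer assignment, the sketch below assigns layers to the segments of one two-pin 2D path so that the via count is minimized, while each segment only uses a layer with the matching preferred direction and free capacity. It is a simplified model under stated assumptions, not the exact formulation of [16] or [34]: both pins are assumed to sit on the lowest layer, switching between layers a and b is assumed to cost |a - b| stacked vias, and capacity is abstracted into a hypothetical `free(i, l)` predicate.

```python
def assign_layers(segments, layer_dirs, free, pin_layer=0):
    """Via-minimizing layer assignment for the segments of one two-pin path.

    segments:   preferred direction ('H' or 'V') of each segment, in path order.
    layer_dirs: preferred direction of each metal layer, lowest layer first.
    free(i, l): hypothetical predicate, True if segment i fits on layer l
                without creating overflow.
    Returns (via count, layer of each segment), or None if infeasible.
    """
    INF = float("inf")
    L = len(layer_dirs)
    cost = [[INF] * L for _ in segments]  # cost[i][l]: min vias with segment i on l
    prev = [[-1] * L for _ in segments]
    for i, d in enumerate(segments):
        for l in range(L):
            if layer_dirs[l] != d or not free(i, l):
                continue
            if i == 0:
                cost[i][l] = abs(l - pin_layer)  # vias from the source pin
                continue
            for k in range(L):
                c = cost[i - 1][k] + abs(l - k)
                if c < cost[i][l]:
                    cost[i][l], prev[i][l] = c, k
    last = len(segments) - 1
    end = min(range(L), key=lambda l: cost[last][l] + abs(l - pin_layer))
    total = cost[last][end] + abs(end - pin_layer)  # vias down to the sink pin
    if total == INF:
        return None
    layers = [end]
    for i in range(last, 0, -1):  # follow back-pointers to segment 0
        layers.append(prev[i][layers[-1]])
    return total, layers[::-1]


dirs = ["H", "V", "H", "V"]   # layers 0..3 with alternating preferred directions
zpath = ["H", "V", "H"]       # a Z-shaped 2D path
print(assign_layers(zpath, dirs, lambda i, l: True))                     # (2, [0, 1, 0])
print(assign_layers(zpath, dirs, lambda i, l: not (i == 1 and l == 1)))  # (6, [0, 3, 0])
```

In the second call the vertical segment cannot use layer 1, so it is lifted to layer 3 at the price of extra vias, showing how congestion constraints and via count trade off in this model.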

1.2.6 Comparison of Recent Global Routers

Table 1.1 lists the well-known global routers developed in the past six years. Although most global routers in Table 1.1 are based on the global routing flow shown in Fig. 1.3(c), they take different approaches to several issues. Table 1.2 shows the issues that are widely discussed in recent global routing research. For instance, the routers in [3, 7, 71] use an RMST as the initial tree topology of each net, while the routers in [16, 71] use an RSMT; NTHU-Route [11] reroutes the nets in uncongested regions earlier, while the routers in [3, 12, 17] reroute the nets in congested regions earlier; Box-Router [9, 70] rips up a set of nets and then reroutes them one by one, while the routers in [4, 7, 11, 17] rip up a net and then reroute it immediately. On the parallel routing issue, GRIP [4, 5] parallelizes global routing in a cluster computing environment, NCTU-GR [18, 71] runs on a many-core server, and the router in [19] runs on a hybrid GPU-CPU platform.

TABLE 1.2 THE ISSUES DISCUSSED IN RECENT GLOBAL ROUTING RESEARCH

Issue                          References
Net decomposition              [3, 7, 11, 16, 71]
Routing algorithms             [6, 8, 10, 12, 14, 16, 71]
Net routing order              [3, 11, 12, 17]
Layer assignment approaches    [3, 9, 16, 17]
Rip-up and rerouting scheme    [4, 7, 9, 11, 17, 70]
Routing cost formulation       [3, 7, 10, 11, 12, 13, 16, 17, 69, 71]
Multi-threaded routing         [4, 5, 19, 71]

NTHU-Route [69, 11] FastRoute [13-16] FGR [3, 7] MGR [6]

NTUgr [12] Box-Router 2.0 [70, 9] NCTU-GR [17, 18, 71] Archer [10]

GRIP [4, 5] HybridGR [19] Maize-Router [8]


Chapter 2 Grace: A Fast Global-routing-based Routing Congestion Estimator

2.1 Introduction

Routability is a primary concern in nanometer-scale design. Considering routability during the placement stage can avoid generating an unroutable design. Routability-driven placers generally adopt one of two strategies to estimate congested regions (hot spots). The first, probabilistic strategy estimates the routing congestion of a region from the pin density and the nets' bounding boxes or Steiner trees [35, 36]. Although fast, this method typically fails to capture actual routing behavior and therefore has low estimation accuracy. The second strategy performs global routing to analyze routing congestion [37]. This method identifies congestion more precisely, but it is markedly slower than the probabilistic one. Among modern routability-driven placers, Ripple [38], NTUplace [39], and the placers in [40, 41] used the former strategy, whereas SimPLR [42], IPR [43], CRISP [44], and GRplacer [45] adopted the latter. When such built-in global routers are used, trading routing quality for better runtime is inevitable.

Maze routing with an A* search scheme is the indispensable kernel algorithm of state-of-the-art global routers [3-19]. For hard-to-route benchmarks, these routers attempt to eliminate overflows by iteratively ripping up overflowed nets and rerouting them with maze routing. However, maze routing is slower than other routing algorithms, such as pattern routing and monotonic routing. Several works have attempted to reduce runtime by developing alternative routing algorithms that lower the frequency of invoking maze routing. For instance, Archer [10] developed U-shaped pattern routing, NTUgr [12] presented escaping routing, and FastRoute 4.0 [16] developed 3-bend routing. These routing algorithms run faster than maze routing within a quite limited


identify better routes. Consequently, maze routing still consumes the majority of the runtime in the

entire routing flow.

This work presents an extremely fast global router, called Grace, which avoids maze routing entirely to achieve a high speedup and can therefore serve as an ideal built-in routing congestion estimator for placers. Its main features are as follows.

(a) This work presents two efficient routing algorithms, called unilateral monotonic routing and hybrid unilateral monotonic (HUM) routing. HUM routing can identify a better routing path than U-shaped pattern routing, 3-bend routing, and escaping routing, while its time complexity is the same as that of these three approaches, namely linear in the size of the routing region.

(b) Many routers adopt bounding boxes to limit the search region of routing, so the bounding box size affects both routing quality and runtime. This work presents an efficient congestion-aware bounding box expansion scheme, which improves the runtime of the proposed router by 50% compared with routing without this scheme.

(c) The proposed router relies on HUM routing to eliminate overflows without invoking maze routing. Experimental results indicate that it achieves routing quality similar to that of our maze-routing-based router NCTU-GR 2.0 [18], while running up to 26 times faster than [18] on large benchmarks.

The rest of this chapter is organized as follows. Section 2.2 introduces the global routing problem and the research objective. Section 2.3 presents the proposed unilateral monotonic routing and HUM routing algorithms and the congestion-aware bounding box expansion scheme. Section 2.4 describes the design flow of the proposed global router. Section 2.5 summarizes the experimental results, and Section 2.6 draws conclusions.

2.2 Problem Description

Global routing is formulated as the routing problem on a grid graph G(V, E), where V denotes the set


related G-cells to its two end nodes. The capacity c(e) of a grid edge e indicates the number of routing tracks that can legally cross the abutting boundary. The number of wires passing through grid edge e is called the demand of e and is denoted d(e). The total overflow is the sum of the overflows on all edges of E, where the overflow of grid edge e is defined as follows.

$$\mathrm{overflow}(e) = \begin{cases} \big(d(e) - c(e)\big) \cdot (w_L + s_L), & \text{if } d(e) > c(e) \\ 0, & \text{otherwise} \end{cases} \qquad (2.1)$$

where w_L and s_L denote the minimum wire width and minimum wire spacing, respectively, of the layer L to which e belongs. In modern designs, higher layers have larger wire widths and spacings.
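As a small worked illustration of Eq. (2.1), the Python sketch below computes the overflow of each grid edge and the total overflow. The `Edge` record and the function names are hypothetical; the code only evaluates (d(e) - c(e)) · (w_L + s_L) on overflowed edges and sums the results.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    demand: int      # d(e): number of wires passing through e
    capacity: int    # c(e): number of legal routing tracks across e
    width: float     # w_L: minimum wire width of the layer of e
    spacing: float   # s_L: minimum wire spacing of the layer of e


def edge_overflow(e: Edge) -> float:
    """Eq. (2.1): excess wires weighted by w_L + s_L; zero within capacity."""
    if e.demand > e.capacity:
        return (e.demand - e.capacity) * (e.width + e.spacing)
    return 0.0


def total_overflow(edges) -> float:
    """Total overflow: the sum of Eq. (2.1) over all grid edges."""
    return sum(edge_overflow(e) for e in edges)


# Usage: a lower-layer edge two wires over capacity, and an upper-layer edge within capacity.
edges = [Edge(5, 3, 1.0, 1.0), Edge(2, 3, 2.0, 2.0)]
print(total_overflow(edges))  # 4.0
```

Weighting by w_L + s_L makes one excess wire on a wide upper layer count more than one excess wire on a thin lower layer.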

Conventionally, when a global router provides global paths to guide the detailed routing of each net, minimizing overflow and wirelength takes priority over reducing runtime. However, when a global router serves as a congestion estimator, runtime becomes more critical because the estimator has to report congestion information to the placer within a limited time budget (e.g., around 1 to 5 minutes). Accordingly, this work focuses on completing global routing within such a limited time budget.

2.3 The Proposed Algorithms for Accelerating Routing

Fig. 2.1. (a) Vertically monotonic routing path; (b) horizontally monotonic routing path; (c) routing path combining vertically and horizontally monotonic routing.

Although monotonic routing identifies a detour-free path efficiently, it cannot replace maze routing when a detoured path is required to avoid obstacles or congested regions. A detour is a move away from the target. To approach the behavior of maze routing, we develop an extremely fast routing algorithm, called unilateral monotonic routing, which can find a detoured path while running in the same time complexity as monotonic routing. Unilateral monotonic routing identifies a least-cost routing path within a limited region using the minimal horizontal or vertical distance. Two types of unilateral monotonic routing are defined as follows.

Definition. Horizontally/Vertically monotonic (HM/VM) routing identifies the least-cost routing path

from the source to the target using minimal horizontal/vertical distance.

In an HM/VM routing path, detours occur only in vertical/horizontal moves. Figures 2.1(a) and 2.1(b) illustrate a VM routing path and an HM routing path, respectively, in which the gray rectangles represent congested regions. Although the solution space of HM or VM routing alone is smaller than that of maze routing, invoking HM and VM routing in combination enlarges the solution space significantly. Figure 2.1(c) depicts an example of invoking HM and VM routing in succession: the path consists of an HM routing path from s to an internal node u and a VM routing path from t to u.

2.3.1 Unilateral Monotonic Routing

Without loss of generality, the proposed unilateral monotonic routing is introduced using the example of VM routing shown in Fig. 2.2. At the beginning of VM routing, the congestion map (Fig. 2.2(a)) is converted into the global routing model (Fig. 2.2(b)), in which congestion is expressed as the routing cost of each grid edge. A window B is then given that encloses the source and target; its height equals the vertical distance between the source and target, and its width is larger than the horizontal distance between them (Fig. 2.2(b)). The window size determines the runtime and routing quality of unilateral monotonic routing; Section 2.3.3 describes how the window size is determined. Figure 2.3 shows the pseudo code of the VM routing algorithm, in which source s and target t are located at (x1, y1) and (x2, y2), respectively; B.l and B.r represent the left and right borders of window B, respectively; cost(v, u) denotes the routing cost of grid edge e(v, u); d(u) refers to the least cost of the VM routing path within B from s to u; and π(u) is the predecessor of u. The


bottom row, i.e., the row to which the start node belongs. The second stage computes the d(u) values of the nodes in all other rows, proceeding upward from the row above the bottom row to the top row. The first stage is a simple sequential scan from the start node toward the left and right boundaries of B. In the second stage, a two-phase flow based on dynamic programming computes the d(u) value of each node row by row. The first phase determines, for every node, the least-cost VM path from the start node that enters the node from its left or bottom side, while the second phase determines the least-cost VM path from the start node that enters the node from its right side. Through these two phases, the least-cost VM path from the start node to every node within B is identified.

Fig. 2.2. Example of VM routing. (a) A congestion map; (b) the routing model of (a), where dotted lines denote grid edges; (c) the predecessor of each node u in the row with y-coordinate y1 after d(u) is obtained, where the arrow at each node points to its predecessor; (d) the predecessor of each node u in the row with y-coordinate y1+1 after lclb(u) is obtained; (e) the predecessor of each node u in the row with y-coordinate y1+1 after d(u) is obtained; (f) the routing result of VM routing.

Algorithm Vertically monotonic routing
Input: source s(x1, y1), target t(x2, y2), bounding box B, cost array d
1.  d(s) = 0, π(s) = null;
2.  for x = x1-1 downto B.l
3.      u = (x, y1), v = (x+1, y1);
4.      d(u) = d(v) + cost(v, u), π(u) = v;
5.  end for
6.  for x = x1+1 to B.r
7.      u = (x, y1), v = (x-1, y1);
8.      d(u) = d(v) + cost(v, u), π(u) = v;
9.  end for
10. for y = y1+1 to y2
11.     u = (B.l, y), v = (B.l, y-1);
12.     lclb(u) = d(v) + cost(v, u), π(u) = v;
13.     for x = B.l+1 to B.r
14.         u = (x, y), v1 = (x-1, y), v2 = (x, y-1);
15.         if lclb(v1) + cost(v1, u) < d(v2) + cost(v2, u)
16.             lclb(u) = lclb(v1) + cost(v1, u), π(u) = v1;
17.         else
18.             lclb(u) = d(v2) + cost(v2, u), π(u) = v2;
19.     end for
20.     u = (B.r, y), d(u) = lclb(u);
21.     for x = B.r-1 downto B.l
22.         u = (x, y), v3 = (x+1, y), d(u) = lclb(u);
23.         if d(v3) + cost(v3, u) < d(u)
24.             d(u) = d(v3) + cost(v3, u), π(u) = v3;
25.     end for
26. end for
Fig. 2.3. The pseudo code of the vertically monotonic routing algorithm.

Upon commencement of the second stage, the d(v) value of each node v = (i, y1) with B.l ≤ i ≤ B.r has been identified. Assuming that node u is located at (i, y1+1), lclb(u) is the least cost among all VM routing paths from s to u in which the predecessor of u is on its left or bottom side; in the first phase it is obtained by the following equation:

$$lc_{lb}(u) = \begin{cases} d(v_2) + \mathrm{cost}(v_2, u), & \text{if } u \text{ is on the left boundary of } B \\ \min\big(lc_{lb}(v_1) + \mathrm{cost}(v_1, u),\ d(v_2) + \mathrm{cost}(v_2, u)\big), & \text{otherwise} \end{cases} \qquad (2.2)$$

where v1 and v2 represent the left and bottom adjacent nodes of u, respectively. During the second phase, the least cost of the VM paths that reach node u from its right side (denoted by lcr(u)) is computed, and then the least cost of the VM path from the start node to u is determined by the following equation:

$$d(u) = \begin{cases} lc_{lb}(u), & \text{if } u \text{ is on the right boundary of } B \\ \min\big(lc_{lb}(u),\ lc_r(u)\big), & \text{otherwise} \end{cases}, \qquad lc_r(u) = d(v_3) + \mathrm{cost}(v_3, u) \qquad (2.3)$$

where v3 represents the right adjacent node of u. If u is on the right boundary of B, the predecessor of u must be on its left or bottom side; thus, d(u) equals lclb(u). As each node u is examined sequentially from right to left in the second phase, the least-cost VM path from the start node to u is determined by Eq. (2.3).
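To make the two-stage scan concrete, the following Python sketch follows Fig. 2.3 line by line for the upward case (y2 ≥ y1). It is an illustration under stated assumptions rather than the dissertation's implementation: grid nodes are (x, y) tuples, the window B is given by its left and right columns plus rows y1..y2, and `cost(a, b)` is a hypothetical callback returning a nonnegative, symmetric grid-edge cost.

```python
def vm_route(s, t, left, right, cost):
    """Vertically monotonic (VM) routing of Fig. 2.3, upward case (y2 >= y1).

    s = (x1, y1), t = (x2, y2); the window B spans columns left..right and
    rows y1..y2. cost(a, b) is the nonnegative cost of grid edge (a, b).
    Returns (least VM-path cost from s to t, node list from s to t).
    """
    (x1, y1), (_, y2) = s, t
    d, pred = {s: 0.0}, {s: None}
    # Stage 1 (lines 1-9): scan the bottom row from s toward B.l and B.r.
    for x in range(x1 - 1, left - 1, -1):
        u, v = (x, y1), (x + 1, y1)
        d[u], pred[u] = d[v] + cost(v, u), v
    for x in range(x1 + 1, right + 1):
        u, v = (x, y1), (x - 1, y1)
        d[u], pred[u] = d[v] + cost(v, u), v
    # Stage 2 (lines 10-26): dynamic programming, one row at a time.
    for y in range(y1 + 1, y2 + 1):
        lclb = {}
        # Phase 1 (lines 11-19), Eq. (2.2): predecessor on the left or bottom.
        u, v = (left, y), (left, y - 1)
        lclb[u], pred[u] = d[v] + cost(v, u), v
        for x in range(left + 1, right + 1):
            u, v1, v2 = (x, y), (x - 1, y), (x, y - 1)
            via_left = lclb[v1] + cost(v1, u)
            via_bottom = d[v2] + cost(v2, u)
            if via_left < via_bottom:
                lclb[u], pred[u] = via_left, v1
            else:
                lclb[u], pred[u] = via_bottom, v2
        # Phase 2 (lines 20-25), Eq. (2.3): predecessor on the right.
        d[(right, y)] = lclb[(right, y)]
        for x in range(right - 1, left - 1, -1):
            u, v3 = (x, y), (x + 1, y)
            d[u] = lclb[u]
            via_right = d[v3] + cost(v3, u)
            if via_right < d[u]:
                d[u], pred[u] = via_right, v3
    # Follow the predecessors pi(u) from t back to s.
    path, node = [], t
    while node is not None:
        path.append(node)
        node = pred[node]
    return d[t], path[::-1]


# Usage: vertical edges at x >= 1 are congested, so the path detours via x = 0.
def wall(a, b):
    return 100.0 if a[0] == b[0] and a[0] >= 1 else 1.0


c, path = vm_route((3, 0), (3, 1), 0, 4, wall)
print(c, path)  # 7.0 [(3, 0), (2, 0), (1, 0), (0, 0), (0, 1), (1, 1), (2, 1), (3, 1)]
```

Each node is touched a constant number of times per row, so the sketch runs in time linear in the number of nodes in B; the horizontal detour in the usage example is exactly the kind of move that plain monotonic routing cannot make.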

In Fig. 2.3, lines 1 to 9 calculate the least cost of the VM paths from s to each node v = (i, y1) with B.l ≤ i ≤ B.r (Fig. 2.2(c)). Next, based on dynamic programming, lines 10 to 26 identify the least-cost VM path from s to each node of each row within B, from the row with y-coordinate y1+1 to the row with y-coordinate y2; lines 11 to 19 compute lclb(u) by Eq. (2.2), and lines 20 to 25 compute d(u) by Eq. (2.3). Figure 2.2(d) shows the predecessor of each node u on the s-to-u path of lclb(u) in the row with y-coordinate y1+1, and Fig. 2.2(e) shows the predecessor of each node u on the s-to-u path of d(u) in the same row. Upon completion of VM routing, every node within B has a least-cost VM path back to s along its predecessors (Fig. 2.2(f)); therefore, the least-cost VM path from s to t is also identified. Obviously, the time complexity of VM routing is O(|B|), because each node within B is examined a constant number of times.


2.3.2 Hybrid Unilateral Monotonic Routing

Compared with maze routing, unilateral monotonic routing still offers a limited solution space for resolving overflows. This section introduces a hybrid unilateral monotonic (HUM) routing algorithm that searches a larger solution space than unilateral monotonic routing. HUM routing assumes that each node within B can be an intermediate point connecting the start and target nodes. A HUM path therefore consists of two paths: one linking the start node with an intermediate point, and one linking the intermediate point with the target node. Each of these two paths can be formed by unilateral monotonic routing, that is, by either VM or HM routing, so four combinations are available to form a HUM routing path. Assuming that bounding box B encloses nodes u and v, VMPB(u,v) and HMPB(u,v) represent a VM routing path and an HM routing path, respectively, connecting u with v within B. For some node u within B, a HUM routing path connecting s with t belongs to one of the following four path types: (VMPB(s,u), VMPB(u,t)), (VMPB(s,u), HMPB(u,t)), (HMPB(s,u), VMPB(u,t)), and (HMPB(s,u), HMPB(u,t)). Here, a pair such as (VMPB(s,u), VMPB(u,t)) denotes a path concatenation operation that combines two unilateral monotonic paths of one or two types to form a HUM path linking the start and end nodes.

Algorithm Hybrid Unilateral Monotonic Routing
Input: source node s, target node t, bounding box B
1.  Initialize cost arrays Aryvs, Aryvt, Aryhs, Aryht
2.  // Find the paths from each node in B to s
3.  Vertically_Monotonic_Routing(s, B.bl, B, Aryvs)
4.  Vertically_Monotonic_Routing(s, B.tr, B, Aryvs)
5.  Horizontally_Monotonic_Routing(s, B.bl, B, Aryhs)
6.  Horizontally_Monotonic_Routing(s, B.tr, B, Aryhs)
7.  // Find the paths from each node in B to t
8.  Vertically_Monotonic_Routing(t, B.bl, B, Aryvt)
9.  Vertically_Monotonic_Routing(t, B.tr, B, Aryvt)
10. Horizontally_Monotonic_Routing(t, B.bl, B, Aryht)
11. Horizontally_Monotonic_Routing(t, B.tr, B, Aryht)
12. foreach node u in B
13.     mrc(u) = min(Aryhs(u), Aryvs(u)) + min(Aryht(u), Aryvt(u))
14. end foreach
Fig. 2.4. The pseudo code of the hybrid unilateral monotonic routing algorithm.

Figure 2.4 shows the proposed HUM routing algorithm. The least costs of VMPB(s,u), VMPB(t,u), HMPB(s,u), and HMPB(t,u) of each node are stored in the arrays Aryvs, Aryvt, Aryhs, and Aryht, respectively, while B.bl and B.tr represent the nodes at the bottom-left and top-right corners of B, respectively. Lines 3-6 regard s as the start node and B.bl and B.tr as pseudo targets: lines 3 and 4 invoke VM routing from s to the pseudo targets to obtain VM routing paths from s to every node within B, and lines 5 and 6 invoke HM routing from s to the pseudo targets to obtain HM routing paths from s to every node within B. Similarly, lines 8-11 regard t as the start node and B.bl and B.tr as pseudo targets, and obtain VM and HM routing paths from t to every node within B. Accordingly, lines 2-11 determine the value of each element in Aryvs, Aryvt, Aryhs, and Aryht, which yields the least costs of VMPB(s,u), VMPB(t,u), HMPB(s,u), and HMPB(t,u) for each node u within B (Fig. 2.5(a)-(d)). The algorithm then selects the least-cost HUM routing path among the candidates of the four path types (lines 12-15).

Fig. 2.5. Four path types in B with congested regions (gray rectangles). (a) VMPB(s,u), (b) VMPB(t,u), (c) HMPB(s,u), and (d) HMPB(t,u).


The three parts of the algorithm (lines 2-11, lines 12-14, and line 15) each take O(|B|) time. Hence, the time complexity of the HUM routing algorithm is still O(|B|), which is lower than that of maze routing with an A* search scheme (O(|B| log |B|)). Figure 2.6 compares the proposed HUM routing with 3-bend routing and escaping routing, whose time complexities are also O(|B|). Figures 2.6(a) and 2.6(b) show the routing results of 3-bend routing and escaping routing, with two overflows and one overflow, respectively. In contrast, the proposed HUM routing algorithm identifies an overflow-free path (Fig. 2.6(c)) of the type (HMPB(s,u), HMPB(u,t)). Notably, even when HUM routing cannot identify an overflow-free path, the least-cost HUM path it finds always costs no more than the paths found by 3-bend routing and escaping routing, because the solution space of HUM routing fully covers, and is much larger than, those of 3-bend routing and escaping routing.
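The following Python sketch shows how the four least-cost arrays of Fig. 2.4 can be combined. It is a hedged illustration, not the Grace implementation. `vm_costs` reuses the two-phase row scan of Eqs. (2.2) and (2.3) and sweeps both upward and downward from the source, which is assumed to be equivalent to invoking VM routing toward the pseudo targets B.tr and B.bl. HM routing is obtained by transposing the grid, and `cost(a, b)` is a hypothetical symmetric, nonnegative edge-cost callback. Only costs are computed; recovering the actual path would also require the predecessor pointers π(u).

```python
def vm_costs(src, box, cost):
    """Least VM-path cost from src to every node of box = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    sx, sy = src
    d = {src: 0.0}
    for x in range(sx - 1, x0 - 1, -1):          # source row, toward the left border
        d[(x, sy)] = d[(x + 1, sy)] + cost((x + 1, sy), (x, sy))
    for x in range(sx + 1, x1 + 1):              # source row, toward the right border
        d[(x, sy)] = d[(x - 1, sy)] + cost((x - 1, sy), (x, sy))
    for step, last in ((1, y1), (-1, y0)):       # rows above src, then rows below
        for y in range(sy + step, last + step, step):
            prev = y - step
            lc = {}
            for x in range(x0, x1 + 1):          # phase 1: from the left or previous row
                c = d[(x, prev)] + cost((x, prev), (x, y))
                if x > x0:
                    c = min(c, lc[x - 1] + cost((x - 1, y), (x, y)))
                lc[x] = c
            d[(x1, y)] = lc[x1]
            for x in range(x1 - 1, x0 - 1, -1):  # phase 2: from the right
                d[(x, y)] = min(lc[x], d[(x + 1, y)] + cost((x + 1, y), (x, y)))
    return d


def hm_costs(src, box, cost):
    """Least HM-path cost: VM routing on the transposed grid."""
    x0, y0, x1, y1 = box

    def tcost(a, b):
        return cost((a[1], a[0]), (b[1], b[0]))

    t = vm_costs((src[1], src[0]), (y0, x0, y1, x1), tcost)
    return {(x, y): c for (y, x), c in t.items()}


def hum_route(s, t, box, cost):
    """Lines 12-14 of Fig. 2.4, then pick the node u with the least mrc(u)."""
    vs, hs = vm_costs(s, box, cost), hm_costs(s, box, cost)
    vt, ht = vm_costs(t, box, cost), hm_costs(t, box, cost)
    mrc = {u: min(hs[u], vs[u]) + min(ht[u], vt[u]) for u in vs}
    best = min(mrc, key=mrc.get)
    return best, mrc[best]


# Usage: vertical edges between rows 0 and 1 are congested for x <= 2.
def wall(a, b):
    if a[0] == b[0] and {a[1], b[1]} == {0, 1} and a[0] <= 2:
        return 100.0
    return 1.0


u, c = hum_route((1, 0), (1, 2), (0, 0, 4, 2), wall)
print(c)  # 6.0, via the uncongested crossing at x = 3
```

All four arrays are filled with a constant number of linear scans, so the sketch keeps the O(|B|) behavior discussed above; the final `min` over `mrc` corresponds to the selection step.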

If most of the overflowed grid edges within B are aligned in a row, as in the congestion map of Fig. 2.1(a), the least costs of HMPB(s,u) and HMPB(u,t) are likely to be larger than those of VMPB(s,u) and VMPB(u,t). In this case, exploring HMPB(s,u) and HMPB(u,t) is redundant and can be omitted. Based on this observation, all four HUM routing types are explored only once for each net, namely the first time the net is routed by HUM routing. If the net is rerouted by HUM routing in a later routing stage, only the HUM routing type that initially identified the least-cost path is invoked. Experimental results indicate that this scheme reduces the runtime of HUM routing by approximately 23% while achieving similar routing quality.

Fig. 2.6. (a) Routing result of 3-bend routing with two overflows; (b) routing result of escaping routing with one overflow; (c) routing result of the proposed HUM routing without overflows.
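The per-net reuse of the winning HUM type can be sketched as a small cache. This is an illustrative Python sketch under assumptions: the four path types are abbreviated as "VV", "VH", "HV", and "HH" (first letter for the s-to-u segment, second for the u-to-t segment), and `evaluate(path_type)` is a hypothetical callback that returns the least cost and path of one type.

```python
HUM_TYPES = ("VV", "VH", "HV", "HH")  # (s-to-u type, u-to-t type)


class HumTypeCache:
    """Remembers, per net, which HUM path type won the first HUM routing."""

    def __init__(self):
        self.best_type = {}

    def route(self, net_id, evaluate):
        """evaluate(path_type) -> (cost, path) for a single HUM path type."""
        if net_id in self.best_type:             # rerouting: try only the winner
            path_type = self.best_type[net_id]
            cost, path = evaluate(path_type)
            return path_type, cost, path
        results = {pt: evaluate(pt) for pt in HUM_TYPES}  # first time: all four
        path_type = min(results, key=lambda pt: results[pt][0])
        self.best_type[net_id] = path_type
        cost, path = results[path_type]
        return path_type, cost, path


# Usage: count how many path types are evaluated per call.
calls = []


def evaluate(pt):
    calls.append(pt)
    return {"VV": 9.0, "VH": 4.0, "HV": 7.0, "HH": 5.0}[pt], None


cache = HumTypeCache()
print(cache.route("n1", evaluate)[0], len(calls))  # VH 4
cache.route("n1", evaluate)
print(len(calls))  # 5
```

The first call evaluates all four types; every later reroute of the same net evaluates just one, which is where the runtime saving comes from.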

2.3.3 Congestion-aware Bounding Box Expansion

Bounding boxes are widely adopted to limit the search region of routing. In conventional global routers, the initial bounding box is slightly larger than the minimum rectangle enclosing the terminals of the routed net. If no overflow-free path can be identified within the bounding box, the box is expanded and the overflowed net is rerouted. Box expansion policies based on current congestion information have seldom been discussed in the literature, and the traditional box expansion scheme tends to over-expand, which increases runtime. For instance, Fig. 2.7(a) shows a routing path with an overflowed vertical edge. Traditional box expansion expands the bounding box along both the x and y directions to resolve the overflow; however, the bounding box in Fig. 2.7(a) only needs to expand horizontally, and the vertical expansion is unnecessary. Figure 2.7(b) displays the congestion map of the benchmark superblue1 after the initial routing. The red regions represent overflowed grid edges, which typically extend horizontally or vertically, implying that the situation in Fig. 2.7(a) occurs frequently during routing. Based on this observation, this work presents a congestion-aware bounding box expansion scheme that avoids over-expansion.

Fig. 2.7. Example of congestion-aware bounding box expansion. (a) Routing path with a vertical overflow; (b) the overflow map of the benchmark superblue1 after the initial routing; (c) the currently identified path Ps,t and a path Ls,t that is expected to cross the left side of the bounding box; (d) the estimated lower-bound cost of Ls,t is the sum of the costs of Ps,v and Pt,u plus manh(v, u) · α.

Before a net is rerouted, this work counts the horizontal overflowed grid edges (HOEs) and vertical overflowed grid edges (VOEs) by tracing the current routing path of the net. If there are more HOEs than VOEs, the bounding box expands vertically by δ units; if there are more VOEs than HOEs, it expands horizontally. In case of a tie, the expansion direction is chosen randomly. Single-direction expansion restricts the size of bounding boxes and thus reduces runtime. In our implementation, the initial bounding box is the minimum rectangle enclosing the two terminals to be routed, and δ is set to 5 + 30/ri, where ri denotes the number of times the net has been ripped up and rerouted. Moreover, because two opposite sides of B may have different congestion states, expanding the side near the congested region may be unnecessary; hence, the expansion of each boundary of B should be considered separately. At the end of HUM routing, the algorithm examines each boundary of B to determine whether that boundary needs to be expanded. Without loss of generality, the left boundary of B is used to illustrate the concept. Expanding the left boundary is intended to find a path Ls,t through the region on the left side of B that costs less than the currently identified path Ps,t (Fig. 2.7(c)). Hence, if the routing cost of Ps,t is lower than the least cost of Ls,t, expanding the left boundary is unnecessary. However, the least cost of Ls,t is unknown because the region on the left side of B has not been explored yet. Therefore, the estimated lower-bound cost of Ls,t, denoted ecL, is defined by the following equation to evaluate whether boundary expansion is necessary. If the currently identified path Ps,t costs less than ecL, the left boundary remains unchanged at the next expansion of B.

$$ec_L = \min_{u \in V_L,\ v \in V_L} \big( d(s,v) + d(t,u) + \alpha \cdot \mathrm{manh}(v,u) \big) \qquad (2.4)$$


where V_L denotes the set of grid nodes on the left boundary of B; d(s,v) and d(t,u) represent the least costs of the unilateral monotonic routing paths from s to v and from t to u, respectively; manh(v,u) is the Manhattan distance between v and u; and α is the lower-bound routing cost of a grid edge, which is set to 1 in this work. Notably, d(s,v) and d(t,u) are already known because they were computed during HUM routing (Fig. 2.7(d)). In summary, before a net ni is rerouted, its path is first traced to count HOEs and VOEs. If there are more VOEs than HOEs, expanding the left and right boundaries of the bounding box B of ni is considered. If ni was not routed by HUM routing in the previous routing pass, the left and right boundaries of B are expanded immediately; otherwise, whether each boundary is expanded is decided by the lower-bound test of Eq. (2.4) described above.
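The expansion decisions above can be sketched in Python as follows. The function names are hypothetical, and three details are assumptions rather than statements from the text: δ = 5 + 30/ri is computed with integer division, the net's path is supplied as (is_horizontal, is_overflowed) flags per grid edge, and Eq. (2.4) is evaluated by brute force over all node pairs on one boundary, which costs O(|V_L|^2) time.

```python
import random


def expansion_step(ri: int) -> int:
    """delta = 5 + 30 / ri for a net ripped up ri >= 1 times (integer division assumed)."""
    return 5 + 30 // ri


def expansion_direction(path_edges, rng=random):
    """path_edges: (is_horizontal, is_overflowed) for each grid edge on the net's path."""
    hoe = sum(1 for horiz, over in path_edges if horiz and over)
    voe = sum(1 for horiz, over in path_edges if not horiz and over)
    if hoe > voe:
        return "vertical"     # more HOEs: expand the box vertically
    if voe > hoe:
        return "horizontal"   # more VOEs: expand the box horizontally
    return rng.choice(["vertical", "horizontal"])  # tie: pick at random


def manh(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def estimated_lower_bound(ds, dt, side, alpha=1.0):
    """Eq. (2.4) for one side of B: the minimum over u, v on that side of
    d(s,v) + d(t,u) + alpha * manh(v, u). ds and dt map boundary nodes to the
    least unilateral monotonic costs already computed by HUM routing."""
    return min(ds[v] + dt[u] + alpha * manh(v, u) for u in side for v in side)


def side_needs_expansion(current_cost, ds, dt, side, alpha=1.0):
    """A side stays fixed when the current path already costs less than ec."""
    return current_cost >= estimated_lower_bound(ds, dt, side, alpha)


# Usage: a left boundary with three nodes.
side = [(0, 0), (0, 1), (0, 2)]
ds = {(0, 0): 2.0, (0, 1): 5.0, (0, 2): 9.0}
dt = {(0, 0): 9.0, (0, 1): 5.0, (0, 2): 1.0}
print(estimated_lower_bound(ds, dt, side))        # 5.0
print(side_needs_expansion(4.0, ds, dt, side))    # False: keep the left side
```

Because ds and dt come for free from the HUM arrays, the test costs only a scan over the boundary nodes; a linear-time version could exploit the fact that manh(v, u) on one column reduces to |v.y - u.y|.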

2.4 Design Flow of Grace

Figure 2.8 shows the design flow of the proposed routing congestion estimator Grace. First, the multi-layer routing region is projected onto a 2D plane, and each net is decomposed into two-pin nets based on the topology of its RMST, because the works in [1, 7, 18] indicate that an RMST offers better flexibility than a Steiner tree for avoiding blockages and congestion. An initial congestion graph is then generated by pattern routing and monotonic routing. Next, the rip-up and rerouting stage iteratively reroutes overflowed nets until an overflow-free routing result is obtained or the runtime exceeds the given time budget. In this stage, before net ni is rerouted, its bounding box is expanded according to the proposed congestion-aware expansion scheme. If the width of the bounding box equals the x-distance between the source and target of ni, ni is rerouted using HM routing; if the height of the bounding box equals the y-distance between the source and target of ni, ni is rerouted using VM routing; otherwise, ni is rerouted using HUM routing.

Fig. 2.8. Design flow of Grace.
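Two steps of this flow lend themselves to a short illustration: decomposing a multi-pin net along an RMST, and choosing the routing algorithm for a two-pin net. The Python sketch below builds the RMST with Prim's algorithm over Manhattan distances, which is one standard construction but not necessarily the one used in Grace. `choose_router` encodes the three rules above; the bounding box is given as hypothetical (x0, y0, x1, y1) corner coordinates, and the HM test is checked first when both conditions hold.

```python
def rmst_two_pin_nets(pins):
    """Decompose a multi-pin net into two-pin nets along a rectilinear MST (Prim, O(n^2))."""
    n = len(pins)
    if n < 2:
        return []

    def manh(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    in_tree = [False] * n
    dist = [float("inf")] * n
    parent = [-1] * n
    dist[0] = 0
    edges = []
    for _ in range(n):
        i = min((k for k in range(n) if not in_tree[k]), key=lambda k: dist[k])
        in_tree[i] = True
        if parent[i] >= 0:
            edges.append((pins[parent[i]], pins[i]))  # one two-pin net
        for k in range(n):
            if not in_tree[k] and manh(pins[i], pins[k]) < dist[k]:
                dist[k], parent[k] = manh(pins[i], pins[k]), i
    return edges


def choose_router(src, dst, box):
    """Pick HM, VM, or HUM routing for a two-pin net inside box = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    if x1 - x0 == abs(src[0] - dst[0]):   # no room for horizontal detours
        return "HM"
    if y1 - y0 == abs(src[1] - dst[1]):   # no room for vertical detours
        return "VM"
    return "HUM"


# Usage: a four-pin net and a router choice for one of its two-pin nets.
nets = rmst_two_pin_nets([(0, 0), (5, 0), (5, 4), (1, 1)])
print(nets)  # [((0, 0), (1, 1)), ((0, 0), (5, 0)), ((5, 0), (5, 4))]
print(choose_router((5, 0), (5, 4), (3, 0, 7, 4)))  # VM
```

When the box leaves no slack in one direction, the corresponding unilateral monotonic routing already covers every path in the box, so the cheaper single-direction routine is used instead of HUM routing.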

2.5 Experimental Results

The proposed algorithms are implemented in C/C++ on a quad-core 2.4 GHz Intel Xeon-based Linux server with 50 GB of memory (only a single core is used). By hosting a routability-driven placement contest, ISPD11 motivated many researchers to develop effective modern placers [38, 39, 42]. Ripple [38] and mPL11 placed first in the contest, and their contest placement results are adopted as the input benchmarks in our experiments. We compare Grace with NCTU-GR 2.0 [18], which is one of the fastest global routers; the experiments in [18] indicate that NCTU-GR 2.0 runs 1.90X, 1.77X, and 18.66X faster than NTHU-Route 2.0 [11], FastRoute 4.1 [16], and NTUgr [12], respectively. In the old benchmarks [1, 2], the minimum wire spacing and width are uniform and all pins are located on the lowest layer. In contrast, in the new benchmarks [46] used in this work, the minimum wire spacing and width differ between layers, and pins may be located on higher layers. Because most traditional routers do not consider these new features, they cannot be directly adopted to route the new benchmarks. Recent routers such as NCTU-GR 2.0, BFG-R [7], and CGRIP [35] can handle these features, but BFG-R and CGRIP run much slower than NCTU-GR 2.0. Owing to its robustness and efficiency, NCTU-GR 2.0 was selected as the evaluation tool for the routability-driven placement contests at DAC12 [47] and ICCAD12 [48]. In the following experiments, NCTU-GR 2.0 and Grace are run on the same machine. Notably, the NCTU-GR 2.0 parameters via cost, wirelength optimization level, pattern routing iterations, monotonic routing iterations, and post-routing iterations are set to 1, 50, 2, 2, and 0, respectively. The setting of the

