Topology-aware Rerouting for ECO Timing Optimization

National Chiao Tung University
Institute of Computer Science and Engineering
Master's Thesis

Student: Jian-Syun Tong (童建勳)
Advisor: Prof. Yih-Lang Li (李毅郎)

Topology-aware Rerouting for ECO Timing Optimization

Student: Jian-Syun Tong (童建勳)
Advisor: Yih-Lang Li (李毅郎)

National Chiao Tung University
Institute of Computer Science and Engineering
Master's Thesis

A Thesis Submitted to the Institute of Computer Science and Engineering, College of Computer Science, National Chiao Tung University, in Partial Fulfillment of the Requirements for the Degree of Master in Computer Science

October 2010

Hsinchu, Taiwan, Republic of China

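Topology-aware Rerouting for ECO Timing Optimization

Student: Jian-Syun Tong Advisor: Dr. Yih-Lang Li

Institute of Computer Science and Engineering, National Chiao Tung University

Abstract (translated from the Chinese)

Owing to increased complexity, modern designs produce more and more timing violations. Timing ECO fixes timing violations effectively without redesigning the whole chip and can substantially shorten turnaround time. Conventional buffer insertion focuses only on minimizing the delay of the most critical sink and ignores obstacles and routed regions. However, ignoring the topology of a multi-pin net and the obstacles can worsen the delays of other sinks after buffer insertion. This thesis derives two topology effects on buffer insertion: edge breaking and buffer connection, and buffer reconnection. Based on these effects, it proposes a novel topology-aware ECO timing optimization method that accounts for routed regions and improves the sinks with timing violations without worsening the delays of other sinks. Experimental results show that the proposed algorithm reduces the worst negative slack and the total negative slack of the benchmarks by 75.3% and 76.3% on average, respectively. Compared with conventional buffer insertion based on 2-pin nets, the proposed algorithm achieves 35.2% better delay improvement on average, at the cost of longer runtime.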


Topology-aware Rerouting for ECO Timing Optimization

Student: Jian-Syun Tong Advisor: Dr. Yih-Lang Li

Institute of Computer Science and Engineering

National Chiao Tung University

Abstract

Modern designs incur more and more timing violations as their complexity increases. Timing ECO fixes timing violations incrementally without redesigning the whole chip, which significantly shortens turnaround time. Conventional buffer insertion focuses only on minimizing the delay of one critical sink and ignores existing obstacles and routed wire segments when inserting buffers. However, ignoring the topology of a multi-pin net and the obstacles may worsen the delays of other sinks after buffer insertion. This work derives two topology effects on buffer insertion: edge breaking and buffer connection, and buffer reconnection. Based on these effects, this work presents a novel topology-aware ECO timing optimization that considers routed wire segments to improve the delays of the violated sinks without worsening the delays of other sinks. Experimental results show that the proposed algorithm improves the worst negative slack (WNS) and total negative slack (TNS) of the benchmarks by 75.3% and 76.3% on average, respectively. Compared to conventional 2-pin net-based buffer insertion, the proposed algorithm achieves 35.2% better delay improvement on average at the cost of runtime.


Acknowledgement

I am deeply grateful to my advisor, Dr. Yih-Lang Li, for his continuous guidance, support, and enthusiastic discussion throughout this research. His suggestions helped me to complete this thesis. I also express my sincere appreciation to all the classmates in my laboratory for their encouragement and help. In particular, I want to thank my senior classmate Yen-Hung Lin for his assistance and direction in this work.

This thesis is dedicated to my parents and my family for their patience, love, encouragement, and long-held expectations.

List of Figures

1 Example of timing path
2 Example of buffer insertion on a multi-pin net
3 Illustration of edge breaking and buffer connection effect
4 Illustration of buffer reconnection effect

List of Tables

1 Statistics of the Benchmark Circuits
2 Statistics of Benchmark Nets and Experimental Results of Topology-aware Timing ECO Optimization (TOPO)
3 Comparison of the Delay Improvement of the Critical Sink based on the Elmore Delay (EL) and SOC Encounter (SE) and Runtime Analysis between the Simulated 2-pin Net-based Buffer Insertion (SIM) and Topology-aware ECO Timing Optimization (TOPO)

Chapter 1 Introduction

Interconnection delay has become the dominant factor in circuit delay in recent years because of the scaling of VLSI technology. However, precise timing information for critical paths and sinks with delay violations cannot be obtained before placement and routing (P&R). Timing violations can be fixed by redesign, but redesign is time-consuming and requires considerable effort. Fortunately, engineering change order (ECO) provides an effective approach to fixing timing violations or functional errors after P&R. ECO uses pre-placed spare cells, which are redundant cells, to partially modify a design. Different chip designs have different types and numbers of spare cells. For spare-cell placement, Chang et al. [1] presented a comprehensive analysis of the configurations of spare-cell types and the strategies of spare-cell insertion. Jiang et al. [2] proposed a spare-cell-aware analytical placement framework that predicts the spare-cell requirement and considers spare-cell insertion during global placement.

ECO comprises timing ECO [3]-[5] and functional ECO [6][7]. For timing ECO, Chou et al. [3] proposed a two-stage algorithm, the Timing-Driven ECO Routing algorithm (TDER), which modifies detailed-routed nets to reduce the delays of critical sinks. In the first stage, TDER selects a pair of suitable subtrees along the trunk, which is defined as the path from the source pin to the critical sink of a net, and then merges them. In the second stage, TDER determines the positions of Steiner points along the trunk to reduce the delay of critical sinks. Chen et al. [4] proposed a timing ECO framework that considers buffer insertion and gate sizing simultaneously. They presented a dynamic programming algorithm that considers the dynamic cost, called dynamic cost programming (DCP). Chang et al. [5] proposed a framework, named MOESS, that solves input-slew and output-loading violations by connecting spare cells to the violated nets as buffers. MOESS provides two buffer insertion schemes, performed sequentially, to minimize the number of inserted buffers and then to solve timing violations.

For functional ECO, Chang et al. [6] proposed a matching-based ECO synthesizer, ECOS. ECOS correctly implements incremental design changes using the available spare cells and also tries to reduce the prohibitive photomask cost. Tseng et al. [7] proposed an approach with two main steps: (1) technology remapping and (2) spare cell selection. In the technology remapping step, their approach considers resource constraints, because traditional technology mapping might exhaust resources or fail to find suitable resources. In spare cell selection, they treat the problem as resource allocation, with the objective of selecting suitable spare cells to achieve the functional changes while minimizing the increase in wirelength.

This work focuses on timing ECO. Chou et al. [3] considered only one critical sink of a routed net. Chen et al. [4] considered the two-pin net in the timing path and ignored the multi-pin net topology and the impact on the delays of other sinks of the same net when selecting inserted buffers. Chang et al. [5] considered multiple pins of a net but did not consider the topology of the net either. Moreover, neither Chen et al. [4] nor Chang et al. [5] performed detailed routing to connect the selected spare cell to the net. Therefore, spare cell selection in previous work is not sufficiently accurate because it disregards the net topology and the detailed-routed wires.

To enhance the accuracy of buffer insertion, this work simultaneously considers the routed wires and the multi-pin net topology during buffer insertion. Moreover, this is the first work to improve the delay of critical sinks without degrading those of other sinks. This work presents a topology-aware ECO timing optimization algorithm that modifies a detailed-routed net such that the interconnection delays of all violated sinks in the net are minimized without creating additional violated sinks or modifying any other detailed-routed net.

The rest of this thesis is organized as follows. Chapter 2 gives preliminaries and formulates the ECO timing optimization problem. Chapter 3 presents the topology effects on buffer insertion. Chapter 4 presents our topology-aware ECO timing optimization algorithm. Chapter 5 reports the experimental results. Finally, we conclude our work in Chapter 6.

Chapter 2 Preliminaries

A timing path is the unit of timing ECO optimization and is defined as a path from a fanin or register to a register or fanout. To improve the delay of a timing path, ECO timing optimization inserts a buffer between two gates along the timing path by breaking the original interconnection and re-wiring the gates to the inserted buffer. Spare cells in the chip can be treated as candidate buffers. Fig. 1 shows an example of timing ECO optimization on a timing path, where $g_1$-$g_2$-$g_3$ is the timing path and $g_{s1}$ and $g_{s2}$ are spare cells. The timing path in Fig. 1(a), with a slack equal to -0.8, violates the timing constraint. After $g_{s1}$ and $g_{s2}$ are inserted into the timing path, the timing violation is solved, as shown in Fig. 1(b).

Fig. 1. Example of timing path: (a) a timing path before buffer insertion; (b) a timing path after buffer insertion and re-wiring.

2.1 Elmore Delay Model

The Elmore delay model [8] provides a quick delay estimate. Although the Elmore delay model is not highly accurate, the goal of this work is to reduce the relative delays of the sinks. Therefore, this work adopts the Elmore delay model with the π-model to estimate delay. The model is introduced as follows. Let $T(n_0)$ be a routing tree rooted at the source $n_0$, where a node represents a Steiner point or a pin. $e_{n_i}$ denotes the wire segment from node $n_i$ to its parent in $T(n_0)$. $c(e_{n_i})$ denotes the capacitance of $e_{n_i}$, and $r(e_{n_i})$ denotes the resistance of $e_{n_i}$ plus the total pin or via resistance of the parent node of $n_i$. $C(n_i)$ is the total capacitance of $T(n_i)$, which includes the capacitances of all wire segments and all pin load capacitances belonging to $T(n_i)$.

The Elmore delay of wire segment $e_{n_i}$ with the π-model equals $r(e_{n_i}) \times (c(e_{n_i})/2 + C(n_i))$. Let $R_d$ be the output driver resistance at the source $n_0$. The Elmore delay $t_{ED}(n_k)$ at node $n_k$ is then computed as:

$$t_{ED}(n_k) = R_d \times C(n_0) + \sum_{e_{n_i} \in path(n_0, n_k)} r(e_{n_i}) \times \left( \frac{c(e_{n_i})}{2} + C(n_i) \right) \tag{1}$$

where $path(n_0, n_k)$ is the path from $n_0$ to $n_k$ in $T(n_0)$.
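As a concrete illustration of Eq. (1), the following Python sketch computes the Elmore delay of every node in a small routing tree. The `Node` class, its field names, and the function names are assumptions made for illustration only; they are not taken from the thesis implementation, which was written in C++.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A pin or Steiner point n_i; r and c describe the edge e_{n_i} to its parent."""
    name: str
    r: float = 0.0          # r(e_{n_i}); unused for the source n_0
    c: float = 0.0          # c(e_{n_i}); unused for the source n_0
    pin_cap: float = 0.0    # pin load capacitance at this node (0 for Steiner points)
    children: list = field(default_factory=list)


def downstream_cap(node):
    """C(n_i): pin load at n_i plus all wire and pin capacitance in T(n_i)."""
    return node.pin_cap + sum(ch.c + downstream_cap(ch) for ch in node.children)


def elmore_delays(source, r_d):
    """Return {node name: t_ED(node)} for every node of T(n_0), following Eq. (1)."""
    delays = {}

    def visit(node, t_node):
        for ch in node.children:
            # pi-model delay of edge e_{n_i}: r * (c / 2 + C(n_i))
            t_child = t_node + ch.r * (ch.c / 2.0 + downstream_cap(ch))
            delays[ch.name] = t_child
            visit(ch, t_child)

    t_source = r_d * downstream_cap(source)   # R_d * C(n_0)
    delays[source.name] = t_source
    visit(source, t_source)
    return delays
```

The sketch recomputes `downstream_cap` at every edge for brevity; a practical implementation would presumably cache $C(n_i)$ in one bottom-up pass.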

2.2 Problem Formulation

Topology-aware buffer insertion: In a post-routing design $D$, a buffer set $B = \{b_1, b_2, \ldots, b_m\}$ is placed in $D$ as spare cells. A detailed-routed net is $N = (P, E)$, where $P$ and $E$ represent the pin set and the edge set, respectively. $P = \{p_0, p_1, p_2, \ldots, p_n\}$, where $p_0$ is the driver and the others are sinks. Each element of $E$ is a path between two Steiner points. The timing-related information of $N$ is given: the driver arrival time $T_{arr}(p_0)$, the sink required times $T_{req}(p_i)$, $i = 1, \ldots, n$, and the sink delays $T_d(p_i)$, $i = 1, \ldots, n$. The set of timing violation sinks is $VP = \{vp_1, vp_2, \ldots, vp_k\} \subseteq P - \{p_0\}$, where $T_{arr}(p_0) + T_d(vp_i)$ exceeds $T_{req}(vp_i)$ for $i = 1, \ldots, k$. The problem is to insert a buffer in $B$ into $N$ such that the topology of $N$ is changed and the interconnection delays of the set $VP$ are minimized without creating additional violated sinks or modifying any other detailed-routed net.

Chapter 3 Topology Effects on Buffer Insertion

Conventional timing ECO algorithms focus on improving the delay of one timing path at a time. Therefore, to trace the given timing path, previous works [4][5] handle a 2-pin net for each segment in the timing path even though the 2-pin net is only part of a multi-pin net. Optimizing the delay of one 2-pin net of a multi-pin net for the most critical violation sink may degrade the delays of other sinks belonging to the same net, which in turn worsens other timing violation paths. Fig. 2(a) shows an example of two timing violation paths intersecting in one multi-pin net, where $g_0$-$g_3$ is part of the most critical path, with slack equal to -0.8; $g_0$-$g_2$ is part of the other violation path, with slack equal to -0.3; and $g_{s1}$ and $g_{s2}$ are spare cells for buffering. In Fig. 2(b), treating the net as a 2-pin net for buffer insertion increases the slack of the most critical path to -0.7 but decreases the slack of the other violation path to -0.4. In Fig. 2(c), treating the net as a multi-pin net for buffer insertion increases the slacks of the most critical path and the other path to -0.6 and -0.2, respectively. Moreover, reconnecting the buffer with shorter interconnections can further improve the delays of all sinks. In Fig. 2(d), reconnecting the buffer in Fig. 2(c) increases the slacks of the most critical path and the other path to -0.3 and -0.15, respectively.

Fig. 2. Example of buffer insertion on a multi-pin net: (a) two timing violation paths intersecting in a multi-pin net; (b) treating the net as a 2-pin net with buffer insertion for the most critical timing path; (c) treating the net as a multi-pin net with buffer insertion; (d) treating the net as a multi-pin net with buffer insertion and buffer reconnection.

To insert one buffer into a multi-pin net, denoted as $N_m$, one edge in $N_m$, denoted as $E_{dis}$, must be removed, and the inserted buffer, denoted as $B_{ins}$, connects to the two terminals disconnected by removing $E_{dis}$. Fig. 3 depicts a net before and after buffer insertion. In Fig. 3(b), $e_{n_{i+1}}$ is removed, and $E_{new1}$ and $E_{new2}$ are connected to the inserted buffer buff. In Fig. 3, $E_{dis}$ and $B_{ins}$ are $e_{n_{i+1}}$ and buff, respectively. Choosing a different pair of $E_{dis}$ and $B_{ins}$ leads to different delay effects on all sinks in $N_m$. The pair of $E_{dis}$ and $B_{ins}$ is defined as a buffering pair, denoted as $pair_{buf}(E_{dis}, B_{ins})$.

From the Elmore delay, this work observes two topology effects on buffer insertion in a multi-pin net: (1) edge breaking and buffer connection, and (2) buffer reconnection. Edge breaking and buffer connection shows how to choose the most suitable buffering pair for all sinks, including non-critical sinks, while buffer reconnection shows how to reconnect the net to the inserted buffer to further improve the delays of all sinks.

3.1 Edge Breaking and Buffer Connection Effect

The purpose of edge breaking and buffer connection is to find a proper buffering pair. Fig. 3 illustrates a multi-pin net, where $n_0$ is the source pin; $n_1$ to $n_j$ are Steiner points; and $T_1$ to $T_j$ are subtrees of $T(n_0)$. In Fig. 3(b), $e_{n_{i+1}}$ is removed and buff is inserted. Breaking the net by removing $e_{n_{i+1}}$ separates the net into two sub-nets, one including $n_0$ and the other excluding $n_0$, and $E_{new1}$ and $E_{new2}$ reconnect the two sub-nets by connecting to buff. Based on the Elmore delay model, the delay of each node in Fig. 3(a) is:

$$t_{ED}(n_k) = R_d \times C_a(n_0) + \sum_{e_{n_y} \in path(n_0, n_k)} r(e_{n_y}) \times \left( \frac{c(e_{n_y})}{2} + C_a(n_y) \right), \quad k = 1, \ldots, j \tag{2}$$

Because the topologies of the net in Fig. 3(a) and Fig. 3(b) are different, $C(n)$ in Fig. 3(a) and $C(n)$ in Fig. 3(b) are also different. Thus, we denote $C(n)$ in Fig. 3(a) as $C_a(n)$ and $C(n)$ in Fig. 3(b) as $C_b(n)$.

Fig. 3. Illustration of edge breaking and buffer connection effect: (a) a net before buffer insertion; (b) one inserted buffer buff and two reconnections.

The delay of each node in Fig. 3(b) is:

$$t_{ED}(n_k) = R_d \times C_b(n_0) + \sum_{e_{n_y} \in path(n_0, n_k)} r(e_{n_y}) \times \left( \frac{c(e_{n_y})}{2} + C_b(n_y) \right), \quad k = 1, \ldots, i \tag{3}$$

$$\begin{aligned} t_{ED}(n_k) ={} & R_d \times C_b(n_0) + \sum_{e_{n_y} \in path(n_0, n_i)} r(e_{n_y}) \times \left( \frac{c(e_{n_y})}{2} + C_b(n_y) \right) \\ & + r(E_{new1}) \times \left( \frac{c(E_{new1})}{2} + c(buff) \right) + r(buff) \times \left( c(E_{new2}) + C_b(n_{i+1}) \right) \\ & + r(E_{new2}) \times \left( \frac{c(E_{new2})}{2} + C_b(n_{i+1}) \right) \\ & + \sum_{e_{n_y} \in path(n_{i+1}, n_k)} r(e_{n_y}) \times \left( \frac{c(e_{n_y})}{2} + C_b(n_y) \right), \quad k = i+1, \ldots, j \end{aligned} \tag{4}$$

Thus, the delay difference of each node between Fig. 3(b) and Fig. 3(a) is:

$$\Delta t_{ED}(n_k) = \left( R_d + \sum_{e_{n_y} \in path(n_0, n_k)} r(e_{n_y}) \right) \times \left( c(E_{new1}) + c(buff) - c(e_{n_{i+1}}) - C_a(n_{i+1}) \right), \quad k = 1, \ldots, i \tag{5}$$

$$\begin{aligned} \Delta t_{ED}(n_k) ={} & \left( R_d + \sum_{e_{n_y} \in path(n_0, n_i)} r(e_{n_y}) \right) \times \left( c(E_{new1}) + c(buff) - c(e_{n_{i+1}}) - C_a(n_{i+1}) \right) \\ & + r(E_{new1}) \times \left( \frac{c(E_{new1})}{2} + c(buff) \right) + r(buff) \times \left( c(E_{new2}) + C_b(n_{i+1}) \right) \\ & + r(E_{new2}) \times \left( \frac{c(E_{new2})}{2} + C_b(n_{i+1}) \right) \\ & - r(e_{n_{i+1}}) \times \left( \frac{c(e_{n_{i+1}})}{2} + C_a(n_{i+1}) \right), \quad k = i+1, \ldots, j \end{aligned} \tag{6}$$

Notably, the delay difference of each pin in $T_i$ is equal to the delay difference of $n_i$. Thus, when $\Delta t_{ED}(n_k)$, $k = 1, \ldots, i$, is non-positive, the delay difference of each pin in the sub-net including $n_0$ is also non-positive. The delay difference in the sub-net including $n_0$ has the common term $c(E_{new1}) + c(buff) - c(e_{n_{i+1}}) - C_a(n_{i+1})$, so the delay of the sub-net including $n_0$ can be improved by satisfying the following inequality:

$$c(E_{new1}) + c(buff) - c(e_{n_{i+1}}) - C_a(n_{i+1}) \le 0 \tag{7}$$

In other words, when (7) holds, the delay of each pin in the sub-net including $n_0$ can be improved by inserting the buffer.

Similarly, a non-positive delay difference in (6) improves the delay of the sub-net excluding $n_0$. Therefore, we derive the following inequality based on (6):

$$\begin{aligned} & \left( R_d + \sum_{e_{n_y} \in path(n_0, n_i)} r(e_{n_y}) \right) \times \left( c(E_{new1}) + c(buff) - c(e_{n_{i+1}}) - C_a(n_{i+1}) \right) \\ & + r(E_{new1}) \times \left( \frac{c(E_{new1})}{2} + c(buff) \right) + r(buff) \times \left( c(E_{new2}) + C_b(n_{i+1}) \right) \\ & + r(E_{new2}) \times \left( \frac{c(E_{new2})}{2} + C_b(n_{i+1}) \right) - r(e_{n_{i+1}}) \times \left( \frac{c(e_{n_{i+1}})}{2} + C_a(n_{i+1}) \right) \le 0 \end{aligned} \tag{8}$$

In other words, satisfying (8) guarantees that the delays of all pins in the sub-net excluding $n_0$ can be improved by inserting the buffer.

In conclusion, choosing a buffering pair that satisfies both (7) and (8) guarantees that the delays of all sinks improve after buffer insertion.
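The legality test implied by (7) and (8) can be sketched in Python as follows. This is a minimal sketch, assuming the RC values of one candidate buffering pair have already been extracted; the `BufferingPairRC` container and all of its field names are hypothetical and chosen only to mirror the symbols in the inequalities.

```python
from dataclasses import dataclass


@dataclass
class BufferingPairRC:
    """RC values of one candidate buffering pair (illustrative names)."""
    r_d: float      # driver output resistance R_d
    r_path: float   # sum of r(e) over path(n_0, n_i), upstream of E_dis
    r_dis: float    # r(e_{n_{i+1}}), resistance of the removed edge E_dis
    c_dis: float    # c(e_{n_{i+1}}), capacitance of the removed edge E_dis
    cap_a: float    # C_a(n_{i+1}), downstream capacitance before insertion
    cap_b: float    # C_b(n_{i+1}), downstream capacitance after insertion
    r_new1: float   # r(E_new1)
    c_new1: float   # c(E_new1)
    r_new2: float   # r(E_new2)
    c_new2: float   # c(E_new2)
    r_buff: float   # r(buff)
    c_buff: float   # c(buff)


def lhs_7(p):
    """Left-hand side of (7): capacitance change seen upstream of E_dis."""
    return p.c_new1 + p.c_buff - p.c_dis - p.cap_a


def lhs_8(p):
    """Left-hand side of (8): delay change of the sub-net excluding n_0."""
    shared = (p.r_d + p.r_path) * lhs_7(p)
    added = (p.r_new1 * (p.c_new1 / 2 + p.c_buff)
             + p.r_buff * (p.c_new2 + p.cap_b)
             + p.r_new2 * (p.c_new2 / 2 + p.cap_b))
    removed = p.r_dis * (p.c_dis / 2 + p.cap_a)
    return shared + added - removed


def is_legal(p):
    """A buffering pair is legal when both (7) and (8) hold."""
    return lhs_7(p) <= 0 and lhs_8(p) <= 0
```

A buffer with a large input capacitance `c_buff` makes `lhs_7` positive, so such a pair is rejected even if it would speed up the sub-net behind the buffer.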

3.2 Buffer Reconnection Effect

After buffer insertion, a net is partitioned into two parts: one includes the driver and the other excludes the driver. The delays of all pins in these two parts can be further improved by reconnecting the parts to the inserted buffer. Fig. 4(a) shows a net after buffer insertion, where $n_0$ is the driver and $n_a$, $n_b$, and $n_e$ are Steiner points. The part including $n_0$ reconnects to the buffer by finding the nearest path from buff to $path(n_0, n_a)$, as shown in Fig. 4(b), where $E'_{new1}$ is the nearest path from buff to $path(n_0, n_a)$ and $n_c$ is the intersection point. After reconnection, the delay differences of the sub-net including $n_0$ can be partitioned into three parts: part1, part2, and buff, as shown in Fig. 4(b).

Fig. 4. Illustration of buffer reconnection effect: (a) a net after buffer insertion; (b) a net after buffer reconnection.

The delay of each node in the sub-net including $n_0$ in Fig. 4(a) is:

$$t_{ED}(n_i) = R_d \times C_a(n_0) + \sum_{e_{n_y} \in path(n_0, n_i)} r(e_{n_y}) \times \left( \frac{c(e_{n_y})}{2} + C_a(n_y) \right) \tag{9}$$

where $n_i$ is any node other than buff, and

$$t_{ED}(buff) = R_d \times C_a(n_0) + \sum_{e_{n_y} \in path(n_0, n_a)} r(e_{n_y}) \times \left( \frac{c(e_{n_y})}{2} + C_a(n_y) \right) + r(E_{new1}) \times \left( \frac{c(E_{new1})}{2} + c(buff) \right) \tag{10}$$

The delay of each node in the sub-net including $n_0$ in Fig. 4(b) is:

$$t_{ED}(n_i) = R_d \times C_b(n_0) + \sum_{e_{n_y} \in path(n_0, n_i)} r(e_{n_y}) \times \left( \frac{c(e_{n_y})}{2} + C_b(n_y) \right) \tag{11}$$

where $n_i$ is any node other than buff, and

$$t_{ED}(buff) = R_d \times C_b(n_0) + \sum_{e_{n_y} \in path(n_0, n_c)} r(e_{n_y}) \times \left( \frac{c(e_{n_y})}{2} + C_b(n_y) \right) + r(E'_{new1}) \times \left( \frac{c(E'_{new1})}{2} + c(buff) \right) \tag{12}$$

Thus, the delay difference of part1 between Fig. 4(b) and Fig. 4(a) is:

$$\Delta t_{ED}(n_i) = \left( \sum_{e_{n_y} \in path(n_0, n_c)} r(e_{n_y}) \right) \times c(E'_{new1}) - \left( \sum_{e_{n_y} \in path(n_0, n_i)} r(e_{n_y}) \right) \times c(E_{new1}) \tag{13}$$

where $n_i$ is a pin in part1. $\Delta t_{ED}(n_i)$ is more likely to be negative when $E'_{new1}$ is shorter than $E_{new1}$. The delay difference of part2 between Fig. 4(b) and Fig. 4(a) is:

$$\Delta t_{ED}(n_i) = \left( \sum_{e_{n_y} \in path(n_0, n_i)} r(e_{n_y}) \right) \times \left( c(E'_{new1}) - c(E_{new1}) \right) \tag{14}$$

where $n_i$ is a pin in part2. Similarly, $\Delta t_{ED}(n_i)$ is more likely to be negative when $E'_{new1}$ is shorter than $E_{new1}$. The delay difference of buff between Fig. 4(b) and Fig. 4(a) is:

$$\begin{aligned} \Delta t_{ED}(buff) ={} & \left( \sum_{e_{n_y} \in path(n_0, n_c)} r(e_{n_y}) \right) \times \left( c(E'_{new1}) + c(buff) \right) + r(E'_{new1}) \times \left( \frac{c(E'_{new1})}{2} + c(buff) \right) \\ & - \left( \sum_{e_{n_y} \in path(n_0, n_a)} r(e_{n_y}) \right) \times \left( c(E_{new1}) + c(buff) \right) - r(E_{new1}) \times \left( \frac{c(E_{new1})}{2} + c(buff) \right) \\ & - \sum_{e_{n_y} \in path(n_c, n_a)} r(e_{n_y}) \times \left( \frac{c(e_{n_y})}{2} + C_a(n_y) - c(E_{new1}) - c(buff) \right) \end{aligned} \tag{15}$$

$\Delta t_{ED}(buff)$ is more likely to be negative when $E'_{new1}$ is shorter than $E_{new1}$. Therefore, according to (13), (14), and (15), the delay of each pin in the sub-net including $n_0$ can be improved by reconnecting to buff through the nearest path.

In the sub-net excluding $n_0$, separating the sub-net into more than one part can effectively decrease the downstream capacitance of each part under the Elmore delay model. Thus, the delays of the pins in the sub-net excluding $n_0$ can be further improved. However, separating one sub-net into several parts requires reconnecting each part to buff, which in turn increases the wire length and the downstream capacitance. The delay improvement diminishes when the number of parts is too large. Therefore, this work partitions the sub-net excluding $n_0$ into at most two parts. The nearest path from buff to $path(n_b, n_e)$ is found. In Fig. 4(b), $E'_{new2}$ is the nearest path from buff to $path(n_b, n_e)$, and $n_d$ is the additional Steiner point. The edge $E'_{dis}$ between $n_d$ and the upstream Steiner point of $n_d$ is disconnected to separate the sub-net into two parts. Then, the delays of all pins in the parts including $n_b$ and $n_d$, respectively, in Fig. 4(a) are:

$$t_{ED}(n_b) = R_d \times \left( c(E_{new2}) + C_a(n_b) \right) + r(E_{new2}) \times \left( \frac{c(E_{new2})}{2} + C_a(n_b) \right) \tag{16}$$

$$t_{ED}(n_d) = R_d \times \left( c(E_{new2}) + C_a(n_b) \right) + r(E_{new2}) \times \left( \frac{c(E_{new2})}{2} + C_a(n_b) \right) + \sum_{e_{n_y} \in path(n_b, n_d)} r(e_{n_y}) \times \left( \frac{c(e_{n_y})}{2} + C_a(n_y) \right) \tag{17}$$

The delays of all pins in the parts including $n_b$ and $n_d$, respectively, in Fig. 4(b) are:

$$t_{ED}(n_b) = R_d \times \left( c(E_{new2}) + C_b(n_b) + c(E'_{new2}) + C_b(n_d) \right) + r(E_{new2}) \times \left( \frac{c(E_{new2})}{2} + C_b(n_b) \right) \tag{18}$$

$$t_{ED}(n_d) = R_d \times \left( c(E_{new2}) + C_b(n_b) + c(E'_{new2}) + C_b(n_d) \right) + r(E'_{new2}) \times \left( \frac{c(E'_{new2})}{2} + C_b(n_d) \right) \tag{19}$$

Thus, the delay differences of all pins in the parts including $n_b$ and $n_d$, respectively, between Fig. 4(b) and Fig. 4(a) are:

$$\Delta t_{ED}(n_b) = R_d \times \left( c(E'_{new2}) - c(E'_{dis}) \right) - r(E_{new2}) \times \left( c(E'_{dis}) + C_a(n_d) \right) \tag{20}$$

$$\begin{aligned} \Delta t_{ED}(n_d) ={} & R_d \times \left( c(E'_{new2}) - c(E'_{dis}) \right) + r(E'_{new2}) \times \left( \frac{c(E'_{new2})}{2} + C_b(n_d) \right) \\ & - r(E_{new2}) \times \left( \frac{c(E_{new2})}{2} + C_a(n_b) \right) - \sum_{e_{n_y} \in path(n_b, n_d)} r(e_{n_y}) \times \left( \frac{c(e_{n_y})}{2} + C_a(n_y) \right) \end{aligned} \tag{21}$$

If $\Delta t_{ED}(n_b)$ is negative, the delay of each pin whose path passes through $n_b$ is improved. If $\Delta t_{ED}(n_d)$ is negative, the delay of each pin whose path passes through $n_d$ is improved. By (20) and (21), we

Chapter 4 Topology-aware ECO Timing Optimization

Fig. 5 depicts the proposed topology-aware ECO timing optimization flow. Given a DEF file, a LEF file, a .lib file, and the violation sink information, the score of each buffering pair is calculated based on the two topology effects on buffer insertion. The buffering pair with the highest score is selected for buffer insertion. After edge breaking and buffer connection, if the inserted buffer satisfies the constraints introduced in the following subsection, buffer reconnection modifies the two sub-nets to further improve the timing. Otherwise, a buffering pair is re-selected until the selected buffer satisfies the constraints. Finally, the modified DEF file is output. Edge breaking and buffer connection and then buffer reconnection are introduced before the buffering pair score computation, which is based on both.
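The selection loop of this flow can be sketched as follows. This is a simplified sketch under stated assumptions: the function name and the three callbacks are hypothetical placeholders for the score computation, the constraint check, and buffer reconnection, and re-selection is assumed to proceed in descending score order, which the text does not state explicitly.

```python
def run_topology_aware_eco(pairs, score, satisfies_constraints, reconnect):
    """Pick the best legal buffering pair, then apply buffer reconnection.

    pairs: iterable of candidate (E_dis, B_ins) buffering pairs
    score: callable returning the score of a pair (higher is better)
    satisfies_constraints: callable, True if the pair is legal after
        edge breaking and buffer connection
    reconnect: callable applying buffer reconnection to a legal pair
    Returns the result of reconnect, or None if no pair is legal.
    """
    # Assumption: candidates are tried from highest to lowest score.
    for pair in sorted(pairs, key=score, reverse=True):
        if satisfies_constraints(pair):
            return reconnect(pair)
    return None
```

Passing the three steps as callbacks keeps the sketch independent of how routing and delay evaluation are actually implemented.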


4.1 Edge Breaking and Buffer Connection

This operation is based on the first derived effect: the edge breaking and buffer connection effect. Given a detailed-routed net $N_m$ and a buffering pair, the edge $E_{dis}$ is broken, and the generated sub-nets are connected to the buffer buff through two additional connections, $E_{new1}$ and $E_{new2}$, as illustrated in Fig. 3. Notably, the sub-nets are connected to the buffer by detailed routing that is aware of routed wire segments and obstacles. If $E_{dis}$, $E_{new1}$, and $E_{new2}$ satisfy (7) and (8), the buffering pair is legal. Otherwise, the buffering pair is illegal.

4.2 Buffer Reconnection

This operation is based on the second derived effect: the buffer reconnection effect. In Fig. 4(b), buffer reconnection routes the additional interconnections, such as $E'_{new1}$ and $E'_{new2}$, with awareness of routed wire segments and obstacles. Therefore, the timing impact can be obtained accurately. For the sub-net including $n_0$, if (13), (14), and (15) are negative when disconnecting $E_{new1}$ and connecting $E'_{new1}$, $E_{new1}$ is removed and the sub-net reconnects to buff through $E'_{new1}$. For the sub-net excluding $n_0$, if (20) and (21) are negative when disconnecting $E'_{dis}$ and connecting $E'_{new2}$, $E'_{dis}$ is disconnected and $E'_{new2}$ is connected to $n_d$.

4.3 Buffering Pair Score Computation

Different buffering pairs have different timing impacts on the pins of a net. Therefore, the buffering pair score computation estimates the timing impacts of all buffering pairs based on the two derived effects. To assign the scores of buffering pairs accurately, pseudo edge breaking and buffer connection and pseudo buffer reconnection are applied, which estimate the wire length of each interconnection by its Manhattan distance instead of the detailed-routed wire length. For a given buffering pair, the delays of all pins after buffer insertion can be improved when (7) and (8) are satisfied.
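A pseudo connection of this kind might be estimated as in the sketch below. The per-unit-length resistance and capacitance parameters are assumptions introduced for illustration; the text does not state how the estimated length is converted into $r(\cdot)$ and $c(\cdot)$ values.

```python
def manhattan(p, q):
    """Manhattan distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def estimate_wire_rc(p, q, r_per_unit, c_per_unit):
    """Estimate (r, c) of a pseudo connection from its Manhattan length.

    r_per_unit and c_per_unit are hypothetical unit-length wire
    parameters; vias and layer changes are ignored in this sketch.
    """
    length = manhattan(p, q)
    return r_per_unit * length, c_per_unit * length
```

For example, `estimate_wire_rc((0, 0), (3, 4), 0.5, 0.25)` estimates a connection of length 7.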

To differentiate buffering pairs, only the delay differences of the timing violation sinks $VP$ are considered. $T1_{VP} = \{\Delta t_{ED1}(vp_1), \Delta t_{ED1}(vp_2), \ldots, \Delta t_{ED1}(vp_k)\}$ is the set of delay differences of $VP$ after pseudo edge breaking and buffer connection, while $T2_{VP} = \{\Delta t_{ED2}(vp_1), \Delta t_{ED2}(vp_2), \ldots, \Delta t_{ED2}(vp_k)\}$ is the set of delay differences of $VP$ after pseudo buffer reconnection. The slack of each pin indicates its importance in timing optimization: a pin with a more negative slack violates the timing constraint more seriously. Each violation sink is assigned a weight, denoted as $w(vp_i)$, according to its original slack:

$$w(vp_i) = \frac{slack(vp_i)}{\sum_{j=1}^{k} slack(vp_j)} \tag{22}$$

By (22), a violation sink with a more negative slack has a greater weight. The score of each buffering pair is then computed as:

$$score(pair_{buf}(E_{dis}, B_{ins})) = - \sum_{i=1}^{k} w(vp_i) \times \left( \Delta t_{ED1}(vp_i) + \Delta t_{ED2}(vp_i) \right) \tag{23}$$

A greater score means that selecting the buffering pair improves the delays of the violation sinks more.
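The weighting in (22) and the score in (23) can be illustrated with a short Python sketch. The function names and list-based inputs are assumptions for illustration; the delay differences are taken as already estimated by the pseudo operations.

```python
def slack_weights(slacks):
    """Eq. (22): weight of each violation sink from its original (negative) slack."""
    total = sum(slacks)
    return [s / total for s in slacks]


def buffering_pair_score(slacks, delta_eb, delta_br):
    """Eq. (23): weighted, negated sum of the delay differences.

    slacks:   original slacks of the violation sinks vp_1..vp_k
    delta_eb: delay differences after pseudo edge breaking and buffer connection
    delta_br: delay differences after pseudo buffer reconnection
    """
    weights = slack_weights(slacks)
    return -sum(w * (d1 + d2) for w, d1, d2 in zip(weights, delta_eb, delta_br))
```

Because every slack in $VP$ is negative, the weights are positive and sum to one, and a negative (improving) delay difference contributes positively to the score.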


Chapter 5 Experimental Results

The proposed algorithm is implemented in C++ and run on an AMD Opteron 3.0GHz 8-core processor with 48GB of memory. Two circuits from the IWLS 2005 benchmarks [9], s35932 and s38417, with DEF, LEF, and .lib files are used. The original benchmarks contain no spare cells. Therefore, this work inserts 300 spare cells into each circuit, so that the ratio of the number of spare cells to the total number of cells is between 3% and 5%, which is reasonable for modern designs [5]. SOC Encounter 8.1 runs P&R on the modified circuits. The statistics of the circuits are shown in Table I, where “Circuit Name” denotes the name of the circuit; “#Cell” denotes the number of cells in the circuit; “#Buffer” denotes the number of buffers in the circuit; “#Net” denotes the number of nets in the circuit; and “CU” denotes the core utilization of the circuit. To demonstrate the effectiveness of the proposed algorithm, this work selects five nets, N1-N5, in s35932 and three nets, N6-N8, in s38417. The statistics of the selected nets are shown in Table II, where “Name” gives the name of the selected net; “Degree” denotes the number of pins connected by the net; and “Original” denotes the worst negative slack (WNS) and total negative slack (TNS) calculated by the Elmore delay (EL) before ECO timing optimization.

Table II shows the experimental results of the proposed topology-aware ECO timing optimization (TOPO). As shown in Table II, edge breaking and buffer connection (EB) and buffer reconnection (BR) of TOPO improve the WNS by 36.5% and 38.8% on average, respectively, resulting in a total WNS improvement of 75.3%. This demonstrates the effectiveness of a buffering pair score computation that considers both EB and BR: a buffering pair with a small delay improvement after EB can still be selected because its delay improvement after BR is larger than those of other buffering pairs. Moreover, TOPO improves the TNS by 76.3% on average. Besides the results based on the Elmore delay, Table II also shows the WNS improvements computed by SOC Encounter 8.1 (SE). The WNS improvements computed by the Elmore delay and by SE are similar. Table II also reports the overall runtime, the runtime of detailed routing (DR), and the runtime excluding DR. A grid-based DR is adopted herein. Almost all of the runtime is spent on DR, which demonstrates the efficiency of TOPO itself.
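For reference, WNS and TNS and their relative improvement can be computed from per-sink slacks as in the sketch below. The function names are illustrative, and the percentage formula is an assumption about how the reported improvements are defined (relative reduction of the original value), since the text does not give the formula.

```python
def worst_negative_slack(slacks):
    """WNS: the most negative slack, or 0.0 if no sink violates timing."""
    return min(0.0, min(slacks))


def total_negative_slack(slacks):
    """TNS: the sum of all negative slacks."""
    return sum(s for s in slacks if s < 0)


def improvement_percent(before, after):
    """Relative improvement of a negative-slack metric, in percent.

    Assumed definition: (before - after) / before * 100, where both
    values are non-positive and before is non-zero.
    """
    return (before - after) / before * 100.0
```

Under this assumed definition, a net whose WNS moves from -4.0 to -1.0 shows a 75% WNS improvement.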

Conventional buffer insertion for timing ECO considers only one critical sink in a multi-pin net. For comparison with conventional buffer insertion, a simulated 2-pin net-based buffer insertion algorithm (SIM) is implemented herein. SIM ignores the topology of the net and treats a multi-pin net as a two-pin net consisting of the driver and the critical sink. SIM searches all buffers and inserts, into the path from the driver to the critical sink, the one that achieves the best delay improvement of the critical sink. The original benchmark nets are modified to contain only one violation sink, which is the WNS sink in the original benchmark, by relaxing the required times of the other sinks. The comparisons of delay improvement and runtime between SIM and TOPO are shown in Table III. The delay improvement is computed by considering the whole multi-pin net. The delay improvements of TOPO are better than those of SIM by 35.2% and 30.1% in terms of the Elmore delay and SE, respectively. TOPO searches all buffering pairs near the whole net in EB, while SIM searches only the buffering pairs near the path from the driver to the critical sink. Moreover, SIM performs no BR. Therefore, compared to SIM, TOPO significantly improves the delay at the cost of runtime.

Chapter 6 Conclusions

This work derives two topology effects on buffer insertion: edge breaking and buffer connection, and buffer reconnection. Based on these topology effects, a novel topology-aware ECO timing optimization algorithm is proposed. Experimental results show that the proposed algorithm effectively improves the worst negative slack (WNS) and total negative slack (TNS) on average. Compared to conventional 2-pin net-based buffer insertion, the proposed algorithm achieves better delay improvement at the cost of runtime.

Chapter 7 Bibliography

[1] K.-H. Chang, I. L. Markov, and V. Bertacco, “Reap what you sow: Spare cells for post-silicon metal fix,” ISPD, pp. 103-110, 2008.

[2] Z.-W. Jiang, M.-K. Hsu, Y.-W. Chang, and K.-Y. Chao, “Spare-Cell-Aware Multilevel Analytical Placement,” DAC, pp. 430-435, 2009.

[3] L.-C. Chou and Y.-L. Li, “A Timing-Driven ECO Router with Elmore Delay Model,” Thesis, Institute of Computer Science and Engineering, College of Computer Science, National Chiao Tung University, 2008.

[4] Y.-P. Chen, J.-W. Fang, and Y.-W. Chang, “ECO Timing Optimization Using Spare Cells,” ICCAD, pp. 530-535, 2007.

[5] C.-W. Chang, C.-T. Chao, C.-P. Lu, and C.-H. Lo, “A Metal-Only-ECO Solver for Input-Slew and Output-Loading Violations,” ISPD, pp. 191-198, 2009.

[6] L.-G. Chang, H.-B. Hung, H.-Y. Chang, and I. H.-R. Jiang, “Matching-Based Minimum-Cost Spare Cell Selection for Design Changes,” DAC, pp. 408-411, 2009.

[7] Y.-S. Tseng, Y.-K. Yeh, M.-C. Chi, and T.-M. Hsieh, “Technology Remapping for Engineering Change with Wirelength Consideration,” ISCAS, pp. 2602-2605, 2010.

[8] J. Rubinstein, P. Penfield, and M. Horowitz, “Signal Delay in RC Tree Networks,” IEEE Trans. Computer-Aided Design, Vol. 2, pp. 202-211, July 1983.
