Buffered Metal Balance Clock Tree - Using a Metal Balance Algorithm to Reduce the Impact of Process Variation on Clock Tree Architecture

To verify that the metal balance algorithm still tolerates manufacturing process variation in a buffered clock tree, we modify the algorithm in [19] to build a buffered metal balance clock tree.

In practical buffer insertion, it is usually desirable that all root-to-leaf paths have the same number of buffer stages. This minimizes the impact on the skew of a clock net when buffer delay changes due to process variation, because the increase or decrease in phase delay is then the same for all paths. Many works use the level-by-level method shown in Figure 3.5 for this purpose.

Figure 3.5: Level-by-level buffer insertion in a binary tree

However, the level-by-level method works well only when the tree topology is a full binary tree in which all sinks are at the same level. If the clock tree is not a full binary tree, as in Figure 3.6(a), the number of buffers from the source to each sink is not the same. Hence, the balanced buffer insertion method of [19] is preferred. This method starts with an equal path-length clock tree T and inserts buffers at the same distances from the clock source. Even if the tree has no clear levels, this method still guarantees the same number of buffers from the source to every sink. Figure 3.6(b) illustrates the balanced buffer insertion method. Because of our clustering, our approach is well suited to the balanced buffer insertion method.
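The contrast between the two insertion styles can be illustrated with a small sketch. It is not the implementation of [19]: the toy tree, the node names, and the simplification "one buffer per edge" for the level-by-level method are all assumptions made for illustration. The balanced variant places a buffer wherever an edge crosses one of n equally spaced distances from the source.

```python
# Toy equal path-length tree: child -> (parent, edge length).
# Every sink is 100 units from "root"; names and lengths are illustrative only.
TREE = {
    "s1": ("root", 100),
    "a": ("root", 40),
    "s2": ("a", 60),
    "b": ("a", 30),
    "s3": ("b", 30),
    "s4": ("b", 30),
}
SINKS = ["s1", "s2", "s3", "s4"]


def edges_to_root(node):
    """Return the (child, length) edges on the path from node up to the root."""
    edges = []
    while node in TREE:
        parent, length = TREE[node]
        edges.append((node, length))
        node = parent
    return edges


def distance_from_source(node):
    """Path length from the root (clock source) down to node."""
    return sum(length for _, length in edges_to_root(node))


def level_by_level_count(sink):
    """Simplified level-by-level insertion: one buffer per edge on the path."""
    return len(edges_to_root(sink))


def balanced_count(sink, n):
    """Count buffers on the path when every edge gets a buffer wherever it
    crosses one of the distances k*L/(n+1), k = 1..n, from the source."""
    total = distance_from_source(sink)  # L, identical for all sinks here
    targets = [k * total / (n + 1) for k in range(1, n + 1)]
    count = 0
    for child, length in edges_to_root(sink):
        end = distance_from_source(child)
        start = end - length
        count += sum(1 for d in targets if start < d <= end)
    return count


for sink in SINKS:
    print(sink, level_by_level_count(sink), balanced_count(sink, n=1))
```

In this toy tree the level-by-level count differs from sink to sink because the sinks sit at different depths, while the distance-based count is the same for every sink because every source-to-sink path has the same length.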

Building the buffered metal balance clock tree takes three steps. First, we build an unbuffered metal balance clock tree. The primary difference is that the clock tree must be an equal path-length clock tree: we use Manhattan distance in place of delay for merging, so that the requirement of equal delay from the source to every sink becomes a requirement of equal distance from the source to every sink. Second, we find the buffer insertion location, which is given by

$$\text{buffer location} = \frac{L}{n + 1}$$

where L is the path distance from the clock source to a sink and n is initialized to 1. Third, we calculate the skew from the clock source to the sinks and check whether the timing constraint is satisfied. If the timing constraint is not satisfied, we increase n by 1 and repeat the second step until it is satisfied. The result is a buffered metal balance clock tree.
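The iteration over n described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the thesis implementation: `buffer_distances` reads the formula as n equally spaced buffers at multiples of L/(n+1), which the text does not state explicitly, and `skew_with` is a hypothetical callback standing in for the skew calculation of the third step. The `max_buffers` cap is also an addition, so that the loop always terminates.

```python
def buffer_distances(path_length, n):
    """Assumed reading of the formula: n buffers at k*L/(n+1), k = 1..n."""
    return [k * path_length / (n + 1) for k in range(1, n + 1)]


def insert_buffers(path_length, skew_with, skew_limit, max_buffers=64):
    """Increase n until the skew reported by skew_with meets skew_limit.

    skew_with(locations) is a hypothetical stand-in for the skew analysis;
    it receives the buffer distances from the source and returns a skew.
    """
    n = 1
    while n <= max_buffers:
        locations = buffer_distances(path_length, n)
        if skew_with(locations) <= skew_limit:
            return n, locations
        n += 1
    raise RuntimeError("timing constraint not met within max_buffers")


def toy_skew(locations):
    """Made-up skew model for demonstration only: skew shrinks as n grows."""
    return 40.0 / (len(locations) + 1)


n, locations = insert_buffers(300, toy_skew, skew_limit=10.0)
print(n, locations)
```

With a real flow, `toy_skew` would be replaced by a timing analysis of the buffered tree; the loop structure stays the same.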

Figure 3.6: (a) The level-by-level method in a non-full binary clock tree. (b) The balanced buffer insertion method in a non-full binary clock tree.

Chapter 4 Experimental Results

We implemented our approach in C++ and tested it on a 2.8 GHz AMD Opteron(tm) machine with 2.0 GB of memory. We use the UMC 90nm standard cell library for conventional buffers. The benchmark circuits are r1-r5, downloaded from the GSRC Bookshelf (http://vlsicad.ucsd.edu/GSRC/bookshelf/Slots/BST/).

Table 4.1 shows the number of sinks in benchmarks r1-r5.

Table 4.2 shows Δx and Δy for DME and for our metal balance algorithm. The fifth and sixth columns give the improvement of ours over DME. Our Δx and Δy are 20% to 74% smaller than those of DME. Hence, our approach makes the clock tree more balanced.

To simulate the effect of manufacturing variations, we add two parameters to the delay model:

$$r(\alpha L_x + \beta L_y)\left(\frac{c(\alpha L_x + \beta L_y)}{2} + C_{v1}\right) + t_{v1}$$

The primary difference is that we use (αLx + βLy) to replace the distance l. α and β are user-defined, and Lx and Ly are the horizontal and vertical wire lengths. The results are shown in Table 4.3, which compares DME and our metal balance algorithm on unbuffered clock trees. The first column shows the wirelength of DME, normalized to 1, and the third column shows the ratio of our wirelength to that of DME. Our wirelength is 0.1% to 6.9% longer than that of DME, because DME is an algorithm that seeks the minimum wirelength. The second and fourth columns are the skews of DME and ours under the new delay model with α = 1.1 and β = 0.9. The fifth column shows that our algorithm reduces the skew under process variation. There is therefore a trade-off between wirelength and tolerance to process variation.
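A short sketch may help show why the modified delay model rewards balanced metal usage. It assumes the model is the usual Elmore-style wire delay with l replaced by αLx + βLy, and it reads the two remaining terms as the downstream capacitance and downstream delay, which the text does not define. The function name and all electrical values are hypothetical and chosen only for illustration.

```python
import math


def wire_delay(l_x, l_y, r, c, c_load, t_load, alpha=1.0, beta=1.0):
    """Delay of a wire with horizontal length l_x and vertical length l_y.

    r, c   : resistance and capacitance per unit length
    c_load : downstream capacitance (assumed meaning of the C term)
    t_load : downstream delay (assumed meaning of the t term)
    alpha, beta : user-defined variation factors for the two directions
    """
    length = alpha * l_x + beta * l_y
    return r * length * (c * length / 2.0 + c_load) + t_load


# Hypothetical per-unit values; not taken from the UMC 90nm library.
params = dict(r=0.1, c=0.2, c_load=5.0, t_load=0.0)

nominal = wire_delay(100, 100, **params)
balanced = wire_delay(100, 100, alpha=1.1, beta=0.9, **params)
unbalanced = wire_delay(200, 0, alpha=1.1, beta=0.9, **params)

print(nominal, balanced, unbalanced)
```

With α = 1.1 and β = 0.9, a wire that splits its length equally between the two directions keeps its nominal effective length, because the two deviations cancel, whereas a purely horizontal wire of the same total length becomes 10% longer in effect and its delay grows accordingly.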

Table 4.4 shows the results of DME and ours on buffered clock trees. The first and second columns are the skews of DME and ours. Because buffer insertion reduces the arrival time, the skew difference between DME and ours is smaller than in Table 4.3. However, the third column shows that our skew is still less than that of DME.

4.1 Discussion

In its top-down phase, DME [1] resolves the exact locations of all internal nodes of the ZST. These locations are always end points of merging segments, because DME tries to find the shortest path. However, the end points often have the maximum wirelength difference to their children. The works in [2]-[9] improve the bottom-up phase of DME, but their top-down phase is the same as in DME: they all try to find the shortest path. Hence, they cannot balance the wirelength across the metal layers, and the skew is easily affected by the manufacturing process.

Table 4.1: The number of sinks in benchmark r1-r5. The benchmark circuits are

Table 4.2: Comparison of the horizontal and vertical metal wirelength differences (Δx and Δy) of DME and metal balance (MB). The results show an advantage in balance: the Δx of MB is on average 36% less than that of DME, and the Δy of MB is on average 30% less than that of DME.

      DME Δx      DME Δy     MB Δx      MB Δy      Δx improve   Δy improve
r1    29093.5     29503.2    20561.8    22033.4    29%          25%
r2    54878.79    45205.6    35044.7    32840.2    36%          27%
r3    57852.77    57337.6    43300.1    30522.1    25%          47%
r4    58434.2     68738.6    46557.4    52534.2    20%          24%
r5    126556.1    62453.4    32376.2    46274.3    74%          26%
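The improvement columns are consistent with the relative reduction (DME - MB) / DME, rounded to the nearest percent. The text does not state this formula, so it is an assumption, but the short check below reproduces the reported percentages from the values in Table 4.2.

```python
# Rows of Table 4.2: (DME dx, DME dy, MB dx, MB dy).
TABLE_4_2 = {
    "r1": (29093.5, 29503.2, 20561.8, 22033.4),
    "r2": (54878.79, 45205.6, 35044.7, 32840.2),
    "r3": (57852.77, 57337.6, 43300.1, 30522.1),
    "r4": (58434.2, 68738.6, 46557.4, 52534.2),
    "r5": (126556.1, 62453.4, 32376.2, 46274.3),
}


def improvement(dme, mb):
    """Relative reduction of MB with respect to DME (assumed definition)."""
    return (dme - mb) / dme


def improvement_percent(row):
    """Return the (dx, dy) improvements of one row as whole percentages."""
    dme_dx, dme_dy, mb_dx, mb_dy = row
    return (round(100 * improvement(dme_dx, mb_dx)),
            round(100 * improvement(dme_dy, mb_dy)))


for name, row in TABLE_4_2.items():
    print(name, improvement_percent(row))
```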

Table 4.3: Wirelength (WL) and skew comparison between DME and metal balance (MB) on unbuffered clock trees under manufacturing process variation. The results show an advantage in skew: the MB skew is on average 60% less than that of DME. However, the MB wirelength is on average 4% more than that of DME.

      DME WL   DME skew (ps)   MB WL    MB skew (ps)   skew improve
r1    1        17.2            1.001    4.4            74%
r2    1        45              1.048    29             36%
r3    1        113             1.045    37             67%
r4    1        310             1.069    97             69%
r5    1        650             1.035    290            55%

Table 4.4: Skew comparison between DME and metal balance (MB) on buffered clock trees under manufacturing process variation. The results show an advantage in skew: the MB skew is on average 37% less than that of DME.

      DME skew (ps)   MB skew (ps)   skew improve
r1    5.5637          3.5234         37%
r2    5.87            1.5            74%
r3    21.602          18.957         12%
r4    1.34            1.057          21%
r5    15              8.384          44%

Chapter 5
