4.2 Buffer Insertion
4.2.2 Buffered Tree Transformation
Because the buffer insertion algorithm is recursive, it proceeds from the leaves to the root. In a routing tree, however, sink nodes may be the parents of other nodes. To simplify the execution of the buffer insertion algorithm, we do not allow sinks to have children. We slightly modify the tree structure and use additional pseudo nodes to force every sink to be a leaf of the tree. An example is illustrated in Fig. 4.9. In this example, the sink t1 has one child, s1. We generate a pseudo node s2 at the same position as t1. We complete the transformation by creating the edges e_{s2→s1}, e_{s2→t1}, and e_{root→s2}, and removing the edges e_{root→t1} and e_{t1→s1}.
Figure 4.7: Example of the merge operation. The two columns on the left are the solution sets being merged; pairs of solutions that are combined are circled. The resulting set of solutions appears on the right.
Algorithm BottomUp(root)
    if root is a leaf
        evaluate the sink solution
    else
        for i = 1, ..., n    (n is the number of children)
            BottomUp(children(i))
        if root is not a branch point
            evaluate the sorted solutions for root
        else
            evaluate all sorted solutions of the children
            MergeSolution()
        evaluate the optimal buffer solution
Figure 4.8: Dynamic programming algorithm for buffer insertion with pruning [15, 20].
Next, we consider the different ways buffers can be connected at a branch point and further modify the tree topology to handle all of them. In Figure 4.10, (a) is the topology under consideration; its possible buffer configurations are shown in (b) through (h). Since the buffer insertion algorithm computes the optimal solution for a given tree, we can apply the same transformation technique to convert the tree into the structure shown in (i), so that the algorithm itself selects the best configuration. This simple transformation yields better solutions than ignoring the different buffer connections at branch points, because all possible buffer configurations are now taken into account, not only (b) and (c). We call this transformation the “decouple function” in our platform.
Figure 4.9: Illustration of tree transformation.
Figure 4.10: (a) All buffer combinations are considered at candidate point C. (b) No buffer is inserted at C. (c) One buffer at C drives t1 and t2. (d) One buffer at C drives only t1. (e) One buffer at C drives only t2. (f) Two buffers at C drive t1 and t2, respectively. (g), (h) Two buffers are inserted at C, and one of them decouples t1 or t2. (i) Candidate point C is transformed into three pseudo points to handle all buffer combinations.
4.3 Summary
The complete flow of our two-stage buffered tree construction algorithm consists of the following steps:
1. Construct the grid graph.
2. Construct the performance-driven routing tree using the IDOM algorithm.
3. Transform the tree structure (optionally, consider all possible buffer connections at branch points).
4. Execute the buffer insertion algorithm on the tree.
Chapter 5
Experimental Results
We implemented two approaches for buffered interconnect tree construction in C++: a two-level hierarchical simulated annealing algorithm and a two-stage buffered tree algorithm. Both were tested on a 2.4 GHz Pentium 4 PC with 512 MB of memory. To demonstrate the effectiveness of our approaches, we compare the results with the fast flat simulated annealing algorithm [9]. We use the same technology parameters as [19], listed in Table 5.1. The chip size is 17 × 17 mm², with horizontal and vertical grid lines spaced 0.5 mm apart.
Table 5.1: Technology Parameters.
Tables 5.2 and 5.3 compare the four methods using a single buffer type and two buffer types, respectively. We now examine the efficiency and performance of these algorithms. In terms of execution time, the simulated annealing algorithms take much longer, and their run time grows considerably when a second buffer type is added, whereas the run time of the two-stage algorithm stays the same. This is because the run time of performance-driven interconnect tree construction is fixed and the buffer insertion algorithm is very fast. In contrast, the simulated annealing algorithm spends a long time constructing lookup tables and searching for the optimal buffered tree, so it requires more time when multiple buffer types are used. In terms of performance, our two-stage algorithm achieves better delay in most of our experimental cases. The main reason is that the performance-driven tree has a lower load and a smaller tree radius, so its delay is naturally better than that of the trees produced by the simulated annealing algorithms.
Table 5.2: Performance comparison (Buffer Types = 1, Blockages = 11) between the approach in [9] (Flat SA) and our approaches. Our two-stage approach achieves better delay and wirelength than the SA algorithm. (WL: wirelength; buf: number of buffers; CPU: run time.)
      | Flat SA                 | Two-Level SA            | Two-Stage (Decouple Off)| Two-Stage (Decouple On)
Net   | delay    WL  buf    CPU | delay    WL  buf    CPU | delay    WL  buf    CPU | delay    WL  buf    CPU
      |  (ps)  (mm)         (s) |  (ps)  (mm)         (s) |  (ps)  (mm)         (s) |  (ps)  (mm)         (s)
NET8  |  1292    77   20   5.88 |  1284    73   18   5.05 |  1309    65   20   0.64 |  1249    65   21   0.64
NET11 |   976  78.5   21    8.0 |   955  74.5   15    4.5 |   892  66.5   20   0.73 |   859  66.5   22   0.73
NET18 |  1239   124   29   8.86 |  1059    99   21   5.67 |  1118  94.5   30   1.15 |  1035  94.5   34   1.15
NET23 |  1319   141   35  10.23 |  1100   106   25   6.69 |  1037    96   32   1.54 |   992    96   35   1.54
NET25 |  1084   154   45  11.86 |  1182 117.5   26    7.5 |  1055  97.5   33    1.7 |   999  97.5   37    1.7
In addition, turning on the decouple function of the two-stage algorithm reduces the delay further. Figures 5.1 and 5.2 show examples with the decouple function turned off and on, respectively.
We now discuss the two proposed algorithms in more detail. The two-level hierarchical buffered tree construction is based on the simulated annealing algorithm of [9]. That algorithm mainly emphasizes a shorter execution time than previous simultaneous approaches [16, 11]. However, we find that it has a major drawback: its solution quality is highly uncertain, especially for nets with many terminals. For such nets, the buffered trees constructed by the simulated annealing algorithm have very long wirelength. We believe this long wirelength is why it yields poor delay and uses many buffers. To address this weakness, we use a two-level hierarchical method to reduce the wirelength. In our experimental results, the two-level hierarchical method achieves shorter wirelength and uses fewer buffers. However, its delay is slightly worse in some cases, which we attribute to a longer tree radius.
Table 5.3: Performance comparison (Buffer Types = 2, Blockages = 11). Parameters of the second buffer: output resistance = 90 Ω, input capacitance = 0.048 pF, intrinsic delay = 36.4 ps.
      | Flat SA                 | Two-Level SA            | Two-Stage (Decouple Off)| Two-Stage (Decouple On)
Net   | delay    WL  buf    CPU | delay    WL  buf    CPU | delay    WL  buf    CPU | delay    WL  buf    CPU
      |  (ps)  (mm)         (s) |  (ps)  (mm)         (s) |  (ps)  (mm)         (s) |  (ps)  (mm)         (s)
NET8  |  1180    76   23  53.73 |  1196    74   19  41.94 |  1085    65   20   0.64 |  1049    65   23   0.64
NET11 |   837  80.5   24  61.98 |   796  74.5   17  42.86 |   753  66.5   22   0.73 |   728  66.5   25   0.73
NET18 |  1104 141.5   36    100 |   927   104   19  47.95 |   924  94.5   34   1.15 |   875  94.5   36   1.15
NET23 |  1135 173.5   58 127.42 |   926   114   24  56.08 |   870    96   32   1.54 |   838    96   34   1.54
NET25 |   994   195   55    198 |  1018 131.5   28  55.31 |   880  97.5   35    1.7 |   841  97.5   36    1.7
To overcome the weaknesses of the algorithms above, we propose a two-stage buffered tree construction method. We believe that performance-driven tree construction and buffer insertion can be performed independently, so they do not need to be considered at the same time. If we can construct an interconnect tree that is inherently good for delay and then insert buffers into it, we can obtain a sufficiently good solution.
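Our experimental results show that the two-stage algorithm uses less wirelength than both simulated annealing approaches and, with the decouple function turned on, obtains the lowest delay on every net in Tables 5.2 and 5.3; on the larger nets (NET23 and NET25) it also uses no more buffers than the flat SA approach. In addition, it runs much faster than both simulated annealing algorithms.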
Figure 5.1: Result of the two-stage method with the decouple function turned off. Delay = 1309 ps, wirelength = 65 mm, number of buffers = 20.
Figure 5.2: Result of the two-stage method with the decouple function turned on. Delay = 1249 ps, wirelength = 65 mm, number of buffers = 21.
Chapter 6
Conclusion and Future Work
As interconnect delay becomes increasingly important, it must be taken into consideration during chip design. The buffer insertion algorithm can minimize the delay of a fixed tree; however, its solution may be limited by the quality of the input tree. [9] proposed a fast simulated annealing algorithm that simultaneously constructs the routing tree and performs buffer insertion, but this algorithm suffers from unpredictable solution quality when the net has many terminals. We tried to solve this problem by clustering, which reduces wirelength and uses fewer buffers. In some cases, however, the clustering algorithm may yield worse delay because of a longer tree radius.
We believe that routing tree construction and buffer insertion can be performed independently. We propose a two-stage algorithm to construct the buffered tree efficiently: first, we construct a performance-driven interconnect tree, and then we apply the buffer insertion algorithm to minimize delay. The experimental results show that our algorithm is more efficient than [9] and obtains better delay. We conclude that, by decoupling routing tree construction from buffer insertion, the two-stage algorithm achieves shorter run time and better performance than the simultaneous approach.
In future work, we plan to further improve the two-stage algorithm by incorporating wire sizing. We may also explore other approaches for synthesizing better performance-driven trees.
Acknowledgment: We thank Dr. Xiaoping Tang and Prof. Chris C. N. Chu for providing their platforms for the experimental comparison.
Bibliography
[1] M. J. Alexander and G. Robins. “New Performance-driven FPGA Routing Algorithms”. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 15(12):1505–1517, Dec. 1996.
[2] C. J. Alpert, A. Devgan, and S. T. Quay. “Buffer Insertion with Adaptive Blockage Avoidance”. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 18(11):1633–1645, Nov. 1999.
[3] C. J. Alpert, G. Gandham, M. Hrkic, J. Hu, A. B. Kahng, B. Liu, J. Lillis, S. T. Quay, S. S. Sapatnekar, and A. J. Sullivan. “Buffered Steiner Trees for Difficult Instances”. In Proceedings International Symposium on Physical Design, pages 4–9, 2001.
[4] C. J. Alpert, T. C. Hu, J. H. Huang, A. B. Kahng, and D. Karger. “Prim-Dijkstra Tradeoffs for Improved Performance-driven Routing Tree Design”. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 14(7):890–896, July 1995.
[5] J. P. Cohoon and L. J. Randall. “Critical Net Routing”. In Proceedings IEEE International Conference on Computer Design, pages 174–177, 1991.
[6] J. Cong. “Challenges and Opportunities for Design Innovations in Nanometer Technologies”. In Semiconductor Research Corporation Design Sciences Concept Paper, pages 1–15, 1998.
[7] J. Cong, A. B. Kahng, and K.-S. Leung. “Efficient Algorithms for The Minimum Shortest Path Steiner Arborescence Problem with Applications to VLSI Physical Designs”. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 17(1):24–39, Jan. 1998.
[8] J. Cong and X. Yuan. “Routing Tree Construction Under Fixed Buffer Locations”. In Proceedings IEEE/ACM Design Automation Conference, pages 379–384, 2000.
[9] S. Dechu, C. Shen, and C. Chu. “An Efficient Routing Tree Construction Algorithm with Buffer Insertion, Wire Sizing and Obstacle Considerations”. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 24(4):600–608, April 2005.
[10] M. Hrkic and J. Lillis. “S-Tree: A Technique for Buffered Routing Tree Synthesis”. In Proceedings IEEE/ACM Design Automation Conference, pages 578–583, 2002.
[11] M. Hrkic and J. Lillis. “Buffer Tree Synthesis with Consideration of Temporal Locality, Sink Polarity Requirements, Solution Cost and Blockages”. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 22(4):481–491, April 2003.
[12] F. K. Hwang, D. S. Richards, and P. Winter. “The Steiner Tree Problem”. North-Holland, 1992.
[13] J. Hu and S. S. Sapatnekar. “Algorithms for Non-Hanan-Based Optimization for VLSI Interconnect under a Higher-Order AWE Model”. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 19(4):446–458, April 2000.
[14] M.-H. Lai and D. F. Wong. “Maze Routing with Buffer Insertion and Wiresizing”. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 21(10):1205–1209, Oct. 2002.
[15] J. Lillis, C.-K. Cheng, and T.-T. Lin. “Optimal Wire Sizing and Buffer Insertion for Low Power and a Generalized Delay Model”. IEEE Journal of Solid-State Circuits, 31(3):437–447, March 1996.
[16] J. Lillis, C.-K. Cheng, T.-T. Lin, and C.-Y. Ho. “New Performance Driven Routing Techniques with Explicit Area/Delay Tradeoff and Simultaneous Wire Sizing”. In Proceedings IEEE/ACM Design Automation Conference, pages 395–400, 1996.
[17] T. Okamoto and J. Cong. “Buffered Steiner Tree Construction with Wire Sizing for Interconnect Layout Optimization”. In Proceedings IEEE/ACM International Conference on Computer-Aided Design, pages 44–49, 1996.
[18] R. R. Rao, D. Blaauw, D. Sylvester, C. J. Alpert, and S. Nassif. “An Efficient Surface-Based Low-Power Buffer Insertion Algorithm”. In Proceedings International Symposium on Physical Design, pages 86–93, 2005.
[19] X. Tang, R. Tian, H. Xiang, and D. F. Wong. “A New Algorithm for Routing Tree Construction with Buffer Insertion and Wire Sizing under Obstacle Constraints”. In Proceedings IEEE/ACM International Conference on Computer-Aided Design, pages 49–56, 2001.
[20] L. P. P. P. van Ginneken. “Buffer Placement in Distributed RC-tree Network for Minimal Elmore Delay”. In Proceedings International Symposium on Circuits and Systems, pages 865–868, 1990.
[21] H. Zhou, D. F. Wong, I.-M. Liu, and A. Aziz. “Simultaneous Routing and Buffer Insertion with Restrictions on Buffer Locations”. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 19(7):819–824, July 2000.
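Author Biography
游宗達 was born in Hualien County, Taiwan, in September 1981. He graduated from the Department of Electronics Engineering, National Chiao Tung University, in June 2003, and in September of the same year entered the Institute of Electronics at National Chiao Tung University, where he conducted research on VLSI physical design. He received his master's degree in June 2005. His master's thesis is titled “A Study of Routing with Obstacle Consideration and Buffer Insertion” (考慮障礙物繞線及緩衝器插入之方法研究).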