Retrace Algorithm

R'_i = R_{min0} − β_{buffer} · load

**R'_i = R_{min0} − β_{buffer} · load − S_b**

Algorithm 5 Retrace Algorithm

Inputs: boolean tLevel[k], k = 1, 2, ..., n, indicating for each k whether the two-level algorithm is used; int next[k], k = 1, 2, ..., n, the first sink that is not connected to the root directly.

Output: the total number of buffers nBuffer; the parent node of each sink, pSink[k]; the parent node of each buffer, pBuffer[i], i = 1, 2, ..., nBuffer.

begin
    int step = -1;
    int i = 0;
    while i < n + 1 do
        if tLevel[i] == true then
            run two-level algorithm to get nBuffer and the num for each sink
            for i = n to k+1 do
                for j = 1 to nBuffer do
                    pBuffer[step+1+j] = step;
                end for
                pSink[i] = step+1+num;
            end for
            nBuffer = nBuffer + step + 1;
            break;
        else
            for j = i to (next[i] - 1) do
                pSink[j] = step;
                pBuffer[step+1] = step;
            end for
        end if
        step++;
        i = next[i];
    end while
end
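The retrace step of Algorithm 5 can be sketched in C++ as follows. This is a minimal sketch under several assumptions, not the thesis implementation: sinks and buffers are numbered from 0, the source is represented by -1, and the output of the two-level algorithm (which is not shown in this section) is passed in through the hypothetical parameters `twoLevelBuffers` and `twoLevelNum`, where `twoLevelNum[s]` is the 0-based index of the two-level buffer that drives sink `s`. The sketch also assumes `next[i] > i` so that the loop terminates.

```cpp
#include <vector>

// Result of the retrace step; a parent value of -1 denotes the source.
struct RetraceResult {
    int nBuffer = 0;
    std::vector<int> pSink;    // parent node of each sink
    std::vector<int> pBuffer;  // parent node of each buffer
};

// Sketch of Algorithm 5 with 0-based indices (an assumption of this example).
RetraceResult retrace(const std::vector<bool>& tLevel,
                      const std::vector<int>& next,
                      int twoLevelBuffers,
                      const std::vector<int>& twoLevelNum) {
    const int n = static_cast<int>(tLevel.size());
    RetraceResult r;
    r.pSink.assign(n, -1);
    int step = -1;  // node that currently acts as the parent
    int i = 0;
    while (i < n) {
        if (tLevel[i]) {
            // The remaining sinks are handled by the two-level result:
            // every two-level buffer hangs under the current parent.
            for (int j = 0; j < twoLevelBuffers; ++j) {
                r.pBuffer.push_back(step);
            }
            for (int s = i; s < n; ++s) {
                r.pSink[s] = step + 1 + twoLevelNum[s];
            }
            break;
        }
        // Sinks i .. next[i]-1 connect directly to the current parent.
        for (int j = i; j < next[i]; ++j) {
            r.pSink[j] = step;
        }
        // Buffer step+1 is driven by the current parent and becomes
        // the parent for the next group of sinks.
        r.pBuffer.push_back(step);
        ++step;
        i = next[i];
    }
    r.nBuffer = static_cast<int>(r.pBuffer.size());
    return r;
}
```

As in the pseudocode, the chain branch adds one buffer per group, so the total buffer count is the number of chain buffers plus the number of two-level buffers.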

Chapter 4 Experimental Results

All the algorithms are implemented in C++, and the platform used for this master thesis is a Pentium 4 2.66 GHz machine with 1280 MB of DRAM. The resistance and per-unit capacitance parameters are taken from [10]. The per-unit-length interconnect parameters are applied to every connection between nodes.
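To illustrate how per-unit-length parameters enter a delay computation, the sketch below evaluates the standard Elmore delay of a uniform wire driving a lumped load, r·L·(c·L/2 + C_load). The function name and its parameters are assumptions made for this example; the actual parameter values come from [10] and are not reproduced here.

```cpp
// Elmore delay of a uniform wire of length `len` that drives a lumped
// load `cLoad`. `rPerUnit` and `cPerUnit` are the per-unit-length wire
// resistance and capacitance (hypothetical names for this sketch).
double wireElmoreDelay(double rPerUnit, double cPerUnit,
                       double len, double cLoad) {
    const double rWire = rPerUnit * len;  // total wire resistance
    const double cWire = cPerUnit * len;  // total wire capacitance
    // The distributed wire capacitance contributes half of its value.
    return rWire * (cWire / 2.0 + cLoad);
}
```

With a unit-length connection, the expression reduces to r·(c/2 + C_load).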

There are three output files:

1. The number and the names of the buffers used.

2. The net information among the nodes: source, sinks, and buffers.

3. The runtime for each benchmark and related information.

The information for each benchmark is shown in Table 4.1. In Table 4.2 to Table 4.4, Minimum is the minimum required time at the sinks, Original is the required time at the source without buffer insertion, Ideal is the potential best required time, Result is the final required time at the source after buffer insertion, and NBuffer is the number of buffers used for each benchmark. The simulation results are shown in Table 4.2 to Table 4.4.

While a great number of papers have been written on fanout optimization, many of them do not consider interconnect delay at all.

The * symbol in Table 4.2 to Table 4.4 marks the results of the whole algorithm run with interconnect delay taken into account. When interconnect delay is included, the delay values in Table 4.2 to Table 4.4 change, and the number of buffers also differs from that obtained without interconnect delay. Result** means that we check the timing from every sink to the source and choose the smallest value.
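One way to read the Result** check is as a minimum over all sinks of the sink's required time minus the delay of the path from the source to that sink. The sketch below follows that reading; the `SinkTiming` structure and the function name are assumptions for illustration, and the path delays are taken as given rather than computed.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// Timing data of one sink (hypothetical structure for this sketch).
struct SinkTiming {
    double required;   // required time at the sink
    double pathDelay;  // delay from the source to this sink
};

// Required time at the source: the smallest value of
// (required time - path delay) over all sinks.
double requiredTimeAtSource(const std::vector<SinkTiming>& sinks) {
    double best = std::numeric_limits<double>::infinity();
    for (const SinkTiming& s : sinks) {
        best = std::min(best, s.required - s.pathDelay);
    }
    return best;
}
```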

Table 4.1: Benchmark Information

| | Bench1 | Bench2 | Bench3 | Bench4 | Bench5 |
|---|---|---|---|---|---|
| α_source | 1 | 1 | 1 | 1 | 1 |
| β_source | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| α_buf | 1 | 1 | 1 | 1 | 1 |
| β_buf | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| C_buf | 1 | 1 | 1 | 1 | 1 |
| Total Sinks | 1000 | 2000 | 3000 | 4000 | 5000 |

| | Bench6 | Bench7 | Bench8 | Bench9 | Bench10 |
|---|---|---|---|---|---|
| α_source | 1 | 1 | 1 | 1 | 1 |
| β_source | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| α_buf | 1 | 1 | 1 | 1 | 1 |
| β_buf | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| C_buf | 1 | 1 | 1 | 1 | 1 |
| Total Sinks | 6000 | 7000 | 8000 | 9000 | 10000 |

Except for the Method, NBuffer, and Runtime fields, every field in Table 4.2 to Table 4.4 is given in picoseconds. For each benchmark, we first use the combinational merging algorithm, and if the obtained required time is within a small range of the ideal required time, the computation stops there. Otherwise, the LT-Trees algorithm is called for a better solution. Since the combinational merging algorithm is efficient, the overhead it adds to the benchmarks that end up using the LT-Trees algorithm is acceptable. Taking the interconnect delay into account reduces the number of buffers used in most benchmarks.
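The selection between the two algorithms described above can be sketched as follows. This is only an illustration: the two algorithms are passed in as callbacks, and the `Solution` structure and the `tolerance` parameter are hypothetical, since the text does not quantify the "small range".

```cpp
#include <functional>
#include <string>

// Outcome of one fanout optimization run (hypothetical structure).
struct Solution {
    double requiredTime;  // required time at the source
    int nBuffer;          // number of buffers used
    std::string method;   // name of the algorithm that produced it
};

// Run combinational merging first and fall back to LT-Trees only when
// the result is not within `tolerance` of the ideal required time.
Solution optimizeFanout(const std::function<Solution()>& combinationalMerging,
                        const std::function<Solution()>& ltTrees,
                        double idealRequiredTime,
                        double tolerance) {
    Solution cm = combinationalMerging();
    if (idealRequiredTime - cm.requiredTime <= tolerance) {
        return cm;  // close enough to the ideal: stop here
    }
    return ltTrees();  // otherwise look for a better solution
}
```

Because the LT-Trees callback is invoked only on the fallback path, benchmarks that already meet the tolerance pay only the cost of combinational merging.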

Table 4.2: Simulation Results of the LT-Trees and Combinational Merging

| | Bench1 | Bench2 | Bench3 | Bench4 | Bench5 |
|---|---|---|---|---|---|
| Minimum | 70265 | 76067 | 70265 | 80005 | 80000 |
| Original | 68933 | 73404 | 66271 | 74071 | 72726 |
| Ideal | 70263 | 76063 | 70263 | 80002 | 79998 |
| Result | 70263 | 76056 | 70258 | 80002 | 79997 |
| Result** | 70262 | 75993 | 70139 | 80001 | 79995 |
| Result* | 70257 | 76036 | 70241 | 79997 | 79991 |
| Delay | 2.0221 | 10.9944 | 7.8357 | 2.0005 | 3 |
| Delay* | 8.0672 | 30.8928 | 24.0712 | 7.5015 | 8.5008 |
| NBuffer | 493 | 135 | 168 | 2497 | 3062 |
| NBuffer* | 476 | 136 | 168 | 1422 | 1488 |
| Runtime | 0.2810 | 0.5150 | 2.3590 | 8.3760 | 13.1720 |
| Runtime* | 0.2970 | 0.5160 | 2.4840 | 8.8280 | 15.2040 |
| Method | LT-TREES | C.M. | C.M. | LT-TREES | LT-TREES |

| | Bench6 | Bench7 | Bench8 | Bench9 | Bench10 |
|---|---|---|---|---|---|
| Minimum | 80000 | 76067 | 70265 | 80000 | 80000 |
| Original | 71285 | 66749 | 59617 | 67179 | 65651 |
| Ideal | 79998 | 76064 | 70263 | 79998 | 79998 |
| Result | 79997 | 76054 | 70259 | 79997 | 79996 |
| Result** | 79996 | 75893 | 70145 | 79996 | 79995 |
| Result* | 79990 | 76035 | 70241 | 79990 | 79989 |
| Delay | 2.1653 | 12.2716 | 6.4508 | 2.7819 | 3.0004 |
| Delay* | 9.0013 | 31.8043 | 24.4905 | 9.5019 | 10.5032 |
| NBuffer | 3363 | 267 | 288 | 3177 | 2598 |
| NBuffer* | 1415 | 267 | 288 | 1004 | 802 |
| Runtime | 20.3440 | 23.2660 | 28.2660 | 54.8600 | 71.7190 |
| Runtime* | 23.7810 | 24.4060 | 29.6400 | 64.1880 | 84.0940 |
| Method | LT-TREES | C.M. | C.M. | LT-TREES | LT-TREES |

Table 4.3: Simulation Results of the LT-Trees

| | Bench1 | Bench2 | Bench3 | Bench4 | Bench5 |
|---|---|---|---|---|---|
| Minimum | 70265 | 76067 | 70265 | 80005 | 80000 |
| Original | 68933 | 73404 | 66271 | 74071 | 72726 |
| Ideal | 70263 | 76063 | 70263 | 80002 | 79998 |
| Result | 70263 | 75994 | 70140 | 80002 | 79997 |
| Result** | 70262 | 75993 | 70139 | 80001 | 79996 |
| Result* | 70257 | 75970 | 70097 | 79997 | 79991 |
| Delay | 2.0221 | 72.7322 | 125.2144 | 2.0005 | 3 |
| Delay* | 8.0672 | 96.1278 | 168.1960 | 7.5015 | 8.5008 |
| NBuffer | 493 | 980 | 671 | 2497 | 3062 |
| NBuffer* | 476 | 868 | 622 | 1422 | 1488 |
| Runtime | 0.2660 | 1.0780 | 2.3590 | 7.7970 | 13.3900 |
| Runtime* | 0.2970 | 1.2510 | 3.7650 | 8.7660 | 15.1570 |
| Method | LT-TREES | LT-TREES | LT-TREES | LT-TREES | LT-TREES |

| | Bench6 | Bench7 | Bench8 | Bench9 | Bench10 |
|---|---|---|---|---|---|
| Minimum | 80000 | 76067 | 70265 | 80000 | 80000 |
| Original | 71285 | 66749 | 59617 | 67179 | 65651 |
| Ideal | 79998 | 76063 | 70263 | 79998 | 79998 |
| Result | 79997 | 75894 | 70146 | 79997 | 79996 |
| Result** | 79996 | 75893 | 70145 | 79996 | 79995 |
| Result* | 79990 | 75860 | 70097 | 79990 | 79989 |
| Delay | 2.1653 | 172.3098 | 119.2462 | 2.7819 | 3.0004 |
| Delay* | 9.0013 | 206.2102 | 167.9518 | 9.5019 | 10.5032 |
| NBuffer | 3363 | 1039 | 2872 | 3177 | 2598 |
| NBuffer* | 1415 | 979 | 1662 | 1004 | 802 |
| Runtime | 21 | 27.6410 | 28.2660 | 57.2660 | 74.7810 |
| Runtime* | 23.7030 | 31.3600 | 43.5790 | 64.5310 | 83.7650 |

Table 4.4: Simulation Results of Combinational Merging

| | Bench1 | Bench2 | Bench3 | Bench4 | Bench5 |
|---|---|---|---|---|---|
| Minimum | 70265 | 76067 | 70265 | 80005 | 80000 |
| Original | 68933 | 73404 | 66271 | 74071 | 72726 |
| Ideal | 70263 | 76063 | 70263 | 80002 | 79998 |
| Result | 70263 | 76056 | 70258 | 80002 | 79995 |
| Result** | 70262 | 75993 | 70139 | 80001 | 79996 |
| Result* | 70251 | 76036 | 70241 | 79993 | 79985 |
| Delay | 2.0221 | 10.9944 | 7.0167 | 2.000500 | 4.5008 |
| Delay* | 14.0221 | 30.8928 | 24.0712 | 11.0145 | 14.5000 |
| NBuffer | 100 | 135 | 168 | 215 | 237 |
| NBuffer* | 100 | 136 | 168 | 215 | 237 |
| Runtime | 0.2810 | 0.5160 | 2.438 | 8.3440 | 14.11 |
| Runtime* | 0.2800 | 0.5320 | 2.469 | 8.2810 | 14.4680 |
| Method | C.M. | C.M. | C.M. | C.M. | C.M. |

| | Bench6 | Bench7 | Bench8 | Bench9 | Bench10 |
|---|---|---|---|---|---|
| Minimum | 80000 | 76067 | 70265 | 80000 | 80000 |
| Original | 71285 | 66749 | 59617 | 67179 | 65651 |
| Ideal | 79998 | 76064 | 70263 | 79998 | 79998 |
| Result | 79996 | 76054 | 70259 | 79995 | 79996 |
| Result** | 79996 | 75893 | 70145 | 79996 | 79995 |
| Result* | 79981 | 76035 | 70241 | 79982 | 79980 |
| Delay | 3.1653 | 12.2716 | 6.4508 | 4.0002 | 4.1379 |
| Delay* | 18.1657 | 31.8043 | 24.4905 | 17.5104 | 19.5010 |
| NBuffer | 260 | 267 | 288 | 317 | 337 |
| NBuffer* | 260 | 267 | 288 | 317 | 337 |
| Runtime | 22.1412 | 24.1876 | 29.1253 | 60.9537 | 80.7813 |
| Runtime* | 22.6720 | 24.4060 | 29.8280 | 61.7660 | 82.9230 |
| Method | C.M. | C.M. | C.M. | C.M. | C.M. |

Chapter 5 Conclusion

Fanout optimization is an NP-complete problem if non-constant capacitance values are allowed at the sinks. There is always a trade-off between a better solution and a shorter runtime. The Combinational Merging algorithm is a heuristic that takes much less time than the LT-Trees algorithm. In this thesis, the two algorithms are combined. The minimum required time at the sinks is already known, and the ideal maximum required time can be obtained as:

Ideal required time = r_1 − α_{source} − β_{source} (C_{buffer} + C) − R (C_{buffer} + C) − β_{buffer} C_1

For each benchmark, we first use the combinational merging algorithm. If the obtained required time is within a small range of the ideal required time, the computation stops there. Otherwise, the LT-Trees algorithm is called for a better solution. Since combinational merging is very fast, the overhead it adds to the benchmarks that end up using LT-Trees is acceptable.

Interconnect delay cannot be neglected in deep sub-micron IC design. In this thesis, interconnect delay is modeled with the Elmore delay model. Future work includes extensions to gate sizing, a buffer library with more than one size, multiple sinks, and more precise models of the source gate and of interconnect delay. Finally, the benchmarks will be improved to include X-Y information for every node (buffer, source, and sink), which can be used to estimate the length of

Bibliography

[1] L. P. P. P. van Ginneken, “Buffer placement in distributed RC-tree networks for minimal Elmore delay,” in Proc. Intl. Symposium on Circuits and Systems, pp. 865-868, 1990.

[2] H. Bakoglu, “Circuits, Interconnections, and Packaging for VLSI,” Addison-Wesley Publishing Company, 1987.

[3] J. Lillis, C. K. Cheng and T.-T. Y. Lin, “Optimal wire sizing and buffer insertion for low power and a generalized delay model,” in IEEE J. Solid-State Circuits, vol. 31(3), pp. 437-447, 1996.

[4] Weiping Shi and Zhuo Li, “A Fast Algorithm for Optimal Buffer Insertion,” in IEEE Trans. Computer-Aided Design, vol. 24, no. 6, pp. 879-891, June 2005.

[5] Weiping Shi and Zhuo Li, “An O(nlogn) Time Algorithm for Optimal Buffer Insertion,” in 40th Design Automation Conference (DAC), pp. 580-585, 2003.

[6] Zhuo Li and Weiping Shi, “An O(bn^2) Time Algorithm for Optimal Buffer Insertion with b Buffer Types,” in Conference on Design, Automation and Test in Europe (DATE), Munich, Germany, pp. 1324-1329, March 2005.

[7] Weiping Shi, Zhuo Li and Charles J. Alpert, “Complexity Analysis and Speedup Techniques for Optimal Buffer Insertion with Minimum Cost,” in 9th Asia and South Pacific Design Automation Conference (ASP-DAC), Yokohama, Japan, pp. 609-614, Jan 2004.

[8] Zhuo Li, C. N. Sze, Charles J. Alpert, Jiang Hu and Weiping Shi, “Making Fast Buffer Insertion even Faster via Approximation Techniques,” in 10th Asia and South Pacific Design Automation Conference (ASP-DAC), Shanghai, China, pp. 13-18, Jan 2005.

[9] Zhuo Li and Weiping Shi, “An O(mn) Time Algorithm for Optimal Buffer Insertion of Nets with m Sinks,” in 11th Asia and South Pacific Design Automation Conference (ASP-DAC), Yokohama, Japan, pp. 320-325, Jan 2006.

[10] “Fast Buffer Insertion Source Code,”

[11] Y. Peng and X. Liu, “Low-power repeater insertion with both delay and slew rate constraints,” in DAC, pp. 303-307, 2006.

[12] “http://www.ece.umd.edu/class/enee644.S2004/project/project.htm”

[13] H. Touati, “Performance-oriented technology mapping,” Ph.D. dissertation, Univ. California, Berkeley, CA, 1990.

[14] D. Kung, “A Fast Fanout Optimization Algorithm for Near-Continuous Buffer Libraries,” Proc. of 35th DAC, pp. 352-355, June 1998.

[15] P. Rezvani, A. Ajami, M. Pedram, H. Savoj, “Leopard: A Logical Effort-based fanout Optimization for Area and Delay,” Proc. of ICCAD, pp. 516-519, November 1999.

[16] P. Rezvani and M. Pedram, “A fanout optimization algorithm based on the effort delay model,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 22, no. 12, pp. 1671-1678, Dec. 2003.

[17] D. Zhou and X. Liu, “Minimization of chip size and power consumption of high-speed VLSI buffers,” in Proc. Int. Symp. Phys., pp. 186-191, Dec. 1997.

[18] K. J. Singh and A. Sangiovanni-Vincentelli, “A heuristic algorithm for the fanout prob-

[21] K. Kodandapani, J. Grodstein, A. Domic, and H. Touati, “A simple algorithm for fanout optimization using high-performance buffer libraries,” in Proc. Int. Conf. Comput.-Aided Des., pp. 466-471, 1993.

[22] B. Amelifard, F. Fallah, and M. Pedram, “Low-power fanout optimization using multi threshold voltages and multi channel lengths,” IEEE Trans. on Computer Aided Design, vol. 28, no. 4, pp. 478-489, Apr. 2009.

[23] Nikolai Ryzhenko, Oleg Venger, “A Practical Repeater Insertion Flow,” GLSVLSI08, pp. 261-266, May 2008.

[24] I-Min Liu, Adnan Aziz, “Delay Constrained Optimization by Simultaneous Fanout Tree Construction, Buffer Insertion/Sizing and Gate Sizing,” Proceedings of the 37th annual ACM/IEEE Design Automation Conference, pp. 209-214, June 2000.

[25] Jan M. Rabaey, Anantha Chandrakasan, and Borivoje Nikolic, “Digital Integrated Circuits (2nd Edition),” pp. 25-26, Jan 2003.

[26] Wei Chen, Cheng-Ta Hsieh, Massoud Pedram, “Simultaneous Gate Sizing and Fanout Optimization,” Proceedings of the 2000 IEEE/ACM international conference on Computer aided design, pp. 374-378, June 2000.
