Problem Formulation - A Method for Core Block and I/O Buffer Placement in Chip-Package Co-Design

The performance of a digital system is measured by its cycle time: a shorter cycle time means higher performance. When the performance of a design is considered at the layout level, signal propagation time and signal skew are the two main factors. Signal propagation time is defined as the path delay of the signal. Signal skew is defined

better the performance of the design, it is desirable to minimize the longest path delay and the signal skew. Power integrity is also an important issue: a chip with poor power distribution will not perform well.

Chapter 3 Proposed Methodology

Our methodology is divided into two steps: block placement and buffer placement. In the block placement step, we place the blocks with more signals first, minimizing the total wire length and the skew at the same time. We use a grid assignment method to choose the grid in which each block is placed.

Blocks can be rotated, but they must not overlap after the placement step. In the buffer placement step, we place the I/O buffers in a horizontal pass and a vertical pass. To control skew, the path delay of an I/O buffer should be neither too long nor too short, especially for differential pairs. For power integrity, the number of I/O buffers placed in each grid is limited (the SPG signal power ratio) [5].

3.1 Block Placement

First, we cut the whole chip into n*n grids, where n depends on the number of blocks.

For example, we choose n=4 when we have 12 blocks (4*4=16>=12) and n=5 when we have 23 blocks (5*5=25>=23). We then calculate the cost of each block in each grid and record the three best grids for each block, so that each block has three candidates. If a grid is the best place for some block, we add two charge counters to that grid. If a grid is the second-best place for some block, we add one charge counter to that grid. If a grid is the third-best place for some block, we add one charge counter to that grid. Figure 3.1 shows the diagram of case 1 (fc1) after all charge counters have been placed.

Figure 3.1: After all charge counters have been placed, every grid has a number of counters. Small triangles are charge counters, big rectangles are core blocks, and the three numbers next to each rectangle are its three candidates. Small circles are signal bumps that connect to the ports on each block through I/O buffers.
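The grid sizing and charge-counter bookkeeping described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function names and the `costs` dictionary layout are hypothetical, the cost values are assumed to be computed elsewhere, and the rule "smallest n with n*n >= number of blocks" is inferred from the two examples in the text.

```python
import math


def grid_dimension(num_blocks):
    """Smallest n such that n * n >= num_blocks (inferred from the examples)."""
    n = math.isqrt(num_blocks)
    return n if n * n >= num_blocks else n + 1


def assign_charge_counters(costs):
    """costs[block][grid] is the placement cost of block in grid (lower is better).

    Returns (candidates, counters): the three cheapest grids of every block,
    best first, and the number of charge counters accumulated on each grid.
    """
    weights = (2, 1, 1)  # best, second and third candidate, as stated in the text
    candidates = {}
    counters = {}
    for block, grid_costs in costs.items():
        best_three = sorted(grid_costs, key=grid_costs.get)[:3]
        candidates[block] = best_three
        for grid, weight in zip(best_three, weights):
            counters[grid] = counters.get(grid, 0) + weight
    return candidates, counters


if __name__ == "__main__":
    print(grid_dimension(12), grid_dimension(23))
    demo_costs = {
        "A": {0: 5, 1: 1, 2: 3, 3: 9},
        "B": {0: 2, 1: 4, 2: 1, 3: 8},
    }
    print(assign_charge_counters(demo_costs))
```

Grids that are not among any block's three candidates simply receive no counters in this sketch.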

We then know which grid is the most popular (the one with the most charge counters). We put the fittest block into that grid, where the fittest block is the one that has this grid among its three candidates and also has the largest number of signals, and we remove all the charge counters contributed by this block. We repeat this step until all blocks are assigned, which gives the grid assignment block by block, as shown in Figure 3.2.

Figure 3.2: After grid assignment, each block sits at the bottom-left of its grid.
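A minimal sketch of this greedy loop is given below. It rests on assumptions that the text does not state: each grid holds at most one block, ties are broken by the smallest identifier, and a block whose three candidate grids are all taken is reported as unplaced. The names `assign_blocks`, `candidates` and `signal_counts` are hypothetical.

```python
def assign_blocks(candidates, signal_counts, weights=(2, 1, 1)):
    """Greedy grid assignment driven by charge counters.

    candidates[block]    -> up to three candidate grids, best first
    signal_counts[block] -> number of signals on the block
    Returns (placement, unplaced_blocks).
    """
    # Accumulate the charge counters of every block on its candidate grids.
    counters = {}
    for grids in candidates.values():
        for grid, weight in zip(grids, weights):
            counters[grid] = counters.get(grid, 0) + weight

    placement = {}
    unplaced = set(candidates)
    free_grids = set(counters)
    while unplaced and free_grids:
        # Most popular free grid: the one with the most charge counters.
        grid = max(sorted(free_grids), key=lambda g: counters[g])
        free_grids.discard(grid)
        contenders = [b for b in unplaced if grid in candidates[b]]
        if not contenders:
            continue
        # Fittest block: has this grid as a candidate and the most signals.
        block = max(sorted(contenders), key=lambda b: signal_counts[b])
        placement[block] = grid
        unplaced.discard(block)
        # Remove the charge counters that the placed block contributed.
        for g, weight in zip(candidates[block], weights):
            counters[g] -= weight
    return placement, sorted(unplaced)


if __name__ == "__main__":
    demo_candidates = {"A": [1, 2, 0], "B": [2, 0, 1], "C": [1, 0, 3]}
    demo_signals = {"A": 10, "B": 30, "C": 20}
    print(assign_blocks(demo_candidates, demo_signals))
```

Note that a block with many signals can end up in its second or third candidate grid, because the most popular grid is served first.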

After putting every block into a grid, we start the rotation step. Each block can be rotated into four directions to reduce wire length. After this step, every block has the rotation with the lowest cost in wire length and skew.
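The rotation step can be illustrated with the sketch below, which tries the four orientations of one block and keeps the one with the shortest total wire length. It is a simplification under stated assumptions: wire length is taken as the Manhattan distance from each port to its bump, the skew term of the cost is omitted, and the rotated block keeps its bounding box anchored at the same bottom-left corner. All function and parameter names are hypothetical.

```python
def rotate_port(port, width, height, quarter_turns):
    """Port offset from the block's bottom-left corner after rotating the
    block counter-clockwise by quarter_turns * 90 degrees, with the rotated
    bounding box re-anchored at the same bottom-left corner."""
    x, y = port
    k = quarter_turns % 4
    if k == 0:
        return (x, y)
    if k == 1:
        return (height - y, x)
    if k == 2:
        return (width - x, height - y)
    return (y, width - x)


def best_rotation(origin, width, height, connections):
    """connections is a list of (port_offset, bump_xy) pairs.

    Returns (quarter_turns, wire_length) with the smallest total Manhattan
    wire length. The skew cost is NOT modelled in this sketch.
    """
    ox, oy = origin

    def wire_length(k):
        total = 0
        for port, (bx, by) in connections:
            px, py = rotate_port(port, width, height, k)
            total += abs(ox + px - bx) + abs(oy + py - by)
        return total

    return min(((k, wire_length(k)) for k in range(4)), key=lambda t: t[1])


if __name__ == "__main__":
    # A 4x2 block at the origin with one port on its right edge and a bump to its left.
    print(best_rotation((0, 0), 4, 2, [((4, 1), (-3, 2))]))
```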

After the rotation step, we start the movement step. Before this step, we place each block roughly at the bottom-left of its grid or next to the boundary, which may slightly reduce the overlap. The resulting block positions are shown in Figure 3.3. The major objective of the movement step is to reduce the overlap between blocks, but we also reduce the signal skew at the same time. We use two cost functions to determine the direction in which a block should move: one moves blocks away from the overlap, and the other moves blocks in the direction that reduces the signal skew and the overlap cost together. We repeat this step until no overlap remains. Then we start the buffer placement step.

Figure 3.3: Case 3 block placement before the rectification step (each block is only assigned roughly to the bottom-left of a grid or next to the boundary).
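As an illustration of the first cost function only (moving a block away from an overlap), the sketch below measures the overlap of two rectangles and proposes a move along the axis of smaller penetration. It is an assumption-laden simplification: the skew-aware cost function and the chip boundary are not modelled, and the helper names are hypothetical.

```python
def overlap_extent(a, b):
    """Overlap widths (dx, dy) of rectangles given as (x, y, width, height).
    A non-positive value means the rectangles do not overlap on that axis."""
    dx = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    dy = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    return dx, dy


def overlap_area(a, b):
    dx, dy = overlap_extent(a, b)
    return dx * dy if dx > 0 and dy > 0 else 0


def separation_step(a, b):
    """Displacement (dx, dy) that moves rectangle a away from b along the
    axis with the smaller penetration; (0, 0) if they do not overlap."""
    dx, dy = overlap_extent(a, b)
    if dx <= 0 or dy <= 0:
        return (0, 0)
    if dx <= dy:
        # Move horizontally, away from the centre of b.
        a_centre = a[0] + a[2] / 2
        b_centre = b[0] + b[2] / 2
        return (-dx, 0) if a_centre <= b_centre else (dx, 0)
    # Otherwise move vertically, away from the centre of b.
    a_centre = a[1] + a[3] / 2
    b_centre = b[1] + b[3] / 2
    return (0, -dy) if a_centre <= b_centre else (0, dy)


if __name__ == "__main__":
    a, b = (0, 0, 4, 4), (3, 1, 4, 4)
    step = separation_step(a, b)
    moved = (a[0] + step[0], a[1] + step[1], a[2], a[3])
    print(overlap_area(a, b), step, overlap_area(moved, b))
```

With several blocks, such a step would be applied repeatedly to every overlapping pair until the total overlap reaches zero, which mirrors the repetition described in the text.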

Figure 3.5: If a grid does not have enough supporting power, the buffer finds another grid in which to be placed (the two big rectangles are blocks; the small one is a buffer).

3.2 Buffer Placement

In this step, we first cut the whole chip horizontally to place buffers. The order in which buffers are placed depends on the distance between the bump and the block port that each buffer connects. A buffer with a long distance between its bump and its block port should generally be placed first, because the longest path delay leads to high signal skew and seriously degrades the final result. In addition, for power integrity (the SPG signal power ratio), we do not put too many buffers into one grid (blocks also consume power). If we

Our remedy is to calculate the areas of the whole chip, the blocks, and the buffers. Suppose the chip area is 1000 units, the block area is 400 units, and the buffer area is 300 units. We assume that a chip area of 1000 units can supply 1000 units of power and that a block area of 400 units draws 400 units of power. The remaining power (1000-400=600) then supports the buffer area (300), so each unit of buffer area draws 2 units of power. A bigger buffer draws more power in proportion to its area.
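The arithmetic of this example can be written as a small helper. The function name is hypothetical; the one-to-one relation between area and power for the chip and the blocks is the assumption made in the paragraph above.

```python
def power_per_buffer_area(chip_area, block_area, buffer_area):
    """Power drawn by one unit of buffer area.

    Assumes, as in the text, that one unit of chip area supplies one unit of
    power and one unit of block area draws one unit of power.
    """
    return (chip_area - block_area) / buffer_area


if __name__ == "__main__":
    ratio = power_per_buffer_area(1000, 400, 300)
    print(ratio)       # power per unit of buffer area
    print(3 * ratio)   # power drawn by a buffer of area 3
```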

Because we have already cut the chip into grids, we know the boundary and area of each grid. Because we have already placed the blocks, we know how much area is left for buffers in each grid. We then put buffers into grids without exceeding each grid's power supply limit. We set the power supply limit to 1.2 times the grid area (a grid of area 200 can supply at most 240 units of power). As shown in Figure 3.5, if a buffer needs 4 units of power, it cannot be placed in the middle grid even if that grid has enough empty area for it. The buffer then finds another available grid where the placement cost is low.
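The power-limited placement can be sketched as below. This is an illustration under assumptions, not the thesis code: buffers are processed in decreasing order of bump-to-port distance, each buffer takes the cheapest grid that has both enough free area and enough remaining power, and the per-grid placement costs are assumed to be given. The dictionary layout and all names are hypothetical.

```python
def place_buffers(buffers, grids, power_per_area, capacity_factor=1.2):
    """Assign buffers to grids without exceeding each grid's power limit.

    buffers: list of dicts with keys "name", "area", "distance" (bump to
             block port) and "costs" (grid id -> placement cost).
    grids:   grid id -> {"area": ..., "block_area": ...}.
    Returns name -> grid id, or None if no grid can take the buffer.
    """
    state = {
        gid: {
            "free_area": g["area"] - g["block_area"],
            # Assumption from the text: blocks draw one unit of power per unit of area.
            "power_used": g["block_area"],
            "power_limit": capacity_factor * g["area"],
        }
        for gid, g in grids.items()
    }
    placement = {}
    # Buffers with the longest bump-to-port distance are placed first.
    for buf in sorted(buffers, key=lambda b: b["distance"], reverse=True):
        power = buf["area"] * power_per_area
        placement[buf["name"]] = None
        for gid in sorted(buf["costs"], key=buf["costs"].get):
            s = state[gid]
            fits_area = s["free_area"] >= buf["area"]
            fits_power = s["power_used"] + power <= s["power_limit"]
            if fits_area and fits_power:
                s["free_area"] -= buf["area"]
                s["power_used"] += power
                placement[buf["name"]] = gid
                break
    return placement


if __name__ == "__main__":
    demo_grids = {
        "mid": {"area": 100, "block_area": 50},
        "side": {"area": 100, "block_area": 0},
    }
    demo_buffers = [
        {"name": "X", "area": 30, "distance": 90, "costs": {"mid": 1, "side": 5}},
        {"name": "Y", "area": 6, "distance": 40, "costs": {"mid": 1, "side": 5}},
    ]
    print(place_buffers(demo_buffers, demo_grids, power_per_area=2))
```

In the demo, buffer Y is rejected by the "mid" grid although 20 units of area are still free there, because its 12 units of power would exceed the 120-unit limit; this is the situation the text attributes to Figure 3.5.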

Second, we cut the whole chip vertically and place buffers in the same way, again following the SPG (signal power ratio) limit. We also take special care of differential pairs: when we place the second buffer of a differential pair, it takes the position whose path delay is closest to that of the first buffer, so the skew of the differential pair is reduced further. After deciding the positions of the I/O buffers, we choose their rotation (horizontal buffers can face right or left; vertical buffers can face up or down) to further reduce wire length.
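The differential-pair rule can be illustrated by the sketch below, which picks, among candidate positions for the second buffer, the one whose path delay is closest to that of the first buffer. The function name and the `candidate_delays` mapping are hypothetical, and the path delays are assumed to be computed elsewhere.

```python
def match_differential_partner(first_delay, candidate_delays):
    """Return the candidate position whose path delay is closest to the
    path delay of the first buffer of the differential pair.

    candidate_delays: position -> path delay of the second buffer there.
    """
    return min(candidate_delays,
               key=lambda pos: abs(candidate_delays[pos] - first_delay))


if __name__ == "__main__":
    demo = {(0, 1): 95, (2, 3): 118, (4, 1): 140}
    print(match_differential_partner(120, demo))
```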

Chapter 4 Experimental Results

We implemented our algorithm in C++ on an Intel(R) Xeon(TM) 3.00GHz workstation with 2GB of memory. The benchmark circuits fc1, fc2, ..., fc5 are real consumer designs (DVD players, MP3 players, etc.) provided by the leading foundry UMC and its design service company Faraday.

Table 4.1 lists the names of the circuits, the numbers of blocks, buffers, and differential pairs, the chip areas, and the parameters α and β (which can also be set by the company). The parameter α is the weighting factor of the skew part Φ₁ of the objective function Γ, and β is that of the path delay part Φ₂ of Γ.

Table 4.1: Statistics of the test circuits [2]

Circuit   # of blocks   # of buffers   # of differential pairs   Chip area    α    β
fc1       6             25             3                         1040x1040    50   50
fc2       12            168            10                        3440x3440    50   50
fc3       23            320            20                        4240x4240    70   30
fc4       28            384            20                        4440x4440    70   30
fc5       28            384            20                        4440x4440    70   30

We compared our algorithm with the B* tree representation based hierarchical

Figure 4.1: Power integrity of case 5 (fc5) before and after the power concern is applied. The upper figure has a hot spot that drains more power than the other grids.

eration, additionally. The experimental results are shown in Table 4.2.

As shown in Figure 4.1, our method also obtains better results on the power distribution problem. We do not put too many buffers into one grid if the grid's supporting

Table 4.2: Experimental results of our placement method and [1], where the CPU time of [1] was measured on a 1.2GHz workstation with 8GB memory. This table shows the effectiveness of our approach.

Ckt   Metric                            [1]        Ours       Improvement (%)
fc1   Total path delay                  17760      18070      -1.74
      Max. input skew                   120        230        -91.6
      Max. output skew                  90         180        -100
      Avg. skew of differential pairs   -          46.7
      Cost Γ                            2.01e+06   5.17e+06   -157.2
      CPU time                          1s         0.36s

fc2   Total path delay                  361650     354750     +1.9
      Max. input skew                   1010       720        +28.7
      Max. output skew                  1390       740        +46.8
      Avg. skew of differential pairs   -          42
      Cost Γ                            1.66e+08   7.10e+07   +57.2
      CPU time                          16s        10.8s

fc3   Total path delay                  619200     805540     -30.1
      Max. input skew                   1660       1060       +36.1
      Max. output skew                  1700       1320       +22.3
      Avg. skew of differential pairs   -          116
      Cost Γ                            4.14e+08   2.25e+08   +45.7
      CPU time                          51s        72s

fc4   Total path delay                  726040     1020220    -40.5
      Max. input skew                   2190       1200       +45.2
      Max. output skew                  2380       1500       +36.9
      Avg. skew of differential pairs   -          142
      Cost Γ                            7.54e+08   2.89e+08   +61.7
      CPU time                          72s        216s

fc5   Total path delay                  707430     947830     -33.9
      Max. input skew                   1730       1130       +34.7
      Max. output skew                  2160       1120       +48.1
      Avg. skew of differential pairs   -          163
      Cost Γ                            5.57e+08   2.06e+08   +63.0
      CPU time                          78s        282s
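The Improvement (%) column is not defined in the text, but its values agree, to within the rounding of the printed figures, with the relative reduction with respect to [1]. The sketch below shows that reading; the function name is hypothetical and the formula is inferred from the table rather than stated by the authors.

```python
def improvement_percent(reference, ours):
    """Relative reduction of a metric with respect to the reference method,
    in percent. Positive means our value is smaller (better)."""
    return (reference - ours) / reference * 100.0


if __name__ == "__main__":
    # fc1 max. output skew, fc2 total path delay and fc2 cost from Table 4.2.
    print(improvement_percent(90, 180))
    print(round(improvement_percent(361650, 354750), 1))
    print(round(improvement_percent(1.66e8, 7.10e7), 1))
```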

Figure 4.2: The blocks and I/O buffers placement result of fc3.

Chapter 5 Conclusion and Future Work

We have presented a two-step heuristic method for block and I/O buffer placement in flip-chip design. This method not only gives good results for signal skew and differential pairs, but also keeps the power distribution normalized.

For future improvement of our placement method, we can add more constraints to our placement algorithm, such as the interconnections between blocks, or take the Re-Distributed Layer (RDL) into consideration. The connection from a bump to an I/O buffer passes through the Re-Distributed Layer, but the connection from an I/O buffer to a core block may pass through other metal layers. We can model these two kinds of path and develop a better algorithm that is closer to real physical design.

Bibliography

[1] Chih-Yang Peng, Wen-Chang Chao, Yao-Wen Chang, and Jyh-Herng Wang. “Simultaneous Block and I/O Buffer Floorplanning for Flip-Chip Design”. In Asia and South Pacific Conference on Design Automation, pages 24–27, 2006.

[2] Faraday Corp. “Block and Input/Output Buffer Placement for Skew/Delay Minimization in Flip-chip Design”. In Proc. of ACM International Symposium on Physical Design. IC/CAD Contest, Taiwan, 2003. http://www.cs.nthu.edu.tw/~cad/cad91/Problems/P3/CAD_contest_2003_P3.pdf.

[3] Hao-Yueh Hsieh and Ting-Chi Wang. “Simple Yet Effective Algorithms for Block and I/O Buffer Placement in Flip-Chip Design”. In IEEE International Symposium on Circuits and Systems, pages 1879–1882, 2005.

[4] Hung-Ming Chen, I-Min Liu, Muzhou Shao, Martin D.F. Wong, and Li-Da Huang. “I/O clustering in design cost and performance optimization for flip-chip design”. In Proceedings IEEE International Conference on Computer Design, pages 562–567, 2004.

[5] Jinjun Xiong, Yiu-Chung Wong, Egino Sarto, and Lei He. “Constraint Driven I/O Planning and Placement for Chip-package Co-design”. In Asia and South Pacific Conference on Design Automation, 2006.

[6] J. N. Kozhaya, S. R. Nassif, and F. N. Najm. “I/O Buffer Placement Methodology for ASICs”. In IEEE International Conf. on Electronic, Circuits and Systems, pages 245–248, 2001.

[7] Jean Audet, D.P. O’Connor, Mike Grinberg, and James P. Libous. “Effect of organic package core via pitch reduction on power distribution performance”. In Proceedings Electronic Components and Technology Conference, pages 1449–1453, 2004.

[8] G. Yasar, C. Chiu, R.A. Proctor, and J.P. Libous. “I/O Cell Placement and Electrical Checking Methodology for ASICs with Peripheral I/Os”. In IEEE International Symposium on Quality Electronic Design, pages 71–75, 2001.

[9] P.H. Buffet, J. Natonio, R.A. Proctor, Y.H. Sun, and G. Yasar. “Methodology for I/O cell Placement and Checking in ASIC Designs Using Area-Array Power Grid”. In IEEE Custom Integrated Circuits Conference, pages 125–128, 2000.

[10] R. Farbarik, X. Liu, M. Rossman, P. Parakh, T. Basso, and R. Brown. “CAD Tools for Area-Distributed I/O Pad Packaging”. In IEEE Multi-Chip Module Conference, pages 125–129, 1997.

[11] P.S. Zuchowski, J.H. Panner, D.W. Stout, J.M. Adams, F. Chan, P.E. Dunn, A.D. Huber, and J.J. Oler. “I/O Impedance Matching Algorithm for High Performance ASICs”. In IEEE International ASIC Conference and Exhibit, pages 270–273, 1997.

[12] C. Tan, D. Bouldin, and P. Dehkordi. “Design Implementation of Intrinsic Area Array ICs”. In Proceedings 17th Conference on Advanced Research in VLSI, pages 82–93, 1997.

[13] R.J. Lomax, R.B. Brown, M. Nanua, and T.D. Strong. “Area I/O Flip-Chip Packaging to Minimize Interconnect Length”. In IEEE Multi-Chip Module Conference, pages 2–7, 1997.

[14] S. N. Adya, I. L. Markov, and P. G. Villarrubia. “On whitespace in mixed-size placement and physical synthesis”. In Proceedings of IEEE/ACM Int. Conf. on Computer-Aided Design, pages 311–318, 2003.

[15] A. R. Agnihotri, M. C. Yildiz, A. Khatkhate, A. Mathur, S. Ono, and P. H. Madden. “Fractional cut: improved recursive bisection placement”. In Proceedings of IEEE/ACM Int. Conf. on Computer-Aided Design, pages 307–310, 2003.

[16] A. E. Caldwell, A. B. Kahng, and I. L. Markov. “Can recursive bisection alone produce routable placement?”. In Proc. of ACM/IEEE Design Automation Conf., pages 477–482, 2000.

[17] T. Chan, J. Cong, and K. Sze. “Multilevel generalized force-directed method for circuit placement”. In Proc. of ACM International Symposium on Physical Design, 2005.
