A Multi-level Mixed-size Placer Using Augmented Lagrangian Method

Student: Yu-Chi Chou
Advisors: Jing-Yang Jou, Hung-Ming Chen

A Thesis Submitted to Degree Program of Electrical Engineering and Computer Science, College of Electrical Engineering and Computer Science, National Chiao Tung University, in Partial Fulfillment of the Requirements for the Degree of Master of Science in Electronics and Electro-Optical Engineering

September 2006
Hsinchu, Taiwan, Republic of China


ABSTRACT

Driven by IP reuse and SOC integration, mixed-size designs are now very common, and mixed-size placement has become a critical step in VLSI physical design. However, because the algorithms for macro placement and standard-cell placement are fundamentally distinct, placing a mixed-size design in a single flow is a challenging problem. In this thesis, we formulate the general placement problem as a nonlinear constrained optimization problem and solve it with an analytical approach combined with a multi-level scheme. The experimental results show that our model can be employed as a global placer. By applying the augmented Lagrangian method to perform nonlinear programming, we obtain a total half-perimeter wire length comparable to that of current state-of-the-art placers.

ACKNOWLEDGEMENTS

I would like to express my sincerest appreciation to my advisors, Dr. Jing-Yang Jou and Dr. Hung-Ming Chen, for their guidance on this thesis. In every review Dr. Jou highlighted critical issues I had never thought of. Dr. Chen gave me consistent help and invaluable suggestions during my program development. Without their help I could never have completed this work. I would also like to thank all the friends and colleagues who helped me during my thesis writing. Finally, I wish to thank my wife and my mother for their endless support and encouragement throughout my years of study.

Contents

1 Overview
  1.1 Introduction
  1.2 Our Contribution
  1.3 Organization
2 Preliminaries
  2.1 Placer Classification
  2.2 Previous Work of Mixed-size Placer
  2.3 Problem Formulation
    2.3.1 Concept
    2.3.2 Quadratic Objective
    2.3.3 Nonlinear Constraints Based on Bin Utilization
3 Algorithm and Implementation
  3.1 Main Flow
  3.2 Augmented Lagrangian Method
  3.3 Nonlinear Conjugate Gradient Minimization
  3.4 Negative Gradient Evaluation
  3.5 Multi-Level Scheme
  3.6 Legalization
    3.6.1 Legalization Flow
    3.6.2 Implementation of Macro Legalization
4 Experimental Results
  4.1 Some IBM ICCAD'04 Results
  4.2 Macro Legalization
5 Conclusion

List of Figures

2.1 The comparison of unconstrained and constrained optimization for a simple case
2.2 The three types of x-dimension overlap for small cells
2.3 The three types of x-dimension overlap for large cells
3.1 The main flow of our algorithm
3.2 The algorithm of ALAG method
3.3 The coarsening/uncoarsening flow
4.1 The normalized HPWL comparison
4.2 IBM01 result
4.3 IBM02 result
4.4 IBM03 result
4.5 IBM04 result
4.6 IBM05 result
4.7 IBM06 result
4.8 IBM07 result
4.9 IBM08 result
4.10 IBM01 placement after macro legalization

List of Tables

3.1 The placement results with different initial design points
4.1 Some ICCAD'04 benchmark cases
4.2 The comparison of the placement results
4.3 The HPWL of ibm01 with different legalization

Chapter 1 Overview

1.1 Introduction

The complexity of circuit design continues to increase as deep sub-micron IC technology keeps scaling down the feature size. This trend makes IP reuse a necessary strategy to tame design complexity and ensure time-to-market. Current SOC designs usually contain hundreds of thousands of standard cells, mixed with a number of IP blocks, analog blocks, embedded memories, and pre-designed legacy blocks. Placing so many different cells simultaneously is a great challenge. An obvious issue is the design scale: the placement algorithm must be efficient enough to handle a huge number of placement instances within acceptable runtime and memory. In addition, the variation in cell dimensions introduces significant discontinuity in the solution space of the placement.

As indicated in [1], the size of a hard block may range from 1x to 10000x or more that of a standard cell. This is the root cause why traditional standard-cell placers usually either fail to process these mixed-size designs or produce results of unsatisfactory quality. Although floorplanners are suitable for placing cells of arbitrary sizes, the design scale makes the use of traditional floorplanners impractical. In existing commercial tools, placing the hard macros of a modern SOC design still requires help from human engineers and is a time-consuming task during the floorplanning phase. Since placement plays a critical role in determining circuit performance and layout resources, an efficient and effective algorithm that targets such large-scale mixed-size designs would be very helpful to SOC development. This thesis presents a new algorithm to tackle the large-scale mixed-size placement problem.

1.2 Our Contribution

A new formulation is proposed to model the placement problem. Our formulation focuses on wire length minimization and ensures an even cell distribution. Because of its universal form, the formulation can be applied to general designs and is suitable for the mixed-size problem. We introduce the augmented Lagrangian method to implement the optimization solver and describe the programming details of the whole multi-level engine. The experimental results show that our algorithm is comparable to current state-of-the-art placers. A novel floorplanning technique is also studied for macro legalization.

1.3 Organization

The remainder of this thesis is organized as follows. Chapter 2 describes the background of the placement problem and our problem formulation. The details of our method are discussed in Chapter 3, and the experimental results are summarized in Chapter 4. We conclude the thesis in Chapter 5.

Chapter 2 Preliminaries

2.1 Placer Classification

Cell placement is the process of arranging the circuit components on a layout surface. A placer that performs cell placement is usually required to minimize the interconnect of the whole design as well as other objectives, depending on the design requirements. According to the design style, the placement problem can be classified as block placement (or macro placement), standard-cell placement (or row-based placement), and mixed-size placement.

Generally, block placement focuses on full-custom design, in which the circuit components can have arbitrary sizes and shapes. Due to its high complexity, it can only deal with designs that contain a small number of placement objects. Most algorithms employ floorplanning techniques to solve the block placement problem.

Standard-cell placement targets digital designs that are synthesized from standard cells or gate array cells. Such designs are characterized by uniform cell heights and relatively similar cell sizes. The difficulty of standard-cell placement is the extremely large problem size: advanced designs that contain millions of cells are now very common. To ensure a reasonable runtime for such large designs, the heuristic algorithms of most standard-cell placers exploit the design characteristic of equal cell height. After decades of work, state-of-the-art standard-cell placers are able to deliver highly optimized results with excellent efficiency for pure standard-cell designs. These placers apply various heuristics such as recursive partitioning (Capo[2], FengShui[3]), recursive clustering (mPG[4]), analytical techniques (Kraftwerk[5], FastPlace[6]), and hybrid algorithms that combine partitioning and analytical methods (GORDIAN[8], GORDIAN-L[9]). However, for designs that lack the required characteristic, most standard-cell placers either cannot function normally or produce unacceptable results. Unfortunately, pure standard-cell designs are rarely seen in the SOC era. Most SOC designs consist of a number of hard macros, and these must be fixed first in order to make standard-cell placement work. An obvious drawback of such a flow is the loss of optimality when hard macros are placed without considering the effect of the standard cells.

Recently, the concept of mixed-size placement has been proposed to address the challenges posed by these SOC designs. It emphasizes that large macro cells and standard cells can be handled in a single flow without human effort.

2.2 Previous Work of Mixed-size Placer

Some of the standard-cell placers mentioned above also propose sophisticated approaches to handle large-scale mixed-size designs. A three-stage placement-floorplanning-placement flow that incorporates Capo and the Parquet floorplanner was demonstrated in [10]. Khatkhate et al. improved the placer FengShui and introduced a special fractional cut bisection that allows off-row alignment for horizontal cuts[11]. mPG-MS[1], the new version of mPG, clusters the standard cells to form big blocks and eventually clusters them with big macros of similar sizes. With a multi-level simulated annealing scheme and careful treatment of macros in the legalization stage, mPG-MS provides results of quality comparable to Capo. On the other hand, the analytical placer FDP[12], which is based on the method of Kraftwerk, minimizes the quadratic wire length objective and spreads cells by adjusting an extra spreading force. Another analytical placer, APlace[7], turns to a log-sum-exp wire length model and utilizes nonlinear programming to obtain outstanding wire length results for mixed-size designs. Among these mixed-size placers, the analytical placers exhibit inherent flexibility in handling various design constraints [5, 14, 15]. In this thesis, we propose a new analytical placement algorithm based on the augmented Lagrangian method to perform global placement for large-scale mixed-size designs.

Among the various analytical placement algorithms, the force-directed method has attracted the most attention in recent years. The force-directed method models the design netlist as a spring system and solves the classical mechanics problem to determine the cell locations that minimize the total wire length. Such a model has proved to be very efficient for large-scale problems. However, because it does not take the cell dimensions into consideration, the solution usually contains a large amount of cell overlap. How to eliminate this overlap without much impact on wire length is the key point in force-directed methods. In Kraftwerk[5], additional forces are applied to cells to pull them away from dense regions, and the new cell locations are obtained by solving Poisson's equation. Viswanathan et al.[6] proposed a simple cell shifting technique that determines the magnitude and direction of the new forces by expanding the bins with high utilization and shrinking those with low utilization. This method exhibits outstanding performance because only an unconstrained minimization is needed in each iteration. In APlace[7], although a log-sum-exp wire length model replaces the quadratic force model, a similar bin structure is constructed to represent the local cell distribution within the placement area. Regions with either too high or too low bin utilization are penalized by a quadratic penalty function. By performing nonlinear optimization with an increasing penalty weight, an even cell distribution is eventually achieved.

2.3 Problem Formulation

2.3.1 Concept

To reduce the problem complexity, we target only wire length minimization as our objective in this thesis. The classic quadratic wire length is chosen as our wire length model for its simplicity. We also adopt the bin structure to represent the rough cell distribution. The utilization of each bin stands for the cell density of its local region. Globally, if we can generate a placement in which the bin utilizations are all close to the chip density, an even cell distribution is obtained. Hence, we formulate the constraints based on the differences between the chip density and the exact utilizations of the bins. By combining the objective and the constraints, finding a placement with minimized wire length is modeled as a constrained optimization problem. Such a constrained optimization can be solved effectively by the augmented Lagrangian method. The details of our problem formulation are described below.

2.3.2 Quadratic Objective

Given a netlist with $n$ cells and $m$ nets, the placement problem is to find the locations and orientations of all cells so that the total wire length is minimized. Since we do not need to consider the data flow, the netlist can be modeled as an undirected graph $G = (V, E)$, where each vertex $v_i \in V$ corresponds to a cell, with the cell area as the vertex weight. Each edge $e_{ij} \in E$ represents the connectivity between a pair of cells. In this thesis, we model a net as a clique following the formula proposed in [13], and thus the edge weight of $e_{ij}$ is determined by summing the weights of all the clique edges between $v_i$ and $v_j$. By constructing this connectivity graph we can calculate the quadratic objective, which is the sum of the weighted squares of the Euclidean distances between pairs of cells:

$$f(x, y) = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij} \left[ (x_i - x_j)^2 + (y_i - y_j)^2 \right] \tag{2.1}$$

The objective function can be rewritten in matrix notation as illustrated in [18]:

$$f(\vec{x}, \vec{y}) = \frac{1}{2} \vec{x}' C \vec{x} + \vec{d}_x' \vec{x} + \frac{1}{2} \vec{y}' C \vec{y} + \vec{d}_y' \vec{y} + \text{const} \tag{2.2}$$

The vectors $\vec{x}$ and $\vec{y}$ denote the coordinates of the movable cells, and the prime denotes vector transposition. The matrix $C$ represents the connectivity of the movable cells. The vectors $\vec{d}_x$ and $\vec{d}_y$ are contributed by the connectivity between the movable cells and the fixed cells, and the constant term is contributed by the connectivity between the fixed cells. Solving this unconstrained minimization problem is equivalent to solving the two linear systems:

$$C \vec{x} + \vec{d}_x = 0 \tag{2.3}$$

$$C \vec{y} + \vec{d}_y = 0 \tag{2.4}$$

This equivalence requires that the connectivity matrix $C$ be positive definite. No connectivity weight is negative, so $C$ is always positive semidefinite, and it is positive definite for general circuits in which the movable cells are connected, directly or through other cells, to fixed cells. Hence, the quadratic objective can be minimized by any linear equation solver. In this thesis, we solve Equations (2.3) and (2.4) by the conjugate gradient method to obtain the initial placement. Note that fixed cells such as the I/O pads must be provided to guarantee non-zero vectors $\vec{d}_x$ and $\vec{d}_y$. Otherwise, the trivial solution $\vec{x} = 0$ ($\vec{y} = 0$) is obtained, which is not what we desire.

2.3.3 Nonlinear Constraints Based on Bin Utilization

The major issue of the quadratic objective function in Equations (2.1) and (2.2) is that it generates a placement with a large amount of cell overlap. The reason is evident: the equation contains no information about cell dimensions at all. Because the number of internal connections is usually larger than the number of connections to fixed I/O pads, in most cases the unconstrained optimization results in a placement in which the cell density at the center of the chip is much higher than that at the chip boundary. Figure 2.1 demonstrates such a result for a 100-block circuit and compares it with another, more even solution generated by constrained optimization.

Figure 2.1: The comparison of unconstrained and constrained optimization for a simple case

To eliminate the cell overlap, we must evaluate the cell distribution to guide how cells are pushed toward vacant regions. A straightforward model for measuring the cell distribution is the bin structure. Let the chip area be divided into $k$ bins, all with the same width $w_b$ and height $h_b$. The utilization of bin $b_j$ is denoted $u_j$ and is defined as the total cell area within $b_j$ divided by the bin area $w_b \times h_b$. Given a placement, the local cell density can be measured by the utilizations of the $k$ bins. If a bin has a utilization over 100%, it is impossible to find a feasible placement inside this bin, and thus some of the cells must be moved out to decrease the bin utilization. Since our goal is to generate a placement in which every bin utilization is close to the average chip density, we define the constraints as

$$u_j - U = 0, \quad \text{for } j = 1 \ldots k \tag{2.5}$$

where $U$ is the average chip density, that is, the total cell area over the chip area. In this thesis, we adopt a unified model to calculate the exact area that a cell contributes to each bin, whether the cell is larger or smaller than the bin. The model is described as follows.

Consider a cell $c_i$ with width $w_i$ and height $h_i$, and a bin $b_j$ with width $w_j$ and height $h_j$. Assume first that the cell width is smaller than the bin width. Let $d_x$ and $d_y$ denote the center-to-center distance between $c_i$ and $b_j$ in the x-dimension and y-dimension, respectively. The overlap in the x-dimension can be classified into three cases, as illustrated in Figure 2.2. If $d_x$ is larger than or equal to $(w_j + w_i)/2$, there is no overlap. As $d_x$ decreases below $(w_j + w_i)/2$, $c_i$ starts to overlap with $b_j$ in the x-dimension, and the length of the overlap is $(w_j + w_i)/2 - d_x$, which grows as $d_x$ decreases. When $d_x$ further decreases to $(w_j - w_i)/2$ or less, the length of the overlap saturates at its maximum value, that is, $\min(w_j, w_i)$.

Figure 2.2: The three types of x-dimension overlap for small cells

The other scenario is that the cell width is larger than the bin width, which may happen when dealing with big macro cells. As shown in Figure 2.3, the overlap behaves as in the former scenario, except that $(w_i - w_j)/2$ replaces $(w_j - w_i)/2$ and the saturated overlap $\min(w_j, w_i)$ equals $w_j$. The overlap in the y-dimension is calculated in the same manner.

Figure 2.3: The three types of x-dimension overlap for large cells

Combining the two scenarios of different cell sizes and considering the x-dimension and the y-dimension together, the exact overlap area of cell $c_i$ and bin $b_j$ can be generalized as

$$a_{ij} = w_s h_s M_x(d_x) M_y(d_y) \tag{2.6}$$

where $w_s = \min(w_i, w_j)$, $h_s = \min(h_i, h_j)$, and $M_x$ and $M_y$ are two piecewise-linear functions of $d_x$ and $d_y$, respectively:

$$M_x(d_x) = \begin{cases} 1, & \text{if } d_x \le \delta_x \\ \dfrac{-d_x + \delta_x + w_s}{w_s}, & \text{if } \delta_x < d_x < \delta_x + w_s \\ 0, & \text{otherwise} \end{cases} \tag{2.7}$$

$$M_y(d_y) = \begin{cases} 1, & \text{if } d_y \le \delta_y \\ \dfrac{-d_y + \delta_y + h_s}{h_s}, & \text{if } \delta_y < d_y < \delta_y + h_s \\ 0, & \text{otherwise} \end{cases} \tag{2.8}$$

where $\delta_x$ denotes $|w_i - w_j|/2$ and $\delta_y$ denotes $|h_i - h_j|/2$. With this unified model, the exact overlap between cells and bins can be easily computed. The utilization of each bin can now be restated as follows:

$$u_j = \frac{1}{w_b h_b} \sum_{i=1}^{n} a_{ij}, \quad \text{for } j = 1 \ldots k \tag{2.9}$$
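The unified overlap model of Equations (2.6) to (2.9) can be illustrated with a minimal Python sketch. The function names, the `(cx, cy, w, h)` tuple layout, and the row-major grid of equal bins are assumptions made for this illustration; they are not taken from the thesis implementation.

```python
def overlap_1d(d, cell_size, bin_size):
    """Overlap length along one axis, i.e. ws * M(d) from Eq. (2.7)/(2.8).

    d is the center-to-center distance; its sign is ignored.
    """
    d = abs(d)
    ws = min(cell_size, bin_size)
    delta = abs(cell_size - bin_size) / 2.0
    if d <= delta:
        return ws                  # saturated overlap, M = 1
    if d < delta + ws:
        return delta + ws - d      # linear ramp, M = (-d + delta + ws) / ws
    return 0.0                     # no overlap, M = 0


def overlap_area(cell, bin_):
    """Overlap area a_ij of Eq. (2.6); both arguments are (cx, cy, w, h)."""
    cx, cy, w, h = cell
    bx, by, bw, bh = bin_
    return overlap_1d(cx - bx, w, bw) * overlap_1d(cy - by, h, bh)


def bin_utilizations(cells, chip_w, chip_h, nx, ny):
    """Utilization u_j of Eq. (2.9) for an nx-by-ny grid of equal bins.

    Bins are listed row by row, starting from the lower-left corner.
    """
    bw, bh = chip_w / nx, chip_h / ny
    utils = []
    for row in range(ny):
        for col in range(nx):
            b = ((col + 0.5) * bw, (row + 0.5) * bh, bw, bh)
            utils.append(sum(overlap_area(c, b) for c in cells) / (bw * bh))
    return utils


# A 4x4 chip with 2x2 bins, one small cell inside the first bin and one
# 2x2 macro centered on the chip, so the macro covers one unit of each bin.
cells = [(1.0, 1.0, 1.0, 1.0), (2.0, 2.0, 2.0, 2.0)]
utils = bin_utilizations(cells, 4.0, 4.0, 2, 2)
average_density = (1.0 + 4.0) / 16.0       # U: total cell area over chip area
constraints = [u - average_density for u in utils]   # c_j = u_j - U
```

Because the same ramp handles cells that are smaller or larger than the bin, the macro in this example needs no special case; the bin utilizations always average to the chip density as long as every cell lies inside the chip.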

Combining the quadratic objective and the nonlinear constraints, the placement problem is restated in the following form:

$$\begin{aligned} \text{Minimize} \quad & f(\vec{x}, \vec{y}) = \frac{1}{2} \vec{x}' C \vec{x} + \vec{d}_x' \vec{x} + \frac{1}{2} \vec{y}' C \vec{y} + \vec{d}_y' \vec{y} \\ \text{subject to} \quad & c_j(\vec{x}, \vec{y}) = u_j(\vec{x}, \vec{y}) - U = 0, \quad \text{for } j = 1 \ldots k \end{aligned} \tag{2.10}$$

We summarize the notation again. $C$ denotes the matrix of connectivities between pairs of movable cells. Each element of $\vec{d}_x$ is the sum, over all fixed cells, of the product of the connectivity between the movable cell and the fixed cell and the X coordinate of the fixed cell. Similarly, each element of $\vec{d}_y$ is the sum of the products of the connectivity between the movable cell and each fixed cell and the Y coordinate of the fixed cell. $u_j$ denotes the utilization of bin $b_j$, and $U$ denotes the average chip density. For a chip area divided into $k$ bins, there are $k$ constraints.

In Equation (2.10), the objective to be minimized is quadratic, but the constraints are nonlinear in $\vec{x}$ and $\vec{y}$. This formulation implies that any solver that can solve this nonlinear constrained optimization problem effectively can be used as a global placer for mixed-size placement. The details of our approach are discussed in the next chapter.
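As a concrete check of the notation restated above, the following Python sketch builds $C$, $\vec{d}_x$, and the constant term for a tiny one-dimensional netlist, and then solves Equation (2.3) with a plain conjugate gradient loop to obtain the unconstrained starting point. The factor of 2 and the minus sign in the code are one consistent way to match Equation (2.1) term by term; the thesis does not spell out its own scaling, so treat them, along with all function names and the toy netlist, as assumptions of this sketch.

```python
def quadratic_form(weights, movable, fixed_pos):
    """Return (C, d, const) with f(x) = 1/2 x'Cx + d'x + const, cf. Eq. (2.2).

    weights maps an unordered cell pair (i, j) to its connectivity c_ij,
    each pair listed once; fixed_pos maps fixed cell ids to coordinates.
    """
    idx = {v: k for k, v in enumerate(movable)}
    n = len(movable)
    C = [[0.0] * n for _ in range(n)]
    d = [0.0] * n
    const = 0.0
    for (i, j), w in weights.items():
        if i in idx and j in idx:              # movable-movable edge
            a, b = idx[i], idx[j]
            C[a][a] += 2 * w
            C[b][b] += 2 * w
            C[a][b] -= 2 * w
            C[b][a] -= 2 * w
        elif i in idx or j in idx:             # movable-fixed edge
            m, f = (i, j) if i in idx else (j, i)
            C[idx[m]][idx[m]] += 2 * w
            d[idx[m]] -= 2 * w * fixed_pos[f]
            const += w * fixed_pos[f] ** 2
        else:                                  # fixed-fixed edge
            const += w * (fixed_pos[i] - fixed_pos[j]) ** 2
    return C, d, const


def wirelength(weights, pos):
    """Eq. (2.1) in one dimension; the 1/2 cancels the double counting."""
    return sum(w * (pos[i] - pos[j]) ** 2 for (i, j), w in weights.items())


def conjugate_gradient(C, d, tol=1e-12, max_iter=100):
    """Solve C x = -d (Eq. 2.3) for a symmetric positive definite C."""
    n = len(d)
    x = [0.0] * n
    r = [-v for v in d]                        # residual at x = 0
    p = r[:]
    rr = sum(v * v for v in r)
    for _ in range(max_iter):
        if rr < tol:
            break
        Cp = [sum(C[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rr / sum(p[i] * Cp[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Cp[i] for i in range(n)]
        rr_new = sum(v * v for v in r)
        p = [r[i] + (rr_new / rr) * p[i] for i in range(n)]
        rr = rr_new
    return x


# Chain: pad L at 0, movable cells a and b, pad R at 10, unit weights.
weights = {("L", "a"): 1.0, ("a", "b"): 1.0, ("b", "R"): 1.0}
fixed = {"L": 0.0, "R": 10.0}
C, d, const = quadratic_form(weights, ["a", "b"], fixed)
xa, xb = conjugate_gradient(C, d)              # evenly spaced: 10/3 and 20/3
```

In this toy chain the pads pull the two movable cells to evenly spaced positions. Without the pads, `d` would be all zeros and the solver would return the trivial solution discussed in Section 2.3.2.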

Chapter 3 Algorithm and Implementation

3.1 Main Flow

Constrained optimization is usually attacked by solving a sequence of unconstrained problems. In each unconstrained problem, a merit function, composed of the original objective function and the constraints, is minimized. Since our formulation contains a quadratic objective function and nonlinear constraints, the merit function is nonlinear. In our work, a nonlinear minimizer based on the conjugate gradient method is implemented as the foundation of the solver. The merit function is formulated following the augmented Lagrangian (ALAG) method[19], which can be viewed as a combination of the quadratic penalty function and the ordinary Lagrangian method. Unlike the quadratic penalty function, the ALAG method usually generates a solution without an extremely large penalty, and thus avoids ill-conditioning effects. It also provides a better convergence rate than the ordinary Lagrangian method. Although the ALAG method has these advantages over other constrained optimization methods, it still does not guarantee that the minimum is reached from any starting point in the solution space. To make a good choice of the starting point, a multi-level scheme is adopted, following the procedure shown in Figure 3.1.

Figure 3.1: The main flow of our algorithm

Our algorithm begins with a multi-level graph coarsening, which is followed by an unconstrained minimization to determine the initial point for the coarsest graph. Then a sequence of constrained minimizations and graph uncoarsenings is performed. At each level, the ALAG method is executed first, and its solution is used as the starting point for the next finer level. This process repeats until the finest graph is uncoarsened, and a solution of the global placement is finally obtained. To improve efficiency, an extra decision branch is added to skip the ALAG method when the number of vertices is not sufficiently larger than that of the last coarser level at which the ALAG method was performed. We set the vertex ratio to 1.4. In other words, every time the ALAG method is performed, the graph is at least 1.4x larger than in the previous run. The implementation details are described in the following sections.

3.2 Augmented Lagrangian Method

The augmented Lagrangian method introduces one penalty parameter $\mu$ for the quadratic penalty term and an explicit Lagrange multiplier estimate $\lambda_j$ for each constraint $c_j(\vec{x}, \vec{y})$. By relaxing the constraints $c_j(\vec{x}, \vec{y})$ into the original objective $f(\vec{x}, \vec{y})$, the constrained problem in Equation (2.10) is transformed into a sequence of unconstrained problems of the following form, where both sums run over $j = 1 \ldots k$:

$$L(\vec{x}, \vec{y}) = f(\vec{x}, \vec{y}) - \sum_{j=1}^{k} \lambda_j c_j(\vec{x}, \vec{y}) + \frac{1}{2\mu} \sum_{j=1}^{k} c_j^2(\vec{x}, \vec{y}) \tag{3.1}$$

At the $n$-th iteration, the ALAG method fixes the penalty parameter $\mu^n$ and all the Lagrange multiplier estimates $\lambda_j^n$, and performs an unconstrained minimization with respect to $\vec{x}$ and $\vec{y}$. After an approximate minimizer is found, we check the convergence of the merit value. If the convergence criterion is satisfied, the ALAG method terminates with the approximate solution $(\vec{x}^n, \vec{y}^n)$. Otherwise, we update the Lagrange multiplier estimates and the penalty parameter for the next iteration. The algorithm is shown in Figure 3.2.
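The iteration described above can be sketched in Python as follows. This is a minimal illustration under several assumptions that are not in the thesis: the constraint tolerance `eta` is held fixed, the constraint violation is measured by its largest absolute entry, "no obvious reduction" is read as a merit change below `eps`, and the inner solver is a simple finite-difference gradient descent standing in for the conjugate gradient minimizer of Section 3.3. The multiplier update and the halving of the penalty parameter follow Figure 3.2 and the sign convention of Equation (3.1).

```python
def gradient_descent(func, x, iters=2000, h=1e-6):
    """Stand-in inner minimizer (not the thesis's conjugate gradient solver)."""
    x = list(x)
    for _ in range(iters):
        g = []
        for i in range(len(x)):                # central-difference gradient
            hi, lo = x[:], x[:]
            hi[i] += h
            lo[i] -= h
            g.append((func(hi) - func(lo)) / (2.0 * h))
        gg = sum(v * v for v in g)
        if gg < 1e-16:
            break
        fx, t = func(x), 1.0
        # Halve the step until the sufficient-decrease test passes.
        while t > 1e-12 and func([a - t * b for a, b in zip(x, g)]) > fx - 1e-4 * t * gg:
            t *= 0.5
        if t <= 1e-12:
            break
        x = [a - t * b for a, b in zip(x, g)]
    return x


def augmented_lagrangian(f, constraints, x0, mu=1.0, eta=0.05, eps=1e-9, max_outer=50):
    """Outer ALAG loop; returns the solution and the multiplier estimates."""
    lam = [0.0] * len(constraints)
    x, prev = list(x0), float("inf")
    for _ in range(max_outer):
        def merit(z, lam=lam, mu=mu):          # Eq. (3.1) with lam, mu frozen
            c = [cj(z) for cj in constraints]
            return (f(z) - sum(l * v for l, v in zip(lam, c))
                    + sum(v * v for v in c) / (2.0 * mu))
        x = gradient_descent(merit, x)         # warm start from the last solution
        value = merit(x)
        c = [cj(x) for cj in constraints]
        if max(abs(v) for v in c) <= eta:
            if abs(prev - value) < eps:        # merit value has converged
                break
            lam = [l - v / mu for l, v in zip(lam, c)]
        else:
            mu *= 0.5                          # tighten the penalty
        prev = value
    return x, lam


# Toy problem: minimize (x - 2)^2 + (y - 1)^2 subject to x + y - 1 = 0.
objective = lambda z: (z[0] - 2.0) ** 2 + (z[1] - 1.0) ** 2
on_line = lambda z: z[0] + z[1] - 1.0
solution, multipliers = augmented_lagrangian(objective, [on_line], [0.0, 0.0])
```

For this toy problem the constrained minimizer is (1, 0) with multiplier -2, and the loop reaches it with a penalty parameter of only 1/64, which illustrates why the method avoids the extremely large penalties of a pure quadratic penalty approach.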

    given initial solution (x_s^0, y_s^0)
    repeat
        find (x^k, y^k) that minimizes L^k(x, y) from (x_s^k, y_s^k)
        if ( ||c(x^k, y^k)|| <= the tolerance for iteration k )
            if (no obvious reduction on L^k(x, y))
                break with solution (x^k, y^k)
            foreach Lagrange multiplier
                lambda_j^(k+1) = lambda_j^k - c_j(x^k, y^k) / mu^k
        else
            mu^(k+1) = 0.5 mu^k
        (x_s^(k+1), y_s^(k+1)) = (x^k, y^k)

Figure 3.2: The algorithm of ALAG method

3.3 Nonlinear Conjugate Gradient Minimization

The subproblem stated in Equation (3.1) indicates that a nonlinear unconstrained minimization is required within every ALAG iteration. In choosing the minimizer, performance is the major concern, since it sits inside the loop and is called many times. Here we choose the conjugate gradient method[19]. The conjugate gradient method is an iterative method that is very popular for large-scale problems due to its efficiency. In each iteration, only a few vector operations are needed to determine the search direction, and a one-dimensional minimization is performed along the search direction to find a suitable step size. With successive line searches, the algorithm gradually reaches a local minimum.
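The loop just described can be sketched in Python as below. The sketch uses the Polak-Ribiere coefficient that this chapter adopts, but it replaces the thesis's combined Armijo and Fibonacci line search with a plain backtracking search, and it adds a restart along the negative gradient whenever the computed direction is not a descent direction. Both simplifications, and every name in the code, are assumptions made for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def nonlinear_cg(func, grad, x0, max_iter=1000, g_tol=1e-8):
    """Nonlinear conjugate gradient with a Polak-Ribiere direction update."""
    x = list(x0)
    g = grad(x)
    p = [-v for v in g]                        # first direction: negative gradient
    for _ in range(max_iter):
        gg = dot(g, g)
        if gg < g_tol * g_tol:                 # gradient is small enough
            break
        slope = dot(g, p)
        if slope >= 0.0:                       # safeguard: restart along -g
            p, slope = [-v for v in g], -gg
        fx, t = func(x), 1.0
        # Backtracking: halve t until the sufficient-decrease test holds.
        while t > 1e-12 and func([a + t * b for a, b in zip(x, p)]) > fx + 1e-4 * t * slope:
            t *= 0.5
        if t <= 1e-12:                         # no acceptable step was found
            break
        x = [a + t * b for a, b in zip(x, p)]
        g_new = grad(x)
        # Polak-Ribiere coefficient, then the new search direction.
        beta = dot(g_new, [a - b for a, b in zip(g_new, g)]) / gg
        p = [-a + beta * b for a, b in zip(g_new, p)]
        g = g_new
    return x


# Usage on a small convex test function with minimum at (1, -2).
bowl = lambda z: (z[0] - 1.0) ** 2 + 10.0 * (z[1] + 2.0) ** 2
bowl_grad = lambda z: [2.0 * (z[0] - 1.0), 20.0 * (z[1] + 2.0)]
minimum = nonlinear_cg(bowl, bowl_grad, [0.0, 0.0])
```

Each iteration costs one gradient evaluation plus a handful of merit evaluations in the line search, which is why the cost of the gradient evaluation matters so much for the overall runtime.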

At the $k$-th iteration, the search direction $p_k$ is a linear combination of the negative gradient $-g_k$ and the previous search direction $p_{k-1}$. When $k = 0$, the negative gradient at the initial solution $x_0$ is an intuitive choice for the initial search direction, since no previous search direction exists. In general, the search direction can be expressed in the following form:

$$p_k = \begin{cases} -g_k, & \text{if } k = 0 \\ -g_k + \beta_k p_{k-1}, & \text{otherwise} \end{cases} \tag{3.2}$$

where $\beta_k$ is a scalar chosen such that $p_k$ is conjugate to $p_{k-1}$. There are several variants for the choice of $\beta_k$, and here we follow the Polak-Ribiere formula[20]:

$$\beta_k = \frac{g_k' (g_k - g_{k-1})}{g_{k-1}' g_{k-1}} \tag{3.3}$$

After the search direction $p_k$ is obtained, the step size $\alpha_k$ is determined by finding the approximate one-dimensional minimizer along that direction, and the new solution for the next iteration is given by:

$$x_{k+1} = x_k + \alpha_k p_k \tag{3.4}$$

Our one-dimensional minimization combines a line search method following Armijo's rule with the Fibonacci search method. Details of both can be found in [20]. Armijo's rule, also known as the first Wolfe condition (the sufficient-decrease condition), uses a first-order approximation to decide whether a step yields enough reduction in the merit function. If the step yields enough reduction, we accept it and further expand the step size to examine Armijo's rule again. Conversely, if the reduction is insufficient, we repeatedly contract the step size until either Armijo's rule is satisfied or the step size becomes too small. In each line search, we first determine the interval that contains the minimum by examining Armijo's rule, and then perform the Fibonacci search within the interval to find the most suitable step size.

The stopping criteria of our conjugate gradient method are: (1) the gradient is too small, (2) the merit value cannot be further improved after several iterations, or (3) a maximum number of iterations is reached.

3.4 Negative Gradient Evaluation

An important factor in the conjugate gradient method is how to compute the negative gradient of the complex merit function presented in Equation (3.1). In our work, we develop a special approximation of the negative gradient. Let $(\vec{x}^k, \vec{y}^k)$ denote the current design point at the $k$-th iteration of the conjugate gradient method, $(x_i^k, y_i^k)$ denote the $i$-th element of $(\vec{x}^k, \vec{y}^k)$, and $\delta$ denote a unit distance. In the evaluation of the negative gradient at the $k$-th iteration, eight different directions are examined for each element $(x_i^k, y_i^k)$ by moving it a step of size $\delta$. More specifically, $(x_i^k, y_i^k)$ is replaced by $(x_i^k + \delta, y_i^k)$, $(x_i^k + \delta, y_i^k + \delta)$, $(x_i^k, y_i^k + \delta)$, $(x_i^k - \delta, y_i^k + \delta)$, $(x_i^k - \delta, y_i^k)$, $(x_i^k - \delta, y_i^k - \delta)$, $(x_i^k, y_i^k - \delta)$, and $(x_i^k + \delta, y_i^k - \delta)$ in turn, and the eight resulting design points are evaluated to obtain the difference in merit value compared with the original design point. Each difference represents local information about how the merit value is affected if the cell moves in that direction. The movement that causes the largest decrease in the merit value is chosen, and the corresponding element of the negative gradient is determined by the definition of the gradient, that is, the difference in merit value over the distance. If this move only causes a change in the X-direction, for example, then the Y-component of the element of the negative gradient is 0, and vice versa.
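A minimal Python sketch of this eight-direction probe is given below. The text does not say how a diagonal move is split into X and Y components, so the sketch assumes the decrease is divided by the Euclidean length of the move and distributed along the unit direction; a cell for which no move lowers the merit value gets a zero element. The function name and the `merit(xs, ys)` callback signature are also assumptions. For clarity the merit function is re-evaluated from scratch for every probe, whereas a careful implementation would update only the terms touched by the moved cell.

```python
# The eight probe directions, in the order listed in the text.
DIRECTIONS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]


def approx_negative_gradient(merit, xs, ys, delta=1.0):
    """Approximate the negative gradient, one (gx, gy) element per cell.

    merit(xs, ys) returns the merit value for coordinate lists xs, ys.
    """
    base = merit(xs, ys)
    result = []
    for i in range(len(xs)):
        x0, y0 = xs[i], ys[i]
        best_drop, best_dir = 0.0, (0, 0)
        for dx, dy in DIRECTIONS:
            xs[i], ys[i] = x0 + dx * delta, y0 + dy * delta
            drop = base - merit(xs, ys)        # positive when the move helps
            if drop > best_drop:
                best_drop, best_dir = drop, (dx, dy)
        xs[i], ys[i] = x0, y0                  # restore the original position
        dx, dy = best_dir
        length_sq = delta * delta * (dx * dx + dy * dy)
        if length_sq == 0.0:
            result.append((0.0, 0.0))          # no probe lowers the merit value
        else:
            # (drop / length) times the unit direction of the chosen move.
            result.append((best_drop * dx * delta / length_sq,
                           best_drop * dy * delta / length_sq))
    return result


# One cell at the origin, attracted to the point (3, 0).
pull = lambda xs, ys: sum((x - 3.0) ** 2 + y ** 2 for x, y in zip(xs, ys))
element = approx_negative_gradient(pull, [0.0], [0.0], delta=1.0)
```

In the usage example the best probe is the move in the positive X-direction, so the Y-component of the element is zero, matching the behavior described in the text.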

It is clear that the evaluation of the negative gradient takes much more time than that of the merit value. However, with careful implementation, the time spent on one evaluation of the negative gradient can be less than eight times that spent on one evaluation of the merit value.

3.5 Multi-Level Scheme

The multi-level scheme is widely adopted in the domain of physical design. A typical implementation consists of a coarsening phase and an uncoarsening phase, as shown in Figure 3.3. In the beginning, the problem is recursively coarsened to reduce the problem size. The optimization is performed at the coarsest level, and then local refinement is performed at each of the following finer levels during the uncoarsening phase. The multi-level scheme is very attractive because it reduces runtime without too much quality loss. Placers with such a scheme can be found in [1, 7, 13].

Figure 3.3: The coarsening/uncoarsening flow (graph levels G0, G1, G2, ..., Gn)
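The uncoarsening phase, combined with the vertex-ratio test of Section 3.1, can be sketched as a small Python driver. The callables `initial_solve`, `constrained_solve`, and `interpolate` are hypothetical placeholders for the unconstrained quadratic solve, the ALAG solve, and the projection of a coarse solution onto the next finer graph; the only behavior taken from the text is that the constrained solve runs at the coarsest level and is then skipped until the graph has grown by at least the given ratio (1.4) since its last run.

```python
def multilevel_place(graphs, initial_solve, constrained_solve, interpolate, ratio=1.4):
    """Run the uncoarsening phase over graphs ordered coarsest to finest.

    len(graph) must give the number of vertices of a level. Returns the
    final solution and the indices of the levels that were optimized.
    """
    solution = initial_solve(graphs[0])            # unconstrained starting point
    solution = constrained_solve(graphs[0], solution)
    last_size = len(graphs[0])
    solved_levels = [0]
    for level in range(1, len(graphs)):
        # Carry the coarser solution over to the finer graph.
        solution = interpolate(graphs[level - 1], graphs[level], solution)
        if len(graphs[level]) >= ratio * last_size:
            solution = constrained_solve(graphs[level], solution)
            last_size = len(graphs[level])
            solved_levels.append(level)
        # Otherwise this level is skipped and only passes the solution on.
    return solution, solved_levels


# Usage with dummy callables: only the level-selection logic is exercised.
sizes = [10, 12, 15, 30, 35, 60]
graphs = [list(range(n)) for n in sizes]
final, solved = multilevel_place(
    graphs,
    initial_solve=lambda g: [0.0] * len(g),
    constrained_solve=lambda g, s: s,
    interpolate=lambda coarse, fine, s: s + [0.0] * (len(fine) - len(s)),
)
```

With these level sizes the constrained solve runs at levels 0, 2, 3, and 5. Whether the finest level should always be optimized regardless of the ratio is not stated in the text, so the sketch leaves that decision to the ratio test alone.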

In this thesis, the multi-level framework is implemented mainly because the ALAG method lacks a good initial design point. It is well known that the initial design point significantly affects both the result and the performance of nonlinear constrained optimization [21]. But how do we decide on a reasonable initial design point? We believe that the optimization result of the coarser level is a proper choice. During the uncoarsening phase, the solution obtained at the coarser level is inherited as the initial solution of the ALAG method at the current level. The only exception is the coarsest level, where an unconstrained optimization is first performed to obtain the required initial solution.

By testing a simple case that contains 107 cells and 151 nets, we confirmed that such a multi-level framework is superior to direct optimization from a randomly guessed initial design point. Rows 2 to 9 of Table 3.1 show the placement results for 8 random initial design points, and row 10 shows their averages. Row 11 shows the placement result with the same optimization parameters, except that the initial design point is obtained by a 7-level optimization. Both the half-perimeter wire length and the amount of cell overlap obtained by the multi-level scheme are better than the averages of the random cases (344.00 versus 389.88, and 157.50 versus 197.36).

Our coarsening/uncoarsening approach follows the algorithms proposed by METIS and hMETIS [16, 17]. The randomized "First-Choice" matching is performed to cluster adjacent vertices based on connectivity and vertex weights. Here the vertex weight is the cell area. When two vertices are clustered together, their weights are summed to form the weight of the new vertex. We recursively coarsen the graph until (1) the graph is small enough that the number of vertices is less than a given lower bound, say 10, or (2) there

exists some vertex whose weight exceeds a given maximum weight.

Table 3.1: The placement results with different initial design points

Initial design point    HPWL      Cell overlap
random 1                242.00    388.88
random 2                376.00    183.13
random 3                444.00    156.63
random 4                453.00    155.38
random 5                423.00    182.38
random 6                390.00    167.50
random 7                378.00    167.38
random 8                413.00    177.63
avg.                    389.88    197.36
multi-level             344.00    157.50

3.6. Legalization

3.6.1. Legalization Flow

After the ALAG method is performed at the finest level, we obtain a placement solution in which every bin has similar cell utilization. Such a result is not guaranteed to be overlap-free, and thus an extra legalization step is necessary. This step can be further divided into three stages: (1) macro legalization, (2) row legalization, and (3) detailed placement.

In the first stage, only the macro cells are legalized, without considering the standard cells. Once a solution is obtained in which no two macro cells overlap, these macro cells are fixed, and we proceed to the next stage to legalize the standard cells. The key point of stages 1 and 2 is that the legalization must honor the solution of our global placement

and only move cells locally. After stage 2, the placement is a legal solution, but the total wire length may increase. Hence, the final stage refines the standard cells locally to further reduce the wire length and to meet other design constraints without introducing cell overlaps. In our work, we evaluate a new macro packing algorithm to perform macro legalization, but we do not cover row-based legalization or detailed placement.

3.6.2. Implementation of Macro Legalization

Our macro legalization is based on the block packing technique, which is widely used in design floorplanning. The foundation of block packing is a floorplan representation, with which the geometric relationships of the blocks are well defined. By perturbing the representation step by step, we can evaluate different packings and choose the one with the best cost. We developed a representation that is a modification of the B*-Tree [23]. A B*-Tree is an ordered binary tree in which each node stands for a block to be packed. The root node is placed as the bottom-left block, and the other nodes are placed by the following rules. If node $n_j$ is the left child of node $n_i$, module $b_j$ must be located on the right-hand side of and adjacent to module $b_i$. If node $n_j$ is the right child of node $n_i$, module $b_j$ must be located above and adjacent to module $b_i$. After traversing the whole tree, all the blocks are placed, and a non-slicing floorplan is constructed. However, packing toward the bottom-left corner is not what we desire. In most commercial ICs, the macro cells are placed around the chip boundary, so that the chip center forms a contiguous region for standard-cell placement. To ensure the macro cells can be

packed in such a style, a four-B*-Tree strategy is employed. We pack the four trees from the four chip corners. The bottom-left tree is an ordinary B*-Tree, and the top-left tree is a B*-Tree rotated clockwise by 90 degrees. Similarly, the top-right tree and the bottom-right tree are also rotated B*-Trees.

At the beginning of the legalization, the macro cells are assigned to the four trees according to their locations, which are determined by the global placement. For each tree, the cells are sorted along the X-direction and the Y-direction to decide their geometric relationships. Thus, the trees can be constructed following the B*-Tree definition. Then we start the SA process to perturb the trees. In each iteration, only one tree is selected, and one of three possible operations is applied to a randomly chosen node:

ROTATE - Rotate the cell by 90, 180, or 270 degrees.
MOVE - Insert a node $n_i$ as a child of another node $n_j$. The subtrees of both $n_i$ and $n_j$ may be affected recursively.
SWAP - Switch two nodes $n_i$ and $n_j$ in the tree.

The cost function of our SA process is composed of two terms. The first term is the sum of the distances from each block to the chip corner of its tree. The second is the HPWL of the nets related to the blocks. We give the first term a larger weight than the HPWL term to ensure a compact packing. The result of macro legalization is discussed in Section 4.2.
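The two-term cost described above can be written down in a few lines. This is a sketch under stated assumptions, not the thesis code: the names `packing_cost`, `hpwl`, `centers`, and `corner_of` are hypothetical, the block-to-corner distance is taken as the Manhattan distance from the block center (the text does not say which distance is used), and the weights 2.0 and 1.0 are placeholders that only respect the stated rule that the corner term weighs more.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def hpwl(points: List[Point]) -> float:
    """Half-perimeter of the bounding box of the given points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def packing_cost(centers: Dict[str, Point], corner_of: Dict[str, Point],
                 nets: List[List[str]],
                 w_corner: float = 2.0, w_hpwl: float = 1.0) -> float:
    """Weighted sum of corner distances and block-related wire length."""
    # Term 1: distance from every block to the chip corner of its own tree
    # (Manhattan distance from the block center is an assumption).
    corner_term = sum(abs(x - corner_of[b][0]) + abs(y - corner_of[b][1])
                      for b, (x, y) in centers.items())
    # Term 2: HPWL of the nets that connect the blocks.
    hpwl_term = sum(hpwl([centers[b] for b in net]) for net in nets)
    return w_corner * corner_term + w_hpwl * hpwl_term


# Two blocks: A belongs to the bottom-left tree, B to the top-right tree
# of a hypothetical 10 x 10 chip, and one net connects them.
centers = {"A": (1.0, 1.0), "B": (9.0, 8.0)}
corner_of = {"A": (0.0, 0.0), "B": (10.0, 10.0)}
cost = packing_cost(centers, corner_of, nets=[["A", "B"]])
```

With the corner term dominating, a move that pulls a block away from its corner is penalized even when it shortens the wires, which is how the weighting pushes toward a compact packing at the chip boundary.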

Chapter 4
Experimental Results

We implemented our placer in C/C++ and compiled it using g++ on Cygwin, which emulates a Linux environment on Windows XP. We ran the program on a 1.5 GHz PC with 512 MB of memory and used the IBM ICCAD'04 benchmark set in LEF/DEF format as test cases, which can be downloaded from [21].

Table 4.1: Some ICCAD'04 benchmark cases

Case     # of inst (core / macro / pad)    # of nets    # of masters
ibm01    12752 (12260 / 246 / 246)         14111        2846
ibm02    19601 (19071 / 271 / 259)         19584        3057
ibm03    23136 (22563 / 290 / 283)         27401        3757
ibm04    27507 (26925 / 295 / 287)         31970        4587
ibm05    29347 (28146 / 0 / 1201)          28446        4911
ibm06    32498 (32154 / 178 / 166)         34826        4598
ibm07    45926 (45348 / 291 / 287)         48117        5748
ibm08    51309 (50722 / 301 / 286)         50513        4235

4.1. Some IBM ICCAD'04 Results

We compared our placement results with those of other academic placers on the ICCAD'04 benchmark set. This benchmark set contains 18 large-scale designs, and most of them include large movable macros. These macros are all hard blocks with fixed aspect ratios and pin locations. In addition, the I/O pads are fixed around the chip boundary. Such a characteristic is a must for force-directed methods such as our placer, because the fixed cells provide the external energy to spread the movable cells. Due to limited resources, we ran only the first 8 cases. The design characteristics of the cases we tested are listed in Table 4.1, and the comparison results are shown in Table 4.2.

Table 4.2: The comparison of the placement results

Case     APlace    APlace    FengShui    Capo      MPG-MS    Ours      Ours
         gpWL      dpWL      dpWL        dpWL      dpWL      gpWL      runtime
ibm01    2.17      2.14      2.56        2.67      3.01      2.74      12.35
ibm02    4.83      4.61      6.05        5.54      7.42      5.60      26.50
ibm03    6.94      6.72      8.77        8.67      11.20     8.13      34.00
ibm04    7.70      7.60      8.38        9.79      10.50     9.83      46.58
ibm05    9.82      9.70      9.94        10.82     10.90     10.01     36.46
ibm06    6.31      5.99      6.99        7.35      9.21      8.79      65.25
ibm07    10.04     10.02     11.37       11.23     13.70     11.96     72.78
ibm08    12.65     12.34     13.51       16.02     16.40     15.52     91.93

The first two columns show the HPWL of APlace global placement and APlace detailed placement. Normalizing gpWL to 1, we found that dpWL is 0.975 on average, that is, about 2.5% smaller than gpWL. It indicates that, with a strong correlation between global placement and detailed placement, the wirelength

could be further reduced after legalization and detailed placement, and that the HPWL of global placement and detailed placement should be very close. The 3rd, 4th, and 5th columns are the detailed placement results of FengShui v2.6, Capo v9.0, and MPG-MS, respectively. The data in the first four columns were reported in [22], and the data in the 5th column were reported in [1]. The last two columns show the global placement HPWL and the runtime of our placer. We also normalized our HPWL to 1 to see the differences from the other placers, as depicted in Figure 4.1. The last set is the average over ibm01 to ibm08, and the values are 0.81, 0.94, 0.99, 1.15, and 1, respectively. Our results for the eight cases are depicted in Figure 4.2 to Figure 4.9.

[Figure 4.1: The normalized HPWL comparison. Horizontal axis: ibm01 to ibm08 and avg.; vertical axis: 0 to 1.6; series: APlace dpWL, FengShui dpWL, Capo dpWL, MPG-MS dpWL, Our gpWL.]

[Figure 4.2: IBM01 result.]
[Figure 4.3: IBM02 result.]
[Figure 4.4: IBM03 result.]
[Figure 4.5: IBM04 result.]
[Figure 4.6: IBM05 result.]
[Figure 4.7: IBM06 result.]
[Figure 4.8: IBM07 result.]
[Figure 4.9: IBM08 result.]

4.2. Macro Legalization

We implemented the block packing described above in our multi-level framework. Every time a finer level begins, we check whether any macro cell exists as a single node in the current graph. If so, the SA-based block packing is performed before the ALAG method. The parameters of our SA process are as follows. We set the temperature-decreasing rate to 0.85 and the stopping temperature $T_{stop}$ to $5.0 \times 10^{-4}$ of the initial temperature $T_0$. At each temperature, the number of moves is set to 300 times the number of blocks, and the uphill probability is 0.85 at initialization. Testing the ibm01 case with these parameter settings gives the result shown in the 2nd column of Table 4.3. Compared with the original result, which we list again in the first column, it clearly indicates that such block packing degrades the HPWL greatly (from 2.74 to 3.79). The 3rd column shows the result when we further decrease $T_{stop}$ to $1.5 \times 10^{-4}\,T_0$. The HPWL is improved slightly at the cost of runtime. The corresponding placements for the two packings are depicted in Figure 4.10. We can see that the cell distribution is quite uneven. This is because the block packing disturbs the solution too much. Even though we try to maintain the geometric relationships between blocks during tree construction, the subsequent SA process eventually destroys them. Since the solution for the macro cells becomes very different, it also affects the standard cells and departs from the result that we obtained through the multi-level scheme.

From the experimental results, we conclude that the SA-based block packing is not suitable for incorporation into our algorithm. We should seek a legalizer that honors the global placement solution with minimum movement toward the chip boundary.

Table 4.3: The HPWL of ibm01 with different legalization

        No packing    SA with Tstop = 5.0e-4 * T0    SA with Tstop = 1.5e-4 * T0
HPWL    2.74          3.79                           3.54
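To give a feel for what these annealing parameters imply, the sketch below counts the temperature levels and the total number of moves. Several points are assumptions rather than statements from the thesis: the stopping ratios are read as 5.0e-4 and 1.5e-4, the cooling is taken to be geometric (each level multiplies the temperature by 0.85), the block count of 246 is simply the number of ibm01 macros from Table 4.1 and may not equal the number of blocks packed at any one level, and `initial_temperature` uses the common Metropolis convention of choosing $T_0$ so that an average uphill move is accepted with the given probability. All function names are hypothetical.

```python
import math


def temperature_levels(rate: float, stop_ratio: float) -> int:
    """Count the temperatures visited while T/T0 stays above stop_ratio."""
    t, levels = 1.0, 0  # t is the temperature relative to T0
    while t > stop_ratio:
        levels += 1
        t *= rate  # geometric cooling (assumption)
    return levels


def total_moves(num_blocks: int, rate: float, stop_ratio: float,
                moves_per_block: int = 300) -> int:
    """Moves per temperature (300 x number of blocks) times the levels."""
    return moves_per_block * num_blocks * temperature_levels(rate, stop_ratio)


def initial_temperature(avg_uphill_cost: float, p_accept: float) -> float:
    """T0 such that exp(-avg_uphill_cost / T0) equals p_accept (assumption)."""
    return -avg_uphill_cost / math.log(p_accept)


levels_a = temperature_levels(0.85, 5.0e-4)
levels_b = temperature_levels(0.85, 1.5e-4)
moves_a = total_moves(246, 0.85, 5.0e-4)
moves_b = total_moves(246, 0.85, 1.5e-4)
print(levels_a, levels_b, moves_a, moves_b)
```

Under these assumptions, lowering the stopping ratio from 5.0e-4 to 1.5e-4 adds eight temperature levels (47 to 55), which is consistent with the observation that the slightly better HPWL comes at the cost of runtime.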

[Figure 4.10: IBM01 placement after macro legalization. (a) $T_{stop} = 5.0 \times 10^{-4}\,T_0$; (b) $T_{stop} = 1.5 \times 10^{-4}\,T_0$.]

Chapter 5
Conclusion

A new multi-level mixed-size placer is proposed in this thesis. It is based on a unified model of the placement problem and utilizes the augmented Lagrangian method to optimize the HPWL. The multi-level scheme not only improves the performance but also helps decide good initial points for nonlinear programming. The experimental results show that our placer generates good global placements, and the quality is comparable to current state-of-the-art placers.

Many features are still needed to form a complete placement framework. Future work includes a robust legalizer for both macro cells and standard cells, and an effective algorithm for detailed placement. The performance of our placer can also be further improved, and more design constraints such as congestion and power should be supported to make our placer more practical.

Bibliography

[1] C.-C. Chang, J. Cong, and X. Yuan, Multi-level Placement for Large-Scale Mixed-Size IC Designs, In Proc. Asia South Pacific Design Automation Conference, pp. 325-330, 2003.

[2] A. E. Caldwell, A. B. Kahng, and I. L. Markov, Can Recursive Bisection Alone Produce Routable Placements?, In Proc. Design Automation Conference, pp. 477-482, 2000.

[3] A. Agnihotri, M. C. Yildiz, A. Khatkhate, A. Mathur, S. Ono, and P. H. Madden, Fractional Cut: Improved Recursive Bisection Placement, In Proc. International Conference on Computer-Aided Design, pp. 307-310, 2003.

[4] C.-C. Chang, J. Cong, D. Pan, and X. Yuan, Physical Hierarchy Generation with Routing Congestion Control, In Proc. International Symposium on Physical Design, pp. 36-41, 2002.

[5] H. Eisenmann and F. M. Johannes, Generic Global Placement and Floorplanning, In Proc. Design Automation Conference, pp. 269-274, 1998.

[6] N. Viswanathan and Chris C.-N. Chu, FastPlace: Efficient Analytical Placement using Cell Shifting, Iterative Local Refinement and a Hybrid Net Model, In Proc. International Symposium on Physical Design, pp. 26-33, 2004.

[7] Andrew B. Kahng and Q. Wang, Implementation and Extensibility of an Analytic Placer, In Proc. International Symposium on Physical Design, pp. 18-25, 2004.

[8] J. M. Kleinhans, Georg Sigl, F. M. Johannes, and K. J. Antreich, GORDIAN: VLSI Placement by Quadratic Programming and Slicing Optimization, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 10, No. 3, Mar 1991.

[9] Georg Sigl, K. Doll, and F. M. Johannes, Analytical Placement: A Linear or a Quadratic Objective Function?, In Proc. ACM/IEEE Design Automation Conference, pp. 427-431, 1991.

[10] S. N. Adya and I. L. Markov, Consistent Placement of Macro-Blocks Using Floorplanning and Standard-Cell Placement, In Proc. International Symposium on Physical Design, pp. 12-17, 2002.

[11] A. Khatkhate, Chen Li, A. R. Agnihotri, M. C. Yildiz, S. Ono, C.-K. Koh, and P. H. Madden, Recursive Bisection Based Mixed Block Placement, In Proc. International Symposium on Physical Design, pp. 84-89, 2004.

[12] K. Vorwerk, A. Kennings, and A. Vannelli, Engineering Details of a Stable Force-Directed Placer, In Proc. IEEE/ACM International Conference on Computer-Aided Design, pp. 573-580, 2004.

[13] B. Hu and M. Marek-Sadowska, Fine Granularity Clustering-Based Placement, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 23, No. 4, Apr 2004.

[14] A. B. Kahng and Q. Wang, An Analytical Placer for Mixed-Size Placement and Timing-Driven Placement, In Proc. IEEE/ACM International Conference on Computer-Aided Design, pp. 565-572, 2004.

[15] Y.-C. Chou and Y.-L. Lin, A Performance-Driven Standard Cell Placer Based on a Modified Force-Directed Algorithm, In Proc. International Symposium on Physical Design, pp. 24-29, 2001.

[16] G. Karypis and V. Kumar, A fast and high quality multilevel scheme for partitioning irregular graphs, In SIAM Journal on Scientific Computing, Vol. 20, No. 1, pp. 359-392, 1999.

[17] G. Karypis, V. Kumar, and S. Shekhar, Multilevel Hypergraph Partitioning: Application in VLSI Domain, In Proc. ACM/IEEE Design Automation Conference, pp. 526-529, 1997.

[18] K. M. Hall, An r-dimensional quadratic placement algorithm, In Management Science, pp. 219-229, 1970.

[19] D. P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods, Academic Press, 1982.

[20] M. S. Bazaraa, H. D. Sherali, and C. M. Shetty, Nonlinear Programming - Theory and Algorithms, 2nd Edition, John Wiley & Sons, 1993.

[21] S. N. Adya, S. Chaturvedi, A. Roy, D. Papa, and I. L. Markov, Unification of Partitioning, Floorplanning and Placement, In Proc. IEEE/ACM International Conference on Computer-Aided Design, pp. 550-557, 2004 (URL: http://vlsicad.eecs.umich.edu/BK/ICCAD04bench/).

[22] A. B. Kahng, S. Reda, and Qinke Wang, Architecture and Details of a High Quality, Large-Scale Analytical Placer, In Proc. IEEE/ACM International Conference on Computer-Aided Design, pp. 890-897, 2005.

[23] Y.-C. Chang, Y.-W. Chang, G.-M. Wu, and S.-W. Wu, B*-Trees: A New Representation for Non-Slicing Floorplans, In Proc. IEEE/ACM Design Automation Conference, pp. 458-463, 2000.
