Chapter 5 FDPrior Algorithm
5.1 FDPrior algorithm
The algorithm flow is illustrated in Fig. 10. The orange parts run on the GPU platform, and the white parts run on the CPU. The algorithm is composed of three phases: N-body simulation, mapping cells to a layer, and escape from local optimum. The N-body simulation phase simulates the movement of each mobile cell based on the forces acting on it. The second phase picks appropriate cells and maps them to a layer. The first two phases limit the optimization search to a regional area and could fall into a local optimum. The third phase therefore provides a mechanism to escape from a local optimum by disturbing the unfixed cells. These phases are repeated K-1 times to determine the lower K-1 layers.
The remaining mobile cells after K-1 iterations are placed directly into layer_K.
Details of each phase are discussed in the sections below.
Figure 10: The flow chart of FDPrior algorithm. [Flow chart blocks: Start; Circuit Netlist (independent module); Initialization & Force Modeling; N-Body Simulation; Does Simulation Stop? (Yes/No); Mapping Cells To A Layer; Escape From Local Optimum; k=K-1? (Yes/No); Terminate. Annotations: Without Layers Construction (k=0); With Layer-k is Stacked Up, k=k+1. Legend: CPU, GPU.]
5.1.1 Phase 1: N-body Simulation
FDPrior employs a non-heuristic approach to calculate the forces. N-body simulation numerically approximates the motion of each cell in a system. Each cell in the netlist is modeled as a body and moves independently. Consequently, the simulation exposes massive parallelism, which GPGPU platforms can exploit. We separate the force into two fundamental components. The first, the hold force, gives each cell a tendency to stay in its current position. The second is an attractive force, induced by the inter-cell connections, which moves cells toward the equilibrium position.
These two forces pull against each other until the total force is zero and the system reaches a static equilibrium.
5.1.1.A Hooke’s Law
Many problems that seem new are actually not. Quadratic placement techniques have long been used for the placement problem. We can draw on these earlier placement methods and modify them to solve partitioning in 3D ICs.
Accordingly, FDPrior adopts earlier solutions to the placement problem and uses Hooke's law to describe the forces that drive cell motion. In the quadratic placement problem, many algorithms use springs or extra forces based on Hooke's law to obtain better solutions, e.g. the method published by Quinn in 1975 [26]. This method is usually called the force-directed approach in the EDA field.
Simulating mobile cells in free space may produce vibration when the gravitational forces described in Section 2.2 are applied. In our experience, when Equation 1 is used as the force equation, the vibration can take a long time to settle into a stable state. We apply the general rectilinear motion formulae to describe motion along a straight line. Since there is no friction in the simulation, the accumulated speed keeps each body moving even after it has arrived at the meeting point, which results in a two-body vibration. In practical implementations, the vibration mode rarely improves the solutions. In the simulation, the vibration mode occurs when the direction of a body's acceleration is opposite to its velocity. For this reason, spending simulation time on the vibration mode is considered redundant.
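The two-body vibration described above can be reproduced with a small CPU experiment. The following is a minimal sketch and not FDPrior code: it assumes two unit-mass bodies on a line, each pulled toward the other with a constant unit acceleration (a simplified stand-in for the force of Equation 1), and a plain velocity-then-position update. The names `VibrationStats` and `simulate_two_bodies` are hypothetical.

```cpp
#include <cassert>

// Illustrative two-body experiment on a line (not FDPrior code).
// There is no friction, so velocity accumulates from step to step.
struct VibrationStats {
    int crossings;        // how many times the bodies pass each other
    int vibration_steps;  // steps where acceleration opposes velocity
};

VibrationStats simulate_two_bodies(int steps, float dt) {
    assert(steps >= 0 && dt > 0.0f);
    float x0 = 0.0f, x1 = 10.0f;  // initial positions
    float v0 = 0.0f, v1 = 0.0f;   // initial velocities
    VibrationStats stats = {0, 0};
    for (int s = 0; s < steps; ++s) {
        // Body 0 is pulled toward body 1, and body 1 toward body 0.
        const float a0 = (x1 > x0) ? 1.0f : -1.0f;
        const float a1 = -a0;
        // Vibration mode: acceleration points against the velocity.
        if (a0 * v0 < 0.0f) ++stats.vibration_steps;
        const bool was_left = x0 < x1;
        v0 += a0 * dt;  // rectilinear motion: update velocity,
        v1 += a1 * dt;
        x0 += v0 * dt;  // then position
        x1 += v1 * dt;
        if ((x0 < x1) != was_left) ++stats.crossings;
    }
    return stats;
}
```

Calling `simulate_two_bodies(1000, 0.1f)` shows the bodies crossing repeatedly instead of settling at the meeting point, which is the behavior the spring-based formulation is meant to avoid.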
In physics, Hooke's law of elasticity is an approximation that relates the extension of a spring to the load applied to it. Hooke's law states that strain is directly proportional to stress. Its most common application is the spring. To avoid vibration and obtain quick convergence, FDPrior adopts the forces exerted by springs, as defined by Hooke's law. Hooke's law is stated mathematically in Equation 7, where F is the restoring force exerted by the spring,
\[ F = -k\,x \tag{7} \]
x is the displacement of the spring's end from its equilibrium position, and k is the spring constant. The negative sign on the right-hand side of Equation 7 indicates that the restoring force always acts in the direction opposite to the displacement. FDPrior performs a force-directed method to solve the force equations instead of using a linear solver.
5.1.1.B Hold Force
The hold force is defined in Equation 8. The spring constant $k_i$, defined by the area of cell i, determines the strength of the hold force; $z_i$ and $z_i'$ are the current and new positions of cell i.
\[ F_i^{hold} = -k_i\,(z_i' - z_i) \tag{8} \]
The negative sign indicates that the cell tends to stay in its current position. When the area of a cell is large, the cell has larger inertia and is more difficult to move.
5.1.1.C Attractive Force
Fig. 11 illustrates the attractive force in the z-direction on cell i exerted by a spring joining cells i and j [27]. Every 2-pin net that the clique model derives from the hypergraph acts as a stretched spring.
Figure 11: The force by a stretched spring joining cell i and j.
The attractive force between two bodies tends to pull them closer to each other at every simulation step. The attractive force between cells is defined in Equation 9. The final attractive force is the accumulation of the forces acting on cell i,
\[ F_i^{att} = \sum_{j \in N(i)} w_{ij}\,(z_j - z_i) \tag{9} \]
where $z_i$ is the z-position of cell i, $N(i)$ is the set of cells joined to cell i by a 2-pin net, and $w_{ij}$ is the weight of that net.
In Chapter 3, we mentioned that FDPrior adheres to the clique model to modify the given hypergraph: each multi-pin net is translated into a group of 2-pin nets. Every 2-pin net in the system has the same weight, which is set to one.
The formulation of the force in the clique by Hooke's law is equivalent to Equation 8.
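The clique translation can be sketched on the CPU as follows. This is a minimal illustration under stated assumptions, not the FDPrior implementation: the type `Net`, the struct `CliqueAdjacency`, and the function `expand_clique` are hypothetical names, and the compressed offset-plus-neighbor layout is only inferred from the `dc2c_nbegin` and `dcell2cell` arrays read by the kernel in Table 3. Cells that share several nets are kept as parallel unit-weight springs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One hyperedge (multi-pin net): the indices of the cells it connects.
using Net = std::vector<int>;

// Neighbor lists in compressed form: the neighbors of cell i are
// neighbor[begin[i]] .. neighbor[begin[i + 1] - 1].
struct CliqueAdjacency {
    std::vector<int> begin;     // size = cell_count + 1
    std::vector<int> neighbor;  // one entry per directed 2-pin connection
};

// Expand every net into unit-weight 2-pin nets between all pairs of its cells.
CliqueAdjacency expand_clique(int cell_count, const std::vector<Net>& nets) {
    assert(cell_count >= 0);
    std::vector<std::vector<int>> lists(cell_count);
    for (const Net& net : nets) {
        for (std::size_t a = 0; a < net.size(); ++a) {
            for (std::size_t b = a + 1; b < net.size(); ++b) {
                assert(net[a] >= 0 && net[a] < cell_count);
                assert(net[b] >= 0 && net[b] < cell_count);
                lists[net[a]].push_back(net[b]);  // spring seen from cell a
                lists[net[b]].push_back(net[a]);  // same spring seen from cell b
            }
        }
    }
    // Flatten the per-cell lists into the offset + neighbor arrays.
    CliqueAdjacency adj;
    adj.begin.push_back(0);
    for (const std::vector<int>& list : lists) {
        adj.neighbor.insert(adj.neighbor.end(), list.begin(), list.end());
        adj.begin.push_back(static_cast<int>(adj.neighbor.size()));
    }
    return adj;
}
```

For example, a 3-pin net on cells 0, 1, 2 becomes the three 2-pin nets (0,1), (0,2), (1,2), each stored once in the list of either endpoint.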
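__global__ void nbody_force_kernel(float *dcell_layer, float *dcell_force, const float *dcell_area, const int *dcell_anneal, const int *dc2c_nbegin, const int *dcell2cell, int cell_size) { // kernel name and parameter list are assumed; the source does not show them
    const int i = blockDim.x * blockIdx.x + threadIdx.x; // global thread id, one thread per cell
    bool active = true;
    float distance = 0;
    if (i >= cell_size) active = false; // this thread has no cell
    else if (dcell_anneal[i] <= 0) active = false; // cell is already fixed in the system
    if (active) {
        float nowlayer = dcell_layer[i]; // the current position of cell i
        for (int j = dc2c_nbegin[i]; j < dc2c_nbegin[i + 1]; j++) // the cells connected to cell i
            distance += dcell_layer[dcell2cell[j]] - nowlayer;
        dcell_force[i] = distance; // the final attractive force on cell i
    }
    __syncthreads(); // barrier across the threads of one block
    if (active) {
        distance = __fdividef(dcell_force[i], dcell_area[i]); // displacement = force / area
        dcell_layer[i] += distance; // update the position of cell i
        if (dcell_layer[i] < 0) dcell_layer[i] = 0; // positions cannot drop below layer 0
    }
}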
Table 3: The code of the calculated forces in the N-body simulation phase
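The total force is the sum of the hold force and the attractive force. The new cell positions are efficiently computed by solving Equation 10 for $z_i'$,
\[ -k_i\,(z_i' - z_i) + \sum_{j \in N(i)} w_{ij}\,(z_j - z_i) = 0 \tag{10} \]
where $z_i$ and $z_i'$ are the current and new positions of cell i, $k_i$ is its spring constant (its area), and the sum runs over the cells j joined to cell i by 2-pin nets of weight $w_{ij}$.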
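Setting the total force to zero gives a closed-form update: the displacement of a cell equals its accumulated attractive force divided by its area, which is also what the kernel in Table 3 computes. The sketch below is a CPU reference for one such step, written under assumptions: the function name `nbody_step` and its parameters are hypothetical, all 2-pin nets have unit weight, and, unlike the kernel, the new positions are written to a separate output vector so that every cell reads only old positions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for one N-body step on the z (layer) axis.
// begin/neighbor: compressed neighbor lists (neighbors of cell i are
// neighbor[begin[i]] .. neighbor[begin[i + 1] - 1]).
// mobile[i] is false for cells already fixed in a layer.
std::vector<float> nbody_step(const std::vector<float>& z,
                              const std::vector<float>& area,
                              const std::vector<bool>& mobile,
                              const std::vector<int>& begin,
                              const std::vector<int>& neighbor) {
    const std::size_t n = z.size();
    assert(area.size() == n && mobile.size() == n && begin.size() == n + 1);
    std::vector<float> z_new(z);  // read old positions, write new ones
    for (std::size_t i = 0; i < n; ++i) {
        if (!mobile[i]) continue;  // fixed cells do not move
        assert(area[i] > 0.0f);
        float attractive = 0.0f;   // sum of unit-weight spring forces
        for (int e = begin[i]; e < begin[i + 1]; ++e)
            attractive += z[neighbor[e]] - z[i];
        // Hold force balances attraction: area * dz = attractive.
        const float dz = attractive / area[i];
        z_new[i] = z[i] + dz;
        if (z_new[i] < 0.0f) z_new[i] = 0.0f;  // the bottom layer is position 0
    }
    return z_new;
}
```

A cell whose neighbors sit symmetrically above and below it receives zero attractive force and stays in place, while a large-area cell moves only a fraction of the distance that a small cell would under the same pull.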
The N-Body Simulation phase terminates when the number of iterations reaches the threshold level or the total number of iterations exceeds a configured limit. For each layer space, the configured limit is 2000 in our experiments. The threshold level is defined as the product of an empirical coefficient and the distribution ratio; the coefficient is set to 0.95 in this algorithm. The distribution ratio is the total number of mobile cells that fall in the constructed layers divided by the number of remaining mobile cells. In other words, reaching the threshold level means that the majority of cells have moved enough and have already been pulled down to the lower layer by their connected nets.
Table 3 shows the calculation of forces in the N-body simulation phase. Line 6 filters out the cells that are already fixed in the system. Lines 7-12 compute the attractive force on cell i, with one thread created per cell index i. Because single-precision floating-point division costs several cycles, we use the __fdividef function to accelerate the division, as shown in line 15. The function __fdividef(x,y) computes x divided by y and is a faster version of the regular division. The regular floating-point division and __fdividef(x,y) have the same accuracy, except that __fdividef(x,y) delivers a result of zero, and therefore an inaccurate result, when 2^126 < y < 2^128. Fortunately, no cell area in the ISPD98 benchmark is larger than 2^126. Lines 14-18 arrange the data and update the positions of the cells. Since the I/O pins are located in the bottom layer, positions cannot be lower than 0. To handle the case where this would occur, we add a refinement in line 17.
5.1.2 Phase 2: Mapping Cells To A Layer
In the Mapping Cells To A Layer phase, the layer space is constructed by gradually stacking up cells from the bottom of the layer. Since this is a bottom-up approach, cells at lower positions have higher priority to be included in the layer.
As more cells are included, the area of the layer increases. FDPrior searches for the best layer boundary (the one with the minimal cutline) between the minimum layer area bound and the maximum layer area bound, both defined in Chapter 3. Cells that are mapped into the layer change to the fixed state, while the others remain mobile.
In other words, FDPrior picks the cells located at the lowest positions one by one and puts them directly into the layer until the area of the layer reaches the minimum layer area bound. Suppose the cells with index 1 … p are mapped into the layer in this way. FDPrior then continues adding cells, estimating the cutline after each added cell and saving it in a buffer, until the area of the layer reaches the maximum layer area bound. Suppose the cells with index p+1 … q are handled in this step, so that each of them has its own cutline. Finally, the program finds the minimum cutline among the cells with index p+1 to q and sets the appropriate cells to the fixed state. If the cell with index g has the minimal cutline, the cells with index g+1 … q are released and remain mobile. In conclusion, only the cells with index 1 … p and p+1 … g are firmly mapped into the layer and changed to the fixed state.
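The selection procedure above can be sketched on the CPU as follows. This is an illustration under assumptions rather than the FDPrior code: the function name `map_cells_to_layer` and its parameters are hypothetical, the cutline is assumed to be the number of 2-pin connections between the chosen cells and the cells left outside, all input cells are treated as mobile, and when no cell fits between the two area bounds the boundary falls back to cell p.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns the indices of the cells chosen for the layer, lowest position first.
// begin/neighbor: compressed neighbor lists of the 2-pin nets.
std::vector<int> map_cells_to_layer(const std::vector<float>& z,
                                    const std::vector<float>& area,
                                    const std::vector<int>& begin,
                                    const std::vector<int>& neighbor,
                                    float min_area, float max_area) {
    const std::size_t n = z.size();
    assert(area.size() == n && begin.size() == n + 1 && min_area <= max_area);

    // Bottom-up order: cells at lower positions come first.
    std::vector<int> order(n);
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(),
                     [&z](int a, int b) { return z[a] < z[b]; });

    std::vector<bool> in_layer(n, false);
    float layer_area = 0.0f;
    int cut = 0;                 // current cutline of the stacked cells
    std::size_t taken = 0;       // number of cells stacked so far
    std::size_t best_count = 0;  // prefix length with the smallest cutline
    int best_cut = -1;           // -1: no candidate boundary recorded yet

    while (taken < n) {
        const int c = order[taken];
        const bool past_min = layer_area >= min_area;  // c is one of p+1 .. q
        if (past_min && layer_area + area[c] > max_area) break;
        in_layer[c] = true;
        layer_area += area[c];
        ++taken;
        // Links from c into the layer leave the cut; links to outside join it.
        for (int e = begin[c]; e < begin[c + 1]; ++e)
            cut += in_layer[neighbor[e]] ? -1 : 1;
        if (!past_min) {
            best_count = taken;  // fallback boundary: after cell p
        } else if (best_cut < 0 || cut < best_cut) {
            best_cut = cut;
            best_count = taken;  // new best boundary: after cell g
        }
    }
    order.resize(best_count);  // cells after the best boundary stay mobile
    return order;
}
```

Updating the cutline incrementally as each cell is stacked avoids recounting all connections for every candidate boundary, so the scan costs time proportional to the number of 2-pin connections plus the initial sort.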
At the beginning of the program, no layer space is defined, and all cells of the given hypergraph are in the mobile state. One layer space is stacked up each time the Mapping Cells To A Layer phase is executed. Once the lower K-1 layers have all been constructed, the remaining mobile cells in the system are placed directly into layer_K.
5.1.3 Phase 3: Escape From Local Optimum
The previous two phases limit the optimization search to a regional area, so the solution may fall into a local optimum. FDPrior adds a mechanism to escape from a possible local optimum by disturbing certain mobile cells based on Equation 10.
/ / ) (11)
The parameter is the average of the summation of .