Problem Formulation - 應用於通用圖形處理器上具熱感知及位置相關之三維佈局規劃演算法

Chapter 2 Preliminaries

2.3 Problem Formulation

Given a set of module information, including height, width, and power density, a netlist, and the number of layers. Out method is to find the location and resided layer of each module to minimize the footprint area, total wirelength, number of TSVs, and peak temperature in floorplan with fix-outline constraint.

Chapter 3 Thermal-aware Floorplan Algorithm

3.1 Proposed Thermal Model

We want a location-dependent thermal model without runtime increase. We will introduce how we build this thermal model in this section.

No matter how accurate the model is, if runtime is too long, then it is not a good model for floorplan. Hence, runtime is first issue needed to be concerned. Thus, we prefer the simple and fast model, so we construct our thermal model based on the Z-tile-based model. As we know, Z-tile cannot show the location-dependent property.

It is a location-independent thermal model.

Thermal resistances of each Z-tile are the same, so the temperature module produced are no deference while module is placed everywhere at same layer. If power density doesn’t change, alter thermal resistance such that temperature will change. So we let the central thermal resistance be smaller than peripheral thermal resistance. In this way, modules placed at corner will produce higher temperature than it placed at center.

We want our thermal model can reflect location-dependent property on compact resistive thermal model. Therefore, we let the distribution of thermal resistance on our thermal model display distribution of temperature on compact resistive thermal model.

For example, as shown in Figure 13, we want to construct the single Z-tile located at corner. First, we put unit power density as current source at corner with top layer, point a, and after circuit arrives at steady-state condition, temperature of this point is Va. Then the top thermal resistance of Z-tile-based thermal model is Va, because if we

put same unit power density as current source at point a, the temperature point a produce is as same as the temperature compact resistive thermal model produce. Then we do same thing at the point located at corner with the second layer from top, point b.

The temperature of point b is Vb, and then the point b in Z-tile-based thermal model must to produce the same temperature with unit power density. Thus, the second thermal resistance from top is Vb – Va. If we put unit power density at point b in Z-tile-based thermal model, the temperature will be Vb, [(Vb – Va)+Va] ×1=Vb. And do the same thing until the bottom thermal resistance is determined. Remainder thermal resistances are determined as same way.

Compare to Z-tile, all thermal resistances of Z-tile are the same. In our model, thermal resistances have different values at single Z-tile with different layers, and they will be different as placed at different location with same layer. By altering the thermal resistance, Z-tile-based thermal model can show the location-dependent property and the thermal resistances reflect the temperature distribution of compact resistive thermal model. This new location-dependent thermal model is named location-dependent Z-tile (LDZT).

LDZT has location-dependent property by altering thermal resistances, and it preserves the speed of Z-tile. Because the formulation for temperature evaluation is the same, LDZT has the same runtime as Z-tile. We realize the location-dependent property on Z-tile without runtime increase.

a b c

a

b

c

Va

Vb – Va

Vc – Vb

Figure 13. Thermal resistance evaluation.

3.2 Proposed Thermal-aware Floorplan Algorithm

The flow chart of our algorithm is shown in Figure 14 and the cost function is shown in Equation (6). The items in Equation (6) are shown as Equation (7) – (11).

We use SP representation to represent the floorplan and HPWL to evaluate the wirelength. TSV cost is TSV count and temperature cost is maximum temperature in LDZT. The last cost, RF, will be introduced in section 3.2.1. After floorplan finish, we use compact resistive thermal model to evaluate the temperature accurately.

1 Cost Cost Cost Cost Cost

Cost    ₃ ₄ ₅

3.2.1 Repulsion Force

LDZT is applied to our proposed floorplan algorithm. Before LDZT is used in floorplan, we analyze its characteristic. LDZT is a Z-tile-based model, so it has the property of Z-tile, such as let module with high power density be put at upper layer, and its runtime is as fast as Z-tile. Besides, LDZT can show the location-dependent property, this is the most important improvement in Z-tile. As we discussed before, we alter thermal resistances to realize location-dependent property. In LDZT, the module with high power density will be placed near the center, and this tendency will cause potential hotspot. Because LDZT is Z-tile-base model, it also ignores horizontal thermal resistance. Hence, LDZT can’t consider the horizontal heat dissipation, even though it has location-dependent property. The potential hotspot will occur when modules with high power density are all placed near the center. This condition will happen due to the placed tendency of LDZT.

Although the hotspot occurred at center is better than it occurred at corner, we want the temperature distribution is as uniform as possible. Before LDZT is applied to floorplan, we want to solve this problem. We want the modules with high power density are not close to each other, so we add the force to exclude them. Logan et al.

[36] proposed a repulsion force (RF) cost in floorplan to exclude the modules with high power density. They proposed a concept of thermal coupling, which means it has hotter temperature when modules with high power density closer to each other. Thus, they define the hot modules whose power density are higher than average power density and then define the repulsion force, as expressed in Equation (12), to let hot modules exclude each other.

In Equation (12), Pi and Pj represent the power density of hot module i and j respectively, and d_ij² is the distance between these two hot modules. If two hot modules with very high power density, the distance between two must be large enough to minimize the total repulsion force cost. We refine the repulsion force cost to suit our floorplan algorithm, as expressed in Equation (13).

(13)

In Equation (13), we refine four points from [36]. First, we consider all modules, because the definition of hot module has drawback. For instance, if average power density is 50, shouldn’t the module with 49 of power density be considered?

Therefore, we consider all modules, and evaluating repulsion force cost is not time-consuming, anyway.

Second, we change the addition to multiplication. Because addition cannot precisely determine whether the reciprocal effect of two modules is good or not, we change it. For example, doing addition of two modules with 1 and 99 of power density respectively is 100. And then do it of two modules with 50 and 50 of power density. The answers are the same. So these two sets of modules are the same for repulsion force cost. But, the set of modules with 1 and 99 of power density is much better than the other set, because the thing that one module is hot and the other is cool is good for heat dissipation. In multiplication, the answer is large if and only if two modules both have high power density. This is why we adopt the multiplication.

The next, [36] consider the hot module with all layers to evaluate repulsion force cost. Because we use LDZT to consider vertical heat dissipation already, we don’t have to consider vertical repulsion force. Thus, we calculate repulsion force layer by layer, and then we sum up the repulsion force on each layer.

The last point we refine is adding the different weights to different layers. High

 

repulsion force of one layer means there are many modules with high power density or some modules with high power density are closer to each other. No matter what condition is, high repulsion force of one layer represents there is high probability the layer has hotspot. Thus, we want high repulsion force occur at upper layer, because upper layer is close to heat sink. So, we increase the tolerability of repulsion force at upper layer by the additional weight.

3.2.2 Over-heat Prevention Zone

We use the algorithm introduced previously for 100 random seeds. The worst case of 100 floorpalns is shown in Figure 15. The hotspot occurs at original point, the corner of floorplan. This is not condition we except, because we will place modules with high power density at center by LDZT. So we analyze this condition. There are two reasons causing the hotspot occurs at original point.

Figure 15. Temperature distribution

The first reason is we use SP representation to represent floorplan. We will place modules from bottom-left to top-right by SP representation. Hence, the density of module of original point (bottom-let corner) will be higher than the other corners, as shown in Figure 16.

Figure 16. Modules are placed by using SP representation.

The other reason is degree of perturbation. Because there are other items in cost function, all items of cost function determine where the module is placed. Although the thermal cost want to place module away from corner, other items may not want. In brief, other items may restrict the degree of perturbation.

In order to let the hotspot be away from original point, we want the modules with high power density be placed away from original point. We observe the SP representation. In SP representation, if notation A is always left to notation B in two sequences (Γ⁺, Γ^-), the location of module A will be left to the location of module B.

Otherwise, if notation A is left to notation B in sequenceΓ⁺, but it is right to notation B in sequenceΓ^-. In this condition, the module A will be placed above the module B.

Based on this property of SP representation, the notation get closer to left part of sequences means that the module of this notation get closer to original point (bottom-left).

After observation of SP representation, we consider the N leftmost rooms in two sequences as over-heat prevention zone (OHPZ), and N is the zone size. The purpose of OHPZ is that the modules with high power density cannot be located in OHPZ.

Then, we define over-heat module whose power density is higher than average power density plus standard deviation of power density. We constrain that the over-heat module cannot be placed in OHPZ.

Chapter 4 Parallel Floorplan Algorithm on GPGPU

In this chapter, we introduce the way we parallelize the floorplan and how the parallel algorithm can obtain maximum speedup. As listed in section 1.3, using global memory will cause large latency, so we have to use it as less as possible. There are two conditions where we use global memory. One is communication between blocks, and the other is communication between CPU and GPU. For first condition, the best way to prevent is that let blocks not to communicate each other. Thus, we let blocks deal with independent issues. In this way blocks don’t need to communicate each other. The latter condition can’t be avoided. If CPU doesn’t transmit data to GPU, there are no data to use for CUDA kernel function. On the other hand, if GPU doesn’t transmit data to CPU, CPU can’t receive the data which are treated already. Therefore, we can’t avoid this condition. All we can do is reduce the frequency and quantity of data of communication.

There are two types of data sent to GPU from CPU. One is the data which do not change during SA procedure, such as netlist, power density of module, and thermal resistance. The other one will change during SA procedure, such as location of modules, and height/width of modules. In order to reduce the frequency of communication, we send the type one data to GPU once before SA procedure. And during SA procedure, we only send type two data to GPU to reduce the quantity of data of communication.

The items of cost function are area, wirelength, TSV count, temperature, and repulsion force. We consider each item as respective kernel function, because each item evaluation is suit for different number of blocks and threads.

We will introduce the method we parallelize these items in turn.

4.1 Parallel Floorplan – Area

SP representation places modules from bottom-left to top-right. When module is placed, its location is relative to the modules which are placed already. Thus, the module is dependent while calculating area.

Because the data are dependent, we can’t divide the algorithm into independent parts completely. We divide the algorithm by breaking the for-loop as shown in Algorithm 1, pseudo-code of area evaluation for Parquet 4.5 [38]. After breaking the for-loop, the pseudo-code is performed be each thread. In this way, the time complexity of area evaluation becomes O(#modules) from O(#modules²).

Algorithm 1. The pseudo-code for area evaluation [38].

Area evaluation

for( i = 1 to #modules ) match[X[i]].x = i;

match[Y[i]].y= i;

Length[i] = 0;

for( i = 1 to #modules ) b = X[i];

p = match[b].y;

Position[b] = Length[p];

t = Position[b] + weights(b);

for( j = p to #modules)

if( t > Length[j] ) Length[j] = t;

else break;

return Length[#modules];

In this algorithm, the independent elements are x coordinate evaluation, y coordinate evaluation, and different layers. We can evaluate x coordinate and y coordinate of each module and module coordinate at different layers simultaneously.

So the number of blocks is 2 × #layers. After the evaluation, we only return the footprint area to CPU to reduce the communication time (quantity of data of communication). The module data, like height/width and coordinate, are stored at global memory, because the shared memory will be clear while the kernel function is end. The module date we store are used for other kernel function. Like wirelength evaluation, we need module coordinate to calculate it. Other kernel function all need module coordinate, so we evaluation area first.

BREAK

4.2 Parallel Floorplan – Wirelength and TSV Count

We deal with these two items by using the same data, netlist, and module coordinate. In order to reduce the times of communications between shared memory and global memory, we evaluate these two items in one kernel function together.

In this algorithm, the independent element is net. Because the wirelength of net and number of layers the net cross are independent, we can deal with one net by one thread, the time complexity becomes O(maximum #degrees of a net) from O(total

#degrees). And we sum up the values of each thread to get the total wirelength/TSV

count by tree reduction technique, as shown in Figure 17. By this technique, the time complexity for addition is becomes O(log^#nets) from O(#nets).Finally, we only send total wirelength and TSV count to CPU.

10 15 7 8 10

18 25

Figure 17. Tree reduction technique.

4.3 Parallel Floorplan – Temperature

4.3.1 Power Density Evaluation

The way to evaluate temperature is expressed in Equation (3). Before we evaluate temperature, we calculate power density first. The power density of the grid

is calculated by summation of power density of each module multiplied by its area ratio, the overlapping area between module and grid divide by grid area, as shown in Equation (14) different grids simultaneously. One thread deals with one grid, and each grid scans all modules at layer where grid resides to calculate the power density. Hence, the time complexity becomes O(#modules) from O(#grids×#modules). In this algorithm, number of blocks is determined by number of layers, because different layers are independent element. Besides, we only need the module data with one layer on one block, so we can let module data with same layer be located at shared memory for each block.

4.3.2 Temperature Evaluation

After we calculate power density, we calculate the temperature immediately.

Each grid evaluates temperature respectively. Then accumulate the temperature layer by layer, so the maximum temperature will occur at bottom layer. We use the tree reduction technique to find maximum temperature. Different from it used by wirelenght/TSV count, the operation tree reduction technique perform is not addition, but comparison. It finds maximum temperature by comparing two temperatures iteratively. So the time complexity becomes O(log^#grids) from O(#grids). And we return the maximum temperature only.





4.4 Parallel Floorplan – Repulsion Force

Repulsion force evaluations of each module are independent. We use one thread to calculate the repulsion force of one module. We evaluate repulsion force layer by layer. The modules with different layers are not used. This is similar to power density evaluation, so we deal with them in the same way. The number of blocks is determined by number of layers, due to the usage of shared memory has been discussed previously.

The thing each thread operates is scan all modules at one layer to calculate the repulsion force, so the time complexity is O(#modules) from O(#modules²). Then we sum up the repulsion force by tree reduction technique, so the time complexity is O(log^#modules) from O(#modules). Finally, we return the total repulsion force only.

Chapter 5 Experiments

5.1 Environmental Setup

Partition

Floorplan

Temperature evaluation Module

information netlist

Construct LDZT

Floorplan result

Thermal profile

Figure 18. Experiments flow

The largest cases form benchmark GSRC are shown in Table 3. The total flow is shown in Figure 18 and it is introduced as follow. Before floorplan, we construct thermal model, LDZT, first. Then, the initial layers where modules reside is determined by iLap [37]. The flooplanner we use is Parquet 4.5 [38], a 2D floorplanner, and [39] modify it for 3D floorplan. Next, the experimental settings of floorplan are shown as follow. The fix-outline constraints we set are 20% whitespace for 4-layer design and the area ratio is 1. The floorplan algorithm we proposed has been implemented in C++/Linux environment. Last, we use compact resistive thermal model for thermal simulation after floorplan, and the details of it are shown in Table 4.

Thickness Substrate 30

Interconnect sublayer 150

5.2 Experimental Results

In this section, we first show the results of floorplan quality, including area, wirelength, TSV count, and temperature. Next we present the runtime analysis in the CUDA platform.

5.2.1 Quality

In this part, we compare our work to related work [29], as shown in following table. Table 5, Table 6, and Table 7 show the results of circuit of n100, n200, n300 respectively. The first row shows the zone size, and the number of bracket means zone size / (#modules/#layers). ZT represents Cong’s work [29] without applying OHPZ technique, which implies the zone size of it is 0. Other columns show the result of varies zone size in LDZT. In the second row, Max_T means the maximum temperature of a single floorplan, and the following columns show the maximum/minimum/average/standard deviation of max_T from floorplans generated by 100 different random seeds. The last, the bottom two rows show the average wirelength and TSV count of 100 floorplans.

The Figure 19, Figure 20, and Figure 21 show the distribution of thermal data of Table 5, Table 6, and Table 7. The top endpoint of line means the maximum of Max_T, and the bottom endpoint of line means the minimum of Max_T. The label on the line means the average of Max_T.

Table 5. Experimental results – n100.

Zone size ZT 0(0%) 3(10%) 5(20%) 8(30%) 10(40%)

Max_T Std 5.6 5.1 2.8 2.7 2.5 3.0

Max 163.2 160.4 154.6 150.0 151.5 155.3 Min 130.1 133.6 137.2 139.3 135.4 135.1 Avg 148.0 147.0 145.7 144.7 145.0 146.6 WL 131554 131486 130930 131207 131412 131772

TSV 703.2 702.7 699.3 693.4 704.7 701.3

Table 6. Experimental results – n200.

Zone size ZT 0(0%) 5(10%) 10(20%) 15(30%) 20(40%)

Max_T Std 2.7 2.2 1.9 1.7 1.4 1.6

Max 193.8 191.4 191.7 192.1 189.4 189.5 Min 181.3 180.7 178.6 179.8 181.2 179.8 Avg 186.3 185.3 184.6 184.4 184.1 184.5 WL 241258 239184 240007 240083 240619 242574 TSV 1540.4 1527.6 1522.7 1520.0 1516.6 1516.8

Table 7. Experimental results – n300.

Zone size ZT 0(0%) 8(10%) 15(20%) 23(30%) 30(40%)

Max_T Std 2.5 1.6 1.8 1.3 1.2 1.3

Max 203.7 199.1 197.5 196.2 196.7 197.8 Min 189.3 189.1 187.5 189.2 188.9 189.3 Avg 193.7 193.2 192.8 192.7 193.3 193.5 WL 343863 340112 341874 343885 345545 347358 TSV 1592.5 1570.4 1565.6 1566.8 1564.2 1557.2

Figure 19. Thermal data – n100.

Figure 20. Thermal data – n200.

Figure 21. Thermal data – n300.

Observing above data, TSV count and wirelength of our work are similar to Cong’s work. Next, observing of the thermal issue. In Figure 19 – Figure 21, the range of the maximum temperatures becomes smaller in our methods. It means our

在文檔中應用於通用圖形處理器上具熱感知及位置相關之三維佈局規劃演算法 (頁 23-0)