
Based on simulated annealing (SA) [Kirkpatrick et al. 1983], we implemented the temporal floorplanning algorithm in the C++ programming language on a 433 MHz SUN Ultra-60 workstation with 1 GB memory. We compared 3D-subTCG with Sequence Triplet (ST) [Yamazaki et al. 2000] and T-tree, using the same SA engine and the same SA parameters (cooling schedule, initial temperature, weights of the cost function, etc.). ST is extended from the well-known Sequence Pair (SP) [Murata et al. 1995] representation, which is widely used for floorplanning and placement in both industry and academia.

In this section, we first report the outline-free floorplanning results. Then, we report results for boundary constraint and fixed-outline constraint.

Table III. Results for Volume and Overhead Optimization

| Circuit | # of tasks | Sum of volume | ST: Volume (mm² × clks) | ST: Dead space (%) | ST: Time (sec.) | 3D-subTCG: Volume (mm² × clks) | 3D-subTCG: Dead space (%) | 3D-subTCG: Time (sec.) |
|---|---|---|---|---|---|---|---|---|
| beasley1 | 10 | 6218 | 8710 | 28.6 | 7.7 | 7504 | 17.1 | 8.5 |
| beasley2 | 17 | 11497 | 14664 | 21.5 | 45.2 | 12402 | 7.2 | 28.5 |
| beasley3 | 21 | 10362 | 16016 | 35.3 | 44.1 | 12640 | 18.0 | 22.4 |
| beasley4 | 7 | 10205 | 13800 | 26.0 | 3.0 | 13064 | 21.8 | 2.0 |
| beasley5 | 14 | 16734 | 22750 | 26.4 | 18.2 | 18912 | 11.5 | 16.0 |
| beasley6 | 15 | 11040 | 14994 | 26.3 | 27.9 | 13200 | 16.3 | 24.8 |
| beasley7 | 8 | 17168 | 24570 | 30.1 | 3.8 | 20574 | 16.5 | 2.3 |
| beasley8 | 13 | 83044 | 132275 | 37.2 | 15.4 | 98280 | 15.5 | 19.4 |
| beasley9 | 18 | 133204 | 174496 | 23.6 | 30.6 | 167751 | 20.5 | 17.2 |
| beasley10 | 13 | 493746 | 660480 | 25.2 | 13.0 | 575685 | 14.2 | 10.8 |
| beasley11 | 15 | 383391 | 486381 | 24.8 | 17.5 | 438702 | 12.6 | 9.8 |
| beasley12 | 22 | 646158 | 922080 | 29.9 | 100.0 | 823816 | 21.5 | 58.5 |
| okp1 | 50 | 1.24327e+08 | 216950048 | 42.6 | 1607.2 | 173829024 | 28.4 | 387.3 |
| okp2 | 30 | 8.54452e+07 | 128093128 | 33.2 | 285.3 | 110095000 | 22.3 | 73.8 |
| okp3 | 30 | 1.23808e+08 | 185146208 | 33.1 | 280.7 | 160854400 | 23.0 | 70.6 |
| okp4 | 61 | 2.38861e+08 | 417942304 | 42.8 | 791.3 | 328835424 | 27.3 | 501.9 |
| okp5 | 97 | 1.89875e+08 | 448984000 | 57.7 | 607.8 | 295849984 | 35.8 | 565.9 |
| Average | | | | 32.01 | | | 19.38 | |

6.1 Results for Outline-Free Floorplanning

In this subsection, we report the results for the outline-free floorplanning problem.

We conducted four sets of experiments: (1) volume optimization, (2) volume and overhead optimization, (3) simultaneous volume, wirelength, and overhead optimization, and (4) volume and overhead optimization for five real circuits.

To verify our algorithm, we first tested 3D-subTCG on five synthetic circuits that can be packed without deadspace. Table II shows the results. Note that the volume of a placement is the volume of the minimum bounding box enclosing the placement.

We can see that 3D-subTCG obtains the optimal placements for the first three test cases and near-optimal solutions for the last two larger circuits, all in reasonable time. The results show that our approach is very effective for cost optimization.
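The relation between the bounding volume and the deadspace percentages reported in this section can be illustrated with a minimal sketch. It assumes that deadspace is the fraction of the bounding volume not occupied by any task, an assumption that reproduces the beasley1 row of Table III (sum of volume 6218). The function names and the `(x, y, t, w, h, d)` box layout are hypothetical and are not taken from the authors' C++ implementation.

```python
def bounding_volume(boxes):
    """Volume of the minimum bounding box enclosing all placed tasks.

    Each box is (x, y, t, w, h, d): lower corner, then width, height and
    duration. This tuple layout is an illustrative assumption.
    """
    volume = 1
    for axis in range(3):
        low = min(b[axis] for b in boxes)
        high = max(b[axis] + b[axis + 3] for b in boxes)
        volume *= high - low
    return volume


def deadspace_percent(sum_of_task_volumes, volume):
    """Unused share of the bounding volume, in percent (assumed definition)."""
    return 100.0 * (volume - sum_of_task_volumes) / volume


# Two tasks: a 2 x 3 x 4 box at the origin and a unit box next to it.
boxes = [(0, 0, 0, 2, 3, 4), (2, 0, 0, 1, 1, 1)]
volume = bounding_volume(boxes)                       # 3 * 3 * 4 = 36
used = sum(w * h * d for _, _, _, w, h, d in boxes)   # 24 + 1 = 25
print(volume, round(deadspace_percent(used, volume), 1))

# Cross-check against the beasley1 row of Table III.
print(round(deadspace_percent(6218, 8710), 1))   # ST: 28.6
print(round(deadspace_percent(6218, 7504), 1))   # 3D-subTCG: 17.1
```

Under this definition, a placement with zero deadspace has a bounding volume equal to the sum of the task volumes, which is the optimality criterion used for the synthetic circuits.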

For the second experiment, we optimized volume together with the reconfiguration and communication overheads. We adopted the benchmark circuits used in Fekete and Schepers [1997] and added the reconfiguration and communication overheads. We compared 3D-subTCG with ST. As shown in Table III, the 3D-subTCG-based method outperforms the ST-based one by a large margin: on average, 3D-subTCG leaves less deadspace than ST (19.38% vs. 32.01%).

Table IV. The Five 3D-MCNC Benchmark Circuits

| Circuit | # of modules | # of pads | # of nets | # of pins | Total volume | # of precedence constraints |
|---|---|---|---|---|---|---|
| 3D-apte | 9 | 73 | 97 | 214 | 9.8 × 10^7 | 3 |
| 3D-xerox | 10 | 107 | 203 | 696 | 4.0 × 10^7 | 3 |
| 3D-hp | 11 | 43 | 83 | 264 | 1.2 × 10^7 | 3 |
| 3D-ami33 | 33 | 42 | 123 | 480 | 2.3 × 10^6 | 7 |
| 3D-ami49 | 49 | 24 | 408 | 931 | 1.3 × 10^8 | 11 |

For the third experiment, we performed 3D placement considering precedence constraints, wirelength, and reconfiguration/communication overheads. We used the MCNC benchmarks. Since the MCNC benchmarks have neither execution times nor precedence constraints, we assigned these ourselves. The new benchmark suite is called the 3D-MCNC benchmark. Table IV lists the statistics of the five 3D-MCNC benchmarks. In the following, we describe how to construct the control data flow graph (CDFG) from a traditional floorplanning benchmark. The basic idea is to construct the edges among tasks based on the interconnections among them.

Let F denote the given floorplanning benchmark. First, we define c_i for each task v_i as the number of interconnections between v_i and all other tasks plus the number of tasks that are connected to v_i. Let C be the sequence of the c_i values sorted in descending order. The initial CDFG DG = (S, E) contains the k tasks that come first in C, where k is a user-defined constant. Then we delete the interconnections among these k tasks in F. We iteratively add tasks and edges into DG. In each iteration, we add at most r directed edges into E from H, where H = {(v_i, v_j) | v_i ∈ S, v_j ∉ S, and there exist interconnections between v_i and v_j in F}. Here, r is a user-specified constant. Then we add the task v_j into DG and delete the interconnections associated with the task v_j and the interconnections between the tasks v_i and v_j. When all tasks are in DG, the algorithm terminates. Finally, we randomly select l tasks from DG, together with the edges in E that connect these tasks, to form the final CDFG. Here, l is a user-specified constant.
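The construction above can be sketched as follows. This is one possible reading, not the authors' implementation: the text does not say which task v_j is picked in each iteration or how ties are broken, so the sketch takes the first candidate in H, directs every edge from a task already in S to the newly added task (which keeps the graph acyclic), and deletes only the interconnections between v_j and tasks already in S so that v_j can still act as a source later. The function name `build_cdfg` and the representation of F as a list of two-task interconnections are hypothetical.

```python
import random


def build_cdfg(num_tasks, nets, k, r, l, seed=0):
    """Sketch of the CDFG construction; tasks are integers 0..num_tasks-1.

    nets: list of (a, b) pairs, one per interconnection between two tasks.
    Returns (tasks, edges) of the final CDFG.
    """
    rng = random.Random(seed)
    adj = {v: set() for v in range(num_tasks)}  # remaining interconnections
    c = {v: 0 for v in range(num_tasks)}
    for a, b in nets:
        if a != b:
            c[a] += 1                 # count every interconnection
            c[b] += 1
            adj[a].add(b)
            adj[b].add(a)
    for v in adj:
        c[v] += len(adj[v])           # plus the number of connected tasks

    # Descending c_i; ties broken by task index (an assumption).
    order = sorted(adj, key=lambda v: (-c[v], v))
    S = order[:k]
    in_s = set(S)
    for v in S:                       # delete interconnections among the seeds
        adj[v] -= in_s
    E = []

    while len(S) < num_tasks:
        # H: remaining interconnections from a task in S to a task outside S.
        H = [(vi, vj) for vi in S for vj in sorted(adj[vi]) if vj not in in_s]
        if H:
            vj = H[0][1]              # assumption: take the first candidate
            E.extend([e for e in H if e[1] == vj][:r])   # at most r edges
        else:
            # No link from S to the rest: add the next task without an edge.
            vj = next(v for v in order if v not in in_s)
        for vi in S:                  # assumption: only links back into S go
            adj[vi].discard(vj)
        adj[vj] -= in_s
        S.append(vj)
        in_s.add(vj)

    picked = set(rng.sample(S, min(l, len(S))))
    edges = [(a, b) for a, b in E if a in picked and b in picked]
    return sorted(picked), edges


# Four tasks; tasks 0 and 1 share two interconnections.
nets = [(0, 1), (0, 1), (0, 2), (1, 2), (2, 3)]
print(build_cdfg(4, nets, k=1, r=1, l=4))
# ([0, 1, 2, 3], [(2, 0), (2, 1), (2, 3)])
```

In the example, task 2 has the largest c_i (three interconnections plus three neighbours) and seeds the graph; raising r to 2 also keeps the edge from task 0 to task 1.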

In this experiment, we simultaneously optimized volume, wirelength, and reconfiguration/communication overheads under precedence constraints. We compared 3D-subTCG with ST and T-tree. Table V shows the results. 3D-subTCG achieves better volume utilization (15% deadspace vs. 35% deadspace) and shorter wirelength than ST, and it also needs less CPU time. Compared with T-tree, 3D-subTCG obtains comparable deadspace (14.86% vs. 14.22%) with shorter wirelength (396.08 vs. 402.06). Figure 19 shows the resulting placement of 3D-xerox.

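Although it is hard to quantify, a key reason for the performance difference between 3D-subTCG and ST lies in the effects of their perturbations: swapping two modules in an ST may change the placement dramatically, whereas a 3D-subTCG perturbation changes it less, which makes it easier for simulated annealing to converge to an optimal solution. (Here is an analogy: as in gradient search for nonlinear programming, the step size plays an important role in determining whether a search scheme can converge to the globally optimal solution; a huge step size may fail to converge to an optimal solution.)

Table V (partial). ST versus 3D-subTCG

| Circuit | ST: Wirelength (mm) | ST: Dead space (%) | ST: Time (sec.) | 3D-subTCG: Wirelength (mm) | 3D-subTCG: Dead space (%) | 3D-subTCG: Time (sec.) |
|---|---|---|---|---|---|---|
| Average | 524.2 | 52.9 | 312.18 | 396.08 | 14.86 | 166.46 |

Table V (partial). T-tree versus 3D-subTCG

| Circuit | T-tree: Volume | T-tree: Wirelength (mm) | T-tree: Dead space (%) | T-tree: Time (sec.) | 3D-subTCG: Volume | 3D-subTCG: Wirelength (mm) | 3D-subTCG: Dead space (%) | 3D-subTCG: Time (sec.) |
|---|---|---|---|---|---|---|---|---|
| 3D-apte | 1.0 × 10^8 | 380.0 | 5.9 | 0.58 | 1.0 × 10^8 | 335.3 | 5.9 | 3.9 |
| 3D-xerox | 4.7 × 10^7 | 595.7 | 13.8 | 1.78 | 4.4 × 10^7 | 602.0 | 8.4 | 8.9 |
| 3D-hp | 1.41 × 10^7 | 165.1 | 8.6 | 1.65 | 1.5 × 10^7 | 158.3 | 13.7 | 11.2 |
| 3D-ami33 | 3.0 × 10^6 | 78.1 | 24.5 | 34.33 | 3.0 × 10^6 | 77.7 | 24.7 | 128.1 |
| 3D-ami49 | 1.61 × 10^8 | 791.4 | 18.3 | 72.46 | 1.6 × 10^8 | 807.1 | 21.6 | 680.2 |
| Average | | 402.06 | 14.22 | 22.16 | | 396.08 | 14.86 | 166.46 |

Fig. 19. The resulting placement of 3D-xerox when volume and wirelength are optimized simultaneously.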
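The step-size analogy can be made concrete with a toy example. The sketch below runs plain gradient descent on f(x) = x^2, whose gradient is 2x; it is only an illustration of the analogy and has nothing to do with the floorplanning code itself. The function name and the step values are chosen for illustration.

```python
def gradient_descent(step, x0=1.0, iters=50):
    """Minimize f(x) = x**2 with a fixed step size; returns the final x."""
    x = x0
    for _ in range(iters):
        x -= step * 2.0 * x   # gradient of x**2 is 2x
    return x


small = gradient_descent(step=0.1)   # each update multiplies x by 0.8
large = gradient_descent(step=1.1)   # each update multiplies x by -1.2
print(abs(small) < 1e-3, abs(large) > 1e3)   # True True
```

With the small step the iterate shrinks toward the minimum at 0, while the large step overshoots on every update and the iterate grows without bound. A perturbation that changes the placement dramatically plays the role of the large step.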

For the final experiment, we used five real circuits: JPEG encoder [Banerjee et al. 2005], Recursive Least Squares filter (RLS) [TORSCHE], Finite Impulse Response filter (FIR), Bandpass Filter (BF) [Papachristou and Konuk 1990], and Fast Fourier Transform (FFT) [Cooley and Tukey 1965]. We considered volume and overhead optimization in this experiment. The width and height of each type of task (addition, multiplication, etc.) range from 5 to 15, and the duration ranges from 15 to 25. Table VI shows the results for the five real circuits. Columns 2 and 3 list the number of tasks and the number of precedence constraints for each circuit, respectively. Column 4 gives the total volume of each circuit.

Table VI. Results of Volume and Overhead Optimization for Five Real Circuits

| Circuit | # of tasks | # of precedence constraints | Sum of volume | T-tree: Volume | T-tree: Dead space (%) | T-tree: Time (sec.) | 3D-subTCG: Volume | 3D-subTCG: Dead space (%) | 3D-subTCG: Time (sec.) |
|---|---|---|---|---|---|---|---|---|---|
| JPEG | 8 | 9 | 17781 | 25785 | 31 | 0.47 | 25785 | 31 | 1.09 |
| RLS | 11 | 12 | 18448 | 24990 | 26.2 | 1.16 | 24150 | 23.6 | 11.48 |
| FIR | 21 | 12 | 42672 | 46440 | 8.1 | 11.58 | 45824 | 6.8 | 3.54 |
| BF | 29 | 26 | 34643 | 46880 | 26.1 | 7.46 | 47320 | 26.7 | 15.43 |
| FFT | 64 | 96 | 95868 | 142500 | 32.7 | 57.03 | 148580 | 35.4 | 553.0 |
| Average | | | | | 24.64 | 15.54 | | 24.52 | 116.9 |

Columns 5 to 10 list the resulting volumes, deadspaces, and CPU times of T-tree and 3D-subTCG. From this table, we can see that 3D-subTCG obtains comparable average volumes (24.52% deadspace vs. 24.64% deadspace) but needs longer average CPU time (116.9 sec. vs. 15.54 sec.) than T-tree. This experiment demonstrates the ability of 3D-subTCG to handle real circuits. It also confirms our observation in Section 3.3 that T-tree has advantages in packing efficiency and volume optimization, especially for large-scale circuits such as the FFT circuit.

6.2 Results for Boundary Constraints and Fixed-Outline Constraints

In this subsection, we first report the results for the boundary constraint and then the results for the fixed-outline constraint.

For floorplanning with boundary modules, we compared 3D-subTCG with T-tree. The goal of this experiment is to verify the ability of 3D-subTCG to handle various floorplanning constraints. For T-tree, we discarded the infeasible solutions (those in which boundary modules are not on the boundary) during simulated annealing. We used the 3D-MCNC benchmarks and the five real circuits for this experiment. Tables VII and VIII show the respective results. For the 3D-MCNC benchmarks, we considered volume, wirelength, and overhead, while for the five real circuits we considered volume and overhead. In both tables, column 2 shows the numbers of top, bottom, left, and right tasks, denoted by #|T|, #|B|, #|L|, and #|R|, respectively. As shown in Table VII, 3D-subTCG achieves shorter average wirelength (358.94 mm vs. 388.88 mm) and smaller average deadspace (17.84% vs. 18.3%) than T-tree for the 3D-MCNC benchmarks. However, 3D-subTCG needs longer CPU time (99.18 sec vs. 27.81 sec) than T-tree. Similar results are obtained for the five real circuits: 3D-subTCG obtains smaller average volume (24.09% deadspace vs. 26.00% deadspace) with longer CPU time (113.28 sec vs. 47.15 sec) than T-tree. From Tables VI and VIII, we observe that 3D-subTCG obtains volumes similar to those of T-tree if no boundary constraint is considered, and smaller volumes than T-tree if boundary constraints need to be addressed. The experimental results confirm our observation in Section 3.3 that 3D-subTCG may be more suitable for handling various floorplanning constraints, because 3D-subTCG keeps more geometric information in the representation and has a larger solution space than T-tree. We can easily determine whether a task is on the boundary of a device by checking the indegree and outdegree of its corresponding node in C_h or C_v, and thus the SA engine can search for feasible solutions more effectively. Figure 20 shows the resulting 3D floorplan of 3D-ami49. White modules represent boundary modules.

Table VII (partial).

| Method | Circuit | # of top, bottom, left, right tasks | Volume | Wirelength (mm) | Dead space (%) | Time (sec.) |
|---|---|---|---|---|---|---|
| T-tree | 3D-ami49 | 3,3,2,3 | 1.74 × 10^8 | 940.3 | 24.3 | 91.86 |
| 3D-subTCG | 3D-xerox | 1,1,1,1 | 4.66 × 10^7 | 447.6 | 13.2 | 7.99 |
| 3D-subTCG | 3D-hp | 1,1,1,1 | 1.50 × 10^7 | 176.0 | 13.7 | 1.67 |
| 3D-subTCG | 3D-ami33 | 2,2,2,2 | 3.31 × 10^6 | 67.4 | 30.0 | 115.6 |
| 3D-subTCG | 3D-ami49 | 3,3,2,3 | 1.79 × 10^8 | 792.7 | 26.4 | 368.0 |
| 3D-subTCG | Average | | | 358.94 | 17.84 | 99.18 |

Table VIII. Results for the Five Real Circuits with Boundary Constraints
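The degree-based boundary test mentioned above can be sketched as follows. The sketch assumes the usual transitive-closure-graph convention that an edge (a, b) in C_h means a is to the left of b and an edge (a, b) in C_v means a is below b. The mapping from zero in-degree or out-degree to a particular side is an interpretation of the text, the temporal dimension is left out, and the function name and edge-list representation are hypothetical.

```python
def boundary_sides(task, ch_edges, cv_edges):
    """Sides on which `task` has no neighbouring task, judged by node degree.

    ch_edges / cv_edges: lists of (src, dst) pairs of the horizontal and
    vertical constraint graphs (illustrative representation).
    """
    def degrees(edges):
        indeg = sum(1 for _, dst in edges if dst == task)
        outdeg = sum(1 for src, _ in edges if src == task)
        return indeg, outdeg

    h_in, h_out = degrees(ch_edges)
    v_in, v_out = degrees(cv_edges)
    sides = set()
    if h_in == 0:
        sides.add("left")     # nothing lies to the left of the task
    if h_out == 0:
        sides.add("right")    # nothing lies to the right of the task
    if v_in == 0:
        sides.add("bottom")   # nothing lies below the task
    if v_out == 0:
        sides.add("top")      # nothing lies above the task
    return sides


# a is left of b; c sits on top of both a and b.
ch = [("a", "b")]
cv = [("a", "c"), ("b", "c")]
print(sorted(boundary_sides("a", ch, cv)))   # ['bottom', 'left']
print(sorted(boundary_sides("c", ch, cv)))   # ['left', 'right', 'top']
```

Because the test only inspects node degrees, a candidate solution that violates a boundary constraint can be detected without packing it first.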

Table IX. Results for Various Aspect Ratios of Desired Widths and Heights for 3D-ami33 Circuit

Outline-free SA engine

| Circuit name | Outline width/height | Success rate | Min/Avg/Max exec. time (clk cycles) | Min/Avg/Max deadspace (%) |
|---|---|---|---|---|
| 3D-ami33 | 1100/600 | 47% | 6/6.61/11 | 21.51/29.87/39.11 |
| | 900/900 | 13% | 6/7.92/11 | 23.81/32.72/37.66 |
| | 850/700 | 9% | 7/8.44/11 | 24.47/34.30/37.66 |
| | 550/1200 | 42% | 6/6.47/11 | 21.51/29.98/40.02 |
| | 650/800 | 6% | 7/8.66/11 | 24.47/33.79/37.44 |
| Avg. | | 23.4% | 6.4/7.62/11 | 23.15/32.13/38.37 |

Fixed-outline SA engine

| Circuit name | Outline width/height | Success rate | Min/Avg/Max exec. time (clk cycles) | Min/Avg/Max deadspace (%) |
|---|---|---|---|---|
| 3D-ami33 | 1100/600 | 91% | 6/6.87/8 | 37.85/46.15/55.60 |
| | 900/900 | 92% | 6/6.66/8 | 46.45/54.56/60.77 |
| | 850/700 | 69% | 6/7.62/10 | 32.70/45.89/59.36 |
| | 550/1200 | 58% | 6/7.56/10 | 34.17/49.58/64.46 |
| | 650/800 | 46% | 7/8.59/11 | 31.73/45.11/57.72 |
| Avg. | | 71.2% | 6.2/7.46/9.4 | 36.58/48.25/59.58 |

Fig. 20. The result of 3D-ami49 with boundary constraints. White modules represent boundary modules.

For the fixed-outline floorplanning problem, we chose the 3D-ami33 circuit for the experiment and added various outline constraints. Table IX reports the success rate², the minimum/average/maximum task execution time³, and the minimum/average/maximum deadspace⁴ of the fixed-outline SA engine described in Section 5.2. Following Adya and Markov [2001, 2003], we compare the success rate with and without considering the fixed-outline constraint. The [...] fixed-outline SA engine. One observation is that, for the 850/700 outline constraint, the outline-free SA engine obtains a larger minimum execution time with a smaller minimum deadspace than the fixed-outline SA engine. The reason is that the fixed-outline SA engine makes use of the given architecture and may therefore generate a floorplan with a smaller execution time. In contrast, the outline-free SA engine optimizes volume and may therefore generate a floorplan with a longer execution time but a smaller area, hence smaller deadspace.

² Number of runs, out of 100, that satisfy the fixed-outline constraint.

³ The minimum/average/maximum total execution time over all successful runs.

⁴ The minimum/average/maximum deadspace over all successful runs.
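The statistics reported in Table IX (success rate over all runs, and minimum/average/maximum values over the successful runs only) can be computed as in the following sketch. The record fields `fits_outline`, `exec_time`, and `deadspace` and the sample values are hypothetical; they are not data from the experiments.

```python
def summarize_runs(runs):
    """Success rate (%) plus min/avg/max over successful runs only."""
    ok = [r for r in runs if r["fits_outline"]]
    success_rate = 100.0 * len(ok) / len(runs)

    def min_avg_max(key):
        values = [r[key] for r in ok]
        if not values:
            return None               # no successful run to summarize
        return min(values), sum(values) / len(values), max(values)

    return success_rate, min_avg_max("exec_time"), min_avg_max("deadspace")


# Made-up records for four runs, one of which violates the outline.
runs = [
    {"fits_outline": True, "exec_time": 6, "deadspace": 20.0},
    {"fits_outline": True, "exec_time": 8, "deadspace": 30.0},
    {"fits_outline": False, "exec_time": 7, "deadspace": 25.0},
    {"fits_outline": True, "exec_time": 10, "deadspace": 40.0},
]
print(summarize_runs(runs))
# (75.0, (6, 8.0, 10), (20.0, 30.0, 40.0))
```

Because failed runs are excluded from the min/avg/max columns, an engine with a low success rate can still report small deadspace values, which is worth keeping in mind when comparing the two halves of Table IX.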
