The problem we consider is described as follows. Let M = {m1, m2, ..., mn} be a set of n rectangular modules. Each module mi has a height hi, a width wi, and a switching profile consisting of its active and idle times, its dynamic and leakage power dissipation, its wake-up time, and its wake-up power dissipation. The goal is to decide whether each module needs a sleep transistor, to find a floorplan of minimum cost (in area and wirelength), and to reserve enough area for the sleep transistors of the gated modules, taking the effect of power supply noise into account.
Chapter 3
Methodology for Integrating
Floorplanning and Sleep Transistor Sizing in the Presence of Power Supply Noise
In this chapter, we introduce our methodology. For a given design, we first evaluate which modules should be equipped with sleep transistors.
Once we know which modules need to be gated, we size the sleep transistors and perform the floorplanning. During floorplanning, we calculate the power supply noise and revise the area of the sleep transistors accordingly.
3.1 Power Evaluation
To make sure that a sleep transistor is beneficial, we first compute the power dissipation of every module before and after inserting a sleep transistor, and then decide whether the module should be gated. To do this, we need the profile of each module. The profile must include the active and idle times, the dynamic and leakage power dissipation, the wake-up time, and the wake-up power dissipation.
We first assume that every module has a sleep transistor. There are three cases for computing the power dissipation.
(1) When the module is in active mode, it consumes dynamic power and the energy is Eactive = Pdynamic × Tactive.
(2) When the module is in idle mode, we first check whether Tidle > Twake−up. If so, the module can enter sleep mode, where it consumes only leakage power, and the energy is Esleep = Pleakage × Tsleep. Otherwise the module cannot enter sleep mode because the idle period is too short. It must stay in active mode, and it consumes dynamic power with energy Eactive = Pdynamic × Tidle.
(3) When the module returns from sleep mode to active mode, it consumes wake-up power and the energy is Ewake−up = Pwake−up × Twake−up, where Ewake−up is discussed in Section 2.1.3.
The power dissipation of the gated module is then Pgated = (Eactive + Esleep + Ewake−up) / Ttotal. We compare it with the original power Poriginal: if Pgated < Poriginal, the module is worth gating; otherwise we do not insert a sleep transistor into this module. In this way we know whether gating a module pays off. Once we know which modules are suitable for sleep transistor insertion, we go to the next step, floorplanning. There are many representations for packing floorplans; in this thesis we choose the sequence pair and B*-tree representations, and we use the simulated annealing framework [6] to perform the floorplanning.
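The decision rule of this section can be sketched in C++ as follows. This is a minimal sketch, not the simulator used in the experiments: the struct layout and all identifiers are hypothetical, Poriginal is taken as an input, and the sleep time of an idle period is assumed to be Tidle − Twake−up, which the text does not specify.

```cpp
#include <cassert>
#include <vector>

// Hypothetical switching profile of one module (names are illustrative only).
struct ModuleProfile {
    double pDynamic;                    // dynamic power
    double pLeakage;                    // leakage power in sleep mode
    double pWakeUp;                     // wake-up power
    double tWakeUp;                     // wake-up time
    double tActive;                     // total active time
    std::vector<double> idleIntervals;  // length of each idle period
};

// Average power of the module if it had a sleep transistor:
// Pgated = (Eactive + Esleep + Ewake-up) / Ttotal.
double gatedPower(const ModuleProfile& m) {
    double eActive = m.pDynamic * m.tActive;   // case (1)
    double eSleep = 0.0;
    double eWakeUp = 0.0;
    double tTotal = m.tActive;
    for (double tIdle : m.idleIntervals) {
        tTotal += tIdle;
        if (tIdle > m.tWakeUp) {
            // Case (2), long idle period: sleep, then wake up (case (3)).
            // ASSUMPTION: Tsleep = Tidle - Twake-up.
            eSleep += m.pLeakage * (tIdle - m.tWakeUp);
            eWakeUp += m.pWakeUp * m.tWakeUp;
        } else {
            // Case (2), short idle period: stay active.
            eActive += m.pDynamic * tIdle;
        }
    }
    assert(tTotal > 0.0);
    return (eActive + eSleep + eWakeUp) / tTotal;
}

// A module is worth gating only if Pgated < Poriginal.
bool worthGating(const ModuleProfile& m, double pOriginal) {
    return gatedPower(m) < pOriginal;
}
```

A caller would evaluate worthGating once per module and pass only the modules for which it returns true to the floorplanner as gated modules.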
3.2 Sequence Pair Packing
In sequence pair packing, two processes establish the floorplan: horizontal longest common subsequence evaluation (H-LCS) and vertical longest common subsequence evaluation (V-LCS).
For a given sequence pair P = (X, Y), H-LCS searches for the longest common subsequence of X and Y. We use the bucket-list data structure [3] to find the longest common subsequence efficiently. The bucket list is composed of doubly linked nodes. Each node stores three values: x1, the x-coordinate of the module, which equals x2 of the preceding node; x2 = x1 + module width, the x-coordinate at which the next module to the right of this module starts; and the bucket-list number, which is the position of the module in sequence Y.
In this process, modules are placed from left to right and their x-coordinates are determined. As shown in Figure 3.1, we visit every element of sequence X in order and look up its bucket-list number, that is, its position in sequence Y. For example, the first element of X is a, and its bucket-list number is 4 since a is the fourth element of Y. x1(a) is set to 0 because its preceding node is the root, for which x2(0) = 0. Then x2(a) is set to x1(a) + width(a) (Figure 3.1(b)). The second element of X is b, whose bucket-list number is 2. Its node is inserted between node 0 and node 4, with x1(2) = x2(0) = 0 and x2(2) = x1(2) + width(b) (Figure 3.1(c)). The next two elements of X, d and e, are handled in the same way (Figure 3.1(d)). When we insert the fifth element of X, c, whose bucket-list number is 1, into the bucket list, we check every node after it and delete each node whose x1 is less than x2(c). The deleted nodes are not part of the longest common subsequence (Figure 3.1(e)).
Finally, we complete the process and obtain x1 for all modules (Figure 3.1(f)). Vertical longest common subsequence evaluation (V-LCS) is similar to H-LCS, but it places the modules from bottom to top and determines their y-coordinates.
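The H-LCS pass can be sketched as follows. This is an illustrative sketch under several assumptions rather than the thesis implementation: std::map stands in for the doubly linked bucket list, blocks are numbered 0..n−1, all identifiers are hypothetical, and successors are discarded when their x2 does not exceed the x2 of the newly inserted node, which is how the pruning step is usually stated for FAST-SP [3].

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <vector>

// Computes the x-coordinate of every block for the sequence pair (X, Y).
// X and Y are permutations of 0..n-1; width[b] is the width of block b.
std::vector<double> evalX(const std::vector<int>& X,
                          const std::vector<int>& Y,
                          const std::vector<double>& width) {
    const std::size_t n = X.size();
    assert(Y.size() == n && width.size() == n);

    // Bucket-list number of each block: its 1-based position in Y.
    std::vector<int> posInY(n);
    for (std::size_t i = 0; i < n; ++i) posInY[Y[i]] = static_cast<int>(i) + 1;

    // key: bucket-list number, value: x2 (right edge). Key 0 is the root.
    std::map<int, double> bucket;
    bucket[0] = 0.0;

    std::vector<double> x(n, 0.0);
    for (int b : X) {
        const int p = posInY[b];
        auto next = bucket.lower_bound(p);   // first node after position p
        x[b] = std::prev(next)->second;      // x1 = x2 of the preceding node
        const double x2 = x[b] + width[b];
        // Drop following nodes that no longer lie on a longest subsequence.
        while (next != bucket.end() && next->second <= x2) {
            next = bucket.erase(next);
        }
        bucket.emplace_hint(next, p, x2);
    }
    return x;
}
```

With a to f numbered 0 to 5, the sequence pair of Figure 3.1 becomes X = {0, 1, 3, 4, 2, 5} and Y = {2, 1, 5, 0, 3, 4}. A V-LCS pass can reuse the same routine with the block heights and the appropriately reordered sequences.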
After these two processes we obtain the floorplan shown in Figure 3.2, and the location of every module is known. We can therefore analyze the power supply noise and size the sleep transistors. However, the locations of the modules are now fixed, and there may be no space left to insert the sleep transistors.
To solve this problem, we compare the width and the height of the floorplan. If the width is less than the height, we redo the H-LCS and increase the width of the modules that need sleep transistors; otherwise we redo the V-LCS and increase the height of those modules, as shown in Figure 3.3.
Figure 3.1: Horizontal longest common subsequence evaluation process on sequence pair ((a,b,d,e,c,f),(c,b,f,a,d,e)) [3].
Figure 3.2: The floorplan of sequence pair ((a,b,d,e,c,f),(c,b,f,a,d,e)).
Figure 3.3: (a) The floorplan width is less than the height. (b) Redo the H-LCS and increase the width of the gated modules (b, d, f) [3].
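The choice between widening and heightening a gated module can be sketched as below. The block record and function name are hypothetical, and the sketch assumes that the added strip has exactly the area of the already sized sleep transistor.

```cpp
#include <cassert>

// Hypothetical record of a gated block (illustrative names only).
struct GatedBlock {
    double width;
    double height;
    double sleepArea;  // area needed by the sleep transistor (assumed known)
};

// Grow one gated block in the shorter dimension of the current floorplan.
void reserveSleepArea(GatedBlock& b, double floorplanWidth, double floorplanHeight) {
    assert(b.width > 0.0 && b.height > 0.0);
    if (floorplanWidth < floorplanHeight) {
        b.width += b.sleepArea / b.height;   // then redo H-LCS
    } else {
        b.height += b.sleepArea / b.width;   // then redo V-LCS
    }
}
```

After every gated block has been resized, only the corresponding pass (H-LCS or V-LCS) needs to be repeated, since the other coordinate is unaffected by the change.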
We use the simulated annealing framework [6] to obtain an acceptable solution. In the simulated annealing process, we perturb the current sequence pair to obtain a new sequence pair. Then we pack the new floorplan and calculate its cost. If the new cost is less than the current cost, we accept the new floorplan. We repeat this process until we obtain an acceptable solution or the annealing temperature has cooled down. The flowchart is shown in Figure 3.4.
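The annealing loop can be sketched generically as follows. The template parameters, the geometric cooling schedule, and all names are assumptions made for illustration. The sketch also uses the standard Metropolis rule of [6], which occasionally accepts a worse solution, whereas the description above only mentions accepting improvements.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <utility>

// Generic annealing loop. In this setting, State would be a sequence pair,
// cost() would pack it and evaluate area and wirelength, and perturb()
// would return a modified copy of the sequence pair.
template <typename State, typename CostFn, typename PerturbFn>
State anneal(State current, CostFn cost, PerturbFn perturb,
             double temperature, double coolingRate, double minTemperature,
             int movesPerTemperature, std::mt19937& rng) {
    assert(coolingRate > 0.0 && coolingRate < 1.0);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    double currentCost = cost(current);
    State best = current;
    double bestCost = currentCost;
    while (temperature > minTemperature) {
        for (int i = 0; i < movesPerTemperature; ++i) {
            State candidate = perturb(current, rng);
            const double candidateCost = cost(candidate);
            const double delta = candidateCost - currentCost;
            // Always accept improvements; accept a worse solution with
            // probability exp(-delta / T) (Metropolis rule).
            if (delta < 0.0 || uniform(rng) < std::exp(-delta / temperature)) {
                current = std::move(candidate);
                currentCost = candidateCost;
                if (currentCost < bestCost) {
                    best = current;
                    bestCost = currentCost;
                }
            }
        }
        temperature *= coolingRate;  // geometric cooling (an assumption)
    }
    return best;
}
```

Because the best solution seen so far is tracked separately, the returned floorplan is never worse than the initial one, even when uphill moves are accepted along the way.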
3.3 B*-tree Packing
In B*-tree packing, we perform a depth-first search (DFS) on the tree and place the module that corresponds to the node being visited. When placing block bj, we first check whether node nj is the left or the right child of its parent, node ni. If it is a left child, its x-coordinate is set to xj = xi + wi, where wi is the width of block bi. If it is a right child, its x-coordinate is set to xj = xi, that is, the same as its parent.
Figure 3.4: The flowchart of the simulated annealing framework [6] with sleep transistor insertion in sequence pair packing.
After we determine the x-coordinate, we use the horizontal contour [5] to find the y-coordinate of the module. The horizontal contour is a vector that records the top boundary of the current floorplan, and it is updated whenever a module is placed into the floorplan, as shown in Figure 3.5.
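The placement rules above can be sketched as follows. For brevity, the contour is kept as a plain list of top edges that is scanned linearly, which is a simplification of the contour structure of [5]; the node layout and all identifiers are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical B*-tree node; left/right are child indices, -1 if absent.
struct Node {
    double w;
    double h;
    int left;
    int right;
    double x;
    double y;
};

// Top edge of an already placed block, covering [x1, x2).
struct Segment {
    double x1;
    double x2;
    double top;
};

// Returns the lowest feasible y for a block spanning [x1, x2) and records
// the block's own top edge in the contour.
double placeOnContour(std::vector<Segment>& contour, double x1, double x2, double h) {
    double y = 0.0;
    for (const Segment& s : contour) {
        if (s.x1 < x2 && x1 < s.x2) y = std::max(y, s.top);
    }
    contour.push_back({x1, x2, y + h});
    return y;
}

// Depth-first packing: a left child sits to the right of its parent,
// a right child shares the x-coordinate of its parent.
void pack(std::vector<Node>& tree, int i, double x, std::vector<Segment>& contour) {
    if (i < 0) return;
    assert(i < static_cast<int>(tree.size()));
    tree[i].x = x;
    tree[i].y = placeOnContour(contour, x, x + tree[i].w, tree[i].h);
    pack(tree, tree[i].left, x + tree[i].w, contour);
    pack(tree, tree[i].right, x, contour);
}
```

Packing starts with pack(tree, rootIndex, 0.0, contour) on an empty contour. The linear scan costs O(n) per block, so a production floorplanner would replace it with the linked contour of [5].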
Unlike sequence pair packing, B*-tree packing always places the bottom-left-most modules first. Thus, when we place a module into the floorplan, the modules below it and to its left have already been placed, so the location of the module is final as soon as it is placed. We then calculate the power supply noise and size the sleep transistor. Once we have the area of the sleep transistor, we add this area to the module. Here we do not increase only the width (or only the height). We increase both the width and the height so that the aspect ratio of the module stays the same, as shown in Figure 3.6. We use simulated annealing to obtain an acceptable solution; the flowchart is shown in Figure 3.7.
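Enlarging a module by the sleep transistor area while preserving its aspect ratio amounts to scaling both sides by the same factor s, where s² = (w·h + A) / (w·h) and A is the sleep transistor area. A minimal sketch, with hypothetical names:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical module record (illustrative names only).
struct Module {
    double width;
    double height;
};

// Enlarge m by extraArea while keeping width / height unchanged.
void growKeepingAspectRatio(Module& m, double extraArea) {
    assert(m.width > 0.0 && m.height > 0.0 && extraArea >= 0.0);
    const double oldArea = m.width * m.height;
    const double scale = std::sqrt((oldArea + extraArea) / oldArea);
    m.width *= scale;
    m.height *= scale;
}
```

For example, a 4 × 1 module that needs 12 units of sleep transistor area becomes 8 × 2: its area grows from 4 to 16 and its aspect ratio is unchanged.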
Figure 3.5: (a) The current floorplan and the horizontal contour. (b) Block b3 will be placed on top of block b1; its y-coordinate is obtained from the horizontal contour. (c) Place block b3 and update the horizontal contour [5].
Figure 3.6: (a) Place the bottom-left-most module at its location. (b) Calculate the noise and size the sleep transistor if needed. (c) Place the next module and insert a sleep transistor if needed. (d) Continue the work. (e) The whole floorplan.
Figure 3.7: The flowchart of the simulated annealing framework [6] with sleep transistor insertion in B*-tree packing.
Chapter 4
Experimental Results
We implement a power simulator to evaluate which modules are suitable for sleep transistor insertion, and two floorplanners: one based on sequence pair packing and the other based on B*-tree packing. All of them are implemented in the C++ programming language. The platform is a 3.0 GHz Intel Pentium CPU with 2.0 GB of memory. We test our approach on the MCNC [19] circuit benchmarks, and we refer to the TSMC 0.18 µm CMOS library to model our sleep transistors. We compare the power consumption before and after inserting sleep transistors, as well as the results of sequence pair packing and B*-tree packing.
4.1 Power Analysis Results
We run the experiment on ami33 and ami49, two circuits of the MCNC benchmark. The results are shown in Table 4.1 and Table 4.2. For ami33, the original power before inserting sleep transistors is 161.7 mW, consisting of 125.7 mW of dynamic power and 36 mW of leakage power. Our simulator chooses 10 modules for sleep transistor insertion, because these modules consume less power when they have sleep transistors. The area of these 10 modules is 32.10% of ami33. After inserting the sleep transistors, the total power is 157.69 mW, consisting of 127.96 mW of dynamic power and 29.73 mW of leakage power.
Table 4.1: The result of the power simulator.
Circuit   Number of       Area of gated     Total area of     Percentage
          gated modules   modules (nm^2)    circuit (nm^2)
ami33     10              3.68 × 10^5       1.15 × 10^6       32.10 %
ami49     29              1.30 × 10^7       3.53 × 10^7       36.68 %
Table 4.2: Power consumption (mW) of the MCNC benchmark circuits ami33 and ami49.
          Without sleep transistors       With sleep transistors
Circuit   total    dynamic   leakage      total    dynamic   leakage
ami33     161.7    125.7     36           157.69   127.96    29.73
ami49     254.3    193.5     50.8         237.3    197.8     39.5
For ami49, the original power before inserting sleep transistors is 254.3 mW, consisting of 193.5 mW of dynamic power and 50.8 mW of leakage power. Our simulator chooses 29 modules for sleep transistor insertion, because these modules consume less power when they have sleep transistors. The area of these 29 modules is 36.68% of ami49. After inserting the sleep transistors, the total power is 237.3 mW, consisting of 197.8 mW of dynamic power and 39.5 mW of leakage power.
The dynamic power increases because the sleep transistors consume extra dynamic power when a module is in active mode. In return, the sleep transistors save leakage power when the circuit is in sleep mode.
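The trade-off can be quantified directly from Table 4.2 with a small helper such as the one below; the function name is hypothetical and the computation is plain arithmetic on the reported values.

```cpp
#include <cassert>

// Relative change in percent; negative values are savings.
double percentChange(double before, double after) {
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}
```

Applied to the table, percentChange(36.0, 29.73) gives a leakage reduction of about 17.4% for ami33 and percentChange(50.8, 39.5) gives about 22.2% for ami49, while the total power drops by about 2.5% (161.7 to 157.69 mW) and 6.7% (254.3 to 237.3 mW), respectively.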
4.2 Floorplan Results
After the power analysis, we use its results for floorplanning. In ami33, 10 modules are chosen to be gated, which account for 32.10% of the total area. In ami49, 29 modules are chosen to be gated, which account for 36.68% of the total area.
Table 4.3: The floorplanning results of ami33 with sequence pair packing and B*-tree packing.
Packing         Width (nm)   Height (nm)   Area (nm^2)   Aspect ratio   Dead space (%)   Sleep transistor area (%)
sequence pair   1484         1082.75       1.6 × 10^6    0.73           28.6             3.58
B*-tree         1045         1251.9        1.3 × 10^6    0.83           8.76             3.52
Table 4.4: The floorplanning results of ami49 with sequence pair packing and B*-tree packing.
Packing         Width (nm)   Height (nm)   Area (nm^2)    Aspect ratio   Dead space (%)   Sleep transistor area (%)
sequence pair   7177.64      8679.73       6.23 × 10^7    0.82           40               2.40
B*-tree         7275.1       5672.9        4.13 × 10^7    0.78           10.84            3.66
With sequence pair packing, the floorplan of ami33 has 28.6% dead space, and the sleep transistors take 3.58% of the total area (see Table 4.3 and Figure 4.1). The floorplan of ami49 has 40% dead space, and the sleep transistors take 2.40% of the total area (see Table 4.4). With B*-tree packing, the floorplan of ami33 has 8.76% dead space, and the sleep transistors take 3.52% of the total area (see Table 4.3 and Figure 4.2). The floorplan of ami49 has 10.84% dead space, and the sleep transistors take 3.66% of the total area (see Table 4.4 and Figure 4.3).
In this experiment, the programs based on sequence pair packing and B*-tree packing are given the same running time. According to the results, B*-tree packing leaves less dead space than sequence pair packing. In sequence pair packing, we increase the width or the height of the gated modules to make room for the sleep transistors, but the enlarged modules push their neighboring modules away and create a lot of dead space. In addition, the modules cannot keep their original aspect ratio, since we increase only the width or only the height. For these reasons, we suggest B*-tree packing for this work.
Figure 4.1: Floorplan of ami33 based on sequence pair packing. The small black blocks represent the power bumps, and the dark areas represent the sleep transistors.
Figure 4.2: Floorplan of ami33 based on B*-tree packing. The small black blocks represent the power bumps, and the dark areas represent the sleep transistors.
Figure 4.3: Floorplan of ami49 based on B*-tree packing.
Chapter 5
Conclusion
In this thesis, we have developed a framework that integrates floorplanning with sleep transistor insertion and sizing in the presence of power supply noise. We use the sequence pair and the B*-tree as our core engines for floorplanning. The experimental results show that B*-tree packing is more suitable for this work. We also validate the effectiveness of sleep transistor insertion in saving leakage power.
Although sleep transistors are not yet widely applied, they are expected to be widely used in the near future because of the growing demand for low power.
Bibliography
[1] Hailin Jiang, Malgorzata Marek-Sadowska, and Sani R. Nassif. “Benefits and Costs of Power-Gating Technique”. In Proceedings IEEE International Conference on Computer Design, 2005.
[2] Murata, Fujiyoshi, Nakatake, and Kajitani. “Rectangle-Packing Based Module Placement”. In Proceedings IEEE/ACM International Conference on Computer-Aided Design, 1995.
[3] Xiaoping Tang and D. F. Wong. “FAST-SP: A Fast Algorithm for Block Placement Based on Sequence Pair”. In Proceedings IEEE Asia and South Pacific Design Automation Conference, 2001.
[4] Y.-C. Chang, Y.-W. Chang, G.-M. Wu, and S. W. Wu. “B*-Trees: A new representation for non-slicing floorplans”. In Proceedings IEEE/ACM Design Automation Conference, pages 458–463, April 2000.
[5] Guang-Ming Wu, Yun-Chih Chang, and Yao-Wen Chang. “Rectilinear Block Placement Using B*-Trees”. In ACM Trans. on Design Automation of Electronic Systems, April 2003.
[6] S. Kirkpatrick, C.D. Gelatt, and M. P. Vecchi. “Optimization by simulated annealing”. In Science, pages 671–680, 1983.
[7] Benton H. Calhoun, Frank A. Honoré, and Anantha P. Chandrakasan. “A Leakage Reduction Methodology for Distributed MTCMOS”. In IEEE Journal of Solid-State Circuits, May 2004.
[8] Vishal Khandelwal and Ankur Srivastava. “Leakage Control Through Fine-Grained Placement and Sizing of Sleep Transistors”. In Proceedings IEEE/ACM International Conference on Computer-Aided Design, 2004.
[9] Anand Ramalingam, Bin Zhang, Anirudh Devgan, and David Z. Pan. “Sleep Transistor Sizing Using Timing Criticality and Temporal Currents”. In Proceedings IEEE Asia and South Pacific Design Automation Conference, 2005.
[10] Changbo Long and Lei He. “Distributed Sleep Transistor Network for Power Reduction”. In IEEE Transactions on Very Large Scale Integration (VLSI) Systems, September 2004.
[11] Pietro Babighian, Luca Benini, Alberto Macii, and Enrico Macii. “Post-Layout Leakage Power Minimization Based on Distributed Sleep Transistor Insertion”. In Proceedings International Symposium on Low Power Electronics and Design, 2004.
[12] Ramaprasath Vilangudipitchai and Poras T. Balsara. “Decap Aware Sleep Transistor Design”. In Proceedings of the 2004 IEEE Dallas/CAS Workshop Implementation of High Performance Circuits, pages 171–175, 2004.
[13] Ramaprasath Vilangudipitchai and Poras T. Balsara. “Power Switch Network Design for MTCMOS”. In IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2005.
[14] Afshin Abdollahi, Farzan Fallah, and Massoud Pedram. “An Effective Power Mode Transition Technique in MTCMOS Circuit”. In Proceedings IEEE/ACM Design Automation Conference, 2005.
[15] Kaijian Shi. “Dual threshold voltages and power-gating design flows offer good results”. In EDN, February 2006.
[16] Changbo Long, Jinjun Xiong, and Lei He. “On Optimal physical Synthesis of Sleep Transistors”. In Proceedings International Symposium on Physical Design, 2004.
[17] Magnus Själander, Mindaugas Drazdziulis, Per Larsson-Edefors, and Henrik Eriksson. “A Low-Leakage Twin-Precision Multiplier Using Reconfigurable Power Gating”. In Proceedings International Symposium on Circuits and Systems, 2005.
[18] Narendra Vallepalli, Yih Wang, B. Zheng, Kevin Zhang, Uddalak Bhattacharya, Zhanping Chen, Fatih Hamzaoglu, Daniel Murray, and Mark Bohr. “SRAM Design on 65-nm CMOS Technology With Dynamic Sleep Transistor for Leakage Reduction”. In IEEE Journal of Solid-State Circuits, December 2004.
[19] http://www.cse.ucsc.edu/research/surf/gsrc/mcncbench.html.
[20] Mohab Anis, Shawki Areibi, Mohamed Mahmoud, and Mohamed Elmasry. “Dynamic and Leakage Power Reduction in MTCMOS Circuits Using an Automated Efficient Gate Clustering Technique”. In Proceedings IEEE/ACM Design Automation Conference, 2002.