Sleep Transistor Sizing Algorithm

3.6 Two-Stage Sizing Algorithm

As shown in Fig. 3.4, the DSTN design optimization problem can be formulated as follows.

“Given an IR circuit shown in Fig. 3.4 with a set of independent PWL current sources and the maximum allowed threshold voltage at each node, the problem is to simultaneously minimize the total width of sleep transistors and meet the maximum allowed threshold voltage constraints.”
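One compact way to write this formulation is sketched below. The symbols W_i, K, T and V_i^{st*} follow the text; V_i(t) for the voltage at node i, the pointwise-in-time form of the constraint, and the lower bound W^{min} (taken from the input list of Fig. 3.9) are our reading of the statement rather than formulas given in this section.

```latex
\begin{aligned}
\min_{W_1,\dots,W_K} \quad & \sum_{i=1}^{K} W_i \\
\text{subject to} \quad & V_i(t) \le V_i^{st*}, \qquad \forall\, t \in [0, T],\quad i = 1,\dots,K, \\
 & W_i \ge W^{min}, \qquad i = 1,\dots,K .
\end{aligned}
```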

Fig. 3.7: Moment calculation algorithm for the pulse waveform (input: a waveform h(t) containing N unit pulses {(t_11, t_12), (t_21, t_22), ..., (t_N1, t_N2)}).

The flowchart of the sizing algorithm is shown in Fig. 3.2, and its details are shown in Fig. 3.9. Our sizing algorithm has two stages. To speed up the optimization procedure, in the first stage we use the IEKS reduction method to approximate the solution of the equivalent DSTN circuit with many independent time-variant PWL current sources, and then calculate the solution of the adjoint system of this equivalent circuit with many independent pulse waveforms. After that, the cost of each node and the integral sensitivity of each sleep transistor are computed, as shown in Sections 3.3 and 3.4. To drive the cost of the selected sleep transistors toward zero while keeping the increase in the width of the DSTN small, W_i^add is computed as

W_i^add = \frac{c_i}{S_i}, (3.20)

where c_i is the cost of node i, and S_i is the integral sensitivity of sleep transistor i.

With the above sizing method, the new width of sleep transistor i becomes

W_i^new = W_i^old + W_i^add. (3.21)
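The update in Eqs. (3.20) and (3.21) can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes that S_i is supplied as a positive magnitude; how the cost and the sensitivity are obtained is covered in Sections 3.3 and 3.4 and is not reproduced here.

```python
def added_width(cost: float, sensitivity: float) -> float:
    """Width increment of Eq. (3.20): W_add = c_i / S_i.

    Assumption: `sensitivity` is the positive magnitude of the
    integral sensitivity of the sleep transistor.
    """
    if sensitivity <= 0.0:
        raise ValueError("sensitivity must be positive")
    return cost / sensitivity


def new_width(old_width: float, cost: float, sensitivity: float) -> float:
    """Width update of Eq. (3.21): W_new = W_old + W_add."""
    return old_width + added_width(cost, sensitivity)
```

For instance, a transistor of width 10 with cost 0.5 and sensitivity 0.25 would be widened by 2 under these assumptions.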

For node i, sizing sleep transistor i has a greater influence than sizing any other sleep transistor. So if node i has the maximum V_i^{max vio}, the best way to reduce the cost of node i is to size sleep transistor i. In our method, we choose the sleep transistor with the maximum V_i^{max vio} to size, which alleviates the sleep transistor oversizing issue. However, the conductance matrix and its LU decomposition need to be recalculated after each sizing step, and this recalculation is time-consuming. To reduce the number of recalculation steps, several sleep transistors are sized simultaneously instead of only one. The details will be described in the next section. The above procedure is repeated until the maximum V_i^{max vio} of the sleep transistors plus V_i^{st*} is less than the upper bound voltage V^up.

In the second stage, an accurate and efficient time-domain solver [17] is used to calculate the cost function and its sensitivity. The procedure presented in the first stage is repeated until the total cost is equal to zero. Fig. 3.8 shows the voltage waveform of node i over the whole sizing procedure. Fig. 3.8 (a) and Fig. 3.8 (b) show the voltage at node i in the first stage. Fig. 3.8 (c) illustrates that the upper bound voltage constraint is met, at which point the second stage starts. Fig. 3.8 (d) shows that the cost of node i is equal to zero.

The total width of the sleep transistors and the execution time of the proposed two-stage sizing algorithm both depend on the value of V^up used in the first stage. The experimental results show that the execution time is reduced when V^up is decreased. This is because the IEKS method is faster than the time-domain solver [17].

On the other hand, the total width of the sleep transistors is reduced when a larger V^up is chosen. This is because the time-domain solver [17] is more accurate than the IEKS method.

3.7 Speed Up

In the sizing procedure, the runtime increases as the circuit becomes larger. To decrease the runtime of circuit analysis, we also propose a speed-up method for sizing sleep transistors.

In our basic algorithm, after solving the node voltages at each time step, we choose the sleep transistor i with the maximum V_i^{max vio} and size its width. Finally, we update the conductance matrix G, and repeat the above procedure until the cost constraint or the upper bound voltage constraint is met. Solving the node voltages at each time step requires an LU decomposition, which is time-consuming, so we modify the algorithm as follows. First, we calculate V_i^{max vio} for each sleep transistor. Then, these values are sorted in descending order. After that, the first N sleep transistors are chosen for sizing. The experimental results show that increasing N reduces the execution time, at the price of a slight increase in the total width.
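The selection step of the speed-up method (sort by V_i^{max vio}, take the first N) can be sketched as follows. The function name and the example values are hypothetical; only the sort-then-select logic comes from the text.

```python
from typing import List, Sequence


def select_top_n(max_violations: Sequence[float], n: int) -> List[int]:
    """Return the indices of the n sleep transistors with the largest
    maximum violation, in descending order of violation."""
    order = sorted(range(len(max_violations)),
                   key=lambda i: max_violations[i],
                   reverse=True)
    return order[:n]


# Made-up violation values (in volts) for four sleep transistors.
violations = [0.01, 0.08, 0.03, 0.05]
chosen = select_top_n(violations, 2)
```

With N = 1 this reduces to the single-transistor selection of the basic algorithm, so the conductance matrix would be rebuilt once per sized transistor instead of once per batch.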

3.8 Discussion

In this chapter, we have proposed a two-stage sizing algorithm based on cost and integral sensitivity. In the first stage, a model order reduction method is employed because it speeds up the analysis of large circuits. The upper bound voltage is also employed in the first stage; it determines the runtime of the whole sizing procedure and the total width of the sleep transistors. If the upper bound voltage is increased, the runtime of the first stage decreases and the runtime of the second stage increases. This is because the upper bound constraint is met quickly in the first stage, so the algorithm leaves the first stage and starts the second stage early.

In the second stage, an exact time-domain solver is used. It computes the exact solution and ensures that the final solution does not violate the circuit constraints, but it is time-consuming. That is why increasing the upper bound voltage increases the total runtime. Compared with the two-stage method, using only the second stage takes longer; the experimental results in the next chapter show that our two-stage sizing algorithm handles large circuits efficiently. In [4], a state-of-the-art method has been proposed in which the authors consider only the maximum V_i^{max vio} at node i and then reduce it. In our method, on the other hand, we consider not only the cost over the time period T but also the total cost of the nodes, i.e., we take a global view. From this global viewpoint, our method outperforms [4], as the experimental results in the next chapter will show.

Fig. 3.8: (a) The voltage waveform of node i in the first sizing stage. (b) The V_i^{max vio} of node i is reduced. (c) The voltage waveform of node i in the second stage. (d) The V_i^{max vio} of node i is equal to zero.

Two-Stage Sizing Algorithm

Input: the number of sleep transistors K, the number of selected sleep transistors N, the upper bound voltage V^up, the maximum allowed threshold voltage V_i^{st*} for each node i, and the minimum width of the sleep transistors.

Output: total width of the sleep transistors.

    Begin
      Construct conductance matrix G
      FIRST STAGE
      While true
        Solve the voltage at each time step with the IEKS method
        Sort V_i^{max vio} from maximum to minimum
        For i = 1 : N
          Estimate cost and sensitivity for node i
        End For
        If the upper bound voltage constraint is not met
          Select the first N nodes in this order
          For i = 1 : N
            W_i^add = c_i / S_i
            W_i^new = W_i^old + W_i^add
          End For
          Update conductance matrix G
        Else
          Break
        End If
      End While
      SECOND STAGE
      While true
        Solve the voltage at each time step with the exact time-domain solver
        Sort V_i^{max vio} from maximum to minimum
        For i = 1 : N
          Estimate cost and sensitivity for node i
        End For
        If the total cost constraint is not met
          Select the first N nodes in this order
          For i = 1 : N
            W_i^add = c_i / S_i
            W_i^new = W_i^old + W_i^add
          End For
          Update conductance matrix G
        Else
          Break
        End If
      End While
    End

Fig. 3.9: The two-stage sizing algorithm.
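The control flow of Fig. 3.9 can be illustrated with a small, self-contained Python sketch. Everything numerical in it is a stand-in chosen for illustration and is not the thesis implementation: the "solvers" are a toy model in which each node voltage is I_i / (g W_i), the coarse solver is simply the exact one with a 5% error, the cost is the peak violation above V_i^{st*}, and the sensitivity is |dV/dW| of the toy model. IEKS, the adjoint system and the integral sensitivity are not implemented.

```python
from typing import Callable, List, Sequence

Solver = Callable[[Sequence[float]], List[float]]  # widths -> peak voltage per node


def size_stage(widths: Sequence[float], solve: Solver, limit: float,
               v_allowed: float, n_select: int,
               tol: float = 1e-9, max_iter: int = 1000) -> List[float]:
    """Repeat solve / pick the N worst nodes / widen until max voltage <= limit."""
    widths = list(widths)
    for _ in range(max_iter):
        peaks = solve(widths)
        if max(peaks) <= limit + tol:          # stage constraint met: break
            break
        order = sorted(range(len(peaks)), key=lambda i: peaks[i], reverse=True)
        for i in order[:n_select]:
            cost = max(0.0, peaks[i] - v_allowed)   # stand-in for c_i
            if cost == 0.0:
                continue
            sens = peaks[i] / widths[i]             # stand-in for S_i (toy |dV/dW|)
            widths[i] += cost / sens                # Eqs. (3.20) and (3.21)
    return widths


def two_stage(widths: Sequence[float], coarse: Solver, exact: Solver,
              v_up: float, v_allowed: float, n_select: int) -> List[float]:
    """First stage: coarse solver until V^up is met. Second stage: exact solver."""
    widths = size_stage(widths, coarse, v_up, v_allowed, n_select)
    return size_stage(widths, exact, v_allowed, v_allowed, n_select)


# Toy circuit: made-up peak currents (A) and conductance per unit width (S/um).
CURRENTS = [2.0e-3, 5.0e-3, 1.0e-3, 4.0e-3]
G_PER_UM = 1.0e-3


def exact(widths: Sequence[float]) -> List[float]:
    return [i / (G_PER_UM * w) for i, w in zip(CURRENTS, widths)]


def coarse(widths: Sequence[float]) -> List[float]:
    return [1.05 * v for v in exact(widths)]   # pretend reduced-order error of 5%


final = two_stage([10.0] * 4, coarse, exact, v_up=0.2, v_allowed=0.065, n_select=2)
```

The sketch keeps the two properties the text relies on: the first stage stops as soon as the largest node voltage drops below V^up, and the final acceptance test is always made with the more accurate solver.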

Chapter 4 Experimental Results

The two-stage sizing algorithm is implemented in C++ on an HP workstation with a 3.2 GHz processor and 32 GB of memory. The test circuits are randomly generated, and the number of nodes ranges from 400 to 5000. In each test circuit, the waveform of each current source contains several triangle waves with peak values randomly generated between 0.1 mA and 5 mA. V_DD is set to 1.3 V, and the maximum allowed threshold voltage is set to 0.065 V, which is 5% of V_DD. The virtual ground resistances are randomly generated between 10 Ω and 20 Ω. The time period T is partitioned into 100 time steps.
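To make the test setup concrete, the sketch below samples one piecewise-linear current waveform built from triangle pulses on 100 time steps, with peaks drawn between 0.1 mA and 5 mA. The pulse layout (equal-width, back-to-back triangles), the number of pulses, the seed and all names are assumptions for illustration; the thesis only states that each source contains several triangle waves with random peaks.

```python
import random
from typing import List, Sequence


def triangle_wave(num_steps: int, peaks: Sequence[float]) -> List[float]:
    """Sample back-to-back triangle pulses at num_steps + 1 time points.

    Each pulse starts at zero, rises linearly to its peak at the pulse
    centre and falls back to zero.
    """
    n = len(peaks)
    samples = []
    for k in range(num_steps + 1):
        x = k / num_steps * n            # position measured in pulse widths
        j = min(int(x), n - 1)           # index of the pulse containing x
        frac = x - j                     # position inside the pulse, 0..1
        samples.append(peaks[j] * (1.0 - abs(2.0 * frac - 1.0)))
    return samples


rng = random.Random(0)                               # fixed seed, arbitrary
random_peaks = [rng.uniform(0.1e-3, 5e-3) for _ in range(3)]
waveform = triangle_wave(100, random_peaks)          # 100 time steps over T

V_DD = 1.3
v_allowed = 0.05 * V_DD                              # 5% of V_DD, about 0.065 V
```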

The plots in Fig. 4.1 and Fig. 4.2 are the results for the test circuit with 2500 nodes. Fig. 4.1 illustrates the relation between the total width of the sleep transistors, the upper bound voltage V^up, and the number N of sleep transistors selected for sizing. It can be observed that the total width decreases as V^up increases or N decreases. Fig. 4.2 illustrates the relation between the runtime, the upper bound voltage V^up, and the number N of sleep transistors selected for sizing. It can be observed that the runtime decreases as V^up decreases or N increases.

| No. of nodes | [4], TP | [4], V-TP | Two-stage, N = 1 | Two-stage, N = 4% of sleep transistors |
|---|---|---|---|---|
| 400 | 558.0900 | 1919.5600 | 556.3900 | 556.7100 |
| 900 | 1162.4500 | 4565.2600 | 1158.6300 | 1159.4300 |
| 1225 | 1611.5000 | 6258.1000 | 1607.6300 | 1609.2400 |
| 1600 | 2045.8400 | 8952.9100 | 2038.0000 | 2043.2000 |
| 2025 | 2595.4100 | 11137.3000 | 2588.6300 | 2591.5400 |
| 2500 | 3191.0800 | 14893.6000 | 3179.9300 | 3183.6300 |
| 5000 | 6179.0300 | 32326.2000 | 6158.7500 | 6173.1900 |
| Total | 17343.4100 | 80052.9300 | 17287.9600 | 17316.9400 |

Table 4.1: Total width (µm) comparison for the two-stage method and [4]. TP and V-TP differ in the number of time frames; N is the number of selected sleep transistors.

| No. of nodes | [4], TP | [4], V-TP | Two-stage, N = 1 | Two-stage, N = 4% of sleep transistors |
|---|---|---|---|---|
| 400 | 1.0025 | 3.4481 | 0.9994 | 1.0000 |
| 900 | 1.0026 | 3.9375 | 0.9994 | 1.0000 |
| 1225 | 1.0014 | 3.8889 | 0.9990 | 1.0000 |
| 1600 | 1.0013 | 4.3818 | 0.9975 | 1.0000 |
| 2025 | 1.0015 | 4.2976 | 0.9989 | 1.0000 |
| 2500 | 1.0023 | 4.6782 | 0.9988 | 1.0000 |
| 5000 | 1.0009 | 5.2365 | 0.9977 | 1.0000 |
| Avg. | 1.0018 | 4.2669 | 0.9986 | 1.0000 |

Table 4.2: The ratio of width for the two-stage method and [4].

To demonstrate that the two-stage sizing algorithm is better than the state-of-the-art method [4], we also implemented the method of [4]. In Tables 4.1 and 4.2, TP is the sizing method using the uniform time frame partition proposed in [4] with 100 time frames, and V-TP is the sizing method using the variable time frame partition developed in [4] with 10 time frames. In our method, V^up is set to 0.2 V in the first stage, and the number of selected sleep transistors is 4% of the total number of sleep transistors.
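The width ratios appear to be each total width divided by the width of the two-stage method with N = 4% (the column that reads 1.0000 throughout). This normalization is inferred from the tables rather than stated in the text. The short check below recomputes the ratios for the 2500-node circuit from the widths in Table 4.1; the variable and function names are ours.

```python
from typing import Tuple

# Widths (um) for the 2500-node circuit, copied from Table 4.1:
# ([4] TP, [4] V-TP, two-stage N = 1, two-stage N = 4%).
WIDTHS_2500: Tuple[float, float, float, float] = (3191.08, 14893.60, 3179.93, 3183.63)


def width_ratios(row: Tuple[float, ...]) -> Tuple[float, ...]:
    """Normalize every entry by the last column (two-stage, N = 4%)."""
    base = row[-1]
    return tuple(value / base for value in row)


ratios_2500 = width_ratios(WIDTHS_2500)
```

Rounded to four decimals, these values match the 2500-node row of Table 4.2.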

Tables 4.1 and 4.2 demonstrate that the proposed sizing algorithm outperforms both the TP and V-TP methods in total width. In fact, the total width of our two-stage sizing algorithm is slightly less than that of TP for each test circuit, and the total width of V-TP averages over four times that of our proposed method. Tables 4.3 and 4.4 show that the two-stage sizing algorithm is faster than both the TP and V-TP methods on average, specifically 5.77× faster than TP and 1.99× faster than V-TP.

| No. of nodes | [4], TP | [4], V-TP | Two-stage, N = 1 | Two-stage, N = 4% of sleep transistors |
|---|---|---|---|---|
| 400 | 7.4900 | 3.0400 | 102.7400 | 6.9500 |
| 900 | 42.2700 | 17.7900 | 665.3200 | 18.6400 |
| 1225 | 83.9400 | 34.9500 | 1279.7700 | 26.1600 |
| 1600 | 173.8900 | 80.5900 | 2601.6200 | 44.4700 |
| 2025 | 444.7400 | 173.3200 | 6353.0800 | 71.8400 |
| 2500 | 1068.8400 | 269.9600 | 14391.0000 | 101.2900 |
| 5000 | 9876.9500 | 2764.0900 | 153513.0000 | 749.2800 |
| Total | 11698.1200 | 3343.7400 | 178906.5000 | 1018.6300 |

Table 4.3: Runtime (s) comparison for the two-stage method and [4].

| No. of nodes | [4], TP | [4], V-TP | Two-stage, N = 1 | Two-stage, N = 4% of sleep transistors |
|---|---|---|---|---|
| 400 | 1.0777 | 0.4374 | 14.7827 | 1.0000 |
| 900 | 2.2677 | 0.9544 | 35.6931 | 1.0000 |
| 1225 | 3.2087 | 1.3360 | 48.9209 | 1.0000 |
| 1600 | 3.9103 | 1.8122 | 58.5028 | 1.0000 |
| 2025 | 6.1907 | 2.4126 | 88.4337 | 1.0000 |
| 2500 | 10.5523 | 2.6652 | 142.0772 | 1.0000 |
| 5000 | 13.1819 | 3.6889 | 204.8807 | 1.0000 |
| Avg. | 5.7699 | 1.9909 | 84.7558 | 1.0000 |

Table 4.4: The ratio of runtime for the two-stage method and [4].

In Table 4.4, comparing TP or V-TP with the two-stage method, it can be observed that the ratio of runtime increases as the circuit becomes larger. We also compare, within our method, sizing one sleep transistor per iteration against sizing 4% of the total sleep transistors per iteration. The total width of the former is less than that of the latter, but the latter is 84.75× faster than the former. Selecting multiple sleep transistors for concurrent sizing is therefore a very fast and reasonably effective methodology.
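The average speed-ups quoted for TP and for single-transistor sizing can be recomputed from the runtimes in Table 4.3, assuming (as the 1.0000 column suggests) that each ratio is a runtime divided by the runtime of the two-stage method with N = 4%, and that the reported average is the arithmetic mean of the per-circuit ratios. Names are ours.

```python
from statistics import mean
from typing import List, Sequence

# Runtimes (s) from Table 4.3, ordered by circuit size (400 ... 5000 nodes).
TP = [7.49, 42.27, 83.94, 173.89, 444.74, 1068.84, 9876.95]
TWO_STAGE_N1 = [102.74, 665.32, 1279.77, 2601.62, 6353.08, 14391.00, 153513.00]
TWO_STAGE_4PCT = [6.95, 18.64, 26.16, 44.47, 71.84, 101.29, 749.28]


def mean_ratio(runtimes: Sequence[float], baseline: Sequence[float]) -> float:
    """Arithmetic mean of the per-circuit runtime ratios."""
    ratios: List[float] = [r / b for r, b in zip(runtimes, baseline)]
    return mean(ratios)


tp_speedup = mean_ratio(TP, TWO_STAGE_4PCT)            # about 5.77
n1_speedup = mean_ratio(TWO_STAGE_N1, TWO_STAGE_4PCT)  # about 84.76
```

Because the mean is taken over ratios rather than over total runtimes, the largest circuits do not dominate the average.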

Table 4.5 shows that the total width obtained using only the second stage is less than that obtained using the full two-stage sizing method. This is because the exact time-domain solver provides the best candidates for choosing and sizing sleep transistors. Hence, the total width can be reduced. Table 4.7 shows that the two-stage sizing method is faster than using only the second stage. It also reveals that model order reduction is an effective method for circuit analysis. In Table 4.8, comparing the second-stage-only execution with the two-stage method, it can be observed that the ratio of runtime increases as the circuit becomes larger.

| No. of nodes | Second stage only, N = 1 | Second stage only, N = 4% of sleep transistors | Two-stage, N = 1 | Two-stage, N = 4% of sleep transistors |
|---|---|---|---|---|
| 400 | 556.0700 | 556.1500 | 556.3900 | 556.7100 |
| 900 | 1157.6400 | 1157.9000 | 1158.6300 | 1159.4300 |
| 1225 | 1605.8300 | 1606.1600 | 1607.6300 | 1609.2400 |
| 1600 | 2036.4800 | 2037.2300 | 2038.0000 | 2043.2000 |
| 2025 | 2586.2000 | 2586.6700 | 2588.6300 | 2591.5400 |
| 2500 | 3177.5600 | 3178.3000 | 3179.9300 | 3183.6300 |
| Total | 11119.7800 | 11122.4100 | 11129.2100 | 11143.7500 |

Table 4.5: Total width (µm) comparison for the two-stage method and execution of only the second stage.

| No. of nodes | Second stage only, N = 1 | Second stage only, N = 4% of sleep transistors | Two-stage, N = 1 | Two-stage, N = 4% of sleep transistors |
|---|---|---|---|---|
| 400 | 0.9988 | 0.9989 | 0.9994 | 1.0000 |
| 900 | 0.9984 | 0.9986 | 0.9993 | 1.0000 |
| 1225 | 0.9979 | 0.9981 | 0.9990 | 1.0000 |
| 1600 | 0.9967 | 0.9971 | 0.9975 | 1.0000 |
| 2025 | 0.9979 | 0.9981 | 0.9988 | 1.0000 |
| 2500 | 0.9981 | 0.9983 | 0.9988 | 1.0000 |
| Avg. | 0.9980 | 0.9982 | 0.9988 | 1.0000 |

Table 4.6: The ratio of width for the two-stage method and execution of only the second stage.

Finally, the memory usage of our method is compared with that of [4]. The results in Table 4.9 show that our method uses more memory than [4].

| No. of nodes | Second stage only, N = 1 | Second stage only, N = 4% of sleep transistors | Two-stage, N = 1 | Two-stage, N = 4% of sleep transistors |
|---|---|---|---|---|
| 400 | 155.3800 | 8.7600 | 102.7400 | 6.9500 |
| 900 | 989.7800 | 27.9500 | 665.3200 | 18.6400 |
| 1225 | 1884.7800 | 39.5800 | 1279.7700 | 26.1600 |
| 1600 | 3512.2200 | 70.3600 | 2601.6200 | 44.4700 |
| 2025 | 7579.9400 | 150.9600 | 6353.0800 | 71.8400 |
| 2500 | 20668.5000 | 213.8300 | 14391.0000 | 101.2900 |
| Total | 34790.6000 | 511.4400 | 25393.5300 | 269.3500 |

Table 4.7: Runtime (s) comparison for the two-stage method and execution of only the second stage.

| No. of nodes | Second stage only, N = 1 | Second stage only, N = 4% of sleep transistors | Two-stage, N = 1 | Two-stage, N = 4% of sleep transistors |
|---|---|---|---|---|
| 400 | 22.3568 | 1.2604 | 14.7827 | 1.0000 |
| 900 | 53.0998 | 1.4995 | 35.6931 | 1.0000 |
| 1225 | 72.0481 | 1.5129 | 48.9208 | 1.0000 |
| 1600 | 78.9795 | 1.5821 | 58.5028 | 1.0000 |
| 2025 | 105.5114 | 2.1013 | 88.4337 | 1.0000 |
| 2500 | 204.0527 | 2.1111 | 142.0772 | 1.0000 |
| Avg. | 89.3414 | 1.6779 | 64.7351 | 1.0000 |

Table 4.8: The ratio of runtime for the two-stage method and execution of only the second stage.

Fig. 4.1: Total width relation between the number of selected sleep transistors and the upper bound voltage.

Fig. 4.2: Runtime relation between the number of selected sleep transistors and the upper bound voltage. The X is the execution time.

| No. of nodes | [4], TP | [4], V-TP | Two-stage, N = 1 | Two-stage, N = 4% of sleep transistors |
|---|---|---|---|---|
| 400 | 1.18 | 2.68 | 3.82 | 3.98 |
| 900 | 4.62 | 3.92 | 6.45 | 6.54 |
| 1225 | 5.66 | 4.70 | 8.09 | 8.21 |
| 1600 | 7.06 | 5.81 | 10.51 | 10.06 |
| 2025 | 8.40 | 6.82 | 12.61 | 11.90 |
| 2500 | 10.09 | 8.12 | 14.56 | 14.76 |
| 5000 | 19.00 | 8.89 | 26.03 | 27.71 |

Table 4.9: Memory usage (MB) comparison.

Chapter 5 Conclusion

In this thesis, we have presented a novel approach to sizing sleep transistors. To speed up circuit analysis, we employ model order reduction to solve for the voltage drop across the sleep transistors, and we select more than one sleep transistor to size in each iteration, which effectively reduces the execution time. In addition, we combine the cost and the integral sensitivity and use them to size the sleep transistors. Because a global viewpoint yields a better solution than a local one, the experimental results show that the total width of our sleep transistors is less than that of [4] and that our execution time is shorter than that of [4].

Bibliography

[1] F. Fallah and M. Pedram, "Standby and active leakage current control and minimization in CMOS VLSI circuits," IEICE Trans. on Electronics, vol. E88-C, pp. 509-519, April 2005.

[2] K. Roy, S. Mukhopadhyay, and H. Mahmoodi-Meimand, "Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits," Proc. of the IEEE, vol. 91, no. 2, pp. 305-327, February 2003.

[3] V. De, Y. Ye, A. Keshavarzi, S. Narendra, J. Kao, D. Somasekhar, R. Nair, and S. Borkar, "Techniques for leakage power reduction," in Design of High-Performance Microprocessor Circuits, A. Chandrakasan, W. Bowhill, and F. Fox, Eds. Piscataway, NJ: IEEE, 2001, ch. 3, pp. 52-55.

[4] D. S. Chiou, D. C. Juan, Y. T. Chen, and S. C. Chang, "Fine-grained sleep transistor sizing algorithm for leakage power minimization," Proc. of ACM/IEEE Design Automation Conference (DAC-2007), 2007, pp. 81-86.

[5] http://asic-soc.blogspot.com/2008/03/leakage-power-trends.html

[6] H. Chang and S. S. Sapatnekar, "Full-chip analysis of leakage power under process variations, including spatial correlations," Proc. of ACM/IEEE Design Automation Conference (DAC-2005), 2005, pp. 523-528.

[7] Y. M. Lee, Y. Cao, T. H. Chen, J. Wang, and Charlie C. P. Chen, "HiPRIME: Hierarchical and passivity preserved interconnect macromodeling engine for RLKC power delivery," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 24, no. 6, pp. 797-806, June 2005.

[8] Y. Cao, Y. M. Lee, T. H. Chen, and Charlie C. P. Chen, "HiPRIME: Hierarchical and passivity reserved interconnect macromodeling engine for RLKC power delivery," Proc. of ACM/IEEE Design Automation Conference (DAC-2002), 2002, pp. 379-384.

[9] C. Long and L. He, "Distributed sleep transistor network for power reduction," IEEE Transactions on VLSI Systems, vol. 12, no. 9, pp. 181-186, September 2004.

[10] S. Mutoh, S. Shigematsu, Y. Matsuya, H. Fukuda, T. Kaneko, and J. Yamada, "A 1-V multithreshold-voltage CMOS digital signal processor for mobile phone application," IEEE J. Solid-State Circuits, vol. 31, issue 11, pp. 1795-1802, November 1996.

[11] J. Kao, S. Narendra, and A. Chandrakasan, "MTCMOS hierarchical sizing based on mutual exclusive discharging patterns," Proc. of ACM/IEEE Design Automation Conference (DAC-1998), 1998, pp. 495-500.

[12] M. Anis, S. Areibi, M. Mahmoud, and M. Elmasry, "Dynamic and leakage power reduction in MTCMOS circuits using an automated efficient gate clustering technique," Proc. of ACM/IEEE Design Automation Conference (DAC-2002), 2002, pp. 480-485.

[13] J. Kao, A. Chandrakasan, and D. Antoniadis, "Transistor sizing issues and tool for multi-threshold CMOS technology," Proc. of ACM/IEEE Design Automation Conference (DAC-1997), 1997, pp. 409-414.

[14] D. S. Chiou, S. H. Chen, S. C. Chang, and C. Yeh, "Timing driven power gating," Proc. of ACM/IEEE Design Automation Conference (DAC-2006), 2006, pp. 121-124.

[15] M. Anis, S. Areibi, and M. Elmasry, "Design and optimization of multithreshold CMOS (MTCMOS) circuits," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 22, issue 10, pp. 1324-1342, June 2003.

[16] A. Ramalingam, B. Zhang, A. Devgan, and D. Z. Pan, "Sleep transistor sizing using timing criticality and temporal currents," Proc. of ACM/IEEE Asia and South Pacific Design Automation Conference (ASP-DAC-2005), vol. 2, pp. 1094-1097, Jan. 2005.

[17] T. H. Chen and C. C. Chen, "Efficient large-scale power grid analysis based on preconditioned Krylov subspace iterative methods," Proc. Design Automation Conference (DAC-2001), 2001, pp. 559-562.
