Organization - A Study on Power Estimation Methods for Silicon Intellectual Property (SIP) Designs

The remainder of this dissertation is organized as follows. Chapter 2 presents the improved vector compaction method based on sampling techniques, covering both the single-sequence and the multiple-sequence consecutive sampling techniques. Chapter 3 presents the gate-level power model that uses only one-dimensional lookup tables. Chapter 4 explains the proposed tableless power model, which uses a feed-forward neural network for behavioral-level simulation. Finally, Chapter 5 gives our conclusions and discusses future work.

Chapter 2

Improved Vector Compaction Methods

2.1 Vector Compaction Techniques

Traditionally, power estimation is performed at the transistor level by SPICE-like simulation. However, simulating a complex design with a large number of test vectors on a transistor-level simulator is impractical because it requires too much simulation time.

To improve efficiency, many vector compaction techniques [23~32][37~40] have been proposed. The compacted input sequences are generated according to some characteristics of the original input sequences or the activity of the circuit when it is triggered by those input vectors. Because those statistics are carefully preserved during the compaction process, the power characteristics are maintained. With vector compaction, we can estimate the power consumption of a circuit with a much smaller input vector set, which reduces the power estimation time dramatically with little loss of accuracy.

The vector compaction techniques can be roughly classified into two categories. The first category, the regeneration approaches [23~30], generates a new input sequence that is shorter than the original input sequence but has a similar average power. In [23], the pairwise transition probabilities of inputs are used to approximate the joint transition probabilities of the primary inputs. Those probabilities in the original input sequence are the targets to be preserved in the compacted sequence.

In [24], the authors build an incomplete state transition graph, in which the primary inputs with higher power sensitivity are used as the state bits. After compacting the activity number of each edge in the state transition graph, they generate a smaller sequence with the Eulerian walk algorithm. The average Hamming distance of the unselected primary inputs is also considered when the compacted sequence is regenerated. Based on grouping and sampling techniques, the authors of [27] separate the primary inputs of the circuit into several groups according to their power sensitivity values. The input pattern pairs are also divided into several subsets, so that a smaller input sequence can be generated by randomly sampling from each subset according to its size and the compaction ratio. In [24,27], the power sensitivity values of the inputs are obtained from a simulation. Those values may become inaccurate under a different distribution of input signal probability and switching activity.

In [25], the authors build a transition graph of the original input sequence to model the transitions between vectors. With the transition graph, they can obtain the active numbers of all edges and keep their ratios in the compacted input sequence. In [26], the authors analyze the input sequence with the Markov chain model and generate a smaller input vector set that keeps the characteristics of the Markov chain model. In [28], the spatial correlation of input bits is used to cluster the input pins, and the compacted sequence can be generated more easily because those bit clusters are treated as independent. In [29], the authors generate a compacted sequence that has a similar transition profile on the internal signals. In [30], the authors separate the input vectors into several vector sets based on the transition counts of internal nodes and generate a smaller sequence according to the fractal compaction algorithm. However, the backward weight propagation in [29] and the fractal algorithm in [30] have high computational overhead, so the speedup is limited.

The second category of vector compaction techniques is the sampling approaches [31,32][37~40]. A sampling approach chooses some input patterns from the original sequence to estimate the average power. The Monte Carlo simulation method is proposed in [31] and [32] for combinational circuits and sequential circuits. In [37], the stratified random sampling technique is used to improve the convergence speed of the Monte Carlo simulation method. In [38], gate-level simulation is used to draw the waveform of the indicator function, and transistor-level simulation is used to estimate a pattern pair only when its power variation is large enough. Finally, the power waveform is used to estimate the power consumption of the original input sequence. In [39], the sampling process is done for module groups with similar power behavior. In this case, the sample size of the Monte Carlo simulation can be reduced and the performance can be improved.

According to the cycle-based power information obtained from logic-level simulations, the authors of [40] separate the input pattern pairs into several groups and select the largest-energy cycle of each group to be simulated by a transistor-level simulator such as PowerMill.

According to the power consumption of each sampled cycle and the size of each group, they can calculate the average power consumption of the circuit under the original input sequences.

Because they select the largest-energy cycle in each group as the sampled cycle, those cycles might be randomly distributed in the original input sequence. Therefore, their method is called a random-like sampling method.
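As a rough illustration of how a grouped sampling estimate is combined, the following Python sketch weights the simulated power of each group's sampled cycle by the size of that group. The function name `estimate_average_power` and the example numbers are hypothetical, and the sketch assumes that one sampled cycle stands in for every cycle of its group; it is not claimed to be the exact procedure of [40].

```python
def estimate_average_power(group_sizes, sampled_powers):
    """Average power over all cycles, assuming every cycle in a group
    consumes the power measured for that group's sampled cycle."""
    if len(group_sizes) != len(sampled_powers):
        raise ValueError("one sampled power is needed per group")
    total_cycles = sum(group_sizes)
    if total_cycles == 0:
        raise ValueError("at least one cycle is required")
    weighted = sum(n * p for n, p in zip(group_sizes, sampled_powers))
    return weighted / total_cycles


# Hypothetical example: three groups holding 10, 30 and 60 cycles.
avg = estimate_average_power([10, 30, 60], [1.0, 2.0, 3.0])
```

With these made-up numbers the estimate is (10*1.0 + 30*2.0 + 60*3.0) / 100 = 2.5, in whatever power unit the transistor-level simulator reports.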

2.2 Useless Transitions

For large circuits, vector compaction techniques can provide a faster solution for power estimation with reasonable accuracy. However, the random-like sampling method may lose part of its compaction ratio and speedup, as shown in Figure 2-1. The group numbers in Figure 2-1 represent different ranges of power characteristic values. After compaction, only 4 pattern pairs are randomly selected, one from each group. However, when those 4 pattern pairs are serially concatenated to form the compacted sequence, the compacted sequence contains 7 pattern pairs. The compaction ratio and speedup are therefore reduced, because the compacted input sequence includes 3 useless transitions.

Figure 2-1. Random sampling with useless transition
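The count in this example can be checked with a small Python sketch. Each sampled segment is given as a hypothetical (first, last) pair of vector indices in the original sequence, and the segments are assumed not to touch or overlap; the function name and the index values are illustrative only.

```python
def transitions_after_concatenation(segments):
    """Return (total, wanted, useless) transition counts for a compacted
    sequence built by concatenating the given segments in order.

    Each segment (first, last) covers the vectors first..last inclusive,
    so it contributes last - first wanted transitions (pattern pairs).
    """
    wanted = 0
    vectors = 0
    for first, last in segments:
        if last <= first:
            raise ValueError("each segment needs at least two vectors")
        wanted += last - first
        vectors += last - first + 1
    # A sequence of v vectors always contains v - 1 transitions.
    total = max(vectors - 1, 0)
    return total, wanted, total - wanted


# Four isolated pattern pairs, as in the random sampling example.
isolated = transitions_after_concatenation([(0, 1), (3, 4), (6, 7), (9, 10)])
# One consecutive run covering four pattern pairs.
consecutive = transitions_after_concatenation([(2, 6)])
```

Under these assumptions, every junction between two segments adds exactly one useless transition, so k segments add k - 1 of them. Four isolated pairs give 7 transitions in total, 3 of them useless, while a single consecutive run gives none.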

Therefore, we propose a single-sequence consecutive sampling technique to reduce those useless transitions. Using the single-sequence consecutive sampling technique as shown in Figure 2-2, we can sample a single period of patterns instead of individual pattern pairs to reduce the loss of compaction ratio caused by the useless transitions. Compared to the example shown in Figure 2-1, there is no useless transition in the compacted sequence so that we can keep the compaction ratio as desired and shorten the length of the sequence.

Figure 2-2. Consecutive sampling

However, because the pattern pairs of some groups are not uniformly distributed, it is quite possible that we cannot find a perfect consecutive sequence without any undesired transitions such as the one shown in Figure 2-2. In such cases, the single-sequence consecutive sampling technique over-samples some groups in order to find an intact single sequence that has enough samples for all groups. Therefore, the compaction ratio of the sequence length may not improve much. If we relax the limitation slightly so that multiple consecutive sequences are allowed, we may generate a shorter sequence that still has the desired distribution. For example, if the desired distribution is G1:G2:G3:G4 = 3:1:2:2 for the input vectors shown in Figure 2-3, the compacted sequence found by the single-sequence approach includes at least 11 transitions, as shown in Figure 2-3(a). However, as shown in Figure 2-3(b), we can find two subsequences that also satisfy the requirements, and the number of transitions after concatenation is only 9. This implies that we can find better solutions to the vector compaction problem if we minimize the number of sequences instead of fixing it at one. The number of sequences can still be one, as in the original single-sequence approach, but that is just a special case of the multi-sequence approach.

Therefore, in this work, we focus on discussing this new extension and perform some experiments to show the improvements of this new approach.

Figure 2-3(a). The result of single-sequence approach

Figure 2-3(b). The result of multi-sequence approach
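To make the single-sequence idea concrete, the following Python sketch searches for the shortest run of consecutive pattern pairs that contains at least the required number of samples from every group. It is a standard sliding-window search offered as an illustration under stated assumptions, not the dissertation's algorithm; the function name, the group labels, and the example sequence are made up and do not reproduce Figure 2-3.

```python
from collections import Counter


def shortest_covering_window(groups, quota):
    """Find the shortest run of consecutive pattern pairs that meets quota.

    groups: group label of each pattern pair, in original order.
    quota:  dict mapping a group label to the number of samples required.
    Returns inclusive (start, end) indices, or None if no run qualifies.
    """
    need = {g: q for g, q in quota.items() if q > 0}
    if not need:
        return None
    have = Counter()
    satisfied = 0  # number of groups whose quota is currently met
    best = None
    left = 0
    for right, g in enumerate(groups):
        have[g] += 1
        if g in need and have[g] == need[g]:
            satisfied += 1
        # Shrink from the left while every quota is still met.
        while satisfied == len(need):
            if best is None or right - left < best[1] - best[0]:
                best = (left, right)
            lg = groups[left]
            if lg in need and have[lg] == need[lg]:
                satisfied -= 1
            have[lg] -= 1
            left += 1
    return best


# Hypothetical group labels with the desired distribution 3:1:2:2.
labels = [1, 1, 2, 1, 3, 4, 3, 4, 1, 1]
window = shortest_covering_window(labels, {1: 3, 2: 1, 3: 2, 4: 2})
```

In this made-up sequence the first eight pattern pairs already match the quota exactly, so no group is over-sampled. When the returned window is longer than the sum of the quotas, the difference is the over-sampling that the multi-sequence approach tries to avoid.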

In this work, our focus is to reduce useless transitions in the random-like sampling method for vector compaction. Although the vector compaction methods, including the previous approaches and the proposed approach, focus only on combinational circuits, they can still be applied to sequential circuits with full scan. The only difference is that we have to record the internal states of all flip-flops (FFs) when we estimate the logic-level power characteristics of each input transition. This FF information is then used in the transistor-level simulation of the compacted input sequence to set the internal states at the beginning of each constituent subsequence. Therefore, if the compacted input sequence is composed of only one subsequence, we have to set the initial condition only once, which requires very little overhead.

2.3 Selection of Power Characteristics

The power consumption of a CMOS digital circuit is often formulated as Equation (2-1).

$$P = P_{\text{static}} + P_{\text{dynamic}} \tag{2-1}$$

The static power (Pstatic) is often much smaller than the dynamic power (Pdynamic). Pdynamic is the sum of the functional transition power (Pfunc_trans), the glitch power (Pglitch), and the short-circuit power (Pshort-circuit), as expressed in Equation (2-2).

$$P_{\text{dynamic}} = P_{\text{func\_trans}} + P_{\text{glitch}} + P_{\text{short-circuit}} \tag{2-2}$$

Pshort-circuit is consumed when short-circuit current flows from VDD to ground during the period in which both the PMOS and NMOS transistors are turned on in a signal transition, and it is often smaller than the sum of Pfunc_trans and Pglitch. The proportion between Pfunc_trans and Pglitch depends on the circuit behavior and the design skill. Given a circuit with n nodes in its netlist, we can express Pfunc_trans and Pglitch as Equations (2-3) and (2-4), where i denotes the index of each internal node, Ci is the load capacitance of node i, Vdd is the supply voltage of the circuit, fi_func is the frequency of functional transitions at node i, and fi_glitch is the frequency of glitches at node i. Note that a node in the netlist is defined as the input or output of a logic gate in the circuit. Generally speaking, a functional transition is a single signal transition from 0 to 1 or from 1 to 0. In contrast, a glitch is a signal transition from 0 to 1 to 0 or from 1 to 0 to 1, so it is not multiplied by the factor 1/2. τi is the factor that accounts for the effect of the glitch width on the glitch power and should be between 0 and 1.

$$P_{\text{func\_trans}} = \frac{1}{2} V_{dd}^{2} \sum_{i=1}^{n} C_i \cdot f_{i\_func} \tag{2-3}$$

$$P_{\text{glitch}} = V_{dd}^{2} \sum_{i=1}^{n} C_i \cdot f_{i\_glitch} \cdot \tau_i \tag{2-4}$$

In Equation (2-3), the term $\sum_{i=1}^{n} C_i \cdot f_{i\_func}$ is often defined as the charging and discharging capacitance (CDC) during an input transition, where fi_func = 1 if node i has a signal transition and fi_func = 0 if it does not. The Ci of node i is the sum of the output capacitance of the driving gate and the input capacitances of the driven gates at node i. For commercial cell libraries, the vendors provide the output loading capacitance and the input loading capacitances of the cells. If such loading information is not provided, users can easily characterize the loading capacitances themselves using the characterization process proposed in [19]. Therefore, to calculate the CDC value of an input pattern pair, we only have to sum the loading capacitances of the nodes whose logic values change during the input transition. Only a logic-level simulator is required to obtain the node transition information for calculating CDC values.
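The CDC calculation described above can be sketched in a few lines of Python. The sketch assumes zero-delay node values, that is, one settled logic value per node before and after the input transition. The function name `cdc`, the node names, and the capacitance values are hypothetical.

```python
def cdc(before, after, load_cap):
    """Charging and discharging capacitance of one input pattern pair.

    before, after: dicts mapping node name to its settled logic value (0/1)
                   under the first and second input pattern.
    load_cap:      dict mapping node name to its load capacitance Ci.
    """
    # Only nodes whose logic value changes contribute their capacitance.
    return sum(cap for node, cap in load_cap.items()
               if before[node] != after[node])


# Hypothetical three-node netlist (capacitances in arbitrary units).
load_cap = {"a": 1.0, "b": 2.0, "y": 3.5}
before = {"a": 0, "b": 1, "y": 0}
after = {"a": 1, "b": 1, "y": 1}
value = cdc(before, after, load_cap)
```

Here nodes a and y toggle while b does not, so the CDC value is 1.0 + 3.5 = 4.5.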

Simulation-based vector compaction approaches that consider the circuit structure or behavior often classify the input pattern pairs according to some power characteristic of each pattern pair. In the literature, many power characteristics have been proposed [23,24][27][30]. For example, the Hamming distance (HD) of pattern pairs is adopted in [23][27], which use the number of transition bits of the primary inputs to approximate the average power consumption. Switching count (SC) is used in [30] to approximate the power consumption of a pattern pair using the summation $\sum_{i=1}^{n} f_{i\_func}$. Charging and discharging capacitance (CDC) is adopted in [40] to approximate the power consumption of a pattern pair. Power sensitivity is used in [24,27] as an estimate of the influence of an input on the overall power consumption.
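For comparison with CDC, the two simpler characteristics can be sketched as follows. HD looks only at the primary input bits, while SC counts every toggling node without weighting it by capacitance. The function names and the bit-string and dictionary representations are assumptions made for this illustration.

```python
def hamming_distance(pattern_a, pattern_b):
    """Number of primary input bits that differ between two patterns,
    each given as a string of '0' and '1' characters."""
    if len(pattern_a) != len(pattern_b):
        raise ValueError("patterns must have the same number of bits")
    return sum(a != b for a, b in zip(pattern_a, pattern_b))


def switching_count(before, after):
    """Number of nodes whose settled (zero-delay) logic value changes.

    before, after: dicts mapping node name to logic value (0/1).
    """
    return sum(1 for node in before if before[node] != after[node])


hd = hamming_distance("1010", "0011")
sc = switching_count({"a": 0, "b": 1, "y": 0}, {"a": 1, "b": 1, "y": 1})
```

In this hypothetical example two input bits differ, and two of the three nodes toggle.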

In vector compaction approaches, the adopted power characteristic has a large impact on the accuracy of the estimated power and on the extra computation overhead for the compacted input sequences. In order to determine which power characteristic is most suitable for different circuits, we define the average normalized error of a power characteristic as follows to make a fair comparison between them.

The average normalized error (AVGNE) is the average error between the normalized power characteristic and the normalized real power over all combinations of input vectors. The normalized power characteristic is the power characteristic value divided by the average power characteristic value. The normalized real power is the power consumption divided by the average power.

For a combinational circuit with n inputs, we can formulate the AVGNE of any power characteristic PC as Equation (2-6). In Equation (2-6), Pj,k is the power consumption of the transition from pattern j to pattern k, and PCj,k is the power characteristic value of that transition. Pavg is the average power consumption of all input pattern pairs and PCavg is the average power characteristic value of all input pattern pairs.
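The definition above can be sketched in Python over a sample of pattern pairs. The sketch assumes that the error of one pattern pair is the absolute difference between its normalized power characteristic and its normalized power, i.e. |PCj,k/PCavg - Pj,k/Pavg|, and that the average is taken over the sampled pairs rather than over all combinations. Both are assumptions about the form of Equation (2-6), and the function name `avgne` is hypothetical.

```python
def avgne(pc_values, powers):
    """Average normalized error of a power characteristic.

    pc_values: power characteristic value of each sampled pattern pair.
    powers:    real power consumption of the same pattern pairs.
    Assumes the per-pair error is |PC/PC_avg - P/P_avg|.
    """
    if len(pc_values) != len(powers) or not pc_values:
        raise ValueError("need matching, non-empty samples")
    count = len(pc_values)
    pc_avg = sum(pc_values) / count
    p_avg = sum(powers) / count
    return sum(abs(pc / pc_avg - p / p_avg)
               for pc, p in zip(pc_values, powers)) / count


# A perfectly proportional characteristic has zero error.
proportional = avgne([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
# A characteristic that varies while the power stays flat does not.
mismatched = avgne([1.0, 3.0], [2.0, 2.0])
```

Because both quantities are normalized by their own averages, the measure is independent of units, which is what allows HD, SC, and CDC to be compared on the same scale.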

The AVGNE of a power characteristic allows a fair comparison between the power characteristic value and the real power consumption. A power characteristic with a smaller AVGNE is considered closer to the real power. Therefore, we perform some experiments to evaluate the AVGNE of some popular power characteristics.

In our previous work [79], we compared the AVGNE of CDC and HD. In this work, we compare the AVGNE of three popular power characteristics: HD, zero-delay CDC, and zero-delay SC. In our experiments, we evaluate the AVGNE with three input sequences that have the same average input signal probability (P=0.5) but different average transition densities (D) on several ISCAS’85 benchmark circuits. For the input sequence with high average transition density, D is set to 0.4. For the input sequence with medium average transition density, D is set to 0.25. For the input sequence with low average transition density, D is set to 0.1. For each test sequence, 500 pattern pairs are randomly generated according to the desired signal probability and transition density. Those patterns are then simulated in PowerMill to obtain their power consumption. As for the corresponding power characteristic values, the CDC and SC values are calculated with the Verilog-XL simulator, and the HD values are obtained with a simple self-developed C program. The comparison results are shown in Table 2-1. The AVGNEs of the ISCAS’85 benchmark circuits are estimated with the three test sequences, and the overall average AVGNEs of CDC, SC, and HD are 0.1278, 0.1399, and 0.2568, respectively. Therefore, we also choose CDC as the power characteristic in this work.

Table 2-1. Average normalized errors for three power characteristics

Circuit   CDC     SC      HD      CDC     SC      HD      CDC     SC      HD
C432      0.1592  0.1933  0.3093  0.2306  0.2501  0.4793  0.2091  0.2424  0.7749
C499      0.1044  0.1064  0.1334  0.1321  0.1308  0.1641  0.1491  0.1405  0.2662
C880      0.1044  0.1183  0.1729  0.0841  0.0991  0.2562  0.1016  0.1178  0.3153
C1355     0.1055  0.1099  0.1402  0.1026  0.1056  0.1579  0.1246  0.1268  0.2112
C1908     0.1441  0.1515  0.1809  0.1421  0.1597  0.2303  0.1490  0.1551  0.2851
C2670     0.1018  0.1213  0.1768  0.1067  0.1341  0.2158  0.1121  0.1363  0.3138
C3540     0.1445  0.1598  0.1750  0.1335  0.1477  0.2396  0.1417  0.1496  0.4756
C5315     0.0882  0.1003  0.1563  0.0836  0.0963  0.2400  0.1091  0.1190  0.3313
C6288     0.0982  0.1125  0.1262  0.1532  0.1684  0.1733  0.2096  0.2260  0.2391
C7552     0.0914  0.0933  0.1667  0.1014  0.1067  0.2699  0.1158  0.1177  0.3286
Average   0.1142  0.1267  0.1738  0.1270  0.1399  0.2426  0.1422  0.1531  0.3541
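The overall averages quoted in the text can be reproduced from the Average row of Table 2-1. The short Python check below assumes that the nine values form three groups of three, one group per test sequence, with each group ordered CDC, SC, HD; this ordering is inferred because it is the one that matches the quoted averages.

```python
# Average row of Table 2-1, regrouped by power characteristic
# (one value per test sequence; column ordering is an inference).
average_row = {
    "CDC": [0.1142, 0.1270, 0.1422],
    "SC": [0.1267, 0.1399, 0.1531],
    "HD": [0.1738, 0.2426, 0.3541],
}

# Overall average AVGNE of each characteristic across the three sequences.
overall = {name: sum(vals) / len(vals) for name, vals in average_row.items()}

# The characteristic with the smallest overall AVGNE.
best = min(overall, key=overall.get)
```

Rounded to four decimal places, the results are 0.1278 for CDC, 0.1399 for SC, and 0.2568 for HD, so CDC has the smallest overall AVGNE.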

2.4 Grouping of Pattern Pairs

If we have a power characteristic that is almost proportional to the real power consumption for all pattern pairs, we can easily generate a compacted input sequence for estimating the average power consumption of a circuit by simple random selection. For example, if the original input sequence is L and the compacted sequence is C, the power consumption of the original input sequence, $P_L$, can be calculated as $P_L = P_C \cdot (PC_L / PC_C)$, where $PC_L$ is the total power characteristic value of the original input sequence, $PC_C$ is the total power characteristic value of the compacted sequence, and $P_C$ is the power consumption of the compacted sequence. However, most power characteristics, including CDC, model only the functional transition power. Because the glitch power is often not proportional to the functional transition power for all pattern pairs, the power characteristic values may not always be proportional to the real power consumption.
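The scaling relation for a proportional power characteristic can be written as a one-line Python function. The function and parameter names are hypothetical, and the numbers are made up for illustration.

```python
def estimate_original_power(p_compacted, pc_original_total, pc_compacted_total):
    """Scale the simulated power of the compacted sequence C up to the
    original sequence L, assuming the power characteristic is proportional
    to the real power: P_L = P_C * (PC_L / PC_C)."""
    if pc_compacted_total <= 0:
        raise ValueError("compacted characteristic total must be positive")
    return p_compacted * (pc_original_total / pc_compacted_total)


# Hypothetical values: the compacted sequence carries one tenth of the
# total characteristic value and consumes 2.0 units in simulation.
p_l = estimate_original_power(2.0, 1000.0, 100.0)
```

With these numbers the estimate for the original sequence is 2.0 * (1000 / 100) = 20.0. The estimate is only as good as the proportionality assumption, which is why glitch power motivates the grouping approach.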

Therefore, another solution is required instead of random selection to minimize the estimation errors. One popular approach is to separate the input pattern pairs into several groups according to their power characteristics and then sample pattern pairs from each group.

This grouping method, which is widely used (for example, in [40]) and has very low computational complexity, is also applied in this work. The variation caused by glitch power can be effectively reduced because the average value of each group is used to represent the power consumption of all pattern pairs in that group, so the variation within the group is compensated.

In order to demonstrate the effect of grouping, we perform a simple experiment on C1355 from the ISCAS’85 benchmark circuits with 5,000 random input pattern pairs. The variance limitation of each group is set to ±2.5%. The experimental results are illustrated in Figure 2-4, which shows the estimation error between the normalized CDC values and the normalized real power values of all pattern pairs. The estimation error of each pattern pair is defined as Equation (2-9), where NCDCi is the normalized CDC value of pattern pair i, and NPi is the normalized power consumption of pattern pair i. Without grouping, all pattern pairs are treated as a single group, shown as group number 0 in Figure 2-4, and the errors spread widely, from 20% to -60%. After dividing those pattern pairs into 24 groups, shown as group numbers 1 to 24 in Figure 2-4, the error distribution range of each group is significantly reduced if the average value is used to represent the real power value of each pattern pair in the same group.

Figure 2-4. The effects of grouping

Figure 2-5 shows an example of the grouping process. Figure 2-5(a) is the distribution of the CDC values of 15 pattern pairs. After sorting and grouping, the pattern pairs with similar CDC values are put together into a group, as shown in Figure 2-5(b). In this example, there are six groups for the input pattern pairs. The group sizes and the number of groups are determined by a user-defined variance limitation, which is the allowed range of CDC values in a group, measured from the average CDC value to the maximum or minimum value.

Figure 2-5. An example of grouping process: (a) CDC distribution; (b) sorting and grouping
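One way to realize the sorting and grouping step is a greedy pass over the sorted CDC values, sketched below in Python. A value joins the current group only if the group's minimum and maximum both stay within the variance limitation of the group's average. This greedy rule, the function name, and the sample values are assumptions for illustration; the text here does not spell out the exact procedure.

```python
def group_by_variance_limit(cdc_values, limit):
    """Greedily split sorted CDC values into groups whose minimum and
    maximum stay within +/- limit (a fraction, e.g. 0.025) of the
    group average."""
    groups = []
    current = []
    for value in sorted(cdc_values):
        candidate = current + [value]
        avg = sum(candidate) / len(candidate)
        too_wide = (candidate[-1] - avg > limit * avg
                    or avg - candidate[0] > limit * avg)
        if current and too_wide:
            # Adding this value would violate the limit: close the group.
            groups.append(current)
            current = [value]
        else:
            current = candidate
    if current:
        groups.append(current)
    return groups


# Hypothetical CDC values grouped with a +/-2.5% variance limitation.
result = group_by_variance_limit([110, 100, 130, 102, 111, 101], 0.025)
```

With these made-up values, three groups are formed: 100 to 102, 110 to 111, and the single value 130. A tighter limit produces more, smaller groups, and a looser limit produces fewer, larger ones.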

Our experience shows that the best variance limitation falls between ±2.5% and ±5%. If the variance limitation is smaller than ±2.5%, the number of groups increases and the group sizes decrease. In this case, it is hard to obtain a high compaction ratio because many groups are too small to provide enough samples. If the variance limitation is larger than ±5%, the grouping process causes larger errors because the glitch power may differ considerably between the pattern pairs in a group. Therefore, there is a trade-off between the compaction ratio, the estimation error, and the variance limitation, which can only be decided according to the characteristics of the circuit.

In order to demonstrate these effects, we use 50,000 pseudo-random vectors to test C2670 from the ISCAS’85 benchmark circuits with different variance limitations and compaction ratios, using the two vector compaction approaches that will be introduced in Section 2.5. The experimental results are shown in Table 2-2, where DCR is the abbreviation of desired compaction ratio. From the results, we can see that the estimation errors increase in both approaches when the compaction ratio is increased. If we set the variance limitation to a smaller value (1%), we can see that the estimation errors are getting
