Test Power Challenge - A Scan-Chain Reordering Scheme for Reducing Scan-Shift Power on Unspecified Test Sets

Chapter 2 Preliminaries

2.2 Test Power Challenge

Figure 2.2: An example of scan-in, capture, and scan-out operation.

A four-flip-flop scan chain (with cell values initialized to 0) and a test vector 1011 are given. During the scan-in operation, it takes four cycles to shift the vector values into the scan cells, and new cell values are then loaded from the combinational logic in the capture cycle.

Finally, the responses are scanned out while the next pattern is shifted in simultaneously (the Xs denote the values of the next pattern).

2.2 Test Power Challenge

Test power may become a major engineering problem in future system-on-chip (SoC) development. As both SoC designs and deep-submicron geometries become prevalent, larger designs, tighter timing constraints, higher operating frequencies, and lower applied voltages all affect the power consumption of silicon devices.

More precisely, these factors affect energy, average power, instantaneous power, and peak power, which are defined as follows.

• Energy: The total switching activity generated during test application. Energy affects the battery lifetime during power-up or periodic self-test of battery-operated devices.

• Average power: Average power is the total distribution of power over a time period. The ratio of energy to test time gives the average power. Elevated average power increases the thermal load that must be vented away from the device under test to prevent structural damage (hot spots) to the silicon, bonding wires, or packages.

• Instantaneous power: Instantaneous power is the value of power consumed at any given instant. Usually, it is defined as the power consumed right after the application of a synchronizing clock signal. Elevated instantaneous power might overload the power distribution systems of the silicon or package, causing brown-out.

• Peak power: The highest power value at any given instant. Peak power determines the component’s thermal and electrical limits and the system packaging requirements. If peak power exceeds a certain limit, designers can no longer guarantee that the entire circuit will function correctly. In fact, the time window for defining peak power is related to the chip’s thermal capacity, and forcing this window to one clock period is sometimes just a simplifying assumption. For example, consider a circuit that has a peak power consumption during only one cycle but consumes power within the chip’s thermal capacity for all other cycles. In this case, the circuit is not damaged, because the energy consumed (the peak power consumption times one cycle) is not enough to raise the temperature above the chip’s thermal capacity limit. To damage the chip, high power consumption must last for several cycles.
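The relationship between the quantities defined above can be illustrated with a small sketch. It assumes a hypothetical per-cycle power trace in milliwatts and a fixed clock period; the function name, the input format, and the sample numbers are all made up for illustration and are not taken from any measurement in this thesis.

```python
def power_metrics(cycle_power_mw, clock_period_ns):
    """Summarize a per-cycle power trace (hypothetical input format).

    cycle_power_mw[k] is the power (mW) consumed right after clock edge k.
    """
    test_time_ns = len(cycle_power_mw) * clock_period_ns
    # Energy is power times duration summed over all cycles; mW * ns = pJ.
    energy_pj = sum(p * clock_period_ns for p in cycle_power_mw)
    average_mw = energy_pj / test_time_ns   # ratio of energy to test time
    peak_mw = max(cycle_power_mw)           # highest instantaneous value
    return {"energy_pj": energy_pj, "average_mw": average_mw, "peak_mw": peak_mw}


# Illustrative trace: one high-power cycle among otherwise moderate cycles.
metrics = power_metrics([10.0, 40.0, 10.0, 20.0], clock_period_ns=10.0)
print(metrics)
```

In this made-up trace the single 40 mW cycle sets the peak power, while the average stays at 20 mW, which mirrors the point that one high-power cycle by itself contributes little energy.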

Scan-based design architectures are popular because of their low impact on performance and area. However, they are expensive in terms of power, because each test pattern requires a power-consuming shift operation to apply the pattern and evaluate the test response. This phenomenon is well known in industry. To meet specified power limits during test and avoid damaging the system, it is important to reduce power dissipation during scan shifting.

Chapter 3 Power-driven Scan Cell Reordering

3.1 Previous Work

In this section, we introduce previous work on a scan cell reordering methodology for reducing power [20]. Its prerequisite is a deterministic set of test patterns and corresponding responses. The idea of the methodology is to find a scan cell ordering that generates the minimum number of transitions during the shift operations (scan-in and scan-out).

The steps are as follows:

• Collect a deterministic set of test patterns and corresponding responses.

• Determine a scan cell chaining that minimizes the total bit differences between adjacent cells in the patterns and responses.

• Determine a scan cell ordering that minimizes the transitions generated during the scan-in and scan-out operations.

The data in the first step is obtained from an ATPG tool that generates the patterns with non-specified bits. In the second step, a scan cell chaining with the minimal total bit difference has a high probability of keeping the number of transitions low.

Figure 3.1: Graph construction based on deterministic sets

A complete graph is constructed based on the deterministic sets to help us find out the solution. In the graph, a vertex denotes a scan cell and the edge weight represents the total bit difference of adjacent vertices. Figure 3.1 shows the graph construction based on deterministic sets. In the figure, the test sequence is composed of four test vectors (V¹ to V⁴) and four output responses (R¹ to R⁴). The scan chain has four flip-flops, hence scan vectors are four-bit long. The initial order of the scan cells in the scan chain is depicted on the figure. According to the above description, flip-flop 1, denoted FF1, corresponds to bit 1 in each scan vector, and so on.

Next, we calculate the total number of bit differences between each pair of scan flip-flops, which represents the number of transitions that may be generated in the corresponding portion of the scan chain by connecting these two flip-flops together.

Calculating the total number of bit differences between each pair of flip-flops provides the following results: d(FF1, FF2) = 6, d(FF1, FF3) = 3, d(FF1, FF4) = 2, d(FF2, FF3) = 5, d(FF2, FF4) = 4, d(FF3, FF4) = 5.

To determine a scan cell chaining with the minimum cost, a greedy algorithm for the traveling salesman problem (TSP) is applied. In this case, the chaining of cells FF1-FF4-FF2-FF3-FF1 is the final solution.
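The chaining step can be sketched with the pairwise bit differences quoted above. The text does not say which greedy TSP variant [20] uses, so the nearest-neighbour rule and the choice of FF1 as the start cell are assumptions made for illustration; the helper names are hypothetical.

```python
# Pairwise bit differences quoted in the text for the Figure 3.1 example.
D = {
    ("FF1", "FF2"): 6, ("FF1", "FF3"): 3, ("FF1", "FF4"): 2,
    ("FF2", "FF3"): 5, ("FF2", "FF4"): 4, ("FF3", "FF4"): 5,
}


def dist(a, b):
    """Symmetric lookup of the bit difference between two flip-flops."""
    return D[(a, b)] if (a, b) in D else D[(b, a)]


def greedy_cycle(cells, start):
    """Nearest-neighbour tour: repeatedly append the closest unvisited cell.

    This is one common greedy heuristic, assumed here for illustration.
    """
    tour = [start]
    remaining = [c for c in cells if c != start]
    while remaining:
        nxt = min(remaining, key=lambda c: dist(tour[-1], c))
        tour.append(nxt)
        remaining.remove(nxt)
    # Cost of the closed cycle, including the edge back to the start cell.
    cost = sum(dist(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour)))
    return tour, cost


tour, cost = greedy_cycle(["FF1", "FF2", "FF3", "FF4"], start="FF1")
print("-".join(tour + tour[:1]), cost)
```

Under these assumptions the sketch visits FF4, FF2, and FF3 after FF1, which is the same cycle as the one reported in the text, with a total bit difference of 2 + 4 + 5 + 3 = 14.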

In the third step, the input and output of the scan chain are identified. The number of transitions that a bit difference generates during the shift operation depends on its position in the scan chain. A metric called the weighted transition count (WTC) has been proposed to calculate these transitions [10].

Figure 3.2 shows the different transition weights for test vectors and responses. For test vectors, a bit difference in a cell pair close to the scan chain output generates more transitions because it passes through a longer path during the scan-in operation, so the transition weight increases from the scan chain input. In contrast, the weight for output responses decreases from the scan chain input. As a result, different choices of the scan chain input and output lead to different transition counts.

Scan chain (scan-in to scan-out): FF1 FF2 FF3 FF4
Test vector V₁ = 0 1 1 0 (weights shown: 1, 3)
Output response R₁ = 0 1 0 0 (weights shown: 3, 1)

Figure 3.2: Different transition weight for test vectors and responses

The third step calculates the weighted transition counts for the different cases to find the ordering of scan cells that generates the minimum number of transitions during the shift operation. Figure 3.3 describes the operation. Taking the chaining FF1-FF4-FF2-FF3 as an example, the value of WTC_total is derived by calculating the weighted transition counts of the vectors and the responses, respectively. In addition, the transitions generated by the difference between the first bit of V³ and the last bit of R² are also considered. The case FF1-FF4-FF2-FF3 has the minimum WTC_total and is the final ordering of the scan cells.

Figure 3.3: Identifying the input and output of scan chain

3.2 Motivation

During the scan-based testing, the total power consumption of the CUT is highly correlated with the total number of signal transitions on the scan cells [10]. In this thesis, we use the number of signal transitions on scan cells as the power model of the whole CUT. The proposed scan-cell-reordering scheme focuses on reducing the total scan-shift power, i.e., reducing the total scan-shift transitions. The capture power is not considered in the proposed scheme.

From the discussions in Chap. 1, the scan-in transitions can be minimized by wisely filling the don’t-care bits of a test set once the scan-cell order in the scan paths is given [10]. This reduction could be more significant as the percentage of don’t-care bits increases. Therefore, our scan-cell reordering scheme attempts to first minimize the scan-out transition count without specifying the don’t-care bits, leaving the don’t-care bits for a later minimization of scan-in transitions, such as MT-fill [10]. However, before specifying the don’t-care bits, the value of some responses may not be obtainable, implying that no explicit information about scan-out transitions can be used during the scan-cell reordering process.

We use a simple experiment (reported in Table 3.1) to show that certain pairs of scan cells tend to have the same response value in most cases of the random don’t-care filling. Thus the reordering scheme can avoid the possible scan-out transitions by connecting those correlated pairs of scan cells next to each other. We first define this tendency between two scan cells as the response correlation, which is the probability that the two scan cells have the same response value by a random fill of don’t-care bits.

In the experiment, we use a commercial tool [26] to generate stuck-at-fault patterns with don’t-care bits. We then collect the statistics of the response correlation between any two scan cells by randomly filling the don’t-care bits and simulating the corresponding responses 1 million times. Table 3.1 lists the range of response correlations (Columns 1 and 4), the number of scan-cell pairs whose sampled response correlation falls in the range (Columns 2 and 5), and the corresponding percentage of the total scan-cell pairs (Columns 3 and 6), for the largest ISCAS benchmark circuit, s38584.

As the results show, while the majority of the scan-cell pairs have a response correlation around 0.5, 21,595 scan-cell pairs (2%) still have a response correlation higher than 0.75.

Those 21,595 scan-cell pairs could form a fair-sized solution space when reordering the 1,452 scan cells in s38584. This experimental result indicates that, even without specifying the don’t-care bits, the response correlations are not purely random. The same trend can be observed on other ISCAS and ITC benchmark circuits as well.
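The sampling of response correlations described above can be sketched as follows. A real experiment needs a logic simulator for the circuit under test; here a hypothetical three-cell function, `toy_response`, stands in for it, and the test cubes, trial count, and seed are arbitrary illustrative choices rather than the setup used for Table 3.1.

```python
import random
from itertools import combinations


def toy_response(bits):
    """Hypothetical 3-cell combinational logic standing in for a simulator."""
    a, b, c = bits
    # Cells 0 and 1 capture the same signal, so they always agree.
    return (a & b, a & b, a ^ c)


def response_correlation(cubes, simulate, trials, seed=0):
    """Estimate, for each cell pair, the probability of equal response values
    under random filling of the don't-care bits ('X')."""
    rng = random.Random(seed)
    n = len(cubes[0])
    same = {pair: 0 for pair in combinations(range(n), 2)}
    total = 0
    for _ in range(trials):
        for cube in cubes:
            # Random fill: every 'X' becomes 0 or 1 with equal probability.
            bits = tuple(rng.randint(0, 1) if ch == "X" else int(ch) for ch in cube)
            resp = simulate(bits)
            for (i, j) in same:
                same[(i, j)] += resp[i] == resp[j]
            total += 1
    return {pair: count / total for pair, count in same.items()}


corr = response_correlation(["1XX", "X1X"], toy_response, trials=200)
print(corr)
```

In this toy setting the pair (0, 1) has correlation 1 by construction, while the other pairs fall somewhere in between, which is the kind of non-uniform distribution the experiment looks for.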


Correlation    # of cell pairs   Distribution (%)   Correlation    # of cell pairs   Distribution (%)
0.95 - 1       32                0.003              0.45 - 0.50    476,539           45.220
0.90 - 0.95    758               0.072              0.40 - 0.45    34,963            3.319
0.85 - 0.90    2,549             0.242              0.35 - 0.40    12,957            1.230
0.80 - 0.85    6,531             0.620              0.30 - 0.35    9,260             0.879
0.75 - 0.80    11,725            1.113              0.25 - 0.30    6,910             0.656
0.70 - 0.75    17,097            1.623              0.20 - 0.25    5,109             0.485
0.65 - 0.70    17,518            1.663              0.15 - 0.20    3,666             0.348
0.60 - 0.65    21,848            2.074              0.10 - 0.15    1,949             0.185
0.55 - 0.60    46,804            4.443              0.05 - 0.10    748               0.071
0.50 - 0.55    376,600           35.750             0 - 0.05       0                 0

Table 3.1: Response correlation of ISCAS benchmark s38584.

3.3 Problem Formulation

The problem of the scan-cell reordering for scan-shift power reduction is first defined as follows:

Input:

• A circuit under test with scan cells inserted, and

• ATPG test patterns with don’t-care bits (X’s).

Output:

• An ordering of scan cells, and


• Test patterns with all don’t-care bits specified by MT-Fill based on the derived cell ordering.

Objective:

• Generate the minimum number of scan-shift transitions for the given test patterns.

The proposed scan-cell-reordering scheme only discusses the case of one scan chain in a design. However, the concept of the proposed reordering scheme could be extended to multiple-scan-chain architectures as well.

Given a test pattern and the scan-cell order for the scan chain, we can use the weighted transition count (WTC) [10] to calculate the number of scan-in and scan-out transitions. The WTC considers not only the value difference between the patterns or responses of two adjacent scan cells, but also the number of transitions that this value difference generates during the scan shift cycles. Equations 3.1 and 3.2 define WTC_in(i) and WTC_out(i), which calculate the scan-in transitions and scan-out transitions generated by the ith pattern, respectively.

\[ \mathrm{WTC}_{in}(i) = \sum_{j=0}^{s-1} PD(j) \times W_{PD}(j) \tag{3.1} \]

\[ \mathrm{WTC}_{out}(i) = \sum_{j=0}^{s-1} RD(j) \times W_{RD}(j) \tag{3.2} \]

In Equations 3.1 and 3.2, s denotes the total number of scan cells; PD(j) (RD(j)) denotes the value difference between the scan-in pattern (scan-out response) of the jth cell and the (j+1)th cell; W_PD(j) denotes the number of scan-in transitions generated by the pattern-value difference PD(j) when shifting in the corresponding pattern values from the scan input to the (j+1)th cell; W_RD(j) denotes the number of scan-out transitions generated by the response-value difference RD(j) when shifting out the responses from the jth cell to the scan chain output.


In the WTC calculation, W_PD(j) = j, implying that a pattern-value difference can generate more scan-in transitions if this value difference occurs closer to the scan-chain output. On the contrary, W_RD(j) = s − 1 − j, implying that a response-value difference can generate more scan-out transitions if this value difference occurs closer to the scan-chain input. Figure 3.4 shows an example of the WTC computation on a 6-cell scan chain, assuming that three value differences occur between cells (C1, C2), (C2, C3), and (C5, C6) for both the test pattern and its response.

Equation 3.3 calculates the total number of transitions, WTC_total, generated by a given test set with m test patterns.

\[ \mathrm{WTC}_{total} = \sum_{i=1}^{m} \left[ \mathrm{WTC}_{in}(i) + \mathrm{WTC}_{out}(i) \right] \tag{3.3} \]
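The WTC computation of Equations 3.1 to 3.3 can be sketched as below. The index convention is an assumption chosen to reproduce the weights drawn in Figure 3.4: the difference between the kth and (k+1)th cell, counted from the scan input, is weighted 1, 2, ..., s−1 for scan-in and s−1, ..., 2, 1 for scan-out. The function names are hypothetical, and the sketch ignores the transition between the last response bit and the first bit of the next pattern.

```python
def wtc(values, weights):
    """Sum the weights of every adjacent-cell value difference.

    values  : bits in scan-chain order, index 0 nearest the scan input
    weights : weights[k] applies to the difference between cells k and k+1
    """
    return sum(w for k, w in enumerate(weights) if values[k] != values[k + 1])


def wtc_in(pattern):
    s = len(pattern)
    # Assumed scan-in weights, growing toward the scan output: 1, 2, ..., s-1.
    return wtc(pattern, list(range(1, s)))


def wtc_out(response):
    s = len(response)
    # Assumed scan-out weights, shrinking toward the scan output: s-1, ..., 1.
    return wtc(response, list(range(s - 1, 0, -1)))


def wtc_total(patterns, responses):
    """Equation 3.3: add up scan-in and scan-out WTC over all patterns."""
    return sum(wtc_in(p) + wtc_out(r) for p, r in zip(patterns, responses))


bits = [1, 0, 1, 1, 1, 0]  # scan-in and scan-out values of the Figure 3.4 example
print(wtc_in(bits), wtc_out(bits), wtc_total([bits], [bits]))
```

With the Figure 3.4 values, the value differences sit at (C1, C2), (C2, C3), and (C5, C6), so the sketch yields 1 + 2 + 5 = 8 for scan-in and 5 + 4 + 1 = 10 for scan-out, matching the totals shown in the figure.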

3.4 Scan-cell Reordering Considering Only Response Correlation

3.4.1 Detailed Steps of Reordering Scheme

We introduce a scan-cell reordering scheme, named RORC (ReOrdering considering Response Correlation), which first reduces the scan-out transitions by placing scan cells with high response correlation next to each other while preserving all don’t-care bits in the test patterns. Then, the scan-in transitions are further minimized by specifying the don’t-care bits with MT-fill.

Figure 3.5 shows the flow of RORC, which consists of five main steps. The detail of each step is described in the following subsections.

(a) Scan-in operation
Cell            C1  C2  C3  C4  C5  C6
Scan-in value   1   0   1   1   1   0
PD(j)             1   1   0   0   1
W_PD(j)           1   2   3   4   5
WTC_in(i) = 1 + 2 + 0 + 0 + 5 = 8

(b) Scan-out operation
Cell            C1  C2  C3  C4  C5  C6
Scan-out value  1   0   1   1   1   0
RD(j)             1   1   0   0   1
W_RD(j)           5   4   3   2   1
WTC_out(i) = 5 + 4 + 0 + 0 + 1 = 10

Figure 3.4: Calculation of scan-in and scan-out WTC.

Obtain Response Correlations

A simulation-based method is applied to sample the response correlations between each pair of scan cells. However, the filling of don’t-care bits in RORC is not purely random, since the MT-fill technique will be applied later in RORC. Therefore, in this step, we randomly generate the scan-cell ordering multiple times, specify the don’t-care bits using MT-fill based on each generated scan-cell ordering, and then collect the response correlations by simulating the filled patterns. The number of randomly generated cell orderings used in simulation determines the accuracy of the sampled response correlations. We use the following empirical equation to determine this number of randomly generated cell orderings.

\[ \mathit{Simulation\_Times} = (\mathit{G\_Counts} / 50) \times \mathit{P\_Counts} \tag{3.4} \]

where G_Counts and P_Counts denote the circuit gate count and the number of given test patterns, respectively.

Step 1: Obtain the response correlations

Step 2: Construct the response-correlation graph based on the sampled response correlations

Step 3: Find a maximal Hamiltonian cycle on the response-correlation graph

Step 4: Determine the cell ordering with minimum WTC by breaking the Hamiltonian cycle

Step 5: Apply the MT-Fill to specify the don’t-care bits of test patterns based on the derived cell ordering

Figure 3.5: Main steps of the proposed reordering scheme RORC.
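As a quick check of the scale implied by Equation 3.4, the sketch below evaluates it for s38584 using the gate count and pattern count listed in Table 3.2. The function name is hypothetical, and because the text does not state whether the result is rounded, the value is left unrounded here.

```python
def simulation_times(gate_count, pattern_count):
    """Equation 3.4: (G_Counts / 50) x P_Counts, left unrounded (assumption)."""
    return gate_count / 50 * pattern_count


# s38584: 19,253 gates and 148 ATPG patterns (Table 3.2).
n_orderings = simulation_times(19253, 148)
print(round(n_orderings, 2))
```

For this circuit the equation asks for roughly 57 thousand randomly generated cell orderings.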

Construct the Correlation Graph

After obtaining the response correlations, we construct a non-directed graph, named response-correlation graph, in which a vertex represents a scan cell and the weight of each edge represents the response correlation between the adjacent vertices. Because any pair of scan cells could be placed next to each other, the response-correlation graph is a complete graph. Figure 3.6 shows an example of constructing a response-correlation graph with four scan cells.


Figure 3.6: Construction of a response-correlation graph.

Find a Maximal Hamiltonian Cycle

A higher response correlation between two scan cells implies a lower probability that a response-value difference occurs between the two cells. Based on this concept, the maximum Hamiltonian cycle on the response-correlation graph implies a scan-cell ordering on which the number of value differences generated between adjacent cells is statistically minimum. Finding the maximum Hamiltonian cycle is a form of the traveling salesman problem (TSP), which is NP-complete. We use a greedy TSP algorithm, which orders one vertex at a time to form the cycle. Each newly ordered vertex is the unordered vertex whose edge to the previously ordered vertex has the maximum weight. In addition, we use the N largest edges as the initial search points and report the best result out of these N trials, where N denotes the total number of scan cells. The time complexity of this algorithm is O(N³).
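A minimal sketch of this greedy search is given below. The correlation matrix is hypothetical (it does not reproduce the weights of Figure 3.6), cells are 0-indexed, and details the text leaves open, such as tie-breaking and growing the cycle from only one end of the start edge, are assumptions.

```python
def greedy_max_cycle(corr, start_edge):
    """Grow a cycle from start_edge, always appending the unvisited vertex
    with the highest correlation to the most recently ordered vertex."""
    n = len(corr)
    order = list(start_edge)
    remaining = set(range(n)) - set(order)
    while remaining:
        # sorted() makes tie-breaking deterministic (lowest index wins).
        nxt = max(sorted(remaining), key=lambda v: corr[order[-1]][v])
        order.append(nxt)
        remaining.remove(nxt)
    weight = sum(corr[order[i]][order[(i + 1) % n]] for i in range(n))
    return order, weight


def best_cycle(corr):
    """Run the greedy search from the N largest edges and keep the best cycle."""
    n = len(corr)
    edges = sorted(((corr[i][j], i, j) for i in range(n) for j in range(i + 1, n)),
                   reverse=True)
    trials = [greedy_max_cycle(corr, (i, j)) for _, i, j in edges[:n]]
    return max(trials, key=lambda t: t[1])


# Hypothetical response correlations between four scan cells.
corr = [
    [1.0, 0.9, 0.7, 0.3],
    [0.9, 1.0, 0.2, 0.5],
    [0.7, 0.2, 1.0, 0.8],
    [0.3, 0.5, 0.8, 1.0],
]
order, weight = best_cycle(corr)
print(order, round(weight, 2))
```

Each greedy run takes N steps over at most N candidates, and there are N runs, which is where the cubic bound comes from.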

Determine Cell Ordering with Minimal WTC

In the previous step, we obtained a maximal Hamiltonian cycle on the response-correlation graph so that the number of potential response-value differences between adjacent cells can be minimized. However, to minimize WTC_out, we need to consider not only the number of response-value differences but also the positions of those value differences in the cell ordering (as discussed in Section 3.3). In Step 4, we break the given maximal Hamiltonian cycle into a Hamiltonian path, which forms the final scan-cell ordering. The breaking of the Hamiltonian cycle affects the positions of the response-value differences and, in turn, WTC_out. Here, we estimate the WTC_out generated by each possible breaking of the given Hamiltonian cycle and use the breaking with the minimum WTC_out to form the final cell ordering.

The estimated WTC_out here is obtained by replacing RD(j) in Equation 3.2 with 1 minus the response correlation between cells j and j+1. For example, the maximal Hamiltonian cycle in Figure 3.6 is C1-C2-C4-C3-C1. Figure 3.7 shows the estimated WTC_out for all eight possible ways of breaking the cycle. The final cell ordering of the scan chain is C2-C1-C3-C4.
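The cycle-breaking step can be sketched as follows. The correlation values are hypothetical, since the weights of Figures 3.6 and 3.7 are not reproduced here; they were chosen only so that the outcome has the same shape as the example in the text. Cells are 0-indexed (cell 0 stands for C1), and the scan-out weight s − 1 − k for the kth adjacent pair is the convention assumed for this sketch.

```python
def estimated_wtc_out(order, corr):
    """Equation 3.2 with RD(j) replaced by 1 - correlation of adjacent cells.

    order[0] is the cell nearest the scan input.
    """
    s = len(order)
    return sum((1.0 - corr[order[k]][order[k + 1]]) * (s - 1 - k)
               for k in range(s - 1))


def best_break(cycle, corr):
    """Try every break point in both directions and keep the cheapest path."""
    n = len(cycle)
    candidates = []
    for direction in (cycle, cycle[::-1]):
        for r in range(n):
            # Rotating the cycle by r breaks the edge that closes it at r.
            candidates.append(direction[r:] + direction[:r])
    return min(candidates, key=lambda p: estimated_wtc_out(p, corr))


# Hypothetical response correlations between four scan cells.
corr = [
    [1.0, 0.9, 0.7, 0.3],
    [0.9, 1.0, 0.2, 0.5],
    [0.7, 0.2, 1.0, 0.8],
    [0.3, 0.5, 0.8, 1.0],
]
cycle = [0, 1, 3, 2]            # C1-C2-C4-C3-C1 in the text's naming
best = best_break(cycle, corr)
print(best, round(estimated_wtc_out(best, corr), 2))
```

A four-cell cycle has four edges to break and two traversal directions, which gives the eight cases mentioned in the text. With these made-up weights, the cheapest path drops the weakest cycle edge and puts the most correlated pair next to the scan input, where a value difference would cost the most.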

Apply MT-Fill to Specify Don’t-care Bits

After the scan-cell ordering is decided in the previous step, we apply the MT-fill technique to fill the don’t-care bits of the test patterns so that the scan-in transitions based on the scan-cell ordering can be minimized. The rule of MT-fill is that a don’t-care bit is filled with the value of the first encountered specified bit when traversing from the don’t-care bit toward the scan-chain output. Refer to [10] for more details of MT-fill.
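The fill rule stated above can be sketched as follows; [10] remains the reference for the actual technique. A test cube is written as a string ordered from the scan input (index 0) to the scan output. The text does not say what happens to don’t-care bits that have no specified bit toward the output, so the fallback used here (copy the nearest specified bit toward the input, or use 0 for an all-X cube) is an assumption.

```python
def mt_fill(cube):
    """Fill each 'X' with the first specified bit found toward the scan output.

    cube is a string over '0', '1', 'X', ordered from scan input to scan output.
    """
    bits = list(cube)
    nearest = None  # nearest specified bit on the scan-output side
    for k in range(len(bits) - 1, -1, -1):
        if bits[k] == "X":
            if nearest is not None:
                bits[k] = nearest
        else:
            nearest = bits[k]
    # Assumption (not stated in the text): X's with no specified bit toward
    # the output copy the nearest specified bit toward the input, and an
    # all-X cube is filled with 0s.
    prev = "0"
    for k in range(len(bits)):
        if bits[k] == "X":
            bits[k] = prev
        else:
            prev = bits[k]
    return "".join(bits)


print(mt_fill("X1XX0X"))
```

Because every run of don’t-care bits takes the value of a neighboring specified bit, the filled cube contains no value differences beyond those already forced by the specified bits.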

3.4.2 Experimental Results

We conduct experiments on nine ISCAS and ITC benchmark circuits. Table 3.2 first shows the statistics of the benchmark circuits and their ATPG patterns generated by [26].

The following experiment compares RORC with another scan-cell reordering scheme presented in [20], which requires fully specified test patterns before the reordering. Since RORC applies MT-fill to minimize the scan-in transitions, we apply MT-fill for [20] as well. In the following experiment of [20], we first randomly generate an initial scan-cell ordering and specify the don’t-care bits using MT-fill according to that initial ordering. Then the reordering scheme in [20] is applied to obtain the final scan-cell ordering based on the filled patterns. We repeat the above steps 100 times and report the best results for [20]. Also, we use the same TSP algorithm in both RORC and [20] to make a fair comparison.

Figure 3.7: Estimated WTC_out of different scan-chain input/output.

circuit  gate count  PI  PO   # of scan cells  # of ATPG patterns  don’t-care-bit percentage (%)  total faults  coverage (%)
s13207   7,951       31  121  669              108                 79.65                          21,190        100
s15850   9,772       14  87   597              117                 75.35                          23,244        100
s35932   16,065      35  320  1,728            24                  37.36                          57,084        100
s38417   15,106      28  106  1,636            167                 78.94                          61,754        100
s38584   19,253      12  278  1,452            148                 78.01                          71,278        100
b17      22,645      37  97   1,415            778                 89.98                          128,886       99.57
b20      8,875       32  22   490              539                 73.37                          47,040        99.56
b21      9,259       32  22   490              543                 74.41                          47,548        99.77
b22      14,282      32  22   735              530                 75.51                          70,750        99.91

Table 3.2: Statistics of the circuits and their ATPG patterns.

In Table 3.3, Columns 3, 4, and 5 list the numbers of scan-in transitions, scan-out transitions, and total scan-shift transitions, respectively. Column 6 lists the peak number of scan-shift transitions in a single scan-shift cycle. Column 7 lists the runtime in seconds. The results show that RORC can outperform [20] with an average 43.28% and 49.50% reduction in the number of scan-in transitions and scan-out transitions,
