Formulation as a Delay Insertion Process

Chapter 3 Problem Formulation

3.3 Formulation as a Delay Insertion Process

With the run-and-scan test application methodology, we may apply a diagnostic test sequence to the chip. After the core logic computation, the results will be captured to the flip-flops in the scan chain, and then with scan shift operation, we can observe the snapshot images. For a failing chip, we use the same flow to observe the snapshot images in the following two steps.

Step 1: When the diagnostic test sequence has been applied to the chip through primary input pins (PI), but the scan shift operation is not started yet, the snapshot image will be the same as the fault-free one. However if there is some fault in the core logic, after the core logic computation, the fault in the logic will cause the snapshot image a slightly different from the fault-free one. The fault effect in the core logic will be captured into the flip-flops to flip the expected contents in the scan chains. The experiments show the difference of snapshot images caused by faulty core logic regarded as random noise on the snapshot images.

Step 2: After the scan shift out operation, we will observe a failing image that is different from the fault-free snapshot image by only one bit. For instance, the one bit at the faulty flip-flop is overwritten by its preceding flip-flop due to the multi-steeping phenomenon caused by the hold-time fault. In other words, the one bit in the snapshot image is dropped as scanned out as the final observe image.

Example 1: Fig 3.4 shows the distortion of the hold-time fault under the run-and-scan methodology.

We assume to use a specific diagnostic test sequence determined in advance to pump into the chip to setup the snapshot images of these flip-flops as (0011) before the scan shift out operation been executed. Here we assume the flip-flop 2 (FF index 2) has a hold-time fault and it will be triggered in the following scan shift out operation. During the scan shift operation, the value in the flip-flop 2 will be overwritten, and we may observe the snapshot image at the scan output pins as (-011), where

In summary, from the discussion above, we know that difference between the fault-free and faulty snapshot images is only a number of missing bits. Therefore, we can continue to perform the hold-time fault diagnosis by the delay insertion process to exam the image different profiles to localize the exact failing location in the scan chain as much as possible.

Fig. 3.4 : The distortion of a hold-time fault on the image.

input

Snapshot image set up by a test sequence: (0011) Observed image: (011)

Definition 3: (Delay Insertion Process) Given a fault-free image (g1, g2…gn) and a failing observed image (f1, f2…fn) obtained with the run-and-scan methodology. The delay insertion process is to insert a number of the delayed bits “d” into the failing observed images, so the similarity of the two images could be optimized, i.e. the different bits between the two bit streams can be reduced due to the delayed bits insertions. The similarity of the two images is defined as the number of bit positions where the two images are identical. Then based on this similarity and some statistical post processing, we can localize the possible faulty flip-flops as candidates of hold-time faults.

represent the fault-free images, i.e. the snapshot images before scan shift out operation under run-and-scan methodology and the failing images, i.e. the observed images after the scan shift out operation which triggered the hold-time faults. Certain bits in the failing image are denoted as “-“, meaning these values will depend on the data at the scan input pins in the second-stage of the run-and-scan test application. From the Fig 3.5 with the simple compare manipulation, we can get the original similarity between the fault-free image and the failing image that is calculated as 11 bits.

Now we apply the delay insertion process to insert the extra two delayed bits after the flip-flop 7 and flip-flop 13. Then the update similarity will be calculated and increased to 16 bits. And such increase of these similarity bits will indicate us these flip-flops we insert delayed bit may be the hold-time fault candidates.

Regarding the diagnosis as a delay insertion process is simply an attempt to reverse the hold-time effect. Or it can be viewed as a reconstruction method for a given distorted failing observed image to trace back the fault-free image. In the following, we will propose a Greedy algorithm to solve the hold-time diagnosis problem.

Chapter 4 Greedy Algorithm

In this section, we will explain the principal of the algorithm and illustrate the operation of the algorithm with one example

4.1 Principal of Greedy Algorithm

The kernel of the Greedy algorithm is to insert the delay bit one by one by examining the fault-free image and failing image simultaneously. The outline of the Greedy algorithm is shown in Fig. 4.1. Here we use run-and-scan methodology to apply some specific diagnostic patterns to the chip, and then with the scan shift out operation we may observe a great number of snapshot images (say 500 images) from a fault-free scan chain, and a large number of observed images (say 500 images also) from a faulty scan chain. For each image pair, i.e. one image is from fault-free 500 images and the other one is from the 500 faulty images. We sweep them one bit at one time from the scan output side (i.e. the flip-flop bit with the highest index) to find the proper delay candidate position (i.e. the bit position to insert the extra delay element “d” with the delay insertion process).

When we go through the delay insertion process by checking the bits one by one, two conditions may occur.

Condition 1: (matched case) The values of the checking bits between the fault-free and faulty images are identical. Then we simply proceed to the next checking bit to the left.

Once the sweep is done, we will further consider the running sequence effects. Here the running sequence means a consecutive 0’s or consecutive 1’s in the fault free image. The example as shown in Fig 4.2 will explain the effects. In principal, if the leading bit of the running sequence is marked as an extra delay candidate, the every rest bit in the running sequence will be marked as a delay candidate. The heuristic is based on the observation that an extra delay inside the ant bit position in the running sequence will lead to the identical failing image. We just can’t differentiate the more accurate delay candidates due to the running sequence effects. In order to accommodate the ambiguity, we take a conservative stance and regard the whole running sequence as extra delay candidates. Once we have done the processing of one experiment with 500 image pairs, we deal with these 500 extra delay candidates simply by summing the number of occurrences that a bit position is marked as a extra delay candidate and rank each flip-flop position with the occurrences numbers. For each flip-flop position, the larger the occurrences number is, the higher the rank it will be.

Fig. 4.1: The outline of a greedy algorithm.

fault-free images

Sweep the images from right (SO side) to left (SI side) Insert a delay to the failing image

immediately once a difference is found

Consider the running sequence effect

Rank the fault candidates

no more more

4.2 Operation of Greedy Algorithm

Based on the principal of greedy algorithm in previous section, we will use an example to illustrate how the greedy algorithm work under run-and-scan methodology

Example 3: Fig 4.2 illustrates how the greedy algorithm is performed on an image pair. Similarly, we assume the scan chain is composed of 18 flip-flops and the failing image here is caused by hold-time faults in scan chain which is triggered under the scan shift out operation rather than the random noise effects caused by faulty core logic computation. The first row is the flip-flop index (FF index) starting from the scan input (i.e. SI, with the least FF index) to the scan output (i.e. SO, with the highest FF index). Sweeping from the right to the left (i.e. from SO to SI), we found the first difference is at bit position 11. Based on delay insertion process rule, we immediately insert one delay element “d” at the position and move on to check the rest bits. The checking stops again at flip-flop 5 where another extra delay element is inserted. Finally we got preliminary fault candidate position at flip-5 and flip-flop 11. Then we consider the running sequence effect, for flip-flop 11.

The running sequence is “000” i.e. the flip-flop 11, 12 to 13. So under considering of the running sequence effect, we also add the flip-flop 12 and 13 into our fault candidate lists. With the same reason, we also put flip-flop 6 into our fault candidate list since the running sequence for flip-flop 5 is flip-flop 5and 6. Finally the possible hold-time fault candidates for the example will be from {FF index 5 and 11} to {FF index 5, 6, 11, 12, 13}.

It is hard to localize the accurate hold-time fault location for greedy algorithm with one image pair due to the ambiguity caused by running sequence effects. However if we perform such analysis

F ig. 4 .2 : Illu stra tio n o f a gre e d y a lgo rith m .

1 0 1 1 0 0 1 1 1 1 0 0 0 1 0 1 0 1 fau lt-free

im a g e failin g

im a g e - - 1 0 1 1 0 1 1 1 1 0 0 1 0 1 0 1 FF in d ex 1 2 3 4 5 6 7 8 9 1 01 1 1 2 1 3 1 4 1 5 1 6 1 7 1 8

d elay o n first d ifferen c e - 1 0 1 1 0 1 1 1 1 d 0 0 1 0 1 0 1 failin g

im a g e

d elay o n seco n d d ifferen c e 1 0 1 1 d 0 1 1 1 1 d 0 0 1 0 1 0 1 failin g

im a g e

co n sid er ru n n in g effect 1 0 1 1 d 0 1 1 1 1 d 0 0 1 0 1 0 1 failin g

im a g e

d d

d d d d

ru nn in g sequ en ce

Chapter 5 Experimental Results

In this chapter, we will depict the experimental setup first. We present the experimental results implemented with greedy algorithm under considerations of ideal and non-ideal conditions for some practical designs.

5.1 Experimental Setup

We have implemented the proposed approach as a system including a number of programs. The overall experimental setup is shown at Fig 5.1. The circuit under diagnosis is given as a netlist in the Verilog format. Then running the logic simulation to record the 5000 clock cycles snapshot images, with the cooperation of the test sequence selection mechanism (i.e. by analyzing the logic simulation snapshot images with the randomness criteria. We pick up a large number of test sequences to make the signal-1 frequency of each flip-flop fall with a predefined range, say [0.3,0.7] as much as possible to). We can get the final 500 snapshot images that have mostly random behavior per each flip-flop in the scan chains and recorded the test sequences for the 500 snapshot images. That is the fault-free images we may use to diagnose the hold-time faults. For failing chip, our system can inject faults at the core logic and flip-flops in scan chains. We inject one stuck-at fault at the core logic stem side to bring contamination for the following branches. We use the test sequence selected in fault-free condition to apply to the fault simulator to get the corresponding 500 failing images. For hold-time fault injection, we randomly inject hold-time faults in the scan chain before scan shift out

F ig . 5 .1 : E x p e r im e n ta l s e tu p .

S im u l a tio n a n d T e st S e q u e n c e

S e le c tio n

F a u lt- fre e I m a g e s

T e st be n c h F a u lt I nje c to r T e s tbe nc h

a nd s im u la to r

F a ilin g Im a g e s

G re e d y A lg o rit hm A n a ly sis

P o st-p r o c e s s ing (e .g ., s u m m in g a nd ra nk in g )

F a u lt C a n d id a te L ist in T o p 1 0 C irc u it

In V e r ilo g

The experiments are performed on 4 practical designs: GCD, FIR, Montgomery Inverse and Viterbi decoder. These designs are all written in Verilog code and synthesized into their gate level netlists. The GCD is a design that computes the greatest common divisor of two given natural numbers. The FIR circuit is a digital finite impulse response filter. The Montgomery Inverse is a 32-bit integer counter. The Viterbi circuit is a channel decoder that extracts the original bit streams from the received bit streams at receive side in a communication system. The experimental setup parameters for these 4 designs are shown in Table 5.1

Table 5.1 Test circuits information and experiment setup parameters test application time. After the flush test, the failing signatures will tell us the faulty scan chain and the faulty behaviors. We can utilize this information to focus on a small number of scan chains that are faulty under flush test. Then we use proper diagnosis mechanism to locate the exact faulty flip-flops in the faulty scan chains. The 4 test cases are not the big million-gate counts design, but we can regard these designs as a basic block that contains a complete scan chain under diagnosis. In our experiment below, we assume only one scan chain exists in our test design cases. The basic assumption is the flip-flops inside a small sub-design likely to be connected in the same scan chain in the whole chip because of layout proximity. In general, a design with multiple scan chain is relatively easier to diagnose because neighboring scan chains are likely to be fault-free and can serve

consider the fault-free and faulty core logic conditions per the three experiment sets. Finally we will also discuss the intermittent faults effects with the diagnosis

Before the experimental results, we define some terminologies used in the following summary.

These are listed below:

(1) Size: This indicates the overall gate counts in the design.

(2) Scan FF’s: This indicates the total flip-flops in the scan chain to be diagnosed.

(3) Success rate: This indicates the rate that the faulty flip-flop is included in the top 10 candidates predicted.

(4) 1^st hit index: This refers to the index of the first flip-flop in the final top 10 candidate list that turned out to be the true faulty location. It reflects the amount of efforts a physical failure analysis engineer needs to spend if guided by the predicted top 10 candidate lists.

(5) 2^nd hit index: Similar as 1^st hit index metrics, here this refers the index of the second flip-flop in the final top 10 candidate list that turned out to be the true faulty location. For two faults or burst fault test cases, 1^st hit index means as long as one faulty flip-flop is identified in the candidate list, it was counted as success. The 2^nd hit index will count as success as long as both of the two faulty flip-flops are localized and predicted in the candidate list.

5.2 Single fault experiment set

In this experiment as shown in Table 5.2, we inject one single hold-time fault at scan chain randomly for fault-free core logic and faulty core logic. We conducted 100 experiments and calculate the average 1^st hit index and the final success rate with the Greedy algorithm. The 1^st hit index is almost 1 for every design and the success rate is 100% for each design case. This implies that it highlight the faulty location exactly for all cases we tried. The key to this success is mostly due to the heuristic for dealing with the running sequences.

Table 5.2: Experiment results of single fault under fault-free core logic

Design Size Scan FF’s 1^sthit index Success rate

GCD 1.5K 66 1.00 100%

FIR 11K 160 1.00 100%

Montgomery Inverse

4.5K 202 1.09 100%

Viterbi decoder 9K 620 1.02 100%

For the faulty core logic, i.e. except the single hold-time fault we injected randomly, we also inject one stuck-at fault randomly at the stem side of the design circuits. The stuck-at fault injected in the

After that, the scan shift operation will trigger the single hold-time fault we injected randomly to have the observed images distorted. The condition is under faulty core logic; it is not an ideal assumption that core logic is fault-free under scan chain diagnosis. From the summary in Table 5.3, the average 1^st hit index is still 1 around for GCD and FIR design. For the rest two designs, the average 1^st hit index is 2 around; it implies the Greedy algorithm still can catch the exact faulty location in the short trial. And this will lower down the efforts for physical failure analysis engineer to delayer the chip guided by the summary.

Table 5.3: Experiment results of single fault under faulty core logic

Design Size Scan FF’s 1^sthit index Success rate

GCD 1.5K 66 1.30 91%

FIR 11K 160 1.24 100%

Montgomery Inverse

4.5K 202 2.19 70%

Viterbi decoder 9K 620 2.21 73%

5.3 Two faults experiment set

In this experiment set, we first inject two hold-time faults randomly in the scan chain that is to be diagnosed. Then use the same Greedy algorithm to diagnose these faulty locations. Again, we conducted 100 experiments for each design case to derive the final average results as shown in Table 5.4. It can be seen that the results are similar to Table 5.2, which implies that under fault-free core logic condition, the Greedy algorithm still can localize the faulty flip-flops, so it is applicable to multiple fault situations without any modifications.

Table 5.4: Experiment results of two faults under fault-free core logic

Design Avg. 1^st hit index Success rate Avg. 2^nd hit index Success rate

GCD 1.00 100% 2.00 100%

FIR 1.00 100% 2.00 100%

Montgomery Inverse

1.03 99% 2.26 96%

Viterbi decoder 1.05 100% 2.50 98%

In Table 5.5, we inject one stuck-at fault at core logic to evaluate the robustness of the proposed algorithm. Similarly, we tested 100 times to come out the final average hit index and final success

average 2^nd hit index, the experimental values are similar as data for major designs shown in Table 5.4, this implies the robustness of the algorithm when multiple hold-time faults exist under the faulty core logic situation.

Table 5.5: Experiment results of two faults under faulty core logic.

Design Avg. 1^st hit index Success rate Avg. 2^nd hit index Success rate

GCD 1.19 90% 2.00 86%

FIR 1.23 99% 2.12 97%

Montgomery Inverse

1.56 68% 2.75 48%

Viterbi decoder 2.32 66% 3.42 57%

5.4 Two Burst faults experiment set

In the experimental set, we apply the two hold-time faults in a burst into the scan chain at the first time, then using the proposed approach to diagnose the observed images. The burst here means the faulty flip-flop are connected side-by-side. The result is as shown in Table 5.6. Since the core logic is fault-free, so the observed images that are not identical to expected snapshot images are caused by two burst hold-time faults we injected. Again, we tested 100 times to come out the final success rate and average hit index. The results looked pretty good for most design cases. And this implies the algorithm not only be applicable to multiple hold-time faults but also be capable to diagnose such multiple burst faults without further modifications.

Table 5.6: Experiment results of two burst faults under fault-free core logic

Design Avg. 1^st hit index Success rate Avg. 2^nd hit index Success rate

GCD 1.00 100% 2.00 100%

FIR 1.00 100% 2.00 100%

Montgomery Inverse

1.10 98% 2.38 97%

Viterbi decoder 1.19 100% 2.60 100%

The results looked pretty nice in average hit index point of view. The performance degradation of average hit index under faulty core logic is not so serious. The success rate for GCD, FIR design looked well, but a little poor for the rest two design. Since we inject the stuck-at fault at stem side, so the contamination will depend on part of circuit design to cause the success rate under faulty core

在文檔中掃描串列故障診斷的新手法 (頁 27-0)