
Chapter 2 Basic Diagnosis Flow

2.3 Run-and-scan diagnosis flow

The diagnosis process used in this thesis is shown in Fig 2.5. Snapshot images are first loaded into the faulty chain through the core logic in functional mode and are then scanned out for further analysis, following the approach first proposed by Kundu [7].

We first assume that 500 diagnostic test sequences have been generated in advance from the test bench. The fault-free images are collected by examining the VCD (Value Change Dump) file produced by RTL simulation. Ideally, these test sequences set the value of each flip-flop as randomly as possible, as in Cheney [18]. Each test sequence is applied to the failing chip in functional mode through the primary input (PI) pins; after the core logic computation, the snapshot image is scanned out and recorded as the observed image at the scan output (SO) pins. We thus obtain a large number of fault-free snapshot images and failing observed images, which we then analyze to produce profiles that identify the faulty locations.

Fig. 2.5: Basic run-and-scan diagnosis flow.

(Flow chart) Prepare a large number of diagnostic test sequences (say 500 of them). For each test sequence, derive the fault-free snapshot image by simulation or by examining the VCD file produced after RTL simulation. (Run-and-Scan Test Application) For each test sequence, (1) apply it to the failing chip and (2) collect the failing image. Analyze the fault-free images and the failing images to identify the fault locations.

Chapter 3 Problem Formulation

In this chapter, we first review the behavior of a hold-time violation fault and then formulate the diagnosis as a delay insertion process.

3.1 Hold-time Fault Definitions

The timing diagrams for a single flip-flop and for two flip-flops on a scan chain are shown in Fig 3.1 and Fig 3.2, respectively (Huang [3]).

Fig. 3.1 Timing diagram for a single flip-flop

Fig. 3.2 Timing diagram for a scan chain

In Fig 3.1, we observe whether the data at the input port D of the flip-flop is propagated and registered to its output port Q correctly. The data must remain stable for the required hold time after the active clock edge; otherwise, the data registered from port D may be incorrect. In Fig 3.2, where multiple flip-flops are connected in a scan chain, the following condition must hold for a hold-time error to occur.

If t_sk + t_H - t_CQ > t_d, a hold-time fault is triggered. Here t_sk is the clock skew between the clocks driving adjacent flip-flops on the scan chain, t_H is the required hold time, t_CQ is the clock-to-Q delay of the driving flip-flop, and t_d is the propagation delay from the output of the driving flip-flop to the input of the driven flip-flop. Why are hold-time faults becoming more common in today’s designs and process technologies? It can have some

increase the risk to trigger the hold-time fault.

Reason 3: the propagation delay may be shorter due to the improved process technology with high density interconnects.

3.2 Hold-time Fault Modeling

Previously, Wu [15] discussed hold-time faults and classified them into three types.

Type 1: the faulty flip-flop captures the incorrect data if and only if a “0->1” transition happens at the input of the flip-flop.

Type 2: the faulty flip-flop captures the incorrect data if and only if a “1->0” transition happens at the input of the flip-flop.

Type 3: the hold-time fault happens whenever there is a transition at the input of the faulty flip-flop.
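The three types can be read as predicates on the transition seen at the flip-flop input. The following Python sketch is only an illustration of that reading; the function name `is_triggered` and the integer encoding of the fault types are assumptions introduced here, not notation from Wu [15].

```python
def is_triggered(fault_type: int, prev_bit: int, next_bit: int) -> bool:
    """Return True if a hold-time fault of the given type is activated
    when the flip-flop input changes from prev_bit to next_bit.
    The type numbers 1, 2, 3 follow the classification in the text."""
    rising = (prev_bit, next_bit) == (0, 1)
    falling = (prev_bit, next_bit) == (1, 0)
    if fault_type == 1:      # activated only by a 0->1 transition
        return rising
    if fault_type == 2:      # activated only by a 1->0 transition
        return falling
    if fault_type == 3:      # activated by any transition
        return rising or falling
    raise ValueError("fault_type must be 1, 2 or 3")
```

For instance, `is_triggered(1, 0, 1)` is true while `is_triggered(1, 1, 0)` is false, and a Type 3 fault is activated in both cases.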

Besides Wu [15], Guo [2] also defined a hold-time fault as follows: if the clock to a scan latch stays active, the faulty scan latch behaves like a buffer, so the expected value comes out of the scan chain one cycle earlier. This is the hold-time fault targeted in this thesis. We assume that such a fault is triggered only during the scan shift operation, which makes the faulty scan cell transparent. In other words, the signal at a flip-flop’s input (i.e. the D-pin) changes too soon after the active clock edge. As a result, the flip-flop can become transparent if sufficient clock skew exists between the driving and driven scan latches.

In Fig 3.3, shown below, we use an example to illustrate the hold-time fault targeted in this thesis.

Fig. 3.3: The impact of a hold-time fault on the flush test.

is too short, thus it triggers a too-early D-pin value change during the scan chain shift operation.

Such a fault could make flip-flop 2 transparent, and we say that flip-flop 2 suffers a hold-time fault. As Fig 3.3 shows, in the presence of a hold-time fault at flip-flop 2, the bit stream observed at the scan output (SO) pin is the same as the one pumped into the scan chain in the flush test, but it arrives one cycle earlier. For example, if we pump the pattern “0011” into the scan chain of a fault-free chip during the flush test, we may observe the bit stream “0011XXXX” at the scan output pin. For a failing chip (i.e., flip-flop 2 has a hold-time fault) with the same pattern, we may instead observe “0011XXX”. Here “X” denotes a don’t-care bit that depends on the values held in the flip-flops. For the fault-free chip, we need to wait 4 clock cycles to observe the pumped pattern at the scan output pin, whereas we only need to wait 3 clock cycles for the failing chip. Similarly, if there are two hold-time faults in the
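The one-cycle-early behaviour in the flush test can be reproduced with a small behavioural model. The sketch below is an assumption-laden illustration, not the simulator used in the thesis: flip-flops are indexed from 0 at the scan input, a "transparent" flip-flop copies the value its predecessor takes in the same cycle, and the names `shift_once` and `flush_latency` are hypothetical.

```python
def shift_once(state, scan_in, transparent=frozenset()):
    """Advance the scan chain by one shift clock.
    state[0] is the flip-flop next to SI, state[-1] the one next to SO.
    A transparent flip-flop captures the NEW value of its predecessor
    instead of the old one (assumed model of the hold-time fault)."""
    new_state = [scan_in]
    for i in range(1, len(state)):
        if i in transparent:
            new_state.append(new_state[i - 1])  # value races through
        else:
            new_state.append(state[i - 1])      # normal shift
    return new_state


def flush_latency(n_flops, transparent=frozenset()):
    """Count the shift clocks needed before a single marker bit '1'
    pumped in at SI becomes visible at SO (chain initially all 0)."""
    state = [0] * n_flops
    scan_in = 1
    cycles = 0
    while state[-1] != 1:
        state = shift_once(state, scan_in, transparent)
        scan_in = 0
        cycles += 1
    return cycles


print(flush_latency(4))        # fault-free 4-bit chain: 4
print(flush_latency(4, {1}))   # flip-flop 2 (0-based index 1) transparent: 3
```

Under this model each additional transparent flip-flop shortens the latency by one more cycle, e.g. `flush_latency(4, {1, 2})` gives 2.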

3.3 Formulation as A Delay Insertion Process

With the run-and-scan test application methodology, we apply a diagnostic test sequence to the chip. After the core logic computation, the results are captured into the flip-flops of the scan chain, and the snapshot images are then observed through the scan shift operation. For a failing chip, the same flow can be viewed as the following two steps.

Step 1: When the diagnostic test sequence has been applied to the chip through the primary input (PI) pins but the scan shift operation has not yet started, the snapshot image is the same as the fault-free one. However, if there is a fault in the core logic, the snapshot image after the core logic computation differs slightly from the fault-free one: the fault effect is captured into the flip-flops and flips some of the expected contents of the scan chain. Our experiments indicate that the differences caused by faulty core logic can be regarded as random noise on the snapshot images.

Step 2: After the scan shift-out operation, we observe a failing image that differs from the fault-free snapshot image by only one bit. Specifically, the bit at the faulty flip-flop is overwritten by that of its preceding flip-flop due to the multi-stepping phenomenon caused by the hold-time fault. In other words, one bit of the snapshot image is dropped when it is scanned out as the final observed image.

Example 1: Fig 3.4 shows the distortion of the hold-time fault under the run-and-scan methodology.

We assume that a specific diagnostic test sequence, determined in advance, is applied to the chip to set up the snapshot image of these flip-flops as (0011) before the scan shift-out operation is executed. We further assume that flip-flop 2 (FF index 2) has a hold-time fault that is triggered in the subsequent scan shift-out operation. During the scan shift operation, the value in flip-flop 2 is overwritten, and we may observe the snapshot image at the scan output pins as (-011), where

In summary, the difference between the fault-free and faulty snapshot images is only a number of missing bits. Therefore, we can perform the hold-time fault diagnosis by a delay insertion process that examines the image difference profiles to localize the failing location in the scan chain as precisely as possible.
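The "missing bits" view can be written down as a simple string transformation. The Python sketch below is an idealised model under stated assumptions: there is no core-logic noise, each faulty flip-flop drops exactly one bit, the bits unknown at the scan input side are written as "-", and the function name `distort` is hypothetical.

```python
def distort(snapshot: str, faulty_ffs) -> str:
    """Model the observed image for a given snapshot image.
    snapshot   -- bit string, leftmost character is FF index 1 (SI side)
    faulty_ffs -- 1-based FF indices with a hold-time fault
    The bit at each faulty index is dropped and the SI side is padded
    with '-' so the image keeps its original length."""
    faulty = set(faulty_ffs)
    kept = [bit for i, bit in enumerate(snapshot, start=1) if i not in faulty]
    return "-" * (len(snapshot) - len(kept)) + "".join(kept)


print(distort("0011", [2]))   # -011, as in Example 1
```

Applied to the fault-free image of Fig 4.2 with faults at FF 5 and FF 11, the same function yields the failing image shown there, `--1011011110010101`.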

Fig. 3.4: The distortion of a hold-time fault on the image.

(Figure) Snapshot image set up by a test sequence: (0011). Observed image: (011).

Definition 3: (Delay Insertion Process) We are given a fault-free image (g1, g2, …, gn) and a failing observed image (f1, f2, …, fn) obtained with the run-and-scan methodology. The delay insertion process inserts a number of delayed bits “d” into the failing observed image so that the similarity of the two images is maximized, i.e., the number of differing bits between the two bit streams is reduced by the inserted delayed bits. The similarity of the two images is defined as the number of bit positions at which the two images are identical. Based on this similarity and some statistical post-processing, we can localize the flip-flops that are likely candidates for hold-time faults.

represent the fault-free images, i.e. the snapshot images before the scan shift-out operation under the run-and-scan methodology, and the failing images, i.e. the observed images after the scan shift-out operation that triggered the hold-time faults. Certain bits in the failing image are denoted as “-”, meaning that their values depend on the data at the scan input pins in the second stage of the run-and-scan test application. From Fig 3.5, a simple bitwise comparison gives the original similarity between the fault-free image and the failing image as 11 bits.

Now we apply the delay insertion process and insert two extra delayed bits, after flip-flop 7 and flip-flop 13. The updated similarity then increases to 16 bits. This increase in similarity indicates that the flip-flops at which the delayed bits were inserted may be hold-time fault candidates.
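The similarity measure and a single delay insertion can be sketched in a few lines of Python. The image pair used here is the 18-bit pair of Fig 4.2 (with delays at FF 5 and FF 11), not the data of Fig 3.5; the function names are hypothetical, and it is an assumption of this sketch that "-" and "d" never count as matching bits.

```python
def similarity(fault_free: str, failing: str) -> int:
    """Number of bit positions where the two images are identical.
    '-' (unknown) and 'd' (inserted delay) never equal '0' or '1',
    so they do not contribute to the count."""
    return sum(g == f for g, f in zip(fault_free, failing))


def insert_delay(failing: str, ff_index: int) -> str:
    """Insert a delay bit 'd' at the 1-based position ff_index.
    Bits at and to the left of that position move one place towards
    the SI side; the leftmost symbol falls off, so the length is kept."""
    k = ff_index - 1
    return failing[1:k + 1] + "d" + failing[k + 1:]


fault_free = "101100111100010101"
failing    = "--1011011110010101"

before = similarity(fault_free, failing)
# Insert from the SO side towards the SI side, as in the sweep order.
repaired = insert_delay(insert_delay(failing, 11), 5)
after = similarity(fault_free, repaired)

print(before, after)   # 11 16
print(repaired)        # 1011d01111d0010101
```

The order of the two insertions matters in this representation: inserting the right-hand delay first leaves the positions further left free to shift.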

Regarding the diagnosis as a delay insertion process is simply an attempt to reverse the hold-time effect. Equivalently, it can be viewed as a reconstruction method that traces a distorted failing observed image back to the fault-free image. In the following, we propose a greedy algorithm to solve the hold-time diagnosis problem.

Chapter 4 Greedy Algorithm

In this chapter, we explain the principle of the algorithm and illustrate its operation with an example.

4.1 Principle of Greedy Algorithm

The kernel of the greedy algorithm is to insert delay bits one at a time while examining the fault-free image and the failing image simultaneously. The outline of the greedy algorithm is shown in Fig. 4.1. We use the run-and-scan methodology to apply specific diagnostic patterns to the chip; with the scan shift-out operation, we obtain a large number of snapshot images (say 500) from a fault-free scan chain and the same number of observed images from a faulty scan chain. Each image pair consists of one of the 500 fault-free images and the corresponding one of the 500 faulty images. We sweep each pair one bit at a time, starting from the scan output side (i.e. the flip-flop with the highest index), to find the proper delay candidate positions (i.e. the bit positions at which the delay insertion process inserts an extra delay element “d”).

When we go through the delay insertion process by checking the bits one by one, two conditions may occur.

Condition 1: (matched case) The bits being checked in the fault-free and faulty images are identical. We then simply proceed to the next bit to the left.

Once the sweep is done, we further consider the running sequence effect. Here a running sequence means a run of consecutive 0’s or consecutive 1’s in the fault-free image. The example in Fig 4.2 illustrates the effect. In principle, if the leading bit of a running sequence is marked as an extra delay candidate, every remaining bit in the running sequence is also marked as a delay candidate. This heuristic is based on the observation that an extra delay at any bit position inside the running sequence leads to an identical failing image, so the delay candidates within a running sequence cannot be distinguished from one another. To accommodate this ambiguity, we take a conservative stance and regard the whole running sequence as extra delay candidates.

Once one experiment with 500 image pairs has been processed, we combine the 500 sets of extra delay candidates by summing, for each bit position, the number of times it is marked as an extra delay candidate, and we rank the flip-flop positions by these occurrence counts. The larger the occurrence count of a flip-flop position, the higher its rank.
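The summing-and-ranking step can be sketched as follows. This is a minimal illustration, assuming each image pair has already produced a list of candidate FF indices; the function name `rank_candidates`, the sample candidate lists, and the tie-breaking rule (lower FF index first) are assumptions made here.

```python
from collections import Counter


def rank_candidates(candidate_sets):
    """Sum, over all image pairs, how often each FF index was marked
    as an extra delay candidate, and return (ff_index, count) pairs
    sorted by descending count (ties broken by ascending FF index)."""
    counts = Counter()
    for candidates in candidate_sets:
        counts.update(set(candidates))   # count each FF once per image pair
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical candidate lists from three image pairs.
per_pair = [[5, 6, 11, 12, 13], [5, 11], [5, 6]]
print(rank_candidates(per_pair))
# [(5, 3), (6, 2), (11, 2), (12, 1), (13, 1)]
```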

Fig. 4.1: The outline of a greedy algorithm.

(Flow chart) Take a pair of fault-free and failing images. Sweep the images from right (SO side) to left (SI side), inserting a delay into the failing image immediately once a difference is found. Consider the running sequence effect. While there are more image pairs, repeat; when there are no more, rank the fault candidates.

4.2 Operation of Greedy Algorithm

Based on the principle of the greedy algorithm described in the previous section, we use an example to illustrate how the greedy algorithm works under the run-and-scan methodology.

Example 3: Fig 4.2 illustrates how the greedy algorithm is performed on an image pair. We assume that the scan chain is composed of 18 flip-flops and that the failing image is caused by hold-time faults in the scan chain, triggered during the scan shift-out operation, rather than by random noise from faulty core logic computation. The first row is the flip-flop index (FF index), running from the scan input (SI, with the lowest FF index) to the scan output (SO, with the highest FF index). Sweeping from right to left (i.e. from SO to SI), we find the first difference at bit position 11. Following the delay insertion rule, we immediately insert one delay element “d” at that position and move on to check the remaining bits. The check stops again at flip-flop 5, where another delay element is inserted. This gives the preliminary fault candidates flip-flop 5 and flip-flop 11.

We then consider the running sequence effect. For flip-flop 11, the running sequence is “000”, i.e. flip-flops 11, 12 and 13, so we also add flip-flops 12 and 13 to the fault candidate list. For the same reason, we add flip-flop 6 to the list, since the running sequence for flip-flop 5 consists of flip-flops 5 and 6. The set of possible hold-time fault candidates for this example thus grows from {FF index 5 and 11} to {FF index 5, 6, 11, 12, 13}.
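The walk-through of Example 3 can be replayed with a short Python sketch. It is a reading of the sweep described in the text and in Fig 4.1, not the thesis implementation: the function names are hypothetical, unknown "-" bits are assumed to be skipped rather than treated as differences, and a running sequence is assumed to extend from the marked leading bit towards the SO side.

```python
def greedy_delay_insertion(fault_free: str, failing: str):
    """Sweep from the SO side (rightmost bit) to the SI side and insert
    a delay 'd' as soon as a difference is found. Returns the 1-based
    candidate FF indices and the repaired failing image."""
    f = failing
    candidates = []
    for k in range(len(f) - 1, -1, -1):
        if f[k] != "-" and f[k] != fault_free[k]:
            # Bits at index <= k move one place towards SI; leftmost drops.
            f = f[1:k + 1] + "d" + f[k + 1:]
            candidates.append(k + 1)
    return sorted(candidates), f


def extend_running_sequences(fault_free: str, candidates):
    """Mark every bit of the running sequence (same value as the
    candidate bit in the fault-free image) that starts at a candidate."""
    extended = set()
    for p in candidates:
        j = p
        while j <= len(fault_free) and fault_free[j - 1] == fault_free[p - 1]:
            extended.add(j)
            j += 1
    return sorted(extended)


fault_free = "101100111100010101"
failing    = "--1011011110010101"
found, repaired = greedy_delay_insertion(fault_free, failing)
print(found)                                         # [5, 11]
print(repaired)                                      # 1011d01111d0010101
print(extend_running_sequences(fault_free, found))   # [5, 6, 11, 12, 13]
```

Because the sweep inserts a delay at every difference, a bit flipped by faulty core logic would also produce a candidate in this sketch; the ranking over many image pairs is what is expected to suppress such spurious candidates.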

With a single image pair, it is hard for the greedy algorithm to pinpoint the exact hold-time fault location because of the ambiguity caused by the running sequence effect. However if we perform such analysis

Fig. 4.2: Illustration of a greedy algorithm.

FF index                     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
Fault-free image             1  0  1  1  0  0  1  1  1  1  0  0  0  1  0  1  0  1
Failing image                -  -  1  0  1  1  0  1  1  1  1  0  0  1  0  1  0  1
Delay on first difference    -  1  0  1  1  0  1  1  1  1  d  0  0  1  0  1  0  1
Delay on second difference   1  0  1  1  d  0  1  1  1  1  d  0  0  1  0  1  0  1
Consider running effect      1  0  1  1  d  0  1  1  1  1  d  0  0  1  0  1  0  1

Running-sequence markers in the figure: “d d” and “d d d d”.

Chapter 5 Experimental Results

In this chapter, we first describe the experimental setup. We then present the experimental results of the greedy algorithm under ideal and non-ideal conditions for several practical designs.

5.1 Experimental Setup

We have implemented the proposed approach as a system consisting of a number of programs. The overall experimental setup is shown in Fig 5.1. The circuit under diagnosis is given as a netlist in Verilog format. We run logic simulation to record the snapshot images over 5000 clock cycles, in cooperation with the test sequence selection mechanism: by analyzing the simulated snapshot images against a randomness criterion, we pick a large number of test sequences so that the signal-1 frequency of each flip-flop falls within a predefined range, say [0.3, 0.7], as much as possible. This yields the final 500 snapshot images, which behave mostly randomly at each flip-flop in the scan chain, and we record the test sequences that produce them. These are the fault-free images used to diagnose the hold-time faults. For the failing chip, our system can inject faults into the core logic and into the flip-flops of the scan chain. We inject one stuck-at fault at a stem in the core logic so that it contaminates the branches that follow. We apply the test sequences selected in the fault-free condition to the fault simulator to obtain the corresponding 500 failing images. For hold-time fault injection, we randomly inject hold-time faults in the scan chain before scan shift out

Fig. 5.1: Experimental setup.

(Block diagram) Circuit in Verilog; Test bench; Simulation and Test Sequence Selection; Fault-free Images; Fault Injector; Testbench and simulator; Failing Images; Greedy Algorithm Analysis; Post-processing (e.g., summing and ranking); Fault Candidate List in Top 10.
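The randomness criterion used for test sequence selection can be checked with a few lines of Python. The sketch below only evaluates the criterion (signal-1 frequency per flip-flop within a range such as [0.3, 0.7]); how sequences are actually picked is not specified here, and the function names and the four sample images are hypothetical.

```python
def ones_frequency(images):
    """Signal-1 frequency of each flip-flop over a set of snapshot
    images (equal-length bit strings, one character per flip-flop)."""
    n_images = len(images)
    n_flops = len(images[0])
    return [sum(img[i] == "1" for img in images) / n_images
            for i in range(n_flops)]


def fraction_in_range(freqs, lo=0.3, hi=0.7):
    """Fraction of flip-flops whose signal-1 frequency lies in [lo, hi]."""
    return sum(lo <= f <= hi for f in freqs) / len(freqs)


# Hypothetical 4-bit snapshot images.
images = ["1010", "1100", "0110", "0011"]
freqs = ones_frequency(images)
print(freqs)                     # [0.5, 0.5, 0.75, 0.25]
print(fraction_in_range(freqs))  # 0.5
```

A selection procedure could, for example, keep adding or swapping snapshot images while this fraction improves, but that policy is an assumption and not something described in the setup above.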

The experiments are performed on 4 practical designs: GCD, FIR, Montgomery Inverse and Viterbi decoder. These designs are all written in Verilog and synthesized into gate-level netlists. The GCD design computes the greatest common divisor of two given natural numbers. The FIR circuit is a digital finite impulse response filter. The Montgomery Inverse is a 32-bit integer counter. The Viterbi circuit is a channel decoder that extracts the original bit stream from the received bit stream at the receiver side of a communication system. The experimental setup parameters for these 4 designs are shown in Table 5.1.

Table 5.1 Test circuits information and experiment setup parameters

test application time. After the flush test, the failing signatures tell us which scan chains are faulty and how they misbehave. We can use this information to focus on the small number of scan chains that fail the flush test, and then use a proper diagnosis mechanism to locate the exact faulty flip-flops in those chains. The 4 test cases are not large million-gate designs, but each can be regarded as a basic block that contains a complete scan chain under diagnosis. In the experiments below, we assume that only one scan chain exists in each test design. The underlying assumption is that the flip-flops inside a small sub-design are likely to be connected in the same scan chain of the whole chip because of layout proximity. In general, a design with multiple scan chains is relatively easier to diagnose because neighboring scan chains are likely to be fault-free and can serve

consider the fault-free and faulty core logic conditions per the three experiment sets. Finally we will also discuss the intermittent faults effects with the diagnosis

Before presenting the experimental results, we define some terms used in the following summary:

(1) Size: the total gate count of the design.

(2) Scan FF’s: the total number of flip-flops in the scan chain to be diagnosed.

(3) Success rate: the rate at which the faulty flip-flop is included in the top 10 candidates.
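The success rate defined in item (3) can be computed as below. This is a minimal sketch assuming each experiment yields a ranked candidate list and one injected faulty flip-flop; the function name `success_rate` and the sample data are hypothetical.

```python
def success_rate(experiments, top_n=10):
    """Fraction of experiments in which the injected faulty flip-flop
    appears among the first top_n ranked candidates.
    experiments -- list of (ranked_ff_indices, faulty_ff) tuples"""
    hits = sum(1 for ranked, faulty in experiments if faulty in ranked[:top_n])
    return hits / len(experiments)


# Hypothetical results of two experiments.
runs = [([5, 6, 11, 12, 13], 6), ([1, 2, 3], 9)]
print(success_rate(runs))   # 0.5
```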
