Jump scan: a DFT technique for low power testing


Jump Scan: A DFT Technique for Low Power Testing

Min-Hao Chiu and James C.-M. Li

Laboratory of Dependable Systems, GIEE

Electrical Engineering Department, National Taiwan University

cmli@cc.ee.ntu.edu.tw

Abstract

This paper presents a Jump scan technique (or J-scan) for low power testing. The J-scan shifts two bits of scan data per clock cycle so the scan clock frequency is halved without increasing the test time. The experimental data show that the proposed technique effectively reduces the test power by two thirds compared with the traditional MUX scan. The presented technique requires very few changes in the existing MUX-scan design for testability methodology and needs no extra computation. The penalties are area overhead and speed degradation.

1. Introduction

Circuit power dissipation in test mode is much higher than the power dissipation in function mode [Zorian 93]. One possible reason is that automatic test pattern generators (ATPG) try to activate as many faults as possible to minimize the test application time [Wang 97]. Low power design for testability (DFT) techniques have been gaining importance [Girard 02]. The first advantage of low power DFT techniques is that they avoid the risk of damaging the circuit under test (CUT). High temperature and high current in test mode not only cause catastrophic damage at the time of testing but also accelerate reliability failures (such as electromigration). Low power DFT techniques also save the cost of expensive packages or external cooling devices for heat dissipation. In addition, low power DFT techniques enable parallel testing of multiple cores in a system on a chip (SOC). Power consumption in test mode is one of the major constraints when scheduling tests for multiple cores [Chou 94]. By applying low power DFT techniques, many cores can be tested at the same time and hence the overall SOC test time is reduced. Last, low power DFT techniques prevent on-chip power integrity problems in test mode. High current in test mode results in excessive Vdd drop or ground bounce, which may cause the CUT to malfunction. Low power DFT techniques ensure correct operation of the CUT in test mode.

This paper presents the Jump-scan (or J-scan) DFT technique for low power testing. As opposed to traditional MUX-scan chains, which shift one bit per clock cycle, the proposed J-scan chains shift two bits per clock cycle. J-scan halves the clock frequency without increasing the test time. This is achieved by modifying the scan cells and adding an extra routing path for scan signals. JQN-scan is an enhanced version of J-scan that adds the QN-scan toggle suppression technique. The simulation results show that JQN-scan saves 67% of the test power on average compared with the traditional MUX scan. The proposed technique has two important applications. The first application is parallel testing of multiple cores on an SOC, because the JQN-scan test power can be even lower than the power in function mode. Alternatively, J-scan can be applied to double the test data rate and halve the test time.

In addition to test power reduction, J-scan has the following advantages. First, J-scan is compatible with the existing MUX-scan DFT methodology. Neither extra computation nor special ATPG is needed to implement J-scan. Second, the J-scan technique needs no modification to the clock trees, which avoids the risk of clock skew. Third, the JQN-scan technique is applicable to delay fault testing as well as stuck-at fault testing. The costs of J-scan include area overhead and speed degradation.

The organization of this paper is as follows. Section two introduces the background knowledge of low power testing and reviews past publications in this area. The third section explains our J-scan DFT technique in detail. Section four shows the experimental data collected on ISCAS’89 benchmark circuits. The fifth section discusses some issues related to the presented idea. And finally the last section concludes the paper.


2. Background

2.1 Power Dissipation

The dynamic power dissipation of CMOS circuits can be classified into two major components: the short circuit power and the switching power. The former is caused by the temporary short circuits at the moment of signal transition, when both PMOS and NMOS transistors are turned on for a short period of time. The short circuit power can be calculated by eq. 1. $E_i$ is the energy consumed per transition of gate i. $TR_i$ is the toggle rate of the output of gate i. $E_i$ is usually provided by the ASIC vendor and $TR_i$ is usually obtained by simulation. The short circuit power is consumed by the combinational logic and the sequential circuits, such as scan flip-flops.

$$P_{SC} = \sum_{\text{for all gate } i} (TR_i \cdot E_i) \qquad \text{(eq. 1)}$$
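As a minimal sketch of how eq. 1 is evaluated, the Python snippet below sums the product of toggle rate and energy per transition over a list of gates. The function name and the numeric values are hypothetical placeholders chosen for illustration; they are not vendor or simulation data from this paper.

```python
from typing import Iterable, Tuple


def short_circuit_power(gates: Iterable[Tuple[float, float]]) -> float:
    """Evaluate eq. 1: the sum over all gates of TR_i * E_i.

    Each entry is (TR_i, E_i): the toggle rate in transitions per second
    and the energy per transition in joules, so the result is in watts.
    """
    return sum(tr * e for tr, e in gates)


# Hypothetical values for illustration only (not taken from the paper).
example_gates = [(1.0e6, 2.0e-12), (5.0e5, 4.0e-12)]
p_sc = short_circuit_power(example_gates)
print(f"P_SC = {p_sc * 1e6:.1f} uW")
```

In practice the toggle rates would come from gate-level simulation and the energies from the cell library, as described above.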

The switching power is consumed by charging and discharging of the capacitors. It can be calculated by eq. 2. $C_{load,i}$ is the total load capacitance connected to net i. $C_{load}$ can be extracted from the physical layout or estimated by the synthesis tool. $TR_i$ is the toggle rate on net i, which can be a gate internal node or a piece of interconnection wire. Again, the TR can be obtained by simulation. The switching power is consumed by signals as well as clocks. The clock power dissipation is a significant component of the switching power because the clock network is heavily loaded.

$$P_{SW} = \frac{V_{DD}^2}{2} \cdot \sum_{\text{for all net } i} (TR_i \cdot C_{load,i}) \qquad \text{(eq. 2)}$$

2.2 Past Research

Past research in low power testing can be summarized as follows. Reordering the sequence of the scan cells to reduce the test power is proposed in [Dabholkar 98]. The problem with scan chain reordering is that the optimal order for one test set (e.g. a stuck-at fault test set) may not be optimal for another test set (e.g. a delay fault test set). Disabling or gating the clock of certain scan chains also helps to reduce the power [Sankaralingam 01] [Bonhomme 01] [Whetsel 00]. Disabling the clocks not only increases the risk of skew problems but also imposes some constraints on test pattern generation. Inserting gates (such as inverters, XOR and XNOR gates) into scan chains can minimize the toggling when scan chains shift [Sinanoglu 02]. This technique requires not only extra gates but also computation of the optimal positions at which to insert these gates. The toggle suppression technique separates the data outputs and scan outputs of scan cells [Hertwig 98] [Gerstendorfer 99]. By suppressing the data outputs, the power consumption in combinational circuits is reduced. However, skew-load delay fault testing cannot be applied because the data outputs are suppressed in scan mode. An improved toggle suppression technique, the quiet-noisy scan (or QN scan), is proposed for low power delay fault testing [Li 04]. The QN scan operation, which is composed of quiet scans followed by a noisy scan in the last cycle, makes skew-load delay fault testing possible.

3. J-scan Technique

3.1 J-scan DFF and J-scan chain

Figure 1 shows the structure of a J-scan DFF. It contains a negative latch (NL), a positive latch (PL) and two multiplexers. This is a master-slave implementation of a rising edge triggered flip-flop. This scan cell is called J-scan DFF for the ‘jumping behavior’ when test patterns are shifting in the scan chain. Compared with a Mux-scan (M-scan) DFF, the J-scan DFF has an additional Jump Input (JI) pin and an additional Jump Output (JO) pin. The negative latch (NL) is transparent when the clock is in negative phase; the positive latch (PL) is transparent when the clock is in positive phase. The two multiplexers are controlled by the scan enable (SE) signal. When the SE is de-asserted (function mode), Mux1 and Mux2 select the data input (DI) and the output of NL, respectively. In function mode, a J-scan DFF is the same as a rising edge triggered DFF. When the SE is asserted (scan mode), Mux1 and Mux2 select the SI and the JI, respectively. In the negative phase of the clock, the NL is transparent (from SI to JO) and the PL latches the data from JI. In the positive phase of the clock, the PL is transparent (from JI to SO) and the NL latches the data from SI. By doing so, the SI is shifted to the JO output and the JI is shifted to the SO output. In this figure, the Data Output (DO) and the Scan Output (SO) are shared. These two pins can be separated as long as the QN output of the PL is available.


Figure 2. Jump-scan chain (scan mode only)

Figure 2 illustrates a J-scan chain with four J-scan DFFs. The multiplexers, SE signals and DI signals are omitted for clarity. The J-scan DFFs are numbered in increasing order, from the scan input to the scan output of the chain. In a J-scan chain, two adjacent J-scan DFFs are connected by two wires. One of the routing paths is called the scan path (from SO to SI). The other routing path is called the jump path (from JO to JI). For example, the SO output of the first J-scan DFF (JSD1) is connected to the SI input of JSD2 as a scan path. The JO output of JSD1 is connected to the JI input of JSD2 as a jump path. Note that the scan input of the scan chain (Scan_In) is connected to both the JI and the SI of JSD1.

Figure 3 illustrates the Scan_In and the clock waveforms of M-scan and J-scan. The waveforms are divided into four time periods, marked from I to IV. The input data, A to D, are applied at the beginning of every time period. The clock frequency of the M-scan is two times higher than that of the J-scan. For the M-scan, every time period has a negative phase and a positive phase. For the J-scan, every time period has only one negative phase or one positive phase.

Figure 3. Waveforms of M-scan and J-scan (clock waveforms of both schemes and the Scan_In data A, B, C and D over time periods I to IV)

Table 1 shows the scan input data and the contents of all J-scan DFFs in every time period. Only two clock cycles of J-scan (instead of four clock cycles of M-scan) are needed to shift in four bits of test data. The latch contents that differ from those in the previous time period are underlined. Test data A and C shift via the thin lines in Figure 2; test data B and D shift via the bold lines. In time period IV, test data A to D (highlighted in gray) are settled in the positive latches of JSD4 to JSD1, respectively. The scan out waveforms are the same as the scan in waveforms shown in Figure 3. Note that a multiplexer (Mux3) has to be inserted between the last J-scan DFF and the Scan_Out of the scan chain (see Figure 2). In the negative phase of the clock, Mux3 selects SO of JSD4. In the positive phase of the clock, Mux3 selects JO of JSD4.

Table 1. Contents of J-scan chain

Time  ScanIn  NL1  PL1  NL2  PL2  NL3  PL3  NL4  PL4
I     A       A    -    -    -    -    -    -    -
II    B       A    B    -    A    -    -    -    -
III   C       C    B    B    A    A    -    -    -
IV    D       C    D    B    C    A    B    -    A
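The shifting behavior summarized in Table 1 can be reproduced with a small behavioral model. The sketch below is an illustration only, not the authors' simulator: it assumes that period I is a negative clock phase (as in Table 1), models unknown latch contents as `None`, and uses hypothetical names such as `simulate_jscan` and `initial_pl`. The scan-out part follows the Mux3 rule described above.

```python
from typing import List, Optional, Tuple

Bit = Optional[str]  # None models an unknown latch value


def simulate_jscan(scan_in_bits: List[str], n_cells: int,
                   initial_pl: Optional[List[Bit]] = None
                   ) -> Tuple[List[Tuple[List[Bit], List[Bit]]], List[Bit]]:
    """Shift bits through a J-scan chain, one bit per clock phase.

    Returns (history, scan_out): history[t] holds the (NL, PL) contents at
    the end of period t, and scan_out[t] is the Mux3 output in period t.
    """
    nl: List[Bit] = [None] * n_cells  # negative latches, drive JO
    pl: List[Bit] = list(initial_pl) if initial_pl else [None] * n_cells  # drive SO
    history, scan_out = [], []
    for period, bit in enumerate(scan_in_bits):
        if period % 2 == 0:
            # Negative phase: PLs hold, so Mux3 forwards SO of the last cell.
            scan_out.append(pl[-1])
            # NLs are transparent: NL1 <- Scan_In, NL(k+1) <- SO(k) = PL(k).
            nl = [bit] + pl[:-1]
        else:
            # Positive phase: NLs hold, so Mux3 forwards JO of the last cell.
            scan_out.append(nl[-1])
            # PLs are transparent: PL1 <- Scan_In, PL(k+1) <- JO(k) = NL(k).
            pl = [bit] + nl[:-1]
        history.append((list(nl), list(pl)))
    return history, scan_out


# Reproduce Table 1: four bits shifted into a four-cell chain.
history, _ = simulate_jscan(list("ABCD"), 4)
for label, (nl, pl) in zip(["I", "II", "III", "IV"], history):
    row = [x or "-" for pair in zip(nl, pl) for x in pair]  # NL1 PL1 NL2 PL2 ...
    print(label, " ".join(row))

# Scan out hypothetical captured responses r1..r4 while shifting in W..Z.
_, out = simulate_jscan(list("WXYZ"), 4, initial_pl=["r1", "r2", "r3", "r4"])
print(out)
```

Under these assumptions the model ends period IV with the positive latches holding D, C, B and A in JSD1 to JSD4, and the responses leave the chain one per clock phase, starting with the cell nearest Scan_Out.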

The penalties of the J-scan technique are area overhead and speed degradation, which are analyzed as follows. Compared with the M-scan chain, the J-scan chain requires larger scan cells and one extra routing path, the jump path. According to the TSMC 0.25 μm standard cell library, a J-scan DFF occupies 161.3 μm², which is 40% larger than the M-scan DFF. The cell area of the J-scan DFF is obtained by adding up the areas of its components as individual standard cells. Besides the scan cell area overhead, the J-scan chain needs one more routing path than the M-scan chain to shift the scan data. Although this additional routing can be long, the signal is not timing critical. By allowing a long delay for this additional signal, the area overhead can be minimized. As far as the speed degradation is concerned, the J-scan DFF has an additional Mux (i.e. Mux2 in Figure 1) inserted between the NL and PL latches. According to the TSMC 0.25 μm library, the delay of a Mux is about 270 ps. In scan mode, this extra delay is not significant since the CUTs are usually operated at a low frequency when scan chains are shifting. In function mode, Mux2 introduces a delay from the NL to the PL. This extra delay makes the propagation delay of the J-scan DFF larger than that of the M-scan DFF. The area and the delay overhead can be reduced if the J-scan DFF is laid out as a single customized cell.

Also note that the J-scan chain requires an even number of scan cells because two bits are shifted in a clock cycle. If the number of scan cells in the original chain is odd, one dummy scan cell has to be inserted at the Scan_In of the scan chain.
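The padding rule can be sketched in a few lines of Python. The helper names are hypothetical, and the sketch only restates the rule above: one dummy cell is added to an odd chain, and two bits move per J-scan clock cycle. The flip-flop count of s526 (21, from Table 2) is used purely as an example input.

```python
def padded_chain_length(n_cells: int) -> int:
    """Chain length after inserting one dummy cell when n_cells is odd."""
    if n_cells < 0:
        raise ValueError("n_cells must be non-negative")
    return n_cells + (n_cells % 2)


def jscan_shift_cycles(n_cells: int) -> int:
    """J-scan clock cycles needed to load one pattern (two bits per cycle)."""
    return padded_chain_length(n_cells) // 2


n = 21  # example: number of flip-flops in s526
print(padded_chain_length(n), jscan_shift_cycles(n))  # 22 11
```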


Figure 5. Waveforms of delay fault testing using JQN-scan (signals CLK, SE, Reset, SI and DO over the phases reset, quiet scan for N/2 cycles, noisy scan in cycle N/2+1 with patterns P1 and P2 at DO, capture with the system clock, and quiet scan)

3.2 JQN-scan DFF

The J-scan low power technique can be applied together with the quiet-noisy scan technique to further reduce the test power. Figure 4 shows the structure of a JQN-scan DFF. Compared to the J-scan DFF, the JQN-scan DFF has an extra reset pin. In addition, the scan output (SO) pin and the data output (DO) pin are now separated. The JQN-scan DFF is bigger than the J-scan DFF because the former has two additional NOR gates and one extra inverter. The cell area of the JQN-scan DFF in the TSMC 0.25 μm library is 201.6 μm² (75% larger than an M-scan DFF without reset, or 30% larger than an M-scan DFF with reset).

Figure 4. JQN-scan DFF

When both the reset and SE are high, the JQN-scan DFF operates in the same way as the J-scan DFF except that the output of the DO pin is tied to logic zero. This is called the quiet scan mode because the toggle activity in the combinational logic is suppressed. When the reset is zero and the SE is one, the JQN-scan DFF operates in the same way as the M-scan DFF in scan mode. This is called the noisy scan mode because the output of DO is not suppressed to zero. As opposed to the quiet scan mode, which shifts two bits per clock, the noisy scan mode shifts only one bit per clock.

Figure 5 shows the waveforms of delay fault testing using the JQN-scan technique. After resetting the circuit, the SE and reset signals are both asserted and the test pattern is quietly scanned in. During the quiet scan, the clock frequency is halved and the DO outputs are always zero. After N/2 cycles of quiet scan (N equals the total number of DFFs in the chain), the reset signal is de-asserted and the pattern P1 appears at DO. In the (N/2+1)th cycle, the circuit is in the noisy scan mode and pattern P2 appears at DO. Then the SE is de-asserted so the scan cells are in function mode. The responses of the circuit are captured in the flip-flops. Finally, the responses are quietly scanned out. By applying the quiet-noisy scan, skew-load two-pattern tests are possible. The JQN-scan reduces the test power and, at the same time, preserves the delay fault coverage. Please see [Li 04] for more details about the quiet-noisy scan technique.
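The mode selection and the test sequence described above can be summarized in a short behavioral sketch. It is an illustration under stated assumptions, not the cell's actual logic. The function names are hypothetical, the combination SE = 0 with reset = 1 is not described in the text and is therefore treated as unspecified, and the capture phase and the final scan-out are assumed to take one cycle and N/2 cycles, respectively.

```python
from typing import List, Tuple


def jqn_mode(se: int, reset: int) -> str:
    """Map the SE and reset pins to the operating mode described in the text."""
    if se and reset:
        return "quiet scan"   # two bits per clock, DO forced to logic zero
    if se:
        return "noisy scan"   # one bit per clock, DO follows the cell contents
    if reset:
        return "unspecified"  # assumption: SE=0, reset=1 is not covered by the text
    return "function"


def do_output(se: int, reset: int, cell_value: int) -> int:
    """Value seen at the DO pin; it is suppressed only in quiet scan mode."""
    mode = jqn_mode(se, reset)
    if mode == "unspecified":
        raise ValueError("DO behavior for SE=0, reset=1 is not described")
    return 0 if mode == "quiet scan" else cell_value


def delay_test_phases(n_cells: int) -> List[Tuple[str, int, int, int]]:
    """(phase, SE, reset, clock cycles) for one skew-load two-pattern test."""
    if n_cells % 2:
        raise ValueError("a J-scan chain needs an even number of cells")
    half = n_cells // 2
    return [
        ("quiet scan in", 1, 1, half),   # P1 is loaded, DO stays at zero
        ("noisy scan", 1, 0, 1),         # one more shift launches P2
        ("capture", 0, 0, 1),            # assumed: one system clock cycle
        ("quiet scan out", 1, 1, half),  # assumed: N/2 cycles, as for scan in
    ]


for phase, se, reset, cycles in delay_test_phases(4):
    print(f"{phase:15s} SE={se} reset={reset} cycles={cycles} mode={jqn_mode(se, reset)}")
```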

4. Experimental Results

4.1 Power Dissipation

Table 2 lists the power dissipation of four different versions of the ISCAS’89 benchmark circuits. The non-scan version is obtained by mapping the benchmark circuits to the TSMC 0.25 μm standard cell library. The M-scan version is generated by changing the non-scan flip-flops to traditional M-scan DFFs, which are chained into one single scan chain. The M-scan DFFs are then replaced by either J-scan DFFs or JQN-scan DFFs. The frequency of the scan clock is 10 MHz in the simulation. The system clock is of the same rate as the scan clock. The simulations are performed in a bit-by-bit shifting way so that the circuit activity is accurately modeled. The absolute power dissipation is shown in microwatts. On average, the power reductions of J-scan and JQN-scan are 39% and 67%, respectively, with respect to the M-scan. The power of the non-scan versions is also shown for reference. Because the ISCAS benchmark circuits have no functional test patterns, the power of the non-scan versions is regarded as the power in function mode. On average, and for each of the four larger circuits, the test power of JQN-scan is even lower than the power in function mode. The gate count (G) and the flip-flop count (FF) are shown for reference.
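The averages and the reduction percentages can be recomputed from the per-circuit values listed in Table 2. The sketch below is only a consistency check with hypothetical helper names; it assumes that the reported reductions are ratios of the column averages, which is what reproduces the 39% and 67% figures.

```python
# Power in microwatts from Table 2: (M-scan, J-scan, JQN-scan, non-scan).
TABLE2 = {
    "s526":   (229.0, 149.5, 114.9, 89.5),
    "s1494":  (582.7, 355.0, 275.0, 220.0),
    "s5378":  (1731.5, 1080.1, 700.0, 979.9),
    "s9234":  (2822.6, 1766.2, 833.4, 1303.3),
    "s15850": (5447.5, 3522.2, 2046.7, 2950.6),
    "s38417": (19110.8, 11378.2, 5909.8, 7142.7),
}


def column_means(table):
    """Average each of the four power columns over all circuits."""
    n = len(table)
    return tuple(sum(row[k] for row in table.values()) / n for k in range(4))


def reduction(baseline: float, value: float) -> float:
    """Fractional power saving of `value` relative to `baseline`."""
    return 1.0 - value / baseline


m_avg, j_avg, jqn_avg, non_avg = column_means(TABLE2)
print(f"J-scan reduction:   {reduction(m_avg, j_avg):.0%}")    # 39%
print(f"JQN-scan reduction: {reduction(m_avg, jqn_avg):.0%}")  # 67%

# Circuits whose JQN-scan test power is below the non-scan (function mode) power.
below_function = [name for name, row in TABLE2.items() if row[2] < row[3]]
print(below_function)
```

The last list contains the four larger circuits; for s526 and s1494 the JQN-scan power remains above the non-scan power.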

Table 2. Power Dissipation (μW)

CUT      G       FF     M-scan    J-scan    JQN-scan  Non-scan
s526     193     21     229.0     149.5     114.9     89.5
s1494    647     6      582.7     355.0     275.0     220.0
s5378    2,779   179    1,731.5   1,080.1   700.0     979.9
s9234    5,597   211    2,822.6   1,766.2   833.4     1,303.3
s15850   9,772   534    5,447.5   3,522.2   2,046.7   2,950.6
s38417   22,179  1,636  19,110.8  11,378.2  5,909.8   7,142.7
Average                 4,987.4   3,041.9   1,646.6   2,114.3

To further analyze the power consumption, Figure 6 shows the breakdown of the power dissipation for s9234. The power dissipation comprises three major components. The components marked as “SC COMB” are the short circuit power dissipated within the combinational logic cells. The components marked as “SC SEQ” are the short circuit power dissipated within the sequential cells (i.e., flip-flops or latches). The “SW” components represent the switching power dissipated when charging and discharging the capacitors connected to interconnect wires. The J-scan effectively reduces all three types of power because of the halved clock rate. The JQN-scan further reduces the SC power in the combinational logic and the switching power.

Figure 6. Power Dissipation of s9234 (bar chart of power in μW for the M-scan, J-scan, JQN-scan and non-scan versions, each split into SC COMB, SC SEQ and SW; SC = short circuit power, SEQ = sequential cells, COMB = combinational cells, SW = switching power of wires)

4.2 Area Overhead

Table 3 shows the area overhead of the benchmark circuits. The first two columns show the area overhead of the J-scan and JQN-scan versions with respect to their M-scan versions. Over all benchmark circuits, the average area overheads of the J-scan and JQN-scan are 12.8% and 23.9%, respectively. Because the J-scan and JQN-scan DFFs are not available in the library, their areas are estimated by multiplying the area of M-scan DFFs by 1.4 and 1.75, respectively. The third column shows the area overhead of JQN-scan with respect to the resettable M-scan versions. This is a fair comparison because the JQN-scan versions support a reset mode. The area overhead of the JQN-scan versions is only 9.5% compared to the resettable M-scan versions.

Table 3. Area overhead of J-scan and JQN-scan

CUT      J-scan   JQN-scan   JQN-scan (w.r.t. resettable M-scan)
s526     14.4%    27.0%      10.7%
s1494    1.7%     3.3%       1.3%
s5378    13.2%    24.7%      9.7%
s9234    8.8%     16.4%      6.5%
s15850   11.7%    21.9%      8.7%
s38417   14.3%    26.8%      10.6%
Average  12.8%    23.9%      9.5%
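Because the J-scan and JQN-scan cell areas are estimated by scaling every M-scan DFF by a fixed factor, the overhead of a circuit is the flip-flop share of its M-scan area multiplied by the cell growth. The sketch below illustrates this relation for the first two columns of Table 3. The function names are hypothetical, and the flip-flop area share is back-calculated from the table rather than reported in this paper.

```python
J_SCAN_RATIO = 1.40    # J-scan DFF area relative to an M-scan DFF
JQN_SCAN_RATIO = 1.75  # JQN-scan DFF area relative to an M-scan DFF


def area_overhead(ff_area_share: float, cell_ratio: float) -> float:
    """Overhead w.r.t. the M-scan circuit when only the scan cells grow.

    ff_area_share is the fraction of the M-scan circuit area occupied by DFFs.
    """
    return ff_area_share * (cell_ratio - 1.0)


def ff_share_from_overhead(overhead: float, cell_ratio: float) -> float:
    """Invert area_overhead to recover the flip-flop area share."""
    return overhead / (cell_ratio - 1.0)


# s526: a 14.4% J-scan overhead implies that DFFs occupy about 36% of the area.
share = ff_share_from_overhead(0.144, J_SCAN_RATIO)
print(f"flip-flop share of s526: {share:.0%}")
print(f"predicted JQN-scan overhead: {area_overhead(share, JQN_SCAN_RATIO):.1%}")
```

For s526 the predicted JQN-scan overhead agrees with the 27.0% entry in Table 3, which suggests that the two columns differ only by the cell scaling factor.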

4.3 Comparison against Other Techniques

Table 4 compares J-scan and JQN-scan against four other low power DFT techniques. The power reduction percentages are obtained by taking the average over all the cases in the original papers. JQN-scan has the highest power reduction percentage among all the techniques. The second column shows whether the DFT technique supports delay fault testing. The first four techniques do not consider delay fault testing in their original papers. These techniques may be applicable to delay fault testing but would probably need extra work and modifications. The J-scan and JQN-scan support delay fault testing. The third column shows the hardware overhead. Overall, the J-scan and JQN-scan are effective low power DFT techniques compared with the previous techniques.

Table 4. Comparison of low power techniques

Techniques                   Power Reduction  Delay test?  HW overhead
Reorder [Dabholkar 98]       18%              NA           0
Disable [Sankaralingam 01]   23%              NA           disable circuitry
Gated Clock [Bonhomme 01]    40%              NA           extra clock
Insert Gate [Sinanoglu 02]   12%              NA           3.3%
J-scan                       39%              Yes          12.8%
JQN-scan                     67%              Yes          9.5-23.9%

5. Discussions

5.1 Double Edge Trigger Scan FF

Double edge trigger (DET) FFs are often used in low power circuit design [Afghahi 91]. DET FFs change their outputs at both positive and negative clock edges. As far as the test power is concerned, DET consumes more power than J-scan because the outputs of DET FFs toggle twice per clock cycle but the outputs of J-scan FFs toggle only once per clock cycle.


In terms of the speed degradation in function mode, the conventional DET FF has longer setup time and propagation time than a single edge trigger FF [Llopis 96]. The J-scan FFs introduce only propagation time degradation, not setup time degradation, in function mode compared to M-scan FFs. Furthermore, the DET designs are vulnerable to clock skew problems in function mode because of the double active edges. Timing checks have to be performed carefully to avoid timing violations at both clock edges. The J-scan, on the contrary, does not require extra timing checks in function mode for the inactive clock edge. The cell area of a DET FF is approximately the same as that of a J-scan DFF, but the latter requires an extra routing path for the jump scan path.

5.2 Double Data Rate Scan

If the test time, instead of the test power, is the bottleneck of the testing, the proposed techniques can also be used to reduce the test time. The double data rate scan (or DDR scan) is achieved by testing the circuits with JQN-scan chains at the same clock frequency as the M-scan. Since two scan data bits are shifted in one clock period, the DDR scan doubles the scan data rate and hence saves 50% of the test time. There are, however, three issues to be addressed before applying the DDR scan. First, the scan data now have only half a clock cycle to propagate, so scan paths and jump paths have to be routed carefully. Second, the ATE has to support double data rate scan input and scan output test channels. The last concern is the power dissipation of the DDR scan. For the ISCAS benchmark circuits, the average DDR scan power consumption is about 65.7% of that of the M-scan (same clock frequency).
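The test time relation among the three scan schemes can be illustrated with a short calculation. This is a sketch under assumptions: the helper name and the chain length are hypothetical, the 10 MHz value is the scan clock quoted for the simulations, and the low power J-scan is assumed to run at exactly half of that frequency.

```python
def shift_time_s(n_bits: int, clock_hz: float, bits_per_cycle: int) -> float:
    """Time in seconds to shift n_bits through one scan chain."""
    cycles = -(-n_bits // bits_per_cycle)  # ceiling division
    return cycles / clock_hz


F_SCAN = 10e6  # scan clock in Hz
N_BITS = 212   # hypothetical even chain length

m_scan = shift_time_s(N_BITS, F_SCAN, 1)      # one bit per cycle
j_scan = shift_time_s(N_BITS, F_SCAN / 2, 2)  # two bits per cycle, halved clock
ddr = shift_time_s(N_BITS, F_SCAN, 2)         # two bits per cycle, same clock

print(f"M-scan:   {m_scan * 1e6:.1f} us")
print(f"J-scan:   {j_scan * 1e6:.1f} us")
print(f"DDR scan: {ddr * 1e6:.1f} us")
```

Under these assumptions the low power J-scan matches the M-scan shift time, while the DDR scan halves it.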

6. Summary

The presented J-scan and JQN-scan techniques reduce the test power of a traditional M-scan by 39% and 67%, respectively. The advantages of the proposed low power testing technique include (1) minimal impact on the existing DFT/ATPG flow, (2) no increase in test time, and (3) no change in clock tree design. The proposed technique is also applicable to delay fault testing. The penalties of the J-scan include the area overhead of the scan DFF, one extra routing path and speed degradation.

Acknowledgement

This research is supported by the National Science Council of Taiwan under contract number NSC93-2220-E-002-012.

References

[Afghahi 91] M. Afghahi and J. Yuan, “Double Edge Trigger D-Flip-Flop for High-Speed CMOS Circuits,” IEEE J. of Solid-State Circuits, pp. 1168-1170, August 1991.

[Bonhomme 01] Y. Bonhomme, P. Girard, L. Guiller, C. Landrault, S. Pravossoudovitch, “A Gated Clock Scheme for Low Power Scan Testing of Logic ICs or Embedded Cores,” Proc. 10th Asian Test Symp., pp. 253-258, 2001.

[Chou 94] R.M. Chou, K.K. Saluja and V.D. Agrawal, “Power Constraint Scheduling of Tests,” IEEE Int. Conf. on VLSI Design, pp. 271-274, 1994.

[Dabholkar 98] V. Dabholkar, S. Chakravarty, I. Pomeranz, S. Reddy, “Techniques for Reducing Power Dissipation During Test Application in Full Scan Circuits,” IEEE Trans. Computer-Aided Design, Vol. 17, No. 12, pp. 1325-1333, Dec. 1998.

[Gerstendorfer 99] S. Gerstendorfer, H.J. Wunderlich, “Minimized Power Consumption for Scan-Based BIST,” Proc. IEEE Int’l Test Conf., pp. 77-84, 1999.

[Girard 02] P. Girard, “Survey of Low-Power Testing of VLSI Circuits,” IEEE Design and Test of Computers, pp. 82-92, May-June 2002.

[Hertwig 98] A. Hertwig and H.J. Wunderlich, “Low Power Serial Built-in Self Test,” Proc. 3rd European Test Workshop, pp. 49-53, 1998.

[Li 04] J. C.-M. Li, “A Design for Testability Technique for Low Power Delay Fault Testing,” IEICE Transactions on Electronics, Vol. E87-C, No. 4, pp. 621-628, April 2004.

[Llopis 96] R.P. Llopis, M. Sachdev, “Low Power, Testable Dual Edge Triggered Flip-Flops,” Int’l Symp. on Low Power Electronics and Design, pp. 341-345, 1996.

[Sankaralingam 01] R. Sankaralingam, B. Pouya and N. A. Touba, “Reducing Power Dissipation During Test Using Scan Chain Disable,” Proc. 19th IEEE VLSI Test Symp., pp. 319-324, 2001.

[Sinanoglu 02] O. Sinanoglu, I. Bayraktaroglu, and A. Orailoglu, “Test Power Reduction through Minimization of Scan Chain Transitions,” Proc. 20th IEEE VLSI Test Symp., 2002.

[Wang 97] S. Wang and S.K. Gupta, “DS-LFSR: A New BIST TPG for Low Heat Dissipation,” Proc. Int’l Test Conf., pp. 848-857, 1997.

[Whetsel 00] L. Whetsel, “Adapting Scan Architectures for Low Power Operation,” Proc. IEEE Int’l Test Conf., pp. 863-872, 2000.

[Zorian 93] Y. Zorian, “A Distributed BIST Control Scheme for Complex VLSI Design,” Proc. 11th IEEE VLSI Test Symp., pp. 4-9, 1993.