On Reducing Test Power, Volume and Routing Cost by Chain Reordering and Test Compression Techniques


IEICE TRANS. ELECTRON., VOL.E93–C, NO.3 MARCH 2010, p. 369

PAPER. Special Section on Circuits and Design Techniques for Advanced Large Scale Integration

Chia-Yi LIN†a), Li-Chung HSU††, Nonmembers, and Hung-Ming CHEN†, Member

Manuscript received June 19, 2009. Manuscript revised September 29, 2009.
† The authors are with the Department of Electronics Engineering, National Chiao Tung University, Hsinchu, Taiwan.
†† The author is with Springsoft Co., Taiwan.
a) E-mail: chiayi@vda.ee.nctu.edu.tw
DOI: 10.1587/transele.E93.C.369
Copyright © 2010 The Institute of Electronics, Information and Communication Engineers

SUMMARY With the advancement of VLSI manufacturing technology, entire electronic systems can be implemented in a single integrated circuit. Due to the complexity of SoC design, circuit testability has become one of the most challenging issues. Without careful planning in Design For Testability (DFT), circuits consume more power in test mode than in normal functional mode. This elevated test power may cause problems including overall yield loss and instant circuit damage. In this paper, we present two approaches to minimize scan-based DFT power dissipation. The first methodology includes routing cost consideration in scan chain reordering after cell placement, while the second methodology provides test pattern compression for lower power. We formulate the first problem as a Traveling Salesman Problem (TSP), with a cost evaluation different from [18], [19], and apply an efficient heuristic to solve it. For the second problem, we provide a selective scan chain architecture and a simple yet effective encoding scheme for lower scan test power dissipation. The experimental results on ISCAS’89 benchmarks show that the first methodology obtains up to 10% average power saving under the same low routing cost compared with a recent result in [19]. The second methodology reduces test power by over 17% compared with filling all don’t care (X) bits with 0 in one of the ISCAS’89 benchmarks. We also provide the integration flow of these two approaches in this paper.

key words: DFT, TSP, test power

1. Introduction

In the System-on-Chip (SoC) era, chip design and testing engineers have encountered more and more new design challenges. Due to the need for DFT, modern design has made external testing more difficult than before. During scan testing, power dissipation is the critical issue because test vectors require a large number of shift operations and cause high transition activity in the circuit [1]. In fact, circuits may consume twice or more power in test mode than in normal functional mode. The scan power issue may cause several problems, including increased product cost, reduced circuit reliability, instant circuit damage, decreased overall yield, and reduced autonomy in portable systems [2], [20]. Some works [34], [35], [37] formulate the problem in terms of IR drop and apply pattern filling techniques to deal with it. When dealing with high-performance modern ASICs and SoCs, a nondestructive test must satisfy all the power constraints defined in the design phase. In the past, because tests needed only to cover stuck-at faults, they typically ran at a lower speed than the normal circuit working frequency. Moreover, scan-based architectures consume a lot of power because each test vector requires a shifting operation to initialize scan cells and evaluate test responses.

Several approaches have been proposed to reduce average power during scan testing. It is feasible to reduce test power consumption by lowering the scan frequency, but this increases test application time and is not suitable for high-complexity, advanced SoC designs that may need at-speed testing. Low power Automatic Test Pattern Generation (ATPG) algorithms consider both fault coverage and low power [21], but this approach needs more test vectors to achieve the same fault coverage as traditional ATPG, and it also increases test time. Test power reduction by chaining scan cells in a low power order is proposed in [22], [23]. However, [22] cannot guarantee short scan chain interconnection length and may cause congestion problems during scan routing. In order to avoid routing congestion, [18] constrains the power-optimized scan chain connection length by partitioning the chip into several regions and chaining the scan cells in a low power order within each region. Since this approach partitions the chip by geographical criteria, the number of scan cells in each region may not be well distributed. When there are few scan cells in some cluster, this cluster will suffer from a poor power optimization ratio. [19] further proposes to partition the chip by balancing the number of scan cells in each region and achieves more power reduction than [18]. Although [18] and [19] shorten the scan chain connection length by partitioning the chip, they still have an impact on routing wire length. Moreover, the stronger the constraint on the longest scan chain length is, the larger the number of clusters is, and the larger the number of clusters becomes, the smaller the test power reduction is.

On the other hand, in addition to the scan chain reordering approach, test data compression can be another way to achieve low power scan testing. In [3], [24], [25], test data compression is used for test volume and test application time reduction. Moreover, [4], [8]–[10] use various test data compression techniques for reducing scan power, test volume, and testing time. [5]–[8] apply the multiple scan chain concept to test power reduction. [4], [9], [10], [15] use test resource partitioning techniques, including Huffman coding, Golomb coding and alternating run-length coding, for test data compression. However, the overhead of the extra circuitry, such as finite state machine generation and the resulting circuit area, is considerably high. In [36], a mask-based XOR network with the EDT [38] test structure is used to reduce switching activity.

In this paper, we propose a methodology to reduce test power consumption by considering routing congestion in the scan-based architecture and by selective test data compression. For routing congestion consideration in scan chain design, we formulate the problem as a traveling salesman problem (TSP) with a cost function different from previous approaches [18], [19], and discuss the tradeoff between power and routing overhead. Although the approaches in [18], [19] can reduce routing overhead when chaining scan cells with a power-driven strategy and design partitioning, they may omit some good choices of scan cell pairs which have both a low transition number and a short connection length. Since we do not constrain power-driven chaining of scan cells to specific regions, our approach can achieve up to 10% power saving under the same routing cost in the s9234 benchmark compared with [18]. Our experimental result is also better than that of [19], which has a 1–3% improvement over [18]. We have also obtained at most 57% routing cost improvement under the same test power consumption. With the test data compression technique for further test power reduction, we obtain on average over 11% power reduction with small decoder circuit overhead, compared with the test power at the original test pattern length. The test volume reduction rate is 37% on average for six ISCAS’89 benchmarks. Our new scan architecture is easy to implement with synthesis tools. We further integrate these two approaches and obtain a better power reduction percentage, compared with filling don’t care (X) bits with 0 in the original test patterns, while also taking shorter scan chain wire length into consideration.

The remainder of this paper is organized as follows. In Sect.
2, we briefly describe the power-optimized scan chaining proposed in [22], the routing-constraint-driven approaches to low power scan chains (LPSC) proposed in [18], [19], and the test data compression techniques for low power in [8]–[10]. In Sects. 3 and 4, we present our approaches for considering both power and routing constraints, and our new scan architecture and selective test data compression techniques. In Sect. 5, we show the experimental results, and we conclude the paper in Sect. 6. This paper is an extended version of [14].

2. Low Power Scan Testing by Chain Design Considering Routing Cost and Test Data Compression

In the IC design flow, scan cells and testing circuits are inserted after the synthesis procedure. The scan chain connection is broken before the placement phase to prevent the scan chain from having a great impact on routing congestion. The scan cells are then reordered with a layout-driven chaining algorithm. As for scan chain reordering, there have been many papers about shortening scan chain length and reducing scan chain routing congestion [26], [27]. As chip complexity and operating frequency evolve dramatically, scan chain reordering should not only focus on area overhead reduction, but should also take care of power consumption, as in [28]. In addition to scan chain ordering schemes, test data compression is an effective approach to reduce test power and test volume at the same time. We review some low-power-driven scan chain ordering methods in [18], [19], [22], [28] and some test data compression techniques for low power [8]–[10].

Fig. 1 (a) Example set of scan-in vectors and output responses. (b) Corresponding weighted graph. Nodes represent scan cells and edges represent connections between nodes. The edge weight is the bit difference between two scan cells.
2.1 Power Driven Scan Chain Reordering

The dynamic power is $P = \sum_i \frac{1}{2} C_{l_i} \, f \, V_{dd}^2 \, S_i$, where $C_{l_i}$ is the equivalent output capacitance, which is strongly correlated with the fan-out number, $f$ is the clock frequency, $V_{dd}$ is the power supply voltage, and $S_i$ is the switching probability. According to the equation, we can reduce test power by decreasing the number of scan transition activities. In [22], a low-power-driven scan chain ordering approach applies a heuristic algorithm to minimize scan chain power. The algorithm uses the test data generated by the scan cell insertion tool and the automatic test pattern generator. The procedure is as follows. First, we construct a complete undirected graph: each vertex represents a scan flip-flop and each edge represents a possible connection between two scan cells. The weight of each edge is the total number of bit differences between the two scan cells. Figure 1(a) illustrates scan flip-flops and scan vectors. We calculate the bit differences between each pair of flip-flops over the scan-in vectors and output responses. For example, in Fig. 1(a): d(ff1,ff2)=6, d(ff1,ff3)=3, d(ff1,ff4)=2, d(ff2,ff3)=5, d(ff2,ff4)=4, d(ff3,ff4)=5. Figure 1(b) shows the corresponding weighted graph. Second, we use a greedy algorithm to find a low-cost Hamiltonian cycle. From this low-cost cyclic solution, we estimate the power for each flip-flop by determining the scan-in and scan-out ports and find a minimum-cost cutting edge. The solution is a near-optimal scan cell order with a much lower transition number during scan chain shifting.

2.2 Routing Constrained Low Power Scan Chains

The power-driven scan chain reordering approach from the previous subsection has drawbacks, mainly in creating routing congestion and long scan connections in the design. To illustrate this point, Fig. 2 shows the routing result of the s9234 benchmark (which has 211 flip-flops) with power-driven scan chain reordering. In Fig. 2, nodes represent the positions of scan cells in the design and edges are connections between scan cells. Although the power-driven approach can efficiently reduce power (27% power reduction compared to length-driven scan chain reordering), the routing result is not optimal and has quite high routing congestion.

Fig. 2 Example of power driven chaining of scan cells for s9234 obtained from [22]. It contains high routing congestion.

In order to reduce the routing overhead of power-driven scan chain ordering, [18] proposed a chip partitioning method with geographic criteria that chains the scan flip-flops in each cluster in a low-power-driven order. This approach can definitely shrink the low power scan chain length, but the test power reduction may be low when there are few flip-flops in a cluster. To improve the power reduction ratio in each cluster, [19] proposed an improved version that partitions the chip with well-distributed scan flip-flops in each cluster. This approach slightly increases the test power reduction ratio, while the total wire length of the scan chain remains almost the same as in [18]. From the experimental results described in [18], [19], we observe that the test power reduction and the scan chain connection length are strongly correlated with the number of clusters. If there are more clusters, the power reduction ratio will be lower and the scan chain length will be shorter. In fact, we will show that we do not have to partition the chip and can obtain the same tradeoff more easily.

[28] proposed a technique for reordering scan cells to minimize power dissipation that is also capable of reducing the area overhead of the circuit compared with a random ordering of the scan cells. They use dynamic minimum transition fill (MT-fill) to fill the unspecified bits in the test vector. They use this greedy, intuitive heuristic to achieve a locally optimal scan cell ordering; however, this sequential approach may not capture the big picture in lowering scan test power. Furthermore, although they use a tradeoff parameter λ to control the relative importance of the two terms, it is not easy to specify a good value. We will show that our tradeoff parameter provides better flexibility in obtaining a power- or routing-cost-minimizing solution.

2.3 Test Data Compression to Achieve Lower Scan Testing Power

There are some previous works using compression techniques to achieve low power test. For example, [8]–[10] use test data compression techniques for reducing scan power, test volume, and testing time. In [9], [10], Golomb coding and alternating run-length codes are used for low power scan testing and test data compression. Moreover, [16] uses a dictionary-based method with memory to compress test data. This method uses on-chip memory to store the compressed code and reduces the shift-in data size. Another strategy is to store the encoding information in the circuit [17]; it provides an inverter-interconnect based decompression network to decode the test data. We use a decoding scheme to implement a low power architecture that achieves test data compression and low test power. This architecture needs some decoders to decode fewer bits of shift-in data. The experimental results show that this approach considerably reduces scan power and test data size. Our compression scheme needs some routing resources, which is an unavoidable cost; in the integrated flow, our reordering technique can alleviate the overall routing cost.

3. Simultaneous Power and Routing Cost Minimization in Scan Chain Design
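As background for this section, the purely power-driven ordering reviewed in Sect. 2.1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of [22]: the function names are hypothetical, the tour is built with a simple nearest-neighbour rule, and the cycle is opened at its heaviest edge instead of using the per-flip-flop power estimate described in the text. The edge weights are the bit differences quoted for Fig. 1(a).

```python
from itertools import combinations

def bit_diff(a: str, b: str) -> int:
    """Number of positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def build_graph(cell_bits: dict) -> dict:
    """Complete undirected graph: one weight (bit difference) per pair of scan cells."""
    return {frozenset((i, j)): bit_diff(cell_bits[i], cell_bits[j])
            for i, j in combinations(cell_bits, 2)}

def greedy_cycle(nodes: list, weight: dict) -> list:
    """Nearest-neighbour tour (assumed heuristic): always move to the cheapest unvisited cell."""
    tour, remaining = [nodes[0]], set(nodes[1:])
    while remaining:
        last = tour[-1]
        nxt = min(sorted(remaining), key=lambda n: weight[frozenset((last, n))])
        tour.append(nxt)
        remaining.remove(nxt)
    return tour

def cut_cycle(tour: list, weight: dict) -> list:
    """Open the cycle at its heaviest edge (a simplification of the cutting step)."""
    n = len(tour)
    k = max(range(n), key=lambda i: weight[frozenset((tour[i], tour[(i + 1) % n]))])
    return tour[k + 1:] + tour[:k + 1]

def path_cost(chain: list, weight: dict) -> int:
    """Total bit difference along an open chain."""
    return sum(weight[frozenset(pair)] for pair in zip(chain, chain[1:]))

# Bit differences quoted in the text for Fig. 1(a).
FIG1_WEIGHTS = {frozenset(k): v for k, v in {
    ("ff1", "ff2"): 6, ("ff1", "ff3"): 3, ("ff1", "ff4"): 2,
    ("ff2", "ff3"): 5, ("ff2", "ff4"): 4, ("ff3", "ff4"): 5}.items()}

chain = cut_cycle(greedy_cycle(["ff1", "ff2", "ff3", "ff4"], FIG1_WEIGHTS), FIG1_WEIGHTS)
print(chain, path_cost(chain, FIG1_WEIGHTS))  # chain ff3, ff1, ff4, ff2 with cost 9
```

On the four flip-flops of Fig. 1, the greedy tour has cycle cost 14 and removing the ff2–ff3 edge (weight 5) leaves a chain of cost 9. Nothing in this ordering looks at cell positions, which is the gap the cost function of this section addresses.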
Previous works [18], [19] focus on chaining scan cells in a power-driven order within each cluster and find that the test power reduction ratio is strongly correlated with the number of clusters. They then chain the clusters with a knowledge-based architecture. Although both [18] and [19] can reduce the routing overhead induced by power-driven scan cell chaining by partitioning the design, these approaches may omit some good choices of scan cell pairs which have both a low transition number and a short connection length. In order to minimize both test power and routing length, we propose reordering the scan chain with a cost function that takes both power and routing length into consideration. In this way, the reordering process can find a better scan chain order without being limited to specific regions. In the following subsections, we delineate how the edge weight is defined and how a good-quality scan chain order can be found.

3.1 Weighted Graph Construction Using Power And Routing Cost

Power-driven scan chain ordering [22] can definitely obtain a low-transition scan chain order. However, this approach does not consider the physical information of the scan chain in the design, which results in a routing cost issue. In order to reduce the routing overhead in LPSC ordering, our approach constructs a weighted undirected graph G(V,E) and uses the distance cost and the bit difference cost between each pair of flip-flops to obtain the edge weight. Vertices represent scan cells and edges represent connections between scan cells. The edge weights are given by formula (1):

$$\mathrm{Edge\_weight}(i,j) = (1-\beta)\,\frac{\mathrm{Dist}(i,j)}{L} + \beta\,\frac{\mathrm{Bit\_Diff}(i,j)}{N} \qquad (1)$$

where Dist(i, j) is the direct connecting distance between the i-th and j-th scan cells, Bit_Diff(i, j) is the number of bit differences between the test vectors in the i-th and j-th scan cells, L is the diagonal length of the chip, N is the total number of scan-in and response vectors, and β is the parameter that controls how much effort we devote to scan power consumption; it ranges from 0 to 1†. Because Dist(i, j) and Bit_Diff(i, j) are scaled differently, we normalize the direct connecting distance of scan cells i and j by the diagonal length of the chip and normalize the number of bit differences by the total number of scan-in and response vectors.

3.2 Efficient Heuristic in Finding Min-Cost Scan Chain Order

After the weighted graph is constructed, we need to find a path with minimum cost. This problem can be formulated as a TSP, which is known to be NP-hard (its decision version is NP-complete). In order to generate an acceptable solution efficiently, we implement a heuristic algorithm to obtain a competitive low-cost solution. Our algorithm is shown in Fig. 3. The complexity of this algorithm is O(|V||E|), where |V| is the number of flip-flops and |E| is the number of edges in the weighted graph. For circuits that contain a large number of flip-flops, solving the TSP could be time-consuming. We can shrink the graph by grouping nodes based on a threshold value on the edge cost: nodes are grouped when their associated edge costs are below the threshold. Then we apply the same algorithm. Since we want a low-cost trip in the TSP, this technique can reduce the complexity and achieve comparable results. After we find the low-cost scan chain order, we estimate the power for the two scan cells at the ends of this scan chain to decide the scan-in port and the scan-out port.

Fig. 3 Our algorithm for finding low-power scan chain order.

3.3 Power Estimation for Scan-in and Scan-out Ports Determination

When the scan chain order has been decided, we need to define the scan-in and scan-out ports, because the number of transition activities is related not only to the number of bit differences between scan cells but also to their relative positions. In order to choose which of the two end scan cells is placed at the beginning of the scan chain, we need to estimate the power for these two orders. To estimate scanning power dissipation, we use the weighted transition model proposed in [29]. The scan-in and scan-out power of our scan chain order is estimated by formula (2) (from [29]):

$$\mathrm{Weighted\_Transition} = \sum \left(\mathrm{Size\_of\_Scan\_Chain} - \mathrm{Pos\_of\_Trans}\right) \qquad (2)$$

From this power estimation function, we can decide which scan cell at the ends of the chain is to be the scan-in port and which is to be the scan-out port for this low power dissipation scan chain.

4. Selective Test Data Compression Technique for Low Power Scan Testing

For further test power reduction after scan chain reordering, we propose a simple yet effective selective test data compression method to reduce the input test pattern size and total power consumption. In this method, we compress some selected test patterns and use another scan chain, the compressed scan chain (CSC), for the input data. We then shift the compressed data into the CSC. The CSC decodes the compressed data into the normal scan chain (NSC) by means of additional decoders. The following subsections introduce our selective scan chain architecture and our scan test power minimization strategy via test data compression. Our integrated methodology flow is shown in Fig. 4. Our scan chain reordering technique and low power test compression technique are applied after traditional ATPG, which is suitable for modern design methodology.
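Before turning to the architecture, the two scoring functions of Sect. 3, formulas (1) and (2), can be illustrated with a short Python sketch. It rests on several assumptions that are not in the paper: the "direct connecting distance" is taken as Euclidean, each scan cell carries a bit string with one bit per scan-in or response vector, a test vector is written in scan chain order so that reversing the chain reverses the string, and all function names are hypothetical.

```python
import math

def edge_weight(pos_i, pos_j, bits_i: str, bits_j: str,
                chip_diag: float, beta: float) -> float:
    """Formula (1): normalized distance and bit-difference terms blended by beta."""
    dist = math.dist(pos_i, pos_j)                    # assumed Euclidean distance
    diff = sum(a != b for a, b in zip(bits_i, bits_j))
    n_vectors = len(bits_i)                           # N: one bit per vector
    return (1 - beta) * dist / chip_diag + beta * diff / n_vectors

def weighted_transitions(vector: str) -> int:
    """Formula (2): each transition counts (chain size - position of the transition)."""
    size = len(vector)
    return sum(size - pos for pos in range(1, size)
               if vector[pos - 1] != vector[pos])

def choose_orientation(vectors: list) -> tuple:
    """Compare both chain directions and keep the one with fewer weighted transitions."""
    forward = sum(weighted_transitions(v) for v in vectors)
    reverse = sum(weighted_transitions(v[::-1]) for v in vectors)
    return ("forward", forward) if forward <= reverse else ("reversed", reverse)

# beta = 0 scores wire length only, beta = 1 scores bit differences only.
for beta in (0.0, 0.5, 1.0):
    print(beta, edge_weight((0, 0), (3, 4), "0101", "0111", chip_diag=10, beta=beta))

print(choose_orientation(["11000", "10000"]))
```

In this toy case the two cells are 5 units apart on a chip with diagonal 10 and differ in 1 of 4 vectors, so the weight moves from 0.5 (β=0) to 0.25 (β=1). For the two sample vectors, the forward orientation scores 7 weighted transitions and the reversed one scores 3, so the reversed direction would be chosen.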
† (Footnote to the parameter β of formula (1).) We apply this tradeoff parameter β, which can be used by the designer to specify the relative importance of power and routing cost. Different from [28], our parameter is very easy to specify (ranging from 0 to 1).

Fig. 4 Our proposed methodology to reduce scan testing power by power-driven scan chain reordering and selective scan chain optimization.

4.1 The Selective Scan Chain Architecture for Lower Scan Power

The proposed scheme applies a new architecture and an optimization flow to achieve lower power and smaller test data volume. The selective scan chain architecture is shown in Fig. 5. All the test patterns are divided into two groups: the first group is used for the CSC, and its shift-in patterns are in compressed form; the second group is used in the NSC, and its shift-in patterns are not compressed. In this architecture, the CSC uses the first group of test patterns, which has more X bits to manipulate. The X bit ratio is the fraction of X bits within a single pattern length (SPL), and we use the X bit omit ratio as the threshold that separates the test patterns. If the X bit ratio of a test pattern is smaller than the selected X bit omit ratio, the pattern belongs to the NSC group; otherwise the test pattern belongs to the CSC group. Compressed patterns need special decoders to expand the test patterns into the NSC. Figure 5 shows the one-scan-unit-to-four-bits (1-unit-to-4-bit) decoder structure with the original scan chain and the compressed scan chain. Each compressed scan unit is composed of one or more scan cells and provides its decoding results to the normal scan cells as test data. Figure 6 shows decoder circuit examples. Figure 6(a) has one scan cell in the compressed scan unit, and Fig. 6(b) has 2 scan cells in the compressed scan unit. Because the decoding scheme is simple, the decoder circuit is small as well. The number of scan cells in each scan unit is generated by the optimization methodology. Having introduced our test architecture, we present the optimization methodology on test data in the following subsections.

Fig. 5 Our selective pattern-compression architecture, which contains compressed scan units and decoders. A compressed scan unit provides test data to the normal scan chain.

Fig. 6 Small overhead for the decoder circuit used in our CSC. Figure 6(a) shows a decoder example. The CSC codes are 0 and 1, while the NSC codes they map to are 0000 and 0111. Figure 6(b) is another decoder example. The CSC codes are 00, 01, and 10, while the NSC codes they map to are 0001, 1101, and 1110.

4.2 Optimization Methodology for Test Data Compression and Further Power Reduction

Our methodology consists of three stages. The first stage is pattern selection: it sets the X bit omit ratio in order to select patterns for the CSC. The second stage is pattern compression: it merges the 4-bit segments of test patterns that lie in the same column of the test set. The first column is the first 4 bits of each test pattern. The third stage is power optimization: it uses the shorter pattern length and applies a greedy search method to find the code with the smallest power consumption in the CSC, column by column. Each X bit omit ratio yields one result for test data volume and power consumption. By evaluating all ratios, we can obtain an optimal ratio for power minimization.

4.3 Pattern Selection Stage

This stage separates the test patterns into two groups by the X bit omit ratio. The test patterns are generated by an automatic test pattern generation (ATPG) tool, such as Syntest-TurboScan [11] or TetraMax [12]. The first group of test patterns is for the CSC, and the second group is for the NSC. The first group needs further compaction, while the second group uses the normal shift method to test the circuit. If a test pattern’s X bit ratio is smaller than the given X bit omit ratio, the test pattern belongs to the NSC group; otherwise it belongs to the CSC group. Figure 7 shows that patterns with fewer X bits are put into the NSC group. The total original test size in equation (3) and the total new test size in equation (4) are used to calculate the test data volume in this paper.

Fig. 7 Pattern selection stage. After we get the partitioned test data set, we use the X bit omit ratio to determine which test data rows should be put into the CSC data set and which should be put into the NSC data set.

$$\mathrm{Original\_test\_size}_{total} = \mathrm{Pattern\_number}_{org} \times \mathrm{SPL}_{org} \qquad (3)$$

$$\mathrm{New\_test\_size}_{total} = \mathrm{Omit\_pattern\_number} \times \mathrm{SPL}_{org} + \mathrm{Selected\_pattern\_number} \times \mathrm{SPL}_{new} \qquad (4)$$

In equation (4), SPLnew is the CSC test data length that comes from the compressed scan units (shown in Fig. 5).

4.4 Pattern Compression Stage

After the test patterns for the CSC are determined, these patterns are compressed by the pattern compression stage. This step merges patterns that contain X bits. For example, X000 and 0000 can be merged into the pattern 0000. Figure 8(a) lists the first four test patterns selected for the CSC. The test data is separated into columns of four bits each; in this case there are 5 columns and 1 residual bit at the end. The procedure merges test patterns starting from the first pattern of the first column and proceeding to the last pattern of the first column, until the first column is completely merged. Then it starts to merge from the first pattern of the second column. Patterns in each column are merged independently. Figure 8(b) is an example of the merged pattern results. At the end of this stage, all remaining X bits in each merged pattern are filled with 0. Our pattern compression algorithm is shown in Fig. 9. Column 1 in Fig. 8(b) has 8 merged patterns, which means that this column needs 3 bits to encode them. Similarly, columns 2, 3, 4, and 5 need 3, 4, 2, and 1 bits, respectively. Because column 3 still needs 4 bits in this case, there is no gain in volume reduction, and the test data in this column is left unchanged. Finally, the new test data length in this case is 14 bits, since the residual bit is also appended at the end of the new test data.

Fig. 8 Pattern compression example. (a) The original format of the first four test patterns selected for the CSC, separated into 4-bit columns. (b) After the test patterns are merged by the pattern compression stage, all of the X bits are filled with 0. The result shows that each column has a different number of merged patterns. (c) After the test patterns for the CSC are encoded by the power optimization stage, the encoded data for the CSC is smaller than the original data. Here we show the first 4 patterns that will be shifted into the CSC. (d) The codes of the first 4 test patterns that will be decoded from the CSC to the NSC.

Fig. 9 Our pattern compression algorithm. TP(M,N) is the test pattern array (shown in Fig. 8(a)).

4.5 Power Optimization Stage

In order to minimize the shift-in power of the new test data, the greedy search method is applied to each column to obtain the new test data. Figure 8(c) shows the new test data, which is in compressed form. This stage maps each new test data code to a merged pattern from the previous compression stage, and thereby back to the original test pattern. For example, the first column of Fig. 8(b) has 8 different merged codes. After the power optimization procedure, we get the result in Fig. 8(c). It shows that the mappings of the first 4 test patterns in column 1 are 110, 110, 110, and 100. That means that, for the first column of the first test pattern, the shift-in data for the CSC is 110 and the decoded result for the NSC is 0000 (shown in Fig. 8(d)). In this stage, we search for the mapping of new test codes with the fewest transitions, starting from the first column. Because the first column in Fig. 8(b) has 8 different codes, it needs 3 bits to encode them. The number of ways to assign the eight 3-bit codes to the 8 merged patterns is 8! = 40320. The optimization method evaluates all of these encodings and selects the one with the least switching power for the first column. Next, the procedure encodes the next column and again selects the encoding with the least switching power. At the end of this stage, the greedy method has obtained the low power encoding results, which become the new test data for the CSC. Figure 10 shows our power optimization algorithm.

Fig. 10 Our power optimization algorithm. NTP(M,N) is the new test pattern array (shown in Fig. 8(c)).

5. Experimental Results

The experimental results of our approach on the ISCAS’89 benchmark circuits [30] are shown in this section. In order to simplify scan-based test power estimation, we use the number of transitions in the scan chain as the dynamic power unit and normalize it with respect to routing-driven scan chain ordering to highlight the power reduction ratio. The results are verified using PrimePower [13], and the estimation error is within 3%†. To calculate the interconnection length of scan cells, we use the direct connection length between scan cells. The circuit characteristics and test vector information are shown in Table 1. The deterministic test vectors are generated by Syntest-TurboScan [11]. The loss of fault coverage (FC) is due to the circuit design and aborted faults. In the first part of the table, the second column shows the number of gates (NOT gates included) and the third column shows the number of D flip-flops. In the second part of the table, we show the number of test vectors for each benchmark circuit and the corresponding fault coverage. Note that the values in Table 1 do not change during the process.

Table 1 Characteristics of ISCAS’89 benchmarks and their DFT information.

Circuit Name   # Gates   # DFFs   # Vectors   FC (%)
s5378          2779      179      114         98.7
s9234          5597      211      154         94.5
s13207         7951      638      249         99.1
s15850         9772      534      133         98.1
s35932         16065     1728     40          91.3
s38417         22179     1636     346         99.9

5.1 The Results for Routing Cost Aware Scan Chain Ordering

The first part of the experimental results of our approach is shown in Table 2. In the second row, PR represents the power reduction ratio and WL represents the routing length of all scan cells. The placement result is generated by a placer which follows the design of the Dragon standard-cell placement tool [31], [32].
We also follow the assumption from [18] that there is a strong connection between routing length and routing congestion. We start by setting β to 0 and increase the value in steps of 0.2 until β is equal to 1. For each benchmark, we show its power reduction ratio and scan chain interconnection length. All power reduction ratios are normalized to the result with β=0 for each benchmark, which has the best routing cost and the poorest power consumption. From Table 2, we can see the tradeoff between test power and scan chain routing cost. For example, although s13207 has a 6.9% higher power reduction ratio at β=0.4 than at β=0.2, it costs 43.9% more wire length. With our well-defined cost function, we observe that the power saving ratio increases almost linearly with β from 0 to 1. From this result, test designers can control the tradeoff between power consumption and routing overhead more intuitively. We have implemented [18] to compare its experimental results with our approach. Table 3 shows the experimental result of [18] on our platform. As before, all power reduction ratios of each benchmark are normalized to the results of routing-congestion-driven scan chain reordering. The experimental results show that the more clusters we use, the smaller the power reduction is and the shorter the scan chain length is; this trend is the same as shown in [18]. Furthermore, the advantage of our approach is clearly shown in Fig. 11, in which we compare the result from Table 3 with our proposed approach. The horizontal axis is the routing wire length and the vertical axis is the power reduction ratio compared with pure routing-driven scan chain reordering. We can see that our approach achieves a higher power reduction ratio under the same routing overhead, and less routing overhead under the same power reduction ratio. Table 4 validates the above statement and shows 43% routing cost improvement on average. In Fig.
12, we show scan chain routing graph of s9234 benchmark with four clusters in (a) and β=0.6 in (b). We can explicitly observe the advantage over routing congestion under the same test power saving ratio which is around 19% in this example. As for approach in [19], which clusters chip with well distributed flip-flops, their approach has 1– 3% improvement in power reduction ratio and has about the same routing overhead compared with [18] in benchmark s9234. We then deduct from Table 3 that our approach has both better power reduction and routing cost compared with approach in [19]. 5.2 The Results for Selective Scan Chain Architecture The second part of experimental results is shown in Table 5 and Fig. 13. We use 1-unit-to-4-bit compressed scan unit strategy in our experimental architecture. It shows that dif† We assume that the fan-out loading of each scan flip-flop is the same. However, the fan-out loading of each scan flip-flop may differ from each other. This is why we have variation in estimating switching power..

(8) IEICE TRANS. ELECTRON., VOL.E93–C, NO.3 MARCH 2010. 376 Table 2 Average power reduction and routing length with β=0 to 1. This shows the tradeoff between power consumption and routing cost of scan chain ordering. The wire length unit is μm. β value β=0.0 β=0.2 β=0.4 β=0.6 β=0.8 β=1.0. s5378 PR(%) WL 0 6.37E+2 8.99 6.79E+2 16.23 8.43E+2 23.56 1.21E+3 27.96 1.91E+3 34.52 4.44E+3. s9234 PR(%) WL 0 9.03E+2 11.80 1.09E+3 15.30 1.48E+3 19.30 2.41E+3 22.70 3.88E+3 27.08 7.08E+3. s13207 PR(%) WL 0 1.95E+3 8.74 2.37E+3 15.13 3.41E+3 21.34 5.72E+3 28.9 1.16E+4 30.03 2.98E+4. Table 3 Experimental results of approach in [18] in our platform (s9234). The last row shows the result of the routing driven scan chain reordering which obtains best routing congestion but worst power consumption.. s9234 # cluster 1 2 4 9 16 36 64 100 256 3136 1. PR(%) WL 27.08 7.08E+3 21.04 5.09E+3 19.02 3.78E+3 18.28 2.84E+3 17.26 2.54E+3 14.81 1.87E+3 10.52 1.73E+3 7.95 1.55E+3 4.18 1.25E+3 0.57 9.54E+2 Routing Driven Scan Chain Reordering 0 6.94E+2. s15850 PR(%) WL 0 2.05E+3 8.29 2.31E+3 13.56 2.84E+3 18.17 3.93E+3 23.89 6.68E+3 30.03 2.33E+4. s35932 PR(%) WL 0 1.78E+4 0.66 1.78E+4 3.26 1.79E+4 8.07 1.83E+4 15.24 1.96E+4 39.23 7.52E+5. Table 4 Wirelength comparison of our approach and previous work [18] under the same power reduction ratio. W LI MP means the routing cost improvement percentage. PR(%) 0 11.8 15.3 19.3 22.7 27.08. W L [18] 9.54E+2 1.73E+3 1.87E+3 3.78E+3 5.09E+3 7.08E+3. W L(Ours) 9.03E+2 1.09E+3 1.48E+3 2.41E+3 3.88E+3 7.08E+3. W LI MP(%) 5.6 56 26 57 31 0. Fig. 12 Experimental result of benchmark s9234. (a) By [18]. The design is partitioned into four clusters and has power saving ratio 19.02%. (b) By our approach. The β is set to 0.6 and has power saving ratio 19.30%. It is clear that our approach provides better wiring with the same power saving ratio.. Fig. 
11 Comparison of experimental results with approach in [18] and ours for s9234, showing that our approach has outperformed [18] both in power and routing cost reduction.. ferent circuits need different X bit omit ratio to obtain the smallest test data size and lowest test power. The results are normalized the result of filling all X bits with 0. For test data volume reduction comparison, we compare our results with [39] since these two approaches are similar. In Table 6, our reduction rate and the selected code method in [39] are similar. Both methods use 4 bits block as encoding source. The selected code uses simpleto-decode strategy to implement, which can reduce the complexity in decoder design but the test data volume would be higher than the optimal Huffman encoding. Our method. Table 5 Experimental results of scan chain pattern reduction and power reduction in ISCAS’89 benchmarks. orig new reduce orig test opt power reduce SPL SPL volume (%) power in CSC power(%) s5378 179 62 33.6 257404 217176 15.6 s9243 211 64 34.1 609802 516847 15.2 s13207 638 163 49.6 2048046 1910626 6.7 s15850 534 144 35.9 20671203 1946605 5.8 s35932 1728 451 32.1 6222396 5160967 17.1 s38584 1426 399 40.9 17510442 16046133 8.4 average 37.7 11.5. only optimizes the encoding column by column but not all of columns at one time. That also degrades the compression ratio. Finally, both of the methods provide similar well compression results. 5.3 The Integrated Methodology We further proposed the integrated flow of these two approaches to obtain more power reduction in test power, and at the same time gain better tradeoff in scan chain wire.

(9) LIN et al.: ON REDUCING TEST POWER, VOLUME AND ROUTING COST. 377. Fig. 13 Fixed output scheme experimental results of ISCAS’89 circuits in new total test size and power consumption using one scan chain architecture with different omit ratio. New total test size is normalized to original test size. Power consumption is normalized to power of all X bits fill with 0. (a) For s38584, we can get the total optimal size reduction rate 50% at X bit omit ratio of 75%. (b) The power reduction rate of s38584 at X bit omit ratio of 90% is 8.4%. Table 6 Compression results comparison. The test patterns provide 100% fault coverage and the block size of selected code method [39] is 4. Volume reduction rate of Our volume reduction selected code [39] rate s5378 45.6 33.6 s9243 40.1 34.1 s13207 48.8 49.6 s15850 35.8 35.9. Table 7 The cost weights for the patterns containing X bit in chain ordering technique of our combined approach. Pattern XX X0 or 0X X1 or 1X 10 or 01. Cost 0 0.5 0.5 1. length. The flow is shown in Fig. 4. Since we do not define the X bit pattern in our chain ordering technique, we can defined the cost weights for those patterns with X bit. Table 7 shows the pattern cost example. 6.. Conclusion. In this paper, we propose two approaches to alleviating test power issue by routing cost driven scan chain design and selective test data compression. Since it does not constrain chaining scan cells in a specific cluster region, the first approach has more freedom to choose better scan cell connection. With well defined cost function for weighted graph, it obtains better test power and routing cost optimization more explicitly. It also provide better tradeoff parameter for more flexibility in power or routing cost optimization. To further reduce test power consumption after scan cell reordering, the selective test data compression is provided for smaller test power and test data volume with limited area overhead. 
However, the limitation of the decoder’s performance also needs to be explored in this low scan power architecture design.

References

[1] A. Crouch, Design-for-Test for Digital IC’s and Embedded Core Systems, Prentice Hall, 1999.
[2] N. Nicolici and B.M. Al-Hashimi, Power-Constrained Testing of VLSI Circuits, Kluwer Academic Publishers, 2003.
[3] S. Kajihara, K. Taniguchi, K. Miyase, I. Pomeranz, and S.M. Reddy, “Test data compression using don’t-care identification and statistical encoding,” Proc. Asian Test Symposium, pp.67–72, 2002.
[4] M. Nourani and M.H. Tehranipour, “RL-Huffman encoding for test compression and power reduction in scan applications,” ACM Trans. Des. Autom. Electron. Syst., vol.10, no.1, pp.91–115, Jan. 2005.
[5] I. Lee, Y.M. Hur, and T. Ambler, “The efficient multiple scan chain architecture reducing power dissipation and test time,” Proc. IEEE Asian Test Symposium, pp.94–97, 2004.
[6] I. Lee, J.H. Jeong, and T. Ambler, “Two efficient methods to reduce power and testing time,” Proc. International Symposium on Low Power Electronics and Design, pp.167–172, 2005.
[7] K.J. Lee, S.J. Hsu, and C.M. Ho, “Test power reduction with multiple capture orders,” Proc. IEEE Asian Test Symposium, pp.26–31, 2004.
[8] Y. Shi, N. Togawa, S. Kimura, M. Yanagisawa, and T. Ohtsuki, “Low power test compression technique for designs with multiple scan chains,” Proc. IEEE Asian Test Symposium, pp.386–389, 2005.
[9] A. Chandra and K. Chakrabarty, “A unified approach to reduce SOC test data volume, scan power and testing time,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol.22, no.3, pp.352–362, March 2003.
[10] A. Chandra and K. Chakrabarty, “Low-power scan testing and test data compression for system-on-a-chip,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol.21, no.5, pp.597–604, May 2002.
[11] TurboScan. SynTest Inc.
[12] TetraMax. Synopsys Inc.
[13] PrimePower. Synopsys Inc.
[14] L.-C. Hsu and H.-M. Chen, “On optimizing scan testing power and routing cost in scan chain design,” International Symposium on Quality Electronic Design, pp.451–456, 2006.
[15] A. Jas, J.G. Dastidar, M.E. Ng, and N.A. Touba, “An efficient test vector compression scheme using selective Huffman coding,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol.22, no.6, pp.797–806, June 2003.
[16] A. Wurtenberger, C.S. Tautermann, and S. Hellebrand, “Data compression for multiple scan using dictionaries with corrections,” Proc. IEEE International Test Conference, pp.926–935, Oct. 2004.
[17] A. Orailoglu, W. Rao, and G. Su, “Frugal linear network-based test decompression for drastic test cost reduction,” Proc. International Conference on Computer-Aided Design, pp.721–725, Nov. 2004.
[18] Y. Bonhomme, T. Yoneda, H. Fujiwara, and P. Girard, “Efficient scan chain design for power minimization during scan testing under routing constraint,” Proc. IEEE International Test Conference, pp.488–493, 2003.
[19] Y. Bonhomme, P. Girard, L. Guiller, C. Landrault, S. Pravossoudovitch, and A. Virazel, “Design of routing-constrained low power chains,” IEEE Proc. Design, Automation and Test in Europe Conference, pp.62–67, 2004.
[20] P. Girard, “Survey of low-power testing of VLSI circuit,” Proc. IEEE Design & Test of Computers, pp.82–92, 2002.
[21] S. Wang and S.K. Gupta, “ATPG for heat dissipation minimization during scan testing,” Proc. IEEE/ACM Design Automation Conference, pp.614–619, 1997.
[22] Y. Bonhomme, P. Girard, C. Landrault, and S. Pravossoudovitch, “Power driven chaining of flip-flop in scan architectures,” Proc. IEEE International Test Conference, pp.796–803, 2002.
[23] P. Gupta, A.B. Kahng, and S. Mantik, “Routing-aware scan chain ordering,” ACM Trans. Des. Autom. Electron. Syst., pp.546–560, July 2005.
[24] J. Rajski and J. Tyszer, “Test data compression and compaction for embedded test of nanometer technology designs,” Proc. IEEE International Conference on Computer Design, pp.331–336, 2003.
[25] I. Bayraktaroglu and A. Orailoglu, “Test volume and application time reduction through scan chain concealment,” Proc. ACM/IEEE Design Automation Conference, pp.151–155, 2001.
[26] K.-H. Lin, C.-S. Chen, and T.T. Hwang, “Layout-driven chaining of scan flip-flops,” Proc. IEEE Computers and Digital Techniques, pp.421–425, 1996.
[27] M. Hirech, J. Beausang, and Xinli Gu, “A new approach to scan chain reordering using physical design information,” Proc. IEEE VLSI Test Symposium, pp.348–355, 1998.
[28] S. Ghosh, S. Basu, and N.A. Touba, “Joint minimization of power and area in scan testing by scan cell reordering,” IEEE Computer Society Annual Symposium on VLSI, 2003.
[29] R. Sankaralingam, R.R. Oruganti, and N.A. Touba, “Static compaction techniques to control scan vector power dissipation,” Proc. IEEE VLSI Test Symposium, pp.35–40, 2000.
[30] F. Brglez, D. Bryant, and K. Kozminski, “Combinational profiles of sequential benchmark circuits,” Proc. International Symposium on Circuits and Systems, pp.1929–1934, 1989.
[31] M. Sarrafzadeh and M. Wang, “NRG: Global and detailed placement,” Proc. IEEE/ACM International Conference on Computer-Aided Design, pp.164–169, 1997.
[32] X. Yang, M. Wang, and M. Sarrafzadeh, “Dragon2000: Standard-cell placement tool for large industry circuits,” Proc. IEEE/ACM International Conference on Computer-Aided Design, pp.260–263, 2000.
[33] K. Rahimi and M. Soma, “Layout driven synthesis of multiple scan chains,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., pp.317–326, March 2003.
[34] W.-W. Hsieh, I.-S. Lin, and T. Hwang, “A physical-location-aware X-Filling method for IR-drop reduction in At-Speed scan test,” Proc. Design, Automation and Test in Europe, 2009.
[35] M.-F. Wu, J.-L. Huang, X. Wen, and K. Miyase, “Reducing power supply noise in linear-decompressor-based test data compression environment for At-Speed testing,” Proc. International Test Conference, 2008.
[36] D. Czysz, M. Kassab, X. Lin, G. Mrugalski, J. Rajski, and J. Tyszer, “Low power scan shift and capture in the EDT environment,” Proc. International Test Conference, 2008.
[37] S. Sde-Paz and E. Salomon, “Frequency and power correlation between At-Speed scan and functional tests,” Proc. International Test Conference, 2008.
[38] J. Rajski, J. Tyszer, M. Kassab, and N. Mukherjee, “Embedded deterministic test,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., pp.776–792, May 2004.
[39] A. Jas, J. Ghosh-Dastidar, and N.A. Touba, “Scan vector compression/decompression using statistical coding,” Proc. IEEE VLSI Test Symposium, 1999.

Chia-Yi Lin received the B.S. degree from the Industrial Technology Educational Department, National Kaohsiung Normal University, Kaohsiung, Taiwan, in 1998 and the M.S. degree from the Information Management Department, National Sun Yat-Sen University, Kaohsiung, Taiwan, in 2001. He is currently working toward the Ph.D. degree in the Department of Electronics Engineering, National Chiao Tung University, Taiwan. His interests include low power testing of digital circuits and physical design methods in VLSI.

Li-Chung Hsu received the B.S. and M.S. degrees from the Department of Electronics Engineering, National Chiao Tung University, Hsinchu, Taiwan, in 2003 and 2005, respectively. He is currently working at Springsoft Co., Taiwan. His interests include EDA and digital circuit design.

Hung-Ming Chen received the B.S. degree in computer science and information engineering from National Chiao Tung University, Hsinchu, Taiwan, in 1993, and the M.S. and Ph.D. degrees in computer sciences from the University of Texas at Austin, in 1998 and 2003, respectively. He is currently an Associate Professor with the Department of Electronics Engineering, National Chiao Tung University, Hsinchu, Taiwan. Dr. Chen has been a member of several technical program committees, including IEEE SOCC, ASP-DAC and VLSI-DAT. His research interests include EDA (nanometer physical design and design methodology), beyond-die integration, and design and analysis of algorithms and optimizations.
