
Wafer Sort Bitmap Data Analysis Using the PCA-Based Approach for Yield Analysis and Optimization

Yeou-Lang Hsieh, Gwo-Hshiung Tzeng, Fellow, IEEE, T. R. Lin, and Hsiao-Cheng Yu

Abstract—Yield analysis is one of the most important subjects in IC companies. During the initial stage of new process development, several factors can greatly impact the yield simultaneously. Traditionally, several learning-cycle iterations are required to solve yield loss issues. This paper describes a novel way to diagnose yield loss issues in fewer iterations. First, the failure classification of bitmap data is transferred to a new basis using principal component analysis. Second, the defective rates are calculated and the original bitmap data is reconstructed in the principal basis, allowing the yield loss space to be generated by cluster analysis. Third, physical failure analysis samples can be selected to solve yield loss issues. Furthermore, the new yield loss basis can be used to monitor the progress of yield improvement, as a discriminant analysis measure for reducing failure patterns (bitmap failures).

Index Terms—Bitmap, cluster analysis, discriminant analysis, principal component analysis (PCA), yield analysis, yield loss space.

I. Introduction

DURING the wafer manufacturing process, wafer sorting is the final step to ensure that ICs function well. Only qualified chips are sent on for packaging and further processing. If an IC contains repeated circuit blocks [e.g., embedded static random access memory (SRAM)], bitmap data can be collected as part of the chip probe data. Bitmap data collection is a common procedure for SRAM, dynamic random access memory, and Flash memory ICs. Bitmap data records the failing bits of the memory under test, and it represents the failure counts for recognizing different failure patterns. Because specific bitmap failure patterns can be connected to certain semiconductor process failures, several previous studies focus on bitmap failure pattern recognition [2], [8], [9]. The first step of the traditional yield analysis approach is to synthesize bitmap data into a Pareto chart.

Manuscript received November 25, 2008; revised November 12, 2009; accepted June 8, 2010. Date of publication August 26, 2010; date of current version November 3, 2010.

Y.-L. Hsieh is with the Customer Technical Supporting Division, Taiwan Semiconductor Manufacturing Company, Hsinchu 300, Taiwan, and also with the Institute of Management of Technology, National Chiao Tung University, Hsinchu 300, Taiwan (e-mail: ylhsieh@tsmc.com).

G.-H. Tzeng, T. R. Lin, and H.-C. Yu are with the Institute of Management of Technology, National Chiao Tung University, Hsinchu 300, Taiwan (e-mail: ghtzeng@mail.knu.edu.tw; gtrl@faculty.nctu.edu.tw; chengyu@cc.nctu.edu.tw).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSM.2010.2065510

The second step conducts failure analysis on selected samples from the Pareto chart to determine failure mechanisms. After modifications are made for process improvement, the third step generates a new Pareto chart for the second learning cycle. Lastly, the entire procedure is repeated until the yield goal is achieved. A few learning cycles and several months of development are usually required to launch a new technology (e.g., 45 nm technology).

This paper uses the principal component analysis (PCA) approach to explore the reduced failure space basis, cluster analysis to generate independent failure events, and discriminant analysis to select objective physical failure analysis (PFA) dice. Using failure analysis, this paper then reveals the IC failure mechanisms and fixes them through process improvement. Results show that the approach in this empirical study requires fewer learning cycles than traditional yield improvement approaches.

SRAM is widely used as a test vehicle in new-generation complementary metal oxide semiconductor (CMOS) process development due to its high front-end transistor density and CMOS-compatible manufacturing process. Compared to other kinds of circuits, memory circuits are also easier to use in PFA because failing bits are localized by bitmap testing data. Therefore, SRAM ICs or logic ICs with embedded SRAM are commonly used as monitors in CMOS process development.

Many different mechanisms can induce chip failures. Failure patterns can be categorized as random failures, systematic failures, and repeated failures [5].

This research is organized as follows. Section II introduces the bitmap data and mathematical method, Section III presents examples of bitmap data analysis to demonstrate the procedures of the proposed method, and Section IV presents conclusions and remarks.

II. Bitmap Data and Mathematical Method

This section briefly introduces the bitmap data and the method of generating simulated bitmap data. The second subsection then introduces analytical approaches such as PCA, cluster analysis, and discriminant analysis.

A. Bitmap Data

1) Bitmap Failure Pattern: Bitmap data records the locations of failing bits in the repeated structures of an IC (e.g., the SRAM). Due to the circuit structure, specific failure mechanisms usually produce particular failure patterns that can be analyzed and categorized. For instance, if four bits share the same via in the circuit layout, they fail simultaneously when that via is open. Also, if two bits share the same contact, both fail if that particular contact fails. Furthermore, failure patterns change with different testing voltages. For example, if a contact fails because of high resistance rather than a full contact open, a twin-bit failure may test differently at low and high voltages.

Fig. 1. Bitmap failure pattern recognition.

Fig. 1 illustrates typical SRAM bitmap failure recognition. Grouping procedures generally begin by processing large-area failure patterns, such as bulk failures, word line failures, and bit line failures. The remaining failing bits are then grouped into four-bit failures, twin-bit failures, one-bit failures, and so on. With further analysis, each failure pattern can be classified into several subcategories; for example, wordline failures can be separated into full wordline failures and partial wordline failures, and so on.
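This grouping order can be sketched in code. The following Python fragment is a minimal illustration only, not the authors' production classifier: the category names, the 4-connected clustering, and the line_frac threshold are assumptions introduced here for illustration.

```python
from collections import Counter

def classify_die(fail_bits, n_rows, n_cols, line_frac=0.9):
    """fail_bits: set of (row, col) failing-bit coordinates on one die."""
    counts = Counter()
    remaining = set(fail_bits)
    # 1) Large-area patterns first: rows/columns that are mostly failing
    #    are labeled wordline/bitline failures and removed.
    for r in range(n_rows):
        row_bits = {b for b in remaining if b[0] == r}
        if len(row_bits) >= line_frac * n_cols:
            counts["wordline"] += 1
            remaining -= row_bits
    for c in range(n_cols):
        col_bits = {b for b in remaining if b[1] == c}
        if len(col_bits) >= line_frac * n_rows:
            counts["bitline"] += 1
            remaining -= col_bits
    # 2) Group leftover bits into 4-connected clusters, then label by size.
    def cluster(seed):
        stack, seen = [seed], {seed}
        while stack:
            r, c = stack.pop()
            for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if nb in remaining and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        return seen
    while remaining:
        grp = cluster(next(iter(remaining)))
        remaining -= grp
        label = {1: "single bit", 2: "twin bit", 4: "four bit"}.get(len(grp), "other")
        counts[label] += 1
    return counts
```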

2) Simulated Bitmap Data: The method in this paper generates bitmap data by the following equation:

X_{n×p} := Σ_{k=1}^{m} C_k × D^k_{n×1} × F^k_{1×p} × f    (1)

where X_{n×p} is a matrix with n row(s) and p column(s), and the element [x_{vi}]_{n×p} represents the failure bit count of the ith bitmap failure mode (i = 1, 2, ..., p) of the vth die (v = 1, 2, ..., n). Each element can be any nonnegative integer (e.g., if x_{12,5} = 7, the failure bit count is seven in the 5th failure mode of the 12th die). In the simulated example, n = 500, p = 20, and m = 6 (500 dice, 20 failure modes, six yield loss events). The basic idea of (1) is that bitmap data from the kth yield loss event can be constructed from: 1) C_k: fail intensity among different wafers; 2) D^k_{n×1}: fail intensity within the same wafer but different dice; 3) F^k_{1×p}: the specific bitmap data feature determined by the circuit structure; and 4) f: an uncertainty factor. One yield loss event can cause different yield loss results in different wafers (for example, a litho machine lens heating issue in a metal layer that makes a metal island pattern fail produces the same bitmap failures with different "fail intensity," because lens heating worsens with process time), so C_k is used as a failure count multiplier to represent a yield loss event's "intensity." The term C_k is a constant for the kth yield loss event; in the model, C_k is assigned a different order of magnitude for different signal intensities.

TABLE I
C_k Values for Each Yield Loss Event

Event 1 (System Wafer Edge)    C1 = 100
Event 2 (System Wafer Edge)    C2 = 100
Event 3 (System Wafer Center)  C3 = 100
Event 4 (System Localized)     C4 = 100
Event 5 (Repeating)            C5 = 100
Event 6 (Random)               C6 = 1 (low noise scenario)
                               C6 = 10 (medium noise scenario)
                               C6 = 100 (high noise scenario)

In Table I, the C_k values function as failure count multipliers for the yield loss events. Events 1–5 are signals, with three noise scenarios represented by Event 6. When the C_k value of Event 6 (in a low/medium noise scenario) is lower than that of the other events, the signals (Events 1–5) are larger than the noise (Event 6).

Fig. 2. Wafer map of yield loss events D^k_{n×1}.

D^k_{n×1} is a matrix with n row(s) and one column for Event k; we use "0–1" values in D^k_{n×1} to separate defective from non-defective dice under each yield loss event. For example, D^4_{97} = 0 means that the 97th die is non-defective under yield loss event four. In this paper, we set m = 6 (six yield loss events): four systematic yield loss events occur along with one repeated yield loss event and one random yield loss event. Fig. 2 displays the corresponding wafer maps of these six events and the combined wafer map. In the first row of Fig. 2, D1 and D2 represent the systematic wafer edge yield losses, while D3 indicates the systematic wafer center yield loss. In the second row, D4 represents the systematic localized yield loss, D5 depicts the repeated yield loss, and D6 shows the random yield loss. Finally, the bottom pattern is a combination of the wafer maps of the six yield loss events above.

An SRAM layout example is shown in Fig. 3. Imagine a contact layer process issue inducing random single contact opening failures: such a failure induces a single-bit fault if one contact among contact-1 to contact-4 opens, a horizontal twin fault if either contact-11 or contact-12 opens, and a vertical twin fault if one of contact-5 to contact-10 opens. Assuming that, in a certain yield loss event, the contact opening probabilities are equal, the ratio of single-bit faults to horizontal twin-bit faults to vertical twin-bit faults will be 4:1:3 = 0.5:0.125:0.375, a specific signature. Therefore, due to circuit structure characteristics, the ratio of each bitmap failure mode in each event approximates a specific signature and can be expressed by F^k_{1×p}, where F^k_{1×p} is a matrix with one row and p column(s) for Event k, graded from 0 to 1. Table II shows the data used in F^k_{1×p}.

Fig. 3. Four-bit SRAM layout example, OD layer in vertical, POLY layer in horizontal, contact in rectangular [2].

TABLE II
F^k_{1×p} Values for Each Yield Loss Event (Example)

There are always uncertainties in testing results. If we test the same wafer on the same tester twice and compare the results, the data will be close but not 100% identical. The disparities might result from testing marginality, fault marginality, bitmap fault pattern recognition sensitivity, and so on. In this model, f represents an uncertainty factor with a value between 0.8 and 1.2, a conservative estimate.
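Taken together, (1) and the descriptions of C_k, D^k_{n×1}, F^k_{1×p}, and f specify a complete generator for the simulated data. The following NumPy sketch is one possible reading of that generator; the random wafer-map flags for D and the random sparse signatures for F are placeholders for the actual Fig. 2 geometries and Table II values, which are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, m = 500, 20, 6      # dice, bitmap failure modes, yield loss events

# C_k: per-event failure count multipliers (Table I); Event 6 is shown at
# its medium-noise setting, C6 = 10.
C = np.array([100.0, 100.0, 100.0, 100.0, 100.0, 10.0])

# D_k (n x 1 per event): defective/non-defective flags per die. Random
# flags stand in here for the wafer-map geometries of Fig. 2.
D = (rng.random((m, n)) < 0.15).astype(float)
D[1] = rng.choice([0.0, 0.2, 0.4, 0.6, 0.8, 1.0], size=n)  # Event 2 is graded

# F_k (1 x p per event): each event's bitmap failure-mode signature, graded
# 0..1 (Table II); random sparse rows stand in for the published values.
F = rng.random((m, p)) * (rng.random((m, p)) < 0.3)

# f: multiplicative test uncertainty in [0.8, 1.2].
f = rng.uniform(0.8, 1.2, size=(n, p))

# Equation (1): X = sum_k C_k * (D_k outer F_k), scaled by f and rounded
# to nonnegative integer failure bit counts.
X = np.zeros((n, p))
for k in range(m):
    X += C[k] * np.outer(D[k], F[k])
X = np.rint(X * f).astype(int)
```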

This paper includes 20 bitmap failure modes and six yield loss events. Table III shows selected simulated data of X. Except for Event 2, the D^k_{n×1} values are either 0 or 1. The value 1 signifies that a specific die was influenced by a certain yield loss event, whereas 0 means that the die was not affected. In Event 2, the D^k_{n×1} values are set between 0 and 1 to show the intensity of influence that Event 2 has within the wafer. A good die is produced only when all the D^k_{n×1} values are 0, as "P" in the "P/F?" row indicates. The FM1 (bitmap failure mode 1) count of Die 1 is 110, but 130 for Die 2. Die 1 is affected by Events 1 and 2, while Die 2 is affected by Events 1, 2, 5, and 6 (see Appendix D).

TABLE III
Selected Bitmap Data List (See Appendix D)

B. PCA Approach

PCA is a component of multivariate statistical analysis. PCA was first proposed by Pearson and later developed by Hotelling; the method is also called the Karhunen-Loève transform or Hotelling transform. PCA is a technique for reducing a multidimensional data set to fewer dimensions of analysis, and the dimensions remaining after PCA are mutually independent. In other words, PCA is a linear transformation that converts data to a new coordinate system in which the greatest variance of any projection of the data lies on the first coordinate (the first principal component), the second greatest variance on the second coordinate, the third greatest variance on the third coordinate, and so forth [1], [4], [6], [7] (see Appendix A).

The following briefly describes how to use the PCA approach. First, standardize the bitmap raw data to the matrix X:

x_{vi,standardized} = (x_{vi,raw} − x̄_i) / σ_{x_i}

X = [x_{vi}]_{n×p} =
[ x_{11} ... x_{1i} ... x_{1p} ]
[   ...        ...        ... ]
[ x_{v1} ... x_{vi} ... x_{vp} ]
[   ...        ...        ... ]
[ x_{n1} ... x_{ni} ... x_{np} ].

Second, calculate the correlation matrix R = [r_hj]_{p×p}, where r_hj is the correlation coefficient of the hth and jth bitmap failure modes (p = 20 in this paper):

R = [r_hj]_{p×p} =
[ 1     r_12  r_13  ...  r_1p ]
[ r_21  1     r_23  ...  r_2p ]
[ r_31  r_32  1     ...  r_3p ]
[ ...   ...   ...   ...  ...  ]
[ r_p1  r_p2  r_p3  ...  1    ].    (2)

Third, calculate Λ, the diagonalization matrix:

Λ = B′RB    (3)

where

Λ = diag(λ_1, λ_2, ..., λ_p).

Λ is a diagonal matrix whose ith diagonal element is the ith eigenvalue of R = [r_hj]_{p×p}. The term B is the transformation matrix between the old and new coordinate systems, whose ith column is the ith eigenvector of R = [r_hj]_{p×p}.

In the PCA approach, an important output called the principal factor loading A = [a_ik] can be derived from B = [β_ik]_{p×p} (see Table A.2). In other words, the principal factor loading is

A = [a_ik] = [√λ_k β_ik].    (4)

The kth eigenvalue λ_k of [r_hj] stands for the variance of the kth principal component, so the sum of all eigenvalues is the total variance of [r_hj]. The variance ratios are obtained by dividing λ_k by Σ_{k=1}^{p} λ_k = p; see (A7)–(A9). Another useful result, the factor score coefficients for the m retained factors, is derived by dividing [β_ik]_{p×m} by √λ_k:

FSC = [β_ik / √λ_k].    (5)

Factor scores (FS) are obtained by multiplying the standardized X by the factor score coefficients:

[FS]_{n×m} = X_{n×p} × B_{p×m} / √λ_k = [x_{vi}]_{n×p} × [β_ik / √λ_k]_{p×m}    (6)

where the variables are i = 1, 2, ..., p and the samples are v = 1, 2, ..., n.
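Steps (2)–(6) condense into a short numerical sketch. The following is a minimal NumPy rendering under the definitions above; note that the loadings reported later in Tables IV–VI are rotated, and the rotation step (e.g., varimax) is omitted here as an assumption of this sketch.

```python
import numpy as np

def pca_factor_scores(X_raw, m_factors):
    """Steps (2)-(6): standardize, correlation matrix R, eigendecomposition,
    factor loadings A, factor score coefficients, and factor scores FS."""
    X = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)  # standardize
    R = np.corrcoef(X_raw, rowvar=False)                  # R = [r_hj], p x p
    lam, B = np.linalg.eigh(R)                            # R B = B Lambda
    order = np.argsort(lam)[::-1]                         # descending variance
    lam, B = lam[order], B[:, order]
    A = B * np.sqrt(lam)                                  # loadings, eq. (4)
    FSC = B[:, :m_factors] / np.sqrt(lam[:m_factors])     # coefficients, eq. (5)
    FS = X @ FSC                                          # factor scores, eq. (6)
    return lam, A, FS

# lam / p reproduces the proportion-of-variance row of Tables IV-VI,
# and lam.cumsum() / p the cumulative row.
```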

C. Cluster and Discriminant Analysis for Bitmap Failure Patterns

This paper uses clustering to classify a data set into subsets (clusters) so that the data in each subset shares some common features. The cluster analysis employed in this paper uses factor scores and wafer maps.

Due to the nature of the semiconductor process, one die might suffer multiple yield losses. Such dice are not appropriate objectives for failure analysis, because more resources are required to reach a failure analysis conclusion. Therefore, this paper proposes using a suitable threshold value of the factor score to select PFA objective dice. An objective die should have as high a factor score as possible on the specific PC and as low a factor score as possible on the other PCs.

Once the matrix B is derived from the raw data, (6) can be used to transform a sample's data from the bitmap failure space X to the event failure space Y:

Y = XB    (7)

X = YB^{-1}.    (8)

Comparing the spectrum in the event failure space for the original data and the transformed data reveals any new yield loss events.

III. Bitmap Data Analysis Example

The conventional yield improvement procedure includes: 1) setting up the bitmap yield loss Pareto plot; 2) selecting sample dice and processing them through PFA; 3) modifying the process recipe to resolve the yield loss events; 4) processing new wafers with the new recipe; and 5) repeating the above four steps until the yield reaches the desired goal.

Yield loss events are usually mutually independent, which is consistent with the mutually orthogonal eigenvectors of the bitmap data correlation matrix. Accordingly, Microsoft Excel VBA was used to generate the simulated data and to perform the PCA, cluster analysis, wafer mapping, and so on.

A. Bitmap Data Analysis with Medium Noise

The Ck value of the random yield loss event in the medium noise case is set at 10, ten times lower than the Ck values of the other signals. Table IV lists the top five eigenvalues and factor loadings.

Comparing the factor loading data in Table IV with that in Table II shows that principal component 1 (PC1) is mostly mapped to FM5, FM6, and FM11–FM14 (failure modes 5, 6, 11–14), which reflects the wafer center yield loss event (Event 3 in Table II). PC2 is mainly mapped to FM15–FM20, which reflects the repeated yield loss event (Event 5 in Table II). PC3 is largely mapped to FM1–FM4, FM7, and FM8, which reflects the wafer edge yield loss event (Event 2 in Table II). PC4 is mainly mapped to FM7–FM10, which reflects the systematic localized yield loss event (Event 4 in Table II). PC5 is mainly mapped to FM2–FM4 and FM8, which reflects the wafer edge yield loss event (Event 1 in Table II).
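The matching between principal components and event signatures done by inspection above can also be mechanized. The following helper is an assumption of this sketch rather than part of the paper's method: it correlates each loading column with each event signature F_k from Table II and reports the best match.

```python
import numpy as np

def match_pcs_to_events(A, F):
    """A: p x q rotated factor loadings (one column per PC); F: m x p event
    signatures as in Table II. For each PC, return the 1-based index of the
    event whose signature correlates best with that PC's loadings."""
    matches = {}
    for j in range(A.shape[1]):
        corrs = [np.corrcoef(A[:, j], F[k])[0, 1] for k in range(F.shape[0])]
        best = int(np.argmax(corrs))
        matches[f"PC{j + 1}"] = (best + 1, corrs[best])
    return matches

# e.g., {"PC1": (3, 0.97), ...} would read "PC1 reflects Event 3."
```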

B. Bitmap Data Analysis with Low Noise

In the low noise case, the Ck value of the random yield loss event in Table I is set at 1 for further noise reduction. Repeating the PCA procedures produces the data shown in Table V. These results are very similar to those obtained from the bitmap data analysis with medium noise.

C. Bitmap Data Analysis with High Noise

For the high noise scenario analysis, the Ck value of the random yield loss event in Table I is adjusted to 100 to determine what happens in the PCA analysis when the background noise increases.

(5)

TABLE IV
Eigenvalues and Corresponding Factor Loadings (Rotated) of the Medium Noise Analysis (Ck Value of Event 6 = 10)

             PC1     PC2     PC3     PC4    Comm. (PC1–PC4)   PC5
Eigenvalue   6.30    5.87    4.96    2.30        –           0.24
Proportion   31.5%   29.3%   24.8%   11.5%       –           1.2%
Cumulative   31.5%   60.9%   85.7%   97.2%       –           98.4%
FM1         −0.011   0.020   0.994  −0.022     0.989        −0.025
FM2         −0.185   0.012   0.941  −0.021     0.920         0.266
FM3          0.209   0.026   0.944  −0.027     0.937        −0.228
FM4          0.213   0.018   0.948  −0.032     0.946        −0.204
FM5          0.954   0.027   0.255   0.044     0.978        −0.046
FM6          0.941   0.044   0.301   0.039     0.979        −0.031
FM7          0.080   0.076   0.576   0.795     0.975         0.119
FM8         −0.149   0.028   0.882   0.365     0.934         0.231
FM9          0.532   0.088  −0.043   0.834     0.989        −0.057
FM10         0.455   0.089   0.011   0.879     0.988        −0.048
FM11         0.980   0.037  −0.072   0.132     0.984         0.021
FM12         0.983   0.030  −0.080   0.119     0.987         0.017
FM13         0.982   0.031  −0.100   0.070     0.981         0.021
FM14         0.981   0.017  −0.098   0.079     0.979         0.020
FM15         0.059   0.984   0.010   0.113     0.984        −0.007
FM16         0.014   0.987   0.000   0.006     0.975        −0.001
FM17         0.040   0.989   0.030   0.034     0.981         0.002
FM18         0.000   0.988   0.007  −0.049     0.980         0.006
FM19         0.065   0.978   0.045   0.067     0.967         0.003
FM20         0.021   0.993   0.015  −0.010     0.986        −0.004

TABLE V
Eigenvalues and Corresponding Factor Loadings (Rotated) of the Low Noise Analysis (Ck Value of Event 6 = 1)

             PC1     PC2     PC3     PC4    Comm. (PC1–PC4)   PC5
Eigenvalue   6.31    5.91    4.98    2.30        –           0.24
Proportion   31.5%   29.6%   24.9%   11.5%       –           1.2%
Cumulative   31.5%   61.1%   86.0%   97.5%       –           98.7%
FM1         −0.015   0.011   0.994  −0.032     0.989         0.015
FM2         −0.185  −0.001   0.944  −0.025     0.926        −0.253
FM3          0.206   0.015   0.947  −0.027     0.940         0.222
FM4          0.210   0.021   0.946  −0.034     0.941         0.217
FM5          0.956   0.025   0.257   0.031     0.982         0.045
FM6          0.948   0.027   0.287   0.036     0.984         0.032
FM7          0.077   0.069   0.586   0.786     0.973        −0.114
FM8         −0.149   0.035   0.885   0.355     0.933        −0.231
FM9          0.530   0.085  −0.044   0.834     0.986         0.052
FM10         0.445   0.069   0.005   0.887     0.990         0.043
FM11         0.979   0.032  −0.078   0.128     0.982        −0.021
FM12         0.979   0.033  −0.084   0.137     0.986        −0.017
FM13         0.985   0.026  −0.099   0.080     0.987        −0.020
FM14         0.985   0.017  −0.100   0.073     0.986        −0.020
FM15         0.048   0.986   0.008   0.099     0.984         0.009
FM16         0.011   0.992  −0.001  −0.004     0.985         0.001
FM17         0.034   0.991   0.028   0.022     0.985        −0.013
FM18         0.008   0.993  −0.001  −0.035     0.987        −0.006
FM19         0.051   0.989   0.038   0.067     0.987         0.005
FM20         0.020   0.993   0.010  −0.011     0.987         0.004

TABLE VI
Eigenvalues and Corresponding Factor Loadings (Rotated) of the High Noise Analysis (Ck Value of Event 6 = 100)

             PC1     PC2     PC3    Comm. (PC1–PC3)   PC4     PC5
Eigenvalue   14.69   2.77    1.36        –            0.41    0.32
Proportion   73.5%   13.8%   6.8%        –            2.1%    1.6%
Cumulative   73.5%   87.3%   94.1%       –            96.2%   97.8%
FM1          0.683   0.710   0.016     0.972          0.100   0.050
FM2          0.391   0.898  −0.068     0.964         −0.009  −0.025
FM3          0.766   0.582   0.088     0.934          0.171   0.002
FM4          0.754   0.604   0.091     0.941          0.174  −0.005
FM5          0.969   0.069   0.142     0.963          0.099  −0.037
FM6          0.970   0.090   0.132     0.965          0.094  −0.023
FM7          0.868   0.333   0.050     0.867         −0.328   0.091
FM8          0.611   0.732  −0.023     0.910         −0.278   0.018
FM9          0.947  −0.016   0.126     0.913         −0.249   0.057
FM10         0.950   0.014   0.109     0.914         −0.236   0.108
FM11         0.970  −0.064   0.150     0.967          0.049  −0.059
FM12         0.961  −0.094   0.169     0.961          0.072  −0.107
FM13         0.883  −0.177   0.241     0.869          0.085  −0.332
FM14         0.881  −0.193   0.253     0.877          0.080  −0.320
FM15         0.919   0.037  −0.343     0.963         −0.014   0.104
FM16         0.777   0.010  −0.613     0.980         −0.003  −0.048
FM17         0.934   0.061  −0.293     0.962          0.061   0.133
FM18         0.836   0.018  −0.530     0.980          0.025   0.018
FM19         0.946   0.061  −0.216     0.944          0.008   0.167
FM20         0.884   0.040  −0.435     0.972          0.037   0.081

When the noise intensity matches the signal intensity, the failure counts of the systematic failure events roughly equal those of the random failure event. Yield loss events cannot be completely decoupled by PCA, as Table VI shows: PC1 is a random failure event, and only three of the five events can be identified — the wafer center, localized, and repeated failure events (PC2–PC4). However, in real situations, the failure counts of random yield loss events are lower than those of systematic and repeated yield loss events. As a result, high noise cases rarely occur.

D. Cluster Analysis

The details of the yield loss events, both the number of events and each event's bitmap failure mode distribution, are usually unavailable. Therefore, factor score data is required to select the objective sample dice for PFA. The factor scores are easily obtained, as (6) indicates.

Fig. 4 plots the PC1 factor scores and the corresponding D^k_{n×1} values at a medium noise level for 500 simulated samples. According to the plot, the factor score data varies significantly between the two D^k_{n×1} values (0 or 1). As Fig. 5 shows, the factor score map of PC1 is very similar to the D^k_{n×1} map of Event 3. The threshold values of the gray-level parts of the factor score map are 1.88, 1.28, and 0.84, respectively; these values correspond to roughly 97%, 90%, and 80% of the cumulative probability.

Fig. 4. Factor score plot for each sample with corresponding D^k_{n×1} value (medium noise, PC1).

Fig. 5. Factor score map for PC1 (left) and D^k_{n×1} map (right) for Event 3 (medium noise).
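Such gray-level thresholds can be read directly off the empirical factor score distribution. A minimal sketch, assuming the factor score matrix FS from (6):

```python
import numpy as np

# Gray-level thresholds for a factor score map: the scores at the 97th,
# 90th, and 80th percentiles of one PC's factor score distribution.
def score_thresholds(fs_column, probs=(97, 90, 80)):
    return np.percentile(fs_column, probs)

# score_thresholds(FS[:, 0]) would return roughly (1.88, 1.28, 0.84) for
# the medium-noise PC1 scores behind Fig. 5.
```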

Fig. 6 shows that the factor score map of PC2 is similar to the D^k_{n×1} map of Event 5, the yield loss event with repeated defects possibly caused by mask faults. Except for the wafer center area, the repeated defects of the yield loss event are duplicated in PC2's factor score map; the exception arises because the wafer center signal might be biased by the wafer center yield loss event (Event 3 in the PCA analysis).

Fig. 6. Factor score map for PC2 (left) and D^k_{n×1} map (right) for Event 5 (medium noise).

In Event 2, the D^k_{n×1} values are set at 0, 0.2, 0.4, 0.6, 0.8, and 1.0 to represent varying degrees of yield loss. According to Fig. 7, the magnitude of PC3 is correlated with the D^k_{n×1} values of Event 2, which supports the observation that greater factor score values correspond to more serious degrees of yield loss.

Fig. 7. Factor score plot for each sample with corresponding D^k_{n×1} value (medium noise, PC3).

Fig. 8 shows that PC3 represents Event 2, Fig. 9 shows that PC4 represents Event 4, and Fig. 10 indicates that PC5 stands for Event 1. The eigenvalue of PC5 is much smaller than those of the other PCs, so the connection between PC5 and Event 1 is not obvious.

Fig. 8. Factor score map for PC3 (left) and D^k_{n×1} map (right) for Event 2 (medium noise).

Fig. 9. Factor score map for PC4 (left) and D^k_{n×1} map (right) for Event 4 (medium noise).

Fig. 10. Factor score map for PC5 (left) and D^k_{n×1} map (right) for Event 1 (medium noise).

Based on the discussion above, the objective die ν should lie in the right-hand tail of the factor score probability distribution for the given PC and, simultaneously, should have factor scores as low as possible on the other PCs.

Equation set (9) states the cluster analysis criteria proposed in this paper. Appendix B is the factor score summary of the medium noise scenario (Ck = 10), with a gray background marking P{[FS]_k ≤ threshold_k} = 97% and an underline marking P{[FS]_k ≤ threshold*_k} = 70%. For example, die-482 is a proposed candidate for PC1, die-274 for PC2, die-008 for PC3, and die-387 for PC4. The criteria of 97% and 70% are not invariant; they depend on how many dice the PFA resources can support:

[FS]_{νk} ≥ threshold_k   for k = η
[FS]_{νk} ≤ threshold*_k  for k ≠ η,
k = 1, 2, ..., 6;  ν = 1, 2, ..., 500.    (9)
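Criteria (9) translate directly into a selection routine. The following sketch is one possible implementation, with the 97% and 70% cumulative-probability points expressed as percentiles of the observed factor scores; as noted above, these cutoffs are tunable to the available PFA resources.

```python
import numpy as np

def pfa_candidates(FS, eta, hi_pct=97.0, lo_pct=70.0):
    """Criteria (9): dice whose factor score on PC eta exceeds the hi_pct
    threshold while every other PC's score stays below the lo_pct threshold.
    FS is the n x m factor score matrix; eta is a 0-based PC index."""
    hi = np.percentile(FS[:, eta], hi_pct)     # threshold_k for k = eta
    lo = np.percentile(FS, lo_pct, axis=0)     # threshold*_k for each PC
    mask = FS[:, eta] >= hi
    for k in range(FS.shape[1]):
        if k != eta:
            mask &= FS[:, k] <= lo[k]
    return np.flatnonzero(mask)                # candidate die indices
```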

E. Empirical Study

An empirical study of a 65 nm SoC device yield improvement trend is shown in Fig. 11. In this example, ten months were taken to improve the yield from 0% to 80% by the traditional approach.

Fig. 11. Selected 65 nm device yield improvement trend chart.

TABLE VII
Eigenvalues and Corresponding Factor Loadings in the Empirical Study

The traditional procedure is: 1) build a yield loss Pareto from the test data; 2) following the 80–20 rule, choose the top-failure dice for PFA (usually only a couple of dice, due to limited PFA resources); 3) draw up a process improvement strategy according to the PFA results; and 4) after new wafers are processed with the improved process condition, repeat procedures 1)–3).

If there were two yield loss events (event A and event B), where event A causes hundreds of failing bits per die and event B causes only a couple of failing bits per die, event B would most likely not be uncovered in the first learning iteration of the traditional procedure.

Table VII shows the PCA results for the 0% yield wafer of the mentioned 65 nm device. In this paper, we add a defective rate derived from the factor scores to improve PCA in practice (in this case, a factor score > 0.7 is treated as a defective die). Comparing the PCA data with the empirical results, factor-2, factor-4, and factor-6 match yield loss Event 2, Event 3, and Event 1, respectively. Factor-1 can be treated as a random defect, because factor-1 contributes to all of the bitmap failure modes FM01–FM12. We did not put resources on factor-3 and factor-5, because their defective rates are as low as 1%.

F. Discussion and Implementation

From the simulated analyses of the medium and low noise cases, the PCA-based approach covers all the principal components, and four of the five eigenvalues are greater than one. The empirical example, which introduces the defective rate to enhance the PCA results in practice, shows that the yield learning iterations can be shortened. Based on this paper, objective dice for PFA yield improvement can be selected after PCA is implemented using the following proposed procedure:

1) following the 80–20 rule, calculate the defective rate of each factor based on the factor scores that are greater than a suitable value (e.g., 0.7) — see the sketch after this list;

2) from the high-defective-rate PCs, choose objective dice with as high a factor score as possible on the corresponding PC and as low a factor score as possible on the other PCs;

3) draw up a process improvement strategy according to the PFA results on the objective dice;

4) after new wafers are processed with the improved process condition, repeat procedures 1)–3).
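Step 1) of this procedure reduces to a one-line computation once the factor scores are available. A minimal sketch, assuming the n × m factor score matrix FS and the 0.7 cutoff used in the empirical study:

```python
import numpy as np

def defective_rates(FS, cutoff=0.7):
    """Fraction of dice whose factor score exceeds the cutoff, per factor.
    Factors with rates near 1% (factor-3/5 in Table VII) can be deprioritized."""
    return (FS > cutoff).mean(axis=0)
```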

Based on the analysis of the three noise scenarios, the PCA-based approach is compromised when the noise intensity equals the signal intensity (i.e., the high noise case). However, most yield loss events can be detected when the noise intensity is smaller than the signal intensity.

Wafer map overlaps between yield loss events do affect the results produced by PCA. For example, Event 3 and Event 5 have similar wafer center area maps, and this overlap influences the factor score maps of PC1 and PC2.

It is not practical to perform PFA on all defective dice; usually, several dice are selected as PFA objectives for each yield learning iteration. Several learning iterations are typically needed for yield improvement, which consumes both time (about three months per iteration) and cost (many wafers). The proposed "PCA + defective rate" analysis is a practical methodology for shortening the yield learning iterations compared to traditional Pareto rules.

IV. Conclusion

The process of new technology yield improvement can take up to one year to complete [3]. The bottlenecks in this process are: 1) not all yield loss events can be uncovered in the first analysis; 2) after modifications, two or three months of process time are required to verify the yield results; and 3) the PFA method is resource-limited and time-consuming.

However, the bitmap data analysis method proposed in this paper uses the "PCA-based + defective rate" approach to greatly reduce the yield learning cycle time without requiring additional resources. Only a desktop computer, the related software, and a little time are required to conduct this analysis. Although previous studies present numerous data mining approaches [1]–[6], [8], [11], none of them can decouple yield loss events while considering wafer maps and signal intensity. Only the PCA-based approach can decouple these kinds of failures.

Once the bitmap data is analyzed, the principal components can be used as the basis of the failure process space. The following techniques are suggested for semiconductor manufacturing yield management: 1) yield improvement should target not only systematic and repeated failure events, but also random failure events; 2) since the failure process space of a wafer analyzed by PCA can be established, the problems of another wafer with similar failure modes can be disclosed in minutes without any traditional analysis; and 3) the basis of the failure process space can be extended by adding new failure modes. Eventually, a complete version of the failure process space for a specific technology node can be built (e.g., a 0.13 µm CMOS logic low-power process) to improve manufacturing knowledge management and technology transfer.

Appendix A
Concepts of Principal Components

TABLE A.1
Bitmap Data Set Matrix and Sample Scores for PCA

TABLE A.2
Principal Factor Loading

Variables       Principal y1   ...   Principal yk             ...   Principal ym           Communality
x1              √λ1 β11        ...   √λk β1k                  ...   √λm β1m                h1
...
xi              √λ1 βi1        ...   √λk βik                  ...   √λm βim                hi
...
xp              √λ1 βp1        ...   √λk βpk                  ...   √λm βpm                hp
Eigenvalue      λ1             ...   λk                       ...   λm                     –
Contrib. rate   λ1/p           ...   λk/p                     ...   λm/p
Accum. rate     λ1/p           ...   Σ_{k′=1}^{k} λ_{k′}/p    ...   Σ_{k=1}^{m} λk/p

where a_ik = √λk β_ik, i.e., A = [a_ik] = [√λk β_ik], h_i = Σ_{k=1}^{m} a_ik², and λk = Σ_{i=1}^{p} a_ik².

A. Basic Concepts of Principal Components

Y = β1 x1 + · · · + βi xi + · · · + βp xp = β′x.    (A1)

Let the synthetic index Y = β′x of the standardized bitmap variables vector x = (x1, ..., xi, ..., xp)′ have the greatest variance, i.e., maximize Var(Y) = Var(β′x) = β′Rβ. Let the standardized bitmap data set matrix X be

X = [x_{vi}]_{n×p} =
[ x_{11} ... x_{1i} ... x_{1p} ]
[   ...        ...        ... ]
[ x_{v1} ... x_{vi} ... x_{vp} ]
[   ...        ...        ... ]
[ x_{n1} ... x_{ni} ... x_{np} ].    (A2)

APPENDIX-B: SELECTIVE FACTOR SCORES (Ck = 10)

TABLE A.3


The correlation matrix R = [r_ij]_{p×p} is the variance-covariance matrix of the standardized bitmap data set matrix X = [x_{vi}]_{n×p}, i.e., the correlation coefficient matrix

R = [r_ij]_{p×p} =
[ 1     r_12  r_13  ...  r_1p ]
[ r_21  1     r_23  ...  r_2p ]
[ r_31  r_32  1     ...  r_3p ]
[ ...   ...   ...   ...  ...  ]
[ r_p1  r_p2  r_p3  ...  1    ].    (A3)

Find the eigenvalues and the respective eigenvectors as follows:

Maximize Var(Y) = β′Rβ    (A4)

subject to

Σ_{i=1}^{p} βi² = β′β = 1.

Take the Lagrangian under the equality constraint β′β = 1 for maximizing the objective function Var(Y) = β′Rβ:

Q = β′Rβ − λ(β′β − 1) → Maximize    (A5)

∂Q/∂β = ∂[β′Rβ − λ(β′β − 1)]/∂β = 2Rβ − 2λβ = 0 ⇒ Rβ = λβ.    (A6)

This leads to

(R − λI)β = 0.    (A7)

The eigenvalues (λ1, ..., λk, ..., λm, ..., λp) and the respective eigenvectors can then be found:

B_{p×p} = (β1, ..., βk, ..., βm, ..., βp) = [β_ik]_{p×p}

β′_k β_{k′} = 1,  k′ = k
β′_k β_{k′} = 0,  k′ ≠ k  (independence).    (A8)

Therefore, B′B = I.

According to (A6), Var(y_k) = β′_k R β_k = λk of component y_k, k = 1, 2, ..., p, can be obtained, and B′RB = Λ. Alternately, based on (A6), RB = BΛ → B′RB = B′BΛ, so B′RB = Λ when B′B = I, where the diagonalization matrix is

Λ = diag(λ1, λ2, ..., λp).    (A9)

Finally, based on (A8), the principal factor loading matrix A, the communality h_i, the eigenvalue λk, the contribution rate λk/p, and the accumulated contribution rate can be obtained (see Table A.2):

A = [a_ik] = [√λk β_ik].    (A10)

In the real world, one takes k = 1, 2, ..., m with m < p and determines the accumulated contribution rate Σ_{k=1}^{m} λk/p.

APPENDIX-C: SELECTIVE RAW DATA (Ck = 10)

TABLE A.4
Selective Raw Data (Ck = 10)

APPENDIX-D: MORE INTERPRETATION OF TABLE III

TABLE A.5


References

[1] L. Yan, "A PCA-based PCM data analyzing method for diagnosing process failures," IEEE Trans. Semiconduct. Manuf., vol. 19, no. 4, pp. 404–410, Nov. 2006.

[2] Z. Qian, F. Siegelin, B. Tippelt, and S. Muller, "Localization and physical analysis of a complex SRAM failure in 90 nm technology," Microelectron. Reliab., vol. 46, nos. 9–11, pp. 1558–1562, 2006.

[3] C. Weber, "Yield learning and the sources of profitability in semiconductor manufacturing and process development," IEEE Trans. Semiconduct. Manuf., vol. 17, no. 4, pp. 590–596, Nov. 2004.

[4] K. R. Skinner, D. C. Montgomery, G. C. Runger, J. W. Fowler, D. R. McCarville, T. Reed Rhoads, and J. D. Stanley, "Multivariate statistical methods for modeling and analysis of wafer probe test data," IEEE Trans. Semiconduct. Manuf., vol. 15, no. 4, pp. 523–530, Nov. 2002.

[5] K. Imai and T. Kaga, "A novel filtering method to extract three critical yield loss components (gross, repeated, and random) FIMER," IEEE Trans. Semiconduct. Manuf., vol. 13, no. 4, pp. 408–415, Nov. 2000.

[6] J. Horan and C. Lyden, "Rapid IC performance yield and distribution prediction using a rotation of the circuit parameter principal components," Microelectron. Reliab., vol. 38, no. 12, pp. 1913–1918, Dec. 1998.

[7] D. A. White and D. Boning, "Spatial characterization of wafer state using principal component analysis of optical emission spectra in plasma etch," IEEE Trans. Semiconduct. Manuf., vol. 10, no. 1, pp. 52–60, Feb. 1997.

[8] T. Hamada and M. Sugimoto, "Application of a bitmap analysis system to the forefront of DRAM devices development," in Proc. IEEE/SEMI Adv. Semiconductor Manuf. Conf., 1997, pp. 222–227.

[9] R. S. Collica, J. P. Card, and W. Martin, "SRAM bitmap shape recognition and sorting using neural networks," IEEE Trans. Semiconduct. Manuf., vol. 8, no. 3, pp. 326–332, Aug. 1995.

[10] I. T. Jolliffe, Principal Component Analysis. New York: Springer-Verlag, 2002, ch. 1.

[11] D. Y. Shee and G.-H. Tzeng, "The key dimensions of criteria for the evaluation of ISPs: An exploratory study," J. Comput. Inform. Syst., vol. 42, no. 4, pp. 112–121, 2002.

Yeou-Lang Hsieh was born in Taiwan in 1973. He received the Bachelors degree in engineering science from National Cheng Kung University, Tainan City, Taiwan, in 1995, the Masters degree in electronic engineering from National Chiao Tung University (NCTU), Hsinchu, Taiwan, in 1997, and a second Masters degree in management of technology from NCTU in 2005. He is currently pursuing the Ph.D. degree at the Institute of Management of Technology, NCTU.

He joined Taiwan Semiconductor Manufacturing Company (TSMC) as a Process Integration Engineer in 2000, and since 2004 has been a Principal Product Engineer. In this role, he took part in the Advanced Technology Yield Improvement Team. He is currently a Project Manager with the Customer Technical Supporting Division. During his time with TSMC, he has registered six U.S. patents for TSMC.

Gwo-Hshiung Tzeng (F'02) was born in Taiwan in 1943. He received the Bachelors degree in business management from the Tatung Institute of Technology, Taipei, Taiwan, in 1967, the Masters degree in urban planning from Chung Hsing University, Taichung, Taiwan, in 1971, and the Ph.D. degree in management science from Osaka University, Osaka, Japan, in 1977.

He was an Associate Professor with National Chiao Tung University, Hsinchu, Taiwan, from 1977 to 1981, a Research Associate with the Argonne National Laboratory, Argonne, IL, from July 1981 to January 1982, a Visiting Professor with the Department of Civil Engineering, University of Maryland, College Park, from August 1989 to August 1990, a Visiting Professor with the Department of Engineering and Economic System, Energy Modeling Forum, Stanford University, Stanford, CA, from August 1997 to August 1998, and a Professor with National Chiao Tung University from 1981 to 2003. He is currently a Chair Professor with the Institute of Management of Technology, National Chiao Tung University. His current research interests include statistics, multivariate analysis, network routing and scheduling, multiple criteria decision making, fuzzy theory, and hierarchical structure analysis applied to technology management, energy, environment, transportation systems, transportation investment, logistics, location, urban planning, tourism, technology management, electronic commerce, global supply chain, and so on.

Dr. Tzeng received the National Distinguished Chair Professor award (the highest honor offered by the Ministry of Education Affairs, Taiwan) and the Distinguished Research Fellow award (the highest honor offered by the NSC, Taiwan) in 2000. He received the Highly Cited Paper designation (March 13, 2009) from ESI for "Compromise solution by MCDM methods: A comparative analysis of VIKOR and TOPSIS," published in the European Journal of Operational Research in 2004, which has been identified by Thomson Reuters' Essential Science Indicators as one of the most cited papers in the field of economics. He received the Pinnacle of Achievement Award of the World in 2005, the National Distinguished Chair Professor and Award (highest honor offered) from the Ministry of Education Affairs of Taiwan, the Distinguished Research Award three times, and the Distinguished Research Fellow (highest honor offered) award from the National Science Council of Taiwan twice. He organized a Taiwan affiliate chapter of the International Association of Energy Economics in 1984 and was the Chairman of the Tenth International Conference on Multiple Criteria Decision Making, Taipei, from July 19 to 24, 1992. He was the Co-Chairman of the 36th International Conference on Computers and Industrial Engineering, Taipei, from June 20 to 23, 2006, and the Chairman of the International Summer School on Multiple Criteria Decision Making, Kainan University, Taiwan, from July 2 to 14, 2006. He is a member of IAEE, ISMCDM, World Transport, the Operations Research Society of Japan, the Society of Instrument and Control Engineers of Japan, the City Planning Institute of Japan, the Behaviormetric Society of Japan, and the Japan Society for Fuzzy Theory and Systems, and participates in many societies of Taiwan. He is the Editor-in-Chief of the International Journal of Operations Research, the International Journal of Information Systems for Logistics and Management, and others.

Grace Tyng-Ruu Lin received the Ph.D. degree from the Judge Business School of Cambridge University, Cambridge, U.K., in 2003.

She is currently an Associate Professor with the Institute of Management of Technology, National Chiao Tung University, Hsinchu, Taiwan. She has been published in the Journal of Business Ethics, Technology in Society, Developing Economies, Entrepreneurship and Regional Development, the Journal of Scientific and Industrial Research, the Journal of Global Business and Technology, Technovation, and others. Her current research interests include innovation and technology management, strategic management, and marketing studies.

Hsiao-Cheng Yu received the Ph.D. degree from the School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, in 1981.

He is currently a Professor with the Institute of Management of Technology, National Chiao Tung University (NCTU), Hsinchu, Taiwan. Before joining NCTU, he was a Member of Technical Staff at AT&T Bell Laboratories, Murray Hill, NJ, from 1985 to 1992, and a Consultant with Contel Information Systems, Great Neck, NY, from 1981 to 1985. His current research interests include telecommunications and broadcasting policies, entrepreneurship and venture capital, telecommunications technologies and service management, and trade skills and business paradigms in high-tech industries.
