A novel design of wafer yield model for semiconductor using a GMDH polynomial and principal component analysis


Jun-Shuw Lin

Department of Industrial Engineering and Management, National Chiao Tung University, 1001 Dah-Hsei Road, Hsin-Chu 300, Taiwan, ROC

Keywords: Yield model; Defect cluster index; Group method of data handling (GMDH); Principal component analysis (PCA)

Abstract

According to previous studies, neither the Poisson model nor the negative binomial model can accurately estimate the wafer yield. Numerous mathematical models proposed in past years are very complicated. Furthermore, neural network models cannot provide an explicit equation for managers to use. This paper therefore constructs a new wafer yield model in the form of a handy polynomial by using the group method of data handling (GMDH). In addition to the defect cluster index ($CI_M$), 12 critical electrical test parameters are considered simultaneously. Because GMDH should not be given too many input variables, principal component analysis (PCA) is used to reduce the 12 critical electrical test parameters to a manageable few dimensions without much loss of information. The proposed approach is validated by a case obtained from a DRAM company in Taiwan.

1. Introduction

For integrated circuit (IC) manufacturers, the wafer yield is a key index for evaluating profit. Semiconductor manufacturing companies strive to achieve defect-free products and increase their profit rate by adopting advanced manufacturing, planning, and evaluating technologies. Among these performance technologies (Leachman, 1993), wafer yield prediction is one of the most widely researched approaches in the complicated semiconductor manufacturing system. Wafer yield prediction is very important for a semiconductor manufacturing factory in improving yield, decreasing cost, and maintaining a good relationship with customers (Kumar et al., 2006). For this reason, managing the wafer yield is an essential task for engineers.

As the wafer size increases, the clustering phenomenon of defects becomes pronounced. Although the Poisson model is the simplest model to use, its essential assumption is that defects occur independently, with a constant probability of occurring in any small area on a wafer (Albin & Friedman, 1991). The negative binomial yield model (Stapper, 1973) includes a clustering index ($\alpha$), but the value of $\alpha$ can be very scattered and even negative, which makes the analysis unwieldy (Cunningham, 1990). Numerous mathematical models have been developed to predict wafer yield in the last 40 years (Cunningham, 1990; Stapper, 1991; Stapper & Rosner, 1995), but these models are very complicated in practice.

Neural networks have also been utilized to construct wafer yield models, but those models (Tong & Chao, 2008; Tong, Lee, & Su, 1997) require several parameters to be set (e.g., the number of neurons in the hidden layers, the momentum, and the learning rate) and cannot provide an explicit equation for managers to use. Thus, those neural network models are often difficult to use for managers who lack profound professional knowledge of wafer yield prediction.

On the basis of practicability, this paper constructs a new wafer yield model in the form of a handy polynomial by using the group method of data handling (GMDH) (Ivakhnenko, 1968, 1971). The proposed GMDH model does not need any statistical assumption and is easy to use. In addition to the defect cluster index ($CI_M$) (Tong, Wang, & Chen, 2007), 12 critical electrical test parameters are considered simultaneously. Because GMDH should not be given too many input variables, principal component analysis (PCA) (Pearson, 1901) is used to reduce the 12 critical electrical test parameters to a manageable few dimensions without much loss of information.

Finally, a case from a DRAM company in Taiwan is utilized to demonstrate the effectiveness of the proposed approach. Comparisons are also made among the negative binomial yield model, the back-propagation neural network (BPNN) yield model, the general regression neural network (GRNN) yield model (Tong & Chao, 2008), and the proposed GMDH yield model to demonstrate that the proposed approach is indeed superior.

E-mail address: junsoon1@hotmail.com

Expert Systems with Applications (contents lists available at SciVerse ScienceDirect)

2. Literature review

2.1. Yield models

The Poisson yield model assumes that the defects on a chip follow a Poisson probability distribution. Under this assumption, the probability that a chip has $k$ defects is

$$P(k) = \frac{e^{-\lambda_0}\,\lambda_0^{k}}{k!}, \quad k = 0, 1, 2, \ldots \qquad (1)$$

where $\lambda_0$ is the average number of defects per chip, and $k$ is the number of defects per chip. The Poisson yield model can be obtained as

$$Y = P(k = 0) = e^{-\lambda_0} \qquad (2)$$

Cunningham (1990) indicated that the Poisson yield model is appropriate when the chip size is less than 0.25 cm². However, as the chip size increases, the conventional Poisson yield model will frequently underestimate the actual wafer yield.

The negative binomial yield model proposed by Stapper (1973) is a widely applied yield model, which employs a gamma distribution for the defect density. The negative binomial yield model can be expressed as

$$Y = \frac{1}{(1 + D_0 A/\alpha)^{\alpha}} \qquad (3)$$

where $D_0$ is the average number of defects per unit area, $A$ is the chip area, and $\alpha$ is the cluster parameter. The value of $\alpha$ is calculated by the following equation:

$$\alpha = \frac{\lambda^{2}}{\sigma^{2} - \lambda} \qquad (4)$$

where $\lambda$ is the mean number of defects per chip, and $\sigma^{2}$ is the variance. Cunningham (1990) indicated that the value of $\alpha$ can be quite scattered and sometimes negative when the negative binomial yield model is used to predict yield.
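Eqs. (2)–(4) can be illustrated with a short numerical sketch. The defect counts below are hypothetical values chosen for illustration, not data from the case study, and the use of the sample variance (denominator n − 1) for $\sigma^2$ is an assumption, since the text does not state which estimator is meant.

```python
import math
from statistics import mean, variance


def poisson_yield(lam0: float) -> float:
    """Eq. (2): Y = exp(-lambda_0)."""
    return math.exp(-lam0)


def cluster_parameter(defects_per_chip) -> float:
    """Eq. (4): alpha = mean^2 / (variance - mean)."""
    m = mean(defects_per_chip)
    v = variance(defects_per_chip)  # sample variance (n - 1): an assumption
    return m * m / (v - m)


def negative_binomial_yield(lam: float, alpha: float) -> float:
    """Eq. (3) with lambda = D0 * A: Y = (1 + lambda / alpha) ** (-alpha)."""
    return (1.0 + lam / alpha) ** (-alpha)


# Hypothetical clustered counts: most chips are clean, a few carry many defects.
counts = [0, 0, 0, 1, 0, 5, 0, 0, 2, 0]
lam = mean(counts)
alpha = cluster_parameter(counts)
print("Poisson yield:          ", round(poisson_yield(lam), 4))
print("Negative binomial yield:", round(negative_binomial_yield(lam, alpha), 4))

# Hypothetical very regular counts: the variance falls below the mean,
# so Eq. (4) returns a negative alpha, the difficulty noted by Cunningham (1990).
regular_counts = [1, 1, 1, 2, 1, 1, 1, 1, 2, 1]
alpha_regular = cluster_parameter(regular_counts)
print("alpha for regular counts:", round(alpha_regular, 3))
```

For the clustered counts the negative binomial yield exceeds the Poisson yield, which matches the statement that the Poisson model underestimates yield when defects cluster.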

Other yield models are summarized in Stapper and Rosner (1995). Tong et al. (1997) proposed a neural network-based approach to predict the wafer yield. Langford, Liou, and Raghavan (2001) presented a simple, robust windowing method for the Poisson yield model to extract the systematic and random components of yield from wafer probe bin map data. Liou et al. (2002) presented a statistical modeling of MOS devices for parametric yield prediction. Meyer and Park (2003) presented a center-satellite model to predict defect-tolerant yield in the embedded core context. Dupret and Kielbasa (2004) presented a partial least squares (PLS) regression model to predict the yield from measurements obtained during production. Kim and Baldwin (2005) presented a theoretical yield model for the assembly process of area array solder interconnect packages. Tong and Chao (2008) proposed a general regression neural network (GRNN) to predict the wafer yield with clustered defects.

2.2. Defect cluster index

The intensity of defects clustered on a wafer can be depicted by a defect cluster index. The cluster parameter ($\alpha$) of the negative binomial model, the variance/mean ratio (V/M), and the cluster index (CI), which makes no parametric assumption, are commonly used. The negative binomial yield model is as follows:

$$Y = \frac{1}{(1 + \lambda/\alpha)^{\alpha}} \qquad (5)$$

where $\alpha$ is the cluster parameter and $\lambda$ is the mean number of defects per chip. Earlier reports show that the cluster parameter $\alpha$ in the negative binomial model may be quite scattered and may even have a negative value when the model is used to forecast yield (Cunningham, 1990).

Tyagi and Bayoumi (1992, 1994) superimposed grids of various sizes on a wafer map to measure the intensity of defects distributed on a wafer. The number of defects contained within each grid cell can be used to judge the spatial distribution of defects. The defect counts follow a Poisson distribution if the defects are randomly distributed. Because the variance (V) and the mean (M) are equal in the Poisson distribution, the value of V/M equals 1 if the wafer defects are randomly scattered. The value of V/M exceeds 1 if the defects distributed on a wafer are clustered. However, the values of V/M depend on how the grids are selected and cannot indicate the gradualness of cross-wafer defect density variations.
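The grid-based V/M ratio can be sketched as follows. This is a minimal illustration under stated assumptions: the wafer is simplified to a square region of side `wafer_size`, the defect coordinates are simulated, and the function name and parameters are hypothetical rather than taken from Tyagi and Bayoumi.

```python
import numpy as np


def variance_mean_ratio(x, y, wafer_size, grid_cells):
    """V/M of defect counts over a grid_cells x grid_cells grid.

    Simplification (an assumption): the wafer is treated as a square
    [0, wafer_size] x [0, wafer_size] rather than a disc.
    """
    counts, _, _ = np.histogram2d(
        x, y, bins=grid_cells, range=[[0.0, wafer_size], [0.0, wafer_size]]
    )
    counts = counts.ravel()
    return counts.var(ddof=1) / counts.mean()


rng = np.random.default_rng(0)

# Simulated random pattern: 400 defects spread uniformly over the region.
xr, yr = rng.uniform(0, 200, 400), rng.uniform(0, 200, 400)

# Simulated clustered pattern: 400 defects concentrated around one location.
xc = np.clip(rng.normal(60, 10, 400), 0, 200)
yc = np.clip(rng.normal(140, 10, 400), 0, 200)

vm_random = variance_mean_ratio(xr, yr, 200.0, 10)
vm_clustered = variance_mean_ratio(xc, yc, 200.0, 10)
print("V/M random:   ", round(vm_random, 2))
print("V/M clustered:", round(vm_clustered, 2))

# The same clustered wafer evaluated on a coarser grid gives a different V/M,
# which is the grid dependence mentioned in the text.
print("V/M clustered, 4 x 4 grid:", round(variance_mean_ratio(xc, yc, 200.0, 4), 2))
```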

Jun, Hong, Kim, Park, and Park (1999) proposed a cluster index based on the projected x and y coordinates of defect locations on a wafer. Defect clustering tends to show clumps in the x and y coordinates, which results in a large variance in defect intervals. However, showing clumps either on the x-axis or on the y-axis alone does not necessarily indicate clustered defects. The clustering index CI can be calculated as

$$CI = \min\left\{ \frac{S_V^{2}}{\bar{V}^{2}}, \; \frac{S_W^{2}}{\bar{W}^{2}} \right\} \qquad (6)$$

where $S_V^{2}$, $S_W^{2}$ and $\bar{V}$, $\bar{W}$ are the variances and means of $V_i$ and $W_i$, the sequences of defect intervals on the x-axis and y-axis, defined as

$$V_i = X_{(i)} - X_{(i-1)}, \quad i = 1, 2, \ldots, n \qquad (7)$$

$$W_i = Y_{(i)} - Y_{(i-1)}, \quad i = 1, 2, \ldots, n \qquad (8)$$

where $X_{(i)}$ and $Y_{(i)}$ denote the $i$th smallest defect coordinates on the x-axis and y-axis, respectively, $X_{(0)} = Y_{(0)} = 0$, and $n$ is the number of defects on a wafer. The value of CI is close to 1 if the defects are randomly scattered, and it is expected to be greater than 1 if clustering of defects appears.
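Eqs. (6)–(8) translate almost directly into code. The sketch below assumes non-negative coordinates measured from a corner of the region (so that $X_{(0)} = Y_{(0)} = 0$ is a sensible origin) and uses the sample variance; both choices, the simulated defect maps, and the function names are assumptions made for illustration.

```python
import numpy as np


def interval_scv(coords):
    """Variance of the defect intervals divided by their squared mean."""
    s = np.sort(np.asarray(coords, dtype=float))
    # V_i = X_(i) - X_(i-1), with X_(0) = 0 prepended as in Eqs. (7) and (8).
    intervals = np.diff(np.concatenate(([0.0], s)))
    return intervals.var(ddof=1) / intervals.mean() ** 2


def cluster_index(x, y):
    """Eq. (6): CI is the smaller of the x-axis and y-axis ratios."""
    return min(interval_scv(x), interval_scv(y))


rng = np.random.default_rng(1)

# Simulated random pattern: uniform coordinates give a CI near 1.
ci_random = cluster_index(rng.uniform(0, 200, 400), rng.uniform(0, 200, 400))

# Simulated clustered pattern: one tight clump gives a CI well above 1.
ci_clustered = cluster_index(rng.normal(60, 10, 400), rng.normal(140, 10, 400))

print("CI random:   ", round(ci_random, 2))
print("CI clustered:", round(ci_clustered, 2))
```

Taking the minimum of the two ratios reflects the remark above that a clump on a single axis is not sufficient evidence of clustering.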

3. Proposed approach

The construction of the proposed wafer yield model is described in the following subsections.

3.1. Group method of data handling (GMDH)

The GMDH (Ivakhnenko, 1968, 1971) can be expressed as a set of neurons in which different pairs of neurons in each layer are connected through a polynomial and thereby produce new neurons in the next layer. In GMDH, the training set is divided into two parts: the model learning set $E_1$ and the model selecting set $E_2$. Let $X = (X_1, X_2, \ldots, X_n)$ and $y$ be the input vector and the actual output, respectively. Given $M$ observations of multi-input, single-output data pairs $\{y_i, X_{i1}, X_{i2}, \ldots, X_{in}\}$, $i = 1, 2, \ldots, M$, in set $E_1$, I train a GMDH-type neural network to predict the output values $\hat{y}_i$:

$$\hat{y}_i = \hat{f}(X_{i1}, X_{i2}, \ldots, X_{in}), \quad i = 1, 2, \ldots, M \qquad (9)$$

The problem then becomes one of constructing a GMDH-type neural network such that

$$\min \sum_{i=1}^{M} \left[ \hat{f}(X_{i1}, X_{i2}, \ldots, X_{in}) - y_i \right]^{2} \qquad (10)$$

The connection between the input and output variables can be expressed by a complicated discrete form of the Volterra functional series:

$$y = a_0 + \sum_{i=1}^{n} a_i X_i + \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij} X_i X_j + \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n} a_{ijk} X_i X_j X_k + \cdots \qquad (11)$$

which is also called the Kolmogorov–Gabor (K–G) polynomial (Madala & Ivakhnenko, 1994; Muller & Lemke, 2000). In particular, it can be represented by K–G polynomials of degree 2 consisting of only two variables (neurons) in the form of

$$\hat{y} = G(X_i, X_j) = a_0 + a_1 X_i + a_2 X_j + a_3 X_i X_j + a_4 X_i^{2} + a_5 X_j^{2} \qquad (12)$$

In this manner, such a partial quadratic description is recursively used in a network of connected neurons to construct the general mathematical relation between the input and output variables given in Eq. (11). The coefficients $a_i$ in Eq. (12) are calculated with least squares (LS) (Madala & Ivakhnenko, 1994; Muller & Lemke, 2000). The coefficients of each quadratic function $G_i$ are thus chosen to optimally fit the output $y_i$ over the whole set $E_1$, that is

$$\min \left[ \frac{\sum_{i=1}^{M} (y_i - G_i)^{2}}{M} \right] \qquad (13)$$

By the GMDH algorithm, all possible pairs of independent variables out of the total $n$ input variables are taken in order to construct the polynomial in the form of Eq. (12) that best fits the dependent observations ($y_i$, $i = 1, 2, \ldots, M$) in the LS sense. Therefore, $C_n^{2} = n(n-1)/2$ neurons are constructed in the first hidden layer of the feed-forward network from the observations $\{(y_i, X_{ip}, X_{iq})\}$ for different $p, q \in \{1, 2, \ldots, n\}$. Likewise, $M$ data triples $\{(y_i, X_{ip}, X_{iq})\}$ can be constructed from the observations with such $p, q \in \{1, 2, \ldots, n\}$ in the form

$$\begin{pmatrix} X_{1p} & X_{1q} & y_1 \\ X_{2p} & X_{2q} & y_2 \\ \vdots & \vdots & \vdots \\ X_{Mp} & X_{Mq} & y_M \end{pmatrix}$$

Applying the quadratic sub-expression in the form of Eq. (12) to each row of the $M$ data triples gives the matrix equation $Aa = Y$, where $a = \{a_0, a_1, a_2, a_3, a_4, a_5\}^{T}$ is the vector of unknown coefficients of the quadratic polynomial in Eq. (12) and $Y = \{y_1, y_2, \ldots, y_M\}^{T}$ is the vector of observed output values. The matrix $A$ is

$$A = \begin{pmatrix} 1 & X_{1p} & X_{1q} & X_{1p}X_{1q} & X_{1p}^{2} & X_{1q}^{2} \\ 1 & X_{2p} & X_{2q} & X_{2p}X_{2q} & X_{2p}^{2} & X_{2q}^{2} \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & X_{Mp} & X_{Mq} & X_{Mp}X_{Mq} & X_{Mp}^{2} & X_{Mq}^{2} \end{pmatrix}$$

The LS method gives the solution of these equations in the form of

$$a = (A^{T}A)^{-1}A^{T}Y \qquad (14)$$

which determines the vector of the best coefficients of Eq. (12) for the whole set of $M$ data triples. Note that this procedure is repeated for each neuron of the next hidden layer according to the connectivity topology of the network. In each layer, LS is used to estimate the parameters of the candidate models in set $E_1$, and the external criterion is used to evaluate and select the candidate models in set $E_2$. The process stops when the optimal model is found according to the termination principle given by the theory of optimal complexity (Madala & Ivakhnenko, 1994): as model complexity increases, the value of the external criterion first decreases and then increases, and its global minimum corresponds to the optimal complexity.
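One GMDH layer, as described by the quadratic neuron and the LS solution of Eq. (14), can be sketched in a few lines. This is not the NeuroShell implementation used later in the paper. The external criterion is assumed here to be the mean squared error on $E_2$, which the text does not specify, and the data, function names, and the `keep` parameter are hypothetical.

```python
import itertools

import numpy as np


def design_matrix(xp, xq):
    """Rows [1, Xp, Xq, Xp*Xq, Xp^2, Xq^2]: the matrix A in A a = Y."""
    return np.column_stack([np.ones_like(xp), xp, xq, xp * xq, xp**2, xq**2])


def fit_neuron(xp, xq, y):
    """Least-squares coefficients of Eq. (14), computed with lstsq,
    which is numerically safer than forming (A^T A)^{-1} explicitly."""
    coeffs, *_ = np.linalg.lstsq(design_matrix(xp, xq), y, rcond=None)
    return coeffs


def gmdh_layer(X_learn, y_learn, X_select, y_select, keep):
    """Fit all n(n-1)/2 pairs on E1, rank them on E2, keep the best ones."""
    candidates = []
    for p, q in itertools.combinations(range(X_learn.shape[1]), 2):
        a = fit_neuron(X_learn[:, p], X_learn[:, q], y_learn)
        pred = design_matrix(X_select[:, p], X_select[:, q]) @ a
        mse = float(np.mean((y_select - pred) ** 2))  # assumed external criterion
        candidates.append((mse, p, q, a))
    candidates.sort(key=lambda c: c[0])
    return candidates[:keep]


# Synthetic data (hypothetical): the output depends on inputs 0 and 1 only.
rng = np.random.default_rng(2)
X = rng.normal(size=(120, 4))
y = 1.0 + 2.0 * X[:, 0] - X[:, 1] + 0.5 * X[:, 0] * X[:, 1]

layer = gmdh_layer(X[:80], y[:80], X[80:], y[80:], keep=10)
best_mse, best_p, best_q, best_a = layer[0]
print("candidate neurons:", len(layer))
print("best pair:", (best_p, best_q), "coefficients:", np.round(best_a, 3))
```

In a full network the outputs of the retained neurons would become the inputs of the next layer, and layers would be added until the external criterion stops decreasing.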

3.2. Principal component analysis (PCA)

Given a set of centered input vectors $x_t$ ($t = 1, 2, \ldots, l$ and $\sum_{t=1}^{l} x_t = 0$), each of which is of dimension $m$, $x_t = [x_t(1), x_t(2), \ldots, x_t(m)]^{T}$ (usually $m < l$), PCA (Pearson, 1901) linearly transforms each vector $x_t$ into a new one $s_t$ by

$$s_t = U^{T} x_t \qquad (15)$$

where $U$ is the $m \times m$ orthogonal matrix whose $i$th column $u_i$ is the $i$th eigenvector of the sample covariance matrix

$$C = \frac{1}{l} \sum_{t=1}^{l} x_t x_t^{T} \qquad (16)$$

In other words, PCA first solves the eigenvalue problem

$$\lambda_i u_i = C u_i, \quad i = 1, 2, \ldots, m \qquad (17)$$

where $\lambda_i$ is one of the eigenvalues of $C$ and $u_i$ is the corresponding eigenvector. Based on the estimated $u_i$, the components of $s_t$ are then calculated as the orthogonal transformations of $x_t$:

$$s_t(i) = u_i^{T} x_t, \quad i = 1, 2, \ldots, m \qquad (18)$$

The new components are called principal components. By using only the first several eigenvectors, sorted in descending order of the eigenvalues, the number of principal components in $s_t$ can be reduced; PCA therefore has a dimension-reduction property. The principal components have the following properties: the $s_t(i)$ are uncorrelated, they have sequentially maximum variances, and the mean squared approximation error in the representation of the original inputs by the first several principal components is minimal (Jolliffe, 1986).
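The eigenvalue formulation of Eqs. (15)–(18) can be checked numerically. The sketch below uses simulated data and a hypothetical function name; it verifies the first two stated properties, namely that the scores are uncorrelated and that their variances equal the eigenvalues in descending order.

```python
import numpy as np


def pca(X, n_components):
    """PCA by eigen-decomposition; X is an l x m matrix whose rows are x_t."""
    Xc = X - X.mean(axis=0)                # centre the data so that sum_t x_t = 0
    C = (Xc.T @ Xc) / Xc.shape[0]          # sample covariance matrix, Eq. (16)
    eigvals, eigvecs = np.linalg.eigh(C)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # reorder to descending
    eigvals, U = eigvals[order], eigvecs[:, order]
    scores = Xc @ U[:, :n_components]      # s_t(i) = u_i^T x_t, Eq. (18)
    return eigvals, U, scores


# Simulated correlated data (hypothetical): 200 observations of 5 variables.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

eigvals, U, scores = pca(X, n_components=2)
score_cov = scores.T @ scores / scores.shape[0]
print("eigenvalues:", np.round(eigvals, 3))
print("covariance of the first two scores:")
print(np.round(score_cov, 6))
```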

3.3. Defect cluster index ($CI_M$)

In this study, I use the clustering index ($CI_M$) proposed by Tong et al. (2007) to measure the clustering phenomenon of defects. The procedure for obtaining $CI_M$ consists of the following five steps.

Step 1: Project the defect coordinates $(X_i, Y_i)$ onto a new axis obtained by rotating the x-axis counterclockwise by $\theta°$. Suppose that a wafer has $n$ defects, and $(X_i, Y_i)$ denotes the x and y coordinates of the $i$th defect location in a two-dimensional space, $i = 1, \ldots, n$. These $n$ defects can then be projected onto a new axis $X_{i,\theta}$ obtained by rotating the x-axis counterclockwise by $\theta°$. The new coordinate of the $i$th defect with respect to $\theta$ is calculated as follows:

$$X_{i,\theta} = \cos\theta \cdot X_i + \sin\theta \cdot Y_i \qquad (19)$$

where $i$ denotes the $i$th defect and $\theta$ represents the rotation angle, with $0° \le \theta \le 180°$.

Step 2: Sort the $X_{i,\theta}$ values in ascending order and calculate the intervals between adjacent coordinate values. The intervals between adjacent coordinate values $X_{i,\theta}$ are calculated as follows:

$$V_{i,\theta} = X_{(i,\theta)} - X_{(i-1,\theta)} \qquad (20)$$

where $X_{(0,\theta)} = 0$ and $V_{i,\theta}$ represents the $i$th interval, between $X_{(i,\theta)}$ and $X_{(i-1,\theta)}$.

Step 3: Calculate the squared coefficient of variation (SCV) for $V_{i,\theta}$. The SCV for $V_{i,\theta}$ can be determined as follows:

$$SCV_\theta = \frac{S_{V,\theta}^{2}}{\bar{V}_\theta^{2}} \qquad (21)$$

where $SCV_\theta$ represents the squared coefficient of variation for $V_{i,\theta}$, $\bar{V}_\theta = \left(\sum_{i=1}^{n} V_{i,\theta}\right)/n$, and $S_{V,\theta}^{2} = \left(\sum_{i=1}^{n} (V_{i,\theta} - \bar{V}_\theta)^{2}\right)/(n - 1)$.

Step 4: Change the angle $\theta$ and calculate the corresponding $SCV_\theta$ value. By increasing $\theta$ in steps of 1°, the 180 $SCV_\theta$ values can be obtained through Steps 1–3.

Step 5: According to the $SCV_\theta$ values obtained from Step 4, the average $SCV_\theta$ value determines the clustering index ($CI_M$), as follows:

$$CI_M = \frac{\sum_{\theta=0}^{180} SCV_\theta}{180} \qquad (22)$$

where $CI_M$ represents the defect cluster index. A larger $CI_M$ value indicates a stronger degree of defect clustering on a wafer.

3.4. Prepare the relative data per wafer

In this study, the defect count, the value of $CI_M$, and the principal component scores are utilized as the input variables for GMDH. The actual wafer yield is the output variable for GMDH. The following subsections briefly describe how $CI_M$, the principal component scores, and the actual wafer yield are obtained.

3.4.1. Calculate the value of $CI_M$

The clustering phenomenon of defects on a wafer influences the accuracy of a wafer yield model, and $CI_M$ can effectively measure the clustering phenomenon on a wafer. $CI_M$ can be obtained by the five calculation steps introduced in Section 3.3.
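The five steps of Section 3.3 can be sketched as below, with two deviations that are assumptions of this sketch and not part of Tong et al. (2007). First, intervals are taken only between adjacent projected defects, omitting the $X_{(0,\theta)} = 0$ term, so that the result does not depend on where the coordinate origin lies. Second, the 180 angles are taken as 0°, 1°, ..., 179°. The defect maps are simulated and the function name is hypothetical.

```python
import numpy as np


def cluster_index_m(x, y, angles_deg=range(180)):
    """Average squared coefficient of variation of projected defect intervals.

    Deviations from the text (assumptions of this sketch):
    - only gaps between adjacent projected defects are used, so the
      X_(0, theta) = 0 convention of Eq. (20) is omitted;
    - theta runs over 0, 1, ..., 179 degrees (180 values).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    scv_values = []
    for theta in angles_deg:
        t = np.deg2rad(theta)
        projected = np.sort(np.cos(t) * x + np.sin(t) * y)   # Step 1 and sorting
        v = np.diff(projected)                                # Step 2 (adjacent gaps)
        scv_values.append(v.var(ddof=1) / v.mean() ** 2)      # Step 3
    return float(np.mean(scv_values))                         # Steps 4 and 5


rng = np.random.default_rng(4)

# Simulated random pattern: 400 defects spread uniformly.
x_rand, y_rand = rng.uniform(0, 200, 400), rng.uniform(0, 200, 400)

# Simulated clustered pattern: 100 background defects plus a clump of 300.
x_clu = np.concatenate([rng.uniform(0, 200, 100), rng.normal(60, 5, 300)])
y_clu = np.concatenate([rng.uniform(0, 200, 100), rng.normal(140, 5, 300)])

cim_random = cluster_index_m(x_rand, y_rand)
cim_clustered = cluster_index_m(x_clu, y_clu)
print("CI_M random:   ", round(cim_random, 2))
print("CI_M clustered:", round(cim_clustered, 2))
```

Averaging over many projection angles is what distinguishes this index from the CI of Section 2.2, which looks only at the x-axis and y-axis projections.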

3.4.2. Obtain the value of principal component scores

Principal component analysis (PCA) is used to form new variables that are linear combinations of the original variables (i.e., the 12 critical electrical test parameters). The standardized data of the original variables are then substituted into the linear equations of the new variables to obtain the principal component scores.

3.4.3. Calculate the value of actual wafer yield

The actual yield value is obtained by dividing the number of non-defective chips by the total number of chips on a wafer.

3.5. Verify the proposed model

The accuracy of neural networks can be measured by the root-mean-squared error (RMSE). The smaller the RMSE, the higher the accuracy of the network. The RMSE can be calculated as

$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} (A_i - O_i)^{2}}{n}} \qquad (23)$$

where $n$ represents the number of data points, $A_i$ represents the actual value of the output, and $O_i$ represents the predicted value. The general indicator for measuring the strength of the relationship between the actual and predicted outputs is Pearson's linear correlation coefficient $r$. In this study, RMSE and $r$ are both used to evaluate the performance of the wafer yield model.
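Both evaluation measures are straightforward to compute. The sketch below implements Eq. (23) and Pearson's $r$ with NumPy; the four actual and predicted yield values are hypothetical numbers chosen so the arithmetic can be followed by hand.

```python
import numpy as np


def rmse(actual, predicted):
    """Eq. (23): square root of the mean squared difference A_i - O_i."""
    a = np.asarray(actual, dtype=float)
    o = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((a - o) ** 2)))


def pearson_r(actual, predicted):
    """Pearson's linear correlation coefficient between the two series."""
    return float(np.corrcoef(actual, predicted)[0, 1])


# Hypothetical yields for four wafers (illustrative only).
actual = [0.90, 0.80, 0.70, 0.95]
predicted = [0.88, 0.83, 0.72, 0.93]

# Squared errors: 0.0004, 0.0009, 0.0004, 0.0004; their mean is 0.000525.
print("RMSE:", round(rmse(actual, predicted), 5))
print("r:   ", round(pearson_r(actual, predicted), 4))
```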

4. Implementation

In this section, a case from a DRAM company in Taiwan is utilized to demonstrate the effectiveness of the proposed approach. Comparisons are also made among the negative binomial yield model, the back-propagation neural network (BPNN) yield model, the general regression neural network (GRNN) yield model (Tong & Chao, 2008), and the proposed GMDH yield model to demonstrate that the proposed approach is indeed superior.

Table 1
The partial 12 critical electrical test parameters of 111 wafer data in this case.

No.    Parameter x1    Parameter x2    Parameter x3    ...    Parameter x12
1      1.1727          0.0658          322.9658        ...    2.1098
2      1.1662          0.0615          315.9007        ...    2.6533
3      1.1774          0.0682          322.2566        ...    2.3412
4      1.1695          0.0632          310.9355        ...    2.2145
5      1.1834          0.0595          313.5132        ...    2.4589
6      1.1607          0.0659          320.4290        ...    2.5098
...
109    1.1808          0.0614          319.6768        ...    2.3325
110    1.1762          0.0604          321.3472        ...    2.5977
111    1.1849          0.0632          317.0603        ...    2.1104

Fig. 1. The scree plot of PCA.


4.1. PCA of 12 critical electrical test parameters

The case from the DRAM company in Taiwan comprises data for a total of 111 8-in. wafers, and 12 critical electrical test parameters per wafer are considered. Part of the 12 critical electrical test parameters of the 111 wafers is listed in Table 1.

The computer software STATISTICA 6.0 is used to perform PCA. Fig. 1 shows the scree plot of PCA, Fig. 2 shows the eigenvalues of PCA, and Fig. 3 shows the eigenvectors of PCA. By Kaiser's rule, only those components whose eigenvalues are greater than 1 are retained. Therefore, 3 principal components should be retained. According to Fig. 3, the 3 principal components can be calculated as follows:

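$$\mathrm{Prin1} = 0.0267x_1 - 0.3285x_2 - 0.2843x_3 + \cdots + 0.3363x_{12} \qquad (24)$$

$$\mathrm{Prin2} = 0.0077x_1 + 0.2830x_2 + 0.3014x_3 + \cdots + 0.2779x_{12} \qquad (25)$$

$$\mathrm{Prin3} = 0.9385x_1 + 0.0655x_2 - 0.1783x_3 + \cdots + 0.0534x_{12} \qquad (26)$$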

The standardized principal component scores of Prin1, Prin2, and Prin3 are partially listed in Table 2.
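The retention step can be sketched as follows: standardize the 12 parameters, take the eigenvalues of their correlation matrix, keep the components with eigenvalues greater than 1 (Kaiser's rule), and compute the scores. The exact procedure inside STATISTICA is not described in the text, so this is an assumption-laden illustration on simulated data built from three latent factors; the function name is hypothetical.

```python
import numpy as np


def kaiser_components(X):
    """Standardize X, then keep components whose eigenvalue exceeds 1."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized data
    R = np.corrcoef(Z, rowvar=False)                   # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)               # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]                  # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    k = int(np.sum(eigvals > 1.0))                     # Kaiser's rule
    scores = Z @ eigvecs[:, :k]                        # principal component scores
    return eigvals, eigvecs[:, :k], scores


# Simulated stand-in for the 111 x 12 parameter table (not the real data):
# three latent factors, each driving four of the twelve parameters.
rng = np.random.default_rng(5)
latent = rng.normal(size=(111, 3))
X = np.repeat(latent, 4, axis=1) + 0.3 * rng.normal(size=(111, 12))

eigvals, loadings, scores = kaiser_components(X)
print("eigenvalues:", np.round(eigvals, 3))
print("components retained:", scores.shape[1])
```

Because the eigenvalues of a correlation matrix sum to the number of variables, an eigenvalue above 1 means the component explains more variance than any single standardized parameter.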

Fig. 3. The eigenvectors of PCA.

Fig. 4. The five clustering patterns.

Table 2
The standardized principal component scores of Prin1, Prin2, and Prin3.

No.    Scores of Prin1    Scores of Prin2    Scores of Prin3
1      1.2071             0.6119             0.1203
2      0.6368             1.5014             0.0992
3      1.6939             0.1562             1.3042
...
109    0.8735             1.3814             0.5404
110    0.6523             0.8559             1.4322
111    0.2593             1.6555             1.4796

4.2. Construct a new wafer yield model using GMDH

In this study, the computer software NeuroShell 2.0 is used to construct the proposed GMDH yield model. In this case of a DRAM company in Taiwan, one random pattern and four clustering patterns (i.e., bull eye pattern, edge pattern, bottom pattern, and crescent moon pattern) (Friedman, Hansen, Nair, & James, 1997) are considered, and these five patterns are shown in Fig. 4.

Eighty-nine wafers are randomly selected as training samples, and the remaining 22 wafers are the testing samples. The result of GMDH learning is shown in Fig. 5. From Fig. 5, the GMDH polynomial of the proposed yield model is given in Eq. (27):

$$\begin{aligned}
Y ={}& 8.8\times10^{-2}X_5 - 0.15X_2 + X_3 - 0.11X_4 - 3.8\times10^{-2} - 7.3\times10^{-2}X_1 - 1.1X_3^{2} - 6.9X_4^{2} \\
& - 0.61X_3^{3} + 1.8X_4^{3} + 8.5X_3X_4 - 1.3X_3X_5 + 7X_4X_5 + 7.9X_3X_4X_5 + 0.1X_2^{2} \\
& + 1.1X_2X_3 + 0.28X_2X_5 - 1.4X_2X_3^{2} - 6.5X_2X_4^{2} - 0.77X_2X_3^{3} + 1.7X_2X_4^{3} \\
& + 8.1X_2X_3X_4 - 1.3X_2X_3X_5 + 6.6X_2X_4X_5 + 7.5X_2X_3X_4X_5 - 0.26X_1^{2} \\
& + 0.12X_1^{3} + 0.19X_5^{2} + 0.15X_2^{3} + 0.21X_5^{3}
\end{aligned} \qquad (27)$$

where $Y$ denotes the predicted wafer yield value, $X_1$ is the defect count, $X_2$ is the value of $CI_M$, $X_3$ is the score of Prin1, $X_4$ is the score of Prin2, and $X_5$ is the score of Prin3. The value of $RMSE = \sqrt{MSE} = \sqrt{0.010540} = 0.1027$, and the value of the correlation coefficient is 0.9784.

4.3. Compare with other wafer yield models

Finally, the comparisons among the negative binomial yield model, the back-propagation neural network (BPNN) yield model, the general regression neural network (GRNN) yield model (Tong & Chao, 2008), and the proposed GMDH yield model are listed in Table 3. The scatter plots for the negative binomial yield model, the BPNN yield model, the GRNN yield model, and the proposed GMDH yield model are shown in Figs. 6–9.

From Table 3, it can be seen that the proposed GMDH model has the smallest RMSE and the largest correlation coefficient. Therefore, the predictive accuracy of the proposed model is indeed superior.

Table 3
Comparisons of RMSE and r between predictive and actual yield value.

Yield model                      RMSE      r
Negative binomial yield model    0.1443    0.9159
BPNN yield model                 0.1224    0.9308
GRNN yield model                 0.1189    0.9496
Proposed GMDH yield model        0.1027    0.9784

Fig. 5. The result of GMDH learning.

Fig. 6. The scatter plot in negative binomial yield model.

Fig. 7. The scatter plot in BPNN yield model.

Fig. 9. The scatter plot in GMDH yield model.

(Scatter plot axes: actual yield value and predictive yield value, each ranging from 0.6 to 1.)

5. Conclusions

When the clustering phenomenon of defects is pronounced, the conventional Poisson yield model cannot reasonably estimate the wafer yield. Neural network models require parameters to be set and cannot provide an explicit equation for managers to use.

On the basis of practicability, this paper constructs a new wafer yield model in the form of a handy polynomial by using GMDH, and the model can accurately predict the wafer yield. In addition to the defect cluster index ($CI_M$), 12 critical electrical test parameters are considered simultaneously. In this study, PCA is used to reduce the 12 critical electrical test parameters to a manageable few dimensions without much loss of information.

The merits of the proposed approach are summarized as follows:

(1) The proposed GMDH yield model provides a handy polynomial for managers to use, and it does not require the parameters of a neural network to be set.

(2) This study employs PCA to reduce the 12 critical electrical test parameters to a manageable few dimensions without much loss of information, which effectively simplifies the construction of the variables.

(3) The proposed GMDH yield model learns quickly and predicts with high accuracy.

(4) The proposed GMDH yield model does not need any statistical assumption and is easy to use.

(5) The proposed GMDH yield model can help IC manufacturers to manage the wafer yield and evaluate their process capability in relation to profit and loss.

Acknowledgement

The author thanks the National Chiao Tung University for its resourceful support.

References

Albin, S. L., & Friedman, D. J. (1991). Clustered defects in IC fabrication: impact on process control charts. IEEE Transactions on Semiconductor Manufacturing, 4(1), 36–42.

Cunningham, J. A. (1990). The use and evaluation of yield models in integrated circuit manufacturing. IEEE Transactions on Semiconductor Manufacturing, 3(2), 60–71.

Dupret, Y., & Kielbasa, R. (2004). Modeling semiconductor manufacturing yield by test data and partial least squares. In Proceedings of 16th international conference on microelectronics (pp. 404–407).

Friedman, D. J., Hansen, M. H., Nair, V. N., & James, D. A. (1997). Model-free estimation of defect clustering in integrated circuit fabrication. IEEE Transactions on Semiconductor Manufacturing, 10(3), 344–359.

Ivakhnenko, A. G. (1968). The group method of data handling; a rival of the method of stochastic approximation. Soviet Automatic Control, 13(3), 43–55.

Ivakhnenko, A. G. (1971). Polynomial theory of complex systems. IEEE Transactions on Systems, Man and Cybernetics, 1(4), 364–378.

Jolliffe, I. T. (1986). Principal component analysis. New York: Springer.

Jun, C. H., Hong, Y., Kim, S. Y., Park, K. S., & Park, H. (1999). A simulation-based semiconductor chip yield model incorporating a new defect cluster index. Microelectronics Reliability, 39(4), 451–456.

Kim, C., & Baldwin, D. F. (2005). A theoretical yield model for assembly process of area array solder interconnects packages with experimental verification. IEEE Transactions on Electronics Packaging Manufacturing, 28(4), 344–354.

Kumar, N., Kennedy, K., Gildersleeve, K., Abelson, R., Mastrangelo, C. M., & Montgomery, D. C. (2006). A review of yield modeling techniques for semiconductor manufacturing. International Journal of Production Research, 44(23), 5019–5036.

Langford, R. E., Liou, J. J., & Raghavan, V. (2001). The application and validation of a new robust windowing method for the Poisson yield model. In Advanced semiconductor manufacturing conference, IEEE/SEMI (pp. 157–160).

Leachman, R. C. (1993). The competitive semiconductor manufacturing survey. In IEEE international symposium on semiconductor manufacturing conference, Austin, Texas, USA (pp. 359–381). Piscataway, NJ: IEEE.

Liou, J. J., Zhang, Q., McMacken, J., Thomson, J. R., Stiles, K., & Layman, P. (2002). Statistical modeling of MOS devices for parametric yield prediction. Microelectronics Reliability, 42(4), 787–795.

Madala, H. R., & Ivakhnenko, A. G. (1994). Inductive learning algorithms for complex systems modeling. Boca Raton, FL: CRC Press.

Meyer, F. J., & Park, N. (2003). Predicting defect-tolerant yield in the embedded core context. IEEE Transactions on Computers, 52(11), 1470–1479.

Muller, J. A., & Lemke, F. (2000). Self-organising data mining: An intelligent approach to extract knowledge from data. Hamburg: Libri.

Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2, 559–572.

Stapper, C. H. (1973). Defect density distribution for LSI yield calculations. IEEE Transactions on Electron Devices (Correspondence), 20(7), 655–657.

Stapper, C. H. (1991). On Murphy’s yield integral. IEEE Transactions on Semiconductor Manufacturing, 4(4), 294–297.

Stapper, C. H., & Rosner, R. J. (1995). Integrated circuit yield management and yield analysis: Development and implementation. IEEE Transactions on Semiconductor Manufacturing, 8(2), 95–102.

Tong, L. I., & Chao, L. C. (2008). Novel yield model for integrated circuit with clustered defects. Expert Systems with Applications, 34, 2334–2341.

Tong, L. I., Lee, W. I., & Su, C. T. (1997). Using a neural network-based approach to predict the wafer yield in integrated circuit manufacturing. IEEE Transactions on Components, Packaging, and Manufacturing Technology – Part C, 20(4), 288–294.

Tong, L. I., Wang, C. H., & Chen, D. L. (2007). Development of a new cluster index for wafer defects. International Journal of Advanced Manufacturing Technology, 31, 705–715.

Tyagi, A., & Bayoumi, M. A. (1992). Defect clustering viewed through generalized Poisson distribution. IEEE Transactions on Semiconductor Manufacturing, 5(3), 196–206.

Tyagi, A., & Bayoumi, M. A. (1994). The nature of defect patterns on integrated circuit wafer maps. IEEE Transactions on Reliability, 43(1), 22–29.