Monitoring Batch Processes with Multiple On–Off Steps in Semiconductor Manufacturing

(1)

Monitoring Batch Processes with

Multiple On–Oﬀ Steps in

Semiconductor Manufacturing

SHUI-PIN LEE

Ching Yun University, Taoyuan, Taiwan 32097

AN-KUO CHAO

National Tsing Hua University, Hsinchu, Taiwan 30013

FUGEE TSUNG

Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong

DAVID SHAN HILL WONG, SHENG-TSIANG TSENG, and SHI-SHANG JANG

National Tsing Hua University, Hsinchu, Taiwan 30013

A modern semiconductor manufacturing line is made of hundreds of sequential batch-processing stages. Each of these stages consists of many steps carried out by expensive tools, which are monitored by numerous sensors capable of sampling at intervals of seconds. The sensor readings of each run constitute profiles, which can include extremely drastic changes. The heterogeneous variations at different profile points are mainly due to on–off recipe actions at specific points. In addition, the analysis of these profiles is further complicated by long-term trends due to tool aging and short-term effects specific to the first wafer in a lot cycle. Statistical process control methods that fail to take these effects into consideration will lead to frequent false alarms. A systematic method is proposed to address these challenges. First, a reference profile is determined for each sensor variable that describes the on–off actions. Next, level shifts of these profiles in each step are established to capture and remove intrinsic variations due to long-term aging trends and the short-term first-wafer effects. The residuals are used to formulate a health index, and this index can be used to monitor the health of the equipment and detect faulty wafers efficiently.

Key Words: Batch-Processing Stages; Health Index; Heterogeneous Variation; Multivariate Statistical Pro-cess Control; Sensor Data; Signal Map.

Introduction

T

HE SEMICONDUCTORmanufacturing industry has

been one of the fastest growing industries of the

Dr. Lee is an Assistant Professor in the Department of Industrial Engineering and Management. His email address is [email protected].

Mr. Chao is a Graduated Student in the Institute of Statis-tics. His email address is [email protected].

Dr. Tsung is a Professor in and Head of the Department of Industrial Engineering and Logistics Management. He is a Fellow of ASQ. His email address is [email protected].

Dr. Wong is a Professor in the Department of Chemical Engineering. His email address is [email protected].

Dr. Tseng is a Professor in the Institute of Statistics. He is a Member of ASQ. His email address is [email protected] .edu.tw.

Dr. Jang is a Professor of Chemical Engineering. His email address is [email protected].

(2)

FIGURE 1. The Synchronized Proﬁles of Variables V1 to V5 over a 364 Wafer Batch.

last two decades. Effective control of final product quality has become one of the keys to maintain-ing competitiveness (May and Spanos (2006)). How-ever, a modern semiconductor manufacturing line in-volves hundreds of sequential processes, and the re-sults of off-line quality tests are available only weeks or months after a problem has occurred in one of the processes. For some processes, such as planariza-tion and etching, inline metrology data on thickness and critical dimensions are available as indicators of process performance. Statistical control of these line metrology data is used to ensure that these in-dividual processes are operating properly. However, for many processes, such as implanting and thermal annealing, inline metrology data may not be avail-able. Tracking process performance depends solely on monitoring sensor data of the equipment. Advances in information technology make available the data collected from a large number of sensors during the course of each run, known as “profiles”. Data in the form of a profile presents special opportunities and challenges for statistical monitoring. Woodall et al. (2004) and Woodall (2007) have presented a compre-hensive review of the issues involved. In the

follow-ing, we will use a motivating example to illustrate the problems from a practical perspective.

Motivating Example

Figure 1 illustrates profiles of five sensor variables V1–V5 for 364 wafers from 16 lots of an industrial etching process. The lengths of the original profiles vary from wafer to wafer. They were synchronized using Akima splines (Akima (1970)). The x-axis rep-resents 547 synchronized profile points divided into 23 steps. The vertical dashed lines mark the begin-ning of each step. Variables V1 to V4 relate to plasma operations, which show large on–off variations. V5 is the chamber temperature.

Figure 2(a) and Figure 2(b) show the profile plots of variable V3 at step 9 for all 364 wafers and 25 wafers of a specific lot, respectively. The colors in these profiles have been arranged to change from light to dark according to the actual start time of the wafer. Figure 2(a) demonstrates a long-term ag-ing trend that shifted the profiles downward. In Fig-ure 2(b), the profile of the first wafer is marked by triangles. This profile is substantially lower than the

(3)

(a) (b)

264 272 280 288 296 304 312 320 328 336

FIGURE 2. Long-Term Aging and Short-Term First-Wafer Effects for Variable V3 in a Segment of the Entire Profile. (a) Long-term aging trend from wafer to wafer (light to dark). (b) Short-term first-wafer effects within a lot (first wafers marked by triangles).

others, which also have an upward trend within the lot. The deviation of the first wafer of a lot is known as the “first-wafer” effect. For experienced engineers, such short- and long-term fluctuations are regarded as part of normal operations, i.e., they are “intrinsic” variations that should not cause an alarm.

Figure 3 shows the mean centered profiles, which are the deviations of individual profiles from the mean profile, of V3 at step 9. The fluctuations at the beginning of this step are significantly larger than other fluctuations. “On–off” actions are very

FIGURE 3. The Mean Centered Proﬁle of Variable V3 at the 9th Step.

common in many semiconductor processes. The pro-files usually exhibit drastic changes at specific profile points. Sensor variables may have large fluctuations at certain profile points and small variations at the others because of such “on–off” actions.

Objective and Scope

As shown in the motivating example, variations of proﬁles may show structures that are correlated be-tween proﬁle points and auto-correlated from sample to sample. These structures may be caused by intrin-sic causes that must be isolated so they will not cause undesired alarms, after which the residuals can be monitored statistically. In this paper, a simple and robust method is proposed to address this problem. Furthermore, to detect faulty wafers, we propose a health index that is constructed based on the resid-uals of this model.

Literature Review

Proﬁle monitoring has attracted much scholarly attention (e.g., Kang and Albin (2000), Kim et al. (2003), Chang and Gan (2006), Zou et al. (2006), Zhu and Lin (2010), Mahmoud et al. (2007), Zou et al. (2007)). Practitioners have used Shewhart charts, multivariate Hotelling T2 _charts,

exponen-tially weighted moving average (EWMA) charts, and multivariate EWMA charts to monitor model pa-rameters such as the slope, intercept, and residual variance of linear proﬁles. For more complicated pro-ﬁles, nonlinear models can be used. Williams et al. (2007) proposed a four-parameter logistic regression

(4)

model to represent certain proﬁles. The parameters of the nonlinear regression model can be monitored using Hotelling T2 _{charts. If the functional form is}

not smooth enough, wavelets can be used (e.g., Jin and Shi (1999), Chicken et al. (2009)).

When conventional control charts are used to monitor fixed-effect parameters, an essential assump-tion is that the model parameters remain unchanged as long as the process is “in-control”. Hence, possi-ble sample-to-sample trends of an in-control process are neglected. Moreover, because the error terms at different profile point are often assumed to be in-dependent and identically distributed random vari-ables, within-profile variations are also neglected.

To consider sample-to-sample auto-correlation in a profile, Noorossana et al. (2008) modified the linear model to include a first-order autoregressive (AR(1)) term in the errors. Jensen et al. (2008) examined a linear mixed model,

yi= Xiβ+ Zibi+εi, (1)

where yi = (yi1, . . . , yiK) is a vector of

measure-ments at K proﬁle points that are related to a set of explanatory variables Xi = (xi1, . . . , xiK).β is a

vector of ﬁxed-eﬀect parameters. Zidenotes a vector

of the predictor variables for the possible change in shape of the proﬁle. Furthermore, the random eﬀect parameter, bi, follows a multivariate normal

distri-bution with mean vector 0 and covariance matrix

D, i.e., bi ∼ MN (0, D); and εi ∼ MN (0, Ri) is a

noise term, which is normally distributed with co-variance matrix Ri. Thus, within-proﬁle correlation

of the residual is also considered. The model was later extended to a nonlinear mixed model (Jensen et al. (2009)). Restricted maximum likelihood methods were used to obtain the parameters of this model, but due to the large number of adjustable parame-ters, an iterative solution is possible only when cer-tain simple forms, such as compound symmetry or autoregression, are assumed for Ri.

We have carried out simulation studies using the linear mixed model. Convergence of parameter esti-mates using the restricted maximum-likelihood algo-rithm may not be guaranteed, and the computation time increases as the square of the profile length. Such computation problems are obviously not desir-able in practical applications. As has been pointed out by Jensen et al. (2008), the linear mixed model approach will only be superior to the linear fixed-effect model when the data are unbalanced or miss-ing. They also noted that careful selection of ini-tial guesses is required for convergence. Moreover,

Hotelling’s T2 _{was used to monitor b}

i based on the

assumption of a multivariate normal distribution of random effects. In order to account for within-profile correlation, Qiu et al. (2010) has proposed the fol-lowing nonparametric mixed effects model:

yik= g(xik) + fi(xik) +εik, (2)

where g is the population profile function (fixed ef-fect), fi is the random-effects term describing the

variation of the ith individual profile from the fixed effect. The within-profile correlation will be ac-counted for by assuming that the random effect fi

is a mean zero process with a common covariance function, r(xk, xk) = E(fi(xk), fi(xk)). Because

within-proﬁle correlation has been accounted for in the random eﬀect, the error term is assumed to be

εi ∼ MN (0, σi2I). For Phase II analysis, the

in-control regression function was denoted by g0. The

in-control hypothesis then becomes H0: gi = g0 and

is monitored by a Hotelling T2 _{chart of the error}

term. Due to the complexity of the model, reliable estimation is possible only with a large number of proﬁle samples as well as a large number of proﬁle points.

Alternatively, principal component analysis (PCA) is a technique commonly used in Phase I for model building (Jones and Rice (1992), Nomikos and Mac-Gregor (1995a, 1995b), Kaistha et al. (2004), Ding et al. (2006)). To use PCA, the profile data of one or more sensor variables are unfolded so that each profile point is regarded as a different variable. Mod-ern instrumentation and data acquisition technolo-gies provide users with a large amount of data. In the monitoring of semiconductor equipment, datasets with over 50 sensor variables and 500 profile points are very common. This leads to an unfolded matrix with more than 25,000 variables. Collecting enough wafer samples to perform statistically valid multi-variate analysis is often very difficult. Wise et al. (1999) has provided an overview of using multivariate statistical methods in semiconductor-manufacturing processes. Subsequently, several authors (e.g., Li et al. (2000), Spitzlsperger et al. (2005), Chen and Zhang (2010), Cherry and Qin (2006), Ge and Song (2010)) highlighted the trade-off between sensitivity and false alarms in Phase II monitoring when ap-plying such PCA monitoring methods due to drifts or non-Gaussian process behavior. Figure 4 shows the Q statistic (from Wise et al. (1999)) and a q-q plot of the error term of a multi-way PCA (MPCA) model for V3 at step 9. It is obvious that the distributions of the Q statistics are

(5)

diﬀer-FIGURE 4. The Control Chart and q-q Plot of Q Statistic Obtained Using Residuals of the First Principal Component of Variable V3 at the 9th Step in Phase I and Phase II.

ent between the two periods. This phenomenon was persistently observed when different periods of plant operation were sampled. Consultation with process engineers confirmed that such drifts are part of the normal process that they do not wish to be alarmed. As Apley (2010) has pointed out, in profile monitoring, within-profile correlation and sample-to-sample autocorrelation must be identified. More-over, we should distinguish between trends due to in-control intrinsic causes and variations that should be monitored. Furthermore, it is important to deter-mine whether variations that should be monitored will show up in within-profile structures or in the error terms.

Model Development

Let us denote wafers, variables, and proﬁle points with the indices i = 1, . . . , I, j = 1, . . . , J and k = 1, . . . , K, respectively. These K proﬁle points are separated into M steps in accordance with the recipe setting. If Yijk is the value of the jth variable

at the kth proﬁle point of the ith wafer, then Yijk is

given by

Yijk= μjk+ bijm(k)+ εijk, (3)

where m(k) denotes the speciﬁc step to which the kth proﬁle point belongs. If km−1 < k ≤ km, where

km is the last proﬁle point of the mth step, then

m(k) = m.

In this equation, μjkis the recipe-proﬁle parameter

of the jth sensor variable at the kth profile point. The values are determined by the recipe, so large changes due to on–off changes are mostly captured in μjk. In many profile-monitoring studies, a

fixed-reference profile has been used (e.g., Woodall et al. (2004) employed a linear profile, Jensen et al. (2008) applied a logistic function, Zou et al. (2008) used a fixed combination of kernel functions). Our method corresponds to a direct estimation of a fixed-reference profile without the need of a specific profile form or assumptions about the smoothness of the profile.

Meanwhile, bijm is a level-shift parameter of the

jth variable of the ith wafer at the mth step. It rep-resents a mean shift at that point with respect to the ﬁxed reference proﬁle. Thus, bijm corresponds to

the profile coefficients determined for individual sam-ples. Unlike in other profile monitoring approaches in which the coefficients are assumed to be random vari-ables, we shall show that deterministic mechanisms dominate the variations of bijm.

If we assume that all systematic variations in-cluding on–off recipe steps, long-term aging, and the short-term first-wafer effect are captured by μjkand

bijm, then εijk denotes the remaining error. As we

have seen in Figure 3, the independent and identi-cally distributed assumption may not be valid for proﬁles with large on–oﬀ actions. Hence, we assume that the variances of the white noise term εijk,

de-noted by σ2

jk, will vary with the proﬁle point k. In

order to reduce the inﬂuence of the larger variation at the beginning proﬁle points of a step on the es-timates of the parameters, consider weighted least square estimates (WLSE) of the parameters μjkand

bijm by minimizing the following weighted sum of

square errors (WSS): WSSjm= I i=1 km k=km−1+1 1 σ2 jk (Yijk−μjk−bijm)2, (4)

(6)

with j = 1, . . . , J and m = 1, . . . , M . Given σ2

jk and

the constraints I_i=1bijm = 0 for all j and m, the

WLSEs of μjk and bijm are

ˆ μjk=Yjk= 1 I I i=1 Yijk (5) and ˆbijm= km k=km−1+1 wjk(Yijk− ˆμjk), (6) where wjk= 1 σ2 jk km k=km−1+1 1 σ2 jk

denotes the weight of (Yijk − ˆμjk) for the

esti-mate of bijm. In fact, Equations (5) and (6) are the

maximum-likelihood estimates of μjk and bijm if we

assume the error terms εijk are mutually

indepen-dent and normally distributed for all i = 1, . . . , I and k = 1, . . . , K. Even though this assumption will be violated with sample-to-sample autocorrela-tion and within-profile correlaautocorrela-tion (e.g., from Fig-ure 3, the residuals are clearly correlated within pro-files across k, as well as temporally across i), we be-lieve these still constitute intuitively appealing and effective estimators. Apley (2010) has commented that, because within-profile correlation is pervasive in most profile-monitoring problems, such correlation will be revealed as sample-to-sample variations when we assume that there is no within-profile correlation, as accounted for by the term Zibi in Jensen et al.

(2008). It should be pointed out that the level shifts represent a highly structured non-random change of the profiles. The assumption of independence of error terms across profile points should be acceptable for profiles that are long and without sharp changes due to on–off actions. This assumption may not be appli-cable for shorter profiles with sharp changes. In the following section, we shall see that, in our approach, the portion of a profile with large “on–off” changes is given less weight. Hence, we believe that this is a robust method for estimating the level shift.

In practice, σ2

jkis unknown and must be estimated

using historical data. However, according to tempo-ral profile-to-profile behavior presented in Figure 3, we can assume that the changes in level-shift param-eters between successive wafers in the same lot are small. The absolute difference between two succes-sive profiles within a lot, i.e., the moving range of

{Yijk}i=1, is deﬁned as

Rijk =|Yijk− Y(i−1)jk| ≈ |εijk− ε(i−1)jk| (7)

and MRjkdenotes the corresponding mean. Then the

standard deviation of the error term, σjk, can be

es-timated by

ˆ

σjk= MRjk/d2, (8)

where d2 = 1.128 (Montgomery (2005)). The

moving-diﬀerence method was used to reduce the ef-fect of sustained step changes or other trends be-tween samples. It was also used by Jensen et al. (2008) to estimate the covariance matrix in bi in the

linear mixed model or the covariance matrix inβi in

the linear model.

All digital sensors are actually discontinuous due to their resolutions. Hence, it is possible that some of the estimates of ˆσjk are zeros. To avoid singularity,

they can be replaced by a lower bound of standard deviation. Suppose that the measurement resolution of variable j is rjand that yjis the observed value, we

assume the true value will be uniformly distributed in (yj−rj/2, yj+rj/2). The lower bound of the

stan-dard deviation of this variable is rj/

√ 12. Hence, the estimator of σjk becomes ˆ σjk= max MRjk d2 , rj √ 12 . (9)

The level-shift parameter estimators in Equation (6) can also be modiﬁed into

ˆbijm= km k=km−1+1 ˆ wjk(Yijk− ˆμjk), (10)

where ˆwjk is the estimate of wjk given by

ˆ wjk= 1 ˆ σ2 jk km k=km−1+1 1 ˆ σ2 jk , (11) where km−1+ 1≤ k ≤ km.

In addition, to avoid the effect of large deviations due to faulty operations or data collection, estimates of the recipe-profile parameter in Equation (5) at each profile point can be slightly modified as

ˆ

μjk= median

i (Yijk). (12)

Hence, the residuals can be expressed as follows: ˆ

eijk= Yijk− ˆμjk− ˆbijm(k). (13)

The estimator of level-shift parameter bijm is given

(7)

measurements in the step to represent the general shift in a step with respect to the mean profile. It sets the weighting coefficients as inversely proportional to the variance at each profile point of the same step. In this way, data with larger variations coming from the unstable period at the beginning of a step due to the on–off operation will receive negligible weight. Data coming from the more stable period in each step will contribute more to the estimate of this level shift. The level shift is responsible for capturing the intrinsic causes due to long-term aging and the short-term first-wafer effect. As we have observed in Figure 3, an unstable period due to on–off actions will have larger variations that are less indicative of these de-terministic trends.

It should be pointed out that the model parame-ters ˆμjkand ˆwjkas well as the control limits used for

future analysis of residuals must be estimated with the outliers removed from the training data in Phase I. The detailed algorithm for applying our proposed model to a given dataset to estimate model param-eters and for determining outliers in Phase I are de-scribed in Appendix A.

Numerical Results of

the Proposed Algorithm

Comparison with MPCA

In order to better illustrate the advantages of our proposed method with respect to MPCA, we ﬁrst present the results of applying the proposed method to the V3 data at step 9 of our illustrative example. By following the procedures in Appendix A, the

esti-mated level shifts ˆbi39and residuals ei3kfor the Phase

I data can be obtained. The corresponding values of ˆbi39 and ei3k for wafers in Phase II can be obtained

using ˆμjk and ˆwjk estimated in Phase I and

Equa-tions (A1) and (A6).

We also calculate the Q statistic (Nomikos and MacGregor (1995a), Wise at al. (1999)) to compare the residuals of new profiles in Phase II to a con-trol limit defined by the profiles in Phase I under normal operating conditions. Details of computing the control limit are shown in Appendix A. The Q statistic for the residuals of a new profile is defined as Equation (A10). Figure 5(a) shows the Q statistics of the normalized residuals ˜ei3k of the training data in

Phase I and the testing data in Phase II. Figure 5(b) demonstrates the q-q plot of the Q statistics of the training data with respect to the testing data. The q-q plot of the proposed method is closer to a straight line than that in Figure 4(b), which means that the distributions of the Q statistics for Phase I and Phase II are more similar for the proposed method than for MPCA. The upper control limit of the Q statistic is set by taking the signiﬁcance level α = 0.273%, i.e., z0= 3.09. Wafer 58 is an outlier of the training data

for both the proposed method and MPCA. There are two outlier signals (wafers 287, 357) in Phase II for the proposed level-shift method. MPCA found 11 outliers (wafers 287, 300, 313, 319, 320, 322, 323, 324, 331, 346, 357). Obviously, the false-alarm rate of this new model is substantially lower than that of the MPCA method. In a simulation study, we know the system is “in-control”. In this case, it was the opinion of the expert engineer that the system is in fact in-control.

FIGURE 5. The Control Chart and q-q Plot of Q Statistic Obtained Using Residuals of the Proposed Model of Variable V3 at the 9th Step in Phase I and Phase II.

(8)

FIGURE 6. The Reference Proﬁles of Variables V1 to V5.

Let us now apply the procedure just developed to analyzing the complete dataset.

Recipe Proﬁles

The recipe proﬁles ˆμjkof the ﬁve variables V1–V5

(obtained by Equation (A1) in Appendix A) are illus-trated in Figure 6. The resemblance between Figure 6 and Figure 1 demonstrates that the changes in sensor variables due to recipe actions have been successfully captured. Figure 7 shows the difference between ac-tual profiles and the reference profile Yijk−ˆμjk, which

illustrate the trends as shown in Figure 3, but for the entire proﬁle.

Level-Shift Parameters of Variable V3

Figure 8 illustrates the estimates of the level-shift parameter for variable V3, ˆbi3m, for 1 ≤ i ≤ 364

and 1 ≤ m ≤ 23. The y-axis of each subplot was limited to the range (−10, 10). The abscissa is the run sequence i = 1, . . . , 364. Significant aging trends and first-wafer effects (as indicated by the bullets) are evident in steps 2, 3, 5, 7, 9, 10, 12, 14, 16, 18, 20, 22, while no such trends were captured for the other steps highlighted by the gray background color.

The aging trends and first-wafer effect are extracted deterministic patterns demonstrated by Figure 2(a) and Figure 2(b) for the long-term and short-term structures of a profile. Steps without trends deviate little from zero, which indicates that the V3 sensor was turned off during these steps. The profiles are just noise from the sensor’s electronics. This illus-trates that the level-shift parameters capture the ag-ing phenomena. Because this is a systematic and in-trinsic effect, it is observed in all active steps with a similar pattern. Figure 9 shows the Q statistics and the q-q plot comparisons for all turn-on steps 2, 3, 5, 7, 9, 10, 12, 14, 16, 18, 20, 22 for the variable V3 and i = 1, . . . , 364. Again the statistics of the residuals are comparable in Phase I and Phase II, indicating that intrinsic trends have been sufficiently separated.

Feature Extraction from the Level-Shift Parameters of Variables V1–V5

It is diﬃcult to track variations in the level-shift parameters of all variables at every step simultane-ously. To simplify presentation, PCA was performed using all 23 level-shift parameters for each variable V1 to V5. The wafer-to-wafer scores of the ﬁrst

(9)

prin-FIGURE 7. The Residual Proﬁles of Variables V1 to V5.

FIGURE 8. Wafer-to-Wafer Variations in Level-Shift Parameters for All m = 1,. . . ,23 Steps for the Variable V3 (y-axis) and All i = 1,. . . ,364 Wafers (x-axis).

(10)

FIGURE 9. The Control Charts and q-q Plots of Q Statistics for Turn-On Steps 2, 3, 5, 7, 9, 10, 12, 14, 16, 18, 20, 22 for the Variable V3 and All i = 1, . . . , 364 Wafers. (Rows 1 and 3 are Q statistics vs. wafers and Rows 2 and 4 are the corresponding q-q plot comparisons.

cipal components for the five variables are shown in Figure 10. The vertical lines divide the wafer se-quences by lots and indicate the first wafers. Aging was the main feature of V1, while both aging and the first-wafer effects were found for V3 and V4. The first-wafer effect was extremely prominent in the V5 data. However, the feature characteristic of variable V2 was quite different from those of the other four variables. It may be fruitful to identify the source of such differences.

As was mentioned earlier, the entire system was under run-to-run feedback control in which a cer-tain manipulation variable, MV, not disclosed due to the proprietary nature of the information, was var-ied based on measured variations in the quality of the output. Logically, the systematic manipulation of an operating variable should be reflected in some of the sensor data. Figure 11 is a scatter plot of the first principal component of V2 against MV. The ab-solute correlation coefficient over 364 wafers is 0.855. This strong relationship indicates that the long-term trend in V2’s level-shift parameters was caused by run-to-run adjustments.

Jensen et al. (2008) represented structured

distor-tions to the reference proﬁle as Zibi. We have

al-ready discussed the equivalence of our level-shift pa-rameter to this term. The results in this section have clearly demonstrated that the variations in the coef-ﬁcients biwill be dominated by patterned

sample-to-sample trends due to intrinsic causes that are part of the “in-control” process. Such changes should not be monitored statistically. Variation that cannot be at-tributed to these prescribed changes will be revealed in the error term. For example, Qiu et al. (2010) de-veloped an elaborate nonparametric method to ex-tract within-proﬁle correlation, but only the error terms were monitored.

Health Index and

Faulty Wafer Detection

After eliminating aging and the first-wafer effects, the residuals vector êik= (ˆei1k, . . . , êiJk), where ˆeijk

is defined in Equation (13), contains the information about the process’ health at the kth profile point during the production of the ith wafer. Assuming that êik is multivariate normally distributed with a

zero mean vector, then the Hotelling T2

ikof ˆeik is

ˆ

(11)

FIGURE 10. Wafer-to-Wafer Variations in the 1st Principal Component of all 23 Level-Shift Parameters for Variables V1 to V5.

where ˆΣk is the sample covariance matrix of ˆeik for

all normal wafers in Phase I. When ˆeik belongs to

Phase I, it is dependent on ˆΣk. The statistic ˆTik2

in Phase I follows a beta distribution (Chou et al. (1999)). Johnson and Wichern (2007) showed it can be approximated by a chi-square distribution. They also proved that ˆT2

ik in Phase II is F -distributed

because ˆeik is independent of ˆΣk. Their degrees of

freedom are related to the number of turn-on vari-ables v(k) and the number of normal samples for step nm(k) in the training dataset (see Appendix B). In

order to evaluate them on a uniform basis, the cumu-lative probability qik of each ˆTik2 is evaluated using

the corresponding distributions and degrees of free-dom for each proﬁle point,

χ21−qik(v(k)) = ˆT 2

ik,

if i∈ L, i.e., wafer i in Phase I dataset, and F1−qik(v(k), nm(k)− v(k)) = nm(k)(nm(k)− v(k)) (n2 m(k)− 1)v(k) ˆ Tik2,

if i /∈ L, i.e., wafer i in Phase II dataset. (15)

Hence, qik is approximately uniformly distributed

over the interval (0, 1) if the process is in control. qik is one minus the p-value of the ˆTik2 statistic.

Larger qik imply higher probability of a fault for

wafer i at the kth proﬁle point. Identifying a poten-tial faulty wafer involves evaluating an overall prob-ability for the wafer over all proﬁle points. It may be intuitive to use a geometric mean of p-values, i.e., (Kk=1(1− qik))1/K, but such a geometric mean will

be very sensitive to a few profile points with ex-tremely small p-values. To balance the importance of every profile point in detecting a potential faulty wafer, the average of qik for all profile points might

be used, denoted by ¯qi, as this will have an

approx-imate normal distribution. Hence, an overall health index hi of the ith wafer can be deﬁned using the

standardized statistics of ¯qi,

hi= q¯i− ˆμq

ˆ

σq , (16)

where ˆμq and ˆσq are the mean and standard

devia-tion, respectively, for all normal wafers in the train-ing data. In this study, we set the upper control limit of hi at 3, so if hi > 3, the ith wafer is potentially

(12)

FIGURE 11. The First Principal Component of Variable V2’s Level-Shift Parameters versus Values of the Run-to-Run Control Manipulation Variable MV for Each Wafer.

faulty. The detailed algorithms for computing the values of ˆT2

ik, qik, ¯qi, ˆμq, and ˆσq and for detecting

outliers in Phase I are presented in Appendix B. The bottom plot in Figure 12 is a signal map of qik

for the etching data. It allows us to micro-monitor op-erations at each proﬁle point. The x-axis is the wafer index. The left and right y-axes are the step index and the proﬁle points, respectively. When the value of qik is larger than 0.999, the corresponding

loca-tion on the map will be highlighted. The signal map clearly shows that the operation in step 2 is mainly responsible for the pronounced ﬁrst-wafer eﬀect.

The top plot in Figure 12 shows the control chart of hi for i = 1, . . . , 364. It demonstrates a total 18

signal wafers in Phase I and Phase II as follows: Phase I: 1* 2* 50* 74* 83* 108* 109* 154* 179*

204*

Phase II: 224* 249* 270* 271* 295* 325 329 341* Note that the superscript ‘*’ means that the wafer is the first or second wafer in a lot. In Phase I, all outliers are due to first-wafer effects. In Phase II, only two wafers, 325 and 329 in lot 15, are possibly faulty, and neither is the first wafer of a lot. They lie just outside the control limit. This shows that the proposed method can be used to monitor an aging process without causing frequent false alarms.

It should be pointed out that the health index depends solely on error terms and not on the level-shift parameter bijm. Hence, we are in fact looking

for changes in μjk, i.e., possible error in basic recipe

actions of the process.

Discussion and Conclusions

Changes of profile shape with respect to a ref-erence profile are highly structured in semiconduc-tor manufacturing. In our example, the changes may have been caused by intrinsic trends that are auto-correlated from sample to sample. These are changes that one does not wish to trigger alarms. Moreover, due to large on–off actions, after eliminating strongly correlated within-profile variation, residuals at dif-ferent profile points are still unlikely to be indepen-dent and iindepen-dentically distributed. Failure to account for these effects will lead to frequent false alarms.

This study has led to a reference-profile and level-shift model. On–off actions and a general similarity in the profiles due to recipe actions can be captured in the reference profile. The level-shift parameters serve as good indicators for within-profile variations caused by long-term trends and short-term effects. A simple and robust estimation method has been provided for their estimation. Capturing and sepa-rating these systematic variations in the sensor data, which are regarded as part of normal operation,

(13)

fa-FIGURE 12. The Health Index Obtained from the Residuals of the Proposed Model.

cilitates the detection of real faults. The residuals of this model can then be combined into a health index for each wafer using standard multivariate statisti-cal analysis. This health index allows the detection of faulty wafers in the presence of aging trends and the ﬁrst-wafer eﬀects without causing frequent false alarms. Future study of this problem will focus on isolating the root causes, and the results will be pre-sented in a subsequent report.

Appendix A:

Proﬁle Model Parameter

Estimation in Phase I

Step 1

Select variable j and step m for the following steps. Let Sjm denote the order set of wafer indices

under normal operating conditions for the jth vari-able at the mth step. At the beginning of the pro-cedures, Sjm = L ={1, . . . , 223} includes indices of

all wafers in the Phase I training dataset arranged in ascending order of the manufacturing sequence.

Step 2

The estimate of the reference proﬁle is given by ˆ

μjk= median i∈Sjm

(Yijk). (A1)

Step 3

For the kth proﬁle point at the mth step, calculate the sample variance s2

jk as

s2jk= Var i∈Sjm

(Yijk). (A2)

Step 4

Calculate the moving range for any two successive wafers that belong to Sjm,

Rijk=|Yijk− Y(i−1)jk|, (A3)

and its average,

MRjk = mean i∈Sjm (Rijk), (A4) and ˆ σjk= max MRjk d2 , rj √ 12 . (A5)

(14)

Step 5

Obtain the weighting factors for all k and m as ˆ wjk= (1/ˆσjk)2 km k=km−1+1 (1/ˆσjk)2 . (A6) Step 6

Calculate the estimate of the level-shift parame-ters as ˆbijm= km k=km−1+1 ˆ wjk(Yijk− ˆμjk). (A7) Step 7

Calculate the residuals, ˆ

eijk = Yijk− ˆμjk− ˆbijm, (A8)

and normalize them ˜

eijk= s−1_jk × ˆeijk. (A9)

Step 8

Obtain the Q statistic for the jth variable at the mth step, Qijm= km k=km−1+1 ˜ e2ijk. (A10) Step 9

Calculate the upper control limit (UCL) for the Q statistic as follows: Assume ˜eijm = (˜eij,km−1+1, . . . ,

˜

eij,km) is multivariate normally distributed with mean vector 0 and covariance matrix Σjm. The sum

of squares of these normalized residuals can be ap-proximated by a weighted chi-squared distribution φχ2

v, where the constant φ and the degrees of

free-dom v are both functions of Σjm (Box (1954)). The

control limit of Q statistic can be calculated by Qαjm= θ1jm 1−θ2jmh 0 jm 1− h0 jm θ2 1jm + sign(h0 jm)zα 2θ2jm(h0jm)2 θ1jm 1/h0 jm , (A11)

where zα is the critical value of the standard normal

distribution of signiﬁcance level α and sign(h0jm) = 1, if h0 jm≥ 0 −1, otherwise. In addition, h0jm= 1− 2θ1jmθ3jm 3θ2jm , (A12) θ1jm= trace(A), (A13) θ2jm= trace(A2), (A14) θ3jm= trace(A3), (A15)

where A is the covariance matrix of the residuals ˜

eijm, for all i ∈ Sjm. The control limit in

Equa-tion (A11) requires that the error terms are indepen-dent between samples. Such an assumption should be essentially valid if the sample-to-sample variation is largely captured by bijm in Step 6.

Step 10

If Qijm > Qαjm, then remove the ith wafer from

Sjm, i.e., Sjm = Sjm\{i}, and go back to Step 2,

cycling until Qijm≤ Qαjm for all i∈ Sjm.

Step 11

Repeat Step 1 to Step 10 for all other j’s and m’s.

Appendix B:

Calculation of Proﬁle Point and

Overall Health Indices

Step 1

Let Φm= S1m∩ S2m∩ · · · ∩ SJm denote the set of

indices of wafers that have normal operating proﬁles for all variables at the mth step and let nm be the

size of Φm. The corresponding estimate of Σk is then

given by ˆ Σk= i∈Φm(k) ˆ eikˆeik nm− 1 . (B1) Step 2

Calculate Hotelling’s T2 for every wafer at every proﬁle point,

ˆ

Tik2 = ˆeikΣˆ−1k ˆeik. (B2)

Step 3

Use ˆT2

ikto calculate qikusing the chi-square

(15)

for wafers in Phase II, χ2

1−qik(v(k)) = ˆT 2

ik

if i∈ L, i.e., wafer i in Phase I dataset, and F1−qik(v(k), nm(k)−v(k)) = nm(k)(nm(k)− v(k)) (n2 m(k)− 1)v(k) ˆ Tik2

if i /∈ L, i.e., wafer i ∈ Phase II dataset, (B3) where v(k) is the number of variables that are active at the kth proﬁle point. Hence, the average of qikfor

wafer i can be calculated as ¯ qi= 1 K K k=1 qik. (B4) Step 4

Let the normal data set Φ be L, the set of all training data at the beginning. Denote the number of samples in this training dataset by nΦ. Calculate

ˆ

μq and ˆσq, the mean and standard deviation,

respec-tively, for all wafers in this set as ˆ μq= 1 nΦ i∈Φ ¯ qi (B5) and ˆ σq2= 1 nΦ− 1 i∈Φ (¯qi− ˆμq)2. (B6)

An overall heath index for the ith wafer can be cal-culated as hi= ¯ qi− ˆμq ˆ σq . (B7)

If zi > 3 for any i, then the ith wafer is considered

as an outlier and removed from the training dataset Φ = Φ\{i}. This step is repeated until all outliers have been removed.

Step 5

Equation (B7) is then used to calculate the overall health index for all other wafers in Phase II.

Acknowledgments

The authors are very grateful to the Editor and the two referees for valuable suggestions that greatly improved the paper. This work was supported in part by National Center for Theoretical Sciences of Tai-wan and by Hong Kong’s Research Grants Coun-cil through Competitive Earmarked Research Grant 620010.

References

Akima, H. _{(1970). “A New Method of Interpolation and} Smooth Curve Fitting Based on Local Procedures”. Journal

of the Association for Computing Machinery 17, pp. 589–

602.

Apley, D. W._{(2010). Comment to “Nonparametric Proﬁle} Monitoring by Mixed Eﬀects Modeling”. Technometrics 52, pp. 277–280.

Box, G. P._{(1954). “Some Theorems on Quadratic Forms} Ap-plied in the Study of Analysis of Variance Problems. I: Eﬀect of Inequality of Variance in the One-Way Classiﬁcation”. The

Annals of Mathematical Statistics 25, pp. 290–302.

Chang, T. C._{and Gag, F. F. (2006). “Monitoring Linearity of} Measurement Gauges”. Journal of Statistical Computation

and Simulation 76, pp. 889–911.

Chen, T._{and Zhang, J. (2010). “On-Line Multivariate} Sta-tistical Monitoring of Batch Processes Using Gaussian Mix-ture Model”. Computers and Chemical Engineering 34, pp. 500–507.

Cherry, G. A._{and Qin, S. J. (2006). “Multiblock Principal} Component Analysis Based on a Combined Index for Semi-conductor Fault Detection and Diagnosis”. IEEE

Transac-tions on Semiconductor Manufacturing 19(2), pp. 159–171.

Chicken, E.; Pignatiello, J.; _{and Simpson, J. R. (2009).} “Statistical Process Monitoring of Nonlinear Proﬁles Using Wavelets”. Journal of Quality Technology 41, pp. 198–212. Chou, Y. M.; Mason, R.;_{and Young, J. C. (1999). “Power}

Comparisons for a Hotelling’s T2 Statistic”.

Communica-tions in Statistics, Part B—Simulation and Computation

28, pp. 1031–1050.

Ding, Y.; Zeng, L.; _{and Zhou, S. (2006). “Phase I} Analy-sis for Monitoring Nonlinear Proﬁles in Manufacturing Pro-cesses”. Journal of Quality Technology 38, pp. 199–216. Ge, Z._{and Song, Z. (2010). “Semiconductor Manufacturing}

Process Monitoring Based on Adaptive Substatistical PCA”.

IEEE Transactions on Semiconductor Manufacturing 23(1),

pp. 99–108.

Jensen, W. A.; Birch, J. B.;_{and Woodall, W. H. (2008).} “Monitoring Correlation Within Linear Proﬁles Using Mixed Models”. Journal of Quality Technology 40, pp. 167–183. Jensen, W. A._{and Birch, J. B. (2009). “Proﬁle Monitoring}

via Nonlinear Mixed Models”. Journal of Quality Technology 41, pp. 18–34.

Jin, J. _{and Shi, J. (1999). “Feature-Preserving Data} Com-pression of Stamping Tonnage Information Using Wavelets”.

Technometrics 41(4), pp. 327–339.

Johnson, R. A._{and Wichern, D. W. (2007). Applied}

Multi-variate Statistical Analysis, 6th edition, pp. 241–247. Upper

Saddle River, NJ: Pearson Education.

Jones, M. _{and Rice, J. (1992). “Displaying the Important} Features of Large Collections of Similar Curves”. The

Amer-ican Statistician 46, pp. 140–145.

Kaistha, N.; Moore, C. F.;_{and Leitnaker, M. G. (2004).} “A Statistical Process Control Framework for the Charac-terization of Variation in Batch Proﬁles”. Technometrics 46, pp. 53–68.

Kang, L. _{and Albin, S. L. (2000). “On-Line Monitoring} When the Process Yields a Linear Proﬁle”. Journal of

Qual-ity Technology 32, pp. 418–426.

Kim, K.; Mahmoud, M. A.;_{and Woodall, W. H. (2003).} “On the Monitoring of Linear Proﬁles”. Journal of Quality

(16)

Li, W.; Yue, H.; Valle, S.;_{and Qin, J. (2000). “Recursive} PCA for Adaptive Process Monitoring”. Journal of Process

Control 10, pp. 471–486.

Mahmoud, M. A.; Parker, P. A.; Woodall, W. H.; _and Hawkins, D. M._{(2007). “A Change Point Method for} Lin-ear Proﬁle Data”. Quality & Reliability Engineering

Inter-national 23, pp. 247–268.

May, G. S._{and Spanos, C. J. (2006). Fundamentals of}

Semi-conductor Manufacturing and Process Control. Hoboken,

NJ: John Wiley & Sons, Inc.

Montgomery, D. C._{(2005). Introduction to Statistical}

Qual-ity Control, 5th edition. Hoboken, NJ: John Wiley & Sons,

Inc.

Nomikos, P._{and MacGregor, J. F. (1995a). “Multivariate} SPC Charts for Monitoring Batch Processes”.

Technomet-rics 37, pp. 41–59.

Nomikos, P. _{and MacGregor, J. F. (1995b). “Multi-Way} Partial Least Squares in Monitoring Batch Processes”.

Chemometrics and Intelligent Laboratory Systems 30, pp.

97–108.

Noorossana, R.; Amiri, A.;_{and Soleimani, P. (2008). “On} the Monitoring of Autocorrelated Linear Proﬁles”.

Commu-nications in Statistics: Theory and Methods 37, pp. 425–442.

Qiu, P.; Zou, C.;_{and Wang, Z. (2010). “Nonparametric} Pro-ﬁle Monitoring by Mixed Eﬀects Modeling”. Technometrics 52, pp. 265–277.

Spitzlsperger, G.; Schmidt, C.; Ernst, G.; Strasser, H.; and Speil, M. (2005). “Fault Detection for a Via Etch Pro-cess Using Adaptive Multivariate Methods”. IEEE

Transac-tions on Semiconductor Manufacturing 18, pp. 528–533.

Williams, J. D.; Birch, J. B.; Woodall, W. H.;_{and Ferry,} N. M. _{(2007). “Statistical Monitoring of} Heteroscedas-tic Dose–Response Proﬁles from High-Throughput Screen-ing”. Journal of Agricultural, Biological and Environmental

Statistics 12, pp. 216–235.

Wise, B. M.; Gallagher, N. B.; Butler, S. W.; White, D.; and Barna, G. G. (1999). “A Comparison of Principal Com-ponents Analysis, Multi-Way Principal ComCom-ponents Analy-sis, Trilinear Decomposition and Parallel Factor Analysis for Fault Detection in a Semiconductor Etch Process”. Journal

of Chemometrics 13, pp. 379–396.

Woodall, W. H._{(2007). “Current Research on Proﬁle} Mon-itoring”. Produ¸c˜ao, 17(3), pp. 420–425.

Woodall, W. H.; Spitzner, D. J.; Montgomery, D. C.; and Gupta, S. (2004). “Using Control Charts to Monitor Process and Product Quality Proﬁles”. Journal of Quality

Technology 36, pp. 309–320.

Zhu, J._{and Lin, D. K. J. (2010). “Monitoring the Slopes of} Linear Proﬁles”. Quality Engineering 22, pp. 1–12. Zou, C.; Tsung, F.; _{and Wang, Z. (2007). “Monitoring}

General Linear Proﬁles Using Multivariate Exponentially Weighted Moving Average Schemes”. Technometrics 49, pp. 395–408.

Zou, C.; Tsung, F.;_{and Wang, Z. (2008). “Monitoring} Pro-ﬁles Based on Nonparametric Regression Methods”.

Techno-metrics 50, pp. 512–526.

Zou, C.; Zhang, Y.;_{and Wang, Z. (2006). “Control Chart} Based on Change-Point Model for Monitoring Linear Pro-ﬁles”. IIE Transactions 38, pp. 1093–1103.