Monitoring Batch Processes with
Multiple On–Off Steps in
Semiconductor Manufacturing
SHUI-PIN LEE
Ching Yun University, Taoyuan, Taiwan 32097
AN-KUO CHAO
National Tsing Hua University, Hsinchu, Taiwan 30013
FUGEE TSUNG
Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
DAVID SHAN HILL WONG, SHENG-TSIANG TSENG, and SHI-SHANG JANG
National Tsing Hua University, Hsinchu, Taiwan 30013
A modern semiconductor manufacturing line is made of hundreds of sequential batch-processing stages. Each of these stages consists of many steps carried out by expensive tools, which are monitored by numerous sensors capable of sampling at intervals of seconds. The sensor readings of each run constitute profiles, which can include extremely drastic changes. The heterogeneous variations at different profile points are mainly due to on–off recipe actions at specific points. In addition, the analysis of these profiles is further complicated by long-term trends due to tool aging and short-term effects specific to the first wafer in a lot cycle. Statistical process control methods that fail to take these effects into consideration will lead to frequent false alarms. A systematic method is proposed to address these challenges. First, a reference profile is determined for each sensor variable that describes the on–off actions. Next, level shifts of these profiles in each step are established to capture and remove intrinsic variations due to long-term aging trends and the short-term first-wafer effects. The residuals are used to formulate a health index, and this index can be used to monitor the health of the equipment and detect faulty wafers efficiently.
Key Words: Batch-Processing Stages; Health Index; Heterogeneous Variation; Multivariate Statistical Pro-cess Control; Sensor Data; Signal Map.
Introduction
T
HE SEMICONDUCTORmanufacturing industry hasbeen one of the fastest growing industries of the
Dr. Lee is an Assistant Professor in the Department of Industrial Engineering and Management. His email address is [email protected].
Mr. Chao is a Graduated Student in the Institute of Statis-tics. His email address is [email protected].
Dr. Tsung is a Professor in and Head of the Department of Industrial Engineering and Logistics Management. He is a Fellow of ASQ. His email address is [email protected].
Dr. Wong is a Professor in the Department of Chemical Engineering. His email address is [email protected].
Dr. Tseng is a Professor in the Institute of Statistics. He is a Member of ASQ. His email address is [email protected] .edu.tw.
Dr. Jang is a Professor of Chemical Engineering. His email address is [email protected].
FIGURE 1. The Synchronized Profiles of Variables V1 to V5 over a 364 Wafer Batch.
last two decades. Effective control of final product quality has become one of the keys to maintain-ing competitiveness (May and Spanos (2006)). How-ever, a modern semiconductor manufacturing line in-volves hundreds of sequential processes, and the re-sults of off-line quality tests are available only weeks or months after a problem has occurred in one of the processes. For some processes, such as planariza-tion and etching, inline metrology data on thickness and critical dimensions are available as indicators of process performance. Statistical control of these line metrology data is used to ensure that these in-dividual processes are operating properly. However, for many processes, such as implanting and thermal annealing, inline metrology data may not be avail-able. Tracking process performance depends solely on monitoring sensor data of the equipment. Advances in information technology make available the data collected from a large number of sensors during the course of each run, known as “profiles”. Data in the form of a profile presents special opportunities and challenges for statistical monitoring. Woodall et al. (2004) and Woodall (2007) have presented a compre-hensive review of the issues involved. In the
follow-ing, we will use a motivating example to illustrate the problems from a practical perspective.
Motivating Example
Figure 1 illustrates profiles of five sensor variables V1–V5 for 364 wafers from 16 lots of an industrial etching process. The lengths of the original profiles vary from wafer to wafer. They were synchronized using Akima splines (Akima (1970)). The x-axis rep-resents 547 synchronized profile points divided into 23 steps. The vertical dashed lines mark the begin-ning of each step. Variables V1 to V4 relate to plasma operations, which show large on–off variations. V5 is the chamber temperature.
Figure 2(a) and Figure 2(b) show the profile plots of variable V3 at step 9 for all 364 wafers and 25 wafers of a specific lot, respectively. The colors in these profiles have been arranged to change from light to dark according to the actual start time of the wafer. Figure 2(a) demonstrates a long-term ag-ing trend that shifted the profiles downward. In Fig-ure 2(b), the profile of the first wafer is marked by triangles. This profile is substantially lower than the
(a) (b)
264 272 280 288 296 304 312 320 328 336
FIGURE 2. Long-Term Aging and Short-Term First-Wafer Effects for Variable V3 in a Segment of the Entire Profile. (a) Long-term aging trend from wafer to wafer (light to dark). (b) Short-term first-wafer effects within a lot (first wafers marked by triangles).
others, which also have an upward trend within the lot. The deviation of the first wafer of a lot is known as the “first-wafer” effect. For experienced engineers, such short- and long-term fluctuations are regarded as part of normal operations, i.e., they are “intrinsic” variations that should not cause an alarm.
Figure 3 shows the mean centered profiles, which are the deviations of individual profiles from the mean profile, of V3 at step 9. The fluctuations at the beginning of this step are significantly larger than other fluctuations. “On–off” actions are very
FIGURE 3. The Mean Centered Profile of Variable V3 at the 9th Step.
common in many semiconductor processes. The pro-files usually exhibit drastic changes at specific profile points. Sensor variables may have large fluctuations at certain profile points and small variations at the others because of such “on–off” actions.
Objective and Scope
As shown in the motivating example, variations of profiles may show structures that are correlated be-tween profile points and auto-correlated from sample to sample. These structures may be caused by intrin-sic causes that must be isolated so they will not cause undesired alarms, after which the residuals can be monitored statistically. In this paper, a simple and robust method is proposed to address this problem. Furthermore, to detect faulty wafers, we propose a health index that is constructed based on the resid-uals of this model.
Literature Review
Profile monitoring has attracted much scholarly attention (e.g., Kang and Albin (2000), Kim et al. (2003), Chang and Gan (2006), Zou et al. (2006), Zhu and Lin (2010), Mahmoud et al. (2007), Zou et al. (2007)). Practitioners have used Shewhart charts, multivariate Hotelling T2 charts,
exponen-tially weighted moving average (EWMA) charts, and multivariate EWMA charts to monitor model pa-rameters such as the slope, intercept, and residual variance of linear profiles. For more complicated pro-files, nonlinear models can be used. Williams et al. (2007) proposed a four-parameter logistic regression
model to represent certain profiles. The parameters of the nonlinear regression model can be monitored using Hotelling T2 charts. If the functional form is
not smooth enough, wavelets can be used (e.g., Jin and Shi (1999), Chicken et al. (2009)).
When conventional control charts are used to monitor fixed-effect parameters, an essential assump-tion is that the model parameters remain unchanged as long as the process is “in-control”. Hence, possi-ble sample-to-sample trends of an in-control process are neglected. Moreover, because the error terms at different profile point are often assumed to be in-dependent and identically distributed random vari-ables, within-profile variations are also neglected.
To consider sample-to-sample auto-correlation in a profile, Noorossana et al. (2008) modified the linear model to include a first-order autoregressive (AR(1)) term in the errors. Jensen et al. (2008) examined a linear mixed model,
yi= Xiβ+ Zibi+εi, (1)
where yi = (yi1, . . . , yiK) is a vector of
measure-ments at K profile points that are related to a set of explanatory variables Xi = (xi1, . . . , xiK).β is a
vector of fixed-effect parameters. Zidenotes a vector
of the predictor variables for the possible change in shape of the profile. Furthermore, the random effect parameter, bi, follows a multivariate normal
distri-bution with mean vector 0 and covariance matrix
D, i.e., bi ∼ MN (0, D); and εi ∼ MN (0, Ri) is a
noise term, which is normally distributed with co-variance matrix Ri. Thus, within-profile correlation
of the residual is also considered. The model was later extended to a nonlinear mixed model (Jensen et al. (2009)). Restricted maximum likelihood methods were used to obtain the parameters of this model, but due to the large number of adjustable parame-ters, an iterative solution is possible only when cer-tain simple forms, such as compound symmetry or autoregression, are assumed for Ri.
We have carried out simulation studies using the linear mixed model. Convergence of parameter esti-mates using the restricted maximum-likelihood algo-rithm may not be guaranteed, and the computation time increases as the square of the profile length. Such computation problems are obviously not desir-able in practical applications. As has been pointed out by Jensen et al. (2008), the linear mixed model approach will only be superior to the linear fixed-effect model when the data are unbalanced or miss-ing. They also noted that careful selection of ini-tial guesses is required for convergence. Moreover,
Hotelling’s T2 was used to monitor b
i based on the
assumption of a multivariate normal distribution of random effects. In order to account for within-profile correlation, Qiu et al. (2010) has proposed the fol-lowing nonparametric mixed effects model:
yik= g(xik) + fi(xik) +εik, (2)
where g is the population profile function (fixed ef-fect), fi is the random-effects term describing the
variation of the ith individual profile from the fixed effect. The within-profile correlation will be ac-counted for by assuming that the random effect fi
is a mean zero process with a common covariance function, r(xk, xk) = E(fi(xk), fi(xk)). Because
within-profile correlation has been accounted for in the random effect, the error term is assumed to be
εi ∼ MN (0, σi2I). For Phase II analysis, the
in-control regression function was denoted by g0. The
in-control hypothesis then becomes H0: gi = g0 and
is monitored by a Hotelling T2 chart of the error
term. Due to the complexity of the model, reliable estimation is possible only with a large number of profile samples as well as a large number of profile points.
Alternatively, principal component analysis (PCA) is a technique commonly used in Phase I for model building (Jones and Rice (1992), Nomikos and Mac-Gregor (1995a, 1995b), Kaistha et al. (2004), Ding et al. (2006)). To use PCA, the profile data of one or more sensor variables are unfolded so that each profile point is regarded as a different variable. Mod-ern instrumentation and data acquisition technolo-gies provide users with a large amount of data. In the monitoring of semiconductor equipment, datasets with over 50 sensor variables and 500 profile points are very common. This leads to an unfolded matrix with more than 25,000 variables. Collecting enough wafer samples to perform statistically valid multi-variate analysis is often very difficult. Wise et al. (1999) has provided an overview of using multivariate statistical methods in semiconductor-manufacturing processes. Subsequently, several authors (e.g., Li et al. (2000), Spitzlsperger et al. (2005), Chen and Zhang (2010), Cherry and Qin (2006), Ge and Song (2010)) highlighted the trade-off between sensitivity and false alarms in Phase II monitoring when ap-plying such PCA monitoring methods due to drifts or non-Gaussian process behavior. Figure 4 shows the Q statistic (from Wise et al. (1999)) and a q-q plot of the error term of a multi-way PCA (MPCA) model for V3 at step 9. It is obvious that the distributions of the Q statistics are
differ-FIGURE 4. The Control Chart and q-q Plot of Q Statistic Obtained Using Residuals of the First Principal Component of Variable V3 at the 9th Step in Phase I and Phase II.
ent between the two periods. This phenomenon was persistently observed when different periods of plant operation were sampled. Consultation with process engineers confirmed that such drifts are part of the normal process that they do not wish to be alarmed. As Apley (2010) has pointed out, in profile monitoring, within-profile correlation and sample-to-sample autocorrelation must be identified. More-over, we should distinguish between trends due to in-control intrinsic causes and variations that should be monitored. Furthermore, it is important to deter-mine whether variations that should be monitored will show up in within-profile structures or in the error terms.
Model Development
Let us denote wafers, variables, and profile points with the indices i = 1, . . . , I, j = 1, . . . , J and k = 1, . . . , K, respectively. These K profile points are separated into M steps in accordance with the recipe setting. If Yijk is the value of the jth variable
at the kth profile point of the ith wafer, then Yijk is
given by
Yijk= μjk+ bijm(k)+ εijk, (3)
where m(k) denotes the specific step to which the kth profile point belongs. If km−1 < k ≤ km, where
km is the last profile point of the mth step, then
m(k) = m.
In this equation, μjkis the recipe-profile parameter
of the jth sensor variable at the kth profile point. The values are determined by the recipe, so large changes due to on–off changes are mostly captured in μjk. In many profile-monitoring studies, a
fixed-reference profile has been used (e.g., Woodall et al. (2004) employed a linear profile, Jensen et al. (2008) applied a logistic function, Zou et al. (2008) used a fixed combination of kernel functions). Our method corresponds to a direct estimation of a fixed-reference profile without the need of a specific profile form or assumptions about the smoothness of the profile.
Meanwhile, bijm is a level-shift parameter of the
jth variable of the ith wafer at the mth step. It rep-resents a mean shift at that point with respect to the fixed reference profile. Thus, bijm corresponds to
the profile coefficients determined for individual sam-ples. Unlike in other profile monitoring approaches in which the coefficients are assumed to be random vari-ables, we shall show that deterministic mechanisms dominate the variations of bijm.
If we assume that all systematic variations in-cluding on–off recipe steps, long-term aging, and the short-term first-wafer effect are captured by μjkand
bijm, then εijk denotes the remaining error. As we
have seen in Figure 3, the independent and identi-cally distributed assumption may not be valid for profiles with large on–off actions. Hence, we assume that the variances of the white noise term εijk,
de-noted by σ2
jk, will vary with the profile point k. In
order to reduce the influence of the larger variation at the beginning profile points of a step on the es-timates of the parameters, consider weighted least square estimates (WLSE) of the parameters μjkand
bijm by minimizing the following weighted sum of
square errors (WSS): WSSjm= I i=1 km k=km−1+1 1 σ2 jk (Yijk−μjk−bijm)2, (4)
with j = 1, . . . , J and m = 1, . . . , M . Given σ2
jk and
the constraints Ii=1bijm = 0 for all j and m, the
WLSEs of μjk and bijm are
ˆ μjk=Yjk= 1 I I i=1 Yijk (5) and ˆbijm= km k=km−1+1 wjk(Yijk− ˆμjk), (6) where wjk= 1 σ2 jk km k=km−1+1 1 σ2 jk
denotes the weight of (Yijk − ˆμjk) for the
esti-mate of bijm. In fact, Equations (5) and (6) are the
maximum-likelihood estimates of μjk and bijm if we
assume the error terms εijk are mutually
indepen-dent and normally distributed for all i = 1, . . . , I and k = 1, . . . , K. Even though this assumption will be violated with sample-to-sample autocorrela-tion and within-profile correlaautocorrela-tion (e.g., from Fig-ure 3, the residuals are clearly correlated within pro-files across k, as well as temporally across i), we be-lieve these still constitute intuitively appealing and effective estimators. Apley (2010) has commented that, because within-profile correlation is pervasive in most profile-monitoring problems, such correlation will be revealed as sample-to-sample variations when we assume that there is no within-profile correlation, as accounted for by the term Zibi in Jensen et al.
(2008). It should be pointed out that the level shifts represent a highly structured non-random change of the profiles. The assumption of independence of error terms across profile points should be acceptable for profiles that are long and without sharp changes due to on–off actions. This assumption may not be appli-cable for shorter profiles with sharp changes. In the following section, we shall see that, in our approach, the portion of a profile with large “on–off” changes is given less weight. Hence, we believe that this is a robust method for estimating the level shift.
In practice, σ2
jkis unknown and must be estimated
using historical data. However, according to tempo-ral profile-to-profile behavior presented in Figure 3, we can assume that the changes in level-shift param-eters between successive wafers in the same lot are small. The absolute difference between two succes-sive profiles within a lot, i.e., the moving range of
{Yijk}i=1, is defined as
Rijk =|Yijk− Y(i−1)jk| ≈ |εijk− ε(i−1)jk| (7)
and MRjkdenotes the corresponding mean. Then the
standard deviation of the error term, σjk, can be
es-timated by
ˆ
σjk= MRjk/d2, (8)
where d2 = 1.128 (Montgomery (2005)). The
moving-difference method was used to reduce the ef-fect of sustained step changes or other trends be-tween samples. It was also used by Jensen et al. (2008) to estimate the covariance matrix in bi in the
linear mixed model or the covariance matrix inβi in
the linear model.
All digital sensors are actually discontinuous due to their resolutions. Hence, it is possible that some of the estimates of ˆσjk are zeros. To avoid singularity,
they can be replaced by a lower bound of standard deviation. Suppose that the measurement resolution of variable j is rjand that yjis the observed value, we
assume the true value will be uniformly distributed in (yj−rj/2, yj+rj/2). The lower bound of the
stan-dard deviation of this variable is rj/
√ 12. Hence, the estimator of σjk becomes ˆ σjk= max MRjk d2 , rj √ 12 . (9)
The level-shift parameter estimators in Equation (6) can also be modified into
ˆbijm= km k=km−1+1 ˆ wjk(Yijk− ˆμjk), (10)
where ˆwjk is the estimate of wjk given by
ˆ wjk= 1 ˆ σ2 jk km k=km−1+1 1 ˆ σ2 jk , (11) where km−1+ 1≤ k ≤ km.
In addition, to avoid the effect of large deviations due to faulty operations or data collection, estimates of the recipe-profile parameter in Equation (5) at each profile point can be slightly modified as
ˆ
μjk= median
i (Yijk). (12)
Hence, the residuals can be expressed as follows: ˆ
eijk= Yijk− ˆμjk− ˆbijm(k). (13)
The estimator of level-shift parameter bijm is given
measurements in the step to represent the general shift in a step with respect to the mean profile. It sets the weighting coefficients as inversely proportional to the variance at each profile point of the same step. In this way, data with larger variations coming from the unstable period at the beginning of a step due to the on–off operation will receive negligible weight. Data coming from the more stable period in each step will contribute more to the estimate of this level shift. The level shift is responsible for capturing the intrinsic causes due to long-term aging and the short-term first-wafer effect. As we have observed in Figure 3, an unstable period due to on–off actions will have larger variations that are less indicative of these de-terministic trends.
It should be pointed out that the model parame-ters ˆμjkand ˆwjkas well as the control limits used for
future analysis of residuals must be estimated with the outliers removed from the training data in Phase I. The detailed algorithm for applying our proposed model to a given dataset to estimate model param-eters and for determining outliers in Phase I are de-scribed in Appendix A.
Numerical Results of
the Proposed Algorithm
Comparison with MPCAIn order to better illustrate the advantages of our proposed method with respect to MPCA, we first present the results of applying the proposed method to the V3 data at step 9 of our illustrative example. By following the procedures in Appendix A, the
esti-mated level shifts ˆbi39and residuals ei3kfor the Phase
I data can be obtained. The corresponding values of ˆbi39 and ei3k for wafers in Phase II can be obtained
using ˆμjk and ˆwjk estimated in Phase I and
Equa-tions (A1) and (A6).
We also calculate the Q statistic (Nomikos and MacGregor (1995a), Wise at al. (1999)) to compare the residuals of new profiles in Phase II to a con-trol limit defined by the profiles in Phase I under normal operating conditions. Details of computing the control limit are shown in Appendix A. The Q statistic for the residuals of a new profile is defined as Equation (A10). Figure 5(a) shows the Q statistics of the normalized residuals ˜ei3k of the training data in
Phase I and the testing data in Phase II. Figure 5(b) demonstrates the q-q plot of the Q statistics of the training data with respect to the testing data. The q-q plot of the proposed method is closer to a straight line than that in Figure 4(b), which means that the distributions of the Q statistics for Phase I and Phase II are more similar for the proposed method than for MPCA. The upper control limit of the Q statistic is set by taking the significance level α = 0.273%, i.e., z0= 3.09. Wafer 58 is an outlier of the training data
for both the proposed method and MPCA. There are two outlier signals (wafers 287, 357) in Phase II for the proposed level-shift method. MPCA found 11 outliers (wafers 287, 300, 313, 319, 320, 322, 323, 324, 331, 346, 357). Obviously, the false-alarm rate of this new model is substantially lower than that of the MPCA method. In a simulation study, we know the system is “in-control”. In this case, it was the opinion of the expert engineer that the system is in fact in-control.
FIGURE 5. The Control Chart and q-q Plot of Q Statistic Obtained Using Residuals of the Proposed Model of Variable V3 at the 9th Step in Phase I and Phase II.
FIGURE 6. The Reference Profiles of Variables V1 to V5.
Let us now apply the procedure just developed to analyzing the complete dataset.
Recipe Profiles
The recipe profiles ˆμjkof the five variables V1–V5
(obtained by Equation (A1) in Appendix A) are illus-trated in Figure 6. The resemblance between Figure 6 and Figure 1 demonstrates that the changes in sensor variables due to recipe actions have been successfully captured. Figure 7 shows the difference between ac-tual profiles and the reference profile Yijk−ˆμjk, which
illustrate the trends as shown in Figure 3, but for the entire profile.
Level-Shift Parameters of Variable V3
Figure 8 illustrates the estimates of the level-shift parameter for variable V3, ˆbi3m, for 1 ≤ i ≤ 364
and 1 ≤ m ≤ 23. The y-axis of each subplot was limited to the range (−10, 10). The abscissa is the run sequence i = 1, . . . , 364. Significant aging trends and first-wafer effects (as indicated by the bullets) are evident in steps 2, 3, 5, 7, 9, 10, 12, 14, 16, 18, 20, 22, while no such trends were captured for the other steps highlighted by the gray background color.
The aging trends and first-wafer effect are extracted deterministic patterns demonstrated by Figure 2(a) and Figure 2(b) for the long-term and short-term structures of a profile. Steps without trends deviate little from zero, which indicates that the V3 sensor was turned off during these steps. The profiles are just noise from the sensor’s electronics. This illus-trates that the level-shift parameters capture the ag-ing phenomena. Because this is a systematic and in-trinsic effect, it is observed in all active steps with a similar pattern. Figure 9 shows the Q statistics and the q-q plot comparisons for all turn-on steps 2, 3, 5, 7, 9, 10, 12, 14, 16, 18, 20, 22 for the variable V3 and i = 1, . . . , 364. Again the statistics of the residuals are comparable in Phase I and Phase II, indicating that intrinsic trends have been sufficiently separated.
Feature Extraction from the Level-Shift Parameters of Variables V1–V5
It is difficult to track variations in the level-shift parameters of all variables at every step simultane-ously. To simplify presentation, PCA was performed using all 23 level-shift parameters for each variable V1 to V5. The wafer-to-wafer scores of the first
prin-FIGURE 7. The Residual Profiles of Variables V1 to V5.
FIGURE 8. Wafer-to-Wafer Variations in Level-Shift Parameters for All m = 1,. . . ,23 Steps for the Variable V3 (y-axis) and All i = 1,. . . ,364 Wafers (x-axis).
FIGURE 9. The Control Charts and q-q Plots of Q Statistics for Turn-On Steps 2, 3, 5, 7, 9, 10, 12, 14, 16, 18, 20, 22 for the Variable V3 and All i = 1, . . . , 364 Wafers. (Rows 1 and 3 are Q statistics vs. wafers and Rows 2 and 4 are the corresponding q-q plot comparisons.
cipal components for the five variables are shown in Figure 10. The vertical lines divide the wafer se-quences by lots and indicate the first wafers. Aging was the main feature of V1, while both aging and the first-wafer effects were found for V3 and V4. The first-wafer effect was extremely prominent in the V5 data. However, the feature characteristic of variable V2 was quite different from those of the other four variables. It may be fruitful to identify the source of such differences.
As was mentioned earlier, the entire system was under run-to-run feedback control in which a cer-tain manipulation variable, MV, not disclosed due to the proprietary nature of the information, was var-ied based on measured variations in the quality of the output. Logically, the systematic manipulation of an operating variable should be reflected in some of the sensor data. Figure 11 is a scatter plot of the first principal component of V2 against MV. The ab-solute correlation coefficient over 364 wafers is 0.855. This strong relationship indicates that the long-term trend in V2’s level-shift parameters was caused by run-to-run adjustments.
Jensen et al. (2008) represented structured
distor-tions to the reference profile as Zibi. We have
al-ready discussed the equivalence of our level-shift pa-rameter to this term. The results in this section have clearly demonstrated that the variations in the coef-ficients biwill be dominated by patterned
sample-to-sample trends due to intrinsic causes that are part of the “in-control” process. Such changes should not be monitored statistically. Variation that cannot be at-tributed to these prescribed changes will be revealed in the error term. For example, Qiu et al. (2010) de-veloped an elaborate nonparametric method to ex-tract within-profile correlation, but only the error terms were monitored.
Health Index and
Faulty Wafer Detection
After eliminating aging and the first-wafer effects, the residuals vector ˆeik= (ˆei1k, . . . , ˆeiJk), where ˆeijk
is defined in Equation (13), contains the information about the process’ health at the kth profile point during the production of the ith wafer. Assuming that ˆeik is multivariate normally distributed with a
zero mean vector, then the Hotelling T2
ikof ˆeik is
ˆ
FIGURE 10. Wafer-to-Wafer Variations in the 1st Principal Component of all 23 Level-Shift Parameters for Variables V1 to V5.
where ˆΣk is the sample covariance matrix of ˆeik for
all normal wafers in Phase I. When ˆeik belongs to
Phase I, it is dependent on ˆΣk. The statistic ˆTik2
in Phase I follows a beta distribution (Chou et al. (1999)). Johnson and Wichern (2007) showed it can be approximated by a chi-square distribution. They also proved that ˆT2
ik in Phase II is F -distributed
because ˆeik is independent of ˆΣk. Their degrees of
freedom are related to the number of turn-on vari-ables v(k) and the number of normal samples for step nm(k) in the training dataset (see Appendix B). In
order to evaluate them on a uniform basis, the cumu-lative probability qik of each ˆTik2 is evaluated using
the corresponding distributions and degrees of free-dom for each profile point,
χ21−qik(v(k)) = ˆT 2
ik,
if i∈ L, i.e., wafer i in Phase I dataset, and F1−qik(v(k), nm(k)− v(k)) = nm(k)(nm(k)− v(k)) (n2 m(k)− 1)v(k) ˆ Tik2,
if i /∈ L, i.e., wafer i in Phase II dataset. (15)
Hence, qik is approximately uniformly distributed
over the interval (0, 1) if the process is in control. qik is one minus the p-value of the ˆTik2 statistic.
Larger qik imply higher probability of a fault for
wafer i at the kth profile point. Identifying a poten-tial faulty wafer involves evaluating an overall prob-ability for the wafer over all profile points. It may be intuitive to use a geometric mean of p-values, i.e., (Kk=1(1− qik))1/K, but such a geometric mean will
be very sensitive to a few profile points with ex-tremely small p-values. To balance the importance of every profile point in detecting a potential faulty wafer, the average of qik for all profile points might
be used, denoted by ¯qi, as this will have an
approx-imate normal distribution. Hence, an overall health index hi of the ith wafer can be defined using the
standardized statistics of ¯qi,
hi= q¯i− ˆμq
ˆ
σq , (16)
where ˆμq and ˆσq are the mean and standard
devia-tion, respectively, for all normal wafers in the train-ing data. In this study, we set the upper control limit of hi at 3, so if hi > 3, the ith wafer is potentially
FIGURE 11. The First Principal Component of Variable V2’s Level-Shift Parameters versus Values of the Run-to-Run Control Manipulation Variable MV for Each Wafer.
faulty. The detailed algorithms for computing the values of ˆT2
ik, qik, ¯qi, ˆμq, and ˆσq and for detecting
outliers in Phase I are presented in Appendix B. The bottom plot in Figure 12 is a signal map of qik
for the etching data. It allows us to micro-monitor op-erations at each profile point. The x-axis is the wafer index. The left and right y-axes are the step index and the profile points, respectively. When the value of qik is larger than 0.999, the corresponding
loca-tion on the map will be highlighted. The signal map clearly shows that the operation in step 2 is mainly responsible for the pronounced first-wafer effect.
The top plot in Figure 12 shows the control chart of hi for i = 1, . . . , 364. It demonstrates a total 18
signal wafers in Phase I and Phase II as follows: Phase I: 1* 2* 50* 74* 83* 108* 109* 154* 179*
204*
Phase II: 224* 249* 270* 271* 295* 325 329 341* Note that the superscript ‘*’ means that the wafer is the first or second wafer in a lot. In Phase I, all outliers are due to first-wafer effects. In Phase II, only two wafers, 325 and 329 in lot 15, are possibly faulty, and neither is the first wafer of a lot. They lie just outside the control limit. This shows that the proposed method can be used to monitor an aging process without causing frequent false alarms.
It should be pointed out that the health index depends solely on error terms and not on the level-shift parameter bijm. Hence, we are in fact looking
for changes in μjk, i.e., possible error in basic recipe
actions of the process.
Discussion and Conclusions
Changes of profile shape with respect to a ref-erence profile are highly structured in semiconduc-tor manufacturing. In our example, the changes may have been caused by intrinsic trends that are auto-correlated from sample to sample. These are changes that one does not wish to trigger alarms. Moreover, due to large on–off actions, after eliminating strongly correlated within-profile variation, residuals at dif-ferent profile points are still unlikely to be indepen-dent and iindepen-dentically distributed. Failure to account for these effects will lead to frequent false alarms.
This study has led to a reference-profile and level-shift model. On–off actions and a general similarity in the profiles due to recipe actions can be captured in the reference profile. The level-shift parameters serve as good indicators for within-profile variations caused by long-term trends and short-term effects. A simple and robust estimation method has been provided for their estimation. Capturing and sepa-rating these systematic variations in the sensor data, which are regarded as part of normal operation,
fa-FIGURE 12. The Health Index Obtained from the Residuals of the Proposed Model.
cilitates the detection of real faults. The residuals of this model can then be combined into a health index for each wafer using standard multivariate statisti-cal analysis. This health index allows the detection of faulty wafers in the presence of aging trends and the first-wafer effects without causing frequent false alarms. Future study of this problem will focus on isolating the root causes, and the results will be pre-sented in a subsequent report.
Appendix A:
Profile Model Parameter
Estimation in Phase I
Step 1Select variable j and step m for the following steps. Let Sjm denote the order set of wafer indices
under normal operating conditions for the jth vari-able at the mth step. At the beginning of the pro-cedures, Sjm = L ={1, . . . , 223} includes indices of
all wafers in the Phase I training dataset arranged in ascending order of the manufacturing sequence.
Step 2
The estimate of the reference profile is given by ˆ
μjk= median i∈Sjm
(Yijk). (A1)
Step 3
For the kth profile point at the mth step, calculate the sample variance s2
jk as
s2jk= Var i∈Sjm
(Yijk). (A2)
Step 4
Calculate the moving range for any two successive wafers that belong to Sjm,
Rijk=|Yijk− Y(i−1)jk|, (A3)
and its average,
MRjk = mean i∈Sjm (Rijk), (A4) and ˆ σjk= max MRjk d2 , rj √ 12 . (A5)
Step 5
Obtain the weighting factors for all k and m as ˆ wjk= (1/ˆσjk)2 km k=km−1+1 (1/ˆσjk)2 . (A6) Step 6
Calculate the estimate of the level-shift parame-ters as ˆbijm= km k=km−1+1 ˆ wjk(Yijk− ˆμjk). (A7) Step 7
Calculate the residuals, ˆ
eijk = Yijk− ˆμjk− ˆbijm, (A8)
and normalize them ˜
eijk= s−1jk × ˆeijk. (A9)
Step 8
Obtain the Q statistic for the jth variable at the mth step, Qijm= km k=km−1+1 ˜ e2ijk. (A10) Step 9
Calculate the upper control limit (UCL) for the Q statistic as follows: Assume ˜eijm = (˜eij,km−1+1, . . . ,
˜
eij,km) is multivariate normally distributed with mean vector 0 and covariance matrix Σjm. The sum
of squares of these normalized residuals can be ap-proximated by a weighted chi-squared distribution φχ2
v, where the constant φ and the degrees of
free-dom v are both functions of Σjm (Box (1954)). The
control limit of Q statistic can be calculated by Qαjm= θ1jm 1−θ2jmh 0 jm 1− h0 jm θ2 1jm + sign(h0 jm)zα 2θ2jm(h0jm)2 θ1jm 1/h0 jm , (A11)
where zα is the critical value of the standard normal
distribution of significance level α and sign(h0jm) = 1, if h0 jm≥ 0 −1, otherwise. In addition, h0jm= 1− 2θ1jmθ3jm 3θ2jm , (A12) θ1jm= trace(A), (A13) θ2jm= trace(A2), (A14) θ3jm= trace(A3), (A15)
where A is the covariance matrix of the residuals ˜
eijm, for all i ∈ Sjm. The control limit in
Equa-tion (A11) requires that the error terms are indepen-dent between samples. Such an assumption should be essentially valid if the sample-to-sample variation is largely captured by bijm in Step 6.
Step 10
If Qijm > Qαjm, then remove the ith wafer from
Sjm, i.e., Sjm = Sjm\{i}, and go back to Step 2,
cycling until Qijm≤ Qαjm for all i∈ Sjm.
Step 11
Repeat Step 1 to Step 10 for all other j’s and m’s.
Appendix B:
Calculation of Profile Point and
Overall Health Indices
Step 1Let Φm= S1m∩ S2m∩ · · · ∩ SJm denote the set of
indices of wafers that have normal operating profiles for all variables at the mth step and let nm be the
size of Φm. The corresponding estimate of Σk is then
given by ˆ Σk= i∈Φm(k) ˆ eikˆeik nm− 1 . (B1) Step 2
Calculate Hotelling’s T2 for every wafer at every profile point,
ˆ
Tik2 = ˆeikΣˆ−1k ˆeik. (B2)
Step 3
Use ˆT2
ikto calculate qikusing the chi-square
for wafers in Phase II, χ2
1−qik(v(k)) = ˆT 2
ik
if i∈ L, i.e., wafer i in Phase I dataset, and F1−qik(v(k), nm(k)−v(k)) = nm(k)(nm(k)− v(k)) (n2 m(k)− 1)v(k) ˆ Tik2
if i /∈ L, i.e., wafer i ∈ Phase II dataset, (B3) where v(k) is the number of variables that are active at the kth profile point. Hence, the average of qikfor
wafer i can be calculated as ¯ qi= 1 K K k=1 qik. (B4) Step 4
Let the normal data set Φ be L, the set of all training data at the beginning. Denote the number of samples in this training dataset by nΦ. Calculate
ˆ
μq and ˆσq, the mean and standard deviation,
respec-tively, for all wafers in this set as ˆ μq= 1 nΦ i∈Φ ¯ qi (B5) and ˆ σq2= 1 nΦ− 1 i∈Φ (¯qi− ˆμq)2. (B6)
An overall heath index for the ith wafer can be cal-culated as hi= ¯ qi− ˆμq ˆ σq . (B7)
If zi > 3 for any i, then the ith wafer is considered
as an outlier and removed from the training dataset Φ = Φ\{i}. This step is repeated until all outliers have been removed.
Step 5
Equation (B7) is then used to calculate the overall health index for all other wafers in Phase II.
Acknowledgments
The authors are very grateful to the Editor and the two referees for valuable suggestions that greatly improved the paper. This work was supported in part by National Center for Theoretical Sciences of Tai-wan and by Hong Kong’s Research Grants Coun-cil through Competitive Earmarked Research Grant 620010.
References
Akima, H. (1970). “A New Method of Interpolation and Smooth Curve Fitting Based on Local Procedures”. Journal
of the Association for Computing Machinery 17, pp. 589–
602.
Apley, D. W.(2010). Comment to “Nonparametric Profile Monitoring by Mixed Effects Modeling”. Technometrics 52, pp. 277–280.
Box, G. P.(1954). “Some Theorems on Quadratic Forms Ap-plied in the Study of Analysis of Variance Problems. I: Effect of Inequality of Variance in the One-Way Classification”. The
Annals of Mathematical Statistics 25, pp. 290–302.
Chang, T. C.and Gag, F. F. (2006). “Monitoring Linearity of Measurement Gauges”. Journal of Statistical Computation
and Simulation 76, pp. 889–911.
Chen, T.and Zhang, J. (2010). “On-Line Multivariate Sta-tistical Monitoring of Batch Processes Using Gaussian Mix-ture Model”. Computers and Chemical Engineering 34, pp. 500–507.
Cherry, G. A.and Qin, S. J. (2006). “Multiblock Principal Component Analysis Based on a Combined Index for Semi-conductor Fault Detection and Diagnosis”. IEEE
Transac-tions on Semiconductor Manufacturing 19(2), pp. 159–171.
Chicken, E.; Pignatiello, J.; and Simpson, J. R. (2009). “Statistical Process Monitoring of Nonlinear Profiles Using Wavelets”. Journal of Quality Technology 41, pp. 198–212. Chou, Y. M.; Mason, R.;and Young, J. C. (1999). “Power
Comparisons for a Hotelling’s T2 Statistic”.
Communica-tions in Statistics, Part B—Simulation and Computation
28, pp. 1031–1050.
Ding, Y.; Zeng, L.; and Zhou, S. (2006). “Phase I Analy-sis for Monitoring Nonlinear Profiles in Manufacturing Pro-cesses”. Journal of Quality Technology 38, pp. 199–216. Ge, Z.and Song, Z. (2010). “Semiconductor Manufacturing
Process Monitoring Based on Adaptive Substatistical PCA”.
IEEE Transactions on Semiconductor Manufacturing 23(1),
pp. 99–108.
Jensen, W. A.; Birch, J. B.;and Woodall, W. H. (2008). “Monitoring Correlation Within Linear Profiles Using Mixed Models”. Journal of Quality Technology 40, pp. 167–183. Jensen, W. A.and Birch, J. B. (2009). “Profile Monitoring
via Nonlinear Mixed Models”. Journal of Quality Technology 41, pp. 18–34.
Jin, J. and Shi, J. (1999). “Feature-Preserving Data Com-pression of Stamping Tonnage Information Using Wavelets”.
Technometrics 41(4), pp. 327–339.
Johnson, R. A.and Wichern, D. W. (2007). Applied
Multi-variate Statistical Analysis, 6th edition, pp. 241–247. Upper
Saddle River, NJ: Pearson Education.
Jones, M. and Rice, J. (1992). “Displaying the Important Features of Large Collections of Similar Curves”. The
Amer-ican Statistician 46, pp. 140–145.
Kaistha, N.; Moore, C. F.;and Leitnaker, M. G. (2004). “A Statistical Process Control Framework for the Charac-terization of Variation in Batch Profiles”. Technometrics 46, pp. 53–68.
Kang, L. and Albin, S. L. (2000). “On-Line Monitoring When the Process Yields a Linear Profile”. Journal of
Qual-ity Technology 32, pp. 418–426.
Kim, K.; Mahmoud, M. A.;and Woodall, W. H. (2003). “On the Monitoring of Linear Profiles”. Journal of Quality
Li, W.; Yue, H.; Valle, S.;and Qin, J. (2000). “Recursive PCA for Adaptive Process Monitoring”. Journal of Process
Control 10, pp. 471–486.
Mahmoud, M. A.; Parker, P. A.; Woodall, W. H.; and Hawkins, D. M.(2007). “A Change Point Method for Lin-ear Profile Data”. Quality & Reliability Engineering
Inter-national 23, pp. 247–268.
May, G. S.and Spanos, C. J. (2006). Fundamentals of
Semi-conductor Manufacturing and Process Control. Hoboken,
NJ: John Wiley & Sons, Inc.
Montgomery, D. C.(2005). Introduction to Statistical
Qual-ity Control, 5th edition. Hoboken, NJ: John Wiley & Sons,
Inc.
Nomikos, P.and MacGregor, J. F. (1995a). “Multivariate SPC Charts for Monitoring Batch Processes”.
Technomet-rics 37, pp. 41–59.
Nomikos, P. and MacGregor, J. F. (1995b). “Multi-Way Partial Least Squares in Monitoring Batch Processes”.
Chemometrics and Intelligent Laboratory Systems 30, pp.
97–108.
Noorossana, R.; Amiri, A.;and Soleimani, P. (2008). “On the Monitoring of Autocorrelated Linear Profiles”.
Commu-nications in Statistics: Theory and Methods 37, pp. 425–442.
Qiu, P.; Zou, C.;and Wang, Z. (2010). “Nonparametric Pro-file Monitoring by Mixed Effects Modeling”. Technometrics 52, pp. 265–277.
Spitzlsperger, G.; Schmidt, C.; Ernst, G.; Strasser, H.; and Speil, M. (2005). “Fault Detection for a Via Etch Pro-cess Using Adaptive Multivariate Methods”. IEEE
Transac-tions on Semiconductor Manufacturing 18, pp. 528–533.
Williams, J. D.; Birch, J. B.; Woodall, W. H.;and Ferry, N. M. (2007). “Statistical Monitoring of Heteroscedas-tic Dose–Response Profiles from High-Throughput Screen-ing”. Journal of Agricultural, Biological and Environmental
Statistics 12, pp. 216–235.
Wise, B. M.; Gallagher, N. B.; Butler, S. W.; White, D.; and Barna, G. G. (1999). “A Comparison of Principal Com-ponents Analysis, Multi-Way Principal ComCom-ponents Analy-sis, Trilinear Decomposition and Parallel Factor Analysis for Fault Detection in a Semiconductor Etch Process”. Journal
of Chemometrics 13, pp. 379–396.
Woodall, W. H.(2007). “Current Research on Profile Mon-itoring”. Produ¸c˜ao, 17(3), pp. 420–425.
Woodall, W. H.; Spitzner, D. J.; Montgomery, D. C.; and Gupta, S. (2004). “Using Control Charts to Monitor Process and Product Quality Profiles”. Journal of Quality
Technology 36, pp. 309–320.
Zhu, J.and Lin, D. K. J. (2010). “Monitoring the Slopes of Linear Profiles”. Quality Engineering 22, pp. 1–12. Zou, C.; Tsung, F.; and Wang, Z. (2007). “Monitoring
General Linear Profiles Using Multivariate Exponentially Weighted Moving Average Schemes”. Technometrics 49, pp. 395–408.
Zou, C.; Tsung, F.;and Wang, Z. (2008). “Monitoring Pro-files Based on Nonparametric Regression Methods”.
Techno-metrics 50, pp. 512–526.
Zou, C.; Zhang, Y.;and Wang, Z. (2006). “Control Chart Based on Change-Point Model for Monitoring Linear Pro-files”. IIE Transactions 38, pp. 1093–1103.