The control chart is a powerful tool in statistical process control (SPC) for achieving process stability and improving capability through the reduction of variability. The statistic used in constructing a control chart is usually a quality-related variable or a random vector consisting of several possibly correlated quality characteristics of a process or a product. However, in many practical situations, the quality of a process or a product is better characterized by a relationship between a response variable and one or more explanatory variables. Thus, for each sample, one observes a collection of data that can be represented by a curve (or profile). The profile can be linear or nonlinear. In this paper, we consider both linear and nonlinear profiles. Linear profiles are first fitted by a simple linear regression model, and the estimated parameters are used for process monitoring. The monitoring of linear profiles applies to a wide variety of settings; in particular, most studies in linear profile monitoring have been motivated by calibration applications. A nonlinear profile is very often expressed as a high-dimensional data vector, but most multivariate data analysis techniques then face an ill-conditioning problem due to the high correlation among data on the same profile. Because of this, we often use a Hotelling T2 statistic of the estimated regression parameters to monitor profiles.
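As a rough illustration of monitoring profiles through their estimated regression parameters, the sketch below fits a straight line to each profile by least squares and computes a Hotelling T2 value for each (intercept, slope) pair. It is only a minimal sketch under stated assumptions: it uses the ordinary sample mean and covariance rather than a robust estimator, and the function names (fit_line, hotelling_t2) and the five toy profiles are hypothetical, introduced here for the example.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares (intercept, slope) for one profile."""
    xb, yb = mean(x), mean(y)
    sxx = sum((xi - xb) ** 2 for xi in x)
    sxy = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return (yb - slope * xb, slope)


def hotelling_t2(estimates):
    """T2 of each (intercept, slope) pair, using the sample mean and covariance."""
    m = len(estimates)
    a_bar = mean(a for a, _ in estimates)
    b_bar = mean(b for _, b in estimates)
    # Entries of the 2x2 sample covariance matrix of the estimates.
    saa = sum((a - a_bar) ** 2 for a, _ in estimates) / (m - 1)
    sbb = sum((b - b_bar) ** 2 for _, b in estimates) / (m - 1)
    sab = sum((a - a_bar) * (b - b_bar) for a, b in estimates) / (m - 1)
    det = saa * sbb - sab ** 2
    t2 = []
    for a, b in estimates:
        da, db = a - a_bar, b - b_bar
        # Quadratic form d' S^{-1} d written out for the 2x2 case.
        t2.append((sbb * da * da - 2 * sab * da * db + saa * db * db) / det)
    return t2


# Toy data (assumed): four similar profiles and one with a much steeper slope.
x = [0, 1, 2, 3]
profiles = [
    [0.1, 1.0, 2.1, 2.9],
    [0.0, 1.1, 1.9, 3.1],
    [-0.1, 0.9, 2.0, 3.0],
    [0.2, 1.0, 1.8, 3.2],
    [1.0, 3.0, 5.2, 6.9],
]
estimates = [fit_line(x, y) for y in profiles]
t2 = hotelling_t2(estimates)
most_extreme = max(range(len(t2)), key=lambda i: t2[i])
```

With the sample mean and covariance, the T2 values of m samples in p dimensions always sum to (m-1)p, which gives a quick sanity check on the arithmetic. In this toy data the last profile has the largest T2.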
Process monitoring based on control charting usually consists of Phase I and Phase II.
Control charts are used primarily in Phase I to bring the process to the in-control state. The purpose of this paper is to study the One-At-A-Time (OAAT) scheme proposed by Shiau and Sun (2006) for Phase I profile monitoring. A historical data set is collected and trial control limits are constructed to determine whether the process is in control. If so, we have an in-control data set from which to establish adequate control limits for future on-line process monitoring.
In Phase I, to construct a control chart for monitoring samples collected from the process, we use the samples to establish trial control limits for the monitoring statistic, such as the Hotelling T2 statistic. For simplicity, we declare a sample out of control if it falls outside the control limits. If some samples are out of control, the process may have assignable causes for them. If assignable causes are found, these out-of-control samples should be removed; otherwise, one needs to decide whether or not to eliminate them. No one knows which action is correct without further information, since data points may exceed the limits simply by chance or because of some undiscovered assignable causes. To be conservative, many practitioners choose to discard these potential “out-of-control” samples. After eliminating them, the trial control limits are recalculated with the remaining samples to check whether the data set still contains any out-of-control samples. These screening steps are repeated until no out-of-control samples remain. We then use the in-control process data obtained in Phase I to estimate the process parameters of the monitoring statistic, e.g., the mean and standard deviation, for setting up reliable control limits to monitor new process data in Phase II.
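The iterative screening described above can be sketched as follows. This is a simplified, hypothetical illustration rather than the procedure used in this paper: the monitoring statistic is a single number per sample, the trial upper limit is assumed to be the mean plus k standard deviations of the retained values, and the names (trial_limit, delete_all_screen) and the data are made up for the example.

```python
from statistics import mean, stdev


def trial_limit(values, k=2.0):
    """Assumed trial upper control limit: mean + k * standard deviation."""
    return mean(values) + k * stdev(values)


def delete_all_screen(values, k=2.0):
    """Repeat: compute trial limit, drop every point beyond it, until none is beyond.

    Returns (kept indices, removed indices in order of removal, limits used).
    """
    kept = dict(enumerate(values))
    removed, limits = [], []
    while len(kept) > 2:
        ucl = trial_limit(list(kept.values()), k)
        limits.append(ucl)
        beyond = [i for i, v in kept.items() if v > ucl]
        if not beyond:
            break  # no out-of-control samples remain
        for i in beyond:
            removed.append(i)
            del kept[i]
    return sorted(kept), removed, limits


# Toy statistic values (assumed): ten ordinary samples and two unusual ones.
values = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.1, 0.9, 1.0, 1.2, 3.0, 6.0]
kept, removed, limits = delete_all_screen(values)
```

In this toy run the largest value inflates the first trial limit enough that the second unusual value is not flagged until the limits are recalculated, which is why the screening has to be repeated.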
Statistically, it is possible that some in-control samples will be declared out of control and some out-of-control samples declared in control, which is similar to committing Type I and Type II errors in hypothesis testing, respectively. A good control chart should be able to control both error rates. Because out-of-control samples usually make the trial control limits too wide, some out-of-control samples are not detected and removed. On the other hand, if some in-control samples are eliminated, some estimation efficiency is lost. Thus one would like to have an effective procedure for collecting in-control data for Phase II use. For more discussion of the differences between Phase I and Phase II analyses, see Mahmoud and Woodall (2004) and Sullivan (2002). We focus on Phase I analysis in this paper.
Robust estimation methods are more effective in detecting unusual data points, and control limits constructed with robust estimates are more reliable. However, robust estimation for multivariate data or profiles is neither as straightforward nor as easily implemented, because of the extensive computation required to obtain the estimates.
Recently, some researchers have studied robust estimation methods for detecting multivariate outliers.
Outliers in multivariate data are more difficult to detect than those in univariate data. Woodruff and Rocke (1994) discussed approaches that are well suited for detecting multivariate outliers, such as the minimum volume ellipsoid (MVE) estimator proposed initially by Rousseeuw (1984) and the minimum covariance determinant (MCD) estimator proposed by Rousseeuw (1984) and Rousseeuw and Van Driessen (1999). These approaches can keep “bad” data from causing masking effects in multivariate data. Shiau, Yen and Feng (2006) proposed using a Hotelling T2 chart based on MCD estimators in conjunction with the False Discovery Rate (FDR), and demonstrated that the chart is effective in detecting a reasonable number of outliers.
Jensen, Birch, and Woodall (2007) compared the standard, MVE, and MCD estimators. Figure 1 (from Jensen et al. (2007)) shows the best estimator among the three for various sample sizes m of the historical data set and percentages of outliers. They concluded that the MVE method is best when m is small and the proportion of outliers is small. We use two data sets for demonstration in this study: the bioassay data given in Williams, Birch, Woodall, and Ferry (2007) and the vertical board density profile (VDP) data given in Walker and Wright (2002). The bioassay data set consists of 44 profiles and the VDP data set consists of 24 profiles.
Both examples have fewer than 50 samples. We therefore adopt the MVE estimator in this study, since the number of profiles m is small in both examples.
Rousseeuw and Leroy (1987) distinguished between the MVE and the reweighted MVE (RMVE). The RMVE is more robust than the MVE, because the RMVE estimator is not affected by outliers. In this paper, we use the RMVE estimator to monitor samples.
Figure 1: A summary of the preferred estimator for p=2. (from Jensen et al. (2007))
To detect out-of-control profiles without losing too many in-control profiles, Shiau and Sun (2006) proposed an iterative procedure, called the One-At-A-Time (OAAT) scheme, that discards only the most extreme beyond-limits point and then updates the control limits at each iteration. We refer to the traditional scheme, which deletes all of the out-of-control profiles at once, as the “Delete-All” scheme. To compare the two schemes, we use the following two measures: (i) the false-alarm rate, defined as the probability that an in-control sample is declared out of control; and (ii) the detecting power, defined as the probability that an out-of-control sample is detected. For example, let m=25 and m1=3, where m is the number of profiles and m1 is the number of out-of-control profiles. If one in-control profile is declared out of control, then the false-alarm rate is 1/(25-3) = 1/22. If two out-of-control profiles are detected, then the detecting power is 2/3.
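The difference between the two schemes, and the two measures just defined, can be sketched on a toy example. The sketch is hypothetical and deliberately simplified: the statistic is univariate, the limits are assumed to be the mean of the retained values plus or minus k times a known standard deviation sigma0, and "most extreme" is taken to mean farthest from the current center. The function names and the data are introduced here for illustration and are not taken from Shiau and Sun (2006).

```python
from statistics import mean


def screen(values, sigma0=1.0, k=3.0, one_at_a_time=True):
    """Phase I screening with assumed limits mean(kept) +/- k * sigma0.

    one_at_a_time=True  removes only the most extreme beyond-limits point
                        per iteration (OAAT-style);
    one_at_a_time=False removes every beyond-limits point (Delete-All).
    Returns the set of removed indices.
    """
    kept = dict(enumerate(values))
    removed = set()
    while len(kept) > 1:
        center = mean(list(kept.values()))
        beyond = [i for i, v in kept.items() if abs(v - center) > k * sigma0]
        if not beyond:
            break
        if one_at_a_time:
            beyond = [max(beyond, key=lambda i: abs(kept[i] - center))]
        for i in beyond:
            removed.add(i)
            del kept[i]
    return removed


def false_alarm_rate(flagged, true_out, m):
    """Share of the m - m1 in-control samples that were declared out of control."""
    return len(flagged - true_out) / (m - len(true_out))


def detecting_power(flagged, true_out):
    """Share of the m1 out-of-control samples that were detected."""
    return len(flagged & true_out) / len(true_out)


# The example in the text: m = 25, m1 = 3, one false alarm, two detections.
far_text = false_alarm_rate({5, 23, 24}, {22, 23, 24}, 25)  # 1/22
power_text = detecting_power({5, 23, 24}, {22, 23, 24})     # 2/3

# Toy data (assumed): index 9 is the only truly out-of-control sample.
values = [0.0, 0.5, -0.5, 1.0, -1.0, 0.3, -0.3, 0.8, -2.4, 12.0]
true_out = {9}
delete_all = screen(values, one_at_a_time=False)
oaat = screen(values, one_at_a_time=True)
far_delete_all = false_alarm_rate(delete_all, true_out, len(values))
far_oaat = false_alarm_rate(oaat, true_out, len(values))
```

In this toy data the large value at index 9 pulls the center upward, so on the first pass the in-control value at index 8 also falls outside the limits. The Delete-All variant removes both, giving a false-alarm rate of 1/9, whereas the one-at-a-time variant removes only index 9, after which index 8 is back within the updated limits. Both variants detect the one out-of-control sample.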
In this paper, we use a robust Hotelling T2 statistic to monitor profile data in Phase I analysis, and apply the OAAT scheme to the bioassay data given in Williams, Birch, Woodall, and Ferry (2007) and to the VDP data given in Walker and Wright (2002). We compare the Delete-All scheme and the OAAT scheme in terms of the false-alarm rate and detecting power.
We confirm that the OAAT scheme reduces the false-alarm rate while maintaining the detecting power in profile monitoring. We also find that when the process shift is large enough, the false-alarm rates of the two schemes are very close for one-sided control charts. Since the Delete-All scheme is computationally cheaper, we provide a statistic as a guide on when to use it to gain efficiency.
Section 2 reviews linear/nonlinear profile monitoring and robust multivariate control charts.
Section 3 compares the OAAT scheme with the Delete-All scheme, and Section 4 gives a guideline on which scheme to use for real data. Section 5 demonstrates our method with the bioassay data and the VDP data. Finally, we conclude the thesis in Section 6.