Robust Estimation in Multivariate Control Chart

In most regression problems, the parameter estimators are correlated. Thus a popular procedure in the retrospective Phase I analysis is the T² control chart, which is a tool for detecting multivariate outliers and shifts. To construct the T² chart, we need to estimate the mean vector and covariance matrix from a set of historical data. However, multiple outliers may go undetected because they bias these estimators, which is known as the masking effect. Efforts to address this problem have focused on robust estimation in the presence of multiple outliers, especially for the covariance matrix. Several researchers have studied robust estimation in multivariate settings, for example, Rousseeuw (1984) and Rousseeuw and Leroy (1987).

The T² statistics based on the usual sample variance-covariance matrix estimator (denoted by T_usu²) and the successive-difference variance-covariance matrix estimator (denoted by T_dif²) are most commonly used for multivariate control charts. Sullivan and Woodall (1996) showed that T_usu² and T_dif² are not effective in detecting more than a very small number of outliers. Sullivan and Woodall (1996) and Vargas (2003) both showed that T_dif² is effective in detecting both sustained step and ramp shifts in the mean vector. Sullivan and Woodall (1996) found not only that T_usu² is less effective in detecting a shift in the mean vector, but also that its power to detect the shift decreases as the magnitude of the shift increases. They found that T_usu² has the effect of “pooling” all the data together, so that a large step shift “inflates” the variance and makes the shift difficult to detect.
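The contrast between the two estimators can be illustrated with a minimal sketch. It assumes individual multivariate observations stored as the rows of an m × p array, and it takes the successive-difference estimator to be S = V′V / (2(m − 1)), where the rows of V are the differences of consecutive observations. The function names, the simulated data, and the size of the step shift are illustrative assumptions, not taken from the cited papers.

```python
import numpy as np

def t2_statistics(x, cov):
    """T^2_i = (x_i - xbar)' S^{-1} (x_i - xbar) for each row x_i of x."""
    centered = x - x.mean(axis=0)
    # Solve S z = centered' rather than forming the explicit inverse.
    return np.einsum("ij,ji->i", centered, np.linalg.solve(cov, centered.T))

def usual_cov(x):
    """Usual pooled sample variance-covariance matrix (divisor m - 1)."""
    return np.cov(x, rowvar=False)

def successive_difference_cov(x):
    """Estimate built from successive differences v_i = x_{i+1} - x_i."""
    v = np.diff(x, axis=0)
    return v.T @ v / (2.0 * (x.shape[0] - 1))

# Hypothetical Phase I data: a sustained step shift after the first half.
rng = np.random.default_rng(0)
m, p = 40, 2
x = rng.normal(size=(m, p))
x[m // 2:] += 3.0

t2_usu = t2_statistics(x, usual_cov(x))
t2_dif = t2_statistics(x, successive_difference_cov(x))
print("largest T2_usu:", t2_usu.max())
print("largest T2_dif:", t2_dif.max())
```

Because the pooled estimator absorbs the shift into the estimated variance, the T_usu² values always sum to (m − 1)p regardless of the shift, whereas the successive-difference estimator is affected only by the single difference that spans the shift.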

Vargas (2003) and Jensen, Birch, and Woodall (2005) showed that the T² statistics based on high-breakdown estimators, such as the MVE and MCD methods of Rousseeuw (1984) (denoted respectively by T_mve² and T_mcd²), are excellent in detecting multiple outliers in Phase I. Jensen, Birch, and Woodall (2005) further investigated which of the MVE and MCD estimators is preferable for given combinations of the sample size and the number of outliers in multivariate Phase I applications. The MVE estimator is preferred for smaller sample sizes and a smaller percentage of outliers, while the MCD estimator is preferred for larger sample sizes and/or larger percentages of outliers. The simulations and generated control limits presented there give useful guidelines about the situations for which the high-breakdown approach is most appropriate. Jensen, Birch, and Woodall (2005) also discussed some properties of the MVE and MCD methods along with their computing algorithms.

The distributions of the exact MCD and MVE estimators of location and scale are not known in closed form. However, their asymptotic distributions can be derived. Davies (1992) showed that the exact MVE estimators of location and scale are consistent for the mean vector and covariance matrix, respectively, provided that the random error vectors are i.i.d. Similar results were given in Butler, Davies, and Jhun (1993) for the exact MCD estimators. However, the MCD estimators converge to their population counterparts at a rate of n^(−1/2), while the MVE estimators converge at the slower rate of n^(−1/3); thus the MCD estimators are more efficient. In addition, the distribution of the MCD estimator of location converges to a normal distribution, which is not necessarily the case for the MVE estimator of location. Thus, the asymptotic properties of the MCD estimators are superior to those of the MVE estimators. Davies (1997) and Butler, Davies, and Jhun (1993) also indicated that the T_mve² and T_mcd² statistics for samples i = 1, . . . , m converge in distribution to a χ²_p distribution. Hardin and Rocke (2005) provided an improved F approximation for the MCD estimator that gives accurate outlier rejection points for various sample sizes. So it may be useful to study approximate control limits, which are much simpler to obtain than those obtained via simulation. They believed that large sample sizes are likely needed for the χ²_p approximation to be sufficiently accurate.

The hybrid algorithm of Rocke and Woodruff (1996) is a combination of the data partitioning methods of Woodruff and Rocke (1994), the FSA algorithm involving the MCD from Hawkins (1994), and M-estimation. This hybrid algorithm is very effective in detecting a larger percentage of outliers. Rousseeuw and Van Driessen (1999) proposed an algorithm, which they called the FAST-MCD, that is based on an iterative scheme and the MCD estimators. The FAST-MCD method is able to handle large data sets within a reasonable amount of time.
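The iterative scheme behind this kind of MCD computation can be sketched as follows. This is a deliberately simplified illustration, not the FAST-MCD algorithm itself: it uses only random starting subsets and repeated concentration steps (keep the h observations with the smallest distances, then refit), and it omits the data partitioning, consistency correction, and reweighting of the published method. The function names, the subset size h, the number of starts, and the simulated data are all assumptions made for the example.

```python
import numpy as np

def mahalanobis_sq(x, loc, cov):
    """Squared Mahalanobis distance of each row of x from loc under cov."""
    d = x - loc
    return np.einsum("ij,ji->i", d, np.linalg.solve(cov, d.T))

def mcd_sketch(x, h=None, n_starts=50, max_iter=100, seed=0):
    """Approximate MCD location and scatter from random starts (a sketch)."""
    rng = np.random.default_rng(seed)
    m, p = x.shape
    h = (m + p + 1) // 2 if h is None else h   # assumed default subset size
    best_det, best_loc, best_cov = np.inf, None, None
    for _ in range(n_starts):
        # Initial fit from a random (p + 1)-point subset.
        idx = rng.choice(m, size=p + 1, replace=False)
        loc, cov = x[idx].mean(axis=0), np.cov(x[idx], rowvar=False)
        if np.linalg.det(cov) < 1e-12:
            continue                            # degenerate start, skip it
        det = np.inf
        for _ in range(max_iter):
            # Concentration step: refit on the h closest observations.
            subset = np.argsort(mahalanobis_sq(x, loc, cov))[:h]
            loc, cov = x[subset].mean(axis=0), np.cov(x[subset], rowvar=False)
            new_det = np.linalg.det(cov)
            if new_det >= det:                  # determinant stopped shrinking
                break
            det = new_det
        if det < best_det:
            best_det, best_loc, best_cov = det, loc, cov
    return best_loc, best_cov

# Hypothetical data: 40 in-control observations and 10 shifted outliers.
rng = np.random.default_rng(1)
x = rng.normal(size=(50, 2))
x[40:] += 10.0

loc, cov = mcd_sketch(x)
t2_mcd = mahalanobis_sq(x, loc, cov)
print("robust location:", loc)
print("smallest T2_mcd among the shifted points:", t2_mcd[40:].min())
```

The raw scatter estimate from such a sketch is not rescaled for consistency, so the resulting distances are suitable for ranking observations but would still need appropriate control limits.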

In Section 4, we give a brief overview of various robust estimation methods based on the MCD method for multivariate Phase I applications. In addition to using the MCD estimators in the T² statistic, in an attempt to enhance the detecting power of the control chart, we use the FDR procedure proposed by Benjamini and Hochberg (1995) for determining outliers in the data set. The proposed scheme is compared with the MCD-based T² control chart in Sections 5 and 6.
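As a reminder of how the Benjamini and Hochberg (1995) step-up procedure operates, the following is a minimal sketch. It assumes that a p-value is already available for each sample, for example from a reference distribution for the T² statistic; how those p-values are obtained is not shown here. The function name, the FDR level q, and the p-values are hypothetical.

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Boolean mask of the hypotheses rejected at FDR level q."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)                        # ranks from smallest p-value
    thresholds = q * np.arange(1, n + 1) / n     # q * k / n for rank k
    below = p[order] <= thresholds
    reject = np.zeros(n, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()           # largest rank under its threshold
        reject[order[:k + 1]] = True             # reject everything up to that rank
    return reject

# Hypothetical p-values for m = 10 samples.
p_values = [0.001, 0.012, 0.014, 0.041, 0.2, 0.3, 0.5, 0.6, 0.8, 0.9]
flagged = benjamini_hochberg(p_values, q=0.05)
print(flagged)
```

In this example the second p-value exceeds its own threshold (0.012 > 0.010) but is still flagged, because the third one meets its threshold (0.014 ≤ 0.015) and the procedure rejects every hypothesis up to the largest qualifying rank.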

3 Modeling Nonlinear Profiles

3.1 Nonlinear Regression Model

Assume that there are m sample profiles in the historical data for Phase I analysis. For each sample i, we observe the response variable y_ij and the predictor variable x_ij (k = 1 predictor), j = 1, . . . , n, i = 1, . . . , m. We present the nonlinear regression model (2.3) in matrix form

and the covariate matrix be

X =

is the parameter matrix, where β_i is the p × 1 vector to be estimated for profile i, and

ε =

is the random error matrix, where the ε_ij are assumed to be i.i.d. normal random variables with mean zero and variance σ².

To simplify notation, we rewrite the form in (2.3) by stacking the n observations within each profile as y_i = (y_i1, y_i2, . . . , y_in)′, f(x_i, β_i) = (f(x_i1, β_i), f(x_i2, β_i), . . . , f(x_in, β_i))′, and ε_i = (ε_i1, ε_i2, . . . , ε_in)′. The vector form is then given by

y_i = f(x_i, β_i) + ε_i, i = 1, 2, . . . , m. (3.5)

For the nonlinear regression model given in (3.5), we must first obtain the estimate of β_i for each profile. This is usually accomplished by employing the Gauss-Newton procedure and iterating until convergence to obtain the least squares estimates. Define the n × p matrix of the derivatives of f(x_i, β_i) with respect to β_i as

F(β_i) = F_i = ∂f(x_i, β_i)/∂β_i′, (3.6)

whose (j, r) element is ∂f(x_ij, β_i)/∂β_ir. Then, writing F̂_i^(a) = F(β̂_i^(a)) for the derivative matrix evaluated at the a-th iterate, an iterative solution for β̂_i is given by

β̂_i^(a+1) = β̂_i^(a) + (F̂_i^(a)′ F̂_i^(a))⁻¹ F̂_i^(a)′ (y_i − f(x_i, β̂_i^(a))). (3.7)

See Myers (1990, Chapter 9) or Schabenberger and Pierce (2002, Chapter 5) for a concise discussion of nonlinear regression model estimation. A more detailed treatment can be found in Gallant (1987) or Seber and Wild (2003).
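The Gauss-Newton iteration described above can be sketched for a single profile as follows. The mean function f(x, β) = β₁ exp(−β₂ x), the starting values, the noise level, and the stopping rule are all hypothetical choices made for illustration; the source does not specify a particular f here. The update is computed with a least squares solve, which is numerically equivalent to applying (F′F)⁻¹F′ to the current residual vector.

```python
import numpy as np

def f(x, beta):
    """Hypothetical mean function f(x, beta) = beta1 * exp(-beta2 * x)."""
    return beta[0] * np.exp(-beta[1] * x)

def jacobian(x, beta):
    """n x p matrix of derivatives of f with respect to beta."""
    e = np.exp(-beta[1] * x)
    return np.column_stack([e, -beta[0] * x * e])

def gauss_newton(x, y, beta0, tol=1e-10, max_iter=50):
    """Plain Gauss-Newton iteration without step-size control (a sketch)."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        F = jacobian(x, beta)
        resid = y - f(x, beta)
        # Least squares solution of F step = resid, i.e. (F'F)^{-1} F' resid.
        step, *_ = np.linalg.lstsq(F, resid, rcond=None)
        beta = beta + step
        if np.linalg.norm(step) < tol * (1.0 + np.linalg.norm(beta)):
            break
    return beta

# One simulated profile with n = 20 points and a small error variance.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 4.0, 20)
y = f(x, np.array([2.0, 0.7])) + rng.normal(scale=0.02, size=x.size)

beta_hat = gauss_newton(x, y, beta0=[1.5, 0.5])
print("estimate:", beta_hat)
```

Plain Gauss-Newton can diverge from poor starting values, so practical implementations often add step halving or a Levenberg-Marquardt style modification; that refinement is left out of this sketch.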

Unlike in linear regression, the exact small-sample distribution of the parameter estimators in nonlinear regression is not available, even when the errors ε_ij are assumed to be i.i.d. normal random variables. Let F(β̂_i) (= F̂_i) be the derivative matrix in (3.6) evaluated at the parameter vector estimate β̂_i. Seber and Wild (2003, Chapter 12) gave the asymptotic distribution of β̂_i as well as the necessary assumptions and regularity conditions needed for it. Provided that the following assumptions and regularity conditions

1. The ε_ij are i.i.d. with mean zero and variance σ².

2. For each i, f (x_i, β^∗_i) is a continuous function of β^∗_i for β^∗_i ∈ B, where B is a closed, bounded subset of R^p .

3. β_i is an interior point of B. Let B^∗ be an open neighborhood of B.

4. The first and second derivatives, ∂f(x_i, β^∗_i)/∂β^∗_ir and ∂²f(x_i, β^∗_i)/(∂β^∗_ir ∂β^∗_is) (r, s = 1, 2, . . . , p), exist and are continuous in β^∗_i for all β^∗_i ∈ B^∗.

5. n⁻¹F(β^∗_i)′F(β^∗_i) converges to some matrix Ω(β^∗_i) uniformly in β^∗_i for β^∗_i ∈ B^∗.

6. n⁻¹ Σ_{j=1}^{n} [∂²f(x_ij, β^∗_i)/(∂β^∗_ir ∂β^∗_is)]² converges uniformly in β^∗_i for β^∗_i ∈ B^∗.

7. Ω_i=Ω(β_i) is nonsingular.

hold, the asymptotic distribution of β̂_i is given by

√n(β̂_i − β_i) → N_p(0, σ²Ω_i⁻¹) in distribution. (3.8)

Also, n⁻¹F̂_i′F̂_i is a strongly consistent estimator of Ω_i. For practical purposes, the distribution given by (3.8) cannot be calculated since the matrix Ω_i is unknown. Instead, the following approximate asymptotic distribution of β̂_i is commonly used:

β̂_i ≈ N_p(β_i, σ²(F_i′F_i)⁻¹). (3.9)
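In practice the covariance matrix in (3.9) has to be estimated. A minimal sketch is given below; it plugs in F̂_i for F_i and estimates σ² by the residual mean square RSS/(n − p). Both plug-in choices, the mean function f(x, β) = β₁ exp(−β₂ x), the starting values, and the fixed number of Gauss-Newton iterations are assumptions made for illustration rather than choices stated in the text.

```python
import numpy as np

def f(x, beta):
    """Hypothetical mean function f(x, beta) = beta1 * exp(-beta2 * x)."""
    return beta[0] * np.exp(-beta[1] * x)

def jacobian(x, beta):
    """n x p matrix of derivatives of f with respect to beta."""
    e = np.exp(-beta[1] * x)
    return np.column_stack([e, -beta[0] * x * e])

def approx_covariance(F_hat, resid):
    """Plug-in estimate sigma2_hat * (F_hat' F_hat)^{-1}."""
    n, p = F_hat.shape
    sigma2_hat = resid @ resid / (n - p)      # assumed estimator of sigma^2
    return sigma2_hat * np.linalg.inv(F_hat.T @ F_hat)

# One simulated profile with n = 20 points.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 4.0, 20)
y = f(x, np.array([2.0, 0.7])) + rng.normal(scale=0.02, size=x.size)

# A fixed number of Gauss-Newton updates to obtain beta_hat.
beta_hat = np.array([1.5, 0.5])
for _ in range(25):
    step, *_ = np.linalg.lstsq(jacobian(x, beta_hat), y - f(x, beta_hat), rcond=None)
    beta_hat = beta_hat + step

cov_hat = approx_covariance(jacobian(x, beta_hat), y - f(x, beta_hat))
std_err = np.sqrt(np.diag(cov_hat))
print("estimate:", beta_hat)
print("approximate standard errors:", std_err)
```

The off-diagonal entries of the estimated matrix are generally nonzero, which reflects the correlation between the parameter estimators within a profile.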

For the “in-control” case, we have β_i = β for all m samples, where β is the in-control parameter vector. Accordingly, the Ω_i (and F_i) matrices are the same for all m profiles if all profiles have the same underlying function, f, the same x-values, and the same values of β_i. However, the F̂_i matrices are not equal, since the β̂_i values vary from profile to profile.
