Nonparametric Profile Control Chart - Coverage Intervals, Multivariate Control Charts, and Statistical Quality Control for Profile Data

Statistical process control has been widely applied in many areas, especially in industry. Another topic of this research is profile monitoring. Most statistical process control applications assume that the quality of a process or product can be adequately represented by the distribution of a univariate quality characteristic or a vector of correlated quality characteristics. In many practical situations, however, the focus is on the relationship between a response variable and one or more explanatory variables. At each sampling stage, the collection of data points that represents this relationship forms a data curve, or data profile. A control chart designed for such data is called a profile control chart.

The monitoring of process and product profiles is a growing and promising area of research in statistical process control. One aim of this research is to develop monitoring schemes for nonlinear profiles with random effects. We use principal components analysis to analyze the covariance structure of the profiles and propose monitoring schemes based on principal component (PC) scores.

Profile monitoring is a relatively new research area in quality control. Kang and Albin (2000) studied linear profile monitoring. They proposed two control schemes that model the profiles with the simple linear regression model Y = A₀ + A₁x + ε. Here Y is the response variable and x is the independent variable, A₀ and A₁ are the parameters to be estimated, and the noise variables ε are independent and normally distributed with mean zero and common variance σ².

Kim et al. (2003) centered the x-values so that the least squares estimators of the Y-intercept (A₀) and slope (A₁) are independent of each other. They then proposed a combined-chart scheme in which three EWMA charts are used simultaneously, designed respectively for detecting shifts in the intercept, the slope, and the standard deviation (σ). Mahmoud and Woodall (2004) presented and compared several control charts for Phase I analysis of linear profiles and applied some of them to a calibration application. For more discussion of linear profile monitoring, see the review paper by Woodall et al. (2004).
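The two ingredients just described can be sketched in a few lines of Python, assuming one profile is given as lists `x` and `y`. The first is a least-squares fit on centered x-values. Centering makes the design columns orthogonal, so the intercept and slope estimators are uncorrelated, and under normal errors they are also independent. The second is the plain EWMA recursion applied to a sequence of estimates. The function names are hypothetical. This is not the complete scheme of Kim et al. (2003): it contains no control limits and no chart for σ.

```python
from typing import List, Sequence, Tuple


def fit_centered_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y = b0 + b1 * (x - xbar) for one linear profile.

    Returns (intercept at xbar, slope, residual mean square with n - 2 df).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need matching x and y with at least 3 points")
    xbar = sum(x) / n
    xc = [xi - xbar for xi in x]            # centered x-values sum to zero
    sxx = sum(v * v for v in xc)
    b0 = sum(y) / n                         # with centered x the intercept estimate is ybar
    b1 = sum(v * yi for v, yi in zip(xc, y)) / sxx
    sse = sum((yi - b0 - b1 * v) ** 2 for v, yi in zip(xc, y))
    return b0, b1, sse / (n - 2)


def ewma(values: Sequence[float], lam: float, start: float) -> List[float]:
    """EWMA recursion z_j = lam * v_j + (1 - lam) * z_{j-1}, with z_0 = start."""
    z, out = start, []
    for v in values:
        z = lam * v + (1 - lam) * z
        out.append(z)
    return out


# Example profile lying exactly on y = 3 + 2x.
print(fit_centered_line([2, 4, 6, 8], [7, 11, 15, 19]))
```

For instance, you could pass the intercept estimates of successive profiles to `ewma`, with `start` set to an assumed in-control intercept.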

Shiau and Weng (2004) extended the above linear profile monitoring schemes to profiles of more general form via nonparametric regression. No assumption is made about the form of the profiles other than smoothness. The nonparametric regression model considered is Y = g(x) + ε, where g(x) is a smooth function and ε is the random error as before. Spline regression was adopted as the curve fitting/smoothing technique for its simplicity. They proposed three charts:

- an EWMA chart for detecting mean shifts;
- an R chart for variation changes;
- an EWMSD chart, based on the standard deviation, for variation increases.

Note that the models described above all consist of a deterministic line or curve plus random noise. They do not account for the allowable profile-to-profile variation often observed in profile data, e.g., in the aspartame example and the VDP example, where this variation should be regarded as due to common causes. A monitoring scheme built on the aforementioned “fixed-effect” model may attribute such common-cause variation to special causes and signal many false alarms. We therefore need a model that accommodates common-cause profile-to-profile variation, and a monitoring scheme constructed from it.

To address this, Shiau, Lin, and Chen (2006) considered a random-effect linear model to develop monitoring schemes for linear profiles. Similarly, Jensen et al. (2006) proposed a linear mixed-effects model for linear profiles. For nonlinear profiles, Williams et al. (2003) fitted the profiles by nonlinear parametric regression and then monitored them with T² statistics based on the estimated parameters.

Later, Williams et al. (2007) extended this methodology to nonlinear profiles with nonconstant variance at the set points, in order to analyze a set of heteroscedastic dose-response profiles. Adopting a random-effect parametric nonlinear regression model for profiles, Shiau, Yen, and Feng (2006) proposed a robust nonlinear profile monitoring scheme. Jensen et al. (2009) proposed using nonlinear mixed models to model nonlinear profiles.

All of the parametric approaches mentioned above require the practitioner to specify a parametric functional form for the profiles in advance, which is often difficult. Qiu, Zou and Wang (2010) proposed a novel control chart for mixed-effect profile data that makes no assumption about a parametric functional form. Their chart combines local linear kernel smoothing of the profile data with EWMA weighting across time points. It also takes into account the heteroscedasticity of observations within each profile.

We extend the nonparametric fixed-effect model of Shiau and Weng (2004) to a random-effect model so that part of the profile-to-profile variability can be attributed to common causes. With the random-effect model, we focus on the covariance structure and analyze it with principal components analysis (PCA). Ding et al. (2006) also modelled profiles nonparametrically for Phase I analysis. However, for profiles that form clusters, they proposed using independent components analysis (ICA) instead of PCA, because PCA may fail to preserve the clustering feature of the original data.

PCA is useful for summarizing and interpreting a set of profiles that share the same equally spaced x-values. The smoothing step described above relaxes this requirement: values at common, equally spaced x-values can easily be obtained by evaluating each smoothed profile.
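Here is a sketch of how values at common, equally spaced x-values can be obtained from irregularly sampled profiles. It fits a cubic regression spline by least squares, using a truncated power basis with fixed knots. The basis and knot placement are assumptions chosen for simplicity and are not necessarily the spline setup of Shiau and Weng (2004). The function names and the simulated profile are hypothetical.

```python
import numpy as np


def cubic_spline_basis(x, knots):
    """Truncated power basis for a cubic regression spline: 1, x, x^2, x^3, (x - k)_+^3."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.maximum(x - k, 0.0) ** 3 for k in knots]
    return np.column_stack(cols)


def smooth_to_grid(x, y, knots, grid):
    """Fit the spline to one profile by least squares and evaluate it on `grid`."""
    B = cubic_spline_basis(x, knots)
    coef, *_ = np.linalg.lstsq(B, np.asarray(y, dtype=float), rcond=None)
    return cubic_spline_basis(grid, knots) @ coef


# Hypothetical profile observed at irregular x-values, smoothed onto 11 equally spaced points.
rng = np.random.default_rng(1)
x_obs = np.sort(rng.uniform(0.0, 1.0, 25))
y_obs = np.sin(2 * np.pi * x_obs) + rng.normal(0.0, 0.1, x_obs.size)
grid = np.linspace(0.0, 1.0, 11)
print(np.round(smooth_to_grid(x_obs, y_obs, np.array([0.25, 0.5, 0.75]), grid), 3))
```

Fitting every profile this way and evaluating each fit on the same `grid` produces the common x-values that PCA needs. The number and placement of the knots control how much smoothing is applied.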

Pioneering works on analyzing curves with PCA include Castro et al. (1986), Rice and Silverman (1991), and Jones and Rice (1992), among others. As an application, Shiau and Lin (1999) applied nonparametric regression and PCA to a set of accelerated LED degradation profiles to estimate the mean lifetime of the product.

For Phase I profile monitoring, we propose using the usual Hotelling T² chart, a commonly used control chart designed for multivariate process data, by treating the principal component (PC) scores of a profile obtained from PCA as the multivariate data. For Phase II process monitoring, we propose and study three monitoring schemes constructed by utilizing the eigenvalues and eigenvectors obtained from PCA to compute the PC-scores of each incoming profile, including individual PC-score charts, a combined chart that combines all of the PC-score charts and a T² chart (different from the T² chart of Phase I). The performances of these monitoring schemes are evaluated in terms of the average run length (ARL).
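The Phase I computation can be sketched with numpy. The sketch assumes that the m profiles are stored as rows of an array, all evaluated at the same p x-values, and that the number k of retained components is chosen in advance, with k no larger than the rank of the covariance matrix. The PC scores are uncorrelated, and their sample variances equal the retained eigenvalues. The T² statistic therefore reduces to the sum of squared scores, each divided by its eigenvalue. The function name and the simulated profiles are hypothetical, and control limits are not computed.

```python
import numpy as np


def pc_scores_t2(profiles, k):
    """PC scores and Phase I T^2 values for profiles stored row-wise (m x p)."""
    Y = np.asarray(profiles, dtype=float)
    mean_profile = Y.mean(axis=0)
    centered = Y - mean_profile
    S = np.cov(centered, rowvar=False)          # p x p sample covariance (divisor m - 1)
    eigvals, eigvecs = np.linalg.eigh(S)        # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]       # indices of the k largest eigenvalues
    lam, V = eigvals[order], eigvecs[:, order]
    scores = centered @ V                       # m x k matrix of PC scores
    t2 = np.sum(scores**2 / lam, axis=1)        # scores have sample variances lam
    return mean_profile, lam, V, scores, t2


# Hypothetical profiles: random amplitude and slope (profile-to-profile variation) plus noise.
rng = np.random.default_rng(0)
x_grid = np.linspace(0.0, 1.0, 20)
Y = (rng.normal(1.0, 0.3, (30, 1)) * np.sin(2 * np.pi * x_grid)
     + rng.normal(0.0, 0.2, (30, 1)) * x_grid
     + rng.normal(0.0, 0.05, (30, 20)))
mean_profile, lam, V, scores, t2 = pc_scores_t2(Y, k=2)
print(np.round(t2, 2))
```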

Chapter 2 A Nonparametric Coverage Interval

In this chapter, we propose a coverage interval estimator based on symmetric quantiles. We study the coverage precision of the proposed symmetric coverage interval and compare it with the empirical quantile coverage interval. The comparison shows that, when the underlying distribution is symmetric, the symmetric quantile coverage interval is superior in practice to its empirical counterpart.

2.1 Symmetric Coverage Interval

For a random variable Y with cumulative distribution function F, the λth quantile is defined as

$$F^{-1}(\lambda) = \inf\{c : F(c) \ge \lambda\}.$$

The classical 1 − α coverage interval is defined as

$$C(1-\alpha) = \left(F^{-1}(\alpha/2),\ F^{-1}(1-\alpha/2)\right).$$

Suppose that we now have a random sample y₁, ..., yₙ from the distribution F. The corresponding empirical 1 − α coverage interval is

$$C_n(1-\alpha) = \left(F_n^{-1}(\alpha/2),\ F_n^{-1}(1-\alpha/2)\right), \qquad (2.1)$$

where F_n⁻¹ is the empirical quantile function, i.e., the quantile function of the empirical distribution function

$$F_n(y) = \frac{1}{n}\sum_{i=1}^{n} I(y_i \le y).$$
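Here is a small pure-Python sketch of F_n⁻¹ and the empirical interval (2.1) that follows the inf definition directly. F_n(c) ≥ k/n holds exactly when c is at least the k-th order statistic y_(k). The infimum is therefore the smallest y_(k) with k/n ≥ λ. The function names are hypothetical. The sample is the ten-point data set used in this section's worked example.

```python
from typing import Sequence, Tuple


def empirical_quantile(sample: Sequence[float], lam: float) -> float:
    """F_n^{-1}(lam) = inf{c : F_n(c) >= lam}: the smallest order statistic y_(k) with k/n >= lam."""
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must lie in (0, 1]")
    ys = sorted(sample)
    if not ys:
        raise ValueError("sample must be non-empty")
    n = len(ys)
    for k, y in enumerate(ys, start=1):
        # Tiny tolerance guards against rounding in lam (e.g. lam = 1 - alpha/2).
        if k / n >= lam - 1e-12:
            return y
    return ys[-1]  # not reached for lam <= 1; kept for safety


def empirical_coverage_interval(sample: Sequence[float], alpha: float) -> Tuple[float, float]:
    """C_n(1 - alpha) = (F_n^{-1}(alpha/2), F_n^{-1}(1 - alpha/2)), as in (2.1)."""
    return (empirical_quantile(sample, alpha / 2),
            empirical_quantile(sample, 1 - alpha / 2))


sample = [-5, -3, -2, -1, -0.5, 0.5, 1, 3, 50, 100]
print(empirical_coverage_interval(sample, alpha=0.2))
```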

The empirical quantile is constructed from the cumulative distribution function. The symmetric quantile of Chen and Chiang (1996), by contrast, is based on a folded distribution function. Consider the folded cumulative distribution function about µ, where µ may be known or unknown:

$$F_s(a) = P(|y - \mu| \le a), \quad a \ge 0.$$

Extending Chen and Chiang (1996), we define the 1 − α symmetric coverage interval as

$$C_s(1-\alpha) = \left(\mu - F_s^{-1}(1-\alpha),\ \mu + F_s^{-1}(1-\alpha)\right),$$

where F_s⁻¹(λ) = inf{a : F_s(a) ≥ λ}. If F is continuous, the 1 − α symmetric coverage interval satisfies 1 − α = P(µ − F_s⁻¹(1 − α) ≤ y ≤ µ + F_s⁻¹(1 − α)). If we further assume that F is symmetric about µ, it can be shown that

$$C_s(1-\alpha) = C(1-\alpha), \qquad (2.2)$$

that is, the classical and symmetric coverage intervals are identical in the sense that they contain the same set of reference individuals.
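The identity (2.2) follows in a few lines. The derivation below assumes that F is continuous and symmetric about µ, so that F(µ − a) = 1 − F(µ + a). It also assumes that F is strictly increasing near the relevant quantiles, so that the generalized inverses behave as ordinary inverses.

```latex
\begin{align*}
F_s(a) &= F(\mu + a) - F(\mu - a) = 2F(\mu + a) - 1, \\
F_s(a) = 1 - \alpha &\iff F(\mu + a) = 1 - \tfrac{\alpha}{2}
  \iff a = F^{-1}\!\left(1 - \tfrac{\alpha}{2}\right) - \mu, \\
\mu + F_s^{-1}(1 - \alpha) &= F^{-1}\!\left(1 - \tfrac{\alpha}{2}\right), \\
\mu - F_s^{-1}(1 - \alpha) &= 2\mu - F^{-1}\!\left(1 - \tfrac{\alpha}{2}\right)
  = F^{-1}\!\left(\tfrac{\alpha}{2}\right) \quad \text{(symmetry about } \mu\text{)}, \\
\text{hence}\quad C_s(1 - \alpha) &= \left(F^{-1}\!\left(\tfrac{\alpha}{2}\right),\ F^{-1}\!\left(1 - \tfrac{\alpha}{2}\right)\right) = C(1 - \alpha).
\end{align*}
```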

Figure 2.1 illustrates the folded cumulative distribution function and the symmetric coverage interval.

Figure 2.1: The symmetric coverage interval. (Plot titled “Folded distribution function and symmetric coverage interval for Gamma distribution Γ(2, 2)”, over x from 0 to 10, with µ, µ ± a, and µ ± F_s⁻¹(0.8) marked.)

Consider the Gamma distribution Γ(2, 2) (shape 2, scale 2), whose probability density function (pdf) is the curve in Figure 2.1, and the folded distribution about its median. For this distribution, the median is µ ≈ 3.36. For a given a > 0, the value of the folded distribution function at a is the probability of the shaded region between µ − a and µ + a. Suppose we want the 80% coverage interval. For this continuous distribution, we search for a* = F_s⁻¹(0.8) such that 0.8 = P(µ − a* ≤ y ≤ µ + a*) with y ∼ Γ(2, 2), which gives a* ≈ 2.92.

Hence the 80% symmetric coverage interval is

$$C_s(0.8) = \left(\mu - F_s^{-1}(0.8),\ \mu + F_s^{-1}(0.8)\right) \approx (0.44,\ 6.28).$$
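The values of µ and F_s⁻¹(0.8) for this example can be computed numerically. The sketch assumes that Γ(2, 2) means shape 2 and scale 2, the parameterization whose median is 3.36. It uses the closed-form CDF, which is valid only for shape 2, and finds both roots by plain bisection rather than a library root finder. The helper names are hypothetical. Running it is a useful check on the numbers reported for this example.

```python
import math


def gamma2_cdf(x: float, scale: float = 2.0) -> float:
    """CDF of the Gamma distribution with shape 2: F(x) = 1 - (1 + x/scale) * exp(-x/scale)."""
    if x <= 0:
        return 0.0
    t = x / scale
    return 1.0 - (1.0 + t) * math.exp(-t)


def bisect(f, lo: float, hi: float, tol: float = 1e-10) -> float:
    """Root of a nondecreasing f on [lo, hi] by bisection (assumes f(lo) < 0 <= f(hi))."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def folded_cdf(a: float, mu: float) -> float:
    """F_s(a) = P(mu - a <= y <= mu + a) for y ~ Gamma(shape 2, scale 2)."""
    return gamma2_cdf(mu + a) - gamma2_cdf(mu - a)


mu = bisect(lambda x: gamma2_cdf(x) - 0.5, 0.0, 50.0)          # median of the distribution
a_star = bisect(lambda a: folded_cdf(a, mu) - 0.8, 0.0, 50.0)  # F_s^{-1}(0.8)
print(f"median = {mu:.2f}, a* = {a_star:.2f}, "
      f"interval = ({mu - a_star:.2f}, {mu + a_star:.2f})")
```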

Let µ̂ be an estimate of µ. We define the sample 1 − α symmetric coverage interval as

$$C_{sn}(1-\alpha) = \left(\hat{\mu} - F_{sn}^{-1}(1-\alpha),\ \hat{\mu} + F_{sn}^{-1}(1-\alpha)\right), \qquad (2.3)$$

where

$$F_{sn}(a) = \frac{1}{n}\sum_{i=1}^{n} I(|y_i - \hat{\mu}| \le a)$$

is the sample folded cumulative distribution function and F_sn⁻¹(1 − α) = inf{a : F_sn(a) ≥ 1 − α}.

The following simple example illustrates the construction of the sample symmetric coverage interval. Suppose that we have the following ordered observations:

−5, −3, −2, −1, −0.5, 0.5, 1, 3, 50, 100.

We want to construct 80% empirical and symmetric coverage intervals. With F_n⁻¹(0.1) = −5 and F_n⁻¹(0.9) = 50, the 80% empirical coverage interval is

$$C_n(0.8) = (-5,\ 50). \qquad (2.4)$$

To construct the symmetric coverage interval, we use the sample median as the estimate of µ; that is,

$$\hat{\mu} = F_n^{-1}(0.5) = \inf\Big\{a : \frac{1}{10}\sum_{i=1}^{10} I(y_i \le a) \ge 0.5\Big\} = -0.5.$$

Denote the residuals by e_i = y_i − µ̂, i = 1, ..., 10. The residuals are

−4.5, −2.5, −1.5, −0.5, 0, 1, 1.5, 3.5, 50.5, 100.5.

The sample folded cumulative distribution function is

$$F_{sn}(a) = \frac{1}{10}\sum_{i=1}^{10} I(|e_i| \le a).$$

For example, F_sn(0) = 1/10 and F_sn(1) = (1/10)[I(|−0.5| ≤ 1) + I(|0| ≤ 1) + I(|1| ≤ 1)] = 3/10. Then we have

$$F_{sn}^{-1}(0.8) = \inf\Big\{a : \frac{1}{10}\sum_{i=1}^{10} I(|e_i| \le a) \ge 0.8\Big\} = 4.5.$$

This indicates that the 80% symmetric coverage interval is

$$\begin{aligned}
C_{sn}(0.8) &= \left(\hat{\mu} - F_{sn}^{-1}(0.8),\ \hat{\mu} + F_{sn}^{-1}(0.8)\right) \\
&= (-0.5 - 4.5,\ -0.5 + 4.5) \\
&= (-5,\ 4).
\end{aligned} \qquad (2.5)$$

Comparing the sample empirical and symmetric coverage intervals in (2.4) and (2.5) shows the benefit of the latter: it is much shorter (length 9 versus 55). This often happens when the observations are drawn from asymmetric distributions.
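The worked example can be reproduced with a short pure-Python sketch. It assumes, as in the example, that µ̂ is the sample median F_n⁻¹(0.5). The same order-statistic rule yields both F_n⁻¹ and F_sn⁻¹, because each is a quantile of an empirical distribution: of the y_i and of the |y_i − µ̂|, respectively. The helper names are hypothetical.

```python
from typing import Sequence, Tuple


def order_quantile(values: Sequence[float], lam: float) -> float:
    """inf{c : (1/n) #{v_i <= c} >= lam}: the smallest order statistic v_(k) with k/n >= lam."""
    vs = sorted(values)
    n = len(vs)
    for k, v in enumerate(vs, start=1):
        if k / n >= lam - 1e-12:  # tiny tolerance for rounding in lam
            return v
    return vs[-1]


def symmetric_coverage_interval(sample: Sequence[float], alpha: float) -> Tuple[float, float]:
    """Sample 1 - alpha symmetric coverage interval (2.3), with mu-hat = sample median."""
    mu_hat = order_quantile(sample, 0.5)              # F_n^{-1}(0.5)
    abs_res = [abs(y - mu_hat) for y in sample]       # |e_i| = |y_i - mu_hat|
    half_width = order_quantile(abs_res, 1 - alpha)   # F_sn^{-1}(1 - alpha)
    return mu_hat - half_width, mu_hat + half_width


def empirical_coverage_interval(sample: Sequence[float], alpha: float) -> Tuple[float, float]:
    """Empirical interval (2.1), for comparison."""
    return order_quantile(sample, alpha / 2), order_quantile(sample, 1 - alpha / 2)


data = [-5, -3, -2, -1, -0.5, 0.5, 1, 3, 50, 100]
emp = empirical_coverage_interval(data, 0.2)   # (-5, 50), interval (2.4)
sym = symmetric_coverage_interval(data, 0.2)   # (-5.0, 4.0), interval (2.5)
print("empirical:", emp, "length", emp[1] - emp[0])
print("symmetric:", sym, "length", sym[1] - sym[0])
```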
