4.3 A Real Data Application
5.3.1 Phase I Applications
First, we consider simulated profiles that imitate the dissolving process of aspartame at different temperature levels, an example of profiles first described in Kang and Albin (2000). Instead of generating profiles from a multivariate normal distribution as we did in Section 3.3.1, the IC aspartame profiles are generated directly from equation (3.11). That is,
y = I + M exp{N(x − 1)²} + ε,
where the random components I, M, N, and ε are independently generated from normal distributions. The parameter settings of the random components are the same as before, i.e., (µI, σI) = (1, 0.2), (µM, σM) = (15, 1), (µN, σN) = (−1.5, 0.3), and σε = 0.3. Let p = 20 and let the design points x = (x1, . . . , xp)′ be equally spaced from 0.64 to 3.68. Choosing h = 0.48 for the local linear smoother, Figure 5.1 shows 200 generated aspartame profiles and the corresponding smoothed estimates. Note that the pattern of the generated aspartame profiles is quite similar to that of the profiles generated from the multivariate normal distribution in Section 3.3.1. The OC profiles considered are generated from the following models:
(a) I ∼ N(µI + δσI, σI²);
(b) M ∼ N(µM + δσI, σM²);
(c) I ∼ N(µI, (δ∗σI)²);
(d) M ∼ N(µM, (δ∗σM)²);
(e) y ∼ δ(y − µ) + µ.
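The generating scheme and the five OC models above can be sketched as follows. This is a minimal illustration, not the code used for the reported simulations. The helper names `ic_mean` and `draw_profile` are hypothetical. Model (b) uses δσI exactly as printed above. For Model (e), the sketch assumes that µ is the exact IC mean curve, obtained from the normal moment generating function E[exp(Nt)] = exp(µN t + σN² t²/2).

```python
import math
import random

# Parameter values quoted in the text for the IC aspartame model.
MU_I, SD_I = 1.0, 0.2
MU_M, SD_M = 15.0, 1.0
MU_N, SD_N = -1.5, 0.3
SD_EPS = 0.3
P = 20
# p = 20 equally spaced design points from 0.64 to 3.68.
X = [0.64 + (3.68 - 0.64) * j / (P - 1) for j in range(P)]


def ic_mean(x):
    """IC mean E[y(x)], using E[exp(N t)] = exp(mu_N t + (sd_N t)^2 / 2)."""
    t = (x - 1.0) ** 2
    return MU_I + MU_M * math.exp(MU_N * t + 0.5 * (SD_N * t) ** 2)


def draw_profile(rng, model=None, delta=0.0):
    """Draw one profile; model=None gives an IC profile (hypothetical helper)."""
    mu_i, sd_i, mu_m, sd_m = MU_I, SD_I, MU_M, SD_M
    if model == "a":      # mean shift in I
        mu_i += delta * SD_I
    elif model == "b":    # mean shift in M, written with sigma_I as printed
        mu_m += delta * SD_I
    elif model == "c":    # scale change in I (delta plays the role of delta*)
        sd_i = delta * SD_I
    elif model == "d":    # scale change in M (delta plays the role of delta*)
        sd_m = delta * SD_M
    i = rng.gauss(mu_i, sd_i)
    m = rng.gauss(mu_m, sd_m)
    n = rng.gauss(MU_N, SD_N)
    y = [i + m * math.exp(n * (x - 1.0) ** 2) + rng.gauss(0.0, SD_EPS) for x in X]
    if model == "e":      # dilate or shrink deviations from the IC mean curve
        y = [delta * (yj - ic_mean(x)) + ic_mean(x) for yj, x in zip(y, X)]
    return y


if __name__ == "__main__":
    rng = random.Random(2024)
    ic_profiles = [draw_profile(rng) for _ in range(200)]
    oc_profile = draw_profile(rng, model="a", delta=1.8)
    print(len(ic_profiles), len(oc_profile))
```

Passing an explicit `random.Random` instance keeps each replication reproducible, which is convenient when error rates are averaged over many replications.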
The first four OC models simulate situations in which one parameter of the random components is shifted by an amount quantified by δ or δ∗, while Model (e) considers the case in which the variance-covariance matrix of the profile data y is multiplied by a constant and the mean vector stays unchanged. Since no distribution-free methodology for profile monitoring exists in the literature, we consider a method developed under a parametric model for comparison. In this
Figure 5.1: (a) 200 generated aspartame profiles and (b) the corresponding smoothing estimates.
the profiles are the same, thus we can simply treat the profiles as multivariate data.
Here, the Hotelling’s T² control chart introduced in Section 4.2 is also considered.
The performance of the control charts is again measured by the type-I and type-II error rates introduced in Section 3.1.3. The error rates reported in what follows are averages over 1,000 replications.
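As a reminder of how such rates can be computed from one simulated data set, the sketch below assumes the usual Phase I definitions: the type-I error rate is the fraction of IC items that signal, and the type-II error rate is the fraction of OC items that do not signal. The exact definitions are those of Section 3.1.3. The function names are hypothetical.

```python
import math


def error_rates(signals, is_oc):
    """Return (type-I rate, type-II rate) for one simulated data set.

    signals[k] is True when item k triggers an alarm; is_oc[k] is True
    when item k was generated from an OC model.
    """
    ic_flags = [s for s, oc in zip(signals, is_oc) if not oc]
    oc_flags = [s for s, oc in zip(signals, is_oc) if oc]
    p1 = sum(ic_flags) / len(ic_flags)
    # With no OC items the type-II rate is undefined ("-" in the tables).
    p2 = (len(oc_flags) - sum(oc_flags)) / len(oc_flags) if oc_flags else None
    return p1, p2


def mean_and_se(values):
    """Average over replications and the standard error of that average."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(var / n)


if __name__ == "__main__":
    signals = [True, False, False, False, True, False]
    is_oc = [False, False, False, False, True, True]
    print(error_rates(signals, is_oc))
```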
In Phase I applications, the reference sample usually contains both IC and OC observations. To simulate the historical data set in Phase I, 500 subgroups of profiles with subgroup size 10 (i.e., m = 500 and n = 10) are generated, of which 450 are IC and 50 are OC. We first compare the type-I error rates of the SMSS chart under the conventional procedure and the OAAT procedure for each of the five OC conditions. With α = 0.05, the control limit of the SMSS chart is 5.8551 (from Table A.4 in Appendix A.2 with n = 10 and p = 2). The shift sizes are δ = 0(0.6)3 in Models (a) and (b) and δ∗ = 1(0.4)3 in Models (c) and (d). We choose the number of effective PCs K = 3 for demonstration. The results are summarized in Table 5.1. The table shows that, for all the OC models, the type-I error rate increases with the shift size when the conventional procedure is used. In contrast, for the
Table 5.1: The type-I error rates of the SMSS chart by using the conventional and OAAT detecting procedures.
                Conventional                           OAAT
δ      δ∗      (a)      (b)      (c)      (d)      (a)      (b)      (c)      (d)
0.6    1.4     0.0505   0.0508   0.0507   0.0511   0.0491   0.0494   0.0492   0.0495
1.2    1.8     0.0514   0.0520   0.0524   0.0527   0.0500   0.0504   0.0504   0.0507
1.8    2.2     0.0529   0.0531   0.0571   0.0544   0.0507   0.0505   0.0506   0.0508
2.4    2.6     0.0569   0.0540   0.0649   0.0557   0.0499   0.0500   0.0493   0.0503
3.0    3.0     0.0638   0.0554   0.0695   0.0580   0.0492   0.0500   0.0492   0.0500
OAAT procedure, the error rate is robust to the shift size and stays close to 0.05. To keep the type-I error rate at the nominal level, we adopt the OAAT procedure in what follows.
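The shorthand a(h)b used for the shift sizes denotes the values from a to b in steps of h. The small sketch below, with a hypothetical helper `grid`, expands δ = 0(0.6)3 and δ∗ = 1(0.4)3; the nonzero shifts match the row labels of Table 5.1.

```python
def grid(start, step, stop):
    """Expand the shorthand start(step)stop into an explicit list of values."""
    count = int(round((stop - start) / step)) + 1
    # Rounding removes floating-point noise such as 1.7999999999999998.
    return [round(start + k * step, 10) for k in range(count)]


if __name__ == "__main__":
    print(grid(0.0, 0.6, 3.0))   # shift sizes for Models (a) and (b)
    print(grid(1.0, 0.4, 3.0))   # scale factors for Models (c) and (d)
```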
We next compare the performance of the SMSS chart with that of the Hotelling’s T² chart. It is well known that the type-I error rate is hard to control at the prespecified value when a parametric control chart is applied to a process whose distribution differs from the assumed one. Therefore, for a fair comparison, we adjust the control limit of the Hotelling’s T² chart to achieve the specified type-I error rate, α = 0.05 in this case. Tables 5.2 to 5.6 report the simulation results, including the type-I and type-II error rates and their standard errors, for OC Models (a) to (e), respectively. The columns labeled “T²(mod)” give the results of the Hotelling’s T² chart after modifying the control limit to achieve the nominal false-alarm rate α = 0.05. The columns labeled “S” give the results of the SMSS chart, and the number in parentheses is the number of effective PCs used in the T0² statistic. The type-II error rates are unavailable and labeled “-”
in the tables when there is no shift (δ = 0 in Models (a) and (b), δ∗ = 1 in Models (c) and (d), and δ = 1 in Model (e)), since there are no OC profiles in the data.
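One simple way to modify a control limit so that the empirical false-alarm rate matches α is to take an upper empirical quantile of the charting statistics of IC data. The sketch below illustrates that idea only; it is an assumption about the calibration step, not a description of the procedure actually used for the “T²(mod)” columns, and the function names are hypothetical.

```python
def calibrated_limit(ic_stats, alpha):
    """Limit such that at most a fraction alpha of the IC statistics exceed it."""
    ordered = sorted(ic_stats, reverse=True)
    # Number of exceedances allowed; the small constant guards against
    # floating-point error in alpha * n.
    allowed = int(alpha * len(ordered) + 1e-9)
    return ordered[allowed]


def false_alarm_rate(ic_stats, limit):
    """Fraction of IC statistics strictly above the limit."""
    return sum(s > limit for s in ic_stats) / len(ic_stats)


if __name__ == "__main__":
    stats = list(range(1, 101))          # stand-in for simulated IC statistics
    limit = calibrated_limit(stats, 0.05)
    print(limit, false_alarm_rate(stats, limit))
```

In a simulation study, `ic_stats` would be the statistics of IC profiles pooled over replications, so the calibrated limit reflects the actual, possibly non-normal, distribution.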
The results for Models (a) and (b) (Tables 5.2 and 5.3) are similar since both models involve a mean shift. The type-I error rates of the Hotelling’s T² chart stay around 0.053, slightly exceeding the nominal value 0.05. This is
Table 5.2: The type-I and type-II error rates and their standard errors (in parentheses) of OC Model (a) for α = 0.05
        pI                                           pII
δ       T²       S(2)     S(3)     S(4)     S(5)     T²(mod)  S(2)     S(3)     S(4)     S(5)
0.0     0.053    0.049    0.049    0.049    0.049    -        -        -        -        -
        (.0003)  (.0003)  (.0003)  (.0003)  (.0003)  -        -        -        -        -
0.6     0.0530   0.0489   0.0491   0.0493   0.0491   0.9074   0.9505   0.9499   0.9494   0.9505
        (.0003)  (.0003)  (.0003)  (.0003)  (.0003)  (.0014)  (.0010)  (.0009)  (.0010)  (.0010)
1.2     0.0523   0.0497   0.0500   0.0502   0.0497   0.6881   0.9479   0.9239   0.9299   0.9338
        (.0003)  (.0003)  (.0003)  (.0003)  (.0003)  (.0024)  (.0010)  (.0012)  (.0011)  (.0012)
1.8     0.0529   0.0494   0.0507   0.0504   0.0502   0.2035   0.9312   0.7972   0.8324   0.8604
        (.0003)  (.0003)  (.0003)  (.0003)  (.0003)  (.0022)  (.0012)  (.0021)  (.0019)  (.0017)
2.4     0.0529   0.0497   0.0499   0.0505   0.0502   0.0113   0.8969   0.4186   0.5429   0.6361
        (.0003)  (.0003)  (.0003)  (.0003)  (.0003)  (.0005)  (.0015)  (.0033)  (.0034)  (.0031)
3.0     0.0532   0.0499   0.0492   0.0492   0.0496   0.0001   0.8372   0.0574   0.1047   0.1754
        (.0003)  (.0003)  (.0003)  (.0003)  (.0003)  (.0001)  (.0020)  (.0013)  (.0019)  (.0027)
the effect of violating the normality assumption; however, the deviation is fairly small. On the other hand, the type-I error rates of the SMSS chart are well controlled around 0.05 regardless of the size of the mean shift. To obtain a fair comparison of the type-II error rates, we adjust the control limit of the T² chart so that its type-I error rate is 0.05. The type-II error rates of the modified Hotelling’s T² chart are slightly larger than those of the regular (unmodified) T² chart, but the modified T² chart still outperforms the SMSS chart. This is because the T² statistics of the generated profiles do not depart much from the distribution they would follow if the normality assumption held. As is well known, a parametric method is usually more efficient than a nonparametric method when the assumed parametric model is correct. Thus the Hotelling’s T² chart performs better than the SMSS chart in these models.
Table 5.3: The type-I and type-II error rates and their standard errors (in parentheses) of OC Model (b) for α = 0.05
        pI                                           pII
δ       T²       S(2)     S(3)     S(4)     S(5)     T²(mod)  S(2)     S(3)     S(4)     S(5)
0.0     0.053    0.049    0.049    0.049    0.049    -        -        -        -        -
        (.0003)  (.0003)  (.0003)  (.0003)  (.0003)  -        -        -        -        -
0.6     0.0533   0.0491   0.0492   0.0494   0.0493   0.8877   0.9414   0.9446   0.9461   0.9465
        (.0003)  (.0003)  (.0003)  (.0003)  (.0003)  (.0015)  (.0011)  (.0010)  (.0010)  (.0010)
1.2     0.0525   0.0506   0.0504   0.0505   0.0500   0.5343   0.8179   0.8570   0.8794   0.8950
        (.0003)  (.0003)  (.0003)  (.0003)  (.0003)  (.0027)  (.0020)  (.0016)  (.0015)  (.0015)
1.8     0.0531   0.0502   0.0506   0.0506   0.0506   0.0538   0.3392   0.4832   0.5917   0.6721
        (.0003)  (.0003)  (.0003)  (.0003)  (.0003)  (.0011)  (.0028)  (.0030)  (.0030)  (.0027)
2.4     0.0529   0.0493   0.0493   0.0499   0.0497   0.0005   0.0272   0.0583   0.1071   0.1718
        (.0003)  (.0003)  (.0003)  (.0003)  (.0003)  (.0001)  (.0008)  (.0012)  (.0018)  (.0025)
3.0     0.0532   0.0490   0.0492   0.0490   0.0492   0.0000   0.0006   0.0024   0.0052   0.0101
        (.0003)  (.0003)  (.0003)  (.0003)  (.0003)  (.0000)  (.0001)  (.0002)  (.0003)  (.0005)
The SMSS chart’s ability to detect outliers comes from changes in the directions of the spatial signs of the T0² and T1² statistics. For example, a shift in the mean of I corresponds to a vertical shift in the profiles, which enlarges both the T0² and T1² statistics; thus, the corresponding spatial sign vectors cluster around the upper side of the unit circle (see Figure 5.2(a)). On the other hand, a change in the mean of M mainly affects the T0² statistic, hence the corresponding spatial sign vectors lie closer to the right side of the unit circle (see Figure 5.2(b)).
The clustering of the spatial sign vectors enlarges the value of the Q statistic, which is what allows the SMSS chart to detect outliers.
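The clustering argument can be illustrated numerically. The sketch below maps two-dimensional vectors to their spatial signs (unit vectors) and measures how strongly the signs of a subgroup point in a common direction. It assumes that the input vectors have already been centered and standardized. The mean resultant length used here is only a stand-in for illustration and is not the Q statistic of the SMSS chart; the function names are hypothetical.

```python
import math


def spatial_sign(v):
    """Project a 2-d vector onto the unit circle; the zero vector maps to itself."""
    norm = math.hypot(v[0], v[1])
    if norm == 0.0:
        return (0.0, 0.0)
    return (v[0] / norm, v[1] / norm)


def mean_resultant_length(vectors):
    """Length of the average spatial sign: near 1 if clustered, near 0 if spread out."""
    signs = [spatial_sign(v) for v in vectors]
    sx = sum(s[0] for s in signs) / len(signs)
    sy = sum(s[1] for s in signs) / len(signs)
    return math.hypot(sx, sy)


if __name__ == "__main__":
    spread = [(1.0, 0.0), (-1.0, 0.0), (0.0, 2.0), (0.0, -3.0)]
    clustered = [(0.1, 2.0), (-0.1, 3.0), (0.0, 5.0)]   # all point "upward"
    print(mean_resultant_length(spread), mean_resultant_length(clustered))
```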
Both Models (c) and (d) change the variance-covariance structure of the process, so their results, summarized in Tables 5.4 and 5.5, are analogous.
The type-I error rates of both the regular and modified Hotelling’s T² charts decrease in δ∗. Since the shift in the variance of the random component I or M
Table 5.4: The type-I and type-II error rates and their standard errors (in parentheses) of OC Model (c) for α = 0.05
        pI                                           pII
δ∗      T²       S(2)     S(3)     S(4)     S(5)     T²(mod)  S(2)     S(3)     S(4)     S(5)
1.0     0.053    0.049    0.049    0.049    0.049    -        -        -        -        -
        (.0003)  (.0003)  (.0003)  (.0003)  (.0003)  -        -        -        -        -
1.4     0.0521   0.0488   0.0494   0.0493   0.0493   0.9406   0.9463   0.9310   0.9346   0.9377
        (.0003)  (.0003)  (.0003)  (.0003)  (.0003)  (.0011)  (.0010)  (.0011)  (.0011)  (.0011)
1.8     0.0501   0.0498   0.0504   0.0503   0.0498   0.9217   0.9379   0.8729   0.8898   0.9008
        (.0003)  (.0003)  (.0003)  (.0003)  (.0003)  (.0012)  (.0011)  (.0015)  (.0014)  (.0013)
2.2     0.0494   0.0495   0.0505   0.0505   0.0503   0.8993   0.9159   0.7859   0.8164   0.8354
        (.0003)  (.0003)  (.0003)  (.0003)  (.0003)  (.0014)  (.0013)  (.0019)  (.0018)  (.0017)
2.6     0.0482   0.0498   0.0500   0.0505   0.0500   0.8744   0.8864   0.6857   0.7265   0.7612
        (.0003)  (.0003)  (.0003)  (.0003)  (.0003)  (.0015)  (.0015)  (.0024)  (.0022)  (.0022)
3.0     0.0474   0.0497   0.0500   0.0499   0.0503   0.8523   0.8598   0.5861   0.6352   0.6784
        (.0003)  (.0003)  (.0003)  (.0003)  (.0003)  (.0017)  (.0017)  (.0023)  (.0023)  (.0023)
leads to a change in the variance-covariance structure of the profiles (see equation (3.13)), it also changes the estimates of the eigenvectors. As a result, the type-I error rates cannot be controlled at a specified level. Moreover, when the reference sample is contaminated by outliers with large dispersion, the scatter matrix is “inflated” when estimated by the sample variance-covariance matrix. The more severe the outlying condition, the smaller the T² statistics of most IC profiles become, so fewer of them signal an alarm and the type-I error rate decreases. In contrast, the type-I error rate of the SMSS chart remains robust to the magnitude of the shift in the variance of I or M. On the other hand, the type-II error rates indicate that the SMSS chart is more powerful than the T² chart in detecting outliers. Since the variation in I or M is captured by the first few PCs, a change in the variance leads to a change in the T0² statistic, and hence the directions of the corresponding spatial sign vectors are more concentrated to
Table 5.5: The type-I and type-II error rates and their standard errors (in parentheses) of OC Model (d) for α = 0.05
        pI                                           pII
δ∗      T²       S(2)     S(3)     S(4)     S(5)     T²(mod)  S(2)     S(3)     S(4)     S(5)
1.0     0.053    0.049    0.049    0.049    0.049    -        -        -        -        -
        (.0003)  (.0003)  (.0003)  (.0003)  (.0003)  -        -        -        -        -
1.4     0.0545   0.0494   0.0495   0.0495   0.0494   0.9360   0.9038   0.9153   0.9228   0.9266
        (.0003)  (.0003)  (.0003)  (.0003)  (.0003)  (.0011)  (.0014)  (.0012)  (.0012)  (.0012)
1.8     0.0519   0.0504   0.0507   0.0504   0.0502   0.9095   0.7919   0.8304   0.8530   0.8681
        (.0003)  (.0003)  (.0003)  (.0003)  (.0003)  (.0013)  (.0019)  (.0017)  (.0016)  (.0015)
2.2     0.0503   0.0505   0.0508   0.0507   0.0504   0.8799   0.6483   0.7095   0.7425   0.7744
        (.0003)  (.0003)  (.0003)  (.0003)  (.0003)  (.0015)  (.0023)  (.0022)  (.0021)  (.0020)
2.6     0.0480   0.0499   0.0503   0.0506   0.0502   0.8487   0.5199   0.5884   0.6391   0.6777
        (.0003)  (.0003)  (.0003)  (.0003)  (.0003)  (.0017)  (.0025)  (.0025)  (.0025)  (.0023)
3.0     0.0468   0.0497   0.0500   0.0498   0.0505   0.8211   0.3995   0.4723   0.5305   0.5743
        (.0003)  (.0003)  (.0003)  (.0003)  (.0003)  (.0020)  (.0026)  (.0025)  (.0026)  (.0026)
the right (see Figure 5.2(c) and (d)). However, the clustering patterns under Models (c) and (d) are less pronounced than those under Models (a) and (b), so the power is lower than in Models (a) and (b).
Consider OC Model (e), in which the variance at each design point dilates or shrinks while the covariance structure is unchanged. This is equivalent to changing the eigenvalues of the variance-covariance matrix without affecting the corresponding eigenvectors. The results are summarized in Table 5.6. When δ > 1, which indicates dilation of the process dispersion, the type-I error rates of the Hotelling’s T² chart stay roughly at a certain level but are not very stable. When the dispersion of the outliers shrinks, the type-I error rate of the T² chart increases with the level of shrinkage. This is because the smaller dispersion of the outliers makes their T² statistics smaller and hence harder to detect, whereas IC profiles have relatively larger T²
Table 5.6: The type-I and type-II error rates and their standard errors (in parentheses) of OC Model (e) for α = 0.05
        pI                                           pII
δ       T²       S(2)     S(3)     S(4)     S(5)     T²(mod)  S(2)     S(3)     S(4)     S(5)
0.500   0.0953   0.0493   0.0491   0.0495   0.0495   1.0000   0.0000   0.0000   0.0000   0.0000
        (.0005)  (.0003)  (.0003)  (.0003)  (.0003)  (.0000)  (.0000)  (.0000)  (.0000)  (.0000)
0.625   0.0854   0.0494   0.0497   0.0496   0.0495   0.9998   0.0000   0.0000   0.0000   0.0000
        (.0004)  (.0003)  (.0003)  (.0003)  (.0003)  (.0001)  (.0000)  (.0000)  (.0000)  (.0000)
0.750   0.0739   0.0497   0.0497   0.0497   0.0493   0.9993   0.0057   0.0055   0.0057   0.0055
        (.0004)  (.0003)  (.0003)  (.0003)  (.0003)  (.0001)  (.0003)  (.0003)  (.0003)  (.0003)
0.875   0.0634   0.0501   0.0503   0.0501   0.0504   0.9938   0.5399   0.5404   0.5392   0.5397
        (.0004)  (.0003)  (.0003)  (.0003)  (.0003)  (.0004)  (.0025)  (.0025)  (.0025)  (.0025)
1.000   0.053    0.049    0.049    0.049    0.049    -        -        -        -        -
        (.0003)  (.0003)  (.0003)  (.0003)  (.0003)  -        -        -        -        -
1.143   0.0446   0.0498   0.0502   0.0502   0.0501   0.7982   0.6039   0.6021   0.6008   0.5999
        (.0003)  (.0003)  (.0003)  (.0003)  (.0003)  (.0019)  (.0024)  (.0024)  (.0025)  (.0024)
1.333   0.0407   0.0496   0.0498   0.0498   0.0494   0.4595   0.0362   0.0355   0.0356   0.0350
        (.0003)  (.0003)  (.0003)  (.0003)  (.0003)  (.0025)  (.0008)  (.0008)  (.0008)  (.0008)
1.600   0.0466   0.0494   0.0497   0.0496   0.0495   0.1128   0.0001   0.0001   0.0002   0.0002
        (.0003)  (.0003)  (.0003)  (.0003)  (.0003)  (.0015)  (.0000)  (.0001)  (.0001)  (.0001)
2.000   0.0519   0.0493   0.0491   0.0495   0.0493   0.0090   0.0000   0.0000   0.0000   0.0000
        (.0003)  (.0003)  (.0003)  (.0003)  (.0003)  (.0004)  (.0000)  (.0000)  (.0000)  (.0000)
statistics and hence are more easily misclassified. On the other hand, our proposed SMSS chart is quite robust to the change in dispersion in terms of the type-I error rate. For the type-II error rate, the Hotelling’s T² chart is able to detect outliers only in the dilation case and is almost useless in the shrinkage case. In contrast, the SMSS chart is very powerful in detecting outliers whether the dispersion dilates or shrinks. Since both dilation and shrinkage of the dispersion change the values of the T0² and T1² statistics, especially the T1² statistic, the directions of the corresponding spatial sign vectors change and tend to concentrate
Figure 5.2: The scatter plots of the spatial sign vectors of the profiles generated from OC Models (a) - (e).
to the upper side of the unit circle for δ > 1 (see Figure 5.2(e)), or the lower side for δ < 1 (see Figure 5.2(f)). The more concentrated pattern of the spatial signs of the vectors (T0², T1²)′ within a subgroup leads to a larger value of the Q statistic in the SMSS chart. It should be noted that, although the Hotelling’s T² chart was originally developed for detecting mean changes, it has some power for detecting changes in process dispersion because contamination of the reference sample in dispersion also affects the estimation of the variance-covariance matrix. Nevertheless, we include the Hotelling’s T² chart in the comparison simply because no nonparametric monitoring scheme for the variance-covariance matrix of profiles or multivariate data exists so far.
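The claims about Model (e) can be checked with a short derivation. The notation below is introduced only for this illustration: Σ denotes the IC variance-covariance matrix of y, (λ_k, v_k) its eigenvalue and eigenvector pairs, and ỹ the transformed profile. The last line treats µ and Σ as known, which is an idealization, since in Phase I both are estimated from contaminated data.

```latex
\begin{aligned}
\tilde{y} &= \delta\,(y - \mu) + \mu, \\
E(\tilde{y}) &= \mu, \qquad \operatorname{Cov}(\tilde{y}) = \delta^{2}\,\Sigma, \\
\Sigma v_k = \lambda_k v_k \;&\Longrightarrow\; (\delta^{2}\Sigma)\,v_k = (\delta^{2}\lambda_k)\,v_k, \\
(\tilde{y} - \mu)'\,\Sigma^{-1}(\tilde{y} - \mu) &= \delta^{2}\,(y - \mu)'\,\Sigma^{-1}(y - \mu).
\end{aligned}
```

The third line shows that the eigenvectors are unchanged while every eigenvalue is multiplied by δ². The fourth line shows that a T²-type distance computed with the IC parameters is multiplied by δ², which is consistent with outliers being harder to detect when δ < 1.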
Table 5.7: The type-I and type-II error rates and their standard errors (in parentheses) of OC Model (5.11) for α = 0.05
        pI                                           pII
δ       T²       S(2)     S(3)     S(4)     S(5)     T²(mod)  S(2)     S(3)     S(4)     S(5)
0.0     0.248    0.050    0.050    0.050    0.050    -        -        -        -        -
        (.0007)  (.0003)  (.0003)  (.0003)  (.0003)  -        -        -        -        -
0.6     0.2511   0.0505   0.0503   0.0503   0.0496   0.9357   0.9170   0.9236   0.9164   0.9235
        (.0008)  (.0003)  (.0003)  (.0003)  (.0003)  (.0011)  (.0013)  (.0013)  (.0013)  (.0012)
1.2     0.2473   0.0510   0.0503   0.0501   0.0505   0.8603   0.4521   0.5286   0.3869   0.4763
        (.0008)  (.0003)  (.0003)  (.0003)  (.0003)  (.0017)  (.0052)  (.0040)  (.0034)  (.0034)
1.8     0.2469   0.0495   0.0499   0.0493   0.0492   0.2845   0.0484   0.0374   0.0119   0.0188
        (.0007)  (.0003)  (.0003)  (.0003)  (.0003)  (.0058)  (.0037)  (.0016)  (.0005)  (.0008)
2.4     0.2465   0.0498   0.0494   0.0489   0.0489   0.0010   0.0017   0.0013   0.0010   0.0003
        (.0007)  (.0003)  (.0003)  (.0003)  (.0003)  (.0001)  (.0003)  (.0007)  (.0008)  (.0001)
3.0     0.2457   0.0488   0.0494   0.0495   0.0495   0.0000   0.0000   0.0000   0.0000   0.0000
        (.0007)  (.0003)  (.0003)  (.0003)  (.0003)  (.0000)  (.0000)  (.0000)  (.0000)  (.0000)
Analogous to the CS and CE charts introduced in Section 3.1.2, choosing the number of effective PCs K is also an issue in implementing the SMSS chart. According to the tables of total variation explained (in Appendix A.5), one may choose K = 3 for all five OC models. However, in terms of the type-II error rate, the best choice of K is 2 for OC Models (b) and (d). Therefore, we suggest trying several candidate values of K guided by the percentage of total variation explained. Note that the performance is not necessarily better when a larger K is chosen. If an unnecessary PC is included in computing the T0² statistic, the weights of the effective PCs are diluted and hence the power of the SMSS chart degrades. Recall that a similar phenomenon was observed for the CS and CE charts and discussed in Section 3.3.2.
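The suggestion to try several candidates of K can be sketched as follows. The eigenvalues in the example are made up for illustration and are not taken from Appendix A.5. The function names and the two thresholds are likewise hypothetical.

```python
def explained_proportions(eigenvalues):
    """Cumulative proportion of total variation explained by the leading PCs."""
    total = sum(eigenvalues)
    running, out = 0.0, []
    for lam in sorted(eigenvalues, reverse=True):
        running += lam
        out.append(running / total)
    return out


def smallest_k(eigenvalues, threshold):
    """Smallest K whose cumulative proportion reaches the threshold."""
    for k, prop in enumerate(explained_proportions(eigenvalues), start=1):
        if prop >= threshold:
            return k
    return len(eigenvalues)


def candidate_ks(eigenvalues, low=0.85, high=0.97):
    """Range of K values worth comparing, from the low to the high threshold."""
    return list(range(smallest_k(eigenvalues, low), smallest_k(eigenvalues, high) + 1))


if __name__ == "__main__":
    eig = [6.0, 3.0, 0.5, 0.3, 0.2]   # illustrative eigenvalues only
    print(explained_proportions(eig))
    print(candidate_ks(eig))
```

Each candidate K would then be compared through the resulting type-II error rates, as in Tables 5.2 to 5.6, rather than chosen from the explained variation alone.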
To emphasize the ability to detect outliers with mean shifts, we next consider profiles whose distribution departs further from the multivariate normal distribution.
Consider IC profiles of the form
yij = g0(xj) + fi(xj) + εij, i = 1, . . . , m, j = 1, . . . , p, (5.11)
where {x1, . . . , xp} is the set of design points, equally spaced from 0.025 to 0.975 with p = 20, g0(·) is the fixed effect of the profile, fi(·) is assumed to follow the multivariate t distribution tp(ν) described in Section 4.2 with ν = 3 degrees of freedom (the variance-covariance matrix used is Σij = 0.5^|i−j|), and {εi1, . . . , εip} is i.i.d. random noise from N(0, σε²) with σε = 0.3. This model is used to simulate profiles with a heavy-tailed distribution. Assume that g0(x) = 0 and that the OC profiles are generated by shifting the fixed effect to g1(x) = δ sin(2π(x − 0.5)), where δ = 0(0.6)3. Choose α = 0.05. The results are summarized in Table 5.7. Note that the type-I error rates of the Hotelling’s T² chart are quite stable but much larger than 0.05. As before, we adjust the control limit so that the type-I error rate is around 0.05 for a fair comparison. From Table 5.7, we observe that the SMSS chart outperforms the T² chart in terms of the type-II error rate. This shows that the SMSS chart has better detecting power than the Hotelling’s T² chart not only for dispersion changes but also for mean changes when the underlying profile distribution is not close to the normal distribution.
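A minimal sketch of how profiles from model (5.11) could be generated is given below. It relies on two assumptions that should be checked against Section 4.2: Σ is treated as the scatter matrix of the multivariate t distribution, and the t vector is built as a Gaussian vector divided by the square root of an independent χ²(ν)/ν variable. Because Σij = 0.5^|i−j| is an AR(1)-type matrix, the Gaussian vector is generated by a simple recursion. The function names are hypothetical.

```python
import math
import random

P = 20
NU = 3          # degrees of freedom of the multivariate t
RHO = 0.5       # Sigma_ij = RHO ** |i - j|
SD_EPS = 0.3
# p = 20 equally spaced design points from 0.025 to 0.975.
X = [0.025 + (0.975 - 0.025) * j / (P - 1) for j in range(P)]


def g1(x, delta):
    """Shifted fixed effect; delta = 0 gives the IC fixed effect g0(x) = 0."""
    return delta * math.sin(2.0 * math.pi * (x - 0.5))


def draw_t_profile(rng, delta=0.0):
    """Draw one profile from the heavy-tailed model (hypothetical helper)."""
    # Gaussian vector with Cov(z_j, z_k) = RHO ** |j - k| via an AR(1) recursion.
    z = [rng.gauss(0.0, 1.0)]
    for _ in range(P - 1):
        z.append(RHO * z[-1] + math.sqrt(1.0 - RHO ** 2) * rng.gauss(0.0, 1.0))
    # Chi-square with NU degrees of freedom is Gamma(shape NU / 2, scale 2).
    w = rng.gammavariate(NU / 2.0, 2.0)
    scale = math.sqrt(w / NU)
    return [g1(x, delta) + zj / scale + rng.gauss(0.0, SD_EPS) for x, zj in zip(X, z)]


if __name__ == "__main__":
    rng = random.Random(11)
    ic = [draw_t_profile(rng) for _ in range(450)]
    oc = [draw_t_profile(rng, delta=1.8) for _ in range(50)]
    print(len(ic), len(oc), len(ic[0]))
```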
When implementing the SMSS chart, the spatial median and Tyler’s transformation matrix of the vector (T0², T1²)′ need to be estimated numerically. We remark that solutions to equations (5.5) and (5.6) may not exist when the historical data contain very extreme outliers. Apart from that, the SMSS chart is an efficient and powerful tool for monitoring profiles in Phase I applications and is easy for practitioners to use.
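To give a sense of what the numerical estimation involves, the sketch below computes a two-dimensional spatial median with the Weiszfeld iteration, a common fixed-point scheme for minimizing the sum of Euclidean distances. It is shown only as one possible approach; it is not necessarily the algorithm used to solve equations (5.5) and (5.6), and it does not cover Tyler’s transformation matrix. The function names are hypothetical.

```python
import math


def sum_of_distances(points, c):
    """Objective minimized by the spatial median."""
    return sum(math.hypot(px - c[0], py - c[1]) for px, py in points)


def spatial_median(points, tol=1e-10, max_iter=1000):
    """Weiszfeld iteration started from the coordinate-wise mean."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    for _ in range(max_iter):
        wsum = nx = ny = 0.0
        for px, py in points:
            d = math.hypot(px - cx, py - cy)
            if d < 1e-12:
                continue  # skip a point that coincides with the current estimate
            wsum += 1.0 / d
            nx += px / d
            ny += py / d
        if wsum == 0.0:
            break  # every point coincides with the estimate
        new_x, new_y = nx / wsum, ny / wsum
        moved = math.hypot(new_x - cx, new_y - cy)
        cx, cy = new_x, new_y
        if moved < tol:
            break
    return cx, cy


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (100.0, 100.0)]
    print(spatial_median(pts))   # stays near the four clustered points
```

In the example, the coordinate-wise mean is pulled to (20.4, 20.4) by the single extreme point, whereas the spatial median stays inside the unit square, which illustrates why a robust location estimate is attractive for contaminated Phase I data.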