In a manufacturing process, suppose that a product has $k$ possible types of defects for some known positive integer $k$. For each tested product item, the result is recorded as exactly one of the following $k + 1$ disjoint categories: {the first defect type, ..., the $k$th defect type, pass}. Such data are called binary for $k = 1$ and polytomous for $k \geq 2$. In the paper, categorical data denote either binary or polytomous data. See, e.g., McCullagh and Nelder (1989, Chapters 4 and 5) or Agresti (2002) for a review of categorical data analysis.
In a Bayesian framework, the prior distribution of the unobserved random parameters is pre-specified explicitly, i.e., it does not depend on the observed data. However, it is usually a non-trivial task for practitioners to pre-specify an appropriate prior distribution of the random parameters. Thus, an empirical Bayes approach is commonly used instead.
In an empirical Bayes framework, the prior distribution of the unobserved random parameters contains some unknown hyperparameters. The marginal distribution of the observed data is utilized to estimate the hyperparameters. Finally, a Bayesian inference is made for the random parameters by treating the estimated prior distribution as the prior distribution. Since the estimated prior distribution does depend on the observed data, an empirical Bayes inference is not a Bayesian inference.
Several research works utilize an empirical Bayes model to monitor the categorical data generated in a manufacturing process. For example, Yousry et al. (1991) used the beta-binomial empirical Bayes model to monitor binary data and utilized the method of moments to estimate the hyperparameters. Recently, Shiau et al. (2005) used the Dirichlet-multinomial empirical Bayes model to monitor polytomous data and utilized both the pseudo maximum likelihood method and the method of moments to estimate the hyperparameters. Chen et al. (2004) used the beta-binomial/Dirichlet-multinomial empirical Bayes model to monitor categorical data and utilized the maximum likelihood method to estimate the hyperparameters. Similarly, Chen et al. (2005) used the transformed-normal-binomial/multinomial empirical Bayes model to monitor categorical data and utilized the maximum likelihood method to estimate the hyperparameters. Chen and Liu (2005) developed a model selection technique between two empirical Bayes models for categorical data.
To proceed with the discussion, we briefly describe Bayesian inference. In a Bayesian framework, the unobserved random parameter vector $\theta$ has an explicitly pre-specified prior probability density function (p.d.f.) or probability mass function (p.m.f.) $\pi(\theta)$, and the response vector $y$ given $\theta$ has a known conditional p.d.f. or p.m.f. $f(y \mid \theta)$, where the function $\pi(\theta)$ does not depend on $y$. Bayesian inference is then based on the posterior p.d.f. or p.m.f. $p(\theta \mid y)$ of $\theta$ given $y$, where
$$p(\theta \mid y) \propto f(y \mid \theta)\, \pi(\theta).$$
In the Bayesian terminology, $\pi(\theta)$, $f(y \mid \theta)$, and $p(\theta \mid y)$ are also called the prior likelihood, the likelihood, and the posterior likelihood of $\theta$, respectively. In the literature, it is common practice to estimate $\theta$ by the posterior mean $E(\theta \mid y)$ of $\theta$ given $y$, where
$$E(\theta \mid y) = \frac{\int_{\Theta} \theta\, f(y \mid \theta)\, \pi(\theta)\, d\theta}{\int_{\Theta} f(y \mid \theta)\, \pi(\theta)\, d\theta} \quad \text{or} \quad \frac{\sum_{\theta \in \Theta} \theta\, f(y \mid \theta)\, \pi(\theta)}{\sum_{\theta \in \Theta} f(y \mid \theta)\, \pi(\theta)}$$
with $P(\{\theta \in \Theta\}) = 1$. An alternative estimator of $\theta$ is the posterior mode $\mathrm{mode}(\theta \mid y)$ of $\theta$ given $y$, where
$$\mathrm{mode}(\theta \mid y) = \arg\sup_{\theta \in \Theta} p(\theta \mid y) = \arg\sup_{\theta \in \Theta} f(y \mid \theta)\, \pi(\theta).$$
See, e.g., Gelman et al. (2004) for a review of the Bayesian data analysis.
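As a small illustration of the posterior mean and posterior mode described above, the following Python sketch approximates both on a grid for a binomial likelihood. The function names, the grid size, and the uniform prior in the usage line are assumptions made for this illustration only; they are not taken from the paper.

```python
import math

def binomial_pmf(y, n, theta):
    """Binomial likelihood f(y | theta) for y defects among n items."""
    return math.comb(n, y) * theta**y * (1.0 - theta)**(n - y)

def posterior_summary(y, n, prior_pdf, grid_size=2000):
    """Approximate the posterior mean and mode of theta on a midpoint grid.

    prior_pdf is any (possibly unnormalised) prior density on (0, 1);
    the grid replaces the integrals in E(theta | y) by Riemann sums.
    """
    grid = [(j + 0.5) / grid_size for j in range(grid_size)]
    weights = [binomial_pmf(y, n, th) * prior_pdf(th) for th in grid]
    total = sum(weights)
    mean = sum(th * w for th, w in zip(grid, weights)) / total
    mode = grid[max(range(grid_size), key=weights.__getitem__)]
    return mean, mode

# Hypothetical data: uniform prior, 3 defects among 10 items.
# The exact posterior is then beta(4, 8), with mean 1/3 and mode 0.3.
mean, mode = posterior_summary(3, 10, lambda th: 1.0)
```

The grid approach is chosen only because it works for any prior density; with a conjugate prior the same quantities are available in closed form.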
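Next, we briefly describe empirical Bayes inference. In an empirical Bayes framework, the unobserved random parameter vector $\theta$ has a prior p.d.f. or p.m.f. $\pi(\theta; \lambda)$ for some unknown hyperparameter vector $\lambda$, and the response vector $y$ given $\theta$ has a known conditional p.d.f. or p.m.f. $f(y \mid \theta)$.
An empirical Bayes inference is simply the Bayesian inference discussed above with $\pi(\theta)$ replaced by $\pi(\theta; \lambda)|_{\lambda = \hat{\lambda}(y)}$ ($\equiv \pi(\theta; \hat{\lambda}(y))$), where $\hat{\lambda}(y)$ is an estimator of $\lambda$. An empirical Bayes inference is then based on the estimated posterior p.d.f. or p.m.f. $p(\theta \mid y; \lambda)|_{\lambda = \hat{\lambda}(y)}$ ($\equiv p(\theta \mid y; \hat{\lambda}(y))$) of $\theta$ given $y$, where
$$p(\theta \mid y; \lambda) \propto f(y \mid \theta)\, \pi(\theta; \lambda).$$
In practice, either the maximum likelihood estimator or a method-of-moments estimator of $\lambda$ is usually used as $\hat{\lambda}(y)$ in an empirical Bayes inference. Similarly, it is common practice to estimate $\theta$ by the estimated posterior mean $E(\theta \mid y; \lambda)|_{\lambda = \hat{\lambda}(y)}$ ($\equiv E(\theta \mid y; \hat{\lambda}(y))$) of $\theta$ given $y$, where
$$E(\theta \mid y; \lambda) = \frac{\int_{\Theta} \theta\, f(y \mid \theta)\, \pi(\theta; \lambda)\, d\theta}{\int_{\Theta} f(y \mid \theta)\, \pi(\theta; \lambda)\, d\theta} \quad \text{or} \quad \frac{\sum_{\theta \in \Theta} \theta\, f(y \mid \theta)\, \pi(\theta; \lambda)}{\sum_{\theta \in \Theta} f(y \mid \theta)\, \pi(\theta; \lambda)}$$
with $P(\{\theta \in \Theta\}; \lambda) = 1$. An alternative estimator of $\theta$ is the estimated posterior mode $\mathrm{mode}(\theta \mid y; \lambda)|_{\lambda = \hat{\lambda}(y)}$ ($\equiv \mathrm{mode}(\theta \mid y; \hat{\lambda}(y))$) of $\theta$ given $y$, where
$$\mathrm{mode}(\theta \mid y; \lambda) = \arg\sup_{\theta \in \Theta} p(\theta \mid y; \lambda) = \arg\sup_{\theta \in \Theta} f(y \mid \theta)\, \pi(\theta; \lambda).$$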
See, e.g., Carlin and Louis (2000) for a review of the empirical Bayes data analysis.
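To make the two-stage logic concrete, the sketch below estimates the hyperparameters of a beta prior from binary data by the method of moments and then plugs them into the posterior mean. It assumes a common sample size for all time points and uses the standard beta-binomial variance formula; the function names and the data are hypothetical and do not come from the paper.

```python
def beta_binomial_moments(ys, n):
    """Method-of-moments estimates (alpha, beta) of a beta prior.

    ys are defect counts from samples of common size n. Uses
    Var(y) = n p (1 - p) [1 + (n - 1) rho] with rho = 1 / (alpha + beta + 1).
    """
    T = len(ys)
    y_bar = sum(ys) / T
    p_hat = y_bar / n
    s2 = sum((y - y_bar) ** 2 for y in ys) / (T - 1)
    rho = (s2 / (n * p_hat * (1.0 - p_hat)) - 1.0) / (n - 1)
    if not 0.0 < rho < 1.0:
        raise ValueError("moment estimates do not exist for these data")
    total = 1.0 / rho - 1.0                    # estimate of alpha + beta
    return p_hat * total, (1.0 - p_hat) * total

def estimated_posterior_mean(y, n, alpha, beta):
    """Posterior mean of theta under the estimated beta(alpha, beta) prior."""
    return (alpha + y) / (alpha + beta + n)

counts = [5, 10, 15, 20, 25, 8, 12, 18, 22, 15]    # hypothetical data, n = 50
alpha_hat, beta_hat = beta_binomial_moments(counts, 50)
shrunk = estimated_posterior_mean(5, 50, alpha_hat, beta_hat)
```

The estimated posterior mean lies between the raw proportion 5/50 and the overall proportion, which is the usual shrinkage effect of an empirical Bayes estimate.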
The remainder of the paper is organized as follows. In Section 2, a two-components mixture prior parametric family is proposed for the in-control prior distribution in a manufacturing process. In Section 3, an empirical Bayes approach is proposed for the case where in-control categorical data generated from the manufacturing process are available. An example of the proposed empirical Bayes model is introduced in Section 4. The goodness of fit and the simplification of the proposed model are discussed in Sections 5 and 6, respectively. Utilizing the likelihood ratio method, both Bayesian and empirical Bayes monitoring techniques are proposed in Section 7. The performance of the proposed process monitoring scheme is studied in terms of the average run length in Section 8. Some concluding remarks are given in the final section.
2 A TWO-COMPONENTS MIXTURE PRIOR PARAMETRIC FAMILY

Assume that a product item is classified as one of the following $k + 1$ disjoint categories: {the first defect type, ..., the $k$th defect type, pass}, where $k$ is a known positive integer. Let $t$ be any positive integer. For $i \in \{1, \ldots, k\}$, let $\theta_{it}$ denote the probability that a product item manufactured at time $t$ has the $i$th defect type. Then $1 - \sum_{i=1}^{k} \theta_{it}$ ($\equiv \theta_{k+1,t}$) is the probability that a product item manufactured at time $t$ passes the test. Set $\theta_t \equiv (\theta_{1t}, \ldots, \theta_{kt})^T$ and $\Theta \equiv \{\theta_t: \theta_{1t}, \ldots, \theta_{kt} > 0$ and $\sum_{i=1}^{k} \theta_{it} < 1\}$. In the paper, $\theta_t$ is called the (unobserved) random parameter vector at time $t$. Let $F_{\theta_t}$ denote the prior cumulative distribution function (c.d.f.) of $\theta_t$. For simplicity of notation, set $R^m \equiv (-\infty, \infty)^m$ for any positive integer $m$.
Throughout the paper, the manufacturing process is said to be in control at time $t$ if and only if $F_{\theta_t} = F$, where $F$ is an unknown in-control prior c.d.f. on $\Theta$ with p.d.f. $\pi(\theta_t)$. In other words, the manufacturing process is said to be out of control at time $t$ if and only if $F_{\theta_t} \neq F$.
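For $u \in \{1, 2\}$, let $\{F_{u,\lambda_u}: \lambda_u \in \Lambda_u\}$ denote the $u$th component prior parametric family, where $\lambda_u$ is a $q_u \times 1$ hyperparameter vector for some known positive integer $q_u$, each $F_{u,\lambda_u}$ is a known prior c.d.f. on $\Theta$ with p.d.f. $\pi_u(\theta_t; \lambda_u)$, and $\Lambda_u$ is a known open subset of $R^{q_u}$. Assume that $\partial^2 \pi_u(\theta_t; \lambda_u)/\partial \lambda_u \partial \lambda_u^T$ exists for each $\theta_t \in \Theta$, $\lambda_u \in \Lambda_u$, and $u \in \{1, 2\}$. Let $\{F_{\lambda}: \lambda \in \Lambda\}$ denote the two-components mixture prior parametric family, where $\lambda$ ($\equiv (\omega, \lambda_1^T, \lambda_2^T)^T$) is a $(1 + q_1 + q_2) \times 1$ ($\equiv q \times 1$) hyperparameter vector, each $F_{\lambda}$ is a known prior c.d.f. on $\Theta$ with p.d.f.
$$\pi(\theta_t; \lambda) \equiv \frac{\exp(\omega)}{1 + \exp(\omega)}\, \pi_1(\theta_t; \lambda_1) + \frac{1}{1 + \exp(\omega)}\, \pi_2(\theta_t; \lambda_2), \tag{1}$$
and $\Lambda \equiv [-\infty, \infty] \times \Lambda_1 \times \Lambda_2$. Assume that the two-components mixture prior parametric family is identifiable, i.e., $F_{\lambda} \neq F_{\lambda'}$ if $\lambda \neq \lambda'$ with $\lambda, \lambda' \in \Lambda$. When $\omega = \infty$, the two-components mixture prior parametric family reduces to the first component prior parametric family with $\pi(\theta_t; \lambda) = \pi_1(\theta_t; \lambda_1)$. When $\omega = -\infty$, it reduces to the second component prior parametric family with $\pi(\theta_t; \lambda) = \pi_2(\theta_t; \lambda_2)$. See, e.g., McLachlan and Peel (2000).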
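The logistic parameterization of the mixing weight in (1) can be sketched in a few lines of Python. The two beta components used in the usage lines are placeholders chosen for this illustration (the paper's second component is a transformed normal family), and the function names are hypothetical.

```python
import math

def beta_pdf(x, a, b):
    """Density of the beta(a, b) distribution on (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def mixing_weight(omega):
    """exp(omega) / (1 + exp(omega)), computed without overflow."""
    if omega >= 0:
        return 1.0 / (1.0 + math.exp(-omega))
    e = math.exp(omega)
    return e / (1.0 + e)

def mixture_pdf(x, omega, pdf1, pdf2):
    """Two-component mixture density as in (1)."""
    w = mixing_weight(omega)
    return w * pdf1(x) + (1.0 - w) * pdf2(x)

# Placeholder components: two beta densities.
first = lambda x: beta_pdf(x, 80, 20)
second = lambda x: beta_pdf(x, 20, 80)
value_equal_weights = mixture_pdf(0.5, 0.0, first, second)
value_first_only = mixture_pdf(0.8, math.inf, first, second)
```

Letting the weight depend on an unconstrained real number $\omega$ keeps the hyperparameter space a product of open sets, and the limits $\omega = \pm\infty$ recover the two single-component families.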
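For any $\lambda \in \Lambda$, the Kullback-Leibler divergence between the in-control prior c.d.f. $F$ and the prior c.d.f. $F_{\lambda}$ is defined as
$$d(F, F_{\lambda}) \equiv \int_{\Theta} \log\frac{\pi(\theta_t)}{\pi(\theta_t; \lambda)}\, dF(\theta_t) \equiv d(\lambda). \tag{2}$$
By the Jensen inequality,
$$-d(\lambda) = \int_{\Theta} \log\frac{\pi(\theta_t; \lambda)}{\pi(\theta_t)}\, dF(\theta_t) \leq \log \int_{\Theta} \frac{\pi(\theta_t; \lambda)}{\pi(\theta_t)}\, \pi(\theta_t)\, d\theta_t = \log \int_{\{\theta_t:\, \pi(\theta_t) > 0\}} \pi(\theta_t; \lambda)\, d\theta_t \leq \log \int_{\Theta} \pi(\theta_t; \lambda)\, d\theta_t = 0$$
for $\lambda \in \Lambda$, where $d(\lambda) = 0$ if and only if $F = F_{\lambda}$.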
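The non-negativity of the Kullback-Leibler divergence can be checked numerically. The sketch below approximates the integral in (2) by the midpoint rule for two beta densities on (0, 1); the choice of densities, the grid size, and the function names are assumptions for illustration only.

```python
import math

def beta_log_pdf(x, a, b):
    """Log-density of the beta(a, b) distribution on (0, 1)."""
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def kl_divergence(log_p, log_q, grid_size=20000):
    """Midpoint-rule approximation of the integral of p * log(p / q) over (0, 1)."""
    h = 1.0 / grid_size
    total = 0.0
    for j in range(grid_size):
        x = (j + 0.5) * h
        lp = log_p(x)
        total += math.exp(lp) * (lp - log_q(x)) * h
    return total

p = lambda x: beta_log_pdf(x, 2, 3)
q = lambda x: beta_log_pdf(x, 3, 2)
d_pq = kl_divergence(p, q)   # strictly positive because the densities differ
d_pp = kl_divergence(p, p)   # zero because the densities coincide
```

For this particular pair the divergence has the closed form $\psi(3) - \psi(2) = 1/2$, where $\psi$ is the digamma function, which gives a convenient check on the numerical approximation.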
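Assume that all of the following conditions hold: for $\lambda \in (-\infty, \infty) \times \Lambda_1 \times \Lambda_2$ ($\equiv \Lambda^{o}$), $\partial^2 d(\lambda)/\partial \lambda \partial \lambda^T$ exists,
$$\frac{\partial d(\lambda)}{\partial \lambda} = \int_{\Theta} \frac{\partial}{\partial \lambda} \log\frac{\pi(\theta_t)}{\pi(\theta_t; \lambda)}\, dF(\theta_t) \equiv -S(\lambda),$$
and
$$\frac{\partial^2 d(\lambda)}{\partial \lambda \partial \lambda^T} = \int_{\Theta} \frac{\partial^2}{\partial \lambda \partial \lambda^T} \log\frac{\pi(\theta_t)}{\pi(\theta_t; \lambda)}\, dF(\theta_t) \equiv J(\lambda).$$
Assume that there exists a unique $\lambda_0 \in \Lambda^{o}$ such that
$$\lambda_0 = \arg\inf_{\lambda \in \Lambda} d(\lambda). \tag{3}$$
Then $S(\lambda_0) = 0_{q \times 1}$. Observe that, for $\lambda \in \Lambda^{o}$,
$$S(\lambda) = \int_{\Theta} \frac{\partial \pi(\theta_t; \lambda)/\partial \lambda}{\pi(\theta_t; \lambda)}\, dF(\theta_t) \equiv \int_{\Theta} \frac{\pi_{\lambda}(\theta_t; \lambda)}{\pi(\theta_t; \lambda)}\, dF(\theta_t) \equiv \int_{\Theta} S(\lambda; \theta_t)\, dF(\theta_t) \equiv E(S(\lambda; \theta_t); F) \tag{4}$$
and
$$J(\lambda) = -\int_{\Theta} \frac{\partial S(\lambda; \theta_t)}{\partial \lambda^T}\, dF(\theta_t) = \int_{\Theta} \left[ -\frac{\partial^2 \pi(\theta_t; \lambda)/\partial \lambda \partial \lambda^T}{\pi(\theta_t; \lambda)} + \frac{\pi_{\lambda}(\theta_t; \lambda)\, \pi_{\lambda}^T(\theta_t; \lambda)}{[\pi(\theta_t; \lambda)]^2} \right] dF(\theta_t)$$
$$\equiv \int_{\Theta} \left[ -\frac{\pi_{\lambda \lambda^T}(\theta_t; \lambda)}{\pi(\theta_t; \lambda)} + \frac{\pi_{\lambda}(\theta_t; \lambda)\, \pi_{\lambda}^T(\theta_t; \lambda)}{[\pi(\theta_t; \lambda)]^2} \right] dF(\theta_t) \equiv \int_{\Theta} J(\lambda; \theta_t)\, dF(\theta_t) \equiv E(J(\lambda; \theta_t); F). \tag{5}$$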
For 2 o, set
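One way to evaluate $\lambda_0$ is to iterate the following procedure until $\lambda^{(v)}$ converges to $\lambda_0$: First choose a good initial value $\lambda^{(0)} \in \Lambda^{o}$ for $\lambda_0$. Next, set
$$\tilde{\lambda}^{(v+1)} \equiv \lambda^{(v)} + J^{-1}(\lambda^{(v)})\, S(\lambda^{(v)}) \tag{7}$$
when $\lambda^{(v)}$ is defined for $v \in \{0, 1, 2, \ldots\}$. If $\tilde{\lambda}^{(v+1)} \in \Lambda^{o}$ and $d(\tilde{\lambda}^{(v+1)}) \leq d(\lambda^{(v)})$, set $\lambda^{(v+1)} \equiv \tilde{\lambda}^{(v+1)}$; otherwise, set
$$\lambda^{(u;v+1)} \equiv \lambda^{(v)} + \frac{1}{2^u}\, K^{-1}(\lambda^{(v)})\, S(\lambda^{(v)}) \tag{8}$$
for $u \in \{0, 1, 2, \ldots\}$ and set $\lambda^{(v+1)} \equiv \lambda^{(m_{v+1};v+1)}$, where $m_{v+1} \equiv \min\{u: u \in \{0, 1, 2, \ldots\}$, $\lambda^{(u;v+1)} \in \Lambda^{o}$, $\lambda^{(u+1;v+1)} \in \Lambda^{o}$, and $d(\lambda^{(u;v+1)}) < \min\{d(\lambda^{(v)}), d(\lambda^{(u+1;v+1)})\}\}$.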
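Note that, by the Taylor series expansion, we obtain
$$d(\lambda^{(u;v+1)}) = d(\lambda^{(v)}) - S^T(\lambda^{(v)}) \left( \lambda^{(u;v+1)} - \lambda^{(v)} \right) + \cdots = d(\lambda^{(v)}) - \frac{1}{2^u}\, S^T(\lambda^{(v)})\, K^{-1}(\lambda^{(v)})\, S(\lambda^{(v)}) + O\!\left( \frac{1}{2^{2u}} \right)$$
as $u \to \infty$ for any fixed non-negative integer $v$. Since $S^T(\lambda^{(v)}) K^{-1}(\lambda^{(v)}) S(\lambda^{(v)}) > 0$ for any fixed non-negative integer $v$, $d(\lambda^{(u;v+1)})$ is a strictly increasing function of $u$ for large $u$ with limit $d(\lambda^{(v)})$, which implies that $m_{v+1}$ is well-defined. Thus, $d(\lambda^{(v)})$ is a decreasing function of $v$, i.e., $d(\lambda^{(0)}) \geq d(\lambda^{(1)}) \geq d(\lambda^{(2)}) \geq \ldots$.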
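The iteration described above (a Newton step that is accepted only if it stays in the domain and does not increase the objective, with a step-halving fallback otherwise) can be sketched for a scalar hyperparameter as follows. The function names, the tolerances, and the test objective are assumptions for this illustration; the sketch is not the authors' implementation.

```python
import math

def minimise(d, S, J, K, lam, in_domain=lambda x: True,
             tol=1e-10, max_iter=100, max_halvings=60):
    """Scalar sketch of the Newton iteration with a step-halving fallback.

    d: objective; S: minus its derivative; J: its second derivative;
    K: any positive scaling used for the fallback direction.
    """
    for _ in range(max_iter):
        newton = lam + S(lam) / J(lam)              # tentative Newton step
        if in_domain(newton) and d(newton) <= d(lam):
            new = newton
        else:
            direction = S(lam) / K(lam)             # descent direction since K > 0
            new = lam
            for u in range(max_halvings):
                cand = lam + direction / 2**u       # step with factor 1 / 2^u
                nxt = lam + direction / 2**(u + 1)
                if (in_domain(cand) and in_domain(nxt)
                        and d(cand) < min(d(lam), d(nxt))):
                    new = cand                      # smallest u passing the test
                    break
        if abs(new - lam) < tol:
            return new
        lam = new
    return lam

# Hypothetical objective sqrt(1 + x^2): the pure Newton step diverges for |x| > 1,
# so the fallback is exercised before the Newton steps take over.
d = lambda x: math.sqrt(1.0 + x * x)
S = lambda x: -x / math.sqrt(1.0 + x * x)
J = lambda x: (1.0 + x * x) ** -1.5
K = lambda x: 1.0
minimiser = minimise(d, S, J, K, 2.0)
```

The fallback guarantees that the objective never increases from one iterate to the next, which mirrors the monotonicity argument given by the Taylor expansion.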
When any of $d(\lambda)$, $S(\lambda)$, $J(\lambda)$, and $K(\lambda)$ does not have a closed-form formula, we may first simulate an independent and identically distributed (i.i.d.) sample $\{\theta_t^{(1)}, \ldots, \theta_t^{(R)}\}$ of size $R$, e.g., $R = 50\,000$, from the in-control prior c.d.f. $F$ and
then numerically evaluate $d(\lambda)$, $S(\lambda)$, $J(\lambda)$, and $K(\lambda)$ by
3 AN EMPIRICAL BAYES APPROACH

Let $t$ be any positive integer. Suppose that there are $n_t$ tested product items manufactured at time $t$, where $n_t$ is a known positive integer. For $i \in \{1, \ldots, k\}$, let $y_{it}$ denote the number of tested product items which have the $i$th defect type among the $n_t$ tested product items manufactured at time $t$. Then $n_t - \sum_{i=1}^{k} y_{it}$ ($\equiv y_{k+1,t}$) is the number of tested product items which pass the test among the $n_t$ tested product items manufactured at time $t$. Set $y_t \equiv (y_{1t}, \ldots, y_{kt})^T$ and $\mathcal{Y}_{n_t} \equiv \{y_t: y_{1t}, \ldots, y_{kt} \in \{0, 1, \ldots, n_t\}$ and $\sum_{i=1}^{k} y_{it} \leq n_t\}$. In the paper, $y_t$ is called the (observed) response vector at time $t$.
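At each time $t$, assume that the response vector $y_t$ given the random parameter vector $\theta_t$ is distributed as either the conditional binomial($n_t$, $\theta_t$) distribution for $k = 1$ or the conditional multinomial($n_t$, $\theta_t$) distribution for $k \geq 2$, denoted by $y_t \mid \theta_t \sim$ binomial($n_t$, $\theta_t$) for $k = 1$ or multinomial($n_t$, $\theta_t$) for $k \geq 2$. Let $F_{y_t \mid \theta_t}$ denote the conditional c.d.f. of $y_t$ given $\theta_t$ with p.m.f.
$$f(y_t \mid \theta_t) = 1_{\mathcal{Y}_{n_t}}(y_t)\; n_t! \prod_{i=1}^{k+1} \frac{\theta_{it}^{y_{it}}}{y_{it}!}.$$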
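As an illustration of the conditional model, the following sketch evaluates the standard binomial/multinomial p.m.f. with the pass category made explicit. It works on the log scale so that sample sizes such as $n_t = 300$ do not overflow; the function name is hypothetical.

```python
import math

def multinomial_pmf(y, n, theta):
    """P.m.f. of defect counts y = (y_1, ..., y_k) among n tested items.

    theta = (theta_1, ..., theta_k) are the defect probabilities; the pass
    category takes the remaining count and the remaining probability.
    """
    if any(c < 0 for c in y) or sum(y) > n:
        return 0.0                                  # outside the sample space
    counts = list(y) + [n - sum(y)]
    probs = list(theta) + [1.0 - sum(theta)]
    log_value = math.lgamma(n + 1)                  # log n!
    for c, p in zip(counts, probs):
        if c > 0:
            if p <= 0.0:
                return 0.0
            log_value += c * math.log(p)
        log_value -= math.lgamma(c + 1)             # minus log c!
    return math.exp(log_value)

# Binary case (k = 1): one defect among three items with defect probability 0.5.
value = multinomial_pmf([1], 3, [0.5])
```

For $k = 1$ the function reduces to the binomial p.m.f., and summing it over the whole sample space returns one.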
For $\lambda_u \in \Lambda_u$ and $u \in \{1, 2\}$, assume that
In the paper, it is assumed that the in-control prior c.d.f. $F = F_{\lambda_0}$ for some unique $\lambda_0 \in \Lambda$. Then $d(\lambda_0) = 0$. Assume that there are available historical in-control response vectors $\{y_1, y_2, \ldots, y_T\}$ generated in the manufacturing process for some known large positive integer $T$, where $(\theta_1^T, y_1^T)^T, (\theta_2^T, y_2^T)^T, \ldots, (\theta_T^T, y_T^T)^T$ are independent $2k \times 1$ random vectors. Set $\theta \equiv (\theta_1^T, \theta_2^T, \ldots, \theta_T^T)^T$, $y \equiv (y_1^T, y_2^T, \ldots, y_T^T)^T$, and $\mathcal{Y} \equiv \mathcal{Y}_{n_1} \times \mathcal{Y}_{n_2} \times \cdots \times \mathcal{Y}_{n_T}$, where $\theta$ and $y$ are, respectively, called the historical in-control (unobserved) random vector and the historical in-control (observed) response vector in the paper. Let $F_{y;\lambda_0}$ denote the marginal c.d.f. of $y$
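Given the historical in-control response vector $y$, the log-likelihood function for $\lambda$ is
the score function for $\lambda$ is
$S(\lambda; y) \equiv \partial \ell(\lambda; y)/\partial \lambda$
and the observed (Fisher) information for $\lambda$ is
Then $K(\lambda; y)$ is a non-negative definite covariance matrix for $\lambda \in \Lambda^{o}$ and $y \in \mathcal{Y}$.
For large $T$, $K(\lambda; y)$ is in general a positive definite covariance matrix for $\lambda \in \Lambda^{o}$ and $y \in \mathcal{Y}$.
Observe that, for $\lambda \in \Lambda^{o}$,
The maximum likelihood estimator (MLE) $\hat{\lambda}(y)$ ($\equiv \hat{\lambda}$) of $\lambda$ solves the score equation $S(\lambda; y) = 0_{q \times 1}$ for $\lambda$. That is, $S(\lambda; y)|_{\lambda = \hat{\lambda}}$ ($\equiv S(\hat{\lambda}; y)$) $= 0_{q \times 1}$.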
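One way to evaluate $\hat{\lambda}$ is to iterate the following procedure until $\lambda^{(v)}$ converges to $\hat{\lambda}$: First choose a good initial value $\lambda^{(0)} \in \Lambda^{o}$ for $\hat{\lambda}$. Next, set
$$\tilde{\lambda}^{(v+1)} \equiv \lambda^{(v)} + J^{-1}(\lambda^{(v)}; y)\, S(\lambda^{(v)}; y) \tag{24}$$
when $\lambda^{(v)}$ is defined for $v \in \{0, 1, 2, \ldots\}$. If $\tilde{\lambda}^{(v+1)} \in \Lambda^{o}$ and $\ell(\tilde{\lambda}^{(v+1)}; y) \geq \ell(\lambda^{(v)}; y)$, set $\lambda^{(v+1)} \equiv \tilde{\lambda}^{(v+1)}$; otherwise, set
Note that, by the Taylor series expansion, we obtain
$\ell(\lambda^{(u;v+1)}; y)$
$0$ for any fixed non-negative integer $v$, $\ell(\lambda^{(u;v+1)}; y)$ is a strictly decreasing function of $u$ for large $u$ with limit $\ell(\lambda^{(v)}; y)$, which implies that $m_{v+1}$ is well-defined. Thus, $\ell(\lambda^{(v)}; y)$ is an increasing function of $v$, i.e., $\ell(\lambda^{(0)}; y) \leq \ell(\lambda^{(1)}; y) \leq \ell(\lambda^{(2)}; y) \leq \ldots$.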
When any of $\ell(\lambda; y)$, $S(\lambda; y)$, $J(\lambda; y)$, and $K(\lambda; y)$ does not have a closed-form formula, we may numerically evaluate any of them as follows: First, for $u \in \{1, 2\}$, simulate an i.i.d. sample $\{\theta_1^{(u;1)}, \ldots, \theta_1^{(u;R)}\}$ of size $R$, e.g., $R = 50\,000$,
$\hat{f}_{u,\lambda_u}(y_t; \lambda_u)$ and $\hat{f}_{\lambda \lambda^T}(y_t; \lambda)$, respectively, utilizing their closed-form formulae. Finally, numerically evaluate $\ell(\lambda; y)$, $S(\lambda; y)$, $J(\lambda; y)$, and $K(\lambda; y)$ by
4 AN EXAMPLE

For illustration of the proposed methodology, the first component prior parametric family is chosen as the family of all beta/Dirichlet distributions because it is a conjugate family of binomial/multinomial distributions. The second component prior parametric family is chosen as the family of all transformed normal distributions (defined below) because it is a rich family of distributions, offering important distribution shapes that cannot be achieved within the family of all beta/Dirichlet distributions. See, e.g., O'Hagan and Forster (2004, Chapter 12).
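4.1 The First Component Prior Parametric Family

Let the first component prior parametric family $\{F_{1,\lambda_1}: \lambda_1 \in \Lambda_1\}$ denote the family of all beta/Dirichlet distributions, where $\lambda_1 \equiv (\lambda_{11}, \ldots, \lambda_{1,k+1})^T$ ($\equiv (\lambda_{11}, \ldots, \lambda_{1 q_1})^T$), $\Lambda_1 \equiv R^{k+1}$, and $F_{1,\lambda_1}$ has p.d.f.
$$\pi_1(\theta_t; \lambda_1) = 1_{\Theta}(\theta_t)\, \frac{\Gamma\!\left[ \sum_{i=1}^{k+1} \exp(\lambda_{1i}) \right]}{\prod_{i=1}^{k+1} \Gamma[\exp(\lambda_{1i})]} \prod_{i=1}^{k+1} \theta_{it}^{\exp(\lambda_{1i}) - 1} \equiv 1_{\Theta}(\theta_t)\, \frac{\Gamma[\exp(\lambda_{10})]}{\prod_{i=1}^{k+1} \Gamma[\exp(\lambda_{1i})]} \prod_{i=1}^{k+1} \theta_{it}^{\exp(\lambda_{1i}) - 1} \tag{33}$$
with $1_{\Theta}(\theta_t) = 1$ for $\theta_t \in \Theta$ and $0$ otherwise. Since $\{F_{1,\lambda_1}: \lambda_1 \in \Lambda_1\}$ is chosen as a conjugate family of binomial/multinomial distributions, all of $f_1(y_t; \lambda_1)$, $f_{1,\lambda_1}(y_t; \lambda_1)$, and $f_{1,\lambda_1 \lambda_1^T}(y_t; \lambda_1)$ have closed-form formulae for $\lambda_1 \in \Lambda_1$ as follows: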
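For $\lambda_1 \in \Lambda_1$, it follows from Johnson et al. (1997, pages 80 and 81) that
Thus, for $y_t \in \mathcal{Y}_{n_t}$ and $\lambda_1 \in \Lambda_1$,
$$f_{1,\lambda_1}(y_t; \lambda_1) = f_1(y_t; \lambda_1)\, S_1(\lambda_1; y_t) \tag{35}$$
and
$$f_{1,\lambda_1 \lambda_1^T}(y_t; \lambda_1) = f_1(y_t; \lambda_1) \left[ S_1(\lambda_1; y_t)\, S_1^T(\lambda_1; y_t) - J_1(\lambda_1; y_t) \right]. \tag{36}$$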
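4.2 The Second Component Prior Parametric Family

Let the second component prior parametric family $\{F_{2,\lambda_2}: \lambda_2 \in \Lambda_2\}$ denote the family of all transformed normal distributions defined as follows: Set $(\log(\theta_{1t}/\theta_{k+1,t}), \ldots, \log(\theta_{kt}/\theta_{k+1,t}))^T \equiv \eta_t$ ($\equiv (\eta_{1t}, \ldots, \eta_{kt})^T$). Then, for $i \in \{1, \ldots, k\}$, $\theta_{it} = \exp(\eta_{it})/[1 + \sum_{i'=1}^{k} \exp(\eta_{i't})]$. Let $N(\mu, \Sigma)$ denote the $k$-variate normal distribution with mean vector $\mu$ ($\equiv (\mu_1, \ldots, \mu_k)^T$) $\in R^k$ and $k \times k$ positive definite covariance matrix $\Sigma$ ($\equiv (\sigma_{ii'})$). Set $\Sigma^{-1} \equiv (\sigma^{ii'})$ and $R \equiv (\sigma^{ii'}/\sqrt{\sigma^{ii} \sigma^{i'i'}})$ ($\equiv (\rho_{ii'})$). Then
$$\Sigma^{-1} = \mathrm{diag}\left\{ \sqrt{\sigma^{11}}, \ldots, \sqrt{\sigma^{kk}} \right\} R\; \mathrm{diag}\left\{ \sqrt{\sigma^{11}}, \ldots, \sqrt{\sigma^{kk}} \right\}.$$
Set
$$\lambda_2 \equiv \left( \mu^T, \log \sigma^{11}, \ldots, \log \sigma^{kk}, \log\frac{1 + \rho_{12}}{1 - \rho_{12}}, \ldots, \log\frac{1 + \rho_{1k}}{1 - \rho_{1k}}, \ldots, \log\frac{1 + \rho_{k-1,k}}{1 - \rho_{k-1,k}} \right)^T \equiv (\lambda_{21}, \ldots, \lambda_{2,k(k+3)/2})^T \equiv (\lambda_{21}, \ldots, \lambda_{2 q_2})^T \in \Lambda_2,$$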
where $\Lambda_2 \equiv \{\lambda_2: \mu \in R^k$ and $\Sigma$ is a $k \times k$ positive definite covariance matrix$\}$. distribution, denoted by $\theta_t \sim F_{2,\lambda_2}$, with p.d.f.
2( t; 2) = ( t; 2) det @ t
and formula, we may numerically evaluate all of them as follows: First simulate an i.i.d. sample f (2;1)1 ; : : : ; (2;R)1 g of size R, e.g., R = 50 000, from the prior c.d.f. F2; 2
f^2; 2(yt; 2)
2(yt; 2) is to utilize the multivariate Gauss-Hermite quadrature, e.g., see Fahrmeir and Tutz (2001, pages 447-449). All of nodes and weights of the Hermite polynomial of 32 degrees are shown in the appendix for the multivariate Gauss-Hermite quadrature.
In the paper, a simulation study is conducted for the following four cases where
F = F 0 = exp(!0)
1 is the beta(85; 15) distribution, and F2; 0
2 is the transformed-normal( 0:716; (0:214)2) distribution.
Case 2: 0 = (log(1); log(80); log(20); 0:410; log[1=(0:205)2])T. In particular, exp(!0)=[1 + exp(!0)] = 1=2, F is the beta(80; 20) distribution,and F is the
transformed-normal( 0:410; (0:205)2) distribution.
Case 3: 0 = (log(1); log(60); log(40); 1:405; log[1=(0:253)2])T. In particular, exp(!0)=[1 + exp(!0)] = 1=2, F1; 01 is the beta(60; 40) distribution, and F2; 02 is the transformed-normal( 1:405; (0:253)2) distribution.
Case 4: 0 = (log(5); log(73); log(27); 0:203; log[1=(0:202)2])T. In particular, exp(!0)=[1 + exp(!0)] = 5=6, F1; 0
1 is the beta(73; 27) distribution, and F2; 0
2 is the transformed-normal( 0:203; (0:202)2) distribution.
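5 GOODNESS OF FIT

In this section, the goodness of fit of the proposed model for a set of available historical in-control response vectors, $\{y_1, \ldots, y_T\}$, generated in a manufacturing process is discussed. Recall that $\theta \equiv (\theta_1^T, \ldots, \theta_T^T)^T$, $y \equiv (y_1^T, \ldots, y_T^T)^T$, $\mathcal{Y} \equiv \mathcal{Y}_{n_1} \times \cdots \times \mathcal{Y}_{n_T}$, and $F$ is the in-control prior c.d.f.
Consider the null hypothesis $H_0$: $\theta_1, \ldots, \theta_T$ i.i.d. $\sim F \in \{F_{\lambda}: \lambda \in \Lambda\}$ versus the alternative $H_1$: $\theta_1, \ldots, \theta_T$ i.i.d. $\sim F \notin \{F_{\lambda}: \lambda \in \Lambda\}$. Let $\mathcal{F}(\Theta)$ denote the set of all prior c.d.f.'s on $\Theta$ and let $\ell(F; y)$ denote the log-likelihood function of $F$ given $y$.
Then
$$\ell(F; y) \equiv \log\left[ \prod_{t=1}^{T} f(y_t; F) \right] = \sum_{t=1}^{T} \log[f(y_t; F)] \equiv \sum_{t=1}^{T} \ell(F; y_t),$$
where
$$f(y_t; F) = \int_{\Theta} f(y_t \mid \theta_t)\, dF(\theta_t).$$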
Let $W_T(y)$ denote the corresponding likelihood ratio (LR) statistic given $y$.
Then
where $\hat{F}$ is the non-parametric MLE of $F$ given $y$ under $H_1$ and $\hat{\lambda}$ is the parametric MLE of $\lambda$ under $H_0$. Since it takes too much time to calculate the critical point for performing the LR test, an alternative goodness-of-fit test is proposed in the paper as follows:
Note that the empirical prior c.d.f. $\tilde{F}$ with p.m.f. $T^{-1} \sum_{t=1}^{T} 1_{\{\theta_t\}}$ converges to $F$ in distribution as $T \to \infty$ and that, for $t \in \{1, \ldots, T\}$, the MLE $y_t/n_t$ of $\theta_t$ given $y_t$ converges to $\theta_t$ as $n_t \to \infty$. Since $\theta_1, \ldots, \theta_T$ are unobserved, the empirical prior c.d.f. $\tilde{F}$ is unavailable. Thus, we utilize the estimated empirical prior c.d.f. $\bar{F}$ with p.m.f. $T^{-1} \sum_{t=1}^{T} 1_{\{y_t/n_t\}}$ to estimate $F$. When all of $n_1, \ldots, n_T$, and $T$ tend to $\infty$, $\bar{F}$ converges to $F$ in distribution.
In the paper, consider the goodness-of-fit statistic
$$W_T^{*}(y) \equiv 2 \left[ \ell(F; y)|_{F = \bar{F}} - \ell(\hat{\lambda}; y) \right] \equiv 2 \left[ \ell(\bar{F}; y) - \ell(\hat{\lambda}; y) \right]. \tag{47}$$
One way to calculate the critical point for performing the goodness-of-fit test is as follows: First simulate an i.i.d. sample $\{y^{(1)}, \ldots, y^{(R)}\}$, e.g., $R = 50\,000$, from the estimated in-control marginal c.d.f. $F_{y;\lambda_0}|_{\lambda_0 = \hat{\lambda}}$ ($\equiv F_{y;\hat{\lambda}}$). Let $(y_{(1)}, \ldots, y_{(R)})$ be a permutation of $(y^{(1)}, \ldots, y^{(R)})$ such that $W_T^{*}(y_{(1)}) \leq \ldots \leq W_T^{*}(y_{(R)})$. Let $\alpha$ be a known constant with $0 < \alpha < 1$, e.g., $0.05$. An approximate size-$\alpha$ goodness-of-fit test is to reject $H_0$ if and only if $W_T^{*}(y) > W_T^{*}(y_{([R(1-\alpha)])})$, where $[R(1-\alpha)]$ is the largest integer less than or equal to $R(1-\alpha)$.
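The selection of the critical point from the simulated statistics amounts to picking one order statistic. A minimal Python sketch, assuming the simulated values of the statistic are already available as a list (the function name is hypothetical):

```python
import math

def monte_carlo_critical_point(simulated_stats, alpha=0.05):
    """Return the [R(1 - alpha)]-th order statistic of the simulated statistics."""
    ordered = sorted(simulated_stats)
    R = len(ordered)
    rank = math.floor(R * (1.0 - alpha))      # 1-based rank [R(1 - alpha)]
    return ordered[max(rank, 1) - 1]

def reject(observed_stat, simulated_stats, alpha=0.05):
    """Reject the null hypothesis when the observed statistic exceeds the critical point."""
    return observed_stat > monte_carlo_critical_point(simulated_stats, alpha)
```

Because the rank is computed in floating point, a product such as $R(1-\alpha)$ that should be an exact integer can in principle round just below it; for the values used in the paper ($R = 50\,000$, $\alpha = 0.05$) the product evaluates to exactly 47 500.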
The corresponding values of $W_T^{*}(y_{([R(1-\alpha)])})$ for Cases 1-4 in Section 4 are shown in Table 1, where $k = 1$, $T = 300$, $n_1 = \ldots = n_T = 300$, $R = 50\,000$, and $\alpha = 0.05$. The empirical c.d.f.'s of $W_T^{*}(y)$ for Cases 1-4 in Section 4 are shown in Figure 1, where $k = 1$, $T = 300$, $n_1 = \ldots = n_T = 300$, and $R = 50\,000$.
Table 1: The values of $W_T^{*}(y_{([R(1-\alpha)])})$ for Cases 1-4, where $k = 1$, $T = 300$, $n_1 = \ldots = n_T = n_t = 300$, $R = 50\,000$, and $\alpha = 0.05$.

                                      Case 1    Case 2    Case 3    Case 4
$W_T^{*}(y_{([R(1-\alpha)])})$        18.1      4.90      12.7      1.78
Figure 1: The empirical c.d.f.'s of $W_T^{*}$ for Cases 1-4, where $k = 1$, $T = 300$, $n_1 = \ldots = n_T = n_t = 300$, and $R = 50\,000$.
[Figure: empirical c.d.f. (vertical axis, 0.0 to 1.0) against $W_T^{*}$ (horizontal axis, -5.0 to 25.0), one curve per case; the 0.95 level and the values 18.1, 4.90, 12.7, and 1.78 are marked.]
6 SIMPLIFICATION

In this section, the simplification of the two-components mixture prior parametric family to either the first or the second component prior parametric family is discussed for the case where the null hypothesis of the previous goodness-of-fit test is not rejected.
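Let $u \in \{1, 2\}$ be fixed. Consider the null hypothesis $H_{u0}$: $\theta_1, \ldots, \theta_T$ i.i.d. $\sim F \in \{F_{u,\lambda_u}: \lambda_u \in \Lambda_u\}$ versus the alternative $H_{u1}$: $\theta_1, \ldots, \theta_T$ i.i.d. $\sim F \in \{F_{\lambda}: \lambda \in \Lambda\}$.
Let $W_{u,T}(y)$ denote the LR statistic given $y$, where
$$W_{u,T}(y) \equiv 2 \left[ \ell(\hat{\lambda}; y) - \sup_{\lambda_u \in \Lambda_u} \sum_{t=1}^{T} \ell_u(\lambda_u; y_t) \right] \equiv 2 \left[ \ell(\hat{\lambda}; y) - \sup_{\lambda_u \in \Lambda_u} \ell_u(\lambda_u; y) \right] \equiv 2 \left[ \ell(\hat{\lambda}; y) - \ell_u(\hat{\lambda}_u; y) \right] \tag{48}$$
with $\hat{\lambda}_u$ denoting the MLE of $\lambda_u$ given $y$ under the $u$th component prior parametric family.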
One way to calculate the critical point for performing the LR test is as follows:
First simulate $\{y^{(u;1)}, \ldots, y^{(u;R)}\}$, e.g., $R = 50\,000$, from the estimated in-control marginal c.d.f. $F_{y;u,\lambda_{0u}}|_{\lambda_{0u} = \hat{\lambda}_u}$ ($\equiv F_{y;u,\hat{\lambda}_u}$). Let $(y^{(u)}_{(1)}, \ldots, y^{(u)}_{(R)})$ be a permutation of $(y^{(u;1)}, \ldots, y^{(u;R)})$ such that $W_{u,T}(y^{(u)}_{(1)}) \leq \ldots \leq W_{u,T}(y^{(u)}_{(R)})$. Let $\alpha$ be a known constant with $0 < \alpha < 1$, e.g., $0.05$. An approximate size-$\alpha$ LR test is to reject $H_{u0}$ if and only if $W_{u,T}(y) > W_{u,T}(y^{(u)}_{([R(1-\alpha)])})$, where $[R(1-\alpha)]$ is the largest integer less than or equal to $R(1-\alpha)$.
When both $H_{10}$ and $H_{20}$ are rejected, the proposed two-components mixture prior parametric family for the in-control prior distribution is selected. The corresponding monitoring technique is developed in the following section.
When $H_{10}$ is not rejected but $H_{20}$ is rejected, the first component prior parametric family for the in-control prior distribution is selected. The corresponding monitoring technique is developed in Chen et al. (2004).
When $H_{10}$ is rejected but $H_{20}$ is not rejected, the second component prior parametric family for the in-control prior distribution is selected. The corresponding monitoring technique is developed in Chen et al. (2005).
When neither $H_{10}$ nor $H_{20}$ is rejected, the model selection technique developed in Chen and Liu (2005) can be utilized. The corresponding monitoring technique is developed in either Chen et al. (2004) or Chen et al. (2005).
The corresponding values of $W_{u,T}(y^{(u)}_{([R(1-\alpha)])})$ for Cases 1-4 in Section 4 are shown in Table 2, where $u \in \{1, 2\}$, $k = 1$, $T = 300$, $n_1 = \ldots = n_T = 300$, $R = 50\,000$, and $\alpha = 0.05$. The empirical c.d.f.'s of $W_{1,T}(y)$ and $W_{2,T}(y)$ for Cases 1-4 in Section 4 are shown in Figures 2 and 3, where $k = 1$, $T = 300$, $n_1 = \ldots = n_T = 300$, and $R = 50\,000$.
Table 2: The values of $W_{u,T}(y^{(u)}_{([R(1-\alpha)])})$ for Cases 1-4, where $u \in \{1, 2\}$, $k = 1$, $T = 300$, $n_1 = \ldots = n_T = n_t = 300$, $R = 50\,000$, and $\alpha = 0.05$.

                                             Case 1    Case 2    Case 3    Case 4
$W_{1,T}(y^{(1)}_{([R(1-\alpha)])})$         2.146     1.762     0.566     1.284
$W_{2,T}(y^{(2)}_{([R(1-\alpha)])})$         1.035     0.653     1.789     0.335
Figure 2: The empirical c.d.f.’s of W1;T’s for Case 1-4, where k = 1, T = 300,
7 A PROCESS MONITORING SCHEME

Let $P_{in}$ denote the false-alarm rate, i.e., the probability that an out-of-control signal occurs when the manufacturing process is in control. Conventionally, $P_{in}$ is taken to be $2\Phi(-3)$ ($\approx 0.002\,699\,8$), where $\Phi$ is the c.d.f. of the standard normal distribution. In this section, utilizing the LR method, a Bayesian (or an empirical Bayes) monitoring scheme for the manufacturing process is proposed when $F = F_{\lambda_0} \in \{F_{\lambda}: \lambda \in \Lambda\}$ for some known (or unknown) $\lambda_0 \in \Lambda$. The main reason for using the LR test is that it often has higher power than other tests when the alternative hypothesis is true, which corresponds to better detection power when the process is out of control.
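The conventional false-alarm rate can be reproduced with the Python standard library. The in-control average run length computed at the end assumes that signals at different time points are independent with a common false-alarm rate, which is an assumption of this sketch rather than a statement from the paper.

```python
import math

def standard_normal_cdf(z):
    """C.d.f. of the standard normal distribution via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

p_in = 2.0 * standard_normal_cdf(-3.0)   # conventional three-sigma false-alarm rate
in_control_arl = 1.0 / p_in              # mean of a geometric run length (independence assumed)
```

Under that independence assumption, the run length until the first false alarm is geometric, so a false-alarm rate of about 0.0027 corresponds to roughly 370 time points between false alarms on average.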
In order to monitor the manufacturing process at time $t$ ($> T$), suppose that the response vector $y_t$ is observed. We are then interested in testing whether or not the manufacturing process is in control at time $t$. Recall that $F_{\theta_t}$ is the prior c.d.f. of $\theta_t$ and that $\mathcal{F}(\Theta)$ is the set of all c.d.f.'s on $\Theta$.
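7.1 A BAYESIAN MONITORING SCHEME

In this subsection, consider the case where $F = F_{\lambda_0} \in \{F_{\lambda}: \lambda \in \Lambda\}$ for some known $\lambda_0 \in \Lambda$. To monitor the manufacturing process at time $t$, the null hypothesis $H_0$: $F_{\theta_t} = F_{\lambda_0}$ versus the alternative $H_1$: $F_{\theta_t} \neq F_{\lambda_0}$, i.e., $F_{\theta_t} \in \mathcal{F}(\Theta) \setminus \{F_{\lambda_0}\}$, is tested.
List all the elements of the sample space $\mathcal{Y}_{n_t}$ of $y_t$ by $\{y_t^{(1)}, \ldots, y_t^{(|\mathcal{Y}_{n_t}|)}\}$, where $|\mathcal{Y}_{n_t}|$ ($= (n_t + k)!/(n_t!\, k!)$) is the number of elements in $\mathcal{Y}_{n_t}$. Regard $F_{\theta_t}$ as the unknown parameter of interest in $\mathcal{F}(\Theta)$. Then the unknown parameter of interest is non-parametric. Let $\ell(F_{\theta_t}; y_t)$ ($\equiv \log[f(y_t; F_{\theta_t})]$) denote the log-likelihood function of $F_{\theta_t}$ given $y_t$. Note that
$$\ell(F_{\theta_t}; y_t) = \log \int_{\Theta} f(y_t \mid \theta_t)\, dF_{\theta_t}(\theta_t) \leq \log \int_{\Theta} \sup_{\theta_t \in \Theta} f(y_t \mid \theta_t)\, dF_{\theta_t}(\theta_t) = \log \int_{\Theta} f(y_t \mid \theta_t)|_{\theta_t = y_t/n_t}\, dF_{\theta_t}(\theta_t) = \log\left[ f(y_t \mid \theta_t)|_{\theta_t = y_t/n_t} \right],$$
where the binomial/multinomial likelihood $f(y_t \mid \theta_t)$ for $\theta_t$ given $y_t$ attains its maximum at $\theta_t = y_t/n_t$. Thus, the MLE $\hat{F}_{\theta_t}$ of $F_{\theta_t}$ given $y_t$ has p.m.f. $1_{\{y_t/n_t\}}$ and
$$\sup_{F_{\theta_t} \in \mathcal{F}(\Theta)} \ell(F_{\theta_t}; y_t) = \ell(F_{\theta_t}; y_t)|_{F_{\theta_t} = \hat{F}_{\theta_t}} \equiv \ell(\hat{F}_{\theta_t}; y_t) = \log\left[ f(y_t \mid \theta_t)|_{\theta_t = y_t/n_t} \right].$$
Let $W_{t;\lambda_0}(y_t)$ denote the corresponding LR statistic, where
$$W_{t;\lambda_0}(y_t) = 2 \left\{ \log\left[ f(y_t \mid \theta_t)|_{\theta_t = y_t/n_t} \right] - \ell(\lambda_0; y_t) \right\} \tag{49}$$
with $P(\{0 < W_{t;\lambda_0}(y_t) < \infty\}; F_{y_t;\lambda_0}) = 1$.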
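For the binary case, the LR statistic in (49) can be sketched in Python when the in-control prior is a single beta distribution, for which the marginal p.m.f. of $y_t$ is the closed-form beta-binomial p.m.f. This single-component prior is a simplifying assumption of the sketch (it corresponds to the limit in which only the first mixture component is retained); the function names are hypothetical.

```python
import math

def log_binomial_pmf(y, n, theta):
    """log f(y | theta), with the convention 0 * log(0) = 0."""
    value = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    if y > 0:
        value += y * math.log(theta)
    if y < n:
        value += (n - y) * math.log(1.0 - theta)
    return value

def log_beta_binomial_pmf(y, n, a, b):
    """log of the marginal p.m.f. of y when theta has a beta(a, b) prior."""
    def log_beta(p, q):
        return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)
    return (math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
            + log_beta(a + y, b + n - y) - log_beta(a, b))

def lr_statistic(y, n, a, b):
    """W = 2 [log f(y | theta = y/n) - log marginal p.m.f. of y]."""
    return 2.0 * (log_binomial_pmf(y, n, y / n) - log_beta_binomial_pmf(y, n, a, b))

# Hypothetical values: 3 defects among 10 items under a uniform (beta(1, 1)) prior.
w = lr_statistic(3, 10, 1.0, 1.0)
```

The statistic is positive for every sample point because the maximized binomial likelihood exceeds its average over the prior, in line with the probability statement following (49).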
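The size-$P_{in}$ LR test and a control chart for monitoring the LR statistic $W_{t;\lambda_0}(y_t)$ can be constructed as follows: Let $(y_{t,(1)}, \ldots, y_{t,(|\mathcal{Y}_{n_t}|)})$ be a permutation of $(y_t^{(1)}, \ldots, y_t^{(|\mathcal{Y}_{n_t}|)})$ such that $W_{t;\lambda_0}(y_{t,(1)}) \leq \ldots \leq W_{t;\lambda_0}(y_{t,(|\mathcal{Y}_{n_t}|)})$. Note that $W_{t;\lambda_0}(y_t)$ is a discrete random variable. If a deterministic upper control limit is used, a pre-specified false-alarm rate $P_{in}$ ($\in (0, 1)$), e.g., $2\Phi(-3)$, is nearly impossible to attain. However, any pre-specified false-alarm rate can be attained with the randomized-upper-control-limit approach proposed in Shiau et al. (2005). To find the randomized upper control limit ($\equiv RUCL_{\lambda_0}$), we start accumulating the right tail probability from $W_{t;\lambda_0}(y_{t,(|\mathcal{Y}_{n_t}|)})$ until we reach the first $r$ such that $P(\{W_{t;\lambda_0}(y_t) \geq W_{t;\lambda_0}(y_{t,(r)})\}; F_{y_t;\lambda_0}) \geq P_{in}$. Denote this $r$ by $m_{\lambda_0}$, i.e.,
$$m_{\lambda_0} = \max\left\{ r: P\left( \{W_{t;\lambda_0}(y_t) \geq W_{t;\lambda_0}(y_{t,(r)})\}; F_{y_t;\lambda_0} \right) \geq P_{in} \right\}. \tag{50}$$
If $P(\{W_{t;\lambda_0}(y_t) \geq W_{t;\lambda_0}(y_{t,(m_{\lambda_0})})\}; F_{y_t;\lambda_0}) = P_{in}$, which is nearly impossible, then there is no need for randomization and $W_{t;\lambda_0}(y_{t,(m_{\lambda_0})})$ is the upper control limit ($\equiv UCL_{\lambda_0}$). If $P(\{W_{t;\lambda_0}(y_t) \geq W_{t;\lambda_0}(y_{t,(m_{\lambda_0})})\}; F_{y_t;\lambda_0}) > P_{in}$, then $W_{t;\lambda_0}(y_{t,(m_{\lambda_0})}) = RUCL_{\lambda_0}$. Note that there may be more than one $y_{t,(r)}$ such that $W_{t;\lambda_0}(y_{t,(r)}) = RUCL_{\lambda_0}$. Let $m_{\lambda_0,L}$, $m_{\lambda_0,U} \in \{1, \ldots, |\mathcal{Y}_{n_t}|\}$ be such that
$$W_{t;\lambda_0}(y_{t,(m_{\lambda_0,L} - 1)}) < W_{t;\lambda_0}(y_{t,(m_{\lambda_0,L})}) = RUCL_{\lambda_0} = W_{t;\lambda_0}(y_{t,(m_{\lambda_0,U})}) < W_{t;\lambda_0}(y_{t,(m_{\lambda_0,U} + 1)}),$$
where $W_{t;\lambda_0}(y_{t,(0)}) \equiv 0$ and $W_{t;\lambda_0}(y_{t,(|\mathcal{Y}_{n_t}| + 1)}) \equiv \infty$. Then the randomization is done by signaling an out-of-control alarm with probability
$$P_{in;\lambda_0;RUCL} = \frac{P_{in} - P(\{W_{t;\lambda_0}(y_t) > RUCL_{\lambda_0}\}; F_{y_t;\lambda_0})}{P(\{W_{t;\lambda_0}(y_t) = RUCL_{\lambda_0}\}; F_{y_t;\lambda_0})} = \frac{P_{in} - \sum_{r = m_{\lambda_0,U} + 1}^{|\mathcal{Y}_{n_t}|} P(\{y_t = y_{t,(r)}\}; F_{y_t;\lambda_0})}{\sum_{r = m_{\lambda_0,L}}^{m_{\lambda_0,U}} P(\{y_t = y_{t,(r)}\}; F_{y_t;\lambda_0})}. \tag{51}$$
This leads to
$$P_{in} = P(\{W_{t;\lambda_0}(y_t) > RUCL_{\lambda_0}\}; F_{y_t;\lambda_0}) + P_{in;\lambda_0;RUCL}\; P(\{W_{t;\lambda_0}(y_t) = RUCL_{\lambda_0}\}; F_{y_t;\lambda_0})$$
and $0 < P_{in;\lambda_0;RUCL} \leq 1$. When $P_{in;\lambda_0;RUCL} = 1$, there is no need for randomization.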
The monitoring scheme is as follows: If $W_{t;\lambda_0}(y_t) > RUCL_{\lambda_0}$, then the null hypothesis $H_0$: $F_{\theta_t} = F_{\lambda_0}$ is rejected and the manufacturing process is declared to be out of control at time $t$; if $W_{t;\lambda_0}(y_t) < RUCL_{\lambda_0}$, then the null hypothesis $H_0$: $F_{\theta_t} = F_{\lambda_0}$ is not rejected and the manufacturing process is declared to be in control at time $t$; if $W_{t;\lambda_0}(y_t) = RUCL_{\lambda_0}$, then, with probability $P_{in;\lambda_0;RUCL}$, the null hypothesis $H_0$: $F_{\theta_t} = F_{\lambda_0}$ is rejected and the manufacturing process is declared to be out of control at time $t$.
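The construction of the randomized upper control limit can be sketched in Python for any discrete in-control distribution of the LR statistic. The inputs (one statistic value and one in-control probability per sample point) and the function name are assumptions for this illustration; ties are pooled by exact equality of the statistic values.

```python
def randomized_ucl(w_values, probs, p_in):
    """Return (RUCL, signalling probability at W = RUCL).

    w_values[j] is the LR statistic of the j-th sample point and probs[j]
    its in-control probability; ties in w_values are pooled.
    """
    pooled = {}
    for w, p in zip(w_values, probs):
        pooled[w] = pooled.get(w, 0.0) + p
    tail = 0.0                                   # running value of P(W > w)
    for w in sorted(pooled, reverse=True):       # accumulate the right tail
        if tail + pooled[w] >= p_in:             # first value with P(W >= w) >= P_in
            return w, (p_in - tail) / pooled[w]
        tail += pooled[w]
    raise ValueError("p_in exceeds the total probability")

# Hypothetical four-point distribution of the statistic and target rate 0.15.
rucl, gamma = randomized_ucl([1, 2, 3, 4], [0.4, 0.3, 0.2, 0.1], 0.15)
```

With the returned pair, a signal is raised whenever the observed statistic exceeds the limit, and with probability `gamma` when it equals the limit, so the overall false-alarm rate equals the target exactly.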
The corresponding values of $RUCL_{\lambda_0}$ and $P_{in;\lambda_0;RUCL}$ for Cases 1-4 in Section 4 are shown in Table 3, where $k = 1$, $T = 300$, $n_1 = \ldots = n_T = n_t = 300$,