INTRODUCTION - Empirical Bayes Process Monitoring Techniques for Categorical Data with Mixture Prior Distributions

In a manufacturing process, suppose that a product has $k$ possible types of defects for some known positive integer $k$. For each tested product item, the result could be recorded as exactly one of the following $k + 1$ disjoint categories: {the first defect type, ..., the $k$th defect type, pass}. Such data are called binary for $k = 1$ and polytomous for $k \geq 2$. In the paper, categorical data denote either binary or polytomous data. See, e.g., McCullagh and Nelder (1989, Chapters 4 and 5) or Agresti (2002) for a review of categorical data analysis.

In a Bayesian framework, the prior distribution of the unobserved random parameters is pre-specified explicitly, i.e., it does not depend on the observed data. However, it is usually a non-trivial task for practitioners to pre-specify an appropriate prior distribution of the random parameters. Thus, an empirical Bayes approach is commonly used instead.

In an empirical Bayes framework, there exist some unknown hyperparameters in the prior distribution of the unobserved random parameters. Then the marginal distribution of the observed data is utilized to estimate the hyperparameters. Finally, a Bayesian inference is made for the random parameters by treating the estimated prior distribution as the prior distribution. Since the estimated prior distribution does depend on the observed data, an empirical Bayes inference is not a Bayesian inference.

Several research works utilize the empirical Bayes model to monitor the categorical data generated in a manufacturing process. For example, Yousry et al. (1991) used the beta-binomial empirical Bayes model to monitor the binary data and utilized the method of moments for estimation of the hyperparameters. Recently, Shiau et al. (2005) used the Dirichlet-multinomial empirical Bayes model to monitor the polytomous data and utilized both the pseudo maximum likelihood method and the method of moments for estimation of the hyperparameters. Chen et al. (2004) used the beta-binomial/Dirichlet-multinomial empirical Bayes model to monitor the categorical data and utilized the maximum likelihood method for estimation of the hyperparameters. Similarly, Chen et al. (2005) used the transformed-normal-binomial/multinomial empirical Bayes model to monitor the categorical data and utilized the maximum likelihood method for estimation of the hyperparameters. Chen and Liu (2005) developed a model selection technique between two empirical Bayes models for the categorical data.

To proceed with the discussion, we briefly describe Bayesian inference as follows. In a Bayesian framework, the unobserved random parameter vector $\theta$ has an explicitly pre-specified prior probability density function (p.d.f.) or probability mass function (p.m.f.) $\pi(\theta)$, and the response vector $y$ given $\theta$ has a known conditional p.d.f. or p.m.f. $f(y \mid \theta)$, where the function $\pi(\cdot)$ does not depend on $y$. Then the Bayesian inference is based on the posterior p.d.f. or p.m.f. $p(\theta \mid y)$ of $\theta$ given $y$, where

$$p(\theta \mid y) \propto f(y \mid \theta)\,\pi(\theta).$$

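In the Bayesian terminology, $\pi(\theta)$, $f(y \mid \theta)$, and $p(\theta \mid y)$ are also called the prior likelihood, the likelihood, and the posterior likelihood of $\theta$, respectively. In the literature, it is common practice to estimate $\theta$ by the posterior mean $E(\theta \mid y)$ of $\theta$ given $y$, where

$$E(\theta \mid y) = \frac{\int_\Theta \theta\, f(y \mid \theta)\,\pi(\theta)\,d\theta}{\int_\Theta f(y \mid \theta)\,\pi(\theta)\,d\theta} \quad \text{or} \quad \frac{\sum_{\theta \in \Theta} \theta\, f(y \mid \theta)\,\pi(\theta)}{\sum_{\theta \in \Theta} f(y \mid \theta)\,\pi(\theta)}$$

with $P(\{\theta \in \Theta\}) = 1$. An alternative estimator of $\theta$ is the posterior mode $\mathrm{mode}(\theta \mid y)$ of $\theta$ given $y$, where

$$\mathrm{mode}(\theta \mid y) = \arg\sup_{\theta \in \Theta} p(\theta \mid y) = \arg\sup_{\theta \in \Theta} f(y \mid \theta)\,\pi(\theta).$$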

See, e.g., Gelman et al. (2004) for a review of the Bayesian data analysis.
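The posterior mean and posterior mode described above can be illustrated numerically for the binary case. The following is a minimal sketch, assuming a binomial likelihood and a beta prior; the function names, the grid size, and the numbers y = 7, n = 20, Beta(2, 3) are hypothetical choices for illustration and do not come from the paper.

```python
import math

def beta_pdf(a, b):
    """Return the Beta(a, b) density as a function of theta in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return lambda th: math.exp(
        log_norm + (a - 1) * math.log(th) + (b - 1) * math.log(1 - th)
    )

def posterior_summary(y, n, prior_pdf, grid_size=20001):
    """Approximate the posterior mean and mode of a binomial proportion."""
    # Midpoint grid on (0, 1); the endpoints are excluded so log(0) never occurs.
    grid = [(j + 0.5) / grid_size for j in range(grid_size)]
    # Unnormalised posterior: likelihood f(y | theta) times prior pi(theta).
    weights = [
        math.comb(n, y) * th**y * (1 - th) ** (n - y) * prior_pdf(th)
        for th in grid
    ]
    total = sum(weights)
    # Ratio of two integrals, both approximated by the same midpoint rule.
    mean = sum(th * w for th, w in zip(grid, weights)) / total
    # The mode maximises the unnormalised posterior over the grid.
    mode = grid[max(range(grid_size), key=weights.__getitem__)]
    return mean, mode

mean, mode = posterior_summary(y=7, n=20, prior_pdf=beta_pdf(2.0, 3.0))
```

Under these assumptions the posterior is Beta(9, 16), so the grid values can be compared with the closed forms 9/25 for the mean and 8/23 for the mode.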

Next, we give a brief description on the empirical Bayes inference as follows:

In an empirical Bayes framework, the unobserved random parameter vector $\theta$ has a prior p.d.f. or p.m.f. $\pi(\theta; \lambda)$ for some unknown hyperparameter vector $\lambda$, and the response vector $y$ given $\theta$ has a known conditional p.d.f. or p.m.f. $f(y \mid \theta)$.

An empirical Bayes inference is simply a Bayesian inference discussed above with $\pi(\theta)$ being replaced by $\pi(\theta; \lambda)|_{\lambda = \hat\lambda(y)}$ ($\equiv \pi(\theta; \hat\lambda(y))$), where $\hat\lambda(y)$ is an estimator of $\lambda$. Then an empirical Bayes inference is based on the estimated posterior p.d.f. or p.m.f. $p(\theta \mid y; \lambda)|_{\lambda = \hat\lambda(y)}$ ($\equiv p(\theta \mid y; \hat\lambda(y))$) of $\theta$ given $y$, where

$$p(\theta \mid y; \lambda) \propto f(y \mid \theta)\,\pi(\theta; \lambda).$$

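In practice, either the maximum likelihood estimator or a method-of-moments estimator of $\lambda$ is usually used as $\hat\lambda(y)$ in an empirical Bayes inference. Similarly, it is common practice to estimate $\theta$ by the estimated posterior mean $E(\theta \mid y; \lambda)|_{\lambda = \hat\lambda(y)}$ ($\equiv E(\theta \mid y; \hat\lambda(y))$) of $\theta$ given $y$, where

$$E(\theta \mid y; \lambda) = \frac{\int_\Theta \theta\, f(y \mid \theta)\,\pi(\theta; \lambda)\,d\theta}{\int_\Theta f(y \mid \theta)\,\pi(\theta; \lambda)\,d\theta} \quad \text{or} \quad \frac{\sum_{\theta \in \Theta} \theta\, f(y \mid \theta)\,\pi(\theta; \lambda)}{\sum_{\theta \in \Theta} f(y \mid \theta)\,\pi(\theta; \lambda)}$$

with $P(\{\theta \in \Theta\}; \lambda) = 1$. An alternative estimator of $\theta$ is the estimated posterior mode $\mathrm{mode}(\theta \mid y; \lambda)|_{\lambda = \hat\lambda(y)}$ ($\equiv \mathrm{mode}(\theta \mid y; \hat\lambda(y))$) of $\theta$ given $y$, where

$$\mathrm{mode}(\theta \mid y; \lambda) = \arg\sup_{\theta \in \Theta} p(\theta \mid y; \lambda) = \arg\sup_{\theta \in \Theta} f(y \mid \theta)\,\pi(\theta; \lambda).$$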

See, e.g., Carlin and Louis (2000) for a review of the empirical Bayes data analysis.
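As a concrete instance of the two-step empirical Bayes recipe (estimate the hyperparameters from the marginal distribution, then plug them into the posterior), the sketch below uses a beta-binomial model with a method-of-moments estimator. It assumes a common sample size n at every time point and the usual Beta(a, b) parameterization; the function names and the input moments are hypothetical and are not taken from the cited papers.

```python
def beta_binomial_mom(mean_y, var_y, n):
    """Method-of-moments estimates (a, b) for a beta-binomial with common n.

    Uses E(y) = n p and Var(y) = n p (1 - p) [1 + (n - 1) / (a + b + 1)],
    where p = a / (a + b).
    """
    p = mean_y / n
    dispersion = var_y / (n * p * (1 - p))
    if dispersion <= 1.0:
        # Without over-dispersion the moment equations have no valid solution.
        raise ValueError("no over-dispersion: moment estimates undefined")
    total = (n - dispersion) / (dispersion - 1.0)  # estimate of a + b
    return p * total, (1 - p) * total

def eb_posterior_mean(y_t, n, a_hat, b_hat):
    """Estimated posterior mean of the defect probability given y_t."""
    return (a_hat + y_t) / (a_hat + b_hat + n)

# Hypothetical sample moments of the historical counts.
a_hat, b_hat = beta_binomial_mom(mean_y=4.0, var_y=6.0, n=10)
theta_hat = eb_posterior_mean(y_t=7, n=10, a_hat=a_hat, b_hat=b_hat)
```

The plug-in step is what separates this from a Bayesian analysis: a_hat and b_hat depend on the observed data, so the "prior" used in the second function is an estimate.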

The remainder of the paper is organized as follows. In Section 2, a two-components mixture prior parametric family for the in-control prior distribution is proposed in a manufacturing process. In Section 3, an empirical Bayes approach is proposed when there are available in-control categorical data generated from the manufacturing process. An example of the proposed empirical Bayes model is introduced in Section 4. The goodness of fit and the simplification of the proposed model are discussed in Sections 5 and 6, respectively. Utilizing the likelihood ratio method, both Bayesian and empirical Bayes monitoring techniques are proposed in Section 7. The performance of the proposed process monitoring scheme is studied in terms of the average run length in Section 8. Some concluding remarks are given in the final section.

2

A TWO-COMPONENTS MIXTURE PRIOR PARAMETRIC FAMILY

Assume that a product item is classified as one of the following $k + 1$ disjoint categories: {the first defect type, ..., the $k$th defect type, pass}, where $k$ is a known positive integer. Let $t$ be any positive integer. For $i \in \{1, \ldots, k\}$, let $\theta_{it}$ denote the probability that a product item manufactured at time $t$ has the $i$th defect type. Then $1 - \sum_{i=1}^{k} \theta_{it}$ ($\equiv \theta_{k+1,t}$) is the probability that a product item manufactured at time $t$ passes the test. Set $\theta_t \equiv (\theta_{1t}, \ldots, \theta_{kt})^T$ and $\Theta \equiv \{\theta_t: \theta_{1t}, \ldots, \theta_{kt} > 0 \text{ and } \sum_{i=1}^{k} \theta_{it} < 1\}$. In the paper, $\theta_t$ is called the (unobserved) random parameter vector at time $t$. Let $F_{\theta_t}$ denote the prior cumulative distribution function (c.d.f.) of $\theta_t$. For simplicity of notation, set $R^m \equiv (-\infty, \infty)^m$ for any positive integer $m$.

Throughout the paper, the manufacturing process is said to be in control at time $t$ if and only if $F_{\theta_t} = F$, where $F$ is an unknown in-control prior c.d.f. on $\Theta$ with p.d.f. $\pi(\theta_t)$. In other words, the manufacturing process is said to be out of control at time $t$ if and only if $F_{\theta_t} \neq F$.

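For $u \in \{1, 2\}$, let $\{F_{u,\lambda_u}: \lambda_u \in \Lambda_u\}$ denote the $u$th component prior parametric family, where $\lambda_u$ is a $q_u \times 1$ hyperparameter vector for some known positive integer $q_u$, each $F_{u,\lambda_u}$ is a known prior c.d.f. on $\Theta$ with p.d.f. $\pi_u(\theta_t; \lambda_u)$, and $\Lambda_u$ is a known open subset of $R^{q_u}$. Assume that $\partial^2 \pi_u(\theta_t; \lambda_u)/\partial\lambda_u \partial\lambda_u^T$ exists for each $\theta_t \in \Theta$, $\lambda_u \in \Lambda_u$, and $u \in \{1, 2\}$. Let $\{F_\lambda: \lambda \in \Lambda\}$ denote the two-components mixture prior parametric family, where $\lambda$ ($\equiv (\omega, \lambda_1^T, \lambda_2^T)^T$) is a $(1 + q_1 + q_2) \times 1$ ($\equiv q \times 1$) hyperparameter vector, each $F_\lambda$ is a known prior c.d.f. on $\Theta$ with p.d.f.

$$\pi(\theta_t; \lambda) \equiv \frac{\exp(\omega)}{1 + \exp(\omega)}\,\pi_1(\theta_t; \lambda_1) + \frac{1}{1 + \exp(\omega)}\,\pi_2(\theta_t; \lambda_2), \tag{1}$$

and $\Lambda \equiv [-\infty, \infty] \times \Lambda_1 \times \Lambda_2$. Assume that the two-components mixture prior parametric family is identifiable, i.e., $F_{\lambda^1} \neq F_{\lambda^2}$ if $\lambda^1 \neq \lambda^2$ with $\lambda^1, \lambda^2 \in \Lambda$. When $\omega = \infty$, the two-components mixture prior parametric family is simplified to the first component prior parametric family with $\pi(\theta_t; \lambda) = \pi_1(\theta_t; \lambda_1)$. When $\omega = -\infty$, the two-components mixture prior parametric family is simplified to the second component prior parametric family with $\pi(\theta_t; \lambda) = \pi_2(\theta_t; \lambda_2)$. See, e.g., McLachlan and Peel (2000).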
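The mixture density in (1), with its logistic mixing weight, can be sketched for the binary case (k = 1). The example below is only an illustration under stated assumptions: the two components are taken to be a beta density and a logit-normal density, and the numerical values (Beta(80, 20), mean 1.0 and standard deviation 0.5 on the logit scale) are hypothetical.

```python
import math

def beta_pdf(a, b):
    """Beta(a, b) density on (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return lambda th: math.exp(
        log_norm + (a - 1) * math.log(th) + (b - 1) * math.log(1 - th)
    )

def logit_normal_pdf(mu, sigma):
    """Density of theta when log(theta / (1 - theta)) is N(mu, sigma^2)."""
    def pdf(th):
        z = (math.log(th / (1 - th)) - mu) / sigma
        normal = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
        return normal / (th * (1 - th))  # Jacobian of the logit transform
    return pdf

def mixture_pdf(th, omega, pdf1, pdf2):
    """Two-component mixture with weight exp(omega) / (1 + exp(omega))."""
    w = 1.0 / (1.0 + math.exp(-omega))  # algebraically the same weight
    return w * pdf1(th) + (1.0 - w) * pdf2(th)

# Midpoint-rule check that the mixture integrates to one over (0, 1).
N = 20000
pdf1, pdf2 = beta_pdf(80.0, 20.0), logit_normal_pdf(1.0, 0.5)
total = sum(mixture_pdf((j + 0.5) / N, 0.0, pdf1, pdf2) for j in range(N)) / N
weight_at_zero = 1.0 / (1.0 + math.exp(-0.0))
```

Parameterizing the weight through an unconstrained real number keeps the optimization over the hyperparameters free of box constraints, and the two limits of that number recover the single-component families.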

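For any $\lambda \in \Lambda$, the Kullback-Leibler divergence between the in-control prior c.d.f. $F$ and the prior c.d.f. $F_\lambda$ is defined as

$$d(F, F_\lambda) \equiv \int_\Theta \log\frac{\pi(\theta_t)}{\pi(\theta_t; \lambda)}\,dF(\theta_t) \equiv d(\lambda). \tag{2}$$

By the Jensen inequality,

$$-d(\lambda) = \int_\Theta \log\frac{\pi(\theta_t; \lambda)}{\pi(\theta_t)}\,dF(\theta_t) \leq \log\int_\Theta \frac{\pi(\theta_t; \lambda)}{\pi(\theta_t)}\,\pi(\theta_t)\,d\theta_t = \log\int_{\{\theta_t:\, \pi(\theta_t) > 0\}} \pi(\theta_t; \lambda)\,d\theta_t \leq \log\int_\Theta \pi(\theta_t; \lambda)\,d\theta_t = 0$$

for $\lambda \in \Lambda$, where $d(\lambda) = 0$ if and only if $F = F_\lambda$.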

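Assume that all of the following conditions hold: For $\lambda \in (-\infty, \infty) \times \Lambda_1 \times \Lambda_2$ ($\equiv \Lambda^o$), $\partial^2 d(\lambda)/\partial\lambda\,\partial\lambda^T$ exists,

$$\frac{\partial d(\lambda)}{\partial\lambda} = \int_\Theta \frac{\partial}{\partial\lambda} \log\frac{\pi(\theta_t)}{\pi(\theta_t; \lambda)}\,dF(\theta_t) \equiv -S(\lambda),$$

and

$$\frac{\partial^2 d(\lambda)}{\partial\lambda\,\partial\lambda^T} = \int_\Theta \frac{\partial^2}{\partial\lambda\,\partial\lambda^T} \log\frac{\pi(\theta_t)}{\pi(\theta_t; \lambda)}\,dF(\theta_t) \equiv J(\lambda).$$

Assume that there exists a unique $\lambda^0 \in \Lambda^o$ such that

$$\lambda^0 = \arg\inf_{\lambda \in \Lambda} d(\lambda). \tag{3}$$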

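Then $S(\lambda^0) = 0_{q \times 1}$. Observe that, for $\lambda \in \Lambda^o$,

$$S(\lambda) = \int_\Theta \frac{\partial\pi(\theta_t; \lambda)/\partial\lambda}{\pi(\theta_t; \lambda)}\,dF(\theta_t) \equiv \int_\Theta \frac{\pi_\lambda(\theta_t; \lambda)}{\pi(\theta_t; \lambda)}\,dF(\theta_t) \equiv \int_\Theta S(\lambda; \theta_t)\,dF(\theta_t) \equiv E(S(\lambda; \theta_t); F) \tag{4}$$

and

$$J(\lambda) = \int_\Theta -\frac{\partial S(\lambda; \theta_t)}{\partial\lambda^T}\,dF(\theta_t) = \int_\Theta \left[-\frac{\partial^2\pi(\theta_t; \lambda)/\partial\lambda\,\partial\lambda^T}{\pi(\theta_t; \lambda)} + \frac{\pi_\lambda(\theta_t; \lambda)\,\pi_\lambda^T(\theta_t; \lambda)}{[\pi(\theta_t; \lambda)]^2}\right] dF(\theta_t)$$

$$\equiv \int_\Theta \left[-\frac{\pi_{\lambda\lambda^T}(\theta_t; \lambda)}{\pi(\theta_t; \lambda)} + \frac{\pi_\lambda(\theta_t; \lambda)\,\pi_\lambda^T(\theta_t; \lambda)}{[\pi(\theta_t; \lambda)]^2}\right] dF(\theta_t) \equiv \int_\Theta J(\lambda; \theta_t)\,dF(\theta_t) \equiv E(J(\lambda; \theta_t); F). \tag{5}$$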

For 2 ^o, set

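One way to evaluate $\lambda^0$ is to iterate the following procedure until $\lambda^{(v)}$ converges to $\lambda^0$: First choose a good initial value $\lambda^{(0)} \in \Lambda^o$ for $\lambda^0$. Next, set

$$\tilde\lambda^{(v+1)} \equiv \lambda^{(v)} + J^{-1}(\lambda^{(v)})\,S(\lambda^{(v)}) \tag{7}$$

when $\lambda^{(v)}$ is defined for $v \in \{0, 1, 2, \ldots\}$. If $\tilde\lambda^{(v+1)} \in \Lambda^o$ and $d(\tilde\lambda^{(v+1)}) \leq d(\lambda^{(v)})$, set $\lambda^{(v+1)} \equiv \tilde\lambda^{(v+1)}$; otherwise, set

$$\lambda^{(u,v+1)} \equiv \lambda^{(v)} + \frac{1}{2^u}\,K^{-1}(\lambda^{(v)})\,S(\lambda^{(v)}) \tag{8}$$

for $u \in \{0, 1, 2, \ldots\}$ and set $\lambda^{(v+1)} \equiv \lambda^{(m_{v+1},\,v+1)}$, where $m_{v+1} \equiv \min\{u: u \in \{0, 1, 2, \ldots\},\ \lambda^{(u,v+1)} \in \Lambda^o,\ \lambda^{(u+1,v+1)} \in \Lambda^o, \text{ and } d(\lambda^{(u,v+1)}) < \min\{d(\lambda^{(v)}),\, d(\lambda^{(u+1,v+1)})\}\}$.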
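The iteration above, a Newton-type step with a step-halving fallback along a direction scaled by a positive definite matrix, can be sketched in one dimension. This is a toy illustration under stated assumptions, not the paper's implementation: the objective d(x) = log cosh(x) stands in for the divergence, the constant K = 1 stands in for the positive definite scaling, and the whole real line stands in for the open parameter set.

```python
import math

def minimise(d, score, J, K, lam, tol=1e-10, max_iter=100, max_halvings=60):
    """Newton-type minimisation of d with a step-halving fallback.

    score(lam) is minus the derivative of d, J(lam) its second derivative,
    and K(lam) a positive scaling used only in the fallback step.
    """
    for _ in range(max_iter):
        s = score(lam)
        if abs(s) < tol:
            break
        trial = lam + s / J(lam)  # Newton-type trial point
        if d(trial) <= d(lam):
            lam = trial
            continue
        # Fallback: halve the scaled step until d drops below both the
        # current value and the value at the next, shorter step.
        step = s / K(lam)
        for u in range(max_halvings):
            cur = lam + step / 2**u
            nxt = lam + step / 2 ** (u + 1)
            if d(cur) < min(d(lam), d(nxt)):
                lam = cur
                break
        else:
            break  # no acceptable step found; stop rather than loop forever
    return lam

d = lambda x: math.log(math.cosh(x))
score = lambda x: -math.tanh(x)           # minus the first derivative of d
J = lambda x: 1.0 / math.cosh(x) ** 2     # second derivative of d
K = lambda x: 1.0                         # hypothetical positive scaling
result = minimise(d, score, J, K, lam=2.0)
```

Starting from 2.0, the raw Newton-type step overshoots and increases the objective, so the first iteration takes the fallback branch; later iterations accept the trial point directly.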

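Note that, by the Taylor series expansion, we obtain

$$d(\lambda^{(u,v+1)}) = d(\lambda^{(v)}) - S^T(\lambda^{(v)})\,(\lambda^{(u,v+1)} - \lambda^{(v)}) + O\!\left(\|\lambda^{(u,v+1)} - \lambda^{(v)}\|^2\right) = d(\lambda^{(v)}) - \frac{1}{2^u}\,S^T(\lambda^{(v)})\,K^{-1}(\lambda^{(v)})\,S(\lambda^{(v)}) + O\!\left(\frac{1}{2^{2u}}\right)$$

as $u \to \infty$ for any fixed non-negative integer $v$. Since $S^T(\lambda^{(v)})\,K^{-1}(\lambda^{(v)})\,S(\lambda^{(v)}) > 0$ for any fixed non-negative integer $v$, $d(\lambda^{(u,v+1)})$ is a strictly increasing function of $u$ for large $u$ with limit $d(\lambda^{(v)})$, which implies that $m_{v+1}$ is well-defined. Thus, $d(\lambda^{(v)})$ is a decreasing function of $v$, i.e., $d(\lambda^{(0)}) \geq d(\lambda^{(1)}) \geq d(\lambda^{(2)}) \geq \cdots$.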

When any of $d(\lambda)$, $S(\lambda)$, $J(\lambda)$, and $K(\lambda)$ does not have a closed-form formula, we may first simulate an independent and identically distributed (i.i.d.) sample $\{\theta_t^{(1)}, \ldots, \theta_t^{(R)}\}$ of size $R$, e.g., $R = 50\,000$, from the in-control prior c.d.f. $F$ and then numerically evaluate $d(\lambda)$, $S(\lambda)$, $J(\lambda)$, and $K(\lambda)$ by the corresponding sample averages over the simulated values.
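The simulation-based evaluation can be sketched for the divergence itself: draw from the in-control prior and average the log density ratio. The example assumes, purely for illustration, that the in-control prior is Beta(80, 20) and that the candidate density is either the same beta or Beta(60, 40); the function names, the seed, and the smaller sample size R = 20 000 are hypothetical choices.

```python
import math
import random

def beta_log_pdf(a, b):
    """Log density of Beta(a, b) on (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return lambda th: log_norm + (a - 1) * math.log(th) + (b - 1) * math.log(1 - th)

def mc_kl_divergence(draws, log_pdf_true, log_pdf_model):
    """Sample average of log[pi(theta) / pi(theta; hyperparameter)]."""
    return sum(log_pdf_true(th) - log_pdf_model(th) for th in draws) / len(draws)

rng = random.Random(2024)  # fixed seed so the sketch is reproducible
R = 20000
draws = [rng.betavariate(80.0, 20.0) for _ in range(R)]  # sample from F

d_same = mc_kl_divergence(draws, beta_log_pdf(80, 20), beta_log_pdf(80, 20))
d_other = mc_kl_divergence(draws, beta_log_pdf(80, 20), beta_log_pdf(60, 40))
```

The same draws can be reused for every candidate hyperparameter value, which keeps the Monte Carlo objective a smooth function of the hyperparameters during the iteration.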

3

AN EMPIRICAL BAYES APPROACH

Let $t$ be any positive integer. Suppose that there are $n_t$ tested product items manufactured at time $t$, where $n_t$ is a known positive integer. For $i \in \{1, \ldots, k\}$, let $y_{it}$ denote the number of the tested product items which have the $i$th defect type among the $n_t$ tested product items manufactured at time $t$. Then $n_t - \sum_{i=1}^{k} y_{it}$ ($\equiv y_{k+1,t}$) is the number of the tested product items which pass the test among the $n_t$ tested product items manufactured at time $t$. Set $y_t \equiv (y_{1t}, \ldots, y_{kt})^T$ and $Y_{n_t} \equiv \{y_t: y_{1t}, \ldots, y_{kt} \in \{0, 1, \ldots, n_t\} \text{ and } \sum_{i=1}^{k} y_{it} \leq n_t\}$. In the paper, $y_t$ is called the (observed) response vector at time $t$.

At each time $t$, assume that the response vector $y_t$ given the random parameter vector $\theta_t$ is distributed as either the conditional binomial($n_t$, $\theta_t$) distribution for $k = 1$ or the conditional multinomial($n_t$, $\theta_t$) distribution for $k \geq 2$, denoted by $y_t \mid \theta_t \sim$ binomial($n_t$, $\theta_t$) for $k = 1$ or multinomial($n_t$, $\theta_t$) for $k \geq 2$. Let $F_{y_t \mid \theta_t}$ denote the conditional c.d.f. of $y_t$ given $\theta_t$ with p.m.f.

$$f(y_t \mid \theta_t) = 1_{Y_{n_t}}(y_t)\; n_t! \prod_{i=1}^{k+1} \frac{\theta_{it}^{\,y_{it}}}{y_{it}!}.$$
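The conditional p.m.f. above is the standard binomial/multinomial p.m.f. with the pass category implied by the remaining count. A minimal sketch, with hypothetical function names, that also enumerates the sample space for small n and k:

```python
import math
from itertools import product

def multinomial_pmf(y, n, theta):
    """P.m.f. of defect counts y (length k) given probabilities theta (length k).

    The (k+1)th category, pass, has count n - sum(y) and probability
    1 - sum(theta); theta is assumed to lie in the open simplex.
    """
    if any(v < 0 for v in y) or sum(y) > n:
        return 0.0  # indicator of the sample space
    counts = list(y) + [n - sum(y)]
    probs = list(theta) + [1.0 - sum(theta)]
    log_p = math.lgamma(n + 1)
    for c, p in zip(counts, probs):
        log_p += c * math.log(p) - math.lgamma(c + 1)
    return math.exp(log_p)

def sample_space(n, k):
    """All count vectors of length k with non-negative entries summing to at most n."""
    return [y for y in product(range(n + 1), repeat=k) if sum(y) <= n]

space = sample_space(5, 2)
total = sum(multinomial_pmf(y, 5, (0.1, 0.2)) for y in space)
binary_example = multinomial_pmf((1,), 3, (0.2,))
```

For n = 5 and k = 2 the enumeration has (n + k)!/(n! k!) = 21 elements, and the p.m.f. sums to one over them.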

For $\lambda_u \in \Lambda_u$ and $u \in \{1, 2\}$, assume that

In the paper, it is assumed that the in-control prior c.d.f. $F = F_{\lambda^0}$ for some unique $\lambda^0 \in \Lambda$. Then $d(\lambda^0) = 0$. Assume that there are available historical in-control response vectors $\{y_1, y_2, \ldots, y_T\}$ generated in the manufacturing process for some known large positive integer $T$, where $(\theta_1^T, y_1^T)^T, (\theta_2^T, y_2^T)^T, \ldots, (\theta_T^T, y_T^T)^T$ are independent $2k \times 1$ random vectors. Set $\theta \equiv (\theta_1^T, \theta_2^T, \ldots, \theta_T^T)^T$, $y \equiv (y_1^T, y_2^T, \ldots, y_T^T)^T$, and $Y \equiv Y_{n_1} \times Y_{n_2} \times \cdots \times Y_{n_T}$, where $\theta$ and $y$ are, respectively, called the historical in-control (unobserved) random vector and the historical in-control (observed) response vector in the paper. Let $F_{y;\lambda^0}$ denote the marginal c.d.f. of $y$

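Given the historical in-control response vector $y$, the log-likelihood function for $\lambda$ is

$$\ell(\lambda; y) \equiv \sum_{t=1}^{T} \ell(\lambda; y_t), \qquad \ell(\lambda; y_t) \equiv \log\int_\Theta f(y_t \mid \theta_t)\,\pi(\theta_t; \lambda)\,d\theta_t,$$

the score function for $\lambda$ is

$$S(\lambda; y) \equiv \frac{\partial\ell(\lambda; y)}{\partial\lambda},$$

and the observed (Fisher) information for $\lambda$ is

$$J(\lambda; y) \equiv -\frac{\partial S(\lambda; y)}{\partial\lambda^T}.$$

Then $K(\lambda; y)$ is a non-negative definite covariance matrix for $\lambda \in \Lambda^o$ and $y \in Y$.

For large $T$, $K(\lambda; y)$ is in general a positive definite covariance matrix for $\lambda \in \Lambda^o$ and $y \in Y$.

Observe that, for $\lambda \in \Lambda^o$,

The maximum likelihood estimator (MLE) $\hat\lambda(y)$ ($\equiv \hat\lambda$) of $\lambda$ solves the score equation $S(\lambda; y) = 0_{q \times 1}$ for $\lambda$. That is, $S(\lambda; y)|_{\lambda = \hat\lambda}$ ($\equiv S(\hat\lambda; y)$) $= 0_{q \times 1}$.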

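One way to evaluate $\hat\lambda$ is to iterate the following procedure until $\lambda^{(v)}$ converges to $\hat\lambda$: First choose a good initial value $\lambda^{(0)} \in \Lambda^o$ for $\hat\lambda$. Next, set

$$\tilde\lambda^{(v+1)} \equiv \lambda^{(v)} + J^{-1}(\lambda^{(v)}; y)\,S(\lambda^{(v)}; y) \tag{24}$$

when $\lambda^{(v)}$ is defined for $v \in \{0, 1, 2, \ldots\}$. If $\tilde\lambda^{(v+1)} \in \Lambda^o$ and $\ell(\tilde\lambda^{(v+1)}; y) \geq \ell(\lambda^{(v)}; y)$, set $\lambda^{(v+1)} \equiv \tilde\lambda^{(v+1)}$; otherwise, set $\lambda^{(u,v+1)} \equiv \lambda^{(v)} + 2^{-u} K^{-1}(\lambda^{(v)}; y)\,S(\lambda^{(v)}; y)$ for $u \in \{0, 1, 2, \ldots\}$ and set $\lambda^{(v+1)} \equiv \lambda^{(m_{v+1},\,v+1)}$, where $m_{v+1}$ is the smallest $u$ such that $\lambda^{(u,v+1)}, \lambda^{(u+1,v+1)} \in \Lambda^o$ and $\ell(\lambda^{(u,v+1)}; y) > \max\{\ell(\lambda^{(v)}; y),\, \ell(\lambda^{(u+1,v+1)}; y)\}$.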

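Note that, by the Taylor series expansion, we obtain

$$\ell(\lambda^{(u,v+1)}; y) = \ell(\lambda^{(v)}; y) + \frac{1}{2^u}\,S^T(\lambda^{(v)}; y)\,K^{-1}(\lambda^{(v)}; y)\,S(\lambda^{(v)}; y) + O\!\left(\frac{1}{2^{2u}}\right)$$

as $u \to \infty$ for any fixed non-negative integer $v$. Since $S^T(\lambda^{(v)}; y)\,K^{-1}(\lambda^{(v)}; y)\,S(\lambda^{(v)}; y) > 0$ for any fixed non-negative integer $v$, $\ell(\lambda^{(u,v+1)}; y)$ is a strictly decreasing function of $u$ for large $u$ with limit $\ell(\lambda^{(v)}; y)$, which implies that $m_{v+1}$ is well-defined. Thus, $\ell(\lambda^{(v)}; y)$ is an increasing function of $v$, i.e., $\ell(\lambda^{(0)}; y) \leq \ell(\lambda^{(1)}; y) \leq \ell(\lambda^{(2)}; y) \leq \cdots$.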

When any of $\ell(\lambda; y)$, $S(\lambda; y)$, $J(\lambda; y)$, and $K(\lambda; y)$ does not have a closed-form formula, we may numerically evaluate any of them as follows: First, for $u \in \{1, 2\}$, simulate an i.i.d. sample $\{\theta_1^{(u,1)}, \ldots, \theta_1^{(u,R)}\}$ of size $R$, e.g., $R = 50\,000$,

$\hat f_{u,\lambda_u}(y_t; \lambda_u)$ and $\hat f_{\lambda\lambda^T}(y_t; \lambda)$, respectively, utilizing their closed-form formulae. Finally, numerically evaluate $\ell(\lambda; y)$, $S(\lambda; y)$, $J(\lambda; y)$, and $K(\lambda; y)$ by

4

AN EXAMPLE

For illustration of the proposed methodology, the first component prior parametric family is chosen as the family of all beta/Dirichlet distributions because it is a conjugate family of binomial/multinomial distributions. The second component prior parametric family is chosen as the family of all transformed normal distributions (defined below) because it is a rich family of distributions, offering important distribution shapes that cannot be achieved within the family of all beta/Dirichlet distributions. See, e.g., O'Hagan and Forster (2004, Chapter 12).

4.1

The First Component Prior Parametric Family

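Let the first component prior parametric family $\{F_{1,\lambda_1}: \lambda_1 \in \Lambda_1\}$ denote the family of all beta/Dirichlet distributions, where $\lambda_1 \equiv (\lambda_{11}, \ldots, \lambda_{1,k+1})^T$ ($\equiv (\lambda_{11}, \ldots, \lambda_{1q_1})^T$), $\Lambda_1 \equiv R^{k+1}$, and $F_{1,\lambda_1}$ has p.d.f.

$$\pi_1(\theta_t; \lambda_1) = 1_\Theta(\theta_t)\,\frac{\Gamma\!\left[\sum_{i=1}^{k+1} \exp(\lambda_{1i})\right]}{\prod_{i=1}^{k+1} \Gamma[\exp(\lambda_{1i})]} \prod_{i=1}^{k+1} \theta_{it}^{\exp(\lambda_{1i}) - 1} \equiv 1_\Theta(\theta_t)\,\frac{\Gamma[\exp(\lambda_{10})]}{\prod_{i=1}^{k+1} \Gamma[\exp(\lambda_{1i})]} \prod_{i=1}^{k+1} \theta_{it}^{\exp(\lambda_{1i}) - 1} \tag{33}$$

with $1_\Theta(\theta_t) = 1$ for $\theta_t \in \Theta$ and 0 otherwise. Since $\{F_{1,\lambda_1}: \lambda_1 \in \Lambda_1\}$ is chosen as a conjugate family of binomial/multinomial distributions, all of $f_1(y_t; \lambda_1)$, $f_{1,\lambda_1}(y_t; \lambda_1)$, and $f_{1,\lambda_1\lambda_1^T}(y_t; \lambda_1)$ have closed-form formulae for $\lambda_1 \in \Lambda_1$ as follows: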

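For $\lambda_1 \in \Lambda_1$, it follows from Johnson et al. (1997, pages 80 and 81) that

Thus, for $y_t \in Y_{n_t}$ and $\lambda_1 \in \Lambda_1$,

$$f_{1,\lambda_1}(y_t; \lambda_1) = f_1(y_t; \lambda_1)\,S_1(\lambda_1; y_t) \tag{35}$$

and

$$f_{1,\lambda_1\lambda_1^T}(y_t; \lambda_1) = f_1(y_t; \lambda_1)\left[S_1(\lambda_1; y_t)\,S_1^T(\lambda_1; y_t) - J_1(\lambda_1; y_t)\right]. \tag{36}$$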
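For a beta/Dirichlet prior, the marginal p.m.f. of the counts has a well-known closed form, the beta-binomial/Dirichlet-multinomial p.m.f. The sketch below assumes that standard form together with the log-parameterization used in this section (each concentration parameter is the exponential of a hyperparameter entry); the function name is hypothetical, and this excerpt does not reproduce the exact expression cited from Johnson et al. (1997).

```python
import math

def dirichlet_multinomial_pmf(y, n, log_a):
    """Marginal p.m.f. of defect counts y under a Dirichlet prior.

    y     : counts for the k defect types (the pass count is n - sum(y))
    log_a : k+1 log concentration parameters, the last one for 'pass'
    """
    if any(v < 0 for v in y) or sum(y) > n:
        return 0.0
    a = [math.exp(v) for v in log_a]
    a0 = sum(a)
    counts = list(y) + [n - sum(y)]
    log_p = math.lgamma(n + 1) + math.lgamma(a0) - math.lgamma(a0 + n)
    for c, ai in zip(counts, a):
        log_p += math.lgamma(ai + c) - math.lgamma(ai) - math.lgamma(c + 1)
    return math.exp(log_p)

# Binary illustration with a beta(85, 15) prior and n = 10 tested items.
n = 10
log_a = [math.log(85.0), math.log(15.0)]
pmf = [dirichlet_multinomial_pmf((y,), n, log_a) for y in range(n + 1)]
total = sum(pmf)
mean = sum(y * p for y, p in enumerate(pmf))
```

Working with log-gamma values rather than gamma values avoids overflow when n or the concentration parameters are in the hundreds, as in the simulation settings of this paper.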

4.2

The Second Component Prior Parametric Family

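Let the second component prior parametric family $\{F_{2,\lambda_2}: \lambda_2 \in \Lambda_2\}$ denote the family of all transformed normal distributions defined as follows: Set $(\log(\theta_{1t}/\theta_{k+1,t}), \ldots, \log(\theta_{kt}/\theta_{k+1,t}))^T \equiv \eta_t$ ($\equiv (\eta_{1t}, \ldots, \eta_{kt})^T$). Then, for $i \in \{1, \ldots, k\}$, $\theta_{it} = \exp(\eta_{it})/[1 + \sum_{i'=1}^{k} \exp(\eta_{i't})]$. Let $N(\mu, \Sigma)$ denote the $k$-variate normal distribution with mean vector $\mu$ ($\equiv (\mu_1, \ldots, \mu_k)^T$) $\in R^k$ and $k \times k$ positive definite covariance matrix $\Sigma$ ($\equiv (\sigma_{ii'})$). Set $\Sigma^{-1} \equiv (\sigma^{ii'})$ and $R \equiv (\sigma^{ii'}/\sqrt{\sigma^{ii}\sigma^{i'i'}})$ ($\equiv (\rho_{ii'})$). Then

$$\Sigma^{-1} = \mathrm{diag}\{\sqrt{\sigma^{11}}, \ldots, \sqrt{\sigma^{kk}}\}\; R\; \mathrm{diag}\{\sqrt{\sigma^{11}}, \ldots, \sqrt{\sigma^{kk}}\}.$$

Set

$$\lambda_2 \equiv \left(\mu^T,\ \log\sigma^{11}, \ldots, \log\sigma^{kk},\ \log\frac{1 + \rho_{12}}{1 - \rho_{12}}, \ldots, \log\frac{1 + \rho_{1k}}{1 - \rho_{1k}}, \ldots, \log\frac{1 + \rho_{k-1,k}}{1 - \rho_{k-1,k}}\right)^T$$

$$\equiv (\lambda_{21}, \ldots, \lambda_{2,k(k+3)/2})^T \equiv (\lambda_{21}, \ldots, \lambda_{2q_2})^T \in \Lambda_2,$$

where $\Lambda_2 \equiv \{\lambda_2: \mu \in R^k \text{ and } \Sigma \text{ is a } k \times k \text{ positive definite covariance matrix}\}$. distribution, denoted by $\theta_t \sim F_{2,\lambda_2}$, with p.d.f.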

2( _t; ₂) = ( _t; ₂) det @ _t

and formula, we may numerically evaluate all of them as follows: First simulate an i.i.d. sample f ^(2;1)1 ; : : : ; ^(2;R)₁ g of size R, e.g., R = 50 000, from the prior c.d.f. F^2; 2

f^_2; ₂(y_t; ₂)

2(y_t; ₂) is to utilize the multivariate Gauss-Hermite quadrature, e.g., see Fahrmeir and Tutz (2001, pages 447-449). All of nodes and weights of the Hermite polynomial of 32 degrees are shown in the appendix for the multivariate Gauss-Hermite quadrature.
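For the binary case, the Gauss-Hermite idea can be sketched as follows: the marginal p.m.f. under a transformed normal prior is an integral of the binomial p.m.f. against a normal density on the logit scale, which a Gauss-Hermite rule approximates by a weighted sum over nodes. The function name and the numerical inputs below are hypothetical; the nodes and weights come from NumPy's `hermgauss` rather than from the appendix table.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermgauss

def logit_normal_binomial_pmf(y, n, mu, sigma, degree=32):
    """Approximate P(y) when logit(theta) ~ N(mu, sigma^2) and y | theta ~ binomial."""
    # Nodes/weights for integrals of the form int exp(-x^2) g(x) dx.
    nodes, weights = hermgauss(degree)
    # Change of variables eta = mu + sqrt(2) * sigma * x maps N(mu, sigma^2)
    # onto the exp(-x^2) weight, leaving a factor 1 / sqrt(pi).
    eta = mu + math.sqrt(2.0) * sigma * nodes
    theta = 1.0 / (1.0 + np.exp(-eta))
    lik = math.comb(n, y) * theta**y * (1.0 - theta) ** (n - y)
    return float(np.sum(weights * lik) / math.sqrt(math.pi))

n = 20
pmf = [logit_normal_binomial_pmf(y, n, mu=1.0, sigma=0.5) for y in range(n + 1)]
total = sum(pmf)
```

Because the binomial p.m.f. sums to one at every node and the weights sum to the square root of pi, the approximate p.m.f. sums to one up to rounding, which gives a quick sanity check on the change of variables.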

In the paper, a simulation study is conducted for the following four cases where

F = F ⁰ = exp(!⁰)

1 is the beta(85; 15) distribution, and F_2; ⁰

2 is the transformed-normal( 0:716; (0:214)²) distribution.

Case 2: ⁰ = (log(1); log(80); log(20); 0:410; log[1=(0:205)²])^T. In particular, exp(!⁰)=[1 + exp(!⁰)] = 1=2, F is the beta(80; 20) distribution,and F is the

transformed-normal( 0:410; (0:205)²) distribution.

Case 3: ⁰ = (log(1); log(60); log(40); 1:405; log[1=(0:253)²])^T. In particular, exp(!⁰)=[1 + exp(!⁰)] = 1=2, F_1; ⁰₁ is the beta(60; 40) distribution, and F_2; ⁰₂ is the transformed-normal( 1:405; (0:253)²) distribution.

Case 4: ⁰ = (log(5); log(73); log(27); 0:203; log[1=(0:202)²])^T. In particular, exp(!⁰)=[1 + exp(!⁰)] = 5=6, F_1; ⁰

1 is the beta(73; 27) distribution, and F_2; ⁰

2 is the transformed-normal( 0:203; (0:202)²) distribution.
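The hyperparameter vectors listed in the cases above are on a transformed scale: the first entry determines the mixing weight through a logistic map, the next two are logs of the beta parameters, and the last is the log of the reciprocal variance of the transformed normal component. A small sketch of decoding those sign-free entries for k = 1, using the Case 4 values as printed; the function name is hypothetical, and the normal mean entry is deliberately left out of the example.

```python
import math

def decode_k1(omega, log_a1, log_a2, log_precision):
    """Map transformed hyperparameters (k = 1) back to natural quantities."""
    weight = math.exp(omega) / (1.0 + math.exp(omega))  # weight of component 1
    a1, a2 = math.exp(log_a1), math.exp(log_a2)          # beta parameters
    sd = math.exp(-0.5 * log_precision)                  # normal standard deviation
    return weight, a1, a2, sd

weight, a1, a2, sd = decode_k1(
    omega=math.log(5.0),
    log_a1=math.log(73.0),
    log_a2=math.log(27.0),
    log_precision=math.log(1.0 / 0.202**2),
)
```

The decoded weight is 5/6, matching the value stated for Case 4, and the decoded standard deviation is 0.202.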

5

GOODNESS OF FIT

In this section, the goodness of fit of the proposed model for a set of available historical in-control response vectors, $\{y_1, \ldots, y_T\}$, generated in a manufacturing process is discussed. Recall that $\theta \equiv (\theta_1^T, \ldots, \theta_T^T)^T$, $y \equiv (y_1^T, \ldots, y_T^T)^T$, $Y \equiv Y_{n_1} \times \cdots \times Y_{n_T}$, and $F$ is the in-control prior c.d.f.

Consider the null hypothesis $H_0$: $\theta_1, \ldots, \theta_T \overset{i.i.d.}{\sim} F \in \{F_\lambda: \lambda \in \Lambda\}$ versus the alternative $H_1$: $\theta_1, \ldots, \theta_T \overset{i.i.d.}{\sim} F \notin \{F_\lambda: \lambda \in \Lambda\}$. Let $\mathcal{F}(\Theta)$ denote the set of all prior c.d.f.'s on $\Theta$ and let $\ell(F; y)$ denote the log-likelihood function of $F$ given $y$.

Then

$$\ell(F; y) \equiv \log\left[\prod_{t=1}^{T} f(y_t; F)\right] = \sum_{t=1}^{T} \log[f(y_t; F)] \equiv \sum_{t=1}^{T} \ell(F; y_t),$$

where

$$f(y_t; F) = \int_\Theta f(y_t \mid \theta_t)\,dF(\theta_t).$$

Let WT(y) denote the corresponding likelihood ratio (LR) statistic given y.

Then

where $\hat F$ is the non-parametric MLE of $F$ given $y$ under $H_1$ and $\hat\lambda$ is the parametric MLE of $\lambda$ under $H_0$. Since it takes too much time to calculate the critical point for performing the LR test, an alternative goodness-of-fit test is proposed in the paper as follows:

Note that the empirical prior c.d.f. $\tilde F$ with p.m.f. $T^{-1} \sum_{t=1}^{T} 1_{\{\theta_t\}}$ converges to $F$ in distribution as $T \to \infty$ and that, for $t \in \{1, \ldots, T\}$, the MLE $y_t/n_t$ of $\theta_t$ given $y_t$ converges to $\theta_t$ as $n_t \to \infty$. Since $\theta_1, \ldots, \theta_T$ are unobserved, the empirical prior c.d.f. $\tilde F$ is unavailable. Thus, we utilize the estimated empirical prior c.d.f. $\bar F$ with p.m.f. $T^{-1} \sum_{t=1}^{T} 1_{\{y_t/n_t\}}$ to estimate $F$. When all of $n_1, \ldots, n_T$, and $T$ tend to $\infty$, $\bar F$ converges to $F$ in distribution.

In the paper, consider the goodness-of-fit statistic

$$W_T^*(y) \equiv 2\left[\ell(F; y)|_{F = \bar F} - \ell(\hat\lambda; y)\right] \equiv 2\left[\ell(\bar F; y) - \ell(\hat\lambda; y)\right]. \tag{47}$$

One way to calculate the critical point for performing the goodness-of-fit test is as follows: First simulate an i.i.d. sample $\{y^{(1)}, \ldots, y^{(R)}\}$, e.g., $R = 50\,000$, from the estimated in-control marginal c.d.f. $F_{y;\lambda^0}|_{\lambda^0 = \hat\lambda}$ ($\equiv F_{y;\hat\lambda}$). Let $(y_{(1)}, \ldots, y_{(R)})$ be a permutation of $(y^{(1)}, \ldots, y^{(R)})$ such that $W_T^*(y_{(1)}) \leq \cdots \leq W_T^*(y_{(R)})$. Let $\alpha$ be a known constant with $0 < \alpha < 1$, e.g., 0.05. An approximate size $\alpha$ goodness-of-fit test is to reject $H_0$ if and only if $W_T^*(y) > W_T^*(y_{([R(1-\alpha)])})$, where $[R(1-\alpha)]$ is the largest integer less than or equal to $R(1-\alpha)$.
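The simulated critical point is simply an order statistic of the simulated test statistics. A minimal sketch, assuming the simulated statistic values are already available as a list; the function names and the toy values are hypothetical.

```python
import math

def simulated_critical_value(stats, alpha):
    """Return the [R(1 - alpha)]-th smallest simulated statistic (1-indexed)."""
    ordered = sorted(stats)
    r = math.floor(len(ordered) * (1.0 - alpha))
    if r < 1:
        raise ValueError("alpha too close to 1 for this number of simulations")
    return ordered[r - 1]

def reject(observed, stats, alpha):
    """Reject the null hypothesis when the observed statistic exceeds the critical value."""
    return observed > simulated_critical_value(stats, alpha)

# Toy illustration: 20 simulated values and a test of approximate size 0.25.
toy_stats = [float(v) for v in range(20, 0, -1)]
critical = simulated_critical_value(toy_stats, alpha=0.25)
```

In the toy illustration the critical value is the 15th smallest of the 20 values; with R = 50 000 and a size of 0.05 the same function would return the 47 500th smallest value.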

The corresponding values of $W_T^*(y_{([R(1-\alpha)])})$ for Cases 1-4 in Section 4 are shown in Table 1, where $k = 1$, $T = 300$, $n_1 = \cdots = n_T = 300$, $R = 50\,000$, and $\alpha = 0.05$. The empirical c.d.f.'s of $W_T^*(y)$ for Cases 1-4 in Section 4 are shown in Figure 1, where $k = 1$, $T = 300$, $n_1 = \cdots = n_T = 300$, and $R = 50\,000$.

Table 1: The values of $W_T^*(y_{([R(1-\alpha)])})$ for Cases 1-4, where $k = 1$, $T = 300$, $n_1 = \cdots = n_T = n_t = 300$, $R = 50\,000$, and $\alpha = 0.05$.

                                  Case 1   Case 2   Case 3   Case 4
$W_T^*(y_{([R(1-\alpha)])})$      18.1     4.90     12.7     1.78

Figure 1: The empirical c.d.f.'s of $W_T^*$ for Cases 1-4, where $k = 1$, $T = 300$, $n_1 = \cdots = n_T = n_t = 300$, and $R = 50\,000$.

[Figure 1 plot: empirical c.d.f. (vertical axis, 0.0 to 1.0) against $W_T^*$ (horizontal axis, -5.0 to 25.0), one curve per case; the 0.95 level is marked at 18.1 (Case 1), 4.90 (Case 2), 12.7 (Case 3), and 1.78 (Case 4).]

6

SIMPLIFICATION

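In this section, the simplification of the two-components mixture prior parametric family to either the first or the second component prior parametric family is discussed if the null hypothesis of the previous goodness-of-fit test is not rejected.

Let $u \in \{1, 2\}$ be fixed. Consider the null hypothesis $H_{u0}$: $\theta_1, \ldots, \theta_T \overset{i.i.d.}{\sim} F \in \{F_{u,\lambda_u}: \lambda_u \in \Lambda_u\}$ versus the alternative $H_{u1}$: $\theta_1, \ldots, \theta_T \overset{i.i.d.}{\sim} F \in \{F_\lambda: \lambda \in \Lambda\}$.

Let $W_{u,T}(y)$ denote the LR statistic given $y$, where

$$W_{u,T}(y) \equiv 2\left[\ell(\hat\lambda; y) - \sup_{\lambda_u \in \Lambda_u} \sum_{t=1}^{T} \ell_u(\lambda_u; y_t)\right] \equiv 2\left[\ell(\hat\lambda; y) - \sup_{\lambda_u \in \Lambda_u} \ell_u(\lambda_u; y)\right] \equiv 2\left[\ell(\hat\lambda; y) - \ell_u(\hat\lambda_u; y)\right] \tag{48}$$

with $\hat\lambda_u$ denoting the MLE of $\lambda_u$ given $y$ under the $u$th component prior parametric family.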

One way to calculate the critical point for performing the LR test is as follows:

First simulate $\{y^{(u,1)}, \ldots, y^{(u,R)}\}$, e.g., $R = 50\,000$, from the estimated in-control marginal c.d.f. $F_{y;u,\lambda_u^0}|_{\lambda_u^0 = \hat\lambda_u}$ ($\equiv F_{y;u,\hat\lambda_u}$). Let $(y^{(u)}_{(1)}, \ldots, y^{(u)}_{(R)})$ be a permutation of $(y^{(u,1)}, \ldots, y^{(u,R)})$ such that $W_{u,T}(y^{(u)}_{(1)}) \leq \cdots \leq W_{u,T}(y^{(u)}_{(R)})$. Let $\alpha$ be a known constant with $0 < \alpha < 1$, e.g., 0.05. An approximate size $\alpha$ LR test is to reject $H_{u0}$ if and only if $W_{u,T}(y) > W_{u,T}(y^{(u)}_{([R(1-\alpha)])})$, where $[R(1-\alpha)]$ is the largest integer less than or equal to $R(1-\alpha)$.

When both $H_{10}$ and $H_{20}$ are rejected, the proposed two-components mixture prior parametric family for the in-control prior distribution is selected. The corresponding monitoring technique is developed in the following section.

When $H_{10}$ is not rejected but $H_{20}$ is rejected, the first component prior parametric family for the in-control prior distribution is selected. The corresponding monitoring technique is developed in Chen et al. (2004).

When $H_{10}$ is rejected but $H_{20}$ is not rejected, the second component prior parametric family for the in-control prior distribution is selected. The corresponding monitoring technique is developed in Chen et al. (2005).

When neither $H_{10}$ nor $H_{20}$ is rejected, the model selection technique developed in Chen and Liu (2005) can be utilized. The corresponding monitoring technique is developed in either Chen et al. (2004) or Chen et al. (2005).
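The four outcomes of the two simplification tests can be summarized as a small decision function. This is only a restatement of the rules above in code form; the function name, the returned labels, and the observed statistic values in the usage lines are hypothetical, while the two critical values are the Case 1 entries reported in Table 2.

```python
def select_prior_family(w1, crit1, w2, crit2):
    """Choose the in-control prior family from the two LR tests.

    w1, w2       : observed LR statistics for the first and second component tests
    crit1, crit2 : the corresponding simulated critical values
    """
    reject_first = w1 > crit1    # first component family alone is rejected
    reject_second = w2 > crit2   # second component family alone is rejected
    if reject_first and reject_second:
        return "two-components mixture"
    if not reject_first and reject_second:
        return "first component"
    if reject_first and not reject_second:
        return "second component"
    return "either component (apply model selection)"

# Hypothetical observed statistics against the Case 1 critical values.
choice_a = select_prior_family(3.0, 2.146, 2.0, 1.035)
choice_b = select_prior_family(1.0, 2.146, 2.0, 1.035)
choice_c = select_prior_family(1.0, 2.146, 0.5, 1.035)
```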

The corresponding values of $W_{u,T}(y^{(u)}_{([R(1-\alpha)])})$ for Cases 1-4 in Section 4 are shown in Table 2, where $u \in \{1, 2\}$, $k = 1$, $T = 300$, $n_1 = \cdots = n_T = 300$, $R = 50\,000$, and $\alpha = 0.05$. The empirical c.d.f.'s of $W_{1,T}(y)$ and $W_{2,T}(y)$ for Cases 1-4 in Section 4 are shown in Figures 2 and 3, where $k = 1$, $T = 300$, $n_1 = \cdots = n_T = 300$, and $R = 50\,000$.

Table 2: The values of $W_{u,T}(y^{(u)}_{([R(1-\alpha)])})$ for Cases 1-4, where $u \in \{1, 2\}$, $k = 1$, $T = 300$, $n_1 = \cdots = n_T = n_t = 300$, $R = 50\,000$, and $\alpha = 0.05$.

                                          Case 1   Case 2   Case 3   Case 4
$W_{1,T}(y^{(1)}_{([R(1-\alpha)])})$      2.146    1.762    0.566    1.284
$W_{2,T}(y^{(2)}_{([R(1-\alpha)])})$      1.035    0.653    1.789    0.335

Figure 2: The empirical c.d.f.'s of $W_{1,T}$ for Cases 1-4, where $k = 1$, $T = 300$,

7

A PROCESS MONITORING SCHEME

Let $P_{in}$ denote the false-alarm rate, i.e., the probability that an out-of-control signal occurs when the manufacturing process is in control. Conventionally, $P_{in}$ is taken to be $2\Phi(-3)$ ($\approx 0.002\,699\,8$), where $\Phi$ is the c.d.f. of the standard normal distribution. In this section, utilizing the LR method, a Bayesian (or an empirical Bayes) monitoring scheme for the manufacturing process is proposed when $F = F_{\lambda^0} \in \{F_\lambda: \lambda \in \Lambda\}$ for some known (or unknown) $\lambda^0 \in \Lambda$. The main reason for using the LR test is that it often has a higher power than other tests when the alternative hypothesis is true, which corresponds to a better detecting power in monitoring the process when the process is out of control.

In order to monitor the manufacturing process at time $t$ ($> T$), suppose that the response vector $y_t$ is observed. Then we are interested in testing whether or not the manufacturing process is in control at time $t$. Recall that $F_{\theta_t}$ is the prior c.d.f. of $\theta_t$ and that $\mathcal{F}(\Theta)$ is the set of all c.d.f.'s on $\Theta$.

7.1

A BAYESIAN MONITORING SCHEME

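In this subsection, consider the case where $F = F_{\lambda^0} \in \{F_\lambda: \lambda \in \Lambda\}$ for some known $\lambda^0 \in \Lambda$. To monitor the manufacturing process at time $t$, the null hypothesis $H_0$: $F_{\theta_t} = F_{\lambda^0}$ versus the alternative $H_1$: $F_{\theta_t} \neq F_{\lambda^0}$, i.e., $F_{\theta_t} \in \mathcal{F}(\Theta) \setminus \{F_{\lambda^0}\}$, is tested.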

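List all the elements of the sample space $Y_{n_t}$ of $y_t$ by $\{y_t^{(1)}, \ldots, y_t^{(|Y_{n_t}|)}\}$, where $|Y_{n_t}|$ ($= (n_t + k)!/(n_t!\,k!)$) is the number of elements in $Y_{n_t}$. Regard $F_{\theta_t}$ as the unknown parameter of interest in $\mathcal{F}(\Theta)$. Then the unknown parameter of interest is non-parametric. Let $\ell(F_{\theta_t}; y_t)$ ($\equiv \log[f(y_t; F_{\theta_t})]$) denote the log-likelihood function of $F_{\theta_t}$ given $y_t$. Note that

$$\ell(F_{\theta_t}; y_t) = \log\int_\Theta f(y_t \mid \theta_t)\,dF_{\theta_t}(\theta_t) \leq \log\int_\Theta \sup_{\theta_t \in \Theta} f(y_t \mid \theta_t)\,dF_{\theta_t}(\theta_t) = \log\int_\Theta f(y_t \mid \theta_t)|_{\theta_t = y_t/n_t}\,dF_{\theta_t}(\theta_t) = \log\left[f(y_t \mid \theta_t)|_{\theta_t = y_t/n_t}\right],$$

where the binomial/multinomial likelihood $f(y_t \mid \theta_t)$ for $\theta_t$ given $y_t$ attains its maximum at $\theta_t = y_t/n_t$. Thus, the MLE $\hat F_{\theta_t}$ of $F_{\theta_t}$ given $y_t$ has p.m.f. $1_{\{y_t/n_t\}}$ and

$$\sup_{F_{\theta_t} \in \mathcal{F}(\Theta)} \ell(F_{\theta_t}; y_t) = \ell(F_{\theta_t}; y_t)|_{F_{\theta_t} = \hat F_{\theta_t}} \equiv \ell(\hat F_{\theta_t}; y_t) = \log\left[f(y_t \mid \theta_t)|_{\theta_t = y_t/n_t}\right].$$

Let $W_{t,\lambda^0}(y_t)$ denote the corresponding LR statistic, where

$$W_{t,\lambda^0}(y_t) = 2\left\{\log\left[f(y_t \mid \theta_t)|_{\theta_t = y_t/n_t}\right] - \ell(\lambda^0; y_t)\right\} \tag{49}$$

with $P(\{0 < W_{t,\lambda^0}(y_t) < \infty\}; F_{y_t;\lambda^0}) = 1$.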
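For the binary case, the LR statistic in (49) compares the binomial log-likelihood at its maximizer y/n with the log of the in-control marginal p.m.f. The sketch below assumes, for illustration only, that the in-control marginal is beta-binomial with hypothetical parameters (2, 3) and n = 10; any other marginal log-p.m.f. could be passed in instead.

```python
import math

def log_binomial_at_mle(y, n):
    """Binomial log-p.m.f. evaluated at theta = y / n, with 0 * log(0) taken as 0."""
    out = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    if y > 0:
        out += y * math.log(y / n)
    if y < n:
        out += (n - y) * math.log(1.0 - y / n)
    return out

def log_beta_binomial(y, n, a, b):
    """Log marginal p.m.f. of y under a Beta(a, b) prior (illustrative marginal)."""
    return (
        math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
        + math.lgamma(a + y) + math.lgamma(b + n - y) - math.lgamma(a + b + n)
        + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    )

def lr_statistic(y, n, log_marginal):
    """Twice the gap between the maximised log-likelihood and the in-control one."""
    return 2.0 * (log_binomial_at_mle(y, n) - log_marginal(y))

n = 10
stats = [
    lr_statistic(y, n, lambda v: log_beta_binomial(v, n, 2.0, 3.0))
    for y in range(n + 1)
]
```

Every value is strictly positive, in line with the probability-one statement following (49): the marginal p.m.f. is an average of the binomial p.m.f. over the prior and therefore lies below its maximum.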

The size $P_{in}$ LR test and a control chart for monitoring the LR statistic $W_{t,\lambda^0}(y_t)$ can be constructed as follows: Let $(y_{t,(1)}, \ldots, y_{t,(|Y_{n_t}|)})$ be a permutation of $(y_t^{(1)}, \ldots, y_t^{(|Y_{n_t}|)})$ such that $W_{t,\lambda^0}(y_{t,(1)}) \leq \cdots \leq W_{t,\lambda^0}(y_{t,(|Y_{n_t}|)})$. Note that $W_{t,\lambda^0}(y_t)$ is a discrete random variable. If a deterministic upper control limit is used, a pre-specified false-alarm rate $P_{in}$ ($\in (0, 1)$), e.g., $2\Phi(-3)$, is nearly impossible to attain. However, any pre-specified false-alarm rate can be attained with the randomized-upper-control-limit approach proposed in Shiau et al. (2005). To find the randomized upper control limit ($\equiv RUCL_{\lambda^0}$), we start accumulating the right tail probability from $W_{t,\lambda^0}(y_{t,(|Y_{n_t}|)})$ until we reach the first $r$ such that $P(\{W_{t,\lambda^0}(y_t) \geq W_{t,\lambda^0}(y_{t,(r)})\}; F_{y_t;\lambda^0}) \geq P_{in}$. Denote this $r$ by $m_{\lambda^0}$, i.e.,

$$m_{\lambda^0} = \max\left\{r: P\left(\{W_{t,\lambda^0}(y_t) \geq W_{t,\lambda^0}(y_{t,(r)})\}; F_{y_t;\lambda^0}\right) \geq P_{in}\right\}. \tag{50}$$

If $P(\{W_{t,\lambda^0}(y_t) \geq W_{t,\lambda^0}(y_{t,(m_{\lambda^0})})\}; F_{y_t;\lambda^0}) = P_{in}$, which is nearly impossible, then there is no need for randomization and $W_{t,\lambda^0}(y_{t,(m_{\lambda^0})})$ is the upper control limit ($\equiv UCL_{\lambda^0}$). If $P(\{W_{t,\lambda^0}(y_t) \geq W_{t,\lambda^0}(y_{t,(m_{\lambda^0})})\}; F_{y_t;\lambda^0}) > P_{in}$, then $W_{t,\lambda^0}(y_{t,(m_{\lambda^0})}) = RUCL_{\lambda^0}$. Note that there may be more than one $y_{t,(r)}$ such that $W_{t,\lambda^0}(y_{t,(r)}) = RUCL_{\lambda^0}$. Let $m_{\lambda^0,L}, m_{\lambda^0,U} \in \{1, \ldots, |Y_{n_t}|\}$ be such that

$$W_{t,\lambda^0}(y_{t,(m_{\lambda^0,L}-1)}) < W_{t,\lambda^0}(y_{t,(m_{\lambda^0,L})}) = RUCL_{\lambda^0} = W_{t,\lambda^0}(y_{t,(m_{\lambda^0,U})}) < W_{t,\lambda^0}(y_{t,(m_{\lambda^0,U}+1)}),$$

where $W_{t,\lambda^0}(y_{t,(0)}) \equiv 0$ and $W_{t,\lambda^0}(y_{t,(|Y_{n_t}|+1)}) \equiv \infty$. Then the randomization is done by signaling an out-of-control alarm with probability

$$P_{in,\lambda^0,RUCL} = \frac{P_{in} - P(\{W_{t,\lambda^0}(y_t) > RUCL_{\lambda^0}\}; F_{y_t;\lambda^0})}{P(\{W_{t,\lambda^0}(y_t) = RUCL_{\lambda^0}\}; F_{y_t;\lambda^0})} \equiv \frac{P_{in} - \sum_{r=m_{\lambda^0,U}+1}^{|Y_{n_t}|} P(\{W_{t,\lambda^0}(y_t) = W_{t,\lambda^0}(y_{t,(r)})\}; F_{y_t;\lambda^0})}{\sum_{r=m_{\lambda^0,L}}^{m_{\lambda^0,U}} P(\{W_{t,\lambda^0}(y_t) = W_{t,\lambda^0}(y_{t,(r)})\}; F_{y_t;\lambda^0})}. \tag{51}$$

This leads to

$$P_{in} = P(\{W_{t,\lambda^0}(y_t) > RUCL_{\lambda^0}\}; F_{y_t;\lambda^0}) + P_{in,\lambda^0,RUCL}\; P(\{W_{t,\lambda^0}(y_t) = RUCL_{\lambda^0}\}; F_{y_t;\lambda^0})$$

and $0 < P_{in,\lambda^0,RUCL} \leq 1$. When $P_{in,\lambda^0,RUCL} = 1$, there is no need for randomization.

The monitoring scheme is as follows: If $W_{t,\lambda^0}(y_t) > RUCL_{\lambda^0}$, then the null hypothesis $H_0$: $F_{\theta_t} = F_{\lambda^0}$ is rejected and the manufacturing process is declared to be out of control at time $t$; if $W_{t,\lambda^0}(y_t) < RUCL_{\lambda^0}$, then the null hypothesis $H_0$: $F_{\theta_t} = F_{\lambda^0}$ is not rejected and the manufacturing process is declared to be in control at time $t$; if $W_{t,\lambda^0}(y_t) = RUCL_{\lambda^0}$, then, with probability $P_{in,\lambda^0,RUCL}$, the null hypothesis $H_0$: $F_{\theta_t} = F_{\lambda^0}$ is rejected and the manufacturing process is declared to be out of control at time $t$.
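The randomized upper control limit and its randomization probability can be computed by accumulating the right-tail probability over the distinct values of the statistic. The sketch below is an illustration under stated assumptions: it takes the statistic value and in-control probability of every sample point as inputs, pools probabilities over tied statistic values by exact floating-point equality, and uses hypothetical names and toy numbers.

```python
import random

def randomized_ucl(values, probs, p_in):
    """Return (RUCL, alarm probability at RUCL) for false-alarm rate p_in."""
    mass = {}
    for w, p in zip(values, probs):
        mass[w] = mass.get(w, 0.0) + p  # pool sample points with tied statistics
    tail = 0.0  # P(W > current value), accumulated from the largest value down
    for w in sorted(mass, reverse=True):
        if tail + mass[w] >= p_in:
            # First value whose right-tail probability reaches p_in.
            return w, (p_in - tail) / mass[w]
        tail += mass[w]
    raise ValueError("p_in exceeds the total probability")

def signal(w_obs, rucl, p_alarm, rng=random):
    """Apply the monitoring rule to one observed statistic."""
    if w_obs > rucl:
        return True
    if w_obs < rucl:
        return False
    return rng.random() < p_alarm  # randomize only on the boundary

# Toy distribution of the statistic under the in-control model.
values = [1.0, 2.0, 3.0, 4.0]
probs = [0.4, 0.3, 0.2, 0.1]
rucl, p_alarm = randomized_ucl(values, probs, p_in=0.15)
achieved = 0.1 + p_alarm * 0.2  # P(W > RUCL) + p_alarm * P(W = RUCL)
```

In the toy case the limit is 3.0 with a boundary alarm probability of about 0.25, so the overall false-alarm rate equals the target 0.15, which a deterministic limit could not achieve with this discrete distribution.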

The corresponding values of $RUCL_{\lambda^0}$ and $P_{in,\lambda^0,RUCL}$ for Cases 1-4 in Section 4 are shown in Table 3, where $k = 1$, $T = 300$, $n_1 = \cdots = n_T = n_t = 300$,
