
CONCLUDING REMARKS


We have proposed a robust approach to finite mixture modelling based on the skew t distribution, called the STMIX model, which accommodates asymmetry and heavy tails jointly and thus allows practitioners to analyze data under a wide variety of conditions. We have described a normal-truncated normal-gamma-multinomial hierarchy for the STMIX model and presented some modern EM-type algorithms for ML estimation in a flexible complete-data framework. We have demonstrated our approach on a real data set and shown that the STMIX model performs better than its competitors.

Due to recent advances in computational technology, it is worthwhile to carry out Bayesian treatments via Markov chain Monte Carlo (MCMC) sampling methods in the context of the STMIX model. The basic idea is to explore the joint posterior distribution of the model parameters together with the latent variables γ and τ and the allocation variables Z when informative priors are employed. Other extensions of the current work include, for example, a generalization of STMIX to multivariate settings (Azzalini and Capitanio 2003; Jones and Faddy 2003) and determination of the number of components in skew t mixtures via reversible jump MCMC (Green 1995; Richardson and Green 1997; Zhang et al. 2004).

APPENDIX

A. Proofs of Eqs. (4), (5), (6) and (7)

Suppose $Y \sim ST(\xi, \sigma^2, \lambda, \nu)$. Then $Y$ has the following representation:

$$Y = \xi + \sigma \frac{Z}{\sqrt{\tau}}, \qquad Z \sim SN(\lambda), \quad \tau \sim \Gamma(\nu/2, \nu/2), \quad Z \perp \tau.$$

The conditional distribution of $Y$ given $\tau$ is

$$Y \mid \tau \sim SN(\xi, \sigma^2/\tau, \lambda).$$

We then have the following result:

$$E(\tau^n) = \int_0^\infty \tau^n\, \frac{(\nu/2)^{\nu/2}}{\Gamma(\nu/2)}\, \tau^{\nu/2-1} e^{-\nu\tau/2}\, d\tau
= \frac{(\nu/2)^{\nu/2}}{\Gamma(\nu/2)} \int_0^\infty \tau^{(\nu+2n)/2-1} e^{-\nu\tau/2}\, d\tau
= \frac{\Gamma\big((\nu+2n)/2\big)}{\Gamma(\nu/2)} \left(\frac{\nu}{2}\right)^{-n}. \tag{A.1}$$
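As a sanity check on (A.1), the closed form can be compared against a Monte Carlo estimate. The following is a minimal sketch (the parameter values and variable names are illustrative only, not taken from the paper):

```python
# Monte Carlo check of (A.1): E(tau^n) for tau ~ Gamma(nu/2, nu/2)
# (shape nu/2, rate nu/2, i.e. scale 2/nu).
import numpy as np
from scipy.special import gamma as G

rng = np.random.default_rng(0)
nu, n = 5.0, -0.5                    # n = -1/2 is the case used for E(Y) below
tau = rng.gamma(shape=nu / 2, scale=2 / nu, size=1_000_000)

mc = np.mean(tau ** n)                                        # simulated E(tau^n)
exact = G((nu + 2 * n) / 2) / G(nu / 2) * (nu / 2) ** (-n)    # RHS of (A.1)
print(mc, exact)                                              # should agree closely
```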

The first four moments of Z are

$$E(Z) = \sqrt{\frac{2}{\pi}}\,\delta_\lambda, \quad E(Z^2) = 1, \quad E(Z^3) = \sqrt{\frac{2}{\pi}}\,\delta_\lambda(3 - \delta_\lambda^2), \quad E(Z^4) = 3. \tag{A.2}$$

Applying the double expectation trick, in conjunction with (A.1) and (A.2), we have

$$E(Y) = E\big(E(Y \mid \tau)\big) = E\left(\xi + \sqrt{\frac{2}{\pi}}\,\delta_\lambda \frac{\sigma}{\sqrt{\tau}}\right) = \xi + \frac{\Gamma\big((\nu-1)/2\big)}{\Gamma(\nu/2)} \sqrt{\frac{\nu}{\pi}}\,\delta_\lambda \sigma.$$
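This expression can be verified by simulating directly from the stochastic representation, drawing $Z \sim SN(\lambda)$ via the convolution form $Z = \delta_\lambda |U_0| + (1 - \delta_\lambda^2)^{1/2} U_1$ of Henze (1986). A minimal sketch with illustrative parameter values:

```python
# Monte Carlo check of E(Y) for Y = xi + sigma * Z / sqrt(tau),
# with Z ~ SN(lambda) generated via Henze's (1986) representation.
import numpy as np
from scipy.special import gamma as G

rng = np.random.default_rng(1)
xi, sigma, lam, nu = 2.0, 1.5, 3.0, 6.0    # illustrative; nu > 1 so E(Y) exists
delta = lam / np.sqrt(1 + lam ** 2)

m = 2_000_000
u0, u1 = rng.standard_normal(m), rng.standard_normal(m)
z = delta * np.abs(u0) + np.sqrt(1 - delta ** 2) * u1     # Z ~ SN(lambda)
tau = rng.gamma(shape=nu / 2, scale=2 / nu, size=m)       # tau ~ Gamma(nu/2, nu/2)
y = xi + sigma * z / np.sqrt(tau)

exact = xi + G((nu - 1) / 2) / G(nu / 2) * np.sqrt(nu / np.pi) * delta * sigma
print(y.mean(), exact)                                    # Monte Carlo vs. closed form
```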

It is easy to verify $E(Y^2)$, $E(Y^3)$ and $E(Y^4)$ in the same manner. Let $\gamma_Y$ and $\kappa_Y$ denote the skewness and kurtosis, respectively. We have

$$\gamma_Y = \frac{E(Y - EY)^3}{\big(E(Y - EY)^2\big)^{3/2}} \qquad \text{and} \qquad \kappa_Y = \frac{E(Y - EY)^4}{\big(E(Y - EY)^2\big)^2}.$$

B. Proof of Proposition 2

(a) A standard calculation of conditional expectations yields $E(\tau \mid y)$. By Proposition 1, it suffices to show that $E(\tau \mid y)$ reduces to the expression stated there.

(b) We first need the intermediate result (B.1) for expectations of the form $E(\sqrt{\tau}\,\cdot \mid y)$. From (12), $\gamma \mid y, \tau \sim TN\big(\delta_\lambda(y - \xi),\ (1 - \delta_\lambda^2)\sigma^2/\tau;\ [0, \infty)\big)$, and the expectation of a truncated normal distribution is given by (B.2). Applying the double expectation trick and using (B.1) and (B.2), we get $E(\gamma\tau \mid y)$.

(c) Similarly, it is easy to verify that

$$E(\gamma^2 \mid y, \tau) = \delta_\lambda^2 (y - \xi)^2 + (1 - \delta_\lambda^2)\,\frac{\sigma^2}{\tau} + \delta_\lambda (y - \xi)\,\sigma\sqrt{\frac{1 - \delta_\lambda^2}{\tau}}\,\frac{\phi(u)}{\Phi(u)}, \qquad u = \frac{\sqrt{\tau}\,\delta_\lambda (y - \xi)}{\sigma\sqrt{1 - \delta_\lambda^2}}.$$

Using (B.1) and (B.3), and the double expectation trick as before, gives

$$E(\gamma^2\tau \mid y) = \delta_\lambda^2 (y - \xi)^2\, E(\tau \mid y) + (1 - \delta_\lambda^2)\,\sigma^2 + \delta_\lambda (y - \xi)\,\sigma\sqrt{1 - \delta_\lambda^2}\; E\big(\sqrt{\tau}\,\phi(u)/\Phi(u) \mid y\big).$$

By Leibniz's rule, differentiating under the integral sign, we can get the term involving $\log(\eta^2 + \nu)$, where $\eta = (y - \xi)/\sigma$.
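The truncated normal mean entering (B.2) is the standard half-line formula $E(X) = \mu + s\,\phi(\mu/s)/\Phi(\mu/s)$ for $X \sim TN(\mu, s^2; [0, \infty))$; it can be confirmed numerically with a minimal scipy sketch (values illustrative):

```python
# Check of the left-truncated normal mean used in (B.2):
# X ~ TN(mu, s^2; [0, inf))  =>  E(X) = mu + s * phi(mu/s) / Phi(mu/s).
import numpy as np
from scipy.stats import norm, truncnorm

mu, s = 0.7, 1.3                       # illustrative values
a, b = (0 - mu) / s, np.inf            # truncnorm takes standardized bounds
mean_scipy = truncnorm.mean(a, b, loc=mu, scale=s)
mean_formula = mu + s * norm.pdf(mu / s) / norm.cdf(mu / s)
print(mean_scipy, mean_formula)        # coincide to numerical precision
```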

C. The PX-EM Algorithm

The parameter-expansion EM algorithm (PX-EM), introduced by Liu, Rubin and Wu (1998), shares the simplicity and stability of ordinary EM but has a faster rate of convergence. PX-EM accelerates EM because its E-step performs a more efficient analysis: it applies a covariance adjustment to correct the analysis of the M-step, capitalizing on extra information captured in the imputed complete data.

PX-EM expands the complete-data model $f(y_{com} \mid \theta)$ to a larger model $f_X(y_{com} \mid \Theta)$, with $\Theta = \{\theta, \alpha\}$, where $\alpha$ is an auxiliary scale parameter whose value is fixed at $\alpha_0$ in the original model. When the auxiliary parameter equals $\alpha_0 = 1$, the expanded model reduces to the original one.

We now compare the ECM algorithm with the PX-EM algorithm for ML estimation of the skew t distribution.

Model O:

$$Y \mid \gamma, \tau \sim N\left(\xi + \delta_\lambda\gamma,\ \frac{1 - \delta_\lambda^2}{\tau}\,\sigma^2\right), \quad \gamma \mid \tau \sim TN\left(0,\ \frac{\sigma^2}{\tau};\ [0, \infty)\right), \quad \tau \sim \Gamma(\nu/2, \nu/2),$$

where $\theta = (\xi, \sigma^2, \lambda, \nu)$ is the parameter of the skew t distribution in the ECM algorithm. The results for the ECM algorithm are given in Section 3.

We now derive the modified ECM algorithm using PX-EM, adjusting the current estimates by expanding the parameter space:

Model X:

$$Y \mid \gamma, \tau \sim N\left(\xi + \delta_\lambda\gamma,\ \frac{1 - \delta_\lambda^2}{\tau}\,\sigma^2\right), \quad \gamma \mid \tau \sim TN\left(0,\ \frac{\sigma^2}{\tau};\ [0, \infty)\right), \quad \tau = \alpha\,\frac{\chi_\nu^2}{\nu}, \quad \frac{\chi_\nu^2}{\nu} \sim \Gamma(\nu/2, \nu/2),$$

where $\Theta = (\xi, \sigma^2, \lambda, \nu, \alpha)$ is the parameter of the skew t distribution in the PX-EM algorithm.

The reduction function $R$, which maps the expanded parameter space back to the original one, is

$$(\xi, \sigma^2, \lambda, \nu) = R\{(\xi, \sigma^2, \lambda, \nu, \alpha)\} = (\xi, \sigma^2/\alpha, \lambda, \nu).$$

Applying routine algebraic manipulations leads to the following CM-step for updating $\alpha$:

$$\hat{\alpha}^{(k+1)} = n^{-1}\sum_{j=1}^n \hat{s}_{1j}^{(k)}.$$

The application of the reduction function in the PX-EM algorithm leads to adjustments in the estimates of $\sigma^2$ and $\nu$, which can be obtained by replacing CM-steps 2 and 4 of the previous EM algorithm with the following two PX.CM-steps:

PX.CM-step 2:

$$\hat{\sigma}^{2(k+1)} = \frac{\sum_{j=1}^n \Big(\hat{s}_{1j}^{(k)}\, (y_j - \hat{\xi}^{(k+1)})^2 - 2\hat{\delta}_\lambda^{(k)}\, \hat{s}_{2j}^{(k)}\, (y_j - \hat{\xi}^{(k+1)}) + \hat{s}_{3j}^{(k)}\Big)}{2\big(1 - \hat{\delta}_\lambda^{2(k)}\big) \sum_{j=1}^n \hat{s}_{1j}^{(k)}}.$$

PX.CM-step 4: Update $\hat{\nu}^{(k+1)}$ by solving

$$\log\!\left(\frac{\nu n}{2\sum_{j=1}^n \hat{s}_{1j}^{(k)}}\right) - DG\!\left(\frac{\nu}{2}\right) + \frac{1}{n}\sum_{j=1}^n \hat{s}_{4j}^{(k)} = 0.$$
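Computationally, the three updates above amount to a few array reductions plus a one-dimensional root search for $\nu$. The sketch below assumes the E-step quantities $\hat{s}_{1j}^{(k)}, \dots, \hat{s}_{4j}^{(k)}$ are already available as arrays s1, ..., s4; the function name and the bracketing interval are illustrative assumptions, not from the paper:

```python
# Hedged sketch of the PX.CM updates for the single-component skew t model.
# Inputs: data y, E-step arrays s1..s4, current xi^(k+1) and delta_lambda^(k).
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

def px_cm_steps(y, s1, s2, s3, s4, xi_new, delta):
    n = len(y)
    alpha = s1.mean()                                   # CM-step for alpha
    r = y - xi_new
    sigma2 = (np.sum(s1 * r**2 - 2 * delta * s2 * r + s3)
              / (2 * (1 - delta**2) * s1.sum()))        # PX.CM-step 2
    # PX.CM-step 4: solve the one-dimensional equation in nu
    f = lambda nu: (np.log(nu * n / (2 * s1.sum()))
                    - digamma(nu / 2) + s4.mean())
    nu = brentq(f, 0.01, 200.0)       # assumes f changes sign on this interval
    return alpha, sigma2, nu
```

Note that dividing by $\sum_j \hat{s}_{1j}^{(k)}$ (that is, by $n\hat{\alpha}^{(k+1)}$) rather than by $n$ in the $\sigma^2$ update is precisely where the reduction $\sigma^2 \mapsto \sigma^2/\alpha$ enters.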

In the same way, under the skew t mixture model, applying routine algebraic manipulations leads to the following CM-step for updating $\alpha_i$:

$$\hat{\alpha}_i^{(k+1)} = \sum_{j=1}^n \hat{s}_{1ij}^{(k)} \Big/ \sum_{j=1}^n \hat{z}_{ij}^{(k)}.$$

The application of the reduction function in the PX-EM algorithm leads to adjustments in the estimates of $\sigma_i^2$ and $\nu_i$, which can be obtained by replacing CM-steps 3 and 5 of the previous EM algorithm with the following two PX.CM-steps:

PX.CM-step 3:

$$\hat{\sigma}_i^{2(k+1)} = \frac{\sum_{j=1}^n \hat{s}_{1ij}^{(k)}\, (y_j - \hat{\xi}_i^{(k+1)})^2 - 2\hat{\delta}_i^{(k)} \sum_{j=1}^n \hat{s}_{2ij}^{(k)}\, (y_j - \hat{\xi}_i^{(k+1)}) + \sum_{j=1}^n \hat{s}_{3ij}^{(k)}}{2\big(1 - \hat{\delta}_i^{2(k)}\big) \sum_{j=1}^n \hat{s}_{1ij}^{(k)}}.$$

PX.CM-step 5: Update $\hat{\nu}_i^{(k+1)}$ by solving

$$\log\!\left(\frac{\nu_i \sum_{j=1}^n \hat{z}_{ij}^{(k)}}{2\sum_{j=1}^n \hat{s}_{1ij}^{(k)}}\right) - DG\!\left(\frac{\nu_i}{2}\right) + \frac{\sum_{j=1}^n \hat{s}_{4ij}^{(k)}}{\sum_{j=1}^n \hat{z}_{ij}^{(k)}} = 0.$$
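A corresponding sketch for component $i$ of the mixture, with the E-step arrays now paired with the posterior component probabilities $\hat{z}_{ij}^{(k)}$ (array z below); again, all names are illustrative:

```python
# Hedged sketch of the PX.CM updates for component i of the skew t mixture.
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

def px_cm_steps_mixture(y, z, s1, s2, s3, s4, xi_new, delta):
    alpha_i = s1.sum() / z.sum()                        # CM-step for alpha_i
    r = y - xi_new
    sigma2_i = ((np.sum(s1 * r**2) - 2 * delta * np.sum(s2 * r) + s3.sum())
                / (2 * (1 - delta**2) * s1.sum()))      # PX.CM-step 3
    # PX.CM-step 5: solve the one-dimensional equation in nu_i
    f = lambda nu: (np.log(nu * z.sum() / (2 * s1.sum()))
                    - digamma(nu / 2) + s4.sum() / z.sum())
    nu_i = brentq(f, 0.01, 200.0)     # assumes f changes sign on this interval
    return alpha_i, sigma2_i, nu_i
```

Running these updates over all mixture components, together with the unchanged CM-steps, completes one PX-EM iteration.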

REFERENCES

Azzalini, A. (1985), “A Class of Distributions Which Includes the Normal Ones,”

Scandinavian Journal of Statistics, 12, 171-178.

Azzalini, A. (1986), “Further Results on a Class of Distributions Which Includes the Normal Ones,” Statistica, 46, 199-208.

Azzalini, A., and Capitanio, A. (2003), “Distributions Generated by Perturbation of Symmetry With Emphasis on a Multivariate Skew t-Distribution,” Journal of the Royal Statistical Society. Ser. B, 65, 367-389.

Basford, K. E., Greenway, D. R., McLachlan, G. J., and Peel, D. (1997), “Standard Errors of Fitted Means Under Normal Mixture,” Computational Statistics, 12, 1-17.

Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977), “Maximum Likelihood from Incomplete Data via the EM Algorithm (with discussion),” Journal of the Royal Statistical Society. Ser. B, 39, 1-38.

Flegal, K. M., Carroll, M. D., Ogden, C. L., and Johnson, C. L. (2002), “Prevalence and Trends in Obesity among US Adults, 1999-2000,” Journal of the American Medical Association, 288, 1723-1727.

Fraley, C., and Raftery, A. E. (2002), “Model-Based Clustering, Discriminant Analysis, and Density Estimation,” Journal of the American Statistical Association, 97, 611-631.

Green, P. J. (1995), “Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination,” Biometrika, 82, 711-732.

Henze, N. (1986), “A Probabilistic Representation of the Skew-Normal Distribution,” Scandinavian Journal of Statistics, 13, 271-275.

Jones, M. C., and Faddy, M. J. (2003), “A Skew Extension of the t-Distribution, With Applications,” Journal of the Royal Statistical Society. Ser. B, 65, 159-174.

Lin, T. I., Lee, J. C., and Ni, H. F. (2004), “Bayesian Analysis of Mixture Modelling Using the Multivariate t Distribution,” Statistics and Computing, 14, 119-130.

Lin, T. I., Lee, J. C., and Yen, S. Y. (2006), “Finite Mixture Modelling Using the Skew Normal Distribution,” Statistica Sinica (To appear).

Liu, C. H., and Rubin, D. B. (1994), “The ECME Algorithm: a Simple Extension of EM and ECM With Faster Monotone Convergence,” Biometrika, 81, 633-648.

Liu, C. H., Rubin, D. B., and Wu, Y. (1998), “Parameter Expansion to Accelerate EM: the PX-EM Algorithm,” Biometrika, 85, 755-770.

McLachlan, G. J., and Basford, K. E. (1988), Mixture Models: Inference and Applications to Clustering, Marcel Dekker, New York.

McLachlan, G. J., and Peel, D. (2000), Finite Mixture Models, Wiley, New York.

Meng, X. L., and Rubin, D. B. (1993), “Maximum Likelihood Estimation via the ECM Algorithm: A General Framework,” Biometrika, 80, 267-278.

Peel, D., and McLachlan, G. J. (2000), “Robust Mixture Modeling Using the t Distribution,” Statistics and Computing, 10, 339-348.

Richardson, S., and Green, P. J. (1997), “On Bayesian Analysis of Mixtures With an Unknown Number of Components (with discussion),” Journal of the Royal Statistical Society. Ser. B, 59, 731-792.

Shoham, S. (2002), “Robust Clustering by Deterministic Agglomeration EM of Mixtures of Multivariate t-Distributions,” Pattern Recognition, 35, 1127-1142.

Shoham, S., Fellows, M. R., and Normann, R. A. (2003), “Robust, Automatic Spike Sorting Using Mixtures of Multivariate t-Distributions,” Journal of Neuroscience Methods, 127, 111-122.

Titterington, D. M., Smith, A. F. M., and Makov, U. E. (1985), Statistical Analysis of Finite Mixture Distributions, Wiley, New York.

Wang, H. X., Zhang, Q. B., Luo, B., and Wei, S. (2004), “Robust Mixture Modelling Using Multivariate t Distribution With Missing Information,” Pattern Recognition Letters, 25, 701-710.

Zacks, S. (1971), The Theory of Statistical Inference, Wiley, New York.

Zhang, Z., Chan, K. L., Wu, Y., and Chen, C. B. (2004), “Learning a Multivariate Gaussian Mixture Model With the Reversible Jump MCMC Algorithm,” Statistics and Computing, 14, 343-355.

