
In the Gibbs sampler, we usually monitor convergence by the "potential scale reduction factor" (Gelman & Rubin, 1992). First, we calculate the variance of the simulated draws within each chain and average these within-chain variances. Next, we compute the mixture variance obtained by pooling the draws from all chains. Finally, we take the square root of the ratio of the mixture variance to the average within-chain variance as the potential scale reduction factor, denoted R̂:

\[
\hat{R} = \sqrt{\frac{\text{mixture variance}}{\text{average within-chain variance}}}. \qquad (23)
\]

Table 1: Interpretation of the Bayes factor.

BF        Evidence against M2
< 1       Negative (supports M2)
1-3       Barely worth mentioning
3-20      Positive (supports M1)
20-150    Strong
> 150     Decisive

When the chains have reached convergence, the average within-chain variance and the mixture variance will be essentially identical, so the value of R̂ should be very close to 1. If R̂ is much greater than 1, it implies that the chains have not yet mixed well. In this study, we use the commands gelman.diag and gelman.plot in the coda package in R to monitor the convergence of the Markov chain Monte Carlo (MCMC) runs.

We continue sampling until R̂ is less than 1.2 for the parameters of interest (Song & Lee, 2001). Once the MCMC has reached convergence, the successive draws from those conditional distributions can be regarded as random draws from the joint posterior distribution of all the parameters of interest.
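As an illustration, the convergence check described above can be carried out with the coda package. The following is a minimal sketch, assuming the draws of one parameter of interest from the three chains have been stored in the vectors draws1, draws2, and draws3 (hypothetical names).

    library(coda)

    # Combine the three chains into an mcmc.list object.
    chains <- mcmc.list(mcmc(draws1), mcmc(draws2), mcmc(draws3))

    # Potential scale reduction factor R-hat; values below 1.2 are taken
    # in this study as indicating adequate mixing.
    gelman.diag(chains)

    # Evolution of R-hat as the number of iterations increases.
    gelman.plot(chains)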

4 Bayes factor

The Bayes factor is a popular Bayesian model selection criterion. For the comparison between two models M1 and M2 based on the observed data D, the Bayes factor (BF12) is defined as (Gelman, Carlin, Stern & Rubin, 2003)

\[
\mathrm{BF}_{12} = \frac{p(D \mid M_1)}{p(D \mid M_2)}, \qquad (24)
\]

where p(D | M1) and p(D | M2) denote the probabilities of D under models M1 and M2, respectively. The larger BF12 is, the greater the evidence in favor of model M1 over M2. Although the Bayes factor has no direct interpretation as a p-value, some guidelines on the extent to which a Bayes factor provides support or evidence for M1 compared to M2 have been suggested, as reported in Table 1 (Kass & Raftery, 1995).
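For illustration only, the guidelines of Table 1 can be encoded as a small R helper; the function name interpret_bf is hypothetical and not part of any package.

    # Map a Bayes factor (BF12) to the qualitative labels of Table 1.
    interpret_bf <- function(bf) {
      cut(bf,
          breaks = c(0, 1, 3, 20, 150, Inf),
          labels = c("Negative (supports M2)",
                     "Barely worth mentioning",
                     "Positive (supports M1)",
                     "Strong",
                     "Decisive"))
    }

    interpret_bf(c(0.5, 2, 10, 50, 200))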

By the property of conditional probability, we have

\[
\mathrm{BF}_{12} = \frac{p(D \mid M_1)}{p(D \mid M_2)} = \frac{p(M_1 \mid D)/p(M_2 \mid D)}{p(M_1)/p(M_2)}, \qquad (25)
\]
where p(M1 | D)/p(M2 | D) denotes the posterior odds and p(M1)/p(M2) means the prior odds.

In this study, we are interested in testing the inequality constrained hypothesis of means of the single latent factor among different groups. For example, we use

µ(1), µ(2), and µ(3) to denote the means of group 1, group 2, and group 3, respectively, and the inequality hypothesis µ(1) > µ(2) > µ(3) is of interest. Let Hi denote the inequality constrained hypothesis µ(1) > µ(2) > µ(3) and Hc denote the complement of Hi. To test Hi versus Hc using the Bayes factor based on data D, it is straightforward to see that p(Hc | D) = 1 − p(Hi | D) and p(Hc) = 1 − p(Hi). Once the MCMC has reached convergence and we obtain random draws from the joint posterior of all the parameters of interest, we use the proportion of draws that satisfy Hi to estimate p(Hi | D). We denote this proportion by fi, which is also called the proportion of fit. On the other hand, we use the proportion of random draws from the joint prior distribution of all the parameters that satisfy Hi to estimate p(Hi); this proportion is similarly denoted by ci and called the complexity (Hoijtink, 2013).
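A minimal sketch of this estimation step, assuming the converged posterior draws of the three factor means are stored in the vectors mu1_post, mu2_post, mu3_post and the corresponding draws from the prior in mu1_prior, mu2_prior, mu3_prior (hypothetical names):

    # Proportion of fit: share of posterior draws satisfying Hi, estimating p(Hi | D).
    f_i <- mean(mu1_post > mu2_post & mu2_post > mu3_post)

    # Complexity: share of prior draws satisfying Hi, estimating p(Hi).
    c_i <- mean(mu1_prior > mu2_prior & mu2_prior > mu3_prior)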

Consider the hypothesis µ(1) > µ(2) > µ(3). Because µ(1), µ(2), and µ(3) have the same prior distribution N(µ0(g), φ0(g)), there are 3! = 6 hypotheses with an equivalent structure. For example, one other equivalent hypothesis is µ(1) < µ(2) < µ(3). Each of these hypotheses has the same complexity under the prior distribution. Moreover, the prior probability of the equality hypothesis µ(1) = µ(2) = µ(3) is zero. The union of the six equivalent hypotheses covers 100% of the parameter space of µ(1), µ(2), and µ(3), and therefore the complexity of each hypothesis is 1/6 (Hoijtink, 2013).
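The symmetry argument can be checked quickly by simulation. The sketch below draws the three means independently from a common normal prior (the variance 10000 matches the diffuse prior used in Section 5.1.2, but any common prior gives the same result) and verifies that the proportion of draws satisfying µ(1) > µ(2) > µ(3) is close to 1/6.

    set.seed(1)
    n_draws <- 1e6
    m1 <- rnorm(n_draws, mean = 0, sd = 100)  # common prior N(0, 10000)
    m2 <- rnorm(n_draws, mean = 0, sd = 100)
    m3 <- rnorm(n_draws, mean = 0, sd = 100)
    mean(m1 > m2 & m2 > m3)                   # approximately 1/6 ≈ 0.167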

As a result, the Bayes factor of Hi versus Hc, which we denote as BFic, is
\[
\mathrm{BF}_{ic} = \frac{f_i/(1 - f_i)}{c_i/(1 - c_i)} = \frac{5 f_i}{1 - f_i}. \qquad (26)
\]
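Combining the two quantities gives the Bayes factor in (26); a minimal sketch, assuming f_i has been estimated from the posterior draws as above:

    c_i   <- 1 / 6                                   # complexity from the symmetry argument
    BF_ic <- (f_i / (1 - f_i)) / (c_i / (1 - c_i))   # equivalently 5 * f_i / (1 - f_i)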

5 Simulation

We use R (R Core Team, 2015) to generate data from the MCCFA model and use the Bayes factor described above to test inequality constrained hypotheses of the factor means among different groups. To assess the validity of the Bayes factor in testing such constrained hypotheses, we consider two settings for the order relations among the means of the latent factor, namely equality (µ(1) = µ(2) = µ(3)) and inequality (µ(1) > µ(2) > µ(3)). We hope to see whether the Bayes factor approach can correctly reject the inequality constrained hypothesis when it does not hold. On the other hand, we would like to investigate the strength of the Bayes factor in testing the inequality constrained hypothesis of the means under the MCCFA model.

5.1 Simulation setting

5.1.1 Parameters for data generation

We consider the setup of three groups (G = 3) and ten ordinal items (p = 10). For the parameters in the MCCFA model, we consider the simplest case with measurement invariance across groups, such that Λ(g), α(g), and φ(g) are set to be the same for all groups g = 1, 2, 3. More specifically,

Note that µ(g) is the only parameter that differs across groups under the equality and inequality settings. The µ(g) values used for data generation are reported in Table 2. For each setting, we generate the ordered categorical data under the MCCFA model shown in (1) and (2). The three choices of sample size for each group, ng, under consideration are 250, 500, and 1000. We assume the three groups have equal sample sizes.
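To make the data-generation step concrete, the following sketch simulates ordinal responses for one group under a one-factor model of the kind used here; the loading, threshold, and category values below are illustrative placeholders, not the values used in the study.

    set.seed(1)
    ng     <- 250                        # sample size of the group
    p      <- 10                         # number of ordinal items
    lambda <- rep(1, p)                  # factor loadings (hypothetical)
    alpha  <- c(-1.5, -0.6, 0.6, 1.5)    # item thresholds, five categories (hypothetical)
    mu_g   <- 0                          # factor mean for this group
    phi_g  <- 1                          # factor variance for this group

    f     <- rnorm(ng, mu_g, sqrt(phi_g))                     # latent factor scores
    ystar <- outer(f, lambda) + matrix(rnorm(ng * p), ng, p)  # latent continuous responses
    y     <- apply(ystar, 2, function(x) findInterval(x, alpha) + 1)  # ordinal categories 1-5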

5.1.2 Parameters of the prior distributions

We set the parameters of the prior distributions of µ(g), Λ(g), φ(g), and α(g) in (3) to (6) as follows:

• µ(g) ∼ N(µ0(g), φ0(g)) with µ0(g) = 0 and φ0(g) = 10000;

• Λ(g) ∼ MVN(Λ0(g), H0(g)) with Λ0(g) = 1 and H0(g) = 10I, where 1 and I are respectively the vector of ones and the identity matrix;

• φ(g)^(-1) ∼ Gamma(ρ0(g), θ0(g)) with ρ0(g) = 10 and θ0(g) = 36;

Here, 1 is a 10 × 1 vector of ones. We take randomly sampled values from N(0, 1) as the starting values of µ(g) for each chain under each setting.

We use the above starting values to generate F(g) and start the Gibbs sampling process with the identifiability constraints λ1(g) = 1 and α13(g) = −0.6 for all g = 1, 2, 3, as suggested in (22).

Table 2: Posterior means and standard deviations (SD) of the factor means (ng = 250).

                                   chain 1          chain 2          chain 3
Setting      Parameter   True    mean     SD      mean     SD      mean     SD      R̂
equality     µ(1)          0    -0.178   0.096   -0.170   0.091   -0.179   0.085   1.10
             µ(2)          0    -0.041   0.110   -0.154   0.127   -0.123   0.104   1.01
             µ(3)          0    -0.054   0.093   -0.066   0.070   -0.008   0.068   1.12
inequality   µ(1)          0    -0.208   0.089   -0.198   0.095   -0.263   0.079   1.13
             µ(2)       -0.2    -0.362   0.105   -0.361   0.095   -0.383   0.099   1.01
             µ(3)       -0.4    -0.457   0.080   -0.467   0.078   -0.426   0.077   1.02

Table 3: Posterior means and standard deviations (SD) of the factor means (ng = 500).

                                   chain 1          chain 2          chain 3
Setting      Parameter   True    mean     SD      mean     SD      mean     SD      R̂
equality     µ(1)          0     0.053   0.064   -0.017   0.088    0.040   0.071   1.01
             µ(2)          0     0.006   0.081    0.019   0.076   -0.016   0.074   1.04
             µ(3)          0     0.006   0.073    0.015   0.089   -0.007   0.068   1.00
inequality   µ(1)          0    -0.039   0.077   -0.014   0.075    0.013   0.069   1.01
             µ(2)       -0.2    -0.308   0.064   -0.338   0.078   -0.337   0.079   1.01
             µ(3)       -0.4    -0.500   0.068   -0.513   0.068   -0.501   0.061   1.03
