
Bayesian evaluation of inequality constrained hypotheses of means for ordered categorical data


National Taiwan Normal University
Department of Mathematics
Master's Thesis

Advisor: Dr. Rung-Ching Tsai (蔡蓉青)

Bayesian evaluation of inequality constrained hypotheses of means for ordered categorical data

Graduate student: Kuan-Hung Chen (陳冠宏)

June 2016

Acknowledgments

The completion of this thesis owes first and foremost to my advisor, Dr. Rung-Ching Tsai (蔡蓉青), for her dedicated guidance and care throughout the writing process. When my simulation results were unsatisfactory, she still offered encouragement and suggestions that allowed me to finish this thesis smoothly; I hereby express my deepest gratitude.

I also thank the two oral examination committee members, Dr. 蔡恆修 and Dr. 張少同, for their many valuable corrections and suggestions, from which I benefited greatly and which made this thesis more complete.

I further thank my thesis collaborators, 賴驥緯 and 林炯伊, for their help during this period; discussing and searching for materials together allowed us all to complete our theses smoothly.

I also thank my fellow students at the NTNU Department of Mathematics: 彥羽, who stayed in the department building rushing theses with me; 建宇, who proctored exams for me while I was finishing the thesis; 逸安, who wrote the final exam solutions; as well as 國棟, 冠瑋, 宗倫, 昱達, 亞衛, 欣翰, 雪惠, 佳陽, 佑萱, 宗儒, 耀慶, 逸翔, 侃君, 書豪, 育瑋, 俊佑, 黃偉, 孟勳, 湘屏, and others. Thank you all for your company over these two years, which left me with many memories of my master's studies; I hope we meet again after graduation.

Finally, I thank my dearest family and friends for their support and companionship during my graduate studies and thesis writing; this thesis was completed because of them.

Kuan-Hung Chen
Statistics Group, Department of Mathematics, National Taiwan Normal University
July 2016

Abstract (in Chinese)

This study uses the multiple-group categorical confirmatory factor analysis model to analyze ordered categorical data from multiple groups. The main purpose is to use Bayesian estimation, under minimal identifiability constraints, to estimate the thresholds, the latent factor means and variances, and the other parameters in the model; data augmentation and Gibbs sampling are used to obtain draws from the joint distribution of these parameters. The Bayes factor is then used to test whether the latent factor means satisfy an inequality constrained hypothesis. Through simulation and real data analysis, the Bayes factor is shown to be feasible for testing inequality constrained hypotheses.

Keywords: Bayesian estimation, Bayes factor, inequality constrained hypotheses, ordered categorical data

Bayesian evaluation of inequality constrained hypotheses of means for ordered categorical data

Kuan-Hung Chen
Department of Mathematics, National Taiwan Normal University

Abstract

The main purpose of this study is to use Bayesian estimation and the Bayes factor to test inequality constrained hypotheses of means for ordered categorical data among multiple groups using the categorical confirmatory factor analysis model. Joint Bayesian estimates of the thresholds, the factor scores, and the structural parameters, subject to some minimal identification constraints, are obtained by using data augmentation and Gibbs sampling. Through simulation and real data analysis, the Bayes factor is shown to be useful in testing hypotheses involving inequality constraints of means for ordered categorical data.

Keywords: Bayesian estimation, Bayes factor, inequality constrained hypotheses, ordered categorical data

Contents

1 Introduction
2 Model description
3 Bayesian estimation
  3.1 Joint prior distribution
  3.2 Conditional distributions of parameters
    3.2.1 Y(g)'s conditional distribution
    3.2.2 µ(g)'s conditional distribution
    3.2.3 F(g)'s conditional distribution
    3.2.4 Λ(g)'s conditional distribution
    3.2.5 φ(g)⁻¹'s conditional distribution
    3.2.6 α(g)'s conditional distribution
  3.3 Identifiability constraints
  3.4 Convergence of the Gibbs sampler
4 Bayes factor
5 Simulation
  5.1 Simulation setting
    5.1.1 Parameters for data generation
    5.1.2 Parameters of the prior distributions
    5.1.3 Starting values
  5.2 Results
6 Real data
7 Discussion
8 Conclusion
References

List of Figures

1 The iterations of the factor mean with chain 1
2 The iterations of the factor mean with chain 2
3 The iterations of the factor mean with chain 3
4 The potential scale reduction factor plot of the factor mean for each group
5 The posterior distribution (after burn-in of the first 50000 iterations) of the factor mean with chain 1
6 The posterior distribution (after burn-in of the first 50000 iterations) of the factor mean with chain 2
7 The posterior distribution (after burn-in of the first 50000 iterations) of the factor mean with chain 3

List of Tables

1 Interpretation of the Bayes factor
2 Posterior means and standard deviations (SD) of the factor means (ng = 250)
3 Posterior means and standard deviations (SD) of the factor means (ng = 500)
4 Posterior means and standard deviations (SD) of the factor means (ng = 1000)
5 The Bayes factor under ng = 250
6 The Bayes factor under ng = 500
7 The Bayes factor under ng = 1000
8 The seven questions on SITES
9 The starting values used for the three Gibbs sampler chains
10 The posterior means and standard deviations (SD) of the factor means for real data
11 The Bayes factors of the equivalent inequality constrained hypotheses of real data
12 Tukey multiple comparisons of means, 95% family-wise confidence level

1 Introduction

Surveys are now ubiquitous, asking about service or policy satisfaction, learning achievement, and opinions on social issues. In most of these surveys, five-point Likert scale questions are popularly used to investigate the subject of interest. It is therefore important to know how to properly analyze this type of ordered categorical data. The categorical confirmatory factor analysis (CFA) model is commonly used to analyze ordered categorical data. When the goal is to compare groups for statistical differences in their underlying latent factor, its extension to the multiple-group categorical confirmatory factor analysis (MCCFA) model facilitates the test of measurement invariance (Millsap & Tein, 2004; Widaman & Reise, 1997), differential item functioning (Chang, Huang, & Tsai, 2015; Stark, Chernyshenko, & Drasgow, 2006), response styles (Cheung & Rensvold, 2000), as well as factor mean differences (Fors & Kulin, 2016; Sörbom, 1974; van der Sluis, Vinkhuyzen, Boomsma, & Posthuma, 2010).

For the parameter estimation of the MCCFA model for ordered categorical data, the direct method is maximum likelihood estimation (Lee, Poon, & Bentler, 1990). When the multiple integrals in the maximum likelihood become too complex, multi-stage estimation can be used to reduce the computation (Lee, Poon, & Bentler, 1995). In practice, however, when the dimension of the factor model increases, multi-stage estimation may no longer be useful. In this case, the Bayesian approach with the Gibbs sampler algorithm (Geman & Geman, 1984) has been used to deal with the problem. Song and Lee (2001) derived all the necessary Gibbs sampling steps to estimate the factor loadings, thresholds, and variances of the latent factors, with the means of all the latent factors fixed at zero. However, it is sometimes of interest to investigate the mean differences among groups. Therefore, we modify and extend their approach to allow for nonzero means of the latent factors. By doing so, we can test inequality constrained hypotheses on the means. We test inequality rather than equality because, in most situations, the latent factor of different groups can be expected to have different means. We use the Bayes factor (Kass & Raftery, 1995) rather than the p-value to test the hypotheses: because the p-value does not quantify statistical evidence (Wagenmakers, 2007), the Bayes factor has been used to test inequality constrained hypotheses (Klugkist & Hoijtink, 2007; Hoijtink, 2013).

In this study, we test inequality constrained hypotheses on the means of three different groups by Bayesian evaluation. First, we briefly review the MCCFA model and the parameters of interest in Section 2. Second, the Bayesian estimation of the model parameters is described in detail in Section 3. Next, we introduce the Bayes factor used to test the inequality constrained hypotheses on the latent means in Section 4. Furthermore, simulations are conducted in Section 5 to assess the validity of the Bayes factor in testing inequality constrained hypotheses. Finally, we analyze real data to illustrate the proposed method and draw some conclusions.

2 Model description

For $k = 1, \ldots, p$, $i = 1, \ldots, n_g$, $g = 1, \ldots, G$, we use $z_{ki}^{(g)}$ to denote the observable response of item $k$ from person $i$ in group $g$, where $z_{ki}^{(g)}$ takes integer values in $\{0, 1, \ldots, C\}$ for all items. $y_{ki}^{(g)}$ is assumed to be the unobservable variable which gives rise to the ordinal response $z_{ki}^{(g)}$. With a set of thresholds of item $k$ for group $g$, $\alpha_k^{(g)} = (\alpha_{k,1}^{(g)}, \alpha_{k,2}^{(g)}, \ldots, \alpha_{k,C}^{(g)})$, the MCCFA model assumes that

$$z_{ki}^{(g)} = c \quad \text{if} \quad \alpha_{k,c}^{(g)} < y_{ki}^{(g)} \le \alpha_{k,c+1}^{(g)}, \qquad k = 1, \ldots, p;\ i = 1, \ldots, n_g;\ g = 1, \ldots, G, \tag{1}$$

where $\alpha_{k,0}^{(g)} = -\infty$ and $\alpha_{k,C+1}^{(g)} = \infty$. We use $z_i^{(g)}$ and $y_i^{(g)}$ to denote respectively the observable and latent responses of person $i$ in group $g$, and $\alpha^{(g)}$ the thresholds of group $g$, such that

$$y_i^{(g)} = \begin{pmatrix} y_{1i}^{(g)} \\ y_{2i}^{(g)} \\ \vdots \\ y_{pi}^{(g)} \end{pmatrix}, \qquad z_i^{(g)} = \begin{pmatrix} z_{1i}^{(g)} \\ z_{2i}^{(g)} \\ \vdots \\ z_{pi}^{(g)} \end{pmatrix}, \qquad \alpha^{(g)} = \begin{pmatrix} \alpha_1^{(g)} \\ \alpha_2^{(g)} \\ \vdots \\ \alpha_p^{(g)} \end{pmatrix}.$$

In this paper, for each $y_i^{(g)}$, we assume there exists a latent factor score denoted $\xi_i^{(g)}$, a $p \times 1$ factor loading vector $\Lambda^{(g)}$, and a $p \times 1$ vector of measurement errors $\epsilon_i^{(g)}$ such that

$$y_i^{(g)} = \Lambda^{(g)} \xi_i^{(g)} + \epsilon_i^{(g)}, \qquad i = 1, \ldots, n_g;\ g = 1, \ldots, G, \tag{2}$$

where $\xi_i^{(g)}$ follows the normal distribution with mean $\mu^{(g)}$ and variance $\phi^{(g)}$, i.e., $N(\mu^{(g)}, \phi^{(g)})$. $\epsilon_i^{(g)}$ is assumed to follow the multivariate normal distribution $\mathrm{MVN}(0, \Psi^{(g)})$, with a diagonal covariance matrix $\Psi^{(g)} = \mathrm{Diag}(\psi_{11}^{(g)}, \ldots, \psi_{pp}^{(g)})$, and $\epsilon_i^{(g)}$ is independent of $\xi_i^{(g)}$.
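To make the threshold mechanism in (1) concrete, the following small R sketch maps latent responses to ordinal categories; the threshold and latent values are illustrative, not taken from the thesis. Boundary ties have probability zero for continuous $y$, so the open/closed endpoint convention is immaterial here.

    # Thresholds alpha_{k,1} < ... < alpha_{k,C} for one item (C = 4, categories 0-4).
    alpha <- c(-2.0, -1.0, -0.5, 1.0)
    y     <- c(-2.3, -1.2, 0.1, 2.4)   # hypothetical latent responses y_{ki}
    z     <- findInterval(y, alpha)    # z_{ki} = c iff alpha_{k,c} < y <= alpha_{k,c+1}
    z                                  # 0 1 3 4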

3 Bayesian estimation

For the $g$-th group, let $Z^{(g)} = (z_1^{(g)}, \ldots, z_{n_g}^{(g)})$ be the matrix of the observed polytomous data, and let $Y^{(g)} = (y_1^{(g)}, \ldots, y_{n_g}^{(g)})$ and $F^{(g)} = (\xi_1^{(g)}, \ldots, \xi_{n_g}^{(g)})$ be respectively the matrices of the latent response and factor score data. Moreover, let $Z = (Z^{(1)}, \ldots, Z^{(G)})$, $Y = (Y^{(1)}, \ldots, Y^{(G)})$, $F = (F^{(1)}, \ldots, F^{(G)})$, $\alpha = (\alpha^{(1)}, \ldots, \alpha^{(G)})$, and $n = \sum_{g=1}^{G} n_g$. Let $\Lambda$ be the parameter matrix consisting of all the $\Lambda^{(g)}$'s for $g = 1, \ldots, G$, with $\mu$, $\Phi$, and $\Psi$ all similarly defined. For the MCCFA model, the parameters to be estimated are the thresholds $\alpha$, the factor loadings $\Lambda$, the means $\mu$, the variances of the latent factors $\Phi$, and the covariance matrix of the measurement errors $\Psi$.

However, not all the above-mentioned parameters are identifiable due to the categorical nature of the polytomous responses. To ensure identifiability of the parameters, we constrain $\Psi = I$, the identity matrix; that is, the error variance is fixed at one for every item. The parameters to be estimated thus become $\alpha$, $\Lambda$, $\Phi$, and $\mu$, and we use $\Theta = (\mu, \Phi, \alpha, \Lambda)$ to denote the set of all parameters of interest.

To obtain the posterior distribution $p(\Theta \mid Z)$, we would need to solve complicated multiple integrals induced by the polytomous variables, so it is difficult to obtain the Bayesian estimates of $\Theta$ via direct derivation of $p(\Theta \mid Z)$. Tanner and Wong (1987) introduced an iterative method for the computation of posterior distributions with augmented data, which facilitates the simulation of the parameters of interest. We therefore consider the complete data $(Z, Y, F)$, obtained by augmenting the observed data $Z$ with the missing data $(Y, F)$. More specifically, we first add the unobserved data $Y$ and $F$ and use Gibbs sampling (Geman & Geman, 1984) to obtain random draws from $p(\Theta, Y, F \mid Z)$; we can then obtain the posterior distribution $p(\Theta \mid Z)$ from $p(\Theta, Y, F \mid Z)$. The iterative process of the Gibbs sampling is as follows. At the $t$-th iteration, with current values $Y^{(t)}$, $\mu^{(t)}$, $F^{(t)}$, $\Lambda^{(t)}$, $\Phi^{(t)}$, and $\alpha^{(t)}$, we successively draw from the following conditional distributions:

• generate $Y^{(t+1)}$ from $p(Y \mid Z, \mu^{(t)}, F^{(t)}, \Lambda^{(t)}, \Phi^{(t)}, \alpha^{(t)})$;
• generate $\mu^{(t+1)}$ from $p(\mu \mid Z, Y^{(t+1)}, F^{(t)}, \Lambda^{(t)}, \Phi^{(t)}, \alpha^{(t)})$;
• generate $F^{(t+1)}$ from $p(F \mid Z, Y^{(t+1)}, \mu^{(t+1)}, \Lambda^{(t)}, \Phi^{(t)}, \alpha^{(t)})$;
• generate $\Lambda^{(t+1)}$ from $p(\Lambda \mid Z, Y^{(t+1)}, \mu^{(t+1)}, F^{(t+1)}, \Phi^{(t)}, \alpha^{(t)})$;
• generate $\Phi^{(t+1)}$ from $p(\Phi \mid Z, Y^{(t+1)}, \mu^{(t+1)}, F^{(t+1)}, \Lambda^{(t+1)}, \alpha^{(t)})$;
• generate $\alpha^{(t+1)}$ from $p(\alpha \mid Z, Y^{(t+1)}, \mu^{(t+1)}, F^{(t+1)}, \Lambda^{(t+1)}, \Phi^{(t+1)})$.

By the conditional independence of the parameters, these conditional distributions simplify as follows:

• $p(Y \mid Z, \mu^{(t)}, F^{(t)}, \Lambda^{(t)}, \Phi^{(t)}, \alpha^{(t)}) = p(Y \mid Z, F^{(t)}, \Lambda^{(t)}, \alpha^{(t)})$;
• $p(\mu \mid Z, Y^{(t+1)}, F^{(t)}, \Lambda^{(t)}, \Phi^{(t)}, \alpha^{(t)}) = p(\mu \mid F^{(t)}, \Phi^{(t)})$;
• $p(F \mid Z, Y^{(t+1)}, \mu^{(t+1)}, \Lambda^{(t)}, \Phi^{(t)}, \alpha^{(t)}) = p(F \mid Y^{(t+1)}, \mu^{(t+1)}, \Lambda^{(t)}, \Phi^{(t)})$;
• $p(\Lambda \mid Z, Y^{(t+1)}, \mu^{(t+1)}, F^{(t+1)}, \Phi^{(t)}, \alpha^{(t)}) = p(\Lambda \mid Z, Y^{(t+1)}, F^{(t+1)})$;
• $p(\Phi \mid Z, Y^{(t+1)}, \mu^{(t+1)}, F^{(t+1)}, \Lambda^{(t+1)}, \alpha^{(t)}) = p(\Phi \mid \mu^{(t+1)}, F^{(t+1)})$;
• $p(\alpha \mid Z, Y^{(t+1)}, \mu^{(t+1)}, F^{(t+1)}, \Lambda^{(t+1)}, \Phi^{(t+1)}) = p(\alpha \mid Z, Y^{(t+1)})$.

Note that we will in fact draw $\phi^{(g)^{-1}}$ from $p(\phi^{(g)^{-1}} \mid \mu^{(g)}, F^{(g)})$ to obtain $\phi^{(g)}$.

3.1 Joint prior distribution

To derive the conditional distributions for the Gibbs sampling process, we need to specify the joint prior distribution of $\Theta = (\mu, \Lambda, \Phi, \alpha)$. Similar to Arminger and Muthén (1998), who assumed independence of the priors of $\mu$, $\Lambda$, and $\Phi$, we assume that $\mu$, $\Lambda$, $\Phi$, and $\alpha$ are all independent. As a result, their joint prior distribution is given by

$$p(\Theta) = p(\mu, \Lambda, \Phi, \alpha) = p(\mu)\,p(\Lambda)\,p(\Phi)\,p(\alpha).$$

We further assume that the priors of the group-specific parameters are independent across groups; that is, $p(\mu) = \prod_{g=1}^{G} p(\mu^{(g)})$, and similarly for $\Lambda$, $\Phi$, and $\alpha$. Thus, we only need to specify, one by one, the priors of $\mu^{(g)}$, $\Lambda^{(g)}$, $\phi^{(g)}$, and $\alpha^{(g)}$ for each group $g = 1, \ldots, G$.

$\mu^{(g)}$'s prior:
$$\mu^{(g)} \sim N(\mu_0^{(g)}, \phi_0^{(g)}), \tag{3}$$
where $\mu_0^{(g)}$ and $\phi_0^{(g)}$ are the mean and variance of the normal distribution.

$\Lambda^{(g)}$'s prior:
$$\lambda_k^{(g)} \sim N(\lambda_{0k}^{(g)}, H_{0k}^{(g)}), \tag{4}$$
where $\lambda_k^{(g)}$ denotes the element of $\Lambda^{(g)}$'s $k$-th row, $\lambda_{0k}^{(g)}$ denotes the element of $\Lambda_0^{(g)}$'s $k$-th row, and $H_{0k}^{(g)}$ is the variance.

$\phi^{(g)^{-1}}$'s prior:
$$\phi^{(g)^{-1}} \sim \mathrm{Gamma}(\rho_0^{(g)}, \theta_0^{(g)}), \tag{5}$$
where $\rho_0^{(g)}$ and $\theta_0^{(g)}$ are the shape and scale of the gamma distribution.

$\alpha^{(g)}$'s prior: $\alpha_k^{(g)}$ is a random vector with ordered elements such that
$$\alpha_k^{(g)} = (\alpha_{k,1}^{(g)}, \alpha_{k,2}^{(g)}, \ldots, \alpha_{k,C}^{(g)}) \quad \text{with} \quad \alpha_{k,1}^{(g)} < \alpha_{k,2}^{(g)} < \cdots < \alpha_{k,C}^{(g)}. \tag{6}$$

Here we use the order statistics of the uniform distribution $U(a, b)$ as their priors. More specifically, $\alpha_{k,c}^{(g)}$ is defined as the $c$-th smallest among a random sample of size $C$ from $U(a, b)$. Moreover, an appropriate choice of a small $a$ and a large $b$ produces a similar effect as the non-informative prior $p(\alpha_{k,j}^{(g)}) \propto 1$, for $g = 1, \ldots, G$. As a result, we have

$$p(\alpha_k^{(g)}) \propto p(\alpha_{k,1}^{(g)}, \alpha_{k,2}^{(g)}, \ldots, \alpha_{k,C}^{(g)}) \cdot I_{(\alpha_{k,1}^{(g)} < \alpha_{k,2}^{(g)} < \cdots < \alpha_{k,C}^{(g)})}
\propto p(\alpha_{k,1}^{(g)})\,p(\alpha_{k,2}^{(g)} \mid \alpha_{k,1}^{(g)})\,p(\alpha_{k,3}^{(g)} \mid \alpha_{k,1}^{(g)}, \alpha_{k,2}^{(g)}) \cdots p(\alpha_{k,C}^{(g)} \mid \alpha_{k,1}^{(g)}, \ldots, \alpha_{k,C-1}^{(g)})
\propto p(\alpha_{k,1}^{(g)})\,p(\alpha_{k,2}^{(g)} \mid \alpha_{k,1}^{(g)})\,p(\alpha_{k,3}^{(g)} \mid \alpha_{k,2}^{(g)}) \cdots p(\alpha_{k,C}^{(g)} \mid \alpha_{k,C-1}^{(g)}), \tag{7}$$

where the last step uses the Markov property of uniform order statistics. Since the order statistics of the uniform distribution on the unit interval have marginal distributions belonging to the Beta family, a change of variables gives the marginal and conditional densities

$$p(\alpha_{k,1}^{(g)}) = \frac{C!}{0!(C-1)!}\left(\frac{\alpha_{k,1}^{(g)} - a}{b - a}\right)^{0}\left(1 - \frac{\alpha_{k,1}^{(g)} - a}{b - a}\right)^{C-1}\frac{1}{b - a},$$

$$p(\alpha_{k,2}^{(g)} \mid \alpha_{k,1}^{(g)}) = \frac{p(\alpha_{k,1}^{(g)}, \alpha_{k,2}^{(g)})}{p(\alpha_{k,1}^{(g)})} = \frac{\frac{C!}{0!\,0!\,(C-2)!}\left(\frac{\alpha_{k,1}^{(g)} - a}{b - a}\right)^{0}\left(\frac{\alpha_{k,2}^{(g)} - \alpha_{k,1}^{(g)}}{b - a}\right)^{0}\left(1 - \frac{\alpha_{k,2}^{(g)} - a}{b - a}\right)^{C-2}\left(\frac{1}{b - a}\right)^{2}}{\frac{C!}{0!(C-1)!}\left(\frac{\alpha_{k,1}^{(g)} - a}{b - a}\right)^{0}\left(1 - \frac{\alpha_{k,1}^{(g)} - a}{b - a}\right)^{C-1}\frac{1}{b - a}},$$

and, by mathematical induction, for $j = 1, \ldots, C-1$,

$$p(\alpha_{k,j+1}^{(g)} \mid \alpha_{k,j}^{(g)}) = \frac{p(\alpha_{k,j}^{(g)}, \alpha_{k,j+1}^{(g)})}{p(\alpha_{k,j}^{(g)})} = \frac{\frac{C!}{(j-1)!\,0!\,(C-j-1)!}\left(\frac{\alpha_{k,j}^{(g)} - a}{b - a}\right)^{j-1}\left(\frac{\alpha_{k,j+1}^{(g)} - \alpha_{k,j}^{(g)}}{b - a}\right)^{0}\left(1 - \frac{\alpha_{k,j+1}^{(g)} - a}{b - a}\right)^{C-j-1}\left(\frac{1}{b - a}\right)^{2}}{\frac{C!}{(j-1)!(C-j)!}\left(\frac{\alpha_{k,j}^{(g)} - a}{b - a}\right)^{j-1}\left(1 - \frac{\alpha_{k,j}^{(g)} - a}{b - a}\right)^{C-j}\frac{1}{b - a}}.$$
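As a quick illustration, random draws from the priors (3)-(6) can be generated in R as below; the hyperparameter values here are placeholders rather than the ones used in the later simulation study.

    set.seed(1)
    C     <- 4                                      # five response categories
    mu    <- rnorm(1, mean = 0, sd = sqrt(10))      # mu^(g) ~ N(mu0, phi0)
    lam_k <- rnorm(1, mean = 1, sd = sqrt(10))      # lambda_k^(g) ~ N(lambda0k, H0k)
    phi   <- 1 / rgamma(1, shape = 10, scale = 36)  # phi^(g)^(-1) ~ Gamma(rho0, theta0)
    alpha <- sort(runif(C, min = -10, max = 10))    # order statistics of U(a, b)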

3.2 Conditional distributions of parameters

Here we present the conditional distributions used in the Gibbs sampler described above.

3.2.1 $Y^{(g)}$'s conditional distribution

$$p(Y^{(g)} \mid F^{(g)}, \Lambda^{(g)}, \alpha^{(g)}, Z^{(g)}) = \prod_{i=1}^{n_g} p(y_i^{(g)} \mid \xi_i^{(g)}, \Lambda^{(g)}, \alpha^{(g)}, z_i^{(g)})
= \prod_{i=1}^{n_g}\prod_{k=1}^{p} p(y_{ki}^{(g)} \mid \xi_i^{(g)}, \lambda_k^{(g)}, \alpha^{(g)}, z_{ki}^{(g)})
= \prod_{i=1}^{n_g}\prod_{k=1}^{p} p(y_{ki}^{(g)} \mid \xi_i^{(g)}, \lambda_k^{(g)}) \cdot I_{(\alpha^{(g)}_{k, z_{ki}^{(g)}},\ \alpha^{(g)}_{k, z_{ki}^{(g)}+1})}(y_{ki}^{(g)}), \tag{8}$$

where
$$y_{ki}^{(g)} \mid \xi_i^{(g)}, \lambda_k^{(g)} \sim N(\lambda_k^{(g)} \xi_i^{(g)}, 1), \tag{9}$$
and $I_{(\alpha^{(g)}_{k, z_{ki}^{(g)}},\ \alpha^{(g)}_{k, z_{ki}^{(g)}+1})}(y_{ki}^{(g)})$ is the indicator function of $y_{ki}^{(g)}$ that equals one when $\alpha^{(g)}_{k, z_{ki}^{(g)}} < y_{ki}^{(g)} \le \alpha^{(g)}_{k, z_{ki}^{(g)}+1}$ and zero otherwise. (10)

3.2.2 $\mu^{(g)}$'s conditional distribution

$$p(\mu^{(g)} \mid F^{(g)}, \phi^{(g)}) \propto p(\mu^{(g)})\,p(\xi_1^{(g)}, \ldots, \xi_{n_g}^{(g)} \mid \mu^{(g)}, \phi^{(g)}) \propto p(\mu^{(g)}) \prod_{i=1}^{n_g} p(\xi_i^{(g)} \mid \mu^{(g)}, \phi^{(g)})
\propto \exp\left\{-\frac{1}{2}\left[(\mu^{(g)} - \mu_0^{(g)})^2 \phi_0^{(g)^{-1}} + \sum_{i=1}^{n_g} (\xi_i^{(g)} - \mu^{(g)})^2 \phi^{(g)^{-1}}\right]\right\}, \tag{11}$$

which is an exponential quadratic form in $\mu^{(g)}$. Completing the square and pulling out constant factors gives

$$\mu^{(g)} \mid F^{(g)}, \phi^{(g)} \sim N(\mu_{n_g}, \phi_{n_g}), \tag{12}$$

where
$$\mu_{n_g} = (\phi_0^{(g)^{-1}} + n_g \phi^{(g)^{-1}})^{-1}(\phi_0^{(g)^{-1}} \mu_0^{(g)} + n_g \phi^{(g)^{-1}} \bar{\xi}^{(g)}), \qquad \phi_{n_g} = (\phi_0^{(g)^{-1}} + n_g \phi^{(g)^{-1}})^{-1},$$
and $\bar{\xi}^{(g)}$ is the average of $\xi_i^{(g)}$, $i = 1, \ldots, n_g$.
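A sketch of these two draws in R for a single item, person, and group, using hypothetical current values; the truncated-normal draw for $y_{ki}^{(g)}$ uses the inverse-CDF method on the interval given by the thresholds.

    set.seed(2)
    alpha_k <- c(-Inf, -2, -1, -0.5, 1, Inf)  # (alpha_{k,0}, ..., alpha_{k,C+1})
    lam_k <- 0.9; xi_i <- 0.3; z_ki <- 2      # hypothetical current values
    m  <- lam_k * xi_i                        # mean of y_{ki} given xi_i, lambda_k
    lo <- alpha_k[z_ki + 1]                   # alpha_{k, z}
    hi <- alpha_k[z_ki + 2]                   # alpha_{k, z+1}
    u  <- runif(1, pnorm(lo, m, 1), pnorm(hi, m, 1))
    y_ki <- qnorm(u, m, 1)                    # N(m, 1) truncated to (lo, hi]

    # Draw of mu^(g) from (12), given current factor scores and variance:
    xi    <- rnorm(100, 0.2, 1)               # hypothetical current factor scores
    mu0   <- 0; phi0 <- 10000; phi <- 1
    phi_n <- 1 / (1 / phi0 + length(xi) / phi)
    mu_n  <- phi_n * (mu0 / phi0 + length(xi) * mean(xi) / phi)
    mu_g  <- rnorm(1, mu_n, sqrt(phi_n))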

3.2.3 $F^{(g)}$'s conditional distribution

Since $y_i^{(g)} = \Lambda^{(g)} \xi_i^{(g)} + \epsilon_i^{(g)}$, where $\xi_i^{(g)}$ follows $N(\mu^{(g)}, \phi^{(g)})$ and $\epsilon_i^{(g)}$ is a $p \times 1$ vector with distribution $\mathrm{MVN}(0, \Psi^{(g)})$, $y_i^{(g)}$ is a $p \times 1$ vector with distribution $\mathrm{MVN}(\Lambda^{(g)}\mu^{(g)},\ \Lambda^{(g)}\phi^{(g)}\Lambda^{(g)T} + \Psi^{(g)})$. The covariance of $y_i^{(g)}$ and $\xi_i^{(g)}$ is
$$\mathrm{Cov}(y_i^{(g)}, \xi_i^{(g)}) = \mathrm{Cov}(\Lambda^{(g)}\xi_i^{(g)} + \epsilon_i^{(g)},\ \xi_i^{(g)}) = \Lambda^{(g)}\,\mathrm{Cov}(\xi_i^{(g)}, \xi_i^{(g)}) = \Lambda^{(g)}\phi^{(g)}.$$

That is, $\begin{pmatrix} \xi_i^{(g)} \\ y_i^{(g)} \end{pmatrix}$ is a $(p+1) \times 1$ vector such that
$$\begin{pmatrix} \xi_i^{(g)} \\ y_i^{(g)} \end{pmatrix} \Big|\ \mu^{(g)}, \Lambda^{(g)}, \phi^{(g)} \sim \mathrm{MVN}\left(\begin{pmatrix} \mu^{(g)} \\ \Lambda^{(g)}\mu^{(g)} \end{pmatrix},\ \begin{pmatrix} \phi^{(g)} & (\Lambda^{(g)}\phi^{(g)})^T \\ \Lambda^{(g)}\phi^{(g)} & \Lambda^{(g)}\phi^{(g)}\Lambda^{(g)T} + \Psi^{(g)} \end{pmatrix}\right).$$

According to the property of the conditional multivariate normal distribution, we get $F^{(g)}$'s conditional distribution:
$$p(F^{(g)} \mid Y^{(g)}, \mu^{(g)}, \Lambda^{(g)}, \phi^{(g)}) = \prod_{i=1}^{n_g} p(\xi_i^{(g)} \mid y_i^{(g)}, \mu^{(g)}, \Lambda^{(g)}, \phi^{(g)}), \tag{13}$$
where
$$\xi_i^{(g)} \mid y_i^{(g)}, \mu^{(g)}, \Lambda^{(g)}, \phi^{(g)} \sim N(\delta_i^{(g)}, \Delta^{(g)}), \tag{14}$$
with
$$\delta_i^{(g)} = \mu^{(g)} + (\Lambda^{(g)}\phi^{(g)})^T [\Lambda^{(g)}\phi^{(g)}\Lambda^{(g)T} + \Psi^{(g)}]^{-1}(y_i^{(g)} - \Lambda^{(g)}\mu^{(g)}),$$
$$\Delta^{(g)} = \phi^{(g)} - (\Lambda^{(g)}\phi^{(g)})^T [\Lambda^{(g)}\phi^{(g)}\Lambda^{(g)T} + \Psi^{(g)}]^{-1}\Lambda^{(g)}\phi^{(g)}.$$
Since $\Psi^{(g)}$ has been fixed at the identity matrix $I$ for identification purposes, the above mean and variance simplify to
$$\delta_i^{(g)} = \mu^{(g)} + (\Lambda^{(g)}\phi^{(g)})^T [\Lambda^{(g)}\phi^{(g)}\Lambda^{(g)T} + I]^{-1}(y_i^{(g)} - \Lambda^{(g)}\mu^{(g)}),$$
$$\Delta^{(g)} = \phi^{(g)} - (\Lambda^{(g)}\phi^{(g)})^T [\Lambda^{(g)}\phi^{(g)}\Lambda^{(g)T} + I]^{-1}\Lambda^{(g)}\phi^{(g)}.$$
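The conditional draw (14) can be sketched in R as follows for one person; the parameter values are hypothetical, and solve() handles the $p \times p$ inverse.

    set.seed(3)
    p   <- 5
    lam <- c(1, 0.9, 0.7, 0.8, 0.75)          # Lambda^(g), with lambda_1 fixed at 1
    phi <- 1; mu <- 0.2
    y_i <- lam * 0.5 + rnorm(p)               # a hypothetical latent response vector
    S   <- phi * tcrossprod(lam) + diag(p)    # Lambda phi Lambda^T + I
    w   <- solve(S, lam * phi)                # S^{-1} Lambda phi
    delta_i <- mu + sum(w * (y_i - lam * mu)) # conditional mean
    Delta   <- phi - sum(w * lam * phi)       # conditional variance
    xi_i <- rnorm(1, delta_i, sqrt(Delta))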

3.2.4 $\Lambda^{(g)}$'s conditional distribution

$$p(\Lambda^{(g)} \mid Z^{(g)}, Y^{(g)}, F^{(g)}) = \prod_{k=1}^{p} p(\lambda_k^{(g)} \mid Y_k^{(g)}, F^{(g)}) \propto \prod_{k=1}^{p} p(\lambda_k^{(g)})\,p(Y_k^{(g)} \mid \lambda_k^{(g)}, F^{(g)})
\propto \prod_{k=1}^{p} \exp\Big\{-\tfrac{1}{2} H_{0k}^{(g)^{-1}}(\lambda_k^{(g)} - \lambda_{0k}^{(g)})^2\Big\} \times \exp\Big\{-\tfrac{1}{2}\psi_{kk}^{-1}\sum_{i=1}^{n_g}(y_{ki}^{(g)} - \lambda_k^{(g)}\xi_i^{(g)})^2\Big\}, \tag{15}$$

where $\psi_{kk}^{(g)}$ denotes the $k$-th diagonal element of $\Psi^{(g)}$ and $Y_k^{(g)}$ the $k$-th row of $Y^{(g)}$. Since $H_{0k}^{(g)^{-1}}\lambda_{0k}^{(g)2}$ can be regarded as a constant, expanding the squares and collecting the terms in $\lambda_k^{(g)}$ simplifies (15) to

$$p(\Lambda^{(g)} \mid Z^{(g)}, Y^{(g)}, F^{(g)}) \propto \exp\Big\{-\tfrac{1}{2}\sum_{k=1}^{p}\Big[\lambda_k^{(g)2}\big(H_{0k}^{(g)^{-1}} + \psi_{kk}^{-1} F^{(g)} F^{(g)T}\big) - 2\lambda_k^{(g)}\big(H_{0k}^{(g)^{-1}}\lambda_{0k}^{(g)} + \psi_{kk}^{-1} F^{(g)} Y_k^{(g)T}\big) + \psi_{kk}^{-1} Y_k^{(g)} Y_k^{(g)T}\Big]\Big\}. \tag{16}$$

Finally we arrive at $\lambda_k^{(g)}$'s conditional distribution:
$$\lambda_k^{(g)} \mid Z^{(g)}, Y_k^{(g)}, F^{(g)} \sim N(\omega_k^{(g)}, \Omega_k^{(g)}), \tag{17}$$
with
$$\omega_k^{(g)} = \Omega_k^{(g)}\big[H_{0k}^{(g)^{-1}}\lambda_{0k}^{(g)} + \psi_{kk}^{-1} F^{(g)} Y_k^{(g)T}\big], \qquad \Omega_k^{(g)} = \big[H_{0k}^{(g)^{-1}} + \psi_{kk}^{-1} F^{(g)} F^{(g)T}\big]^{-1}.$$
Again, with $\Psi$ fixed at $I$ for identification purposes, $\omega_k^{(g)}$ and $\Omega_k^{(g)}$ simplify to
$$\omega_k^{(g)} = \Omega_k^{(g)}\big[H_{0k}^{(g)^{-1}}\lambda_{0k}^{(g)} + F^{(g)} Y_k^{(g)T}\big] \quad \text{and} \quad \Omega_k^{(g)} = \big[H_{0k}^{(g)^{-1}} + F^{(g)} F^{(g)T}\big]^{-1}.$$

3.2.5 $\phi^{(g)^{-1}}$'s conditional distribution

$$p(\phi^{(g)^{-1}} \mid F^{(g)}, \mu^{(g)}) \propto p(\phi^{(g)^{-1}})\,p(\mu^{(g)} \mid \phi^{(g)^{-1}})\,p(F^{(g)} \mid \mu^{(g)}, \phi^{(g)^{-1}}) \propto p(\phi^{(g)^{-1}})\prod_{i=1}^{n_g} p(\xi_i^{(g)} \mid \mu^{(g)}, \phi^{(g)})$$
$$= \frac{(\phi^{(g)^{-1}})^{\rho_0^{(g)} - 1}\exp\{-\phi^{(g)^{-1}}\theta_0^{(g)^{-1}}\}}{\theta_0^{(g)\rho_0^{(g)}}\,\Gamma(\rho_0^{(g)})} \times \prod_{i=1}^{n_g} \frac{1}{\sqrt{2\pi}\,(\phi^{(g)})^{1/2}}\exp\Big\{-\tfrac{1}{2}(\xi_i^{(g)} - \mu^{(g)})^2\phi^{(g)^{-1}}\Big\}$$
$$\propto (\phi^{(g)^{-1}})^{\frac{n_g}{2} + \rho_0^{(g)} - 1}\exp\Big\{-\phi^{(g)^{-1}}\Big[\theta_0^{(g)^{-1}} + \tfrac{1}{2}\sum_{i=1}^{n_g}(\xi_i^{(g)} - \mu^{(g)})^2\Big]\Big\}. \tag{18}$$

So we obtain $\phi^{(g)^{-1}}$'s conditional distribution:
$$\phi^{(g)^{-1}} \mid F^{(g)}, \mu^{(g)} \sim \mathrm{Gamma}\Big(\frac{n_g}{2} + \rho_0^{(g)},\ R^{(g)}\Big), \tag{19}$$
with scale
$$R^{(g)} = \Big(\theta_0^{(g)^{-1}} + \frac{1}{2}\sum_{i=1}^{n_g}(\xi_i^{(g)} - \mu^{(g)})^2\Big)^{-1}.$$
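The draws (17) and (19) can be sketched in R as below for a single group and item, again with hypothetical current values. Note that (19) is parameterized by the scale $R^{(g)}$, which corresponds to rate = $1/R^{(g)}$ in R's rgamma().

    set.seed(4)
    n_g <- 200
    xi  <- rnorm(n_g, 0.2, 1)                 # current factor scores
    y_k <- 0.9 * xi + rnorm(n_g)              # current latent responses for item k

    # lambda_k draw with psi_kk = 1 (identification) and prior N(lambda0k, H0k):
    lambda0k <- 1; H0k <- 10
    Omega_k <- 1 / (1 / H0k + sum(xi^2))      # since F F^T = sum(xi^2) here
    omega_k <- Omega_k * (lambda0k / H0k + sum(xi * y_k))
    lam_k   <- rnorm(1, omega_k, sqrt(Omega_k))

    # phi^(-1) draw under the Gamma(rho0, theta0) prior:
    mu_g <- 0.2; rho0 <- 10; theta0 <- 36
    rate_g <- 1 / theta0 + 0.5 * sum((xi - mu_g)^2)   # equals 1 / R^(g)
    phi_g  <- 1 / rgamma(1, shape = n_g / 2 + rho0, rate = rate_g)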

3.2.6 $\alpha^{(g)}$'s conditional distribution

In this study we consider the five-point Likert scale, so we take $C = 4$. Then

$$p(\alpha_k^{(g)} \mid Y^{(g)}, Z^{(g)}) \propto p(\alpha_{k,1}^{(g)}, \alpha_{k,2}^{(g)}, \alpha_{k,3}^{(g)}, \alpha_{k,4}^{(g)}) \times \prod_{j=1}^{4} I_{(\tilde{y}_{j-1}^{(g)},\ \underline{y}_j^{(g)})}(\alpha_{k,j}^{(g)})
\propto p(\alpha_{k,1}^{(g)})\,p(\alpha_{k,2}^{(g)} \mid \alpha_{k,1}^{(g)})\,p(\alpha_{k,3}^{(g)} \mid \alpha_{k,2}^{(g)})\,p(\alpha_{k,4}^{(g)} \mid \alpha_{k,3}^{(g)}) \times \prod_{j=1}^{4} I_{(\tilde{y}_{j-1}^{(g)},\ \underline{y}_j^{(g)})}(\alpha_{k,j}^{(g)}), \tag{20}$$

where $\tilde{y}_{j-1}^{(g)}$ is the maximum of the $y_{ki}^{(g)}$'s satisfying $z_{ki}^{(g)} = j - 1$, $\underline{y}_j^{(g)}$ is the minimum of the $y_{ki}^{(g)}$'s satisfying $z_{ki}^{(g)} = j$, and $I_{(\tilde{y}_{j-1}^{(g)},\ \underline{y}_j^{(g)})}(\alpha_{k,j}^{(g)})$ is the indicator function of $\alpha_{k,j}^{(g)}$ that equals one when $\tilde{y}_{j-1}^{(g)} < \alpha_{k,j}^{(g)} < \underline{y}_j^{(g)}$ and zero otherwise. (21)

3.3 Identifiability constraints

In the MCCFA model, different sets of parameters can result in the same response distributions (Chang, Huang, & Tsai, 2015), so it is impossible to identify all the parameters from $Z^{(g)}$ alone. Therefore, we need to constrain some parameters to make the other parameters identifiable, and we call such constraints identifiability constraints. With a single latent factor, two groups ($G = 2$), and $p$ items ($p \ge 3$), the number of required constraints is $2p + 4$ (Chang, Hsu, & Tsai, 2015). Using similar arguments and derivations, a $G$-group MCCFA model needs $Gp + 2G$ constraints to make the parameters identifiable. More specifically, we use the following constraints for the case with $G = 3$:
$$\psi_{kk}^{(g)} = 1, \quad k = 1, \ldots, p;\qquad \lambda_1^{(g)} = 1;\qquad \alpha_{13}^{(g)} = \text{constant}, \qquad g = 1, 2, 3. \tag{22}$$

3.4 Convergence of the Gibbs sampler

For the Gibbs sampler, we monitor convergence via the "potential scale reduction factor" (Gelman & Rubin, 1992). First, we calculate the variance of the simulation from each chain and average these within-chain variances. Next, we compute the mixture variance obtained by mixing all chains. Finally, the potential scale reduction factor, denoted $\hat{R}$, is the square root of the ratio of the mixture variance to the average within-chain variance:
$$\hat{R} = \sqrt{\frac{\text{mixture variance}}{\text{average within-chain variance}}}. \tag{23}$$
When the chains have reached convergence, the average of the within-chain variances and the mixture variance will be nearly identical, so the value of $\hat{R}$ should be very close to 1. If $\hat{R}$ is much greater than 1, it implies that the chains have not yet mixed enough. In this study, we use the commands gelman.diag and gelman.plot in the coda package in R to monitor the convergence of the Markov chain Monte Carlo (MCMC) runs, and we continue until $\hat{R}$ is less than 1.2 for the parameters of interest (Song & Lee, 2001). Once the MCMC has reached convergence, the successive draws from those conditional distributions can be considered random draws from the joint posterior of all the parameters of interest.
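A minimal illustration of this convergence check with the coda package, using three synthetic chains in place of actual Gibbs output:

    library(coda)
    set.seed(5)
    chains <- mcmc.list(lapply(1:3, function(ch) mcmc(rnorm(5000))))
    gelman.diag(chains)   # potential scale reduction factor; should be near 1
    gelman.plot(chains)   # evolution of the shrink factor across iterations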

4 Bayes factor

The Bayes factor is a popular Bayesian model selection criterion. For the comparison of two models $M_1$ and $M_2$ based on the observed data $D$, the Bayes factor $BF_{12}$ is defined as (Gelman, Carlin, Stern, & Rubin, 2003)

$$BF_{12} = \frac{p(D \mid M_1)}{p(D \mid M_2)}, \tag{24}$$

where $p(D \mid M_1)$ and $p(D \mid M_2)$ denote the probabilities of $D$ under models $M_1$ and $M_2$, respectively. The larger $BF_{12}$ is, the greater the evidence in favor of model $M_1$ against $M_2$. Although the Bayes factor has no direct interpretation as a p-value, guidelines on the extent to which a Bayes factor provides support or evidence for $M_1$ compared to $M_2$ have been suggested, as reported in Table 1 (Kass & Raftery, 1995).

Table 1: Interpretation of the Bayes factor.

  BF        Evidence against M2
  < 1       Negative (supports M2)
  1-3       Barely worth mentioning
  3-20      Positive (supports M1)
  20-150    Strong
  > 150     Decisive

By the properties of conditional probability, we have

$$BF_{12} = \frac{p(D \mid M_1)}{p(D \mid M_2)} = \frac{\frac{p(D)\,p(M_1 \mid D)}{p(M_1)}}{\frac{p(D)\,p(M_2 \mid D)}{p(M_2)}} = \frac{p(M_1 \mid D)/p(M_2 \mid D)}{p(M_1)/p(M_2)}, \tag{25}$$

where $p(M_1 \mid D)/p(M_2 \mid D)$ is the posterior odds and $p(M_1)/p(M_2)$ is the prior odds.

In this study, we are interested in testing the inequality constrained hypothesis on the means of the single latent factor among different groups.

For example, we use $\mu^{(1)}$, $\mu^{(2)}$, and $\mu^{(3)}$ to denote the means of group 1, group 2, and group 3, respectively, and the inequality hypothesis $\mu^{(1)} > \mu^{(2)} > \mu^{(3)}$ is of interest. Let $H_i$ denote the inequality constrained hypothesis $\mu^{(1)} > \mu^{(2)} > \mu^{(3)}$ and $H_c$ denote the complement of $H_i$. To test $H_i$ versus $H_c$ using the Bayes factor based on data $D$, it is straightforward to see that $p(H_c \mid D) = 1 - p(H_i \mid D)$ and $p(H_c) = 1 - p(H_i)$. Once the MCMC has reached convergence and we obtain random draws from the joint posterior of all the parameters of interest, we use the proportion of draws satisfying $H_i$ to estimate $p(H_i \mid D)$. We denote this proportion by $f_i$, also called the proportion of fit. On the other hand, we use the proportion of random draws from the joint prior distribution of all the parameters satisfying $H_i$ to estimate $p(H_i)$; this proportion is similarly denoted by $c_i$ and called the complexity (Hoijtink, 2013).

Consider the hypothesis $\mu^{(1)} > \mu^{(2)} > \mu^{(3)}$. Because $\mu^{(1)}$, $\mu^{(2)}$, and $\mu^{(3)}$ have the same prior distribution $N(\mu_0^{(g)}, \phi_0^{(g)})$, there are $3! = 6$ hypotheses with an equivalent structure; for example, one other equivalent hypothesis is $\mu^{(1)} < \mu^{(2)} < \mu^{(3)}$. Each of these hypotheses has the same complexity under the prior distribution. Moreover, the probability of the equality hypothesis $\mu^{(1)} = \mu^{(2)} = \mu^{(3)}$ is zero. The union of the six equivalent hypotheses covers 100% of the parameter space of $(\mu^{(1)}, \mu^{(2)}, \mu^{(3)})$, and therefore the complexity of each hypothesis is $\frac{1}{6}$ (Hoijtink, 2013). As a result, the Bayes factor of $H_i$ versus $H_c$, denoted $BF_{ic}$, is

$$BF_{ic} = \frac{f_i/(1 - f_i)}{c_i/(1 - c_i)} = \frac{5 f_i}{1 - f_i}. \tag{26}$$
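A sketch of the computation of (26) from posterior draws of the three factor means; the draws below are synthetic stand-ins for actual Gibbs output.

    set.seed(6)
    draws <- cbind(mu1 = rnorm(10000,  0.0, 0.1),
                   mu2 = rnorm(10000, -0.2, 0.1),
                   mu3 = rnorm(10000, -0.4, 0.1))
    f_i <- mean(draws[, "mu1"] > draws[, "mu2"] & draws[, "mu2"] > draws[, "mu3"])
    c_i <- 1 / 6                                    # complexity of one of the 3! orders
    BF_ic <- (f_i / (1 - f_i)) / (c_i / (1 - c_i))  # equals 5 * f_i / (1 - f_i)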

5 Simulation

We use R (R Core Team, 2015) to generate data from the MCCFA model and use the Bayes factor previously described to test inequality constrained hypotheses on the factor means among different groups. To assess the validity of the Bayes factor in testing such constrained hypotheses, we consider two settings of order relations in the means of the latent factor, namely equality ($\mu^{(1)} = \mu^{(2)} = \mu^{(3)}$) and inequality ($\mu^{(1)} > \mu^{(2)} > \mu^{(3)}$). Under the equality setting, we hope to see whether we can successfully reject all the inequality constrained hypotheses using the Bayes factor approach; under the inequality setting, we investigate the strength of the Bayes factor in testing the inequality constrained hypothesis of means under the MCCFA model.

5.1 Simulation setting

5.1.1 Parameters for data generation

We consider the setup of three groups ($G = 3$) and ten ordinal items ($p = 10$). For the parameters in the MCCFA model, we consider the simplest case with measurement invariance across groups, so that $\Lambda^{(g)}$, $\alpha^{(g)}$, and $\phi^{(g)}$ are set to be the same for all groups $g = 1, 2, 3$. More specifically, $\phi^{(g)} = 1$,

$$\Lambda^{(g)} = (1.0,\ 0.9,\ 0.7,\ 0.88,\ 0.72,\ 0.86,\ 0.74,\ 0.84,\ 0.76,\ 0.82)^T,$$

and the rows of the $10 \times 4$ threshold matrix $\alpha^{(g)}$, for items 1 to 10, are

  (-2.78, -1.31, -0.60, 0.96)
  (-1.89, -0.67, -0.08, 1.48)
  (-2.74, -1.38, -0.45, 1.33)
  (-1.82, -0.65, -0.20, 0.87)
  (-2.44, -1.29, -0.58, 0.85)
  (-2.35, -1.15, -0.46, 0.95)
  (-2.12, -1.15, -0.59, 1.00)
  (-1.97, -0.70, -0.12, 1.28)
  (-2.98, -1.48, -0.57, 1.35)
  (-2.80, -1.35, -0.43, 1.31)

Note that $\mu^{(g)}$ is the only parameter that differs across groups under the equality and inequality settings; the $\mu^{(g)}$ values used for data generation are reported in Table 2. For each setting, we generate the ordered categorical data under the MCCFA model shown in (1) and (2). Three choices of sample size for each group, $n_g$, are considered: 250, 500, and 1000. The three groups are assumed to have equal sample sizes.

5.1.2 Parameters of the prior distributions

We set the parameters of the prior distributions of $\mu^{(g)}$, $\Lambda^{(g)}$, $\Phi^{(g)}$, and $\alpha^{(g)}$ in (3) to (6) as follows:

• $\mu^{(g)} \sim N(\mu_0^{(g)}, \phi_0^{(g)})$ with $\mu_0^{(g)} = 0$ and $\phi_0^{(g)} = 10000$;
• $\Lambda^{(g)} \sim \mathrm{MVN}(\Lambda_0^{(g)}, H_0^{(g)})$ with $\Lambda_0^{(g)} = \mathbf{1}$ and $H_0^{(g)} = 10I$, where $\mathbf{1}$ and $I$ are respectively the vector of ones and the identity matrix;
• $\phi^{(g)^{-1}} \sim \mathrm{Gamma}(\rho_0^{(g)}, \theta_0^{(g)})$ with $\rho_0^{(g)} = 10$ and $\theta_0^{(g)} = 36$;
• $\alpha_{k,1}^{(g)} < \alpha_{k,2}^{(g)} < \cdots < \alpha_{k,C}^{(g)}$ as the order statistics of the uniform distribution $U(a, b)$ with $a = -10000$ and $b = 10000$.

5.1.3 Starting values

The starting values of $\Lambda^{(g)}$, $\alpha^{(g)}$, and $\phi^{(g)}$ are the same for all groups and all settings:
$$\Lambda^{(g)} = \mathbf{1}, \qquad \text{each row of } \alpha^{(g)} \text{ equal to } (-2, -1, 1, 2), \qquad \phi^{(g)} = 3,$$
where $\mathbf{1}$ is a $10 \times 1$ vector of ones. For each chain and each setting, we take randomly sampled values from $N(0, 1)$ as the starting value of $\mu^{(g)}$. We use the above starting values to generate $F^{(g)}$ and start the Gibbs sampling process with the identifiability constraints $\lambda_1^{(g)} = 1$ and $\alpha_{13}^{(g)} = -0.6$ for all $g = 1, 2, 3$, as suggested in (22).
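For reference, a sketch of the data-generating step under (1) and (2) for one group, using the loadings and thresholds of the first three items listed above:

    set.seed(7)
    n_g   <- 250
    lam   <- c(1.0, 0.9, 0.7)
    alpha <- rbind(c(-2.78, -1.31, -0.60, 0.96),
                   c(-1.89, -0.67, -0.08, 1.48),
                   c(-2.74, -1.38, -0.45, 1.33))
    mu_g <- 0; phi_g <- 1
    xi <- rnorm(n_g, mu_g, sqrt(phi_g))               # latent factor scores
    y  <- outer(lam, xi) + matrix(rnorm(3 * n_g), 3)  # latent responses, Psi = I
    z  <- t(sapply(1:3, function(k) findInterval(y[k, ], alpha[k, ])))
    table(z[1, ])                                     # category counts for item 1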

Table 2: Posterior means and standard deviations (SD) of the factor means (ng = 250).

                                 chain 1           chain 2           chain 3
  Setting     Parameter  True    mean     SD       mean     SD       mean     SD       R-hat
  equality    µ(1)       0       -0.178   0.096    -0.170   0.091    -0.179   0.085    1.10
              µ(2)       0       -0.041   0.110    -0.154   0.127    -0.123   0.104    1.01
              µ(3)       0       -0.054   0.093    -0.066   0.070    -0.008   0.068    1.12
  inequality  µ(1)       0       -0.208   0.089    -0.198   0.095    -0.263   0.079    1.13
              µ(2)       -0.2    -0.362   0.105    -0.361   0.095    -0.383   0.099    1.01
              µ(3)       -0.4    -0.457   0.080    -0.467   0.078    -0.426   0.077    1.02

Table 3: Posterior means and standard deviations (SD) of the factor means (ng = 500).

                                 chain 1           chain 2           chain 3
  Setting     Parameter  True    mean     SD       mean     SD       mean     SD       R-hat
  equality    µ(1)       0       0.053    0.064    -0.017   0.088    0.040    0.071    1.01
              µ(2)       0       0.006    0.081    0.019    0.076    -0.016   0.074    1.04
              µ(3)       0       0.006    0.073    0.015    0.089    -0.007   0.068    1.00
  inequality  µ(1)       0       -0.039   0.077    -0.014   0.075    0.013    0.069    1.01
              µ(2)       -0.2    -0.308   0.064    -0.338   0.078    -0.337   0.079    1.01
              µ(3)       -0.4    -0.500   0.068    -0.513   0.068    -0.501   0.061    1.03

5.2 Results

For each chain, 200000 iterations are run; we choose to burn in the first 100000 iterations to ensure convergence and set the thinning to 10 to avoid high correlations between successive draws. Under both the equality and inequality settings, we compute the posterior means and standard deviations of the latent factor means and calculate the Bayes factors via $f_i$ and $c_i$ for all possible inequality constrained hypotheses on the means of the different groups. The estimation and hypothesis testing results are reported in Tables 2 to 7.
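The burn-in and thinning scheme just described amounts to the following index selection on a chain of 200000 draws (synthetic here):

    draws <- rnorm(200000)                        # stand-in for one chain's output
    kept  <- draws[seq(100001, 200000, by = 10)]  # burn in 100000, thin by 10
    length(kept)                                  # 10000 retained draws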

Table 4: Posterior means and standard deviations (SD) of the factor means (ng = 1000).

                                 chain 1           chain 2           chain 3
  Setting     Parameter  True    mean     SD       mean     SD       mean     SD       R-hat
  equality    µ(1)       0       -0.062   0.053    -0.038   0.052    -0.051   0.049    1.06
              µ(2)       0       -0.113   0.049    -0.077   0.066    -0.093   0.059    1.01
              µ(3)       0       -0.012   0.059    -0.033   0.059    -0.025   0.056    1.03
  inequality  µ(1)       0       0.017    0.054    0.053    0.074    -0.006   0.060    1.01
              µ(2)       -0.2    -0.145   0.042    -0.144   0.044    -0.152   0.048    1.00
              µ(3)       -0.4    -0.438   0.046    -0.414   0.052    -0.402   0.056    1.01

Table 5: The Bayes factor under ng = 250.

  Setting                 Hypothesis           chain 1   chain 2   chain 3
  equality                µ(1) > µ(2) > µ(3)   0.2127    0.0628    0.0454
  (µ(1) = µ(2) = µ(3))    µ(1) > µ(3) > µ(2)   0.1240    0.8398    0.1536
                          µ(2) > µ(1) > µ(3)   0.7465    0.0546    0.0623
                          µ(2) > µ(3) > µ(1)   3.7458    1.2383    1.0234
                          µ(3) > µ(1) > µ(2)   0.5261    2.6947    2.4940
                          µ(3) > µ(2) > µ(1)   1.9599    1.9862    4.0285
  inequality              µ(1) > µ(2) > µ(3)   9.3062    10.9033   4.7886
  (µ(1) > µ(2) > µ(3))    µ(1) > µ(3) > µ(2)   1.5062    1.1805    2.2823
                          µ(2) > µ(1) > µ(3)   0.5556    0.6155    0.7307
                          µ(2) > µ(3) > µ(1)   0.0277    0.0302    0.0953
                          µ(3) > µ(1) > µ(2)   0.0561    0.0307    0.1986
                          µ(3) > µ(2) > µ(1)   0.0070    0.0085    0.0659

Table 6: The Bayes factor under ng = 500.

  Setting                 Hypothesis           chain 1    chain 2    chain 3
  equality                µ(1) > µ(2) > µ(3)   1.2274     0.8473     1.4499
  (µ(1) = µ(2) = µ(3))    µ(1) > µ(3) > µ(2)   1.9444     0.4389     2.3411
                          µ(2) > µ(1) > µ(3)   1.2360     0.6741     0.8754
                          µ(2) > µ(3) > µ(1)   0.5611     1.6103     0.4993
                          µ(3) > µ(1) > µ(2)   0.9123     0.8976     0.8706
                          µ(3) > µ(2) > µ(1)   0.3735     1.7549     0.3660
  inequality              µ(1) > µ(2) > µ(3)   354.7122   142.0588   95.6036
  (µ(1) > µ(2) > µ(3))    µ(1) > µ(3) > µ(2)   0.0597     0.1754     0.2615
                          µ(2) > µ(1) > µ(3)   0.0105     0.0005     0.0000
                          µ(2) > µ(3) > µ(1)   0.0000     0.0000     0.0000
                          µ(3) > µ(1) > µ(2)   0.0000     0.0000     0.0000
                          µ(3) > µ(2) > µ(1)   0.0000     0.0000     0.0000

Table 7: The Bayes factor under ng = 1000.

  Setting                 Hypothesis           chain 1    chain 2    chain 3
  equality                µ(1) > µ(2) > µ(3)   0.3225     0.5175     0.5121
  (µ(1) = µ(2) = µ(3))    µ(1) > µ(3) > µ(2)   1.3363     2.1674     1.9348
                          µ(2) > µ(1) > µ(3)   0.0720     0.3996     0.1605
                          µ(2) > µ(3) > µ(1)   0.0787     0.4819     0.2582
                          µ(3) > µ(1) > µ(2)   5.1854     1.5181     2.4206
                          µ(3) > µ(2) > µ(1)   1.1706     1.3211     1.4243
  inequality              µ(1) > µ(2) > µ(3)   956.5385   352.1429   282.3563
  (µ(1) > µ(2) > µ(3))    µ(1) > µ(3) > µ(2)   0.0000     0.0005     0.0040
                          µ(2) > µ(1) > µ(3)   0.0261     0.0705     0.0844
                          µ(2) > µ(3) > µ(1)   0.0000     0.0000     0.0000
                          µ(3) > µ(1) > µ(2)   0.0000     0.0000     0.0000
                          µ(3) > µ(2) > µ(1)   0.0000     0.0000     0.0000

Table 8: The seven questions on SITES.

  Item  Item description
  1     Students are more attentive when computers are used in class
  8     ICT can effectively enhance problem solving and critical thinking skills of students
  11    ICT-based learning enables students to take more responsibility for their own learning
  15    ICT improves the monitoring of students' learning progress
  18    The achievement of students can be increased when using computers for teaching
  19    The use of e-mail increases the motivation of students
  23    Using computers in class leads to more productivity of students

Table 9: The starting values used for the three Gibbs sampler chains.

  Chain    Group 1      Group 2      Group 3
  Chain 1  0.9916315    -0.0917919   -0.2593416
  Chain 2  0.5163471    1.9677642    0.1250624
  Chain 3  0.4907143    0.5842933    1.9249208

6 Real data

In 1998, the International Association for the Evaluation of Educational Achievement (IEA) established an international comparative research project, called the Second Information Technology in Education Study (SITES), on the infusion of Information and Communication Technology (ICT) in the educational systems of different countries. The goal of this project is to provide policy-makers and educational practitioners with information about the extent to which ICT contributes to bringing about, in those systems, reforms that satisfy the needs of the Information Society. In the SITES project, IEA conducted a survey in 26 countries. The survey collected samples of at least 200 computer-using schools from at least one of the primary, lower secondary, and upper secondary levels. We select three countries from this survey, Taiwan, Lithuania, and France, with sample sizes of 572, 685, and 768, respectively. Our goal is to test whether there exist some inequalities among the means of the three countries. There are 24 questions in this survey, each with five response categories: strongly disagree, slightly disagree, uncertain, slightly agree, and strongly agree. We selected seven questions from the survey to examine whether ICT improves the students' attitude or ability; these seven questions are reported in Table 8.

Here we use the same parameters of the prior distributions as in the simulation settings, but for the starting values of $\mu^{(g)}$, we again take random samples from $N(0, 1)$, as listed in Table 9. We first show in Figures 1 to 3 the latent factor means at each iteration for Taiwan, Lithuania, and France, respectively.

While examining the three chains, they seem to reach a stable state after the first 50000 iterations, and therefore we choose to burn in the first 50000 iterations. Again, a thinning of 10 is applied to avoid high correlation between successive iterations. After the burn-in of 50000 iterations, the potential scale reduction factor is shown in Figure 4 to assess convergence using the three chains; the resulting potential scale reduction factors of the factor mean for Taiwan, Lithuania, and France are respectively 1.02, 1.03, and 1.04. According to the criterion of 1.2, we conclude that the chains have reached convergence, and we can then obtain the posterior distributions of the factor means for each chain and each group. More specifically, we depict in Figures 5 to 7 the posterior distributions of the three factor means.

[Figure 1: The iterations of the factor mean with chain 1.]

[Figure 2: The iterations of the factor mean with chain 2.]

[Figure 3: The iterations of the factor mean with chain 3.]

[Figure 4: The potential scale reduction factor plot of the factor mean for each group.]

[Figure 5: The posterior distribution (after burn-in of the first 50000 iterations) of the factor mean with chain 1.]

[Figure 6: The posterior distribution (after burn-in of the first 50000 iterations) of the factor mean with chain 2.]

[Figure 7: The posterior distribution (after burn-in of the first 50000 iterations) of the factor mean with chain 3.]

Table 10: The posterior means and standard deviations (SD) of the factor means for real data.

                          Chain 1          Chain 2          Chain 3
  Country    Parameter    mean     SD      mean     SD      mean     SD
  Taiwan     µ(1)         0.147    0.060   0.165    0.064   0.142    0.066
  Lithuania  µ(2)         0.590    0.063   0.554    0.066   0.528    0.059
  France     µ(3)         -0.055   0.048   -0.062   0.048   -0.030   0.052

Table 11: The Bayes factors of the equivalent inequality constrained hypotheses of real data.

  Hypothesis           Chain 1      Chain 2      Chain 3
  µ(1) > µ(2) > µ(3)   0.0000       0.0000       0.0000
  µ(1) > µ(3) > µ(2)   0.0000       0.0000       0.0000
  µ(2) > µ(1) > µ(3)   2626.5789    1277.0513    620.0000
  µ(2) > µ(3) > µ(1)   0.0095       0.0196       0.0403
  µ(3) > µ(1) > µ(2)   0.0000       0.0000       0.0000
  µ(3) > µ(2) > µ(1)   0.0000       0.0000       0.0000

We calculate the posterior estimates and the Bayes factors via $f_i$ and $c_i$, as previously described, for all possible inequality constrained hypotheses on the means of the different groups. More specifically, we report in Table 10 the posterior mean and standard deviation of the factor mean for each group obtained from the three chains. The posterior means obtained from the three chains are similar once the variability indicated by the associated standard deviations is taken into account. The Bayes factors for testing the inequality constrained hypotheses are shown in Table 11, where the Bayes factor of µ(2) > µ(1) > µ(3) is obviously larger than those of all the other hypotheses for all three chains. Therefore, according to the results, we conclude that the latent factor mean of Lithuania is larger than that of Taiwan, which is in turn larger than that of France. Besides, we add up the scores of the seven questions for each person and use ANOVA and Tukey's test to test the mean differences among the three countries (a sketch follows below). The results are shown in Table 12, from which we can draw the same conclusion: the mean of Lithuania is larger than that of Taiwan, which is larger than that of France. If there is some response style behind the data, the ANOVA method might not be able to account for it and could therefore arrive at a wrong conclusion; our method, however, would remain useful for such data with a possible response style.
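A sketch of this ANOVA follow-up in R; the data frame below is synthetic, with hypothetical column names country and total (the sum of the seven item scores per respondent, items scored 1 to 5).

    set.seed(8)
    dat <- data.frame(
      country = rep(c("Taiwan", "Lithuania", "France"), times = c(572, 685, 768)),
      total   = sample(7:35, 2025, replace = TRUE)
    )
    fit <- aov(total ~ country, data = dat)
    TukeyHSD(fit, conf.level = 0.95)   # pairwise mean differences with adjusted p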

Table 12: Tukey multiple comparisons of means, 95% family-wise confidence level.

  Comparison          Difference   Lower bound   Upper bound   p-value after adjustment
  Lithuania-France    3.7056       3.2180        4.1931        0.0000
  Taiwan-France       3.0739       2.5616        3.5863        0.0000
  Taiwan-Lithuania    -0.6316      -1.1570       -0.1062       0.0135

7 Discussion

When the factor means are equal, we expect the Bayes factors for the inequality constrained hypotheses to be similar to one another, which in turn suggests that the data do not tend to support any particular inequality constrained hypothesis. According to the criterion in Table 1, we tend to support an inequality constrained hypothesis with a Bayes factor greater than 3. However, for $n_g = 250$ in Table 5, the Bayes factor values of 3.7458 obtained for the µ(2) > µ(3) > µ(1) hypothesis and 4.0285 obtained for the µ(3) > µ(2) > µ(1) hypothesis are greater than 3. And in Table 7, the Bayes factor value of 5.1854 obtained for the hypothesis µ(3) > µ(1) > µ(2) with $n_g = 1000$ is also greater than 3. There are some possible causes of such an outcome. First, although $\xi_i^{(g)}$ follows $N(\mu^{(g)}, \phi^{(g)})$, the sample mean of $\xi_i^{(g)}$, $i = 1, \ldots, n_g$, may not be exactly equal to $\mu^{(g)}$ because of sampling variability. That is, for the generated data with $n_g = 250$, a greater Bayes factor value might occur simply by chance. When the sample size increases, the sample mean of the $\xi_i^{(g)}$'s should be closer to $\mu^{(g)}$, and the decrease in sampling variability improves the Bayes factor values, as shown in the cases with $n_g = 500$ and $n_g = 1000$ in Tables 6 and 7. Second, among the obtained posterior draws, no matter how small the difference $\mu^{(1)} - \mu^{(2)}$ is, a draw with $\mu^{(1)} > \mu^{(2)}$ contributes to the proportion of fit $f_i$ for that inequality, despite the fact that such a small difference might be ignorable or occur simply by chance. In other words, the magnitudes of the differences among the factor means $\mu^{(1)}$, $\mu^{(2)}$, and $\mu^{(3)}$ might be small and yet give rise to such a large Bayes factor value. Third, the corresponding Bayes factor values in the other chains are all smaller than 3, so the values greater than 3 are not typical and occur simply by chance.

In terms of the processing time of the Gibbs sampling algorithm, about 60 hours are needed to run the 200000 iterations in the case of $n_g = 500$, and any increase in either the number of iterations or the sample size will require more time. This considerably long running time is a known major drawback of MCMC algorithms in general. However, the flexibility of allowing for tests of inequality constrained hypotheses does encourage the use of Bayesian estimation methods in obtaining the posterior distributions of the relevant parameters. In practice, we can use other estimation methods, such as the limited-information or least-squares approaches available in existing software such as Mplus

(34) & Muth´en, 1998-2015) to obtain parameter estimates and take the estimated values as the starting values for the proposed Gibbs sampling runs. With such starting values presumably close to or likely to be considered as the draws from the posterior distributions of the parameters, the number of iterations required to reach convergence should be greatly reduced and much less running time is needed to obtain the posterior distributions of the parameters of interest. When inequality relations exist among the factor means, we expect the Bayes factor of actual inequality hypothesis to be significantly larger than all the other hypotheses. For the inequality case in Table 5 to Table 7, the Bayes factor of µ(1) > µ(2) > µ(3) is in fact much larger than all the other inequality hypotheses for sample sizes of ng = 250, 500, 1000. In other words, Bayes factor is shown effective and valuable in testing inequality constrained hypothesis of factor means under the MCCFA model. In this study, we assume there is just one single latent factor, but in reality there may be more than one latent factors. The Bayesian estimation method here can in fact apply to the cases with many latent factors by using similar arguments, but the identifiability constraints should be reconsidered. However, in that case the means of many latent factors will form a mean vector, so the inequality relations among vectors need a clear definition. However, this might be of some interest for future studies.. 8. Conclusion. This study discusses the Bayesian estimation and uses Bayes factor to test for inequality constrained hypotheses of factor means for ordered categorical data among three groups. We extend the estimation method of Song and Lee (2001) to allow for estimating the mean of the latent factors. And the minimal identification constraints are used to ensure identifiability of the parameters in the Gibbs sampling algorithm. Overall, we conclude that Bayes factor is useful in testing hypotheses involving inequality constraints of factor means for ordered categorical data.. 31.

References

[1] Arminger, G., & Muthén, B. O. (1998). A Bayesian approach to nonlinear latent variable models using the Gibbs sampler and the Metropolis-Hastings algorithm. Psychometrika, 63, 271-300.

[2] Chang, Y. W., Hsu, N. J., & Tsai, R. C. (2015). Unifying differential item functioning of categorical CFA and GRM under a discretization of a normal variant. Manuscript submitted for publication.

[3] Chang, Y. W., Huang, W. K., & Tsai, R. C. (2015). DIF detection using multiple-group categorical CFA with minimum free baseline approach. Journal of Educational Measurement, 52, 181-199.

[4] Cheung, G. W., & Rensvold, R. B. (2000). Assessing extreme and acquiescence response sets in cross-cultural research using structural equations modeling. Journal of Cross-Cultural Psychology, 31, 187-212.

[5] Fors, F., & Kulin, J. (2016). Bringing affect back in: Measuring and comparing subjective well-being across countries. Social Indicators Research, 127, 323-339.

[6] Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (2003). Bayesian data analysis (2nd ed.). Boca Raton, FL: Chapman & Hall/CRC.

[7] Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457-511.

[8] Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721-741.

[9] Hoijtink, H. (2013). Objective Bayes factors for inequality constrained hypotheses. International Statistical Review, 81, 207-229.

[10] Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90, 773-795.

[11] Klugkist, I., & Hoijtink, H. (2007). The Bayes factor for inequality and about equality constrained models. Computational Statistics & Data Analysis, 51, 6367-6379.

[12] Lee, S. Y., Poon, W. Y., & Bentler, P. M. (1990). Full maximum likelihood analysis of structural equation models with polytomous variables. Statistics and Probability Letters, 9, 91-97.

[13] Lee, S. Y., Poon, W. Y., & Bentler, P. M. (1995). A two-stage estimation of structural equation models with continuous and polytomous variables. British Journal of Mathematical and Statistical Psychology, 48, 339-358.

[14] Millsap, R. E., & Tein, Y. J. (2004). Assessing factorial invariance in ordered-categorical measures. Multivariate Behavioral Research, 39, 479-515.

[15] Muthén, L. K., & Muthén, B. O. (1998-2015). Mplus user's guide (7th ed.). Los Angeles, CA: Muthén & Muthén.

[16] R Core Team. (2015). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.

[17] Song, X. Y., & Lee, S. Y. (2001). Bayesian estimation and test for factor analysis model with continuous and polytomous data in several populations. British Journal of Mathematical and Statistical Psychology, 54, 237-263.

[18] Sörbom, D. (1974). A general method for studying differences in factor means and factor structure between groups. British Journal of Mathematical and Statistical Psychology, 27, 229-239.

[19] Stark, S., Chernyshenko, O. S., & Drasgow, F. (2006). Detecting differential item functioning with confirmatory factor analysis and item response theory: Toward a unified strategy. Journal of Applied Psychology, 91, 1292-1306.

[20] Tanner, M. A., & Wong, W. H. (1987). The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association, 82, 528-540.

[21] van der Sluis, S., Vinkhuyzen, A. A. E., Boomsma, D. I., & Posthuma, D. (2010). Sex differences in adults' motivation to achieve. Intelligence, 38, 433-446.

[22] Wagenmakers, E.-J. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14, 779-804.

[23] Widaman, K. F., & Reise, S. P. (1997). Exploring the measurement invariance of psychological instruments: Applications in the substance use domain. In K. J. Bryant, M. Windle, & S. G. West (Eds.), The science of prevention: Methodological advances from alcohol and substance abuse research (pp. 281-324). Washington, DC: American Psychological Association.

