OCTOBER 2014
DOI: 10.1007/s11336-013-9368-7

BAYESIAN INFERENCES OF LATENT CLASS MODELS WITH AN UNKNOWN NUMBER OF CLASSES

JIA-CHIUN PAN

NATIONAL CHUNG CHENG UNIVERSITY

GUAN-HUA HUANG

NATIONAL CHIAO TUNG UNIVERSITY

This paper focuses on analyzing data collected in situations where investigators use multiple discrete indicators as surrogates, for example, a set of questionnaires. A very flexible latent class model is used for analysis. We propose a Bayesian framework to perform the joint estimation of the number of latent classes and model parameters. The proposed approach applies the reversible jump Markov chain Monte Carlo to analyze finite mixtures of multivariate multinomial distributions. In the paper, we also develop a procedure for the unique labeling of the classes. We have carried out a detailed sensitivity analysis for various hyperparameter specifications, which leads us to make standard default recommendations for the choice of priors. The usefulness of the proposed method is demonstrated through computer simulations and a study on subtypes of schizophrenia using the Positive and Negative Syndrome Scale (PANSS).

Key words: categorical data, finite mixture model, label switching, reversible jump Markov chain Monte Carlo, sensitivity analysis, surrogate endpoint.

1. Introduction

Latent class analysis (LCA), originally described by Green (1951) and systematically developed by Lazarsfeld and Henry (1968), as well as by Goodman (1974), has been found useful for classifying subjects based on their responses to a set of categorical items. The basic model postulates an underlying categorical latent variable with, say, J levels; measured items are assumed independent of one another within any category of the latent variable. Recently, several authors extended the LCA model to describe the effects of measured covariates on the underlying categorical latent variable (Dayton & Macready, 1988; Bandeen-Roche, Miglioretti, Zeger, & Rathouz, 1997), or on measured item distributions within latent levels (Melton, Liang, & Pulver, 2005). This paper studies a more general latent class model proposed by Huang and Bandeen-Roche (2004), which incorporates covariate effects both on the latent variable and on the measured items themselves (henceforth, the regression extension of latent class analysis (RLCA) model).

To reduce complexity and enhance interpretability, one usually fixes the number of levels or “classes” in a given latent class model and performs the parameter estimation under that fixed number of classes. When prior knowledge does not mandate the number of classes, selecting the number of classes to fit becomes an analytic challenge. Standard practice is to base selection on either the Pearson χ² or the likelihood ratio goodness-of-fit test, and to fix J at the lowest number of classes that yields acceptable fit (Goodman, 1974; Formann, 1992). Instead of testing the goodness of fit of a specified model, we might use a criterion for selecting among different numbers of classes. The AIC (Akaike, 1987) and BIC (Schwarz, 1978) criteria, which trade off the value of the likelihood at the maximum likelihood solution against the number of estimated parameters, are commonly used approaches.

Electronic Supplementary Material The online version of this article (doi:10.1007/s11336-013-9368-7) contains supplementary material, which is available to authorized users.

Requests for reprints should be sent to Guan-Hua Huang, Institute of Statistics, National Chiao Tung University, 1001 Ta Hsueh Road, Hsinchu 30010, Taiwan. E-mail: ghuang@stat.nctu.edu.tw

One common feature of the above methods is that they all must fit the latent class model repeatedly under different numbers of classes. Due to the slow convergence of commonly used fitting methods (e.g., the EM algorithm), these procedures may be infeasible in practical applications. Huang (2005) developed a new tool for identifying the number of latent classes, which requires no model fit based on an assumed number of classes and synthesizes ideas from factor analysis, latent variable theory, and generalized linear model residuals. However, such two-stage approaches are inefficient and may produce misleading results. A poor goodness of fit may be the result of poor parameter estimation, wrong model assumptions, or a bad choice for the number of classes. Without knowing the exact source of the poor goodness of fit, the use of goodness-of-fit tests in the specification of the number of classes may result in overparameterization in the number of classes and yield a meaningless interpretation.

Joint inference on the number of classes and the model parameters is preferable because it is convenient, accurate, and flexible. Traditional frequentist likelihood-based approaches do not allow this joint analysis, but advances in Bayesian inference provide possible solutions. Markov chain Monte Carlo (MCMC) has had a profound effect on Bayesian statistics. MCMC draws samples from the posterior distribution by running a cleverly constructed Markov chain (e.g., a Metropolis–Hastings algorithm; Metropolis, Rosenbluth, Rosenbluth, Teller, & Teller, 1953; Hastings, 1970), and then forms sample averages to approximate posterior expectations of model parameters. The MCMC method is restricted to problems where the joint distribution has a density with respect to a fixed-dimensional parameter space. Green (1995) proposed the reversible jump Markov chain Monte Carlo (RJMCMC) method, which offers a general framework for constructing reversible Markov chain samplers that jump between parameter subspaces of different dimensionality. Latent class models with different numbers of classes correspond to parameter subspaces of different dimensionality; RJMCMC thus provides a solution that can jointly estimate the number of classes and the model parameters.

Richardson and Green (1997) made use of RJMCMC to estimate the number of components and the mixture component parameters jointly in finite mixtures of univariate normal distributions. Since then, RJMCMC has been applied to many other mixture distributions, such as mixtures of exponential distributions (Gruet, Philippe, & Robert, 1999), mixtures of Poisson distributions (Viallefont, Richardson, & Green, 2002), mixtures of multivariate normal distributions (Dellaportas & Papageorgiou, 2006), and mixtures of multivariate Poisson distributions (Meligkotsidou, 2007). Bartolucci, Mira, and Scaccia (2003) and Pandolfi, Bartolucci, and Friel (2010) discussed the RJMCMC implementation for latent class models with binary measured items, which can be viewed as finite mixtures of multivariate Bernoulli distributions. Bartolucci et al. (2003) used standard RJMCMC methods for updating estimates between different numbers of classes, and applied the delayed rejection strategy to increase the acceptance rate of class number changes. Pandolfi et al. (2010) generalized the multiple-try Metropolis algorithm for RJMCMC, and applied this generalization to latent class models to improve the mixing of estimates. In this paper, we propose to implement the RJMCMC method to perform the joint estimation of the number of classes and model parameters in a more general latent class model that has polytomous measured items and incorporates various covariate effects. We thus focus on the model of finite mixtures of multivariate multinomial distributions incorporating covariate effects.

We now provide an outline for the remainder of the paper. Section 2 gives a Bayesian version of RLCA. Section 3 illustrates the RJMCMC scheme we propose and how we deal with the label switching problem. In Section 4, simulation studies are conducted to evaluate the behavior of the proposed estimation method and to compare it with existing approaches. In Section 5, we perform a sensitivity analysis that discusses results from various hyperparameter and proposal parameter settings, and provides default recommendations for the values of the hyperparameters and proposal parameters. In Section 6, data on determining subtypes of schizophrenia using the Positive and Negative Syndrome Scale (PANSS) are used to illustrate the proposed methods. A discussion is provided in Section 7.

2. Bayesian Models for Regression Extension of Latent Class Analysis

2.1. Regression Extension Latent Class Model

To specify a latent class model with J classes, we define Y_i = (Y_{i1}, …, Y_{iM})^T to be a set of M polytomous response variables for the ith individual, i = 1, …, N. The mth variable, Y_{im}, can take one of the values {1, …, K_m}, where K_m ≥ 2; the latent variable S_i denotes the subpopulation to which the ith individual belongs, and takes a value in {1, …, J}. The basic idea of latent class modelling is that the components of Y_i are associated because the overall population is comprised of a mixture of J subpopulations or classes, and these components are assumed to be statistically independent within latent classes. Therefore, the distribution of Y_i can be expressed as the finite mixture density:

\[
\Pr(Y_{i1}=y_{i1},\ldots,Y_{iM}=y_{iM}) \;=\; \sum_{j=1}^{J}\Biggl[\Pr(S_i=j)\prod_{m=1}^{M}\prod_{k=1}^{K_m}\Pr(Y_{im}=k\mid S_i=j)^{y_{imk}}\Biggr], \tag{1}
\]

where y_{imk} = I(y_{im} = k) = 1 if y_{im} = k, and 0 otherwise. The LCA model assumes that Pr(S_i = j) = η_j and Pr(Y_{im} = k | S_i = j) = π_{mkj} for all i. Thus, the model treats the class membership probabilities, η_j, and the item response probabilities conditional on class membership, π_{mkj}, as homogeneous over individuals. Huang and Bandeen-Roche (2004) incorporated covariates (x_i, z_i) into LCA, where x_i = (1, x_{i1}, …, x_{iP})^T are predictors associated with the latent variable S_i, and z_i = (z_{i1}, …, z_{iM}) with z_{im} = (z_{im1}, …, z_{imL})^T, m = 1, …, M, are covariates built to have direct influence on the response variables. The LCA model can then be broadened to RLCA and stated as

\[
\Pr(Y_{i1}=y_{i1},\ldots,Y_{iM}=y_{iM}\mid x_i, z_i) \;=\; \sum_{j=1}^{J}\Biggl[\eta_j(x_i)\prod_{m=1}^{M}\prod_{k=1}^{K_m}\pi_{mkj}(z_{im})^{y_{imk}}\Biggr], \tag{2}
\]

where η_j(x_i) and π_{mkj}(z_{im}) are often implemented assuming the generalized logit link function under the generalized linear model framework (McCullagh & Nelder, 1989):

\[
\log\Biggl(\frac{\eta_j(x_i)}{\eta_J(x_i)}\Biggr) = \beta_{0j} + \beta_{1j}x_{i1} + \cdots + \beta_{Pj}x_{iP} \tag{3}
\]
and
\[
\log\Biggl(\frac{\pi_{mkj}(z_{im})}{\pi_{mK_mj}(z_{im})}\Biggr) = \gamma_{mkj} + \alpha_{1mk}z_{im1} + \cdots + \alpha_{Lmk}z_{imL} \tag{4}
\]
for i = 1, …, N; m = 1, …, M; k = 1, …, (K_m − 1); j = 1, …, (J − 1) in (3); and j = 1, …, J in (4).

Notice that in the conditional probability model (4), we allow unrestricted intercepts and level- and item-specific covariate coefficients, but we do not allow the coefficients to vary across classes (i.e., α_{lmk} depends on m and k but is independent of j). This constraint is logical if the primary purpose of modeling the conditional probabilities is to prevent possible misclassification by adjusting for characteristics associated with item measurements (Huang & Bandeen-Roche, 2004). It is also necessary in order to unambiguously distinguish covariate effects on the measured response probabilities from covariate effects on the class probabilities. Three assumptions are necessary to complete model (2):

(i) Pr(S_i = j | x_i, z_i) = Pr(S_i = j | x_i);

(ii) Pr(Y_{i1} = y_{i1}, …, Y_{iM} = y_{iM} | S_i, x_i, z_i) = Pr(Y_{i1} = y_{i1}, …, Y_{iM} = y_{iM} | S_i, z_i);

(iii) Pr(Y_{i1} = y_{i1}, …, Y_{iM} = y_{iM} | S_i, z_i) = ∏_{m=1}^{M} Pr(Y_{im} = y_{im} | S_i, z_i).

These assumptions give rise to the following implications: from (i), class membership probabilities are associated with x_i only; from (ii), conditioning on class membership, responses are associated only with z_i; and from (iii), the multiple measurements are independent given class membership S_i and z_i.
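To make the link functions concrete, the sketch below evaluates η_j(x_i) from (3), π_mkj(z_im) from (4), and the mixture log-likelihood (2). This is a minimal illustration under stated assumptions, not the authors' implementation; the array shapes, the zero-based level/class indexing, and the function names (class_probs, item_probs, rlca_loglik) are our own conventions.

```python
import numpy as np

def class_probs(beta, x):
    """eta_j(x_i) via the generalized logit link (3); class J is the
    reference class with zero coefficients.  beta: (P+1, J-1);
    x: (N, P+1) with a leading column of ones."""
    logits = x @ beta                                   # (N, J-1)
    logits = np.hstack([logits, np.zeros((x.shape[0], 1))])
    logits -= logits.max(axis=1, keepdims=True)         # numerical stability
    eta = np.exp(logits)
    return eta / eta.sum(axis=1, keepdims=True)         # (N, J)

def item_probs(gamma_m, alpha_m, z_m):
    """pi_mkj(z_im) via (4) for one item m; level K_m is the reference.
    gamma_m: (K_m-1, J); alpha_m: (L, K_m-1); z_m: (N, L)."""
    # logits[i, k, j] = gamma_{mkj} + sum_l alpha_{lmk} z_{iml}
    logits = gamma_m[None, :, :] + (z_m @ alpha_m)[:, :, None]
    N, _, J = logits.shape
    logits = np.concatenate([logits, np.zeros((N, 1, J))], axis=1)
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)             # (N, K_m, J)

def rlca_loglik(beta, gammas, alphas, x, zs, y):
    """Log-likelihood of the RLCA mixture (2).  y[m]: (N,) integer
    responses in {0, ..., K_m - 1}; zs[m]: (N, L) covariates for item m."""
    log_mix = np.log(class_probs(beta, x))              # (N, J), accumulated per class
    for gamma_m, alpha_m, z_m, y_m in zip(gammas, alphas, zs, y):
        pi = item_probs(gamma_m, alpha_m, z_m)          # (N, K_m, J)
        log_mix += np.log(pi[np.arange(len(y_m)), y_m, :])
    return np.logaddexp.reduce(log_mix, axis=1).sum()   # sum_i log sum_j exp(...)
```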

2.2. Hierarchical Model and Priors

Under a Bayesian framework, we specify prior distributions for the parameters in the RLCA model. The parameters in RLCA can be summarized as the number of classes J; the regression parameters for the membership probabilities β = [β_{pj}]_{0≤p≤P, 1≤j≤J−1}; the regression intercepts for modelling the conditional probabilities γ = [γ_1^T, …, γ_M^T]^T with γ_m = [γ_{mkj}]_{1≤k≤K_m−1, 1≤j≤J}; and the covariate coefficients for modelling the conditional probabilities α = [α_1, …, α_M]^T with α_m = [α_{lmk}]_{1≤l≤L, 1≤k≤K_m−1}.

We set all components of β, γ, and α to be independently and normally distributed with mean 0 and pre-specified variance σ_P². Here, we adopt a noninformative prior mean of zero; such a setting is common when prior scientific knowledge does not provide appropriate values of the parameters (Garrett & Zeger, 2000). For J, we assume a uniform distribution between 1 and a pre-specified integer J_max. Class membership inference is often of interest, so we add the latent class variable into the joint distribution from which Bayesian estimation is inferred. Therefore, based on assumptions (i), (ii), and (iii) with the selected prior distributions, the joint distribution of all variables for the RLCA model given covariates (x, z) can be represented as

\[
\begin{aligned}
p(J, \beta, \gamma, \alpha, S, Y; x, z) &= p(J) \times p(\beta \mid J) \times p(\gamma \mid J, \beta) \times p(\alpha \mid J, \beta, \gamma)\\
&\quad \times p(S \mid J, \beta, \gamma, \alpha; x, z) \times p(Y \mid J, \beta, \gamma, \alpha, S; x, z)\\
&= p(J) \times p(\beta \mid J) \times p(\gamma \mid J) \times p(\alpha)\\
&\quad \times p(S \mid J, \beta; x) \times p(Y \mid J, \gamma, \alpha, S; z),
\end{aligned} \tag{5}
\]

where Y = [Y_1^T, …, Y_N^T]^T, S = (S_1, …, S_N)^T and, here and throughout this paper, we use p(·) and p(·|·) to denote joint and conditional distribution functions, respectively. Moreover, the distributions of S and of Y given S can be expressed as

\[
p(S \mid J, \beta; x) = \prod_{i=1}^{N} \Pr(S_i = s_i \mid J, \beta; x_i) = \prod_{i=1}^{N} \prod_{j=1}^{J} \eta_j(x_i)^{s_{ij}} \tag{6}
\]


and
\[
p(Y \mid J, \gamma, \alpha, S; z) = \prod_{i=1}^{N} \Pr(Y_{i1} = y_{i1}, \ldots, Y_{iM} = y_{iM} \mid J, \gamma, \alpha, S_i = s_i; z_i) = \prod_{i=1}^{N} \prod_{j=1}^{J} \Biggl[\prod_{m=1}^{M} \prod_{k=1}^{K_m} \pi_{mkj}(z_{im})^{y_{imk}}\Biggr]^{s_{ij}} \tag{7}
\]
with s_{ij} = I(s_i = j) and y_{imk} = I(y_{im} = k).

It is important to address the issue of labeling the classes. Our RLCA likelihood is invariant under switching of the class labels; as a result, the posterior distribution is not identifiable across permutations of the class labels. We thus need to adopt a unique labeling for the class membership. Once a unique labeling is determined, such a procedure is equivalent to ordering the joint prior distribution, so the joint prior distribution of the parameters is the number of invariant class label permutations times the product of the individual densities. Our labeling procedure is detailed in Section 3.4.

3. Bayesian Inferences with Variable Dimension Parameters

3.1. Reversible Jump Markov Chain Monte Carlo Algorithm

The Markov chain Monte Carlo (MCMC) method for the simulation of a distribution produces an ergodic Markov chain whose stationary distribution is the distribution of interest (Robert & Casella, 2004). There have been several earlier approaches to MCMC computation for models with changing dimensionality, for instance, birth-and-death processes (Ripley, 1977; Geyer & Møller, 1994) or pseudo-priors (Carlin & Chib, 1995), but the general formalization named reversible jump MCMC (RJMCMC) was not proposed until Green (1995). In brief, RJMCMC is a random sweep Metropolis–Hastings method (Metropolis et al., 1953; Hastings, 1970) adapted for general state spaces (Richardson & Green, 1997). In this paper, we use the RJMCMC algorithm for the situation where the number of classes is unfixed and is to be determined.

For our hierarchical RLCA model, we shall make use of six move types:

(a) updating the class memberships S;
(b) updating the regression parameters for membership probabilities β;
(c) updating the regression intercepts for conditional probabilities γ;
(d) updating the covariate coefficients for conditional probabilities α;
(e) birth or death of a class;
(f) splitting one class into two, or merging two classes into one.

Move types (a)–(d) are conventional and can apply the Gibbs sampling scheme (Geman & Geman, 1993; Zeger & Karim, 1991). Moves (e) and (f) involve updating the value of J, implying a change of dimensionality for the parameters in (b) and (c). One complete pass over these six moves will be called “a sweep” and is the basic time step of the algorithm.

3.2. Gibbs Sampling Scheme

To illustrate the Gibbs sampling, we need to calculate the full conditional distribution of the variable of interest given all the others. In the case of S_i, the full conditional distribution is

\[
\Pr(S_i = j \mid J, \beta, \gamma, \alpha, Y; x, z) \;\propto\; \eta_j(x_i)\prod_{m=1}^{M}\prod_{k=1}^{K_m}\pi_{mkj}(z_{im})^{y_{imk}}. \tag{8}
\]

Since S_i is discrete, the sample of S_i given all other variables can be drawn directly from a multinomial distribution with the probabilities on the right-hand side of (8), scaled to sum to 1.
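As a concrete sketch of this Gibbs step, the snippet below draws every S_i from the multinomial full conditional (8), assuming η and the per-class item log-likelihoods have already been evaluated; the function and argument names are ours, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_update_S(eta, item_logliks):
    """Draw each S_i from its full conditional (8):
    Pr(S_i = j | ...) ∝ eta_j(x_i) * prod_m pi_{m, y_im, j}(z_im).
    eta: (N, J); item_logliks: (N, J) holding
    sum_m log pi_{m, y_im, j}(z_im) for each subject and class."""
    logp = np.log(eta) + item_logliks
    logp -= logp.max(axis=1, keepdims=True)     # stabilize before exponentiating
    p = np.exp(logp)
    p /= p.sum(axis=1, keepdims=True)           # scale to sum to 1
    u = rng.random((p.shape[0], 1))
    return (p.cumsum(axis=1) < u).sum(axis=1)   # inverse-CDF draw, values in {0,...,J-1}
```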

The form of the full conditional distribution of β makes sampling from it difficult. Instead of sampling directly from the full conditional distribution, we use rejection sampling methods, in which a sample is first generated from an envelope distribution that is easier to sample from, and this sample is then accepted with a certain probability such that the accepted samples are precisely from the full conditional distribution (Zeger & Karim, 1991). We tried the standard Metropolis–Hastings procedure for Steps (b), (c), and (d); however, the acceptance rate of Metropolis–Hastings sampling was very low and very long MCMC runs were needed. To generate samples of β, β is divided into the components {β^{[1]}, …, β^{[J−1]}} with β^{[j]} being the jth column of β. Each component β^{[j]} is then updated conditioning on β^{[−j]}, where β^{[−j]} is the β matrix with the jth column deleted. The full conditional distribution of β^{[j]} given β^{[−j]} is of the form

\[
p(\beta^{[j]} \mid J, \beta^{[-j]}, \gamma, \alpha, S, Y; x, z) \;\propto\; p(S \mid J, \beta; x) \times p(\beta^{[j]} \mid J) = p(S \mid J, \beta^{[j]}, \beta^{[-j]}; x)\, p(\beta^{[j]} \mid J). \tag{9}
\]

Following the approach of Zeger and Karim (1991), we choose the envelope distribution for β^{[j]}, g_1(β^{[j]}), to be a multivariate Gaussian distribution with mean β̂^{[j]}, the maximum likelihood estimator of β^{[j]} in (9), and variance c Σ̂_{β^{[j]}}, the inverse of the observed Fisher information of (9) multiplied by a constant c. To perform rejection sampling, a constant c* is chosen so that p(S | J, β^{[j]}, β^{[−j]}; x) p(β^{[j]} | J) ≤ c* · g_1(β^{[j]}) over all β^{[j]} ∈ R^{P+1}. A vector sample of β^{[j]} is produced by the following steps:

1. Generate a random variate β* from g_1(β^{[j]});
2. Generate a random variate u from Uniform(0, 1);
3. If u ≤ p(S | J, β*, β^{[−j]}; x) p(β* | J) / (c* · g_1(β*)), accept β*; otherwise return to Step 1.

In β's rejection sampling scheme, we let c = 1.2 and set c* to attain

\[
p\bigl(S \mid J, \hat{\beta}^{[j]}, \beta^{[-j]}; x\bigr)\, p\bigl(\hat{\beta}^{[j]} \mid J\bigr) = c^{*} \cdot g_1\bigl(\hat{\beta}^{[j]}\bigr).
\]

Our selection of c∗ is adopted from Zeger and Karim (1991). These selections worked well in covering (9) in the simulations and real data analysis below.
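A minimal sketch of this envelope-and-touch-point construction is given below, assuming SciPy is available. Here log_target stands for the unnormalized log of (9), and c* is set so the scaled envelope touches the target at the mode, as in the text; the function name and the max_tries safeguard are our own additions, and dominance of the envelope away from the mode is assumed rather than verified.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)

def rejection_sample(log_target, beta_hat, Sigma_hat, c=1.2, max_tries=10_000):
    """Rejection sampling with Gaussian envelope g1 = N(beta_hat, c * Sigma_hat),
    in the spirit of Zeger and Karim (1991).  log_target: unnormalized log
    full conditional (9); beta_hat, Sigma_hat: ML estimate and inverse
    observed Fisher information from (9)."""
    g1 = multivariate_normal(mean=beta_hat, cov=c * Sigma_hat)
    # c* chosen so that target(beta_hat) = c* g1(beta_hat)
    log_c_star = log_target(beta_hat) - g1.logpdf(beta_hat)
    for _ in range(max_tries):
        cand = g1.rvs(random_state=rng)
        # accept if u <= target(cand) / (c* g1(cand)), done on the log scale
        if np.log(rng.random()) <= log_target(cand) - log_c_star - g1.logpdf(cand):
            return cand
    raise RuntimeError("rejection sampler failed to accept a draw")
```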

The rejection method is applied to γ as well. First, γ is separated into the column vectors {γ^{[1]}, …, γ^{[J]}}. A column vector γ^{[j]} is then updated from the full conditional distribution

\[
p(\gamma^{[j]} \mid J, \beta, \gamma^{[-j]}, \alpha, S, Y; x, z) \;\propto\; p(Y \mid J, \gamma, \alpha, S; z) \times p(\gamma^{[j]} \mid J) = p(Y \mid J, \gamma^{[j]}, \gamma^{[-j]}, \alpha, S; z)\, p(\gamma^{[j]} \mid J), \tag{10}
\]

where γ^{[−j]} is the γ matrix with the jth column deleted. The procedure for sampling γ^{[j]} is similar to that for β^{[j]}, and γ is drawn by completing the loop over γ^{[j]}, j = 1, …, J. In γ's rejection sampling, we let c = 1.3 and set c* to attain p(Y | J, γ̂^{[j]}, γ^{[−j]}, α, S; z) p(γ̂^{[j]} | J) = c* · g_2(γ̂^{[j]}), where g_2(·) denotes the density of a multivariate Gaussian distribution with mean γ̂^{[j]} and variance c Σ̂_{γ^{[j]}}.

For generating α, we choose to update {α_1, …, α_M} sequentially. Each α_m is divided into {α_m^{[1]}, …, α_m^{[K_m−1]}} with α_m^{[k]} being the kth column of α_m. We first sample the values of α_m^{[k]} from the full conditional distribution

\[
p(\alpha_m^{[k]} \mid J, \beta, \gamma, \alpha_m^{[-k]}, \alpha_{-m}, S, Y; z) \;\propto\; p(Y \mid J, \gamma, \alpha, S; z) \times p(\alpha_m^{[k]}) \tag{11}
\]

by rejection sampling, where α_m^{[−k]} is the α_m matrix with the kth column deleted and α_{−m} is the α collection without the component α_m. The rejection sampling steps are similar to those for β and γ. The value of α is completely drawn by executing the loop over {α_1^{[1]}, …, α_1^{[K_1−1]}, …, α_M^{[1]}, …, α_M^{[K_M−1]}}. In α's rejection sampling, we let c = 1.2 and set c* to attain p(Y | J, γ, α̂_m^{[k]}, α_m^{[−k]}, α_{−m}, S; z) p(α̂_m^{[k]}) = c* · g_3(α̂_m^{[k]}), where g_3(·) denotes the multivariate Gaussian density with mean α̂_m^{[k]} and variance c Σ̂_{α_m^{[k]}}.

3.3. Reversible Jump Steps

For moves (e) and (f), the reversible jump mechanism is needed. The strategy of the moves in RLCA is similar to Richardson and Green (1997). Increasing or decreasing J brings a dimensional change of β and γ. In Step (e), a choice of birth or death is made with equal probability 0.5. When a birth is selected, we produce a (P + 1) × 1 random vector β^{[j*]} with each component generated from N(0, σ_BD²). Next, β^{[j*]} is added as the first column of β and the newly formed matrix is labeled β_birth. The same procedure holds for γ as well, and the (∑_{m=1}^{M}(K_m − 1)) × (J + 1) matrix γ_birth is the newly born matrix. It remains to propose the reallocation of S_i ∈ {1, …, J + 1}. This is done analogously to our Gibbs allocation (8) and creates a new class allocation S_birth. Conversely, a death move first randomly draws a number among {1, …, J − 1}, say j*, and then the columns β^{[j*]} and γ^{[j*]} are removed from β and γ, respectively, to form the new parameter matrices β_death and γ_death. The reallocation of S_i ∈ {1, …, J − 1} is done analogously to (8), and the new class allocation is labeled S_death. Note that in the birth-and-death step, the generated or deleted class is never the reference class; we leave the opportunity of removing or adding the reference class (j = J) to the split-and-merge step. Also, it is easy to see that the proposed birth-and-death move forms a reversible pair.

Following the RJMCMC recipe in Richardson and Green (1997), the acceptance probability for the birth move is min{1, A_B} with

\[
A_B = \frac{p(J+1)}{p(J)} \times L_{+1} \times \frac{p(\beta_{\mathrm{birth}} \mid J+1) \times p(\gamma_{\mathrm{birth}} \mid J+1)}{p(\beta \mid J) \times p(\gamma \mid J)} \times \frac{1}{h_1(\beta^{[j^*]}) \times h_2(\gamma^{[j^*]})} \times \frac{1}{J}. \tag{12}
\]

In (12), J is the number of classes before the birth. L_{+1} arises from the order statistics for the unique class labeling, and L_{+1} is J!/(J − 1)! if J ≥ 3; 1 if J = 2; and 2 if J = 1. In the second line of (12), h_1(β^{[j*]}) = ∏_{p=0}^{P} h(β_{pj*}) and h_2(γ^{[j*]}) = ∏_{m=1}^{M} ∏_{k=1}^{K_m−1} h(γ_{mkj*}) are the proposal densities for the birth columns β^{[j*]} and γ^{[j*]}, with h(·) being the N(0, σ_BD²) density. Technical details in deriving this acceptance probability A_B for the birth move can be found in Section S.1 of the supplementary information. For the corresponding death move, the acceptance probability is min{1, A_D}, where

\[
A_D = \frac{p(J-1)}{p(J)} \times L_{-1} \times \frac{p(\beta_{\mathrm{death}} \mid J-1) \times p(\gamma_{\mathrm{death}} \mid J-1)}{p(\beta \mid J) \times p(\gamma \mid J)} \times h_1(\beta^{[j^*]}) \times h_2(\gamma^{[j^*]}) \times (J-1), \tag{13}
\]

and L_{−1} is 1/(J − 1) if J ≥ 4; 1 if J = 3; and 0.5 if J = 2.
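The birth acceptance ratio (12) is convenient to evaluate on the log scale. The sketch below does so under the uniform prior on J (so the p(J+1)/p(J) term cancels, assuming J + 1 ≤ J_max) and the N(0, σ_BD²) proposal densities h_1 and h_2; it is illustrative only, and the two prior log-density arguments are placeholders the caller must supply.

```python
import numpy as np
from scipy.stats import norm

def log_accept_birth(J, log_prior_new, log_prior_old, beta_new, gamma_new, sigma_bd):
    """log A_B for the birth move (12).  log_prior_new / log_prior_old are
    log p(beta_birth|J+1) + log p(gamma_birth|J+1) and
    log p(beta|J) + log p(gamma|J); beta_new, gamma_new are the proposed
    birth columns beta^[j*], gamma^[j*]."""
    # L_{+1}: J!/(J-1)! = J if J >= 3, 1 if J = 2, 2 if J = 1
    L_plus = float(J) if J >= 3 else (1.0 if J == 2 else 2.0)
    log_h1 = norm.logpdf(beta_new, 0.0, sigma_bd).sum()   # proposal density h1
    log_h2 = norm.logpdf(gamma_new, 0.0, sigma_bd).sum()  # proposal density h2
    return (np.log(L_plus)
            + log_prior_new - log_prior_old
            - log_h1 - log_h2
            - np.log(J))
```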

For move (f), we make a random choice between attempting to split or merge with equal probability 0.5. When a split is attempted, randomly chosen columns β^{[j*]} and γ^{[j*]} are each divided into two new columns; recall that the reference class has β^{[J]} = 0. Given a random vector u generated from a (P + 1)-dimensional multivariate normal distribution with mean (0, …, 0)^T and variance diag(σ_SM², …, σ_SM²), β^{[j*]} is divided into two column vectors β^{[j1]} and β^{[j2]} with β^{[j1]} = β^{[j*]} + u and β^{[j2]} = β^{[j*]} − u. Similarly, a random vector v of dimension ∑_{m=1}^{M}(K_m − 1) is generated from a multivariate normal distribution with mean (0, …, 0)^T and variance diag(σ_SM², …, σ_SM²), and γ^{[j*]} is split into γ^{[j1]} = γ^{[j*]} + v and γ^{[j2]} = γ^{[j*]} − v. The new beta and gamma matrices, β_split and γ_split, are formed by inserting β^{[j1]} and γ^{[j1]} before the first column of β and γ, and replacing β^{[j*]} and γ^{[j*]} with β^{[j2]} and γ^{[j2]}, respectively. Reallocation of subjects is done with the Gibbs sampling move (8), and the new allocation is labeled S_split. When the reference class is chosen (i.e., j* = J), splitting β needs additional attention. In this case, the two split beta columns are β^{[j1]} = u and β^{[j2]} = −u. By the splitting rule established above, u is with respect to the first class and −u is with respect to the new reference class. To maintain β^{[J]} = 0 in latent class model (3), the beta matrix must be adjusted for −u, and β_split becomes [2u, β^{[1]} + u, …, β^{[J−1]} + u].

A merge move in γ starts with a randomly selected column vector γ^{[j2]} with j2 ∈ {1, …, J}, and then γ^{[j2]} and its “closest” column vector γ^{[j1]} are merged, where

\[
j_1 = \arg\min_{i \in \{1,\ldots,J-1\}\setminus\{j_2\}} \bigl(\gamma^{[j_2]} - \gamma^{[i]}\bigr)^T \bigl(\gamma^{[j_2]} - \gamma^{[i]}\bigr). \tag{14}
\]

Note that we exclude J from the possible closest index set for the purpose of forming a reversible pair in the split-and-merge step. The merged column is γ^{[j*]} = (γ^{[j1]} + γ^{[j2]})/2. With j1 and j2 obtained from γ, β^{[j1]} and β^{[j2]} are also merged as β^{[j*]} = (β^{[j1]} + β^{[j2]})/2. The new allocation S_merge is formed by reallocating all subjects according to (8). The new merged matrices, γ_merge and β_merge, are formed by replacing γ^{[j2]} and β^{[j2]} with γ^{[j*]} and β^{[j*]}, respectively, and deleting γ^{[j1]} and β^{[j1]}. When the reference group is selected to be merged (i.e., j2 = J), β^{[j1]} is first deleted from β and then β^{[j1]}/2 is subtracted from every remaining column of β to maintain zero β coefficients for the reference class. It can be readily shown that our merge and split proposals are reversible.
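The two proposals are mechanical enough to sketch directly. The fragment below implements the split of a non-reference column by a N(0, σ_SM²) perturbation and the merge of a column with its closest neighbor under (14); zero-based indexing and all names are our conventions, and the special reference-class adjustments described above are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(2)

def split_columns(beta, gamma, j_star, sigma_sm):
    """Split non-reference column j_star (0-based, j_star < J-1).
    beta: (P+1, J-1); gamma: (D, J) with D = sum_m (K_m - 1).
    Returns the proposal plus u, v needed in the acceptance ratio (15)."""
    u = rng.normal(0.0, sigma_sm, size=beta.shape[0])
    v = rng.normal(0.0, sigma_sm, size=gamma.shape[0])
    beta_new, gamma_new = beta.copy(), gamma.copy()
    beta_new[:, j_star] = beta[:, j_star] - u       # beta^[j2] replaces beta^[j*]
    gamma_new[:, j_star] = gamma[:, j_star] - v     # gamma^[j2] replaces gamma^[j*]
    beta_split = np.column_stack([beta[:, j_star] + u, beta_new])     # beta^[j1] first
    gamma_split = np.column_stack([gamma[:, j_star] + v, gamma_new])  # gamma^[j1] first
    return beta_split, gamma_split, u, v

def merge_columns(beta, gamma, j2):
    """Merge gamma^[j2] with its closest non-reference column per (14),
    assuming neither j2 nor the chosen j1 is the reference class."""
    J = gamma.shape[1]
    d = ((gamma - gamma[:, [j2]]) ** 2).sum(axis=0)
    d[[j2, J - 1]] = np.inf               # exclude j2 itself and the reference index
    j1 = int(np.argmin(d))
    beta_new, gamma_new = beta.copy(), gamma.copy()
    beta_new[:, j2] = (beta[:, j1] + beta[:, j2]) / 2.0    # beta^[j*]
    gamma_new[:, j2] = (gamma[:, j1] + gamma[:, j2]) / 2.0  # gamma^[j*]
    return np.delete(beta_new, j1, axis=1), np.delete(gamma_new, j1, axis=1), j1
```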

The acceptance probability for the split is min{1, A_S}, where

\[
A_S = \frac{p(J+1)}{p(J)} \times L_{+1} \times \frac{p(\beta_{\mathrm{split}} \mid J+1) \times p(\gamma_{\mathrm{split}} \mid J+1)}{p(\beta \mid J) \times p(\gamma \mid J)} \times \frac{1}{h_3(u) \times h_4(v)} \times \frac{J}{J+1} \times \frac{1}{w_J(j^*)} \times \Bigl(\frac{1}{2}\Bigr)^{(P+1) + \sum_{m=1}^{M}(K_m-1)}. \tag{15}
\]

Here, J is the number of classes before the split. As before, L_{+1} arises from the order statistics permutations of the current and split states, and h_3(u) and h_4(v) are the densities for generating u and v. The weight w_J(j*) is defined as

\[
w_J(j^*) = \begin{cases} 2, & \text{if } j^* = J,\\ 1, & \text{if } j^* \neq J. \end{cases}
\]

Technical details in deriving the acceptance probability A_S for the split move can be found in Section S.1 of the supplementary information. Notice that, in the split move, it is necessary to check whether the adjacency condition (14) is satisfied; if it is not, the split move is rejected forthwith because the split–merge pair would not be reversible (Richardson & Green, 1997).


The acceptance probability for the merge can be similarly obtained as min{1, A_M}, where

\[
A_M = \frac{p(J-1)}{p(J)} \times L_{-1} \times \frac{p(\beta_{\mathrm{merge}} \mid J-1) \times p(\gamma_{\mathrm{merge}} \mid J-1)}{p(\beta \mid J) \times p(\gamma \mid J)} \times h_3(u) \times h_4(v) \times \frac{J}{J-1} \times w_{J-1}(j^*) \times 2^{(P+1) + \sum_{m=1}^{M}(K_m-1)}. \tag{16}
\]

There are some special features in the proposed reversible jump steps. First, the reallocation of class membership in the reversible jump steps is done for all subjects, not just for subjects who belong to the classes selected for change. This is because any birth–death or split–merge change in the β coefficients will result in different class weights for all classes (Equation (3)); thus, subjects in the new state have different probabilities of being in any of the classes, and reallocation of all subjects is necessary. Second, unlike Richardson and Green (1997), where the birth-and-death step is only for empty classes, our birth–death step is applied to all classes. We experimented with the empty-class-only birth–death step for our model; however, the acceptance rate was so low that empty classes persisted throughout. Our approach allows birth or death on all classes and then reallocates all subjects, which can efficiently eliminate empty classes. Third, our β coefficients correspond only to the first J − 1 classes (the reference class has zero β coefficients); thus, some care is needed when renewing the β coefficients with respect to the reference class. We choose to update the parameters of the reference class in the split–merge move but not in the birth–death move, to ease the model complication. We have found this simplification works well in our model.

3.4. Labeling Procedure

For identifiability concerns, we adopt a unique labeling for class membership in each RJMCMC sweep. In our labeling, we use a modified version of the on-line processing method originally developed by Celeux, Hurn, and Robert (2000). The labeling procedure corresponds to an ordering of the columns of γ. It is performed to relabel classes after running each RJMCMC sweep, and sweeps resulting in the same number of classes are collected together for the relabeling process. Specifically, let γ^1 = [γ^1_{mkj}], γ^2 = [γ^2_{mkj}], … be the sequence of RJMCMC samples for γ that contain J latent classes. The first q samples after burn-in, γ^1, …, γ^q, are used to initialize the procedure, where q has to be large enough to ensure initial estimates that crudely approximate the posterior means, but not so large that any label switch has occurred. To choose a proper value of q, one can begin the RJMCMC with a pilot run; a trace plot of one particular parameter for every class can then be drawn to look for where the first class-switching phenomenon takes place after burn-in. The value of q should be smaller than the sweep at which the first switch occurs. We take the dataset used in the sensitivity analysis to illustrate how to choose the value of q. The left-hand side of Figure 1 contains the trace plots of γ_{52j}, j = 1, …, 6, for the first 5000 sweeps after burn-in, before relabeling. From the figure, the first class switch occurred at about the 3000th sweep, so we can take a value of q not larger than 3000. Note that q is typically set to 100 in this study.

Reference centers and variances are defined, respectively, as

\[
\gamma^{(0)}_{mkj} = \frac{1}{q}\sum_{t=1}^{q}\gamma^{t}_{mkj} \quad \text{and} \quad s^{(0)}_{mkj} = \frac{1}{q}\sum_{t=1}^{q}\bigl(\gamma^{t}_{mkj} - \gamma^{(0)}_{mkj}\bigr)^2.
\]


Let (γ^t)^{[i]} denote the ith column of γ^t, and let γ^{(0)} be the matrix with components γ^{(0)}_{mkj}. The procedure updates the class labels for γ^{q+1}, γ^{q+2}, … sequentially, with γ^{q+r} processed as follows:

1. Let

\[
1.1\quad (i_{j_1}, j_1) = \arg\min_{\{(i,j)\mid i,j=1,\ldots,J\}} \sum_{m=1}^{M}\sum_{k=1}^{K_m-1} \frac{\bigl(\gamma^{q+r}_{mki} - \gamma^{(r-1)}_{mkj}\bigr)^2}{s^{(r-1)}_{mkj}} = \arg\min_{\{(i,j)\mid i,j=1,\ldots,J\}} \bigl\|(\gamma^{q+r})^{[i]} - (\gamma^{(r-1)})^{[j]}\bigr\|^2;
\]
\[
1.2\quad (i_{j_2}, j_2) = \arg\min_{\{(i,j)\mid i,j=1,\ldots,J;\; i\neq i_{j_1},\, j\neq j_1\}} \bigl\|(\gamma^{q+r})^{[i]} - (\gamma^{(r-1)})^{[j]}\bigr\|^2;
\]
\[
\vdots
\]
\[
1.J\quad (i_{j_J}, j_J) = \arg\min_{\{(i,j)\mid i,j=1,\ldots,J;\; i\neq i_{j_n},\, j\neq j_n,\, n=1,\ldots,J-1\}} \bigl\|(\gamma^{q+r})^{[i]} - (\gamma^{(r-1)})^{[j]}\bigr\|^2.
\]

2. Reorder the columns of γ^{q+r} as γ̃^{q+r} = [(γ^{q+r})^{[i_1]}, …, (γ^{q+r})^{[i_{J−1}]}, (γ^{q+r})^{[i_J]}] and those of β^{q+r} as β̃^{q+r} = [(β^{q+r})^{[i_1]} − (β^{q+r})^{[i_J]}, …, (β^{q+r})^{[i_{J−1}]} − (β^{q+r})^{[i_J]}], where we define (β^{q+r})^{[J]} = 0.

3. Update centers and variances:

\[
\gamma^{(r)}_{mkj} = \frac{q+r-1}{q+r}\,\gamma^{(r-1)}_{mkj} + \frac{1}{q+r}\,\tilde{\gamma}^{q+r}_{mkj},
\]
\[
s^{(r)}_{mkj} = \frac{q+r-1}{q+r}\, s^{(r-1)}_{mkj} + \frac{q+r-1}{q+r}\bigl(\gamma^{(r-1)}_{mkj} - \gamma^{(r)}_{mkj}\bigr)^2 + \frac{1}{q+r}\bigl(\tilde{\gamma}^{q+r}_{mkj} - \gamma^{(r)}_{mkj}\bigr)^2,
\]

where γ̃^{q+r}_{mkj} are the components of the relabeled matrix γ̃^{q+r}. In other words, we first compute γ^{(0)} to be regarded as the labeling reference and use the standardized column distance to identify the ordering of the columns of the gamma matrix. At the (q + r)th sample, the columns of γ^{q+r} are permuted to the arrangement that is most marginally similar to γ^{(r−1)}, and γ^{(r−1)} is updated to γ^{(r)} by incorporating the relabeled γ^{q+r}. The indices (i_{j_1}, …, i_{j_J}) apply to β as well. The right-hand side of Figure 1 shows the relabeled trace plots. The proposed relabeling procedure appears to work well.
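A compact sketch of one relabeling pass is given below: the greedy matching of draw columns to the running reference (Step 1) and the running mean/variance update (Step 3). The matrix layout (rows indexing the (m, k) pairs, columns indexing classes) and the function names are our own choices.

```python
import numpy as np

def relabel_sweep(gamma_t, centers, scales):
    """Greedily match the columns of one draw gamma_t (D, J) to the
    reference centers (D, J), repeatedly pairing the remaining draw
    column and reference column with the smallest standardized distance."""
    J = centers.shape[1]
    dist = np.empty((J, J))
    for i in range(J):
        for j in range(J):
            dist[i, j] = (((gamma_t[:, i] - centers[:, j]) ** 2) / scales[:, j]).sum()
    perm = np.full(J, -1)
    free_i, free_j = set(range(J)), set(range(J))
    while free_i:
        i, j = min(((i, j) for i in free_i for j in free_j),
                   key=lambda ij: dist[ij[0], ij[1]])
        perm[j] = i                          # draw column i goes to position j
        free_i.discard(i); free_j.discard(j)
    return gamma_t[:, perm]

def update_reference(centers, scales, gamma_tilde, n):
    """Running update of Step 3 with n = q + r and gamma_tilde the
    relabeled draw."""
    new_centers = (n - 1) / n * centers + gamma_tilde / n
    new_scales = ((n - 1) / n * scales
                  + (n - 1) / n * (centers - new_centers) ** 2
                  + (gamma_tilde - new_centers) ** 2 / n)
    return new_centers, new_scales
```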

4. Simulation Study

The simulation study consists of two parts. The first part aims to examine the performance of our proposed RJMCMC method, and the second part focuses on the comparison with other existing approaches.

4.1. Performance of the Proposed Method

FIGURE 1.
Trace plots for selecting the first q samples to initialize the relabeling procedure. The left-hand side shows the trace plots of γ_521–γ_526 for the first 5000 sweeps after burn-in, before relabeling; one color represents one of the parameters γ_521–γ_526. The right-hand side shows the corresponding trace plots after the relabeling procedure is applied.

Two different RLCA (2) models were simulated. One was a three-class RLCA model with five measured indicators, each with three levels (i.e., J = 3, M = 5, K_1 = ⋯ = K_5 = 3). Four independent covariates were generated; two covariates (z_i1 ∼ Bernoulli(0.5), z_i2 ∼ Normal(0, 1)) were associated with Y_im given S_i for all m, and the other two covariates (x_i1 ∼ Bernoulli(0.5), x_i2 ∼ Normal(0, 1)) were associated with S_i (i.e., P = L = 2). The other RLCA model was a six-class model with all other settings the same as for the three-class model (i.e., J = 6, M = 5, K_1 = ⋯ = K_5 = 3, P = L = 2). For each model, parameters β were selected with the purpose of allocating a similar number of individuals to each class. Each element of α and γ was determined by randomly choosing from Uniform(−5, 5). For the three-class RLCA, we set the sample size N = 500, which gave roughly 10 individuals per parameter of RLCA (2). For the six-class model, N = 1500 was selected, which gave roughly 16 individuals per parameter. In the simulation, we set the hyperparameters σ_P = 3.0, σ_BD = σ_SM = 0.3, and J_max = 30. Observable measurements Y_i were generated from each of the two RLCA models with 100 replicates, and each replication was run for 100,000 RJMCMC sweeps. Because it was tedious to determine a best burn-in period for each of the 100 replications, to simplify the process we examined the first several replications and found that the occupancy fraction plots (described in Section S.3.1 of the supplementary information and in Section 6.2 of the real data analysis below) became stable roughly after the first 10,000 sweeps. We thus took the first 10,000 sweeps as the burn-in period for each replication. We did not use different initial values for the RJMCMC estimation in the simulation study; however, the proposed procedure's sensitivity to initial values is evaluated in Section S.3.4 of the supplementary information. To obtain posterior distributions of parameters for the class number J, in each replication we calculated the sample mean θ̂(J), the sample standard deviation s_θ̂(J), and the 2.5th and 97.5th percentiles of the one-dimensional marginal posterior distributions (i.e., a 95 % credible interval) from sweeps with the class number equal to J.
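For instance, the per-replication summaries just described can be computed from the stored sweeps as below; we assume the draws of J and of the parameter vector have been logged per sweep, and that parameter draws are comparable across sweeps sharing the same J. The function name and the dictionary output are our own.

```python
import numpy as np

def summarize_by_J(J_draws, theta_draws, burn_in=10_000):
    """Posterior summaries conditional on the number of classes: drop
    burn-in sweeps, take the mode of J as the class-number estimate, and
    report mean, sd, and a 95 % credible interval among sweeps with that J.
    J_draws: list of ints; theta_draws: list of 1-D parameter arrays."""
    J_draws = list(J_draws)[burn_in:]
    theta_draws = list(theta_draws)[burn_in:]
    J_hat = max(set(J_draws), key=J_draws.count)       # posterior mode of J
    sub = np.stack([th for J, th in zip(J_draws, theta_draws) if J == J_hat])
    return {
        "J_hat": J_hat,
        "mean": sub.mean(axis=0),
        "sd": sub.std(axis=0, ddof=1),
        "ci_95": np.percentile(sub, [2.5, 97.5], axis=0),
    }
```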


FIGURE 2.
Proportions of sweeps resulting in different values of J for 100 replications with data generated from the three-class RLCA model. The darkest region represents the proportion for the true number of classes (J = 3 in this scenario); regions above the darkest represent proportions of over-estimation (J > 3) and regions below, under-estimation (J < 3). Out of 100 replications, 95 suggested the number of classes to be three.

Figure 2 shows J's marginal posterior distribution for the three-class RLCA model for all replications. The proportions of sweeps resulting in different values of J in each replication are coded in one bar, where the darkest region represents the proportion for the true number of classes (J = 3 in this scenario); regions above the darkest represent proportions of over-estimation (J > 3) and regions below, under-estimation (J < 3). Among the 100 replications, 95 give correct class number estimates (i.e., have the largest proportion at J = 3), while the other five replications support four classes. Figure 3 displays J's marginal posterior distribution for the six-class RLCA model for 100 replications. There, 84 out of 100 replications correctly estimate the number of classes (J = 6); fourteen replicates show the class number to be five and the remaining two suggest four classes. We also examined the Bayes factors to see how strongly the simulation outputs support the true number of latent classes; this information is presented in Section S.2.1 of the supplementary information. We found that the evidence from most replications strongly supported the true underlying number of classes. We also showed that the best number of classes suggested by the Bayes factors was also the mode of the posterior distribution of J, which we used for class number selection in RJMCMC.

FIGURE 3.
Proportions of sweeps resulting in different values of J for 100 replications with data generated from the six-class RLCA model. The darkest region represents the proportion for the true number of classes (J = 6 in this scenario); regions above the darkest represent proportions of over-estimation (J > 6) and regions below, under-estimation (J < 6). Out of 100 replications, 84 replicates suggested the number of classes to be six, 14 suggested five, and two suggested four.

Tables 1 and 2 list the results of parameter estimation under the true number of classes, averaging over replications whose estimated J (the mode of the posterior of J) was equal to the true value: the true parameter θ, the average of the sample means θ̂, the average of the sample standard deviations s_θ̂, and the coverage rate (CR) of the 95 % credible intervals containing the true parameter value. Table 1, showing the results for J = 3, indicates that the RJMCMC gives sensible inferences in this case. The estimated posterior means θ̂ are reasonably close to the true values of the parameters, and the estimated posterior standard deviations s_θ̂ lie within the interval from 0.23 to 2.00. The coverage rates of the 95 % credible intervals range from 88.42 % to 100 %. Table 2 gives the results for J = 6. Here, most of the estimated posterior means are close to the true values of the parameters, and the estimated posterior standard deviations range from 0.22 to 1.01, except one that attains 3.06. Two (out of 95) CRs are less than 80 %; the other CRs are quite close to 95 %. Although most parameter estimates have small biases and large CRs, some parameters are significantly overestimated with substantially larger standard deviations; for example, α_142 in Table 1 and (β_12, β_04, β_14) in Table 2. After careful examination, we found that these may be caused by sparseness between the response indicators and the incorporated covariates, which is often encountered in a given dataset. A detailed explanation can be found in Section S.2.2 of the supplementary information.

TABLE 1.
The posterior distributions of model parameters from simulated data with J = 3 and N = 500 over 100 replications.

        β01     β11     β21     β02     β12     β22     α111    α211    α112    α212
θ       1.97   −2.36    1.38    1.43   −4.59    3.73    1.82   −1.26    4.53    4.29
θ̂       1.98   −2.37    1.48    1.65   −4.54    3.64    1.60   −1.62    4.33    4.12
s_θ̂     0.33    0.38    0.24    0.36    0.52    0.36    0.55    0.44    0.59    0.52
CR    100.00  100.00  100.00  100.00  100.00   97.89   89.47   88.42   97.89   94.74

        α121    α221    α122    α222    α131    α231    α132    α232
θ      −2.11   −3.92   −0.25   −2.49   −0.04   −3.61   −2.11   −1.05
θ̂      −2.17   −3.87   −0.19   −2.51    0.04   −3.57   −2.22   −1.13
s_θ̂     0.48    0.42    0.35    0.31    0.37    0.43    0.46    0.28
CR     97.89   97.89   96.84   97.89   92.63   96.84   90.53   95.79

        α141    α241    α142    α242    α151    α251    α152    α252
θ       3.57    3.08   −3.93    0.93   −2.62    4.90    2.50    2.76
θ̂       3.53    2.93   −1.07    0.79   −2.59    4.81    1.99    2.82
s_θ̂     0.46    0.33    2.00    0.31    0.40    0.44    0.77    0.43
CR     93.68   91.58   91.58   93.68   97.89   93.68   92.63   94.74

        γ111    γ112    γ121    γ122    γ131    γ132    γ141    γ142    γ151    γ152
θ       1.24    3.82   −1.77   −0.20   −3.87   −0.15    3.75    3.86    2.86   −2.41
θ̂       1.14    3.89   −1.74   −0.20   −3.91   −0.21    3.84    3.91    2.97   −1.95
s_θ̂     0.72    0.70    0.43    0.27    0.63    0.23    0.79    0.78    0.41    0.82
CR     93.68   96.84   96.84   95.79   94.74   96.84   95.79   95.79   93.68   94.74

        γ211    γ212    γ221    γ222    γ231    γ232    γ241    γ242    γ251    γ252
θ       1.51    2.13   −2.08    3.17    4.13   −1.76   −3.80   −1.30    2.51   −3.62
θ̂       1.68    2.35   −2.59    3.33    4.40   −1.98   −3.83   −1.29    2.56   −3.40
s_θ̂     0.65    0.66    1.38    0.45    0.59    1.47    0.50    0.32    0.40    0.88
CR     98.95   97.89   98.95   96.84   96.84  100.00   91.58   96.84   94.74   96.84

        γ311    γ312    γ321    γ322    γ331    γ332    γ341    γ342    γ351    γ352
θ      −3.98   −3.63    2.91   −2.96    0.00   −2.88   −3.00   −4.52    2.11   −2.66
θ̂      −4.22   −3.45    2.88   −3.10    0.02   −3.09   −2.90   −4.44    2.20   −2.23
s_θ̂     0.61    0.51    0.45    1.30    0.30    0.88    0.43    1.07    0.37    0.80
CR     94.74   95.79   97.89   98.95   89.47   96.84   91.58   98.95   91.58   94.74

Note. θ: the true parameter. θ̂: the average of posterior sample means over replications whose posterior mode of J was equal to the true value. s_θ̂: the average of posterior sample standard deviations over replications whose posterior mode of J was equal to the true value. CR: the coverage rate of the 95 % credible intervals containing the true parameter value, over replications whose posterior mode of J was equal to the true value.

Moreover, we ran an additional simulation with a small sample size relative to the number of model parameters. We adopted a three-class RLCA model similar to the one used above but with sample size N = 50. The sample size (50) was less than the number of parameters of RLCA (56), which was similar in size to that in our real data example. The results indicate that only 41 % of the replications give correct class number estimates, a much lower correct rate than when the sample size is large (95 %). Given that, we further examined the distribution of empty classes in RJMCMC to justify the class number estimation. The parameter estimation under the true number of classes is reasonably accurate, except for parameters with true values larger than 3 or smaller than −3. These biases in extreme values are probably due to our selection of the noninformative prior mean of zero for θ, which can dominate the posterior distribution of θ when the sample size is small. Details of this simulation are shown in Section S.2.3 of the supplementary information.


TABLE 2.
The posterior distributions of model parameters from simulated data with J = 6 and N = 1500 for 100 replications.

        β01     β11     β21     β02     β12     β22     β03     β13     β23
θ       0.76   −2.91   −1.30   −1.71   −4.52    4.10   −0.42    1.19    2.12
θ̂       0.90   −3.54   −1.77   −1.64   −2.53    3.26   −0.34    1.26    1.64
s_θ̂     0.34    0.97    0.48    0.56    3.06    0.95    0.43    0.45    0.38
CR    100.00  100.00   80.00  100.00  100.00   98.18  100.00  100.00   94.55

        β04     β14     β24     β05     β15     β25     α111    α211    α112    α212
θ      −3.48    0.24   −4.60   −0.15    1.28    1.20   −4.54    1.10    2.98   −4.06
θ̂      −2.26   −0.95   −3.87   −0.16    1.29    0.92   −4.35    1.11    3.12   −4.14
s_θ̂     0.71    0.62    1.00    0.39    0.50    0.46    0.47    0.24    0.30    0.27
CR     54.55   23.64   96.36  100.00  100.00   94.55   94.55   96.36   90.91   89.09

        α121    α221    α122    α222    α131    α231    α132    α232
θ       2.12    2.71   −0.60   −3.27   −1.99    2.98   −0.03   −3.30
θ̂       2.02    2.70   −0.66   −3.23   −1.90    2.81    0.02   −3.50
s_θ̂     0.23    0.22    0.23    0.27    0.33    0.39    0.25    0.32
CR     94.55   96.36   89.09   98.18   94.55   94.55   90.91   94.55

        α141    α241    α142    α242    α151    α251    α152    α252
θ      −3.09    4.41   −4.65   −2.99   −3.90    2.25   −0.11   −4.75
θ̂      −2.98    4.42   −4.38   −2.89   −3.82    2.13   −0.14   −4.91
s_θ̂     0.30    0.32    0.44    0.34    0.41    0.28    0.23    0.35
CR     94.55   96.36   89.09   94.55   94.55   94.55   96.36   92.73

        γ111    γ112    γ121    γ122    γ131    γ132    γ141    γ142    γ151    γ152
θ       3.80    1.01   −0.79   −1.06   −1.45    4.72   −3.51   −4.48   −1.95   −0.15
θ̂       3.69    0.84   −0.67   −0.94   −1.09    5.15   −3.42   −4.51   −2.02   −0.06
s_θ̂     0.48    0.48    0.25    0.30    0.71    0.60    0.42    0.75    0.36    0.28
CR     94.55   92.73   92.73   92.73   92.73   89.09   98.18   98.18   94.55   96.36

        γ211    γ212    γ221    γ222    γ231    γ232    γ241    γ242    γ251    γ252
θ      −0.99    0.80   −3.44   −2.20    4.79    1.40   −0.71    4.56    1.57    3.18
θ̂      −1.05    0.84   −3.36   −2.16    4.56    1.23   −0.90    4.24    1.64    3.35
s_θ̂     0.47    0.34    0.39    0.36    0.72    0.73    0.52    0.51    0.51    0.51
CR     92.73   94.55   94.55   94.55   92.73   96.36   94.55   90.91  100.00   96.36

        γ311    γ312    γ321    γ322    γ331    γ332    γ341    γ342    γ351    γ352
θ      −2.55   −0.92   −4.25    3.96    2.74    1.76    3.92    1.79   −0.19    2.91
θ̂      −2.63   −0.98   −4.05    4.05    2.84    1.85    3.90    1.66   −0.14    3.01
s_θ̂     0.40    0.22    0.50    0.37    0.40    0.40    0.39    0.43    0.35    0.33
CR     92.73   96.36   94.55   98.18   96.36   98.18   98.18   96.36   96.36   96.36

        γ411    γ412    γ421    γ422    γ431    γ432    γ441    γ442    γ451    γ452
θ       3.83    2.88   −1.65   −1.90   −4.30    0.37    2.00   −4.16   −4.67   −2.10
θ̂       3.69    2.69   −1.56   −1.84   −4.06    0.26    1.94   −4.13   −4.72   −2.02
s_θ̂     0.60    0.58    0.27    0.32    0.62    0.26    0.31    0.65    0.71    0.34
CR     96.36   94.55   92.73   96.36   90.91   92.73   90.91   94.55   96.36   89.09

        γ511    γ512    γ521    γ522    γ531    γ532    γ541    γ542    γ551    γ552
θ      −1.37    0.34    2.64    4.31   −3.88    3.14    1.40   −4.60   −4.62   −4.02
θ̂      −1.44    0.36    2.52    4.14   −3.42    3.29    1.19   −4.17   −4.40   −4.16
s_θ̂     0.34    0.23    0.86    0.86    0.67    0.34    0.26    0.69    0.59    0.49
CR     98.18  100.00  100.00  100.00   85.45   94.55   85.45   96.36  100.00   94.55

        γ611    γ612    γ621    γ622    γ631    γ632    γ641    γ642    γ651    γ652
θ       4.08   −1.03   −1.47   −1.26    4.69    0.82    4.08    3.81   −4.87   −0.39
θ̂       4.04   −1.06   −1.38   −1.25    4.56    0.58    4.06    3.70   −4.66   −0.32
s_θ̂     0.60    0.62    0.31    0.36    0.76    0.77    0.61    0.64    1.01    0.35
CR     94.55  100.00   92.73   92.73   98.18   90.91   92.73   92.73   96.36   96.36

Note. θ: the true parameter. θ̂: the average of posterior sample means over replications whose posterior mode of J was equal to the true value. s_θ̂: the average of posterior sample standard deviations over replications whose posterior mode of J was equal to the true value. CR: the coverage rate of the 95 % credible intervals containing the true parameter value, over replications whose posterior mode of J was equal to the true value.

4.2. Comparison with Alternative Approaches

Here we compare our proposed RJMCMC method with traditional frequentist two-stage approaches, where the number of classes is selected first and the parameter estimation under the selected number of classes is then performed using the Huang and Bandeen-Roche (2004) maximum likelihood estimation (MLE). Four approaches for selecting the number of classes are compared (a sketch of the information-criterion selection in (b) and (c) follows this paragraph): (a) the goodness-of-fit (GOF) approach, where the RLCA model is fitted under different numbers of classes and the selected class number is the lowest number of classes that yields acceptable fit under the likelihood ratio goodness-of-fit test; (b) the AIC criterion, where the estimated number of classes is fixed at the class number J that minimizes −2 log L + 2T, where L is the likelihood and T is the total number of parameters in the RLCA model; (c) the BIC criterion, where the estimated number of classes is fixed at the J that minimizes −2 log L + T log N, where N is the number of observations; and (d) Huang's (2005) approach, where a factor-analysis analogical method is used for the class number estimation, a method that does not require repeated model fitting.
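A minimal sketch of the selection stage under criteria (b) and (c): the model is fitted for each candidate J outside this snippet, and the J minimizing the criterion is picked. The dictionary-based interface is our own assumption.

```python
import numpy as np

def select_J_by_ic(log_liks, n_params, N, criterion="BIC"):
    """Two-stage class-number selection via AIC or BIC.
    log_liks[J]: maximized log-likelihood of the RLCA model with J classes;
    n_params[J]: total number of parameters T of that model; N: sample size."""
    best_J, best_ic = None, np.inf
    for J in sorted(log_liks):
        penalty = 2 * n_params[J] if criterion == "AIC" else n_params[J] * np.log(N)
        ic = -2 * log_liks[J] + penalty      # -2 log L + penalty
        if ic < best_ic:
            best_J, best_ic = J, ic
    return best_J, best_ic
```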

The simulated datasets were generated from two different RLCA (2) models. One was a three-class RLCA model with M = 5, K_1 = ⋯ = K_5 = 2, P = L = 1; the other was a six-class RLCA model with M = 5, K_1 = ⋯ = K_5 = 2, P = L = 1. All parameters in both models were randomly generated from Uniform(−5, 5). To avoid sparse response patterns that might invalidate the GOF approach, the covariates associated with the conditional probabilities, z_{im1}, m = 1, …, 5, and the covariate associated with the latent prevalences, x_{i1}, were binary from Bernoulli(0.5). To build up the correlation between z_{im1} and x_{i1}, one-fourth of the individuals had their conditional probability covariate values equal to their latent prevalence covariate value, and the other individuals had independent values for the two covariates. For the three-class RLCA, the selected sample sizes N were 500 and 1500, which gave 20 and 62 individuals per parameter, respectively. For the six-class RLCA, we set N = 3000 and 10,000, which gave 76 and 254 individuals per parameter, respectively. One hundred replications were performed for each generated RLCA model.


TABLE 3.
Comparison among five different methods in estimating the number of classes. (Rows: GOF, AIC, BIC, Huang, RJMCMC; columns: counts of replications, out of 100, selecting each candidate class number, 2–5 for true J = 3 with N = 500 and N = 1500, and 2–8 for true J = 6 with N = 3000 and N = 10,000.)


The results for estimating the number of classes are shown in Table 3. Our RJMCMC approach outperforms the alternative approaches in selecting the correct number of classes, for both J = 3 and J = 6, and increasing the sample size improves the accuracy of the class number estimation. When the sample size is large (i.e., (J, N) = (3, 1500) and (6, 10,000)), our approach correctly identifies the class number in over 97 % and 94 % of replications for J = 3 and 6, respectively, which indicates its ability to recover the true number of classes. Also, under large sample sizes the numbers of classes selected by RJMCMC (Bayes factor) and BIC are very consistent, which verifies the well-known fact that the Bayes factor and BIC yield similar results (Kass & Raftery, 1995). RJMCMC and BIC disagree when (J, N) = (3, 500) and (6, 3000); this may be due to some extremely small classes (details can be found in Section S.2.4 of the supplementary information). Notice that the GOF and Huang (2005) approaches can seriously underestimate the class number when the true class number is 6. One possible explanation is the correlation between z_{im1} and x_{i1}, which violates the independence assumption made in Huang (2005).

For the simulated data with J = 3, the comparison of the Bayes posterior estimates from our RJMCMC procedure and the MLEs using the Huang and Bandeen-Roche (2004) procedure can be found in Tables 4 and 5 for N = 500 and N = 1500, respectively. Both the Bayes posterior estimates and the MLEs were calculated under the true class number for each replication and were then averaged over all 100 replications. The MLEs appear more biased than the Bayes estimates, especially when the absolute value of the true γ is larger than 3. For β and α, the standard deviation estimates of the MLEs are typically smaller than the posterior sample standard deviations from RJMCMC. This is not surprising because the MLEs from the two-stage approach do not take the variation of the class number selection into account, whereas the RJMCMC estimates do. Some of the γ MLEs have extremely large standard deviations, which is probably due to convergence to boundary solutions in the γ estimates for some replicates. The comparison of the Bayes estimates and MLEs for the cases with J = 6 is shown in Tables S.6 and S.7 of the supplementary information and is similar to the results for J = 3; however, due to the much more complex model structure of the six-class model, the problems of large biases and standard deviations of the MLEs were more serious.

5. Sensitivity Analysis

In the sensitivity analysis, we highlight some specifications of the hyperparameters and proposal parameters, and give a detailed discussion of their influence on the posterior distributions of J, the class allocation, and the parameters. We also examine the proposed procedure's sensitivity to initial values, the de-outlier step, and the rejection sampling. The de-outlier step was implemented in our simulation studies and in the real data example to exclude extreme RJMCMC samples from the calculation of the posterior distribution; it can greatly reduce the disturbance in parameter estimates and class allocation created by the jumping moves. As in the simulation presented in Section 4, we begin with the default setting σ_P = 3.0, σ_BD = 0.3, and σ_SM = 0.3. Different settings are then specified to illustrate their impacts on the reversible jump procedure. These settings are applied to the 100 simulated datasets with true J = 6 and N = 1500 generated in Section 4.1. Details of the sensitivity analysis and the relevant results can be found in Section S.3 of the supplementary information.

In view of the results from the sensitivity analysis, we suggest starting with the hyperparameter settings σ_P = 3 and σ_BD/σ_P = σ_SM/σ_P = 0.1. One can then tune the value of σ_P to make sure the posterior distribution of β is unimodal, and tune the values of σ_BD/σ_P and σ_SM/σ_P to ensure stable and well-mixing chains (Figures S.2–S.5 of the supplementary information). We also suggest running several Markov chains under the selected setting but engaging different initial values, and keeping these chains running until they all converge to the neighborhood of the same value. The proposed de-outlier step can be used to remove extreme posterior estimates of the parameters due to the dimension jumps. The selection of c = 1.2 for β, c = 1.2 for α, and c = 1.3 for γ in the rejection sampling provides a reasonable balance between accuracy and computational effort.

TABLE 4.
The average of Bayes posterior means and MLEs of model parameters from simulated data with J = 3 and N = 500 over 100 replications.

         β01     β11     β02     β12     α111    α121    α131    α141    α151
θ       −1.12    1.89    0.97    0.54    0.53    1.35    4.15   −1.30    2.73
θ̂_PM    −1.12    2.03    0.98    0.60    0.48    1.19    4.21   −1.38    2.63
s_PM     0.48    0.49    0.17    0.31    0.44    0.61    0.39    0.39    0.29
CR_PM   98.00   93.00   98.00   94.00   95.00   95.00   99.00   95.00   92.00
θ̂_MLE   −0.62    1.45    0.71    0.37    0.54    1.17    4.24   −1.32    2.60
s_MLE    0.18    0.26    0.14    0.23    0.33    0.35    0.38    0.29    0.24
CR_MLE  37.00    0.00   70.00    0.00   89.00   76.00   95.00   91.00   74.00

         γ111    γ121    γ131    γ141    γ151    γ211    γ221    γ231    γ241    γ251
θ       −1.17   −2.12   −2.27    0.25    4.22   −4.02    2.51   −3.90    3.43   −1.10
θ̂_PM    −1.21   −2.14   −2.26    0.24    2.71   −4.28    2.86   −3.95    3.65   −1.02
s_PM     0.42    0.67    0.45    0.36    1.25    0.60    0.56    0.40    0.44    0.20
CR_PM   97.00   99.00   94.00   92.00   87.00   95.00   98.00   99.00   92.00   89.00
θ̂_MLE   −2.15   −1.07   −2.71    0.95    1.70   −5.79    5.18   −4.06    4.49   −1.15
s_MLE    0.74    0.29    0.37    0.25    0.63    6.54   17.69    0.39    1.98    0.18
CR_MLE  52.00   35.00   72.00   48.00   10.00   89.00   54.00   91.00   65.00   81.00

         γ311    γ321    γ331    γ341    γ351
θ        1.69   −4.58   −1.65   −1.40   −3.16
θ̂_PM     2.03   −4.64   −1.63   −1.53   −3.13
s_PM     0.56    0.98    0.36    0.42    0.55
CR_PM   95.00   96.00   95.00   92.00   93.00
θ̂_MLE    1.71   −6.17   −1.66   −1.45   −2.70
s_MLE    0.32  435.99    0.32    0.30    0.30
CR_MLE  62.00   79.00   91.00   83.00   54.00

Note. θ: the true parameter. θ̂_PM: the average of posterior sample means over 100 replicates. s_PM: the average of posterior sample standard deviations over 100 replicates. CR_PM: the coverage rate of the 95 % credible intervals containing the true parameter value over 100 replicates. θ̂_MLE: the average of MLEs over 100 replicates. s_MLE: the average of standard deviation estimates over 100 replicates. CR_MLE: the coverage rate of the 95 % confidence intervals containing the true parameter value over 100 replicates.


TABLE 5.
The average of Bayes posterior means and MLEs of model parameters from simulated data with J = 3 and N = 1500 over 100 replications.

         β01     β11     β02     β12     α111    α121    α131    α141    α151
θ       −1.12    1.89    0.97    0.54    0.53    1.35    4.15   −1.30    2.73
θ̂_PM    −1.11    1.96    0.97    0.58    0.53    1.26    4.18   −1.37    2.71
s_PM     0.26    0.27    0.10    0.17    0.24    0.37    0.23    0.22    0.17
CR_PM   96.00   94.00   96.00   93.00   97.00  100.00   95.00   94.00   89.00
θ̂_MLE   −0.63    1.49    0.76    0.36    0.54    1.24    4.21   −1.25    2.62
s_MLE    0.11    0.15    0.08    0.13    0.18    0.19    0.22    0.16    0.14
CR_MLE  31.00    0.00   61.00    0.00   92.00   62.00   94.00   84.00   74.00

         γ111    γ121    γ131    γ141    γ151    γ211    γ221    γ231    γ241    γ251
θ       −1.17   −2.12   −2.27    0.25    4.22   −4.02    2.51   −3.90    3.43   −1.10
θ̂_PM    −1.18   −2.14   −2.30    0.24    3.21   −4.08    2.71   −3.94    3.56   −1.07
s_PM     0.22    0.39    0.26    0.21    1.16    0.31    0.29    0.23    0.25    0.12
CR_PM   97.00   96.00   94.00   95.00   86.00   96.00   92.00   95.00   97.00   92.00
θ̂_MLE   −1.69   −1.25   −2.67    0.79    1.88   −5.27    6.09   −4.04    4.12   −1.17
s_MLE    0.16    0.16    0.22    0.14    0.63    5.21   11.98    0.22    0.48    0.10
CR_MLE  48.00   31.00   58.00   44.00   10.00   77.00   62.00   89.00   61.00   78.00

         γ311    γ321    γ331    γ341    γ351
θ        1.69   −4.58   −1.65   −1.40   −3.16
θ̂_PM     1.82   −4.70   −1.65   −1.46   −3.22
s_PM     0.28    0.65    0.21    0.23    0.32
CR_PM   94.00   97.00   94.00   98.00   94.00
θ̂_MLE    1.67   −5.22   −1.71   −1.37   −2.74
s_MLE    0.17  13564.74   0.20    0.16    0.17
CR_MLE  62.00   63.00   93.00   78.00   51.00

Note. θ: the true parameter. θ̂_PM: the average of posterior sample means over 100 replicates. s_PM: the average of posterior sample standard deviations over 100 replicates. CR_PM: the coverage rate of the 95 % credible intervals containing the true parameter value over 100 replicates. θ̂_MLE: the average of MLEs over 100 replicates. s_MLE: the average of standard deviation estimates over 100 replicates. CR_MLE: the coverage rate of the 95 % confidence intervals containing the true parameter value over 100 replicates.

6. Real Data Analysis

To illustrate the proposed RJMCMC method for RLCA, we use data from the Multidimensional Psychopathological Study on Schizophrenia (MPSS) project and the Study on Etiological Factors of Schizophrenia (SEFOS) project. The data are described in detail in Huang, Tsai, Hwu, Chen, Liu, Hua, and Chen (2011). Briefly, the MPSS and SEFOS projects recruited subsided patients with schizophrenia (N = 225) from three hospitals in Taiwan, based on the DSM-IV (American Psychiatric Association, 1994) criteria for schizophrenia. In this study, schizophrenia

