Prediction of Underlying Latent Classes via K-means and Divisive Hierarchical Procedures

National Chiao Tung University

Institute of Statistics

Master's Thesis

Student: Chung-Chu Hsu

Advisor: Dr. Guan-Hua Huang

A Thesis Submitted to the Institute of Statistics, College of Science, National Chiao Tung University, in Partial Fulfillment of the Requirements for the Degree of Master in Statistics

June 2008

Hsinchu, Taiwan, Republic of China

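Prediction of Underlying Latent Classes via K-means and Divisive Hierarchical Procedures

Student: Chung-Chu Hsu Advisor: Dr. Guan-Hua Huang

Institute of Statistics, National Chiao Tung University

Abstract (in Chinese)

The main purpose of this study is to predict latent classes by means of cluster analysis. Building on the ideas of k-means clustering and divisive hierarchical clustering, we replace the usual distance measure with the correlation coefficient or the covariance and group all subjects so that, for subjects belonging to the same group, the measured items are mutually independent. Simulations are used to evaluate the performance of the parameter estimates, and schizophrenia data and breast cancer microarray data serve as more detailed illustrations. The simulation results show that the parameters estimated by k-means clustering are all quite close to the true parameters, whereas divisive hierarchical clustering does not perform well; in the breast cancer example, however, divisive hierarchical clustering groups the subjects successfully and also predicts the latent classes well.

Keywords: latent class, K-means clustering, divisive clustering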

Prediction of Underlying Latent Classes via

K-means and Divisive Hierarchical Procedures

Student: Chung-Chu Hsu Advisor: Dr. Guan-Hua Huang

Institute of Statistics

National Chiao Tung University

ABSTRACT

The aim of this study is to predict the underlying latent class via k-means and divisive hierarchical clustering methods. We use the correlation (or covariance) among items as the distance measure to group objects such that, for all objects that belong to the same latent class, the items are "independent". A simulation study is presented to evaluate the performance of the parameter estimates. In addition, schizophrenia data and breast cancer microarray data are used for illustration. The simulation results show that the parameters estimated by the k-means method are close to the true parameters, whereas the divisive hierarchical method does not perform well. For the breast cancer data, however, the divisive hierarchical approach divides the subjects successfully and predicts latent class membership well.

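Acknowledgements

Above all, I thank my advisor, Professor Guan-Hua Huang. It was under his guidance over the past year that I was able to complete this thesis. He also showed me how to approach and solve problems, and helped me find enjoyment in the hard work of research. The encouragement of my classmates and friends was likewise a driving force throughout this year of research. I thank my fellow group members 阿耕, 彥銘 and 佩芳 for the complaints we shared, the ideas we exchanged, and the encouragement we gave one another; my ball-game partners, who gave me a chance to recharge during the tiring days of graduate school; and every teacher and classmate in the institute. Thanks to your care, in these two years I gained not only a great deal of knowledge but also precious friendships, and I am truly happy to have met you. I also thank Professor Guan-Hua Huang, Professor 邱燕楓, Professor 陳君厚 and Professor 洪志真 for their valuable comments, which made this thesis more complete. I am about to graduate, and I will leave with gratitude and fond memories. The two years between entering and leaving Chiao Tung University were short, yet they hold countless memories for me. Thanks to your support and care, I now have the confidence and courage to face the challenges of the next stage. Finally, I dedicate this thesis to my beloved parents, my teachers, and the classmates around me. It is the result of all our efforts, and I hope to share the joy of its completion with you. Thank you all.

Chung-Chu Hsu
Institute of Statistics, National Chiao Tung University
July 3, 2008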

Contents

Abstract (in Chinese)
Abstract (in English)
Acknowledgements (in Chinese)
Contents
List of Tables
List of Figures
1. Introduction
2. Literature review
2.1 Latent class analysis (LCA)
2.2 Regression extension of latent class analysis (RLCA)
2.3 Marginalization of the regression extension of latent class model
2.3.1 Marginalizing the covariate effects on conditional probabilities
2.3.2 Marginalizing the covariate effects on latent prevalences
2.4 Hierarchical clustering methods
2.5 Ward's hierarchical clustering method
2.6 K-means method
3. Models
3.1 LCA
3.2 RLCA
4. Parameter estimations by clustering analysis
4.1 Latent class membership estimations for LCA
4.2 Latent class membership estimations for RLCA
4.3 Parameter estimation by viewing estimated latent class as known variable
5. Simulation study
5.1 Generated data from RLCA model
5.2 Simulation results
6. Example
7. Discussion
References

List of Tables

Table 1: Values of $\alpha_0$ and $\alpha_{Lm}$ in 3 class case
Table 2: Values of $\beta_0$ and $\beta_{pj}$ in 3 class case
Table 7: Average parameters estimations for 100 replication in 3-class model, N=100
Table 8: Average parameters estimations for 100 replication in 3-class model, N=100
Table 9: Average conditional Probability for 100 replication in 3-class model, N=100
Table 10: Average Latent Prevalence for 100 replication in 3-class model, N=100
Table 11: Average Correlation Coefficients for 100 replication in 3-class model, N=100
Table 12: Average Match Proportions for 100 replication in 3-class model, N=100
Table 15: Average conditional Probability for 100 replication in 3-class model, N=500
Table 16: Average Latent Prevalences for 100 replication in 3-class model, N=500
Table 17: Average Correlation Coefficients for 100 replication in 3-class model, N=500
Table 18: Average Match Proportions for 100 replication in 3-class model, N=500
Table 21: Average conditional Probability for 100 replication in 6-class model, N=300
Table 22: Average Latent Prevalences for 100 replication in 6-class model, N=300
Table 23: Average Correlation Coefficients for 100 replication in 6-class model, N=300 (total number of not NA values in parentheses)
Table 24: Average Match Proportions for 100 replication in 6-class model, N=300
Table 25: Average parameters estimations for 100 replication in 6-class model, N=1000 (standard error in multinomial regression / sample standard error for 100 replication)
Table 26: Average parameters estimations for 100 replication in 6-class model, N=1000
Table 27: Average conditional Probability for 100 replication in 6-class model, N=1000
Table 28: Average Latent Prevalences for 100 replication in 6-class model, N=1000
… N=150
Table 32: Average parameters estimations for 100 replication in 2-class model, …
… N=700
Table 38: Average parameters estimations for 100 replication in 2-class model, …
… N=700 (total number of not NA values in parentheses)
Table 42: Average Match Proportions for 100 replication in 2 class model, N=700
Table 43: Composition of classes of patients at the acute state by the four-class RLCA model with divisive hierarchical clustering method
Table 44: External validity of classes of patients at the acute state by the four-class RLCA model with divisive hierarchical clustering method
Table 45: Composition of classes of patients at the subsided state by the three-class RLCA model with divisive hierarchical clustering method
Table 46: External validity of classes of patients at the subsided state by the three-class RLCA model with divisive hierarchical clustering method
Table 47: Composition of classes of patients at the acute state by the four-class RLCA model with k-means clustering method
Table 48: External validity of classes of patients at the acute state by the four-class RLCA model with k-means clustering method
Table 49: Composition of classes of patients at the subsided state by the three-class RLCA model with k-means clustering method
Table 50: External validity of classes of patients at the subsided state by the three-class RLCA model with k-means clustering method
Table 51: External validity of classes of breast cancer patients by the two-class RLCA model with divisive hierarchical clustering method
Table 52: Predictions of class membership of 19 tumours by divisive hierarchical clustering method
Table 53: External validity of classes of breast cancer patients by the two-class RLCA model with k-means clustering method
Table 54: Predictions of class membership of 19 tumours by k-means clustering …

List of Figures

Figure 1: An example of k-means algorithm procedure.
Figure 2: An example of divisive hierarchical algorithm procedure.
Figure 3: Heatmap for schizophrenia patients at the acute state
Figure 4: Heatmap for schizophrenia patients at the subsided state

1. Introduction

Latent class analysis (LCA), originally described by Green (1951) and systematically developed by Lazarsfeld and Henry (1968) and Goodman (1974), has been found useful for classifying objects based on their responses to a set of categorical items. Latent class models have also proven useful for analyzing relationships between measured multiple indicators and covariates of interest. Such models summarize shared features of the multiple indicators as an underlying categorical variable, and the indicators' substantive associations with predictors are built directly and indirectly in unique model parameters.

The basic model postulates an underlying categorical latent variable with, say, J categories, and the measured items are assumed to be independent of one another within any category of the latent variable. Observed relationships among measured variables are thus assumed to result from the underlying classification of the data produced by the categorical latent variable.

Latent class analysis may legitimately be viewed as an analog of cluster analysis. The term cluster analysis encompasses a number of different algorithms and methods for grouping objects of similar kind into respective categories. In this research, instead of grouping objects of "similar kind" into respective categories, we apply the divisive hierarchical ideas of clustering methods, with the correlation among items as the distance measure, to group objects such that, for all objects that belong to the same latent class, the items are "independent".

Recently, several authors have extended the LCA model to describe the effects of measured covariates on the underlying mechanism (Dayton and Macready, 1988; Van der Heijden, Dessens and Böckenholt, 1996; Bandeen-Roche, Miglioretti, Zeger and Rathouz, 1997), or on measured item distributions within latent levels (Melton, Liang and Pulver, 1994). These extended LCA models are called regression extension of latent class analysis (RLCA) models. For the RLCA model, marginalization techniques eliminate covariate effects from both the latent variable and the measured indicators (Huang, 2005), so our clustering idea can also be applied to the reduced LCA model to estimate latent class membership. By viewing the estimated latent variable as a known predictor, it becomes easy to estimate the parameters in the RLCA model.


2. Literature review

2.1 Latent class analysis (LCA)

Let $\mathbf{Y}_i = (Y_{i1}, \ldots, Y_{iM})^T$ denote a set of $M$ observable polytomous indicators for the $i$th individual in a study sample of $N$ persons. Each $Y_{im}$, $m = 1, \ldots, M$, can take values in $\{1, \ldots, K_m\}$, where $K_m \geq 2$. The basic model postulates an underlying categorical latent variable $S_i = 1, \ldots, J$ for individual $i$; within any category of the latent variable, the measured indicators are assumed to be independent of one another. Therefore, the distribution of $\mathbf{Y}_i$ can be expressed as

$$\Pr(Y_{i1} = y_1, \ldots, Y_{iM} = y_M) = \sum_{j=1}^{J} \left\{ \eta_j \prod_{m=1}^{M} \prod_{k=1}^{K_m} p_{mkj}^{\,y_{mk}} \right\}, \tag{2.1}$$

where $y_{mk} = I(y_m = k) = 1$ if $y_m = k$, and 0 otherwise. The LCA model assumes that

$$\eta_j = \Pr(S_i = j) \quad \text{and} \quad p_{mkj} = \Pr(Y_{im} = k \mid S_i = j), \tag{2.2}$$

for $i = 1, \ldots, N$; $m = 1, \ldots, M$; $k = 1, \ldots, K_m$; $j = 1, \ldots, J$.

The model treats class membership probabilities, $\eta_j$, and item response probabilities conditional on class membership, $p_{mkj}$, as homogeneous over individuals. Heuristically, $\eta_j$ is the population prevalence of class $j$, and $p_{mkj}$ is the probability of an individual in class $j$ being at level $k$ of $Y_{im}$. Goodman (1974) provided an excellent overview of the LCA model, including a maximum likelihood strategy for estimating model parameters, conditions to determine local model identifiability, a strategy to test overall model fit, and the use of constraints to identify models.
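As a concrete illustration of (2.1), the following Python sketch evaluates the mixture probability of a response pattern. The function name `lca_probability`, the 0-based coding of the response levels, and the numerical values of $\eta_j$ and $p_{mkj}$ are hypothetical choices made for this example; they are not estimates from this study.

```python
from itertools import product
from math import isclose, prod


def lca_probability(y, eta, p):
    """Pr(Y_1 = y_1, ..., Y_M = y_M) under the LCA model (2.1).

    y   : sequence of M observed levels, coded 0 .. K_m - 1 (assumed coding)
    eta : list of J class prevalences eta_j
    p   : p[j][m][k] = Pr(Y_m = k | S = j)
    """
    return sum(
        eta[j] * prod(p[j][m][y_m] for m, y_m in enumerate(y))
        for j in range(len(eta))
    )


# Hypothetical two-class model with M = 3 binary items.
eta = [0.4, 0.6]
p = [
    [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]],  # class 1
    [[0.2, 0.8], [0.3, 0.7], [0.1, 0.9]],  # class 2
]

# The probabilities of all 2^3 response patterns must add up to one.
patterns = list(product([0, 1], repeat=3))
total = sum(lca_probability(y, eta, p) for y in patterns)
```

Because each class-specific item distribution sums to one and the prevalences sum to one, `total` equals 1 up to rounding, which is a quick sanity check on any set of candidate parameters.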

2.2 Regression extension of latent class analysis (RLCA)

Huang and Bandeen-Roche (2004) extend latent class analysis to allow both the probabilities of latent class membership and the distribution of observed responses given latent class membership to be functionally related to concomitant variables, while preserving model identifiability. By allowing covariate effects on latent class probabilities, we summarize the effect of risk factors on the underlying mechanism. By incorporating covariates into the conditional probabilities, we can adjust for characteristics that determine responses other than the underlying classes, and hence hopefully improve the accuracy of classifying individuals. For instance, in evaluating functional disability, some data have suggested that women tend to rate tasks as "difficult" more readily than men independently of ability (Bandeen-Roche, Huang, Munoz, & Rubin, 1999). Without adjusting for the gender effect, the model might well classify some men and women with identical underlying functioning differently (men as "able", women as "disabled").

Let $(\mathbf{x}_i, \mathbf{z}_i)$ be the concomitant covariates of the $i$th person, where $\mathbf{x}_i = (1, x_{i1}, \ldots, x_{ip})^T$ are primary covariates hypothesized to be associated with latent class membership, $S_i$, and $\mathbf{z}_i = (\mathbf{z}_{i1}, \ldots, \mathbf{z}_{iM})^T$ with $\mathbf{z}_{im} = (1, z_{im1}, \ldots, z_{imL})^T$, $m = 1, \ldots, M$, are secondary covariates used to build direct effects on measured indicators. The sets of covariates may include any combination of continuous and discrete measures, and the two sets of covariates may be mutually exclusive or overlap. The regression extension of LCA may then be stated as follows:

$$\Pr(Y_{i1} = y_1, \ldots, Y_{iM} = y_M \mid \mathbf{x}_i, \mathbf{z}_i) = \sum_{j=1}^{J} \left\{ \eta_j(\mathbf{x}_i^T \boldsymbol{\beta}) \prod_{m=1}^{M} \prod_{k=1}^{K_m} p_{mkj}^{\,y_{mk}}(\gamma_{mj} + \mathbf{z}_{im}^T \boldsymbol{\alpha}_m) \right\} \tag{2.3}$$

with $\eta_j(\mathbf{x}_i^T \boldsymbol{\beta})$ and $p_{mkj}^{\,y_{mk}}(\gamma_{mj} + \mathbf{z}_{im}^T \boldsymbol{\alpha}_m)$ defined as in the generalized linear framework (McCullagh and Nelder, 1989). Often, (2.3) is implemented assuming generalized logit (Agresti, 1984) link functions:

$$\log\left[\frac{\eta_j(\mathbf{x}_i^T \boldsymbol{\beta})}{\eta_J(\mathbf{x}_i^T \boldsymbol{\beta})}\right] = \beta_{0j} + \beta_{1j} x_{i1} + \cdots + \beta_{pj} x_{ip} \tag{2.4}$$

for $i = 1, \ldots, N$; $j = 1, \ldots, J-1$, and

$$\log\left[\frac{p_{mkj}(\gamma_{mj} + \mathbf{z}_{im}^T \boldsymbol{\alpha}_m)}{p_{mK_mj}(\gamma_{mj} + \mathbf{z}_{im}^T \boldsymbol{\alpha}_m)}\right] = \gamma_{mkj} + \alpha_{1mk} z_{im1} + \cdots + \alpha_{Lmk} z_{imL} \tag{2.5}$$

for $i = 1, \ldots, N$; $m = 1, \ldots, M$; $k = 1, \ldots, (K_m - 1)$; $j = 1, \ldots, J$.

Notice that in the conditional probability model (2.5), we allow unrestricted intercepts and level- and item-specific covariate coefficients, but we do not allow the coefficients to vary across classes (i.e., $\alpha_{qmk}$ depends on $m$ and $k$ but is independent of $j$). This constraint is reasonable if the primary purpose of modeling conditional probabilities is to prevent possible misclassification by adjusting for characteristics associated with item measurements. It is also necessary to unambiguously distinguish covariate effects on measured response probabilities from covariate effects on class probabilities. Three assumptions complete (2.3):

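(C1) $\Pr(Y_{i1} = y_1, \ldots, Y_{iM} = y_M \mid S_i, \mathbf{x}_i, \mathbf{z}_i) = \Pr(Y_{i1} = y_1, \ldots, Y_{iM} = y_M \mid S_i, \mathbf{z}_i)$;

(C2) $\Pr(S_i = j \mid \mathbf{x}_i, \mathbf{z}_i) = \Pr(S_i = j \mid \mathbf{x}_i)$;

(C3) $\Pr(Y_{i1} = y_1, \ldots, Y_{iM} = y_M \mid S_i, \mathbf{z}_i) = \prod_{m=1}^{M} \Pr(Y_{im} = y_m \mid S_i, \mathbf{z}_{im})$.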

Huang and Bandeen-Roche (2004) provided an excellent overview of the RLCA model, including model identification, the Expectation-Maximization algorithm for parameter estimation, standard error calculation, convergence properties, and a comparison of the RLCA model with the models underlying existing latent class modeling software.

2.3 Marginalization of the regression extension of latent class model

Now we introduce a process to "eliminate" the covariate effects and hence "marginalize" the RLCA model (2.3). The marginalization process (Huang, 2005) includes two stages. Stage 1 aims to eliminate the $\mathbf{z}_i$ effect. In stage 2, we apply the marginalization property proposed by Bandeen-Roche et al. (1997) to average the $\mathbf{x}_i$ effect out of the latent prevalence.

2.3.1 Marginalizing the covariate effects on conditional probabilities

The key to marginalizing over $\mathbf{z}_i$ is that the process must yield random variables that follow a finite mixture distribution that is both independent of $\mathbf{z}_i$ and has $J$ mixing components. One strategy for achieving such marginalization can be motivated by the properties of added variable plots for linear regression models.

Consider the linear model

$$\mathbf{Y} = \mathbf{x}_1^T \boldsymbol{\beta}_1 + \mathbf{x}_2^T \boldsymbol{\beta}_2 + \boldsymbol{\varepsilon}, \tag{2.6}$$

where $\boldsymbol{\varepsilon}$ has mean $\mathbf{0}$ and variance matrix $\mathbf{V}$. Let $\tilde{\mathbf{Y}}$ denote the residuals of regressing $\mathbf{Y}$ on $\mathbf{x}_2$, and let $\mathbf{W} = \mathbf{V}^{-1}$ be the weight matrix. Then, it is well known that if $\mathbf{x}_1$ and $\mathbf{x}_2$ are orthogonal (i.e., $\mathbf{x}_1 \mathbf{W} \mathbf{x}_2^T = 0$), $\tilde{\mathbf{Y}}$ has mean $\mathbf{x}_1^T \boldsymbol{\beta}_1$ and variance $\mathbf{V}$. Hence, the simple linear regression of $\tilde{\mathbf{Y}}$ on $\mathbf{x}_1$ yields exactly the same inferences about $\boldsymbol{\beta}_1$ as if we performed the analysis on the more complicated model (2.6) (Cook and Weisberg, 1982). Viewing the just-described stability of $\boldsymbol{\beta}_1$ as analogous to the desired stability of the latent class dimension, $J$, the added variable property can be applied to model (2.6) to obtain the marginalized conditional probabilities.

To present the key ideas more clearly, the measured indicators $(Y_{i1}, \ldots, Y_{iM})$ are assumed to be binary (i.e., $K_1 = \cdots = K_M = 2$). To make the analogy to (2.6), notice that (2.5) can be viewed as fitting a logistic regression of $Y_{im}$ on $S_i$ adjusting for $\mathbf{z}_{im}$, separately for each $m$. Let $S_{ij} = I(S_i = j)$ for $i = 1, \ldots, N$; $j = 1, \ldots, J-1$. We can reparameterize (2.5) as

$$\operatorname{logit}\left(E\left[Y_{im} \mid \mathbf{S}_i, \mathbf{Z}^c_{im}\right]\right) = \mathbf{S}_i^T \boldsymbol{\gamma}_m + (\mathbf{Z}^c_{im})^T \boldsymbol{\alpha}_m \tag{2.7}$$

for $i = 1, \ldots, N$; $m = 1, \ldots, M$, where

$\mathbf{S}_i = [1, S_{i1}, \ldots, S_{i(J-1)}]^T$;

$\mathbf{Z}^c_{im} = [(z_{im1} - \bar{z}_{m1}), \ldots, (z_{imL} - \bar{z}_{mL})]^T$ (the "centered" covariate vector);

$\bar{z}_{mp} = (1/N) \sum_{i=1}^{N} z_{imp}$;

$\boldsymbol{\gamma}_m = [\gamma_{m0}, \gamma_{m1}, \ldots, \gamma_{m(J-1)}]^T$; and

$\boldsymbol{\alpha}_m = [\alpha_{1m}, \alpha_{2m}, \ldots, \alpha_{Lm}]^T$.

Therefore, for any realization of $\mathbf{S}_i$, (2.7) is a logistic regression with dependent variable $Y_{im}$ and predictors $\mathbf{S}_i$ and $\mathbf{Z}^c_{im}$.

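Next, the problem becomes how to calculate residuals from the generalized linear model

$$\operatorname{logit}\left(E\left[Y_{im} \mid \mathbf{S}_i, \mathbf{Z}^c_{im}\right]\right) = (\mathbf{Z}^c_{im})^T \boldsymbol{\alpha}^*_m \tag{2.8}$$

for $i = 1, \ldots, N$; $m = 1, \ldots, M$. The "pseudo-residuals" are given by

$$\mathbf{R}_m = [R_{1m}, \ldots, R_{Nm}]^T = \hat{\mathbf{V}}_m^{-1} (\mathbf{Y}_m - \hat{\boldsymbol{\mu}}_m). \tag{2.9}$$

Here a "hat" represents estimated values; $\mathbf{Y}_m = [Y_{1m}, \ldots, Y_{Nm}]^T$; $\mathbf{V}_m = \operatorname{diag}(V_{1m}, \ldots, V_{Nm})$; $V_{im} = \operatorname{Var}(Y_{im})$; and $\mathbf{Z}^c_m = [\mathbf{Z}^c_{1m}, \ldots, \mathbf{Z}^c_{Nm}]$.

If $\mathbf{x}_i$ and $\mathbf{z}_{im}$ are independent, we can remove the effect of $\mathbf{Z}^c_{im}$ from the conditional probabilities by treating the residuals from model (2.8) as new response variables and regressing them on $\mathbf{S}_i$. We then substitute the estimate of $\boldsymbol{\gamma}^*_m$ in the linear model

$$R_{im} = \mathbf{S}_i^T \boldsymbol{\gamma}^*_m + \varepsilon_{im}, \quad i = 1, \ldots, N;\ m = 1, \ldots, M, \tag{2.10}$$

for the estimate of $\boldsymbol{\gamma}_m$ in model (2.7). A formal justification shows that $\boldsymbol{\gamma}_m$ and $\boldsymbol{\gamma}^*_m$ can be very close under reasonable regularity conditions. The above results can be extended to the cases where $(Y_{i1}, \ldots, Y_{iM})$ is polytomous as in (2.1) and (2.3).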

2.3.2 Marginalizing the covariate effects on latent prevalences

… the covariates associated with latent class prevalences, $\mathbf{x}_i$, can be ignored.

2.4 Hierarchical clustering methods

Hierarchical clustering techniques proceed by either a series of successive mergers or a series of successive divisions. Agglomerative hierarchical methods start with the individual objects. Thus, there are initially as many clusters as objects. The most similar objects are first grouped, and these initial groups are merged according to their similarities. Eventually, as the similarity decreases, all subgroups are fused into a single cluster.

Divisive hierarchical methods work in the opposite direction. An initial single group of objects is divided into two subgroups such that the objects in one subgroup are "far from" the objects in the other. These subgroups are then further divided into dissimilar subgroups; the process continues until there are as many subgroups as objects, that is, until each object forms a group.

The results of both agglomerative and divisive methods may be displayed in the form of a two-dimensional diagram known as a dendrogram. As we shall see, the dendrogram illustrates the mergers or divisions that have been made at successive levels.

In this research, we focus on divisive hierarchical procedures. We use an algorithm based on the proposal of Macnaughton-Smith et al. (1964). Here we illustrate the divisive analysis algorithm for grouping $N$ objects.

1. Start with all objects in a single cluster and an $N \times N$ symmetric distance (or dissimilarity) matrix $D = \{d_{ij}\}$.

2. Look for the object whose dissimilarity to all other objects is largest. (If there are two such objects, pick one at random.) This object is chosen to initiate the so-called splinter group.

3. For each object of the larger group, compute its dissimilarity with the remaining objects and compare it to its dissimilarity with the objects of the splinter group. The object for which the difference between these two dissimilarities is largest is moved into the splinter group.

4. Repeat step 3 until all the differences have become negative, so that no further moves are made. The process then stops and the first divisive step is complete.

5. Next, divide the biggest cluster, that is, the cluster with the largest diameter. (The diameter of a cluster is the largest dissimilarity between two of its objects.) The above procedure is applied repeatedly until each object forms its own cluster.
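Steps 2 to 4 can be sketched in Python as follows. This is a minimal illustration, not the implementation used in this study: the function names `split_cluster` and `diameter` are hypothetical, the "dissimilarity to all other objects" is taken to be the average dissimilarity (an assumption), and ties are broken by taking the first candidate rather than at random.

```python
def average(values):
    return sum(values) / len(values)


def split_cluster(members, d):
    """One divisive step: split `members` into (remaining, splinter).

    members : iterable of object indices
    d       : symmetric dissimilarity matrix, d[i][j]
    """
    remaining = list(members)
    if len(remaining) < 2:
        return remaining, []

    # Step 2: the object with the largest average dissimilarity to the
    # others starts the splinter group (averaging is an assumption).
    seed = max(
        remaining,
        key=lambda i: average([d[i][j] for j in remaining if j != i]),
    )
    remaining.remove(seed)
    splinter = [seed]

    # Steps 3-4: keep moving the object with the largest positive
    # difference until no difference is positive.
    while len(remaining) > 1:
        def gain(i):
            to_rest = average([d[i][j] for j in remaining if j != i])
            to_splinter = average([d[i][j] for j in splinter])
            return to_rest - to_splinter

        best = max(remaining, key=gain)
        if gain(best) <= 0:
            break
        remaining.remove(best)
        splinter.append(best)
    return remaining, splinter


def diameter(members, d):
    """Largest dissimilarity between two objects of a cluster (step 5)."""
    return max((d[i][j] for i in members for j in members), default=0)


# Five objects on a line; the two far-right objects should split off.
values = [0, 1, 2, 10, 11]
d = [[abs(a - b) for b in values] for a in values]
remaining, splinter = split_cluster(range(5), d)
```

Step 5 would then call `split_cluster` again on whichever current cluster has the largest `diameter`.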

2.5 Ward’s hierarchical clustering method

Ward (1963) considered hierarchical clustering procedures based on minimizing the "loss of information" from joining two groups. This method is usually implemented with loss of information taken to be an increase in an error sum of squares criterion, ESS. First, for a given cluster $k$, let $ESS_k$ be the sum of the squared deviations of every item in the cluster from the cluster mean (centroid). If there are currently $K$ clusters, define ESS as the sum of the $ESS_k$, or

$$ESS = ESS_1 + ESS_2 + \cdots + ESS_K.$$

At each step in the analysis, the union of every possible pair of clusters is considered, and the two clusters whose combination results in the smallest increase in ESS (minimum loss of information) are joined. Initially, each cluster consists of a single item, and, if there are $N$ items, $ESS_k = 0$, $k = 1, 2, \ldots, N$, so $ESS = 0$. At the other extreme, when all the clusters are combined in a single group of $N$ items, the value of ESS is given by

$$ESS = \sum_{j=1}^{N} (\mathbf{x}_j - \bar{\mathbf{x}})^T (\mathbf{x}_j - \bar{\mathbf{x}}),$$

where $\mathbf{x}_j$ is the multivariate measurement associated with the $j$th item and $\bar{\mathbf{x}}$ is the mean of all the items.

The results of Ward’s method can be displayed as a dendrogram. The vertical axis gives the values of ESS at which the mergers occur.

Ward’s method is based on the notion that the clusters of multivariate observations are expected to be roughly elliptically shaped. It is a hierarchical precursor to nonhierarchical clustering methods that optimize some criterion for dividing data into a given number of elliptical groups.
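The merging rule can be made concrete with a short Python sketch. The function names (`ess`, `best_merge`, `ward`) and the four example points are hypothetical and serve only to illustrate the ESS criterion; the brute-force search over all pairs is chosen for clarity, not efficiency.

```python
from math import isclose


def ess(cluster):
    """Error sum of squares: squared deviations from the cluster centroid."""
    n = len(cluster)
    dim = len(cluster[0])
    centroid = [sum(x[c] for x in cluster) / n for c in range(dim)]
    return sum((x[c] - centroid[c]) ** 2 for x in cluster for c in range(dim))


def best_merge(clusters):
    """Return (increase, a, b) for the pair whose union raises ESS least."""
    best = None
    for a in range(len(clusters)):
        for b in range(a + 1, len(clusters)):
            merged = clusters[a] + clusters[b]
            increase = ess(merged) - ess(clusters[a]) - ess(clusters[b])
            if best is None or increase < best[0]:
                best = (increase, a, b)
    return best


def ward(points):
    """Agglomerate singletons into one cluster, recording each merge."""
    clusters = [[p] for p in points]
    history = []
    while len(clusters) > 1:
        increase, a, b = best_merge(clusters)
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
        history.append((increase, merged))
    return history


# Two tight pairs of points that are far from each other.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
history = ward(points)
```

Since ESS starts at zero and every merge adds its increase, the increases recorded in `history` sum to the ESS of the single final cluster, which matches the closed-form expression for a single group of $N$ items.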

2.6 K-means method

MacQueen (1967) suggests the term K-means for describing an algorithm of his that assigns each item to the cluster having the nearest centroid (mean). In its simplest version, the process is composed of these three steps:

1. Partition the items into K initial clusters.

2. Proceed through the list of items, assigning an item to the cluster whose centroid (mean) is nearest. (Distance is usually computed using Euclidean distance with either standardized or unstandardized observations.) Recalculate the centroid for the cluster receiving the new item and for the cluster losing the item.

3. Repeat Step 2 until no more reassignments take place.

Rather than starting with a partition of all items into K preliminary groups in Step 1, we could specify K initial centroids (seed points) and then proceed to Step 2.


The final assignment of items to clusters will be, to some extent, dependent upon the initial partition or the initial selection of seed points. Experience suggests that most major changes in assignment occur with the first reallocation step.
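The three steps can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, squared Euclidean distance on unstandardized observations is used, centroids are recomputed from the current assignment before each item is examined, and an empty cluster is simply skipped.

```python
def centroid(cluster):
    n = len(cluster)
    return [sum(x[c] for x in cluster) / n for c in range(len(cluster[0]))]


def squared_distance(x, c):
    return sum((a - b) ** 2 for a, b in zip(x, c))


def k_means(points, labels, k, max_sweeps=100):
    """Sequential K-means starting from an initial partition `labels`."""
    labels = list(labels)
    for _ in range(max_sweeps):
        moved = False
        for i, x in enumerate(points):
            # Step 2: centroids reflect every reassignment made so far.
            centroids = {}
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                if members:
                    centroids[j] = centroid(members)
            nearest = min(
                centroids, key=lambda j: squared_distance(x, centroids[j])
            )
            if nearest != labels[i]:
                labels[i] = nearest
                moved = True
        # Step 3: stop once a full sweep makes no reassignment.
        if not moved:
            break
    return labels


# Four points on a line with a deliberately poor initial partition.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
final_labels = k_means(points, [0, 1, 0, 1], k=2)
```

In this small example both reassignments happen during the first sweep, in line with the remark that most major changes occur with the first reallocation step.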


3. Models

3.1 LCA

Let $(Y_{i1}, \ldots, Y_{iM})$ denote a set of $M$ observable polytomous outcome indicators and $S_i$ denote the unobservable class membership for the $i$th individual in a study sample of $N$ persons. $Y_{im}$ can take values in $\{1, \ldots, K_m\}$, where $K_m \geq 2$, $m = 1, \ldots, M$, and $S_i$ can take values in $\{1, \ldots, J\}$. The latent class analysis model is based on the concept of conditional independence, in the sense that the observed variables are assumed to be statistically independent within latent classes. Therefore, the distribution of $(Y_{i1}, \ldots, Y_{iM})$ can be expressed as the finite mixture density

$$\Pr(Y_{i1} = y_1, \ldots, Y_{iM} = y_M) = \sum_{j=1}^{J} \left\{ \Pr(S_i = j) \prod_{m=1}^{M} \prod_{k=1}^{K_m} \left[\Pr(Y_{im} = k \mid S_i = j)\right]^{y_{mk}} \right\}, \tag{3.1}$$

where $y_{mk} = 1$ if $y_m = k$, and 0 otherwise. The LCA model assumes that $\Pr(Y_{im} = k \mid S_i = j) = p_{mkj}$ and $\Pr(S_i = j) = \eta_j$, for $i = 1, \ldots, N$; $m = 1, \ldots, M$; $k = 1, \ldots, K_m$; $j = 1, \ldots, J$. Thus, the model treats class membership probabilities, $\eta_j$, and item response probabilities conditional on class membership, $p_{mkj}$, as homogeneous over individuals. Heuristically, $\eta_j$ is the population prevalence of class $j$, and $p_{mkj}$ is the probability of an individual in class $j$ being at level $k$ of $Y_{im}$.

For more detail on identifiability, parameter estimation and tests of overall model fit, readers may refer to Goodman (1974).

3.2 RLCA

To incorporate covariate effects into LCA, let $(\mathbf{x}_i, \mathbf{z}_i)$ be the concomitant covariates of the $i$th person, where $\mathbf{x}_i = (1, x_{i1}, \ldots, x_{ip})^T$ are primary covariates hypothesized to be associated with latent class membership, $S_i$, and $\mathbf{z}_i = (\mathbf{z}_{i1}, \ldots, \mathbf{z}_{iM})^T$ with $\mathbf{z}_{im} = (1, z_{im1}, \ldots, z_{imL})^T$, $m = 1, \ldots, M$, are secondary covariates used to build direct effects on measured indicators. The sets of covariates may include any combination of continuous and discrete measures. To marginalize the RLCA model, we begin by assuming that the two sets of covariates are mutually independent. The basic RLCA equation can be stated as

$$\Pr(Y_{i1} = y_1, \ldots, Y_{iM} = y_M \mid \mathbf{x}_i, \mathbf{z}_i) = \sum_{j=1}^{J} \left\{ \eta_j(\mathbf{x}_i^T \boldsymbol{\beta}) \prod_{m=1}^{M} \prod_{k=1}^{K_m} p_{mkj}^{\,y_{mk}}(\gamma_{mj} + \mathbf{z}_{im}^T \boldsymbol{\alpha}_m) \right\} \tag{3.2}$$

with $\eta_j(\mathbf{x}_i^T \boldsymbol{\beta})$ and $p_{mkj}^{\,y_{mk}}(\gamma_{mj} + \mathbf{z}_{im}^T \boldsymbol{\alpha}_m)$ defined as in the generalized linear framework (McCullagh and Nelder, 1989). Often, (3.2) is implemented assuming generalized logit (Agresti, 1984) link functions:

$$\log\left[\frac{\eta_j(\mathbf{x}_i^T \boldsymbol{\beta})}{\eta_J(\mathbf{x}_i^T \boldsymbol{\beta})}\right] = \beta_{0j} + \beta_{1j} x_{i1} + \cdots + \beta_{pj} x_{ip} \tag{3.3}$$

for $i = 1, \ldots, N$; $j = 1, \ldots, J-1$, and

$$\log\left[\frac{p_{mkj}(\gamma_{mj} + \mathbf{z}_{im}^T \boldsymbol{\alpha}_m)}{p_{mK_mj}(\gamma_{mj} + \mathbf{z}_{im}^T \boldsymbol{\alpha}_m)}\right] = \gamma_{mkj} + \alpha_{1mk} z_{im1} + \cdots + \alpha_{Lmk} z_{imL} \tag{3.4}$$

for $i = 1, \ldots, N$; $m = 1, \ldots, M$; $k = 1, \ldots, (K_m - 1)$; $j = 1, \ldots, J$. If the regression coefficients in (3.3) or (3.4) are set to 0, model (3.2) reduces to the models studied by Melton, Liang and Pulver (1994) and Dayton and Macready (1988), or to an ordinary latent class analysis (3.1).

Notice that in the conditional probability model (3.4), we allow unrestricted intercepts and level- and item-specific covariate coefficients, but we do not allow the coefficients to vary across classes (i.e., $\alpha_{qmk}$ depends on $m$ and $k$ but is independent of $j$). This constraint is logical if the primary purpose of modeling conditional probabilities is to prevent possible misclassification by adjusting for characteristics associated with item measurements. It is also necessary to unambiguously distinguish covariate effects on measured response probabilities from covariate effects on class probabilities. Three assumptions complete (3.2):

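(C1) $\Pr(Y_{i1} = y_1, \ldots, Y_{iM} = y_M \mid S_i, \mathbf{x}_i, \mathbf{z}_i) = \Pr(Y_{i1} = y_1, \ldots, Y_{iM} = y_M \mid S_i, \mathbf{z}_i)$;

(C2) $\Pr(S_i = j \mid \mathbf{x}_i, \mathbf{z}_i) = \Pr(S_i = j \mid \mathbf{x}_i)$;

(C3) $\Pr(Y_{i1} = y_1, \ldots, Y_{iM} = y_M \mid S_i, \mathbf{z}_i) = \prod_{m=1}^{M} \Pr(Y_{im} = y_m \mid S_i, \mathbf{z}_{im})$.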

For more detail on model assumptions, identifiability and parameter estimation, readers may refer to Huang and Bandeen-Roche (2004).


4. Parameter estimations by clustering analysis

The parameters in (3.2) are typically estimated by maximum likelihood (ML) for a fixed number of classes, $J$. Viewing the class membership $S_i$ as unobservable, the LCA model (3.1) and the RLCA model (3.2) become typical incomplete-data problems. Goodman (1974) provided a maximum likelihood strategy for estimating the model parameters in (3.1), and Huang and Bandeen-Roche (2004) successfully used the Expectation-Maximization (EM) algorithm (Dempster, Laird, & Rubin, 1977) to compute ML estimates of the parameters in (3.2) and created a computer module to implement the proposed latent class model (3.2). However, implementing the EM algorithm to estimate parameters in finite-mixture models is typically time-consuming. Therefore, we propose an alternative clustering analysis strategy to estimate the parameters in (3.1) and (3.2).

4.1 Latent class membership estimations for LCA

Latent class analysis is a useful tool to classify objects based on their responses to a set of categorical items. Suppose the basic model has an underlying categorical latent variable $S_i = 1, \ldots, J$ for individual $i$, and within any latent class, the measured indicators are assumed to be independent of one another. Therefore, if we can estimate the unobservable class membership $S_i$ and view the estimated class membership as a known variable, then it is easy to estimate the parameters in (3.1). We propose the following strategy to estimate the unobservable class membership $S_i$.

Our strategy is to apply the concepts of k-means (MacQueen, 1967) and divisive hierarchical methods to cluster the objects. Here we do not cluster the objects into $J$ subgroups such that the objects in one subgroup are "far from" the objects in the others; instead, we want to group objects such that the response variables are statistically independent within latent classes. Accordingly, we use the sample correlation or sample covariance as the distance in the k-means and divisive hierarchical methods, and borrow the "loss of information" and "minimum loss of information" concepts from Ward's hierarchical clustering method.

Now we illustrate how to calculate the sample correlation and sample covariance matrices used in the k-means and divisive hierarchical methods.

For individual $i$, we transform the $M$ polytomous outcome indicators $(Y_{i1}, \ldots, Y_{iM})$ into the dummy variables

$$\tilde{\mathbf{Y}}_i = \left(Y_{i11}, \ldots, Y_{i1(K_1-1)},\ Y_{i21}, \ldots, Y_{i2(K_2-1)},\ \ldots,\ Y_{iM1}, \ldots, Y_{iM(K_M-1)}\right)$$

with $Y_{imk} = I(Y_{im} = k)$, $m = 1, \ldots, M$, $k = 1, \ldots, K_m - 1$, and variance-covariance matrix

$$\operatorname{Cov}(\tilde{\mathbf{Y}}_i) = \left[\operatorname{Cov}(Y_{imk}, Y_{iqs})\right] = \begin{bmatrix} B_{11} & B_{12} & \cdots & B_{1M} \\ B_{21} & B_{22} & \cdots & B_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ B_{M1} & B_{M2} & \cdots & B_{MM} \end{bmatrix}, \tag{4.1}$$

where, for the $m$th item and the $q$th item, $B_{mq}$ is the $(K_m - 1) \times (K_q - 1)$ covariance block. The elements of the variance-covariance matrix of the measured indicators are

$$\operatorname{Cov}(Y_{imk}, Y_{iqs}) = \begin{cases} \Pr(Y_{imk} = 1) - \Pr(Y_{imk} = 1)\Pr(Y_{iqs} = 1) & \text{if } m = q \text{ and } k = s, \\ -\Pr(Y_{imk} = 1)\Pr(Y_{iqs} = 1) & \text{if } m = q \text{ and } k \neq s, \\ \Pr(Y_{imk} = 1, Y_{iqs} = 1) - \Pr(Y_{imk} = 1)\Pr(Y_{iqs} = 1) & \text{if } m \neq q. \end{cases} \tag{4.2}$$

These variances and covariances are estimated by the corresponding sample averages. Furthermore, we can also calculate the sample correlation matrix as $\tilde{D}^{-1/2} \operatorname{Cov}(\tilde{\mathbf{Y}}_i) \tilde{D}^{-1/2}$, where $\tilde{D} = \operatorname{diag}(\hat{B}_{11}, \hat{B}_{22}, \ldots, \hat{B}_{MM})$. The k-means and divisive hierarchical clustering algorithms are described separately below.
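The dummy coding and the sample version of (4.1) and (4.2) can be sketched in Python as follows. The function names `dummy_code` and `sample_covariance` and the four example responses are hypothetical. The covariance uses divisor $n$ (an assumption made here) so that each entry equals the corresponding expression in (4.2) with probabilities replaced by sample proportions.

```python
from math import isclose


def dummy_code(y, levels):
    """Expand M responses into indicators Y_imk = I(Y_im = k), k < K_m.

    y      : list of M responses, item m coded 1 .. K_m
    levels : list of the M values K_m
    """
    row = []
    for y_m, k_m in zip(y, levels):
        row.extend(1.0 if y_m == k else 0.0 for k in range(1, k_m))
    return row


def sample_covariance(rows):
    """Sample covariance matrix of the dummy variables (divisor n)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[c] for r in rows) / n for c in range(d)]
    return [
        [
            sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows) / n
            for b in range(d)
        ]
        for a in range(d)
    ]


# Four hypothetical subjects, one 3-level item and one 2-level item.
levels = [3, 2]
responses = [[1, 1], [2, 2], [3, 1], [1, 2]]
rows = [dummy_code(y, levels) for y in responses]
cov = sample_covariance(rows)
```

Here the first two columns of `rows` belong to item 1 and the third to item 2, so `cov[0][1]` lies in the diagonal block $B_{11}$ while `cov[0][2]` lies in the off-diagonal block $B_{12}$.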


K-means algorithm:

1. First, all objects are partitioned into K initial clusters.

2. Proceed through the list of objects, assigning an object to the cluster such that “minimum loss of independence” is reached.

3. Repeat step 2 until no more reassignments take place.

In step 1, we specify $K$ preliminary centroids (seed points) and then proceed through the list of objects, assigning each object to the cluster whose centroid (mean) is nearest, with distance computed as Euclidean distance. Since we use the sample covariance or correlation to measure the minimum loss of independence, each initial cluster must contain enough objects. If an initial cluster contains fewer members than expected, we adjust the number of objects in each cluster by repartitioning the objects "randomly" and "evenly" into $K$ initial clusters.

Now, we introduce the concept of minimum loss of independence in step 2. Let $\mathrm{MCov}_k$ denote the mean of the absolute values of the entries in the non-diagonal blocks of the sample correlation/covariance matrix in a given cluster $k$. For a given object, if it is assigned to some cluster $j$, we define the loss of independence $\mathrm{LoI}_j$ as the sum of the $\mathrm{MCov}_k$, that is,

$$\mathrm{LoI}_j = \mathrm{MCov}_1^{(j)} + \mathrm{MCov}_2^{(j)} + \cdots + \mathrm{MCov}_K^{(j)},$$

where $\mathrm{MCov}_k^{(j)}$ is the mean of the absolute values of the non-diagonal-block entries of the correlation/covariance matrix of cluster $k$ after the object has been assigned to cluster $j$. After tentatively assigning the object to each of the K clusters, we obtain $\mathrm{LoI}_j$, $j = 1, \ldots, K$. The smaller the value of $\mathrm{LoI}_j$, the more independent the observed variables for objects within cluster $j$ are. We then take the minimum $\mathrm{LoI}_j$ as the “minimum loss of independence” and assign the object to the cluster corresponding to the minimum loss of independence. Figure 1 displays the k-means algorithm procedure.
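The quantity $\mathrm{MCov}_k$ can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `mcov` and the argument `item_of_col` (the item index of each dummy column) are hypothetical, and the correlation option uses the ordinary Pearson correlation from `np.corrcoef`, which coincides with the block-standardized matrix of the text only when every item is binary.

```python
import numpy as np

def mcov(D, item_of_col, use_correlation=True):
    """Mean absolute entry over the non-diagonal blocks B_mq (m != q).

    D           : (n_k, n_dummies) dummy matrix of the objects in one cluster
    item_of_col : item index m of each dummy column
    Note: a constant column would give NaN correlations; this sketch does
    not handle that case.
    """
    if use_correlation:
        S = np.corrcoef(D, rowvar=False)
    else:
        S = np.cov(D, rowvar=False)
    # True for entries that link dummy variables of two different items
    off_block = item_of_col[:, None] != item_of_col[None, :]
    return float(np.abs(S[off_block]).mean())

items = np.array([0, 1])  # two binary items, one dummy column each
# Hypothetical cluster in which the two items always agree
D_dependent = np.array([[1., 1.], [0., 0.], [1., 1.], [0., 0.]])
# Hypothetical cluster in which the two items are unrelated
D_independent = np.array([[1., 1.], [1., 0.], [0., 1.], [0., 0.]])
m_dep = mcov(D_dependent, items)
m_ind = mcov(D_independent, items)
```

A cluster whose items are strongly dependent yields a large `mcov`, while a cluster that satisfies local independence yields a value near zero.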

Figure 1: An example of the k-means algorithm procedure.

Step 1: Partition 9 objects into 3 initial clusters.

Step 2: Decide which cluster each object is assigned to. Object 1 is tentatively assigned to clusters 1, 2 and 3 in turn, which gives

$$\mathrm{LoI}_j = \mathrm{MCov}_1^{(j)} + \mathrm{MCov}_2^{(j)} + \mathrm{MCov}_3^{(j)}, \quad j = 1, 2, 3.$$

Object 1 is then assigned to the cluster attaining the “minimum loss of independence”.

Step 3: Repeat Step 2 until no more reassignments take place.
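The reassignment loop of Steps 2 and 3 can be sketched as below. This is a hedged illustration, not the thesis implementation: the initial partition is passed in rather than built from Euclidean k-means, the covariance measure is used, and the `min_size` guard, the tie rule and the simulated toy data are assumptions made here so that the sketch is well defined and terminates.

```python
import numpy as np

def mcov(D, item_of_col):
    """Mean |covariance| over entries linking different items."""
    S = np.cov(D, rowvar=False)
    off = item_of_col[:, None] != item_of_col[None, :]
    return float(np.abs(S[off]).mean())

def loi(D, item_of_col, labels, K):
    """Loss of independence: sum of MCov_k over the K clusters."""
    return sum(mcov(D[labels == k], item_of_col) for k in range(K))

def loi_kmeans(D, item_of_col, K, labels, min_size=3, max_sweeps=50):
    """Move objects one at a time to the cluster giving the smallest LoI."""
    labels = labels.copy()
    for _ in range(max_sweeps):
        moved = False
        for i in range(D.shape[0]):
            current = labels[i]
            # Assumption: never shrink a cluster below min_size (>= 2),
            # so every covariance matrix stays computable.
            if np.sum(labels == current) <= min_size:
                continue
            scores = []
            for j in range(K):
                labels[i] = j                      # tentative assignment
                scores.append(loi(D, item_of_col, labels, K))
            best = int(np.argmin(scores))
            if scores[current] <= scores[best]:    # stay put on ties
                best = current
            labels[i] = best
            moved = moved or best != current
        if not moved:                              # Step 3: no reassignment
            break
    return labels

# Hypothetical data: two latent classes, four binary items that are
# independent within each class.
rng = np.random.default_rng(0)
truth = np.repeat([0, 1], 20)
prob = np.where(truth[:, None] == 0, 0.9, 0.1)
D = (rng.random((40, 4)) < prob).astype(float)
items = np.arange(4)
start = np.tile([0, 1], 20)                        # arbitrary initial partition
final = loi_kmeans(D, items, K=2, labels=start)
```

Because an object is moved only when the move strictly lowers the total loss of independence, the value of `loi` for `final` can never exceed that of `start`.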

Divisive hierarchical clustering algorithm:

1. Start with a single cluster containing all objects.

2. To divide the preliminary cluster, apply the k-means approach above to obtain two smaller clusters.

3. Divide the one current cluster whose division attains the ‘minimum loss of independence’.

4. Repeat Step 3 until no more divisions take place.

Here we describe Step 3 in detail. Given the current K clusters, which cluster should be divided first? We divide the cluster for which the minimum loss of independence is reached. Suppose a given cluster $j$ is divided into two smaller clusters, U and V. We define the loss of independence $\mathrm{LoI}_j$ as the sum of the $\mathrm{MCov}_k$ (defined in the k-means algorithm) over all resulting clusters:

$$\mathrm{LoI}_j = \mathrm{MCov}_1 + \cdots + \mathrm{MCov}_{j-1} + \mathrm{MCov}_U + \mathrm{MCov}_V + \mathrm{MCov}_{j+1} + \cdots + \mathrm{MCov}_K.$$

The smaller the value of $\mathrm{LoI}_j$ is, the more independent the observed variables for objects within clusters U and V are. So, we take the minimum $\mathrm{LoI}_j$ as the “minimum loss of independence” and divide the cluster $j$ whose division results in the minimum loss of independence. An example of the divisive hierarchical algorithm procedure can be found in Figure 2.
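The choice of the cluster to divide can be sketched in a few lines of Python. The function names and the numerical values below are hypothetical; the sketch assumes that each current cluster has already been split tentatively by k-means and that the resulting $\mathrm{MCov}_U$ and $\mathrm{MCov}_V$ are available.

```python
def loi_after_split(mcov, j, mcov_u, mcov_v):
    """LoI_j: replace MCov_j by MCov_U + MCov_V and sum over all clusters."""
    return sum(mcov[:j]) + mcov_u + mcov_v + sum(mcov[j + 1:])

def choose_cluster_to_split(mcov, splits):
    """splits[j] = (MCov_U, MCov_V) from a tentative 2-cluster split of cluster j."""
    lois = [loi_after_split(mcov, j, u, v) for j, (u, v) in enumerate(splits)]
    best = min(range(len(lois)), key=lois.__getitem__)
    return best, lois

# Hypothetical values for two current clusters (indices 0 and 1).
mcov = [0.30, 0.10]
splits = [(0.05, 0.08), (0.04, 0.03)]
best, lois = choose_cluster_to_split(mcov, splits)
```

In this toy case dividing the first cluster gives a total of about 0.23 and dividing the second gives about 0.37, so the first cluster is divided.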

Figure 2: An example of the divisive hierarchical algorithm procedure.

Step 1: Start with a single cluster which consists of all objects (8 objects in this example).

Step 2: Use the k-means approach to divide the initial cluster into two smaller clusters, Cluster 1 and Cluster 2.

Step 3: Decide which cluster will be divided first. Consider the divisions of all current clusters. Dividing cluster 1 by k-means into clusters U and V gives $\mathrm{LoI}_1 = \mathrm{MCov}_U + \mathrm{MCov}_V + \mathrm{MCov}_2$; dividing cluster 2 by k-means into clusters U and V gives $\mathrm{LoI}_2 = \mathrm{MCov}_1 + \mathrm{MCov}_U + \mathrm{MCov}_V$. Divide the cluster whose division results in the “minimum loss of independence”.

Step 4: Repeat Step 3 until no more divisions take place.

The results of divisive hierarchical clustering method can be displayed as a dendrogram. The vertical axis gives the values of one minus minimum loss of independence at which the division occurs.

4.2 Latent class membership estimations for RLCA

The k-means and divisive hierarchical methods we proposed also work for model (3.2) after eliminating the covariate effects (Huang, 2005), that is, after “marginalizing” model (3.2).

The key to marginalizing over $z_i$ is that the process must yield random variables that follow a finite mixture distribution that is both independent of $z_i$ and has $J$ mixing components. One strategy for achieving such marginalization can be motivated by the properties of added variable plots for linear regression models. The conditional probabilities (3.4) can be viewed as fitting a logistic regression of $Y_{im}$ on $S_i$ adjusting for $z_{im}$, separately for each $m$. Then the problem becomes how to calculate residuals from the generalized linear model

$$\operatorname{logit}\left[E\left(Y_{im}^{p} \mid S_i, Z_{im}^{c}\right)\right] = \left(Z_{im}^{c}\right)^{T}\alpha_m^{*} \quad \text{for } i = 1, \ldots, N;\ m = 1, \ldots, M, \tag{4.3}$$

where “p” denotes polytomous responses; $Y_{im}^{p} = \left[Y_{im1}, \ldots, Y_{im(K_m-1)}\right]^{T}$ and $Y_{imk} = I(Y_{im} = k)$; $Z_{im}^{c} = \left[(z_{im1} - \bar{z}_{m1}), \ldots, (z_{imL} - \bar{z}_{mL})\right]^{T}$ is the “centered” covariate vector; and $\bar{z}_{mp} = (1/N)\sum_{i=1}^{N} z_{imp}$.

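Under polytomous item responses, the pseudo-residual of the $i$th participant’s $m$th response item is

$$R_{im}^{p} = \left(\hat{V}_{im}^{p}\right)^{-1}\left(Y_{im}^{p} - \hat{\mu}_{im}^{p}\right), \tag{4.4}$$

where “hat” denotes the estimated values; $R_{im}^{p} = \left[R_{im1}, \ldots, R_{im(K_m-1)}\right]^{T}$; $V_{im}^{p} = \mathrm{Var}\left(Y_{im}^{p}\right)$; $\mu_{im}^{p} = E\left(Y_{im}^{p} \mid Z_{im}^{c}\right)$; and $i = 1, \ldots, N$; $m = 1, \ldots, M$; $k = 1, \ldots, K_m - 1$.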
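A minimal numerical sketch of the pseudo-residual (4.4) for one participant and one item is given below. It assumes that the fitted mean vector $\hat{\mu}_{im}^{p}$ is already available from model (4.3), and it takes $V_{im}^{p}$ to be the usual multinomial variance $\mathrm{diag}(\mu) - \mu\mu^{T}$ of an indicator vector, which the text only writes as $\mathrm{Var}(Y_{im}^{p})$. The function name and the numbers are hypothetical.

```python
import numpy as np

def pseudo_residual(y_dummy, mu):
    """R = V^{-1} (Y - mu), with V = diag(mu) - mu mu^T (multinomial variance).

    y_dummy : indicator vector (Y_im1, ..., Y_im(K_m-1))
    mu      : fitted probabilities of levels 1..K_m-1 (reference level omitted)
    """
    V = np.diag(mu) - np.outer(mu, mu)
    return np.linalg.solve(V, y_dummy - mu)

# Hypothetical 3-level item: fitted probabilities 0.5, 0.3 (and 0.2 for the
# reference level); the participant responded with level 1.
mu_hat = np.array([0.5, 0.3])
y = np.array([1.0, 0.0])
R = pseudo_residual(y, mu_hat)
```

For these values the residual is approximately (2, 0). Solving the linear system with `np.linalg.solve` avoids forming the inverse of $V$ explicitly.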

When eliminating $z_i$, we have the nice property that the covariates associated with class prevalences, $x_i$, can be ignored, and under the assumption that $x_i$ and $z_{im}$ are independent, we can treat the residuals from model (4.3) as new response variables. Details of the marginalization process can be found in Huang (2005) and in section 2.3 of this thesis. We can classify objects based on the new response variables $R_{im}^{p}$ in place of the original set of categorical items. The methods used to classify objects are the same as the k-means and divisive hierarchical clustering algorithms in section 4.1, except that the covariance matrix $\mathrm{Cov}(\tilde{Y}_i)$ in (4.1) is evaluated as

$$\frac{1}{n-1}\left[R^{T}\left(I - \frac{1}{n}\mathbf{1}\mathbf{1}^{T}\right)R\right],$$

where $R$ is the residual matrix of the $n$ objects.
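The covariance expression for the residual matrix can be checked numerically with the short sketch below. The residual matrix here is simulated and purely hypothetical; the point is only that the centering-matrix form is the ordinary sample covariance with divisor $n - 1$.

```python
import numpy as np

def residual_covariance(R):
    """(1 / (n - 1)) * R^T (I - (1/n) 1 1^T) R for an (n, d) residual matrix."""
    n = R.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n   # I - (1/n) 1 1^T
    return R.T @ centering @ R / (n - 1)

rng = np.random.default_rng(1)
R = rng.normal(size=(8, 3))    # hypothetical residual matrix, n = 8 objects
S = residual_covariance(R)
```

The result agrees with `np.cov(R, rowvar=False)`, so in practice the library routine can be used instead of building the $n \times n$ centering matrix.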

4.3 Parameter estimation by viewing estimated latent class as known variable

When the k-means and divisive hierarchical methods are used to estimate the latent class membership, we denote the estimated latent class of individual $i$ by $\hat{S}_i$. Replacing $S_i$ by $\hat{S}_i$ in (3.3) and (3.4) gives

$$\log\left[\frac{\Pr(\hat{S}_i = j)}{\Pr(\hat{S}_i = J)}\right] = \beta_{0j} + \beta_{1j} x_{i1} + \cdots + \beta_{Pj} x_{iP} \tag{4.5}$$

and

$$\log\left[\frac{\Pr(Y_{im} = k \mid \hat{S}_i, z_{im})}{\Pr(Y_{im} = K_m \mid \hat{S}_i, z_{im})}\right] = \gamma^{*}_{0mk} + \gamma^{*}_{1mk}\hat{S}_{i1} + \cdots + \gamma^{*}_{(J-1)mk}\hat{S}_{i(J-1)} + \alpha_{1mk} z_{im1} + \cdots + \alpha_{Lmk} z_{imL}, \tag{4.6}$$

for $i = 1, \ldots, N$; $m = 1, \ldots, M$; $k = 1, \ldots, (K_m - 1)$; $j = 1, \ldots, (J - 1)$, where $\hat{S}_{ij} = I(\hat{S}_i = j)$, $\gamma^{*}_{0mk} = \gamma_{mkJ}$ and $\gamma^{*}_{jmk} = \gamma_{mkj} - \gamma_{mkJ}$, with $\gamma_{mkj}$ as in (3.4).

It is then easy to estimate the parameters in (3.2) using the multinomial logistic regressions (4.5) and (4.6).
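As a small illustration of how the estimated classes enter regression (4.6), the sketch below builds the covariate matrix made of an intercept, the indicators $\hat{S}_{ij} = I(\hat{S}_i = j)$ for $j = 1, \ldots, J-1$ (class $J$ being the reference) and the covariates $z_{im}$. The function name and the toy values are hypothetical, and the fitting step itself (any multinomial logistic regression routine) is deliberately left out.

```python
import numpy as np

def design_matrix(s_hat, z, J):
    """Columns: intercept, S_hat_i1 .. S_hat_i(J-1), z_im1 .. z_imL."""
    # One indicator column per non-reference class j = 1..J-1
    indicators = (s_hat[:, None] == np.arange(1, J)[None, :]).astype(float)
    return np.column_stack([np.ones(len(s_hat)), indicators, z])

# Hypothetical estimated classes (J = 3) and two covariates (sex, age).
s_hat = np.array([1, 3, 2, 3])
z = np.array([[0.0, 25.0], [1.0, 40.0], [1.0, 31.0], [0.0, 52.0]])
X = design_matrix(s_hat, z, J=3)
```

An individual in the reference class $J$ has all indicators equal to zero, so the intercept $\gamma^{*}_{0mk}$ plays the role of $\gamma_{mkJ}$, in line with the reparameterization above.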


5. Simulation study

The simulation study aims to evaluate the performance of the proposed approach.

5.1 Generated data from RLCA model

Three different RLCA models are simulated in our simulation study. The first was a three-class RLCA with five two-level measured indicators, two covariates associated with conditional probabilities, and two covariates associated with latent prevalences (i.e., $J = 3$, $M = 5$, $K_1 = \cdots = K_5 = 2$, $P = L = 2$). The second was a six-class RLCA with five three-level measured indicators and otherwise the same setting as the three-class model (i.e., $J = 6$, $M = 5$, $K_1 = \cdots = K_5 = 3$, $P = L = 2$). The last was a two-class RLCA with five three-level measured indicators, two covariates associated with conditional probabilities and two covariates associated with latent prevalences (i.e., $J = 2$, $M = 5$, $K_1 = \cdots = K_5 = 3$, $P = L = 2$). For each model, the model parameters $\{\beta_{pj},\ j = 1, \ldots, J-1\}$ for each $p \in \{0, 1, \ldots, P\}$, $\{\gamma_{jmk},\ j = 1, \ldots, J\}$ for all $m$, $k$, and $\{\alpha_{qmk},\ m = 1, \ldots, M;\ k = 1, \ldots, (K_m - 1)\}$ for all $q$, were given. Tables 1 to 6 show the values of the parameters for the three models.

The covariates of the three-class model were obtained from the subjects who joined the Multidimensional Psychopathological Study on Schizophrenia (MPSS) or the Study on Etiological Factors of Schizophrenia (SEFOS). The covariates of the six-class and two-class models were obtained from the subjects who joined the Multidimensional Psychopathology Group Research Projects (MPGRP), MPSS or SEFOS. In all three models, the covariates associated with conditional probabilities are sex and age (years), and the covariates associated with latent prevalences are occupation (with versus without occupation) and dprime, which is the sensitivity index of the Continuous Performance Task (CPT; Rosvold et al., 1956) performance.

We fit each model under several different sample sizes. For the three-class RLCA, the selected sample sizes were 100 and 500, which gave roughly 3 and 16 individuals per parameter of RLCA (3.2), respectively. For the six-class RLCA, the selected sample sizes were 300 and 1000, which gave roughly 3 and 10 individuals per parameter, respectively. For the two-class RLCA, the selected sample sizes were 150 and 700, which gave roughly 3 and 16 individuals per parameter, respectively. The observable measurements $Y_i$ were then generated from each model structure with 100 replications.
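The “individuals per parameter” figures can be reproduced with a short count. The sketch assumes that the free parameters are exactly the sets listed above, namely $(P+1)(J-1)$ values of $\beta$, $J\sum_m (K_m - 1)$ values of $\gamma$ and $L\sum_m (K_m - 1)$ values of $\alpha$; the function and variable names are hypothetical.

```python
def n_parameters(J, levels, P, L):
    """Number of free RLCA parameters under the indexing given in the text."""
    dummies = sum(K - 1 for K in levels)   # sum over items of (K_m - 1)
    n_beta = (P + 1) * (J - 1)             # beta_pj, p = 0..P, j = 1..J-1
    n_gamma = J * dummies                  # gamma_jmk, j = 1..J
    n_alpha = L * dummies                  # alpha_qmk, q = 1..L
    return n_beta + n_gamma + n_alpha

designs = {
    "3-class": (n_parameters(3, [2] * 5, 2, 2), [100, 500]),
    "6-class": (n_parameters(6, [3] * 5, 2, 2), [300, 1000]),
    "2-class": (n_parameters(2, [3] * 5, 2, 2), [150, 700]),
}
ratios = {name: [n / count for n in sizes]
          for name, (count, sizes) in designs.items()}
```

Under this count the three models have 31, 95 and 43 parameters, giving ratios of about 3.2 and 16.1, 3.2 and 10.5, and 3.5 and 16.3, which matches the rounded figures quoted above.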

5.2 Simulation results

In each case, the results of the simulation study are presented in six tables, which report the average parameter estimates, average conditional probabilities, average latent prevalences, average correlation coefficients, and average match proportions over the 100 replications. We shall explain these results later. The simulation results are presented in Table 7 to Table 12 for the 3-class model with sample size 100, in Table 13 to Table 18 for the 3-class model with sample size 500, in Table 19 to Table 24 for the 6-class model with sample size 300, in Table 25 to Table 30 for the 6-class model with sample size 1000, in Table 31 to Table 36 for the 2-class model with sample size 150, and in Table 37 to Table 42 for the 2-class model with sample size 700. According to Table 7 to Table 42, the results of the k-means and divisive hierarchical methods using the correlation coefficient measure are similar to those using the covariance measure. We therefore omit the results based on the covariance measure from the following discussion.

First we discuss the simulation results for the 3-class model with sample size 500, which are presented in Table 13 to Table 18.

Average parameter estimates

Table 12 and Table 13, under the column “TRUE”, include all $\{\beta_{pj}, \gamma_{jmk}, \alpha_{qmk}\}$ used to simulate the data. The averages of the $\{\beta_{pj}, \gamma_{jmk}, \alpha_{qmk}\}$ estimates obtained from the k-means method using the correlation coefficient measure (K_Corr) and the covariance measure (K_Cova), and from the divisive hierarchical method using the correlation coefficient measure (D_Corr) and the covariance measure (D_Cova), also appear in Table 12 and Table 13. These tables show that the parameter estimates obtained from the k-means method are close to the true parameters, whereas the parameter estimates obtained from the divisive hierarchical procedure are poor. Furthermore, the divisive hierarchical procedure is sensitive to cluster structure: it can be expected to perform well only when there is a clear cluster structure.

Table 12 and Table 13 also include the standard errors of the parameter estimates obtained from the multinomial regressions (4.5) and (4.6), and the sample standard errors of the parameter estimates over the 100 replications. The sample standard errors over the 100 replications reflect both the variation from the multinomial regression and the variation from creating the cluster membership. Because we use the multinomial regression to estimate the parameters under the assumption of known cluster membership, the standard errors reported by the multinomial regression do not include the variation from creating the cluster membership. Therefore, the standard errors from the multinomial regressions should be smaller than the sample standard errors over the 100 replications. This is demonstrated in Table 12 and Table 13. However, it is not demonstrated in Table 7 and Table 8 for the 3-class model with sample size 100, which has few individuals per parameter. For such sparse data, the estimated standard errors from the multinomial regressions are not accurate, so they are not always smaller than the sample standard errors over the 100 replications.

Average Conditional Probabilities

Table 14 under the column “TRUE” displays the RLCA conditional probabilities evaluated at the sample means of the incorporated covariates:

$$p_{mkj} = \frac{\exp\left(\gamma_{mkj} + \alpha_{mk}^{T}\bar{z}_m\right)}{1 + \sum_{l=1}^{K_m-1}\exp\left(\gamma_{mlj} + \alpha_{ml}^{T}\bar{z}_m\right)},\quad k = 1, \ldots, K_m - 1, \qquad p_{mK_mj} = \frac{1}{1 + \sum_{l=1}^{K_m-1}\exp\left(\gamma_{mlj} + \alpha_{ml}^{T}\bar{z}_m\right)},$$

where $\bar{z}_m = \frac{1}{N}\sum_{i=1}^{N} z_{im}$.

The averages of the estimated conditional probabilities over the 100 replications for the k-means and divisive hierarchical methods appear in Table 14. The estimated conditional probabilities for the k-means and divisive hierarchical methods are

$$\hat{p}_{mkj} = \frac{\text{number of individuals in class } j \text{ at level } k \text{ of } Y_{im}}{\text{number of individuals in class } j}.$$

Overall, the average conditional probabilities for the k-means method are closer to the true conditional probabilities than those for the divisive hierarchical method.
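Both versions of the conditional probabilities can be sketched in plain Python. The first function evaluates the model-based probabilities for one item and one class at the mean covariate vector; the second computes the empirical proportions within an estimated class. All names and numbers are hypothetical.

```python
import math

def conditional_probs(gamma, alpha, z_bar):
    """Model-based p_mkj for one item m and one class j.

    gamma : [gamma_m1j, ..., gamma_m(K_m-1)j]
    alpha : [alpha_m1, ..., alpha_m(K_m-1)], each a coefficient vector
    z_bar : mean covariate vector
    Returns [p_m1j, ..., p_mK_mj]; the last level is the reference level.
    """
    eta = [g + sum(a_l * z_l for a_l, z_l in zip(a, z_bar))
           for g, a in zip(gamma, alpha)]
    denom = 1.0 + sum(math.exp(e) for e in eta)
    return [math.exp(e) / denom for e in eta] + [1.0 / denom]

def empirical_conditional_probs(y_item, classes, j, K):
    """Proportion of individuals in class j at each level k = 1..K of an item."""
    members = [y for y, s in zip(y_item, classes) if s == j]
    return [sum(y == k for y in members) / len(members) for k in range(1, K + 1)]

# Hypothetical 3-level item with no covariate effect.
p_true = conditional_probs([math.log(2.0), 0.0], [[0.0, 0.0], [0.0, 0.0]], [0.5, 30.0])
# Hypothetical responses and estimated classes of five individuals.
p_hat = empirical_conditional_probs([1, 2, 2, 3, 1], [1, 1, 1, 2, 2], j=1, K=3)
```

With these inputs the model-based probabilities are 0.5, 0.25 and 0.25, and the empirical proportions in class 1 are 1/3, 2/3 and 0.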

Average Latent Prevalence:


prevalences:

$$\eta_j^{*} = \frac{1}{N}\sum_{i=1}^{N}\eta_j\left(x_i^{T}\beta\right).$$

The averages of the estimated prevalences over the 100 replications for the k-means and divisive hierarchical methods are also shown in Table 16. The estimated prevalences are

$$\hat{\eta}_j = \frac{\text{number of individuals in class } j}{\text{total number of individuals in the study}}.$$

Overall, the average latent prevalences for the k-means method are closer to the true prevalences than those for the divisive hierarchical clustering method.
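The two prevalence quantities can be sketched as follows. The sketch assumes that $\eta_j(x_i^{T}\beta)$ is the class probability implied by the baseline-category logit form of (4.5), with class $J$ as the reference; the function names and inputs are hypothetical.

```python
import math

def class_probs(x, beta):
    """eta_j(x^T beta), j = 1..J, with class J as the reference class.

    beta[j] = [beta_0j, beta_1j, ..., beta_Pj] for the classes j = 1..J-1.
    """
    lin = [b[0] + sum(bp * xp for bp, xp in zip(b[1:], x)) for b in beta]
    denom = 1.0 + sum(math.exp(v) for v in lin)
    return [math.exp(v) / denom for v in lin] + [1.0 / denom]

def average_prevalence(X, beta):
    """Average of the class probabilities over all individuals."""
    per_subject = [class_probs(x, beta) for x in X]
    return [sum(p[j] for p in per_subject) / len(X) for j in range(len(beta) + 1)]

def estimated_prevalence(classes, J):
    """Share of individuals assigned to each class j = 1..J."""
    return [sum(s == j for s in classes) / len(classes) for j in range(1, J + 1)]

# Hypothetical inputs: J = 3 classes, P = 2 covariates, all coefficients zero.
X = [[1.0, 0.4], [0.0, 1.2], [1.0, -0.3]]
beta = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
eta_true = average_prevalence(X, beta)
eta_hat = estimated_prevalence([1, 1, 2, 3], J=3)
```

With all coefficients equal to zero every class has prevalence 1/3, while the toy assignment gives estimated prevalences 0.5, 0.25 and 0.25.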

Average Correlation Coefficients

We evaluated the $\mathrm{MCov}_k$ of the objects in the same cluster $k$. Table 16 displays the average of $\mathrm{MCov}_k$ over the 100 replications in each cluster $k$. The k-means approach resulted in smaller average correlation coefficients than the divisive hierarchical method.

Next, for the 6-class model with sample size 1000, we discuss the simulation results presented in Table 22 to Table 26. These tables show that the results of both the k-means procedure and the divisive hierarchical procedure are clearly poorer than those for the 3-class model with sample size 500.

For the 2-class model with sample sizes 150 and 700, we discuss the simulation results presented in Table 27 to Table 36. It is reasonable that the results of the k-means and divisive hierarchical clustering methods for the 2-class RLCA models are the same, since with two classes the divisive procedure consists only of its first k-means split.

When we use maximum likelihood to estimate the parameters in (3.2), the quality of the maximum likelihood estimate (MLE) depends on the number of individuals per parameter. For sparse data with few individuals per parameter, the MLE may not be obtainable or may be a poor estimate. For the three models with few individuals per parameter (the 3-class RLCA with sample size 100, the 6-class RLCA with sample size 300 and the 2-class RLCA with sample size 150), the simulation results are not worse than those for the models with more individuals per parameter. This suggests that our clustering procedure is not sensitive to the number of individuals per parameter.


6. Example

Schizophrenia Data

The present study is composed of three projects, the Multidimensional Psychopathology Group Research Projects (MPGRP), the Multidimensional Psychopathological Study on Schizophrenia (MPSS) and the Study on Etiological Factors of Schizophrenia (SEFOS). The initial project MPGRP investigated the clinical manifestations of schizophrenia in a cohort of schizophrenia patients. The subsequent project MPSS focused on the follow-up neuropsychological evaluation of the MPGRP patients. The project SEFOS aimed to search for neurobiological, environmental and genetic factors underlying schizophrenia. The analyzed data include 169 acute-state patients who had completed the PANSS within one week of index admission and 161 subsided-state patients who were living in the community under family care.

The major instrument applied in this study, used to collect the patients’ symptom measurements, is the PANSS, an assessment of the clinical psychopathological symptoms of schizophrenia. It has 30 items rated on a 7-point scale (1=absent, 7=extreme). The PANSS consists of three subscales: positive (seven symptoms: P1-P7), negative (seven symptoms: N1-N7), and general psychopathology (sixteen symptoms: G1-G16). Because the original 7-point scale is too complex and involves too many parameters to analyze, we reduced the 7-point scale of the PANSS by merging, on each item, the scale levels with percentages less than 5%.

Demographic variables included gender, age, onset-age of psychotic symptoms, years of education, and occupation (having versus no occupation). The category of no occupation included housewives, students, unemployed and retired people.


growth retardation, special personal behavior and psychological adjustment problems. There were three environmental questions: (1) whether the patient had brain injury during the developmental process, such as premature birth, brain damage and retarded intelligence; (2) whether the patient had unstable mood or abnormal behavioral traits that interfered with daily life, including being angry, timid, depressed and inactive; and (3) whether the patient had psychological adjustment problems that interfered with daily life, including a bad relationship between parents, getting along badly with siblings, physical disease and unforeseen family events. All three environmental factors were rated on a 3-point scale, with 0 as no event, 1 as slight with no obvious effect on emotional and behavioral reactions, and 2 as an obvious effect on emotional and behavioral reactions.

The neuropsychological batteries assessed reaction time, attention, speed of information processing, and active problem solving. Specifically, the test batteries included several standard neuropsychological instruments with demonstrated reliability and validity, including the Continuous Performance Test (CPT), Wisconsin Card Sorting Test (WCST), Wechsler Adult Intelligence Scale-Revised (WAIS-R), Wechsler Memory Scale-Revised (WMS-R), and Trail Making Tests A and B (TMT-A and -B). Here we concentrated on CPT.

We fit the RLCA model with 30 7-level measured indicators. The covariates associated with conditional probabilities include sex, age (years), years of education, and occupation (with versus without occupation), and the covariates associated with latent prevalences include age of onset (years), envir11, envir21, envir22, envir31, envir32, and dprime.

We group objects by the k-means and divisive hierarchical approaches. The analysis reported here aims to describe the associations between risk factors and the underlying latent classes, and to examine the composition of patient subtypes across different disease states.

Here, we introduce a useful tool for clustering, the heatmap, which rearranges the columns and rows to show structure in the data. A heatmap is a two-dimensional, rectangular, colored grid. It displays data that themselves come in the form of a rectangular matrix. The color of each rectangle is determined by the value of the corresponding entry in the matrix. The rows and columns of the matrix can be rearranged independently. Usually they are reordered so that similar rows are placed next to each other, and the same for columns. Among the orderings that are widely used are those derived from a hierarchical clustering, but many other orderings are possible. If hierarchical clustering is used, then it is customary that the dendrograms are provided as well. In many cases the resulting image has rectangular regions that are relatively homogeneous and hence the graphic can aid in determining which rows (generally the genes) have similar expression values within which subgroups of samples (generally the columns).

Results for patients at the acute state by divisive hierarchical clustering method

The heatmap for patients at the acute state is shown in Figure 3. The column dendrogram comes from the agglomerative hierarchical clustering method, with one minus correlation as the distance measure, and the row dendrogram comes from our divisive hierarchical clustering, with one minus loss of independence as the distance measure. The color of each cell represents the extent of induction or repression of a given gene.

Although the heatmap did not display the class structure clearly, we can use the dendrogram of divisive hierarchical method at the left to group objects into four classes.

Table (37) contains the scores (mean ± standard error) of 30 items (or 5 factors) in each class, from which we can characterize the four classes as follows. Class 1 has lower scores

