
Latent Seriation Method for Cluster or Longitudinal Data


Academic year: 2022


Guo-Hua Huang

May 3, 2005

Abstract

Based on cluster or longitudinal measurements, our research interest focuses on seriating the latent degree of health status or functioning of the collected subjects. To this end, a latent variable is used to represent the unobserved seriation. Several appropriate and widely used joint models of the latent variable and the cluster measurements are proposed in order to find the most probable value of the latent variable, which is then used in the subsequent seriation. Because a latent variable is involved, an EM algorithm is suggested for parameter estimation. To evaluate the seriation procedure, a “leave one subject out” criterion is proposed for computing the correlation or concordance index. A Monte Carlo simulation is implemented to examine the performance of the proposed approaches. Finally, an application of our seriation procedure to a CD4 depletion study is presented in the thesis.

1 Introduction

In this study, cluster or longitudinal data

$$\{((X_{i1}, Y_{i1}), \ldots, (X_{im_i}, Y_{im_i})) : i = 1, 2, \ldots, n\}$$

are considered, where the $Y_{ij}$'s are the response measurements, the $X_{ij}$'s are the corresponding covariates, and $m_i$ is the cluster size or the number of repeated measurements of the $i$th subject.

In the longitudinal data setting, $(X_{ij}, Y_{ij})$ denotes the measurements at time point $t_{ij}$, $i = 1, \ldots, n$, $j = 1, \ldots, m_i$. This type of data occurs frequently in biomedical and epidemiological studies and has been widely discussed. Here, the research interest is in seriating the latent degree of health status or functioning of the collected subjects. For this problem, we use a latent variable $U$ to represent the unobserved health status or functioning of each subject. Moreover, the most probable value of $U$ is used in our seriation procedure.

KEY WORDS: cluster data, latent variable, longitudinal data, seriation, EM algorithm.

Recent research on finding latent mobility disability classes of older disabled women can be traced back to the work of Larsen (2004). The author proposed the joint modeling of time-to-event and multiple binary responses with a nominal latent variable $U$ to classify the disability classes. Related work with pre-determined classes can be found in Lin, McCulloch, Turnbull, Slate, and Clark (2000). Their main concern is to find the true prostate cancer status of each patient by using prostate specific antigen (PSA), which is treated as a biomarker of prostate cancer, in conjunction with a latent class mixed model. These methods do not fully expose the true ordering or degree of the latent classes in the considered models and, hence, are very restrictive.

To address this drawback, a more appropriate distributional assumption is made on the latent variable $U$ in our approach.

Similar to the method of Larsen (2004), we apply the decomposition principle to the joint probability density function $f(y_{i1}, \ldots, y_{im_i}, u_i \mid x_{i1}, \ldots, x_{im_i})$ of $((Y_{i1}, \ldots, Y_{im_i}), U_i)$ conditional on $(x_{i1}, \ldots, x_{im_i})$: our models are specified through the marginal density $f_m(u_i \mid x_{i1}, \ldots, x_{im_i})$ of $U_i$ and the conditional density $f_c(y_{i1}, \ldots, y_{im_i} \mid x_{i1}, \ldots, x_{im_i}, u_i)$ of $(Y_{i1}, \ldots, Y_{im_i})$. In addition, the $Y_{ij}$'s are assumed to be mutually independent conditional on $((x_{i1}, \ldots, x_{im_i}), u_i)$, i.e., $f_c(y_{i1}, \ldots, y_{im_i} \mid x_{i1}, \ldots, x_{im_i}, u_i) = \prod_{j=1}^{m_i} f_c(y_{ij} \mid x_{ij}, u_i)$. Through the considered models, we take the most probable value of $U_i$, say $\hat{U}_i$, in the joint probability density function for the subsequent seriation, which can be achieved when the densities $f_m(u_i \mid x_{i1}, \ldots, x_{im_i})$ and $f_c(y_{ij} \mid x_{ij}, u_i)$ are appropriately specified. When the density of $Y_{ij}$ conditional on $(x_{ij}, u_i)$ is set to be a widely used generalized linear model, we derive that $f_c(y_{ij} \mid x_{ij}, u_i)$ can be unimodal in $u_i$ only under the specification of the natural link function. Based on the maximizers $\hat{U}_i$, the relative latent degree of the $n$ subjects can be arranged via the ordering of these values. If the research interest is in classifying the subjects into an appropriate number of ordered classes, a hierarchical clustering method can be applied with a distance matrix defined on the values of the $\hat{U}_i$'s. To compute a seriation index of our seriation procedure, such as the rank correlation or the concordance index, a “leave one subject out” criterion is proposed in the study.

The rest of this thesis is organized as follows. In section 2, the proposed models and their properties are introduced, and an expectation-maximization (EM) algorithm is stated for the estimation of the parameters in the considered models. The seriation procedure and the evaluation criterion are proposed in section 3. In section 4, a Monte Carlo simulation is implemented to investigate the proposed approach. Moreover, our seriation procedure is applied to an empirical example from the Multicenter AIDS Cohort Study (MACS) for illustration. Finally, we provide a brief discussion of possible extensions of our method in section 5.

2 Joint Model and Estimation

In this section, some widely used latent and generalized linear models are considered for the joint probability density function (p.d.f.). For the purpose of seriation, we seek conditions under which the joint p.d.f. is unimodal in the latent value; these conditions are used in the subsequent seriation procedure. Under the proposed joint latent model, an EM algorithm is suggested for the estimation problem.

2.1 Modelling

Let $Y_i = (Y_{i1}, \ldots, Y_{im_i})$ and $X_i = (X_{i1}, \ldots, X_{im_i})$, with the $X_{ij}$'s being $p \times 1$ covariate vectors of the $i$th subject. In this study, we aim at modelling the joint latent p.d.f. $f(y_i, u_i \mid x_i)$ to achieve the seriation of the collected subjects. Here, the latent variable $U$ represents the true degree or level and accounts for the conditional independence mechanism among the cluster or longitudinal measurements. In biomedical, epidemiological, and longitudinal studies, a popular way to model the joint latent p.d.f. is to make distributional assumptions on the marginal and conditional p.d.f.'s. In addition, conditional on $(u_i, x_i)$, the conditional p.d.f. $f_c(y_i \mid u_i, x_i)$ is further factorized as $f_c(y_i \mid u_i, x_i) = \prod_{j=1}^{m_i} f_c(y_{ij} \mid u_i, x_{ij})$. For modelling the marginal distribution of the latent variable, commonly used distributions such as the Gaussian, Gamma, logistic, extreme-minimum-value, and extreme-maximum-value distributions are often assumed. These distributions all belong to the class of unimodal distributions. As for the conditional p.d.f. $f_c(y_{ij} \mid u_i, x_{ij})$, a widely used generalized linear model (GLM)

$$f_c(y_{ij} \mid u_i, x_{ij}) = \exp\left(\frac{y_{ij}\theta_{ij} - b_j(\theta_{ij})}{a(\phi)} + c(y_{ij}; \phi)\right) \qquad (1)$$

is considered in this article, where $\phi$ is called the scale parameter (or the dispersion parameter) and $\theta_{ij}$ is called the natural parameter, with $E[y_{ij} \mid u_i, x_{ij}] = h_j(x_{ij}^T\beta_j + u_i\gamma_j)$ for a known link function $h_j(\cdot)$. In the following theorems, we show that the joint p.d.f. $f(y_i, u_i \mid x_i)$ is unimodal in $u_i$ when the latent model $f_m(u_i \mid x_i)$ is unimodal in $u_i$ and the $f_c(y_{ij} \mid u_i, x_{ij})$'s are GLMs with the natural link function, i.e., $\theta_{ij} = x_{ij}^T\beta_j + u_i\gamma_j$. The random effects models considered here can be traced back to the works of Breslow and Clayton (1993), Breslow and Lin (1995), and Lin and Breslow (1996), among others.
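As a concrete instance of the GLM form above (an illustrative example chosen here, not one taken from the text), a Bernoulli response with success probability $\mu_{ij}$ fits that form with $a(\phi) = 1$ and $c(y_{ij}; \phi) = 0$. Under the natural (logit) link the natural parameter is linear in $u_i$:

```latex
\begin{aligned}
f_c(y_{ij} \mid u_i, x_{ij})
  &= \mu_{ij}^{\,y_{ij}} (1 - \mu_{ij})^{1 - y_{ij}}
   = \exp\{\, y_{ij}\theta_{ij} - b(\theta_{ij}) \,\}, \\
\theta_{ij} &= \log\frac{\mu_{ij}}{1 - \mu_{ij}}, \qquad
b(\theta_{ij}) = \log\bigl(1 + e^{\theta_{ij}}\bigr), \\
\theta_{ij} &= x_{ij}^T\beta_j + u_i\gamma_j
  \;\Longrightarrow\;
  \mu_{ij} = b'(\theta_{ij}) = \frac{e^{\theta_{ij}}}{1 + e^{\theta_{ij}}}.
\end{aligned}
```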

Generally, the marginal p.d.f. $f_m(u_i \mid x_i)$ is assumed to be a unimodal function of $u_i$, and the conditional p.d.f.'s $f_c(y_{ij} \mid u_i, x_{ij})$ are taken to be GLMs. If each GLM has the natural link function and some regularity conditions hold, the joint p.d.f. $f(y_i, u_i \mid x_i)$ is unimodal in $u_i$ for each subject. With the advantage of unimodality, we can derive a unique seriation index for the subsequent classification. Theorem 2.1 gives conditions under which the joint p.d.f. is unimodal. Theorem 2.2 adds a compact support assumption on $u_i$, which yields a result with wider applicability than Theorem 2.1.

Theorem 2.1. Assume that $f_m(u_i \mid x_i)$ is a unimodal function of $u_i$ and that the $f_c(y_{ij} \mid u_i, x_{ij})$'s are GLMs of the form (1) with $\theta_{ij} = \eta_{ij} = x_{ij}^T\beta_j + u_i\gamma_j$. Then $f(y_i, u_i \mid x_i)$ has either a unique maximizer or no maximizer with respect to $u_i$.

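Proof. Our goal is to find the mode $\hat{u}_i$ of the joint p.d.f. $f(y_i, u_i \mid x_i)$ with respect to $u_i$. Let $l_i = \log f(y_i, u_i \mid x_i)$, $\mu_{ij} = E[y_{ij} \mid u_i, x_{ij}]$, and $\eta_{ij} = x_{ij}^T\beta_j + u_i\gamma_j$. Setting the first partial derivative of $l_i$ with respect to $u_i$ equal to 0 gives a candidate maximizer or minimizer $\hat{u}_i$, and the sign of the second partial derivative determines whether this solution is a maximizer or a minimizer.

The second partial derivative is

$$
\begin{aligned}
\frac{\partial^2 l_i}{\partial u_i^2}
&= \frac{\partial^2 \log f_m(u_i \mid x_i)}{\partial u_i^2}
 + \sum_{j=1}^{m_i} \frac{\gamma_j}{a(\phi)} \left[ -\frac{\partial \mu_{ij}}{\partial \theta_{ij}} \frac{\partial \theta_{ij}}{\partial \eta_{ij}} \frac{\partial \eta_{ij}}{\partial u_i} \frac{\partial \theta_{ij}}{\partial \eta_{ij}} + (y_{ij} - \mu_{ij}) \frac{\partial}{\partial u_i}\left( \frac{\partial \theta_{ij}}{\partial \eta_{ij}} \right) \right] \\
&= \frac{\partial^2 \log f_m(u_i \mid x_i)}{\partial u_i^2}
 + \sum_{j=1}^{m_i} \frac{\gamma_j}{a(\phi)} \left[ -\frac{\gamma_j}{a(\phi)} \mathrm{Var}(y_{ij} \mid u_i, x_{ij}) \left( \frac{\partial \theta_{ij}}{\partial \eta_{ij}} \right)^2 + (y_{ij} - \mu_{ij}) \frac{\partial}{\partial u_i}\left( \frac{\partial \theta_{ij}}{\partial \eta_{ij}} \right) \right].
\end{aligned}
$$

Since the $y_{ij}$'s are random variables, the sign of the term involving $(y_{ij} - \mu_{ij})$ is not determined. To decide whether $\hat{u}_i$ is a maximizer or a minimizer regardless of the observed $y_{ij}$, we reasonably assume that $\frac{\partial}{\partial u_i}\left( \frac{\partial \theta_{ij}}{\partial \eta_{ij}} \right) = 0$. This implies that $\theta_{ij}$ is a linear function of $\eta_{ij}$. In practice, the natural link function $\theta_{ij} = \eta_{ij}$ satisfies this condition. Moreover, adding the condition that $f_m(u_i \mid x_i)$ is a unimodal function of $u_i$ (used here in the sense that $\partial^2 \log f_m(u_i \mid x_i) / \partial u_i^2 \le 0$, as holds for the Gaussian and logistic densities), we see that $\frac{\partial^2 l_i}{\partial u_i^2} < 0$ for any $i$. This tells us that $\frac{\partial l_i}{\partial u_i}$ is a strictly decreasing function of $u_i$, so $l_i$ has at most one maximizer $\hat{u}_i$. For $l_i$ to have a maximizer, the equation $\frac{\partial l_i}{\partial u_i} = 0$ must have a solution. Since a strictly decreasing function may or may not cross 0, the joint p.d.f. $f(y_i, u_i \mid x_i)$ has either a unique maximizer or no maximizer with respect to $u_i$. □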
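The argument in the proof can be checked numerically. The following is a minimal sketch under assumptions that are not in the text: binary responses with the logit (natural) link, $a(\phi) = 1$, and a standard logistic marginal density for $u_i$. The function names `score` and `mode_of_joint` and the argument `eta0` (the fixed part $x_{ij}^T\beta_j$) are hypothetical. Because the score $\partial l_i / \partial u_i$ is strictly decreasing, bisection finds its single root when a sign change exists.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(u, y, eta0, gamma):
    """First derivative of l_i with respect to u_i.

    Assumes a standard logistic marginal density and Bernoulli responses
    with natural parameter theta_ij = eta0_j + u * gamma_j.
    """
    s = 1.0 - 2.0 * sigmoid(u)  # derivative of log f_m(u)
    for y_j, e_j, g_j in zip(y, eta0, gamma):
        s += g_j * (y_j - sigmoid(e_j + g_j * u))
    return s

def mode_of_joint(y, eta0, gamma, lo=-50.0, hi=50.0, tol=1e-10):
    """Unique maximizer of the joint density in u_i, found by bisection."""
    if score(lo, y, eta0, gamma) <= 0 or score(hi, y, eta0, gamma) >= 0:
        raise ValueError("score does not change sign on [lo, hi]: no maximizer found")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, y, eta0, gamma) > 0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies to the left of mid
    return 0.5 * (lo + hi)

# Three positive responses with zero fixed part and unit loadings.
u_hat = mode_of_joint([1, 1, 1], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0])
```

In this small case the score reduces to $4 - 5\,e^{u}/(1 + e^{u})$, so the root can also be obtained by hand as $\log 4$, which gives a simple check on the bisection.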

In some applications, we take the marginal p.d.f. $f_m(u_i \mid x_i)$ to be a normal, gamma, or logistic distribution, etc., and the conditional p.d.f. $f_c(y_i \mid u_i, x_i)$ to be a GLM with the natural link function; we then obtain the maximizer $\hat{u}_i$, which is used to seriate the latent classes in the following section. However, if these common distributions are not used in the modelling and we still wish to retain the unimodal property of the p.d.f., a simple remedy is to restrict the region of the latent variable. This constraint is expressed in Theorem 2.2.

Theorem 2.2. Under the same regularity conditions as in Theorem 2.1, if $u_i$ has a compact support for any $i$, then the maximizer $\hat{u}_i = \arg\max_{u_i} f(y_i, u_i \mid x_i)$ is unique.

Proof. The derivatives $\frac{\partial l_i}{\partial u_i}$ and $\frac{\partial^2 l_i}{\partial u_i^2}$ can be derived as in Theorem 2.1. We see that $\frac{\partial^2 l_i}{\partial u_i^2} < 0$ under the natural link function assumption, so $\frac{\partial l_i}{\partial u_i}$ is a decreasing function of $u_i$. Then there are two situations for $l_i$: either $l_i$ is a unimodal function of $u_i$, or $l_i$ is a monotone (for example, increasing) function of $u_i$. If $u_i$ has compact support, the maximizer $\hat{u}_i$ therefore lies either at the mode or on the boundary of the support. □

2.2 EM-algorithm

Since the latent variable is unobserved, the EM algorithm (Dempster, Laird, and Rubin 1977) is used to maximize the likelihood function of the observed data $\{(x_i, y_i) : i = 1, 2, \ldots, n\}$. The parameters are estimated by iterating between an E-step, in which the expected log-likelihood of the complete data $\{(u_i, x_i, y_i) : i = 1, 2, \ldots, n\}$ is calculated conditional on the observed data and the current estimates of the parameters $(\beta_j, \gamma_j)$, and an M-step, in which new parameter estimates are computed by maximizing the expected log-likelihood function.

Let $\Theta = \{(\beta_j, \gamma_j) : j = 1, \ldots, n\}$. The complete data log-likelihood function is

$$l(u, x, y; \Theta) = \sum_{i=1}^{n} l_i(u_i, x_i, y_i; \Theta),$$

where

$$l_i(u_i, x_i, y_i; \Theta) = \log f_m(u_i \mid x_i) + \log f_c(y_i \mid x_i, u_i).$$

In the E-step,

$$Q(\Theta; \Theta^{(r)}) = E[\, l(U, X, Y; \Theta) \mid X, Y; \Theta^{(r)} \,] = \sum_{i=1}^{n} E[\, l_i(U_i, X_i, Y_i; \Theta) \mid X_i, Y_i; \Theta^{(r)} \,]$$

is calculated, where $\Theta^{(r)}$ is the value of $\Theta$ from the $r$th step.

In the M-step, $Q(\Theta; \Theta^{(r)})$ is maximized as a function of $\Theta$, which gives the maximizer $\Theta^{(r+1)}$. The iterative scheme is repeated until it converges to a value $\hat{\Theta}$. However, the conditional p.d.f. in the E-step usually has no closed form, which leads to heavy computation. A technique called “importance sampling” can be used for this problem and is described below.

Importance Sampling (Casella and Robert 1996): Suppose that $X \sim f$, but the p.d.f. $f$ is difficult to simulate from. Generate $Y_1, Y_2, \ldots, Y_m$ i.i.d. from a known p.d.f. $g$ and, for any function $h$, calculate the estimator

$$\sum_{i=1}^{m} \left( \frac{f(Y_i)/g(Y_i)}{\sum_{j=1}^{m} f(Y_j)/g(Y_j)} \right) h(Y_i).$$

Then the estimator converges in probability to $E[h(X)]$.

In this article, the log-likelihood function $l_i$ and the conditional density $f_{U_i}(u_i \mid y_i, x_i; \Theta)$ play the roles of $h$ and $f$, respectively.
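The self-normalized estimator above can be sketched in a few lines. This is only an illustration under assumptions that are not in the text: the target $f$ is a standard normal density known up to its normalizing constant, the proposal $g$ is a normal density with standard deviation 2, and $h(x) = x^2$, so the estimate should be close to 1. All function names are hypothetical.

```python
import math
import random

def importance_estimate(h, f, g_pdf, g_draw, m, rng):
    """Self-normalized importance sampling estimate of E_f[h(X)].

    f may be unnormalized: the constant cancels in the weight ratio.
    """
    draws = [g_draw(rng) for _ in range(m)]
    weights = [f(y) / g_pdf(y) for y in draws]
    total = sum(weights)
    return sum(w * h(y) for w, y in zip(weights, draws)) / total

def target_unnormalized(x):
    # Standard normal kernel without the 1/sqrt(2*pi) constant (assumed target).
    return math.exp(-0.5 * x * x)

def proposal_pdf(x, scale=2.0):
    # Normal density with standard deviation `scale` (assumed proposal).
    return math.exp(-0.5 * (x / scale) ** 2) / (scale * math.sqrt(2.0 * math.pi))

def proposal_draw(rng, scale=2.0):
    return rng.gauss(0.0, scale)

rng = random.Random(0)
estimate = importance_estimate(
    lambda x: x * x, target_unnormalized, proposal_pdf, proposal_draw, 20000, rng
)
```

A proposal with heavier tails than the target keeps the weights bounded, which is why a wider normal is used in this sketch.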

In addition, we want the parameter estimates to be unique and to converge to the parameters of the observed-data p.d.f. Under some regularity conditions, the unique maximizer $\hat{\Theta}$ of $l(\Theta; u, x, y)$ is guaranteed. The conditions are given in Theorem 2.3.

Theorem 2.3. Suppose that $l(\Theta; u, x, y)$ is unimodal in $u$ with $\hat{\Theta}$ being the only stationary point and that $\partial Q(\Theta; \Theta^{(r)}) / \partial \Theta$ is continuous in $\Theta$ and $\Theta^{(r)}$. Then, for any EM sequence $\{\Theta^{(r)}\}$, $\Theta^{(r)}$ converges to the unique maximizer $\hat{\Theta}$ of $l(\Theta; u, x, y)$.

Proof. See Wu (1983).

Therefore, in our study, the unimodal model assumption in Theorem 2.3 brings the advantage of the convergence property. The uniqueness of the maximizers helps us to seriate the latent degree.
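The E-step and M-step of this section can be illustrated with a deliberately small Monte Carlo EM sketch. It is not the estimation procedure of the thesis. It assumes a single unknown intercept `beta`, binary responses with $P(y_{ij} = 1 \mid u_i) = \mathrm{logit}^{-1}(\beta + u_i)$, a standard logistic marginal for $u_i$, and a fixed set of draws (`nodes`) from that marginal used as the importance sampling proposal, so that the weights reduce to $f_c(y_i \mid u)$. All names are hypothetical.

```python
import math
import random

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def draw_logistic(rng):
    # Inverse-c.d.f. draw from the standard logistic distribution.
    p = rng.random()
    while p <= 0.0:
        p = rng.random()
    return math.log(p / (1.0 - p))

def e_step(beta, data, nodes):
    """Normalized weights approximating f(u | y_i; beta) on the nodes, and the
    Monte Carlo approximation of the observed-data log-likelihood."""
    weights, loglik = [], 0.0
    for s, m in data:  # s successes among m measurements of one subject
        w = []
        for u in nodes:
            p = sigmoid(beta + u)
            w.append(p ** s * (1.0 - p) ** (m - s))
        total = sum(w)
        loglik += math.log(total / len(nodes))
        weights.append([v / total for v in w])
    return weights, loglik

def m_step(weights, data, nodes, lo=-20.0, hi=20.0, tol=1e-10):
    """Maximize Q over beta; dQ/dbeta is strictly decreasing, so bisect."""
    total_s = sum(s for s, _ in data)
    mass = [0.0] * len(nodes)
    for (_, m), w in zip(data, weights):
        for k, wk in enumerate(w):
            mass[k] += m * wk
    def score(beta):
        return total_s - sum(mk * sigmoid(beta + u) for mk, u in zip(mass, nodes))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = random.Random(1)
true_beta = 0.5  # assumed value used only to simulate data
data = []
for _ in range(300):
    u = draw_logistic(rng)
    s = sum(rng.random() < sigmoid(true_beta + u) for _ in range(5))
    data.append((s, 5))
nodes = [draw_logistic(rng) for _ in range(300)]

beta, history = 0.0, []
for _ in range(40):
    weights, loglik = e_step(beta, data, nodes)
    history.append(loglik)
    beta = m_step(weights, data, nodes)
```

Keeping the nodes fixed across iterations makes the approximate log-likelihood in `history` non-decreasing, which is a convenient sanity check. Redrawing the nodes at each iteration would give a noisier sequence.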

3 Seriation Procedure and Evaluation

Before starting the seriation procedure, we estimate the parameters of the joint p.d.f. $f(y, u \mid x)$ through the EM algorithm. Conditional on the estimated parameters and the observed data, the joint p.d.f. attains its maximum at the most probable latent value $\hat{U}_i$ for each subject. Under the unimodal modelling assumption, this most probable value is unique and thus represents a single level for each subject. The collected $\{\hat{U}_i : i = 1, \ldots, n\}$ are reordered as $\{\hat{U}_{(i)} : i = 1, \ldots, n\}$ with $\hat{U}_{(i+1)} \ge \hat{U}_{(i)}$ for $i = 1, \ldots, n - 1$.
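The reordering step, together with one possible rank-based seriation index, can be sketched as follows. The Spearman rank correlation is used here as an assumed choice of rank correlation, and the comparison against true latent values is only possible in a simulation setting. The input values and the function names `seriate`, `ranks`, and `spearman` are hypothetical.

```python
import math

def seriate(u_hat):
    """Subject indices ordered so that the estimated latent values are non-decreasing."""
    return sorted(range(len(u_hat)), key=lambda i: u_hat[i])

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = 0.5 * (pos + end) + 1.0
        for k in range(pos, end + 1):
            r[order[k]] = avg
        pos = end + 1
    return r

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / math.sqrt(va * vb)

u_hat = [0.3, -1.2, 2.1, 0.3, -0.4]   # illustrative estimated values
u_true = [0.1, -0.9, 1.5, 0.8, -0.5]  # illustrative true values
order = seriate(u_hat)
rho = spearman(u_hat, u_true)
```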

4 Monte Carlo simulation

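For the following estimation, we first assume that the marginal p.d.f. comes from the distribution of the cumulative logistic model, i.e.,

$$f_m(u_i \mid x_i) = \frac{\exp\left(u_i + \sum_{j=1}^{m_i} x_{ij}^T\beta_j\right)}{\left(1 + \exp\left(u_i + \sum_{j=1}^{m_i} x_{ij}^T\beta_j\right)\right)^2},$$

and the conditional p.d.f. satisfies the GLM assumption, i.e.,

$$f_c(y_i \mid x_i, u_i) = \prod_{j=1}^{m_i} \exp\left(\frac{y_{ij}\theta_{ij} - b(\theta_{ij})}{\phi} + c(y_{ij}, \phi)\right),$$

where $E[y_{ij} \mid u_i, x_{ij}] = \mu_{ij} = h(\eta_{ij})$, $\eta_{ij} = x_{ij}^T\gamma_j + u_i\delta_j$, with $\frac{\partial b(\theta_{ij})}{\partial \theta_{ij}} = \mu_{ij}(\theta_{ij}) = h(\eta_{ij})$, $\frac{\partial \mu_{ij}}{\partial \theta_{ij}} = \frac{\mathrm{Var}(y_{ij})}{\phi}$, and the $\gamma_j$'s and $\delta_j$'s are unknown parameters.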
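Data from a model of this form can be generated as in the sketch below. It rests on assumptions that are not in the text: scalar covariates, binary responses with the logit link (so $\phi = 1$), and illustrative parameter values. The marginal density above is that of a logistic variable shifted by $-\sum_j x_{ij}^T\beta_j$, which is how $u_i$ is drawn. The function name `simulate_subject` is hypothetical.

```python
import math
import random

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def simulate_subject(x, beta, gamma, delta, rng):
    """Draw (u_i, y_i) for one subject with scalar covariates x_i1, ..., x_im."""
    shift = sum(x_j * b_j for x_j, b_j in zip(x, beta))
    p = rng.random()
    while p <= 0.0:
        p = rng.random()
    # Logistic draw centred at -shift, matching the marginal density of u_i.
    u = math.log(p / (1.0 - p)) - shift
    y = [
        1 if rng.random() < sigmoid(x_j * g_j + u * d_j) else 0
        for x_j, g_j, d_j in zip(x, gamma, delta)
    ]
    return u, y

rng = random.Random(2005)
x = [0.5, -0.5, 1.0]      # assumed covariate values, shared by all subjects
beta = [0.2, 0.2, 0.2]    # assumed marginal-model coefficients
gamma = [1.0, 0.5, -0.5]  # assumed regression coefficients
delta = [1.0, 1.0, 1.0]   # assumed loadings on the latent variable
sample = [simulate_subject(x, beta, gamma, delta, rng) for _ in range(2000)]
mean_u = sum(u for u, _ in sample) / len(sample)
```

With these values the shift is 0.2, so the simulated latent values should average close to $-0.2$.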

5 References

Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39, 1-38.

Larsen, K. (2004). Joint analysis of time-to-event and multiple binary indicators of latent classes. Biometrics 60, 85-92.

Lin, H., McCulloch, C. E., et al. (2000). A latent class mixed model for analysing biomarker trajectories with irregularly scheduled observations. Statistics in Medicine 19, 1303-1318.

Wu, C. F. J. (1983). On the convergence properties of the EM algorithm. The Annals of Statistics 11(1), 95-103.
