Latent Seriation Method for Cluster or Longitudinal Data


Guo-Hua Huang

May 3, 2005

Abstract

Based on cluster or longitudinal measurements, our research interest focuses on seriating (ordering) the collected subjects by their latent degree of health status or functioning. To solve this problem, a latent variable is used to represent the unobserved seriation. Appropriate and widely used joint models of the latent variable and the cluster measurements are proposed to find the most probable value (the mode) of the latent variable for each subject, and this value is then used in the seriation. Because the latent variable is unobserved, an EM algorithm is suggested for parameter estimation. To evaluate the seriation procedure, a "leave one subject out" criterion is proposed for computing the rank correlation or the concordance index. A Monte Carlo simulation is carried out to examine the performance of the proposed approach and to illustrate its merits. Finally, the seriation procedure is applied to a CD4 depletion study.

KEY WORDS: cluster data, latent variable, longitudinal data, seriation, EM algorithm.

1 Introduction

In this study, we consider cluster or longitudinal data

$$\{((X_{i1}, Y_{i1}), \ldots, (X_{im_i}, Y_{im_i})) : i = 1, 2, \ldots, n\},$$

where the $Y_{ij}$'s are the response measurements, the $X_{ij}$'s are the corresponding covariates, and $m_i$ is the cluster size, or the number of repeated measurements, of the $i$th subject.

In the longitudinal setting, $(X_{ij}, Y_{ij})$ denotes the measurements at time point $t_{ij}$, $i = 1, \ldots, n$, $j = 1, \ldots, m_i$. Data of this type occur frequently in biomedical and epidemiological studies and have been widely discussed. Here, the research interest is to seriate the collected subjects by their latent degree of health status or functioning. For this problem, we use a latent variable $U$ to represent the unobserved health status or functioning of each subject, and the most probable value of $U$ for each subject is then used in our seriation procedure.

Recent research on finding latent mobility-disability classes of older women with disabilities can be traced back to the work of Larsen (2004), who proposed jointly modeling a time-to-event outcome and multiple binary responses with a latent variable $U$ of nominal levels to classify the disability classes. Related work with pre-determined classes can be found in Lin, McCulloch, Turnbull, Slate, and Clark (2000), whose main concern is to determine the true prostate cancer status of each patient by combining prostate-specific antigen (PSA), a biomarker of prostate cancer, with a latent class mixed model. These methods do not reveal the ordering or degree of the latent classes in the considered models and are therefore restrictive for seriation.

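To overcome this drawback, our approach places a more appropriate distributional assumption on the latent variable $U$: a continuous distribution, such as the unimodal distributions considered in Section 2, whose values can be ordered.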

Similar to Larsen (2004), we decompose the joint probability density function $f(y_{i1}, \ldots, y_{im_i}, u_i \mid x_{i1}, \ldots, x_{im_i})$ of $((Y_{i1}, \ldots, Y_{im_i}), U_i)$ given $(x_{i1}, \ldots, x_{im_i})$ into the marginal density $f_m(u_i \mid x_{i1}, \ldots, x_{im_i})$ of $U_i$ and the conditional density $f_c(y_{i1}, \ldots, y_{im_i} \mid x_{i1}, \ldots, x_{im_i}, u_i)$ of $(Y_{i1}, \ldots, Y_{im_i})$, and we specify a model for each. In addition, the $Y_{ij}$'s are assumed to be mutually independent given $((x_{i1}, \ldots, x_{im_i}), u_i)$, i.e.,

$$f_c(y_{i1}, \ldots, y_{im_i} \mid x_{i1}, \ldots, x_{im_i}, u_i) = \prod_{j=1}^{m_i} f_c(y_{ij} \mid x_{ij}, u_i).$$

Under these models, we take the most probable value of $U_i$, say $\hat{U}_i$, that is, the maximizer of the joint density in $u_i$, for the subsequent seriation; this maximizer is well defined when $f_m(u_i \mid x_{i1}, \ldots, x_{im_i})$ and the $f_c(y_{ij} \mid x_{ij}, u_i)$'s are appropriately specified. When the conditional density of $Y_{ij}$ given $(x_{ij}, u_i)$ is a widely used generalized linear model, we show that, under the natural link function and a suitable marginal model, the joint density is unimodal in $u_i$. Based on the maximizers $\hat{U}_i$, the relative latent degree of the $n$ subjects is given by the ordering of these values. If the research interest is to classify the subjects into an appropriate number of ordered classes, a hierarchical clustering method can be applied with a distance matrix defined on the values of the $\hat{U}_i$'s. To compute seriation indices of our procedure, such as the rank correlation and the concordance index, a "leave one subject out" criterion is proposed.
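A minimal sketch of the classification step, assuming the $\hat{U}_i$'s have already been computed: SciPy's agglomerative clustering is run on the one-dimensional values (so the distance matrix is $|\hat{U}_i - \hat{U}_k|$), and the clusters are relabelled so that class 1 has the smallest mean $\hat{U}$. Average linkage, the example values and the function name `ordered_classes` are illustrative choices, not prescribed by the method.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def ordered_classes(u_hat, n_classes, method="average"):
    """Cluster 1-D latent scores into classes labelled 1..n_classes in increasing order."""
    u_hat = np.asarray(u_hat, dtype=float)
    # An (n, 1) observation matrix gives Euclidean distances |u_hat[i] - u_hat[k]|
    tree = linkage(u_hat.reshape(-1, 1), method=method)
    raw = fcluster(tree, t=n_classes, criterion="maxclust")
    # Relabel clusters so that class 1 has the smallest mean latent score
    ids = np.unique(raw)
    means = np.array([u_hat[raw == c].mean() for c in ids])
    new_label = {c: rank for rank, c in enumerate(ids[np.argsort(means)], start=1)}
    return np.array([new_label[c] for c in raw])


u_hat = np.array([0.1, 2.4, -1.5, 2.1, 0.3, -1.8])   # illustrative values
labels = ordered_classes(u_hat, n_classes=3)
print(labels)
```

Because the relabelling uses cluster means, the class labels themselves carry the seriation order.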

The rest of this thesis is organized as follows. In Section 2, the proposed models and their properties are introduced, and an expectation-maximization (EM) algorithm is given for estimating the parameters of the considered models. The seriation procedure and the evaluation criterion are proposed in Section 3. In Section 4, a Monte Carlo simulation is carried out to investigate the proposed approach. Moreover, our seriation procedure is illustrated with an empirical example from the Multicenter AIDS Cohort Study (MACS). Finally, Section 5 briefly discusses possible extensions of our method.

2 Joint Model and Estimation

In this section, some widely used latent and generalized linear models are considered for the joint probability density function. For seriation, it is convenient for the joint p.d.f. to be unimodal in the latent variable, so we derive conditions under which this holds and use them in the subsequent seriation procedure. Under the proposed joint latent model, an EM algorithm is suggested for parameter estimation.

2.1 Modelling

Let $Y_i = (Y_{i1}, \ldots, Y_{im_i})$ and $X_i = (X_{i1}, \ldots, X_{im_i})$, where the $X_{ij}$'s are $p \times 1$ covariate vectors of the $i$th subject. In this study, we model the joint latent p.d.f. $f(y_i, u_i \mid x_i)$ to achieve the seriation of the collected subjects. Here, the latent variable $U$ represents the true degree or level and accounts for the dependence among the cluster or longitudinal measurements, which are conditionally independent given $U$. In biomedical, epidemiological, and longitudinal studies, a popular way to model a joint latent p.d.f. is to make distributional assumptions on the marginal and conditional p.d.f.'s. In addition, given $(u_i, x_i)$, the conditional p.d.f. is factorized as $f_c(y_i \mid u_i, x_i) = \prod_{j=1}^{m_i} f_c(y_{ij} \mid u_i, x_{ij})$. For the marginal distribution of the latent variable, commonly used distributions such as the Gaussian, gamma, logistic, extreme-minimum-value, and extreme-maximum-value distributions are often assumed; all of these belong to the class of unimodal distributions. As for the conditional p.d.f. $f_c(y_{ij} \mid u_i, x_{ij})$, a widely used generalized linear model (GLM)

$$f_c(y_{ij} \mid u_i, x_{ij}) = \exp\left\{ \frac{y_{ij}\theta_{ij} - b_j(\theta_{ij})}{a(\phi)} + c(y_{ij}; \phi) \right\} \qquad (1)$$

is considered in this article, where $\phi$ is the scale (dispersion) parameter, $\theta_{ij}$ is the natural parameter, and $E[y_{ij} \mid u_i, x_{ij}] = \mu_{ij} = h_j(x_{ij}^T\beta_j + u_i\gamma_j)$ for a known inverse link function $h_j(\cdot)$. In the following theorems, we show that the joint p.d.f. $f(y_i, u_i \mid x_i)$ is unimodal in $u_i$ when the latent model $f_m(u_i \mid x_i)$ is unimodal in $u_i$ and the $f_c(y_{ij} \mid u_i, x_{ij})$'s are GLMs with the natural link function, i.e., $\theta_{ij} = \eta_{ij} = x_{ij}^T\beta_j + u_i\gamma_j$. Random effects models of this kind date back to Breslow and Clayton (1993), Breslow and Lin (1995), and Lin and Breslow (1996), among others.
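For orientation, here are three standard members of the GLM family above with their natural links (textbook facts, not results of this thesis). In each case $b_j'(\theta_{ij}) = \mu_{ij}$ and $a(\phi)\, b_j''(\theta_{ij}) = \mathrm{Var}(y_{ij} \mid u_i, x_{ij})$.

```latex
\begin{aligned}
\text{Normal } N(\mu_{ij}, \sigma^2):&\quad \theta_{ij} = \mu_{ij},\; b(\theta) = \theta^2/2,\; a(\phi) = \sigma^2,\; \text{natural link: identity, } \mu_{ij} = \eta_{ij};\\
\text{Poisson}(\mu_{ij}):&\quad \theta_{ij} = \log \mu_{ij},\; b(\theta) = e^{\theta},\; a(\phi) = 1,\; \text{natural link: log, } \mu_{ij} = e^{\eta_{ij}};\\
\text{Bernoulli}(\mu_{ij}):&\quad \theta_{ij} = \log\{\mu_{ij}/(1 - \mu_{ij})\},\; b(\theta) = \log(1 + e^{\theta}),\; a(\phi) = 1,\; \text{natural link: logit, } \mu_{ij} = e^{\eta_{ij}}/(1 + e^{\eta_{ij}}).
\end{aligned}
```

With any of these, setting $\theta_{ij} = \eta_{ij} = x_{ij}^T\beta_j + u_i\gamma_j$ gives the natural-link specification used in the theorems.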

In general, the marginal p.d.f. $f_m(u_i \mid x_i)$ is assumed to be a unimodal function of $u_i$, and the conditional p.d.f.'s $f_c(y_{ij} \mid u_i, x_{ij})$ are specified as GLMs. If each GLM has the natural link function, then under some regularity conditions the joint p.d.f. $f(y_i, u_i \mid x_i)$ is unimodal in $u_i$ for each subject. Unimodality yields a unique seriation index for each subject for the subsequent classification. Theorem 2.1 gives conditions under which the joint p.d.f. has at most one maximizer in $u_i$; Theorem 2.2 shows that if, in addition, $u_i$ has compact support, the maximizer always exists, which makes the result more widely applicable than Theorem 2.1.

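Theorem 2.1. Assume that $f_m(u_i \mid x_i)$ is a unimodal function of $u_i$ whose logarithm is concave in $u_i$, and that the $f_c(y_{ij} \mid u_i, x_{ij})$'s are GLMs of the form (1) with $\theta_{ij} = \eta_{ij} = x_{ij}^T\beta_j + u_i\gamma_j$. Then $f(y_i, u_i \mid x_i)$ has either a unique maximizer or no maximizer with respect to $u_i$.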

Proof. Let $l_i = \log f(y_i, u_i \mid x_i)$. Our goal is to find the mode $\hat{u}_i$ of the joint p.d.f. $f(y_i, u_i \mid x_i)$. Setting the first partial derivative with respect to $u_i$,

$$\frac{\partial l_i}{\partial u_i} = \frac{\partial \log f_m(u_i \mid x_i)}{\partial u_i} + \sum_{j=1}^{m_i} \frac{\gamma_j}{a(\phi)} (y_{ij} - \mu_{ij}) \frac{\partial \theta_{ij}}{\partial \eta_{ij}},$$

equal to 0 gives a candidate maximizer or minimizer $\hat{u}_i$; the sign of the second partial derivative then determines which of the two it is.

The second partial derivative is

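$$
\begin{aligned}
\frac{\partial^2 l_i}{\partial u_i^2}
&= \frac{\partial^2 \log f_m(u_i \mid x_i)}{\partial u_i^2}
 + \sum_{j=1}^{m_i} \frac{\gamma_j}{a(\phi)} \left[ -\frac{\partial \mu_{ij}}{\partial \theta_{ij}} \left(\frac{\partial \theta_{ij}}{\partial \eta_{ij}}\right)^2 \frac{\partial \eta_{ij}}{\partial u_i} + (y_{ij} - \mu_{ij}) \frac{\partial}{\partial u_i}\left(\frac{\partial \theta_{ij}}{\partial \eta_{ij}}\right) \right] \\
&= \frac{\partial^2 \log f_m(u_i \mid x_i)}{\partial u_i^2}
 + \sum_{j=1}^{m_i} \frac{\gamma_j}{a(\phi)} \left[ -\frac{\gamma_j}{a(\phi)} \mathrm{Var}(y_{ij} \mid u_i, x_{ij}) \left(\frac{\partial \theta_{ij}}{\partial \eta_{ij}}\right)^2 + (y_{ij} - \mu_{ij}) \frac{\partial}{\partial u_i}\left(\frac{\partial \theta_{ij}}{\partial \eta_{ij}}\right) \right],
\end{aligned}
$$

using $\partial \eta_{ij} / \partial u_i = \gamma_j$ and $\partial \mu_{ij} / \partial \theta_{ij} = b_j''(\theta_{ij}) = \mathrm{Var}(y_{ij} \mid u_i, x_{ij}) / a(\phi)$.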

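Since the $y_{ij}$'s are random variables, the sign of the second term inside the brackets cannot be controlled, so to decide whether $\hat{u}_i$ is a maximizer or a minimizer we require $\frac{\partial}{\partial u_i}\left(\frac{\partial \theta_{ij}}{\partial \eta_{ij}}\right) = 0$, i.e., $\theta_{ij}$ is a linear function of $\eta_{ij}$. In practice, the natural link function $\theta_{ij} = \eta_{ij}$ satisfies this condition and gives $\partial \theta_{ij} / \partial \eta_{ij} = 1$, so that

$$\frac{\partial^2 l_i}{\partial u_i^2} = \frac{\partial^2 \log f_m(u_i \mid x_i)}{\partial u_i^2} - \sum_{j=1}^{m_i} \frac{\gamma_j^2}{a(\phi)^2} \mathrm{Var}(y_{ij} \mid u_i, x_{ij}).$$

Moreover, if $\log f_m(u_i \mid x_i)$ is concave in $u_i$, as it is for the Gaussian, logistic, and extreme-value densities and for the gamma density with shape parameter at least 1, then $\partial^2 l_i / \partial u_i^2 < 0$ for every $i$ (strictly, provided some $\gamma_j \neq 0$ or $\log f_m$ is strictly concave). Hence $\partial l_i / \partial u_i$ is a strictly decreasing function of $u_i$, and $l_i$ has a maximizer $\hat{u}_i$ if and only if $\partial l_i / \partial u_i = 0$ has a solution. A strictly decreasing function crosses 0 at most once, so the joint p.d.f. $f(y_i, u_i \mid x_i)$ has either a unique maximizer or no maximizer with respect to $u_i$. □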
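As a worked illustration of Theorem 2.1 (the model is an assumed example, not one fitted in this thesis), take $U_i \sim N(0, \sigma_u^2)$ and Poisson responses with the natural log link, so that $a(\phi) = 1$ and $\mathrm{Var}(y_{ij} \mid u_i, x_{ij}) = \mu_{ij} = e^{\eta_{ij}}$. Up to terms free of $u_i$:

```latex
\begin{aligned}
l_i(u_i) &= -\frac{u_i^2}{2\sigma_u^2} + \sum_{j=1}^{m_i}\left\{ y_{ij}\eta_{ij} - e^{\eta_{ij}} \right\} + \text{const}, \qquad \eta_{ij} = x_{ij}^T\beta_j + u_i\gamma_j,\\
\frac{\partial l_i}{\partial u_i} &= -\frac{u_i}{\sigma_u^2} + \sum_{j=1}^{m_i} \gamma_j\left( y_{ij} - e^{\eta_{ij}} \right),\\
\frac{\partial^2 l_i}{\partial u_i^2} &= -\frac{1}{\sigma_u^2} - \sum_{j=1}^{m_i} \gamma_j^2 e^{\eta_{ij}} < 0.
\end{aligned}
```

In this example the score tends to $+\infty$ as $u_i \to -\infty$ and to $-\infty$ as $u_i \to \infty$, whatever the signs of the $\gamma_j$'s, so the maximizer exists and the "no maximizer" case of Theorem 2.1 does not occur.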

In applications, we may take the marginal p.d.f. $f_m(u_i \mid x_i)$ to be a normal, gamma, logistic, or similar distribution and the conditional p.d.f. $f_c(y_i \mid u_i, x_i)$ to be a product of GLMs with natural link functions; the maximizer $\hat{u}_i$ can then be used to seriate the latent classes conveniently, as described in the following section. However, if such common distributions are not used and we still want to preserve the unimodal property, a simple remedy is to restrict the latent variable to a bounded region. This restriction is formalized in Theorem 2.2.
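A minimal numerical sketch of how $\hat{u}_i$ might be computed, assuming for illustration only that $U_i \sim N(0, \sigma_u^2)$ and that the responses are Poisson with the natural log link, so that the score is $-u/\sigma_u^2 + \sum_j \gamma_j (y_{ij} - e^{\eta_{ij}})$. Because the score is strictly decreasing under the conditions of Theorem 2.1, the mode is its unique root, found here with SciPy's `brentq`; when a compact support $[a, b]$ is supplied (the setting of Theorem 2.2), a boundary point is returned whenever the score does not change sign on the interval. The names `eta0` (the fixed parts $x_{ij}^T\beta_j$), `gamma`, `sigma_u`, and `latent_mode` are hypothetical.

```python
import numpy as np
from scipy.optimize import brentq


def score(u, y, eta0, gamma, sigma_u):
    """d l_i / d u for the assumed model U ~ N(0, sigma_u^2), Y_ij | u ~ Poisson(exp(eta0_j + gamma_j u))."""
    mu = np.exp(eta0 + gamma * u)
    return -u / sigma_u**2 + np.sum(gamma * (y - mu))


def latent_mode(y, eta0, gamma, sigma_u, support=None, max_doublings=60):
    """Maximizer of f(y_i, u | x_i) in u; the score is strictly decreasing in u."""
    args = (y, eta0, gamma, sigma_u)
    if support is not None:
        a, b = support
        if score(a, *args) <= 0:   # density non-increasing on [a, b]: maximum at a
            return a
        if score(b, *args) >= 0:   # density non-decreasing on [a, b]: maximum at b
            return b
        return brentq(score, a, b, args=args)
    # Unbounded support: widen the bracket [a, b] until the score changes sign
    a, b = -1.0, 1.0
    for _ in range(max_doublings):
        if score(a, *args) >= 0 and score(b, *args) <= 0:
            return brentq(score, a, b, args=args)
        if score(a, *args) < 0:
            a *= 2.0
        if score(b, *args) > 0:
            b *= 2.0
    raise RuntimeError("no sign change found: the joint density may have no maximizer")


y = np.array([3, 5, 2, 4])             # illustrative counts for one subject
eta0 = np.full(4, np.log(2.0))
gamma = np.full(4, 0.8)
u_hat = latent_mode(y, eta0, gamma, sigma_u=1.0)
u_box = latent_mode(y, eta0, gamma, sigma_u=1.0, support=(-0.1, 0.1))
print(u_hat, u_box)
```

Root finding on the monotone score is used instead of plain Newton steps because it cannot overshoot; for these illustrative inputs the unconstrained mode lies outside $[-0.1, 0.1]$, so the bounded version returns the upper boundary.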

Theorem 2.2. Suppose that the conditions of Theorem 2.1 hold and that $u_i$ has a compact support for every $i$. Then the maximizer $\hat{u}_i = \arg\max_{u_i} f(y_i, u_i \mid x_i)$ exists and is unique.

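Proof. The derivatives $\partial l_i / \partial u_i$ and $\partial^2 l_i / \partial u_i^2$ are derived as in Theorem 2.1. Under the natural link function (and, as in Theorem 2.1, a concave $\log f_m$), $\partial^2 l_i / \partial u_i^2 < 0$, so $\partial l_i / \partial u_i$ is a decreasing function of $u_i$ and $l_i$ is strictly concave. Hence there are two possibilities: either $l_i$ is unimodal in $u_i$ with an interior mode, or $l_i$ is strictly monotone (increasing or decreasing) in $u_i$. If $u_i$ has compact support, the continuous function $l_i$ attains its maximum on it, so the maximizer $\hat{u}_i$ exists and lies either at the interior mode or at a boundary point of the support; strict concavity makes it unique. □

2.2 EM-algorithm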

Since the latent variable is unobserved, the EM algorithm (Dempster, Laird, and Rubin 1977) is used to maximize the likelihood function of the observed data $\{(x_i, y_i) : i = 1, 2, \ldots, n\}$. The algorithm iterates between an E-step, in which the expected log-likelihood of the complete data $\{(u_i, x_i, y_i) : i = 1, 2, \ldots, n\}$ is computed conditional on the observed data and the current estimates of the $(\beta_j, \gamma_j)$'s, and an M-step, in which new parameter estimates are obtained by maximizing this expected log-likelihood.

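Let $\Theta = \{(\beta_j, \gamma_j) : j = 1, \ldots, m\}$, where $m = \max_i m_i$. The complete-data log-likelihood function is

$$l(u, x, y; \Theta) = \sum_{i=1}^{n} l_i(u_i, x_i, y_i; \Theta), \qquad \text{where} \qquad l_i(u_i, x_i, y_i; \Theta) = \log f_m(u_i \mid x_i) + \log f_c(y_i \mid x_i, u_i).$$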


In the E-step,

$$Q(\Theta; \Theta^{(r)}) = E[l(U, X, Y; \Theta) \mid X, Y; \Theta^{(r)}] = \sum_{i=1}^{n} E[l_i(U_i, X_i, Y_i; \Theta) \mid X_i, Y_i; \Theta^{(r)}]$$

is calculated, where $\Theta^{(r)}$ is the value of $\Theta$ from the $r$th iteration.

In the M-step, $Q(\Theta; \Theta^{(r)})$ is maximized as a function of $\Theta$ to obtain $\Theta^{(r+1)}$. The two steps are iterated until convergence, and the limit is denoted $\hat{\Theta}$. However, the conditional p.d.f. of $U_i$ given $(x_i, y_i)$ required in the E-step usually has no closed form, which makes the expectation computationally expensive. Importance sampling, described below, can be used to approximate it.

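Importance sampling (Casella and Robert 1996): Suppose that $X \sim f$, but $f$ is difficult to simulate from. Generate $Y_1, Y_2, \ldots, Y_m$ i.i.d. from a known p.d.f. $g$ whose support contains that of $f$, and, for any function $h$ with $E|h(X)| < \infty$, calculate the estimator

$$\sum_{i=1}^{m} \left( \frac{f(Y_i)/g(Y_i)}{\sum_{j=1}^{m} f(Y_j)/g(Y_j)} \right) h(Y_i).$$

Then the estimator converges in probability to $E[h(X)]$ as $m \to \infty$. Because the weights are normalized, $f$ needs to be known only up to a multiplicative constant.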
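A minimal sketch of the self-normalized estimator above, written with NumPy only. The target density is passed on the log scale and may be unnormalized, which is the situation in the E-step, where the conditional density of $U_i$ is known only up to a constant. The toy target $N(1, 1)$, the proposal $N(0, 2^2)$ and the function name `snis_estimate` are arbitrary choices for checking the code.

```python
import numpy as np


def snis_estimate(h, log_f, log_g, draws):
    """Self-normalized importance sampling estimate of E_f[h(X)].

    draws are i.i.d. from g; log_f and log_g may both omit additive constants,
    because the weights are normalized to sum to one.
    """
    log_w = log_f(draws) - log_g(draws)
    w = np.exp(log_w - log_w.max())   # subtract the max to avoid overflow
    w /= w.sum()
    return np.sum(w * h(draws))


rng = np.random.default_rng(0)
draws = rng.normal(loc=0.0, scale=2.0, size=200_000)    # Y_1, ..., Y_m from g = N(0, 4)
estimate = snis_estimate(
    h=lambda x: x**2,
    log_f=lambda x: -0.5 * (x - 1.0) ** 2,              # N(1, 1) up to a constant
    log_g=lambda x: -x**2 / 8.0,                         # N(0, 4) up to a constant
    draws=draws,
)
print(estimate)  # should be close to E[X^2] = 1 + 1^2 = 2
```

The proposal is deliberately wider than the target so that the weights $f/g$ have finite variance.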

In our setting, the complete-data log-likelihood $l_i$ and the conditional p.d.f. $f_{U_i}(u_i \mid y_i, x_i; \Theta^{(r)})$ play the roles of $h$ and $f$, respectively.
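A minimal Monte Carlo EM sketch that combines the E- and M-steps with the importance-sampling estimator. For illustration only, it assumes a hypothetical model with $U_i \sim N(0, 1)$ and $Y_{ij} \mid u_i \sim \mathrm{Poisson}(\exp(\beta + \gamma u_i))$ with common parameters $\Theta = (\beta, \gamma)$. Using $f_m$ itself as the proposal $g$ makes each weight proportional to $f_c(y_i \mid x_i, u; \Theta^{(r)})$, so that $\hat{Q}(\Theta; \Theta^{(r)}) = \sum_i \sum_k w_{ik}\, l_i(u_{ik}, x_i, y_i; \Theta)$, and the M-step maximizes $\hat{Q}$ numerically with SciPy's `minimize`. The draw size, iteration count, starting values and function names are arbitrary.

```python
import numpy as np
from scipy.optimize import minimize


def log_fc(beta, gamma, s, m, u):
    """log f_c(y_i | u) up to a constant for Poisson(exp(beta + gamma * u)) counts.

    s[i] = sum_j y_ij is sufficient here; u has shape (n, K).
    """
    eta = beta + gamma * u
    return s[:, None] * eta - m * np.exp(eta)


def mcem(y, theta0, n_draws=200, n_iter=25, seed=0):
    rng = np.random.default_rng(seed)
    n, m = y.shape
    s = y.sum(axis=1).astype(float)
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        # E-step: u_ik ~ g = f_m = N(0, 1), so w_ik is proportional to f_c(y_i | u_ik; theta^(r))
        u = rng.standard_normal((n, n_draws))
        log_w = log_fc(theta[0], theta[1], s, m, u)
        w = np.exp(log_w - log_w.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)

        # M-step: maximize Q_hat(theta) = sum_ik w_ik log f_c(y_i | u_ik; theta);
        # log f_m(u_ik) does not involve theta in this model and is dropped.
        def neg_q(t):
            eta = t[0] + t[1] * u
            resid = s[:, None] - m * np.exp(eta)
            value = -np.sum(w * (s[:, None] * eta - m * np.exp(eta)))
            grad = -np.array([np.sum(w * resid), np.sum(w * resid * u)])
            return value, grad

        theta = minimize(neg_q, theta, jac=True, method="BFGS").x
    return theta


# Simulated check with hypothetical true values beta = 0.5, gamma = 0.7
rng = np.random.default_rng(1)
n, m = 300, 5
u_true = rng.standard_normal(n)
y = rng.poisson(np.exp(0.5 + 0.7 * u_true)[:, None], size=(n, m))
theta_hat = mcem(y, theta0=[0.0, 0.5])   # gamma must not start at 0, a stationary point
print(theta_hat)
```

Fresh draws are taken at every iteration, so the sequence $\Theta^{(r)}$ fluctuates slightly around the EM path; increasing `n_draws` in later iterations reduces this noise.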

In addition, we want the parameter estimates to be unique and the EM sequence to converge to the maximizer of the observed-data likelihood. Under some regularity conditions, stated in Theorem 2.3, this is guaranteed.

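Theorem 2.3. Suppose that the observed-data log-likelihood $l_{obs}(\Theta) = \sum_{i=1}^{n} \log \int f(y_i, u_i \mid x_i; \Theta)\, du_i$ is unimodal in $\Theta$ with $\hat{\Theta}$ being its only stationary point, and that $\partial Q(\Theta; \Theta^{(r)}) / \partial \Theta$ is continuous in $\Theta$ and $\Theta^{(r)}$. Then for any EM sequence $\{\Theta^{(r)}\}$, $\Theta^{(r)}$ converges to the unique maximizer $\hat{\Theta}$ of $l_{obs}(\Theta)$.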

Proof. See Wu (1983).

Therefore, in our study, the unimodality assumption of Theorem 2.3 guarantees the convergence of the EM algorithm, and the uniqueness of the maximizers helps us to seriate the latent degree.

3 Seriation Procedure and Evaluation

Before carrying out the seriation procedure, we estimate the parameters of the joint p.d.f. $f(y, u \mid x)$ by the EM algorithm. Given the estimated parameters and the observed data, the joint p.d.f. $f(y_i, u_i \mid x_i)$ attains its maximum in $u_i$ at the most probable latent value $\hat{U}_i$ of each subject. Under the unimodal modelling assumption, this value is unique and serves as the latent level of the subject. The values $\{\hat{U}_i : i = 1, \ldots, n\}$ are then sorted as $\hat{U}_{(1)} \le \hat{U}_{(2)} \le \cdots \le \hat{U}_{(n)}$, which gives the seriation of the subjects.
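The seriation step itself is a sort. The sketch below returns the ordering and the rank of each subject and, for evaluation, compares $\hat{U}$ with a reference ordering through Kendall's rank correlation and a pairwise concordance index. The reference values (for example, the true latent values in a simulation), the function names, and the concordance definition used here, the proportion of untied pairs ordered the same way, are assumptions; the leave-one-subject-out refitting that would produce each $\hat{U}_i$ is not shown.

```python
from itertools import combinations

import numpy as np
from scipy.stats import kendalltau


def seriate(u_hat):
    """Subject indices from lowest to highest U_hat, and each subject's rank (1 = lowest)."""
    u_hat = np.asarray(u_hat, dtype=float)
    order = np.argsort(u_hat, kind="stable")
    ranks = np.empty(len(u_hat), dtype=int)
    ranks[order] = np.arange(1, len(u_hat) + 1)
    return order, ranks


def concordance_index(u_hat, reference):
    """Share of pairs ordered the same way by u_hat and reference; pairs tied in either are skipped."""
    concordant = usable = 0
    for i, k in combinations(range(len(u_hat)), 2):
        d_hat = np.sign(u_hat[i] - u_hat[k])
        d_ref = np.sign(reference[i] - reference[k])
        if d_hat != 0 and d_ref != 0:
            usable += 1
            concordant += int(d_hat == d_ref)
    return concordant / usable


u_hat = np.array([0.2, 1.5, -0.3, 0.9])
reference = np.array([1.0, 4.0, 0.0, 5.0])   # hypothetical reference ordering
order, ranks = seriate(u_hat)
c_index = concordance_index(u_hat, reference)
tau, _ = kendalltau(u_hat, reference)
print(order, ranks, c_index, tau)
```

In this toy example only the pair (subject 1, subject 3) is discordant, so 5 of the 6 pairs are concordant and Kendall's $\tau = (5 - 1)/6$.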

4 Monte Carlo simulation

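For the following estimation, we first assume that the marginal p.d.f. is the logistic density (as in the cumulative logistic model), i.e.,

$$f_m(u_i \mid x_i) = \frac{\exp\left(u_i + \sum_{j=1}^{m_i} x_{ij}^T\beta_j\right)}{\left\{1 + \exp\left(u_i + \sum_{j=1}^{m_i} x_{ij}^T\beta_j\right)\right\}^2},$$

and that the conditional p.d.f. satisfies the GLM assumption, i.e.,

$$f_c(y_i \mid x_i, u_i) = \prod_{j=1}^{m_i} \exp\left\{ \frac{y_{ij}\theta_{ij} - b(\theta_{ij})}{\phi} + c(y_{ij}, \phi) \right\},$$

where $E[y_{ij} \mid u_i, x_{ij}] = \mu_{ij} = h(\eta_{ij})$, $\eta_{ij} = x_{ij}^T\gamma_j + u_i\delta_j$,

$$\frac{\partial b(\theta_{ij})}{\partial \theta_{ij}} = \mu_{ij}(\theta_{ij}) = h(\eta_{ij}), \qquad \frac{\partial \mu_{ij}}{\partial \theta_{ij}} = \frac{\mathrm{Var}(y_{ij} \mid u_i, x_{ij})}{\phi},$$

and the $\gamma_j$'s and $\delta_j$'s are unknown parameters.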
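A minimal sketch of how data could be generated from this design, assuming (hypothetically) Bernoulli responses, for which $\phi = 1$, $b(\theta) = \log(1 + e^{\theta})$ and the natural link is the logit, together with standard-normal covariates, equal cluster sizes, and user-chosen parameter arrays `beta`, `gamma`, `delta`. The marginal density above is the logistic density with location $-\sum_j x_{ij}^T\beta_j$ and scale 1, so $U_i$ can be drawn with NumPy's `Generator.logistic`. The function name `simulate` and all numerical values are illustrative.

```python
import numpy as np


def simulate(n, beta, gamma, delta, rng):
    """Draw (x, u, y) from the Section 4 design with Bernoulli responses.

    beta, gamma: arrays of shape (m, p) holding beta_j and gamma_j; delta: shape (m,).
    Every subject gets the same number m of measurements in this sketch.
    """
    m, p = beta.shape
    x = rng.standard_normal((n, m, p))                  # x_ij (hypothetical covariates)
    s = np.einsum("ijk,jk->i", x, beta)                 # sum_j x_ij^T beta_j
    # f_m(u | x) = e^(u+s) / (1 + e^(u+s))^2 is logistic with location -s, scale 1
    u = rng.logistic(loc=-s, scale=1.0)
    eta = np.einsum("ijk,jk->ij", x, gamma) + u[:, None] * delta[None, :]
    mu = 1.0 / (1.0 + np.exp(-eta))                     # mu_ij = h(eta_ij), inverse logit
    y = rng.binomial(1, mu)
    return x, u, y


rng = np.random.default_rng(2005)
beta = np.array([[0.5, -0.2], [0.3, 0.1], [-0.4, 0.2]])   # m = 3, p = 2 (illustrative)
gamma = np.array([[0.2, 0.0], [0.1, -0.3], [0.0, 0.4]])
delta = np.array([1.0, 0.8, 1.2])
x, u, y = simulate(20_000, beta, gamma, delta, rng)
print(x.shape, u.shape, y.shape, y.mean())
```

Drawing $U_i$ from the logistic distribution directly avoids inverting the density by hand; with unequal cluster sizes $m_i$ the arrays would need padding or a list-of-arrays layout.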

5 References

Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39, 1-38.

Larsen, K. (2004). Joint analysis of time-to-event and multiple binary indicators of latent classes. Biometrics 60, 85-92.


Lin, H., McCulloch, C. E., Turnbull, B. W., Slate, E. H., and Clark, L. C. (2000). A latent class mixed model for analysing biomarker trajectories with irregularly scheduled observations. Statistics in Medicine 19, 1303-1318.

Wu, C. F. J. (1983). On the convergence properties of the EM algorithm. The Annals of Statistics 11, 95-103.