• 沒有找到結果。

Estimation and prediction in linear mixed models with skew-normal random effects for longitudinal data

N/A
N/A
Protected

Academic year: 2021

Share "Estimation and prediction in linear mixed models with skew-normal random effects for longitudinal data"

Copied!
18
0
0

加載中.... (立即查看全文)

全文

(1)

Published online 20 August 2007 in Wiley InterScience (www.interscience.wiley.com) DOI: 10.1002/sim.3026

Estimation and prediction in linear mixed models with

skew-normal random effects for longitudinal data

Tsung I. Lin

1,∗,†

and Jack C. Lee

2

1Department of Applied Mathematics, National Chung Hsing University, Taichung 402, Taiwan 2Graduate Institute of Finance, National Chiao Tung University, Hsinchu 30050, Taiwan

SUMMARY

This paper extends the classical linear mixed model by considering a multivariate skew-normal assumption for the distribution of random effects. We present an efficient hybrid ECME-NR algorithm for the computation of maximum-likelihood estimates of parameters. A score test statistic for testing the existence of skewness preference among random effects is developed. The technique for the prediction of future responses under this model is also investigated. The methodology is illustrated through an application to Framingham cholesterol data and a simulation study. Copyrightq 2007 John Wiley & Sons, Ltd. KEY WORDS: ECME algorithm; maximum-likelihood estimation; prediction; random effects; SNLMM

1. INTRODUCTION

The normal linear mixed model (NLMM), originally introduced by Laird and Ware [1], has become the most frequently used analytic tool for longitudinal data analyses with continuous repeated measures. The specification of both random effects and the error term is automatically taken to be normal for mathematical convenience. Unfortunately, such normality assumptions are too restrictive and suffer from the lack of robustness against departure from the normality assump-tion. Meanwhile, it may yield invalid statistical inferences when the underlying normalities are violated.

To reduce unrealistic normality assumptions in NLMM, various authors concentrated on more flexible approaches using a broader family of distributions. Pinheiro et al.[2] proposed a (multi-variate) t linear mixed model and showed that it would perform well in the occurrence of outliers.

Correspondence to: Tsung I. Lin, Department of Applied Mathematics, National Chung Hsing University, Taichung

402, Taiwan.

E-mail: [email protected]

Contract/grant sponsor: National Science Council of Taiwan; contract/grant number: NSC95-2118-M-005-001-MY2

(2)

Verberk and Lesaffre [3] introduced a heterogeneous linear mixed model with random effects distributed according to finite normal mixtures. They provided a simple goodness-of-fit test for the investigation of heterogeneity in random effects.

An alternative way of enhancing the justification of the normality assumption in the presence of strong skewness is via the Box–Cox transformation; see e.g.[4, 5]. Although such transformation-based methodology may yield reasonable empirical results, the achievement of joint normality is rarely satisfied and the transformed variables are difficult to interpret. From a practical viewpoint, one needs to seek an appropriate theoretical model from a robustification of normal theory in regulating skewness.

The multivariate skew-normal (SN) distribution was introduced by Azzalini and Dalla Valle[6], and some further attractive features as well as applications are given in Azzalini and Capitaino [7]. For this class of distributions, a number of extensions and alternative formulations have been proposed during the last decade. Sahu et al.[8] defined a new class of multivariate SN distributions and stated that it is very convenient to implement within a Bayesian hierarchical framework. Arellano-Valle and Genton[9] studied the family of fundamental skew normal (FUSN) distributions obtained by a convolution scheme that generalizes[8] and leads to a fairly general procedure to obtain the multivariate SN densities starting from symmetric ones. For more discussions of FUSN and other proposals of the original SN distribution such as the closed SN of[10] and the hierarchial SN of[11] along with their connections and related properties, see [12, 13].

Recently, Arellano-Valle et al. [14] proposed a skew-normal linear mixed model (SNLMM) based on the multivariate SN distribution introduced by Azzalini and Dalla Valle[6] and Azzalini and Capitaino[7]. They assumed that both random effects and within-subject errors are distributed as SN for the sake of completeness. To the best of the authors’ experiences, however, in this setting the SN distributional assumption for within-subject errors is hard to justify even when the number of observations is sufficiently large. Moreover, it is irrational to assume that the skewness parameters for within-subject errors are identical for the parsimony of model building. As can be seen in the illustrative example of [14], such representation departs from the true realities for the fitted residuals and in turn leads to less significance for the estimated skewness parameter.

In this paper, we aim at developing additional tools for a simplified version of SNLMM, more directly linked to the work of [14], in which merely the random effects are assumed to follow multivariate SN distributions. A key feature of this model is that it can be formulated as a flexible normal-truncated normal hierarchy, which is useful for random number generation and for theoretical derivations. In view of the computational perspective, we present a hybrid of EM-type and gradient-based methods alternative to [14], which is used for accelerating the calculation of maximum-likelihood (ML) estimates with the standard errors as a by-product at convergence.

In Section 2, we describe the model, define the notations and derive a few distributional prop-erties under the complete data framework. In Section 3, we discuss how to compute ML esti-mates of model parameters in an efficient manner. In Section 4, a score test statistic is obtained for testing the skewness preference among random effects. Such a test can serve as a prelim-inary check before fitting a complex SNLMM. Inferences for the estimation of random effects and the prediction of future values are discussed in Section 5. We illustrate these results us-ing the famous Framus-ingham cholesterol data in Section 6. One simulation study is presented in Section 7. Some concluding remarks are given in Section 8 and technical derivations are given in the Appendices.

(3)

2. MODEL AND NOTATION

A p-dimensional random vector Y is said to have a multivariate SN distribution with a p× 1 location vectorl ∈ Rp, a p× p positive-definite dispersion matrix R and a p × 1 skewness vector

k ∈ Rp, say Y∼ SNp(l, R, k), if its probability density function (pdf) is

f(Y) = 2p(Y | l, R)(kTR−1/2(Y − l)) (1) wherep(·|l, R) is the pdf of Np(l, R) and (·) is the cumulative distribution function of N(0, 1).

Note that ifk = 0, the pdf of Y in (1) corresponds to Np(l, R) density. Some essential distributional

properties with regard to (1) are referred to[6, 7, 15].

We consider a generalization of NLMM in which the random effects are assumed to follow multivariate SN distributions within the family defined in (1). The model can be written as

Yi= Xib + Zibi + ei, bi∼ SNq(0, 2C, k)

ei∼ Nni(0, 

2C

i), bi ⊥ ei (i = 1, . . . , N)

(2)

where the subscript i is the subject index, Yi is an ni-dimensional random response vector, Xi and Zi are known full-rank matrices of dimensions ni× p and ni× q, respectively, b is a p × 1 vector

of unknown fixed effects describing the population mean, bi is a q× 1 vector of unobservable

random effects and ei is an ni× 1 vector of residual errors assumed to be independent of bi.

Moreover, C is a q × q unstructured symmetric positive-definite matrix, Ci is an ni× ni scaled

matrix, which is a function of a small set of autocorrelation parameters q = (1, . . . , g) and

depends on i only through its dimension ni, and k = (1, . . . , q) is a q × 1 vector of skewness

parameters. In what follows, we reparameterize C = F2 for ease of computation and theoretical derivation, where F is the square root ofC, i.e. C1/2, containing q(q + 1)/2 distinct elements.

According to Proposition 1 of[14], model (2) can be hierarchically represented as

Yi | i ∼ Nni(Xib + dii,  2 Wi) i ∼ TN(0, 2; (0, ∞)) (3) where Wi= ZiF(Iq− ddT)FTZTi + Ci= Ki− didTi Ki= ZiCZTi + Ci, di= ZiFd, d = d(k) = k 1+qj=12j

and TN(, 2; (a, b)) denotes the truncated normal distribution for N(, 2) lying within a trun-cated interval(a, b).

It follows from (3) that the density of Yi is

f(Yi) = 2ni(Yi|Xib,  2 Ki) ⎛ ⎝dTiW−1i (Yi− Xib) 1+ dTiWi−1di ⎞ ⎠ (4)

(4)

From (4), it is straightforward to observe that Yi are independent and marginally distributed as Yi∼ SNni(Xib, 2Ki, ai), where ai= (1 + dTiW−1i di)−1/2K 1/2 i Wi−1di. Consequently, simple algebra yields i | Yi∼ TN(i,  2 i; (0, ∞)) where i= (1 + d T iW−1i di)−1dTiWi−1(Yi − Xib), 2i=  2(1 + dT iW−1i di)−1 (5)

Note that, in particular,

E(i | Yi) = i+ ii and E( 2 i | Yi) = 2i + 2i+ iii (6) where i=( i) ( i) and i=i i =dTiWi−1(Yi − Xib) 1+ dTiW−1i di (7)

Denoting by Y=(Y1, . . . , YN) the observed data, the log-likelihood function of h=(b, 2, F, q, k) is (h|Y) = −n 2log( 2) −1 2 N  i=1 log|Ki| − 1 22 N  i=1 eTiK−1i ei+ N  i=1 log( i) (8) where n=iN=1ni denotes the total number of observations, Ki= Ki(F, q) and ei= Yi − Xib.

Explicit expressions for the score vector sh and the observed information matrix Ihh are derived in Appendix A.

3. MAXIMUM-LIKELIHOOD ESTIMATION

Since the ML estimators of parameters in model (2) cannot be written in closed forms, the popular Newton–Raphson (NR) algorithm can be applied to calculate or approximate ML estimates iteratively. Starting from an initial point ˆh(0), the NR procedure proceeds according to

ˆh(k+1)= ˆh(k)+ ˆI−1(k)hh ˆs(k)h (9) where ˆs(k)h and ˆI(k)hh are the score vector and the observed information matrix evaluated at ˆh(k), respectively. Other related iterative algorithms such as steepest ascent, Davidon–Fletcher–Powell, conjugate gradient, quasi-Newton and Marquardt–Levenberg, see e.g. [16], which are gradient methods such as the NR algorithm, can lead to equivalent calculations.

An oft-voiced complaint of the NR algorithm is that it may not converge unless good starting values are used. The EM algorithm[17], which takes advantage of being insensitive to the starting values as a powerful computational tool that requires the construction of unobserved data, has been well developed and has become a broadly applicable approach to the iterative computation of ML estimates. One of the major reasons for the popularity of the EM algorithm is that the

(5)

M-step involves only complete data ML estimation, which is often computationally simple. More-over, the EM algorithm is stable and straightforward to implement since the iterations converge monotonically and no second derivatives are required.

When the M-step of EM turns out to be analytically intractable, it can be replaced with a sequence of conditional maximization (CM) steps. Such modification is referred to as the ECM algorithm [18]. The ECME algorithm [19], a faster extension of EM and ECM, is obtained by maximizing the constrained Q-function (the expected complete data function) with some CM steps that maximizes the corresponding constrained actual likelihood function, called the CML steps. However, computation to the solution of the CML step may involve a high-dimensional search problem for which the likelihood is not guaranteed to increase. In practice, a modified NR maximization procedure such as the step-halving method can be used to guarantee that each step increases the likelihood.

Lets = (1, . . . , N) be the latent vector, and let ˆh

(k)

= (ˆb(k), ˆ2(k), ˆx(k)), where ˆx(k)= (vech ( ˆF(k)), ˆq(k), ˆk(k)), denote the estimates of h at the kth iteration. Given the hierarchical representation

(3), the complete data log likelihood is

c(h|Y, s) = − n+ N 2 log( 2) −1 2 N  i=1 log|Wi| − 1 22 N  i=1 {(Yi − Xib − dii)TWi−1(Yi − Xib − dii) + 2i} (10)

Given the current estimateh(k), the E-step calculates Q(h|h(k)) = E(c(h|Y, s)|Y, h(k)). From

(10), calculation of Q(h|h(k)) requires the expressions of ˆ(k)i = E(i|Y, h(k)) and ˆu(k)i = E(2i|Y, h(k)), which are readily evaluated according to (6). That is,

ˆ(k)i = ˆ(k)i + ˆi(k)ˆ(k)i and ˆu(k)i = ˆ2(k) i + ˆ 2(k) i + ˆ (k) i ˆ(k)i ˆ (k) i (11)

where ˆ(k)i and ˆ(k)i in (11) arei andi in (5) evaluated ath = ˆh

(k)

, and ˆ(k)i = (ˆ (k)i )/(ˆ (k)i ) with ˆ (k)i = ˆ(k)

i /ˆ

(k)

i . The CM steps then conditionally maximize Q(h|ˆh

(k)

) with respect to h,

obtaining a new estimate ˆh(k+1). To update ˆb(k) and ˆ2(k), the closed-form maximizers for ˆb(k+1)

and ˆ2(k+1)are given by ˆb(k+1)i = N i=1 XTi ˆW−1(k)i Xi −1N i=1 XTi ˆW−1(k)i (Yi− ˆd(k)i ˆ(k)i ) ˆ2(k+1)= 1 n+ N N i=1 ˆei(k+1)TW−1(k)i ˆe(k+1)i − 2 N  i=1 ˆi(k)ˆd (k)T i ˆW −1(k) i ˆe(k+1)i + N i=1 ˆu(k)i (1 + ˆd(k)i TˆW−1(k)i ˆd(k)i )  whereˆe(k+1)i = Yi− Xiˆb (k+1) i , ˆd(k)i = ZiˆF(k)ˆd (k) and ˆW(k)i = ZiˆF2(k)ZTi + Ci(ˆq(k)) − ˆd(k)i ˆd(k) T i .

(6)

Note that the maximization of Q(h|ˆh(k)) with respect to x = (F, q, k) does not have analytical solutions. We then perform a CML step by maximizing the constrained actual likelihood function with respect tox using the following one-cycle NR procedure:

ˆx(k+1)= ˆx(k)+ r(k)ˆI−1(k)xx ˆs(k)x

where ˆs(k)x and ˆI(k)xx are evaluated by using the current estimates, and the scaling number r(k), 0<r(k)1, is chosen at each iteration such that the actual likelihood function is non-decreasing in k.

The iterations are repeated until a suitable convergence rule is satisfied, e.g.ˆh(k+1)− ˆh(k) is sufficiently small. Unfortunately, there are instances in which this method is notoriously slow to converge. To accelerate the convergence, a faster hybrid ECME-NR algorithm is performed by running a moderate number of ECME iterations from a poor starting value until arriving in a near neighborhood of the ML estimates, and then switching to the NR algorithm (9) to speed up the convergence. Under some regularity conditions, the asymptotic variance–covariance estimates can be approximated by plugging the ML estimate ˆh = (ˆb, ˆ2, ˆF, ˆq, ˆk) into the inverse of the observed

information matrix.

4. A SCORE TEST FOR THE SKEWNESS PARAMETER

To assess whether the skewness parameterk is significantly different from zero, we use the score

test for testing the null hypothesis H0 : k = 0 against the alternative H1 : k = 0. The principal

advantage of using the score test in comparison with other competitive testing procedures such as the likelihood ratio or Wald tests is that it does not require the more complex model to be fitted. To obtain the score test statistic, we first calculate ML estimates under the null hypothesis, denoted by ˆh0, and then evaluate the score vector and observed information ath = ˆh0, denoted by[sh]ˆh0

and[Ihh]ˆh

0, respectively. Note that

shˆh

0 has all components equal to 0 except for the derivative

with respect tok.

Letf = (b, 2, vech(F), q), so that h = (f, k). The score vector and observed information matrix

can be reexpressed as sh= (sTf, sTk)T and Ihh= Iff Ifk Ikf Ikk

respectively. The score test statistic Usis

Us= [sh]Tˆh0[Ihh]−1ˆh 0[sh]ˆh0= [sk] T ˆh0[Ikk·f] −1 ˆh0[sk]ˆh0 (12)

where sk= (iN=1i* i/*1, . . . ,iN=1i* i/*q)Tand Ikk·f= Ikk− IkfI−1ff Ifk. Explicit

expres-sions for skand Ikk·f are shown in Appendix A.

Under H0, Us is asymptotically a chi-squared random variable with q degrees of freedom. A

disadvantage of the score test statistic is that it tends to give too few significant results since its asymptotic distribution is approached more slowly than that of the likelihood ratio test statistic in small samples. This is due to the fact that the score test is a first-order approximation to the likelihood ratio test. For a practical perspective, we suggest that one performs the likelihood

(7)

ratio test to yield more accurate results when the score test gives a weak indication that the null hypothesis is inappropriate, say at the 10 per cent significance level.

5. INFERENCE FOR RANDOM EFFECTS AND PREDICTION

We consider an empirical Bayes inference for the random effects that is useful for evaluating subject-specific quantities of interest such as individual intercepts and slopes. From model (2), it implies that Yi | bi∼ Nni(Xib + Zibi, 

2C

i) and bi∼ SNq(0, 2C, k). The conditional density of bi given Yi is f(bi|Yi) = f(Yi|bi) f (bi) f(Yi|bi) f (bi) dbi = q(lbi|Yi, Rbi|Yi)(k T bbi) wherelbi|Yi= CZ T iK−1i (Yi − Xib), Rbi|Yi=  2(C−1+ ZT iC−1i Zi)−1 andkb= −1C−1/2k.

After some algebraic manipulations, the minimum mean-squared error (MSE) estimator of bi

obtained by the conditional mean of bi given Yi is given by

ˆbi(h) = (i)lbi|Yi + (i)  1+ kTbRbi|Yikb Rbi|Yikb (13) wherei= (1 + kTbRbi|Yikb)−1/2k T blbi|Yi.

In practice, the empirical Bayes estimator of bi, ˆbi, can be obtained by substituting the ML

estimate ˆh into (13). As a consequence, it leads to ˆbi= ˆbi(ˆh). We mention in passing that the

detailed expression for the error covariance matrix of ˆbi(h) is somewhat complicated and hence is

not shown here. Furthermore, we are interested in the prediction of yi, a future × 1 vector of

mea-surements of Yi, given the observed measurements Y=(YT(i), YTi)T, where Y(i)=(YT1, . . . , YTi−1, YTi+1, . . . , YTN)T.

Let xi and zi denote q× m1and q× m2matrices of prediction regressors corresponding to yi.

Consequently, we assume that  Yi yi  ∼ SNni+(Xib, Xi, ai) where Xi = (XTi, xTi)T, Zi = (ZTi, zTi)T,Xi= 2(ZiCZ∗Ti + Ci) and ai= (1 + d∗Ti W∗−1i di)−1/2 K∗1/2i W∗−1i di. Moreover, we note that di = ZiFd, Ki = ZiCZ∗Ti + Ci, Wi= Ki− did∗Ti and Ci = Ci(q) is an (ni + ) × (ni + ) expanded autocorrelation matrix. Let ti= X−1/2ai and let ti, Ci andXi be partitioned conformably with Yi = (YTi, yTi)T. Then,

ti = (t(1)Ti , t(2)Ti )T, Ci = C11 C12 C21 C22 Xi = X11 X12 X21 X22 = ZiCZTi + C11 ZiCzTi + C12 ziCZiT+ C21 ziCzTi + C22 (14)

(8)

Here the indices for the partitioned matrices in (14) are omitted for notational simplicity. Also, we note that C11= Ci, C12= CT21,X11= Ki andX12= XT21. Making use of the fact that

f(yi|Yi, h) ∝



f(yi|Yi, i, h) f (Yi|i, h) f (i|h) di

the conditional distribution of yi given Yi is

f(yi|Yi, h) = (l2·1, X22·1)(t T i(Yi − Xib)) (˜tT i(Yi − Xib)) (15) where l2·1= xib + X21X−111(Yi − Xib), X22·1= X22− X21X−111X12 ˜ti = t (1) i + X−111X12t(2)i  1+ t(2)Ti X22·1t(2)i

The minimum MSE predictor of yi is the conditional expectation of yi given Yi, i.e.

ˆyi(h) = E(yi|Yi, h) = l2·1+ (˜t T i(Yi − Xib)) (˜tT i(Yi− Xib)) X22·1t(2)i  1+ t(2)Ti X22·1t(2)i (16)

Meanwhile, the MSE covariance matrix of predictor (16) is given by

E[(ˆyi(h) − yi)(ˆyi(h) − yi)T]=E[cov(yi|Yi)]

where cov(yi|Yi) = X22·1−  ˜tT i(Yi − Xib) + (˜t T i(Yi − Xib)) (˜tT i(Yi − Xib))  ×(˜tTi(Yi − Xib)) (˜tT i(Yi− Xib)) X22·1t(2)i ti(2)TX22·1 1+ t(2)Ti X22·1t(2)i (17)

Notice that expression (17) does not account for errors due to the estimation of unknown parameters and hence it tends to underestimate the true error covariance matrix. The prediction of

yi can be obtained by substituting the ML estimate ˆh into (16), leading to ˆyi= ˆyi(ˆh). Proofs of

(16) and (17) are sketched in Appendix B.

6. AN ILLUSTRATIVE EXAMPLE

The Framingham heart study has examined the role of serum cholesterol as a risk factor for the evolution of cardiovascular disease. Zhang and Davidian[20] proposed a semi-parametric approach to analyze a subset of the Framingham cholesterol data, which consist of gender, baseline age and

(9)

0 2 6 8 10 1.5 2.0 2.5 3.0 3.5 4.0 Years Cholesterol levels Male Female 4

Figure 1. Trajectories of cholesterol levels for 133 participants. The thicker solid line indicates the mean profile in the study.

cholesterol levels measured at the beginning of the study and then every 2 years over 10 years, for 200 randomly selected participants. Arellano-Valle et al.[14] analyzed the same data set by fitting a linear mixed-effects model to it with both random effects and within-subject errors following SN distributions. Both of them come to the same conclusion that the distribution of subject-specific intercepts is non-normal.

In this section, we revisit the Framingham cholesterol data with the aim of providing additional inferences for the use of SNLMM that are not considered in[14]. To simplify the illustration, we select 133 participants (60 men and 73 women) whose trajectories of cholesterol levels as well as covariates of interest are completely observed at the duration of follow-up time. Figure 1 displays the cholesterol profiles for the 133 participants and suggests that the underlying trend seems to be linear with time and increases with a slowly moving speed.

Assuming a linear growth model with subject-specific random intercepts and slopes, we fit a linear mixed model to the data as specified by[20]

Yi j= 0+ 1sexi+ 2agei+ 3ti j+ b0i+ b1iti j + i j, i = 1, . . . , 133, j = 1, . . . , 6 (18)

whereb = ( 0, 1, 2, 3)T are the fixed effects of explanatory variables and the random effects bi= (b0i, b1i)Tare assumed to have a bivariate SN distribution SN2(0, 2C, k) with k = (1, 2)T.

Moreover, the error terms ei= ( i 1, . . . , i 6)T are assumed identically and independently N6(0,

2I

(10)

Table I. ML estimation results for model (19) with the associated standard errors in parentheses, where F is the square root ofC such that C = F2.

Number of ˆ

b ˆF ˆ (ˆh) parameters AIC BIC

1.5740 1.6869 0.7068 0.2079 −100.89 8 217.78 240.90 (0.1713) (0.1253) (0.1238) (0.0104) −0.0338 0.9403 (0.0619) (0.1936) 0.01860 (0.0040) 0.2787 (0.0274)

over time. For numerical stability as mentioned in[20], Yi j are the cholesterol level divided by 100

at the j th time point for the i th participant, sexi is a gender indicator(0 = female, 1 = male), agei

is the i th participant’s age at baseline and ti j is taken as(time − 5)/10, with time measured in

years from the baseline.

To test for the existence of skewness preference for random effects, we start by fitting an ordinary NLMM as follows:

Yi= Xib + Zibi + ei, bi∼ N2(0, 2C)

ei∼ N6(0, 2I6), bi⊥ei, i = 1, . . . , 133

(19)

The resulting ML estimates, together with the maximized log-likelihood value (ˆh), and two penalized likelihood information criteria, AIC and BIC values (AIC =−2(ˆh)+2m; BIC = −2(ˆh)+

m log(N)), where m is the number of parameters, corresponding to the fitted NLMM (19) are listed

in Table I. The score test statistic for testing evidence of skewness among random effects gave a value of 12.17, which is highly significant compared with the 22 distribution. The score test suggests that the distribution of random effects is probably skewed. Figure 2 depicts histograms and corresponding normal quantile plots of the empirical Bayes estimates of bi, ˆbi= ˆCZTi(ZiˆCZTi + I6)−1(Yi− Xiˆb), and shows that the subject-specific intercepts are positively skewed. Conversely,

there are no apparent non-normal patterns for subject-specific slopes. We thus consider an alternate model

Yi= Xib + Zibi+ ei, bi∼ SN2(0, 2C, k)

ei∼ N6(0, 2I6), bi⊥ei, i = 1, . . . , 133

(20)

The summarized ML results corresponding to the fitted SNLMM (20) are shown in Table II. All estimates are significant (compared with their standard errors) with the exception of2. The results

reveal that the joint distribution of random effects may, to some extent, depart from normality. Accordingly, the fitted model (20) produces smaller AIC and BIC values than those of model (19), indicating that model (20) for characterizing a more plausible assumption on the random effects is our preferred choice.

After fitting model (19) to the Framingham cholesterol data, the individual random effects, ˆbi= (bi 1, bi 2), i = 1, . . . , 133, can be estimated by using (13) as stated in Section 5. Figure 3

(11)

-0.5 0.0 0.5 1.0 0 5 10 15 20 25 30

Empirical Bayes estimates of subject specific intercepts

Frequency

Quantiles of Standard Normal

-2 -1 0 1 2 -0.5 0.0 0.5 1.0 (a) -0.4 -0.2 0.0 0.2 0.4 0 10 20 30 40

Empirical Bayes estimates of subject specific slopes

Frequency

Quantiles of Standard Normal

-2 -1 0 1 2 -0.4 -0.2 0.0 0.2 0.4 (b)

Figure 2. Histogram and normal Q–Q plots of empirical Bayes estimates of: (a) subject-specific intercepts and (b) subject-specific slopes.

Table II. ML estimation results for model (20) with the associated standard errors in parentheses, where F is the square root ofC such that C = F2.

Number of ˆ

b ˆF ˆ kˆ (ˆh) parameters AIC BIC

1.2898 2.6799 0.2752 0.2077 4.7052 −94.18 10 208.36 237.26 (0.1731) (0.1977) (0.0883) (0.0103) (1.1702) −0.0382 0.9398 0.0000 (0.0614) (0.1569) (0.5173) 0.0151 (0.0036) 0.2341 (0.0273)

presents a scatter plot of estimated random effects overlaid on a set of contour lines, together with summary histograms of the marginal densities. A visual inspection of Figure 3 signifies that there is considerable asymmetry among the estimated random effects. Moreover, the contour level curves appear to follow somewhat satisfactorily the trend of a scattergram, reflecting the adequacy of using a bivariate SN distribution for the random effects.

(12)

random intercepts random slopes 0 0.3 −0.3 −0.3 0 0.3 0.6 0.9 1.2 1.5 1.8 0.6

Figure 3. Scatter plot of estimated random effects overlaid on a set of contour lines, together with summary histograms of the marginal densities.

7. SIMULATIONS

The prediction problem for longitudinal data is of great importance in a number of practical applications. As pointed out by Rao[21] and Lee [22], the predictive accuracy of future observations can be taken as an alternative measure of ‘fitness’ of the data. In this section, we present a simulation study in which attention is focused on comparing the predictive abilities of the NLMM and SNLMM models. In the simulations, we generate 500 Monte Carlo data sets from the model:

Yi j= 0+ ti j 1+ wi 2+ bi+ i j i= 1 . . . , N, j = 1, . . . , 6 (21)

where ti j= j, wi= 1 if iN/2 and is zero otherwise, 1= 2, 2= 1, i j∼ N(0, 22) and 0+ bi is

taken to have the Gamma(2, 1) distribution with density f (x) = x exp(−x) for yielding a highly right skewed distribution. For each generated data set, model (21) will be fitted twice under the model assumptions outlined in Section 2, with the density of bi represented by a SN distribution

and a normal distribution, respectively.

We use the pseudo-cross-validation approach to assess their predictive performances. This ap-proach of comparing forecasts with the corresponding actual values is in the spirit of cross validation of Stone[23] and Geisser [24]. The technique proceeds as follows: (i) drop out the last three mea-surements yi 4, yi 5, yi 6 on the i th participant; (ii) compute ML estimates using the remaining data

as the sample; (iii) prediction of yi= (yi 4, yi 5, yi 6), denoted by ˆyi= ( ˆyi 4, ˆyi 5, ˆyi 6), is made by

using formula (16). For each of the 500 generated Monte Carlo data sets, the procedure is repeated across subjects i= 1, . . . , N.

To evaluate prediction accuracies, the quantity of the empirical mean-squared forecast error

MSFE= 1 3N N  i=1 (ˆyi− yi)T(ˆyi− yi)

is calculated as a discrepancy measure. On the basis of this measure, the best predictor corresponds to having the smallest MSFE from a given set of alternatives.

We consider the cases with sample sizes N= 30, 50, 100, 150 and comparisons of both predic-tors based on 500 replications are summarized in Table III. As expected, the numerical results indicated that the SNLMM predictor performs encouragingly well in all cases. In all 2000

(13)

Table III. Comparison of prediction accuracy of NLMM and SNLMM predictors in terms of mean and standard deviation of 500 simulated MSFEs.

N= 30 N= 50 N= 100 N= 150

Predictor Mean SD Mean SD Mean SD Mean SD

NLMM 1.2719 0.2010 1.2920 0.1593 1.2974 0.1116 1.3016 0.0876 SNLMM 1.2184 0.1936 1.2359 0.1516 1.2380 0.1063 1.2413 0.0826 0 2 4 6 8 10 12 0 10 20 30 40 N=50 % improvement (b) 0 2 4 6 8 10 12 0 10 20 30 40 N=30 % improvement (a) 0 2 4 6 8 10 12 0 10 20 30 40 N=100 % improvement (c) 0 2 4 6 8 10 12 0 10 20 30 40 N=150 % improvement (d)

Figure 4. Percentage improvement of the SNLMM predictor over the NLMM predictor for 500 simulated MSFEs: (a) N= 30; (b) N = 50; (c) N = 100; and (d) N = 150.

generated simulation data, the SNLMM predictors yield better forecasts (with lower MSFEs) than the classical NLMM predictors. Judging from this, it is important to emphasize that model specification with regard to random effects is noteworthy in prediction. Figure 4 displays the per-centage improvement of the SNLMM predictor over the NLMM predictor. It is readily seen that the gains in predictive accuracies obtained from the SNLMM model appear striking.

8. CONCLUSIONS

We have proposed an approach to the analysis of linear mixed models using a multivariate SN distribution for the random effects, which will allow practitioners to fit longitudinal data in

(14)

a wide variety of considerations. We have described a normal-truncated normal hierarchy for the SNLMM model and presented a hybrid ECME-NR algorithm for dealing with ML estimation in a flexible complete data framework. Moreover, the score test as well as prediction procedures considered in this paper are easy to implement once the ML estimates are obtained. Numerical results illustrated in Section 6 indicate that the SNLMM model for the Framingham cholesterol data is evidently more adequate than the conventional NLMM model. The simulation study shows that an appropriate specification of random effects can enhance predictive abilities.

There are a number of possible extensions of the current work. A natural generalization is to robustify the skew-normal distribution using a broader family such as the skew t [25–27] and the skew-elliptical[8, 28] distribution. Another worthwhile task is to develop workable Markov chain Monte Carlo algorithms in a Bayesian framework of hierarchical SNLMM models. Finally, one referee pointed out that it is also of interest to compare SNLMM and t linear mixed models [2, 29, 30] with respect to their robust inferences.

APPENDIX A: THE SCORE FUNCTION AND EMPIRICAL INFORMATION MATRIX

The score vector sh is the vector of the first derivatives of (8) with respect toh = (b, , x), where x = (vech(F), q, d) with vech(F) being the distinct elements of F. Expressions for the elements

of sh are sb=*(h|Y) *b = 1 2 N  i=1 XTiK−1i ei− 1  N  i=1 ini s=*(h|Y) * = − n + 1 3 N  i=1 eTiK−1i ei − 1  N  i=1 i i [sx]r=*(h|Y) *r = − 1 2 N  i=1 tr(K−1i ˙Kir) + 1 22 N  i=1 eTiK−1i ˙KirK−1i ei+ N  i=1 i˙ ir

wherei and i are as in (7),

ni= X T iW−1i di  1+ dTiW−1i di ˙Kir=*Ki *r and ˙ ir= * i *r Note that ˙Kir= ⎧ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩ Zi *F * fab FZTi + ZiF *F * fab ZTi if r= fab (a = 1, . . . , q; b = a, a + 1, . . . , q) *Ci *a if r= a (a = 1, . . . , g) 0 if r= b (b = 1, . . . , q) and ˙ ir= (˙dT irWi−1− dTiW−1i ˙WirW−1i )ei 1+ dTiW−1i di i(2˙dTirWi−1di− dTiW−1i ˙WirW−1i di) 2(1 + dTiWi−1di)

(15)

where ˙dir= *di *r = ⎧ ⎪ ⎪ ⎪ ⎪ ⎨ ⎪ ⎪ ⎪ ⎪ ⎩ Zi *F * fab d if r= fab 0 if r= a ZiF*d *b if r= b

and ˙Wir= *Wi/*r= ˙Kir− ˙dirdTi − di˙dTir. Here,*F/* fabis clearly a binary indicator matrix and

the elements of*Ci/*a and*d/*b can be obtained straightforwardly.

The observed information Ihh, obtained by the negative of the second derivative of (8), has the following components: Ibb = 1 2 N  i=1 (XT iKi−1Xi + i( i + i)ninTi) Ib = 2 3 N  i=1 XTiK−1i ei − 1 2 N  i=1 i(1 − i i− 2i)ni [Ibx]r = 1 2 N  i=1 XTiK−1i ˙KirK−1i ei+ 1  N  i=1 (˙irni + i˙nir) I = −n 2 + 3 4 N  i=1 eTiK−1i ei+ 1 2 N  i=1 i i( 2i + i i − 2) [Ix]r = 1 3 N  i=1 eiTKi−1˙KirK−1i ei + 1  N  i=1 (˙ir i+ i˙ ir) [Ixx]r s = 1 2 N  i=1 [tr(K−1i ¨Kir s) − tr(K−1i ˙Ki sK−1i ˙Kir)] + 1 2 N  i=1 eiTKi−1˙KirK−1i ˙Ki sK−1i ei − 1 22 N  i=1 eTiK−1i ¨Kir sK−1i eiN  i=1 (˙ir˙ ir+ i¨ ir s) where ˙ir = *i *r= −  i( i+ i)˙ ir ˙nir = *ni *r= − XTi(W−1i ˙WirW−1i di− Wi−1˙dir)  1+ dTiW−1i dini(2˙dTirW−1i di− dTiW−1i ˙WirW−1i di) 2(1 + dTiW−1i di) ¨ ir s = *2 i *r*s = Aei 1+ dTiW−1i di(˙dTirW−1i − dTiW−1i ˙WirWi−1)ei(2˙dTi sW−1i di − dTiW−1i ˙Wi sWi−1di) 2(1 + dTiW−1i di)3/2

(16)

˙ i s(2˙dTirW−1i di − dTiW−1i ˙WirW−1i di) 2(1 + dTiW−1i di) iB 2(1 + dTiW−1i di) i(2˙dTirW−1i di− dTiW−1i ˙WirWi−1di)(2˙di sTWi−1di− dTiW−1i ˙Wi sW−1i di) 2(1 + dTiW−1i di)2 with A= ¨dTir sW−1i − ˙dirTW−1i ˙Wi sW−1i − ˙dTi sW−1i ˙WirW−1i + dTiWi−1˙Wi sW−1i ˙WirW−1i − dT iW−1i ¨Wir sW−1i + dTiW−1i ˙WirW−1i ˙Wi sWi−1 B= 2(¨dTir sW−1i di − ˙dirTW−1i ˙Wi sW−1i di + ˙dTirW−1i ˙di s− ˙dTi sW−1i ˙WirW−1i di + dT iW−1i ˙Wi sW−1i ˙WirW−1i di) − d T iW−1i ¨Wir sW−1i di ¨Wir s= ¨Kir s − ¨dir sdTi − ˙dir˙dTi s− ˙di s˙dTir− di¨dir sT ¨Kir s= *Ki *r*s= ⎧ ⎪ ⎪ ⎨ ⎪ ⎪ ⎩ *2 Ci *ab if r= a and s= b 0 o.w. and ¨dir s= * 2 di *r*s = ⎧ ⎪ ⎨ ⎪ ⎩ Zi *F * fab *d *c ifr= fab and s= c 0 o.w.

APPENDIX B: PROOFS OF (16) AND (17)

By applying Proposition 4 of[6], the moment generating function of the conditional density in (15) is given by Myi|Yi(t) = exp  tTl2·1+12tTX22·1t  (˜tT i(Yi− Xib))  (t(1)Ti (Yi − Xib) + t (2)T i (yi − xib)) ×(2)/2|X22·1|−1/2exp{−12(yi− l2·1− X22·1t)TX−122·1(yi− l2·1− X22·1t)}dyi =exp  tTl2·1+12tTX22·1t  (˜tT i(Yi− Xib))  ⎛ ⎝˜tT i(Yi− Xib) + t(2)Ti X22·1t  1+ t(2)Ti X22·1t(2)i⎠ (t ∈ R)

(17)

Differentiating Myi|Yi(t) with respect to t we get My i|Yi(t) = (l2·1+ X22·1t) exp(tTl2·1+12tTX22·1t) (˜tT i(Yi − Xib))  ⎛ ⎝˜tT i(Yi − Xib) + t (2)T i X22·1t  1+ t(2)Ti X22·1t(2)i ⎞ ⎠ +exp  tTl2·1+12tTX22·1t (˜tT i(Yi − Xib)) ⎝˜tT i(Yi − Xib) + t(2)Ti X22·1t  1+ t(2)Ti X22·1t(2)i ⎞ ⎠ × X22·1t (2) i  1+ t(2)Ti X22·1t(2)i Substituting t= 0 into My

i|Yi(t), we complete the proof of (16). It therefore suffices to verify

that E(yiyTi|Yi, h) = * *tTM yi|Yi(t)   t=0 = X22·1+ l2·1lT2·1+ (˜t T i(Yi − Xib)) (˜tT i(Yi − Xib)) ⎡ ⎣ X22·1t(2)i lT2·1 1+ t(2)Ti X22·1t(2)i + l2·1t (2)T i X22·1  1+ t(2)Ti X22·1t(2)i − ˜tT i(Yi− Xib) X22·1t(2)i t(2)Ti X22·1 1+ t(2)Ti X22·1t(2)i ⎤ ⎦

Using the fact that cov(yi|Yi, h) = E(yiyiT|Yi, h) − E(yi|Yi, h)E(yi|Yi, h)T, the result (17) is

then proved.

ACKNOWLEDGEMENTS

The authors would like to thank Prof. Arellano-Valle for providing the Framingham cholesterol data at an early stage of this work. We are also grateful to the editor and two anonymous referees for their valuable comments and suggestions, which led to substantial improvements in the presentation of this work. This research was partly supported by the National Science Council of Taiwan (grant no. NSC95-2118-M-005-001-MY2).

REFERENCES

1. Laird NM, Ware JH. Random effects models for longitudinal data. Biometrics 1982; 38:963–974.

2. Pinheiro JC, Liu CH, Wu YN. Efficient algorithms for robust estimation in linear mixed-effects models using the multivariate t distribution. Journal of Computational and Graphical Statistics 2001; 10:249–276.

3. Verberk G, Lesaffre E. A linear mixed-effects model with heterogeneity in the random-effects population. Journal

of the American Statistical Association 1996; 91:217–221.

4. Lin TI, Lee JC. On modelling data from degradation sample paths over time. Australian and New Zealand

Journal of Statistics 2003; 45:257–270.

5. Lee JC, Lin TI, Lee KJ, Hsu YL. Bayesian analysis of Box–Cox transformed linear mixed models with ARMA( p, q) dependence. Journal of Statistical Planning and Inference 2005; 133:435–451.

(18)

6. Azzalini A, Dalla Valle A. The multivariate skew-normal distribution. Biometrika 1996; 83:715–726.

7. Azzalini A, Capitaino A. Statistical applications of the multivariate skew-normal distribution. Journal of the

Royal Statistical Society, Series B 1999; 61:579–602.

8. Sahu SK, Dey DK, Branco MD. A new class of multivariate skew distributions with application to Bayesian regression models. Canadian Journal of Statistics 2003; 31:129–150.

9. Arellano-Valle RB, Genton MG. On fundamental skew distributions. Journal of Multivariate Analysis 2005; 96:93–116.

10. Gonz´alez-Farias G, Dom´ınguez-Molina A, Gupta AK. Additive properties of skew normal random vectors. Journal

of Statistical Planning and Inference 2004; 126:521–534.

11. Liseo B, Loperfido N. A Bayesian interpretation of the multivariate skew-normal distribution. Statistics and

Probability Letters 2003; 61:395–401.

12. Azzalini A. The skew-normal distribution and related multivariate families (with discussion). Scandinavian Journal

of Statistics 2005; 32:159–200.

13. Arellano-Valle RB, Azzalini A. On the unification of families of skew-normal distributions. Scandinavian Journal

of Statistics 2006; 33:561–574.

14. Arellano-Valle RB, Bolfarine H, Lachos VH. Skew-normal linear mixed models. Journal of Data Science 2005; 3:415–438.

15. Genton MG. Skew-elliptical Distributions and Their Applications. Chapman & Hall: New York, 2004. 16. Thisted RA. Elements of Statistical Computing: Numerical Computation. Chapman & Hall: New York, 1988. 17. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm (with

discussion). Journal of the Royal Statistical Society, Series B 1977; 39:1–38.

18. Meng XL, Rubin DB. Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika 1993; 80:267–278.

19. Liu CH, Rubin DB. The ECME algorithm: a simple extension of EM and ECM with faster monotone convergence.

Biometrika 1994; 81:633–648.

20. Zhang D, Davidian M. Linear mixed models with flexible distributions of random effects for longitudinal data.

Biometrics 2001; 57:795–802.

21. Rao CR. Prediction of future observations in growth curve models. Statistical Science 1987; 2:434–471. 22. Lee JC. Prediction and estimation of growth curve with special covariance structures. Journal of the American

Statistical Association 1988; 83:432–440.

23. Stone M. Cross-validatory choice and assessment of statistical prediction (with discussion). Journal of the Royal

Statistical Society, Series B 1974; 36:111–147.

24. Geisser S. The predictive sample reuse method with applications. Journal of the American Statistical Association 1975; 70:320–328.

25. Jones MC, Faddy MJ. A skew extension of the t-distribution, with applications. Journal of the Royal Statistical

Society, Series B 2003; 65:159–174.

26. Azzalini A, Capitaino A. Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution. Journal of the Royal Statistical Society, Series B 2003; 65:367–389.

27. Lin TI, Lee JC, Hsieh WJ. Robust mixture modeling using the skew t distribution. Statistics and Computing 2007; 17:81–92.

28. Branco MD, Dey DK. A general class of multivariate skew elliptical distributions. Journal of Multivariate

Analysis 2001; 79:99–113.

29. Lin TI, Lee JC. A robust approach to t linear mixed models applied to multiple sclerosis data. Statistics in

Medicine 2006; 25:1397–1412.

30. Lin TI, Lee JC. Bayesian analysis of hierarchical linear mixed modeling using the multivariate t distribution.

數據

Figure 1. Trajectories of cholesterol levels for 133 participants. The thicker solid line indicates the mean profile in the study.
Table I. ML estimation results for model (19) with the associated standard errors in parentheses, where F is the square root of C such that C = F 2 .
Figure 2. Histogram and normal Q–Q plots of empirical Bayes estimates of: (a) subject-specific intercepts and (b) subject-specific slopes.
Figure 3. Scatter plot of estimated random effects overlaid on a set of contour lines, together with summary histograms of the marginal densities.
+2

參考文獻

相關文件

We are not aware of any existing methods for identifying constant parameters or covariates in the parametric component of a semiparametric model, although there exists an

• Many statistical procedures are based on sta- tistical models which specify under which conditions the data are generated.... – Consider a new model of automobile which is

Conditional variance, local likelihood estimation, local linear estimation, log-transformation, variance reduction, volatility..

Primal-dual approach for the mixed domination problem in trees Although we have presented Algorithm 3 for finding a minimum mixed dominating set in a tree, it is still desire to

important to not just have intuition (building), but know definition (building block).. More on

We showed that the BCDM is a unifying model in that conceptual instances could be mapped into instances of five existing bitemporal representational data models: a first normal

per-user incomplete discrete ratings predicted continuous ratings as individual: RMSE 24 :7433, worse than MF

Finally, discriminate analysis and back-propagation neural network (BPN) are applied to compare business financial crisis detecting prediction models and the accuracies.. In