Journal of Statistical Planning and Inference (journal homepage: www.elsevier.com/locate/jspi)

A robust approach to joint modeling of mean and scale covariance for longitudinal data

Tsung-I Lin a,∗, Yun-Jen Wang b

a Department of Applied Mathematics and Institute of Statistics, National Chung Hsing University, Taichung 402, Taiwan
b Graduate Institute of Finance, National Chiao Tung University, Hsinchu 300, Taiwan

ARTICLE INFO

Article history: Received 7 November 2007; received in revised form 4 November 2008; accepted 12 February 2009; available online 3 March 2009.

Keywords: Covariance structure; Maximum likelihood estimates; Reparameterization; Robustness; Outliers; Prediction

ABSTRACT

In this paper, we propose a multivariate t regression model with its mean and scale covariance modeled jointly for the analysis of longitudinal data. A modified Cholesky decomposition is adopted to factorize the dependence structure in terms of unconstrained autoregressive and scale innovation parameters. We present three distinct representations of the log-likelihood function of the model and study the associated properties. A computationally efficient Fisher scoring algorithm is developed for carrying out maximum likelihood estimation. The technique for the prediction of future responses in this context is also investigated. The implementation of the proposed methodology is illustrated through two real-life examples and extensive simulation studies.

© 2009 Elsevier B.V. All rights reserved.

1. Introduction

In recent years, the method of joint modeling of the mean and covariance in the general linear model with multivariate normal errors, called the normal joint modeling model (NJMM) hereafter, was heuristically introduced by Pourahmadi (1999, 2000). The key advantages of such normal-error models include convenient statistical interpretation and computational ease in parameter estimation. Yet several weaknesses remain. For instance, the assumption of normality for the error terms may be questionable in many practical situations where atypical points exist or the underlying data exhibit thick tails. A number of authors in the literature have used a more thick-tailed distribution, such as the multivariate t distribution, in place of the normal for robust estimation of general linear models (Zellner, 1976; Lange et al., 1989; He et al., 2004). Robust estimation for linear mixed models using the multivariate t distribution has been studied by Welsh and Richardson (1997) and Pinheiro et al. (2001), among others. Specifically, a p-dimensional random vector Y is said to follow a multivariate t distribution with degrees of freedom (df) ν, mean vector μ and scale covariance matrix R if its probability density function is

f(Y | μ, R, ν) = [Γ((ν + p)/2) / (Γ(ν/2) (πν)^{p/2})] |R|^{−1/2} [1 + (Y − μ)^T R^{−1} (Y − μ)/ν]^{−(ν+p)/2}.

We shall use the notation Y ∼ T_p(μ, R, ν) to denote that Y follows the above distribution. The multivariate t distribution has attracted considerable attention over the past 20–30 years and has been applied in a wide variety of research fields; see Kotz and Nadarajah (2004) and the references therein.
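As a concrete aid, the density above is straightforward to evaluate numerically. The following Python sketch is ours, not part of the original paper; names and the use of scipy are illustrative choices:

```python
import numpy as np
from scipy.special import gammaln

def mvt_logpdf(y, mu, R, nu):
    """Log-density of the p-variate t distribution T_p(mu, R, nu)."""
    p = len(mu)
    dev = y - mu
    # quadratic form dev' R^{-1} dev via a linear solve (avoids explicit inverse)
    delta = dev @ np.linalg.solve(R, dev)
    _, logdet = np.linalg.slogdet(R)
    return (gammaln((nu + p) / 2.0) - gammaln(nu / 2.0)
            - 0.5 * p * np.log(np.pi * nu) - 0.5 * logdet
            - 0.5 * (nu + p) * np.log1p(delta / nu))
```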

∗ Corresponding author. Tel.: +886 4 22850420; fax: +886 4 22873028. E-mail address: tilin@amath.nchu.edu.tw (T.-I. Lin).


In this paper, we extend Pourahmadi's joint mean–covariance parameterization to general linear models whose error terms follow a multivariate t distribution, calling the result the t joint modeling model (TJMM), as a robust approach to the analysis of longitudinal data.

Suppose that repeated measurements of a continuous random variable are observed over time on each of m subjects. Let Y_i = (Y_{i1}, ..., Y_{in_i})^T be the response vector for the ith subject, measured at time points t_i = (t_{i1}, ..., t_{in_i})^T, which are allowed to be unevenly spaced, and let the associated covariates X_i = [x_{i1} ··· x_{ip}] form an n_i × p full-rank design matrix.

The TJMM is defined as

Y_i ∼ T_{n_i}(μ_i, R_i, ν) (i = 1, ..., m),   (1)

where μ_i = (μ_{i1}, ..., μ_{in_i})^T = X_i β is the mean response vector for subject i. Moreover, to ensure the positive definiteness of R_i = [σ_{ij}], we reparameterize it via the modified Cholesky decomposition as

L_i R_i L_i^T = D_i,   (2)

where D_i = diag{σ²_1, ..., σ²_{n_i}} and L_i is a unit lower triangular matrix whose (j, k)th below-diagonal entry is −φ_{jk}. Obviously, R_i^{−1} = L_i^T D_i^{−1} L_i. The parameters φ_{jk} and σ²_j in L_i and D_i are referred to as the autoregressive parameters and scale innovation variances of R_i, respectively. Note that the decomposition in (2) is unique and has several nice features. For a detailed discussion of the modified Cholesky decomposition, interested readers are referred to Pourahmadi (2001, Section 3.5).

Statistical interpretations of this reparameterization include: (a) the below-diagonal entries of L_i are the negatives of the autoregressive parameters, namely −φ_{jk}, in

Ŷ_{ij} = μ_{ij} + Σ_{k=1}^{j−1} φ_{jk} (Y_{ik} − μ_{ik}),

which is the linear least-squares predictor of Y_{ij} based on its predecessors; and (b) the diagonal entries of D_i are the scale innovation variances σ²_j = c^{−1} var(Y_{ij} − Ŷ_{ij}), where c = ν/(ν − 2).

To make the dimension of the unconstrained parameters φ_{jk} and log σ²_j more parsimonious, we model them using covariates in the spirit of Pourahmadi (1999), namely, for j = 1, ..., n_i and k = 1, ..., j − 1,

φ_{jk} = z_{jk}^T γ,   log σ²_j = w_j^T λ,   (3)

where z_{jk} and w_j are d × 1 and q × 1 covariate vectors, which can usually be determined as polynomials in the measurement times t_{ij} of degrees d − 1 and q − 1, respectively, and γ and λ are d × 1 and q × 1 vectors of unknown parameters. Note that γ and λ are assumed common to all the R_i's, so that all subjects share the same covariance structure. In other words, R_i depends on i only through its dimension n_i × n_i.
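To make the reparameterization concrete, the following sketch (our illustrative code, assuming polynomial covariates z_{jk} and w_j of the Poly-type form used later in Section 4) assembles R_i from (γ, λ) via (2)–(3):

```python
import numpy as np

def scale_cov(n, gamma, lam):
    """Assemble R_i from (gamma, lam) via the modified Cholesky
    decomposition L_i R_i L_i' = D_i of (2), with polynomial covariates."""
    d, q = len(gamma), len(lam)
    L = np.eye(n)
    for j in range(1, n):                        # rows 2..n (0-based index j)
        for k in range(j):
            z_jk = float(j - k) ** np.arange(d)  # (1, lag, lag^2, ...)
            L[j, k] = -z_jk @ gamma              # below-diagonal entry -phi_jk
    w = np.arange(1.0, n + 1)[:, None] ** np.arange(q)  # rows w_j'
    D = np.diag(np.exp(w @ lam))                 # sigma_j^2 = exp(w_j' lam)
    Rinv = L.T @ np.linalg.solve(D, L)           # R_i^{-1} = L' D^{-1} L
    return np.linalg.inv(Rinv)
```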

The rest of the paper is organized as follows. In the next section, we present three distinct representations of the log-likelihood function of the TJMM and describe a Fisher scoring algorithm for the implementation of ML estimation. Section 3 is devoted to the prediction issue. For illustration purposes, two real examples are presented in Section 4 and extensive simulation results are reported in Section 5. Finally, some concluding remarks are briefly summarized in Section 6, and the technical derivations are sketched in the appendices.

2. Computational aspects of parameter estimation

2.1. The log-likelihood function

For notational convenience, let r_i = Y_i − X_i β = {r_{ij}}_{j=1}^{n_i}, Δ_i(β, γ, λ) = r_i^T R_i^{−1} r_i, and let n = Σ_{i=1}^m n_i be the total number of observations. Denote by θ = (β, γ, λ, ν) the vector of population model parameters, where β = (β_1, ..., β_p), γ = (γ_1, ..., γ_d) and λ = (λ_1, ..., λ_q). Given independent observations Y = (Y_1, ..., Y_m), the log-likelihood function of θ corresponding to the TJMM can be written in three distinct representations as follows:

ℓ(θ | Y) = Σ_{i=1}^m log Γ((ν + n_i)/2) − m log Γ(ν/2) − (n/2) log(πν) − (1/2) Σ_{i=1}^m log|R_i| − (1/2) Σ_{i=1}^m (ν + n_i) log[1 + Δ_i(β, γ, λ)/ν]

= Σ_{i=1}^m log Γ((ν + n_i)/2) − m log Γ(ν/2) − (n/2) log(πν) − (1/2) Σ_{i=1}^m Σ_{j=1}^{n_i} log σ²_j − (1/2) Σ_{i=1}^m (ν + n_i) log[1 + (1/ν) Σ_{j=1}^{n_i} e²_{ij}]

= Σ_{i=1}^m log Γ((ν + n_i)/2) − m log Γ(ν/2) − (n/2) log(πν) − (1/2) Σ_{i=1}^m Σ_{j=1}^{n_i} w_j^T λ − (1/2) Σ_{i=1}^m (ν + n_i) log[1 + (r_i − Z_i γ)^T D_i^{−1} (r_i − Z_i γ)/ν],   (4)

(3)

where r̂_i = {r̂_{ij}}_{j=1}^{n_i} = {Σ_{k=1}^{j−1} r_{ik} z_{jk}^T γ}_{j=1}^{n_i} is the predictor of r_{ij} based on its predecessors r_{i,j−1}, ..., r_{i,1}, e_{ij} = σ_j^{−1}(r_{ij} − r̂_{ij}), and Z_i is an n_i × d matrix defined by

Z_i = [z(i, 1) ··· z(i, n_i)]^T,   z(i, j) = Σ_{k=1}^{j−1} r_{ik} z_{jk}.
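The third representation is computationally convenient because it avoids forming R_i explicitly. A minimal sketch of it in Python (ours; the inputs are assumed to be precomputed residuals and covariate matrices):

```python
import numpy as np
from scipy.special import gammaln

def tjmm_loglik(resids, Zs, Ws, gamma, lam, nu):
    """Third representation of the log-likelihood (4).
    resids: list of r_i = Y_i - X_i beta;  Zs: list of Z_i (n_i x d);
    Ws: list of matrices whose rows are w_j' (n_i x q)."""
    m = len(resids)
    n = sum(len(r) for r in resids)
    ll = -m * gammaln(nu / 2.0) - 0.5 * n * np.log(np.pi * nu)
    for r, Z, W in zip(resids, Zs, Ws):
        ni = len(r)
        u = r - Z @ gamma                  # r_i - Z_i gamma = L_i r_i
        Dinv = np.exp(-W @ lam)            # 1 / sigma_j^2
        delta = u @ (Dinv * u)             # Delta_i(beta, gamma, lam)
        ll += (gammaln((nu + ni) / 2.0) - 0.5 * (W @ lam).sum()
               - 0.5 * (nu + ni) * np.log1p(delta / nu))
    return ll
```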

Following Pourahmadi (1999), it can readily be seen that L_i r_i = r_i − r̂_i. Some useful properties of this fact are summarized in the following proposition.

Proposition 1. With e_{ij} and Δ_i(β, γ, λ) as defined in the previous paragraph, we have

(a) e_{ij} = σ_j^{−1}(r_{ij} − r̂_{ij}) ∼ t(ν) for j = 1, ..., n_i;
(b) n_i^{−1} Δ_i(β, γ, λ) = n_i^{−1} Σ_{j=1}^{n_i} e²_{ij} ∼ F(n_i, ν);
(c) B = (1 + ν^{−1} Σ_{j=1}^{n_i} e²_{ij})^{−1} ∼ Beta(ν/2, n_i/2).

The proof follows directly from the properties of the t distribution (see, e.g., Nadarajah and Kotz, 2005) and hence is omitted.
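Property (b) is the basis of the Snedecor-F diagnostic plot used in Section 4, and it is easy to verify by simulation. A small Monte Carlo sketch of ours, drawing the multivariate t as a normal scale mixture:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
ni, nu, R = 5, 6.0, np.eye(5)          # toy choices for illustration
m = 20000
# multivariate t via its scale mixture: Y = Z / sqrt(tau), tau ~ Gamma(nu/2, rate nu/2)
tau = rng.gamma(nu / 2.0, 2.0 / nu, size=m)
Y = rng.multivariate_normal(np.zeros(ni), R, size=m) / np.sqrt(tau)[:, None]
delta = np.einsum('ij,jk,ik->i', Y, np.linalg.inv(R), Y)
# Proposition 1(b): Delta_i / n_i ~ F(n_i, nu); the KS p-value should be large
print(stats.kstest(delta / ni, stats.f(ni, nu).cdf))
```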

2.2. The scoring algorithm for maximum likelihood estimation

In light of (4), no closed-form solution is available for the ML estimates, and numerical techniques such as the iterative Newton–Raphson or Fisher scoring algorithms are used instead to find optimal parameter estimates. Explicit expressions for the score vector and the Fisher information required by the Fisher scoring algorithm are given in this subsection. Taking the first derivative of (4) with respect to θ = (β, γ, λ, ν), a (p + d + q + 1) × 1 vector, leads to the score vector s_θ. The elements of s_θ are

s_β = Σ_{i=1}^m κ_i X_i^T R_i^{−1} r_i,

s_γ = Σ_{i=1}^m κ_i Z_i^T D_i^{−1} (r_i − Z_i γ),

s_λ = −(1/2) Σ_{i=1}^m Σ_{j=1}^{n_i} w_j + (1/2) Σ_{i=1}^m κ_i Σ_{j=1}^{n_i} e²_{ij} w_j,

s_ν = (1/2) Σ_{i=1}^m { ψ((ν + n_i)/2) − ψ(ν/2) − n_i/ν − log[1 + Δ_i(β, γ, λ)/ν] + κ_i Δ_i(β, γ, λ)/ν },   (5)

where

κ_i = (ν + n_i)/(ν + Δ_i(β, γ, λ))   (6)

and ψ(x) = (d/dx) log Γ(x) is the digamma function.

Computation of the Hessian matrix H_θθ, obtained from the second derivative of (4) with respect to θ, is given in Appendix A. To obtain the elements of the Fisher information matrix

I_θθ = −E(H_θθ) = −E(∂²ℓ(θ | Y)/∂θ ∂θ^T),   (7)

we summarize the formulae needed to compute the expectations of (A.1)–(A.10), the elements of the Hessian matrix, in the following proposition.

Proposition 2. For the multivariate t model (1) with reparameterization (3), the following hold:

(a) E[1/(ν + Δ_i(β, γ, λ))] = 1/(ν + n_i);

(b) E[κ_i/(ν + Δ_i(β, γ, λ))] = (ν + 2)/(ν(ν + n_i + 2));

(c) E(κ²_i r_i r_i^T) = [(ν + n_i)/(ν + n_i + 2)] R_i;

(d) E(κ_i Σ_{j=1}^{n_i} e²_{ij} w_j w_j^T) = Σ_{j=1}^{n_i} w_j w_j^T;

(e) E[κ²_i (Σ_{j=1}^{n_i} e²_{ij} w_j)(Σ_{j=1}^{n_i} e²_{ij} w_j^T)] = [(ν + n_i)/(ν + n_i + 2)] {(Σ_{j=1}^{n_i} w_j)(Σ_{j=1}^{n_i} w_j^T) + 2 Σ_{j=1}^{n_i} w_j w_j^T};

(f) E(κ_i Z_i^T D_i^{−1} Z_i) = Σ_{j=1}^{n_i} σ_j^{−2} ζ_j R_i ζ_j^T;

(g) E(κ_i z(i, j) r_{ij}) = Σ_{k=1}^{j−1} σ_{kj} z_{jk};

(h) E(κ_i Z_i^T D_i^{−1} W_i) = Σ_{j=1}^{n_i} σ_j^{−2} (Σ_{k=1}^{j−1} σ_{kj} z_{jk} w_j^T − ζ_j R_i ζ_j^T γ w_j^T),

where ζ_j = [z_{j1} ··· z_{j,j−1} 0 ··· 0] is a d × n_i matrix and

W_i = [w_1(r_{i1} − γ^T z(i, 1)) ··· w_{n_i}(r_{in_i} − γ^T z(i, n_i))]^T.   (8)

The detailed proof is sketched in Appendix B. Applying Proposition 2, the exact expressions for the elements of (7) are given by

I_ββ = Σ_{i=1}^m [(ν + n_i)/(ν + n_i + 2)] X_i^T R_i^{−1} X_i,   I_βγ = 0_{p×d},   I_βλ = 0_{p×q},   I_βν = 0_{p×1},

I_γγ = Σ_{i=1}^m [(ν + n_i)/(ν + n_i + 2)] { Σ_{j=1}^{n_i} σ_j^{−2} (Σ_{k=1}^{j−1} Σ_{ℓ=1}^{j−1} σ_{kℓ} z_{jk} z_{jℓ}^T) },

I_γλ = Σ_{i=1}^m [(ν + n_i)/(ν + n_i + 2)] { Σ_{j=1}^{n_i} σ_j^{−2} (Σ_{k=1}^{j−1} σ_{kj} z_{jk} w_j^T − Σ_{k=1}^{j−1} Σ_{ℓ=1}^{j−1} σ_{kℓ} z_{jk} z_{jℓ}^T γ w_j^T) },

I_γν = 0_{d×1},

I_λλ = (1/2) Σ_{i=1}^m [(ν + n_i)/(ν + n_i + 2)] { Σ_{j=1}^{n_i} w_j w_j^T − (Σ_{j=1}^{n_i} w_j)(Σ_{j=1}^{n_i} w_j^T)/(ν + n_i) },

I_λν = −Σ_{i=1}^m { [1/((ν + n_i)(ν + n_i + 2))] Σ_{j=1}^{n_i} w_j },

I_νν = (1/4) Σ_{i=1}^m { ψ'(ν/2) − ψ'((ν + n_i)/2) − 2n_i(ν + n_i + 4)/(ν(ν + n_i)(ν + n_i + 2)) },

where ψ'(x) = (d²/dx²) log Γ(x) stands for the trigamma function.

Let η = (γ, λ, ν) denote the set of parameters excluding the regression coefficients β, so that θ = (β, η). Owing to the fact that I_βη = 0, the Fisher information matrix (7) has the block-diagonal form

I_θθ = [ I_ββ  0
         0     I_ηη ],   where I_ηη = [ I_γγ     I_γλ     0
                                        I_γλ^T   I_λλ     I_λν
                                        0        I_λν^T   I_νν ].   (9)

Furthermore, since I_γν = 0, the ML estimates of γ and ν are asymptotically uncorrelated.

Thus one can carry out the scoring method to calculate the ML estimates of the parameters through the following steps:

Step 1: Choose a starting value θ̂^(0) = (β̂^(0), γ̂^(0), λ̂^(0), ν̂^(0)) and form the scale covariance matrices R̂_i^(0) = [σ̂_{ij}^(0)] using the modified Cholesky decomposition, for i = 1, ..., m.

Step 2: With the current estimates β̂^(h) and η̂^(h) = (γ̂^(h), λ̂^(h), ν̂^(h)) at the hth iteration, update η̂^(h) via the scoring step

η̂^(h+1) = η̂^(h) + Î_ηη^{(h)−1} ŝ_η^(h),

where ŝ_η^(h) = (ŝ_γ^(h), ŝ_λ^(h), ŝ_ν^(h)) and Î_ηη^(h) are s_γ, s_λ and s_ν in (5) and I_ηη in (9) evaluated at θ̂^(h) = (β̂^(h), η̂^(h)). The scale covariance matrix can then be updated through R̂_i^{(h+1)−1} = L_i^T(γ̂^(h+1)) D_i^{−1}(λ̂^(h+1)) L_i(γ̂^(h+1)).

Step 3: Obtain β̂^(h+1) by a generalized-least-squares step as

β̂^(h+1) = (Σ_{i=1}^m κ̂_i^{(h+1)} X_i^T R̂_i^{(h+1)−1} X_i)^{−1} (Σ_{i=1}^m κ̂_i^{(h+1)} X_i^T R̂_i^{(h+1)−1} Y_i),

where κ̂_i^{(h+1)} is κ_i in (6) with β, γ, λ and ν replaced by β̂^(h), γ̂^(h+1), λ̂^(h+1) and ν̂^(h+1), respectively.

An iterative optimization algorithm such as the scoring method requires good starting values θ̂^(0). For the starting values of β, γ and λ, a convenient choice is the ML estimates obtained by fitting the unbalanced joint mean–covariance normal regression model as described in Pan and MacKenzie (2003). As for ν, it is customary to use a relatively large starting value, say ν̂^(0) = 50, which corresponds to an initial assumption of near-normality for the error terms. The iterations are repeated until a suitable convergence rule is satisfied, e.g., until ‖θ̂^(h+1) − θ̂^(h)‖/‖θ̂^(h)‖ is smaller than a specified tolerance.
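Steps 1–3 translate into a short iterative loop. The following skeleton is our schematic rendering, not the authors' code: `score_eta`, `info_eta` and `build_Rinv` are hypothetical callables standing in for s_η of (5), I_ηη of (9) and the Cholesky-based map from (γ, λ) to R_i^{−1}.

```python
import numpy as np

def kappa(r, Rinv, nu):
    """kappa_i of (6): (nu + n_i) / (nu + Delta_i), Delta_i = r' R^{-1} r."""
    return (nu + len(r)) / (nu + r @ Rinv @ r)

def fisher_scoring(Ys, Xs, beta, eta, score_eta, info_eta, build_Rinv,
                   tol=1e-8, max_iter=500):
    """Schematic Fisher scoring loop for Steps 1-3 (a sketch, not the
    authors' implementation). eta packs (gamma, lam, nu) with nu last."""
    for _ in range(max_iter):
        resids = [y - X @ beta for y, X in zip(Ys, Xs)]
        # Step 2: scoring update of eta = (gamma, lam, nu)
        eta_new = eta + np.linalg.solve(info_eta(resids, eta),
                                        score_eta(resids, eta))
        # Step 3: generalized-least-squares update of beta
        Rinvs = [build_Rinv(len(y), eta_new) for y in Ys]
        kaps = [kappa(r, Ri, eta_new[-1]) for r, Ri in zip(resids, Rinvs)]
        A = sum(k * X.T @ Ri @ X for k, X, Ri in zip(kaps, Xs, Rinvs))
        b = sum(k * X.T @ Ri @ y for k, X, y, Ri in zip(kaps, Xs, Ys, Rinvs))
        beta_new = np.linalg.solve(A, b)
        # stop on the relative change of the full parameter vector
        old = np.concatenate([beta, eta])
        new = np.concatenate([beta_new, eta_new])
        beta, eta = beta_new, eta_new
        if np.linalg.norm(new - old) < tol * np.linalg.norm(old):
            break
    return beta, eta
```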

After convergence, the resulting solutions are the ML estimates, denoted by θ̂ = (β̂, γ̂, λ̂, ν̂). Under suitable regularity conditions (Zacks, 1971), the ML estimator θ̂ is asymptotically normally distributed. Because I_βη = 0 and I_γν = 0, it follows immediately that β̂ and η̂ = (γ̂, λ̂, ν̂), as well as γ̂ and ν̂, are asymptotically independent. In other words, asymptotic confidence regions and hypothesis tests for the regression coefficients β as well as the variance components η can be obtained by assuming that the ML estimators β̂ and η̂ have approximate N_p(β, V_β) and N_{d+q+1}(η, V_η) distributions, respectively. In practice, the asymptotic covariance matrices V_β and V_η are unknown; they can be estimated empirically by substituting the ML estimates into I_ββ^{−1} and I_ηη^{−1}, respectively.

3. Prediction

In this section we derive computing formulae for predicting future values of Y_i for the ith individual, one of the m individuals under consideration. Let ỹ_i be a future g × 1 vector of measurements of Y_i. We further assume that Y_i and ỹ_i jointly follow an (n_i + g)-variate t distribution. Let x̃_i be a g × p matrix of prediction regressors corresponding to ỹ_i, so that E(ỹ_i) = x̃_i β. Let z̃_{jk} and w̃_j, respectively, denote d × 1 and q × 1 vectors such that φ_{jk} = z̃_{jk}^T γ and log σ²_j = w̃_j^T λ for j = n_i + 1, ..., n_i + g and k = 1, ..., j − 1.

In the above setting, we then have

(Y_i^T, ỹ_i^T)^T ∼ T_{n_i+g}(X*_i β, R*_i, ν)

with X*_i = (X_i^T, x̃_i^T)^T and R*_i^{−1} = L*_i^T D*_i^{−1} L*_i, where L*_i is an (n_i + g) × (n_i + g) unit lower triangular matrix with −φ_{jk} at the (j, k)th position, k < j, j = 1, ..., n_i + g, and D*_i is a diagonal matrix with σ²_j as its jth diagonal entry. Explicitly, the rows of L*_i carry the below-diagonal entries −z_{jk}^T γ for j ≤ n_i and −z̃_{jk}^T γ for j = n_i + 1, ..., n_i + g, and

D*_i = diag{exp(w_1^T λ), ..., exp(w_{n_i}^T λ), exp(w̃_{n_i+1}^T λ), ..., exp(w̃_{n_i+g}^T λ)}.

Let R*_i be partitioned conformably with the dimensions of Y*_i = (Y_i^T, ỹ_i^T)^T, so that

R*_i = [ R_11  R_12
         R_21  R_22 ].   (10)

Here we suppress the subscript i of the partitioned matrices in (10) for notational convenience. Note that R_11 = R_i and R_21 = R_12^T. Making use of the conditional property of the multivariate t distribution, we have

ỹ_i | Y_i, θ ∼ T_g(μ_{2·1}, [(ν + Δ_i(β, γ, λ))/(ν + n_i)] R_{22·1}, ν + n_i),   (11)

where R_{22·1} = R_22 − R_21 R_11^{−1} R_12 is the Schur complement of R_11.

From (11), the minimum mean-squared error (MSE) predictor of ỹ_i is

ŷ̃_i(θ) = μ_{2·1} = x̃_i β + R_21 R_11^{−1} (Y_i − X_i β).   (12)

Consequently, the MSE covariance matrix of the predictor (12) is

E[(ŷ̃_i(θ) − ỹ_i)(ŷ̃_i(θ) − ỹ_i)^T] = E(cov(ỹ_i | Y_i)) = E{[(ν + Δ_i(β, γ, λ))/(ν + n_i − 2)] R_{22·1}} = [ν/(ν − 2)] R_{22·1}.   (13)

Typically, the predicted value of ỹ_i is obtained by substituting the ML estimate θ̂ into (12), leading to ŷ̃_i = ŷ̃_i(θ̂). Meanwhile, it should be noted that the variance estimate in (13) does not account for the error due to estimation of the unknown θ, and hence will underestimate the actual variance for smaller sample sizes.
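In code, (12) and (13) amount to a partitioned solve. A sketch of ours, taking the joint scale matrix R*_i partitioned as in (10):

```python
import numpy as np

def predict_future(y_obs, X_obs, X_new, beta, R_star, nu):
    """Minimum-MSE predictor (12) and its MSE covariance (13), nu > 2."""
    ni = len(y_obs)
    R11, R12 = R_star[:ni, :ni], R_star[:ni, ni:]
    R21, R22 = R_star[ni:, :ni], R_star[ni:, ni:]
    resid = y_obs - X_obs @ beta
    y_hat = X_new @ beta + R21 @ np.linalg.solve(R11, resid)   # (12)
    R22_1 = R22 - R21 @ np.linalg.solve(R11, R12)              # Schur complement
    mse_cov = nu / (nu - 2.0) * R22_1                          # (13)
    return y_hat, mse_cov
```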

4. Numerical illustrations

4.1. The tumor growth data

We apply the results developed in Sections 2 and 3 to the in vivo growth of lung tumors for the control group of 22 xenografted nude mice, originally reported by Rygaard and Spang-Thomsen (1997) and subsequently analyzed by Demidenko (2004, Chapter 10). This longitudinal study investigates the tumor growth rates for immune-deficient nude mice with implanted human tumor xenografts, with the 14th day after implantation defined as the baseline (day 0).

Fig. 1 depicts the logarithm of tumor growth volumes over an unequally spaced 28-day period for the 22 mice, together with the associated sample regressograms for the generalized autoregressive parameters and sample innovation variances. The profile plot in Fig. 1(a) reveals that the mean level exhibits a linear growth trend over time. The design matrix for the mean response for the 22 mice takes the form

X_i = [1 t],   1 = (1, 1, ..., 1)^T,   t = (0, 1, 2.5, 3.5, 4.5, 6, 7, 8, 10, 11.5, 13, 14)^T,   i = 1, ..., 22.

[Fig. 1. (a) Log tumor volume against days after tumor implantation for the 22 mice; (b) sample generalized autoregressive parameters against lag; (c) sample log innovation variances against time index.]


Table 1
Comparison of ℓ̂_max, number of parameters, and BIC values for some Poly(d, q) choices of NJMM and TJMM.

(d, q)    ℓ̂_max NJMM   ℓ̂_max TJMM   No. par. NJMM   No. par. TJMM   BIC NJMM   BIC TJMM
(2,5)     −25.1867     −15.9188     11              12              3.8352     3.1332
(2,6)     −24.7836     −15.1324     12              13              3.9391     3.2022
(3,3)     −23.0706     −13.3374     10              11              3.5023     2.7580
(3,4)     −23.0593     −13.3187     11              12              3.6418     2.8968
(3,5)b    −8.9331      −1.3094      12              13              2.4981     1.9456
(3,6)     −8.9254      −1.3092      13              14              2.6379     2.0860
(4,3)     −22.9333     −12.9712     11              12              3.6304     2.8652
(4,4)     −22.9260     −12.9443     12              13              3.7702     3.0033
(4,5)a    −6.0243      1.7060       13              14              2.3742     1.8119
(4,6)     −6.0180      1.7060       14              15              2.5141     1.9524
(5,3)     −22.9331     −12.9345     12              13              3.7709     3.0024
(5,4)     −22.9257     −12.9066     13              14              3.9107     3.1404
(5,5)     −6.0200      1.7065       14              15              2.5143     1.9524
(5,6)     −6.0102      1.7065       15              16              2.6539     2.0929

a Poly(4,5) is the best model. b Poly(3,5) is the second-best model.

Table 2
Parameter estimates for the best two Poly(d, q) choices of NJMM and TJMM.

          Poly(4,5)                                    Poly(3,5)
          NJMM                 TJMM                    NJMM                 TJMM
          MLE       SE         MLE       SE            MLE       SE         MLE       SE
β̂_0       5.2960    0.1071     5.3535    0.1062        5.2575    0.0944     5.3145    0.0917
β̂_1       0.1628    0.0071     0.1617    0.0066        0.1628    0.0073     0.1615    0.0067
γ̂_0       1.5646    0.1626     1.5787    0.1720        1.1649    0.0827     1.1745    0.0882
γ̂_1       −1.0816   0.1806     −1.0928   0.1899        −0.6121   0.0637     −0.6170   0.0678
γ̂_2       0.2555    0.0594     0.2587    0.0622        0.0970    0.0127     0.0975    0.0135
γ̂_3       −0.0248   0.0075     −0.0253   0.0078        −0.0047   0.0007     −0.0047   0.0008
γ̂_4       0.0009    0.0003     0.0009    0.0003        –         –          –         –
λ̂_0       6.8278    1.3174     6.0870    1.3882        6.1052    1.3174     5.3054    1.3880
λ̂_1       −11.3551  1.7462     −10.5501  1.8328        −10.2234  1.7462     −9.3982   1.8326
λ̂_2       4.7445    0.7534     4.4029    0.7908        4.2148    0.7534     3.8697    0.7907
λ̂_3       −0.8808   0.1402     −0.8241   0.1472        −0.7779   0.1402     −0.7199   0.1471
λ̂_4       0.0737    0.0117     0.0695    0.0123        0.0649    0.0117     0.0606    0.0123
λ̂_5       −0.0023   0.0004     −0.0022   0.0004        −0.0020   0.0004     −0.0019   0.0004
ν̂         –         –          7.6832    3.4450        –         –          7.7313    3.4756

Furthermore, Figs. 1(b) and (c) suggest that both the φ_{jk}'s and the log σ²_j's could be well fitted by polynomial functions in the lag of degree greater than or equal to three. As in Pourahmadi (2000), we use Poly(d, q) as shorthand for imposing two distinct polynomials of the lagged times j − k and j, with degrees d and q, on φ_{jk} and log σ²_j, respectively. Specifically, the covariates z_{jk} and w_j are chosen as

z_{jk} = (1, (j − k), (j − k)², ..., (j − k)^d)^T,
w_j = (1, j, j², ..., j^q)^T,   j = 1, ..., 12; k = 1, ..., j − 1.

To identify the optimal degrees (d, q), we use the BIC-based model selection criterion. The best pair is (d*, q*) = argmin_{(d,q)} BIC(d, q), with d and q lying within the range 0 to n*, where n* = max_{1≤i≤m} n_i,

BIC(d, q) = −(2/m) ℓ̂_max(d, q) + [(p + d + q + 3)/m] log(m),

and ℓ̂_max(d, q) is the maximized log-likelihood under Poly(d, q).
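The criterion is simple to script. Our sketch below uses the TJMM parameter count p + d + q + 3 from Table 1:

```python
import numpy as np

def bic_poly(loglik_max, m, p, d, q):
    """BIC(d, q) for a fitted TJMM-Poly(d, q); smaller is better."""
    n_par = p + d + q + 3                 # beta, gamma, lam and nu
    return -2.0 * loglik_max / m + n_par * np.log(m) / m
```

As a check, bic_poly(1.7060, 22, 2, 4, 5) returns approximately 1.8119, matching the TJMM-Poly(4,5) entry in Table 1.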

For comparison purposes, we fit the tumor data using NJMM and TJMM under various choices of Poly(d, q). The scoring algorithm was terminated once ‖θ̂^(h+1) − θ̂^(h)‖/‖θ̂^(h)‖ < 10^{−8}. The values of ℓ̂_max, together with the corresponding numbers of parameters and BIC values for the better-fitting pairs (d, q), are listed in Table 1. Judging from the BIC values, Poly(4,5) and Poly(3,5) are the two best choices for both NJMM and TJMM. Table 2 shows the ML estimates and the associated standard errors for the two best-fitting NJMM and TJMM. As seen in the table, it is noteworthy that the estimated df for the two fitted TJMM are somewhat small, reflecting the presence of heavier-than-normal tails in the errors.

[Fig. 2. Healy-type plots of fitted quantiles against (a) theoretical chi-square quantiles when an NJMM-Poly(4,5) model, and (b) theoretical Snedecor-F quantiles when a TJMM-Poly(4,5) model, is fitted to the tumor growth data.]

We consider diagnostics to assess the validity of the underlying assumptions on the error term. For NJMM, a formal measure for judging the joint normality of the residuals is the Mahalanobis-like distance Δ_i(β̂, γ̂, λ̂), which has an asymptotic chi-square distribution with n_i df. Normality can be checked by constructing Healy's (1968) plot (or the chi-square plot), a well-known diagnostic tool widely used in the context of multivariate normal theory.

As for TJMM, it is known from Proposition 1(b) that n_i^{−1} Δ_i(β̂, γ̂, λ̂) follows the F distribution with n_i and ν df. To assess the fit of TJMM, one can construct another Healy-type plot (or the Snedecor-F plot) by plotting the ordered F statistics against the quantiles of the F(n_i, ν) distribution at the nominal values (i − 0.5)/m, i = 1, ..., m. To compare the NJMM and TJMM fits graphically, one examines whether the corresponding Healy plot resembles a straight line through the origin with unit slope. In other words, serious departure of the scatter from the 45° line indicates a poor fit of the model.
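The Snedecor-F plot can be produced as in the following sketch (ours; `deltas` holds the fitted distances Δ_i(β̂, γ̂, λ̂), and a common n_i is assumed so that a single F distribution supplies the theoretical quantiles):

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

def healy_f_plot(deltas, ni, nu_hat):
    """Healy-type Snedecor-F plot: ordered values of Delta_i/n_i against
    F(n_i, nu) quantiles at nominal probabilities (i - 0.5)/m."""
    m = len(deltas)
    f_stats = np.sort(np.asarray(deltas) / ni)     # ordered F statistics
    probs = (np.arange(1, m + 1) - 0.5) / m
    q_theo = stats.f(ni, nu_hat).ppf(probs)        # theoretical quantiles
    plt.plot(q_theo, f_stats, "o")
    plt.axline((0.0, 0.0), slope=1.0)              # 45-degree reference line
    plt.xlabel("Theoretical Snedecor-F quantiles")
    plt.ylabel("Fitted quantiles")
    plt.show()
```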

Fig. 2 displays Healy's plots for Poly(4,5) under NJMM and TJMM. A comparison of the two plots indicates that the t model tracks the identity line more closely than its normal counterpart, revealing the substantial improvement provided by the use of the multivariate t distribution. Healy's plots for Poly(3,5) under NJMM and TJMM are similar and thus are not shown in the present paper.

One essential characteristic of longitudinal data is that measurements within each experimental unit are collected repeatedly over time, so they typically exhibit serial correlation. The predictive accuracy for future observations (Rao, 1987) can therefore be treated as an alternative measure of fit. We are interested in comparing the predictive performance of the proposed TJMM with that of the t linear mixed model (TLMM), a generalization of Laird and Ware's (1982) linear mixed models in which the random effects and within-subject errors follow a multivariate t distribution. We conducted a preliminary analysis to select an appropriate TLMM for the tumor growth curve data. The results suggest that the TLMM with random intercepts and slopes plus AR(1) dependence (ν̂ = 9.06, BIC = 2.84), denoted by TLMM-RIS-AR(1), is the most preferred choice among the selected normal/t mixed models. The prediction approach for TLMM with autoregressive dependence was described in Lin and Lee (2006). For ease of comparison, we use a pseudo-cross-validation approach to assess the relative predictive performance of TJMM-Poly(4,5) and TLMM-RIS-AR(1). This approach of comparing forecasts with the corresponding actual values is similar in spirit to the holdout cross-validation of Stone (1974) and the predictive sample reuse of Geisser (1975). The technique proceeds as follows: (i) hold out the last q measurements on the ith participant; (ii) compute the ML estimates using the remaining data as the sample; (iii) predict the q held-out measurements y_i = (y_{i,12−q+1}, ..., y_{i,12})^T by ŷ̃_i = (ŷ̃_{i,12−q+1}, ..., ŷ̃_{i,12})^T using formula (12). The procedure is repeated across the m subjects.

As a measure of predictive accuracy we consider the empirical mean squared forecast error (MSFE), defined as

MSFE = [1/(mq)] Σ_{i=1}^m (ŷ̃_i − y_i)^T (ŷ̃_i − y_i).
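In code, the criterion reads (our sketch):

```python
import numpy as np

def msfe(y_hat, y_true):
    """Empirical MSFE over m subjects with q held-out measurements each."""
    y_hat, y_true = np.asarray(y_hat), np.asarray(y_true)   # shape (m, q)
    return float(np.sum((y_hat - y_true) ** 2) / y_true.size)
```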

The prediction results for the first six step-ahead forecasts are shown in Table 3. The predictive performance of the TJMM predictor is substantially better than that of TLMM, with relative improvement percentages (RIPs) ranging from 12.3% to 24.2%.

4.2. The orthodontic growth data

This example, originally analyzed by Potthoff and Roy (1964), concerns the dental growth of distances (in millimeters) from the center of the pituitary to the pterygomaxillary fissure, measured repeatedly at ages 8, 10, 12 and 14 years for 11 girls and 16 boys.


Table 3
Comparison of predictive accuracies in terms of MSFE for two robust t models.

Model     q = 1     q = 2     q = 3     q = 4     q = 5     q = 6
TLMM      0.0372    0.1181    0.1021    0.1213    0.1158    0.1112
TJMM      0.0313    0.0971    0.0774    0.0991    0.0982    0.0975
RIP (%)   16.1      17.8      24.2      18.3      15.2      12.3

RIP denotes the relative improvement percentage (rate of MSFE improvement) of the TJMM predictor over the TLMM predictor.

Table 4
Parameter estimates of the Poly(1,1) model for the orthodontic growth data.

             NJMM                   TJMM
Parameter    MLE         SE         MLE         SE
β̂_0          16.0707     0.9829     16.5863     0.6851
δ̂_0          1.3198      1.5398     0.9819      0.9689
β̂_1          0.8122      0.0839     0.7713      0.0601
δ̂_1          −0.3341     0.1314     −0.2999     0.0850
γ̂_0          0.7337      0.1653     1.0051      0.1243
γ̂_1          −0.2188     0.0890     −0.3551     0.0679
λ̂_0          1.8898      0.3333     1.5295      0.2853
λ̂_1          −0.3145     0.1217     −0.3612     0.0947
ν̂            –           –          5.5165      2.0384
ℓ̂_max        −212.8414              −205.4788
BIC          16.7426                16.3193

Pinheiro et al. (2001) applied the TLMM to this data set and investigated the robustness of the t distribution via simulations. As in the preceding subsection, we compare the ML results under NJMM and TJMM for this example. Similarly to Pinheiro et al. (2001), we also illustrate the robustness of the proposed model by contaminating a single observation of this data set and examining the influence on the estimated parameters.

Assuming a linear growth function for the mean response, we have

μ_{ij} = β_0 + δ_0 I_i(F) + (β_1 + δ_1 I_i(F)) t_j,   i = 1, ..., 27, j = 1, ..., 4,

where μ_{ij} = E(Y_{ij}) is the mean orthodontic distance for the ith subject at age t_j; β_0 and β_1 denote the intercept and slope for boys, respectively; δ_0 and δ_1 denote the differences in intercept and slope between girls and boys, respectively; and I_i(F) is an indicator variable for females.

Since there are only four measurements for each boy and each girl, we assume a parsimonious Poly(1,1) model for the dependence structure. The corresponding ML estimates, along with the standard errors, ℓ̂_max and BIC under the normal and t settings, are shown in Table 4. In the table, there is strong evidence that the residuals exhibit a leptokurtic distribution (ν̂ = 5.52). Furthermore, the t model has a smaller BIC value, and all the standard errors of the parameter estimates under the t model are much smaller than those under the corresponding normal model. These results indicate that the fit of TJMM is clearly superior to that of NJMM in this example.

To further examine the robustness of the TJMM estimates against perturbation, we proceed by (a) adding a contaminated value δ to a single observation y_{ij}, denoted by y_{ij}(δ) = y_{ij} + δ; (b) re-estimating the parameters twice under the Poly(1,1) assumption, once for NJMM and once for TJMM; and (c) computing the relative change in the estimates, i.e., θ̂(δ)/θ̂ − 1, where θ̂ denotes one of the original estimates in Table 4 and θ̂(δ) the corresponding estimate for the contaminated data. In the orthodontic growth example, we illustrate this procedure by adding a contaminated value to the last observation (age 14 years) of the first girl, with δ varying between −20 and 20 mm in increments of 2 mm. Fig. 3 displays the percentage changes in the ML estimates under both models for different contaminations δ. It is readily seen that the parameter estimates from NJMM are highly affected by a single outlier, whereas the influence on TJMM is limited to a short range. This suggests that TJMM provides a favorable route to robust statistical inference.
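The perturbation scheme (a)–(c) is easily scripted; a sketch of ours, where `fit_model` is a hypothetical stand-in for the NJMM or TJMM fitting routine returning the estimates as an array:

```python
import numpy as np

def contamination_curve(Y, i, j, fit_model, theta_hat, deltas=None):
    """Relative change theta_hat(delta)/theta_hat - 1 when a single
    observation y_ij is perturbed to y_ij + delta and the model is refitted."""
    if deltas is None:
        deltas = np.arange(-20.0, 21.0, 2.0)    # -20 to 20 mm in 2 mm steps
    changes = []
    for delta in deltas:
        Y_pert = [y.copy() for y in Y]
        Y_pert[i][j] += delta                   # contaminated value y_ij(delta)
        theta_pert = fit_model(Y_pert)          # re-estimate all parameters
        changes.append(theta_pert / theta_hat - 1.0)
    return np.asarray(changes)
```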

5. Simulation studies

We first undertook a simulation study to examine the performance and flexibility of the proposed TJMM approach. The two main objectives of the study are: (1) when the data are simulated from a normal distribution, is the proposed model over-fitted? (2) when the data are simulated from a t distribution, how badly do the normal models behave? For convenience, the true parameters are chosen to mimic the tumor data fitted with TJMM-Poly(3,3), except that the df is specified at two different settings. The ML estimates of the parameters, treated as the true parameters, are β = (5.27, 0.16)^T, γ = (1.12, −0.58, 0.09, −0.0043)^T and λ = (−0.4, −1.37, 0.18, −0.007)^T.


[Fig. 3. Percentage change in the ML estimates under the normal and t joint modeling models with varying contamination δ of a single measurement.]

For the values of the df used in the study, we take a low value (ν = 4) to yield a heavy-tailed distribution and a high value (ν = 50) to approach normality. Note that the simulated longitudinal data are `unbalanced' in the sense of unevenly spaced measurement times, although each subject has the same number of observations. We also note that if the df is set too high, say far more than 50, the estimate of the df hardly converges and tends to infinity.

To examine the finite-sample properties, the sample sizes evaluated range from small (m = 25) to relatively large (m = 100). Simulations were run with a total of 500 replications for each combination of ν and m, and each simulated data set was fitted under both the TJMM and NJMM scenarios. The detailed numerical results, including the average ML estimates of the fixed effects (β_0, β_1), the autoregressive parameters (γ_0, γ_1, γ_2, γ_3) and the scale innovation variances (λ_0, λ_1, λ_2, λ_3), the average maximized log-likelihood ℓ̂_max, the average BIC and the median estimate of the df, together with their standard errors in parentheses, are summarized in Table 5. Following Taylor and Verbyla (2004), owing to the strong skewness of the Monte Carlo ML estimates, the median was chosen as an appropriate estimator of ν in the table.

When the data are generated near normality (ν = 50), both models produce similar estimates and standard errors, although the TJMM is slightly over-fitted, as reflected by the large standard errors of ν̂ and its slightly smaller BIC. Overall, the two models perform comparably well. When the data are generated from the t distribution with heavy tails (ν = 4), the TJMM yields significantly smaller biases, standard errors and BIC than the NJMM, reflecting its superior outlier resistance. Table 5 also suggests that the biases and standard errors decrease as the sample size increases, confirming the consistency of the estimators.

With the same data sets generated under ν = 4, we conducted a second simulation study to compare the modeling adequacy and prediction ability of TJMM and TLMM. The primary goal of this study is to investigate how much the TLMM misses when the scale covariance depends on covariates, i.e., when R_i has a non-stationary dependence structure. Table 6 displays the comparisons of average MSFEs for the first six step-ahead forecasts and the associated BIC values based on the 500 generated samples for each of the sample sizes m = 25, 100. As anticipated, the numerical results indicate that the TJMM produces substantial gains in prediction accuracy as well as modeling adequacy.

6. Concluding remarks

We present a robust approach to the analysis of longitudinal data in which the mean and the scale covariance are modeled jointly within a framework of multivariate t errors. In particular, the reparameterization scheme in (2) suggests a flexible way of modeling the variance components as functions of covariates, making the model more parsimonious than one with an arbitrary covariance structure. We develop an explicit procedure for ML estimation based on the scoring method, which can be easily implemented by practitioners. The numerical results in Section 4 indicate that the proposed TJMM is suitable for the two real-life applications and yields better performance in the prediction of future values. The simulation results suggest the proposed TJMM is a promising technique for longitudinal data with a nonstationary dependence structure. As a note in passing, the situation in


Table 5
Simulation results: average estimates of β, γ, λ, ℓ̂_max and BIC, and median estimates of ν, based on 500 replications. Values within parentheses are empirical standard errors.

                     m = 25, ν = 4         m = 25, ν = 50        m = 100, ν = 4        m = 100, ν = 50
Parameter (true)     TJMM      NJMM        TJMM      NJMM        TJMM      NJMM        TJMM      NJMM
β_0 (5.27)           5.2701    5.2732      5.2802    5.2721      5.2700    5.2674      5.2720    5.2696
                     (0.0841)  (0.1786)    (0.0788)  (0.0892)    (0.0497)  (0.0841)    (0.0372)  (0.0375)
β_1 (0.16)           0.1601    0.1602      0.1598    0.1599      0.1601    0.1601      0.1600    0.1599
                     (0.0070)  (0.0089)    (0.0065)  (0.0071)    (0.0035)  (0.0044)    (0.0033)  (0.0033)
γ_0 (1.12)           1.0984    1.0820      1.1038    1.0956      1.1141    1.1067      1.1184    1.1154
                     (0.1134)  (0.1566)    (0.1049)  (0.1095)    (0.0543)  (0.0851)    (0.0483)  (0.0517)
γ_1 (−0.58)          −0.5649   −0.5540     −0.5705   −0.5640     −0.5760   −0.5702     −0.5797   −0.5766
                     (0.0887)  (0.1227)    (0.0832)  (0.0852)    (0.0423)  (0.0665)    (0.0377)  (0.0401)
γ_2 (0.09)           0.0872    0.0852      0.0884    0.0872      0.0892    0.0881      0.0901    0.0894
                     (0.0181)  (0.0250)    (0.0171)  (0.0173)    (0.0086)  (0.0135)    (0.0077)  (0.0081)
γ_3 (−0.0043)        −0.0042   −0.0040     −0.0042   −0.0042     −0.0043   −0.0042     −0.0043   −0.0043
                     (0.0011)  (0.0015)    (0.0010)  (0.0010)    (0.0005)  (0.0008)    (0.0005)  (0.0005)
λ_0 (−0.4)           −0.4175   0.1527      −0.3997   −0.3981     −0.4032   −0.2801     −0.4024   −0.3801
                     (0.5387)  (0.7327)    (0.4802)  (0.4516)    (0.2501)  (0.4531)    (0.2332)  (0.2285)
λ_1 (−1.37)          −1.3636   −1.3525     −1.4005   −1.3776     −1.3666   −1.3741     −1.3713   −1.3643
                     (0.3400)  (0.4620)    (0.3101)  (0.3069)    (0.1505)  (0.2427)    (0.1487)  (0.1443)
λ_2 (0.18)           0.1797    0.1786      0.1858    0.1831      0.1800    0.1809      0.1805    0.1796
                     (0.0598)  (0.0814)    (0.0546)  (0.0553)    (0.0262)  (0.0418)    (0.0261)  (0.0255)
λ_3 (−0.007)         −0.0071   −0.0072     −0.0073   −0.0072     −0.0070   −0.0071     −0.0070   −0.0070
                     (0.0030)  (0.0041)    (0.0028)  (0.0028)    (0.0013)  (0.0021)    (0.0013)  (0.0013)
ν (4 or 50)          4.31      –           39.52     –           4.09      –           49.62     –
                     (2.25)    –           (45.33)   –           (0.77)    –           (38.38)   –
ℓ̂_max                −14.27    −45.78      39.13     39.87       −73.93    −237.84     138.29    141.18
                     (28.56)   (41.73)     (14.17)   (14.32)     (58.27)   (117.55)    (26.60)   (27.85)
BIC                  2.56      4.95        −1.71     −2.39       1.98      5.22        −2.26     −2.62
                     (2.28)    (3.34)      (1.15)    (1.13)      (1.16)    (2.35)      (0.53)    (0.56)

Table 6
Comparison of predictive accuracies in terms of MSFE and BIC values for two robust t models.

m      Model     q = 1     q = 2     q = 3     q = 4     q = 5     q = 6     BIC
25     TLMM      0.1042    0.1208    0.1345    0.1472    0.1582    0.1631    3.76
                 (0.0611)  (0.0608)  (0.0661)  (0.0626)  (0.0710)  (0.0711)  (2.33)
       TJMM      0.0920    0.1098    0.1225    0.1338    0.1411    0.1426    2.56
                 (0.0583)  (0.0598)  (0.0619)  (0.0659)  (0.0705)  (0.0646)  (2.28)
       RIP (%)   11.7      9.1       8.9       9.1       10.8      12.5      31.9
100    TLMM      0.1068    0.1285    0.1437    0.1556    0.1652    0.1724    3.43
                 (0.0400)  (0.0405)  (0.0413)  (0.0437)  (0.0464)  (0.0511)  (1.17)
       TJMM      0.0961    0.1182    0.1322    0.1408    0.1477    0.1522    1.98
                 (0.0363)  (0.0394)  (0.0405)  (0.0382)  (0.0408)  (0.0487)  (1.16)
       RIP (%)   10.1      8.1       8.0       9.5       10.6      11.7      42.3

The relative improvement percentage (RIP) is measured by (TLMM − TJMM)/TLMM × 100%. (Replications = 500.)

which subjects contain intermittent missing values can be easily handled via EM-type algorithms (Dempster et al., 1977; Meng and van Dyk, 1997).

Some possibilities for future research along this line are as follows. One possible extension is to apply the joint mean–covariance modeling approach to other non-normal random errors, such as the skew normal distribution (Azzalini and Dalla Valle, 1996; Azzalini and Capitanio, 1999) and the skew t distribution (Jones and Faddy, 2003; Azzalini and Capitanio, 2003). With the development of rapid computational techniques and low-cost computing power, another worthwhile task is to develop a fully Bayesian inference for gaining more reliable inferences via Markov chain Monte Carlo methods.

Acknowledgments

The authors are grateful to the Executive Editor, the Associate Editor and the referee for their valuable comments and helpful suggestions which led to substantial improvements of the paper. This research work was supported by the National Science Council of Taiwan (Grant no. NSC97-2118-M-005-001-MY2).


Appendix A. Elements of the Hessian matrix

H_ββ = −Σ_{i=1}^m κ_i { X_i^T R_i^{−1} X_i − 2 X_i^T R_i^{−1} r_i r_i^T R_i^{−1} X_i / (ν + Δ_i(β, γ, λ)) },   (A.1)

H_βγ = Σ_{i=1}^m κ_i { [∂(X_i^T L_i^T)/∂γ^T] (I_d ⊗ D_i^{−1} L_i r_i) + X_i^T L_i^T ∂(D_i^{−1} L_i r_i)/∂γ^T + 2 X_i^T R_i^{−1} r_i (r_i − Z_i γ)^T D_i^{−1} Z_i / (ν + Δ_i(β, γ, λ)) },   (A.2)

H_βλ = Σ_{i=1}^m κ_i { ∂(X_i^T L_i^T D_i^{−1} L_i r_i)/∂λ^T + X_i^T R_i^{−1} r_i (Σ_{j=1}^{n_i} e²_{ij} w_j^T) / (ν + Δ_i(β, γ, λ)) },   (A.3)

H_βν = Σ_{i=1}^m { 1/(ν + Δ_i(β, γ, λ)) − κ_i/(ν + Δ_i(β, γ, λ)) } X_i^T R_i^{−1} r_i,   (A.4)

H_γγ = −Σ_{i=1}^m κ_i { Z_i^T D_i^{−1} Z_i − 2 Z_i^T D_i^{−1} (r_i − Z_i γ)(r_i − Z_i γ)^T D_i^{−1} Z_i / (ν + Δ_i(β, γ, λ)) },   (A.5)

H_γλ = −Σ_{i=1}^m κ_i { Z_i^T D_i^{−1} W_i − Z_i^T D_i^{−1} (r_i − Z_i γ)(Σ_{j=1}^{n_i} e²_{ij} w_j^T) / (ν + Δ_i(β, γ, λ)) },   (A.6)

H_λλ = −(1/2) Σ_{i=1}^m κ_i { Σ_{j=1}^{n_i} e²_{ij} w_j w_j^T − (Σ_{j=1}^{n_i} e²_{ij} w_j)(Σ_{j=1}^{n_i} e²_{ij} w_j^T) / (ν + Δ_i(β, γ, λ)) },   (A.7)

H_λν = (1/2) Σ_{i=1}^m { 1/(ν + Δ_i(β, γ, λ)) − κ_i/(ν + Δ_i(β, γ, λ)) } Σ_{j=1}^{n_i} e²_{ij} w_j,   (A.8)

H_γν = Σ_{i=1}^m { 1/(ν + Δ_i(β, γ, λ)) − κ_i/(ν + Δ_i(β, γ, λ)) } Z_i^T D_i^{−1} (r_i − Z_i γ),   (A.9)

H_νν = (1/2) Σ_{i=1}^m { (1/2) ψ'((ν + n_i)/2) − (1/2) ψ'(ν/2) + (n_i ν + Δ_i²(β, γ, λ)) / (ν (ν + Δ_i(β, γ, λ))²) },   (A.10)

where W_i is defined in (8), I_d is the identity matrix of order d, and ⊗ denotes the Kronecker product. Note that ∂D_i^{−1}/∂λ_k = −diag{w_1^T e_k σ_1^{−2}, ..., w_{n_i}^T e_k σ_{n_i}^{−2}}, with e_k a q × 1 binary vector whose kth element is one and whose other elements are zero; and ∂L_i/∂γ_k is a lower triangular matrix whose (j, s)th entry is −z_{js}^T e_k, for j = 2, ..., n_i and s = 1, ..., j − 1, where e_k is a d × 1 binary vector with the kth element equal to one and zero elsewhere.

Appendix B. Proof of Proposition 2

(a) By Proposition 1(c), we have

E[1/(ν + Δ_i(β, γ, λ))] = E(B/ν) = 1/(ν + n_i).

(b) Since κ_i = (ν + n_i)/(ν + Δ_i(β, γ, λ)) = [(ν + n_i)/ν] B, we have

E[κ_i/(ν + Δ_i(β, γ, λ))] = [(ν + n_i)/ν²] E(B²) = (ν + 2)/(ν(ν + n_i + 2)).

(c) Denote by t_p(r_i | μ, R, ν) the density of T_p(μ, R, ν). Noting that r_i ∼ T_{n_i}(0, R_i, ν), we have

E(κ²_i r_i r_i^T) = [(ν + n_i)²/ν²] ∫ r_i r_i^T [Γ((ν + n_i)/2) |R_i|^{−1/2} / (Γ(ν/2)(πν)^{n_i/2})] (1 + r_i^T R_i^{−1} r_i/ν)^{−((ν+4)+n_i)/2} dr_i
= [(ν + n_i)²/ν²] [Γ((ν + n_i)/2) Γ((ν + 4)/2) / (Γ(ν/2) Γ((ν + n_i + 4)/2))] ∫ r_i r_i^T t_{n_i}(r_i | 0, [ν/(ν + 4)] R_i, ν + 4) dr_i
= [(ν + n_i)/(ν + n_i + 2)] R_i,

where the second equality recognizes the integrand, up to a normalizing constant, as the density of T_{n_i}(0, [ν/(ν + 4)] R_i, ν + 4), and the last equality uses Γ((ν + 4)/2) = [(ν + 2)/2](ν/2) Γ(ν/2), Γ((ν + n_i + 4)/2) = [(ν + n_i + 2)/2][(ν + n_i)/2] Γ((ν + n_i)/2), and ∫ r_i r_i^T t_{n_i}(r_i | 0, [ν/(ν + 4)] R_i, ν + 4) dr_i = [ν/(ν + 2)] R_i.

(d) Define L_i = [ℓ_{i1} ℓ_{i2} ··· ℓ_{in_i}]^T. Since L_i r_i = r_i − r̂_i, it follows that r_{ij} − r̂_{ij} = ℓ_{ij}^T r_i. We then have

E(κ_i Σ_{j=1}^{n_i} e²_{ij} w_j w_j^T) = E{κ_i Σ_{j=1}^{n_i} σ_j^{−2} w_j r_i^T ℓ_{ij} ℓ_{ij}^T r_i w_j^T} = Σ_{j=1}^{n_i} σ_j^{−2} w_j E(κ_i r_i^T ℓ_{ij} ℓ_{ij}^T r_i) w_j^T = Σ_{j=1}^{n_i} σ_j^{−2} w_j tr(ℓ_{ij} ℓ_{ij}^T R_i) w_j^T = Σ_{j=1}^{n_i} w_j w_j^T,

where the last equality follows directly from tr(ℓ_{ij} ℓ_{ij}^T R_i) = ℓ_{ij}^T R_i ℓ_{ij} = σ²_j.

(e) The result follows directly from the fact that

E[∂² log f(Y_i)/∂λ ∂λ^T] = −(1/2) { E(κ_i Σ_{j=1}^{n_i} e²_{ij} w_j w_j^T) − [1/(ν + n_i)] E[κ²_i (Σ_{j=1}^{n_i} e²_{ij} w_j)(Σ_{j=1}^{n_i} e²_{ij} w_j^T)] }

equals

−E[(∂ log f(Y_i)/∂λ)(∂ log f(Y_i)/∂λ)^T] = (1/4)(Σ_{j=1}^{n_i} w_j)(Σ_{j=1}^{n_i} w_j^T) − (1/4) E[κ²_i (Σ_{j=1}^{n_i} e²_{ij} w_j)(Σ_{j=1}^{n_i} e²_{ij} w_j^T)].

(f) Since z(i, j) = Σ_{k=1}^{j−1} r_{ik} z_{jk} = ζ_j r_i, the following trivially holds:

Z_i^T D_i^{−1} Z_i = Σ_{j=1}^{n_i} σ_j^{−2} z(i, j) z^T(i, j) = Σ_{j=1}^{n_i} σ_j^{−2} ζ_j r_i r_i^T ζ_j^T.

Therefore,

E(κ_i Z_i^T D_i^{−1} Z_i) = E(κ_i Σ_{j=1}^{n_i} σ_j^{−2} ζ_j r_i r_i^T ζ_j^T) = Σ_{j=1}^{n_i} σ_j^{−2} ζ_j R_i ζ_j^T.

(g) Since E(κ_i r_i r_i^T) = R_i, it is straightforward to show that

E(κ_i z(i, j) r_{ij}) = E(κ_i Σ_{k=1}^{j−1} r_{ik} z_{jk} r_{ij}) = Σ_{k=1}^{j−1} E(κ_i r_{ik} r_{ij}) z_{jk} = Σ_{k=1}^{j−1} σ_{kj} z_{jk}.

(h) Since Z_i^T D_i^{−1} W_i = Σ_{j=1}^{n_i} σ_j^{−2} z(i, j) r_{ij} w_j^T − Σ_{j=1}^{n_i} σ_j^{−2} z(i, j) z^T(i, j) γ w_j^T, it suffices to verify that

E(κ_i Z_i^T D_i^{−1} W_i) = Σ_{j=1}^{n_i} σ_j^{−2} E(κ_i z(i, j) r_{ij}) w_j^T − Σ_{j=1}^{n_i} σ_j^{−2} E(κ_i z(i, j) z^T(i, j)) γ w_j^T
= Σ_{j=1}^{n_i} σ_j^{−2} (Σ_{k=1}^{j−1} σ_{kj} z_{jk}) w_j^T − Σ_{j=1}^{n_i} σ_j^{−2} (ζ_j R_i ζ_j^T) γ w_j^T
= Σ_{j=1}^{n_i} σ_j^{−2} (Σ_{k=1}^{j−1} σ_{kj} z_{jk} w_j^T − ζ_j R_i ζ_j^T γ w_j^T).

References

Azzalini, A., Capitanio, A., 1999. Statistical applications of the multivariate skew-normal distribution. J. Roy. Statist. Soc. B 61, 579–602.
Azzalini, A., Capitanio, A., 2003. Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution. J. Roy. Statist. Soc. B 65, 367–389.
Azzalini, A., Dalla Valle, A., 1996. The multivariate skew-normal distribution. Biometrika 83, 715–726.
Demidenko, E., 2004. Mixed Models: Theory and Applications. Wiley, New York.
Dempster, A.P., Laird, N.M., Rubin, D.B., 1977. Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. Roy. Statist. Soc. B 39, 1–38.
Geisser, S., 1975. The predictive sample reuse method with applications. J. Amer. Statist. Assoc. 70, 320–328.
He, X., Cui, H., Simpson, D.G., 2004. Longitudinal data analysis using t-type regression. J. Statist. Plann. Inference 122, 253–269.
Healy, M.J.R., 1968. Multivariate normal plotting. Appl. Statist. 17, 157–161.
Jones, M.C., Faddy, M.J., 2003. A skew extension of the t-distribution, with applications. J. Roy. Statist. Soc. B 65, 159–174.
Kotz, S., Nadarajah, S., 2004. Multivariate t Distributions and Their Applications. Cambridge University Press, Cambridge.
Laird, N.M., Ware, J.H., 1982. Random effects models for longitudinal data. Biometrics 38, 963–974.
Lange, K.L., Little, R.J.A., Taylor, J.M.G., 1989. Robust statistical modeling using the t distribution. J. Amer. Statist. Assoc. 84, 881–896.
Lin, T.I., Lee, J.C., 2006. A robust approach to t linear mixed models applied to multiple sclerosis data. Statist. Med. 25, 1397–1412.
Meng, X.L., van Dyk, D., 1997. The EM algorithm—an old folk-song sung to a fast new tune. J. Roy. Statist. Soc. B 59, 511–567.
Nadarajah, S., Kotz, S., 2005. Mathematical properties of the multivariate t distribution. Acta Appl. Math. 89, 53–84.
Pan, J., MacKenzie, G., 2003. On modelling mean–covariance structures in longitudinal studies. Biometrika 90, 239–244.
Pinheiro, J.C., Liu, C.H., Wu, Y.N., 2001. Efficient algorithms for robust estimation in linear mixed-effects models using the multivariate t distribution. J. Comput. Graph. Statist. 10, 249–276.
Potthoff, R.F., Roy, S.N., 1964. A generalized multivariate analysis of variance model useful especially for growth curve problems. Biometrika 51, 313–326.
Pourahmadi, M., 1999. Joint mean–covariance models with applications to longitudinal data: unconstrained parameterisation. Biometrika 86, 677–690.
Pourahmadi, M., 2000. Maximum likelihood estimation of generalised linear models for multivariate normal covariance matrix. Biometrika 87, 425–435.
Pourahmadi, M., 2001. Foundations of Time Series Analysis and Prediction Theory. Wiley, New York.
Rao, C.R., 1987. Prediction of future observations in growth curve models (with discussion). Statist. Sci. 2, 434–471.
Rygaard, K., Spang-Thomsen, M., 1997. Quantitation and Gompertzian analysis of tumor growth data. Breast Cancer Res. Treat. 46, 303–312.
Stone, M., 1974. Cross-validatory choice and assessment of statistical prediction (with discussion). J. Roy. Statist. Soc. B 36, 111–147.
Taylor, J., Verbyla, A., 2004. Joint modelling of location and scale parameters of the t distribution. Statist. Modelling 4, 91–112.
Welsh, A.H., Richardson, A.M., 1997. Approaches to the robust estimation of mixed models. In: Maddala, G.S., Rao, C.R. (Eds.), Handbook of Statistics, vol. 15. Elsevier Science, Amsterdam, pp. 343–383.
Zacks, S., 1971. The Theory of Statistical Inference. Wiley, New York.
