Published online 11 October 2005 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/sim.2384

A robust approach to t linear mixed models applied to multiple sclerosis data

Tsung I. Lin¹·*·† and Jack C. Lee²

¹Department of Applied Mathematics, National Chung Hsing University, Taichung 402, Taiwan
²Institute of Statistics and Graduate Institute of Finance, National Chiao Tung University, Hsinchu 30050, Taiwan

SUMMARY

We discuss a robust extension of linear mixed models based on the multivariate t distribution. Since longitudinal data are successively collected over time and typically tend to be autocorrelated, we employ a parsimonious first-order autoregressive dependence structure for the within-subject errors. A score test statistic for testing the existence of autocorrelation among the within-subject errors is derived. Moreover, we develop an explicit scoring procedure for the maximum likelihood estimation with standard errors as a by-product. The technique for predicting future responses of a subject given past measurements is also investigated. Results are illustrated with real data from a multiple sclerosis clinical trial. Copyright © 2005 John Wiley & Sons, Ltd.

KEY WORDS: Fisher scoring; longitudinal data; prediction; random effects; t-REML

1. INTRODUCTION

Multiple sclerosis (MS), one of the most common chronic diseases of the central nervous system in young adults, occurs when the myelin around the nerve fibres in the brain becomes damaged. As yet, the precise causes of MS remain unknown, though abundant research suggests that MS may be an autoimmune disease in which the immune system attacks its own myelin, causing disruptions to the nerve transmissions. There are no drugs to cure MS, but some treatments are available to ease the symptoms. For example, interferon beta-1b (IFNB) was approved by the US Food and Drug Administration in mid-1993 for use in early-stage relapsing-remitting MS (RRMS) patients. For diagnosis, cranial magnetic resonance imaging (MRI) is the preferred tool for monitoring MS evolution in both natural history studies and treatment trials.

*Correspondence to: Tsung I. Lin, Department of Applied Mathematics, National Chung Hsing University, Taichung 402, Taiwan.
†E-mail: tilin@amath.nchu.edu.tw

Contract/grant sponsor: National Science Council of Taiwan; contract/grant numbers: NSC93-2118-M-029-004, NSC93-2118-M-009-003



Gill [1] presents a robust approach, based on Huber's ψ function, to a linear mixed model for the analysis of a data set, called the MS data throughout this paper, from a cohort study of 52 patients with RRMS. The study was a placebo-controlled trial of IFNB in which patients were randomized to either a placebo (PL), a low-dose (LD) or a high-dose (HD) treatment. The LD and HD treatments correspond to doses of 1.6 and 8 million international units (MIU) of IFNB every other day, respectively. Each patient had a baseline cranial MRI and subsequent MRIs once every 6 weeks over two years. The 6-weekly serial MRI data were collected from June 1988 to May 1990 at the University of British Columbia site.

The use of the t distribution in place of the normal for robust regression has been investigated by a number of authors, including West [2], Lange et al. [3] and James et al. [4]. The linear mixed model with multivariate t distributed responses, called the t linear mixed model hereafter, was considered by Welsh and Richardson [5]; however, they did not explicitly discuss or derive the distributions of the random effects and the error terms. More recently, Pinheiro et al. [6] incorporated multivariate t distributed random effects and error terms to formulate a normal–normal–gamma hierarchy for the t linear mixed model. They provide several efficient EM-type algorithms for maximum likelihood (ML) estimation and illustrate the robustness with respect to outlying observations using a real example and some simulation results.

In this paper, we develop additional tools for a simplified version of the Pinheiro et al. [6] model and use these tools to analyse the MS data. The model considered here is

$$Y_i = X_i\beta + Z_i b_i + \varepsilon_i,\qquad b_i\,|\,\tau_i \sim N_{m_2}\!\left(0,\ \frac{\sigma^2}{\tau_i}\Psi\right),\qquad \varepsilon_i\,|\,\tau_i \sim N_{p_i}\!\left(0,\ \frac{\sigma^2}{\tau_i}C_i\right),\qquad \tau_i \sim \mathrm{Ga}\!\left(\frac{\nu}{2},\frac{\nu}{2}\right),\qquad (i = 1,\ldots,N) \quad (1)$$

where i is the subject index, Y_i is a p_i-dimensional observed response vector, N is the number of subjects, X_i and Z_i are, respectively, known p_i × m_1 and p_i × m_2 design matrices, β is an m_1 × 1 vector of fixed effects, b_i is an m_2 × 1 vector of unobservable random effects, τ_i is an unknown scale assumed to be distributed as gamma with mean 1 and variance 2/ν, and b_i | τ_i and ε_i | τ_i are assumed to be independent. Furthermore, Ψ is an m_2 × m_2 matrix, which may be unstructured or structured, and C_i is a p_i × p_i correlation matrix.

Pinheiro et al. [6] consider a general model where C_i is allowed to depend upon a vector of parameters and the parameter ν is allowed to vary across subgroups of subjects. In this paper, we exploit the widely used autoregressive structure to model the dependence of the within-subject errors. As an illustration, we concentrate on the simple case where C_i has an AR(1) dependence structure that is common to all subjects, i.e.

$$C_i = C_i(\rho) = [\rho^{|r-s|}],\qquad r, s = 1, 2, \ldots, p_i \quad (2)$$

The dependence structure of C_i can be extended to a higher-order autoregressive moving average (ARMA) structure [7–9].
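The hierarchy in (1) with the AR(1) structure (2) is straightforward to simulate from, which is a useful check when implementing the estimators developed below. The following is a minimal sketch in Python/NumPy under the model exactly as stated; the function names (ar1_corr, simulate_subject) are our own, and β, σ², Ψ, ρ and ν are taken as given.

```python
import numpy as np

def ar1_corr(p, rho):
    """AR(1) correlation matrix C_i(rho) = [rho^{|r-s|}] of order p, as in (2)."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def simulate_subject(Xi, Zi, beta, sigma2, Psi, rho, nu, rng):
    """Draw one response vector Y_i from the hierarchy in model (1)."""
    p, m2 = Zi.shape
    tau = rng.gamma(shape=nu / 2.0, scale=2.0 / nu)   # tau_i ~ Ga(nu/2, nu/2): mean 1, variance 2/nu
    bi = rng.multivariate_normal(np.zeros(m2), sigma2 * Psi / tau)
    ei = rng.multivariate_normal(np.zeros(p), sigma2 * ar1_corr(p, rho) / tau)
    return Xi @ beta + Zi @ bi + ei
```

Marginalizing the same draw over b_i and τ_i reproduces the multivariate t distribution given in (3) below.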


In model (1), the marginal distribution of the response Y_i, after integrating over b_i and τ_i, can be expressed as

$$Y_i \sim t_{p_i}(X_i\beta,\ \sigma^2\Lambda_i,\ \nu) \quad (3)$$

where Λ_i = Λ_i(Ψ, ρ) = Z_iΨZ_i' + C_i(ρ) and t_p(μ, Σ, ν) denotes the p-dimensional multivariate t distribution with location vector μ, scale matrix Σ and degrees of freedom (d.f.) ν.

In Section 2, we describe computational aspects of both the ML estimation and the restricted maximum likelihood (REML) estimation based on the marginal likelihood of (3). Section 3 describes how to obtain the score test statistic for testing H_0: ρ = 0 for C_i in (2) against the alternative hypothesis H_1: ρ ≠ 0. Section 4 discusses inference on the random effects and prediction problems. In Section 5, we illustrate the proposed methodologies with the MS data. Finally, Section 6 offers some concluding remarks.

2. ESTIMATION

For notational convenience, let e_i = Y_i − X_iβ, Δ_i(β, Ψ, ρ) = e_i'Λ_i^{-1}(Ψ, ρ)e_i, and let n = Σ_{i=1}^N p_i denote the total number of observations. Given independent observations Y_1, ..., Y_N, write the log-likelihood function as

$$\ell = \sum_{i=1}^{N}\left\{\log\Gamma\!\left(\frac{\nu+p_i}{2}\right) - \log\Gamma\!\left(\frac{\nu}{2}\right)\right\} - \frac{n}{2}\log(\nu\pi\sigma^2) - \frac{1}{2}\sum_{i=1}^{N}\log|\Lambda_i(\Psi,\rho)| - \frac{1}{2}\sum_{i=1}^{N}(\nu+p_i)\log\!\left(1 + \frac{\Delta_i(\beta,\Psi,\rho)}{\nu\sigma^2}\right) \quad (4)$$
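For reference, the log-likelihood (4) translates directly into code. The sketch below is our own; it reuses ar1_corr from the earlier snippet and assumes y, X, Z are lists over the N subjects.

```python
import numpy as np
from scipy.special import gammaln

def loglik(y, X, Z, beta, sigma2, Psi, rho, nu):
    """Marginal log-likelihood (4) of the t linear mixed model."""
    ll = 0.0
    for yi, Xi, Zi in zip(y, X, Z):
        p = len(yi)
        Lam = Zi @ Psi @ Zi.T + ar1_corr(p, rho)      # Lambda_i(Psi, rho)
        ei = yi - Xi @ beta                           # e_i
        Delta = ei @ np.linalg.solve(Lam, ei)         # Delta_i = e_i' Lambda_i^{-1} e_i
        ll += (gammaln((nu + p) / 2.0) - gammaln(nu / 2.0)
               - (p / 2.0) * np.log(nu * np.pi * sigma2)
               - 0.5 * np.linalg.slogdet(Lam)[1]
               - 0.5 * (nu + p) * np.log1p(Delta / (nu * sigma2)))
    return ll
```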

To ensure non-negative definiteness of Ψ, we reparameterize Ψ = F'F by the Cholesky decomposition, where F is an upper triangular matrix. Let θ = (β, λ), where λ = (σ², vech(F), ρ, ν) is the vector of unknown model parameters excluding the fixed effects β. Explicit expressions for the score vector s_θ = (s_β, s_λ) and the Fisher information matrix I_θθ are derived in Appendix A. We employ the Fisher scoring algorithm to obtain the ML estimate. Under some regularity conditions, the asymptotic variance–covariance estimates can be computed by plugging the ML estimate θ̂ = (β̂, λ̂) into the inverse of the Fisher information matrix. The asymptotic covariance matrices of β̂ and λ̂ can be obtained by

$$\mathrm{var}(\hat\beta) = \hat I_{\beta\beta}^{-1} = \hat\sigma^2\left[\sum_{i=1}^{N}\frac{\hat\nu+p_i}{\hat\nu+p_i+2}\,X_i'\hat\Lambda_i^{-1}X_i\right]^{-1},\qquad \mathrm{var}(\hat\lambda) = \hat I_{\lambda\lambda}^{-1} \quad (5)$$
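A generic scoring step illustrates how the Appendix A expressions are used. This is a sketch only; it assumes user-supplied functions score(theta) and info(theta) implementing s_θ and I_θθ, with standard errors read off the inverse information at convergence as in (5).

```python
import numpy as np

def fisher_scoring(theta0, score, info, tol=1e-8, max_iter=200):
    """Fisher scoring: theta <- theta + I(theta)^{-1} s(theta), iterated to convergence."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(info(theta), score(theta))
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta

# Asymptotic variance-covariance estimate as in (5):
# cov_hat = np.linalg.inv(info(theta_hat))
```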

A disadvantage of the ML estimates of the variance components is that they are biased downward in finite samples. REML corrects for the loss of degrees of freedom incurred in estimating the fixed effects and produces unbiased estimating equations for the variance components. As pointed out by Harville [10], REML can be viewed as the Bayesian principle of marginal inference. Adopting the prior distribution π(β) ∝ 1 and Laplace's method as in Welsh and Richardson [5], the t-REML likelihood function can be approximated


by $L_R(\lambda) = \int L(\beta, \lambda)\,d\beta \approx \tilde L_R(\lambda)$, where

$$\tilde L_R(\lambda) = (\nu\pi\sigma^2)^{-n/2}\left|\sum_{i=1}^{N}X_i'H_iX_i\right|^{-1/2}\prod_{i=1}^{N}\frac{\Gamma((\nu+p_i)/2)}{\Gamma(\nu/2)}\,|\Lambda_i|^{-1/2}\left(1 + \frac{\Delta_i(\hat\beta(\lambda),\Psi,\rho)}{\nu\sigma^2}\right)^{-(\nu+p_i)/2} \quad (6)$$

$$H_i = (\nu+p_i)\left[\frac{\Lambda_i^{-1}}{\nu\sigma^2 + \Delta_i(\hat\beta(\lambda),\Psi,\rho)} - \frac{2\Lambda_i^{-1}\hat e_i(\lambda)\hat e_i'(\lambda)\Lambda_i^{-1}}{(\nu\sigma^2 + \Delta_i(\hat\beta(\lambda),\Psi,\rho))^2}\right]$$

$\hat e_i(\lambda) = Y_i - X_i\hat\beta(\lambda)$, and $\hat\beta(\lambda)$ is obtained by solving the following estimating equation:

$$\sum_{i=1}^{N}\frac{(\nu+p_i)\,X_i'\Lambda_i^{-1}(Y_i - X_i\beta)}{\nu\sigma^2 + \Delta_i(\beta,\Psi,\rho)} = 0 \quad (7)$$

The resulting (approximate) REML estimate of λ, λ̂_R, can be obtained by implementing the Newton–Raphson (NR) algorithm with the ML estimate as the initial value. In the NR algorithm, the empirical Bayes estimate of β, β̂(λ̂_R), must be computed at each iteration. It can easily be obtained by solving the estimating equation (7) with λ replaced by the current estimate λ̂_R.

Appendix B presents the necessary first partial derivatives of ℓ_R(λ), the logarithm of L̃_R(λ) in (6), for the NR algorithm. However, the second partial derivatives of ℓ_R(λ) are tedious; the entries of the Hessian matrix can instead be approximated numerically.
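A simple way to realize the numerical Hessian mentioned above is central differencing of the analytic gradient of ℓ_R from Appendix B. The sketch below assumes a function grad_lR(x) returning that gradient (our own name); the step size h may need tuning.

```python
import numpy as np

def numeric_hessian(grad_lR, x, h=1e-5):
    """Central-difference Hessian of l_R, built from its analytic gradient (Appendix B)."""
    x = np.asarray(x, dtype=float)
    k = x.size
    H = np.zeros((k, k))
    for j in range(k):
        e = np.zeros(k)
        e[j] = h
        H[j] = (np.asarray(grad_lR(x + e)) - np.asarray(grad_lR(x - e))) / (2.0 * h)
    return 0.5 * (H + H.T)   # symmetrize to damp rounding error
```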

3. THE SCORE TEST FOR AUTOCORRELATION

It is of interest to test whether autocorrelation exists among the within-subject errors. We derive a score test statistic which is asymptotically a chi-squared random variable with 1 d.f. and can be computed easily. In this context, the score test statistic is based on the score vector and the information matrix evaluated under H_0: ρ = 0. The advantage of the score test over other testing procedures, such as the likelihood ratio or Wald tests, is that it does not require comparison with the alternative model. Rejection of the null model does not indicate that the AR(1) dependence structure is appropriate; however, it provides a simple check for the presence of possible autocorrelation among the within-subject errors.

Let ω = (σ², vech(F), ν), so that λ = (ω, ρ). Because I_βλ = 0, the information matrix I_θθ of θ = (β, ω, ρ) can be re-expressed as the block partitioned matrix

$$I_{\theta\theta} = \begin{bmatrix} I_{\beta\beta} & 0 & 0\\ 0 & I_{\omega\omega} & I_{\omega\rho}\\ 0 & I_{\rho\omega} & I_{\rho\rho} \end{bmatrix} \quad (8)$$


Let θ̂_0 = (β̂_0, ω̂_0, 0) denote the ML estimate of θ under H_0: ρ = 0. The score vector [∂ℓ/∂θ]_{θ̂_0} has all components equal to 0 except the derivative with respect to ρ. The score test statistic s is

$$s = \left[\frac{\partial\ell}{\partial\theta}\right]_{\hat\theta_0}'[I_{\theta\theta}]^{-1}_{\hat\theta_0}\left[\frac{\partial\ell}{\partial\theta}\right]_{\hat\theta_0} = \frac{[\partial\ell/\partial\rho]^2_{\hat\theta_0}}{[I_{\rho\rho\cdot\omega}]_{\hat\theta_0}} \quad (9)$$

where

$$\frac{\partial\ell}{\partial\rho} = -\frac{1}{2}\sum_{i=1}^{N}\mathrm{tr}\!\left(\Lambda_i^{-1}\frac{\partial C_i}{\partial\rho}\right) + \frac{1}{2}\sum_{i=1}^{N}(\nu+p_i)\,\frac{e_i'\Lambda_i^{-1}(\partial C_i/\partial\rho)\Lambda_i^{-1}e_i}{\nu\sigma^2 + \Delta_i(\beta,\Psi,\rho)},\qquad I_{\rho\rho\cdot\omega} = I_{\rho\rho} - I_{\rho\omega}I_{\omega\omega}^{-1}I_{\omega\rho} \quad (10)$$

The detailed expressions for I_ρρ, I_ρω and I_ωω are given in Appendix A.

When evaluated at θ = θ̂_0, we obtain û_i = [Λ_i^{-1}e_i]_{θ̂_0} = Y_i − X_iβ̂ − Z_ib̂_i, where b̂_i = (Z_i'Z_i + Ψ̂^{-1})^{-1}Z_i'(Y_i − X_iβ̂), and û_i can be viewed as the vector of residuals from the model for subject i. In addition, [C_i^{-1}]_{θ̂_0} = I_{p_i} and [∂C_i/∂ρ]_{θ̂_0} = L_i + L_i', where L_i is a p_i × p_i matrix with entries of 1 on the first super-diagonal and 0 elsewhere, so that L_i + L_i' has entries of 1 on the first super- and sub-diagonals. Applying

$$\Lambda_i^{-1} = (Z_i\Psi Z_i' + C_i)^{-1} = C_i^{-1} - C_i^{-1}Z_i(Z_i'C_i^{-1}Z_i + \Psi^{-1})^{-1}Z_i'C_i^{-1}$$

we obtain

$$\left[\mathrm{tr}\!\left(\Lambda_i^{-1}\frac{\partial C_i}{\partial\rho}\right)\right]_{\hat\theta_0} = -2\,\mathrm{tr}((Z_i'Z_i + \hat\Psi^{-1})^{-1}Z_i'L_iZ_i) \quad (11)$$

The score statistic s can then be calculated, from (11), as

$$s = \frac{\left[\sum_{i=1}^{N}\mathrm{tr}((Z_i'Z_i + \hat\Psi^{-1})^{-1}Z_i'L_iZ_i) + \sum_{i=1}^{N}\frac{(\hat\nu+p_i)\,\hat u_i'L_i\hat u_i}{\hat\nu\hat\sigma^2 + \Delta_i(\hat\beta,\hat\Psi,0)}\right]^2}{[I_{\rho\rho\cdot\omega}]_{\hat\theta_0}}$$

where Δ_i(β̂, Ψ̂, 0) = (Y_i − X_iβ̂)'û_i.
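To make the computation concrete, the sketch below (our own code) evaluates the numerator [∂ℓ/∂ρ]_{θ̂_0} of s for the random-intercept case, given an H_0 fit (β̂, σ̂², Ψ̂, ν̂). The denominator [I_{ρρ·ω}]_{θ̂_0} is assembled from the Appendix A blocks via (10), and s is then referred to the χ²₁ distribution.

```python
import numpy as np

def score_numerator(y, X, Z, beta0, sigma2_0, Psi0, nu0):
    """[dl/drho] evaluated at the ML fit under H0: rho = 0."""
    total = 0.0
    for yi, Xi, Zi in zip(y, X, Z):
        p = len(yi)
        A = Zi.T @ Zi + np.linalg.inv(Psi0)            # Z_i'Z_i + Psi^{-1}
        ei = yi - Xi @ beta0
        bi = np.linalg.solve(A, Zi.T @ ei)             # empirical Bayes b_i under H0
        ui = ei - Zi @ bi                              # residual vector u_i hat
        Li = np.diag(np.ones(p - 1), k=1)              # ones on the first super-diagonal
        tr_term = np.trace(np.linalg.solve(A, Zi.T @ Li @ Zi))
        Delta = ei @ ui                                # Delta_i(beta, Psi, 0) = e_i' u_i
        total += tr_term + (nu0 + p) * (ui @ Li @ ui) / (nu0 * sigma2_0 + Delta)
    return total

# s = score_numerator(...)**2 / I_rho_rho_dot_omega, compared with a chi-squared(1) quantile.
```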

4. INFERENCES FOR RANDOM EFFECTS AND PREDICTION

We consider empirical Bayes estimates of the random effects, which are useful in explaining subject-specific deviations and helpful in predicting future measurements. If the value of θ = (β, σ², Ψ, ρ, ν) were known, the conditional mean of b_i given Y_i would be

$$\hat b_i(\theta) = [I_{m_2} - W_i(W_i + \Psi)^{-1}]\,\tilde b_i(\theta) \quad (12)$$

where W_i = (Z_i'C_i^{-1}Z_i)^{-1} and b̃_i(θ) = W_iZ_i'C_i^{-1}(Y_i − X_iβ). The resulting error covariance matrix is

$$E[(\hat b_i(\theta) - b_i)(\hat b_i(\theta) - b_i)'] = \frac{\nu\sigma^2}{\nu - 2}\,[W_i - W_i(W_i + \Psi)^{-1}W_i] \quad (13)$$


see Appendix C. As in Reference [11], substituting the ML estimate θ̂ into (12) leads to the empirical Bayes estimate b̂_i = b̂_i(θ̂).
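In code, (12) amounts to a pair of linear solves. A minimal sketch, reusing ar1_corr from the earlier snippet, with the components of θ supplied:

```python
import numpy as np

def empirical_bayes_b(yi, Xi, Zi, beta, Psi, rho):
    """Empirical Bayes estimate (12) of b_i, with W_i = (Z_i'C_i^{-1}Z_i)^{-1}."""
    p = len(yi)
    CinvZ = np.linalg.solve(ar1_corr(p, rho), Zi)     # C_i^{-1} Z_i
    Wi = np.linalg.inv(Zi.T @ CinvZ)                  # W_i
    b_tilde = Wi @ (CinvZ.T @ (yi - Xi @ beta))       # b~_i(theta)
    m2 = Zi.shape[1]
    return (np.eye(m2) - Wi @ np.linalg.inv(Wi + Psi)) @ b_tilde
```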

Furthermore, we are interested in the prediction of y_i, a future q × 1 vector of measurements of Y_i, given the observed measurements Y = (Y_(i), Y_i), where Y_(i) = (Y_1, ..., Y_{i−1}, Y_{i+1}, ..., Y_N). Given θ, the vectors (Y_i, y_i) (i = 1, ..., N) are independent, so it is only necessary to consider the joint distribution of Y_i and y_i in predicting y_i.

Let x_i and z_i denote the q × m_1 and q × m_2 matrices of prediction regressors corresponding to y_i. We thus have

$$\begin{pmatrix} Y_i\\ y_i \end{pmatrix} \sim t_{p_i+q}(X_i^*\beta,\ \sigma^2\Lambda_i^*,\ \nu)$$

where X_i^* = (X_i', x_i')', Z_i^* = (Z_i', z_i')', Λ_i^* = Z_i^*ΨZ_i^{*′} + C_i^* and C_i^* = [ρ^{|r−s|}] for r, s = 1, ..., p_i + q. Let C_i^* and Λ_i^* be partitioned conformably with (Y_i, y_i), i.e.

$$C_i^* = \begin{pmatrix} C_{11} & C_{12}\\ C_{21} & C_{22} \end{pmatrix},\qquad \Lambda_i^* = \begin{pmatrix} \Lambda_{11} & \Lambda_{12}\\ \Lambda_{21} & \Lambda_{22} \end{pmatrix} = \begin{pmatrix} Z_i\Psi Z_i' + C_{11} & Z_i\Psi z_i' + C_{12}\\ z_i\Psi Z_i' + C_{21} & z_i\Psi z_i' + C_{22} \end{pmatrix}$$

where C_11 = C_i and C_21 = C_12'. The use of

$$f(y_i\,|\,Y_i) \propto \int f(y_i\,|\,Y_i, \tau_i)\,f(Y_i\,|\,\tau_i)\,f(\tau_i)\,d\tau_i$$

leads to

$$y_i\,|\,Y_i \sim t_q(\mu_{i\cdot 2\cdot 1},\ \omega_i\Lambda_{22\cdot 1},\ \nu + p_i)$$

where μ_{i·2·1} = x_iβ + Λ_21Λ_11^{-1}(Y_i − X_iβ), Λ_{22·1} = Λ_22 − Λ_21Λ_11^{-1}Λ_12 and ω_i = (νσ² + (Y_i − X_iβ)'Λ_11^{-1}(Y_i − X_iβ))/(ν + p_i). The minimum MSE predictor of y_i is the conditional expectation of y_i

given Y_i, i.e.

$$\hat y_i(\theta) = x_i\beta + \Lambda_{21}\Lambda_{11}^{-1}(Y_i - X_i\beta) = x_i\beta + z_i\hat b_i(\theta) + C_{21}C_{11}^{-1}(Y_i - X_i\beta - Z_i\hat b_i(\theta)) \quad (14)$$

where b̂_i(θ) is given in (12). Similarly, the error covariance matrix of the predictor (14) is given by

$$E[(\hat y_i(\theta) - y_i)(\hat y_i(\theta) - y_i)'] = \frac{\nu + p_i}{\nu + p_i - 2}\,\omega_i\Lambda_{22\cdot 1}$$

where Λ_{22·1} can be rewritten as

$$\Lambda_{22\cdot 1} = C_{22\cdot 1} + (z_i - C_{21}C_{11}^{-1}Z_i)(W_{11} - W_{11}(\Psi + W_{11})^{-1}W_{11})(z_i - C_{21}C_{11}^{-1}Z_i)' \quad (15)$$

with C_{22·1} = C_22 − C_21C_11^{-1}C_12 and W_11 = (Z_i'C_11^{-1}Z_i)^{-1}; see Appendix D. The prediction of y_i is carried out by substituting the ML estimate θ̂ into (14).
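Under the convention that the q future values occupy the q time points immediately following the observed ones (consistent with C_i^* above), the predictor (14) can be sketched as follows, reusing the earlier helpers; this is our own code, not the authors' implementation.

```python
import numpy as np

def predict_future(yi, Xi, Zi, xi, zi, beta, Psi, rho):
    """Minimum-MSE predictor (14) of the q future measurements y_i."""
    p, q = len(yi), xi.shape[0]
    Cstar = ar1_corr(p + q, rho)                      # C_i* over the p_i + q time points
    C11, C12 = Cstar[:p, :p], Cstar[:p, p:]
    bi = empirical_bayes_b(yi, Xi, Zi, beta, Psi, rho)  # b_i hat from (12)
    resid = yi - Xi @ beta - Zi @ bi
    return xi @ beta + zi @ bi + C12.T @ np.linalg.solve(C11, resid)
```

In practice θ is replaced by its ML estimate θ̂, as in the example of Section 5.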


5. EXAMPLE

To illustrate the methodology, we next analyse the MS data. More details of the clinical trial leading to the MS data are described by D'yachkova et al. [12]. The PL, LD and HD treatment groups involve 17, 18 and 17 patients, respectively. Of the 52 patients, 3 were not included in the analysis: two of them (one in each of groups LD and HD) dropped out very early, and one in group LD had 3 measurements of zero on MRI scans. All but 5 of the remaining patients have a complete set of 17 scans: one dropped out of PL after completing 14 visits, two dropped out of LD after completing 13 visits, and two dropped out of HD after completing 12 visits. We will analyse the data of the 49 patients. Our analyses will assume these early dropouts are ignorable [13]. Of these 49 patients, six have one or two isolated MRI scans missing. We impute these missing observations using the mean of the two adjacent values, as in Reference [1].

The disease burden, that is, the total area (in mm²) of MS lesions on all slices of an MRI scan, for the ith patient at time point j is denoted by Area(i, j), with j = 0 as the baseline time point. The corresponding log relative burden (LRB) is LRB(i, j) = log(Area(i, j)/Area(i, 0)), which is used as the response variable Y_ij owing to the strong skewness of the untransformed burden measurements. Figure 1 depicts the LRB evolution of the 49 patients in the various groups. Apparently, the MS data involve many outlying observations, especially for PL and LD.

We model the average evolution of LRB as a linear function of time and carry out the analyses separately for the 3 treatment groups. For the fixed effects, we set β = (β_0, β_1)' with the corresponding design matrix X_i = [1_{p_i}, k_i], where 1_{p_i} = (1, 1, ..., 1)' and k_i = (1, 2, ..., p_i)'. To explore the autocorrelation among the within-subject errors, we start by fitting model (1) with random intercepts (Z_i = 1_{p_i}) and white noise errors (ρ = 0), denoted by M1.

Table I lists the resulting ML estimates with their standard errors in parentheses obtained from (5), Akaike's information criterion (AIC = −2 × maximized log-likelihood + 2m, where m is the number of model parameters), along with the values of the score test statistic s for the three groups. The score tests are all highly significant compared with the χ²₁ distribution, indicating that there exists autocorrelation among the within-subject errors. Moreover, the estimates of the d.f. ν are quite small, which justifies the modelling effort via the t distribution.

Figure 1. LRB evolution over time for the three treatment groups: PL (n = 17), LD (n = 16) and HD (n = 16); each panel plots LRB against time (scan number).

(8)

Table I. ML estimation results and score test statistics for M1, where f is such that Ψ = f². Standard errors are in parentheses.

Group   β0                 β1                σ²                f                 ν             AIC       s
PL      −0.0076 (0.0460)   0.0185 (0.0015)   0.0143 (0.0042)   1.8290 (0.272)    1.82 (0.59)   −99.02    16.91
LD      −0.0048 (0.0325)   0.0122 (0.0015)   0.0132 (0.0039)   0.9462 (0.194)    2.08 (0.71)   −145.58   21.30
HD      −0.0846 (0.0356)   0.0143 (0.0015)   0.0128 (0.0027)   1.0856 (0.2185)   4.58 (1.87)   −251.44   49.58

Table II. ML estimation results for M2, where f is such that Ψ = f². Standard errors are in parentheses.

Group   β0                 β1                σ²                f                 ρ                  ν             AIC
PL      −0.0036 (0.0476)   0.0176 (0.0020)   0.0154 (0.0046)   1.8290 (0.7322)   0.3164 (0.0685)    1.81 (0.59)   −116.26
LD      −0.0040 (0.0343)   0.0125 (0.0021)   0.0137 (0.0043)   0.8734 (0.1994)   0.3584 (0.0712)    1.97 (0.67)   −167.10
HD      −0.0814 (0.0400)   0.0137 (0.0027)   0.0174 (0.0040)   0.8348 (0.2192)   0.5587 (0.0689)    6.79 (3.15)   −310.38

As a note in passing, the REML estimates are somewhat similar to the ML estimates and hence are omitted for the rest of the paper.

We further fit an alternative model with random intercepts and an AR(1) dependence structure for the within-subject errors, denoted by M2. The corresponding ML estimation results and AICs are shown in Table II. Based on the smaller AIC values, M2 is preferred to M1.

Figure 2 displays the profile likelihood function of ρ and f, where f satisfies Ψ = f². Obviously, all three plots are unimodal and exhibit significant serial correlations, indicating that the AR(1) model is a possible dependence structure for modelling the MS data.

In some cases, however, a more general random effects model may be useful for the interpretation of autocorrelation among the within-subject errors. For comparison purposes, we fit a model, denoted by M3, with random intercepts and slopes (Z_i = X_i) and white noise errors. The associated ML estimation results are given in Table III. The AIC values do not improve over M2. In addition, all elements of F_12 and F_22 are relatively small (compared with their standard errors), with the exception of F_22 for HD, indicating negligible variation in slopes except for HD. Based on this, we check the expanded model that combines random intercepts, random slopes and an AR(1) dependence structure for HD. The value of AIC for this expanded model is still larger than that of M2.


Figure 2. Profile likelihood functions of (ρ, f) for the three treatment groups.

Table III. ML estimation results for M3, where F is the Cholesky decomposition of Ψ. Standard errors are in parentheses.

Group   β0                 β1                σ²                F11, F12, F22                               ν             AIC
PL      −0.0139 (0.0358)   0.0190 (0.0023)   0.0129 (0.0038)   1.1255, 0.0352, 0.0480 (0.2629, 0.1022, 0.0323)    1.76 (0.57)   −108.20
LD      −0.0070 (0.0319)   0.0125 (0.0025)   0.0114 (0.0040)   1.0115, −0.0264, 0.0663 (0.5310, 0.2712, 0.0344)   1.97 (0.67)   −153.68
HD      −0.0857 (0.0299)   0.0140 (0.0032)   0.0113 (0.0023)   0.9412, −0.0136, 0.0989 (0.2665, 0.1216, 0.0253)   5.49 (2.39)   −273.26

M2 is our preferred model for the MS data since it has the smallest AIC. Furthermore, it incorporates the autocorrelation and is parsimonious.

Based on the analysis so far, we find that the ML estimates of the d.f. ν for the MS data are all relatively small, especially for PL and LD. To assess further the adequacy of normal modelling for the MS data, we fit a normal version of M2 (obtained by setting ν = ∞), denoted by M4. The ML estimation results and AICs are reported in Table IV.


Table IV. ML estimation results for M4, where f is such that Ψ = f². Standard errors are in parentheses.

Group   β0                 β1                σ²                f                 ρ                  AIC
PL      0.0368 (0.0897)    0.0244 (0.0042)   0.0826 (0.0078)   1.1329 (0.2163)   0.2208 (0.0651)    141.34
LD      0.0326 (0.0685)    0.0089 (0.0039)   0.0619 (0.0063)   0.8953 (0.1868)   0.2717 (0.0682)    40.26
HD      −0.0040 (0.0416)   0.0125 (0.0030)   0.0137 (0.0034)   0.8734 (0.1945)   0.3584 (0.0642)    −299.30


Figure 3. Normal quantile plots for the residuals from fitting M4.

Compared with Table II, we find that the fixed effects are rather different and have consistently larger standard errors. The substantially larger AICs indicate that M4 is not suitable for the PL and LD groups, although the AICs are comparable for the HD group.

Figure 3 displays the corresponding normal quantile plots for the residuals from M4. Obviously, the residuals of PL and LD deviate seriously from normality, confirming the presence of longer-than-normal tails. In contrast, the departure from normality for HD is minor. Based on these findings, it appears that M4 might be adequate for HD.

We next compare the prediction accuracy of M2 and M4. We use the predictive sample reuse procedure of Geisser [14] sequentially by removing the last few points of each response vector as the true values to be predicted. As a measure of precision we use the MARD, which is defined as the mean of the absolute relative deviations |(y_{jp} − ŷ_{jp})/y_{jp}|, where p is the time point being forecast.

We restrict our attention to the one-step-ahead forecasts by setting p = 13–17. To predict Y_ip, we use Y_i1, ..., Y_{i,p−1} and Y_(i) as the sample to obtain the ML estimate θ̂ and plug it into the predictor (14).
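The MARD criterion itself is a one-liner; a minimal sketch under the definition above:

```python
import numpy as np

def mard(y_true, y_pred):
    """Mean of the absolute relative deviations |(y - yhat) / y|."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))
```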


Table V. Comparison of one-step-ahead forecast accuracy in terms of MARD.

Group   Time point being forecast   M4      M2
PL      13                          0.532   0.414
        14                          1.186   1.020
        15                          1.825   1.610
        16                          0.818   0.702
        17                          0.717   0.675
        Average                     1.016   0.884
LD      13                          0.528   0.504
        14                          1.044   1.010
        15                          0.580   0.591
        16                          0.716   0.484
        17                          0.302   0.287
        Average                     0.634   0.575
HD      13                          0.803   0.817
        14                          0.245   0.220
        15                          0.414   0.382
        16                          0.765   0.741
        17                          0.412   0.397
        Average                     0.528   0.511
Overall average                     0.726   0.657

Since the predictions are done sequentially for each subject and for each p, the procedure is termed prequential by Dawid [15]. Table V shows the prediction results from these two models. The t-based model has a much smaller MARD than the normal model for PL and LD; the relative improvements are 13 and 9.3 per cent, respectively. By contrast, the prediction performance for HD is only slightly better (3.2 per cent). The t-based model not only provides a better model fit, it also yields smaller forecast errors for the MS data; the overall improvement is 9.5 per cent.

6. CONCLUDING REMARKS

There is a rather extensive literature on approaches to robustifying linear mixed models; see, for example, References [16–19]. The t linear mixed model provides an alternative robust way of dealing with longitudinal data when some outlying observations are present. Moreover, the explicit derivative-based estimation and score testing procedures developed in this paper can be implemented easily with a low computational burden.

As shown by Lee [20] and Chi and Reinsel [21], inclusion of the simple AR(1) dependence can lead to an appropriate representation of the dependence structure and may reduce the need for including complex random effects in the model. For our illustrative MS data, it is encouraging that the t linear mixed model coupled with the AR(1) structure offers a better fit as well as better prediction performance than its normal counterpart. It may be worthwhile to compare with other dependence structures, such as higher-order AR, MA or ARMA.


Finally, as one referee pointed out, the current model does not allow simultaneous fitting of all three treatment groups of the MS data set, and it may be of interest to extend the model in that direction.

APPENDIX A: THE SCORE FUNCTION AND FISHER INFORMATION MATRIX

The score vector s_θ is the vector of the first derivatives of (4) with respect to θ = (β, σ², ν, γ), where γ = (vech(F), ρ) and F is the Cholesky decomposition of Ψ:

$$s_\beta = \sum_{i=1}^{N}\frac{(\nu+p_i)\,X_i'\Lambda_i^{-1}e_i}{\nu\sigma^2 + \Delta_i(\beta,\Psi,\rho)}$$

$$s_{\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^2}\sum_{i=1}^{N}(\nu+p_i)\left[\frac{\Delta_i(\beta,\Psi,\rho)}{\nu\sigma^2 + \Delta_i(\beta,\Psi,\rho)}\right]$$

$$s_\nu = \frac{1}{2}\sum_{i=1}^{N}\left\{\mathrm{DG}\!\left(\frac{\nu+p_i}{2}\right) - \mathrm{DG}\!\left(\frac{\nu}{2}\right) - \frac{p_i}{\nu} - \log\!\left(1 + \frac{\Delta_i(\beta,\Psi,\rho)}{\nu\sigma^2}\right) + \frac{(\nu+p_i)\,\Delta_i(\beta,\Psi,\rho)}{\nu(\nu\sigma^2 + \Delta_i(\beta,\Psi,\rho))}\right\}$$

$$[s_\gamma]_r = -\frac{1}{2}\sum_{i=1}^{N}\left\{\mathrm{tr}(\Lambda_i^{-1}\dot\Lambda_{ir}) - \frac{(\nu+p_i)\,e_i'\Lambda_i^{-1}\dot\Lambda_{ir}\Lambda_i^{-1}e_i}{\nu\sigma^2 + \Delta_i(\beta,\Psi,\rho)}\right\}$$

where $\dot\Lambda_{ir} = \partial\Lambda_i/\partial\gamma_r$ for r = 1, ..., g, g = (m_2² + m_2 + 2)/2, and DG(x) = (d/dx) log Γ(x) denotes the digamma function.

The Fisher information matrix, obtained as the negative expectation of the second derivatives of (4), has the following components:

$$I_{\beta\beta} = \sum_{i=1}^{N}\frac{\nu+p_i}{\sigma^2(\nu+p_i+2)}\,X_i'\Lambda_i^{-1}X_i,\qquad I_{\beta\sigma^2} = 0_{m_1\times 1},\qquad I_{\beta\nu} = 0_{m_1\times 1},\qquad I_{\beta\gamma} = 0_{m_1\times g}$$

$$I_{\sigma^2\sigma^2} = \frac{\nu}{2\sigma^4}\sum_{i=1}^{N}\frac{p_i}{\nu+p_i+2},\qquad I_{\sigma^2\nu} = \frac{1}{2\sigma^2}\sum_{i=1}^{N}\left\{\frac{p_i}{\nu+p_i+2} - \frac{p_i}{\nu+p_i}\right\}$$

$$[I_{\sigma^2\gamma}]_r = \frac{\nu}{2\sigma^2}\sum_{i=1}^{N}\frac{1}{\nu+p_i+2}\,\mathrm{tr}(\Lambda_i^{-1}\dot\Lambda_{ir})$$

$$I_{\nu\nu} = \frac{1}{4}\sum_{i=1}^{N}\left\{\mathrm{TG}\!\left(\frac{\nu}{2}\right) - \mathrm{TG}\!\left(\frac{\nu+p_i}{2}\right) - \frac{2p_i(\nu+p_i+4)}{\nu(\nu+p_i)(\nu+p_i+2)}\right\}$$

$$[I_{\nu\gamma}]_r = -\sum_{i=1}^{N}\frac{1}{(\nu+p_i)(\nu+p_i+2)}\,\mathrm{tr}(\Lambda_i^{-1}\dot\Lambda_{ir})$$

$$[I_{\gamma\gamma}]_{rs} = \frac{1}{2}\sum_{i=1}^{N}\frac{1}{\nu+p_i+2}\left\{(\nu+p_i)\,\mathrm{tr}(\Lambda_i^{-1}\dot\Lambda_{ir}\Lambda_i^{-1}\dot\Lambda_{is}) - \mathrm{tr}(\Lambda_i^{-1}\dot\Lambda_{ir})\,\mathrm{tr}(\Lambda_i^{-1}\dot\Lambda_{is})\right\}$$

for r, s = 1, ..., g, where TG(x) = (d²/dx²) log Γ(x) denotes the trigamma function.
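As a sanity check on the expressions above (our own suggestion, not part of the paper), each analytic score component can be compared with a finite difference of the log-likelihood (4). For example, for s_ν, reusing the earlier loglik and ar1_corr sketches:

```python
import numpy as np
from scipy.special import digamma

def s_nu(y, X, Z, beta, sigma2, Psi, rho, nu):
    """Analytic score for nu, as given in Appendix A."""
    total = 0.0
    for yi, Xi, Zi in zip(y, X, Z):
        p = len(yi)
        Lam = Zi @ Psi @ Zi.T + ar1_corr(p, rho)
        ei = yi - Xi @ beta
        Delta = ei @ np.linalg.solve(Lam, ei)
        total += 0.5 * (digamma((nu + p) / 2.0) - digamma(nu / 2.0) - p / nu
                        - np.log1p(Delta / (nu * sigma2))
                        + (nu + p) * Delta / (nu * (nu * sigma2 + Delta)))
    return total

# Check: (loglik(..., nu + h) - loglik(..., nu - h)) / (2 * h), with small h,
# should agree with s_nu(...) to several digits.
```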

APPENDIX B: THE FIRST PARTIAL DERIVATIVES OF ℓ_R

The first partial derivatives of ℓ_R with respect to σ², ν and γ are

$$\frac{\partial\ell_R}{\partial\sigma^2} = -\frac{n}{2\sigma^2} - \frac{1}{2}\mathrm{tr}\left\{\left(\sum_{i=1}^{N}X_i'H_iX_i\right)^{-1}\sum_{i=1}^{N}X_i'\frac{\partial H_i}{\partial\sigma^2}X_i\right\} + \frac{1}{2\sigma^2}\sum_{i=1}^{N}\frac{(\nu+p_i)\,\Delta_i(\hat\beta(\lambda),\Psi,\rho)}{\nu\sigma^2 + \Delta_i(\hat\beta(\lambda),\Psi,\rho)}$$

$$\frac{\partial\ell_R}{\partial\nu} = -\frac{1}{2}\mathrm{tr}\left\{\left(\sum_{i=1}^{N}X_i'H_iX_i\right)^{-1}\sum_{i=1}^{N}X_i'\frac{\partial H_i}{\partial\nu}X_i\right\} + \frac{1}{2}\sum_{i=1}^{N}\left\{\mathrm{DG}\!\left(\frac{\nu+p_i}{2}\right) - \mathrm{DG}\!\left(\frac{\nu}{2}\right) - \frac{p_i}{\nu} - \log\!\left(1 + \frac{\Delta_i(\hat\beta(\lambda),\Psi,\rho)}{\nu\sigma^2}\right) + \frac{(\nu+p_i)\,\Delta_i(\hat\beta(\lambda),\Psi,\rho)}{\nu(\nu\sigma^2 + \Delta_i(\hat\beta(\lambda),\Psi,\rho))}\right\}$$

$$\frac{\partial\ell_R}{\partial\gamma_r} = -\frac{1}{2}\mathrm{tr}\left\{\left(\sum_{i=1}^{N}X_i'H_iX_i\right)^{-1}\sum_{i=1}^{N}X_i'\frac{\partial H_i}{\partial\gamma_r}X_i\right\} - \frac{1}{2}\sum_{i=1}^{N}\left\{\mathrm{tr}(\Lambda_i^{-1}\dot\Lambda_{ir}) - \frac{(\nu+p_i)\,\hat e_i'(\lambda)\Lambda_i^{-1}\dot\Lambda_{ir}\Lambda_i^{-1}\hat e_i(\lambda)}{\nu\sigma^2 + \Delta_i(\hat\beta(\lambda),\Psi,\rho)}\right\}$$

where, writing ê_i = ê_i(λ) and Δ_i = Δ_i(β̂(λ), Ψ, ρ),

$$\frac{\partial H_i}{\partial\sigma^2} = -\nu(\nu+p_i)\left[\frac{\Lambda_i^{-1}}{(\nu\sigma^2 + \Delta_i)^2} - \frac{4\Lambda_i^{-1}\hat e_i\hat e_i'\Lambda_i^{-1}}{(\nu\sigma^2 + \Delta_i)^3}\right]$$

$$\frac{\partial H_i}{\partial\nu} = \frac{\Lambda_i^{-1}}{\nu\sigma^2 + \Delta_i} - \frac{2\Lambda_i^{-1}\hat e_i\hat e_i'\Lambda_i^{-1}}{(\nu\sigma^2 + \Delta_i)^2} - \sigma^2(\nu+p_i)\left[\frac{\Lambda_i^{-1}}{(\nu\sigma^2 + \Delta_i)^2} - \frac{4\Lambda_i^{-1}\hat e_i\hat e_i'\Lambda_i^{-1}}{(\nu\sigma^2 + \Delta_i)^3}\right]$$

$$\frac{\partial H_i}{\partial\gamma_r} = (\nu+p_i)\left[-\frac{\Lambda_i^{-1}\dot\Lambda_{ir}\Lambda_i^{-1}}{\nu\sigma^2 + \Delta_i} + \frac{\Lambda_i^{-1}\,(\hat e_i'\Lambda_i^{-1}\dot\Lambda_{ir}\Lambda_i^{-1}\hat e_i)}{(\nu\sigma^2 + \Delta_i)^2} + \frac{2\Lambda_i^{-1}\dot\Lambda_{ir}\Lambda_i^{-1}\hat e_i\hat e_i'\Lambda_i^{-1} + 2\Lambda_i^{-1}\hat e_i\hat e_i'\Lambda_i^{-1}\dot\Lambda_{ir}\Lambda_i^{-1}}{(\nu\sigma^2 + \Delta_i)^2} - \frac{4\Lambda_i^{-1}\hat e_i\hat e_i'\Lambda_i^{-1}\,(\hat e_i'\Lambda_i^{-1}\dot\Lambda_{ir}\Lambda_i^{-1}\hat e_i)}{(\nu\sigma^2 + \Delta_i)^3}\right]$$

APPENDIX C: PROOFS OF (12) AND (13)

Recall that Λ_i = Z_iΨZ_i' + C_i. With some algebraic manipulations, we get

$$\Lambda_i^{-1} = C_i^{-1} - C_i^{-1}Z_i\Psi(W_i + \Psi)^{-1}(Z_i'C_i^{-1}Z_i)^{-1}Z_i'C_i^{-1}$$

It follows that

$$\hat b_i(\theta) = \Psi Z_i'\Lambda_i^{-1}(Y_i - X_i\beta) = \Psi W_i^{-1}\tilde b_i(\theta) - \Psi W_i^{-1}\Psi(W_i+\Psi)^{-1}\tilde b_i(\theta) = \Psi(W_i+\Psi)^{-1}\tilde b_i(\theta) = \tilde b_i(\theta) - W_i(W_i+\Psi)^{-1}\tilde b_i(\theta)$$

where b̃_i(θ) = (Z_i'C_i^{-1}Z_i)^{-1}Z_i'C_i^{-1}(Y_i − X_iβ), and equation (12) holds.

The error covariance matrix of b̂_i(θ) is

$$E[(\hat b_i(\theta) - b_i)(\hat b_i(\theta) - b_i)'] = \frac{\nu\sigma^2}{\nu-2}\left(\Psi - \Psi Z_i'\Lambda_i^{-1}Z_i\Psi\right) = \frac{\nu\sigma^2}{\nu-2}\left(\Psi - \Psi(W_i+\Psi)^{-1}\Psi\right) = \frac{\nu\sigma^2}{\nu-2}\,(W_i - W_i(W_i+\Psi)^{-1}W_i)$$

using ΨZ_i'Λ_i^{-1}Z_iΨ = Ψ(W_i+Ψ)^{-1}Ψ and the identity Ψ − Ψ(W_i+Ψ)^{-1}Ψ = W_i − W_i(W_i+Ψ)^{-1}W_i, which is (13).


APPENDIX D: PROOF OF (15)

Recall that Λ_11 = Z_iΨZ_i' + C_11; then

$$\Lambda_{11}^{-1} = C_{11}^{-1} - C_{11}^{-1}Z_i(W_{11} - W_{11}(\Psi + W_{11})^{-1}W_{11})Z_i'C_{11}^{-1}$$

It suffices to show that

$$\Lambda_{22\cdot 1} = \Lambda_{22} - \Lambda_{21}\Lambda_{11}^{-1}\Lambda_{12} = z_i\Psi z_i' + C_{22} - (z_i\Psi Z_i' + C_{21})\Lambda_{11}^{-1}(Z_i\Psi z_i' + C_{12})$$
$$= C_{22} - C_{21}C_{11}^{-1}C_{12} + z_i\Psi z_i' - z_i\Psi(\Psi + W_{11})^{-1}\Psi z_i' - z_i\Psi(\Psi + W_{11})^{-1}W_{11}Z_i'C_{11}^{-1}C_{12} - C_{21}C_{11}^{-1}Z_iW_{11}(\Psi + W_{11})^{-1}\Psi z_i' + C_{21}C_{11}^{-1}Z_i(W_{11} - W_{11}(\Psi + W_{11})^{-1}W_{11})Z_i'C_{11}^{-1}C_{12}$$

Since

$$z_i\Psi z_i' - z_i\Psi(\Psi + W_{11})^{-1}\Psi z_i' = z_i(W_{11} - W_{11}(\Psi + W_{11})^{-1}W_{11})z_i'$$
$$z_i\Psi(\Psi + W_{11})^{-1}W_{11}Z_i'C_{11}^{-1}C_{12} = z_i(W_{11} - W_{11}(\Psi + W_{11})^{-1}W_{11})(C_{21}C_{11}^{-1}Z_i)'$$
$$C_{21}C_{11}^{-1}Z_iW_{11}(\Psi + W_{11})^{-1}\Psi z_i' = C_{21}C_{11}^{-1}Z_i(W_{11} - W_{11}(\Psi + W_{11})^{-1}W_{11})z_i'$$

we have

$$\Lambda_{22\cdot 1} = C_{22} - C_{21}C_{11}^{-1}C_{12} + (z_i - C_{21}C_{11}^{-1}Z_i)(W_{11} - W_{11}(\Psi + W_{11})^{-1}W_{11})(z_i - C_{21}C_{11}^{-1}Z_i)' = C_{22\cdot 1} + (z_i - C_{21}C_{11}^{-1}Z_i)(W_{11} - W_{11}(\Psi + W_{11})^{-1}W_{11})(z_i - C_{21}C_{11}^{-1}Z_i)'$$

This completes the proof.

ACKNOWLEDGEMENTS

The authors thank Prof. Paramjit S. Gill for providing the MS data at an early stage of this work. We also thank the editor and two anonymous referees for their helpful comments and suggestions, which substantially improved the presentation. This research was supported by the National Science Council of Taiwan (Grant Nos. NSC93-2118-M-029-004 and NSC93-2118-M-009-003).


REFERENCES

1. Gill PS. A robust mixed linear model analysis for longitudinal data. Statistics in Medicine 2000; 19:975–987.
2. West M. Outlier models and prior distributions in Bayesian linear regression. Journal of the Royal Statistical Society, Series B 1984; 46:431–438.
3. Lange KL, Little RJA, Taylor JMG. Robust statistical modelling using the t distribution. Journal of the American Statistical Association 1989; 84:881–896.
4. James AT, Wiskich JT, Conyers RAJ. t-REML for robust heteroscedastic regression analysis of mitochondrial power. Biometrics 1993; 49:339–356.
5. Welsh AH, Richardson AM. Approaches to the robust estimation of mixed models. In Handbook of Statistics, vol. 15, Maddala GS, Rao CR (eds). Elsevier Science: Amsterdam, 1997; 343–383.
6. Pinheiro JC, Liu CH, Wu YN. Efficient algorithms for robust estimation in linear mixed-effects models using the multivariate t distribution. Journal of Computational and Graphical Statistics 2001; 10:249–276.
7. Rochon J. ARMA covariance structures with time heteroscedasticity for repeated measures experiments. Journal of the American Statistical Association 1992; 87:777–784.
8. Lin TI, Lee JC. On modelling data from degradation sample paths over time. Australian and New Zealand Journal of Statistics 2003; 45:257–270.
9. Lee JC, Lin TI, Lee KJ, Hsu YL. Bayesian analysis of Box–Cox transformed linear mixed models with ARMA(p, q) dependence. Journal of Statistical Planning and Inference 2005; 133:435–451.
10. Harville DA. Bayesian inference for variance components using only error contrasts. Biometrika 1974; 61:383–385.
11. Laird NM, Ware JH. Random effects models for longitudinal data. Biometrics 1982; 38:963–974.
12. D'yachkova YD, Petkau J, White R. Longitudinal analysis for magnetic resonance imaging outcomes in multiple sclerosis clinical trials. Journal of Biopharmaceutical Statistics 1997; 7:501–531.
13. Rubin DB. Inference and missing data. Biometrika 1976; 63:581–592.
14. Geisser S. The predictive sample reuse method with applications. Journal of the American Statistical Association 1975; 70:320–328.
15. Dawid AP. Present position and potential developments: some personal views. Journal of the Royal Statistical Society, Series A 1984; 147:278–292.
16. Fellner WH. Robust estimation of variance components. Technometrics 1986; 28:51–60.
17. Huggins RM. A robust approach to the analysis of repeated measures. Biometrics 1993; 49:715–720.
18. Richardson AM. Bounded influence estimation in the mixed linear model. Journal of the American Statistical Association 1997; 92:154–161.
19. Richardson AM, Welsh AH. Robust restricted maximum likelihood in mixed linear models. Biometrics 1995; 51:1429–1439.
20. Lee JC. Prediction and estimation of growth curve with special covariance structures. Journal of the American Statistical Association 1988; 83:432–440.
21. Chi EM, Reinsel GC. Models for longitudinal data with random effects and AR(1) errors. Journal of the American Statistical Association 1989; 84:452–459.

