On estimation and prediction for temporally correlated longitudinal data

(1)

Inference 87 (2000) 87–104

www.elsevier.com/locate/jspi

On estimation and prediction for temporally correlated

longitudinal data

Jack C. Lee

a; ∗

_{, R.C. Hwang}

b

a_{Institute of Statistics, National Chiao Tung University, Hsinchu, Taiwan} b_{Department of Accounting and Statistics, Dahan Institute of Technology, Hualien, Taiwan} Received 20 October 1998; received in revised form 12 August 1999; accepted 1 October 1999

Abstract

In this paper, from a Bayesian point of view, we consider estimation of parameters and prediction of future values for the longitudinal model proposed by Diggle (1988. Biometrics 44, 959–971). This model, called the repeated measures linear model, incorporates group mean, variability among individuals, serial correlation within an individual, and measurement error. Two dierent priors are employed by the Bayesian approach, one is the noninformative prior and the other is composed of inverse gamma distributions. Given the noninformative prior, it is shown that the resulting approximate estimates of the regression coecients are the same as those derived by the restricted maximum likelihood estimation. Markov chain Monte Carlo methods are also used to obtain more accurate Bayesian inference for parameters as well as prediction of future values. For parameter estimation and prediction of future values, the advantages of the Bayesian approach over the maximum likelihood method and the restricted maximum likelihood method are demonstrated by both real and simulated data. c 2000 Elsevier Science B.V. All

rights reserved. MSC: 62F15; 62F10

Keywords: Approximate Bayesian method; Informative prior; Maximum likelihood estimation; Minimum accumulated prediction error; Noninformative prior; Repeated measures linear model; Restricted maximum likelihood estimation

1. Introduction

In this paper, from a Bayesian point of view, we consider estimation of parameters and prediction of future values for the longitudinal model proposed by Diggle (1988). This model, called the repeated measures linear model, incorporates group mean, vari-ability among individuals, serial correlation within an individual, and measurement error, and is dened as

Yij= Xijÿi+ ij1ij+ Vij+ ij (1.1)

∗_{Corresponding author.}

(2)

for j = 1; : : : ; Ni and i = 1; : : : ; r. Here Yij is a pij× 1 random vector representing pij

observations made on the jth subject in the ith group. The Xij is a known design

matrix of order pij× m, and ÿi an unknown m × 1 vector of regression coecients

of the ith group. The ij’s are independent and identically distributed normal random

variables each with mean 0 and variance 2

, and 1ij is a pij× 1 vector of 1’s. The

Vij’s are independent pij× 1 normal random vectors each with mean 0 and covariance

matrix 2

VCij, where Cij stands for a correlation matrix. Finally, ij’s are independent

pij× 1 normal random vectors each with mean 0 and covariance matrix 2Iij. It is

noted that Iij is a pij× pij identity matrix, and Xij is of rank m; 26m6pij. The

rst component Xijÿi in (1.1) is interpreted as the ith group mean over individuals.

The second component ij stands for the individual random eect on the jth subject in

the ith group. The third component Vij represents serial correlation in Yij. The fourth

component ij denotes measurement errors which are uncorrelated.

By (1.1), the covariance matrix of Yij can be expressed as

Cov(Yij) = 2Vij; (1.2)

where ij=11ij10ij+Cij+2Iij; 1=2=2V and 2=2=V2. The dependence structure

in Vij considered in this paper is an autoregressive process of order 1, i.e., an AR(1)

process. Hence Cij= (|a−b|3 ), for a; b = 1; : : : ; pij, and 3 ∈ (−1; 1) is the AR(1)

parameter. Note that the covariance structure of Yij consists of a uniform covariance

and an AR(1) serial covariance. See Lee (1988) for a detailed explanation. It is also worth noting that this covariance structure, although quite exible for longitudinal data, may not model well in some situations. In particular, it assumes variance components, and correlation structure, that apply to all subjects and is time independent. A residual variance that slightly increases over time might well be indistinguishable from one that does not in terms of model t to the observations. However, it could make a large dierence in prediction. Furthermore, a more complicated correlation structure such as AR(g) or ARMA could be more appropriate for the data at hand. These are beyond the scope of the current paper and will be subjects of future investigation. Finally, 1=(1+ 2) = 2=(2+ 2) will tends to the intraclass correlation as 3 tends to zero.

The estimation of parameters for the model has been studied by Diggle (1988) using the methods of maximum likelihood (ML) and restricted maximum likelihood (REML). The prediction of future values for the model has been considered by Donnelly et al. (1995) by using the ML method.

In addition to estimation of parameters, two dierent prediction problems for the un-balanced repeated measures growth curve model are considered from a Bayesian point of view. For the rst prediction problem, let yi‘ be a q × 1 random vector representing

q future observations made on the ‘th subject in the ith group, for ‘ = 1; : : : ; Ni and

i=1; : : : ; r. We are interested in predicting yi‘ given Y =(Y110 ; : : : ; Y1N10 ; : : : ; Yr10; : : : ; YrNr0 )0.

This is a time series prediction and thus is important in practice. This prediction prob-lem is called extended prediction of yi‘, by Lee (1988), since the prediction is made

beyond the observed time range of the sample. It is noted that in our setting extended prediction is identical to conditional prediction as studied in Lee and Geisser (1972,

(3)

1975), Fearn (1975), Rao (1987), Lee (1988, 1991), and Donnelly et al. (1995). For the second prediction problem, let Fi be p future values made on a new subject under

study in the ith group. Note that the new subject under study is not contained in the original Ni subjects in the ith group in (1.1). Specically, Fi is a multivariate normal

random vector with mean Xÿi and covariance matrix 2Vp, where X is a known p×m

design matrix for the new subject under study in the ith group, and p is of the form

specied by (1.2). We will address the prediction of Fi given Y .

In order to estimate parameters and to predict future values for the growth curve problem, the model is considered in this paper from a Bayesian viewpoint with two dierent priors. The rst prior is the noninformative prior (Edwards et al., 1963; Zellner and Tiao, 1964). Using this prior, it is shown that the resulting approximate estimates of the regression coecients are the same as those obtained by the REML estima-tion. For a detailed discussion of the REML estimation, see, for example, Patterson and Thompson (1971) and Harville (1974). The second prior is composed of inverse gamma distributions, Gelfand et al. (1990). In order to employ the second prior, the hyperparameters of the inverse gamma distributions are selected by the idea of min-imum accumulated prediction error (Rissanen, 1986; Lee and Tsao, 1992). In order to compare relative merits of dierent methods, we will compare the mean absolute deviation (MAD) and the mean absolute relative deviation (MARD) of the predicted values from the actual observations. For extended prediction of yi‘, it is shown by real

data that the approximate Bayesian method employing the second prior performs better than those using the rst prior, the ML method and the REML method. A possible advantage of this prior is demonstrated in Table 5 regarding the predictive accuracy of extended prediction of yi‘.

Given the model using two dierent priors, Bayesian estimation of parameters is considered in Section 2 and prediction problems are presented in Section 3. Bayesian inference via the Markov chain Monte Carlo (MCMC) methodology is considered in Section 4. The results developed in this paper are illustrated in Section 5 with real and simulated data. Finally, some concluding remarks are made in Section 6.

2. Bayesian estimation of the parameters

In this section, the posterior distributions and the posterior regions for the parameters in the model are studied.

For the joint prior of the parameters ÿ; 2

V; 1; 2, and 3, two dierent priors (2.1)

and (2.2) will be used in this paper: (ÿ; 2

V; 1; 2; 3) ˙ −2V ; (2.1)

(ÿ; 2

V; 1; 2; 3) ˙ −2V (1)(2); (2.2)

where ÿ = (ÿ1; : : : ; ÿr) and the function is the inverse gamma distribution with the

hyperparameters Á ¿ 1 and Â ¿ 0. Specically, (x) = IG(Á; Â) ˙ x−(Á+1)_{exp(−Â=x):}

(4)

In both (2.1) and (2.2), we have assumed that ÿ1; : : : ; ÿr; 2V; 1; 2, and 3 have

independent prior distributions. In (2.1), no information is available for each of the parameters and it is called noninformative prior (Edwards et al., 1963; Zellner and Tiao, 1964); whereas for (2.2) Gelfand et al. (1990) chose subjectively the hyperparameters Á and Â of the inverse gamma distribution in advance. But in this paper we shall use the idea of minimum accumulated prediction error as proposed by Rissanen (1986) and Lee and Tsao (1992) to choose these two hyperparameters. In the accumulated prediction error criterion, the prediction is performed sequentially and was termed “prequential” by Dawid (1984). In this case, we avoid choosing these two hyperparameters subjectively. Note that the priors for 1 and 2 given on the right-hand side of (2.2) can be

generalized by using two dierent sets of hyperparameters in their inverse gamma distributions. Here, for simplicity, the same set of hyperparameters is assumed for the two inverse gamma distributions.

We now introduce the idea of the minimum accumulated prediction error criterion for choosing the two hyperparameters (Á; Â). This idea will take the minimizer of the accumulated prediction error S(Á; Â) over (Á; Â) as the selected hyperparameters for (Á; Â). The function S(Á; Â) is dened by

S(Á; Â) =Pr i=1 Ni P j=1 pij P

k=4|Yijk− ˆYijk(Á; Â)|:

Here Yijk is the kth observation in Yij and ˆYijk(Á; Â) the predicted value of Yijk when

Á and Â are used in (2.2). The detailed formulation of ˆYijk(Á; Â) will be given in

Section 3. Let ( ˆÁ; ˆÂ) denote the selected hyperparameters for (Á; Â). In this paper, the prior given in (2.1) is called prior 1 and that given in (2.2) with the selected hyperparameters ( ˆÁ; ˆÂ) prior 2.

By applying the approximate method of Ljung and Box (1980), it can be shown that the approximate posterior distributions of ÿi and V2, and an approximate 100(1 − )%

posterior region for ÿi; i = 1; : : : ; r, are

ÿi|Y ˙∼Tm   ˆˆÿ_i; ˆB1 (n − mr) Ni P j=1X 0 ijˆ−1ij Xij !−1 ; n − mr   ; (2.3) 2 V|Y ˙∼IG((n − mr)=2; ˆB1=2); (2.4) (ÿi− ˆˆÿi)0 Ni P j=1X 0 ijˆ−1ij Xij ! (ÿi− ˆˆÿi)6const1; (2.5)

where n=Pr_i=1PNi_j=1pij; const1=F(1−; m; n−mr)m(n−mr)−1ˆB1; Tp(; ; n) denotes

a p-variate T-distribution with mean and covariance matrix n(n − 2)−1_{, and ˆˆÿ}

i; ˆB1

and ˆij are the ˆÿi; B1 and ij with 1; 2; 3 replaced by ˆ1; ˆ2; ˆ3, respectively,

B1= r P i=1 Ni P j=1(Yij− Xijˆÿi) 0−1 ij (Yij− Xijˆÿi);

(5)

ˆÿ_i= PNi j=1X 0 ij−1ij Xij !−1 Ni P j=1X 0 ij−1ij Yij ! ; and ˆ₁; ˆ₂; ˆ₃ maximize p(1; 2; 3|Y; prior 1) ˙ B−(n−mr)=2₁ r Q i=1 Ni Q j=1|ij| −1=2 Qr i=1 Ni P j=1X 0 ij−1ij Xij −1=2 ; (2.6) or

p(1; 2; 3|Y; prior 2) ˙ 1−( ˆÁ+1)−( ˆÁ+1)2 exp(− ˆÂ=1− ˆÂ=2)

×B₁−(n−mr)=2 Qr i=1 Ni Q j=1|ij| −1=2 Qr i=1 Ni P j=1X 0 ij−1ij Xij −1=2 : (2.7) We now close this section with the following remarks. Firstly, the joint posterior distribution of 1; 2, and 3 given in (2.6) is proportional to the objective function

for choosing the REML estimates of 1; 2, and 3. Hence, the REML estimates of

1; 2, and 3 are identical to the mode of the joint posterior distribution of 1; 2, and

3 as given in (2.6). Secondly, by (2.3), the expectation of the approximate posterior

distribution of ÿi is ˆˆÿi. It is the same as the REML estimate of ÿi. Hence, we know that

when prior 1 is used, the REML estimate of ÿi is an approximate Bayesian estimate,

for each i. To the authors’ knowledge, this fact regarding the estimation of ÿi has

not been given in the literature. Finally, by (2.4), the posterior expectation of 2 V is

ˆB1=(n − 2 − mr) while the REML estimate of V2 is ˆB1=(n − mr). Hence, when n − mr

is suciently large, the estimates of 2

V derived by the two methods are very close.

However, in small samples, the Bayesian estimate of 2

V will be a bit more biased than

the REML estimate. In fact, it is easily seen that

E ˆB1 n − 2 − mr − V2 ! = n − mr n − 2 − mrE ˆB1 n − mr − 2V ! + 2 n − 2 − mr2V; (2.8) and the REML estimate of 2

V is positively biased as seen in Table 11.

Note that if the joint prior for the parameters ÿ; 2

V; 1; 2, and 3 is proportional

to a constant, then the mode of the resulting joint posterior distribution of 2

V; 1; 2,

and 3 is the REML estimates of 2V; 1; 2, and 3. This fact has been noted by

Harville (1974) for a general variance component model. In this case, the approximate posterior distribution of ÿi is a multivariate T distribution as given in (2.3), except

that the degree of freedom n − mr is replaced by n − 2 − mr. 3. Prediction

In this section, two dierent prediction problems for the model will be considered. We shall rst study extended prediction of yi‘, a future q-dimensional observation

(6)

when each of priors 1 and 2 is employed. To solve this prediction problem, both mean and covariance structure generally have to be extendable to the future values of the individuals observed. We assume that the mean structure is extendable and the covariance structure considered in the paper also satises this requirement. Since the estimation of parameters, especially 3, can be highly sensitive to outliers and the

proper dependence structure is important in the prediction of future observation, we would expect that the predictive inference will be quite sensitive to outliers. This topic is beyond the scope of the paper and will be a potential topic in the future.

Let x be a q × m design matrix corresponding to yi‘ and ˜Xi‘= Xi‘_x . We have

Cov Yi‘ yi‘ = 2 V(11 10+ C + 2I) = 2V = 11 12 21 22 : (3.1)

Here 1 is a (pi‘+ q) × 1 vector of 1’s, I is a (pi‘+ q) × (pi‘+ q) identity matrix,

and C = (|a−b|₃ ) for a; b = 1; : : : ; (pi‘+ q).

As in Section 2, it can be shown that the approximate predictive distribution and an approximate 100(1 − )% predictive region for yi‘ are

yi‘|Y ˙∼Tq( ˆy; ˆSy((n − mr) ˆG22)−1; n − mr); (3.2)

(yi‘− ˆy)0 ˆG22(yi‘− ˆy)6const2; (3.3)

where Q1= Ni P j6=‘X 0 ij−1ij Xij; Q2= ˜Xi‘0 −1˜Xi‘; Q = Q1+ Q2; ˆÿ_i1= PNi j6=‘X 0 ij−1ij Xij !₋₁ Ni P j6=‘X 0 ij−1ij Yij ! ; B2= r P k6=i Nk P j=1(Ykj− Xkjˆÿk) 0−1 kj (Ykj− Xkjˆÿk) + Ni P j6=‘(Yij− Xijˆÿi1) 0−1 ij (Yij− Xijˆÿi1); G = −1_˜X

i‘( ˜X0i‘−1˜Xi‘)−1Q1Q−1˜X0i‘−1+ Z(Z0Z)−1Z0=

G11 G12

G21 G22

; y= x ˆÿi1− G22−1G21(Yij− Xijˆÿi1); Sy= B2+ (Yi‘− Xi‘ˆÿi1)0G11:2(Yi‘− Xi‘ˆÿi1);

Z is a (pi‘+ q) × (pi‘ + q − m) matrix satisfying ˜X0i‘Z = 0; G11; G12, and G22 are

pi‘ × pi‘; pi‘ × q, and q × q matrices, respectively, G21 = G120 , and G11:2= G11−

G12G−122 G21; const2=F(1−; q; n−mr)q(n−mr)−1ˆSy, and ˆy; ˆSy, and ˆG22are the y; Sy,

and G22 with 1; 2, and 3 replaced respectively by ˆ1; ˆ2, and ˆ3 which maximize

p(1; 2; 3|Y; prior 1) ˙ Sy−(n−mr)=2 r Q k6=i Nk Q j=1|kj| −1=2 QNi j6=‘|ij| −1=2 ×(|||Q||G22|)−1=2 r Q k6=i Nk P j=1X 0 kj−1kj Xkj −1=2 ;

(7)

or

p(1; 2; 3|Y; prior 2) ˙ 1−( ˆÁ+1)−( ˆÁ+1)2 exp(− ˆÂ=1− ˆÂ=2)

×S−(n−mr)=2 y r Q k6=i Nk Q j=1|kj| −1=2 QNi j6=‘|ij| −1=2 ×(|||Q||G22|)−1=2 r Q k6=i Nk P j=1X 0 kj−1kj Xkj −1=2 :

The prediction of future values Fi, a p × 1 vector, given the sample Y is now

considered. Similar to yi‘, the approximate predictive distribution and an approximate

100(1 − )% predictive region for Fi are

Fi|Y ˙∼Tp(X ˜ÿi; ˜B1((n − mr) ˜M)−1; n − mr); (3.4) (Fi− X ˜ÿi)0M(F˜ i− X ˜ÿi)6const3; (3.5) where const3= F(1 − ; p; n − m)p(n − m)−1˜B1, W1= Ni P j=1X 0 ij−1ij Xij; W2= X0−1p X; W = W1+ W2; M = −1 p X (X0−1p X )−1W1W−1X0−1p + U(U0pU)−1U0;

U is a p × (p − m) matrix satisfying X0_{U = 0, and ˜ÿ}

i; ˜B1, and ˜M are the ˆÿi; B1, and

M with 1; 2, and 3 replaced respectively by ˆ1; ˆ2, and ˆ3 which maximize

p(1; 2; 3| Y; prior 1) ˙ B−(n−mr)=2₁ r Q i=1 Ni Q j=1|ij| −1=2 ×(|p||W ||M|)−1=2 r Q k6=i Nk P j=1X 0 k−1kj Xkj −1=2 ; or p(1; 2; 3| Y; prior 2) ˙ −( ˆÁ+1)1 −( ˆÁ+1)2 exp(− ˆÂ=1− ˆÂ=2)B−(n−mr)=21 ×Qr i=1 Ni Q j=1|ij| −1=2_(| p||W ||M|)−1=2 ×Qr k6=i Nk P j=1X 0 k−1kj Xkj −1=2 :

4. Bayesian inference via MCMC methodology

The MCMC methodology has been extremely popular in statistics since the publi-cation of Gelfand and Smith (1990). For some details see Casella and George (1992) and Gilks et al. (1996). In the rest of this section we will show the results for prior 1 only. They can be adapted for prior 2 easily and are hence omitted.

(8)

4.1. Parameter estimation

The joint posterior of ÿ = (ÿ1; : : : ; ÿr), 2V, 1, 2, and 3 can be obtained from

com-bining the likelihood function and the prior (2:1). MCMC proceeds as follows: 1. Generate ÿigiven ÿj, j 6= i, 2V, 1, 2, 3and Y from N( ˆÿi, 2V(PNij=1Xij0−1ij Xij)−1).

2. Generate 2

V given ÿ, 1, 2, 3, and Y from IG(n=2; S(ÿ; 1; 2; 3; Y )) where

S(ÿ, 1, 2, 3; Y ) =Pri=1

P_Ni

j=1(Yij− Xijÿij)0−1ij (Yij− Xijÿij).

3. Generate 1given ÿ, 2, 3and Y via the Metropolis–Hastings (M–H) algorithm from

g1(01| ÿ; 2; 3; Y ) ˙Qr_i=1QNi_j=1|∗ij|−1=2exp[(1=22V) S(ÿ; 10; 2; 3; Y )] exp(01),

where 0

1= log(1); ∗ij is ij with 1 replaced by exp(01). Once 01 is generated,

1= exp(01).

4. Generate 2 given ÿ, 1, 3 and Y via the M–H algorithm from g2(02| ÿ; 1; 3; Y )

which is dened as g1 with 01 replaced by 02 and 2 replaced by 1.

5. Generate 3 given ÿ, 1, 2 and Y via the M–H algorithm from

g3(03| ÿ; 1; 2; Y ) ˙ r Q i=1 Ni Q j=1| ∗ ij|−1=2exp −₂1₂ V S(ÿ; 1; 2; 0 3; Y ) ×[2 exp(0 3)][1 + exp(03)]−2; where 0

3 = log(1 + 3)(1 − 3)−1, ij∗ is ij with 3 replaced by [exp(03) −

1][exp(0

3) + 1]−1. Once 03 is generated, 3= [exp(03) − 1][exp(03) + 1]−1.

In order to ensure that the samples are drawn from the domain of the entire density, Gelman and Rubin (1992) suggested using “overdispersed” starting values in multiple chains to assist in drawing from the domain of the entire density and assessing conver-gence by using the potential scale reduction measure √ˆR. According to this viewpoint, we implemented the MCMC sampling using seven chains with dierent starting values. The starting values for each variable of interest are the MLE and six others obtained from MLE ± k times standard deviation, for k = 1; 3; 5. For each chain, after a su-ciently long burn-in iterations, we then use the remaining samples as simulated from the variable of interest.

4.2. Prediction

We have from (3:1) that yij| Â; Y ∼ N(2:1; 2V22:1), where Â = (ÿ; V2; 1; 2; 3),

2:1= xÿi+ 21−111 (Yi‘− Xi‘ÿi); 22:1= 22− 21−111 12, and ij is dened in (3:1).

Hence we can generate y_i‘(k;s) from f(yi‘|Â(k;s); Y ) where Â(k;s) is the kth iteration and

sth replication of the MCMC sampler of Â. Thus, we can predict yi‘ by the mean

ˆy_i‘=_mn1 Pm s=1 n P k=1y (k;s) i‘

or the median of the samples, i.e., ˆy_i‘=_m1 Pm

s=1y (s) 0:5;

(9)

where y(s)_0:5 is the median of the last n iterations for each chain. Prediction intervals and quantiles of yi‘ can also be obtained from the sampler yi‘(k;s).

It is noted that the prediction of future values Fi can be done in a similar manner.

5. Numerical illustration

In order to get a further insight into the results obtained in Sections 2–4, we will illustrate with a real example and a simulation study.

5.1. A real example

In this section, the results obtained in Sections 2–4 are applied to a real exam-ple. The data, given in Table 1, consist of the weights of 23 calves, each observed from 0 to 18 weeks with an increment of 2 weeks. They are given in Group B of Table 6:1 of Diggle et al. (1994) with the weights on 19th week excluded and with the exception of individuals 1, 9, 11, 19, 23, 26 and 28. The seven calves are excluded from consideration for this illustration because their growth patterns are irregular in the last two weeks which make them unsuitable for the model.

Table 1 Weights (100 kg) of calves Time in weeks 0 2 4 6 8 10 12 14 16 18 1 2.30 2.40 2.58 2.77 2.77 2.93 3.00 3.23 3.27 3.40 2 2.26 2.33 2.48 2.77 2.97 3.13 3.22 3.40 3.54 3.65 3 2.33 2.39 2.53 2.77 2.92 3.10 3.18 3.33 3.36 3.53 4 2.38 2.41 2.62 2.82 3.00 3.14 3.19 3.31 3.38 3.48 5 2.25 2.28 2.37 2.61 2.71 2.88 3.00 3.16 3.19 3.33 6 2.24 2.25 2.39 2.57 2.68 2.90 3.04 3.13 3.10 3.18 7 2.37 2.41 2.55 2.76 2.93 3.07 3.12 3.36 3.36 3.44 8 2.33 2.39 2.59 2.83 2.94 3.13 3.20 3.47 3.48 3.62 9 2.28 2.23 2.46 2.66 2.77 2.87 3.00 3.12 3.08 3.28 10 2.41 2.47 2.68 2.90 3.09 3.23 3.36 3.48 3.59 3.72 11 2.21 2.21 2.40 2.53 2.73 2.82 2.92 3.07 3.06 3.17 12 2.17 2.20 2.35 2.59 2.62 2.76 2.84 3.05 3.03 3.15 13 2.14 2.21 2.37 2.56 2.71 2.83 2.87 3.14 3.16 3.20 14 2.24 2.31 2.41 2.56 2.65 2.83 2.95 3.14 3.13 3.28 15 2.00 2.03 2.21 2.36 2.48 2.62 2.76 2.94 2.91 3.11 16 2.30 2.22 2.43 2.53 2.68 2.84 2.90 3.16 3.14 3.30 17 2.17 2.24 2.42 2.65 2.84 3.02 3.09 3.24 3.28 3.38 18 2.09 2.09 2.21 2.38 2.56 2.67 2.81 2.95 3.01 3.09 19 2.30 2.31 2.44 2.61 2.72 2.83 2.94 3.18 3.20 3.33 20 2.16 2.18 2.23 2.43 2.59 2.70 2.70 2.90 3.01 3.14 21 2.07 2.16 2.28 2.55 2.75 2.85 2.96 3.14 3.19 3.30 22 2.21 2.32 2.51 2.84 2.84 2.95 3.00 3.23 3.19 3.33 23 2.33 2.38 2.54 2.66 2.82 2.94 2.95 3.10 3.20 3.27

(10)

Fig. 1. Weights of calves.

To get the ML, REML and the approximate Bayesian estimates of parameters in the model and to predict future values, the data were divided by 100 to avoid “over ow” or “under ow” problems. They are plotted in Fig. 1.

Given the model and prior 1, in order to predict the values in the 18th week, the MCMC simulated samples of the related parameters were produced by using the data in the rst 16 weeks. In order to t the model, we rst examine Fig. 1 which shows clearly that the data have a linear trend and consist of only one group. By this and the fact that the times of measurement are equally spaced, the value of r was taken as r = 1, the vector of regression coecients ÿ1= (ÿ11; ÿ12)0 and the design matrix for

each calf is X = 1; 1; : : : ; 1 1; 2; : : : ; 9 0 :

Tables 2 and 3 give the mean, standard deviation, and 2.5%, 5%, 25%, 50%, 75%, 95%, and 97.5% quantiles of the 7000 MCMC simulated samples of (ÿ1; 2V; 1; 2; 3)

for priors 1 and 2, respectively. We can then obtain point and interval estimates for any parameter of interest. The parameter estimates from ML and REML are shown in Table 4.

(11)

Table 2

MCMC simulated posterior distributions using prior 1

ÿ11 ÿ12 2 V 1 2 3 Mean 2.1039 0.1259 0.0088 1.2051 0.0860 0.8196 S.D. 0.0372 0.0034 0.0011 0.6511 0.0621 0.0543 2.5% 2.0106 0.1195 0.0071 0.3595 0.0059 0.7268 5% 2.0255 0.1205 0.0074 0.4286 0.0083 0.7366 25% 2.0858 0.1237 0.0081 0.6065 0.0269 0.7742 50% 2.1088 0.1258 0.0085 1.1056 0.0786 0.8195 75% 2.1278 0.1282 0.0092 1.6997 0.1347 0.8622 95% 2.1535 0.1321 0.0109 2.4248 0.1905 0.9053 97.5% 2.1666 0.1329 0.0115 2.5738 0.2143 0.9165 Table 3

MCMC simulated posterior distributions using prior 2

ÿ11 ÿ12 2 V 1 2 3 Mean 2.1041 0.1261 0.0078 0.0623 0.0845 0.8068 S.D. 0.0216 0.0032 0.0012 0.0152 0.0281 0.0506 2.5% 2.0605 0.1199 0.0049 0.0317 0.0430 0.7229 5% 2.0690 0.1209 0.0057 0.0404 0.0481 0.7380 25% 2.0894 0.1239 0.0072 0.0527 0.0641 0.7716 50% 2.1039 0.1261 0.0079 0.0613 0.0795 0.7995 75% 2.1192 0.1282 0.0085 0.0707 0.0992 0.8328 95% 2.1396 0.1314 0.0095 0.0883 0.1392 0.9069 97.5% 2.1462 0.1321 0.0098 0.0952 0.1518 0.9295 Table 4

Comparison of estimates from ML and REML using real data

ÿ11 ÿ12 2

V 1 2 3

ML 2.0962 0.1277 0.0079 1.5448 0.1911 0.8245

REML 2.0962 0.1276 0.0087 1.4019 0.1753 0.8429

We now compare prediction ability among the eight prediction methods: ML method, REML method, approximate Bayesian method, and the mean and the median of the MCMC samples for both priors. The ML method has been used in prediction by Donnelly et al. (1995). The approximate Bayesian method is to take ˆ_yin (3:3) for both priors 1 and 2 as the predictor for yi‘. To get ˆy for prior 2, the minimum accumulated

prediction error criterion was used to select the inverse gamma hyperparameters Á and Â over the rectangle (1; 40]×(0; 10]. The values of S(Á; Â) were calculated on 1950×500 equally spaced grid points in the rectangle. The minimizer of these values was taken as the selected value of (Á; Â) for prior 2.

Using the model and the data in the rst 16 weeks in Fig. 1, each of the above eight methods was used to predict the weight of each calf in the 18th week. For the weight of the rst calf in the 18th week given all the data in the rst 16 weeks, using prior 1, Fig. 2 gives the exact predictive density (solid curve), approximate Bayesian pre-dictive density (dashed curve), and MCMC prepre-dictive density (dotted curve). Similar

(12)

Fig. 2. Comparison of predictive densities f(y|Y ) using prior 1.

comparisons are shown in Fig. 3 when using prior 2. Figs. 2 and 3 show that the approximate Bayesian predictive density and the MCMC predictive density are very close to the exact predictive density. This indicates that the predictive density can be approximated closely by both approximate Bayesian and MCMC sampling meth-ods. It is noted that the exact Bayesian predictive densities are obtained by numerical integrations.

Table 5 gives the MAD and MARD for each of the above eight prediction methods. Table 5 shows that REML and the approximate Bayesian method using prior 1 performs about as well as MLE. However, the approximate Bayesian method using prior 2 performs better. The best among the methods compared are MCMC mean and median using either prior. Thus, MCMC methods are quite encouraging for this model.

Fig. 4 gives 95% condence and posterior regions for ÿ1 derived respectively by the

ML method (solid curve) and the approximate Bayesian method using prior 1 (dashed curve) and using prior 2 (short dashed curve). It also shows the point estimates of ÿ1

obtained by the three methods denoted by the circle, triangle, and plus signs, respec-tively. Fig. 4 shows that the point estimates of ÿ1 derived by the ML method and the

approximate Bayesian method using prior 1 are very close. The 95% posterior region derived by the approximate Bayesian method using prior 1 is slightly larger than the condence region obtained by the ML method. But the approximate Bayesian method

(13)

Fig. 3. Comparison of predictive densities f(y|Y ) using prior 2.

Table 5

Comparison of prediction methods using real data

MLE REML Prior 1 Prior 2 MCMC(Prior 1) MCMC(Prior 2) Mean Median Mean Median MAD 0.0412 0.0412 0.0412 0.0398 0.0398 0.0380 0.0386 0.0386 MARD 0.0126 0.0126 0.0126 0.0120 0.0121 0.0116 0.0117 0.0117

using prior 2 has the largest posterior region for ÿ1. It is noted that the condence

region obtained by the REML method is almost indistinguishable from that of the approximate Bayesian method using prior 1.

5.2. A simulation study

In this section, a simulation study was conducted to compare both the coverage probabilities of condence and posterior regions of ÿ1 and the predictive intervals of yil

derived by the ML method, the REML method and those obtained by the approximate Bayesian methods using priors 1 and 2. We now describe the simulation settings. The simulated data were independently generated from the model with r = 1, 1= 0:5,

(14)

Fig. 4. 95% condence and posterior region of beta. Table 6

Comparison of coverage probabilities for ÿ1(1 − = 0:95)

N p MLE REML Prior 1 Prior 2

5 5 0.829 0.870 0.894 0.915 10 5 0.904 0.921 0.934 0.957 15 5 0.923 0.935 0.936 0.948 20 5 0.933 0.941 0.941 0.953 2= 0:1, 3= 0:8, 2V= 0:2, ÿ1= (10; 1)0, and X = 1; 1; : : : ; 1 1; 2; : : : ; p 0 :

Given p = 5, for N = 5; 10; 15, and 20, Table 6 gives the coverage probabilities of 95% condence and posterior regions for ÿ1. Given the values of N and p, each

data set contains N subjects and for each subject p measurements were made. For each combination of N and p, 1000 independent data sets were generated. Table 6 shows that the approximate posterior regions, using both priors 1 and 2, have larger coverage probabilities (and closer to 0.95) than the condence regions derived by the ML method and the REML method.

Given p = 5, for N = 5; 10; 15, and 20, Table 7 gives the coverage probabilities of 95% predictive intervals for yil, the (p + 1)th measurement for each subject. Given

(15)

Table 7

Comparison of coverage probabilities for yi‘(1 − = 0:95)

5 5 0.8724 0.8798 0.9346 0.9372

10 5 0.9102 0.9156 0.9414 0.9472

15 5 0.9237 0.9273 0.9439 0.9506

20 5 0.9298 0.9313 0.9440 0.9508

Table 8

Comparison of prediction method (MAD)

Mean S.D. Mean S.D. Mean S.D. Mean S.D.

5 5 0.2796 0.0940 0.2782 0.0937 0.2783 0.0937 0.2757 0.0946 10 5 0.2716 0.0641 0.2710 0.0640 0.2710 0.0640 0.2681 0.0632 15 5 0.2676 0.0487 0.2673 0.0486 0.2673 0.0485 0.2648 0.0479 20 5 0.2653 0.0453 0.2651 0.0452 0.2651 0.0452 0.2629 0.0448

Table 9

Comparison of prediction methods (MARD)

Mean S.D. Mean S.D. Mean S.D. Mean S.D.

5 5 0.0175 0.0059 0.0174 0.0059 0.0174 0.0059 0.0173 0.0059 10 5 0.0170 0.0040 0.0170 0.0040 0.0170 0.0040 0.0168 0.0040 15 5 0.0168 0.0031 0.0167 0.0030 0.0167 0.0030 0.0166 0.0030 20 5 0.0166 0.0028 0.0166 0.0028 0.0166 0.0028 0.0165 0.0028

the values of N and p, each data set contains N subjects and for each subject p + 1 measurements were made. Among the N subjects, the rst p measurements were used to predict yil. For each combination of N and p, 1000 independent data sets were

generated. Hence there were N × 1000 predicted values to be compared with N × 1000 true values. Table 7 shows that the approximate Bayesian predictive intervals using both priors 1 and 2 have larger coverage probabilities (and closer to 0.95) than those for the intervals derived from the ML method and the REML method.

Given p=5, for N =5; 10; 15, and 20, Tables 8 and 9 give the prediction comparisons for the various methods in terms of MAD and MARD, respectively. Prediction intervals from the four methods are given in Table 10.

All four predictors perform about equally, although prior 2 is slightly better. Finally, we will show biasedness of the estimate for 2

V. Given p = 5, for N = 5; 10; 15, and 20

and set 2

V = 0:2, Table 11 shows that the mean of the REML estimates is positively

(16)

Table 10

Comparison of prediction intervals

True MLE REML Prior 1 Prior 2

weight

Lower Upper Lower Upper Lower Upper Lower Upper

3.4000 3.2809 3.5508 3.2798 3.5507 3.2767 3.5538 3.2652 3.5439 3.6500 3.4854 3.7553 3.4873 3.7582 3.4842 3.7613 3.4956 3.7743 3.5300 3.3676 3.6365 3.3664 3.6372 3.3633 3.6403 3.3519 3.6306 3.4800 3.3827 3.6517 3.3809 3.6517 3.3778 3.6548 3.3644 3.6432 3.3300 3.2109 3.4808 3.2102 3.4810 3.2071 3.4841 3.1957 3.4744 3.1800 3.1526 3.4225 3.1512 3.4221 3.1481 3.4252 3.1263 3.4050 3.4400 3.3702 3.6401 3.3698 3.6406 3.3667 3.6437 3.3544 3.6331 3.6200 3.4653 3.7353 3.4663 3.7372 3.4632 3.7403 3.4610 3.7397 3.2800 3.1405 3.4104 3.1384 3.4092 3.1353 3.4123 3.1090 3.3878 3.7200 3.5519 3.8218 3.5528 3.8236 3.5497 3.8267 3.5488 3.8278 3.1700 3.1092 3.3791 3.1075 3.3784 3.1044 3.3815 3.0845 3.3632 3.1500 3.0794 3.3493 3.0777 3.3486 3.0746 3.3517 3.0568 3.3355 3.2000 3.1756 3.4455 3.1753 3.4461 3.1722 3.4492 3.1666 3.4454 3.2800 3.1671 3.4371 3.1660 3.4368 3.1629 3.4399 3.1469 3.4256 3.1100 2.9660 3.2359 2.9650 3.2359 2.9619 3.2390 2.9491 3.2278 3.3000 3.1753 3.4452 3.1742 3.4450 3.1711 3.4481 3.1559 3.4346 3.3800 3.2845 3.5544 3.2848 3.5556 3.2817 3.5587 3.2767 3.5554 3.0900 3.0374 3.3074 3.0366 3.3075 3.0335 3.3106 3.0261 3.3048 3.3300 3.2197 3.4896 3.2187 3.4896 3.2156 3.4927 3.2041 3.4828 3.1400 3.0299 3.2998 3.0281 3.2989 3.0250 3.3020 3.0163 3.2940 3.3000 3.1942 3.4641 3.1947 3.4655 3.1916 3.4686 3.1913 3.4690 3.3300 3.2277 3.4976 3.2266 3.4975 3.2235 3.5006 3.2063 3.4840 3.2700 3.2147 3.4846 3.2126 3.4835 3.2095 3.4866 3.1938 3.4725 Table 11 REML estimate of 2 Va N p Mean S.D. 5 5 0.2216 0.1546 10 5 0.2270 0.1311 15 5 0.2211 0.1125 20 5 0.2231 0.1043

a_{The true value of}2 V is 0.2.

6. Concluding remarks

In this paper we consider, from a Bayesian viewpoint, estimation of parameters and prediction of future values for the longitudinal model proposed by Diggle (1988). Two dierent priors are employed in this study. For the noninformative prior, it is found that the approximate Bayesian estimates of the regression coecients are identical to the REML estimates. For the informative prior, the inverse gamma distribution is assumed for both variance ratios with the same set of hyperparameters which are determined from the sample using the minimum accumulated prediction error criterion.

(17)

In real and simulated data, it is found that the proposed methods are quite encour-aging. For the prediction of future values, the approximate Bayesian predictor using noninformative prior performs about the same as the REML predictor, but slightly better than the ML predictor, although the approximate Bayesian predictor using informative prior is even better. The best among the predictors compared are the MCMC mean and median using either noninformative or informative priors. For the condence and pos-terior regions, it is found that the pospos-terior region based on the approximate pospos-terior distribution of the regression coecients when noninformative prior is employed has a much more accurate coverage probability than the ML method and the REML method when the sample size is small. Thus we believe that the proposed approximate method is a useful alternative to the ML method and the REML method in dealing with the longitudinal model proposed by Diggle (1988). Of course, better results are those using the MCMC simulation as suggested in the paper, although the implementation is more involved. However, the steps outlined in the paper can be followed rather easily. References

Casella, G., George, E.I., 1992. Explaining the Gibbs sampler. Amer. Statist. 46, 167–174.

Dawid, A.P., 1984. Present position and potential developments: some personal views. Statistical theory — the prequential approach (with distribution). J. Roy. Statist. Soc. Ser. A 147, 278–292.

Diggle, P.J., 1988. An approach to the analysis of repeated measurements. Biometrics 44, 959–971. Diggle, P.J., Liang, K.Y., Zeger, S.L., 1994. Analysis of Longitudinal Data. Oxford Statistical Science Series,

Vol. 13. Oxford Science Publications, Oxford.

Donnelly, C.A., Laird, N.M., Ware, J.H., 1995. Prediction and creation of smooth curve for temporally correlated longitudinal data. J. Amer. Statist. Assoc. 90, 984–989.

Edwards, W., Lindman, H., Savage, L.J., 1963. Bayesian statistical inference for psychological research. Psychol. Rev. 70, 193–242.

Fearn, T., 1975. A Bayesian approach to growth curves. Biometrika 62, 89–100.

Gelfand, A.E., Hills, S.E., Racine-Poon, Smith, A.F.M., 1990. Illustration of Bayesian inference in normal data models using Gibbs sampling. J. Amer. Statist. Assoc. 85, 972–985.

Gelfand, A.E., Smith, A.F.M., 1990. Sampling based approaches to calculating densities. J. Amer. Statist. Assoc. 85, 298–309.

Gelman, A., Rubin, D.B., 1992. Inference from iterative simulation using multiple sequences. Statist. Sci. 7, 457–472.

Gilks, W.R., Richardson, S., Spiegelhalter, D.J. (Eds.), 1996. Markov Chain Monte Carlo in Practice. Chapman & Hall, London.

Harville, D., 1974. Bayesian inference for variance components using only error contrasts. Biometrika 61, 383–385.

Lee, J.C., 1988. Prediction and estimation of growth curve with special covariance structures. J. Amer. Statist. Assoc. 83, 432–440.

Lee, J.C., 1991. Tests and model selection for the general growth curve model. Biometrics 47, 147–159. Lee, J.C., Geisser, S., 1972. Growth curve prediction. Sankhya A 34, 393–412.

Lee, J.C., Geisser, S., 1975. Applications of growth curve prediction. Sankhya A 37, 239–256.

Lee, J.C., Tsao, S.L., 1992. On estimation and prediction procedures for AR(1) models with power transformation. J. Forecast. 12, 499–511.

Ljung, G.M., Box, G.E.P., 1980. Analysis of variance with autocorrelated observations. Scand. J. Statist. 7, 172–180.

Patterson, H.D., Thompson, R., 1971. Recovery of inter-block information when block sizes are unequal. Biometrika 58, 545–554.

(18)

Rao, C.R., 1987. Prediction of future observations in growth curve models. Statist. Sci. 2, 434–471. Rissanen, J., 1986. Order estimation by accumulated prediction errors. In: Gani, J., Priestley, M.B. (Eds.),

Essays in Time Series and Allied Processes — Papers in Honor of E.J. Hannan. Applied Probability Trust, Sheeld, England.

Zellner, A., Tiao, G.C., 1964. Bayesian analysis of regression model with autocorrelated errors. J. Amer. Statist. Assoc. 59, 763–778.