Geostatistical model averaging based on conditional information criteria

(1)

DOI 10.1007/s10651-011-0171-2

Geostatistical model averaging based on conditional

information criteria

Chun-Shu Chen · Hsin-Cheng Huang

Received: 11 March 2010 / Revised: 14 September 2010 / Published online: 11 April 2011 © Springer Science+Business Media, LLC 2011

Abstract Variable selection in geostatistical regression is an important problem, but has not been well studied in the literature. In this paper, we focus on spatial pre-diction and consider a class of conditional information criteria indexed by a penalty parameter. Instead of applying a fixed criterion, which leads to an unstable predictor in the sense that it is discontinuous with respect to the response variables due to that a small change in the response may cause a different model to be selected, we fur-ther stabilize the predictor by local model averaging, resulting in a predictor that is not only continuous but also differentiable even after plugging-in estimated model param-eters. Then Stein’s unbiased risk estimate is applied to select the penalty parameter, leading to a data-dependent penalty that is adaptive to the underlying model. Some numerical experiments show superiority of the proposed model averaging method over some commonly used variable selection methods. In addition, the proposed method is applied to a mercury data set for lakes in Maine.

Keywords Conditional Akaike information criterion· Data perturbation ·

Spatial prediction· Stabilization · Stein’s unbiased risk estimate · Variable selection

C.-S. Chen (

B

)

Institute of Statistics and Information Science, National Changhua University of Education, No.1, Jin-De Road, 500 Changhua, Taiwan

e-mail: [email protected] H.-C. Huang

Institute of Statistical Science, Academia Sinica, 128 Academia Road, Section 2, Nankang, 115 Taipei, Taiwan

e-mail: [email protected] H.-C. Huang

(2)

1 Introduction

In many geostatistical problems, the variable of interest is often observed along with other variables at some locations. Typically, a geostatistical regression model is applied by treating the variable of interest as the response and some other variables as explan-atory variables. How to select a suitable subset of explanexplan-atory variables is crucial, because accuracy of selection directly affects estimation and prediction. Model selec-tion and model averaging have been well studied for linear regression; seeShao(1997), Hoeting et al. (1999), andBurnham and Anderson(2002) for a review. For selec-tion of geostatistical models, the Akaike informaselec-tion criterion (AIC) (Akaike 1973), the Bayesian information criterion (BIC) (Schwarz 1978), and the cross-validation method are often applied. Although some heuristic arguments for AIC were provided byHoeting et al.(2006), as pointed out byBreiman(1996), these model selection pro-cedures are unstable in the sense that a small change in the response variables may cause a different model to be selected, resulting in less accurate estimation and prediction.

In this paper, we focus on spatial prediction and attempt to make three proposals. First, we introduce a class of conditional information criteria indexed by a penalty parameter, which we believe is more suitable for geostatistical model selection com-paring to commonly used unconditional information criteria, such as AIC and BIC. A particular criterion in this class, called the conditional AIC (CAIC), was proposed byVaida and Blanchard(2005) for variable selection in linear mixed-effects models. Despite some tendency to select an over-complex model, this criterion has a desired property that it is an unbiased estimate of the sum of the mean squared prediction errors over data locations up to a constant. Second, due to that the spatial predictor obtained from a fixed conditional information criterion is also unstable, we propose to stabilize the spatial predictor by local model averaging using perturbed data, resulting in a stabilized predictor that can be shown to be differentiable with respect to the response variables even after plugging-in estimated model parameters. Third, utiliz-ing the differentiable property of the stabilized predictor, we propose to apply Stein’s unbiased risk estimate (Stein 1981) to select among a collection of conditional infor-mation criteria, leading to a data-dependent penalty with an adaptive feature such that it tends to selects a large penalty, and hence a small model (based on a small number of explanatory variables), when the underlying true model is small, and vice versa.

The rest of this article is organized as follows. Section2introduces the geostatis-tical regression model and the class of conditional information criteria for variable selection. Section3introduces the proposed stabilized predictor and Stein’s unbiased risk estimate for selecting the penalty parameter. Some numerical results are shown in Sect.4. An application of the proposed methodology for assessing mercury levels of fish in Maine lakes is presented in Sect.5. Finally, a brief discussion is given in Sect.6.

2 Geostatistical regression and variable selection 2.1 Geostatistical regression models

Consider a spatial process{S(s) : s ∈ D} defined over a region D ⊆ Rd. Suppose that the spatial process can be decomposed into the following:

(3)

S(s) = μ(s) + η(s); s ∈ D, (1) whereμ(·) is a deterministic mean process corresponding to a large-scale mean struc-ture, andη(·) is a zero-mean, L2-continuous, spatial dependent process correspond-ing to a small-scale structure. Suppose that we observe response variable Zi and a

p-dimensional vector of explanatory variables,(x1(si), . . . , xp(si)), associated with

Zi at location si ∈ D; i = 1, . . . , n. The mean process μ(s) is usually modeled as

β0+

p

j=1

βjxj(s), where β = (β0, β1, . . . , βp)are regression parameters. Hence the

geostatistical regression model can be written as:

Zi = S(si) + ε(si) = β0+ p j=1 βjxj(si) + η(si) + ε(si); i = 1, . . . , n, (2)

whereε(s1), . . . , ε(sn) ∼ N(0, σ_ε2) are white noise variables representing

measure-ment errors, and are independent of the spatial processη(·). Here, the spatial depen-dent processη(·) is usually assumed to be stationary with some parametric covariance function class. A commonly used one is the isotropic Matérn model (Matérn 1986):

C(h) = cov(η(s + h), η(s)) = ⎧ ⎨ ⎩ σ2 η(a2||h||/2)ν2κν(a2||h||) (ν) ; if h = 0, σ2 η; if h= 0, (3)

where · represents the Euclidean distance, κ_ν(·) is the modified Bessel function of the second kind with orderν > 0 (Abramowitz and Stegun 1965),ν indicates the smoothness of the process, a > 0 is a scaling parameter, and σ_η2 = var(η(s)). Note that (3) reduces to the exponential covariance model whenν = 0.5.

In this paper, we consider selecting among p explanatory variables. Each candi-date model corresponds to a subset of p variables and is indexed byγ ∈ , where  ⊂ 2{1,...,p} _{is the class of all candidate models. Then the geostatistical regression}

model corresponding toγ can be written as:

Z= X_γβ_γ + η + ε,

where β_γ is the parameter vector consisting of β0 and {βj : j ∈ γ }, Xγ is

the n × (p_γ+ 1) design matrix corresponding to γ with p_γ being the number of explanatory variables in γ, Z = (Z1, . . . , Zn), η = (η(s1), . . . , η(sn)), and

ε = (ε(s1), . . . , ε(sn)).

Denote the covariance parameters of Z byθ ≡ (ν, a, σ_η2, σ_ε2) and let(θ) ≡ var(Z). For known θ, the best linear unbiased predictor (BLUP), typically called the universal kriging (UK) predictor, of S(s) based on model γ satisfies

(4)

where ˆβ_γ =

X_γ(θ)−1X_γ

₋₁

X_γ(θ)−1Z, and x_γ(s) consists of the constant 1 and the p_γ explanatory variables observed at location s. Then the BLUP of S ≡ (S(s1), . . . , S(sn))based on modelγ can be written as

ˆS_γ(θ) ≡ ( ˆSγ(s1; θ), . . . , ˆSγ(sn; θ))= H_γ(θ)Z, (4)

in terms of some matrix H_γ(θ).

In practice, the model parameters in (2) and (3) can be estimated by methods of moments, maximum likelihood (ML), restricted maximum likelihood (REML) (Patterson and Thompson 1971), or Bayesian methods. Note that after plugging-in an estimated parameter vector ˆθ_γofθ based on model γ in (4), the resulting estimated UK predictor ˆS_γ( ˆθ_γ) = H_γ( ˆθ_γ)Z is no longer linear.

2.2 Model selection methods

In this paper, we focus on spatial prediction. Our objective is to select γ ∈ that minimizes E ˆSγ( ˆθγ) − S2. Some commonly used model selection criteria, such as AIC, BIC, and the corrected AIC (AICc) criterion (Hurvich and Tsai 1989), are special cases of the generalized information criterion (Nishii 1984) given by

GIC_λ(γ ) = −2_γZ, ˆθ_γ(ml), ˆβ_γ(ml)+ λ p_γ, (5) withλ = 2, log(n), and 2n/(n − p_γ− 2), respectively, where _γZ, θ, β_γis the log-likelihood function of Z based on modelγ , and ˆθ_γ(ml)and ˆβ_γ(ml)are the corresponding ML estimates. However, these criteria are not aim to minimize E ˆS_γ( ˆθ_γ) − S2, and hence may not be ideal for spatial prediction purpose, as demonstrated in some simulation examples in Sect.4.

With the goal of spatial prediction in mind, we now introduce a class of conditional information criteria by conditioning onη. For notational simplicity, we first assume that the covariance parameter vectorθ is known. Then the CAIC criterion proposed byVaida and Blanchard(2005) is

CAIC(γ ; θ) = 1 n Z − ˆSγ(θ)2+ 2trH_γ(θ)σ_ε2 , (6)

where trH_γ(θ), the trace of H_γ(θ), is the effective degrees of freedom, which tends to be larger for a larger (more complex) model. As shown byVaida and Blanchard (2005), ECAIC(γ ; θ)= 1 nE ˆSγ(θ) − S 2 + σ2 ε .

Consequently, CAIC appears more appropriate than AIC for variable selection when prediction is the main goal of the analysis. Whenθ is unknown, we can apply CAIC by plugging-in an estimate ofθ in (6).

(5)

In general, CAIC tends to select an over-complex model particularly when the underlying true model is small. A natural remedy is to consider applying a larger penalty in (6), leading to the following conditional generalized information criterion (CGIC) indexed by a penalty parameterλ > 0:

CGIC_λ(γ ) = 1 n Z − ˆSγ( ˆθγ)2+ λ trH_γ( ˆθ_γ)ˆσ_ε2 ; γ ∈ , (7)

where ˆS_γ( ˆθ_γ) = H_γ( ˆθ_γ)Z, ˆθ_γ is some estimator ofθ based on model γ , and ˆσ_ε2is an estimate of σ_ε2 in (7) independent of γ ∈ . For example, it can be estimated by REML based on the full model. For a given penaltyλ, CGIC in (7) selects the model, ˆγ(λ) ≡ arg min

γ ∈ CGICλ(γ ). The corresponding spatial predictor of S(s) at

s ∈ D is given by ˆS_ˆγ(λ)(s; ˆθ_ˆγ(λ)), where ˆθ_ˆγ(λ) is the estimator ofθ based on the selected model ˆγ(λ). The criterion (7) includes CAIC (corresponding toλ = 2) and conditional BIC (CBIC) (corresponding toλ = log(n)) as special cases. Clearly, if we know the underlying true model depends only on a small number of explanatory variables in advance, it would be preferable to choose a criterion with a larger penalty that penalizes more for a larger model, and vice versa. But in practice, the underlying true model is unknown, and hence a CGIC criterion may perform well only in some situations, which is not adaptive. Without knowing the underlying true model,Shen and Ye(2002) proposed to select the penaltyλ of (5) from data in linear regression model selection and termed the method as adaptive model selection. We shall adopt this approach and further generalize it for spatial model averaging.

3 The proposed method

As inShen and Ye(2002), we could consider ˆS_ˆγ(λ)(s; ˆθ_ˆγ(λ)) : λ ∈ } as the class of candidate spatial predictors for some ⊂ (0, ∞). However, ˆS_ˆγ(λ)(s; ˆθ_ˆγ(λ)) is discon-tinuous in Z and hence unstable (Breiman 1996). Motivated fromBreiman (1996), we consider a stabilized version of ˆS_ˆγ(λ)(s; ˆθ_ˆγ(λ)) obtained by using perturbed data, Z∗= (Z₁∗, . . . , Z∗_n)≡ Z +τξ, where ξ ∼ N(0, σ_ε2I) is independent of Z, and τ > 0 is the perturbation size. Whenσ_ε2is unknown, the estimatorˆσ_ε2in (7) is used for com-puting Z∗. Our proposed stabilized predictor of S(s) corresponding to ˆS_ˆγ(λ)(s; ˆθ_ˆγ(λ)) is given by

E ˆS∗_ˆγ∗_(λ)(s; ˆθ∗_ˆγ∗_(λ))Z

; λ ∈ , (8)

where ˆγ∗(λ) is the model selected by CGIC_λbased on Z∗:

ˆγ∗(λ) = arg min γ ∈ _Z_∗ − ˆS_γ∗( ˆθ_γ∗)2 + λ trH_γ( ˆθ_γ∗)1+ τ2σ_ε2 , ˆS∗ γ( ˆθγ∗) = ˆSγ∗(s1; ˆθγ∗), . . . , ˆSγ∗(sn; ˆθ_γ∗)

= Hγ( ˆθγ∗)Z∗is the UK predictor of S based

(6)

ˆθγ but based on Z∗. Since ˆγ∗(λ) may be different from ˆγ(λ) for different Z∗, the proposed stabilized predictor E ˆS∗_ˆγ_∗_(λ)(s; ˆθ∗_ˆγ_∗_(λ))Zin (8) can be regarded as a type of model averaging predictor centered around ˆS_ˆγ(λ)(s; ˆθ_ˆγ(λ)). Clearly, a larger τ tends to produce a smoother but more stable predictor. We found in some simulation exper-iments (e.g., Table5) that spatial prediction is typically not sensitive to the choose of τ when τ is between 0.5 and 0.9. Therefore, we fix τ = 0.5 throughout the simulation and the data analysis unless stated otherwise.

As shown byHuang and Chen(2007), E ˆS∗_ˆγ∗_(λ)(s; ˆθ∗_ˆγ∗_(λ))Z

is infinitely differen-tiable. Applying Stein’s lemma (1981), the corresponding effective degrees of freedom is given by d f_λ≡ n i₌₁ ∂ ∂ Zi E ˆS∗_ˆγ∗_(λ)(si; ˆθ∗_ˆγ∗_(λ))Z=_τ₂1_σ₂ ε n i₌₁ cov ˆS∗_ˆγ∗_(λ)(si; ˆθ∗_ˆγ∗_(λ)), Z∗iZ , (9)

and an unbiased estimator of EE ˆS∗_ˆγ_∗_(λ)( ˆθ∗_ˆγ_∗_(λ))Z− S2satisfies

Z− E ˆS∗_ˆγ∗_(λ)( ˆθ∗_ˆγ∗_(λ))Z2+ 2σ_ε2d fλ− nσε2. (10)

Note that (10) is usually referred to as Stein’s unbiased risk estimate (SURE). Thus we can simply selectλ by minimizing SURE in (10) and obtain

ˆλ ≡ arg min λ∈ Z − E ˆS∗_ˆγ∗_(λ)( ˆθ∗_ˆγ∗_(λ))Z2+ 2σ_ε2d fλ . (11)

Consequently, the model selected by CGIC_ˆλcriterion is

ˆγ(ˆλ) ≡ arg min γ ∈ Z − ˆSγ( ˆθγ)2+ ˆλ trHγ( ˆθγ)σε2 , and the resulting stabilized predicted surface is E ˆS∗

ˆγ∗_(ˆλ)(s; ˆθ∗_ˆγ∗_(ˆλ))Z

: s ∈ D. In practice, it suffices to select among a few discrete penalty values due to the continuity of E ˆS∗_ˆγ∗_(λ)( ˆθ∗_ˆγ∗_(λ))Z

in λ. For example, we consider =

{1, 2, log(n), 2 log(n)} in the simulation study and the data analysis, which involves

two typical criteria: CAIC with λ = 2 and CBIC with λ = log(n). Note that as demonstrated in Sect.4, the selected penalty in (11) tends to be small when the size of true model is large, and vice versa. This adaptive feature is attractive, because it automatically adjusts to select an appropriate model ˆγ(ˆλ), around which a model aver-aging predictor E ˆS∗

ˆγ∗_(ˆλ)(s; ˆθ∗_ˆγ∗_(ˆλ))Z

of S(s) is obtained, regardless of whether the underlying true model is small or large.

In practice, d f_λin (9) can be computed using a simple Monte Carlo (MC) method by simulating some replicates of perturbed data, and then approximate the covari-ances in the righthand side of the equality in (9) based on the corresponding sample covariances.

(7)

4 Simulations

We conducted a simulation study to examine the performance of the proposed model averaging method. We consider the model given by (1) and (2) with D= [0, 1]2_{. In} the simulation, we consider two different examples (Examples I and II) corresponding to two different sets of explanatory variables. In Example I, p= 9 explanatory vari-ables were independently generated from a standard Gaussian white noise process. In addition, the spatial processη(·) was generated from a zero-mean Gaussian sta-tionary process with the exponential covariance function having two combinations of parameters(ν, a, σ_η2) = (0.5, 0.5, 1) and (0.5, 0.1, 1) in (3) corresponding to weak and strong spatial dependence, respectively (see Fig.1). The noise variance in (2) is set to beσ_ε2 = 1, and is assumed known throughout the simulation. We consider five cases corresponding to five different choices ofβ = (0, β1, . . . , β9) indexed by k = 1, 3, 5, 7, 9 as the underlying true models. For each k, the regression coef-ficients are given by β1 = · · · = βk = √7/k, and βk₊₁ = · · · = βp = 0 so

that the signal-to-noise ratio (SNR) is controlled at 8, where SNR is defined as the ratio of the variance of the signal S(si) to the noise variance σ_ε2. We generated data,

{(x1(si), . . . , x9(si), Zi) : i = 1, . . . , n} with n = 32, by sampling at 64 × 64 regular

grid points in D using simple random sampling. For each case, we consider all possible combinations of explanatory variables by selecting among = 2{1,...,9}withγ = ∅ representing the intercept-only model.

The setup for Example II is basically the same as that in Example I except we select a polynomial up to the second order,β0+ β1x+ β2y+ β3x2+ β4x y+ β5y2, where x and y are the x and y coordinates, and consider only the weak dependence case with a= 0.5 in (3). Hence there are p= 5 explanatory variables, x, y, xy, x2, y2, in this

0.0 0.5 1.0 1.5 0.0 0.2 0.4 0.6 0.8 1.0 Distance Correlation ν = 0.5 and a = 0.5 ν = 0.5 and a = 0.1

(8)

example. We consider three different cases with differentβ values for the underlying model:βx, β(x + y + xy), and β(x + y + x2+ xy + y2), where β is chosen so that SNR= 8. Similar to Example I, we consider selecting among = 2{1,...,5}for each case.

We applied the proposed geostatistical model averaging method, referred to as GMA hereafter, with the covariance function of η(·) given by the Matérn class of (3). We selected among four CGIC criteria with = {1, 2, log(n), 2 log(n)} in (11) and estimated the Matérn parameters using REML, where d f_λ in (11) is computed using the MC method based on 100 replicates as mentioned in the last paragraph of Sect.3for eachλ ∈ . The proposed GMA method is compared with five different model selection criteria, including AIC, BIC, CAIC, CBIC, and AICc, and their cor-responding model averaging methods (seeBurnham and Anderson 2002), denoted as AIC-MA, BIC-MA, CAIC-MA, CBIC-MA, and AICc-MA, where a model averaging predictor of S is given by γ ∈ wγˆSγ( ˆθγ) with weights wγ = exp −1 2GICλ(γ ) γ∗_∈ exp −1 2GICλ(γ ∗₎ or exp −1 2CGICλ(γ ) γ∗_∈ exp −1 2CGICλ(γ ∗₎ ; γ ∈ .

The results of various methods are assessed in terms of the mean squared prediction error (MSPE): 1

n ˆS− S 2

, where ˆS is a generic predictor of S.

The results based on 200 simulation replicates for Examples I and II are shown in Tables1and2, respectively. In general, conditional information criteria perform better than the unconditional counterparts, and model averaging predictors perform better than the corresponding model selection predictors in terms of MSPE. As expected, both CAIC and CBIC perform well only in some situations with CBIC performing better than CAIC when the size of underlying true model is small, and vice versa. The proposed GMA method performs better than CAIC and CBIC for all cases. It also performs better than both CAIC-MA and CBIC-MA for almost all cases.

Tables3and4show the distribution of the penalties selected from GMA for each case in Examples I and II. Evidently, GMA tends to select a large penalty when the size of true model is small, and vice versa, as if the size of true model is known in advance. This adaptive feature can not be achieved by either CAIC or CBIC.

Finally, we performed a sensitivity study regarding the choice of perturbation size τ for the proposed GMA method. We also investigated whether it is advantageous to optimize the search ofλ over (0, ∞). The results for Example II are displayed in Table5, which show that MSPEs are not sensitive to the perturbation size when τ is between 0.5 and 0.9. In addition, there seems no clear advantage to search λ over (0, ∞). Similar results can be seen for all cases in Example I as well. Although we can further apply SURE to optimize the search of(λ, τ) over (0, ∞) × (0, ∞), it appears effective and computationally more efficient to fixτ at 0.5 and search over a small number ofλ values.

(9)

Table 1 MSPE performance of various methods for Example I with weak (a= 0.5) and strong (a = 0.1) spatial dependence based on 200 simulation replicates, where the values in parentheses are the corresponding standard errors a Criterion |{ j : βj= 0}| 1 3 5 7 9 AIC .557 (.021) .538 (.021) .547 (.021) .541 (.021) .556 (.021) AICc .542 (.025) .498 (.023) .540 (.024) .559 (.023) .657 (.024) BIC .530 (.024) .515 (.023) .566 (.024) .554 (.022) .569 (.022) CAIC .305 (.010) .345 (.014) .469 (.022) .499 (.020) .535 (.018) CBIC .255 (.010) .313 (.016) .467 (.025) .506 (.021) .589 (.021) 0.5 AIC-MA .467 (.020) .465 (.019) .506 (.020) .522 (.019) .560 (.019) AICc-MA .425 (.021) .426 (.020) .487 (.021) .533 (.018) .625 (.017) BIC-MA .434 (.021) .439 (.020) .494 (.020) .523 (.019) .575 (.017) CAIC-MA .256 (.009) .309 (.013) .423 (.019) .470 (.017) .533 (.015) CBIC-MA .224 (.009) .289 (.013) .440 (.022) .491 (.019) .588 (.018) GMA .214 (.009) .296 (.012) .386 (.014) .449 (.014) .490 (.013) AIC .448 (.021) .452 (.022) .452 (.022) .464 (.022) .467 (.022) AICc .416 (.026) .383 (.024) .393 (.024) .425 (.024) .557 (.024) BIC .405 (.026) .421 (.025) .423 (.023) .446 (.023) .469 (.022) CAIC .219 (.011) .272 (.016) .322 (.017) .392 (.019) .441 (.018) CBIC .154 (.010) .253 (.023) .314 (.021) .378 (.019) .471 (.020) 0.1 AIC-MA .339 (.019) .367 (.020) .392 (.020) .435 (.021) .475 (.019) AICc-MA .284 (.019) .318 (.020) .356 (.020) .430 (.020) .559 (.017) BIC-MA .295 (.019) .333 (.020) .370 (.020) .430 (.020) .496 (.018) CAIC-MA .171 (.008) .234 (.013) .295 (.015) .374 (.016) .454 (.016) CBIC-MA .136 (.007) .225 (.019) .288 (.016) .381 (.017) .489 (.016) GMA .116 (.009) .191 (.011) .278 (.013) .347 (.013) .408 (.013)

Table 2 MSPE performance of various methods for Example II based on 200 simulation replicates, where the values in parentheses are the corresponding standard errors

Criterion |{ j : βj= 0}| 1 3 5 AIC .372 (.025) .373 (.025) .361 (.024) AICc .203 (.009) .250 (.011) .238 (.009) BIC .203 (.009) .253 (.012) .236 (.009) CAIC .198 (.009) .246 (.009) .240 (.008) CBIC .180 (.009) .260 (.010) .242 (.008) GMA .168 (.007) .234 (.008) .212 (.008) 5 Application

We applied the proposed GMA method to the mercury data set previously analyzed by Hoeting and Olsen (see Chap. 1 ofPeck et al. 1998) using multiple linear regression for assessing mercury levels of fish in Maine lakes. It is known that mercury is a toxic metal, which may damage the human nervous system if the mercury level in the human body is above the safety limit. For example, the state government of Maine suggests that the mercury level in parts per million (ppm) should be less than 0.43. To assess

(10)

Table 3 Distributions of the selected penalties for Example I based on 200 simulation replicates a |{ j : βj= 0}| ˆλ Average ˆλ 1 2 log(n) 2 log(n) 1 14 23 24 139 5.53 3 16 43 31 110 4.86 0.5 5 25 53 70 52 3.67 7 48 89 54 9 2.38 9 89 98 13 0 1.65 1 9 7 16 168 6.21 3 7 18 34 141 5.69 0.1 5 16 29 60 95 4.70 7 36 76 78 10 2.64 9 89 99 12 0 1.64

Table 4 Distributions of the selected penalties for Example II based on 200 simulation replicates

|{ j : βj= 0}| ˆλ Average ˆλ

1 2 log(n) 2 log(n)

1 0 42 46 112 5.10

3 1 84 54 61 3.89

5 0 83 65 52 3.76

Table 5 MSPE performance of GMA for various perturbation sizesτ in Example II based on 200 sim-ulation replicates, where the penalty parameterλ is chosen from either = {1, 2, log(n), 2 log(n)} or a continuous interval∗= (0, ∞), and the standard errors of MSPEs are all less than or equal to 0.011

τ |{ j : βj= 0}| = 1 |{ j : βj= 0}| = 3 |{ j : βj= 0}| = 5

∗ ∗ ∗

0.1 .189 .204 .254 .262 .253 .264

0.5 .168 .174 .234 .234 .212 .220

0.9 .159 .166 .221 .227 .198 .206

whether the fish in Maine lakes are safe to eat, it is important to estimate mercury levels in fish particularly for lakes where no observations are taken and identify the important explanatory variables responsible for elevated levels of mercury.

The data set consists of mercury level (in ppm) as the response and 10 explanatory variables sampled by US Environmental Protection Agency at 110 lakes in Maine (see Fig.2). The 10 explanatory variables include number of fish, elevation (in feet), surface area (in acres), maximum depth (in feet), lake type (in three categories: oligotrophic, eutrophic, and mesotrophic), lake stratification (in two categories: yes and no), drain-age area (in square miles), runoff factor (in percentdrain-age of rainwater or melted snow flow into rivers and streams), flushing rate (in number flushes per year), and DAM (in two categories: all natural flowage and some man-made flowage in the drainage

(11)

−71 −70 −69 −68 −67 43 44 45 46 47 Longitude (degrees) Latitude (degrees)

Fig. 2 Locations of 110 lakes in Maine

area). We consider all possible combinations of explanatory variables, leading to 210 candidate models to be selected over.

We applied the model of (2) with Zi and S(si) being the observed and the true

mercury levels in fish at lake si, and x1(si), . . . , x10(si) being the corresponding 10

explanatory variables, where the covariance function ofη(·) is modeled by the expo-nential covariance model withν = 0.5 in (3). We applied CAIC, CBIC, and the pro-posed GMA method with = {1, 2, log(n), 2 log(n)} in (11), where the covariance parametersθ are estimated using REML and d f_λin (11) is computed using the MC method based on 100 replicates. In addition,σ_ε2in (11) is estimated by ˆσ_ε2= 0.0876 using REML based on the full model.

The results for CAIC and CBIC are shown in Table6. Comparing between the two criteria, CBIC selects a smaller model having only one variable (i.e., elevation). In contrast, CAIC selects a more complex model having two additional variables (surface area and lake stratification). Our method selects ˆλ = log(n) corresponding to CBIC. The predicted mercury level surface based on our GMA method is shown in Fig.3. Among the 110 lakes, 75 of them (shown as solid triangles in Fig.3) particularly in southeast of Maine have mercury levels higher than the safety limit (i.e., 0.43 ppm).

(12)

Table 6 Estimated regression parameters and covariance parameters for CAIC and CBIC, where the values in parentheses are the corresponding standard errors under the selected models

Criterion Regression parameters Covariance parameters Intercept Elevation Surface area Lake stratification a σ_η2 σ_ε2

CAIC .64 −2.9 × 10−4 −2 × 10−5 .083 .546 .034 .075 (.10) (1× 10−4) (1.5 × 10−5) (.059) CBIC .66 −2.7 × 10−4 − − .534 .026 .081 (.08) (1× 10−4) − − −71 −70 −69 −68 −67 0.3 0.4 0.5 0.6 0.7 43 44 45 4 4 7 46 Longitude (degrees) Latitude (degrees)

Fig. 3 Locations of 75 lakes in Maine with mercury levels higher than 0.43 ppm

6 Discussion

Motivated from the adaptive model selection idea ofShen and Ye(2002) and the sta-bilization method ofBreiman(1996), in this paper, we developed a model averaging method for geostatistical regression, which performs well for spatial prediction in some simulation experiments. Further theoretical justification is of interest but appears to be very difficult particularly under the fixed domain asymptotic framework, and hence is beyond the scope of this paper.

(13)

Acknowledgments This work was supported by the National Science Council of Taiwan under Grants NSC 98-2118-M-018-003-MY2 and NSC 97-2118-M-001-001-MY3. The authors thank the editor, the associate editor, and three anonymous referees for helpful comments and suggestions.

References

Abramowitz M, Stegun IA (1965) Handbook of mathematical functions. Dover, New York

Akaike H (1973) Information theory and the maximum likelihood principle. In International symposium on information theory (V. Petrov and F. Csáki eds.), Akademiai Kiádo, Budapest, pp. 267–281 Breiman L (1996) Heuristics of instability and stabilization in model selection. Ann Stat 24: 2350–2383 Burnham KP, Anderson DR (2002) Model selection and multimodel inference: a practical

information-theoretic approach, 2nd edn. Springer-Verlag, New York

Hoeting JA, Madigan D, Raftery AE, Volinsky CT (1999) Bayesian model averaging: a tutorial (with Discussion). Stat Sci 14: 382–401

Hoeting JA, Davis RA, Merton AA, Thompson SE (2006) Model selection for geostatistical models. Ecol Appl 16: 87–98

Huang HC, Chen CS (2007) Optimal geostatistical model selection. J Am Stat Assoc 102: 1009–1024 Hurvich CM, Tsai CL (1989) Regression and time series model selection in small samples. Biometrika

76: 297–307

Matérn B (1986) Spatial variation, 2nd edn. Springer, New York, (Lecture Notes in Statistics)

Nishii R (1984) Asymptotic properties of criteria for selection of variables in multiple regression. Ann Stat 12: 758–765

Patterson HD, Thompson R (1971) Recovery of inter-block information when block sizes are unequal. Biometrika 58: 545–554

Peck R, Haugh LD, Goodman A (eds) (1998) Statistical case studies: a collaboration between academe and industry. ASA-SIAM Series on statistics and applied probability 3 and 4

Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6: 461–464

Shao J (1997) An asymptotic theory for model selection (with discussion). Stat Sin 7: 221–264 Shen X, Ye J (2002) Adaptive model selection. J Am Stat Assoc 97: 210–221

Stein C (1981) Estimation of the mean of a multivariate normal distribution. Ann Stat 9: 1135–1151 Vaida F, Blanchard S (2005) Conditional Akaike information for mixed-effects models. Biometrika

92: 351–370 Author Biographies

Chun-Shu Chen graduated from the National Central University, Taiwan, in 2007 with a Ph.D. degree in Statistics. In 2008, he joined the Institute of Statistics and Information Science at the National Changhua University of Education, Taiwan. He is currently an Assistant Professor in the same University and his research activities are focused on spatial statistics and model selection.

Hsin-Cheng Huang is research fellow in the Institute of Statistical Science at Academia Sinica, Taiwan. His research interests include spatial and space-time models, model selection, and wavelet methods.