• 沒有找到結果。

Geostatistical model averaging based on conditional information criteria

N/A
N/A
Protected

Academic year: 2021

Share "Geostatistical model averaging based on conditional information criteria"

Copied!
13
0
0

加載中.... (立即查看全文)

全文

(1)

DOI 10.1007/s10651-011-0171-2

Geostatistical model averaging based on conditional

information criteria

Chun-Shu Chen · Hsin-Cheng Huang

Received: 11 March 2010 / Revised: 14 September 2010 / Published online: 11 April 2011 © Springer Science+Business Media, LLC 2011

Abstract Variable selection in geostatistical regression is an important problem, but has not been well studied in the literature. In this paper, we focus on spatial pre-diction and consider a class of conditional information criteria indexed by a penalty parameter. Instead of applying a fixed criterion, which leads to an unstable predictor in the sense that it is discontinuous with respect to the response variables due to that a small change in the response may cause a different model to be selected, we fur-ther stabilize the predictor by local model averaging, resulting in a predictor that is not only continuous but also differentiable even after plugging-in estimated model param-eters. Then Stein’s unbiased risk estimate is applied to select the penalty parameter, leading to a data-dependent penalty that is adaptive to the underlying model. Some numerical experiments show superiority of the proposed model averaging method over some commonly used variable selection methods. In addition, the proposed method is applied to a mercury data set for lakes in Maine.

Keywords Conditional Akaike information criterion· Data perturbation ·

Spatial prediction· Stabilization · Stein’s unbiased risk estimate · Variable selection

C.-S. Chen (

B

)

Institute of Statistics and Information Science, National Changhua University of Education, No.1, Jin-De Road, 500 Changhua, Taiwan

e-mail: [email protected] H.-C. Huang

Institute of Statistical Science, Academia Sinica, 128 Academia Road, Section 2, Nankang, 115 Taipei, Taiwan

e-mail: [email protected] H.-C. Huang

(2)

1 Introduction

In many geostatistical problems, the variable of interest is often observed along with other variables at some locations. Typically, a geostatistical regression model is applied by treating the variable of interest as the response and some other variables as explan-atory variables. How to select a suitable subset of explanexplan-atory variables is crucial, because accuracy of selection directly affects estimation and prediction. Model selec-tion and model averaging have been well studied for linear regression; seeShao(1997), Hoeting et al. (1999), andBurnham and Anderson(2002) for a review. For selec-tion of geostatistical models, the Akaike informaselec-tion criterion (AIC) (Akaike 1973), the Bayesian information criterion (BIC) (Schwarz 1978), and the cross-validation method are often applied. Although some heuristic arguments for AIC were provided byHoeting et al.(2006), as pointed out byBreiman(1996), these model selection pro-cedures are unstable in the sense that a small change in the response variables may cause a different model to be selected, resulting in less accurate estimation and prediction.

In this paper, we focus on spatial prediction and attempt to make three proposals. First, we introduce a class of conditional information criteria indexed by a penalty parameter, which we believe is more suitable for geostatistical model selection com-paring to commonly used unconditional information criteria, such as AIC and BIC. A particular criterion in this class, called the conditional AIC (CAIC), was proposed byVaida and Blanchard(2005) for variable selection in linear mixed-effects models. Despite some tendency to select an over-complex model, this criterion has a desired property that it is an unbiased estimate of the sum of the mean squared prediction errors over data locations up to a constant. Second, due to that the spatial predictor obtained from a fixed conditional information criterion is also unstable, we propose to stabilize the spatial predictor by local model averaging using perturbed data, resulting in a stabilized predictor that can be shown to be differentiable with respect to the response variables even after plugging-in estimated model parameters. Third, utiliz-ing the differentiable property of the stabilized predictor, we propose to apply Stein’s unbiased risk estimate (Stein 1981) to select among a collection of conditional infor-mation criteria, leading to a data-dependent penalty with an adaptive feature such that it tends to selects a large penalty, and hence a small model (based on a small number of explanatory variables), when the underlying true model is small, and vice versa.

The rest of this article is organized as follows. Section2introduces the geostatis-tical regression model and the class of conditional information criteria for variable selection. Section3introduces the proposed stabilized predictor and Stein’s unbiased risk estimate for selecting the penalty parameter. Some numerical results are shown in Sect.4. An application of the proposed methodology for assessing mercury levels of fish in Maine lakes is presented in Sect.5. Finally, a brief discussion is given in Sect.6.

2 Geostatistical regression and variable selection 2.1 Geostatistical regression models

Consider a spatial process{S(s) : s ∈ D} defined over a region D ⊆ Rd. Suppose that the spatial process can be decomposed into the following:

(3)

S(s) = μ(s) + η(s); s ∈ D, (1) whereμ(·) is a deterministic mean process corresponding to a large-scale mean struc-ture, andη(·) is a zero-mean, L2-continuous, spatial dependent process correspond-ing to a small-scale structure. Suppose that we observe response variable Zi and a

p-dimensional vector of explanatory variables,(x1(si), . . . , xp(si)), associated with

Zi at location si ∈ D; i = 1, . . . , n. The mean process μ(s) is usually modeled as

β0+

p



j=1

βjxj(s), where β = (β0, β1, . . . , βp)are regression parameters. Hence the

geostatistical regression model can be written as:

Zi = S(si) + ε(si) = β0+ p  j=1 βjxj(si) + η(si) + ε(si); i = 1, . . . , n, (2)

whereε(s1), . . . , ε(sn) ∼ N(0, σε2) are white noise variables representing

measure-ment errors, and are independent of the spatial processη(·). Here, the spatial depen-dent processη(·) is usually assumed to be stationary with some parametric covariance function class. A commonly used one is the isotropic Matérn model (Matérn 1986):

C(h) = cov(η(s + h), η(s)) = ⎧ ⎨ ⎩ σ2 η(a2||h||/2)ν2κν(a2||h||) (ν) ; if h = 0, σ2 η; if h= 0, (3)

where ·  represents the Euclidean distance, κν(·) is the modified Bessel function of the second kind with orderν > 0 (Abramowitz and Stegun 1965),ν indicates the smoothness of the process, a > 0 is a scaling parameter, and ση2 = var(η(s)). Note that (3) reduces to the exponential covariance model whenν = 0.5.

In this paper, we consider selecting among p explanatory variables. Each candi-date model corresponds to a subset of p variables and is indexed byγ ∈ , where  ⊂ 2{1,...,p} is the class of all candidate models. Then the geostatistical regression

model corresponding toγ can be written as:

Z= Xγβγ + η + ε,

where βγ is the parameter vector consisting of β0 and {βj : j ∈ γ }, Xγ is

the n × (pγ+ 1) design matrix corresponding to γ with pγ being the number of explanatory variables in γ, Z = (Z1, . . . , Zn), η = (η(s1), . . . , η(sn)), and

ε = (ε(s1), . . . , ε(sn)).

Denote the covariance parameters of Z byθ ≡ (ν, a, ση2, σε2) and let(θ) ≡ var(Z). For known θ, the best linear unbiased predictor (BLUP), typically called the universal kriging (UK) predictor, of S(s) based on model γ satisfies

(4)

where ˆβγ =



Xγ(θ)−1Xγ

−1

Xγ(θ)−1Z, and xγ(s) consists of the constant 1 and the pγ explanatory variables observed at location s. Then the BLUP of S(S(s1), . . . , S(sn))based on modelγ can be written as

ˆSγ(θ) ≡ ( ˆSγ(s1; θ), . . . , ˆSγ(sn; θ))= Hγ(θ)Z, (4)

in terms of some matrix Hγ(θ).

In practice, the model parameters in (2) and (3) can be estimated by methods of moments, maximum likelihood (ML), restricted maximum likelihood (REML) (Patterson and Thompson 1971), or Bayesian methods. Note that after plugging-in an estimated parameter vector ˆθγofθ based on model γ in (4), the resulting estimated UK predictor ˆSγ( ˆθγ) = Hγ( ˆθγ)Z is no longer linear.

2.2 Model selection methods

In this paper, we focus on spatial prediction. Our objective is to select γ ∈  that minimizes E ˆSγ( ˆθγ) − S2. Some commonly used model selection criteria, such as AIC, BIC, and the corrected AIC (AICc) criterion (Hurvich and Tsai 1989), are special cases of the generalized information criterion (Nishii 1984) given by

GICλ(γ ) = −2 γZ, ˆθγ(ml), ˆβγ(ml)+ λ pγ, (5) withλ = 2, log(n), and 2n/(n − pγ− 2), respectively, where γZ, θ, βγis the log-likelihood function of Z based on modelγ , and ˆθγ(ml)and ˆβγ(ml)are the corresponding ML estimates. However, these criteria are not aim to minimize E ˆSγ( ˆθγ) − S2, and hence may not be ideal for spatial prediction purpose, as demonstrated in some simulation examples in Sect.4.

With the goal of spatial prediction in mind, we now introduce a class of conditional information criteria by conditioning onη. For notational simplicity, we first assume that the covariance parameter vectorθ is known. Then the CAIC criterion proposed byVaida and Blanchard(2005) is

CAIC(γ ; θ) = 1 n Z − ˆSγ(θ) 2+ 2trHγ(θ)σε2 , (6)

where trHγ(θ), the trace of Hγ(θ), is the effective degrees of freedom, which tends to be larger for a larger (more complex) model. As shown byVaida and Blanchard (2005), ECAIC(γ ; θ)= 1 nE ˆSγ(θ) − S 2 + σ2 ε .

Consequently, CAIC appears more appropriate than AIC for variable selection when prediction is the main goal of the analysis. Whenθ is unknown, we can apply CAIC by plugging-in an estimate ofθ in (6).

(5)

In general, CAIC tends to select an over-complex model particularly when the underlying true model is small. A natural remedy is to consider applying a larger penalty in (6), leading to the following conditional generalized information criterion (CGIC) indexed by a penalty parameterλ > 0:

CGICλ(γ ) = 1 n Z − ˆSγ( ˆθγ) 2+ λ trHγ( ˆθγ)ˆσε2 ; γ ∈ , (7)

where ˆSγ( ˆθγ) = Hγ( ˆθγ)Z, ˆθγ is some estimator ofθ based on model γ , and ˆσε2is an estimate of σε2 in (7) independent of γ ∈ . For example, it can be estimated by REML based on the full model. For a given penaltyλ, CGIC in (7) selects the model, ˆγ(λ) ≡ arg min

γ ∈ CGICλ(γ ). The corresponding spatial predictor of S(s) at

s ∈ D is given by ˆSˆγ(λ)(s; ˆθˆγ(λ)), where ˆθˆγ(λ) is the estimator ofθ based on the selected model ˆγ(λ). The criterion (7) includes CAIC (corresponding toλ = 2) and conditional BIC (CBIC) (corresponding toλ = log(n)) as special cases. Clearly, if we know the underlying true model depends only on a small number of explanatory variables in advance, it would be preferable to choose a criterion with a larger penalty that penalizes more for a larger model, and vice versa. But in practice, the underlying true model is unknown, and hence a CGIC criterion may perform well only in some situations, which is not adaptive. Without knowing the underlying true model,Shen and Ye(2002) proposed to select the penaltyλ of (5) from data in linear regression model selection and termed the method as adaptive model selection. We shall adopt this approach and further generalize it for spatial model averaging.

3 The proposed method

As inShen and Ye(2002), we could consider ˆSˆγ(λ)(s; ˆθˆγ(λ)) : λ ∈ } as the class of candidate spatial predictors for some ⊂ (0, ∞). However, ˆSˆγ(λ)(s; ˆθˆγ(λ)) is discon-tinuous in Z and hence unstable (Breiman 1996). Motivated fromBreiman (1996), we consider a stabilized version of ˆSˆγ(λ)(s; ˆθˆγ(λ)) obtained by using perturbed data, Z= (Z1, . . . , Zn)≡ Z +τξ, where ξ ∼ N(0, σε2I) is independent of Z, and τ > 0 is the perturbation size. Whenσε2is unknown, the estimatorˆσε2in (7) is used for com-puting Z. Our proposed stabilized predictor of S(s) corresponding to ˆSˆγ(λ)(s; ˆθˆγ(λ)) is given by

E ˆSˆγ(λ)(s; ˆθˆγ(λ))Z



; λ ∈ , (8)

where ˆγ(λ) is the model selected by CGICλbased on Z∗:

ˆγ(λ) = arg min γ ∈ Z − ˆSγ( ˆθγ) 2 + λ trHγ( ˆθγ)1+ τ2σε2 , ˆSγ( ˆθγ) = ˆ(s1; ˆθγ), . . . , ˆSγ(sn; ˆθγ) 

= Hγ( ˆθγ)Zis the UK predictor of S based

(6)

ˆθγ but based on Z∗. Since ˆγ(λ) may be different from ˆγ(λ) for different Z∗, the proposed stabilized predictor E ˆSˆγ(λ)(s; ˆθˆγ(λ))Zin (8) can be regarded as a type of model averaging predictor centered around ˆSˆγ(λ)(s; ˆθˆγ(λ)). Clearly, a larger τ tends to produce a smoother but more stable predictor. We found in some simulation exper-iments (e.g., Table5) that spatial prediction is typically not sensitive to the choose of τ when τ is between 0.5 and 0.9. Therefore, we fix τ = 0.5 throughout the simulation and the data analysis unless stated otherwise.

As shown byHuang and Chen(2007), E ˆSˆγ(λ)(s; ˆθˆγ(λ))Z



is infinitely differen-tiable. Applying Stein’s lemma (1981), the corresponding effective degrees of freedom is given by d fλn  i=1 ∂ Zi E ˆSˆγ(λ)(si; ˆθˆγ(λ))Z=τ21σ2 ε n  i=1 cov ˆSˆγ(λ)(si; ˆθˆγ(λ)), ZiZ  , (9)

and an unbiased estimator of E E ˆSˆγ(λ)( ˆθˆγ(λ))Z− S 2satisfies

Z− E ˆSˆγ(λ)( ˆθˆγ(λ))Z 2+ 2σε2d fλ− nσε2. (10)

Note that (10) is usually referred to as Stein’s unbiased risk estimate (SURE). Thus we can simply selectλ by minimizing SURE in (10) and obtain

ˆλ ≡ arg min λ∈ Z − E ˆSˆγ(λ)( ˆθˆγ(λ))Z 2+ 2σε2d fλ . (11)

Consequently, the model selected by CGICˆλcriterion is

ˆγ(ˆλ) ≡ arg min γ ∈ Z − ˆSγ( ˆθγ) 2+ ˆλ trHγ( ˆθγ)σε2 , and the resulting stabilized predicted surface is E ˆS

ˆγ(ˆλ)(s; ˆθˆγ(ˆλ))Z



: s ∈ D. In practice, it suffices to select among a few discrete penalty values due to the continuity of E ˆSˆγ(λ)( ˆθˆγ(λ))Z



in λ. For example, we consider =

{1, 2, log(n), 2 log(n)} in the simulation study and the data analysis, which involves

two typical criteria: CAIC with λ = 2 and CBIC with λ = log(n). Note that as demonstrated in Sect.4, the selected penalty in (11) tends to be small when the size of true model is large, and vice versa. This adaptive feature is attractive, because it automatically adjusts to select an appropriate model ˆγ(ˆλ), around which a model aver-aging predictor E ˆS

ˆγ(ˆλ)(s; ˆθˆγ(ˆλ))Z



of S(s) is obtained, regardless of whether the underlying true model is small or large.

In practice, d fλin (9) can be computed using a simple Monte Carlo (MC) method by simulating some replicates of perturbed data, and then approximate the covari-ances in the righthand side of the equality in (9) based on the corresponding sample covariances.

(7)

4 Simulations

We conducted a simulation study to examine the performance of the proposed model averaging method. We consider the model given by (1) and (2) with D= [0, 1]2. In the simulation, we consider two different examples (Examples I and II) corresponding to two different sets of explanatory variables. In Example I, p= 9 explanatory vari-ables were independently generated from a standard Gaussian white noise process. In addition, the spatial processη(·) was generated from a zero-mean Gaussian sta-tionary process with the exponential covariance function having two combinations of parameters(ν, a, ση2) = (0.5, 0.5, 1) and (0.5, 0.1, 1) in (3) corresponding to weak and strong spatial dependence, respectively (see Fig.1). The noise variance in (2) is set to beσε2 = 1, and is assumed known throughout the simulation. We consider five cases corresponding to five different choices ofβ = (0, β1, . . . , β9) indexed by k = 1, 3, 5, 7, 9 as the underlying true models. For each k, the regression coef-ficients are given by β1 = · · · = βk = √7/k, and βk+1 = · · · = βp = 0 so

that the signal-to-noise ratio (SNR) is controlled at 8, where SNR is defined as the ratio of the variance of the signal S(si) to the noise variance σε2. We generated data,

{(x1(si), . . . , x9(si), Zi) : i = 1, . . . , n} with n = 32, by sampling at 64 × 64 regular

grid points in D using simple random sampling. For each case, we consider all possible combinations of explanatory variables by selecting among = 2{1,...,9}withγ = ∅ representing the intercept-only model.

The setup for Example II is basically the same as that in Example I except we select a polynomial up to the second order,β0+ β1x+ β2y+ β3x2+ β4x y+ β5y2, where x and y are the x and y coordinates, and consider only the weak dependence case with a= 0.5 in (3). Hence there are p= 5 explanatory variables, x, y, xy, x2, y2, in this

0.0 0.5 1.0 1.5 0.0 0.2 0.4 0.6 0.8 1.0 Distance Correlation ν = 0.5 and a = 0.5 ν = 0.5 and a = 0.1

(8)

example. We consider three different cases with differentβ values for the underlying model:βx, β(x + y + xy), and β(x + y + x2+ xy + y2), where β is chosen so that SNR= 8. Similar to Example I, we consider selecting among  = 2{1,...,5}for each case.

We applied the proposed geostatistical model averaging method, referred to as GMA hereafter, with the covariance function of η(·) given by the Matérn class of (3). We selected among four CGIC criteria with = {1, 2, log(n), 2 log(n)} in (11) and estimated the Matérn parameters using REML, where d fλ in (11) is computed using the MC method based on 100 replicates as mentioned in the last paragraph of Sect.3for eachλ ∈ . The proposed GMA method is compared with five different model selection criteria, including AIC, BIC, CAIC, CBIC, and AICc, and their cor-responding model averaging methods (seeBurnham and Anderson 2002), denoted as AIC-MA, BIC-MA, CAIC-MA, CBIC-MA, and AICc-MA, where a model averaging predictor of S is given by γ ∈ ˆSγ( ˆθγ) with weights = exp  −1 2GICλ(γ )  γ∈ exp  −1 2GICλ(γ) or exp  −1 2CGICλ(γ )  γ∈ exp  −1 2CGICλ(γ) ; γ ∈ .

The results of various methods are assessed in terms of the mean squared prediction error (MSPE): 1

n ˆS− S 2

, where ˆS is a generic predictor of S.

The results based on 200 simulation replicates for Examples I and II are shown in Tables1and2, respectively. In general, conditional information criteria perform better than the unconditional counterparts, and model averaging predictors perform better than the corresponding model selection predictors in terms of MSPE. As expected, both CAIC and CBIC perform well only in some situations with CBIC performing better than CAIC when the size of underlying true model is small, and vice versa. The proposed GMA method performs better than CAIC and CBIC for all cases. It also performs better than both CAIC-MA and CBIC-MA for almost all cases.

Tables3and4show the distribution of the penalties selected from GMA for each case in Examples I and II. Evidently, GMA tends to select a large penalty when the size of true model is small, and vice versa, as if the size of true model is known in advance. This adaptive feature can not be achieved by either CAIC or CBIC.

Finally, we performed a sensitivity study regarding the choice of perturbation size τ for the proposed GMA method. We also investigated whether it is advantageous to optimize the search ofλ over (0, ∞). The results for Example II are displayed in Table5, which show that MSPEs are not sensitive to the perturbation size when τ is between 0.5 and 0.9. In addition, there seems no clear advantage to search λ over (0, ∞). Similar results can be seen for all cases in Example I as well. Although we can further apply SURE to optimize the search of(λ, τ) over (0, ∞) × (0, ∞), it appears effective and computationally more efficient to fixτ at 0.5 and search over a small number ofλ values.

(9)

Table 1 MSPE performance of various methods for Example I with weak (a= 0.5) and strong (a = 0.1) spatial dependence based on 200 simulation replicates, where the values in parentheses are the corresponding standard errors a Criterion |{ j : βj= 0}| 1 3 5 7 9 AIC .557 (.021) .538 (.021) .547 (.021) .541 (.021) .556 (.021) AICc .542 (.025) .498 (.023) .540 (.024) .559 (.023) .657 (.024) BIC .530 (.024) .515 (.023) .566 (.024) .554 (.022) .569 (.022) CAIC .305 (.010) .345 (.014) .469 (.022) .499 (.020) .535 (.018) CBIC .255 (.010) .313 (.016) .467 (.025) .506 (.021) .589 (.021) 0.5 AIC-MA .467 (.020) .465 (.019) .506 (.020) .522 (.019) .560 (.019) AICc-MA .425 (.021) .426 (.020) .487 (.021) .533 (.018) .625 (.017) BIC-MA .434 (.021) .439 (.020) .494 (.020) .523 (.019) .575 (.017) CAIC-MA .256 (.009) .309 (.013) .423 (.019) .470 (.017) .533 (.015) CBIC-MA .224 (.009) .289 (.013) .440 (.022) .491 (.019) .588 (.018) GMA .214 (.009) .296 (.012) .386 (.014) .449 (.014) .490 (.013) AIC .448 (.021) .452 (.022) .452 (.022) .464 (.022) .467 (.022) AICc .416 (.026) .383 (.024) .393 (.024) .425 (.024) .557 (.024) BIC .405 (.026) .421 (.025) .423 (.023) .446 (.023) .469 (.022) CAIC .219 (.011) .272 (.016) .322 (.017) .392 (.019) .441 (.018) CBIC .154 (.010) .253 (.023) .314 (.021) .378 (.019) .471 (.020) 0.1 AIC-MA .339 (.019) .367 (.020) .392 (.020) .435 (.021) .475 (.019) AICc-MA .284 (.019) .318 (.020) .356 (.020) .430 (.020) .559 (.017) BIC-MA .295 (.019) .333 (.020) .370 (.020) .430 (.020) .496 (.018) CAIC-MA .171 (.008) .234 (.013) .295 (.015) .374 (.016) .454 (.016) CBIC-MA .136 (.007) .225 (.019) .288 (.016) .381 (.017) .489 (.016) GMA .116 (.009) .191 (.011) .278 (.013) .347 (.013) .408 (.013)

Table 2 MSPE performance of various methods for Example II based on 200 simulation replicates, where the values in parentheses are the corresponding standard errors

Criterion |{ j : βj= 0}| 1 3 5 AIC .372 (.025) .373 (.025) .361 (.024) AICc .203 (.009) .250 (.011) .238 (.009) BIC .203 (.009) .253 (.012) .236 (.009) CAIC .198 (.009) .246 (.009) .240 (.008) CBIC .180 (.009) .260 (.010) .242 (.008) GMA .168 (.007) .234 (.008) .212 (.008) 5 Application

We applied the proposed GMA method to the mercury data set previously analyzed by Hoeting and Olsen (see Chap. 1 ofPeck et al. 1998) using multiple linear regression for assessing mercury levels of fish in Maine lakes. It is known that mercury is a toxic metal, which may damage the human nervous system if the mercury level in the human body is above the safety limit. For example, the state government of Maine suggests that the mercury level in parts per million (ppm) should be less than 0.43. To assess

(10)

Table 3 Distributions of the selected penalties for Example I based on 200 simulation replicates a |{ j : βj= 0}| ˆλ Average ˆλ 1 2 log(n) 2 log(n) 1 14 23 24 139 5.53 3 16 43 31 110 4.86 0.5 5 25 53 70 52 3.67 7 48 89 54 9 2.38 9 89 98 13 0 1.65 1 9 7 16 168 6.21 3 7 18 34 141 5.69 0.1 5 16 29 60 95 4.70 7 36 76 78 10 2.64 9 89 99 12 0 1.64

Table 4 Distributions of the selected penalties for Example II based on 200 simulation replicates

|{ j : βj= 0}| ˆλ Average ˆλ

1 2 log(n) 2 log(n)

1 0 42 46 112 5.10

3 1 84 54 61 3.89

5 0 83 65 52 3.76

Table 5 MSPE performance of GMA for various perturbation sizesτ in Example II based on 200 sim-ulation replicates, where the penalty parameterλ is chosen from either = {1, 2, log(n), 2 log(n)} or a continuous interval = (0, ∞), and the standard errors of MSPEs are all less than or equal to 0.011

τ |{ j : βj= 0}| = 1 |{ j : βj= 0}| = 3 |{ j : βj= 0}| = 5

0.1 .189 .204 .254 .262 .253 .264

0.5 .168 .174 .234 .234 .212 .220

0.9 .159 .166 .221 .227 .198 .206

whether the fish in Maine lakes are safe to eat, it is important to estimate mercury levels in fish particularly for lakes where no observations are taken and identify the important explanatory variables responsible for elevated levels of mercury.

The data set consists of mercury level (in ppm) as the response and 10 explanatory variables sampled by US Environmental Protection Agency at 110 lakes in Maine (see Fig.2). The 10 explanatory variables include number of fish, elevation (in feet), surface area (in acres), maximum depth (in feet), lake type (in three categories: oligotrophic, eutrophic, and mesotrophic), lake stratification (in two categories: yes and no), drain-age area (in square miles), runoff factor (in percentdrain-age of rainwater or melted snow flow into rivers and streams), flushing rate (in number flushes per year), and DAM (in two categories: all natural flowage and some man-made flowage in the drainage

(11)

−71 −70 −69 −68 −67 43 44 45 46 47 Longitude (degrees) Latitude (degrees)

Fig. 2 Locations of 110 lakes in Maine

area). We consider all possible combinations of explanatory variables, leading to 210 candidate models to be selected over.

We applied the model of (2) with Zi and S(si) being the observed and the true

mercury levels in fish at lake si, and x1(si), . . . , x10(si) being the corresponding 10

explanatory variables, where the covariance function ofη(·) is modeled by the expo-nential covariance model withν = 0.5 in (3). We applied CAIC, CBIC, and the pro-posed GMA method with = {1, 2, log(n), 2 log(n)} in (11), where the covariance parametersθ are estimated using REML and d fλin (11) is computed using the MC method based on 100 replicates. In addition,σε2in (11) is estimated by ˆσε2= 0.0876 using REML based on the full model.

The results for CAIC and CBIC are shown in Table6. Comparing between the two criteria, CBIC selects a smaller model having only one variable (i.e., elevation). In contrast, CAIC selects a more complex model having two additional variables (surface area and lake stratification). Our method selects ˆλ = log(n) corresponding to CBIC. The predicted mercury level surface based on our GMA method is shown in Fig.3. Among the 110 lakes, 75 of them (shown as solid triangles in Fig.3) particularly in southeast of Maine have mercury levels higher than the safety limit (i.e., 0.43 ppm).

(12)

Table 6 Estimated regression parameters and covariance parameters for CAIC and CBIC, where the values in parentheses are the corresponding standard errors under the selected models

Criterion Regression parameters Covariance parameters Intercept Elevation Surface area Lake stratification a ση2 σε2

CAIC .64 −2.9 × 10−4 −2 × 10−5 .083 .546 .034 .075 (.10) (1× 10−4) (1.5 × 10−5) (.059) CBIC .66 −2.7 × 10−4 − − .534 .026 .081 (.08) (1× 10−4) − − −71 −70 −69 −68 −67 0.3 0.4 0.5 0.6 0.7 43 44 45 4 4 7 46 Longitude (degrees) Latitude (degrees)

Fig. 3 Locations of 75 lakes in Maine with mercury levels higher than 0.43 ppm

6 Discussion

Motivated from the adaptive model selection idea ofShen and Ye(2002) and the sta-bilization method ofBreiman(1996), in this paper, we developed a model averaging method for geostatistical regression, which performs well for spatial prediction in some simulation experiments. Further theoretical justification is of interest but appears to be very difficult particularly under the fixed domain asymptotic framework, and hence is beyond the scope of this paper.

(13)

Acknowledgments This work was supported by the National Science Council of Taiwan under Grants NSC 98-2118-M-018-003-MY2 and NSC 97-2118-M-001-001-MY3. The authors thank the editor, the associate editor, and three anonymous referees for helpful comments and suggestions.

References

Abramowitz M, Stegun IA (1965) Handbook of mathematical functions. Dover, New York

Akaike H (1973) Information theory and the maximum likelihood principle. In International symposium on information theory (V. Petrov and F. Csáki eds.), Akademiai Kiádo, Budapest, pp. 267–281 Breiman L (1996) Heuristics of instability and stabilization in model selection. Ann Stat 24: 2350–2383 Burnham KP, Anderson DR (2002) Model selection and multimodel inference: a practical

information-theoretic approach, 2nd edn. Springer-Verlag, New York

Hoeting JA, Madigan D, Raftery AE, Volinsky CT (1999) Bayesian model averaging: a tutorial (with Discussion). Stat Sci 14: 382–401

Hoeting JA, Davis RA, Merton AA, Thompson SE (2006) Model selection for geostatistical models. Ecol Appl 16: 87–98

Huang HC, Chen CS (2007) Optimal geostatistical model selection. J Am Stat Assoc 102: 1009–1024 Hurvich CM, Tsai CL (1989) Regression and time series model selection in small samples. Biometrika

76: 297–307

Matérn B (1986) Spatial variation, 2nd edn. Springer, New York, (Lecture Notes in Statistics)

Nishii R (1984) Asymptotic properties of criteria for selection of variables in multiple regression. Ann Stat 12: 758–765

Patterson HD, Thompson R (1971) Recovery of inter-block information when block sizes are unequal. Biometrika 58: 545–554

Peck R, Haugh LD, Goodman A (eds) (1998) Statistical case studies: a collaboration between academe and industry. ASA-SIAM Series on statistics and applied probability 3 and 4

Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6: 461–464

Shao J (1997) An asymptotic theory for model selection (with discussion). Stat Sin 7: 221–264 Shen X, Ye J (2002) Adaptive model selection. J Am Stat Assoc 97: 210–221

Stein C (1981) Estimation of the mean of a multivariate normal distribution. Ann Stat 9: 1135–1151 Vaida F, Blanchard S (2005) Conditional Akaike information for mixed-effects models. Biometrika

92: 351–370 Author Biographies

Chun-Shu Chen graduated from the National Central University, Taiwan, in 2007 with a Ph.D. degree in Statistics. In 2008, he joined the Institute of Statistics and Information Science at the National Changhua University of Education, Taiwan. He is currently an Assistant Professor in the same University and his research activities are focused on spatial statistics and model selection.

Hsin-Cheng Huang is research fellow in the Institute of Statistical Science at Academia Sinica, Taiwan. His research interests include spatial and space-time models, model selection, and wavelet methods.

數據

Fig. 1 Matérn correlation functions
Table 1 MSPE performance of various methods for Example I with weak (a = 0.5) and strong (a = 0.1) spatial dependence based on 200 simulation replicates, where the values in parentheses are the corresponding standard errors a Criterion |{ j : β j = 0}| 1
Table 3 Distributions of the selected penalties for Example I based on 200 simulation replicates a |{ j : β j = 0}| ˆλ Average ˆ λ 1 2 log (n) 2 log (n) 1 14 23 24 139 5 .53 3 16 43 31 110 4 .86 0 .5 5 25 53 70 52 3 .67 7 48 89 54 9 2 .38 9 89 98 13 0 1 .
Fig. 2 Locations of 110 lakes in Maine
+2

參考文獻

相關文件

• elearning pilot scheme (Four True Light Schools): WIFI construction, iPad procurement, elearning school visit and teacher training, English starts the elearning lesson.. 2012 •

ix If more than one computer room is opened, please add up the opening hours for each room per week. duties may include planning of IT infrastructure, procurement of

The min-max and the max-min k-split problem are defined similarly except that the objectives are to minimize the maximum subgraph, and to maximize the minimum subgraph respectively..

In a nonparametric setting, we discuss identifiability of the conditional and un- conditional survival and hazard functions when the survival times are subject to dependent

We propose two types of estimators of m(x) that improve the multivariate local linear regression estimator b m(x) in terms of reducing the asymptotic conditional variance while

In this paper, we propose a practical numerical method based on the LSM and the truncated SVD to reconstruct the support of the inhomogeneity in the acoustic equation with

Reading Task 6: Genre Structure and Language Features. • Now let’s look at how language features (e.g. sentence patterns) are connected to the structure

In addition, to incorporate the prior knowledge into design process, we generalise the Q(Γ (k) ) criterion and propose a new criterion exploiting prior information about