• 沒有找到結果。

Asymptotic Frameworks

There are two asymptotic frameworks in geostatistics having different assumptions on the domain D. One is called the fixed domain asymptotic framework, where data are sampled more and more densely in a bounded fixed region D. The other is called the increasing domain asymptotic framework with |D| → ∞ as n → ∞, which is often considered in time series analysis. The fixed domain asymptotic framework is somewhat unique in geostatistics, which tends to have some unusual asymptotic behavior due to limited information available in a bounded fixed region.

Asymptotic properties under the increasing domain asymptotic framework are more standard. Suppose that we observe data Z according to (2.2), where µ(s) = x(s)0β is known, but var(Z) depends on some unknown parameter vector θ. Then Mardia and Marshall (1984) show under some regularity conditions that

θˆM L ∼ N(θ0, I−10)), (2.10) where θ0 is the true parameter vector and I(θ0) is the Fisher information. However, (2.10) is generally not satisfied under the fixed domain asymptotic framework, and in fact some parameters of θ can not be consistently estimated. For example, suppose that η(·) is generated from a Mat´ern covariance function of (2.5) with ν known but ση2 and κη unknown. Zhang (2004) shows that the ML estimates of σ2η and κη are inconsistent under the fixed domain asymptotic framework. That is,

n→∞lim P¡

|ˆση2− σ2η,0| > ε} > 0,

and

n→∞lim P¡

|ˆκη − κη,0| > ε} > 0.

for any ε > 0, where ση,02 and κη,0 are the corresponding true parameters. However, as shown in the following proposition, some function of ση2 and κη can be consistently estimated.

Proposition 1 (Zhang, 2004) Consider an increasing sequence of finite subsets Dn of Rd, for d = 1, 2, 3, such that ∪n=1Dn is bounded and infinite. Suppose that the data Z are observed on D = Dn according to (2.2) with β = 0 and σ²2 = 0 known, where η(·) is a Gaussian process with a Mat´ern covariance function of (2.5) and ν > 0 is known.

Assume that ση,02 > 0 and κη,0 > 0 are the true parameters corresponding to σ2 and κη. If κη is fixed at some constant κ1 > 0, and ˆσ2η is the ML estimate of σ2. Then

ˆ

σ2ηκ1 −→ σp η,02 κη,0, as n → ∞.

Also, Ying (1991) shows the similar results for exponential covariance function which is a special case of Mat´ern class for ν = 0.5.

Proposition 2 (Ying, 1991) Suppose that the data Z are observed on D = [0, 1] accord-ing to (2.2) with β = 0 and σ2² = 0 known, where η(·) is a zero-mean Gaussian process with an exponential covariance function of (2.3). Let Θ be the parameter space of (ση2, κη)0. Assume that either Θ = [a, b] × (0, ∞) or Θ = (0, ∞) × [a, b], where 0 < a ≤ b < ∞, and the true parameter vector (σ2η,0, κη,0)0 ∈ Θ.

(i) Let ˆσ2η and ˆκη be the ML estimates of ση2 and κη. Then

√n(ˆση2κˆη − σ2η,0κη,0)−→ N(0, 2(σd η,02 κη,0)2), as n → ∞.

(ii) Suppose that κη is fixed at some constant κ1 > 0 and ˆσ12 is the corresponding ML estimate of ση2. Then

√n µ

ˆ

σ21 ση,02 κη,0 κ1

d

→ N Ã

0, 2

µσ2η,0κη,0 κ1

2!

, as n → ∞.

(iii) Suppose that ση2 is fixed at some constant σ12 and ˆκ1is the corresponding ML estimate of κη. Then

√n µ

ˆ

κ1−ση,02 κη,0 σ12

d

→ N µ

0, 2

µσ2η,0κη,0 σ12

2

, as n → ∞.

The parameters ση2κη in Proposition 1 and ση2κη in Proposition 2 are called microer-godic parameters (Matheron 1971, 1989; Stein 1999), which basically imply that both parameters can be recovered with probability 1 from observations in a bounded fixed region. These parameters have also been shown to play an important role in spatial prediction by Stein (1999). Specifically, consider the spectral density function of K(h), h ∈ Rd:

f (ω) = 1 (2π)d

Z

Rd

exp(−iω0h)K(h)dh; ω ∈ Rd.

Stein shows that under the fixed domain asymptotic framework, f (ω) contributes to mean square prediction error mainly for large |ω|, whose behavior is governed by some microergodic parameters. He also provides some specific examples for exponential and Mat´ern covariance functions.

For σ2² > 0 in (2.2), Chen et al. (2000) provides the following results regarding the ML estimates of σ2η, κη and σ²2.

Proposition 3 (Chen et al. 2000) Suppose that the data Z are observed regularly on D = [0, 1] according to (2.2) with β = 0 known, where η(·) is a zero-mean Gaussian process with an exponential covariance function of (2.3). Assume that (ση, κη, σ2²)0 ∈ Θ, where Θ ⊂ (0, ∞)3 is a compact set, and the true parameter vector (σ²,02 , σ2η,0, κη,0)0 ∈ Θ.

(i) Let ˆσ2², ˆση2 and ˆκη be the ML estimates of σ²2, ση2 and κη. Then, as n → ∞, µ n1/4ση2κˆη − σ2η,0κη,0)

n1/2σ2² − σ²,02 )

d

→ N µµ 0

0

,

µ 4

²,0η,02 κη,0)3/2 0 0 ²,04

¶¶

.

(ii) Suppose that κη is known and ˆσ²2 and ˆσ2η are the corresponding ML estimates of σ²2 and σ2η. Then, as n → ∞,

µ n1/4σ2η− ση,02 ) n1/2σ²2− σ²,02 )

d

→ N µµ 0

0

,

µ 4

²,0ση,03 κ−1/2η,0 0 0 ²,04

¶¶

.

In Chapter 5, we shall provide the convergence rates for the ML estimates of σ²2, σ2η and κη under more general spatial domains with D = [0, nδ] and δ ∈ [0, 1). In addition, those convergence rates will be given under geostatistical regression models of (2.2) based on not only the true model, but also underfitted and overfitted models.

Chapter 3

Variable Selection

Consider the geostatistical regression model of (2.2). Suppose that we observe spatial data, {x(si), Z(si)}; si ∈ D and i = 1, . . . , n. This model reduces to a usual regression model when η(·) = 0. Similar to linear regression, a large model with many insignificant variables tends to produce a large variance, resulting in low predictive power. On the other hand, a small model that ignores some important variable may produce large bias. To achieve good compromise between bias and variance, it is essential to identify significant variables.

Clearly, variable selection is essential not only in regression but also in geostatistical regression.

We consider selecting a subset of {1, . . . , p} corresponding to p explanatory variables.

Let A ⊂ 2{1,...,p} be the set of all candidate models, and let α ∈ A denotes a candidate model. Note that intercept is always included in our models, and α = ∅ corresponds to the intercept only model.

Let X(α) be an n × p(α) sub-matrix of X containing the columns corresponding to α, and let β(α) be the sub-vector of β corresponding to X(α). A model α is said to be correct if µ(s) can be written as P

j∈αβjxj(s), for s ∈ D. Let Ac ⊂ A be the set of all correct models and let αc= arg min

α∈A

|α| be the correct model having the smallest number of variables. Then Ac = {α ∈ A : αc⊂ α}.

The geostatistical regression model corresponding to α ∈ A can be written in a matrix form as:

Z = X(α)β(α) + η + ², (3.1)

where η ≡ (η(s1), . . . , η(sn))0 ∼ N(0, Ση) and ² ∼ N(0, σ²2I). Hence the mean and the variance of Z under model α ∈ A are µ(α) = X(α)β(α) and

Σ(θ) = Ση + σ²2I, (3.2)

where θ is the covariance parameter vector associate with var(Z).

3.1 Loss Functions

We consider two loss functions: the Kullback-Leibler (KL) loss function and the squared error loss function. First, for model α given in (3.1), the KL loss function is given by:

LKL(α; θ) = Z

Y ∈Rn

f (Y ; µ, Σ(θ0)) log f (Y ; µ, Σ(θ0)) f (Y ; ˆµ(α; θ), Σ(θ))dY

= 1

2log det(Σ(θ)) −1

2log det(Σ(θ0)) + 1

2tr(Σ(θ0−1(θ)) − n 2 +1

2(µ − ˆµ(α; θ))0Σ−1(θ)(µ − ˆµ(α; θ)), (3.3) where µ = E(Z) is the true mean vector and θ0 is the true covariance parameter vector,

µ(α; θ) = X(α) ˆˆ β(α; θ), (3.4)

β(α; θ) = (X(α)ˆ 0Σ−1(θ)X(α))−1X(α)0Σ−1(θ)Z,

and recall that f (·; µ, Σ) is the Gaussian density function defined in (2.8). Now, let M (α; θ) = X(α)(X(α)0Σ−1(θ)X(α))−1X(α)0Σ−1(θ), (3.5)

A(α; θ) = I − M (α; θ). (3.6)

Note that when θ = θ0, LKL(α; θ) in (3.3) reduces to a simpler form:

LKL(α) ≡ LKL(α; θ0) = 1

2(µ − ˆµ(α))0Σ−1(µ − ˆµ(α)), (3.7) where ˆµ(α; θ0) and Σ(θ0) are written as ˆµ(α) and Σ to simplify their notations. We can rewrite (3.7) as

LKL(α) = 1

2µ0A(α)0Σ−1A(α)µ + 1

2(η + ²)0M (α)0Σ−1M (α)(η + ²), (3.8) where A(α; θ0) and M (α; θ0) are also simplified as A(α) and M (α). Clearly, the first term µ0A(α)0Σ−1A(α)µ on the righthand side of the equality in (3.8) vanishes when α ∈ Ac. Thus we have the following lemma.

Lemma 1 Consider a class of models given by (3.1). Let LKL(α) be the KL loss for model α defined in (3.7). Then

E(LKL(α)) = 1

2µ0A(α)0Σ−1A(α)µ + p(α)

2 ; α ∈ A, (3.9)

where A(α) is defined in (3.6). In particular, E(LKL(α)) = p(α)/2, for α ∈ Ac.

Lemma 2 Consider a class of models given by (3.1). Let LKL(α) be the KL loss for model α defined in (3.7). Then

n→∞lim P¡

αc= arg min

α∈Ac

LKL(α)¢

= 1, (3.10)

and

αc= arg min

α∈Ac

E(LKL(α)). (3.11)

In addition, if αc is fixed, and

n→∞lim inf

α∈A\Acµ0A(α)0Σ−1A(α)µ = ∞, (3.12)

where A(α) is defined in (3.6), then

n→∞lim P¡

αc= arg min

α∈A

LKL(α)¢

= 1. (3.13)

In general, (3.12) is satisfied under the increasing domain asymptotic framework. How-ever, under the fixed domain asymptotic framework, it may or may not be satisfied; see Theorem 9 in Section 5.2 and Theorem 12 in Section 5.3, for which (3.12) holds and Theorems 5 and 6 in Section 5.1 for which (3.12) fails. In fact, as shown in Theorems 5 and 6, the smallest true model αc does not have the smallest KL loss under the fixed domain asymptotic framework. In other words, (3.13) is not always satisfied.

The other loss function we consider in this thesis is the squared error loss commonly used in geostatistics particularly for prediction purpose:

L(α) = k ˆS(α) − Sk2, (3.14)

where ˆS(α) is a generic predictor of S based on model α ∈ A. Throughout the thesis, we consider the universal kriging predictor of S in (2.7) unless indicated otherwise. For θ = θ0, the universal kriging predictor based on model α can be written as:

S(α) = H(α)Z,ˆ (3.15)

where

H(α) ≡ M (α) + ΣηΣ−1A(α), (3.16) with M (α) and A(α) defined in (3.5) and (3.6), respectively. Then the corresponding risk can be decomposed into the following:

E(L(α)) = EkS − E(S|Z) − ˆS(α) + E(S|Z)k2

= Ek ˆS(α) − E(S|Z)k2− 2E(( ˆS(α) − E(S|Z))0(S − E(S|Z))) + EkS − (S|Z)k2

= Ek ˆS(α) − E(S|Z)k2+ EkS − E(S|Z)k2,

which is lower bounded by EkS − E(S|Z)k2, independent of α ∈ A. The following lemma provides some more details regarding decomposition of E(L(α)), which is useful in establishing some asymptotic properties concerning the squared error loss.

Lemma 3 Consider a class of models given by (3.1). Let ˆS(α) be the UK predictor of S given by (3.15) and L(α) be the corresponding squared error loss defined in (3.14). Then

E(L(α)) = Ek ˆS(α) − E(S|Z)k2+ EkS − E(S|Z)k2

= R1(α) + R2(α) + σ²2tr(ΣηΣ−1), (3.17) where Ek ˆS(α) − E(S|Z)k2 = R1(α) + R2(α),

R1(α) = σ²4µ0A(α)0Σ−2A(α)µ,

R2(α) = σ²4tr(Σ−1M (α)), (3.18) where M (α) = M (α; θ0) is defined in (3.5).

Note that the term R1(α) corresponds to the model misspecification error, which is smaller for a larger model α, and in particular, R1(α) = 0 for α ∈ Ac. The term R2(α) corresponds to the estimation error, which generally increases with p(α) and is bounded by σ²2p(α), since

σ²2tr(Σ−1M (α)) = σ²2tr¡

Σ−1X(α)(X(α)0Σ−1X(α))−1X(α)0Σ−1¢

= tr¡¡

X(α)0Σ−1X(α)¢−1

X(α)0¡

σ2²Σ−2¢

X(α)¢

≤ tr¡¡

X(α)0Σ−1X(α)¢−1

X(α)0Σ−1X(α)¢

= tr(Ip(α)) = p(α).

In addition, the term EkS−E(S|Z)k2 = σ²2tr(ΣηΣ−1) in (3.17) corresponds to the optimal mean squared prediction error, which provides a lower bound for E(L(α)).

In general, lim

n→∞σ²2tr(ΣηΣ−1

R2(α) = ∞, for α ∈ Ac. It follows from (3.17) that

n→∞lim E(L(α))±

E(L(αc)) = 1, for α ∈ Ac. In contrast, from (3.9),

n→∞lim E(LKL(α))±

E(LKLc)) > 1, for α ∈ Ac\ {αc}.

Therefore, it would be preferable to select αc among α ∈ Ac under the KL loss.

相關文件