White Noise Regressors - Large-Sample Theory of Model Selection in Spatial Statistics

In this section, we consider explanatory variables generated independently from Gaussian white noise processes of (5.7).
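To make the setting concrete, the sketch below simulates one realisation of this design. It is illustrative only. The function name `simulate_white_noise_design` and the parameter values are our own. We assume, as the later proofs suggest, that:

- each $x_j(\cdot)$ is zero-mean Gaussian white noise with variance $\sigma_j^2$;
- $\eta(\cdot)$ has the exponential covariance $\sigma_\eta^2\exp(-\kappa_\eta|s-s'|)$;
- the sites are $s_i = in^{-(1-\delta)}$.

```python
import numpy as np


def simulate_white_noise_design(n, delta, beta, sigma_x, sigma2_eta, kappa,
                                sigma2_eps, seed=0):
    """One draw of Z(s_i) = sum_j beta_j x_j(s_i) + eta(s_i) + eps(s_i).

    Illustrative assumptions: x_j(s_i) are iid N(0, sigma_x[j]^2) white noise,
    eta is a zero-mean Gaussian process with covariance
    sigma2_eta * exp(-kappa * |s - s'|), eps is iid N(0, sigma2_eps),
    and the sites are s_i = i * n^{-(1 - delta)}, i = 1, ..., n.
    """
    rng = np.random.default_rng(seed)
    s = np.arange(1, n + 1) * n ** (-(1.0 - delta))        # sites in (0, n^delta]
    dist = np.abs(s[:, None] - s[None, :])
    sigma_eta = sigma2_eta * np.exp(-kappa * dist)         # exponential covariance
    X = rng.standard_normal((n, len(beta))) * np.asarray(sigma_x, dtype=float)
    eta = np.linalg.cholesky(sigma_eta) @ rng.standard_normal(n)
    eps = rng.normal(0.0, np.sqrt(sigma2_eps), size=n)
    Z = X @ np.asarray(beta, dtype=float) + eta + eps
    return s, X, Z, sigma_eta


# delta = 0 gives the fixed domain (0, 1]; delta near 1 approaches an increasing domain.
s, X, Z, sigma_eta = simulate_white_noise_design(
    n=100, delta=0.5, beta=[1.0, 0.0], sigma_x=[1.0, 2.0],
    sigma2_eta=1.0, kappa=1.0, sigma2_eps=0.5)
print(s[0], s[-1], Z[:3])
```

Drawing $\eta$ through a Cholesky factor keeps the sketch dependency-free. Any exact Gaussian sampler would do.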

Proposition 6 Consider a class of models given by (3.1) with the $x_j(s)$'s generated independently from the white-noise processes of (5.7) and $\mathrm{cov}(\eta(s),\eta(s')) = \sigma_\eta^2\exp(-\kappa_\eta|s-s'|)$, where

and A(α; θ) is defined in (3.6).

(ii) For δ = 0,

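$$\begin{aligned}
-2\ell(\theta;\alpha) ={}& n\log(2\pi) - \frac{1-\delta}{2}\log n + \bigg(\log\sigma_\varepsilon^2 + \frac{\sum_{j\in\alpha^c\setminus\alpha}\beta_j^2\sigma_j^2 + \sigma_{\varepsilon,0}^2}{\sigma_\varepsilon^2}\bigg)n\\
&+ \bigg(\frac{2\kappa_\eta\sigma_\eta^2}{\sigma_\varepsilon^2}\bigg)^{1/2}\bigg(1 - \frac{\sum_{j\in\alpha^c\setminus\alpha}\beta_j^2\sigma_j^2 + \sigma_{\varepsilon,0}^2}{2\sigma_\varepsilon^2} + \frac{\kappa_{\eta,0}\sigma_{\eta,0}^2}{2\kappa_\eta\sigma_\eta^2}\bigg)n^{1/2} + \xi^{(3)}(\alpha;\theta) + O_p(1).
\end{aligned}\tag{5.126}$$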

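By (5.124), for $\delta\in(0,1)$ and any $\alpha\in\mathcal{A}\setminus\mathcal{A}^c$,

$$\begin{aligned}
-2\ell(\theta;\alpha) + 2\ell(\theta;\alpha^c) &= \sigma_\varepsilon^{-2}\sum_{j\in\alpha^c\setminus\alpha}\beta_j^2\sigma_j^2\,n + \xi^{(3)}(\alpha;\theta) - \xi^{(3)}(\alpha^c;\theta) + o_p(n)\\
&= \sigma_\varepsilon^{-2}\sum_{j\in\alpha^c\setminus\alpha}\beta_j^2\sigma_j^2\,n + o_p(n),
\end{aligned}\tag{5.127}$$

where the last equality holds because, by (5.125), $\xi^{(3)}(\alpha;\theta) - \xi^{(3)}(\alpha^c;\theta) = o_p(n)$. Similarly, by (5.126), for $\delta = 0$,

$$-2\ell(\theta;\alpha) + 2\ell(\theta;\alpha^c) = \sigma_\varepsilon^{-2}\sum_{j\in\alpha^c\setminus\alpha}\beta_j^2\sigma_j^2\,n + o_p(n).$$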

As will be shown in Theorem 12, (5.127) can be used to find an appropriate penalty $\lambda$ that leads to selection consistency.

The following lemma shows that ML asymptotically over-estimates $\sigma_\varepsilon^2$ when $\alpha\in\mathcal{A}\setminus\mathcal{A}^c$, under both the fixed-domain and the increasing-domain asymptotic frameworks.

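Lemma 13 Under the setup of Proposition 6, let $\Theta\subset(0,\infty)^3$ be a compact set and let $\hat\theta(\alpha) = (\hat\sigma_\eta^2(\alpha), \hat\kappa_\eta(\alpha), \hat\sigma_\varepsilon^2(\alpha))'$ be the ML estimate of $\theta$ based on model $\alpha$. Then for $\delta\in[0,1)$ and $\alpha\in\mathcal{A}$,

$$\hat\sigma_\varepsilon^2(\alpha) = \sum_{j\in\alpha^c\setminus\alpha}\beta_j^2\sigma_j^2 + \sigma_{\varepsilon,0}^2 + o_p(1), \tag{5.128}$$

$$\hat\kappa_\eta(\alpha)\hat\sigma_\eta^2(\alpha) = \kappa_{\eta,0}\sigma_{\eta,0}^2 + o_p(1). \tag{5.129}$$

The following theorem further provides the convergence rates of the ML estimates of $\kappa_\eta$, $\sigma_\eta^2$ and $\sigma_\varepsilon^2$. These results are key to establishing the asymptotic properties of GIC in Theorem 13.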
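The limit in (5.128) has a simple interpretation. The omitted regressors are themselves spatially white, so a model that leaves them out sees them as extra nugget variance.

The sketch below is a hypothetical illustration, not the ML fit of Lemma 13, and the coefficient and variance values are made up. It only checks two things about the omitted white-noise terms plus the true nugget:

- their variance is $\sum_{j\in\alpha^c\setminus\alpha}\beta_j^2\sigma_j^2 + \sigma_{\varepsilon,0}^2$;
- their lag-one correlation is negligible.

```python
import numpy as np


def nugget_limit(beta_omitted, sigma_x_omitted, sigma2_eps0):
    """Right-hand side of (5.128): sum of beta_j^2 sigma_j^2 over omitted j, plus sigma_{eps,0}^2."""
    b = np.asarray(beta_omitted, dtype=float)
    sx = np.asarray(sigma_x_omitted, dtype=float)
    return float(np.sum(b ** 2 * sx ** 2) + sigma2_eps0)


rng = np.random.default_rng(1)
n = 200_000
beta_om = np.array([1.5, -0.5])     # hypothetical coefficients of regressors in alpha^c \ alpha
sx_om = np.array([1.0, 2.0])        # their white-noise standard deviations
s2e0 = 0.25                         # true nugget sigma_{eps,0}^2

# What a model omitting these regressors must absorb: omitted white-noise terms plus the nugget.
absorbed = (rng.standard_normal((n, 2)) * sx_om) @ beta_om + rng.normal(0.0, np.sqrt(s2e0), n)
lag1 = np.corrcoef(absorbed[:-1], absorbed[1:])[0, 1]   # close to 0: still spatially white
print(nugget_limit(beta_om, sx_om, s2e0), absorbed.var(), lag1)
```

Here the limit is $1.5^2 + 0.5^2\cdot 4 + 0.25 = 3.5$, and the sample variance of the absorbed noise should be close to it.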

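Theorem 11 Under the setup of Proposition 6, let $\Theta\subset(0,\infty)^3$ be a compact set and let $\hat\theta(\alpha) = (\hat\sigma_\eta^2(\alpha), \hat\kappa_\eta(\alpha), \hat\sigma_\varepsilon^2(\alpha))'$ be the ML estimate of $\theta$ based on model $\alpha$. Then

(i) For $\delta\in(0,1)$,

$$\hat\sigma_\varepsilon^2(\alpha) = \begin{cases} \sigma_{\varepsilon,0}^2 + o_p(n^{-(1-\delta)/2}), & \alpha\in\mathcal{A}^c,\\ \sum_{j\in\alpha^c\setminus\alpha}\beta_j^2\sigma_j^2 + \sigma_{\varepsilon,0}^2 + o_p(n^{-(1-\delta)/2}), & \alpha\in\mathcal{A}\setminus\mathcal{A}^c, \end{cases}\tag{5.130}$$

$$\hat\kappa_\eta(\alpha)\hat\sigma_\eta^2(\alpha) = \kappa_{\eta,0}\sigma_{\eta,0}^2 + o_p(n^{-(1-\delta)/4}), \tag{5.131}$$

$$\hat\sigma_\eta^2(\alpha) = \sigma_{\eta,0}^2 + o_p(1),\quad \alpha\in\mathcal{A}, \tag{5.132}$$

$$\hat\kappa_\eta(\alpha) = \kappa_{\eta,0} + o_p(1),\quad \alpha\in\mathcal{A}. \tag{5.133}$$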

(ii) For δ = 0, Then, it can be obtained in a way similar to (5.99) that

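$$\xi^{(3)}(\theta;\alpha) = \xi_1(\theta;\alpha) + \xi_2(\theta;\alpha) + \xi(\theta) + O_p(1). \tag{5.138}$$

First, we prove (5.130). By (5.128) and (5.129), it suffices to show that for $|\sigma_\varepsilon^2 - \sigma_{\varepsilon,\alpha}^2| = o(1)$, $|\kappa_\eta\sigma_\eta^2 - \kappa_{\eta,0}\sigma_{\eta,0}^2| = o(1)$ and any $\epsilon > 0$,

$$\inf_{|\sigma_\varepsilon^2 - \sigma_{\varepsilon,\alpha}^2| \ge \epsilon n^{-(1-\delta)/2}}\Big(-2\ell(\theta;\alpha) + 2\ell\big((\sigma_\eta^2,\kappa_\eta,\sigma_{\varepsilon,\alpha}^2)';\alpha\big)\Big) > 0 \tag{5.139}$$

as $n\to\infty$ with probability tending to 1. By (5.124),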

$$-2\ell(\theta;\alpha) = n\log(2\pi) - \frac{1-\delta}{2}\log n + \cdots$$

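where the second equality follows from $|\sigma_\varepsilon^2 - \sigma_{\varepsilon,\alpha}^2| = o(1)$, (5.39) and (5.40), and the last equality follows from (5.67) and

$$\xi_1(\theta;\alpha) - \xi_1\big((\sigma_\eta^2,\kappa_\eta,\sigma_{\varepsilon,\alpha}^2)';\alpha\big) = o_p\big(\max\{(\sigma_\varepsilon^2-\sigma_{\varepsilon,\alpha}^2)^2 n,\ n^\delta\}\big), \tag{5.141}$$

$$\xi_2(\theta;\alpha) - \xi_2\big((\sigma_\eta^2,\kappa_\eta,\sigma_{\varepsilon,\alpha}^2)';\alpha\big) = o_p\big(\max\{(\sigma_\varepsilon^2-\sigma_{\varepsilon,\alpha}^2)^2 n,\ n^\delta\}\big). \tag{5.142}$$

These can be obtained in a way similar to (5.105)-(5.106), with the moment conditions now given by (5.43)-(5.44). Thus, (5.139) is obtained. This completes the proof of (5.130).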

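Second, we prove (5.131). By (5.129) and (5.130), it suffices to show that for $|\sigma_\varepsilon^2 - \sigma_{\varepsilon,\alpha}^2| = o(n^{-(1-\delta)/2})$, $|\kappa_\eta\sigma_\eta^2 - \kappa_{\eta,0}\sigma_{\eta,0}^2| = o(1)$ and any $\epsilon > 0$,

$$\inf_{|\sigma_\eta^2\kappa_\eta - \kappa_{\eta,0}\sigma_{\eta,0}^2| \ge \epsilon n^{-(1-\delta)/4}}\Big(-2\ell(\theta;\alpha) + 2\ell\big((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\varepsilon,\alpha}^2)';\alpha\big)\Big) > 0 \tag{5.143}$$

as $n\to\infty$ with probability tending to 1. By (5.140), for $|\sigma_\varepsilon^2 - \sigma_{\varepsilon,\alpha}^2| = o(n^{-(1-\delta)/2})$ and $|\kappa_\eta\sigma_\eta^2 - \kappa_{\eta,0}\sigma_{\eta,0}^2| = o(1)$, we have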

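$$\begin{aligned}
&-2\ell(\theta;\alpha) + 2\ell\big((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\varepsilon,\alpha}^2)';\alpha\big)\\
&\quad= \bigg\{\Big(\frac{2\kappa_\eta\sigma_\eta^2}{\sigma_{\varepsilon,\alpha}^2}\Big)^{1/2}\Big(\frac12 + \frac{\kappa_{\eta,0}\sigma_{\eta,0}^2}{2\kappa_\eta\sigma_\eta^2}\Big) - \frac{(2\kappa_{\eta,0}\sigma_{\eta,0}^2)^{1/2}}{\sigma_{\varepsilon,\alpha}}\bigg\}n^{(1+\delta)/2} + \frac{1}{2\kappa_{\eta,0}}(\kappa_\eta-\kappa_{\eta,0})^2n^\delta\\
&\qquad + \xi_1(\theta;\alpha) - \xi_1\big((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\varepsilon,\alpha}^2)';\alpha\big) + \xi_2(\theta;\alpha) - \xi_2\big((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\varepsilon,\alpha}^2)';\alpha\big)\\
&\qquad + \xi(\theta) - \xi\big((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\varepsilon,\alpha}^2)'\big) + o_p(n^\delta)\\
&\quad= \frac{(\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2)^2\,n^{(1+\delta)/2}}{2^{5/2}(\kappa_{\eta,0}\sigma_{\eta,0}^2)^{3/2}\sigma_{\varepsilon,\alpha}} + \frac{(\kappa_\eta-\kappa_{\eta,0})^2\,n^\delta}{2\kappa_{\eta,0}}\\
&\qquad + \xi_1(\theta;\alpha) - \xi_1\big((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\varepsilon,\alpha}^2)';\alpha\big) + \xi_2(\theta;\alpha) - \xi_2\big((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\varepsilon,\alpha}^2)';\alpha\big)\\
&\qquad + \xi(\theta) - \xi\big((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\varepsilon,\alpha}^2)'\big) + o\big((\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2)^2n^{(1+\delta)/2}\big) + o_p(n^\delta)\\
&\quad= \frac{(\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2)^2\,n^{(1+\delta)/2}}{2^{5/2}(\kappa_{\eta,0}\sigma_{\eta,0}^2)^{3/2}\sigma_{\varepsilon,\alpha}} + \frac{(\kappa_\eta-\kappa_{\eta,0})^2\,n^\delta}{2\kappa_{\eta,0}} + o\big((\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2)^2n^{(1+\delta)/2}\big) + o_p(n^\delta),
\end{aligned}\tag{5.144}$$

where the first equality follows from $|\sigma_\varepsilon^2 - \sigma_{\varepsilon,\alpha}^2| = o(n^{-(1-\delta)/2})$, the second equality follows from (5.41), and the last equality follows from

$$\xi_1(\theta;\alpha) - \xi_1\big((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\varepsilon,\alpha}^2)';\alpha\big) = o_p\Big(\max\big\{(\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2)^2n^{(1+\delta)/2},\ n^\delta\big\}\Big), \tag{5.145}$$

$$\xi_2(\theta;\alpha) - \xi_2\big((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\varepsilon,\alpha}^2)';\alpha\big) = o_p\Big(\max\big\{(\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2)^2n^{(1+\delta)/2},\ n^\delta\big\}\Big), \tag{5.146}$$

$$\xi(\theta) - \xi\big((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\varepsilon,\alpha}^2)'\big) = o_p\Big(\max\big\{(\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2)^2n^{(1+\delta)/2},\ n^\delta\big\}\Big). \tag{5.147}$$

These can be obtained in a way similar to (5.109)-(5.111). Thus, (5.143) is obtained. This completes the proof of (5.131).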

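Third, we prove (5.132) and (5.133). By (5.144), for $|\sigma_\varepsilon^2 - \sigma_{\varepsilon,\alpha}^2| = o(n^{-(1-\delta)/2})$, $|\sigma_\eta^2\kappa_\eta - \kappa_{\eta,0}\sigma_{\eta,0}^2| = o(n^{-(1-\delta)/4})$ and any $\epsilon > 0$,

$$\inf_{|\kappa_\eta-\kappa_{\eta,0}|\ge\epsilon}\Big(-2\ell(\theta;\alpha) + 2\ell\big((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\varepsilon,\alpha}^2)';\alpha\big)\Big) = \frac{\epsilon^2}{2\kappa_{\eta,0}}n^\delta + o_p(n^\delta) > 0$$

as $n\to\infty$ with probability tending to 1, which gives (5.133). This, together with (5.131), gives (5.132).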

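Fourth, we prove (5.134). By (5.128) and (5.129), it suffices to show that for $|\sigma_\varepsilon^2 - \sigma_{\varepsilon,\alpha}^2| = o(1)$ and $|\kappa_\eta\sigma_\eta^2 - \kappa_{\eta,0}\sigma_{\eta,0}^2| = o(1)$, there exists $M > 0$ such that

$$\inf_{|\sigma_\varepsilon^2 - \sigma_{\varepsilon,\alpha}^2| \ge Mn^{-1/2}}\Big(-2\ell(\theta;\alpha) + 2\ell\big((\sigma_\eta^2,\kappa_\eta,\sigma_{\varepsilon,\alpha}^2)';\alpha\big)\Big) > 0 \tag{5.148}$$

as $n\to\infty$ with probability tending to 1. By (5.126) and (5.138),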

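$$\begin{aligned}
-2\ell(\theta;\alpha) ={}& n\log(2\pi) - \frac{1-\delta}{2}\log n + \Big(\log\sigma_\varepsilon^2 + \frac{\sigma_{\varepsilon,\alpha}^2}{\sigma_\varepsilon^2}\Big)n + \Big(\frac{2\kappa_\eta\sigma_\eta^2}{\sigma_\varepsilon^2}\Big)^{1/2}\Big(1 - \frac{\sigma_{\varepsilon,\alpha}^2}{2\sigma_\varepsilon^2} + \frac{\kappa_{\eta,0}\sigma_{\eta,0}^2}{2\kappa_\eta\sigma_\eta^2}\Big)n^{1/2}\\
&+ \xi_1(\theta;\alpha) + \xi_2(\theta;\alpha) + \xi(\theta) + O_p(1).
\end{aligned}\tag{5.149}$$

Then, for $|\sigma_\varepsilon^2 - \sigma_{\varepsilon,\alpha}^2| = o(1)$ and $|\kappa_\eta\sigma_\eta^2 - \kappa_{\eta,0}\sigma_{\eta,0}^2| = o(1)$, we have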

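$$\begin{aligned}
&-2\ell(\theta;\alpha) + 2\ell\big((\sigma_\eta^2,\kappa_\eta,\sigma_{\varepsilon,\alpha}^2)';\alpha\big)\\
&\quad= \Big(\log\sigma_\varepsilon^2 + \frac{\sigma_{\varepsilon,\alpha}^2}{\sigma_\varepsilon^2} - \log\sigma_{\varepsilon,\alpha}^2 - 1\Big)n\\
&\qquad + \bigg\{\Big(\frac{2\kappa_\eta\sigma_\eta^2}{\sigma_\varepsilon^2}\Big)^{1/2}\Big(1 - \frac{\sigma_{\varepsilon,\alpha}^2}{2\sigma_\varepsilon^2} + \frac{\kappa_{\eta,0}\sigma_{\eta,0}^2}{2\kappa_\eta\sigma_\eta^2}\Big) - \Big(\frac{2\kappa_\eta\sigma_\eta^2}{\sigma_{\varepsilon,\alpha}^2}\Big)^{1/2}\Big(\frac12 + \frac{\kappa_{\eta,0}\sigma_{\eta,0}^2}{2\kappa_\eta\sigma_\eta^2}\Big)\bigg\}n^{1/2}\\
&\qquad + \xi_1(\theta;\alpha) - \xi_1\big((\sigma_\eta^2,\kappa_\eta,\sigma_{\varepsilon,\alpha}^2)';\alpha\big) + \xi_2(\theta;\alpha) - \xi_2\big((\sigma_\eta^2,\kappa_\eta,\sigma_{\varepsilon,\alpha}^2)';\alpha\big)\\
&\qquad + \xi(\theta) - \xi\big((\sigma_\eta^2,\kappa_\eta,\sigma_{\varepsilon,\alpha}^2)'\big) + O_p(1)\\
&\quad= \frac{(\sigma_\varepsilon^2-\sigma_{\varepsilon,\alpha}^2)^2}{2\sigma_{\varepsilon,\alpha}^4}n + \xi_1(\theta;\alpha) - \xi_1\big((\sigma_\eta^2,\kappa_\eta,\sigma_{\varepsilon,\alpha}^2)';\alpha\big) + \xi_2(\theta;\alpha) - \xi_2\big((\sigma_\eta^2,\kappa_\eta,\sigma_{\varepsilon,\alpha}^2)';\alpha\big)\\
&\qquad + \xi(\theta) - \xi\big((\sigma_\eta^2,\kappa_\eta,\sigma_{\varepsilon,\alpha}^2)'\big) + o_p\big((\sigma_\varepsilon^2-\sigma_{\varepsilon,\alpha}^2)^2n\big) + O_p(1)\\
&\quad= \frac{(\sigma_\varepsilon^2-\sigma_{\varepsilon,\alpha}^2)^2}{2\sigma_{\varepsilon,\alpha}^4}n + o_p\big((\sigma_\varepsilon^2-\sigma_{\varepsilon,\alpha}^2)^2n\big) + O_p(1),
\end{aligned}$$

where the second equality follows from (5.39) and (5.40) with $\sigma = \sigma_{\varepsilon,\alpha}$ and $\tau = \kappa_{\eta,0}\sigma_{\eta,0}^2$, and the last equality follows from (5.73) and

$$\xi_1(\theta;\alpha) - \xi_1\big((\sigma_\eta^2,\kappa_\eta,\sigma_{\varepsilon,\alpha}^2)';\alpha\big) = o_p\big((\sigma_\varepsilon^2-\sigma_{\varepsilon,\alpha}^2)^2n\big) + O_p(1),$$

$$\xi_2(\theta;\alpha) - \xi_2\big((\sigma_\eta^2,\kappa_\eta,\sigma_{\varepsilon,\alpha}^2)';\alpha\big) = o_p\big((\sigma_\varepsilon^2-\sigma_{\varepsilon,\alpha}^2)^2n\big) + O_p(1).$$

These can be obtained in a way similar to (5.105)-(5.106). Consequently, there exists $M > 0$ such that

$$\inf_{|\sigma_\varepsilon^2-\sigma_{\varepsilon,\alpha}^2|\ge Mn^{-1/2}}\Big(-2\ell(\theta;\alpha) + 2\ell\big((\sigma_\eta^2,\kappa_\eta,\sigma_{\varepsilon,\alpha}^2)';\alpha\big)\Big) = \frac{M^2}{2\sigma_{\varepsilon,\alpha}^4} + O_p(1) > 0$$

as $n\to\infty$ with probability tending to 1. Thus, (5.148) is obtained. This completes the proof of (5.134).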

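Finally, we prove (5.135). By (5.129) and (5.134), it suffices to show that for $|\sigma_\varepsilon^2 - \sigma_{\varepsilon,\alpha}^2| = O(n^{-1/2})$ and $|\kappa_\eta\sigma_\eta^2 - \kappa_{\eta,0}\sigma_{\eta,0}^2| = o(1)$, there exists $M > 0$ such that

$$\inf_{|\sigma_\eta^2\kappa_\eta - \kappa_{\eta,0}\sigma_{\eta,0}^2|\ge Mn^{-1/4}}\Big(-2\ell(\theta;\alpha) + 2\ell\big((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\varepsilon,\alpha}^2)';\alpha\big)\Big) > 0 \tag{5.150}$$

as $n\to\infty$ with probability tending to 1. By (5.149), for $|\sigma_\varepsilon^2 - \sigma_{\varepsilon,\alpha}^2| = O(n^{-1/2})$ and $|\kappa_\eta\sigma_\eta^2 - \kappa_{\eta,0}\sigma_{\eta,0}^2| = o(1)$, we have

$$\begin{aligned}
&-2\ell(\theta;\alpha) + 2\ell\big((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\varepsilon,\alpha}^2)';\alpha\big)\\
&\quad= \bigg\{\Big(\frac{2\kappa_\eta\sigma_\eta^2}{\sigma_{\varepsilon,\alpha}^2}\Big)^{1/2}\Big(\frac12 + \frac{\kappa_{\eta,0}\sigma_{\eta,0}^2}{2\kappa_\eta\sigma_\eta^2}\Big) - \Big(\frac{2\kappa_{\eta,0}\sigma_{\eta,0}^2}{\sigma_{\varepsilon,\alpha}^2}\Big)^{1/2}\bigg\}n^{1/2}\\
&\qquad + \xi_1(\theta;\alpha) - \xi_1\big((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\varepsilon,\alpha}^2)';\alpha\big) + \xi_2(\theta;\alpha) - \xi_2\big((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\varepsilon,\alpha}^2)';\alpha\big)\\
&\qquad + \xi(\theta) - \xi\big((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\varepsilon,\alpha}^2)'\big) + O_p(1)\\
&\quad= \frac{(\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2)^2\,n^{1/2}}{2^{5/2}(\kappa_{\eta,0}\sigma_{\eta,0}^2)^{3/2}\sigma_{\varepsilon,\alpha}} + \xi_1(\theta;\alpha) - \xi_1\big((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\varepsilon,\alpha}^2)';\alpha\big) + \xi_2(\theta;\alpha) - \xi_2\big((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\varepsilon,\alpha}^2)';\alpha\big)\\
&\qquad + \xi(\theta) - \xi\big((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\varepsilon,\alpha}^2)'\big) + o\big((\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2)^2n^{1/2}\big) + O_p(1)\\
&\quad= \frac{(\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2)^2\,n^{1/2}}{2^{5/2}(\kappa_{\eta,0}\sigma_{\eta,0}^2)^{3/2}\sigma_{\varepsilon,\alpha}} + o\big((\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2)^2n^{1/2}\big) + O_p(1),
\end{aligned}\tag{5.151}$$

where the first equality follows from $|\sigma_\varepsilon^2 - \sigma_{\varepsilon,\alpha}^2| = O(n^{-1/2})$, the second equality follows from (5.41), and the last equality follows from

$$\xi_1(\theta;\alpha) - \xi_1\big((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\varepsilon,\alpha}^2)';\alpha\big) = o_p\big((\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2)^2n^{1/2}\big) + O_p(1),$$

$$\xi_2(\theta;\alpha) - \xi_2\big((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\varepsilon,\alpha}^2)';\alpha\big) = o_p\big((\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2)^2n^{1/2}\big) + O_p(1), \tag{5.152}$$

$$\xi(\theta) - \xi\big((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\varepsilon,\alpha}^2)'\big) = o_p\big((\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2)^2n^{1/2}\big) + O_p(1). \tag{5.153}$$

These can be obtained in a way similar to (5.109)-(5.111). Thus, (5.150) is obtained. This completes the proof of (5.135). □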

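Corollary 6 Under the setup of Theorem 11, let

$$\theta_\alpha^{(3)} = \Big(\sigma_{\eta,0}^2,\ \kappa_{\eta,0},\ \sum_{j\in\alpha^c\setminus\alpha}\beta_j^2\sigma_j^2 + \sigma_{\varepsilon,0}^2\Big)'. \tag{5.154}$$

Then for $\ell(\theta;\alpha)$ defined in (2.9),

$$\operatorname*{plim}_{n\to\infty} n^{-\delta}\big(-2\ell(\hat\theta(\alpha);\alpha) + 2\ell(\theta_\alpha^{(3)};\alpha)\big) = 0,\quad \text{if } \delta\in(0,1), \tag{5.155}$$

$$-2\ell(\hat\theta(\alpha);\alpha) + 2\ell(\theta_\alpha^{(3)};\alpha) = O_p(1),\quad \text{if } \delta = 0. \tag{5.156}$$

In addition, for $L^{KL}(\theta;\alpha)$ defined in (3.3),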

plim

n→∞L^KL( ˆθ(α); α)±

L^KL(θ_α⁽³⁾; α)) = 0; if δ ∈ [0, 1). (5.157)

Note that by Theorem 11, $\operatorname{plim}_{n\to\infty}\hat\theta(\alpha) = \theta_\alpha^{(3)}$ for $\delta\in(0,1)$, which immediately implies (5.155). On the other hand, (5.156) is somewhat surprising, because $\hat\theta(\alpha)$ generally does not converge to $\theta_\alpha^{(3)}$ for $\delta = 0$. Nevertheless, selection consistency and asymptotic loss efficiency are achievable in geostatistical model selection even when some covariance parameters cannot be consistently estimated under the fixed-domain asymptotic framework (see Theorem 13).

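Theorem 12 Consider a class of models given by (3.1) with the $x_j(s)$'s generated independently from the white-noise processes of (5.7) and $\mathrm{cov}(\eta(s),\eta(s')) = \sigma_\eta^2\exp(-\kappa_\eta|s-s'|)$, where $\mathcal{A}^c\neq\emptyset$ and $p$ is fixed. Suppose that $\sigma_\eta^2>0$, $\kappa_\eta>0$ and $\sigma_\varepsilon^2>0$ are known. In addition, suppose that the data are collected at $s_i = in^{-(1-\delta)}\in[0,n^\delta]$, $i = 1,\dots,n$, for some $\delta\in[0,1)$. If $\lambda\to\infty$ and $\lambda/n\to0$, then

$$\frac{L^{KL}(\hat\alpha_{GIC}(\lambda))}{\min_{\alpha\in\mathcal{A}}L^{KL}(\alpha)} \xrightarrow{p} 1,\quad \text{as } n\to\infty.$$

In addition,

$$\lim_{n\to\infty}P\big(\hat\alpha_{GIC}(\lambda) = \alpha^c\big) = 1.$$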

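Proof. By Corollary 2, it suffices to show that

$$\lim_{n\to\infty}\mathrm{tr}(\Sigma^{-1})/\lambda = \infty, \tag{5.158}$$

which follows from (5.31) and $\lambda = o(n)$. This completes the proof. □

Theorem 13 Under the setup of Theorem 12, suppose that $\theta = (\sigma_\eta^2,\kappa_\eta,\sigma_\varepsilon^2)'$ is unknown, where $\Theta\subset(0,\infty)^3$ is a compact set such that $\theta_0\in\Theta$. Let $\hat\theta(\alpha)$ be the ML estimate of $\theta$ based on model $\alpha$. For $\delta\in[0,1)$, if $\lambda\to\infty$ and $\lambda/n\to0$, then

$$\lim_{n\to\infty}P\big(\hat\alpha_{GIC}(\lambda) = \alpha^c\big) = 1.$$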

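Proof. For consistency, it suffices to show that the conditions of Corollary 3 are satisfied with $\tau_n = n$. First, for $\delta\in[0,1)$,

$$\begin{aligned}
\mu'A(\alpha;\theta)'\Sigma^{-1}(\theta)A(\alpha;\theta)\mu &= \beta(\alpha^c\setminus\alpha)'X(\alpha^c\setminus\alpha)'\Sigma^{-1}(\theta)X(\alpha^c\setminus\alpha)\beta(\alpha^c\setminus\alpha)\\
&\quad - \beta(\alpha^c\setminus\alpha)'X(\alpha^c\setminus\alpha)'\Sigma^{-1}(\theta)M(\alpha;\theta)X(\alpha^c\setminus\alpha)\beta(\alpha^c\setminus\alpha)\\
&= \beta(\alpha^c\setminus\alpha)'X(\alpha^c\setminus\alpha)'\Sigma^{-1}(\theta)X(\alpha^c\setminus\alpha)\beta(\alpha^c\setminus\alpha) + O_p(1)\\
&= \sum_{j\in\alpha^c\setminus\alpha}\beta_j^2\sigma_j^2\,\mathrm{tr}(\Sigma^{-1}(\theta)) + o_p(n)\\
&= \sum_{j\in\alpha^c\setminus\alpha}\frac{\beta_j^2\sigma_j^2}{\sigma_\varepsilon^2}\,n + o_p(n),
\end{aligned}\tag{5.159}$$

where the second equality is obtained in a way similar to (5.100), and the third equality follows from

$$\beta(\alpha^c\setminus\alpha)'X(\alpha^c\setminus\alpha)'\Sigma^{-1}(\theta)X(\alpha^c\setminus\alpha)\beta(\alpha^c\setminus\alpha) = \sum_{j\in\alpha^c\setminus\alpha}\beta_j^2\sigma_j^2\,\mathrm{tr}(\Sigma^{-1}(\theta)) + o_p(n).$$

This can be obtained from (5.35), Chebyshev's inequality and the moment condition

$$\mathrm{var}\big(X_j'\Sigma^{-1}(\theta)X_{j'}\big) = \sigma_j^2\sigma_{j'}^2\,\mathrm{tr}(\Sigma^{-2}(\theta)) = O(n),\quad j\neq j'.$$

The last equality in (5.159) follows from (5.31). Hence, (A.1') is satisfied.

Second, by (5.31) and (5.35), we have for any $\theta\in\Theta$,

$$\operatorname*{plim}_{n\to\infty}\frac{1}{n}X'\Sigma^{-1}(\theta)X = D(\theta),$$

where $D(\theta)$ is a $p\times p$ diagonal matrix with diagonal entries $\sigma_j^2/\sigma_\varepsilon^2$, $j = 1,\dots,p$. Hence, (A.2') holds.

Third, by (5.34) and (5.35), (A.3') holds.

Fourth, by (5.155)-(5.157), (A.4) and (A.5) hold trivially for $\tau_n = n$ and $\theta_\alpha^{(3)}$ defined in (5.154).

Fifth, for $\xi(\theta)$ defined in (5.53), by (5.147) and (5.153), we have for $\delta\in[0,1)$,

$$\operatorname*{plim}_{n\to\infty}\frac{1}{n}\big(\xi(\theta_0) - \xi(\theta_\alpha^{(3)})\big) = 0.$$

Hence, (4.12) holds.

Last, since $\theta_\alpha^{(3)} = \theta_0$ for $\alpha\in\mathcal{A}^c$, (4.14) holds trivially. Then, for $\mathcal{A}^c\neq\emptyset$, $\lambda\to\infty$ and $\lambda = o(n)$, we have

$$\lim_{n\to\infty}P\big(\hat\alpha_{GIC}(\lambda) = \alpha^c\big) = 1,$$

which completes the proof. □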
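Much of the argument for Theorems 12 and 13 rests on $\mathrm{tr}(\Sigma^{-1})$ growing like $n/\sigma_\varepsilon^2$. This is how (5.31) enters (5.158) and (5.159).

A quick numerical check, under assumed parameter values ($\sigma_\eta^2 = \kappa_\eta = 1$, $\sigma_\varepsilon^2 = 0.5$), computes $\mathrm{tr}(\Sigma^{-1})/n$ for the sites $s_i = in^{-(1-\delta)}$. The ratio always stays below $1/\sigma_\varepsilon^2 = 2$, because every eigenvalue of $\Sigma$ exceeds $\sigma_\varepsilon^2$, and it should creep towards 2 as $n$ grows. In the fixed-domain case $\delta = 0$ that approach is fairly slow.

```python
import numpy as np


def trace_sigma_inv(n, delta, sigma2_eta=1.0, kappa=1.0, sigma2_eps=0.5):
    """tr(Sigma^{-1}) for Sigma = exponential covariance + nugget at s_i = i n^{-(1-delta)}."""
    s = np.arange(1, n + 1) * n ** (-(1.0 - delta))
    Sigma = (sigma2_eta * np.exp(-kappa * np.abs(s[:, None] - s[None, :]))
             + sigma2_eps * np.eye(n))
    # Sigma is symmetric positive definite, so tr(Sigma^{-1}) is the sum of reciprocal eigenvalues.
    return float(np.sum(1.0 / np.linalg.eigvalsh(Sigma)))


for delta in (0.0, 0.5):
    for n in (100, 400, 800):
        ratio = trace_sigma_inv(n, delta) / n
        print(f"delta={delta}, n={n}: tr(Sigma^-1)/n = {ratio:.4f} (limit 1/sigma_eps^2 = 2)")
```

Using `eigvalsh` rather than an explicit inverse is a design choice for numerical stability. For these sizes either approach works.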

Comparing Theorems 7, 10 and 13, we see that GIC attains consistency most easily when the variables to be selected are generated from white-noise processes, and with the greatest difficulty when they are polynomials.
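For the known-covariance case of Theorem 12, GIC selection can be sketched as an exhaustive search over subsets. GIC itself is defined earlier in the thesis, not in this section. Here we assume the common form $-2\ell(\hat\beta(\alpha);\alpha) + \lambda\,|\alpha|$, with $\beta$ profiled out by generalised least squares.

The penalty $\lambda = n^{3/4}$ is just one choice satisfying $\lambda\to\infty$ and $\lambda/n\to0$. The helper names, the simulated design and all parameter values are hypothetical.

```python
import itertools
import numpy as np


def gls_quad(Z, X, Sigma_inv):
    """min over beta of (Z - X beta)' Sigma^{-1} (Z - X beta); X may have zero columns."""
    if X.shape[1] == 0:
        return float(Z @ Sigma_inv @ Z)
    XtSi = X.T @ Sigma_inv
    beta_hat = np.linalg.solve(XtSi @ X, XtSi @ Z)      # GLS estimate of beta
    r = Z - X @ beta_hat
    return float(r @ Sigma_inv @ r)


def select_gic(Z, X, Sigma, lam):
    """Minimise -2 loglik + lam * |alpha| over all column subsets (known covariance)."""
    n, p = X.shape
    Sigma_inv = np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(Sigma)
    best, best_val = None, np.inf
    for k in range(p + 1):
        for alpha in itertools.combinations(range(p), k):
            cols = np.array(alpha, dtype=int)
            m2ll = n * np.log(2 * np.pi) + logdet + gls_quad(Z, X[:, cols], Sigma_inv)
            val = m2ll + lam * k
            if val < best_val:
                best, best_val = alpha, val
    return best


# Hypothetical example: 4 white-noise regressors, only the first two are active.
rng = np.random.default_rng(4)
n, delta = 200, 0.5
s = np.arange(1, n + 1) * n ** (-(1.0 - delta))
Sigma = np.exp(-np.abs(s[:, None] - s[None, :])) + 1.0 * np.eye(n)   # sigma_eta^2 = kappa = sigma_eps^2 = 1
X = rng.standard_normal((n, 4))
Z = X @ np.array([2.0, -2.0, 0.0, 0.0]) + np.linalg.cholesky(Sigma) @ rng.standard_normal(n)
lam = n ** 0.75
print(select_gic(Z, X, Sigma, lam))   # the smallest correct model (0, 1) is expected here
```

Enumerating all $2^p$ subsets is only practical because $p$ is fixed and small, which matches the setting of the theorems.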

Chapter 6 Conditional Generalized Information Criterion

If we are interested in the asymptotically optimal properties of (3.14) under some selection procedure, it is difficult to prove them directly from the GIC introduced above, so another criterion is needed. Vaida and Blanchard (2005) proposed a suitable criterion for spatial process prediction, named the conditional AIC (CAIC). Here we also propose a conditional generalized information criterion (CGIC), which includes CAIC as a special case. In the following sections, we develop the asymptotic theory of CGIC for geostatistical model selection problems.

6.1 Conditional Akaike’s Information Criterion

Consider the loss $L(\alpha)$ defined in (3.14) with the estimators $\hat S(\alpha)$ defined in (3.15). It is difficult to derive optimality properties of $L(\alpha)$ directly from the criterion (4.6). Vaida and Blanchard (2005) proposed a conditional AIC (CAIC) selection procedure for linear mixed models. Up to the additive constant $n\sigma_\varepsilon^2$, it is an unbiased estimator of $E(L(\alpha))$ given in (3.17). They suggested that the AIC in (4.3) is a suitable selection procedure when the focus is on estimating the mean function, whereas CAIC is more appropriate than AIC when the focus is on both mean function estimation and spatial process prediction. That is, for $\alpha\in\mathcal{A}$,

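$$\Gamma_{CAIC}(\alpha) = \|Z - \hat S(\alpha)\|^2 + 2\,\mathrm{tr}(H(\alpha))\,\sigma_\varepsilon^2, \tag{6.1}$$

where $\hat S(\alpha) = H(\alpha)Z$ with $H(\alpha)$ defined in (3.16). Let

$$\hat\alpha_{CAIC} = \arg\min_{\alpha\in\mathcal{A}}\Gamma_{CAIC}(\alpha). \tag{6.2}$$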
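A direct implementation of (6.1)-(6.2) is sketched below. $M(\alpha)$ and $H(\alpha)$ are defined in Chapter 3 rather than here, so we make three assumptions:

- $\Sigma = \Sigma_\eta + \sigma_\varepsilon^2 I$;
- $M(\alpha)$ is the GLS projection $X(\alpha)\{X(\alpha)'\Sigma^{-1}X(\alpha)\}^{-1}X(\alpha)'\Sigma^{-1}$;
- $H(\alpha) = \Sigma_\eta\Sigma^{-1} + \sigma_\varepsilon^2\Sigma^{-1}M(\alpha)$.

These choices reproduce the identities used in the proof of Theorem 14. The function names and the example data are illustrative.

```python
import itertools
import numpy as np


def caic(Z, X_alpha, Sigma_eta, sigma2_eps):
    """Gamma_CAIC(alpha) = ||Z - H Z||^2 + 2 tr(H) sigma_eps^2, following (6.1).

    Assumes Sigma = Sigma_eta + sigma_eps^2 I, the GLS projection
    M = X (X' Sigma^{-1} X)^{-1} X' Sigma^{-1}, and
    H = Sigma_eta Sigma^{-1} + sigma_eps^2 Sigma^{-1} M.
    """
    n = Z.shape[0]
    Sigma_inv = np.linalg.inv(Sigma_eta + sigma2_eps * np.eye(n))
    if X_alpha.shape[1] == 0:
        M = np.zeros((n, n))                         # empty model: no mean terms
    else:
        XtSi = X_alpha.T @ Sigma_inv
        M = X_alpha @ np.linalg.solve(XtSi @ X_alpha, XtSi)
    H = Sigma_eta @ Sigma_inv + sigma2_eps * Sigma_inv @ M
    resid = Z - H @ Z                                # Z - S_hat(alpha)
    return float(resid @ resid + 2.0 * sigma2_eps * np.trace(H))


def select_caic(Z, X, Sigma_eta, sigma2_eps):
    """alpha_CAIC: the column subset of X minimising Gamma_CAIC, as in (6.2)."""
    p = X.shape[1]
    subsets = [a for k in range(p + 1) for a in itertools.combinations(range(p), k)]
    return min(subsets,
               key=lambda a: caic(Z, X[:, np.array(a, dtype=int)], Sigma_eta, sigma2_eps))


# Hypothetical example: exponential covariance on (0, 1], one strong regressor.
rng = np.random.default_rng(5)
n = 80
s = np.arange(1, n + 1) / n
Sigma_eta = np.exp(-np.abs(s[:, None] - s[None, :]))
X = rng.standard_normal((n, 3))
noise = np.linalg.cholesky(Sigma_eta + 0.5 * np.eye(n)) @ rng.standard_normal(n)
Z = 3.0 * X[:, 0] + noise
print(select_caic(Z, X, Sigma_eta, 0.5))
```

With $\Sigma_\eta = 0$ the smoother $H(\alpha)$ reduces to the ordinary least-squares hat matrix. $\Gamma_{CAIC}$ then becomes the residual sum of squares plus $2p(\alpha)\sigma_\varepsilon^2$, a Mallows' $C_p$-type criterion, which gives a convenient sanity check.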

Then we have the following theorem.

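Theorem 14 Consider a class of models given by (3.1). Suppose that

$$\lim_{n\to\infty}\sum_{\alpha\in\mathcal{A}}\frac{1}{E(L(\alpha))} = 0, \tag{6.3}$$

where $L(\alpha)$ is defined in (3.14). Then the criterion $\Gamma_{CAIC}(\alpha)$ defined in (6.1) is asymptotically loss efficient:

$$\operatorname*{plim}_{n\to\infty}\frac{L(\hat\alpha_{CAIC})}{\inf_{\alpha\in\mathcal{A}}L(\alpha)} = 1.$$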

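Proof. We first expand the CAIC defined in (6.1):

$$\begin{aligned}
\Gamma_{CAIC}(\alpha) &= (Z - \hat S(\alpha))'(Z - \hat S(\alpha)) + 2\sigma_\varepsilon^2\mathrm{tr}(H(\alpha))\\
&= (S - \hat S(\alpha) + \varepsilon)'(S - \hat S(\alpha) + \varepsilon) + 2\sigma_\varepsilon^2\mathrm{tr}(H(\alpha))\\
&= L(\alpha) + 2\varepsilon'(S - \hat S(\alpha)) + \varepsilon'\varepsilon + 2\sigma_\varepsilon^2\mathrm{tr}(H(\alpha))\\
&= L(\alpha) + 2\varepsilon'(I - H(\alpha))S - 2\varepsilon'H(\alpha)\varepsilon + \varepsilon'\varepsilon + 2\sigma_\varepsilon^2\mathrm{tr}(H(\alpha))\\
&= L(\alpha) + 2\sigma_\varepsilon^2\varepsilon'(\Sigma^{-1}A(\alpha))S - 2\varepsilon'H(\alpha)\varepsilon + \varepsilon'\varepsilon + 2\sigma_\varepsilon^2\mathrm{tr}(H(\alpha))\\
&= L(\alpha) + 2\sigma_\varepsilon^2\varepsilon'(\Sigma^{-1}A(\alpha))\mu + 2\sigma_\varepsilon^2\varepsilon'(\Sigma^{-1}A(\alpha))\eta + \varepsilon'\varepsilon - 2\big(\varepsilon'H(\alpha)\varepsilon - \sigma_\varepsilon^2\mathrm{tr}(H(\alpha))\big),
\end{aligned}\tag{6.4}$$

where the third equality follows from (3.14), and the second last equality follows from $I - H(\alpha) = A(\alpha) - \Sigma_\eta\Sigma^{-1}A(\alpha) = \sigma_\varepsilon^2\Sigma^{-1}A(\alpha)$.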
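The algebra in (6.4) is exact for every realisation, so it can be checked numerically. The sketch below uses assumed forms $M(\alpha) = X(\alpha)\{X(\alpha)'\Sigma^{-1}X(\alpha)\}^{-1}X(\alpha)'\Sigma^{-1}$, $A(\alpha) = I - M(\alpha)$ and $H(\alpha) = \Sigma_\eta\Sigma^{-1} + \sigma_\varepsilon^2\Sigma^{-1}M(\alpha)$. These are our reading of Chapter 3, not taken from this section.

For one simulated draw with an arbitrary mean $\mu$, it verifies:

- $I - H(\alpha) = \sigma_\varepsilon^2\Sigma^{-1}A(\alpha)$;
- the projection identities $M(\alpha)\Sigma M(\alpha)'\Sigma^{-1} = M(\alpha)$ and $M(\alpha)'\Sigma^{-1}M(\alpha) = \Sigma^{-1}M(\alpha)$, which are used later in the proof;
- the expansion (6.4).

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, s2e = 40, 2, 0.5
s = np.arange(1, n + 1) / n
Sigma_eta = np.exp(-2.0 * np.abs(s[:, None] - s[None, :]))   # assumed exponential covariance
Sigma = Sigma_eta + s2e * np.eye(n)
Si = np.linalg.inv(Sigma)
X = rng.standard_normal((n, p))
M = X @ np.linalg.solve(X.T @ Si @ X, X.T @ Si)      # assumed GLS projection M(alpha)
A = np.eye(n) - M                                    # assumed A(alpha) = I - M(alpha)
H = Sigma_eta @ Si + s2e * Si @ M

# One draw: S = mu + eta, Z = S + eps, with an arbitrary (possibly misspecified) mean.
mu = rng.standard_normal(n)
eta = np.linalg.cholesky(Sigma_eta) @ rng.standard_normal(n)
eps = np.sqrt(s2e) * rng.standard_normal(n)
S = mu + eta
Z = S + eps

L = float(np.sum((S - H @ Z) ** 2))                                   # loss L(alpha)
gamma = float(np.sum((Z - H @ Z) ** 2) + 2 * s2e * np.trace(H))       # CAIC (6.1)
rhs_64 = (L + 2 * s2e * eps @ Si @ A @ mu + 2 * s2e * eps @ Si @ A @ eta
          + eps @ eps - 2 * (eps @ H @ eps - s2e * np.trace(H)))      # right side of (6.4)
print(gamma, rhs_64)
```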

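It then suffices to show that, for $\alpha\in\mathcal{A}$,

$$\Gamma_{CAIC}(\alpha) = \varepsilon'\varepsilon + L(\alpha) + o_p(L(\alpha)), \tag{6.5}$$

for which it suffices to show that

$$\operatorname*{plim}_{n\to\infty}\sup_{\alpha\in\mathcal{A}}\frac{|\varepsilon'(\Sigma^{-1}A(\alpha))\mu|}{E(L(\alpha))} = 0, \tag{6.6}$$

$$\operatorname*{plim}_{n\to\infty}\sup_{\alpha\in\mathcal{A}}\frac{|\varepsilon'(\Sigma^{-1}A(\alpha))\eta|}{E(L(\alpha))} = 0, \tag{6.7}$$

$$\operatorname*{plim}_{n\to\infty}\sup_{\alpha\in\mathcal{A}}\frac{|\varepsilon'H(\alpha)\varepsilon - \sigma_\varepsilon^2\mathrm{tr}(H(\alpha))|}{E(L(\alpha))} = 0, \tag{6.8}$$

$$\operatorname*{plim}_{n\to\infty}\sup_{\alpha\in\mathcal{A}}\bigg|\frac{L(\alpha)}{E(L(\alpha))} - 1\bigg| = 0. \tag{6.9}$$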

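Hence, by (6.5), for $\hat\alpha_{CAIC}$ defined in (6.2) and $\alpha^L = \arg\min_{\alpha\in\mathcal{A}}L(\alpha)$, we conclude that

$$\Gamma_{CAIC}(\hat\alpha_{CAIC}) = \varepsilon'\varepsilon + L(\hat\alpha_{CAIC}) + o_p(L(\hat\alpha_{CAIC})),\qquad \Gamma_{CAIC}(\alpha^L) = \varepsilon'\varepsilon + L(\alpha^L) + o_p(L(\alpha^L)).$$

It follows that

$$0 \le \frac{\Gamma_{CAIC}(\alpha^L) - \Gamma_{CAIC}(\hat\alpha_{CAIC})}{L(\hat\alpha_{CAIC})} = \frac{L(\alpha^L) - L(\hat\alpha_{CAIC})}{L(\hat\alpha_{CAIC})} + o_p(1),$$

and then

$$\operatorname*{plim}_{n\to\infty}\frac{L(\alpha^L) - L(\hat\alpha_{CAIC})}{L(\hat\alpha_{CAIC})} = 0,$$

which gives

$$\operatorname*{plim}_{n\to\infty}\frac{L(\hat\alpha_{CAIC})}{\inf_{\alpha\in\mathcal{A}}L(\alpha)} = 1.$$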

We now prove (6.6)-(6.9) in turn. First, for any $\epsilon > 0$,

which gives (6.6). Here the second last inequality follows from $\sigma_\varepsilon^4\mu'A(\alpha)'\Sigma^{-2}A(\alpha)\mu \le E(L(\alpha))$ by (3.17), and the last equality follows from (6.3).

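Second, for any $\epsilon > 0$,

where the third inequality follows from $\sigma_\varepsilon^2\Sigma^{-1}\le I$, the second last inequality follows from $\sigma_\varepsilon^2\mathrm{tr}(\Sigma^{-1}A(\alpha)\Sigma_\eta) \le \sigma_\varepsilon^2\mathrm{tr}(\Sigma_\eta\Sigma^{-1}) \le E(L(\alpha))$ by (3.17), and the last equality follows from (6.3).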

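Third, for any $\epsilon > 0$,

where the second inequality is an application of Theorem 2 of Whittle (1960) for some $c_1 > 0$, and the third equality follows from

$$\begin{aligned}
\mathrm{tr}(H(\alpha)H(\alpha)') &= \mathrm{tr}\big((\Sigma_\eta\Sigma^{-1} + \sigma_\varepsilon^2\Sigma^{-1}M(\alpha))(\Sigma_\eta\Sigma^{-1} + \sigma_\varepsilon^2\Sigma^{-1}M(\alpha))'\big)\\
&= \mathrm{tr}\big(\Sigma_\eta\Sigma^{-2}\Sigma_\eta + \sigma_\varepsilon^2\Sigma_\eta\Sigma^{-1}M(\alpha)'\Sigma^{-1} + \sigma_\varepsilon^2\Sigma^{-1}M(\alpha)\Sigma^{-1}\Sigma_\eta + \sigma_\varepsilon^4M(\alpha)'\Sigma^{-2}M(\alpha)\big)\\
&\le \mathrm{tr}(\Sigma_\eta\Sigma^{-1}) + 3\sigma_\varepsilon^2\mathrm{tr}(\Sigma^{-1}M(\alpha))\\
&\le 3\sigma_\varepsilon^{-2}E(L(\alpha)),
\end{aligned}$$

by

$$\mathrm{tr}(\Sigma_\eta\Sigma^{-2}\Sigma_\eta) = \mathrm{tr}(\Sigma_\eta\Sigma^{-1} - \sigma_\varepsilon^2\Sigma^{-2}\Sigma_\eta) \le \mathrm{tr}(\Sigma_\eta\Sigma^{-1}),$$

$$\sigma_\varepsilon^2\mathrm{tr}(\Sigma_\eta\Sigma^{-1}M(\alpha)'\Sigma^{-1}) = \sigma_\varepsilon^2\mathrm{tr}\big(M(\alpha)'\Sigma^{-1} - \sigma_\varepsilon^2\Sigma^{-1}M(\alpha)'\Sigma^{-1}\big) \le \sigma_\varepsilon^2\mathrm{tr}(M(\alpha)'\Sigma^{-1}),$$

and

$$\sigma_\varepsilon^4\mathrm{tr}(M(\alpha)'\Sigma^{-2}M(\alpha)) \le \sigma_\varepsilon^2\mathrm{tr}(M(\alpha)'\Sigma^{-1}M(\alpha)) = \sigma_\varepsilon^2\mathrm{tr}(\Sigma^{-1}M(\alpha)).$$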
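The trace bounds above can be sanity-checked on a concrete covariance. The sketch uses an assumed exponential $\Sigma_\eta$, random regressors and the assumed GLS form of $M(\alpha)$. It checks four things:

- $\mathrm{tr}(H(\alpha)H(\alpha)') \le \mathrm{tr}(\Sigma_\eta\Sigma^{-1}) + 3\sigma_\varepsilon^2\mathrm{tr}(\Sigma^{-1}M(\alpha))$;
- that this is in turn at most $3\sigma_\varepsilon^{-2}$ times the value of $E(L(\alpha))$ implied by (6.33) when $A(\alpha)\mu = 0$;
- the chain $\mathrm{tr}(\Sigma_\eta\Sigma^{-2}\Sigma_\eta\Sigma^{-2}) \le \sigma_\varepsilon^{-2}\mathrm{tr}(\Sigma_\eta\Sigma^{-2}) \le \sigma_\varepsilon^{-4}\mathrm{tr}(\Sigma_\eta\Sigma^{-1})$, of the kind used in (6.19);
- the bound $\sigma_\varepsilon^2\mathrm{tr}(\Sigma^{-1}M(\alpha)) \le p(\alpha)$.

A numerical check on one example is, of course, not a proof.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, s2e = 60, 3, 0.4
s = np.arange(1, n + 1) / n                          # fixed domain, delta = 0
Sigma_eta = 1.5 * np.exp(-2.0 * np.abs(s[:, None] - s[None, :]))
Sigma = Sigma_eta + s2e * np.eye(n)
Si = np.linalg.inv(Sigma)
X = rng.standard_normal((n, p))
M = X @ np.linalg.solve(X.T @ Si @ X, X.T @ Si)      # assumed GLS projection M(alpha)
H = Sigma_eta @ Si + s2e * Si @ M

t_eta = np.trace(Sigma_eta @ Si)                     # tr(Sigma_eta Sigma^{-1})
t_M = np.trace(Si @ M)                               # tr(Sigma^{-1} M(alpha))
EL_min = s2e * t_eta + s2e ** 2 * t_M                # E L(alpha) when A(alpha) mu = 0

t_HH = np.trace(H @ H.T)
bound_1 = t_eta + 3 * s2e * t_M                      # tr(Sigma_eta Sigma^-1) + 3 s2e tr(Sigma^-1 M)
bound_2 = 3 * EL_min / s2e                           # 3 sigma_eps^{-2} E L(alpha)

# trace chain of the kind used in (6.19)
q_lhs = np.trace(Sigma_eta @ Si @ Si @ Sigma_eta @ Si @ Si)
q_mid = np.trace(Sigma_eta @ Si @ Si) / s2e
q_top = t_eta / s2e ** 2
print(t_HH, bound_1, bound_2)
print(q_lhs, q_mid, q_top)
```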

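Last, it remains to show (6.9). We first expand $L(\alpha)$ defined in (3.14):

$$\begin{aligned}
L(\alpha) &= (S - \hat S(\alpha))'(S - \hat S(\alpha))\\
&= \big\|(I - H(\alpha))\mu + \big(\eta - \Sigma_\eta\Sigma^{-1}(\eta+\varepsilon)\big) - \sigma_\varepsilon^2\Sigma^{-1}M(\alpha)(\eta+\varepsilon)\big\|^2\\
&= \big\|\sigma_\varepsilon^2\Sigma^{-1}A(\alpha)\mu + (\sigma_\varepsilon^2\Sigma^{-1}\eta - \Sigma_\eta\Sigma^{-1}\varepsilon) - \sigma_\varepsilon^2\Sigma^{-1}M(\alpha)(\eta+\varepsilon)\big\|^2\\
&= \sigma_\varepsilon^4\mu'A(\alpha)'\Sigma^{-2}A(\alpha)\mu + \|\sigma_\varepsilon^2\Sigma^{-1}\eta - \Sigma_\eta\Sigma^{-1}\varepsilon\|^2 - 2\sigma_\varepsilon^4\mu'A(\alpha)'\Sigma^{-2}M(\alpha)(\eta+\varepsilon)\\
&\quad + \sigma_\varepsilon^4(\eta+\varepsilon)'M(\alpha)'\Sigma^{-2}M(\alpha)(\eta+\varepsilon) + 2\sigma_\varepsilon^2\mu'A(\alpha)'\Sigma^{-1}(\sigma_\varepsilon^2\Sigma^{-1}\eta - \Sigma_\eta\Sigma^{-1}\varepsilon)\\
&\quad - 2\sigma_\varepsilon^2(\sigma_\varepsilon^2\Sigma^{-1}\eta - \Sigma_\eta\Sigma^{-1}\varepsilon)'\Sigma^{-1}M(\alpha)(\eta+\varepsilon).
\end{aligned}\tag{6.10}$$

It then follows, together with (3.17), that

$$\begin{aligned}
L(\alpha) - E(L(\alpha)) &= \|\sigma_\varepsilon^2\Sigma^{-1}\eta - \Sigma_\eta\Sigma^{-1}\varepsilon\|^2 - \sigma_\varepsilon^2\mathrm{tr}(\Sigma_\eta\Sigma^{-1})\\
&\quad + \sigma_\varepsilon^4(\eta+\varepsilon)'M(\alpha)'\Sigma^{-2}M(\alpha)(\eta+\varepsilon) - \sigma_\varepsilon^4\mathrm{tr}(\Sigma^{-1}M(\alpha))\\
&\quad + 2\sigma_\varepsilon^2\mu'A(\alpha)'\Sigma^{-1}(\sigma_\varepsilon^2\Sigma^{-1}\eta - \Sigma_\eta\Sigma^{-1}\varepsilon) - 2\sigma_\varepsilon^4\mu'A(\alpha)'\Sigma^{-2}M(\alpha)(\eta+\varepsilon)\\
&\quad - 2\sigma_\varepsilon^2(\sigma_\varepsilon^2\Sigma^{-1}\eta - \Sigma_\eta\Sigma^{-1}\varepsilon)'\Sigma^{-1}M(\alpha)(\eta+\varepsilon).
\end{aligned}$$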
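Taking expectations in (6.10), the three cross terms have mean zero. The remaining terms give

$$E(L(\alpha)) = \sigma_\varepsilon^4\mu'A(\alpha)'\Sigma^{-2}A(\alpha)\mu + \sigma_\varepsilon^2\mathrm{tr}(\Sigma_\eta\Sigma^{-1}) + \sigma_\varepsilon^4\mathrm{tr}(\Sigma^{-1}M(\alpha)),$$

which is consistent with (6.33). The Monte Carlo sketch below compares this expression with the average simulated loss. It uses a misspecified candidate model, so that all three terms are non-zero. The design, the assumed GLS form of $M(\alpha)$ and all parameter values are our own.

```python
import numpy as np

rng = np.random.default_rng(3)
n, s2e, reps = 30, 0.5, 20_000
s = np.arange(1, n + 1) / n
Sigma_eta = np.exp(-np.abs(s[:, None] - s[None, :]))
Sigma = Sigma_eta + s2e * np.eye(n)
Si = np.linalg.inv(Sigma)

X_full = rng.standard_normal((n, 2))
mu = X_full @ np.array([1.0, 0.5])                   # true mean uses both regressors
X = X_full[:, :1]                                    # candidate alpha omits the second one
M = X @ np.linalg.solve(X.T @ Si @ X, X.T @ Si)      # assumed GLS projection M(alpha)
A = np.eye(n) - M
H = Sigma_eta @ Si + s2e * Si @ M

Am = A @ mu
EL_formula = (s2e ** 2 * Am @ Si @ Si @ Am           # sigma^4 mu'A'Sigma^-2 A mu
              + s2e * np.trace(Sigma_eta @ Si)       # sigma^2 tr(Sigma_eta Sigma^-1)
              + s2e ** 2 * np.trace(Si @ M))         # sigma^4 tr(Sigma^-1 M)

eta = rng.standard_normal((reps, n)) @ np.linalg.cholesky(Sigma_eta).T   # rows ~ N(0, Sigma_eta)
eps = np.sqrt(s2e) * rng.standard_normal((reps, n))
S = mu + eta
Z = S + eps
L = np.sum((S - Z @ H.T) ** 2, axis=1)               # one loss value per replicate
print(L.mean(), EL_formula)
```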

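Then, to show (6.9), it suffices to show that

$$\operatorname*{plim}_{n\to\infty}\sup_{\alpha\in\mathcal{A}}\frac{\big|\|\sigma_\varepsilon^2\Sigma^{-1}\eta - \Sigma_\eta\Sigma^{-1}\varepsilon\|^2 - \sigma_\varepsilon^2\mathrm{tr}(\Sigma_\eta\Sigma^{-1})\big|}{E(L(\alpha))} = 0, \tag{6.11}$$

$$\operatorname*{plim}_{n\to\infty}\sup_{\alpha\in\mathcal{A}}\frac{|(\eta+\varepsilon)'M(\alpha)'\Sigma^{-2}M(\alpha)(\eta+\varepsilon) - \mathrm{tr}(\Sigma^{-1}M(\alpha))|}{E(L(\alpha))} = 0, \tag{6.12}$$

$$\operatorname*{plim}_{n\to\infty}\sup_{\alpha\in\mathcal{A}}\frac{|\mu'A(\alpha)'\Sigma^{-1}(\sigma_\varepsilon^2\Sigma^{-1}\eta - \Sigma_\eta\Sigma^{-1}\varepsilon)|}{E(L(\alpha))} = 0, \tag{6.13}$$

$$\operatorname*{plim}_{n\to\infty}\sup_{\alpha\in\mathcal{A}}\frac{|\mu'A(\alpha)'\Sigma^{-2}M(\alpha)(\eta+\varepsilon)|}{E(L(\alpha))} = 0, \tag{6.14}$$

$$\operatorname*{plim}_{n\to\infty}\sup_{\alpha\in\mathcal{A}}\frac{|(\sigma_\varepsilon^2\Sigma^{-1}\eta - \Sigma_\eta\Sigma^{-1}\varepsilon)'\Sigma^{-1}M(\alpha)(\eta+\varepsilon)|}{E(L(\alpha))} = 0. \tag{6.15}$$

We now prove (6.11)-(6.15) in turn.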

Hence, to show (6.11), it suffices to show plim

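First, (6.16) can be established similarly by Theorem 2 of Whittle (1960): for any $\epsilon > 0$,

for some $c_2 > 0$, where the third inequality follows from

$$\mathrm{tr}(\Sigma_\eta\Sigma^{-2}\Sigma_\eta\Sigma^{-2}) \le \sigma_\varepsilon^{-2}\mathrm{tr}(\Sigma_\eta\Sigma^{-2}) \le \sigma_\varepsilon^{-4}\mathrm{tr}(\Sigma_\eta\Sigma^{-1}), \tag{6.19}$$

by $\sigma_\varepsilon^2\Sigma^{-1}\le I$ and $\Sigma_\eta^{1/2}\Sigma^{-1}\Sigma_\eta^{1/2}\le I$ (since $\Sigma_\eta\le\Sigma$), and the last equality follows from (4.5).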

Second, (6.17) is also established by Theorem 2 of Whittle (1960): for any $\epsilon > 0$,

n→∞lim P

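for some $c_3 > 0$, where the third inequality follows from

$$\sigma_\varepsilon^2\mathrm{tr}(\Sigma^{-1}\Sigma_\eta\Sigma^{-1}\Sigma_\eta\Sigma^{-1}\Sigma_\eta\Sigma^{-1}) \le \sigma_\varepsilon^2\mathrm{tr}(\Sigma^{-1}\Sigma_\eta\Sigma^{-1}\Sigma_\eta\Sigma^{-1}) \le \sigma_\varepsilon^2\mathrm{tr}(\Sigma^{-1}\Sigma_\eta\Sigma^{-1}) \le \mathrm{tr}(\Sigma_\eta\Sigma^{-1}),$$

by $\sigma_\varepsilon^2\Sigma^{-1}\le I$ and $\Sigma^{-1/2}\Sigma_\eta\Sigma^{-1/2}\le I$ (since $\Sigma_\eta\le\Sigma$), and the last equality follows from (6.3). This gives (6.11).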

(6.12) can also be established by Theorem 2 of Whittle (1960): for any $\epsilon > 0$,

n→∞lim P

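for some $c_4 > 0$, where the third inequality follows from

$$\begin{aligned}
\mathrm{tr}\big(\Sigma M(\alpha)'\Sigma^{-2}M(\alpha)\Sigma M(\alpha)'\Sigma^{-2}M(\alpha)\big) &= \mathrm{tr}(M(\alpha)\Sigma^{-1}M(\alpha)\Sigma^{-1})\\
&\le \mathrm{tr}(\Sigma^{-1}M(\alpha)\Sigma^{-1})\\
&\le \sigma_\varepsilon^{-2}\mathrm{tr}(\Sigma^{-1}M(\alpha)),
\end{aligned}$$

by $M(\alpha)\Sigma M(\alpha)'\Sigma^{-1} = M(\alpha)$, $\Sigma^{-1}M(\alpha)\le\Sigma^{-1}$ and $\sigma_\varepsilon^2\Sigma^{-1}\le I$, and the last equality follows from (6.3).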
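Theorem 2 of Whittle (1960) bounds higher moments of centred quadratic forms. For Gaussian $\varepsilon$ and the second moment it reduces to an exact identity. For a possibly non-symmetric matrix $H$,

$$\mathrm{var}(\varepsilon'H\varepsilon) = \sigma_\varepsilon^4\{\mathrm{tr}(H^2) + \mathrm{tr}(HH')\} \le 2\sigma_\varepsilon^4\mathrm{tr}(HH').$$

The sketch below checks this by simulation for the assumed form of $H(\alpha)$. It illustrates only this second-moment case, not Whittle's general result. The covariance and regressors are made up.

```python
import numpy as np

rng = np.random.default_rng(8)
n, s2e, reps = 30, 0.5, 200_000
s = np.arange(1, n + 1) / n
Sigma_eta = np.exp(-np.abs(s[:, None] - s[None, :]))
Si = np.linalg.inv(Sigma_eta + s2e * np.eye(n))
X = rng.standard_normal((n, 2))
M = X @ np.linalg.solve(X.T @ Si @ X, X.T @ Si)      # assumed GLS projection M(alpha)
H = Sigma_eta @ Si + s2e * Si @ M                    # not symmetric in general

eps = np.sqrt(s2e) * rng.standard_normal((reps, n))
q = np.sum((eps @ H) * eps, axis=1) - s2e * np.trace(H)    # eps'H eps - sigma^2 tr(H), per row

var_exact = s2e ** 2 * (np.trace(H @ H) + np.trace(H @ H.T))
var_bound = 2 * s2e ** 2 * np.trace(H @ H.T)
print(q.mean(), q.var(), var_exact, var_bound)
```

The bound $\mathrm{tr}(H^2)\le\mathrm{tr}(HH')$ is the Cauchy-Schwarz inequality for the Frobenius inner product. This is why $\mathrm{tr}(H(\alpha)H(\alpha)')$ controls the variance in the proofs.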

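For (6.13), we have

$$\begin{aligned}
|\mu'A(\alpha)'\Sigma^{-1}(\sigma_\varepsilon^2\Sigma^{-1}\eta - \Sigma_\eta\Sigma^{-1}\varepsilon)| &= |\sigma_\varepsilon^2\mu'A(\alpha)'\Sigma^{-2}\eta - \mu'A(\alpha)'\Sigma^{-1}\Sigma_\eta\Sigma^{-1}\varepsilon|\\
&\le |\sigma_\varepsilon^2\mu'A(\alpha)'\Sigma^{-2}\eta| + |\mu'A(\alpha)'\Sigma^{-1}\Sigma_\eta\Sigma^{-1}\varepsilon|.
\end{aligned}$$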

Hence, to show (6.13), it suffices to show that plim

First, (6.20) can be shown in the same way as (6.6): for any $\epsilon > 0$,

n→∞lim P where the third inequality follows from

$\mu'A(\alpha)'\Sigma^{-2}\Sigma_\eta\Sigma^{-2}A(\alpha)\mu \le \mu'A(\alpha)'\Sigma^{-3}A(\alpha)\mu$ follows from (6.3). This gives (6.13).

For (6.14), we have for any ε > 0, follows from (6.3). It then gives (6.14).

For (6.15), we have

Then, to show (6.15), it suffices to show that plim

We now show (6.22)-(6.25) in turn. First, (6.22) can be established by Theorem 2 of Whittle (1960): for any $\epsilon > 0$,

for some $c_6 > 0$, where the third inequality follows from

$$\Sigma^{-1}M(\alpha)\Sigma_\eta M(\alpha)'\Sigma^{-1} \le \Sigma^{-1}M(\alpha)\Sigma M(\alpha)'\Sigma^{-1} = \Sigma^{-1}M(\alpha),$$

and the fourth inequality follows from $\sigma_\varepsilon^2\Sigma^{-1}\Sigma_\eta\Sigma^{-1}\le I$, which holds because $\Sigma_\eta\le\Sigma$ and $\sigma_\varepsilon^2\Sigma^{-1}\le I$. The last equality follows from (6.3). Second, (6.23) is similar to (6.18):

n→∞lim P

equality follows from (6.3). Third, (6.24) is similar to (6.23): for any $\epsilon > 0$,

where the third inequality follows from

$$\Sigma^{-1}M(\alpha)\Sigma_\eta M(\alpha)'\Sigma^{-1} \le \Sigma^{-1}M(\alpha)\Sigma M(\alpha)'\Sigma^{-1} = \Sigma^{-1}M(\alpha),$$

the fourth inequality follows from $\Sigma_\eta\Sigma^{-2}\Sigma_\eta\le I$, by $\Sigma_\eta^2\le\Sigma^2$, and the last equality follows from (6.3). Last, (6.25) can be established by Theorem 2 of Whittle (1960): for any $\epsilon > 0$,

for some $c_7 > 0$, where the third equality follows from

$$\sigma_\varepsilon^2\Sigma^{-1}M(\alpha)M(\alpha)'\Sigma^{-1} \le \Sigma^{-1}M(\alpha)\Sigma M(\alpha)'\Sigma^{-1} = \Sigma^{-1}M(\alpha),$$

the fourth inequality follows from $\Sigma_\eta\Sigma^{-2}\Sigma_\eta\le I$, by $\Sigma_\eta^2\le\Sigma^2$, and the last equality follows from (6.3). This completes the proof of (6.9), and hence of the theorem. □

Condition (6.3) holds quite generally. Here, we consider an example in which (6.3) is satisfied.

Corollary 7 Consider a class of models given by (3.1) with $p$ fixed and arbitrary explanatory variables. Suppose that the data are collected at $s_i = in^{-(1-\delta)}\in[0,n^\delta]$, $i = 1,\dots,n$, for some $\delta\in[0,1)$. Consider the exponential covariance model of (5.1) for $\eta(\cdot)$. Let $\hat\alpha_{CAIC}$ be the model selected by CAIC as defined in (6.2). Then

$$\operatorname*{plim}_{n\to\infty}\frac{L(\hat\alpha_{CAIC})}{\inf_{\alpha\in\mathcal{A}}L(\alpha)} = 1.$$

Further, if $\mathcal{A}^c\neq\emptyset$, then for any model selection procedure $\hat\alpha$ such that $\lim_{n\to\infty}P(\hat\alpha\in\mathcal{A}^c) = 1$,

$$\operatorname*{plim}_{n\to\infty}\frac{L(\hat\alpha)}{\inf_{\alpha\in\mathcal{A}}L(\alpha)} = 1.$$

It is shown in (3.17) that, for $\alpha\in\mathcal{A}$, $E(L(\alpha))$ is bounded below by $\sigma_\varepsilon^2\mathrm{tr}(\Sigma_\eta\Sigma^{-1})$, which is often the dominant term of $E(L(\alpha))$. In particular, for $\alpha\in\mathcal{A}^c$, $\sigma_\varepsilon^2\mathrm{tr}(\Sigma_\eta\Sigma^{-1})$ is the dominant term of $E(L(\alpha))$. This suggests that whichever correct model we select, asymptotic loss efficiency is always achieved. Moreover, in the following example, $E(L(\alpha))$ is dominated by $\sigma_\varepsilon^2\mathrm{tr}(\Sigma_\eta\Sigma^{-1})$ for every $\alpha\in\mathcal{A}$. In such a case, every candidate model achieves asymptotic loss efficiency.
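The dominance claim can be seen numerically. For a correct model ($A(\alpha)\mu = 0$), $E(L(\alpha))$ splits into two parts:

- $\sigma_\varepsilon^2\mathrm{tr}(\Sigma_\eta\Sigma^{-1})$, which does not depend on $\alpha$;
- $\sigma_\varepsilon^4\mathrm{tr}(\Sigma^{-1}M(\alpha))$, which is at most $\sigma_\varepsilon^2 p(\alpha)$ because $\sigma_\varepsilon^2\Sigma^{-1}\le I$.

The sketch below evaluates both parts for the polynomial regressors of Corollary 8 under fixed-domain sampling ($\delta = 0$). The covariance parameters and the GLS form of $M(\alpha)$ are assumptions.

```python
import numpy as np


def expected_loss_parts(n, delta, p, sigma2_eta=1.0, kappa=1.0, s2e=0.5):
    """Return (kriging part, regression part) of E L(alpha) when A(alpha) mu = 0.

    Kriging part: sigma_eps^2 tr(Sigma_eta Sigma^{-1}).
    Regression part: sigma_eps^4 tr(Sigma^{-1} M(alpha)) with regressors (s n^{-delta})^j.
    """
    s = np.arange(1, n + 1) * n ** (-(1.0 - delta))
    Sigma_eta = sigma2_eta * np.exp(-kappa * np.abs(s[:, None] - s[None, :]))
    Si = np.linalg.inv(Sigma_eta + s2e * np.eye(n))
    u = s * n ** (-delta)                                 # rescaled sites i / n in (0, 1]
    X = np.column_stack([u ** j for j in range(1, p + 1)])
    M = X @ np.linalg.solve(X.T @ Si @ X, X.T @ Si)       # assumed GLS projection M(alpha)
    kriging = s2e * np.trace(Sigma_eta @ Si)
    regression = s2e ** 2 * np.trace(Si @ M)
    return float(kriging), float(regression)


for n in (100, 400):
    k, r = expected_loss_parts(n, delta=0.0, p=3)
    print(n, round(k, 3), round(r, 3))   # kriging part grows with n; regression part stays <= 0.5 * 3
```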

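Corollary 8 Consider a class of models given by (3.1) with $x_j(s) = (sn^{-\delta})^j$, $j = 1,\dots,p$, and $\mathrm{cov}(\eta(s),\eta(s')) = \sigma_\eta^2\exp(-\kappa_\eta|s-s'|)$, where $p$ is fixed and $\mathcal{A}^c\neq\emptyset$. Suppose that the data are collected at $s_i = in^{-(1-\delta)}\in[0,n^\delta]$, $i = 1,\dots,n$, for some $\delta\in[0,1)$. Let $\hat\alpha_{CAIC}$ be the model selected by CAIC as defined in (6.2). Then

$$\operatorname*{plim}_{n\to\infty}\frac{L(\hat\alpha_{CAIC})}{\inf_{\alpha\in\mathcal{A}}L(\alpha)} = 1.$$

Further, for any model selection procedure $\hat\alpha$,

$$\operatorname*{plim}_{n\to\infty}\frac{L(\hat\alpha)}{\inf_{\alpha\in\mathcal{A}}L(\alpha)} = 1.$$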

From Corollaries 7 and 8, it appears that variable selection is somewhat unnecessary for the asymptotic loss efficiency of $L(\alpha)$ in these cases. We therefore consider the strong asymptotic loss efficiency of $L(\alpha)$ defined in (3.21).

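Theorem 15 Consider a class of models given by (3.1) and the universal kriging predictor $\hat S(\alpha)$ of $S$ defined in (3.15). Suppose that

$$\lim_{n\to\infty}\sum_{\alpha\in\mathcal{A}\setminus\mathcal{A}^c}\frac{1}{E(L(\alpha)) - \sigma_\varepsilon^2\mathrm{tr}(\Sigma_\eta\Sigma^{-1})} = 0, \tag{6.26}$$

where $L(\alpha)$ is defined in (3.14). If $|\mathcal{A}^c|\le1$ and $\alpha^c$ is fixed, then $\hat\alpha_{CAIC}$ of (6.2) is strongly asymptotically loss efficient:

$$\operatorname*{plim}_{n\to\infty}\frac{L(\hat\alpha_{CAIC}) - \|S - E(S\mid Z)\|^2}{\inf_{\alpha\in\mathcal{A}}L(\alpha) - \|S - E(S\mid Z)\|^2} = 1.$$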

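Proof. We first suppose that $\mathcal{A}^c = \emptyset$. Expanding the CAIC defined in (6.1) as in (6.4), we have

$$\begin{aligned}
\Gamma_{CAIC}(\alpha) &= L(\alpha) + 2\sigma_\varepsilon^2\varepsilon'(\Sigma^{-1}A(\alpha))\mu + 2\sigma_\varepsilon^2\varepsilon'(\Sigma^{-1}A(\alpha))\eta + \varepsilon'\varepsilon - 2\big(\varepsilon'H(\alpha)\varepsilon - \sigma_\varepsilon^2\mathrm{tr}(H(\alpha))\big)\\
&= L(\alpha) + 2\sigma_\varepsilon^2\varepsilon'(\Sigma^{-1}A(\alpha))\mu + 2\sigma_\varepsilon^2\varepsilon'\Sigma^{-1}\eta - 2\sigma_\varepsilon^2\varepsilon'\Sigma^{-1}M(\alpha)\eta + \varepsilon'\varepsilon\\
&\quad - 2\big(\varepsilon'\Sigma_\eta\Sigma^{-1}\varepsilon - \sigma_\varepsilon^2\mathrm{tr}(\Sigma_\eta\Sigma^{-1})\big) - 2\sigma_\varepsilon^2\big(\varepsilon'\Sigma^{-1}M(\alpha)\varepsilon - \sigma_\varepsilon^2\mathrm{tr}(\Sigma^{-1}M(\alpha))\big),
\end{aligned}\tag{6.27}$$

where the last equality follows from $H(\alpha) = \Sigma_\eta\Sigma^{-1} + \sigma_\varepsilon^2\Sigma^{-1}M(\alpha)$ by (3.16). Note that $2\sigma_\varepsilon^2\varepsilon'\Sigma^{-1}\eta + \varepsilon'\varepsilon - 2\big(\varepsilon'\Sigma_\eta\Sigma^{-1}\varepsilon - \sigma_\varepsilon^2\mathrm{tr}(\Sigma_\eta\Sigma^{-1})\big)$ does not depend on $\alpha$. It then suffices to show that for $\alpha\in\mathcal{A}\setminus\mathcal{A}^c$,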

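$$\Gamma_{CAIC}(\alpha) = \text{constant} + L^*(\alpha) + o_p(L^*(\alpha)), \tag{6.28}$$

where $L^*(\alpha) = L(\alpha) - \|S - E(S\mid Z)\|^2$, for which it suffices to show that

$$\operatorname*{plim}_{n\to\infty}\sup_{\alpha\in\mathcal{A}\setminus\mathcal{A}^c}\frac{|\varepsilon'(\Sigma^{-1}A(\alpha))\mu|}{E(L^*(\alpha))} = 0, \tag{6.29}$$

$$\operatorname*{plim}_{n\to\infty}\sup_{\alpha\in\mathcal{A}\setminus\mathcal{A}^c}\frac{|\varepsilon'(\Sigma^{-1}M(\alpha))\eta|}{E(L^*(\alpha))} = 0, \tag{6.30}$$

$$\operatorname*{plim}_{n\to\infty}\sup_{\alpha\in\mathcal{A}\setminus\mathcal{A}^c}\frac{|\varepsilon'\Sigma^{-1}M(\alpha)\varepsilon - \sigma_\varepsilon^2\mathrm{tr}(\Sigma^{-1}M(\alpha))|}{E(L^*(\alpha))} = 0, \tag{6.31}$$

$$\operatorname*{plim}_{n\to\infty}\sup_{\alpha\in\mathcal{A}\setminus\mathcal{A}^c}\bigg|\frac{L^*(\alpha)}{E(L^*(\alpha))} - 1\bigg| = 0. \tag{6.32}$$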

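Hence, by (6.28), for $\hat\alpha_{CAIC}$ defined in (6.2) and $\alpha^L = \arg\min_{\alpha\in\mathcal{A}}L^*(\alpha)$, we conclude that

$$\Gamma_{CAIC}(\hat\alpha_{CAIC}) = \text{constant} + L^*(\hat\alpha_{CAIC}) + o_p(L^*(\hat\alpha_{CAIC})),\qquad \Gamma_{CAIC}(\alpha^L) = \text{constant} + L^*(\alpha^L) + o_p(L^*(\alpha^L)).$$

It follows that

$$0 \le \frac{\Gamma_{CAIC}(\alpha^L) - \Gamma_{CAIC}(\hat\alpha_{CAIC})}{L^*(\hat\alpha_{CAIC})} = \frac{L^*(\alpha^L) - L^*(\hat\alpha_{CAIC})}{L^*(\hat\alpha_{CAIC})} + o_p(1),$$

and then

$$\operatorname*{plim}_{n\to\infty}\frac{L^*(\alpha^L) - L^*(\hat\alpha_{CAIC})}{L^*(\hat\alpha_{CAIC})} = 0,$$

which gives

$$\operatorname*{plim}_{n\to\infty}\frac{L^*(\hat\alpha_{CAIC})}{\inf_{\alpha\in\mathcal{A}}L^*(\alpha)} = 1\quad\text{when } \mathcal{A}^c = \emptyset.$$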

We first calculate $E(L^*(\alpha))$. By (9.1),

$$\begin{aligned}
E(L^*(\alpha)) &= E(L(\alpha)) - E\|S - E(S\mid Z)\|^2\\
&= E(L(\alpha)) - \sigma_\varepsilon^2\mathrm{tr}(\Sigma_\eta\Sigma^{-1})\\
&= \sigma_\varepsilon^4\mu'A(\alpha)'\Sigma^{-2}A(\alpha)\mu + \sigma_\varepsilon^4\mathrm{tr}(\Sigma^{-1}M(\alpha)),
\end{aligned}\tag{6.33}$$

by (3.17). We now prove (6.29)-(6.32) in turn. For (6.29), the proof follows that of (6.6) with $E(L(\alpha))$ replaced by (6.33).

For (6.30), we have for any $\epsilon > 0$,

where the third inequality follows from $\Sigma^{-1/2}\Sigma_\eta\Sigma^{-1/2}\le I$, the second last inequality follows from (6.33), and the last equality follows from (6.26).

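For (6.31), we have for any $\epsilon > 0$,

where the second inequality is an application of Theorem 2 of Whittle (1960) for some $c_1 > 0$. The third and fourth inequalities follow from

$$\sigma_\varepsilon^4\mathrm{tr}(M(\alpha)'\Sigma^{-2}M(\alpha)) \le \sigma_\varepsilon^2\mathrm{tr}(M(\alpha)'\Sigma^{-1}M(\alpha)) = \sigma_\varepsilon^2\mathrm{tr}(\Sigma^{-1}M(\alpha)) \le \sigma_\varepsilon^{-2}E(L^*(\alpha)),$$

and the last equality follows from (6.26).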

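Now it remains to show (6.32). We first expand $L^*(\alpha)$ using (6.10):

$$\begin{aligned}
L^*(\alpha) &= L(\alpha) - \|S - E(S\mid Z)\|^2\\
&= L(\alpha) - \|\sigma_\varepsilon^2\Sigma^{-1}\eta - \Sigma_\eta\Sigma^{-1}\varepsilon\|^2\\
&= \sigma_\varepsilon^4\mu'A(\alpha)'\Sigma^{-2}A(\alpha)\mu + \sigma_\varepsilon^4(\eta+\varepsilon)'M(\alpha)'\Sigma^{-2}M(\alpha)(\eta+\varepsilon)\\
&\quad + 2\sigma_\varepsilon^2\mu'A(\alpha)'\Sigma^{-1}(\sigma_\varepsilon^2\Sigma^{-1}\eta - \Sigma_\eta\Sigma^{-1}\varepsilon) - 2\sigma_\varepsilon^4\mu'A(\alpha)'\Sigma^{-2}M(\alpha)(\eta+\varepsilon)\\
&\quad - 2\sigma_\varepsilon^2(\sigma_\varepsilon^2\Sigma^{-1}\eta - \Sigma_\eta\Sigma^{-1}\varepsilon)'\Sigma^{-1}M(\alpha)(\eta+\varepsilon),
\end{aligned}\tag{6.34}$$

where the second equality follows from (9.1). It then follows, together with (3.17), that

$$\begin{aligned}
L^*(\alpha) - E(L^*(\alpha)) &= \sigma_\varepsilon^4(\eta+\varepsilon)'M(\alpha)'\Sigma^{-2}M(\alpha)(\eta+\varepsilon) - \sigma_\varepsilon^4\mathrm{tr}(\Sigma^{-1}M(\alpha))\\
&\quad + 2\sigma_\varepsilon^2\mu'A(\alpha)'\Sigma^{-1}(\sigma_\varepsilon^2\Sigma^{-1}\eta - \Sigma_\eta\Sigma^{-1}\varepsilon) - 2\sigma_\varepsilon^4\mu'A(\alpha)'\Sigma^{-2}M(\alpha)(\eta+\varepsilon)\\
&\quad - 2\sigma_\varepsilon^2(\sigma_\varepsilon^2\Sigma^{-1}\eta - \Sigma_\eta\Sigma^{-1}\varepsilon)'\Sigma^{-1}M(\alpha)(\eta+\varepsilon).
\end{aligned}$$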

Equation (6.32) then follows from plim

Note that the proofs of (6.35)-(6.38) follow those of (6.12)-(6.15) with $E(L(\alpha))$ replaced by $E(L^*(\alpha))$. Hence, (6.32) follows, which completes the proof when $\mathcal{A}^c = \emptyset$.
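The step from (3.17) to (6.33) subtracts $E\|S - E(S\mid Z)\|^2 = \sigma_\varepsilon^2\mathrm{tr}(\Sigma_\eta\Sigma^{-1})$. This holds because, with $\mu$ known, $S - E(S\mid Z) = \sigma_\varepsilon^2\Sigma^{-1}\eta - \Sigma_\eta\Sigma^{-1}\varepsilon$ for every realisation.

A short Monte Carlo sketch checks this numerically. It uses an assumed exponential $\Sigma_\eta$ and an arbitrary known mean; both are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(9)
n, s2e, reps = 40, 0.5, 50_000
s = np.arange(1, n + 1) / n
Sigma_eta = np.exp(-np.abs(s[:, None] - s[None, :]))   # assumed exponential covariance
Sigma = Sigma_eta + s2e * np.eye(n)
Si = np.linalg.inv(Sigma)
mu = np.sin(2 * np.pi * s)                              # arbitrary known mean

eta = rng.standard_normal((reps, n)) @ np.linalg.cholesky(Sigma_eta).T
eps = np.sqrt(s2e) * rng.standard_normal((reps, n))
S = mu + eta
Z = S + eps
# E(S | Z) = mu + Sigma_eta Sigma^{-1} (Z - mu), applied row by row
ES_given_Z = mu + (Z - mu) @ (Sigma_eta @ Si).T
oracle_loss = np.sum((S - ES_given_Z) ** 2, axis=1)
target = s2e * np.trace(Sigma_eta @ Si)                 # sigma_eps^2 tr(Sigma_eta Sigma^{-1})
print(oracle_loss.mean(), target)
```

Because this oracle loss does not depend on $\alpha$, subtracting it leaves the ranking of candidate models unchanged. It also removes the term that dominated $E(L(\alpha))$ in Corollaries 7 and 8.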

Now suppose that $\mathcal{A}^c = \{\alpha^c\}$. To show that CAIC is still asymptotically loss efficient, it remains to show that for fixed $\alpha^c$,

$$L^*(\alpha^c) = o_p(L^*(\alpha)),\quad \text{if } \alpha\in\mathcal{A}\setminus\mathcal{A}^c, \tag{6.39}$$

$$\Gamma_{CAIC}(\alpha^c) = \text{constant} + L^*(\alpha^c) + o_p(L^*(\alpha)). \tag{6.40}$$

Hence, by (6.39) and (6.40), we can conclude that

n→∞lim P¡

We now prove (6.39). Equation (6.39) follows from (6.32) and plim

Equation (6.41) then follows from plim

Note that (6.42) can be shown in the same way as (6.35), and (6.43) is trivial since $\sigma_\varepsilon^2\mathrm{tr}(\Sigma^{-1}M(\alpha^c)) \le p(\alpha^c) < \infty$. Similarly, (6.44) can be shown in the same way as (6.38). This gives (6.39).

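We now prove (6.40). By (6.27),

$$\Gamma_{CAIC}(\alpha^c) = \text{constant} + L^*(\alpha^c) - 2\sigma_\varepsilon^2\varepsilon'\Sigma^{-1}M(\alpha^c)\eta - 2\sigma_\varepsilon^2\big(\varepsilon'\Sigma^{-1}M(\alpha^c)\varepsilon - \sigma_\varepsilon^2\mathrm{tr}(\Sigma^{-1}M(\alpha^c))\big).$$

Equation (6.40) then follows from

$$\operatorname*{plim}_{n\to\infty}\sup_{\alpha\in\mathcal{A}\setminus\mathcal{A}^c}\frac{|\varepsilon'\Sigma^{-1}M(\alpha^c)\eta|}{E(L^*(\alpha))} = 0,\qquad \operatorname*{plim}_{n\to\infty}\sup_{\alpha\in\mathcal{A}\setminus\mathcal{A}^c}\frac{|\varepsilon'\Sigma^{-1}M(\alpha^c)\varepsilon - \sigma_\varepsilon^2\mathrm{tr}(\Sigma^{-1}M(\alpha^c))|}{E(L^*(\alpha))} = 0,$$

which follow easily from (6.30) and (6.31). This gives (6.40) and completes the proof. □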

The following is an example for Theorem 15.

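Corollary 9 Consider a class of models given by (3.1) with the $x_j(s)$'s generated independently from the white-noise processes of (5.7), where $p$ is fixed and $\mathcal{A}^c = \{\alpha^c\}$. If $\lim_{n\to\infty}\mathrm{tr}(\Sigma^{-2}) = \infty$, then

$$\operatorname*{plim}_{n\to\infty}\frac{L(\hat\alpha_{CAIC}) - \|S - E(S\mid Z)\|^2}{\inf_{\alpha\in\mathcal{A}}L(\alpha) - \|S - E(S\mid Z)\|^2} = 1.$$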

A model with the smallest value of $L(\alpha)$ might not exist when $|\mathcal{A}^c|\ge2$. If there are at least two correct models of fixed dimension in $\mathcal{A}^c$, no asymptotic optimality can hold at the level of loss comparison. We are then interested in whether a model selection procedure still has some optimality properties in terms of $E(L(\alpha))$ when $|\mathcal{A}^c|\ge2$.

Hence, we need a much heavier penalty on model dimension to select $\alpha^c$ among $\mathcal{A}^c$.

From the document Large-Sample Theory of Model Selection in Spatial Statistics (pp. 58-80)