In this section, we consider explanatory variables generated independently from Gaussian white noise processes of (5.7).
Proposition 6 Consider a class of models given by (3.1) with $x_j(s)$'s independently generated from white-noise processes of (5.7) and $\mathrm{cov}(\eta(s), \eta(s')) = \sigma_\eta^2 \exp(-\kappa_\eta |s - s'|)$, where
and A(α; θ) is defined in (3.6).
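(ii) For $\delta = 0$,
\[
\begin{aligned}
-2\ell(\theta;\alpha) ={}& n\log(2\pi) - \frac{1-\delta}{2}\log n
+ \Big(\log\sigma_\epsilon^2 + \frac{\sum_{j\in\alpha^c\setminus\alpha}\beta_j^2\sigma_j^2 + \sigma_{\epsilon,0}^2}{\sigma_\epsilon^2}\Big) n \\
&+ \Big(\frac{2\kappa_\eta\sigma_\eta^2}{\sigma_\epsilon^2}\Big)^{1/2}
\Big(1 - \frac{\sum_{j\in\alpha^c\setminus\alpha}\beta_j^2\sigma_j^2 + \sigma_{\epsilon,0}^2}{2\sigma_\epsilon^2} + \frac{\kappa_{\eta,0}\sigma_{\eta,0}^2}{2\kappa_\eta\sigma_\eta^2}\Big) n^{1/2} \\
&+ \xi^{(3)}(\alpha;\theta) + O_p(1).
\end{aligned}
\tag{5.126}
\]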
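By (5.124), for $\delta \in (0,1)$ and any $\alpha \in \mathcal{A}\setminus\mathcal{A}^c$,
\[
\begin{aligned}
-2\ell(\theta;\alpha) + 2\ell(\theta;\alpha^c)
&= \sigma_\epsilon^{-2}\sum_{j\in\alpha^c\setminus\alpha}\beta_j^2\sigma_j^2\, n + \xi^{(3)}(\alpha;\theta) - \xi^{(3)}(\alpha^c;\theta) + o_p(n) \\
&= \sigma_\epsilon^{-2}\sum_{j\in\alpha^c\setminus\alpha}\beta_j^2\sigma_j^2\, n + o_p(n),
\end{aligned}
\tag{5.127}
\]
where the last equality holds because, by (5.125), $\xi^{(3)}(\alpha;\theta) - \xi^{(3)}(\alpha^c;\theta) = o_p(n)$. Similarly, by (5.126), for $\delta = 0$,
\[
-2\ell(\theta;\alpha) + 2\ell(\theta;\alpha^c) = \sigma_\epsilon^{-2}\sum_{j\in\alpha^c\setminus\alpha}\beta_j^2\sigma_j^2\, n + o_p(n).
\]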
As will be demonstrated in Theorem 12, we can use (5.127) to find an appropriate penalty $\lambda$ that leads to selection consistency.
The following lemma shows that $\sigma_\epsilon^2$ is asymptotically over-estimated by ML when $\alpha \in \mathcal{A}\setminus\mathcal{A}^c$, under both the fixed domain and the increasing domain asymptotic frameworks.
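Lemma 13 Under the setup of Proposition 6, let $\Theta \subset (0,\infty)^3$ be a compact set and let $\hat\theta(\alpha) = (\hat\sigma_\eta^2(\alpha), \hat\kappa_\eta(\alpha), \hat\sigma_\epsilon^2(\alpha))'$ be the ML estimate of $\theta$ based on model $\alpha$. Then for $\delta \in [0,1)$ and $\alpha \in \mathcal{A}$,
\[
\hat\sigma_\epsilon^2(\alpha) = \sum_{j\in\alpha^c\setminus\alpha}\beta_j^2\sigma_j^2 + \sigma_{\epsilon,0}^2 + o_p(1), \tag{5.128}
\]
\[
\hat\kappa_\eta(\alpha)\hat\sigma_\eta^2(\alpha) = \kappa_{\eta,0}\sigma_{\eta,0}^2 + o_p(1). \tag{5.129}
\]
The following theorem further provides the convergence rates for the ML estimates of $\kappa_\eta$, $\sigma_\eta^2$ and $\sigma_\epsilon^2$. These results are key to establishing some asymptotic properties of GIC in Theorem 13.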
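Theorem 11 Under the setup of Proposition 6, let $\Theta \subset (0,\infty)^3$ be a compact set and let $\hat\theta(\alpha) = (\hat\sigma_\eta^2(\alpha), \hat\kappa_\eta(\alpha), \hat\sigma_\epsilon^2(\alpha))'$ be the ML estimate of $\theta$ based on model $\alpha$. Then
(i) For $\delta \in (0,1)$,
\[
\hat\sigma_\epsilon^2(\alpha) =
\begin{cases}
\sigma_{\epsilon,0}^2 + o_p(n^{-(1-\delta)/2}); & \text{if } \alpha \in \mathcal{A}^c, \\
\sum_{j\in\alpha^c\setminus\alpha}\beta_j^2\sigma_j^2 + \sigma_{\epsilon,0}^2 + o_p(n^{-(1-\delta)/2}); & \text{if } \alpha \in \mathcal{A}\setminus\mathcal{A}^c,
\end{cases}
\tag{5.130}
\]
\[
\hat\kappa_\eta(\alpha)\hat\sigma_\eta^2(\alpha) = \kappa_{\eta,0}\sigma_{\eta,0}^2 + o_p(n^{-(1-\delta)/4}), \tag{5.131}
\]
\[
\hat\sigma_\eta^2(\alpha) = \sigma_{\eta,0}^2 + o_p(1); \quad \alpha \in \mathcal{A}, \tag{5.132}
\]
\[
\hat\kappa_\eta(\alpha) = \kappa_{\eta,0} + o_p(1); \quad \alpha \in \mathcal{A}. \tag{5.133}
\]
(ii) For $\delta = 0$,
Then, it can be obtained in a way similar to (5.99) that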
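\[
\xi^{(3)}(\theta;\alpha) = \xi_1(\theta;\alpha) + \xi_2(\theta;\alpha) + \xi(\theta) + O_p(1). \tag{5.138}
\]
First, we prove (5.130). By (5.128) and (5.129), it suffices to show that for $|\sigma_\epsilon^2 - \sigma_{\epsilon,\alpha}^2| = o(1)$, $|\kappa_\eta\sigma_\eta^2 - \kappa_{\eta,0}\sigma_{\eta,0}^2| = o(1)$, and any $\varepsilon > 0$,
\[
\inf_{|\sigma_\epsilon^2-\sigma_{\epsilon,\alpha}^2| \ge \varepsilon n^{-(1-\delta)/2}} \big(-2\ell(\theta;\alpha) + 2\ell((\sigma_\eta^2,\kappa_\eta,\sigma_{\epsilon,\alpha}^2)';\alpha)\big) > 0, \tag{5.139}
\]
as $n \to \infty$ with probability tending to 1. By (5.124),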
−2`(θ; α) = n log(2π) − 1 − δ
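where the second equality follows from $|\sigma_\epsilon^2 - \sigma_{\epsilon,\alpha}^2| = o(1)$, (5.39) and (5.40), and the last equality follows from (5.67) and
\[
\xi_1(\theta;\alpha) - \xi_1((\sigma_\eta^2,\kappa_\eta,\sigma_{\epsilon,\alpha}^2)';\alpha) = o_p\big(\max((\sigma_\epsilon^2-\sigma_{\epsilon,\alpha}^2)^2 n,\; n^\delta)\big), \tag{5.141}
\]
\[
\xi_2(\theta;\alpha) - \xi_2((\sigma_\eta^2,\kappa_\eta,\sigma_{\epsilon,\alpha}^2)';\alpha) = o_p\big(\max((\sigma_\epsilon^2-\sigma_{\epsilon,\alpha}^2)^2 n,\; n^\delta)\big), \tag{5.142}
\]
which can be obtained in a way similar to (5.105)-(5.106), where the moment conditions are given by (5.43)-(5.44) in this case. Thus, (5.139) is obtained. This completes the proof of (5.130).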
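Second, we prove (5.131). By (5.129) and (5.130), it suffices to show that for $|\sigma_\epsilon^2 - \sigma_{\epsilon,\alpha}^2| = o(n^{-(1-\delta)/2})$, $|\kappa_\eta\sigma_\eta^2 - \kappa_{\eta,0}\sigma_{\eta,0}^2| = o(1)$ and any $\varepsilon > 0$,
\[
\inf_{|\sigma_\eta^2\kappa_\eta-\kappa_{\eta,0}\sigma_{\eta,0}^2| \ge \varepsilon n^{-(1-\delta)/4}} \big(-2\ell(\theta;\alpha) + 2\ell((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\epsilon,\alpha}^2)';\alpha)\big) > 0, \tag{5.143}
\]
as $n \to \infty$ with probability tending to 1. By (5.140), we have for $|\sigma_\epsilon^2 - \sigma_{\epsilon,\alpha}^2| = o(n^{-(1-\delta)/2})$ and $|\kappa_\eta\sigma_\eta^2 - \kappa_{\eta,0}\sigma_{\eta,0}^2| = o(1)$,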
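\[
\begin{aligned}
&-2\ell(\theta;\alpha) + 2\ell((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\epsilon,\alpha}^2)';\alpha) \\
&= \Big\{\Big(\frac{2\kappa_\eta\sigma_\eta^2}{\sigma_{\epsilon,\alpha}^2}\Big)^{1/2}\Big(\frac{1}{2} + \frac{\kappa_{\eta,0}\sigma_{\eta,0}^2}{2\kappa_\eta\sigma_\eta^2}\Big) - \frac{(2\kappa_{\eta,0}\sigma_{\eta,0}^2)^{1/2}}{\sigma_{\epsilon,\alpha}}\Big\} n^{(1+\delta)/2} + \frac{1}{2\kappa_{\eta,0}}(\kappa_\eta-\kappa_{\eta,0})^2 n^\delta \\
&\quad + \xi_1(\theta;\alpha) - \xi_1((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\epsilon,\alpha}^2)';\alpha) + \xi_2(\theta;\alpha) - \xi_2((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\epsilon,\alpha}^2)';\alpha) \\
&\quad + \xi(\theta) - \xi((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\epsilon,\alpha}^2)') + o_p(n^\delta) \\
&= \frac{(\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2)^2 n^{(1+\delta)/2}}{2^{5/2}(\kappa_{\eta,0}\sigma_{\eta,0}^2)^{3/2}} + \frac{1}{2\kappa_{\eta,0}}(\kappa_\eta-\kappa_{\eta,0})^2 n^\delta \\
&\quad + \xi_1(\theta;\alpha) - \xi_1((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\epsilon,\alpha}^2)';\alpha) + \xi_2(\theta;\alpha) - \xi_2((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\epsilon,\alpha}^2)';\alpha) \\
&\quad + \xi(\theta) - \xi((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\epsilon,\alpha}^2)') + o\big((\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2)^2 n^{(1+\delta)/2}\big) + o_p(n^\delta) \\
&= \frac{(\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2)^2 n^{(1+\delta)/2}}{2^{5/2}(\kappa_{\eta,0}\sigma_{\eta,0}^2)^{3/2}} + \frac{(\kappa_\eta-\kappa_{\eta,0})^2 n^\delta}{2\kappa_{\eta,0}} + o\big((\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2)^2 n^{(1+\delta)/2}\big) + o_p(n^\delta),
\end{aligned}
\tag{5.144}
\]
where the first equality follows from $|\sigma_\epsilon^2 - \sigma_{\epsilon,\alpha}^2| = o(n^{-(1-\delta)/2})$, the second equality follows from (5.41), and the last equality follows from
\[
\xi_1(\theta;\alpha) - \xi_1((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\epsilon,\alpha}^2)';\alpha) = o_p\big(\max((\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2)^2 n^{(1+\delta)/2},\; n^\delta)\big), \tag{5.145}
\]
\[
\xi_2(\theta;\alpha) - \xi_2((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\epsilon,\alpha}^2)';\alpha) = o_p\big(\max((\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2)^2 n^{(1+\delta)/2},\; n^\delta)\big), \tag{5.146}
\]
\[
\xi(\theta) - \xi((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\epsilon,\alpha}^2)') = o_p\big(\max((\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2)^2 n^{(1+\delta)/2},\; n^\delta)\big), \tag{5.147}
\]
which can be obtained in a way similar to (5.109)-(5.111). Thus, (5.143) is obtained. This completes the proof of (5.131).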
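The quadratic leading term in (5.144) can be checked numerically. The sketch below is only an illustration, not part of the proof: it writes $\tau = \kappa_\eta\sigma_\eta^2$ and $\tau_0 = \kappa_{\eta,0}\sigma_{\eta,0}^2$, normalises $\sigma_{\epsilon,\alpha}$ to 1 (an assumption made here for simplicity), and compares the exact coefficient of $n^{(1+\delta)/2}$ with $(\tau-\tau_0)^2 / (2^{5/2}\tau_0^{3/2})$. The function names and the value of `tau0` are hypothetical.

```python
import math

def root_term(tau, tau0):
    """Exact coefficient: (2 tau)^(1/2) * (1/2 + tau0 / (2 tau)) - (2 tau0)^(1/2).

    sigma_{eps,alpha} is normalised to 1 in this sketch (assumption).
    """
    return math.sqrt(2.0 * tau) * (0.5 + tau0 / (2.0 * tau)) - math.sqrt(2.0 * tau0)

def quadratic_approx(tau, tau0):
    """Second-order Taylor term (tau - tau0)^2 / (2^(5/2) * tau0^(3/2))."""
    return (tau - tau0) ** 2 / (2.0 ** 2.5 * tau0 ** 1.5)

tau0 = 2.0  # hypothetical true value of kappa_{eta,0} * sigma_{eta,0}^2
for h in (0.5, 0.1, 0.01):
    tau = tau0 + h
    ratio = root_term(tau, tau0) / quadratic_approx(tau, tau0)
    print(f"h={h:g}  exact={root_term(tau, tau0):.3e}  ratio={ratio:.4f}")
```

The ratio should approach 1 as `h` shrinks, which is the content of the second equality in (5.144) for the $n^{(1+\delta)/2}$ term.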
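Third, we prove (5.132) and (5.133). By (5.144), we have for $|\sigma_\epsilon^2 - \sigma_{\epsilon,\alpha}^2| = o(n^{-(1-\delta)/2})$, $|\sigma_\eta^2\kappa_\eta - \kappa_{\eta,0}\sigma_{\eta,0}^2| = o(n^{-(1-\delta)/4})$ and any $\varepsilon > 0$,
\[
\inf_{|\kappa_\eta-\kappa_{\eta,0}| \ge \varepsilon} \big(-2\ell(\theta;\alpha) + 2\ell((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\epsilon,\alpha}^2)';\alpha)\big) = \frac{1}{2\kappa_{\eta,0}}\varepsilon^2 n^\delta + o_p(n^\delta) > 0,
\]
as $n \to \infty$ with probability tending to 1, which gives (5.133). This together with (5.131) gives (5.132).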
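Fourth, we prove (5.134). By (5.128) and (5.129), it suffices to show that for $|\sigma_\epsilon^2 - \sigma_{\epsilon,\alpha}^2| = o(1)$ and $|\kappa_\eta\sigma_\eta^2 - \kappa_{\eta,0}\sigma_{\eta,0}^2| = o(1)$, there exists $M > 0$ such that
\[
\inf_{|\sigma_\epsilon^2-\sigma_{\epsilon,\alpha}^2| \ge M n^{-1/2}} \big(-2\ell(\theta;\alpha) + 2\ell((\sigma_\eta^2,\kappa_\eta,\sigma_{\epsilon,\alpha}^2)';\alpha)\big) > 0, \tag{5.148}
\]
as $n \to \infty$ with probability tending to 1. By (5.126) and (5.138),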
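\[
\begin{aligned}
-2\ell(\theta;\alpha) ={}& n\log(2\pi) - \frac{1-\delta}{2}\log n + \Big(\log\sigma_\epsilon^2 + \frac{\sigma_{\epsilon,\alpha}^2}{\sigma_\epsilon^2}\Big) n \\
&+ \Big(\frac{2\kappa_\eta\sigma_\eta^2}{\sigma_\epsilon^2}\Big)^{1/2}\Big(1 - \frac{\sigma_{\epsilon,\alpha}^2}{2\sigma_\epsilon^2} + \frac{\kappa_{\eta,0}\sigma_{\eta,0}^2}{2\kappa_\eta\sigma_\eta^2}\Big) n^{1/2} \\
&+ \xi_1(\theta;\alpha) + \xi_2(\theta;\alpha) + \xi(\theta) + O_p(1).
\end{aligned}
\tag{5.149}
\]
Then, for $|\sigma_\epsilon^2 - \sigma_{\epsilon,\alpha}^2| = o(1)$ and $|\kappa_\eta\sigma_\eta^2 - \kappa_{\eta,0}\sigma_{\eta,0}^2| = o(1)$, we have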
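\[
\begin{aligned}
&-2\ell(\theta;\alpha) + 2\ell((\sigma_\eta^2,\kappa_\eta,\sigma_{\epsilon,\alpha}^2)';\alpha) \\
&= \Big(\log\sigma_\epsilon^2 + \frac{\sigma_{\epsilon,\alpha}^2}{\sigma_\epsilon^2} - \log\sigma_{\epsilon,\alpha}^2 - 1\Big) n \\
&\quad + \Big\{\Big(\frac{2\kappa_\eta\sigma_\eta^2}{\sigma_\epsilon^2}\Big)^{1/2}\Big(1 - \frac{\sigma_{\epsilon,\alpha}^2}{2\sigma_\epsilon^2} + \frac{\kappa_{\eta,0}\sigma_{\eta,0}^2}{2\kappa_\eta\sigma_\eta^2}\Big) - \Big(\frac{2\kappa_\eta\sigma_\eta^2}{\sigma_{\epsilon,\alpha}^2}\Big)^{1/2}\Big(\frac{1}{2} + \frac{\kappa_{\eta,0}\sigma_{\eta,0}^2}{2\kappa_\eta\sigma_\eta^2}\Big)\Big\} n^{1/2} \\
&\quad + \xi_1(\theta;\alpha) - \xi_1((\sigma_\eta^2,\kappa_\eta,\sigma_{\epsilon,\alpha}^2)';\alpha) + \xi_2(\theta;\alpha) - \xi_2((\sigma_\eta^2,\kappa_\eta,\sigma_{\epsilon,\alpha}^2)';\alpha) \\
&\quad + \xi(\theta) - \xi((\sigma_\eta^2,\kappa_\eta,\sigma_{\epsilon,\alpha}^2)') + O_p(1) \\
&= \frac{1}{2\sigma_{\epsilon,\alpha}^4}(\sigma_\epsilon^2-\sigma_{\epsilon,\alpha}^2)^2 n + \xi_1(\theta;\alpha) - \xi_1((\sigma_\eta^2,\kappa_\eta,\sigma_{\epsilon,\alpha}^2)';\alpha) + \xi_2(\theta;\alpha) - \xi_2((\sigma_\eta^2,\kappa_\eta,\sigma_{\epsilon,\alpha}^2)';\alpha) \\
&\quad + \xi(\theta) - \xi((\sigma_\eta^2,\kappa_\eta,\sigma_{\epsilon,\alpha}^2)') + o_p\big((\sigma_\epsilon^2-\sigma_{\epsilon,\alpha}^2)^2 n\big) + O_p(1) \\
&= \frac{1}{2\sigma_{\epsilon,\alpha}^4}(\sigma_\epsilon^2-\sigma_{\epsilon,\alpha}^2)^2 n + o_p\big((\sigma_\epsilon^2-\sigma_{\epsilon,\alpha}^2)^2 n\big) + O_p(1),
\end{aligned}
\]
where the second equality follows from (5.39) and (5.40) with $\sigma = \sigma_{\epsilon,\alpha}$ and $\tau = \kappa_{\eta,0}\sigma_{\eta,0}^2$, and the last equality follows from (5.73) and
\[
\begin{aligned}
\xi_1(\theta;\alpha) - \xi_1((\sigma_\eta^2,\kappa_\eta,\sigma_{\epsilon,\alpha}^2)';\alpha) &= o_p\big((\sigma_\epsilon^2-\sigma_{\epsilon,\alpha}^2)^2 n\big) + O_p(1), \\
\xi_2(\theta;\alpha) - \xi_2((\sigma_\eta^2,\kappa_\eta,\sigma_{\epsilon,\alpha}^2)';\alpha) &= o_p\big((\sigma_\epsilon^2-\sigma_{\epsilon,\alpha}^2)^2 n\big) + O_p(1),
\end{aligned}
\]
which can be obtained in a way similar to (5.105)-(5.106). Consequently, there exists $M > 0$ such that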
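\[
\inf_{|\sigma_\epsilon^2-\sigma_{\epsilon,\alpha}^2| \ge M n^{-1/2}} \big(-2\ell(\theta;\alpha) + 2\ell((\sigma_\eta^2,\kappa_\eta,\sigma_{\epsilon,\alpha}^2)';\alpha)\big) = \frac{M^2}{2\sigma_{\epsilon,\alpha}^4}\varepsilon^2 + O_p(1) > 0,
\]
as $n \to \infty$ with probability tending to 1. Thus, (5.148) is obtained. This completes the proof of (5.134).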
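The term of order $n$ in the proof of (5.134) rests on the second-order expansion $\log x + a/x - \log a - 1 \approx (x-a)^2/(2a^2)$ around $x = a$, with $x = \sigma_\epsilon^2$ and $a = \sigma_{\epsilon,\alpha}^2$. A minimal numerical sketch of this step follows; the function names and the value of `a` are hypothetical and chosen only for illustration.

```python
import math

def exact_term(x, a):
    """log x + a/x - log a - 1: the coefficient of n in the likelihood difference."""
    return math.log(x) + a / x - math.log(a) - 1.0

def quadratic_term(x, a):
    """Leading Taylor term (x - a)^2 / (2 a^2) around x = a."""
    return (x - a) ** 2 / (2.0 * a ** 2)

a = 1.5  # hypothetical value of sigma_{eps,alpha}^2
for h in (1e-1, 1e-2, 1e-3):
    x = a + h
    ratio = exact_term(x, a) / quadratic_term(x, a)
    print(f"h={h:g}  exact={exact_term(x, a):.3e}  ratio={ratio:.4f}")
```

The exact term is nonnegative and vanishes only at `x == a`, so a deviation of order $n^{-1/2}$ in $\sigma_\epsilon^2$ already produces a contribution that does not vanish after multiplication by $n$.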
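Finally, we prove (5.135). By (5.129) and (5.134), it suffices to show that for $|\sigma_\epsilon^2 - \sigma_{\epsilon,\alpha}^2| = O(n^{-1/2})$ and $|\kappa_\eta\sigma_\eta^2 - \kappa_{\eta,0}\sigma_{\eta,0}^2| = o(1)$, there exists $M > 0$ such that
\[
\inf_{|\sigma_\eta^2\kappa_\eta-\kappa_{\eta,0}\sigma_{\eta,0}^2| \ge M n^{-1/4}} \big(-2\ell(\theta;\alpha) + 2\ell((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\epsilon,\alpha}^2)';\alpha)\big) > 0, \tag{5.150}
\]
as $n \to \infty$ with probability tending to 1. By (5.149), for $|\sigma_\epsilon^2 - \sigma_{\epsilon,\alpha}^2| = O(n^{-1/2})$ and $|\kappa_\eta\sigma_\eta^2 - \kappa_{\eta,0}\sigma_{\eta,0}^2| = o(1)$, we have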
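\[
\begin{aligned}
&-2\ell(\theta;\alpha) + 2\ell((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\epsilon,\alpha}^2)';\alpha) \\
&= \Big\{\Big(\frac{2\kappa_\eta\sigma_\eta^2}{\sigma_{\epsilon,\alpha}^2}\Big)^{1/2}\Big(\frac{1}{2} + \frac{\kappa_{\eta,0}\sigma_{\eta,0}^2}{2\kappa_\eta\sigma_\eta^2}\Big) - \Big(\frac{2\kappa_{\eta,0}\sigma_{\eta,0}^2}{\sigma_{\epsilon,\alpha}^2}\Big)^{1/2}\Big\} n^{1/2} \\
&\quad + \xi_1(\theta;\alpha) - \xi_1((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\epsilon,\alpha}^2)';\alpha) + \xi_2(\theta;\alpha) - \xi_2((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\epsilon,\alpha}^2)';\alpha) \\
&\quad + \xi(\theta) - \xi((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\epsilon,\alpha}^2)') + O_p(1) \\
&= \frac{(\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2)^2 n^{1/2}}{2^{5/2}(\kappa_{\eta,0}\sigma_{\eta,0}^2)^{3/2}} + \xi_1(\theta;\alpha) - \xi_1((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\epsilon,\alpha}^2)';\alpha) \\
&\quad + \xi_2(\theta;\alpha) - \xi_2((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\epsilon,\alpha}^2)';\alpha) + \xi(\theta) - \xi((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\epsilon,\alpha}^2)') \\
&\quad + o\big((\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2)^2 n^{1/2}\big) + O_p(1) \\
&= \frac{(\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2)^2 n^{1/2}}{2^{5/2}(\kappa_{\eta,0}\sigma_{\eta,0}^2)^{3/2}} + o\big((\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2)^2 n^{1/2}\big) + O_p(1),
\end{aligned}
\tag{5.151}
\]
where the first equality follows from $|\sigma_\epsilon^2 - \sigma_{\epsilon,\alpha}^2| = O(n^{-1/2})$, the second equality follows from (5.41), and the last equality follows from
\[
\xi_1(\theta;\alpha) - \xi_1((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\epsilon,\alpha}^2)';\alpha) = o_p\big((\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2)^2 n^{1/2}\big) + O_p(1),
\]
\[
\xi_2(\theta;\alpha) - \xi_2((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\epsilon,\alpha}^2)';\alpha) = o_p\big((\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2)^2 n^{1/2}\big) + O_p(1), \tag{5.152}
\]
\[
\xi(\theta) - \xi((\sigma_{\eta,0}^2,\kappa_{\eta,0},\sigma_{\epsilon,\alpha}^2)') = o_p\big((\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2)^2 n^{1/2}\big) + O_p(1), \tag{5.153}
\]
which can be obtained in a way similar to (5.109)-(5.111). Thus, (5.150) is obtained. This completes the proof of (5.135). $\Box$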
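Corollary 6 Under the setup of Theorem 11, let
\[
\theta_\alpha^{(3)} = \Big(\sigma_{\eta,0}^2,\; \kappa_{\eta,0},\; \sum_{j\in\alpha^c\setminus\alpha}\beta_j^2\sigma_j^2 + \sigma_{\epsilon,0}^2\Big). \tag{5.154}
\]
Then for $\ell(\theta;\alpha)$ defined in (2.9),
\[
\operatorname*{plim}_{n\to\infty} \frac{1}{n^\delta}\big(-2\ell(\hat\theta(\alpha);\alpha) + 2\ell(\theta_\alpha^{(3)};\alpha)\big) = 0; \quad \text{if } \delta \in (0,1), \tag{5.155}
\]
\[
-2\ell(\hat\theta(\alpha);\alpha) + 2\ell(\theta_\alpha^{(3)};\alpha) = O_p(1); \quad \text{if } \delta = 0. \tag{5.156}
\]
In addition, for $L_{KL}(\theta;\alpha)$ defined in (3.3),
\[
\operatorname*{plim}_{n\to\infty} L_{KL}(\hat\theta(\alpha);\alpha) \big/ L_{KL}(\theta_\alpha^{(3)};\alpha) = 0; \quad \text{if } \delta \in [0,1). \tag{5.157}
\]
Note that from Theorem 11, $\operatorname*{plim}_{n\to\infty} \hat\theta(\alpha) = \theta_\alpha^{(3)}$ for $\delta \in (0,1)$, which immediately implies (5.155). On the other hand, (5.156) is somewhat surprising, because $\hat\theta(\alpha)$ generally does not converge to $\theta_\alpha^{(3)}$ for $\delta = 0$. However, selection consistency and asymptotic loss efficiency are possible for geostatistical model selection even if some covariance parameters cannot be consistently estimated under the fixed domain asymptotic framework (see Theorem 13).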
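Theorem 12 Consider a class of models given by (3.1) with $x_j(s)$'s independently generated from white-noise processes of (5.7) and $\mathrm{cov}(\eta(s), \eta(s')) = \sigma_\eta^2\exp(-\kappa_\eta|s - s'|)$, where $\mathcal{A}^c \ne \emptyset$ and $p$ is fixed. Suppose that $\sigma_\eta^2 > 0$, $\kappa_\eta > 0$ and $\sigma_\epsilon^2 > 0$ are known. In addition, suppose that the data are collected at $s_i = i n^{-(1-\delta)} \in [0, n^\delta]$; $i = 1, \ldots, n$ for some $\delta \in [0,1)$. If $\lambda \to \infty$ and $\lambda/n \to 0$, then
\[
L_{KL}(\hat\alpha_{GIC}(\lambda)) \big/ \min_{\alpha\in\mathcal{A}} L_{KL}(\alpha) \xrightarrow{p} 1, \quad \text{as } n \to \infty.
\]
In addition,
\[
\lim_{n\to\infty} P\big(\hat\alpha_{GIC}(\lambda) = \alpha^c\big) = 1.
\]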
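To make the sampling design of Theorem 12 concrete, the following sketch builds the locations $s_i = i n^{-(1-\delta)}$, the exponential covariance plus nugget, and the scaled trace $\sigma_\epsilon^2\,\mathrm{tr}(\Sigma^{-1})/n$. It assumes $\Sigma = \Sigma_\eta + \sigma_\epsilon^2 I$, as used later in the chapter; the parameter values and function names are hypothetical. Since $\Sigma \ge \sigma_\epsilon^2 I$, the scaled trace is at most 1, and the use of (5.31) in (5.159) suggests that it tends to 1, so that $\mathrm{tr}(\Sigma^{-1})/\lambda \to \infty$ whenever $\lambda = o(n)$.

```python
import numpy as np

def design_points(n, delta):
    """Locations s_i = i * n^{-(1 - delta)}, i = 1..n, which lie in [0, n^delta]."""
    return np.arange(1, n + 1) * n ** (-(1.0 - delta))

def covariance(n, delta, sigma2_eta, kappa_eta, sigma2_eps):
    """Sigma = Sigma_eta + sigma_eps^2 I with exponential Sigma_eta (assumed form)."""
    s = design_points(n, delta)
    sigma_eta = sigma2_eta * np.exp(-kappa_eta * np.abs(s[:, None] - s[None, :]))
    return sigma_eta + sigma2_eps * np.eye(n)

def scaled_trace(n, delta, sigma2_eta=1.0, kappa_eta=1.0, sigma2_eps=0.5):
    """sigma_eps^2 * tr(Sigma^{-1}) / n; parameter values are illustrative only."""
    sigma = covariance(n, delta, sigma2_eta, kappa_eta, sigma2_eps)
    return sigma2_eps * np.trace(np.linalg.inv(sigma)) / n

for n in (50, 200, 800):
    # delta = 0.5 (mixed domain) and delta = 0 (fixed domain)
    print(n, round(scaled_trace(n, 0.5), 3), round(scaled_trace(n, 0.0), 3))
```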
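Proof. By Corollary 2, it suffices to show that
\[
\lim_{n\to\infty} \mathrm{tr}(\Sigma^{-1})/\lambda = \infty, \tag{5.158}
\]
which follows from (5.31) and $\lambda = o(n)$. This completes the proof. $\Box$
Theorem 13 Under the setup of Theorem 12, suppose that $\theta = (\sigma_\eta^2, \kappa_\eta, \sigma_\epsilon^2)'$ is unknown and lies in a compact set $\Theta \subset (0,\infty)^3$ such that $\theta_0 \in \Theta$. Let $\hat\theta(\alpha)$ be the ML estimate of $\theta$ based on model $\alpha$. For $\delta \in [0,1)$, if $\lambda \to \infty$ and $\lambda/n \to 0$, then
\[
\lim_{n\to\infty} P\big(\hat\alpha_{GIC}(\lambda) = \alpha^c\big) = 1.
\]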
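Proof. For consistency, it suffices to show that the conditions in Corollary 3 are satisfied with $\tau_n = n$. First, for $\delta \in [0,1)$,
\[
\begin{aligned}
\mu' A(\alpha;\theta)'\Sigma^{-1}(\theta)A(\alpha;\theta)\mu
&= \beta(\alpha^c\setminus\alpha)'X(\alpha^c\setminus\alpha)'\Sigma^{-1}(\theta)X(\alpha^c\setminus\alpha)\beta(\alpha^c\setminus\alpha) \\
&\quad - \beta(\alpha^c\setminus\alpha)'X(\alpha^c\setminus\alpha)'\Sigma^{-1}(\theta)M(\alpha;\theta)X(\alpha^c\setminus\alpha)\beta(\alpha^c\setminus\alpha) \\
&= \beta(\alpha^c\setminus\alpha)'X(\alpha^c\setminus\alpha)'\Sigma^{-1}(\theta)X(\alpha^c\setminus\alpha)\beta(\alpha^c\setminus\alpha) + O_p(1) \\
&= \sum_{j\in\alpha^c\setminus\alpha}\beta_j^2\sigma_j^2\,\mathrm{tr}(\Sigma^{-1}(\theta)) + o_p(n) \\
&= \sum_{j\in\alpha^c\setminus\alpha}\frac{\beta_j^2\sigma_j^2}{\sigma_\epsilon^2}\, n + o_p(n),
\end{aligned}
\tag{5.159}
\]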
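where the second equality is obtained in a way similar to (5.100), the third equality follows from
\[
\beta(\alpha^c\setminus\alpha)'X(\alpha^c\setminus\alpha)'\Sigma^{-1}(\theta)X(\alpha^c\setminus\alpha)\beta(\alpha^c\setminus\alpha) = \sum_{j\in\alpha^c\setminus\alpha}\beta_j^2\sigma_j^2\,\mathrm{tr}(\Sigma^{-1}(\theta)) + o_p(n),
\]
which can be obtained by (5.35), Chebyshev's inequality and the following moment condition:
\[
\mathrm{var}\big(X_j'\Sigma^{-1}(\theta)X_{j'}\big) = \sigma_j^2\sigma_{j'}^2\,\mathrm{tr}(\Sigma^{-2}(\theta)) = O(n),
\]
and the last equality follows from (5.31). Hence, (A.1') is satisfied. Second, by (5.31) and (5.35), we have for any $\theta \in \Theta$,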
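\[
\operatorname*{plim}_{n\to\infty} \frac{1}{n} X'\Sigma^{-1}(\theta)X = D(\theta),
\]
where $D(\theta)$ is a $p \times p$ diagonal matrix with diagonal entries $\sigma_j^2/\sigma_\epsilon^2$, $j = 1, \ldots, p$. Hence, (A.2') holds. Third, by (5.34) and (5.35), (A.3') holds. Fourth, by (5.155)-(5.157), (A.4) and (A.5) hold trivially for $\tau_n = n$ and $\theta_\alpha^{(3)}$ defined in (5.154). Fifth, for $\xi(\theta)$ defined in (5.53), by (5.147) and (5.153), we have for $\delta \in [0,1)$,
\[
\operatorname*{plim}_{n\to\infty} \frac{1}{n}\big(\xi(\theta_0) - \xi(\theta_\alpha^{(3)})\big) = 0.
\]
Hence, (4.12) holds. Last, for $\alpha \in \mathcal{A}^c$, $\theta_\alpha^{(3)} = \theta_0$, so (4.14) holds trivially. Then, for $\mathcal{A}^c \ne \emptyset$, $\lambda \to \infty$ and $\lambda = o(n)$, we have
\[
\lim_{n\to\infty} P\big(\hat\alpha_{GIC}(\lambda) = \alpha^c\big) = 1,
\]
which completes the proof. $\Box$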
Comparing Theorems 7, 10 and 13, we see that GIC attains consistency most easily when the variables to be selected are generated from white-noise processes, and with the most difficulty when the variables to be selected are polynomials.
Chapter 6
Conditional Generalized Information Criterion
If we are interested in the asymptotic optimality properties of (3.14) under some selection procedure, it is difficult to prove them directly from the GIC introduced above, and another criterion is needed. Vaida and Blanchard (2005) suggest a suitable criterion for the case where spatial process prediction is of interest, named the conditional AIC (CAIC). Here we also suggest a conditional generalized information criterion (CGIC), which includes CAIC as a special case. In the following sections, we introduce the asymptotic theory of CGIC in geostatistical model selection problems.
6.1 Conditional Akaike’s Information Criterion
Consider the loss $L(\alpha)$ defined in (3.14) with the estimator $\hat S(\alpha)$ defined in (3.15). It is difficult to establish optimality properties of $L(\alpha)$ directly from the criterion (4.6). Vaida and Blanchard (2005) suggested a conditional AIC (CAIC) selection procedure for linear mixed models, which is an unbiased estimator of $E(L(\alpha))$ shown in (3.17). They suggested that when the focus is on estimating the mean function, the AIC in (4.3) is a suitable selection procedure, whereas when the focus is on both the mean function estimate and the spatial process prediction, CAIC is more adequate than AIC. That is, for $\alpha \in \mathcal{A}$,
\[
\Gamma_{CAIC}(\alpha) = \|Z - \hat S(\alpha)\|^2 + 2\,\mathrm{tr}(H(\alpha))\,\sigma_\epsilon^2, \tag{6.1}
\]
where $\hat S(\alpha) = H(\alpha)Z$ with $H(\alpha)$ defined in (3.16). Let
\[
\hat\alpha_{CAIC} = \arg\min_{\alpha\in\mathcal{A}} \Gamma_{CAIC}(\alpha). \tag{6.2}
\]
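The criterion (6.1) and the selector (6.2) can be sketched in a few lines. The sketch below assumes known covariance parameters, uses $H(\alpha) = \Sigma_\eta\Sigma^{-1} + \sigma_\epsilon^2\Sigma^{-1}M(\alpha)$ as stated later in the chapter, and takes $M(\alpha) = X(\alpha)(X(\alpha)'\Sigma^{-1}X(\alpha))^{-1}X(\alpha)'\Sigma^{-1}$, which is an assumption here because (3.16) is not reproduced in this section (it is consistent with the identity $M(\alpha)\Sigma M(\alpha)'\Sigma^{-1} = M(\alpha)$ used below). All function names and simulation settings are hypothetical.

```python
import numpy as np

def projection_m(x_alpha, sigma_inv):
    """Assumed GLS form M = X (X' S^-1 X)^-1 X' S^-1; zero matrix for the empty model."""
    n = sigma_inv.shape[0]
    if x_alpha.shape[1] == 0:
        return np.zeros((n, n))
    xt_si = x_alpha.T @ sigma_inv
    return x_alpha @ np.linalg.solve(xt_si @ x_alpha, xt_si)

def caic(z, x_alpha, sigma_eta, sigma2_eps):
    """Gamma_CAIC = ||Z - H Z||^2 + 2 sigma_eps^2 tr(H), as in (6.1)."""
    n = z.shape[0]
    sigma_inv = np.linalg.inv(sigma_eta + sigma2_eps * np.eye(n))
    h = sigma_eta @ sigma_inv + sigma2_eps * sigma_inv @ projection_m(x_alpha, sigma_inv)
    resid = z - h @ z
    return float(resid @ resid + 2.0 * sigma2_eps * np.trace(h))

def select_caic(z, x, candidates, sigma_eta, sigma2_eps):
    """Return the minimiser of Gamma_CAIC over candidate index tuples, plus all scores."""
    scores = {a: caic(z, x[:, np.asarray(a, dtype=int)], sigma_eta, sigma2_eps)
              for a in candidates}
    return min(scores, key=scores.get), scores

# Illustrative simulation (all settings are hypothetical).
rng = np.random.default_rng(0)
n, delta, sigma2_eps = 200, 0.5, 0.5
s = np.arange(1, n + 1) * n ** (-(1.0 - delta))
sigma_eta = np.exp(-np.abs(s[:, None] - s[None, :]))      # sigma_eta^2 = kappa_eta = 1
x = rng.standard_normal((n, 3))                            # white-noise regressors
beta = np.array([2.0, 0.0, 0.0])                           # only variable 0 is active
eta = np.linalg.cholesky(sigma_eta + 1e-10 * np.eye(n)) @ rng.standard_normal(n)
z = x @ beta + eta + np.sqrt(sigma2_eps) * rng.standard_normal(n)

candidates = [(), (0,), (1,), (0, 1), (0, 2), (0, 1, 2)]
best, scores = select_caic(z, x, candidates, sigma_eta, sigma2_eps)
print("selected model:", best)
```

Because the penalty $2\sigma_\epsilon^2\mathrm{tr}(H(\alpha))$ is of AIC type, the selected model in such a simulation is expected to contain the active variable but may also include inactive ones.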
Then we have the following theorem.
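Theorem 14 Consider a class of models given by (3.1). Suppose that
\[
\lim_{n\to\infty} \sum_{\alpha\in\mathcal{A}} \frac{1}{E(L(\alpha))} = 0, \tag{6.3}
\]
where $L(\alpha)$ is defined in (3.14). Then the criterion $\Gamma_{CAIC}(\alpha)$ defined in (6.1) is asymptotically loss efficient:
\[
\operatorname*{plim}_{n\to\infty} L(\hat\alpha_{CAIC}) \big/ \inf_{\alpha\in\mathcal{A}} L(\alpha) = 1.
\]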
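Proof. We first expand the CAIC defined in (6.1):
\[
\begin{aligned}
\Gamma_{CAIC}(\alpha) &= (Z - \hat S(\alpha))'(Z - \hat S(\alpha)) + 2\sigma_\epsilon^2\mathrm{tr}(H(\alpha)) \\
&= (S - \hat S(\alpha) + \epsilon)'(S - \hat S(\alpha) + \epsilon) + 2\sigma_\epsilon^2\mathrm{tr}(H(\alpha)) \\
&= L(\alpha) + 2\epsilon'(S - \hat S(\alpha)) + \epsilon'\epsilon + 2\sigma_\epsilon^2\mathrm{tr}(H(\alpha)) \\
&= L(\alpha) + 2\epsilon'(I - H(\alpha))S - 2\epsilon'H(\alpha)\epsilon + \epsilon'\epsilon + 2\sigma_\epsilon^2\mathrm{tr}(H(\alpha)) \\
&= L(\alpha) + 2\sigma_\epsilon^2\epsilon'(\Sigma^{-1}A(\alpha))S - 2\epsilon'H(\alpha)\epsilon + \epsilon'\epsilon + 2\sigma_\epsilon^2\mathrm{tr}(H(\alpha)) \\
&= L(\alpha) + 2\sigma_\epsilon^2\epsilon'(\Sigma^{-1}A(\alpha))\mu + 2\sigma_\epsilon^2\epsilon'(\Sigma^{-1}A(\alpha))\eta + \epsilon'\epsilon
 - 2\big(\epsilon'H(\alpha)\epsilon - \sigma_\epsilon^2\mathrm{tr}(H(\alpha))\big),
\end{aligned}
\tag{6.4}
\]
where the third equality follows from (3.14) and the second last equality follows from $I - H(\alpha) = A(\alpha) - \Sigma_\eta\Sigma^{-1}A(\alpha) = \sigma_\epsilon^2\Sigma^{-1}A(\alpha)$.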
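The decomposition (6.4) is purely algebraic, so it can be checked on arbitrary vectors. The sketch below does so under the assumptions $\Sigma = \Sigma_\eta + \sigma_\epsilon^2 I$, $A(\alpha) = I - M(\alpha)$ and the GLS form of $M(\alpha)$ (neither (3.6) nor (3.16) is reproduced in this section, so these forms are assumptions); all numerical settings are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2_eps = 60, 0.7
loc = np.arange(1, n + 1) / n                                # fixed-domain design (delta = 0)
sigma_eta = np.exp(-np.abs(loc[:, None] - loc[None, :]))
sigma_inv = np.linalg.inv(sigma_eta + sigma2_eps * np.eye(n))

x = rng.standard_normal((n, 2))      # regressors of a candidate model alpha
mu = rng.standard_normal(n)          # arbitrary mean vector, need not lie in span(x)
eta = rng.standard_normal(n)         # any vectors work: the identity is algebraic
eps = rng.standard_normal(n)

m = x @ np.linalg.solve(x.T @ sigma_inv @ x, x.T @ sigma_inv)   # assumed M(alpha)
a = np.eye(n) - m                                                # assumed A(alpha)
h = sigma_eta @ sigma_inv + sigma2_eps * sigma_inv @ m           # H(alpha)

signal = mu + eta                    # S
z = signal + eps                     # Z = S + eps
s_hat = h @ z
loss = float((signal - s_hat) @ (signal - s_hat))                # L(alpha)

lhs = float((z - s_hat) @ (z - s_hat) + 2 * sigma2_eps * np.trace(h))
rhs = float(loss
            + 2 * sigma2_eps * eps @ sigma_inv @ a @ mu
            + 2 * sigma2_eps * eps @ sigma_inv @ a @ eta
            + eps @ eps
            - 2 * (eps @ h @ eps - sigma2_eps * np.trace(h)))
print(lhs, rhs)
```

The two printed numbers should agree up to floating-point error, and `np.eye(n) - h` should coincide with `sigma2_eps * sigma_inv @ a`, which is the identity quoted after (6.4).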
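It remains to show that for $\alpha \in \mathcal{A}$,
\[
\Gamma_{CAIC}(\alpha) = \epsilon'\epsilon + L(\alpha) + o_p(L(\alpha)), \tag{6.5}
\]
for which it suffices to show that
\[
\operatorname*{plim}_{n\to\infty} \sup_{\alpha\in\mathcal{A}} \frac{|\epsilon'(\Sigma^{-1}A(\alpha))\mu|}{E(L(\alpha))} = 0, \tag{6.6}
\]
\[
\operatorname*{plim}_{n\to\infty} \sup_{\alpha\in\mathcal{A}} \frac{|\epsilon'(\Sigma^{-1}A(\alpha))\eta|}{E(L(\alpha))} = 0, \tag{6.7}
\]
\[
\operatorname*{plim}_{n\to\infty} \sup_{\alpha\in\mathcal{A}} \frac{|\epsilon'H(\alpha)\epsilon - \sigma_\epsilon^2\mathrm{tr}(H(\alpha))|}{E(L(\alpha))} = 0, \tag{6.8}
\]
\[
\operatorname*{plim}_{n\to\infty} \sup_{\alpha\in\mathcal{A}} \Big|\frac{L(\alpha)}{E(L(\alpha))} - 1\Big| = 0. \tag{6.9}
\]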
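Hence, by (6.5), for $\hat\alpha_{CAIC}$ defined in (6.2) and $\alpha_L = \arg\min_{\alpha\in\mathcal{A}} L(\alpha)$, we can easily conclude that
\[
\begin{aligned}
\Gamma_{CAIC}(\hat\alpha_{CAIC}) &= \epsilon'\epsilon + L(\hat\alpha_{CAIC}) + o_p(L(\hat\alpha_{CAIC})), \\
\Gamma_{CAIC}(\alpha_L) &= \epsilon'\epsilon + L(\alpha_L) + o_p(L(\alpha_L)).
\end{aligned}
\]
It follows that
\[
0 \le \frac{\Gamma_{CAIC}(\alpha_L) - \Gamma_{CAIC}(\hat\alpha_{CAIC})}{L(\hat\alpha_{CAIC})} = \frac{L(\alpha_L) - L(\hat\alpha_{CAIC})}{L(\hat\alpha_{CAIC})} + o_p(1),
\]
and then
\[
\operatorname*{plim}_{n\to\infty} \frac{L(\alpha_L) - L(\hat\alpha_{CAIC})}{L(\hat\alpha_{CAIC})} = 0,
\]
which gives
\[
\operatorname*{plim}_{n\to\infty} L(\hat\alpha_{CAIC}) \big/ \inf_{\alpha\in\mathcal{A}} L(\alpha) = 1.
\]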
We now prove (6.6)-(6.9) one by one. First, for any $\varepsilon > 0$,
which gives (6.6), where the second last inequality follows from $\sigma_\epsilon^4\mu'A(\alpha)'\Sigma^{-2}A(\alpha)\mu \le E(L(\alpha))$ by (3.17), and the last equality follows from (6.3).
Second, for any $\varepsilon > 0$,
where the third inequality follows from $\sigma_\epsilon^2\Sigma^{-1} \le I$, the second last inequality follows from
\[
\sigma_\epsilon^2\mathrm{tr}(\Sigma^{-1}A(\alpha)\Sigma_\eta) \le \sigma_\epsilon^2\mathrm{tr}(\Sigma_\eta\Sigma^{-1}) \le E(L(\alpha)),
\]
by (3.17), and the last equality follows from (6.3).
Third, for any $\varepsilon > 0$,
where the second inequality is an application of Theorem 2 of Whittle (1960) for some $c_1 > 0$, the third equality follows from
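\[
\begin{aligned}
\mathrm{tr}(H(\alpha)H(\alpha)') &= \mathrm{tr}\big((\Sigma_\eta\Sigma^{-1} + \sigma_\epsilon^2\Sigma^{-1}M(\alpha))(\Sigma_\eta\Sigma^{-1} + \sigma_\epsilon^2\Sigma^{-1}M(\alpha))'\big) \\
&= \mathrm{tr}\big(\Sigma_\eta\Sigma^{-2}\Sigma_\eta + \sigma_\epsilon^2\Sigma_\eta\Sigma^{-1}M(\alpha)'\Sigma^{-1} + \sigma_\epsilon^2\Sigma^{-1}M(\alpha)\Sigma^{-1}\Sigma_\eta + \sigma_\epsilon^4 M(\alpha)'\Sigma^{-2}M(\alpha)\big) \\
&\le \mathrm{tr}(\Sigma_\eta\Sigma^{-1}) + 3\sigma_\epsilon^2\mathrm{tr}(\Sigma^{-1}M(\alpha)) \\
&\le 3\sigma_\epsilon^{-2}E(L(\alpha)),
\end{aligned}
\]
by
\[
\mathrm{tr}(\Sigma_\eta\Sigma^{-2}\Sigma_\eta) = \mathrm{tr}(\Sigma_\eta\Sigma^{-1} - \sigma_\epsilon^2\Sigma^{-2}\Sigma_\eta) \le \mathrm{tr}(\Sigma_\eta\Sigma^{-1}),
\]
\[
\mathrm{tr}(\sigma_\epsilon^2\Sigma_\eta\Sigma^{-1}M(\alpha)'\Sigma^{-1}) = \sigma_\epsilon^2\mathrm{tr}\big(M(\alpha)'\Sigma^{-1} - \sigma_\epsilon^2\Sigma^{-1}M(\alpha)'\Sigma^{-1}\big) \le \sigma_\epsilon^2\mathrm{tr}(M(\alpha)'\Sigma^{-1}),
\]
and
\[
\sigma_\epsilon^4\mathrm{tr}(M(\alpha)'\Sigma^{-2}M(\alpha)) \le \sigma_\epsilon^2\mathrm{tr}(M(\alpha)'\Sigma^{-1}M(\alpha)) = \sigma_\epsilon^2\mathrm{tr}(\Sigma^{-1}M(\alpha)).
\]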
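The trace bound $\mathrm{tr}(H(\alpha)H(\alpha)') \le \mathrm{tr}(\Sigma_\eta\Sigma^{-1}) + 3\sigma_\epsilon^2\mathrm{tr}(\Sigma^{-1}M(\alpha))$ can be illustrated numerically. The sketch assumes the GLS form of $M(\alpha)$ and $\Sigma = \Sigma_\eta + \sigma_\epsilon^2 I$ (assumptions, since (3.16) is not reproduced in this section); the function name, design and parameter values are hypothetical.

```python
import numpy as np

def trace_bound_terms(n, p, sigma2_eps, seed=0):
    """Return (tr(H H'), tr(Sigma_eta Sigma^-1) + 3 sigma_eps^2 tr(Sigma^-1 M))."""
    rng = np.random.default_rng(seed)
    loc = np.arange(1, n + 1) / np.sqrt(n)                 # design with delta = 1/2
    sigma_eta = np.exp(-np.abs(loc[:, None] - loc[None, :]))
    sigma_inv = np.linalg.inv(sigma_eta + sigma2_eps * np.eye(n))
    x = rng.standard_normal((n, p))                        # white-noise regressors
    m = x @ np.linalg.solve(x.T @ sigma_inv @ x, x.T @ sigma_inv)
    h = sigma_eta @ sigma_inv + sigma2_eps * sigma_inv @ m
    lhs = float(np.trace(h @ h.T))
    rhs = float(np.trace(sigma_eta @ sigma_inv)
                + 3.0 * sigma2_eps * np.trace(sigma_inv @ m))
    return lhs, rhs

for seed in range(3):
    lhs, rhs = trace_bound_terms(n=80, p=3, sigma2_eps=0.5, seed=seed)
    print(f"seed={seed}  tr(HH')={lhs:.3f}  bound={rhs:.3f}")
```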
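Last, it remains to show (6.9). We first expand $L(\alpha)$ defined in (3.14):
\[
\begin{aligned}
L(\alpha) &= (S - \hat S(\alpha))'(S - \hat S(\alpha)) \\
&= \|(I - H(\alpha))\mu + (\eta - \Sigma_\eta\Sigma^{-1}(\eta + \epsilon)) - \sigma_\epsilon^2\Sigma^{-1}M(\alpha)(\eta + \epsilon)\|^2 \\
&= \|\sigma_\epsilon^2\Sigma^{-1}A(\alpha)\mu + (\sigma_\epsilon^2\Sigma^{-1}\eta - \Sigma_\eta\Sigma^{-1}\epsilon) - \sigma_\epsilon^2\Sigma^{-1}M(\alpha)(\eta + \epsilon)\|^2 \\
&= \sigma_\epsilon^4\mu'A(\alpha)'\Sigma^{-2}A(\alpha)\mu + \|\sigma_\epsilon^2\Sigma^{-1}\eta - \Sigma_\eta\Sigma^{-1}\epsilon\|^2 - 2\sigma_\epsilon^4\mu'A(\alpha)'\Sigma^{-2}M(\alpha)(\eta + \epsilon) \\
&\quad + \sigma_\epsilon^4(\eta + \epsilon)'M(\alpha)'\Sigma^{-2}M(\alpha)(\eta + \epsilon) + 2\sigma_\epsilon^2\mu'A(\alpha)'\Sigma^{-1}(\sigma_\epsilon^2\Sigma^{-1}\eta - \Sigma_\eta\Sigma^{-1}\epsilon) \\
&\quad - 2\sigma_\epsilon^2(\sigma_\epsilon^2\Sigma^{-1}\eta - \Sigma_\eta\Sigma^{-1}\epsilon)'\Sigma^{-1}M(\alpha)(\eta + \epsilon).
\end{aligned}
\tag{6.10}
\]
Together with (3.17), it follows that
\[
\begin{aligned}
L(\alpha) - E(L(\alpha)) &= \|\sigma_\epsilon^2\Sigma^{-1}\eta - \Sigma_\eta\Sigma^{-1}\epsilon\|^2 - \sigma_\epsilon^2\mathrm{tr}(\Sigma_\eta\Sigma^{-1}) \\
&\quad + \sigma_\epsilon^4(\eta + \epsilon)'M(\alpha)'\Sigma^{-2}M(\alpha)(\eta + \epsilon) - \sigma_\epsilon^4\mathrm{tr}(\Sigma^{-1}M(\alpha)) \\
&\quad + 2\sigma_\epsilon^2\mu'A(\alpha)'\Sigma^{-1}(\sigma_\epsilon^2\Sigma^{-1}\eta - \Sigma_\eta\Sigma^{-1}\epsilon) - 2\sigma_\epsilon^4\mu'A(\alpha)'\Sigma^{-2}M(\alpha)(\eta + \epsilon) \\
&\quad - 2\sigma_\epsilon^2(\sigma_\epsilon^2\Sigma^{-1}\eta - \Sigma_\eta\Sigma^{-1}\epsilon)'\Sigma^{-1}M(\alpha)(\eta + \epsilon).
\end{aligned}
\]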
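Then, to show (6.9), it suffices to show that
\[
\operatorname*{plim}_{n\to\infty} \sup_{\alpha\in\mathcal{A}} \frac{\big|\|\sigma_\epsilon^2\Sigma^{-1}\eta - \Sigma_\eta\Sigma^{-1}\epsilon\|^2 - \sigma_\epsilon^2\mathrm{tr}(\Sigma_\eta\Sigma^{-1})\big|}{E(L(\alpha))} = 0, \tag{6.11}
\]
\[
\operatorname*{plim}_{n\to\infty} \sup_{\alpha\in\mathcal{A}} \frac{|(\eta + \epsilon)'M(\alpha)'\Sigma^{-2}M(\alpha)(\eta + \epsilon) - \mathrm{tr}(\Sigma^{-1}M(\alpha))|}{E(L(\alpha))} = 0, \tag{6.12}
\]
\[
\operatorname*{plim}_{n\to\infty} \sup_{\alpha\in\mathcal{A}} \frac{|\mu'A(\alpha)'\Sigma^{-1}(\sigma_\epsilon^2\Sigma^{-1}\eta - \Sigma_\eta\Sigma^{-1}\epsilon)|}{E(L(\alpha))} = 0, \tag{6.13}
\]
\[
\operatorname*{plim}_{n\to\infty} \sup_{\alpha\in\mathcal{A}} \frac{|\mu'A(\alpha)'\Sigma^{-2}M(\alpha)(\eta + \epsilon)|}{E(L(\alpha))} = 0, \tag{6.14}
\]
\[
\operatorname*{plim}_{n\to\infty} \sup_{\alpha\in\mathcal{A}} \frac{|(\sigma_\epsilon^2\Sigma^{-1}\eta - \Sigma_\eta\Sigma^{-1}\epsilon)'\Sigma^{-1}M(\alpha)(\eta + \epsilon)|}{E(L(\alpha))} = 0. \tag{6.15}
\]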
Now, we start to prove (6.11)-(6.15) one by one.
Hence, to show (6.11), it suffices to show plim
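First, (6.16) can be established in a similar manner by Theorem 2 of Whittle. That is, for any $\varepsilon > 0$,
for some $c_2 > 0$, where the third inequality follows from
\[
\mathrm{tr}(\Sigma_\eta\Sigma^{-2}\Sigma_\eta\Sigma^{-2}) \le \sigma_\epsilon^{-4}\mathrm{tr}(\Sigma_\eta^2\Sigma^{-2}) \le \sigma_\epsilon^{-4}\mathrm{tr}(\Sigma_\eta\Sigma^{-1}), \tag{6.19}
\]
by $\sigma_\epsilon^2\Sigma^{-1} \le I$ and $\Sigma_\eta^{1/2}\Sigma^{-1}\Sigma_\eta^{1/2} \le I$ (since $\Sigma_\eta \le \Sigma$), and the last equality follows from (4.5).
Second, (6.17) is also established by Theorem 2 of Whittle. That is, for any $\varepsilon > 0$,
$\lim_{n\to\infty} P$
for some $c_3 > 0$, where the third inequality follows from
\[
\begin{aligned}
\sigma_\epsilon^2\mathrm{tr}(\Sigma^{-1}\Sigma_\eta\Sigma^{-1}\Sigma_\eta\Sigma^{-1}\Sigma_\eta\Sigma^{-1}) &\le \sigma_\epsilon^2\mathrm{tr}(\Sigma^{-1}\Sigma_\eta\Sigma^{-1}\Sigma_\eta\Sigma^{-1}) \\
&\le \sigma_\epsilon^2\mathrm{tr}(\Sigma^{-1}\Sigma_\eta\Sigma^{-1}) \\
&\le \mathrm{tr}(\Sigma_\eta\Sigma^{-1}),
\end{aligned}
\]
by $\sigma_\epsilon^2\Sigma^{-1} \le I$ and $\Sigma^{-1/2}\Sigma_\eta\Sigma^{-1/2} \le I$ (since $\Sigma_\eta < \Sigma$), and the last equality follows from (6.3). This gives (6.11).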
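Equation (6.12) can be established by Theorem 2 of Whittle. That is, for any $\varepsilon > 0$,
$\lim_{n\to\infty} P$
for some $c_4 > 0$, where the third inequality follows from
\[
\begin{aligned}
\mathrm{tr}(\Sigma M(\alpha)'\Sigma^{-2}M(\alpha)\Sigma M(\alpha)'\Sigma^{-2}M(\alpha)) &= \mathrm{tr}(M(\alpha)\Sigma^{-1}M(\alpha)\Sigma^{-1}) \\
&\le \mathrm{tr}(\Sigma^{-1}M(\alpha)\Sigma^{-1}) \\
&\le \sigma_\epsilon^{-2}\mathrm{tr}(\Sigma^{-1}M(\alpha)),
\end{aligned}
\]
by $M(\alpha)\Sigma M(\alpha)'\Sigma^{-1} = M(\alpha)$, $\Sigma^{-1}M(\alpha) \le \Sigma^{-1}$ and $\sigma_\epsilon^2\Sigma^{-1} \le I$, and the last equality follows from (6.3).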
For (6.13), we have
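\[
\begin{aligned}
|\mu'A(\alpha)'\Sigma^{-1}(\sigma_\epsilon^2\Sigma^{-1}\eta - \Sigma_\eta\Sigma^{-1}\epsilon)| &= |\sigma_\epsilon^2\mu'A(\alpha)'\Sigma^{-2}\eta - \mu'A(\alpha)'\Sigma^{-1}\Sigma_\eta\Sigma^{-1}\epsilon| \\
&\le |\sigma_\epsilon^2\mu'A(\alpha)'\Sigma^{-2}\eta| + |\mu'A(\alpha)'\Sigma^{-1}\Sigma_\eta\Sigma^{-1}\epsilon|.
\end{aligned}
\]
Hence, to show (6.13), it suffices to show that plim
First, (6.20) can be shown similarly to (6.6). That is, for any $\varepsilon > 0$,
$\lim_{n\to\infty} P$ where the third inequality follows from
$\mu'A(\alpha)'\Sigma^{-2}\Sigma_\eta\Sigma^{-2}A(\alpha)\mu \le \mu'A(\alpha)'\Sigma^{-3}A(\alpha)\mu$ follows from (6.3). This gives (6.13).
For (6.14), we have for any $\varepsilon > 0$, follows from (6.3). This gives (6.14).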
For (6.15), we have
Then, to show (6.15), it suffices to show that plim
We now show (6.22)-(6.25) one by one. First, (6.22) can be established by Theorem 2 of Whittle. That is, for any $\varepsilon > 0$,
for some $c_6 > 0$, where the third inequality follows from
\[
\Sigma^{-1}M(\alpha)\Sigma_\eta M(\alpha)'\Sigma^{-1} \le \Sigma^{-1}M(\alpha)\Sigma M(\alpha)'\Sigma^{-1} = \Sigma^{-1}M(\alpha),
\]
the fourth inequality follows from $\sigma_\epsilon^2\Sigma^{-1}\Sigma_\eta\Sigma^{-1} \le I$ (by $\Sigma_\eta \le \Sigma$ and $\sigma_\epsilon^2\Sigma^{-1} \le I$), and the last equality follows from (6.3). Second, (6.23) is similar to (6.18). That is,
$\lim_{n\to\infty} P$
equality follows from (6.3). Third, (6.24) is similar to (6.23). That is, for any $\varepsilon > 0$,
where the third inequality follows from
\[
\Sigma^{-1}M(\alpha)\Sigma_\eta M(\alpha)'\Sigma^{-1} \le \Sigma^{-1}M(\alpha)\Sigma M(\alpha)'\Sigma^{-1} = \Sigma^{-1}M(\alpha),
\]
the fourth inequality follows from $\Sigma_\eta\Sigma^{-2}\Sigma_\eta \le I$ (by $\Sigma_\eta^2 \le \Sigma^2$), and the last equality follows from (6.3). Last, (6.25) can be established by Theorem 2 of Whittle. That is, for any $\varepsilon > 0$,
for some $c_7 > 0$, where the third equality follows from
\[
\sigma_\epsilon^2\Sigma^{-1}M(\alpha)M(\alpha)'\Sigma^{-1} \le \Sigma^{-1}M(\alpha)\Sigma M(\alpha)'\Sigma^{-1} = \Sigma^{-1}M(\alpha),
\]
the fourth inequality follows from $\Sigma_\eta\Sigma^{-2}\Sigma_\eta \le I$ (by $\Sigma_\eta^2 \le \Sigma^2$), and the last equality follows from (6.3). This ends the proof of (6.9), which completes the proof. $\Box$
Note that (6.3) holds in general. Here, we consider an example where (6.3) is satisfied.
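Corollary 7 Consider a class of models given by (3.1) with $p$ fixed and arbitrary explanatory variables. Suppose that the data are collected at $s_i = i n^{-(1-\delta)} \in [0, n^\delta]$; $i = 1, \ldots, n$ for some $\delta \in [0,1)$. Consider the exponential covariance model of (5.1) for $\eta(\cdot)$. Let $\hat\alpha_{CAIC}$ be the model selected by CAIC as defined in (6.2). Then,
\[
\operatorname*{plim}_{n\to\infty} L(\hat\alpha_{CAIC}) \big/ \inf_{\alpha\in\mathcal{A}} L(\alpha) = 1.
\]
Further, if $\mathcal{A}^c \ne \emptyset$, then for any model selection procedure $\hat\alpha$ such that $\lim_{n\to\infty} P(\hat\alpha \in \mathcal{A}^c) = 1$,
\[
\operatorname*{plim}_{n\to\infty} L(\hat\alpha) \big/ \inf_{\alpha\in\mathcal{A}} L(\alpha) = 1.
\]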
It is shown in (3.17) that $E(L(\alpha))$ is bounded below by $\sigma_\epsilon^2\mathrm{tr}(\Sigma_\eta\Sigma^{-1})$ for $\alpha \in \mathcal{A}$, which is often the dominant term of $E(L(\alpha))$. In addition, for $\alpha \in \mathcal{A}^c$, $\sigma_\epsilon^2\mathrm{tr}(\Sigma_\eta\Sigma^{-1})$ is the dominant term of $E(L(\alpha))$. This suggests that any correct model we select will attain asymptotic loss efficiency. Further, in the following example, $E(L(\alpha))$ is dominated by $\sigma_\epsilon^2\mathrm{tr}(\Sigma_\eta\Sigma^{-1})$ for all $\alpha \in \mathcal{A}$. In that case, every candidate model achieves asymptotic loss efficiency.
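A rough numerical illustration of this dominance for a correct model: by (6.33), when $A(\alpha)\mu = 0$ the risk is $\sigma_\epsilon^2\mathrm{tr}(\Sigma_\eta\Sigma^{-1}) + \sigma_\epsilon^4\mathrm{tr}(\Sigma^{-1}M(\alpha))$, and the second term is at most $p\,\sigma_\epsilon^2$ because $\sigma_\epsilon^2\Sigma^{-1} \le I$. The sketch below compares the two terms as $n$ grows, assuming the GLS form of $M(\alpha)$ and white-noise regressors; all names and parameter values are hypothetical.

```python
import numpy as np

def risk_terms(n, p=2, delta=0.5, sigma2_eps=0.5, seed=0):
    """Return (kriging term, model term) of E(L(alpha)) for a correct model alpha."""
    rng = np.random.default_rng(seed)
    loc = np.arange(1, n + 1) * n ** (-(1.0 - delta))
    sigma_eta = np.exp(-np.abs(loc[:, None] - loc[None, :]))   # sigma_eta^2 = kappa_eta = 1
    sigma_inv = np.linalg.inv(sigma_eta + sigma2_eps * np.eye(n))
    x = rng.standard_normal((n, p))
    m = x @ np.linalg.solve(x.T @ sigma_inv @ x, x.T @ sigma_inv)
    kriging = sigma2_eps * np.trace(sigma_eta @ sigma_inv)      # sigma_eps^2 tr(Sigma_eta Sigma^-1)
    model = sigma2_eps ** 2 * np.trace(sigma_inv @ m)           # sigma_eps^4 tr(Sigma^-1 M)
    return float(kriging), float(model)

for n in (50, 200, 800):
    kriging, model = risk_terms(n)
    print(f"n={n}  kriging={kriging:.2f}  model={model:.3f}  ratio={model / kriging:.4f}")
```

In runs of this kind the kriging term grows with $n$ while the model term stays bounded, which is the mechanism behind the remark that the choice among correct models does not affect loss efficiency at this level of comparison.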
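Corollary 8 Consider a class of models given by (3.1) with $x_j(s) = (s n^{-\delta})^j$; $j = 1, \ldots, p$, and $\mathrm{cov}(\eta(s), \eta(s')) = \sigma_\eta^2\exp(-\kappa_\eta|s - s'|)$, where $p$ is fixed and $\mathcal{A}^c \ne \emptyset$. Suppose that the data are collected at $s_i = i n^{-(1-\delta)} \in [0, n^\delta]$; $i = 1, \ldots, n$ for some $\delta \in [0,1)$. Let $\hat\alpha_{CAIC}$ be the model selected by CAIC as defined in (6.2). Then
\[
\operatorname*{plim}_{n\to\infty} L(\hat\alpha_{CAIC}) \big/ \inf_{\alpha\in\mathcal{A}} L(\alpha) = 1.
\]
Further, for any model selection procedure $\hat\alpha$,
\[
\operatorname*{plim}_{n\to\infty} L(\hat\alpha) \big/ \inf_{\alpha\in\mathcal{A}} L(\alpha) = 1.
\]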
Corollaries 7 and 8 suggest that variable selection is unnecessary for the asymptotic loss efficiency of $L(\alpha)$ in those cases. We therefore consider the strong asymptotic loss efficiency of $L(\alpha)$ defined in (3.21).
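Theorem 15 Consider a class of models given by (3.1) and the universal kriging predictor $\hat S(\alpha)$ of $S$ defined in (3.15). Suppose that
\[
\lim_{n\to\infty} \sum_{\alpha\in\mathcal{A}\setminus\mathcal{A}^c} \frac{1}{E(L(\alpha)) - \sigma_\epsilon^2\mathrm{tr}(\Sigma_\eta\Sigma^{-1})} = 0, \tag{6.26}
\]
where $L(\alpha)$ is defined in (3.14). If $|\mathcal{A}^c| \le 1$ and $\alpha^c$ is fixed, then $\hat\alpha_{CAIC}$ of (6.2) is strongly asymptotically loss efficient:
\[
\operatorname*{plim}_{n\to\infty} \frac{L(\hat\alpha_{CAIC}) - \|S - E(S|Z)\|^2}{\inf_{\alpha\in\mathcal{A}} L(\alpha) - \|S - E(S|Z)\|^2} = 1.
\]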
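Proof. We first suppose that $\mathcal{A}^c = \emptyset$. Starting from (6.4), the CAIC defined in (6.1) expands as
\[
\begin{aligned}
\Gamma_{CAIC}(\alpha) &= L(\alpha) + 2\sigma_\epsilon^2\epsilon'(\Sigma^{-1}A(\alpha))\mu + 2\sigma_\epsilon^2\epsilon'(\Sigma^{-1}A(\alpha))\eta + \epsilon'\epsilon - 2\big(\epsilon'H(\alpha)\epsilon - \sigma_\epsilon^2\mathrm{tr}(H(\alpha))\big) \\
&= L(\alpha) + 2\sigma_\epsilon^2\epsilon'(\Sigma^{-1}A(\alpha))\mu + 2\sigma_\epsilon^2\epsilon'\Sigma^{-1}\eta - 2\sigma_\epsilon^2\epsilon'\Sigma^{-1}M(\alpha)\eta + \epsilon'\epsilon \\
&\quad - 2\big(\epsilon'\Sigma_\eta\Sigma^{-1}\epsilon - \sigma_\epsilon^2\mathrm{tr}(\Sigma_\eta\Sigma^{-1})\big) - 2\sigma_\epsilon^2\big(\epsilon'\Sigma^{-1}M(\alpha)\epsilon - \sigma_\epsilon^2\mathrm{tr}(\Sigma^{-1}M(\alpha))\big),
\end{aligned}
\tag{6.27}
\]
where the last equality follows from $H(\alpha) = \Sigma_\eta\Sigma^{-1} + \sigma_\epsilon^2\Sigma^{-1}M(\alpha)$ by (3.16). Note that $2\sigma_\epsilon^2\epsilon'\Sigma^{-1}\eta + \epsilon'\epsilon - 2\big(\epsilon'\Sigma_\eta\Sigma^{-1}\epsilon - \sigma_\epsilon^2\mathrm{tr}(\Sigma_\eta\Sigma^{-1})\big)$ does not depend on $\alpha$ and is therefore constant in variable selection. It remains to show that for $\alpha \in \mathcal{A}\setminus\mathcal{A}^c$,
\[
\Gamma_{CAIC}(\alpha) = \text{constant} + L^*(\alpha) + o_p(L^*(\alpha)), \tag{6.28}
\]
where $L^*(\alpha) = L(\alpha) - \|S - E(S|Z)\|^2$, for which it suffices to show that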
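\[
\operatorname*{plim}_{n\to\infty} \sup_{\alpha\in\mathcal{A}\setminus\mathcal{A}^c} \frac{|\epsilon'(\Sigma^{-1}A(\alpha))\mu|}{E(L^*(\alpha))} = 0, \tag{6.29}
\]
\[
\operatorname*{plim}_{n\to\infty} \sup_{\alpha\in\mathcal{A}\setminus\mathcal{A}^c} \frac{|\epsilon'(\Sigma^{-1}M(\alpha))\eta|}{E(L^*(\alpha))} = 0, \tag{6.30}
\]
\[
\operatorname*{plim}_{n\to\infty} \sup_{\alpha\in\mathcal{A}\setminus\mathcal{A}^c} \frac{|\epsilon'\Sigma^{-1}M(\alpha)\epsilon - \sigma_\epsilon^2\mathrm{tr}(\Sigma^{-1}M(\alpha))|}{E(L^*(\alpha))} = 0, \tag{6.31}
\]
\[
\operatorname*{plim}_{n\to\infty} \sup_{\alpha\in\mathcal{A}\setminus\mathcal{A}^c} \Big|\frac{L^*(\alpha)}{E(L^*(\alpha))} - 1\Big| = 0. \tag{6.32}
\]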
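Hence, by (6.28), for $\hat\alpha_{CAIC}$ defined in (6.2) and $\alpha_L = \arg\min_{\alpha\in\mathcal{A}} L^*(\alpha)$, we can easily conclude that
\[
\begin{aligned}
\Gamma_{CAIC}(\hat\alpha_{CAIC}) &= \text{constant} + L^*(\hat\alpha_{CAIC}) + o_p(L^*(\hat\alpha_{CAIC})), \\
\Gamma_{CAIC}(\alpha_L) &= \text{constant} + L^*(\alpha_L) + o_p(L^*(\alpha_L)).
\end{aligned}
\]
It follows that
\[
0 \le \frac{\Gamma_{CAIC}(\alpha_L) - \Gamma_{CAIC}(\hat\alpha_{CAIC})}{L^*(\hat\alpha_{CAIC})} = \frac{L^*(\alpha_L) - L^*(\hat\alpha_{CAIC})}{L^*(\hat\alpha_{CAIC})} + o_p(1),
\]
and then
\[
\operatorname*{plim}_{n\to\infty} \frac{L^*(\alpha_L) - L^*(\hat\alpha_{CAIC})}{L^*(\hat\alpha_{CAIC})} = 0,
\]
which gives
\[
\operatorname*{plim}_{n\to\infty} L^*(\hat\alpha_{CAIC}) \big/ \inf_{\alpha\in\mathcal{A}} L^*(\alpha) = 1
\]
when $\mathcal{A}^c = \emptyset$.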
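We first calculate $E(L^*(\alpha))$. By (9.1), we have
\[
\begin{aligned}
E(L^*(\alpha)) &= E(L(\alpha)) - E\|S - E(S|Z)\|^2 \\
&= E(L(\alpha)) - \sigma_\epsilon^2\mathrm{tr}(\Sigma_\eta\Sigma^{-1}) \\
&= \sigma_\epsilon^4\mu'A(\alpha)'\Sigma^{-2}A(\alpha)\mu + \sigma_\epsilon^4\mathrm{tr}(\Sigma^{-1}M(\alpha)),
\end{aligned}
\tag{6.33}
\]
by (3.17). We now prove (6.29)-(6.32) one by one. For (6.29), the proof follows that of (6.6) with $E(L(\alpha))$ replaced by (6.33).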
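The closed form (6.33) can be cross-checked against a direct second-moment calculation of $E\|S - H(\alpha)Z\|^2$. The sketch below assumes that $\eta$ and $\epsilon$ are independent with mean zero and covariances $\Sigma_\eta$ and $\sigma_\epsilon^2 I$, and uses the GLS form of $M(\alpha)$ (an assumption, since (3.16) is not reproduced in this section); the function name and settings are hypothetical.

```python
import numpy as np

def risk_two_ways(n=80, p=2, sigma2_eps=0.5, seed=2):
    """Return E(L(alpha)) computed directly and via (6.33) + sigma_eps^2 tr(Sigma_eta Sigma^-1)."""
    rng = np.random.default_rng(seed)
    loc = np.arange(1, n + 1) / np.sqrt(n)                  # design with delta = 1/2
    sigma_eta = np.exp(-np.abs(loc[:, None] - loc[None, :]))
    sigma_inv = np.linalg.inv(sigma_eta + sigma2_eps * np.eye(n))
    x = rng.standard_normal((n, p))
    mu = rng.standard_normal(n)      # mean outside span(x): alpha is an incorrect model
    m = x @ np.linalg.solve(x.T @ sigma_inv @ x, x.T @ sigma_inv)
    a = np.eye(n) - m
    h = sigma_eta @ sigma_inv + sigma2_eps * sigma_inv @ m
    b = np.eye(n) - h                # S - H Z = b (mu + eta) - h eps

    # Direct calculation: squared bias + variance from eta + variance from eps.
    direct = ((b @ mu) @ (b @ mu)
              + np.trace(b @ sigma_eta @ b.T)
              + sigma2_eps * np.trace(h @ h.T))

    # Closed form: the two terms of (6.33) plus the kriging term.
    bias = sigma2_eps ** 2 * mu @ a.T @ sigma_inv @ sigma_inv @ a @ mu
    closed = (bias
              + sigma2_eps ** 2 * np.trace(sigma_inv @ m)
              + sigma2_eps * np.trace(sigma_eta @ sigma_inv))
    return float(direct), float(closed)

direct, closed = risk_two_ways()
print(direct, closed)
```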
For (6.30), we have for any $\varepsilon > 0$,
where the third inequality follows from $\Sigma^{-1/2}\Sigma_\eta\Sigma^{-1/2} \le I$, the second last inequality follows from (6.33) and the last equality follows from (6.26).
For (6.31), we have for any $\varepsilon > 0$,
where the second inequality is an application of Theorem 2 of Whittle (1960) for some $c_1 > 0$, the third and fourth inequalities follow from
\[
\sigma_\epsilon^4\mathrm{tr}(M(\alpha)'\Sigma^{-2}M(\alpha)) \le \sigma_\epsilon^2\mathrm{tr}(M(\alpha)'\Sigma^{-1}M(\alpha)) = \sigma_\epsilon^2\mathrm{tr}(\Sigma^{-1}M(\alpha)) \le \sigma_\epsilon^{-2}E(L^*(\alpha)),
\]
and the last equality follows from (6.26).
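Now, it remains to show (6.32). We first expand $L^*(\alpha)$ from (6.10):
\[
\begin{aligned}
L^*(\alpha) &= L(\alpha) - \|S - E(S|Z)\|^2 \\
&= L(\alpha) - \|\sigma_\epsilon^2\Sigma^{-1}\eta - \Sigma_\eta\Sigma^{-1}\epsilon\|^2 \\
&= \sigma_\epsilon^4\mu'A(\alpha)'\Sigma^{-2}A(\alpha)\mu + \sigma_\epsilon^4(\eta + \epsilon)'M(\alpha)'\Sigma^{-2}M(\alpha)(\eta + \epsilon) \\
&\quad + 2\sigma_\epsilon^2\mu'A(\alpha)'\Sigma^{-1}(\sigma_\epsilon^2\Sigma^{-1}\eta - \Sigma_\eta\Sigma^{-1}\epsilon) - 2\sigma_\epsilon^4\mu'A(\alpha)'\Sigma^{-2}M(\alpha)(\eta + \epsilon) \\
&\quad - 2\sigma_\epsilon^2(\sigma_\epsilon^2\Sigma^{-1}\eta - \Sigma_\eta\Sigma^{-1}\epsilon)'\Sigma^{-1}M(\alpha)(\eta + \epsilon),
\end{aligned}
\tag{6.34}
\]
where the second equality follows from (9.1). Together with (3.17), it follows that
\[
\begin{aligned}
L^*(\alpha) - E(L^*(\alpha)) &= \sigma_\epsilon^4(\eta + \epsilon)'M(\alpha)'\Sigma^{-2}M(\alpha)(\eta + \epsilon) - \sigma_\epsilon^4\mathrm{tr}(\Sigma^{-1}M(\alpha)) \\
&\quad + 2\sigma_\epsilon^2\mu'A(\alpha)'\Sigma^{-1}(\sigma_\epsilon^2\Sigma^{-1}\eta - \Sigma_\eta\Sigma^{-1}\epsilon) - 2\sigma_\epsilon^4\mu'A(\alpha)'\Sigma^{-2}M(\alpha)(\eta + \epsilon) \\
&\quad - 2\sigma_\epsilon^2(\sigma_\epsilon^2\Sigma^{-1}\eta - \Sigma_\eta\Sigma^{-1}\epsilon)'\Sigma^{-1}M(\alpha)(\eta + \epsilon).
\end{aligned}
\]
Equation (6.32) then follows from plim
Note that the proofs of (6.35)-(6.38) follow those of (6.12)-(6.15) with $E(L(\alpha))$ replaced by $E(L^*(\alpha))$. Hence, (6.32) follows, which completes the proof when $\mathcal{A}^c = \emptyset$.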
Now, we suppose that $\mathcal{A}^c = \{\alpha^c\}$. To show that the CAIC is still asymptotically loss efficient, it remains to show that for fixed $\alpha^c$,
\[
L^*(\alpha^c) = o_p(L^*(\alpha)); \quad \text{if } \alpha \in \mathcal{A}\setminus\mathcal{A}^c, \tag{6.39}
\]
\[
\Gamma_{CAIC}(\alpha^c) = \text{constant} + L^*(\alpha^c) + o_p(L^*(\alpha)). \tag{6.40}
\]
Hence, by (6.39) and (6.40), we can easily conclude that
$\lim_{n\to\infty} P$
We now prove (6.39). Equation (6.39) follows from (6.32) and plim
Equation (6.41) then follows from plim
Note that (6.42) follows similarly from the proof of (6.35), (6.43) is trivial since $\sigma_\epsilon^2\mathrm{tr}(\Sigma^{-1}M(\alpha^c)) \le p(\alpha^c) < \infty$, and (6.44) follows similarly from the proof of (6.38). This gives (6.39).
We now prove (6.40). By (6.27), we have
\[
\Gamma_{CAIC}(\alpha^c) = \text{constant} + L^*(\alpha^c) - 2\sigma_\epsilon^2\epsilon'\Sigma^{-1}M(\alpha^c)\eta - 2\sigma_\epsilon^2\big(\epsilon'\Sigma^{-1}M(\alpha^c)\epsilon - \sigma_\epsilon^2\mathrm{tr}(\Sigma^{-1}M(\alpha^c))\big).
\]
Equation (6.40) then follows from
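\[
\operatorname*{plim}_{n\to\infty} \sup_{\alpha\in\mathcal{A}\setminus\mathcal{A}^c} \frac{|\epsilon'\Sigma^{-1}M(\alpha^c)\eta|}{E(L^*(\alpha))} = 0, \qquad
\operatorname*{plim}_{n\to\infty} \sup_{\alpha\in\mathcal{A}\setminus\mathcal{A}^c} \frac{|\epsilon'\Sigma^{-1}M(\alpha^c)\epsilon - \sigma_\epsilon^2\mathrm{tr}(\Sigma^{-1}M(\alpha^c))|}{E(L^*(\alpha))} = 0,
\]
which follow easily from (6.30) and (6.31). This gives (6.40) and completes the proof. $\Box$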
An example of Theorem 15 is given here.
Corollary 9 Consider a class of models given by (3.1) with $x_j(s)$'s independently generated from white-noise processes of (5.7), where $p$ is fixed and $\mathcal{A}^c = \{\alpha^c\}$. If $\lim_{n\to\infty} \mathrm{tr}(\Sigma^{-2}) = \infty$, then
\[
\operatorname*{plim}_{n\to\infty} \frac{L(\hat\alpha_{CAIC}) - \|S - E(S|Z)\|^2}{\inf_{\alpha\in\mathcal{A}} L(\alpha) - \|S - E(S|Z)\|^2} = 1.
\]
A model with the smallest value of $L(\alpha)$ might not exist for $|\mathcal{A}^c| \ge 2$. If there are at least two correct models with fixed dimensions in $\mathcal{A}^c$, no asymptotic optimality property holds at the level of loss comparison. We are then interested in whether the model selection procedure still has some optimality properties in terms of $E(L(\alpha))$ when $|\mathcal{A}^c| \ge 2$.
Hence, we need a much heavier penalty on model dimension to select $\alpha^c$ among $\mathcal{A}^c$.