α∈Ainf LKL(α) = 1.
Proof. The proof is essentially the same as that of Theorem 2 of Shao (1997) and hence is omitted. □
Theorem 2 reduces to Theorem 2 of Shao (1997) if $\Sigma = \sigma^2 I$. Similar to (4.5), Equation (4.7) provides a condition for risks associated with underfitted models. Equation (4.8) is a weak technical condition that holds trivially when p is fixed. In fact, (4.7) is slightly weaker than the two conditions given in Theorem 2 of Shao (1997):
$$\lim_{n\to\infty}\inf_{\alpha\in A\setminus A_c} E(L_{KL}(\alpha))/n > 0 \quad\text{and}\quad \lim_{n\to\infty}\lambda p/n = 0.$$
Similar to Corollary 1, we have the following corollary.
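Corollary 2 Consider a class of models given by (3.1) with the $x_j(s)$'s independently generated from the white-noise processes of (5.7), where p is fixed and $A_c \neq \emptyset$. If $\lim_{n\to\infty}\mathrm{tr}(\Sigma^{-1})/\lambda = \infty$ and $\lambda\to\infty$, then
$$\lim_{n\to\infty} P\big(\hat\alpha_{GIC(\lambda)} = \alpha_c\big) = 1.$$
In addition,
$$\operatorname*{plim}_{n\to\infty} L_{KL}(\hat\alpha_{GIC(\lambda)}) \Big/ \inf_{\alpha\in A} L_{KL}(\alpha) = 1.$$
Similar to the remark given right after Corollary 1, $\lim_{n\to\infty}\lambda\,\mathrm{tr}(\Sigma)/n^2 = 0$ is sufficient for $\lim_{n\to\infty}\mathrm{tr}(\Sigma^{-1})/\lambda = \infty$ (see an example in Theorem 12 of Section 5.3).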
4.3 Unknown Covariance Parameters
In practice, the covariance parameter vector θ is usually unknown and needs to be estimated. Two approaches are commonly applied in this situation. The first is a two-step procedure that first estimates the covariance parameters using, for example, ML or REML, and then treats the estimated parameters as known for subsequent inference or prediction. The second is a Bayesian method that requires specifying a joint prior distribution for all the unknown parameters. Here we consider only the former, with $\hat\theta(\alpha)$ being the ML estimate of θ for α ∈ A, obtained by maximizing the following profile log-likelihood function,
$$\ell(\theta;\alpha) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\log\det(\Sigma(\theta)) - \frac{1}{2}\big(Z - X(\alpha)\hat\beta(\alpha;\theta)\big)'\Sigma^{-1}(\theta)\big(Z - X(\alpha)\hat\beta(\alpha;\theta)\big), \tag{4.9}$$
where $\hat\beta(\alpha;\theta) \equiv (X(\alpha)'\Sigma^{-1}(\theta)X(\alpha))^{-1}X(\alpha)'\Sigma^{-1}(\theta)Z$ and Σ is written as Σ(θ) to emphasize its dependence on θ. Let Θ be the parameter space for θ, and let $\theta_0 \in \Theta$ be the true covariance parameter vector. We shall develop asymptotic properties of GIC,
$$\Gamma_{GIC(\lambda)}(\alpha) = -2\ell(\hat\theta(\alpha);\alpha) + \lambda p(\alpha), \tag{4.10}$$
under both the fixed domain and the increasing domain asymptotic frameworks. The main difficulty to overcome is that some components of $\hat\theta(\alpha)$ may converge to nondegenerate distributions even for $\alpha \in A_c$ under the fixed domain asymptotic framework.
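As a rough illustration of (4.9) and (4.10), the following sketch evaluates the profile log-likelihood and the GIC value for a given θ. It assumes, for illustration only, that Σ(θ) is an exponential covariance plus a nugget term (the model used later in Chapter 5); the function names `exp_cov`, `profile_loglik` and `gic` are hypothetical, and the maximization over θ is left to any numerical optimizer.

```python
import numpy as np

def exp_cov(s, sigma2_eta, kappa_eta, sigma2_eps):
    """Assumed form of Sigma(theta): exponential covariance plus a nugget."""
    d = np.abs(s[:, None] - s[None, :])
    return sigma2_eta * np.exp(-kappa_eta * d) + sigma2_eps * np.eye(len(s))

def profile_loglik(theta, Z, X, s):
    """Profile log-likelihood (4.9): beta is replaced by its GLS estimate."""
    n = len(Z)
    Sigma = exp_cov(s, *theta)
    chol = np.linalg.cholesky(Sigma)
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    resid = Z
    if X.shape[1] > 0:  # the null model has no regressors
        SiX = np.linalg.solve(Sigma, X)               # Sigma^{-1} X
        beta = np.linalg.solve(X.T @ SiX, SiX.T @ Z)  # GLS estimate of beta
        resid = Z - X @ beta
    quad = resid @ np.linalg.solve(Sigma, resid)
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * logdet - 0.5 * quad

def gic(theta_hat, Z, X, s, lam):
    """GIC value (4.10) for the model with design matrix X and penalty lam."""
    return -2.0 * profile_loglik(theta_hat, Z, X, s) + lam * X.shape[1]
```

In use, one would compute `gic` for every candidate model α (each with its own ML estimate of θ) and select the model with the smallest value.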
We impose some regularity conditions for establishing asymptotic properties of GIC. Denote by $\lambda_{\min}(M)$ and $\lambda_{\max}(M)$ the smallest and the largest eigenvalues of a symmetric matrix M. Suppose that there exists $\tau_n \to \infty$ such that the following are satisfied:
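(A.1) For θ ∈ Θ, $\lim_{n\to\infty} \frac{1}{\tau_n}\inf_{\alpha\in A\setminus A_c} \mu'A(\alpha;\theta)'\Sigma^{-1}(\theta)A(\alpha;\theta)\mu > 0$, where $A(\alpha;\theta)$ is defined in (3.6).
(A.2) For θ ∈ Θ, $\lim_{n\to\infty} \frac{1}{\tau_n}\lambda_{\min}\big(X'\Sigma^{-1}(\theta)X\big) > 0$.
(A.3) For θ ∈ Θ, $\lim_{n\to\infty} \frac{1}{\tau_n}\lambda_{\max}\big(X'\Sigma^{-1}(\theta)\Sigma(\theta_0)\Sigma^{-1}(\theta)X\big) < \infty$.
(A.4) For α ∈ A, there exists some $\theta_\alpha \in \Theta$ such that $\operatorname*{plim}_{n\to\infty} \frac{1}{\tau_n}\big(\ell(\hat\theta(\alpha);\alpha) - \ell(\theta_\alpha;\alpha)\big) = 0$.
(A.5) For $\alpha \in A\setminus A_c$ and $\theta_\alpha$ given in (A.4), $\operatorname*{plim}_{n\to\infty} \frac{1}{\tau_n}\big(L_{KL}(\alpha;\hat\theta(\alpha)) - L_{KL}(\alpha;\theta_\alpha)\big) = 0$.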
In most cases, $\tau_n$ can be chosen as $\inf_{j\in\alpha_c} X_j'\Sigma^{-1}(\theta)X_j$ or $\lambda_{\min}\big(X'\Sigma^{-1}(\theta)X\big)$, where $X_j$ is the jth column of X (see Theorems 7, 10 and 13). Condition (A.1) quantifies the loss incurred by applying an incorrect model. Condition (A.2) ensures that the explanatory variables are not too highly correlated. Obviously, (A.4) and (A.5) hold when $\operatorname*{plim}_{n\to\infty}\hat\theta(\alpha) = \theta_\alpha$ for some $\theta_\alpha \in \Theta$. In some situations, $\theta_\alpha$ is different from $\theta_0$. For example, when $\alpha \in A\setminus A_c$, $\hat\theta(\alpha)$ generally does not converge in probability to $\theta_0$. Surprisingly, (A.4) and (A.5) may hold even if $\hat\theta(\alpha)$ converges to a nondegenerate distribution (see Theorems 7, 10 and 13).
Theorem 3 Consider a class of models given by (3.1) with p fixed. Let Θ be a compact parameter space for θ with θ0 ∈ Θ being the true parameter, and let LKL(α) be the KL loss defined in (3.3). Suppose that for α ∈ A, `(θ; α) defined in (4.9) is continuous in Θ, and (A.1)-(A.5) are satisfied for some τn→ ∞.
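(i) For $A_c = \emptyset$, if $\tau_n/\lambda \to \infty$, and the following two conditions hold for α ∈ A:
$$\lim_{n\to\infty}\sup_{\alpha\in A\setminus A_c} \frac{1}{\tau_n}\mu'A(\alpha;\theta)'\Sigma^{-1}(\theta)\Sigma(\theta_0)\Sigma^{-1}(\theta)A(\alpha;\theta)\mu < \infty, \tag{4.11}$$
$$\operatorname*{plim}_{n\to\infty} \frac{1}{\tau_n}\mathrm{tr}\Big(\big((\eta+\epsilon)(\eta+\epsilon)' - \Sigma(\theta_0)\big)\big(\Sigma^{-1}(\theta_\alpha) - \Sigma^{-1}(\theta_0)\big)\Big) = 0, \tag{4.12}$$
then GIC defined in (4.2) is asymptotically loss efficient:
$$\operatorname*{plim}_{n\to\infty} L_{KL}\big(\hat\theta(\hat\alpha_{GIC(\lambda)});\hat\alpha_{GIC(\lambda)}\big) \Big/ \min_{\alpha\in A} L_{KL}(\hat\theta(\alpha);\alpha) = 1. \tag{4.13}$$
(ii) For $A_c \neq \emptyset$, if $\lambda\to\infty$, $\tau_n/\lambda\to\infty$, (4.12) holds, and
$$\lim_{n\to\infty}\frac{1}{\tau_n}\Big(\log\det(\Sigma(\theta_\alpha)) - \log\det(\Sigma(\theta_0)) + \mathrm{tr}\big(\Sigma(\theta_0)\Sigma^{-1}(\theta_\alpha)\big) - n\Big) = 0, \tag{4.14}$$
for $\alpha \in A_c$, then
$$\lim_{n\to\infty} P\big(\hat\alpha_{GIC(\lambda)} = \alpha_c\big) = 1.$$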
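Proof. (i) We first prove that for α ∈ A,
$$\Gamma_{GIC(\lambda)}(\alpha) = n\log(2\pi) + \log\det(\Sigma(\theta_0)) + (\eta+\epsilon)'\Sigma^{-1}(\theta_0)(\eta+\epsilon) + 2L_{KL}(\alpha;\theta_\alpha) + o_p(\tau_n). \tag{4.15}$$
By (3.3) and (3.7), we can rewrite $2L_{KL}(\alpha;\theta_\alpha)$ as
$$\begin{aligned} 2L_{KL}(\alpha;\theta_\alpha) ={}& \log\det(\Sigma(\theta_\alpha)) - \log\det(\Sigma(\theta_0)) + \mathrm{tr}\big(\Sigma(\theta_0)\Sigma^{-1}(\theta_\alpha)\big) - n \\ &+ \mu'A(\alpha;\theta_\alpha)'\Sigma^{-1}(\theta_\alpha)A(\alpha;\theta_\alpha)\mu + (\eta+\epsilon)'M(\alpha;\theta_\alpha)'\Sigma^{-1}(\theta_\alpha)(\eta+\epsilon), \end{aligned} \tag{4.16}$$
where $M(\alpha;\theta_\alpha)$ and $A(\alpha;\theta_\alpha)$ are defined in (3.5) and (3.6). By (4.10), we have for α ∈ A,
$$\begin{aligned} \Gamma_{GIC(\lambda)}(\alpha) ={}& -2\ell(\hat\theta(\alpha);\alpha) + 2\ell(\theta_\alpha;\alpha) - 2\ell(\theta_\alpha;\alpha) + \lambda p(\alpha) \\ ={}& -2\ell(\theta_\alpha;\alpha) + \lambda p(\alpha) + o_p(\tau_n) \\ ={}& n\log(2\pi) + \log\det(\Sigma(\theta_\alpha)) + \mu'A(\alpha;\theta_\alpha)'\Sigma^{-1}(\theta_\alpha)A(\alpha;\theta_\alpha)\mu \\ &+ 2\mu'A(\alpha;\theta_\alpha)'\Sigma^{-1}(\theta_\alpha)A(\alpha;\theta_\alpha)(\eta+\epsilon) + (\eta+\epsilon)'\Sigma^{-1}(\theta_\alpha)(\eta+\epsilon) \\ &- (\eta+\epsilon)'M(\alpha;\theta_\alpha)'\Sigma^{-1}(\theta_\alpha)(\eta+\epsilon) + o_p(\tau_n) \\ ={}& n\log(2\pi) + \log\det(\Sigma(\theta_\alpha)) + \mu'A(\alpha;\theta_\alpha)'\Sigma^{-1}(\theta_\alpha)A(\alpha;\theta_\alpha)\mu + (\eta+\epsilon)'\Sigma^{-1}(\theta_\alpha)(\eta+\epsilon) + o_p(\tau_n) \\ ={}& n\log(2\pi) + \log\det(\Sigma(\theta_0)) + (\eta+\epsilon)'\Sigma^{-1}(\theta_0)(\eta+\epsilon) + 2L_{KL}(\alpha;\theta_\alpha) \\ &+ \mathrm{tr}\Big(\big((\eta+\epsilon)(\eta+\epsilon)' - \Sigma(\theta_0)\big)\big(\Sigma^{-1}(\theta_\alpha) - \Sigma^{-1}(\theta_0)\big)\Big) + o_p(\tau_n) \\ ={}& n\log(2\pi) + \log\det(\Sigma(\theta_0)) + (\eta+\epsilon)'\Sigma^{-1}(\theta_0)(\eta+\epsilon) + 2L_{KL}(\alpha;\theta_\alpha) + o_p(\tau_n), \end{aligned}$$
where the second equality follows from (A.4), the third equality follows from (4.9) together with $\lambda p(\alpha) = o(\tau_n)$, the fourth equality follows from the following two equations, which will be proved later:
$$(\eta+\epsilon)'M(\alpha;\theta_\alpha)'\Sigma^{-1}(\theta_\alpha)(\eta+\epsilon) = O_p(1); \quad \alpha\in A, \tag{4.17}$$
$$\mu'A(\alpha;\theta_\alpha)'\Sigma^{-1}(\theta_\alpha)A(\alpha;\theta_\alpha)(\eta+\epsilon) = o_p(\tau_n); \quad \alpha\in A, \tag{4.18}$$
the fifth equality follows from (4.16) and
$$\begin{aligned} &(\eta+\epsilon)'\Sigma^{-1}(\theta_\alpha)(\eta+\epsilon) - (\eta+\epsilon)'\Sigma^{-1}(\theta_0)(\eta+\epsilon) + n - \mathrm{tr}\big(\Sigma(\theta_0)\Sigma^{-1}(\theta_\alpha)\big) \\ &\quad = \mathrm{tr}\Big(\big((\eta+\epsilon)(\eta+\epsilon)' - \Sigma(\theta_0)\big)\big(\Sigma^{-1}(\theta_\alpha) - \Sigma^{-1}(\theta_0)\big)\Big), \end{aligned}$$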
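and the last equality follows from (4.12). It remains to show (4.17) and (4.18). For (4.17), we have
$$\begin{aligned} &(\eta+\epsilon)'M(\alpha;\theta_\alpha)'\Sigma^{-1}(\theta_\alpha)(\eta+\epsilon) \\ &\quad = \left(\frac{(\eta+\epsilon)'\Sigma^{-1}(\theta_\alpha)X(\alpha)}{\tau_n^{1/2}}\right)\left(\frac{X(\alpha)'\Sigma^{-1}(\theta_\alpha)X(\alpha)}{\tau_n}\right)^{-1}\left(\frac{X(\alpha)'\Sigma^{-1}(\theta_\alpha)(\eta+\epsilon)}{\tau_n^{1/2}}\right). \end{aligned} \tag{4.19}$$
By (A.2),
$$\left(\frac{X'\Sigma^{-1}(\theta_\alpha)X}{\tau_n}\right)^{-1} = O_p(1). \tag{4.20}$$
By (A.3),
$$\lim_{n\to\infty}\frac{1}{\tau_n}\mathrm{var}\big(X_j'\Sigma^{-1}(\theta_\alpha)(\eta+\epsilon)\big) = \lim_{n\to\infty}\frac{1}{\tau_n}X_j'\Sigma^{-1}(\theta_\alpha)\Sigma(\theta_0)\Sigma^{-1}(\theta_\alpha)X_j < \infty,$$
where $X_j$ is the jth column of X. This together with $E\big(X_j'\Sigma^{-1}(\theta_\alpha)(\eta+\epsilon)\big) = 0$ implies that
$$\frac{1}{\tau_n^{1/2}}X_j'\Sigma^{-1}(\theta_\alpha)(\eta+\epsilon) = O_p(1). \tag{4.21}$$
Therefore, (4.17) follows from (4.19)-(4.21). Using
$$\mu'A(\alpha;\theta_\alpha)'\Sigma^{-1}(\theta_\alpha)A(\alpha;\theta_\alpha)(\eta+\epsilon) = \mu'A(\alpha;\theta_\alpha)'\Sigma^{-1}(\theta_\alpha)(\eta+\epsilon),$$
(4.11) and Markov's inequality, we have for any $\varepsilon > 0$,
$$\begin{aligned} &\lim_{n\to\infty}P\big(|\mu'A(\alpha;\theta_\alpha)'\Sigma^{-1}(\theta_\alpha)(\eta+\epsilon)/\tau_n| > \varepsilon\big) \\ &\quad\le \lim_{n\to\infty}P\big(|\mu'A(\alpha;\theta_\alpha)'\Sigma^{-1}(\theta_\alpha)(\eta+\epsilon)/\tau_n|^2 \ge \varepsilon^2\big) \\ &\quad\le \lim_{n\to\infty}\frac{1}{\varepsilon^2\tau_n^2}E\big(\mu'A(\alpha;\theta_\alpha)'\Sigma^{-1}(\theta_\alpha)\Sigma(\theta_0)\Sigma^{-1}(\theta_\alpha)A(\alpha;\theta_\alpha)\mu\big) = 0. \end{aligned}$$
This gives (4.18). Thus (4.15) is obtained.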
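We are now ready to prove (4.13). Let $\alpha_L = \arg\min_{\alpha\in A} L_{KL}(\alpha;\theta_\alpha)$. By (4.15), we have
$$0 \le \operatorname*{plim}_{n\to\infty}\frac{\Gamma_{GIC(\lambda)}(\alpha_L) - \Gamma_{GIC(\lambda)}(\hat\alpha_{GIC(\lambda)})}{\tau_n} = \operatorname*{plim}_{n\to\infty}\frac{2\big(L_{KL}(\alpha_L;\theta_{\alpha_L}) - L_{KL}(\hat\alpha_{GIC(\lambda)};\theta_{\hat\alpha_{GIC(\lambda)}})\big)}{\tau_n} \le 0,$$
for some $\theta_{\hat\alpha_{GIC(\lambda)}}, \theta_{\alpha_L} \in \Theta$, where the first inequality follows from the definition of $\hat\alpha_{GIC(\lambda)}$, the equality follows from (4.15) and the last inequality follows from the definition of $\alpha_L$. It follows that
$$\operatorname*{plim}_{n\to\infty}\frac{L_{KL}(\hat\alpha_{GIC(\lambda)};\theta_{\hat\alpha_{GIC(\lambda)}}) - L_{KL}(\alpha_L;\theta_{\alpha_L})}{\tau_n} = 0. \tag{4.22}$$
In addition, by (A.1) and (4.16),
$$\operatorname*{plim}_{n\to\infty}\frac{2}{\tau_n}L_{KL}(\alpha;\theta_\alpha) \ge \operatorname*{plim}_{n\to\infty}\frac{1}{\tau_n}\mu'A(\alpha;\theta_\alpha)'\Sigma^{-1}(\theta_\alpha)A(\alpha;\theta_\alpha)\mu > 0.$$
This together with (4.22) implies that
$$\operatorname*{plim}_{n\to\infty} L_{KL}(\alpha_L;\theta_{\alpha_L}) \Big/ L_{KL}(\hat\alpha_{GIC(\lambda)};\theta_{\hat\alpha_{GIC(\lambda)}}) = 1.$$
Then by (A.5),
$$\operatorname*{plim}_{n\to\infty} L_{KL}(\alpha_L;\hat\theta(\alpha_L)) \Big/ L_{KL}(\hat\alpha_{GIC(\lambda)};\hat\theta(\hat\alpha_{GIC(\lambda)})) = 1,$$
which gives (4.13). This completes the proof of (i).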
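(ii) We first prove (4.15) for $A_c \neq \emptyset$. The proof is essentially the same as that in (i) except that (4.18) needs to be shown as follows:
$$\begin{aligned} &\mu'A(\alpha;\theta_\alpha)'\Sigma^{-1}(\theta_\alpha)A(\alpha;\theta_\alpha)(\eta+\epsilon) \\ &\quad= \mu'A(\alpha;\theta_\alpha)'\Sigma^{-1}(\theta_\alpha)(\eta+\epsilon) \\ &\quad= \beta(\alpha_c\setminus\alpha)'X(\alpha_c\setminus\alpha)'\Sigma^{-1}(\theta_\alpha)(\eta+\epsilon) \\ &\qquad - \left(\frac{\beta(\alpha_c\setminus\alpha)'X(\alpha_c\setminus\alpha)'\Sigma^{-1}(\theta_\alpha)X(\alpha)}{\tau_n^{1/2}}\right)\left(\frac{X(\alpha)'\Sigma^{-1}(\theta_\alpha)X(\alpha)}{\tau_n}\right)^{-1}\left(\frac{X(\alpha)'\Sigma^{-1}(\theta_\alpha)(\eta+\epsilon)}{\tau_n^{1/2}}\right) \\ &\quad= \beta(\alpha_c\setminus\alpha)'X(\alpha_c\setminus\alpha)'\Sigma^{-1}(\theta_\alpha)(\eta+\epsilon) + O_p(1) \\ &\quad= o_p(\tau_n), \end{aligned}$$
where the second last equality follows similarly from the proof of (4.17) and the last equality follows from (4.21).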
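Second, we prove that $\lim_{n\to\infty}P\big(\Gamma_{GIC(\lambda)}(\alpha) > \Gamma_{GIC(\lambda)}(\alpha_c)\big) = 1$ for $\alpha \in A\setminus A_c$. By (A.4), we have for $\alpha \in A_c$,
$$\begin{aligned} \Gamma_{GIC(\lambda)}(\alpha) ={}& -2\ell(\theta_\alpha;\alpha) + o_p(\tau_n) \\ ={}& n\log(2\pi) + \log\det(\Sigma(\theta_\alpha)) + (\eta+\epsilon)'\Sigma^{-1}(\theta_\alpha)(\eta+\epsilon) \\ &- (\eta+\epsilon)'M(\alpha;\theta_\alpha)'\Sigma^{-1}(\theta_\alpha)(\eta+\epsilon) + o_p(\tau_n) \\ ={}& n\log(2\pi) + \log\det(\Sigma(\theta_\alpha)) + (\eta+\epsilon)'\Sigma^{-1}(\theta_\alpha)(\eta+\epsilon) + o_p(\tau_n), \end{aligned} \tag{4.23}$$
where the first equality follows from $\lambda p = o(\tau_n)$ and the last equality follows from (4.17).
Then, by (4.15) and (4.23), we have for $\alpha \in A\setminus A_c$,
$$\begin{aligned} &\Gamma_{GIC(\lambda)}(\alpha) - \Gamma_{GIC(\lambda)}(\alpha_c) \\ &\quad= 2L_{KL}(\alpha;\theta_\alpha) + \log\det(\Sigma(\theta_0)) + (\eta+\epsilon)'\Sigma^{-1}(\theta_0)(\eta+\epsilon) \\ &\qquad - \log\det(\Sigma(\theta_{\alpha_c})) - (\eta+\epsilon)'\Sigma^{-1}(\theta_{\alpha_c})(\eta+\epsilon) + o_p(\tau_n) \\ &\quad= 2L_{KL}(\alpha;\theta_\alpha) - \mathrm{tr}\Big(\big((\eta+\epsilon)(\eta+\epsilon)' - \Sigma(\theta_0)\big)\big(\Sigma^{-1}(\theta_{\alpha_c}) - \Sigma^{-1}(\theta_0)\big)\Big) \\ &\qquad - \Big(\log\det(\Sigma(\theta_{\alpha_c})) - \log\det(\Sigma(\theta_0)) + \mathrm{tr}\big(\Sigma(\theta_0)\Sigma^{-1}(\theta_{\alpha_c})\big) - n\Big) + o_p(\tau_n) \\ &\quad= 2L_{KL}(\alpha;\theta_\alpha) + o_p(\tau_n) > 0, \end{aligned}$$
as $n\to\infty$ with probability tending to 1, where the last equality follows from (4.12), (4.14) and (4.22). It follows that
$$\lim_{n\to\infty}P\big(\hat\alpha_{GIC(\lambda)} \in A\setminus A_c\big) = 0.$$
Last, it remains to show that GIC achieves its minimum at $\alpha_c$ among $\alpha \in A_c$. For as $n\to\infty$ with probability tending to 1, from which it follows that
$$\lim_{n\to\infty}P\big(\hat\alpha_{GIC(\lambda)} \in A_c,\ \hat\alpha_{GIC(\lambda)} \neq \alpha_c\big) = 0.$$
This completes the proof of the theorem. □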
Conditions (A.1)-(A.3) in Theorem 3 depend not only on the explanatory variables but also on the asymptotic framework. As shown in Theorem 7, those conditions are more easily satisfied under the increasing domain asymptotic framework, particularly when the domain increases with the sample size at a faster rate. On the other hand, (A.1) may not be satisfied under the fixed domain asymptotic framework.
Theorem 3 is for fixed designs. A random design version is given in the following corollary.
Corollary 3 (random design) Consider a class of models given by (3.1) with p fixed and X random, where X is independent of (η + ²). Let Θ be a compact parameter space for θ with θ0 ∈ Θ being the true parameter vector, and let LKL(α) be the KL loss defined
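(A.3') For θ ∈ Θ, $\lim_{n\to\infty}\frac{1}{\tau_n}\mathrm{tr}\big(\Sigma^{-1}(\theta)\Sigma(\theta_0)\Sigma^{-1}(\theta)E(X_jX_j')\big) < \infty$, where $X_j$ is the jth column of X,
and (A.4)-(A.5) are satisfied.
(i) For $A_c = \emptyset$, if $\tau_n/\lambda\to\infty$, (4.12) holds and
$$\lim_{n\to\infty}\sup_{\alpha\in A\setminus A_c}\frac{1}{\tau_n}E\big(\mu'A(\alpha;\theta)'\Sigma^{-1}(\theta)\Sigma(\theta_0)\Sigma^{-1}(\theta)A(\alpha;\theta)\mu\big) < \infty,$$
for θ ∈ Θ, then GIC defined in (4.2) is asymptotically loss efficient:
$$\operatorname*{plim}_{n\to\infty} L_{KL}\big(\hat\theta(\hat\alpha_{GIC(\lambda)});\hat\alpha_{GIC(\lambda)}\big) \Big/ \min_{\alpha\in A}L_{KL}(\hat\theta(\alpha);\alpha) = 1.$$
(ii) For $A_c \neq \emptyset$, if $\lambda\to\infty$, $\tau_n/\lambda\to\infty$, (4.12) and (4.14) hold, then
$$\lim_{n\to\infty}P\big(\hat\alpha_{GIC(\lambda)} = \alpha_c\big) = 1.$$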
Similar to (A.1)-(A.3) in Theorem 3 under fixed designs, (A.1’)-(A.3’) in Corollary 3 not only depend on explanatory variables but also depend on asymptotic frameworks.
In contrast to fixed designs with smooth functions as explanatory variables, where (A.1) may not be satisfied (see Theorem 7) under the fixed domain asymptotic framework, condition (A.1') appears to be more easily satisfied when random designs are considered (see some examples in Theorems 10 and 13).
Chapter 5
Exponential Covariance Models in One Dimension
In this chapter, we consider some examples in the one-dimensional space with η(·) of (2.2) generated from an exponential covariance function:
$$\mathrm{cov}(\eta(s),\eta(s')) = \sigma_\eta^2\exp(-\kappa_\eta|s-s'|); \quad s,s'\in\mathbb{R}, \tag{5.1}$$
where $\sigma_\eta^2 > 0$ and $\kappa_\eta > 0$. Let $s_i = in^{-(1-\delta)}$; $i = 1,\dots,n$, for some δ ∈ [0, 1). Then $\{\eta(s_1),\dots,\eta(s_n)\}$ can be expressed as an AR(1) process:
$$\eta(s_i) = \rho_n\eta(s_{i-1}) + \zeta_i, \tag{5.2}$$
where
$$\rho_n \equiv \exp(-\kappa_\eta n^{-(1-\delta)}), \tag{5.3}$$
$\eta(s_1)\sim N(0,\sigma_\eta^2)$, $\zeta_i\sim N(0,\sigma_\eta^2(1-\rho_n^2))$ is independent of $\eta(s_{i-1})$ for $i = 2,\dots,n$, and $\eta(s_1),\zeta_2,\dots,\zeta_n$ are independent. Then the covariance parameter vector can be written as $\theta\equiv(\sigma_\eta^2,\kappa_\eta,\sigma_\epsilon^2)'$.
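The AR(1) representation in (5.2) and (5.3) can be checked numerically. The sketch below compares the exponential covariance matrix of (5.1) on the grid $s_i = in^{-(1-\delta)}$ with the covariance $\sigma_\eta^2\rho_n^{|i-j|}$ implied by a stationary AR(1) process; the function name `ar1_gap` and the default parameter values are illustrative assumptions only.

```python
import numpy as np

def ar1_gap(n=50, delta=0.3, sigma2_eta=1.5, kappa_eta=2.0):
    """Largest entrywise gap between the exponential covariance matrix (5.1)
    on the grid s_i = i * n^{-(1-delta)} and the stationary AR(1) covariance
    sigma2_eta * rho_n^{|i-j|} with rho_n given by (5.3)."""
    h = n ** (-(1.0 - delta))              # grid spacing
    s = np.arange(1, n + 1) * h            # sampling locations
    rho = np.exp(-kappa_eta * h)           # AR(1) coefficient (5.3)
    Sigma_eta = sigma2_eta * np.exp(-kappa_eta * np.abs(s[:, None] - s[None, :]))
    lag = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    Sigma_ar1 = sigma2_eta * rho ** lag
    return float(np.max(np.abs(Sigma_eta - Sigma_ar1)))
```

The two matrices agree up to floating-point error because $\exp(-\kappa_\eta|s_i-s_j|) = \rho_n^{|i-j|}$ on an equally spaced grid.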
In what follows, we consider four examples corresponding to four different classes of explanatory variables in (3.1) with the exponential covariance model of (5.1) for η(·).
Example 1 (polynomials) Suppose that there are p explanatory variables, $x_j(s_i)$; $j = 1,\dots,p$, sampled at $s_i = in^{-(1-\delta)}$; $i = 1,\dots,n$, with $x_j(\cdot)$ given by
$$x_j(s) = s^j; \quad s\in\mathbb{R},\ j = 1,\dots,p, \tag{5.4}$$
where p is fixed and δ ∈ [0, 1).
Example 2 (polynomials varying with n) Suppose that there are p explanatory variables $x_j(s_i)$; $j = 1,\dots,p$, sampled at $s_i = in^{-(1-\delta)}$; $i = 1,\dots,n$, with $x_j(\cdot)$ given by
$$x_j(s) = (sn^{-\delta})^j; \quad s\in\mathbb{R},\ j = 1,\dots,p, \tag{5.5}$$
where p is fixed and δ ∈ [0, 1).
Example 3 (spatially dependent processes) Suppose that there are p explanatory variables $x_j(s_i)$; $j = 1,\dots,p$, sampled at $s_i = in^{-(1-\delta)}$; $i = 1,\dots,n$, where $x_1(\cdot),\dots,x_p(\cdot)$ are independent zero-mean Gaussian spatial processes with covariance functions,
$$\mathrm{cov}(x_j(s),x_j(s')) = \sigma_j^2\exp\{-\kappa_j|s-s'|\}; \quad s,s'\in\mathbb{R},\ j = 1,\dots,p, \tag{5.6}$$
p is fixed, δ ∈ [0, 1), and $\sigma_j^2,\kappa_j > 0$; $j = 1,\dots,p$.
Example 4 (white noise processes) Suppose that there are p explanatory variables $x_j(s_i)$; $j = 1,\dots,p$, sampled at $s_i = in^{-(1-\delta)}$; $i = 1,\dots,n$, where $x_1(\cdot),\dots,x_p(\cdot)$ are independent white-noise processes with
$$x_j(s_i)\sim N(0,\sigma_j^2); \quad i = 1,\dots,n,\ j = 1,\dots,p, \tag{5.7}$$
p is fixed, δ ∈ [0, 1), and $\sigma_j^2 > 0$; $j = 1,\dots,p$.
We shall characterize the asymptotic behavior of GIC under both the fixed domain and the increasing domain frameworks with θ being either known or estimated by ML.
We shall also show how the different generating mechanisms of the explanatory variables in the aforementioned examples affect the asymptotic behavior.
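First, we introduce some notation and a number of technical lemmas regarding exponential covariance functions, which are crucial for developing the asymptotic results of GIC. Let
$$G_k \equiv \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ -\rho_n & 1 & 0 & \ddots & \vdots \\ 0 & -\rho_n & 1 & \ddots & 0 \\ \vdots & \ddots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & -\rho_n & 1 \end{pmatrix}_{k\times k}, \tag{5.8}$$
$$T_k \equiv \begin{pmatrix} \sigma_\eta^2+\sigma_\epsilon^2 & -\sigma_\epsilon^2\rho_n & 0 & \cdots & 0 \\ -\sigma_\epsilon^2\rho_n & f_1(\rho_n) & -\sigma_\epsilon^2\rho_n & \ddots & \vdots \\ 0 & -\sigma_\epsilon^2\rho_n & f_1(\rho_n) & \ddots & 0 \\ \vdots & \ddots & \ddots & \ddots & -\sigma_\epsilon^2\rho_n \\ 0 & \cdots & 0 & -\sigma_\epsilon^2\rho_n & f_1(\rho_n) \end{pmatrix}_{k\times k}, \tag{5.9}$$
be k × k matrices, where
$$f_1(\rho_n) \equiv (1-\rho_n^2)\sigma_\eta^2 + (1+\rho_n^2)\sigma_\epsilon^2. \tag{5.10}$$
Lemma 4 Consider Σ(θ) and $\Sigma_\eta$ defined in (3.2) and (5.1), where $s_i = in^{-(1-\delta)}$; $i = 1,\dots,n$, and δ ∈ [0, 1). Then
$$\Sigma^{-1}(\theta) = G_n'T_n^{-1}G_n, \tag{5.11}$$
where $G_n$ and $T_n$ are given by (5.8) and (5.9), respectively.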
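The factorization in Lemma 4 is easy to check numerically for a small n. The sketch below builds $G_n$ and $T_n$ from (5.8)-(5.10) and compares $G_n'T_n^{-1}G_n$ with a direct inverse. It assumes that Σ(θ) in (3.2) equals the exponential covariance matrix plus $\sigma_\epsilon^2 I$; the function name `lemma4_gap` and the parameter values are hypothetical.

```python
import numpy as np

def lemma4_gap(n=40, delta=0.2, sigma2_eta=1.0, kappa_eta=1.5, sigma2_eps=0.7):
    """Largest entrywise gap between inv(Sigma) and G' T^{-1} G of (5.11)."""
    h = n ** (-(1.0 - delta))
    s = np.arange(1, n + 1) * h
    rho = np.exp(-kappa_eta * h)
    # Assumed form of Sigma(theta): exponential covariance plus nugget.
    Sigma = (sigma2_eta * np.exp(-kappa_eta * np.abs(s[:, None] - s[None, :]))
             + sigma2_eps * np.eye(n))
    # G_n of (5.8): ones on the diagonal, -rho_n on the first subdiagonal.
    G = np.eye(n) - rho * np.eye(n, k=-1)
    # T_n of (5.9): tridiagonal, with a modified (1, 1) entry.
    f1 = (1 - rho**2) * sigma2_eta + (1 + rho**2) * sigma2_eps
    T = f1 * np.eye(n) - sigma2_eps * rho * (np.eye(n, k=1) + np.eye(n, k=-1))
    T[0, 0] = sigma2_eta + sigma2_eps
    direct = np.linalg.inv(Sigma)
    factored = G.T @ np.linalg.solve(T, G)
    return float(np.max(np.abs(direct - factored)))
```

The identity holds because $G_n\Sigma_\eta G_n'$ is the diagonal covariance matrix of the AR(1) innovations and $\sigma_\epsilon^2 G_nG_n'$ is tridiagonal, so that $G_n\Sigma(\theta)G_n' = T_n$.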
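Lemma 5 For any c > 0 and δ ∈ [0, 1) with $n^{(1-\delta)/2+c} < n$, consider $T_{j_n}$ defined in (5.9) with $n^{(1-\delta)/2+c}\le j_n\le n$. Let $C_{j_n}(k,\ell)$ be the (k, ℓ)th element of $T_{j_n}^{-1}$. Then there exists a constant τ > 0 such that
$$\sigma_\epsilon^{-2j_n}\det(T_{j_n}) = \frac{f_2^{j_n-1}(\rho_n)}{(f_1^2(\rho_n)-4\rho_n^2\sigma_\epsilon^4)^{1/2}}\big((\sigma_\eta^2+\sigma_\epsilon^2)f_2(\rho_n)-\rho_n^2\sigma_\epsilon^2\big) + o(\exp(-\tau n^{c/2})), \tag{5.12}$$
where $\rho_n$ and $f_1(\rho_n)$ are given by (5.3) and (5.10), respectively, and
$$f_2(\rho_n) \equiv \frac{f_1(\rho_n) + (f_1^2(\rho_n)-4\rho_n^2\sigma_\epsilon^4)^{1/2}}{2\sigma_\epsilon^2}. \tag{5.13}$$
In addition,
$$C_{j_n}(1,\ell) = C_{j_n}(\ell,1) = \frac{\rho_n^{\ell-1}}{\big((\sigma_\eta^2+\sigma_\epsilon^2)f_2(\rho_n)-\rho_n^2\sigma_\epsilon^2\big)f_2^{\ell-2}(\rho_n)} + o(\exp(-\tau n^{c/2})); \quad 1\le\ell\le j_n - n^{(1-\delta+c)/2}, \tag{5.14}$$
$$C_{j_n}(j_n,\ell) = C_{j_n}(\ell,j_n) = \frac{\rho_n^{j_n-\ell}}{f_2^{j_n-\ell+1}(\rho_n)\sigma_\epsilon^2} + o(\exp(-\tau n^{c/2})); \quad n^{(1-\delta+c)/2} < \ell\le j_n, \tag{5.15}$$
$$\max_{1\le k,\ell\le j_n}C_{j_n}(k,\ell) = \frac{1}{(8\kappa_\eta\sigma_\eta^2\sigma_\epsilon^{-2})^{1/2}}n^{(1-\delta)/2} + o(n^{-(1-\delta)}), \tag{5.16}$$
and
$$\mathrm{tr}(T_n^{-1}) = \frac{n^{(3-\delta)/2}}{2(2\kappa_\eta\sigma_\eta^2\sigma_\epsilon^2)^{1/2}} + O(n^{1-\delta}). \tag{5.17}$$
Furthermore, let $T_n^{(1)}$ be the matrix with $(\sigma_\eta^2,\kappa_\eta,\sigma_\epsilon^2)$ in $T_n$ replaced by $(\sigma_\eta^{(1)2},\kappa_\eta^{(1)},\sigma_\epsilon^{(1)2})$. Then
$$\mathrm{tr}(T_n^{-1}T_n^{(1)-1}) = \frac{n^{(5-3\delta)/2}}{2^{5/2}(\kappa_\eta\sigma_\eta^2\kappa_\eta^{(1)}\sigma_\eta^{(1)2})^{1/2}\big((\kappa_\eta\sigma_\eta^2\sigma_\epsilon^{(1)2})^{1/2}+(\kappa_\eta^{(1)}\sigma_\eta^{(1)2}\sigma_\epsilon^2)^{1/2}\big)} + O(n^{2-\delta}). \tag{5.18}$$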
Notice that $T_n$ defined in (5.9) corresponds to the variance-covariance matrix of a moving average (MA) process $\{\upsilon_1,\dots,\upsilon_n\}$ of order 1:
$$\mathrm{var}(\upsilon_n) = T_n, \tag{5.19}$$
where $\upsilon_n\equiv(\upsilon_1,\dots,\upsilon_n)'$,
$$\upsilon_i = u_i - f_4(\rho_n)u_{i-1}; \quad i = 2,\dots,n, \tag{5.20}$$
with $u_1\sim N\big(0,(\sigma_\eta^2-\sigma_\epsilon^2-f_2(\rho_n)\sigma_\epsilon^2)f_4^{-2}(\rho_n)\big)$ and $u_i\sim N(0,f_2(\rho_n)\sigma_\epsilon^2)$; $i = 2,\dots,n$,
$$f_4(\rho_n)\equiv\rho_n/f_2(\rho_n), \tag{5.21}$$
and recall that $f_2(\rho_n)$ and $\rho_n$ are defined in (5.13) and (5.3), respectively. Some asymptotic properties of $f_4(\rho_n)$ and $T_n$ are given in the following lemmas.
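Lemma 6 With $f_2(\rho_n)$ and $f_4(\rho_n)$ defined in (5.13) and (5.21), respectively, we have
$$f_4(\rho_n) = 1 - (2\kappa_\eta\sigma_\eta^2\sigma_\epsilon^{-2})^{1/2}n^{-(1-\delta)/2} + O(n^{-(1-\delta)}), \tag{5.22}$$
$$f_2(\rho_n) = 1 + (2\kappa_\eta\sigma_\eta^2\sigma_\epsilon^{-2})^{1/2}n^{-(1-\delta)/2} + (\sigma_\eta^2-\sigma_\epsilon^2)\sigma_\epsilon^{-2}\kappa_\eta n^{-(1-\delta)} + O(n^{-3(1-\delta)/2}), \tag{5.23}$$
and
$$\log f_4(\rho_n) = -(2\kappa_\eta\sigma_\eta^2\sigma_\epsilon^{-2})^{1/2}n^{-(1-\delta)/2} + O(n^{-(1-\delta)}). \tag{5.24}$$
In addition, for any c > 0 and δ ∈ [0, 1) with $n^{(1-\delta)/2+c} < n$, and any $j_n$ with $n^{(1-\delta)/2+c}\le j_n\le n$, there exists a constant τ > 0 such that
$$f_4^{j_n}(\rho_n) = o(\exp(-\tau n^c)). \tag{5.25}$$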
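The leading terms in (5.22) and (5.23) can be examined numerically. The following sketch evaluates $f_2(\rho_n)$ and $f_4(\rho_n)$ exactly from (5.3), (5.10), (5.13) and (5.21) and returns the common leading term $(2\kappa_\eta\sigma_\eta^2\sigma_\epsilon^{-2})^{1/2}n^{-(1-\delta)/2}$ for comparison; the function names and the parameter values in the usage note are illustrative assumptions.

```python
import numpy as np

def f2_f4(n, delta, sigma2_eta, kappa_eta, sigma2_eps):
    """Exact f_2(rho_n) and f_4(rho_n) from (5.3), (5.10), (5.13), (5.21)."""
    rho = np.exp(-kappa_eta * n ** (-(1.0 - delta)))
    f1 = (1 - rho**2) * sigma2_eta + (1 + rho**2) * sigma2_eps
    f2 = (f1 + np.sqrt(f1**2 - 4 * rho**2 * sigma2_eps**2)) / (2 * sigma2_eps)
    return f2, rho / f2

def leading_term(n, delta, sigma2_eta, kappa_eta, sigma2_eps):
    """Leading n^{-(1-delta)/2} term shared by (5.22) and (5.23)."""
    return np.sqrt(2 * kappa_eta * sigma2_eta / sigma2_eps) * n ** (-(1.0 - delta) / 2)
```

For example, with n = 10**6, δ = 0.2 and $(\sigma_\eta^2,\kappa_\eta,\sigma_\epsilon^2) = (1, 2, 0.5)$, both `f2 - 1` and `1 - f4` should lie within a few percent of `leading_term(...)`, and the gap shrinks as n grows.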
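Lemma 7 Consider $T_n$ defined in (5.9). For any c > 0, δ ∈ [0, 1) with $n^{(1-\delta)/2+c} < n$, and any $j_n$ with $n^{(1-\delta)/2+c}\le j_n\le n$, there exists a constant τ > 0 such that
$$T_n^{-1} = \Omega_n'\begin{pmatrix}\Lambda_{j_n}^{-1} & 0 \\ 0 & (f_2(\rho_n)\sigma_\epsilon^2)^{-1}I_{n-j_n}\end{pmatrix}\Omega_n + o(\exp(-\tau n^{c/2})), \tag{5.26}$$
where
$$\Omega_n \equiv \begin{pmatrix} 1 & 0 & \cdots & 0 \\ f_4(\rho_n) & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ f_4^{n-1}(\rho_n) & \cdots & f_4(\rho_n) & 1 \end{pmatrix}, \tag{5.27}$$
$f_2(\rho_n)$ and $f_4(\rho_n)$ are given by (5.13) and (5.21), respectively, and
$$\Lambda_k = \Omega_kT_k\Omega_k'. \tag{5.28}$$
The following three lemmas are based on Lemmas 4-7 and are crucial in developing the asymptotic results for the ML estimates in Sections 5.1-5.3.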
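Lemma 8 Consider Σ(θ) and $\Sigma_\eta$ defined in (3.2) and (5.1), where $s_i = in^{-(1-\delta)}$; $i = 1,\dots,n$, and δ ∈ [0, 1). Let $\Sigma_\eta^{(j)}$ be the same as $\Sigma_\eta$ except that $(\sigma_\eta^2,\kappa_\eta)$ are replaced by $(\sigma_\eta^{(j)2},\kappa_\eta^{(j)})$. Define $\Sigma^{(j)}\equiv\Sigma_\eta^{(j)}+\sigma_\epsilon^{(j)2}I$; j = 1, 2, 3. Then for δ ∈ [0, 1),
$$\log(\det(\Sigma(\theta))) = n\log\sigma_\epsilon^2 + \left(\frac{2\kappa_\eta\sigma_\eta^2}{\sigma_\epsilon^2}\right)^{1/2}n^{(1+\delta)/2} - \left(\frac{\kappa_\eta(\sigma_\eta^2+\sigma_\epsilon^2)}{\sigma_\epsilon^2}\right)n^\delta - \log n^{(1-\delta)/2} + o(n^\delta) + O(1), \tag{5.29}$$
$$\mathrm{tr}(\Sigma_\eta^{(1)}\Sigma^{-1}(\theta)) = \frac{\sigma_\eta^{(1)2}\kappa_\eta^{(1)}}{(2\kappa_\eta\sigma_\eta^2\sigma_\epsilon^2)^{1/2}}n^{(1+\delta)/2} + \frac{\sigma_\eta^{(1)2}\kappa_\eta^{(1)}(\kappa_\eta-\kappa_\eta^{(1)})}{\kappa_\eta\sigma_\eta^2}n^\delta + \frac{\sigma_\eta^{(1)2}(\kappa_\eta-\kappa_\eta^{(1)})^2}{2\kappa_\eta\sigma_\eta^2}n^\delta + o(n^\delta) + O(1), \tag{5.30}$$
$$\mathrm{tr}(\Sigma^{-1}(\theta)) = \frac{n}{\sigma_\epsilon^2} - \frac{(2\kappa_\eta\sigma_\eta^2\sigma_\epsilon^{-2})^{1/2}}{2\sigma_\epsilon^2}n^{(1+\delta)/2} + o(n^\delta) + O(1), \tag{5.31}$$
$$\begin{aligned}\mathrm{tr}(\Sigma^{(1)}\Sigma^{-1}(\theta)) ={}& \frac{\sigma_\epsilon^{(1)2}}{\sigma_\epsilon^2}n - \frac{\sigma_\epsilon^{(1)2}}{2\sigma_\epsilon^2}(2\kappa_\eta\sigma_\eta^2\sigma_\epsilon^{-2})^{1/2}n^{(1+\delta)/2} + \frac{\sigma_\eta^{(1)2}\kappa_\eta^{(1)}}{(2\kappa_\eta\sigma_\eta^2\sigma_\epsilon^2)^{1/2}}n^{(1+\delta)/2} \\ &+ \frac{\sigma_\eta^{(1)2}\kappa_\eta^{(1)}(\kappa_\eta-\kappa_\eta^{(1)})}{\kappa_\eta\sigma_\eta^2}n^\delta + \frac{\sigma_\eta^{(1)2}(\kappa_\eta-\kappa_\eta^{(1)})^2}{2\kappa_\eta\sigma_\eta^2}n^\delta + o(n^\delta) + O(1),\end{aligned} \tag{5.32}$$
$$\mathrm{tr}(\Sigma_\eta^{(1)}\Sigma^{-1}(\theta)\Sigma_\eta^{(2)}\Sigma^{(3)-1}) = \frac{\sigma_\eta^{(1)2}\kappa_\eta^{(1)}\sigma_\eta^{(2)2}\kappa_\eta^{(2)}n^{(1+\delta)/2}}{2^{1/2}(\kappa_\eta\sigma_\eta^2\kappa_\eta^{(3)}\sigma_\eta^{(3)2})^{1/2}\big((\kappa_\eta\sigma_\eta^2\sigma_\epsilon^{(3)2})^{1/2}+(\kappa_\eta^{(3)}\sigma_\eta^{(3)2}\sigma_\epsilon^2)^{1/2}\big)} + O(n^\delta), \tag{5.33}$$
$$\mathrm{tr}(\Sigma_\eta^{(1)}\Sigma^{-1}(\theta)\Sigma^{(2)-1}) = \frac{\sigma_\eta^{(1)}\kappa_\eta^{(1)}}{\sigma_\epsilon^2\sigma_\epsilon^{(2)2}\big((2\kappa_\eta\sigma_\eta^2\sigma_\epsilon^{-2})^{1/2}+(2\kappa_\eta^{(2)}\sigma_\eta^{(2)2}\sigma_\epsilon^{(2)-2})^{1/2}\big)} + O(n^\delta), \tag{5.34}$$
$$\begin{aligned}\mathrm{tr}(\Sigma^{(1)-1}\Sigma^{-1}(\theta)) ={}& \frac{n}{\sigma_\epsilon^{(1)2}\sigma_\epsilon^2} - \frac{1}{\sigma_\epsilon^{(1)2}\sigma_\epsilon^2}\Bigg(\frac{(\kappa_\eta\sigma_\eta^2\sigma_\epsilon^{-2})^{1/2}+(\kappa_\eta^{(1)}\sigma_\eta^{(1)2}\sigma_\epsilon^{(1)-2})^{1/2}}{2^{1/2}} \\ &- \frac{(\kappa_\eta\sigma_\eta^2\sigma_\epsilon^{-2})^{1/2}(\kappa_\eta^{(1)}\sigma_\eta^{(1)2}\sigma_\epsilon^{(1)-2})^{1/2}}{2^{1/2}\big((\kappa_\eta\sigma_\eta^2\sigma_\epsilon^{-2})^{1/2}+(\kappa_\eta^{(1)}\sigma_\eta^{(1)2}\sigma_\epsilon^{(1)-2})^{1/2}\big)}\Bigg)n^{(1+\delta)/2} + O(n^\delta).\end{aligned} \tag{5.35}$$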
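Lemma 9 Consider Σ(θ) and $\Sigma_\eta$ defined in (3.2) and (5.1), where $s_i = in^{-(1-\delta)}$; $i = 1,\dots,n$, and δ ∈ [0, 1). Let $\psi_k\equiv n^{-k}(1^k,2^k,\dots,n^k)'$; $k\in\{0,1,\dots\}$. Then for any $k = 0,1,\dots$, $\ell = 1,2,\dots$, and any δ ∈ [0, 1),
$$\psi_k'\Sigma^{-1}\psi_\ell = \frac{\kappa_\eta}{2\sigma_\eta^2(k+\ell+1)}n^\delta + \frac{1}{2\sigma_\eta^2} + \frac{k\ell}{2\kappa_\eta\sigma_\eta^2(k+\ell-1)}n^{-\delta} + o(n^\delta), \tag{5.36}$$
$$\psi_0'\Sigma^{-1}\psi_0 = \frac{\kappa_\eta}{2\sigma_\eta^2}n^\delta + \frac{1}{\sigma_\eta^2} + o(n^\delta). \tag{5.37}$$
In addition, for δ ∈ [0, 1), $k,\ell = 0,1,\dots,p$, and $\Sigma^{(1)}$ defined in Lemma 8,
$$\psi_k'\Sigma^{-1}\Sigma^{(1)}\Sigma^{-1}\psi_\ell = O(n^\delta). \tag{5.38}$$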
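Lemma 10 Consider Σ(θ) and $\Sigma_\eta$ defined in (3.2) and (5.1), where $s_i = in^{-(1-\delta)}$; $i = 1,\dots,n$, and δ ∈ [0, 1). Let $\Sigma_j = \mathrm{var}((x_j(s_1),\dots,x_j(s_n))')$ with $x_j(s)$ defined in (5.6).
(i) Suppose that $|\sigma_\epsilon^2-\sigma^2| = o(1)$ for some constant $\sigma^2 > 0$. Then
$$\log(\sigma_\epsilon^2) + \frac{\sigma^2}{\sigma_\epsilon^2} - \log(\sigma^2) - 1 = \frac{1}{2\sigma^4}(\sigma_\epsilon^2-\sigma^2)^2 + o((\sigma_\epsilon^2-\sigma^2)^2). \tag{5.39}$$
(ii) Suppose that $|\sigma_\epsilon^2-\sigma^2| = o(1)$ for some constant $\sigma^2 > 0$. Then for any $\kappa_\eta,\sigma_\eta^2,\tau > 0$,
$$\left(\frac{1}{\sigma_\epsilon^2}\right)^{1/2}\left(1 - \frac{\sigma^2}{2\sigma_\epsilon^2} + \frac{\tau}{2\kappa_\eta\sigma_\eta^2}\right) - \left(\frac{1}{\sigma^2}\right)^{1/2}\left(\frac{1}{2} + \frac{\tau}{2\kappa_\eta\sigma_\eta^2}\right) = o(\sigma_\epsilon^2-\sigma^2). \tag{5.40}$$
(iii) Suppose that $|\kappa_\eta\sigma_\eta^2-\tau| = o(1)$ for some constant τ > 0. Then for any $\sigma^2 > 0$,
$$\left(\frac{2\kappa_\eta\sigma_\eta^2}{\sigma^2}\right)^{1/2}\left(\frac{1}{2} + \frac{\tau}{2\kappa_\eta\sigma_\eta^2}\right) - \left(\frac{2\tau}{\sigma^2}\right)^{1/2} = \frac{(\kappa_\eta\sigma_\eta^2-\tau)^2}{2^{5/2}\sigma\tau^{3/2}} + o((\kappa_\eta\sigma_\eta^2-\tau)^2). \tag{5.41}$$
(iv) Suppose that $|\sigma_\epsilon^2-\sigma^2| = o(1)$ and $|\kappa_\eta\sigma_\eta^2-\tau| = o(1)$ for some constants $\sigma^2,\tau > 0$. Then for any $\kappa_j,\kappa_{j'},\sigma_j^2,\sigma_{j'} > 0$,
$$\begin{aligned}&\mathrm{tr}\big(\Sigma_j(\Sigma(\theta)-\Sigma((\sigma_\eta^2,\kappa_\eta,\sigma^2)'))\Sigma_j(\Sigma((\sigma_\eta^2,\kappa_\eta,\sigma_\epsilon^2)')-\Sigma((\sigma_\eta^2,\kappa_\eta,\sigma^2)'))\big) \\ &\quad= \frac{\kappa_j\sigma_j^2\kappa_{j'}\sigma_{j'}^2}{2^{9/2}\tau^{3/2}\sigma^3}(\sigma_\epsilon^2-\sigma^2)^2n^{(1+\delta)/2} + o((\sigma_\epsilon^2-\sigma^2)^2n^{(1+\delta)/2}) + O(n^\delta).\end{aligned} \tag{5.42}$$
(v) Suppose that $|\sigma_\epsilon^2-\sigma^2| = o(1)$ and $|\kappa_\eta\sigma_\eta^2-\tau| = o(1)$ for some constants $\sigma^2,\tau > 0$. Then for any $\kappa_j,\sigma_j^2 > 0$,
$$\begin{aligned}&\mathrm{tr}\big(\Sigma_j(\Sigma^{-1}(\theta)-\Sigma^{-1}((\sigma_\eta^2,\kappa_\eta,\sigma^2)'))(\Sigma^{-1}(\theta)-\Sigma^{-1}((\sigma_\eta^2,\kappa_\eta,\sigma^2)'))\big) \\ &\quad= \frac{5\kappa_j\sigma_j^2}{2^{9/2}\tau^{1/2}\sigma^7}(\sigma_\epsilon^2-\sigma^2)^2n^{(1+\delta)/2} + o((\sigma_\epsilon^2-\sigma^2)^2n^{(1+\delta)/2}) + O(n^\delta).\end{aligned} \tag{5.43}$$
(vi) Suppose that $|\sigma_\epsilon^2-\sigma^2| = o(1)$ and $|\kappa_\eta\sigma_\eta^2-\tau| = o(1)$ for some constants $\sigma^2,\tau > 0$. Then
$$\begin{aligned}&\mathrm{tr}\big((\Sigma^{-1}(\theta)-\Sigma^{-1}((\sigma_\eta^2,\kappa_\eta,\sigma^2)'))(\Sigma^{-1}(\theta)-\Sigma^{-1}((\sigma_\eta^2,\kappa_\eta,\sigma^2)'))\big) \\ &\quad= \frac{1}{\sigma^8}(\sigma_\epsilon^2-\sigma^2)^2n + o((\sigma_\epsilon^2-\sigma^2)n^{(1+\delta)/2}) + O(n^\delta).\end{aligned} \tag{5.44}$$
(vii) Suppose that $|\sigma_\epsilon^2-\sigma^2| = o(n^{-(1-\delta)/2})$ and $|\kappa_\eta\sigma_\eta^2-\tau| = o(1)$ for some constants $\sigma^2,\tau > 0$. Then for any $\kappa_j,\kappa_{j'},\sigma_j^2,\sigma_{j'},c,d > 0$ with cd = τ,
$$\begin{aligned}&\mathrm{tr}\big(\Sigma_j(\Sigma^{-1}(\theta)-\Sigma^{-1}((c,d,\sigma^2)'))\Sigma_{j'}(\Sigma^{-1}(\theta)-\Sigma^{-1}((c,d,\sigma^2)'))\big) \\ &\quad= \frac{5\kappa_j\sigma_j^2\kappa_{j'}\sigma_{j'}^2}{2^{9/2}\sigma\tau^{7/2}}(\kappa_\eta\sigma_\eta^2-\tau)^2n^{(1+\delta)/2} + o((\kappa_\eta\sigma_\eta^2-\tau)^2n^{(1+\delta)/2}) + O(n^\delta).\end{aligned} \tag{5.45}$$
(viii) Suppose that $|\sigma_\epsilon^2-\sigma^2| = o(n^{-(1-\delta)/2})$ and $|\kappa_\eta\sigma_\eta^2-\tau| = o(1)$ for some constants $\sigma^2,\tau > 0$. Then for any $\kappa_j,\sigma_j^2,c,d > 0$ with cd = τ,
$$\begin{aligned}&\mathrm{tr}\big(\Sigma_j(\Sigma^{-1}(\theta)-\Sigma^{-1}((c,d,\sigma^2)'))(\Sigma^{-1}(\theta)-\Sigma^{-1}((c,d,\sigma^2)'))\big) \\ &\quad= \frac{\kappa_j\sigma_j^2}{2^{9/2}\sigma^3\tau^{5/2}}(\kappa_\eta\sigma_\eta^2-\tau)^2n^{(1+\delta)/2} + o((\kappa_\eta\sigma_\eta^2-\tau)^2n^{(1+\delta)/2}) + O(n^\delta).\end{aligned} \tag{5.46}$$
(ix) Suppose that $|\sigma_\epsilon^2-\sigma^2| = o(n^{-(1-\delta)/2})$ and $|\kappa_\eta\sigma_\eta^2-\tau| = o(1)$ for some constants $\sigma^2,\tau > 0$. Then for any c, d > 0 with cd = τ,
$$\begin{aligned}&\mathrm{tr}\big((\Sigma^{-1}(\theta)-\Sigma^{-1}((c,d,\sigma^2)'))(\Sigma^{-1}(\theta)-\Sigma^{-1}((c,d,\sigma^2)'))\big) \\ &\quad= \frac{1}{2^{9/2}\tau^{5/2}}(\kappa_\eta\sigma_\eta^2-\tau)^2n^{(1+\delta)/2} + o((\kappa_\eta\sigma_\eta^2-\tau)^2n^{(1+\delta)/2}) + O(n^\delta).\end{aligned} \tag{5.47}$$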
5.1 Polynomial Order Selection
In this section, we consider Examples 1 and 2 given by (5.4) and (5.5) for polynomial order selection. Note that in Example 1, the underlying true polynomial does not vary with the sample size, whereas in Example 2, the magnitude of the underlying true polynomial decreases as the sample size increases, making estimation and polynomial order selection more difficult. Let $V_{j\times j'}$ be a j × j' matrix with the (k, ℓ)th element,
$$\frac{1}{k+\ell+1}; \quad k = 1,\dots,j,\ \ell = 1,\dots,j'. \tag{5.48}$$
Note that when j = j', the square matrix $V_{j,j}$ is nonsingular (see Shibata 1981).
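The matrix in (5.48) is straightforward to construct, and its nonsingularity can be checked numerically for small orders. The sketch below is only an illustration; the function name `V` mirrors the notation of (5.48) and is otherwise hypothetical. It uses the fact that the (k, ℓ)th element equals $\int_0^1 s^{k+\ell}\,ds$, so the square matrix is the Gram matrix of the monomials $s, s^2, \dots, s^j$ on [0, 1].

```python
import numpy as np

def V(j, jp):
    """j x jp matrix of (5.48) with (k, l)th element 1 / (k + l + 1)."""
    k = np.arange(1, j + 1)[:, None]
    l = np.arange(1, jp + 1)[None, :]
    return 1.0 / (k + l + 1)

# Smallest eigenvalue of the square matrices for the first few orders;
# positive values indicate positive definiteness, hence nonsingularity.
min_eigs = [float(np.linalg.eigvalsh(V(j, j)).min()) for j in range(1, 5)]
```

These matrices become ill-conditioned quickly as j grows, so a numerical check of this kind is only reliable for small polynomial orders.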
Proposition 4 Consider a class of models given by (3.1) with p explanatory variables and Vj×j0 is defined in (5.48). In addition, the log-likelihood of (4.9) based on model α ∈ A can be decomposed into the following:
(i) For δ ∈ (0, 1),
Equation (5.52) provides some guidance on applying GIC to distinguish between correct and incorrect models in polynomial order selection. For example, it follows from (5.52) and $\gamma(\alpha_c) = 0$ that for $\alpha\in A\setminus A_c$ and δ ∈ (0, 1),
$$-2\ell(\theta;\alpha) + 2\ell(\theta;\alpha_c) = \frac{\gamma(\alpha)\kappa_\eta}{2\sigma_\eta^2}n^\delta + o_p(n^\delta), \tag{5.55}$$
so that underfitted models can be ruled out if the penalty term is of smaller order than $n^\delta$. As will be demonstrated in Theorem 6, we can use (5.55) to find an appropriate penalty λ that leads to selection consistency. On the other hand, applying (5.54) to Example 2 under the fixed domain asymptotic framework (i.e., δ = 0), we obtain
$$-2\ell(\theta;\alpha) + 2\ell(\theta;\alpha_c) = O_p(1), \tag{5.56}$$
indicating that consistency or asymptotic loss efficiency of GIC is almost impossible.
Additionally, we see from (5.54) that under the fixed domain asymptotic framework (δ = 0), the likelihood value depends on $\kappa_\eta$ and $\sigma_\eta^2$ mainly through their product rather than their individual values. Consequently, variable selection based on GIC is expected to be little affected by the individual estimates of $\kappa_\eta$ and $\sigma_\eta^2$ as long as the estimate of the microergodic parameter, $\kappa_\eta\sigma_\eta^2$, remains the same.
The following lemma provides consistency of the ML estimate of $\sigma_\epsilon^2$ and the microergodic parameter, $\kappa_\eta\sigma_\eta^2$, under both the fixed domain and the increasing domain asymptotic frameworks with δ ∈ [0, 1). The results are extended from Chen et al. (2000), who consider only $\alpha\in A_c$ and δ = 0.
Lemma 11 Under the setup of Proposition 4, let $\Theta\subset(0,\infty)^3$ be a compact set and let $\hat\theta(\alpha) = (\hat\sigma_\eta^2(\alpha),\hat\kappa_\eta(\alpha),\hat\sigma_\epsilon^2(\alpha))'$ be the ML estimate of θ based on model α. Then for any δ ∈ [0, 1),
$$\hat\sigma_\epsilon^2(\alpha) = \sigma_{\epsilon,0}^2 + o_p(1), \tag{5.57}$$
$$\hat\kappa_\eta(\alpha)\hat\sigma_\eta^2(\alpha) = \kappa_{\eta,0}\sigma_{\eta,0}^2 + o_p(1). \tag{5.58}$$
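The following theorem further provides the convergence rates for the ML estimates of $\kappa_\eta$, $\sigma_\eta^2$ and $\sigma_\epsilon^2$. These results are also extended from Chen et al. (2000), who consider only $\alpha\in A_c$ and δ = 0, and are key to establishing some asymptotic properties of GIC in Theorem 7.
Theorem 4 Under the setup of Proposition 4, let $\Theta\subset(0,\infty)^3$ be a compact set and let $\hat\theta(\alpha) = (\hat\sigma_\eta^2(\alpha),\hat\kappa_\eta(\alpha),\hat\sigma_\epsilon^2(\alpha))'$ be the ML estimate of θ based on model α. Then
(i) For δ ∈ (0, 1),
$$\hat\sigma_\epsilon^2(\alpha) = \sigma_{\epsilon,0}^2 + o_p(n^{-(1-\delta)/2}); \quad \alpha\in A, \tag{5.59}$$
$$\hat\kappa_\eta(\alpha)\hat\sigma_\eta^2(\alpha) = \kappa_{\eta,0}\sigma_{\eta,0}^2 + o_p(n^{-(1-\delta)/4}); \quad \alpha\in A, \tag{5.60}$$
$$\hat\sigma_\eta^2(\alpha) = \begin{cases}\sigma_{\eta,0}^2 + o_p(1); & \text{if } \alpha\in A_c, \\ \gamma(\alpha) + \sigma_{\eta,0}^2 + o_p(1); & \text{if } \alpha\in A\setminus A_c,\end{cases} \tag{5.61}$$
$$\hat\kappa_\eta(\alpha) = \begin{cases}\kappa_{\eta,0} + o_p(1); & \text{if } \alpha\in A_c, \\ \kappa_{\eta,0}\sigma_{\eta,0}^2(\gamma(\alpha)+\sigma_{\eta,0}^2)^{-1} + o_p(1); & \text{if } \alpha\in A\setminus A_c,\end{cases} \tag{5.62}$$
where γ(α) > 0 is a constant defined in (5.51) for $\alpha\in A\setminus A_c$.
(ii) For δ = 0 and any α ∈ A,
$$\hat\sigma_\epsilon^2(\alpha) = \sigma_{\epsilon,0}^2 + O_p(n^{-1/2}), \tag{5.63}$$
$$\hat\kappa_\eta(\alpha)\hat\sigma_\eta^2(\alpha) = \kappa_{\eta,0}\sigma_{\eta,0}^2 + O_p(n^{-1/4}). \tag{5.64}$$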
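Proof. Denote $\sigma_{\eta,\alpha}^2\equiv\gamma(\alpha)+\sigma_{\eta,0}^2$ and $\kappa_{\eta,\alpha}\equiv\kappa_{\eta,0}\sigma_{\eta,0}^2/(\gamma(\alpha)+\sigma_{\eta,0}^2)$, for α ∈ A, where γ(α) ≡ 0 for $\alpha\in A_c$. Note that $\kappa_{\eta,\alpha}\sigma_{\eta,\alpha}^2 = \kappa_{\eta,0}\sigma_{\eta,0}^2$.
First, we prove (5.59). By (5.57) and (5.58), it suffices to show that for $|\sigma_\epsilon^2-\sigma_{\epsilon,0}^2| = o(1)$, $|\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2| = o(1)$ and any $\varepsilon > 0$,
$$\inf_{|\sigma_\epsilon^2-\sigma_{\epsilon,0}^2|\ge\varepsilon n^{-(1-\delta)/2}}\big(-2\ell(\theta;\alpha) + 2\ell((\sigma_\eta^2,\kappa_\eta,\sigma_{\epsilon,0}^2)';\alpha)\big) > 0, \tag{5.65}$$
as n → ∞ with probability tending to 1. By (5.52), we can write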
−2`(θ; α) = n log(2π) − 1 − δ
where the last equality follows from (5.39) and (5.40). Therefore, for (5.65) to hold, it remains to show that Applying Chebyshev’s inequality on each of the three parts and using the following three moment conditions given from (5.42)-(5.44) on (5.68):
var¡
we obtain (5.67). This completes the proof of (5.59).
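Second, we prove (5.60). By (5.58) and (5.59), it suffices to show that for $|\sigma_\epsilon^2-\sigma_{\epsilon,0}^2| = o(n^{-(1-\delta)/2})$, $|\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2| = o(1)$ and any $\varepsilon > 0$,
$$\inf_{|\sigma_\eta^2\kappa_\eta-\sigma_{\eta,0}^2\kappa_{\eta,0}|\ge\varepsilon n^{-(1-\delta)/4}}\big(-2\ell(\theta;\alpha) + 2\ell((\sigma_{\eta,\alpha}^2,\kappa_{\eta,\alpha},\sigma_{\epsilon,0}^2)';\alpha)\big) > 0, \tag{5.69}$$
as n → ∞ with probability tending to 1. By (5.66), for $|\sigma_\epsilon^2-\sigma_{\epsilon,0}^2| = o(n^{-(1-\delta)/2})$ and $|\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2| = o(1)$, we have
$$\begin{aligned}&-2\ell(\theta;\alpha) + 2\ell((\sigma_{\eta,\alpha}^2,\kappa_{\eta,\alpha},\sigma_{\epsilon,0}^2)';\alpha) \\ &\quad= \left\{\left(\frac{2\kappa_\eta\sigma_\eta^2}{\sigma_{\epsilon,0}^2}\right)^{1/2}\left(\frac{1}{2}+\frac{\kappa_{\eta,0}\sigma_{\eta,0}^2}{2\kappa_\eta\sigma_\eta^2}\right) - \left(\frac{2\kappa_{\eta,0}\sigma_{\eta,0}^2}{\sigma_{\epsilon,0}^2}\right)^{1/2}\right\}n^{(1+\delta)/2} \\ &\qquad+ \frac{\sigma_{\eta,\alpha}^2}{2\kappa_{\eta,0}\sigma_{\eta,0}^2}(\kappa_\eta-\kappa_{\eta,\alpha})^2n^\delta + \xi(\theta) - \xi((\sigma_{\eta,\alpha}^2,\kappa_{\eta,\alpha},\sigma_{\epsilon,0}^2)') + o_p(n^\delta) \\ &\quad= \frac{(\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2)^2n^{(1+\delta)/2}}{2^{5/2}(\kappa_{\eta,0}\sigma_{\eta,0}^2)^{3/2}} + \frac{\sigma_{\eta,\alpha}^2(\kappa_\eta-\kappa_{\eta,\alpha})^2n^\delta}{2\kappa_{\eta,0}\sigma_{\eta,0}^2} + \xi(\theta) - \xi((\sigma_{\eta,\alpha}^2,\kappa_{\eta,\alpha},\sigma_{\epsilon,0}^2)') + o_p(n^\delta),\end{aligned} \tag{5.70}$$
where the first equality follows from (5.39) and the second equality follows from (5.41).
Therefore, for (5.69) to hold, it remains to show that
$$\xi(\theta) - \xi((\sigma_{\eta,\alpha}^2,\kappa_{\eta,\alpha},\sigma_{\epsilon,0}^2)') = o_p\big(\max((\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2)^2n^{(1+\delta)/2},n^\delta)\big), \tag{5.71}$$
which can be obtained from a decomposition similar to (5.68) in addition to the following three moment conditions given from (5.45)-(5.47):
$$\begin{aligned}\mathrm{var}\big(\eta'(\Sigma^{-1}(\theta)-\Sigma^{-1}((\sigma_{\eta,\alpha}^2,\kappa_{\eta,\alpha},\sigma_{\epsilon,0}^2)'))\eta\big) &= O\big(\max((\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2)^2n^{(1+\delta)/2},n^\delta)\big), \\ \mathrm{var}\big(\eta'(\Sigma^{-1}(\theta)-\Sigma^{-1}((\sigma_{\eta,\alpha}^2,\kappa_{\eta,\alpha},\sigma_{\epsilon,0}^2)'))\epsilon\big) &= O\big(\max((\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2)^2n^{(1+\delta)/2},n^\delta)\big), \\ \mathrm{var}\big(\epsilon'(\Sigma^{-1}(\theta)-\Sigma^{-1}((\sigma_{\eta,\alpha}^2,\kappa_{\eta,\alpha},\sigma_{\epsilon,0}^2)'))\epsilon\big) &= O\big(\max((\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2)^2n^{(1+\delta)/2},n^\delta)\big).\end{aligned}$$
Thus (5.69) is obtained. This completes the proof of (5.60).
Third, we prove (5.61) and (5.62). By (5.70) and (5.71), for $|\sigma_\epsilon^2-\sigma_{\epsilon,0}^2| = o(n^{-(1-\delta)/2})$, $|\sigma_\eta^2\kappa_\eta-\sigma_{\eta,0}^2\kappa_{\eta,0}| = o(n^{-(1-\delta)/4})$ and any $\varepsilon > 0$, we have
$$\inf_{|\kappa_\eta-\kappa_{\eta,\alpha}|\ge\varepsilon}\big(-2\ell(\theta;\alpha) + 2\ell((\sigma_{\eta,\alpha}^2,\kappa_{\eta,\alpha},\sigma_{\epsilon,0}^2)';\alpha)\big) = \frac{\sigma_{\eta,\alpha}^2}{2\kappa_{\eta,0}\sigma_{\eta,0}^2}\varepsilon^2n^\delta + o_p(n^\delta) > 0,$$
as n → ∞ with probability tending to 1, which gives (5.62). This together with (5.60) gives (5.61).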
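Fourth, we prove (5.63). By (5.57) and (5.58), it suffices to show that for $|\sigma_\epsilon^2-\sigma_{\epsilon,0}^2| = o(1)$ and $|\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2| = o(1)$, there exists M > 0 such that
$$\inf_{|\sigma_\epsilon^2-\sigma_{\epsilon,0}^2|\ge Mn^{-1/2}}\big\{-2\ell(\theta;\alpha) + 2\ell((\sigma_\eta^2,\kappa_\eta,\sigma_{\epsilon,0}^2)';\alpha)\big\} > 0, \tag{5.72}$$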
as n → ∞ with probability tending to 1. By (5.54), for |σ²2 − σ²,02 | = o(1) and |κησ2η −
where the second equality follows from (5.39) and (5.40), and the last equality follows from
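$$\xi(\theta) - \xi((\sigma_\eta^2,\kappa_\eta,\sigma_{\epsilon,0}^2)') = o_p\big((\sigma_\epsilon^2-\sigma_{\epsilon,0}^2)^2n\big) + O_p(1), \tag{5.73}$$
which can be obtained in a way similar to (5.67). Consequently, there exists M > 0 such that
$$\inf_{|\sigma_\epsilon^2-\sigma_{\epsilon,0}^2|\ge Mn^{-1/2}}\big(-2\ell(\theta;\alpha) + 2\ell((\sigma_\eta^2,\kappa_\eta,\sigma_{\epsilon,0}^2)';\alpha)\big) = \frac{M^2}{2\sigma_{\epsilon,0}^4} + O_p(1) > 0,$$
as n → ∞ with probability tending to 1. Thus, we obtain (5.72), and hence the proof of (5.63) is complete.
where the first equality follows from $|\sigma_\epsilon^2-\sigma_{\epsilon,0}^2| = o(n^{-(1-\delta)/2})$, the second equality follows from (5.41), and the last equality follows from
$$\xi(\theta) - \xi((\sigma_{\eta,\alpha}^2,\kappa_{\eta,\alpha},\sigma_{\epsilon,0}^2)') = o_p\big((\kappa_\eta\sigma_\eta^2-\kappa_{\eta,0}\sigma_{\eta,0}^2)^2n^{1/2}\big) + O_p(1), \tag{5.76}$$
which can be obtained in a way similar to (5.71). Thus, (5.74), and hence (5.64), are obtained. This completes the proof. □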
Note that a special case of Theorem 4 for which δ = 0 and β = 0, can be found in Zhang and Zimmerman (2005), where they consider no regressor, and hence consider no underfitted model.
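Corollary 4 Under the setup of Theorem 4, let
$$\theta_\alpha^{(1)} = \big(\gamma(\alpha)+\sigma_{\eta,0}^2,\ \kappa_{\eta,0}\sigma_{\eta,0}^2(\gamma(\alpha)+\sigma_{\eta,0}^2)^{-1},\ \sigma_{\epsilon,0}^2\big)'; \quad \alpha\in A, \tag{5.77}$$
where γ(α) ≡ 0 for $\alpha\in A_c$. Then
$$\operatorname*{plim}_{n\to\infty}\frac{1}{n^\delta}\big(-2\ell(\hat\theta(\alpha);\alpha) + 2\ell(\theta_\alpha^{(1)};\alpha)\big) = 0; \quad \text{if } \delta\in(0,1), \tag{5.78}$$
$$-2\ell(\hat\theta(\alpha);\alpha) + 2\ell(\theta_\alpha^{(1)};\alpha) = O_p(1); \quad \text{if } \delta = 0. \tag{5.79}$$
In addition, for $L_{KL}(\alpha;\theta)$ defined in (3.3) and $\alpha\in A\setminus A_c$,
$$\operatorname*{plim}_{n\to\infty}L_{KL}(\hat\theta(\alpha);\alpha)\Big/L_{KL}(\theta_\alpha^{(1)};\alpha) = 1; \quad \text{if } \delta\in(0,1), \tag{5.80}$$
$$L_{KL}(\hat\theta(\alpha);\alpha) - L_{KL}(\theta_\alpha^{(1)};\alpha) = O_p(1); \quad \text{if } \delta = 0. \tag{5.81}$$
Note that from Theorem 4, we have $\operatorname*{plim}_{n\to\infty}\hat\theta(\alpha) = \theta_\alpha^{(1)}$ for δ ∈ (0, 1), which immediately gives (5.78). On the other hand, (5.79) is somewhat surprising, because $\hat\theta(\alpha)$ generally does not converge to $\theta_\alpha^{(1)}$ for δ = 0.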
Theorem 5 Consider a class of models given by (3.1) with $x_j(s) = s^j$; $j = 1,\dots,p$, and $\mathrm{cov}(\eta(s),\eta(s')) = \sigma_\eta^2\exp(-\kappa_\eta|s-s'|)$, where $\sigma_\eta^2 > 0$, $\kappa_\eta > 0$ and $\sigma_\epsilon^2 > 0$ are known, and p is fixed. Suppose that $A = \{\alpha_0,\alpha_1,\dots,\alpha_p\}$, where $\alpha_0 = \emptyset$, $\alpha_j = \{1,\dots,j\}$ for $j = 1,\dots,p$, and $A_c\neq\emptyset$. In addition, suppose that the data are collected at $s_i = in^{-(1-\delta)}$; $i = 1,\dots,n$, for some δ ∈ [0, 1).
(i) For δ = 0 and any λ > 0,
$$\lim_{n\to\infty}P\big(\alpha_c = \arg\min_{\alpha\in A}L_{KL}(\alpha)\big) < 1. \tag{5.82}$$
In addition, if λ → ∞, then
$$\lim_{n\to\infty}P\big(\hat\alpha_{GIC(\lambda)} = \alpha_0\big) = 1, \tag{5.83}$$
where $\hat\alpha_{GIC(\lambda)}$ is defined in (4.2).
(ii) For δ ∈ (0, 1), if λ → ∞ and $n^{(2p(\alpha_c)+1)\delta}/\lambda\to\infty$ as n → ∞, then
$$\lim_{n\to\infty}P\big(\hat\alpha_{GIC(\lambda)} = \alpha_c\big) = 1.$$
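Proof. (i) For δ = 0, by (3.8),
$$L_{KL}(\alpha) = \mu'A(\alpha)'\Sigma^{-1}A(\alpha)\mu + (\eta+\epsilon)'M(\alpha)'\Sigma^{-1}(\eta+\epsilon),$$
where
$$(\eta+\epsilon)'M(\alpha)'\Sigma^{-1}(\eta+\epsilon) = (\eta+\epsilon)'\Sigma^{-1}X(\alpha)(X(\alpha)'\Sigma^{-1}X(\alpha))^{-1}X(\alpha)'\Sigma^{-1}(\eta+\epsilon)\sim\chi^2(p(\alpha)), \tag{5.84}$$
with $\chi^2(k)$ denoting the chi-square distribution with k degrees of freedom. Similarly,
$$(\eta+\epsilon)'(M(\alpha_c)-M(\alpha))'\Sigma^{-1}(\eta+\epsilon)\sim\chi^2(p(\alpha_c)-p(\alpha)).$$
By (5.50), for $\alpha\in A\setminus A_c$, we have $\mu'A(\alpha)'\Sigma^{-1}A(\alpha)\mu = O(1)$. Hence, for $\alpha\in A\setminus A_c$,
$$\begin{aligned}&\lim_{n\to\infty}P\big(L_{KL}(\alpha_c)-L_{KL}(\alpha) > 0\big) \\ &\quad= \lim_{n\to\infty}P\big((\eta+\epsilon)'(M(\alpha_c)-M(\alpha))'\Sigma^{-1}(\eta+\epsilon) - \mu'A(\alpha)'\Sigma^{-1}A(\alpha)\mu > 0\big) > 0.\end{aligned}$$
Thus (5.82) is obtained.
For (5.83), by (4.6) with λ → ∞,
$$\begin{aligned}\Gamma_{GIC(\lambda)}(\alpha) ={}& (Z-\hat\mu(\alpha))'\Sigma^{-1}(Z-\hat\mu(\alpha)) + \lambda p(\alpha) \\ ={}& \mu'A(\alpha)'\Sigma^{-1}A(\alpha)\mu + (\eta+\epsilon)'A(\alpha)'\Sigma^{-1}(\eta+\epsilon) + 2\mu'A(\alpha)'\Sigma^{-1}(\eta+\epsilon) + \lambda p(\alpha) \\ ={}& 2\mu'A(\alpha)'\Sigma^{-1}(\eta+\epsilon) + (\eta+\epsilon)'\Sigma^{-1}(\eta+\epsilon) + \lambda p(\alpha) + o_p(\lambda),\end{aligned}$$
where the last equality follows from (5.50) and (5.84). In addition, by Chebyshev's inequality and the following moment condition:
$$\mathrm{var}\big(\mu'A(\alpha)'\Sigma^{-1}(\eta+\epsilon)\big) = \mu'A(\alpha)'\Sigma^{-1}A(\alpha)\mu = O(1),$$
we have $\mu'A(\alpha)'\Sigma^{-1}(\eta+\epsilon) = O_p(1)$. Therefore, for $\alpha\in A\setminus\{\alpha_0\}$,
$$\Gamma_{GIC(\lambda)}(\alpha) - \Gamma_{GIC(\lambda)}(\alpha_0) = \lambda(p(\alpha)-p(\alpha_0)) + o_p(\lambda),$$
which is greater than zero with probability tending to 1. Thus (5.83) is obtained.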
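(ii) By (4.7), it suffices to show that $\lim_{n\to\infty}EL_{KL}(\alpha)/\lambda = \infty$ for $\alpha\in A\setminus A_c$. First, for $\alpha\in A\setminus A_c$,
$$\begin{aligned}\mu'A(\alpha)'\Sigma^{-1}A(\alpha)\mu ={}& \beta'X'\big(\Sigma^{-1}-\Sigma^{-1}X(\alpha)(X(\alpha)'\Sigma^{-1}X(\alpha))^{-1}X(\alpha)'\Sigma^{-1}\big)X\beta \\ ={}& \beta^{*\prime}X^{*\prime}\big(\Sigma^{-1}-\Sigma^{-1}X^*(\alpha)(X^*(\alpha)'\Sigma^{-1}X^*(\alpha))^{-1}X^*(\alpha)'\Sigma^{-1}\big)X^*\beta^* \\ ={}& \beta^{*\prime}\big(V_{p,p}-V_{p,p(\alpha)}V_{p(\alpha),p(\alpha)}^{-1}V_{p(\alpha),p}\big)\beta^* + o(n^{(2p(\alpha_c)+1)\delta}) \\ ={}& \beta_{p(\alpha_c)}^2e_{p(\alpha_c)}'\big(V_{p,p}-V_{p,p(\alpha)}V_{p(\alpha),p(\alpha)}^{-1}V_{p(\alpha),p}\big)e_{p(\alpha_c)}n^{(2p(\alpha_c)+1)\delta} + o(n^{(2p(\alpha_c)+1)\delta}),\end{aligned}$$
where $e_j$ is the jth column of $I_p$, $\beta^*(\alpha) = D(\alpha)\beta(\alpha)$, $X^*(\alpha) = X(\alpha)D^{-1}(\alpha)$ with
$$D(\alpha) = \begin{pmatrix}1 & 0 & \cdots & 0 \\ 0 & n^\delta & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & n^{p(\alpha)\delta}\end{pmatrix},$$
$V_{j\times j'}$ is defined in (5.48), and $e_{p(\alpha_c)}'\big(V_{p,p}-V_{p,p(\alpha)}V_{p(\alpha),p(\alpha)}^{-1}V_{p(\alpha),p}\big)e_{p(\alpha_c)}$ is a constant, which is bounded away from 0 by Theorem 3.1 of Shibata (1981). It follows from (3.9) and $\lim_{n\to\infty}\lambda/n^{(2p(\alpha_c)+1)\delta} = 0$ that
$$\lim_{n\to\infty}\frac{EL_{KL}(\alpha)}{\lambda} = \lim_{n\to\infty}\frac{EL_{KL}(\alpha)/n^{(2p(\alpha_c)+1)\delta}}{\lambda/n^{(2p(\alpha_c)+1)\delta}} = \infty.$$
This completes the proof. □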
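The contrast between parts (i) and (ii) of Theorem 5 can be illustrated numerically through the noncentrality term $\mu'A(\alpha)'\Sigma^{-1}A(\alpha)\mu$. The sketch below takes the simplest case, in which the true mean is linear, $\mu(s) = \beta s$, and the candidate is the null model $\alpha_0$ (so that $A(\alpha_0) = I$). The parameter values, the assumed form of Σ as exponential covariance plus nugget, and the function name `noncentrality` are illustrative assumptions.

```python
import numpy as np

def noncentrality(n, delta, beta=1.0, sigma2_eta=1.0, kappa_eta=1.0, sigma2_eps=0.5):
    """mu' Sigma^{-1} mu for mu(s) = beta * s under the null model (A = I),
    with data at s_i = i * n^{-(1-delta)} and Sigma assumed to be an
    exponential covariance plus a nugget."""
    s = np.arange(1, n + 1) * n ** (-(1.0 - delta))
    Sigma = (sigma2_eta * np.exp(-kappa_eta * np.abs(s[:, None] - s[None, :]))
             + sigma2_eps * np.eye(n))
    mu = beta * s
    return float(mu @ np.linalg.solve(Sigma, mu))

# Fixed domain (delta = 0): the term stays bounded as n grows.
fixed = [noncentrality(n, 0.0) for n in (200, 800)]
# Increasing domain (delta = 0.5): the term grows with n.
increasing = [noncentrality(n, 0.5) for n in (200, 800)]
```

Under the fixed domain the term stays of order one, so a penalty λ → ∞ eventually dominates and GIC selects $\alpha_0$; under the increasing domain the term grows with n, which is what allows a penalty of smaller order to separate underfitted models.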
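Theorem 6 Consider the same setup as in Theorem 5 except that $x_j(s) = (sn^{-\delta})^j$; $j = 1,\dots,p$.
(i) For δ = 0 and any λ > 0,
$$\lim_{n\to\infty}P\big(\alpha_c = \arg\min_{\alpha\in A}L_{KL}(\alpha)\big) < 1.$$
In addition, if λ → ∞, then
$$\lim_{n\to\infty}P\big(\hat\alpha_{GIC(\lambda)} = \alpha_0\big) = 1,$$
where $\hat\alpha_{GIC(\lambda)}$ is defined in (4.2).
(ii) For δ ∈ (0, 1), if λ → ∞ and $n^\delta/\lambda\to\infty$ as n → ∞, then
$$\lim_{n\to\infty}P\big(\hat\alpha_{GIC(\lambda)} = \alpha_c\big) = 1.$$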
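Proof. (i) See (i) in the proof of Theorem 5.
(ii) From (ii) of Theorem 2, it suffices to show that $\lim_{n\to\infty}EL_{KL}(\alpha)/\lambda = \infty$ for $\alpha\in A\setminus A_c$. By (5.49),
$$\mu'A(\alpha)'\Sigma^{-1}A(\alpha)\mu = \gamma(\alpha)n^\delta + o(n^\delta),$$
where γ(α) is a constant, which is bounded away from 0 by Theorem 3.1 of Shibata (1981).
It follows from (3.9) and $\lim_{n\to\infty}\lambda/n^\delta = 0$ that
$$\lim_{n\to\infty}\frac{EL_{KL}(\alpha)}{\lambda} = \lim_{n\to\infty}\frac{EL_{KL}(\alpha)/n^\delta}{\lambda/n^\delta} = \infty.$$
This completes the proof. □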
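Theorem 7 Under the setup of Theorem 6, suppose that $\theta = (\sigma_\eta^2,\kappa_\eta,\sigma_\epsilon^2)'\in\Theta$ is unknown, where $\Theta\subset(0,\infty)^3$ is a compact set such that $\theta_0\in\Theta$. Let $\hat\theta(\alpha)$ be the ML estimate of θ. For δ = 0, if λ → ∞ as n → ∞, then
$$\lim_{n\to\infty}P\big(\hat\alpha_{GIC(\lambda)} = \alpha_0\big) = 1.$$
For δ ∈ (0, 1), if λ → ∞ and $\lambda/n^\delta\to 0$ as n → ∞, then
$$\lim_{n\to\infty}P\big(\hat\alpha_{GIC(\lambda)} = \alpha_c\big) = 1.$$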
Proof. First, for δ = 0, we prove
$$\lim_{n\to\infty}P\big(\hat\alpha_{GIC(\lambda)} = \alpha_0\big) = 1.$$
By (4.10) and (5.79), for $\alpha_0 = \emptyset$ and $\theta_\alpha^{(1)}$ defined in (5.77), we have
$$\begin{aligned}\Gamma_{GIC(\lambda)}(\alpha) - \Gamma_{GIC(\lambda)}(\alpha_0) ={}& -2\ell(\theta_\alpha^{(1)};\alpha) + 2\ell(\theta_{\alpha_0}^{(1)};\alpha_0) + \lambda(p(\alpha)-p(\alpha_0)) + O_p(1) \\ ={}& \lambda(p(\alpha)-p(\alpha_0)) + \xi(\theta_\alpha^{(1)}) - \xi(\theta_{\alpha_0}^{(1)}) + O_p(1) \\ ={}& \lambda(p(\alpha)-p(\alpha_0)) + O_p(1) > 0,\end{aligned}$$
as n → ∞ with probability tending to 1, where the second equality follows from (5.54) and the third equality follows from (5.76).
Second, for δ ∈ (0, 1), we prove
$$\lim_{n\to\infty}P\big(\hat\alpha_{GIC(\lambda)} = \alpha_c\big) = 1.$$
It suffices to show that the conditions in Theorem 3 are satisfied. First, by (5.36) and (5.37), we have
$$X'\Sigma^{-1}(\theta)X = \frac{\kappa_\eta}{2\sigma_\eta^2}V_{p\times p}n^\delta + o(n^\delta),$$
where $V_{p\times p}$ is defined in (5.48) and is nonsingular. Then (A.2) is satisfied. Second, by (5.38), (A.3) is satisfied trivially. Third, (A.4)-(A.5) follow from (5.78) and (5.80) for $\tau_n = n^\delta$ and $\theta_\alpha = \theta_\alpha^{(1)}$ defined in (5.77). Fourth, (A.1) holds by (5.49). Fifth, for ξ(θ) defined in (5.53), by (5.71), we have
$$\xi(\theta_0) - \xi(\theta_\alpha^{(1)}) = o_p(n^\delta).$$
Hence, (4.12) holds. Last, for $\alpha\in A_c$, $\theta_\alpha^{(1)} = \theta_0$, so (4.14) holds trivially. This completes the proof. □