Unknown Covariance Parameters - 空間統計模型選取之大樣本理論

α∈Ainf L^KL(α) = 1.

Proof. The proof is essentially the same as that for Theorem 2 of Shao (1997) and hence

is omitted. 2

Theorem 2 reduces to Theorem 2 of Shao (1997) if Σ = σ²I. Similar to (4.5), Equation (4.7) provides a condition for risks associated with underfitted models. Equa-tion (4.8) is a weak technique condiEqua-tion that holds trivially when p is fixed. In fact, (4.7) is slightly weaker than the two conditions given in Theorem 2 of Shao (1997):

n→∞lim inf

α∈A\A^cE(L^KL(α))/n > 0 and lim

n→∞λp/n = 0. Similar to Corollary 1, we have the following corollary.

Corollary 2 Consider a class of models given by (3.1) with xj(s)’s independently gener-ated from white-noise processes of (5.7), where p is fixed and A^c6= ∅. If lim

n→∞tr¡ Σ⁻¹¢±

λ =

∞ and λ → ∞, then lim

n→∞P¡ ˆ

α_GIC(λ) = α^c¢

= 1. In addition, plim

n→∞

L^KL(ˆα_GIC(λ)) .

α∈Ainf L^KL(α) = 1.

Similar to the remark given right after Corollary 1, lim

n→∞λtr¡ Σ¢±

n² = 0 is sufficient for lim

n→∞tr¡

Σ⁻¹¢±

λ = ∞. (see an example in Theorem 12 of Section 5.3).

4.3 Unknown Covariance Parameters

In practice, the covariance parameter vector θ is usually unknown and needs to be esti-mated. Two approaches are commonly applied under this situation. The first one utilizes a two-step procedure by first estimating the covariance parameters using, for example, ML or REML, and then pretending the estimated parameters as known for subsequent

inference or prediction. The other one applies a Bayesian method that requires specify-ing a joint prior distribution for all the unknown parameters. Here we consider only the former one with ˆθ(α) being the ML estimate of θ for α ∈ A, obtained by maximizing the following profile log-likelihood function,

`(θ; α) = −1

2n log(2π) − 1

2log det(Σ(θ))

−1

2(Z − X(α) ˆβ(α; θ))⁰Σ⁻¹(θ)(Z − X(α) ˆβ(α; θ)), (4.9) where ˆβ(α; θ) ≡ (X(α)⁰Σ(θ)⁻¹X(α))⁻¹X(α)⁰Σ(θ)⁻¹Z and Σ is written as Σ(θ) to emphasis its dependence on θ. Let Θ be the parameter space for θ, and let θ₀ ∈ Θ be the true covariance parameter vector. We shall develop asymptotic properties of GIC,

Γ_GIC(λ)(α) = −2`( ˆθ(α); α) + λ(p(α)), (4.10)

under both the fixed domain asymptotic and the increasing domain asymptotic frame-works. The main difficulty to overcome is that some components of ˆθ(α) may converge to nondegenerate distributions even for α ∈ A^c under the fixed domain asymptotic frame-work.

We impose some regularity conditions for establishing asymptotic properties of GIC.

Denote by λmin(M ) the smallest eigenvalue of a symmetric matrix M . We consider some regularity conditions. Suppose that there exists τ_n → ∞ such that the following are satisfied;

(A.1) For θ ∈ Θ, lim

n→∞

1 τ_n inf

α∈A\A^cµ⁰A(α; θ)⁰Σ⁻¹(θ)A(α; θ)µ > 0, where A(α; θ) is defined in (3.6).

(A.2) For θ ∈ Θ, lim

n→∞

1 τ_nλmin

¡X⁰Σ⁻¹(θ)X¢

> 0.

(A.3) For θ ∈ Θ, lim

n→∞

1 τ_nλ_max¡

X⁰Σ⁻¹(θ)Σ(θ₀)Σ⁻¹(θ)X¢

< ∞.

(A.4) For α ∈ A, there exists some θ_α∈ Θ such that plim

n→∞

1 τn

¡`( ˆθ(α); α) − `(θ_α; α)¢

= 0.

(A.5) For α ∈ A \ A^c and θα given in (A.4), plim

n→∞

1 τ_n

¡L^KL(α; ˆθ(α)) − L^KL(α; θα)¢

= 0.

In most cases, τ_n can be chosen as inf

j∈α^cX_j⁰Σ⁻¹(θ)X_j or λ_min¡

X⁰Σ⁻¹(θ)X¢

, where X_j is the jth column of X (see Theorems 7, 10 and 13). Condition (A.1) provides the effect suffered from applying an incorrect model. Condition (A.2) ensures that the explanatory variables are not too much correlated. Obviously, (A.4) and (A.5) hold when plim

n→∞

θ(α) =ˆ θ_α, for some θ_α ∈ Θ. In some situation, θ_α is different from θ₀. For example, when α ∈ A \ A^c, ˆθ(α) generally does not converge in probability to θ₀. Surprisingly, (A.4) and (A.5) may hold even if ˆθ(α) converges to a nondegenerate distribution (see Theorems 7, 10 and 13).

Theorem 3 Consider a class of models given by (3.1) with p fixed. Let Θ be a compact parameter space for θ with θ₀ ∈ Θ being the true parameter, and let L^KL(α) be the KL loss defined in (3.3). Suppose that for α ∈ A, `(θ; α) defined in (4.9) is continuous in Θ, and (A.1)-(A.5) are satisfied for some τn→ ∞.

(i) For A^c= ∅, if τ_n/λ → ∞, and the following two conditions hold for α ∈ A:

n→∞lim sup

α∈A\A^c

τ_nµ⁰A(α; θ)⁰Σ⁻¹(θ)Σ(θ₀)Σ⁻¹(θ)A(α; θ)µ < ∞, (4.11) plim

n→∞

1 τ_ntr¡

((η + ²)(η + ²)⁰− Σ(θ₀))¡

Σ⁻¹(θ_α) − Σ⁻¹(θ₀)¢¢

= 0, (4.12) then GIC defined in (4.2) is asymptotically loss efficient:

plim

n→∞L^KL( ˆθ(ˆα_GIC(λ)); ˆα_GIC(λ)) .

minα∈AL^KL( ˆθ(α); α) = 1. (4.13)

(ii) For A^c6= ∅, if λ → ∞, τ_n/λ → ∞, (4.12) holds, and

n→∞lim 1 τ_n

¡log det(Σ(θ_α)) − log det(Σ(θ₀)) + tr(Σ(θ₀)Σ⁻¹(θ_α)) − n¢

= 0,(4.14) for α ∈ A^c, then lim

n→∞P¡ ˆ

αGIC(λ) = α^c¢

= 1.

Proof. (i) We first prove that for α ∈ A,

Γ_GIC(λ)(α) = n log(2π) + log det(Σ(θ0)) + (η + ²)⁰Σ⁻¹(θ0)(η + ²) + 2L^KL(α; θα)

+o_p(τ_n). (4.15)

By (3.3) and (3.7), we can rewrite 2L^KL(α; θ_α) as

2L^KL(α; θα) = log det(Σ(θα)) − log det(Σ(θ0)) + tr(Σ(θ0)Σ⁻¹(θα)) − n +µ⁰A(α; θ_α)⁰Σ⁻¹A(α; θ_α)(θ)µ

+(η + ²)M (α; θ_α)⁰Σ⁻¹(θ_α)(η + ²), (4.16) where M (α; θ_α) and A(α; θ_α) are defined in (3.5) and (3.6). By (4.10), we have for α ∈ A,

Γ_GIC(λ)(α) = −2`( ˆθ(α); α) + 2`(θ_α; α) − 2`(θ_α; α) + λp(α)

= −2`(θα; α) + λp(α) + op(τn)

= n log(2π) + log det(Σ(θα)) + µ⁰A(α; θα)⁰Σ⁻¹(θα)A(α; θα)µ

−2µ⁰A(α; θ_α)⁰Σ⁻¹(θ_α)A(α; θ_α)(η + ²) + (η + ²)⁰Σ⁻¹(θ_α)(η + ²)

−(η + ²)⁰M (α; θ_α)⁰Σ⁻¹(θ_α)(η + ²) + o_p(τ_n)

= n log(2π) + log det(Σ(θ_α)) + µ⁰A(α; θ_α)⁰Σ⁻¹(θ_α)A(α; θ_α)µ +(η + ²)⁰Σ⁻¹(θ_α)(η + ²) + o_p(τ_n)

= n log(2π) + log det(Σ(θ₀)) + (η + ²)⁰Σ⁻¹(θ₀)(η + ²) + 2L^KL(α; θ_α) +tr((η + ²)(η + ²)⁰− Σ(θ0)(Σ⁻¹(θα) − Σ⁻¹(θ0))) + op(τn)

= n log(2π) + log det(Σ(θ₀)) + (η + ²)⁰Σ⁻¹(θ₀)(η + ²) + 2L^KL(α; θ_α) +o_p(τ_n),

where the second equality follows from (A.4), the third equality follows from (4.9), the fourth equality follows from the following two equations, which will be proved later:

(η + ²)⁰M (α; θ_α)⁰Σ⁻¹(θ_α)(η + ²) = O_p(1); α ∈ A, (4.17) µ⁰A(α; θα)⁰Σ⁻¹(θα)A(α; θα)(η + ²) = op(τn); α ∈ A, (4.18)

the fifth equality follows from (4.16) and

(η + ²)⁰Σ⁻¹(θ_α)(η + ²) − (η + ²)⁰Σ⁻¹(θ₀)(η + ²) + n − tr(Σ(θ₀)Σ⁻¹(θ_α))

= tr((η + ²)(η + ²)⁰− Σ(θ₀)(Σ⁻¹(θ_α) − Σ⁻¹(θ₀))),

and the last equality follows from (4.12). It remains to show (4.17) and (4.18). For (4.17), we have

(η + ²)⁰M (α; θα)⁰Σ⁻¹(θα)(η + ²)

µ(η + ²)⁰Σ⁻¹(θ_α)X(α) τn^1/2

¶µX(α)⁰Σ⁻¹(θ_α)X(α) τ_n

¶₋₁

µX(α)⁰Σ⁻¹(θ_α)(η + ²) τn^1/2

. (4.19)

By (A.2), µ

X⁰Σ⁻¹(θ_α)X τ_n

¶₋₁

= O_p(1). (4.20)

By (A.3),

n→∞lim 1

τ_nvar(X_j⁰Σ⁻¹(θ_α)(η + ²)) = lim

n→∞

τ_nX_j⁰Σ⁻¹(θ_α)Σ(θ₀)Σ⁻¹(θ_α)X_j < ∞.

where Xj be the jth column of X. This together with E(X_j⁰Σ⁻¹(θα)(η + ²)) = 0 imply that

1 τn^1/2

X_j⁰Σ⁻¹(θα)(η + ²) = Op(1). (4.21) Therefore, (4.17) follows from (4.19)-(4.21). Using

µ⁰A(α; θα)⁰Σ⁻¹(θα)A(α; θα)(η + ²) = µ⁰A(α; θα)⁰Σ⁻¹(θα)(η + ²), (4.11) and the Markov’s inequality, we have for any ε > 0,

n→∞lim P ¡

|µ⁰A(α; θ_α)⁰Σ⁻¹(θ_α)(η + ²)±

τ_n| > ε¢

≤ lim

n→∞P ¡

|µ⁰A(α; θ_α)⁰Σ⁻¹(θ_α)(η + ²)±

τ_n|² ≥ ε²¢

≤ lim

n→∞

1 ε²τ_n²E¡

µ⁰A(α; θ_α)⁰Σ⁻¹(θ_α)Σ(θ₀)Σ⁻¹(θ_α)A(α; θ_α)µ¢

= 0.

This gives (4.18). Thus (4.15) is obtained.

We are now ready to prove (4.13). Let α^L = arg min

α∈A

L^KL(α; θ_α). By (4.15), we have

0 ≤ plim

n→∞

Γ_GIC(λ)(α^L) − Γ_GIC(λ)(ˆα_GIC(λ))

τ_n = plim

n→∞

L^KL(α^L; θ_α^L) − L^KL(ˆα_GIC(λ); θ_α_ˆ_GIC(λ))

τ_n ≤ 0,

for some θ_α_ˆ, θ_α^L ∈ Θ where the first inequality follows from the definition of ˆα_GIC(λ), the equality follows from (4.15) and the last inequality follows from the definition of α^L. It follows that

plim

n→∞

L^KL(ˆα_GIC(λ);; θ_α_ˆ_GIC(λ)) − L^KL(α^L; θ_α^L) τn

= 0. (4.22)

In addition, by (A.1) and (4.16), plim

n→∞

τ_nL^KL(α; θ_α)) > plim

n→∞

τ_nµ⁰A(α; θ_α)⁰Σ⁻¹(θ_α)A(α; θ_α)µ > 0.

This together with (4.22) implies that plim

n→∞L^KL(α^L; θ_α)±

L^KL(ˆα_GIC(λ); θ_α_ˆ_GIC(λ)) = 1. Then by (A.5),

plim

n→∞L^KL(α^L; ˆθ(α^L))±

L^KL(ˆαGIC(λ); ˆθ(ˆαGIC(λ))) = 1.

which gives (4.13). This completes the proof of (i).

(ii) We first prove (4.15) for A^c 6= ∅. The proof is essentially the same as that in (i) except (4.18) needs to be shown as follows:

µ⁰A(α; θ_α)⁰Σ⁻¹(θ_α)A(α; θ_α)(η + ²)

= µ⁰A(α; θ_α)⁰Σ⁻¹(θ_α)(η + ²)

= β(α^c\ α)⁰X(α^c\ α)⁰Σ⁻¹(θ_α)(η + ²)

−

µβ(α^c\ α)⁰X(α^c\ α)⁰Σ⁻¹(θ_α)X(α) τn^1/2

¶µX(α)⁰Σ⁻¹(θ_α)X(α) τ_n

¶₋₁

µX(α)⁰Σ⁻¹(θ_α)(η + ²) τn^1/2

= β(α^c\ α)⁰X(α^c\ α)⁰Σ⁻¹(θ_α)(η + ²) + O_p(1)

= o_p(τ_n),

where the second last equality follows similarly from the proof of (4.17) and the last equality follows from (4.21).

Second, we prove that lim

n→∞P¡

Γ_GIC(λ)(α) > Γ_GIC(λ)(α^c)¢

= 1, for α ∈ A \ A^c. By (A.4), we have for α ∈ A^c,

Γ_GIC(λ)(α) = −2`(θ_α; α) + o_p(τ_n)

= n log(2π) + log det(Σ(θα)) + (η + ²)⁰Σ⁻¹(θα)(η + ²)

−(η + ²)⁰M (α; θ_α)⁰Σ⁻¹(θ₀)(η + ²) + o_p(τ_n)

= n log(2π) + log det(Σ(θ_α)) + (η + ²)⁰Σ⁻¹(θ_α)(η + ²) + o_p(τ_n),(4.23) where the first equality follows from λp = o(τn) and the last equality follows from (4.17).

Then, by (4.15) and (4.23), we have for α ∈ A \ A^c, Γ_GIC(λ)(α) − Γ_GIC(λ)(α^c)

= 2L^KL(α; θ_α) + log det(Σ(θ₀)) + (η + ²)⁰Σ⁻¹(θ₀)(η + ²)

− log det(Σ(θ_α^c)) − (η + ²)⁰Σ⁻¹(θ_α^c)(η + ²) + o(τ_n)

= 2L^KL(α; θ_α) − tr¡

((η + ²)(η + ²)⁰− Σ(θ₀))¡

Σ⁻¹(θ_α^c) − Σ⁻¹(θ₀)¢¢

−¡

log det(Σ(θ_α^c)) − log det(Σ(θ₀)) + tr(Σ(θ₀)Σ⁻¹(θ_α^c)) − n¢

+ o_p(τ_n)

= 2L^KL(α; θ_α) + o_p(τ_n) > 0,

as n → ∞ with probability tending to 1, where the last equality follow from (4.12), (4.14) and (4.22). It follows that lim

n→∞P¡ ˆ

αGIC(λ) ∈ A \ A^c¢

= 0.

Last, it remains to show that GIC achieves its minimum at α^c among α ∈ A^c. For as n → ∞ with probability tending to 1, which follows that lim

n→∞P¡ ˆ

α_GIC(λ) ∈ A^c, ˆα_GIC(λ) 6=

α^c¢

= 0. This completes the proof of the theorem. 2

Conditions (A.1)-(A.3) in Theorem 3 not only depend on explanatory variables but also depend on asymptotic frameworks. As shown in Theorem 7, those conditions are easier to be satisfied under the increasing domain asymptotic framework, particularly when the domain increases with the sample size in a faster rate. On the other hand, (A.1) may not be satisfied under the fixed domain asymptotic framework.

Theorem 3 is for fixed designs. A random design version is given in the following corollary.

Corollary 3 (random design) Consider a class of models given by (3.1) with p fixed and X random, where X is independent of (η + ²). Let Θ be a compact parameter space for θ with θ0 ∈ Θ being the true parameter vector, and let L^KL(α) be the KL loss defined

(A.3’) For θ ∈ Θ, lim

n→∞

τ_ntr(Σ⁻¹(θ)Σ(θ₀)Σ⁻¹(θ)E(X_jX_j⁰)) < ∞, where X_j is the jth column of X,

and (A.4)-(A.5) are satisfied.

(i) For A^c= ∅, if τ_n/λ → ∞, (4.12) holds and

n→∞lim sup

α∈A\A^c

1 τ_nE¡

µ⁰A(α; θ)⁰Σ⁻¹(θ)Σ(θ₀)Σ⁻¹(θ)A(α; θ)µ¢

< ∞,

for θ ∈ Θ, then GIC defined in (4.2) is asymptotically loss efficient:

plim

n→∞L^KL( ˆθ(ˆα_GIC(λ)); ˆα_GIC(λ)) .

minα∈AL^KL( ˆθ(α); α) = 1.

(ii) For A^c6= ∅, if λ → ∞, τ_n/λ → ∞, (4.12) and (4.14) hold, then

n→∞lim P¡ ˆ

α_GIC(λ) = α^c¢

= 1.

Similar to (A.1)-(A.3) in Theorem 3 under fixed designs, (A.1’)-(A.3’) in Corollary 3 not only depend on explanatory variables but also depend on asymptotic frameworks.

In contrast to fixed designs with smooth functions as explanatory variables, where (A.1) may not be satisfied (see Theorem 7) under the fixed domain asymptotic framework, condition (A.1’) appear to be easier satisfied when random designs are considered (see some examples in Theorems 10 and 13).

Chapter 5 Exponential Covariance Models in One Dimension

In this chapter, we consider some examples in the one-dimensional space with η(·) of (2.2) generated from an exponential covariance function:

cov(η(s), η(s)⁰) = σ²_ηexp(−κ_η|s − s⁰|); s, s⁰ ∈ R, (5.1) where σ_η² > 0 and κ_η > 0. Let s_i = in^−(1−δ) i = 1, . . . , n, for some δ ∈ [0, 1). Then {η(s₁), . . . η(s_n)} can be expressed as an AR(1) process:

η(s_i) = ρ_nη(s_i−1) + ζ_i, (5.2) where

ρ_n≡ exp(−κ_ηn^−(1−δ)), (5.3)

η(s₁) ∼ N(0, σ²_η), ζ_i ∼ N(0, σ²_η(1 − ρ²_n)) is independent of η(s_i−1) for i = 2, . . . , n, and η(s₁), ζ₂, . . . , ζ_n are independent. Then the covariance parameter vector can be written as θ ≡ (σ_η², κη, σ²_²)⁰.

In what follows, we consider four examples corresponding to four different classes of explanatory variables in (3.1) with the exponential covariance model of (5.1) for η(·).

Example 1 (polynomials) Suppose that there are p explanatory variables, xj(si); j = 1, . . . , p, sampled at s_i = in^−(1−δ); i = 1, . . . , n, with x_j(·) given by

x_j(s) = s^j; s ∈ R, j = 1, . . . , p, (5.4) where p is fixed and δ ∈ [0, 1).

Example 2 (polynomials varying with n) Suppose that there are p explanatory variables x_j(s_i); j = 1, . . . , p, sampled at s_i = in^−(1−δ); i = 1, . . . , n, with x_j(·) given by

x_j(s) = (sn^−δ)^j; s ∈ R, j = 1, . . . , p, (5.5) where p is fixed and δ ∈ [0, 1).

Example 3 (spatially dependent processes) Suppose that there are p explanatory vari-ables x_j(s_i); j = 1, . . . , p, sampled at s_i = in^−(1−δ); i = 1, . . . , n, where x₁(·), . . . , x_p(·) are independent zero-mean Gaussian spatial processes with covariance functions,

cov(xj(s), xj(s⁰)) = σ_j²exp{−κj|s − s⁰|}; s, s⁰ ∈ R, j = 1, . . . , p, (5.6) p is fixed, δ ∈ [0, 1), and σ²_j, κj > 0; j = 1, . . . , p.

Example 4 (white noise processes) Suppose that there are p explanatory variables x_j(s_i);

j = 1, . . . , p, sampled at s_i = in^−(1−δ); i = 1, . . . , n, where x₁(·), . . . , x_p(·) are independent white-noise processes with

x_j(s_i) ∼ N(0, σ²_j); i = 1, . . . , n, j = 1, . . . , p, (5.7) p is fixed, δ ∈ [0, 1), and σ²_j > 0; j = 1, . . . , p.

We shall characterize the asymptotic behavior of GIC under both the fixed domain and the increasing domain frameworks with θ being either known or estimated by ML.

We shall also show how different generating mechanism of explanatory variables in the aforementioned examples affects the asymptotic behavior.

First, we introduce some notations and a number of technical lemmas regarding ex-ponential covariance functions, which are crucial for developing the asymptotical results of GIC. Let

G_k ≡







1 0 0 · · · 0

−ρn 1 0 . .. ...

0 −ρ_n 1 . .. 0 ... . .. ... ... 0 0 · · · 0 −ρn 1







k×k

, (5.8)

T_k ≡







σ_η²+ σ²_² −σ_²²ρ_n 0 · · · 0

−σ_²²ρn f1(ρn) −σ²_²ρn . .. ...

0 −σ_²²ρ_n f₁(ρ_n) . .. 0 ... . .. . .. . .. −σ²_²ρ_n 0 · · · 0 −σ²_²ρn f1(ρn)







k×k

, (5.9)

be k × k matrices, where

f₁(ρ_n) ≡ (1 − ρ²_n)σ_η²+ (1 + ρ²_n)σ_²². (5.10) Lemma 4 Consider Σ(θ) and Σ_η defined in (3.2) and (5.1), where s_i = in^−(1−δ); i = 1, . . . , n, and δ ∈ [0, 1). Then

Σ⁻¹(θ) = G⁰_nT_n⁻¹G_n, (5.11) where G_n and T_n are given by (5.8) and (5.9), respectively.

Lemma 5 For any c > 0 and δ ∈ [0, 1) with n^(1−δ)/2+c < n, consider T_j_n defined in (5.9) with n^(1−δ)/2+c ≤ jn ≤ n. Let Cjn(k, `) be the (k, `)th element of T_j⁻¹_n . Then there exists a constant τ > 0 such that

σ_²^−2jⁿdet(T_j_n) = f₂^jⁿ⁻¹(ρ_n)

(f₁²(ρ_n) − 4ρ²_nσ⁴_²)^1/2((σ²_η+ σ_²²)f₂(ρ_n) − ρ²_nσ_²²) + o(exp(−τ n^c/2)),(5.12) where ρ_n and f₁(ρ_n) are given by (5.3) and (5.10), respectively, and

f₂(ρ_n) ≡ f1(ρn) + (f₁²(ρn) − 4ρ²_nσ⁴_²)^1/2

2σ_²² . (5.13)

In addition,

Cjn(1, `) = Cjn(`, 1) = ρ^`−1_n

((σ_η²+ σ²_²)f₂(ρ_n) − ρ²_nσ²_²)f₂^`−2(ρ_n) + o(exp(−τ n^c/2));

1 ≤ ` ≤ jn− n^(1−δ+c)/2, (5.14)

C_j_n(j_n, `) = C_j_n(`, j_n) = ρ^j_nⁿ^−`

f₂^jⁿ^−`+1(ρ_n)σ_²² + o(exp(−τ n^c/2)); n^(1−δ+c)/2 < ` ≤ j_n,(5.15)

1≤k,`≤jmaxn

C_j_n(k, `) = 1

(8κ_ησ_η²σ⁻²_² )^1/2n^(1−δ)/2+ o(n^−(1−δ)), (5.16) and

tr(T_n⁻¹) = n^(3−δ)/2

2(2κ_ησ²_ησ_²²)^1/2 + O(n^1−δ). (5.17) Furthermore, let Tn⁽¹⁾ be the matrix with (σ²_η, κ_η, σ²_²) in T_n replaced by (ση⁽¹⁾², κ⁽¹⁾η , σ⁽¹⁾²² ).

Then

tr(T_n⁻¹T_n⁽¹⁾⁻¹) = n^(5−3δ)/2

2^5/2(κησ_η²κ⁽¹⁾η σ⁽¹⁾²η )^1/2((κησ²_ησ²⁽¹⁾²)^1/2+ (κ⁽¹⁾η σ⁽¹⁾²η σ_²²)^1/2)

+O(n^2−δ). (5.18)

Notice that T_n defined in (5.9) corresponds to the variance-covariance matrix of a moving average (MA) process {υ₁, . . . , υ_n} of order 1:

var(υ_n) = T_n, (5.19)

where υ_n≡ (υ₁, . . . , υ_n),

υ_i = u_i− f₄(ρ_n)u_i−1; i = 2, . . . , n, (5.20) with u1 ∼ N(0, (σ²_η− σ_²²− f2(ρn)σ_²²)f₄⁻²(ρn)) and ui ∼ N(0, f2(ρn)σ_²²); i = 2, . . . , n,

f4(ρn) ≡ ρn/f2(ρn), (5.21) and recall that f₂(ρ_n) and ρ_nare defined in (5.13) and (5.3), respectively. Some asymptotic properties of f4(ρn) and Tn are given in the follow lemmas.

Lemma 6 With f2(ρn) and f4(ρn) defined in (5.13) and (5.21), respectively, we have f₄(ρ_n) = 1 − (2κ_ησ_η²σ_²⁻²)^1/2n^{−(1−δ)/2}+ O(n^−(1−δ)), (5.22) f₂(ρ_n) = 1 + (2κ_ησ_η²σ⁻²_² )^1/2n^{−(1−δ)/2}+ (σ_η²− σ_²²)σ⁻²_² κ_ηn^−(1−δ)+ O(n^{−3(1−δ)/2}),(5.23) and

log f₄(ρ_n) = −(2κ_ησ²_ησ_²⁻²)^1/2n^{−(1−δ)/2}+ O(n^−(1−δ)). (5.24) In addition, for any c > 0 and δ ∈ [0, 1) with n^(1−δ)/2+c < n, and any j_n with n^(1−δ)/2+c ≤ j_n≤ n, there exists a constant τ > 0 such that

f₄^jⁿ(ρn) = o(exp(−τ n^c)). (5.25)

Lemma 7 Consider T_n defined in (5.9). For any c > 0, δ ∈ [0, 1) with n^(1−δ)/2+c < n, and any j_n with n^(1−δ)/2+c ≤ j_n≤ n, there exists a constant τ > 0 such that

T_n⁻¹ = Ω⁰_n

µ Λ⁻¹_j_n 0

0 (f₂(ρ_n)σ_²²)⁻¹I_n−j_n

Ω_n+ o(exp(−τ n^c/2)), (5.26) where

Ω_n ≡







1 0 · · · 0

f₄(ρ_n) 1 . .. ...

... . .. . .. 0 f₄ⁿ⁻¹(ρ_n) · · · f₄(ρ_n) 1





, (5.27)

f₂(ρ_n) and f₄(ρ_n) are given by (5.13) and (5.21), respectively, and

Λ_k = Ω_kT_kΩ⁰_k. (5.28)

The following three lemmas are based on Lemmas 4-7, which are crucial in developing the asymptotical results of ML estimates in Sections 5.1-5.3.

Lemma 8 Consider Σ(θ) and Σ_η defined in (3.2) and (5.1), where s_i = in^−(1−δ); i = 1, . . . , n, and δ ∈ [0, 1). Let Σ^(j)η be the same as Σ_η except (σ²_η, κ_η) are replaced by

¡σ^(j)2η , κ^(j)η

¢. Define Σ^(j) ≡ Σ^(j)η + σ²^(j)2; j = 1, 2, 3. Then for δ ∈ [0, 1),

log(det(Σ(θ))) = n log σ²_² +

µ2κ_ησ²_η σ_²²

¶_1/2

n^(1+δ)/2−

µκ_η(σ²_η+ σ_²²) σ_²²

¶ n^δ

− log n^(1−δ)/2+ o(n^δ) + O(1), (5.29) tr(Σ⁽¹⁾_η Σ⁻¹(θ)) = σ⁽¹⁾²η κ⁽¹⁾η

(2κ_ησ²_ησ_²²)^1/2n^(1+δ)/2+ση⁽¹⁾²κ⁽¹⁾η (κ_η − κ⁽¹⁾η ) κ_ησ²_η n^δ +ση⁽¹⁾²(κ_η − κ⁽¹⁾η )²

2κ_ησ_η² n^δ+ o(n^δ) + O(1), (5.30) tr(Σ⁻¹(θ)) = n

σ_²² − (2κησ_η²σ⁻²_² )^1/2

2σ_²² n^(1+δ)/2+ o(n^δ) + O(1), (5.31) tr(Σ⁽¹⁾Σ⁻¹(θ)) = σ²⁽¹⁾²

σ²_² n − σ²⁽¹⁾²

2σ_²² (2κ_ησ_η²σ_²⁻²)^1/2n^(1+δ)/2+ ση⁽¹⁾²κ⁽¹⁾η

(2κησ_η²σ_²²)^1/2n^(1+δ)/2 +ση⁽¹⁾²κ⁽¹⁾η (κη − κ⁽¹⁾η )

κ_ησ²_η n^δ+ση⁽¹⁾²(κη− κ⁽¹⁾η )² 2κ_ησ_η² n^δ

+o(n^δ) + O(1), (5.32)

tr(Σ⁽¹⁾_η Σ⁻¹(θ)Σ⁽²⁾_η Σ⁽³⁾⁻¹) = σ⁽¹⁾²η κ⁽¹⁾η σ⁽²⁾²η κ⁽²⁾η n^(1+δ)/2

2^1/2(κ_ησ²_ηκ⁽³⁾η ση⁽³⁾²)^1/2((κ_ησ_η²σ²⁽³⁾²)^1/2+ (κ⁽³⁾η ση⁽³⁾²σ_²²)^1/2)

+O(n^δ), (5.33)

tr(Σ⁽¹⁾_η Σ⁻¹(θ)Σ⁽²⁾⁻¹) = ση⁽¹⁾κ⁽¹⁾η

σ²_²σ²⁽²⁾²((2κ_ησ_η²σ_²⁻²)^1/2+ (2κ⁽²⁾η σ⁽²⁾²η σ⁽²⁾⁻²² )^1/2) + O(n^δ), (5.34) tr(Σ⁽¹⁾⁻¹Σ⁻¹(θ)) = n

σ⁽¹⁾²² σ²_² − 1 σ²⁽¹⁾²σ²_²

µ(κησ²_ησ⁻²_² )^1/2+ (κ⁽¹⁾η ση⁽¹⁾²σ²⁽¹⁾⁻²)^1/2 2^1/2

− (κ_ησ_η²σ⁻²_² )^1/2(κ⁽¹⁾η ση⁽¹⁾²σ²⁽¹⁾⁻²)^1/2 2^1/2((κ_ησ²_ησ_²⁻²)^1/2+ (κ⁽¹⁾η ση⁽¹⁾²σ²⁽¹⁾⁻²)^1/2)

n^(1+δ)/2

+O(n^δ). (5.35)

Lemma 9 Consider Σ(θ) and Σ_η defined in (3.2) and (5.1), where s_i = in^−(1−δ); i = 1, . . . , n, and δ ∈ [0, 1). Let ψ_k ≡ n^−k(1^k, 2^k, . . . , n^k)⁰; k ∈ {0, 1, . . . }. Then for any k = 0, 1, . . . , ` = 1, 2, . . . , and any δ ∈ [0, 1),

ψ_k⁰Σ⁻¹ψ_` = κ_η

2σ_η²(k + ` + 1)n^δ+ 1

2σ_η² + k`

2κ_ησ_η²(k + ` − 1)n^−δ+ o(n^δ), (5.36) ψ₀⁰Σ⁻¹ψ0 = κ

2σ_η²n^δ+ 1

σ²_η + o(n^δ). (5.37)

In addition, for δ ∈ [0, 1), k, ` = 0, 1, . . . , p, and Σ⁽¹⁾ defined in Lemma 8,

ψ_k⁰Σ⁻¹Σ⁽¹⁾Σ⁻¹ψ` = O(n^δ). (5.38)

Lemma 10 Consider Σ(θ) and Ση defined in (3.2) and (5.1), where si = in^−(1−δ); i = 1, . . . , n, and δ ∈ [0, 1). Let Σ_j = var((x_j(s₁), . . . , x_j(s_n))⁰) with x_j(s) defined in (5.6).

(i) Suppose that |σ_²²− σ²| = o(1) for some constant σ² > 0. Then log(σ_²²) + σ²

σ_²² − log(σ²) − 1 = 1

2σ⁴(σ²_² − σ²)²+ o((σ²_² − σ²)²). (5.39) (ii) Suppose that |σ_²²−σ²| = o(1) for some constants σ² > 0. Then for any κ_η, σ_η², τ > 0,

µ 1 σ²_²

¶_1/2µ

1 − σ²

2σ²_² + τ 2κησ_η²

− µ 1

σ²

¶_1/2µ 1 2 + τ

2κησ²_η

= o(σ_²²− σ²). (5.40)

(iii) Suppose that |κ_ησ²_η− τ | = o(1) for some constant τ > 0. Then for any σ² > 0, µ2κ_ησ_η²

σ²

¶_1/2µ 1

2 + τ 2κ_ησ_η²

− µ2τ

σ²

¶_1/2

= (κ_ησ²_η− τ )²

2^5/2στ^3/2 + o((κ_ησ_η²− τ )²). (5.41) (iv) Suppose that |σ_²²− σ²| = o(1) and |κ_ησ_η² − τ | = o(1) for some constants σ², τ > 0.

Then for any κ_j, κ_j⁰, σ_j², σ_j⁰ > 0,

tr(Σ_j(Σ(θ) − Σ((σ²_η, κ_η, σ²)⁰))Σ_j(Σ((σ_η², κ_η, σ_²²)⁰) − Σ((σ_η², κ_η, σ²)⁰)))

= κ_jσ_j²κ_j⁰σ_j²0

2^9/2τ^3/2σ³(σ_²²− σ²)²n^(1+δ)/2+ o((σ_²²− σ²)²n^(1+δ)/2) + O(n^δ). (5.42)

(v) Suppose that |σ_²²− σ²| = o(1) and |κ_ησ_η² − τ | = o(1) for some constants σ², τ > 0.

Then for any κ_j, σ_j² > 0,

tr(Σ_j(Σ⁻¹(θ) − Σ⁻¹((σ²_η, κ_η, σ²)⁰))(Σ⁻¹(θ) − Σ⁻¹((σ_η², κ_η, σ²)⁰)))

= 5κ_jσ²_j

2^9/2τ^1/2σ⁷(σ_²²− σ²)²n^(1+δ)/2+ o((σ_²²− σ²)²n^(1+δ)/2) + O(n^δ). (5.43) (vi) Suppose that |σ_²²− σ²| = o(1) and |κ_ησ_η² − τ | = o(1) for some constants σ², τ > 0.

Then

tr((Σ⁻¹(θ) − Σ⁻¹((σ_η², κη, σ²)⁰))(Σ⁻¹(θ) − Σ⁻¹((σ²_η, κη, σ²)⁰)))

= 1

σ⁸(σ_²²− σ²)²n + o((σ²_² − σ²)n^(1+δ)/2) + O(n^δ). (5.44) (vii) Suppose that |σ_²² − σ²| = o(n^{−(1−δ)/2}) and |κησ²_η − τ | = o(1) for some constants

σ², τ > 0. Then for any κ_j, κ_j⁰, σ_j², σ_j⁰, c, d > 0 with cd = τ ,

tr(Σj(Σ⁻¹(θ) − Σ⁻¹((c, d, σ²)⁰))Σj⁰(Σ⁻¹(θ) − Σ⁻¹((c, d, σ²)⁰)))

= 5κ_jσ_j²κ_j⁰σ_j²0

2^9/2στ^7/2 (κ_ησ_η²− τ )²n^(1+δ)/2+ o((κ_ησ_η²− τ )²n^(1+δ)/2) + O(n^δ). (5.45) (viii) Suppose that |σ_²² − σ²| = o(n^{−(1−δ)/2}) and |κησ²_η − τ | = o(1) for some constants

σ², τ > 0. Then for any κ_j, σ_j², c, d > 0 with cd = τ ,

tr(Σj(Σ⁻¹(θ) − Σ⁻¹((c, d, σ²)⁰))(Σ⁻¹(θ) − Σ⁻¹((c, d, σ²)⁰)))

= κ_jσ_j²

2^9/2σ³τ^5/2(κησ²_η− τ )²n^(1+δ)/2+ o((κησ_η²− τ )²n^(1+δ)/2) + O(n^δ). (5.46) (ix) Suppose that |σ_²² − σ²| = o(n^{−(1−δ)/2}) and |κ_ησ²_η − τ | = o(1) for some constants

σ², τ > 0. Then for any c, d > 0 with cd = τ ,

tr((Σ⁻¹(θ) − Σ⁻¹((c, d, σ²)⁰))(Σ⁻¹(θ) − Σ⁻¹((c, d, σ²)⁰)))

= 1

2^9/2τ^5/2(κησ_η²− τ )²n^(1+δ)/2+ o((κησ_η²− τ )²n^(1+δ)/2) + O(n^δ). (5.47)

5.1 Polynomial Order Selection

In this section, we consider Examples 1 and 2 given by (5.4) and (5.5) for polynomial order selection. Note that in Example 1, the underlying true polynomial does not vary with the sample size, whereas in Example 2, the magnitude of the underlying true polynomial decreases as the sample size increases, making estimation and polynomial order selection more difficult. Let V_j×j⁰ be a j × j⁰ matrix with the (k, `)th element,

k + ` + 1; k = 1, . . . , j, ` = 1, . . . , j⁰. (5.48) Note that when j = j⁰, the square matrix Vj,j is nonsingular (see Shibata 1981).

Proposition 4 Consider a class of models given by (3.1) with p explanatory variables and V_j×j⁰ is defined in (5.48). In addition, the log-likelihood of (4.9) based on model α ∈ A can be decomposed into the following:

(i) For δ ∈ (0, 1),

Equation (5.52) provides some guidance of applying GIC to distinguish between correct and incorrect models in polynomial order selection. For example, it follows from (5.52) and γ(α^c) = 0 that for α ∈ A \ A^c and δ ∈ (0, 1),

−2`(θ; α) + 2`(θ; α^c) = γ(α)κ_η

2σ_η² n^δ+ o_p(n^δ), (5.55)

so that we can get rid of underfitted models if the penalty term has a smaller order than O(n^δ). As to be demonstrated in Theorem 6, we can use (5.55) to find an appropriate penalty λ that leads to selection consistency. On the other hand, applying (5.54) to Example 2 under the fixed domain asymptotic framework (i.e., δ = 0), we obtain

−2`(θ; α) + 2`(θ; α^c) = Op(1), (5.56) indicating that consistency or asymptotic loss efficiency of GIC is almost impossible.

Additionally, we see from (5.54) that the likelihood value depends on κ_η and σ_η² mainly through their product, but not their individual values under the fixed domain asymptotic framework when δ = 0. Consequently, variable selection based on GIC is expected to be not much affected by individual estimates of κ_η and σ_η² as long as the estimate of the microergodic parameter, κ_ησ_η², remains the same.

The following lemma provides consistency of the ML estimate of σ_²² and the microer-godic parameter, κ_ησ_η², under both the fixed domain and the increasing domain asymptotic frameworks with δ ∈ [0, 1). The results are extended from Chen et al. (2000) who consider only α ∈ A^c and δ = 0.

Lemma 11 Under the setup of Proposition 4, let Θ ⊂ (0, ∞)³ be a compact set and let θ(α) = (ˆˆ σ²_η(α), ˆκη(α), ˆσ²_²(α))⁰ be the ML estimate of θ based on model α. Then for any δ ∈ [0, 1),

σ_²²(α) = σ_²,0² + o_p(1), (5.57) ˆ

κη(α)ˆσ_η²(α) = κη,0σ_η,0² + op(1). (5.58)

The following theorem further provides the convergence rates for the ML estimates of κ_η, σ²_η and σ_²². These results are also extended from Chen et al. (2000) who consider only α ∈ A^c and δ = 0, and are keys for establishing some asymptotic properties of GIC in Theorem 7.

Theorem 4 Under the setup of Proposition 4, let Θ ⊂ (0, ∞)³ be a compact set and let θ(α) = (ˆˆ σ²_η(α), ˆκ_η(α), ˆσ²_²(α))⁰ be the ML estimate of θ based on model α. Then

(i) For δ ∈ (0, 1), ˆ

σ²_²(α) = σ_²,0² + o_p(n^{−(1−δ)/2}); α ∈ A, (5.59) ˆ

κ_η(α)ˆσ²_η(α) = κ_η,0σ_η,0² + o_p(n^{−(1−δ)/4}); α ∈ A, (5.60) ˆ

σ²_η(α) =

½ σ²_η,0+ o_p(1); if α ∈ A^c,

γ(α) + σ²_η,0+ op(1); if α ∈ A \ A^c, (5.61) ˆ

κη(α) =

½ κη,0+ op(1); if α ∈ A^c,

κ_η,0σ_η,0² (γ(α) + σ_η,0² )⁻¹+ o_p(1); if α ∈ A \ A^c, (5.62) where γ(α) > 0 is a constant defined in (5.51) for α ∈ A \ A^c.

(ii) For δ = 0 and any α ∈ A, ˆ

σ_²²(α) = σ_²,0² + Op(n^−1/2), (5.63) ˆ

κ_η(α)ˆσ_η²(α) = κ_η,0σ²_η,0+ O_p(n^−1/4). (5.64)

Proof. Denote σ_η,α² ≡ γ(α) + σ_η,0² and κ_η,α ≡ κ_η,0σ²_η,0/(γ(α) + σ_η,0² ), for α ∈ A, where γ(α) ≡ 0 for α ∈ A^c. Note that κ_η,ασ_η,α² = κ_η,0σ²_η,0.

First, we prove (5.59). By (5.57) and (5.58), it suffices to show that for |σ²_² − σ_²,0² | = o(1), |κ_ησ²_η − κ_η,0σ_η,0² | = o(1) and any ε > 0,

|σ²_²−σ_²,0² |≥εninf^{−(1−δ)/2}(−2`(θ; α) + 2`((σ_η², κ_η, σ_²,0² )⁰; α)) > 0, (5.65) as n → ∞ with probability tending to 1. By (5.52), we can write

−2`(θ; α) = n log(2π) − 1 − δ

where the last equality follows from (5.39) and (5.40). Therefore, for (5.65) to hold, it remains to show that Applying Chebyshev’s inequality on each of the three parts and using the following three moment conditions given from (5.42)-(5.44) on (5.68):

var¡

we obtain (5.67). This completes the proof of (5.59).

Second, we prove (5.60). By (5.58) and (5.59), it suffices to show that for |σ_²²− σ_²,0² | = o(n^{−(1−δ)/2}), |κησ_η²− κη,0σ_η,0² | = o(1) and any ε > 0,

|σ_η²κη−σ²_η,0κinfη,0|≥εn^{−(1−δ)/4}

¡− 2`(θ; α) + 2`((σ_η,α² , κ_η,α, σ²_²,0)⁰; α)¢

> 0, (5.69)

as n → ∞ with probability tending to 1. By (5.66), for |σ²_² − σ_²,0² | = o(n^{−(1−δ)/2}) and

|κ_ησ_η²− κ_η,0σ_η,0² | = o(1), we have

−2`(θ; α) + 2`((σ_η,α² , κη,α, σ_²,0² )⁰; α)

½µ2κ_ησ²_η σ_²,0²

¶_1/2µ 1

2 +κ_η,0σ²_η,0 2κ_ησ_η²

−

µ2κ_η,0σ_η,0² σ_²,0²

¶_1/2¾

n^(1+δ)/2

+ σ_η,α²

2κ_η,0σ²_η,0(κ_η− κ_η,α)²n^δ+ ξ(θ) − ξ((σ_η,α² , κ_η,α, σ²_²,0)⁰) + o_p(n^δ)

= (κ_ησ_η²− κ_η,0σ_η,0² )²n^(1+δ)/2

2^5/2(κ_η,0σ²_η,0)^3/2 + σ²_η,α(κ_η − κ_η,α)n^δ

2κ_η,0σ²_η,0 + ξ(θ) − ξ((σ_η,α² , κη,α, σ²_²,0)⁰)

+o_p(n^δ), (5.70)

where the first equality follows from (5.39) and the second equality follows from (5.41).

Therefore, for (5.69) to hold, it remains to show that

ξ(θ) − ξ((σ_η,α² , κ_η,α, σ_²,0² )⁰) = o_p(max((κ_ησ_η²− κ_η,0σ_η,0² )²n^(1+δ)/2, n^δ)), (5.71) which can be obtained from a decomposition similar to (5.68) in addition to the following three moment conditions given from (5.45)-(5.47):

var¡

η⁰(Σ⁻¹(θ) − Σ⁻¹((σ_η,α² , κ_η,α, σ_²,0² )⁰))η¢

= O(max((κ_ησ²_η− κ_η,0σ_η,0² )²n^(1+δ)/2, n^δ)), var¡

η⁰(Σ⁻¹(θ) − Σ⁻¹((σ²_η,α, κ_η,α, σ_²,0² )⁰))²¢

= O(max((κ_ησ²_η− κ_η,0σ_η,0² )²n^(1+δ)/2, n^δ)), var¡

²⁰(Σ⁻¹(θ) − Σ⁻¹((σ²_η,α, κ_η,α, σ_²,0² )⁰))²¢

= O(max((κ_ησ²_η− κ_η,0σ_η,0² )²n^(1+δ)/2, n^δ)).

Thus (5.69) is obtained. This completes the proof of (5.60).

Third, we prove (5.61) and (5.62). By (5.70) and (5.71), for |σ²_² − σ²_²,0| = o(n^{−(1−δ)/2}),

|σ²_ηκ_η − σ²_η,0κ_η,0| = o(n^{−(1−δ)/4}) and any ε > 0, we have

|κη−κinfη,α|≥ε−2`(θ; α) + 2`((σ²_η,α, κη,α, σ²_²,0)⁰; α) = σ_η,α²

2κ_η,0σ_η,0² ε²n^δ+ o_p(n^δ) > 0, as n → ∞ with probability tending to 1, which gives (5.62). This together with (5.60) gives (5.61).

Fourth, we prove (5.63). By (5.57) and (5.58), it suffices to show that for |σ_²²− σ_²,0² | = o(1) and |κ_ησ_η²− κ_η,0σ_η,0² | = o(1), there exists M > 0 such that

|σ²_²−σ²_²,0inf|≥M n^−1/2

> 0, (5.72)

as n → ∞ with probability tending to 1. By (5.54), for |σ_²² − σ_²,0² | = o(1) and |κ_ησ²_η −

where the second equality follows from (5.39) and (5.40), and the last equality follows from

ξ(θ) − ξ((σ_η², κη, σ_²,0² )⁰) = op

¡(σ²_² − σ_²,0² )²n¢

+ Op(1), (5.73) which can be obtained in a way similar to (5.67). Consequently, there exists M > 0 such that

|σ²²−σ²_²,0inf|≥M n^−1/2(−2`(θ; α) + 2`((σ_η², κη, σ_²,0² )⁰; α)) = M²

2σ_²,0⁴ + Op(1) > 0,

as n → ∞ with probability tending to 1. Thus, we obtain (5.72), and hence the proof of (5.63) is complete.

where the first equality follows from |σ_²²− σ²_²,0| = o(n^{−(1−δ)/2}), the second equality follows from (5.41), and the last equality follows from

ξ(θ) − ξ((σ_η,α² , κη,α, σ_²,0² )⁰) = op

¡(κησ²_η − κη,0σ_η,0² )²n^1/2¢

+ Op(1), (5.76) which can be obtained in a way similar to (5.71). Thus, (5.74), and hence (5.64) are

obtained. This completes the proof. 2

Note that a special case of Theorem 4 for which δ = 0 and β = 0, can be found in Zhang and Zimmerman (2005), where they consider no regressor, and hence consider no underfitted model.

Corollary 4 Under the setup of Theorem 4, let

θ⁽¹⁾_α = (γ(α) + σ_η,0² , κ_η,0σ_η,0² (γ(α) + σ_η,0² )⁻¹, σ_²,0² )⁰; α ∈ A, (5.77) where γ(α) ≡ 0 for α ∈ A^c. Then

plim

n→∞

n^δ(−2`( ˆθ(α); α) + 2`(θ_α⁽¹⁾; α)) = 0; if δ ∈ (0, 1), (5.78)

−2`( ˆθ(α); α) + 2`(θ⁽¹⁾_α ; α) = O_p(1); if δ = 0. (5.79) In addition, for L^KL(α; θ) defined in (3.3) and α ∈ A \ A^c,

plim

n→∞L^KL( ˆθ(α); α)±

L^KL(θ_α⁽¹⁾; α) = 1; if δ ∈ (0, 1), (5.80) L^KL( ˆθ(α); α) − L^KL(θ_α⁽¹⁾; α) = O_p(1); if δ = 0. (5.81)

Note that from Theorem 4, we have plim

n→∞

θ(α) = θˆ _α⁽¹⁾ for δ ∈ (0, 1), which immediately gives (5.78). On the other hand, (5.79) is somewhat surprising, because ˆθ(α) generally does not converge to θ⁽¹⁾α for δ = 0.

Theorem 5 Consider a class of models given by (3.1) with xj(s) = s^j; j = 1, . . . , p, and cov(η(s), η(s⁰)) = σ²_ηexp(−κ_η|s − s⁰|), where σ_η² > 0, κ_η > 0 and σ_²² > 0 are known, and p is fixed. Suppose that A = {α₀, α₁, . . . , α_p}, where α₀ = ∅, α_j = {1, . . . , j} for j = 1, . . . , p, and A^c6= ∅. In addition, suppose that the data are collected at si = in^−(1−δ); i = 1, . . . , n, for some δ ∈ [0, 1).

(i) For δ = 0 and any λ > 0,

n→∞lim P¡

α^c= arg min

α∈A

L^KL(α)¢

< 1. (5.82)

In addition, if λ → ∞, then

n→∞lim P¡ ˆ

α_GIC(λ) = α0

¢= 1, (5.83)

where ˆα_GIC(λ) is defined in (4.2).

(ii) For δ ∈ (0, 1), if λ → ∞ and n^(2p(α^c^)+1)δ±

λ → ∞ as n → ∞, then

n→∞lim P¡ ˆ

α_GIC(λ) = α^c¢

= 1.

Proof. (i) For δ = 0, by (3.8),

L^KL(α) = µ⁰A(α)⁰Σ⁻¹A(α)µ + (η + ²)⁰M (α)⁰Σ⁻¹(η + ²), where

(η + ²)⁰M (α)⁰Σ⁻¹(η + ²) = (η + ²)⁰Σ⁻¹X(α)(X(α)⁰Σ⁻¹X(α))⁻¹X(α)⁰Σ⁻¹(η + ²)

∼ χ²(p(α)), (5.84)

with χ²(k) denoting the chi-square distribution with k degrees of freedom. Similarly, (η + ²)⁰(M (α^c) − M (α))⁰Σ⁻¹(η + ²) ∼ χ²(p(α^c) − p(α)).

By (5.50), for α ∈ A \ A^c, we have µ⁰A(α)⁰Σ⁻¹A(α)µ = O(1). Hence, for α ∈ A \ A^c,

n→∞lim P¡

L^KL(α^c) − L^KL(α) > 0¢

= lim

n→∞P¡

(η + ²)⁰(M (α^c) − M (α))⁰Σ⁻¹(η + ²) − µ⁰A(α)⁰Σ⁻¹A(α)µ > 0¢

> 0.

Thus (5.82) is obtained.

For (5.83), by (4.6) with λ → ∞,

ΓGIC(λ)(α) = (Z − ˆµ(α))⁰Σ⁻¹(Z − ˆµ(α)) + λp(α)

= µ⁰A(α)⁰Σ⁻¹A(α)µ + (η + ²)⁰A(α)⁰Σ⁻¹(η + ²) + 2µ⁰A(α)⁰Σ⁻¹(η + ²) +λp(α)

= 2µ⁰A(α)⁰Σ⁻¹(η + ²) + (η + ²)⁰Σ⁻¹(η + ²) + λp(α) + o_p(λ),

where the last equality follows from (5.50) and (5.84). In addition, by Chebyshev’s in-equality and the following moment condition:

var(µ⁰A(α)⁰Σ⁻¹(η + ²)) = µ⁰A(α)⁰Σ⁻¹A(α)µ = O(1), we have µ⁰A(α)⁰Σ⁻¹(η + ²) = O_p(1). Therefore, for α ∈ A \ {α₀},

Γ_GIC(λ)(α) − Γ_GIC(λ)(α₀) = λ(p(α) − p(α₀)) + o_p(λ),

which is greater than zero with probability tending to 1. Thus (5.83) is obtained.

(ii) It suffices to show that lim_n→∞EL^KL(α)/λ = ∞ for α ∈ A \ A^c by (4.7). First, for α ∈ A \ A^c,

µ⁰A(α)⁰Σ⁻¹A(α)µ

= β⁰X⁰(Σ⁻¹− Σ⁻¹X(α)(X(α)⁰Σ⁻¹X(α))⁻¹X(α)Σ⁻¹)Xβ

= β^∗⁰X^∗⁰(Σ⁻¹− Σ⁻¹X^∗(α)(X^∗(α)⁰Σ⁻¹X^∗(α))⁻¹X^∗(α)Σ⁻¹)X^∗β^∗

= β^∗⁰(V_p,p− V_p,p(α)V_p(α),p(α)⁻¹ V_p(α),p)β^∗+ o(n^(2p(α^c^)+1)δ)

= β_p(α² c)e⁰_p(αc)(V_p,p − V_p,p(α)V_p(α),p(α)⁻¹ V_p(α),p)e_p(α^c₎n^(2p(α^c⁾⁺¹⁾ⁿ^δ + o(n^(2p(α^c^)+1)δ), where e_j is the jth column of I_p, β^∗(α) = D(α)β(α), X^∗(α) = D⁻¹(α)X(α) with

D(α) =







1 0 · · · 0 0 n^δ . .. ...

... ... ... 0 0 · · · 0 n^p(α)δ





,

V_j×j⁰ is defined in (5.48), and e⁰_p(αc)(V_p,p−V_p,p(α)V_p(α),p(α)⁻¹ V_p(α),p)e_p(α^c₎is a constant, which is bounded away from 0 by Theorem 3.1 of Shibata (1981). It follows from (3.9) and

n→∞lim λ±

n^(2p(α)+1)δ= 0 that

n→∞lim

EL^KL(α)

λ = lim

n→∞

EL^KL(α)/n^(2p(α^c^)+1)δ λ/n^(2p(α^c^)+1)δ = ∞.

This completes the proofs. 2

Theorem 6 Consider the same setup as in Theorem 5 except x_j(s) = (sn^−δ)^j; j = 1, . . . , p.

(i) For δ = 0 and any λ > 0,

n→∞lim P¡

α^c= arg min

α∈A

L^KL(α)¢

< 1.

In addition, if λ → ∞, then

n→∞lim P¡ ˆ

α_GIC(λ) = α₀¢

= 1, where ˆα_GIC(λ) is defined in (4.2).

(ii) For δ ∈ (0, 1), if λ → ∞ and n^δ±

λ → ∞ as n → ∞, then

n→∞lim P¡ ˆ

α_GIC(λ) = α^c¢

= 1.

Proof. (i). See (i) in Proof of Theorem 5.

(ii). From (ii) of Theorem 2, it suffices to show that limn→∞EL^KL(α)/λ = ∞ for α ∈ A \ A^c. By (5.49),

µ⁰A(α)⁰Σ⁻¹A(α)µ = γ(α)n^δ+ o(n^δ),

where γ(α) is a constant, which is bounded away from 0 by Theorem 3.1 of Shibata (1981).

It follows from (3.9) and lim

n→∞λ±

n^δ = 0 that

n→∞lim

EL^KL(α)

λ = lim

n→∞

EL^KL(α)/n^δ λ/n^δ = ∞.

This completes the proof. 2

Theorem 7 Under the setup of Theorem 6, suppose that θ = (σ_η², κ_η, σ_²²)⁰ ∈ Θ is un-known, where Θ ⊂ (0, ∞)³ is a compact set such that θ₀ ∈ Θ. Let ˆθ(α) be the ML estimate of θ . For δ = 0, if λ → ∞ as n → ∞, then

n→∞lim P¡ ˆ

α_GIC(λ) = α₀¢

= 1.

For δ ∈ (0, 1), if λ → ∞ and λ/n^δ → 0 as n → ∞, then

n→∞lim P¡ ˆ

α_GIC(λ) = α^c¢

= 1.

Proof. First, for δ = 0, we prove

n→∞lim P¡ ˆ

αGIC(λ)= α0

¢ = 1.

By (4.10) and by (5.79), for α₀ = ∅ and θ⁽¹⁾α defined in (5.77), we have

ΓGIC(λ)(α) − ΓGIC(λ)(α0) = −2`(θ_α⁽¹⁾; α) + 2`(θ⁽¹⁾_α₀; α0) + λ(p(α) − p(α0)) + Op(1)

= λ(p(α) − p(α₀)) + ξ(θ⁽¹⁾_α ) − ξ(θ⁽¹⁾_α₀) + O_p(1)

= λ(p(α) − p(α₀)) + O_p(1) > 0,

as n → ∞ with probability tending to 1, where the second equality follows from (5.54) and the third equality follows from (5.76).

Second, for δ ∈ (0, 1), we prove

n→∞lim P¡ ˆ

α_GIC(λ) = α^c¢

= 1.

It suffices to show that the conditions in Theorem 3 are satisfied. First, by (5.36) and (5.37), we have

X⁰Σ⁻¹(θ)X = κ_η

2σ_η²V_p×pn^δ+ o(n^δ),

where Vp×p is defined in (5.48) and is nonsingular. Then (A.2) is satisfied. Second, by (5.38), (A.3) is satisfied trivially. Third, (A.4)-(A.5) are followed by (5.78) and (5.80) for τ_n = n^δ and θ_α = θ⁽¹⁾α defined in (5.77). Fourth, (A.1) holds by (5.49). Fifth, for ξ(θ) defined in (5.53), by (5.71), we have

ξ(θ₀) − ξ(θ_α⁽¹⁾) = o_p(n^δ).

Hence, (4.12) holds. Last, for α ∈ A^c, θα⁽¹⁾ = θ0, (4.14) holds trivially. This completes

the proof. 2

在文檔中空間統計模型選取之大樣本理論 (頁 28-49)