• 沒有找到結果。

Unknown Covariance Parameters

α∈Ainf LKL(α) = 1.

Proof. The proof is essentially the same as that for Theorem 2 of Shao (1997) and hence

is omitted. 2

Theorem 2 reduces to Theorem 2 of Shao (1997) if Σ = σ2I. Similar to (4.5), Equation (4.7) provides a condition for risks associated with underfitted models. Equa-tion (4.8) is a weak technique condiEqua-tion that holds trivially when p is fixed. In fact, (4.7) is slightly weaker than the two conditions given in Theorem 2 of Shao (1997):

n→∞lim inf

α∈A\AcE(LKL(α))/n > 0 and lim

n→∞λp/n = 0. Similar to Corollary 1, we have the following corollary.

Corollary 2 Consider a class of models given by (3.1) with xj(s)’s independently gener-ated from white-noise processes of (5.7), where p is fixed and Ac6= ∅. If lim

n→∞tr¡ Σ−1¢±

λ =

∞ and λ → ∞, then lim

n→∞P¡ ˆ

αGIC(λ) = αc¢

= 1. In addition, plim

n→∞

LKLαGIC(λ)) .

α∈Ainf LKL(α) = 1.

Similar to the remark given right after Corollary 1, lim

n→∞λtr¡ Σ¢±

n2 = 0 is sufficient for lim

n→∞tr¡

Σ−1¢±

λ = ∞. (see an example in Theorem 12 of Section 5.3).

4.3 Unknown Covariance Parameters

In practice, the covariance parameter vector θ is usually unknown and needs to be esti-mated. Two approaches are commonly applied under this situation. The first one utilizes a two-step procedure by first estimating the covariance parameters using, for example, ML or REML, and then pretending the estimated parameters as known for subsequent

inference or prediction. The other one applies a Bayesian method that requires specify-ing a joint prior distribution for all the unknown parameters. Here we consider only the former one with ˆθ(α) being the ML estimate of θ for α ∈ A, obtained by maximizing the following profile log-likelihood function,

`(θ; α) = −1

2n log(2π) − 1

2log det(Σ(θ))

1

2(Z − X(α) ˆβ(α; θ))0Σ−1(θ)(Z − X(α) ˆβ(α; θ)), (4.9) where ˆβ(α; θ) ≡ (X(α)0Σ(θ)−1X(α))−1X(α)0Σ(θ)−1Z and Σ is written as Σ(θ) to emphasis its dependence on θ. Let Θ be the parameter space for θ, and let θ0 ∈ Θ be the true covariance parameter vector. We shall develop asymptotic properties of GIC,

ΓGIC(λ)(α) = −2`( ˆθ(α); α) + λ(p(α)), (4.10)

under both the fixed domain asymptotic and the increasing domain asymptotic frame-works. The main difficulty to overcome is that some components of ˆθ(α) may converge to nondegenerate distributions even for α ∈ Ac under the fixed domain asymptotic frame-work.

We impose some regularity conditions for establishing asymptotic properties of GIC.

Denote by λmin(M ) the smallest eigenvalue of a symmetric matrix M . We consider some regularity conditions. Suppose that there exists τn → ∞ such that the following are satisfied;

(A.1) For θ ∈ Θ, lim

n→∞

1 τn inf

α∈A\Acµ0A(α; θ)0Σ−1(θ)A(α; θ)µ > 0, where A(α; θ) is defined in (3.6).

(A.2) For θ ∈ Θ, lim

n→∞

1 τnλmin

¡X0Σ−1(θ)X¢

> 0.

(A.3) For θ ∈ Θ, lim

n→∞

1 τnλmax¡

X0Σ−1(θ)Σ(θ0−1(θ)X¢

< ∞.

(A.4) For α ∈ A, there exists some θα∈ Θ such that plim

n→∞

1 τn

¡`( ˆθ(α); α) − `(θα; α)¢

= 0.

(A.5) For α ∈ A \ Ac and θα given in (A.4), plim

n→∞

1 τn

¡LKL(α; ˆθ(α)) − LKL(α; θα

= 0.

In most cases, τn can be chosen as inf

j∈αcXj0Σ−1(θ)Xj or λmin¡

X0Σ−1(θ)X¢

, where Xj is the jth column of X (see Theorems 7, 10 and 13). Condition (A.1) provides the effect suffered from applying an incorrect model. Condition (A.2) ensures that the explanatory variables are not too much correlated. Obviously, (A.4) and (A.5) hold when plim

n→∞

θ(α) =ˆ θα, for some θα ∈ Θ. In some situation, θα is different from θ0. For example, when α ∈ A \ Ac, ˆθ(α) generally does not converge in probability to θ0. Surprisingly, (A.4) and (A.5) may hold even if ˆθ(α) converges to a nondegenerate distribution (see Theorems 7, 10 and 13).

Theorem 3 Consider a class of models given by (3.1) with p fixed. Let Θ be a compact parameter space for θ with θ0 ∈ Θ being the true parameter, and let LKL(α) be the KL loss defined in (3.3). Suppose that for α ∈ A, `(θ; α) defined in (4.9) is continuous in Θ, and (A.1)-(A.5) are satisfied for some τn→ ∞.

(i) For Ac= ∅, if τn/λ → ∞, and the following two conditions hold for α ∈ A:

n→∞lim sup

α∈A\Ac

1

τnµ0A(α; θ)0Σ−1(θ)Σ(θ0−1(θ)A(α; θ)µ < ∞, (4.11) plim

n→∞

1 τntr¡

((η + ²)(η + ²)0− Σ(θ0))¡

Σ−1α) − Σ−10)¢¢

= 0, (4.12) then GIC defined in (4.2) is asymptotically loss efficient:

plim

n→∞LKL( ˆθ(ˆαGIC(λ)); ˆαGIC(λ)) .

minα∈ALKL( ˆθ(α); α) = 1. (4.13)

(ii) For Ac6= ∅, if λ → ∞, τn/λ → ∞, (4.12) holds, and

n→∞lim 1 τn

¡log det(Σ(θα)) − log det(Σ(θ0)) + tr(Σ(θ0−1α)) − n¢

= 0,(4.14) for α ∈ Ac, then lim

n→∞P¡ ˆ

αGIC(λ) = αc¢

= 1.

Proof. (i) We first prove that for α ∈ A,

ΓGIC(λ)(α) = n log(2π) + log det(Σ(θ0)) + (η + ²)0Σ−10)(η + ²) + 2LKL(α; θα)

+opn). (4.15)

By (3.3) and (3.7), we can rewrite 2LKL(α; θα) as

2LKL(α; θα) = log det(Σ(θα)) − log det(Σ(θ0)) + tr(Σ(θ0−1α)) − n 0A(α; θα)0Σ−1A(α; θα)(θ)µ

+(η + ²)M (α; θα)0Σ−1α)(η + ²), (4.16) where M (α; θα) and A(α; θα) are defined in (3.5) and (3.6). By (4.10), we have for α ∈ A,

ΓGIC(λ)(α) = −2`( ˆθ(α); α) + 2`(θα; α) − 2`(θα; α) + λp(α)

= −2`(θα; α) + λp(α) + opn)

= n log(2π) + log det(Σ(θα)) + µ0A(α; θα)0Σ−1α)A(α; θα

−2µ0A(α; θα)0Σ−1α)A(α; θα)(η + ²) + (η + ²)0Σ−1α)(η + ²)

−(η + ²)0M (α; θα)0Σ−1α)(η + ²) + opn)

= n log(2π) + log det(Σ(θα)) + µ0A(α; θα)0Σ−1α)A(α; θα +(η + ²)0Σ−1α)(η + ²) + opn)

= n log(2π) + log det(Σ(θ0)) + (η + ²)0Σ−10)(η + ²) + 2LKL(α; θα) +tr((η + ²)(η + ²)0− Σ(θ0)(Σ−1α) − Σ−10))) + opn)

= n log(2π) + log det(Σ(θ0)) + (η + ²)0Σ−10)(η + ²) + 2LKL(α; θα) +opn),

where the second equality follows from (A.4), the third equality follows from (4.9), the fourth equality follows from the following two equations, which will be proved later:

(η + ²)0M (α; θα)0Σ−1α)(η + ²) = Op(1); α ∈ A, (4.17) µ0A(α; θα)0Σ−1α)A(α; θα)(η + ²) = opn); α ∈ A, (4.18)

the fifth equality follows from (4.16) and

(η + ²)0Σ−1α)(η + ²) − (η + ²)0Σ−10)(η + ²) + n − tr(Σ(θ0−1α))

= tr((η + ²)(η + ²)0− Σ(θ0)(Σ−1α) − Σ−10))),

and the last equality follows from (4.12). It remains to show (4.17) and (4.18). For (4.17), we have

(η + ²)0M (α; θα)0Σ−1α)(η + ²)

=

µ(η + ²)0Σ−1α)X(α) τn1/2

¶µX(α)0Σ−1α)X(α) τn

−1

×

µX(α)0Σ−1α)(η + ²) τn1/2

. (4.19)

By (A.2), µ

X0Σ−1α)X τn

−1

= Op(1). (4.20)

By (A.3),

n→∞lim 1

τnvar(Xj0Σ−1α)(η + ²)) = lim

n→∞

1

τnXj0Σ−1α)Σ(θ0−1α)Xj < ∞.

where Xj be the jth column of X. This together with E(Xj0Σ−1α)(η + ²)) = 0 imply that

1 τn1/2

Xj0Σ−1α)(η + ²) = Op(1). (4.21) Therefore, (4.17) follows from (4.19)-(4.21). Using

µ0A(α; θα)0Σ−1α)A(α; θα)(η + ²) = µ0A(α; θα)0Σ−1α)(η + ²), (4.11) and the Markov’s inequality, we have for any ε > 0,

n→∞lim P ¡

0A(α; θα)0Σ−1α)(η + ²)±

τn| > ε¢

lim

n→∞P ¡

0A(α; θα)0Σ−1α)(η + ²)±

τn|2 ≥ ε2¢

lim

n→∞

1 ε2τn2

µ0A(α; θα)0Σ−1α)Σ(θ0−1α)A(α; θα¢

= 0.

This gives (4.18). Thus (4.15) is obtained.

We are now ready to prove (4.13). Let αL = arg min

α∈A

LKL(α; θα). By (4.15), we have

0 ≤ plim

n→∞

ΓGIC(λ)L) − ΓGIC(λ)αGIC(λ))

τn = plim

n→∞

LKLL; θαL) − LKLαGIC(λ); θαˆGIC(λ))

τn ≤ 0,

for some θαˆ, θαL ∈ Θ where the first inequality follows from the definition of ˆαGIC(λ), the equality follows from (4.15) and the last inequality follows from the definition of αL. It follows that

plim

n→∞

LKLαGIC(λ);; θαˆGIC(λ)) − LKLL; θαL) τn

= 0. (4.22)

In addition, by (A.1) and (4.16), plim

n→∞

2

τnLKL(α; θα)) > plim

n→∞

1

τnµ0A(α; θα)0Σ−1α)A(α; θα)µ > 0.

This together with (4.22) implies that plim

n→∞LKLL; θα

LKLαGIC(λ); θαˆGIC(λ)) = 1. Then by (A.5),

plim

n→∞LKLL; ˆθ(αL))±

LKLαGIC(λ); ˆθ(ˆαGIC(λ))) = 1.

which gives (4.13). This completes the proof of (i).

(ii) We first prove (4.15) for Ac 6= ∅. The proof is essentially the same as that in (i) except (4.18) needs to be shown as follows:

µ0A(α; θα)0Σ−1α)A(α; θα)(η + ²)

= µ0A(α; θα)0Σ−1α)(η + ²)

= β(αc\ α)0X(αc\ α)0Σ−1α)(η + ²)

µβ(αc\ α)0X(αc\ α)0Σ−1α)X(α) τn1/2

¶µX(α)0Σ−1α)X(α) τn

−1

×

µX(α)0Σ−1α)(η + ²) τn1/2

= β(αc\ α)0X(αc\ α)0Σ−1α)(η + ²) + Op(1)

= opn),

where the second last equality follows similarly from the proof of (4.17) and the last equality follows from (4.21).

Second, we prove that lim

n→∞P¡

ΓGIC(λ)(α) > ΓGIC(λ)c

= 1, for α ∈ A \ Ac. By (A.4), we have for α ∈ Ac,

ΓGIC(λ)(α) = −2`(θα; α) + opn)

= n log(2π) + log det(Σ(θα)) + (η + ²)0Σ−1α)(η + ²)

−(η + ²)0M (α; θα)0Σ−10)(η + ²) + opn)

= n log(2π) + log det(Σ(θα)) + (η + ²)0Σ−1α)(η + ²) + opn),(4.23) where the first equality follows from λp = o(τn) and the last equality follows from (4.17).

Then, by (4.15) and (4.23), we have for α ∈ A \ Ac, ΓGIC(λ)(α) − ΓGIC(λ)c)

= 2LKL(α; θα) + log det(Σ(θ0)) + (η + ²)0Σ−10)(η + ²)

− log det(Σ(θαc)) − (η + ²)0Σ−1αc)(η + ²) + o(τn)

= 2LKL(α; θα) − tr¡

((η + ²)(η + ²)0− Σ(θ0))¡

Σ−1αc) − Σ−10)¢¢

¡

log det(Σ(θαc)) − log det(Σ(θ0)) + tr(Σ(θ0−1αc)) − n¢

+ opn)

= 2LKL(α; θα) + opn) > 0,

as n → ∞ with probability tending to 1, where the last equality follow from (4.12), (4.14) and (4.22). It follows that lim

n→∞P¡ ˆ

αGIC(λ) ∈ A \ Ac¢

= 0.

Last, it remains to show that GIC achieves its minimum at αc among α ∈ Ac. For as n → ∞ with probability tending to 1, which follows that lim

n→∞P¡ ˆ

αGIC(λ) ∈ Ac, ˆαGIC(λ) 6=

αc¢

= 0. This completes the proof of the theorem. 2

Conditions (A.1)-(A.3) in Theorem 3 not only depend on explanatory variables but also depend on asymptotic frameworks. As shown in Theorem 7, those conditions are easier to be satisfied under the increasing domain asymptotic framework, particularly when the domain increases with the sample size in a faster rate. On the other hand, (A.1) may not be satisfied under the fixed domain asymptotic framework.

Theorem 3 is for fixed designs. A random design version is given in the following corollary.

Corollary 3 (random design) Consider a class of models given by (3.1) with p fixed and X random, where X is independent of (η + ²). Let Θ be a compact parameter space for θ with θ0 ∈ Θ being the true parameter vector, and let LKL(α) be the KL loss defined

(A.3’) For θ ∈ Θ, lim

n→∞

1

τntr(Σ−1(θ)Σ(θ0−1(θ)E(XjXj0)) < ∞, where Xj is the jth column of X,

and (A.4)-(A.5) are satisfied.

(i) For Ac= ∅, if τn/λ → ∞, (4.12) holds and

n→∞lim sup

α∈A\Ac

1 τnE¡

µ0A(α; θ)0Σ−1(θ)Σ(θ0−1(θ)A(α; θ)µ¢

< ∞,

for θ ∈ Θ, then GIC defined in (4.2) is asymptotically loss efficient:

plim

n→∞LKL( ˆθ(ˆαGIC(λ)); ˆαGIC(λ)) .

minα∈ALKL( ˆθ(α); α) = 1.

(ii) For Ac6= ∅, if λ → ∞, τn/λ → ∞, (4.12) and (4.14) hold, then

n→∞lim P¡ ˆ

αGIC(λ) = αc¢

= 1.

Similar to (A.1)-(A.3) in Theorem 3 under fixed designs, (A.1’)-(A.3’) in Corollary 3 not only depend on explanatory variables but also depend on asymptotic frameworks.

In contrast to fixed designs with smooth functions as explanatory variables, where (A.1) may not be satisfied (see Theorem 7) under the fixed domain asymptotic framework, condition (A.1’) appear to be easier satisfied when random designs are considered (see some examples in Theorems 10 and 13).

Chapter 5

Exponential Covariance Models in One Dimension

In this chapter, we consider some examples in the one-dimensional space with η(·) of (2.2) generated from an exponential covariance function:

cov(η(s), η(s)0) = σ2ηexp(−κη|s − s0|); s, s0 ∈ R, (5.1) where ση2 > 0 and κη > 0. Let si = in−(1−δ) i = 1, . . . , n, for some δ ∈ [0, 1). Then {η(s1), . . . η(sn)} can be expressed as an AR(1) process:

η(si) = ρnη(si−1) + ζi, (5.2) where

ρn≡ exp(−κηn−(1−δ)), (5.3)

η(s1) ∼ N(0, σ2η), ζi ∼ N(0, σ2η(1 − ρ2n)) is independent of η(si−1) for i = 2, . . . , n, and η(s1), ζ2, . . . , ζn are independent. Then the covariance parameter vector can be written as θ ≡ (ση2, κη, σ2²)0.

In what follows, we consider four examples corresponding to four different classes of explanatory variables in (3.1) with the exponential covariance model of (5.1) for η(·).

Example 1 (polynomials) Suppose that there are p explanatory variables, xj(si); j = 1, . . . , p, sampled at si = in−(1−δ); i = 1, . . . , n, with xj(·) given by

xj(s) = sj; s ∈ R, j = 1, . . . , p, (5.4) where p is fixed and δ ∈ [0, 1).

Example 2 (polynomials varying with n) Suppose that there are p explanatory variables xj(si); j = 1, . . . , p, sampled at si = in−(1−δ); i = 1, . . . , n, with xj(·) given by

xj(s) = (sn−δ)j; s ∈ R, j = 1, . . . , p, (5.5) where p is fixed and δ ∈ [0, 1).

Example 3 (spatially dependent processes) Suppose that there are p explanatory vari-ables xj(si); j = 1, . . . , p, sampled at si = in−(1−δ); i = 1, . . . , n, where x1(·), . . . , xp(·) are independent zero-mean Gaussian spatial processes with covariance functions,

cov(xj(s), xj(s0)) = σj2exp{−κj|s − s0|}; s, s0 ∈ R, j = 1, . . . , p, (5.6) p is fixed, δ ∈ [0, 1), and σ2j, κj > 0; j = 1, . . . , p.

Example 4 (white noise processes) Suppose that there are p explanatory variables xj(si);

j = 1, . . . , p, sampled at si = in−(1−δ); i = 1, . . . , n, where x1(·), . . . , xp(·) are independent white-noise processes with

xj(si) ∼ N(0, σ2j); i = 1, . . . , n, j = 1, . . . , p, (5.7) p is fixed, δ ∈ [0, 1), and σ2j > 0; j = 1, . . . , p.

We shall characterize the asymptotic behavior of GIC under both the fixed domain and the increasing domain frameworks with θ being either known or estimated by ML.

We shall also show how different generating mechanism of explanatory variables in the aforementioned examples affects the asymptotic behavior.

First, we introduce some notations and a number of technical lemmas regarding ex-ponential covariance functions, which are crucial for developing the asymptotical results of GIC. Let

Gk







1 0 0 · · · 0

−ρn 1 0 . .. ...

0 −ρn 1 . .. 0 ... . .. ... ... 0 0 · · · 0 −ρn 1







k×k

, (5.8)

Tk







ση2+ σ2² −σ²2ρn 0 · · · 0

−σ²2ρn f1n) −σ2²ρn . .. ...

0 −σ²2ρn f1n) . .. 0 ... . .. . .. . .. −σ2²ρn 0 · · · 0 −σ2²ρn f1n)







k×k

, (5.9)

be k × k matrices, where

f1n) ≡ (1 − ρ2nη2+ (1 + ρ2n²2. (5.10) Lemma 4 Consider Σ(θ) and Ση defined in (3.2) and (5.1), where si = in−(1−δ); i = 1, . . . , n, and δ ∈ [0, 1). Then

Σ−1(θ) = G0nTn−1Gn, (5.11) where Gn and Tn are given by (5.8) and (5.9), respectively.

Lemma 5 For any c > 0 and δ ∈ [0, 1) with n(1−δ)/2+c < n, consider Tjn defined in (5.9) with n(1−δ)/2+c ≤ jn ≤ n. Let Cjn(k, `) be the (k, `)th element of Tj−1n . Then there exists a constant τ > 0 such that

σ²−2jndet(Tjn) = f2jn−1n)

(f12n) − 4ρ2nσ4²)1/2((σ2η+ σ²2)f2n) − ρ2nσ²2) + o(exp(−τ nc/2)),(5.12) where ρn and f1n) are given by (5.3) and (5.10), respectively, and

f2n) ≡ f1n) + (f12n) − 4ρ2nσ4²)1/2

²2 . (5.13)

In addition,

Cjn(1, `) = Cjn(`, 1) = ρ`−1n

((ση2+ σ2²)f2n) − ρ2nσ2²)f2`−2n) + o(exp(−τ nc/2));

1 ≤ ` ≤ jn− n(1−δ+c)/2, (5.14)

Cjn(jn, `) = Cjn(`, jn) = ρjnn−`

f2jn−`+1n²2 + o(exp(−τ nc/2)); n(1−δ+c)/2 < ` ≤ jn,(5.15)

1≤k,`≤jmaxn

Cjn(k, `) = 1

(8κηση2σ−2² )1/2n(1−δ)/2+ o(n−(1−δ)), (5.16) and

tr(Tn−1) = n(3−δ)/2

2(2κησ2ησ²2)1/2 + O(n1−δ). (5.17) Furthermore, let Tn(1) be the matrix with (σ2η, κη, σ2²) in Tn replaced by (ση(1)2, κ(1)η , σ(1)2² ).

Then

tr(Tn−1Tn(1)−1) = n(5−3δ)/2

25/2ηση2κ(1)η σ(1)2η )1/2((κησ2ησ²(1)2)1/2+ (κ(1)η σ(1)2η σ²2)1/2)

+O(n2−δ). (5.18)

Notice that Tn defined in (5.9) corresponds to the variance-covariance matrix of a moving average (MA) process {υ1, . . . , υn} of order 1:

var(υn) = Tn, (5.19)

where υn≡ (υ1, . . . , υn),

υi = ui− f4n)ui−1; i = 2, . . . , n, (5.20) with u1 ∼ N(0, (σ2η− σ²2− f2n²2)f4−2n)) and ui ∼ N(0, f2n²2); i = 2, . . . , n,

f4n) ≡ ρn/f2n), (5.21) and recall that f2n) and ρnare defined in (5.13) and (5.3), respectively. Some asymptotic properties of f4n) and Tn are given in the follow lemmas.

Lemma 6 With f2n) and f4n) defined in (5.13) and (5.21), respectively, we have f4n) = 1 − (2κηση2σ²−2)1/2n−(1−δ)/2+ O(n−(1−δ)), (5.22) f2n) = 1 + (2κηση2σ−2² )1/2n−(1−δ)/2+ (ση2− σ²2−2² κηn−(1−δ)+ O(n−3(1−δ)/2),(5.23) and

log f4n) = −(2κησ2ησ²−2)1/2n−(1−δ)/2+ O(n−(1−δ)). (5.24) In addition, for any c > 0 and δ ∈ [0, 1) with n(1−δ)/2+c < n, and any jn with n(1−δ)/2+c jn≤ n, there exists a constant τ > 0 such that

f4jnn) = o(exp(−τ nc)). (5.25)

Lemma 7 Consider Tn defined in (5.9). For any c > 0, δ ∈ [0, 1) with n(1−δ)/2+c < n, and any jn with n(1−δ)/2+c ≤ jn≤ n, there exists a constant τ > 0 such that

Tn−1 = Ω0n

µ Λ−1jn 0

0 (f2n²2)−1In−jn

n+ o(exp(−τ nc/2)), (5.26) where

n





1 0 · · · 0

f4n) 1 . .. ...

... . .. . .. 0 f4n−1n) · · · f4n) 1



, (5.27)

f2n) and f4n) are given by (5.13) and (5.21), respectively, and

Λk = ΩkTk0k. (5.28)

The following three lemmas are based on Lemmas 4-7, which are crucial in developing the asymptotical results of ML estimates in Sections 5.1-5.3.

Lemma 8 Consider Σ(θ) and Ση defined in (3.2) and (5.1), where si = in−(1−δ); i = 1, . . . , n, and δ ∈ [0, 1). Let Σ(j)η be the same as Ση except (σ2η, κη) are replaced by

¡σ(j)2η , κ(j)η

¢. Define Σ(j) ≡ Σ(j)η + σ²(j)2; j = 1, 2, 3. Then for δ ∈ [0, 1),

log(det(Σ(θ))) = n log σ2² +

µησ2η σ²2

1/2

n(1+δ)/2

µκη2η+ σ²2) σ²2

nδ

− log n(1−δ)/2+ o(nδ) + O(1), (5.29) tr(Σ(1)η Σ−1(θ)) = σ(1)2η κ(1)η

(2κησ2ησ²2)1/2n(1+δ)/2+ση(1)2κ(1)η η − κ(1)η ) κησ2η nδ +ση(1)2η − κ(1)η )2

ηση2 nδ+ o(nδ) + O(1), (5.30) tr(Σ−1(θ)) = n

σ²2 (2κηση2σ−2² )1/2

²2 n(1+δ)/2+ o(nδ) + O(1), (5.31) tr(Σ(1)Σ−1(θ)) = σ²(1)2

σ2² n − σ²(1)2

²2 (2κηση2σ²−2)1/2n(1+δ)/2+ ση(1)2κ(1)η

(2κηση2σ²2)1/2n(1+δ)/2 +ση(1)2κ(1)η η − κ(1)η )

κησ2η nδ+ση(1)2η− κ(1)η )2 ηση2 nδ

+o(nδ) + O(1), (5.32)

tr(Σ(1)η Σ−1(θ)Σ(2)η Σ(3)−1) = σ(1)2η κ(1)η σ(2)2η κ(2)η n(1+δ)/2

21/2ησ2ηκ(3)η ση(3)2)1/2((κηση2σ²(3)2)1/2+ (κ(3)η ση(3)2σ²2)1/2)

+O(nδ), (5.33)

tr(Σ(1)η Σ−1(θ)Σ(2)−1) = ση(1)κ(1)η

σ2²σ²(2)2((2κηση2σ²−2)1/2+ (2κ(2)η σ(2)2η σ(2)−2² )1/2) + O(nδ), (5.34) tr(Σ(1)−1Σ−1(θ)) = n

σ(1)2² σ2² 1 σ²(1)2σ2²

µησ2ησ−2² )1/2+ (κ(1)η ση(1)2σ²(1)−2)1/2 21/2

ηση2σ−2² )1/2(1)η ση(1)2σ²(1)−2)1/2 21/2((κησ2ησ²−2)1/2+ (κ(1)η ση(1)2σ²(1)−2)1/2)

n(1+δ)/2

+O(nδ). (5.35)

Lemma 9 Consider Σ(θ) and Ση defined in (3.2) and (5.1), where si = in−(1−δ); i = 1, . . . , n, and δ ∈ [0, 1). Let ψk ≡ n−k(1k, 2k, . . . , nk)0; k ∈ {0, 1, . . . }. Then for any k = 0, 1, . . . , ` = 1, 2, . . . , and any δ ∈ [0, 1),

ψk0Σ−1ψ` = κη

η2(k + ` + 1)nδ+ 1

η2 + k`

ηση2(k + ` − 1)n−δ+ o(nδ), (5.36) ψ00Σ−1ψ0 = κ

η2nδ+ 1

σ2η + o(nδ). (5.37)

In addition, for δ ∈ [0, 1), k, ` = 0, 1, . . . , p, and Σ(1) defined in Lemma 8,

ψk0Σ−1Σ(1)Σ−1ψ` = O(nδ). (5.38)

Lemma 10 Consider Σ(θ) and Ση defined in (3.2) and (5.1), where si = in−(1−δ); i = 1, . . . , n, and δ ∈ [0, 1). Let Σj = var((xj(s1), . . . , xj(sn))0) with xj(s) defined in (5.6).

(i) Suppose that |σ²2− σ2| = o(1) for some constant σ2 > 0. Then log(σ²2) + σ2

σ²2 − log(σ2) − 1 = 1

42² − σ2)2+ o((σ2² − σ2)2). (5.39) (ii) Suppose that |σ²2−σ2| = o(1) for some constants σ2 > 0. Then for any κη, ση2, τ > 0,

µ 1 σ2²

1/2µ

1 − σ2

2² + τ ηση2

µ 1

σ2

1/2µ 1 2 + τ

ησ2η

= o(σ²2− σ2). (5.40)

(iii) Suppose that |κησ2η− τ | = o(1) for some constant τ > 0. Then for any σ2 > 0, µηση2

σ2

1/2µ 1

2 + τ ηση2

µ

σ2

1/2

= ησ2η− τ )2

25/2στ3/2 + o((κηση2− τ )2). (5.41) (iv) Suppose that |σ²2− σ2| = o(1) and |κηση2 − τ | = o(1) for some constants σ2, τ > 0.

Then for any κj, κj0, σj2, σj0 > 0,

tr(Σj(Σ(θ) − Σ((σ2η, κη, σ2)0))Σj(Σ((ση2, κη, σ²2)0) − Σ((ση2, κη, σ2)0)))

= κjσj2κj0σj20

29/2τ3/2σ3²2− σ2)2n(1+δ)/2+ o((σ²2− σ2)2n(1+δ)/2) + O(nδ). (5.42)

(v) Suppose that |σ²2− σ2| = o(1) and |κηση2 − τ | = o(1) for some constants σ2, τ > 0.

Then for any κj, σj2 > 0,

tr(Σj−1(θ) − Σ−1((σ2η, κη, σ2)0))(Σ−1(θ) − Σ−1((ση2, κη, σ2)0)))

= jσ2j

29/2τ1/2σ7²2− σ2)2n(1+δ)/2+ o((σ²2− σ2)2n(1+δ)/2) + O(nδ). (5.43) (vi) Suppose that |σ²2− σ2| = o(1) and |κηση2 − τ | = o(1) for some constants σ2, τ > 0.

Then

tr((Σ−1(θ) − Σ−1((ση2, κη, σ2)0))(Σ−1(θ) − Σ−1((σ2η, κη, σ2)0)))

= 1

σ8²2− σ2)2n + o((σ2² − σ2)n(1+δ)/2) + O(nδ). (5.44) (vii) Suppose that |σ²2 − σ2| = o(n−(1−δ)/2) and |κησ2η − τ | = o(1) for some constants

σ2, τ > 0. Then for any κj, κj0, σj2, σj0, c, d > 0 with cd = τ ,

tr(Σj−1(θ) − Σ−1((c, d, σ2)0))Σj0−1(θ) − Σ−1((c, d, σ2)0)))

= jσj2κj0σj20

29/2στ7/2 ηση2− τ )2n(1+δ)/2+ o((κηση2− τ )2n(1+δ)/2) + O(nδ). (5.45) (viii) Suppose that |σ²2 − σ2| = o(n−(1−δ)/2) and |κησ2η − τ | = o(1) for some constants

σ2, τ > 0. Then for any κj, σj2, c, d > 0 with cd = τ ,

tr(Σj−1(θ) − Σ−1((c, d, σ2)0))(Σ−1(θ) − Σ−1((c, d, σ2)0)))

= κjσj2

29/2σ3τ5/2ησ2η− τ )2n(1+δ)/2+ o((κηση2− τ )2n(1+δ)/2) + O(nδ). (5.46) (ix) Suppose that |σ²2 − σ2| = o(n−(1−δ)/2) and |κησ2η − τ | = o(1) for some constants

σ2, τ > 0. Then for any c, d > 0 with cd = τ ,

tr((Σ−1(θ) − Σ−1((c, d, σ2)0))(Σ−1(θ) − Σ−1((c, d, σ2)0)))

= 1

29/2τ5/2ηση2− τ )2n(1+δ)/2+ o((κηση2− τ )2n(1+δ)/2) + O(nδ). (5.47)

5.1 Polynomial Order Selection

In this section, we consider Examples 1 and 2 given by (5.4) and (5.5) for polynomial order selection. Note that in Example 1, the underlying true polynomial does not vary with the sample size, whereas in Example 2, the magnitude of the underlying true polynomial decreases as the sample size increases, making estimation and polynomial order selection more difficult. Let Vj×j0 be a j × j0 matrix with the (k, `)th element,

1

k + ` + 1; k = 1, . . . , j, ` = 1, . . . , j0. (5.48) Note that when j = j0, the square matrix Vj,j is nonsingular (see Shibata 1981).

Proposition 4 Consider a class of models given by (3.1) with p explanatory variables and Vj×j0 is defined in (5.48). In addition, the log-likelihood of (4.9) based on model α ∈ A can be decomposed into the following:

(i) For δ ∈ (0, 1),

Equation (5.52) provides some guidance of applying GIC to distinguish between correct and incorrect models in polynomial order selection. For example, it follows from (5.52) and γ(αc) = 0 that for α ∈ A \ Ac and δ ∈ (0, 1),

−2`(θ; α) + 2`(θ; αc) = γ(α)κη

η2 nδ+ op(nδ), (5.55)

so that we can get rid of underfitted models if the penalty term has a smaller order than O(nδ). As to be demonstrated in Theorem 6, we can use (5.55) to find an appropriate penalty λ that leads to selection consistency. On the other hand, applying (5.54) to Example 2 under the fixed domain asymptotic framework (i.e., δ = 0), we obtain

−2`(θ; α) + 2`(θ; αc) = Op(1), (5.56) indicating that consistency or asymptotic loss efficiency of GIC is almost impossible.

Additionally, we see from (5.54) that the likelihood value depends on κη and ση2 mainly through their product, but not their individual values under the fixed domain asymptotic framework when δ = 0. Consequently, variable selection based on GIC is expected to be not much affected by individual estimates of κη and ση2 as long as the estimate of the microergodic parameter, κηση2, remains the same.

The following lemma provides consistency of the ML estimate of σ²2 and the microer-godic parameter, κηση2, under both the fixed domain and the increasing domain asymptotic frameworks with δ ∈ [0, 1). The results are extended from Chen et al. (2000) who consider only α ∈ Ac and δ = 0.

Lemma 11 Under the setup of Proposition 4, let Θ ⊂ (0, ∞)3 be a compact set and let θ(α) = (ˆˆ σ2η(α), ˆκη(α), ˆσ2²(α))0 be the ML estimate of θ based on model α. Then for any δ ∈ [0, 1),

ˆ

σ²2(α) = σ²,02 + op(1), (5.57) ˆ

κη(α)ˆση2(α) = κη,0ση,02 + op(1). (5.58)

The following theorem further provides the convergence rates for the ML estimates of κη, σ2η and σ²2. These results are also extended from Chen et al. (2000) who consider only α ∈ Ac and δ = 0, and are keys for establishing some asymptotic properties of GIC in Theorem 7.

Theorem 4 Under the setup of Proposition 4, let Θ ⊂ (0, ∞)3 be a compact set and let θ(α) = (ˆˆ σ2η(α), ˆκη(α), ˆσ2²(α))0 be the ML estimate of θ based on model α. Then

(i) For δ ∈ (0, 1), ˆ

σ2²(α) = σ²,02 + op(n−(1−δ)/2); α ∈ A, (5.59) ˆ

κη(α)ˆσ2η(α) = κη,0ση,02 + op(n−(1−δ)/4); α ∈ A, (5.60) ˆ

σ2η(α) =

½ σ2η,0+ op(1); if α ∈ Ac,

γ(α) + σ2η,0+ op(1); if α ∈ A \ Ac, (5.61) ˆ

κη(α) =

½ κη,0+ op(1); if α ∈ Ac,

κη,0ση,02 (γ(α) + ση,02 )−1+ op(1); if α ∈ A \ Ac, (5.62) where γ(α) > 0 is a constant defined in (5.51) for α ∈ A \ Ac.

(ii) For δ = 0 and any α ∈ A, ˆ

σ²2(α) = σ²,02 + Op(n−1/2), (5.63) ˆ

κη(α)ˆση2(α) = κη,0σ2η,0+ Op(n−1/4). (5.64)

Proof. Denote ση,α2 ≡ γ(α) + ση,02 and κη,α ≡ κη,0σ2η,0/(γ(α) + ση,02 ), for α ∈ A, where γ(α) ≡ 0 for α ∈ Ac. Note that κη,αση,α2 = κη,0σ2η,0.

First, we prove (5.59). By (5.57) and (5.58), it suffices to show that for |σ2² − σ²,02 | = o(1), |κησ2η − κη,0ση,02 | = o(1) and any ε > 0,

2²−σ²,02 |≥εninf−(1−δ)/2(−2`(θ; α) + 2`((ση2, κη, σ²,02 )0; α)) > 0, (5.65) as n → ∞ with probability tending to 1. By (5.52), we can write

−2`(θ; α) = n log(2π) − 1 − δ

where the last equality follows from (5.39) and (5.40). Therefore, for (5.65) to hold, it remains to show that Applying Chebyshev’s inequality on each of the three parts and using the following three moment conditions given from (5.42)-(5.44) on (5.68):

var¡

we obtain (5.67). This completes the proof of (5.59).

Second, we prove (5.60). By (5.58) and (5.59), it suffices to show that for |σ²2− σ²,02 | = o(n−(1−δ)/2), |κηση2− κη,0ση,02 | = o(1) and any ε > 0,

η2κη−σ2η,0κinfη,0|≥εn−(1−δ)/4

¡− 2`(θ; α) + 2`((ση,α2 , κη,α, σ2²,0)0; α)¢

> 0, (5.69)

as n → ∞ with probability tending to 1. By (5.66), for |σ2² − σ²,02 | = o(n−(1−δ)/2) and

ηση2− κη,0ση,02 | = o(1), we have

−2`(θ; α) + 2`((ση,α2 , κη,α, σ²,02 )0; α)

=

½µησ2η σ²,02

1/2µ 1

2 +κη,0σ2η,0 ηση2

µη,0ση,02 σ²,02

1/2¾

n(1+δ)/2

+ ση,α2

η,0σ2η,0η− κη,α)2nδ+ ξ(θ) − ξ((ση,α2 , κη,α, σ2²,0)0) + op(nδ)

= ηση2− κη,0ση,02 )2n(1+δ)/2

25/2η,0σ2η,0)3/2 + σ2η,αη − κη,α)nδ

η,0σ2η,0 + ξ(θ) − ξ((ση,α2 , κη,α, σ2²,0)0)

+op(nδ), (5.70)

where the first equality follows from (5.39) and the second equality follows from (5.41).

Therefore, for (5.69) to hold, it remains to show that

ξ(θ) − ξ((ση,α2 , κη,α, σ²,02 )0) = op(max((κηση2− κη,0ση,02 )2n(1+δ)/2, nδ)), (5.71) which can be obtained from a decomposition similar to (5.68) in addition to the following three moment conditions given from (5.45)-(5.47):

var¡

η0−1(θ) − Σ−1((ση,α2 , κη,α, σ²,02 )0))η¢

= O(max((κησ2η− κη,0ση,02 )2n(1+δ)/2, nδ)), var¡

η0−1(θ) − Σ−1((σ2η,α, κη,α, σ²,02 )0))²¢

= O(max((κησ2η− κη,0ση,02 )2n(1+δ)/2, nδ)), var¡

²0−1(θ) − Σ−1((σ2η,α, κη,α, σ²,02 )0))²¢

= O(max((κησ2η− κη,0ση,02 )2n(1+δ)/2, nδ)).

Thus (5.69) is obtained. This completes the proof of (5.60).

Third, we prove (5.61) and (5.62). By (5.70) and (5.71), for |σ2² − σ2²,0| = o(n−(1−δ)/2),

2ηκη − σ2η,0κη,0| = o(n−(1−δ)/4) and any ε > 0, we have

η−κinfη,α|≥ε−2`(θ; α) + 2`((σ2η,α, κη,α, σ2²,0)0; α) = ση,α2

η,0ση,02 ε2nδ+ op(nδ) > 0, as n → ∞ with probability tending to 1, which gives (5.62). This together with (5.60) gives (5.61).

Fourth, we prove (5.63). By (5.57) and (5.58), it suffices to show that for |σ²2− σ²,02 | = o(1) and |κηση2− κη,0ση,02 | = o(1), there exists M > 0 such that

2²−σ2²,0inf|≥M n−1/2

©− 2`(θ; α) + 2`((ση2, κη, σ²,02 )0; α)ª

> 0, (5.72)

as n → ∞ with probability tending to 1. By (5.54), for |σ²2 − σ²,02 | = o(1) and |κησ2η

where the second equality follows from (5.39) and (5.40), and the last equality follows from

ξ(θ) − ξ((ση2, κη, σ²,02 )0) = op

¡2² − σ²,02 )2n¢

+ Op(1), (5.73) which can be obtained in a way similar to (5.67). Consequently, there exists M > 0 such that

2²−σ2²,0inf|≥M n−1/2(−2`(θ; α) + 2`((ση2, κη, σ²,02 )0; α)) = M2

²,04 + Op(1) > 0,

as n → ∞ with probability tending to 1. Thus, we obtain (5.72), and hence the proof of (5.63) is complete.

where the first equality follows from |σ²2− σ2²,0| = o(n−(1−δ)/2), the second equality follows from (5.41), and the last equality follows from

ξ(θ) − ξ((ση,α2 , κη,α, σ²,02 )0) = op

¡ησ2η − κη,0ση,02 )2n1/2¢

+ Op(1), (5.76) which can be obtained in a way similar to (5.71). Thus, (5.74), and hence (5.64) are

obtained. This completes the proof. 2

Note that a special case of Theorem 4 for which δ = 0 and β = 0, can be found in Zhang and Zimmerman (2005), where they consider no regressor, and hence consider no underfitted model.

Corollary 4 Under the setup of Theorem 4, let

θ(1)α = (γ(α) + ση,02 , κη,0ση,02 (γ(α) + ση,02 )−1, σ²,02 )0; α ∈ A, (5.77) where γ(α) ≡ 0 for α ∈ Ac. Then

plim

n→∞

1

nδ(−2`( ˆθ(α); α) + 2`(θα(1); α)) = 0; if δ ∈ (0, 1), (5.78)

−2`( ˆθ(α); α) + 2`(θ(1)α ; α) = Op(1); if δ = 0. (5.79) In addition, for LKL(α; θ) defined in (3.3) and α ∈ A \ Ac,

plim

n→∞LKL( ˆθ(α); α)±

LKLα(1); α) = 1; if δ ∈ (0, 1), (5.80) LKL( ˆθ(α); α) − LKLα(1); α) = Op(1); if δ = 0. (5.81)

Note that from Theorem 4, we have plim

n→∞

θ(α) = θˆ α(1) for δ ∈ (0, 1), which immediately gives (5.78). On the other hand, (5.79) is somewhat surprising, because ˆθ(α) generally does not converge to θ(1)α for δ = 0.

Theorem 5 Consider a class of models given by (3.1) with xj(s) = sj; j = 1, . . . , p, and cov(η(s), η(s0)) = σ2ηexp(−κη|s − s0|), where ση2 > 0, κη > 0 and σ²2 > 0 are known, and p is fixed. Suppose that A = {α0, α1, . . . , αp}, where α0 = ∅, αj = {1, . . . , j} for j = 1, . . . , p, and Ac6= ∅. In addition, suppose that the data are collected at si = in−(1−δ); i = 1, . . . , n, for some δ ∈ [0, 1).

(i) For δ = 0 and any λ > 0,

n→∞lim P¡

αc= arg min

α∈A

LKL(α)¢

< 1. (5.82)

In addition, if λ → ∞, then

n→∞lim P¡ ˆ

αGIC(λ) = α0

¢= 1, (5.83)

where ˆαGIC(λ) is defined in (4.2).

(ii) For δ ∈ (0, 1), if λ → ∞ and n(2p(αc)+1)δ±

λ → ∞ as n → ∞, then

n→∞lim P¡ ˆ

αGIC(λ) = αc¢

= 1.

Proof. (i) For δ = 0, by (3.8),

LKL(α) = µ0A(α)0Σ−1A(α)µ + (η + ²)0M (α)0Σ−1(η + ²), where

(η + ²)0M (α)0Σ−1(η + ²) = (η + ²)0Σ−1X(α)(X(α)0Σ−1X(α))−1X(α)0Σ−1(η + ²)

∼ χ2(p(α)), (5.84)

with χ2(k) denoting the chi-square distribution with k degrees of freedom. Similarly, (η + ²)0(M (αc) − M (α))0Σ−1(η + ²) ∼ χ2(p(αc) − p(α)).

By (5.50), for α ∈ A \ Ac, we have µ0A(α)0Σ−1A(α)µ = O(1). Hence, for α ∈ A \ Ac,

n→∞lim P¡

LKLc) − LKL(α) > 0¢

= lim

n→∞P¡

(η + ²)0(M (αc) − M (α))0Σ−1(η + ²) − µ0A(α)0Σ−1A(α)µ > 0¢

> 0.

Thus (5.82) is obtained.

For (5.83), by (4.6) with λ → ∞,

ΓGIC(λ)(α) = (Z − ˆµ(α))0Σ−1(Z − ˆµ(α)) + λp(α)

= µ0A(α)0Σ−1A(α)µ + (η + ²)0A(α)0Σ−1(η + ²) + 2µ0A(α)0Σ−1(η + ²) +λp(α)

= 2µ0A(α)0Σ−1(η + ²) + (η + ²)0Σ−1(η + ²) + λp(α) + op(λ),

where the last equality follows from (5.50) and (5.84). In addition, by Chebyshev’s in-equality and the following moment condition:

var(µ0A(α)0Σ−1(η + ²)) = µ0A(α)0Σ−1A(α)µ = O(1), we have µ0A(α)0Σ−1(η + ²) = Op(1). Therefore, for α ∈ A \ {α0},

ΓGIC(λ)(α) − ΓGIC(λ)0) = λ(p(α) − p(α0)) + op(λ),

which is greater than zero with probability tending to 1. Thus (5.83) is obtained.

(ii) It suffices to show that limn→∞ELKL(α)/λ = ∞ for α ∈ A \ Ac by (4.7). First, for α ∈ A \ Ac,

µ0A(α)0Σ−1A(α)µ

= β0X0−1− Σ−1X(α)(X(α)0Σ−1X(α))−1X(α)Σ−1)Xβ

= β0X0−1− Σ−1X(α)(X(α)0Σ−1X(α))−1X(α)Σ−1)Xβ

= β0(Vp,p− Vp,p(α)Vp(α),p(α)−1 Vp(α),p+ o(n(2p(αc)+1)δ)

= βp(α2 c)e0p(αc)(Vp,p − Vp,p(α)Vp(α),p(α)−1 Vp(α),p)ep(αc)n(2p(αc)+1)nδ + o(n(2p(αc)+1)δ), where ej is the jth column of Ip, β(α) = D(α)β(α), X(α) = D−1(α)X(α) with

D(α) =





1 0 · · · 0 0 nδ . .. ...

... ... ... 0 0 · · · 0 np(α)δ



,

Vj×j0 is defined in (5.48), and e0p(αc)(Vp,p−Vp,p(α)Vp(α),p(α)−1 Vp(α),p)ep(αc)is a constant, which is bounded away from 0 by Theorem 3.1 of Shibata (1981). It follows from (3.9) and

n→∞lim λ±

n(2p(α)+1)δ= 0 that

n→∞lim

ELKL(α)

λ = lim

n→∞

ELKL(α)/n(2p(αc)+1)δ λ/n(2p(αc)+1)δ = ∞.

This completes the proofs. 2

Theorem 6 Consider the same setup as in Theorem 5 except xj(s) = (sn−δ)j; j = 1, . . . , p.

(i) For δ = 0 and any λ > 0,

n→∞lim P¡

αc= arg min

α∈A

LKL(α)¢

< 1.

In addition, if λ → ∞, then

n→∞lim P¡ ˆ

αGIC(λ) = α0¢

= 1, where ˆαGIC(λ) is defined in (4.2).

(ii) For δ ∈ (0, 1), if λ → ∞ and nδ±

λ → ∞ as n → ∞, then

n→∞lim P¡ ˆ

αGIC(λ) = αc¢

= 1.

Proof. (i). See (i) in Proof of Theorem 5.

(ii). From (ii) of Theorem 2, it suffices to show that limn→∞ELKL(α)/λ = ∞ for α ∈ A \ Ac. By (5.49),

µ0A(α)0Σ−1A(α)µ = γ(α)nδ+ o(nδ),

where γ(α) is a constant, which is bounded away from 0 by Theorem 3.1 of Shibata (1981).

It follows from (3.9) and lim

n→∞λ±

nδ = 0 that

n→∞lim

ELKL(α)

λ = lim

n→∞

ELKL(α)/nδ λ/nδ = ∞.

This completes the proof. 2

Theorem 7 Under the setup of Theorem 6, suppose that θ = (ση2, κη, σ²2)0 ∈ Θ is un-known, where Θ ⊂ (0, ∞)3 is a compact set such that θ0 ∈ Θ. Let ˆθ(α) be the ML estimate of θ . For δ = 0, if λ → ∞ as n → ∞, then

n→∞lim P¡ ˆ

αGIC(λ) = α0¢

= 1.

For δ ∈ (0, 1), if λ → ∞ and λ/nδ → 0 as n → ∞, then

n→∞lim P¡ ˆ

αGIC(λ) = αc¢

= 1.

Proof. First, for δ = 0, we prove

n→∞lim P¡ ˆ

αGIC(λ)= α0

¢ = 1.

By (4.10) and by (5.79), for α0 = ∅ and θ(1)α defined in (5.77), we have

ΓGIC(λ)(α) − ΓGIC(λ)0) = −2`(θα(1); α) + 2`(θ(1)α0; α0) + λ(p(α) − p(α0)) + Op(1)

= λ(p(α) − p(α0)) + ξ(θ(1)α ) − ξ(θ(1)α0) + Op(1)

= λ(p(α) − p(α0)) + Op(1) > 0,

as n → ∞ with probability tending to 1, where the second equality follows from (5.54) and the third equality follows from (5.76).

Second, for δ ∈ (0, 1), we prove

n→∞lim P¡ ˆ

αGIC(λ) = αc¢

= 1.

It suffices to show that the conditions in Theorem 3 are satisfied. First, by (5.36) and (5.37), we have

X0Σ−1(θ)X = κη

η2Vp×pnδ+ o(nδ),

where Vp×p is defined in (5.48) and is nonsingular. Then (A.2) is satisfied. Second, by (5.38), (A.3) is satisfied trivially. Third, (A.4)-(A.5) are followed by (5.78) and (5.80) for τn = nδ and θα = θ(1)α defined in (5.77). Fourth, (A.1) holds by (5.49). Fifth, for ξ(θ) defined in (5.53), by (5.71), we have

ξ(θ0) − ξ(θα(1)) = op(nδ).

Hence, (4.12) holds. Last, for α ∈ Ac, θα(1) = θ0, (4.14) holds trivially. This completes

the proof. 2

相關文件