Ministry of Science and Technology Research Project Final Report
Final Report
Adaptive Estimation of Regression Functions under Correlated Errors
Project type: individual project
Project number: NSC 102-2118-M-004-007-
Project period: August 1, 2013 to July 31, 2014
Host institution: Department of Statistics, National Chengchi University
Principal investigator: Tzee-Ming Huang (黃子銘)
Project staff: full-time master's-level assistant: 鄭宇翔
Handling of results:
1. Public access: this project report is publicly available.
2. Has this research produced findings that seriously harm the public interest: no
3. Is this report recommended as a policy reference for government agencies: no
July 28, 2014
Chinese abstract: In nonparametric regression, if the smoothness of the regression function is known, it is easy to obtain estimators that attain the optimal convergence rate. When the smoothness of the regression function is unknown, we would still like the estimator to perform equally well; estimators with this property are called adaptive. When the errors in the regression model are correlated, an adaptive estimator based on a model selection criterion has been proposed, but it is difficult to use because the criterion involves unknown parameters. The main result of this project is a regression function estimator that involves no unknown parameters; its convergence rate is shown to be nearly optimal, so it can be viewed as a nearly adaptive estimator.
Keywords: regression, adaptive estimation, correlated errors
English abstract: In nonparametric regression, if the degree of
smoothness of the regression function is known, it is
often easy to obtain estimators that attain the
optimal convergence rate. When the degree of
smoothness of the regression function is unknown, it
is desirable to have estimators for the regression
function that can also achieve the optimal
convergence rate. Estimators that have this property
are called adaptive.
When the errors in a regression model are dependent,
an adaptive estimator based on a model selection
criterion has been proposed, but it is difficult to
implement because the criterion involves unknown
parameters for the error dependence structure and the
error variance. In this project, an estimator that is also based on the model selection approach but involves no unknown parameters is proposed. It is shown that the proposed estimator attains a nearly optimal convergence rate. It was also proposed to study the possibility of replacing the unknown parameters by their consistent estimators to make the earlier adaptive estimator implementable.
An adaptive knot selection method for regression
splines via penalized minimum contrast estimation
Tzee-Ming Huang
Abstract
In this report, a knot selection method for regression splines is proposed. This method yields a least squares spline estimator that adapts to the smoothness of the regression function, and the knots are allowed to be unequally spaced. If the true regression function s belongs to the Sobolev space $W_2^m[0,1]$, then for a sequence $\{a_n\}$ such that $\lim_{n\to\infty}a_n=\infty$ and a constant $\gamma_0>0$, the proposed estimator (depending on $a_n$ and $\gamma_0$) converges to s at the rate $O(\sqrt{a_n(\log n)^{1+\gamma_0}n^{-2m/(1+2m)}})$ in $L_2$ norm in probability.
1 Introduction
One of the most popular methods in nonparametric regression is B-spline estimation. B-splines are piecewise polynomials joined smoothly at points called knots. For implementation, one has to choose the number of knots and the degree of the polynomials. The choice of knots is especially crucial. For functions that are m times continuously differentiable with the m-th derivative bounded by a constant, Zhou, Shen and Wolfe [9] showed that the number of knots should grow at the rate $n^{1/(1+2m)}$ for the spline estimator of the regression function to achieve the optimal convergence rate $n^{-2m/(1+2m)}$ in integrated mean squared error.
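As a purely illustrative aside (not part of the analysis in this report), a least squares spline fit with a hand-picked knot count can be computed with standard tools; the sketch below assumes `scipy.interpolate.LSQUnivariateSpline` and uses roughly $n^{1/(1+2m)}$ equally spaced interior knots for $m=2$:

```python
import numpy as np
from scipy.interpolate import LSQUnivariateSpline

rng = np.random.default_rng(0)
n = 500
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=n)

# Number of interior knots grows like n^(1/(1+2m)); here m = 2, so n^(1/5).
num_knots = max(1, round(n ** 0.2))
knots = np.linspace(0, 1, num_knots + 2)[1:-1]  # strictly interior knots

fit = LSQUnivariateSpline(x, y, knots, k=3)  # cubic least squares spline
mse = float(np.mean((fit(x) - np.sin(2 * np.pi * x)) ** 2))
```

With the smoothness m known, this rate-matched knot count is easy to pick; the point of this report is to select the knots from data when m is unknown.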
It is possible to construct estimators for regression functions in the Sobolev space $W_2^m[0,1]$ that achieve the rate $n^{-m/(1+2m)}$ with respect to the $L_2$ norm without knowing m. These estimators are known as adaptive estimators. Barron, Birgé and Massart [2] derived risk bounds for penalized minimum contrast estimators, which can be used to construct adaptive estimators for regression functions. Huang [5] applied an inequality in Yang and Barron [8], obtained using an approach similar to that in [2], to construct an adaptive estimator for the regression function using B-splines with equally spaced knots, assuming the errors are normally distributed.
The objective of this study is to construct an adaptive estimator for the regression function using B-splines without requiring that the knots be equally spaced or that the errors be normally distributed. This objective is achieved by first establishing an exponential inequality to control the error of minimum contrast estimators and then applying the result to derive the convergence rate of a spline estimator obtained via model selection. The exponential inequality is given in Section 2. The application to adaptive B-spline estimation is given in Section 3.
2 Error Control for Minimum Contrast Estimators
In this section, the problem of minimum contrast estimation, which includes least squares regression as a special case, is introduced. Then a theorem giving error bounds for penalized minimum contrast estimators is presented; it will be used to establish the rate of convergence of the proposed regression estimator.
The set-up for minimum contrast estimation is as follows. Consider the problem of estimating an unknown function s based on observations $Z_1,\dots,Z_n$, where there exists a function $\gamma$ such that
$$E\left(\frac1n\sum_{i=1}^n\gamma(Z_i,t)\right)$$
is minimized when $t=s$. Suppose that s is estimated by
$$\tilde s = \arg\min_{t\in S}\frac1n\sum_{i=1}^n\gamma(Z_i,t).$$
Then $\tilde s$ is called the minimum contrast estimator over S with respect to $\gamma$, and the function $\gamma$ is called the contrast function.
Least squares regression fits into the framework of minimum contrast estimation. Consider the regression model
$$Y_i = s(X_i)+W_i,\quad i=1,\dots,n, \quad (1)$$
where the regression function s is defined on an interval $I_0$ and the errors $W_i$ have mean zero and are independent of the $X_i$'s. If s is estimated by some function in a parametric family S, then the least squares estimator $\tilde s$ is a minimum contrast estimator with $Z_i=(X_i,Y_i)$ with respect to the contrast function $\gamma(z,t)=(y-t(x))^2$ for $z=(x,y)$.
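To make the correspondence concrete, here is a minimal numerical sketch (an added illustration, not taken from the report) in which the contrast $\gamma(z,t)=(y-t(x))^2$ is minimized over a one-parameter family $S=\{t_\theta : t_\theta(x)=\theta x\}$ by a grid search; the minimum contrast estimator then approximates the closed-form least squares slope:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0, 1, n)
y = 2.0 * x + 0.1 * rng.normal(size=n)

def empirical_contrast(theta):
    # (1/n) sum_i gamma(Z_i, t_theta) with gamma(z, t) = (y - t(x))^2
    return np.mean((y - theta * x) ** 2)

# Minimum contrast estimator over a grid of candidate thetas (step 0.001).
thetas = np.linspace(0.0, 4.0, 4001)
theta_hat = thetas[np.argmin([empirical_contrast(t) for t in thetas])]
closed_form = np.sum(x * y) / np.sum(x ** 2)  # exact least squares slope
```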
In minimum contrast estimation, a key requirement is that the function s can be approximated well by some function in S. One approach to meeting this requirement is to take $S=\cup_{j\in\Lambda}S_j$, where the $S_j$'s are various families of functions and the collection $\{S_j\}$ is rich enough to include a good approximation to s (cf. Barron, Birgé and Massart [2]). To prevent overfitting, a penalty term $\eta_{1,j}$ is often added in minimum contrast estimation, yielding the estimator
$$\hat s = \arg\min_{j\in\Lambda,\,t\in S_j}\left(\eta_{1,j}+\frac1n\sum_{i=1}^n\gamma(Z_i,t)\right). \quad (2)$$
To control the estimation error of the penalized minimum contrast estimator $\hat s$, several assumptions are made. Let
$$\nu_n(\cdot) = \frac1n\sum_{i=1}^n[\gamma(Z_i,\cdot)-E\gamma(Z_i,\cdot)],$$
and let $\|\cdot\|$ and $\|\cdot\|_\infty$ denote the $L_2$ norm and the sup norm on $I_0$. The assumptions are given below.
Assumption M1. Suppose that for $i=1,\dots,n$, $Z_i=f_0(s,X_i,W_i)$, where $f_0$ is known. Suppose that there exists a non-negative constant $k_0$ and, for each $j\in\Lambda$, non-negative constants $A_j$, $B_j$, $A_{2,j}$, $B_{2,j}$ and non-negative functions $M_j$, $\Delta_j$, $M_{2,j}$, $\Delta_{2,j}$, such that for $u\in S_j$,
$$|\gamma(z,u)-\gamma(z,v)| \le \begin{cases} M_j(w)\Delta_j(x,u,v), & \text{if } v\in S_j;\\ M_{2,j}(w)\Delta_{2,j}(x,u,s), & \text{if } v=s,\end{cases}$$
and, for all $m\ge2$ and $i=1,\dots,n$,
$$E_s[M_j^m(W_i)] \le a_mA_j^m,$$
$$E_s[\Delta_j^m(X_i,u,v)] \le b_mk_0\|u-v\|^2B_j^{m-2},$$
$$E_s[M_{2,j}^m(W_i)] \le a_mA_{2,j}^m,$$
$$E_s[\Delta_{2,j}^m(X_i,u,s)] \le b_mk_0\|u-s\|^2B_{2,j}^{m-2},$$
where either $a_m=1$ and $b_m=m!/2$ for all $m\ge2$, or $b_m=1$ and $a_m=m!/2$ for all $m\ge2$.
Assumption M2. For each $j\in\Lambda$, there exist constants $B_j'\ge1$, $r_j>0$ and $D_j\ge1$ such that, for $\sigma>0$ and $0<\delta<\sigma/5$, and for any ball $B\subset S_j$ of radius $\sigma$ with respect to $\|\cdot\|$, one can find a subset T of B with $|T|\le(B_j'\sigma/\delta)^{D_j}$ such that for every $u\in B$, there exists $v\in T$ with $\|u-v\|\le\delta$ and $\|\Delta_j(\cdot,u,v)\|_\infty\le r_j\delta$.
Assumption M3. There exist positive constants $k_1$ and $k_2$ such that
$$k_1\|u-s\|^2 \le E[\gamma(Z_i,u)-\gamma(Z_i,s)] \le k_2\|u-s\|^2 \quad (3)$$
for all $u\in S_j$ and all $j\in\Lambda$.
Under Assumptions M1–M3, if $(X_1,W_1),\dots,(X_n,W_n)$ are $\beta$-mixing, one can derive an error bound for $\hat s$ using bounds for $\sup_{u\in S}[\nu_n(s)-\nu_n(u)]$ and $\nu_n(u)-\nu_n(s)$ for $u\in S_j$. This result is stated in Theorem 1 below.
Theorem 1. Suppose that Assumptions M1–M3 hold and $(X_1,W_1),\dots,(X_n,W_n)$ are $\beta$-mixing. Suppose that $B_j\ge1$, $r_jB_j^2\le n/4$, $r_j\ge1$, and there exist constants $A_0>0$ and $B_0>0$ such that $0<A_j/B_j\le A_0+2$ and $0<A_{2,j}/B_{2,j}\le1/B_0$. Suppose that $\|u\|_\infty\le B_j$ for all $u\in S_j$. Suppose that $q_n$ is a positive integer and $\tilde q_n=\lfloor q_n/2\rfloor$. Let $\ell_n=\lfloor n/q_n\rfloor$. Suppose that $\delta_n>0$, $\ell_n\ge A_0^2$, $r_jB_j^2\le(\ell_n/4)\wedge\delta_n$, $\sigma>0$, $\tau>0$,
$$\theta \ge \frac{A_0+2}{c(8\tau)}\vee5, \quad (4)$$
and the penalty $\eta_{1,j}$ is chosen so that
$$\frac{\ell_n\eta_{1,j}}{2} > 24B_j^2r_j(1+2D_j\log2) \quad\text{for each } j\in\Lambda. \quad (5)$$
Let
$$c_1(t) = \frac{t}{(1+\sqrt{1+t})^2} \quad\text{and}\quad c(t)=c_1\!\left(\frac{t}{k_0(A_0+2)}\right) \quad (6)$$
for $t>0$, and take
$$\eta_j = \frac12\left(\eta_{1,j}+\frac{B_j^2r_j\xi}{\ell_n}\right).$$
Then there exists a set $\Omega_n$ with
$$P(\Omega_n^c) \le (\ell_n+1)(\beta_{\tilde q_n}+\beta_{q_n-\tilde q_n})$$
such that for $\xi>0$, $j^*\in\Lambda$ and $s^*\in S_{j^*}$, we have
$$(k_1-9\tau)\|\hat s-s\|^2I_{\Omega_n}-\frac{9\tau\sigma^2}{4} \le 1.5\eta_{1,j^*}+(k_2+\tau)\|s^*-s\|^2+\frac{1.5\delta_n\xi}{\ell_n} \quad (7)$$
except on a set of probability at most
$$q_n\left(p^*_{j^*}(\ell_n,\eta_{j^*})+\sum_{j\in\Lambda}p_j(\ell_n,\eta_j)\right),$$
where $\beta_k$ denotes the k-th $\beta$-mixing coefficient for $k\ge1$,
$$p_j(\ell_n,\eta_j) = \exp\left(-\frac{c_1(2\tau B_0/k_0)\ell_n\eta_j}{A_{2,j}B_{2,j}}\right)+1.6\left(1-\exp\left(-\frac{\tau\sigma^2}{2}\left(c(\tau)\wedge\frac13\right)\right)\right)^{-1}\left((B_j'\theta)\vee2\right)^{3D_j}\times\left[\exp\left(-\frac{\ell_nc(\tau)\eta_j}{2A_jB_j}\right)+\exp\left(-\frac{\ell_n\eta_j}{24B_j^2r_j}\right)\right] \quad (8)$$
and
$$p^*_{j^*}(\ell_n,\eta_{j^*}) = \exp\left(-\frac{c_1(2\tau B_0/k_0)\ell_n\eta_{j^*}}{A_{2,j^*}B_{2,j^*}}\right). \quad (9)$$
The proof of Theorem 1 is given in Section 5.1.
Remark. In the least squares regression framework, while the inequality in Lemma 0 of [8] can also be applied to control $\nu_n(s)-\nu_n(u)$ (see [5] for an example), the exponential inequality given in (21) is more direct. Lemma 0 of [8] controls the likelihood function, and in [5] the control of $\nu_n(s)-\nu_n(u)$ is achieved by assuming the $W_i$'s are normally distributed. In contrast, the inequality in (21) is derived following the proof of Theorem 5 in Birgé and Massart [3], which only requires moment conditions on the $W_i$'s.
3 Application to Adaptive B-spline Estimation
In this section, we apply Theorem 1 to obtain adaptive B-spline estimators for s in (1). Here the regression function s is assumed to be in the Sobolev space $W_2^m[0,1]$ for some $m\ge1$, and each $S_j$ is taken to be a collection of B-splines on $[0,1]$ of order q for some integer $q\ge1$, with boundary knots at 0 and 1 and distinct internal knots $\xi_1,\dots,\xi_k$ in $\{1/2^J,\dots,(2^J-1)/2^J\}$ for some positive integer J. Also, the coefficients of the B-splines in $S_j$ are bounded by b in absolute value for some positive integer b, and the index j is $(b,q,\xi_1,\dots,\xi_k)$. Let
$$\tilde\Delta_{2,j} = \max_{1\le i\le k+1}(\xi_i-\xi_{i-1}), \quad (10)$$
$$\tilde\Delta_{1,j} = \min_{1\le i\le k+1}(\xi_i-\xi_{i-1}), \quad (11)$$
$$B_j' = \sqrt{2\pi e}\left(0.5+q\sqrt q(2q+1)9^{q-1}\frac{\sqrt{\tilde\Delta_{2,j}}}{\sqrt{\tilde\Delta_{1,j}}}\right), \quad (12)$$
$$r_j = 1, \quad (13)$$
and
$$J_j = \min\{J\ge1 : \xi_1,\dots,\xi_k \text{ are in } \{1/2^J,\dots,(2^J-1)/2^J\}\},$$
where $\xi_0=0$ and $\xi_{k+1}=1$. Let $\Lambda$ be the set of all $j=(b,q,\xi_1,\dots,\xi_k)$'s such that b and q are positive integers, $2^{J_j}+q+b\le n$ and $r_j(2b)^2\le\delta_n$, where $\{\delta_n\}$ is chosen such that
$$\lim_{n\to\infty}\delta_n = \infty = \lim_{n\to\infty}\frac{\ell_n}{\delta_n} \quad\text{and}\quad \lim_{n\to\infty}\frac{\delta_n\log(n)}{n^\alpha}=0 \text{ for all } \alpha>0. \quad (14)$$
Then the estimator for s considered here is the penalized least squares estimator
$$\hat s = \arg\min_{j\in\Lambda,\,u\in S_j}\left(\frac1n\sum_{i=1}^n(Y_i-u(X_i))^2+\eta_{1,j}\right), \quad (15)$$
where
$$\eta_{1,j} = \frac{a_nr_j(2b)^2}{\ell_n}\left((k+q)\log(B_j')+\lambda[(\log2)2^{J_j}+q+b]\right), \quad (16)$$
$\lambda\ge1$ is a constant, and $\{a_n\}$ is a sequence of positive numbers such that $\lim_{n\to\infty}a_n=\infty$. The $L_2$ convergence rate for $\hat s$ is given in Theorem 2.
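The model selection step can be illustrated with the following simplified sketch (an added example, not the exact procedure above: it searches only over equally spaced knot sets and uses a stand-in penalty proportional to model dimension times $\log(n)/n$ rather than the exact $\eta_{1,j}$ of (16)):

```python
import numpy as np
from scipy.interpolate import LSQUnivariateSpline

rng = np.random.default_rng(2)
n = 400
x = np.sort(rng.uniform(0, 1, n))
true_s = lambda t: np.sin(2 * np.pi * t)
y = true_s(x) + 0.2 * rng.normal(size=n)

def penalized_score(k, lam=0.05):
    # Candidate model: cubic spline with k equally spaced interior knots.
    knots = np.linspace(0, 1, k + 2)[1:-1]
    fit = LSQUnivariateSpline(x, y, knots, k=3)
    rss = np.mean((y - fit(x)) ** 2)
    dim = k + 4  # number of cubic B-spline coefficients
    return rss + lam * dim * np.log(n) / n, fit

scores = {k: penalized_score(k) for k in range(1, 21)}
k_hat = min(scores, key=lambda k: scores[k][0])
s_hat = scores[k_hat][1]
err = float(np.mean((s_hat(x) - true_s(x)) ** 2))
```

The penalty discourages large knot sets whose reduction in residual sum of squares does not exceed the added model dimension cost, mirroring the role of $\eta_{1,j}$ in (15).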
Theorem 2. Suppose that the regression model in (1) holds, where the regression function s is in $W_2^m[0,1]$, $\{(X_i,W_i)\}_{i\ge1}$ is a $\beta$-mixing sequence whose $\ell$-th $\beta$-mixing coefficient $\beta_\ell$ satisfies
$$\beta_\ell \le \gamma_1e^{-\ell\gamma_2} \quad\text{for } \ell\ge1 \quad (17)$$
for some positive constants $\gamma_1$ and $\gamma_2$, and the $W_i$'s have mean zero and are independent of the $X_i$'s. Suppose that $Ee^{\alpha|W_i|}<\Gamma$ for some $\alpha>0$ and $\Gamma\ge1$, and that $X_i$ has a Lebesgue density that is bounded above and bounded away from zero. Suppose that $\{a_n\}$ and $\{\ell_n\}$ are sequences of positive numbers such that $\lim_{n\to\infty}a_n=\infty$ and $\ell_n=O(n(\log n)^{-(1+\gamma_0)})$ for some $\gamma_0>0$, and that $\{\delta_n\}$ is chosen so that (14) holds. Then, for the estimator $\hat s$ given in (15) with $S_j$ defined above and $\eta_{1,j}$ defined in (16), we have
$$E\|\hat s-s\|^2 = O(a_n(\log n)^{1+\gamma_0}n^{-2m/(1+2m)}).$$
4 Conclusion

An adaptive estimator for the regression function using B-splines that allows unequally spaced knots has been constructed. The estimator is a penalized least squares estimator, and it achieves the $L_2$ convergence rate $O(a_n(\log n)^{1+\gamma_0}n^{-2m/(1+2m)})$ for any sequence $\{a_n\}$ with $\lim_{n\to\infty}a_n=\infty$ and any constant $\gamma_0>0$ when the regression function is in $W_2^m[0,1]$ for some $m\ge1$.
5 Proofs
5.1 Proof of Theorem 1
The proof of Theorem 1 is based on the following result:
Lemma 1. Suppose that Assumptions M1 and M2 hold and $(X_1,W_1),\dots,(X_n,W_n)$ are $\beta$-mixing. Suppose that $B_j\ge1$, $r_jB_j^2\le n/4$, $r_j\ge1$, and there exist constants $A_0>0$ and $B_0>0$ such that $0<A_j/B_j\le A_0+2$ and $0<A_{2,j}/B_{2,j}\le1/B_0$. Suppose that $\|u\|_\infty\le B_j$ for all $u\in S_j$. Suppose that $q_n$ is a positive integer and $\tilde q_n=\lfloor q_n/2\rfloor$. Let $\ell_n=\lfloor n/q_n\rfloor$. Suppose that $\ell_n\ge A_0^2$, $r_jB_j^2\le\ell_n/4$ for each $j\in\Lambda$, $\sigma>0$, $\tau>0$, and (4) holds. Then there exists a set $\Omega_n$ such that $P(\Omega_n^c)\le\lceil\ell_n\rceil(\beta_{\tilde q_n}I(\tilde q_n\ge1)+\beta_{q_n-\tilde q_n})$ and, for $\eta_j$ such that
$$\ell_n\eta_j > 24B_j^2r_j(1+2D_j\log2) \quad\text{for each } j\in\Lambda, \quad (18)$$
for $j^*\in\Lambda$ and $s^*\in S_{j^*}$,
$$P_s(\Omega_n\cap(S_{2,1}\cup S_{2,1}^*)) \le q_n\left(p^*_{j^*}(\ell_n,\eta_{j^*})+\sum_{j\in\Lambda}p_j(\ell_n,\eta_j)\right), \quad (19)$$
where $S_{2,1}$ is the event that
$$\nu_n[\gamma(\cdot,s)-\gamma(\cdot,u)] > 9\tau\left(\frac{\sigma^2}{4}\vee\|s-u\|^2\right)+2\eta_j \quad\text{for some } u\in S_j \text{ for some } j\in\Lambda,$$
$$S_{2,1}^* = \left\{\nu_n[\gamma(\cdot,s^*)-\gamma(\cdot,s)] > \tau\|s-s^*\|^2+\eta_{j^*}\right\},$$
$\beta_k$ is the k-th $\beta$-mixing coefficient for $k\ge1$, and $p_j(\ell_n,\eta_j)$ and $p^*_{j^*}(\ell_n,\eta_{j^*})$ are defined in (8) and (9).
Lemma 1 can be derived by first establishing the special case where $(X_1,W_1),\dots,(X_n,W_n)$ are independent and then applying a corollary of Berbee's Lemma taken from Claim 2 in [1]. The independent case of Lemma 1 is stated in Lemma 2 below, whose proof is given in Section 5.2. The corollary of Berbee's Lemma is stated in Fact 1 below.
Lemma 2. Suppose that Assumptions M1 and M2 hold and $(X_1,W_1),\dots,(X_n,W_n)$ are independent. Suppose that $B_j\ge1$, $r_jB_j^2\le n/4$, $r_j\ge1$, and there exist constants $A_0>0$ and $B_0>0$ such that $0<A_j/B_j\le A_0+2$ and $0<A_{2,j}/B_{2,j}\le1/B_0$. Suppose that $\|u\|_\infty\le B_j$ for all $u\in S_j$. Suppose that $n\ge A_0^2$, $\sigma>0$, $\tau>0$, (4) holds, and
$$n\eta_j > 24B_j^2r_j(1+2D_j\log2) \quad\text{for each } j\in\Lambda. \quad (20)$$
Then we have
$$P_s\left[\nu_n[\gamma(\cdot,s)-\gamma(\cdot,u)] > 9\tau\left(\frac{\sigma^2}{4}\vee\|s-u\|^2\right)+2\eta_j \text{ for some } u\in S_j\right] \le \exp\left(-\frac{c_1(2\tau B_0/k_0)n\eta_j}{A_{2,j}B_{2,j}}\right)+1.6\left(1-\exp\left(-\frac{\tau\sigma^2}{2}\left(c(\tau)\wedge\frac13\right)\right)\right)^{-1}\left((B_j'\theta)\vee2\right)^{3D_j}\times\left[\exp\left(-\frac{nc(\tau)\eta_j}{2A_jB_j}\right)+\exp\left(-\frac{n\eta_j}{24B_j^2r_j}\right)\right], \quad (21)$$
where $c_1$ and c are defined in (6). In addition, for $u\in S_j$ and $\eta_j\ge0$, we have
$$P_s\left[\nu_n[\gamma(\cdot,u)-\gamma(\cdot,s)] > \tau\|u-s\|^2+\eta_j\right] \le \exp\left(-\frac{c_1(2\tau B_0/k_0)n\eta_j}{A_{2,j}B_{2,j}}\right). \quad (22)$$

Fact 1. Suppose that a sequence $\{u_i\}_{i=1}^\infty$ is $\beta$-mixing and, for $n\ge1$, $\tilde q_n$ and $q_n$ are integers such that $0\le\tilde q_n\le q_n/2$ and $q_n\ge1$. Let $\ell_n=\lfloor n/q_n\rfloor$. Then there exist $u_i^*$, $i=1,\dots,\lceil\ell_n\rceil q_n$, such that (i)–(iii) hold.

(i) For $\ell=1,\dots,\lceil\ell_n\rceil$, let $U_{\ell,1}=(u_{(\ell-1)q_n+1},\dots,u_{(\ell-1)q_n+\tilde q_n})^T$, $U_{\ell,1}^*=(u^*_{(\ell-1)q_n+1},\dots,u^*_{(\ell-1)q_n+\tilde q_n})^T$, $U_{\ell,2}=(u_{(\ell-1)q_n+\tilde q_n+1},\dots,u_{\ell q_n})^T$, and $U_{\ell,2}^*=(u^*_{(\ell-1)q_n+\tilde q_n+1},\dots,u^*_{\ell q_n})^T$; then for each $\delta\in\{1,2\}$, $U_{\ell,\delta}$ and $U_{\ell,\delta}^*$ have the same distribution.

(ii) For $\ell=1,\dots,\lceil\ell_n\rceil$, $P(U_{\ell,1}\ne U_{\ell,1}^*)\le\beta_{q_n-\tilde q_n}$ and $P(U_{\ell,2}\ne U_{\ell,2}^*)\le\beta_{\tilde q_n}$, where $\beta_k$ denotes the k-th $\beta$-mixing coefficient for $k\ge1$.

(iii) For each $\delta\in\{1,2\}$, $U_{1,\delta}^*,\dots,U_{\lceil\ell_n\rceil,\delta}^*$ are independent.
Next, we prove Lemma 1 using Lemma 2 and Fact 1. To find an upper bound for $P_s(S_{2,1})$ when the sequence $\{(X_i,W_i)\}_{i\ge1}$ is $\beta$-mixing, we apply Fact 1 with $u_i=(X_i,W_i)$. We only prove the case $q_n>1$, which implies that $\tilde q_n=\lfloor q_n/2\rfloor\ge1$, since the proof for the case $q_n=1$ is similar. Let $(X_i^*,W_i^*)=u_i^*$, and
$$\Omega_n = \{(X_i,W_i)=(X_i^*,W_i^*) \text{ for } i=1,\dots,n\}.$$
Then $P_s(S_{2,1})\le P(\Omega_n^c)+P_s(S_{2,1}\cap\Omega_n)$. For $k=1,\dots,q_n$, let $\Gamma_k=\{i : i=k+\ell q_n \text{ for some integer } \ell \text{ and } 1\le i\le n\}$ and
$$\nu_{n,k}[\gamma(\cdot,u)] = \frac{1}{|\Gamma_k|}\sum_{i\in\Gamma_k}[\gamma(Z_i,u)-E\gamma(Z_1,u)]$$
for all u, where $|\Gamma_k|$ denotes the number of elements in $\Gamma_k$, which is at most $\lceil\ell_n\rceil$. Let $S_{2,1,k}$ be the event that
$$\nu_{n,k}[\gamma(\cdot,s)-\gamma(\cdot,u)] > 9\tau\left(\frac{\sigma^2}{4}\vee\|s-u\|^2\right)+2\eta_j$$
for some $u\in S_j$ for some $j\in\Lambda$, and let
$$S_{2,1,k}^* = \left\{\nu_{n,k}[\gamma(\cdot,s^*)-\gamma(\cdot,s)] > \tau\|s^*-s\|^2+\eta_{j^*}\right\}.$$
Then $S_{2,1}\subset\cup_{k=1}^{q_n}S_{2,1,k}$ and $S_{2,1}^*\subset\cup_{k=1}^{q_n}S_{2,1,k}^*$, and it follows from Fact 1 and Lemma 2 that $P(\Omega_n^c)\le\lceil\ell_n\rceil(\beta_{\tilde q_n}+\beta_{q_n-\tilde q_n})$ and
$$P_s(\Omega_n\cap(S_{2,1}\cup S_{2,1}^*)) \le \sum_{k=1}^{q_n}\left(P_s(S_{2,1,k}\cap\Omega_n)+P_s(S_{2,1,k}^*\cap\Omega_n)\right) \le \sum_{k=1}^{q_n}\left(\sum_{j\in\Lambda}p_j(|\Gamma_k|,\eta_j)+p^*_{j^*}(|\Gamma_k|,\eta_{j^*})\right) \le q_n\sum_{j\in\Lambda}p_j(\ell_n,\eta_j)+q_np^*_{j^*}(\ell_n,\eta_{j^*})$$
if (4) holds, $\ell_n\ge A_0^2$, and for each $j\in\Lambda$, $r_jB_j^2\le\ell_n/4$ and $\ell_n\eta_j>24B_j^2r_j(1+2D_j\log2)$.
To give an error bound for $\|\hat s-s\|$ using Lemma 1, for $\xi>0$ take
$$\eta_j = \frac12\left(\eta_{1,j}+\frac{B_j^2r_j\xi}{\ell_n}\right);$$
then (18) holds, and on $S_{2,1}^c$ we have
$$k_1\|u-s\|^2-9\tau\left(\frac{\sigma^2}{4}\vee\|s-u\|^2\right) \le E(\gamma(Z_i,u))-E(\gamma(Z_i,s))-9\tau\left(\frac{\sigma^2}{4}\vee\|s-u\|^2\right) \le \frac1n\sum_{i=1}^n\gamma(Z_i,u)+\eta_{1,j}+\frac{B_j^2r_j\xi}{\ell_n}-\frac1n\sum_{i=1}^n\gamma(Z_i,s) \le \frac1n\sum_{i=1}^n\gamma(Z_i,u)+\eta_{1,j}+\frac{\delta_n\xi}{\ell_n}-\frac1n\sum_{i=1}^n\gamma(Z_i,s)$$
for all $u\in S_j$ for all $j\in\Lambda$, and on $(S_{2,1}^*)^c$ we have
$$\frac1n\sum_{i=1}^n\gamma(Z_i,s^*)-\frac1n\sum_{i=1}^n\gamma(Z_i,s) \le E[\gamma(Z_i,s^*)-\gamma(Z_i,s)]+\tau\|s^*-s\|^2+\eta_{j^*} \le E[\gamma(Z_i,s^*)-\gamma(Z_i,s)]+\tau\|s^*-s\|^2+\frac{\eta_{1,j^*}}{2}+\frac{\delta_n\xi}{2\ell_n}.$$
Thus on $\Omega_n\cap S_{2,1}^c\cap(S_{2,1}^*)^c$, we have the error bound
$$(k_1-9\tau)\|\hat s-s\|^2-\frac{9\tau\sigma^2}{4} \le \frac1n\sum_{i=1}^n\gamma(Z_i,s^*)+\eta_{1,j^*}+\frac{\delta_n\xi}{\ell_n}-\frac1n\sum_{i=1}^n\gamma(Z_i,s) \le 1.5\left(\eta_{1,j^*}+\frac{\delta_n\xi}{\ell_n}\right)+E[\gamma(Z_i,s^*)-\gamma(Z_i,s)]+\tau\|s^*-s\|^2 \le 1.5\eta_{1,j^*}+(k_2+\tau)\|s^*-s\|^2+\frac{1.5\delta_n\xi}{\ell_n}.$$
Let
$$U = \left((k_1-9\tau)\|\hat s-s\|^2I_{\Omega_n}-\frac{9\tau\sigma^2}{4}\right)-\left(1.5\eta_{1,j^*}+(k_2+\tau)\|s^*-s\|^2\right);$$
then the above result can be expressed as $P(U>1.5\delta_n\xi/\ell_n)\le P(\Omega_n\cap(S_{2,1}\cup S_{2,1}^*))$, where $P(\Omega_n^c)\le(\ell_n+1)(\beta_{\tilde q_n}+\beta_{q_n-\tilde q_n})$ and an upper bound for $P(\Omega_n\cap(S_{2,1}\cup S_{2,1}^*))$ is given in (19). This completes the proof of Theorem 1.
5.2 Proof of Lemma 2
To prove Lemma 2, we will first establish the following result:
Fact 2. Suppose that the conditions in Lemma 2 hold, and that $\theta$ and $\eta_j$ satisfy (4) and (20) for some $\tau>0$. Then for $\sigma>0$, $x>0$ and $x_k>0$, we have
$$P_s\left[\sup_{u\in B(s^*,\sigma)}\nu_n[\gamma(\cdot,s^*)-\gamma(\cdot,u)] > \tau\sigma^2+\eta_j\right] \le \exp(-nx+H_0)+\sum_k\exp(-nx_k+H_k+H_{k+1}) \quad (23)$$
if
$$\sigma\sqrt{2k_0x}+B_jx+(\sigma/\theta)\sum_{k=0}^\infty2^{-k}\left(\sqrt{5k_0x_k}+1.5r_jx_k\right) \le \frac{\tau\sigma^2+\eta_j}{A_j}, \quad (24)$$
where $H_k = D_j\log(B_j'2^k\theta) = D_j\log(B_j'\theta)+kD_j\log2$.
The proof of Fact 2 relies on a version of Bernstein's inequality, given in Lemma 8 of [3] and stated below.
Fact 3. Suppose that $U_1,\dots,U_n$ are independent random variables such that
$$\frac1n\sum_{i=1}^nE[|U_i|^m] \le \frac{m!}{2}v^2c^{m-2} \quad\text{for all } m\ge2$$
for some positive constants v and c. Then, for $x\ge0$,
$$P\left[\sum_{i=1}^nU_i-E\left(\sum_{i=1}^nU_i\right) \ge n(v\sqrt{2x}+cx)\right] \le \exp(-nx).$$
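A quick Monte Carlo sanity check of this form of Bernstein's inequality (an added illustration with assumed values, not from the report): for $U_i$ uniform on $[-1,1]$, the moment condition holds with $v^2=1/3$ and $c=1$ since $E|U_i|^m=1/(m+1)\le(m!/2)(1/3)$, and the simulated tail frequency sits below $\exp(-nx)$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps, x = 50, 20000, 0.02
v, c = np.sqrt(1.0 / 3.0), 1.0  # E[U^2] = 1/3 and |U| <= 1 for Uniform(-1, 1)

U = rng.uniform(-1.0, 1.0, size=(reps, n))
threshold = n * (v * np.sqrt(2 * x) + c * x)  # n(v*sqrt(2x) + cx); E[sum] = 0
tail_freq = float(np.mean(U.sum(axis=1) >= threshold))
bound = float(np.exp(-n * x))  # the Bernstein bound exp(-nx)
```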
Fact 2 follows from Fact 3 and Assumption M2. To prove Fact 2, let $\delta_k=2^{-k}\sigma/\theta$. Let $T_k$ be a subset of $B(s^*,\sigma)$ such that $|T_k|\le(B_j'2^k\theta)^{D_j}$ and, for every $u\in B(s^*,\sigma)$, there exists $v\in T_k$ such that $\|u-v\|\le\delta_k$ and $\|\Delta_j(\cdot,u,v)\|_\infty\le r_j\delta_k$. Let $V_k$ be the set of $(u_k,u_{k+1})$'s such that $u_k\in T_k$, $u_{k+1}\in T_{k+1}$ and there exists $u\in S_j$ such that $\|u_{k+1}-u\|\le\delta_{k+1}$, $\|u_k-u\|\le\delta_k$, $\|\Delta_j(\cdot,u,u_{k+1})\|_\infty\le r_j\delta_{k+1}$, and $\|\Delta_j(\cdot,u,u_k)\|_\infty\le r_j\delta_k$. For $u\in B(s^*,\sigma)$, since
$$\nu_n[\gamma(\cdot,s^*)-\gamma(\cdot,u)] = \nu_n[\gamma(\cdot,s^*)-\gamma(\cdot,u_0)]+\sum_{k=0}^\infty\left(\nu_n[\gamma(\cdot,u)-\gamma(\cdot,u_{k+1})]-\nu_n[\gamma(\cdot,u)-\gamma(\cdot,u_k)]\right)$$
for some $u_k\in T_k$, $k=0,1,\dots$, we have
$$P_s\left[\sup_{u\in B(s^*,\sigma)}\nu_n[\gamma(\cdot,s^*)-\gamma(\cdot,u)] > \tau\sigma^2+\eta_j\right] \le \underbrace{P_s[\nu_n[\gamma(\cdot,s^*)-\gamma(\cdot,u_0)] > y \text{ for some } u_0\in T_0]}_{I}+\sum_k\underbrace{P_s[\nu_n[\gamma(\cdot,u_k)-\gamma(\cdot,u_{k+1})] > y_k \text{ for some } (u_k,u_{k+1})\in V_k]}_{II_k}$$
if $y+\sum_ky_k\le\tau\sigma^2+\eta_j$.
Below we use Fact 3 to find bounds for I and $II_k$. To derive an upper bound for I, note that
$$E_s|\gamma(Z_i,s^*)-\gamma(Z_i,u_0)|^m \le E_s[M_j^m(W_i)]E_s[\Delta_j^m(X_i,s^*,u_0)] \le a_mA_j^mE_s[\Delta_j^m(X_i,s^*,u_0)]$$
and
$$\frac1n\sum_{i=1}^nE_s|\gamma(Z_i,s^*)-\gamma(Z_i,u_0)|^m \le a_mA_j^mb_mk_0\sigma^2B_j^{m-2}.$$
Take $y=A_j(\sigma\sqrt{2k_0x}+B_jx)$; then $I\le(B_j'\theta)^{D_j}\exp(-nx)=\exp(-nx+H_0)$.
To derive an upper bound for $II_k$, note that
$$E_s|\gamma(Z_i,u_{k+1})-\gamma(Z_i,u_k)|^m \le E_s[M_j^m(W_i)]E_s\left(\Delta_j(X_i,u,u_{k+1})+\Delta_j(X_i,u,u_k)\right)^m,$$
where
$$E_s\left(\Delta_j(X_i,u,u_{k+1})+\Delta_j(X_i,u,u_k)\right)^m \le E_s\left(\Delta_j(X_i,u,u_{k+1})+\Delta_j(X_i,u,u_k)\right)^2\left(\|\Delta_j(\cdot,u,u_{k+1})\|_\infty+\|\Delta_j(\cdot,u,u_k)\|_\infty\right)^{m-2},$$
so
$$\frac1n\sum_{i=1}^nE_s|\gamma(Z_i,u_{k+1})-\gamma(Z_i,u_k)|^m \le a_mA_j^mb_mk_0\left[2(\delta_k^2+\delta_{k+1}^2)\right](r_j\delta_k+r_j\delta_{k+1})^{m-2} \le \frac{m!}{2}\left(\frac52\delta_k^2A_j^2k_0\right)\left(A_j\frac{3r_j\delta_k}{2}\right)^{m-2}.$$
Therefore, if $y_k=A_j\delta_k(\sqrt{5k_0x_k}+1.5r_jx_k)$, then
$$II_k \le (B_j'2^k\theta)^{D_j}(B_j'2^{k+1}\theta)^{D_j}\exp(-nx_k) = \exp(-nx_k+H_k+H_{k+1}).$$
From the upper bounds for I and $II_k$ given above, (23) holds if
$$A_j(\sigma\sqrt{2k_0x}+B_jx)+\sum_kA_j\delta_k(\sqrt{5k_0x_k}+1.5r_jx_k) \le \tau\sigma^2+\eta_j,$$
so (24) implies (23) and we have Fact 2.
Next we apply Fact 2 with specific x and $x_k$'s for the case $0<\sigma\le2B_j$. In this case, (24) is implied by
$$\sigma\sqrt{2k_0x}+B_jx+\frac\sigma\theta\sum_{k=0}^\infty2^{-k}\sqrt{5k_0x_k}+\frac{2B_j}\theta\sum_{k=0}^\infty2^{-k}(1.5r_jx_k) \le \frac{\tau\sigma^2+\eta_j}{A_j}. \quad (25)$$
Let $g(x)=(x/(1+\sqrt{1+x}))^2$ for $x>0$; then $g(x)/x<1$ and $g(x)/x$ is increasing on $(0,\infty)$. Let $x_k=(k+1)\tilde y$; then (25) holds if
$$0<x\le\frac{k_0\sigma^2}{2B_j^2}\,g\!\left(\frac{B_j}{A_jk_0\sigma^2}(\tau\sigma^2+\eta_j)\right) \quad\text{and}\quad 0<\tilde y\le\frac{c_1^2k_0\sigma^2}{16B_j^2c_0^2r_j^2}\,g\!\left(\frac{4c_0r_j\theta B_j}{c_1^2A_jk_0\sigma^2}(\tau\sigma^2+\eta_j)\right),$$
where $c_1=\sqrt5\sum_{k=0}^\infty\sqrt{k+1}\,2^{-k}\approx3.789034$ and $c_0=1.5\sum_{k=0}^\infty(k+1)2^{-k}=6$. If x and $\tilde y$ satisfy the above constraints and $\tilde y>2D_j\log(2)/n$, then
$$P_s\left[\sup_{u\in B(s^*,\sigma)}\nu_n[\gamma(\cdot,s^*)-\gamma(\cdot,u)] > \tau\sigma^2+\eta_j\right] \le \exp(-nx+D_j\log(B_j'\theta))+\frac{\exp(-n\tilde y+D_j\log2+2D_j\log(B_j'\theta))}{1-\exp(-[n\tilde y-2D_j\log2])}.$$
From the assumptions that $r_j\ge1$, $B_j\ge1$, $A_j/B_j\le A_0+2$ and $\theta\ge5$, we have
$$\frac{B_j}{A_jk_0\sigma^2}(\tau\sigma^2) \ge \frac{\tau}{k_0(A_0+2)} \quad\text{and}\quad \frac{4c_0r_j\theta B_j}{c_1^2A_jk_0\sigma^2}(\tau\sigma^2) \ge \frac{20c_0B_j\tau}{c_1^2A_jk_0} \ge \frac{8\tau}{k_0(A_0+2)}.$$
Since
$$c(\tau) = g\!\left(\frac{\tau}{k_0(A_0+2)}\right)\frac{k_0(A_0+2)}{\tau},$$
take
$$x = \frac{c(\tau)(\tau\sigma^2+\eta_j)}{2A_jB_j} \quad\text{and}\quad \tilde y = \frac{c(8\tau)\theta(\tau\sigma^2+\eta_j)}{4A_jB_jc_0r_j};$$
then by (4),
$$n\tilde y \ge \frac{n\eta_jc(8\tau)\theta}{4A_jB_jc_0r_j} \ge \frac{n\eta_jc(8\tau)\theta}{4(A_0+2)B_j^2c_0r_j} \ge \frac{n\eta_j}{4B_j^2c_0r_j}.$$
Thus $n\tilde y\ge1+2D_j\log2$ if (20) holds. In that case,
$$P_s\left[\sup_{u\in B(s^*,\sigma)}\nu_n[\gamma(\cdot,s^*)-\gamma(\cdot,u)] > \tau\sigma^2+\eta_j\right] \le \exp\left(-\frac{nc(\tau)\tau\sigma^2}{2A_jB_j}-\frac{nc(\tau)\eta_j}{2A_jB_j}+D_j\log(B_j'\theta)\right)+\exp\left(-\frac{nc(8\tau)\theta\tau\sigma^2}{4A_jB_jc_0r_j}-\frac{nc(8\tau)\theta\eta_j}{4A_jB_jc_0r_j}+D_j\log2+2D_j\log(B_j'\theta)\right)\times\left(1-\exp\left(-\left[\frac{nc(8\tau)\theta\tau\sigma^2}{4A_jB_jc_0r_j}+\frac{nc(8\tau)\theta\eta_j}{4A_jB_jc_0r_j}-2D_j\log2\right]\right)\right)^{-1} \le \exp\left(-\frac{nc(\tau)\tau\sigma^2}{2A_jB_j}-\frac{nc(\tau)\eta_j}{2A_jB_j}+D_j\log(B_j'\theta)\right)+\exp\left(-\frac{n\tau\sigma^2}{4B_j^2c_0r_j}-\frac{n\eta_j}{4B_j^2c_0r_j}+D_j\log2+2D_j\log(B_j'\theta)\right)\left(1-e^{-1}\right)^{-1}.$$
In summary, we have proved the following fact assuming $\sigma\le2B_j$:

Fact 4. Under the conditions in Lemma 2, for $\tau>0$ and $\sigma>0$, if $\theta$ and $\eta_j$ satisfy (4) and (20), then
$$P_s\left[\sup_{u\in B(s^*,\sigma)}\nu_n[\gamma(\cdot,s^*)-\gamma(\cdot,u)] > \tau\sigma^2+\eta_j\right] \le \exp\left(-\frac{nc(\tau)\tau\sigma^2}{2A_jB_j}-\frac{nc(\tau)\eta_j}{2A_jB_j}+D_j\log(B_j'\theta)\right)+\exp\left(-\frac{n\tau\sigma^2}{4B_j^2c_0r_j}-\frac{n\eta_j}{4B_j^2c_0r_j}+D_j\log2+2D_j\log(B_j'\theta)\right)\left(1-e^{-1}\right)^{-1}.$$
Note that Fact 4 also holds for $\sigma>2B_j$. To see this, note that $\|u\|_\infty\le B_j$ for all $u\in S_j$, so
$$P_s\left[\sup_{u\in B(s^*,\sigma)}\nu_n[\gamma(\cdot,s^*)-\gamma(\cdot,u)] > \tau\sigma^2+\eta_j\right] = P_s\left[\sup_{u\in B(s^*,2B_j)}\nu_n[\gamma(\cdot,s^*)-\gamma(\cdot,u)] > \tau(2B_j)^2+\tau[\sigma^2-(2B_j)^2]+\eta_j\right].$$
Applying Fact 4 with $\sigma=2B_j$ and $\eta_j$ replaced by $\tau[\sigma^2-(2B_j)^2]+\eta_j$, it is clear that Fact 4 also holds for $\sigma>2B_j$.
To prove Lemma 2 using Fact 4, for $\varepsilon>0$ choose $s^*\in S_j$ so that $\|s-s^*\|\le\|s-u\|+\varepsilon$ for all $u\in S_j$, and then derive upper bounds for
$$II = P_s\left[\nu_n[\gamma(\cdot,s^*)-\gamma(\cdot,u)] > 2\tau(\sigma^2\vee\|s^*-u\|^2)+\eta_j \text{ for some } u\in S_j\right]$$
and
$$III = P_s\left[\nu_n[\gamma(\cdot,s)-\gamma(\cdot,s^*)] > \tau\|s^*-s\|^2+\eta_j\right]$$
to control $\nu_n[\gamma(\cdot,s)-\gamma(\cdot,u)]$.
To find an upper bound for II, note that
$$P_s\left[\nu_n[\gamma(\cdot,s^*)-\gamma(\cdot,u)] > 2\tau(\sigma^2\vee\|s^*-u\|^2)+\eta_j \text{ for some } u\in S_j\right] \le \sum_{k=1}^\infty P_s\left[\nu_n[\gamma(\cdot,s^*)-\gamma(\cdot,u)] > \tau(k+1)\sigma^2+\eta_j \text{ for some } u\in S_{1,k}\right]+P_s\left[\sup_{u\in B(s^*,\sigma)}\nu_n[\gamma(\cdot,s^*)-\gamma(\cdot,u)] > 2\tau\sigma^2+\eta_j\right],$$
where $S_{1,k}=S_j\cap B(s^*,\sigma\sqrt{k+1})\cap B(s^*,\sigma\sqrt k)^c$. Applying Fact 4 with $\sigma$ replaced by $\sigma\sqrt{k+1}$, we have
$$II \le \sum_{k=0}^\infty\exp\left(-\frac{nc(\tau)\tau(k+1)\sigma^2}{2A_jB_j}-\frac{nc(\tau)\eta_j}{2A_jB_j}+D_j\log(B_j'\theta)\right)+\sum_{k=0}^\infty\exp\left(-\frac{n\tau(k+1)\sigma^2}{4B_j^2c_0r_j}-\frac{n\eta_j}{4B_j^2c_0r_j}+D_j\log2+2D_j\log(B_j'\theta)\right)\left(1-e^{-1}\right)^{-1} \le \left(1-\exp\left(-\frac{nc(\tau)\tau\sigma^2}{2A_jB_j}\right)\right)^{-1}\exp\left(-\frac{nc(\tau)\tau\sigma^2}{2A_jB_j}-\frac{nc(\tau)\eta_j}{2A_jB_j}+D_j\log(B_j'\theta)\right)+1.6\left(1-\exp\left(-\frac{n\tau\sigma^2}{4B_j^2c_0r_j}\right)\right)^{-1}\exp\left(-\frac{n\tau\sigma^2}{4B_j^2c_0r_j}-\frac{n\eta_j}{4B_j^2c_0r_j}+D_j\log2+2D_j\log(B_j'\theta)\right).$$
Since $2B_j^2\le2r_jB_j^2\le n/2$ and $n\ge A_0^2$, we have $A_jB_j\le n$ and $4r_jB_j^2\le n$. Thus the above upper bound for II is at most
$$1.6\left(1-\exp\left(-\frac{\tau\sigma^2}{2}\left(c(\tau)\wedge\frac13\right)\right)\right)^{-1}\left((B_j'\theta)\vee2\right)^{3D_j}\left[\exp\left(-\frac{nc(\tau)\eta_j}{2A_jB_j}\right)+\exp\left(-\frac{n\eta_j}{24B_j^2r_j}\right)\right].$$
To derive an upper bound for III, note that $E_s|\gamma(Z_i,s^*)-\gamma(Z_i,s)|^m \le E_s[M_{2,j}^m(W_i)]E_s[\Delta_{2,j}^m(X_i,s^*,s)]$, so
$$\frac1n\sum_{i=1}^nE_s|\gamma(Z_i,s^*)-\gamma(Z_i,s)|^m \le a_mA_{2,j}^mb_mk_0\|s^*-s\|^2B_{2,j}^{m-2} \le \frac{m!}{2}k_0\|s^*-s\|^2A_{2,j}^2(A_{2,j}B_{2,j})^{m-2}.$$
For $x>0$ and
$$y_2 \ge \|s^*-s\|A_{2,j}\sqrt{2k_0x}+A_{2,j}B_{2,j}x = A_{2,j}(\|s^*-s\|\sqrt{2k_0x}+B_{2,j}x), \quad (26)$$
we have $P_s[\nu_n[\gamma(\cdot,s^*)-\gamma(\cdot,s)]>y_2]\le\exp(-nx)$. Condition (26) is equivalent to
$$x \le g\!\left(\frac{2B_{2,j}y_2}{k_0A_{2,j}\|s-s^*\|^2}\right)\frac{k_0\|s-s^*\|^2}{2B_{2,j}^2}, \quad (27)$$
where $g(x)=(x/(1+\sqrt{1+x}))^2$. Since $g(x)/x$ is increasing on $(0,\infty)$ and $B_{2,j}/A_{2,j}\ge B_0$, for $y_2\ge\tau\|s-s^*\|^2$, (27) holds if
$$x \le \frac{g(2\tau B_0/k_0)}{2\tau B_0/k_0}\cdot\frac{y_2}{A_{2,j}B_{2,j}} = \frac{c_1(2\tau B_0/k_0)y_2}{A_{2,j}B_{2,j}}.$$
Therefore,
$$III = P_s\left[\nu_n[\gamma(\cdot,s^*)-\gamma(\cdot,s)] > \tau\|s-s^*\|^2+\eta_j\right] \le \exp\left(-\frac{c_1(2\tau B_0/k_0)n\eta_j}{A_{2,j}B_{2,j}}\right). \quad (28)$$
From the above bounds for II and III and the fact that
$$2\tau(\sigma^2\vee\|s^*-u\|^2)+\eta_j+\tau\|s-s^*\|^2+\eta_j \le 2\tau(\sigma^2\vee(2\|s-u\|+\varepsilon)^2)+\tau(\|s-u\|+\varepsilon)^2+2\eta_j \le 9\tau\left(\frac{\sigma^2}{4}\vee(\|s-u\|+\varepsilon)^2\right)+2\eta_j,$$
we have
$$P_s\left[\nu_n[\gamma(\cdot,s)-\gamma(\cdot,u)] > 9\tau\left(\frac{\sigma^2}{4}\vee(\|s-u\|+\varepsilon)^2\right)+2\eta_j \text{ for some } u\in S_j\right] \le \exp\left(-\frac{c_1(2\tau B_0/k_0)n\eta_j}{A_{2,j}B_{2,j}}\right)+1.6\left(1-\exp\left(-\frac{\tau\sigma^2}{2}\left(c(\tau)\wedge\frac13\right)\right)\right)^{-1}\left((B_j'\theta)\vee2\right)^{3D_j}\times\left[\exp\left(-\frac{nc(\tau)\eta_j}{2A_jB_j}\right)+\exp\left(-\frac{n\eta_j}{24B_j^2r_j}\right)\right].$$
Since $\varepsilon>0$ is arbitrary, we obtain (21) by letting $\varepsilon\to0$. It is clear that (22) holds since (28) still holds if $s^*$ is replaced by any $u\in S_j$. The proof of Lemma 2 is complete.
5.3 Proof of Theorem 2
Proof of Theorem 2. Theorem 2 is an application of Theorem 1 with $Z_i=(X_i,Y_i)$ and $\gamma(z,t)=\gamma((x,y),t)=(y-t(x))^2$. To apply Theorem 1, Assumptions M1–M3 are verified first.

Verification of Assumption M1. Note that
$$|\gamma(z,u)-\gamma(z,v)| = |u(x)-v(x)|\cdot|2y-u(x)-v(x)| = |u(x)-v(x)|\cdot|2(s(x)+w)-u(x)-v(x)|,$$
where
$$|2(s(x)+w)-u(x)-v(x)| \le 2(|w|+\|s\|_\infty+b)$$
for $u,v\in S_j$, since functions in $S_j$ are bounded by b in sup-norm. Take $M_j(w)=2(|w|+\|s\|_\infty+b)$ and $\Delta_j(x,u,v)=|u(x)-v(x)|$; then
$$|\gamma(z,u)-\gamma(z,v)| \le M_j(w)\Delta_j(x,u,v).$$
An upper bound for $E[M_j^m(W_i)]$ can be obtained by controlling $E|W_i|^m$:
$$E[M_j^m(W_i)] = E[2^m(|W_i|+\|s\|_\infty+b)^m] \le 4^m\left(\frac12E|W_i|^m+\frac12(\|s\|_\infty+b)^m\right) \le \frac{m!}{2}4^m\left(\frac{\Gamma}{\alpha^m}+\frac{(\|s\|_\infty+b)^m}{m!}\right) \le \frac{m!}{2}\left(4\left(\frac\Gamma\alpha+\|s\|_\infty+b\right)\right)^m. \quad (29)$$
Here the inequality $E|W_i|^m\le m!\Gamma/\alpha^m$ follows from the assumption that $Ee^{\alpha|W_i|}<\Gamma$.
To control $E_s[\Delta_j^m(X_i,u,v)]=E_s|u(X_i)-v(X_i)|^m$, note that for $u,v\in S_j$,
$$E_s|u(X_i)-v(X_i)|^m \le E_s[u(X_1)-v(X_1)]^2\|u-v\|_\infty^{m-2},$$
where $E_s[u(X_1)-v(X_1)]^2\le k_0\|u-v\|^2$ for some constant $k_0$ that does not depend on j, since the density of $X_i$ is bounded above. Thus for $u,v\in S_j$,
$$E_s[\Delta_j^m(X_i,u,v)] \le k_0\|u-v\|^2\|u-v\|_\infty^{m-2} \quad (30)$$
$$\le k_0\|u-v\|^2(2b)^{m-2}. \quad (31)$$
To control $|\gamma(z,u)-\gamma(z,v)|$ for $u\in S_j$ and $v=s$, take $M_{2,j}(w)=2|w|+\|s\|_\infty+b$ and $\Delta_{2,j}(x,u,s)=|u(x)-s(x)|=\Delta_j(x,u,s)$; then
$$|\gamma(z,u)-\gamma(z,s)| \le M_{2,j}(w)\Delta_{2,j}(x,u,s)$$
since
$$|2(s(x)+w)-u(x)-s(x)| \le 2|w|+\|s\|_\infty+b.$$
Modifying slightly the derivation of (29), replacing $\|s\|_\infty+b$ with $(\|s\|_\infty+b)/2$, we have
$$E[M_{2,j}^m(W_i)] \le \frac{m!}{2}\left(4\left(\frac\Gamma\alpha+\frac{\|s\|_\infty+b}{2}\right)\right)^m. \quad (32)$$
Also, from (30), we have
$$E_s[\Delta_{2,j}^m(X_i,u,s)] \le k_0\|u-s\|^2(\|s\|_\infty+b)^{m-2}. \quad (33)$$
Let $A_0=4(\Gamma/\alpha+\|s\|_\infty)$ and $B_0=0.5$. From (29), (31), (32) and (33), Assumption M1 holds with $b_m=1$, $a_m=m!/2$, $B_j=2b$,
$$A_j = 4\left(\frac\Gamma\alpha+\|s\|_\infty+b\right) = A_0+2B_j,$$
$A_{2,j}=A_0+B_j$, and $B_{2,j}=(A_0+B_j)/2$. It is clear that $B_j\ge1$ and $\|u\|_\infty\le B_j$ for all $u\in S_j$. Also, $0<A_j/B_j\le A_0+2$ and $A_{2,j}/B_{2,j}=2=1/B_0$.
Verification of Assumption M2. To identify the constants in Assumption M2, Facts 5 and 6 will be applied. These two facts are first stated and proved below.
Fact 5. Let $\bar S$ be a D-dimensional subspace of $L_2\cap L_\infty(\mu)$ spanned by functions $\phi_1,\dots,\phi_D$, where $\|\cdot\|_2$ and $\|\cdot\|_\infty$ denote the $L_2$-norm and the $L_\infty$-norm with respect to $\mu$. Let $|\cdot|_2$ and $|\cdot|_\infty$ denote the $l_2$-norm and the $l_\infty$-norm in $R^D$.

Suppose that there exist constants $T_1$, $T_2$ and $T_3$ such that for $(\theta_1,\dots,\theta_D)\in R^D$,
$$\left\|\sum_{i=1}^D\theta_i\phi_i\right\|_\infty \le T_1|\theta|_\infty \quad (34)$$
and
$$\frac{T_2}{\sqrt D}|\theta|_2 \le \left\|\sum_{i=1}^D\theta_i\phi_i\right\|_2 \le \frac{T_3}{\sqrt D}|\theta|_2. \quad (35)$$
Take $r_0\ge T_1/T_3$ and
$$B' = \sqrt{2\pi e}\left(0.5+\max\left(\frac{T_3}{T_2},1\right)\right). \quad (36)$$
Then for B an $L_2$ ball of radius $\sigma$ in $\bar S$ and $0<\delta<\sigma/5$, there exists a finite set $T\subset B$ such that T is a $\delta$-net for B with respect to the $L_2$-norm and an $r_0\delta$-net with respect to the $L_\infty$-norm, and the number of elements in T is at most $(B'\sigma/\delta)^D$.
Proof. Suppose that the center of B is $\sum_{i=1}^D\theta_i^*\phi_i$, and let $\theta^*=(\theta_1^*,\dots,\theta_D^*)$. Then it follows from the first inequality in (35) that B is contained in $\{\sum_{i=1}^D\theta_i\phi_i : (\theta_1,\dots,\theta_D)\in B_0\}$, where $B_0$ is the $l_2$ ball with center $\theta^*$ and radius $\sqrt D\sigma/T_2$. Since the volume of an $l_2$ ball in $R^D$ with radius $\sigma$ is bounded by $c_{00}(D)\sigma^D$ (cf. the proof of Lemma 2 in [3]), where
$$c_{00}(D) = (2\pi e/D)^{D/2}(\pi D)^{-1/2},$$
we can cover $B_0$ with cubes of edge length $\delta/T_3$ such that the number of cubes is at most
$$\frac{c_{00}(D)(\sqrt D\sigma/T_2+\sqrt D\delta/T_3)^D}{(\delta/T_3)^D} \le (1+(T_3\sigma)/(T_2\delta))^D(2\pi e)^{D/2} \le (B'\sigma/\delta)^D$$
for the $B'$ in (36) if $0<\delta<\sigma/5$. Choose one point from each cube to form a set $T_0$, and take $T=B\cap\{\sum_{i=1}^D\theta_i\phi_i : (\theta_1,\dots,\theta_D)\in T_0\}$; then from the second inequality in (35), T is a $\delta$-net for B with respect to the $L_2$-norm. From (34), T is an $r_0\delta$-net for B with respect to the $L_\infty$-norm for $r_0\ge T_1/T_3$. The proof of Fact 5 is complete.
Fact 6. Suppose that $\mu$ is the Lebesgue measure on $[0,1]$. Let $\bar S$ be the space of B-splines on $[0,1]$ of order q with k knots $\xi_1,\dots,\xi_k$ of multiplicities $m_1,\dots,m_k$, where $0<\xi_1<\dots<\xi_k<1$. Then $\bar S$ is a subspace of $L_2\cap L_\infty(\mu)$. Let $\tilde\Delta_1=\min_{1\le i\le k+1}(\xi_i-\xi_{i-1})$ and $\tilde\Delta_2=\max_{1\le i\le k+1}(\xi_i-\xi_{i-1})$, where $\xi_0=0$ and $\xi_{k+1}=1$. Let $K=m_1+\dots+m_k$ and $D=K+q$. Then (34) holds with $T_1=1$, and (35) holds with $T_2=\sqrt{\tilde\Delta_1D}/(\sqrt q(2q+1)9^{q-1})$ and $T_3=q\sqrt{\tilde\Delta_2D}$.

Proof. Let
$$(y_1,\dots,y_{K+2q}) = (\underbrace{0,\dots,0}_{q \text{ times}},\underbrace{\xi_1,\dots,\xi_1}_{m_1 \text{ times}},\dots,\underbrace{\xi_k,\dots,\xi_k}_{m_k \text{ times}},\underbrace{1,\dots,1}_{q \text{ times}})$$
and let $\phi_i$ be the (normalized) B-spline basis function of order q associated with the knots $y_i,\dots,y_{i+q}$ for $i=1,\dots,D$. Then $\phi_1,\dots,\phi_D$ span $\bar S$. It follows from Equation (4.80) in Schumaker [6] that (34) holds with $T_1=1$, so it remains to check (35).

To check that the first inequality in (35) holds with the $T_2$ specified above, note that from (4.79) and (4.86) in [6], for $f=\sum_{i=1}^{K+q}\theta_i\phi_i$ we have
$$|\theta_i| \le (2q+1)9^{q-1}\tilde\Delta_1^{-1/2}\|f\|_{L_2[y_i,y_{i+q}]},$$
where $\phi_i$ is supported on $[y_i,y_{i+q}]$, which implies that
$$\sum_{i=1}^{K+q}\theta_i^2 \le (2q+1)^29^{2(q-1)}\tilde\Delta_1^{-1}\sum_{i=1}^{K+q}\|f\|^2_{L_2[y_i,y_{i+q}]} \le (2q+1)^29^{2(q-1)}\tilde\Delta_1^{-1}q\|f\|_2^2,$$
so the first inequality in (35) holds with $T_2=\sqrt{\tilde\Delta_1D}/(\sqrt q(2q+1)9^{q-1})$.

To check that the second inequality in (35) holds with the $T_3$ specified above, we follow the approach in the proof of Lemma 4.2 in Ghosal, Ghosh and van der Vaart [4], which is originally given in Stone [7]. For $f=\sum_{j=1}^{K+q}\theta_j\phi_j$, we have for $x\in[y_i,y_{i+1})$ and $q\le i\le q+K$ that $f(x)=\sum_{j=i+1-q}^i\theta_j\phi_j(x)$ (cf. [6], Equations (4.25) and (4.29)), so it follows from the Cauchy–Schwarz inequality that for $x\in[y_i,y_{i+1})$,
$$f^2(x) \le q\sum_{j=i+1-q}^i\theta_j^2\phi_j^2(x) \le q\sum_{j=i+1-q}^i\theta_j^2,$$
which gives
$$\int_0^1f^2(x)\,dx = \sum_{i=q}^{q+K}\int_{y_i}^{y_{i+1}}f^2(x)\,dx \le q\sum_{i=q}^{q+K}\sum_{j=i+1-q}^i\theta_j^2(y_{i+1}-y_i) \le q^2\tilde\Delta_2\sum_{j=1}^D\theta_j^2,$$
so the second inequality in (35) holds with $T_3=q\sqrt{\tilde\Delta_2D}$. The proof of Fact 6 is complete.
From Facts 5 and 6, Assumption M2 holds with $D_j=q+k$, $B_j'$ defined in (12), and $r_j=1$ (as defined in (13)), since
$$1 \ge \frac{1}{q\sqrt{\tilde\Delta_{2,j}(k+q)}},$$
where $\tilde\Delta_{2,j}$ and $\tilde\Delta_{1,j}$ are defined in (10) and (11) respectively. It is clear that $B_j'\ge1$ and $D_j\ge1$ for $j\in\Lambda$, as required in Theorem 1. In addition, (14) and $B_j^2r_j\le\delta_n$ together imply that $r_jB_j^2\le\ell_n/4$ for $j\in\Lambda$ if n is large enough.
Assumption M3 holds with constants $k_1$ and $k_2$ that do not depend on j, since the density of $X_i$ is supported on $[0,1]$, bounded away from zero and bounded above on $[0,1]$, and
$$E\left(\frac1n\sum_{i=1}^n(Y_i-t(X_i))^2-\frac1n\sum_{i=1}^n(Y_i-s(X_i))^2\right) = E(s(X_i)-t(X_i))^2.$$
Next, we verify (5). Note that with $B_j=2b$ and $D_j=k+q$, (16) implies that
$$\frac{\ell_n\eta_{1,j}}{2r_jB_j^2} > \frac{a_n(k+q)\log(B_j')}{2} \ge \frac{a_nD_j\log(3.5\sqrt{2\pi e})}{2},$$
and $24(1+2D_j\log2)\le24(1+2\log2)D_j$, so if n is large enough that
$$a_n\log(3.5\sqrt{2\pi e}) \ge 48(1+2\log2), \quad (37)$$
then (5) holds.
Now we have verified that the conditions in Theorem 1 hold. Let $\eta_j=(\eta_{1,j}+r_j(2b)^2\xi/\ell_n)/2$. Then by Theorem 1, the error bound in (7) holds with
$$\theta = \frac{A_0+2}{c(8\tau)}\vee5,$$
except on a set of probability at most
$$q_n\left(p^*_{j^*}(\ell_n,\eta_{j^*})+\sum_{j\in\Lambda}p_j(\ell_n,\eta_j)\right) \quad (38)$$
for a given $j^*\in\Lambda$.
Below we calculate the probability upper bound in (38) with
$$j^* = \left(b^*,q^*,\frac{1}{2^{J^*}},\dots,\frac{2^{J^*}-1}{2^{J^*}}\right) \stackrel{\text{def}}{=} (b^*,q^*,\xi^*),$$
where $q^*=m+1$, $J^*=\lfloor\log_2(n^{1/(1+2m)})\rfloor$, and $b^*$ is a constant large enough so that for any spline with knot vector $\xi^*$, order $q^*$, and sup-norm bounded by $\lfloor\|s\|_\infty\rfloor+1$, the spline coefficients are bounded by $b^*$. Note that
$$\eta_j = \frac12\left(\eta_{1,j}+\frac{r_j(2b)^2\xi}{\ell_n}\right) = \frac12\left(\frac{a_nr_j(2b)^2}{\ell_n}\left((k+q)\log(B_j')+\lambda[(\log2)2^{J_j}+q+b]\right)+\frac{r_j(2b)^2\xi}{\ell_n}\right),$$
so
$$p^*_{j^*}(\ell_n,\eta_{j^*}) = \exp\left(-\frac{2c_1(\tau/k_0)\ell_n\eta_{j^*}}{(A_0+2b^*)^2}\right) \le \exp\left(-a_nc_9[(\log2)2^{J^*}+q^*+b^*]-c_9\xi\right) \quad (39)$$
and, for $0<\sigma<1$,
$$p_j(\ell_n,\eta_j) = \exp\left(-\frac{2c_1(\tau/k_0)\ell_n\eta_j}{(A_0+2b)^2}\right)+1.6\left(1-\exp\left(-\frac{\tau\sigma^2}{2}\left(c(\tau)\wedge\frac13\right)\right)\right)^{-1}\left((B_j'\theta)\vee2\right)^{3(k+q)}\times\left[\exp\left(-\frac{\ell_nc(\tau)\eta_j}{2(A_0+2b)(2b)}\right)+\exp\left(-\frac{\ell_n\eta_j}{24r_j(2b)^2}\right)\right] \le \exp\left(-\frac{2c_3\ell_n\eta_j}{(2b)^2}\right)+\frac{c_4}{\sigma^2}\exp\left(-\frac{2c_5\ell_n\eta_j}{(2b)^2}\right)\left((B_j'\theta)\vee2\right)^{3(k+q)}$$
for some constants $c_9$, $c_3$, $c_4$, $c_5$. For $0<\sigma<1$, if n is large enough so that
$$c_5a_n \ge 3+\frac{3\log\theta}{\log(3.5\sqrt{2\pi e})}, \quad (40)$$
then for $\xi>0$ and $c_6=c_3\wedge c_5$,
$$p_j(\ell_n,\eta_j) \le 2\left(1\vee\frac{c_4}{\sigma^2}\right)\exp\left(-c_6\left[a_n[(\log2)2^{J_j}+q+b]+\xi\right]\right).$$
For n large enough so that $a_nc_6>2$,
$$\sum_{j\in\Lambda}\exp\left(-c_6a_n[(\log2)2^{J_j}+q+b]\right) \le \sum_b\sum_q\sum_J\sum_{k=1}^{2^J-1}\binom{2^J-1}{k}\exp\left(-2[(\log2)2^J+q+b]\right) \le \sum_b\sum_q\sum_J2^{-2^J}e^{-2b}e^{-2q} \stackrel{\text{def}}{=} c_7 < \infty,$$
so for $c_8=2c_7$,
$$\sum_{j\in\Lambda}p_j(\ell_n,\eta_j) \le c_8\left(1\vee\frac{c_4}{\sigma^2}\right)\exp(-c_6\xi).$$
Therefore, for $0<\sigma<1$ and for n large enough so that $a_nc_6>2$ and (40) and (37) hold, we have
$$(k_1-9\tau)\|\hat s-s\|^2I_{\Omega_n}-\frac{9\tau\sigma^2}{4} > 1.5\eta_{1,j^*}+(k_2+\tau)\|s^*-s\|^2+\frac{1.5\delta_n\xi}{\ell_n}$$
with probability at most
$$q_n\left(\exp(-c_9\xi)+c_8\left(1\vee\frac{c_4}{\sigma^2}\right)\exp(-c_6\xi)\right)$$
for $\xi>0$. Let
$$U = \left((k_1-9\tau)\|\hat s-s\|^2I_{\Omega_n}-\frac{9\tau\sigma^2}{4}\right)-\left(1.5\eta_{1,j^*}+(k_2+\tau)\|s^*-s\|^2\right);$$
then for $\xi_0>0$, $0<\sigma<1\wedge\sqrt{c_4}$, $c_{10}=c_6\wedge c_9$ and $c_{11}=c_4(1+c_8)$, we have
$$E\left(\frac{\ell_nU}{1.5\delta_n}\right) \le \xi_0+\int_{\xi_0}^\infty q_n\left(\exp(-c_9\xi)+c_8\left(1\vee\frac{c_4}{\sigma^2}\right)\exp(-c_6\xi)\right)d\xi \le \xi_0+\frac{c_{11}q_n}{\sigma^2}\int_{\xi_0}^\infty\exp(-c_{10}\xi)\,d\xi \le \xi_0+\frac{c_{11}q_n}{c_{10}\sigma^2}\exp(-c_{10}\xi_0).$$
Take $\xi_0=\delta_n^{-1}\ell_nn^{-2m/(1+2m)}$ and $\sigma=n^{-m/(1+2m)}$; then $\xi_0=O(\delta_n^{-1}(\log n)^{-1-\gamma_0}n^{1/(1+2m)})$ and
$$\limsup_n n^{2m/(1+2m)}E(U) \le \limsup_n\frac{1.5\delta_nn^{2m/(1+2m)}}{\ell_n}\left(\xi_0+\frac{c_{11}q_n}{c_{10}\sigma^2}\exp(-c_{10}\xi_0)\right) < \infty,$$
so
$$(k_1-9\tau)E\|\hat s-s\|^2I_{\Omega_n} \le 1.5\eta_{1,j^*}+(k_2+\tau)\|s^*-s\|^2+Cn^{-2m/(1+2m)} \quad (41)$$
for some constant $C>0$. Recall that for the splines in $S_{j^*}$, the knots are equally spaced and the number of knots is $2^{J^*}-1=O(n^{1/(1+2m)})$, so we can choose $s^*\in S_{j^*}$ such that $\|s^*-s\|^2=O(n^{-2m/(1+2m)})$ (cf. Theorem 6.25 in [6]), and then
$$(k_1-9\tau)E\|\hat s-s\|^2I_{\Omega_n} \le 1.5\eta_{1,j^*}+O(n^{-2m/(1+2m)}).$$
Since $r_{j^*}=1$ and $B_{j^*}'=\sqrt{2\pi e}\left(0.5+q^*\sqrt{q^*}(2q^*+1)9^{q^*-1}\right)$, we have
$$\eta_{1,j^*} = \frac{a_nr_{j^*}(2b^*)^2}{\ell_n}\left((2^{J^*}-1+q^*)\log(B_{j^*}')+\lambda[(\log2)2^{J^*}+q^*+b^*]\right) = O(a_n(\log n)^{1+\gamma_0}n^{-2m/(1+2m)}).$$
Choosing $\tau<k_1/9$, (41) implies that
$$E\|\hat s-s\|^2I_{\Omega_n} = O(a_n(\log n)^{1+\gamma_0}n^{-2m/(1+2m)}). \quad (42)$$
It remains to establish an upper bound for $E\|\hat s-s\|^2I_{\Omega_n^c}$. Note that $\|\hat s\|^2\le\delta_n$, $\|s\|_\infty<\infty$ and $P(\Omega_n^c)\le(\ell_n+1)(\beta_{\tilde q_n}+\beta_{q_n-\tilde q_n})$, where both $\tilde q_n$ and $q_n-\tilde q_n$ are of order $(\log n)^{1+\gamma_0}$ from the choice $\ell_n=O(n(\log n)^{-1-\gamma_0})$. Thus there exists a constant $c_{12}>0$ such that
$$n^{2m/(1+2m)}E\|\hat s-s\|^2I_{\Omega_n^c} = O\left(n^{2m/(1+2m)}\delta_n^2(\ell_n+1)\exp(-c_{12}(\log n)^{1+\gamma_0})\right) = o(1). \quad (43)$$
The proof of Theorem 2 is complete by combining (42) and (43).
References
[1] Y. Baraud, F. Comte, and G. Viennet. Adaptive estimation in autoregression or β-mixing regression via model selection. The Annals of Statistics, 29(3):839–875, 2001.
[2] Andrew Barron, Lucien Birgé, and Pascal Massart. Risk bounds for model selection via penalization (with discussion). Probability Theory and Related Fields, 113:301–413, 1999.
[3] Lucien Birgé and Pascal Massart. Minimum contrast estimators on sieves: Exponential bounds and rates of convergence. Bernoulli, 4:329–375, 1998.
[4] Subhashis Ghosal, Jayanta K. Ghosh, and Aad W. van der Vaart. Convergence rates of posterior distributions. The Annals of Statistics, 28(2):500–531, 2000.
[5] Tzee-Ming Huang. Convergence rates for posterior distributions and adaptive estimation. The Annals of Statistics, 32(4):1556–1593, 2004.
[6] Larry L. Schumaker. Spline Functions: Basic Theory. Wiley-Interscience, 1981.
[7] Charles J. Stone. The dimensionality reduction principle for generalized additive models. The Annals of Statistics, 14:590–606, 1986.
[8] Yuhong Yang and Andrew R. Barron. An asymptotic property of model selection criteria. IEEE Transactions on Information Theory, 44(1):95– 116, 1998.
[9] S. Zhou, X. Shen, and D. A. Wolfe. Local asymptotics for regression splines and confidence regions. The Annals of Statistics, 26(5):1760– 1782, 1998.