Ministry of Science and Technology Research Project Final Report
Final Report
Adaptive Estimation of Regression Functions under Correlated Errors
Project type: individual project
Project number: NSC 102-2118-M-004-007-
Project period: August 1, 2013 to July 31, 2014
Host institution: Department of Statistics, National Chengchi University
Principal investigator: Tzee-Ming Huang (黃子銘)
Project staff: full-time master's-level assistant: 鄭宇翔
Handling of results:
1. Public access: this project report is publicly available.
2. Has this research produced findings that seriously harm the public interest: no
3. Is this report recommended as a policy reference for government agencies: no
July 28, 2014
Chinese abstract: In nonparametric regression, if the smoothness of the regression function is known, it is easy to obtain estimators that attain the optimal convergence rate. When the smoothness of the regression function is unknown, we would still like the estimator to perform equally well; estimators with this property are called adaptive. When the errors in the regression model are correlated, an adaptive estimator based on a model selection criterion has been proposed, but it is difficult to use because the criterion involves unknown parameters. The main result of this project is a regression function estimator that involves no unknown parameters; its convergence rate is shown to be nearly optimal, so it can be viewed as a nearly adaptive estimator.
Keywords: regression, adaptive estimation, correlated errors
English abstract: In nonparametric regression, if the degree of
smoothness of the regression function is known, it is
often easy to obtain estimators that attain the
optimal convergence rate. When the degree of
smoothness of the regression function is unknown, it
is desirable to have estimators for the regression
function that can also achieve the optimal
convergence rate. Estimators that have this property
are called adaptive.
When the errors in a regression model are dependent,
an adaptive estimator based on a model selection
criterion has been proposed, but it is difficult to
implement because the criterion involves unknown
parameters for the error dependence structure and the
error variance. In this project, an estimator that is also based on the model selection approach but involves no unknown parameters is proposed. It is shown that the proposed estimator attains a nearly optimal convergence rate. It was also proposed to study the possibility of replacing the unknown parameters by their consistent estimators to make the earlier adaptive estimator implementable.
An adaptive knot selection method for regression
splines via penalized minimum contrast estimation
Tzee-Ming Huang
Abstract
In this report, a knot selection method for regression splines is proposed. This method yields a least squares spline estimator that adapts to the smoothness of the regression function, and the knots are allowed to be unequally spaced. If the true regression function s belongs to the Sobolev space $W_2^m[0,1]$, then for a sequence $\{a_n\}$ such that $\lim_{n\to\infty}a_n=\infty$ and a constant $\gamma_0>0$, the proposed estimator (depending on $a_n$ and $\gamma_0$) converges to s at the rate $O(\sqrt{a_n(\log n)^{1+\gamma_0}n^{-2m/(1+2m)}})$ in $L_2$ norm in probability.
1 Introduction
One of the most popular methods in nonparametric regression is B-spline estimation. B-splines are piecewise polynomials joined smoothly at points called knots. For implementation, one has to choose the number of knots and the degree of the polynomials. The choice of knots is especially crucial. For functions that are m times continuously differentiable with the m-th derivative bounded by a constant, Zhou, Shen and Wolfe [9] showed that the number of knots should grow at the rate $n^{1/(1+2m)}$ for the spline estimator of the regression function to achieve the optimal convergence rate $n^{-2m/(1+2m)}$ in integrated mean squared error.
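As a purely illustrative aside (not part of the analysis in this report), a least squares spline fit with a hand-picked knot count can be computed with standard tools; the sketch below assumes `scipy.interpolate.LSQUnivariateSpline` and uses roughly $n^{1/(1+2m)}$ equally spaced interior knots for $m=2$:

```python
import numpy as np
from scipy.interpolate import LSQUnivariateSpline

rng = np.random.default_rng(0)
n = 500
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=n)

# Number of interior knots grows like n^(1/(1+2m)); here m = 2, so n^(1/5).
num_knots = max(1, round(n ** 0.2))
knots = np.linspace(0, 1, num_knots + 2)[1:-1]  # strictly interior knots

fit = LSQUnivariateSpline(x, y, knots, k=3)  # cubic least squares spline
mse = float(np.mean((fit(x) - np.sin(2 * np.pi * x)) ** 2))
```

With the smoothness m known, this rate-matched knot count is easy to pick; the point of this report is to select the knots from data when m is unknown.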
It is possible to construct estimators for regression functions in the Sobolev space $W_2^m[0,1]$ that achieve the rate $n^{-m/(1+2m)}$ with respect to the $L_2$ norm without knowing m. These estimators are known as adaptive estimators. Barron, Birgé and Massart [2] derived risk bounds for penalized minimum contrast estimators, which can be used to construct adaptive estimators for regression functions. Huang [5] applied an inequality in Yang and Barron [8], obtained using an approach similar to that in [2], to construct an adaptive estimator for the regression function using B-splines with equally spaced knots, assuming the errors are normally distributed.
The objective of this study is to construct an adaptive estimator for the regression function using B-splines without requiring that the knots be equally spaced or that the errors be normally distributed. This objective is achieved by first establishing an exponential inequality to control the error of minimum contrast estimators and then applying the result to derive the convergence rate of a spline estimator obtained via model selection. The exponential inequality is given in Section 2. The application to adaptive B-spline estimation is given in Section 3.
2 Error Control for Minimum Contrast Estimators
In this section, the problem of minimum contrast estimation, which includes least squares regression as a special case, is introduced. Then a theorem giving error bounds for penalized minimum contrast estimators is presented; it will be used to establish the rate of convergence of the proposed regression estimator.
The set-up for minimum contrast estimation is as follows. Consider the problem of estimating an unknown function s based on observations $Z_1,\dots,Z_n$, where there exists a function $\gamma$ such that
$$E\left(\frac1n\sum_{i=1}^n\gamma(Z_i,t)\right)$$
is minimized when $t=s$. Suppose that s is estimated by
$$\tilde s = \arg\min_{t\in S}\frac1n\sum_{i=1}^n\gamma(Z_i,t).$$
Then $\tilde s$ is called the minimum contrast estimator over S with respect to $\gamma$, and the function $\gamma$ is called the contrast function.
Least squares regression fits into the framework of minimum contrast estimation. Consider the regression model
$$Y_i = s(X_i)+W_i,\quad i=1,\dots,n, \quad (1)$$
where the regression function s is defined on an interval $I_0$ and the errors $W_i$ have mean zero and are independent of the $X_i$'s. If s is estimated by some function in a parametric family S, then the least squares estimator $\tilde s$ is a minimum contrast estimator with $Z_i=(X_i,Y_i)$ with respect to the contrast function $\gamma(z,t)=(y-t(x))^2$ for $z=(x,y)$.
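To make the correspondence concrete, here is a minimal numerical sketch (an added illustration, not taken from the report) in which the contrast $\gamma(z,t)=(y-t(x))^2$ is minimized over a one-parameter family $S=\{t_\theta : t_\theta(x)=\theta x\}$ by a grid search; the minimum contrast estimator then approximates the closed-form least squares slope:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0, 1, n)
y = 2.0 * x + 0.1 * rng.normal(size=n)

def empirical_contrast(theta):
    # (1/n) sum_i gamma(Z_i, t_theta) with gamma(z, t) = (y - t(x))^2
    return np.mean((y - theta * x) ** 2)

# Minimum contrast estimator over a grid of candidate thetas (step 0.001).
thetas = np.linspace(0.0, 4.0, 4001)
theta_hat = thetas[np.argmin([empirical_contrast(t) for t in thetas])]
closed_form = np.sum(x * y) / np.sum(x ** 2)  # exact least squares slope
```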
In minimum contrast estimation, a key requirement is that the function s can be approximated well by some function in S. One approach to meeting this requirement is to take $S=\cup_{j\in\Lambda}S_j$, where the $S_j$'s are various families of functions and the collection $\{S_j\}$ is rich enough to include a good approximation to s (cf. Barron, Birgé and Massart [2]). To prevent overfitting, a penalty term $\eta_{1,j}$ is often added in minimum contrast estimation, yielding the estimator
$$\hat s = \arg\min_{j\in\Lambda,\,t\in S_j}\left(\eta_{1,j}+\frac1n\sum_{i=1}^n\gamma(Z_i,t)\right). \quad (2)$$
To control the estimation error of the penalized minimum contrast estimator $\hat s$, several assumptions are made. Let
$$\nu_n(\cdot) = \frac1n\sum_{i=1}^n[\gamma(Z_i,\cdot)-E\gamma(Z_i,\cdot)],$$
and let $\|\cdot\|$ and $\|\cdot\|_\infty$ denote the $L_2$ norm and the sup norm on $I_0$. The assumptions are given below.
Assumption M1. Suppose that for $i=1,\dots,n$, $Z_i=f_0(s,X_i,W_i)$, where $f_0$ is known. Suppose that there exists a non-negative constant $k_0$ and, for each $j\in\Lambda$, non-negative constants $A_j$, $B_j$, $A_{2,j}$, $B_{2,j}$ and non-negative functions $M_j$, $\Delta_j$, $M_{2,j}$, $\Delta_{2,j}$, such that for $u\in S_j$,
$$|\gamma(z,u)-\gamma(z,v)| \le \begin{cases} M_j(w)\Delta_j(x,u,v), & \text{if } v\in S_j;\\ M_{2,j}(w)\Delta_{2,j}(x,u,s), & \text{if } v=s,\end{cases}$$
and, for all $m\ge2$ and $i=1,\dots,n$,
$$E_s[M_j^m(W_i)] \le a_mA_j^m,$$
$$E_s[\Delta_j^m(X_i,u,v)] \le b_mk_0\|u-v\|^2B_j^{m-2},$$
$$E_s[M_{2,j}^m(W_i)] \le a_mA_{2,j}^m,$$
$$E_s[\Delta_{2,j}^m(X_i,u,s)] \le b_mk_0\|u-s\|^2B_{2,j}^{m-2},$$
where either $a_m=1$ and $b_m=m!/2$ for all $m\ge2$, or $b_m=1$ and $a_m=m!/2$ for all $m\ge2$.
Assumption M2. For each $j\in\Lambda$, there exist constants $B_j'\ge1$, $r_j>0$ and $D_j\ge1$ such that, for $\sigma>0$ and $0<\delta<\sigma/5$, and for any ball $B\subset S_j$ of radius $\sigma$ with respect to $\|\cdot\|$, one can find a subset T of B with $|T|\le(B_j'\sigma/\delta)^{D_j}$ such that for every $u\in B$, there exists $v\in T$ with $\|u-v\|\le\delta$ and $\|\Delta_j(\cdot,u,v)\|_\infty\le r_j\delta$.
Assumption M3. There exist positive constants $k_1$ and $k_2$ such that
$$k_1\|u-s\|^2 \le E[\gamma(Z_i,u)-\gamma(Z_i,s)] \le k_2\|u-s\|^2 \quad (3)$$
for all $u\in S_j$ and all $j\in\Lambda$.
Under Assumptions M1–M3, if $(X_1,W_1),\dots,(X_n,W_n)$ are $\beta$-mixing, one can derive an error bound for $\hat s$ using bounds for $\sup_{u\in S}[\nu_n(s)-\nu_n(u)]$ and $\nu_n(u)-\nu_n(s)$ for $u\in S_j$. This result is stated in Theorem 1 below.
Theorem 1. Suppose that Assumptions M1–M3 hold and $(X_1,W_1),\dots,(X_n,W_n)$ are $\beta$-mixing. Suppose that $B_j\ge1$, $r_jB_j^2\le n/4$, $r_j\ge1$, and there exist constants $A_0>0$ and $B_0>0$ such that $0<A_j/B_j\le A_0+2$ and $0<A_{2,j}/B_{2,j}\le1/B_0$. Suppose that $\|u\|_\infty\le B_j$ for all $u\in S_j$. Suppose that $q_n$ is a positive integer and $\tilde q_n=\lfloor q_n/2\rfloor$. Let $\ell_n=\lfloor n/q_n\rfloor$. Suppose that $\delta_n>0$, $\ell_n\ge A_0^2$, $r_jB_j^2\le(\ell_n/4)\wedge\delta_n$, $\sigma>0$, $\tau>0$,
$$\theta \ge \frac{A_0+2}{c(8\tau)}\vee5, \quad (4)$$
and the penalty $\eta_{1,j}$ is chosen so that
$$\frac{\ell_n\eta_{1,j}}{2} > 24B_j^2r_j(1+2D_j\log2) \quad\text{for each } j\in\Lambda. \quad (5)$$
Let
$$c_1(t) = \frac{t}{(1+\sqrt{1+t})^2} \quad\text{and}\quad c(t)=c_1\!\left(\frac{t}{k_0(A_0+2)}\right) \quad (6)$$
for $t>0$, and take
$$\eta_j = \frac12\left(\eta_{1,j}+\frac{B_j^2r_j\xi}{\ell_n}\right).$$
Then there exists a set $\Omega_n$ with
$$P(\Omega_n^c) \le (\ell_n+1)(\beta_{\tilde q_n}+\beta_{q_n-\tilde q_n})$$
such that for $\xi>0$, $j^*\in\Lambda$ and $s^*\in S_{j^*}$, we have
$$(k_1-9\tau)\|\hat s-s\|^2I_{\Omega_n}-\frac{9\tau\sigma^2}{4} \le 1.5\eta_{1,j^*}+(k_2+\tau)\|s^*-s\|^2+\frac{1.5\delta_n\xi}{\ell_n} \quad (7)$$
except on a set of probability at most
$$q_n\left(p^*_{j^*}(\ell_n,\eta_{j^*})+\sum_{j\in\Lambda}p_j(\ell_n,\eta_j)\right),$$
where $\beta_k$ denotes the k-th $\beta$-mixing coefficient for $k\ge1$,
$$p_j(\ell_n,\eta_j) = \exp\left(-\frac{c_1(2\tau B_0/k_0)\ell_n\eta_j}{A_{2,j}B_{2,j}}\right)+1.6\left(1-\exp\left(-\frac{\tau\sigma^2}{2}\left(c(\tau)\wedge\frac13\right)\right)\right)^{-1}\left((B_j'\theta)\vee2\right)^{3D_j}\times\left[\exp\left(-\frac{\ell_nc(\tau)\eta_j}{2A_jB_j}\right)+\exp\left(-\frac{\ell_n\eta_j}{24B_j^2r_j}\right)\right] \quad (8)$$
and
$$p^*_{j^*}(\ell_n,\eta_{j^*}) = \exp\left(-\frac{c_1(2\tau B_0/k_0)\ell_n\eta_{j^*}}{A_{2,j^*}B_{2,j^*}}\right). \quad (9)$$
The proof of Theorem 1 is given in Section 5.1.
Remark. In the least squares regression framework, while the inequality in Lemma 0 of [8] can also be applied to control $\nu_n(s)-\nu_n(u)$ (see [5] for an example), the exponential inequality given in (21) is more direct. Lemma 0 of [8] controls the likelihood function, and in [5] the control of $\nu_n(s)-\nu_n(u)$ is achieved by assuming the $W_i$'s are normally distributed. In contrast, the inequality in (21) is derived following the proof of Theorem 5 in Birgé and Massart [3], which only requires moment conditions on the $W_i$'s.
3 Application to Adaptive B-spline Estimation
In this section, we apply Theorem 1 to obtain adaptive B-spline estimators for s in (1). Here the regression function s is assumed to be in the Sobolev space $W_2^m[0,1]$ for some $m\ge1$, and each $S_j$ is taken to be a collection of B-splines on $[0,1]$ of order q for some integer $q\ge1$, with boundary knots at 0 and 1 and distinct internal knots $\xi_1,\dots,\xi_k$ in $\{1/2^J,\dots,(2^J-1)/2^J\}$ for some positive integer J. Also, the coefficients of the B-splines in $S_j$ are bounded by b in absolute value for some positive integer b, and the index j is $(b,q,\xi_1,\dots,\xi_k)$. Let
$$\tilde\Delta_{2,j} = \max_{1\le i\le k+1}(\xi_i-\xi_{i-1}), \quad (10)$$
$$\tilde\Delta_{1,j} = \min_{1\le i\le k+1}(\xi_i-\xi_{i-1}), \quad (11)$$
$$B_j' = \sqrt{2\pi e}\left(0.5+q\sqrt q(2q+1)9^{q-1}\frac{\sqrt{\tilde\Delta_{2,j}}}{\sqrt{\tilde\Delta_{1,j}}}\right), \quad (12)$$
$$r_j = 1, \quad (13)$$
and
$$J_j = \min\{J\ge1 : \xi_1,\dots,\xi_k \text{ are in } \{1/2^J,\dots,(2^J-1)/2^J\}\},$$
where $\xi_0=0$ and $\xi_{k+1}=1$. Let $\Lambda$ be the set of all $j=(b,q,\xi_1,\dots,\xi_k)$'s such that b and q are positive integers, $2^{J_j}+q+b\le n$ and $r_j(2b)^2\le\delta_n$, where $\{\delta_n\}$ is chosen such that
$$\lim_{n\to\infty}\delta_n = \infty = \lim_{n\to\infty}\frac{\ell_n}{\delta_n} \quad\text{and}\quad \lim_{n\to\infty}\frac{\delta_n\log(n)}{n^\alpha}=0 \text{ for all } \alpha>0. \quad (14)$$
Then the estimator for s considered here is the penalized least squares estimator
$$\hat s = \arg\min_{j\in\Lambda,\,u\in S_j}\left(\frac1n\sum_{i=1}^n(Y_i-u(X_i))^2+\eta_{1,j}\right), \quad (15)$$
where
$$\eta_{1,j} = \frac{a_nr_j(2b)^2}{\ell_n}\left((k+q)\log(B_j')+\lambda[(\log2)2^{J_j}+q+b]\right), \quad (16)$$
$\lambda\ge1$ is a constant, and $\{a_n\}$ is a sequence of positive numbers such that $\lim_{n\to\infty}a_n=\infty$. The $L_2$ convergence rate for $\hat s$ is given in Theorem 2.
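The model selection step can be illustrated with the following simplified sketch (an added example, not the exact procedure above: it searches only over equally spaced knot sets and uses a stand-in penalty proportional to model dimension times $\log(n)/n$ rather than the exact $\eta_{1,j}$ of (16)):

```python
import numpy as np
from scipy.interpolate import LSQUnivariateSpline

rng = np.random.default_rng(2)
n = 400
x = np.sort(rng.uniform(0, 1, n))
true_s = lambda t: np.sin(2 * np.pi * t)
y = true_s(x) + 0.2 * rng.normal(size=n)

def penalized_score(k, lam=0.05):
    # Candidate model: cubic spline with k equally spaced interior knots.
    knots = np.linspace(0, 1, k + 2)[1:-1]
    fit = LSQUnivariateSpline(x, y, knots, k=3)
    rss = np.mean((y - fit(x)) ** 2)
    dim = k + 4  # number of cubic B-spline coefficients
    return rss + lam * dim * np.log(n) / n, fit

scores = {k: penalized_score(k) for k in range(1, 21)}
k_hat = min(scores, key=lambda k: scores[k][0])
s_hat = scores[k_hat][1]
err = float(np.mean((s_hat(x) - true_s(x)) ** 2))
```

The penalty discourages large knot sets whose reduction in residual sum of squares does not exceed the added model dimension cost, mirroring the role of $\eta_{1,j}$ in (15).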
Theorem 2. Suppose that the regression model in (1) holds, where the regression function s is in $W_2^m[0,1]$, $\{(X_i,W_i)\}_{i\ge1}$ is a $\beta$-mixing sequence whose $\ell$-th $\beta$-mixing coefficient $\beta_\ell$ satisfies
$$\beta_\ell \le \gamma_1e^{-\ell\gamma_2} \quad\text{for } \ell\ge1 \quad (17)$$
for some positive constants $\gamma_1$ and $\gamma_2$, and the $W_i$'s have mean zero and are independent of the $X_i$'s. Suppose that $Ee^{\alpha|W_i|}<\Gamma$ for some $\alpha>0$ and $\Gamma\ge1$, and that $X_i$ has a Lebesgue density that is bounded above and bounded away from zero. Suppose that $\{a_n\}$ and $\{\ell_n\}$ are sequences of positive numbers such that $\lim_{n\to\infty}a_n=\infty$ and $\ell_n=O(n(\log n)^{-(1+\gamma_0)})$ for some $\gamma_0>0$, and that $\{\delta_n\}$ is chosen so that (14) holds. Then, for the estimator $\hat s$ given in (15) with $S_j$ defined above and $\eta_{1,j}$ defined in (16), we have
$$E\|\hat s-s\|^2 = O(a_n(\log n)^{1+\gamma_0}n^{-2m/(1+2m)}).$$
4 Conclusion

An adaptive estimator for the regression function using B-splines that allows unequally spaced knots has been constructed. The estimator is a penalized least squares estimator, and it achieves the $L_2$ convergence rate $O(a_n(\log n)^{1+\gamma_0}n^{-2m/(1+2m)})$ for any sequence $\{a_n\}$ with $\lim_{n\to\infty}a_n=\infty$ and any constant $\gamma_0>0$ when the regression function is in $W_2^m[0,1]$ for some $m\ge1$.
5 Proofs
5.1 Proof of Theorem 1
The proof of Theorem 1 is based on the following result:
Lemma 1. Suppose that Assumptions M1 and M2 hold and $(X_1,W_1),\dots,(X_n,W_n)$ are $\beta$-mixing. Suppose that $B_j\ge1$, $r_jB_j^2\le n/4$, $r_j\ge1$, and there exist constants $A_0>0$ and $B_0>0$ such that $0<A_j/B_j\le A_0+2$ and $0<A_{2,j}/B_{2,j}\le1/B_0$. Suppose that $\|u\|_\infty\le B_j$ for all $u\in S_j$. Suppose that $q_n$ is a positive integer and $\tilde q_n=\lfloor q_n/2\rfloor$. Let $\ell_n=\lfloor n/q_n\rfloor$. Suppose that $\ell_n\ge A_0^2$, $r_jB_j^2\le\ell_n/4$ for each $j\in\Lambda$, $\sigma>0$, $\tau>0$, and (4) holds. Then there exists a set $\Omega_n$ such that $P(\Omega_n^c)\le\lceil\ell_n\rceil(\beta_{\tilde q_n}I(\tilde q_n\ge1)+\beta_{q_n-\tilde q_n})$ and, for $\eta_j$ such that
$$\ell_n\eta_j > 24B_j^2r_j(1+2D_j\log2) \quad\text{for each } j\in\Lambda, \quad (18)$$
for $j^*\in\Lambda$ and $s^*\in S_{j^*}$,
$$P_s(\Omega_n\cap(S_{2,1}\cup S_{2,1}^*)) \le q_n\left(p^*_{j^*}(\ell_n,\eta_{j^*})+\sum_{j\in\Lambda}p_j(\ell_n,\eta_j)\right), \quad (19)$$
where $S_{2,1}$ is the event that
$$\nu_n[\gamma(\cdot,s)-\gamma(\cdot,u)] > 9\tau\left(\frac{\sigma^2}{4}\vee\|s-u\|^2\right)+2\eta_j \quad\text{for some } u\in S_j \text{ for some } j\in\Lambda,$$
$$S_{2,1}^* = \left\{\nu_n[\gamma(\cdot,s^*)-\gamma(\cdot,s)] > \tau\|s-s^*\|^2+\eta_{j^*}\right\},$$
$\beta_k$ is the k-th $\beta$-mixing coefficient for $k\ge1$, and $p_j(\ell_n,\eta_j)$ and $p^*_{j^*}(\ell_n,\eta_{j^*})$ are defined in (8) and (9).
Lemma 1 can be derived by first establishing the special case where $(X_1,W_1),\dots,(X_n,W_n)$ are independent and then applying a corollary of Berbee's Lemma taken from Claim 2 in [1]. The independent case of Lemma 1 is stated in Lemma 2 below, whose proof is given in Section 5.2. The corollary of Berbee's Lemma is stated in Fact 1 below.
Lemma 2. Suppose that Assumptions M1 and M2 hold and $(X_1,W_1),\dots,(X_n,W_n)$ are independent. Suppose that $B_j\ge1$, $r_jB_j^2\le n/4$, $r_j\ge1$, and there exist constants $A_0>0$ and $B_0>0$ such that $0<A_j/B_j\le A_0+2$ and $0<A_{2,j}/B_{2,j}\le1/B_0$. Suppose that $\|u\|_\infty\le B_j$ for all $u\in S_j$. Suppose that $n\ge A_0^2$, $\sigma>0$, $\tau>0$, (4) holds, and
$$n\eta_j > 24B_j^2r_j(1+2D_j\log2) \quad\text{for each } j\in\Lambda. \quad (20)$$
Then we have
$$P_s\left[\nu_n[\gamma(\cdot,s)-\gamma(\cdot,u)] > 9\tau\left(\frac{\sigma^2}{4}\vee\|s-u\|^2\right)+2\eta_j \text{ for some } u\in S_j\right] \le \exp\left(-\frac{c_1(2\tau B_0/k_0)n\eta_j}{A_{2,j}B_{2,j}}\right)+1.6\left(1-\exp\left(-\frac{\tau\sigma^2}{2}\left(c(\tau)\wedge\frac13\right)\right)\right)^{-1}\left((B_j'\theta)\vee2\right)^{3D_j}\times\left[\exp\left(-\frac{nc(\tau)\eta_j}{2A_jB_j}\right)+\exp\left(-\frac{n\eta_j}{24B_j^2r_j}\right)\right], \quad (21)$$
where $c_1$ and c are defined in (6). In addition, for $u\in S_j$ and $\eta_j\ge0$, we have
$$P_s\left[\nu_n[\gamma(\cdot,u)-\gamma(\cdot,s)] > \tau\|u-s\|^2+\eta_j\right] \le \exp\left(-\frac{c_1(2\tau B_0/k_0)n\eta_j}{A_{2,j}B_{2,j}}\right). \quad (22)$$

Fact 1. Suppose that a sequence $\{u_i\}_{i=1}^\infty$ is $\beta$-mixing and, for $n\ge1$, $\tilde q_n$ and $q_n$ are integers such that $0\le\tilde q_n\le q_n/2$ and $q_n\ge1$. Let $\ell_n=\lfloor n/q_n\rfloor$. Then there exist $u_i^*$, $i=1,\dots,\lceil\ell_n\rceil q_n$, such that (i)–(iii) hold.

(i) For $\ell=1,\dots,\lceil\ell_n\rceil$, let $U_{\ell,1}=(u_{(\ell-1)q_n+1},\dots,u_{(\ell-1)q_n+\tilde q_n})^T$, $U_{\ell,1}^*=(u^*_{(\ell-1)q_n+1},\dots,u^*_{(\ell-1)q_n+\tilde q_n})^T$, $U_{\ell,2}=(u_{(\ell-1)q_n+\tilde q_n+1},\dots,u_{\ell q_n})^T$, and $U_{\ell,2}^*=(u^*_{(\ell-1)q_n+\tilde q_n+1},\dots,u^*_{\ell q_n})^T$; then for each $\delta\in\{1,2\}$, $U_{\ell,\delta}$ and $U_{\ell,\delta}^*$ have the same distribution.

(ii) For $\ell=1,\dots,\lceil\ell_n\rceil$, $P(U_{\ell,1}\ne U_{\ell,1}^*)\le\beta_{q_n-\tilde q_n}$ and $P(U_{\ell,2}\ne U_{\ell,2}^*)\le\beta_{\tilde q_n}$, where $\beta_k$ denotes the k-th $\beta$-mixing coefficient for $k\ge1$.

(iii) For each $\delta\in\{1,2\}$, $U_{1,\delta}^*,\dots,U_{\lceil\ell_n\rceil,\delta}^*$ are independent.
Next, we prove Lemma 1 using Lemma 2 and Fact 1. To find an upper bound for $P_s(S_{2,1})$ when the sequence $\{(X_i,W_i)\}_{i\ge1}$ is $\beta$-mixing, we apply Fact 1 with $u_i=(X_i,W_i)$. We only prove the case $q_n>1$, which implies that $\tilde q_n=\lfloor q_n/2\rfloor\ge1$, since the proof for the case $q_n=1$ is similar. Let $(X_i^*,W_i^*)=u_i^*$, and
$$\Omega_n = \{(X_i,W_i)=(X_i^*,W_i^*) \text{ for } i=1,\dots,n\}.$$
Then $P_s(S_{2,1})\le P(\Omega_n^c)+P_s(S_{2,1}\cap\Omega_n)$. For $k=1,\dots,q_n$, let $\Gamma_k=\{i : i=k+\ell q_n \text{ for some integer } \ell \text{ and } 1\le i\le n\}$ and
$$\nu_{n,k}[\gamma(\cdot,u)] = \frac{1}{|\Gamma_k|}\sum_{i\in\Gamma_k}[\gamma(Z_i,u)-E\gamma(Z_1,u)]$$
for all u, where $|\Gamma_k|$ denotes the number of elements in $\Gamma_k$, which is at most $\lceil\ell_n\rceil$. Let $S_{2,1,k}$ be the event that
$$\nu_{n,k}[\gamma(\cdot,s)-\gamma(\cdot,u)] > 9\tau\left(\frac{\sigma^2}{4}\vee\|s-u\|^2\right)+2\eta_j$$
for some $u\in S_j$ for some $j\in\Lambda$, and let
$$S_{2,1,k}^* = \left\{\nu_{n,k}[\gamma(\cdot,s^*)-\gamma(\cdot,s)] > \tau\|s^*-s\|^2+\eta_{j^*}\right\}.$$
Then $S_{2,1}\subset\cup_{k=1}^{q_n}S_{2,1,k}$ and $S_{2,1}^*\subset\cup_{k=1}^{q_n}S_{2,1,k}^*$, and it follows from Fact 1 and Lemma 2 that $P(\Omega_n^c)\le\lceil\ell_n\rceil(\beta_{\tilde q_n}+\beta_{q_n-\tilde q_n})$ and
$$P_s(\Omega_n\cap(S_{2,1}\cup S_{2,1}^*)) \le \sum_{k=1}^{q_n}\left(P_s(S_{2,1,k}\cap\Omega_n)+P_s(S_{2,1,k}^*\cap\Omega_n)\right) \le \sum_{k=1}^{q_n}\left(\sum_{j\in\Lambda}p_j(|\Gamma_k|,\eta_j)+p^*_{j^*}(|\Gamma_k|,\eta_{j^*})\right) \le q_n\sum_{j\in\Lambda}p_j(\ell_n,\eta_j)+q_np^*_{j^*}(\ell_n,\eta_{j^*})$$
if (4) holds, $\ell_n\ge A_0^2$, and for each $j\in\Lambda$, $r_jB_j^2\le\ell_n/4$ and $\ell_n\eta_j>24B_j^2r_j(1+2D_j\log2)$.
To give an error bound for $\|\hat s-s\|$ using Lemma 1, for $\xi>0$ take
$$\eta_j = \frac12\left(\eta_{1,j}+\frac{B_j^2r_j\xi}{\ell_n}\right);$$
then (18) holds, and on $S_{2,1}^c$ we have
$$k_1\|u-s\|^2-9\tau\left(\frac{\sigma^2}{4}\vee\|s-u\|^2\right) \le E(\gamma(Z_i,u))-E(\gamma(Z_i,s))-9\tau\left(\frac{\sigma^2}{4}\vee\|s-u\|^2\right) \le \frac1n\sum_{i=1}^n\gamma(Z_i,u)+\eta_{1,j}+\frac{B_j^2r_j\xi}{\ell_n}-\frac1n\sum_{i=1}^n\gamma(Z_i,s) \le \frac1n\sum_{i=1}^n\gamma(Z_i,u)+\eta_{1,j}+\frac{\delta_n\xi}{\ell_n}-\frac1n\sum_{i=1}^n\gamma(Z_i,s)$$
for all $u\in S_j$ for all $j\in\Lambda$, and on $(S_{2,1}^*)^c$ we have
$$\frac1n\sum_{i=1}^n\gamma(Z_i,s^*)-\frac1n\sum_{i=1}^n\gamma(Z_i,s) \le E[\gamma(Z_i,s^*)-\gamma(Z_i,s)]+\tau\|s^*-s\|^2+\eta_{j^*} \le E[\gamma(Z_i,s^*)-\gamma(Z_i,s)]+\tau\|s^*-s\|^2+\frac{\eta_{1,j^*}}{2}+\frac{\delta_n\xi}{2\ell_n}.$$
Thus on $\Omega_n\cap S_{2,1}^c\cap(S_{2,1}^*)^c$, we have the error bound
$$(k_1-9\tau)\|\hat s-s\|^2-\frac{9\tau\sigma^2}{4} \le \frac1n\sum_{i=1}^n\gamma(Z_i,s^*)+\eta_{1,j^*}+\frac{\delta_n\xi}{\ell_n}-\frac1n\sum_{i=1}^n\gamma(Z_i,s) \le 1.5\left(\eta_{1,j^*}+\frac{\delta_n\xi}{\ell_n}\right)+E[\gamma(Z_i,s^*)-\gamma(Z_i,s)]+\tau\|s^*-s\|^2 \le 1.5\eta_{1,j^*}+(k_2+\tau)\|s^*-s\|^2+\frac{1.5\delta_n\xi}{\ell_n}.$$
Let
$$U = \left((k_1-9\tau)\|\hat s-s\|^2I_{\Omega_n}-\frac{9\tau\sigma^2}{4}\right)-\left(1.5\eta_{1,j^*}+(k_2+\tau)\|s^*-s\|^2\right);$$
then the above result can be expressed as $P(U>1.5\delta_n\xi/\ell_n)\le P(\Omega_n\cap(S_{2,1}\cup S_{2,1}^*))$, where $P(\Omega_n^c)\le(\ell_n+1)(\beta_{\tilde q_n}+\beta_{q_n-\tilde q_n})$ and an upper bound for $P(\Omega_n\cap(S_{2,1}\cup S_{2,1}^*))$ is given in (19). This completes the proof of Theorem 1.
5.2 Proof of Lemma 2
To prove Lemma 2, we will first establish the following result:
Fact 2. Suppose that the conditions in Lemma 2 hold, and that $\theta$ and $\eta_j$ satisfy (4) and (20) for some $\tau>0$. Then for $\sigma>0$, $x>0$ and $x_k>0$, we have
$$P_s\left[\sup_{u\in B(s^*,\sigma)}\nu_n[\gamma(\cdot,s^*)-\gamma(\cdot,u)] > \tau\sigma^2+\eta_j\right] \le \exp(-nx+H_0)+\sum_k\exp(-nx_k+H_k+H_{k+1}) \quad (23)$$
if
$$\sigma\sqrt{2k_0x}+B_jx+(\sigma/\theta)\sum_{k=0}^\infty2^{-k}\left(\sqrt{5k_0x_k}+1.5r_jx_k\right) \le \frac{\tau\sigma^2+\eta_j}{A_j}, \quad (24)$$
where $H_k = D_j\log(B_j'2^k\theta) = D_j\log(B_j'\theta)+kD_j\log2$.
The proof of Fact 2 relies on a version of Bernstein's inequality, given in Lemma 8 of [3] and stated below.
Fact 3. Suppose that $U_1,\dots,U_n$ are independent random variables such that
$$\frac1n\sum_{i=1}^nE[|U_i|^m] \le \frac{m!}{2}v^2c^{m-2} \quad\text{for all } m\ge2$$
for some positive constants v and c. Then, for $x\ge0$,
$$P\left[\sum_{i=1}^nU_i-E\left(\sum_{i=1}^nU_i\right) \ge n(v\sqrt{2x}+cx)\right] \le \exp(-nx).$$
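A quick Monte Carlo sanity check of this form of Bernstein's inequality (an added illustration with assumed values, not from the report): for $U_i$ uniform on $[-1,1]$, the moment condition holds with $v^2=1/3$ and $c=1$ since $E|U_i|^m=1/(m+1)\le(m!/2)(1/3)$, and the simulated tail frequency sits below $\exp(-nx)$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps, x = 50, 20000, 0.02
v, c = np.sqrt(1.0 / 3.0), 1.0  # E[U^2] = 1/3 and |U| <= 1 for Uniform(-1, 1)

U = rng.uniform(-1.0, 1.0, size=(reps, n))
threshold = n * (v * np.sqrt(2 * x) + c * x)  # n(v*sqrt(2x) + cx); E[sum] = 0
tail_freq = float(np.mean(U.sum(axis=1) >= threshold))
bound = float(np.exp(-n * x))  # the Bernstein bound exp(-nx)
```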
Fact 2 follows from Fact 3 and Assumption M2. To prove Fact 2, let $\delta_k=2^{-k}\sigma/\theta$. Let $T_k$ be a subset of $B(s^*,\sigma)$ such that $|T_k|\le(B_j'2^k\theta)^{D_j}$ and, for every $u\in B(s^*,\sigma)$, there exists $v\in T_k$ such that $\|u-v\|\le\delta_k$ and $\|\Delta_j(\cdot,u,v)\|_\infty\le r_j\delta_k$. Let $V_k$ be the set of $(u_k,u_{k+1})$'s such that $u_k\in T_k$, $u_{k+1}\in T_{k+1}$ and there exists $u\in S_j$ such that $\|u_{k+1}-u\|\le\delta_{k+1}$, $\|u_k-u\|\le\delta_k$, $\|\Delta_j(\cdot,u,u_{k+1})\|_\infty\le r_j\delta_{k+1}$, and $\|\Delta_j(\cdot,u,u_k)\|_\infty\le r_j\delta_k$. For $u\in B(s^*,\sigma)$, since
$$\nu_n[\gamma(\cdot,s^*)-\gamma(\cdot,u)] = \nu_n[\gamma(\cdot,s^*)-\gamma(\cdot,u_0)]+\sum_{k=0}^\infty\left(\nu_n[\gamma(\cdot,u)-\gamma(\cdot,u_{k+1})]-\nu_n[\gamma(\cdot,u)-\gamma(\cdot,u_k)]\right)$$
for some $u_k\in T_k$, $k=0,1,\dots$, we have
$$P_s\left[\sup_{u\in B(s^*,\sigma)}\nu_n[\gamma(\cdot,s^*)-\gamma(\cdot,u)] > \tau\sigma^2+\eta_j\right] \le \underbrace{P_s[\nu_n[\gamma(\cdot,s^*)-\gamma(\cdot,u_0)] > y \text{ for some } u_0\in T_0]}_{I}+\sum_k\underbrace{P_s[\nu_n[\gamma(\cdot,u_k)-\gamma(\cdot,u_{k+1})] > y_k \text{ for some } (u_k,u_{k+1})\in V_k]}_{II_k}$$
if $y+\sum_ky_k\le\tau\sigma^2+\eta_j$.
Below we use Fact 3 to find bounds for I and $II_k$. To derive an upper bound for I, note that
$$E_s|\gamma(Z_i,s^*)-\gamma(Z_i,u_0)|^m \le E_s[M_j^m(W_i)]E_s[\Delta_j^m(X_i,s^*,u_0)] \le a_mA_j^mE_s[\Delta_j^m(X_i,s^*,u_0)]$$
and
$$\frac1n\sum_{i=1}^nE_s|\gamma(Z_i,s^*)-\gamma(Z_i,u_0)|^m \le a_mA_j^mb_mk_0\sigma^2B_j^{m-2}.$$
Take $y=A_j(\sigma\sqrt{2k_0x}+B_jx)$; then $I\le(B_j'\theta)^{D_j}\exp(-nx)=\exp(-nx+H_0)$.
To derive an upper bound for $II_k$, note that
$$E_s|\gamma(Z_i,u_{k+1})-\gamma(Z_i,u_k)|^m \le E_s[M_j^m(W_i)]E_s\left(\Delta_j(X_i,u,u_{k+1})+\Delta_j(X_i,u,u_k)\right)^m,$$
where
$$E_s\left(\Delta_j(X_i,u,u_{k+1})+\Delta_j(X_i,u,u_k)\right)^m \le E_s\left(\Delta_j(X_i,u,u_{k+1})+\Delta_j(X_i,u,u_k)\right)^2\left(\|\Delta_j(\cdot,u,u_{k+1})\|_\infty+\|\Delta_j(\cdot,u,u_k)\|_\infty\right)^{m-2},$$
so
$$\frac1n\sum_{i=1}^nE_s|\gamma(Z_i,u_{k+1})-\gamma(Z_i,u_k)|^m \le a_mA_j^mb_mk_0\left[2(\delta_k^2+\delta_{k+1}^2)\right](r_j\delta_k+r_j\delta_{k+1})^{m-2} \le \frac{m!}{2}\left(\frac52\delta_k^2A_j^2k_0\right)\left(A_j\frac{3r_j\delta_k}{2}\right)^{m-2}.$$
Therefore, if $y_k=A_j\delta_k(\sqrt{5k_0x_k}+1.5r_jx_k)$, then
$$II_k \le (B_j'2^k\theta)^{D_j}(B_j'2^{k+1}\theta)^{D_j}\exp(-nx_k) = \exp(-nx_k+H_k+H_{k+1}).$$
From the upper bounds for I and $II_k$ given above, (23) holds if
$$A_j(\sigma\sqrt{2k_0x}+B_jx)+\sum_kA_j\delta_k(\sqrt{5k_0x_k}+1.5r_jx_k) \le \tau\sigma^2+\eta_j,$$
so (24) implies (23) and we have Fact 2.
Next we apply Fact 2 with specific x and $x_k$'s for the case $0<\sigma\le2B_j$. In this case, (24) is implied by
$$\sigma\sqrt{2k_0x}+B_jx+\frac\sigma\theta\sum_{k=0}^\infty2^{-k}\sqrt{5k_0x_k}+\frac{2B_j}\theta\sum_{k=0}^\infty2^{-k}(1.5r_jx_k) \le \frac{\tau\sigma^2+\eta_j}{A_j}. \quad (25)$$
Let $g(x)=(x/(1+\sqrt{1+x}))^2$ for $x>0$; then $g(x)/x<1$ and $g(x)/x$ is increasing on $(0,\infty)$. Let $x_k=(k+1)\tilde y$; then (25) holds if
$$0<x\le\frac{k_0\sigma^2}{2B_j^2}\,g\!\left(\frac{B_j}{A_jk_0\sigma^2}(\tau\sigma^2+\eta_j)\right) \quad\text{and}\quad 0<\tilde y\le\frac{c_1^2k_0\sigma^2}{16B_j^2c_0^2r_j^2}\,g\!\left(\frac{4c_0r_j\theta B_j}{c_1^2A_jk_0\sigma^2}(\tau\sigma^2+\eta_j)\right),$$
where $c_1=\sqrt5\sum_{k=0}^\infty\sqrt{k+1}\,2^{-k}\approx3.789034$ and $c_0=1.5\sum_{k=0}^\infty(k+1)2^{-k}=6$. If x and $\tilde y$ satisfy the above constraints and $\tilde y>2D_j\log(2)/n$, then
$$P_s\left[\sup_{u\in B(s^*,\sigma)}\nu_n[\gamma(\cdot,s^*)-\gamma(\cdot,u)] > \tau\sigma^2+\eta_j\right] \le \exp(-nx+D_j\log(B_j'\theta))+\frac{\exp(-n\tilde y+D_j\log2+2D_j\log(B_j'\theta))}{1-\exp(-[n\tilde y-2D_j\log2])}.$$
From the assumptions that $r_j\ge1$, $B_j\ge1$, $A_j/B_j\le A_0+2$ and $\theta\ge5$, we have
$$\frac{B_j}{A_jk_0\sigma^2}(\tau\sigma^2) \ge \frac{\tau}{k_0(A_0+2)} \quad\text{and}\quad \frac{4c_0r_j\theta B_j}{c_1^2A_jk_0\sigma^2}(\tau\sigma^2) \ge \frac{20c_0B_j\tau}{c_1^2A_jk_0} \ge \frac{8\tau}{k_0(A_0+2)}.$$
Since
$$c(\tau) = g\!\left(\frac{\tau}{k_0(A_0+2)}\right)\frac{k_0(A_0+2)}{\tau},$$
take
$$x = \frac{c(\tau)(\tau\sigma^2+\eta_j)}{2A_jB_j} \quad\text{and}\quad \tilde y = \frac{c(8\tau)\theta(\tau\sigma^2+\eta_j)}{4A_jB_jc_0r_j};$$
then by (4),
$$n\tilde y \ge \frac{n\eta_jc(8\tau)\theta}{4A_jB_jc_0r_j} \ge \frac{n\eta_jc(8\tau)\theta}{4(A_0+2)B_j^2c_0r_j} \ge \frac{n\eta_j}{4B_j^2c_0r_j}.$$
Thus $n\tilde y\ge1+2D_j\log2$ if (20) holds. In that case,
$$P_s\left[\sup_{u\in B(s^*,\sigma)}\nu_n[\gamma(\cdot,s^*)-\gamma(\cdot,u)] > \tau\sigma^2+\eta_j\right] \le \exp\left(-\frac{nc(\tau)\tau\sigma^2}{2A_jB_j}-\frac{nc(\tau)\eta_j}{2A_jB_j}+D_j\log(B_j'\theta)\right)+\exp\left(-\frac{nc(8\tau)\theta\tau\sigma^2}{4A_jB_jc_0r_j}-\frac{nc(8\tau)\theta\eta_j}{4A_jB_jc_0r_j}+D_j\log2+2D_j\log(B_j'\theta)\right)\times\left(1-\exp\left(-\left[\frac{nc(8\tau)\theta\tau\sigma^2}{4A_jB_jc_0r_j}+\frac{nc(8\tau)\theta\eta_j}{4A_jB_jc_0r_j}-2D_j\log2\right]\right)\right)^{-1} \le \exp\left(-\frac{nc(\tau)\tau\sigma^2}{2A_jB_j}-\frac{nc(\tau)\eta_j}{2A_jB_j}+D_j\log(B_j'\theta)\right)+\exp\left(-\frac{n\tau\sigma^2}{4B_j^2c_0r_j}-\frac{n\eta_j}{4B_j^2c_0r_j}+D_j\log2+2D_j\log(B_j'\theta)\right)\left(1-e^{-1}\right)^{-1}.$$
In summary, we have proved the following fact assuming $\sigma\le2B_j$:

Fact 4. Under the conditions in Lemma 2, for $\tau>0$ and $\sigma>0$, if $\theta$ and $\eta_j$ satisfy (4) and (20), then
$$P_s\left[\sup_{u\in B(s^*,\sigma)}\nu_n[\gamma(\cdot,s^*)-\gamma(\cdot,u)] > \tau\sigma^2+\eta_j\right] \le \exp\left(-\frac{nc(\tau)\tau\sigma^2}{2A_jB_j}-\frac{nc(\tau)\eta_j}{2A_jB_j}+D_j\log(B_j'\theta)\right)+\exp\left(-\frac{n\tau\sigma^2}{4B_j^2c_0r_j}-\frac{n\eta_j}{4B_j^2c_0r_j}+D_j\log2+2D_j\log(B_j'\theta)\right)\left(1-e^{-1}\right)^{-1}.$$
Note that Fact 4 also holds for $\sigma>2B_j$. To see this, note that $\|u\|_\infty\le B_j$ for all $u\in S_j$, so
$$P_s\left[\sup_{u\in B(s^*,\sigma)}\nu_n[\gamma(\cdot,s^*)-\gamma(\cdot,u)] > \tau\sigma^2+\eta_j\right] = P_s\left[\sup_{u\in B(s^*,2B_j)}\nu_n[\gamma(\cdot,s^*)-\gamma(\cdot,u)] > \tau(2B_j)^2+\tau[\sigma^2-(2B_j)^2]+\eta_j\right].$$
Applying Fact 4 with $\sigma=2B_j$ and $\eta_j$ replaced by $\tau[\sigma^2-(2B_j)^2]+\eta_j$, it is clear that Fact 4 also holds for $\sigma>2B_j$.
To prove Lemma 2 using Fact 4, for $\varepsilon>0$ choose $s^*\in S_j$ so that $\|s-s^*\|\le\|s-u\|+\varepsilon$ for all $u\in S_j$, and then derive upper bounds for
$$II = P_s\left[\nu_n[\gamma(\cdot,s^*)-\gamma(\cdot,u)] > 2\tau(\sigma^2\vee\|s^*-u\|^2)+\eta_j \text{ for some } u\in S_j\right]$$
and
$$III = P_s\left[\nu_n[\gamma(\cdot,s)-\gamma(\cdot,s^*)] > \tau\|s^*-s\|^2+\eta_j\right]$$
to control $\nu_n[\gamma(\cdot,s)-\gamma(\cdot,u)]$.
To find an upper bound for II, note that
$$P_s\left[\nu_n[\gamma(\cdot,s^*)-\gamma(\cdot,u)] > 2\tau(\sigma^2\vee\|s^*-u\|^2)+\eta_j \text{ for some } u\in S_j\right] \le \sum_{k=1}^\infty P_s\left[\nu_n[\gamma(\cdot,s^*)-\gamma(\cdot,u)] > \tau(k+1)\sigma^2+\eta_j \text{ for some } u\in S_{1,k}\right]+P_s\left[\sup_{u\in B(s^*,\sigma)}\nu_n[\gamma(\cdot,s^*)-\gamma(\cdot,u)] > 2\tau\sigma^2+\eta_j\right],$$
where $S_{1,k}=S_j\cap B(s^*,\sigma\sqrt{k+1})\cap B(s^*,\sigma\sqrt k)^c$. Applying Fact 4 with $\sigma$ replaced by $\sigma\sqrt{k+1}$, we have
$$II \le \sum_{k=0}^\infty\exp\left(-\frac{nc(\tau)\tau(k+1)\sigma^2}{2A_jB_j}-\frac{nc(\tau)\eta_j}{2A_jB_j}+D_j\log(B_j'\theta)\right)+\sum_{k=0}^\infty\exp\left(-\frac{n\tau(k+1)\sigma^2}{4B_j^2c_0r_j}-\frac{n\eta_j}{4B_j^2c_0r_j}+D_j\log2+2D_j\log(B_j'\theta)\right)\left(1-e^{-1}\right)^{-1} \le \left(1-\exp\left(-\frac{nc(\tau)\tau\sigma^2}{2A_jB_j}\right)\right)^{-1}\exp\left(-\frac{nc(\tau)\tau\sigma^2}{2A_jB_j}-\frac{nc(\tau)\eta_j}{2A_jB_j}+D_j\log(B_j'\theta)\right)+1.6\left(1-\exp\left(-\frac{n\tau\sigma^2}{4B_j^2c_0r_j}\right)\right)^{-1}\exp\left(-\frac{n\tau\sigma^2}{4B_j^2c_0r_j}-\frac{n\eta_j}{4B_j^2c_0r_j}+D_j\log2+2D_j\log(B_j'\theta)\right).$$
Since $2B_j^2\le2r_jB_j^2\le n/2$ and $n\ge A_0^2$, we have $A_jB_j\le n$ and $4r_jB_j^2\le n$. Thus the above upper bound for II is at most
$$1.6\left(1-\exp\left(-\frac{\tau\sigma^2}{2}\left(c(\tau)\wedge\frac13\right)\right)\right)^{-1}\left((B_j'\theta)\vee2\right)^{3D_j}\left[\exp\left(-\frac{nc(\tau)\eta_j}{2A_jB_j}\right)+\exp\left(-\frac{n\eta_j}{24B_j^2r_j}\right)\right].$$
To derive an upper bound for III, note that $E_s|\gamma(Z_i,s^*)-\gamma(Z_i,s)|^m \le E_s[M_{2,j}^m(W_i)]E_s[\Delta_{2,j}^m(X_i,s^*,s)]$, so
$$\frac1n\sum_{i=1}^nE_s|\gamma(Z_i,s^*)-\gamma(Z_i,s)|^m \le a_mA_{2,j}^mb_mk_0\|s^*-s\|^2B_{2,j}^{m-2} \le \frac{m!}{2}k_0\|s^*-s\|^2A_{2,j}^2(A_{2,j}B_{2,j})^{m-2}.$$
For $x>0$ and
$$y_2 \ge \|s^*-s\|A_{2,j}\sqrt{2k_0x}+A_{2,j}B_{2,j}x = A_{2,j}(\|s^*-s\|\sqrt{2k_0x}+B_{2,j}x), \quad (26)$$
we have $P_s[\nu_n[\gamma(\cdot,s^*)-\gamma(\cdot,s)]>y_2]\le\exp(-nx)$. Condition (26) is equivalent to
$$x \le g\!\left(\frac{2B_{2,j}y_2}{k_0A_{2,j}\|s-s^*\|^2}\right)\frac{k_0\|s-s^*\|^2}{2B_{2,j}^2}, \quad (27)$$
where $g(x)=(x/(1+\sqrt{1+x}))^2$. Since $g(x)/x$ is increasing on $(0,\infty)$ and $B_{2,j}/A_{2,j}\ge B_0$, for $y_2\ge\tau\|s-s^*\|^2$, (27) holds if
$$x \le \frac{g(2\tau B_0/k_0)}{2\tau B_0/k_0}\cdot\frac{y_2}{A_{2,j}B_{2,j}} = \frac{c_1(2\tau B_0/k_0)y_2}{A_{2,j}B_{2,j}}.$$
Therefore,
$$III = P_s\left[\nu_n[\gamma(\cdot,s^*)-\gamma(\cdot,s)] > \tau\|s-s^*\|^2+\eta_j\right] \le \exp\left(-\frac{c_1(2\tau B_0/k_0)n\eta_j}{A_{2,j}B_{2,j}}\right). \quad (28)$$
From the above bounds for II and III and the fact that
$$2\tau(\sigma^2\vee\|s^*-u\|^2)+\eta_j+\tau\|s-s^*\|^2+\eta_j \le 2\tau(\sigma^2\vee(2\|s-u\|+\varepsilon)^2)+\tau(\|s-u\|+\varepsilon)^2+2\eta_j \le 9\tau\left(\frac{\sigma^2}{4}\vee(\|s-u\|+\varepsilon)^2\right)+2\eta_j,$$
we have
$$P_s\left[\nu_n[\gamma(\cdot,s)-\gamma(\cdot,u)] > 9\tau\left(\frac{\sigma^2}{4}\vee(\|s-u\|+\varepsilon)^2\right)+2\eta_j \text{ for some } u\in S_j\right] \le \exp\left(-\frac{c_1(2\tau B_0/k_0)n\eta_j}{A_{2,j}B_{2,j}}\right)+1.6\left(1-\exp\left(-\frac{\tau\sigma^2}{2}\left(c(\tau)\wedge\frac13\right)\right)\right)^{-1}\left((B_j'\theta)\vee2\right)^{3D_j}\times\left[\exp\left(-\frac{nc(\tau)\eta_j}{2A_jB_j}\right)+\exp\left(-\frac{n\eta_j}{24B_j^2r_j}\right)\right].$$
Since $\varepsilon>0$ is arbitrary, we obtain (21) by letting $\varepsilon\to0$. It is clear that (22) holds since (28) still holds if $s^*$ is replaced by any $u\in S_j$. The proof of Lemma 2 is complete.
5.3 Proof of Theorem 2
Proof of Theorem 2. Theorem 2 is an application of Theorem 1 with $Z_i=(X_i,Y_i)$ and $\gamma(z,t)=\gamma((x,y),t)=(y-t(x))^2$. To apply Theorem 1, Assumptions M1–M3 are verified first.

Verification of Assumption M1. Note that
$$|\gamma(z,u)-\gamma(z,v)| = |u(x)-v(x)|\cdot|2y-u(x)-v(x)| = |u(x)-v(x)|\cdot|2(s(x)+w)-u(x)-v(x)|,$$
where
$$|2(s(x)+w)-u(x)-v(x)| \le 2(|w|+\|s\|_\infty+b)$$
for $u,v\in S_j$, since functions in $S_j$ are bounded by b in sup-norm. Take $M_j(w)=2(|w|+\|s\|_\infty+b)$ and $\Delta_j(x,u,v)=|u(x)-v(x)|$; then
$$|\gamma(z,u)-\gamma(z,v)| \le M_j(w)\Delta_j(x,u,v).$$
An upper bound for $E[M_j^m(W_i)]$ can be obtained by controlling $E|W_i|^m$:
$$E[M_j^m(W_i)] = E[2^m(|W_i|+\|s\|_\infty+b)^m] \le 4^m\left(\frac12E|W_i|^m+\frac12(\|s\|_\infty+b)^m\right) \le \frac{m!}{2}4^m\left(\frac{\Gamma}{\alpha^m}+\frac{(\|s\|_\infty+b)^m}{m!}\right) \le \frac{m!}{2}\left(4\left(\frac\Gamma\alpha+\|s\|_\infty+b\right)\right)^m. \quad (29)$$
Here the inequality $E|W_i|^m\le m!\Gamma/\alpha^m$ follows from the assumption that $Ee^{\alpha|W_i|}<\Gamma$.
To control $E_s[\Delta_j^m(X_i,u,v)]=E_s|u(X_i)-v(X_i)|^m$, note that for $u,v\in S_j$,
$$E_s|u(X_i)-v(X_i)|^m \le E_s[u(X_1)-v(X_1)]^2\|u-v\|_\infty^{m-2},$$
where $E_s[u(X_1)-v(X_1)]^2\le k_0\|u-v\|^2$ for some constant $k_0$ that does not depend on j, since the density of $X_i$ is bounded above. Thus for $u,v\in S_j$,
$$E_s[\Delta_j^m(X_i,u,v)] \le k_0\|u-v\|^2\|u-v\|_\infty^{m-2} \quad (30)$$
$$\le k_0\|u-v\|^2(2b)^{m-2}. \quad (31)$$
To control $|\gamma(z,u)-\gamma(z,v)|$ for $u\in S_j$ and $v=s$, take $M_{2,j}(w)=2|w|+\|s\|_\infty+b$ and $\Delta_{2,j}(x,u,s)=|u(x)-s(x)|=\Delta_j(x,u,s)$; then
$$|\gamma(z,u)-\gamma(z,s)| \le M_{2,j}(w)\Delta_{2,j}(x,u,s)$$
since
$$|2(s(x)+w)-u(x)-s(x)| \le 2|w|+\|s\|_\infty+b.$$
Modifying slightly the derivation of (29), replacing $\|s\|_\infty+b$ with $(\|s\|_\infty+b)/2$, we have
$$E[M_{2,j}^m(W_i)] \le \frac{m!}{2}\left(4\left(\frac\Gamma\alpha+\frac{\|s\|_\infty+b}{2}\right)\right)^m. \quad (32)$$
Also, from (30), we have
$$E_s[\Delta_{2,j}^m(X_i,u,s)] \le k_0\|u-s\|^2(\|s\|_\infty+b)^{m-2}. \quad (33)$$
Let $A_0=4(\Gamma/\alpha+\|s\|_\infty)$ and $B_0=0.5$. From (29), (31), (32) and (33), Assumption M1 holds with $b_m=1$, $a_m=m!/2$, $B_j=2b$,
$$A_j = 4\left(\frac\Gamma\alpha+\|s\|_\infty+b\right) = A_0+2B_j,$$
$A_{2,j}=A_0+B_j$, and $B_{2,j}=(A_0+B_j)/2$. It is clear that $B_j\ge1$ and $\|u\|_\infty\le B_j$ for all $u\in S_j$. Also, $0<A_j/B_j\le A_0+2$ and $A_{2,j}/B_{2,j}=2=1/B_0$.
Verification of Assumption M2. To identify the constants in Assumption M2, Facts 5 and 6 will be applied. These two facts are first stated and proved below.
Fact 5. Let $\bar S$ be a D-dimensional subspace of $L_2\cap L_\infty(\mu)$ spanned by functions $\phi_1,\dots,\phi_D$, where $\|\cdot\|_2$ and $\|\cdot\|_\infty$ denote the $L_2$-norm and the $L_\infty$-norm with respect to $\mu$. Let $|\cdot|_2$ and $|\cdot|_\infty$ denote the $l_2$-norm and the $l_\infty$-norm in $R^D$.

Suppose that there exist constants $T_1$, $T_2$ and $T_3$ such that for $(\theta_1,\dots,\theta_D)\in R^D$,
$$\left\|\sum_{i=1}^D\theta_i\phi_i\right\|_\infty \le T_1|\theta|_\infty \quad (34)$$
and
$$\frac{T_2}{\sqrt D}|\theta|_2 \le \left\|\sum_{i=1}^D\theta_i\phi_i\right\|_2 \le \frac{T_3}{\sqrt D}|\theta|_2. \quad (35)$$
Take $r_0\ge T_1/T_3$ and
$$B' = \sqrt{2\pi e}\left(0.5+\max\left(\frac{T_3}{T_2},1\right)\right). \quad (36)$$
Then for B an $L_2$ ball of radius $\sigma$ in $\bar S$ and $0<\delta<\sigma/5$, there exists a finite set $T\subset B$ such that T is a $\delta$-net for B with respect to the $L_2$-norm and an $r_0\delta$-net with respect to the $L_\infty$-norm, and the number of elements in T is at most $(B'\sigma/\delta)^D$.
Proof. Suppose that the center of B is $\sum_{i=1}^D\theta_i^*\phi_i$, and let $\theta^*=(\theta_1^*,\dots,\theta_D^*)$. Then it follows from the first inequality in (35) that B is contained in $\{\sum_{i=1}^D\theta_i\phi_i : (\theta_1,\dots,\theta_D)\in B_0\}$, where $B_0$ is the $l_2$ ball with center $\theta^*$ and radius $\sqrt D\sigma/T_2$. Since the volume of an $l_2$ ball in $R^D$ with radius $\sigma$ is bounded by $c_{00}(D)\sigma^D$ (cf. the proof of Lemma 2 in [3]), where
$$c_{00}(D) = (2\pi e/D)^{D/2}(\pi D)^{-1/2},$$
we can cover $B_0$ with cubes of edge length $\delta/T_3$ such that the number of cubes is at most
$$\frac{c_{00}(D)(\sqrt D\sigma/T_2+\sqrt D\delta/T_3)^D}{(\delta/T_3)^D} \le (1+(T_3\sigma)/(T_2\delta))^D(2\pi e)^{D/2} \le (B'\sigma/\delta)^D$$
for the $B'$ in (36) if $0<\delta<\sigma/5$. Choose one point from each cube to form a set $T_0$, and take $T=B\cap\{\sum_{i=1}^D\theta_i\phi_i : (\theta_1,\dots,\theta_D)\in T_0\}$; then from the second inequality in (35), T is a $\delta$-net for B with respect to the $L_2$-norm. From (34), T is an $r_0\delta$-net for B with respect to the $L_\infty$-norm for $r_0\ge T_1/T_3$. The proof of Fact 5 is complete.
Fact 6. Suppose that $\mu$ is the Lebesgue measure on $[0,1]$. Let $\bar S$ be the space of B-splines on $[0,1]$ of order q with k knots $\xi_1,\dots,\xi_k$ of multiplicities $m_1,\dots,m_k$, where $0<\xi_1<\dots<\xi_k<1$. Then $\bar S$ is a subspace of $L_2\cap L_\infty(\mu)$. Let $\tilde\Delta_1=\min_{1\le i\le k+1}(\xi_i-\xi_{i-1})$ and $\tilde\Delta_2=\max_{1\le i\le k+1}(\xi_i-\xi_{i-1})$, where $\xi_0=0$ and $\xi_{k+1}=1$. Let $K=m_1+\dots+m_k$ and $D=K+q$. Then (34) holds with $T_1=1$, and (35) holds with $T_2=\sqrt{\tilde\Delta_1D}/(\sqrt q(2q+1)9^{q-1})$ and $T_3=q\sqrt{\tilde\Delta_2D}$.

Proof. Let
$$(y_1,\dots,y_{K+2q}) = (\underbrace{0,\dots,0}_{q \text{ times}},\underbrace{\xi_1,\dots,\xi_1}_{m_1 \text{ times}},\dots,\underbrace{\xi_k,\dots,\xi_k}_{m_k \text{ times}},\underbrace{1,\dots,1}_{q \text{ times}})$$
and let $\phi_i$ be the (normalized) B-spline basis function of order q associated with the knots $y_i,\dots,y_{i+q}$ for $i=1,\dots,D$. Then $\phi_1,\dots,\phi_D$ span $\bar S$. It follows from Equation (4.80) in Schumaker [6] that (34) holds with $T_1=1$, so it remains to check (35).

To check that the first inequality in (35) holds with the $T_2$ specified above, note that from (4.79) and (4.86) in [6], for $f=\sum_{i=1}^{K+q}\theta_i\phi_i$ we have
$$|\theta_i| \le (2q+1)9^{q-1}\tilde\Delta_1^{-1/2}\|f\|_{L_2[y_i,y_{i+q}]},$$
where $\phi_i$ is supported on $[y_i,y_{i+q}]$, which implies that
$$\sum_{i=1}^{K+q}\theta_i^2 \le (2q+1)^29^{2(q-1)}\tilde\Delta_1^{-1}\sum_{i=1}^{K+q}\|f\|^2_{L_2[y_i,y_{i+q}]} \le (2q+1)^29^{2(q-1)}\tilde\Delta_1^{-1}q\|f\|_2^2,$$
so the first inequality in (35) holds with $T_2=\sqrt{\tilde\Delta_1D}/(\sqrt q(2q+1)9^{q-1})$.

To check that the second inequality in (35) holds with the $T_3$ specified above, we follow the approach in the proof of Lemma 4.2 in Ghosal, Ghosh and van der Vaart [4], which is originally given in Stone [7]. For $f=\sum_{j=1}^{K+q}\theta_j\phi_j$, we have for $x\in[y_i,y_{i+1})$ and $q\le i\le q+K$ that $f(x)=\sum_{j=i+1-q}^i\theta_j\phi_j(x)$ (cf. [6], Equations (4.25) and (4.29)), so it follows from the Cauchy–Schwarz inequality that for $x\in[y_i,y_{i+1})$,
$$f^2(x) \le q\sum_{j=i+1-q}^i\theta_j^2\phi_j^2(x) \le q\sum_{j=i+1-q}^i\theta_j^2,$$
which gives
$$\int_0^1f^2(x)\,dx = \sum_{i=q}^{q+K}\int_{y_i}^{y_{i+1}}f^2(x)\,dx \le q\sum_{i=q}^{q+K}\sum_{j=i+1-q}^i\theta_j^2(y_{i+1}-y_i) \le q^2\tilde\Delta_2\sum_{j=1}^D\theta_j^2,$$
so the second inequality in (35) holds with $T_3=q\sqrt{\tilde\Delta_2D}$. The proof of Fact 6 is complete.
From Facts 5 and 6, Assumption M2 holds with $D_j=q+k$, $B_j'$ defined in (12), and $r_j=1$ (as defined in (13)), since
$$1 \ge \frac{1}{q\sqrt{\tilde\Delta_{2,j}(k+q)}},$$
where $\tilde\Delta_{2,j}$ and $\tilde\Delta_{1,j}$ are defined in (10) and (11) respectively. It is clear that $B_j'\ge1$ and $D_j\ge1$ for $j\in\Lambda$, as required in Theorem 1. In addition, (14) and $B_j^2r_j\le\delta_n$ together imply that $r_jB_j^2\le\ell_n/4$ for $j\in\Lambda$ if n is large enough.
Assumption M3 holds with constants $k_1$ and $k_2$ that do not depend on j, since the density of $X_i$ is supported on $[0,1]$, bounded away from zero and bounded above on $[0,1]$, and
$$E\left(\frac1n\sum_{i=1}^n(Y_i-t(X_i))^2-\frac1n\sum_{i=1}^n(Y_i-s(X_i))^2\right) = E(s(X_i)-t(X_i))^2.$$
Next, we verify (5). Note that with $B_j=2b$ and $D_j=k+q$, (16) implies that
$$\frac{\ell_n\eta_{1,j}}{2r_jB_j^2} > \frac{a_n(k+q)\log(B_j')}{2} \ge \frac{a_nD_j\log(3.5\sqrt{2\pi e})}{2},$$
and $24(1+2D_j\log2)\le24(1+2\log2)D_j$, so if n is large enough that
$$a_n\log(3.5\sqrt{2\pi e}) \ge 48(1+2\log2), \quad (37)$$
then (5) holds.
Now we have verified that the conditions in Theorem 1 hold. Let $\eta_j=(\eta_{1,j}+r_j(2b)^2\xi/\ell_n)/2$. Then by Theorem 1, the error bound in (7) holds with
$$\theta = \frac{A_0+2}{c(8\tau)}\vee5,$$
except on a set of probability at most
$$q_n\left(p^*_{j^*}(\ell_n,\eta_{j^*})+\sum_{j\in\Lambda}p_j(\ell_n,\eta_j)\right) \quad (38)$$
for a given $j^*\in\Lambda$.
Below we calculate the probability upper bound in (38) with
$$j^* = \left(b^*,q^*,\frac{1}{2^{J^*}},\dots,\frac{2^{J^*}-1}{2^{J^*}}\right) \stackrel{\text{def}}{=} (b^*,q^*,\xi^*),$$
where $q^*=m+1$, $J^*=\lfloor\log_2(n^{1/(1+2m)})\rfloor$, and $b^*$ is a constant large enough so that for any spline with knot vector $\xi^*$, order $q^*$, and sup-norm bounded by $\lfloor\|s\|_\infty\rfloor+1$, the spline coefficients are bounded by $b^*$. Note that
$$\eta_j = \frac12\left(\eta_{1,j}+\frac{r_j(2b)^2\xi}{\ell_n}\right) = \frac12\left(\frac{a_nr_j(2b)^2}{\ell_n}\left((k+q)\log(B_j')+\lambda[(\log2)2^{J_j}+q+b]\right)+\frac{r_j(2b)^2\xi}{\ell_n}\right),$$
so
$$p^*_{j^*}(\ell_n,\eta_{j^*}) = \exp\left(-\frac{2c_1(\tau/k_0)\ell_n\eta_{j^*}}{(A_0+2b^*)^2}\right) \le \exp\left(-a_nc_9[(\log2)2^{J^*}+q^*+b^*]-c_9\xi\right) \quad (39)$$
and, for $0<\sigma<1$,
$$p_j(\ell_n,\eta_j) = \exp\left(-\frac{2c_1(\tau/k_0)\ell_n\eta_j}{(A_0+2b)^2}\right)+1.6\left(1-\exp\left(-\frac{\tau\sigma^2}{2}\left(c(\tau)\wedge\frac13\right)\right)\right)^{-1}\left((B_j'\theta)\vee2\right)^{3(k+q)}\times\left[\exp\left(-\frac{\ell_nc(\tau)\eta_j}{2(A_0+2b)(2b)}\right)+\exp\left(-\frac{\ell_n\eta_j}{24r_j(2b)^2}\right)\right] \le \exp\left(-\frac{2c_3\ell_n\eta_j}{(2b)^2}\right)+\frac{c_4}{\sigma^2}\exp\left(-\frac{2c_5\ell_n\eta_j}{(2b)^2}\right)\left((B_j'\theta)\vee2\right)^{3(k+q)}$$
for some constants $c_9$, $c_3$, $c_4$, $c_5$. For $0<\sigma<1$, if n is large enough so that
$$c_5a_n \ge 3+\frac{3\log\theta}{\log(3.5\sqrt{2\pi e})}, \quad (40)$$
then for $\xi>0$ and $c_6=c_3\wedge c_5$,
$$p_j(\ell_n,\eta_j) \le 2\left(1\vee\frac{c_4}{\sigma^2}\right)\exp\left(-c_6\left[a_n[(\log2)2^{J_j}+q+b]+\xi\right]\right).$$
For n large enough so that $a_nc_6>2$,
$$\sum_{j\in\Lambda}\exp\left(-c_6a_n[(\log2)2^{J_j}+q+b]\right) \le \sum_b\sum_q\sum_J\sum_{k=1}^{2^J-1}\binom{2^J-1}{k}\exp\left(-2[(\log2)2^J+q+b]\right) \le \sum_b\sum_q\sum_J2^{-2^J}e^{-2b}e^{-2q} \stackrel{\text{def}}{=} c_7 < \infty,$$
so for $c_8=2c_7$,
$$\sum_{j\in\Lambda}p_j(\ell_n,\eta_j) \le c_8\left(1\vee\frac{c_4}{\sigma^2}\right)\exp(-c_6\xi).$$
Therefore, for $0<\sigma<1$ and for n large enough so that $a_nc_6>2$ and (40) and (37) hold, we have
$$(k_1-9\tau)\|\hat s-s\|^2I_{\Omega_n}-\frac{9\tau\sigma^2}{4} > 1.5\eta_{1,j^*}+(k_2+\tau)\|s^*-s\|^2+\frac{1.5\delta_n\xi}{\ell_n}$$
with probability at most
$$q_n\left(\exp(-c_9\xi)+c_8\left(1\vee\frac{c_4}{\sigma^2}\right)\exp(-c_6\xi)\right)$$
for $\xi>0$. Let
$$U = \left((k_1-9\tau)\|\hat s-s\|^2I_{\Omega_n}-\frac{9\tau\sigma^2}{4}\right)-\left(1.5\eta_{1,j^*}+(k_2+\tau)\|s^*-s\|^2\right);$$
then for $\xi_0>0$, $0<\sigma<1\wedge\sqrt{c_4}$, $c_{10}=c_6\wedge c_9$ and $c_{11}=c_4(1+c_8)$, we have
$$E\left(\frac{\ell_nU}{1.5\delta_n}\right) \le \xi_0+\int_{\xi_0}^\infty q_n\left(\exp(-c_9\xi)+c_8\left(1\vee\frac{c_4}{\sigma^2}\right)\exp(-c_6\xi)\right)d\xi \le \xi_0+\frac{c_{11}q_n}{\sigma^2}\int_{\xi_0}^\infty\exp(-c_{10}\xi)\,d\xi \le \xi_0+\frac{c_{11}q_n}{c_{10}\sigma^2}\exp(-c_{10}\xi_0).$$
Take $\xi_0=\delta_n^{-1}\ell_nn^{-2m/(1+2m)}$ and $\sigma=n^{-m/(1+2m)}$; then $\xi_0=O(\delta_n^{-1}(\log n)^{-1-\gamma_0}n^{1/(1+2m)})$ and
$$\limsup_n n^{2m/(1+2m)}E(U) \le \limsup_n\frac{1.5\delta_nn^{2m/(1+2m)}}{\ell_n}\left(\xi_0+\frac{c_{11}q_n}{c_{10}\sigma^2}\exp(-c_{10}\xi_0)\right) < \infty,$$
so
$$(k_1-9\tau)E\|\hat s-s\|^2I_{\Omega_n} \le 1.5\eta_{1,j^*}+(k_2+\tau)\|s^*-s\|^2+Cn^{-2m/(1+2m)} \quad (41)$$
for some constant $C>0$. Recall that for the splines in $S_{j^*}$, the knots are equally spaced and the number of knots is $2^{J^*}-1=O(n^{1/(1+2m)})$, so we can choose $s^*\in S_{j^*}$ such that $\|s^*-s\|^2=O(n^{-2m/(1+2m)})$ (cf. Theorem 6.25 in [6]), and then
$$(k_1-9\tau)E\|\hat s-s\|^2I_{\Omega_n} \le 1.5\eta_{1,j^*}+O(n^{-2m/(1+2m)}).$$
Since $r_{j^*}=1$ and $B_{j^*}'=\sqrt{2\pi e}\left(0.5+q^*\sqrt{q^*}(2q^*+1)9^{q^*-1}\right)$, we have
$$\eta_{1,j^*} = \frac{a_nr_{j^*}(2b^*)^2}{\ell_n}\left((2^{J^*}-1+q^*)\log(B_{j^*}')+\lambda[(\log2)2^{J^*}+q^*+b^*]\right) = O(a_n(\log n)^{1+\gamma_0}n^{-2m/(1+2m)}).$$
Choosing $\tau<k_1/9$, (41) implies that
$$E\|\hat s-s\|^2I_{\Omega_n} = O(a_n(\log n)^{1+\gamma_0}n^{-2m/(1+2m)}). \quad (42)$$
It remains to establish an upper bound for $E\|\hat s-s\|^2I_{\Omega_n^c}$. Note that $\|\hat s\|^2\le\delta_n$, $\|s\|_\infty<\infty$ and $P(\Omega_n^c)\le(\ell_n+1)(\beta_{\tilde q_n}+\beta_{q_n-\tilde q_n})$, where both $\tilde q_n$ and $q_n-\tilde q_n$ are of order $(\log n)^{1+\gamma_0}$ from the choice $\ell_n=O(n(\log n)^{-1-\gamma_0})$. Thus there exists a constant $c_{12}>0$ such that
$$n^{2m/(1+2m)}E\|\hat s-s\|^2I_{\Omega_n^c} = O\left(n^{2m/(1+2m)}\delta_n^2(\ell_n+1)\exp(-c_{12}(\log n)^{1+\gamma_0})\right) = o(1). \quad (43)$$
The proof of Theorem 2 is complete by combining (42) and (43).
References
[1] Y. Baraud, F. Comte, and G. Viennet. Adaptive estimation in autoregression or β-mixing regression via model selection. The Annals of Statistics, 29(3):839–875, 2001.
[2] Andrew Barron, Lucien Birgé, and Pascal Massart. Risk bounds for model selection via penalization (with discussion). Probability Theory and Related Fields, 113:301–413, 1999.
[3] Lucien Birgé and Pascal Massart. Minimum contrast estimators on sieves: Exponential bounds and rates of convergence. Bernoulli, 4:329–375, 1998.
[4] Subhashis Ghosal, Jayanta K. Ghosh, and Aad W. van der Vaart. Convergence rates of posterior distributions. The Annals of Statistics, 28(2):500–531, 2000.
[5] Tzee-Ming Huang. Convergence rates for posterior distributions and adaptive estimation. The Annals of Statistics, 32(4):1556–1593, 2004.
[6] Larry L. Schumaker. Spline Functions: Basic Theory. Wiley-Interscience, 1981.
[7] Charles J. Stone. The dimensionality reduction principle for generalized additive models. The Annals of Statistics, 14:590–606, 1986.
[8] Yuhong Yang and Andrew R. Barron. An asymptotic property of model selection criteria. IEEE Transactions on Information Theory, 44(1):95– 116, 1998.
[9] S. Zhou, X. Shen, and D. A. Wolfe. Local asymptotics for regression splines and confidence regions. The Annals of Statistics, 26(5):1760– 1782, 1998.