DATA-DRIVEN EFFICIENT ESTIMATORS FOR A PARTIALLY LINEAR MODEL


Author(s): Hung Chen and Jyh-Jen Horng Shiau

Source: The Annals of Statistics, Vol. 22, No. 1 (Mar., 1994), pp. 211-237 Published by: Institute of Mathematical Statistics

Stable URL: http://www.jstor.org/stable/2242451 .


DATA-DRIVEN EFFICIENT ESTIMATORS FOR A PARTIALLY LINEAR MODEL

BY HUNG CHEN1 AND JYH-JEN HORNG SHIAU2

State University of New York, Stony Brook and National Chiao-Tung University

Chen and Shiau showed that a two-stage spline smoothing method and the partial regression method lead to efficient estimators for the parametric component of a partially linear model when the smoothing parameter is a deterministic sequence tending to zero at an appropriate rate. This paper is concerned with the large-sample behavior of these estimators when the smoothing parameter is chosen by the generalized cross validation (GCV) method or Mallows' $C_L$. Under mild conditions, the estimated parametric component is asymptotically normal with the usual parametric rate of convergence for both spline estimation methods. As a by-product, it is shown that the "optimal rate" for the smoothing parameter, with respect to expected average squared error, is the same for the two estimation methods as it is for ordinary smoothing splines.

1. Introduction. In this paper, we study the asymptotic behavior of the two efficient estimators for the parametric component of a partially linear model discussed in Chen and Shiau (1991) when the smoothing parameter is chosen either by the generalized cross validation (GCV) method proposed by Craven and Wahba (1979) or by the Mallows' $C_L$ criterion [Mallows (1973)]. As in Chen and Shiau (1991), we consider a semiparametric regression model

(1) $$y_{in} = x_{in}^T\beta + g(t_{in}) + e_{in}, \qquad i = 1,\dots,n,$$

where both the $x_{in} = (x_{i1n},\dots,x_{idn})^T$ (a $d$-vector) and $t_{in}\in[0,1]$ are observed design variables, $\beta = (\beta_1,\dots,\beta_d)^T$ is a vector of unknown regression coefficients, $g$ is a smooth function to be estimated and the $\{e_{in}\}$ are independent and identically distributed errors with mean zero and variance $\sigma^2$.

Several estimation methods for model (1) have been proposed in the literature. See Chen and Shiau (1991) and the references cited therein. Chen and Shiau (1991) discussed the asymptotic behavior of the following three estimators.

Received June 1991; revised July 1992.
1 Supported by NSF Grant DMS-89-01556.
2 Part of this work was done while J. Shiau was at the Engineering Research Center, AT&T Bell Laboratories.
AMS 1991 subject classifications. Primary 62G05, 62G99, 62J99.
Key words and phrases. Partial splines, semiparametric regression, smoothing splines, rate of convergence, partial regression, generalized cross validation, Mallows' $C_L$, efficient estimators.

(i) The partial spline estimator [proposed by Engle, Granger, Rice and Weiss (1986), Wahba (1984, 1986) and Shiau, Wahba and Johnson (1986), among others] is the solution to the following variational problem:

(2) $$\min_{\beta\in R^d,\ g\in W_2^m}\ \frac{1}{n}\sum_{i=1}^{n}\big[y_{in} - x_{in}^T\beta - g(t_{in})\big]^2 + \lambda\int_0^1\big[g^{(m)}(t)\big]^2\,dt,$$

where $W_2^m$ is the Sobolev space $\{f: f$ has $m-1$ absolutely continuous derivatives and $f^{(m)}\in L_2[0,1]\}$ and $\lambda$ is the smoothing parameter controlling the tradeoff between fidelity to data and roughness of the solution. It is known that the partial spline estimators for $\beta$ and $g = (g(t_{1n}),\dots,g(t_{nn}))^T$ are

(3) $$\hat\beta_\lambda = (X^T(I-S_\lambda)X)^{-1}X^T(I-S_\lambda)y \quad\text{and}\quad \hat g_\lambda = S_\lambda(y - X\hat\beta_\lambda),$$

where $X = (x_{irn})$ is the $n\times d$ design matrix for the parametric component of (1), $y = (y_{1n},\dots,y_{nn})^T$ and $S_\lambda$ is the smoother matrix for ordinary spline smoothing [i.e., when $\beta = 0$ in (2)].

(ii) The partial regression estimator was proposed independently by Denby (1986) and Speckman (1988). Motivated by the partial regression scheme in linear regression, the partial regression estimator is obtained by first smoothing X and y, respectively, by the smoother matrix SA, and then regressing the residuals of y on the residuals of X. Specifically, we have the partial regression estimator defined by

(4) $$\hat\beta_{1\lambda} = (X^T(I-S_\lambda)^2X)^{-1}X^T(I-S_\lambda)^2y \quad\text{and}\quad \hat g_{1\lambda} = S_\lambda(y - X\hat\beta_{1\lambda}).$$

(iii) The two-stage spline smoothing estimator was recently proposed by Chen and Shiau (1991). For simplicity, we shall discuss a simplified version of the estimator when the same smoothing parameter is used in both stages of smoothing, namely,

(5) $$\hat\beta_{0\lambda} = (X^T(I-S_\lambda)^3X)^{-1}X^T(I-S_\lambda)^2y, \qquad \hat g_{0\lambda} = S_\lambda(y - X\hat\beta_{0\lambda}) - (I-S_\lambda)S_\lambda X\hat\beta_{0\lambda}.$$

The basic idea behind this estimator is to modify the partial spline method so that roughness of the parametric component is penalized as well as that of the nonparametric component. Thus we first smooth $X$ to obtain the residuals $(I-S_\lambda)X$ for the purpose of extracting the smooth part from the parametric component, and then we apply the partial spline technique to smooth $y$ over $(I-S_\lambda)X$. This two-stage smoothing gives (5).
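The three estimators in (3), (4) and (5) differ only in the powers of $I - S_\lambda$ they use, which the following minimal sketch makes concrete for a scalar $\beta$ ($d = 1$). It is an illustration under stated assumptions, not the authors' implementation: the smoother is built from an orthonormal cosine basis with eigenvalues $1/(1+\lambda(\pi k)^{2m})$ as a stand-in for a smoothing spline smoother, and the function names (`cosine_basis`, `residual_power`, `estimators`) are hypothetical.

```python
import math

def cosine_basis(n):
    """Orthonormal cosine (DCT-II) vectors; an assumed stand-in for a spline basis."""
    rows = [[1.0 / math.sqrt(n)] * n]
    for k in range(1, n):
        rows.append([math.sqrt(2.0 / n) * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                     for i in range(n)])
    return rows

def residual_power(basis, lam, m, v, p):
    """Return (I - S)^p v, where S has assumed eigenvalues 1 / (1 + lam * (pi k)^(2m))."""
    out = [0.0] * len(v)
    for k, phi in enumerate(basis):
        shrink = 1.0 / (1.0 + lam * (math.pi * k) ** (2 * m))
        coef = (1.0 - shrink) ** p * sum(a * b for a, b in zip(phi, v))
        for i in range(len(v)):
            out[i] += coef * phi[i]
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def estimators(x, y, lam, m=2):
    """Scalar-beta versions of (3), (4) and (5) for one regressor x."""
    basis = cosine_basis(len(x))
    r = lambda v, p: residual_power(basis, lam, m, v, p)
    beta_partial_spline = dot(x, r(y, 1)) / dot(x, r(x, 1))      # (3)
    beta_partial_regression = dot(x, r(y, 2)) / dot(x, r(x, 2))  # (4)
    beta_two_stage = dot(x, r(y, 2)) / dot(x, r(x, 3))           # (5)
    return beta_partial_spline, beta_partial_regression, beta_two_stage
```

Calling `estimators(x, y, lam)` with lists `x` and `y` of equal length returns the three estimates of $\beta$. Because $I - S_\lambda$ is symmetric here, $x^T(I-S_\lambda)^p y$ is computed as the inner product of $x$ with $(I-S_\lambda)^p y$.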

In general, the smoother matrix $S_\lambda$ in (3), (4) and (5) can be replaced by any commonly used smoother matrix. Of course, estimators obtained by different smoothers may behave differently. See Chen and Shiau (1991) for some remarks. In this paper, we only study the case that $S_\lambda$ is the smoothing spline smoother.

To use these three methods to estimate $\beta$ and $g$ in practice, it is necessary to specify a value of the smoothing parameter $\lambda$. In the context of nonparametric regression, it is well known that the choice of $\lambda$ is crucial to the solution. A popular data-driven method of choosing $\lambda$ is the generalized cross validation (GCV) method (to be described in Section 2). Numerically, the GCV method has been found to perform well. Speckman (1981) and Li (1986) gave some nice theoretical results on the GCV method. However, the use of the GCV method for determining the value of $\lambda$ in (3), (4) or (5) has not yet been thoroughly examined. To our knowledge the only relevant reference is Speckman (1988), who gave a weak GCV theorem as in Craven and Wahba (1979) for the partial regression estimator (4) in the context of kernel smoothing.

There have been some studies on the asymptotic behavior of the preceding three estimators, when $\lambda$ is a deterministic quantity depending on $n$, in the setting that $x_{irn} = h_r(t_{in}) + z_{irn}$, where the $h_r$'s are smooth functions and $\{(z_{i1n},\dots,z_{idn})\}_{1\le i\le n}$ are independent and identically distributed error vectors with zero mean and positive definite covariance matrix. For the partial spline estimator with spline smoothing, Rice (1986) pointed out that $\hat\beta_\lambda - \beta$ can achieve the usual parametric rate of convergence as in parametric regression, namely, $O(n^{-1/2})$, only at the expense of undersmoothing the nonparametric component $g$. Thus Rice (1986) concluded that the use of the GCV method for choosing $\lambda$ is questionable in this case.

On the other hand, Speckman (1988), for the partial regression estimator with kernel smoothing, and Chen and Shiau (1991), for the two-stage spline smoothing estimator as well as the partial regression estimator with spline smoothing, showed that the negative result reported in Rice (1986) disappears. More specifically, by choosing an appropriate rate for $\lambda$, the convergence rate of $\hat\beta_{0\lambda} - \beta$ or $\hat\beta_{1\lambda} - \beta$ reaches the parametric rate $O(n^{-1/2})$ while $\hat g_{0\lambda}$ or $\hat g_{1\lambda}$ can still estimate $g = (g(t_{1n}),\dots,g(t_{nn}))^T$ with the same optimal convergence rate as that of the ordinary nonparametric regression estimator, which is achievable by the GCV estimator of $\lambda$. Basically, Chen and Shiau (1991) demonstrated that the goal of obtaining an estimate for the regression surface $g(\cdot)$ with an "optimal" nonparametric convergence rate does not conflict with the goal of obtaining an estimate for the parametric component $\beta$ with the parametric convergence rate. Since "optimal" estimates of the regression surface can be obtained by the method of GCV for the nonparametric regression context, we expect that the parametric convergence rate can be achieved for some estimators of $\beta$, such as (4) and (5), for the semiparametric model (1). The following conjecture is hence reasonably made by Speckman (1988) for kernel smoothing and Chen and Shiau (1991) for spline smoothing.

CONJECTURE. The GCV method can be used to choose the value of $\lambda$ in (4) or (5) such that $\hat\beta_{1\lambda}$ or $\hat\beta_{0\lambda}$ can still estimate $\beta$ with $n^{-1/2}$ rate.

The main objective of this paper is to prove this conjecture when $S_\lambda$ is the smoother matrix for ordinary spline smoothing. We also prove that the same result holds if $\lambda$ is chosen by the criterion of Mallows' $C_L$. We remark that although the problem of determining smoothing parameters for nonparametric regression based on data only is studied extensively in the literature [see Li (1986) and references therein], those results are not applicable in general to the problem posed in this article. A further remark on this is given in Section 2.

The main results are summarized in Theorems 1 and 2 (Section 2), in which the asymptotic distributions of $\hat\beta_{0\lambda} - \beta$ and $\hat\beta_{1\lambda} - \beta$ are derived when the smoothing parameter is determined by either the (restricted) GCV method or (restricted) Mallows' $C_L$. Descriptions of these two methods are given in Section 2. Most of the proofs are given in the remaining sections.

As a by-product of proving Theorems 1 and 2, it is shown in Propositions 1(b) and 3(b) that the "optimal rate" for the smoothing parameter, with respect to expected average squared error, is the same for the two estimation methods as it is for ordinary smoothing splines.

As suggested by a referee, we also have looked into the situation studied by Heckman (1986). When $h_r \equiv$ constant, Heckman (1986) established asymptotic normality for the partial spline estimator of $\beta$ and showed that its bias is asymptotically negligible. According to the preceding discussion, it is expected that the GCV method can be used to choose the value of $\lambda$ in (3) such that $\hat\beta_\lambda$ can still estimate $\beta$ with $n^{-1/2}$ rate under the setting of Heckman (1986). This conjecture is also confirmed for a more general case where the $h_r$'s are polynomials of degree less than $m$, and the result is presented as Theorem 3 in Section 2.

2. Data-driven methods and main results. In this section we describe the (restricted) GCV method and (restricted) Mallows' $C_L$ for determining the value of $\lambda$ in (4) and (5) and present the main results of this paper. We first introduce some notation. Write

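$$X\hat\beta_{0\lambda} + \hat g_{0\lambda} = \big[S_\lambda + (I-S_\lambda)^2X(X^T(I-S_\lambda)^3X)^{-1}X^T(I-S_\lambda)^2\big]y = A_{0\lambda}y,$$

$$X\hat\beta_{1\lambda} + \hat g_{1\lambda} = \big[S_\lambda + (I-S_\lambda)X(X^T(I-S_\lambda)^2X)^{-1}X^T(I-S_\lambda)^2\big]y = A_{1\lambda}y,$$

where $A_{0\lambda}$ and $A_{1\lambda}$ are so-called hat matrices or influence matrices. Let $\hat\lambda_{0G}$ be the minimizer of the generalized cross validation function (GCV function)

$$V_0(\lambda) = \frac{n^{-1}\|(I-A_{0\lambda})y\|^2}{\big[n^{-1}\,\mathrm{tr}(I-A_{0\lambda})\big]^2}$$

over $\lambda\in[\lambda_1,\lambda_2]$, where $\lambda_1 = n^{-\delta_1}$ log' n with $\delta_1 = 2m/5$, and $\lambda_2 = n^{-\delta_2}$ for any $\delta_2$ satisfying $0 < \delta_2 < 2m/(4m+1)$. Also $\|(I-A_{0\lambda})y\|^2 = y^T(I-A_{0\lambda})^T(I-A_{0\lambda})y$, the residual sum of squares. Similarly, let $\hat\lambda_{0C}$ denote the minimizer of Mallows' $C_L$

$$C_{0L}(\lambda) = n^{-1}\|(I-A_{0\lambda})y\|^2 + 2n^{-1}\sigma^2\,\mathrm{tr}A_{0\lambda}$$

over $\lambda\in[\lambda_1,\lambda_2]$, where $\sigma^2$ is assumed known. For the partial regression method, $\hat\lambda_{1G}$, $V_1(\lambda)$, $\hat\lambda_{1C}$ and $C_{1L}(\lambda)$ are defined accordingly for $A_{1\lambda}y$.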
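To make the two selection criteria concrete, here is a minimal sketch that evaluates the GCV function and Mallows' $C_L$ for the partial regression fit with a scalar $\beta$ and minimizes both over a grid on a restricted interval. It rests on assumptions that are not part of the paper: the smoother has eigenvalues $1/(1+\lambda(\pi k)^{2m})$ in an orthonormal cosine basis (a stand-in for the spline smoother), the grid is logarithmic, and the names `spectral_coefs`, `criteria` and `minimize_on_grid` are hypothetical.

```python
import math

def spectral_coefs(v):
    """Coefficients of v in an orthonormal cosine (DCT-II) basis (assumed stand-in)."""
    n = len(v)
    coefs = [sum(v) / math.sqrt(n)]
    for k in range(1, n):
        coefs.append(math.sqrt(2.0 / n) * sum(
            v[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n)))
    return coefs

def criteria(x, y, lam, sigma2, m=2):
    """Return (GCV, C_L, tr A_1) for the partial regression fit with scalar beta."""
    n = len(y)
    cx, cy = spectral_coefs(x), spectral_coefs(y)
    s = [1.0 / (1.0 + lam * (math.pi * k) ** (2 * m)) for k in range(n)]
    q2 = sum((1 - sk) ** 2 * a * a for sk, a in zip(s, cx))   # x'(I-S)^2 x
    q3 = sum((1 - sk) ** 3 * a * a for sk, a in zip(s, cx))   # x'(I-S)^3 x
    beta = sum((1 - sk) ** 2 * a * b for sk, a, b in zip(s, cx, cy)) / q2
    # (I - A_1) y = (I - S)(y - x beta), evaluated in the spectral domain
    rss = sum(((1 - sk) * (b - beta * a)) ** 2 for sk, a, b in zip(s, cx, cy))
    trace = sum(s) + q3 / q2                                   # tr S + rank-one part
    gcv = (rss / n) / (1.0 - trace / n) ** 2
    cl = rss / n + 2.0 * sigma2 * trace / n
    return gcv, cl, trace

def minimize_on_grid(x, y, lam_lo, lam_hi, sigma2, num=25):
    """Minimize GCV and C_L over a log-spaced grid on [lam_lo, lam_hi]."""
    grid = [lam_lo * (lam_hi / lam_lo) ** (j / (num - 1)) for j in range(num)]
    vals = [criteria(x, y, lam, sigma2) for lam in grid]
    lam_gcv = min(zip(grid, vals), key=lambda gv: gv[1][0])[0]
    lam_cl = min(zip(grid, vals), key=lambda gv: gv[1][1])[0]
    return lam_gcv, lam_cl
```

In use, `lam_lo` and `lam_hi` would play the roles of $\lambda_1$ and $\lambda_2$. The trace of the hat matrix splits into $\mathrm{tr}\,S_\lambda$ plus the trace of the rank-one term, which for scalar $\beta$ equals $x^T(I-S_\lambda)^3x / x^T(I-S_\lambda)^2x$.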

It is known that there exists a common orthonormal basis for all $S_\lambda$ (with $\lambda$ being the running index), for example, a Demmler-Reinsch basis [Demmler and Reinsch (1975)]. In other words, all $S_\lambda$ can be diagonalized simultaneously by this basis. Further details of this basis are given in Section 3. Unfortunately, it is not clear whether there exists such a common orthonormal basis for all $A_{0\lambda}$ or $A_{1\lambda}$ in general. Although both the GCV method and Mallows' $C_L$ have been studied in the context of nonparametric regression when $S_\lambda$ is the smoother matrix for smoothing splines, these results are not applicable to our problem since the arguments used to prove these results depend strongly on the existence of a common orthonormal basis for all $S_\lambda$.

Throughout the rest of the paper, we assume that $\{x_i\}$ is a random sample from $x$, where $x = (x_1,\dots,x_d)^T$, $x_r = h_r(t) + z_r$, for $1\le r\le d$, $t\in[0,1]$ and the $h_r$'s are smooth functions. Set $g_0 = \sum_{r=1}^d \beta_r h_r + g$. We also assume that the following conditions hold.

(A1) $E z_r = 0$, $\mathrm{Var}((z_1,\dots,z_d)) = \Sigma = (\sigma_{rs})$ and $E z_r^4 < \infty$, for $1\le r\le d$, where $\Sigma$ is a $d\times d$ positive definite matrix.

(A2) $\int_0^1 (g_0^{(2m)}(t))^2\,dt = \gamma > 0$ and $m\ge 2$.

(A3) The points $t_{in}$ are generated by $(2i-1)/(2n) = \int_0^{t_{in}} p(t)\,dt$ for some density function $p(t)$ on $[0,1]$.

(A4) The errors $e_{1n},\dots,e_{nn}$ are i.i.d. having a distribution independent of $n$ and $t$, and $E e_{in}^4 < \infty$, for $i = 1,2,\dots,n$.

(A5) $g, h_r, g_0 \in F = \{f: f\in W_2^{2m}[0,1],\ f^{(k)}(0) = f^{(k)}(1) = 0,\ m\le k\le 2m-1\}$ for $1\le r\le d$.
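As an illustration of the sampling scheme just described, the sketch below generates data from model (1) with one regressor under a uniform design density, for which (A3) gives $t_{in} = (2i-1)/(2n)$. Everything beyond the model itself is an assumption made for the example: Gaussian $z$ and $e$ (which have finite fourth moments), and the hypothetical names `design_points`, `simulate` and `bump`. The function `bump` is one choice whose derivatives up to order 3 vanish at 0 and 1, so it meets the boundary conditions in (A5) when $m = 2$.

```python
import math
import random

def design_points(n):
    """t_in solving (2i - 1)/(2n) = integral_0^t p(s) ds for the uniform density p = 1."""
    return [(2 * i - 1) / (2 * n) for i in range(1, n + 1)]

def bump(t):
    """256 t^4 (1 - t)^4: derivatives of order 0..3 vanish at both endpoints."""
    return 256.0 * t ** 4 * (1.0 - t) ** 4

def simulate(n, beta, h, g, sigma, tau, seed=0):
    """Draw (t, x, y) with x_i = h(t_i) + z_i and y_i = beta * x_i + g(t_i) + e_i."""
    rng = random.Random(seed)
    t = design_points(n)
    x = [h(ti) + rng.gauss(0.0, tau) for ti in t]       # z_i ~ N(0, tau^2), assumed
    y = [beta * xi + g(ti) + rng.gauss(0.0, sigma)      # e_i ~ N(0, sigma^2), assumed
         for xi, ti in zip(x, t)]
    return t, x, y
```

For example, `simulate(100, 2.0, lambda t: t, bump, sigma=0.5, tau=1.0)` uses the linear $h(t) = t$, whose derivatives of order two and higher vanish identically.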

Under (A3), we can find the magnitude of $\mathrm{tr}\,S_\lambda^l$ for $l = 1,2,\dots$ over $[\lambda_1,\lambda_2]$ based on Lemma 5.1 of Speckman (1981). This result is summarized in Lemma 2(c). Under (A5), functions in $F$ are the so-called very smooth functions defined in Wahba (1977). When (A2) also holds, it follows from Speckman [(1981), (3.2) and Lemma 3.1] that an exact bound can be obtained for $g_0^T(I-S_\lambda)^2g_0$, where $g_0 = (g_0(t_{1n}),\dots,g_0(t_{nn}))^T$. This bound is given in Lemma 2(b) in Section 3.

We now discuss the assumption (A5), which states that $g_0$ and $h_r$ must satisfy boundary conditions on some high derivatives. (A5) is considered because it and (A2) give an explicit asymptotic expression for the expectation of the averaged squared error loss. Then this expression can be used to determine the asymptotic behavior of $\lambda$ determined by either the GCV method or Mallows' $C_L$. Using the bias reduction approach developed by Eubank and Speckman (1991), $g_0$ and $h_r$ can be modified (by construction) to satisfy the boundary conditions specified in (A5). It is then conjectured that a result similar to that of this paper without (A5) will still hold as long as an explicit asymptotic expression for the expectation of the averaged squared error loss exists after the boundary adjustment. However, no proof is available now.

The asymptotic distributions of $\hat\beta_{0\lambda}$ and $\hat\beta_{1\lambda}$ are summarized in Theorems 1 and 2, respectively, when the value of $\lambda$ is determined by either the GCV method or Mallows' $C_L$.

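THEOREM 1. Under (A1)-(A5), $\sqrt{n}(\hat\beta_{0\hat\lambda} - \beta)$ converges in distribution to $N(0, \sigma^2\Sigma^{-1})$ for $\hat\lambda = \hat\lambda_{0G}$ or $\hat\lambda_{0C}$.

THEOREM 2. Under (A1)-(A5), $\sqrt{n}(\hat\beta_{1\hat\lambda} - \beta)$ converges in distribution to $N(0, \sigma^2\Sigma^{-1})$ for $\hat\lambda = \hat\lambda_{1G}$ or $\hat\lambda_{1C}$.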

Now we describe the (restricted) GCV method and (restricted) Mallows' $C_L$ for determining the value of $\lambda$ in (3) under the assumption that the $h_r$'s are polynomials of degree less than $m$ [i.e., $h_r^{(m)}(t)\equiv 0$]. The results are summarized in Theorem 3. First write

$$X\hat\beta_\lambda + \hat g_\lambda = \big[S_\lambda + (I-S_\lambda)X(X^T(I-S_\lambda)X)^{-1}X^T(I-S_\lambda)\big]y = A_\lambda y.$$

Let $\hat\lambda_G$ and $\hat\lambda_C$ be the minimizers of the corresponding GCV function and Mallows' $C_L$, respectively, over $\lambda\in[\lambda_1,\lambda_2]$.

THEOREM 3. Under (A1)-(A5) and $h_r^{(m)}(t)\equiv 0$, for $1\le r\le d$, $\sqrt{n}(\hat\beta_{\hat\lambda} - \beta)$ converges in distribution to $N(0,\sigma^2\Sigma^{-1})$, for $\hat\lambda = \hat\lambda_G$ or $\hat\lambda_C$.

Let $L_{0n}(\lambda)$ denote the averaged squared error loss over design points, that is, $n^{-1}\|A_{0\lambda}y - X\beta - g\|^2$, and let $\lambda_{0R}$ denote the value of $\lambda$ that minimizes the risk $R_{0n}(\lambda) = E L_{0n}(\lambda)$ over $[\lambda_1,\lambda_2]$. Note that here the expectation is taken with respect to $e$ only, that is, conditioned on $(x,t)$. We will prove Theorem 1 via the following three steps. Since the GCV method or Mallows' $C_L$ attempts to provide a data-based estimate of $\lambda_{0R}$, we first try to locate $\lambda_{0R}$. Let $\Lambda_0 = [\lambda_1, n^{-\delta_3}]$, where $\delta_1 > 2m/(4m+1) > \delta_3 > \delta_2$ > _4.1. Note that $\Lambda_0$ is contained in $[\lambda_1,\lambda_2]$. We show in Proposition 1 that $\lambda_{0R}\in\Lambda_0$. Next, we show in Proposition 2 that the choice of $\lambda$ based on either the GCV method or Mallows' $C_L$ does fall in $\Lambda_0$ in probability. Finally, we show that $\sqrt{n}(\hat\beta_{0\hat\lambda} - \beta)$ is asymptotically normal.

Set $h_r = (h_r(t_{1n}),\dots,h_r(t_{nn}))^T$, for $1\le r\le d$, and

$$c_l = \pi^{-1}\Big[\int_0^1 p^{1/(2m)}(v)\,dv\Big]\int_0^\infty (1+v^{2m})^{-l}\,dv.$$

The proofs of the following two propositions are given in Section 4.

PROPOSITION 1. Under (A1)-(A5) and $\lambda\in[\lambda_1,\lambda_2]$, when $n$ tends to infinity, we have (a) $R_{0n}(\lambda)\asymp \lambda^2 + n^{-1}\lambda^{-1/(2m)}$ and (b) $\lambda_{0R}\asymp n^{-2m/(4m+1)}$.

Here the symbol $a(n)\asymp b(n)$ means that $a(n)/b(n)$ is bounded away from zero and infinity. Note that $\lambda_{0R}\in\Lambda_0$ is an immediate result of (b).
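The step from a rate of the form in (a) to the rate in (b) can be seen by balancing the two terms. The following derivation is only a heuristic sketch: the positive constants $a$ and $b$ are hypothetical placeholders for the leading constants, which (a) does not identify.

```latex
\begin{aligned}
f(\lambda) &= a\lambda^{2} + b\,n^{-1}\lambda^{-1/(2m)}, \\
f'(\lambda) &= 2a\lambda - \frac{b}{2m}\,n^{-1}\lambda^{-1/(2m)-1} = 0
  \;\Longrightarrow\; \lambda^{(4m+1)/(2m)} = \frac{b}{4am\,n}, \\
\lambda^{*} &= \Big(\frac{b}{4am}\Big)^{2m/(4m+1)} n^{-2m/(4m+1)}, \qquad
f(\lambda^{*}) \asymp n^{-4m/(4m+1)}.
\end{aligned}
```

For cubic smoothing splines ($m = 2$) this gives $\lambda^{*}\asymp n^{-4/9}$ and a risk of order $n^{-8/9}$.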

PROPOSITION 2. Under (A1)-(A5) and $\hat\lambda\in[\lambda_1,\lambda_2]$, $\lim_n P(\hat\lambda\in\Lambda_0) = 1$, for $\hat\lambda = \hat\lambda_{0G}$ or $\hat\lambda_{0C}$.

To prove Theorem 1, we use the following technical lemma to pave the way. Set $A_0(\lambda) = n^{-1}X^T(I-S_\lambda)^3X$, $A_1(\lambda) = n^{-1}X^T(I-S_\lambda)^2X$, $Z = (z_{irn})_{n\times d}$ and $H = (h_r(t_{in}))_{n\times d}$.

LEMMA 1. Assume that (A1)-(A5) hold and that $g, h_r\in F$, for $1\le r\le d$. Then the following hold uniformly over all $\lambda\in[\lambda_1,\lambda_2]$:

(a) $A_0(\lambda) = \Sigma(I + o_p(1))$;
(b) $A_1(\lambda) = \Sigma(I + o_p(1))$;

and the following hold uniformly over all $\lambda\in\Lambda_0$:

(c) $n^{-1/2}X^T(I-S_\lambda)^2S_\lambda X = o_p(1)$;
(d) $n^{-1/2}X^T(I-S_\lambda)^2g = o_p(1)$;
(e) $n^{-1/2}H^T(I-S_\lambda)^2e = o_p(1)$;
(f) $n^{-1/2}Z^TS_\lambda^le = o_p(1)$, for $l = 1, 2$;
(g) $n^{-1/2}Z^T(I-S_\lambda)g = o_p(1)$.

The proof of Lemma 1 is given at the end of Section 3. Note that the notation $o_p(1)$ used in this paper denotes either the usual convention or a $d\times d$ (or $d\times 1$) matrix such that the magnitude of each element is $o_p(1)$. Now the proof of Theorem 1 becomes fairly simple.

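PROOF OF THEOREM 1. Rewrite

$$A_0(\hat\lambda)\,n^{1/2}(\hat\beta_{0\hat\lambda} - \beta) = n^{-1/2}Z^Te + \mathrm{Rem}(\hat\lambda),$$

where

$$\mathrm{Rem}(\lambda) = n^{-1/2}\big\{X^T(I-S_\lambda)^2(S_\lambda X\beta + g) + H^T(I-S_\lambda)^2e + Z^T[(I-S_\lambda)^2 - I]e\big\}.$$

Since $(I-S_\lambda)^2 - I = -2S_\lambda + S_\lambda^2$, it follows from Lemma 1(c)-(f) that $\sup_{\lambda\in\Lambda_0}|\mathrm{Rem}(\lambda)| = o_p(1)$. Any realization of $\hat\lambda$ is in $[\lambda_1,\lambda_2]$, which is a wider interval than $\Lambda_0$. However, by noting that, for any $c > 0$,

$$P(|\mathrm{Rem}(\hat\lambda)| > c) \le P(\hat\lambda\notin\Lambda_0) + P(|\mathrm{Rem}(\hat\lambda)| > c \text{ and } \hat\lambda\in\Lambda_0) \le P(\hat\lambda\notin\Lambda_0) + P\Big(\sup_{\lambda\in\Lambda_0}|\mathrm{Rem}(\lambda)| > c\Big),$$

we can conclude that $\mathrm{Rem}(\hat\lambda) = o_p(1)$ by Proposition 2. By Lemma 1(a), $\sup_{\lambda\in[\lambda_1,\lambda_2]}|A_0(\lambda) - \Sigma| = o_p(1)$. Since

$$|A_0(\hat\lambda) - \Sigma| \le \sup_{\lambda\in[\lambda_1,\lambda_2]}|A_0(\lambda) - \Sigma| = o_p(1),$$

we have $A_0(\hat\lambda)\to\Sigma$ in probability. It is shown in Chen and Shiau (1991) that $n^{-1/2}Z^Te\to N(0,\sigma^2\Sigma)$ in distribution. We then conclude $\sqrt{n}(\hat\beta_{0\hat\lambda} - \beta)\to N(0,\sigma^2\Sigma^{-1})$ by the above argument and Slutsky's theorem. □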

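We now turn to the partial regression estimator (4). Observe that

$$A_1(\hat\lambda)\,n^{1/2}(\hat\beta_{1\hat\lambda} - \beta) = n^{-1/2}Z^Te + n^{-1/2}X^T(I-S_{\hat\lambda})^2g + n^{-1/2}H^T(I-S_{\hat\lambda})^2e + n^{-1/2}Z^T[(I-S_{\hat\lambda})^2 - I]e.$$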

Similarly, the proof of Theorem 2 can be performed via the following two propositions and Lemma 1(b)-(f).

Let the loss function $L_{1n}(\lambda) = n^{-1}\|A_{1\lambda}y - X\beta - g\|^2$, and let $\lambda_{1R}$ denote the value of the smoothing parameter that minimizes the risk $R_{1n}(\lambda) = E L_{1n}(\lambda)$ over $\lambda\in[\lambda_1,\lambda_2]$.

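PROPOSITION 3. Under (A1)-(A5) and $\lambda\in[\lambda_1,\lambda_2]$, when $n$ tends to infinity, we have (a) $R_{1n}(\lambda)\asymp\lambda^2 + n^{-1}\lambda^{-1/(2m)}$ and (b) $\lambda_{1R}\asymp n^{-2m/(4m+1)}$.

PROPOSITION 4. Under (A1)-(A5) and $\hat\lambda\in[\lambda_1,\lambda_2]$, $\lim_n P(\hat\lambda\in\Lambda_0) = 1$ for $\hat\lambda = \hat\lambda_{1G}$ or $\hat\lambda_{1C}$.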

We now turn to the partial spline estimator (3) when $h_r^{(m)}(t)\equiv 0$.

PROOF OF THEOREM 3. Set $A_2(\lambda) = n^{-1}X^T(I-S_\lambda)X$. Rewrite

$$A_2(\hat\lambda)\,n^{1/2}(\hat\beta_{\hat\lambda} - \beta) = n^{-1/2}Z^Te + \mathrm{Rem}(\hat\lambda),$$

where

$$\mathrm{Rem}(\lambda) = n^{-1/2}\big\{Z^T(I-S_\lambda)g + H^T(I-S_\lambda)(e+g) - Z^TS_\lambda e\big\}.$$

Note that $H^T(I-S_\lambda)(e+g)\equiv 0$ because the $h_r$'s are polynomials of degree less than $m$ and $S_\lambda$ is the smoother matrix for ordinary spline smoothing. Using the same proof to show Lemma 1(a), we have $\sup_{\lambda\in[\lambda_1,\lambda_2]}|A_2(\lambda) - \Sigma| = o_p(1)$. It follows from Lemma 1(f) and (g) that $\sup_{\lambda\in\Lambda_0}|\mathrm{Rem}(\lambda)| = o_p(1)$. We then conclude $\sqrt{n}(\hat\beta_{\hat\lambda} - \beta)\to N(0,\sigma^2\Sigma^{-1})$ by the above discussion and the argument used in proving Theorem 1. □

3. Technical lemmas. In this section we state two more technical lemmas and summarize some properties of smoothing splines that are needed in the sequel. Lemma 1 is proved as an immediate result of these lemmas.

It is well known that smoothing splines are in the space of natural polynomial splines of order $2m$ on $[0,1]$ with knot set $\{t_{in}\}_{i=1}^n$. According to Demmler and Reinsch (1975), a basis for natural splines is $\{\phi_{jn}(t)\}_{j\le n}$ with the following biorthogonality property:

$$n^{-1}\sum_{i=1}^n\phi_{jn}(t_{in})\phi_{kn}(t_{in}) = \delta_{jk}, \qquad \int_0^1\phi_{jn}^{(m)}(t)\phi_{kn}^{(m)}(t)\,dt = \lambda_{kn}\delta_{jk}.$$

Here $\{\lambda_{kn}\}$ is a nondecreasing sequence of nonnegative numbers, and the eigenvalues of $S_\lambda$ are $(1+\lambda\lambda_{kn})^{-1}$ for $1\le k\le n$. Hence, $S_\lambda$ is a nonnegative definite matrix and has the eigenvalue decomposition $\Gamma^TD_\lambda\Gamma$, where $D_\lambda$ is a diagonal $n\times n$ matrix with $k$-th diagonal value $(1+\lambda\lambda_{kn})^{-1}$ and $\Gamma$ is an orthogonal $n\times n$ matrix with the $ij$-th element $n^{-1/2}\phi_{in}(t_{jn})$. Therefore, (I -

Let

$$B_\lambda^2 = n^{-1}g^T(I-S_\lambda)^2g, \qquad B_{r\lambda}^2 = n^{-1}h_r^T(I-S_\lambda)^2h_r \qquad\text{and}\qquad B_{0\lambda}^2 = n^{-1}g_0^T(I-S_\lambda)^2g_0.$$

Note that $B_\lambda^2$ is the averaged squared bias of the ordinary smoothing spline estimate of $g$. A similar interpretation is applicable to $B_{r\lambda}^2$ and $B_{0\lambda}^2$.

The following lemma is due to Speckman [(1981), Lemma 3.1, (3.2) and Theorem 2.4].

LEMMA 2. Suppose that (A3) holds. When $\lambda\in[\lambda_1,\lambda_2]$ and $m\ge 2$,

(a) $B_{0\lambda}^2 = O(\lambda^2)$ if $g_0\in F$,
(b) $B_{0\lambda}^2 = \gamma\lambda^2(1+o(1))$ if $g_0\in F$ and (A2) holds, and
(c) $\mathrm{tr}\,S_\lambda^l = \sum_k(1+\lambda\lambda_{kn})^{-l} = c_l\lambda^{-1/(2m)}(1+o(1))$ for positive integer $l$.

Thus Lemma 2(a) also implies that $B_\lambda^2 = O(\lambda^2)$ and $B_{r\lambda}^2 = O(\lambda^2)$, if $g, h_r\in F$.
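The trace approximation in Lemma 2(c) can be checked numerically in a simplified setting. The sketch below is an illustration under assumptions, not a statement about the exact spline eigenvalues: it takes the eigenvalues to be $(\pi k)^{2m}$ for $k \ge 1$, which is the asymptotic form commonly associated with a uniform design, and it uses the closed form $\int_0^\infty (1+v^{2m})^{-1}dv = (\pi/2m)/\sin(\pi/2m)$ to evaluate a constant of the form $\pi^{-1}\int_0^\infty(1+v^{2m})^{-1}dv$. The function names are hypothetical.

```python
import math

def trace_power(lam, m, l, n):
    """Sum over k = 1..n of (1 + lam * (pi k)^(2m))^(-l), with assumed eigenvalues."""
    return sum((1.0 + lam * (math.pi * k) ** (2 * m)) ** (-l) for k in range(1, n + 1))

def c_one(m):
    """pi^{-1} * integral_0^inf dv / (1 + v^(2m)) = 1 / (2m * sin(pi / (2m)))."""
    return 1.0 / (2 * m * math.sin(math.pi / (2 * m)))

def trace_ratio(lam, m, n):
    """Ratio of the eigenvalue sum to the approximation c_1 * lam^(-1/(2m))."""
    return trace_power(lam, m, 1, n) / (c_one(m) * lam ** (-1.0 / (2 * m)))
```

For small $\lambda$ and large $n$ the value of `trace_ratio(lam, m, n)` should be close to 1, reflecting the $\lambda^{-1/(2m)}$ growth of the trace.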

Lemma 3 summarizes the convergence rates for some terms to be used later in the proofs of Lemma 1 and Propositions 1-4. Let $x_{irn} = h_r(t_{in}) + z_{irn}$ and $z_r = (z_{1rn},\dots,z_{nrn})^T$.

LEMMA 3. Assume that (A1)-(A4) hold and that $h_r, f, f_1, f_2\in F$, for $1\le r\le d$. Let $a$, $a_0$ and $a_1$ be constants satisfying $1 < a < 1/a_0 < 5$ and $a < a_1$. Then, for any finite positive integer $l$, the following statements hold uniformly over all $\lambda\in[\lambda_1,\lambda_2]$ and $1\le r,s\le d$:

(a) $z_r^TS_\lambda^lz_s = c_l\sigma_{rs}\lambda^{-1/(2m)} + o_p(\lambda^{-1/(4ma_0)})$;
(b) $e^TS_\lambda^le = \sigma^2c_l\lambda^{-1/(2m)}(1+o_p(1))$;
(c) $n^{-1/2}f^T(I-S_\lambda)^le = O_p(\lambda^{1/a_1}) = o_p(1)$, where $f = (f(t_{1n}),\dots,f(t_{nn}))^T$;
(d) $n^{-1/2}f^T(I-S_\lambda)^lz_r = O_p(\lambda^{1/a_1}) = o_p(1)$;
(e) $n^{-1}f_1^T(I-S_\lambda)^lf_2 = O(\lambda^2)$, where $f_i = (f_i(t_{1n}),\dots,f_i(t_{nn}))^T$, $i = 1,2$ and $l\ge 2$;
(f) $n^{-1}x_r^T(I-S_\lambda)^2z_s = \sigma_{rs} + o_p(1)$;
(g) $n^{-1}x_r^T(I-S_\lambda)^lx_s = \sigma_{rs} + o_p(1)$, for $l\ge 2$;
(h) $n^{-1/2}x_r^T(I-S_\lambda)^2S_\lambda x_s = o_p(1) + O(n^{1/2}\lambda^2)$;
(i) $z_r^TS_\lambda^le = O_p(\lambda^{-1/(4ma_0)})$;
(j) $n^{-1}z_r^T(I-S_\lambda)^le = O_p(n^{-1/2})$;

(k) xT(I-S,)3x _-xrT(I-SA)2z8

-(c1 - 2c2 + C3)rrSA- 1/2m + _op_{(A-1/4mao) + Op(n1/2A1/a}

+T~(I _ S\)3h8; + hrT(->3S

(l) $$x_r^T(I-S_\lambda)^3x_s - x_r^T(I-S_\lambda)^4x_s = (c_1 - 3c_2 + 3c_3 - c_4)\sigma_{rs}\lambda^{-1/(2m)} + o_p(\lambda^{-1/(4ma_0)}) + O_p(n^{1/2}\lambda^{1/a_1}) + h_r^T(I-S_\lambda)^3S_\lambda h_s.$$

The nontrivial proof of Lemma 3 is deferred to Section 6.

PROOF OF LEMMA 1. First note that $\lambda^{-1/(4ma_0)} = o(n^{1/2})$, for all $\lambda\in[\lambda_1,\lambda_2]$, since $1/a_0 < 5$. It is easy to see that (a) and (b) hold by Lemma 3(g). Note that $\lambda^2 = o(n^{-1/2})$, for all $\lambda\in\Lambda_0$. Then it is easy to see that (c) holds by Lemma 3(h); (d) holds by Lemma 3(d) and 3(e); (e) holds by Lemma 3(c); (f) holds by Lemma 3(i); (g) holds by Lemma 3(d). □

4. Proof for two-stage spline smoothing estimate. We prove Propositions 1 and 2 for the two-stage spline smoothing estimates in this section. The following technical lemma summarizes the convergence rates for some terms to be used in the proofs. The proof of the lemma is deferred to Section 7.

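LEMMA 4. Assume that (A1)-(A4) hold and that $g, h_r\in F$, for $1\le r\le d$. We further assume that the constants $a$, $a_0$ and $a_1$ specified in Lemma 3 satisfy the further constraint that $4m/(4m-1) > a_1 > a$ and $a_0 > 1$. Then the following statements hold uniformly over all $\lambda\in[\lambda_1,\lambda_2]$:

(a) $n^{-1}\mathrm{tr}A_{0\lambda} = c_1n^{-1}\lambda^{-1/(2m)}(1+o_p(1))$;
(b) $n^{-1}\mathrm{tr}A_{0\lambda}^2 = c_2n^{-1}\lambda^{-1/(2m)}(1+o_p(1))$;
(c) $n^{-1}g_0^T(I-A_{0\lambda})^2g_0 = \gamma\lambda^2(1+o_p(1))$;
(d) $n^{-1}\beta^TZ^T(I-A_{0\lambda})^2Z\beta = \big[n^{-1}(\sum_r\beta_rh_r)^T(I-S_\lambda)^4(\sum_r\beta_rh_r) + (c_2 - 2c_3 + c_4)\beta^T\Sigma\beta\,n^{-1}\lambda^{-1/(2m)}\big](1+o_p(1))$;
(e) $|n^{-1}g_0^T(I-A_{0\lambda})^2e| = o_p(R_{0n}(\lambda))$;
(f) $|n^{-1}(Z\beta)^T(I-A_{0\lambda})^2e| = o_p(R_{0n}(\lambda))$;
(g) $|n^{-1}[e^T(2A_{0\lambda} - A_{0\lambda}^2)e - \sigma^2(2\,\mathrm{tr}A_{0\lambda} - \mathrm{tr}A_{0\lambda}^2)]| = o_p(R_{0n}(\lambda))$.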

PROOF OF PROPOSITION 1. Write $A_{0\lambda}y - X\beta - g = (A_{0\lambda} - I)(X\beta + g) + A_{0\lambda}e$. Hence

$$R_{0n}(\lambda) = n^{-1}(X\beta + g)^T(I-A_{0\lambda})^2(X\beta + g) + n^{-1}\sigma^2\,\mathrm{tr}A_{0\lambda}^2.$$

Note that $X\beta + g = Z\beta + g_0$. Then

(6) $$R_{0n}(\lambda) = \Big\{[c_2\sigma^2 + (c_2 - 2c_3 + c_4)\beta^T\Sigma\beta]\,n^{-1}\lambda^{-1/(2m)} + \gamma\lambda^2 + n^{-1}\Big(\sum_r\beta_rh_r\Big)^T(I-S_\lambda)^4\Big(\sum_r\beta_rh_r\Big)\Big\}(1+o_p(1))$$

by Lemma 4(b)-(d).

Note that $n^{-1}(\sum_r\beta_rh_r)^T(I-S_\lambda)^4(\sum_r\beta_rh_r)\ge 0$ and its order is $O(\lambda^2)$, by Lemma 3(e) and the fact that the eigenvalues of $S_\lambda$ are between 0 and 1. Also, $(c_2 - 2c_3 + c_4)\beta^T\Sigma\beta\ge 0$, by the fact that $c_2 - 2c_3 + c_4 > 0$ and $\Sigma$ is positive definite. Hence, Proposition 1(a) holds by (6), and Proposition 1(b) follows easily from Proposition 1(a). □

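PROOF OF PROPOSITION 2. Recall that $C_{0L}(\lambda) = n^{-1}\|(I-A_{0\lambda})y\|^2 + 2n^{-1}\sigma^2\,\mathrm{tr}A_{0\lambda}$, which can be written as

(7) $$C_{0L}(\lambda) = n^{-1}e^Te + R_{0n}(\lambda) + 2n^{-1}(Z\beta + g_0)^T(I-A_{0\lambda})^2e + n^{-1}\{\sigma^2(2\,\mathrm{tr}A_{0\lambda} - \mathrm{tr}A_{0\lambda}^2) - e^T(2A_{0\lambda} - A_{0\lambda}^2)e\} = n^{-1}e^Te + R_{0n}(\lambda) + o_p(R_{0n}(\lambda)),$$

by Lemma 4(e)-(g).

Recall that the GCV function $V_0(\lambda) = n^{-1}\|(I-A_{0\lambda})y\|^2[n^{-1}\mathrm{tr}(I-A_{0\lambda})]^{-2}$. Write $A_{0\lambda} = S_\lambda + B_{0\lambda}$, where $B_{0\lambda} = n^{-1}(I-S_\lambda)^2XA_0^{-1}(\lambda)X^T(I-S_\lambda)^2$. It follows from Lemmas 1(a) and 3(g) that

$$\mathrm{tr}B_{0\lambda} = \mathrm{tr}\big(A_0^{-1}(\lambda)\,n^{-1}X^T(I-S_\lambda)^4X\big) = \mathrm{tr}(I_{d\times d} + o_p(1)) = O_p(1).$$

Also Lemma 2(c) gives that $\mathrm{tr}S_\lambda = O(\lambda^{-1/(2m)})$. We then have

$$[n^{-1}\mathrm{tr}(I-A_{0\lambda})]^{-2} = 1 + 2n^{-1}\mathrm{tr}A_{0\lambda} + o(n^{-1}\mathrm{tr}A_{0\lambda}).$$

Observe that

$$n^{-1}\|(I-A_{0\lambda})y\|^2 = R_{0n}(\lambda) + 2n^{-1}(Z\beta + g_0)^T(I-A_{0\lambda})^2e + n^{-1}e^Te - n^{-1}[e^T(2A_{0\lambda} - A_{0\lambda}^2)e - \sigma^2(2\,\mathrm{tr}A_{0\lambda} - \mathrm{tr}A_{0\lambda}^2)] - 2\sigma^2n^{-1}\mathrm{tr}A_{0\lambda}.$$

The fourth term on the right-hand side is equal to $o_p(R_{0n}(\lambda))$, by Lemma 4(g). The second term is also of the order $o_p(R_{0n}(\lambda))$, by Lemma 4(e) and (f). We thus get

(8) $$V_0(\lambda) = [n^{-1}e^Te + R_{0n}(\lambda) + o_p(R_{0n}(\lambda)) - 2\sigma^2n^{-1}\mathrm{tr}A_{0\lambda}]\times[1 + 2n^{-1}\mathrm{tr}A_{0\lambda} + o(n^{-1}\mathrm{tr}A_{0\lambda})]$$
$$= n^{-1}e^Te + R_{0n}(\lambda) + o_p(R_{0n}(\lambda)) + 2(n^{-1}\mathrm{tr}A_{0\lambda})(n^{-1}e^Te - \sigma^2) = n^{-1}e^Te + R_{0n}(\lambda) + o_p(R_{0n}(\lambda)),$$

by Lemma 4(a), Proposition 1(a) and the law of large numbers. From (7) and (8), we have

$$C_{0L}(\lambda) - C_{0L}(\lambda_{0R}) = R_{0n}(\lambda) - R_{0n}(\lambda_{0R}) + o_p(R_{0n}(\lambda))$$

and

$$V_0(\lambda) - V_0(\lambda_{0R}) = R_{0n}(\lambda) - R_{0n}(\lambda_{0R}) + o_p(R_{0n}(\lambda)),$$

respectively. When $R_{0n}(\lambda)/R_{0n}(\lambda_{0R})\to\infty$, it follows easily that $C_{0L}(\lambda) > C_{0L}(\lambda_{0R})$ and $V_0(\lambda) > V_0(\lambda_{0R})$ in probability. Since $\hat\lambda$ is the minimizer of $C_{0L}(\lambda)$ or $V_0(\lambda)$, this implies that $R_{0n}(\hat\lambda)/R_{0n}(\lambda_{0R})\to 1$ in probability. Let $\{\delta_n\}$ be any sequence that tends to infinity. Note that $R_{0n}(\lambda_{0R}\delta_n)/R_{0n}(\lambda_{0R})\to\infty$ and $R_{0n}(\lambda_{0R}/\delta_n)/R_{0n}(\lambda_{0R})\to\infty$ by Proposition 1(a). Hence, $R_{0n}(\lambda)/R_{0n}(\lambda_{0R})\to\infty$ for any $\lambda > \lambda_{0R}\delta_n$ or $\lambda < \lambda_{0R}/\delta_n$. Since $R_{0n}(\hat\lambda)/R_{0n}(\lambda_{0R})$ cannot go to infinity, we have that

$$\lim_n P(\lambda_{0R}/\delta_n\le\hat\lambda\le\lambda_{0R}\delta_n) = 1.$$

Since $\{\delta_n\}$ is any sequence that tends to infinity, $\hat\lambda$ cannot be too far away from $\lambda_{0R}$ in probability. Thus $\lim_n P(\hat\lambda\in\Lambda_0) = 1$. □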

5. Proof for the partial regression estimate. First, we state a technical lemma that summarizes the convergence rates for some terms to be used in the proofs of Propositions 3 and 4. We defer the proof of this lemma to Section 7.

LEMMA 5. Assume that (A1)-(A4) hold and that $g, h_r\in F$, for $1\le r\le d$. We further assume that the constants $a$, $a_0$ and $a_1$ specified in Lemma 3 satisfy the further constraint that $4m/(4m-1) > a_1 > a$ and $a_0 > 2$. Then the following statements hold uniformly over all $\lambda\in[\lambda_1,\lambda_2]$:

(a) $n^{-1}g_0^T(I-A_{1\lambda})^T(I-A_{1\lambda})g_0 = \gamma\lambda^2(1+o_p(1))$;
(b) $n^{-1}\beta^TZ^T(I-A_{1\lambda})^T(I-A_{1\lambda})Z\beta = \big[n^{-1}(\sum_r\beta_rh_r)^T(I-S_\lambda)^2(\sum_r\beta_rh_r) + o_p(n^{-1}\lambda^{-1/(2m)})\big](1+o(1))$;
(c) $|n^{-1}(X\beta + g)^T(I-A_{1\lambda})^T(I-A_{1\lambda})e| = o_p(R_{1n}(\lambda))$.

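PROOF OF PROPOSITION 3. Simple algebra leads to

$$R_{1n}(\lambda) = n^{-1}(X\beta + g)^T(I-A_{1\lambda})^T(I-A_{1\lambda})(X\beta + g) + n^{-1}\sigma^2\,\mathrm{tr}A_{1\lambda}^TA_{1\lambda}.$$

Set $A_{1\lambda} = S_\lambda + B_{1\lambda}$, where $B_{1\lambda} = n^{-1}(I-S_\lambda)XA_1^{-1}(\lambda)X^T(I-S_\lambda)^2$. By Lemmas 1(b), 3(g) and 2(c), we have

(9) $$\mathrm{tr}B_{1\lambda}^TB_{1\lambda} = \mathrm{tr}\,A_1^{-1}(\lambda)[n^{-1}X^T(I-S_\lambda)^2X]A_1^{-1}(\lambda)[n^{-1}X^T(I-S_\lambda)^4X] = O_p(1)$$

and

(10) $$\mathrm{tr}S_\lambda^TB_{1\lambda} = \mathrm{tr}\,A_1^{-1}(\lambda)\{n^{-1}X^T[(I-S_\lambda)^3 - (I-S_\lambda)^4]X\} = o_p(1).$$

Hence,

(11) $$n^{-1}\mathrm{tr}A_{1\lambda}^TA_{1\lambda} = c_2n^{-1}\lambda^{-1/(2m)}(1+o_p(1)).$$

Then since $X\beta + g = Z\beta + g_0$, we have

(12) $$R_{1n}(\lambda) = \Big[c_2\sigma^2n^{-1}\lambda^{-1/(2m)} + \gamma\lambda^2 + n^{-1}\Big(\sum_r\beta_rh_r\Big)^T(I-S_\lambda)^2\Big(\sum_r\beta_rh_r\Big)\Big](1+o_p(1)),$$

by Lemmas 5(a), 5(b) and (11).

Note that $n^{-1}(\sum_r\beta_rh_r)^T(I-S_\lambda)^2(\sum_r\beta_rh_r)\ge 0$ and its order is $O(\lambda^2)$ by Lemma 3(e). Hence, (a) holds; (b) follows easily from (a). □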

PROOF OF PROPOSITION 4. We first observe that $\mathrm{tr}B_{1\lambda} = O_p(1)$ by Lemmas 1(b) and 3(g). Then, by Lemma 5(c), it remains to show that

(13) _{n-1jU2 trAlTAA1A}_{- eT(Al +ATA -ATIAAl)eI =o(R1n(A)),}

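(14) $$n^{-1}\big|\sigma^2(2\,\mathrm{tr}A_{1\lambda} - \mathrm{tr}A_{1\lambda}^TA_{1\lambda}) - e^T(A_{1\lambda} + A_{1\lambda}^T - A_{1\lambda}^TA_{1\lambda})e\big| = o_p(R_{1n}(\lambda))$$

hold uniformly over all $\lambda\in[\lambda_1,\lambda_2]$, so that

$$C_{1L}(\lambda) = n^{-1}e^Te + R_{1n}(\lambda) + o_p(R_{1n}(\lambda)),$$
$$V_1(\lambda) = n^{-1}e^Te + R_{1n}(\lambda) + o_p(R_{1n}(\lambda)).$$

Then, by applying the same argument employed in Proposition 2, we have Proposition 4.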

It follows from Lemmas 1(b), 3(c), 3(j) and 3(g) that

(15) $$n^{-1}e^TB_{1\lambda}e = [n^{-1}e^T(I-S_\lambda)(Z+H)]A_1^{-1}(\lambda)[n^{-1}(Z+H)^T(I-S_\lambda)^2e] = o_p(R_{1n}(\lambda)),$$

(16) $$n^{-1}e^TB_{1\lambda}^TS_\lambda e = [n^{-1}e^T(I-S_\lambda)^2(Z+H)]A_1^{-1}(\lambda)\{n^{-1}(Z+H)^T[(I-S_\lambda) - (I-S_\lambda)^2]e\} = o_p(R_{1n}(\lambda)),$$

(17) $$n^{-1}e^TB_{1\lambda}^TB_{1\lambda}e = [n^{-1}e^T(I-S_\lambda)^2(Z+H)]A_1^{-1}(\lambda)[n^{-1}X^T(I-S_\lambda)^2X]A_1^{-1}(\lambda)[n^{-1}(Z+H)^T(I-S_\lambda)^2e] = o_p(R_{1n}(\lambda)).$$

It follows from Lemmas 3(b) and 2(c) that

(18) $$n^{-1}[e^T(2S_\lambda - S_\lambda^2)e - \sigma^2\,\mathrm{tr}(2S_\lambda - S_\lambda^2)] = o_p(R_{1n}(\lambda)).$$

We conclude (13) and (14) by (9), (10) and (15)-(18). □

6. Proof of Lemma 3. We begin with a technical lemma which is an extension of Lemma 4.4 in Speckman (1985) to the case when the random variables are not independent. Therefore, the Gaussian assumption in Speckman (1985) or Li (1986) is removed.

LEMMA 6. Let $W_1,\dots,W_n$ be random variables with zero mean and finite variance. Suppose that there exist nonnegative numbers $\{u_k\}$ such that

$$E\Big[\sum_{k=\mu}^{\nu}W_k\Big]^2\le\sum_{k=\mu}^{\nu}u_k \qquad\text{for all } \mu\le\nu.$$

Then, for any $c > 0$,

$$P\Big\{\sup_{0\le c_1\le\cdots\le c_n\le c_0}\Big|\sum_{k=1}^nc_kW_k\Big| > c\Big\}\le c^{-2}c_0^2(\log_2 4n)^2\sum_{k=1}^nu_k.$$

PROOF. By the argument used in Lemma 4.4 of Speckman (1985), we have

n i

sup E CkWk =Co max

_E

Wk.

0<?Cl ?...?<Cnl?c0 k=1 -< - k=1

Then, by the first two theorems stated in Serfling [(1970), page 1228],

E jmax

_[

x Wk < (log24n)2Euk.

L k=1 J4 k=1

Hence, this lemma holds by Chebyshev's inequality. O

REMARK 1. When $EW_kW_l = 0$, for $k\ne l$, Lemma 6 holds with $u_k = \mathrm{Var}(W_k)$.

REMARK 2. Lemma 6 also holds when $0\le c_n\le\cdots\le c_1\le c_0$.

Define

$$\xi_{krn} = n^{-1/2}\sum_{i=1}^nz_{irn}\phi_{kn}(t_{in}), \qquad h_{krn} = n^{-1/2}\sum_{i=1}^nh_r(t_{in})\phi_{kn}(t_{in}),$$

$$g_{kn} = n^{-1/2}\sum_{i=1}^ng(t_{in})\phi_{kn}(t_{in}), \qquad \varepsilon_{kn} = n^{-1/2}\sum_{i=1}^ne_{in}\phi_{kn}(t_{in}),$$

for $1\le k\le n$ and $1\le r\le d$. Lemma 6 will be applied to $\{\xi_{krn}\xi_{ksn}\}_{1\le k\le n}$ and $\{\xi_{krn}\varepsilon_{kn}\}_{1\le k\le n}$, for $1\le r,s\le d$, later on in the proof of Lemma 3. Thus we need to show that these two sequences of random variables satisfy the assumption of Lemma 6.

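LEMMA 7. For any finite positive integer $l$ and $1\le r,s\le d$, both

$$\{(\xi_{krn}\xi_{ksn} - \sigma_{rs})(1+\lambda\lambda_{kn})^{-l}\}_{1\le k\le n} \qquad\text{and}\qquad \{\xi_{krn}\varepsilon_{kn}(1+\lambda\lambda_{kn})^{-l}\}_{1\le k\le n}$$

satisfy the assumption of Lemma 6 with $u_k = c^*(1+\lambda\lambda_{kn})^{-2l}$, for some constant $c^*$.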

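PROOF. Recall that $S_\lambda = \Gamma^TD_\lambda\Gamma$. Set $D_{\mu\nu} = (d_{ik})_{n\times n}$, where $d_{ik} = 1$, if $\mu\le i = k\le\nu$, and $d_{ik} = 0$, otherwise. In other words, $D_{\mu\nu}$ is an $n\times n$ diagonal matrix with the diagonal entry equal to 1 from the $\mu$-th row to the $\nu$-th row and zero otherwise. Then

$$\sum_{k=\mu}^{\nu}\frac{\xi_{krn}\varepsilon_{kn}}{(1+\lambda\lambda_{kn})^l} = z_r^T(\Gamma^TD_{\mu\nu}D_\lambda D_{\mu\nu}\Gamma)^le,$$

(19) $$\sum_{k=\mu}^{\nu}\frac{\xi_{krn}\xi_{ksn} - \sigma_{rs}}{(1+\lambda\lambda_{kn})^l} = z_r^T(\Gamma^TD_{\mu\nu}D_\lambda D_{\mu\nu}\Gamma)^lz_s - \sigma_{rs}\,\mathrm{tr}(\Gamma^TD_{\mu\nu}D_\lambda D_{\mu\nu}\Gamma)^l.$$

By (A1), (A4) and a conditioning argument, we have

$$E\Big[\sum_{k=\mu}^{\nu}\frac{\xi_{krn}\varepsilon_{kn}}{(1+\lambda\lambda_{kn})^l}\Big]^2 = \sigma^2Ez_r^T(\Gamma^TD_{\mu\nu}D_\lambda D_{\mu\nu}\Gamma)^{2l}z_r = \sigma^2\sigma_{rr}\,\mathrm{tr}(\Gamma^TD_{\mu\nu}D_\lambda D_{\mu\nu}\Gamma)^{2l} = \sigma^2\sigma_{rr}\sum_{k=\mu}^{\nu}(1+\lambda\lambda_{kn})^{-2l}.$$

Letting $u_k = \sigma^2\sigma_{rr}(1+\lambda\lambda_{kn})^{-2l}$, we have shown that the assumption of Lemma 6 holds for $\{\xi_{krn}\varepsilon_{kn}(1+\lambda\lambda_{kn})^{-l}\}_{1\le k\le n}$.

Next, by (19) we have

$$E\Big[\sum_{k=\mu}^{\nu}\frac{\xi_{krn}\xi_{ksn} - \sigma_{rs}}{(1+\lambda\lambda_{kn})^l}\Big]^2 = \mathrm{Var}\big(z_r^T(\Gamma^TD_{\mu\nu}D_\lambda D_{\mu\nu}\Gamma)^lz_s\big),$$

since $E(z_r^T(\Gamma^TD_{\mu\nu}D_\lambda D_{\mu\nu}\Gamma)^lz_s) = \sigma_{rs}\,\mathrm{tr}(\Gamma^TD_{\mu\nu}D_\lambda D_{\mu\nu}\Gamma)^l$. We first show that, for any symmetric matrix $A = (a_{ij})_{n\times n}$,

(20) $$\mathrm{Var}(z_r^TAz_s)\le c_0\,\mathrm{tr}A^2$$

for $1\le r\le s\le d$, where $c_0$ is a constant depending on $Ez_r^2z_s^2$ and $\Sigma$ only. For notational simplicity, we only demonstrate the case of $r = 1$ and $s = 2$. First, we note that $Ez_1^TAz_2 = \sigma_{12}\,\mathrm{tr}A$ and

$$(z_1^TAz_2)^2 = \sum_i\sum_j\sum_k\sum_la_{ij}a_{kl}z_{i1n}z_{j2n}z_{k1n}z_{l2n}.$$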

(17)

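Since $\{(z_{i1n}, z_{i2n})\}_{1\le i\le n}$ are mutually independent with mean $(0, 0)$, we have

$$E z_{i1n} z_{j2n} z_{k1n} z_{l2n} = \begin{cases} E z_1^2 z_2^2, & i = j = k = l,\\ \sigma_{12}^2, & i = j,\ k = l,\ i \ne k,\\ \sigma_{11}\sigma_{22}, & i = k,\ j = l,\ i \ne j,\\ 0, & \text{otherwise.}\end{cases}$$

Hence

$$\mathrm{Var}(z_1^T A z_2) = (E z_1^2 z_2^2 - \sigma_{12}^2)\sum_{i} a_{ii}^2 + \sigma_{11}\sigma_{22}\sum_{i\ne j} a_{ij}^2 \le c\sum_{i,j} a_{ij}^2,$$

where $c = \max(E z_1^2 z_2^2 - \sigma_{12}^2,\ \sigma_{11}\sigma_{22})$. Since $A$ is symmetric and $\sum_{i,j} a_{ij}^2 = \mathrm{tr}\,A^2$, (20) holds.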
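As an illustration of (20), the variance of the bilinear form can be computed exactly in a small special case. The sketch below assumes, purely for illustration, that the two coordinates are independent Rademacher signs, so that $\sigma_{11} = \sigma_{22} = 1$, $\sigma_{12} = 0$ and $E z_1^2 z_2^2 = 1$. In that case the bound holds with $c_0 = 1$ and is attained. The matrix `A` is an arbitrary symmetric example, not one taken from the paper.

```python
import itertools
import numpy as np

# Arbitrary symmetric matrix used only for this check.
A = np.array([[2.0, -1.0, 0.5],
              [-1.0, 0.0, 3.0],
              [0.5, 3.0, 1.0]])
n = A.shape[0]

# Enumerate all 2^(2n) equally likely sign patterns for (z_1, z_2).
signs = [np.array(s) for s in itertools.product([-1.0, 1.0], repeat=n)]
values = np.array([z1 @ A @ z2 for z1 in signs for z2 in signs])

mean_exact = values.mean()     # equals sigma_12 * tr(A) = 0 here
var_exact = values.var()       # population variance over the 64 outcomes
trace_A2 = np.trace(A @ A)     # equals the sum of squared entries of A
```

Here `var_exact` coincides with `trace_A2`, so the inequality in (20) is tight for this distribution.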

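Let $A = (\Gamma^T D_{\mu\nu} D_\lambda D_{\mu\nu}\Gamma)^{l}$. By (19) and (20), we have

$$E\left[\sum_{k=\mu}^{\nu}\frac{\xi_{krn}\xi_{ksn}-\sigma_{rs}}{(1+\lambda_{kn}\lambda)^{l}}\right]^2 = \mathrm{Var}(z_r^T A z_s) \le c_0\,\mathrm{tr}(\Gamma^T D_{\mu\nu} D_\lambda D_{\mu\nu}\Gamma)^{2l} = c_0\sum_{k=\mu}^{\nu}(1+\lambda_{kn}\lambda)^{-2l}.$$

Thus $\{(\xi_{krn}\xi_{ksn} - \sigma_{rs})(1+\lambda_{kn}\lambda)^{-l}\}_{1\le k\le n}$ satisfies the assumption of Lemma 6 by identifying $u_k = c_0(1+\lambda_{kn}\lambda)^{-2l}$. □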

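PROOF OF PART (a). First, we treat the case $l = 1$, that is, we show that

$$z_r^T S_\lambda z_s = \sigma_{rs}\,\mathrm{tr}\,S_\lambda + o_p(\lambda^{-1/(4ma_0)}) \tag{21}$$

holds uniformly for all $\lambda \in [\lambda_1, \lambda_2]$; the argument used here will be used throughout the proof of Lemma 3. Since $\delta_2 < \delta_1$, there exists $a > 1$ such that $a\delta_2 < \delta_1$. Define the index set $\Delta = \{\delta : \delta = a^i\delta_2 \text{ for some positive integer } i \text{ and } \delta < \delta_1\}$. Then $\Delta$ is a finite partition of $[\delta_2, \delta_1]$. Correspondingly, $\{n^{-\delta} : \delta \in \Delta\}$ is a finite partition of $[\lambda_1, \lambda_2]$. For any $\tau = n^{-\delta}$ with $\delta \in \Delta$, $E z_r^T S_\tau z_s = \sigma_{rs}\,\mathrm{tr}\,S_\tau$ and $\mathrm{Var}(z_r^T S_\tau z_s) \le c_0\,\mathrm{tr}\,S_\tau^2 = O(\tau^{-1/(2m)})$ by (20) and Lemma 2(c). Thus by the Chebyshev inequality, we have

$$z_r^T S_\tau z_s - \sigma_{rs}\,\mathrm{tr}\,S_\tau = O_p(\tau^{-1/(4m)}). \tag{22}$$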
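The finite grid used in this argument can be made concrete. The sketch below builds the grid points $a^i\delta_2$ below $\delta_1$ and the corresponding intervals $[n^{-a\delta}, n^{-\delta}]$ on which the uniform bounds are later established. It assumes that the index $i$ starts at 0, so that the first interval begins at $\lambda_2 = n^{-\delta_2}$; the function names and the numerical values of `n`, `delta1`, `delta2` and `a` are hypothetical.

```python
def delta_grid(delta1, delta2, a):
    """Return the grid points a**i * delta2 (i = 0, 1, ...) lying below delta1."""
    if not (0 < delta2 < delta1 and a > 1):
        raise ValueError("need 0 < delta2 < delta1 and a > 1")
    grid = []
    delta = delta2
    while delta < delta1:
        grid.append(delta)
        delta *= a
    return grid


def covering_intervals(n, grid, a):
    """Map each grid point delta to the interval [n**(-a*delta), n**(-delta)]."""
    return [(n ** (-a * d), n ** (-d)) for d in grid]


# Illustrative values only (assumed, not from the paper).
n, delta1, delta2, a = 10_000, 0.8, 0.2, 1.5
grid = delta_grid(delta1, delta2, a)
intervals = covering_intervals(n, grid, a)
lam1, lam2 = n ** (-delta1), n ** (-delta2)
```

Consecutive intervals share an endpoint and together cover $[\lambda_1, \lambda_2]$, so a bound that is uniform on each of the finitely many intervals is uniform on the whole range.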

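Write

$$(z_r^T S_\lambda z_s - \sigma_{rs}\,\mathrm{tr}\,S_\lambda) - (z_r^T S_\tau z_s - \sigma_{rs}\,\mathrm{tr}\,S_\tau) = \sum_{k=1}^{n}\left[\frac{1}{1+\lambda_{kn}\lambda} - \frac{1}{1+\lambda_{kn}\tau}\right](\xi_{krn}\xi_{ksn} - \sigma_{rs}) = \frac{\tau-\lambda}{\lambda}\sum_{k=1}^{n}\left(1 - \frac{1}{1+\lambda_{kn}\lambda}\right)\frac{\xi_{krn}\xi_{ksn} - \sigma_{rs}}{1+\lambda_{kn}\tau}. \tag{23}$$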


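Note that $(1+\lambda_{kn}\lambda)^{-1}$ are nonincreasing in $k$ and bounded above by 1, and that $\{(\xi_{krn}\xi_{ksn} - \sigma_{rs})(1+\lambda_{kn}\tau)^{-1}\}_{1\le k\le n}$ satisfy the assumption of Lemma 6 with $u_k = c^*(1+\lambda_{kn}\tau)^{-2}$. Then, for any $c > 0$ and $\delta \in \Delta$, we have

$$P\left\{\sup_{n^{-a\delta}\le\lambda\le n^{-\delta}}\left|\sum_{k=1}^{n}\frac{1}{1+\lambda_{kn}\lambda}\,\frac{\xi_{krn}\xi_{ksn}-\sigma_{rs}}{1+\lambda_{kn}\tau}\right| > c\right\} \le P\left\{\sup_{0\le 1/(1+\lambda_{nn}\lambda)\le\cdots\le 1/(1+\lambda_{1n}\lambda)\le 1}\left|\sum_{k=1}^{n}\frac{1}{1+\lambda_{kn}\lambda}\,\frac{\xi_{krn}\xi_{ksn}-\sigma_{rs}}{1+\lambda_{kn}\tau}\right| > c\right\} \le c^{-2}(\log_2 4n)^2\sum_{k=1}^{n}u_k,$$

by applying Lemma 6 to (23). Since

$$\sum_{k} u_k = c^*\sum_{k}(1+\lambda_{kn}\tau)^{-2} = c^* c_2\,\tau^{-1/(2m)}(1+o(1))$$

by Lemma 2(c), these arguments lead to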

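$$(z_r^T S_\lambda z_s - \sigma_{rs}\,\mathrm{tr}\,S_\lambda) - (z_r^T S_\tau z_s - \sigma_{rs}\,\mathrm{tr}\,S_\tau) = O_p(\tau^{-1/(4m)}\log n) \tag{24}$$

uniformly for all $\lambda \in [n^{-a\delta}, n^{-\delta}]$. Then, by (22), for $n^{-a\delta} \le \lambda \le n^{-\delta}$,

$$z_r^T S_\lambda z_s - \sigma_{rs}\,\mathrm{tr}\,S_\lambda = O_p(\tau^{-1/(4m)}) + O_p(\tau^{-1/(4m)}\log n) = o_p(\lambda^{-1/(4ma_0)}),$$

where $a_0$ is any fixed constant satisfying $1/a_0 > a > 1$. Since the cardinality of $\Delta$ is finite, (21) holds.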

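Now it remains to study the case when $l \ge 2$. Note that $E z_r^T S_\tau^{l} z_s = \sigma_{rs}\,\mathrm{tr}\,S_\tau^{l} = c_l\sigma_{rs}\tau^{-1/(2m)}(1+o(1))$ and $\mathrm{Var}(z_r^T S_\tau^{l} z_s) \le c_0\,\mathrm{tr}\,S_\tau^{2l} = O(\tau^{-1/(2m)})$ by (20) and Lemma 2(c). Hence $z_r^T S_\tau^{l} z_s - \sigma_{rs}\,\mathrm{tr}\,S_\tau^{l} = O_p(\tau^{-1/(4m)})$ by the Chebyshev inequality. Some algebra shows that

$$(z_r^T S_\lambda^{l} z_s - \sigma_{rs}\,\mathrm{tr}\,S_\lambda^{l}) - (z_r^T S_\tau^{l} z_s - \sigma_{rs}\,\mathrm{tr}\,S_\tau^{l}) = \sum_{k=1}^{n}\left[(1+\lambda_{kn}\lambda)^{-l} - (1+\lambda_{kn}\tau)^{-l}\right](\xi_{krn}\xi_{ksn}-\sigma_{rs})$$

$$= \left(\frac{\tau}{\lambda} - 1\right)\sum_{i=0}^{l-1}\sum_{k=1}^{n}\left[\frac{1}{(1+\lambda_{kn}\lambda)^{i}} - \frac{1}{(1+\lambda_{kn}\lambda)^{i+1}}\right]\frac{\xi_{krn}\xi_{ksn}-\sigma_{rs}}{(1+\lambda_{kn}\tau)^{l-i}}.$$

Note that $(1+\lambda_{kn}\lambda)^{-i}$ are nonincreasing in $k$ and bounded above by 1. Hence, (a) holds by applying Lemma 6 to each term on the right-hand side of the above expression and by the argument used in showing (21). □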
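The algebra behind the last display can be checked numerically. The sketch below assumes the display rests on the scalar identity $x^l - y^l = (\tau/\lambda - 1)\sum_{i=0}^{l-1}(x^i - x^{i+1})\,y^{l-i}$ with $x = (1+\lambda_{kn}\lambda)^{-1}$ and $y = (1+\lambda_{kn}\tau)^{-1}$. The eigenvalue stand-ins and the pairs $(\lambda, \tau)$ are hypothetical test values.

```python
import numpy as np


def lhs(lam, tau, eig, l):
    """(1 + eig*lam)^(-l) - (1 + eig*tau)^(-l), elementwise in eig."""
    return (1.0 + eig * lam) ** (-l) - (1.0 + eig * tau) ** (-l)


def rhs(lam, tau, eig, l):
    """(tau/lam - 1) * sum_{i=0}^{l-1} [x^i - x^(i+1)] * y^(l-i)."""
    x = 1.0 / (1.0 + eig * lam)
    y = 1.0 / (1.0 + eig * tau)
    total = sum((x ** i - x ** (i + 1)) * y ** (l - i) for i in range(l))
    return (tau / lam - 1.0) * total


eig = np.array([0.0, 0.5, 3.0, 40.0, 1e3])   # stand-ins for lambda_kn
checks = [np.allclose(lhs(lam, tau, eig, l), rhs(lam, tau, eig, l))
          for l in (1, 2, 3, 5)
          for lam, tau in ((0.01, 0.02), (0.3, 0.7))]
```

Each summand on the right pairs a weight depending only on $\lambda$ with a factor depending only on $\tau$, which is the form Lemma 6 requires.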


PROOF OF PART (b). (b) follows from (a) by identifying $z_r$ and $z_s$ in (a) with $e$ in (b). □

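PROOF OF PART (c). For any finite positive integer $l$, observe that $E\,n^{-1/2}f^T(I-S_\tau)^{l}e = 0$ and

$$\mathrm{Var}[n^{-1/2}f^T(I-S_\tau)^{l}e] = n^{-1}\sigma^2 f^T(I-S_\tau)^{2l}f \le n^{-1}\sigma^2 f^T(I-S_\tau)^{2}f = O(\tau^2),$$

since the eigenvalues of $S_\tau$ are between 0 and 1. Hence, for any given $\tau \in [\lambda_1, \lambda_2]$,

$$n^{-1/2}f^T(I-S_\tau)^{l}e = O_p(\tau). \tag{25}$$

For $\tau = n^{-\delta}$ with $\delta \in \Delta$, write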

fT[(I SA) - (I - S7)']e

(26) AT

[E

_(i

1+ +AkA) ( I

+

AknrT)]

1 _AknrnT _n

1 + AknA 1 + Aknr

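where $f_{kn} = n^{-1/2}\sum_{i=1}^{n} f(t_{in})\phi_{kn}(t_{in})$. Note that $\{[\lambda_{kn}\tau/(1+\lambda_{kn}\tau)]f_{kn}\varepsilon_{kn}\}$ does not depend on $\lambda$, that $E(f_{kn}\varepsilon_{kn}f_{ln}\varepsilon_{ln}) = 0$ for $k \ne l$, that $\{(1+\lambda_{kn}\lambda)^{-i}(1+\lambda_{kn}\tau)^{-j}\}$, for $1 \le i, j \le l$, are nonincreasing in $k$, and that

$$\mathrm{Var}\left(n^{-1/2}\sum_{k=1}^{n}\frac{\lambda_{kn}\tau}{1+\lambda_{kn}\tau}f_{kn}\varepsilon_{kn}\right) = n^{-1}\sigma^2\sum_{k=1}^{n}\left(\frac{\lambda_{kn}\tau}{1+\lambda_{kn}\tau}\right)^2 f_{kn}^2 = n^{-1}\sigma^2 f^T(I-S_\tau)^2 f = O(\tau^2).$$

It follows from Remark 1 following Lemma 6 that we can apply Lemma 6 to each term on the right-hand side of (26). Thus we conclude that

$$n^{-1/2}[f^T(I-S_\lambda)^{l}e - f^T(I-S_\tau)^{l}e] = O_p((\tau-\lambda)\log n) \tag{27}$$

holds uniformly for all $\lambda \in [n^{-a\delta}, n^{-\delta}]$. By (25) and (27), for any $a_1 > a$, $n^{-1/2}f^T(I-S_\lambda)^{l}e = O_p(\lambda^{1/a_1})$ holds uniformly for all $\lambda \in [\lambda_1, \lambda_2]$ and finite positive integer $l$. Hence, (c) holds. □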

PROOF OF PART (d). (d) can be shown similarly by identifying $e$ in (c) with $z_r$ in (d). □

PROOF OF PART (e). Note that


by the Cauchy-Schwarz inequality. Since the eigenvalues of $S_\lambda$ are between 0 and 1, (e) holds by Lemma 2(a). □

PROOF OF PART (f). Write $X^T(I-S_\lambda)^2 Z = Z^TZ + H^T(I-S_\lambda)^2 Z + Z^T(S_\lambda^2 - 2S_\lambda)Z$. Then, by (A1) (in Section 2) and the law of large numbers, $n^{-1}Z^TZ = \Sigma + o_p(1)$. Hence, (f) follows from (a) and (d). □

PROOF OF PART (g). Note that

$$n^{-1}x_r^T(I-S_\lambda)^{l}x_s = n^{-1}z_r^T(I-S_\lambda)^{l}z_s + n^{-1}h_r^T(I-S_\lambda)^{l}z_s + n^{-1}h_s^T(I-S_\lambda)^{l}z_r + n^{-1}h_r^T(I-S_\lambda)^{l}h_s.$$

Recall that $n^{-1}Z^TZ = \Sigma + o_p(1)$. Hence, it follows easily from (a), (d) and (e) that (g) holds. □

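PROOF OF PART (h). Write

$$x_r^T(I-S_\lambda)^2S_\lambda x_s = z_r^T(S_\lambda - 2S_\lambda^2 + S_\lambda^3)z_s + h_r^T(I-S_\lambda)^2S_\lambda h_s + h_r^T[(I-S_\lambda)^2 - (I-S_\lambda)^3]z_s + h_s^T[(I-S_\lambda)^2 - (I-S_\lambda)^3]z_r.$$

Since $|h_r^T(I-S_\lambda)^2S_\lambda h_s| = O(n\lambda^2)$, (h) holds by (a) and (d). □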

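PROOF OF PART (i). Observe that $E z_r^T S_\tau^{l}e = 0$ and $\mathrm{Var}(z_r^T S_\tau^{l}e) = \sigma^2 E(z_r^T S_\tau^{2l}z_r) = O(\tau^{-1/(2m)})$ by (20) and Lemma 2(c). Hence $z_r^T S_\tau^{l}e = O_p(\tau^{-1/(4m)})$, for any given sequence $\tau = n^{-\delta}$ with $\delta \in \Delta$. Write

$$z_r^T S_\lambda^{l}e - z_r^T S_\tau^{l}e = \left(\frac{\tau}{\lambda} - 1\right)\sum_{v=0}^{l-1}\sum_{k=1}^{n}\left[\frac{1}{(1+\lambda_{kn}\lambda)^{v}} - \frac{1}{(1+\lambda_{kn}\lambda)^{v+1}}\right]\frac{\xi_{krn}\varepsilon_{kn}}{(1+\lambda_{kn}\tau)^{l-v}}. \tag{28}$$

Note that $(1+\lambda_{kn}\lambda)^{-i}$ are nonincreasing in $k$ and bounded above by 1. By Lemma 7, $\{\xi_{krn}\varepsilon_{kn}(1+\lambda_{kn}\tau)^{-i}\}$ satisfies the assumption of Lemma 6. By applying Lemma 6 to each term on the right-hand side of (28), we conclude that

$$z_r^T S_\lambda^{l}e = O_p\big((1-\lambda^{-1}\tau)\,\tau^{-1/(4m)}\log n\big) = o_p(\lambda^{-1/(4ma_0)}) \tag{29}$$

holds uniformly over $\lambda \in [n^{-a\delta}, n^{-\delta}]$. Hence, (i) holds. □

PROOF OF PART (j). Write

$$n^{-1/2}z_r^T(I-S_\lambda)^2 e = n^{-1/2}z_r^Te + n^{-1/2}z_r^T(-2S_\lambda + S_\lambda^2)e.$$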


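PROOF OF PARTS (k) AND (l). It follows from (a) and (c) that

$$\begin{aligned}
x_r^T(I-S_\lambda)^3x_s - x_r^T(I-S_\lambda)^2z_s
&= -z_r^T(I-S_\lambda)^2S_\lambda z_s - h_r^T(I-S_\lambda)^2z_s + h_r^T(I-S_\lambda)^3z_s \\
&\quad + h_s^T(I-S_\lambda)^3z_r + h_r^T(I-S_\lambda)^3h_s \\
&= -(c_1 - 2c_2 + c_3)\sigma_{rs}\lambda^{-1/(2m)} + o_p(\lambda^{-1/(4ma_0)}) + O_p(n^{1/2}\lambda^{1/a_1}) + h_r^T(I-S_\lambda)^3h_s,
\end{aligned}$$

$$\begin{aligned}
x_r^T(I-S_\lambda)^3x_s - x_r^T(I-S_\lambda)^4x_s
&= x_r^T(I-S_\lambda)^3S_\lambda x_s \\
&= z_r^T(I-S_\lambda)^3S_\lambda z_s + h_s^T[(I-S_\lambda)^3 - (I-S_\lambda)^4]z_r \\
&\quad + h_r^T[(I-S_\lambda)^3 - (I-S_\lambda)^4]z_s + h_r^T(I-S_\lambda)^3S_\lambda h_s \\
&= (c_1 - 3c_2 + 3c_3 - c_4)\sigma_{rs}\lambda^{-1/(2m)} + o_p(\lambda^{-1/(4ma_0)}) + O_p(n^{1/2}\lambda^{1/a_1}) \\
&\quad + h_r^T(I-S_\lambda)^3S_\lambda h_s.
\end{aligned}$$

Hence, we conclude (k) and (l). □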

7. Proofs of Lemmas 4 and 5.

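PROOF OF LEMMA 4. Recall $R_{0n}(\lambda) = \lambda^2 + n^{-1}\lambda^{-1/(2m)}$. From now on, we require that the three constants $a$, $a_0$ and $a_1$ in Lemma 3 satisfy $4m/(4m-1) > a_1 > a$ and $a_0 > 1/2$ so that, for $\lambda \in [\lambda_1, \lambda_2]$,

$$n^{-1}\lambda^{-1/(4ma_0)} = o(\lambda^2 + n^{-1}\lambda^{-1/(2m)}) = o(R_{0n}(\lambda)) \tag{30}$$

and

$$n^{-1/2}\lambda^{1/a_1} = o(\lambda^2 + n^{-1}\lambda^{-1/(2m)}) = o(R_{0n}(\lambda)). \tag{31}$$

Equations (30) and (31) can be verified by simple algebra. Recall that $A_{0\lambda} = S_\lambda + B_{0\lambda}$, where $B_{0\lambda} = n^{-1}(I-S_\lambda)^2 X A_0^{-1}(\lambda) X^T(I-S_\lambda)^2$ and $A_0(\lambda) = n^{-1}X^T(I-S_\lambda)^3X$.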
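The algebra behind (30) and (31) is short. The following is a sketch of one way to verify them, reading the condition on $a_0$ as $a_0 > 1/2$ and assuming only that $\lambda \le \lambda_2 \to 0$ on $[\lambda_1, \lambda_2]$; the use of the arithmetic-geometric mean inequality is our choice of route and is not taken from the paper.

```latex
\begin{align*}
% (30): compare with the variance term of R_{0n}(\lambda)
\frac{n^{-1}\lambda^{-1/(4ma_0)}}{n^{-1}\lambda^{-1/(2m)}}
  &= \lambda^{\frac{1}{2m}-\frac{1}{4ma_0}}
   = \lambda^{\frac{2a_0-1}{4ma_0}} \longrightarrow 0
  && \text{since } a_0 > 1/2 \text{ and } \lambda \to 0. \\[4pt]
% (31): lower-bound R_{0n}(\lambda) by the AM-GM inequality
\lambda^{2} + n^{-1}\lambda^{-1/(2m)}
  &\ge 2\sqrt{\lambda^{2}\, n^{-1}\lambda^{-1/(2m)}}
   = 2\, n^{-1/2}\lambda^{1-\frac{1}{4m}}, \\[4pt]
\frac{n^{-1/2}\lambda^{1/a_1}}{\lambda^{2} + n^{-1}\lambda^{-1/(2m)}}
  &\le \tfrac{1}{2}\,\lambda^{\frac{1}{a_1}-\frac{4m-1}{4m}} \longrightarrow 0
  && \text{since } a_1 < \tfrac{4m}{4m-1}.
\end{align*}
```

Both bounds depend on $\lambda$ only through a positive power of $\lambda$, so they hold uniformly over $[\lambda_1, \lambda_2]$.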

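[(a) and (b)] By Lemma 3(g), we have

$$\mathrm{tr}\,B_{0\lambda} = \mathrm{tr}\{A_0^{-1}(\lambda)[n^{-1}X^T(I-S_\lambda)^4X]\} = O_p(1), \tag{32}$$

$$\mathrm{tr}\,B_{0\lambda}^2 = \mathrm{tr}\{A_0^{-1}(\lambda)[n^{-1}X^T(I-S_\lambda)^4X]\}^2 = O_p(1), \tag{33}$$

$$\mathrm{tr}\,S_\lambda B_{0\lambda} = \mathrm{tr}\,A_0^{-1}(\lambda)\{n^{-1}X^T[(I-S_\lambda)^4 - (I-S_\lambda)^5]X\} = o_p(1). \tag{34}$$

This, together with Lemma 2(c), proves (a) and (b).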
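The trace identities in (32)-(34) follow from the cyclic property of the trace and from the fact that $S_\lambda$ commutes with $I - S_\lambda$. The sketch below checks them numerically, assuming $B_{0\lambda} = n^{-1}(I-S_\lambda)^2XA_0^{-1}(\lambda)X^T(I-S_\lambda)^2$ with $A_0(\lambda) = n^{-1}X^T(I-S_\lambda)^3X$. The random symmetric `S` with eigenvalues in (0, 1) and the design `X` are hypothetical stand-ins, and the helper `moment` is introduced only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 12, 2

# Symmetric stand-in for S_lambda with eigenvalues strictly between 0 and 1.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
S = Q @ np.diag(rng.uniform(0.05, 0.95, n)) @ Q.T
X = rng.standard_normal((n, d))
R = np.eye(n) - S                      # I - S_lambda


def moment(p):
    """Return n^{-1} X^T (I - S)^p X."""
    return X.T @ np.linalg.matrix_power(R, p) @ X / n


A0_inv = np.linalg.inv(moment(3))      # A_0(lambda)^{-1}
B0 = R @ R @ X @ A0_inv @ X.T @ R @ R / n

C = A0_inv @ moment(4)
tr_B0 = (np.trace(B0), np.trace(C))                      # identity in (32)
tr_B0_sq = (np.trace(B0 @ B0), np.trace(C @ C))          # identity in (33)
tr_S_B0 = (np.trace(S @ B0),
           np.trace(A0_inv @ (moment(4) - moment(5))))   # identity in (34)
```

Each tuple holds the direct $n \times n$ trace and its $d \times d$ counterpart; they agree up to rounding, which reduces the orders in (32)-(34) to statements about the moment matrices $n^{-1}X^T(I-S_\lambda)^pX$.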

(c) It follows from Lemma 2(b) that


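Also, by Lemma 1(a) and Lemma 3(d) and (e), we have

$$\begin{aligned}
n^{-1}g_0^TB_{0\lambda}^2g_0 &= [n^{-1}g_0^T(I-S_\lambda)^2H + n^{-1}g_0^T(I-S_\lambda)^2Z] \\
&\quad\times \{A_0^{-1}(\lambda)[n^{-1}X^T(I-S_\lambda)^4X]A_0^{-1}(\lambda)\} \\
&\quad\times [n^{-1}H^T(I-S_\lambda)^2g_0 + n^{-1}Z^T(I-S_\lambda)^2g_0] \\
&= [O(\lambda^2) + o_p(n^{-1/2})][\Sigma^{-1} + o_p(1)][O(\lambda^2) + o_p(n^{-1/2})] = o_p(\lambda^2).
\end{aligned}$$

This, together with the Cauchy-Schwarz inequality and (35), leads to (c).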

(d) Write

$$n^{-1}Z^T(I-A_{0\lambda})^2Z = n^{-1}\{Z^T(S_\lambda^2 - 2S_\lambda)Z + Z^T[(I-B_{0\lambda}) - (B_{0\lambda} - B_{0\lambda}^2)]Z + Z^TS_\lambda B_{0\lambda}Z + Z^TB_{0\lambda}S_\lambda Z\}.$$
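The four-term split used in (d) is a purely algebraic expansion of $(I - S_\lambda - B_{0\lambda})^2$ and does not require the two matrices to commute. A minimal numerical check, assuming only $A_{0\lambda} = S_\lambda + B_{0\lambda}$ and using arbitrary random matrices as stand-ins:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6

S = rng.standard_normal((n, n))
S = (S + S.T) / 2.0                    # symmetric stand-in for S_lambda
B = rng.standard_normal((n, n))        # stand-in for B_0lambda
I = np.eye(n)
A = S + B                              # stand-in for A_0lambda

left = (I - A) @ (I - A)
# The four groups of terms, in the order they are treated in the proof.
right = (S @ S - 2.0 * S) + ((I - B) - (B - B @ B)) + S @ B + B @ S
```

The two sides agree up to rounding, so each group can be bounded separately.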

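By Lemma 3(a), we have the first term

$$n^{-1}(Z\beta)^T(S_\lambda^2 - 2S_\lambda)(Z\beta) = (c_2 - 2c_1)\beta^T\Sigma\beta\, n^{-1}\lambda^{-1/(2m)}(1 + o_p(1)). \tag{36}$$

It follows from Lemma 3(a), (d) and (f), Lemma 1(a), (30) and (31) that the third term

$$n^{-1}\beta^TZ^TS_\lambda B_{0\lambda}Z\beta = \beta^T[n^{-1}Z^TS_\lambda(I-S_\lambda)^2X]A_0^{-1}(\lambda)[n^{-1}X^T(I-S_\lambda)^2Z]\beta = (c_1 - 2c_2 + c_3)\beta^T\Sigma\beta\, n^{-1}\lambda^{-1/(2m)}(1 + o_p(1)).$$

The fourth term has the same rate. Observe that

$$\begin{aligned}
n^{-1}Z^T(I-B_{0\lambda})Z &= [n^{-1}Z^TZA_0^{-1}(\lambda)][A_0(\lambda) - n^{-1}X^T(I-S_\lambda)^2Z] \\
&\quad + \{[n^{-1}Z^T(2S_\lambda - S_\lambda^2)Z] - [n^{-1}Z^T(I-S_\lambda)^2H]\}\, A_0^{-1}(\lambda)[n^{-1}X^T(I-S_\lambda)^2Z],
\end{aligned}$$

$$n^{-1}Z^T(B_{0\lambda} - B_{0\lambda}^2)Z = [n^{-1}Z^T(I-S_\lambda)^2X]A_0^{-1}(\lambda)[A_0(\lambda) - n^{-1}X^T(I-S_\lambda)^4X]A_0^{-1}(\lambda)[n^{-1}X^T(I-S_\lambda)^2Z].$$

Then by Lemma 3(d) and (f), we have $n^{-1}H^T(I-S_\lambda)^2Z = o_p(\lambda^2 + n^{-1}\lambda^{-1/(2m)})$ and $n^{-1}X^T(I-S_\lambda)^2Z = \Sigma + o_p(1)$. Also, by (A1) (in Section 2) and the law of large numbers, we see that $n^{-1}Z^TZ = \Sigma + o_p(1)$. Hence, it follows from Lemma 1(a) that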


Hence, by Lemma 1(a), (36) and Lemma 3(k) and (l), we conclude that

n- (Z,f3)T [(I -Bo,\) - (Bo, - _{Bo,)] (Z,e3)}

= _{n-1 [(Hj3)T(I} - + o(A-/4'1o) + Op(fl-/2A1/al)

+ (4C2- 4c3 + c4)fTE, -Al/2m] (1 + _{op (1)).}

This, together with (36) and (37), proves (d).

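(e) Note that $n^{-1}g_0^T(I-S_\lambda)^{l}h_r = O(\lambda^2)$, for $l \ge 2$, and

$$n^{-1}X^T(I-S_\lambda)^2e = O_p(n^{-1/2}\lambda^{1/a_1}) + O_p(n^{-1/2}) = O_p(n^{-1/2}) \tag{39}$$

by Lemma 3(c) and (j). Then by (31), (39), Lemma 1(a) and Lemma 3(d), (e) and (g),

$$\begin{aligned}
n^{-1}g_0^TB_{0\lambda}^2e &= [n^{-1}g_0^T(I-S_\lambda)^2X]A_0^{-1}(\lambda)[n^{-1}X^T(I-S_\lambda)^4X]A_0^{-1}(\lambda)\,[n^{-1}X^T(I-S_\lambda)^2e] \\
&= O_p(n^{-1/2}\lambda^{1/a_1} + \lambda^2)\,[O_p(n^{-1/2}\lambda^{1/a_1}) + O_p(n^{-1/2})] \\
&= o_p(R_{0n}(\lambda)).
\end{aligned}$$

Similarly, we have

$$n^{-1}g_0^T(I-S_\lambda)B_{0\lambda}e = [n^{-1}g_0^T(I-S_\lambda)^3X]A_0^{-1}(\lambda)[n^{-1}X^T(I-S_\lambda)^2e] = o_p(R_{0n}(\lambda)).$$

By Lemma 3(c) and (31), $n^{-1}g_0^T(I-S_\lambda)^2e = o_p(R_{0n}(\lambda))$. Putting these results together, we have (e).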

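(f) Write $(I-A_{0\lambda})^2 = S_\lambda^2 - (I-B_{0\lambda})S_\lambda - S_\lambda(I-B_{0\lambda}) + (I-B_{0\lambda}) - (B_{0\lambda} - B_{0\lambda}^2)$. By the central limit theorem, $n^{-1}Z^Te = O_p(n^{-1/2})$. Then it follows from Lemma 1(a) and Lemma 3(c), (k), (f) and (i) that

$$\begin{aligned}
n^{-1}Z^T(I-B_{0\lambda})e &= [A_0(\lambda) - n^{-1}Z^T(I-S_\lambda)^2X]A_0^{-1}(\lambda)(n^{-1}Z^Te) \\
&\quad - [n^{-1}Z^T(I-S_\lambda)^2X]A_0^{-1}(\lambda)\,[n^{-1}H^T(I-S_\lambda)^2e - n^{-1}Z^T(2S_\lambda - S_\lambda^2)e] \\
&= o_p(R_{0n}(\lambda)).
\end{aligned}$$

By Lemma 3(l), we have $A_0(\lambda) - n^{-1}X^T(I-S_\lambda)^4X = O_p(R_{0n}(\lambda))$. It then follows by (39) and Lemma 3(f) that

$$n^{-1}Z^T(B_{0\lambda} - B_{0\lambda}^2)e = [n^{-1}Z^T(I-S_\lambda)^2X]A_0^{-1}(\lambda)\,[A_0(\lambda) - n^{-1}X^T(I-S_\lambda)^4X]A_0^{-1}(\lambda)[n^{-1}X^T(I-S_\lambda)^2e]$$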


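Next,

$$\begin{aligned}
n^{-1}Z^T(I-B_{0\lambda})S_\lambda e &= n^{-1}Z^TS_\lambda e - [n^{-1}Z^T(I-S_\lambda)^2X]A_0^{-1}(\lambda)\,\{n^{-1}X^T[(I-S_\lambda)^2 - (I-S_\lambda)^3]e\} \\
&= o_p(R_{0n}(\lambda)),
\end{aligned} \tag{40}$$

since $n^{-1}Z^TS_\lambda e = o_p(R_{0n}(\lambda))$, by Lemma 3(i) and the fact that

$$\begin{aligned}
n^{-1}X^T[(I-S_\lambda)^2 - (I-S_\lambda)^3]e &= n^{-1}H^T[(I-S_\lambda)^2 - (I-S_\lambda)^3]e + n^{-1}Z^T(S_\lambda - 2S_\lambda^2 + S_\lambda^3)e \\
&= o_p(R_{0n}(\lambda)),
\end{aligned} \tag{41}$$

by Lemma 3(c) and (i). Similarly, $n^{-1}Z^TS_\lambda(I-B_{0\lambda})e = o_p(R_{0n}(\lambda))$. Finally, $n^{-1}Z^TS_\lambda^2e = o_p(R_{0n}(\lambda))$ by Lemma 3(i). Combining all the terms, we have (f).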

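(g) Write $A_{0\lambda}^2 - 2A_{0\lambda} = S_\lambda^2 - 2S_\lambda + (B_{0\lambda}S_\lambda + S_\lambda B_{0\lambda}) + B_{0\lambda}^2 - 2B_{0\lambda}$. First, we note that

$$n^{-1}(e^TS_\lambda^{l}e - \sigma^2\,\mathrm{tr}\,S_\lambda^{l}) = o_p(n^{-1}\lambda^{-1/(4ma_0)}) = o_p(R_{0n}(\lambda)), \tag{42}$$

by the proofs of Lemma 3(a) and (b) and (30). Next, by Lemma 1(a), (39) and (41) and Lemma 3(g),

$$n^{-1}e^TB_{0\lambda}e = [n^{-1}e^T(I-S_\lambda)^2X]A_0^{-1}(\lambda)[n^{-1}X^T(I-S_\lambda)^2e] = [O_p(n^{-1/2}\lambda^{1/a_1}) + O_p(n^{-1/2})]^2 = o_p(R_{0n}(\lambda)), \tag{43}$$

$$n^{-1}e^TB_{0\lambda}S_\lambda e = [n^{-1}e^T(I-S_\lambda)^2X]A_0^{-1}(\lambda)[n^{-1}X^T(I-S_\lambda)^2S_\lambda e] = o_p(R_{0n}(\lambda)), \tag{44}$$

$$n^{-1}e^TB_{0\lambda}^2e = [n^{-1}e^T(I-S_\lambda)^2X]A_0^{-1}(\lambda)[n^{-1}X^T(I-S_\lambda)^4X]A_0^{-1}(\lambda)[n^{-1}X^T(I-S_\lambda)^2e] = o_p(R_{0n}(\lambda)). \tag{45}$$

Part (g) holds by (32)-(34), (43)-(45) and (42). □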

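PROOF OF LEMMA 5. Recall that $A_{1\lambda} = S_\lambda + B_{1\lambda}$, where $B_{1\lambda} = n^{-1}(I-S_\lambda)XA_1^{-1}(\lambda)X^T(I-S_\lambda)^2$ and $A_1(\lambda) = n^{-1}X^T(I-S_\lambda)^2X$.

(a) By Lemma 3(d), (e) and (g),

$$n^{-1}g_0^TB_{1\lambda}^TB_{1\lambda}g_0 = [n^{-1}g_0^T(I-S_\lambda)^2X]A_1^{-1}(\lambda)[n^{-1}X^T(I-S_\lambda)^2X]A_1^{-1}(\lambda)\,[n^{-1}X^T(I-S_\lambda)^2g_0]$$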


It can be easily verified that $O_p(n^{-1}\lambda^{2/a_1}) = o_p(\lambda^2)$. This, together with the Cauchy-Schwarz inequality and (35), proves (a).

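(b) Write

$$\beta^TZ^T(I-A_{1\lambda})^T(I-A_{1\lambda})Z\beta = \beta^TZ^T(S_\lambda^2 - 2S_\lambda)Z\beta + 2\beta^TZ^TS_\lambda B_{1\lambda}Z\beta + \beta^TZ^T[(I-B_{1\lambda}) - B_{1\lambda}^T(I-B_{1\lambda})]Z\beta.$$

By (36), Lemma 3(a) and (f) and Lemma 1(b), we have the second term,

$$\begin{aligned}
n^{-1}\beta^TZ^TS_\lambda B_{1\lambda}Z\beta &= \beta^T[n^{-1}Z^TS_\lambda(I-S_\lambda)X]A_1^{-1}(\lambda)[n^{-1}X^T(I-S_\lambda)^2Z]\beta \\
&= (c_1 - c_2)\beta^T\Sigma\beta\, n^{-1}\lambda^{-1/(2m)} + o_p(n^{-1}\lambda^{-1/(4ma_0)}) + O_p(n^{-1/2}\lambda^{1/a_1}).
\end{aligned} \tag{46}$$

Observe that

$$\begin{aligned}
n^{-1}Z^T(I-B_{1\lambda})Z &= [n^{-1}Z^TZA_1^{-1}(\lambda)][A_1(\lambda) - n^{-1}X^T(I-S_\lambda)^2Z] \\
&\quad + (n^{-1}Z^TS_\lambda Z)A_1^{-1}(\lambda)[n^{-1}X^T(I-S_\lambda)^2Z] - [n^{-1}Z^T(I-S_\lambda)H]A_1^{-1}(\lambda)[n^{-1}X^T(I-S_\lambda)^2Z],
\end{aligned}$$

$$\begin{aligned}
n^{-1}Z^TB_{1\lambda}^T(I-B_{1\lambda})Z &= [n^{-1}Z^T(I-S_\lambda)^2X]A_1^{-1}(\lambda) \\
&\quad\times \{[n^{-1}X^T(I-S_\lambda)Z - n^{-1}X^T(I-S_\lambda)^2X] \\
&\qquad + [n^{-1}X^T(I-S_\lambda)^2X]A_1^{-1}(\lambda)\,[A_1(\lambda) - n^{-1}X^T(I-S_\lambda)^2Z]\}.
\end{aligned}$$

It follows from Lemma 3(a) and (d) that

$$\begin{aligned}
x_r^T(I-S_\lambda)^2x_s - x_r^T(I-S_\lambda)^2z_s &= x_r^T(I-S_\lambda)^2h_s \\
&= h_r^T(I-S_\lambda)^2h_s + O_p(n^{1/2}\lambda^{1/a_1}) \\
&= h_r^T(I-S_\lambda)^2h_s + o_p(n\lambda^2 + \lambda^{-1/(2m)}),
\end{aligned}$$

$$\begin{aligned}
x_r^T(I-S_\lambda)z_s - x_r^T(I-S_\lambda)^2x_s &= (c_1 - c_2)\sigma_{rs}\lambda^{-1/(2m)} - h_r^T(I-S_\lambda)^2h_s + o_p(\lambda^{-1/(4ma_0)}) + O_p(n^{1/2}\lambda^{1/a_1}) \\
&= (c_1 - c_2)\sigma_{rs}\lambda^{-1/(2m)} - h_r^T(I-S_\lambda)^2h_s + o_p(n\lambda^2 + \lambda^{-1/(2m)}).
\end{aligned}$$

Note that $|n^{-1}h_r^T(I-S_\lambda)^2h_s| = O(\lambda^2)$ by Lemma 3(e). Hence

$$\beta^T[A_1(\lambda) - n^{-1}X^T(I-S_\lambda)^2Z]\beta = n^{-1}[o_p(n\lambda^2 + \lambda^{-1/(2m)}) + (H\beta)^T(I-S_\lambda)^2H\beta](1 + o_p(1)),$$

$$\beta^T[n^{-1}X^T(I-S_\lambda)Z - n^{-1}X^T(I-S_\lambda)^2X]\beta \tag{48}$$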


By Lemma 1(b), (38), (47), (48) and Lemma 3(a), (d), (f) and (g), we conclude that

$$n^{-1}(Z\beta)^T[(I-B_{1\lambda}) - B_{1\lambda}^T(I-B_{1\lambda})](Z\beta) = n^{-1}[(H\beta)^T(I-S_\lambda)^2(H\beta) + c_2\beta^T\Sigma\beta\,\lambda^{-1/(2m)}](1 + o_p(1)). \tag{49}$$

Part (b) holds by (36), (46) and (49).

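(c) Write $(I-A_{1\lambda})^T(I-A_{1\lambda}) = (I-S_\lambda)^2 + B_{1\lambda}^TB_{1\lambda} - (I-S_\lambda)B_{1\lambda} - B_{1\lambda}^T(I-S_\lambda)$. By Lemma 1(b), (48) and Lemma 3(c)-(e), (g) and (j), we have

$$\begin{aligned}
n^{-1}g_0^TB_{1\lambda}^TB_{1\lambda}e &= [n^{-1}g_0^T(I-S_\lambda)^2X]A_1^{-1}(\lambda)[n^{-1}X^T(I-S_\lambda)^2X]A_1^{-1}(\lambda)[n^{-1}X^T(I-S_\lambda)^2e] \\
&= O_p(n^{-1/2}\lambda^{1/a_1} + \lambda^2)\,[O_p(n^{-1/2}\lambda^{1/a_1}) + O_p(n^{-1/2})] \\
&= o_p(R_{1n}(\lambda)),
\end{aligned}$$

$$\begin{aligned}
n^{-1}g_0^T(I-S_\lambda)B_{1\lambda}e &= n^{-1}g_0^TB_{1\lambda}^T(I-S_\lambda)e \\
&= [n^{-1}g_0^T(I-S_\lambda)^2X]A_1^{-1}(\lambda)[n^{-1}X^T(I-S_\lambda)^2e] = o_p(R_{1n}(\lambda)).
\end{aligned}$$

By Lemma 3(c), $n^{-1}g_0^T(I-S_\lambda)^2e = o_p(R_{1n}(\lambda))$. Hence,

$$n^{-1}g_0^T(I-A_{1\lambda})^T(I-A_{1\lambda})e = o_p(R_{1n}(\lambda)). \tag{50}$$

Write

$$(I-A_{1\lambda})^T(I-A_{1\lambda}) = (I-B_{1\lambda})^T(I-B_{1\lambda}) + S_\lambda^2 - (I-B_{1\lambda})^TS_\lambda - S_\lambda(I-B_{1\lambda}),$$

$$Z^T(I-B_{1\lambda})^T(I-B_{1\lambda})e = Z^T(I-B_{1\lambda})e - Z^TB_{1\lambda}^T(I-B_{1\lambda})e.$$

Recall that $n^{-1}Z^Te = O_p(n^{-1/2})$. Then

$$\begin{aligned}
n^{-1}Z^T(I-B_{1\lambda})e &= [A_1(\lambda) - n^{-1}Z^T(I-S_\lambda)X]A_1^{-1}(\lambda)(n^{-1}Z^Te) \\
&\quad - [n^{-1}Z^T(I-S_\lambda)X]A_1^{-1}(\lambda)\,[n^{-1}H^T(I-S_\lambda)^2e - n^{-1}Z^T(2S_\lambda - S_\lambda^2)e] \\
&= o_p(R_{1n}(\lambda)),
\end{aligned}$$

by Lemma 1(a), (47) and Lemma 3(c), (f) and (i). Using the same argument to derive (47), we have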


By (39), Lemma l(b), Lemma 3(f) and (g) and (51), we have n 1ZTBTA _{(I-BlA)e = [n-lZT(I_SA} _)2X]Ail(A)

x [Al(A) - n-lXT(I - SA)2X]Aj1 (A)[n-lXT(I S)2e]

= O(Rln(A)).

Hence,

$$n^{-1}Z^T(I-B_{1\lambda})^T(I-B_{1\lambda})e = o_p(R_{1n}(\lambda)). \tag{52}$$

Then by Lemma 3(c) and (i), we have

$$n^{-1}Z^T(I-B_{1\lambda})^TS_\lambda e = n^{-1}Z^TS_\lambda e - [n^{-1}Z^T(I-S_\lambda)^2X]A_1^{-1}(\lambda)\,[n^{-1}X^T(I-S_\lambda)S_\lambda e] = o_p(R_{1n}(\lambda)). \tag{53}$$

Thus by (52), (53) and Lemma 3(i) we have

$$n^{-1}Z^T(I-A_{1\lambda})^T(I-A_{1\lambda})e = o_p(R_{1n}(\lambda)). \tag{54}$$

Part (c) holds by (50) and (54). □

Acknowledgments. We would like to thank two referees, an Associate Editor and a former Editor (Arthur Cohen) for the careful review. The referees' constructive comments considerably improved the paper and are greatly appreciated. We would also like to thank Professor Paul Speckman for his comments on our earlier work, which led to the development of the two-stage spline smoothing method.

REFERENCES

CHEN, H. and SHIAU, J. H. (1991). A two-stage spline smoothing method for partially linear models. J. Statist. Plann. Inference 27 187-201.

CRAVEN, P. and WAHBA, G. (1979). Smoothing noisy data with spline functions. Numer. Math. 31 377-403.

DEMMLER, A. and REINSCH, C. (1975). Oscillation matrices with spline smoothing. Numer. Math. 24 375-382.

DENBY, L. (1986). Smooth regression function. Statistical Research Report 26, AT&T Bell Laboratories.

ENGLE, R. F., GRANGER, C. W. J., RICE, J. and WEISS, A. (1986). Semiparametric estimates of the relation between weather and electricity sales. J. Amer. Statist. Assoc. 81 310-320.

EUBANK, R. L. (1988). Spline Smoothing and Nonparametric Regression. Dekker, New York.

EUBANK, R. L. and SPECKMAN, P. L. (1991). A bias reduction theorem with applications in nonparametric regression. Scand. J. Statist. 18 211-222.

HECKMAN, N. (1986). Spline smoothing in partly linear models. J. Roy. Statist. Soc. Ser. B 48 244-248.

LI, K.-C. (1986). Asymptotic optimality of $C_L$ and generalized cross-validation in ridge regression with application to spline smoothing. Ann. Statist. 14 1101-1112.

MALLOWS, C. L. (1973). Some comments on $C_p$. Technometrics 15 661-675.


SERFLING, R. J. (1970). Moment inequalities for the maximum cumulative sum. Ann. Math. Statist. 41 1227-1234.

SHIAU, J., WAHBA, G. and JOHNSON, D. R. (1986). Partial spline models for the inclusion of tropopause and frontal boundary information in otherwise smooth two and three dimensional objective analysis. Journal of Atmospheric and Oceanic Technology 3 713-725.

SPECKMAN, P. (1981). The asymptotic integrated mean square error for smoothing noisy data by splines. Unpublished manuscript.

SPECKMAN, P. (1985). Spline smoothing and optimal rates of convergence in nonparametric regression models. Ann. Statist. 13 970-983.

SPECKMAN, P. (1988). Kernel smoothing in partial linear models. J. Roy. Statist. Soc. Ser. B 50 413-436.

WAHBA, G. (1977). Practical approximate solutions to linear operator equations when the data are noisy. SIAM J. Numer. Anal. 14 661-667.

WAHBA, G. (1984). Partial spline models for the semiparametric estimation of functions of several variables. In Statistical Analysis of Time Series, Proceeding of the Japan U.S. Joint Seminar, Tokyo 319-329. Inst. Statist. Math., Tokyo.

WAHBA, G. (1986). Partial and interaction splines for the semiparametric estimation of functions of several variables. In Computer Science and Statistics: Proceedings of the 18th Symposium on the Interface (T. J. Boardman, ed.) 75-80. Amer. Statist. Assoc., Alexandria, VA.

DEPARTMENT OF APPLIED MATHEMATICS

AND STATISTICS

STATE UNIVERSITY OF NEW YORK

STONY BROOK, NEW YORK 11794-3600

INSTITUTE OF STATISTICS

NATIONAL CHIAO-TUNG UNIVERSITY