Author(s): Hung Chen and Jyh-Jen Horng Shiau
Source: The Annals of Statistics, Vol. 22, No. 1 (Mar., 1994), pp. 211-237 Published by: Institute of Mathematical Statistics
Stable URL: http://www.jstor.org/stable/2242451 .
DATA-DRIVEN EFFICIENT ESTIMATORS FOR A PARTIALLY LINEAR MODEL
BY HUNG CHEN$^1$ AND JYH-JEN HORNG SHIAU$^2$
State University of New York, Stony Brook and National Chiao-Tung University
Chen and Shiau showed that a two-stage spline smoothing method and the partial regression method lead to efficient estimators for the parametric component of a partially linear model when the smoothing parameter is a deterministic sequence tending to zero at an appropriate rate. This paper is concerned with the large-sample behavior of these estimators when the smoothing parameter is chosen by the generalized cross validation (GCV) method or Mallows' $C_L$. Under mild conditions, the estimated parametric component is asymptotically normal with the usual parametric rate of convergence for both spline estimation methods. As a by-product, it is shown that the "optimal rate" for the smoothing parameter, with respect to expected average squared error, is the same for the two estimation methods as it is for ordinary smoothing splines.
1. Introduction. In this paper, we study the asymptotic behavior of the two efficient estimators for the parametric component of a partially linear model discussed in Chen and Shiau (1991) when the smoothing parameter is chosen either by the generalized cross validation (GCV) method proposed by Craven and Wahba (1979) or by the Mallows $C_L$ criterion [Mallows (1973)]. As in Chen and Shiau (1991), we consider a semiparametric regression model
$$y_{in} = x_{in}^T\beta + g(t_{in}) + e_{in},\qquad i = 1,\dots,n, \tag{1}$$
where both the $x_{in} = (x_{i1n},\dots,x_{idn})^T$ (a $d$-vector) and $t_{in}\in[0,1]$ are observed design variables, $\beta = (\beta_1,\dots,\beta_d)^T$ is a vector of unknown regression coefficients, $g$ is a smooth function to be estimated and the $\{e_{in}\}$ are independent and identically distributed errors with mean zero and variance $\sigma^2$.
Several estimation methods for model (1) have been proposed in the literature. See Chen and Shiau (1991) and the references cited therein. Chen and Shiau (1991) discussed the asymptotic behavior of the following three estimators.
(i) The partial spline estimator [proposed by Engle, Granger, Rice and Weiss (1986), Wahba (1984, 1986) and Shiau, Wahba and Johnson (1986), among others] is the solution to the following variational problem:
$$\min_{\beta\in R^d,\ g\in W_2^m}\ \frac{1}{n}\sum_{i=1}^n\bigl[y_{in} - x_{in}^T\beta - g(t_{in})\bigr]^2 + \lambda\int_0^1\bigl[g^{(m)}(t)\bigr]^2\,dt, \tag{2}$$
where $W_2^m$ is the Sobolev space $\{f: f$ has $m-1$ absolutely continuous derivatives and $f^{(m)}\in L_2[0,1]\}$ and $\lambda$ is the smoothing parameter controlling the tradeoff between fidelity to data and roughness of the solution. It is known that the partial spline estimators for $\beta$ and $g = (g(t_{1n}),\dots,g(t_{nn}))^T$ are
$$\hat\beta_\lambda = (X^T(I-S_\lambda)X)^{-1}X^T(I-S_\lambda)y \quad\text{and}\quad \hat g_\lambda = S_\lambda(y - X\hat\beta_\lambda), \tag{3}$$
where $X = (x_{irn})$ is the $n\times d$ design matrix for the parametric component of (1), $y = (y_{1n},\dots,y_{nn})^T$ and $S_\lambda$ is the smoother matrix for ordinary spline smoothing [i.e., when $\beta = 0$ in (2)].

Received June 1991; revised July 1992.
$^1$ Supported by NSF Grant DMS-89-01556.
$^2$ Part of this work was done while J. Shiau was at the Engineering Research Center, AT&T Bell Laboratories.
AMS 1991 subject classifications. Primary 62G05, 62G99, 62J99.
Key words and phrases. Partial splines, semiparametric regression, smoothing splines, rate of convergence, partial regression, generalized cross validation, Mallows' $C_L$, efficient estimators.
(ii) The partial regression estimator was proposed independently by Denby (1986) and Speckman (1988). Motivated by the partial regression scheme in linear regression, the partial regression estimator is obtained by first smoothing $X$ and $y$, respectively, by the smoother matrix $S_\lambda$, and then regressing the residuals of $y$ on the residuals of $X$. Specifically, we have the partial regression estimator defined by
$$\hat\beta_{1\lambda} = (X^T(I-S_\lambda)^2X)^{-1}X^T(I-S_\lambda)^2y \quad\text{and}\quad \hat g_{1\lambda} = S_\lambda(y - X\hat\beta_{1\lambda}). \tag{4}$$

(iii) The two-stage spline smoothing estimator was recently proposed by Chen and Shiau (1991). For simplicity, we shall discuss a simplified version of the estimator when the same smoothing parameter is used in both stages of smoothing, namely,
$$\hat\beta_{0\lambda} = (X^T(I-S_\lambda)^3X)^{-1}X^T(I-S_\lambda)^2y, \qquad \hat g_{0\lambda} = S_\lambda(y - X\hat\beta_{0\lambda}) - (I-S_\lambda)S_\lambda X\hat\beta_{0\lambda}. \tag{5}$$
The basic idea behind this estimator is to modify the partial spline method so that roughness of the parametric component is penalized as well as that of the nonparametric component. Thus we first smooth $X$ to obtain the residuals $(I-S_\lambda)X$ for the purpose of extracting the smooth part from the parametric component, and then we apply the partial spline technique to smooth $y$ over $(I-S_\lambda)X$. This two-stage smoothing gives (5).
In general, the smoother matrix $S_\lambda$ in (3), (4) and (5) can be replaced by any commonly used smoother matrix. Of course, estimators obtained by different smoothers may behave differently. See Chen and Shiau (1991) for some remarks. In this paper, we only study the case that $S_\lambda$ is the smoothing spline smoother.
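As a concrete illustration of (3), (4) and (5), the following minimal sketch computes the three estimators for a single covariate ($d = 1$). It is not the smoothing spline smoother studied in this paper: as an assumption made only for illustration, the smoother is taken to be diagonal in an orthonormal cosine basis with eigenvalues $1/(1+\lambda(\pi k)^{2m})$, and all function names are hypothetical.

```python
import math


def cosine_basis(n):
    """Rows form an orthonormal cosine (DCT-II) basis on t_i = (2i - 1) / (2n)."""
    basis = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        basis.append([scale * math.cos(math.pi * k * (2 * i + 1) / (2.0 * n))
                      for i in range(n)])
    return basis


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def residual_operator(n, lam, m=2):
    """Return a function (v, power) -> (I - S)^power v for the stand-in smoother S."""
    basis = cosine_basis(n)
    # Assumed eigenvalues of S (illustrative, not the true spline eigenvalues).
    keep = [1.0 / (1.0 + lam * (math.pi * k) ** (2 * m)) for k in range(n)]

    def apply(v, power=1):
        coef = [dot(basis[k], v) * (1.0 - keep[k]) ** power for k in range(n)]
        return [sum(basis[k][i] * coef[k] for k in range(n)) for i in range(n)]

    return apply


def three_estimators(x, y, lam, m=2):
    """Scalar-covariate (d = 1) versions of the estimators (3), (4) and (5)."""
    res = residual_operator(len(x), lam, m)
    rx1, rx2, ry1 = res(x, 1), res(x, 2), res(y, 1)
    partial_spline = dot(x, ry1) / dot(x, rx1)          # x'(I-S)y / x'(I-S)x
    partial_regression = dot(rx1, ry1) / dot(rx1, rx1)  # x'(I-S)^2 y / x'(I-S)^2 x
    two_stage = dot(rx1, ry1) / dot(rx1, rx2)           # x'(I-S)^2 y / x'(I-S)^3 x
    return partial_spline, partial_regression, two_stage
```

Because the stand-in $S$ is symmetric, $x^T(I-S)^2y$ is computed as the inner product of the two residual vectors, and $x^T(I-S)^3x$ as the inner product of $(I-S)x$ with $(I-S)^2x$. With noise-free data $y = x\beta$ and $g = 0$, the partial spline and partial regression formulas return $\beta$ exactly, whereas the two-stage formula returns $\beta\,x^T(I-S)^2x/x^T(I-S)^3x$.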
To use these three methods to estimate $\beta$ and $g$ in practice, it is necessary to specify a value of the smoothing parameter $\lambda$. In the context of nonparametric regression, it is well known that the choice of $\lambda$ is crucial to the solution. A popular data-driven method of choosing $\lambda$ is the generalized cross validation (GCV) method (to be described in Section 2). Numerically, the GCV method has been proven to be a good method. Speckman (1981) and Li (1986) gave some nice theoretical results on the GCV method. However, the use of the GCV method for determining the value of $\lambda$ in (3), (4) or (5) has not yet been thoroughly examined. To our knowledge the only relevant reference is Speckman (1988), who gave a weak GCV theorem as in Craven and Wahba (1979) for the partial regression estimator (4) in the context of kernel smoothing.

There have been some studies on the asymptotic behavior of the preceding three estimators, when $\lambda$ is a deterministic quantity depending on $n$, in the setting that $x_{irn} = h_r(t_{in}) + z_{irn}$, where the $h_r$'s are smooth functions and $\{(z_{i1n},\dots,z_{idn})\}_{1\le i\le n}$ are independent and identically distributed error vectors with zero mean and positive definite covariance matrix. For the partial spline estimator with spline smoothing, Rice (1986) pointed out that $\hat\beta_\lambda - \beta$ can achieve the usual parametric rate of convergence as in parametric regression, namely, $O(n^{-1/2})$, only at the expense of undersmoothing the nonparametric component $g$. Thus Rice (1986) concluded that the use of the GCV method for choosing $\lambda$ is questionable in this case.
On the other hand, Speckman (1988), for the partial regression estimator with kernel smoothing, and Chen and Shiau (1991), for the two-stage spline smoothing estimator as well as the partial regression estimator with spline smoothing, showed that the negative result reported in Rice (1986) disappears. More specifically, by choosing an appropriate rate for $\lambda$, the convergence rate of $\hat\beta_{0\lambda}-\beta$ or $\hat\beta_{1\lambda}-\beta$ reaches the parametric rate $O(n^{-1/2})$ while $\hat g_{0\lambda}$ or $\hat g_{1\lambda}$ can still estimate $g = (g(t_{1n}),\dots,g(t_{nn}))^T$ with the same optimal convergence rate as that of the ordinary nonparametric regression estimator, which is achievable by the GCV estimator of $\lambda$. Basically, Chen and Shiau (1991) demonstrated that the goal of obtaining an estimate for the regression surface $g(\cdot)$ with an "optimal" nonparametric convergence rate does not conflict with the goal of obtaining an estimate for the parametric component $\beta$ with the parametric convergence rate. Since "optimal" estimates of the regression surface can be obtained by the method of GCV for the nonparametric regression context, we expect that the parametric convergence rate can be achieved for some estimators of $\beta$, such as (4) and (5), for the semiparametric model (1). The following conjecture is hence reasonably made by Speckman (1988) for kernel smoothing and Chen and Shiau (1991) for spline smoothing.

CONJECTURE. The GCV method can be used to choose the value of $\lambda$ in (4) or (5) such that $\hat\beta_{1\hat\lambda}$ or $\hat\beta_{0\hat\lambda}$ can still estimate $\beta$ with $n^{-1/2}$ rate.
The main objective of this paper is to prove this conjecture when $S_\lambda$ is the smoother matrix for ordinary spline smoothing. We also prove that the same result holds if $\lambda$ is chosen by the criterion of Mallows' $C_L$. We remark that although the problem of determining smoothing parameters for nonparametric regression based on data only is studied extensively in the literature [see Li (1986) and references therein], those results are not applicable in general to the problem posed in this article. A further remark on this is given in Section 2.

The main results are summarized in Theorems 1 and 2 (Section 2), in which the asymptotic distributions of $\hat\beta_{0\hat\lambda}-\beta$ and $\hat\beta_{1\hat\lambda}-\beta$ are derived when the smoothing parameter is determined by either the (restricted) GCV method or (restricted) Mallows' $C_L$. Descriptions of these two methods are given in Section 2. Most of the proofs are given in the remaining sections.

As a by-product of proving Theorems 1 and 2, it is shown in Propositions 1(b) and 3(b) that the "optimal rate" for the smoothing parameter, with respect to expected average squared error, is the same for the two estimation methods as it is for ordinary smoothing splines.
As suggested by a referee, we also have looked into the situation studied by Heckman (1986). When $h_r\equiv$ constant, Heckman (1986) established asymptotic normality for the partial spline estimator of $\beta$ and showed that its bias is asymptotically negligible. According to the preceding discussion, it is expected that the GCV method can be used to choose the value of $\lambda$ in (3) such that $\hat\beta_{\hat\lambda}$ can still estimate $\beta$ with $n^{-1/2}$ rate under the setting of Heckman (1986). This conjecture is also confirmed for a more general case where the $h_r$'s are polynomials of degree less than $m$, and the result is presented as Theorem 3 in Section 2.

2. Data-driven methods and main results. In this section we describe the (restricted) GCV method and (restricted) Mallows' $C_L$ for determining the value of $\lambda$ in (4) and (5) and present the main results of this paper. We first introduce some notation. Write
$$X\hat\beta_{0\lambda} + \hat g_{0\lambda} = \bigl[S_\lambda + (I-S_\lambda)^2X(X^T(I-S_\lambda)^3X)^{-1}X^T(I-S_\lambda)^2\bigr]y = A_{0\lambda}y,$$
$$X\hat\beta_{1\lambda} + \hat g_{1\lambda} = \bigl[S_\lambda + (I-S_\lambda)X(X^T(I-S_\lambda)^2X)^{-1}X^T(I-S_\lambda)^2\bigr]y = A_{1\lambda}y,$$
where $A_{0\lambda}$ and $A_{1\lambda}$ are so-called hat matrices or influence matrices. Let $\hat\lambda_{0G}$ be the minimizer of the generalized cross validation function (GCV function)
$$V_0(\lambda) = \frac{n^{-1}\|(I-A_{0\lambda})y\|^2}{\bigl[n^{-1}\operatorname{tr}(I-A_{0\lambda})\bigr]^2}$$
over $\lambda\in[\lambda_1,\lambda_2]$, where $\lambda_1 = n^{-\delta_1}\log' n$ with $\delta_1 = 2m/5$, and $\lambda_2 = n^{-\delta_2}$ for any $\delta_2$ satisfying $0<\delta_2<2m/(4m+1)$. Also $\|(I-A_{0\lambda})y\|^2 = y^T(I-A_{0\lambda})^T(I-A_{0\lambda})y$, the residual sum of squares. Similarly, let $\hat\lambda_{0C}$ denote the minimizer of Mallows' $C_L$
$$C_{0L}(\lambda) = n^{-1}\|(I-A_{0\lambda})y\|^2 + 2n^{-1}\sigma^2\operatorname{tr}A_{0\lambda}$$
over $\lambda\in[\lambda_1,\lambda_2]$, where $\sigma^2$ is assumed known. For the partial regression method, $\hat\lambda_{1G}$, $V_1(\lambda)$, $\hat\lambda_{1C}$ and $C_{1L}(\lambda)$ are defined accordingly for $A_{1\lambda}y$.

It is known that there exists a common orthonormal basis for all $S_\lambda$ (with $\lambda$ being the running index), for example, a Demmler-Reinsch basis [Demmler and Reinsch (1975)]. In other words, all $S_\lambda$ can be diagonalized simultaneously by this basis. Further details of this basis are given in Section 3. Unfortunately, it is not clear whether there exists such a common orthonormal basis for all $A_{0\lambda}$ or $A_{1\lambda}$ in general. Although both the GCV method and Mallows' $C_L$ have been studied in the context of nonparametric regression when $S_\lambda$ is the smoother matrix for smoothing splines, these results are not applicable to our problem since the arguments used to prove these results depend strongly on the existence of a common orthonormal basis for all $S_\lambda$.
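The two selection criteria for the two-stage method can be sketched numerically as follows. This is an illustration under stated assumptions, not the procedure analyzed in the paper: there is a single covariate ($d = 1$), the smoother is a stand-in that is diagonal in an orthonormal cosine basis with assumed eigenvalues $1/(1+\lambda(\pi k)^{2m})$, the search is over a user-supplied grid rather than the interval $[\lambda_1,\lambda_2]$, and the function names are hypothetical. For $d = 1$ the trace of the hat matrix is $\operatorname{tr}S_\lambda + x^T(I-S_\lambda)^4x/x^T(I-S_\lambda)^3x$.

```python
import math


def cosine_coefficients(v):
    """Coefficients of v in an orthonormal cosine (DCT-II) basis."""
    n = len(v)
    coefs = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        coefs.append(scale * sum(vi * math.cos(math.pi * k * (2 * i + 1) / (2.0 * n))
                                 for i, vi in enumerate(v)))
    return coefs


def two_stage_criteria(cx, cy, lam, sigma2, m=2):
    """Return (GCV, Mallows C_L) at lam for d = 1, from basis coefficients of x, y."""
    n = len(cx)
    # Assumed eigenvalues of the stand-in smoother S.
    keep = [1.0 / (1.0 + lam * (math.pi * k) ** (2 * m)) for k in range(n)]
    r = [1.0 - s for s in keep]  # eigenvalues of I - S
    x2y = sum(rk ** 2 * a * b for rk, a, b in zip(r, cx, cy))
    x3x = sum(rk ** 3 * a * a for rk, a in zip(r, cx))
    x4x = sum(rk ** 4 * a * a for rk, a in zip(r, cx))
    beta = x2y / x3x  # two-stage estimate (5)
    # (I - A_0) y = (I - S) y - (I - S)^2 x beta, written in the basis.
    rss = sum((rk * b - rk ** 2 * a * beta) ** 2 for rk, a, b in zip(r, cx, cy)) / n
    trace = sum(keep) + x4x / x3x  # tr A_0 = tr S + tr B_0
    gcv = rss / (1.0 - trace / n) ** 2
    mallows = rss + 2.0 * sigma2 * trace / n
    return gcv, mallows


def choose_lambda(x, y, grid, sigma2, m=2):
    """Grid minimizers of the GCV function and of Mallows' C_L."""
    cx, cy = cosine_coefficients(x), cosine_coefficients(y)
    scores = [(lam,) + two_stage_criteria(cx, cy, lam, sigma2, m) for lam in grid]
    lam_gcv = min(scores, key=lambda s: s[1])[0]
    lam_cl = min(scores, key=lambda s: s[2])[0]
    return lam_gcv, lam_cl
```

Working in the basis coefficients is exact here because an orthonormal change of basis preserves inner products; this shortcut relies on the stand-in smoother being diagonal in a known basis.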
Throughout the rest of the paper, we assume that $\{x_i\}$ is a random sample from $x$, where $x = (x_1,\dots,x_d)^T$, $x_r = h_r(t) + z_r$, for $1\le r\le d$, $t\in[0,1]$ and the $h_r$'s are smooth functions. Set $g_0 = \sum_{r=1}^d\beta_rh_r + g$. We also assume that the following conditions hold.

(A1) $Ez_r = 0$, $\operatorname{Var}((z_1,\dots,z_d)) = \Sigma = (\sigma_{rs})$ and $Ez_r^4<\infty$, for $1\le r\le d$, where $\Sigma$ is a $d\times d$ positive definite matrix.
(A2) $\int_0^1\bigl(g_0^{(2m)}(t)\bigr)^2\,dt = \gamma > 0$ and $m\ge 2$.
(A3) The points $t_{in}$ are generated by $(2i-1)/2n = \int_0^{t_{in}}p(t)\,dt$ for some density function $p(t)$ on $[0,1]$.
(A4) The errors $e_{1n},\dots,e_{nn}$ are i.i.d. having a distribution independent of $n$ and $t$, and $Ee_{in}^4<\infty$, for $i = 1,2,\dots,n$.
(A5) $g, h_r, g_0\in F = \{f: f\in W_2^{2m}[0,1],\ f^{(k)}(0) = f^{(k)}(1) = 0,\ m\le k\le 2m-1\}$ for $1\le r\le d$.

Under (A3), we can find the magnitude of $\operatorname{tr}S_\lambda^l$ for $l = 1,2,\dots$ over $[\lambda_1,\lambda_2]$ based on Lemma 5.1 of Speckman (1981). This result is summarized in Lemma 2(c). Under (A5), functions in $F$ are the so-called very smooth functions defined in Wahba (1977). When (A2) also holds, it follows from Speckman [(1981), (3.2) and Lemma 3.1] that an exact bound can be obtained for $g_0^T(I-S_\lambda)^2g_0$, where $g_0 = (g_0(t_{1n}),\dots,g_0(t_{nn}))^T$. This bound is given in Lemma 2(b) in Section 3.
We now discuss the assumption (A5), which states that $g_0$ and $h_r$ must satisfy boundary conditions on some high derivatives. (A5) is considered because it and (A2) give an explicit asymptotic expression for the expectation of the averaged squared error loss. Then this expression can be used to determine the asymptotic behavior of $\hat\lambda$ determined by either the GCV method or Mallows' $C_L$. Using the bias reduction approach developed by Eubank and Speckman (1991), $g_0$ and $h_r$ can be modified (by construction) to satisfy the boundary conditions specified in (A5). It is then conjectured that a result similar to that of this paper without (A5) will still hold as long as an explicit asymptotic expression for the expectation of the averaged squared error loss exists after the boundary adjustment. However, no proof is available now.
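The asymptotic distributions of $\hat\beta_{0\hat\lambda}$ and $\hat\beta_{1\hat\lambda}$ are summarized in Theorems 1 and 2, respectively, when the value of $\lambda$ is determined by either the GCV method or Mallows' $C_L$.

THEOREM 1. Under (A1)-(A5), $\sqrt n(\hat\beta_{0\hat\lambda} - \beta)$ converges in distribution to $N(0,\sigma^2\Sigma^{-1})$ for $\hat\lambda = \hat\lambda_{0G}$ or $\hat\lambda_{0C}$.

THEOREM 2. Under (A1)-(A5), $\sqrt n(\hat\beta_{1\hat\lambda} - \beta)$ converges in distribution to $N(0,\sigma^2\Sigma^{-1})$ for $\hat\lambda = \hat\lambda_{1G}$ or $\hat\lambda_{1C}$.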
Now we describe the (restricted) GCV method and (restricted) Mallows' $C_L$ for determining the value of $\lambda$ in (3) under the assumption that the $h_r$'s are polynomials of degree less than $m$ [i.e., $h_r^{(m)}(t)\equiv 0$]. The results are summarized in Theorem 3. First write
$$X\hat\beta_\lambda + \hat g_\lambda = \bigl[S_\lambda + (I-S_\lambda)X(X^T(I-S_\lambda)X)^{-1}X^T(I-S_\lambda)\bigr]y = A_\lambda y.$$
Let $\hat\lambda_G$ and $\hat\lambda_C$ be the minimizers of the corresponding GCV function and Mallows' $C_L$, respectively, over $\lambda\in[\lambda_1,\lambda_2]$.

THEOREM 3. Under (A1)-(A5) and $h_r^{(m)}(t)\equiv 0$, for $1\le r\le d$, $\sqrt n(\hat\beta_{\hat\lambda} - \beta)$ converges in distribution to $N(0,\sigma^2\Sigma^{-1})$, for $\hat\lambda = \hat\lambda_G$ or $\hat\lambda_C$.

Let $L_{0n}(\lambda)$ denote the averaged squared error loss over design points, that is, $n^{-1}\|A_{0\lambda}y - X\beta - g\|^2$, and $\lambda_{0R}$ denote the value of $\lambda$ that minimizes the risk $R_{0n}(\lambda) = EL_{0n}(\lambda)$ over $[\lambda_1,\lambda_2]$. Note that here the expectation is taken with respect to $e$ only, that is, conditioned on $(x,t)$. We will prove Theorem 1 via the following three steps. Since the GCV method or Mallows' $C_L$ attempts to provide a data-based estimate of $\lambda_{0R}$, we first try to locate $\lambda_{0R}$. Let $\Lambda_0 = [\lambda_1, n^{-\delta_3}]$, where $\delta_1 > 2m/(4m+1) > \delta_3 > \delta_2\vee\tfrac14$. Note that $\Lambda_0$ is contained in $[\lambda_1,\lambda_2]$. We show in Proposition 1 that $\lambda_{0R}\in\Lambda_0$. Next, we show in Proposition 2 that the choice of $\lambda$ based on either the GCV method or Mallows' $C_L$ does fall in $\Lambda_0$ in probability. Finally, we show that $\sqrt n(\hat\beta_{0\hat\lambda} - \beta)$ is asymptotically normal.

Set $h_r = (h_r(t_{1n}),\dots,h_r(t_{nn}))^T$, for $1\le r\le d$, and
$$c_l = \pi^{-1}\Bigl[\int_0^1p^{1/2m}(v)\,dv\Bigr]\int_0^\infty(1+v^{2m})^{-l}\,dv.$$
The proofs of the following two propositions are given in Section 4.
PROPOSITION 1. Under (A1)-(A5) and $\lambda\in[\lambda_1,\lambda_2]$, when $n$ tends to infinity, we have (a) $R_{0n}(\lambda)\approx\lambda^2 + n^{-1}\lambda^{-1/2m}$ and (b) $\lambda_{0R}\approx n^{-2m/(4m+1)}$.

Here the symbol $a(n)\approx b(n)$ means that $a(n)/b(n)$ is bounded away from zero and infinity. Note that $\lambda_{0R}\in\Lambda_0$ is an immediate result of (b).
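To see why the rate in (b) follows from the form in (a), one can minimize a proxy $\gamma\lambda^2 + c\,n^{-1}\lambda^{-1/2m}$ directly. Setting the derivative to zero gives $\lambda^{2+1/(2m)} = c/(4m\gamma n)$, so the minimizer is proportional to $n^{-2m/(4m+1)}$. The sketch below checks this numerically; the constants `gamma` and `c` and the function names are illustrative assumptions, not quantities taken from the paper.

```python
import math


def risk_proxy(lam, n, m=2, gamma=1.0, c=1.0):
    """Squared-bias term gamma * lam^2 plus variance term c / (n * lam^(1/2m))."""
    return gamma * lam ** 2 + c / (n * lam ** (1.0 / (2 * m)))


def optimal_lambda(n, m=2, gamma=1.0, c=1.0):
    """Closed-form minimizer of risk_proxy, from setting its derivative to zero."""
    return (c / (4.0 * m * gamma * n)) ** (2.0 * m / (4.0 * m + 1.0))


def rate_exponent(m=2, n_small=1e3, n_large=1e6):
    """Log-log slope of the minimizer in n; it should equal -2m / (4m + 1)."""
    ratio = optimal_lambda(n_large, m) / optimal_lambda(n_small, m)
    return math.log(ratio) / math.log(n_large / n_small)
```

For $m = 2$ the slope is $-4/9$, the same exponent as for ordinary smoothing splines with very smooth regression functions.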
PROPOSITION 2. Under (A1)-(A5) and $\hat\lambda\in[\lambda_1,\lambda_2]$, $\lim_nP(\hat\lambda\in\Lambda_0) = 1$, for $\hat\lambda = \hat\lambda_{0G}$ or $\hat\lambda_{0C}$.
To prove Theorem 1, we use the following technical lemma to pave the way. Set $A_0(\lambda) = n^{-1}X^T(I-S_\lambda)^3X$, $A_1(\lambda) = n^{-1}X^T(I-S_\lambda)^2X$, $Z = (z_{irn})_{n\times d}$ and $H = (h_r(t_{in}))_{n\times d}$.

LEMMA 1. Assume that (A1)-(A5) hold and that $g, h_r\in F$, for $1\le r\le d$. Then the following hold uniformly over all $\lambda\in[\lambda_1,\lambda_2]$:
(a) $A_0(\lambda) = \Sigma(I + o_p(1))$;
(b) $A_1(\lambda) = \Sigma(I + o_p(1))$;
and the following hold uniformly over all $\lambda\in\Lambda_0$:
(c) $n^{-1/2}X^T(I-S_\lambda)^2S_\lambda X = o_p(1)$;
(d) $n^{-1/2}X^T(I-S_\lambda)^2g = o_p(1)$;
(e) $n^{-1/2}H^T(I-S_\lambda)^2e = o_p(1)$;
(f) $n^{-1/2}Z^TS_\lambda^le = o_p(1)$, for $l = 1,2$;
(g) $n^{-1/2}Z^T(I-S_\lambda)g = o_p(1)$.
The proof of Lemma 1 is given at the end of Section 3. Note that the notation $o_p(1)$ used in this paper denotes either the usual convention or a $d\times d$ (or $d\times 1$) matrix such that the magnitude of each element is $o_p(1)$. Now the proof of Theorem 1 becomes fairly simple.
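PROOF OF THEOREM 1. Rewrite
$$A_0(\hat\lambda)\,n^{1/2}(\hat\beta_{0\hat\lambda} - \beta) = n^{-1/2}Z^Te + \operatorname{Rem}(\hat\lambda),$$
where
$$\operatorname{Rem}(\lambda) = n^{-1/2}\bigl\{X^T(I-S_\lambda)^2(S_\lambda X\beta + g) + H^T(I-S_\lambda)^2e + Z^T[(I-S_\lambda)^2 - I]e\bigr\}.$$
It follows from Lemma 1(c)-(f) that $\sup_{\lambda\in\Lambda_0}|\operatorname{Rem}(\lambda)| = o_p(1)$. Any realization of $\hat\lambda$ is in $[\lambda_1,\lambda_2]$, which is a wider interval than $\Lambda_0$. Nevertheless, by noting that, for any $c>0$,
$$P\bigl(|\operatorname{Rem}(\hat\lambda)|>c\bigr)\le P(\hat\lambda\notin\Lambda_0) + P\bigl(|\operatorname{Rem}(\hat\lambda)|>c\text{ and }\hat\lambda\in\Lambda_0\bigr)\le P(\hat\lambda\notin\Lambda_0) + P\Bigl(\sup_{\lambda\in\Lambda_0}|\operatorname{Rem}(\lambda)|>c\Bigr),$$
we can conclude that $\operatorname{Rem}(\hat\lambda) = o_p(1)$ by Proposition 2.

By Lemma 1(a), $\sup_{\lambda\in[\lambda_1,\lambda_2]}|A_0(\lambda) - \Sigma| = o_p(1)$. Since
$$|A_0(\hat\lambda) - \Sigma|\le\sup_{\lambda\in[\lambda_1,\lambda_2]}|A_0(\lambda) - \Sigma| = o_p(1),$$
we have $A_0(\hat\lambda)\to\Sigma$ in probability. It is shown in Chen and Shiau (1991) that $n^{-1/2}Z^Te\to N(0,\sigma^2\Sigma)$ in distribution. We then conclude $\sqrt n(\hat\beta_{0\hat\lambda} - \beta)\to N(0,\sigma^2\Sigma^{-1})$ by the above argument and Slutsky's theorem. $\square$

We now turn to the partial regression estimator (4). Observe that
$$A_1(\lambda)\,n^{1/2}(\hat\beta_{1\lambda} - \beta) = n^{-1/2}Z^Te + n^{-1/2}X^T(I-S_\lambda)^2g + n^{-1/2}H^T(I-S_\lambda)^2e + n^{-1/2}Z^T[(I-S_\lambda)^2 - I]e.$$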
Similarly, the proof of Theorem 2 can be performed via the following two propositions and Lemma 1(b)-(f).

Let the loss function $L_{1n}(\lambda) = n^{-1}\|A_{1\lambda}y - X\beta - g\|^2$, and let $\lambda_{1R}$ denote the value of the smoothing parameter that minimizes the risk $R_{1n}(\lambda) = EL_{1n}(\lambda)$ over $\lambda\in[\lambda_1,\lambda_2]$.

PROPOSITION 3. Under (A1)-(A5) and $\lambda\in[\lambda_1,\lambda_2]$, when $n$ tends to infinity, we have (a) $R_{1n}(\lambda)\approx\lambda^2 + n^{-1}\lambda^{-1/2m}$ and (b) $\lambda_{1R}\approx n^{-2m/(4m+1)}$.

PROPOSITION 4. Under (A1)-(A5) and $\hat\lambda\in[\lambda_1,\lambda_2]$, $\lim_nP(\hat\lambda\in\Lambda_0) = 1$ for $\hat\lambda = \hat\lambda_{1G}$ or $\hat\lambda_{1C}$.
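We now turn to the partial spline estimator (3) when $h_r^{(m)}(t)\equiv 0$.

PROOF OF THEOREM 3. Set $A_2(\lambda) = n^{-1}X^T(I-S_\lambda)X$. Rewrite
$$A_2(\hat\lambda)\,n^{1/2}(\hat\beta_{\hat\lambda} - \beta) = n^{-1/2}Z^Te + \operatorname{Rem}(\hat\lambda),$$
where
$$\operatorname{Rem}(\lambda) = n^{-1/2}\bigl\{Z^T(I-S_\lambda)g + H^T(I-S_\lambda)(e+g) - Z^TS_\lambda e\bigr\}.$$
Note that $H^T(I-S_\lambda)(e+g)\equiv 0$ because the $h_r$'s are polynomials of degree less than $m$ and $S_\lambda$ is the smoother matrix for ordinary spline smoothing. Using the same proof to show Lemma 1(a), we have $\sup_{\lambda\in[\lambda_1,\lambda_2]}|A_2(\lambda) - \Sigma| = o_p(1)$. It follows from Lemma 1(f) and (g) that $\sup_{\lambda\in\Lambda_0}|\operatorname{Rem}(\lambda)| = o_p(1)$. We then conclude $\sqrt n(\hat\beta_{\hat\lambda} - \beta)\to N(0,\sigma^2\Sigma^{-1})$ by the above discussion and the argument used in proving Theorem 1. $\square$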
3. Technical lemmas. In this section we state two more technical lemmas and summarize some properties of smoothing splines that are needed in the sequel. Lemma 1 is proved as an immediate result of these lemmas.

It is well known that smoothing splines are in the space of natural polynomial splines of order $2m$ on $[0,1]$ with knot set $\{t_{in}\}_{i=1}^n$. According to Demmler and Reinsch (1975), a basis for natural splines is $\{\phi_{jn}(t)\}_{j\le n}$ with the following biorthogonality property:
$$n^{-1}\sum_{i=1}^n\phi_{jn}(t_{in})\phi_{kn}(t_{in}) = \delta_{jk},\qquad\int_0^1\phi_{jn}^{(m)}(t)\phi_{kn}^{(m)}(t)\,dt = \lambda_{kn}\delta_{jk}.$$
Here $\{\lambda_{kn}\}$ is a nondecreasing sequence of nonnegative numbers, and the eigenvalues of $S_\lambda$ are $(1+\lambda\lambda_{kn})^{-1}$ for $1\le k\le n$. Hence, $S_\lambda$ is a nonnegative definite matrix and has the eigenvalue decomposition $\Gamma^TD_\lambda\Gamma$, where $D_\lambda$ is a diagonal $n\times n$ matrix with $k$-th diagonal value $(1+\lambda_{kn}\lambda)^{-1}$ and $\Gamma$ is an orthogonal $n\times n$ matrix with the $ij$-th element $n^{-1/2}\phi_{in}(t_{jn})$. Therefore, (I -
Let
$$B_{1\lambda}^2 = n^{-1}g^T(I-S_\lambda)^2g,\qquad B_{2r\lambda}^2 = n^{-1}h_r^T(I-S_\lambda)^2h_r\qquad\text{and}\qquad B_{3\lambda}^2 = n^{-1}g_0^T(I-S_\lambda)^2g_0.$$
Note that $B_{1\lambda}^2$ is the averaged squared bias of the ordinary smoothing spline estimate of $g$. A similar interpretation is applicable to $B_{2r\lambda}^2$ and $B_{3\lambda}^2$.

The following lemma is due to Speckman [(1981), Lemma 3.1, (3.2) and Theorem 2.4].

LEMMA 2. Suppose that (A3) holds. When $\lambda\in[\lambda_1,\lambda_2]$ and $m\ge 2$,
(a) $B_{3\lambda}^2 = O(\lambda^2)$ if $g_0\in F$,
(b) $B_{3\lambda}^2 = \gamma\lambda^2(1+o(1))$ if $g_0\in F$ and (A2) holds, and
(c) $\operatorname{tr}S_\lambda^l = \sum_k(1+\lambda_{kn}\lambda)^{-l} = c_l\lambda^{-1/2m}(1+o(1))$ for positive integer $l$.

Thus Lemma 2(a) also implies that $B_{1\lambda}^2 = O(\lambda^2)$ and $B_{2r\lambda}^2 = O(\lambda^2)$, if $g, h_r\in F$.

Lemma 3 summarizes the convergence rates for some terms to be used later in the proofs of Lemma 1 and Propositions 1-4. Let $x_{irn} = h_r(t_{in}) + z_{irn}$ and $z_r = (z_{1rn},\dots,z_{nrn})^T$.
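LEMMA 3. Assume that (A1)-(A4) hold and that $h_r, f, f_1, f_2\in F$, for $1\le r\le d$. Let $a$, $a_0$ and $a_1$ be constants satisfying $1 < a < 1/a_0 < 5$ and $a < a_1$. Then, for any finite positive integer $l$, the following statements hold uniformly over all $\lambda\in[\lambda_1,\lambda_2]$ and $1\le r,s\le d$:
(a) $z_r^TS_\lambda^lz_s = c_l\sigma_{rs}\lambda^{-1/2m} + o_p(\lambda^{-1/4ma_0})$;
(b) $e^TS_\lambda^le = \sigma^2c_l\lambda^{-1/2m}(1+o_p(1))$;
(c) $n^{-1/2}f^T(I-S_\lambda)^le = O_p(\lambda^{1/a_1}) = o_p(1)$, where $f = (f(t_{1n}),\dots,f(t_{nn}))^T$;
(d) $n^{-1/2}f^T(I-S_\lambda)^lz_r = O_p(\lambda^{1/a_1}) = o_p(1)$;
(e) $n^{-1}f_1^T(I-S_\lambda)^lf_2 = O(\lambda^2)$, where $f_i = (f_i(t_{1n}),\dots,f_i(t_{nn}))^T$, $i = 1,2$ and $l\ge 2$;
(f) $n^{-1}x_r^T(I-S_\lambda)^2z_s = \sigma_{rs} + o_p(1)$;
(g) $n^{-1}x_r^T(I-S_\lambda)^lx_s = \sigma_{rs} + o_p(1)$, for $l\ge 2$;
(h) $n^{-1/2}x_r^T(I-S_\lambda)^2S_\lambda x_s = o_p(1) + O(n^{1/2}\lambda^2)$;
(i) $z_r^TS_\lambda^le = O_p(\lambda^{-1/4ma_0})$;
(j) $n^{-1}z_r^T(I-S_\lambda)^le = O_p(n^{-1/2})$;
(k) $x_r^T(I-S_\lambda)^3x_s - x_r^T(I-S_\lambda)^2x_s = -(c_1 - 2c_2 + c_3)\sigma_{rs}\lambda^{-1/2m} + o_p(\lambda^{-1/4ma_0}) + O_p(n^{1/2}\lambda^{1/a_1}) + h_r^T(I-S_\lambda)^3h_s - h_r^T(I-S_\lambda)^2h_s$;
(l) $x_r^T(I-S_\lambda)^3x_s - x_r^T(I-S_\lambda)^4x_s = (c_1 - 3c_2 + 3c_3 - c_4)\sigma_{rs}\lambda^{-1/2m} + o_p(\lambda^{-1/4ma_0}) + O_p(n^{1/2}\lambda^{1/a_1}) + h_r^T(I-S_\lambda)^3S_\lambda h_s$.

The nontrivial proof of Lemma 3 is deferred to Section 6.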
PROOF OF LEMMA 1. First note that $\lambda^{-1/4ma_0} = o(n^{1/2})$, for all $\lambda\in[\lambda_1,\lambda_2]$ since $1/a_0 < 5$. It is easy to see that (a) and (b) hold by Lemma 3(g). Note that $\lambda^2 = o(n^{-1/2})$, for all $\lambda\in\Lambda_0$. Then it is easy to see that (c) holds by Lemma 3(h); (d) holds by Lemma 3(d) and 3(e); (e) holds by Lemma 3(c); (f) holds by Lemma 3(i); (g) holds by Lemma 3(d). $\square$
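4. Proof for two-stage spline smoothing estimate. We prove Propositions 1 and 2 for the two-stage spline smoothing estimates in this section. The following technical lemma summarizes the convergence rates for some terms to be used in the proofs. The proof of the lemma is deferred to Section 7.

LEMMA 4. Assume that (A1)-(A4) hold and that $g, h_r\in F$, for $1\le r\le d$. We further assume that the constants $a$, $a_0$ and $a_1$ specified in Lemma 3 satisfy the further constraint that $4m/(4m-1) > a_1 > a$ and $a_0 > 1/2$. Then the following statements hold uniformly over all $\lambda\in[\lambda_1,\lambda_2]$:
(a) $n^{-1}\operatorname{tr}A_{0\lambda} = c_1n^{-1}\lambda^{-1/2m}(1+o_p(1))$;
(b) $n^{-1}\operatorname{tr}A_{0\lambda}^2 = c_2n^{-1}\lambda^{-1/2m}(1+o_p(1))$;
(c) $n^{-1}g_0^T(I-A_{0\lambda})^2g_0 = \gamma\lambda^2(1+o_p(1))$;
(d) $n^{-1}\beta^TZ^T(I-A_{0\lambda})^2Z\beta = \bigl[n^{-1}\bigl(\sum_r\beta_rh_r\bigr)^T(I-S_\lambda)^4\bigl(\sum_r\beta_rh_r\bigr) + (c_2 - 2c_3 + c_4)\beta^T\Sigma\beta\,n^{-1}\lambda^{-1/2m}\bigr](1+o_p(1))$;
(e) $|n^{-1}g_0^T(I-A_{0\lambda})^2e| = o_p(R_{0n}(\lambda))$;
(f) $|n^{-1}(Z\beta)^T(I-A_{0\lambda})^2e| = o_p(R_{0n}(\lambda))$;
(g) $\bigl|n^{-1}\bigl[e^T(2A_{0\lambda} - A_{0\lambda}^2)e - \sigma^2(2\operatorname{tr}A_{0\lambda} - \operatorname{tr}A_{0\lambda}^2)\bigr]\bigr| = o_p(R_{0n}(\lambda))$.

PROOF OF PROPOSITION 1. Write $A_{0\lambda}y - X\beta - g = (A_{0\lambda} - I)(X\beta + g) + A_{0\lambda}e$. Hence
$$R_{0n}(\lambda) = n^{-1}(X\beta + g)^T(I-A_{0\lambda})^2(X\beta + g) + n^{-1}\sigma^2\operatorname{tr}A_{0\lambda}^2.$$
Note that $X\beta + g = Z\beta + g_0$. Then
$$R_{0n}(\lambda) = \Bigl\{\bigl[c_2\sigma^2 + (c_2 - 2c_3 + c_4)\beta^T\Sigma\beta\bigr]n^{-1}\lambda^{-1/2m} + \gamma\lambda^2 + n^{-1}\Bigl(\sum_r\beta_rh_r\Bigr)^T(I-S_\lambda)^4\Bigl(\sum_r\beta_rh_r\Bigr)\Bigr\}(1+o_p(1)) \tag{6}$$
by Lemma 4(b)-(d).

Note that $n^{-1}\bigl(\sum_r\beta_rh_r\bigr)^T(I-S_\lambda)^4\bigl(\sum_r\beta_rh_r\bigr)\ge 0$ and its order is $O(\lambda^2)$, by Lemma 3(e) and the fact that the eigenvalues of $S_\lambda$ are between 0 and 1. Also, $(c_2 - 2c_3 + c_4)\beta^T\Sigma\beta\ge 0$, by the fact that $c_2 - 2c_3 + c_4 > 0$ and $\Sigma$ is positive definite. Hence, Proposition 1(a) holds by (6), and Proposition 1(b) follows easily from Proposition 1(a). $\square$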
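The positivity of $c_2 - 2c_3 + c_4$ used above can be checked numerically. The sketch below reads the constant as $c_l = \pi^{-1}\bigl[\int_0^1p^{1/2m}(v)\,dv\bigr]\int_0^\infty(1+v^{2m})^{-l}\,dv$ and, as an assumption for illustration, takes the uniform design $p\equiv 1$ by default. The $v$-integral is evaluated through the standard Beta-function identity $\int_0^\infty(1+v^{2m})^{-l}\,dv = \Gamma(1/2m)\Gamma(l - 1/2m)/(2m\,\Gamma(l))$; the function names are hypothetical.

```python
import math


def c_const(l, m=2, design_integral=1.0):
    """c_l under the stated reading of the constant.

    design_integral stands for the integral of p(v)^(1/2m) over [0, 1],
    which equals 1 for the uniform design (an assumption of this sketch).
    """
    a = 1.0 / (2 * m)
    v_integral = math.gamma(a) * math.gamma(l - a) / (2 * m * math.gamma(l))
    return design_integral * v_integral / math.pi


def variance_gap(m=2):
    """c_2 - 2 c_3 + c_4, the coefficient multiplying beta' Sigma beta in (6)."""
    return c_const(2, m) - 2.0 * c_const(3, m) + c_const(4, m)
```

Positivity also follows analytically, since $c_2 - 2c_3 + c_4$ is a positive multiple of $\int_0^\infty(1+v^{2m})^{-2}\bigl[1 - (1+v^{2m})^{-1}\bigr]^2\,dv$.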
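PROOF OF PROPOSITION 2. Recall that $C_{0L}(\lambda) = n^{-1}\|(I-A_{0\lambda})y\|^2 + 2n^{-1}\sigma^2\operatorname{tr}A_{0\lambda}$, which can be written as
$$C_{0L}(\lambda) = n^{-1}e^Te + R_{0n}(\lambda) + 2n^{-1}(Z\beta + g_0)^T(I-A_{0\lambda})^2e + n^{-1}\bigl\{\sigma^2(2\operatorname{tr}A_{0\lambda} - \operatorname{tr}A_{0\lambda}^2) - e^T(2A_{0\lambda} - A_{0\lambda}^2)e\bigr\} = n^{-1}e^Te + R_{0n}(\lambda) + o_p(R_{0n}(\lambda)), \tag{7}$$
by Lemma 4(e)-(g).

Recall that the GCV function $V_0(\lambda) = n^{-1}\|(I-A_{0\lambda})y\|^2\bigl[n^{-1}\operatorname{tr}(I-A_{0\lambda})\bigr]^{-2}$. Write $A_{0\lambda} = S_\lambda + B_{0\lambda}$, where $B_{0\lambda} = n^{-1}(I-S_\lambda)^2XA_0^{-1}(\lambda)X^T(I-S_\lambda)^2$. It follows from Lemmas 1(a) and 3(g) that
$$\operatorname{tr}B_{0\lambda} = \operatorname{tr}\bigl(A_0^{-1}(\lambda)\,n^{-1}X^T(I-S_\lambda)^4X\bigr) = \operatorname{tr}(I_{d\times d} + o_p(1)) = O_p(1).$$
Also Lemma 2(c) gives that $\operatorname{tr}S_\lambda = O(\lambda^{-1/2m})$. We then have
$$\bigl[n^{-1}\operatorname{tr}(I-A_{0\lambda})\bigr]^{-2} = 1 + 2n^{-1}\operatorname{tr}A_{0\lambda} + o(n^{-1}\operatorname{tr}A_{0\lambda}).$$
Observe that
$$n^{-1}\|(I-A_{0\lambda})y\|^2 = R_{0n}(\lambda) + 2n^{-1}(Z\beta + g_0)^T(I-A_{0\lambda})^2e + n^{-1}e^Te - n^{-1}\bigl[e^T(2A_{0\lambda} - A_{0\lambda}^2)e - \sigma^2(2\operatorname{tr}A_{0\lambda} - \operatorname{tr}A_{0\lambda}^2)\bigr] - 2\sigma^2n^{-1}\operatorname{tr}A_{0\lambda}.$$
The fourth term on the right-hand side is equal to $o_p(R_{0n}(\lambda))$, by Lemma 4(g). The second term is also of the order $o_p(R_{0n}(\lambda))$, by Lemma 4(e) and (f). We thus get
$$V_0(\lambda) = \bigl[n^{-1}e^Te + R_{0n}(\lambda) + o_p(R_{0n}(\lambda)) - 2\sigma^2n^{-1}\operatorname{tr}A_{0\lambda}\bigr]\times\bigl[1 + 2n^{-1}\operatorname{tr}A_{0\lambda} + o(n^{-1}\operatorname{tr}A_{0\lambda})\bigr]$$
$$= n^{-1}e^Te + R_{0n}(\lambda) + o_p(R_{0n}(\lambda)) + 2(n^{-1}\operatorname{tr}A_{0\lambda})(n^{-1}e^Te - \sigma^2) = n^{-1}e^Te + R_{0n}(\lambda) + o_p(R_{0n}(\lambda)), \tag{8}$$
by Lemma 4(a), Proposition 1(a) and the law of large numbers. From (7) and (8), we have
$$C_{0L}(\lambda) - C_{0L}(\lambda_{0R}) = R_{0n}(\lambda) - R_{0n}(\lambda_{0R}) + o_p(R_{0n}(\lambda))$$
and
$$V_0(\lambda) - V_0(\lambda_{0R}) = R_{0n}(\lambda) - R_{0n}(\lambda_{0R}) + o_p(R_{0n}(\lambda)),$$
respectively. When $R_{0n}(\lambda)/R_{0n}(\lambda_{0R})\to\infty$, it follows easily that $C_{0L}(\lambda) > C_{0L}(\lambda_{0R})$ and $V_0(\lambda) > V_0(\lambda_{0R})$ in probability. Since $\hat\lambda$ is the minimizer of $C_{0L}(\lambda)$ or $V_0(\lambda)$, this implies that $R_{0n}(\hat\lambda)/R_{0n}(\lambda_{0R})\to 1$ in probability. Let $\{\delta_n\}$ be any sequence that tends to infinity. Note that $R_{0n}(\lambda_{0R}\delta_n)/R_{0n}(\lambda_{0R})\to\infty$ and $R_{0n}(\lambda_{0R}/\delta_n)/R_{0n}(\lambda_{0R})\to\infty$ by Proposition 1(a). Hence, $R_{0n}(\lambda)/R_{0n}(\lambda_{0R})\to\infty$ for any $\lambda\ge\lambda_{0R}\delta_n$ or $\lambda\le\lambda_{0R}/\delta_n$. Since $R_{0n}(\hat\lambda)/R_{0n}(\lambda_{0R})$ cannot go to infinity, we have that
$$\lim_nP(\lambda_{0R}/\delta_n\le\hat\lambda\le\lambda_{0R}\delta_n) = 1.$$
Since $\{\delta_n\}$ is any sequence that tends to infinity, $\hat\lambda$ cannot be too far away from $\lambda_{0R}$ in probability. Thus $\lim_nP(\hat\lambda\in\Lambda_0) = 1$. $\square$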
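5. Proof for the partial regression estimate. First, we state a technical lemma that summarizes the convergence rates for some terms to be used in the proofs of Propositions 3 and 4. We defer the proof of this lemma to Section 7.

LEMMA 5. Assume that (A1)-(A4) hold and that $g, h_r\in F$, for $1\le r\le d$. We further assume that the constants $a$, $a_0$ and $a_1$ specified in Lemma 3 satisfy the further constraint that $4m/(4m-1) > a_1 > a$ and $a_0 > 1/2$. Then the following statements hold uniformly over all $\lambda\in[\lambda_1,\lambda_2]$:
(a) $n^{-1}g_0^T(I-A_{1\lambda})^T(I-A_{1\lambda})g_0 = \gamma\lambda^2(1+o_p(1))$;
(b) $n^{-1}\beta^TZ^T(I-A_{1\lambda})^T(I-A_{1\lambda})Z\beta = \bigl[n^{-1}\bigl(\sum_r\beta_rh_r\bigr)^T(I-S_\lambda)^2\bigl(\sum_r\beta_rh_r\bigr) + o_p(n^{-1}\lambda^{-1/2m})\bigr](1+o_p(1))$;
(c) $|n^{-1}(X\beta + g)^T(I-A_{1\lambda})^T(I-A_{1\lambda})e| = o_p(R_{1n}(\lambda))$.

PROOF OF PROPOSITION 3. Simple algebra leads to
$$R_{1n}(\lambda) = n^{-1}(X\beta + g)^T(I-A_{1\lambda})^T(I-A_{1\lambda})(X\beta + g) + n^{-1}\sigma^2\operatorname{tr}A_{1\lambda}^TA_{1\lambda}.$$
Set $A_{1\lambda} = S_\lambda + B_{1\lambda}$, where $B_{1\lambda} = n^{-1}(I-S_\lambda)XA_1^{-1}(\lambda)X^T(I-S_\lambda)^2$. By Lemmas 1(b), 3(g) and 2(c), we have
$$\operatorname{tr}B_{1\lambda}^TB_{1\lambda} = \operatorname{tr}A_1^{-1}(\lambda)\bigl[n^{-1}X^T(I-S_\lambda)^2X\bigr]A_1^{-1}(\lambda)\bigl[n^{-1}X^T(I-S_\lambda)^4X\bigr] = O_p(1) \tag{9}$$
and
$$\operatorname{tr}S_\lambda^TB_{1\lambda} = \operatorname{tr}A_1^{-1}(\lambda)\bigl\{n^{-1}X^T\bigl[(I-S_\lambda)^3 - (I-S_\lambda)^4\bigr]X\bigr\} = o_p(1). \tag{10}$$
Hence,
$$n^{-1}\operatorname{tr}A_{1\lambda}^TA_{1\lambda} = c_2n^{-1}\lambda^{-1/2m}(1+o_p(1)). \tag{11}$$
Then since $X\beta + g = Z\beta + g_0$, we have
$$R_{1n}(\lambda) = \Bigl[c_2\sigma^2n^{-1}\lambda^{-1/2m} + \gamma\lambda^2 + n^{-1}\Bigl(\sum_r\beta_rh_r\Bigr)^T(I-S_\lambda)^2\Bigl(\sum_r\beta_rh_r\Bigr)\Bigr](1+o_p(1)), \tag{12}$$
by Lemmas 5(a), 5(b) and (11).

Note that $n^{-1}\bigl(\sum_r\beta_rh_r\bigr)^T(I-S_\lambda)^2\bigl(\sum_r\beta_rh_r\bigr)\ge 0$ and its order is $O(\lambda^2)$ by Lemma 3(e). Hence, (a) holds; (b) follows easily from (a). $\square$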
PROOF OF PROPOSITION 4. We first observe that $\operatorname{tr}B_{1\lambda} = O_p(1)$ by Lemmas 1(b) and 3(g). Then, by Lemma 5(c), it remains to show that
(13) n-1jU2 trAlTAA1A - eT(Al +ATA -ATIAAl)eI =o(R1n(A)),
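$$n^{-1}\bigl|\sigma^2(2\operatorname{tr}A_{1\lambda} - \operatorname{tr}A_{1\lambda}^TA_{1\lambda}) - e^T(A_{1\lambda} + A_{1\lambda}^T - A_{1\lambda}^TA_{1\lambda})e\bigr| = o_p(R_{1n}(\lambda)) \tag{14}$$
hold uniformly over all $\lambda\in[\lambda_1,\lambda_2]$, so that
$$C_{1L}(\lambda) = n^{-1}e^Te + R_{1n}(\lambda) + o_p(R_{1n}(\lambda)),$$
$$V_1(\lambda) = n^{-1}e^Te + R_{1n}(\lambda) + o_p(R_{1n}(\lambda)).$$
Then, by applying the same argument employed in Proposition 2, we have Proposition 4.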
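It follows from Lemmas 1(b), 3(c), 3(j) and 3(g) that
$$n^{-1}e^TB_{1\lambda}e = \bigl[n^{-1}e^T(I-S_\lambda)(Z+H)\bigr]A_1^{-1}(\lambda)\bigl[n^{-1}(Z+H)^T(I-S_\lambda)^2e\bigr] = o_p(R_{1n}(\lambda)), \tag{15}$$
$$n^{-1}e^TB_{1\lambda}^TS_\lambda e = \bigl[n^{-1}e^T(I-S_\lambda)^2(Z+H)\bigr]A_1^{-1}(\lambda)\times\bigl\{n^{-1}(Z+H)^T\bigl[(I-S_\lambda) - (I-S_\lambda)^2\bigr]e\bigr\} = o_p(R_{1n}(\lambda)), \tag{16}$$
$$n^{-1}e^TB_{1\lambda}^TB_{1\lambda}e = \bigl[n^{-1}e^T(I-S_\lambda)^2(Z+H)\bigr]A_1^{-1}(\lambda)\times\bigl[n^{-1}X^T(I-S_\lambda)^2X\bigr]A_1^{-1}(\lambda)\times\bigl[n^{-1}(Z+H)^T(I-S_\lambda)^2e\bigr] = o_p(R_{1n}(\lambda)). \tag{17}$$
It follows from Lemmas 3(b) and 2(c) that
$$n^{-1}\bigl[e^T(2S_\lambda - S_\lambda^2)e - \sigma^2\operatorname{tr}(2S_\lambda - S_\lambda^2)\bigr] = o_p(R_{1n}(\lambda)). \tag{18}$$
We conclude (13) and (14) by (9), (10) and (15)-(18). $\square$

6. Proof of Lemma 3. We begin with a technical lemma which is an extension of Lemma 4.4 in Speckman (1985) to the case when the random variables are not independent. Therefore, the Gaussian assumption in Speckman (1985) or Li (1986) is removed.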
LEMMA 6. Let $W_1,\dots,W_n$ be random variables with zero mean and finite variance. Suppose that there exist nonnegative numbers $\{u_k\}$ such that
$$E\Bigl[\sum_{k=\mu}^\nu W_k\Bigr]^2\le\sum_{k=\mu}^\nu u_k,\qquad\text{for all }\mu\le\nu.$$
Then, for any $c > 0$,
$$P\Bigl\{\sup_{0\le c_1\le\cdots\le c_n\le c_0}\Bigl|\sum_{k=1}^nc_kW_k\Bigr|\ge c\Bigr\}\le c^{-2}c_0^2(\log_24n)^2\sum_{k=1}^nu_k.$$

PROOF. By the argument used in Lemma 4.4 of Speckman (1985), we have
$$\sup_{0\le c_1\le\cdots\le c_n\le c_0}\Bigl|\sum_{k=1}^nc_kW_k\Bigr| = c_0\max_{1\le i\le n}\Bigl|\sum_{k=1}^iW_k\Bigr|.$$
Then, by the first two theorems stated in Serfling [(1970), page 1228],
$$E\Bigl[\max_{1\le i\le n}\Bigl|\sum_{k=1}^iW_k\Bigr|\Bigr]^2\le(\log_24n)^2\sum_{k=1}^nu_k.$$
Hence, this lemma holds by Chebyshev's inequality. $\square$

REMARK 1. When $EW_kW_l = 0$, for $k\ne l$, Lemma 6 holds with $u_k = \operatorname{Var}(W_k)$.

REMARK 2. Lemma 6 also holds when $0\le c_n\le\cdots\le c_1\le c_0$.
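The deterministic step behind Lemma 6 can be illustrated in the setting of Remark 2, where the weights are nonincreasing. Summation by parts (a standard fact, not specific to this paper) gives $|\sum_kc_kW_k|\le c_0\max_i|\sum_{k\le i}W_k|$ whenever $c_0\ge c_1\ge\cdots\ge c_n\ge 0$, so a bound on the maximal partial sum controls the whole family of weighted sums at once. The sketch below checks this inequality for given inputs; the function names are hypothetical.

```python
def weighted_sum(weights, w):
    """sum_k c_k W_k."""
    return sum(c * x for c, x in zip(weights, w))


def max_partial_sum(w):
    """max over i of |W_1 + ... + W_i|."""
    best, running = 0.0, 0.0
    for x in w:
        running += x
        best = max(best, abs(running))
    return best


def abel_bound_holds(weights, w, c0, tol=1e-12):
    """Check |sum_k c_k W_k| <= c0 * max_i |sum_{k<=i} W_k| for nonincreasing weights."""
    chain = [c0] + list(weights) + [0.0]
    if any(a < b for a, b in zip(chain, chain[1:])):
        raise ValueError("weights must satisfy c0 >= c_1 >= ... >= c_n >= 0")
    return abs(weighted_sum(weights, w)) <= c0 * max_partial_sum(w) + tol
```

In the proof of Lemma 3 the weights are of the form $(1+\lambda_{kn}\lambda)^{-1}$, which are nonincreasing in $k$ and bounded above by 1, so the inequality applies with $c_0 = 1$ for every $\lambda$ simultaneously.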
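Define
$$\xi_{krn} = n^{-1/2}\sum_{i=1}^nz_{irn}\phi_{kn}(t_{in}),\qquad h_{krn} = n^{-1/2}\sum_{i=1}^nh_r(t_{in})\phi_{kn}(t_{in}),$$
$$g_{kn} = n^{-1/2}\sum_{i=1}^ng(t_{in})\phi_{kn}(t_{in}),\qquad\varepsilon_{kn} = n^{-1/2}\sum_{i=1}^ne_{in}\phi_{kn}(t_{in}),$$
for $1\le k\le n$ and $1\le r\le d$. Lemma 6 will be applied to $\{\xi_{krn}\xi_{ksn}\}_{1\le k\le n}$ and $\{\xi_{krn}\varepsilon_{kn}\}_{1\le k\le n}$, for $1\le r,s\le d$, later on in the proof of Lemma 3. Thus we need to show that these two sequences of random variables satisfy the assumption of Lemma 6.

LEMMA 7. For any finite positive integer $l$ and $1\le r,s\le d$, both
$$\bigl\{(\xi_{krn}\xi_{ksn} - \sigma_{rs})(1+\lambda_{kn}\lambda)^{-l}\bigr\}_{1\le k\le n}\quad\text{and}\quad\bigl\{\xi_{krn}\varepsilon_{kn}(1+\lambda_{kn}\lambda)^{-l}\bigr\}_{1\le k\le n}$$
satisfy the assumption of Lemma 6 with $u_k = c^*(1+\lambda_{kn}\lambda)^{-2l}$, for some constant $c^*$.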
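PROOF. Recall that $S_\lambda = \Gamma^TD_\lambda\Gamma$. Set $D_{\mu\nu} = (d_{ik})_{n\times n}$, where $d_{ik} = 1$, if $\mu\le i = k\le\nu$, and $d_{ik} = 0$, otherwise. In other words, $D_{\mu\nu}$ is an $n\times n$ diagonal matrix with the diagonal entry equal to 1 from the $\mu$-th row to the $\nu$-th row and zero otherwise. Then
$$\sum_{k=\mu}^\nu\frac{\xi_{krn}\varepsilon_{kn}}{(1+\lambda_{kn}\lambda)^l} = z_r^T(\Gamma^TD_{\mu\nu}D_\lambda D_{\mu\nu}\Gamma)^le,$$
$$\sum_{k=\mu}^\nu\frac{\xi_{krn}\xi_{ksn} - \sigma_{rs}}{(1+\lambda_{kn}\lambda)^l} = z_r^T(\Gamma^TD_{\mu\nu}D_\lambda D_{\mu\nu}\Gamma)^lz_s - \sigma_{rs}\operatorname{tr}(\Gamma^TD_{\mu\nu}D_\lambda D_{\mu\nu}\Gamma)^l. \tag{19}$$
By (A1), (A4) and a conditioning argument, we have
$$E\Bigl[\sum_{k=\mu}^\nu\frac{\xi_{krn}\varepsilon_{kn}}{(1+\lambda_{kn}\lambda)^l}\Bigr]^2 = \sigma^2Ez_r^T(\Gamma^TD_{\mu\nu}D_\lambda D_{\mu\nu}\Gamma)^{2l}z_r = \sigma^2\sigma_{rr}\operatorname{tr}(\Gamma^TD_{\mu\nu}D_\lambda D_{\mu\nu}\Gamma)^{2l} = \sigma^2\sigma_{rr}\sum_{k=\mu}^\nu(1+\lambda_{kn}\lambda)^{-2l}.$$
Letting $u_k = \sigma^2\sigma_{rr}(1+\lambda_{kn}\lambda)^{-2l}$, we have shown that the assumption of Lemma 6 holds for $\{\xi_{krn}\varepsilon_{kn}(1+\lambda_{kn}\lambda)^{-l}\}_{1\le k\le n}$.

Next, by (19) we have
$$E\Bigl[\sum_{k=\mu}^\nu\frac{\xi_{krn}\xi_{ksn} - \sigma_{rs}}{(1+\lambda_{kn}\lambda)^l}\Bigr]^2 = \operatorname{Var}\bigl(z_r^T(\Gamma^TD_{\mu\nu}D_\lambda D_{\mu\nu}\Gamma)^lz_s\bigr),$$
since $E\bigl(z_r^T(\Gamma^TD_{\mu\nu}D_\lambda D_{\mu\nu}\Gamma)^lz_s\bigr) = \sigma_{rs}\operatorname{tr}(\Gamma^TD_{\mu\nu}D_\lambda D_{\mu\nu}\Gamma)^l$. We first show that, for any symmetric matrix $A = (a_{ij})_{n\times n}$,
$$\operatorname{Var}(z_r^TAz_s)\le c_0\operatorname{tr}A^2 \tag{20}$$
for $1\le r\le s\le d$, where $c_0$ is a constant depending on $Ez_r^2z_s^2$ and $\Sigma$ only. For notational simplicity, we only demonstrate the case of $r = 1$ and $s = 2$. First, we note that $Ez_1^TAz_2 = \sigma_{12}\operatorname{tr}A$ and
$$(z_1^TAz_2)^2 = \sum_i\sum_j\sum_k\sum_la_{ij}a_{kl}z_{i1n}z_{j2n}z_{k1n}z_{l2n}.$$
Since $\{(z_{i1n},z_{i2n})\}_{1\le i\le n}$ are mutually independent with mean $(0,0)$, we have
$$Ez_{i1n}z_{j2n}z_{k1n}z_{l2n} = \begin{cases}Ez_1^2z_2^2, & i = j = k = l,\\ \sigma_{12}^2, & i = j,\ k = l,\ i\ne k,\\ \sigma_{12}^2, & i = l,\ j = k,\ i\ne j,\\ \sigma_{11}\sigma_{22}, & i = k,\ j = l,\ i\ne j,\\ 0, & \text{otherwise.}\end{cases}$$
Hence
$$\operatorname{Var}(z_1^TAz_2) = (Ez_1^2z_2^2 - \sigma_{12}^2)\sum_ia_{ii}^2 + (\sigma_{12}^2 + \sigma_{11}\sigma_{22})\sum_{i\ne j}a_{ij}^2\le c_0\sum_{i,j}a_{ij}^2,$$
where $c_0 = \max(Ez_1^2z_2^2 - \sigma_{12}^2,\ \sigma_{12}^2 + \sigma_{11}\sigma_{22})$. Since $A$ is symmetric and $\sum_{i,j}a_{ij}^2 = \operatorname{tr}A^2$, (20) holds.

Let $A = (\Gamma^TD_{\mu\nu}D_\lambda D_{\mu\nu}\Gamma)^l$. By (19) and (20), we have
$$E\Bigl[\sum_{k=\mu}^\nu\frac{\xi_{krn}\xi_{ksn} - \sigma_{rs}}{(1+\lambda_{kn}\lambda)^l}\Bigr]^2 = \operatorname{Var}(z_r^TAz_s)\le c_0\operatorname{tr}(\Gamma^TD_{\mu\nu}D_\lambda D_{\mu\nu}\Gamma)^{2l} = c_0\sum_{k=\mu}^\nu(1+\lambda_{kn}\lambda)^{-2l}.$$
Thus $\{(\xi_{krn}\xi_{ksn} - \sigma_{rs})(1+\lambda_{kn}\lambda)^{-l}\}_{1\le k\le n}$ satisfies the assumption of Lemma 6 by identifying $u_k = c_0(1+\lambda_{kn}\lambda)^{-2l}$. $\square$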
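PROOF OF PART (a). First, we show the case of $l=1$, that is, we show that
$$
z_r^TS_\lambda z_s=\sigma_{rs}\operatorname{tr}S_\lambda+o_p(\lambda^{-1/(4ma_0)}) \tag{21}
$$
holds uniformly for all $\lambda\in[\lambda_1,\lambda_2]$; the argument of its proof will be used throughout the proof of Lemma 3. Since $\delta_2<\delta_1$, there exists $a>1$ such that $a\delta_2<\delta_1$. Define the index set $\Delta=\{\delta:\ \delta=a^i\delta_2$ for some positive integer $i$ and $\delta\le\delta_1\}$. Then $\Delta$ is a finite partition of $[\delta_2,\delta_1]$. Correspondingly, $\{n^{-\delta},\ \delta\in\Delta\}$ is a finite partition of $[\lambda_1,\lambda_2]$. For any $\tau=n^{-a\delta}$ with $a\delta\in\Delta$, $Ez_r^TS_\tau z_s=\sigma_{rs}\operatorname{tr}S_\tau$ and $\operatorname{Var}(z_r^TS_\tau z_s)\le c_0\operatorname{tr}S_\tau^2=O(\tau^{-1/2m})$ by (20) and Lemma 2(c). Thus by the Chebyshev inequality, we have
$$
z_r^TS_\tau z_s-\sigma_{rs}\operatorname{tr}S_\tau=O_p(\tau^{-1/4m}). \tag{22}
$$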
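Write
$$
\begin{aligned}
&(z_r^TS_\lambda z_s-\sigma_{rs}\operatorname{tr}S_\lambda)-(z_r^TS_\tau z_s-\sigma_{rs}\operatorname{tr}S_\tau)\\
&\quad=\sum_{k=1}^{n}\Big[\frac{1}{1+\lambda_{kn}\lambda}-\frac{1}{1+\lambda_{kn}\tau}\Big](\xi_{krn}\xi_{ksn}-\sigma_{rs})\\
&\quad=\Big(\frac{\tau}{\lambda}-1\Big)\sum_{k=1}^{n}\Big[1-\frac{1}{1+\lambda_{kn}\lambda}\Big]\frac{\xi_{krn}\xi_{ksn}-\sigma_{rs}}{1+\lambda_{kn}\tau}.
\end{aligned}
\tag{23}
$$
Note that $(1+\lambda_{kn}\lambda)^{-1}$ are nonincreasing in $k$ and bounded above by 1, and that $\{(\xi_{krn}\xi_{ksn}-\sigma_{rs})(1+\lambda_{kn}\tau)^{-1}\}_{1\le k\le n}$ satisfy the assumption of Lemma 6 with $u_k=c_0(1+\lambda_{kn}\tau)^{-2}$. Then, for any $c>0$ and $\delta\in\Delta$, we have
$$
\begin{aligned}
&P\Big(\sup_{n^{-a\delta}\le\lambda\le n^{-\delta}}\Big|\sum_{k=1}^{n}\frac{1}{1+\lambda_{kn}\lambda}\,\frac{\xi_{krn}\xi_{ksn}-\sigma_{rs}}{1+\lambda_{kn}\tau}\Big|>c\Big)\\
&\quad\le P\Big(\sup_{0\le 1/(1+\lambda_{nn}\lambda)\le\cdots\le 1/(1+\lambda_{1n}\lambda)\le 1}\Big|\sum_{k=1}^{n}\frac{1}{1+\lambda_{kn}\lambda}\,\frac{\xi_{krn}\xi_{ksn}-\sigma_{rs}}{1+\lambda_{kn}\tau}\Big|>c\Big)\\
&\quad\le c^{-2}(\log_2 4n)^2\sum_{k=1}^{n}u_k,
\end{aligned}
$$
by applying Lemma 6 to (23). Since
$$
\sum_{k}u_k=c_0\sum_{k}(1+\lambda_{kn}\tau)^{-2}=c_0c_2\tau^{-1/2m}(1+o(1))
$$
by Lemma 2(c), these arguments lead to
$$
(z_r^TS_\lambda z_s-\sigma_{rs}\operatorname{tr}S_\lambda)-(z_r^TS_\tau z_s-\sigma_{rs}\operatorname{tr}S_\tau)=O_p(\tau^{-1/4m}\log n) \tag{24}
$$
uniformly for all $\lambda\in[n^{-a\delta},n^{-\delta}]$. Then, by (22), for $n^{-a\delta}\le\lambda\le n^{-\delta}$,
$$
z_r^TS_\lambda z_s-\sigma_{rs}\operatorname{tr}S_\lambda=O_p(\tau^{-1/4m})+O_p(\tau^{-1/4m}\log n)=o_p(\lambda^{-1/(4ma_0)}),
$$
where $a_0$ is any fixed constant satisfying $1/a_0>a>1$. Since the cardinality of $\Delta$ is finite, (21) holds.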
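Now it remains to study the case when $l\ge 2$. Note that $Ez_r^TS_\tau^l z_s=\sigma_{rs}\operatorname{tr}S_\tau^l=c_l\sigma_{rs}\tau^{-1/2m}(1+o(1))$ and $\operatorname{Var}(z_r^TS_\tau^l z_s)\le c_0\operatorname{tr}S_\tau^{2l}=O(\tau^{-1/2m})$ by (20) and Lemma 2(c). Hence $z_r^TS_\tau^l z_s-\sigma_{rs}\operatorname{tr}S_\tau^l=O_p(\tau^{-1/4m})$ by the Chebyshev inequality. Some algebra shows that
$$
\begin{aligned}
&(z_r^TS_\lambda^l z_s-\sigma_{rs}\operatorname{tr}S_\lambda^l)-(z_r^TS_\tau^l z_s-\sigma_{rs}\operatorname{tr}S_\tau^l)\\
&\quad=\sum_{k=1}^{n}\big[(1+\lambda_{kn}\lambda)^{-l}-(1+\lambda_{kn}\tau)^{-l}\big](\xi_{krn}\xi_{ksn}-\sigma_{rs})\\
&\quad=\Big(\frac{\tau}{\lambda}-1\Big)\sum_{i=0}^{l-1}\sum_{k=1}^{n}\Big[\frac{1}{(1+\lambda_{kn}\lambda)^{i}}-\frac{1}{(1+\lambda_{kn}\lambda)^{i+1}}\Big]\frac{\xi_{krn}\xi_{ksn}-\sigma_{rs}}{(1+\lambda_{kn}\tau)^{l-i}}.
\end{aligned}
$$
Note that $(1+\lambda_{kn}\lambda)^{-i}$ are nonincreasing in $k$ and bounded above by 1. Hence, (a) holds by applying Lemma 6 to each term on the right-hand side of the above expression and by the argument used in showing (21). □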
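The rate $\operatorname{tr}S_\tau^l=c_l\tau^{-1/2m}(1+o(1))$ invoked from Lemma 2(c) can be illustrated numerically. The sketch below is an illustration under a stated assumption, not the paper's computation: it takes the eigenvalues to grow exactly like $\lambda_{kn}=(\pi k)^{2m}$, a hypothetical stand-in for the asymptotic growth of the Demmler-Reinsch eigenvalues, and checks that $\lambda^{1/2m}\sum_k(1+\lambda_{kn}\lambda)^{-l}$ stabilizes as $\lambda$ decreases. The function name and the default values of `m`, `l` and `n` are choices made for the example. Under this assumption with $m=2$ and $l=1$, the integral approximation of the sum gives the limit $1/(2\sqrt{2})$.

```python
import numpy as np

def normalized_trace(lam, m=2, l=1, n=5000):
    """lam^(1/(2m)) * sum_k (1 + lam_k * lam)^(-l) with assumed lam_k = (pi k)^(2m)."""
    k = np.arange(1, n + 1, dtype=float)
    eig = (np.pi * k) ** (2 * m)              # assumed eigenvalue growth (hypothetical)
    trace = np.sum((1.0 + lam * eig) ** (-l))  # stands in for tr S_lambda^l
    return trace * lam ** (1.0 / (2 * m))

limit = 1.0 / (2.0 * np.sqrt(2.0))  # integral approximation for m = 2, l = 1
for lam in (1e-4, 1e-6, 1e-8):
    print(lam, normalized_trace(lam), limit)
```

The normalized values approach the limit from below because a sum over $k\ge 1$ misses roughly half of the first unit interval of the integral; the gap is of order $\lambda^{1/2m}$.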
PROOF OF PART (b). (b) follows from (a) by identifying $z_r$ and $z_s$ in (a) with $e$ in (b). □
PROOF OF PART (c).
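For any finite positive integer $l$, observe that $E\,n^{-1/2}f^T(I-S_\tau)^le=0$ and
$$
\operatorname{Var}\big[n^{-1/2}f^T(I-S_\tau)^le\big]=n^{-1}\sigma^2f^T(I-S_\tau)^{2l}f\le n^{-1}\sigma^2f^T(I-S_\tau)^{2}f=O(\tau^2),
$$
since the eigenvalues of $S_\tau$ are between 0 and 1. Hence, for any given $\tau\in[\lambda_1,\lambda_2]$,
$$
n^{-1/2}f^T(I-S_\tau)^le=O_p(\tau). \tag{25}
$$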
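For $\tau=n^{-\delta}$ with $\delta\in\Delta$, write
$$
n^{-1/2}f^T\big[(I-S_\lambda)^l-(I-S_\tau)^l\big]e
=n^{-1/2}\,\frac{\lambda-\tau}{\tau}\sum_{k=1}^{n}\Big[\sum_{i=0}^{l-1}\Big(\frac{\lambda_{kn}\lambda}{1+\lambda_{kn}\lambda}\Big)^{i}\Big(\frac{\lambda_{kn}\tau}{1+\lambda_{kn}\tau}\Big)^{l-1-i}\Big]\frac{1}{1+\lambda_{kn}\lambda}\,\frac{\lambda_{kn}\tau}{1+\lambda_{kn}\tau}\,f_{kn}\varepsilon_{kn}, \tag{26}
$$
where $f_{kn}=n^{-1/2}\sum_{i=1}^{n}f(t_{in})\phi_{kn}(t_{in})$. Note that $\{[\lambda_{kn}\tau/(1+\lambda_{kn}\tau)]f_{kn}\varepsilon_{kn}\}$ does not depend on $\lambda$, that $E(f_{kn}\varepsilon_{kn}f_{ln}\varepsilon_{ln})=0$ for $k\ne l$, that $\{(1+\lambda_{kn}\lambda)^{-i}(1+\lambda_{kn}\tau)^{-j}\}$, for $1\le i,j\le l$, are nonincreasing in $k$, and that
$$
\operatorname{Var}\Big(n^{-1/2}\sum_{k=1}^{n}\frac{\lambda_{kn}\tau}{1+\lambda_{kn}\tau}f_{kn}\varepsilon_{kn}\Big)
=n^{-1}\sigma^2\sum_{k=1}^{n}\Big(\frac{\lambda_{kn}\tau}{1+\lambda_{kn}\tau}\Big)^2f_{kn}^2
=n^{-1}\sigma^2f^T(I-S_\tau)^2f=O(\tau^2).
$$
It follows from Remark 1 following Lemma 6 that we can apply Lemma 6 to each term on the right-hand side of (26). Thus we conclude that
$$
n^{-1/2}\big[f^T(I-S_\lambda)^le-f^T(I-S_\tau)^le\big]=O_p\big((\tau-\lambda)\log n\big) \tag{27}
$$
holds uniformly for all $\lambda\in[n^{-a\delta},n^{-\delta}]$. By (25) and (27), for any $a_1>a$, $n^{-1/2}f^T(I-S_\lambda)^le=O_p(\lambda^{1/a_1})$ holds uniformly for all $\lambda\in[\lambda_1,\lambda_2]$ and finite positive integer $l$. Hence, (c) holds.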
□
PROOF OF PART (d). (d) can be shown similarly by identifying $e$ in (c) with $z_r$ in (d). □
PROOF OF PART (e). Note that
by the Cauchy-Schwarz inequality. Since the eigenvalues of $S_\lambda$ are between 0 and 1, (e) holds by Lemma 2(a). □
PROOF OF PART (f). Write $X^T(I-S_\lambda)^2Z=Z^TZ+H^T(I-S_\lambda)^2Z+Z^T(S_\lambda^2-2S_\lambda)Z$. Then, by (A1) (in Section 2) and the law of large numbers, $n^{-1}Z^TZ=\Sigma+o_p(1)$. Hence, (f) follows from (a) and (d). □
PROOF OF PART (g). Note that
$$
n^{-1}x_r^T(I-S_\lambda)^lx_s=n^{-1}z_r^T(I-S_\lambda)^lz_s+n^{-1}h_r^T(I-S_\lambda)^lz_s+n^{-1}h_s^T(I-S_\lambda)^lz_r+n^{-1}h_r^T(I-S_\lambda)^lh_s.
$$
Recall that $n^{-1}Z^TZ=\Sigma+o_p(1)$. Hence, it follows easily from (a), (d) and (e) that (g) holds. □
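PROOF OF PART (h). Write
$$
\begin{aligned}
x_r^T(I-S_\lambda)^2S_\lambda x_s={}&z_r^T(S_\lambda-2S_\lambda^2+S_\lambda^3)z_s+h_r^T(I-S_\lambda)^2S_\lambda h_s\\
&+h_r^T\big[(I-S_\lambda)^2-(I-S_\lambda)^3\big]z_s+h_s^T\big[(I-S_\lambda)^2-(I-S_\lambda)^3\big]z_r.
\end{aligned}
$$
Since $|h_r^T(I-S_\lambda)^2S_\lambda h_s|\le nB_{2rp}B_{2sp}=O(n\lambda^2)$, (h) holds by (a) and (d). □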
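PROOF OF PART (i). Observe that $Ez_r^TS_\tau^le=0$ and $\operatorname{Var}(z_r^TS_\tau^le)=\sigma^2E(z_r^TS_\tau^{2l}z_r)=O(\tau^{-1/2m})$ by (20) and Lemma 2(c). Hence $z_r^TS_\tau^le=O_p(\tau^{-1/4m})$, for any given sequence $\tau=n^{-a\delta}$ with $a\delta\in\Delta$. Write
$$
z_r^TS_\lambda^le-z_r^TS_\tau^le=\Big(\frac{\tau}{\lambda}-1\Big)\sum_{v=0}^{l-1}\sum_{k=1}^{n}\Big[\frac{1}{(1+\lambda_{kn}\lambda)^{v}}-\frac{1}{(1+\lambda_{kn}\lambda)^{v+1}}\Big]\frac{\xi_{krn}\varepsilon_{kn}}{(1+\lambda_{kn}\tau)^{l-v}}. \tag{28}
$$
Note that $(1+\lambda_{kn}\lambda)^{-i}$ are nonincreasing in $k$ and bounded above by 1. By Lemma 7, $\{\xi_{krn}\varepsilon_{kn}(1+\lambda_{kn}\tau)^{-i}\}$ satisfies the assumption of Lemma 6. By applying Lemma 6 to each term on the right-hand side of (28), we conclude that
$$
z_r^TS_\lambda^le=O_p\big((1-\lambda^{-1}\tau)\tau^{-1/4m}\log n\big)=o_p(\lambda^{-1/(4ma_0)}) \tag{29}
$$
holds uniformly over $\lambda\in[n^{-a\delta},n^{-\delta}]$. Hence, (i) holds. □
PROOF OF PART (j). Write
$$
n^{-1/2}X^T(I-S_\lambda)^2e=n^{-1/2}H^T(I-S_\lambda)^2e+n^{-1/2}Z^Te+n^{-1/2}Z^T(-2S_\lambda+S_\lambda^2)e.
$$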
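PROOF OF PARTS (k) AND (l). It follows from (a) and (d) that
$$
\begin{aligned}
x_r^T(I-S_\lambda)^3x_s-x_r^T(I-S_\lambda)^2z_s
={}&-z_r^T(I-S_\lambda)^2S_\lambda z_s-h_r^T(I-S_\lambda)^2z_s+h_r^T(I-S_\lambda)^3z_s\\
&+h_s^T(I-S_\lambda)^3z_r+h_r^T(I-S_\lambda)^3h_s\\
={}&-(c_1-2c_2+c_3)\sigma_{rs}\lambda^{-1/2m}+o_p(\lambda^{-1/(4ma_0)})+O_p(n^{1/2}\lambda^{1/a_1})\\
&+h_r^T(I-S_\lambda)^3h_s,
\end{aligned}
$$
$$
\begin{aligned}
x_r^T(I-S_\lambda)^3x_s-x_r^T(I-S_\lambda)^4x_s
={}&x_r^T(I-S_\lambda)^3S_\lambda x_s\\
={}&z_r^T(I-S_\lambda)^3S_\lambda z_s+h_s^T\big[(I-S_\lambda)^3-(I-S_\lambda)^4\big]z_r\\
&+h_r^T\big[(I-S_\lambda)^3-(I-S_\lambda)^4\big]z_s+h_r^T(I-S_\lambda)^3S_\lambda h_s\\
={}&(c_1-3c_2+3c_3-c_4)\sigma_{rs}\lambda^{-1/2m}+o_p(\lambda^{-1/(4ma_0)})+O_p(n^{1/2}\lambda^{1/a_1})\\
&+h_r^T(I-S_\lambda)^3S_\lambda h_s.
\end{aligned}
$$
Hence, we conclude (k) and (l). □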
7. Proofs of Lemmas 4 and 5.
PROOF OF LEMMA 4. Recall $R_{0n}(\lambda)\asymp\lambda^2+n^{-1}\lambda^{-1/2m}$. From now on, we require that the three constants $a$, $a_0$ and $a_1$ in Lemma 3 satisfy $4m/(4m-1)>a_1>a$ and $a_0>1/2$ so that, for $\lambda\in[\lambda_1,\lambda_2]$,
$$
n^{-1}\lambda^{-1/(4ma_0)}=o(\lambda^2+n^{-1}\lambda^{-1/2m})=o(R_{0n}(\lambda)) \tag{30}
$$
and
$$
n^{-1/2}\lambda^{1/a_1}=o(\lambda^2+n^{-1}\lambda^{-1/2m})=o(R_{0n}(\lambda)). \tag{31}
$$
Equations (30) and (31) can be verified by simple algebra. Recall that $A_{0\lambda}=S_\lambda+B_{0\lambda}$, where $B_{0\lambda}=n^{-1}(I-S_\lambda)^2XA_0^{-1}(\lambda)X^T(I-S_\lambda)^2$ and $A_0(\lambda)=n^{-1}X^T(I-S_\lambda)^3X$.
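The algebra behind (30) and (31) can be spelled out in a few lines. The following is a sketch of one way to verify them, assuming $\lambda\to 0$ uniformly on $[\lambda_1,\lambda_2]$ (as when $\lambda_2=n^{-\delta_2}$ with $\delta_2>0$). The first line compares the left side of (30) with the variance term of $R_{0n}(\lambda)$ and needs $a_0>1/2$. The second line bounds $R_{0n}(\lambda)$ from below by the arithmetic-geometric mean inequality, and the third line uses that bound to obtain (31).

```latex
\begin{align*}
\frac{n^{-1}\lambda^{-1/(4ma_0)}}{n^{-1}\lambda^{-1/(2m)}}
  &= \lambda^{\frac{1}{2m}\left(1-\frac{1}{2a_0}\right)} \to 0
  \quad\text{when } a_0>\tfrac12 \text{ and } \lambda\to 0,\\
\lambda^{2}+n^{-1}\lambda^{-1/(2m)}
  &\ge 2\,n^{-1/2}\lambda^{1-1/(4m)},\\
\frac{n^{-1/2}\lambda^{1/a_1}}{\lambda^{2}+n^{-1}\lambda^{-1/(2m)}}
  &\le \tfrac12\,\lambda^{\frac{1}{a_1}-\frac{4m-1}{4m}} \to 0
  \quad\text{when } a_1<\tfrac{4m}{4m-1} \text{ and } \lambda\to 0.
\end{align*}
```

The restriction $a_1<4m/(4m-1)$ is exactly what makes the exponent in the last line positive.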
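[(a) and (b)] By Lemma 3(g), we have
$$
\operatorname{tr}B_{0\lambda}=\operatorname{tr}\big\{A_0^{-1}(\lambda)\big[n^{-1}X^T(I-S_\lambda)^4X\big]\big\}=O_p(1), \tag{32}
$$
$$
\operatorname{tr}B_{0\lambda}^2=\operatorname{tr}\big\{A_0^{-1}(\lambda)\big[n^{-1}X^T(I-S_\lambda)^4X\big]\big\}^2=O_p(1), \tag{33}
$$
$$
\operatorname{tr}S_\lambda B_{0\lambda}=\operatorname{tr}A_0^{-1}(\lambda)\big\{n^{-1}X^T\big[(I-S_\lambda)^4-(I-S_\lambda)^5\big]X\big\}=o_p(1). \tag{34}
$$
This, together with Lemma 2(c), proves (a) and (b).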
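The first equalities in (32)-(34) are cyclic-trace identities and can be confirmed numerically. The sketch below is an illustration only: the matrix `S` is a hypothetical symmetric smoother with eigenvalues in $(0,1)$ built from a random orthogonal basis, not an actual spline smoother, and `X` is random Gaussian data. The names `A0`, `B0`, `M4` and `M5` follow the definitions of $A_0(\lambda)$ and $B_{0\lambda}$ recalled above; the eigenvalue sequence and the sizes `n`, `d` are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 40, 3

# Hypothetical stand-in for S_lambda: symmetric, eigenvalues strictly between 0 and 1.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
eigs = 1.0 / (1.0 + 0.05 * np.arange(1, n + 1) ** 2)
S = Q @ np.diag(eigs) @ Q.T

X = rng.standard_normal((n, d))   # illustrative design matrix
R = np.eye(n) - S                 # I - S_lambda
R2 = R @ R

A0 = X.T @ R2 @ R @ X / n                  # A_0(lambda) = n^{-1} X'(I-S)^3 X
A0_inv = np.linalg.inv(A0)
B0 = R2 @ X @ A0_inv @ X.T @ R2 / n        # B_{0 lambda}
M4 = X.T @ R2 @ R2 @ X / n                 # n^{-1} X'(I-S)^4 X
M5 = X.T @ R2 @ R2 @ R @ X / n             # n^{-1} X'(I-S)^5 X

lhs_32, rhs_32 = np.trace(B0), np.trace(A0_inv @ M4)
lhs_33, rhs_33 = np.trace(B0 @ B0), np.trace(A0_inv @ M4 @ A0_inv @ M4)
lhs_34, rhs_34 = np.trace(S @ B0), np.trace(A0_inv @ (M4 - M5))

print(np.isclose(lhs_32, rhs_32), np.isclose(lhs_33, rhs_33), np.isclose(lhs_34, rhs_34))
```

The identities reduce traces of $n\times n$ matrices to traces of $d\times d$ matrices, which is what allows Lemma 3(g) to control them.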
(c) It follows from Lemma 2(b) that
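Also, by Lemma 1(a) and Lemma 3(d) and (e), we have
$$
\begin{aligned}
n^{-1}g_0^TB_{0\lambda}^2g_0={}&\big[n^{-1}g_0^T(I-S_\lambda)^2H+n^{-1}g_0^T(I-S_\lambda)^2Z\big]\\
&\times\big\{A_0^{-1}(\lambda)\big[n^{-1}X^T(I-S_\lambda)^4X\big]A_0^{-1}(\lambda)\big\}\\
&\times\big[n^{-1}H^T(I-S_\lambda)^2g_0+n^{-1}Z^T(I-S_\lambda)^2g_0\big]\\
={}&\big[O(\lambda^2)+o_p(n^{-1/2})\big]\big[\Sigma^{-1}+o_p(1)\big]\big[O(\lambda^2)+o_p(n^{-1/2})\big]=o_p(\lambda^2).
\end{aligned}
$$
This, together with the Cauchy-Schwarz inequality and (35), leads to (c).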
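(d) Write
$$
n^{-1}Z^T(I-A_{0\lambda})^2Z=n^{-1}\big\{Z^T(S_\lambda^2-2S_\lambda)Z+Z^T\big[(I-B_{0\lambda})-(B_{0\lambda}-B_{0\lambda}^2)\big]Z+Z^TS_\lambda B_{0\lambda}Z+Z^TB_{0\lambda}S_\lambda Z\big\}.
$$
By Lemma 3(a), we have the first term
$$
n^{-1}(Z\beta)^T(S_\lambda^2-2S_\lambda)(Z\beta)=(c_2-2c_1)\beta^T\Sigma\beta\,n^{-1}\lambda^{-1/2m}(1+o_p(1)). \tag{36}
$$
It follows from Lemma 3(a), (d) and (f), Lemma 1(a), (30) and (31) that the third term
$$
n^{-1}\beta^TZ^TS_\lambda B_{0\lambda}Z\beta=\beta^T\big[n^{-1}Z^TS_\lambda(I-S_\lambda)^2X\big]A_0^{-1}(\lambda)\big[n^{-1}X^T(I-S_\lambda)^2Z\big]\beta=(c_1-2c_2+c_3)\beta^T\Sigma\beta\,n^{-1}\lambda^{-1/2m}(1+o_p(1)).
$$
The fourth term has the same rate. Observe that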
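$$
\begin{aligned}
n^{-1}Z^T(I-B_{0\lambda})Z={}&\big[n^{-1}Z^TZA_0^{-1}(\lambda)\big]\big[A_0(\lambda)-n^{-1}X^T(I-S_\lambda)^2Z\big]\\
&+\big\{\big[n^{-1}Z^T(2S_\lambda-S_\lambda^2)Z\big]-\big[n^{-1}Z^T(I-S_\lambda)^2H\big]\big\}A_0^{-1}(\lambda)\big[n^{-1}X^T(I-S_\lambda)^2Z\big],\\
n^{-1}Z^T(B_{0\lambda}-B_{0\lambda}^2)Z={}&\big[n^{-1}Z^T(I-S_\lambda)^2X\big]A_0^{-1}(\lambda)\big[A_0(\lambda)-n^{-1}X^T(I-S_\lambda)^4X\big]\\
&\times A_0^{-1}(\lambda)\big[n^{-1}X^T(I-S_\lambda)^2Z\big].
\end{aligned}
$$
Then by Lemma 3(d) and (f), we have $n^{-1}H^T(I-S_\lambda)^2Z=o_p(\lambda^2+n^{-1}\lambda^{-1/2m})$ and $n^{-1}X^T(I-S_\lambda)^2Z=\Sigma+o_p(1)$. Also, by (A1) (in Section 2) and the law of large numbers, we see that $n^{-1}Z^TZ=\Sigma+o_p(1)$. Hence, it follows from Lemma 1(a) that
Hence, by Lemma 1(a), (36) and Lemma 3(k) and (l), we conclude that
$$
\begin{aligned}
&n^{-1}(Z\beta)^T\big[(I-B_{0\lambda})-(B_{0\lambda}-B_{0\lambda}^2)\big](Z\beta)\\
&\quad=n^{-1}\big[(H\beta)^T(I-S_\lambda)^4(H\beta)+o_p(\lambda^{-1/(4ma_0)})+O_p(n^{1/2}\lambda^{1/a_1})+(4c_2-4c_3+c_4)\beta^T\Sigma\beta\,\lambda^{-1/2m}\big](1+o_p(1)).
\end{aligned}
$$
This, together with (36) and (37), proves (d).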
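(e) Note that $n^{-1}g_0^T(I-S_\lambda)^lh_r=O(\lambda^2)$, for $l\ge 2$, and
$$
n^{-1}X^T(I-S_\lambda)^2e=O_p(n^{-1/2}\lambda^{1/a_1})+O_p(n^{-1/2})=O_p(n^{-1/2}) \tag{39}
$$
by Lemma 3(c) and (j). Then by (31), (39), Lemma 1(a) and Lemma 3(d), (e) and (g),
$$
\begin{aligned}
n^{-1}g_0^TB_{0\lambda}^2e={}&\big[n^{-1}g_0^T(I-S_\lambda)^2X\big]A_0^{-1}(\lambda)\big[n^{-1}X^T(I-S_\lambda)^4X\big]A_0^{-1}(\lambda)\big[n^{-1}X^T(I-S_\lambda)^2e\big]\\
={}&O_p(n^{-1/2}\lambda^{1/a_1}+\lambda^2)\big[O_p(n^{-1/2}\lambda^{1/a_1})+O_p(n^{-1/2})\big]\\
={}&o_p(R_{0n}(\lambda)).
\end{aligned}
$$
Similarly, we have
$$
n^{-1}g_0^T(I-S_\lambda)B_{0\lambda}e=\big[n^{-1}g_0^T(I-S_\lambda)^3X\big]A_0^{-1}(\lambda)\big[n^{-1}X^T(I-S_\lambda)^2e\big]=o_p(R_{0n}(\lambda)).
$$
By Lemma 3(c) and (31), $n^{-1}g_0^T(I-S_\lambda)^2e=o_p(R_{0n}(\lambda))$. Putting these results together, we have (e).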
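(f) Write $(I-A_{0\lambda})^2=S_\lambda^2-(I-B_{0\lambda})S_\lambda-S_\lambda(I-B_{0\lambda})+(I-B_{0\lambda})-(B_{0\lambda}-B_{0\lambda}^2)$. By the central limit theorem, $n^{-1}Z^Te=O_p(n^{-1/2})$. Then it follows from Lemma 1(a) and Lemma 3(c), (k), (f) and (i) that
$$
\begin{aligned}
n^{-1}Z^T(I-B_{0\lambda})e={}&\big[A_0(\lambda)-n^{-1}Z^T(I-S_\lambda)^2X\big]A_0^{-1}(\lambda)(n^{-1}Z^Te)\\
&-\big[n^{-1}Z^T(I-S_\lambda)^2X\big]A_0^{-1}(\lambda)\big[n^{-1}H^T(I-S_\lambda)^2e-n^{-1}Z^T(2S_\lambda-S_\lambda^2)e\big]\\
={}&o_p(R_{0n}(\lambda)).
\end{aligned}
$$
By Lemma 3(l), we have $A_0(\lambda)-n^{-1}X^T(I-S_\lambda)^4X=O_p(R_{0n}(\lambda))$. It then follows by (39) and Lemma 3(f) that
$$
\begin{aligned}
n^{-1}Z^T(B_{0\lambda}-B_{0\lambda}^2)e={}&\big[n^{-1}Z^T(I-S_\lambda)^2X\big]A_0^{-1}(\lambda)\\
&\times\big[A_0(\lambda)-n^{-1}X^T(I-S_\lambda)^4X\big]A_0^{-1}(\lambda)\big[n^{-1}X^T(I-S_\lambda)^2e\big]\\
={}&o_p(R_{0n}(\lambda)).
\end{aligned}
$$
Next,
$$
\begin{aligned}
n^{-1}Z^T(I-B_{0\lambda})S_\lambda e={}&n^{-1}Z^TS_\lambda e-\big[n^{-1}Z^T(I-S_\lambda)^2X\big]A_0^{-1}(\lambda)\\
&\times\big\{n^{-1}X^T\big[(I-S_\lambda)^2-(I-S_\lambda)^3\big]e\big\}\\
={}&o_p(R_{0n}(\lambda)),
\end{aligned}
\tag{40}
$$
since $n^{-1}Z^TS_\lambda e=o_p(R_{0n}(\lambda))$, by Lemma 3(i) and the fact that
$$
\begin{aligned}
n^{-1}X^T\big[(I-S_\lambda)^2-(I-S_\lambda)^3\big]e={}&n^{-1}H^T\big[(I-S_\lambda)^2-(I-S_\lambda)^3\big]e+n^{-1}Z^T(S_\lambda-2S_\lambda^2+S_\lambda^3)e\\
={}&o_p(R_{0n}(\lambda)),
\end{aligned}
\tag{41}
$$
by Lemma 3(c) and (i). Similarly, $n^{-1}Z^TS_\lambda(I-B_{0\lambda})e=o_p(R_{0n}(\lambda))$. Finally, $n^{-1}Z^TS_\lambda^2e=o_p(R_{0n}(\lambda))$ by Lemma 3(i). Combining all the terms, we have (f).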
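(g) Write $A_{0\lambda}^2-2A_{0\lambda}=S_\lambda^2-2S_\lambda+(B_{0\lambda}S_\lambda+S_\lambda B_{0\lambda})+B_{0\lambda}^2-2B_{0\lambda}$. First, we note that
$$
n^{-1}(e^TS_\lambda^le-\sigma^2\operatorname{tr}S_\lambda^l)=o_p(n^{-1}\lambda^{-1/(4ma_0)})=o_p(R_{0n}(\lambda)), \tag{42}
$$
by the proofs of Lemma 3(a) and (b) and (30). Next, by Lemma 1(a), (39) and (41) and Lemma 3(g),
$$
\begin{aligned}
n^{-1}e^TB_{0\lambda}e&=\big[n^{-1}e^T(I-S_\lambda)^2X\big]A_0^{-1}(\lambda)\big[n^{-1}X^T(I-S_\lambda)^2e\big]\\
&=\big[O_p(n^{-1/2}\lambda^{1/a_1})+O_p(n^{-1/2})\big]^2=o_p(R_{0n}(\lambda)),
\end{aligned}
\tag{43}
$$
$$
n^{-1}e^TB_{0\lambda}S_\lambda e=\big[n^{-1}e^T(I-S_\lambda)^2X\big]A_0^{-1}(\lambda)\big[n^{-1}X^T(I-S_\lambda)^2S_\lambda e\big]=o_p(R_{0n}(\lambda)), \tag{44}
$$
$$
n^{-1}e^TB_{0\lambda}^2e=\big[n^{-1}e^T(I-S_\lambda)^2X\big]A_0^{-1}(\lambda)\big[n^{-1}X^T(I-S_\lambda)^4X\big]A_0^{-1}(\lambda)\big[n^{-1}X^T(I-S_\lambda)^2e\big]=o_p(R_{0n}(\lambda)). \tag{45}
$$
Part (g) holds by (32)-(34), (43)-(45) and (42). □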
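PROOF OF LEMMA 5. Recall that $A_{1\lambda}=S_\lambda+B_{1\lambda}$, where $B_{1\lambda}=n^{-1}(I-S_\lambda)XA_1^{-1}(\lambda)X^T(I-S_\lambda)^2$ and $A_1(\lambda)=n^{-1}X^T(I-S_\lambda)^2X$.
(a) By Lemma 3(d), (e) and (g),
$$
n^{-1}g_0^TB_{1\lambda}^TB_{1\lambda}g_0=\big[n^{-1}g_0^T(I-S_\lambda)^2X\big]A_1^{-1}(\lambda)\big[n^{-1}X^T(I-S_\lambda)^2X\big]A_1^{-1}(\lambda)\big[n^{-1}X^T(I-S_\lambda)^2g_0\big].
$$
It can be easily verified that $O_p(n^{-1}\lambda^{2/a_1})=o_p(\lambda^2)$. This, together with the Cauchy-Schwarz inequality and (35), proves (a).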
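The role of $A_{1\lambda}=S_\lambda+B_{1\lambda}$ as a hat matrix can be made concrete. The sketch below assumes the usual form of the partial regression estimator, which is not restated in this section: $\hat\beta$ is the least-squares fit of $(I-S_\lambda)y$ on $(I-S_\lambda)X$, and $\hat g=S_\lambda(y-X\hat\beta)$. Under that assumption, the fitted values $X\hat\beta+\hat g$ should equal $A_{1\lambda}y$. The smoother `S` is again a hypothetical symmetric matrix with eigenvalues in $(0,1)$, and `X`, `y` are random illustrative data.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 2

# Hypothetical stand-in for S_lambda (symmetric, eigenvalues in (0, 1)).
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
S = Q @ np.diag(1.0 / (1.0 + 0.1 * np.arange(1, n + 1) ** 2)) @ Q.T

X = rng.standard_normal((n, d))   # illustrative design
y = rng.standard_normal(n)        # illustrative response

R = np.eye(n) - S                 # I - S_lambda
X_res, y_res = R @ X, R @ y       # partial residuals of X and y

# Assumed partial regression estimator: least squares on the partial residuals.
beta_hat = np.linalg.solve(X_res.T @ X_res, X_res.T @ y_res)
g_hat = S @ (y - X @ beta_hat)
fitted_direct = X @ beta_hat + g_hat

# Hat-matrix form from the definitions of A_1(lambda) and B_{1 lambda}.
A1 = X.T @ R @ R @ X / n
B1 = R @ X @ np.linalg.inv(A1) @ X.T @ R @ R / n
fitted_hat = (S + B1) @ y

print(np.allclose(fitted_direct, fitted_hat))
```

Note that `B1` is not symmetric, which is why the proof works with $(I-A_{1\lambda})^T(I-A_{1\lambda})$ rather than $(I-A_{1\lambda})^2$.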
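(b) Write
$$
\beta^TZ^T(I-A_{1\lambda})^T(I-A_{1\lambda})Z\beta=\beta^TZ^T(S_\lambda^2-2S_\lambda)Z\beta+2\beta^TZ^TS_\lambda B_{1\lambda}Z\beta+\beta^TZ^T\big[(I-B_{1\lambda})-B_{1\lambda}^T(I-B_{1\lambda})\big]Z\beta.
$$
By (36), Lemma 3(a) and (f) and Lemma 1(b), we have the second term,
$$
\begin{aligned}
n^{-1}\beta^TZ^TS_\lambda B_{1\lambda}Z\beta&=\beta^T\big[n^{-1}Z^TS_\lambda(I-S_\lambda)X\big]A_1^{-1}(\lambda)\big[n^{-1}X^T(I-S_\lambda)^2Z\big]\beta\\
&=(c_1-c_2)\beta^T\Sigma\beta\,n^{-1}\lambda^{-1/2m}+o_p(n^{-1}\lambda^{-1/(4ma_0)})+O_p(n^{-1/2}\lambda^{1/a_1}).
\end{aligned}
\tag{46}
$$
Observe that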
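$$
\begin{aligned}
n^{-1}Z^T(I-B_{1\lambda})Z={}&\big[n^{-1}Z^TZA_1^{-1}(\lambda)\big]\big[A_1(\lambda)-n^{-1}X^T(I-S_\lambda)^2Z\big]\\
&+(n^{-1}Z^TS_\lambda Z)A_1^{-1}(\lambda)\big[n^{-1}X^T(I-S_\lambda)^2Z\big]\\
&-\big[n^{-1}Z^T(I-S_\lambda)H\big]A_1^{-1}(\lambda)\big[n^{-1}X^T(I-S_\lambda)^2Z\big],\\
n^{-1}Z^TB_{1\lambda}^T(I-B_{1\lambda})Z={}&\big[n^{-1}Z^T(I-S_\lambda)^2X\big]A_1^{-1}(\lambda)\\
&\times\big\{\big[n^{-1}X^T(I-S_\lambda)Z-n^{-1}X^T(I-S_\lambda)^2X\big]\\
&\quad+\big[n^{-1}X^T(I-S_\lambda)^2X\big]A_1^{-1}(\lambda)\big[A_1(\lambda)-n^{-1}X^T(I-S_\lambda)^2Z\big]\big\}.
\end{aligned}
$$
It follows from Lemma 3(a) and (d) that
$$
\begin{aligned}
x_r^T(I-S_\lambda)^2x_s-x_r^T(I-S_\lambda)^2z_s&=x_r^T(I-S_\lambda)^2h_s\\
&=h_r^T(I-S_\lambda)^2h_s+O_p(n^{1/2}\lambda^{1/a_1})\\
&=h_r^T(I-S_\lambda)^2h_s+o_p(n\lambda^2+\lambda^{-1/2m}),\\
x_r^T(I-S_\lambda)z_s-x_r^T(I-S_\lambda)^2x_s&=(c_1-c_2)\sigma_{rs}\lambda^{-1/2m}-h_r^T(I-S_\lambda)^2h_s+o_p(\lambda^{-1/(4ma_0)})+O_p(n^{1/2}\lambda^{1/a_1})\\
&=(c_1-c_2)\sigma_{rs}\lambda^{-1/2m}-h_r^T(I-S_\lambda)^2h_s+o_p(n\lambda^2+\lambda^{-1/2m}).
\end{aligned}
$$
Note that $|n^{-1}h_r^T(I-S_\lambda)^2h_s|=O(\lambda^2)$ by Lemma 3(e). Hence,
$$
\beta^T\big[A_1(\lambda)-n^{-1}X^T(I-S_\lambda)^2Z\big]\beta=n^{-1}\big[o_p(n\lambda^2+\lambda^{-1/2m})+(H\beta)^T(I-S_\lambda)^2H\beta\big](1+o_p(1)),
$$
$$
\beta^T\big[n^{-1}X^T(I-S_\lambda)Z-n^{-1}X^T(I-S_\lambda)^2X\big]\beta=n^{-1}\big[(c_1-c_2)\beta^T\Sigma\beta\,\lambda^{-1/2m}-(H\beta)^T(I-S_\lambda)^2H\beta+o_p(n\lambda^2+\lambda^{-1/2m})\big]. \tag{48}
$$
By Lemma 1(b), (38), (47), (48) and Lemma 3(a), (d), (f) and (g), we conclude that
$$
n^{-1}(Z\beta)^T\big[(I-B_{1\lambda})-B_{1\lambda}^T(I-B_{1\lambda})\big](Z\beta)=n^{-1}\big[(H\beta)^T(I-S_\lambda)^2(H\beta)+c_2\beta^T\Sigma\beta\,\lambda^{-1/2m}\big](1+o_p(1)). \tag{49}
$$
Part (b) holds by (36), (46) and (49).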
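(c) Write $(I-A_{1\lambda})^T(I-A_{1\lambda})=(I-S_\lambda)^2+B_{1\lambda}^TB_{1\lambda}-(I-S_\lambda)B_{1\lambda}-B_{1\lambda}^T(I-S_\lambda)$. By Lemma 1(b), (48) and Lemma 3(c)-(e), (g) and (j), we have
$$
\begin{aligned}
n^{-1}g_0^TB_{1\lambda}^TB_{1\lambda}e={}&\big[n^{-1}g_0^T(I-S_\lambda)^2X\big]A_1^{-1}(\lambda)\big[n^{-1}X^T(I-S_\lambda)^2X\big]A_1^{-1}(\lambda)\big[n^{-1}X^T(I-S_\lambda)^2e\big]\\
={}&O_p(n^{-1/2}\lambda^{1/a_1}+\lambda^2)\big[O_p(n^{-1/2}\lambda^{1/a_1})+O_p(n^{-1/2})\big]\\
={}&o_p(R_{1n}(\lambda)),\\
n^{-1}g_0^T(I-S_\lambda)B_{1\lambda}e={}&n^{-1}g_0^TB_{1\lambda}^T(I-S_\lambda)e\\
={}&\big[n^{-1}g_0^T(I-S_\lambda)^2X\big]A_1^{-1}(\lambda)\big[n^{-1}X^T(I-S_\lambda)^2e\big]=o_p(R_{1n}(\lambda)).
\end{aligned}
$$
By Lemma 3(c), $n^{-1}g_0^T(I-S_\lambda)^2e=o_p(R_{1n}(\lambda))$. Hence,
$$
n^{-1}g_0^T(I-A_{1\lambda})^T(I-A_{1\lambda})e=o_p(R_{1n}(\lambda)). \tag{50}
$$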
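Write
$$
(I-A_{1\lambda})^T(I-A_{1\lambda})=(I-B_{1\lambda})^T(I-B_{1\lambda})+S_\lambda^2-(I-B_{1\lambda})^TS_\lambda-S_\lambda(I-B_{1\lambda})
$$
and
$$
Z^T(I-B_{1\lambda})^T(I-B_{1\lambda})e=Z^T(I-B_{1\lambda})e-Z^TB_{1\lambda}^T(I-B_{1\lambda})e.
$$
Recall that $n^{-1}Z^Te=O_p(n^{-1/2})$. Then
$$
\begin{aligned}
n^{-1}Z^T(I-B_{1\lambda})e={}&\big[A_1(\lambda)-n^{-1}Z^T(I-S_\lambda)X\big]A_1^{-1}(\lambda)(n^{-1}Z^Te)\\
&-\big[n^{-1}Z^T(I-S_\lambda)X\big]A_1^{-1}(\lambda)\big[n^{-1}H^T(I-S_\lambda)^2e-n^{-1}Z^T(2S_\lambda-S_\lambda^2)e\big]\\
={}&o_p(R_{1n}(\lambda)),
\end{aligned}
$$
by Lemma 1(a), (47) and Lemma 3(c), (f) and (i). Using the same argument to derive (47), we have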
By (39), Lemma l(b), Lemma 3(f) and (g) and (51), we have n 1ZTBTA (I-BlA)e = [n-lZT(I_SA )2X]Ail(A)
x [Al(A) - n-lXT(I - SA)2X]Aj1 (A)[n-lXT(I S)2e]
= O(Rln(A)).
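Hence,
$$
n^{-1}Z^T(I-B_{1\lambda})^T(I-B_{1\lambda})e=o_p(R_{1n}(\lambda)). \tag{52}
$$
Then by Lemma 3(c) and (i), we have
$$
\begin{aligned}
n^{-1}Z^T(I-B_{1\lambda})^TS_\lambda e={}&n^{-1}Z^TS_\lambda e-\big[n^{-1}Z^T(I-S_\lambda)^2X\big]A_1^{-1}(\lambda)\big[n^{-1}X^T(I-S_\lambda)S_\lambda e\big]\\
={}&o_p(R_{1n}(\lambda)).
\end{aligned}
\tag{53}
$$
Thus by (52), (53) and Lemma 3(i) we have
$$
n^{-1}Z^T(I-A_{1\lambda})^T(I-A_{1\lambda})e=o_p(R_{1n}(\lambda)). \tag{54}
$$
Part (c) holds by (50) and (54). □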
Acknowledgments. We would like to thank two referees, an Associate Editor and a former Editor (Arthur Cohen) for the careful review. The referees' constructive comments considerably improved the paper and are greatly appreciated. We would also like to thank Professor Paul Speckman for his comments on our earlier work, which led to the development of the two-stage spline smoothing method.
REFERENCES
CHEN, H. and SHIAU, J. H. (1991). A two-stage spline smoothing method for partially linear models. J. Statist. Plann. Inference 27 187-201.
CRAVEN, P. and WAHBA, G. (1979). Smoothing noisy data with spline functions. Numer. Math. 31 377-403.
DEMMLER, A. and REINSCH, C. (1975). Oscillation matrices with spline smoothing. Numer. Math. 24 375-382.
DENBY, L. (1986). Smooth regression function. Statistical Research Report 26, AT&T Bell Laboratories.
ENGLE, R. F., GRANGER, C. W., RICE, J. and WEISS, A. (1986). Semiparametric estimates of the relation between weather and electricity sales. J. Amer. Statist. Assoc. 81 310-320.
EUBANK, R. L. (1988). Spline Smoothing and Nonparametric Regression. Dekker, New York.
EUBANK, R. L. and SPECKMAN, P. L. (1991). A bias reduction theorem with applications in nonparametric regression. Scand. J. Statist. 18 211-222.
HECKMAN, N. (1986). Spline smoothing in partly linear models. J. Roy. Statist. Soc. Ser. B 48 244-248.
LI, K.-C. (1986). Asymptotic optimality of CL and generalized cross-validation in ridge regression with application to spline smoothing. Ann. Statist. 14 1101-1112.
MALLOWS, C. L. (1973). Some comments on Cp. Technometrics 15 661-675.
SERFLING, R. J. (1970). Moment inequalities for the maximum cumulative sum. Ann. Math. Statist. 41 1227-1234.
SHIAU, J., WAHBA, G. and JOHNSON, D. R. (1986). Partial spline models for the inclusion of tropopause and frontal boundary information in otherwise smooth two and three dimensional objective analysis. Journal of Atmospheric and Oceanic Technology 3 713-725.
SPECKMAN, P. (1981). The asymptotic integrated mean square error for smoothing noisy data by splines. Unpublished manuscript.
SPECKMAN, P. (1985). Spline smoothing and optimal rates of convergence in nonparametric regression models. Ann. Statist. 13 970-983.
SPECKMAN, P. (1988). Kernel smoothing in partial linear models. J. Roy. Statist. Soc. Ser. B 50 413-436.
WAHBA, G. (1977). Practical approximate solutions to linear operator equations when the data are noisy. SIAM J. Numer. Anal. 14 661-667.
WAHBA, G. (1984). Partial spline models for the semiparametric estimation of functions of several variables. In Statistical Analysis of Time Series, Proceeding of the Japan U.S. Joint Seminar, Tokyo 319-329. Inst. Statist. Math., Tokyo.
WAHBA, G. (1986). Partial and interaction splines for the semiparametric estimation of functions of several variables. In Computer Science and Statistics: Proceedings of the 18th Symposium on the Interface (T. J. Boardman, ed.) 75-80. Amer. Statist. Assoc., Alexandria, VA.
DEPARTMENT OF APPLIED MATHEMATICS
AND STATISTICS
STATE UNIVERSITY OF NEW YORK
STONY BROOK, NEW YORK 11794-3600
INSTITUTE OF STATISTICS
NATIONAL CHIAO-TUNG UNIVERSITY