arXiv:1501.00538v3 [stat.ME] 13 Sep 2015
Supplement to “Efficient estimation in semivarying coefficient models for longitudinal/clustered data”
by Ming-Yen Cheng, Toshio Honda, and Jialiang Li

S.1. Additional simulation results.
S.1.1. Nonparametric component estimates. In Step 7 of our estimation procedure we give both local linear and spline approaches to estimating the nonparametric component after the efficient estimator $\hat\beta_{\hat\Sigma}$ is obtained. In this section we examine their finite sample performance via simulations. For comparison, we also computed the respective initial estimates, that is, the versions using $\hat\beta_I$ instead of $\hat\beta_{\hat\Sigma}$. We considered the same settings as in Section 4, and we used cross-validation to choose the bandwidth used in the local linear estimation. We computed the mean integrated squared error (MISE) for all the function estimates and took their average. The results are given in Table S.1.
The figures in Table S.1 indicate that it is clearly advantageous to update the nonparametric component after efficient estimation of the parametric component. In addition, we observe that the refined local linear and spline estimators perform roughly the same in terms of MISE.
Table S.1
MISE for simulation studies.

                   Local linear estimate     Spline estimate
                   Initial     Refined       Initial     Refined
n=100   ρ=.4       .0449       .0354         .0492       .0376
        ρ=.8       .0691       .0597         .0639       .0593
n=200   ρ=.4       .0390       .0315         .0415       .0355
        ρ=.8       .0595       .0589         .0584       .0576
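The MISE entries of Table S.1 average an integrated squared error over simulation replications. A minimal sketch of that computation is given below, assuming each function estimate has been evaluated on a common grid; the function names and the trapezoidal rule are illustrative assumptions, not the authors' code.

```python
import numpy as np

def integrated_squared_error(estimate, truth, grid):
    """Trapezoidal approximation of the integral of (estimate - truth)^2 over grid."""
    sq_err = (np.asarray(estimate) - np.asarray(truth)) ** 2
    widths = np.diff(grid)
    return float(np.sum(0.5 * (sq_err[1:] + sq_err[:-1]) * widths))

def mise(estimates, truth, grid):
    """Average the integrated squared error over replications (one row per replication)."""
    return float(np.mean([integrated_squared_error(e, truth, grid) for e in estimates]))
```

For several coefficient functions, one would call `mise` once per function and average the results, as described in the text.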
S.1.2. Parametric component estimates. We note that we adjusted the covariance function estimate $\hat\sigma(s,t)$ by setting all negative eigenvalues to zero. We also considered a strictly positive threshold $\lambda_L = 0.05$ and set all eigenvalues lower than $\lambda_L$ to zero. The estimator using this covariance estimate is denoted by “Positive” in Table S.2. Because the “Positive” estimator sets all eigenvalues below a positive cut-off to zero, while the efficient estimator adjusts only the negative eigenvalues, it is slightly more biased than the efficient estimator. In all the considered cases, the crude and positive estimators are still more efficient than the working independence estimator.
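The eigenvalue adjustment described above can be sketched as follows, assuming the covariance estimate has been evaluated on a grid so that it forms a symmetric matrix. The helper name `truncate_eigenvalues` is hypothetical; `threshold=0.0` mimics the adjustment of negative eigenvalues and `threshold=0.05` the “Positive” variant.

```python
import numpy as np

def truncate_eigenvalues(cov, threshold=0.0):
    """Set eigenvalues of a symmetric matrix that are below `threshold` to zero."""
    sym = (cov + cov.T) / 2.0                 # enforce symmetry before decomposing
    eigvals, eigvecs = np.linalg.eigh(sym)
    kept = np.where(eigvals < threshold, 0.0, eigvals)
    # Rebuild the matrix from the retained eigenvalues and the eigenvectors
    return (eigvecs * kept) @ eigvecs.T
```

A larger threshold discards more of the spectrum, which is in line with the slightly larger bias reported for the “Positive” estimator.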
Recall that in all the numerical analyses reported in the paper, $h_1$ and $h_2$ were selected via the commonly used leave-one-subject-out cross-validation, and the bandwidth $h_3$ used in the estimation of the covariance structure was selected as $h_3 = 2h_1$. To examine the effects of the bandwidth choice, we considered various choices of $h_3$ in the numerical studies and obtained quite similar results. Under the column “Different h3”, we report the results for another case, $h_3 = 1.5h_1$, which are similar to those obtained when $h_3 = 2h_1$.
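Leave-one-subject-out cross-validation can be illustrated in a simplified scalar setting. The sketch below assumes a plain local linear smoother of `y` on `t` with an Epanechnikov kernel, which is simpler than the semivarying coefficient fit used in the paper; all function names are hypothetical.

```python
import numpy as np

def epanechnikov(u):
    return 0.75 * np.clip(1.0 - u**2, 0.0, None)

def local_linear(t0, t, y, h):
    """Local linear estimate of E[y | t = t0] with bandwidth h."""
    u = (t - t0) / h
    w = epanechnikov(u)
    design = np.column_stack([np.ones_like(u), u])
    gram = design.T @ (design * w[:, None])
    rhs = design.T @ (w * y)
    # lstsq tolerates a singular Gram matrix when few points fall in the window
    coef = np.linalg.lstsq(gram, rhs, rcond=None)[0]
    return coef[0]

def loso_cv_score(t, y, subject, h):
    """Mean squared prediction error when each subject is held out in turn."""
    total = 0.0
    for s in np.unique(subject):
        held = subject == s
        for t0, y0 in zip(t[held], y[held]):
            total += (y0 - local_linear(t0, t[~held], y[~held], h)) ** 2
    return total / len(y)

def select_bandwidth(t, y, subject, grid):
    """Return the bandwidth in `grid` with the smallest leave-one-subject-out score."""
    scores = [loso_cv_score(t, y, subject, h) for h in grid]
    return grid[int(np.argmin(scores))]
```

Given a selected `h1`, the rule in the text would then set `h3 = 2 * h1` (or `1.5 * h1` for the “Different h3” column).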
Our procedure does not require any iteration. In practice it may be interesting to refine the estimation of the coefficients and covariances by iterating and to take the final estimate upon convergence. We report the numerical results under the “Iterative” column. The bias and SE are very close to those obtained without iteration.
Table S.2
Estimation results of 200 simulations. “Positive” means we set a positive threshold for the covariance eigenvalues; “Different h3” means using a different choice of h3 in our efficient estimation; “Iterative” indicates an iterative estimation approach.

                  Positive          Different h3      Iterative
n     ρ           bias     SE       bias     SE       bias     SE
100   0.4   β1    .0173    .0411    -.0152   .0375    -.0146   .0361
            β2    .0176    .0423    -.0098   .0375    -.0095   .0352
            β3    .0205    .0425    -.0122   .0369    -.0099   .0360
            β4    -.0096   .0425    .0098    .0373    -.0086   .0362
200   0.4   β1    -.0113   .0329    .0056    .0274    .0045    .0228
            β2    -.0164   .0334    -.0099   .0274    -.0066   .0219
            β3    .0120    .0323    .0072    .0273    .0034    .0259
            β4    -.0095   .0329    -.0043   .0276    -.0035   .0274
100   0.8   β1    .0202    .0366    .0082    .0336    .0065    .0325
            β2    .0163    .0378    -.0075   .0335    -.0034   .0323
            β3    .0197    .0372    .0166    .0337    .0121    .0328
            β4    -.0168   .0354    -.0182   .0338    .0157    .0325
200   0.8   β1    -.0044   .0214    -.0124   .0202    .0056    .0199
            β2    .0036    .0215    .0138    .0200    -.0049   .0199
            β3    .0042    .0215    .0165    .0204    .0052    .0178
            β4    -.0038   .0214    -.0148   .0200    -.0050   .0179
S.2. Proofs of Propositions 1-3 and Lemma 1. In this section, we outline the proofs of Propositions 1-3 and present the proof of Lemma 1. When $m_i$ is uniformly bounded, we have the same results for general link functions by closely following the arguments of [3]. We outline the results at the end of this supplement. Note that the sub-Gaussian error assumption is necessary in that case. We outline the proofs of Propositions 1-3 since we allow some of the $m_i$'s to diverge as in Assumptions A1 and A2.
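Proof of Proposition 1. First we consider the properties of $\Gamma_V$. The $(k,l)$th element of $n^{-1}H_{11\cdot 2}$ is given by
$$\langle X_k - Z^T\hat\varphi_{Vk},\; X_l - Z^T\hat\varphi_{Vl}\rangle_n^V.$$
From Lemma 1 (v)-(vii), we have
$$\langle X_k - Z^T\hat\varphi_{Vk},\; X_l - Z^T\hat\varphi_{Vl}\rangle_n^V = \langle X_k - Z^T\varphi^*_{Vk},\; X_l - Z^T\varphi^*_{Vl}\rangle_n^V + o_p(1) = \langle X_k - Z^T\varphi^*_{Vk},\; X_l - Z^T\varphi^*_{Vl}\rangle^V + o_p(1).$$
This and (2.5) imply that for some positive constants $C_1$ and $C_2$, we have
$$C_1 \le \lambda_{\min}(n^{-1}H_{11\cdot 2}) \le \lambda_{\max}(n^{-1}H_{11\cdot 2}) \le C_2 \quad\text{and hence}\quad \frac{1}{nC_2} \le \lambda_{\min}(H^{11}) \le \lambda_{\max}(H^{11}) \le \frac{1}{nC_1} \tag{S.1}$$
with probability tending to 1. Note that
$$\mathrm{Var}(\hat\beta_V \mid \{X_{ij}\}, \{Z_{ij}\}, \{T_{ij}\}) = \Gamma_V$$
and Theorem 1 of [13] implies that $\Gamma_V - H^{11}$ is nonnegative definite when $H^{11}$ is defined with $V_i = \Sigma_i$. Hence for some positive constant $C_3$, we have
$$\lambda_{\min}(\Gamma_V) \ge \frac{C_3}{n}$$
with probability tending to 1.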
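The step from the eigenvalue bounds on $n^{-1}H_{11\cdot 2}$ to (S.1) can be spelled out as follows. This is a short worked step under the assumption that $H^{11} = (H_{11\cdot 2})^{-1}$, in parallel with the definition of the tilde versions in Section S.5.

```latex
\lambda_{\min}(H^{11})
  = \frac{1}{\lambda_{\max}(H_{11\cdot 2})}
  = \frac{1}{n\,\lambda_{\max}(n^{-1}H_{11\cdot 2})}
  \ge \frac{1}{nC_2},
\qquad
\lambda_{\max}(H^{11})
  = \frac{1}{\lambda_{\min}(H_{11\cdot 2})}
  = \frac{1}{n\,\lambda_{\min}(n^{-1}H_{11\cdot 2})}
  \le \frac{1}{nC_1}.
```

Both relations use only that the eigenvalues of the inverse of a positive definite matrix are the reciprocals of its eigenvalues.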
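Now we prove the asymptotic normality of
$$\hat\beta_V - E\{\hat\beta_V \mid \{X_{ij}\}, \{Z_{ij}\}, \{T_{ij}\}\} = H^{11}\Big(\sum_{i=1}^n X_i^T V_i^{-1}\epsilon_i - H_{12}H_{22}^{-1}\sum_{i=1}^n W_i^T V_i^{-1}\epsilon_i\Big).$$
As in the proof of Theorem 2 of [13], we take $c \in R^p$ such that $|c| = 1$ and write
$$c^T(\hat\beta_V - E\{\hat\beta_V \mid \{X_{ij}\}, \{Z_{ij}\}, \{T_{ij}\}\}) = \sum_{i=1}^n a_i\eta_i \quad\text{(say)},$$
where
$$a_i^2 = c^T H^{11}(X_i - W_iH_{22}^{-1}H_{21})^T V_i^{-1}\Sigma_i V_i^{-1}(X_i - W_iH_{22}^{-1}H_{21})H^{11}c$$
and $\{\eta_i\}$ is a sequence of conditionally independent random variables with
$$E\{\eta_i \mid \{X_{ij}\}, \{Z_{ij}\}, \{T_{ij}\}\} = 0 \quad\text{and}\quad \mathrm{Var}(\eta_i \mid \{X_{ij}\}, \{Z_{ij}\}, \{T_{ij}\}) = 1.$$
We have from (S.1) and Lemma 1 (vii) that
$$\max_{1\le i\le n} a_i^2 = O_p\Big(\frac{m_{\max}^2}{n^2}\Big)\sum_{k=1}^p \|X_k - Z^T\hat\varphi_{Vk}\|_\infty^2 = O_p\Big(\frac{m_{\max}^2}{n^2}\Big).$$
On the other hand, we have for some positive constant $C_4$,
$$\sum_{i=1}^n a_i^2 = c^T\Gamma_V c \ge \frac{C_4}{n}$$
with probability tending to 1. Hence we have established
$$\frac{\max_{1\le i\le n} a_i^2}{\sum_{i=1}^n a_i^2} = O_p(n^{-1}m_{\max}^2) = o_p(1)$$
and it follows from the standard argument that
$$\Big(\sum_{i=1}^n a_i^2\Big)^{-1/2}\sum_{i=1}^n a_i\eta_i \stackrel{d}{\to} N(0,1). \tag{S.2}$$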
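Finally we evaluate the conditional bias:
$$\mathrm{Bias}_\beta = E\{\hat\beta_V \mid \{X_{ij}\}, \{Z_{ij}\}, \{T_{ij}\}\} - \beta_0.$$
Take $\tilde g \in G_B$ such that $\|g_0 - \tilde g\|_{G,\infty} = O(K_n^{-2})$ and set
$$\boldsymbol\delta_0 = g_0 - \tilde g \quad\text{and}\quad \delta_0 = Z^T\boldsymbol\delta_0.$$
Note that
$$\|\delta_0\|_\infty = O(K_n^{-2}) \quad\text{and}\quad \|\delta_0\|^V = O(K_n^{-2}).$$
We also take $\tilde\varphi_{Vk} \in G_B$ such that $\|\varphi^*_{Vk} - \tilde\varphi_{Vk}\|_{G,\infty} = O(K_n^{-2})$. Then we have the following expression for the conditional bias:
$$\mathrm{Bias}_\beta = nH^{11}(S_1, \ldots, S_p)^T,$$
where
$$\begin{aligned}
S_k &= \langle X_k,\; \delta_0 - Z^T\hat\Pi_{Vn}\delta_0\rangle_n^V = \langle X_k - Z^T\tilde\varphi_{Vk},\; \delta_0 - Z^T\hat\Pi_{Vn}\delta_0\rangle_n^V \\
&= \langle X_k - Z^T\varphi^*_{Vk},\; \delta_0 - Z^T\Pi_{Vn}\delta_0\rangle_n^V + \langle X_k - Z^T\varphi^*_{Vk},\; Z^T\Pi_{Vn}\delta_0 - Z^T\hat\Pi_{Vn}\delta_0\rangle_n^V \\
&\quad + \langle Z^T\varphi^*_{Vk} - Z^T\tilde\varphi_{Vk},\; \delta_0 - Z^T\hat\Pi_{Vn}\delta_0\rangle_n^V \\
&= S_{1k} + S_{2k} + S_{3k} \quad\text{(say)}.
\end{aligned}$$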
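Note that
$$E\{S_{1k}\} = 0 \quad\text{and}\quad E\{S_{1k}^2\} = O\Big(\frac{(\|X_k - Z^T\varphi^*_{Vk}\|^V)^2}{K_n^3\, n}\Big)$$
since $S_{1k}$ is a sum of independent random variables, $\varphi^*_{Vk} = \Pi_V X_k$, $\delta_0 = Z^T\boldsymbol\delta_0$, and
$$\|\delta_0 - Z^T\Pi_{Vn}\delta_0\|_\infty \le \|\delta_0\|_\infty + CK_n^{1/2}\|Z^T\Pi_{Vn}\delta_0\|^V \le \|\delta_0\|_\infty + CK_n^{1/2}\|\delta_0\|^V = O(K_n^{-3/2}).$$
Hence we have
$$S_{1k} = O_p(1/(nK_n^3)^{1/2}) = o_p(n^{-1/2}).$$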
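Now we deal with $S_{2k}$. From Lemma 1 (vi) and the fact that $\|\delta_0 - Z^T\Pi_{Vn}\delta_0\|_\infty = O(K_n^{-3/2})$, we have
$$\|Z^T\Pi_{Vn}\delta_0 - Z^T\hat\Pi_{Vn}\delta_0\|_n^V = \sup_{g\in G_B}\frac{|\langle\delta_0 - Z^T\Pi_{Vn}\delta_0,\, Z^Tg\rangle_n^V - \langle\delta_0 - Z^T\Pi_{Vn}\delta_0,\, Z^Tg\rangle^V|}{\|Z^Tg\|_n^V} = O_p\Big(K_n^{-3/2}\sqrt{\frac{K_n}{n}}\Big) = O_p(K_n^{-1}n^{-1/2}).$$
Thus we have $|S_{2k}| = o_p(n^{-1/2})$. We also have
$$|S_{3k}| \le \|\delta_0\|_n^V\,\|Z^T(\varphi^*_{Vk} - \tilde\varphi_{Vk})\|_n^V = O_p(K_n^{-4}) = o_p(n^{-1/2})$$
since $\|\delta_0 - Z^T\hat\Pi_{Vn}\delta_0\|_n^V \le \|\delta_0\|_n^V$. Hence we have
$$\mathrm{Bias}_\beta = o_p(n^{-1/2}).$$
The desired result follows from (S.2) and the above equality.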
As for Proposition 2, there is almost no change in calculation of the score functions in [13] and [4] and we omit the outline. This is because mi is
bounded for any fixed n.
Proof of Proposition 3. When $V_i = \Sigma_i$, Lemma 1 (vii) implies that
$$\frac{1}{n}\Gamma_V^{-1} = \frac{1}{n}H_{11\cdot 2} = \frac{1}{n}E\{l^*_\beta(l^*_\beta)^T\} + o_p(1) = \Omega_\Sigma + o_p(1).$$
The desired result follows from the above result and Proposition 1.

Proof of Lemma 1. The proof consists of seven parts.
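(i) Recall that
$$(\|Z^Tg\|^V)^2 = \frac{1}{n}E\Big\{\sum_{i=1}^n (Z^Tg)_i^T V_i^{-1}(Z^Tg)_i\Big\}.$$
We have from Assumptions A4 and A5 that
$$\frac{C_1}{n}E\Big\{\sum_{i=1}^n\frac{1}{m_i}\sum_{j=1}^{m_i} g^T(T_{ij})Z_{ij}Z_{ij}^Tg(T_{ij})\Big\} \le (\|Z^Tg\|^V)^2 \le \frac{C_2}{n}E\Big\{\sum_{i=1}^n\sum_{j=1}^{m_i} g^T(T_{ij})Z_{ij}Z_{ij}^Tg(T_{ij})\Big\} \tag{S.3}$$
for some positive constants $C_1$ and $C_2$. Assumptions A2 and A3 imply that for some positive constants $C_3$ and $C_4$,
$$C_3\sum_{l=1}^q\int g_l^2(t)\,dt \le \frac{1}{n}E\Big\{\sum_{i=1}^n\frac{1}{m_i}\sum_{j=1}^{m_i} g^T(T_{ij})Z_{ij}Z_{ij}^Tg(T_{ij})\Big\} \le \frac{1}{n}E\Big\{\sum_{i=1}^n\sum_{j=1}^{m_i} g^T(T_{ij})Z_{ij}Z_{ij}^Tg(T_{ij})\Big\} \le C_4\sum_{l=1}^q\int g_l^2(t)\,dt. \tag{S.4}$$
The desired result follows from (S.3) and (S.4).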
(ii) This is a well-known result in the literature of spline regression. See for example A.2 of [12].
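(iii) The result in (ii) implies
$$\|X^T\beta + Z^Tg\|_\infty^2 \le CK_n(|\beta|^2 + \|g\|_{G,2}^2)$$
for some positive constant $C$. Recall that $p$ and $q$ are fixed in this paper. On the other hand, we have from Assumptions A1-3 and A5 that for some positive constants $C_1$, $C_2$, and $C_3$,
$$\begin{aligned}
(\|X^T\beta + Z^Tg\|^V)^2 &\ge \frac{C_1}{n}E\Big\{\sum_{i=1}^n\frac{1}{m_i}\sum_{j=1}^{m_i}(\beta^T\; g^T(T_{ij}))\begin{pmatrix} X_{ij}X_{ij}^T & X_{ij}Z_{ij}^T \\ Z_{ij}X_{ij}^T & Z_{ij}Z_{ij}^T\end{pmatrix}\begin{pmatrix}\beta \\ g(T_{ij})\end{pmatrix}\Big\} \\
&\ge \frac{C_2}{n}E\Big\{\sum_{i=1}^n\frac{1}{m_i}\sum_{j=1}^{m_i}(\beta^T\; g^T(T_{ij}))\begin{pmatrix}\beta \\ g(T_{ij})\end{pmatrix}\Big\} \ge C_3(|\beta|^2 + \|g\|_{G,2}^2).
\end{aligned}$$
Besides, we have for some positive constants $C_1$ and $C_2$,
$$(\|v\|^V)^2 \le \frac{C_1}{n}\sum_{i=1}^n\sum_{j=1}^{m_i}|v_{ij}|^2 \le C_2\|v\|_\infty^2.$$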
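Hence the desired results are established.
(iv) For $g_1 \in G_B$ and $g_2 \in G_B$, we have
$$\langle Z^Tg_1, Z^Tg_2\rangle_n^V = \gamma_1^T\Big\{\frac{1}{n}\sum_{i=1}^n W_i^TV_i^{-1}W_i\Big\}\gamma_2 = \gamma_1^T\Delta_{Vn}\gamma_2 \quad\text{(say)},$$
where $\Delta_{Vn}$ is a $qK_n \times qK_n$ matrix and $\gamma_1$ and $\gamma_2$ correspond to $g_1$ and $g_2$, respectively. Elements of $n^{-1}\sum_{i=1}^n W_i^TV_i^{-1}W_i$ are written as
$$\frac{1}{n}\sum_{i=1}^n\sum_{j_1,j_2} v_i^{j_1j_2}B_{k_1}(T_{ij_1})B_{k_2}(T_{ij_2})Z_{ij_1l_1}Z_{ij_2l_2} = \Delta_{Vn}^{(k_1,l_1,k_2,l_2)} \quad\text{(say)}, \tag{S.5}$$
where $v_i^{j_1j_2}$ is defined in (??), $1 \le k_1, k_2 \le K_n$, and $1 \le l_1, l_2 \le q$. By evaluating the variance of (S.5) and using the Bernstein inequality for independent bounded random variables, and Assumptions A1 and A2, we have uniformly in $k_1$, $k_2$, $l_1$, and $l_2$,
$$\Delta_{Vn}^{(k_1,l_1,k_2,l_2)} - E(\Delta_{Vn}^{(k_1,l_1,k_2,l_2)}) = O_p\Big(\sqrt{\frac{\log n}{nK_n^2}}\Big) \quad\text{if } B_{k_1}(t)B_{k_2}(t) \equiv 0 \tag{S.6}$$
and
$$\Delta_{Vn}^{(k_1,l_1,k_2,l_2)} - E(\Delta_{Vn}^{(k_1,l_1,k_2,l_2)}) = O_p\Big(\sqrt{\frac{\log n}{nK_n}}\Big) \quad\text{if } B_{k_1}(t)B_{k_2}(t) \not\equiv 0. \tag{S.7}$$
By exploiting (S.6), (S.7), and the local property of the B-spline basis, we obtain
$$\max\{|\lambda_{\min}(\Delta_{Vn} - E(\Delta_{Vn}))|,\; |\lambda_{\max}(\Delta_{Vn} - E(\Delta_{Vn}))|\} = O_p\Big(\sqrt{\frac{\log n}{n}}\Big). \tag{S.8}$$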
We also have
$$\frac{C_1}{K_n} \le \lambda_{\min}(E(\Delta_{Vn})) \le \lambda_{\max}(E(\Delta_{Vn})) \le \frac{C_2}{K_n} \tag{S.9}$$
since Assumptions A2 and A3 yield
$$\frac{C_3}{n}\sum_{i=1}^n\frac{1}{m_i}\sum_{j=1}^{m_i}(Z_{ij}\otimes B(T_{ij}))^T(Z_{ij}\otimes B(T_{ij})) \le \Delta_{Vn} \le \frac{C_4}{n}\sum_{i=1}^n\sum_{j=1}^{m_i}(Z_{ij}\otimes B(T_{ij}))^T(Z_{ij}\otimes B(T_{ij}))$$
for some positive constants $C_3$ and $C_4$. See the proof of Lemma A.3 of [12]. Hence the desired result follows from (S.8) and (S.9).
(v) This follows from (iv) and (vi).
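(vi) Using Assumptions A1 and A2 we have
$$\langle\delta_n, Z_lB_k\rangle_n^V = \frac{1}{n}\sum_{i=1}^n\sum_{j_1,j_2}\delta_{n,ij_1}v_i^{j_1j_2}Z_{ij_2l}B_k(T_{ij_2})$$
and
$$\mathrm{Var}(\langle\delta_n, Z_lB_k\rangle_n^V) \le \frac{C_1\|\delta_n\|_\infty^2}{n^2}\sum_{i=1}^n m_i^2\sum_{j_1,j_2}E\{B_k^2(T_{ij_1})B_k^2(T_{ij_2})\} \le \frac{C_2\|\delta_n\|_\infty^2}{nK_n}$$
for some positive constants $C_1$ and $C_2$. Hence we have
$$\sum_{l=1}^q\sum_{k=1}^{K_n}\mathrm{Var}(\langle\delta_n, Z_lB_k\rangle_n^V) \le \frac{C}{n}\|\delta_n\|_\infty^2$$
for some positive constant $C$ and the desired result follows from (S.9).
(vii) Take $\tilde\varphi_{Vk} \in G_B$ such that $\|\tilde\varphi_{Vk} - \varphi^*_{Vk}\|_{G,\infty} = O(K_n^{-2})$. Then we have for some positive $C$,
$$\begin{aligned}
\|Z^T(\varphi_{Vk} - \varphi^*_{Vk})\|_\infty &\le \|Z^T(\varphi_{Vk} - \tilde\varphi_{Vk})\|_\infty + \|Z^T(\tilde\varphi_{Vk} - \varphi^*_{Vk})\|_\infty \\
&\le C\sqrt{K_n}\,\|Z^T(\varphi_{Vk} - \tilde\varphi_{Vk})\|^V + \|Z^T(\tilde\varphi_{Vk} - \varphi^*_{Vk})\|_\infty \\
&\le C\sqrt{K_n}\,\|Z^T(\varphi^*_{Vk} - \tilde\varphi_{Vk})\|^V + \|Z^T(\tilde\varphi_{Vk} - \varphi^*_{Vk})\|_\infty = O(K_n^{-3/2}).
\end{aligned} \tag{S.10}$$
Here we used the fact that $\varphi_{Vk} = \Pi_{Vn}X_k \in G_B$ and $\varphi^*_{Vk} = \Pi_VX_k$. Inequality (S.10) implies $\|Z^T\varphi_{Vk}\|_\infty = O(1)$ and we have only to evaluate $Z^T(\varphi_{Vk} - \hat\varphi_{Vk})$. We should just follow the arguments on p.16 of [3] by replacing $\varphi^*_{k,n}$ and $\hat\varphi_{k,n}$ with $Z^T\varphi_{Vk}$ and $Z^T\hat\varphi_{Vk}$ since the arguments employ (iv) and (vi) and do not depend on $m_i$. Then we have
$$\|Z^T(\varphi_{Vk} - \hat\varphi_{Vk})\|_\infty = o_p(1), \quad \|Z^T(\varphi_{Vk} - \hat\varphi_{Vk})\|_n^V = O_p(\sqrt{K_n/n}), \quad\text{and}\quad \|Z^T(\varphi_{Vk} - \hat\varphi_{Vk})\|^V = O_p(\sqrt{K_n/n}).$$
The desired results follow from the above equations and (S.10).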
S.3. Proof of Proposition 4. In the proof, we repeatedly use arguments based on exponential inequalities, truncation, and division of regions into small rectangles to prove uniform convergence results as in [S3]. We do not give the details of these arguments since they are standard in nonparametric kernel methods. Since we impose Assumption A2 and we do not use $\Sigma_i$ or $V_i$ in the construction of $\hat g(t)$, $\hat\sigma^2(t)$, and $\hat\sigma(s,t)$, we see the effects of diverging $m_i$ explicitly only when applying the exponential inequality for generalized U-statistics. Recall that we assume three times continuous differentiability of the relevant functions in this proposition.
The proof consists of four parts: (i) representation of $\hat g(t)$, (ii) representation of $\hat\epsilon_{ij}$, (iii) representation of $\hat\sigma^2(t)$, and (iv) representation of $\hat\sigma(s,t)$.
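(i) Representation of $\hat g(t)$. Applying the third order Taylor series expansion to $g_0(t)$, we have
$$Z_{ij}^Tg_0(T_{ij}) = Z_{ij}^T\Big\{g_0(t) + h_1\Big(\frac{T_{ij}-t}{h_1}\Big)g_0'(t) + \frac{h_1^2}{2}\Big(\frac{T_{ij}-t}{h_1}\Big)^2g_0''(t)\Big\} + O(h_1^3), \tag{S.11}$$
where $g_0'(t) = (g_{01}'(t), \ldots, g_{0q}'(t))^T$ and $g_0''(t) = (g_{01}''(t), \ldots, g_{0q}''(t))^T$. By plugging (S.11) into (3.2), we have uniformly in $t$,
$$\hat g(t) = g_0(t) + D_q(\hat L_1(t))^{-1}\hat L_2(t)(\beta_0 - \hat\beta_I) + \frac{h_1^2}{2}D_q(\hat L_1(t))^{-1}\hat L_3(t)g_0''(t) + D_q(\hat L_1(t))^{-1}E_0(t) + O_p(h_1^3), \tag{S.12}$$
where $\hat L_1(t) = A_{1n}(t)$ defined after (3.2),
$$\hat L_2(t) = \frac{1}{N_1h_1}\sum_{i=1}^n\sum_{j=1}^{m_i}\Big\{Z_{ij}\otimes\begin{pmatrix}1 \\ \frac{T_{ij}-t}{h_1}\end{pmatrix}\Big\}K\Big(\frac{T_{ij}-t}{h_1}\Big)X_{ij}^T,$$
$$\hat L_3(t) = \frac{1}{N_1h_1}\sum_{i=1}^n\sum_{j=1}^{m_i}(Z_{ij}Z_{ij}^T)\otimes\begin{pmatrix}\big(\frac{T_{ij}-t}{h_1}\big)^2 \\ \big(\frac{T_{ij}-t}{h_1}\big)^3\end{pmatrix}K\Big(\frac{T_{ij}-t}{h_1}\Big),$$
$$E_0(t) = \frac{1}{N_1h_1}\sum_{i=1}^n\sum_{j=1}^{m_i}\Big\{Z_{ij}\otimes\begin{pmatrix}1 \\ \frac{T_{ij}-t}{h_1}\end{pmatrix}\Big\}K\Big(\frac{T_{ij}-t}{h_1}\Big)\epsilon_{ij}.$$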
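By following standard arguments such as those in [S3], we obtain for $j = 1, 2, 3$,
$$\hat L_j(t) = L_j(t) + O_p\Big(\sqrt{\frac{\log n}{nh_1}}\Big) \quad\text{uniformly in } t, \tag{S.13}$$
where $L_j(t) = E\{\hat L_j(t)\}$, and
$$E_0(t) = O_p\Big(\sqrt{\frac{\log n}{nh_1}}\Big) \quad\text{uniformly in } t. \tag{S.14}$$
Assumption A2 implies that
$$C_1I_{2q} \le L_1(t) \le C_2I_{2q} \tag{S.15}$$
for some positive constants $C_1$ and $C_2$. From (S.12)-(S.15), we have uniformly in $t$,
$$\begin{aligned}
\hat g(t) &= g_0(t) + D_q(L_1(t))^{-1}L_2(t)(\beta_0 - \hat\beta_I) + \frac{h_1^2}{2}D_q(L_1(t))^{-1}L_3(t)g_0''(t) + D_q(L_1(t))^{-1}E_0(t) \\
&\quad + O_p(h_1^3) + O_p\Big(\frac{\log n}{nh_1}\Big) + O_p\Big(h_1^2\sqrt{\frac{\log n}{nh_1}}\Big) \\
&= g_0(t) + L_4(t)(\beta_0 - \hat\beta_I) + h_1^2L_5(t)g_0''(t) + L_6(t)E_0(t) \\
&\quad + O_p(h_1^3) + O_p\Big(\frac{\log n}{nh_1}\Big) + O_p\Big(h_1^2\sqrt{\frac{\log n}{nh_1}}\Big) \quad\text{(say)}.
\end{aligned} \tag{S.16}$$
Note that all the elements of $L_j(t)$, $j = 4, 5, 6$, are bounded functions of $t$.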
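The quantities in (S.12) have the structure of a kernel-weighted least squares fit with regressors $Z_{ij}\otimes(1, (T_{ij}-t)/h_1)^T$. A minimal sketch of such a local linear fit is given below; it assumes that the partial residuals `R` (for example $Y_{ij} - X_{ij}^T\hat\beta_I$) are computed elsewhere, uses an Epanechnikov kernel for concreteness, and the function name is hypothetical.

```python
import numpy as np

def local_linear_vc(t0, T, Z, R, h):
    """Local linear estimate of g(t0) in R_ij = Z_ij^T g(T_ij) + error.

    T: (N,) observation times, Z: (N, q) covariates,
    R: (N,) partial residuals, h: bandwidth.
    """
    u = (T - t0) / h
    w = 0.75 * np.clip(1.0 - u**2, 0.0, None)       # Epanechnikov kernel (an assumption)
    basis = np.column_stack([np.ones_like(u), u])    # (1, (T - t0)/h)
    # Row-wise Kronecker product of Z_ij with (1, u_ij): shape (N, 2q)
    design = (Z[:, :, None] * basis[:, None, :]).reshape(len(T), -1)
    gram = design.T @ (design * w[:, None])          # unnormalised analogue of L1-hat
    rhs = design.T @ (w * R)
    coef = np.linalg.solve(gram, rhs)
    return coef[0::2]                                # intercept entries, the role played by D_q
```

Selecting every second coefficient mirrors the extraction of the function values (rather than the scaled derivatives) from the $2q$-dimensional local fit.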
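(ii) Representation of $\hat\epsilon_{ij}$. We have
$$\hat\epsilon_{ij} = \epsilon_{ij} + X_{ij}^T(\beta_0 - \hat\beta_I) - Z_{ij}^T(\hat g(T_{ij}) - g_0(T_{ij})).$$
By plugging (S.16) into the above equality, we obtain uniformly in $i$ and $j$,
$$\begin{aligned}
\hat\epsilon_{ij} &= \epsilon_{ij} + (X_{ij}^T - Z_{ij}^TL_4(T_{ij}))(\beta_0 - \hat\beta_I) - h_1^2Z_{ij}^TL_5(T_{ij})g_0''(T_{ij}) - Z_{ij}^TL_6(T_{ij})E_0(T_{ij}) \\
&\quad + O_p(h_1^3) + O_p\Big(\frac{\log n}{nh_1}\Big) + O_p\Big(h_1^2\sqrt{\frac{\log n}{nh_1}}\Big) \\
&= \epsilon_{ij} + M_{ij}^{(1)}(\beta_0 - \hat\beta_I) + h_1^2M_{ij}^{(2)}g_0''(T_{ij}) + M_{ij}^{(3)}E_0(T_{ij}) \\
&\quad + O_p(h_1^3) + O_p\Big(\frac{\log n}{nh_1}\Big) + O_p\Big(h_1^2\sqrt{\frac{\log n}{nh_1}}\Big) \quad\text{(say)}.
\end{aligned} \tag{S.17}$$
Note that all the elements of $M_{ij}^{(1)}$, $M_{ij}^{(2)}$, and $M_{ij}^{(3)}$ are uniformly bounded functions of $X_{ij}$, $Z_{ij}$, and $T_{ij}$.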
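(iii) Representation of $\hat\sigma^2(t)$. We have uniformly in $i$ and $j$,
$$\begin{aligned}
(\hat\epsilon_{ij})^2 &= \epsilon_{ij}^2 - \sigma^2(T_{ij}) + \sigma^2(T_{ij}) + 2\epsilon_{ij}M_{ij}^{(3)}E_0(T_{ij}) + 2\epsilon_{ij}M_{ij}^{(1)}(\beta_0 - \hat\beta_I) + 2\epsilon_{ij}h_1^2M_{ij}^{(2)}g_0''(T_{ij}) \\
&\quad + O_p(h_1^3) + O_p\Big(\frac{\log n}{nh_1}\Big) + O_p\Big(h_1^2\sqrt{\frac{\log n}{nh_1}}\Big).
\end{aligned} \tag{S.18}$$
Recall that $M_{ij}^{(l)}$, $l = 1, 2, 3$, are defined in (S.17). It is easy to see that the contributions of $2\epsilon_{ij}M_{ij}^{(1)}(\beta_0 - \hat\beta_I)$ and $2\epsilon_{ij}h_1^2M_{ij}^{(2)}g_0''(T_{ij})$ to $\hat\sigma^2(t)$ are
$$O_p\Big(\frac{1}{\sqrt n}\sqrt{\frac{\log n}{nh_2}}\Big) \quad\text{and}\quad O_p\Big(h_1^2\sqrt{\frac{\log n}{nh_2}}\Big)$$
uniformly in $t$, respectively. Thus we have only to consider $\epsilon_{ij}^2 - \sigma^2(T_{ij})$, $\sigma^2(T_{ij})$, and $2\epsilon_{ij}M_{ij}^{(3)}E_0(T_{ij})$ in (S.18).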
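Setting $\hat L_7(t) = A_{2n}(t)$, which is defined after (3.3), we have for some positive constants $C_1$ and $C_2$,
$$\hat L_7(t) = L_7(t) + O_p\Big(\sqrt{\frac{\log n}{nh_2}}\Big) \quad\text{and}\quad C_1I_2 \le L_7(t) \le C_2I_2 \tag{S.19}$$
uniformly in $t$, where $L_7(t) = E\{\hat L_7(t)\}$. Now we have uniformly in $t$,
$$\hat\sigma^2(t) = (1\;\,0)(\hat L_7(t))^{-1}(E_1(t) + \mathrm{Bias}_1(t) + R_1(t)) + O_p(h_1^3) + O_p\Big(\frac{\log n}{nh_1}\Big) + O_p\Big(h_1^2\sqrt{\frac{\log n}{nh_1}}\Big), \tag{S.20}$$
where $E_1(t)$ is defined in Proposition 4, $\mathrm{Bias}_1(t)$ is the term of $\sigma^2(T_{ij})$, and $R_1(t)$ is the term of $2\epsilon_{ij}M_{ij}^{(3)}E_0(T_{ij})$. It is easy to see that uniformly in $t$,
$$E_1(t) = O_p\Big(\sqrt{\frac{\log n}{nh_2}}\Big). \tag{S.21}$$
By applying the Taylor series expansion, we have
$$\sigma^2(T_{ij}) = \sigma^2(t) + h_2(\sigma^2)'(t)\Big(\frac{T_{ij}-t}{h_2}\Big) + \frac{h_2^2}{2}(\sigma^2)''(t)\Big(\frac{T_{ij}-t}{h_2}\Big)^2 + O(h_2^3).$$
Therefore $\mathrm{Bias}_1(t)$ can be represented as
$$\mathrm{Bias}_1(t) = \hat L_7(t)\begin{pmatrix}\sigma^2(t) \\ h_2(\sigma^2)'(t)\end{pmatrix} + \frac{h_2^2(\sigma^2)''(t)}{2N_1h_2}\sum_{i=1}^n\sum_{j=1}^{m_i}\begin{pmatrix}\big(\frac{T_{ij}-t}{h_2}\big)^2 \\ \big(\frac{T_{ij}-t}{h_2}\big)^3\end{pmatrix}K\Big(\frac{T_{ij}-t}{h_2}\Big) + O_p(h_2^3)$$
uniformly in $t$. Setting
$$\hat L_8(t) = \frac{1}{N_1h_2}\sum_{i=1}^n\sum_{j=1}^{m_i}\begin{pmatrix}\big(\frac{T_{ij}-t}{h_2}\big)^2 \\ \big(\frac{T_{ij}-t}{h_2}\big)^3\end{pmatrix}K\Big(\frac{T_{ij}-t}{h_2}\Big),$$
we have uniformly in $t$,
$$\hat L_8(t) = L_8(t) + O_p\Big(\sqrt{\frac{\log n}{nh_2}}\Big),$$
where $L_8(t) = E\{\hat L_8(t)\}$ and $L_8(t)$ is a bounded vector function of $t$. Hence we have uniformly in $t$,
$$\mathrm{Bias}_1(t) = \hat L_7(t)\begin{pmatrix}\sigma^2(t) \\ h_2(\sigma^2)'(t)\end{pmatrix} + \frac{h_2^2(\sigma^2)''(t)}{2}L_8(t) + O_p(h_2^3) + O_p\Big(h_2^2\sqrt{\frac{\log n}{nh_2}}\Big). \tag{S.22}$$
Next we deal with $R_1(t)$, which can be written as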
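$$\frac{1}{N_1^2h_1h_2}\sum_{a,b}\sum_{i=1}^n\sum_{j=1}^{m_i}\sum_{i'=1}^n\sum_{j'=1}^{m_{i'}}\epsilon_{ij}\epsilon_{i'j'}A_{ab,ij}B_{ab,i'j'}K_a\Big(\frac{T_{ij}-t}{h_2}\Big)K_b\Big(\frac{T_{i'j'}-T_{ij}}{h_1}\Big), \tag{S.23}$$
where $A_{ab,ij}$ and $B_{ab,ij}$ are uniformly bounded functions of $X_{ij}$, $Z_{ij}$, and $T_{ij}$. We evaluate
$$\begin{aligned}
&\frac{1}{N_1^2h_1h_2}\sum_{i=1}^n\sum_{j=1}^{m_i}\sum_{i'=1}^n\sum_{j'=1}^{m_{i'}}\epsilon_{ij}\epsilon_{i'j'}A_{ab,ij}B_{ab,i'j'}K_a\Big(\frac{T_{ij}-t}{h_2}\Big)K_b\Big(\frac{T_{i'j'}-T_{ij}}{h_1}\Big) \\
&= \frac{1}{N_1^2h_1h_2}\sum_{i=1}^n\sum_{j=1}^{m_i}\epsilon_{ij}^2A_{ab,ij}B_{ab,ij}K_a\Big(\frac{T_{ij}-t}{h_2}\Big)K_b(0) \\
&\quad + \frac{1}{N_1^2h_1h_2}\sum_{i=1}^n\sum_{j\ne j'}\epsilon_{ij}\epsilon_{ij'}A_{ab,ij}B_{ab,ij'}K_a\Big(\frac{T_{ij}-t}{h_2}\Big)K_b\Big(\frac{T_{ij'}-T_{ij}}{h_1}\Big) \\
&\quad + \frac{1}{N_1^2h_1h_2}\sum_{i\ne i'}\sum_{j,j'}\epsilon_{ij}\epsilon_{i'j'}A_{ab,ij}B_{ab,i'j'}K_a\Big(\frac{T_{ij}-t}{h_2}\Big)K_b\Big(\frac{T_{i'j'}-T_{ij}}{h_1}\Big) \\
&= R_{1ab}^{(1)}(t) + R_{1ab}^{(2)}(t) + R_{1ab}^{(3)}(t) \quad\text{(say)}.
\end{aligned} \tag{S.24}$$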
Note that we cannot apply classical exponential inequalities for U-statistics since kernel functions depend on $i$ and $i'$ and observations are not identical. It is easy to see that uniformly in $t$,
$$R_{1ab}^{(1)}(t) = O_p((nh_1)^{-1}) \quad\text{and}\quad R_{1ab}^{(2)}(t) = O_p(n^{-1}). \tag{S.25}$$
We evaluate $R_{1ab}^{(3)}(t)$ by using an exponential inequality such as the one given in (3.5) of [S1] with
$$A = \frac{C_1(\log n)^km_{\max}^2}{n^2h_1h_2}, \quad B^2 = \frac{C_2(\log n)^{2k}m_{\max}^2}{n^3h_1h_2}(h_1^{-1} + h_2^{-1}), \quad C = \frac{C_3}{nh_1^{1/2}h_2^{1/2}}, \quad\text{and}\quad x = \frac{M\log n}{nh_1^{1/2}h_2^{1/2}}$$
in the inequality and standard arguments in nonparametric regression as in [S3]. Note that we used a kind of truncation technique to handle $\epsilon_{ij}$ and that we have to take sufficiently large $k$ and $M$ here. Hence we have
$$R_{1ab}^{(3)}(t) = O_p\Big(\frac{\log n}{n(h_1h_2)^{1/2}}\Big).$$
The above equation and (S.23)-(S.25) imply that
$$R_1(t) = O_p\Big(\frac{\log n}{n(h_1h_2)^{1/2}}\Big) + O_p\Big(\frac{1}{nh_1}\Big) \tag{S.26}$$
uniformly in $t$. It follows from (S.19)-(S.22) and (S.26) that
$$\hat\sigma^2(t) = \sigma^2(t) + (1\;\,0)(L_7(t))^{-1}E_1(t) + \frac{h_2^2}{2}(1\;\,0)(L_7(t))^{-1}L_8(t)(\sigma^2)''(t) + O_p(h_1^3) + O_p(h_2^3) + O_p\Big(\frac{\log n}{nh_1}\Big) + O_p\Big(\frac{\log n}{nh_2}\Big).$$
The expression of $\hat\sigma^2(t)$ in Proposition 4 also follows from the above expression.
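(iv) Representation of $\hat\sigma(s,t)$. We can proceed almost in the same way as when we deal with $\hat\sigma^2(t)$. First we have uniformly in $i$, $j$, and $j'$,
$$\begin{aligned}
\hat\epsilon_{ij}\hat\epsilon_{ij'} &= \epsilon_{ij}\epsilon_{ij'} - \sigma(T_{ij}, T_{ij'}) + \sigma(T_{ij}, T_{ij'}) + \epsilon_{ij}M_{ij'}^{(3)}E_0(T_{ij'}) + \epsilon_{ij'}M_{ij}^{(3)}E_0(T_{ij}) \\
&\quad + \epsilon_{ij}M_{ij'}^{(1)}(\beta_0 - \hat\beta_I) + \epsilon_{ij'}M_{ij}^{(1)}(\beta_0 - \hat\beta_I) && \text{(S.27)} \\
&\quad + \epsilon_{ij}h_1^2M_{ij'}^{(2)}g_0''(T_{ij'}) + \epsilon_{ij'}h_1^2M_{ij}^{(2)}g_0''(T_{ij}) && \text{(S.28)} \\
&\quad + O_p(h_1^3) + O_p\Big(\frac{\log n}{nh_1}\Big) + O_p\Big(h_1^2\sqrt{\frac{\log n}{nh_1}}\Big).
\end{aligned}$$
It is easy to see that the contributions of (S.27) and (S.28) to $\hat\sigma(s,t)$ are
$$O_p\Big(\frac{1}{\sqrt n}\sqrt{\frac{\log n}{nh_3^2}}\Big) \quad\text{and}\quad O_p\Big(h_1^2\sqrt{\frac{\log n}{nh_3^2}}\Big)$$
uniformly in $s$ and $t$, respectively. Therefore we have only to consider $\epsilon_{ij}\epsilon_{ij'} - \sigma(T_{ij}, T_{ij'})$, $\sigma(T_{ij}, T_{ij'})$, and $\epsilon_{ij}M_{ij'}^{(3)}E_0(T_{ij'}) + \epsilon_{ij'}M_{ij}^{(3)}E_0(T_{ij})$ in $\hat\epsilon_{ij}\hat\epsilon_{ij'}$.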
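Setting $\hat L_9(s,t) = A_{3n}(s,t)$, which is defined after (3.4), we have for some positive constants $C_1$ and $C_2$,
$$\hat L_9(s,t) = L_9(s,t) + O_p\Big(\sqrt{\frac{\log n}{nh_3^2}}\Big) \quad\text{and}\quad C_1I_3 \le L_9(s,t) \le C_2I_3 \tag{S.29}$$
uniformly in $s$ and $t$, where $L_9(s,t) = E\{\hat L_9(s,t)\}$. Now we have uniformly in $s$ and $t$,
$$\hat\sigma(s,t) = (1\;\,0\;\,0)(\hat L_9(s,t))^{-1}(E_2(s,t) + \mathrm{Bias}_2(s,t) + R_2(s,t)) + O_p(h_1^3) + O_p\Big(\frac{\log n}{nh_1}\Big) + O_p\Big(h_1^2\sqrt{\frac{\log n}{nh_1}}\Big), \tag{S.30}$$
where $E_2(s,t)$ is defined in Proposition 4, $\mathrm{Bias}_2(s,t)$ is the term of $\sigma(T_{ij}, T_{ij'})$, and $R_2(s,t)$ is the term of $\epsilon_{ij}M_{ij'}^{(3)}E_0(T_{ij'}) + \epsilon_{ij'}M_{ij}^{(3)}E_0(T_{ij})$. It is easy to see that uniformly in $s$ and $t$,
$$E_2(s,t) = O_p\Big(\sqrt{\frac{\log n}{nh_3^2}}\Big). \tag{S.31}$$
Setting
$$\hat L_{10}(s,t) = \frac{1}{N_2h_3^2}\sum_{i=1}^n\sum_{j\ne j'}\begin{pmatrix}1 \\ \frac{T_{ij}-s}{h_3} \\ \frac{T_{ij'}-t}{h_3}\end{pmatrix}\Big(\Big(\frac{T_{ij}-s}{h_3}\Big)^2 \;\; \frac{2(T_{ij}-s)(T_{ij'}-t)}{h_3^2} \;\; \Big(\frac{T_{ij'}-t}{h_3}\Big)^2\Big)K\Big(\frac{T_{ij}-s}{h_3}\Big)K\Big(\frac{T_{ij'}-t}{h_3}\Big),$$
we have uniformly in $s$ and $t$,
$$\hat L_{10}(s,t) = L_{10}(s,t) + O_p\Big(\sqrt{\frac{\log n}{nh_3^2}}\Big),$$
where $L_{10}(s,t) = E\{\hat L_{10}(s,t)\}$, which is a bounded matrix function of $(s,t)$. Then we have, as in the proof of the representation of $\hat\sigma^2(t)$, uniformly in $s$ and $t$,
$$\mathrm{Bias}_2(s,t) = \hat L_9(s,t)\begin{pmatrix}\sigma(s,t) \\ h_3\frac{\partial\sigma}{\partial s}(s,t) \\ h_3\frac{\partial\sigma}{\partial t}(s,t)\end{pmatrix} + \frac{h_3^2}{2}L_{10}(s,t)\begin{pmatrix}\frac{\partial^2\sigma}{\partial s^2}(s,t) \\ \frac{\partial^2\sigma}{\partial s\partial t}(s,t) \\ \frac{\partial^2\sigma}{\partial t^2}(s,t)\end{pmatrix} + O_p(h_3^3) + O_p\Big(h_3^2\sqrt{\frac{\log n}{nh_3^2}}\Big). \tag{S.32}$$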
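Finally we deal with $R_2(s,t)$ in the same way as in the proof of the representation of $\hat\sigma^2(t)$. We use the same exponential inequality for U-statistics. We should consider
$$\frac{1}{N_1N_2h_1h_3^2}\sum_{i_1=1}^n\sum_{i_2=1}^n\sum_{j_1\ne j_2}\sum_{j_3}\epsilon_{i_1j_1}\epsilon_{i_2j_3}A_{abc,i_1j_2}B_{abc,i_2j_3}K_a\Big(\frac{T_{i_2j_3}-T_{i_1j_2}}{h_1}\Big)K_b\Big(\frac{T_{i_1j_1}-t}{h_3}\Big)K_c\Big(\frac{T_{i_1j_2}-s}{h_3}\Big), \tag{S.33}$$
where $K_l(t) = t^lK(t)$, $a = 0, 1$, $b = 0, 1$, and $c = 0, 1$. Note that $A_{abc,ij}$ and $B_{abc,ij}$ are uniformly bounded functions of $X_{ij}$, $Z_{ij}$, and $T_{ij}$. This is a generalized U-statistic when we remove the summands with $i_1 = i_2$, and we recall (1.1) when we evaluate (S.33). It is easy to see that uniformly in $s$ and $t$,
$$\frac{1}{N_1N_2h_1h_3^2}\sum_{i_1=1}^n\sum_{j_1\ne j_2}\sum_{j_3}\epsilon_{i_1j_1}\epsilon_{i_1j_3}A_{abc,i_1j_2}B_{abc,i_1j_3}K_a\Big(\frac{T_{i_1j_3}-T_{i_1j_2}}{h_1}\Big)K_b\Big(\frac{T_{i_1j_1}-t}{h_3}\Big)K_c\Big(\frac{T_{i_1j_2}-s}{h_3}\Big) = O_p\Big(\frac{1}{nh_1}\Big). \tag{S.34}$$
In the same way as when dealing with $R_{1ab}^{(3)}(t)$, we obtain
$$\frac{1}{N_1N_2h_1h_3^2}\sum_{i_1\ne i_2}\sum_{j_1\ne j_2}\sum_{j_3}\epsilon_{i_1j_1}\epsilon_{i_2j_3}A_{abc,i_1j_2}B_{abc,i_2j_3}K_a\Big(\frac{T_{i_2j_3}-T_{i_1j_2}}{h_1}\Big)K_b\Big(\frac{T_{i_1j_1}-t}{h_3}\Big)K_c\Big(\frac{T_{i_1j_2}-s}{h_3}\Big) = O_p\Big(\frac{\log n}{nh_1^{1/2}h_3}\Big) \tag{S.35}$$
with $A = C_1(\log n)^km_{\max}^3/(n^2h_1h_3^2)$, $B = C_2(\log n)^km_{\max}^2/(n^{3/2}h_1^{1/2}h_3^2)$, $C = C_3/(nh_1^{1/2}h_3)$, and $x = M\log n/(nh_1^{1/2}h_3)$ in the exponential inequality. Note that we should choose sufficiently large $k$ and $M$. It follows from (S.34) and (S.35) that uniformly in $s$ and $t$,
$$R_2(s,t) = O_p\Big(\frac{\log n}{nh_1^{1/2}h_3}\Big). \tag{S.36}$$
Note that we cannot relax the assumption of $m_{\max} = O(n^{1/8})$ in Assumption A1 when we derive (S.36). It follows from (S.29)-(S.32) and (S.36) that uniformly in $s$ and $t$,
$$\begin{aligned}
\hat\sigma(s,t) - \sigma(s,t) &= (1\;\,0\;\,0)(L_9(s,t))^{-1}E_2(s,t) + \frac{h_3^2}{2}(1\;\,0\;\,0)(L_9(s,t))^{-1}L_{10}(s,t)\begin{pmatrix}\frac{\partial^2\sigma}{\partial s^2}(s,t) \\ \frac{\partial^2\sigma}{\partial s\partial t}(s,t) \\ \frac{\partial^2\sigma}{\partial t^2}(s,t)\end{pmatrix} \\
&\quad + O_p(h_1^3) + O_p(h_3^3) + O_p\Big(\frac{\log n}{nh_1}\Big) + O_p\Big(\frac{\log n}{nh_3^2}\Big).
\end{aligned}$$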
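S.4. Proofs of Lemmas 2-8. First we state some results on $\hat\Sigma_i$. Set
$$\delta_n = h_2^2 + h_3^2 + \sqrt{\frac{\log n}{nh_2}} + \sqrt{\frac{\log n}{nh_3^2}}. \tag{S.37}$$
Then we have from Proposition 4 that uniformly in $i$,
$$\max\{|\lambda_{\min}(\Sigma_i - \hat\Sigma_i)|,\; |\lambda_{\max}(\Sigma_i - \hat\Sigma_i)|\} = O_p(m_i\delta_n).$$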
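Recall that
$$\hat\Sigma_i^{-1} - \Sigma_i^{-1} = \hat\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1} = \Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1} + \hat\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}.$$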
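The expansion of $\hat\Sigma_i^{-1} - \Sigma_i^{-1}$ above is a purely algebraic identity, and it can be checked numerically. The sketch below uses arbitrary simulated matrices; the function name and the test matrices are illustrative assumptions.

```python
import numpy as np

def inverse_expansion_terms(sigma, sigma_hat):
    """Return the three expressions for sigma_hat^{-1} - sigma^{-1} in the expansion."""
    inv_s = np.linalg.inv(sigma)
    inv_h = np.linalg.inv(sigma_hat)
    diff = sigma - sigma_hat
    direct = inv_h - inv_s
    one_step = inv_h @ diff @ inv_s
    # First-order term plus second-order remainder
    two_step = inv_s @ diff @ inv_s + inv_h @ diff @ inv_s @ diff @ inv_s
    return direct, one_step, two_step

rng = np.random.default_rng(0)
a = rng.normal(size=(5, 5))
sigma = a @ a.T + 5.0 * np.eye(5)          # positive definite "true" covariance
e = rng.normal(size=(5, 5))
sigma_hat = sigma + 0.05 * (e + e.T)       # small symmetric perturbation
direct, one_step, two_step = inverse_expansion_terms(sigma, sigma_hat)
```

The first-order term is linear in $\Sigma_i - \hat\Sigma_i$, while the remainder is quadratic, which is why the two are bounded separately in (S.38) and (S.39).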
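We have from Assumption A4 and Proposition 4 that uniformly in $i$,
$$|\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}|_{\max} = O_p(m_i\delta_n), \tag{S.38}$$
$$|\hat\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}|_{\max} = O_p(m_i^2\delta_n^2), \tag{S.39}$$
where $|A|_{\max} = \max_{i,j}|a_{ij}|$ for any matrix $A = (a_{ij})$. Besides, it follows from Assumption A4 that we have uniformly in $i$,
$$\max\{|\lambda_{\min}(\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1})|,\; |\lambda_{\max}(\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1})|\} = O_p(m_i\delta_n). \tag{S.40}$$
We also have the same result for $\hat\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}$ as in (S.40) with $m_i\delta_n$ replaced by $(m_i\delta_n)^2$. Proposition 4 also implies that each element of $\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}$ has the form of
$$D_i^{(1)}(T_i)h_2^2 + D_i^{(2)}(T_i)h_3^2 + \sum_{j=1}^{m_i}D_{ij}^{(3)}(T_i)E_1(T_{ij}) + \sum_{j\ne j'}D_{ijj'}^{(4)}(T_i)E_2(T_{ij}, T_{ij'}) + D_i^{(5)}, \tag{S.41}$$
where
$$D_i^{(5)} = m_iO_p\Big(h_1^3 + h_2^3 + h_3^3 + \frac{\log n}{nh_1} + \frac{\log n}{nh_2} + \frac{\log n}{nh_3^2}\Big)$$
uniformly in $i$.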
We state the following two useful facts before we start proving Lemmas 2-8; both hold uniformly in $l$:
$$\frac{1}{n}\sum_{i=1}^n m_i^3\sum_{j=1}^{m_i}|W_{ijl}| = O_p(K_n^{-1}) \tag{S.42}$$
and
$$\frac{1}{n}\sum_{i=1}^n m_i^2\sum_{j_1=1}^{m_i}\sum_{j_2=1}^{m_i}|W_{ij_1l}||\epsilon_{ij_2}| = O_p(K_n^{-1}), \tag{S.43}$$
where $W_{ijl}$ denotes the $l$th element of $W_{ij}$. We can prove them in the same way, except that we need a kind of truncation argument when showing (S.43), and we outline the proof of (S.42) in the following. To prove (S.42), we evaluate the expectation and variance and apply the Bernstein inequality. First note that we have uniformly in $l$,
$$E\Big\{n^{-1}\sum_{i=1}^n m_i^3\sum_{j=1}^{m_i}|W_{ijl}|\Big\} = O(K_n^{-1}).$$
This follows from the local property of the B-spline basis and Assumption A2. In addition, since we have from Assumption A2 that
$$E\Big\{\frac{m_{\max}^2}{n^2}\sum_{i=1}^n m_i^4\sum_{j=1}^{m_i}|W_{ijl}|^2 + \frac{m_{\max}^3}{n^2}\sum_{i=1}^n m_i^3\sum_{j_1\ne j_2}|W_{ij_1l}||W_{ij_2l}|\Big\} = O\Big(\frac{m_{\max}^2}{nK_n} + \frac{m_{\max}^3}{nK_n^2}\Big),$$
the variance is bounded from above by $C_1n^{-19/20}$ uniformly in $l$. Each summand is bounded from above by $C_2m_{\max}^4/n = O(n^{-1/2})$. Hence (S.42) and the uniformity in $l$ follow from the Bernstein inequality.
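For reference, the classical form of the Bernstein inequality for independent bounded random variables is recalled below. The symbols $\xi_i$, $M$, and $v$ are notation introduced only for this statement; in the argument above, $v$ plays the role of the variance bound $C_1n^{-19/20}$ and $M$ that of the summand bound $C_2n^{-1/2}$.

```latex
% \xi_1,\dots,\xi_n independent, E\xi_i = 0, |\xi_i| \le M almost surely,
% v = \sum_{i=1}^n \mathrm{Var}(\xi_i). Then for every x > 0,
P\Big(\Big|\sum_{i=1}^n \xi_i\Big| \ge x\Big)
  \le 2\exp\Big(-\frac{x^2}{2\,(v + Mx/3)}\Big).
```

Uniformity in $l$ is then typically obtained by a union bound over the $qK_n$ values of $l$, since the exponential tail dominates the polynomial number of terms.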
Proof of Lemma 2. We can verify the result on $n^{-1}h_{12,kl}$ by using the local property of the B-spline basis and the Bernstein inequality for independent bounded random variables. Since
$$\frac{1}{n}(\hat H_{12} - H_{12}) = \frac{1}{n}\sum_{i=1}^n X_i^T\{\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}\}W_i + \frac{1}{n}\sum_{i=1}^n X_i^T\{\hat\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}\}W_i,$$
the desired result on $n^{-1}(\hat h_{12,kl} - h_{12,kl})$ follows from (S.38), (S.39), and (S.42). The results on the Euclidean norm follow from those on the elements. Hence the proof is complete.
Proof of Lemma 3. We have from Assumption A4 that
$$\frac{C_1}{n}\sum_{i=1}^n\frac{1}{m_i}W_i^TW_i \le \frac{1}{n}H_{22} \le \frac{C_2}{n}\sum_{i=1}^n W_i^TW_i \tag{S.44}$$
for some positive constants $C_1$ and $C_2$, and for $k = 0, 1$,
$$\frac{1}{n}\sum_{i=1}^n\frac{1}{m_i^k}W_i^TW_i = \frac{1}{n}\sum_{i=1}^n\frac{1}{m_i^k}\sum_{j=1}^{m_i}(Z_{ij}Z_{ij}^T)\otimes(B(T_{ij})B^T(T_{ij})).$$
Thus the first result follows from Assumptions A2 and A3 and the standard arguments on B-spline bases as in the proofs of Lemmas A.1 and A.2 of [12].
Since we have
$$\frac{1}{n}(\hat H_{22} - H_{22}) = \frac{1}{n}\sum_{i=1}^n W_i^T\{\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}\}W_i + \frac{1}{n}\sum_{i=1}^n W_i^T\{\hat\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}\}W_i,$$
the second result follows from (S.40), the inequalities similar to (S.44), and Assumptions A2 and A3. The third result follows from the first and second results. Finally we deal with the fourth result. Note that
$$\begin{aligned}
(n^{-1}\hat H_{22})^{-1} - (n^{-1}H_{22})^{-1} &= (n^{-1}H_{22})^{-1}(n^{-1}H_{22} - n^{-1}\hat H_{22})(n^{-1}H_{22})^{-1} \\
&\quad + (n^{-1}\hat H_{22})^{-1}(n^{-1}H_{22} - n^{-1}\hat H_{22})(n^{-1}H_{22})^{-1}(n^{-1}H_{22} - n^{-1}\hat H_{22})(n^{-1}H_{22})^{-1}.
\end{aligned} \tag{S.45}$$
By using the first, second, and third results and (S.45), we obtain the fourth one. Hence the proof is complete.
Proof of Lemma 4. The first result follows from (S.40). The second one follows from Lemmas 2 and 3. The last one follows from the first two.

Proof of Lemma 5. The first result follows from the fact
$$\frac{C_1}{n}\sum_{i=1}^n\frac{1}{m_i}W_i^TW_i \le \frac{1}{n}\sum_{i=1}^n W_i^T\Sigma_i^{-1}W_i \le \frac{C_2}{n}\sum_{i=1}^n W_i^TW_i$$
for some positive constants $C_1$ and $C_2$. Next note that
$$\frac{1}{\sqrt n}\sum_{i=1}^n W_i^T(\hat\Sigma_i^{-1} - \Sigma_i^{-1})\epsilon_i = \frac{1}{\sqrt n}\sum_{i=1}^n W_i^T\{\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}\}\epsilon_i + \frac{1}{\sqrt n}\sum_{i=1}^n W_i^T\{\hat\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}\}\epsilon_i. \tag{S.46}$$
By employing (S.39) and (S.43), we can prove that the stochastic order of the elements of the second term of the right-hand side is uniformly $O_p(\sqrt n K_n^{-1}(h_2^4 + h_3^4 + \log n/(nh_2) + \log n/(nh_3^2)))$. Thus the norm of this $qK_n$-dimensional vector has the stochastic order of
$$\sqrt{\frac{n}{K_n}}\,O_p\Big(h_2^4 + h_3^4 + \frac{\log n}{nh_2} + \frac{\log n}{nh_3^2}\Big). \tag{S.47}$$
According to Proposition 4, the first term of the right-hand side of (S.46) can be decomposed into
$$\frac{1}{\sqrt n}\sum_{i=1}^n W_i^TQ_{1i}\epsilon_i + \frac{1}{\sqrt n}\sum_{i=1}^n W_i^TQ_{2i}\epsilon_i + \frac{1}{\sqrt n}\sum_{i=1}^n W_i^TQ_{3i}\epsilon_i, \tag{S.48}$$
where $Q_{1i}$ corresponds to the first and second terms in (S.41), $Q_{2i}$ corresponds to the third and fourth terms in (S.41), and $Q_{3i}$ corresponds to the fifth term in (S.41). Proposition 4 implies
$$Q_{1i} = Q_{1i}^{(2)}h_2^2 + Q_{1i}^{(3)}h_3^2,$$
where we have for $s = 2, 3$,
$$\max\{|\lambda_{\min}(Q_{1i}^{(s)})|,\; |\lambda_{\max}(Q_{1i}^{(s)})|\} = O(m_i)$$
uniformly in $i$. Besides, $Q_{1i}^{(s)}$ depends only on $T_i$ for $s = 2, 3$. The $(k,l)$ element of $Q_{2i}$ has the form of
$$\sum_{j=1}^{m_i}\sigma_i^{kj}\sigma_i^{lj}E_1(T_{ij}) + \sum_{j\ne j'}\sigma_i^{kj}\sigma_i^{lj'}E_2(T_{ij}, T_{ij'}),$$
where $\Sigma_i^{-1} = (\sigma_i^{kl})$. Note that uniformly in $l$ and $i$,
$$\sum_{k=1}^{m_i}(\sigma_i^{kl})^2 = O(1).$$
Uniformly in $i$, the elements of $Q_{3i}$, $D_i^{(5)}$ in (S.41), have the order of
$$m_iO_p\Big(h_1^3 + h_2^3 + h_3^3 + \frac{\log n}{nh_1} + \frac{\log n}{nh_2} + \frac{\log n}{nh_3^2}\Big).$$
We can prove as in the proof of Lemma 3 that for $s = 2, 3$,
$$\frac{C_1}{K_n}I_{qK_n} \le \mathrm{Cov}\Big(n^{-1/2}\sum_{i=1}^n W_i^TQ_{1i}^{(s)}\epsilon_i\Big) \le \frac{C_2}{K_n}I_{qK_n}$$
for some positive constants $C_1$ and $C_2$. Hence we have
$$n^{-1/2}\sum_{i=1}^n W_i^TQ_{1i}\epsilon_i = O_p(h_2^2 + h_3^2). \tag{S.49}$$
Similarly to the second term in the right-hand side of (S.46), we can demonstrate by using (S.43) that
$$n^{-1/2}\sum_{i=1}^n W_i^TQ_{3i}\epsilon_i = \sqrt{\frac{n}{K_n}}\,O_p\Big(h_1^3 + h_2^3 + h_3^3 + \frac{\log n}{nh_1} + \frac{\log n}{nh_2} + \frac{\log n}{nh_3^2}\Big). \tag{S.50}$$
Finally we evaluate the second term of (S.48), which has the structure of a V-statistic. By exploiting the structure, we evaluate the expectations and the variances of the elements by using Assumption A2. Then we have
$$n^{-1/2}\sum_{i=1}^n W_i^TQ_{2i}\epsilon_i = O_p\Big(\frac{1}{\sqrt{nh_2}} + \frac{1}{\sqrt{nh_3^2}} + \frac{1}{\sqrt n K_nh_2} + \frac{1}{\sqrt n K_nh_3^2}\Big).$$
The second result follows from (S.47), (S.49), (S.50), and the above equality.

Proof of Lemma 6. This lemma can be proved in the same way as Lemma 5 and the details are omitted.
Proof of Lemma 7. From the definition of $\gamma^*$ given after (5.5), we have
$$\max_{1\le j\le m_i}|W_{ij}^T\gamma^* - Z_{ij}^Tg_0(T_{ij})| = O_p(K_n^{-2})$$
uniformly in $i$. The above equality and (S.42) imply that the elements of
$$\frac{1}{n}\sum_{i=1}^n W_i^T\Sigma_i^{-1}(W_i\gamma^* - (Z^Tg_0)_i)$$
are uniformly $O_p(K_n^{-3})$ and the first result follows from this. As for the second result, first we note that
$$|\hat\Sigma_i^{-1} - \Sigma_i^{-1}|_{\max} = O_p(m_i\delta_n)$$
uniformly in $i$ from (S.38) and (S.39). Recall that $\delta_n$ is defined in (S.37). Thus the elements of $W_i^T(\hat\Sigma_i^{-1} - \Sigma_i^{-1})(W_i\gamma^* - (Z^Tg_0)_i)$ are bounded uniformly in $l$ by
$$CK_n^{-2}\delta_nm_i^2\sum_{j=1}^{m_i}|W_{ijl}|$$
with probability tending to 1 for some positive constant $C$. Hence the second result follows from (S.42).

Proof of Lemma 8. This lemma can be proved in the same way as Lemma 7 and the details are omitted.
S.5. Theoretical results for general link functions. We state the results of Section 2 for general link functions when $m_i$ is uniformly bounded and $\epsilon_i$ satisfies the sub-Gaussian assumption, Assumption A6′ here. Note that we have no counterpart of Theorem 1 for general link functions even when $m_i$ is uniformly bounded.
Let $v_1$ and $v_2$ be two processes each taking a scalar stochastic value at $T_{ij}$, $i = 1, \ldots, n$, $j = 1, \ldots, m_i$. Then we define two inner products of $v_1$ and $v_2$ by
$$\langle v_1, v_2\rangle_n^\Delta = \frac{1}{n}\sum_{i=1}^n v_{1i}^T\Delta_{0i}V_i^{-1}\Delta_{0i}v_{2i} \quad\text{and}\quad \langle v_1, v_2\rangle^\Delta = E\{\langle v_1, v_2\rangle_n^\Delta\},$$
where $v_{1i}$ and $v_{2i}$ are defined in the same way as $T_i$ and
$$\Delta_{0i} = \mathrm{diag}\big(\mu'(X_{i1}^T\beta_0 + Z_{i1}^Tg_0(T_{i1})), \ldots, \mu'(X_{im_i}^T\beta_0 + Z_{im_i}^Tg_0(T_{im_i}))\big).$$
The associated norms are then defined by
$$\|v\|_n^\Delta = (\langle v, v\rangle_n^\Delta)^{1/2} \quad\text{and}\quad \|v\|^\Delta = (\langle v, v\rangle^\Delta)^{1/2}.$$
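The empirical inner product $\langle v_1, v_2\rangle_n^\Delta$ can be computed subject by subject. The sketch below assumes that the data are stored as per-subject lists and that the diagonal of each $\Delta_{0i}$ is passed as a vector of derivative values $\mu'(\cdot)$; the function name and data layout are hypothetical.

```python
import numpy as np

def empirical_inner_product(v1, v2, delta0, V):
    """(1/n) * sum_i v1_i^T Delta_0i V_i^{-1} Delta_0i v2_i.

    v1, v2: lists of (m_i,) arrays; delta0: list of (m_i,) arrays holding the
    diagonal of Delta_0i; V: list of (m_i, m_i) working covariance matrices.
    """
    n = len(V)
    total = 0.0
    for a, b, d, Vi in zip(v1, v2, delta0, V):
        # Delta_0i is diagonal, so Delta_0i @ x equals the elementwise product d * x
        total += (d * a) @ np.linalg.solve(Vi, d * b)
    return total / n

def empirical_norm(v, delta0, V):
    """Norm induced by the empirical inner product."""
    return np.sqrt(empirical_inner_product(v, v, delta0, V))
```

With the identity link, every entry of `delta0` equals one and the expression reduces to the inner product weighted by $V_i^{-1}$ alone.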
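We now define the projections, with respect to $\|\cdot\|^\Delta$, of the $k$th element of $X$ onto $Z^TG$ and $Z^TG_B$ by
$$\Pi_\Delta X_k = \mathop{\mathrm{argmin}}_{g\in G}\|X_k - Z^Tg\|^\Delta \quad\text{and}\quad \Pi_{\Delta n}X_k = \mathop{\mathrm{argmin}}_{g\in G_B}\|X_k - Z^Tg\|^\Delta,$$
where
$$(\|X_k - Z^Tg\|^\Delta)^2 = \frac{1}{n}E\Big\{\sum_{i=1}^n(X_{ik} - (Z^Tg)_i)^T\Delta_{0i}V_i^{-1}\Delta_{0i}(X_{ik} - (Z^Tg)_i)\Big\},$$
with $X_{ik} = (X_{i1k}, \ldots, X_{im_ik})^T$ and $(Z^Tg)_i = (Z_{i1}^Tg(T_{i1}), \ldots, Z_{im_i}^Tg(T_{im_i}))^T$. We denote these projections by $\varphi^*_{\Delta k} = \Pi_\Delta X_k$ and $\varphi_{\Delta k} = \Pi_{\Delta n}X_k$, and define another one by
$$\hat\varphi_{\Delta k} = \hat\Pi_{\Delta n}X_k,$$
where
$$\hat\Pi_{\Delta n}X_k = \mathop{\mathrm{argmin}}_{g\in G_B}\|X_k - Z^Tg\|_n^\Delta.$$
The arguments in Section 5.2 also apply to this $\varphi^*_{\Delta k}$.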
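Some matrices are necessary to present Proposition S.1 and we define them here. Let
$$\tilde H = \begin{pmatrix}\sum_{i=1}^n X_i^T\Delta_{0i}V_i^{-1}\Delta_{0i}X_i & \sum_{i=1}^n X_i^T\Delta_{0i}V_i^{-1}\Delta_{0i}W_i \\ \sum_{i=1}^n W_i^T\Delta_{0i}V_i^{-1}\Delta_{0i}X_i & \sum_{i=1}^n W_i^T\Delta_{0i}V_i^{-1}\Delta_{0i}W_i\end{pmatrix} = \begin{pmatrix}\tilde H_{11} & \tilde H_{12} \\ \tilde H_{21} & \tilde H_{22}\end{pmatrix} \quad\text{(say)},$$
$$\tilde H_{11\cdot 2} = \tilde H_{11} - \tilde H_{12}\tilde H_{22}^{-1}\tilde H_{21}, \quad\text{and}\quad \tilde H^{11} = (\tilde H_{11\cdot 2})^{-1}.$$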
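The matrix $\tilde H^{11}$ is the inverse of a Schur complement, and by the standard partitioned-inverse formula it coincides with the upper-left block of $\tilde H^{-1}$. The sketch below illustrates this with an arbitrary positive definite matrix; the function name and the test matrix are illustrative assumptions.

```python
import numpy as np

def schur_block(H, p):
    """Return H_{11.2} = H11 - H12 H22^{-1} H21 and its inverse H^{11}."""
    H11, H12 = H[:p, :p], H[:p, p:]
    H21, H22 = H[p:, :p], H[p:, p:]
    schur = H11 - H12 @ np.linalg.solve(H22, H21)
    return schur, np.linalg.inv(schur)

rng = np.random.default_rng(1)
a = rng.normal(size=(6, 6))
H = a @ a.T + 6.0 * np.eye(6)      # arbitrary positive definite matrix
schur, H_upper = schur_block(H, p=2)
full_inverse_block = np.linalg.inv(H)[:2, :2]
```

Working with the Schur complement avoids inverting the full matrix when only the block for the parametric component is needed.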
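Let $\tilde\Omega_{Vn}$ be a $p \times p$ matrix whose $(k,l)$th element is
$$\frac{1}{n}\sum_{i=1}^n E\Big\{(X_{ik} - (Z^T\varphi^*_{\Delta k})_i)^T\Delta_{0i}V_i^{-1}\Delta_{0i}(X_{il} - (Z^T\varphi^*_{\Delta l})_i)\Big\}.$$
Note that $n^{-1}\tilde H_{11\cdot 2}$ is an estimate of $\tilde\Omega_{Vn}$. We assume that there exists a $p \times p$ positive definite matrix $\tilde\Omega_V$ such that
$$\lim_{n\to\infty}\tilde\Omega_{Vn} = \tilde\Omega_V. \tag{S.51}$$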
We present Propositions S.1-S.3 before stating the assumptions for these propositions. By using Lemma S.1 we can prove Proposition S.1 based on the same arguments as those in [4].
Proposition S.1. (Asymptotic normality of $\hat\beta_V$) Under Assumption S in Section 2 for the norm here, (S.51), and Assumptions A1′, A2′, A3, A4′, A5′, and A6′, we have
$$\hat\beta_V = \beta_0 + \tilde H^{11}\sum_{i=1}^n(X_i - W_i\tilde H_{22}^{-1}\tilde H_{21})^T\Delta_{0i}V_i^{-1}\epsilon_i + o_p\Big(\frac{1}{\sqrt n}\Big).$$
We also have
$$\tilde\Gamma_V^{-1/2}(\hat\beta_V - \beta_0) \stackrel{d}{\to} N(0, I_p),$$
where $\tilde\Gamma_V$ is
$$\tilde H^{11}\sum_{i=1}^n\Big\{(X_i - W_i\tilde H_{22}^{-1}\tilde H_{21})^T\Delta_{0i}V_i^{-1}\Sigma_iV_i^{-1}\Delta_{0i}(X_i - W_i\tilde H_{22}^{-1}\tilde H_{21})\Big\}\tilde H^{11}.$$
We give in Proposition S.2 the semiparametric efficiency bound for estimation of $\beta_0$. It can be proved in the same way as Lemma 1 of [4] and the proof is omitted. We denote the semiparametric efficient score function of $\beta$ by
$$\tilde l^*_\beta = (\tilde l^*_{\beta 1}, \ldots, \tilde l^*_{\beta p})^T.$$
Its expression is given in Proposition S.2. When $V_i = \Sigma_i$, we denote $\varphi^*_{\Delta k}(t)$ by $\tilde\varphi^*_{eff,k}(t)$.
Proposition S.2. (Semiparametric efficiency bound) Under the same assumptions as in Proposition S.1, we have
$$\tilde l^*_{\beta k} = \sum_{i=1}^n(X_{ik} - (Z^T\tilde\varphi^*_{eff,k})_i)^T\Delta_{0i}\Sigma_i^{-1}\{Y_i - \mu(X_i\beta_0 + (Z^Tg_0)_i)\},$$
and the semiparametric efficient information matrix for $\beta$ is given by
$$\lim_{n\to\infty}\frac{1}{n}E\{\tilde l^*_\beta(\tilde l^*_\beta)^T\} = \tilde\Omega_\Sigma$$
with $V_i = \Sigma_i$ in (S.51).
Proposition S.3 is parallel to Proposition 3. It can be proved in the same way as Corollary 1 of [4], and it also follows from Proposition S.1 and Lemma S.1 (vii). Thus the proof is omitted.
Proposition S.3. (Oracle efficient estimator) Under the same assumptions as in Proposition S.1, we have with $V_i = \Sigma_i$ in (2.2)
$$\sqrt n\,\tilde\Omega_\Sigma^{1/2}(\hat\beta_\Sigma - \beta_0) \stackrel{d}{\to} N(0, I_p).$$
Now we describe assumptions for the above propositions. Here we need Assumption A6′ since we need some results from the empirical process the-ory in dealing with general link functions.
Assumption A1′.
(i) µ(x) is twice continuously differentiable and infx∈Rµ′(x) > 0.
(ii) For some positive constant CB9, we have lim sup
|x|→∞ |µ(x)|/|x|
CB9 < ∞.
uni-formly bounded and we have for some positive constants CB1 and CB2, CB1 < 1 n n X i=1 mi X j=1 fij(t) < CB2 on [0, 1] and CB1 < 1 n n X i=1 X j6=j′ fijj′(s, t) < CB2 on [0, 1]2.
Assumption A4′. For some positive constants $C_{B5}$ and $C_{B6}$, we have uniformly in $i$,
\[ C_{B5} \le \lambda_{\min}(\Sigma_i) \le \lambda_{\max}(\Sigma_i) \le C_{B6}. \]
Assumption A5′. For some positive constants $C_{B7}$ and $C_{B8}$, we have uniformly in $i$,
\[ C_{B7} \le \lambda_{\min}(V_i) \le \lambda_{\max}(V_i) \le C_{B8}. \]
Assumption A6′. For some positive constants $C_{B10}$ and $C_{B11}$, we have uniformly in $i$,
\[ \max_{1 \le i \le n} C_{B10}\, E\{ \exp(|\epsilon_i|^2 / C_{B10}) - 1 \mid X_i, Z_i, T_i \} \le C_{B11}. \]
To prove Proposition S.1, we have only to proceed as in [3] by replacing their $Z_{ij}$, $Z_i$, and $\varphi^*_k(t)$ with $W_{ij}$, $W_i$, and $Z^T \varphi^*_{\Delta k}(t)$, respectively. We just state the relevant changes and remarks in the following:
(i) Lemmas S.2-S.4 of [3]: We reorganize these lemmas in Lemma S.1 given later. Its (i)-(iii), (iv) and (vi) correspond to Lemma S.2, the latter half of Lemma S.3 and Lemma S.4 of [3], respectively. The former half of Lemma S.3 of [3] seems to be used in their Corollary 1. However, it can be relaxed to (v) of Lemma S.1 here.
(ii) Lemma S.8 of [3]: The regressors Xij and Wij still form a VC class
and we can proceed completely in the same way as in [3].
We state Lemma S.1 in the following. It can be proved in the same way as Lemma 1.
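Lemma S.1. Assume that Assumptions A1′, A2′, A3, A4′, A5′ hold. Then we have the following results.
(i) There are positive constants $C_1$ and $C_2$ such that
\[ C_1 \|g\|_{G,2} \le \|Z^T g\|_\Delta \le C_2 \|g\|_{G,2}. \]
(ii) There are positive constants $C_3$ and $C_4$ such that
\[ \|g\|^2_{G,\infty} \le C_3 K_n \|g\|^2_{G,2} \le C_4 K_n (\|Z^T g\|_\Delta)^2 \]
for any $g \in G_B$.
(iii) There is a positive constant $C_5$ such that for any $\beta \in R^p$ and $g \in G_B$,
\[ \|X^T\beta + Z^T g\|_\infty \le C_5 K_n^{1/2} \|X^T\beta + Z^T g\|_\Delta, \]
where $\|v\|_\infty = \max_{i,j} |v_{ij}|$. Besides we have for some positive constant $C_6$, $\|v\|_\Delta \le C_6 \|v\|_\infty$.
(iv)
\[ \sup_{g_1, g_2 \in G_B} \left| \frac{\langle Z^T g_1, Z^T g_2 \rangle_{\Delta n} - \langle Z^T g_1, Z^T g_2 \rangle_\Delta}{\|Z^T g_1\|_\Delta \|Z^T g_2\|_\Delta} \right| = O_p\big(K_n \sqrt{\log n / n}\big). \]
(v) For any positive constant $M$, we have
\[ \langle X_j - Z^T g_j, X_k - Z^T g_k \rangle_{\Delta n} - \langle X_j - Z^T g_j, X_k - Z^T g_k \rangle_\Delta = o_p(1) \]
uniformly in $g_j \in G_B$ and $g_k \in G_B$ satisfying $\|g_j\|_{G,2} \le M$ and $\|g_k\|_{G,2} \le M$, respectively.
(vi) For any stochastic process $\delta_n$ taking values at $T_{ij}$ satisfying that $\|\delta_n\|_\infty$ is uniformly bounded in $n$ and $\{\delta_{n,ij}\}_{j=1}^{m_i}$ are mutually independent in $i$, we have
\[ \sup_{g \in G_B} \left| \frac{\langle \delta_n, Z^T g \rangle_{\Delta n} - \langle \delta_n, Z^T g \rangle_\Delta}{\|Z^T g\|_\Delta} \right| = O_p\big(\sqrt{K_n / n}\big) \|\delta_n\|_\infty. \]
(vii) We also have Assumption S for the norm here. Then we have for $k = 1, \ldots, p$, $\|\hat\varphi_{\Delta k}\|_\infty = O_p(1)$, $\|Z^T(\varphi^*_{\Delta k} - \hat\varphi_{\Delta k})\|_{\Delta n} = o_p(1)$, and $\|Z^T(\varphi^*_{\Delta k} - \hat\varphi_{\Delta k})\|_\Delta = o_p(1)$.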
References.
[S1] Giné, E., Latała, R. and Zinn, J. (2000). Exponential and moment inequalities for U-statistics. In High Dimensional Probability II (pp. 13–38). Boston: Birkhäuser.
[S2] Huang, J. Z. (2003). Local asymptotics for polynomial spline regression. Ann. Statist. 31 1600–1635.
[S3] Masry, E. (1996). Multivariate local polynomial regression for time series: uniform strong consistency and rates. J. Time Series Anal. 17 571–599.
EFFICIENT ESTIMATION IN SEMIVARYING COEFFICIENT MODELS FOR LONGITUDINAL/CLUSTERED DATA
By Ming-Yen Cheng¶,∗, Toshio Honda‖,†, and Jialiang Li∗∗,‡
National Taiwan University∗, Hitotsubashi University†, and National University of Singapore‡
In semivarying coefficient modeling of longitudinal/clustered data, of primary interest is usually the parametric component which involves unknown constant coefficients. First we study the semiparametric efficiency bound for estimation of the constant coefficients in a general setup. It can be achieved by spline regression using the true within-subject covariance matrices, which are often unavailable in reality. Thus we propose an estimator when the covariance matrices are unknown and depend only on the index variable. To achieve this goal, we estimate the covariance matrices using residuals obtained from a preliminary estimation based on working independence and both spline and local linear regression. Then, using the covariance matrix estimates, we employ spline regression again to obtain our final estimator. It achieves the semiparametric efficiency bound under the normality assumption and has the smallest asymptotic covariance matrix among a class of estimators even when normality is violated. Our theoretical results hold either when the number of within-subject observations diverges or when it is uniformly bounded. In addition, the local linear estimator of the nonparametric component is superior to the spline estimator in terms of numerical performance. The proposed method is compared with the working independence estimator and some existing method via simulations and application to a real data example.
§This research was partially supported by the Hitotsubashi International Fellow Program and a Taiwan Ministry of Education grant.
¶Corresponding author. Research was supported by the Ministry of Science and Technology grants 101-2118-M-002-001-MY3 and 104-2118-M-002-005-MY3.
‖Research was supported by the JSPS Grant-in-Aids for Scientific Research (A) 24243031 and (C) 25400197.
∗∗Research was supported by grants AcRF R-155-000-130-112 and NMRC/CBRG/0014/2012.
MSC 2010 subject classifications: Primary 62G08
Keywords and phrases: Covariance matrix estimation; local linear regression; semiparametric efficiency bound; spline functions.
1. Introduction. Suppose we have a scalar response $Y$, and a $p$-dimensional and a $q$-dimensional covariate vector $X$ and $Z$, respectively. Longitudinal data consist of $(Y_{ij}, X_{ij}, Z_{ij}, T_{ij})$, $i = 1, \ldots, n$, $j = 1, \ldots, m_i$, where $Y_{ij}$, $X_{ij} = (X_{ij1}, \ldots, X_{ijp})^T$ and $Z_{ij} = (Z_{ij1}, \ldots, Z_{ijq})^T$ are respectively the values of $Y$, $X$ and $Z$ of the $i$th subject at the $j$th observation time $T_{ij} \in [0, 1]$. Such data are commonly acquired for various purposes, such as evidence based knowledge discovery and empirical study, in a wide range of subject areas. When the subjects are changed to clusters and the $T_{ij}$'s are observations on some index variable other than time, they are usually called clustered data. We assume that all the covariates are uniformly bounded for technical reasons. Besides, we let $Z_{ij1} \equiv 1$ and suppose $X_{ij}$ has no constant element for all $i$ and $j$. For $i = 1, \ldots, n$, denote $X_i = (X_{i1}, \ldots, X_{im_i})^T$, $Z_i = (Z_{i1}, \ldots, Z_{im_i})^T$, and $T_i = (T_{i1}, \ldots, T_{im_i})^T$.
A popular model for longitudinal data analysis is the semivarying coefficient model, which is specified by
\[ E(Y_{ij} \mid X_{ij}, Z_{ij}, T_{ij}, X_i, Z_i, T_i) = E(Y_{ij} \mid X_{ij}, Z_{ij}, T_{ij}) \equiv \mu(X_{ij}^T \beta + Z_{ij}^T g(T_{ij})) = \mu_{ij}, \tag{1.1} \]
where $A^T$ stands for the transpose of a matrix $A$. In model (1.1), $\mu(x)$ is a known strictly increasing smooth link function, $\beta$ is an unknown regression coefficient vector, and $g(t) = (g_1(t), \ldots, g_q(t))^T$ is a vector of unknown smooth functions. Define
\[ \epsilon_i = (\epsilon_{i1}, \ldots, \epsilon_{im_i})^T = Y_i - \mu_i \quad\text{and}\quad \Sigma_i = \mathrm{Var}(\epsilon_i \mid X_i, Z_i, T_i), \tag{1.2} \]
where $Y_i = (Y_{i1}, \ldots, Y_{im_i})^T$, $\mu_i = (\mu_{i1}, \ldots, \mu_{im_i})^T$, and $\Sigma_i$ is an $m_i \times m_i$ positive definite matrix depending on $X_i$, $Z_i$, and $T_i$, $i = 1, \ldots, n$. This is a standard marginal model in longitudinal data analysis [24].
Model (1.1) consists of a parametric component, which provides information on the constant impacts of some important covariates, and a nonparametric component which captures the dynamic impacts of the other covariates. In this way the model is able to reflect unknown nonlinear structures in the data while retaining similar interpretability as the classical linear models at the same time. There is an extensive literature on the variable selection, structure identification, estimation, and inference issues [6, 8, 12, 22, 25]. In particular, often of primary interest is to have access to the parametric component while the nonparametric component is viewed as the nuisance part. In this regard, it is well known that assuming independence or some mis-specified working covariance structure yields less efficient estimation of the constant coefficients. Therefore, a substantial portion of the existing literature aimed at improving the efficiency via modeling and estimating the within-subject covariance structure [6, 7, 10, 18, 26, 27, 28], which is itself a challenging task.
In this article, we focus on the identity link function and make contributions to the efficient estimation problem for model (1.1) in three directions. First, we allow some of the $m_i$'s to tend to infinity. As far as we know, this setup has not been treated before and the problem is nontrivial. Our results also hold when the $m_i$'s are uniformly bounded and $\epsilon_i$ satisfies the sub-Gaussian property. See the supplement [5] for the details. When all of the $m_i$'s are diverging, that is, if we have densely observed data, it becomes a kind of functional data problem and is out of the scope of this paper. Second, we study the explicit expression of the semiparametric efficiency bound for estimation of $\beta$ and asymptotic normality of the generalized estimating equations (GEE) spline estimator under general covariance structures and error distributions. Using the true covariance matrices in the GEE estimation leads to optimality among all GEE estimators of the parametric component. Furthermore, it achieves the semiparametric efficiency bound when the errors are conditionally normal. Our results are parallel to those for partially linear and partially linear additive models given by [13] and [4] respectively. Those models are among a rich variety of semiparametric ways of modeling longitudinal data, and they differ from semivarying coefficient models in that their nonparametric components admit more direct additive expressions. Partially linear (additive) models were also considered by [14, 15, 16, 17, 23], among which [14, 15, 16, 23] used the kernel method and [17] used spline estimation.
Our third contribution is to deal with adaptive efficient estimation when the within-subject covariance matrices are estimated nonparametrically using the data at hand. Notice that [4] ignored this practical issue and did not consider estimation of the covariances, and [13] suggested using some parametric specification which can be estimated $\sqrt{n}$-consistently. We consider the case where the nonparametric within-subject covariance matrices depend only on the observation times but not on the other covariates. Such assumptions are reasonable because we do not assume that the observation times are regular across different subjects or that they are dense. Indeed, with irregular and/or sparse observation times, estimating the covariances in a completely nonparametric way, by letting them depend on all of the $T_{ij}$, $X_{ij}$ and $Z_{ij}$ nonparametrically, is particularly problematic and even unreliable as the curse-of-dimensionality problem arises. Our covariance estimator is constructed based on residuals yielded by an initial estimation. The final estimator of the true value of $\beta$ is then given by plugging the covariance estimates into the GEE spline estimation. We show the asymptotic equivalence of our final estimator to the oracle efficient estimator which uses the true covariance matrices in the GEE spline estimation.
The above result is partly motivated by the study of [14] on efficient estimation in partially linear models under the same nonparametric covariance structure. However, the kernel profile method taken by [14] involves only local linear regression, thus, to achieve semiparametric efficiency it requires some complicated iterative backfitting calculation except for the identity link function [15, 16]. By comparison, our approach to estimating the parametric and nonparametric components in the mean function is different and much simpler. We ingeniously use both spline approximation and local linear estimation to avoid complicated calculation while allowing for the asymptotic equivalence property at the same time. To the best of our knowledge, there are no existing results for semivarying coefficient models, especially when some of the $m_i$'s tend to infinity or when the $\Sigma_i$'s are estimated.
Our final estimator is a kind of feasible generalized least squares (FGLS) estimator since we replace the within-subject covariance matrices with their nonparametric estimates. Even if our assumption on the covariance matrices fails to hold, it still possesses the asymptotic normality under mild conditions and still makes use of some information of the covariance matrices. For example, if the covariances depend on some time-dependent covariates, to some extent such effects are still captured by our method. In this sense, compared with existing methods which use either parametrically estimated or some ad-hoc covariance matrices [7, 18, 21], our approach is more adaptive to the unknown covariance matrices. A promising cluster bootstrap inference method was proposed by [2]; it assumes some parametric within-cluster covariance structure, however. In the case where there is one observation for each subject/cluster, our assumption on the covariance matrices reduces to that of [20], which also suggested improving the efficiency in a similar manner.
Our simulation study shows that numerically the proposed method outperforms the working independence approach and the quadratic inference functions (QIF) method by [18], and it behaves close to the oracle estimator which uses the true covariance matrices. Note that, while the QIF procedure is suitable when there is some kind of regularity and stationarity in the error process, our procedure adapts to both non-stationarity and irregularity. We also applied our method to the CD4 count dataset and identified some interesting new effects not detected by the working independence approach. After the semiparametric efficient estimation, we can estimate and make inference on the nonparametric component in the same way as in dealing with varying coefficient models, using the difference between the response and the estimated parametric part [25]. When $p$ and $q$ are both diverging and the model is sparse, [6] suggested a simultaneous variable selection and structure identification procedure and showed its consistency property. By combining the method with the proposed estimation procedure and by putting together the corresponding consistency and efficiency results, we have an efficient estimation procedure in this case.
The organization of this paper is as follows. In Section 2 we derive the semiparametric efficiency bound for the constant coefficient vector $\beta$ and asymptotic normality of GEE spline estimators. In Section 3, we propose an efficient estimator of $\beta$ when the errors have some general covariance structure and state its asymptotic equivalence to the oracle estimator which assumes the covariance matrices are known. Section 4 summarizes and discusses results of our simulation and empirical studies used to assess numerical performance of the proposed efficient estimator. Section 5 contains some technical assumptions and proof of the asymptotic equivalence. In the supplementary material [5] we give additional simulation results for estimation, proofs of the other theoretical results, some lemmas, and theoretical results when the $m_i$'s are uniformly bounded.
2. Semiparametric efficiency bound for β. In this section, $V_i$ is a given $m_i \times m_i$ inverse weight matrix depending only on $X_i$, $Z_i$, and $T_i$, $i = 1, \ldots, n$. We use a $K_n$-dimensional equispaced B-spline basis on $[0, 1]$, denoted by $B(t)$, to approximate the function $g(t)$. See [19] for the definition and properties of B-spline bases. We set $W_{ij} = Z_{ij} \otimes B(T_{ij})$ and $W_i = (W_{i1}, \ldots, W_{im_i})^T$, where $\otimes$ is the Kronecker product, and we denote the true values of $\beta$ and $g(t)$ by $\beta_0$ and $g_0(t) = (g_{01}(t), \ldots, g_{0q}(t))^T$ respectively. Then we estimate $\beta_0$ and $g_0(t)$ by minimizing with respect to $\beta$ and $\gamma$ simultaneously the following objective function:
\[ \sum_{i=1}^n (Y_i - \mu(X_i\beta + W_i\gamma))^T V_i^{-1} (Y_i - \mu(X_i\beta + W_i\gamma)), \tag{2.1} \]
where $\gamma \in R^{qK_n}$ and the $j$th element of $\mu(X_i\beta + W_i\gamma)$ is $\mu(X_{ij}^T\beta + W_{ij}^T\gamma)$.
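Thus the generalized estimating equations are
\[ \sum_{i=1}^n X_i^T \Delta_i V_i^{-1} (Y_i - \mu(X_i\beta + W_i\gamma)) = 0 \quad\text{and}\quad \sum_{i=1}^n W_i^T \Delta_i V_i^{-1} (Y_i - \mu(X_i\beta + W_i\gamma)) = 0, \tag{2.2} \]
where $\Delta_i$ is an $m_i \times m_i$ diagonal matrix defined by $\Delta_i = \mathrm{diag}(\mu'(X_{i1}^T\beta + W_{i1}^T\gamma), \ldots, \mu'(X_{im_i}^T\beta + W_{im_i}^T\gamma))$. Denote the solution to (2.2) by $\hat\beta_V$ and $\hat\gamma_V \equiv (\hat\gamma_{1V}^T, \ldots, \hat\gamma_{qV}^T)^T$. Then the GEE spline estimator with weight matrices $V_i^{-1}$, $i = 1, \ldots, n$, for $\beta_0$ is $\hat\beta_V$ and that for $g_0(t)$ is $(\hat\gamma_{1V}^T B(t), \ldots, \hat\gamma_{qV}^T B(t))^T$.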
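For the identity link $\mu(x) = x$, on which the article focuses, $\Delta_i$ is the identity matrix and the estimating equations (2.2) become a single weighted least squares system in $(\beta, \gamma)$. The following is a minimal sketch of that special case only, not the authors' implementation. The function names `bspline_basis` and `gee_spline_fit`, the cubic default degree, and the list-of-arrays data layout are assumptions made for illustration.

```python
import numpy as np

def bspline_basis(t, n_basis, degree=3):
    """Equispaced B-spline basis on [0, 1] via the Cox-de Boor recursion.

    Returns an array of shape (len(t), n_basis); requires n_basis > degree.
    """
    t = np.asarray(t, dtype=float)
    inner = np.linspace(0.0, 1.0, n_basis - degree + 1)[1:-1]  # interior knots
    knots = np.concatenate([np.zeros(degree + 1), inner, np.ones(degree + 1)])
    n_int = len(knots) - 1
    B = np.zeros((t.size, n_int))
    for k in range(n_int):                       # degree 0: interval indicators
        B[:, k] = (knots[k] <= t) & (t < knots[k + 1])
    B[t == 1.0, n_basis - 1] = 1.0               # close the last interval at t = 1
    for d in range(1, degree + 1):               # raise the degree step by step
        B_new = np.zeros((t.size, n_int - d))
        for k in range(n_int - d):
            left = knots[k + d] - knots[k]
            right = knots[k + d + 1] - knots[k + 1]
            if left > 0:
                B_new[:, k] += (t - knots[k]) / left * B[:, k]
            if right > 0:
                B_new[:, k] += (knots[k + d + 1] - t) / right * B[:, k + 1]
        B = B_new
    return B

def gee_spline_fit(Y, X, Z, T, V_inv, n_basis, degree=3):
    """Solve (2.2) for the identity link.

    Y, X, Z, T, V_inv are lists over subjects; X[i] is (m_i, p), Z[i] is (m_i, q),
    and V_inv[i] is the (m_i, m_i) weight matrix V_i^{-1}.
    """
    p = X[0].shape[1]
    lhs, rhs = 0.0, 0.0
    for Yi, Xi, Zi, Ti, Vi_inv in zip(Y, X, Z, T, V_inv):
        Bi = bspline_basis(Ti, n_basis, degree)
        # Row j of W_i is the Kronecker product Z_ij (x) B(T_ij).
        Wi = np.einsum("jq,jk->jqk", Zi, Bi).reshape(len(Ti), -1)
        Di = np.hstack([Xi, Wi])
        lhs = lhs + Di.T @ Vi_inv @ Di
        rhs = rhs + Di.T @ Vi_inv @ Yi
    theta = np.linalg.solve(lhs, rhs)
    return theta[:p], theta[p:]                  # (beta_hat, gamma_hat)
```

Passing identity matrices as `V_inv` gives the working independence fit; for a general link the equations are nonlinear and would need an iterative solver, which this sketch does not cover.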
Hereafter we focus on the identity link function and present the asymptotic normality of $\hat\beta_V$ in Proposition 1 under general error distributions as specified in Assumption A6 given in Section 5. We allow some of the $m_i$'s to diverge in a way like $\sum_{i=1}^n m_i^5 = O(n)$ and $\max_{1 \le i \le n} m_i = O(n^{1/8})$. See Assumptions A1 and A2 for the specific conditions on the $m_i$'s. We refer to the supplement [5] for the results for general link functions when the $m_i$'s are uniformly bounded and the $\epsilon_i$'s satisfy the sub-Gaussian property.

First, we introduce some function spaces, inner products and projections. Let $L_2$ denote the space of square integrable functions on $[0, 1]$ and recall
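$B(t)$ is the equispaced B-spline basis on $[0, 1]$. We define two function spaces:
\[ G = \{ (g_1, \ldots, g_q)^T \mid g_j \in L_2,\ j = 1, \ldots, q \} \quad\text{and}\quad G_B = \{ (B^T\gamma_1, \ldots, B^T\gamma_q)^T \mid \gamma = (\gamma_1^T, \ldots, \gamma_q^T)^T \in R^{qK_n} \}. \]
Note that $G_B \subset G$. Next, let $v_1$ and $v_2$ be two stochastic processes each taking scalar values at $T_{ij}$, $i = 1, \ldots, n$, $j = 1, \ldots, m_i$. Then we define two inner products of $v_1$ and $v_2$ by $\langle v_1, v_2 \rangle_{Vn} = n^{-1} \sum_{i=1}^n v_{1i}^T V_i^{-1} v_{2i}$ and $\langle v_1, v_2 \rangle_V = E\{\langle v_1, v_2 \rangle_{Vn}\}$, where $v_{1i}$ and $v_{2i}$ are defined in the same way as $T_i$, and we define the associated norms by $\|v\|_{Vn} = (\langle v, v \rangle_{Vn})^{1/2}$ and $\|v\|_V = (\langle v, v \rangle_V)^{1/2}$. The projections, with respect to $\|\cdot\|_V$, of the $k$th element of $X$ onto $Z^T G$ and $Z^T G_B$ are given by
\[ \Pi_V X_k = \operatorname*{argmin}_{g \in G} \|X_k - Z^T g\|_V \quad\text{and}\quad \Pi_{Vn} X_k = \operatorname*{argmin}_{g \in G_B} \|X_k - Z^T g\|_V, \tag{2.3} \]
where
\[ \|X_k - Z^T g\|_V^2 = \frac{1}{n} E\Big\{ \sum_{i=1}^n (X_{ik} - (Z^T g)_i)^T V_i^{-1} (X_{ik} - (Z^T g)_i) \Big\}, \]
with $X_{ik} = (X_{i1k}, \ldots, X_{im_ik})^T$ and $(Z^T g)_i = (Z_{i1}^T g(T_{i1}), \ldots, Z_{im_i}^T g(T_{im_i}))^T$. Hereafter we write $\varphi^*_{Vk} = \Pi_V X_k \in G$ and $\varphi_{Vk} = \Pi_{Vn} X_k \in G_B$.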
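The empirical inner product $\langle v_1, v_2 \rangle_{Vn}$ and its norm can be computed directly from subject-level vectors. A minimal sketch follows; the function names and the list-of-arrays layout are illustrative assumptions, and the population version $\langle \cdot, \cdot \rangle_V$ is an expectation that is not computed here.

```python
import numpy as np

def inner_product_Vn(v1, v2, V_inv):
    """Empirical inner product (1/n) * sum_i v1_i^T V_i^{-1} v2_i.

    v1, v2: lists of 1-D arrays (one per subject, length m_i);
    V_inv: list of (m_i, m_i) arrays holding V_i^{-1}.
    """
    n = len(v1)
    return sum(a @ Vi @ b for a, b, Vi in zip(v1, v2, V_inv)) / n

def norm_Vn(v, V_inv):
    """Associated empirical norm ||v||_{Vn}."""
    return float(np.sqrt(inner_product_Vn(v, v, V_inv)))
```

With two subjects, `v = [np.array([1.0, 2.0]), np.array([3.0])]` and identity weights, the inner product of `v` with itself is $(1 + 4 + 9)/2 = 7$.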
Assumption S
(i) The projections $\varphi^*_{Vk}(t)$, $k = 1, \ldots, p$, and the varying coefficient function $g_0$ are twice continuously differentiable on $[0, 1]$, and they and their second order derivatives are uniformly bounded in $n$.
(ii) We take $K_n = \lfloor c_K n^{1/5} \rfloor$ for some positive constant $c_K$, where $\lfloor x \rfloor$ is the largest integer no greater than $x$.
Assumption S(i) is a mild and standard assumption for semiparametric models. We consider the existence and smoothness properties of $\varphi^*_{Vk}(t)$ in Section 5. Recall that all the covariates are assumed to be uniformly bounded. Since the relevant functions are assumed to be at least twice continuously differentiable, we recommend quadratic or cubic spline approximation. Then the order of $K_n$ specified in Assumption S(ii) is optimal. If the smoothness of different functions varies, we refer to [1] for the convergence rate interfere phenomenon.
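As a small worked example of Assumption S(ii), the number of basis functions grows very slowly with $n$. The helper name `num_basis` and the values of $c_K$ used below are illustrative assumptions; the source does not prescribe a value of $c_K$.

```python
import math

def num_basis(n, c_K=1.0):
    """K_n = floor(c_K * n^(1/5)) as in Assumption S(ii); c_K is user-chosen."""
    return math.floor(c_K * n ** (1 / 5))

# K_n for a few sample sizes with the hypothetical choice c_K = 2.
for n in (100, 200, 1000):
    print(n, num_basis(n, c_K=2.0))
```

In practice $c_K$ must be large enough that $K_n$ is compatible with the chosen spline degree.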
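The following matrices are necessary in order to present asymptotic normality of $\hat\beta_V$:
\[ H = \begin{pmatrix} \sum_{i=1}^n X_i^T V_i^{-1} X_i & \sum_{i=1}^n X_i^T V_i^{-1} W_i \\ \sum_{i=1}^n W_i^T V_i^{-1} X_i & \sum_{i=1}^n W_i^T V_i^{-1} W_i \end{pmatrix} = \begin{pmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{pmatrix}, \tag{2.4} \]
$H_{11\cdot 2} = H_{11} - H_{12} H_{22}^{-1} H_{21}$, and $H^{11} = (H_{11\cdot 2})^{-1}$.
Let $\Omega_{Vn}$ be a $p \times p$ matrix whose $(k, l)$th element is
\[ \langle X_k - Z^T\varphi^*_{Vk}, X_l - Z^T\varphi^*_{Vl} \rangle_V = \frac{1}{n} \sum_{i=1}^n E\Big\{ (X_{ik} - (Z^T\varphi^*_{Vk})_i)^T V_i^{-1} (X_{il} - (Z^T\varphi^*_{Vl})_i) \Big\}. \]
Note that $n^{-1} H_{11\cdot 2}$ is an estimate of $\Omega_{Vn}$. We assume that there exists a $p \times p$ positive definite matrix $\Omega_V$ such that
\[ \lim_{n\to\infty} \Omega_{Vn} = \Omega_V. \tag{2.5} \]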
Now we are ready to state the asymptotic normality of $\hat\beta_V$ under general error distributions as specified in Assumption A6 given in Section 5. Its proof is given in the supplement [5]. We denote the normal distribution with mean $\eta$ and covariance $\Omega$ by $N(\eta, \Omega)$, and by "$\xrightarrow{d}$" we mean convergence in distribution. Let $I_l$ be the $l$-dimensional identity matrix.
Proposition 1. (Asymptotic normality of $\hat\beta_V$) Under Assumption S, (2.5), and Assumptions A1–A6 given in Section 5, we have
\[ \hat\beta_V = \beta_0 + H^{11} \sum_{i=1}^n (X_i - W_i H_{22}^{-1} H_{21})^T V_i^{-1} \epsilon_i + o_p\Big(\frac{1}{\sqrt{n}}\Big). \]
We also have $\Gamma_V^{-1/2}(\hat\beta_V - \beta_0) \xrightarrow{d} N(0, I_p)$, where $\Gamma_V$ is given by
\[ H^{11} \sum_{i=1}^n \Big\{ (X_i - W_i H_{22}^{-1} H_{21})^T V_i^{-1} \Sigma_i V_i^{-1} (X_i - W_i H_{22}^{-1} H_{21}) \Big\} H^{11}. \tag{2.6} \]
Under (2.5), $\hat\beta_V$ is $\sqrt{n}$-consistent for $\beta_0$. We can estimate its asymptotic covariance $\Gamma_V$ given in (2.6) by replacing the $\Sigma_i$'s with some estimates based on $\hat\beta_V$ and $\hat\gamma_V$. For example, we can replace $\Sigma_i$ with $\tilde\epsilon_i \tilde\epsilon_i^T$ where $\tilde\epsilon_i = Y_i - X_i\hat\beta_V - W_i\hat\gamma_V$. However, this approach may be too crude and it does not make use of the common information on the covariance structure contained in different subjects. Alternatively, we can estimate the $\Sigma_i$'s by applying smoothing techniques to some residuals based on some assumption on the covariance structure. We investigate this problem in Section 3.
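The sandwich form (2.6) can be evaluated once the blocks of $H$ in (2.4) are available. The sketch below is illustrative only: the function name `sandwich_cov` and the data layout are assumptions, and `Sigma` may hold either the true $\Sigma_i$ or plug-in estimates such as the crude $\tilde\epsilon_i\tilde\epsilon_i^T$ mentioned above.

```python
import numpy as np

def sandwich_cov(X, W, V_inv, Sigma):
    """Gamma_V in (2.6) for the identity link.

    X[i]: (m_i, p); W[i]: (m_i, q*K_n); V_inv[i]: V_i^{-1}; Sigma[i]: Sigma_i
    (or an estimate of it). Returns (Gamma_V, H^{11}).
    """
    p = X[0].shape[1]
    D = [np.hstack([Xi, Wi]) for Xi, Wi in zip(X, W)]
    H = sum(Di.T @ Vi @ Di for Di, Vi in zip(D, V_inv))       # matrix H in (2.4)
    H11, H12, H21, H22 = H[:p, :p], H[:p, p:], H[p:, :p], H[p:, p:]
    A = np.linalg.solve(H22, H21)                             # H22^{-1} H21
    H_up = np.linalg.inv(H11 - H12 @ A)                       # H^{11} = (H_{11.2})^{-1}
    meat = np.zeros((p, p))
    for Xi, Wi, Vi, Si in zip(X, W, V_inv, Sigma):
        Xt = Xi - Wi @ A                                      # X_i - W_i H22^{-1} H21
        meat += Xt.T @ Vi @ Si @ Vi @ Xt
    return H_up @ meat @ H_up, H_up
```

A useful sanity check: when $V_i = \Sigma_i$ the middle sum equals $H_{11\cdot 2}$, so $\Gamma_V$ collapses to $H^{11}$.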
Next, Proposition 2 gives the semiparametric efficiency bound for estimation of $\beta_0$. It can be proved in almost the same way as in Section 4.4 of [13] and Lemma 1 of [4] and the proof is omitted. We denote the semiparametric efficient score function of $\beta$ by $l^*_\beta = (l^*_{\beta 1}, \ldots, l^*_{\beta p})^T$. Its expression is given in Proposition 2. Then we denote $\varphi^*_{\Sigma k}(t)$ by $\varphi^*_{eff,k}(t)$ when $V_i = \Sigma_i$ in (2.1).
Proposition 2. (Semiparametric efficiency bound) Under the same assumptions as in Proposition 1, we have
\[ l^*_{\beta k} = \sum_{i=1}^n (X_{ik} - (Z^T\varphi^*_{eff,k})_i)^T \Sigma_i^{-1} \{ Y_i - X_i\beta_0 - (Z^T g_0)_i \}, \]
and the semiparametric efficient information matrix for $\beta$ is given by
\[ \lim_{n\to\infty} \frac{1}{n} E\{ l^*_\beta (l^*_\beta)^T \} = \Omega_\Sigma \]
with $V_i = \Sigma_i$ in (2.5).
Proposition 3 gives the asymptotic normality of $\hat\beta_\Sigma$, the so-called oracle estimator, which uses the true covariance structure in the GEE spline regression. It also asserts that $\hat\beta_\Sigma$ achieves the semiparametric efficiency bound derived from Proposition 2. The proof is given in the supplement [5].
Proposition 3. (Oracle efficient estimator) If we take $V_i = \Sigma_i$ in (2.2) then, under the same assumptions as in Proposition 1, we have
\[ \sqrt{n}\, \Omega_\Sigma^{1/2} (\hat\beta_\Sigma - \beta_0) \xrightarrow{d} N(0, I_p). \]
In practice, usually the Σi’s are unknown and we have no direct access
to the semiparametric efficient score function or the oracle estimator. In the next section we study nonparametric estimation of the covariances so as to improve the efficiency.
3. Efficient estimation. The semiparametric efficiency bound of $\beta$ given in Proposition 2 indicates that knowledge, or at least estimation, of the $\Sigma_i$'s is necessary in order to construct a semiparametric efficient estimator. On the other hand, as discussed in the Introduction, when the $\Sigma_i$'s are unknown it is almost impossible to estimate them in a fully nonparametric way. Fortunately, for longitudinal or clustered data sets, it is reasonable to make some assumptions such as
\[ \Sigma_i = \Sigma(T_i), \quad i = 1, \ldots, n, \tag{3.1} \]
where the $(j, j)$th element of $\Sigma_i$ is given by $\sigma^2(T_{ij})$ and the $(j, j')$th element is given by $\sigma(T_{ij}, T_{ij'})$ when $j \ne j'$, for some smooth functions $\sigma^2(t)$ and $\sigma(s, t)$. Based on (3.1), in Section 3.1 we construct nonparametric estimates of the covariances and then use them to derive an FGLS procedure to improve the efficiency, and we show in Section 3.2 its asymptotic equivalence to the oracle estimator $\hat\beta_\Sigma$. We also discuss estimation of the nonparametric component.
3.1. Methodology. A preliminary estimation of $\beta_0$ and $g_0$ is necessary
before we can estimate the covariances. For simplicity and robustness, we utilize working independence in the GEE spline estimation. As noted following Proposition 1, we could then use the resultant residuals to estimate the covariance matrices directly. However, it is intuitively better to further make use of the covariance structure (3.1) by applying some nonparametric smoothing techniques to the residuals. In addition, as an alternative to the spline estimator, we could apply smoothing techniques to the pseudo responses $Y_i - X_i\hat\beta_V$ to obtain another estimator of $g_0$. We take this latter approach for technical and numerical reasons given in Remark 1. After the preliminary estimation, for each $i = 1, \ldots, n$, we estimate $\Sigma_i$ by applying local linear regression and denote the resultant estimate by $\hat\Sigma_i$. Our final estimator of $\beta_0$ is then obtained by taking $V_i = \hat\Sigma_i$, $i = 1, \ldots, n$, in the GEE spline estimation. Note that in the trivial case where $m_i$ is fixed for all $i$ and the $T_{ij}$'s are
Let K be a given kernel function. Our estimation procedure is formally specified as follows:
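Step 1. Estimate $\beta_0$ by the GEE spline method given in Section 2 with $V_i = I_{m_i}$, $i = 1, \ldots, n$, and denote the resultant working independence estimate by $\hat\beta_I$.
Step 2. Estimate $g_0(t)$ by applying local linear regression to $\{Y_{ij} - X_{ij}^T\hat\beta_I,\ i = 1, \ldots, n,\ j = 1, \ldots, m_i\}$, using bandwidth $h_1$. We denote the resultant estimate by $\hat g(t)$, which is written as
\[ \hat g(t) = D_q (A_{1n}(t))^{-1} \frac{1}{N_1 h_1} \sum_{i=1}^n \sum_{j=1}^{m_i} \Big\{ Z_{ij} \otimes \begin{pmatrix} 1 \\ \frac{T_{ij}-t}{h_1} \end{pmatrix} \Big\} K\Big(\frac{T_{ij}-t}{h_1}\Big) (Y_{ij} - X_{ij}^T\hat\beta_I), \tag{3.2} \]
where $N_1 = \sum_{i=1}^n m_i$, $D_q = I_q \otimes (1\ 0)$, and
\[ A_{1n}(t) = \frac{1}{N_1 h_1} \sum_{i=1}^n \sum_{j=1}^{m_i} (Z_{ij} Z_{ij}^T) \otimes \begin{pmatrix} 1 & \frac{T_{ij}-t}{h_1} \\ \frac{T_{ij}-t}{h_1} & \big(\frac{T_{ij}-t}{h_1}\big)^2 \end{pmatrix} K\Big(\frac{T_{ij}-t}{h_1}\Big). \]
Step 3. Calculate the residuals, denoted as $\hat\epsilon_{ij}$, given by
\[ \hat\epsilon_{ij} = Y_{ij} - X_{ij}^T\hat\beta_I - Z_{ij}^T\hat g(T_{ij}), \quad i = 1, \ldots, n,\ j = 1, \ldots, m_i. \]
Step 4. Estimate the variance function $\sigma^2(t)$ by applying to the squared residuals local linear regression with bandwidth $h_2$. Denote the resultant estimate by $\widehat{\sigma^2}(t)$; it can be expressed as
\[ \widehat{\sigma^2}(t) = (1\ 0)(A_{2n}(t))^{-1} \frac{1}{N_1 h_2} \sum_{i=1}^n \sum_{j=1}^{m_i} \begin{pmatrix} 1 \\ \frac{T_{ij}-t}{h_2} \end{pmatrix} K\Big(\frac{T_{ij}-t}{h_2}\Big) (\hat\epsilon_{ij})^2, \tag{3.3} \]
where
\[ A_{2n}(t) = \frac{1}{N_1 h_2} \sum_{i=1}^n \sum_{j=1}^{m_i} \begin{pmatrix} 1 & \frac{T_{ij}-t}{h_2} \\ \frac{T_{ij}-t}{h_2} & \big(\frac{T_{ij}-t}{h_2}\big)^2 \end{pmatrix} K\Big(\frac{T_{ij}-t}{h_2}\Big). \]
Step 5. Estimate the covariance function $\sigma(s, t)$ by applying to $\hat\epsilon_{ij}\hat\epsilon_{ij'}$, $j \ne j'$, $i = 1, \ldots, n$, local linear regression with bandwidth $h_3$. We denote the resultant estimate by $\hat\sigma(s, t)$; it has the following expression:
\[ \hat\sigma(s, t) = (1\ 0\ 0)(A_{3n}(s, t))^{-1} \frac{1}{N_2 h_3^2} \sum_{i=1}^n \sum_{j \ne j'} \begin{pmatrix} 1 \\ \frac{T_{ij}-s}{h_3} \\ \frac{T_{ij'}-t}{h_3} \end{pmatrix} K\Big(\frac{T_{ij}-s}{h_3}\Big) K\Big(\frac{T_{ij'}-t}{h_3}\Big) \hat\epsilon_{ij}\hat\epsilon_{ij'}, \tag{3.4} \]
where $N_2 = \sum_{i=1}^n m_i(m_i - 1)$ and
\[ A_{3n}(s, t) = \frac{1}{N_2 h_3^2} \sum_{i=1}^n \sum_{j \ne j'} \begin{pmatrix} 1 \\ \frac{T_{ij}-s}{h_3} \\ \frac{T_{ij'}-t}{h_3} \end{pmatrix} \begin{pmatrix} 1 & \frac{T_{ij}-s}{h_3} & \frac{T_{ij'}-t}{h_3} \end{pmatrix} K\Big(\frac{T_{ij}-s}{h_3}\Big) K\Big(\frac{T_{ij'}-t}{h_3}\Big). \]
Step 6. Calculate $\hat\Sigma_i$ by combining the results from steps 4 and 5 by letting
\[ \hat\Sigma_i(j, j') = \hat\sigma(T_{ij}, T_{ij'}) I(j \ne j') + \widehat{\sigma^2}(T_{ij}) I(j = j'), \]
and then estimate $\beta_0$ with $V_i = \hat\Sigma_i$ in the GEE (2.2). Denote the resultant estimate of $\beta_0$ by $\hat\beta_{\hat\Sigma}$.
Step 7. Update the nonparametric estimator of $g_0(t)$ given in Step 2 by replacing $Y_{ij} - X_{ij}^T\hat\beta_I$ with $Y_{ij} - X_{ij}^T\hat\beta_{\hat\Sigma}$, $i = 1, \ldots, n$, $j = 1, \ldots, m_i$. Denote the resultant estimator by $\hat g_U(t)$. Alternatively, we can estimate $g_0(t)$ with splines, by replacing $\beta$ with $\hat\beta_{\hat\Sigma}$ and taking $V_i = \hat\Sigma_i$ in the GEE (2.2). Denote the resultant estimator by $\hat g_S(t)$.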
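The scalar local linear smoother behind (3.3) in Step 4 can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the biweight kernel is one hypothetical choice of a continuously differentiable symmetric density with compact support, and the names `biweight`, `local_linear` and `variance_function` are invented for this example. The normalizing factor $1/(N_1 h_2)$ cancels between $A_{2n}(t)^{-1}$ and the weighted sum, so it is omitted.

```python
import numpy as np

def biweight(u):
    """Biweight kernel (15/16)(1 - u^2)^2 on [-1, 1]; an assumed choice of K."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u ** 2) ** 2, 0.0)

def local_linear(t0, T, R, h, kernel=biweight):
    """Local linear estimate at t0 of E(R | T = t0) with bandwidth h.

    T and R are pooled 1-D arrays over all (i, j); returns the fitted intercept,
    i.e. (1 0) A^{-1} sum_j (1, u_j)^T K(u_j) R_j with u_j = (T_j - t0) / h.
    """
    T = np.asarray(T, dtype=float)
    R = np.asarray(R, dtype=float)
    u = (T - t0) / h
    w = kernel(u)
    design = np.column_stack([np.ones_like(u), u])
    A = design.T @ (w[:, None] * design)
    b = design.T @ (w * R)
    return np.linalg.solve(A, b)[0]

def variance_function(grid, T, resid, h2):
    """Step 4: smooth the squared residuals to estimate sigma^2(t) on a grid."""
    return np.array([local_linear(t, T, np.asarray(resid) ** 2, h2) for t in grid])
```

Because the fit is locally linear, it reproduces any exactly linear response, which is a convenient check. The bivariate smoother (3.4) of Step 5 follows the same pattern with a three-column local design and a product kernel.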
In general the covariance function estimate $\hat\sigma(s, t)$ given by step 5 may not be positive semidefinite. We can modify it by truncating the eigenfunctions in its spectral decomposition that have eigenvalues not exceeding some nonnegative constant $\lambda_L$. Then we have positive definite covariance estimates if we replace $\hat\sigma(s, t)$ with this modified version in step 6.
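A simplified matrix-level illustration of Step 6 and of the eigenvalue truncation is given below. It is a sketch under assumptions: the source truncates eigenfunctions of the covariance function, whereas this example works on a finite symmetric matrix (for instance the function evaluated on a grid), and the names `assemble_sigma` and `truncate_eigenvalues` are hypothetical. The callables `var_fn` and `cov_fn` stand in for the fitted $\widehat{\sigma^2}(t)$ and $\hat\sigma(s, t)$.

```python
import numpy as np

def assemble_sigma(Ti, var_fn, cov_fn):
    """Step 6: build Sigma_i with sigma(T_ij, T_ij') off the diagonal and
    sigma^2(T_ij) on the diagonal. var_fn and cov_fn must be vectorized."""
    Ti = np.asarray(Ti, dtype=float)
    S = np.array(cov_fn(Ti[:, None], Ti[None, :]), dtype=float)
    np.fill_diagonal(S, var_fn(Ti))
    return S

def truncate_eigenvalues(S, lam_L=0.0):
    """Zero out eigenvalues not exceeding lam_L in the spectral decomposition
    of the symmetrized matrix S (matrix analogue of the modification above)."""
    S = (S + S.T) / 2.0
    vals, vecs = np.linalg.eigh(S)
    vals = np.where(vals > lam_L, vals, 0.0)
    return (vecs * vals) @ vecs.T
```

For example, the matrix with rows (1, 2) and (2, 1) has eigenvalues 3 and −1; truncating at $\lambda_L = 0$ leaves the rank-one matrix with all entries 1.5.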
Remark 1. When we calculate $\hat\beta_I$ in step 1, we also have $\hat\gamma_I$ and get the set of residuals $\{\tilde\epsilon_{ij} = Y_{ij} - X_{ij}^T\hat\beta_I - W_{ij}^T\hat\gamma_I\}$. Then we could omit steps 2 and 3 of our procedure by exploiting this set of residuals when we estimate $\Sigma_i$ in steps 4-6. However, our simulation results summarized in Section 4 indicate that this simplified approach is inferior to the proposed one. Intuitively speaking, to achieve the semiparametric efficiency in the GEE spline estimation of $\beta_0$, to some extent the accompanying estimation of $g_0(t)$ requires undersmoothing and thus it often exhibits spurious wiggling patterns. Besides, it is difficult to justify theoretically this simplified approach as the local property of spline estimators seems to be intractable.
3.2. Asymptotic results. We establish the asymptotic equivalence between the data-driven estimator $\hat\beta_{\hat\Sigma}$ and the oracle estimator $\hat\beta_\Sigma$ by exploiting some desirable properties of $\hat\Sigma_i$. First, we specify our assumptions on the smoothness of $g_0(t)$, $\sigma^2(t)$ and $\sigma(s, t)$. We need Assumption B given below, which is more restrictive than usual, in order to evaluate the difference between $\hat\Sigma_i^{-1}$ and $\Sigma_i^{-1}$.
Assumption B.
(i) Assumption (3.1) holds.
(ii) The true varying coefficient function g0(t) is three times continuously
differentiable on [0, 1].
(iii) The variance function σ2(t) is three times continuously differentiable on [0, 1].
(iv) The covariance function $\sigma(s, t)$ is three times continuously differentiable on $[0, 1]^2$.
In the following we collect our assumptions on the kernel function $K$ and the three bandwidths used in the construction of the proposed estimator. Assumption H(i) on $K$ is a standard one. When Assumption B holds, our assumptions on the bandwidths $h_1$, $h_2$ and $h_3$ are not restrictive. For example, the optimal order of $h_1$ and $h_2$ is $n^{-1/5}$ which falls into the specified range. A larger order is recommended only for $h_3$ due to the two-dimensional smoothing in step 5. However, since the effective number of observations used in step 5 of the procedure is $N_2$ we anticipate that bandwidth choice will not seriously affect the performance of our final estimator.
Assumption H.
(i) The kernel function $K$ is some continuously differentiable symmetric density function with a compact support.
(ii) The bandwidths $h_1$, $h_2$ and $h_3$ satisfy $h_1 = c_1 n^{-a_h}$ for some $1/6 < a_h \le 1/4$, $h_2 = c_2 n^{-b_h}$ for some $1/6 < b_h \le 1/4$ and $h_3 = c_3 n^{-c_h}$ for some $1/6 < c_h < 1/4$, where $c_1$, $c_2$ and $c_3$ are some positive constants.
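The bandwidth conditions in Assumption H(ii) are easy to encode and check. In this sketch the function name `bandwidths`, the default exponents of $1/5$, and the unit constants $c_1 = c_2 = c_3 = 1$ are illustrative assumptions only; in the simulations the authors report choosing the bandwidth by cross-validation.

```python
def bandwidths(n, a_h=0.2, b_h=0.2, c_h=0.2, c=(1.0, 1.0, 1.0)):
    """Return (h1, h2, h3) with h_k = c_k * n^(-exponent), after checking that
    the exponents lie in the ranges required by Assumption H(ii)."""
    if not (1 / 6 < a_h <= 1 / 4):
        raise ValueError("a_h must satisfy 1/6 < a_h <= 1/4")
    if not (1 / 6 < b_h <= 1 / 4):
        raise ValueError("b_h must satisfy 1/6 < b_h <= 1/4")
    if not (1 / 6 < c_h < 1 / 4):
        raise ValueError("c_h must satisfy 1/6 < c_h < 1/4")
    return c[0] * n ** (-a_h), c[1] * n ** (-b_h), c[2] * n ** (-c_h)
```

For instance, with $n = 32$ and all exponents equal to $1/5$, each bandwidth is $32^{-1/5} = 0.5$. Note the strict upper bound on $c_h$: an exponent of exactly $1/4$ is allowed for $h_1$ and $h_2$ but not for $h_3$.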
The asymptotic expression of $\hat\Sigma_i$ is given in Proposition 4, which is verified in the supplementary material [5]. Note that we need more elaborate representations than those used by [14] since we deal with a $(p + qK_n)$-dimensional linear regression model. Note also that the functions $B_j$, $j = 1, \ldots, 4$, that appear in Proposition 4 are implicitly defined in the proof of the proposition and only their boundedness property is needed in the proof of Theorem 1.
Proposition 4. (Representations of the covariance estimators) Under the assumptions in Proposition 1 with $V_i = I_{m_i}$, and Assumptions B and H, we have the following representations of $\widehat{\sigma^2}(t)$ and $\hat\sigma(s, t)$. Uniformly in $t$,
\[ \widehat{\sigma^2}(t) - \sigma^2(t) = B_1(t) h_2^2 + B_2(t) E_1(t) + O_p(h_1^3 + h_2^3) + O_p\Big( \frac{\log n}{n h_1} + \frac{\log n}{n h_2} \Big) \]