Efficient estimation in semivarying coefficient models for longitudinal/clustered data


arXiv:1501.00538v3 [stat.ME] 13 Sep 2015

Supplement to “Efficient estimation in semivarying coefficient models for longitudinal/clustered data”

by Ming-Yen Cheng, Toshio Honda, and Jialiang Li

S.1. Additional simulation results.

S.1.1. Nonparametric component estimates. In Step 7 of our estimation procedure we give both local linear and spline approaches to estimating the nonparametric component after the efficient estimator $\hat\beta_{\hat\Sigma}$ is obtained. In this section we examine their finite sample performance via simulations. For comparison, we also computed the respective initial estimates, that is, the versions using $\hat\beta_I$ instead of $\hat\beta_{\hat\Sigma}$. We considered the same settings as in Section 4, and we used cross-validation to choose the bandwidth used in the local linear estimation. We computed the mean integrated squared error (MISE) for all the function estimates and took their average. The results are given in Table S.1.

The figures in Table S.1 indicate that it is clearly advantageous to update the nonparametric component after efficient estimation of the parametric component. In addition, we observe that the refined local linear and spline estimators perform roughly the same in terms of MISE.
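The MISE criterion used in this comparison can be sketched as follows. This is only a minimal illustration, assuming a trapezoidal approximation of the integral on a grid; the function names, the true coefficient function, and the artificial "estimates" are hypothetical and are not the simulation design of Section 4.

```python
import numpy as np

def integrated_squared_error(g_hat, g_true, grid):
    """Trapezoidal approximation of the integral of (g_hat - g_true)^2 over the grid."""
    diff2 = (np.asarray(g_hat) - np.asarray(g_true)) ** 2
    widths = np.diff(grid)
    return float(np.sum(widths * (diff2[:-1] + diff2[1:]) / 2.0))

def mise(estimates, g_true, grid):
    """Average the integrated squared error over simulation replicates."""
    return float(np.mean([integrated_squared_error(g, g_true, grid) for g in estimates]))

grid = np.linspace(0.0, 1.0, 201)
g_true = np.sin(2 * np.pi * grid)       # hypothetical coefficient function
rng = np.random.default_rng(0)
shifts = rng.normal(scale=0.2, size=200)
# Hypothetical "estimates": the truth plus a constant shift in each replicate.
estimates = [g_true + s for s in shifts]
mise_value = mise(estimates, g_true, grid)
```

When several coefficient functions are estimated, the same quantity would be computed for each function and then averaged, as described above.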

Table S.1
MISE for simulation studies.

                Local linear estimate     Spline estimate
  n      ρ      Initial     Refined       Initial     Refined
 100    0.4     .0449       .0354         .0492       .0376
 100    0.8     .0691       .0597         .0639       .0593
 200    0.4     .0390       .0315         .0415       .0355
 200    0.8     .0595       .0589         .0584       .0576

S.1.2. Parametric component estimates. We note that we adjusted the covariance function estimate $\hat\sigma(s, t)$ by setting all negative eigenvalues to zero. We also considered a strictly positive threshold $\lambda_L = 0.05$ and set all eigenvalues lower than $\lambda_L$ to zero. The estimator using this covariance estimate is denoted by “Positive” in Table S.2. Because the “Positive” estimator sets every eigenvalue below a positive cut-off to zero, whereas the efficient estimator adjusts only the negative eigenvalues, it is slightly more biased than the efficient estimator. In all the considered cases, the crude and positive estimators are still more efficient than the working independence estimator.
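The eigenvalue adjustment described above can be sketched on a single covariance matrix. This is a minimal illustration, assuming the adjustment is applied to a symmetric matrix through its eigendecomposition; the function name `truncate_eigenvalues` and the example matrix are hypothetical, and the paper applies the idea to the estimated covariance function $\hat\sigma(s, t)$.

```python
import numpy as np

def truncate_eigenvalues(cov, threshold=0.0):
    """Set every eigenvalue of a symmetric matrix that lies below `threshold` to zero."""
    sym = (cov + cov.T) / 2.0                  # guard against small asymmetries
    eigvals, eigvecs = np.linalg.eigh(sym)
    kept = np.where(eigvals < threshold, 0.0, eigvals)
    return (eigvecs * kept) @ eigvecs.T        # V diag(kept) V^T

# Hypothetical raw covariance estimate that is not nonnegative definite.
raw = np.array([[1.0, 0.9, 0.2],
                [0.9, 1.0, 0.9],
                [0.2, 0.9, 1.0]])

adjusted = truncate_eigenvalues(raw, threshold=0.0)    # negative eigenvalues only
positive = truncate_eigenvalues(raw, threshold=0.05)   # cut-off lambda_L = 0.05
```

With `threshold=0.0` only negative eigenvalues are removed; with `threshold=0.05` small positive eigenvalues are removed as well, which is the source of the extra bias mentioned above.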


Recall that in all the numerical analyses reported in the paper, $h_1$ and $h_2$ were selected via the commonly used leave-one-subject-out cross-validation, and the bandwidth $h_3$ used in the estimation of the covariance structure was selected as $h_3 = 2h_1$. To examine the effects of the bandwidth choice, we considered various choices of $h_3$ in the numerical studies and obtained quite similar results. Under the column “Different h3”, we report the results for another case, $h_3 = 1.5h_1$, which are similar to those obtained when $h_3 = 2h_1$.
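Leave-one-subject-out cross-validation can be sketched as follows. This is a simplified illustration, assuming a single scalar regression function, a local linear smoother with the Epanechnikov kernel, and a small grid of candidate bandwidths; the data, the candidate grid, and all function names are hypothetical and much simpler than the semivarying coefficient model of the paper.

```python
import numpy as np

def local_linear(t0, t, y, h):
    """Local linear fit at t0 with the Epanechnikov kernel and bandwidth h."""
    u = (t - t0) / h
    w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)
    X = np.column_stack([np.ones_like(t), t - t0])
    XtW = X.T * w
    # A tiny ridge term keeps the 2x2 system solvable when few points get weight.
    coef = np.linalg.solve(XtW @ X + 1e-10 * np.eye(2), XtW @ y)
    return coef[0]

def loso_cv_score(h, subject, t, y):
    """Mean squared prediction error when each subject is left out in turn."""
    err = 0.0
    for s in np.unique(subject):
        held = subject == s
        for t0, y0 in zip(t[held], y[held]):
            err += (y0 - local_linear(t0, t[~held], y[~held], h)) ** 2
    return err / len(y)

rng = np.random.default_rng(1)
n, m = 40, 5                                   # hypothetical: 40 subjects, 5 visits each
subject = np.repeat(np.arange(n), m)
t = rng.uniform(size=n * m)
y = np.sin(2 * np.pi * t) + rng.normal(scale=0.3, size=n * m)

candidates = [0.05, 0.1, 0.2, 0.4]
scores = [loso_cv_score(h, subject, t, y) for h in candidates]
h1 = candidates[int(np.argmin(scores))]
h3 = 2.0 * h1                                  # the rule h3 = 2 * h1 used in the paper
```

All observations of a subject are removed together, so the within-subject correlation does not leak into the prediction error used to rank the bandwidths.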

Our procedure does not require any iteration. In practice it may be interesting to refine the estimates of the coefficients and covariances by iterating and to take the final estimates upon convergence. We report the numerical results under the “Iterative” column. The bias and SE are very close to those obtained without iteration.

Table S.2
Estimation results of 200 simulations. “Positive” means we set a positive threshold for the covariance eigenvalues; “Different h3” means using a different choice of h3 in our efficient estimation; “Iterative” indicates an iterative estimation approach.

                      Positive           Different h3       Iterative
  n     ρ            bias     SE        bias     SE        bias     SE
 100   0.4    β1    .0173   .0411     -.0152   .0375     -.0146   .0361
              β2    .0176   .0423     -.0098   .0375     -.0095   .0352
              β3    .0205   .0425     -.0122   .0369     -.0099   .0360
              β4   -.0096   .0425      .0098   .0373     -.0086   .0362
 200   0.4    β1   -.0113   .0329      .0056   .0274      .0045   .0228
              β2   -.0164   .0334     -.0099   .0274     -.0066   .0219
              β3    .0120   .0323      .0072   .0273      .0034   .0259
              β4   -.0095   .0329     -.0043   .0276     -.0035   .0274
 100   0.8    β1    .0202   .0366      .0082   .0336      .0065   .0325
              β2    .0163   .0378     -.0075   .0335     -.0034   .0323
              β3    .0197   .0372      .0166   .0337      .0121   .0328
              β4   -.0168   .0354     -.0182   .0338      .0157   .0325
 200   0.8    β1   -.0044   .0214     -.0124   .0202      .0056   .0199
              β2    .0036   .0215      .0138   .0200     -.0049   .0199
              β3    .0042   .0215      .0165   .0204      .0052   .0178
              β4   -.0038   .0214     -.0148   .0200     -.0050   .0179

S.2. Proofs of Propositions 1-3 and Lemma 1. In this section, we outline the proofs of Propositions 1-3 and present the proof of Lemma 1. When $m_i$ is uniformly bounded, we have the same results for general link functions by just following closely the arguments of [3]. We outline the results at the end of this supplement. Note that the sub-Gaussian error assumption is necessary in that case. We outline the proofs of Propositions 1-3 since we allow some of the $m_i$'s to diverge as in Assumptions A1 and A2.


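Proof of Proposition 1. First we consider the properties of $\Gamma_V$. The $(k, l)$th element of $n^{-1}H_{11\cdot 2}$ is given by
$$\langle X_k - Z^T\hat\varphi_{Vk},\, X_l - Z^T\hat\varphi_{Vl}\rangle_n^V.$$
From Lemma 1 (v)-(vii), we have
$$\langle X_k - Z^T\hat\varphi_{Vk},\, X_l - Z^T\hat\varphi_{Vl}\rangle_n^V = \langle X_k - Z^T\varphi^*_{Vk},\, X_l - Z^T\varphi^*_{Vl}\rangle_n^V + o_p(1) = \langle X_k - Z^T\varphi^*_{Vk},\, X_l - Z^T\varphi^*_{Vl}\rangle^V + o_p(1).$$
This and (2.5) imply that for some positive constants $C_1$ and $C_2$, we have
$$C_1 \le \lambda_{\min}(n^{-1}H_{11\cdot 2}) \le \lambda_{\max}(n^{-1}H_{11\cdot 2}) \le C_2$$
and hence
$$\frac{1}{nC_2} \le \lambda_{\min}(H^{11}) \le \lambda_{\max}(H^{11}) \le \frac{1}{nC_1} \tag{S.1}$$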

with probability tending to 1. Note that
$$\mathrm{Var}(\hat\beta_V \mid \{X_{ij}\}, \{Z_{ij}\}, \{T_{ij}\}) = \Gamma_V$$
and Theorem 1 of [13] implies that $\Gamma_V - H^{11}$ is nonnegative definite when $H^{11}$ is defined with $V_i = \Sigma_i$. Hence for some positive constant $C_3$, we have
$$\lambda_{\min}(\Gamma_V) \ge \frac{C_3}{n}$$
with probability tending to 1.

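Now we prove the asymptotic normality of
$$\hat\beta_V - E\{\hat\beta_V \mid \{X_{ij}\}, \{Z_{ij}\}, \{T_{ij}\}\} = H^{11}\Big(\sum_{i=1}^n X_i^T V_i^{-1}\epsilon_i - H_{12}H_{22}^{-1}\sum_{i=1}^n W_i^T V_i^{-1}\epsilon_i\Big).$$
As in the proof of Theorem 2 of [13], we take $c \in R^p$ such that $|c| = 1$ and write
$$c^T(\hat\beta_V - E\{\hat\beta_V \mid \{X_{ij}\}, \{Z_{ij}\}, \{T_{ij}\}\}) = \sum_{i=1}^n a_i\eta_i \quad \text{(say)},$$
where
$$a_i^2 = c^T H^{11}(X_i - W_iH_{22}^{-1}H_{21})^T V_i^{-1}\Sigma_i V_i^{-1}(X_i - W_iH_{22}^{-1}H_{21})H^{11}c$$
and $\{\eta_i\}$ is a sequence of conditionally independent random variables with
$$E\{\eta_i \mid \{X_{ij}\}, \{Z_{ij}\}, \{T_{ij}\}\} = 0 \quad \text{and} \quad \mathrm{Var}(\eta_i \mid \{X_{ij}\}, \{Z_{ij}\}, \{T_{ij}\}) = 1.$$
We have from (S.1) and Lemma 1 (vii) that
$$\max_{1\le i\le n} a_i^2 = O_p\Big(\frac{m_{\max}^2}{n^2}\sum_{k=1}^p \|X_k - Z^T\hat\varphi_{Vk}\|_\infty^2\Big) = O_p\Big(\frac{m_{\max}^2}{n^2}\Big).$$
On the other hand, we have for some positive constant $C_4$,
$$\sum_{i=1}^n a_i^2 = c^T\Gamma_V c \ge \frac{C_4}{n}$$
with probability tending to 1. Hence we have established
$$\frac{\max_{1\le i\le n} a_i^2}{\sum_{i=1}^n a_i^2} = O_p(n^{-1}m_{\max}^2) = o_p(1)$$
and it follows from the standard argument that
$$\Big(\sum_{i=1}^n a_i^2\Big)^{-1/2}\sum_{i=1}^n a_i\eta_i \stackrel{d}{\to} N(0, 1). \tag{S.2}$$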
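The negligibility condition behind (S.2) can be spelled out in one line. The sketch below only combines the two bounds stated in the proof; the rate $m_{\max} = O(n^{1/8})$ is the one quoted for Assumption A1 later in this supplement and is used here purely as an illustration.

```latex
\frac{\max_{1\le i\le n} a_i^2}{\sum_{i=1}^n a_i^2}
  \;\le\; O_p\!\Big(\frac{m_{\max}^2}{n^2}\Big)\cdot\frac{n}{C_4}
  \;=\; O_p\!\Big(\frac{m_{\max}^2}{n}\Big),
\qquad
m_{\max} = O(n^{1/8}) \;\Longrightarrow\; \frac{m_{\max}^2}{n} = O(n^{-3/4}) \to 0 .
```

No single cluster therefore dominates the weighted sum $\sum_i a_i\eta_i$, which is the requirement of Lindeberg-type central limit arguments.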

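Finally we evaluate the conditional bias:
$$\mathrm{Bias}_\beta = E\{\hat\beta_V \mid \{X_{ij}\}, \{Z_{ij}\}, \{T_{ij}\}\} - \beta_0.$$
Take $\tilde g \in G_B$ such that $\|g_0 - \tilde g\|_{G,\infty} = O(K_n^{-2})$ and set
$$\delta_0 = g_0 - \tilde g \quad \text{and} \quad \boldsymbol\delta_0 = Z^T\delta_0.$$
Note that
$$\|\boldsymbol\delta_0\|_\infty = O(K_n^{-2}) \quad \text{and} \quad \|\boldsymbol\delta_0\|^V = O(K_n^{-2}).$$
We also take $\tilde\varphi_{Vk} \in G_B$ such that $\|\varphi^*_{Vk} - \tilde\varphi_{Vk}\|_{G,\infty} = O(K_n^{-2})$. Then we have the following expression for the conditional bias:
$$\mathrm{Bias}_\beta = nH^{11}(S_1, \ldots, S_p)^T,$$
where
$$\begin{aligned}
S_k &= \langle X_k,\, \boldsymbol\delta_0 - Z^T\hat\Pi_{Vn}\boldsymbol\delta_0\rangle_n^V = \langle X_k - Z^T\tilde\varphi_{Vk},\, \boldsymbol\delta_0 - Z^T\hat\Pi_{Vn}\boldsymbol\delta_0\rangle_n^V \\
&= \langle X_k - Z^T\varphi^*_{Vk},\, \boldsymbol\delta_0 - Z^T\Pi_{Vn}\boldsymbol\delta_0\rangle_n^V + \langle X_k - Z^T\varphi^*_{Vk},\, Z^T\Pi_{Vn}\boldsymbol\delta_0 - Z^T\hat\Pi_{Vn}\boldsymbol\delta_0\rangle_n^V \\
&\quad + \langle Z^T\varphi^*_{Vk} - Z^T\tilde\varphi_{Vk},\, \boldsymbol\delta_0 - Z^T\hat\Pi_{Vn}\boldsymbol\delta_0\rangle_n^V \\
&= S_{1k} + S_{2k} + S_{3k} \quad \text{(say)}.
\end{aligned}$$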


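Note that
$$E\{S_{1k}\} = 0 \quad \text{and} \quad E\{S_{1k}^2\} = O\Big(\frac{(\|X_k - Z^T\varphi^*_{Vk}\|^V)^2}{K_n^3\, n}\Big)$$
since $S_{1k}$ is a sum of independent random variables, $\varphi^*_{Vk} = \Pi_V X_k$, $\boldsymbol\delta_0 = Z^T\delta_0$, and
$$\|\boldsymbol\delta_0 - Z^T\Pi_{Vn}\boldsymbol\delta_0\|_\infty \le \|\boldsymbol\delta_0\|_\infty + CK_n^{1/2}\|Z^T\Pi_{Vn}\boldsymbol\delta_0\|^V \le \|\boldsymbol\delta_0\|_\infty + CK_n^{1/2}\|\boldsymbol\delta_0\|^V = O(K_n^{-3/2}).$$
Hence we have
$$S_{1k} = O_p(1/(nK_n^3)^{1/2}) = o_p(n^{-1/2}).$$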

Now we deal with S2k. From Lemma 1 (vi) and the fact that kδ0−ZTΠVnδ0k∞=

since kδ0− ZTΠbVnδ0kVn ≤ kδ0kVn. Hence we have

Biasβ = op(n−1/2) .

The desired result follows from (S.2) and the above equality.

As for Proposition 2, there is almost no change in the calculation of the score functions in [13] and [4], and we omit the outline. This is because $m_i$ is bounded for any fixed $n$.

Proof of Proposition 3. When Vi= Σi, we have


Lemma 1 (vii) implies that
$$\frac{1}{n}\Gamma_V^{-1} = \frac{1}{n}H_{11\cdot 2} = \frac{1}{n}E\{l^*_\beta (l^*_\beta)^T\} + o_p(1) = \Omega_\Sigma + o_p(1).$$
The desired result follows from the above result and Proposition 1.

Proof of Lemma 1. The proof consists of seven parts.

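(i) Recall that
$$(\|Z^Tg\|^V)^2 = \frac{1}{n}E\Big\{\sum_{i=1}^n (Z^Tg)_i^T V_i^{-1}(Z^Tg)_i\Big\}.$$
We have from Assumptions A4 and A5 that
$$\frac{C_1}{n}E\Big\{\sum_{i=1}^n \frac{1}{m_i}\sum_{j=1}^{m_i} g^T(T_{ij})Z_{ij}Z_{ij}^Tg(T_{ij})\Big\} \le (\|Z^Tg\|^V)^2 \le \frac{C_2}{n}E\Big\{\sum_{i=1}^n \sum_{j=1}^{m_i} g^T(T_{ij})Z_{ij}Z_{ij}^Tg(T_{ij})\Big\} \tag{S.3}$$
for some positive constants $C_1$ and $C_2$. Assumptions A2 and A3 imply that for some positive constants $C_3$ and $C_4$,
$$C_3\sum_{l=1}^q \int g_l^2(t)\,dt \le \frac{1}{n}E\Big\{\sum_{i=1}^n \frac{1}{m_i}\sum_{j=1}^{m_i} g^T(T_{ij})Z_{ij}Z_{ij}^Tg(T_{ij})\Big\} \le \frac{1}{n}E\Big\{\sum_{i=1}^n \sum_{j=1}^{m_i} g^T(T_{ij})Z_{ij}Z_{ij}^Tg(T_{ij})\Big\} \le C_4\sum_{l=1}^q \int g_l^2(t)\,dt. \tag{S.4}$$
The desired result follows from (S.3) and (S.4).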

(ii) This is a well-known result in the literature of spline regression. See for example A.2 of [12].

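(iii) The result in (ii) implies
$$\|X^T\beta + Z^Tg\|_\infty^2 \le CK_n\big(|\beta|^2 + \|g\|_{G,2}^2\big)$$
for some positive constant $C$. Recall that $p$ and $q$ are fixed in this paper. On the other hand, we have from Assumptions A1-3 and A5 that for some positive constants $C_1$, $C_2$, and $C_3$,
$$\begin{aligned}
(\|X^T\beta + Z^Tg\|^V)^2 &\ge \frac{C_1}{n}E\Big\{\sum_{i=1}^n \frac{1}{m_i}\sum_{j=1}^{m_i} (\beta^T\ g^T(T_{ij}))\begin{pmatrix} X_{ij}X_{ij}^T & X_{ij}Z_{ij}^T \\ Z_{ij}X_{ij}^T & Z_{ij}Z_{ij}^T \end{pmatrix}\begin{pmatrix} \beta \\ g(T_{ij}) \end{pmatrix}\Big\} \\
&\ge \frac{C_2}{n}E\Big\{\sum_{i=1}^n \frac{1}{m_i}\sum_{j=1}^{m_i} (\beta^T\ g^T(T_{ij}))\begin{pmatrix} \beta \\ g(T_{ij}) \end{pmatrix}\Big\} \\
&\ge C_3\big(|\beta|^2 + \|g\|_{G,2}^2\big).
\end{aligned}$$
Besides, we have for some positive constants $C_1$ and $C_2$,
$$(\|v\|^V)^2 \le \frac{C_1}{n}\sum_{i=1}^n \sum_{j=1}^{m_i} |v_{ij}|^2 \le C_2\|v\|_\infty^2.$$
Hence the desired results are established.

(iv) For $g_1 \in G_B$ and $g_2 \in G_B$, we have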

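$$\langle Z^Tg_1, Z^Tg_2\rangle_n^V = \gamma_1^T\Big\{\frac{1}{n}\sum_{i=1}^n W_i^TV_i^{-1}W_i\Big\}\gamma_2 = \gamma_1^T\Delta_{Vn}\gamma_2 \quad \text{(say)},$$
where $\Delta_{Vn}$ is a $qK_n \times qK_n$ matrix and $\gamma_1$ and $\gamma_2$ correspond to $g_1$ and $g_2$, respectively. Elements of $n^{-1}\sum_{i=1}^n W_i^TV_i^{-1}W_i$ are written as
$$\frac{1}{n}\sum_{i=1}^n \sum_{j_1, j_2} v_i^{j_1j_2}B_{k_1}(T_{ij_1})B_{k_2}(T_{ij_2})Z_{ij_1l_1}Z_{ij_2l_2} = \Delta_{Vn}^{(k_1,l_1,k_2,l_2)} \quad \text{(say)}, \tag{S.5}$$
where $v_i^{j_1j_2}$ is defined in (??), $1 \le k_1, k_2 \le K_n$, and $1 \le l_1, l_2 \le q$. By evaluating the variance of (S.5) and using the Bernstein inequality for independent bounded random variables, and Assumptions A1 and A2, we have uniformly in $k_1$, $k_2$, $l_1$, and $l_2$,
$$\Delta_{Vn}^{(k_1,l_1,k_2,l_2)} - E(\Delta_{Vn}^{(k_1,l_1,k_2,l_2)}) = O_p\Big(\sqrt{\frac{\log n}{nK_n^2}}\Big) \quad \text{if } B_{k_1}(t)B_{k_2}(t) \equiv 0 \tag{S.6}$$
and
$$\Delta_{Vn}^{(k_1,l_1,k_2,l_2)} - E(\Delta_{Vn}^{(k_1,l_1,k_2,l_2)}) = O_p\Big(\sqrt{\frac{\log n}{nK_n}}\Big) \quad \text{if } B_{k_1}(t)B_{k_2}(t) \not\equiv 0. \tag{S.7}$$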

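By exploiting (S.6), (S.7), and the local property of the B-spline basis, we obtain
$$\max\{|\lambda_{\min}(\Delta_{Vn} - E(\Delta_{Vn}))|, |\lambda_{\max}(\Delta_{Vn} - E(\Delta_{Vn}))|\} = O_p\Big(\sqrt{\frac{\log n}{n}}\Big). \tag{S.8}$$
We also have
$$\frac{C_1}{K_n} \le \lambda_{\min}(E(\Delta_{Vn})) \le \lambda_{\max}(E(\Delta_{Vn})) \le \frac{C_2}{K_n} \tag{S.9}$$
since Assumptions A2 and A3 yield
$$\frac{C_3}{n}\sum_{i=1}^n \frac{1}{m_i}\sum_{j=1}^{m_i} (Z_{ij} \otimes B(T_{ij}))^T(Z_{ij} \otimes B(T_{ij})) \le \Delta_{Vn} \le \frac{C_4}{n}\sum_{i=1}^n \sum_{j=1}^{m_i} (Z_{ij} \otimes B(T_{ij}))^T(Z_{ij} \otimes B(T_{ij}))$$
for some positive constants $C_3$ and $C_4$. See the proof of Lemma A.3 of [12]. Hence the desired result follows from (S.8) and (S.9).

(v) This follows from (iv) and (vi).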

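(vi) Using Assumptions A1 and A2 we have
$$\langle \delta_n, Z_lB_k\rangle_n^V = \frac{1}{n}\sum_{i=1}^n \sum_{j_1, j_2} \delta_{n,ij_1}v_i^{j_1j_2}Z_{ij_2l}B_k(T_{ij_2})$$
and
$$\mathrm{Var}(\langle \delta_n, Z_lB_k\rangle_n^V) \le \frac{C_1\|\delta_n\|_\infty^2}{n^2}\sum_{i=1}^n m_i^2\sum_{j_1, j_2} E\{B_k^2(T_{ij_1})B_k^2(T_{ij_2})\} \le \frac{C_2\|\delta_n\|_\infty^2}{nK_n}$$
for some positive constants $C_1$ and $C_2$. Hence we have
$$\sum_{l=1}^q \sum_{k=1}^{K_n} \mathrm{Var}(\langle \delta_n, Z_lB_k\rangle_n^V) \le \frac{C}{n}\|\delta_n\|_\infty^2$$
for some positive constant $C$ and the desired result follows from (S.9).

(vii) Take $\tilde\varphi_{Vk} \in G_B$ such that $\|\tilde\varphi_{Vk} - \varphi^*_{Vk}\|_{G,\infty} = O(K_n^{-2})$. Then we have for some positive $C$,
$$\begin{aligned}
\|Z^T(\varphi_{Vk} - \varphi^*_{Vk})\|_\infty &\le \|Z^T(\varphi_{Vk} - \tilde\varphi_{Vk})\|_\infty + \|Z^T(\tilde\varphi_{Vk} - \varphi^*_{Vk})\|_\infty \\
&\le C\sqrt{K_n}\,\|Z^T(\varphi_{Vk} - \tilde\varphi_{Vk})\|^V + \|Z^T(\tilde\varphi_{Vk} - \varphi^*_{Vk})\|_\infty \\
&\le C\sqrt{K_n}\,\|Z^T(\varphi^*_{Vk} - \tilde\varphi_{Vk})\|^V + \|Z^T(\tilde\varphi_{Vk} - \varphi^*_{Vk})\|_\infty = O(K_n^{-3/2}).
\end{aligned} \tag{S.10}$$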


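Here we used the fact that $\varphi_{Vk} = \Pi_{Vn}X_k \in G_B$ and $\varphi^*_{Vk} = \Pi_VX_k$. Inequality (S.10) implies $\|Z^T\varphi_{Vk}\|_\infty = O(1)$ and we have only to evaluate $Z^T(\varphi_{Vk} - \hat\varphi_{Vk})$. We should just follow the arguments on p.16 of [3] by replacing $\varphi^*_{k,n}$ and $\hat\varphi_{k,n}$ with $Z^T\varphi_{Vk}$ and $Z^T\hat\varphi_{Vk}$ since the arguments employ (iv) and (vi) and do not depend on $m_i$. Then we have
$$\|Z^T(\varphi_{Vk} - \hat\varphi_{Vk})\|_\infty = o_p(1), \quad \|Z^T(\varphi_{Vk} - \hat\varphi_{Vk})\|_n^V = O_p(\sqrt{K_n/n}), \quad \text{and} \quad \|Z^T(\varphi_{Vk} - \hat\varphi_{Vk})\|^V = O_p(\sqrt{K_n/n}).$$
The desired results follow from the above equations and (S.10).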
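The eigenvalue scaling in (S.9) of part (iv) can be checked numerically in a toy case. The sketch below is only an illustration under strong simplifying assumptions: a single coefficient ($q = 1$, $Z_{ij} \equiv 1$), $V_i$ equal to the identity, $T_{ij}$ uniform on $[0, 1]$, and linear B-splines (hat functions) with equally spaced knots. The function names are hypothetical.

```python
import numpy as np

def hat_basis(t, K):
    """Evaluate K linear B-splines (hat functions) with equally spaced knots on [0, 1]."""
    knots = np.linspace(0.0, 1.0, K)
    d = knots[1] - knots[0]
    return np.clip(1.0 - np.abs(t[:, None] - knots[None, :]) / d, 0.0, None)

def gram_matrix(K, n_grid=20001):
    """Approximate E{B(T) B(T)^T} for T uniform on [0, 1] by an average over a fine grid."""
    t = np.linspace(0.0, 1.0, n_grid)
    B = hat_basis(t, K)
    return B.T @ B / n_grid

for K in (5, 10, 20):
    ev = np.linalg.eigvalsh(gram_matrix(K))
    # K * eigenvalue stays within fixed positive bounds as K grows.
    print(K, K * ev.min(), K * ev.max())
```

Basis functions with disjoint supports give zero entries, so the matrix is banded; this is the local property of the B-spline basis used to pass from the elementwise bounds (S.6) and (S.7) to the eigenvalue bound (S.8).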

S.3. Proof of Proposition 4. In the proof, we repeatedly use arguments based on exponential inequalities, truncation, and division of regions into small rectangles to prove uniform convergence results as in [S3]. We do not give the details of these arguments since they are standard ones in nonparametric kernel methods. Since we impose Assumption A2 and we do not use $\Sigma_i$ or $V_i$ in the construction of $\hat g(t)$, $\hat\sigma^2(t)$, and $\hat\sigma(s, t)$, we see the effects of diverging $m_i$ explicitly only when applying the exponential inequality for generalized U-statistics. Recall that we assume three times continuous differentiability of the relevant functions in this proposition.

The proof consists of four parts: (i) representation of $\hat g(t)$, (ii) representation of $\hat\epsilon_{ij}$, (iii) representation of $\hat\sigma^2(t)$, and (iv) representation of $\hat\sigma(s, t)$.

(i) Representation of $\hat g(t)$. Applying the third order Taylor series expansion to $g_0(t)$, we have
$$Z_{ij}^Tg_0(T_{ij}) = Z_{ij}^T\Big\{g_0(t) + h_1\Big(\frac{T_{ij} - t}{h_1}\Big)g_0'(t) + \frac{h_1^2}{2}\Big(\frac{T_{ij} - t}{h_1}\Big)^2 g_0''(t)\Big\} + O(h_1^3), \tag{S.11}$$
where $g_0'(t) = (g_{01}'(t), \ldots, g_{0q}'(t))^T$ and $g_0''(t) = (g_{01}''(t), \ldots, g_{0q}''(t))^T$. By plugging (S.11) into (3.2), we have uniformly in $t$,
$$\hat g(t) = g_0(t) + D_q(\hat L_1(t))^{-1}\hat L_2(t)(\beta_0 - \hat\beta_I) + \frac{h_1^2}{2}D_q(\hat L_1(t))^{-1}\hat L_3(t)g_0''(t) + D_q(\hat L_1(t))^{-1}E_0(t) + O_p(h_1^3), \tag{S.12}$$


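where $\hat L_1(t) = A_{1n}(t)$ defined after (3.2),
$$\hat L_2(t) = \frac{1}{N_1h_1}\sum_{i=1}^n \sum_{j=1}^{m_i}\Big\{Z_{ij} \otimes \begin{pmatrix} 1 \\ \frac{T_{ij} - t}{h_1} \end{pmatrix}\Big\}K\Big(\frac{T_{ij} - t}{h_1}\Big)X_{ij}^T,$$
$$\hat L_3(t) = \frac{1}{N_1h_1}\sum_{i=1}^n \sum_{j=1}^{m_i} (Z_{ij}Z_{ij}^T) \otimes \begin{pmatrix} \big(\frac{T_{ij} - t}{h_1}\big)^2 \\ \big(\frac{T_{ij} - t}{h_1}\big)^3 \end{pmatrix}K\Big(\frac{T_{ij} - t}{h_1}\Big),$$
$$E_0(t) = \frac{1}{N_1h_1}\sum_{i=1}^n \sum_{j=1}^{m_i}\Big\{Z_{ij} \otimes \begin{pmatrix} 1 \\ \frac{T_{ij} - t}{h_1} \end{pmatrix}\Big\}K\Big(\frac{T_{ij} - t}{h_1}\Big)\epsilon_{ij}.$$
By following standard arguments such as those in [S3], we obtain for $j = 1, 2, 3$,
$$\hat L_j(t) = L_j(t) + O_p\Big(\sqrt{\frac{\log n}{nh_1}}\Big) \quad \text{uniformly in } t, \tag{S.13}$$
where $L_j(t) = E\{\hat L_j(t)\}$, and
$$E_0(t) = O_p\Big(\sqrt{\frac{\log n}{nh_1}}\Big) \quad \text{uniformly in } t. \tag{S.14}$$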

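Assumption A2 implies that
$$C_1I_{2q} \le L_1(t) \le C_2I_{2q} \tag{S.15}$$
for some positive constants $C_1$ and $C_2$. From (S.12)-(S.15), we have uniformly in $t$,
$$\begin{aligned}
\hat g(t) &= g_0(t) + D_q(L_1(t))^{-1}L_2(t)(\beta_0 - \hat\beta_I) + \frac{h_1^2}{2}D_q(L_1(t))^{-1}L_3(t)g_0''(t) + D_q(L_1(t))^{-1}E_0(t) \\
&\quad + O_p(h_1^3) + O_p\Big(\frac{\log n}{nh_1}\Big) + O_p\Big(h_1^2\sqrt{\frac{\log n}{nh_1}}\Big) \\
&= g_0(t) + L_4(t)(\beta_0 - \hat\beta_I) + h_1^2L_5(t)g_0''(t) + L_6(t)E_0(t) \\
&\quad + O_p(h_1^3) + O_p\Big(\frac{\log n}{nh_1}\Big) + O_p\Big(h_1^2\sqrt{\frac{\log n}{nh_1}}\Big) \quad \text{(say)}.
\end{aligned} \tag{S.16}$$
Note that all the elements of $L_j(t)$, $j = 4, 5, 6$, are bounded functions of $t$.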

(ii) Representation of $\hat\epsilon_{ij}$. We have


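By plugging (S.16) into the above equality, we obtain uniformly in $i$ and $j$,
$$\begin{aligned}
\hat\epsilon_{ij} &= \epsilon_{ij} + (X_{ij}^T - Z_{ij}^TL_4(T_{ij}))(\beta_0 - \hat\beta_I) - h_1^2Z_{ij}^TL_5(T_{ij})g_0''(T_{ij}) - Z_{ij}^TL_6(T_{ij})E_0(T_{ij}) \\
&\quad + O_p(h_1^3) + O_p\Big(\frac{\log n}{nh_1}\Big) + O_p\Big(h_1^2\sqrt{\frac{\log n}{nh_1}}\Big) \\
&= \epsilon_{ij} + M_{ij}^{(1)}(\beta_0 - \hat\beta_I) + h_1^2M_{ij}^{(2)}g_0''(T_{ij}) + M_{ij}^{(3)}E_0(T_{ij}) \\
&\quad + O_p(h_1^3) + O_p\Big(\frac{\log n}{nh_1}\Big) + O_p\Big(h_1^2\sqrt{\frac{\log n}{nh_1}}\Big) \quad \text{(say)}.
\end{aligned} \tag{S.17}$$
Note that all the elements of $M_{ij}^{(1)}$, $M_{ij}^{(2)}$, and $M_{ij}^{(3)}$ are uniformly bounded functions of $X_{ij}$, $Z_{ij}$, and $T_{ij}$.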

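(iii) Representation of $\hat\sigma^2(t)$. We have uniformly in $i$ and $j$,
$$\begin{aligned}
(\hat\epsilon_{ij})^2 &= \epsilon_{ij}^2 - \sigma^2(T_{ij}) + \sigma^2(T_{ij}) + 2\epsilon_{ij}M_{ij}^{(3)}E_0(T_{ij}) + 2\epsilon_{ij}M_{ij}^{(1)}(\beta_0 - \hat\beta_I) + 2\epsilon_{ij}h_1^2M_{ij}^{(2)}g_0''(T_{ij}) \\
&\quad + O_p(h_1^3) + O_p\Big(\frac{\log n}{nh_1}\Big) + O_p\Big(h_1^2\sqrt{\frac{\log n}{nh_1}}\Big).
\end{aligned} \tag{S.18}$$
Recall that $M_{ij}^{(l)}$, $l = 1, 2, 3$, are defined in (S.17). It is easy to see that the contributions of $2\epsilon_{ij}M_{ij}^{(1)}(\beta_0 - \hat\beta_I)$ and $2\epsilon_{ij}h_1^2M_{ij}^{(2)}g_0''(T_{ij})$ to $\hat\sigma^2(t)$ are
$$O_p\Big(\frac{1}{\sqrt n}\sqrt{\frac{\log n}{nh_2}}\Big) \quad \text{and} \quad O_p\Big(h_1^2\sqrt{\frac{\log n}{nh_2}}\Big)$$
uniformly in $t$, respectively. Thus we have only to consider $\epsilon_{ij}^2 - \sigma^2(T_{ij})$, $\sigma^2(T_{ij})$, and $2\epsilon_{ij}M_{ij}^{(3)}E_0(T_{ij})$ in (S.18).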

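Setting $\hat L_7(t) = A_{2n}(t)$, which is defined after (3.3), we have for some positive constants $C_1$ and $C_2$,
$$\hat L_7(t) = L_7(t) + O_p\Big(\sqrt{\frac{\log n}{nh_2}}\Big) \quad \text{and} \quad C_1I_2 \le L_7(t) \le C_2I_2 \tag{S.19}$$
uniformly in $t$, where $L_7(t) = E\{\hat L_7(t)\}$. Now we have uniformly in $t$,
$$\hat\sigma^2(t) = (1\ 0)(\hat L_7(t))^{-1}(E_1(t) + \mathrm{Bias}_1(t) + R_1(t)) + O_p(h_1^3) + O_p\Big(\frac{\log n}{nh_1}\Big) + O_p\Big(h_1^2\sqrt{\frac{\log n}{nh_1}}\Big), \tag{S.20}$$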


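where $E_1(t)$ is defined in Proposition 4, $\mathrm{Bias}_1(t)$ is the term of $\sigma^2(T_{ij})$, and $R_1(t)$ is the term of $2\epsilon_{ij}M_{ij}^{(3)}E_0(T_{ij})$. It is easy to see that uniformly in $t$,
$$E_1(t) = O_p\Big(\sqrt{\frac{\log n}{nh_2}}\Big). \tag{S.21}$$
By applying the Taylor series expansion, we have
$$\sigma^2(T_{ij}) = \sigma^2(t) + h_2(\sigma^2)'(t)\Big(\frac{T_{ij} - t}{h_2}\Big) + \frac{h_2^2}{2}(\sigma^2)''(t)\Big(\frac{T_{ij} - t}{h_2}\Big)^2 + O(h_2^3).$$
Therefore $\mathrm{Bias}_1(t)$ can be represented as
$$\mathrm{Bias}_1(t) = \hat L_7(t)\begin{pmatrix} \sigma^2(t) \\ h_2(\sigma^2)'(t) \end{pmatrix} + \frac{h_2^2(\sigma^2)''(t)}{2N_1h_2}\sum_{i=1}^n \sum_{j=1}^{m_i}\begin{pmatrix} \big(\frac{T_{ij} - t}{h_2}\big)^2 \\ \big(\frac{T_{ij} - t}{h_2}\big)^3 \end{pmatrix}K\Big(\frac{T_{ij} - t}{h_2}\Big) + O_p(h_2^3)$$
uniformly in $t$. Setting
$$\hat L_8(t) = \frac{1}{N_1h_2}\sum_{i=1}^n \sum_{j=1}^{m_i}\begin{pmatrix} \big(\frac{T_{ij} - t}{h_2}\big)^2 \\ \big(\frac{T_{ij} - t}{h_2}\big)^3 \end{pmatrix}K\Big(\frac{T_{ij} - t}{h_2}\Big),$$
we have uniformly in $t$,
$$\hat L_8(t) = L_8(t) + O_p\Big(\sqrt{\frac{\log n}{nh_2}}\Big),$$
where $L_8(t) = E\{\hat L_8(t)\}$ and $L_8(t)$ is a bounded vector function of $t$. Hence we have uniformly in $t$,
$$\mathrm{Bias}_1(t) = \hat L_7(t)\begin{pmatrix} \sigma^2(t) \\ h_2(\sigma^2)'(t) \end{pmatrix} + \frac{h_2^2(\sigma^2)''(t)}{2}L_8(t) + O_p(h_2^3) + O_p\Big(h_2^2\sqrt{\frac{\log n}{nh_2}}\Big). \tag{S.22}$$
Next we deal with $R_1(t)$, which can be written as
$$\frac{1}{N_1^2h_1h_2}\sum_{a,b}\sum_{i=1}^n \sum_{j=1}^{m_i}\sum_{i'=1}^n \sum_{j'=1}^{m_{i'}} \epsilon_{ij}\epsilon_{i'j'}A_{ab,ij}B_{ab,i'j'}K_a\Big(\frac{T_{ij} - t}{h_2}\Big)K_b\Big(\frac{T_{i'j'} - T_{ij}}{h_1}\Big), \tag{S.23}$$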


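uniformly bounded functions of $X_{ij}$, $Z_{ij}$, and $T_{ij}$. We evaluate
$$\begin{aligned}
&\frac{1}{N_1^2h_1h_2}\sum_{i=1}^n \sum_{j=1}^{m_i}\sum_{i'=1}^n \sum_{j'=1}^{m_{i'}} \epsilon_{ij}\epsilon_{i'j'}A_{ab,ij}B_{ab,i'j'}K_a\Big(\frac{T_{ij} - t}{h_2}\Big)K_b\Big(\frac{T_{i'j'} - T_{ij}}{h_1}\Big) \\
&= \frac{1}{N_1^2h_1h_2}\sum_{i=1}^n \sum_{j=1}^{m_i} \epsilon_{ij}^2A_{ab,ij}B_{ab,ij}K_a\Big(\frac{T_{ij} - t}{h_2}\Big)K_b(0) \\
&\quad + \frac{1}{N_1^2h_1h_2}\sum_{i=1}^n \sum_{j \ne j'} \epsilon_{ij}\epsilon_{ij'}A_{ab,ij}B_{ab,ij'}K_a\Big(\frac{T_{ij} - t}{h_2}\Big)K_b\Big(\frac{T_{ij'} - T_{ij}}{h_1}\Big) \\
&\quad + \frac{1}{N_1^2h_1h_2}\sum_{i \ne i'}\sum_{j,j'} \epsilon_{ij}\epsilon_{i'j'}A_{ab,ij}B_{ab,i'j'}K_a\Big(\frac{T_{ij} - t}{h_2}\Big)K_b\Big(\frac{T_{i'j'} - T_{ij}}{h_1}\Big) \\
&= R_{1ab}^{(1)}(t) + R_{1ab}^{(2)}(t) + R_{1ab}^{(3)}(t) \quad \text{(say)}.
\end{aligned} \tag{S.24}$$
Note that we cannot apply classical exponential inequalities for U-statistics since the kernel functions depend on $i$ and $i'$ and the observations are not identically distributed. It is easy to see that uniformly in $t$,
$$R_{1ab}^{(1)}(t) = O_p((nh_1)^{-1}) \quad \text{and} \quad R_{1ab}^{(2)}(t) = O_p(n^{-1}). \tag{S.25}$$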

We evaluate $R_{1ab}^{(3)}(t)$ by using an exponential inequality such as the one given in (3.5) of [S1] with
$$A = \frac{C_1(\log n)^km_{\max}^2}{n^2h_1h_2}, \quad B^2 = \frac{C_2(\log n)^{2k}m_{\max}^2}{n^3h_1h_2}(h_1^{-1} + h_2^{-1}), \quad C = \frac{C_3}{nh_1^{1/2}h_2^{1/2}}, \quad \text{and} \quad x = \frac{M\log n}{nh_1^{1/2}h_2^{1/2}}$$
in the inequality and standard arguments in nonparametric regression as in [S3]. Note that we used a kind of truncation technique to handle $\epsilon_{ij}$ and that we have to take sufficiently large $k$ and $M$ here. Hence we have
$$R_{1ab}^{(3)}(t) = O_p\Big(\frac{\log n}{n(h_1h_2)^{1/2}}\Big).$$
The above equation and (S.23)-(S.25) imply that
$$R_1(t) = O_p\Big(\frac{\log n}{n(h_1h_2)^{1/2}}\Big) + O_p\Big(\frac{1}{nh_1}\Big) \tag{S.26}$$
uniformly in $t$. It follows from (S.19)-(S.22) and (S.26) that
$$\hat\sigma^2(t) = \sigma^2(t) + (1\ 0)(L_7(t))^{-1}E_1(t) + \frac{h_2^2}{2}(1\ 0)(L_7(t))^{-1}L_8(t)(\sigma^2)''(t) + O_p(h_1^3) + O_p(h_2^3) + O_p\Big(\frac{\log n}{nh_1}\Big) + O_p\Big(\frac{\log n}{nh_2}\Big).$$
The expression of $\hat\sigma^2(t)$ in Proposition 4 also follows from the above expression.

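(iv) Representation of $\hat\sigma(s, t)$. We can proceed almost in the same way as when we deal with $\hat\sigma^2(t)$. First we have uniformly in $i$, $j$, and $j'$,
\begin{align*}
\hat\epsilon_{ij}\hat\epsilon_{ij'} &= \epsilon_{ij}\epsilon_{ij'} - \sigma(T_{ij}, T_{ij'}) + \sigma(T_{ij}, T_{ij'}) + \epsilon_{ij}M_{ij'}^{(3)}E_0(T_{ij'}) + \epsilon_{ij'}M_{ij}^{(3)}E_0(T_{ij}) \\
&\quad + \epsilon_{ij}M_{ij'}^{(1)}(\beta_0 - \hat\beta_I) + \epsilon_{ij'}M_{ij}^{(1)}(\beta_0 - \hat\beta_I) \tag{S.27} \\
&\quad + \epsilon_{ij}h_1^2M_{ij'}^{(2)}g_0''(T_{ij'}) + \epsilon_{ij'}h_1^2M_{ij}^{(2)}g_0''(T_{ij}) \tag{S.28} \\
&\quad + O_p(h_1^3) + O_p\Big(\frac{\log n}{nh_1}\Big) + O_p\Big(h_1^2\sqrt{\frac{\log n}{nh_1}}\Big).
\end{align*}
It is easy to see that the contributions of (S.27) and (S.28) to $\hat\sigma(s, t)$ are
$$O_p\Big(\frac{1}{\sqrt n}\sqrt{\frac{\log n}{nh_3^2}}\Big) \quad \text{and} \quad O_p\Big(h_1^2\sqrt{\frac{\log n}{nh_3^2}}\Big)$$
uniformly in $s$ and $t$, respectively. Therefore we have only to consider $\epsilon_{ij}\epsilon_{ij'} - \sigma(T_{ij}, T_{ij'})$, $\sigma(T_{ij}, T_{ij'})$, and $\epsilon_{ij}M_{ij'}^{(3)}E_0(T_{ij'}) + \epsilon_{ij'}M_{ij}^{(3)}E_0(T_{ij})$ in $\hat\epsilon_{ij}\hat\epsilon_{ij'}$.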

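Setting $\hat L_9(s, t) = A_{3n}(s, t)$, which is defined after (3.4), we have for some positive constants $C_1$ and $C_2$,
$$\hat L_9(s, t) = L_9(s, t) + O_p\Big(\sqrt{\frac{\log n}{nh_3^2}}\Big) \quad \text{and} \quad C_1I_3 \le L_9(s, t) \le C_2I_3 \tag{S.29}$$
uniformly in $s$ and $t$, where $L_9(s, t) = E\{\hat L_9(s, t)\}$. Now we have uniformly in $s$ and $t$,
$$\hat\sigma(s, t) = (1\ 0\ 0)(\hat L_9(s, t))^{-1}(E_2(s, t) + \mathrm{Bias}_2(s, t) + R_2(s, t)) + O_p(h_1^3) + O_p\Big(\frac{\log n}{nh_1}\Big) + O_p\Big(h_1^2\sqrt{\frac{\log n}{nh_1}}\Big), \tag{S.30}$$
where $E_2(s, t)$ is defined in Proposition 4, $\mathrm{Bias}_2(s, t)$ is the term of $\sigma(T_{ij}, T_{ij'})$, and $R_2(s, t)$ is the term of $\epsilon_{ij}M_{ij'}^{(3)}E_0(T_{ij'}) + \epsilon_{ij'}M_{ij}^{(3)}E_0(T_{ij})$. It is easy to see that uniformly in $s$ and $t$,
$$E_2(s, t) = O_p\Big(\sqrt{\frac{\log n}{nh_3^2}}\Big). \tag{S.31}$$
Setting
$$\hat L_{10}(s, t) = \frac{1}{N_2h_3^2}\sum_{i=1}^n \sum_{j \ne j'}\begin{pmatrix} 1 \\ \frac{T_{ij} - s}{h_3} \\ \frac{T_{ij'} - t}{h_3} \end{pmatrix}\begin{pmatrix} \big(\frac{T_{ij} - s}{h_3}\big)^2 & \frac{2(T_{ij} - s)(T_{ij'} - t)}{h_3^2} & \big(\frac{T_{ij'} - t}{h_3}\big)^2 \end{pmatrix}K\Big(\frac{T_{ij} - s}{h_3}\Big)K\Big(\frac{T_{ij'} - t}{h_3}\Big),$$
we have uniformly in $s$ and $t$,
$$\hat L_{10}(s, t) = L_{10}(s, t) + O_p\Big(\sqrt{\frac{\log n}{nh_3^2}}\Big),$$
where $L_{10}(s, t) = E\{\hat L_{10}(s, t)\}$, which is a bounded matrix function of $(s, t)$. Then we have, as in the proof of the representation of $\hat\sigma^2(t)$, uniformly in $s$ and $t$,
$$\mathrm{Bias}_2(s, t) = \hat L_9(s, t)\begin{pmatrix} \sigma(s, t) \\ h_3\frac{\partial\sigma}{\partial s}(s, t) \\ h_3\frac{\partial\sigma}{\partial t}(s, t) \end{pmatrix} + \frac{h_3^2}{2}L_{10}(s, t)\begin{pmatrix} \frac{\partial^2\sigma}{\partial s^2}(s, t) \\ \frac{\partial^2\sigma}{\partial s\partial t}(s, t) \\ \frac{\partial^2\sigma}{\partial t^2}(s, t) \end{pmatrix} + O_p(h_3^3) + O_p\Big(h_3^2\sqrt{\frac{\log n}{nh_3^2}}\Big). \tag{S.32}$$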

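Finally we deal with $R_2(s, t)$ in the same way as in the proof of the representation of $\hat\sigma^2(t)$. We use the same exponential inequality for U-statistics. We should consider
$$\frac{1}{N_1N_2h_1h_3^2}\sum_{i_1=1}^n \sum_{i_2=1}^n \sum_{j_1 \ne j_2}\sum_{j_3} \epsilon_{i_1j_1}\epsilon_{i_2j_3}A_{abc,i_1j_2}B_{abc,i_2j_3}K_a\Big(\frac{T_{i_2j_3} - T_{i_1j_2}}{h_1}\Big)K_b\Big(\frac{T_{i_1j_1} - t}{h_3}\Big)K_c\Big(\frac{T_{i_1j_2} - s}{h_3}\Big), \tag{S.33}$$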


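where $K_l(t) = t^lK(t)$, $a = 0, 1$, $b = 0, 1$, and $c = 0, 1$. Note that $A_{abc,ij}$ and $B_{abc,ij}$ are uniformly bounded functions of $X_{ij}$, $Z_{ij}$, and $T_{ij}$. This is a generalized U-statistic when we remove the summands with $i_1 = i_2$, and we recall (1.1) when we evaluate (S.33). It is easy to see that uniformly in $s$ and $t$,
$$\frac{1}{N_1N_2h_1h_3^2}\sum_{i_1=1}^n \sum_{j_1 \ne j_2}\sum_{j_3} \epsilon_{i_1j_1}\epsilon_{i_1j_3}A_{abc,i_1j_2}B_{abc,i_1j_3}K_a\Big(\frac{T_{i_1j_3} - T_{i_1j_2}}{h_1}\Big)K_b\Big(\frac{T_{i_1j_1} - t}{h_3}\Big)K_c\Big(\frac{T_{i_1j_2} - s}{h_3}\Big) = O_p\Big(\frac{1}{nh_1}\Big). \tag{S.34}$$
In the same way as when dealing with $R_{1ab}^{(3)}(t)$, we obtain
$$\frac{1}{N_1N_2h_1h_3^2}\sum_{i_1 \ne i_2}\sum_{j_1 \ne j_2}\sum_{j_3} \epsilon_{i_1j_1}\epsilon_{i_2j_3}A_{abc,i_1j_2}B_{abc,i_2j_3}K_a\Big(\frac{T_{i_2j_3} - T_{i_1j_2}}{h_1}\Big)K_b\Big(\frac{T_{i_1j_1} - t}{h_3}\Big)K_c\Big(\frac{T_{i_1j_2} - s}{h_3}\Big) = O_p\Big(\frac{\log n}{nh_1^{1/2}h_3}\Big) \tag{S.35}$$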

with $A = C_1(\log n)^km_{\max}^3/(n^2h_1h_3^2)$, $B = C_2(\log n)^km_{\max}^2/(n^{3/2}h_1^{1/2}h_3^2)$, $C = C_3/(nh_1^{1/2}h_3)$, and $x = M\log n/(nh_1^{1/2}h_3)$ in the exponential inequality. Note that we should choose sufficiently large $k$ and $M$. It follows from (S.34) and (S.35) that uniformly in $s$ and $t$,
$$R_2(s, t) = O_p\Big(\frac{\log n}{nh_1^{1/2}h_3}\Big). \tag{S.36}$$
Note that we cannot relax the assumption of $m_{\max} = O(n^{1/8})$ in Assumption A1 when we derive (S.36). It follows from (S.29)-(S.32) and (S.36) that uniformly in $s$ and $t$,
$$\hat\sigma(s, t) - \sigma(s, t) = (1\ 0\ 0)(L_9(s, t))^{-1}E_2(s, t) + \frac{h_3^2}{2}(1\ 0\ 0)(L_9(s, t))^{-1}L_{10}(s, t)\begin{pmatrix} \frac{\partial^2\sigma}{\partial s^2}(s, t) \\ \frac{\partial^2\sigma}{\partial s\partial t}(s, t) \\ \frac{\partial^2\sigma}{\partial t^2}(s, t) \end{pmatrix} + O_p(h_1^3) + O_p(h_3^3) + O_p\Big(\frac{\log n}{nh_1}\Big) + O_p\Big(\frac{\log n}{nh_3^2}\Big).$$


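S.4. Proofs of Lemmas 2-8. First we state some results on $\hat\Sigma_i$. Set
$$\delta_n = h_2^2 + h_3^2 + \sqrt{\frac{\log n}{nh_2}} + \sqrt{\frac{\log n}{nh_3^2}}. \tag{S.37}$$
Then we have from Proposition 4 that uniformly in $i$,
$$\max\{|\lambda_{\min}(\Sigma_i - \hat\Sigma_i)|, |\lambda_{\max}(\Sigma_i - \hat\Sigma_i)|\} = O_p(m_i\delta_n).$$
Recall that
$$\hat\Sigma_i^{-1} - \Sigma_i^{-1} = \hat\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1} = \Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1} + \hat\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}.$$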
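The matrix identity recalled above is purely algebraic and can be verified numerically. The sketch below is a minimal check on a randomly generated pair of matrices; the dimension and the size of the perturbation are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
m = 4
A = rng.normal(size=(m, m))
Sigma = A @ A.T + m * np.eye(m)              # a positive definite "true" covariance
E = rng.normal(scale=0.05, size=(m, m))
Sigma_hat = Sigma + (E + E.T) / 2.0          # a symmetric perturbation as the "estimate"

Sigma_inv = np.linalg.inv(Sigma)
Sigma_hat_inv = np.linalg.inv(Sigma_hat)
D = Sigma - Sigma_hat

lhs = Sigma_hat_inv - Sigma_inv
one_step = Sigma_hat_inv @ D @ Sigma_inv                          # first equality
first_order = Sigma_inv @ D @ Sigma_inv                           # leading term
second_order = Sigma_hat_inv @ D @ Sigma_inv @ D @ Sigma_inv      # remainder term
```

The first-order term is linear in $\Sigma_i - \hat\Sigma_i$, while the remainder is quadratic; this split is what the bounds (S.38) and (S.39) quantify.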

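We have from Assumption A4 and Proposition 4 that uniformly in $i$,
$$|\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}|_{\max} = O_p(m_i\delta_n), \tag{S.38}$$
$$|\hat\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}|_{\max} = O_p(m_i^2\delta_n^2), \tag{S.39}$$
where $|A|_{\max} = \max_{i,j}|a_{ij}|$ for any matrix $A = (a_{ij})$. Besides, it follows from Assumption A4 that we have uniformly in $i$,
$$\max\{|\lambda_{\min}(\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1})|, |\lambda_{\max}(\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1})|\} = O_p(m_i\delta_n). \tag{S.40}$$
We also have the same result for $\hat\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}$ as in (S.40) with $m_i\delta_n$ replaced by $(m_i\delta_n)^2$. Proposition 4 also implies that each element of $\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}$ has the form of
$$D_i^{(1)}(T_i)h_2^2 + D_i^{(2)}(T_i)h_3^2 + \sum_{j=1}^{m_i} D_{ij}^{(3)}(T_i)E_1(T_{ij}) + \sum_{j \ne j'} D_{ijj'}^{(4)}(T_i)E_2(T_{ij}, T_{ij'}) + D_i^{(5)}, \tag{S.41}$$
where
$$D_i^{(5)} = m_iO_p\Big(h_1^3 + h_2^3 + h_3^3 + \frac{\log n}{nh_1} + \frac{\log n}{nh_2} + \frac{\log n}{nh_3^2}\Big)$$
uniformly in $i$.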

We state the following two useful facts before we start proving Lemmas 2-8, both of which hold uniformly in $l$:
$$\frac{1}{n}\sum_{i=1}^n m_i^3\sum_{j=1}^{m_i} |W_{ijl}| = O_p(K_n^{-1}) \tag{S.42}$$
and
$$\frac{1}{n}\sum_{i=1}^n m_i^2\sum_{j_1=1}^{m_i}\sum_{j_2=1}^{m_i} |W_{ij_1l}||\epsilon_{ij_2}| = O_p(K_n^{-1}), \tag{S.43}$$
where $W_{ijl}$ denotes the $l$th element of $W_{ij}$. We can prove them in the same way, except that we need a kind of truncation argument when showing (S.43), and we outline the proof of (S.42) in the following. To prove (S.42), we evaluate the expectation and variance and apply the Bernstein inequality. First note that we have uniformly in $l$,
$$E\Big\{n^{-1}\sum_{i=1}^n m_i^3\sum_{j=1}^{m_i} |W_{ijl}|\Big\} = O(K_n^{-1}).$$
This follows from the local property of the B-spline basis and Assumption A2. In addition, since we have from Assumption A2 that
$$E\Big\{\frac{m_{\max}^2}{n^2}\sum_{i=1}^n m_i^4\sum_{j=1}^{m_i} |W_{ijl}|^2 + \frac{m_{\max}^3}{n^2}\sum_{i=1}^n m_i^3\sum_{j_1 \ne j_2} |W_{ij_1l}||W_{ij_2l}|\Big\} = O\Big(\frac{m_{\max}^2}{nK_n} + \frac{m_{\max}^3}{nK_n^2}\Big),$$
the variance is bounded from above by $C_1n^{-19/20}$ uniformly in $l$. Each summand is bounded from above by $C_2m_{\max}^4/n = O(n^{-1/2})$. Hence (S.42) and the uniformity in $l$ follow from the Bernstein inequality.
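For reference, one standard form of the Bernstein inequality for independent bounded random variables is recorded below. The symbols $\xi_i$, $M$, and $x$ are generic notation introduced only for this note and do not appear in the paper.

```latex
% xi_1, ..., xi_n independent, E(xi_i) = 0, |xi_i| <= M almost surely, x > 0:
P\Big(\Big|\sum_{i=1}^n \xi_i\Big| \ge x\Big)
  \le 2\exp\Big(-\frac{x^2/2}{\sum_{i=1}^n \mathrm{Var}(\xi_i) + Mx/3}\Big).
```

In the proof of (S.42), the variance bound $C_1n^{-19/20}$ plays the role of the variance sum and the summand bound $C_2m_{\max}^4/n$ plays the role of $M$.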

Proof of Lemma 2. We can verify the result on $n^{-1}h_{12,kl}$ by using the local property of the B-spline basis and the Bernstein inequality for independent bounded random variables. Since
$$\frac{1}{n}(\hat H_{12} - H_{12}) = \frac{1}{n}\sum_{i=1}^n X_i^T\{\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}\}W_i + \frac{1}{n}\sum_{i=1}^n X_i^T\{\hat\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}\}W_i,$$
the desired result on $n^{-1}(\hat h_{12,kl} - h_{12,kl})$ follows from (S.38), (S.39), and (S.42). The results on the Euclidean norm follow from those on the elements. Hence the proof is complete.

Proof of Lemma 3. We have from Assumption A4 that
$$\frac{C_1}{n}\sum_{i=1}^n \frac{1}{m_i}W_i^TW_i \le \frac{1}{n}H_{22} \le \frac{C_2}{n}\sum_{i=1}^n W_i^TW_i \tag{S.44}$$
for some positive constants $C_1$ and $C_2$ and for $k = 0, 1$,
$$\frac{1}{n}\sum_{i=1}^n \frac{1}{m_i^k}W_i^TW_i = \frac{1}{n}\sum_{i=1}^n \frac{1}{m_i^k}\sum_{j=1}^{m_i} (Z_{ij}Z_{ij}^T) \otimes (B(T_{ij})B^T(T_{ij})).$$
Thus the first result follows from Assumptions A2 and A3 and the standard arguments on B-spline bases as in the proofs of Lemmas A.1 and A.2 of [12]. Since we have
$$\frac{1}{n}(\hat H_{22} - H_{22}) = \frac{1}{n}\sum_{i=1}^n W_i^T\{\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}\}W_i + \frac{1}{n}\sum_{i=1}^n W_i^T\{\hat\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}\}W_i,$$
the second result follows from (S.40), the inequalities similar to (S.44), and Assumptions A2 and A3. The third result follows from the first and second results. Finally we deal with the fourth result. Note that
$$\begin{aligned}
(n^{-1}\hat H_{22})^{-1} - (n^{-1}H_{22})^{-1} &= (n^{-1}H_{22})^{-1}(n^{-1}H_{22} - n^{-1}\hat H_{22})(n^{-1}H_{22})^{-1} \\
&\quad + (n^{-1}\hat H_{22})^{-1}(n^{-1}H_{22} - n^{-1}\hat H_{22})(n^{-1}H_{22})^{-1}(n^{-1}H_{22} - n^{-1}\hat H_{22})(n^{-1}H_{22})^{-1}.
\end{aligned} \tag{S.45}$$
By using the first, second, and third results and (S.45), we obtain the fourth one. Hence the proof is complete.

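Proof of Lemma 4. The first result follows from (S.40). The second one follows from Lemmas 2 and 3. The last one follows from the first two.

Proof of Lemma 5. The first result follows from the fact
$$\frac{C_1}{n}\sum_{i=1}^n \frac{1}{m_i}W_i^TW_i \le \frac{1}{n}\sum_{i=1}^n W_i^T\Sigma_i^{-1}W_i \le \frac{C_2}{n}\sum_{i=1}^n W_i^TW_i$$
for some positive constants $C_1$ and $C_2$. Next note that
$$\frac{1}{\sqrt n}\sum_{i=1}^n W_i^T\{\hat\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}\}\epsilon_i = \frac{1}{\sqrt n}\sum_{i=1}^n W_i^T\{\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}\}\epsilon_i + \frac{1}{\sqrt n}\sum_{i=1}^n W_i^T\{\hat\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}\}\epsilon_i. \tag{S.46}$$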


By employing (S.39) and (S.43), we can prove that the stochastic order of the elements of the second term on the right-hand side is uniformly $O_p(\sqrt{n}K_n^{-1}(h_2^4 + h_3^4 + \log n/(nh_2) + \log n/(nh_3^2)))$. Thus the norm of this $qK_n$-dimensional vector has the stochastic order of
$$\sqrt{\frac{n}{K_n}}\,O_p\Big(h_2^4 + h_3^4 + \frac{\log n}{nh_2} + \frac{\log n}{nh_3^2}\Big). \tag{S.47}$$

According to Proposition 4, the first term on the right-hand side of (S.46) can be decomposed into
$$\frac{1}{\sqrt n}\sum_{i=1}^n W_i^TQ_{1i}\epsilon_i + \frac{1}{\sqrt n}\sum_{i=1}^n W_i^TQ_{2i}\epsilon_i + \frac{1}{\sqrt n}\sum_{i=1}^n W_i^TQ_{3i}\epsilon_i, \tag{S.48}$$
where $Q_{1i}$ corresponds to the first and second terms in (S.41), $Q_{2i}$ corresponds to the third and fourth terms in (S.41), and $Q_{3i}$ corresponds to the fifth term in (S.41). Proposition 4 implies
$$Q_{1i} = Q_{1i}^{(2)}h_2^2 + Q_{1i}^{(3)}h_3^2,$$
where we have for $s = 2, 3$,
$$\max\{|\lambda_{\min}(Q_{1i}^{(s)})|, |\lambda_{\max}(Q_{1i}^{(s)})|\} = O(m_i)$$
uniformly in $i$. Besides, $Q_{1i}^{(s)}$ depends only on $T_i$ for $s = 2, 3$. The $(k, l)$ element of $Q_{2i}$ has the form of
$$\sum_{j=1}^{m_i} \sigma_i^{kj}\sigma_i^{lj}E_1(T_{ij}) + \sum_{j \ne j'} \sigma_i^{kj}\sigma_i^{lj'}E_2(T_{ij}, T_{ij'}),$$
where $\Sigma_i^{-1} = (\sigma_i^{kl})$. Note that uniformly in $l$ and $i$,
$$\sum_{k=1}^{m_i} (\sigma_i^{kl})^2 = O(1).$$
Uniformly in $i$, the elements of $Q_{3i}$, namely $D_i^{(5)}$ in (S.41), have the order of
$$m_iO_p\Big(h_1^3 + h_2^3 + h_3^3 + \frac{\log n}{nh_1} + \frac{\log n}{nh_2} + \frac{\log n}{nh_3^2}\Big).$$
We can prove as in the proof of Lemma 3 that for $s = 2, 3$,
$$\frac{C_1}{K_n}I_{qK_n} \le \mathrm{Cov}\Big(n^{-1/2}\sum_{i=1}^n W_i^TQ_{1i}^{(s)}\epsilon_i\Big) \le \frac{C_2}{K_n}I_{qK_n}$$


for some positive constants $C_1$ and $C_2$. Hence we have
$$n^{-1/2}\sum_{i=1}^n W_i^TQ_{1i}\epsilon_i = O_p(h_2^2 + h_3^2). \tag{S.49}$$
Similarly to the second term on the right-hand side of (S.46), we can demonstrate by using (S.43) that
$$n^{-1/2}\sum_{i=1}^n W_i^TQ_{3i}\epsilon_i = \sqrt{\frac{n}{K_n}}\,O_p\Big(h_1^3 + h_2^3 + h_3^3 + \frac{\log n}{nh_1} + \frac{\log n}{nh_2} + \frac{\log n}{nh_3^2}\Big). \tag{S.50}$$
Finally we evaluate the second term of (S.48), which has the structure of a V-statistic. By exploiting the structure, we evaluate the expectations and the variances of the elements by using Assumption A2. Then we have
$$n^{-1/2}\sum_{i=1}^n W_i^TQ_{2i}\epsilon_i = O_p\Big(\frac{1}{\sqrt{nh_2}} + \frac{1}{\sqrt{nh_3^2}} + \frac{1}{\sqrt{nK_n}\,h_2} + \frac{1}{\sqrt{nK_n}\,h_3^2}\Big).$$
The second result follows from (S.47), (S.49), (S.50), and the above equality.

Proof of Lemma 6. This lemma can be proved in the same way as Lemma 5 and the details are omitted.

Proof of Lemma 7. From the definition of $\gamma^*$ given after (5.5), we have
$$\max_{1\le j\le m_i} |W_{ij}^T\gamma^* - Z_{ij}^Tg_0(T_{ij})| = O_p(K_n^{-2})$$
uniformly in $i$. The above equality and (S.42) imply that the elements of
$$\frac{1}{n}\sum_{i=1}^n W_i^T\Sigma_i^{-1}(W_i\gamma^* - (Z^Tg_0)_i)$$
are uniformly $O_p(K_n^{-3})$ and the first result follows from this. As for the second result, first we note that
$$|\hat\Sigma_i^{-1} - \Sigma_i^{-1}|_{\max} = O_p(m_i\delta_n)$$
uniformly in $i$ from (S.38) and (S.39). Recall that $\delta_n$ is defined in (S.37). Thus the elements of $W_i^T(\hat\Sigma_i^{-1} - \Sigma_i^{-1})(W_i\gamma^* - (Z^Tg_0)_i)$ are bounded uniformly in $l$ by
$$CK_n^{-2}\delta_nm_i^2\sum_{j=1}^{m_i} |W_{ijl}|$$
with probability tending to 1 for some positive constant $C$. Hence the second result follows from (S.42).

Proof of Lemma 8. This lemma can be proved in the same way as Lemma 7 and the details are omitted.

S.5. Theoretical results for general link functions. We state the results of Section 2 for general link functions when $m_i$ is uniformly bounded and $\epsilon_i$ satisfies the sub-Gaussian assumption, Assumption A6′ here. Note that we have no counterpart of Theorem 1 for general link functions even when $m_i$ is uniformly bounded.

Let $v_1$ and $v_2$ be two processes each taking a scalar stochastic value at $T_{ij}$, $i = 1, \ldots, n$, $j = 1, \ldots, m_i$. Then we define two inner products of $v_1$ and $v_2$ by
$$\langle v_1, v_2\rangle_n^\Delta = \frac{1}{n}\sum_{i=1}^n v_{1i}^T\Delta_{0i}V_i^{-1}\Delta_{0i}v_{2i} \quad \text{and} \quad \langle v_1, v_2\rangle^\Delta = E\{\langle v_1, v_2\rangle_n^\Delta\},$$
where $v_{1i}$ and $v_{2i}$ are defined in the same way as $T_i$ and
$$\Delta_{0i} = \mathrm{diag}\big(\mu'(X_{i1}^T\beta_0 + Z_{i1}^Tg_0(T_{i1})), \ldots, \mu'(X_{im_i}^T\beta_0 + Z_{im_i}^Tg_0(T_{im_i}))\big).$$
The associated norms are then defined by
$$\|v\|_n^\Delta = (\langle v, v\rangle_n^\Delta)^{1/2} \quad \text{and} \quad \|v\|^\Delta = (\langle v, v\rangle^\Delta)^{1/2}.$$
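The empirical inner product $\langle v_1, v_2\rangle_n^\Delta$ can be sketched for clusters of unequal size. This is a minimal illustration only: the logistic link, the exchangeable working covariance, and all names in the code are assumptions made for the example, and the linear predictors are simulated rather than computed from $X_{ij}^T\beta_0 + Z_{ij}^Tg_0(T_{ij})$.

```python
import numpy as np

def delta_inner_product(v1, v2, delta0, V):
    """(1/n) * sum_i v1_i^T D_i V_i^{-1} D_i v2_i with D_i = diag(delta0_i)."""
    n = len(v1)
    total = 0.0
    for a, b, d, Vi in zip(v1, v2, delta0, V):
        total += (d * a) @ np.linalg.solve(Vi, d * b)
    return total / n

def logistic_mu_prime(eta):
    """Derivative of the logistic mean function mu(eta) = 1 / (1 + exp(-eta))."""
    mu = 1.0 / (1.0 + np.exp(-eta))
    return mu * (1.0 - mu)

rng = np.random.default_rng(3)
sizes = [2, 3, 4]                                     # hypothetical cluster sizes m_i
v1 = [rng.normal(size=m) for m in sizes]
v2 = [rng.normal(size=m) for m in sizes]
eta = [rng.normal(size=m) for m in sizes]             # stand-in linear predictors
delta0 = [logistic_mu_prime(e) for e in eta]          # diagonal of Delta_{0i}
V = [np.eye(m) + 0.3 * (np.ones((m, m)) - np.eye(m)) for m in sizes]

ip12 = delta_inner_product(v1, v2, delta0, V)
ip21 = delta_inner_product(v2, v1, delta0, V)
norm1 = np.sqrt(delta_inner_product(v1, v1, delta0, V))
```

With the identity link, $\Delta_{0i}$ is the identity matrix and the definition reduces to the inner product weighted by $V_i^{-1}$ alone.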

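We now define the projections, with respect to $\|\cdot\|^\Delta$, of the $k$th element of $X$ onto $Z^TG$ and $Z^TG_B$ by
$$\Pi_\Delta X_k = \mathop{\mathrm{argmin}}_{g \in G} \|X_k - Z^Tg\|^\Delta \quad \text{and} \quad \Pi_{\Delta n}X_k = \mathop{\mathrm{argmin}}_{g \in G_B} \|X_k - Z^Tg\|^\Delta,$$
where
$$(\|X_k - Z^Tg\|^\Delta)^2 = \frac{1}{n}E\Big\{\sum_{i=1}^n (X_{ik} - (Z^Tg)_i)^T\Delta_{0i}V_i^{-1}\Delta_{0i}(X_{ik} - (Z^Tg)_i)\Big\},$$
with $X_{ik} = (X_{i1k}, \ldots, X_{im_ik})^T$ and $(Z^Tg)_i = (Z_{i1}^Tg(T_{i1}), \ldots, Z_{im_i}^Tg(T_{im_i}))^T$. We denote these projections by $\varphi^*_{\Delta k} = \Pi_\Delta X_k$ and $\varphi_{\Delta k} = \Pi_{\Delta n}X_k$, and define another one by
$$\hat\varphi_{\Delta k} = \hat\Pi_{\Delta n}X_k,$$
where
$$\hat\Pi_{\Delta n}X_k = \mathop{\mathrm{argmin}}_{g \in G_B} \|X_k - Z^Tg\|_n^\Delta.$$
The arguments in Section 5.2 also apply to this $\varphi^*_{\Delta k}$.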

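Some matrices are necessary to present Proposition S.1 and we define them here. Let
$$\tilde H = \begin{pmatrix} \sum_{i=1}^n X_i^T\Delta_{0i}V_i^{-1}\Delta_{0i}X_i & \sum_{i=1}^n X_i^T\Delta_{0i}V_i^{-1}\Delta_{0i}W_i \\ \sum_{i=1}^n W_i^T\Delta_{0i}V_i^{-1}\Delta_{0i}X_i & \sum_{i=1}^n W_i^T\Delta_{0i}V_i^{-1}\Delta_{0i}W_i \end{pmatrix} = \begin{pmatrix} \tilde H_{11} & \tilde H_{12} \\ \tilde H_{21} & \tilde H_{22} \end{pmatrix} \quad \text{(say)},$$
$$\tilde H_{11\cdot 2} = \tilde H_{11} - \tilde H_{12}\tilde H_{22}^{-1}\tilde H_{21}, \quad \text{and} \quad \tilde H^{11} = (\tilde H_{11\cdot 2})^{-1}.$$
Let $\tilde\Omega_{Vn}$ be a $p \times p$ matrix whose $(k, l)$th element is
$$\frac{1}{n}\sum_{i=1}^n E\Big\{(X_{ik} - (Z^T\varphi^*_{\Delta k})_i)^T\Delta_{0i}V_i^{-1}\Delta_{0i}(X_{il} - (Z^T\varphi^*_{\Delta l})_i)\Big\}.$$
Note that $n^{-1}\tilde H_{11\cdot 2}$ is an estimate of $\tilde\Omega_{Vn}$. We assume that there exists a $p \times p$ positive definite matrix $\tilde\Omega_V$ such that
$$\lim_{n\to\infty} \tilde\Omega_{Vn} = \tilde\Omega_V. \tag{S.51}$$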

We present Propositions S.1-S.3 before stating the assumptions for these propositions. By using Lemma S.1 we can prove Proposition S.1 based on the same arguments as those in [4].

Proposition S.1. (Asymptotic normality of $\hat\beta_V$) Under Assumption S in Section 2 for the norm here, (S.51), and Assumptions A1′, A2′, A3, A4′, A5′, and A6′, we have
$$\hat\beta_V = \beta_0 + \tilde H^{11}\sum_{i=1}^n (X_i - W_i\tilde H_{22}^{-1}\tilde H_{21})^T\Delta_{0i}V_i^{-1}\epsilon_i + o_p\Big(\frac{1}{\sqrt n}\Big).$$
We also have
$$\tilde\Gamma_V^{-1/2}(\hat\beta_V - \beta_0) \stackrel{d}{\to} N(0, I_p),$$
where $\tilde\Gamma_V$ is
$$\tilde H^{11}\sum_{i=1}^n \Big\{(X_i - W_i\tilde H_{22}^{-1}\tilde H_{21})^T\Delta_{0i}V_i^{-1}\Sigma_iV_i^{-1}\Delta_{0i}(X_i - W_i\tilde H_{22}^{-1}\tilde H_{21})\Big\}\tilde H^{11}.$$


We give in Proposition S.2 the semiparametric efficiency bound for estimation of $\beta_0$. It can be proved in the same way as Lemma 1 of [4] and the proof is omitted. We denote the semiparametric efficient score function of $\beta$ by
$$\tilde l^*_\beta = (\tilde l^*_{\beta 1}, \ldots, \tilde l^*_{\beta p})^T.$$
Its expression is given in Proposition S.2. When $V_i = \Sigma_i$, we denote $\varphi^*_{\Delta k}(t)$ by $\tilde\varphi^*_{eff,k}(t)$.

Proposition S.2. (Semiparametric efficiency bound) Under the same assumptions as in Proposition S.1, we have
$$\tilde l^*_{\beta k} = \sum_{i=1}^n (X_{ik} - (Z^T\tilde\varphi^*_{eff,k})_i)^T\Delta_{0i}\Sigma_i^{-1}\{Y_i - \mu(X_i\beta_0 + (Z^Tg_0)_i)\},$$
and the semiparametric efficient information matrix for $\beta$ is given by
$$\lim_{n\to\infty} \frac{1}{n}E\{\tilde l^*_\beta(\tilde l^*_\beta)^T\} = \tilde\Omega_\Sigma$$
with $V_i = \Sigma_i$ in (S.51).

Proposition S.3 is parallel to Proposition 3. It can be proved in the same way as Corollary 1 of [4], and it also follows from Proposition S.1 and Lemma S.1 (vii). Thus the proof is omitted.

Proposition S.3. (Oracle efficient estimator) Under the same assumptions as in Proposition S.1, we have with $V_i = \Sigma_i$ in (2.2)
\[
\sqrt{n}\, \widetilde{\Omega}_\Sigma^{1/2} (\widehat{\beta}_\Sigma - \beta_0) \xrightarrow{d} N(0, I_p).
\]

Now we describe assumptions for the above propositions. Here we need Assumption A6′ since we need some results from the empirical process theory in dealing with general link functions.

Assumption A1′.

(i) $\mu(x)$ is twice continuously differentiable and $\inf_{x\in\mathbb{R}} \mu'(x) > 0$.

(ii) For some positive constant $C_{B9}$, we have $\limsup_{|x|\to\infty} |\mu(x)|/|x|^{C_{B9}} < \infty$.


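uniformly bounded and we have for some positive constants $C_{B1}$ and $C_{B2}$,
\[
C_{B1} < \frac{1}{n} \sum_{i=1}^n \sum_{j=1}^{m_i} f_{ij}(t) < C_{B2} \ \text{ on } [0,1] \quad\text{and}\quad C_{B1} < \frac{1}{n} \sum_{i=1}^n \sum_{j\neq j'} f_{ijj'}(s,t) < C_{B2} \ \text{ on } [0,1]^2.
\]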

Assumption A4′. For some positive constants $C_{B5}$ and $C_{B6}$, we have uniformly in $i$,
\[
C_{B5} \le \lambda_{\min}(\Sigma_i) \le \lambda_{\max}(\Sigma_i) \le C_{B6}.
\]

Assumption A5′. For some positive constants $C_{B7}$ and $C_{B8}$, we have uniformly in $i$,
\[
C_{B7} \le \lambda_{\min}(V_i) \le \lambda_{\max}(V_i) \le C_{B8}.
\]

Assumption A6′. For some positive constants $C_{B10}$ and $C_{B11}$, we have uniformly in $i$,
\[
\max_{1\le i\le n} C_{B10}\, E\{\exp(|\epsilon_i|^2/C_{B10}) - 1 \mid X_i, Z_i, T_i\} \le C_{B11}.
\]

To prove Proposition S.1, we have only to proceed as in [3] by replacing their $Z_{ij}$, $Z_i$, and $\varphi^*_k(t)$ with $W_{ij}$, $W_i$, and $Z^T\varphi^*_{\Delta k}(t)$, respectively. We just state the relevant changes and remarks in the following:

(i) Lemmas S.2-S.4 of [3]: We reorganize these lemmas in Lemma S.1 given later. Its (i)-(iii), (iv) and (vi) correspond to Lemma S.2, the latter half of Lemma S.3 and Lemma S.4 of [3], respectively. The former half of Lemma S.3 of [3] seems to be used in their Corollary 1. However, it can be relaxed to (v) of Lemma S.1 here.

(ii) Lemma S.8 of [3]: The regressors Xij and Wij still form a VC class

and we can proceed completely in the same way as in [3].

We state Lemma S.1 in the following. It can be proved in the same way as Lemma 1.

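Lemma S.1. Assume that Assumptions A1′, A2′, A3, A4′, A5′ hold. Then we have the following results.

(i) There are positive constants $C_1$ and $C_2$ such that
\[
C_1 \|g\|_{G,2} \le \|Z^Tg\|_\Delta \le C_2 \|g\|_{G,2}.
\]

(ii) There are positive constants $C_3$ and $C_4$ such that
\[
\|g\|^2_{G,\infty} \le C_3 K_n \|g\|^2_{G,2} \le C_4 K_n (\|Z^Tg\|_\Delta)^2
\]
for any $g \in G_B$.

(iii) There is a positive constant $C_5$ such that for any $\beta \in \mathbb{R}^p$ and $g \in G_B$,
\[
\|X^T\beta + Z^Tg\|_\infty \le C_5 K_n^{1/2} \|X^T\beta + Z^Tg\|_\Delta,
\]
where $\|v\|_\infty = \max_{i,j} |v_{ij}|$. Besides, we have for some positive constant $C_6$, $\|v\|_\Delta \le C_6 \|v\|_\infty$.

(iv)
\[
\sup_{g_1, g_2 \in G_B} \left| \frac{\langle Z^Tg_1, Z^Tg_2\rangle_{\Delta n} - \langle Z^Tg_1, Z^Tg_2\rangle_\Delta}{\|Z^Tg_1\|_\Delta \|Z^Tg_2\|_\Delta} \right| = O_p\big(K_n \sqrt{\log n / n}\big).
\]

(v) For any positive constant $M$, we have
\[
\langle X_j - Z^Tg_j, X_k - Z^Tg_k\rangle_{\Delta n} - \langle X_j - Z^Tg_j, X_k - Z^Tg_k\rangle_\Delta = o_p(1)
\]
uniformly in $g_j \in G_B$ and $g_k \in G_B$ satisfying $\|g_j\|_{G,2} \le M$ and $\|g_k\|_{G,2} \le M$, respectively.

(vi) For any stochastic process $\delta_n$ taking values at $T_{ij}$ satisfying that $\|\delta_n\|_\infty$ is uniformly bounded in $n$ and $\{\delta_{n,ij}\}_{j=1}^{m_i}$ are mutually independent in $i$, we have
\[
\sup_{g \in G_B} \left| \frac{\langle \delta_n, Z^Tg\rangle_{\Delta n} - \langle \delta_n, Z^Tg\rangle_\Delta}{\|Z^Tg\|_\Delta} \right| = O_p\big(\sqrt{K_n/n}\big) \|\delta_n\|_\infty.
\]

(vii) Suppose in addition that Assumption S holds for the norm here. Then we have for $k = 1, \ldots, p$,
\[
\|\widehat{\varphi}_{\Delta k}\|_\infty = O_p(1), \quad \|Z^T(\varphi^*_{\Delta k} - \widehat{\varphi}_{\Delta k})\|_{\Delta n} = o_p(1), \quad\text{and}\quad \|Z^T(\varphi^*_{\Delta k} - \widehat{\varphi}_{\Delta k})\|_\Delta = o_p(1).
\]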

References.

[S1] Giné, E., Latała, R. and Zinn, J. (2000). Exponential and moment inequalities for U-statistics. In High Dimensional Probability II (pp. 13–38). Boston: Birkhäuser.

[S2] Huang, J. Z. (2003). Local asymptotics for polynomial spline regression. Ann. Statist. 31 1600–1635.

[S3] Masry, E. (1996). Multivariate local polynomial regression for time series: uniform strong consistency and rates. J. Time Series Anal. 17 571–599.


EFFICIENT ESTIMATION IN SEMIVARYING COEFFICIENT MODELS FOR

LONGITUDINAL/CLUSTERED DATA

By Ming-Yen Cheng¶,∗, Toshio Honda‖,†, and Jialiang Li∗∗,‡

National Taiwan University∗, Hitotsubashi University†, and National University of Singapore‡

In semivarying coefficient modeling of longitudinal/clustered data, of primary interest is usually the parametric component which involves unknown constant coefficients. First we study semiparametric efficiency bound for estimation of the constant coefficients in a general setup. It can be achieved by spline regression using the true within-subject covariance matrices, which are often unavailable in reality. Thus we propose an estimator when the covariance matrices are unknown and depend only on the index variable. To achieve this goal, we estimate the covariance matrices using residuals obtained from a preliminary estimation based on working independence and both spline and local linear regression. Then, using the covariance matrix estimates, we employ spline regression again to obtain our final estimator. It achieves the semiparametric efficiency bound under normality assumption and has the smallest asymptotic covariance matrix among a class of estimators even when normality is violated. Our theoretical results hold either when the number of within-subject observations diverges or when it is uniformly bounded. In addition, the local linear estimator of the nonparametric component is superior to the spline estimator in terms of numerical performance. The proposed method is compared with the working independence estimator and some existing method via simulations and application to a real data example.

§This research was partially supported by the Hitotsubashi International Fellow Program and a Taiwan Ministry of Education grant.

¶Corresponding author. Research was supported by the Ministry of Science and Technology grants 101-2118-M-002-001-MY3 and 104-2118-M-002-005-MY3.

‖Research was supported by the JSPS Grant-in-Aids for Scientific Research (A) 24243031 and (C) 25400197.

∗∗Research was supported by grants AcRF R-155-000-130-112 and NMRC/CBRG/0014/2012.

MSC 2010 subject classifications: Primary 62G08

Keywords and phrases: Covariance matrix estimation; local linear regression; semiparametric efficiency bound; spline functions.


1. Introduction. Suppose we have a scalar response $Y$, and $p$-dimensional and $q$-dimensional covariate vectors $X$ and $Z$, respectively. Longitudinal data consist of $(Y_{ij}, X_{ij}, Z_{ij}, T_{ij})$, $i = 1, \ldots, n$, $j = 1, \ldots, m_i$, where $Y_{ij}$, $X_{ij} = (X_{ij1}, \ldots, X_{ijp})^T$ and $Z_{ij} = (Z_{ij1}, \ldots, Z_{ijq})^T$ are respectively the values of $Y$, $X$ and $Z$ of the $i$th subject at the $j$th observation time $T_{ij} \in [0, 1]$. Such data are commonly acquired for various purposes, such as evidence-based knowledge discovery and empirical study, in a wide range of subject areas. When the subjects are changed to clusters and the $T_{ij}$'s are observations on some index variable other than time, they are usually called clustered data. We assume that all the covariates are uniformly bounded for technical reasons. Besides, we let $Z_{ij1} \equiv 1$ and suppose $X_{ij}$ has no constant element for all $i$ and $j$. For $i = 1, \ldots, n$, denote $X_i = (X_{i1}, \ldots, X_{im_i})^T$, $Z_i = (Z_{i1}, \ldots, Z_{im_i})^T$, and $T_i = (T_{i1}, \ldots, T_{im_i})^T$.

A popular model for longitudinal data analysis is the semivarying coefficient model, which is specified by

\[
E(Y_{ij} \mid X_{ij}, Z_{ij}, T_{ij}, X_i, Z_i, T_i) = E(Y_{ij} \mid X_{ij}, Z_{ij}, T_{ij}) \equiv \mu(X_{ij}^T\beta + Z_{ij}^Tg(T_{ij})) = \mu_{ij}, \tag{1.1}
\]
where $A^T$ stands for the transpose of a matrix $A$. In model (1.1), $\mu(x)$ is a known strictly increasing smooth link function, $\beta$ is an unknown regression coefficient vector, and $g(t) = (g_1(t), \ldots, g_q(t))^T$ is a vector of unknown smooth functions. Define
\[
\epsilon_i = (\epsilon_{i1}, \ldots, \epsilon_{im_i})^T = Y_i - \mu_i, \quad\text{and}\quad \Sigma_i = \mathrm{Var}(\epsilon_i \mid X_i, Z_i, T_i), \tag{1.2}
\]
where $Y_i = (Y_{i1}, \ldots, Y_{im_i})^T$, $\mu_i = (\mu_{i1}, \ldots, \mu_{im_i})^T$, and $\Sigma_i$ is an $m_i \times m_i$ positive definite matrix depending on $X_i$, $Z_i$, and $T_i$, $i = 1, \ldots, n$. This is a standard marginal model in longitudinal data analysis [24].
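To make the data layout of model (1.1) concrete, the following sketch simulates one longitudinal sample under the identity link. It is an illustration only: the function name `simulate_longitudinal`, the cluster-size range, the uniform covariates, and the error covariance $\rho^{|s-t|}$ are assumptions chosen for the example and are not the settings used in the paper's simulation study.

```python
import numpy as np

def simulate_longitudinal(n, beta, g_funcs, rho=0.4, m_range=(3, 6), seed=0):
    """Draw (Y_i, X_i, Z_i, T_i), i = 1..n, from model (1.1) with identity link."""
    rng = np.random.default_rng(seed)
    p, q = len(beta), len(g_funcs)
    data = []
    for _ in range(n):
        m = int(rng.integers(m_range[0], m_range[1] + 1))   # cluster size m_i
        T = np.sort(rng.uniform(0.0, 1.0, size=m))          # observation times in [0, 1]
        X = rng.uniform(-1.0, 1.0, size=(m, p))             # bounded covariates, no constant column
        Z = np.column_stack([np.ones(m),                    # Z_ij1 = 1
                             rng.uniform(-1.0, 1.0, size=(m, q - 1))])
        G = np.column_stack([g(T) for g in g_funcs])        # row j holds g(T_ij)^T
        mu = X @ beta + np.sum(Z * G, axis=1)               # mu_ij = X_ij^T beta + Z_ij^T g(T_ij)
        # Assumed covariance for illustration: Sigma_i(j, j') = rho^{|T_ij - T_ij'|}
        Sigma = rho ** np.abs(T[:, None] - T[None, :])
        eps = rng.multivariate_normal(np.zeros(m), Sigma)
        data.append({"Y": mu + eps, "X": X, "Z": Z, "T": T, "Sigma": Sigma})
    return data

beta0 = np.array([1.0, -0.5])
g0 = [lambda t: np.sin(2 * np.pi * t), lambda t: 4 * t * (1 - t)]
sample = simulate_longitudinal(n=50, beta=beta0, g_funcs=g0)
```

Each subject is stored as a dictionary so that unequal cluster sizes $m_i$ need no padding.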

Model (1.1) consists of a parametric component, which provides information on the constant impacts of some important covariates, and a nonparametric component which captures the dynamic impacts of the other covariates. In this way the model is able to reflect unknown nonlinear structures in the data while retaining similar interpretability as the classical linear models at the same time. There is an extensive literature on the variable selection, structure identification, estimation, and inference issues [6, 8, 12, 22, 25]. In particular, often of primary interest is to have access to the parametric component while the nonparametric component is viewed as the nuisance part. In this regard, it is well known that assuming independence or some mis-specified working covariance structure yields less efficient estimation of the constant coefficients. Therefore, a substantial portion of the existing literature aimed at improving the efficiency via modeling and estimating the within-subject covariance structure [6, 7, 10, 18, 26, 27, 28], which is itself a challenging task.

In this article, we focus on the identity link function and make contributions to the efficient estimation problem for model (1.1) in three directions. First, we allow some of the $m_i$'s to tend to infinity. As far as we know, this setup has not been treated before and the problem is nontrivial. Our results also hold when the $m_i$'s are uniformly bounded and $\epsilon_i$ satisfies the sub-Gaussian property. See the supplement [5] for the details. When all of the $m_i$'s are diverging, that is, if we have densely observed data, it becomes a kind of functional data problem and is out of the scope of this paper. Second, we study the explicit expression of the semiparametric efficiency bound for estimation of $\beta$ and asymptotic normality of the generalized estimating equations (GEE) spline estimator under general covariance structures and error distributions. Using the true covariance matrices in the GEE estimation leads to optimality among all GEE estimators of the parametric component. Furthermore, it achieves the semiparametric efficiency bound when the errors are conditionally normal. Our results are parallel to those for partially linear and partially linear additive models given by [13] and [4] respectively. Those models are among a rich variety of semiparametric ways of modeling longitudinal data, and they differ from semivarying coefficient models in that their nonparametric components admit more direct additive expressions. Partially linear (additive) models were also considered by [14, 15, 16, 17, 23], among which [14, 15, 16, 23] used kernel methods and [17] used spline estimation.

Our third contribution is to deal with adaptive efficient estimation when the within-subject covariance matrices are estimated nonparametrically using the data at hand. Notice that [4] ignored this practical issue and did not consider estimation of the covariances, and [13] suggested using some parametric specification which can be estimated $\sqrt{n}$-consistently. We consider the case where the nonparametric within-subject covariance matrices depend only on the observation times but not on the other covariates. Such assumptions are reasonable because we do not assume that the observation times are regular across different subjects or that they are dense. Indeed, with irregular and/or sparse observation times, estimating the covariances in a completely nonparametric way, by letting them depend on all of the $T_{ij}$, $X_{ij}$ and $Z_{ij}$ nonparametrically, is particularly problematic and even unreliable as the curse-of-dimensionality problem arises. Our covariance estimator is constructed based on residuals yielded by an initial estimation. The final estimator of the true value of $\beta$ is then given by plugging the covariance estimates into the GEE spline estimation. We show the asymptotic equivalence of our final estimator to the oracle efficient estimator which uses the true covariance matrices in the GEE spline estimation.

The above result is partly motivated by the study of [14] on efficient estimation in partially linear models under the same nonparametric covariance structure. However, the kernel profile method taken by [14] involves only local linear regression, thus, to achieve semiparametric efficiency it requires some complicated iterative backfitting calculation except for the identity link function [15, 16]. By comparison, our approach to estimating the parametric and nonparametric components in the mean function is different and much simpler. We ingeniously use both spline approximation and local linear estimation to avoid complicated calculation while allowing for the asymptotic equivalence property at the same time. To the best of our knowledge, there are no existing results for semivarying coefficient models, especially when some of the $m_i$'s tend to infinity or when the $\Sigma_i$'s are estimated.

Our final estimator is some kind of feasible generalized least squares (FGLS) estimator since we replace the within-subject covariance matrices with their nonparametric estimates. Even if our assumption on the covariance matrices fails to hold, it still possesses the asymptotic normality under mild conditions and still makes use of some information of the covariance matrices. For example, if the covariances depend on some time-dependent covariates, to some extent such effects are still captured by our method. In this sense, compared with existing methods which use either parametrically estimated or some ad-hoc covariance matrices [7, 18, 21], our approach is more adaptive to the unknown covariance matrices. A promising cluster bootstrap inference method was proposed by [2]; it assumes some parametric within-cluster covariance structure, however. In the case where there is one observation for each subject/cluster, our assumption on the covariance matrices reduces to that of [20], which also suggested to improve the efficiency in a similar manner.

Our simulation study shows that numerically the proposed method outperforms the working independence approach and the quadratic inference functions (QIF) method by [18], and it behaves close to the oracle estimator which uses the true covariance matrices. Note that, while the QIF procedure is suitable when there is some kind of regularity and stationarity in the error process, our procedure adapts to both non-stationarity and irregularity. We also applied our method to the CD4 count dataset and identified some interesting new effects not detected by the working independence approach. After the semiparametric efficient estimation, we can estimate and make inference on the nonparametric component in the same way as in dealing with varying coefficient models, using the difference between the response and the estimated parametric part [25]. When $p$ and $q$ are both diverging and the model is sparse, [6] suggested a simultaneous variable selection and structure identification procedure and showed its consistency property. By combining the method with the proposed estimation procedure and by putting together the corresponding consistency and efficiency results, we have an efficient estimation procedure in this case.

The organization of this paper is as follows. In Section 2 we derive the semiparametric efficiency bound for the constant coefficient vector $\beta$ and asymptotic normality of GEE spline estimators. In Section 3, we propose an efficient estimator of $\beta$ when the errors have some general covariance structure and state its asymptotic equivalence to the oracle estimator which assumes the covariance matrices are known. Section 4 summarizes and discusses results of our simulation and empirical studies used to assess numerical performance of the proposed efficient estimator. Section 5 contains some technical assumptions and proof of the asymptotic equivalence. In the supplementary material [5] we give additional simulation results for estimation, proofs of the other theoretical results, some lemmas, and theoretical results when the $m_i$'s are uniformly bounded.

2. Semiparametric efficiency bound for $\beta$. In this section, $V_i$ is a given $m_i \times m_i$ inverse weight matrix depending only on $X_i$, $Z_i$, and $T_i$, $i = 1, \ldots, n$. We use a $K_n$-dimensional equispaced B-spline basis on $[0, 1]$, denoted by $B(t)$, to approximate the function $g(t)$. See [19] for the definition and properties of B-spline bases. We set $W_{ij} = Z_{ij} \otimes B(T_{ij})$ and $W_i = (W_{i1}, \ldots, W_{im_i})^T$, where $\otimes$ is the Kronecker product, and we denote the true values of $\beta$ and $g(t)$ by $\beta_0$ and $g_0(t) = (g_{01}(t), \ldots, g_{0q}(t))^T$ respectively. Then we estimate $\beta_0$ and $g_0(t)$ by minimizing with respect to $\beta$ and $\gamma$ simultaneously the following objective function:
\[
\sum_{i=1}^n (Y_i - \mu(X_i\beta + W_i\gamma))^T V_i^{-1} (Y_i - \mu(X_i\beta + W_i\gamma)), \tag{2.1}
\]
where $\gamma \in \mathbb{R}^{qK_n}$ and the $j$th element of $\mu(X_i\beta + W_i\gamma)$ is $\mu(X_{ij}^T\beta + W_{ij}^T\gamma)$.

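Thus the generalized estimating equations are
\[
\sum_{i=1}^n X_i^T \Delta_i V_i^{-1} (Y_i - \mu(X_i\beta + W_i\gamma)) = 0 \quad\text{and}\quad \sum_{i=1}^n W_i^T \Delta_i V_i^{-1} (Y_i - \mu(X_i\beta + W_i\gamma)) = 0, \tag{2.2}
\]
where $\Delta_i$ is an $m_i \times m_i$ diagonal matrix defined by $\Delta_i = \mathrm{diag}(\mu'(X_{i1}^T\beta + W_{i1}^T\gamma), \ldots, \mu'(X_{im_i}^T\beta + W_{im_i}^T\gamma))$. Denote the solution to (2.2) by $\widehat{\beta}_V$ and $\widehat{\gamma}_V \equiv (\widehat{\gamma}_{1V}^T, \ldots, \widehat{\gamma}_{qV}^T)^T$. Then the GEE spline estimator with weight matrices $V_i^{-1}$, $i = 1, \ldots, n$, for $\beta_0$ is $\widehat{\beta}_V$ and that for $g_0(t)$ is $(\widehat{\gamma}_{1V}^T B(t), \ldots, \widehat{\gamma}_{qV}^T B(t))^T$.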
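For the identity link, $\Delta_i$ is the identity matrix and the estimating equations (2.2) are linear in $(\beta, \gamma)$, so they can be solved in closed form as a weighted least squares problem. The sketch below illustrates this under the assumption that the $V_i$ are symmetric positive definite and that the design matrices $W_i$ have already been built; the function name `gee_spline_identity` is hypothetical.

```python
import numpy as np

def gee_spline_identity(Y, X, W, V):
    """Solve (2.2) for the identity link, where Delta_i is the identity.

    Y, X, W, V are lists over subjects: Y[i] has shape (m_i,), X[i] (m_i, p),
    W[i] (m_i, q*K_n), and V[i] (m_i, m_i), assumed symmetric positive definite.
    """
    p = X[0].shape[1]
    d = p + W[0].shape[1]
    H = np.zeros((d, d))
    b = np.zeros(d)
    for Yi, Xi, Wi, Vi in zip(Y, X, W, V):
        Di = np.hstack([Xi, Wi])          # stacked design (X_i, W_i)
        ViD = np.linalg.solve(Vi, Di)     # V_i^{-1} (X_i, W_i)
        H += Di.T @ ViD                   # sum_i (X_i, W_i)^T V_i^{-1} (X_i, W_i)
        b += ViD.T @ Yi                   # sum_i (X_i, W_i)^T V_i^{-1} Y_i
    theta = np.linalg.solve(H, b)
    return theta[:p], theta[p:]           # (beta_hat_V, gamma_hat_V)
```

Taking every `V[i]` equal to the identity gives the working independence fit; a general link would instead require an iterative solver, which is not shown here.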

Hereafter we focus on the identity link function and present the asymptotic normality of $\widehat{\beta}_V$ in Proposition 1 under general error distributions as specified in Assumption A6 given in Section 5. We allow some of the $m_i$'s to diverge in a way like $\sum_{i=1}^n m_i^5 = O(n)$ and $\max_{1\le i\le n} m_i = O(n^{1/8})$. See Assumptions A1 and A2 for the specific conditions on the $m_i$'s. We refer to the supplement [5] for the results for general link functions when the $m_i$'s are uniformly bounded and the $\epsilon_i$'s satisfy the sub-Gaussian property.

First, we introduce some function spaces, inner products and projections. Let $L_2$ denote the space of square integrable functions on $[0, 1]$ and recall $B(t)$ is the equispaced B-spline basis on $[0, 1]$. We define two function spaces:
\[
G = \{(g_1, \ldots, g_q)^T \mid g_j \in L_2,\ j = 1, \ldots, q\}
\]
and
\[
G_B = \{(B^T\gamma_1, \ldots, B^T\gamma_q)^T \mid \gamma = (\gamma_1^T, \ldots, \gamma_q^T)^T \in \mathbb{R}^{qK_n}\}.
\]

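Note that $G_B \subset G$. Next, let $v_1$ and $v_2$ be two stochastic processes each taking scalar values at $T_{ij}$, $i = 1, \ldots, n$, $j = 1, \ldots, m_i$. Then we define two inner products of $v_1$ and $v_2$ by $\langle v_1, v_2\rangle_{Vn} = \frac{1}{n}\sum_{i=1}^n v_{1i}^T V_i^{-1} v_{2i}$ and $\langle v_1, v_2\rangle_V = E\{\langle v_1, v_2\rangle_{Vn}\}$, where $v_{1i}$ and $v_{2i}$ are defined in the same way as $T_i$, and we define the associated norms by $\|v\|_{Vn} = (\langle v, v\rangle_{Vn})^{1/2}$ and $\|v\|_V = (\langle v, v\rangle_V)^{1/2}$. The projections, with respect to $\|\cdot\|_V$, of the $k$th element of $X$ onto $Z^TG$ and $Z^TG_B$ are given by
\[
\Pi_V X_k = \operatorname*{argmin}_{g \in G} \|X_k - Z^Tg\|_V \quad\text{and}\quad \Pi_{Vn} X_k = \operatorname*{argmin}_{g \in G_B} \|X_k - Z^Tg\|_V, \tag{2.3}
\]
where $\|X_k - Z^Tg\|_V^2 = \frac{1}{n} E\big\{\sum_{i=1}^n (X_{ik} - (Z^Tg)_i)^T V_i^{-1} (X_{ik} - (Z^Tg)_i)\big\}$, with $X_{ik} = (X_{i1k}, \ldots, X_{im_ik})^T$ and $(Z^Tg)_i = (Z_{i1}^Tg(T_{i1}), \ldots, Z_{im_i}^Tg(T_{im_i}))^T$. Hereafter we write $\varphi^*_{Vk} = \Pi_V X_k \in G$ and $\varphi_{Vk} = \Pi_{Vn} X_k \in G_B$.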
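The empirical inner product $\langle v_1, v_2\rangle_{Vn}$ and a sample analogue of the projection onto the spline space can be sketched as follows. This is an illustration under assumptions: the projections in (2.3) are defined through the population norm $\|\cdot\|_V$, whereas the code minimizes the empirical norm, and the helper names `inner_vn`, `norm_vn` and `project_empirical` are hypothetical.

```python
import numpy as np

def inner_vn(v1, v2, V):
    """Empirical inner product <v1, v2>_{Vn} = (1/n) sum_i v1_i^T V_i^{-1} v2_i."""
    n = len(V)
    return sum(a @ np.linalg.solve(Vi, b) for a, b, Vi in zip(v1, v2, V)) / n

def norm_vn(v, V):
    """Associated empirical norm ||v||_{Vn}."""
    return np.sqrt(inner_vn(v, v, V))

def project_empirical(xk, W, V):
    """Sample analogue of the projection of X_k onto Z^T G_B.

    Weighted least squares of the vectors X_ik on the spline design W_i;
    returns the fitted vectors (one per subject).
    """
    A = sum(Wi.T @ np.linalg.solve(Vi, Wi) for Wi, Vi in zip(W, V))
    c = sum(Wi.T @ np.linalg.solve(Vi, xi) for Wi, Vi, xi in zip(W, V, xk))
    coef = np.linalg.solve(A, c)
    return [Wi @ coef for Wi in W]
```

By construction, the residual of the fitted projection is orthogonal, in the empirical inner product, to every column of the spline design.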


Assumption S.

(i) The projections $\varphi^*_{Vk}(t)$, $k = 1, \ldots, p$, and the varying coefficient function $g_0$ are twice continuously differentiable on $[0, 1]$, and they and their second order derivatives are uniformly bounded in $n$.

(ii) We take $K_n = \lfloor c_K n^{1/5} \rfloor$ for some positive constant $c_K$, where $\lfloor x \rfloor$ is the largest integer no greater than $x$.

Assumption S(i) is a mild and standard assumption for semiparametric models. We consider the existence and smoothness properties of $\varphi^*_{Vk}(t)$ in Section 5. Recall that all the covariates are assumed to be uniformly bounded. Since the relevant functions are assumed to be at least twice continuously differentiable, we recommend quadratic or cubic spline approximation. Then the order of $K_n$ specified in Assumption S(ii) is optimal. If the smoothness of different functions varies, we refer to [1] for the convergence rate interference phenomenon.
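As an illustration of the spline ingredients, the sketch below evaluates a cubic B-spline basis with equispaced interior knots on $[0, 1]$ through the Cox-de Boor recursion, picks $K_n = \lfloor c_K n^{1/5} \rfloor$, and forms the rows $W_{ij}^T = (Z_{ij} \otimes B(T_{ij}))^T$. The clamped knot layout, the value of `c_K`, and the function names are assumptions made for the example; the paper itself only refers to [19] for the basis.

```python
import numpy as np

def bspline_basis(t, n_basis, degree=3):
    """Evaluate a clamped B-spline basis with equispaced interior knots on [0, 1].

    Returns an array of shape (len(t), n_basis); each row sums to one.
    """
    t = np.atleast_1d(np.asarray(t, dtype=float))
    n_interior = n_basis - degree - 1
    if n_interior < 0:
        raise ValueError("n_basis must be at least degree + 1")
    interior = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]
    knots = np.concatenate([np.zeros(degree + 1), interior, np.ones(degree + 1)])
    # Degree-0 basis: indicators of the half-open knot intervals
    B = np.zeros((t.size, knots.size - 1))
    for j in range(knots.size - 1):
        B[:, j] = (knots[j] <= t) & (t < knots[j + 1])
    B[t == 1.0, degree + n_interior] = 1.0   # include the right end point
    # Cox-de Boor recursion, raising the degree one step at a time
    for d in range(1, degree + 1):
        B_new = np.zeros((t.size, knots.size - 1 - d))
        for j in range(knots.size - 1 - d):
            left = knots[j + d] - knots[j]
            right = knots[j + d + 1] - knots[j + 1]
            if left > 0:
                B_new[:, j] += (t - knots[j]) / left * B[:, j]
            if right > 0:
                B_new[:, j] += (knots[j + d + 1] - t) / right * B[:, j + 1]
        B = B_new
    return B

def spline_design(Z, T, n_basis, degree=3):
    """Rows W_ij^T = (Z_ij kron B(T_ij))^T for one subject."""
    B = bspline_basis(T, n_basis, degree)
    return (Z[:, :, None] * B[:, None, :]).reshape(Z.shape[0], -1)

n, c_K = 200, 2.0                      # c_K is a user-chosen constant (assumed value)
K_n = int(np.floor(c_K * n ** 0.2))    # K_n = floor(c_K n^{1/5}), Assumption S(ii)
```

With a cubic basis, `K_n` must be at least 4, which constrains how small `c_K` can be for a given sample size.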

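The following matrices are necessary in order to present asymptotic normality of $\widehat{\beta}_V$:
\[
H = \begin{pmatrix}
\sum_{i=1}^n X_i^T V_i^{-1} X_i & \sum_{i=1}^n X_i^T V_i^{-1} W_i \\
\sum_{i=1}^n W_i^T V_i^{-1} X_i & \sum_{i=1}^n W_i^T V_i^{-1} W_i
\end{pmatrix}
= \begin{pmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{pmatrix}, \tag{2.4}
\]
$H_{11\cdot 2} = H_{11} - H_{12}H_{22}^{-1}H_{21}$, and $H^{11} = (H_{11\cdot 2})^{-1}$.

Let $\Omega_{Vn}$ be a $p \times p$ matrix whose $(k, l)$th element is
\[
\langle X_k - Z^T\varphi^*_{Vk}, X_l - Z^T\varphi^*_{Vl}\rangle_V = \frac{1}{n} \sum_{i=1}^n E\Big\{ (X_{ik} - (Z^T\varphi^*_{Vk})_i)^T V_i^{-1} (X_{il} - (Z^T\varphi^*_{Vl})_i) \Big\}.
\]
Note that $n^{-1}H_{11\cdot 2}$ is an estimate of $\Omega_{Vn}$. We assume that there exists a $p \times p$ positive definite matrix $\Omega_V$ such that
\[
\lim_{n\to\infty} \Omega_{Vn} = \Omega_V. \tag{2.5}
\]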

Now we are ready to state the asymptotic normality of $\widehat{\beta}_V$ under general error distributions as specified in Assumption A6 given in Section 5. Its proof is given in the supplement [5]. We denote the normal distribution with mean $\eta$ and covariance $\Omega$ by $N(\eta, \Omega)$, and by "$\xrightarrow{d}$" we mean convergence in distribution. Let $I_l$ be the $l$-dimensional identity matrix.


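Proposition 1. (Asymptotic normality of $\widehat{\beta}_V$) Under Assumption S, (2.5), and Assumptions A1–A6 given in Section 5, we have
\[
\widehat{\beta}_V = \beta_0 + H^{11} \sum_{i=1}^n (X_i - W_i H_{22}^{-1}H_{21})^T V_i^{-1} \epsilon_i + o_p\Big(\frac{1}{\sqrt{n}}\Big).
\]
We also have
\[
\Gamma_V^{-1/2} (\widehat{\beta}_V - \beta_0) \xrightarrow{d} N(0, I_p),
\]
where $\Gamma_V$ is given by
\[
H^{11} \sum_{i=1}^n \Big\{ (X_i - W_i H_{22}^{-1}H_{21})^T V_i^{-1} \Sigma_i V_i^{-1} (X_i - W_i H_{22}^{-1}H_{21}) \Big\} H^{11}. \tag{2.6}
\]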

Under (2.5), $\widehat{\beta}_V$ is $\sqrt{n}$-consistent for $\beta_0$. We can estimate its asymptotic covariance $\Gamma_V$ given in (2.6) by replacing the $\Sigma_i$'s with some estimates based on $\widehat{\beta}_V$ and $\widehat{\gamma}_V$. For example, we can replace $\Sigma_i$ with $\tilde{\epsilon}_i\tilde{\epsilon}_i^T$ where $\tilde{\epsilon}_i = Y_i - X_i\widehat{\beta}_V - W_i\widehat{\gamma}_V$. However, this approach may be too crude and it does not make use of the common information on the covariance structure contained in different subjects. Alternatively, we can estimate the $\Sigma_i$'s by applying smoothing techniques to some residuals based on some assumption on the covariance structure. We investigate this problem in Section 3.
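The crude plug-in estimate of $\Gamma_V$ described above, with $\Sigma_i$ replaced by the outer product of the subject-level residual vector, can be sketched as follows. The function name `sandwich_cov` is hypothetical, the $V_i$ are assumed symmetric so that $H_{21} = H_{12}^T$, and the residuals are taken as given.

```python
import numpy as np

def sandwich_cov(X, W, V, resid):
    """Plug-in estimate of Gamma_V in (2.6) with Sigma_i replaced by e_i e_i^T."""
    H11 = sum(Xi.T @ np.linalg.solve(Vi, Xi) for Xi, Vi in zip(X, V))
    H12 = sum(Xi.T @ np.linalg.solve(Vi, Wi) for Xi, Wi, Vi in zip(X, W, V))
    H22 = sum(Wi.T @ np.linalg.solve(Vi, Wi) for Wi, Vi in zip(W, V))
    P = np.linalg.solve(H22, H12.T)             # H_22^{-1} H_21
    H11_sup = np.linalg.inv(H11 - H12 @ P)      # H^{11} = (H_{11.2})^{-1}
    meat = np.zeros_like(H11)
    for Xi, Wi, Vi, ei in zip(X, W, V, resid):
        Xt = Xi - Wi @ P                        # X_i - W_i H_22^{-1} H_21
        s = Xt.T @ np.linalg.solve(Vi, ei)      # (.)^T V_i^{-1} e_i
        meat += np.outer(s, s)                  # (.)^T V_i^{-1} e_i e_i^T V_i^{-1} (.)
    return H11_sup @ meat @ H11_sup
```

Square roots of the diagonal of the returned matrix would serve as rough standard errors for the components of $\widehat{\beta}_V$.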

Next, Proposition 2 gives the semiparametric efficiency bound for estimation of $\beta_0$. It can be proved in almost the same way as in Section 4.4 of [13] and Lemma 1 of [4] and the proof is omitted. We denote the semiparametric efficient score function of $\beta$ by $l^*_\beta = (l^*_{\beta 1}, \ldots, l^*_{\beta p})^T$. Its expression is given in Proposition 2. Then we denote $\varphi^*_{\Sigma k}(t)$ by $\varphi^*_{\mathrm{eff},k}(t)$ when $V_i = \Sigma_i$ in (2.1).

Proposition 2. (Semiparametric efficiency bound) Under the same assumptions as in Proposition 1, we have
\[
l^*_{\beta k} = \sum_{i=1}^n (X_{ik} - (Z^T\varphi^*_{\mathrm{eff},k})_i)^T \Sigma_i^{-1} \{Y_i - X_i\beta_0 - (Z^Tg_0)_i\},
\]
and the semiparametric efficient information matrix for $\beta$ is given by
\[
\lim_{n\to\infty} \frac{1}{n} E\{l^*_\beta (l^*_\beta)^T\} = \Omega_\Sigma
\]
with $V_i = \Sigma_i$ in (2.5).

Proposition 3 gives the asymptotic normality of $\widehat{\beta}_\Sigma$, the so-called oracle estimator, which uses the true covariance structure in the GEE spline regression. It also asserts that $\widehat{\beta}_\Sigma$ achieves the semiparametric efficiency bound derived from Proposition 2. The proof is given in the supplement [5].


Proposition 3. (Oracle efficient estimator) If we take $V_i = \Sigma_i$ in (2.2) then, under the same assumptions as in Proposition 1, we have
\[
\sqrt{n}\, \Omega_\Sigma^{1/2} (\widehat{\beta}_\Sigma - \beta_0) \xrightarrow{d} N(0, I_p).
\]

In practice, usually the Σi’s are unknown and we have no direct access

to the semiparametric efficient score function or the oracle estimator. In the next section we study nonparametric estimation of the covariances so as to improve the efficiency.

3. Efficient estimation. The semiparametric efficiency bound of $\beta$ given in Proposition 2 indicates that knowledge, or at least estimation, of the $\Sigma_i$'s is necessary in order to construct a semiparametric efficient estimator. On the other hand, as discussed in the Introduction, when the $\Sigma_i$'s are unknown it is almost impossible to estimate them in a fully nonparametric way. Fortunately, for longitudinal or clustered data sets, it is reasonable to make some assumptions such as
\[
\Sigma_i = \Sigma(T_i), \quad i = 1, \ldots, n, \tag{3.1}
\]
where the $(j, j)$th element of $\Sigma_i$ is given by $\sigma^2(T_{ij})$ and the $(j, j')$th element is given by $\sigma(T_{ij}, T_{ij'})$ when $j \neq j'$, for some smooth functions $\sigma^2(t)$ and $\sigma(s, t)$.

Based on (3.1), in Section 3.1 we construct nonparametric estimates of the covariances and then use them to derive an FGLS procedure to improve the efficiency, and we show in Section 3.2 its asymptotic equivalence to the oracle estimator $\widehat{\beta}_\Sigma$. We also discuss estimation of the nonparametric component.

3.1. Methodology. A preliminary estimation of $\beta_0$ and $g_0$ is necessary before we can estimate the covariances. For simplicity and robustness, we utilize working independence in the GEE spline estimation. As noted following Proposition 1 we could then use the resultant residuals to estimate the covariance matrices directly. However it is intuitively better to further make use of the covariance structure (3.1) by applying some nonparametric smoothing techniques to the residuals. In addition, alternative to the spline estimator, we could apply smoothing techniques to the pseudo responses $Y_i - X_i\widehat{\beta}_V$ to obtain another estimator of $g_0$. We take this latter approach for technical and numerical reasons given in Remark 1. After the preliminary estimation, for each $i = 1, \ldots, n$, we estimate $\Sigma_i$ by applying local linear regression and denote the resultant estimate by $\widehat{\Sigma}_i$. Our final estimator of $\beta_0$ is then obtained by taking $V_i = \widehat{\Sigma}_i$, $i = 1, \ldots, n$, in the GEE spline estimation. Note that in the trivial case where $m_i$ is fixed for all $i$ and the $T_{ij}$'s are


Let K be a given kernel function. Our estimation procedure is formally specified as follows:

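Step 1. Estimate $\beta_0$ by the GEE spline method given in Section 2 with $V_i = I_{m_i}$, $i = 1, \ldots, n$, and denote the resultant working independence estimate by $\widehat{\beta}_I$.

Step 2. Estimate $g_0(t)$ by applying local linear regression to $Y_{ij} - X_{ij}^T\widehat{\beta}_I$, $i = 1, \ldots, n$, $j = 1, \ldots, m_i$, using bandwidth $h_1$. We denote the resultant estimate by $\widehat{g}(t)$, which is written as
\[
\widehat{g}(t) = D_q (A_{1n}(t))^{-1} \frac{1}{N_1h_1} \sum_{i=1}^n \sum_{j=1}^{m_i} \left\{ Z_{ij} \otimes \begin{pmatrix} 1 \\ \frac{T_{ij}-t}{h_1} \end{pmatrix} \right\} K\Big(\frac{T_{ij}-t}{h_1}\Big) (Y_{ij} - X_{ij}^T\widehat{\beta}_I), \tag{3.2}
\]
where $N_1 = \sum_{i=1}^n m_i$, $D_q = I_q \otimes (1\ 0)$, and
\[
A_{1n}(t) = \frac{1}{N_1h_1} \sum_{i=1}^n \sum_{j=1}^{m_i} (Z_{ij}Z_{ij}^T) \otimes \begin{pmatrix} 1 & \frac{T_{ij}-t}{h_1} \\ \frac{T_{ij}-t}{h_1} & \big(\frac{T_{ij}-t}{h_1}\big)^2 \end{pmatrix} K\Big(\frac{T_{ij}-t}{h_1}\Big).
\]

Step 3. Calculate the residuals, denoted as $\widehat{\epsilon}_{ij}$, given by
\[
\widehat{\epsilon}_{ij} = Y_{ij} - X_{ij}^T\widehat{\beta}_I - Z_{ij}^T\widehat{g}(T_{ij}), \quad i = 1, \ldots, n,\ j = 1, \ldots, m_i.
\]

Step 4. Estimate the variance function $\sigma^2(t)$ by applying to the squared residuals local linear regression with bandwidth $h_2$. Denote the resultant estimate by $\widehat{\sigma^2}(t)$; it can be expressed as
\[
\widehat{\sigma^2}(t) = (1\ 0)(A_{2n}(t))^{-1} \frac{1}{N_1h_2} \sum_{i=1}^n \sum_{j=1}^{m_i} \begin{pmatrix} 1 \\ \frac{T_{ij}-t}{h_2} \end{pmatrix} K\Big(\frac{T_{ij}-t}{h_2}\Big) (\widehat{\epsilon}_{ij})^2, \tag{3.3}
\]
where
\[
A_{2n}(t) = \frac{1}{N_1h_2} \sum_{i=1}^n \sum_{j=1}^{m_i} \begin{pmatrix} 1 & \frac{T_{ij}-t}{h_2} \\ \frac{T_{ij}-t}{h_2} & \big(\frac{T_{ij}-t}{h_2}\big)^2 \end{pmatrix} K\Big(\frac{T_{ij}-t}{h_2}\Big).
\]

Step 5. Estimate the covariance function $\sigma(s, t)$ by applying to $\widehat{\epsilon}_{ij}\widehat{\epsilon}_{ij'}$, $j \neq j'$, $i = 1, \ldots, n$, local linear regression with bandwidth $h_3$. We denote the resultant estimate by $\widehat{\sigma}(s, t)$; it has the following expression:
\[
\widehat{\sigma}(s, t) = (1\ 0\ 0)(A_{3n}(s, t))^{-1} \times \frac{1}{N_2h_3^2} \sum_{i=1}^n \sum_{j\neq j'} \begin{pmatrix} 1 \\ \frac{T_{ij}-s}{h_3} \\ \frac{T_{ij'}-t}{h_3} \end{pmatrix} K\Big(\frac{T_{ij}-s}{h_3}\Big) K\Big(\frac{T_{ij'}-t}{h_3}\Big) \widehat{\epsilon}_{ij}\widehat{\epsilon}_{ij'}, \tag{3.4}
\]
where $N_2 = \sum_{i=1}^n m_i(m_i - 1)$ and
\[
A_{3n}(s, t) = \frac{1}{N_2h_3^2} \sum_{i=1}^n \sum_{j\neq j'} \begin{pmatrix} 1 \\ \frac{T_{ij}-s}{h_3} \\ \frac{T_{ij'}-t}{h_3} \end{pmatrix} \begin{pmatrix} 1 & \frac{T_{ij}-s}{h_3} & \frac{T_{ij'}-t}{h_3} \end{pmatrix} K\Big(\frac{T_{ij}-s}{h_3}\Big) K\Big(\frac{T_{ij'}-t}{h_3}\Big).
\]

Step 6. Calculate $\widehat{\Sigma}_i$ by combining the results from steps 4 and 5 by letting
\[
\widehat{\Sigma}_i(j, j') = \widehat{\sigma}(T_{ij}, T_{ij'}) I(j \neq j') + \widehat{\sigma^2}(T_{ij}) I(j = j'),
\]
and then estimate $\beta_0$ with $V_i = \widehat{\Sigma}_i$ in the GEE (2.2). Denote the resultant estimate of $\beta_0$ by $\widehat{\beta}_{\widehat{\Sigma}}$.

Step 7. Update the nonparametric estimator of $g_0(t)$ given in Step 2 by replacing $Y_{ij} - X_{ij}^T\widehat{\beta}_I$ with $Y_{ij} - X_{ij}^T\widehat{\beta}_{\widehat{\Sigma}}$, $i = 1, \ldots, n$, $j = 1, \ldots, m_i$. Denote the resultant estimator by $\widehat{g}_U(t)$. Alternatively, we can estimate $g_0(t)$ with splines, by replacing $\beta$ with $\widehat{\beta}_{\widehat{\Sigma}}$ and taking $V_i = \widehat{\Sigma}_i$ in the GEE (2.2). Denote the resultant estimator by $\widehat{g}_S(t)$.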
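Steps 4 to 6 can be sketched in code as follows. This is a minimal illustration rather than the authors' implementation: the residuals are taken as given, the biweight kernel is one possible choice of $K$, the function names are hypothetical, and no safeguard is included for evaluation points whose kernel window contains too few observations. The normalizing factors $1/(N_1h_2)$ and $1/(N_2h_3^2)$ cancel between the matrix and the vector, so they are omitted.

```python
import numpy as np

def biweight(u):
    """Biweight kernel: a compactly supported, continuously differentiable density."""
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u ** 2) ** 2, 0.0)

def local_linear_1d(t0, T, R, h, kernel=biweight):
    """Local linear fit at t0 of responses R on times T, as in (3.3)."""
    u = (T - t0) / h
    w = kernel(u)
    D = np.column_stack([np.ones_like(u), u])   # rows (1, (T_ij - t)/h)
    A = D.T @ (w[:, None] * D)                  # proportional to A_2n(t)
    b = D.T @ (w * R)
    return np.linalg.solve(A, b)[0]             # (1 0) A^{-1} b

def local_linear_2d(s0, t0, S, T, R, h, kernel=biweight):
    """Local linear fit at (s0, t0) of products R on pairs (S, T), as in (3.4)."""
    us, ut = (S - s0) / h, (T - t0) / h
    w = kernel(us) * kernel(ut)
    D = np.column_stack([np.ones_like(us), us, ut])
    A = D.T @ (w[:, None] * D)                  # proportional to A_3n(s, t)
    b = D.T @ (w * R)
    return np.linalg.solve(A, b)[0]             # (1 0 0) A^{-1} b

def estimate_sigma(T_list, resid_list, h2, h3):
    """Steps 4-6: build Sigma_hat_i for every subject from residuals."""
    T_all = np.concatenate(T_list)
    r2_all = np.concatenate(resid_list) ** 2
    S_pairs, T_pairs, R_pairs = [], [], []
    for Ti, ei in zip(T_list, resid_list):
        m = len(Ti)
        off = ~np.eye(m, dtype=bool)            # all ordered pairs with j != j'
        S_pairs.append(np.repeat(Ti[:, None], m, axis=1)[off])
        T_pairs.append(np.repeat(Ti[None, :], m, axis=0)[off])
        R_pairs.append(np.outer(ei, ei)[off])
    S_p, T_p, R_p = map(np.concatenate, (S_pairs, T_pairs, R_pairs))
    out = []
    for Ti in T_list:
        m = len(Ti)
        Sig = np.empty((m, m))
        for j in range(m):
            for k in range(m):
                if j == k:
                    Sig[j, j] = local_linear_1d(Ti[j], T_all, r2_all, h2)
                else:
                    Sig[j, k] = local_linear_2d(Ti[j], Ti[k], S_p, T_p, R_p, h3)
        out.append(Sig)
    return out
```

The resulting matrices can then be passed as the weights $V_i$ to a GEE solver to carry out the second half of Step 6.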

In general the covariance function estimate $\widehat{\sigma}(s, t)$ given by step 5 may not be positive semidefinite. We can modify it by truncating the eigenfunctions in its spectral decomposition that have eigenvalues not exceeding some nonnegative constant $\lambda_L$. Then we have positive definite covariance estimates if we replace $\widehat{\sigma}(s, t)$ with this modified version in step 6.
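A finite-dimensional version of this truncation can be sketched as follows. The code operates on a symmetric matrix, for instance the covariance estimate evaluated on a grid of time points; treating the matrix eigendecomposition as a stand-in for the spectral decomposition of the covariance function is an assumption of the example, and the name `truncate_eigenvalues` is hypothetical.

```python
import numpy as np

def truncate_eigenvalues(S, lam_L=0.0):
    """Zero out eigenvalues of a symmetric matrix that do not exceed lam_L."""
    S = (S + S.T) / 2.0                        # symmetrize against rounding error
    vals, vecs = np.linalg.eigh(S)
    vals = np.where(vals > lam_L, vals, 0.0)   # keep only eigenvalues above the cut-off
    return (vecs * vals) @ vecs.T              # rebuild from the retained components

S = np.array([[1.0, 2.0], [2.0, 1.0]])         # eigenvalues 3 and -1
S_mod = truncate_eigenvalues(S)
```

Setting `lam_L = 0` removes only the non-positive part, while a strictly positive cut-off also discards small positive eigenvalues.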

Remark 1. When we calculate $\widehat{\beta}_I$ in step 1, we also have $\widehat{\gamma}_I$ and get the set of residuals $\{\tilde{\epsilon}_{ij} = Y_{ij} - X_{ij}^T\widehat{\beta}_I - W_{ij}^T\widehat{\gamma}_I\}$. Then we could omit steps 2 and 3 of our procedure by exploiting this set of residuals when we estimate $\Sigma_i$ in steps 4-6. However, our simulation results summarized in Section 4 indicate that this simplified approach is inferior to the proposed one. Intuitively speaking, to achieve the semiparametric efficiency in the GEE spline estimation of $\beta_0$, to some extent the accompanying estimation of $g_0(t)$ requires undersmoothing and thus it often exhibits spurious wiggling patterns. Besides, it is difficult to justify theoretically this simplified approach as the local property of spline estimators seems to be intractable.

3.2. Asymptotic results. We establish the asymptotic equivalence between the data-driven estimator $\widehat{\beta}_{\widehat{\Sigma}}$ and the oracle estimator $\widehat{\beta}_\Sigma$ by exploiting some desirable properties of $\widehat{\Sigma}_i$. First, we specify our assumptions on the smoothness of $g_0(t)$, $\sigma^2(t)$ and $\sigma(s, t)$. We need Assumption B given below, which is more restrictive than usual, in order to evaluate the difference between $\widehat{\Sigma}_i^{-1}$ and $\Sigma_i^{-1}$.


Assumption B.

(i) Assumption (3.1) holds.

(ii) The true varying coefficient function $g_0(t)$ is three times continuously differentiable on $[0, 1]$.

(iii) The variance function $\sigma^2(t)$ is three times continuously differentiable on $[0, 1]$.

(iv) The covariance function $\sigma(s, t)$ is three times continuously differentiable on $[0, 1]^2$.

In the following we collect our assumptions on the kernel function $K$ and the three bandwidths used in the construction of the proposed estimator. Assumption H(i) on $K$ is a standard one. When Assumption B holds, our assumptions on the bandwidths $h_1$, $h_2$ and $h_3$ are not restrictive. For example, the optimal order of $h_1$ and $h_2$ is $n^{-1/5}$ which falls into the specified range. A larger order is recommended only for $h_3$ due to the two-dimensional smoothing in step 5. However, since the effective number of observations used in step 5 of the procedure is $N_2$ we anticipate that bandwidth choice will not seriously affect the performance of our final estimator.

Assumption H.

(i) The kernel function $K$ is some continuously differentiable symmetric density function with a compact support.

(ii) The bandwidths $h_1$, $h_2$ and $h_3$ satisfy $h_1 = c_1n^{-a_h}$ for some $1/6 < a_h \le 1/4$, $h_2 = c_2n^{-b_h}$ for some $1/6 < b_h \le 1/4$ and $h_3 = c_3n^{-c_h}$ for some $1/6 < c_h < 1/4$, where $c_1$, $c_2$ and $c_3$ are some positive constants.
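As a small illustration of Assumption H, the sketch below defines the biweight kernel, which is one symmetric, compactly supported and continuously differentiable density, and a helper that returns bandwidths of the form $c\,n^{-a}$ after checking the exponent ranges in (ii). The helper name `bandwidths`, the default constants, and the default exponents $1/5$ are assumptions for the example; the paper chooses the bandwidth by cross-validation in its simulations.

```python
import numpy as np

def biweight(u):
    """Biweight kernel K(u) = (15/16)(1 - u^2)^2 on [-1, 1], zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u ** 2) ** 2, 0.0)

def bandwidths(n, c=(1.0, 1.0, 1.0), exponents=(0.2, 0.2, 0.2)):
    """Return (h1, h2, h3) with h_k = c_k * n^{-a_k}, checking Assumption H(ii)."""
    a_h, b_h, c_h = exponents
    ok = (1 / 6 < a_h <= 1 / 4) and (1 / 6 < b_h <= 1 / 4) and (1 / 6 < c_h < 1 / 4)
    if not ok:
        raise ValueError("exponents outside the ranges in Assumption H(ii)")
    return tuple(ck * n ** (-ak) for ck, ak in zip(c, exponents))

h1, h2, h3 = bandwidths(1000)   # each equals 1000^{-1/5}
```

The Epanechnikov kernel is not used here because its derivative is discontinuous at the boundary of its support.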

The asymptotic expression of $\widehat{\Sigma}_i$ is given in Proposition 4, which is verified in the supplementary material [5]. Note that we need more elaborate representations than those used by [14] since we deal with a $(p + qK_n)$-dimensional linear regression model. Note also that the functions $B_j$, $j = 1, \ldots, 4$, that appear in Proposition 4 are implicitly defined in the proof of the proposition and only their boundedness property is needed in the proof of Theorem 1.

Proposition 4. (Representations of the covariance estimators) Under the assumptions in Proposition 1 with $V_i = I_{m_i}$, and Assumptions B and H, we have the following representations of $\widehat{\sigma^2}(t)$ and $\widehat{\sigma}(s, t)$. Uniformly in $t$,
\[
\widehat{\sigma^2}(t) - \sigma^2(t) = B_1(t)h_2^2 + B_2(t)E_1(t) + O_p(h_1^3 + h_2^3) + O_p\Big(\frac{\log n}{nh_1} + \frac{\log n}{nh_2}\Big)
\]