National Science Council (Executive Yuan) Research Project Final Report
A Conditional Independence Test for Dependent Data
Research Report (Condensed Version)
Project Type: Individual
Project Number: NSC 99-2118-M-004-006-
Project Period: August 1, 2010 to July 31, 2011
Host Institution: Department of Statistics, National Chengchi University
Principal Investigator: Tzee-Ming Huang (黃子銘)
Project Staff: Yu-Ting Cheng (程毓婷), Master's student, part-time assistant;
Yu-Hsiang Cheng (鄭宇翔), Ph.D. student, part-time assistant
Availability: This report is publicly accessible
Date: July 29, 2011
A conditional independence test for dependent data based on maximal conditional correlation
Yu-Hsiang Cheng
Department of Statistics
National Chengchi University
Taipei, Taiwan, ROC
96354501@nccu.edu.tw
Tzee-Ming Huang
Department of Statistics
National Chengchi University
Taipei, Taiwan, ROC
tmhuang@nccu.edu.tw
June 16, 2011
Abstract
In Huang [7], a test of conditional independence based on maximal nonlinear conditional correlation is proposed, and the asymptotic distribution of the test statistic under conditional independence is established for IID data. In this paper, we derive the asymptotic distribution of the test statistic under conditional independence for α-mixing data. Simulation results show that the test performs reasonably well for dependent data. We also apply the test to stock index data to test Granger noncausality between returns and trading volume.
Keywords: conditional independence test, α-mixing, maximal nonlinear conditional correlation
2010 Mathematics Subject Classification: 62G10, 62H20
1 Introduction
The testing of conditional independence is important in statistics; one interesting application of such testing is variable selection. For instance, consider the following regression problem:
Y = f(Z, X) + ε, (1)

where ε is independent of (Z, X) and f is a real-valued function. If Y and X are conditionally independent given Z, the variable X can be excluded from the model in (1).
Suppose that X, Y and Z are continuous random vectors of dimensions d1, d2 and d respectively. For testing whether X and Y are conditionally independent given Z, most tests in the literature deal with the case where the observations for (X, Y, Z) are IID; see, for example, Linton and Gozalo [10], Delgado and Manteiga [2], Li, Cook and Nachtsheim [8], and Huang [7].
When the observations for (X, Y, Z) are weakly dependent, fewer tests are available in the literature. Su and White [12, 13] developed nonparametric tests based on a weighted Hellinger distance between conditional densities or the difference between conditional characteristic functions. Bouezmarni, Rombouts and Taamouti [1] also proposed a nonparametric test based on the Hellinger distance of copula densities.
In [12], [13] and [1], one motivation for constructing conditional independence tests for dependent data is to test Granger noncausality, which, according to Florens and Mouchart [4] and Florens and Fougere [3], is a form of conditional independence. Specifically, a series {Ut} does not Granger cause a series {Vt} if

Vt ⊥ (Ut−1, Ut−2, . . . , Ut−p) | (Vt−1, Vt−2, . . . , Vt−p) for every p ≥ 1,

where ⊥ denotes independence.
In this paper, we consider Huang's test statistic and derive its asymptotic distribution for α-mixing data. To measure the conditional association between X and Y given Z, Huang [7] uses a measure called the maximal nonlinear conditional correlation, which is defined as

sup_{(f,g)∈S0*} Corr(f(X, Z), g(Y, Z) | Z), (2)

where S0* is the collection of (f, g)'s such that E(f²(X, Z)) < ∞ and E(g²(Y, Z)) < ∞. Huang's test statistic is an estimator of a weighted average of maximal nonlinear conditional correlations at different evaluation points of the conditioning variable Z. The test statistic also involves certain basis functions used to approximate the f and g in (2). We show that the asymptotic distribution of Huang's test statistic for α-mixing data is the same as that for IID data if the number of evaluation points and the number of basis functions are held constant.
This paper is organized as follows. In Section 2, we review the definition of the maximal nonlinear conditional correlation and certain approximation results given in [7], and state the asymptotic properties of the test statistic that we derive under an α-mixing condition. Some simulation results and an application are presented in Section 3. Proofs are given in Section 4.
2 Review and main results
In this section, we review the definition of the maximal nonlinear conditional correlation ρ1(X, Y |Z), the approximation of ρ1(X, Y |Z), and the estimator of ρ1(X, Y |Z = z) proposed in [7]. Then, we consider Huang's test statistic for testing H0 : ρ1(X, Y |Z) = 0 and present its asymptotic properties under α-mixing conditions.
2.1 Definition, approximation, and estimation of the maximal nonlinear conditional correlation
The maximal nonlinear conditional correlation ρ1(X, Y |Z) is essentially the maximum of E(f(X, Z)g(Y, Z)|Z) over S0, where S0 is the collection of (f, g)'s that satisfy the following conditions:

E(f²(X, Z)|Z) I_(0,∞)(E(f²(X, Z)|Z)) = I_(0,∞)(E(f²(X, Z)|Z)),
E(g²(Y, Z)|Z) I_(0,∞)(E(g²(Y, Z)|Z)) = I_(0,∞)(E(g²(Y, Z)|Z)), (3)

and

E(f(X, Z)|Z) = E(g(Y, Z)|Z) = 0. (4)

To avoid dealing with the existence of the maximum and the measurability of ρ1(X, Y |Z), in [7] ρ1(X, Y |Z) is defined as
sup_{(f,g)∈S0} E(f(X, Z)g(Y, Z)|Z),

where the supremum is defined as

lim_{n→∞} E(αn(X, Z)βn(Y, Z)|Z),

where {(αn, βn)} is a sequence in S0 that satisfies the following conditions:

(i) The sequence {E(αn(X, Z)βn(Y, Z)|Z)} is non-decreasing.

(ii) For every (f, g) ∈ S0,

E(f(X, Z)g(Y, Z)|Z) ≤ lim_{n→∞} E(αn(X, Z)βn(Y, Z)|Z).
To approximate

ρ1(X, Y |Z) = sup_{(f,g)∈S0} E(f(X, Z)g(Y, Z)|Z),

we consider S0,p,q, the collection of all (f, g)'s in S0 such that f and g are in the spans of {φp,j : 1 ≤ j ≤ p} and {ψq,k : 1 ≤ k ≤ q} respectively, when Z is given. That is,

f(X, Z) = Σ_{j=1}^p ap,j(Z) φp,j(X) for some ap,j(Z)'s

and

g(Y, Z) = Σ_{k=1}^q bq,k(Z) ψq,k(Y) for some bq,k(Z)'s.
Suppose that the basis functions φp,i's and ψq,j's are selected so that there exist basis functions θr,k's such that

lim_{p,r→∞} inf_{a(i,k)} E( α(X, Z) − Σ_{1≤i≤p, 1≤k≤r} a(i, k) φp,i(X) θr,k(Z) )² = 0 (5)

and

lim_{q,r→∞} inf_{b(j,k)} E( β(Y, Z) − Σ_{1≤j≤q, 1≤k≤r} b(j, k) ψq,j(Y) θr,k(Z) )² = 0 (6)

for every α and β such that E(α²(X, Z)) and E(β²(Y, Z)) are finite. Let X, Y and Z be the ranges of X, Y and Z respectively. Suppose that for each (p, q), there exist coefficients ap,0,i's and bq,0,j's such that

Σ_{1≤i≤p} ap,0,i φp,i(x) = 1 = Σ_{1≤j≤q} bq,0,j ψq,j(y) (7)

for every x in X and every y in Y. Let

ρp,q(Z) = max_{(f,g)∈S0,p,q} E(f(X, Z)g(Y, Z)|Z).
Then, by Fact 2 in [7], ρ1(X, Y |Z) can be reasonably approximated by ρp,q(Z) if p and q are large. The statement of the fact is given below.

FACT 1. (Fact 2 in [7]) Suppose that (5), (6) and (7) hold and {pn} and {qn} are sequences of positive integers that tend to ∞ as n → ∞. Then

lim_{n→∞} E(|ρ1(X, Y |Z) − ρ_{pn,qn}(Z)|) = 0.
A remark.

• It is not difficult to find basis functions that satisfy (5), (6) and (7). If X, Y and Z are bounded regions in R^{d1}, R^{d2} and R^d respectively and the Lebesgue densities of (X, Z) and (Y, Z) are bounded, then the φp,i's and ψq,j's can be taken as B-spline basis functions on multidimensional intervals containing X and Y respectively, and the θr,k's can be taken as B-spline basis functions on a multidimensional interval containing Z.
ρp,q(Z) can be found as follows. First, we look for vectors a1 = (a1,1(Z), . . . , a1,p(Z))^T and b1 = (b1,1(Z), . . . , b1,q(Z))^T such that (a1, b1) is the pair (a, b) that maximizes a^T Σφ,ψ,p,q(Z) b subject to

a^T Σφ,p(Z) a = 1 = b^T Σψ,q(Z) b,

where

Σφ,p(Z) = ( E(φp,i(X)φp,j(X)|Z) − E(φp,i(X)|Z)E(φp,j(X)|Z) )_{p×p},
Σψ,q(Z) = ( E(ψq,i(Y)ψq,j(Y)|Z) − E(ψq,i(Y)|Z)E(ψq,j(Y)|Z) )_{q×q}

and

Σφ,ψ,p,q(Z) = ( E(φp,i(X)ψq,j(Y)|Z) − E(φp,i(X)|Z)E(ψq,j(Y)|Z) )_{p×q}.

Take

f1(X, Z) = Σ_{j=1}^p a1,j(Z)( φp,j(X) − E(φp,j(X)|Z) )

and

g1(Y, Z) = Σ_{k=1}^q b1,k(Z)( ψq,k(Y) − E(ψq,k(Y)|Z) ).

Then, E(f1(X, Z)g1(Y, Z)|Z) = ρp,q(Z).
For z ∈ Z, let Σ̂φ,ψ,p,q(z), Σ̂φ,p(z) and Σ̂ψ,q(z) be the kernel estimators of Σφ,ψ,p,q(z), Σφ,p(z) and Σψ,q(z) respectively; in other words, every element E(g(X, Y)|Z = z) of Σφ,ψ,p,q(z), Σφ,p(z) and Σψ,q(z) is estimated by

Ê(g(X, Y)|Z = z) = ( Σ_{t=1}^n g(Xt, Yt) k0((Zt − z)/h) ) / ( Σ_{t=1}^n k0((Zt − z)/h) ) (8)

in Σ̂φ,ψ,p,q(z), Σ̂φ,p(z) and Σ̂ψ,q(z), where k0 is a kernel function defined on R^d and h > 0. Then, we use

ρ̂p,q(z) = max_{(a,b)} a^T Σ̂φ,ψ,p,q(z) b

to estimate ρp,q(z), where the maximum is taken over all pairs (a, b) that satisfy

a^T Σ̂φ,p(z) a = 1 = b^T Σ̂ψ,q(z) b.

Henceforth, the estimator ρ̂p,q(z) will be abbreviated as ρ̂(z) for each z in Z.
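To make the estimation concrete, the computation of ρ̂(z) from a sample can be sketched as follows. This is a minimal numpy sketch under simplifying assumptions of ours (scalar Z, a triangular kernel, and ρ̂(z) computed as the largest singular value of the whitened estimated cross-covariance matrix, which is equivalent to the constrained maximization above when the estimated covariance matrices are positive definite); all function names are ours, not from [7].

```python
import numpy as np

def tri_kernel(u):
    # triangular kernel k*(u) = 1 - |u| on [-1, 1], zero elsewhere
    return np.clip(1.0 - np.abs(u), 0.0, None)

def cond_mean(g, Z, z, h):
    # Nadaraya-Watson estimate of E(g | Z = z), as in (8), for scalar Z
    w = tri_kernel((Z - z) / h)
    return np.sum(g * w) / np.sum(w)

def inv_sqrt(M, eps=1e-10):
    # symmetric pseudo inverse square root; near-null directions are dropped,
    # which removes the constant direction implied by (7)
    w, V = np.linalg.eigh(M)
    inv = np.where(w > eps, 1.0 / np.sqrt(np.where(w > eps, w, 1.0)), 0.0)
    return (V * inv) @ V.T

def rho_hat(Phi, Psi, Z, z, h):
    # Phi (n x p) and Psi (n x q) hold the basis evaluations phi_i(X_t), psi_j(Y_t);
    # rho_hat(z) is the largest singular value of the whitened cross-covariance
    p, q = Phi.shape[1], Psi.shape[1]
    m_phi = np.array([cond_mean(Phi[:, i], Z, z, h) for i in range(p)])
    m_psi = np.array([cond_mean(Psi[:, j], Z, z, h) for j in range(q)])
    S_phi = np.array([[cond_mean(Phi[:, i] * Phi[:, j], Z, z, h)
                       for j in range(p)] for i in range(p)]) - np.outer(m_phi, m_phi)
    S_psi = np.array([[cond_mean(Psi[:, i] * Psi[:, j], Z, z, h)
                       for j in range(q)] for i in range(q)]) - np.outer(m_psi, m_psi)
    S_cross = np.array([[cond_mean(Phi[:, i] * Psi[:, j], Z, z, h)
                         for j in range(q)] for i in range(p)]) - np.outer(m_phi, m_psi)
    A = inv_sqrt(S_phi) @ S_cross @ inv_sqrt(S_psi)
    return min(1.0, float(np.linalg.svd(A, compute_uv=False)[0]))
```

For conditionally independent data ρ̂(z) fluctuates around 0, while perfect conditional association drives it toward 1.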
2.2 A test for conditional independence and related asymptotic properties
The conditional independence test that we use in this paper is based on ρ̂²(z) at different z's. Since each ρ̂(z) is determined by the kernel estimators of certain conditional expectations, we first derive their joint asymptotic distribution. Then, we use Σ_{i=1}^k f̂Z(zi) ρ̂²(zi) as our test statistic and establish its consistency and asymptotic distribution. Here the zi's are selected points in Z and

f̂Z(·) = ( Σ_{t=1}^n k0((Zt − ·)/h) ) / (n h^d)

is the kernel density estimator of fZ, the Lebesgue pdf of Z. In order to avoid dealing with the boundary bias problem in kernel estimation, we consider a set Z0 contained in the interior of Z, so that points in Z0 are away from the boundary of Z, and choose the zi's from Z0.
Our first result concerns the joint asymptotic distribution of kernel estimators of some conditional expectations. In order to describe the assumptions, we first review the definition of the α-mixing coefficients. For a strictly stationary process {Ut}, let F_a^b denote the σ-algebra generated by (Ua, . . . , Ub). Then, the α-mixing coefficient at lag s for {Ut} is

sup { |P(A ∩ B) − P(A)P(B)| : A ∈ F_0^t, B ∈ F_{t+s}^∞, t ≥ 0 }.

{Ut} is said to be α-mixing if its α-mixing coefficient at lag s tends to 0 as s tends to ∞. Let α(s) denote the α-mixing coefficient at lag s for the process {(Xt, Yt, Zt)}. Our assumptions are provided below.
(S0) The basis functions φp,1, . . ., φp,p and ψq,1, . . ., ψq,q are bounded and (5), (6) and (7) hold. For the sake of brevity, φp,1, . . ., φp,p and ψq,1, . . ., ψq,q will be abbreviated as φ1, . . ., φp and ψ1, . . ., ψq respectively hereafter.
(S1) {(Xt, Yt, Zt) ∈ R^{d1+d2+d}, t ≥ 0} is a strictly stationary α-mixing process that satisfies α(τ) = O(τ^{−(1+ε)}), where ε > max(1, d/2), and d1, d2 and d denote the dimensions of Xt, Yt and Zt respectively.
(S2) There exist an open subset Z0 of the interior of Z and a σ-finite measure µ such that for every z ∈ Z0, the conditional distribution of (X, Y) given Z = z has a pdf f(·|z) with respect to µ. Further, f(x, y|z) and fZ(z) are twice differentiable with respect to z on Z0.
(S3) There exists a function h on X × Y such that

sup_{z∈Z0} max( |f(x, y|z)|, max_{1≤i≤d} |∂f(x, y|z)/∂zi|, max_{1≤i,j≤d} |∂²f(x, y|z)/(∂zi ∂zj)| ) ≤ h(x, y)

and ∫ h(x, y) dµ(x, y) < ∞.
(S4) There exist constants c0 and c1 such that

sup_{z∈Z0} max( |fZ(z)|, max_{1≤i≤d} |∂fZ(z)/∂zi|, max_{1≤i,j≤d} |∂²fZ(z)/(∂zi ∂zj)| ) ≤ c0

and 1/fZ(z) ≤ c1 for z ∈ Z0.
(S5) k* is a kernel function defined on R^1, and k0 is a product kernel on R^d that satisfies

k0(v1, v2, . . . , vd) = k*(v1) k*(v2) · · · k*(vd),

where k* ≥ 0, sup_v k*(v) < ∞, ∫ k*(v) dv = 1, ∫ v k*(v) dv = 0, ∫ v (k*(v))² dv = 0 and κ2 = ∫ v² k*(v) dv < ∞.
(S6) As n → ∞, the bandwidth h → 0, nh^d → ∞ and nh^{d+4} → 0.
Under the above conditions, the joint asymptotic distribution of kernel estimators of conditional expectations can be established, as stated in Lemma 1. The proof of Lemma 1 is provided in Section 4.1.
LEMMA 1. Suppose that Conditions (S1)–(S6) hold. Suppose that g1, g2, . . ., gm are bounded functions defined on X × Y. Suppose z1, . . ., zk are distinct points in Z0. For i = 1, . . ., k, let

ĝj(zi) = ( Σ_{t=1}^n gj(Xt, Yt) k0((Zt − zi)/h) ) / ( Σ_{t=1}^n k0((Zt − zi)/h) )

be the kernel estimator of gj*(zi) ≡ E(gj(X, Y)|Z = zi). Further, let

Bs,j(zi) = (κ2/2)( fZ(zi) g*_{j,ss}(zi) + 2 fs(zi) g*_{j,s}(zi) ) (9)

and

Wj,n(zi) = √(nh^d) ( ĝj(zi) − gj*(zi) − h² Σ_{s=1}^d Bs,j(zi)/fZ(zi) )

for 1 ≤ i ≤ k and 1 ≤ j ≤ m, where g*_{j,s} and g*_{j,ss} denote the first and second partial derivatives of gj* with respect to the s-th component respectively, and fs denotes the first partial derivative of fZ with respect to the s-th component. Let

uj,t = gj(Xt, Yt) − gj*(Zt),
c_{jj*}(zi) = E(uj,1 u_{j*,1} | Z1 = zi),
σj²(zi) = E(u²_{j,1} | Z1 = zi),

and

Wn = (W1,n(z1), . . . , W1,n(zk), . . . , Wm,n(z1), . . . , Wm,n(zk))^T.

Then, Wn converges in distribution to a random vector

(Z*_{1,1}, . . . , Z*_{k,1}, . . . , Z*_{1,m}, . . . , Z*_{k,m})^T ≡ Z*,

where Z* is multivariate normal with mean 0 and, for 1 ≤ i, i* ≤ k and 1 ≤ j, j* ≤ m,

Cov(Z*_{i,j}, Z*_{i*,j*}) = κ^d σj²(zi)/fZ(zi) if i = i* and j = j*;
                           κ^d c_{jj*}(zi)/fZ(zi) if i = i* and j ≠ j*;
                           0 if i ≠ i*,

where κ = ∫ (k*(v))² dv.
Now, suppose that the basis functions φl's and ψ_{m*}'s are linearly independent. For the sake of convenience, for z ∈ {z1, . . . , zk}, we apply certain linear transformations to the φl's and ψ_{m*}'s to obtain new basis functions φ*_l's and ψ*_{m*}'s (ρ̂(z) remains unchanged under such transformations). Take g1(X, Y), . . ., gm(X, Y) to be the functions φ*_l(X)φ*_{l′}(X), φ*_l(X)ψ*_{m*}(Y) and ψ*_{m*}(Y)ψ*_{m′}(Y), where 1 ≤ l ≤ l′ ≤ p and 1 ≤ m* ≤ m′ ≤ q. Then, the consistency of ρ̂(z) can be established and we have Theorem 1. The proof of Theorem 1 is provided in Section 4.2.
THEOREM 1. Suppose that Conditions (S0)–(S6) hold and the basis functions φl's and ψ_{m*}'s are linearly independent. Suppose z1, . . ., zk are distinct points in Z0. Then,

Σ_{i=1}^k ( ρ̂²(zi) − ρ²_{p,q}(zi) )² = Op( 1/(nh^d) + h⁴ )

and

( Σ_{i=1}^k f̂Z(zi) ρ̂²(zi) − Σ_{i=1}^k fZ(zi) ρ²_{p,q}(zi) )² = Op( 1/(nh^d) + h⁴ ).
The following theorem states the approximate distribution of the statistic Σ_{i=1}^k f̂Z(zi) ρ̂²(zi) when X and Y are conditionally independent given Z.

THEOREM 2. Suppose that the conditions in Theorem 1 hold and X and Y are conditionally independent given Z. Then,

(nh^d/κ^d) Σ_{i=1}^k f̂Z(zi) ρ̂²(zi) converges in distribution to Σ_{i=1}^k λi

as n tends to ∞, where the λi's are IID and have the same distribution as the largest eigenvalue of a matrix CC^T, where C is a (p − 1) × (q − 1) matrix whose elements are IID N(0, 1).
The proof of Theorem 2 is provided in Section 4.3. Theorem 2 is similar to Theorem 3.2 given in [7]. The main difference between the two is that Theorem 2 can be applied to α-mixing data. In addition, p and q are held fixed in Theorem 2, while they are allowed to depend on n and tend to ∞ as n tends to ∞ in Theorem 3.2 in [7].
According to Theorem 2, a test that rejects H0 if

(nh^d/κ^d) Σ_{i=1}^k f̂Z(zi) ρ̂²(zi) ≥ F*_{1−α} (10)

is of approximate level α, where F* is the distribution function of Σ_{i=1}^k λi and F*_{1−α} is the 1 − α quantile of F*. Theorem 3 states that the test with rejection region in (10) is consistent if p and q are sufficiently large and one of the ρ(zi)'s is positive. The proof of this theorem is provided in Section 4.4.
THEOREM 3. Suppose that ρ(zi) > 0 for some zi, and that p and q are sufficiently large so that ρp,q(zi) > 0. Then, for 0 < α < 1, the probability that (10) holds tends to 1 as n → ∞.
3 Simulation studies and application to S&P500 index data

3.1 Simulation studies
In this section, we conduct several simulation studies to illustrate the performance of our test. The data generating processes, labeled Data1 – Data13, are described below. In order to make our simulation results comparable with those of the test proposed by Su and White [13], some of our data generating processes (Data1 – Data10) are the same as theirs. Throughout the description of Data1 – Data10, (ε1,t, ε2,t, ε3,t) are IID N(0, I3).
Data1: (Xt, Yt, Zt) = (ε1,t, ε2,t, ε3,t).

Data2: Xt = 0.5Xt−1 + ε1,t, Yt = 0.5Yt−1 + ε2,t and Zt = Xt−1.

Data3: Xt = ε1,t √(0.01 + 0.5(Xt−1)²), Yt = 0.5Yt−1 + ε2,t and Zt = Xt−1.

Data4: Xt = ε1,t √h1,t, Yt = ε2,t √h2,t, Zt = Xt−1, h1,t = 0.01 + 0.9h1,t−1 + 0.05(Xt−1)² and h2,t = 0.01 + 0.9h2,t−1 + 0.05(Yt−1)².

Data5: Xt = 0.5Xt−1 + 0.5Yt + ε1,t, Yt = 0.5Yt−1 + ε2,t and Zt = Xt−1.

Data6: Xt = 0.5Xt−1 + 0.5(Yt)² + ε1,t, Yt = 0.5Yt−1 + ε2,t and Zt = Xt−1.

Data7: Xt = 0.5Xt−1Yt + ε1,t, Yt = 0.5Yt−1 + ε2,t and Zt = Xt−1.

Data8: Xt = 0.5Xt−1 + 0.5Ytε1,t, Yt = 0.5Yt−1 + ε2,t and Zt = Xt−1.

Data9: Xt = ε1,t √(0.01 + 0.5(Xt−1)² + 0.25(Yt)²), Yt = 0.5Yt−1 + ε2,t and Zt = Xt−1.

Data10: Xt = ε1,t √h1,t, Yt = ε2,t √h2,t, Zt = Xt−1, h1,t = 0.01 + 0.1h1,t−1 + 0.4(Xt−1)² + 0.5(Yt)² and h2,t = 0.01 + 0.9h2,t−1 + 0.5(Yt)².

Data11: (Xt, Yt, Zt) = (ε1,t, ε2,t, ε3,t), where (ε1,t, ε2,t, ε3,t) are IID LN(0, I3).

Data12: Xt = ε1,tε1,t−1, Yt = ε2,tε2,t−1 and Zt = Xt−1, where (ε1,t, ε2,t) are IID LN(0, I2).

Data13: Xt = ε1,tε2,t−1, Yt = (ε1,t)²ε2,t−1 and Zt = ε2,t−1, where (ε1,t, ε2,t) are IID LN(0, I2).
Here, Data1 – Data4, Data11 and Data12 are used for examining the level of the test, and Data5 – Data10 and Data13 are used for checking the power.
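As an illustration, Data5 (a power case: X depends on Y given Z) can be generated as follows; the burn-in length is our choice, not part of the original specification.

```python
import numpy as np

def gen_data5(n, rng, burn=100):
    # Data5: X_t = 0.5 X_{t-1} + 0.5 Y_t + eps1_t, Y_t = 0.5 Y_{t-1} + eps2_t,
    # Z_t = X_{t-1}; X and Y are conditionally dependent given Z
    e1 = rng.standard_normal(n + burn)
    e2 = rng.standard_normal(n + burn)
    X = np.zeros(n + burn)
    Y = np.zeros(n + burn)
    for t in range(1, n + burn):
        Y[t] = 0.5 * Y[t - 1] + e2[t]
        X[t] = 0.5 * X[t - 1] + 0.5 * Y[t] + e1[t]
    # drop the burn-in and set Z_t = X_{t-1}
    return X[burn:], Y[burn:], X[burn - 1:-1]
```

The burn-in discards the influence of the arbitrary zero initial conditions so that the retained sample is approximately stationary.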
3.1.1 Simulation studies based on asymptotic distribution of the test statistic
We first apply our test using the asymptotic distribution of the test statistic.

Parameter set-up: In order to apply our test, certain parameters need to be specified, including the kernel function k*, the kernel bandwidth h and the basis functions. For simplicity, in all the simulation experiments, we take the kernel bandwidth h to be cn^{−0.25}, where n is the sample size and c ∈ {0.5, 1, 1.5, 2}; we use the triangular kernel

k*(x) = 1 − |x| if −1 ≤ x ≤ 1, and k*(x) = 0 otherwise.
In addition, the basis functions φ*1, . . . , φ*p and ψ*1, . . . , ψ*q are selected in the following manner. For i = 1, . . . , p and j = 1, . . . , q, let

φi(x) = 1 if (i − 1)/p ≤ x < i/p, and 0 otherwise, (11)

and

ψj(y) = 1 if (j − 1)/q ≤ y < j/q, and 0 otherwise, (12)

where p = q = 4. Since the basis functions are defined on [0, 1], we transform the data (Xt, Yt, Zt), t = 1, . . . , n, into (F1(Xt), F2(Yt), F3(Zt)), t = 1, . . . , n, before applying the test, where F1, F2 and F3 denote the empirical CDFs of {Xt}, {Yt} and {Zt} respectively. For the choice of the evaluation points, we take z1 = 0.78n^{−0.25} ≡ h0 and zi = zi−1 + 2h0 for i ≥ 2, as long as zi ≤ 1 − h0.
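The set-up above can be sketched in a few lines; the (rank − 0.5)/n version of the empirical CDF transform is a minor variant of ours that keeps the transformed values strictly inside (0, 1):

```python
import numpy as np

def ecdf_transform(u):
    # map each observation to (rank - 0.5) / n, a variant of the empirical CDF
    # transform that keeps values strictly inside (0, 1)
    ranks = np.argsort(np.argsort(u)) + 1
    return (ranks - 0.5) / len(u)

def eval_points(n):
    # z_1 = 0.78 n^{-0.25} = h0 and z_i = z_{i-1} + 2 h0, while z_i <= 1 - h0
    h0 = 0.78 * n ** (-0.25)
    z, zi = [], h0
    while zi <= 1.0 - h0:
        z.append(zi)
        zi += 2.0 * h0
    return np.array(z)
```

For n = 500 this rule gives three evaluation points, spaced 2h0 apart and at least h0 away from the boundary.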
Table 1 shows that the levels of the test are below 0.05 for c = 0.5 and c = 1, and the powers of the test are larger for larger c. It appears that when c = 1, the levels of the test are close to 0.05 and the power performance is good.
                 n = 500                          n = 1000
         c = 0.5  c = 1   c = 1.5  c = 2   c = 0.5  c = 1   c = 1.5  c = 2
Data1    0.030    0.039   0.053    0.071   0.043    0.048   0.057    0.074
Data2    0.030    0.041   0.058    0.074   0.033    0.048   0.060    0.069
Data3    0.032    0.042   0.055    0.080   0.038    0.049   0.055    0.070
Data4    0.038    0.044   0.057    0.075   0.042    0.048   0.057    0.066
Data5    0.951    1       1        1       1        1       1        1
Data6    0.898    1       1        1       0.997    1       1        1
Data7    0.918    1       1        1       0.985    1       1        1
Data8    0.995    1       1        1       1        1       1        1
Data9    0.725    0.991   1        1       0.993    1       1        1
Data10   0.374    0.817   0.959    0.986   0.819    0.996   1        1
Data11   0.036    0.050   0.062    0.079   0.035    0.042   0.049    0.059
Data12   0.036    0.051   0.055    0.072   0.041    0.041   0.053    0.065
Data13   1        1       1        1       1        1       1        1

Table 1: Power results for different c's when n = 500 and n = 1000
3.1.2 Simulation studies based on local bootstrap
The test based on the asymptotic distribution of the test statistic does not work well for small sample sizes. Figure 1 shows that the distribution of the test statistic and the asymptotic distribution are quite different for Data11 when n = 100. For Data1 – Data4 and Data12, we find similar patterns. When n = 200, the difference between the two distributions becomes smaller but is still visible.

To apply our test for small sample sizes, we consider the local bootstrap procedure proposed by Paparoditis and Politis [11], described below. For a given sample {(Xt, Yt, Zt), t = 1, . . . , n}, a local bootstrap sample {(X*t, Y*t, Z*t), t = 1, . . . , n} is generated according to the following steps.
Figure 1: Exact distribution (solid line) versus asymptotic distribution (dashed line) of the test statistic with different bandwidth choices (h = cn^{−1/4})
(a) For 1 ≤ t ≤ n, we draw Z*t from the empirical cumulative distribution function F̂Z, where

F̂Z(z) = (1/n) Σ_{t=1}^n I(−∞,Zt](z).

(b) For 1 ≤ t ≤ n, we draw X*t and Y*t independently from the empirical cumulative distribution functions F̂_{X|Z=Z*t} and F̂_{Y|Z=Z*t} respectively, where

F̂_{X|Z=Z*t}(x) = ( Σ_{s=1}^n k*((Z*t − Zs)/b) I(−∞,Xs](x) ) / ( Σ_{s=1}^n k*((Z*t − Zs)/b) )

and

F̂_{Y|Z=Z*t}(y) = ( Σ_{s=1}^n k*((Z*t − Zs)/b) I(−∞,Ys](y) ) / ( Σ_{s=1}^n k*((Z*t − Zs)/b) ).

Here, the bandwidth b is taken to be n^{−0.2} and the kernel function k* is the probability density function of N(0, 1).
In order to determine the rejection region for a given sample, we repeat the above procedure to obtain bootstrap resamples and compute the test statistic nh^d κ^{−d} Σ_{i=1}^k f̂Z(zi) ρ̂²(zi) for the original sample and each local bootstrap resample. For a given level α, if the test statistic based on the given sample is larger than the (1 − α) quantile of the test statistics computed from the local bootstrap resamples, we reject the conditional independence hypothesis at level α. The purpose of using the local bootstrap procedure is to generate a resample {(X*t, Y*t, Z*t), t = 1, . . . , n} such that the distribution of Z*, the conditional distribution of X* given Z* = z and that of Y* given Z* = z are close to the distribution of Z, the conditional distribution of X given Z = z and that of Y given Z = z respectively. In addition, since X*t and Y*t are generated independently given Z*t = z, they are conditionally independent given Z*t = z, irrespective of whether or not X and Y are conditionally independent given Z.
In these simulation studies, we choose the basis functions in (11) and (12) with p = q = 5. The evaluation points are {0.2, 0.4, 0.6, 0.8}, and the kernel bandwidth h is taken to be cn^{−0.25}, where n is the sample size and c ∈ {0.5, 1, 1.5, 2}.
Finally, we present a few experimental results for our test (Test 1) and Su and White's test (Test 2). For Test 2, we run the simulations for Data11 – Data13 with the bandwidth hn = c*n^{−1/8.5}, where c* = 1 or 2. Each power estimate is based on 3000 repetitions, with 1000 local bootstrap resamples used in each repetition. For the sake of comparison, we also list some power estimates for Test 2 for Data1 – Data10, which are taken directly from [13]; they use 250 repetitions with 200 local bootstrap resamples for each repetition.
Tables 2 and 3 indicate the level and power estimates for Test 1 and Test 2 at significance level 5% when the sample sizes are 100 and 200 respectively.
                 Data1  Data2  Data3  Data4  Data5  Data6  Data7
Test 2, c* = 1   0.096  0.060  0.048  0.072  0.668  0.756  0.388
Test 2, c* = 2   0.072  0.036  0.072  0.048  0.952  0.944  0.576
Test 1, c = 0.5  0.045  0.061  0.046  0.062  0.525  0.479  0.265
Test 1, c = 1    0.046  0.050  0.050  0.047  0.746  0.717  0.400
Test 1, c = 1.5  0.040  0.052  0.056  0.055  0.814  0.779  0.329
Test 1, c = 2    0.041  0.050  0.053  0.062  0.852  0.793  0.218

                 Data8  Data9  Data10  Data11  Data12  Data13
Test 2, c* = 1   0.860  0.828  0.680   0.034   0.043   0.589
Test 2, c* = 2   0.940  0.988  0.912   0.022   0.022   0.859
Test 1, c = 0.5  0.692  0.357  0.195   0.058   0.050   1
Test 1, c = 1    0.873  0.566  0.320   0.049   0.048   1
Test 1, c = 1.5  0.889  0.618  0.341   0.049   0.041   1
Test 1, c = 2    0.860  0.631  0.348   0.046   0.045   1

Table 2: Power comparison between Tests 1 and 2 when n = 100
                 Data1  Data2  Data3  Data4  Data5  Data6  Data7
Test 2, c* = 1   0.064  0.052  0.080  0.080  0.900  0.960  0.596
Test 2, c* = 2   0.044  0.060  0.056  0.048  1      1      0.864
Test 1, c = 0.5  0.040  0.061  0.036  0.055  0.827  0.830  0.488
Test 1, c = 1    0.049  0.051  0.057  0.054  0.982  0.983  0.831
Test 1, c = 1.5  0.046  0.048  0.049  0.053  0.995  0.989  0.827
Test 1, c = 2    0.045  0.045  0.047  0.057  0.997  0.995  0.735

                 Data8  Data9  Data10  Data11  Data12  Data13
Test 2, c* = 1   0.992  0.968  0.880   0.031   0.036   0.347
Test 2, c* = 2   1      1      0.996   0.025   0.032   0.872
Test 1, c = 0.5  0.988  0.730  0.392   0.062   0.062   1
Test 1, c = 1    1      0.947  0.679   0.048   0.047   1
Test 1, c = 1.5  1      0.968  0.738   0.051   0.043   1
Test 1, c = 2    1      0.971  0.745   0.058   0.037   1

Table 3: Power comparison between Tests 1 and 2 when n = 200
3.2 Application to S&P500 index data

In this section, we apply the linear Granger causality test (hereafter denoted by Test LIN) and our conditional independence test (Test 1) to check the interaction between returns and volume for S&P500 index data at one day lag. There are 2514 observations of daily index returns and trading volume from January 2000 to December 2009, taken from Yahoo Finance. Here, the return for day t is defined as
Rt = 100 log(Pt/Pt−1),

where Pt is the index value for day t. Moreover, the trading volume for day t (in dollars), denoted by Vt, is transformed into

V*t = log(Vt/Vt−1).
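These transformations amount to first differences of logs; a minimal sketch (function name ours):

```python
import numpy as np

def transform_series(P, V):
    # R_t = 100 * log(P_t / P_{t-1}) and V*_t = log(V_t / V_{t-1})
    R = 100.0 * np.diff(np.log(np.asarray(P, dtype=float)))
    V_star = np.diff(np.log(np.asarray(V, dtype=float)))
    return R, V_star
```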
The above transformations are commonly used in the analysis of financial data; see, for example, Hiemstra and Jones [6] and [1]. The augmented Dickey-Fuller test reveals that the series {Rt} and {V*t} are stationary.
In order to examine whether {Rt} is useful for predicting {V*t}, we consider the effects up to lag 1. Specifically, we test

H0: V*t ⊥ Rt−1 | V*t−1 (13)

using Test 1. For Test LIN, it is assumed that

E(V*t | Rt−1, V*t−1) = a1Rt−1 + b1V*t−1

and the null hypothesis is

H0: a1 = 0. (14)

We use the notation Rt−1 ⇏ V*t to denote the relation expressed in (13) or (14). The notation V*t−1 ⇏ Rt is defined analogously.
The p-values for Test LIN and Test 1 are provided in Table 4. For Test 1, we use the same parameter set-up as in Section 3.1.1 and find that both the return-to-volume and volume-to-return relationships are significant at the 5% level. However, for Test LIN, the volume-to-return relationship is not significant. These findings are consistent with the results obtained in [6] and [1].
H0         Rt−1 ⇏ V*t    V*t−1 ⇏ Rt
Test LIN   0.000         0.804
Test 1     0.001         0.032

Table 4: p-values for Test LIN and Test 1 for testing the relationship between returns and volume changes
To illustrate the implementation of our test for the d > 1 case, we also apply the test to test

H0: V*t ⊥ (Rt−1, Rt−2) | (V*t−1, V*t−2) (15)

and

H0: Rt ⊥ (V*t−1, V*t−2) | (Rt−1, Rt−2). (16)

The empirical CDF transforms are applied component-wise. For instance, we transform (V*t−1, V*t−2), t = 4, . . . , n, into (F1(V*t−1), F2(V*t−2)), t = 4, . . . , n, where n = 2512 and Fi is the empirical CDF of V*t−i for i = 1, 2. For the basis functions, we use 4 basis functions φ1, . . ., φ4 on [0, 1] and 4 basis functions ψ1, . . ., ψ4 on [0, 1]², where φ1, . . ., φ4 are given in (11) with p = 4, ψ1(y1, y2) = I[0,0.5)(y1)I[0,0.5)(y2), ψ2(y1, y2) = I[0,0.5)(y1)I[0.5,1)(y2), ψ3(y1, y2) = I[0.5,1)(y1)I[0,0.5)(y2) and ψ4 = 1 − ψ1 − ψ2 − ψ3. Here IA(·) denotes the indicator function of A. In addition, the kernel bandwidth is cn^{−1/(d+δ)} with c = 1.4 and δ = 2.4. The evaluation points are all the points in S²_{h0}, where S_{h0} = {(2k − 1)h0 : k is an integer} ∩ [h0, 1 − h0] and h0 = 0.78n^{−1/(d+δ)}. Here c and δ are selected from certain candidate values so that the levels of the test are close to 0.05 when the data are IID U(0, 1). The p-values for (15) and (16) are 0.017 and 0.370 respectively.
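The evaluation grid S²_{h0} used here can be built as follows (a sketch of ours; for n = 2512, d = 2 and δ = 2.4 it yields a 3 × 3 grid):

```python
import numpy as np

def eval_grid(n, d, delta=2.4):
    # S_{h0} = {(2k - 1) h0 : k integer} intersected with [h0, 1 - h0],
    # with h0 = 0.78 n^{-1/(d + delta)}; the grid is the d-fold product of S_{h0}
    h0 = 0.78 * n ** (-1.0 / (d + delta))
    s = np.arange(h0, 1.0 - h0 + 1e-12, 2.0 * h0)
    grid = np.array(np.meshgrid(*([s] * d))).reshape(d, -1).T
    return grid
```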
Some remarks on the implementation of the test.

• It is recommended to choose the evaluation points so that any two evaluation points zi and zj are at least 2h apart (in each component) when a compact kernel supported on [−1, 1]^d is used. In that case, ρ̂(zi) and ρ̂(zj) are independent for IID data, which makes the distribution of the test statistic close to the derived asymptotic distribution. Since nh^d → ∞, h cannot be too small, which implies that the number of evaluation points cannot be too large.

• We apply empirical CDF transforms to our data so that the distribution of each component of X, Y and Z is supported on [0, 1]. The transforms are data dependent, and it is not clear whether the transformed data can be treated as if they were transformed by the true underlying CDF. The simulation results are fine, but further investigation is needed.
4 Proofs

In this section, we give proofs of Theorems 1 – 3 and Lemma 1. Before giving the proofs, we first define and recall some notation. Recall that k* is a kernel on R^1 and k0 is a product kernel on R^d defined by

k0(v1, v2, . . . , vd) = k*(v1)k*(v2) · · · k*(vd),

κ = ∫ (k*(v))² dv and κ2 = ∫ v² k*(v) dv.

For a (p + q) × (p + q) matrix V0, let g1,1(V0), g1,2(V0), g2,1(V0) and g2,2(V0) denote the matrices of dimensions p × p, p × q, q × p and q × q respectively such that

V0 = ( g1,1(V0)  g1,2(V0)
       g2,1(V0)  g2,2(V0) ).
4.1 Proof of Lemma 1

For simplicity, we prove the lemma only for the case where m = 2 and k = 2. For t = 1, 2, . . . , n, i = 1, 2 and j = 1, 2, let

η̂j,1(zi) = (nh^d)^{−1} Σ_{t=1}^n ( gj*(Zt) − gj*(zi) ) k0((Zt − zi)/h),

η̂j,2(zi) = (nh^d)^{−1} Σ_{t=1}^n uj,t k0((Zt − zi)/h)

and η̂j(zi) = η̂j,1(zi) + η̂j,2(zi). Then, ĝj(zi) − gj*(zi) = η̂j(zi)/f̂Z(zi), where f̂Z(zi) = (nh^d)^{−1} Σ_{t=1}^n k0((Zt − zi)/h). We can complete the proof using the following results (A1)–(A3):

(A1) Suppose that the conditions in Lemma 1 hold. Then, for 1 ≤ i, j ≤ 2,

η̂j,1(zi) = h² Σ_{s=1}^d Bs,j(zi) + op( h³ + (nh^d)^{−1/2} ).

(A2) Suppose that the conditions in Lemma 1 hold. Then,

Z*n ≡ √(nh^d) ( η̂1,2(z1), η̂2,2(z1), η̂1,2(z2), η̂2,2(z2) )^T →D Z,

where the distribution of Z is N(0, Σ) and

Σ = κ^d ( σ1²(z1)fZ(z1)   c12(z1)fZ(z1)   0               0
          c12(z1)fZ(z1)   σ2²(z1)fZ(z1)   0               0
          0               0               σ1²(z2)fZ(z2)   c12(z2)fZ(z2)
          0               0               c12(z2)fZ(z2)   σ2²(z2)fZ(z2) ).

(A3) Suppose that (Xn1, Xn2, . . . , Xnk)^T →D (Y1, Y2, . . . , Yk)^T and (Zn1, Zn2, . . . , Znk)^T →D (c1, c2, . . . , ck)^T, where c1, c2, . . ., ck are constants. Then,

(Xn1Zn1, Xn2Zn2, . . . , XnkZnk)^T →D (c1Y1, c2Y2, . . . , ckYk)^T.

From (A1), (A2) and the assumption that nh^{d+4} → 0, we have

√(nh^d) ( η̂1(z1) − h² Σ_{s=1}^d Bs,1(z1),
          η̂2(z1) − h² Σ_{s=1}^d Bs,2(z1),
          η̂1(z2) − h² Σ_{s=1}^d Bs,1(z2),
          η̂2(z2) − h² Σ_{s=1}^d Bs,2(z2) )^T
∼ Z + (nh^d)^{1/2} op( h³ + (nh^d)^{−1/2} ) →D Z,

where A ∼ B means that the distributions of A and B are the same. Applying (A3), we obtain Lemma 1.
The proofs of (A1)–(A3) are given below.

• Proof of (A1). Note that

E(η̂j,1(zi)) = h^{−d} ∫ ( gj*(zt) − gj*(zi) ) k0((zt − zi)/h) fZ(zt) dzt
            = ∫ ( gj*(zi + hν) − gj*(zi) ) k0(ν) fZ(zi + hν) dν     (ν = (ν1, . . . , νd))
            = ∫ h Σ_{s=1}^d g*_{j,s}(zi) νs fZ(zi) k0(ν) dν
              + ∫ h² ( Σ_{s=1}^d g*_{j,s}(zi) νs )( Σ_{s=1}^d fs(zi) νs ) k0(ν) dν
              + (1/2) ∫ h² fZ(zi) Σ_{s=1}^d Σ_{s*=1}^d g*_{j,ss*}(zi) νs νs* k0(ν) dν + O(h³)
            = (h² κ2/2) Σ_{s=1}^d ( fZ(zi) g*_{j,ss}(zi) + 2 fs(zi) g*_{j,s}(zi) ) + O(h³)
            = h² Σ_{s=1}^d Bs,j(zi) + O(h³).

Let Ki,j,t = h^{−d} ( gj*(Zt) − gj*(zi) ) k0((Zt − zi)/h). Then, we have

Var(η̂j,1(zi)) = n^{−2} ( Σ_{t=1}^n Var(Ki,j,t) + Σ_{t=1}^n Σ_{s=1, s≠t}^n Cov(Ki,j,t, Ki,j,s) ).

Since

Var(Ki,j,t) = E(K²_{i,j,t}) − (E(Ki,j,t))²
            = h^{−d} ∫ ( gj*(zi + hν) − gj*(zi) )² (k0(ν))² fZ(zi + hν) dν
              − ( fZ(zi) h ∫ Σ_{s=1}^d g*_{j,s}(zi) νs k0(ν) dν + O(h²) )²
            = h^{−d} O(h²) − O(h⁴),

we have Σ_{t=1}^n Var(Ki,j,t) = O(nh^{2−d}). Note that from Corollary A.2 in Hall and Heyde [5] and the fact that for 2 < β < 2(2 + d)/d, E(|Ki,j,t|^β) = O(h^{2+d−βd}), we have

Σ_{s≠t} |Cov(Ki,j,t, Ki,j,s)| = 2 Σ_{t=1}^n Σ_{s>t}^n |Cov(Ki,j,t, Ki,j,s)|
                             ≤ 2n Σ_{s=1}^∞ |Cov(Ki,j,1, Ki,j,1+s)|
                             ≤ 16n O(h^{2(2+d−βd)/β}) Σ_{s=1}^∞ α^{(β−2)/β}(s).

Therefore,

Var(η̂j,1(zi)) = O( h²/(nh^d) ) + O( h^{2(2+d−βd)/β}/n ) = o( 1/(nh^d) ).

From the above results, η̂j,1(zi) = h² Σ_{s=1}^d Bs,j(zi) + op( h³ + (nh^d)^{−1/2} ).
• Proof of (A2). By the Cram´er-Wold Theorem, it is sufficient to prove that cTZ∗
n converges in distribution to cTZ for any c = (c1, c2, c3, c4)T in R4.
We use “big-small block” arguments to complete the proof. Assume that there exist positive integers p = p(n), q = q(n) and k = k(n) = [n/(p + q)] (the integer part of n/(p + q)) such that as n → ∞,
p → ∞, q → ∞, p = o(n), q = o(p), p = o(nhd)1/2, np−1α(q) = o(1), phd= o(1), phd→ ∞. Let Zn,t = 1 √ hd c1u1,tk0 Zt− z1 h + c2u2,tk0 Zt− z1 h +c3u1,tk0 Zt− z2 h + c4u2,tk0 Zt− z2 h .
Then, we have cTZ∗ n= √1n Pn t=1Zn,t ≡√1nWn. Let ξj =P (j+1)p+jq t=j(p+q)+1Zn,t and ζj=P (j+1)(p+q) t=(j+1)p+jq+1Zn,tfor j = 0,1, . . ., k−1, and ζk= Pn t=k(p+q)+1Zn,t. Then, Wn= k−1 X j=0 ξj | {z } Wn1 + k−1 X j=0 ζj | {z } Wn2
+ζk. In order to prove this lemma, it suffices
to show that as n → ∞, (1) E(exp(itWn1)) − Qk−1 j=0E(exp(itξj)) → 0, (2) √1 nWn2 p → 0 and √1 nζk p → 0, (3) σ2 n≡ Pk−1 j=0E(ξ 2 j) = n(σ2+ o(1)), (4) σ12 n Pk−1 j=0E(ξ 2
jI(|ξj| > εpσn2)) → 0 for any ε > 0,
where σ2=c21κdfZ(z1)σ12(z1) + c22κdfZ(z1)σ22(z1) + c23κdfZ(z2)σ12(z2) + c24κ df Z(z2)σ22(z2) + 2c1c2κdfZ(z1)c12(z1) + 2c3c4κdfZ(z2)c12(z2).
The verification of the above expression for σ2
n is given in Section 4.5.
We now prove these results respectively. From Lemma 18.2 in Li and Racine [9], which is due to Volkonskii and Rozanov [14],
E(exp(itWn1)) − k−1 Y j=0 E(exp(itξj)) ≤ 16kα(q) = O n pα(q) = o(1),
we obtain (1). In order to prove (2), we first consider Wn2. Note that
$$E(W_{n2}^2) = \operatorname{Var}\Bigl(\sum_{j=0}^{k-1}\zeta_j\Bigr) = \underbrace{k\,\operatorname{Var}(\zeta_0)}_{(P1)} + \underbrace{\sum_{i=0}^{k-1}\sum_{\substack{j=0 \\ j\ne i}}^{k-1} \operatorname{Cov}(\zeta_i, \zeta_j)}_{(P2)}.$$
Computation of (P1). Note that from
$$\operatorname{Var}(\zeta_0) = \sum_{i=1}^{q} \operatorname{Var}(Z_{n,i}) + 2\sum_{i=1}^{q}\sum_{j>i}^{q} \operatorname{Cov}(Z_{n,i}, Z_{n,j})$$
and the fact that
$$2\sum_{i=1}^{q}\sum_{j>i}^{q} \operatorname{Cov}(Z_{n,i}, Z_{n,j}) = 2q\sum_{j=1}^{q}\Bigl(1 - \frac{j}{q}\Bigr)\operatorname{Cov}(Z_{n,1}, Z_{n,1+j}) = O(q^2 h^d),$$
we have that
$$\operatorname{Var}(\zeta_0) = q\sigma^2 + O(q^2 h^d) + O(q h^2) = q\sigma^2(1 + o(1)).$$
Therefore,
$$(P1) = kq\sigma^2(1 + o(1)) = O(kq) = o(n).$$
Computation of (P2). Note that from Theorem A.5 in [5],
$$|(P2)| = 2\Bigl|\sum_{i=0}^{k-1}\sum_{j>i}^{k-1}\operatorname{Cov}(\zeta_i,\zeta_j)\Bigr| \le 2\sum_{i=1}^{n-p}\sum_{j=i+p}^{n} |\operatorname{Cov}(Z_{n,i}, Z_{n,j})| \le 2n\sum_{j=p}^{\infty} |\operatorname{Cov}(Z_{n,1}, Z_{n,1+j})| \le 2n\sum_{j=p}^{\infty} 4 C_{1n} C_{2n}\,\alpha(j) \le C^*\, \frac{n}{h^d}\sum_{j=p}^{\infty}\alpha(j) = o(n),$$
where $C_{in} = 4\max_k |c_k|\, \sup_s |u_{s,1}|\, \sup|k_0| / \sqrt{h^d}$ for $i = 1, 2$. Then, we have $E(W_{n2}^2)/n = o(1)$. Similarly, $\operatorname{Var}(\zeta_k) = O(p + q) = o(n)$, so (2) holds.
By stationarity and the same arguments as in the computation of $\operatorname{Var}(\zeta_0)$, we have $\operatorname{Var}(\xi_0) = p\sigma^2(1+o(1))$. Thus
$$\sum_{j=0}^{k-1} E(\xi_j^2)/n = kp\sigma^2(1+o(1))/n \to \sigma^2,$$
so (3) holds. Finally, since $|Z_{n,t}| \le C/\sqrt{h^d}$ and $p = o((nh^d)^{1/2})$, for every $\varepsilon > 0$ the set $\{|\xi_j| \ge \varepsilon\sqrt{\sigma_n^2}\}$ is empty when $n$ is large. Therefore, (4) holds. This completes the proof.
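The emptiness claim can be spelled out in one line, combining the block-size condition $p = o((nh^d)^{1/2})$ with (3):

```latex
|\xi_j| \le p \max_t |Z_{n,t}| \le \frac{Cp}{\sqrt{h^d}}
       = o\!\left(\frac{(nh^d)^{1/2}}{\sqrt{h^d}}\right) = o(\sqrt{n}),
\qquad
\varepsilon\sqrt{\sigma_n^2} = \varepsilon\sqrt{n}\,\bigl(\sigma + o(1)\bigr),
```

so eventually $|\xi_j| < \varepsilon\sqrt{\sigma_n^2}$ for every $j$, and each Lindeberg summand in (4) vanishes.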
• Proof of (A3). It is sufficient to prove that $(X_{n1}, \ldots, X_{nk}, Z_{n1}, \ldots, Z_{nk})^T \xrightarrow{D} (Y_1, \ldots, Y_k, c_1, \ldots, c_k)$. Let $X_n = (X_{n1}, \ldots, X_{nk})^T$, $Z_n = (Z_{n1}, \ldots, Z_{nk})^T$, $Y = (Y_1, \ldots, Y_k)^T$ and $c = (c_1, \ldots, c_k)^T$. Then,
$$E\bigl(e^{i(t^T X_n + s^T Z_n)}\bigr) = \underbrace{E\bigl(e^{i(t^T X_n + s^T c)}\bigl(e^{i s^T (Z_n - c)} - 1\bigr)\bigr)}_{I} + \underbrace{E\bigl(e^{i(t^T X_n + s^T c)}\bigr)}_{II}.$$
Note that $II \to E(e^{i(t^T Y + s^T c)})$ and $I \to 0$ by Lebesgue's dominated convergence theorem. Apply the continuous mapping theorem and we have (A3).
4.2 Proof of Theorem 1
We adopt the proof in [7]. For $z \in \{z_1, \ldots, z_k\}$, let $\phi_l^*$, $1 \le l \le p$, and $\psi_{m^*}^*$, $1 \le m^* \le q$, be linear combinations of the $\phi_l$'s and $\psi_{m^*}$'s such that $\phi_1^* = 1 = \psi_1^*$, the matrices $(E(\phi_l^*(X)\phi_{l'}^*(X) \mid Z = z) : 1 \le l, l' \le p)$ and $(E(\psi_{m^*}^*(Y)\psi_{m'}^*(Y) \mid Z = z) : 1 \le m^*, m' \le q)$ are identity matrices, and $E(\phi_l^*(X)\psi_{m^*}^*(Y) \mid Z = z) = 0$ for $l \ne m^*$. Take $g_1(X,Y), \ldots, g_m(X,Y)$ to be the functions $\phi_l^*(X)\phi_{l'}^*(X)$, $\phi_l^*(X)\psi_{m^*}^*(Y)$ and $\psi_{m^*}^*(Y)\psi_{m'}^*(Y)$, where $1 \le l \le l' \le p$ and $1 \le m^* \le m' \le q$. Apply Lemma 1 and we have
$$\sqrt{nh^d}\begin{pmatrix} \hat g_1(z_1) - g_1^*(z_1) \\ \vdots \\ \hat g_1(z_k) - g_1^*(z_k) \\ \vdots \\ \hat g_m(z_1) - g_m^*(z_1) \\ \vdots \\ \hat g_m(z_k) - g_m^*(z_k) \end{pmatrix} - \sqrt{nh^d}\begin{pmatrix} h^2\sum_{s=1}^{d} B_{s,1}(z_1)/f_Z(z_1) \\ \vdots \\ h^2\sum_{s=1}^{d} B_{s,1}(z_k)/f_Z(z_k) \\ \vdots \\ h^2\sum_{s=1}^{d} B_{s,m}(z_1)/f_Z(z_1) \\ \vdots \\ h^2\sum_{s=1}^{d} B_{s,m}(z_k)/f_Z(z_k) \end{pmatrix} \xrightarrow{D} Z^*. \tag{17}$$
Let
$$V^*(z) = \begin{pmatrix} V_{11}(z) & V_{12}(z) \\ V_{21}(z) & V_{22}(z) \end{pmatrix},$$
where the $(l, l')$-th element of $V_{11}(z)$ is $E(\phi_l^*(X)\phi_{l'}^*(X) \mid Z = z)$ for $1 \le l, l' \le p$, the $(l, m^*)$-th element of $V_{12}(z)$ is $E(\phi_l^*(X)\psi_{m^*}^*(Y) \mid Z = z)$ for $1 \le l \le p$, $1 \le m^* \le q$, the $(m^*, m')$-th element of $V_{22}(z)$ is $E(\psi_{m^*}^*(Y)\psi_{m'}^*(Y) \mid Z = z)$ for $1 \le m^*, m' \le q$, and $V_{21}(z) = (V_{12}(z))^T$. Let $\hat V^*(z)$ be the estimator of $V^*(z)$ obtained by replacing each conditional expectation in $V^*(z)$ with its kernel estimator defined in (8). Then, (17) gives
$$\sum_{i=1}^{k} \bigl\| \hat V^*(z_i) - V^*(z_i) \bigr\|^2 = O_p\Bigl(\frac{1}{nh^d}\Bigr) + O_p(h^4) = O_p\Bigl(\frac{1}{nh^d} + h^4\Bigr).$$
For $1 \le i \le k$, for a $p \times 1$ vector $a$ and a $(p+q) \times (p+q)$ matrix
$$U = \begin{pmatrix} U_{11} & U_{12} \\ U_{21} & U_{22} \end{pmatrix},$$
where the dimensions of $U_{11}$, $U_{12}$, $U_{21}$ and $U_{22}$ are $p \times p$, $p \times q$, $q \times p$ and $q \times q$ respectively, define
$$g_{r,s}(U) = U_{rs} \tag{18}$$
for $1 \le r, s \le 2$,
$$g_{r,s}^*(U) = \begin{cases} g_{r,s}(U) & \text{if } (r,s) = (1,2) \text{ or } (2,1); \\ (g_{r,s}(U))^{-1} & \text{if } (r,s) = (1,1) \text{ or } (2,2), \end{cases}$$
and
$$g(U, a) = U_{1,2} U_{2,2}^{-1} U_{2,1} U_{1,1}^{-1} - U_{1,1}\, a a^T. \tag{19}$$
Let $\alpha^*$ be the $p \times 1$ vector whose first element is 1 and whose remaining elements are 0. Then, $\hat\rho(z)$ and $\rho_{p,q}(z)$ are the square roots of the largest eigenvalues of the matrices $g(\hat V^*(z), \alpha^*)$ and $g(V^*(z), \alpha^*)$ respectively. Let $\triangle_{r,s,i} = g_{r,s}^*(\hat V^*(z_i)) - g_{r,s}^*(V^*(z_i))$. Then, we have
$$\bigl\| g(\hat V^*(z_i), \alpha^*) - g(V^*(z_i), \alpha^*) \bigr\| \le \prod_{r=1}^{2}\prod_{s=1}^{2}\bigl(\| g_{r,s}^*(V^*(z_i)) \| + \| \triangle_{r,s,i} \|\bigr) - \prod_{r=1}^{2}\prod_{s=1}^{2}\| g_{r,s}^*(V^*(z_i)) \| + \bigl\| g_{1,1}^*(\hat V^*(z_i)) - g_{1,1}^*(V^*(z_i)) \bigr\|\, \| \alpha^*(\alpha^*)^T \|,$$
which gives that
$$\sum_{i=1}^{k} \bigl\| g(\hat V^*(z_i), \alpha^*) - g(V^*(z_i), \alpha^*) \bigr\|^2 = O_p\Bigl(\frac{1}{nh^d} + h^4\Bigr) = O_p\Bigl(\frac{1}{nh^d}\Bigr)$$
and
$$\sum_{i=1}^{k}\bigl(\hat\rho^2(z_i) - \rho_{p,q}^2(z_i)\bigr)^2 = O_p\Bigl(\frac{1}{nh^d} + h^4\Bigr) \tag{20}$$
since $|\hat\rho^2(z_i) - \rho_{p,q}^2(z_i)| \le \| g(\hat V^*(z_i), \alpha^*) - g(V^*(z_i), \alpha^*) \|$ for $1 \le i \le k$. From (20) and the fact that $\sum_{i=1}^{k}(\hat f_Z(z_i) - f_Z(z_i))^2 = O_p\bigl(\frac{1}{nh^d} + h^4\bigr)$,
$$\Bigl(\sum_{i=1}^{k} \hat f_Z(z_i)\hat\rho^2(z_i) - \sum_{i=1}^{k} f_Z(z_i)\rho_{p,q}^2(z_i)\Bigr)^2 = \Bigl(\sum_{i=1}^{k}(\hat f_Z(z_i) - f_Z(z_i))\hat\rho^2(z_i) + \sum_{i=1}^{k} f_Z(z_i)(\hat\rho^2(z_i) - \rho_{p,q}^2(z_i))\Bigr)^2 = O_p\Bigl(\frac{1}{nh^d} + h^4\Bigr).$$
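As a numerical illustration of how $\hat\rho(z)$ is extracted from $\hat V^*(z)$ via (18)–(19), the following sketch computes the square root of the largest eigenvalue of $g(V, \alpha^*)$; the function name, the block sizes $p = q = 3$, and the matrices below are hypothetical choices of ours, not from the paper:

```python
import numpy as np

def rho_from_V(V, p, q):
    """Square root of the largest eigenvalue of g(V, alpha*), where
    g(U, a) = U12 U22^{-1} U21 U11^{-1} - U11 a a^T, as in (19)."""
    U11, U12 = V[:p, :p], V[:p, p:]
    U21, U22 = V[p:, :p], V[p:, p:]
    a = np.zeros((p, 1))
    a[0, 0] = 1.0                         # alpha* = (1, 0, ..., 0)^T
    G = U12 @ np.linalg.inv(U22) @ U21 @ np.linalg.inv(U11) - U11 @ (a @ a.T)
    return float(np.sqrt(max(np.linalg.eigvals(G).real.max(), 0.0)))

# Under conditional independence, V12 vanishes except for its (1,1) entry
# (phi*_1 = psi*_1 = 1) and V11, V22 are identities, so g(V, alpha*) = 0.
p = q = 3
V = np.eye(p + q)
V[0, p] = V[p, 0] = 1.0
print(rho_from_V(V, p, q))   # 0.0

V2 = V.copy()
V2[1, p + 1] = V2[p + 1, 1] = 0.5   # correlation 0.5 between phi*_2, psi*_2
print(rho_from_V(V2, p, q))         # 0.5
```

The second call shows that a single nonzero conditional correlation in $V_{12}$ is recovered exactly as the maximal conditional correlation.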
4.3 Proof of Theorem 2
We adopt the proof in [7]. For $z \in \{z_1, \ldots, z_k\}$, let $\hat V^*(z)$, $V^*(z)$ and $B_{s,j}$ be as defined in the proof of Theorem 1. Let $B_i$ be the $(p+q) \times (p+q)$ matrix whose elements are $h^2\sum_{s=1}^{d} B_{s,j}(z_i)/f_Z(z_i)$, $1 \le j \le m = (p+q)^2$. From Lemma 1, we have
$$\begin{pmatrix} \sqrt{nh^d f_Z(z_1)/\kappa^d}\,\bigl(\hat V^*(z_1) - V^*(z_1) - B_1\bigr) \\ \vdots \\ \sqrt{nh^d f_Z(z_k)/\kappa^d}\,\bigl(\hat V^*(z_k) - V^*(z_k) - B_k\bigr) \end{pmatrix} \xrightarrow{D} \begin{pmatrix} N_1^* \\ \vdots \\ N_k^* \end{pmatrix} \equiv N^*,$$
where for $1 \le i \le k$, $N_i^*$ is a random matrix whose elements are normal with mean 0 and variance 1. Applying Skorohod's theorem, for $1 \le i \le k$, there exist random matrices $T_i$ and $W_{1,i}$ such that $T_i \sim (nh^d f_Z(z_i)/\kappa^d)^{1/2}(\hat V^*(z_i) - V^*(z_i) - B_i)$, $W_{1,i} \sim N_i^*$ and $T_i \to W_{1,i}$ almost surely. Therefore,
$$\hat V^*(z_i) \sim \frac{\sqrt{\kappa^d}\, T_i}{\sqrt{nh^d f_Z(z_i)}} + V^*(z_i) + B_i = V^*(z_i) + \frac{\sqrt{\kappa^d}}{\sqrt{nh^d f_Z(z_i)}}\bigl(W_{1,i} + W_{2,i}\bigr),$$
where $W_{2,i} = T_i - W_{1,i} + \sqrt{nh^d f_Z(z_i)/\kappa^d}\, B_i$. Note that $B_i = O(h^2)$. From (S6), $\sum_{i=1}^{k} \| W_{2,i} \| = o_p(1)$.
For $1 \le i \le k$, let $\tilde V_i = V^*(z_i) + (nh^d f_Z(z_i)/\kappa^d)^{-1/2}(W_{1,i} + W_{2,i})$, $A_1(z_i) = g(\tilde V_i, \alpha^*)\, g_{1,1}(\tilde V_i)$, and let $\tilde\rho_0^2(z_i)$ be the largest eigenvalue of $A_1(z_i)(g_{1,1}(\tilde V_i))^{-1}$. Here the functions $g(\cdot,\cdot)$ and $g_{1,1}$ are defined in (19) and (18) respectively. Then, $\tilde\rho_0(z_i)$ has the same distribution as $\hat\rho(z_i)$. Below we will show that the impact of $W_{2,i}$ is negligible in the derivation of the asymptotic distribution of $\tilde\rho_0(z_i)$.
For $1 \le r, s \le 2$ and $1 \le i \le k$, let $\triangle_{r,s,i} = g_{r,s}(\tilde V_i) - g_{r,s}(V^*(z_i))$. Then,
$$\sum_{i=1}^{k}\sum_{r=1}^{2}\sum_{s=1}^{2} \| \triangle_{r,s,i} \|^2 = O_p\Bigl(\frac{1}{nh^d} + h^4\Bigr) = O_p\Bigl(\frac{1}{nh^d}\Bigr)$$
and
$$\begin{aligned}
A_1(z_i) = {}& g_{1,2}(V^*(z_i))\bigl(g_{2,2}(\tilde V_i)\bigr)^{-1} g_{2,1}(V^*(z_i)) - g_{1,1}(\tilde V_i)\,\alpha^*(\alpha^*)^T\, g_{1,1}(\tilde V_i) \\
& + g_{1,2}(V^*(z_i))\triangle_{2,1,i} + \triangle_{1,2,i}\, g_{2,1}(V^*(z_i)) + \triangle_{1,2,i}\triangle_{2,1,i} \\
& - g_{1,2}(V^*(z_i))\triangle_{2,2,i}\triangle_{2,1,i} - \triangle_{1,2,i}\triangle_{2,2,i}\, g_{2,1}(V^*(z_i)) + R_{1,i},
\end{aligned}$$
where
$$\begin{aligned}
R_{1,i} = {}& \triangle_{1,2,i}\bigl((g_{2,2}(\tilde V_i))^{-1} - I_q\bigr)\triangle_{2,1,i} \\
& + g_{1,2}(V^*(z_i))\bigl((g_{2,2}(\tilde V_i))^{-1} - I_q + \triangle_{2,2,i}\bigr)\triangle_{2,1,i} \\
& + \triangle_{1,2,i}\bigl((g_{2,2}(\tilde V_i))^{-1} - I_q + \triangle_{2,2,i}\bigr)\, g_{2,1}(V^*(z_i))
\end{aligned}$$
and $I_q$ denotes the $q \times q$ identity matrix. Note that $g_{2,2}(\tilde V_i)$ can be expressed as
$$g_{2,2}(\tilde V_i) = \begin{pmatrix} 1 & B_i^T \\ B_i & D_i \end{pmatrix}$$
for some matrices $B_i$ and $D_i$, so $A_1(z_i)$ becomes
$$\begin{aligned}
& B_i^T\bigl((D_i - B_i B_i^T)^{-1} - I_{q-1}\bigr) B_i\, J + g_{1,2}(V^*(z_i))(\triangle_{2,2,i} - J)^2\, g_{2,1}(V^*(z_i)) \\
& - \triangle_{1,1,i}\, g_{1,2}(V^*(z_i))\, g_{2,1}(V^*(z_i))\,\triangle_{1,1,i} + \triangle_{1,2,i}\triangle_{2,1,i} \\
& - g_{1,2}(V^*(z_i))\triangle_{2,2,i}\triangle_{2,1,i} - \triangle_{1,2,i}\triangle_{2,2,i}\, g_{2,1}(V^*(z_i)) + R_{1,i},
\end{aligned}$$
where $J = \alpha^*(\alpha^*)^T$. Let
$$\begin{aligned}
A_2(z_i) = {}& g_{1,2}(V^*(z_i))\bigl(g_{2,2}(W_{1,i})\bigr)^2\, g_{2,1}(V^*(z_i)) \\
& - g_{1,1}(W_{1,i})\, g_{1,2}(V^*(z_i))\, g_{2,1}(V^*(z_i))\, g_{1,1}(W_{1,i}) + g_{1,2}(W_{1,i})\, g_{2,1}(W_{1,i})
\end{aligned}$$
and
$$\begin{aligned}
R_{2,i} = {}& B_i^T\bigl((D_i - B_i B_i^T)^{-1} - I_{q-1}\bigr) B_i\, J - (nh^d f_Z(z_i)/\kappa^d)^{-1} A_2(z_i) \\
& + g_{1,2}(V^*(z_i))(\triangle_{2,2,i} - J)^2\, g_{2,1}(V^*(z_i)) \\
& - \triangle_{1,1,i}\, g_{1,2}(V^*(z_i))\, g_{2,1}(V^*(z_i))\,\triangle_{1,1,i} + \triangle_{1,2,i}\triangle_{2,1,i} \\
& - g_{1,2}(V^*(z_i))\triangle_{2,2,i}\triangle_{2,1,i} - \triangle_{1,2,i}\triangle_{2,2,i}\, g_{2,1}(V^*(z_i)).
\end{aligned}$$
Then,
$$A_1(z_i) = \frac{A_2(z_i)\,\kappa^d}{nh^d f_Z(z_i)} + R_{1,i} + R_{2,i}, \tag{21}$$
where
$$\sum_{i=1}^{k}\bigl(\| R_{1,i} \|^2 + \| R_{2,i} \|^2\bigr) = O_p\Bigl(\frac{1}{(nh^d)^2}\Bigr). \tag{22}$$
Note that under conditional independence, for $1 \le i \le k$, $A_2(z_i) = C_i C_i^T$, where $C_i$ is the $p \times q$ matrix obtained by replacing the elements in the first column and first row of $g_{1,2}(W_{1,i})$ with zeros, and $g_{1,2}(W_{1,i})$ is a random matrix whose elements are IID $N(0,1)$ except that the $(1,1)$-th element is 1. Therefore, $\sum_{i=1}^{k} \| A_2(z_i) \|^2 = O_p(1)$, which, together with (21) and (22), implies that $\sum_{i=1}^{k} \| A_1(z_i) \|^2 = O_p\bigl(1/(nh^d)^2\bigr)$ and
$$\sum_{i=1}^{k} \bigl\| A_1(z_i)(g_{1,1}(\tilde V_i))^{-1} - A_1(z_i) \bigr\|^2 = O_p\Bigl(\frac{1}{(nh^d)^3}\Bigr). \tag{23}$$
For $1 \le i \le k$, let $\lambda_{0,i}$ be the largest eigenvalue of $A_2(z_i)$. By (21), (22) and (23),
$$\sum_{i=1}^{k}\bigl(nh^d f_Z(z_i)\,\tilde\rho_0^2(z_i)/\kappa^d - \lambda_{0,i}\bigr)^2 = o_p(1).$$
Let $\tilde f_i$, $\tilde\rho(z_i)$ and $\lambda_i$, $1 \le i \le k$, be random variables such that the joint distribution of $(\tilde f_i, \tilde\rho(z_i)): 1 \le i \le k$ is the same as that of $(\hat f_Z(z_i), \hat\rho(z_i)): 1 \le i \le k$, and the joint distribution of $(\tilde\rho(z_i), \lambda_i): 1 \le i \le k$ is the same as that of $(\tilde\rho_0(z_i), \lambda_{0,i}): 1 \le i \le k$. Note that $nh^d\sum_{i=1}^{k}(\hat\rho(z_i))^2 = O_p(1)$, so we have that
$$\Bigl|\frac{nh^d}{\kappa^d}\sum_{i=1}^{k}\hat f_Z(z_i)(\hat\rho(z_i))^2 - \frac{nh^d}{\kappa^d}\sum_{i=1}^{k} f_Z(z_i)(\hat\rho(z_i))^2\Bigr| \le \frac{nh^d}{\kappa^d}\Bigl(\sum_{i=1}^{k}(\hat f_Z(z_i) - f_Z(z_i))^2\Bigr)^{1/2}\sum_{i=1}^{k}(\hat\rho(z_i))^2 = O_p(1)\, O_p\bigl((nh^d)^{-1/2}\bigr) = O_p\bigl((nh^d)^{-1/2}\bigr)$$
and
$$\Bigl|\frac{nh^d}{\kappa^d}\sum_{i=1}^{k}\tilde f_i(\tilde\rho(z_i))^2 - \sum_{i=1}^{k}\lambda_i\Bigr| \le O_p\bigl((nh^d)^{-1/2}\bigr) + o_p(1) = o_p(1).$$
4.4 Proof of Theorem 3
Suppose that $\rho(z_i) > 0$ for some $z_i$. Then, we have $\sum_{i=1}^{k} f_Z(z_i)\rho^2(z_i) > 0$. Choose $\varepsilon$ such that $0 < \varepsilon < \sum_{i=1}^{k} f_Z(z_i)\rho^2(z_i)$ and we have
$$\underbrace{P\Bigl(\sum_{i=1}^{k}\hat f_Z(z_i)\hat\rho^2(z_i) \ge \sum_{i=1}^{k} f_Z(z_i)\rho^2(z_i) - \varepsilon\Bigr)}_{III} \le P\Bigl(\sum_{i=1}^{k}\hat f_Z(z_i)\hat\rho^2(z_i) \ge \frac{\kappa^d F_{1-\alpha}^*}{nh^d}\Bigr)$$
for large $n$. From Theorem 1,
$$III \ge P\Bigl(\Bigl|\sum_{i=1}^{k}\hat f_Z(z_i)\hat\rho^2(z_i) - \sum_{i=1}^{k} f_Z(z_i)\rho^2(z_i)\Bigr| \le \varepsilon\Bigr) \to 1,$$
so
$$P\Bigl(\sum_{i=1}^{k}\hat f_Z(z_i)\hat\rho^2(z_i) \ge \frac{\kappa^d F_{1-\alpha}^*}{nh^d}\Bigr) \to 1.$$
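The decision rule used throughout the consistency argument can be written down directly; the sketch below is ours (all names hypothetical), showing only how the statistic $\sum_i \hat f_Z(z_i)\hat\rho^2(z_i)$ is compared with the scaled critical value $\kappa^d F^*_{1-\alpha}/(nh^d)$:

```python
# Sketch of the rejection rule: reject conditional independence when the
# statistic exceeds the scaled critical value from Theorem 2's limit law.
def reject(stat_sum, n, h, d, kappa, f_crit):
    """stat_sum: sum_i fhat_Z(z_i) * rhohat(z_i)**2
    f_crit:   the (1 - alpha) quantile F*_{1-alpha} of the limiting law."""
    return stat_sum >= (kappa ** d) * f_crit / (n * h ** d)
```

Since the threshold shrinks at rate $(nh^d)^{-1}$ while the statistic stays bounded away from zero under the alternative, the rejection probability tends to one, as the proof shows.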
4.5 The verification of the expression for $\sigma_n^2$

The expression for $\sigma_n^2$ involves some variance and covariance terms. Under the conditions in Theorem 1, the leading terms of those variances and covariances can be obtained. The results are as follows. For $1 \le i, i^* \le k$ and $1 \le j, j^* \le m$, Cases 1–4 hold.

1. $\operatorname{Var}\bigl(u_{j,t}\, k_0\bigl(\frac{Z_t - z_i}{h}\bigr)\bigr) = h^d \kappa^d \sigma_j^2(z_i) f_Z(z_i) + O(h^{d+2})$.
2. $\operatorname{Cov}\bigl(u_{j,t}\, k_0\bigl(\frac{Z_t - z_i}{h}\bigr),\, u_{j^*,t}\, k_0\bigl(\frac{Z_t - z_i}{h}\bigr)\bigr) = h^d \kappa^d c_{jj^*}(z_i) f_Z(z_i) + O(h^{d+2})$.
3. $\operatorname{Cov}\bigl(u_{j,t}\, k_0\bigl(\frac{Z_t - z_i}{h}\bigr),\, u_{j,t}\, k_0\bigl(\frac{Z_t - z_{i^*}}{h}\bigr)\bigr) = O(h^{2d})$.
4. $\operatorname{Cov}\bigl(u_{j,t}\, k_0\bigl(\frac{Z_t - z_i}{h}\bigr),\, u_{j^*,t}\, k_0\bigl(\frac{Z_t - z_{i^*}}{h}\bigr)\bigr) = O(h^{2d})$.

We will only give the proof for Case 1 since the proofs for the other cases are similar. Case 1 follows from
$$\begin{aligned}
\operatorname{Var}\Bigl(u_{j,t}\, k_0\Bigl(\frac{Z_t - z_i}{h}\Bigr)\Bigr) &= E\biggl(E\Bigl(u_{j,t}^2\, k_0\Bigl(\frac{Z_t - z_i}{h}\Bigr)^2 \,\Bigm|\, Z_t\Bigr)\biggr) \\
&= \int \sigma_j^2(z_t)\, k_0\Bigl(\frac{z_t - z_i}{h}\Bigr)^2 f_Z(z_t)\, dz_t \\
&= h^d \int \sigma_j^2(z_i + h\nu)(k_0(\nu))^2 f_Z(z_i + h\nu)\, d\nu \\
&= h^d \int \sigma_j^2(z_i)(k_0(\nu))^2\Bigl(f_Z(z_i) + h\sum_{s=1}^{d} f_s(z_i)\nu_s + O(h^2)\Bigr) d\nu \\
&= h^d \kappa^d \sigma_j^2(z_i) f_Z(z_i) + O(h^{d+2}).
\end{aligned}$$
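Case 1 can also be checked by simulation. The Monte Carlo sketch below is a rough sanity check of ours; the distributional choices ($d = 1$, Gaussian kernel $k_0$ with $\kappa = \int k_0^2 = 1/(2\sqrt\pi)$, $Z_t \sim N(0,1)$, $u_{j,t} \sim N(0,1)$ independent of $Z_t$ so that $\sigma_j^2 \equiv 1$) are illustrative and not from the paper:

```python
import numpy as np

# Monte Carlo check of Case 1 with d = 1 and a standard normal kernel.
rng = np.random.default_rng(0)
n, h, z = 500_000, 0.3, 0.5
kappa = 1.0 / (2.0 * np.sqrt(np.pi))   # kappa = integral of k0(v)^2 dv

Z = rng.standard_normal(n)             # Z_t ~ N(0,1), so f_Z is the N(0,1) pdf
u = rng.standard_normal(n)             # u_{j,t}: mean 0, sigma_j^2(z) = 1
k0 = np.exp(-0.5 * ((Z - z) / h) ** 2) / np.sqrt(2.0 * np.pi)

empirical = np.var(u * k0)             # Var(u_{j,t} k0((Z_t - z_i)/h))
f_z = np.exp(-0.5 * z * z) / np.sqrt(2.0 * np.pi)
theory = h * kappa * f_z               # leading term h^d kappa^d sigma_j^2 f_Z(z_i)
print(empirical, theory)               # close for small h; gap is the O(h^{d+2}) term
```

With $h = 0.3$ the two values agree to within a few percent, the discrepancy being the $O(h^{d+2})$ remainder in Case 1.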
Acknowledgements
This research was supported by the National Science Council of Taiwan under grant NSC 99-2118-M-004-006. The authors would like to thank the reviewers for their careful reading and constructive comments.
References
[1] Taoufik Bouezmarni, Jeroen V. K. Rombouts, and Abderrahim Taamouti. A nonparametric copula based test for conditional independence with applications to Granger causality. Working paper, Departamento de Economía, Universidad Carlos III de Madrid, 2009.
[2] Miguel A. Delgado and Wenceslao González Manteiga. Significance testing in nonparametric regression based on the bootstrap. The Annals of Statistics, 29(5):1469–1507, 2001.
[3] J. P. Florens and Denis Fougere. Noncausality in continuous time. Econometrica, 64(5):1195–1212, 1996.
[4] J. P. Florens and M. Mouchart. A note on noncausality. Econometrica, 50(3):583–591, 1982.
[5] P. G. Hall and C. C. Heyde. Martingale Limit Theory and Its Applications. Academic Press, 1980.
[6] Craig Hiemstra and Jonathan D. Jones. Testing for linear and nonlinear Granger causality in the stock price–volume relation. The Journal of Finance, 49(5):1639–1664, 1994.
[7] Tzee Ming Huang. Testing conditional independence using maximal nonlinear conditional correlation. The Annals of Statistics, 38(4):2047–2091, 2010.
[8] Lexin Li, R. Dennis Cook, and Christopher J. Nachtsheim. Model-free variable selection. Journal of the Royal Statistical Society, Series B: Statistical Methodology, 67(2):285–299, 2005.
[9] Qi Li and Jeffrey Scott Racine. Nonparametric Econometrics: Theory and Practice. Princeton University Press, 2007.
[10] Oliver Bruce Linton and Pedro Gozalo. Conditional independence restrictions: Testing and estimation. Cowles Foundation Discussion Papers 1140, Cowles Foundation, Yale University, 1996.
[11] Efstathios Paparoditis and Dimitris N. Politis. The local bootstrap for kernel estimators under general dependence conditions. Annals of the Institute of Statistical Mathematics, 52(1):139–159, 2000.
[12] Liangjun Su and Halbert White. A consistent characteristic function-based test for conditional independence. Journal of Econometrics, 141:807–834, 2007.
[13] Liangjun Su and Halbert White. A nonparametric Hellinger metric test for conditional independence. Econometric Theory, 24(4):829–864, 2008.
[14] V. A. Volkonskii and Yu. A. Rozanov. Some limit theorems for random functions, I. Theory of Probability and Its Applications, 4(2):178–197, 1959.