National Science Council (Executive Yuan) Research Project Final Report
A Conditional Independence Test for Dependent Data
Research Report (Condensed Version)
Project Type: Individual
Project Number: NSC 99-2118-M-004-006-
Project Period: August 1, 2010 to July 31, 2011
Host Institution: Department of Statistics, National Chengchi University
Principal Investigator: Tzee-Ming Huang (黃子銘)
Project Staff: Yu-Ting Cheng (程毓婷), Master's student, part-time assistant;
Yu-Hsiang Cheng (鄭宇翔), Ph.D. student, part-time assistant
Availability: This report is publicly accessible
Date: July 29, 2011
A conditional independence test for dependent data based on maximal conditional correlation
Yu-Hsiang Cheng
Department of Statistics
National Chengchi University
Taipei, Taiwan, ROC
96354501@nccu.edu.tw
Tzee-Ming Huang
Department of Statistics
National Chengchi University
Taipei, Taiwan, ROC
tmhuang@nccu.edu.tw
June 16, 2011
Abstract
In Huang [7], a test of conditional independence based on maximal nonlinear conditional correlation is proposed, and the asymptotic distribution of the test statistic under conditional independence is established for IID data. In this paper, we derive the asymptotic distribution of the test statistic under conditional independence for α-mixing data. Simulation results show that the test performs reasonably well for dependent data. We also apply the test to stock index data to test Granger noncausality between returns and trading volume.
Keywords: conditional independence test, α-mixing, maximal nonlinear conditional correlation
2010 Mathematics Subject Classification: 62G10, 62H20
1 Introduction
The testing of conditional independence is important in statistics; one interesting application of such testing is variable selection. For instance, consider the following regression problem:
Y = f(Z, X) + ε, (1)

where ε is independent of (Z, X) and f is a real-valued function. If Y and X are conditionally independent given Z, the variable X can be excluded from the model in (1).
Suppose that X, Y and Z are continuous random vectors of dimensions d1, d2 and d respectively. For testing whether X and Y are conditionally independent given Z, most tests in the literature deal with the case where the observations for (X, Y, Z) are IID; see, for example, Linton and Gozalo [10], Delgado and Manteiga [2], Li, Cook and Nachtsheim [8], and Huang [7].
When the observations for (X, Y, Z) are weakly dependent, fewer tests are available in the literature. Su and White [12, 13] developed nonparametric tests based on a weighted Hellinger distance between conditional densities or the difference between conditional characteristic functions. Bouezmarni, Rombouts and Taamouti [1] also proposed a nonparametric test based on the Hellinger distance of copula densities.
In [12], [13] and [1], one motivation for constructing conditional independence tests for dependent data is to test Granger noncausality, which, according to Florens and Mouchart [4] and Florens and Fougere [3], is a form of conditional independence. Specifically, a series {Ut} does not Granger cause a series {Vt} if

Vt ⊥ (Ut−1, Ut−2, . . . , Ut−p) | (Vt−1, Vt−2, . . . , Vt−p) for every p ≥ 1,

where ⊥ denotes independence.
In this paper, we consider Huang's test statistic and derive its asymptotic distribution for α-mixing data. To measure the conditional association between X and Y given Z, Huang [7] uses a measure called the maximal nonlinear conditional correlation, which is defined as

sup_{(f,g)∈S0*} Corr(f(X, Z), g(Y, Z) | Z), (2)

where S0* is the collection of (f, g)'s such that E(f²(X, Z)) < ∞ and E(g²(Y, Z)) < ∞. Huang's test statistic is an estimator of a weighted average of maximal nonlinear conditional correlations at different evaluation points of the conditioning variable Z. The test statistic also involves certain basis functions used to approximate the f and g in (2). We show that the asymptotic distribution of Huang's test statistic for α-mixing data is the same as that for IID data if the number of evaluation points and the number of basis functions are held constant.
This paper is organized as follows. In Section 2, we review the definition of the maximal nonlinear conditional correlation and certain approximation results given in [7], and state the asymptotic properties of the test statistic that we derive under an α-mixing condition. Some simulation results and an application are presented in Section 3. Proofs are given in Section 4.
2 Review and main results
In this section, we review the definition of the maximal nonlinear conditional correlation ρ1(X, Y |Z), the approximation of ρ1(X, Y |Z), and the estimator of ρ1(X, Y |Z = z) proposed in [7]. Then, we consider Huang's test statistic for testing H0 : ρ1(X, Y |Z) = 0 and present its asymptotic properties under α-mixing conditions.
2.1 Definition, approximation, and estimation of the maximal nonlinear conditional correlation
The maximal nonlinear conditional correlation ρ1(X, Y |Z) is essentially the maximum of E(f(X, Z)g(Y, Z)|Z) over S0, where S0 is the collection of (f, g)'s that satisfy the following conditions:

E(f²(X, Z)|Z) I_(0,∞)(E(f²(X, Z)|Z)) = I_(0,∞)(E(f²(X, Z)|Z)),
E(g²(Y, Z)|Z) I_(0,∞)(E(g²(Y, Z)|Z)) = I_(0,∞)(E(g²(Y, Z)|Z)), (3)

and

E(f(X, Z)|Z) = E(g(Y, Z)|Z) = 0. (4)

To avoid dealing with the existence of the maximum and the measurability of ρ1(X, Y |Z), in [7] ρ1(X, Y |Z) is defined as
sup_{(f,g)∈S0} E(f(X, Z)g(Y, Z)|Z),

where the supremum is defined as

lim_{n→∞} E(αn(X, Z)βn(Y, Z)|Z),

where {(αn, βn)} is a sequence in S0 that satisfies the following conditions:

(i) The sequence {E(αn(X, Z)βn(Y, Z)|Z)} is non-decreasing.

(ii) For every (f, g) ∈ S0,

E(f(X, Z)g(Y, Z)|Z) ≤ lim_{n→∞} E(αn(X, Z)βn(Y, Z)|Z).
To approximate

ρ1(X, Y |Z) = sup_{(f,g)∈S0} E(f(X, Z)g(Y, Z)|Z),

we consider S0,p,q, the collection of all (f, g)'s in S0 such that f and g are in the spans of {φp,j : 1 ≤ j ≤ p} and {ψq,k : 1 ≤ k ≤ q} respectively, when Z is given. That is,

f(X, Z) = Σ_{j=1}^p ap,j(Z) φp,j(X) for some ap,j(Z)'s

and

g(Y, Z) = Σ_{k=1}^q bq,k(Z) ψq,k(Y) for some bq,k(Z)'s.
Suppose that the basis functions φp,i's and ψq,j's are selected so that there exist basis functions θr,k's such that

lim_{p,r→∞} inf_{a(i,k)} E( α(X, Z) − Σ_{1≤i≤p, 1≤k≤r} a(i, k) φp,i(X) θr,k(Z) )² = 0 (5)

and

lim_{q,r→∞} inf_{b(j,k)} E( β(Y, Z) − Σ_{1≤j≤q, 1≤k≤r} b(j, k) ψq,j(Y) θr,k(Z) )² = 0 (6)

for every α and β such that E(α²(X, Z)) and E(β²(Y, Z)) are finite. Let X, Y and Z be the ranges of X, Y and Z respectively. Suppose that for each (p, q), there exist coefficients ap,0,i's and bq,0,j's such that

Σ_{1≤i≤p} ap,0,i φp,i(x) = 1 = Σ_{1≤j≤q} bq,0,j ψq,j(y) (7)

for every x in X and every y in Y. Let

ρp,q(Z) = max_{(f,g)∈S0,p,q} E(f(X, Z)g(Y, Z)|Z).
Then, by Fact 2 in [7], ρ1(X, Y |Z) can be reasonably approximated by ρp,q(Z) if p and q are large. The statement of the fact is given below.

FACT 1. (Fact 2 in [7]) Suppose that (5), (6) and (7) hold and {pn} and {qn} are sequences of positive integers that tend to ∞ as n → ∞. Then

lim_{n→∞} E(|ρ1(X, Y |Z) − ρ_{pn,qn}(Z)|) = 0.
A remark.

• It is not difficult to find basis functions that satisfy (5), (6) and (7). If X, Y and Z are bounded regions in R^{d1}, R^{d2} and R^d respectively and the Lebesgue densities of (X, Z) and (Y, Z) are bounded, then the φp,i's and ψq,j's can be taken as B-spline basis functions on multidimensional intervals containing X and Y respectively, and the θr,k's can be taken as B-spline basis functions on a multidimensional interval containing Z.
ρp,q(Z) can be found as follows. First, we look for vectors a1 = (a1,1(Z), . . . , a1,p(Z))^T and b1 = (b1,1(Z), . . . , b1,q(Z))^T such that (a1, b1) is the pair (a, b) that maximizes a^T Σφ,ψ,p,q(Z) b subject to

a^T Σφ,p(Z) a = 1 = b^T Σψ,q(Z) b,

where

Σφ,p(Z) = ( E(φp,i(X)φp,j(X)|Z) − E(φp,i(X)|Z)E(φp,j(X)|Z) )_{p×p},
Σψ,q(Z) = ( E(ψq,i(Y)ψq,j(Y)|Z) − E(ψq,i(Y)|Z)E(ψq,j(Y)|Z) )_{q×q}

and

Σφ,ψ,p,q(Z) = ( E(φp,i(X)ψq,j(Y)|Z) − E(φp,i(X)|Z)E(ψq,j(Y)|Z) )_{p×q}.

Take

f1(X, Z) = Σ_{j=1}^p a1,j(Z)( φp,j(X) − E(φp,j(X)|Z) )

and

g1(Y, Z) = Σ_{k=1}^q b1,k(Z)( ψq,k(Y) − E(ψq,k(Y)|Z) ).

Then, E(f1(X, Z)g1(Y, Z)|Z) = ρp,q(Z).
For z ∈ Z, let Σ̂φ,ψ,p,q(z), Σ̂φ,p(z) and Σ̂ψ,q(z) be the kernel estimators of Σφ,ψ,p,q(z), Σφ,p(z) and Σψ,q(z) respectively; in other words, every element E(g(X, Y)|Z = z) of Σφ,ψ,p,q(z), Σφ,p(z) and Σψ,q(z) is estimated by

Ê(g(X, Y)|Z = z) = ( Σ_{t=1}^n g(Xt, Yt) k0((Zt − z)/h) ) / ( Σ_{t=1}^n k0((Zt − z)/h) ) (8)

in Σ̂φ,ψ,p,q(z), Σ̂φ,p(z) and Σ̂ψ,q(z), where k0 is a kernel function defined on R^d and h > 0. Then, we use

ρ̂p,q(z) = max_{(a,b)} a^T Σ̂φ,ψ,p,q(z) b

to estimate ρp,q(z), where the maximum is taken over all pairs (a, b) that satisfy

a^T Σ̂φ,p(z) a = 1 = b^T Σ̂ψ,q(z) b.

Henceforth, the estimator ρ̂p,q(z) will be abbreviated as ρ̂(z) for each z in Z.
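To make the estimation concrete, the computation of ρ̂(z) from a sample can be sketched as follows. This is a minimal numpy sketch under simplifying assumptions of ours (scalar Z, a triangular kernel, and ρ̂(z) computed as the largest singular value of the whitened estimated cross-covariance matrix, which is equivalent to the constrained maximization above when the estimated covariance matrices are positive definite); all function names are ours, not from [7].

```python
import numpy as np

def tri_kernel(u):
    # triangular kernel k*(u) = 1 - |u| on [-1, 1], zero elsewhere
    return np.clip(1.0 - np.abs(u), 0.0, None)

def cond_mean(g, Z, z, h):
    # Nadaraya-Watson estimate of E(g | Z = z), as in (8), for scalar Z
    w = tri_kernel((Z - z) / h)
    return np.sum(g * w) / np.sum(w)

def inv_sqrt(M, eps=1e-10):
    # symmetric pseudo inverse square root; near-null directions are dropped,
    # which removes the constant direction implied by (7)
    w, V = np.linalg.eigh(M)
    inv = np.where(w > eps, 1.0 / np.sqrt(np.where(w > eps, w, 1.0)), 0.0)
    return (V * inv) @ V.T

def rho_hat(Phi, Psi, Z, z, h):
    # Phi (n x p) and Psi (n x q) hold the basis evaluations phi_i(X_t), psi_j(Y_t);
    # rho_hat(z) is the largest singular value of the whitened cross-covariance
    p, q = Phi.shape[1], Psi.shape[1]
    m_phi = np.array([cond_mean(Phi[:, i], Z, z, h) for i in range(p)])
    m_psi = np.array([cond_mean(Psi[:, j], Z, z, h) for j in range(q)])
    S_phi = np.array([[cond_mean(Phi[:, i] * Phi[:, j], Z, z, h)
                       for j in range(p)] for i in range(p)]) - np.outer(m_phi, m_phi)
    S_psi = np.array([[cond_mean(Psi[:, i] * Psi[:, j], Z, z, h)
                       for j in range(q)] for i in range(q)]) - np.outer(m_psi, m_psi)
    S_cross = np.array([[cond_mean(Phi[:, i] * Psi[:, j], Z, z, h)
                         for j in range(q)] for i in range(p)]) - np.outer(m_phi, m_psi)
    A = inv_sqrt(S_phi) @ S_cross @ inv_sqrt(S_psi)
    return min(1.0, float(np.linalg.svd(A, compute_uv=False)[0]))
```

For conditionally independent data ρ̂(z) fluctuates around 0, while perfect conditional association drives it toward 1.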
2.2 A test for conditional independence and related asymptotic properties
The conditional independence test that we use in this paper is based on ρ̂²(z) at different z's. Since each ρ̂(z) is determined by the kernel estimators of certain conditional expectations, we first derive their joint asymptotic distribution. Then, we use Σ_{i=1}^k f̂Z(zi) ρ̂²(zi) as our test statistic and establish its consistency and asymptotic distribution. Here the zi's are selected points in Z and

f̂Z(·) = ( Σ_{t=1}^n k0((Zt − ·)/h) ) / (n h^d)

is the kernel density estimator of fZ, the Lebesgue pdf of Z. In order to avoid dealing with the boundary bias problem in kernel estimation, we consider a set Z0 contained in the interior of Z, so that points in Z0 are away from the boundary of Z, and choose the zi's from Z0.
Our first result concerns the joint asymptotic distribution of kernel estimators of some conditional expectations. In order to describe the assumptions, we first review the definition of the α-mixing coefficients. For a strictly stationary process {Ut}, let F_a^b denote the σ-algebra generated by (Ua, . . . , Ub). Then, the α-mixing coefficient at lag s for {Ut} is

sup { |P(A ∩ B) − P(A)P(B)| : A ∈ F_0^t, B ∈ F_{t+s}^∞, t ≥ 0 }.

{Ut} is said to be α-mixing if its α-mixing coefficient at lag s tends to 0 as s tends to ∞. Let α(s) denote the α-mixing coefficient at lag s for the process {(Xt, Yt, Zt)}. Our assumptions are provided below.
(S0) The basis functions φp,1, . . ., φp,p and ψq,1, . . ., ψq,q are bounded and (5), (6) and (7) hold. For the sake of brevity, φp,1, . . ., φp,p and ψq,1, . . ., ψq,q will be abbreviated as φ1, . . ., φp and ψ1, . . ., ψq respectively hereafter.
(S1) {(Xt, Yt, Zt) ∈ R^{d1+d2+d}, t ≥ 0} is a strictly stationary α-mixing process that satisfies α(τ) = O(τ^{−(1+ε)}), where ε > max(1, d/2), and d1, d2 and d denote the dimensions of Xt, Yt and Zt respectively.
(S2) There exist an open subset Z0 of the interior of Z and a σ-finite measure µ such that for every z ∈ Z0, the conditional distribution of (X, Y) given Z = z has a pdf f(·|z) with respect to µ. Further, f(x, y|z) and fZ(z) are twice differentiable with respect to z on Z0.
(S3) There exists a function h on X × Y such that

sup_{z∈Z0} max( |f(x, y|z)|, max_{1≤i≤d} |∂f(x, y|z)/∂zi|, max_{1≤i,j≤d} |∂²f(x, y|z)/(∂zi ∂zj)| ) ≤ h(x, y)

and ∫ h(x, y) dµ(x, y) < ∞.
(S4) There exist constants c0 and c1 such that

sup_{z∈Z0} max( |fZ(z)|, max_{1≤i≤d} |∂fZ(z)/∂zi|, max_{1≤i,j≤d} |∂²fZ(z)/(∂zi ∂zj)| ) ≤ c0

and 1/fZ(z) ≤ c1 for z ∈ Z0.
(S5) k* is a kernel function defined on R^1, and k0 is a product kernel on R^d that satisfies

k0(v1, v2, . . . , vd) = k*(v1) k*(v2) · · · k*(vd),

where k* ≥ 0, sup_v k*(v) < ∞, ∫ k*(v) dv = 1, ∫ v k*(v) dv = 0, ∫ v (k*(v))² dv = 0 and κ2 = ∫ v² k*(v) dv < ∞.
(S6) As n → ∞, the bandwidth h → 0, nh^d → ∞ and nh^{d+4} → 0.
Under the above conditions, the joint asymptotic distribution of kernel estimators of conditional expectations can be established, as stated in Lemma 1. The proof of Lemma 1 is provided in Section 4.1.
LEMMA 1. Suppose that Conditions (S1)–(S6) hold. Suppose that g1, g2, . . ., gm are bounded functions defined on X × Y. Suppose z1, . . ., zk are distinct points in Z0. For i = 1, . . ., k, let

ĝj(zi) = ( Σ_{t=1}^n gj(Xt, Yt) k0((Zt − zi)/h) ) / ( Σ_{t=1}^n k0((Zt − zi)/h) )

be the kernel estimator of gj*(zi) ≡ E(gj(X, Y)|Z = zi). Further, let

Bs,j(zi) = (κ2/2)( fZ(zi) g*_{j,ss}(zi) + 2 fs(zi) g*_{j,s}(zi) ) (9)

and

Wj,n(zi) = √(nh^d) ( ĝj(zi) − gj*(zi) − h² Σ_{s=1}^d Bs,j(zi)/fZ(zi) )

for 1 ≤ i ≤ k and 1 ≤ j ≤ m, where g*_{j,s} and g*_{j,ss} denote the first and second partial derivatives of gj* with respect to the s-th component respectively, and fs denotes the first partial derivative of fZ with respect to the s-th component. Let

uj,t = gj(Xt, Yt) − gj*(Zt),
c_{jj*}(zi) = E(uj,1 u_{j*,1} | Z1 = zi),
σj²(zi) = E(u²_{j,1} | Z1 = zi),

and

Wn = (W1,n(z1), . . . , W1,n(zk), . . . , Wm,n(z1), . . . , Wm,n(zk))^T.

Then, Wn converges in distribution to a random vector

(Z*_{1,1}, . . . , Z*_{k,1}, . . . , Z*_{1,m}, . . . , Z*_{k,m})^T ≡ Z*,

where Z* is multivariate normal with mean 0 and, for 1 ≤ i, i* ≤ k and 1 ≤ j, j* ≤ m,

Cov(Z*_{i,j}, Z*_{i*,j*}) = κ^d σj²(zi)/fZ(zi) if i = i* and j = j*;
                           κ^d c_{jj*}(zi)/fZ(zi) if i = i* and j ≠ j*;
                           0 if i ≠ i*,

where κ = ∫ (k*(v))² dv.
Now, suppose that the basis functions φl's and ψ_{m*}'s are linearly independent. For the sake of convenience, for z ∈ {z1, . . . , zk}, we apply certain linear transformations to the φl's and ψ_{m*}'s to obtain new basis functions φ*_l's and ψ*_{m*}'s (ρ̂(z) remains unchanged under such transformations). Take g1(X, Y), . . ., gm(X, Y) to be the functions φ*_l(X)φ*_{l′}(X), φ*_l(X)ψ*_{m*}(Y) and ψ*_{m*}(Y)ψ*_{m′}(Y), where 1 ≤ l ≤ l′ ≤ p and 1 ≤ m* ≤ m′ ≤ q. Then, the consistency of ρ̂(z) can be established and we have Theorem 1. The proof of Theorem 1 is provided in Section 4.2.
THEOREM 1. Suppose that Conditions (S0)–(S6) hold and the basis functions φl's and ψ_{m*}'s are linearly independent. Suppose z1, . . ., zk are distinct points in Z0. Then,

Σ_{i=1}^k ( ρ̂²(zi) − ρ²_{p,q}(zi) )² = Op( 1/(nh^d) + h⁴ )

and

( Σ_{i=1}^k f̂Z(zi) ρ̂²(zi) − Σ_{i=1}^k fZ(zi) ρ²_{p,q}(zi) )² = Op( 1/(nh^d) + h⁴ ).
The following theorem states the approximate distribution of the statistic Σ_{i=1}^k f̂Z(zi) ρ̂²(zi) when X and Y are conditionally independent given Z.

THEOREM 2. Suppose that the conditions in Theorem 1 hold and X and Y are conditionally independent given Z. Then,

(nh^d/κ^d) Σ_{i=1}^k f̂Z(zi) ρ̂²(zi) converges in distribution to Σ_{i=1}^k λi

as n tends to ∞, where the λi's are IID and have the same distribution as the largest eigenvalue of a matrix CC^T, where C is a (p − 1) × (q − 1) matrix whose elements are IID N(0, 1).
The proof of Theorem 2 is provided in Section 4.3. Theorem 2 is similar to Theorem 3.2 given in [7]. The main difference between the two is that Theorem 2 can be applied to α-mixing data. In addition, p and q are held fixed in Theorem 2, while they are allowed to depend on n and tend to ∞ as n tends to ∞ in Theorem 3.2 in [7].
According to Theorem 2, a test that rejects H0 if

(nh^d/κ^d) Σ_{i=1}^k f̂Z(zi) ρ̂²(zi) ≥ F*_{1−α} (10)

is of approximate level α, where F* is the distribution function of Σ_{i=1}^k λi and F*_{1−α} is the 1 − α quantile of F*. Theorem 3 states that the test with rejection region in (10) is consistent if p and q are sufficiently large and one of the ρ(zi)'s is positive. The proof of this theorem is provided in Section 4.4.
THEOREM 3. Suppose that ρ(zi) > 0 for some zi, and that p and q are sufficiently large so that ρp,q(zi) > 0. Then, for 0 < α < 1, the probability that (10) holds tends to 1 as n → ∞.
3 Simulation studies and application to S&P500 index data

3.1 Simulation studies
In this section, we conduct several simulation studies to illustrate the performance of our test. The data generating processes, labeled Data1 – Data13, are described below. In order to make our simulation results comparable with those of the test proposed by Su and White [13], some of our data generating processes (Data1 – Data10) are the same as theirs. Throughout the description of Data1 – Data10, (ε1,t, ε2,t, ε3,t) are IID N(0, I3).
Data1: (Xt, Yt, Zt) = (ε1,t, ε2,t, ε3,t).

Data2: Xt = 0.5Xt−1 + ε1,t, Yt = 0.5Yt−1 + ε2,t and Zt = Xt−1.

Data3: Xt = ε1,t √(0.01 + 0.5(Xt−1)²), Yt = 0.5Yt−1 + ε2,t and Zt = Xt−1.

Data4: Xt = ε1,t √h1,t, Yt = ε2,t √h2,t, Zt = Xt−1, h1,t = 0.01 + 0.9h1,t−1 + 0.05(Xt−1)² and h2,t = 0.01 + 0.9h2,t−1 + 0.05(Yt−1)².

Data5: Xt = 0.5Xt−1 + 0.5Yt + ε1,t, Yt = 0.5Yt−1 + ε2,t and Zt = Xt−1.

Data6: Xt = 0.5Xt−1 + 0.5(Yt)² + ε1,t, Yt = 0.5Yt−1 + ε2,t and Zt = Xt−1.

Data7: Xt = 0.5Xt−1Yt + ε1,t, Yt = 0.5Yt−1 + ε2,t and Zt = Xt−1.

Data8: Xt = 0.5Xt−1 + 0.5Ytε1,t, Yt = 0.5Yt−1 + ε2,t and Zt = Xt−1.

Data9: Xt = ε1,t √(0.01 + 0.5(Xt−1)² + 0.25(Yt)²), Yt = 0.5Yt−1 + ε2,t and Zt = Xt−1.

Data10: Xt = ε1,t √h1,t, Yt = ε2,t √h2,t, Zt = Xt−1, h1,t = 0.01 + 0.1h1,t−1 + 0.4(Xt−1)² + 0.5(Yt)² and h2,t = 0.01 + 0.9h2,t−1 + 0.5(Yt)².

Data11: (Xt, Yt, Zt) = (ε1,t, ε2,t, ε3,t), where (ε1,t, ε2,t, ε3,t) are IID LN(0, I3).

Data12: Xt = ε1,tε1,t−1, Yt = ε2,tε2,t−1 and Zt = Xt−1, where (ε1,t, ε2,t) are IID LN(0, I2).

Data13: Xt = ε1,tε2,t−1, Yt = (ε1,t)²ε2,t−1 and Zt = ε2,t−1, where (ε1,t, ε2,t) are IID LN(0, I2).
Here, Data1 – Data4, Data11 and Data12 are used for examining the level of the test, and Data5 – Data10 and Data13 are used for checking the power.
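As an illustration, Data5 (a power case: X depends on Y given Z) can be generated as follows; the burn-in length is our choice, not part of the original specification.

```python
import numpy as np

def gen_data5(n, rng, burn=100):
    # Data5: X_t = 0.5 X_{t-1} + 0.5 Y_t + eps1_t, Y_t = 0.5 Y_{t-1} + eps2_t,
    # Z_t = X_{t-1}; X and Y are conditionally dependent given Z
    e1 = rng.standard_normal(n + burn)
    e2 = rng.standard_normal(n + burn)
    X = np.zeros(n + burn)
    Y = np.zeros(n + burn)
    for t in range(1, n + burn):
        Y[t] = 0.5 * Y[t - 1] + e2[t]
        X[t] = 0.5 * X[t - 1] + 0.5 * Y[t] + e1[t]
    # drop the burn-in and set Z_t = X_{t-1}
    return X[burn:], Y[burn:], X[burn - 1:-1]
```

The burn-in discards the influence of the arbitrary zero initial conditions so that the retained sample is approximately stationary.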
3.1.1 Simulation studies based on asymptotic distribution of the test statistic
We first apply our test using the asymptotic distribution of the test statistic.

Parameter set-up: In order to apply our test, certain parameters need to be specified, including the kernel function k*, the kernel bandwidth h and the basis functions. For simplicity, in all the simulation experiments, we take the kernel bandwidth h to be cn^{−0.25}, where n is the sample size and c ∈ {0.5, 1, 1.5, 2}; we use the triangular kernel

k*(x) = 1 − |x| if −1 ≤ x ≤ 1, and k*(x) = 0 otherwise.
In addition, the basis functions φ*1, . . . , φ*p and ψ*1, . . . , ψ*q are selected in the following manner. For i = 1, . . . , p and j = 1, . . . , q, let

φi(x) = 1 if (i − 1)/p ≤ x < i/p, and 0 otherwise, (11)

and

ψj(y) = 1 if (j − 1)/q ≤ y < j/q, and 0 otherwise, (12)

where p = q = 4. Since the basis functions are defined on [0, 1], we transform the data (Xt, Yt, Zt), t = 1, . . . , n, into (F1(Xt), F2(Yt), F3(Zt)), t = 1, . . . , n, before applying the test, where F1, F2 and F3 denote the empirical CDFs of {Xt}, {Yt} and {Zt} respectively. For the choice of the evaluation points, we take z1 = 0.78n^{−0.25} ≡ h0 and zi = zi−1 + 2h0 for i ≥ 2, as long as zi ≤ 1 − h0.
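The set-up above can be sketched in a few lines; the (rank − 0.5)/n version of the empirical CDF transform is a minor variant of ours that keeps the transformed values strictly inside (0, 1):

```python
import numpy as np

def ecdf_transform(u):
    # map each observation to (rank - 0.5) / n, a variant of the empirical CDF
    # transform that keeps values strictly inside (0, 1)
    ranks = np.argsort(np.argsort(u)) + 1
    return (ranks - 0.5) / len(u)

def eval_points(n):
    # z_1 = 0.78 n^{-0.25} = h0 and z_i = z_{i-1} + 2 h0, while z_i <= 1 - h0
    h0 = 0.78 * n ** (-0.25)
    z, zi = [], h0
    while zi <= 1.0 - h0:
        z.append(zi)
        zi += 2.0 * h0
    return np.array(z)
```

For n = 500 this rule gives three evaluation points, spaced 2h0 apart and at least h0 away from the boundary.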
Table 1 shows that the levels of the test are below 0.05 for c = 0.5 and c = 1, and the powers of the test are larger for larger c. It appears that when c = 1, the levels of the test are close to 0.05 and the power performance is good.
                 n = 500                          n = 1000
         c = 0.5  c = 1   c = 1.5  c = 2   c = 0.5  c = 1   c = 1.5  c = 2
Data1    0.030    0.039   0.053    0.071   0.043    0.048   0.057    0.074
Data2    0.030    0.041   0.058    0.074   0.033    0.048   0.060    0.069
Data3    0.032    0.042   0.055    0.080   0.038    0.049   0.055    0.070
Data4    0.038    0.044   0.057    0.075   0.042    0.048   0.057    0.066
Data5    0.951    1       1        1       1        1       1        1
Data6    0.898    1       1        1       0.997    1       1        1
Data7    0.918    1       1        1       0.985    1       1        1
Data8    0.995    1       1        1       1        1       1        1
Data9    0.725    0.991   1        1       0.993    1       1        1
Data10   0.374    0.817   0.959    0.986   0.819    0.996   1        1
Data11   0.036    0.050   0.062    0.079   0.035    0.042   0.049    0.059
Data12   0.036    0.051   0.055    0.072   0.041    0.041   0.053    0.065
Data13   1        1       1        1       1        1       1        1

Table 1: Power results for different c's when n = 500 and n = 1000
3.1.2 Simulation studies based on local bootstrap
The test based on the asymptotic distribution of the test statistic does not work well for small sample sizes. Figure 1 shows that the distribution of the test statistic and the asymptotic distribution are quite different for Data11 when n = 100. For Data1 – Data4 and Data12, we find similar patterns. When n = 200, the difference between the two distributions becomes smaller but is still visible.

To apply our test for small sample sizes, we consider the local bootstrap procedure proposed by Paparoditis and Politis [11], described below. For a given sample {(Xt, Yt, Zt), t = 1, . . . , n}, a local bootstrap sample {(X*t, Y*t, Z*t), t = 1, . . . , n} is generated according to the following steps.
Figure 1: Exact distribution (solid line) versus asymptotic distribution (dashed line) of the test statistic with different bandwidth choices (h = cn^{−1/4})
(a) For 1 ≤ t ≤ n, we draw Z*t from the empirical cumulative distribution function F̂Z, where

F̂Z(z) = (1/n) Σ_{t=1}^n I(−∞,Zt](z).

(b) For 1 ≤ t ≤ n, we draw X*t and Y*t independently from the empirical cumulative distribution functions F̂_{X|Z=Z*t} and F̂_{Y|Z=Z*t} respectively, where

F̂_{X|Z=Z*t}(x) = ( Σ_{s=1}^n k*((Z*t − Zs)/b) I(−∞,Xs](x) ) / ( Σ_{s=1}^n k*((Z*t − Zs)/b) )

and

F̂_{Y|Z=Z*t}(y) = ( Σ_{s=1}^n k*((Z*t − Zs)/b) I(−∞,Ys](y) ) / ( Σ_{s=1}^n k*((Z*t − Zs)/b) ).

Here, the bandwidth b is taken to be n^{−0.2} and the kernel function k* is the probability density function of N(0, 1).
In order to determine the rejection region for a given sample, we repeat the above procedure to obtain bootstrap resamples and compute the test statistic nh^d κ^{−d} Σ_{i=1}^k f̂Z(zi) ρ̂²(zi) for the original sample and each local bootstrap resample. For a given level α, if the test statistic based on the given sample is larger than the (1 − α) quantile of the test statistics computed from the local bootstrap resamples, we reject the conditional independence hypothesis at level α. The purpose of using the local bootstrap procedure is to generate a resample {(X*t, Y*t, Z*t), t = 1, . . . , n} such that the distribution of Z*, the conditional distribution of X* given Z* = z and that of Y* given Z* = z are close to the distribution of Z, the conditional distribution of X given Z = z and that of Y given Z = z respectively. In addition, since X*t and Y*t are generated independently given Z*t = z, they are conditionally independent given Z*t = z, irrespective of whether or not X and Y are conditionally independent given Z.
In these simulation studies, we choose the basis functions in (11) and (12) with p = q = 5. The evaluation points are {0.2, 0.4, 0.6, 0.8}, and the kernel bandwidth h is taken to be cn^{−0.25}, where n is the sample size and c ∈ {0.5, 1, 1.5, 2}.
Finally, we present a few experimental results for our test (Test 1) and Su and White's test (Test 2). For Test 2, we run the simulations for Data11 – Data13 with the bandwidth hn = c*n^{−1/8.5}, where c* = 1 or 2. Each power estimate is based on 3000 repetitions, with 1000 local bootstrap resamples used in each repetition. For the sake of comparison, we also list some power estimates for Test 2 for Data1 – Data10, which are taken directly from [13]; they use 250 repetitions with 200 local bootstrap resamples for each repetition.
Tables 2 and 3 indicate the level and power estimates for Test 1 and Test 2 at significance level 5% when the sample sizes are 100 and 200 respectively.
                 Data1  Data2  Data3  Data4  Data5  Data6  Data7
Test 2, c* = 1   0.096  0.060  0.048  0.072  0.668  0.756  0.388
Test 2, c* = 2   0.072  0.036  0.072  0.048  0.952  0.944  0.576
Test 1, c = 0.5  0.045  0.061  0.046  0.062  0.525  0.479  0.265
Test 1, c = 1    0.046  0.050  0.050  0.047  0.746  0.717  0.400
Test 1, c = 1.5  0.040  0.052  0.056  0.055  0.814  0.779  0.329
Test 1, c = 2    0.041  0.050  0.053  0.062  0.852  0.793  0.218

                 Data8  Data9  Data10  Data11  Data12  Data13
Test 2, c* = 1   0.860  0.828  0.680   0.034   0.043   0.589
Test 2, c* = 2   0.940  0.988  0.912   0.022   0.022   0.859
Test 1, c = 0.5  0.692  0.357  0.195   0.058   0.050   1
Test 1, c = 1    0.873  0.566  0.320   0.049   0.048   1
Test 1, c = 1.5  0.889  0.618  0.341   0.049   0.041   1
Test 1, c = 2    0.860  0.631  0.348   0.046   0.045   1

Table 2: Power comparison between Tests 1 and 2 when n = 100
                 Data1  Data2  Data3  Data4  Data5  Data6  Data7
Test 2, c* = 1   0.064  0.052  0.080  0.080  0.900  0.960  0.596
Test 2, c* = 2   0.044  0.060  0.056  0.048  1      1      0.864
Test 1, c = 0.5  0.040  0.061  0.036  0.055  0.827  0.830  0.488
Test 1, c = 1    0.049  0.051  0.057  0.054  0.982  0.983  0.831
Test 1, c = 1.5  0.046  0.048  0.049  0.053  0.995  0.989  0.827
Test 1, c = 2    0.045  0.045  0.047  0.057  0.997  0.995  0.735

                 Data8  Data9  Data10  Data11  Data12  Data13
Test 2, c* = 1   0.992  0.968  0.880   0.031   0.036   0.347
Test 2, c* = 2   1      1      0.996   0.025   0.032   0.872
Test 1, c = 0.5  0.988  0.730  0.392   0.062   0.062   1
Test 1, c = 1    1      0.947  0.679   0.048   0.047   1
Test 1, c = 1.5  1      0.968  0.738   0.051   0.043   1
Test 1, c = 2    1      0.971  0.745   0.058   0.037   1

Table 3: Power comparison between Tests 1 and 2 when n = 200
3.2 Application to S&P500 index data

In this section, we apply the linear Granger causality test (hereafter denoted by Test LIN) and our conditional independence test (Test 1) to check the interaction between returns and volume for S&P500 index data at one day lag. There are 2514 observations of daily index returns and trading volume from January 2000 to December 2009, taken from Yahoo Finance. Here, the return for day t is defined as
Rt = 100 log(Pt/Pt−1),

where Pt is the index value for day t. Moreover, the trading volume for day t (in dollars), denoted by Vt, is transformed into

V*t = log(Vt/Vt−1).
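These transformations amount to first differences of logs; a minimal sketch (function name ours):

```python
import numpy as np

def transform_series(P, V):
    # R_t = 100 * log(P_t / P_{t-1}) and V*_t = log(V_t / V_{t-1})
    R = 100.0 * np.diff(np.log(np.asarray(P, dtype=float)))
    V_star = np.diff(np.log(np.asarray(V, dtype=float)))
    return R, V_star
```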
The above transformations are commonly used in the analysis of financial data; see, for example, Hiemstra and Jones [6] and [1]. The augmented Dickey-Fuller test reveals that the series {Rt} and {V*t} are stationary.
In order to examine whether {Rt} is useful for predicting {V*t}, we consider the effects up to lag 1. Specifically, we test

H0: V*t ⊥ Rt−1 | V*t−1 (13)

using Test 1. For Test LIN, it is assumed that

E(V*t | Rt−1, V*t−1) = a1Rt−1 + b1V*t−1

and the null hypothesis is

H0: a1 = 0. (14)

We use the notation Rt−1 ⇏ V*t to denote the relation expressed in (13) or (14). The notation V*t−1 ⇏ Rt is defined analogously.
The p-values for Test LIN and Test 1 are provided in Table 4. For Test 1, we use the same parameter set-up as in Section 3.1.1 and find that both the return-to-volume and volume-to-return relationships are significant at the 5% level. However, for Test LIN, the volume-to-return relationship is not significant. These findings are consistent with the results obtained in [6] and [1].
H0         Rt−1 ⇏ V*t    V*t−1 ⇏ Rt
Test LIN   0.000         0.804
Test 1     0.001         0.032

Table 4: p-values for Test LIN and Test 1 for testing the relationship between returns and volume changes
To illustrate the implementation of our test for the d > 1 case, we also apply the test to test

H0: V*t ⊥ (Rt−1, Rt−2) | (V*t−1, V*t−2) (15)

and

H0: Rt ⊥ (V*t−1, V*t−2) | (Rt−1, Rt−2). (16)

The empirical CDF transforms are applied component-wise. For instance, we transform (V*t−1, V*t−2), t = 4, . . . , n, into (F1(V*t−1), F2(V*t−2)), t = 4, . . . , n, where n = 2512 and Fi is the empirical CDF of V*t−i for i = 1, 2. For the basis functions, we use 4 basis functions φ1, . . ., φ4 on [0, 1] and 4 basis functions ψ1, . . ., ψ4 on [0, 1]², where φ1, . . ., φ4 are given in (11) with p = 4, ψ1(y1, y2) = I[0,0.5)(y1)I[0,0.5)(y2), ψ2(y1, y2) = I[0,0.5)(y1)I[0.5,1)(y2), ψ3(y1, y2) = I[0.5,1)(y1)I[0,0.5)(y2) and ψ4 = 1 − ψ1 − ψ2 − ψ3. Here IA(·) denotes the indicator function of A. In addition, the kernel bandwidth is cn^{−1/(d+δ)} with c = 1.4 and δ = 2.4. The evaluation points are all the points in S²_{h0}, where S_{h0} = {(2k − 1)h0 : k is an integer} ∩ [h0, 1 − h0] and h0 = 0.78n^{−1/(d+δ)}. Here c and δ are selected from certain candidate values so that the levels of the test are close to 0.05 when the data are IID U(0, 1). The p-values for (15) and (16) are 0.017 and 0.370 respectively.
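The evaluation grid S²_{h0} used here can be built as follows (a sketch of ours; for n = 2512, d = 2 and δ = 2.4 it yields a 3 × 3 grid):

```python
import numpy as np

def eval_grid(n, d, delta=2.4):
    # S_{h0} = {(2k - 1) h0 : k integer} intersected with [h0, 1 - h0],
    # with h0 = 0.78 n^{-1/(d + delta)}; the grid is the d-fold product of S_{h0}
    h0 = 0.78 * n ** (-1.0 / (d + delta))
    s = np.arange(h0, 1.0 - h0 + 1e-12, 2.0 * h0)
    grid = np.array(np.meshgrid(*([s] * d))).reshape(d, -1).T
    return grid
```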
Some remarks on the implementation of the test.

• It is recommended to choose the evaluation points so that any two evaluation points zi and zj are at least 2h apart (in each component) when a compact kernel supported on [−1, 1]^d is used. In that case, ρ̂(zi) and ρ̂(zj) are independent for IID data, which makes the distribution of the test statistic close to the derived asymptotic distribution. Since nh^d → ∞, h cannot be too small, which implies that the number of evaluation points cannot be too large.

• We apply empirical CDF transforms to our data so that the distribution of each component of X, Y and Z is supported on [0, 1]. The transforms are data dependent, and it is not clear whether the transformed data can be treated as if they were transformed by the true underlying CDF. The simulation results are fine, but further investigation is needed.
4 Proofs

In this section, we give proofs of Theorems 1 – 3 and Lemma 1. Before giving the proofs, we first define and recall some notation. Recall that k* is a kernel on R^1 and k0 is a product kernel on R^d defined by

k0(v1, v2, . . . , vd) = k*(v1)k*(v2) · · · k*(vd),

κ = ∫ (k*(v))² dv and κ2 = ∫ v² k*(v) dv.

For a (p + q) × (p + q) matrix V0, let g1,1(V0), g1,2(V0), g2,1(V0) and g2,2(V0) denote the matrices of dimensions p × p, p × q, q × p and q × q respectively such that

V0 = ( g1,1(V0)  g1,2(V0)
       g2,1(V0)  g2,2(V0) ).
4.1 Proof of Lemma 1

For simplicity, we prove the lemma only for the case where m = 2 and k = 2. For t = 1, 2, . . . , n, i = 1, 2 and j = 1, 2, let

η̂j,1(zi) = (nh^d)^{−1} Σ_{t=1}^n ( gj*(Zt) − gj*(zi) ) k0((Zt − zi)/h),

η̂j,2(zi) = (nh^d)^{−1} Σ_{t=1}^n uj,t k0((Zt − zi)/h)

and η̂j(zi) = η̂j,1(zi) + η̂j,2(zi). Then, ĝj(zi) − gj*(zi) = η̂j(zi)/f̂Z(zi), where f̂Z(zi) = (nh^d)^{−1} Σ_{t=1}^n k0((Zt − zi)/h). We can complete the proof using the following results (A1)–(A3):

(A1) Suppose that the conditions in Lemma 1 hold. Then, for 1 ≤ i, j ≤ 2,

η̂j,1(zi) = h² Σ_{s=1}^d Bs,j(zi) + op( h³ + (nh^d)^{−1/2} ).

(A2) Suppose that the conditions in Lemma 1 hold. Then,

Z*n ≡ √(nh^d) ( η̂1,2(z1), η̂2,2(z1), η̂1,2(z2), η̂2,2(z2) )^T →D Z,

where the distribution of Z is N(0, Σ) and

Σ = κ^d ( σ1²(z1)fZ(z1)   c12(z1)fZ(z1)   0               0
          c12(z1)fZ(z1)   σ2²(z1)fZ(z1)   0               0
          0               0               σ1²(z2)fZ(z2)   c12(z2)fZ(z2)
          0               0               c12(z2)fZ(z2)   σ2²(z2)fZ(z2) ).

(A3) Suppose that (Xn1, Xn2, . . . , Xnk)^T →D (Y1, Y2, . . . , Yk)^T and (Zn1, Zn2, . . . , Znk)^T →D (c1, c2, . . . , ck)^T, where c1, c2, . . ., ck are constants. Then,

(Xn1Zn1, Xn2Zn2, . . . , XnkZnk)^T →D (c1Y1, c2Y2, . . . , ckYk)^T.

From (A1), (A2) and the assumption that nh^{d+4} → 0, we have

√(nh^d) ( η̂1(z1) − h² Σ_{s=1}^d Bs,1(z1),
          η̂2(z1) − h² Σ_{s=1}^d Bs,2(z1),
          η̂1(z2) − h² Σ_{s=1}^d Bs,1(z2),
          η̂2(z2) − h² Σ_{s=1}^d Bs,2(z2) )^T
∼ Z + (nh^d)^{1/2} op( h³ + (nh^d)^{−1/2} ) →D Z,

where A ∼ B means that the distributions of A and B are the same. Applying (A3), we obtain Lemma 1.
The proofs of (A1)–(A3) are given below.

• Proof of (A1). Note that

E(η̂j,1(zi)) = h^{−d} ∫ ( gj*(zt) − gj*(zi) ) k0((zt − zi)/h) fZ(zt) dzt
            = ∫ ( gj*(zi + hν) − gj*(zi) ) k0(ν) fZ(zi + hν) dν     (ν = (ν1, . . . , νd))
            = ∫ h Σ_{s=1}^d g*_{j,s}(zi) νs fZ(zi) k0(ν) dν
              + ∫ h² ( Σ_{s=1}^d g*_{j,s}(zi) νs )( Σ_{s=1}^d fs(zi) νs ) k0(ν) dν
              + (1/2) ∫ h² fZ(zi) Σ_{s=1}^d Σ_{s*=1}^d g*_{j,ss*}(zi) νs νs* k0(ν) dν + O(h³)
            = (h² κ2/2) Σ_{s=1}^d ( fZ(zi) g*_{j,ss}(zi) + 2 fs(zi) g*_{j,s}(zi) ) + O(h³)
            = h² Σ_{s=1}^d Bs,j(zi) + O(h³).

Let Ki,j,t = h^{−d} ( gj*(Zt) − gj*(zi) ) k0((Zt − zi)/h). Then, we have

Var(η̂j,1(zi)) = n^{−2} ( Σ_{t=1}^n Var(Ki,j,t) + Σ_{t=1}^n Σ_{s=1, s≠t}^n Cov(Ki,j,t, Ki,j,s) ).

Since

Var(Ki,j,t) = E(K²_{i,j,t}) − (E(Ki,j,t))²
            = h^{−d} ∫ ( gj*(zi + hν) − gj*(zi) )² (k0(ν))² fZ(zi + hν) dν
              − ( fZ(zi) h ∫ Σ_{s=1}^d g*_{j,s}(zi) νs k0(ν) dν + O(h²) )²
            = h^{−d} O(h²) − O(h⁴),

we have Σ_{t=1}^n Var(Ki,j,t) = O(nh^{2−d}). Note that from Corollary A.2 in Hall and Heyde [5] and the fact that for 2 < β < 2(2 + d)/d, E(|Ki,j,t|^β) = O(h^{2+d−βd}), we have

Σ_{s≠t} |Cov(Ki,j,t, Ki,j,s)| = 2 Σ_{t=1}^n Σ_{s>t}^n |Cov(Ki,j,t, Ki,j,s)|
                             ≤ 2n Σ_{s=1}^∞ |Cov(Ki,j,1, Ki,j,1+s)|
                             ≤ 16n O(h^{2(2+d−βd)/β}) Σ_{s=1}^∞ α^{(β−2)/β}(s).

Therefore,

Var(η̂j,1(zi)) = O( h²/(nh^d) ) + O( h^{2(2+d−βd)/β}/n ) = o( 1/(nh^d) ).

From the above results, η̂j,1(zi) = h² Σ_{s=1}^d Bs,j(zi) + op( h³ + (nh^d)^{−1/2} ).
• Proof of (A2). By the Cram´er-Wold Theorem, it is sufficient to prove that cTZ∗
n converges in distribution to cTZ for any c = (c1, c2, c3, c4)T in R4.
We use “big-small block” arguments to complete the proof. Assume that there exist positive integers p = p(n), q = q(n) and k = k(n) = [n/(p + q)] (the integer part of n/(p + q)) such that as n → ∞,
p → ∞, q → ∞, p = o(n), q = o(p), p = o(nhd)1/2, np−1α(q) = o(1), phd= o(1), phd→ ∞. Let Zn,t = 1 √ hd c1u1,tk0 Zt− z1 h + c2u2,tk0 Zt− z1 h +c3u1,tk0 Zt− z2 h + c4u2,tk0 Zt− z2 h .
Then, we have cTZ∗ n= √1n Pn t=1Zn,t ≡√1nWn. Let ξj =P (j+1)p+jq t=j(p+q)+1Zn,t and ζj=P (j+1)(p+q) t=(j+1)p+jq+1Zn,tfor j = 0,1, . . ., k−1, and ζk= Pn t=k(p+q)+1Zn,t. Then, Wn= k−1 X j=0 ξj | {z } Wn1 + k−1 X j=0 ζj | {z } Wn2
+ζk. In order to prove this lemma, it suffices
to show that as n → ∞, (1) E(exp(itWn1)) − Qk−1 j=0E(exp(itξj)) → 0, (2) √1 nWn2 p → 0 and √1 nζk p → 0, (3) σ2 n≡ Pk−1 j=0E(ξ 2 j) = n(σ2+ o(1)), (4) σ12 n Pk−1 j=0E(ξ 2
jI(|ξj| > εpσn2)) → 0 for any ε > 0,
where σ2=c21κdfZ(z1)σ12(z1) + c22κdfZ(z1)σ22(z1) + c23κdfZ(z2)σ12(z2) + c24κ df Z(z2)σ22(z2) + 2c1c2κdfZ(z1)c12(z1) + 2c3c4κdfZ(z2)c12(z2).
The verification of the above expression for σ2
n is given in Section 4.5.
We now prove these results respectively. From Lemma 18.2 in Li and Racine [9], which is due to Volkonskii and Rozanov [14],
E(exp(itWn1)) − k−1 Y j=0 E(exp(itξj)) ≤ 16kα(q) = O n pα(q) = o(1),
we obtain (1). In order to prove (2), we first consider Wn2. Note that
$$E(W_{n2}^2) = \operatorname{Var}\Bigl(\sum_{j=0}^{k-1}\zeta_j\Bigr) = \underbrace{k\,\operatorname{Var}(\zeta_0)}_{(P1)} + \underbrace{\sum_{i=0}^{k-1}\sum_{\substack{j=0 \\ j\ne i}}^{k-1} \operatorname{Cov}(\zeta_i, \zeta_j)}_{(P2)}.$$
Computation of (P1). Note that from
$$\operatorname{Var}(\zeta_0) = \sum_{i=1}^{q} \operatorname{Var}(Z_{n,i}) + 2\sum_{i=1}^{q}\sum_{j>i}^{q} \operatorname{Cov}(Z_{n,i}, Z_{n,j})$$
and the fact that
$$2\sum_{i=1}^{q}\sum_{j>i}^{q} \operatorname{Cov}(Z_{n,i}, Z_{n,j}) = 2q\sum_{j=1}^{q}\Bigl(1 - \frac{j}{q}\Bigr)\operatorname{Cov}(Z_{n,1}, Z_{n,1+j}) = O(q^2 h^d),$$
we have that
$$\operatorname{Var}(\zeta_0) = q\sigma^2 + O(q^2 h^d) + O(q h^2) = q\sigma^2(1 + o(1)).$$
Therefore,
$$(P1) = kq\sigma^2(1 + o(1)) = O(kq) = o(n).$$
Computation of (P2). Note that from Theorem A.5 in [5],
$$|(P2)| = 2\Bigl|\sum_{i=0}^{k-1}\sum_{j>i}^{k-1}\operatorname{Cov}(\zeta_i,\zeta_j)\Bigr| \le 2\sum_{i=1}^{n-p}\sum_{j=i+p}^{n} |\operatorname{Cov}(Z_{n,i}, Z_{n,j})| \le 2n\sum_{j=p}^{\infty} |\operatorname{Cov}(Z_{n,1}, Z_{n,1+j})| \le 2n\sum_{j=p}^{\infty} 4 C_{1n} C_{2n}\,\alpha(j) \le C^*\, \frac{n}{h^d}\sum_{j=p}^{\infty}\alpha(j) = o(n),$$
where $C_{in} = 4\max_k |c_k|\, \sup_s |u_{s,1}|\, \sup|k_0| / \sqrt{h^d}$ for $i = 1, 2$. Then, we have $E(W_{n2}^2)/n = o(1)$. Similarly, $\operatorname{Var}(\zeta_k) = O(p + q) = o(n)$, so (2) holds.
By stationarity and the same arguments as in the computation of $\operatorname{Var}(\zeta_0)$, we have $\operatorname{Var}(\xi_0) = p\sigma^2(1+o(1))$. Thus
$$\sum_{j=0}^{k-1} E(\xi_j^2)/n = kp\sigma^2(1+o(1))/n \to \sigma^2,$$
so (3) holds. Finally, since $|Z_{n,t}| \le C/\sqrt{h^d}$ and $p = o((nh^d)^{1/2})$, for every $\varepsilon > 0$ the set $\{|\xi_j| \ge \varepsilon\sqrt{\sigma_n^2}\}$ is empty when $n$ is large. Therefore, (4) holds. This completes the proof.
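The emptiness claim can be spelled out in one line, combining the block-size condition $p = o((nh^d)^{1/2})$ with (3):

```latex
|\xi_j| \le p \max_t |Z_{n,t}| \le \frac{Cp}{\sqrt{h^d}}
       = o\!\left(\frac{(nh^d)^{1/2}}{\sqrt{h^d}}\right) = o(\sqrt{n}),
\qquad
\varepsilon\sqrt{\sigma_n^2} = \varepsilon\sqrt{n}\,\bigl(\sigma + o(1)\bigr),
```

so eventually $|\xi_j| < \varepsilon\sqrt{\sigma_n^2}$ for every $j$, and each Lindeberg summand in (4) vanishes.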
• Proof of (A3). It is sufficient to prove that $(X_{n1}, \ldots, X_{nk}, Z_{n1}, \ldots, Z_{nk})^T \xrightarrow{D} (Y_1, \ldots, Y_k, c_1, \ldots, c_k)$. Let $X_n = (X_{n1}, \ldots, X_{nk})^T$, $Z_n = (Z_{n1}, \ldots, Z_{nk})^T$, $Y = (Y_1, \ldots, Y_k)^T$ and $c = (c_1, \ldots, c_k)^T$. Then,
$$E\bigl(e^{i(t^T X_n + s^T Z_n)}\bigr) = \underbrace{E\bigl(e^{i(t^T X_n + s^T c)}\bigl(e^{i s^T (Z_n - c)} - 1\bigr)\bigr)}_{I} + \underbrace{E\bigl(e^{i(t^T X_n + s^T c)}\bigr)}_{II}.$$
Note that $II \to E(e^{i(t^T Y + s^T c)})$ and $I \to 0$ by Lebesgue's dominated convergence theorem. Apply the continuous mapping theorem and we have (A3).
4.2 Proof of Theorem 1
We adopt the proof in [7]. For $z \in \{z_1, \ldots, z_k\}$, let $\phi_l^*$, $1 \le l \le p$, and $\psi_{m^*}^*$, $1 \le m^* \le q$, be linear combinations of the $\phi_l$'s and $\psi_{m^*}$'s such that $\phi_1^* = 1 = \psi_1^*$, the matrices $(E(\phi_l^*(X)\phi_{l'}^*(X) \mid Z = z) : 1 \le l, l' \le p)$ and $(E(\psi_{m^*}^*(Y)\psi_{m'}^*(Y) \mid Z = z) : 1 \le m^*, m' \le q)$ are identity matrices, and $E(\phi_l^*(X)\psi_{m^*}^*(Y) \mid Z = z) = 0$ for $l \ne m^*$. Take $g_1(X,Y), \ldots, g_m(X,Y)$ to be the functions $\phi_l^*(X)\phi_{l'}^*(X)$, $\phi_l^*(X)\psi_{m^*}^*(Y)$ and $\psi_{m^*}^*(Y)\psi_{m'}^*(Y)$, where $1 \le l \le l' \le p$ and $1 \le m^* \le m' \le q$. Apply Lemma 1 and we have
$$\sqrt{nh^d}\begin{pmatrix} \hat g_1(z_1) - g_1^*(z_1) \\ \vdots \\ \hat g_1(z_k) - g_1^*(z_k) \\ \vdots \\ \hat g_m(z_1) - g_m^*(z_1) \\ \vdots \\ \hat g_m(z_k) - g_m^*(z_k) \end{pmatrix} - \sqrt{nh^d}\begin{pmatrix} h^2\sum_{s=1}^{d} B_{s,1}(z_1)/f_Z(z_1) \\ \vdots \\ h^2\sum_{s=1}^{d} B_{s,1}(z_k)/f_Z(z_k) \\ \vdots \\ h^2\sum_{s=1}^{d} B_{s,m}(z_1)/f_Z(z_1) \\ \vdots \\ h^2\sum_{s=1}^{d} B_{s,m}(z_k)/f_Z(z_k) \end{pmatrix} \xrightarrow{D} Z^*. \tag{17}$$
Let
$$V^*(z) = \begin{pmatrix} V_{11}(z) & V_{12}(z) \\ V_{21}(z) & V_{22}(z) \end{pmatrix},$$
where the $(l, l')$-th element of $V_{11}(z)$ is $E(\phi_l^*(X)\phi_{l'}^*(X) \mid Z = z)$ for $1 \le l, l' \le p$, the $(l, m^*)$-th element of $V_{12}(z)$ is $E(\phi_l^*(X)\psi_{m^*}^*(Y) \mid Z = z)$ for $1 \le l \le p$, $1 \le m^* \le q$, the $(m^*, m')$-th element of $V_{22}(z)$ is $E(\psi_{m^*}^*(Y)\psi_{m'}^*(Y) \mid Z = z)$ for $1 \le m^*, m' \le q$, and $V_{21}(z) = (V_{12}(z))^T$. Let $\hat V^*(z)$ be the estimator of $V^*(z)$ obtained by replacing each conditional expectation in $V^*(z)$ with its kernel estimator defined in (8). Then, (17) gives
$$\sum_{i=1}^{k} \bigl\| \hat V^*(z_i) - V^*(z_i) \bigr\|^2 = O_p\Bigl(\frac{1}{nh^d}\Bigr) + O_p(h^4) = O_p\Bigl(\frac{1}{nh^d} + h^4\Bigr).$$
For $1 \le i \le k$, for a $p \times 1$ vector $a$ and a $(p+q) \times (p+q)$ matrix
$$U = \begin{pmatrix} U_{11} & U_{12} \\ U_{21} & U_{22} \end{pmatrix},$$
where the dimensions of $U_{11}$, $U_{12}$, $U_{21}$ and $U_{22}$ are $p \times p$, $p \times q$, $q \times p$ and $q \times q$ respectively, define
$$g_{r,s}(U) = U_{rs} \tag{18}$$
for $1 \le r, s \le 2$,
$$g_{r,s}^*(U) = \begin{cases} g_{r,s}(U) & \text{if } (r,s) = (1,2) \text{ or } (2,1); \\ (g_{r,s}(U))^{-1} & \text{if } (r,s) = (1,1) \text{ or } (2,2), \end{cases}$$
and
$$g(U, a) = U_{1,2} U_{2,2}^{-1} U_{2,1} U_{1,1}^{-1} - U_{1,1}\, a a^T. \tag{19}$$
Let $\alpha^*$ be the $p \times 1$ vector whose first element is 1 and whose remaining elements are 0. Then, $\hat\rho(z)$ and $\rho_{p,q}(z)$ are the square roots of the largest eigenvalues of the matrices $g(\hat V^*(z), \alpha^*)$ and $g(V^*(z), \alpha^*)$ respectively. Let $\triangle_{r,s,i} = g_{r,s}^*(\hat V^*(z_i)) - g_{r,s}^*(V^*(z_i))$. Then, we have
$$\bigl\| g(\hat V^*(z_i), \alpha^*) - g(V^*(z_i), \alpha^*) \bigr\| \le \prod_{r=1}^{2}\prod_{s=1}^{2}\bigl(\| g_{r,s}^*(V^*(z_i)) \| + \| \triangle_{r,s,i} \|\bigr) - \prod_{r=1}^{2}\prod_{s=1}^{2}\| g_{r,s}^*(V^*(z_i)) \| + \bigl\| g_{1,1}^*(\hat V^*(z_i)) - g_{1,1}^*(V^*(z_i)) \bigr\|\, \| \alpha^*(\alpha^*)^T \|,$$
which gives that
$$\sum_{i=1}^{k} \bigl\| g(\hat V^*(z_i), \alpha^*) - g(V^*(z_i), \alpha^*) \bigr\|^2 = O_p\Bigl(\frac{1}{nh^d} + h^4\Bigr) = O_p\Bigl(\frac{1}{nh^d}\Bigr)$$
and
$$\sum_{i=1}^{k}\bigl(\hat\rho^2(z_i) - \rho_{p,q}^2(z_i)\bigr)^2 = O_p\Bigl(\frac{1}{nh^d} + h^4\Bigr) \tag{20}$$
since $|\hat\rho^2(z_i) - \rho_{p,q}^2(z_i)| \le \| g(\hat V^*(z_i), \alpha^*) - g(V^*(z_i), \alpha^*) \|$ for $1 \le i \le k$. From (20) and the fact that $\sum_{i=1}^{k}(\hat f_Z(z_i) - f_Z(z_i))^2 = O_p\bigl(\frac{1}{nh^d} + h^4\bigr)$,
$$\Bigl(\sum_{i=1}^{k} \hat f_Z(z_i)\hat\rho^2(z_i) - \sum_{i=1}^{k} f_Z(z_i)\rho_{p,q}^2(z_i)\Bigr)^2 = \Bigl(\sum_{i=1}^{k}(\hat f_Z(z_i) - f_Z(z_i))\hat\rho^2(z_i) + \sum_{i=1}^{k} f_Z(z_i)(\hat\rho^2(z_i) - \rho_{p,q}^2(z_i))\Bigr)^2 = O_p\Bigl(\frac{1}{nh^d} + h^4\Bigr).$$
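As a numerical illustration of how $\hat\rho(z)$ is extracted from $\hat V^*(z)$ via (18)–(19), the following sketch computes the square root of the largest eigenvalue of $g(V, \alpha^*)$; the function name, the block sizes $p = q = 3$, and the matrices below are hypothetical choices of ours, not from the paper:

```python
import numpy as np

def rho_from_V(V, p, q):
    """Square root of the largest eigenvalue of g(V, alpha*), where
    g(U, a) = U12 U22^{-1} U21 U11^{-1} - U11 a a^T, as in (19)."""
    U11, U12 = V[:p, :p], V[:p, p:]
    U21, U22 = V[p:, :p], V[p:, p:]
    a = np.zeros((p, 1))
    a[0, 0] = 1.0                         # alpha* = (1, 0, ..., 0)^T
    G = U12 @ np.linalg.inv(U22) @ U21 @ np.linalg.inv(U11) - U11 @ (a @ a.T)
    return float(np.sqrt(max(np.linalg.eigvals(G).real.max(), 0.0)))

# Under conditional independence, V12 vanishes except for its (1,1) entry
# (phi*_1 = psi*_1 = 1) and V11, V22 are identities, so g(V, alpha*) = 0.
p = q = 3
V = np.eye(p + q)
V[0, p] = V[p, 0] = 1.0
print(rho_from_V(V, p, q))   # 0.0

V2 = V.copy()
V2[1, p + 1] = V2[p + 1, 1] = 0.5   # correlation 0.5 between phi*_2, psi*_2
print(rho_from_V(V2, p, q))         # 0.5
```

The second call shows that a single nonzero conditional correlation in $V_{12}$ is recovered exactly as the maximal conditional correlation.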
4.3 Proof of Theorem 2
We adopt the proof in [7]. For $z \in \{z_1, \ldots, z_k\}$, let $\hat V^*(z)$, $V^*(z)$ and $B_{s,j}$ be as defined in the proof of Theorem 1. Let $B_i$ be the $(p+q) \times (p+q)$ matrix whose elements are $h^2\sum_{s=1}^{d} B_{s,j}(z_i)/f_Z(z_i)$, $1 \le j \le m = (p+q)^2$. From Lemma 1, we have
$$\begin{pmatrix} \sqrt{nh^d f_Z(z_1)/\kappa^d}\,\bigl(\hat V^*(z_1) - V^*(z_1) - B_1\bigr) \\ \vdots \\ \sqrt{nh^d f_Z(z_k)/\kappa^d}\,\bigl(\hat V^*(z_k) - V^*(z_k) - B_k\bigr) \end{pmatrix} \xrightarrow{D} \begin{pmatrix} N_1^* \\ \vdots \\ N_k^* \end{pmatrix} \equiv N^*,$$
where for $1 \le i \le k$, $N_i^*$ is a random matrix whose elements are normal with mean 0 and variance 1. Applying Skorohod's theorem, for $1 \le i \le k$, there exist random matrices $T_i$ and $W_{1,i}$ such that $T_i \sim (nh^d f_Z(z_i)/\kappa^d)^{1/2}(\hat V^*(z_i) - V^*(z_i) - B_i)$, $W_{1,i} \sim N_i^*$ and $T_i \to W_{1,i}$ almost surely. Therefore,
$$\hat V^*(z_i) \sim \frac{\sqrt{\kappa^d}\, T_i}{\sqrt{nh^d f_Z(z_i)}} + V^*(z_i) + B_i = V^*(z_i) + \frac{\sqrt{\kappa^d}}{\sqrt{nh^d f_Z(z_i)}}\bigl(W_{1,i} + W_{2,i}\bigr),$$
where $W_{2,i} = T_i - W_{1,i} + \sqrt{nh^d f_Z(z_i)/\kappa^d}\, B_i$. Note that $B_i = O(h^2)$. From (S6), $\sum_{i=1}^{k} \| W_{2,i} \| = o_p(1)$.
For $1 \le i \le k$, let $\tilde V_i = V^*(z_i) + (nh^d f_Z(z_i)/\kappa^d)^{-1/2}(W_{1,i} + W_{2,i})$, $A_1(z_i) = g(\tilde V_i, \alpha^*)\, g_{1,1}(\tilde V_i)$, and let $\tilde\rho_0^2(z_i)$ be the largest eigenvalue of $A_1(z_i)(g_{1,1}(\tilde V_i))^{-1}$. Here the functions $g(\cdot,\cdot)$ and $g_{1,1}$ are defined in (19) and (18) respectively. Then, $\tilde\rho_0(z_i)$ has the same distribution as $\hat\rho(z_i)$. Below we will show that the impact of $W_{2,i}$ is negligible in the derivation of the asymptotic distribution of $\tilde\rho_0(z_i)$.
For $1 \le r, s \le 2$ and $1 \le i \le k$, let $\triangle_{r,s,i} = g_{r,s}(\tilde V_i) - g_{r,s}(V^*(z_i))$. Then,
$$\sum_{i=1}^{k}\sum_{r=1}^{2}\sum_{s=1}^{2} \| \triangle_{r,s,i} \|^2 = O_p\Bigl(\frac{1}{nh^d} + h^4\Bigr) = O_p\Bigl(\frac{1}{nh^d}\Bigr)$$
and
$$\begin{aligned}
A_1(z_i) = {}& g_{1,2}(V^*(z_i))\bigl(g_{2,2}(\tilde V_i)\bigr)^{-1} g_{2,1}(V^*(z_i)) - g_{1,1}(\tilde V_i)\,\alpha^*(\alpha^*)^T\, g_{1,1}(\tilde V_i) \\
& + g_{1,2}(V^*(z_i))\triangle_{2,1,i} + \triangle_{1,2,i}\, g_{2,1}(V^*(z_i)) + \triangle_{1,2,i}\triangle_{2,1,i} \\
& - g_{1,2}(V^*(z_i))\triangle_{2,2,i}\triangle_{2,1,i} - \triangle_{1,2,i}\triangle_{2,2,i}\, g_{2,1}(V^*(z_i)) + R_{1,i},
\end{aligned}$$
where
$$\begin{aligned}
R_{1,i} = {}& \triangle_{1,2,i}\bigl((g_{2,2}(\tilde V_i))^{-1} - I_q\bigr)\triangle_{2,1,i} \\
& + g_{1,2}(V^*(z_i))\bigl((g_{2,2}(\tilde V_i))^{-1} - I_q + \triangle_{2,2,i}\bigr)\triangle_{2,1,i} \\
& + \triangle_{1,2,i}\bigl((g_{2,2}(\tilde V_i))^{-1} - I_q + \triangle_{2,2,i}\bigr)\, g_{2,1}(V^*(z_i))
\end{aligned}$$
and $I_q$ denotes the $q \times q$ identity matrix. Note that $g_{2,2}(\tilde V_i)$ can be expressed as
$$g_{2,2}(\tilde V_i) = \begin{pmatrix} 1 & B_i^T \\ B_i & D_i \end{pmatrix}$$
for some matrices $B_i$ and $D_i$, so $A_1(z_i)$ becomes
$$\begin{aligned}
& B_i^T\bigl((D_i - B_i B_i^T)^{-1} - I_{q-1}\bigr) B_i\, J + g_{1,2}(V^*(z_i))(\triangle_{2,2,i} - J)^2\, g_{2,1}(V^*(z_i)) \\
& - \triangle_{1,1,i}\, g_{1,2}(V^*(z_i))\, g_{2,1}(V^*(z_i))\,\triangle_{1,1,i} + \triangle_{1,2,i}\triangle_{2,1,i} \\
& - g_{1,2}(V^*(z_i))\triangle_{2,2,i}\triangle_{2,1,i} - \triangle_{1,2,i}\triangle_{2,2,i}\, g_{2,1}(V^*(z_i)) + R_{1,i},
\end{aligned}$$
where $J = \alpha^*(\alpha^*)^T$. Let
$$\begin{aligned}
A_2(z_i) = {}& g_{1,2}(V^*(z_i))\bigl(g_{2,2}(W_{1,i})\bigr)^2\, g_{2,1}(V^*(z_i)) \\
& - g_{1,1}(W_{1,i})\, g_{1,2}(V^*(z_i))\, g_{2,1}(V^*(z_i))\, g_{1,1}(W_{1,i}) + g_{1,2}(W_{1,i})\, g_{2,1}(W_{1,i})
\end{aligned}$$
and
$$\begin{aligned}
R_{2,i} = {}& B_i^T\bigl((D_i - B_i B_i^T)^{-1} - I_{q-1}\bigr) B_i\, J - (nh^d f_Z(z_i)/\kappa^d)^{-1} A_2(z_i) \\
& + g_{1,2}(V^*(z_i))(\triangle_{2,2,i} - J)^2\, g_{2,1}(V^*(z_i)) \\
& - \triangle_{1,1,i}\, g_{1,2}(V^*(z_i))\, g_{2,1}(V^*(z_i))\,\triangle_{1,1,i} + \triangle_{1,2,i}\triangle_{2,1,i} \\
& - g_{1,2}(V^*(z_i))\triangle_{2,2,i}\triangle_{2,1,i} - \triangle_{1,2,i}\triangle_{2,2,i}\, g_{2,1}(V^*(z_i)).
\end{aligned}$$
Then,
$$A_1(z_i) = \frac{A_2(z_i)\,\kappa^d}{nh^d f_Z(z_i)} + R_{1,i} + R_{2,i}, \tag{21}$$
where
$$\sum_{i=1}^{k}\bigl(\| R_{1,i} \|^2 + \| R_{2,i} \|^2\bigr) = O_p\Bigl(\frac{1}{(nh^d)^2}\Bigr). \tag{22}$$
Note that under conditional independence, for $1 \le i \le k$, $A_2(z_i) = C_i C_i^T$, where $C_i$ is the $p \times q$ matrix obtained by replacing the elements in the first column and first row of $g_{1,2}(W_{1,i})$ with zeros, and $g_{1,2}(W_{1,i})$ is a random matrix whose elements are IID $N(0,1)$ except that the $(1,1)$-th element is 1. Therefore, $\sum_{i=1}^{k} \| A_2(z_i) \|^2 = O_p(1)$, which, together with (21) and (22), implies that $\sum_{i=1}^{k} \| A_1(z_i) \|^2 = O_p\bigl(1/(nh^d)^2\bigr)$ and
$$\sum_{i=1}^{k} \bigl\| A_1(z_i)(g_{1,1}(\tilde V_i))^{-1} - A_1(z_i) \bigr\|^2 = O_p\Bigl(\frac{1}{(nh^d)^3}\Bigr). \tag{23}$$
For $1 \le i \le k$, let $\lambda_{0,i}$ be the largest eigenvalue of $A_2(z_i)$. By (21), (22) and (23),
$$\sum_{i=1}^{k}\bigl(nh^d f_Z(z_i)\,\tilde\rho_0^2(z_i)/\kappa^d - \lambda_{0,i}\bigr)^2 = o_p(1).$$
Let $\tilde f_i$, $\tilde\rho(z_i)$ and $\lambda_i$, $1 \le i \le k$, be random variables such that the joint distribution of $(\tilde f_i, \tilde\rho(z_i)): 1 \le i \le k$ is the same as that of $(\hat f_Z(z_i), \hat\rho(z_i)): 1 \le i \le k$, and the joint distribution of $(\tilde\rho(z_i), \lambda_i): 1 \le i \le k$ is the same as that of $(\tilde\rho_0(z_i), \lambda_{0,i}): 1 \le i \le k$. Note that $nh^d\sum_{i=1}^{k}(\hat\rho(z_i))^2 = O_p(1)$, so we have that
$$\Bigl|\frac{nh^d}{\kappa^d}\sum_{i=1}^{k}\hat f_Z(z_i)(\hat\rho(z_i))^2 - \frac{nh^d}{\kappa^d}\sum_{i=1}^{k} f_Z(z_i)(\hat\rho(z_i))^2\Bigr| \le \frac{nh^d}{\kappa^d}\Bigl(\sum_{i=1}^{k}(\hat f_Z(z_i) - f_Z(z_i))^2\Bigr)^{1/2}\sum_{i=1}^{k}(\hat\rho(z_i))^2 = O_p(1)\, O_p\bigl((nh^d)^{-1/2}\bigr) = O_p\bigl((nh^d)^{-1/2}\bigr)$$
and
$$\Bigl|\frac{nh^d}{\kappa^d}\sum_{i=1}^{k}\tilde f_i(\tilde\rho(z_i))^2 - \sum_{i=1}^{k}\lambda_i\Bigr| \le O_p\bigl((nh^d)^{-1/2}\bigr) + o_p(1) = o_p(1).$$
4.4 Proof of Theorem 3
Suppose that $\rho(z_i) > 0$ for some $z_i$. Then, we have $\sum_{i=1}^{k} f_Z(z_i)\rho^2(z_i) > 0$. Choose $\varepsilon$ such that $0 < \varepsilon < \sum_{i=1}^{k} f_Z(z_i)\rho^2(z_i)$ and we have
$$\underbrace{P\Bigl(\sum_{i=1}^{k}\hat f_Z(z_i)\hat\rho^2(z_i) \ge \sum_{i=1}^{k} f_Z(z_i)\rho^2(z_i) - \varepsilon\Bigr)}_{III} \le P\Bigl(\sum_{i=1}^{k}\hat f_Z(z_i)\hat\rho^2(z_i) \ge \frac{\kappa^d F_{1-\alpha}^*}{nh^d}\Bigr)$$
for large $n$. From Theorem 1,
$$III \ge P\Bigl(\Bigl|\sum_{i=1}^{k}\hat f_Z(z_i)\hat\rho^2(z_i) - \sum_{i=1}^{k} f_Z(z_i)\rho^2(z_i)\Bigr| \le \varepsilon\Bigr) \to 1,$$
so
$$P\Bigl(\sum_{i=1}^{k}\hat f_Z(z_i)\hat\rho^2(z_i) \ge \frac{\kappa^d F_{1-\alpha}^*}{nh^d}\Bigr) \to 1.$$
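The decision rule used throughout the consistency argument can be written down directly; the sketch below is ours (all names hypothetical), showing only how the statistic $\sum_i \hat f_Z(z_i)\hat\rho^2(z_i)$ is compared with the scaled critical value $\kappa^d F^*_{1-\alpha}/(nh^d)$:

```python
# Sketch of the rejection rule: reject conditional independence when the
# statistic exceeds the scaled critical value from Theorem 2's limit law.
def reject(stat_sum, n, h, d, kappa, f_crit):
    """stat_sum: sum_i fhat_Z(z_i) * rhohat(z_i)**2
    f_crit:   the (1 - alpha) quantile F*_{1-alpha} of the limiting law."""
    return stat_sum >= (kappa ** d) * f_crit / (n * h ** d)
```

Since the threshold shrinks at rate $(nh^d)^{-1}$ while the statistic stays bounded away from zero under the alternative, the rejection probability tends to one, as the proof shows.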
4.5 The verification of the expression for $\sigma_n^2$

The expression for $\sigma_n^2$ involves some variance and covariance terms. Under the conditions in Theorem 1, the leading terms of those variances and covariances can be obtained. The results are as follows. For $1 \le i, i^* \le k$ and $1 \le j, j^* \le m$, Cases 1–4 hold.

1. $\operatorname{Var}\bigl(u_{j,t}\, k_0\bigl(\frac{Z_t - z_i}{h}\bigr)\bigr) = h^d \kappa^d \sigma_j^2(z_i) f_Z(z_i) + O(h^{d+2})$.
2. $\operatorname{Cov}\bigl(u_{j,t}\, k_0\bigl(\frac{Z_t - z_i}{h}\bigr),\, u_{j^*,t}\, k_0\bigl(\frac{Z_t - z_i}{h}\bigr)\bigr) = h^d \kappa^d c_{jj^*}(z_i) f_Z(z_i) + O(h^{d+2})$.
3. $\operatorname{Cov}\bigl(u_{j,t}\, k_0\bigl(\frac{Z_t - z_i}{h}\bigr),\, u_{j,t}\, k_0\bigl(\frac{Z_t - z_{i^*}}{h}\bigr)\bigr) = O(h^{2d})$.
4. $\operatorname{Cov}\bigl(u_{j,t}\, k_0\bigl(\frac{Z_t - z_i}{h}\bigr),\, u_{j^*,t}\, k_0\bigl(\frac{Z_t - z_{i^*}}{h}\bigr)\bigr) = O(h^{2d})$.

We will only give the proof for Case 1 since the proofs for the other cases are similar. Case 1 follows from
$$\begin{aligned}
\operatorname{Var}\Bigl(u_{j,t}\, k_0\Bigl(\frac{Z_t - z_i}{h}\Bigr)\Bigr) &= E\biggl(E\Bigl(u_{j,t}^2\, k_0\Bigl(\frac{Z_t - z_i}{h}\Bigr)^2 \,\Bigm|\, Z_t\Bigr)\biggr) \\
&= \int \sigma_j^2(z_t)\, k_0\Bigl(\frac{z_t - z_i}{h}\Bigr)^2 f_Z(z_t)\, dz_t \\
&= h^d \int \sigma_j^2(z_i + h\nu)(k_0(\nu))^2 f_Z(z_i + h\nu)\, d\nu \\
&= h^d \int \sigma_j^2(z_i)(k_0(\nu))^2\Bigl(f_Z(z_i) + h\sum_{s=1}^{d} f_s(z_i)\nu_s + O(h^2)\Bigr) d\nu \\
&= h^d \kappa^d \sigma_j^2(z_i) f_Z(z_i) + O(h^{d+2}).
\end{aligned}$$
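Case 1 can also be checked by simulation. The Monte Carlo sketch below is a rough sanity check of ours; the distributional choices ($d = 1$, Gaussian kernel $k_0$ with $\kappa = \int k_0^2 = 1/(2\sqrt\pi)$, $Z_t \sim N(0,1)$, $u_{j,t} \sim N(0,1)$ independent of $Z_t$ so that $\sigma_j^2 \equiv 1$) are illustrative and not from the paper:

```python
import numpy as np

# Monte Carlo check of Case 1 with d = 1 and a standard normal kernel.
rng = np.random.default_rng(0)
n, h, z = 500_000, 0.3, 0.5
kappa = 1.0 / (2.0 * np.sqrt(np.pi))   # kappa = integral of k0(v)^2 dv

Z = rng.standard_normal(n)             # Z_t ~ N(0,1), so f_Z is the N(0,1) pdf
u = rng.standard_normal(n)             # u_{j,t}: mean 0, sigma_j^2(z) = 1
k0 = np.exp(-0.5 * ((Z - z) / h) ** 2) / np.sqrt(2.0 * np.pi)

empirical = np.var(u * k0)             # Var(u_{j,t} k0((Z_t - z_i)/h))
f_z = np.exp(-0.5 * z * z) / np.sqrt(2.0 * np.pi)
theory = h * kappa * f_z               # leading term h^d kappa^d sigma_j^2 f_Z(z_i)
print(empirical, theory)               # close for small h; gap is the O(h^{d+2}) term
```

With $h = 0.3$ the two values agree to within a few percent, the discrepancy being the $O(h^{d+2})$ remainder in Case 1.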
Acknowledgements
This research was supported by the National Science Council of Taiwan under grant NSC 99-2118-M-004-006. The authors would like to thank the reviewers for their careful reading and constructive comments.
References
[1] Taoufik Bouezmarni, Jeroen V. K. Rombouts, and Abderrahim Taamouti. A nonparametric copula based test for conditional independence with applications to Granger causality. Working paper, Departamento de Economía, Universidad Carlos III de Madrid, 2009.
[2] Miguel A. Delgado and Wenceslao González Manteiga. Significance testing in nonparametric regression based on the bootstrap. The Annals of Statistics, 29(5):1469–1507, 2001.
[3] J. P. Florens and Denis Fougere. Noncausality in continuous time. Econometrica, 64(5):1195–1212, 1996.
[4] J. P. Florens and M. Mouchart. A note on noncausality. Econometrica, 50(3):583–591, 1982.
[5] P. G. Hall and C. C. Heyde. Martingale Limit Theory and Its Applications. Academic Press, 1980.
[6] Craig Hiemstra and Jonathan D. Jones. Testing for linear and nonlinear Granger causality in the stock price–volume relation. The Journal of Finance, 49(5):1639–1664, 1994.
[7] Tzee Ming Huang. Testing conditional independence using maximal nonlinear conditional correlation. The Annals of Statistics, 38(4):2047–2091, 2010.
[8] Lexin Li, R. Dennis Cook, and Christopher J. Nachtsheim. Model-free variable selection. Journal of the Royal Statistical Society, Series B: Statistical Methodology, 67(2):285–299, 2005.
[9] Qi Li and Jeffrey Scott Racine. Nonparametric Econometrics: Theory and Practice. Princeton University Press, 2007.
[10] Oliver Bruce Linton and Pedro Gozalo. Conditional independence restrictions: Testing and estimation. Cowles Foundation Discussion Papers 1140, Cowles Foundation, Yale University, 1996.
[11] Efstathios Paparoditis and Dimitris N. Politis. The local bootstrap for kernel estimators under general dependence conditions. Annals of the Institute of Statistical Mathematics, 52(1):139–159, 2000.
[12] Liangjun Su and Halbert White. A consistent characteristic function-based test for conditional independence. Journal of Econometrics, 141:807–834, 2007.
[13] Liangjun Su and Halbert White. A nonparametric Hellinger metric test for conditional independence. Econometric Theory, 24(4):829–864, 2008.
[14] V. A. Volkonskii and Yu. A. Rozanov. Some limit theorems for random functions, I. Theory of Probability and Its Applications, 4(2):178–197, 1959.