
Executive Yuan National Science Council Research Project Final Report

Conditional Independence Tests for Dependent Data

Final report (condensed version)

Project type: individual project
Project number: NSC 99-2118-M-004-006-
Project period: August 1, 2010 to July 31, 2011
Executing institution: Department of Statistics, National Chengchi University
Principal investigator: Tzee-Ming Huang (黃子銘)
Project staff: master's student, part-time assistant: 程毓婷; doctoral student, part-time assistant: 鄭宇翔 (Yu-Hsiang Cheng)

Availability: this project report is publicly accessible.

July 29, 2011


A conditional independence test for dependent data based on maximal conditional correlation

Yu-Hsiang Cheng

Department of Statistics

National Chengchi University

Taipei, Taiwan, ROC

96354501@nccu.edu.tw

Tzee-Ming Huang

Department of Statistics

National Chengchi University

Taipei, Taiwan, ROC

tmhuang@nccu.edu.tw

June 16, 2011

Abstract

In Huang [7], a test of conditional independence based on maximal nonlinear conditional correlation is proposed and the asymptotic distribution of the test statistic under conditional independence is established for IID data. In this paper, we derive the asymptotic distribution of the test statistic under conditional independence for α-mixing data. Simulation results show that the test performs reasonably well for dependent data. We also apply the test to stock index data to test Granger noncausality between returns and trading volume.

Keywords: conditional independence test, α-mixing, maximal nonlinear conditional correlation

2010 Mathematics Subject Classification: 62G10, 62H20

1 Introduction

Testing conditional independence is important in statistics; one interesting application of such testing is variable selection. For instance, consider the following regression problem:
\[
Y = f(Z, X) + \varepsilon, \tag{1}
\]
where ε is independent of (Z, X) and f is a real-valued function. If Y and X are conditionally independent given Z, the variable X can be excluded from the model in (1).

Suppose that X, Y and Z are continuous random vectors of dimensions d_1, d_2 and d respectively. For testing whether X and Y are conditionally independent given Z, most tests in the literature deal with the case where the observations


for (X, Y, Z) are IID. See, for example, Linton and Gozalo [10], Delgado and Manteiga [2], Li, Cook and Nachtsheim [8], Huang [7], etc.

When the observations for (X, Y, Z) are weakly dependent, fewer tests are available in the literature. Su and White [12, 13] developed nonparametric tests based on a weighted Hellinger distance between conditional densities or the difference between conditional characteristic functions. Bouezmarni, Rombouts and Taamouti [1] also proposed a nonparametric test based on the Hellinger distance of copula densities.

In [12], [13] and [1], one motivation for constructing conditional independence tests for dependent data is to test Granger noncausality, which, according to Florens and Mouchart [4] and Florens and Fougère [3], is a form of conditional independence. Specifically, a series {U_t} does not Granger cause a series {V_t} if
\[
V_t \perp (U_{t-1}, U_{t-2}, \ldots, U_{t-p}) \mid (V_{t-1}, V_{t-2}, \ldots, V_{t-p}) \quad\text{for every } p \ge 1,
\]
where ⊥ denotes independence.

In this paper, we consider Huang's test statistic and derive its asymptotic distribution for α-mixing data. In order to measure the conditional association between X and Y given Z, Huang [7] uses a measure called the maximal nonlinear conditional correlation, which is defined as
\[
\sup_{(f,g)\in S_0^*} \mathrm{Corr}\big(f(X,Z),\, g(Y,Z)\,\big|\,Z\big), \tag{2}
\]
where S_0^* is the collection of pairs (f, g) such that E(f^2(X,Z)) < ∞ and E(g^2(Y,Z)) < ∞. Huang's test statistic is a weighted sum of squared estimates of the maximal nonlinear conditional correlation at different evaluation points of the conditioning variable Z. The test statistic also involves certain basis functions used to approximate the f and g in (2). We show that the asymptotic distribution of Huang's test statistic for α-mixing data is the same as that for IID data if the number of evaluation points and the number of basis functions are held constant.

This paper is organized as follows. In Section 2, we review the definition of the maximal nonlinear conditional correlation and certain approximation results given in [7], and state the asymptotic properties of the test statistic that we derive under an α-mixing condition. Some simulation results and an application are given in Section 3. Proofs are given in Section 4.

2 Review and main results

In this section, we review the definition of the maximal nonlinear conditional correlation ρ_1(X,Y|Z), the approximation of ρ_1(X,Y|Z), and the estimator of ρ_1(X,Y|Z = z) proposed in [7]. Then, we consider Huang's test statistic for testing H_0 : ρ_1(X,Y|Z) = 0 and present the asymptotic properties that we derive for it under an α-mixing condition.

2.1 Definition, approximation, and estimation for maximal nonlinear conditional correlation

The maximal nonlinear conditional correlation ρ_1(X,Y|Z) is essentially the maximum of E(f(X,Z)g(Y,Z)|Z) over S_0, where S_0 is the collection of pairs (f, g) that satisfy the following conditions:
\[
\begin{aligned}
E(f^2(X,Z)\mid Z)\, I_{(0,\infty)}\big(E(f^2(X,Z)\mid Z)\big) &= I_{(0,\infty)}\big(E(f^2(X,Z)\mid Z)\big),\\
E(g^2(Y,Z)\mid Z)\, I_{(0,\infty)}\big(E(g^2(Y,Z)\mid Z)\big) &= I_{(0,\infty)}\big(E(g^2(Y,Z)\mid Z)\big),
\end{aligned} \tag{3}
\]
and
\[
E(f(X,Z)\mid Z) = E(g(Y,Z)\mid Z) = 0. \tag{4}
\]
To avoid dealing with the existence of the maximum and the measurability of ρ_1(X,Y|Z), in [7] ρ_1(X,Y|Z) is defined as
\[
\sup_{(f,g)\in S_0} E(f(X,Z)\,g(Y,Z)\mid Z),
\]
where the supremum is defined as
\[
\lim_{n\to\infty} E(\alpha_n(X,Z)\,\beta_n(Y,Z)\mid Z)
\]
for a sequence {(α_n, β_n)} in S_0 that satisfies the following conditions:

(i) The sequence {E(α_n(X,Z)β_n(Y,Z)|Z)} is non-decreasing.

(ii) For every (f, g) ∈ S_0,
\[
E(f(X,Z)\,g(Y,Z)\mid Z) \le \lim_{n\to\infty} E(\alpha_n(X,Z)\,\beta_n(Y,Z)\mid Z).
\]

To approximate
\[
\rho_1(X,Y\mid Z) = \sup_{(f,g)\in S_0} E(f(X,Z)\,g(Y,Z)\mid Z),
\]
we consider S_{0,p,q}: the collection of all (f, g)'s in S_0 such that f and g are in the spans of {φ_{p,j} : 1 ≤ j ≤ p} and {ψ_{q,k} : 1 ≤ k ≤ q} respectively, when Z is given. That is,
\[
f(X,Z) = \sum_{j=1}^{p} a_{p,j}(Z)\,\phi_{p,j}(X)\ \text{ for some } a_{p,j}(Z)\text{'s}
\quad\text{and}\quad
g(Y,Z) = \sum_{k=1}^{q} b_{q,k}(Z)\,\psi_{q,k}(Y)\ \text{ for some } b_{q,k}(Z)\text{'s}.
\]

Suppose that the basis functions φ_{p,i}'s and ψ_{q,j}'s are selected so that there exist basis functions θ_{r,k}'s such that
\[
\lim_{p,r\to\infty}\ \inf_{a(i,k)}\ E\Big(\alpha(X,Z) - \sum_{1\le i\le p,\ 1\le k\le r} a(i,k)\,\phi_{p,i}(X)\,\theta_{r,k}(Z)\Big)^2 = 0 \tag{5}
\]
and
\[
\lim_{q,r\to\infty}\ \inf_{b(j,k)}\ E\Big(\beta(Y,Z) - \sum_{1\le j\le q,\ 1\le k\le r} b(j,k)\,\psi_{q,j}(Y)\,\theta_{r,k}(Z)\Big)^2 = 0 \tag{6}
\]
for every α and β such that E(α^2(X,Z)) and E(β^2(Y,Z)) are finite. Let 𝒳, 𝒴 and 𝒵 be the ranges of X, Y and Z respectively. Suppose that for each (p, q), there exist coefficients a_{p,0,i}'s and b_{q,0,j}'s such that
\[
\sum_{1\le i\le p} a_{p,0,i}\,\phi_{p,i}(x) = 1 = \sum_{1\le j\le q} b_{q,0,j}\,\psi_{q,j}(y) \tag{7}
\]
for every x in 𝒳 and every y in 𝒴. Let
\[
\rho_{p,q}(Z) = \max_{(f,g)\in S_{0,p,q}} E(f(X,Z)\,g(Y,Z)\mid Z).
\]
Then, by Fact 2 in [7], ρ_1(X,Y|Z) can be reasonably approximated by ρ_{p,q}(Z) if p and q are large. The statement of the fact is given below.

FACT 1. (Fact 2 in [7]) Suppose that (5), (6) and (7) hold and {p_n} and {q_n} are sequences of positive integers that tend to ∞ as n → ∞. Then
\[
\lim_{n\to\infty} E\big(\big|\rho_1(X,Y\mid Z) - \rho_{p_n,q_n}(Z)\big|\big) = 0.
\]

A remark.

• It is not difficult to find basis functions that satisfy (5), (6) and (7). If 𝒳, 𝒴 and 𝒵 are bounded regions in R^{d_1}, R^{d_2} and R^d respectively and the Lebesgue densities for (X,Z) and (Y,Z) are bounded, then the φ_{p,i}'s and ψ_{q,j}'s can be taken as B-spline basis functions on multidimensional intervals containing 𝒳 and 𝒴 respectively, and the θ_{r,k}'s can be taken as B-spline basis functions on a multidimensional interval containing 𝒵.
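To make the remark concrete, here is a minimal Python sketch of such a univariate B-spline basis on [0, 1]; products of these functions over coordinates give the multidimensional versions. The clamped, equally spaced knot layout is our illustrative choice, not part of the paper.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(p, degree=3, a=0.0, b=1.0):
    """Return p B-spline basis functions of the given degree on [a, b],
    built on a clamped, equally spaced knot vector (illustrative choice).
    Requires p >= degree + 1."""
    # p basis functions of degree k require p + k + 1 knots
    inner = np.linspace(a, b, p - degree + 1)
    knots = np.r_[[a] * degree, inner, [b] * degree]
    # each basis element of degree k is determined by k + 2 consecutive knots
    return [BSpline.basis_element(knots[i:i + degree + 2], extrapolate=False)
            for i in range(p)]

# e.g., evaluate the four cubic basis functions at x = 0.3
basis = bspline_basis(4)
vals = [float(f(0.3)) for f in basis]
```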

ρ_{p,q}(Z) can be found as follows. First, we look for vectors a_1 = (a_{1,1}(Z), ..., a_{1,p}(Z))^T and b_1 = (b_{1,1}(Z), ..., b_{1,q}(Z))^T such that (a_1, b_1) is the pair (a, b) that maximizes a^T Σ_{φ,ψ,p,q}(Z) b subject to
\[
a^T \Sigma_{\phi,p}(Z)\, a = 1 = b^T \Sigma_{\psi,q}(Z)\, b,
\]
where
\[
\begin{aligned}
\Sigma_{\phi,p}(Z) &= \big(E(\phi_{p,i}(X)\phi_{p,j}(X)\mid Z) - E(\phi_{p,i}(X)\mid Z)\,E(\phi_{p,j}(X)\mid Z)\big)_{p\times p},\\
\Sigma_{\psi,q}(Z) &= \big(E(\psi_{q,i}(Y)\psi_{q,j}(Y)\mid Z) - E(\psi_{q,i}(Y)\mid Z)\,E(\psi_{q,j}(Y)\mid Z)\big)_{q\times q},
\end{aligned}
\]
and
\[
\Sigma_{\phi,\psi,p,q}(Z) = \big(E(\phi_{p,i}(X)\psi_{q,j}(Y)\mid Z) - E(\phi_{p,i}(X)\mid Z)\,E(\psi_{q,j}(Y)\mid Z)\big)_{p\times q}.
\]
Take
\[
f_1(X,Z) = \sum_{j=1}^{p} a_{1,j}(Z)\big(\phi_{p,j}(X) - E(\phi_{p,j}(X)\mid Z)\big)
\quad\text{and}\quad
g_1(Y,Z) = \sum_{k=1}^{q} b_{1,k}(Z)\big(\psi_{q,k}(Y) - E(\psi_{q,k}(Y)\mid Z)\big).
\]
Then, E(f_1(X,Z) g_1(Y,Z) | Z) = ρ_{p,q}(Z).
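Computationally, after whitening the constraints the maximization above reduces to a singular value problem. A minimal Python sketch (an illustration under that reduction, not the authors' code):

```python
import numpy as np
from scipy.linalg import sqrtm

def rho_pq(Sigma_phi, Sigma_psi, Sigma_phi_psi):
    """Maximize a^T S_ab b subject to a^T S_a a = 1 = b^T S_b b.
    With u = S_a^{1/2} a and v = S_b^{1/2} b, this equals the largest
    singular value of S_a^{-1/2} S_ab S_b^{-1/2}."""
    A = np.linalg.inv(sqrtm(Sigma_phi).real)   # S_a^{-1/2}
    B = np.linalg.inv(sqrtm(Sigma_psi).real)   # S_b^{-1/2}
    return np.linalg.svd(A @ Sigma_phi_psi @ B, compute_uv=False)[0]
```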

For z ∈ 𝒵, let Σ̂_{φ,ψ,p,q}(z), Σ̂_{φ,p}(z) and Σ̂_{ψ,q}(z) be the kernel estimators of Σ_{φ,ψ,p,q}(z), Σ_{φ,p}(z) and Σ_{ψ,q}(z) respectively; in other words, every element E(g(X,Y)|Z = z) of these matrices is estimated by
\[
\hat E(g(X,Y)\mid Z = z) = \frac{\sum_{t=1}^{n} g(X_t, Y_t)\, k_0((Z_t - z)/h)}{\sum_{t=1}^{n} k_0((Z_t - z)/h)}, \tag{8}
\]
where k_0 is a kernel function defined on R^d and h > 0. Then, we use ρ̂_{p,q}(z) = max_{a,b} a^T Σ̂_{φ,ψ,p,q}(z) b for estimating ρ_{p,q}(z), where the maximum is taken over all pairs (a, b) satisfying
\[
a^T \hat\Sigma_{\phi,p}(z)\, a = 1 = b^T \hat\Sigma_{\psi,q}(z)\, b.
\]
Henceforth, the estimator ρ̂_{p,q}(z) will be abbreviated as ρ̂(z) for each z in 𝒵.
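A direct Python transcription of the estimator (8) for d-dimensional Z with the product kernel of (S5) (function and variable names are ours):

```python
import numpy as np

def cond_mean_hat(g_vals, Z, z, h, k_star):
    """Kernel estimate of E(g(X,Y) | Z = z) as in (8).
    g_vals: array of g(X_t, Y_t), shape (n,); Z: shape (n, d);
    k_star: univariate kernel applied coordinatewise."""
    w = np.prod(k_star((Z - z) / h), axis=1)  # k0((Z_t - z)/h)
    return np.dot(w, g_vals) / np.sum(w)
```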

2.2 A test for conditional independence and its asymptotic properties

The conditional independence test that we use in this paper is based on ρ̂^2(z) at different z's. Since each ρ̂(z) is determined by the kernel estimators of certain conditional expectations, we first derive their joint asymptotic distribution. Then, we use Σ_{i=1}^{k} f̂_Z(z_i) ρ̂^2(z_i) as our test statistic and establish its consistency and asymptotic distribution. Here the z_i's are selected points in 𝒵 and
\[
\hat f_Z(\cdot) = \frac{\sum_{t=1}^{n} k_0((Z_t - \cdot)/h)}{n h^d}
\]
is the kernel density estimator of f_Z: the Lebesgue pdf of Z. In order to avoid dealing with the boundary bias problem in kernel estimation, we consider a set 𝒵_0 that is contained in the interior of 𝒵 so that points in 𝒵_0 are away from the boundary of 𝒵, and choose the z_i's from 𝒵_0.
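For concreteness, a sketch of f̂_Z and of the statistic above; rho_hat is assumed to be a function computing ρ̂(z), e.g. built from the pieces sketched earlier, and kappa is κ from Lemma 1 below.

```python
import numpy as np

def f_Z_hat(Z, z, h, k_star):
    """Kernel density estimate of f_Z at z (product kernel)."""
    n, d = Z.shape
    w = np.prod(k_star((Z - z) / h), axis=1)
    return np.sum(w) / (n * h**d)

def test_statistic(Z, eval_pts, rho_hat, h, k_star, kappa):
    """(n h^d / kappa^d) * sum_i f_Z_hat(z_i) * rho_hat(z_i)^2, cf. (10)."""
    n, d = Z.shape
    s = sum(f_Z_hat(Z, z, h, k_star) * rho_hat(z) ** 2 for z in eval_pts)
    return n * h**d / kappa**d * s
```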

Our first result concerns the joint asymptotic distribution of kernel estimators of some conditional expectations. In order to describe the assumptions, we first review the definition of the α-mixing coefficients. For a strictly stationary process {U_t}, let F_a^b denote the σ-algebra generated by (U_a, ..., U_b). Then, the α-mixing coefficient at lag s for {U_t} is
\[
\alpha(s) = \sup_{t}\ \sup\big\{\,|P(A\cap B) - P(A)P(B)| : A \in \mathcal F_{0}^{t},\ B \in \mathcal F_{t+s}^{\infty}\,\big\},
\]
and {U_t} is said to be α-mixing if its α-mixing coefficient at lag s tends to 0 as s tends to ∞. Let α(s) denote the α-mixing coefficient at lag s for the process {(X_t, Y_t, Z_t)}. Our assumptions are provided below.

(S0) The basis functions φ_{p,1}, ..., φ_{p,p} and ψ_{q,1}, ..., ψ_{q,q} are bounded and (5), (6) and (7) hold. For the sake of brevity, φ_{p,1}, ..., φ_{p,p} and ψ_{q,1}, ..., ψ_{q,q} will be abbreviated as φ_1, ..., φ_p and ψ_1, ..., ψ_q respectively hereafter.

(S1) {(X_t, Y_t, Z_t) ∈ R^{d_1+d_2+d}, t ≥ 0} is a strictly stationary α-mixing process that satisfies α(τ) = O(τ^{-(1+ε)}), where ε > max(1, d/2), and d_1, d_2 and d denote the dimensions of X_t, Y_t and Z_t respectively.

(S2) There exist 𝒵_0, an open subset of the interior of 𝒵, and μ, a σ-finite measure, such that for every z ∈ 𝒵_0, the conditional distribution of (X, Y) given Z = z has a pdf f(·|z) with respect to μ. Further, f(x, y|z) and f_Z(z) are twice differentiable with respect to z on 𝒵_0.

(S3) There exists a function h on 𝒳 × 𝒴 such that
\[
\sup_{z\in\mathcal Z_0}\max\Big(|f(x,y|z)|,\ \max_{1\le i\le d}\Big|\frac{\partial}{\partial z_i} f(x,y|z)\Big|,\ \max_{1\le i,j\le d}\Big|\frac{\partial^2}{\partial z_i\,\partial z_j} f(x,y|z)\Big|\Big) \le h(x,y)
\]
and ∫ h(x, y) dμ(x, y) < ∞.

(S4) There exist constants c_0 and c_1 such that
\[
\sup_{z\in\mathcal Z_0}\max\Big(|f_Z(z)|,\ \max_{1\le i\le d}\Big|\frac{\partial}{\partial z_i} f_Z(z)\Big|,\ \max_{1\le i,j\le d}\Big|\frac{\partial^2}{\partial z_i\,\partial z_j} f_Z(z)\Big|\Big) \le c_0
\]
and 1/f_Z(z) ≤ c_1 for z ∈ 𝒵_0.

(S5) k* is a kernel function defined on R^1, and k_0 is the product kernel on R^d that satisfies
\[
k_0(v_1, v_2, \ldots, v_d) = k^*(v_1)\,k^*(v_2)\cdots k^*(v_d),
\]
where k* ≥ 0, sup_v k*(v) < ∞, ∫ k*(v) dv = 1, ∫ v k*(v) dv = 0, ∫ v (k*(v))^2 dv = 0, and κ_2 = ∫ v^2 k*(v) dv < ∞.

(S6) As n → ∞, the bandwidth h → 0, n h^d → ∞ and n h^{d+4} → 0.

Under the above conditions, the joint asymptotic distribution of the kernel estimators of conditional expectations can be established, as stated in Lemma 1. The proof of Lemma 1 is provided in Section 4.1.

LEMMA 1. Suppose that Conditions (S1)–(S6) hold. Suppose that g_1, g_2, ..., g_m are bounded functions defined on 𝒳 × 𝒴, and that z_1, ..., z_k are distinct points in 𝒵_0. For i = 1, ..., k, let
\[
\hat g_j(z_i) = \frac{\sum_{t=1}^{n} g_j(X_t, Y_t)\, k_0((Z_t - z_i)/h)}{\sum_{t=1}^{n} k_0((Z_t - z_i)/h)}
\]
be the kernel estimator of g_j^*(z_i) ≡ E(g_j(X,Y)|Z = z_i). Further, let
\[
B_{s,j}(z_i) = \frac{\kappa_2}{2}\big(f_Z(z_i)\, g^*_{j,ss}(z_i) + 2 f_s(z_i)\, g^*_{j,s}(z_i)\big) \tag{9}
\]
and
\[
W_{j,n}(z_i) = \sqrt{n h^d}\,\Big(\hat g_j(z_i) - g^*_j(z_i) - h^2 \sum_{s=1}^{d} B_{s,j}(z_i)/f_Z(z_i)\Big)
\]
for 1 ≤ i ≤ k and 1 ≤ j ≤ m, where g^*_{j,s} and g^*_{j,ss} denote the first and second partial derivatives of g^*_j with respect to the s-th component respectively, and f_s denotes the first partial derivative of f_Z with respect to the s-th component. Let
\[
u_{j,t} = g_j(X_t, Y_t) - g^*_j(Z_t),\qquad
c_{jj^*}(z_i) = E(u_{j,1}\, u_{j^*,1}\mid Z_1 = z_i),\qquad
\sigma_j^2(z_i) = E(u_{j,1}^2\mid Z_1 = z_i),
\]
and
\[
W_n = (W_{1,n}(z_1), \ldots, W_{1,n}(z_k), \ldots, W_{m,n}(z_1), \ldots, W_{m,n}(z_k))^T.
\]
Then, W_n converges in distribution to a random vector
\[
(Z^*_{1,1}, \ldots, Z^*_{k,1}, \ldots, Z^*_{1,m}, \ldots, Z^*_{k,m})^T \equiv Z^*,
\]
where Z^* is multivariate normal with mean 0 and, for 1 ≤ i, i^* ≤ k and 1 ≤ j, j^* ≤ m,
\[
\mathrm{Cov}(Z^*_{i,j}, Z^*_{i^*,j^*}) =
\begin{cases}
\kappa^d\,\sigma_j^2(z_i)/f_Z(z_i) & \text{if } i = i^* \text{ and } j = j^*;\\
\kappa^d\,c_{jj^*}(z_i)/f_Z(z_i) & \text{if } i = i^* \text{ and } j \ne j^*;\\
0 & \text{if } i \ne i^*,
\end{cases}
\]
where κ = ∫ (k*(v))^2 dv.

Now, suppose that the basis functions φ_l's and ψ_{m*}'s are linearly independent. For the sake of convenience, for z ∈ {z_1, ..., z_k}, we apply certain linear transformations to the φ_l's and ψ_{m*}'s to obtain new basis functions φ*_l's and ψ*_{m*}'s (ρ̂(z) remains unchanged under such transformations). Take g_1(X,Y), ..., g_m(X,Y) to be the functions φ*_l(X)φ*_{l'}(X), φ*_l(X)ψ*_{m*}(Y) and ψ*_{m*}(Y)ψ*_{m'}(Y), where 1 ≤ l ≤ l' ≤ p and 1 ≤ m* ≤ m' ≤ q. Then, the consistency of ρ̂(z) can be established and we have Theorem 1. The proof of Theorem 1 is provided in Section 4.2.

THEOREM 1. Suppose that Conditions (S0)–(S6) hold and the basis functions φ_l's and ψ_{m*}'s are linearly independent. Suppose z_1, ..., z_k are distinct points in 𝒵_0. Then,
\[
\sum_{i=1}^{k}\big(\hat\rho^2(z_i) - \rho_{p,q}^2(z_i)\big)^2 = O_p\Big(\frac{1}{n h^d} + h^4\Big)
\]
and
\[
\Big(\sum_{i=1}^{k}\hat f_Z(z_i)\,\hat\rho^2(z_i) - \sum_{i=1}^{k} f_Z(z_i)\,\rho_{p,q}^2(z_i)\Big)^2 = O_p\Big(\frac{1}{n h^d} + h^4\Big).
\]

The following theorem states the approximate distribution of the statistic Σ_{i=1}^{k} f̂_Z(z_i) ρ̂^2(z_i) when X and Y are conditionally independent given Z.

THEOREM 2. Suppose that the conditions in Theorem 1 hold and X and Y are conditionally independent given Z. Then,
\[
\frac{n h^d}{\kappa^d}\sum_{i=1}^{k}\hat f_Z(z_i)\,\hat\rho^2(z_i)
\quad\text{converges in distribution to}\quad
\sum_{i=1}^{k}\lambda_i
\]
as n tends to ∞, where the λ_i's are IID and have the same distribution as the largest eigenvalue of a matrix CC^T, where C is a (p−1)×(q−1) matrix whose elements are IID N(0, 1).

The proof of Theorem 2 is provided in Section 4.3. Theorem 2 is similar to Theorem 3.2 given in [7]. The main difference between the two is that Theorem 2 can be applied to α-mixing data. In addition, p and q are held fixed in Theorem 2, while they are allowed to depend on n and tend to ∞ as n tends to ∞ in Theorem 3.2 in [7].

According to Theorem 2, a test that rejects H_0 if
\[
\frac{n h^d}{\kappa^d}\sum_{i=1}^{k}\hat f_Z(z_i)\,\hat\rho^2(z_i) \ge F^*_{1-\alpha} \tag{10}
\]
is of approximate level α, where F* is the distribution function of Σ_{i=1}^{k} λ_i and F*_{1−α} is the 1−α quantile of F*. Theorem 3 states that the test with rejection region in (10) is consistent if p and q are sufficiently large and one of the ρ(z_i)'s is positive. The proof of this theorem is provided in Section 4.4.

THEOREM 3. Suppose that ρ(z_i) > 0 for some z_i, and that p and q are sufficiently large so that ρ_{p,q}(z_i) > 0. Then, for 0 < α < 1, the probability that (10) holds tends to 1 as n → ∞.
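Since F* has no simple closed form, its 1−α quantile can be approximated by Monte Carlo directly from the description in Theorem 2. A minimal Python sketch (the number of draws B and the vectorized SVD are our choices):

```python
import numpy as np

def critical_value(p, q, k, alpha=0.05, B=50000, seed=0):
    """Monte Carlo approximation of F*_{1-alpha}: the 1-alpha quantile of
    sum_{i=1}^k lambda_i, where each lambda_i is the largest eigenvalue of
    C C^T with C a (p-1) x (q-1) matrix of IID N(0,1) entries (Theorem 2)."""
    rng = np.random.default_rng(seed)
    C = rng.standard_normal((B, k, p - 1, q - 1))
    # largest eigenvalue of C C^T = squared largest singular value of C
    lam = np.linalg.svd(C, compute_uv=False)[..., 0] ** 2
    return np.quantile(lam.sum(axis=1), 1 - alpha)
```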

3 Simulation studies and application to S&P 500 index data

3.1 Simulation studies

In this section, we conduct several simulation studies to illustrate the performance of our test. The data generating processes, labeled Data1 – Data13, are described below. In order to make our simulation results comparable with those of the test proposed by Su and White [13], some of our data generating processes (Data1 – Data10) are the same as theirs. Throughout the description of Data1 – Data10, (ε_{1,t}, ε_{2,t}, ε_{3,t}) are IID N(0, I_3).

Data1: (X_t, Y_t, Z_t) = (ε_{1,t}, ε_{2,t}, ε_{3,t}).

Data2: X_t = 0.5 X_{t-1} + ε_{1,t}, Y_t = 0.5 Y_{t-1} + ε_{2,t} and Z_t = X_{t-1}.

Data3: X_t = ε_{1,t} (0.01 + 0.5 X_{t-1}^2)^{1/2}, Y_t = 0.5 Y_{t-1} + ε_{2,t} and Z_t = X_{t-1}.

Data4: X_t = ε_{1,t} h_{1,t}^{1/2}, Y_t = ε_{2,t} h_{2,t}^{1/2}, Z_t = X_{t-1}, h_{1,t} = 0.01 + 0.9 h_{1,t-1} + 0.05 X_{t-1}^2 and h_{2,t} = 0.01 + 0.9 h_{2,t-1} + 0.05 Y_{t-1}^2.

Data5: X_t = 0.5 X_{t-1} + 0.5 Y_t + ε_{1,t}, Y_t = 0.5 Y_{t-1} + ε_{2,t} and Z_t = X_{t-1}.

Data6: X_t = 0.5 X_{t-1} + 0.5 Y_t^2 + ε_{1,t}, Y_t = 0.5 Y_{t-1} + ε_{2,t} and Z_t = X_{t-1}.

Data7: X_t = 0.5 X_{t-1} Y_t + ε_{1,t}, Y_t = 0.5 Y_{t-1} + ε_{2,t} and Z_t = X_{t-1}.

Data8: X_t = 0.5 X_{t-1} + 0.5 Y_t ε_{1,t}, Y_t = 0.5 Y_{t-1} + ε_{2,t} and Z_t = X_{t-1}.

Data9: X_t = ε_{1,t} (0.01 + 0.5 X_{t-1}^2 + 0.25 Y_t^2)^{1/2}, Y_t = 0.5 Y_{t-1} + ε_{2,t} and Z_t = X_{t-1}.

Data10: X_t = ε_{1,t} h_{1,t}^{1/2}, Y_t = ε_{2,t} h_{2,t}^{1/2}, Z_t = X_{t-1}, h_{1,t} = 0.01 + 0.1 h_{1,t-1} + 0.4 X_{t-1}^2 + 0.5 Y_t^2 and h_{2,t} = 0.01 + 0.9 h_{2,t-1} + 0.5 Y_t^2.

Data11: (X_t, Y_t, Z_t) = (ε_{1,t}, ε_{2,t}, ε_{3,t}), where (ε_{1,t}, ε_{2,t}, ε_{3,t}) are IID LN(0, I_3).

Data12: X_t = ε_{1,t} ε_{1,t-1}, Y_t = ε_{2,t} ε_{2,t-1} and Z_t = X_{t-1}, where (ε_{1,t}, ε_{2,t}) are IID LN(0, I_2).

Data13: X_t = ε_{1,t} ε_{2,t-1}, Y_t = ε_{1,t}^2 ε_{2,t-1} and Z_t = ε_{2,t-1}, where (ε_{1,t}, ε_{2,t}) are IID LN(0, I_2).

Here, Data1 – Data4, Data11 and Data12 are used for examining the level of the test, and Data5 – Data10 and Data13 are used for checking the power.
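For example, one of the null cases, Data2, can be generated as follows (a sketch; the burn-in length is our choice):

```python
import numpy as np

def simulate_data2(n, burn=100, seed=0):
    """Data2: X_t = 0.5 X_{t-1} + e1_t, Y_t = 0.5 Y_{t-1} + e2_t,
    Z_t = X_{t-1}, with (e1_t, e2_t) IID N(0, I)."""
    rng = np.random.default_rng(seed)
    m = n + burn
    e = rng.standard_normal((m + 1, 2))
    X = np.zeros(m + 1)
    Y = np.zeros(m + 1)
    for t in range(1, m + 1):
        X[t] = 0.5 * X[t - 1] + e[t, 0]
        Y[t] = 0.5 * Y[t - 1] + e[t, 1]
    # return (X_t, Y_t, Z_t = X_{t-1}) after discarding the burn-in
    return X[burn + 1:], Y[burn + 1:], X[burn:-1]
```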

3.1.1 Simulation studies based on the asymptotic distribution of the test statistic

We first apply our test using the asymptotic distribution of the test statistic.

Parameter set-up: In order to apply our test, certain parameters need to be specified, including the kernel function k*, the kernel bandwidth h and the basis functions. For the sake of simplicity, in all the simulation experiments we take the kernel bandwidth h to be c n^{-0.25}, where n is the sample size and c ∈ {0.5, 1, 1.5, 2}; we use the triangular kernel
\[
k^*(x) = \begin{cases} 1 - x & \text{if } 0 \le x \le 1;\\ x + 1 & \text{if } -1 \le x < 0.\end{cases}
\]
In addition, the basis functions φ*_1, ..., φ*_p and ψ*_1, ..., ψ*_q are selected in the following manner. For i = 1, ..., p and j = 1, ..., q, let
\[
\phi_i(x) = \begin{cases} 1 & \text{if } \tfrac{i-1}{p} \le x < \tfrac{i}{p};\\ 0 & \text{otherwise}\end{cases} \tag{11}
\]


and
\[
\psi_j(y) = \begin{cases} 1 & \text{if } \tfrac{j-1}{q} \le y < \tfrac{j}{q};\\ 0 & \text{otherwise},\end{cases} \tag{12}
\]
where p = q = 4. Since the basis functions are defined on [0, 1], we transform the data (X_t, Y_t, Z_t)_{t=1}^n into (F_1(X_t), F_2(Y_t), F_3(Z_t))_{t=1}^n before using the test, where F_1, F_2 and F_3 denote the empirical CDFs of {X_t}_{t=1}^n, {Y_t}_{t=1}^n and {Z_t}_{t=1}^n respectively. For the choice of the evaluation points, we take z_1 = 0.78 n^{-0.25} ≡ h_0 and z_i = z_{i-1} + 2 h_0 for i ≥ 2 as long as z_i ≤ 1 − h_0.
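A sketch of this set-up, covering the histogram basis (11)–(12), the componentwise empirical-CDF transform, and the evaluation-point rule (function names are ours):

```python
import numpy as np

def ecdf_transform(x):
    """Componentwise empirical-CDF transform: F(x_t) = rank(x_t)/n in (0, 1]."""
    ranks = np.argsort(np.argsort(x))
    return (ranks + 1) / len(x)

def indicator_basis(u, p):
    """Histogram basis of (11)-(12): phi_i(u) = 1 on [(i-1)/p, i/p)."""
    idx = np.minimum((np.asarray(u) * p).astype(int), p - 1)
    return np.eye(p)[idx]  # shape (n, p), one-hot rows

def evaluation_points(n):
    """z_1 = h0 = 0.78 n^{-1/4}; z_i = z_{i-1} + 2 h0 while z_i <= 1 - h0."""
    h0 = 0.78 * n ** -0.25
    return np.arange(h0, 1 - h0 + 1e-12, 2 * h0)
```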

Table 1 shows that the levels of the test are less than 0.05 for c = 0.5 and c = 1 and the powers of the test are larger for larger c’s. It seems that when c = 1, the levels of the test are close to 0.05 and the power performance is fine.

                        n = 500                         n = 1000
         c = 0.5  c = 1   c = 1.5  c = 2     c = 0.5  c = 1   c = 1.5  c = 2
Data1    0.030    0.039   0.053    0.071     0.043    0.048   0.057    0.074
Data2    0.030    0.041   0.058    0.074     0.033    0.048   0.060    0.069
Data3    0.032    0.042   0.055    0.080     0.038    0.049   0.055    0.070
Data4    0.038    0.044   0.057    0.075     0.042    0.048   0.057    0.066
Data5    0.951    1       1        1         1        1       1        1
Data6    0.898    1       1        1         0.997    1       1        1
Data7    0.918    1       1        1         0.985    1       1        1
Data8    0.995    1       1        1         1        1       1        1
Data9    0.725    0.991   1        1         0.993    1       1        1
Data10   0.374    0.817   0.959    0.986     0.819    0.996   1        1
Data11   0.036    0.050   0.062    0.079     0.035    0.042   0.049    0.059
Data12   0.036    0.051   0.055    0.072     0.041    0.041   0.053    0.065
Data13   1        1       1        1         1        1       1        1

Table 1: Power results for different c's when n = 500 and n = 1000

3.1.2 Simulation studies based on the local bootstrap

The test based on the asymptotic distribution of the test statistic does not work well for small sample sizes. Figure 1 shows that the distribution of the test statistic and the asymptotic distribution are quite different for Data11 when n = 100. For Data1 – Data4 and Data12, we find similar patterns. When n = 200, the difference between the distribution of the test statistic and the asymptotic distribution becomes smaller but is still visible.

[Figure 1: Exact distribution (solid line) versus asymptotic distribution (dashed line) of the test statistic with different bandwidth choices (h = c n^{-1/4}).]

To apply our test for small sample sizes, we consider the local bootstrap procedure proposed by Paparoditis and Politis [11], which is described below. For a given sample {(X_t, Y_t, Z_t)}_{t=1}^n, a local bootstrap sample {(X*_t, Y*_t, Z*_t)}_{t=1}^n is generated according to the following steps.

(a) For 1 ≤ t ≤ n, we draw Z*_t from the empirical cumulative distribution function F̂_Z, where
\[
\hat F_Z(z) = \frac{1}{n}\sum_{t=1}^{n} I_{(-\infty,\, Z_t]}(z).
\]

(b) For 1 ≤ t ≤ n, we draw X*_t and Y*_t independently from the empirical cumulative distribution functions F̂_{X|Z=Z*_t} and F̂_{Y|Z=Z*_t} respectively, where
\[
\hat F_{X\mid Z=Z^*_t}(x) = \frac{\sum_{s=1}^{n} k^*((Z^*_t - Z_s)/b)\, I_{(-\infty,\, X_s]}(x)}{\sum_{s=1}^{n} k^*((Z^*_t - Z_s)/b)}
\quad\text{and}\quad
\hat F_{Y\mid Z=Z^*_t}(y) = \frac{\sum_{s=1}^{n} k^*((Z^*_t - Z_s)/b)\, I_{(-\infty,\, Y_s]}(y)}{\sum_{s=1}^{n} k^*((Z^*_t - Z_s)/b)}.
\]
Here, the bandwidth b is taken to be n^{-0.2} and the kernel function k* is the probability density function of N(0, 1).

In order to determine the rejection region for a given sample, we repeat the above procedure to obtain bootstrap resamples and compute the test statistic n h^d κ^{-d} Σ_{i=1}^{k} f̂_Z(z_i) ρ̂^2(z_i) for the original sample and for each local bootstrap resample. For a given level α, if the test statistic based on the given sample is larger than the (1−α) quantile of the test statistics computed from the local bootstrap resamples, we reject the conditional independence hypothesis at level α. The purpose of using the local bootstrap procedure is to generate a resample {(X*_t, Y*_t, Z*_t)}_{t=1}^n such that the distribution of Z*, the conditional distribution of X* given Z* = z and that of Y* given Z* = z are close to the distribution of Z, the conditional distribution of X given Z = z and that of Y given Z = z respectively. In addition, since X*_t and Y*_t are generated independently given Z*_t = z, they are conditionally independent given Z*_t = z, irrespective of whether or not X and Y are conditionally independent given Z.

In these simulation studies, we choose the basis functions in (11) and (12) with p = q = 5. The evaluation points are {0.2, 0.4, 0.6, 0.8}, and the kernel bandwidth h is taken to be c n^{-0.25}, where n is the sample size and c ∈ {0.5, 1, 1.5, 2}.

Finally, we present a few experimental results for our test (Test 1) and Su and White's test (Test 2). For Test 2, we run the simulations for Data11 – Data13 with the bandwidth h_n = c* n^{-1/8.5}, where c* = 1 or 2. Each power estimate is based on 3000 repetitions, with 1000 local bootstrap resamples used in each repetition. For the sake of comparison, we also list some power estimates for Test 2 for Data1 – Data10, which are taken directly from [13]; they use 250 repetitions with 200 local bootstrap resamples for each repetition.

Tables 2 and 3 indicate the level and power estimates for Test 1 and Test 2 at significance level 5% when the sample sizes are 100 and 200 respectively.

                Data1   Data2   Data3   Data4   Data5   Data6   Data7
Test 2, c*=1    0.096   0.060   0.048   0.072   0.668   0.756   0.388
Test 2, c*=2    0.072   0.036   0.072   0.048   0.952   0.944   0.576
Test 1, c=0.5   0.045   0.061   0.046   0.062   0.525   0.479   0.265
Test 1, c=1     0.046   0.050   0.050   0.047   0.746   0.717   0.400
Test 1, c=1.5   0.040   0.052   0.056   0.055   0.814   0.779   0.329
Test 1, c=2     0.041   0.050   0.053   0.062   0.852   0.793   0.218

                Data8   Data9   Data10  Data11  Data12  Data13
Test 2, c*=1    0.860   0.828   0.680   0.034   0.043   0.589
Test 2, c*=2    0.940   0.988   0.912   0.022   0.022   0.859
Test 1, c=0.5   0.692   0.357   0.195   0.058   0.050   1
Test 1, c=1     0.873   0.566   0.320   0.049   0.048   1
Test 1, c=1.5   0.889   0.618   0.341   0.049   0.041   1
Test 1, c=2     0.860   0.631   0.348   0.046   0.045   1

Table 2: Power comparison between Tests 1 and 2 when n = 100


                Data1   Data2   Data3   Data4   Data5   Data6   Data7
Test 2, c*=1    0.064   0.052   0.080   0.080   0.900   0.960   0.596
Test 2, c*=2    0.044   0.060   0.056   0.048   1       1       0.864
Test 1, c=0.5   0.040   0.061   0.036   0.055   0.827   0.830   0.488
Test 1, c=1     0.049   0.051   0.057   0.054   0.982   0.983   0.831
Test 1, c=1.5   0.046   0.048   0.049   0.053   0.995   0.989   0.827
Test 1, c=2     0.045   0.045   0.047   0.057   0.997   0.995   0.735

                Data8   Data9   Data10  Data11  Data12  Data13
Test 2, c*=1    0.992   0.968   0.880   0.031   0.036   0.347
Test 2, c*=2    1       1       0.996   0.025   0.032   0.872
Test 1, c=0.5   0.988   0.730   0.392   0.062   0.062   1
Test 1, c=1     1       0.947   0.679   0.048   0.047   1
Test 1, c=1.5   1       0.968   0.738   0.051   0.043   1
Test 1, c=2     1       0.971   0.745   0.058   0.037   1

Table 3: Power comparison between Tests 1 and 2 when n = 200

3.2 Application to S&P 500 index data

In this section, we apply the linear Granger causality test (hereafter denoted by Test LIN) and our conditional independence test (Test 1) to examine the interaction between returns and volume for S&P 500 index data at a one-day lag. There are 2514 observations of daily index returns and trading volume from January 2000 to December 2009, taken from Yahoo Finance. Here, the return for day t is defined as
\[
R_t = 100 \log\Big(\frac{P_t}{P_{t-1}}\Big),
\]
where P_t is the index value for day t. Moreover, the trading volume for day t (in dollars), denoted by V_t, is transformed into
\[
V^*_t = \log\Big(\frac{V_t}{V_{t-1}}\Big).
\]
The above transformations are commonly used in the analysis of financial data; see, for example, Hiemstra and Jones [6] and [1]. The augmented Dickey–Fuller test indicates that the series {R_t} and {V*_t} are stationary.
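A sketch of these transformations applied to raw price and volume series (array names are ours):

```python
import numpy as np

def returns_and_volume_changes(P, V):
    """R_t = 100 log(P_t / P_{t-1}); V*_t = log(V_t / V_{t-1})."""
    R = 100 * np.diff(np.log(np.asarray(P, dtype=float)))
    V_star = np.diff(np.log(np.asarray(V, dtype=float)))
    return R, V_star
```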

In order to examine whether {R_t} is useful for predicting {V*_t}, we consider the effects up to lag 1. Specifically, we test
\[
H_0 : V^*_t \perp R_{t-1} \mid V^*_{t-1} \tag{13}
\]
using Test 1. For Test LIN, it is assumed that
\[
E(V^*_t \mid R_{t-1}, V^*_{t-1}) = a_1 R_{t-1} + b_1 V^*_{t-1}
\]
and the null hypothesis is
\[
H_0 : a_1 = 0. \tag{14}
\]
We use the notation R_{t−1} ⇏ V*_t to denote the relation expressed in (13) or (14). The notation V*_{t−1} ⇏ R_t is defined analogously.

The p-values for Test LIN and Test 1 are provided in Table 4. For Test 1, we use the same parameter set-up as in Section 3.1.1 and find that both the return-to-volume and volume-to-return relationships are significant at the 5% level. However, for Test LIN, the volume-to-return relationship is not significant. These findings are consistent with the results obtained in [6] and [1].

H0          R_{t−1} ⇏ V*_t    V*_{t−1} ⇏ R_t
Test LIN    0.000             0.804
Test 1      0.001             0.032

Table 4: p-values for Test LIN and Test 1 for testing the relationship between returns and volume changes

To illustrate the implementation of our test for the d > 1 case, we also apply the test to test
\[
H_0 : V^*_t \perp (R_{t-1}, R_{t-2}) \mid (V^*_{t-1}, V^*_{t-2}) \tag{15}
\]
and
\[
H_0 : R_t \perp (V^*_{t-1}, V^*_{t-2}) \mid (R_{t-1}, R_{t-2}). \tag{16}
\]
The empirical CDF transforms are applied componentwise. For instance, we transform (V*_{t−1}, V*_{t−2})_{t=4}^n into (F_1(V*_{t−1}), F_2(V*_{t−2}))_{t=4}^n, where n = 2512 and F_i is the empirical CDF of V*_{t−i} for i = 1, 2. For the basis functions, we use 4 basis functions φ_1, ..., φ_4 on [0, 1] and 4 basis functions ψ_1, ..., ψ_4 on [0, 1]^2, where φ_1, ..., φ_4 are given in (11) with p = 4, ψ_1(y_1, y_2) = I_{[0,0.5)}(y_1) I_{[0,0.5)}(y_2), ψ_2(y_1, y_2) = I_{[0,0.5)}(y_1) I_{[0.5,1)}(y_2), ψ_3(y_1, y_2) = I_{[0.5,1)}(y_1) I_{[0,0.5)}(y_2) and ψ_4 = 1 − ψ_1 − ψ_2 − ψ_3. Here I_A(·) denotes the indicator function of A. In addition, the kernel bandwidth is c n^{-1/(d+δ)} with c = 1.4 and δ = 2.4. The evaluation points are all the points in S^2_{h_0}, where S_{h_0} = {(2k − 1) h_0 : k is an integer} ∩ [h_0, 1 − h_0] and h_0 = 0.78 n^{-1/(d+δ)}. Here c and δ are selected from certain candidate values so that the levels of the test are close to 0.05 when the data are IID U(0, 1). The p-values for (15) and (16) are 0.017 and 0.370 respectively.
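A sketch of this two-dimensional set-up, i.e. the quadrant-indicator basis ψ_1, ..., ψ_4 and the evaluation grid S^2_{h_0} (function names are ours):

```python
import numpy as np

def psi_basis_2d(y1, y2):
    """The four quadrant indicators on [0,1]^2 used as psi_1, ..., psi_4."""
    lo1, lo2 = y1 < 0.5, y2 < 0.5
    return np.stack([lo1 & lo2, lo1 & ~lo2, ~lo1 & lo2, ~lo1 & ~lo2],
                    axis=-1).astype(float)

def eval_grid(n, d=2, delta=2.4):
    """All points of S_{h0}^2, where S_{h0} = {(2k-1) h0} on [h0, 1-h0]
    and h0 = 0.78 n^{-1/(d+delta)}."""
    h0 = 0.78 * n ** (-1.0 / (d + delta))
    s = np.arange(h0, 1 - h0 + 1e-12, 2 * h0)
    return np.array([(a, b) for a in s for b in s])
```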

Some remarks on the implementation of the test.

• It is recommended to choose the evaluation points so that any two evaluation points z_i and z_j are at least 2h apart (in each component) when a compact kernel supported on [−1, 1]^d is used. In that case, ρ̂(z_i) and ρ̂(z_j) are independent for IID data, which makes the distribution of the test statistic close to the derived asymptotic distribution. Since n h^d → ∞, h cannot be too small, which implies that the number of evaluation points cannot be too large.

• We apply empirical CDF transforms to our data so that the distribution of each component of X, Y and Z is supported on [0, 1]. The transforms are data dependent, and it is not clear whether the transformed data can be treated as if they were transformed by the true underlying CDF. The simulation results are fine, but further investigation is needed.

4 Proofs

In this section, we give proofs for Theorems 1 – 3 and Lemma 1. Before giving the proofs, we first define and recall some notation. Recall that k* is a kernel on R^1 and k_0 is the product kernel on R^d defined by
\[
k_0(v_1, v_2, \ldots, v_d) = k^*(v_1)\,k^*(v_2)\cdots k^*(v_d),
\]
and that κ = ∫ (k*(v))^2 dv and κ_2 = ∫ v^2 k*(v) dv. For a (p+q)×(p+q) matrix V_0, let g_{1,1}(V_0), g_{1,2}(V_0), g_{2,1}(V_0) and g_{2,2}(V_0) denote the matrices of dimensions p×p, p×q, q×p and q×q respectively such that
\[
V_0 = \begin{pmatrix} g_{1,1}(V_0) & g_{1,2}(V_0)\\ g_{2,1}(V_0) & g_{2,2}(V_0) \end{pmatrix}.
\]

4.1 Proof of Lemma 1

For simplicity, we prove the lemma only for the case where m = 2 and k = 2. For t = 1, 2, ..., n, i = 1, 2 and j = 1, 2, let
\[
\hat\eta_{j,1}(z_i) = (n h^d)^{-1}\sum_{t=1}^{n}\big(g^*_j(Z_t) - g^*_j(z_i)\big)\, k_0\Big(\frac{Z_t - z_i}{h}\Big),
\qquad
\hat\eta_{j,2}(z_i) = (n h^d)^{-1}\sum_{t=1}^{n} u_{j,t}\, k_0\Big(\frac{Z_t - z_i}{h}\Big),
\]
and η̂_j(z_i) = η̂_{j,1}(z_i) + η̂_{j,2}(z_i). Then, ĝ_j(z_i) − g*_j(z_i) = η̂_j(z_i)/f̂_Z(z_i), where f̂_Z(z_i) = (1/(n h^d)) Σ_{t=1}^n k_0((Z_t − z_i)/h). We can complete the proof using the following results (A1)–(A3):

(A1) Suppose that the conditions in Lemma 1 hold. Then, for 1 ≤ i, j ≤ 2,
\[
\hat\eta_{j,1}(z_i) = h^2\sum_{s=1}^{d} B_{s,j}(z_i) + o_p\big(h^3 + (n h^d)^{-1/2}\big).
\]

(A2) Suppose that the conditions in Lemma 1 hold. Then,
\[
Z^*_n \equiv \sqrt{n h^d}\,\big(\hat\eta_{1,2}(z_1),\ \hat\eta_{2,2}(z_1),\ \hat\eta_{1,2}(z_2),\ \hat\eta_{2,2}(z_2)\big)^T \xrightarrow{D} Z,
\]
where the distribution of Z is N(0, Σ) and
\[
\Sigma = \begin{pmatrix}
\kappa^d\sigma_1^2(z_1) f_Z(z_1) & \kappa^d c_{12}(z_1) f_Z(z_1) & 0 & 0\\
\kappa^d c_{12}(z_1) f_Z(z_1) & \kappa^d\sigma_2^2(z_1) f_Z(z_1) & 0 & 0\\
0 & 0 & \kappa^d\sigma_1^2(z_2) f_Z(z_2) & \kappa^d c_{12}(z_2) f_Z(z_2)\\
0 & 0 & \kappa^d c_{12}(z_2) f_Z(z_2) & \kappa^d\sigma_2^2(z_2) f_Z(z_2)
\end{pmatrix}.
\]

(A3) Suppose that (X_{n1}, X_{n2}, ..., X_{nk})^T converges in distribution to (Y_1, Y_2, ..., Y_k)^T and (Z_{n1}, Z_{n2}, ..., Z_{nk})^T converges in distribution to (c_1, c_2, ..., c_k)^T, where c_1, c_2, ..., c_k are constants. Then,
\[
(X_{n1} Z_{n1},\ X_{n2} Z_{n2},\ \ldots,\ X_{nk} Z_{nk})^T \xrightarrow{D} (c_1 Y_1,\ c_2 Y_2,\ \ldots,\ c_k Y_k)^T.
\]

From (A1), (A2) and the assumption that n h^{d+4} → 0, we have
\[
\sqrt{n h^d}
\begin{pmatrix}
\hat\eta_1(z_1) - h^2\sum_{s=1}^{d} B_{s,1}(z_1)\\
\hat\eta_2(z_1) - h^2\sum_{s=1}^{d} B_{s,2}(z_1)\\
\hat\eta_1(z_2) - h^2\sum_{s=1}^{d} B_{s,1}(z_2)\\
\hat\eta_2(z_2) - h^2\sum_{s=1}^{d} B_{s,2}(z_2)
\end{pmatrix}
\sim Z + (n h^d)^{1/2}\, o_p\Big(h^3 + \frac{1}{\sqrt{n h^d}}\Big) \xrightarrow{D} Z,
\]
where A ∼ B means that the distributions of A and B are the same. Applying (A3), we have Lemma 1.

The proofs of (A1)–(A3) are given below.

• Proof of (A1). Note that
\[
\begin{aligned}
E(\hat\eta_{j,1}(z_i)) &= \frac{1}{h^d}\int \big(g^*_j(z_t) - g^*_j(z_i)\big)\, k_0\Big(\frac{z_t - z_i}{h}\Big) f_Z(z_t)\, dz_t\\
&= \int \big(g^*_j(z_i + h\nu) - g^*_j(z_i)\big)\, k_0(\nu)\, f_Z(z_i + h\nu)\, d\nu, \qquad \nu = (\nu_1, \ldots, \nu_d),\\
&= \int h\sum_{s=1}^{d} g^*_{j,s}(z_i)\,\nu_s\, f_Z(z_i)\, k_0(\nu)\, d\nu
+ \int h^2\Big(\sum_{s=1}^{d} g^*_{j,s}(z_i)\,\nu_s\Big)\Big(\sum_{s=1}^{d} f_s(z_i)\,\nu_s\Big) k_0(\nu)\, d\nu\\
&\qquad + \frac{1}{2}\int h^2 f_Z(z_i)\sum_{s=1}^{d}\sum_{s^*=1}^{d} g^*_{j,ss^*}(z_i)\,\nu_s\nu_{s^*}\, k_0(\nu)\, d\nu + O(h^3)\\
&= \frac{h^2\kappa_2}{2}\sum_{s=1}^{d}\big(f_Z(z_i)\, g^*_{j,ss}(z_i) + 2 f_s(z_i)\, g^*_{j,s}(z_i)\big) + O(h^3)
= h^2\sum_{s=1}^{d} B_{s,j}(z_i) + O(h^3).
\end{aligned}
\]
Let K_{i,j,t} = h^{-d}(g^*_j(Z_t) - g^*_j(z_i))\, k_0((Z_t - z_i)/h). Then, we have
\[
\mathrm{Var}(\hat\eta_{j,1}(z_i)) = \frac{1}{n^2}\Big(\sum_{t=1}^{n}\mathrm{Var}(K_{i,j,t}) + \sum_{t=1}^{n}\sum_{s=1,\,s\ne t}^{n}\mathrm{Cov}(K_{i,j,t}, K_{i,j,s})\Big).
\]
Since
\[
\begin{aligned}
\mathrm{Var}(K_{i,j,t}) &= E(K_{i,j,t}^2) - (E(K_{i,j,t}))^2\\
&= \frac{1}{h^d}\int\big(g^*_j(z_i + h\nu) - g^*_j(z_i)\big)^2 (k_0(\nu))^2 f_Z(z_i + h\nu)\, d\nu
- \Big(f_Z(z_i)\, h\int\sum_{s=1}^{d} g^*_{j,s}(z_i)\,\nu_s\, k_0(\nu)\, d\nu + O(h^2)\Big)^2\\
&= \frac{1}{h^d}\, O(h^2) - O(h^4),
\end{aligned}
\]
we have Σ_{t=1}^n Var(K_{i,j,t}) = O(n h^{2-d}). Note that from Corollary A.2 in Hall and Heyde [5] and the fact that E(|K_{i,j,t}|^β) = O(h^{2+d-βd}) for 2 < β < 2(2+d)/d, we have
\[
\Big|\sum_{s\ne t}\mathrm{Cov}(K_{i,j,t}, K_{i,j,s})\Big|
= 2\Big|\sum_{t=1}^{n}\sum_{s>t}\mathrm{Cov}(K_{i,j,t}, K_{i,j,s})\Big|
\le 2n\sum_{s=1}^{\infty}\big|\mathrm{Cov}(K_{i,j,1}, K_{i,j,1+s})\big|
\le 16\, n\, O\big(h^{2(2+d-\beta d)/\beta}\big)\sum_{s=1}^{\infty}\alpha^{(\beta-2)/\beta}(s).
\]
Therefore,
\[
\mathrm{Var}(\hat\eta_{j,1}(z_i)) = O\Big(\frac{h^2}{n h^d}\Big) + O\Big(\frac{h^{2(2+d-\beta d)/\beta}}{n}\Big) = o\Big(\frac{1}{n h^d}\Big).
\]
From the above results, η̂_{j,1}(z_i) = h^2 Σ_{s=1}^{d} B_{s,j}(z_i) + o_p(h^3 + (n h^d)^{-1/2}).

• Proof of (A2). By the Cramér–Wold theorem, it is sufficient to prove that c^T Z*_n converges in distribution to c^T Z for any c = (c_1, c_2, c_3, c_4)^T in R^4. We use "big-block, small-block" arguments to complete the proof. Assume that there exist positive integers p = p(n), q = q(n) and k = k(n) = [n/(p+q)] (the integer part of n/(p+q)) such that as n → ∞,
\[
p \to \infty,\quad q \to \infty,\quad p = o(n),\quad q = o(p),\quad p = o\big((n h^d)^{1/2}\big),\quad n p^{-1}\alpha(q) = o(1),\quad q h^d = o(1),\quad p h^d \to \infty.
\]
Let
\[
Z_{n,t} = \frac{1}{\sqrt{h^d}}\Big(c_1 u_{1,t}\, k_0\Big(\frac{Z_t - z_1}{h}\Big) + c_2 u_{2,t}\, k_0\Big(\frac{Z_t - z_1}{h}\Big) + c_3 u_{1,t}\, k_0\Big(\frac{Z_t - z_2}{h}\Big) + c_4 u_{2,t}\, k_0\Big(\frac{Z_t - z_2}{h}\Big)\Big).
\]
Then, we have c^T Z*_n = n^{-1/2} Σ_{t=1}^{n} Z_{n,t} ≡ n^{-1/2} W_n. Let ξ_j = Σ_{t=j(p+q)+1}^{(j+1)p+jq} Z_{n,t} and ζ_j = Σ_{t=(j+1)p+jq+1}^{(j+1)(p+q)} Z_{n,t} for j = 0, 1, ..., k−1, and let ζ_k = Σ_{t=k(p+q)+1}^{n} Z_{n,t}. Then,
\[
W_n = \underbrace{\sum_{j=0}^{k-1}\xi_j}_{W_{n1}} + \underbrace{\sum_{j=0}^{k-1}\zeta_j}_{W_{n2}} + \zeta_k.
\]
In order to prove (A2), it suffices to show that as n → ∞,

(1) E(exp(it W_{n1})) − Π_{j=0}^{k−1} E(exp(it ξ_j)) → 0,

(2) n^{-1/2} W_{n2} → 0 in probability and n^{-1/2} ζ_k → 0 in probability,

(3) σ_n^2 ≡ Σ_{j=0}^{k−1} E(ξ_j^2) = n(σ^2 + o(1)),

(4) σ_n^{-2} Σ_{j=0}^{k−1} E(ξ_j^2 I(|ξ_j| > ε σ_n)) → 0 for any ε > 0,

where
\[
\begin{aligned}
\sigma^2 ={}& c_1^2\kappa^d f_Z(z_1)\sigma_1^2(z_1) + c_2^2\kappa^d f_Z(z_1)\sigma_2^2(z_1) + c_3^2\kappa^d f_Z(z_2)\sigma_1^2(z_2) + c_4^2\kappa^d f_Z(z_2)\sigma_2^2(z_2)\\
&+ 2 c_1 c_2\,\kappa^d f_Z(z_1)\, c_{12}(z_1) + 2 c_3 c_4\,\kappa^d f_Z(z_2)\, c_{12}(z_2).
\end{aligned}
\]
The verification of the above expression for σ_n^2 is given in Section 4.5.

We now prove these results respectively. From Lemma 18.2 in Li and Racine [9], which is due to Volkonskii and Rozanov [14],
\[
\Big| E(\exp(it W_{n1})) - \prod_{j=0}^{k-1} E(\exp(it\xi_j)) \Big| \le 16\, k\,\alpha(q) = O\Big(\frac{n}{p}\,\alpha(q)\Big) = o(1),
\]
and we obtain (1). In order to prove (2), we first consider W_{n2}. Note that
\[
E(W_{n2}^2) = \mathrm{Var}\Big(\sum_{j=0}^{k-1}\zeta_j\Big)
= \underbrace{k\,\mathrm{Var}(\zeta_0)}_{(P1)} + \underbrace{\sum_{i=0}^{k-1}\sum_{j=0,\,j\ne i}^{k-1}\mathrm{Cov}(\zeta_i,\zeta_j)}_{(P2)}.
\]

Computation of (P1). From
\[
\mathrm{Var}(\zeta_0) = \sum_{i=1}^{q}\mathrm{Var}(Z_{n,i}) + 2\sum_{i=1}^{q}\sum_{j>i}\mathrm{Cov}(Z_{n,i}, Z_{n,j}),
\qquad
\sum_{i=1}^{q}\mathrm{Var}(Z_{n,i}) = q\sigma^2 + O(q h^2),
\]
and the fact that
\[
2\sum_{i=1}^{q}\sum_{j>i}\mathrm{Cov}(Z_{n,i}, Z_{n,j}) = 2q\sum_{j=1}^{q}\Big(1 - \frac{j}{q}\Big)\mathrm{Cov}(Z_{n,1}, Z_{n,1+j}) = O(q^2 h^d),
\]
we have that
\[
\mathrm{Var}(\zeta_0) = q\sigma^2 + O(q^2 h^d) + O(q h^2) = q\sigma^2(1 + o(1)).
\]
Therefore, (P1) = k q σ^2 (1 + o(1)) = O(kq) = o(n).

Computation of (P2). Note that from Theorem A.5 in [5],
\[
|(P2)| = 2\Big|\sum_{i=0}^{k-1}\sum_{j>i}\mathrm{Cov}(\zeta_i,\zeta_j)\Big|
\le 2\sum_{i=1}^{n-p}\sum_{j=i+p}^{n}\big|\mathrm{Cov}(Z_{n,i}, Z_{n,j})\big|
\le 2n\sum_{j=p}^{\infty}\big|\mathrm{Cov}(Z_{n,1}, Z_{n,1+j})\big|
\le 2n\sum_{j=p}^{\infty} 4\, C_{1n} C_{2n}\,\alpha(j)
\le C^*\,\frac{n}{h^d}\sum_{j=p}^{\infty}\alpha(j) = o(n),
\]
where C_{in} = 4 max_k |c_k| sup|u_{s,1}| sup|k_0| / √(h^d) for i = 1, 2. Then, we have E(W_{n2}^2)/n = o(1). Similarly, Var(ζ_k) = O(p + q) = o(n), so (2) holds.

By stationarity and the same arguments as in the computation of (P1), we have Var(ξ_0) = pσ^2(1 + o(1)). Thus Σ_{j=0}^{k−1} E(ξ_j^2)/n = k p σ^2(1 + o(1))/n → σ^2, which gives (3). Finally, since |Z_{n,t}| ≤ C/√(h^d), for every ε > 0 the set {|ξ_j| ≥ ε σ_n} is empty when n is large. Therefore, (4) holds. This completes the proof.

• Proof of (A3). It is sufficient to prove that (X_{n1}, ..., X_{nk}, Z_{n1}, ..., Z_{nk})^T converges in distribution to (Y_1, ..., Y_k, c_1, ..., c_k)^T. Let X_n = (X_{n1}, ..., X_{nk})^T, Z_n = (Z_{n1}, ..., Z_{nk})^T, Y = (Y_1, ..., Y_k)^T and c = (c_1, ..., c_k)^T. Then,
\[
E\big(e^{i(t^T X_n + s^T Z_n)}\big)
= \underbrace{E\big(e^{i(t^T X_n + s^T c)}\big(e^{i s^T(Z_n - c)} - 1\big)\big)}_{I}
+ \underbrace{E\big(e^{i(t^T X_n + s^T c)}\big)}_{II}.
\]
Note that II → E(e^{i(t^T Y + s^T c)}) and I → 0 by Lebesgue's dominated convergence theorem. Applying the continuous mapping theorem, we have (A3).

4.2 Proof of Theorem 1

We adopt the proof in [7]. For z ∈ {z_1, ..., z_k}, let φ*_l : 1 ≤ l ≤ p and ψ*_{m*} : 1 ≤ m* ≤ q be basis functions obtained by applying linear transformations to the φ_l's and ψ_{m*}'s such that φ*_1 = 1 = ψ*_1, (E(φ*_l(X)φ*_{l'}(X)|Z = z) : 1 ≤ l, l' ≤ p) and (E(ψ*_{m*}(Y)ψ*_{m'}(Y)|Z = z) : 1 ≤ m*, m' ≤ q) are identity matrices, and E(φ*_l(X)ψ*_{m*}(Y)|Z = z) = 0 for l ≠ m*. Take g_1(X,Y), ..., g_m(X,Y) to be the functions φ*_l(X)φ*_{l'}(X), φ*_l(X)ψ*_{m*}(Y) and ψ*_{m*}(Y)ψ*_{m'}(Y), where 1 ≤ l ≤ l' ≤ p and 1 ≤ m* ≤ m' ≤ q. Applying Lemma 1, we have
\[
\sqrt{n h^d}
\begin{pmatrix}
\hat g_1(z_1) - g_1^*(z_1)\\ \vdots\\ \hat g_1(z_k) - g_1^*(z_k)\\ \vdots\\ \hat g_m(z_1) - g_m^*(z_1)\\ \vdots\\ \hat g_m(z_k) - g_m^*(z_k)
\end{pmatrix}
- \sqrt{n h^d}
\begin{pmatrix}
h^2\sum_{s=1}^{d} B_{s,1}(z_1)/f_Z(z_1)\\ \vdots\\ h^2\sum_{s=1}^{d} B_{s,1}(z_k)/f_Z(z_k)\\ \vdots\\ h^2\sum_{s=1}^{d} B_{s,m}(z_1)/f_Z(z_1)\\ \vdots\\ h^2\sum_{s=1}^{d} B_{s,m}(z_k)/f_Z(z_k)
\end{pmatrix}
\xrightarrow{D} Z^*. \tag{17}
\]
Let
\[
V^*(z) = \begin{pmatrix} V_{11}(z) & V_{12}(z)\\ V_{21}(z) & V_{22}(z)\end{pmatrix},
\]
where the (l, l')-th element of V_{11}(z) is E(φ*_l(X)φ*_{l'}(X)|Z = z) for 1 ≤ l, l' ≤ p, the (l, m*)-th element of V_{12}(z) is E(φ*_l(X)ψ*_{m*}(Y)|Z = z) for 1 ≤ l ≤ p and 1 ≤ m* ≤ q, the (m*, m')-th element of V_{22}(z) is E(ψ*_{m*}(Y)ψ*_{m'}(Y)|Z = z) for 1 ≤ m*, m' ≤ q, and V_{21}(z) = (V_{12}(z))^T. Let V̂*(z) be the estimator of V*(z) obtained by replacing each conditional expectation in V*(z) with its kernel estimator defined in (8). Then, (17) gives
\[
\sum_{i=1}^{k}\|\hat V^*(z_i) - V^*(z_i)\|^2 = O_p\Big(\frac{1}{n h^d}\Big) + O_p(h^4) = O_p\Big(\frac{1}{n h^d} + h^4\Big).
\]
For 1 ≤ i ≤ k, for a p×1 vector a and a (p+q)×(p+q) matrix
\[
U = \begin{pmatrix} U_{11} & U_{12}\\ U_{21} & U_{22}\end{pmatrix},
\]
where the dimensions of U_{11}, U_{12}, U_{21} and U_{22} are p×p, p×q, q×p and q×q respectively, define
\[
g_{r,s}(U) = U_{rs} \tag{18}
\]
for 1 ≤ r, s ≤ 2,
\[
g^*_{r,s}(U) = \begin{cases} g_{r,s}(U) & \text{if } (r,s) = (1,2) \text{ or } (2,1);\\ (g_{r,s}(U))^{-1} & \text{if } (r,s) = (1,1) \text{ or } (2,2),\end{cases}
\]
and
\[
g(U, a) = U_{1,2}\, U_{2,2}^{-1}\, U_{2,1}\, U_{1,1}^{-1} - U_{1,1}\, a a^T. \tag{19}
\]
Let α* be the p×1 vector whose first element is 1 and whose remaining elements are 0's. Then, ρ̂(z) and ρ_{p,q}(z) are the square roots of the largest eigenvalues of the matrices g(V̂*(z), α*) and g(V*(z), α*) respectively. Let Δ_{r,s,i} = g*_{r,s}(V̂*(z_i)) − g*_{r,s}(V*(z_i)). Then, we have
\[
\|g(\hat V^*(z_i), \alpha^*) - g(V^*(z_i), \alpha^*)\|
\le \prod_{r=1}^{2}\prod_{s=1}^{2}\big(\|g^*_{r,s}(V^*(z_i))\| + \|\Delta_{r,s,i}\|\big)
- \prod_{r=1}^{2}\prod_{s=1}^{2}\|g^*_{r,s}(V^*(z_i))\|
+ \|g^*_{1,1}(\hat V^*(z_i)) - g^*_{1,1}(V^*(z_i))\|\,\|\alpha^*(\alpha^*)^T\|,
\]
which gives
\[
\sum_{i=1}^{k}\|g(\hat V^*(z_i), \alpha^*) - g(V^*(z_i), \alpha^*)\|^2 = O_p\Big(\frac{1}{n h^d} + h^4\Big) = O_p\Big(\frac{1}{n h^d}\Big)
\]
and
\[
\sum_{i=1}^{k}\big(\hat\rho^2(z_i) - \rho^2_{p,q}(z_i)\big)^2 = O_p\Big(\frac{1}{n h^d} + h^4\Big), \tag{20}
\]
since |ρ̂^2(z_i) − ρ^2_{p,q}(z_i)| ≤ ||g(V̂*(z_i), α*) − g(V*(z_i), α*)|| for 1 ≤ i ≤ k. From (20) and the fact that Σ_{i=1}^{k} (f̂_Z(z_i) − f_Z(z_i))^2 = O_p(1/(n h^d) + h^4),
\[
\Big(\sum_{i=1}^{k}\hat f_Z(z_i)\,\hat\rho^2(z_i) - \sum_{i=1}^{k} f_Z(z_i)\,\rho^2_{p,q}(z_i)\Big)^2
= \Big(\sum_{i=1}^{k}(\hat f_Z(z_i) - f_Z(z_i))\,\hat\rho^2(z_i) + \sum_{i=1}^{k} f_Z(z_i)\big(\hat\rho^2(z_i) - \rho^2_{p,q}(z_i)\big)\Big)^2
= O_p\Big(\frac{1}{n h^d} + h^4\Big).
\]

4.3 Proof of Theorem 2

We adopt the proof in [7]. For z ∈ {z_1, ..., z_k}, let V̂*(z), V*(z) and B_{s,j} be as defined in the proof of Theorem 1. Let B_i be the (p+q)×(p+q) matrix whose elements are h^2 Σ_{s=1}^{d} B_{s,j}(z_i)/f_Z(z_i) : 1 ≤ j ≤ m = (p+q)^2. From Lemma 1, we have
\[
\begin{pmatrix}
\sqrt{n h^d f_Z(z_1)/\kappa^d}\,\big(\hat V^*(z_1) - V^*(z_1) - B_1\big)\\
\vdots\\
\sqrt{n h^d f_Z(z_k)/\kappa^d}\,\big(\hat V^*(z_k) - V^*(z_k) - B_k\big)
\end{pmatrix}
\xrightarrow{D}
\begin{pmatrix} N_1^*\\ \vdots\\ N_k^*\end{pmatrix} \equiv N^*,
\]
where for 1 ≤ i ≤ k, N*_i is a matrix of normal elements with mean 0 and variance 1. Applying Skorohod's theorem, for 1 ≤ i ≤ k, there exist random matrices T_i and W_{1,i} such that T_i ∼ (n h^d f_Z(z_i)/κ^d)^{1/2}(V̂*(z_i) − V*(z_i) − B_i), W_{1,i} ∼ N*_i and T_i → W_{1,i} almost surely. Therefore,
\[
\hat V^*(z_i) \sim \frac{\sqrt{\kappa^d}\, T_i}{\sqrt{n h^d f_Z(z_i)}} + V^*(z_i) + B_i
= V^*(z_i) + \frac{\sqrt{\kappa^d}}{\sqrt{n h^d f_Z(z_i)}}\,(W_{1,i} + W_{2,i}),
\]
where W_{2,i} = T_i − W_{1,i} + √(n h^d f_Z(z_i)/κ^d)\, B_i. Note that B_i = O(h^2). From (S6), Σ_{i=1}^{k} ||W_{2,i}|| = o_p(1).

For 1 ≤ i ≤ k, let Ṽ_i = V*(z_i) + (n h^d f_Z(z_i)/κ^d)^{-1/2}(W_{1,i} + W_{2,i}), A_1(z_i) = g(Ṽ_i, α*)\, g_{1,1}(Ṽ_i), and let ρ̃^2_0(z_i) be the largest eigenvalue of A_1(z_i)(g_{1,1}(Ṽ_i))^{-1}. Here the functions g(·, ·) and g_{1,1} are defined in (19) and (18) respectively. Then, ρ̃_0(z_i) has the same distribution as ρ̂(z_i). Below we will show that the impact of W_{2,i} is negligible in the derivation of the asymptotic distribution of ρ̃_0(z_i).

For 1 ≤ r, s ≤ 2 and 1 ≤ i ≤ k, let Δ_{r,s,i} = g_{r,s}(Ṽ_i) − g_{r,s}(V*(z_i)). Then,
\[
\sum_{i=1}^{k}\sum_{r=1}^{2}\sum_{s=1}^{2}\|\Delta_{r,s,i}\|^2 = O_p\Big(\frac{1}{n h^d} + h^4\Big) = O_p\Big(\frac{1}{n h^d}\Big)
\]
and
\[
\begin{aligned}
A_1(z_i) ={}& g_{1,2}(V^*(z_i))\,(g_{2,2}(\tilde V_i))^{-1}\, g_{2,1}(V^*(z_i)) - g_{1,1}(\tilde V_i)\,\alpha^*(\alpha^*)^T g_{1,1}(\tilde V_i)\\
&+ g_{1,2}(V^*(z_i))\,\Delta_{2,1,i} + \Delta_{1,2,i}\, g_{2,1}(V^*(z_i)) + \Delta_{1,2,i}\Delta_{2,1,i}\\
&- g_{1,2}(V^*(z_i))\,\Delta_{2,2,i}\Delta_{2,1,i} - \Delta_{1,2,i}\Delta_{2,2,i}\, g_{2,1}(V^*(z_i)) + R_{1,i},
\end{aligned}
\]
where
\[
\begin{aligned}
R_{1,i} ={}& \Delta_{1,2,i}\big((g_{2,2}(\tilde V_i))^{-1} - I_q\big)\Delta_{2,1,i}
+ g_{1,2}(V^*(z_i))\big((g_{2,2}(\tilde V_i))^{-1} - I_q + \Delta_{2,2,i}\big)\Delta_{2,1,i}\\
&+ \Delta_{1,2,i}\big((g_{2,2}(\tilde V_i))^{-1} - I_q + \Delta_{2,2,i}\big)\, g_{2,1}(V^*(z_i))
\end{aligned}
\]
and I_q denotes the q×q identity matrix. Note that g_{2,2}(Ṽ_i) can be expressed as
\[
g_{2,2}(\tilde V_i) = \begin{pmatrix} 1 & B_i^T\\ B_i & D_i\end{pmatrix}
\]
for some matrices B_i and D_i, so A_1(z_i) becomes
\[
\begin{aligned}
& B_i^T\big((D_i - B_i B_i^T)^{-1} - I_{q-1}\big) B_i\, J + g_{1,2}(V^*(z_i))\,(\Delta_{2,2,i} - J)^2\, g_{2,1}(V^*(z_i))\\
&- \Delta_{1,1,i}\, g_{1,2}(V^*(z_i))\, g_{2,1}(V^*(z_i))\, \Delta_{1,1,i} + \Delta_{1,2,i}\Delta_{2,1,i}\\
&- g_{1,2}(V^*(z_i))\,\Delta_{2,2,i}\Delta_{2,1,i} - \Delta_{1,2,i}\Delta_{2,2,i}\, g_{2,1}(V^*(z_i)) + R_{1,i},
\end{aligned}
\]
where J = α*(α*)^T. Let
\[
\begin{aligned}
A_2(z_i) ={}& g_{1,2}(V^*(z_i))\,(g_{2,2}(W_{1,i}))^2\, g_{2,1}(V^*(z_i))\\
&- g_{1,1}(W_{1,i})\, g_{1,2}(V^*(z_i))\, g_{2,1}(V^*(z_i))\, g_{1,1}(W_{1,i}) + g_{1,2}(W_{1,i})\, g_{2,1}(W_{1,i})
\end{aligned}
\]
and
\[
\begin{aligned}
R_{2,i} ={}& B_i^T\big((D_i - B_i B_i^T)^{-1} - I_{q-1}\big) B_i\, J - (n h^d f_Z(z_i)/\kappa^d)^{-1} A_2(z_i)\\
&+ g_{1,2}(V^*(z_i))\,(\Delta_{2,2,i} - J)^2\, g_{2,1}(V^*(z_i))
- \Delta_{1,1,i}\, g_{1,2}(V^*(z_i))\, g_{2,1}(V^*(z_i))\, \Delta_{1,1,i} + \Delta_{1,2,i}\Delta_{2,1,i}\\
&- g_{1,2}(V^*(z_i))\,\Delta_{2,2,i}\Delta_{2,1,i} - \Delta_{2,1,i}\Delta_{2,2,i}\, g_{2,1}(V^*(z_i)).
\end{aligned}
\]
Then,
\[
A_1(z_i) = \frac{\kappa^d\, A_2(z_i)}{n h^d f_Z(z_i)} + R_{1,i} + R_{2,i}, \tag{21}
\]
where
\[
\sum_{i=1}^{k}\big(\|R_{1,i}\|^2 + \|R_{2,i}\|^2\big) = O_p\Big(\frac{1}{(n h^d)^2}\Big). \tag{22}
\]
Note that under conditional independence, for 1 ≤ i ≤ k, A_2(z_i) = C_i C_i^T, where C_i is the p×q matrix obtained by replacing the elements in the first column and first row of g_{1,2}(W_{1,i}) with zeros, and g_{1,2}(W_{1,i}) is a random matrix whose elements are IID N(0, 1) except that the (1,1)-th element is 1. Therefore, Σ_{i=1}^{k} ||A_2(z_i)||^2 = O_p(1), which, together with (21) and (22), implies that Σ_{i=1}^{k} ||A_1(z_i)||^2 = O_p(1/(n h^d)^2) and
\[
\sum_{i=1}^{k}\big\|A_1(z_i)(g_{1,1}(\tilde V_i))^{-1} - A_1(z_i)\big\|^2 = O_p\Big(\frac{1}{(n h^d)^3}\Big). \tag{23}
\]
For 1 ≤ i ≤ k, let λ_{0,i} be the largest eigenvalue of A_2(z_i). By (21), (22) and (23),
\[
\sum_{i=1}^{k}\big(n h^d f_Z(z_i)\,\tilde\rho_0^2(z_i)/\kappa^d - \lambda_{0,i}\big)^2 = o_p(1).
\]
Let f̃_i, ρ̃(z_i) and λ_i : 1 ≤ i ≤ k be random variables such that the joint distribution of (f̃_i, ρ̃(z_i)) : 1 ≤ i ≤ k is the same as that of (f̂_Z(z_i), ρ̂(z_i)) : 1 ≤ i ≤ k, and the joint distribution of (ρ̃(z_i), λ_i) : 1 ≤ i ≤ k is the same as that of (ρ̃_0(z_i), λ_{0,i}) : 1 ≤ i ≤ k. Note that n h^d Σ_{i=1}^{k} (ρ̂(z_i))^2 = O_p(1), so we have that
\[
\Big|\frac{n h^d}{\kappa^d}\sum_{i=1}^{k}\hat f_Z(z_i)\,\hat\rho^2(z_i) - \frac{n h^d}{\kappa^d}\sum_{i=1}^{k} f_Z(z_i)\,\hat\rho^2(z_i)\Big|
\le \frac{n h^d}{\kappa^d}\Big(\sum_{i=1}^{k}\big(\hat f_Z(z_i) - f_Z(z_i)\big)^2\Big)^{1/2}\sum_{i=1}^{k}\hat\rho^2(z_i)
= O_p(1)\, O_p\big((n h^d)^{-1/2}\big) = O_p\big((n h^d)^{-1/2}\big)
\]
and
\[
\Big|\frac{n h^d}{\kappa^d}\sum_{i=1}^{k}\tilde f_i\,\tilde\rho^2(z_i) - \sum_{i=1}^{k}\lambda_i\Big|
\le O_p\big((n h^d)^{-1/2}\big) + o_p(1) = o_p(1),
\]
which completes the proof.

4.4 Proof of Theorem 3

Suppose that ρ(z_i) > 0 for some z_i. Then, we have Σ_{i=1}^{k} f_Z(z_i) ρ^2(z_i) > 0. Choose ε such that 0 < ε < Σ_{i=1}^{k} f_Z(z_i) ρ^2(z_i); then
\[
\underbrace{P\Big(\sum_{i=1}^{k}\hat f_Z(z_i)\,\hat\rho^2(z_i) \ge \sum_{i=1}^{k} f_Z(z_i)\,\rho^2(z_i) - \varepsilon\Big)}_{III}
\le P\Big(\sum_{i=1}^{k}\hat f_Z(z_i)\,\hat\rho^2(z_i) \ge \frac{\kappa^d F^*_{1-\alpha}}{n h^d}\Big)
\]
for large n. From Theorem 1,
\[
III \ge P\Big(\Big|\sum_{i=1}^{k}\hat f_Z(z_i)\,\hat\rho^2(z_i) - \sum_{i=1}^{k} f_Z(z_i)\,\rho^2(z_i)\Big| \le \varepsilon\Big) \to 1,
\]
so
\[
P\Big(\sum_{i=1}^{k}\hat f_Z(z_i)\,\hat\rho^2(z_i) \ge \frac{\kappa^d F^*_{1-\alpha}}{n h^d}\Big) \to 1.
\]

4.5 The verification of the expression for σ_n^2

The expression for σ_n^2 involves some variance and covariance terms. Under the conditions in Theorem 1, the leading parts of those variance and covariance terms can be obtained. The results are as follows. For 1 ≤ i, i* ≤ k and 1 ≤ j, j* ≤ m, statements 1 – 4 hold.

1. Var(u_{j,t} k_0((Z_t − z_i)/h)) = h^d κ^d σ_j^2(z_i) f_Z(z_i) + O(h^{d+2}).
2. Cov(u_{j,t} k_0((Z_t − z_i)/h), u_{j*,t} k_0((Z_t − z_i)/h)) = h^d κ^d c_{jj*}(z_i) f_Z(z_i) + O(h^{d+2}).
3. Cov(u_{j,t} k_0((Z_t − z_i)/h), u_{j,t} k_0((Z_t − z_{i*})/h)) = O(h^{2d}).
4. Cov(u_{j,t} k_0((Z_t − z_i)/h), u_{j*,t} k_0((Z_t − z_{i*})/h)) = O(h^{2d}).

We will only give the proof for Case 1 since the proofs for the other cases are similar. Since
\[
\begin{aligned}
\mathrm{Var}\Big(u_{j,t}\, k_0\Big(\frac{Z_t - z_i}{h}\Big)\Big)
&= E\Big(E\Big(u_{j,t}^2\Big(k_0\Big(\frac{Z_t - z_i}{h}\Big)\Big)^2\,\Big|\, Z_t\Big)\Big)
= \int \sigma_j^2(z_t)\Big(k_0\Big(\frac{z_t - z_i}{h}\Big)\Big)^2 f_Z(z_t)\, dz_t\\
&= h^d \int \sigma_j^2(z_i + h\nu)\,(k_0(\nu))^2 f_Z(z_i + h\nu)\, d\nu
= h^d \int \sigma_j^2(z_i)\,(k_0(\nu))^2\Big(f_Z(z_i) + h\sum_{s=1}^{d} f_s(z_i)\,\nu_s + O(h^2)\Big) d\nu\\
&= h^d\kappa^d\sigma_j^2(z_i)\, f_Z(z_i) + O(h^{d+2}),
\end{aligned}
\]
Case 1 holds.

Acknowledgements

This research is supported by the National Science Council of Taiwan under grant NSC 99-2118-M-004-006-. The authors would like to thank the reviewers for their careful reading and constructive comments.

References

[1] Taoufik Bouezmarni, Jeroen V. K. Rombouts, and Abderrahim Taamouti. A nonparametric copula based test for conditional independence with applications to Granger causality. Working papers, Departamento de Economía, Universidad Carlos III de Madrid, 2009.

[2] Miguel A. Delgado and Wenceslao González Manteiga. Significance testing in nonparametric regression based on the bootstrap. The Annals of Statistics, 29(5):1469–1507, 2001.

[3] J. P. Florens and Denis Fougère. Noncausality in continuous time. Econometrica, 64(5):1195–1212, 1996.

[4] J. P. Florens and M. Mouchart. A note on noncausality. Econometrica, 50(3):583–591, 1982.

[5] P. G. Hall and C. C. Heyde. Martingale Limit Theory and Its Application. Academic Press, 1980.

[6] Craig Hiemstra and Jonathan D. Jones. Testing for linear and nonlinear Granger causality in the stock price–volume relation. The Journal of Finance, 49(5):1639–1664, 1994.

[7] Tzee-Ming Huang. Testing conditional independence using maximal nonlinear conditional correlation. The Annals of Statistics, 38(4):2047–2091, 2010.

[8] Lexin Li, R. Dennis Cook, and Christopher J. Nachtsheim. Model-free variable selection. Journal of the Royal Statistical Society, Series B, 67(2):285–299, 2005.

[9] Qi Li and Jeffrey Scott Racine. Nonparametric Econometrics: Theory and Practice. Princeton University Press, 2007.

[10] Oliver Bruce Linton and Pedro Gozalo. Conditional independence restrictions: testing and estimation. Cowles Foundation Discussion Paper 1140, Cowles Foundation, Yale University, 1996.

[11] Efstathios Paparoditis and Dimitris N. Politis. The local bootstrap for kernel estimators under general dependence conditions. Annals of the Institute of Statistical Mathematics, 52(1):139–159, 2000.

[12] Liangjun Su and Halbert White. A consistent characteristic function-based test for conditional independence. Journal of Econometrics, 141:807–834, 2007.

[13] Liangjun Su and Halbert White. A nonparametric Hellinger metric test for conditional independence. Econometric Theory, 24(4):829–864, 2008.

[14] V. A. Volkonskii and Yu. A. Rozanov. Some limit theorems for random functions. I. Theory of Probability and Its Applications, 4(2):178–197, 1959.

NSC-Funded Project Derived R&D Results Promotion Data Sheet

Date: 2011/07/29

Project title: Conditional independence tests for dependent data
Principal investigator: Tzee-Ming Huang (黃子銘)
Project number: 99-2118-M-004-006-
Discipline: mathematical statistics

No R&D results available for promotion.


FY99 (2010) Research Project Results Summary Table

Principal investigator: Tzee-Ming Huang (黃子銘)    Project number: 99-2118-M-004-006-
Project title: Conditional independence tests for dependent data

Quantified results (achieved / expected total / project contribution):

Domestic
  Publications
    Journal papers:               0 / 0 / 100%
    Research/technical reports:   0 / 1 / 100%
      Note: Yu-Hsiang Cheng and Tzee-Ming Huang (2010), "A conditional independence test for dependent data based on maximal conditional correlation," Department of Statistics, NCCU, Technical Report No. 2010-03.
    Conference papers:            0 / 0 / 100%
    Books:                        0 / 0 / 100%
  Patents
    Applications pending:         0 / 0 / 100%
    Granted:                      0 / 0 / 100%
  Technology transfer
    Cases:                        0 / 0 / 100%
    Royalties (thousand NTD):     0 / 0 / 100%
  Project personnel (domestic nationals; unit: person-times)
    Master's students:            1 / 0 / 100% (administrative support)
    Doctoral students:            1 / 0 / 100% (co-author of the submitted paper)
    Postdoctoral researchers:     0 / 0 / 100%
    Full-time assistants:         0 / 0 / 100%

Foreign
  Publications (journal papers, research/technical reports, conference papers, books; unit: chapters/volumes): all 0 / 0 / 100%
  Patents (applications pending, granted): 0 / 0 / 100%
  Technology transfer (cases; royalties, thousand NTD): 0 / 0 / 100%
  Project personnel (foreign nationals): doctoral students 0, postdoctoral researchers 0, full-time assistants 0

Other results (items that cannot be quantified, such as organizing academic activities, awards, major international collaborations, international impact of the research results, and other concrete benefits to industrial technology development):
  The technical report has been submitted to the Journal of Multivariate Analysis and is currently under review.

Science education items (quantified, with description of name or content):
  Assessment tools (qualitative and quantitative): 0
  Courses/modules: 0
  Computer and network systems or tools: 0
  Teaching materials: 0
  Events/competitions held: 0
  Seminars/workshops: 0
  Newsletters/websites: 0
  Audience reached by promotion of project results: 0


NSC Project Final Report Self-Evaluation Form

Please give an overall assessment of the consistency of the research with the original proposal, the achievement of the expected goals, the academic or applied value of the results (briefly describe their significance, value, impact, or potential for further development), their suitability for publication in academic journals or for patent applications, the main findings, and other matters of value.

1. Overall assessment of the consistency of the research with the original proposal and the achievement of the expected goals:
   ■ Goals achieved
   □ Goals not achieved (explain, within 100 words)
     □ Experiment failed  □ Experiment interrupted  □ Other reasons
   Explanation:

2. Publication of the research results in academic journals or patent applications:
   Paper: □ published  ■ unpublished manuscript  □ in preparation  □ none
   Patent: □ granted  □ pending  ■ none
   Technology transfer: □ completed  □ under negotiation  ■ none
   Other (within 100 words): The manuscript has been submitted to the Journal of Multivariate Analysis and is currently under review.

3. Assessment of the academic or applied value of the research results in terms of academic achievement, technical innovation, and social impact (briefly describe their significance, value, impact, or potential for further development; within 500 words):
   The original purpose of proposing conditional independence tests was precisely to apply them to dependent data. The theory in this result is built on earlier results that were obtained without considering dependent data; although the theoretical breakthrough is smaller than that of the earlier results, the present result is more meaningful for applications.

