
Measures of Conditional Association and Tests of Conditional Independence (I)


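National Science Council, Executive Yuan: Research Project Final Report

Measures of Conditional Association and Tests of Conditional Independence

Research Results Report (Condensed Version)

Project type: Individual

Project number: NSC 95-2119-M-004-001-

Project period: October 1, 2006 to July 31, 2007 (ROC year 95/10/01 to 96/07/31)

Institution: Department of Statistics, National Chengchi University (國立政治大學統計學系)

Principal investigator: 黃子銘

Project participants: Master's student, part-time assistant: 歐陽致平

Disposition: This project is open to public inquiry

October 22, 2007 (ROC year 96)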

1 Introduction

Conditional independence tests have a variety of applications. For example, to perform variable selection in a regression model, Li, Cook and Nachtsheim (2005) proposed testing the independence between the response variable and a predictor variable given the other predictors, and removing the predictor variable when the test is not significant. Su and White (2005) pointed out that conditional independence tests can be used for testing Granger non-causality for two time series and for choosing a proper model within a certain family of semi-parametric models.

When the variables involved are discrete, there are many existing tests for conditional independence. For continuous variables, there are relatively few results. Li et al. (2005) proposed a test of conditional independence that is constructed by projecting two variables onto the space generated by the conditioning variable and then testing the independence between the residuals. Su and White (2005, 2006) proposed tests based on a weighted Hellinger distance between the conditional densities or on the difference between the conditional characteristic functions.

It is desirable to construct a test of conditional independence of two random vectors $X$ and $Y$ given a random vector $Z$ based on some measure of conditional association, where the measure of conditional association has the following properties:

P1 The measure can be defined for any type of random vector, including both discrete and continuous variables.

P2 The measure is invariant when one-to-one transforms are applied to each vector.

P3 The measure is between 0 and 1, where 0 corresponds to independence and 1 corresponds to full dependence.

Part or all of Properties P1 - P3 have been considered by various authors in different contexts. Some examples are as follows.

Romanovič (1975) defined the maximum partial correlation between two σ-fields given a third σ-field. According to Romanovič's definition, the maximum partial correlation between $\sigma(X)$ and $\sigma(Y)$ given $\sigma(Z)$ is

$$\sup_{f,g} \operatorname{corr}\bigl(f(X,Z) - E(f(X,Z)|Z),\; g(Y,Z) - E(g(Y,Z)|Z)\bigr),$$

where $\sigma(X)$ denotes the σ-field generated by the random vector $X$. The maximum partial correlation between $\sigma(X)$ and $\sigma(Y)$ given $\sigma(Z)$ can serve as a measure of conditional association of $X$ and $Y$ given $Z$, and it satisfies Properties P1 - P3.

Su and White (2005) proposed a test of conditional independence based on a test statistic that is a weighted Hellinger distance between the conditional density of $X$ given $Z$ and the conditional density of $Y$ given $Z$. Such a statistic can serve as a measure of conditional association, and they chose the Hellinger distance so that the test statistic has the invariance property P2.

Dauxois and Nkiet (1998) proposed using the canonical coefficients obtained in nonlinear canonical analysis (NLCA) to construct measures of association and tests of independence. The following is a straightforward extension of Dauxois and Nkiet's (1998) definition of the canonical coefficients to the conditional case.

Definition 1. Suppose that there exist pairs of functions $(f_i, g_i)$, $i = 0, 1, \ldots$, such that for each $i$, $(f_i, g_i)$ is a pair of functions $(f, g)$ that maximizes

$$E(f(X,Z)g(Y,Z)|Z)$$

subject to $E(f^2(X,Z)|Z) = 1 = E(g^2(Y,Z)|Z)$ and $E(f(X,Z)f_j(X,Z)|Z) = 0 = E(g(Y,Z)g_j(Y,Z)|Z)$ for $j < i$.

Define $\rho_i(X,Y|Z) = E(f_i(X,Z)g_i(Y,Z)|Z)$ for each $i$. The $\rho_i(X,Y|Z)$'s will be referred to as canonical coefficients.

Suppose that the $(f_i, g_i)$'s in Definition 1 exist. Then a proper combination of the $\rho_i(X,Y|Z)$'s can give a measure of conditional association. Some examples of such a combination are $\rho_1(X,Y|Z)$ and $-\sum_k \log(1 - \rho_k^2(X,Y|Z))$, whose unconditional counterparts are two commonly used measures of association, as mentioned in Huang, Lee and Hsiao (2006).
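To make the two combinations concrete, the following is a minimal sketch that evaluates them for a given list of canonical coefficients $\rho_k$, $k \ge 1$. The function name `log_measure` and the numerical values are illustrative assumptions and do not come from the report.

```python
import math

def log_measure(rhos):
    """Return -sum_k log(1 - rho_k^2) for canonical coefficients rho_k, k >= 1."""
    total = 0.0
    for rho in rhos:
        if not 0.0 <= rho <= 1.0:
            raise ValueError("canonical coefficients must lie in [0, 1]")
        if rho == 1.0:
            return math.inf  # a coefficient equal to 1 makes the sum diverge
        total -= math.log1p(-rho * rho)  # log1p(-rho^2) = log(1 - rho^2)
    return total

# Hypothetical coefficients, listed without the trivial rho_0 = 1.
rhos = [0.6, 0.3, 0.0]
rho_1 = max(rhos)            # first canonical coefficient
print(rho_1, log_measure(rhos))
```

Both quantities are zero exactly when every $\rho_k$ with $k \ge 1$ is zero; the logarithmic combination is unbounded, whereas $\rho_1$ stays in $[0, 1]$.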

Among the various approaches for constructing measures of conditional association described above, the NLCA approach offers the most flexibility. The objective of this project is to construct a test of conditional independence based on measures of conditional association from NLCA. However, it is not clear what conditions need to be added to guarantee the existence of the $(f_i, g_i)$'s. In this report, an alternative definition for the canonical coefficient $\rho_1(X,Y|Z)$ is provided to avoid finding such conditions.

2 Measures of Conditional Association

In this section, the canonical coefficients $\rho_i(X,Y|Z)$ are defined using a new approach. In Definition 1, it is clear that $\rho_0(X,Y|Z) = 1$ with $f_0(X,Z) = 1 = g_0(Y,Z)$. Therefore, only the $\rho_i(X,Y|Z)$'s with $i \ge 1$ will be defined again, and the new definitions involve only pairs of functions in

$$S = \{(f,g) : E(f^2(X,Z)|Z) = 1 = E(g^2(Y,Z)|Z) \text{ and } E(f(X,Z)|Z) = 0 = E(g(Y,Z)|Z)\}.$$

As mentioned in Section 1, the purpose of introducing alternative definitions for the $\rho_i(X,Y|Z)$'s is to avoid dealing with the existence of the maximizers $(f_i, g_i)$. To achieve this goal, the maxima of certain conditional expectations in the original definitions of the $\rho_i(X,Y|Z)$'s will be replaced by suitable suprema. In particular, $\sup_{(f,g)\in S^*} E(f(X,Z)g(Y,Z)|Z)$ needs to be defined.


Fact 1 For $S^* \subset S$, there exists a sequence $\{(\alpha_n, \beta_n)\}$ in $S^*$ such that

(i) the sequence $\{E(\alpha_n(X,Z)\beta_n(Y,Z)|Z)\}$ is increasing (not necessarily strictly), and

(ii) for every $(f,g) \in S^*$, $E(f(X,Z)g(Y,Z)|Z) \le \lim_{n\to\infty} E(\alpha_n(X,Z)\beta_n(Y,Z)|Z)$ almost surely.

Furthermore, if (i) and (ii) hold with $\{(\alpha_n(X,Z), \beta_n(Y,Z))\}$ taken to be either $\{(\alpha_{n,1}(X,Z), \beta_{n,1}(Y,Z))\}$ or $\{(\alpha_{n,2}(X,Z), \beta_{n,2}(Y,Z))\}$, then

$$\lim_{n\to\infty} E(\alpha_{n,1}(X,Z)\beta_{n,1}(Y,Z)|Z) = \lim_{n\to\infty} E(\alpha_{n,2}(X,Z)\beta_{n,2}(Y,Z)|Z) \tag{1}$$

almost surely.

Fact 1 allows one to define $\sup_{(f,g)\in S^*} E(f(X,Z)g(Y,Z)|Z)$:

Definition 2. For $S^* \subset S$,

$$\sup_{(f,g)\in S^*} E(f(X,Z)g(Y,Z)|Z) = \lim_{n\to\infty} E(\alpha_n(X,Z)\beta_n(Y,Z)|Z),$$

where $\{(\alpha_n, \beta_n)\}$ is a sequence in $S^*$ that satisfies (i) and (ii) in Fact 1.

Proof of Fact 1. First, note that (1) holds because for every $n$,

$$E(\alpha_{n,2}(X,Z)\beta_{n,2}(Y,Z)|Z) \le \lim_{n\to\infty} E(\alpha_{n,1}(X,Z)\beta_{n,1}(Y,Z)|Z)$$

and

$$E(\alpha_{n,1}(X,Z)\beta_{n,1}(Y,Z)|Z) \le \lim_{n\to\infty} E(\alpha_{n,2}(X,Z)\beta_{n,2}(Y,Z)|Z)$$

almost surely. It remains to find a sequence $\{(\alpha_n, \beta_n)\}$ that satisfies (i) and (ii).

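Let $\{(\alpha_n^*(X,Z), \beta_n^*(Y,Z))\}$ be a sequence in $S^*$ such that $E(\alpha_n^*(X,Z)\beta_n^*(Y,Z))$ increases to $\sup_{(f,g)\in S^*} E(f(X,Z)g(Y,Z))$.

Let $(\alpha_1(X,Z), \beta_1(Y,Z)) = (\alpha_1^*(X,Z), \beta_1^*(Y,Z))$, and for $n \ge 2$, define

$$(\alpha_n(X,Z), \beta_n(Y,Z)) = \begin{cases} (\alpha_n^*(X,Z), \beta_n^*(Y,Z)) & \text{if } E(\alpha_n^*(X,Z)\beta_n^*(Y,Z)|Z) > E(\alpha_{n-1}(X,Z)\beta_{n-1}(Y,Z)|Z); \\ (\alpha_{n-1}(X,Z), \beta_{n-1}(Y,Z)) & \text{otherwise.} \end{cases}$$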

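Then $\{(\alpha_n(X,Z), \beta_n(Y,Z))\}$ is a sequence in $S^*$ that satisfies (i). To see that $\{(\alpha_n(X,Z), \beta_n(Y,Z))\}$ also satisfies (ii), for $(\alpha, \beta)$ in $S^*$, define

$$(\alpha_n^{**}, \beta_n^{**}) = \begin{cases} (\alpha, \beta) & \text{if } E(\alpha(X,Z)\beta(Y,Z)|Z) > \lim_{n\to\infty} E(\alpha_n(X,Z)\beta_n(Y,Z)|Z); \\ (\alpha_n, \beta_n) & \text{otherwise.} \end{cases}$$

Then

$$E(\alpha_n^{**}(X,Z)\beta_n^{**}(Y,Z)|Z) = \max\{E(\alpha(X,Z)\beta(Y,Z)|Z),\; E(\alpha_n(X,Z)\beta_n(Y,Z)|Z)\}$$

and

$$E(\alpha_n^{**}(X,Z)\beta_n^{**}(Y,Z)) = \sup_{(f,g)\in S^*} E(f(X,Z)g(Y,Z)) = E(\alpha_n(X,Z)\beta_n(Y,Z)),$$

so $E(\alpha_n^{**}(X,Z)\beta_n^{**}(Y,Z)|Z) = E(\alpha_n(X,Z)\beta_n(Y,Z)|Z)$ almost surely and (ii) holds. The proof of Fact 1 is complete.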

With Definition 2, $\rho_1(X,Y|Z)$ can be redefined as follows:

Definition 3. $\rho_1(X,Y|Z) = \sup_{(f,g)\in S} E(f(X,Z)g(Y,Z)|Z)$.

Note that if the maximizer $(f_1, g_1)$ in Definition 1 exists, then $\rho_1(X,Y|Z) = E(f_1(X,Z)g_1(Y,Z)|Z)$ under Definition 3. Therefore, the definition of $\rho_1(X,Y|Z)$ in Definition 3 can be viewed as a generalized version of that in Definition 1. It might be possible to define the $\rho_k(X,Y|Z)$'s for $k \ge 2$ without assuming the existence of the $(f_i, g_i)$'s in Definition 1. However, that definition is currently under construction and is not reported here. Below are some remarks on the $\rho_k(X,Y|Z)$'s.

1. The $\rho_k(X,Y|Z)$'s satisfy Properties P1 and P2 and are between 0 and 1. $\rho_1(X,Y|Z)$ satisfies Property P3. That is, when $X$ and $Y$ are conditionally independent given $Z$, $\rho_1(X,Y|Z) = 0$. When $X$ is a function of $Y$ and $Z$ or $Y$ is a function of $X$ and $Z$, $\rho_1(X,Y|Z) = 1$.

2. When $Z$ is a constant vector, the $\rho_k(X,Y|Z)$'s are the canonical coefficients in Dauxois and Nkiet (1998).

3. It is stated in Dauxois and Nkiet (1998) that when the joint distribution of $X$ and $Y$ is bivariate normal $N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right)$, the first canonical coefficient $\rho_1(X,Y) = |\rho|$. This result implies that, when the joint distribution of $X$, $Y$ and $Z$ is multivariate normal and $X$ and $Y$ are both univariate,

$$\rho_1(X,Y|Z) = \frac{\bigl|E\bigl((X - E(X|Z))(Y - E(Y|Z))\,\big|\,Z\bigr)\bigr|}{\bigl(E((X - E(X|Z))^2|Z)\bigr)^{1/2}\bigl(E((Y - E(Y|Z))^2|Z)\bigr)^{1/2}} = \frac{\bigl|E\bigl((X - E(X|Z))(Y - E(Y|Z))\bigr)\bigr|}{\bigl(E(X - E(X|Z))^2\bigr)^{1/2}\bigl(E(Y - E(Y|Z))^2\bigr)^{1/2}},$$

which also equals the absolute value of the usual partial correlation coefficient.
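As a small numerical check of Remark 3, the sketch below assumes that $Z$ is univariate and that $(X, Y, Z)$ is trivariate normal with zero means and unit variances, so that $E(X|Z) = r_{xz} Z$ and $E(Y|Z) = r_{yz} Z$. The function name and the correlation values are illustrative assumptions, not part of the report.

```python
import math

def rho1_gaussian(r_xy, r_xz, r_yz):
    """|partial correlation of X and Y given Z| for unit-variance trivariate normal (X, Y, Z)."""
    # Residuals: X - E(X|Z) = X - r_xz * Z and Y - E(Y|Z) = Y - r_yz * Z.
    cov_res = r_xy - r_xz * r_yz      # E[(X - r_xz Z)(Y - r_yz Z)]
    var_x_res = 1.0 - r_xz ** 2       # E[(X - r_xz Z)^2]
    var_y_res = 1.0 - r_yz ** 2       # E[(Y - r_yz Z)^2]
    return abs(cov_res) / math.sqrt(var_x_res * var_y_res)

# Z uncorrelated with X and Y: the value reduces to |rho| as in the bivariate case.
print(rho1_gaussian(0.5, 0.0, 0.0))
# r_xy = r_xz * r_yz: X and Y are conditionally independent given Z.
print(rho1_gaussian(0.25, 0.5, 0.5))
```

In the second call the residual covariance vanishes, which matches Property P3: for jointly normal variables, zero partial correlation is the same as conditional independence.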

3 A Test of Conditional Independence

Testing conditional independence is equivalent to testing whether $E(\rho_1(X,Y|Z)) = 0$. In this section, an estimator for $E(\rho_1(X,Y|Z))$ is proposed, and its asymptotic distribution is derived to give a test of conditional independence.

3.1 Estimation of $E(\rho_1(X,Y|Z))$

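An estimator for $E(\rho_1(X,Y|Z))$ can be constructed using basis approximation. First, suppose that there exist basis functions $\{\phi_{p,i} : 1 \le i \le p, p \ge 1\}$, $\{\psi_{q,j} : 1 \le j \le q, q \ge 1\}$ and $\{\theta_{r,k} : 1 \le k \le r, r \ge 1\}$ such that for $\{p_n\}$ and $\{q_n\}$ with $\lim_{n\to\infty} p_n = \infty$ and $\lim_{n\to\infty} q_n = \infty$,

$$\lim_{n\to\infty} \inf_{p \le p_n,\, r \le r_n,\, \alpha_{p,r,i,k}} E\Bigl(\alpha(X,Z) - \sum_{1\le i\le p,\,1\le k\le r} \alpha_{p,r,i,k}\,\phi_{p,i}(X)\,\theta_{r,k}(Z)\Bigr)^2 = 0 \tag{2}$$

and

$$\lim_{n\to\infty} \inf_{q \le q_n,\, r \le r_n,\, \beta_{q,r,j,k}} E\Bigl(\beta(Y,Z) - \sum_{1\le j\le q,\,1\le k\le r} \beta_{q,r,j,k}\,\psi_{q,j}(Y)\,\theta_{r,k}(Z)\Bigr)^2 = 0 \tag{3}$$

for any $\alpha(X,Z)$ and $\beta(Y,Z)$ with finite second moments. Furthermore, it is assumed that for each $(p,q)$, there exist coefficients $\alpha_{p,i}$ and $\beta_{q,j}$ such that

$$1 = \sum_{1\le i\le p} \alpha_{p,i}\,\phi_{p,i}(X) \quad\text{and}\quad 1 = \sum_{1\le j\le q} \beta_{q,j}\,\psi_{q,j}(Y).$$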

Then $\rho_1(X,Y|Z)$ can be approximated by $\sup_{(f,g)\in S_{p_n,q_n}} E(f(X,Z)g(Y,Z)|Z)$, where

$$S_{p_n,q_n} = \Bigl\{(f,g) \in S : f(X,Z) = \sum_{i=1}^{p_n} \alpha_i(Z)\,\phi_{p_n,i}(X) \text{ and } g(Y,Z) = \sum_{j=1}^{q_n} \beta_j(Z)\,\psi_{q_n,j}(Y)\Bigr\}.$$

Denote the supremum by $\rho_{p_n,q_n}(Z)$. The specific approximation result is stated as follows.

Fact 2 Suppose that $\lim_{n\to\infty} p_n = \infty$ and $\lim_{n\to\infty} q_n = \infty$. Then

$$\lim_{n\to\infty} E\bigl(|\rho_1(X,Y|Z) - \rho_{p_n,q_n}(Z)|\bigr) = 0.$$

The proof of Fact 2 follows from the approximation properties (2) and (3):

$$\rho_1(X,Y|Z) \approx E(f(X,Z)g(Y,Z)|Z) = \operatorname{corr}(f(X,Z), g(Y,Z)|Z) \approx \operatorname{corr}(f^*(X,Z), g^*(Y,Z)|Z) \le \rho_{p_n,q_n}(Z)$$

for some $(f,g) \in S$ and some $(f^*, g^*) \in S_{p_n,q_n}$. Based on Fact 2, it is reasonable to estimate $\rho_1(X,Y|Z)$ using an estimator for $\rho_{p_n,q_n}(Z)$. To make such


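is equal to some continuous function of $Z$ almost surely. Then an estimator for $\rho_{p_n,q_n}(z)$ and $\rho_1(X,Y|Z=z)$ is

$$\max_{\{\alpha_i\}_{i=1}^{p_n},\,\{\beta_j\}_{j=1}^{q_n}} \sum_{i,j} \alpha_i\beta_j\Bigl(\hat E(\phi_i(X)\psi_j(Y)|Z=z) - \hat E(\phi_i(X)|Z=z)\,\hat E(\psi_j(Y)|Z=z)\Bigr),$$

where the maximum is taken over all $\{\alpha_i\}_{i=1}^{p_n}$ and $\{\beta_j\}_{j=1}^{q_n}$ such that

$$\sum_{1\le i\le p_n,\,1\le j\le p_n} \alpha_i\alpha_j\Bigl(\hat E(\phi_i(X)\phi_j(X)|Z=z) - \hat E(\phi_i(X)|Z=z)\,\hat E(\phi_j(X)|Z=z)\Bigr) = 1$$

and

$$\sum_{1\le i\le q_n,\,1\le j\le q_n} \beta_i\beta_j\Bigl(\hat E(\psi_i(Y)\psi_j(Y)|Z=z) - \hat E(\psi_i(Y)|Z=z)\,\hat E(\psi_j(Y)|Z=z)\Bigr) = 1.$$

Here

$$\hat E[g(X,Y)|Z=z] = \frac{\sum_{i=1}^{N_n} g(X_i,Y_i)\,k_h(z - Z_i)}{\sum_{i=1}^{N_n} k_h(z - Z_i)},$$

where $N_n \to \infty$ as $n \to \infty$; for $z = (z^{(1)}, \ldots, z^{(d)})$, $k_h(z) = \prod_{j=1}^{d} h^{-1} k_0(z^{(j)}/h)$, and $k_0$ is a symmetric probability density function on $\mathbb{R}$. Denote the above estimator for $\rho_1(X,Y|Z=z)$ by $\hat\rho(z)$. Then $N_{\rho,n}^{-1} \sum_{i=N_n+1}^{N_n+N_{\rho,n}} \hat\rho(Z_i)$ is an estimator for $E(\rho_1(X,Y|Z))$, where $N_{\rho,n} \to \infty$ as $n \to \infty$.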
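The kernel-weighted average $\hat E[g(X,Y)|Z=z]$ can be sketched as follows. The choice of the standard normal density for $k_0$, the function names, and the synthetic data are assumptions made for illustration; the report only requires $k_0$ to be a symmetric probability density.

```python
import numpy as np

def gaussian_k0(u):
    """Standard normal density, one admissible choice of the symmetric pdf k_0."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def kernel_cond_mean(values, Z, z, h):
    """Kernel-weighted average of `values` at Z = z using the product kernel k_h.

    values: shape (N,) or (N, m), the evaluations g(X_i, Y_i)
    Z:      shape (N,) or (N, d), the conditioning sample
    z:      scalar or shape (d,), the evaluation point
    h:      positive bandwidth
    """
    values = np.asarray(values, dtype=float)
    Z = np.asarray(Z, dtype=float).reshape(len(values), -1)
    u = (np.asarray(z, dtype=float).reshape(1, -1) - Z) / h
    weights = np.prod(gaussian_k0(u) / h, axis=1)   # k_h(z - Z_i)
    return weights @ values / weights.sum()

# Synthetic illustration: E(X | Z = z) = z, so the estimate at z = 0.5 should be near 0.5.
rng = np.random.default_rng(0)
Z_sample = rng.uniform(0.0, 1.0, size=(5000, 1))
X_sample = Z_sample[:, 0] + 0.1 * rng.standard_normal(5000)
print(kernel_cond_mean(X_sample, Z_sample, [0.5], h=0.1))
```

Passing a matrix of basis evaluations as `values` returns all the conditional means needed for the estimator in one call.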

The estimator $\hat\rho(z)$ can be obtained using the singular value decomposition (SVD), which makes the computation easy.
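The report does not spell out the SVD computation. One standard route, assumed here, is the following: maximizing $a^T C_{xy} b$ subject to $a^T C_{xx} a = 1 = b^T C_{yy} b$ gives the largest singular value of $C_{xx}^{-1/2} C_{xy} C_{yy}^{-1/2}$, where the $C$'s are the kernel-weighted covariance matrices of the basis evaluations. Because the constant function lies in the span of each basis, $C_{xx}$ and $C_{yy}$ are singular, so the sketch uses a pseudo-inverse square root. All function names, the tolerance, and the example data are illustrative assumptions.

```python
import numpy as np

def inv_sqrt_psd(A, tol=1e-10):
    """Pseudo-inverse square root of a symmetric PSD matrix, dropping near-null directions."""
    w, V = np.linalg.eigh((A + A.T) / 2.0)
    keep = w > tol * max(w.max(), 1.0)
    return (V[:, keep] / np.sqrt(w[keep])) @ V[:, keep].T

def first_canonical_coefficient(Cxx, Cyy, Cxy):
    """Maximum of a' Cxy b subject to a' Cxx a = 1 = b' Cyy b, via the SVD."""
    M = inv_sqrt_psd(Cxx) @ Cxy @ inv_sqrt_psd(Cyy)
    return np.linalg.svd(M, compute_uv=False)[0]

def rho_hat(Phi, Psi, weights):
    """Sketch of rho-hat(z) from basis evaluations and kernel weights k_h(z - Z_i).

    Phi: shape (N, p), rows phi_i(X_n); Psi: shape (N, q), rows psi_j(Y_n).
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    Phi_c = Phi - w @ Phi          # subtract weighted conditional means
    Psi_c = Psi - w @ Psi
    Cxx = (Phi_c * w[:, None]).T @ Phi_c
    Cyy = (Psi_c * w[:, None]).T @ Psi_c
    Cxy = (Phi_c * w[:, None]).T @ Psi_c
    return first_canonical_coefficient(Cxx, Cyy, Cxy)

# Illustration with a quadratic polynomial basis and Gaussian kernel weights (all hypothetical).
rng = np.random.default_rng(0)
Z = rng.uniform(0.0, 1.0, 2000)
X = Z + rng.standard_normal(2000)
Y = X ** 2 + 0.5 * rng.standard_normal(2000)   # nonlinear dependence on X given Z
Phi = np.column_stack([np.ones_like(X), X, X ** 2])
Psi = np.column_stack([np.ones_like(Y), Y, Y ** 2])
weights = np.exp(-0.5 * ((0.5 - Z) / 0.2) ** 2)  # kernel constants cancel after normalization
print(rho_hat(Phi, Psi, weights))
```

With the bases $(1, x)$ and $(1, y)$ and equal weights, this reduces to the absolute value of the ordinary sample correlation, which is a convenient sanity check.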

3.2 Asymptotic distribution of the estimator and test of conditional independence

To build a test for conditional independence based on the estimator $\hat\rho$, it is necessary to derive the asymptotic distribution of the estimator under the conditional independence hypothesis. An asymptotic property of the estimator $\hat\rho(z)$ is given in the following theorem.

Theorem 1 Suppose that $p_n = p$ and $q_n = q$ do not depend on $n$. Then $\sqrt{N_n h_n^d}\,(\hat\rho(z) - \rho_{p,q}(z))$ converges in distribution as $n \to \infty$. Furthermore, if $X$ and $Y$ are conditionally independent given $Z$, then $N_n h_n^d c_K \hat\rho^2(z)$ converges in distribution to the maximum eigenvalue of $CC^T$ as $n \to \infty$, where $c_K = f(z)/\int k_0^2(s)\,ds$, $f$ is the pdf of $Z$ and $C$ is a $p \times q$ matrix with 0's in the first row and first column and whose other elements are IID $N(0,1)$.
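The limiting null distribution in Theorem 1 has no simple closed form for general $(p, q)$, but its quantiles can be approximated by simulation. The sketch below draws the matrix $C$ as described in the theorem and records the largest eigenvalue of $CC^T$; the function name, number of draws, and seed are illustrative assumptions.

```python
import numpy as np

def simulate_max_eigenvalue(p, q, n_draws=10000, seed=0):
    """Draws of the largest eigenvalue of C C^T, with C as in Theorem 1."""
    rng = np.random.default_rng(seed)
    C = rng.standard_normal((n_draws, p, q))   # IID N(0, 1) entries
    C[:, 0, :] = 0.0                           # zero first row
    C[:, :, 0] = 0.0                           # zero first column
    # The largest eigenvalue of C C^T is the squared largest singular value of C.
    s = np.linalg.svd(C, compute_uv=False)     # shape (n_draws, min(p, q)), sorted descending
    return s[:, 0] ** 2

draws = simulate_max_eigenvalue(p=3, q=3)
critical_value = np.quantile(draws, 0.95)      # approximate 5%-level critical value
print(critical_value)
```

Under the theorem, a pointwise test at a given $z$ would compare $N_n h_n^d c_K \hat\rho^2(z)$ with such a quantile; since $c_K$ involves the unknown density $f(z)$, it would have to be estimated in practice. For $p = q = 2$ the limit is the square of a single standard normal variable, that is, a chi-square distribution with one degree of freedom.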

The proof of Theorem 1 requires only minor modification of Lemma 7.1 in Dauxois and Nkiet (1998) and is left out. Also, the asymptotic joint distribution of the estimators of certain conditional expectations is needed, which is taken directly from the lecture notes by James Powell titled “Notes on Nonparametric Regression Estimation”, which is available at


The asymptotic distribution of the test statistic $N_{\rho,n}^{-1} \sum_{i=N_n+1}^{N_n+N_{\rho,n}} \hat\rho(Z_i)$ is normal, with mean and variance equal to the mean and variance of $\hat\rho(Z_1)$, which can be approximated using estimators from the bootstrap or the asymptotic distribution of $\hat\rho(z)$. However, if $p_n$ and $q_n$ both tend to $\infty$, then the distribution of the maximum eigenvalue of the random matrix $CC^T$ does not converge. Therefore, a more general version of Theorem 1 needs to be derived in order to understand the behavior of $\hat\rho(z)$ when both $p_n$ and $q_n$ tend to $\infty$.

References

[1] Baba, K., Shibata, R. and Sibuya, M. (2004). Australian and New Zealand Journal of Statistics 46 657–664.

[2] Daudin, J. J. (1980). Biometrika 67 581–590.

[3] Dauxois, J. and Nkiet, G. M. (1998). Annals of Statistics 26 1254–1278.

[4] Delgado, M. A. and González-Manteiga, W. (2001). Annals of Statistics 1469–1507.

[5] Dossou-Gbete, S. and Pousse, A. (1991). Statistics 22 479–491.

[6] Huang, S.-Y., Lee, M.-H. and Hsiao, C. K. (2006). Draft.

[7] Johnstone, I. (2001). Annals of Statistics 29 295–327.

[8] Lawrance, A. J. (1976). The American Statistician 30 146–149.

[9] Li, L., Cook, R. D. and Nachtsheim, C. J. (2005). Journal of the Royal Statistical Society. Series B 67 285–299.

[10] Linton, O. and Gozalo, P. (1997). Discussion paper, Cowles Foundation for Research in Economics, Yale University.

[11] Muirhead, R. J. and Waternaux, C. M. (1980). Biometrika 67 31–43.

[12] Romanovič, V. A. (1975). Izvestiya Vysshikh Uchebnykh Zavedeniǐ Matematika 10 94–96.

[13] Roussas, G. G. and Tran, L. T. (1992). Annals of Statistics 20 98–120.

[14] Schuster, E. F. (1972). Annals of Mathematical Statistics 43 84–88.

[15] Su, L. and White, H. (2005). Submitted to Econometric Theory.

[16] Su, L. and White, H. (2006). Submitted to Journal of Econometrics.

[17] Waternaux, C. M. (1976). Biometrika 63 639–645.
