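National Science Council, Executive Yuan: Research Project Final Report
Measures of Conditional Association and Tests of Conditional Independence
Research Results Report (Condensed Version)
Project type: Individual project
Project number: NSC 95-2119-M-004-001-
Project period: October 1, 2006 to July 31, 2007
Host institution: Department of Statistics, National Chengchi University
Principal investigator: 黃子銘
Project participants: Master's student, part-time assistant: 歐陽致平
Availability: This project report is open to public inquiry
October 22, 2007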
1 Introduction
Conditional independence tests have a variety of applications. For example, for variable selection in a regression model, Li, Cook and Nachtsheim (2005) proposed testing the independence between the response variable and a predictor variable given the other predictors, and removing the predictor variable when the test is not significant. Su and White (2005) pointed out that conditional independence tests can be used for testing Granger non-causality between two time series and for choosing a proper model within a certain family of semi-parametric models.
When the variables involved are discrete, many tests for conditional independence exist. For continuous variables, there are relatively few results. Li et al. (2005) proposed a test of conditional independence that is constructed by projecting two variables onto the space generated by the conditioning variable and then testing the independence between the residuals. Su and White (2005, 2006) proposed tests based on a weighted Hellinger distance between the conditional densities or on the difference between the conditional characteristic functions.
It is desirable to construct a test of conditional independence of two random vectors X and Y given a random vector Z based on some measure of conditional association, where the measure of conditional association has the following properties:
P1 The measure can be defined for any type of random vector, including both discrete and continuous variables.
P2 The measure is invariant when one-to-one transforms are applied to each vector.
P3 The measure is between 0 and 1, where 0 corresponds to independence and 1 corresponds to full dependence.
Part or all of Properties P1 - P3 have been considered by various authors in different contexts. Some examples are as follows.
Romanovič (1975) defined the maximum partial correlation between two σ-fields given a third σ-field. According to Romanovič's definition, the maximum partial correlation between $\sigma(X)$ and $\sigma(Y)$ given $\sigma(Z)$ is
\[ \sup_{f,g} \operatorname{corr}\bigl(f(X,Z) - E(f(X,Z)\mid Z),\ g(Y,Z) - E(g(Y,Z)\mid Z)\bigr), \]
where $\sigma(X)$ denotes the σ-field generated by the random vector $X$. The maximum partial correlation between $\sigma(X)$ and $\sigma(Y)$ given $\sigma(Z)$ can serve as a measure of conditional association of $X$ and $Y$ given $Z$, and it satisfies Properties P1 - P3.
Su and White (2005) proposed a test of conditional independence based on a test statistic that is a weighted Hellinger distance between the conditional density of X given Z and the conditional density of Y given Z. Such a statistic can serve as a measure of conditional association, and they chose the Hellinger distance so that the test statistic has the invariance property P2.
Dauxois and Nkiet (1998) proposed using the canonical coefficients obtained in nonlinear canonical analysis (NLCA) to construct measures of association and tests of independence. The following is a straightforward extension of Dauxois and Nkiet's (1998) definition of the canonical coefficients to the conditional case.
Definition 1. Suppose that there exist pairs of functions $(f_i, g_i)$, $i = 0, 1, \ldots$, such that for each $i$, $(f_i, g_i)$ is a pair of functions $(f, g)$ that maximizes $E(f(X,Z)g(Y,Z)\mid Z)$ subject to $E(f^2(X,Z)\mid Z) = 1 = E(g^2(Y,Z)\mid Z)$ and $E(f(X,Z)f_j(X,Z)\mid Z) = 0 = E(g(Y,Z)g_j(Y,Z)\mid Z)$ for $j < i$. Define $\rho_i(X,Y\mid Z) = E(f_i(X,Z)g_i(Y,Z)\mid Z)$ for each $i$. The $\rho_i(X,Y\mid Z)$'s will be referred to as canonical coefficients.
Suppose that the $(f_i, g_i)$'s in Definition 1 exist. Then a proper combination of the $\rho_i(X,Y\mid Z)$'s can give a measure of conditional association. Some examples of such a combination are $\rho_1(X,Y\mid Z)$ and $-\sum_k \log(1 - \rho_k^2(X,Y\mid Z))$, whose unconditional counterparts are two commonly used measures of association, as mentioned in Huang, Lee and Hsiao (2006).
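As a small illustration of the two combinations above, the following sketch evaluates them for a list of canonical coefficients at a single value of Z. The function names and the numerical values are hypothetical and chosen only for illustration; the coefficients are assumed to be given in decreasing order with index starting at 1.

```python
import math


def first_coefficient(rhos):
    """Return rho_1, assuming rhos = [rho_1, rho_2, ...] is in decreasing order."""
    return rhos[0]


def log_measure(rhos):
    """Return -sum_k log(1 - rho_k^2); infinite if some rho_k equals 1."""
    if any(r >= 1.0 for r in rhos):
        return math.inf
    return -sum(math.log(1.0 - r * r) for r in rhos)


# Hypothetical canonical coefficients rho_1 >= rho_2 >= rho_3 at one value of Z.
rhos = [0.8, 0.5, 0.1]
rho_1 = first_coefficient(rhos)
combined = log_measure(rhos)
```

Both quantities are zero when every coefficient is zero, and the second grows without bound as any coefficient approaches 1.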
Among the various approaches for constructing measures of conditional association described above, the NLCA approach offers the most flexibility. The objective of this project is to construct a test of conditional independence based on measures of conditional association from NLCA. However, it is not clear what conditions need to be added to guarantee the existence of the $(f_i, g_i)$'s. In this report, an alternative definition of the canonical coefficient $\rho_1(X,Y\mid Z)$ is provided so that such conditions do not have to be found.
2 Measures of Conditional Association
In this section, the canonical coefficients $\rho_i(X,Y\mid Z)$ are defined using a new approach. In Definition 1, it is clear that $\rho_0(X,Y\mid Z) = 1$ with $f_0(X,Z) = 1 = g_0(Y,Z)$. Therefore, only the $\rho_i(X,Y\mid Z)$'s with $i \ge 1$ will be defined again, and the new definitions involve only pairs of functions in
\[ S = \{(f,g) : E(f^2(X,Z)\mid Z) = 1 = E(g^2(Y,Z)\mid Z) \text{ and } E(f(X,Z)\mid Z) = 0 = E(g(Y,Z)\mid Z)\}. \]
As mentioned in Section 1, the purpose of introducing alternative definitions for the $\rho_i(X,Y\mid Z)$'s is to avoid dealing with the existence of the maximizers $(f_i, g_i)$. To achieve this goal, the maxima of certain conditional expectations in the original definitions of the $\rho_i(X,Y\mid Z)$'s will be replaced by suitable suprema. In particular, $\sup_{(f,g)\in S^*} E(f(X,Z)g(Y,Z)\mid Z)$ needs to be defined.
Fact 1 For $S^* \subset S$, there exists a sequence $\{(\alpha_n, \beta_n)\}$ in $S^*$ such that
(i) the sequence $\{E(\alpha_n(X,Z)\beta_n(Y,Z)\mid Z)\}$ is increasing (not necessarily strictly), and
(ii) for every $(f,g) \in S^*$, $E(f(X,Z)g(Y,Z)\mid Z) \le \lim_{n\to\infty} E(\alpha_n(X,Z)\beta_n(Y,Z)\mid Z)$ almost surely.
Furthermore, if (i) and (ii) hold both for $\{(\alpha_n, \beta_n)\} = \{(\alpha_{n,1}, \beta_{n,1})\}$ and for $\{(\alpha_n, \beta_n)\} = \{(\alpha_{n,2}, \beta_{n,2})\}$, then
\[ \lim_{n\to\infty} E(\alpha_{n,1}(X,Z)\beta_{n,1}(Y,Z)\mid Z) = \lim_{n\to\infty} E(\alpha_{n,2}(X,Z)\beta_{n,2}(Y,Z)\mid Z) \tag{1} \]
almost surely.
Fact 1 allows one to define $\sup_{(f,g)\in S^*} E(f(X,Z)g(Y,Z)\mid Z)$:
Definition 2. For $S^* \subset S$,
\[ \sup_{(f,g)\in S^*} E(f(X,Z)g(Y,Z)\mid Z) = \lim_{n\to\infty} E(\alpha_n(X,Z)\beta_n(Y,Z)\mid Z), \]
where $\{(\alpha_n, \beta_n)\}$ is a sequence in $S^*$ that satisfies (i) and (ii) in Fact 1.
Proof of Fact 1. First, note that (1) holds because for every $n$,
\[ E(\alpha_{n,2}(X,Z)\beta_{n,2}(Y,Z)\mid Z) \le \lim_{n\to\infty} E(\alpha_{n,1}(X,Z)\beta_{n,1}(Y,Z)\mid Z) \]
and
\[ E(\alpha_{n,1}(X,Z)\beta_{n,1}(Y,Z)\mid Z) \le \lim_{n\to\infty} E(\alpha_{n,2}(X,Z)\beta_{n,2}(Y,Z)\mid Z) \]
almost surely. It remains to find a sequence $\{(\alpha_n, \beta_n)\}$ that satisfies (i) and (ii).
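Let $\{(\alpha_n^*(X,Z), \beta_n^*(Y,Z))\}$ be a sequence in $S^*$ such that $E(\alpha_n^*(X,Z)\beta_n^*(Y,Z))$ increases to $\sup_{(f,g)\in S^*} E(f(X,Z)g(Y,Z))$.
Let $(\alpha_1(X,Z), \beta_1(Y,Z)) = (\alpha_1^*(X,Z), \beta_1^*(Y,Z))$, and for $n \ge 2$, define
\[ (\alpha_n(X,Z), \beta_n(Y,Z)) = \begin{cases} (\alpha_n^*(X,Z), \beta_n^*(Y,Z)) & \text{if } E(\alpha_n^*(X,Z)\beta_n^*(Y,Z)\mid Z) > E(\alpha_{n-1}(X,Z)\beta_{n-1}(Y,Z)\mid Z); \\ (\alpha_{n-1}(X,Z), \beta_{n-1}(Y,Z)) & \text{otherwise.} \end{cases} \]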
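Then $\{(\alpha_n(X,Z), \beta_n(Y,Z))\}$ is a sequence in $S^*$ that satisfies (i). To see that $\{(\alpha_n(X,Z), \beta_n(Y,Z))\}$ also satisfies (ii), for $(\alpha, \beta)$ in $S^*$, define
\[ (\alpha_n^{**}, \beta_n^{**}) = \begin{cases} (\alpha, \beta) & \text{if } E(\alpha(X,Z)\beta(Y,Z)\mid Z) > E(\alpha_n(X,Z)\beta_n(Y,Z)\mid Z); \\ (\alpha_n, \beta_n) & \text{otherwise.} \end{cases} \]
Then
\[ E(\alpha_n^{**}(X,Z)\beta_n^{**}(Y,Z)\mid Z) = \max\{E(\alpha(X,Z)\beta(Y,Z)\mid Z),\ E(\alpha_n(X,Z)\beta_n(Y,Z)\mid Z)\}. \]
Since $E(\alpha_n^*(X,Z)\beta_n^*(Y,Z)) \le E(\alpha_n(X,Z)\beta_n(Y,Z)) \le E(\alpha_n^{**}(X,Z)\beta_n^{**}(Y,Z)) \le \sup_{(f,g)\in S^*} E(f(X,Z)g(Y,Z))$ for every $n$,
\[ \lim_{n\to\infty} E(\alpha_n^{**}(X,Z)\beta_n^{**}(Y,Z)) = \sup_{(f,g)\in S^*} E(f(X,Z)g(Y,Z)) = \lim_{n\to\infty} E(\alpha_n(X,Z)\beta_n(Y,Z)), \]
so $\lim_{n\to\infty} E(\alpha_n^{**}(X,Z)\beta_n^{**}(Y,Z)\mid Z) = \lim_{n\to\infty} E(\alpha_n(X,Z)\beta_n(Y,Z)\mid Z)$ almost surely and (ii) holds. The proof of Fact 1 is complete.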
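The pasting step in the proof can be illustrated with a toy case in which Z takes finitely many values, so that each candidate pair is summarized by its vector of conditional expectations. The sketch below is only an illustration under that assumption; the function name and the numbers are hypothetical. At each value of Z it keeps the previous pair unless the new candidate has a strictly larger conditional expectation, which is the case distinction used to build the sequence of pairs in the proof.

```python
def paste_running_max(cond_exp):
    """Conditional expectations of the pasted sequence.

    cond_exp[k][z] stands for E(alpha*_k beta*_k | Z = z) of candidate k.
    Row n of the result stands for E(alpha_n beta_n | Z = z).
    """
    pasted = [list(cond_exp[0])]
    for row in cond_exp[1:]:
        prev = pasted[-1]
        # Switch to the new candidate only where it is strictly better.
        pasted.append([new if new > old else old for new, old in zip(row, prev)])
    return pasted


# Hypothetical values for three candidate pairs and Z in {z0, z1, z2}.
cond_exp = [
    [0.2, 0.5, 0.1],
    [0.4, 0.3, 0.1],
    [0.1, 0.6, 0.3],
]
pasted = paste_running_max(cond_exp)
limit = pasted[-1]
```

In this finite example the pasted rows are increasing at every value of Z, and the last row dominates every candidate, which mirrors conditions (i) and (ii).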
With Definition 2, $\rho_1(X,Y\mid Z)$ can be redefined as follows:
Definition 3. $\rho_1(X,Y\mid Z) = \sup_{(f,g)\in S} E(f(X,Z)g(Y,Z)\mid Z)$.
Note that if the maximizer $(f_1, g_1)$ in Definition 1 exists, then $\rho_1(X,Y\mid Z) = E(f_1(X,Z)g_1(Y,Z)\mid Z)$ under Definition 3. Therefore, the definition of $\rho_1(X,Y\mid Z)$ in Definition 3 can be viewed as a generalized version of that in Definition 1.
It might be possible to define the $\rho_k(X,Y\mid Z)$'s for $k \ge 2$ without assuming the existence of the $(f_i, g_i)$'s in Definition 1. However, such a definition is still under construction and is not reported here. Below are some remarks on the $\rho_k(X,Y\mid Z)$'s.
1. The $\rho_k(X,Y\mid Z)$'s satisfy Properties P1 and P2 and are between 0 and 1. $\rho_1(X,Y\mid Z)$ satisfies Property P3. That is, when $X$ and $Y$ are conditionally independent given $Z$, $\rho_1(X,Y\mid Z) = 0$. When $X$ is a function of $Y$ and $Z$, or $Y$ is a function of $X$ and $Z$, $\rho_1(X,Y\mid Z) = 1$.
2. When $Z$ is a constant vector, the $\rho_k(X,Y\mid Z)$'s are the canonical coefficients in Dauxois and Nkiet (1998).
3. It is stated in Dauxois and Nkiet (1998) that when the joint distribution of $X$ and $Y$ is bivariate normal $N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right)$, the first canonical coefficient is $\rho_1(X,Y) = |\rho|$. This result implies that, when the joint distribution of $X$, $Y$ and $Z$ is multivariate normal and $X$ and $Y$ are both univariate,
\[ \rho_1(X,Y\mid Z) = \frac{\bigl|E\bigl((X - E(X\mid Z))(Y - E(Y\mid Z))\mid Z\bigr)\bigr|}{\bigl(E((X - E(X\mid Z))^2\mid Z)\bigr)^{1/2}\bigl(E((Y - E(Y\mid Z))^2\mid Z)\bigr)^{1/2}} = \frac{\bigl|E\bigl((X - E(X\mid Z))(Y - E(Y\mid Z))\bigr)\bigr|}{\bigl(E(X - E(X\mid Z))^2\bigr)^{1/2}\bigl(E(Y - E(Y\mid Z))^2\bigr)^{1/2}}, \]
which also equals the absolute value of the usual partial correlation coefficient.
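The Gaussian case in remark 3 can be checked numerically. The sketch below assumes a trivariate normal (X, Y, Z) with scalar Z and uses the standard fact that E(X|Z) is then linear in Z, so the residual covariance is σ_XY − σ_XZ σ_YZ / σ_ZZ. The function names and the correlation values are hypothetical, chosen only so that the correlation matrix is positive definite.

```python
import math


def residual_corr(sxx, syy, szz, sxy, sxz, syz):
    """Correlation of X - E(X|Z) and Y - E(Y|Z) for jointly normal (X, Y, Z)."""
    cov_res = sxy - sxz * syz / szz      # covariance of the two residuals
    var_x_res = sxx - sxz ** 2 / szz     # variance of X - E(X|Z)
    var_y_res = syy - syz ** 2 / szz     # variance of Y - E(Y|Z)
    return cov_res / math.sqrt(var_x_res * var_y_res)


def partial_corr(rxy, rxz, ryz):
    """Usual partial correlation of X and Y given Z from pairwise correlations."""
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Hypothetical correlations for a trivariate normal with unit variances.
rxy, rxz, ryz = 0.2, 0.6, 0.7
rho_1 = abs(residual_corr(1.0, 1.0, 1.0, rxy, rxz, ryz))
```

With unit variances the residual-based ratio and the usual partial correlation formula agree, and the ratio vanishes when the correlation of X and Y equals the product of their correlations with Z, which is conditional independence in the Gaussian case.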
3 A Test of Conditional Independence
Testing conditional independence is equivalent to testing whether $E(\rho_1(X,Y\mid Z)) = 0$. In this section, an estimator for $E(\rho_1(X,Y\mid Z))$ is proposed, and its asymptotic distribution is derived to give a test of conditional independence.
3.1 Estimation of $E(\rho_1(X,Y\mid Z))$
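An estimator for $E(\rho_1(X,Y\mid Z))$ can be constructed using basis approximation. First, suppose that there exist basis functions $\{\phi_{p,i} : 1 \le i \le p,\ p \ge 1\}$, $\{\psi_{q,j} : 1 \le j \le q,\ q \ge 1\}$ and $\{\theta_{r,k} : 1 \le k \le r,\ r \ge 1\}$ such that for sequences $\{p_n\}$, $\{q_n\}$ and $\{r_n\}$ with $\lim_{n\to\infty} p_n = \infty$ and $\lim_{n\to\infty} q_n = \infty$,
\[ \lim_{n\to\infty}\ \inf_{p \le p_n,\ r \le r_n,\ \alpha_{p,r,i,k}} E\Bigl(\alpha(X,Z) - \sum_{1\le i\le p,\ 1\le k\le r} \alpha_{p,r,i,k}\,\phi_{p,i}(X)\,\theta_{r,k}(Z)\Bigr)^2 = 0 \tag{2} \]
and
\[ \lim_{n\to\infty}\ \inf_{q \le q_n,\ r \le r_n,\ \beta_{q,r,j,k}} E\Bigl(\beta(Y,Z) - \sum_{1\le j\le q,\ 1\le k\le r} \beta_{q,r,j,k}\,\psi_{q,j}(Y)\,\theta_{r,k}(Z)\Bigr)^2 = 0 \tag{3} \]
for any $\alpha(X,Z)$ and $\beta(Y,Z)$ with finite second moments. Furthermore, it is assumed that for each $(p, q)$, there exist coefficients $\alpha_{p,i}$ and $\beta_{q,j}$ such that
\[ 1 = \sum_{1\le i\le p} \alpha_{p,i}\,\phi_{p,i}(X) \quad\text{and}\quad 1 = \sum_{1\le j\le q} \beta_{q,j}\,\psi_{q,j}(Y). \]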
Then $\rho_1(X,Y\mid Z)$ can be approximated by $\sup_{(f,g)\in S_{p_n,q_n}} E(f(X,Z)g(Y,Z)\mid Z)$, where
\[ S_{p_n,q_n} = \Bigl\{(f,g)\in S : f(X,Z) = \sum_{i=1}^{p_n} \alpha_i(Z)\,\phi_{p_n,i}(X) \text{ and } g(Y,Z) = \sum_{j=1}^{q_n} \beta_j(Z)\,\psi_{q_n,j}(Y)\Bigr\}. \]
Denote the supremum by $\rho_{p_n,q_n}(Z)$. The specific approximation result is stated as follows.
Fact 2 Suppose that $\lim_{n\to\infty} p_n = \infty$ and $\lim_{n\to\infty} q_n = \infty$. Then $\lim_{n\to\infty} E(|\rho_1(X,Y\mid Z) - \rho_{p_n,q_n}(Z)|) = 0$.
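The proof of Fact 2 follows from the approximation properties (2) and (3):
\[ \rho_1(X,Y\mid Z) \approx E(f(X,Z)g(Y,Z)\mid Z) = \operatorname{corr}(f(X,Z), g(Y,Z)\mid Z) \approx \operatorname{corr}(f^*(X,Z), g^*(Y,Z)\mid Z) \le \rho_{p_n,q_n}(Z) \]
for some $(f,g)\in S$ and some $(f^*,g^*)\in S_{p_n,q_n}$. Based on Fact 2, it is reasonable to estimate $\rho_1(X,Y\mid Z)$ using an estimator for $\rho_{p_n,q_n}(Z)$. To make such [...] is equal to some continuous function of $Z$ almost surely. Then an estimator for $\rho_{p_n,q_n}(z)$ and $\rho_1(X,Y\mid Z=z)$ is
\[ \max_{\{\alpha_i\}_{i=1}^{p_n},\ \{\beta_j\}_{j=1}^{q_n}}\ \sum_{i,j} \alpha_i\beta_j\Bigl(\hat E(\phi_i(X)\psi_j(Y)\mid Z=z) - \hat E(\phi_i(X)\mid Z=z)\,\hat E(\psi_j(Y)\mid Z=z)\Bigr), \]
where the maximum is taken over all $\{\alpha_i\}_{i=1}^{p_n}$ and $\{\beta_j\}_{j=1}^{q_n}$ such that
\[ \sum_{1\le i,j\le p_n} \alpha_i\alpha_j\Bigl(\hat E(\phi_i(X)\phi_j(X)\mid Z=z) - \hat E(\phi_i(X)\mid Z=z)\,\hat E(\phi_j(X)\mid Z=z)\Bigr) = 1 \]
and
\[ \sum_{1\le i,j\le q_n} \beta_i\beta_j\Bigl(\hat E(\psi_i(Y)\psi_j(Y)\mid Z=z) - \hat E(\psi_i(Y)\mid Z=z)\,\hat E(\psi_j(Y)\mid Z=z)\Bigr) = 1. \]
Here
\[ \hat E[g(X,Y)\mid Z=z] = \frac{\sum_{i=1}^{N_n} g(X_i,Y_i)\,k_h(z - Z_i)}{\sum_{i=1}^{N_n} k_h(z - Z_i)}, \]
where $N_n \to \infty$ as $n \to \infty$; for $z = (z^{(1)}, \ldots, z^{(d)})$, $k_h(z) = \prod_{j=1}^{d} h^{-1}k_0(z^{(j)}/h)$, and $k_0$ is a symmetric probability density function on $\mathbb{R}$. Denote the above estimator for $\rho_1(X,Y\mid Z=z)$ by $\hat\rho(z)$. Then $N_{\rho,n}^{-1}\sum_{i=N_n+1}^{N_n+N_{\rho,n}} \hat\rho(Z_i)$ is an estimator for $E(\rho_1(X,Y\mid Z))$, where $N_{\rho,n} \to \infty$ as $n \to \infty$.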
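The kernel-weighted conditional expectation used above can be sketched as follows. This is a minimal illustration, not the report's implementation: it assumes a standard normal density for k_0 (the report only requires a symmetric probability density), and the function names are hypothetical.

```python
import math


def gaussian_k0(u):
    """Assumed choice of k_0: the standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def product_kernel(z, h):
    """k_h(z) = prod_j h^{-1} k_0(z_j / h) for a d-dimensional point z."""
    value = 1.0
    for z_j in z:
        value *= gaussian_k0(z_j / h) / h
    return value


def cond_mean(g_values, z_sample, z, h):
    """Kernel-weighted estimate of E[g(X, Y) | Z = z].

    g_values[i] is g(X_i, Y_i) and z_sample[i] is the d-dimensional Z_i.
    """
    weights = [
        product_kernel([a - b for a, b in zip(z, z_i)], h) for z_i in z_sample
    ]
    total = sum(weights)
    return sum(w * g for w, g in zip(weights, g_values)) / total


# Two observations placed symmetrically around z = 0 receive equal weight.
estimate = cond_mean([0.0, 2.0], [(-1.0,), (1.0,)], (0.0,), h=0.5)
```

If the bandwidth is very small relative to the distance from z to every Z_i, all weights can underflow to zero, so in practice the denominator should be checked before dividing.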
The estimator $\hat\rho(z)$ can be obtained using the singular value decomposition (SVD), which makes the computation easy.
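One way the SVD computation might be organized is sketched below with NumPy. The constrained maximization is a canonical correlation problem for the kernel-weighted covariance matrices, so its value is the largest singular value of the whitened cross-covariance matrix. Several choices here are assumptions rather than details from the report: the function names, the Gaussian kernel, the polynomial basis, and the use of a pseudo-inverse square root, which is needed because the basis spans the constant function and the weighted covariance matrices are therefore singular.

```python
import numpy as np


def inv_sqrt_psd(mat, tol=1e-10):
    """Pseudo-inverse square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    keep = vals > tol * max(vals.max(), 1.0)   # drop (near-)null directions
    return vecs[:, keep] @ np.diag(vals[keep] ** -0.5) @ vecs[:, keep].T


def rho_hat(phi, psi, weights):
    """Largest kernel-weighted canonical correlation at a fixed z.

    phi: (N, p) array of phi_i(X_l); psi: (N, q) array of psi_j(Y_l);
    weights: (N,) array of kernel weights k_h(z - Z_l).
    """
    w = weights / weights.sum()
    m_phi, m_psi = w @ phi, w @ psi                        # conditional means
    s_xy = phi.T @ (w[:, None] * psi) - np.outer(m_phi, m_psi)
    s_xx = phi.T @ (w[:, None] * phi) - np.outer(m_phi, m_phi)
    s_yy = psi.T @ (w[:, None] * psi) - np.outer(m_psi, m_psi)
    whitened = inv_sqrt_psd(s_xx) @ s_xy @ inv_sqrt_psd(s_yy)
    return float(np.linalg.svd(whitened, compute_uv=False)[0])


# Simulated data in which X and Y are conditionally independent given Z.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
x = z + rng.normal(size=n)
y = z + rng.normal(size=n)
phi = np.column_stack([np.ones(n), x, x ** 2])   # assumed basis in X
psi = np.column_stack([np.ones(n), y, y ** 2])   # assumed basis in Y
h = 0.5
weights = np.exp(-0.5 * ((0.0 - z) / h) ** 2)    # Gaussian kernel at z = 0
rho_at_0 = rho_hat(phi, psi, weights)
```

Averaging `rho_hat` over held-out values of Z would then give the estimator of $E(\rho_1(X,Y\mid Z))$ described above.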
3.2 Asymptotic distribution of the estimator and test of conditional independence
To build a test for conditional independence based on the estimator $\hat\rho$, it is necessary to derive the asymptotic distribution of the estimator under the conditional independence hypothesis. An asymptotic property of the estimator $\hat\rho(z)$ is given in the following theorem.
Theorem 1 Suppose that $p_n = p$ and $q_n = q$ do not depend on $n$. Then $\sqrt{N_n h_n^d}\,(\hat\rho(z) - \rho_{p,q}(z))$ converges in distribution as $n \to \infty$. Furthermore, if $X$ and $Y$ are conditionally independent given $Z$, then $N_n h_n^d\, c_K\, \hat\rho^2(z)$ converges in distribution to the maximum eigenvalue of $CC^T$ as $n \to \infty$, where $c_K = f(z)/\int k_0^2(s)\,ds$, $f$ is the pdf of $Z$, and $C$ is a $p \times q$ matrix with 0's in the first row and first column and whose other elements are IID $N(0,1)$.
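The limiting distribution in Theorem 1 has no simple closed form for general p and q, but it can be approximated by simulation. The sketch below is one possible Monte Carlo approach using NumPy; the function name, the number of draws, and the 0.95 level are illustrative assumptions.

```python
import numpy as np


def simulate_max_eigenvalue(p, q, n_draws=10000, seed=0):
    """Draws of the largest eigenvalue of C C^T for the matrix C in Theorem 1."""
    rng = np.random.default_rng(seed)
    draws = np.empty(n_draws)
    for b in range(n_draws):
        c = np.zeros((p, q))                       # first row and column stay 0
        c[1:, 1:] = rng.standard_normal((p - 1, q - 1))
        # Largest eigenvalue of C C^T = squared largest singular value of C.
        draws[b] = np.linalg.svd(c, compute_uv=False)[0] ** 2
    return draws


draws = simulate_max_eigenvalue(p=3, q=3, n_draws=2000)
critical_value = float(np.quantile(draws, 0.95))
```

Under these assumptions, a pointwise test at a fixed z would compare $N_n h_n^d\, c_K\, \hat\rho^2(z)$ with `critical_value`, with $f(z)$ in $c_K$ replaced by an estimate. For p = q = 2 the matrix has a single nonzero entry, so the draws follow a chi-square distribution with one degree of freedom, which gives a quick sanity check.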
The proof of Theorem 1 requires only minor modification of Lemma 7.1 in Dauxois and Nkiet (1998) and is left out. Also, the asymptotic joint distribution of the estimators of certain conditional expectations is needed, which is taken directly from the lecture notes by James Powell titled “Notes on Nonparametric Regression Estimation”, which is available at
The asymptotic distribution of the test statistic $N_{\rho,n}^{-1}\sum_{i=N_n+1}^{N_n+N_{\rho,n}} \hat\rho(Z_i)$ is normal, with mean and variance equal to the mean and variance of $\hat\rho(Z_1)$, which can be approximated using bootstrap estimators or the asymptotic distribution of $\hat\rho(z)$. However, if $p_n$ and $q_n$ both tend to $\infty$, then the distribution of the maximum eigenvalue of the random matrix $CC^T$ does not converge. Therefore, a more general version of Theorem 1 needs to be derived in order to understand the behavior of $\hat\rho(z)$ when both $p_n$ and $q_n$ tend to $\infty$.
References
[1] Baba, K., Shibata, R. and Sibuya, M. (2004). Australian and New Zealand Journal of Statistics 46 657–664.
[2] Daudin, J. J. (1980). Biometrika 67 581–590.
[3] Dauxois, J. and Nkiet, G. M. (1998). Annals of Statistics 26 1254–1278.
[4] Delgado, M. A. and González-Manteiga, W. (2001). Annals of Statistics 1469–1507.
[5] Dossou-Gbete, S. and Pousse, A. (1991). Statistics 22 479–491.
[6] Huang, S.-Y., Lee, M.-H. and Hsiao, C. K. (2006). Draft.
[7] Johnstone, I. (2001). Annals of Statistics 29 295–327.
[8] Lawrance, A. J. (1976). The American Statistician 30 146–149.
[9] Li, L., Cook, R. D. and Nachtsheim, C. J. (2005). Journal of the Royal Statistical Society. Series B 67 285–299.
[10] Linton, O. and Gozalo, P. (1997). Discussion paper, Cowles Foundation for Research in Economics, Yale University.
[11] Muirhead, R. J. and Waternaux, C. M. (1980). Biometrika 67 31–43.
[12] Romanovič, V. A. (1975). Izvestiya Vysshikh Uchebnykh Zavedeniĭ Matematika 10 94–96.
[13] Roussas, G. G. and Tran, L. T. (1992). Annals of Statistics 20 98–120.
[14] Schuster, E. F. (1972). Annals of Mathematical Statistics 43 84–88.
[15] Su, L. and White, H. (2005). Submitted to Econometric Theory.
[16] Su, L. and White, H. (2006). Submitted to Journal of Econometrics.
[17] Waternaux, C. M. (1976). Biometrika 63 639–645.