Trimmed Causal Relationship Between Two Groups of Multivariate Time Series


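National Science Council, Executive Yuan: Final Report of Research Project

Project type: Individual project
Project number: NSC 101-2118-M-004-002-
Execution period: August 1, 2012 to September 30, 2013 (ROC 101/08/01 to 102/09/30)
Executing institution: Department of Statistics, National Chengchi University
Principal investigator: Ying-Chao Hung (洪英超)
Project participants (master's students, part-time assistants): 陳韋成, 陳寬旻, 張欣惠, 林建佑
Public information: This project involves patents or other intellectual property rights and may be publicly queried after 1 year.
Date: November 22, 2013 (ROC 102/11/22)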


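Chinese abstract (translated): This project investigates the variable selection problem that arises when validating the causal relationship between two groups of multivariate time series. The main idea is to use the Vector Autoregression (VAR) model to extract the "important variables" from the two groups of multivariate time series and thereby construct a simplified (and new) causal relationship. When the parameters of the VAR model are known, we prove that the solution to this variable selection problem can be expressed completely. When the parameters are unknown, we introduce a statistical hypothesis testing procedure to estimate (or approximate) the solution.

Chinese keywords (translated): causal relationship, vector autoregressive process, significant variables, forecasting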

English abstract: In this project we investigate a variable selection problem in the validation of the causal relationship between two groups of multivariate time series data. By utilizing the Vector Autoregression (VAR) model, we introduce how to extract "significant variables" in both groups of time series data so that a "trimmed causal relationship" can be presented based on precedence and predictability. When the parameters of the VAR model are known, we show that explicit conditions for solving this variable selection problem can be obtained; when the parameters are unknown, a statistical hypothesis testing procedure is used to approximate the solution.

English keywords: Granger causality, Vector Autoregressive Process, significant variables, forecasting


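National Science Council, Executive Yuan: Subsidized Research Project

□ Interim progress report
■ Final report

(Project title)

Trimmed Causal Relationship Between Two Groups of Multivariate Time Series

Project type: ■ Individual project □ Integrated project

Project number: NSC 101-2118-M-004-002-

Execution period: August 1, 2012 to September 30, 2013

Executing institution and department: Department of Statistics, National Chengchi University

Principal investigator: Ying-Chao Hung (洪英超)

Co-principal investigator:

Project participants: 陳韋成, 陳寬旻, 張欣惠, 林建佑

In addition to this results report, the project includes the following overseas reports, 0 in total:

□ Report on off-site research
□ Report on attending an international academic conference
□ Overseas research report for an international cooperative research project

Handling: Except for controlled projects and the cases below, the report may be made available for public query immediately.

□ Involves patents or other intellectual property rights; available for public query after □ one year □ two years

November 2013 (ROC year 102)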

Attachment 1

Report Contents

(I) Introduction and Literature Review

Comput Stat DOI 10.1007/s00180-012-0351-z ORIGINAL PAPER

Extracting informative variables in the validation of two-group causal relationship

Ying-Chao Hung · Neng-Fang Tseng

Y.-C. Hung: Department of Statistics, National Chengchi University, No. 64, Sec. 2, ZhiNan Rd., Wenshan District, Taipei 11605, Taiwan

N.-F. Tseng (✉): Department of Mathematical Statistics and Actuarial Science, Aletheia University, 32 Chen-Li Street, Tamsui, Taipei 25103, Taiwan

Received: 14 December 2011 / Accepted: 13 July 2012 © Springer-Verlag 2012

Abstract The validation of causal relationship between two groups of multivariate time series data often requires the precedence knowledge of all variables. However, in practice one finds that some variables may be negligible in describing the underlying causal structure. In this article we provide an explicit definition of "non-informative variables" in a two-group causal relationship and introduce various automatic computer-search algorithms that can be utilized to extract informative variables based on a hypothesis testing procedure. The result allows us to represent a simplified causal relationship using the minimum possible information on two groups of variables.

Keywords Causal relationship · Vector autoregression model · Informative variables · Modified Wald test · Automatic computer-search algorithm

1 Introduction

Over the years, the causality system described by a multivariate time series process has been one of the most flexible and popular statistical techniques for measuring the dynamic relationships between groups of variables in economics, finance, medicine, science, and engineering. The earliest study of causal relationships dates back to Granger (1969), wherein the Vector Autoregression (VAR) model, a generalization of the univariate AR model, was used to identify the "causality" between two groups of time series data based on precedence and predictability. A fairly rich literature has since extended this work. Some notable contributions are: Granger (1980) proposed a statistical hypothesis testing procedure to validate the bivariate causal relationship; Osborn (1984) discussed "Unidirectional Granger Causality" based on the ARMA model and developed it into a statistical hypothesis testing procedure; Geweke (1982, 1984) considered measures of linear dependence and feedback between multiple time series and provided a comprehensive literature survey of Granger causality; Boudjellaba et al. (1992) tested causality between two vectors in multivariate autoregressive moving average models; Granger and Lin (1995) discussed measuring causality by spectral decomposition based on the Vector Error Correction Model (VECM); Mosconi and Giannini (1992) investigated Granger causality based on a non-stationary VAR model; Roebroeck et al. (2005) used Granger causality mapping (GCM) to explore directed influences between neuronal populations in fMRI data; Hacker and Hatemi-J (2006) developed a method that is not sensitive to deviations from the assumption that the error term is normally distributed; Fujita et al. (2007) proposed an improved VAR model (called DVAR) to estimate time-varying gene regulatory networks based on gene expression profiles obtained from microarray experiments; and Haufe et al. (2010) estimated causal interactions in multivariate time series using the VAR model, to name just a few.

The study of causal relationships usually includes all variable information in the analysis. However, in many practical situations one finds that some variables are not particularly informative and can mislead the interpretation of the underlying causal structure. The work by Hsiao (1982) is closely related to this concept. He introduced three different types of causal relationships (called direct, indirect, and spurious causality) by reducing the information set in a three-variate time series model. However, when the number of variables becomes large, it is much harder to characterize all the causal patterns because of model complexity. To overcome this problem, some graphical techniques have been successfully developed to identify and visualize the causal relationships between the components of multivariate time series data. Readers can refer to the works by Koster (1996, 1999), Lauritzen (1996, 2000), Pearl (1995, 2000), Whittaker (1990), and Arnold et al. (2007) for this type of approach.

(II) Research Objectives

The goal of this study is to extract informative variables in the validation of the causal relationship between two groups of multivariate time series data. These extracted variables are important and useful in the sense that they allow us to forecast the future quantity of explicit variables by utilizing the minimum data information. The remainder of this paper is organized as follows. In Sect. 2, we introduce the background knowledge required for defining and identifying informative variables in the validation of a two-group causal relationship. In Sect. 3, we show how to extract all informative variables by utilizing a hypothesis testing procedure (called the modified Wald test) and various automatic computer-search algorithms. In Sect. 4, the computer-search algorithms are illustrated on a real example. Some concluding remarks are given in Sect. 5.

(III) Research Methods

2 Background Knowledge

The notion of causality in multivariate time series data is often discussed by means of the stationary pth-order Vector Autoregression model (denoted by VAR(p)):

$$W_t = b + \sum_{j=1}^{p} A_j W_{t-j} + a_t, \qquad t = 1, \ldots, T, \tag{1}$$

where $b$ is a $(K \times 1)$ constant vector, $W_t = (W_{1,t}, W_{2,t}, \ldots, W_{K,t})'$ is a $(K \times 1)$ random vector, $A_j$ is a $(K \times K)$ coefficient matrix for all $j = 1, \ldots, p$, and $a_t$ is a $(K \times 1)$ error (or noise) vector satisfying (i) $E(a_t) = 0$; (ii) the covariance matrix $E(a_t a_t')$ is positive definite (thus non-singular); and (iii) $E(a_t a_{t-k}') = 0$ for any non-zero $k$. Dividing all the variables of interest into two groups $X_t$ and $Y_t$, we see that $W_t$ can be further represented as

$$W_t = \begin{pmatrix} X_t \\ Y_t \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} + \sum_{j=1}^{p} \begin{pmatrix} A_{XX,j} & A_{XY,j} \\ A_{YX,j} & A_{YY,j} \end{pmatrix} \begin{pmatrix} X_{t-j} \\ Y_{t-j} \end{pmatrix} + \begin{pmatrix} a_{X,t} \\ a_{Y,t} \end{pmatrix}, \qquad t = 1, \ldots, T, \tag{2}$$

where $X_t = (X_{1,t}, \ldots, X_{n,t})'$ and $Y_t = (Y_{1,t}, \ldots, Y_{m,t})'$ are $(n \times 1)$ and $(m \times 1)$ random vectors, $b_1$ and $b_2$ are $(n \times 1)$ and $(m \times 1)$ constant vectors, $A_{XX,j}$, $A_{XY,j}$, $A_{YX,j}$, and $A_{YY,j}$ are sub-matrices of $A_j$ with orders $(n \times n)$, $(n \times m)$, $(m \times n)$, and $(m \times m)$, respectively, and $a_{X,t}$ and $a_{Y,t}$ are $(n \times 1)$ and $(m \times 1)$ error vectors. The primary goal of the so-called "Granger causality" is to examine whether or not the time series $Y_t$ is useful in forecasting the time series $X_t$.
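To make the notation in Eqs. (1)-(2) concrete, the following minimal Python sketch simulates a small VAR(1) process and splits its coefficient matrix into the four blocks of Eq. (2). The dimensions (n = 2, m = 1), the coefficient values, the standard normal noise, the burn-in length, and the helper names `simulate_var` and `partition` are all illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def simulate_var(b, A, T, rng, burn_in=200):
    """Simulate W_t = b + sum_j A[j-1] @ W_{t-j} + a_t with standard normal noise.

    The noise distribution and the burn-in length are assumptions of this sketch.
    """
    p, K = len(A), b.shape[0]
    W = np.zeros((T + burn_in + p, K))       # first p rows act as zero start-up values
    for t in range(p, T + burn_in + p):
        W[t] = b + sum(A[j] @ W[t - j - 1] for j in range(p)) + rng.standard_normal(K)
    return W[-T:]                            # drop start-up and burn-in observations

def partition(Aj, n):
    """Split a (K x K) coefficient matrix into the blocks A_XX, A_XY, A_YX, A_YY of Eq. (2)."""
    return Aj[:n, :n], Aj[:n, n:], Aj[n:, :n], Aj[n:, n:]

# Hypothetical VAR(1): X_t has n = 2 components, Y_t has m = 1 component.
n, m = 2, 1
b = np.array([0.1, 0.0, -0.2])
A1 = np.array([[0.5, 0.1, 0.3],
               [0.0, 0.4, 0.0],
               [0.2, 0.0, 0.3]])             # absolute row sums below 1, so the process is stationary

rng = np.random.default_rng(0)
W = simulate_var(b, [A1], T=500, rng=rng)
X, Y = W[:, :n], W[:, n:]                    # the two groups of series
A_XX, A_XY, A_YX, A_YY = partition(A1, n)
print(A_XY)                                  # the block through which Y enters the X equations
```

In this hypothetical example only the first row of `A_XY` is non-zero, so lagged values of Y enter the equation for the first component of X but not the second.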

Given any point in time $t$, let us consider the two information sets

$$\Omega_{XY} = \{X_{1,t}, \ldots, X_{n,t}, \ldots, X_{1,1}, \ldots, X_{n,1}, Y_{1,t}, \ldots, Y_{m,t}, \ldots, Y_{1,1}, \ldots, Y_{m,1}\}$$

and

$$\Omega_X = \{X_{1,t}, \ldots, X_{n,t}, \ldots, X_{1,1}, \ldots, X_{n,1}\}.$$

For any given future time $(t + h)$, we denote the best linear predictor of $X_{t+h}$ based on the information sets $\Omega_{XY}$ and $\Omega_X$ by

$$\hat{X}_t(h|\Omega_{XY}) = (\hat{X}_{1,t}(h|\Omega_{XY}), \ldots, \hat{X}_{n,t}(h|\Omega_{XY}))'$$

and

$$\hat{X}_t(h|\Omega_X) = (\hat{X}_{1,t}(h|\Omega_X), \ldots, \hat{X}_{n,t}(h|\Omega_X))',$$

respectively. The two-group causality (also known as a generalization of Granger causality) is defined as follows.

Definition 1 (Two-group Causality up to Horizon c)

Given any positive integer $c$, if $\hat{X}_t(h|\Omega_X) \neq \hat{X}_t(h|\Omega_{XY})$ for some $h \le c$, then we say that $Y_t$ causes $X_t$ up to horizon $c$ and denote it by $Y \xrightarrow{(c)} X$. On the other hand, if $\hat{X}_t(h|\Omega_X) = \hat{X}_t(h|\Omega_{XY})$ for all $h \le c$, then we say that $Y_t$ does not cause $X_t$ up to horizon $c$ and denote it by $Y \overset{(c)}{\nrightarrow} X$.

The following proposition is useful for identifying the causality/non-causality between $X_t$ and $Y_t$.

Proposition 1 Based on the model in Eqs. (1)-(2), for any positive integer $c$ we have that $Y \overset{(c)}{\nrightarrow} X$ if and only if $A_{XY,j} = 0_{n \times m}$ for all $j = 1, \ldots, p$.

Proof Since $Y \overset{(c)}{\nrightarrow} X$ is equivalent to $Y \overset{(\infty)}{\nrightarrow} X$ (see Dufour and Renault (1998), Proposition 2.3) and $Y \overset{(\infty)}{\nrightarrow} X$ if and only if $A_{XY,j} = 0_{n \times m}$ for all $j = 1, \ldots, p$ (see Lütkepohl (2005), Corollary 2.2.1), the result follows.

Proposition 1 indicates that the two-group causality based on the VAR model can be determined by examining the coefficient matrix $A_{XY,j}$. We next review some properties that are necessary for establishing the procedure for extracting informative variables in a later section.
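When the coefficient matrices are known, Proposition 1 reduces the non-causality question to checking whether every upper-right block $A_{XY,j}$ is zero. The sketch below illustrates that check under this known-parameter assumption; the function name `y_causes_x`, the tolerance `tol`, and the example matrices are hypothetical and only serve as an illustration.

```python
import numpy as np

def y_causes_x(A, n, tol=1e-12):
    """Return True if any block A_XY,j (rows :n, columns n:) has a non-zero entry.

    A   : list of (K x K) coefficient matrices A_1, ..., A_p (assumed known)
    n   : number of variables in X (the first n coordinates of W_t)
    tol : hypothetical numerical tolerance for treating an entry as zero
    """
    return any(np.any(np.abs(Aj[:n, n:]) > tol) for Aj in A)

# Hypothetical coefficient matrices with K = 3, n = 2, m = 1.
A1 = np.array([[0.5, 0.1, 0.0],
               [0.0, 0.4, 0.0],
               [0.2, 0.0, 0.3]])
A2 = np.array([[0.1, 0.0, 0.0],
               [0.0, 0.1, 0.2],      # Y enters the second X equation at lag 2
               [0.0, 0.0, 0.1]])

print(y_causes_x([A1], n=2))         # False: A_XY,1 is zero
print(y_causes_x([A1, A2], n=2))     # True: A_XY,2 has a non-zero entry
```

With estimated rather than known coefficients, an exact zero check like this is not meaningful, which is why the paper turns to a hypothesis testing procedure in that case.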

As a result of Definition 1, if $Y \xrightarrow{(c)} X$ then there exists at least one pair $(i, h) \in \{1, \ldots, n\} \times \{1, \ldots, c\}$ such that

$$E\big(\hat{X}_{i,t}(h|\Omega_{XY}) - X_{i,t+h}\big)^2 < E\big(\hat{X}_{i,t}(h|\Omega_X) - X_{i,t+h}\big)^2,$$

where $\hat{X}_{i,t}(h|\Omega_{XY})$ and $\hat{X}_{i,t}(h|\Omega_X)$ are the $i$-th elements of $\hat{X}_t(h|\Omega_{XY})$ and $\hat{X}_t(h|\Omega_X)$, respectively. We now show how to calculate $\hat{X}_t(h|\Omega_{XY})$. Based on Eq. (1), for any given time lag $h > 0$ we have that

$$W_{t+h} = \sum_{k=0}^{h-1} A_1^{(k)} (b + a_{t+h-k}) + \sum_{j=1}^{p} A_j^{(h)} W_{t+1-j}, \tag{3}$$

where $A_j^{(k)}$ is a matrix obtained from the recursive formula

$$A_j^{(k)} = \begin{cases} A_j, & k = 1, \\ A_{j+1}^{(k-1)} + A_1^{(k-1)} A_j, & k = 2, 3, \ldots, h, \end{cases} \tag{4}$$

and $j = 1, \ldots, p$. Consider the following partition of the matrix $A_j^{(h)}$:

$$A_j^{(h)} = \begin{pmatrix} A_{XX,j}^{(h)} & A_{XY,j}^{(h)} \\ A_{YX,j}^{(h)} & A_{YY,j}^{(h)} \end{pmatrix},$$

where $A_{XX,j}^{(h)}$ and $A_{XY,j}^{(h)}$ are two sub-matrices with orders $(n \times n)$ and $(n \times m)$, respectively. Denote the identity matrix of order $n$ by $I_n$; it is clear that $X_{t+h} = (I_n, 0_{n \times m}) W_{t+h}$. Based on the notation introduced above, the best linear predictor (in matrix form) is given by

$$\hat{X}_t(h|\Omega_{XY}) = b_{1,h} + \sum_{j=1}^{p} \big(A_{XX,j}^{(h)} X_{t+1-j} + A_{XY,j}^{(h)} Y_{t+1-j}\big), \tag{5}$$

where $b_{1,h} = (I_n, 0_{n \times m}) \sum_{k=0}^{h-1} A_1^{(k)} b$. Eq. (5) shows that the best linear predictor $\hat{X}_t(h|\Omega_{XY})$ relates to $Y_t$ merely through the coefficient matrix $A_{XY,j}^{(h)}$. This will serve as an important benchmark for the remainder of this study.
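The recursion in Eq. (4) and the predictor in Eq. (5) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: it assumes the conventions $A_1^{(0)} = I_K$ and $A_j^{(k)} = 0$ for $j > p$ (neither is spelled out in the text above), and the names `horizon_coefficients` and `predict_x` as well as the example values are hypothetical.

```python
import numpy as np

def horizon_coefficients(A, h):
    """Return ([A_1^{(h)}, ..., A_p^{(h)}], sum_{k=0}^{h-1} A_1^{(k)}) via the recursion in Eq. (4).

    Assumed conventions: A_1^{(0)} = I_K and A_j^{(k)} = 0 whenever j > p.
    """
    p, K = len(A), A[0].shape[0]
    cur = [Aj.copy() for Aj in A]            # A_j^{(1)} = A_j
    acc = np.eye(K)                          # running sum, starting from A_1^{(0)} = I_K
    for _ in range(2, h + 1):
        acc = acc + cur[0]                   # add A_1^{(k-1)} to the running sum
        nxt = []
        for j in range(p):
            shifted = cur[j + 1] if j + 1 < p else np.zeros((K, K))
            nxt.append(shifted + cur[0] @ A[j])   # A_{j+1}^{(k-1)} + A_1^{(k-1)} A_j
        cur = nxt
    return cur, acc

def predict_x(b, A, W_hist, h, n):
    """h-step predictor of X_{t+h} as in Eq. (5).

    W_hist[j] holds W_{t-j} for j = 0, ..., p-1 (most recent observation first).
    """
    Ah, acc = horizon_coefficients(A, h)
    w_hat = acc @ b + sum(Ah[j] @ W_hist[j] for j in range(len(A)))
    return w_hat[:n]                         # (I_n, 0) selects the X coordinates

# Hypothetical VAR(2) with K = 3 and n = 2.
rng = np.random.default_rng(1)
K, p, n = 3, 2, 2
A = [0.3 * rng.standard_normal((K, K)) for _ in range(p)]
b = rng.standard_normal(K)
W_hist = rng.standard_normal((p, K))         # rows: W_t, W_{t-1}

x_hat = predict_x(b, A, W_hist, h=3, n=n)
A_XY_h = [Ahj[:n, n:] for Ahj in horizon_coefficients(A, 3)[0]]   # blocks A_XY,j^{(3)}
print(x_hat)
```

One way to sanity-check such a sketch is to compare its output with the forecast obtained by iterating the one-step predictor `b + A_1 W_t + ... + A_p W_{t-p+1}` a total of h times; the two should agree up to rounding.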

Note that Definition 1 focuses on the causal relationship between the two random vectors $X_t$ and $Y_t$. In particular, it explicitly defines whether or not $Y_t$ can improve the forecasting of $X_{t+h}$. However, by the preceding arguments we learn that if $Y \xrightarrow{(c)} X$, then adding all variables in $Y_t$ to the information set is guaranteed to improve the forecasting of "some" variables in $X_t$, but not necessarily all. On the other hand, the forecasting of $X_{t+h}$ may be improved by utilizing the information of merely "some" variables in $Y_t$, but not necessarily all. Therefore, our goal here is to provide a statistical procedure to extract those "informative variables" in both $X_t$ and $Y_t$. To do this, we first introduce the definition of "non-informative variables" in both $X_t$ and $Y_t$.

Definition 2 (Non-informative Variables in $X_t$ and $Y_t$)

Consider the VAR(p) model described in Eqs. (1)-(2) and assume that $Y \xrightarrow{(c)} X$ for some given integer $c > 0$.

(a) The variable $Y_{i,t}$ in $Y_t = (Y_{1,t}, \ldots, Y_{m,t})'$ is non-informative if

$$\hat{X}_t(h|\Omega_{XY}) = \hat{X}_t(h|\Omega_{XY_{-i}}) \quad \text{for all } h \le c, \tag{6}$$

where $\Omega_{XY_{-i}} = \Omega_{XY} \setminus \{Y_{i,t}, \ldots, Y_{i,1}\}$ refers to the reduced information set with the $i$-th variable in $Y_t$ excluded.

(b) The variable $X_{i,t}$ in $X_t = (X_{1,t}, \ldots, X_{n,t})'$ is non-informative if

$$\hat{X}_{i,t}(h|\Omega_{XY}) = \hat{X}_{i,t}(h|\Omega_X) \quad \text{for all } h \le c. \tag{7}$$

Definition 2 directly implies that, if the prediction of $X_{t+h}$ based on $\Omega_{XY}$ is the same as that based on the reduced information set $\Omega_{XY_{-i}}$, then the variable $Y_{i,t}$ can be excluded from the analysis (since it is non-informative in predicting $X_t$). Analogously, if the prediction of $X_{i,t+h}$ based on $\Omega_{XY}$ is the same as that based on $\Omega_X$, then the variable $X_{i,t}$ can be excluded from the analysis. The following two theorems provide useful guidelines for finding the non-informative (or informative) variables in both $X_t$ and $Y_t$.
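As a rough illustration of Definition 2, recall from Eq. (5) that the predictor depends on $Y_t$ only through the blocks $A_{XY,j}^{(h)}$. If the $i$-th column of every such block (all $j \le p$, all $h \le c$) is zero, then $Y_{i,t}$ never enters the predictor, which is sufficient for Eq. (6); if the $i$-th row of every block is zero, the predictor of $X_{i,t+h}$ uses no information from $Y_t$, which is sufficient for Eq. (7). The sketch below flags only these sufficient conditions with known coefficients; it is not a statement of the theorems, and the function name `zero_columns_and_rows`, the tolerance, and the example blocks are hypothetical.

```python
import numpy as np

def zero_columns_and_rows(blocks, tol=1e-12):
    """Flag candidate non-informative variables from the blocks A_XY,j^{(h)}.

    blocks : iterable of (n x m) arrays, one per pair (j, h) with j <= p and h <= c
    Returns 0-based indices of columns (variables in Y) and rows (variables in X)
    that are zero in every block.
    """
    stacked = np.stack(list(blocks))                          # shape (number of blocks, n, m)
    nonzero = np.abs(stacked) > tol
    y_candidates = np.where(~nonzero.any(axis=(0, 1)))[0]     # columns that are zero everywhere
    x_candidates = np.where(~nonzero.any(axis=(0, 2)))[0]     # rows that are zero everywhere
    return y_candidates, x_candidates

# Hypothetical blocks for n = 2, m = 2 (for example p = 1 and c = 2).
blocks = [np.array([[0.3, 0.0],
                    [0.0, 0.0]]),
          np.array([[0.1, 0.0],
                    [0.0, 0.0]])]

y_idx, x_idx = zero_columns_and_rows(blocks)
print(y_idx, x_idx)    # second Y variable and second X variable are flagged (0-based index 1)
```

With estimated coefficients, exact zeros do not occur, so a check of this kind would have to be replaced by a statistical test.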

Theorem 1 (Identification of Non-informative Variables in $Y_t$)

Consider the matrix $A_{XY,j}^{(h)}$ given in Eq. (5) and its column partition
