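### Ministry of Science and Technology Research Project Report

### Final Report

### Hierarchical Factor Models with Non-stationary Series (Year 2)

Project type: Individual project

Project number: NSC 102-2410-H-004-019-MY2

Execution period: August 1, 2014 to July 31, 2015

Institution: Department of Economics, National Chengchi University

Principal investigator: Shih-Hsun Hsu (徐士勛)

Project participants: master's students and part-time research assistants 李勁宏, 張伊婷, 陳建安 and 游書豪; doctoral student and part-time research assistant 徐兆璿

Disposition: 1. Public information: this project may be publicly accessed. 2. Whether this research has found anything seriously detrimental to the public interest: No. 3. Whether this report is recommended as a policy reference for government agencies: No.

October 28, 2015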

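Chinese abstract: With the emergence of large, high-dimensional data sets, the factor model, which assumes that the fluctuations of all variables are determined by only a few important common factors, and its variants have received growing attention in the recent literature. In particular, to make the economic meaning of the factors clearer, the "hierarchical factor model", which incorporates more economic structure, has been introduced into empirical research. However, the hierarchical factor models in the existing literature all carry two important implicit limitations: first, they can only analyze stationary data; second, the number of factors in each layer must be specified in advance by the researcher so that the subsequent estimation (such as maximum likelihood or Bayesian estimation) can proceed. This two-year project attempts to overcome these two limitations while retaining the structural advantages of the hierarchical factor model. Our model allows for possibly non-stationary data, factors and disturbances, and the number of factors in each layer is determined objectively by the data. Moreover, the proposed estimation method relies only on principal component analysis and proceeds layer by layer. Compared with the estimation methods adopted in most of the literature, it is relatively easy to implement. We have thoroughly examined this model and its extensions, the corresponding limiting properties, and the corresponding simulation designs. Fundamentally, since most data are non-stationary and these non-stationary features may contain common trends worth exploring, the analytical framework proposed in this project is a new attempt in the literature and can compete with the existing hierarchical factor models.

Chinese keywords: common factor, non-stationarity, hierarchical factor model, principal component analysis, variance decomposition

English abstract: When facing large dimensional data, the factor model, which assumes that the main fluctuations of all variables of interest are driven by only a few common factors, has become popular, and many of its variants have been introduced in the literature. In particular, to gain a better understanding of factors, the so-called top-down hierarchical factor model is established by imposing more economic structure on the factors. Nevertheless, there are a couple of limitations in the existing hierarchical factor models: (1) they work for stationary data only, and (2) the number of factors in each layer must be presumed by researchers before employing maximum likelihood estimation or Bayesian methods. This paper thus aims to get around these limitations while keeping the advantages of the top-down hierarchical factor model. Non-stationary data as well as non-stationary factors and idiosyncratic errors are allowed, the number of factors in each layer is determined by the data instead of being presumed by researchers, and the proposed estimation procedure can be implemented by applying principal component analysis recursively from the top layer to the bottom layer. The corresponding asymptotic properties of the proposed approach are discussed in detail, and good finite-sample performance is also shown by some Monte Carlo simulations. In essence, the proposed framework is new in the literature and can be a comparable alternative to the existing top-down hierarchical factor models when facing possibly non-stationary data.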

English keywords: common factor, non-stationarity, hierarchical factor model, principal component analysis, variance decomposition

### Hierarchical Factor Models with Possible Non-stationary Components

### Shih-Hsun Hsu

∗Department of Economics, National Chengchi University

This Version: Oct, 2015

∗ Corresponding author. E-mail: shhsu@nccu.edu.tw; TEL: +886-2-2939-3091 ext. 51667; Address: Department of Economics, National Chengchi University, Taipei 116, Taiwan. The research support from the National Science Council of the Republic of China (Taiwan), NSC 102-2410-H-004-019-MY2, is gratefully acknowledged.

Abstract

When facing large dimensional data, the factor model, which assumes that the main fluctuations of all variables of interest are driven by only a few common factors, has become popular, and many of its variants have been introduced in the literature. In particular, to gain a better understanding of factors, the so-called top-down hierarchical factor model is established by imposing more economic structure on the factors. Nevertheless, there are a couple of limitations in the existing hierarchical factor models: (1) they work for stationary data only, and (2) the number of factors in each layer must be presumed by researchers before employing maximum likelihood estimation or Bayesian methods. This paper thus aims to get around these limitations while keeping the advantages of the top-down hierarchical factor model. Non-stationary data as well as non-stationary factors and idiosyncratic errors are allowed, the number of factors in each layer is determined by the data instead of being presumed by researchers, and the proposed estimation procedure can be implemented by applying principal component analysis recursively from the top layer to the bottom layer. The corresponding asymptotic properties of the proposed approach are discussed in detail, and good finite-sample performance is also shown by some Monte Carlo simulations. In essence, the proposed framework is new in the literature and can be a comparable alternative to the existing top-down hierarchical factor models when facing possibly non-stationary data.

Keywords: common factor, non-stationarity, hierarchical factor model, principal component analysis, variance decomposition.

### 1 Introduction

As information technology improves, thousands of time series with long spans are now available, and how to efficiently extract useful information from these large dimensional data while making inference has drawn the attention of researchers. In particular, the so-called factor model, which assumes that the main fluctuations of all variables of interest are driven by only a few common factors, has become popular in the literature. This project also focuses on some issues related to the factor model and its variants.

For stationary time series data $\{y_{it}\}$ $(i = 1, \ldots, N_y;\ t = 1, \ldots, T)$, the conventional factor model assumes that they are driven by a $q \times 1$ vector of unobserved common factors, that is,

$$y_{it} = \Lambda_i' F_t + \varepsilon_{it}, \quad i = 1, \ldots, N_y,\ t = 1, \ldots, T, \tag{1}$$

where $F_t = (F_{1t}, F_{2t}, \ldots, F_{qt})'$ is a $q \times 1$ vector of unobserved common factors at time $t$, $\Lambda_i = (\lambda_{1i}, \lambda_{2i}, \ldots, \lambda_{qi})'$ is the corresponding $q \times 1$ vector of factor loadings, and $\varepsilon_{it}$ is referred to as the idiosyncratic error. In the literature, this model and its variants have been widely employed for inference and forecasting, and they generally perform quite well with large dimensional data; see, e.g., Stock and Watson (2002, 2006) and Bai and Ng (2008), among others. As is well known, however, because it lacks economic structure, one deficiency of this conventional factor model is that the "meaning" of the factors is sometimes not straightforward, since they simply combine information from all variables with some particular weights.
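As a rough illustration of how the factors and loadings in model (1) are typically extracted by principal components, the following minimal sketch simulates a small panel and recovers the common component. The function name `pca_factors`, the simulated data, and the normalisation (loadings scaled so that their cross-product divided by $N$ is the identity) are assumptions made here for illustration; they are not taken from the paper.

```python
import numpy as np

def pca_factors(Y, q):
    """Principal-component estimates for y_it = Lambda_i' F_t + e_it.

    Y is a T x N array (rows are time periods) and q is the number of factors.
    Returns (F_hat, L_hat): F_hat is T x q and L_hat is N x q, normalised so
    that L_hat' L_hat / N equals the identity matrix.
    """
    T, N = Y.shape
    # Y'Y is symmetric, so eigh applies; it returns eigenvalues in ascending order.
    eigval, eigvec = np.linalg.eigh(Y.T @ Y)
    top = np.argsort(eigval)[::-1][:q]        # indices of the q largest eigenvalues
    L_hat = np.sqrt(N) * eigvec[:, top]
    F_hat = Y @ L_hat / N
    return F_hat, L_hat

# Hypothetical example: two factors driving fifty series.
rng = np.random.default_rng(0)
T, N, q = 200, 50, 2
F = rng.standard_normal((T, q))
L = rng.standard_normal((N, q))
Y = F @ L.T + 0.5 * rng.standard_normal((T, N))

F_hat, L_hat = pca_factors(Y, q)
common_hat = F_hat @ L_hat.T                  # estimated common component
```

Only the common component `F_hat @ L_hat.T` is comparable with the truth here; the individual factors are identified only up to rotation, which is why their "meaning" can be hard to read off.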

### 2 The hierarchical factor model

To gain a better understanding of factors, the so-called top-down hierarchical factor model has been introduced in the literature,¹ especially in the analysis of international business cycles; see, e.g., Gregory et al. (1997) and Kose et al. (2003). To illustrate, let $N$ denote the number of countries, $M$ the number of time series per country, $R$ the number of regions, and $T$ the length of the time series. Then, given the stationary data $\{y_{i,t}\}$ $(i = 1, \ldots, M \times N;\ t = 1, \ldots, T)$, the three-layer, top-down hierarchical factor model of Kose et al. (2003) for analyzing international business cycles is

$$y_{i,t} = a_i + \lambda_i^W F_t^W + \lambda_i^R F_{r,t}^R + \lambda_i^C F_{n,t}^C + \varepsilon_{i,t}, \tag{2}$$

$$\Phi^W(L) F_t^W = \mathbf{u}_t^W, \quad \Phi_r^R(L) F_{r,t}^R = \mathbf{u}_{r,t}^R, \quad \Phi_n^C(L) F_{n,t}^C = \mathbf{u}_{n,t}^C, \tag{3}$$

$$i = 1, \ldots, M \times N, \quad t = 1, \ldots, T, \quad r = 1, \ldots, R, \quad n = 1, \ldots, N,$$

where $F_t^W$, $F_{r,t}^R$ and $F_{n,t}^C$, orthogonal to each other, are the world (W), regional (R) and country-specific (C) factors; $\lambda_i^W$, $\lambda_i^R$ and $\lambda_i^C$ are the corresponding factor loadings; $\Phi^W(L)$, $\Phi_r^R(L)$ and $\Phi_n^C(L)$ are generic lag polynomials; and $\mathbf{u}_t^W$, $\mathbf{u}_{r,t}^R$ and $\mathbf{u}_{n,t}^C$ are the corresponding factor innovations.² In particular, it is worth noting that a shock should be labeled as worldwide, regional or country-specific depending on its effects rather than its origin; see, e.g., the illustration of this type of model in Forni and Reichlin (2001) or Kose et al. (2003).

¹ Basically, the so-called "multi-level" or "multi-layer" factor models are similar to the hierarchical factor models. Besides the top-down specification, there is also a hierarchical factor model with a bottom-up specification in the literature; see, e.g., Diebold, Li and Yue (2008), Ng and Moench (2010) and Moench et al. (2011).

As is easily seen from specification (2), the "meaning" of the factors in each layer is now much easier to understand; i.e., they are worldwide, regional or country-specific. Besides, whereas all variables in the conventional factor model (1) are driven by the same $q$ common factors, the fluctuation of the $i$-th variable $y_{i,t}$ in (2) is explained by only three factors (one for the world, one for its region and one for its country), even though the whole system of $M \times N$ time series is driven by $1 + R + N$ factors. From this viewpoint, the top-down hierarchical factor model can be viewed as a "restricted" factor model obtained by imposing some zero loadings in the conventional factor model (1). Moreover, because the factors of each layer are orthogonal, the variance of $y_{i,t}$ can be decomposed as

$$\mathrm{Var}(y_{i,t}) = (\lambda_i^W)^2\,\mathrm{Var}(F_t^W) + (\lambda_i^R)^2\,\mathrm{Var}(F_{r,t}^R) + (\lambda_i^C)^2\,\mathrm{Var}(F_{n,t}^C) + \mathrm{Var}(\varepsilon_{i,t}),$$

which provides an easy way to understand how the fluctuations of the factors contribute to the variance of $y_{i,t}$.
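To make the variance decomposition concrete, the short sketch below computes the share of $\mathrm{Var}(y_{i,t})$ attributed to each layer for a single series. The function name `variance_shares` and the numerical values are hypothetical and chosen only for illustration.

```python
def variance_shares(lam_w, lam_r, lam_c, var_w, var_r, var_c, var_eps):
    """Shares of Var(y_it) due to the world, regional and country factors
    and to the idiosyncratic error, assuming the factors are orthogonal."""
    parts = {
        "world": lam_w ** 2 * var_w,
        "regional": lam_r ** 2 * var_r,
        "country": lam_c ** 2 * var_c,
        "idiosyncratic": var_eps,
    }
    total = sum(parts.values())
    return {name: value / total for name, value in parts.items()}

# Hypothetical loadings and variances for one series.
shares = variance_shares(0.8, 0.5, 0.3, 1.0, 1.0, 1.0, 0.02)
print({name: round(value, 2) for name, value in shares.items()})
# {'world': 0.64, 'regional': 0.25, 'country': 0.09, 'idiosyncratic': 0.02}
```

The shares add up to one only because the cross terms vanish under orthogonality; with correlated factors the decomposition would need covariance terms as well.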

² For comparison with what is proposed below, we have changed some of the notation and the presentation of the model in Kose et al. (2003). For their original model specification and notation, see equations (1)–(4) on page 1219 of Kose et al. (2003).

The model specification above can be viewed as a state-space model for the unobservable factors, in which (2) is the measurement equation and (3) gives the corresponding transition equations. Therefore, given the distributions of the idiosyncratic errors $\varepsilon_{i,t}$ and of the factor innovations (which are usually assumed to be normal), most of the existing methods for estimating the unobserved factors and loadings in hierarchical factor models are either maximum likelihood estimation (MLE) with the Kalman filter or Bayesian estimation by Gibbs sampling. More examples of top-down hierarchical factor models, estimation procedures and properties can be found in, for example, Gregory et al. (1997), Forni and Reichlin (2001), Brook and Del Negro (2006), Kose et al. (2003, 2008), Fu (2007), Del Negro and Otrok (2008), Stock and Watson (2008), Beck et al. (2009), Crucini et al. (2011), He and Liao (2011), Hirata et al. (2011), Mumtaz et al. (2011), Lee (2012), and Bai and Wang (2012).

### 2.1 The limitations of the existing analysis of hierarchical factor models

As far as the existing hierarchical factor models are concerned, the "meaning" of the factors is much clearer than in the conventional factor models, and most of the related empirical studies are impressive and insightful; see, e.g., Kose et al. (2003, 2008) and Crucini et al. (2011). Nevertheless, a couple of limitations are worth noting. The first is that all the existing hierarchical factor models in the literature work only for stationary data. Clearly, not all raw data are stationary, and transforming non-stationary data into stationary data is somewhat arbitrary and even tedious when facing large dimensional data. Besides, the co-movement of non-stationary variables via co-integration is crucial for capturing the long-run relationship between variables and has been widely documented in the literature; the existing hierarchical factor models cannot take this possibility into account and therefore cannot exploit this long-run relationship for inference. The second limitation is that the number of factors in each layer must be presumed by researchers before employing the typical estimation methods, such as the Kalman filter or Gibbs sampling. The resulting inference might be misleading if the presumed number of factors for any layer is wrong. Moreover, the estimation procedures for the existing hierarchical factor models, either MLE with the Kalman filter or Bayesian estimation by Gibbs sampling, are sometimes tedious because obtaining convergent estimates is usually time-consuming.

The aim of this project and our previous work is to get around the above limitations while keeping the advantages of the top-down hierarchical factor model. Non-stationary data as well as non-stationary factors and idiosyncratic errors are allowed, the number of factors in each layer is determined by the data instead of being presumed by researchers, and the estimation procedure can be implemented by applying principal component analysis (PCA) recursively.³ From this perspective, the proposed model specification, estimation procedure and corresponding asymptotics are new in the literature and can be a comparable alternative to the existing top-down hierarchical factor models.

### 3 The Proposed

To illustrate the proposed methodology for analyzing hierarchical factor models with non-stationary components more clearly, we consider a two-layer, top-down hierarchical factor model in what follows. We specify the model, compare it with the existing hierarchical factor models, and propose a corresponding procedure for consistently estimating the unobservable factors and loadings. Extensions to multi-layer models are quite straightforward; we thus focus on the two-layer model in this paper.

### 3.1 Model Specification

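More specifically, let $\mathbf{Y}_t$ be an $N \times 1$ vector of standardized variables at time $t$ that can be partitioned into two classes, $\mathbf{X}_t$ ($N_x \times 1$) and $\mathbf{Z}_t$ ($N_z \times 1$), where $N_x + N_z = N$. A two-layer common factor model for $\mathbf{Y}_t$ is

$$\mathbf{Y}_t = \begin{bmatrix} \mathbf{X}_t \\ \mathbf{Z}_t \end{bmatrix} = \begin{bmatrix} \boldsymbol{\alpha}_x \\ \boldsymbol{\alpha}_z \end{bmatrix} + \begin{bmatrix} \boldsymbol{\lambda}_x^{c\prime} \\ \boldsymbol{\lambda}_z^{c\prime} \end{bmatrix} \mathbf{F}_t^c + \begin{bmatrix} \boldsymbol{\lambda}^{x\prime}\mathbf{F}_t^x \\ \boldsymbol{\lambda}^{z\prime}\mathbf{F}_t^z \end{bmatrix} + \begin{bmatrix} \boldsymbol{\varepsilon}_t^x \\ \boldsymbol{\varepsilon}_t^z \end{bmatrix}, \tag{4}$$

where $\boldsymbol{\alpha}_x$ ($N_x \times 1$) and $\boldsymbol{\alpha}_z$ ($N_z \times 1$) are vectors of constants; $\mathbf{F}_t^c$, identified as the global factors, is a $q^c \times 1$ vector of factors common to all variables; $\mathbf{F}_t^x$ ($q^x \times 1$) and $\mathbf{F}_t^z$ ($q^z \times 1$), orthogonal to $\mathbf{F}_t^c$, are the respective factors specific to the classes $\mathbf{X}$ and $\mathbf{Z}$; $\boldsymbol{\lambda}_x^c$ ($q^c \times N_x$), $\boldsymbol{\lambda}_z^c$ ($q^c \times N_z$), $\boldsymbol{\lambda}^x$ ($q^x \times N_x$) and $\boldsymbol{\lambda}^z$ ($q^z \times N_z$) are the corresponding loadings for each layer; and $\boldsymbol{\varepsilon}_t^x = [\varepsilon_{1,t}^x, \ldots, \varepsilon_{N_x,t}^x]'$ and $\boldsymbol{\varepsilon}_t^z = [\varepsilon_{1,t}^z, \ldots, \varepsilon_{N_z,t}^z]'$ are the idiosyncratic errors related to $\mathbf{X}$ and $\mathbf{Z}$.

³ Beck et al. (2009) consider a similar estimation procedure: they also estimate factors and loadings from the top layer to the bottom layer by employing PCA and OLS recursively. However, their method works only for stationary data, and no asymptotic properties are provided in their paper.

Besides, similar to the specification of Bai and Ng (2004) for analyzing the conventional factor model, the dynamics of factors and errors are specified as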

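$$(1 - L)\,\mathbf{F}_t^m = \boldsymbol{\Phi}^m(L)\,\mathbf{u}_t^m, \quad m = c, x, z, \tag{5}$$

$$(1 - \rho_i^m L)\,\varepsilon_{i,t}^m = \Theta_i^m(L)\,\nu_{i,t}^m, \quad m = x, z,\ i = 1, \ldots, N_m, \tag{6}$$

where, for $m = c, x, z$, $\mathbf{u}_t^m$ are factor innovations with corresponding lag polynomials $\boldsymbol{\Phi}^m(L) = \sum_{l=1}^{\infty} \phi_l^m L^l$; and, for $m = x, z$ and $i = 1, \ldots, N_m$, $\nu_{i,t}^m$ are error innovations with corresponding lag polynomials $\Theta_i^m(L) = \sum_{l=1}^{\infty} \theta_{i,l}^m L^l$.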
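To see what data generated by (4)–(6) can look like, the sketch below simulates one sample under strong simplifying assumptions that are ours, not the paper's: every factor is a pure random walk (the lag polynomial in (5) is replaced by the identity), the error dynamics in (6) reduce to an AR(1) with a common coefficient `rho`, the constants are zero, and all loadings are standard normal. The function name `simulate_two_layer` is hypothetical.

```python
import numpy as np

def simulate_two_layer(T, Nx, Nz, qc, qx, qz, rho=0.5, seed=0):
    """Draw one sample from a simplified version of the two-layer model."""
    rng = np.random.default_rng(seed)

    def random_walks(q):
        # Simplified (5): (1 - L) F_t = u_t with u_t ~ N(0, I_q).
        return np.cumsum(rng.standard_normal((T, q)), axis=0)

    def ar1_errors(n):
        # Simplified (6): e_t = rho * e_{t-1} + nu_t; rho = 1 gives unit roots.
        nu = rng.standard_normal((T, n))
        e = np.zeros((T, n))
        e[0] = nu[0]
        for t in range(1, T):
            e[t] = rho * e[t - 1] + nu[t]
        return e

    Fc, Fx, Fz = random_walks(qc), random_walks(qx), random_walks(qz)
    # Loadings are stored as q x N arrays, matching the dimensions in the text.
    lam_cx = rng.standard_normal((qc, Nx))
    lam_cz = rng.standard_normal((qc, Nz))
    lam_x = rng.standard_normal((qx, Nx))
    lam_z = rng.standard_normal((qz, Nz))
    eps_x, eps_z = ar1_errors(Nx), ar1_errors(Nz)
    # Rows are time periods; alpha_x and alpha_z are set to zero.
    X = Fc @ lam_cx + Fx @ lam_x + eps_x
    Z = Fc @ lam_cz + Fz @ lam_z + eps_z
    return {"X": X, "Z": Z, "Fc": Fc, "Fx": Fx, "Fz": Fz,
            "lam_cx": lam_cx, "lam_cz": lam_cz, "lam_x": lam_x,
            "lam_z": lam_z, "eps_x": eps_x, "eps_z": eps_z}

sample = simulate_two_layer(T=120, Nx=10, Nz=15, qc=1, qx=2, qz=1)
```

Setting `rho=1.0` makes the idiosyncratic errors non-stationary as well, which is the variable-specific source of non-stationarity that the model permits.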

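In contrast to the three-layer model (2), the two-layer specification in (4) implies that $x_{it} = \alpha_{x,i} + \boldsymbol{\lambda}_{x,i}^{c\prime}\mathbf{F}_t^c + \boldsymbol{\lambda}_i^{x\prime}\mathbf{F}_t^x + \varepsilon_{i,t}^x$ and $z_{it} = \alpha_{z,i} + \boldsymbol{\lambda}_{z,i}^{c\prime}\mathbf{F}_t^c + \boldsymbol{\lambda}_i^{z\prime}\mathbf{F}_t^z + \varepsilon_{i,t}^z$; that is, the variables are driven by the same common factors from the first layer and by their own class-specific factors from the second layer. Instead of the top-down structure, the bottom-up hierarchical factor model proposed in Diebold, Li and Yue (2008), Ng and Moench (2010) or Moench et al. (2011) is specified as

$$\text{first layer: } \begin{cases} \mathbf{X}_t = \ddot{\boldsymbol{\alpha}}_x + \ddot{\boldsymbol{\lambda}}^{x\prime}\,\ddot{\mathbf{F}}_t^x + \ddot{\boldsymbol{\varepsilon}}_t^x \\ \mathbf{Z}_t = \ddot{\boldsymbol{\alpha}}_z + \ddot{\boldsymbol{\lambda}}^{z\prime}\,\ddot{\mathbf{F}}_t^z + \ddot{\boldsymbol{\varepsilon}}_t^z \end{cases}, \qquad \text{second layer: } \begin{cases} \ddot{\mathbf{F}}_t^x = \ddot{\boldsymbol{\lambda}}_x^{c\prime}\,\ddot{\mathbf{F}}_t^c + \ddot{\boldsymbol{\varepsilon}}_t^{cx} \\ \ddot{\mathbf{F}}_t^z = \ddot{\boldsymbol{\lambda}}_z^{c\prime}\,\ddot{\mathbf{F}}_t^c + \ddot{\boldsymbol{\varepsilon}}_t^{cz} \end{cases}.$$

Compared with the top-down structure, the first-layer factors $\ddot{\mathbf{F}}_t^x$ and $\ddot{\mathbf{F}}_t^z$ are obviously correlated with the second-layer factor $\ddot{\mathbf{F}}_t^c$, since $\ddot{\mathbf{F}}_t^c$ is the common factor of $\ddot{\mathbf{F}}_t^x$ and $\ddot{\mathbf{F}}_t^z$.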

The notable features of this model specification are as follows. First, the proposed specification is quite general and flexible, since non-stationarity is allowed via the factors in (5) or via the idiosyncratic errors when $\rho_i^m = 1$ in (6); cf. (3). Moreover, weak correlations between idiosyncratic errors and factor innovations, as well as possible dependence between $\mathbf{F}_t^x$ and $\mathbf{F}_t^z$, are unrestricted. Second, the numbers of factors in each layer, $q^c$, $q^x$ and $q^z$, are not presumed and will be determined by the data. Third, we do not need to worry about how to transform non-stationary data into stationary data when facing large dimensional data. Fourth, whether a series is stationary depends on the composite effect of the common factors of each layer and its idiosyncratic error. If at least one of its factors is non-stationary, a variable is non-stationary no matter how its idiosyncratic error behaves. On the other hand, if all factors of each layer are stationary, then the series is stationary only when its idiosyncratic error is stationary. Last but not least, after applying the typical unit root tests to the proposed model, we can tell whether the non-stationarity is pervasive (due to the factors $\mathbf{F}_t^c$), class-specific (due to the factors $\mathbf{F}_t^x$ or $\mathbf{F}_t^z$), variable-specific (due to the idiosyncratic error), or some (or all) of these. In addition, we can further address the issue of co-integration for each layer within this framework.

### 3.2 Estimation Procedure

For the proposed hierarchical factor models with non-stationary components, we consider a new estimation procedure that involves continuously updating PCA, extending the method of Bai and Ng (2004), which deals with the conventional factor model only. In what follows, for an unknown parameter (or variable) of interest, $\theta$ say, we denote its estimator at the $m$-th iteration of the continuously updating PCA by $\hat{\theta}(m)$. The proposed estimation procedure consists of the following steps:

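Step 1: Since not all variables in $\mathbf{X}_t$ and $\mathbf{Z}_t$ are stationary, we first difference the variables in (4):

$$\Delta\mathbf{Y}_t = \begin{bmatrix} \Delta\mathbf{X}_t \\ \Delta\mathbf{Z}_t \end{bmatrix} = \begin{bmatrix} \boldsymbol{\lambda}_x^{c\prime} \\ \boldsymbol{\lambda}_z^{c\prime} \end{bmatrix} \Delta\mathbf{F}_t^c + \begin{bmatrix} \boldsymbol{\lambda}^{x\prime}\Delta\mathbf{F}_t^x \\ \boldsymbol{\lambda}^{z\prime}\Delta\mathbf{F}_t^z \end{bmatrix} + \begin{bmatrix} \Delta\boldsymbol{\varepsilon}_t^x \\ \Delta\boldsymbol{\varepsilon}_t^z \end{bmatrix}, \tag{7}$$

where $\Delta Q_t = Q_t - Q_{t-1}$ for $Q_t = \mathbf{X}_t, \mathbf{Z}_t, \mathbf{F}_t^c, \mathbf{F}_t^x, \mathbf{F}_t^z, \boldsymbol{\varepsilon}_t^x$ and $\boldsymbol{\varepsilon}_t^z$.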

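Step 2: Under the identification condition $N^{-1}(\boldsymbol{\lambda}_x^{c\prime}\boldsymbol{\lambda}_x^c + \boldsymbol{\lambda}_z^{c\prime}\boldsymbol{\lambda}_z^c) = I_{q^c}$, where $I_{q^c}$ is the $q^c \times q^c$ identity matrix, the usual PCA yields the initial estimators of $[\boldsymbol{\lambda}_x^c\ \boldsymbol{\lambda}_z^c]'$, denoted $[\hat{\boldsymbol{\lambda}}_x^c(0)\ \hat{\boldsymbol{\lambda}}_z^c(0)]'$: they are $\sqrt{N}$ times the eigenvectors corresponding to the $q^c$ largest eigenvalues of the $N \times N$ matrix $\sum_{t=2}^{T} \Delta\mathbf{Y}_t \Delta\mathbf{Y}_t'$. Consequently, in the first layer of this hierarchical factor model, the initial estimator of $\Delta\mathbf{F}_t^c$ is

$$\widehat{\Delta\mathbf{F}_t^c}(0) = N^{-1}\left[\hat{\boldsymbol{\lambda}}_x^c(0)'\,\Delta\mathbf{X}_t + \hat{\boldsymbol{\lambda}}_z^c(0)'\,\Delta\mathbf{Z}_t\right],$$

for $t = 2, \ldots, T$.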

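Step 3: For the second layer of the proposed hierarchical factor model, the initial estimators of the loadings $\boldsymbol{\lambda}^x$ and $\boldsymbol{\lambda}^z$, denoted $\hat{\boldsymbol{\lambda}}^x(0)$ and $\hat{\boldsymbol{\lambda}}^z(0)$, as well as the initial estimators of the factors $\Delta\mathbf{F}_t^x$ and $\Delta\mathbf{F}_t^z$, denoted $\widehat{\Delta\mathbf{F}_t^x}(0)$ and $\widehat{\Delta\mathbf{F}_t^z}(0)$, can then be obtained by applying PCA as in Step 2 to $\Delta\mathbf{X}_t - \hat{\boldsymbol{\lambda}}_x^c(0)'\,\widehat{\Delta\mathbf{F}_t^c}(0)$ and $\Delta\mathbf{Z}_t - \hat{\boldsymbol{\lambda}}_z^c(0)'\,\widehat{\Delta\mathbf{F}_t^c}(0)$, respectively.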

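Step 4: Given the initial estimators of the factors and loadings for the second layer from the previous steps, we again apply PCA to the data

$$\begin{bmatrix} \Delta\mathbf{X}_t - \hat{\boldsymbol{\lambda}}^x(0)'\,\widehat{\Delta\mathbf{F}_t^x}(0) \\ \Delta\mathbf{Z}_t - \hat{\boldsymbol{\lambda}}^z(0)'\,\widehat{\Delta\mathbf{F}_t^z}(0) \end{bmatrix}, \tag{8}$$

which are the data after purging the estimated components of the second layer. The resulting PC estimators of the loadings ($[\boldsymbol{\lambda}_x^c\ \boldsymbol{\lambda}_z^c]'$) and factors ($\Delta\mathbf{F}_t^c$) in the first layer are denoted $[\hat{\boldsymbol{\lambda}}_x^c(1)\ \hat{\boldsymbol{\lambda}}_z^c(1)]'$ and $\widehat{\Delta\mathbf{F}_t^c}(1)$, respectively.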

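Step 5: Given the updated PC estimators $[\hat{\boldsymbol{\lambda}}_x^c(1)\ \hat{\boldsymbol{\lambda}}_z^c(1)]'$ and $\widehat{\Delta\mathbf{F}_t^c}(1)$ in the first layer, the updated PC estimators for the second layer can then be obtained by applying PCA to the purged data $\Delta\mathbf{X}_t - \hat{\boldsymbol{\lambda}}_x^c(1)'\,\widehat{\Delta\mathbf{F}_t^c}(1)$ and $\Delta\mathbf{Z}_t - \hat{\boldsymbol{\lambda}}_z^c(1)'\,\widehat{\Delta\mathbf{F}_t^c}(1)$, respectively. This gives the loading estimators $\hat{\boldsymbol{\lambda}}^x(1)$ and $\hat{\boldsymbol{\lambda}}^z(1)$, as well as the factor estimators $\widehat{\Delta\mathbf{F}_t^x}(1)$ and $\widehat{\Delta\mathbf{F}_t^z}(1)$.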

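Step 6: Continue updating the PC estimators by iterating between Step 4 and Step 5 until the estimators converge, at the $m^*$-th iteration, say. The resulting convergent estimators are denoted $\hat{\boldsymbol{\lambda}}_x^c \equiv \hat{\boldsymbol{\lambda}}_x^c(m^*)$, $\hat{\boldsymbol{\lambda}}_z^c \equiv \hat{\boldsymbol{\lambda}}_z^c(m^*)$, $\hat{\boldsymbol{\lambda}}^x \equiv \hat{\boldsymbol{\lambda}}^x(m^*)$, $\hat{\boldsymbol{\lambda}}^z \equiv \hat{\boldsymbol{\lambda}}^z(m^*)$, $\widehat{\Delta\mathbf{F}_t^c} \equiv \widehat{\Delta\mathbf{F}_t^c}(m^*)$, $\widehat{\Delta\mathbf{F}_t^x} \equiv \widehat{\Delta\mathbf{F}_t^x}(m^*)$ and $\widehat{\Delta\mathbf{F}_t^z} \equiv \widehat{\Delta\mathbf{F}_t^z}(m^*)$. Consequently, the estimators of $\mathbf{F}_t^c$, $\mathbf{F}_t^x$ and $\mathbf{F}_t^z$ are, respectively,

$$\hat{\mathbf{F}}_t^c = \sum_{s=2}^{t} \widehat{\Delta\mathbf{F}_s^c} = N^{-1}\left[\hat{\boldsymbol{\lambda}}_x^{c\prime}\,\mathbf{X}_t + \hat{\boldsymbol{\lambda}}_z^{c\prime}\,\mathbf{Z}_t\right], \qquad \hat{\mathbf{F}}_t^x = \sum_{s=2}^{t} \widehat{\Delta\mathbf{F}_s^x}, \qquad \hat{\mathbf{F}}_t^z = \sum_{s=2}^{t} \widehat{\Delta\mathbf{F}_s^z},$$

for $t = 2, \ldots, T$.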

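Step 7: Given the estimators of the factors and loadings from the previous steps, we have

$$\widehat{\Delta\boldsymbol{\varepsilon}_t^x} = \Delta\mathbf{X}_t - \hat{\boldsymbol{\lambda}}_x^{c\prime}\,\widehat{\Delta\mathbf{F}_t^c} - \hat{\boldsymbol{\lambda}}^{x\prime}\,\widehat{\Delta\mathbf{F}_t^x}, \qquad \widehat{\Delta\boldsymbol{\varepsilon}_t^z} = \Delta\mathbf{Z}_t - \hat{\boldsymbol{\lambda}}_z^{c\prime}\,\widehat{\Delta\mathbf{F}_t^c} - \hat{\boldsymbol{\lambda}}^{z\prime}\,\widehat{\Delta\mathbf{F}_t^z},$$

and the estimators of the idiosyncratic errors are then

$$\hat{\boldsymbol{\varepsilon}}_t^x = \sum_{s=2}^{t} \widehat{\Delta\boldsymbol{\varepsilon}_s^x}, \qquad \hat{\boldsymbol{\varepsilon}}_t^z = \sum_{s=2}^{t} \widehat{\Delta\boldsymbol{\varepsilon}_s^z}.$$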

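Step 8: Given all the estimators from the previous steps, the estimators of the constants are

$$\hat{\boldsymbol{\alpha}}_x = \mathbf{X}_t - \hat{\boldsymbol{\lambda}}_x^{c\prime}\,\hat{\mathbf{F}}_t^c - \hat{\boldsymbol{\lambda}}^{x\prime}\,\hat{\mathbf{F}}_t^x - \hat{\boldsymbol{\varepsilon}}_t^x, \qquad \hat{\boldsymbol{\alpha}}_z = \mathbf{Z}_t - \hat{\boldsymbol{\lambda}}_z^{c\prime}\,\hat{\mathbf{F}}_t^c - \hat{\boldsymbol{\lambda}}^{z\prime}\,\hat{\mathbf{F}}_t^z - \hat{\boldsymbol{\varepsilon}}_t^z.$$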
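The steps above can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions of ours rather than the author's implementation: data are stored as $T \times N$ arrays, loadings as $N \times q$ arrays (the transpose of the dimensions used in the text), the second-layer loadings are normalised by the class size, convergence is judged by the change in the first-layer common component, and Step 8 is omitted. The names `pc` and `iterative_pca` and the simulated example are hypothetical.

```python
import numpy as np

def pc(D, q, scale):
    """Top-q principal components of D (rows are periods, columns are series).

    Loadings are sqrt(scale) times the leading eigenvectors of D'D and the
    factors are D @ loadings / scale.
    """
    eigval, eigvec = np.linalg.eigh(D.T @ D)
    top = np.argsort(eigval)[::-1][:q]
    load = np.sqrt(scale) * eigvec[:, top]
    return D @ load / scale, load

def iterative_pca(X, Z, qc, qx, qz, tol=1e-8, max_iter=200):
    """Sketch of Steps 1-7 for T x Nx and T x Nz arrays X and Z."""
    Nx, Nz = X.shape[1], Z.shape[1]
    N = Nx + Nz
    dX, dZ = np.diff(X, axis=0), np.diff(Z, axis=0)          # Step 1
    dFc, Lc = pc(np.hstack([dX, dZ]), qc, N)                 # Step 2
    for _ in range(max_iter):
        common = dFc @ Lc.T
        # Steps 3 and 5: PCA on each class after removing the global part.
        dFx, Lx = pc(dX - common[:, :Nx], qx, Nx)
        dFz, Lz = pc(dZ - common[:, Nx:], qz, Nz)
        # Step 4: PCA on the data purged of the class-specific parts.
        purged = np.hstack([dX - dFx @ Lx.T, dZ - dFz @ Lz.T])
        dFc, Lc = pc(purged, qc, N)
        # Step 6: stop once the global common component settles; comparing
        # products sidesteps the sign indeterminacy of eigenvectors.
        if np.max(np.abs(dFc @ Lc.T - common)) < tol:
            break
    common = dFc @ Lc.T
    d_eps_x = dX - common[:, :Nx] - dFx @ Lx.T               # Step 7
    d_eps_z = dZ - common[:, Nx:] - dFz @ Lz.T
    cum = lambda d: np.cumsum(d, axis=0)                     # levels for t = 2..T
    return {"dFc": dFc, "Lc": Lc, "Lx": Lx, "Lz": Lz,
            "Fc": cum(dFc), "Fx": cum(dFx), "Fz": cum(dFz),
            "eps_x": cum(d_eps_x), "eps_z": cum(d_eps_z)}

# Hypothetical example: one random-walk factor per layer, stationary errors.
rng = np.random.default_rng(1)
T, Nx, Nz = 150, 40, 40
walk = lambda: np.cumsum(rng.standard_normal((T, 1)), axis=0)
Fc, Fx, Fz = walk(), walk(), walk()
X = (Fc @ rng.standard_normal((1, Nx)) + Fx @ rng.standard_normal((1, Nx))
     + 0.3 * rng.standard_normal((T, Nx)))
Z = (Fc @ rng.standard_normal((1, Nz)) + Fz @ rng.standard_normal((1, Nz))
     + 0.3 * rng.standard_normal((T, Nz)))
est = iterative_pca(X, Z, qc=1, qx=1, qz=1)
# Squared correlation between the true and estimated differenced global factor.
r2 = np.corrcoef(np.diff(Fc[:, 0]), est["dFc"][:, 0])[0, 1] ** 2
```

Each pass solves a least-squares problem for one layer given the other, so the overall residual sum of squares cannot increase from one iteration to the next; the number of iterations needed in practice depends on the data.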

Essentially, consistent estimators of these possibly non-stationary factors are obtained by continuously updating the PC estimators for the first-differenced data from the top layer to the bottom layer. Note also that the number of factors in each layer cannot be consistently estimated by applying the information criteria of Bai and Ng (2002) directly in sequence. Consequently, we consider the following modified procedure, as suggested in Wang (2010).

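For the first layer, the estimator of the number of factors $q^c$ can be constructed as

$$\hat{q}^c = \hat{q}_1 + \hat{q}_2 - \hat{q}_3, \tag{9}$$

where

$$\hat{q}_1 = \arg\min_{0 \le q \le \bar{q}} \ln(S_x(q)) + q \cdot P(N_x, T),$$

$$\hat{q}_2 = \arg\min_{0 \le q \le \bar{q}} \ln(S_z(q)) + q \cdot P(N_z, T),$$

$$\hat{q}_3 = \arg\min_{0 \le q \le \bar{q}} \ln(S_c(q)) + q \cdot P(N_c, T),$$

with some pre-specified value $\bar{q}$, a penalty function $P(N, T)$, and

$$S_x(q) = (N_x T)^{-1} \sum_{t=2}^{T} \left(\Delta\mathbf{X}_t - \boldsymbol{\chi}_x(q)\right)'\left(\Delta\mathbf{X}_t - \boldsymbol{\chi}_x(q)\right),$$

$$S_z(q) = (N_z T)^{-1} \sum_{t=2}^{T} \left(\Delta\mathbf{Z}_t - \boldsymbol{\chi}_z(q)\right)'\left(\Delta\mathbf{Z}_t - \boldsymbol{\chi}_z(q)\right),$$

$$S_c(q) = (N_c T)^{-1} \sum_{t=2}^{T} \left(\Delta\mathbf{Y}_t - \boldsymbol{\chi}_c(q)\right)'\left(\Delta\mathbf{Y}_t - \boldsymbol{\chi}_c(q)\right),$$

which are the sums of squared residuals between the specified data and the corresponding common components. Specifically, $\boldsymbol{\chi}_x(q)$, $\boldsymbol{\chi}_z(q)$ and $\boldsymbol{\chi}_c(q)$ are common components of rank $q$, estimated by applying PCA to the data $\Delta\mathbf{X}_t$, $\Delta\mathbf{Z}_t$ and $\Delta\mathbf{Y}_t$, respectively. Given $\hat{q}^c$, the estimator of the number of factors in the first layer, the numbers of factors in the second layer can be determined accordingly as

$$\hat{q}^x = \hat{q}_1 - \hat{q}^c, \qquad \hat{q}^z = \hat{q}_2 - \hat{q}^c. \tag{10}$$

As claimed in Wang (2010), $\hat{q}^c$, $\hat{q}^x$ and $\hat{q}^z$ are consistent for $q^c$, $q^x$ and $q^z$, respectively.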
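The counting rule in (9) and (10) can be sketched as follows. The text leaves the penalty function $P(N, T)$ unspecified, so the sketch assumes one of the penalties of Bai and Ng (2002), $\frac{N+T}{NT}\ln\frac{NT}{N+T}$; that choice, the function names, and the simulated example (which borrows the design $q^c = 1$, $q^x = 2$, $q^z = 3$ with mean-one loadings from the simulation section) are assumptions made here for illustration. The sums of squared residuals are normalised by the number of differenced observations, which shifts the log criterion by a constant and leaves the minimiser unchanged.

```python
import numpy as np

def ssr_path(D, q_max):
    """S(q) for q = 0..q_max: average squared residual of D after removing
    its rank-q principal-component approximation."""
    n_obs, n = D.shape
    eigval = np.sort(np.linalg.eigvalsh(D.T @ D))[::-1]
    total = eigval.sum()
    # The residual sum of squares equals the sum of the discarded eigenvalues.
    return np.array([(total - eigval[:q].sum()) / (n * n_obs)
                     for q in range(q_max + 1)])

def penalty(n, n_obs):
    # Assumed penalty, taken from Bai and Ng (2002); not specified in the text.
    return (n + n_obs) / (n * n_obs) * np.log(n * n_obs / (n + n_obs))

def select_q(D, q_max):
    """Minimise ln S(q) + q * P over q = 0..q_max (q_max must stay below the
    rank of D so that S(q) remains positive)."""
    n_obs, n = D.shape
    crit = np.log(ssr_path(D, q_max)) + np.arange(q_max + 1) * penalty(n, n_obs)
    return int(np.argmin(crit))

def layer_counts(dX, dZ, q_max):
    """Return (q_c, q_x, q_z) following equations (9) and (10)."""
    q1 = select_q(dX, q_max)
    q2 = select_q(dZ, q_max)
    q3 = select_q(np.hstack([dX, dZ]), q_max)
    qc = q1 + q2 - q3
    return qc, q1 - qc, q2 - qc

# Hypothetical differenced data with one global factor, two X-specific
# factors and three Z-specific factors.
rng = np.random.default_rng(0)
T, Nx, Nz = 100, 80, 80
load = lambda q, n: 1.0 + rng.standard_normal((q, n))   # mean one, unit variance
fc = rng.standard_normal((T, 1))
fx = rng.standard_normal((T, 2))
fz = rng.standard_normal((T, 3))
dX = fc @ load(1, Nx) + fx @ load(2, Nx) + rng.standard_normal((T, Nx))
dZ = fc @ load(1, Nz) + fz @ load(3, Nz) + rng.standard_normal((T, Nz))
counts = layer_counts(dX, dZ, q_max=8)
```

The logic of (9) is an inclusion-exclusion count: the two classes together contain $q^c + q^x$ and $q^c + q^z$ factors, the pooled data contain $q^c + q^x + q^z$, so the global factors are exactly those counted twice.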

### 4 Asymptotic Properties

Because the proposed estimation procedure, which constructs factors and loadings sequentially from the top layer to the bottom layer, is new in the literature for hierarchical factor models with possibly non-stationary components, the corresponding properties need to be established carefully. Let $\|A\| = \mathrm{trace}(A'A)^{1/2}$. Extending the assumptions of Bai and Ng (2004) and Wang (2010), the regularity conditions for the proposed estimation to be valid are:

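A-1: (i) $\boldsymbol{\lambda}_x^c$ ($q^c \times N_x$), $\boldsymbol{\lambda}_z^c$ ($q^c \times N_z$), $\boldsymbol{\lambda}^x$ ($q^x \times N_x$) and $\boldsymbol{\lambda}^z$ ($q^z \times N_z$) are non-random and satisfy $\|\boldsymbol{\lambda}_x^c\| < \infty$, $\|\boldsymbol{\lambda}_z^c\| < \infty$, $\|\boldsymbol{\lambda}^x\| < \infty$ and $\|\boldsymbol{\lambda}^z\| < \infty$; (ii) $(\boldsymbol{\lambda}_x^c\boldsymbol{\lambda}_x^{c\prime} + \boldsymbol{\lambda}_z^c\boldsymbol{\lambda}_z^{c\prime})/(N_x + N_z)$ converges to a $q^c \times q^c$ positive definite matrix, $(\boldsymbol{\lambda}^x\boldsymbol{\lambda}^{x\prime})/N_x$ converges to a $q^x \times q^x$ positive definite matrix, and $(\boldsymbol{\lambda}^z\boldsymbol{\lambda}^{z\prime})/N_z$ converges to a $q^z \times q^z$ positive definite matrix; (iii) the rank of $[\boldsymbol{\lambda}_x^{c\prime}\ \boldsymbol{\lambda}^{x\prime}]$ is $q^c + q^x$ and the rank of $[\boldsymbol{\lambda}_z^{c\prime}\ \boldsymbol{\lambda}^{z\prime}]$ is $q^c + q^z$.

A-2: (i) $\mathrm{Var}[\Delta\mathbf{F}_t^{c\prime}\ \Delta\mathbf{F}_t^{x\prime}\ \Delta\mathbf{F}_t^{z\prime}]'$ has rank $q^c + q^x + q^z$. For $m = c, x, z$: (ii) $\mathbf{u}_t^m \sim \mathrm{iid}(\mathbf{0}, \boldsymbol{\Sigma}_{u^m})$, $\mathbb{E}\|\mathbf{u}_t^m\|^4 < \infty$; (iii) $\mathrm{Var}(\Delta\mathbf{F}_t^m) = \sum_{l=0}^{\infty} \phi_l^m \boldsymbol{\Sigma}_{u^m} \phi_l^{m\prime} > 0$; (iv) $\sum_{l=0}^{\infty} l\,\|\phi_l^m\| < \infty$; (v) $\boldsymbol{\Phi}^m(1)$ has rank $q_1^m$, $0 \le q_1^m \le q^m$.

A-3: For $m = x, z$: (i) $\nu_{i,t}^m \sim \mathrm{iid}(0, \sigma_{\nu^m}^2)$, $\mathbb{E}|\nu_{i,t}^m|^8 < \infty$, $\sum_{l=0}^{\infty} l\,\|\theta_{i,l}^m\| < \infty$, $\Theta_i^m(1)^2 \sigma_{\nu^m}^2 > 0$; (ii) $\sum_{i=1}^{N_m} |\mathbb{E}[\nu_{i,t}^m \nu_{j,t}^m]| < \infty$ for all $j$; (iii) $\mathbb{E}\left|N_m^{-1/2} \sum_{i=1}^{N_m} \left[\nu_{i,t}^m \nu_{i,s}^m - \mathbb{E}[\nu_{i,t}^m \nu_{i,s}^m]\right]\right|^4 < \infty$ for all $(t, s)$; (iv) Σ_{i=1}^{N_x} Σ_{i=1}^{N_z} |IE[ν^x

A-4: (i) $\mathbf{u}_t^c$ is independent of $\mathbf{u}_t^x$, $\mathbf{u}_t^z$ and $\nu_{i,t}^m$, $m = x, z$; (ii) for $m = x, z$, $\mathbf{u}_t^m$ is independent of $\nu_{i,t}^m$.

A-5: For $m = c, x, z$, $\mathbb{E}[\|\mathbf{F}_0^m\|] < \infty$ and $\mathbb{E}[|\nu_{i,0}^m|] < \infty$, $i = 1, \ldots, N_m$.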

Note that, because of the multi-layer top-down factor structure, the above regularity conditions differ slightly from those in Bai and Ng (2004), which focuses on a model with a one-layer factor structure; we need to regulate some relationships among factors, factor innovations and errors both within and between layers. First, as in Bai and Ng (2002), we only consider non-random factor loadings for each layer, for simplicity, in Assumption A-1(i). Assumption A-1(ii) ensures that $\mathbf{F}_t^c$, $\mathbf{F}_t^x$ and $\mathbf{F}_t^z$ each make a nontrivial contribution to the variances of $\mathbf{Y}_t$ (all variables), of $\mathbf{X}_t$ (class-specific variables) and of $\mathbf{Z}_t$ (class-specific variables), respectively. Assumption A-1(iii), as in Assumption B of Wang (2010), guarantees enough heterogeneity among the individual variables within classes $x$ or $z$ when responding to both common and class-specific factors; the factor structure for each layer is thus identifiable. It also implies that only the global factors $\mathbf{F}_t^c$ are common to all variables, while the class-specific factors $\mathbf{F}_t^x$ and $\mathbf{F}_t^z$ are not pervasive enough to be counted as global factors. Positive definite short-run variances and reduced-rank long-run variances of $\Delta\mathbf{F}_t^m$ are regulated under Assumption A-2 for all factors; $q_1^m$ common stochastic trends and $q^m - q_1^m$ stationary factors are allowed for $m = c, x, z$. For the error innovations, Assumption A-3 allows some weak serial and cross-section correlations in $(1 - \rho_i^m L)\varepsilon_{i,t}^m$, since $\rho_i^m$ can differ across $i$ within class $m$, $m = x, z$; similarly, the weak correlations between class-specific error innovations are regulated in A-3(iv). Under Assumption A-4, the global factor innovations $\mathbf{u}_t^c$ are independent of the other factor and error innovations, and the class-specific factor and error innovations are also independent within class $m$, $m = x, z$, while all these innovations themselves are serially independent, as regulated in Assumptions A-2(ii) and A-3(i). Assumption A-5 regulates some initial conditions commonly used in unit root analysis. For more detailed discussions of these assumptions, see also Bai and Ng (2004) and Wang (2010).

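Given these assumptions, the following result follows immediately.

Theorem 4.1 Given Assumptions A-1 to A-5, and $N_x/N \to \delta$, $0 < \delta < 1$, as $N, N_x, N_z, T \to \infty$,

(a) $\hat{q}^m \xrightarrow{P} q^m$ if (i) $P(N_m, T) \to 0$ and (ii) $C_{N_m,T}^2\, P(N_m, T) \to \infty$, where $C_{N_m,T} = \min\{\sqrt{N_m}, \sqrt{T}\}$.

(b) When $q^m = 1$, $\hat{\mathbf{F}}_t^m \xrightarrow{P} \delta\,\mathbf{F}_t^m$ with some constant $\delta$, and $\hat{\varepsilon}_{it}^m \xrightarrow{P} \varepsilon_{it}^m$ for all $i = 1, \ldots, N_m$;

(c) When $q^m > 1$, the space spanned by $\hat{\mathbf{F}}_t^m$ is a consistent estimate of the space spanned by $\mathbf{F}_t^m$;

(d) $\hat{\boldsymbol{\alpha}}_x \xrightarrow{P} \boldsymbol{\alpha}_x$ and $\hat{\boldsymbol{\alpha}}_z \xrightarrow{P} \boldsymbol{\alpha}_z$.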

In essence, Theorem 4.1 (a) is obtained by applying Theorem 2 of Bai and Ng (2002) and the results claimed by Wang (2010); Theorem 4.1 (b) and (c) can be proved by using Lemma 1 and Lemma 2 of Bai and Ng (2004); and the proof of Theorem 4.1 (d) is then straightforward provided that Theorem 4.1 (a), (b) and (c) hold. Note that, compared with the estimation and asymptotics for the PANIC model of Bai and Ng (2004), the major concern about the proposed procedure is that the "estimation errors" of the factors and loadings for the first layer would affect the properties of the estimators for the second layer, and the induced composite effect would further affect the estimates in Step 7 and Step 8. However, these "estimation errors" become smaller and smaller in practice after each iteration of this continuously updating PC procedure, and in theory they can be neglected if the above regularity conditions and the growth orders of $N_m$ and $T$ hold.

Since the proposed model allows for possibly non-stationary components, it is important and interesting to detect whether they are stationary or not based on the proposed estimates. Basically, for the estimates of the idiosyncratic errors of every variable (Step 7) and the estimates of the factors for each layer when $q^c = q^x = q^z = 1$, Theorem 4.1 (b) guarantees that it is valid to employ the typical unit root tests, such as the Dickey-Fuller test or the Phillips-Perron test, for inference. The problem is much more complicated, however, when the number of factors in a layer is greater than one. Take the first layer of the model as an example and assume, without loss of generality, that there are $q_1^c$ non-stationary factors and $(q^c - q_1^c)$ stationary ones when $q^c > 1$. Even though $q^c$ can be consistently estimated by $\hat{q}^c$ via (9), the number of common non-stationary trends will still be overstated if we test each of the first $\hat{q}^c$ factor estimates individually for the presence of a unit root. The reason is that, when there is more than one factor, we can only consistently estimate the space spanned by the factors rather than identify the factors themselves; cf. Theorem 4.1 (c). Any rotation of these factors by a full-rank matrix yields the same space. This immediately implies that more than one common non-stationary trend could be detected after rotating the factors even if $q_1^c$ is exactly equal to one; a similar argument can be found in Bai and Ng (2004). This complicated case is currently under consideration. Moreover, if, for some layer, an estimated factor is found to be stationary while some variables within this layer are non-stationary, this factor represents a co-integration relationship between these non-stationary variables. This long-run relationship may help us to make further inference on various topics.
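The rotation argument can be illustrated numerically. In the hypothetical sketch below, a two-dimensional factor space contains exactly one random walk and one stationary factor; after a full-rank rotation, both rotated factors inherit the random walk. The first-order autoregressive coefficient is used only as a rough stand-in for a formal unit root test, and the names, sample size and rotation angle are assumptions of ours.

```python
import numpy as np

def ar1_coef(x):
    """First-order autoregressive coefficient of the demeaned series x."""
    x = x - x.mean()
    return float(x[1:] @ x[:-1] / (x[:-1] @ x[:-1]))

rng = np.random.default_rng(0)
T = 500
trend = np.cumsum(rng.standard_normal(T))    # the single non-stationary factor
noise = rng.standard_normal(T)               # a stationary factor
F = np.column_stack([trend, noise])

theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # full-rank rotation matrix
G = F @ R                                    # spans the same space as F

original = [ar1_coef(F[:, j]) for j in range(2)]
rotated = [ar1_coef(G[:, j]) for j in range(2)]
```

In `original`, only the first coefficient is close to one, whereas in `rotated` both are, even though `F` and `G` span the same space. Testing each estimated factor separately would therefore suggest two stochastic trends where there is only one.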

### 5 Simulations

Some selected simulation results are reported in what follows. In the first simulation, we investigate the performance of the estimated number of factors for each layer based on (9) and (10). For the proposed model (4), the settings for the factors and error innovations are

$$\mathbf{F}_t^c \sim N(\mathbf{0}, I_{q^c}), \quad \mathbf{F}_t^x \sim N(\mathbf{0}, \sigma^2 I_{q^x}), \quad \mathbf{F}_t^z \sim N(\mathbf{0}, \sigma^2 I_{q^z}),$$

$$\boldsymbol{\varepsilon}_t^x \sim N(\mathbf{0}, I_{N_x}), \quad \boldsymbol{\varepsilon}_t^z \sim N(\mathbf{0}, I_{N_z}),$$

where $e_q$ is the $q \times 1$ vector of ones and $I_q$ is the $q \times q$ identity matrix; all the loadings follow the normal distribution with mean one and unit variance. The parameter settings in the selected case are $T = 50$ or $100$, $N_x = N_z = 20$ or $80$, $q^c = 1$, $q^x = 2$ and $q^z = 3$. The accuracy rates of the procedure suggested in Wang (2010) for various values of $\sigma$ are reported in Table 1.

The results in Table 1 show that the procedure works quite well, except in the case $\sigma = 0.25$. When $\sigma = 0.25$, the fluctuation of the class-specific factors is much smaller than that of the corresponding error innovations with unit variance, so PCA, by the nature of its objective, cannot separate the common factors from the errors. However, the performance improves substantially once $\sigma \ge 0.5$. In sum, this simulation shows that the procedure based on (9) and (10) yields very good estimates of the number of common factors in each layer as long as the fluctuation of the factors relative to that of the corresponding error innovations is not too small.

Table 1: Accuracy rates for selecting the number of common factors.

| Setting | Factor | σ = 0.25 | 0.5 | 0.75 | 1 | 1.25 | 1.5 | 2 |
|---|---|---|---|---|---|---|---|---|
| (A) T = 50, Nx = Nz = 20 | $\mathbf{F}_t^c$ | 0.5396 | 0.9373 | 0.9986 | 0.9990 | 0.9993 | 0.9992 | 0.9994 |
| | $\mathbf{F}_t^x$ | 0.0595 | 0.9315 | 0.9992 | 0.9993 | 0.9994 | 0.9996 | 0.9997 |
| | $\mathbf{F}_t^z$ | 0.0559 | 0.9342 | 0.9992 | 0.9997 | 0.9999 | 0.9996 | 0.9997 |
| (B) T = 50, Nx = Nz = 80 | $\mathbf{F}_t^c$ | 0.2178 | 0.9992 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| | $\mathbf{F}_t^x$ | 0.0702 | 0.9992 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| | $\mathbf{F}_t^z$ | 0.0643 | 0.9992 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| (C) T = 100, Nx = Nz = 20 | $\mathbf{F}_t^c$ | 0.5610 | 0.9813 | 0.9999 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| | $\mathbf{F}_t^x$ | 0.0786 | 0.9816 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| | $\mathbf{F}_t^z$ | 0.0804 | 0.9809 | 0.9999 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| (D) T = 100, Nx = Nz = 80 | $\mathbf{F}_t^c$ | 0.2851 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| | $\mathbf{F}_t^x$ | 0.2404 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| | $\mathbf{F}_t^z$ | 0.2403 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |

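Table 2: $R^2$ for the estimated and true factors.

| | T = 50, Nx = Nz = 20 | | | T = 200, Nx = Nz = 80 | | |
|---|---|---|---|---|---|---|
| σ | 0.5 | 1 | 2 | 0.5 | 1 | 2 |
| $\widetilde{\mathbf{F}}^c_t(0)$ | 0.94 | 0.73 | 0.19 | 0.95 | 0.74 | 0.17 |
| $\widehat{\mathbf{F}}^c_t$ | 0.98 | 0.98 | 0.98 | 0.98 | 0.98 | 0.98 |
| $\widetilde{\mathbf{F}}^x_t(0)$ | 0.82 | 0.78 | 0.49 | 0.85 | 0.81 | 0.51 |
| $\widehat{\mathbf{F}}^x_t$ | 0.84 | 0.87 | 0.74 | 0.86 | 0.89 | 0.78 |
| $\widetilde{\mathbf{F}}^z_t(0)$ | 0.83 | 0.77 | 0.48 | 0.85 | 0.81 | 0.52 |
| $\widehat{\mathbf{F}}^z_t$ | 0.84 | 0.86 | 0.73 | 0.86 | 0.89 | 0.78 |

NOTE: $\widetilde{\mathbf{F}}^m_t(0)$ and $\widehat{\mathbf{F}}^m_t$ represent the initial and the convergent estimates for
the factor $\mathbf{F}^m_t$, m = c, x, z.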

In the second simulation, we investigate the performance of the estimation procedure proposed
in Section 3.2. We compare the initial PC estimates of the factors
($\widetilde{\mathbf{F}}^m_t(0)$ for m = c, x, z) with the continuously updated PC estimates
($\widehat{\mathbf{F}}^m_t$ for m = c, x, z). In this selected result, we consider qc = qx = qz = 1 and
(T = 50, Nx = Nz = 20) or (T = 200, Nx = Nz = 80). We report the $R^2$ from regressing the true factor
on the estimated one. The $R^2$ of the initial PC estimates depends heavily
on σ, the standard deviation of the class-specific factors. In particular, when σ = 2, so that the
class-specific factors are twice as volatile as the common factor $\mathbf{F}^c_t$, the $R^2$ values for
the initial PC estimate $\widetilde{\mathbf{F}}^c_t(0)$ are below 0.2. This reflects the fact that applying
PCA only once cannot separate the common factor from the class-specific factors when the latter are much
more volatile. However, once we continuously update the estimates based on Steps 4 and 5 of the procedure
until they converge, the results in Table 2 show that the $R^2$ for $\widehat{\mathbf{F}}^c_t$ is 0.98
even in the case with σ = 2. The proposed estimation procedure can thus yield consistent estimates for the
factors in each layer.
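The sensitivity of a one-shot PC estimate to σ can be illustrated with the following sketch. It is not the procedure of Section 3.2: the iterative Steps 4 and 5 are not reproduced, and taking the initial estimate of the common factor to be the first principal component of the pooled panel is an assumption made here. The helper names `first_pc`, `r_squared` and `initial_r2` are hypothetical.

```python
import numpy as np

def first_pc(data):
    """First principal-component factor estimate (T x 1) of a T x N panel."""
    centered = data - data.mean(axis=0)
    u, _, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :1] * np.sqrt(data.shape[0])

def r_squared(y, x):
    """R^2 from an OLS regression of y on x with an intercept."""
    design = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1.0 - resid.var() / y.var()

def initial_r2(T, N, sigma, seed=0):
    """R^2 of the true common factor on a one-shot pooled PC estimate (qc = qx = qz = 1)."""
    rng = np.random.default_rng(seed)
    Fc = rng.standard_normal(T)
    Fx = sigma * rng.standard_normal(T)
    Fz = sigma * rng.standard_normal(T)
    # Loadings are normal with mean one and unit variance, errors are standard normal.
    X = (np.outer(Fc, rng.normal(1.0, 1.0, N))
         + np.outer(Fx, rng.normal(1.0, 1.0, N))
         + rng.standard_normal((T, N)))
    Z = (np.outer(Fc, rng.normal(1.0, 1.0, N))
         + np.outer(Fz, rng.normal(1.0, 1.0, N))
         + rng.standard_normal((T, N)))
    pooled = np.hstack([X, Z])
    return r_squared(Fc, first_pc(pooled))

# Average over a few replications for a small and a large sigma.
for sigma in (0.5, 2.0):
    mean_r2 = np.mean([initial_r2(50, 20, sigma, seed=s) for s in range(20)])
    print(f"sigma = {sigma}: mean R^2 of the one-shot estimate = {mean_r2:.2f}")
```

Under these assumptions the one-shot estimate should track the common factor closely for small σ and poorly for large σ, which is the pattern that motivates the updating steps.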

### 6

### Concluding Remarks

In this paper, we study hierarchical factor models with possibly non-stationary components. We show how to estimate this kind of factor model (as well as the number of factors in each layer), the conditions required for consistent estimation, and the corresponding asymptotic theory. Several simulations are designed to investigate the finite-sample performance of the proposed procedure, and the selected results (as well as other unreported ones) show that the proposed procedure works very well. Two empirical studies based on the proposed model are also considered: one focuses on the global CDS (Credit Default Swap) indexes, and the other deals with the GVAR dataset, which involves several real- and nominal-sector variables around the world. The preliminary results for the estimated global and regional factors are quite interesting, and the corresponding academic papers are now in progress.

### References

Bai, J. and P. Wang (2012), Identification and estimation of dynamic factor models, Working paper.

Bai, J. and S. Ng (2002), Determining the number of factors in approximate factor models, Econometrica, 70(1), 191–221.

Bai, J. and S. Ng (2004), A PANIC attack on unit roots and cointegration, Econometrica, 72(4), 1127–1177.

Bai, J. and S. Ng (2008), Large dimensional factor analysis, Foundations and Trends in Econometrics, 3, 89–163.

Beck, G.W., K. Hubrich and M. Marcellino (2009), Regional inflation dynamics within and across Euro Area and a comparison with the United States, Economic Policy, 143–184.

Bernanke, B.S., J. Boivin and P. Eliasz (2005). Measuring the effects of monetary policy: A factor-augmented vector autoregressive (FAVAR) approach, Quarterly Journal of Economics, 120, 387–422.

comovement, Review of Finance, 10, 69–98.

Crucini, M.J., M.A. Kose and C. Otrok (2011), What are the driving forces of international business cycles? Review of Economic Dynamics, 14, 156–175.

Del Negro, M. and C. Otrok (2008), Dynamic factor models with time-varying parameters: measuring changes in international business cycles, Working paper.

Diebold, F.X., C. Li and V.Z. Yue (2008), Global yield curve dynamics and interactions: a dynamic Nelson-Siegel approach, Journal of Econometrics, 146, 351–363.

Forni, M. and L. Reichlin (2001), Federal policies and local economies: Europe and the US, European Economic Review, 45, 109–134.

Fu, D. (2007), National, regional and metro-specific factors of the U.S. housing market, Working paper.

Gregory, A.W., A.C. Head, and J. Raynauld (1997), Measuring world business cycles, International Economic Review, 38(3), 677–701.

He, D. and W. Liao (2011), Asian business cycle synchronization, Pacific Economic Review, 17(1), 106–135.

Hirata, H., M.A. Kose and C. Otrok (2011), Regionalization vs. globalization, Working paper.

Kose, M.A., C. Otrok, and C.H. Whiteman (2003), International business cycles: world, region, and country-specific factors, The American Economic Review, 93(4), 1216–1239.

Kose, M.A., C. Otrok, and C.H. Whiteman (2008), Understanding the evolution of world business cycles, Journal of International Economics, 75, 110–130.

Lee, J. (2012), Measuring business cycle comovements in Europe: Evidence from a dynamic factor model with time-varying parameters, Economics Letters, 115, 438–440.

Moench, E., S. Ng and M. Potter (2011), Dynamic hierarchical factor models, Working paper.

Mumtaz, H., S. Simonelli and P. Surico (2011), International comovements, business cycle and inflation: A historical perspective, Review of Economic Dynamics, 14, 176–198.

Ng, S. and E. Moench (2010), A hierarchical factor analysis of US housing market dynamics, Working paper.

Pesaran, M.H. and Y. Shin (1999), An autoregressive distributed lag modelling approach to cointegration analysis, in S. Strom (ed), Econometrics and Economic Theory in the 20th Century: The Ragnar Frisch Centennial Symposium, Cambridge, Cambridge University Press, Ch. 11, 371–413.

Stock, J.H. and M.W. Watson (1998), Diffusion indexes, NBER Working Paper 6702.

Stock, J.H. and M.W. Watson (2002), Macroeconomic forecasting using diffusion indexes, Journal of Business and Economic Statistics, 20, 147-162.

Stock, J.H. and M.W. Watson (2006), Forecasting with many predictors, In G. Elliott, C. W. J. Granger, and A. Timmermann (Eds.): Handbook of Economic Forecasting, Volume 1, Amsterdam: Elsevier, 515-554.

Stock, J.H. and M.W. Watson (2008), The evolution of national and regional factors in U.S., Working paper.

Wang, P. (2010), Large Dimensional Factor Models with a Multi-Level Factor Structure: Identification, Estimation and Inference, Working paper.

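## Ministry of Science and Technology Funded Project: Derived R&D Results Dissemination Form

Date: 2015/10/21

### Ministry of Science and Technology Funded Project

Project title: Hierarchical Factor Models with Non-Stationary Series

Principal investigator: 徐士勛

Project number: 102-2410-H-004-019-MY2

Field: Mathematical and Quantitative Methods

### No R&D results to disseminate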

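### Summary of Research Results for the Year 102 Research Project

Principal investigator: 徐士勛

Project number: 102-2410-H-004-019-MY2

Project title: Hierarchical Factor Models with Non-Stationary Series

| Category | Item | Achieved (accepted or published) | Expected total (including achieved) | Actual contribution of this project | Unit | Remarks |
|---|---|---|---|---|---|---|
| Domestic: publications | Journal papers | 0 | 0 | 100% | papers | |
| | Research/technical reports | 1 | 1 | 100% | papers | Based on the results of this project, we divide the output into two parts, one focusing on the theoretical model and its properties and the other on empirical studies; the related academic papers are currently being organized and written. |
| | Conference papers | 0 | 0 | 100% | papers | |
| | Books | 0 | 0 | 100% | chapters/volumes | |
| Domestic: patents | Applications pending | 0 | 0 | 100% | items | |
| | Granted | 0 | 0 | 100% | items | |
| Domestic: technology transfer | Cases | 0 | 0 | 100% | items | |
| | Royalties | 0 | 0 | 100% | thousand NT$ | |
| Project personnel (domestic nationals) | Master's students | 4 | 0 | 100% | person-times | To let the project proceed smoothly and to accommodate the actual student situation, the two part-time doctoral assistant positions originally requested per year were changed to one doctoral student and four master's-level part-time assistants. |
| | Doctoral students | 1 | 2 | 100% | person-times | Same as above. |
| | Postdoctoral researchers | 0 | 0 | 100% | person-times | |
| | Full-time assistants | 0 | 0 | 100% | person-times | |
| Publications | Conference papers | 0 | 0 | 100% | | |
| | Books | 0 | 0 | 100% | chapters/volumes | |
| Patents | Applications pending | 0 | 0 | 100% | items | |
| | Granted | 0 | 0 | 100% | items | |
| Technology transfer | Cases | 0 | 0 | 100% | items | |
| | Royalties | 0 | 0 | 100% | thousand NT$ | |
| Project personnel (foreign nationals) | Master's students | 0 | 0 | 100% | person-times | |
| | Doctoral students | 0 | 0 | 100% | person-times | |
| | Postdoctoral researchers | 0 | 0 | 100% | person-times | |
| | Full-time assistants | 0 | 0 | 100% | person-times | |

Other results (results that cannot be quantified, such as organizing academic activities, awards received, major international collaborations, international impact of the research, and other concrete contributions to industrial technology development): None

Additional items for Department of Science Education projects:

| Item | Quantity | Name or brief description |
|---|---|---|
| Assessment tools (qualitative and quantitative) | 0 | |
| Courses/modules | 0 | |
| Computer and network systems or tools | 0 | |
| Teaching materials | 0 | |
| Activities/competitions held | 0 | |
| Seminars/workshops | 0 | |
| Newsletters, websites | 0 | |
| Number of participants (audience) in dissemination of project results | 0 | |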