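The Endogeneity Problem in Quantile Regression Analysis of Panel Data

National Science Council, Executive Yuan: Research Project Final Report

Research Report (Condensed Version)

Project type: Individual project
Project number: NSC 100-2410-H-004-071-
Project period: August 1, 2011 to July 31, 2012
Host institution: Department of Economics, National Chengchi University
Principal investigator: 林馨怡
Project personnel: 李婉璘 (master's student, part-time assistant); 劉冬威 (doctoral student, part-time assistant)
Public access: This project is open to public inquiry

October 31, 2012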


Chinese abstract: None
Chinese keywords: None

English abstract: This project develops a two-stage estimation of a panel data quantile regression model with an endogenous explanatory variable. The regressors in the model include a lagged endogenous dependent variable and other explanatory variables that are correlated with the fixed effects. In the estimation, the control function approach is used, and a penalized quantile regression method for panel data is applied in the second stage. The Monte Carlo simulation shows that the proposed estimator effectively reduces the endogeneity bias and performs better than other estimators in finite samples.

English keywords: control function, endogeneity, panel data, quantile regression


1 Introduction

The panel data model with individual fixed effects is widely employed in empirical studies. The classical panel data model usually investigates the mean relationship between the dependent variable and the explanatory variables, and so cannot describe how the effect of a covariate varies across the conditional distribution of the response. It is thus desirable to have an econometric method that enables the modeling of heterogeneous relationships. Koenker (2004) proposes quantile regression for the panel data model, which provides a complete description of the heterogeneous effects of the explanatory variables on the dependent variable. Note that, in most applications, the number of observations on each individual is relatively modest; therefore, the fixed effects in this project are restricted to a pure location shift and do not depend on the quantiles. In quantile regression for panel data, the common model parameters cannot be obtained by the usual device of specifying a dummy variable that identifies each individual; the fixed effects must be estimated directly, and the incidental parameter problem arises in the estimation. With the penalized quantile regression (PQR) method, Koenker (2004) uses a penalized objective function to improve the estimation of the common model parameters by controlling the variability introduced by the fixed effects. See also Lamarche (2010) and Galvao and Montes-Rojas (2010).

In practical applications, the variables of interest are often endogenous, making conventional methods inconsistent and hence inappropriate for recovering the heterogeneous effects of the variables. Several studies have focused on obtaining consistent estimators in quantile regression models with endogenous regressors. Amemiya (1982) and Powell (1983) suggest the two-stage least absolute deviations estimation, which is analogous to two-stage least squares estimation. Chesher (2003, 2005, 2007) and Ma and Koenker (2006) consider the scope of quantile regression methods for structural econometric models. Blundell and Powell (2007) and Lee (2007) propose the control function approach to solve possible endogeneity problems. In addition, the instrumental variable quantile regression (IVQR) of Chernozhukov and Hansen (2005, 2006, 2008) is widely applied in empirical research.

When quantile regression is applied to a panel data model with endogenous variables, Arias et al. (2001), following the control function approach, suggest a two-stage estimation method. Moreover, Harding and Lamarche (2009) and Galvao (2011) introduce the IVQR method of Chernozhukov and Hansen (2005) for panel data models. Galvao and Montes-Rojas (2010) consider the IVQR method and the PQR method of Koenker (2004) for the dynamic panel data model. Lin (2010) combines the PQR method of Koenker (2004) with the fitted value approach to propose a two-stage estimation for the dynamic panel data quantile regression model with fixed effects. While most studies apply the IVQR method to the endogeneity problem in quantile regression, computing the IVQR estimators is very complicated in empirical practice.

2 Control Function

This project aims to extend the control function approach of quantile regression to the panel data model with unobservable heterogeneity. The panel data model with fixed effects has the form

\[
Y_{it} = \alpha_i + X_{it}\beta + Z_{1,it}'\gamma_1 + U_{it}, \qquad i = 1, \dots, N, \; t = 1, \dots, T,
\]

where $Y_{it}$ is a real-valued dependent variable; $\alpha_i$ is a time-invariant fixed effect, intended to capture individual-specific effects or unobserved heterogeneity that is not adequately controlled for by the other covariates in the model; $X_{it}$ is a real-valued, continuously distributed, endogenous explanatory variable; $Z_{1,it}$ is a $(d_{Z_1} \times 1)$ vector of exogenous explanatory variables; $\beta$ and $\gamma_1$ are unknown parameters; and $U_{it}$ is the error term. If the number of observations $T$ is large for each individual, then we may estimate a distributional shift $\alpha_i(\tau)$ for each individual. In most applications, however, the number of observations $T$ on each individual is relatively modest, and it is not suitable to estimate a distributional individual effect. For example, in time series applications where the number of observations $T$ is small, it is difficult to estimate a distributional individual effect.

Consider the following model for the endogenous variable:

\[
X_{it} = \mu + Z_{it}'\gamma + V_{it}, \tag{1}
\]

where $Z_{it} \equiv (Z_{1,it}, Z_{2,it})$ is a $(d_Z \times 1)$ vector of exogenous explanatory variables, $\mu$ is an unknown parameter, $\gamma \equiv [\gamma_1, \gamma_2]$ is a $(d_Z \times 1)$ vector of unknown parameters, and $V_{it}$ is a real-valued unobserved random variable. For identification it is assumed that there is at least one component of $Z_{it}$ that is not included in $Z_{1,it}$, and there is at [...] and $\gamma_2 \neq 0$, where $d_{Z_1}$ is the dimension of $Z_{1,it}$. When $v_{it}$ is the value of $V_{it}$ that satisfies (1),

\[
Q_{U_{it} \mid X_{it}, Z_{it}}(\tau \mid x_{it}, z_{it}) = Q_{U_{it} \mid V_{it}, Z_{it}}(\tau \mid v_{it}, z_{it}),
\]

where $Q_{U_{it} \mid X_{it}, Z_{it}}(\tau \mid x_{it}, z_{it})$ denotes the $\tau$th quantile of $U_{it}$ conditional on $X_{it} = x_{it}$ and $Z_{it} = z_{it}$, and the other expressions are understood similarly. In addition, assume quantile independence of $U_{it}$ from $Z_{it}$ conditional on $v_{it}$, and quantile independence of $V_{it}$ from $Z_{it}$:

\[
Q_{U_{it} \mid V_{it}, Z_{it}}(\tau \mid v_{it}, z_{it}) = Q_{U_{it} \mid V_{it}}(\tau \mid v_{it}) \tag{2}
\]

and

\[
Q_{V_{it} \mid Z_{it}}(\theta \mid z_{it}) = 0 \tag{3}
\]

almost surely. Under assumption (2), the panel data model for the $\tau$th conditional quantile function of the response of the $t$th observation on the $i$th individual, $Y_{it}$, is

\[
Q_{Y_{it} \mid X_{it}, Z_{1,it}}(\tau \mid x_{it}, z_{it}) = \alpha_i + x_{it}\beta(\tau) + z_{1,it}'\gamma_1(\tau) + Q_{U_{it} \mid V_{it}}(\tau \mid v_{it}), \qquad i = 1, \dots, N, \; t = 1, \dots, T. \tag{4}
\]

In model (4), the $\alpha_i$'s have a pure location shift effect on the conditional quantiles, and their effects do not depend on the quantile $\tau$. The effects of the covariates $X_{it}$ and $Z_{1,it}$ are permitted to depend on the quantile $\tau$. In addition, since the variable $V_{it}$ is stochastically dependent on $U_{it}$, further assume that the conditional quantile function of $U_{it}$ given $V_{it}$ is a linear function of $V_{it}$. Then (4) can be rewritten as

\[
Q_{Y_{it} \mid X_{it}, Z_{1,it}}(\tau \mid x_{it}, z_{it}) = \alpha_i + x_{it}\beta(\tau) + z_{1,it}'\gamma_1(\tau) + v_{it}\phi(\tau), \qquad i = 1, \dots, N, \; t = 1, \dots, T, \tag{5}
\]

with $\phi(\tau)$ the parameter. Note that the ordinary least squares method eliminates the fixed effect by taking first differences of the model to obtain consistent estimates of the parameters. In the quantile regression framework, no such transformation is available, since the quantile function is not a linear operator. The fixed effects must therefore be estimated together with the other parameters, which suggests that $\beta(\tau)$, $\gamma_1(\tau)$ and $\phi(\tau)$ could be estimated by the penalized quantile regression for panel data of Koenker (2004). In applications, $v_{it}$ is unobserved. By assumption (3),

\[
Q_{X_{it} \mid Z_{it}}(\theta \mid z_{it}) = \mu(\theta) + z_{it}'\gamma(\theta),
\]

and $v_{it}$ can be estimated consistently by the residual of a linear $\theta$th quantile regression of $X$ on $(1, Z)$. Therefore, $\beta(\tau)$ and $\gamma_1(\tau)$ can be estimated by a two-step procedure. The first step is the construction of the estimated residuals $\hat{V}$ from the linear quantile regression of $X$ on $(1, Z)$. The second step is the penalized quantile regression for panel data of $Y$ on $X$, $Z_1$ and $\hat{V}$. This approach corrects for endogeneity by adding estimates of $V$ as an additional explanatory variable, and can be viewed as a variant of the control function approach.
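The first step of this procedure can be illustrated with a small sketch. The Python code below, which uses only the standard library, fits a $\theta$th linear quantile regression of $X$ on $(1, Z)$ for a single scalar instrument and returns the residuals that would serve as the control variable. It relies on the known property that a linear quantile regression with two parameters has an optimal fit passing through two observations, so it simply tries every pair. This brute-force search is an assumption made for illustration on tiny samples, not the algorithm used in the project; the function names and the simulated data are hypothetical.

```python
import random


def check_loss(u, theta):
    """Check function rho_theta(u) = u * (theta - 1{u < 0})."""
    return u * (theta - (1.0 if u < 0 else 0.0))


def first_stage_residuals(x, z, theta):
    """Residuals from the theta-th quantile regression of x on (1, z).

    Brute force: an optimal line passes through two observations, so every
    pair with distinct z values is tried and the best-fitting line is kept.
    """
    n = len(x)
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            if z[i] == z[j]:
                continue  # this pair does not determine a slope
            slope = (x[j] - x[i]) / (z[j] - z[i])
            intercept = x[i] - slope * z[i]
            loss = sum(check_loss(x[k] - intercept - slope * z[k], theta)
                       for k in range(n))
            if best is None or loss < best[0]:
                best = (loss, intercept, slope)
    _, mu_hat, gamma_hat = best
    v_hat = [x[k] - mu_hat - gamma_hat * z[k] for k in range(n)]
    return mu_hat, gamma_hat, v_hat


# Illustrative data (assumed design): x = 1 + 2 z + v with v ~ N(0, 1).
rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(60)]
x = [1.0 + 2.0 * zi + rng.gauss(0.0, 1.0) for zi in z]

mu_hat, gamma_hat, v_hat = first_stage_residuals(x, z, theta=0.5)
share_negative = sum(v < 0 for v in v_hat) / len(v_hat)
print(round(mu_hat, 3), round(gamma_hat, 3), round(share_negative, 3))
```

At the optimum roughly a fraction $\theta$ of the residuals is negative, which is the sample counterpart of assumption (3). In practice a dedicated quantile regression routine would replace the pairwise search.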

3 Estimation

The estimation procedure consists of two steps. The data consist of independent and identically distributed (i.i.d.) observations $\{y_{it}, x_{it}, z_{it} : i = 1, \dots, N, \; t = 1, \dots, T\}$. The first step is the construction of the estimated residuals $\hat{v}_{it}(\theta) = x_{it} - \hat{\mu}(\theta) - z_{it}'\hat{\gamma}(\theta)$ ($i = 1, \dots, N$, $t = 1, \dots, T$) by a linear quantile regression of $X$ on $(1, Z)$, where $(\hat{\mu}(\theta), \hat{\gamma}(\theta))$ is a solution to

\[
\min_{\mu, \gamma} \sum_{i=1}^{N} \sum_{t=1}^{T} \rho_\theta(x_{it} - \mu - z_{it}'\gamma),
\]

where $\theta \in (0, 1)$ and $\rho_\theta(u) = u(\theta - \mathbf{1}\{u < 0\})$ is the piecewise linear quantile loss function, or "check" function, of Koenker and Bassett (1978). The second step is the estimation of the penalized linear quantile regression of $y_{it}$ on $(x_{it}, z_{1,it}, v_{it})$, using the estimated residuals $\hat{v}_{it}$ in place of the unobserved $v_{it}$'s. In this project, the second step is carried out via the penalized quantile regression for panel data of Koenker (2004). When the covariates of the model contain an intercept, the penalized quantile regression estimates model (5) for several quantiles simultaneously:

\[
\min_{\alpha, \beta, \gamma_1, \phi} \sum_{k=1}^{q} \sum_{i=1}^{N} \sum_{t=1}^{T} \omega_k \rho_{\tau_k}\bigl(y_{it} - \alpha_i - x_{it}\beta(\tau_k) - z_{1,it}'\gamma_1(\tau_k) - \hat{v}_{it}\phi(\tau_k)\bigr) + \lambda \sum_{i=1}^{N} |\alpha_i|, \tag{6}
\]

where $\rho_\tau$ is again the check function for $\tau \in (0, 1)$. When $N$ is large relative to $T$, the $\ell_1$ shrinkage is advantageous in controlling the variability introduced by the large number of estimated $\alpha_i$ parameters. For $\lambda \to 0$, (6) becomes

\[
\min_{\alpha, \beta, \gamma_1, \phi} \sum_{k=1}^{q} \sum_{i=1}^{N} \sum_{t=1}^{T} \omega_k \rho_{\tau_k}\bigl(y_{it} - \alpha_i - x_{it}\beta(\tau_k) - z_{1,it}'\gamma_1(\tau_k) - \hat{v}_{it}\phi(\tau_k)\bigr), \tag{7}
\]

which is similar to the objective function of the panel data quantile regression using the instrumental variables method of Galvao (2011) and Harding and Lamarche (2009). When $\lambda \to \infty$, $\hat{\alpha}_i \to 0$ for all $i = 1, \dots, N$, and an estimate of the model purged of the fixed effects is obtained. The weights $\omega_k$ control the relative influence of the $q$ quantiles $\{\tau_1, \dots, \tau_q\}$ on the estimation of the $\alpha_i$ parameters. Note that, since $z_{1,it}$ contains an intercept, we have $q$ $\tau$-specific estimates of the intercept.
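The role of $\lambda$ can be seen on a deliberately simplified problem. The Python sketch below (standard library only) holds the slope parameters fixed and considers one individual and one quantile, so that the penalized criterion reduces to $\sum_t \rho_\tau(r_t - \alpha) + \lambda|\alpha|$ in the scalar $\alpha$, where $r_t$ stands for the partial residual of the outcome equation. Because this criterion is convex and piecewise linear with kinks only at the $r_t$ and at zero, a minimizer can be found by checking those points. The function names and the residual values are hypothetical, and this is not the joint estimation over all parameters used in the project.

```python
def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def penalized_criterion(alpha, resid, tau, lam):
    """sum_t rho_tau(r_t - alpha) + lam * |alpha| for one individual."""
    return sum(check_loss(r - alpha, tau) for r in resid) + lam * abs(alpha)


def penalized_alpha(resid, tau, lam):
    """Minimize the criterion over alpha by checking its kink points.

    Ties are broken in favour of the candidate closest to zero.
    """
    candidates = [0.0] + list(resid)
    return min(candidates,
               key=lambda a: (penalized_criterion(a, resid, tau, lam), abs(a)))


# Hypothetical partial residuals of one individual observed over T = 5 periods.
resid = [0.8, 1.1, 1.5, 2.0, 2.6]

for lam in (0.0, 1.0, 10.0):
    print(lam, penalized_alpha(resid, tau=0.5, lam=lam))
```

With these values the estimate equals the median 1.5 when $\lambda = 0$, shrinks to 1.1 when $\lambda = 1$, and is exactly 0 when $\lambda = 10$, which mirrors the limiting cases $\lambda \to 0$ and $\lambda \to \infty$ described above.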

Imposing Assumptions 1-4 of the paper, we can obtain the asymptotic normality of the proposed estimator. Please contact the author for the complete paper.

4 Monte Carlo Simulations

In this section, a Monte Carlo study is conducted to investigate the small sample properties of the estimators. We compare the bias and root mean squared error (RMSE) of the following estimators: (1) the estimator proposed in this project; (2) the penalized QR estimator of Koenker (2004); and (3) the penalized QR estimator using the IVQR method of Galvao and Montes-Rojas (2010). Three models are considered in this section: (A) the pure location shift model,

\[
y_{it} = \eta_i + \beta x_{it} + u_{it};
\]

(B) the location-scale shift model I,

\[
y_{it} = \eta_i + \beta x_{it} + (\gamma_0 x_{it}) u_{it};
\]

and (C) the location-scale shift model II,

\[
y_{it} = \eta_i + \beta x_{it} + (1 + \gamma_1 x_{it}) u_{it}.
\]
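To make the difference between the three designs concrete, the following Python sketch writes each outcome equation as a function and computes the slope of the $\tau$th conditional quantile of $y_{it}$ in $x_{it}$ that each design implies. The slope formula is a derivation added here for illustration and rests on two stated assumptions: $u_{it}$ is normal, and the scale factor multiplying $u_{it}$ is positive, so that the conditional quantile of $y_{it}$ equals $\eta_i + \beta x_{it}$ plus the scale factor times the $\tau$th quantile of $u_{it}$. The function names and default parameter values are chosen for this sketch.

```python
from statistics import NormalDist


def outcome(model, eta_i, x_it, u_it, beta=1.0, gamma0=0.5, gamma1=0.1):
    """Generate y_it under simulation model (A), (B) or (C)."""
    if model == "A":
        return eta_i + beta * x_it + u_it
    if model == "B":
        return eta_i + beta * x_it + (gamma0 * x_it) * u_it
    if model == "C":
        return eta_i + beta * x_it + (1.0 + gamma1 * x_it) * u_it
    raise ValueError(f"unknown model: {model!r}")


def implied_slope(model, tau, beta=1.0, gamma0=0.5, gamma1=0.1, sigma_u=1.0):
    """Slope in x of the tau-th conditional quantile of y.

    Assumes u ~ N(0, sigma_u^2) and a positive scale factor, so that
    Q_y(tau | x) = eta + beta * x + scale(x) * Q_u(tau).
    """
    q_u = NormalDist(0.0, sigma_u).inv_cdf(tau)
    scale_slope = {"A": 0.0, "B": gamma0, "C": gamma1}[model]
    return beta + scale_slope * q_u


for model in ("A", "B", "C"):
    print(model, [round(implied_slope(model, tau), 3) for tau in (0.25, 0.5, 0.75)])
```

Under these assumptions the slope is constant across quantiles in model (A) and varies with $\tau$ in models (B) and (C), which is why the latter two are informative about heterogeneous effects.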

The error term $u_{it}$ follows the normal distribution $N(0, \sigma_u^2)$ with $\sigma_u^2 = 1, 3, 5$, the heavy-tailed $t$-distribution with 3 degrees of freedom ($t_3$ distribution), or the $\chi^2$-distribution with 3 degrees of freedom ($\chi^2_3$ distribution).

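The regressor $x_{it}$ is generated according to $x_{it} = \mu_i + \xi_{it}$, where the fixed effect

\[
\mu_i = e_{1i} + \frac{1}{T} \sum_{t=1}^{T} x_{it}, \qquad e_{1i} \sim N(0, \sigma_{e_1}^2), \tag{8}
\]

and $\xi_{it}$ follows the same distribution as $u_{it}$. The fixed effect $\eta_i$ is generated as

\[
\eta_i = e_{2i} + \frac{1}{T} \sum_{t=1}^{T} [\ldots]_{it}, \qquad e_{2i} \sim N(0, \sigma_{e_2}^2).
\]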

From the above specification of the fixed effects, there is correlation between the individual effects and the explanatory variables, which ensures that random-effects estimators are inconsistent. In the simulation, $T = 10$, $N = 50$, and the number of replications is 2000. In addition, the parameters are $\alpha = \{0.3, 0.4, 0.5, 0.6, 0.7\}$, $\beta = 1$, and $\sigma_{e_1} = \sigma_{e_2} = 1$. For the location-scale shift models, we use $\gamma_0 = 0.5$ and $\gamma_1 = 0.1$.
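The correlation between the individual effects and the regressor can be checked with a short simulation. The Python sketch below (standard library only) generates one panel from model (A) with $N = 50$, $T = 10$ and $\beta = 1$. Two simplifications are assumptions of this sketch rather than the design of the report: $\mu_i$ is drawn as $e_{1i}$ alone, and $\eta_i$ is taken to be $e_{2i}$ plus the time average of $x_{it}$. Both $\xi_{it}$ and $u_{it}$ are drawn from $N(0, 1)$, and the function names are hypothetical.

```python
import random
import statistics


def simulate_panel(n=50, t=10, beta=1.0, sigma_e1=1.0, sigma_e2=1.0, seed=0):
    """Simulate model (A) with fixed effects correlated with the regressor."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        # Assumption of this sketch: mu_i = e_1i only.
        mu_i = rng.gauss(0.0, sigma_e1)
        # xi_it ~ N(0, 1), the same distribution used for u_it below.
        x = [mu_i + rng.gauss(0.0, 1.0) for _ in range(t)]
        # Assumption of this sketch: eta_i = e_2i + time average of x_it.
        eta_i = rng.gauss(0.0, sigma_e2) + statistics.fmean(x)
        y = [eta_i + beta * x_it + rng.gauss(0.0, 1.0) for x_it in x]
        panel.append({"eta": eta_i, "x": x, "y": y})
    return panel


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5


panel = simulate_panel()
eta = [unit["eta"] for unit in panel]
x_bar = [statistics.fmean(unit["x"]) for unit in panel]
print(round(pearson(eta, x_bar), 3))
```

A clearly positive correlation between $\eta_i$ and the individual mean of $x_{it}$ is what makes an estimator that ignores the fixed effects unreliable in this kind of design.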

The Monte Carlo simulation shows that the proposed estimator effectively reduces the endogeneity bias and performs better than the other estimators in finite samples. All results are reported in the paper.
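For reference, the two comparison criteria can be computed from the replications as follows. The Python sketch below defines bias as the average estimation error and RMSE as the square root of the average squared error. To keep it self-contained it uses a stand-in estimator, the sample median of 50 draws from $N(1, 1)$ over 200 replications, which is an assumption of this sketch and not one of the three estimators compared in the report; the function name is hypothetical.

```python
import math
import random
import statistics


def bias_and_rmse(estimates, true_value):
    """Bias and root mean squared error of Monte Carlo estimates."""
    errors = [estimate - true_value for estimate in estimates]
    bias = statistics.fmean(errors)
    rmse = math.sqrt(statistics.fmean([error * error for error in errors]))
    return bias, rmse


# Stand-in experiment: the sample median as an estimator of the centre of N(1, 1).
rng = random.Random(1)
true_value = 1.0
estimates = []
for _ in range(200):
    sample = [rng.gauss(true_value, 1.0) for _ in range(50)]
    estimates.append(statistics.median(sample))

bias, rmse = bias_and_rmse(estimates, true_value)
print(round(bias, 4), round(rmse, 4))
```

The same function can be applied to the slope estimates of each estimator at each quantile to build a table of bias and RMSE.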

5 Conclusions

This project develops a two-stage estimation of a panel data quantile regression model with an endogenous explanatory variable. The regressors in the model include a lagged endogenous dependent variable and other explanatory variables that are correlated with the fixed effects. In the estimation, the control function approach is used, and a penalized quantile regression method for panel data is applied in the second stage. The Monte Carlo simulation shows that the proposed estimator effectively reduces the endogeneity bias and performs better than the other estimators in finite samples. The proposed approach is easy to implement and effective in several practical applications.

References

Amemiya, T. (1982). Two stage least absolute deviations estimators, Econometrica, 50, 689–711.

Arias, O., Hallock, K.F. and Sosa-Escudero, W. (2001). Individual heterogeneity in the returns to schooling: Instrumental variables quantile regression using twins data, Empirical Economics, 26, 7–40.

Blundell, R. and Powell, J.L. (2007). Censored regression quantiles with endogenous regressors, Journal of Econometrics, 141, 65–83.

Chernozhukov, V. and Hansen, C. (2005). An IV model of quantile treatment effects, Econometrica, 73, 245–261.

Chernozhukov, V. and Hansen, C. (2006). Instrumental quantile regression inference for structural and treatment effect models, Journal of Econometrics, 132, 497–525.

Chernozhukov, V. and Hansen, C. (2008). Instrumental variable quantile regression: A robust inference approach, Journal of Econometrics, 142, 379–398.

Chesher, A. (2003). Identification in nonseparable models, Econometrica, 71, 1405–1441.

Chesher, A. (2005). Nonparametric identification under discrete variation, Econometrica, 73, 1525–1550.

Chesher, A. (2007). Endogeneity and discrete outcomes. Cemmap Working Papers.

Galvao, A.F. (2011). Quantile regression for dynamic panel data, Journal of Econometrics, 164, 142–157.

Galvao, A.F. and Montes-Rojas, G.V. (2010). Penalized quantile regression for dynamic panel data, Journal of Statistical Planning and Inference, 140, 3476–3497.

Harding, M. and Lamarche, C. (2009). A quantile regression approach for estimating panel data models using instrumental variables, Economics Letters, 104, 133–135.

Koenker, R. (2004). Quantile regression for longitudinal data, Journal of Multivariate Analysis, 91, 74–89.

Koenker, R. and Bassett, G. (1978). Regression quantiles, Econometrica, 46, 33–50.

Lamarche, C. (2010). Robust penalized quantile regression estimation for panel data, Journal of Econometrics, 157, 396–408.

Lee, S. (2007). Endogeneity in quantile regression models: A control function approach, Journal of Econometrics, 141, 1131–1158.

Lin, H. Y. (2010). Dynamic panel quantile regression with an application to deficit and inflation. Working Paper.

Ma, L. and Koenker, R. (2006). Quantile regression methods for recursive structural equation models, Journal of Econometrics, 134, 471–506.

Powell, J. (1983). The asymptotic normality of two-stage least absolute deviations estimators, Econometrica, 51, 1569–1575.

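NSC-Funded Project: Promotion Data Sheet for Derived R&D Results

Date: 2012/10/29

NSC-funded project

Project title: The Endogeneity Problem in Quantile Regression Analysis of Panel Data
Principal investigator: 林馨怡
Project number: 100-2410-H-004-071-
Discipline: Mathematical and Quantitative Methods

No R&D results to report for promotion.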


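Summary Table of Research Outcomes for the Year 100 (2011) Research Project

Principal investigator: 林馨怡
Project number: 100-2410-H-004-071-
Project title: The Endogeneity Problem in Quantile Regression Analysis of Panel Data

| Scope | Category | Item | Achieved (accepted or published) | Expected total (including achieved) | Actual contribution of this project | Unit | Remarks |
|---|---|---|---|---|---|---|---|
| Domestic | Publications | Journal papers | 0 | 0 | 100% | papers | |
| Domestic | Publications | Research or technical reports | 1 | 1 | 100% | papers | |
| Domestic | Publications | Conference papers | 0 | 0 | 100% | papers | |
| Domestic | Publications | Books | 0 | 0 | 100% | | |
| Domestic | Patents | Applications pending | 0 | 0 | 100% | cases | |
| Domestic | Patents | Granted | 0 | 0 | 100% | cases | |
| Domestic | Technology transfer | Number of cases | 0 | 0 | 100% | cases | |
| Domestic | Technology transfer | Royalties | 0 | 0 | 100% | NT$ thousand | |
| Domestic | Project personnel (domestic nationals) | Master's students | 1 | 1 | 100% | person-times | |
| Domestic | Project personnel (domestic nationals) | Doctoral students | 1 | 1 | 100% | person-times | |
| Domestic | Project personnel (domestic nationals) | Postdoctoral researchers | 0 | 0 | 100% | person-times | |
| Domestic | Project personnel (domestic nationals) | Full-time assistants | 0 | 0 | 100% | person-times | |
| International | Publications | Journal papers | 0 | 1 | 100% | papers | |
| International | Publications | Research or technical reports | 0 | 0 | 100% | papers | |
| International | Publications | Conference papers | 0 | 0 | 100% | papers | |
| International | Publications | Books | 0 | 0 | 100% | chapters/volumes | |
| International | Patents | Applications pending | 0 | 0 | 100% | cases | |
| International | Patents | Granted | 0 | 0 | 100% | cases | |
| International | Technology transfer | Number of cases | 0 | 0 | 100% | cases | |
| International | Technology transfer | Royalties | 0 | 0 | 100% | NT$ thousand | |
| International | Project personnel (foreign nationals) | Master's students | 0 | 0 | 100% | person-times | |
| International | Project personnel (foreign nationals) | Doctoral students | 0 | 0 | 100% | person-times | |
| International | Project personnel (foreign nationals) | Postdoctoral researchers | 0 | 0 | 100% | person-times | |
| International | Project personnel (foreign nationals) | Full-time assistants | 0 | 0 | 100% | person-times | |

The Remarks column is for qualitative notes, for example results shared among several projects or results featured as the cover story of the journal.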


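Other outcomes

(Outcomes that cannot be expressed quantitatively, such as organizing academic activities, receiving awards, important international collaboration, the international impact of the research results, and other concrete benefits that assist the development of industrial technology. Please describe them in words.)

Once the write-up of the research results is complete, it will be submitted to an international journal; the results have academic value.

Items to be additionally completed by Science Education Department projects:

| Outcome item | Quantity | Name or brief description of content |
|---|---|---|
| Assessment tools (qualitative and quantitative) | 0 | |
| Courses/modules | 0 | |
| Computer and network systems or tools | 0 | |
| Teaching materials | 0 | |
| Activities/competitions held | 0 | |
| Seminars/workshops | 0 | |
| E-newsletters, websites | 0 | |
| Number of participants (audience) reached by promotion of project outcomes | 0 | |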


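NSC-Funded Research Project Final Report: Self-Evaluation Form

Please give an overall evaluation of the extent to which the research content matches the original proposal, the degree to which the expected goals were achieved, the academic or applied value of the research results (briefly describe the significance, value, impact, or potential for further development of the results), whether the results are suitable for publication in an academic journal or for a patent application, and the main findings or other relevant value.

1. Overall evaluation of the extent to which the research content matches the original proposal and the expected goals were achieved

■ Goals achieved

□ Goals not achieved (please explain, within 100 characters)

□ Experiment failed

□ Experiment interrupted for some reason

□ Other reasons

Explanation:

2. Publication of the research results in academic journals, patent applications, and similar:

Papers: □ Published □ Unpublished manuscript ■ In preparation □ None

Patents: □ Granted □ Pending ■ None

Technology transfer: □ Completed □ Under negotiation ■ None

Other: (within 100 characters)

3. In terms of academic achievement, technological innovation, social impact, and so on, evaluate the academic or applied value of the research results (briefly describe the significance, value, impact, or potential for further development of the results) (within 500 characters)

Once the write-up of the research results is complete, it will be submitted to an international journal; the results have academic value.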