METHODS - Can Internet Search Volume Improve Forecasts of Stock Market Volatility? International Evidence

3.1 Vector autoregressive model (VAR model)

In this section, we estimate a VAR model for every stock index to capture the dynamic relationship between Google search volume and stock index volatility. We study the dynamics of realized volatility and search volume in three ways: (1) Granger causality tests (Granger (1969) and Sims (1972)) to examine whether past volatility significantly influences present search volume, and vice versa. (2) Impulse response functions to see how volatility reacts over time to a shock in search volume, and vice versa. (3) Long-run variance decomposition to measure how much of the variation in volatility can be explained by internet search volume.

First, we need to estimate a VAR(p) model:

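$$\text{log-RV}_t = c_1 + \sum_{j=1}^{p} \beta_{1,j}\,\text{log-RV}_{t-j} + \sum_{j=1}^{p} \gamma_{1,j}\,\text{log-SV}_{t-j} + \varepsilon_{1,t}, \tag{5}$$

$$\text{log-SV}_t = c_2 + \sum_{j=1}^{p} \beta_{2,j}\,\text{log-RV}_{t-j} + \sum_{j=1}^{p} \gamma_{2,j}\,\text{log-SV}_{t-j} + \varepsilon_{2,t}, \tag{6}$$

where $c_i$ are intercepts, $\beta_{i,j}$ and $\gamma_{i,j}$ are the coefficients on lagged log-RV and lagged log-SV, and $\varepsilon_{i,t}$ are the innovations. We choose the lag order (p) by the Schwarz Criterion (SC), also known as the Bayesian Information Criterion (BIC). The model with the lowest SC value is preferred.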
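As a rough illustration of the lag selection step, the sketch below fits a VAR(p) by equation-wise OLS and picks the order with the smallest SC. It is a minimal NumPy sketch and not the software used in this study: the function names `fit_var` and `select_lag_by_sc`, the simulated series and the particular form of the criterion (log determinant of the residual covariance plus a ln(n)/n penalty per lag coefficient) are assumptions made for the example.

```python
import numpy as np

def fit_var(data, p):
    """OLS fit of a VAR(p) with intercept on a (T, k) array.

    Returns (coef, sigma): coef has shape (1 + k*p, k); row 0 holds the
    intercepts and rows 1..k hold the transposed lag-1 matrix, and so on.
    """
    T, k = data.shape
    y = data[p:]
    # Column blocks: constant, lag 1 of every variable, lag 2, ...
    lags = [data[p - j:T - j] for j in range(1, p + 1)]
    X = np.hstack([np.ones((T - p, 1))] + lags)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma = resid.T @ resid / (T - p)  # ML-type residual covariance
    return coef, sigma

def select_lag_by_sc(data, max_p):
    """Return (best_p, {p: SC}) with SC = ln|Sigma| + ln(n)/n * p * k^2."""
    T, k = data.shape
    n = T - max_p  # common sample so SC values are comparable across p
    sc = {}
    for p in range(1, max_p + 1):
        _, sigma = fit_var(data[max_p - p:], p)
        sc[p] = np.log(np.linalg.det(sigma)) + np.log(n) / n * p * k * k
    return min(sc, key=sc.get), sc

# Hypothetical data: a simulated bivariate VAR(1) standing in for (log-RV, log-SV).
rng = np.random.default_rng(0)
A1 = np.array([[0.5, 0.1], [0.2, 0.3]])
sim = np.zeros((2000, 2))
for t in range(1, 2000):
    sim[t] = A1 @ sim[t - 1] + rng.standard_normal(2)

best_p, sc_values = select_lag_by_sc(sim, max_p=6)
```

All candidate orders are estimated on the same sample (the first `max_p` rows are reserved for lags) so that the criterion values are comparable.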

The degrees of freedom (df) of the Granger causality tests reported in Table 8 equal the optimal lag order (p) of the VAR model. For example, when p = 6 the VAR model contains the first through sixth lags (t-1 to t-6) of all endogenous variables. We find that p lies between 1 and 6 for daily data, while weekly data have a much smaller p, from 1 to 3. This makes sense: if t is this Friday, then t-6 is last Thursday, so six daily lags span this week and last week, which corresponds to p = 1 at the weekly frequency. It is therefore natural that the lag order is smaller for weekly data, where Peru has the largest value, p = 3.

We then carry out the following three analyses under the optimal VAR model for every index.

3.1.1 Granger causality test

The Granger causality test, proposed by Granger (1969), asks how much of the current y can be explained by past values of y and then examines whether adding lagged values of x improves the explanation. If the coefficients on lagged x are statistically significant, y is said to be Granger-caused by x. That is, if t-tests or F-tests show that search volume contains statistically significant information about future volatility, then search volume is said to Granger-cause stock market volatility. Note that the statement “x Granger-causes y” does not imply that y is the effect or the result of x.

Here we use pairwise Granger causality tests to test whether an endogenous variable can be treated as exogenous in the VAR model. The null hypotheses are “log-RV does not Granger-cause log-SV” and “log-SV does not Granger-cause log-RV”, so that we test at the same time whether realized volatility is useful in forecasting search volume and whether search volume is useful in forecasting volatility. If the chi-squared (Wald) statistic exceeds the critical value, so that the p-value is below 0.1, we reject the null hypothesis at the 10% significance level.
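To make the Wald test concrete, the following sketch regresses one variable on p lags of itself and p lags of the other variable and tests whether the coefficients on the other variable's lags are jointly zero. It is only a sketch under stated assumptions: the function name `granger_wald`, the simulated series and the use of a classical (homoskedastic) covariance estimate are choices made for the example. The table holds the standard 10% critical values of the chi-squared distribution for 1 to 6 degrees of freedom.

```python
import numpy as np

# 10% critical values of the chi-squared distribution for df = 1..6.
CHI2_CRIT_10PCT = {1: 2.706, 2: 4.605, 3: 6.251, 4: 7.779, 5: 9.236, 6: 10.645}

def granger_wald(y, x, p):
    """Wald statistic for H0: lags 1..p of x do not enter the equation of y."""
    T = len(y)
    ylags = np.column_stack([y[p - j:T - j] for j in range(1, p + 1)])
    xlags = np.column_stack([x[p - j:T - j] for j in range(1, p + 1)])
    X = np.hstack([np.ones((T - p, 1)), ylags, xlags])
    target = y[p:]
    b, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ b
    s2 = resid @ resid / (T - p)
    cov_b = s2 * np.linalg.inv(X.T @ X)   # classical OLS covariance (assumption)
    idx = slice(1 + p, 1 + 2 * p)         # positions of the lagged-x coefficients
    bx = b[idx]
    return float(bx @ np.linalg.solve(cov_b[idx, idx], bx))

# Hypothetical data in which x leads y but y does not lead x.
rng = np.random.default_rng(1)
n = 1000
x = np.zeros(n)
y = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
    y[t] = 0.5 * y[t - 1] + 0.5 * x[t - 1] + rng.standard_normal()

p = 2
w_x_to_y = granger_wald(y, x, p)   # H0: x does not Granger-cause y
w_y_to_x = granger_wald(x, y, p)   # H0: y does not Granger-cause x
reject_x_to_y = w_x_to_y > CHI2_CRIT_10PCT[p]
```

The degrees of freedom of each test equal the lag order p, which is why the critical value is looked up by p.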

3.1.2 Impulse response function (IRF)

Generally, an impulse response is the reaction of a dynamic system to some external change. The impulse response function traces the effect of a shock to one of the innovations on current and future values of the endogenous variables. Here we use it to explore how volatility reacts over time to a shock in search volume, and vice versa.

To trace the response function, we set the number of periods to 100 and use the Cholesky decomposition with the ordering log-RV, log-SV. This ordering imposes the economically meaningful restriction that volatility is contemporaneously exogenous, i.e. volatility can affect search volume immediately, but search volume cannot contemporaneously affect volatility. Intuitively, abnormal volatility attracts retail investors’ attention, which in turn creates further volatility. Search volume, on the other hand, would not rise without a preceding event on the market.
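The role of the Cholesky ordering can be seen in a small numerical sketch. Given the lag matrices and the residual covariance of a VAR, the orthogonalized responses are the moving-average coefficients multiplied by the lower-triangular Cholesky factor. The function name `orthogonal_irf` and the parameter values below are hypothetical and chosen only for illustration.

```python
import numpy as np

def orthogonal_irf(lag_matrices, sigma, horizon):
    """Orthogonalized impulse responses of a VAR.

    lag_matrices: list [A_1, ..., A_p] of (k, k) arrays.
    sigma: (k, k) residual covariance, variables in Cholesky order.
    Returns an array of shape (horizon + 1, k, k); entry [h, i, j] is the
    response of variable i, h periods after a one-s.d. shock to variable j.
    """
    k = sigma.shape[0]
    chol = np.linalg.cholesky(sigma)          # lower-triangular factor
    phi = [np.eye(k)]                         # reduced-form MA coefficients
    for h in range(1, horizon + 1):
        terms = [lag_matrices[j] @ phi[h - 1 - j]
                 for j in range(min(h, len(lag_matrices)))]
        phi.append(sum(terms))
    return np.array([m @ chol for m in phi])

# Hypothetical VAR(1) parameters; variable 0 = log-RV, variable 1 = log-SV.
A1 = np.array([[0.5, 0.1], [0.2, 0.3]])
sigma = np.array([[1.0, 0.3], [0.3, 0.5]])
irf = orthogonal_irf([A1], sigma, horizon=100)
rv_response_to_sv_shock = irf[:, 0, 1]
sv_response_to_rv_shock = irf[:, 1, 0]
```

Because log-RV is ordered first, the impact response of log-RV to a log-SV shock (`irf[0, 0, 1]`) is zero by construction, while log-SV may respond to a log-RV shock on impact.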

3.1.3 Variance decomposition

While the impulse response function traces the effects of a shock to one endogenous variable on the other variables in the VAR model, the variance decomposition separates the variation in an endogenous variable into the component shocks to the VAR model. Thus, the variance decomposition provides information about the relative importance of each random innovation in affecting the variables in the VAR model. We use it to examine how much information search volume contributes to volatility.

We set the number of periods to 100 to capture the long-run variance decomposition. Because of the economically meaningful restriction on volatility discussed above for the impulse response function, we use the same ordering, log-RV, log-SV, so that volatility is contemporaneously exogenous.
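A variance decomposition can be computed directly from orthogonalized impulse responses: the share of shock j in the forecast error variance of variable i is the cumulated squared response of i to j divided by the cumulated squared responses of i to all shocks. The sketch below assumes a hypothetical VAR(1) with made-up parameters, and the function name `variance_decomposition` is introduced only for this example.

```python
import numpy as np

def variance_decomposition(irf, horizon):
    """Share of the horizon-step forecast error variance of each variable
    (rows) attributed to each orthogonalized shock (columns)."""
    contrib = (irf[:horizon] ** 2).sum(axis=0)   # cumulated squared responses
    return contrib / contrib.sum(axis=1, keepdims=True)

# Hypothetical VAR(1); ordering: variable 0 = log-RV, variable 1 = log-SV.
A1 = np.array([[0.5, 0.1], [0.2, 0.3]])
sigma = np.array([[1.0, 0.3], [0.3, 0.5]])
chol = np.linalg.cholesky(sigma)
# For a VAR(1), the orthogonalized response at step h is A1^h times the Cholesky factor.
irf = np.array([np.linalg.matrix_power(A1, h) @ chol for h in range(100)])

fevd_1 = variance_decomposition(irf, 1)       # one step ahead
fevd_100 = variance_decomposition(irf, 100)   # long horizon
sv_share_in_rv = fevd_100[0, 1]
```

At the one-step horizon the search volume shock explains none of the volatility forecast error, which is a direct consequence of ordering log-RV first; at longer horizons its share can become positive.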

3.2 Regression models

In this section, we use three further regression models to examine whether search volume contains additional information for modeling volatility. Here we focus only on the equation of interest, the volatility equation. We choose these regression models because they are commonly used to capture the time series properties of realized volatility, and we include a lagged proxy of individual attention to test whether retail investors’ attention adds information. We include search volume at one lag only, $\text{log-SV}_{t-1}$, in these models.

First, we estimate an autoregressive model of order one (AR(1)) and augment it with lagged search volume, $\text{log-SV}_{t-1}$, following Andersen, Bollerslev, Christoffersen and Diebold (2006) and Bollen and Inder (2002):

$$\text{log-RV}_t = c + \beta_1\,\text{log-RV}_{t-1} + \gamma\,\text{log-SV}_{t-1} + \varepsilon_t. \tag{7}$$

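Next, we estimate the heterogeneous autoregressive (HAR) model of Corsi (2009), which captures the long-memory properties of volatility very well. The HAR model combines lags of different lengths, and we augment it with lagged search volume, $\text{log-SV}_{t-1}$:

$$\text{log-RV}_t = c + \beta_d\,\text{log-RV}_{t-1} + \beta_w\,\text{log-RV}^{w}_{t-1} + \beta_m\,\text{log-RV}^{m}_{t-1} + \gamma\,\text{log-SV}_{t-1} + \varepsilon_t, \tag{8}$$

where $\text{log-RV}^{w}_t = \frac{1}{5}\sum_{j=0}^{4}\text{log-RV}_{t-j}$ and $\text{log-RV}^{m}_t = \frac{1}{22}\sum_{j=0}^{21}\text{log-RV}_{t-j}$. That is, the model contains the realized volatility of the previous day, the previous week and the previous month, so it can explain the long-memory pattern of volatility well.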
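The construction of the HAR regressors can be sketched as follows. The example assumes the usual convention of 5 trading days per week and 22 per month, and it uses simulated series. The helper names `har_design` and `ols` are hypothetical and exist only for this illustration.

```python
import numpy as np

def har_design(log_rv, log_sv, week=5, month=22):
    """Build the target and regressor matrix of a HAR+SV regression.

    Columns: constant, previous day, previous-week mean, previous-month mean,
    previous day's search volume.
    """
    log_rv = np.asarray(log_rv, dtype=float)
    log_sv = np.asarray(log_sv, dtype=float)
    rows, target = [], []
    for t in range(month, len(log_rv)):
        rows.append([1.0,
                     log_rv[t - 1],
                     log_rv[t - week:t].mean(),    # mean over t-5 .. t-1
                     log_rv[t - month:t].mean(),   # mean over t-22 .. t-1
                     log_sv[t - 1]])
        target.append(log_rv[t])
    return np.array(target), np.array(rows)

def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical series for illustration only.
rng = np.random.default_rng(2)
log_sv = rng.standard_normal(500)
log_rv = np.zeros(500)
for t in range(1, 500):
    log_rv[t] = 0.6 * log_rv[t - 1] + 0.2 * log_sv[t - 1] + 0.5 * rng.standard_normal()

y, X = har_design(log_rv, log_sv)
coef = ols(y, X)  # order: intercept, daily, weekly, monthly, search volume
```

The first 22 observations are used only to form the monthly average, so the regression sample starts at the 23rd observation.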

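Bad news usually causes higher volatility than good news, an asymmetry that in turn draws more attention from retail investors. The two models above do not account for this asymmetry. Therefore, we estimate the EGARCH(1,1) model of Nelson (1991), augmented with lagged search volume, $\text{log-SV}_{t-1}$:

$$r_t = \mu + \sqrt{h_t}\,z_t, \tag{9}$$

$$\log h_t = \omega + \alpha\left(|z_{t-1}| - E|z_{t-1}|\right) + \theta\,z_{t-1} + \beta \log h_{t-1} + \gamma\,\text{log-SV}_{t-1}, \tag{10}$$

where $r_t$ is the index return, $h_t$ is its conditional variance and $z_t$ is a standardized innovation.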

The input of this model is not the realized volatility time series but the index return data measured by equation (2). We add lagged search volume to the variance equation, which is the equation of interest.
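To illustrate how lagged search volume enters a variance equation, the sketch below filters conditional variances from a return series with an EGARCH(1,1)-type recursion. Everything specific here is an assumption made for the example: the parameterization written in the docstring, the normal-distribution value E|z| = sqrt(2/pi), the starting value of the recursion, the function name `egarch_variance_path` and the parameter values. It only evaluates the recursion for given parameters and does not estimate them.

```python
import math

def egarch_variance_path(returns, log_sv, mu, omega, alpha, theta, beta, gamma):
    """Conditional variances h_t from an EGARCH(1,1)-type recursion with an
    exogenous regressor (assumed parameterization):
        log h_t = omega + alpha*(|z_{t-1}| - E|z|) + theta*z_{t-1}
                  + beta*log h_{t-1} + gamma*log_sv[t-1],
    with z_t = (r_t - mu) / sqrt(h_t) and E|z| = sqrt(2/pi) for normal z.
    """
    n = len(returns)
    # Assumption: start the recursion at the sample variance of the returns.
    mean_r = sum(returns) / n
    h = [sum((r - mean_r) ** 2 for r in returns) / n]
    e_abs_z = math.sqrt(2.0 / math.pi)
    for t in range(1, n):
        z_prev = (returns[t - 1] - mu) / math.sqrt(h[t - 1])
        log_h = (omega + alpha * (abs(z_prev) - e_abs_z) + theta * z_prev
                 + beta * math.log(h[t - 1]) + gamma * log_sv[t - 1])
        h.append(math.exp(log_h))
    return h

# Hypothetical inputs: a negative theta makes negative returns raise variance more.
h_after_drop = egarch_variance_path([-0.02, 0.0], [0.0, 0.0],
                                    mu=0.0, omega=-9.0, alpha=0.1,
                                    theta=-0.1, beta=0.0, gamma=0.0)
h_after_rise = egarch_variance_path([0.02, 0.0], [0.0, 0.0],
                                    mu=0.0, omega=-9.0, alpha=0.1,
                                    theta=-0.1, beta=0.0, gamma=0.0)
```

With a negative asymmetry parameter, the variance following a negative return is larger than the variance following a positive return of the same size, which is the asymmetry that motivates this model.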

All three models, AR(1)+SV, HAR+SV and EG+SV, contain the previous day’s search volume as an exogenous variable. We examine whether lagged search volume indeed adds valuable information to the model by testing whether its coefficient, $\gamma$, is significantly different from zero.
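For the AR(1)+SV case, the significance test on the search volume coefficient can be sketched with plain OLS and a t-statistic. This is a minimal sketch on simulated data: the function name `ar1_sv_gamma_tstat` is hypothetical, and classical (non-robust) standard errors are assumed because the text does not state which standard errors are used.

```python
import numpy as np

def ar1_sv_gamma_tstat(log_rv, log_sv):
    """OLS of log_rv[t] on a constant, log_rv[t-1] and log_sv[t-1].

    Returns (gamma_hat, t_statistic) for the search volume coefficient.
    """
    y = log_rv[1:]
    X = np.column_stack([np.ones(len(y)), log_rv[:-1], log_sv[:-1]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (len(y) - X.shape[1])   # dof-adjusted residual variance
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return float(coef[2]), float(coef[2] / se[2])

# Hypothetical series in which lagged search volume matters (true gamma = 0.3).
rng = np.random.default_rng(3)
log_sv = rng.standard_normal(1000)
log_rv = np.zeros(1000)
for t in range(1, 1000):
    log_rv[t] = 0.5 * log_rv[t - 1] + 0.3 * log_sv[t - 1] + 0.5 * rng.standard_normal()

gamma_hat, t_stat = ar1_sv_gamma_tstat(log_rv, log_sv)
significant_at_10pct = abs(t_stat) > 1.645   # two-sided normal critical value
```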

3.3 Volatility forecasts

In this section we compare the forecasting ability of the volatility models with and without lagged search volume, $\text{log-SV}_{t-1}$, both in- and out-of-sample. The models we use are the univariate AR(1), HAR and EGARCH models, which are simply equations (7), (8) and (10) with the search volume coefficient $\gamma$ set to zero, and the respective augmented models including lagged search volume, AR(1)+SV (7), HAR+SV (8) and EGARCH+SV (10).

We evaluate the forecasting ability by comparing realized volatility with its prediction, following the literature (e.g. Andersen et al. (2003), Ghysels et al. (2006), Ait-Sahalia and Mancini (2008)).

We use two robust loss functions to compare volatility forecasting ability (Patton (2011)): the mean squared error (MSE) and the quasi-likelihood loss function (QL),

$$\text{MSE} = \frac{1}{T}\sum_{t=1}^{T}\left(RV_{t+1} - \widehat{RV}_{t+1|t}\right)^2, \tag{11}$$

$$\text{QL} = \frac{1}{T}\sum_{t=1}^{T}\left(\frac{RV_{t+1}}{\widehat{RV}_{t+1|t}} - \log\frac{RV_{t+1}}{\widehat{RV}_{t+1|t}} - 1\right), \tag{12}$$

where $\widehat{RV}_{t+1|t}$ is the respective forecast of volatility based upon information available up to and including time t, and T is the number of forecasts. If MSE and QL decrease after the models are augmented with lagged search volume, $\text{log-SV}_{t-1}$, then search volume improves forecasting ability. We also test whether the differences between the loss functions of the univariate models and those of the respective augmented models are statistically significant.
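The two loss functions are simple to compute once actual values and forecasts are aligned. The sketch below assumes both are given as volatility levels (strictly positive, as QL requires) and uses made-up numbers. The function names `mse_loss` and `ql_loss` are hypothetical.

```python
import math

def mse_loss(actual, forecast):
    """Mean squared error between realized volatility and its forecast."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

def ql_loss(actual, forecast):
    """Quasi-likelihood loss: mean of a/f - log(a/f) - 1 (inputs must be > 0)."""
    return sum(a / f - math.log(a / f) - 1.0
               for a, f in zip(actual, forecast)) / len(actual)

# Hypothetical realized volatilities and two competing forecast series.
realized = [1.0, 2.0, 1.5, 3.0]
forecast_base = [1.4, 1.5, 1.9, 2.2]
forecast_augmented = [1.1, 1.9, 1.6, 2.8]

improves_mse = mse_loss(realized, forecast_augmented) < mse_loss(realized, forecast_base)
improves_ql = ql_loss(realized, forecast_augmented) < ql_loss(realized, forecast_base)
```

Both losses are zero for a perfect forecast and positive otherwise, so a smaller value indicates a better forecast.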

In addition, we use the R² of a regression of the actual realized volatilities on their predictions to compare the quality of the volatility forecasts (Mincer and Zarnowitz (1969)),

$$RV_{t+1} = a + b\,\widehat{RV}_{t+1|t} + e_{t+1}. \tag{13}$$

Search volume helps to improve volatility forecasting if the R² increases after the model is augmented with lagged search volume.
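The R² of such a regression can be obtained with a two-column least-squares fit, as in the following sketch. The function name `mincer_zarnowitz_r2` and the numbers are hypothetical and serve only as an illustration.

```python
import numpy as np

def mincer_zarnowitz_r2(actual, forecast):
    """R-squared of the regression of actual values on a constant and the forecast."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    X = np.column_stack([np.ones(len(forecast)), forecast])
    coef, *_ = np.linalg.lstsq(X, actual, rcond=None)
    resid = actual - X @ coef
    total = ((actual - actual.mean()) ** 2).sum()
    return float(1.0 - (resid @ resid) / total)

# Hypothetical realized volatilities and forecasts.
realized = [1.0, 2.0, 3.0, 4.0, 5.0]
forecast = [1.1, 1.9, 3.2, 3.8, 5.3]
r2 = mincer_zarnowitz_r2(realized, forecast)
```

With a single regressor and an intercept, this R² equals the squared correlation between the realized values and the forecasts.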

First, we make in-sample forecasts to evaluate one-step-ahead forecasts of realized volatility. For the in-sample analysis, we estimate the parameters over the full sample period, using all observations, and then use the same parameters to forecast one-step-ahead volatilities; these forecasts are simply the fitted values of the model. The total number of observations (Obs.) in the model for each index is given in the rightmost column of Table 3.

For the out-of-sample analysis, by contrast, we do not keep the parameters fixed. We set the window length to 2/3 of the total number of observations and forecast volatility with a rolling window. Take the DJIA as an example. The total number of observations for the DJIA is 1803, so the window length is 1202. For the initial forecast, RV₁₂₀₃, we estimate the models using the observations t = 1 to 1202. We then re-estimate the models using the observations t = 2 to 1203 to forecast RV₁₂₀₄. We repeat this procedure until the end of the sample period.
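The rolling-window scheme can be written down as a generator of estimation windows and forecast targets. The sketch uses 1-based time indices to match the DJIA example. The function name `rolling_windows` is hypothetical, and rounding the window length down when the sample size is not divisible by 3 is an assumption, since the text does not say how that case is handled.

```python
def rolling_windows(n_obs):
    """Yield (first, last, target) in 1-based time indices: estimate on
    observations first..last, then forecast observation target = last + 1."""
    window = (2 * n_obs) // 3   # assumption: round down if not divisible by 3
    for first in range(1, n_obs - window + 1):
        last = first + window - 1
        yield first, last, last + 1

# DJIA example from the text: 1803 observations, window length 1202.
windows = list(rolling_windows(1803))
first_window = windows[0]    # estimate on t = 1..1202, forecast t = 1203
second_window = windows[1]   # estimate on t = 2..1203, forecast t = 1204
n_forecasts = len(windows)
```

Each window has the same length, so every re-estimation uses the same number of observations and the final third of the sample is forecast one step at a time.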
