Forecast Models - Statistical Approach for RQ2

3.4 Statistical Approach for RQ2

For analyses based on the GDELT 2.0 GKG, a series of predictions for each month from April 2015 to November 2017 is made, converted into a time series object, and then plotted. Because there is no data for the preceding month, no prediction for March 2015 is included in the time series.

Figure 33: Predicted South China Sea tensions by month using average benchmark model (for analyses based on GDELT 2.0 GKG)

Note: The solid black line represents predictions based on the model. The dotted black line represents observed tensions. Higher values represent higher tensions (i.e., more positive tone); lower values represent lower tensions (i.e., more negative tone).

3.4.4 Forecast Models

In addition to the four benchmark models covered above, this dissertation considers four forecast models for predicting South China Sea tensions in past and future time periods: a simple exponential smoothing (SES) model, an autoregressive (AR) model, a moving average (MA) model, and an autoregressive integrated moving average (ARIMA) model. As above, the following four subsections explain the approaches, historical knowledge, assumptions, fundamental logic, and mathematical notation for each of these forecast models. They also provide visualizations of the predictions generated and references to the relevant code as contained in the appendices.

Simple Exponential Smoothing Model

Like the average model, the simple exponential smoothing (SES, also called single exponential smoothing) model makes predictions based on a knowledge of all historical levels of tensions from earlier time periods within the given timeframe. It differs, however, in that it uses a weighted average, in which data from more recent time periods are considered to have a larger effect on forecasts than earlier data.¹⁴² Because monthly tensions (Tensions) appear to be stationary (i.e., without any increasing or decreasing trend) and do not show any seasonality, forecasts can be made using simple exponential smoothing.¹⁴³ The model’s rationale for weighting recent data more heavily than earlier data is derived from the knowledge that many real-world processes are more affected by recent events. For example, the state of Russia–United States relations last year would likely be a better predictor of their bilateral relations this year than would observations from three decades ago, but it is reasonable to think that earlier relations would continue to have some effect.

In mathematical notation, the simple exponential smoothing model predicts the expected tensions ŷ at time t+1 based on the observed level of tensions y at times t, t−1, t−2, and so on, back to the earliest known time period t−n, using the following equation:

$$\hat{y}_{t+1} = \alpha y_t + \alpha(1-\alpha) y_{t-1} + \alpha(1-\alpha)^2 y_{t-2} + \dots + \alpha(1-\alpha)^n y_{t-n}$$

142 Rob J. Hyndman and George Athanasopoulos, “7.1 Simple exponential smoothing,” in Rob J. Hyndman and George Athanasopoulos, Forecasting: Principles and Practice, May 2012, <https://www.otexts.org/fpp/7/1>.

143 Avril Coghlan, A Little Book of R For Time Series: Release 0.2, June 15, 2017, <http://a-little-book-of-r-for-time-series.readthedocs.io/en/latest/src/timeseries.html>.

In the above equation, the term α is the smoothing parameter, with 0 ≤ α ≤ 1; α near zero gives more weight to earlier data and α near 1 gives more weight to recent data. The calculations use the ets() function in the forecast package in R. When the model parameter is set to ANN and the alpha and beta parameters are set to NULL, the function makes forecasts using the simple exponential smoothing model with additive errors, and the α parameter is calculated automatically.
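The weighted-average equation for simple exponential smoothing can also be computed as a running update of a single level term. The following Python sketch is illustrative only and is not the dissertation's R code: the function names, the sample values, and the use of the first observation as the starting level are all assumptions (ets() estimates its own initial state). It checks that the step-by-step recursion and the explicit weighted sum give the same forecast, provided the leftover weight (1−α)^n is placed on the starting level.

```python
def ses_one_step_forecasts(observations, alpha, initial_level):
    """Return one-step-ahead SES forecasts, one per observation plus one.

    forecasts[i] is the prediction for observations[i] made before seeing it;
    the final element is the forecast for the first unobserved period.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    level = initial_level
    forecasts = [level]
    for y in observations:
        # New level = weighted average of the latest observation and the old level.
        level = alpha * y + (1.0 - alpha) * level
        forecasts.append(level)
    return forecasts


def ses_weighted_sum(observations, alpha, initial_level):
    """Same final forecast, written out as the explicit weighted sum."""
    n = len(observations)
    total = 0.0
    for j, y in enumerate(reversed(observations)):
        # Weight alpha * (1 - alpha)^j for the observation j periods back.
        total += alpha * (1.0 - alpha) ** j * y
    # The remaining weight (1 - alpha)^n falls on the starting level.
    return total + (1.0 - alpha) ** n * initial_level


# Hypothetical monthly values, for illustration only (not GDELT data).
series = [2.0, 3.0, 2.5, 4.0, 3.5]
recursive = ses_one_step_forecasts(series, alpha=0.644, initial_level=series[0])
explicit = ses_weighted_sum(series, alpha=0.644, initial_level=series[0])
print(round(recursive[-1], 4), round(explicit, 4))
```

Setting alpha to 1 makes every forecast equal to the most recent observation, while setting it to 0 leaves every forecast at the starting level, which matches the two extremes described above.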

For analyses using the simple exponential smoothing model and based on the GDELT 1.0 Event Database, a model is fit to the training dataset, and predictions for each month from March 2011 to November 2017 are made, converted into a time series object, and then plotted. The smoothing parameter α is calculated to be 0.644. Figure 34 shows the observed data and the predictions made by the model over the training dataset.

Figure 34: Predicted South China Sea tensions by month using simple exponential smoothing model (for analyses based on GDELT 1.0 Event Database)

Note: The solid black line represents predictions based on the model. The dotted black line represents observed tensions. Higher values represent higher tensions (i.e., more conflictive events); lower values represent lower tensions (i.e., more cooperative events).

For analyses based on the GDELT 2.0 GKG, a model is fit to the training dataset, and predictions for each month from April 2015 to November 2017 are made, converted into a time series object, and then plotted. The smoothing parameter α is calculated to be 1e-04 (0.0001), or essentially zero, which suggests that the model does not provide enough additional information to produce better forecasts than the mean for this specific training dataset. This could potentially change as more data becomes available in the future. Figure 35 shows the observed data and the predictions made by the model over the training dataset.

Figure 35: Predicted South China Sea tensions by month using simple exponential smoothing model (for analyses based on GDELT 2.0 GKG)
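A small numerical illustration may help show why a smoothing parameter near zero amounts to forecasting the mean. The Python sketch below is an assumption-laden stand-in, not the estimation procedure used by ets(): it searches a coarse grid of α values for the one that minimizes the sum of squared one-step-ahead errors, with the starting level fixed at the sample mean. The function names and the sample series are hypothetical. Note also that 1e-04 appears to coincide with the default lower bound that ets() places on α, which would mean the estimate sits at the edge of the permitted range.

```python
def ses_sse(observations, alpha, initial_level):
    """Sum of squared one-step-ahead errors for a given alpha."""
    level = initial_level
    sse = 0.0
    for y in observations:
        error = y - level          # forecast for this period is the current level
        sse += error * error
        level = alpha * y + (1.0 - alpha) * level
    return sse


def best_alpha_on_grid(observations, initial_level, steps=100):
    """Return the grid value of alpha in [0, 1] with the smallest SSE."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda a: ses_sse(observations, a, initial_level))


# Hypothetical series that bounces around a constant mean with no persistence.
noisy = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
mean = sum(noisy) / len(noisy)
alpha_hat = best_alpha_on_grid(noisy, initial_level=mean)
print(alpha_hat)
```

For a series like this one, any positive α chases the previous month's swing and makes the next error larger, so the search settles on zero and every forecast stays at the starting level. A steadily rising series pushes the search toward α = 1 instead.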

Autoregressive Model

Like the average and simple exponential smoothing models, the autoregressive (AR) model represents a prediction based on a knowledge of all historical levels of tensions from earlier time periods within the given timeframe. It differs, however, in that its forecasts are based on “a linear regression of the current value of the series against one or more prior values of the series.”¹⁴⁴ The model predicts that tensions in a given month will follow a linear regression trend based on data points from earlier time periods.

144 “6.4.4.4. Common Approaches to Univariate Time Series,” in NIST/SEMATECH e-Handbook of Statistical Methods, July 2017, <http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc444.htm>.

In mathematical notation, the autoregressive model predicts the expected tensions ŷ at time t+1 based on the observed level of tensions y at times t, t−1, t−2, and so on, back to the earliest known time period, using the following equation:

$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + e_t$$

In the above equation, the term c is a constant; e_t is white noise; p is the order of the model (the number of lagged observations included); and different values of the coefficients ϕ₁, …, ϕ_p produce different time-series patterns.¹⁴⁵ The calculations use the Arima() function in the forecast package in R. When the ARIMA(p,d,q) order parameter (order) is set to c(1,0,0), the function makes forecasts using a first-order autoregressive model, AR(1).
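To make the regression reading of the AR(1) case concrete, the Python sketch below estimates c and ϕ₁ by ordinary least squares on pairs of consecutive observations and then produces one-step-ahead predictions. This is a simplified stand-in chosen for transparency: Arima() fits its models by a likelihood-based method, so its estimates would generally differ somewhat. The function names and the noise-free sample series are hypothetical.

```python
def fit_ar1_least_squares(series):
    """Estimate c and phi in y_t = c + phi * y_{t-1} + e_t by ordinary least squares."""
    x = series[:-1]   # lagged values y_{t-1}
    y = series[1:]    # current values y_t
    n = len(x)
    if n < 2:
        raise ValueError("need at least three observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0.0:
        raise ValueError("lagged values are constant; phi is not identified")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi


def ar1_one_step_forecasts(series, c, phi):
    """Forecast each y_t (t >= 1) from the value one period earlier."""
    return [c + phi * previous for previous in series[:-1]]


# Hypothetical series generated exactly by y_t = 1 + 0.5 * y_{t-1} (no noise).
demo = [0.0]
for _ in range(5):
    demo.append(1.0 + 0.5 * demo[-1])

c_hat, phi_hat = fit_ar1_least_squares(demo)
fitted = ar1_one_step_forecasts(demo, c_hat, phi_hat)
print(round(c_hat, 6), round(phi_hat, 6))
```

Because each prediction needs the value from the month before, this sketch returns one fewer fitted value than there are observations; the first month has no prediction.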

For analyses using the autoregressive model and based on the GDELT 1.0 Event Database, a model is fit to the training dataset, and predictions for each month from November 2011 to November 2017 are made, converted into a time series object, and then plotted. Figure 36 shows the observed data and the predictions made by the model over the training dataset.

145 Rob J. Hyndman and George Athanasopoulos, “8.3 Autoregressive models,” in Rob J. Hyndman and George Athanasopoulos, Forecasting: Principles and Practice, May 2012, <https://www.otexts.org/fpp/8/3>.

Figure 36: Predicted South China Sea tensions by month using autoregressive model (for analyses based on GDELT 1.0 Event Database)

For analyses based on the GDELT 2.0 GKG, a model is fit to the training dataset, and predictions for each month from March 2015 to November 2017 are made, converted into a time series object, and then plotted. Figure 37 shows the observed data and the predictions made by the model over the training dataset.

Figure 37: Predicted South China Sea tensions by month using autoregressive model (for analyses based on GDELT 2.0 GKG)

Moving Average Model

Like several of the models above, the moving average (MA) model represents a prediction based on a knowledge of all historical levels of tensions from earlier time periods within the given timeframe. Whereas the autoregressive model’s forecasts are based on past observations, the moving average model’s forecasts are based on past forecast errors.¹⁴⁶ The model predicts that tensions in a given month will follow a linear regression trend based on these forecast errors from earlier time periods.

In mathematical notation, the moving average model predicts the expected tensions ŷ at time t+1 based on past forecast errors related to the observed level of tensions y at times t, t−1, t−2, and so on, back to the earliest known time period, using the following equation:

$$y_t = c + e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2} + \dots + \theta_q e_{t-q},$$

or

$$y_t = e_t + \phi_1 e_{t-1} + \phi_1^2 e_{t-2} + \phi_1^3 e_{t-3} + \dots + \phi_1^p e_{t-p}$$

146 Rob J. Hyndman and George Athanasopoulos, “8.4 Moving average models,” in Rob J. Hyndman and George Athanasopoulos, Forecasting: Principles and Practice, May 2012, <https://www.otexts.org/fpp/8/4>.

In the first equation above, the term c is a constant; e_t is white noise; q is the order of the model (the number of lagged forecast errors included); and different values of the coefficients θ₁, …, θ_q produce different time-series patterns.¹⁴⁷ The calculations use the Arima() function in the forecast package in R. When the ARIMA(p,d,q) order parameter (order) is set to c(0,0,1), the function makes forecasts using a first-order moving average model, MA(1).
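Because the errors in a moving average model are not observed directly, they have to be reconstructed one period at a time from the forecasts themselves. The Python sketch below shows that bookkeeping for the MA(1) case with given values of c and θ₁. It is a minimal illustration under stated assumptions: the function name, the sample values, and the parameters are hypothetical, and the error before the first period is simply assumed to be zero, which is a common simplification rather than a description of what Arima() does internally.

```python
def ma1_one_step_forecasts(series, c, theta):
    """One-step-ahead forecasts for y_t = c + e_t + theta * e_{t-1}.

    The unobserved error before the first period is assumed to be zero,
    a common simplification rather than what Arima() necessarily does.
    Returns (forecasts, errors); forecasts has one extra element for the
    first period after the end of the series.
    """
    previous_error = 0.0
    forecasts = []
    errors = []
    for y in series:
        forecast = c + theta * previous_error
        error = y - forecast          # this period's forecast error e_t
        forecasts.append(forecast)
        errors.append(error)
        previous_error = error
    forecasts.append(c + theta * previous_error)
    return forecasts, errors


# Hypothetical values and parameters, for illustration only.
values = [1.0, 3.0, 2.0]
forecasts, errors = ma1_one_step_forecasts(values, c=2.0, theta=0.5)
print(forecasts)
print(errors)
```

Each forecast is the constant plus a fraction of the previous month's miss, so a month in which tensions came in below the forecast pulls the next forecast down, and vice versa.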

For analyses using the moving average model and based on the GDELT 1.0 Event Database, a model is fit to the training dataset, and predictions for each month from March 2011 to November 2017 are made, converted into a time series object, and then plotted. Figure 38 shows the observed data and the predictions made by the model over the training dataset.

Figure 38: Predicted South China Sea tensions by month using moving average model (for analyses based on GDELT 1.0 Event Database)

147 Rob J. Hyndman and George Athanasopoulos, “8.4 Moving average models,” in Rob J. Hyndman and George Athanasopoulos, Forecasting: Principles and Practice, May 2012, <https://www.otexts.org/fpp/8/4>.

For analyses based on the GDELT 2.0 GKG, a model is fit to the training dataset, and predictions for each month from April 2015 to November 2017 are made, converted into a time series object, and then plotted. Figure 39 shows the observed data and the predictions made by the model over the training dataset.

Figure 39: Predicted South China Sea tensions by month using moving average model (for analyses based on GDELT 2.0 GKG)

ARIMA Prediction Model

Like the average model, the autoregressive integrated moving average (ARIMA) model represents a prediction based on a knowledge of all historical levels of tensions from earlier time periods within the given timeframe. It combines the autoregressive (AR) and moving average (MA) models and additionally allows for differencing; the “integrated” (I) in the name refers to integration, the inverse of differencing.¹⁴⁸

148 Rob J. Hyndman and George Athanasopoulos, “8.5 Non-seasonal ARIMA models,” in Rob J. Hyndman and George Athanasopoulos, Forecasting: Principles and Practice, May 2012, <https://www.otexts.org/fpp/8/5>.

In mathematical notation, the ARIMA model predicts the expected tensions ŷ at time t+1 based on the observed level of tensions y at times t, t−1, t−2, and so on, back to the earliest known time period, using the following equation:

$$y'_t = c + \phi_1 y'_{t-1} + \dots + \phi_p y'_{t-p} + \theta_1 e_{t-1} + \dots + \theta_q e_{t-q} + e_t$$

In the above equation, y′_t is the differenced series; the term c is a constant; e_t is white noise; and different values of the coefficients ϕ and θ produce different time-series patterns.¹⁴⁹ The ARIMA(p,d,q) model requires the setting of three parameters: p, the order of the autoregressive part of the model; d, the degree of first differencing involved; and q, the order of the moving average part. The calculations use the Arima() function in the forecast package in R.
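The role of the d parameter can be illustrated by differencing a series once (d = 1) and then undoing the operation. The Python sketch below is illustrative only; the function names and the sample values are hypothetical. It shows that first differencing replaces levels with month-to-month changes, and that cumulative summation from a known starting value (the "integration" step) recovers the original levels.

```python
def difference(series):
    """First differences: y'_t = y_t - y_{t-1} (one element shorter)."""
    return [current - previous for previous, current in zip(series, series[1:])]


def integrate(differences, first_value):
    """Undo first differencing by cumulative summation from a known start."""
    restored = [first_value]
    for change in differences:
        restored.append(restored[-1] + change)
    return restored


# Hypothetical trending series, for illustration only.
levels = [10.0, 12.0, 15.0, 19.0]
changes = difference(levels)
print(changes)
print(integrate(changes, levels[0]))
```

With d = 0 no differencing is applied and the model reduces to a combination of the AR and MA parts; applying `difference` twice would correspond to d = 2.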

For analyses using the ARIMA model and based on the GDELT 1.0 Event Database, a model is fit to the training dataset, and predictions for each month from February 2011 to November 2017 are made, converted into a time series object, and then plotted. Figure 40 shows the observed data and the predictions made by four ARIMA model variants over the training dataset.

149 Rob J. Hyndman and George Athanasopoulos, “8.5 Non-seasonal ARIMA models,” in Rob J. Hyndman and George Athanasopoulos, Forecasting: Principles and Practice, May 2012, <https://www.otexts.org/fpp/8/5>.

Figure 40: Predicted South China Sea tensions by month using four ARIMA model variants (for analyses based on GDELT 1.0 Event Database)

Note: The solid lines represent predictions based on the models. The dotted black line represents observed tensions. Higher values represent higher tensions (i.e., more conflictive events); lower values represent lower tensions (i.e., more cooperative events).

For analyses based on the GDELT 2.0 GKG, a model is fit to the training dataset, and predictions for each month from February 2015 to November 2017 are made, converted into a time series object, and then plotted. Figure 41 shows the observed data and the predictions made by four ARIMA model variants over the training dataset.

Figure 41: Predicted South China Sea tensions by month using four ARIMA model variants (for analyses based on GDELT 2.0 GKG)

Note: The solid lines represent predictions based on the models. The dotted black line represents observed tensions. Higher values represent higher tensions (i.e., more positive tone); lower values represent lower tensions (i.e., more negative tone).

When each ARIMA model is fit to the training dataset, an AICc (corrected Akaike information criterion) value is produced. The variant with the lowest AICc value is suggested as the best candidate model. The relevant comparisons are discussed in further detail in 4.2.2 Forecast Models.
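The comparison by AICc can be sketched with the standard small-sample formula, AICc = AIC + 2k(k+1)/(n−k−1), where AIC = −2·log-likelihood + 2k, k is the number of estimated parameters, and n is the number of observations. The Python sketch below is a hedged illustration: the function names, the variant labels, and the log-likelihood values are invented solely to exercise the code and are not results from this dissertation. How k is counted for a given fit in R's forecast package is also an assumption here.

```python
def aicc(log_likelihood, n_params, n_obs):
    """Corrected Akaike information criterion.

    n_params counts every estimated parameter (for an ARIMA fit this would
    typically include the AR and MA coefficients, any constant, and the
    error variance).
    """
    if n_obs - n_params - 1 <= 0:
        raise ValueError("too few observations for the AICc correction")
    aic = -2.0 * log_likelihood + 2.0 * n_params
    return aic + (2.0 * n_params * (n_params + 1)) / (n_obs - n_params - 1)


def best_by_aicc(candidates, n_obs):
    """Return the name of the candidate with the smallest AICc.

    candidates maps a label to (log_likelihood, n_params).
    """
    return min(candidates, key=lambda name: aicc(*candidates[name], n_obs))


# Made-up log-likelihoods purely to exercise the functions; these are NOT
# results from the dissertation.
hypothetical = {
    "ARIMA(1,0,0)": (-100.0, 3),
    "ARIMA(0,0,1)": (-101.0, 3),
    "ARIMA(1,0,1)": (-99.5, 4),
}
print(best_by_aicc(hypothetical, n_obs=80))
```

The correction term grows with the number of parameters, so a variant with an extra coefficient has to improve the likelihood by enough to offset the penalty before it is preferred.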

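From the document South China Sea Tensions: An Analysis of GDELT Time Series Data (南海緊張情勢：GDELT 時間序列數據之分析) - NCCU Academic Hub (政大學術集成), pages 122-133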