

For analyses based on the GDELT 2.0 GKG, a series of predictions for each  month from April 2015 to November 2017 is made, converted into a time series  object, and then plotted. Because there is no data for the preceding month, no  prediction for March 2015 is included in the time series. 

 

Figure 33: Predicted South China Sea tensions by month using average benchmark model  (for analyses based on GDELT 2.0 GKG)

 

Note: The solid black line represents predictions based on the model. The dotted black line represents  observed tensions. Higher values represent higher tensions (i.e., more positive tone); lower values  represent lower tensions (i.e., more negative tone). 

 

3.4.4 Forecast Models 

In addition to the four benchmark models covered above, this dissertation considers four forecast models for predicting South China Sea tensions in past and future time periods: a simple exponential smoothing (SES) model, an autoregressive (AR) model, a moving average (MA) model, and an autoregressive integrated moving average (ARIMA) model. As above, the following four subsections explain the approaches, historical knowledge, assumptions, fundamental logic, and mathematical notation for each of these forecast models. They also provide visualizations of the predictions generated and references to the relevant code as contained in the appendices.

國立政治大學 (National Chengchi University)

Simple Exponential Smoothing Model 

Like the average model, the simple exponential smoothing (SES, also called single exponential smoothing) model makes predictions based on a knowledge of all historical levels of tensions from earlier time periods within the given timeframe. It differs, however, in that it uses a weighted average, in which data from more recent time periods are considered to have a larger effect on forecasts than earlier data.¹⁴² Because monthly tensions (Tensions) appear to be stationary (i.e., without any increasing or decreasing trend) and do not show any seasonality, forecasts can be made using simple exponential smoothing.¹⁴³ The simple exponential smoothing model’s rationale for weighting recent data more heavily than earlier data is derived from the knowledge that many real-world processes are more affected by recent events. For example, the state of Russia–United States relations last year would likely be a better predictor of their bilateral relations this year than would observations from three decades ago, but it is reasonable to think that relations historically would continue to have some effect.

In mathematical notation, the simple exponential smoothing model predicts  the expected tensions ŷ  at time t+1  based on the observed level of tensions y  at times  t  , t‑1 , t‑2 , and so on, back to the earliest known time period, using the following  equation: 

 

$$\hat{y}_{t+1} = \alpha y_t + \alpha(1-\alpha) y_{t-1} + \alpha(1-\alpha)^2 y_{t-2} + \dots + \alpha(1-\alpha)^n y_{t-n}$$

142 Rob J. Hyndman and George Athanasopoulos, “7.1 Simple exponential smoothing,” in Rob J. Hyndman and George Athanasopoulos, Forecasting: Principles and Practice, May 2012, <https://www.otexts.org/fpp/7/1>.

143 Avril Coghlan, A Little Book of R For Time Series: Release 0.2, June 15, 2017, <http://a-little-book-of-r-for-time-series.readthedocs.io/en/latest/src/timeseries.html>.


In the above equation, α is the smoothing parameter, with 0 ≤ α ≤ 1: values of α near zero give more weight to earlier data, and values near 1 give more weight to recent data. The calculations use the ets() function in the forecast package in R. When the model parameter is set to ANN and the alpha and beta parameters are set to NULL, the function makes forecasts using the simple exponential smoothing model with additive errors, and the α parameter is estimated automatically from the data.
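The weighted-average equation above can also be written as a running update, which may make the role of α easier to see. The following is a minimal Python sketch, not the R ets() code used in this dissertation; the function names, the toy series, and the choice to supply the starting level by hand (ets() estimates it) are assumptions made for illustration only.

```python
def ses_one_step(y, alpha, level0):
    """One-step-ahead simple exponential smoothing.

    Returns the prediction made for each period in y (before that period
    is observed) and the forecast for the period after the last one.
    level0 is a hypothetical starting level supplied by the caller.
    """
    level = level0
    preds = []
    for obs in y:
        preds.append(level)  # prediction for this period
        # New level: weight alpha on the newest observation,
        # weight (1 - alpha) on everything that came before.
        level = alpha * obs + (1 - alpha) * level
    return preds, level


def ses_weights(alpha, n):
    """Weights alpha * (1 - alpha)^j on y_t, y_{t-1}, ..., y_{t-n+1}."""
    return [alpha * (1 - alpha) ** j for j in range(n)]


# Toy series (made-up values, not GDELT data).
y = [2.0, 4.0, 3.0, 5.0]
preds, next_forecast = ses_one_step(y, alpha=0.5, level0=y[0])
print(preds, next_forecast)
print(ses_weights(0.5, 3))
```

With alpha set to 0 in this sketch, the forecast never moves away from the starting level, which is the flat-line behaviour that a near-zero α produces.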

For analyses using the simple exponential smoothing model and based on the  GDELT 1.0 Event Database, a model is fit to the training dataset, and a series of  predictions for each month from March 2011 to November 2017 is made, converted  into a time series object, and then plotted. The smoothing parameter α  is calculated  to be 0.644. Figure 34 shows the observed data and the predictions made by the  model over the training dataset. 

 

Figure 34: Predicted South China Sea tensions by month using simple exponential  smoothing model (for analyses based on GDELT 1.0 Event Database)

 

Note: The solid black line represents predictions based on the model. The dotted black line represents  observed tensions. Higher values represent higher tensions (i.e., more conflictive events); lower values  represent lower tensions (i.e., more cooperative events). 

 

For analyses based on the GDELT 2.0 GKG, a model is fit to the training dataset, and a series of predictions for each month from April 2015 to November 2017 is made, converted into a time series object, and then plotted. The smoothing parameter α is calculated to be 1e-04 (0.0001), or essentially zero, which suggests that the model does not provide enough additional information to produce better forecasts than the mean for this specific training dataset. This could potentially change as more data become available in the future. Figure 35 shows the observed data and the predictions made by the model over the training dataset.

 

Figure 35: Predicted South China Sea tensions by month using simple exponential  smoothing model (for analyses based on GDELT 2.0 GKG)

 

Note: The solid black line represents predictions based on the model. The dotted black line represents  observed tensions. Higher values represent higher tensions (i.e., more positive tone); lower values  represent lower tensions (i.e., more negative tone). 

 

Autoregressive Model 

Like the average and simple exponential smoothing models, the autoregressive (AR) model represents a prediction based on a knowledge of all historical levels of tensions from earlier time periods within the given timeframe. It differs, however, in that its forecasts are based on “a linear regression of the current value of the series against one or more prior values of the series.”¹⁴⁴ The model predicts that tensions in a given month will follow a linear regression trend based on data points from earlier time periods.

144 “6.4.4.4. Common Approaches to Univariate Time Series,” in NIST/SEMATECH e-Handbook of Statistical Methods, July 2017, <http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc444.htm>.

In mathematical notation, the autoregressive model predicts the expected  tensions ŷ  at time t+1  based on the observed level of tensions y  at times t  , t‑1 , t‑2 ,  and so on, back to the earliest known time period, using the following equation: 

 

$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + e_t$$

In the above equation, the term c is a constant; e is white noise; and different values of the coefficients ϕ give different time-series patterns.¹⁴⁵ The calculations use the Arima() function in the forecast package in R. When the ARIMA(p,d,q) order parameter (order) is set to c(1,0,0), the function makes forecasts using a first-order autoregressive, or AR(1), model.
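To make the regression-on-the-previous-value idea concrete, the following Python sketch fits an AR(1) model by ordinary least squares and produces one-step predictions. It is an illustration under stated assumptions: Arima() in R estimates its parameters differently, and the function names and the toy series here are hypothetical rather than taken from the dissertation's code.

```python
def fit_ar1(y):
    """Estimate c and phi in y_t = c + phi * y_{t-1} + e_t by least squares.

    Assumption: plain least squares on (y_{t-1}, y_t) pairs is used here
    for simplicity; it is not the estimator used by Arima() in R.
    """
    prev, curr = y[:-1], y[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    sxy = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    phi = sxy / sxx
    c = mean_curr - phi * mean_prev
    return c, phi


def ar1_predictions(y, c, phi):
    """Predictions for periods 2 .. n+1.

    The first period gets no prediction because it has no preceding
    observation to regress on.
    """
    return [c + phi * obs for obs in y]


# Toy series generated exactly by y_t = 1 + 0.5 * y_{t-1} (no noise).
y = [0.0, 1.0, 1.5, 1.75, 1.875]
c, phi = fit_ar1(y)
preds = ar1_predictions(y, c, phi)
print(c, phi)
print(preds)
```

Because each prediction needs the previous month's value, the prediction series in this sketch starts one period after the observed series does.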

For analyses using the autoregressive model and based on the GDELT 1.0 Event Database, a model is fit to the training dataset, and a series of predictions for each month from November 2011 to November 2017 is made, converted into a time series object, and then plotted. Figure 36 shows the observed data and the predictions made by the model over the training dataset.

 

145 Rob J. Hyndman and George Athanasopoulos, “8.3 Autoregressive models,” in Rob J. Hyndman and George Athanasopoulos, Forecasting: Principles and Practice, May 2012, <https://www.otexts.org/fpp/8/3>.


Figure 36: Predicted South China Sea tensions by month using autoregressive model (for  analyses based on GDELT 1.0 Event Database)

 

Note: The solid black line represents predictions based on the model. The dotted black line represents  observed tensions. Higher values represent higher tensions (i.e., more conflictive events); lower values  represent lower tensions (i.e., more cooperative events). 

 

For analyses based on the GDELT 2.0 GKG, a model is fit to the training  dataset, and a series of predictions for each month from March 2015 to November  2017 is made, converted into a time series object, and then plotted. Figure 37 shows  the observed data and the predictions made by the model over the training dataset. 

 

   


Figure 37: Predicted South China Sea tensions by month using autoregressive model (for  analyses based on GDELT 2.0 GKG)

 

Note: The solid black line represents predictions based on the model. The dotted black line represents  observed tensions. Higher values represent higher tensions (i.e., more positive tone); lower values  represent lower tensions (i.e., more negative tone). 

 

Moving Average Model 

Like several of the models above, the moving average (MA) model represents a prediction based on a knowledge of all historical levels of tensions from earlier time periods within the given timeframe. Whereas the autoregressive model’s forecasts are based on past observations, the moving average model’s forecasts are based on past forecast errors.¹⁴⁶ The model predicts that tensions in a given month will follow a linear regression trend based on these forecast errors from earlier time periods.

In mathematical notation, the moving average model predicts the expected  tensions ŷ  at time t+1  based on past forecast errors related to the observed level of  tensions y  at times t  , t‑1 , t‑2 , and so on, back to the earliest known time period,  using the following equation: 

 

$$y_t = c + e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2} + \dots + \theta_q e_{t-q}$$

or

$$y_t = e_t + \phi_1 e_{t-1} + \phi_2 e_{t-2} + \phi_3 e_{t-3} + \dots + \phi_p e_{t-p}$$

146 Rob J. Hyndman and George Athanasopoulos, “8.4 Moving average models,” in Rob J. Hyndman and George Athanasopoulos, Forecasting: Principles and Practice, May 2012, <https://www.otexts.org/fpp/8/4>.

In the above equation, the term c is a constant; e is white noise; and different values of the coefficients θ give different time-series patterns.¹⁴⁷ The calculations use the Arima() function in the forecast package in R. When the ARIMA(p,d,q) order parameter (order) is set to c(0,0,1), the function makes forecasts using a first-order moving average, or MA(1), model.
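The way an MA(1) model turns past forecast errors into the next prediction can be sketched in a few lines of Python. This is an illustration only: the values of c and θ are supplied by hand rather than estimated (Arima() estimates them from the data), the error before the first observation is assumed to be zero, and the function name and toy numbers are hypothetical.

```python
def ma1_one_step(y, c, theta):
    """One-step-ahead predictions from an MA(1) model with known c and theta.

    Each prediction is c plus theta times the previous forecast error.
    Assumption: the error before the first observation is taken as zero.
    """
    preds = []
    err = 0.0
    for obs in y:
        pred = c + theta * err
        preds.append(pred)
        err = obs - pred  # forecast error, carried into the next period
    next_pred = c + theta * err
    return preds, next_pred


# Toy inputs (made-up values, not GDELT data or fitted parameters).
y = [1.0, 3.0, 2.0]
preds, next_pred = ma1_one_step(y, c=2.0, theta=0.5)
print(preds, next_pred)
```

In this sketch a large miss in one month pulls the next month's prediction toward the direction of the miss, scaled by θ, and has no direct effect after that.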

For analyses using the moving average model and based on the GDELT 1.0  Event Database, a model is fit to the training dataset, and a series of predictions for  each month from March 2011 to November 2017 is made, converted into a time series  object, and then plotted. Figure 38 shows the observed data and the predictions made  by the model over the training dataset. 

 

Figure 38: Predicted South China Sea tensions by month using moving average model (for  analyses based on GDELT 1.0 Event Database)

 

Note: The solid black line represents predictions based on the model. The dotted black line represents  observed tensions. Higher values represent higher tensions (i.e., more conflictive events); lower values  represent lower tensions (i.e., more cooperative events). 

 

147 Rob J. Hyndman and George Athanasopoulos, “8.4 Moving average models,” in Rob J. Hyndman and George Athanasopoulos, Forecasting: Principles and Practice, May 2012, <https://www.otexts.org/fpp/8/4>.


For analyses based on the GDELT 2.0 GKG, a model is fit to the training  dataset, and a series of predictions for each month from April 2015 to November  2017 is made, converted into a time series object, and then plotted. Figure 39 shows  the observed data and the predictions made by the model over the training dataset. 

 

Figure 39: Predicted South China Sea tensions by month using moving average model (for  analyses based on GDELT 2.0 GKG)

 

Note: The solid black line represents predictions based on the model. The dotted black line represents  observed tensions. Higher values represent higher tensions (i.e., more positive tone); lower values  represent lower tensions (i.e., more negative tone). 

 

ARIMA Prediction Model 

Like the average model, the autoregressive integrated moving average (ARIMA) model represents a prediction based on a knowledge of all historical levels of tensions from earlier time periods within the given timeframe. It combines the autoregressive (AR) and moving average (MA) models and also allows for differencing; the “integrated” (I) in the name refers to integration, the inverse of differencing.¹⁴⁸

148 Rob J. Hyndman and George Athanasopoulos, “8.5 Non-seasonal ARIMA models,” in Rob J. Hyndman and George Athanasopoulos, Forecasting: Principles and Practice, May 2012, <https://www.otexts.org/fpp/8/5>.


In mathematical notation, the ARIMA model predicts the expected tensions ŷ  at time t+1  based on the observed level of tensions y  at times t  , t‑1 , t‑2 , and so on,  back to the earliest known time period, using the following equation: 

 

$$y'_t = c + \phi_1 y'_{t-1} + \dots + \phi_p y'_{t-p} + \theta_1 e_{t-1} + \dots + \theta_q e_{t-q} + e_t$$

In the above equation, y′ is the differenced series; the term c is a constant; e is white noise; and different values of ϕ and θ give different time-series patterns.¹⁴⁹ The ARIMA(p,d,q) model requires the setting of three parameters: p, the order of the autoregressive part of the model; d, the degree of first differencing involved; and q, the order of the moving average part. The calculations use the Arima() function in the forecast package in R.
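The role of the d parameter, and the sense in which integration undoes differencing, can be shown with simple arithmetic. The Python sketch below is illustrative only; the function names and the toy series are hypothetical and are not part of the forecast package or the dissertation's code.

```python
def difference(y, d=1):
    """Apply first differencing d times: each pass replaces the series
    with the changes between consecutive values."""
    for _ in range(d):
        y = [later - earlier for earlier, later in zip(y, y[1:])]
    return y


def integrate(dy, first):
    """Undo one round of differencing by cumulatively summing the
    changes, starting from the first original value."""
    out = [first]
    for step in dy:
        out.append(out[-1] + step)
    return out


# Toy series (made-up values, not GDELT data).
y = [3, 5, 4, 8]
dy = difference(y, d=1)       # month-to-month changes
restored = integrate(dy, y[0])
print(dy, restored)
```

With d = 0 the series is modeled as is; with d = 1 the AR and MA terms are fit to the month-to-month changes, and forecasts of those changes are summed back onto the last observed level.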

For analyses using the ARIMA model and based on the GDELT 1.0 Event  Database, a model is fit to the training dataset, and a series of predictions for each  month from February 2011 to November 2017 is made, converted into a time series  object, and then plotted. Figure 40 shows the observed data and the predictions made  by four ARIMA model variants over the training dataset. 

 

149 Rob J. Hyndman and George Athanasopoulos, “8.5 Non-seasonal ARIMA models,” in Rob J. Hyndman and George Athanasopoulos, Forecasting: Principles and Practice, May 2012, <https://www.otexts.org/fpp/8/5>.


Figure 40: Predicted South China Sea tensions by month using four ARIMA model variants  (for analyses based on GDELT 1.0 Event Database)

 

Note: The solid lines represent predictions based on the models. The dotted black line represents observed tensions. Higher values represent higher tensions (i.e., more conflictive events); lower values represent lower tensions (i.e., more cooperative events).

 

For analyses based on the GDELT 2.0 GKG, a model is fit to the training  dataset, and a series of predictions for each month from February 2015 to November  2017 is made, converted into a time series object, and then plotted. Figure 41 shows  the observed data and the predictions made by four ARIMA model variants over the  training dataset. 

 


Figure 41: Predicted South China Sea tensions by month using four ARIMA model variants  (for analyses based on GDELT 2.0 GKG)

 

Note: The solid lines represent predictions based on the models. The dotted black line represents  observed tensions. Higher values represent higher tensions (i.e., more positive tone); lower values  represent lower tensions (i.e., more negative tone). 

 

When each ARIMA model variant is fit to the training dataset, an AICc (corrected Akaike information criterion) value is produced. The variant with the lowest AICc value is the best candidate model by this criterion. The relevant comparisons are discussed in further detail in 4.2.2 Forecast Models.
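As a rough illustration of how such a comparison works, the Python sketch below computes AICc from a log-likelihood using the standard small-sample correction, AICc = AIC + 2k(k+1)/(n − k − 1), where k is the number of estimated parameters and n the number of observations, and then picks the lowest value. The log-likelihoods, parameter counts, and sample size are invented for the example and are not results from this dissertation; in R, the value is reported by the fitted Arima() model itself.

```python
def aicc(log_lik, n_params, n_obs):
    """Corrected Akaike information criterion.

    AIC  = -2 * log-likelihood + 2 * k
    AICc = AIC + 2 * k * (k + 1) / (n - k - 1)
    where k = n_params (estimated parameters, including the error
    variance) and n = n_obs.
    """
    aic = -2.0 * log_lik + 2.0 * n_params
    return aic + (2.0 * n_params * (n_params + 1)) / (n_obs - n_params - 1)


# Hypothetical candidates: (log-likelihood, number of parameters).
# These numbers are made up purely to show the selection step.
candidates = {
    "ARIMA(1,0,0)": (-101.2, 3),
    "ARIMA(0,0,1)": (-101.9, 3),
    "ARIMA(1,0,1)": (-100.9, 4),
}
n_obs = 80

scores = {name: aicc(ll, k, n_obs) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
print(scores)
print(best)
```

In this made-up example the model with the highest log-likelihood is not selected, because the penalty for its extra parameter outweighs the small gain in fit.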