4. Results and Discussion
4.2 Predicting Tensions
4.2.1 Benchmark Models
Random Benchmark Model
As discussed in {3.4 Statistical Approach for RQ2}, the random benchmark model represents predictions based on zero knowledge of the historical levels of tensions except for the possible range of values for the prediction. The assumption is made[157] that tensions in the following month will not fall outside of the bounds of the historical minimum and maximum levels of tensions. The rationale for this model as relates to making predictions in the real world is covered in {3.4.3 Benchmark Models}.
For analyses using the GDELT 1.0 Event Database, the historical range of South China Sea tensions (Tensions) is from -3.019 (lower tensions; more cooperative) to +5.722 (higher tensions; more conflictive). The random benchmark model’s predictions for the level of tensions in each month for the January 2011 to November 2017 time period plus twelve months into the future are shown in Figure 50.
157 The random benchmark model has zero knowledge of data for specific months, but it is provided with the historical range.
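As a rough illustration of the random benchmark, the sketch below draws each monthly prediction between the historical minimum and maximum reported above. The uniform distribution, the function name `random_benchmark`, and the seed are assumptions made for illustration; the text specifies only that predictions stay within the historical range.

```python
import random


def random_benchmark(low, high, n_months, seed=None):
    """Draw one prediction per month from [low, high] (uniform: an assumption)."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(n_months)]


# Historical range reported for the GDELT 1.0 Event Database analyses.
LOW, HIGH = -3.019, 5.722

# January 2011 to November 2017 is 83 months; add 12 future months.
predictions = random_benchmark(LOW, HIGH, n_months=83 + 12, seed=0)
```

Because the draws ignore the order of the data, the predicted line would be expected to scatter across the whole range rather than track the observed series.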
國立政治大學 National Chengchi University
Figure 50: Predicted South China Sea tensions by month using random benchmark model (for analyses based on GDELT 1.0 Event Database)
Note: The solid black line represents predictions based on the model. The dotted black line represents observed tensions. Higher values represent higher tensions (i.e., more conflictive events); lower values represent lower tensions (i.e., more cooperative events).
By comparing the results of the random benchmark model and the observed data from the test dataset, different forecast accuracy measures are calculated.
          ME        RMSE      MAE       MPE       MAPE      ACF1      Theil's U
Test set  -0.971    2.788     2.441     209.998   397.819   0.435     1.44
The mean absolute error (MAE) value for the forecast data is 2.441, meaning that tensions forecasted by the random benchmark model were on average 2.441 away from the actual level of tensions observed in each time period.
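The accuracy measures reported in this section can be sketched as follows. This is a minimal illustration, assuming errors are defined as observed minus predicted and that ACF1 is the lag-1 autocorrelation of those errors; the function name `accuracy` and the sample values are hypothetical and are not taken from the study's data.

```python
import math


def accuracy(actual, forecast):
    """Return common forecast accuracy measures for two paired series."""
    # Assumed sign convention: error = observed - predicted.
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    # Percentage errors divide by the observed value, which must be non-zero.
    pct = [100.0 * e / a for e, a in zip(errors, actual)]
    mean_e = sum(errors) / n
    # Lag-1 autocorrelation of the errors (ACF1).
    denom = sum((e - mean_e) ** 2 for e in errors)
    numer = sum((errors[i] - mean_e) * (errors[i + 1] - mean_e)
                for i in range(n - 1))
    return {
        "ME": mean_e,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MPE": sum(pct) / n,
        "MAPE": sum(abs(p) for p in pct) / n,
        "ACF1": numer / denom,
    }


# Hypothetical values for illustration only.
observed = [1.0, 2.0, 4.0]
predicted = [2.0, 2.0, 2.0]
measures = accuracy(observed, predicted)
```

Since MPE and MAPE divide by the observed value, they can become very large when observed tensions lie close to zero, which is one reason MAE is the easier measure to interpret on a scale that spans negative and positive values.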
For analyses using the GDELT 2.0 GKG, the historical range of South China Sea tensions (Tensions) is from 0.723 (more positive tone; lower tensions) to 2.137 (more negative tone; higher tensions). The random benchmark model’s predictions for the level of tensions in each month for the March 2015 to November 2017 time period plus twelve months into the future are shown in Figure 51.
Figure 51: Predicted South China Sea tensions by month using random benchmark model (for analyses based on GDELT 2.0 GKG)
Note: The solid black line represents predictions based on the model. The dotted black line represents observed tensions. Higher values represent higher tensions (i.e., more negative tone); lower values represent lower tensions (i.e., more positive tone).
By comparing the results of the random benchmark model and the observed data from the test dataset, MAE and other measures of forecast accuracy are
calculated.
          ME        RMSE      MAE       MPE       MAPE      ACF1      Theil's U
Test set  -0.416    0.898     0.806     -54.721   76.594    -0.064    1.788
The MAE for the forecast data is 0.806, meaning that tensions forecasted by the random benchmark model were on average 0.806 away from the actual level of tensions observed in each time period.
Fixed Benchmark Model
As discussed in {3.4 Statistical Approach for RQ2}, the fixed benchmark model represents a prediction based only on a knowledge of the level of tensions from the previous month. It predicts that tensions in a given month will be equal to tensions
in the previous month. The rationale for this model as relates to making predictions in the real world is covered in {3.4.3 Benchmark Models}.
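The fixed benchmark amounts to carrying the last observed value forward. A minimal sketch follows; the function name `fixed_benchmark`, the `horizon` parameter, and the sample values are hypothetical, and holding the final observation constant over the future months is an assumption about how the twelve-month extension is produced.

```python
def fixed_benchmark(series, horizon=1):
    """Predict each month as the previous month's observed value."""
    # The prediction for month t is the observation at month t-1, so the
    # first month has no prediction.
    in_sample = list(series[:-1])
    # Assumption: future months repeat the last observed value.
    future = [series[-1]] * horizon
    return in_sample, future


# Hypothetical monthly tension values for illustration only.
observed = [0.5, 1.2, 0.9, 1.4]
in_sample, future = fixed_benchmark(observed, horizon=12)
```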
For analyses using the GDELT 1.0 Event Database, the fixed benchmark model’s predictions for the level of tensions in each month for the March 2011 to November 2017 time period plus twelve months into the future are shown in Figure 52.
Figure 52: Predicted South China Sea tensions by month using fixed benchmark model (for analyses based on GDELT 1.0 Event Database)
Note: The solid black line represents predictions based on the model. The dotted black line represents observed tensions. Higher values represent higher tensions (i.e., more conflictive events); lower values represent lower tensions (i.e., more cooperative events).
By comparing the results of the fixed benchmark model and the observed data from the test dataset, MAE and other measures of forecast accuracy are calculated.
          ME        RMSE      MAE       MPE       MAPE      ACF1      Theil's U
Test set  0.033     1.419     1.094     64.518    158.024   -0.145    1
The mean absolute error (MAE) value for the forecast data is 1.094, meaning that tensions forecasted by the fixed benchmark model were on average 1.094 away from the actual level of tensions observed in each time period.
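The Theil's U value of exactly 1 reported for the fixed benchmark is what would be expected if the statistic is the U2 form, which scales a model's relative errors by those of a no-change forecast. The sketch below assumes that form; the text does not state which variant was used, and the function name and sample values are hypothetical.

```python
import math


def theils_u(actual, forecast):
    """Theil's U (U2 form, assumed): model error relative to a no-change forecast."""
    num = 0.0
    den = 0.0
    for t in range(len(actual) - 1):
        # Relative error of the model's forecast for period t+1.
        num += ((forecast[t + 1] - actual[t + 1]) / actual[t]) ** 2
        # Relative error of simply repeating the value from period t.
        den += ((actual[t + 1] - actual[t]) / actual[t]) ** 2
    return math.sqrt(num / den)


# Hypothetical values for illustration only.
observed = [2.0, 4.0, 3.0, 5.0]
# No-change forecast: each period is predicted by the previous observation.
no_change = [observed[0]] + observed[:-1]
u_fixed = theils_u(observed, no_change)
```

Under this definition, values below 1 indicate a model that outperforms the no-change forecast and values above 1 indicate one that does worse.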
For analyses using the GDELT 2.0 GKG, the fixed benchmark model’s predictions for the level of tensions in each month for the April 2015 to November 2017 time period plus twelve months into the future are shown in Figure 53.
Figure 53: Predicted South China Sea tensions by month using fixed benchmark model (for analyses based on GDELT 2.0 GKG)
Note: The solid black line represents predictions based on the model. The dotted black line represents observed tensions. Higher values represent higher tensions (i.e., more negative tone); lower values represent lower tensions (i.e., more positive tone).
By comparing the results of the fixed benchmark model and the observed data from the test dataset, different forecast accuracy measures are calculated.
          ME        RMSE      MAE       MPE       MAPE      ACF1      Theil's U
Test set  -0.069    0.595     0.429     -15.131   35.086    -0.273    1
The MAE for the forecast data is 0.429, meaning that tensions forecasted by the fixed benchmark model were on average 0.429 away from the actual level of tensions observed in each time period.
Linear Benchmark Model
As discussed in {3.4 Statistical Approach for RQ2}, the linear benchmark model represents a prediction based on a knowledge of the level of tensions from the two previous months. It predicts that tensions in the month to be predicted will follow a linear trend based on the two most recent data points. The rationale for this model as relates to making predictions in the real world is covered in {3.4.3 Benchmark
Models}.
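One way to read the linear benchmark is as a straight line drawn through the two most recent observations and extended by one month. The sketch below assumes that reading, and further assumes that future months continue along the line through the final two observations; the function name and sample values are hypothetical.

```python
def linear_benchmark(series, horizon=1):
    """Extrapolate a straight line through the two most recent observations."""
    # Prediction for month t: y[t-1] + (y[t-1] - y[t-2]).
    in_sample = [2 * series[t - 1] - series[t - 2]
                 for t in range(2, len(series))]
    # Assumption: future months keep following the last observed slope.
    slope = series[-1] - series[-2]
    future = [series[-1] + h * slope for h in range(1, horizon + 1)]
    return in_sample, future


# Hypothetical monthly tension values for illustration only.
observed = [1.0, 1.5, 1.25, 2.0]
in_sample, future = linear_benchmark(observed, horizon=3)
```

Because each prediction doubles down on the latest month-to-month change, this benchmark tends to overshoot whenever the series reverses direction.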
For analyses using the GDELT 1.0 Event Database, the linear benchmark model’s predictions for the level of tensions in each month for the January 2011 to November 2017 time period plus twelve months into the future are shown in Figure 54.
Figure 54: Predicted South China Sea tensions by month using linear benchmark model (for analyses based on GDELT 1.0 Event Database)
Note: The solid black line represents predictions based on the model. The dotted black line represents observed tensions. Higher values represent higher tensions (i.e., more conflictive events); lower values represent lower tensions (i.e., more cooperative events).
By comparing the results of the linear benchmark model and the observed data from the test dataset, MAE and other measures of forecast accuracy are calculated.
          ME        RMSE      MAE       MPE       MAPE      ACF1      Theil's U
Test set  -0.13     2.14      1.547     -33.427   220.496   -0.42     1.147
The mean absolute error (MAE) value for the forecast data is 1.547, meaning that tensions forecasted by the linear benchmark model were on average 1.547 away from the actual level of tensions observed in each time period.
For analyses using the GDELT 2.0 GKG, the linear benchmark model’s predictions for the level of tensions in each month for the May 2015 to November 2017 time period plus twelve months into the future are shown in Figure 55.
Figure 55: Predicted South China Sea tensions by month using linear benchmark model (for analyses based on GDELT 2.0 GKG)
Note: The solid black line represents predictions based on the model. The dotted black line represents observed tensions. Higher values represent higher tensions (i.e., more negative tone); lower values represent lower tensions (i.e., more positive tone).
By comparing the results of the linear benchmark model and the observed data from the test dataset, different forecast accuracy measures are calculated.
          ME        RMSE      MAE       MPE       MAPE      ACF1      Theil's U
Test set  -0.058    0.984     0.777     -9.127    73.755    -0.5      1.49
The MAE for the forecast data is 0.777, meaning that tensions forecasted by the linear benchmark model were on average 0.777 away from the actual level of tensions observed in each time period.
Average Benchmark Model
As discussed in {3.4 Statistical Approach for RQ2}, the average benchmark model represents a prediction based on a knowledge of all historical levels of tensions from earlier time periods within the given timeframe. It predicts that tensions in a given month will be equal to the average tensions of all previous months. The rationale for this model as relates to making predictions in the real world is covered in {3.4.3 Benchmark Models}.
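The average benchmark can be sketched as an expanding mean, in which each month's prediction is the mean of every earlier observation. The function name, the `horizon` parameter, and the sample values are hypothetical; using the mean of the full series for all future months is an assumption.

```python
def average_benchmark(series, horizon=1):
    """Predict each month as the mean of all previous months."""
    in_sample = []
    running_total = 0.0
    for count, value in enumerate(series[:-1], start=1):
        running_total += value
        # Mean of the first `count` observations predicts the next month.
        in_sample.append(running_total / count)
    # Assumption: future months are predicted by the mean of the full series.
    future = [sum(series) / len(series)] * horizon
    return in_sample, future


# Hypothetical monthly tension values for illustration only.
observed = [1.0, 3.0, 2.0, 6.0]
in_sample, future = average_benchmark(observed, horizon=2)
```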
For analyses using the GDELT 1.0 Event Database, the average benchmark model’s predictions for the level of tensions in each month for the March 2011 to November 2017 time period plus twelve months into the future are shown in Figure 56.
Figure 56: Predicted South China Sea tensions by month using average benchmark model (for analyses based on GDELT 1.0 Event Database)
Note: The solid black line represents predictions based on the model. The dotted black line represents observed tensions. Higher values represent higher tensions (i.e., more conflictive events); lower values represent lower tensions (i.e., more cooperative events).
By comparing the results of the average benchmark model and the observed data from the test dataset, MAE and other measures of forecast accuracy are calculated.
          ME        RMSE      MAE       MPE       MAPE      ACF1      Theil's U
Test set  -0.056    1.252     0.916     123.139   161.742   0.336     0.625
The mean absolute error (MAE) value for the forecast data is 0.916, meaning that tensions forecasted by the average benchmark model were on average 0.916 away from the actual level of tensions observed in each time period.
For analyses using the GDELT 2.0 GKG, the average benchmark model’s predictions for the level of tensions in each month for the April 2015 to November 2017 time period plus twelve months into the future are shown in Figure 57.
Figure 57: Predicted South China Sea tensions by month using average benchmark model (for analyses based on GDELT 2.0 GKG)
Note: The solid black line represents predictions based on the model. The dotted black line represents observed tensions. Higher values represent higher tensions (i.e., more negative tone); lower values represent lower tensions (i.e., more positive tone).
By comparing the results of the average benchmark model and the observed data from the test dataset, different forecast accuracy measures are calculated.
          ME        RMSE      MAE       MPE       MAPE      ACF1      Theil's U
Test set  -0.224    0.483     0.453     -30.96    41.65     0.039     1.06
The MAE for the forecast data is 0.453, meaning that tensions forecasted by the average benchmark model were on average 0.453 away from the actual level of tensions observed in each time period.
4.2.2 Forecast Models
Simple Exponential Smoothing Model
As discussed in {3.4 Statistical Approach for RQ2}, the simple exponential smoothing model represents a prediction based on a knowledge of all historical levels of
tensions from earlier time periods within the given timeframe and makes predictions using a weighted average, in which data from more recent time periods are
considered to have a larger effect on forecasts than those from earlier time periods.
The rationale for this model as relates to making predictions in the real world is covered in {3.4.4 Forecast Models}.
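The weighting described above can be sketched with the standard simple exponential smoothing recursion, in which the level is updated as α times the newest observation plus (1 − α) times the previous level. Initializing the level at the first observation is an assumption made here for simplicity (fitting software typically estimates it), and the function name and sample values are hypothetical.

```python
def simple_exp_smoothing(series, alpha, initial_level=None):
    """Return one-step-ahead fitted values and the forecast for future months."""
    # Assumption: start the level at the first observation unless one is given.
    level = series[0] if initial_level is None else initial_level
    fitted = []
    for value in series:
        fitted.append(level)  # forecast made before observing `value`
        level = alpha * value + (1 - alpha) * level
    # The final level is the (flat) forecast for every future month.
    return fitted, level


# Hypothetical monthly tension values for illustration only.
observed = [2.0, 4.0, 3.0]
fitted, next_forecast = simple_exp_smoothing(observed, alpha=0.5)
```

With α equal to 1 the recursion reproduces the fixed benchmark, while with α close to 0 the level barely moves from its initial value and the forecasts are effectively constant.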
For analyses using the GDELT 1.0 Event Database, the simple exponential smoothing model’s predictions for the level of tensions in each month for the
February 2011 to November 2017 time period plus twelve months into the future are shown in Figure 58.
Figure 58: Predicted South China Sea tensions by month using simple exponential smoothing model (for analyses based on GDELT 1.0 Event Database)
Note: The solid black line represents predictions based on the model. The dotted black line represents observed tensions. Higher values represent higher tensions (i.e., more conflictive events); lower values represent lower tensions (i.e., more cooperative events).
By comparing the results of the simple exponential smoothing model and the observed data from the test dataset, different forecast accuracy measures are
calculated.
          ME        RMSE      MAE       MPE       MAPE      ACF1      Theil's U
Test set  0.123     1.27      0.965     113.969   126.485   0.322     0.799
The mean absolute error (MAE) value for the forecast data is 0.965, meaning that tensions forecasted by the simple exponential smoothing model were on average 0.965 away from the actual level of tensions observed in each time period.
For analyses using the GDELT 2.0 GKG, the simple exponential smoothing model’s predictions for the level of tensions in each month for the March 2015 to November 2017 time period plus twelve months into the future are shown in Figure 59.
Figure 59: Predicted South China Sea tensions by month using simple exponential smoothing model (for analyses based on GDELT 2.0 GKG)
Note: The solid black line represents predictions based on the model. The dotted black line represents observed tensions. Higher values represent higher tensions (i.e., more negative tone); lower values represent lower tensions (i.e., more positive tone).
The smoothing parameter α is calculated to be 1e-04, or essentially zero, which suggests that the model does not provide enough additional information to produce better forecasts than the mean for these specific training and test datasets. This could potentially change as more data becomes available in the future or if the training and test data windows were shifted.
By comparing the results of the simple exponential smoothing model and the observed data from the test dataset, MAE and other measures of forecast accuracy are calculated.
          ME        RMSE      MAE       MPE       MAPE      ACF1      Theil's U
Test set  0         0.421     0.29      -9.158    22.829    0.038     0.92
The MAE for the forecast data is 0.290, meaning that tensions forecasted by the simple exponential smoothing model were on average 0.290 away from the actual level of tensions observed in each time period. Although this can be used for comparison with other models, it may not be particularly meaningful and is likely coincidental with the test dataset falling near the mean produced by the model.
Autoregressive Model
As discussed in {3.4 Statistical Approach for RQ2}, the autoregressive (AR) model represents a prediction based on a knowledge of all historical levels of tensions from earlier time periods within the given timeframe and predicts that tensions in a given month will follow a linear regression trend based on data points from earlier time periods. The rationale for this model as relates to making predictions in the real world is covered in {3.4.4 Forecast Models}.
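To make the idea of regressing on earlier data points concrete, the sketch below fits a first-order autoregression, y_t = c + φ·y_{t-1}, by ordinary least squares. The order of 1, the least-squares fit, the function names, and the sample values are all assumptions for illustration; the text does not state the order or estimation method used.

```python
def fit_ar1(series):
    """Fit y_t = c + phi * y_{t-1} by ordinary least squares."""
    x = series[:-1]  # lagged values
    y = series[1:]   # current values
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi


def forecast_ar1(last_value, c, phi, horizon):
    """Iterate the fitted equation forward for `horizon` months."""
    forecasts = []
    value = last_value
    for _ in range(horizon):
        value = c + phi * value
        forecasts.append(value)
    return forecasts


# Hypothetical series generated exactly by y_t = 1 + 0.5 * y_{t-1}.
observed = [1.0, 1.5, 1.75, 1.875]
c, phi = fit_ar1(observed)
future = forecast_ar1(observed[-1], c, phi, horizon=2)
```

When the absolute value of φ is below 1, iterated forecasts converge toward the long-run mean c / (1 − φ), so multi-month forecasts flatten out.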
For analyses using the GDELT 1.0 Event Database, the autoregressive model’s predictions for the level of tensions in each month for the February 2011 to
November 2017 time period plus twelve months into the future are shown in Figure 60.
Figure 60: Predicted South China Sea tensions by month using autoregressive model (for analyses based on GDELT 1.0 Event Database)
Note: The solid black line represents predictions based on the model. The dotted black line represents observed tensions. Higher values represent higher tensions (i.e., more conflictive events); lower values represent lower tensions (i.e., more cooperative events).
By comparing the results of the autoregressive model and the observed data from the test dataset, different forecast accuracy measures are calculated.
          ME        RMSE      MAE       MPE       MAPE      ACF1      Theil's U
Test set  -0.063    1.188     0.873     115.274   161.949   0.226     0.645
The mean absolute error (MAE) value for the forecast data is 0.873, meaning that tensions forecasted by the autoregressive model were on average 0.873 away from the actual level of tensions observed in each time period.
For analyses using the GDELT 2.0 GKG, the autoregressive model’s
predictions for the level of tensions in each month for the March 2015 to November 2017 time period plus twelve months into the future are shown in Figure 61.
Figure 61: Predicted South China Sea tensions by month using autoregressive model (for analyses based on GDELT 2.0 GKG)
Note: The solid black line represents predictions based on the model. The dotted black line represents observed tensions. Higher values represent higher tensions (i.e., more negative tone); lower values represent lower tensions (i.e., more positive tone).
By comparing the results of the autoregressive model and the observed data from the test dataset, MAE and other measures of forecast accuracy are calculated.
          ME        RMSE      MAE       MPE       MAPE      ACF1      Theil's U
Test set  -0.261    0.496     0.473     -34.418   44.33     0.071     1.122
The MAE for the forecast data is 0.473, meaning that tensions forecasted by the autoregressive model were on average 0.473 away from the actual level of tensions observed in each time period.
Moving Average Model
As discussed in {3.4 Statistical Approach for RQ2}, the moving average (MA) model represents a prediction based on a knowledge of all historical levels of tensions from earlier time periods within the given timeframe and predicts that tensions in a given
month will follow a linear regression trend based on past forecast errors.[158] The rationale for this model as relates to making predictions in the real world is covered in {3.4.4 Forecast Models}.
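The role of past forecast errors can be illustrated with a first-order moving average model, y_t = μ + ε_t + θ·ε_{t-1}. The sketch below takes μ and θ as given rather than estimating them, and assumes the error before the first observation is zero; the order of 1, the parameter values, and the function name are hypothetical.

```python
def ma1_forecast(series, mu, theta, horizon=1):
    """Forecast from an MA(1) model with known parameters mu and theta."""
    prev_error = 0.0  # assumption: the pre-sample forecast error is zero
    for value in series:
        prediction = mu + theta * prev_error
        prev_error = value - prediction  # forecast error for this month
    # Only the first future month can use an observed error; beyond that,
    # the expected error is zero and the forecast reverts to mu.
    forecasts = [mu + theta * prev_error]
    forecasts += [mu] * (horizon - 1)
    return forecasts


# Hypothetical monthly tension values and parameters for illustration only.
observed = [1.0, 2.0, 1.0]
forecasts = ma1_forecast(observed, mu=1.0, theta=0.5, horizon=3)
```

The reversion to μ after the first step mirrors the way forecasts from a low-order moving average model flatten to a constant level over a twelve-month horizon.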
For analyses using the GDELT 1.0 Event Database, the moving average model’s predictions for the level of tensions in each month for the February 2011 to November 2017 time period plus twelve months into the future are shown in Figure 62.
Figure 62: Predicted South China Sea tensions by month using moving average model (for analyses based on GDELT 1.0 Event Database)
Note: The solid black line represents predictions based on the model. The dotted black line represents observed tensions. Higher values represent higher tensions (i.e., more conflictive events); lower values represent lower tensions (i.e., more cooperative events).
By comparing the results of the moving average model and the observed data from the test dataset, different forecast accuracy measures are calculated.
          ME        RMSE      MAE       MPE       MAPE      ACF1      Theil's U
Test set  -0.068    1.179     0.863     113.753   160.539   0.22      0.639
158 Rob J. Hyndman and George Athanasopoulos, “8.4 Moving average models,” in Rob J. Hyndman and George Athanasopoulos, Forecasting: Principles and Practice, May 2012, <https://www.otexts.org/fpp/8/4>.
The mean absolute error (MAE) value for the forecast data is 0.863, meaning that tensions forecasted by the moving average model were on average 0.863 away from the actual level of tensions observed in each time period.
For analyses using the GDELT 2.0 GKG, the moving average model’s
predictions for the level of tensions in each month for the April 2015 to November 2017 time period plus twelve months into the future are shown in Figure 63.
Figure 63: Predicted South China Sea tensions by month using moving average model (for analyses based on GDELT 2.0 GKG)
Note: The solid black line represents predictions based on the model. The dotted black line represents observed tensions. Higher values represent higher tensions (i.e., more negative tone); lower values represent lower tensions (i.e., more positive tone).
By comparing the results of the moving average model and the observed data from the test dataset, MAE and other measures of forecast accuracy are calculated.
          ME        RMSE      MAE       MPE       MAPE      ACF1      Theil's U
Test set  -0.262    0.495     0.473     -34.44    44.323    0.068     1.12
The MAE for the forecast data is 0.473, meaning that tensions forecasted by the moving average model were on average 0.473 away from the actual level of tensions observed in each time period.
ARIMA Model
As discussed in {3.4 Statistical Approach for RQ2}, the autoregressive integrated moving average (ARIMA) model represents a prediction based on a knowledge of all historical levels of tensions from earlier time periods within the given timeframe and predicts that tensions will be based on a combination of the autoregressive (AR) model and the moving average (MA) model while taking into account integration (I), the inverse of differencing. The rationale for this model as relates to making
predictions in the real world is covered in {3.4.4 Forecast Models}.
Four ARIMA model variants are fit to the training data, and the relevant AICc values are calculated for each. These are summarized in Table 15.
Table 15: ARIMA model variants and AICc values
Database                   ARIMA(p,d,q) Model   AICc
GDELT 1.0 Event Database   ARIMA(1,0,1)         253.24
                           ARIMA(1,1,0)         270.58
                           ARIMA(0,1,1)         249.99
                           ARIMA(1,1,1)         251.41
GDELT 2.0 GKG              ARIMA(1,0,1)         9.63
                           ARIMA(1,1,0)         15.30
                           ARIMA(0,1,1)         9.28
                           ARIMA(1,1,1)         11.83
The variant with the lowest AICc value is taken to be the most suitable ARIMA model for the data. Based on the AICc values, an ARIMA(0,1,1) model is the best of the four variants for analyses with both the GDELT 1.0 Event Database and GDELT 2.0 GKG.[159]
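The selection rule can be checked directly against the AICc values in Table 15. The short sketch below picks the variant with the lowest AICc for each database; the dictionary layout and variable names are illustrative choices, while the numbers are those reported in the table.

```python
# AICc values as reported in Table 15.
aicc = {
    "GDELT 1.0 Event Database": {
        "ARIMA(1,0,1)": 253.24,
        "ARIMA(1,1,0)": 270.58,
        "ARIMA(0,1,1)": 249.99,
        "ARIMA(1,1,1)": 251.41,
    },
    "GDELT 2.0 GKG": {
        "ARIMA(1,0,1)": 9.63,
        "ARIMA(1,1,0)": 15.30,
        "ARIMA(0,1,1)": 9.28,
        "ARIMA(1,1,1)": 11.83,
    },
}

# For each database, keep the variant whose AICc is smallest.
best = {db: min(models, key=models.get) for db, models in aicc.items()}
```

The margins are small in both cases (249.99 against 251.41, and 9.28 against 9.63 for the nearest alternatives), so the ranking could plausibly change with a different training window.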
For analyses using the GDELT 1.0 Event Database, the ARIMA(0,1,1) model’s