

Note: Higher values represent higher tensions (i.e., more conflictive events); lower values represent lower tensions (i.e., more cooperative events). Training data include the first 80% of months; test data include the most recent 20% of months.

 

Figure 25: Training and test datasets showing monthly average South China Sea tensions  from March 2015 to November 2017 (based on GDELT 2.0 GKG)

 

Note: Higher values represent higher tensions (i.e., more negative tone); lower values represent lower tensions (i.e., more positive tone). Training data include the first 80% of months; test data include the most recent 20% of months.
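The chronological split described in the figure notes can be illustrated with a short Python sketch. This is not the R code referenced in the appendices; the function name `chronological_split`, the rounding rule (rounding the cut point down), and the sample values are assumptions made only for illustration.

```python
import math


def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered list into (training, test) without shuffling."""
    # Assumption: a fractional cut point is rounded down; the source does
    # not state how a non-integer number of months is handled.
    cut = math.floor(len(series) * train_fraction)
    return series[:cut], series[cut:]


# Hypothetical monthly values, oldest first (not taken from GDELT).
monthly_tensions = [1.2, 1.4, 1.1, 1.6, 1.5, 1.3, 1.7, 1.9, 1.8, 2.0]
train, test = chronological_split(monthly_tensions)
print(len(train), len(test))
```

Keeping the observations in time order means the test set always contains the most recent months, so no model is evaluated on data older than what it was given.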

 

3.4.3 Benchmark Models 

For the purposes of comparison with the four forecast models introduced in {3.4.4 Forecast Models}, four benchmark models are used in the analyses: a random benchmark model, a fixed benchmark model, a linear benchmark model, and an average benchmark model. The following four subsections explain the approaches, historical knowledge, assumptions, fundamental logic, and mathematical notation for each of these benchmark models. They also provide visualizations of the predictions generated by each model for past time periods, so that readers can more readily understand them, as well as references to the relevant code contained in the appendices.

National Chengchi University (國立政治大學)

Random Benchmark Model 

The random benchmark model represents predictions based on zero knowledge of the historical levels of tensions except for the possible range of values for the prediction.¹⁴¹ The assumption is made that tensions in the following month will not fall outside of the bounds of the historical minimum and maximum levels of tensions. For analyses using the GDELT 1.0 Event Database, the historical range of South China Sea tensions (Tensions) is from -3.019 (lower tensions; more cooperative) to +5.722 (higher tensions; more conflictive). For analyses using the GDELT 2.0 GKG, the historical range of South China Sea tensions (Tensions) is from 0.723 (more positive tone; lower tensions) to 2.137 (more negative tone; higher tensions).

Although the random benchmark model is simplistic, its logic can be easily understood. Making random predictions with a knowledge of the range of possibilities is clearly more likely to be accurate than doing so without a knowledge of the possible range. For example, if one were asked to predict the number of attendees at an upcoming regular event and were given the historical range (i.e., the minimum and maximum number of attendees in all previous events), forecasting within this range would be a logical approach.

141 The random benchmark model has zero knowledge of data for specific months, but it is provided with the historical range.

In mathematical notation, the statement $R \sim U([y_{\min}, y_{\max}])$ means that $R$ is a random number from a uniform distribution between the historical minimum level of tensions $y_{\min}$ and maximum level of tensions $y_{\max}$. Thus, a random prediction of the expected tensions $\hat{y}$ at time $t+1$ can be made using the following equation:

$$\hat{y}_{t+1} = R \sim U([y_{\min}, y_{\max}])$$

For analyses using the random benchmark model and based on the GDELT 1.0 Event Database, a series of predictions for each month from February 2011 to November 2017 is made, converted into a time series object, and then plotted. The random benchmark model’s predictions for the level of tensions in each month from February 2011 to November 2017 are shown in Figure 26. It can be seen that the expected values all fall within the historical range of tensions.
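As a rough illustration of the random benchmark, the following Python sketch draws one uniform value per month within the historical range. It is not the R code from the appendices; the function name `random_benchmark` and the fixed seed are assumptions for illustration, while the range and the month count (February 2011 to November 2017, i.e., 82 months) come from the text above.

```python
import random


def random_benchmark(y_min, y_max, n_months, seed=None):
    """Draw one prediction per month from U([y_min, y_max])."""
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    return [rng.uniform(y_min, y_max) for _ in range(n_months)]


# Historical range quoted above for the GDELT 1.0 Event Database.
Y_MIN, Y_MAX = -3.019, 5.722
# February 2011 to November 2017 inclusive is 82 months.
predictions = random_benchmark(Y_MIN, Y_MAX, n_months=82, seed=42)
print(all(Y_MIN <= p <= Y_MAX for p in predictions))
```

Because every draw is bounded by the historical minimum and maximum, the predictions cannot leave the historical range, which is the only knowledge this benchmark is given.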

 

Figure 26: Predicted South China Sea tensions by month using random benchmark model  (for analyses based on GDELT 1.0 Event Database)

 

Note: The solid black line represents predictions based on the model. The dotted black line represents  observed tensions. Higher values represent higher tensions (i.e., more conflictive events); lower values  represent lower tensions (i.e., more cooperative events). 

 

For analyses based on the GDELT 2.0 GKG, a series of predictions for each month from March 2015 to November 2017 is made, converted into a time series object, and then plotted in the same fashion as the predictions produced above. Complete code for the data analyses can be found in {Appendix II: GDELT Data Analysis Code for R}. The random benchmark model’s predictions for the level of tensions in each month are shown in Figure 27. It can be seen that the expected values all fall within the historical range of tensions.

 

Figure 27: Predicted South China Sea tensions by month using random benchmark model  (for analyses based on GDELT 2.0 GKG)

 

Note: The solid black line represents predictions based on the model. The dotted black line represents observed tensions. Higher values represent higher tensions (i.e., more negative tone); lower values represent lower tensions (i.e., more positive tone).

 

Fixed Benchmark Model 

The fixed benchmark model represents a prediction based only on a knowledge of the level of tensions from the previous month. It predicts that tensions in a given month will be equal to tensions in the previous month. Although simplistic, the rationale for making such predictions is clear because having knowledge of one relevant value can enable us to make a more educated forecast than having zero knowledge of previous data. For example, if one were asked to predict the number of trees in a given forest or the price of a certain stock and were given the figure from the previous time period, it would be logical to guess that the value remained unchanged in the following time period.

In mathematical notation, the fixed model predicts the expected tensions $\hat{y}$ at time $t+1$ based on the observed level of tensions $y$ at time $t$ using the following equation:

$$\hat{y}_{t+1} = y_t$$

For analyses using the fixed benchmark model and based on the GDELT 1.0 Event Database, a series of predictions for each month from March 2011 to November 2017 is made, converted into a time series object, and then plotted. Because there are no data for the preceding month, no prediction for February 2011 is included in the time series.
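The fixed benchmark can be sketched in a few lines of Python. This is an illustrative sketch rather than the R code from the appendices; the function name `fixed_benchmark`, the use of `None` for the month without a prediction, and the sample values are assumptions.

```python
def fixed_benchmark(observed):
    """Predict each month as the previous month's observed value."""
    if not observed:
        return []
    # The first month has no predecessor, so it receives no prediction.
    return [None] + list(observed[:-1])


# Hypothetical monthly values, oldest first (not taken from GDELT).
observed = [1.0, 2.5, 2.0, 3.0]
print(fixed_benchmark(observed))  # [None, 1.0, 2.5, 2.0]
```

Aligning the predictions with the observed series in this way makes the missing first prediction explicit and keeps the two series the same length for plotting or error calculation.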

 

Figure 28: Predicted South China Sea tensions by month using fixed benchmark model (for  analyses based on GDELT 1.0 Event Database)

 

Note: The solid black line represents predictions based on the model. The dotted black line represents  observed tensions. Higher values represent higher tensions (i.e., more conflictive events); lower values  represent lower tensions (i.e., more cooperative events). 

 


For analyses based on the GDELT 2.0 GKG, a series of predictions for each month from April 2015 to November 2017 is made, converted into a time series object, and then plotted. Because there are no data for the preceding month, no prediction for March 2015 is included in the time series.

 

Figure 29: Predicted South China Sea tensions by month using fixed benchmark model (for  analyses based on GDELT 2.0 GKG)

 

Note: The solid black line represents predictions based on the model. The dotted black line represents observed tensions. Higher values represent higher tensions (i.e., more negative tone); lower values represent lower tensions (i.e., more positive tone).

 

Linear Benchmark Model 

The linear benchmark model represents a prediction based on a knowledge of the level of tensions from the two previous months. It predicts that tensions in the month to be predicted will follow a linear trend based on the two most recent data points. Because many processes in the real world follow roughly linear trends, informed predictions can often be made based on a limited knowledge of historical data. For example, one could make a more educated forecast of the price of a commodity next month if one knew that commodity’s price this month and last month by inferring that its price would continue to increase or decrease at the same rate as before.


In mathematical notation, the linear model predicts the expected tensions $\hat{y}$ at time $t+1$ based on the observed level of tensions $y$ at times $t$ and $t-1$ using the following equation:

$$\hat{y}_{t+1} = (y_t - y_{t-1}) + y_t$$

For analyses using the linear benchmark model and based on the GDELT 1.0 Event Database, a series of predictions for each month from April 2011 to November 2017 is made, converted into a time series object, and then plotted. Because data for the preceding two months are required for forecasting with the model, no predictions for February 2011 or March 2011 are included in the time series.
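A minimal Python sketch of the linear benchmark follows, extending the change between the two most recent observations by one step. It is not the R code from the appendices; the function name `linear_benchmark`, the `None` placeholders, and the sample values are assumptions for illustration.

```python
def linear_benchmark(observed):
    """Extrapolate each month from the two preceding observed values."""
    predictions = [None] * len(observed)
    for i in range(2, len(observed)):
        previous, before_previous = observed[i - 1], observed[i - 2]
        # Last change (previous - before_previous) added to the last value.
        predictions[i] = (previous - before_previous) + previous
    return predictions


# Hypothetical monthly values, oldest first (not taken from GDELT).
observed = [1.0, 2.0, 4.0, 3.0]
print(linear_benchmark(observed))  # [None, None, 3.0, 6.0]
```

Note that, unlike the random benchmark, this extrapolation is not bounded by the historical range: a large recent change can push the prediction beyond any value observed so far.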

 

Figure 30: Predicted South China Sea tensions by month using linear benchmark model (for  analyses based on GDELT 1.0 Event Database)

 

Note: The solid black line represents predictions based on the model. The dotted black line represents  observed tensions. Higher values represent higher tensions (i.e., more conflictive events); lower values  represent lower tensions (i.e., more cooperative events). 

 

For analyses based on the GDELT 2.0 GKG, a series of predictions for each month from May 2015 to November 2017 is made, converted into a time series object, and then plotted. Because data for the preceding two months are required for forecasting with the model, no predictions for March 2015 or April 2015 are included in the time series.

 

Figure 31: Predicted South China Sea tensions by month using linear benchmark model (for  analyses based on GDELT 2.0 GKG)

 

Note: The solid black line represents predictions based on the model. The dotted black line represents observed tensions. Higher values represent higher tensions (i.e., more negative tone); lower values represent lower tensions (i.e., more positive tone).

 

Average Benchmark Model 

The average benchmark model represents a prediction based on a knowledge of all historical levels of tensions from earlier time periods within the given timeframe. It predicts that tensions in a given month will be equal to the average tensions of all previous months. As with the models above, the average benchmark model is simplistic but has a clear rationale. Having knowledge of historical data, and assuming that the value for a given time period is more likely to lie near the historical average than to be an unexpected outlier, can enable us to make a more educated forecast than having zero knowledge of previous data. For example, if one were asked to predict the number of students in a given classroom and were given the average number of students from all previous time periods, it would be logical to guess that the number would be near that average in the following time period, assuming that there was no knowledge of trends or other additional information to take into account.

In mathematical notation, the average benchmark model predicts the expected tensions $\hat{y}$ at time $t+1$ based on the average level of tensions $y$ at times $t$, $t-1$, $t-2$, and so on, back to the earliest known time period, using the following equation:

$$\hat{y}_{t+1} = \operatorname{mean}(y_1, y_2, y_3, \ldots, y_{t-2}, y_{t-1}, y_t) \quad \text{or} \quad \hat{y}_{t+1} = \operatorname{mean}(y_{t-n}, \ldots, y_{t-2}, y_{t-1}, y_t)$$

 

For analyses using the average benchmark model and based on the GDELT 1.0 Event Database, a series of predictions for each month from March 2011 to November 2017 is made, converted into a time series object, and then plotted. Because there are no data for the preceding month, no prediction for February 2011 is included in the time series.
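The average benchmark can be sketched in Python as an expanding mean over all earlier months. This is an illustrative sketch, not the R code from the appendices; the function name `average_benchmark`, the `None` placeholder for the first month, and the sample values are assumptions.

```python
def average_benchmark(observed):
    """Predict each month as the mean of all earlier observed values."""
    predictions = [None] * len(observed)
    running_total = 0.0
    for i in range(1, len(observed)):
        running_total += observed[i - 1]
        # At this point running_total covers the i months before month i.
        predictions[i] = running_total / i
    return predictions


# Hypothetical monthly values, oldest first (not taken from GDELT).
observed = [2.0, 4.0, 6.0, 0.0]
print(average_benchmark(observed))  # [None, 2.0, 3.0, 4.0]
```

Keeping a running total avoids recomputing the mean from scratch for every month, and the second prediction coincides with the fixed benchmark because only one earlier value is available at that point.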

 

Figure 32: Predicted South China Sea tensions by month using average benchmark model  (for analyses based on GDELT 1.0 Event Database)

 


Note: The solid black line represents predictions based on the model. The dotted black line represents  observed tensions. Higher values represent higher tensions (i.e., more conflictive events); lower values  represent lower tensions (i.e., more cooperative events). 

 

For analyses based on the GDELT 2.0 GKG, a series of predictions for each month from April 2015 to November 2017 is made, converted into a time series object, and then plotted. Because there are no data for the preceding month, no prediction for March 2015 is included in the time series.

 

Figure 33: Predicted South China Sea tensions by month using average benchmark model  (for analyses based on GDELT 2.0 GKG)

 

Note: The solid black line represents predictions based on the model. The dotted black line represents observed tensions. Higher values represent higher tensions (i.e., more negative tone); lower values represent lower tensions (i.e., more positive tone).