2. Literature Review
2.5 Analyses with GDELT in Other Geographic Contexts
In the context of South China Sea tensions, there are no existing studies to date using analyses based on GDELT. As such, this dissertation represents a first cut at the data and aims to serve as a foundation for future research on the issue. That said, previous studies have used GDELT for analyses related to various other geographic contexts around the world. For the purposes of this dissertation, those studies using GDELT to analyze or predict conflictive events, such as protests, violent uprisings, armed conflict, and genocides, are the most relevant in terms of purpose, data, and methodology.
77 Scott E. Page, The Model Thinker, University of Michigan, 2015, p. 36.
National Chengchi University (國立政治大學)
Two authors, Philip A. Schrodt and James E. Yonamine, have published various articles over the years covering the use of event databases, including GDELT, for prediction, and their studies serve as an indispensable resource for relevant research. In an article published before the public release of GDELT, Yonamine and Schrodt compare the two primary forms of data used in quantitative conflict studies: structural data and event data. Structural databases focus on broad structural aspects of interstate relations, tend to be manually compiled by researchers, are typically aggregated at the state-year level, and have developed slowly since the 1960s. Event databases are typically derived from news reports, contain records of specific events or stories, are coded by date and even time, and have evolved from manual compilation in the mid-1970s to automatic compilation and coding using computer algorithms since the late 1980s, allowing for the inclusion of much more fine-grained data on a theoretically limitless number of issues. They argue that structural databases have been useful for understanding broad questions about international relations and conflict but are limited by their temporal aggregation, unable to shed light on the ongoing interactions between states, and unhelpful for policymakers interested in predicting events at given times. They then discuss various event databases, including their approaches to data collection, challenges faced, and limitations, making relevant suggestions for improvement that seem to have foreshadowed the launch of GDELT.⁷⁸ A follow-up article published two years later by the same authors builds upon this work.⁷⁹
Following GDELT’s initial public release, Yonamine describes, in a reference article on dealing with event data, the three main aggregation choices available to researchers: actors, actions, and time.⁸⁰ The analyses in this dissertation aggregate the data by time into monthly intervals, which are short enough to reflect the effects of individual events and be useful for policymakers yet long enough to include sufficient data for averaging. This dissertation also filters data by location, omitting all records not relevant to the South China Sea, and, for analyses related to RQ1, by state actor, in order to assess the relationship between state involvement and tensions.
78 James E. Yonamine and Philip A. Schrodt, “A Guide to Event Data: Past, Present, and Future,” November 28, 2011, <http://jayyonamine.com/wp-content/uploads/2012/07/YonamineSchrodt_A_Guide_to_Event_Data.pdf>.
79 Philip A. Schrodt and James E. Yonamine, “A Guide to Event Data: Past, Present, and Future,” All Azimuth 2(2): 5–22, July 2013, <http://dergipark.gov.tr/download/article-file/147447>.
80 James E. Yonamine, “Working with Event Data: A Guide to Aggregation Choices,” 2013,
<http://jayyonamine.com/wp-content/uploads/2013/04/Working-with-Event-Data-A-Guide-to-Aggre
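The filtering and monthly aggregation described above can be illustrated with a minimal sketch. The record layout, country codes, location labels, and values below are hypothetical and chosen only for illustration; they do not reproduce the GDELT schema or the actual processing pipeline of this dissertation.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical event records: (event date, actor code, location label, Goldstein value).
# This layout is an assumption for illustration, not GDELT's actual schema.
EVENTS = [
    (date(2014, 5, 2), "CHN", "South China Sea", -5.0),
    (date(2014, 5, 20), "VNM", "South China Sea", -7.0),
    (date(2014, 5, 21), "CHN", "Taiwan Strait", -2.0),
    (date(2014, 6, 3), "CHN", "South China Sea", 1.0),
    (date(2014, 6, 15), "PHL", "South China Sea", -3.0),
]


def monthly_average(events, location, actor=None):
    """Average values per (year, month), filtered by location and optionally by actor."""
    buckets = defaultdict(list)
    for day, event_actor, event_location, goldstein in events:
        if event_location != location:
            continue  # omit records not relevant to the area of interest
        if actor is not None and event_actor != actor:
            continue  # optional state-actor filter
        buckets[(day.year, day.month)].append(goldstein)
    # Sort by (year, month) so the result reads as a time series.
    return {month: mean(values) for month, values in sorted(buckets.items())}


all_actors = monthly_average(EVENTS, "South China Sea")
china_only = monthly_average(EVENTS, "South China Sea", actor="CHN")
print(all_actors)
print(china_only)
```

In this toy example the Taiwan Strait record is dropped by the location filter, and the actor filter narrows each monthly average to events involving a single state.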
In the few years since GDELT’s public release, prediction of conflictive events has been among the most common themes in studies using it. Many of these have focused on predicting protests, violent uprisings, armed conflict, and genocides in Africa and the Middle East. In an early study from 2013, Yonamine uses an autoregressive fractionally integrated moving average (ARFIMA) model to forecast levels of violence in districts in Afghanistan. In the study, he also compares existing event databases based on five attributes (broad spatial coverage, density, geocoding, accuracy, and future availability in real time), suggesting that GDELT is the first to satisfy all of the criteria of an ideal dataset.⁸¹ In other research, Yonamine has also used GDELT data to analyze the effects of violence against Israel on the Tel Aviv stock exchange, finding that the two are not significantly correlated but that conflictive events do affect certain companies included in the exchange. He has likewise examined the effects of civil war on interstate war, finding that domestic conflicts increase the likelihood of a state becoming involved in interstate conflicts with its neighbors.⁸²
81 James E. Yonamine, “Predicting Future Levels of Violence in Afghanistan Districts using GDELT,” April 2013, p. 2, <http://jayyonamine.com/wp-content/uploads/2013/04/Predicting-Future-Levels-of-Violence-in-Afghanistan-Districts-using-GDELT.pdf>.
82 James E. Yonamine, A Nuanced Study of Political Conflict Using the Global Datasets of Events Location and Tone (GDELT) Dataset, Pennsylvania State University, August 2013, <https://etda.libraries.psu.edu/catalog/18659>.
In another early study from 2013, Arva et al. compare the Integrated Conflict Early Warning System (ICEWS) database (a US government project that was capable of forecasting conflictive events but later became classified) and the GDELT 1.0 Event Database in terms of forecast accuracy using various statistical models. They find that GDELT “performs as well or better than the data in the original ICEWS”, suggesting that it is likely the best publicly available global dataset for making predictions related to conflictive events.⁸³ Moreover, they conclude that GDELT’s “firehose” approach to data collection and inclusion in the resulting dataset could make it less suitable for monitoring but may actually be to its benefit for the purposes of statistical forecasting, as is the focus of the analyses for RQ2 in this dissertation.⁸⁴
Brandt, Freeman, Lin, and Schrodt look at the effect of training sets of different lengths on forecasting using GDELT data. Taking cross-strait relations as a case study, they find that shorter training sets may be as effective as longer ones.⁸⁵ They also suggest that, at certain points in event data, there may be a “clear change in the dynamics of the data,” so inclusion of all available historical data may not be necessary or even desirable.⁸⁶ For the purposes of this dissertation, their conclusions are significant because they demonstrate that it is not necessary to use the entire dataset dating back to 1979 to achieve meaningful results.
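The intuition behind this finding can be shown with a toy sketch. It assumes a hypothetical monthly series with a single shift in level and uses a simple moving-average forecaster; it is not the model or the data used by Brandt et al., and the function name `rolling_mae` is introduced here only for illustration.

```python
from statistics import mean

# Hypothetical monthly series whose level shifts halfway through,
# standing in for a "clear change in the dynamics of the data".
SERIES = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0]


def rolling_mae(series, window, start):
    """Mean absolute error of one-step-ahead forecasts from index `start` onward.

    Each forecast is the mean of the previous `window` observations;
    window=None uses the entire history available at that point.
    """
    errors = []
    for t in range(start, len(series)):
        history = series[:t] if window is None else series[t - window:t]
        errors.append(abs(series[t] - mean(history)))
    return mean(errors)


short_mae = rolling_mae(SERIES, window=2, start=8)     # short training set
long_mae = rolling_mae(SERIES, window=None, start=8)   # all available history
print(short_mae, long_mae)
```

Because the long training set still contains observations from before the shift, its forecasts lag behind the new level, whereas the short window adapts at once. Real event data are far noisier, so the advantage of either choice is an empirical question.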
83 Bryan Arva, John Beieler, Ben Fisher, Gustavo Lara, Philip A. Schrodt, Wonjun Song, Marsha Sowell, and Sam Stehle, “Improving Forecasts of International Events of Interest,” European Political Science Association 2013 Annual General Conference, July 3, 2013, <https://ssrn.com/abstract=2225130>.
84 Bryan Arva, John Beieler, Ben Fisher, Gustavo Lara, Philip A. Schrodt, Wonjun Song, Marsha Sowell, and Sam Stehle, “Improving Forecasts of International Events of Interest,” European Political Science Association 2013 Annual General Conference, July 3, 2013, p. 57, <https://ssrn.com/abstract=2225130>.
85 Patrick T. Brandt, John R. Freeman, Tse-min Lin, and Philip A. Schrodt, “Forecasting Conflict in the Cross-Straits: Long Term and Short Term Predictions,” Annual Meeting of the American Political Science Association, September 4, 2013, <http://www.utdallas.edu/~pbrandt/Patrick_Brandts_Website/Research_files/ForecastWindows.pdf>.
86 Patrick T. Brandt, John R. Freeman, Tse-min Lin, and Philip A. Schrodt, “Forecasting Conflict in the Cross-Straits: Long Term and Short Term Predictions,” Annual Meeting of the American Political Science Association, September 4, 2013, p. 3, <http://www.utdallas.edu/~pbrandt/Patrick_Brandts_Website/Research_files/ForecastWindows.pdf>.
Abb and Strüver, like this study, draw upon Goldstein values from the GDELT 1.0 Event Database, using them as a measure of the “quality of relations” between two countries. Their term is simply the inverse of “tensions” as used in this study, so it effectively measures tensions between two countries, one of which is always China in their article.
Whereas the analyses in this dissertation use monthly averages, their data are aggregated into yearly averages in order to match the frequency of their other variables, including their dependent variable, “global policy alignment”,⁸⁷ which is derived from United Nations General Assembly voting records.⁸⁸ They confirm the validity of the Goldstein values data by manually comparing the levels of conflict/cooperation in each time period with real-world events and relevant qualitative literature,⁸⁹ as is done for tensions data in {3.2.2 Linking GDELT 1.0 Event Database Data to Real World Events} and {3.2.5 Linking GDELT 2.0 GKG Data to Real World Events} of this dissertation. Using these data, they conclude that the quality of bilateral relations between China and Southeast Asian countries is strongly correlated with policy alignment with China at the global level.⁹⁰
Davis, Fuchs, and Johnson also incorporate data from the GDELT 1.0 Event Database into their analysis of the effect of bilateral political relations on bilateral trade. As in this dissertation, they use Goldstein values as a measure of “tensions”, noting that the measure “captures the likelihood that the event will impact on the stability of the country”.⁹¹ Furthermore, like this dissertation and Abb and Strüver, they manually link real-world events to visible changes in the tensions data to confirm its validity.⁹²
87 Pascal Abb and Georg Strüver, “Regional Linkages and Global Policy Alignment: The Case of China–Southeast Asia Relations,” Issues & Studies 51(4): 33–83, December 2015, p. 54.
88 Pascal Abb and Georg Strüver, “Regional Linkages and Global Policy Alignment: The Case of China–Southeast Asia Relations,” Issues & Studies 51(4): 33–83, December 2015, pp. 49–50.
89 Pascal Abb and Georg Strüver, “Regional Linkages and Global Policy Alignment: The Case of China–Southeast Asia Relations,” Issues & Studies 51(4): 33–83, December 2015, pp. 56–57.
90 Pascal Abb and Georg Strüver, “Regional Linkages and Global Policy Alignment: The Case of China–Southeast Asia Relations,” Issues & Studies 51(4): 33–83, December 2015.
91 Christina Davis, Andreas Fuchs, and Kristina Johnson, “State Control and the Effects of Foreign Relations on Bilateral Trade,” University of Heidelberg Department of Economics Discussion Paper Series 576, November 2014, pp. 21–22, <http://archiv.ub.uni-heidelberg.de/volltextserver/17673/1/davis_fuchs_johnson_2014_dp576.pdf>.
92 Christina Davis, Andreas Fuchs, and Kristina Johnson, “State Control and the Effects of Foreign Relations on Bilateral Trade,” University of Heidelberg Department of Economics Discussion Paper Series 576, November 2014, p. 22, <http://archiv.ub.uni-heidelberg.de/volltextserver/17673/1/davis_fuchs_johnson_2014_dp576.pdf>.
Various other studies related to interstate relations and conflict in other geographic contexts have been conducted using GDELT data. Morgan and Reiter, for example, analyze factors affecting government funding allocation for roads in India, one of which is georeferenced violent events from GDELT.⁹³ Although the variables and approaches used in such studies are not directly applicable to those of this dissertation, they are worth mentioning as relevant examples of how GDELT data have been used in analyses of international relations in other geographic contexts.
To date, no studies have attempted to use GDELT data to analyze the relationship between state involvement and tensions in the South China Sea, explore historical levels of tensions in the maritime area, or predict the escalation and de-escalation of tensions there into the future. The analyses of this dissertation aim to change that. Using two distinct databases and two ways of measuring tensions, and aggregating tensions into monthly averages, they first assess and visualize historical tensions based on observed data related to the South China Sea. Then, for RQ1, they explore the relationship between state involvement and tensions for eleven countries using two different interpretations of state involvement. For RQ2, predictions are made using four benchmark models and four forecast models for past and future tensions in each time period. These models are then compared based on their respective forecast accuracies to determine their relative performance at predicting South China Sea tensions over time.
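A comparison of this kind can be sketched as follows. The models below are assumptions chosen for brevity: naive and mean forecasts serve as benchmarks, and a drift extrapolation stands in for a forecast model. They are not necessarily the four benchmark and four forecast models used in this dissertation, and the series is hypothetical.

```python
from statistics import mean

# Hypothetical monthly tension values; the last three months are held out.
SERIES = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
TRAIN, TEST = SERIES[:5], SERIES[5:]


def naive_forecast(train, horizon):
    """Benchmark: repeat the last observed value."""
    return [train[-1]] * horizon


def mean_forecast(train, horizon):
    """Benchmark: repeat the mean of the training data."""
    return [mean(train)] * horizon


def drift_forecast(train, horizon):
    """Illustrative stand-in for a forecast model: extend the average change per period."""
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + slope * step for step in range(1, horizon + 1)]


def mae(actual, predicted):
    """Mean absolute error between held-out values and forecasts."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


models = {"naive": naive_forecast, "mean": mean_forecast, "drift": drift_forecast}
scores = {name: mae(TEST, model(TRAIN, len(TEST))) for name, model in models.items()}
ranking = sorted(scores, key=scores.get)  # lowest error first
print(scores)
print(ranking)
```

A candidate model is only informative if it outperforms such simple benchmarks on held-out data, which is why the comparison is framed in terms of relative forecast accuracy.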
Regardless of the outcome, the results will serve as an important contribution to discussions of the future of maritime territorial disputes in that they represent a first attempt to apply these relatively recent methodological approaches to South China Sea regional relations. Moreover, they will either support or refute the many claims and analyses arguing that certain states are responsible for heightened tensions, that South China Sea tensions will increase or decrease, or that the maritime area is primed as a flashpoint for armed conflict. Such claims invariably lack empirical backing and simply rely on common-sense assumptions and incomplete evidence.
93 Richard Morgan and Dan Reiter, “How War Makes the State: Insurgency, External Threat, and Road Construction in India,” revised version of paper from 2013 Annual Meeting of the American Political Science Association, October 17, 2013,
<http://www.cidcm.umd.edu/workshop/papers/reiter.pdf>.