2. Literature Review

2.5 Analyses with GDELT in Other Geographic Contexts

No existing studies have used GDELT-based analyses to examine South China Sea tensions. As such, this dissertation represents a first cut at the data and aims to serve as a foundation for future research on the issue. That said, previous studies have used GDELT for analyses of various other geographic contexts around the world. For the purposes of this dissertation, the most relevant in terms of purpose, data, and methodology are those that use GDELT to analyze or predict conflictive events, such as protests, violent uprisings, armed conflict, and genocides.

77 Scott E. Page, The Model Thinker, University of Michigan, 2015, p. 36.

National Chengchi University (國立政治大學)

Two authors, Philip A. Schrodt and James E. Yonamine, have published various articles over the years covering the use of event databases, including GDELT, for prediction, and their studies serve as an indispensable resource for relevant research. In an article published before the public release of GDELT, Yonamine and Schrodt compare the two primary forms of data used in quantitative conflict studies: structural data and event data. Structural databases focus on broad structural aspects of interstate relations, tend to be manually compiled by researchers, are typically aggregated at the state-year level, and have slowly developed since the 1960s. Event databases are typically derived from news reports, contain records of specific events or stories, are coded by date and even time, and have evolved from manual compilation in the mid-1970s to automatic compilation and coding using computer algorithms since the late 1980s, allowing for the inclusion of much more fine-grained data on a theoretically limitless number of issues. They argue that structural databases have been useful for understanding broad questions about international relations and conflict but are limited by their temporal aggregation, unable to shed light on the ongoing interactions between states, and unhelpful for policymakers interested in predicting events at given times. They then discuss various event databases, including their approaches to data collection, challenges faced, and limitations, making relevant suggestions for improvement that seem to have foreshadowed the launch of GDELT.⁷⁸ A follow-up article published two years later by the same authors builds upon this work.⁷⁹

Following GDELT’s initial public release, Yonamine describes, in a reference article on working with event data, the three main aggregation choices available to researchers: actors, actions, and time.⁸⁰ The analyses in this dissertation aggregate the data by time into monthly intervals, which are short enough to reflect the effects of individual events and be useful for policymakers yet long enough to include sufficient data for averaging. This dissertation also filters data by location, omitting all records not relevant to the South China Sea, and, for analyses related to RQ1, by state actor, in order to assess the relationship between state involvement and tensions.

78 James E. Yonamine and Philip A. Schrodt, “A Guide to Event Data: Past, Present, and Future,” November 28, 2011, <http://jayyonamine.com/wp-content/uploads/2012/07/YonamineSchrodt_A_Guide_to_Event_Data.pdf>.

79 Philip A. Schrodt and James E. Yonamine, “A Guide to Event Data: Past, Present, and Future,” All Azimuth 2(2): 5–22, July 2013, <http://dergipark.gov.tr/download/article-file/147447>.

80 James E. Yonamine, “Working with Event Data: A Guide to Aggregation Choices,” 2013, <http://jayyonamine.com/wp-content/uploads/2013/04/Working-with-Event-Data-A-Guide-to-Aggre
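The monthly aggregation and location filtering described above can be illustrated with a minimal sketch. The record layout, location labels, and values below are hypothetical and chosen only for illustration; they are not GDELT's actual field names or data, and the function is not the dissertation's own processing code.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical event records: (event date, location label, Goldstein value).
# The layout and values are illustrative assumptions, not real GDELT rows.
events = [
    (date(2014, 5, 2), "South China Sea", -5.0),
    (date(2014, 5, 20), "South China Sea", -2.0),
    (date(2014, 5, 21), "Taiwan Strait", 3.0),
    (date(2014, 6, 7), "South China Sea", 1.0),
]


def monthly_average(records, location):
    """Average the scores of records matching `location`, keyed by (year, month)."""
    buckets = defaultdict(list)
    for day, place, score in records:
        if place == location:  # omit records unrelated to the area of interest
            buckets[(day.year, day.month)].append(score)
    return {month: mean(scores) for month, scores in sorted(buckets.items())}


monthly = monthly_average(events, "South China Sea")
```

Filtering by state actor for RQ1 would follow the same pattern, with an additional actor field on each record and a second condition in the filter.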

In the few years since its public release, prediction of conflictive events has been among the most common themes in studies using GDELT. Many of these have focused on predicting protests, violent uprisings, armed conflict, and genocides in Africa and the Middle East. In an early study using GDELT in 2013, Yonamine uses an autoregressive fractionally integrated moving average (ARFIMA) model to forecast levels of violence in districts in Afghanistan. In the study, he also compares existing event databases on five attributes (broad spatial coverage, density, geocoding, accuracy, and future availability in real time), suggesting that GDELT is the first to satisfy all of the criteria of an ideal dataset.⁸¹ In other research, Yonamine has also used GDELT data to analyze two further questions: the effects of violence against Israel on the Tel Aviv stock exchange, finding that the two are not significantly correlated but that the conflictive events do affect certain companies included in the exchange; and the effects of civil war on interstate war, finding that domestic conflicts increase the likelihood of a state becoming involved in interstate conflicts with its neighbors.⁸²

In another early study from 2013, Arva et al. compare the Integrated Conflict Early Warning System (ICEWS) database (a US government project that was capable of forecasting conflictive events but later became classified) and the GDELT 1.0 Event Database in terms of forecast accuracy using various statistical models. They find that GDELT “performs as well or better than the data in the original ICEWS”, suggesting that it is likely the best publicly available global dataset for making predictions related to conflictive events.⁸³ Moreover, they conclude that GDELT’s “firehose” approach to data collection and inclusion in the resulting dataset could make it less suitable for monitoring but may actually be to its benefit for the purposes of statistical forecasting, as is the focus of the analyses for RQ2 in this dissertation.⁸⁴

81 James E. Yonamine, “Predicting Future Levels of Violence in Afghanistan Districts using GDELT,” April 2013, p. 2, <http://jayyonamine.com/wp-content/uploads/2013/04/Predicting-Future-Levels-of-Violence-in-Afghanistan-Districts-using-GDELT.pdf>.

82 James E. Yonamine, A Nuanced Study of Political Conflict Using the Global Datasets of Events Location and Tone (GDELT) Dataset, Pennsylvania State University, August 2013, <https://etda.libraries.psu.edu/catalog/18659>.

Brandt, Freeman, Lin, and Schrodt look at the effect of training sets of different lengths on forecasting with GDELT data. Taking cross-strait relations as a case study, they find that shorter training sets may be as effective as longer ones.⁸⁵ They also suggest that, at certain points in event data, there may be a “clear change in the dynamics of the data,” so inclusion of all available historical data may not be necessary or even desirable.⁸⁶ For the purposes of this dissertation, their conclusions are significant because they indicate that it is not necessary to use the entire dataset dating back to 1979 to achieve meaningful results.
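A toy illustration of why a shorter training set can help when the dynamics of a series change is sketched below. The series is synthetic, and the windowed-mean forecaster is a deliberately simple stand-in chosen for this sketch; it is not the model used by Brandt et al., and the function names are hypothetical.

```python
from statistics import mean


def window_mean_forecast(history, window=None):
    """Forecast the next value as the mean of the last `window` points (all points if None)."""
    training = history if window is None else history[-window:]
    return mean(training)


def mean_absolute_error(series, window, start):
    """One-step-ahead MAE over series[start:], using only data before each forecast point."""
    errors = [
        abs(series[t] - window_mean_forecast(series[:t], window))
        for t in range(start, len(series))
    ]
    return mean(errors)


# Synthetic monthly series with a level shift halfway through (illustrative only).
series = [1.0] * 12 + [5.0] * 12

# Evaluate forecasts made after the shift, once three post-shift points exist.
mae_long = mean_absolute_error(series, window=None, start=15)  # full history
mae_short = mean_absolute_error(series, window=3, start=15)    # last 3 months only
```

In this constructed case the short window recovers the new level immediately, while the full-history forecast is still pulled toward the pre-shift values. Real event data will rarely be this clean, so the sketch shows only the mechanism, not the size of the effect.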

Abb and Strüver, like this study, draw upon Goldstein values from the GDELT 1.0 Event Database, using them as a measure of the “quality of relations” between two countries. Their term is simply the inverse of “tensions” as used in this study, so they are in effect measuring tensions between two countries, one of which is always China in their article.

83 Bryan Arva, John Beieler, Ben Fisher, Gustavo Lara, Philip A. Schrodt, Wonjun Song, Marsha Sowell, and Sam Stehle, “Improving Forecasts of International Events of Interest,” European Political Studies Association 2013 Annual General Conference, July 3, 2013, <https://ssrn.com/abstract=2225130>.

84 Bryan Arva, John Beieler, Ben Fisher, Gustavo Lara, Philip A. Schrodt, Wonjun Song, Marsha Sowell, and Sam Stehle, “Improving Forecasts of International Events of Interest,” European Political Studies Association 2013 Annual General Conference, July 3, 2013, p. 57, <https://ssrn.com/abstract=2225130>.

85 Patrick T. Brandt, John R. Freeman, Tse-min Lin, and Philip A. Schrodt, “Forecasting Conflict in the Cross-Straits: Long Term and Short Term Predictions,” Annual Meeting of the American Political Science Association, September 4, 2013, <http://www.utdallas.edu/~pbrandt/Patrick_Brandts_Website/Research_files/ForecastWindows.pdf>.

86 Patrick T. Brandt, John R. Freeman, Tse-min Lin, and Philip A. Schrodt, “Forecasting Conflict in the Cross-Straits: Long Term and Short Term Predictions,” Annual Meeting of the American Political Science Association, September 4, 2013, p. 3, <http://www.utdallas.edu/~pbrandt/Patrick_Brandts_Website/Research_files/ForecastWindows.pdf>.

Whereas the analyses in this dissertation use monthly averages, their data are aggregated into yearly averages in order to match the time interval frequency of their other variables,⁸⁷ of which their dependent variable “global policy alignment” is derived from United Nations General Assembly voting records.⁸⁸ They confirm the validity of the Goldstein values data by manually comparing the levels of conflict/cooperation in each time period with real-world events and relevant qualitative literature,⁸⁹ as is done for tensions data in sections 3.2.2 (Linking GDELT 1.0 Event Database Data to Real World Events) and 3.2.5 (Linking GDELT 2.0 GKG Data to Real World Events) of this dissertation. Using these data, they conclude that the quality of bilateral relations between China and Southeast Asian countries is strongly correlated with policy alignment with China at the global level.⁹⁰

Davis, Fuchs, and Johnson also incorporate data from the GDELT 1.0 Event Database into their analysis of the effects of bilateral political relations on bilateral trade. As in this dissertation, they use Goldstein values as a measure of “tensions”, noting that the measure “captures the likelihood that the event will impact on the stability of the country”.⁹¹ Furthermore, like this dissertation as well as Abb and Strüver, they manually link real-world events to visible changes in the tensions data to confirm its validity.⁹²

Various other studies related to interstate relations and conflict in other geographic contexts have been conducted using GDELT data. Morgan and Reiter, for example, analyze factors affecting government funding allocation for roads in India, one of which is georeferenced violent events from GDELT.⁹³ Although the variables and approaches used in such studies are not directly applicable to those of this dissertation, it is worth mentioning them as relevant examples of how GDELT data have been used in analyses of international relations in other geographic contexts.

87 Pascal Abb and Georg Strüver, “Regional Linkages and Global Policy Alignment: The Case of China–Southeast Asia Relations,” Issues & Studies 51(4): 33–83, December 2015, p. 54.

88 Pascal Abb and Georg Strüver, “Regional Linkages and Global Policy Alignment: The Case of China–Southeast Asia Relations,” Issues & Studies 51(4): 33–83, December 2015, pp. 49–50.

89 Pascal Abb and Georg Strüver, “Regional Linkages and Global Policy Alignment: The Case of China–Southeast Asia Relations,” Issues & Studies 51(4): 33–83, December 2015, pp. 56–57.

90 Pascal Abb and Georg Strüver, “Regional Linkages and Global Policy Alignment: The Case of China–Southeast Asia Relations,” Issues & Studies 51(4): 33–83, December 2015.

91 Christina Davis, Andreas Fuchs, and Kristina Johnson, “State Control and the Effects of Foreign Relations on Bilateral Trade,” University of Heidelberg Department of Economics Discussion Paper Series 576, November 2014, pp. 21–22, <http://archiv.ub.uni-heidelberg.de/volltextserver/17673/1/davis_fuchs_johnson_2014_dp576.pdf>.

92 Christina Davis, Andreas Fuchs, and Kristina Johnson, “State Control and the Effects of Foreign Relations on Bilateral Trade,” University of Heidelberg Department of Economics Discussion Paper Series 576, November 2014, p. 22, <http://archiv.ub.uni-heidelberg.de/volltextserver/17673/1/davis_fuchs_johnson_2014_dp576.pdf>.

To date, no studies have attempted to use GDELT data to analyze the relationship between state involvement and tensions in the South China Sea, explore historic levels of tensions in the maritime area, or predict the escalation and de-escalation of tensions there into the future. The analyses of this dissertation aim to change that. Using two distinct databases and two ways of measuring tensions, and aggregating tensions into monthly averages, they first assess and provide visualizations of historical tensions based on observed data related to the South China Sea. Then, for RQ1, they explore the relationship between state involvement and tensions for eleven countries using two different interpretations of state involvement. For RQ2, predictions are made using four benchmark models and four forecast models for past and future tensions in each time period. These models are then compared based on their respective forecast accuracies to determine their relative performance at predicting South China Sea tensions over time.
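The comparison of models by forecast accuracy described above can be sketched as follows. The two forecasters shown (a naive last-value forecast and a historical-mean forecast) are common benchmark choices used here as stand-ins; this section does not name the dissertation's four benchmark and four forecast models, so the model set, the accuracy metric (RMSE), and the series are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean


def naive_forecast(train, horizon):
    """Benchmark stand-in: repeat the last observed value."""
    return [train[-1]] * horizon


def mean_forecast(train, horizon):
    """Benchmark stand-in: repeat the mean of the training data."""
    return [mean(train)] * horizon


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(mean([(a - p) ** 2 for a, p in zip(actual, predicted)]))


# Synthetic monthly tension values (illustrative only); hold out the last two months.
series = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5]
train, test = series[:6], series[6:]

models = {"naive": naive_forecast, "historical_mean": mean_forecast}
scores = {name: rmse(test, model(train, len(test))) for name, model in models.items()}

# Model names ordered from most to least accurate on the held-out months.
ranking = sorted(scores, key=scores.get)
```

Adding further models to the `models` dictionary leaves the scoring and ranking steps unchanged, which is the main reason for structuring the comparison this way.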

Regardless of outcome, the results will serve as an important contribution to discussions of the future of maritime territorial disputes in that they will be a first attempt to apply these relatively recent methodological approaches to South China Sea regional relations. Moreover, they will either support or refute the many claims and analyses arguing that certain states are responsible for heightened tensions, that South China Sea tensions will increase or decrease, or that the maritime area is primed as a flashpoint for armed conflict. Such claims invariably lack empirical backing and rely instead on common-sense assumptions and incomplete evidence.

93 Richard Morgan and Dan Reiter, “How War Makes the State: Insurgency, External Threat, and Road Construction in India,” revised version of paper from 2013 Annual Meeting of the American Political Science Association, October 17, 2013, <http://www.cidcm.umd.edu/workshop/papers/reiter.pdf>.
