Theoretical Background - Politicians, Twitter, and Financial Markets: Evidence from Trump’s Tweets

III. METHODOLOGY

National Chengchi University (國立政治大學)

3.1.3 Dependent and Independent Variables

To analyze the effects on equity markets, the daily closing price was set as the key dependent variable. A number of covariates were also chosen, as follows: market capitalization, price-to-earnings (P/E) ratio, gross profit, total revenue, book value, and dividend rate. For both dependent and covariate variables, data was collected using the Python module, yahoofinancials, which allows users to pull data from the now-defunct Yahoo Finance application programming interface (API) (Sanders, 2019). Closing stock price data was collected on a daily frequency, while covariate data is time-invariant and represents only the most recent data for said variables.

For the foreign exchange market part of this study, exchange rates were used as the dependent variable. Using a simple Python program, daily closing exchange rates were collected from the currencylayer API (apilayer, 2019). Exchange rate pairs used in this study have the U.S. dollar (USD) set as the base currency, so each pair takes the form USD/local currency. For covariates, the following variables were used: consumer price index (CPI), population, gross domestic product (GDP) per capita at purchasing power parity (PPP) in current international dollars, and current account in current international dollars. For these variables, data was once again programmatically collected, but this time from the World Bank API through the wbdata module for Python (Sherouse, 2014). Where data was missing, mostly for current account values, it was collected from the U.S. Central Intelligence Agency’s (CIA) World Factbook (CIA, 2019).

3.2 Theoretical Background

As noted earlier, the main objective of this research is to study the causal relationship between Trump’s tweets directed at specific countries or companies and movements in stock prices and exchange rates. This differs from some of the existing literature, which tends to focus on the predictive power of Trump’s tweets and is thus built on sentiment analysis and machine learning-based methodology. One downside of these studies is that they may gloss over the actual cause-and-effect relationship between these tweets and financial markets. The synthetic control (SC) method, by contrast, allows for the estimation of causal relationships in a panel data setting. For this thesis, this statistical procedure was performed using the synth and synth_runner packages for Stata (Abadie et al., 2011; Galiani & Quistorff, 2017).

How does one estimate the effect of a certain intervention, policy, or law? In this case, how does one estimate the effect of a tweet on stock prices or exchange rates? To answer these questions, researchers have utilized comparative case studies to develop causal explanations for these events. Doing so involves comparing one or more units exposed to the event to one or more unexposed units (Abadie et al., 2010). However, a major pitfall of this method is that even if aggregate data were to be used, there continues to be some uncertainty about producing a suitable counterfactual outcome that illustrates how the affected group would have developed or changed in the absence of the intervention or event (Abadie et al., 2010). Thus, researchers are left to speculate about what could have happened had the event never occurred.

The synthetic control method aims to address this issue by constructing a weighted combination of control units to model the counterfactual, the “synthetic control”, so to speak (Cunningham, 2018). Doing so allows researchers to estimate treatment effects by comparing the treated unit’s outcomes with those of the synthetic control. In a traditional comparative case study, the selection of comparison units may lead to erroneous conclusions, especially if the comparison units are not sufficiently similar to the treatment units (Abadie et al., 2015). As a solution to this issue, the SC method offers a more systematic way to choose comparison units by assigning varying weights to different control units. Abadie et al. (2015) have likewise argued that a combination of control units provides a better representation of the treatment unit than any single comparison unit alone. Through the SC method, the counterfactual is “selected as the weighted average of all potential comparison units that best resembles the characteristics of the case of interest” (Abadie et al., 2015).
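As a rough illustration of how such a weighted average can be found, the Python sketch below fits nonnegative weights that sum to one by projected gradient descent, matching on pre-treatment outcomes only. It is not the synth routine used in this thesis, which also matches on covariates and weights the predictors; the function names, step size, iteration count, and toy data are all assumptions made for illustration.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_j >= 0, sum_j w_j = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        candidate = (cumulative - 1.0) / k
        if u_k - candidate > 0:  # holds for the k largest entries that stay positive
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def fit_weights(treated, donors, steps=2000, lr=0.05):
    """Fit donor weights on pre-treatment outcomes.

    treated: list of T0 pre-treatment outcomes of the treated unit.
    donors:  list of J lists, each with the same T0 pre-treatment outcomes.
    steps and lr are illustrative tuning values, not taken from the thesis.
    """
    n_donors, n_periods = len(donors), len(treated)
    w = [1.0 / n_donors] * n_donors  # start from equal weights
    for _ in range(steps):
        # Residual of the current synthetic series in each pre-treatment period.
        resid = [
            treated[t] - sum(w[j] * donors[j][t] for j in range(n_donors))
            for t in range(n_periods)
        ]
        # Gradient of the mean squared pre-treatment error with respect to w_j.
        grad = [
            -2.0 / n_periods * sum(resid[t] * donors[j][t] for t in range(n_periods))
            for j in range(n_donors)
        ]
        w = project_to_simplex([w[j] - lr * grad[j] for j in range(n_donors)])
    return w


# Toy data (hypothetical): the treated series is built as 0.7 * donor 0 + 0.3 * donor 1.
donors = [
    [0.0, 1.0, 2.0, 1.0, 0.0, -1.0],
    [1.0, 0.0, -1.0, 0.0, 1.0, 2.0],
    [2.0, 2.0, -2.0, -1.0, 0.5, 0.0],
]
treated = [0.7 * a + 0.3 * b for a, b in zip(donors[0], donors[1])]
weights = fit_weights(treated, donors)
print([round(w, 3) for w in weights])
```

On this toy data the fitted weights should come out close to 0.7, 0.3, and 0, since the treated series was constructed from the first two donors.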

First, let $Y_{it}^N$ be the outcome observed for unit $i$, where $i = 1, \dots, J + 1$, at time period $t$, where $t = 1, \dots, T$, in the absence of the intervention. The intervention or treatment occurs at period $T_0 + 1$, where $1 \le T_0 < T$, with unit $1$ being affected by the intervention, while units $2, \dots, J + 1$ remain unaffected by the intervention (Yang, 2019). Knowing this, let $Y_{it}^I$ be the observed outcome for unit $i$ at time $t$ if unit $i$ received the treatment, and let $Y_{it}^N$ represent the observed outcome for unit $i$ at time $t$ if unit $i$ had never received the treatment (Yang, 2019).

Assume that the intervention has no effect on the outcome before the treatment period; thus, for periods $t \in \{1, \dots, T_0\}$ and all units $i \in \{1, \dots, J + 1\}$, $Y_{it}^I = Y_{it}^N$ (Abadie et al., 2010). Next, let unit $1$ ($i = 1$) be the unit that receives treatment from period $T_0 + 1$ until $T$, and let units $2, \dots, J + 1$ be the units that do not receive treatment. From this, we can derive the effect of the intervention for unit $i$ at time $t$, as follows. Let the observed outcome be $Y_{it}$. Before the intervention, the observed outcomes are $Y_{it} = Y_{it}^N$. After the intervention (after period $T_0$), the observed outcome of the treated unit can be written as $Y_{it} = Y_{it}^I = Y_{it}^N + \alpha_{it}$. This can be rearranged to:

$$\alpha_{it} = Y_{it}^I - Y_{it}^N \qquad (1)$$

where $\alpha_{it}$ is the causal effect of the treatment on unit $i$ at time $t$, or simply, the difference between the counterfactual and the actual trend of the treated unit (Yang, 2019; Bouttell, Craig, Lewsey, Robinson, & Popham, 2018). Since only unit 1 ($i = 1$) received treatment, we estimate the causal effect for this unit over periods $T_0 + 1, \dots, T$. Therefore, for $t > T_0$, $\alpha_{1t} = Y_{1t}^I - Y_{1t}^N = Y_{1t} - Y_{1t}^N$, where $Y_{1t}$ represents the observed outcome and $Y_{1t}^N$ represents the counterfactual (Yang, 2019).

Based on Abadie et al. (2010), the SC method in its simplest form can be written as follows:

$$Y_{it}^N = \delta_t + \theta_t Z_i + \lambda_t \mu_i + \varepsilon_{it} \qquad (2)$$

where $\delta_t$ represents time effects, $Z_i$ is an ($r \times 1$) vector of observed covariates that are not affected by the intervention, $\theta_t$ is a ($1 \times r$) vector of unknown parameters, $\lambda_t$ is a ($1 \times F$) vector of unobserved common factors, $\mu_i$ is an ($F \times 1$) vector of permanent unobserved variables, and $\varepsilon_{it}$ represents the unobserved transitory shocks at the unit level with zero mean.

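To implement SCM, we choose a vector of nonnegative weights that sum to one. Let this be:

$$W = (w_2, \dots, w_{J+1})', \quad w_j \ge 0 \text{ for } j = 2, \dots, J + 1, \quad w_2 + \dots + w_{J+1} = 1$$

wherein each value of $W$ represents a weighted average of the available control units, thus representing a synthetic control (Yang, 2019; Abadie et al., 2010). We want to choose $w^*$ such that the following conditions are met so that the estimated treatment effect ($\hat{\alpha}_{1t}$) is unbiased.

$$\sum_{j=2}^{J+1} w_j^* Z_j = Z_1 \qquad (3)$$

$$\sum_{j=2}^{J+1} w_j^* \mu_j = \mu_1 \qquad (4)$$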

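But because $\mu_i$ is unobserved, we must choose $w^*$ that satisfies:

$$\sum_{j=2}^{J+1} w_j^* Y_{jt} = Y_{1t} \quad \text{for } t = 1, \dots, T_0 \qquad (5)$$

Thus, the unbiased estimator, $\hat{\alpha}_{1t}$, should be (Yang, 2019):

$$\hat{\alpha}_{1t} = Y_{1t} - \sum_{j=2}^{J+1} w_j^* Y_{jt} \quad \text{for } t = T_0 + 1, \dots, T \qquad (6)$$

Matching on $Z_1$ and the pre-intervention outcomes $Y_{11}, \dots, Y_{1T_0}$ is sufficient to match $\mu_1$, so long as the synthetic control fits both closely (Abadie et al., 2010).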
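The estimator in (6) is a per-period difference between the treated unit’s observed outcome and the weighted donor outcomes. A minimal Python sketch follows, assuming the weights have already been estimated; the function name, the 0-indexed `t0` argument, and the numbers are hypothetical and are not taken from the thesis data.

```python
def estimate_effects(treated, donors, weights, t0):
    """Estimated effect in each post-treatment period.

    treated: observed outcomes of the treated unit over all T periods.
    donors:  list of J outcome series for the control units, same length.
    weights: list of J donor weights (nonnegative, summing to one).
    t0:      number of pre-treatment periods; periods t0..T-1 (0-indexed) are post-treatment.
    """
    effects = []
    for t in range(t0, len(treated)):
        # Synthetic (counterfactual) outcome: weighted average of donor outcomes.
        synthetic = sum(w * donor[t] for w, donor in zip(weights, donors))
        effects.append(treated[t] - synthetic)
    return effects


# Hypothetical series: three pre-treatment periods, one post-treatment period.
example_treated = [1.0, 2.0, 3.0, 1.0]
example_donors = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 0.0]]
example_effects = estimate_effects(example_treated, example_donors, [0.5, 0.5], t0=3)
print(example_effects)
```

A negative value means the treated unit ended the period below its synthetic counterpart.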

Ideally, the synthetic control’s pre-treatment path should closely track that of the treated unit. This allows for a better understanding of the treatment’s effect on the affected unit. That is, if the synthetic and treated paths diverge after the intervention, then it can be said that the treatment does indeed have an effect, and vice versa. But does divergence always point to a statistically significant effect? To assess the significance of the estimates, Abadie et al. (2010) suggest performing a series of placebo studies by iteratively applying the SC method to the control units. This process then returns a set of root mean squared prediction error (RMSPE) values, which are calculated for both pre- and post-treatment periods (Cunningham, 2018).

When the pre-RMSPE values are too large, this could skew the placebo effects and make p-values too conservative. It is thus suggested that these observations be omitted (Cunningham, 2018). In the model used for this thesis, we have chosen to skip the omission of values as the dependent variables have already been standardized.
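The placebo logic can be sketched in a few lines of Python. The sketch below uses the post/pre RMSPE ratio and a rank-based p-value in the spirit of Abadie et al. (2010); it is an illustration under that assumption and is not necessarily how synth_runner computes its p-values. All function names, the `multiple` threshold argument, and the numbers are hypothetical.

```python
import math


def rmspe(actual, synthetic):
    """Root mean squared prediction error between two equal-length series."""
    squared = [(a - s) ** 2 for a, s in zip(actual, synthetic)]
    return math.sqrt(sum(squared) / len(squared))


def rmspe_ratio(actual, synthetic, t0):
    """Post-/pre-treatment RMSPE ratio; periods 0..t0-1 are pre-treatment."""
    pre = rmspe(actual[:t0], synthetic[:t0])
    post = rmspe(actual[t0:], synthetic[t0:])
    return post / pre  # assumes an imperfect pre-treatment fit (pre > 0)


def keep_well_fitted(pre_rmspe_by_unit, treated_pre_rmspe, multiple=2.0):
    """Placebo units whose pre-RMSPE is at most `multiple` times the treated unit's."""
    return [
        unit
        for unit, value in pre_rmspe_by_unit.items()
        if value <= multiple * treated_pre_rmspe
    ]


def placebo_p_value(treated_ratio, placebo_ratios):
    """Share of all units (treated included) with a ratio at least as large as the treated unit's."""
    all_ratios = [treated_ratio] + list(placebo_ratios)
    return sum(r >= treated_ratio for r in all_ratios) / len(all_ratios)


# Hypothetical values: one treated unit and three placebo runs.
print(placebo_p_value(5.0, [1.0, 2.0, 6.0]))
```

With few donor units the smallest attainable p-value is 1 / (number of units), which is one reason the size of the donor pool matters for inference.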

Although the SC method certainly has its benefits, it is important to be mindful of its limitations as well. First, the key identifying assumption is somewhat ambiguous (Yang, 2019). Moreover, the method may be too restrictive for some cases, as only the treatment unit can be affected by the intervention, and it is assumed that there are no spillovers in the donor pool.

3.3 Procedure

We start by choosing appropriate dependent variables and covariates for both the stock price and exchange rate models. Again, the main objective of this paper is to examine causal relationships between Trump’s more negative-toned tweets and stock prices, as well as exchange rates. As treatment events, we chose individual tweets published by Donald Trump in which he threatened a corporation or country.

Because of the nature of the synthetic control model, we needed to ensure that the tweets were spaced far enough apart that no other treatment effects were present within either the 20-trading-day pre-treatment period or the 5- to 20-trading-day post-treatment period. Note that days are measured as trading days and exclude weekends and some national holidays in the U.S., such as New Year’s Day and Christmas. In short, for both periods, we needed to ensure that there were no other negative trade- or business-related tweets, such as tweets threatening to impose tariffs or boycott companies. This ensures that the before-and-after distinction is maintained, as suggested by Dube and Zipperer (2015). As a result, however, this has somewhat restricted the scope of this thesis to just a handful of events. After the tweets were collected, to avoid contamination in the donor pool for each event, control countries or companies that Trump tweeted about in either the pre-treatment or post-treatment period of each tweet event were omitted.
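The two screening steps described here, spacing out events and cleaning the donor pool, can be sketched as follows. This is a minimal Python illustration, assuming each tweet is identified by a trading-day index (an integer counter over trading days) and a 20-day window on both sides; the function names, the window defaults, and the sample tickers are hypothetical.

```python
def select_clean_events(event_days, pre=20, post=20):
    """Keep events with no other event inside their pre/post window.

    event_days: trading-day indices of candidate threat tweets.
    """
    clean = []
    for i, day in enumerate(event_days):
        others = event_days[:i] + event_days[i + 1:]
        if not any(day - pre <= other <= day + post for other in others):
            clean.append(day)
    return clean


def clean_donor_pool(candidates, mention_days, event_day, pre=20, post=20):
    """Drop control units that were tweeted about inside the event window.

    mention_days: dict mapping a unit name to the trading-day indices
                  on which that unit was the target of a tweet.
    """
    return [
        unit
        for unit in candidates
        if not any(
            event_day - pre <= day <= event_day + post
            for day in mention_days.get(unit, [])
        )
    ]


# Hypothetical example: only the event on day 100 is isolated enough to keep.
print(select_clean_events([10, 25, 100, 200, 215]))
print(clean_donor_pool(["AAA", "BBB", "CCC"], {"BBB": [95], "CCC": [300]}, event_day=100))
```

Working in trading-day indices rather than calendar dates sidesteps weekends and holidays, at the cost of needing a trading calendar to build the index in the first place.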

Data for both the dependent and independent variables for the stock price-based and exchange rate-based SC method models were then collected. We first collected two years’ worth of data in order to standardize the dependent variable data, as values for both stock prices and exchange rates varied considerably across corporations and countries. For instance, certain currency pairs such as U.S. dollar/Indonesian rupiah (USD/IDR) or U.S. dollar/Laotian kip (USD/LAK) can easily exceed 1,000 in value, whereas other currency pairs such as U.S. dollar/Euro (USD/EUR) or U.S. dollar/British pound (USD/GBP) may often fall below 1.0 in value (apilayer, 2019). Likewise, for stock prices, on June 6, 2019, Amazon’s closing stock price stood at USD1,734.56, whereas Harley-Davidson’s stock price was a mere USD34.16 (NASDAQ, 2019a; NASDAQ, 2019b). Thus, because the range varies considerably, it is important to first standardize the dependent variable data. To do so, both the mean and the standard deviation were first calculated per country or company. Standardized closing stock prices and standardized closing exchange rates were then calculated by subtracting the mean from the original value and dividing by the standard deviation. A simplified version of this formula (Formula 7) can be seen below, where $A$ represents either the closing stock price or the closing exchange rate, $\mu_A$ represents the mean of $A$, and $\sigma_A$ represents the standard deviation of $A$.

$$A_{\text{standardized}} = \frac{A - \mu_A}{\sigma_A} \qquad (7)$$
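The standardization step can be written in a few lines of Python. This is a sketch rather than the code used for the thesis: the function name is hypothetical, the sample prices are made up, and the use of the population standard deviation (`pstdev`) instead of the sample standard deviation is an assumption, since the text does not say which was used.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return (value - mean) / standard deviation for one company's or country's series."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    return [(v - mu) / sigma for v in values]


# Made-up closing prices for a single unit.
prices = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
standardized = standardize(prices)
print(standardized)
```

Because each series is standardized with its own mean and standard deviation, a unit priced in the thousands and one priced below 1.0 end up on the same scale.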

Aside from the selected covariates, another potentially important set of predictors is the lagged outcome variables (McClelland & Gault, 2017). As Botosaru and Ferman (2017) have noted, there is “little guidance” on which variables are to be used as covariates. In selecting covariates for this model, we chose predictors that would have some bearing on the weights (McClelland & Gault, 2017). In addition to these predictors, the pre-intervention lagged outcomes were also included in the models. Matching on pre-intervention lagged outcomes may help control for unobserved factors and the heterogeneity of effects on unobserved and observed factors (Abadie et al., 2015).

Two models were produced for each event, with the first model (Model 1) containing all of the outcome lags and all of the chosen covariates, while the second model (Model 2) kept all of the outcome lags and some covariates. The choice of covariates for the second model was based on the difference in synthetic and treated predictor values.

Using Stata, we first used the synth command to identify weights for donor pool units and to calculate the pre-RMSPE value and covariate balance. Ideally, the optimal set of weights should produce a nearly identical pre-treatment trend for the synthetic unit (Cunningham, 2018). To check this, we also produced graphs showing the goodness of fit between the pre-treatment synthetic series and the actual treated unit’s series. The two series should be similar before treatment, and if the intervention does have an effect, then we should expect them to diverge after the event. The synth command also produces a table with the covariate balance values for both the actual treated unit and the synthetic treated unit. As Cunningham (2018) has stated, this table is not a technical test. However, if the predictors are more or less balanced, then we would expect the synthetic unit to be a suitable approximation of the real treated unit had the event never occurred (Abadie et al., 2010). Predictors, arguably, do not have to be perfectly balanced (Botosaru & Ferman, 2019).

Usually, the SC process requires researchers to drop observations with pre-RMSPE values that are deemed to be too large. The threshold scales with the dependent variable; for instance, one can opt to drop observations with pre-RMSPE values that are twice as large as the treated unit’s. However, since the dependent variables have already been standardized, we posit that this already takes matching quality into account, making it unnecessary to drop observations at this point. In order to obtain the models’ p-values, we utilized synth_runner’s pvals1s option, which produces a table of standardized two-sided p-values to show whether the treatment effects were significant from the day of treatment up to five days after treatment. Additional commands were then used to create placebo graphs.

In document: Politicians, Twitter, and Financial Markets: Evidence from Trump’s Tweets - NCCU Academic Hub (政大學術集成) (pp. 27-35)