3. RESEARCH METHOD AND DESIGN
3.3 TEXT ANALYSIS APPROACHES ON FINANCIAL NEWS
Gathering information from financial news is an essential step for the subsequent analysis, but news text is unstructured data. This study therefore needs to extract features from the collected news articles and calculate scores from them, because it aims to find out whether the terms used in news content affect investor sentiment. This section introduces how this study obtains these data and converts them into a numerical form that can be used in the research model.
The method this study adopts to extract information from the news consists of two steps. First, this study constructs a dictionary made up of word lists. Then a computer program counts how many of the dictionary words appear in each collected news article. The dictionary approach follows Carretta et al. (2011), Davis et al. (2012) and Garcia (2013), all of whom use a dictionary of meaningful words as the basis of their analysis. Dictionary designs vary: word lists can be categorized by subject or by sentiment, depending on the purpose of the study. This research builds its own dictionary to analyze financial news.
This study keeps the dictionary simple. It contains only two categories: positive words and negative words.
This research is interested in these two aspects of sentiment: the effect of positive words and the effect of negative words. Does a larger number of positive words make readers more optimistic about the financial environment or the assets they hold? Does a larger number of negative words make them feel anxious? Because these two directions of sentiment are the main concern of this research, the dictionary is built from positive and negative words that are suitable for describing the financial market situation and investor sentiment.
Another consideration in building the dictionary is language. Because the test objects are financial news articles published by the Taiwanese press, all of them are written in Mandarin Chinese. Drawing on experience from reading a large amount of financial news and on knowledge of the financial market, the words were selected through a careful process and validated by a joint inspection of the researchers. As mentioned in section 3.2, the researchers developed a standard word-selection process, carried out under the supervision of the instructor to mitigate subjective bias in word selection. This ensures that the words follow Chinese usage and are commonly used by the public and the media. Most words in the dictionary are adjectives describing the psychological condition of investors (e.g., optimistic, sad and anxious).
Another large share of the dictionary consists of adjectives and verbs describing financial market movements or situations (e.g., rise, drop and shock). Note that the sample words above are English illustrations only; the actual entries are Chinese terms. To make the dictionary as complete as possible, it includes as many relevant words as possible: it contains 349 positive words and 376 negative words. Details of the dictionary are given in appendix 2.
After constructing the word lists of the dictionary, the next step is to design an approach to measure the tone of news articles. The usage of positive and negative words is estimated simply by term frequency: this study counts the number of positive words and the number of negative words in every collected news article. For the tone analysis, this research follows Carretta et al. (2011), Davis et al. (2012) and Huang et al. (2013), who employ a formula to capture the level of opinion in textual content; the steps are similar across those studies. This study counts the positive and negative words in each news article and then divides the difference between the two counts by their sum to represent the level of tone:

$$\mathrm{Tone}_i = \frac{\mathrm{POS}_i - \mathrm{NEG}_i}{\mathrm{POS}_i + \mathrm{NEG}_i}$$

where POS_i and NEG_i are the numbers of positive and negative dictionary words found in news article i. If an article has similar counts in both categories, the value gets closer to 0, which means the tone of the news combines both sentiments, weakening the strength of either the positive or the negative tone, and the reader receives a mixed opinion. Conversely, the closer the value is to 1, the more optimistic the tone of the news, and the closer it is to -1, the more pessimistic. This study therefore relies on the scale between -1 and 1 to measure the tone of financial news. Carretta et al. (2011) note that the value obtained lies between -1 (completely negative news) and 1 (completely positive news), and that this scale allows them to express the degree to which news is positive or negative and the strength of the tone of news communication.
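The term-frequency step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's actual program: the word lists are a hypothetical three-term sample standing in for the 349 positive and 376 negative terms in appendix 2, and because Mandarin text is written without spaces, the sketch assumes dictionary terms are matched as plain substrings. A fuller version would likely need word segmentation so that a short term nested inside a longer one is not counted twice.

```python
# Hypothetical sample terms for illustration only; the study's real
# dictionary (349 positive and 376 negative Chinese terms) is in appendix 2.
POSITIVE = {"樂觀", "上漲", "回升"}  # optimistic, rise, rebound
NEGATIVE = {"悲觀", "下跌", "恐慌"}  # pessimistic, drop, panic


def count_terms(text: str, lexicon: set[str]) -> int:
    """Return the total number of occurrences of lexicon terms in text.

    Mandarin text has no spaces between words, so each term is matched as a
    substring with str.count (non-overlapping matches of that term). Overlaps
    between different terms are not resolved here.
    """
    return sum(text.count(term) for term in lexicon)


def term_frequencies(text: str) -> tuple[int, int]:
    """Return (positive_count, negative_count) for one news article."""
    return count_terms(text, POSITIVE), count_terms(text, NEGATIVE)


if __name__ == "__main__":
    article = "台股上漲，投資人樂觀，但外資仍擔心下跌風險。"
    print(term_frequencies(article))  # (2, 1)
```

In the example sentence, 上漲 and 樂觀 are matched as positive terms and 下跌 as a negative term, giving two positive and one negative count.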
Although this study employs the same method to measure tone, we doubt that a value of 1 really indicates completely positive news in the test data. Reading a sample of the news afterward for verification, this study finds that most news articles have values above 0, meaning that more positive words than negative words were captured. However, some of them would be classified as neutral or even negative news by manual reading. Overall, negative words appear less likely to be captured in negative news than positive words are in positive news.
This study attributes this to the word selection in the dictionary and speculates that the tone measure is asymmetric between the two word categories. The reason
may be that, although the negative word list stores many terms, the terms the media commonly use to describe unfavorable events are less diverse than the list itself, whereas positive words are more likely to be captured by the positive word list.
As a result, we do not interpret the value itself as an absolute score: a value of 1 (-1) does not necessarily indicate completely positive (negative) news, and a value of 0 does not necessarily indicate a completely neutral opinion. Looking at the analyzed data as a whole, however, we still believe that the scale, as a relative measure, can accurately rank the level of tone among the samples. The greater the value, the more optimistic the tone the news conveys to readers.
An additional consideration in the formula is that when no dictionary word is captured in a news article, the calculation involves division by zero. To avoid this problem, this study assigns such articles a value of 0 rather than removing them, because news in which no dictionary words are found still contains valuable information for investors; it is simply written without sentimental terms and therefore has no clear sentimental effect on readers. As noted above, a value of 0 may not represent an absolutely neutral opinion, but for these articles it is, at least objectively, closest to a non-sentimental expression.
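The tone formula, together with the zero substitution for articles in which no dictionary word is found, can be sketched in Python as follows. The function name `tone` and its integer-count inputs are illustrative assumptions rather than part of the study's program.

```python
def tone(pos: int, neg: int) -> float:
    """Tone score (pos - neg) / (pos + neg), bounded in [-1, 1].

    If no dictionary word was captured (pos + neg == 0), the ratio is
    undefined; the article is kept and assigned a tone of 0 instead.
    """
    total = pos + neg
    if total == 0:
        return 0.0
    return (pos - neg) / total


if __name__ == "__main__":
    print(tone(2, 1))  # 0.3333333333333333
    print(tone(0, 0))  # 0.0
```

Keeping the article and returning 0 mirrors the choice described above: such news stays in the sample instead of being dropped.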
This study employs multiple regression models, estimated by ordinary least squares (OLS), to test each hypothesis and to investigate the linear relationships between variables.
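As a minimal sketch of OLS estimation in general, assuming nothing about the software actually used in this study, the following Python code fits a linear model with an intercept using NumPy's least-squares solver. The three regressor columns and the dependent variable are synthetic, made-up data that merely stand in for variables such as those in equation (1); the sketch returns point estimates only and omits the standard errors and t-statistics needed for hypothesis testing.

```python
import numpy as np


def ols(y: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Estimate [beta_0, beta_1, ..., beta_k] in y = beta_0 + X @ beta + error.

    A column of ones is prepended to X so that beta_0 is the intercept;
    np.linalg.lstsq minimizes the sum of squared residuals.
    """
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for three regressors (hypothetical data).
    X = rng.normal(size=(250, 3))
    true_beta = np.array([20.0, 0.5, -0.3, 0.1])  # made-up coefficients
    noise = rng.normal(scale=0.5, size=250)
    y = true_beta[0] + X @ true_beta[1:] + noise  # synthetic sentiment index
    print(np.round(ols(y, X), 2))
```

With noisy synthetic data the estimates land near the made-up coefficients; a statistics package would normally be used to obtain the inference statistics as well.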
In every equation, this study adopts TWVIX as the proxy for investor sentiment, following prior studies¹. The VIX, also called the fear index, is a suitable gauge of investor sentiment.
Equation (1) tests hypotheses 1 and 2 by examining the effect of the number of news articles and of news length on investor sentiment. Birz and Lott (2011) and Nofsinger (2001) suggest that the importance of news is connected to market activity. This study divides importance into two variables, number and length, so equation (1) tests whether these two variables are associated with investor sentiment.
To examine the content of financial news, this study sets up equations (2) and (3). Equation (2) analyzes the usage of positive and negative words in financial news and tests hypotheses 3.a and 3.b on their relationship with TWVIX. Equation (3) tests hypothesis 4 on how the tone of financial news affects investor sentiment.
All equations and the definitions of variables are listed as follows:
𝑉𝐼𝑋𝑡= 𝛽0+ 𝛽1𝑁𝐸𝑊𝑆𝑁𝑈𝑀𝑡+ 𝛽2𝑇𝐿𝐻𝐼𝑇𝑡+ 𝛽3𝑇𝑊𝑆𝑇𝑂𝐶𝐾𝑡
1 Simon and Wiggins (2001), Baba and Sakurai (2011), Whaley (2000)