
3. RESEARCH METHOD AND DESIGN

3.2 DATA COLLECTION

National Chengchi University



Several statistical variables in this study, including TWVIX, the inflation rate, the Consumer Confidence Index, the Industrial Production Index, and the unemployment rate, are taken from the Taiwan Economic Journal (TEJ). TAIEX and the Taiwan Stock Monthly Trading Value are obtained from the Taiwan Stock Exchange Corporation (TWSE).

Variables describing the features of financial news are constructed through a text analysis approach. The textual test data are financial news articles retrieved from Knowledge Management Winner (KMW), which is provided by Infotimes, a company owned by the China Times Group. KMW serves as a database for the publications of the China Times Group.

Hundreds of news articles are published each day in the KMW database. A preliminary manual inspection shows that many of them are irrelevant to financial issues or contain useless content such as advertisements. For this reason, this study sets up financial-related key words to filter the news. This process ensures that the articles analyzed focus on the financial, economic, or political issues that investors are actually interested in. The detailed key word settings and searching steps are shown in the following figure:


Figure 2-1 Financial news collection

1. Searching Period Determination
2. Preliminary News Reading
3. Key Words Selection
4. Financial News Search

In the first step, this research divides the searching period into four sections by TWVIX. The research period is chosen for the following reasons. First, TWIFEX started to calculate and announce TWVIX on December 1st, 2006, so the first available data are from that date. Second, this study investigates investor sentiment and financial news, and there is no prior study of similar design to compare with or build upon, so this study considers that all dates with available data should be included. Furthermore, this research started in 2015, so the data are cut off at year-end 2014. As a result, the sample period is set from December 2006 to December 2014, which is also the period for news searching. This study sets up 5 sections to describe the level of the TWVIX index. Dates whose TWVIX is above 40 are sorted into section 1, representing a high level of fear. The settings and summaries of the remaining sections are shown in the following table:

Section    TWVIX Index    Number of Days    Days Proportion

The representative section of each month is the one with the maximum count of days, except for section 1. Because a TWVIX above 40 is rare, and to avoid leaving section 1 empty while reflecting the importance of a high TWVIX, this study sorts a month into section 1 once one or more of its days has a TWVIX that belongs to section 1. This study separates the whole period into different sections because it believes that different values of TWVIX imply different degrees of investors' fear. News articles reported in a specific section of TWVIX may share similar features or sentiment.

Therefore, this study sets up different groups of words and uses each group to search the periods of its matched section.
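The month-level assignment rule described above can be sketched as follows. This is a minimal illustration, not the study's actual procedure: the function name `representative_section` is hypothetical, the input is assumed to be the list of daily section labels for one month, and the tie-breaking rule (prefer the lower section number) is an assumption that the text does not specify.

```python
from collections import Counter


def representative_section(day_sections):
    """Return the representative TWVIX section for one month.

    day_sections: list of ints, the section label of each trading day
    in the month (hypothetical input format).
    """
    if not day_sections:
        raise ValueError("month has no trading days")
    # Exception stated in the text: one section-1 day is enough
    # to sort the whole month into section 1.
    if 1 in day_sections:
        return 1
    # Otherwise take the section with the maximum count of days.
    counts = Counter(day_sections)
    top = max(counts.values())
    # Tie-break is an assumption: pick the lowest section number.
    return min(s for s, c in counts.items() if c == top)
```

For example, a month whose days fall in sections 3, 3, 4, 4, 4 would be assigned to section 4, while a month with a single section-1 day would be assigned to section 1 regardless of the other days.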

In step 2, after identifying the representative section of each month, this study randomly selects one news article from every month as a sample and reads it to pick out financial-related words for the financial news search.
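The random selection of one article per month could look like the sketch below. The data layout (a list of `(date, text)` pairs with dates as `YYYY-MM-DD` strings), the function name `sample_one_per_month`, and the fixed seed are all assumptions made for illustration; the study does not describe how the random draw was carried out.

```python
import random
from collections import defaultdict


def sample_one_per_month(articles, seed=0):
    """Pick one article at random from every month.

    articles: list of (date_str, text) pairs, date_str as 'YYYY-MM-DD'
    (hypothetical format). Returns a dict mapping 'YYYY-MM' to one pair.
    """
    by_month = defaultdict(list)
    for date_str, text in articles:
        # The first seven characters of the date are the 'YYYY-MM' key.
        by_month[date_str[:7]].append((date_str, text))
    # A seeded generator makes the draw reproducible (an assumption).
    rng = random.Random(seed)
    return {month: rng.choice(items)
            for month, items in sorted(by_month.items())}
```

Fixing the seed is a design choice that lets the same sample be re-drawn later when the selected key words need to be audited.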

The words picked from the news are related to financial or economic events or to material public issues. This study believes that the third step helps filter out useless news such as entertainment news, sports, and other non-financial reports.

Additionally, the key words keep the test data more focused on financial topics, and

done by researchers through a joint inspection. The researchers have been reading financial news under a research project for a year and have developed a standard process for picking, removing, and validating words. They are experienced in analyzing financial news, and the key word selection is carried out under the supervision of the instructor. As a result, this study believes that these efforts help mitigate subjective bias in word selection.

Next, the study matches the words of every month to that month's representative TWVIX section. The words of the same section are used to search through the dates belonging to that section.

The key word selection and searching periods are shown in APPENDIX 1. The tables list each monthly period with its matched searching words and representative section. Note that the words are used to search only the dates belonging to that section, not the entire month.

The last step of financial news collection is to carry out the news search using the selected words and targeting the specific period.
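The matching and searching steps can be sketched as below. The sketch assumes that words are pooled by representative section and that an article counts as a hit when it contains at least one pooled word as a substring. The function names, the data layout, and the substring matching rule are hypothetical; the actual search was run through the KMW database, whose matching behavior is not described here.

```python
from collections import defaultdict


def pool_words_by_section(month_section, month_words):
    """Group the words picked in each month under that month's section."""
    pooled = defaultdict(set)
    for month, section in month_section.items():
        pooled[section].update(month_words.get(month, ()))
    return pooled


def search_news(articles, day_section, pooled_words):
    """Search each day's articles with the words of that day's section.

    articles: list of (date, text) pairs; day_section: date -> section.
    Returns (hits by date, sorted list of dates with no search result).
    """
    hits = defaultdict(list)
    for date, text in articles:
        section = day_section.get(date)
        if section is None:
            continue  # date lies outside the research period
        words = pooled_words.get(section, ())
        # Substring match is an assumption about how a key word hits.
        if any(word in text for word in words):
            hits[date].append(text)
    no_result_days = sorted(d for d in day_section if d not in hits)
    return dict(hits), no_result_days
```

Note that the search is keyed on each day's own section rather than on the month, which mirrors the point that the words are applied to the dates of a section and not to entire months. The second return value corresponds to the days with no search result.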

Table 3-2 News Searching Result

Section    Number of Words Selected    News Collected    Number of Days with No Search Result


Section 4 is matched with the largest number of words. With more words used, more news results are found. In addition, there are 52 days for which the words return no search result. This leaves 52 missing values in the observations, because the scores required by the model cannot be obtained for those days.

TWIFEX started to calculate and announce TWVIX on December 1st, 2006, so the sample period is set from December 2006 to December 2014. The news searching period is also December 2006 to December 2014, matching the research period. In total, there are 40,577 search results under the financial-related key words. The entire research period spans about 8 years and contains 2,010 observations. Of these, 52 contain missing values, so 1,958 observations remain after the missing values are removed.
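The sample-size arithmetic above can be checked with a few lines of code. The variable names are illustrative only; the figures are those reported in the text.

```python
# Figures reported in the text.
total_observations = 2010
missing_observations = 52

# Observations left after dropping days with no search result.
remaining = total_observations - missing_observations

# Share of the sample lost to missing values.
missing_share = missing_observations / total_observations

print(remaining)
print(f"{missing_share:.2%}")
```

The missing days amount to roughly 2.6% of the sample.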