Cost-Sensitive Classification

analysts’ reports to predict stock price movements. Kogan et al. further formulated this risk prediction problem as a text regression problem [11]. These studies are examples of risk prediction in computer science. Most of the models developed in these studies were based on various types of distinct features. We propose an approach that explores the effect of using a finance-specific sentiment lexicon to represent textual information.

2.2 Sentiment Analysis

Sentiment analysis is the process of identifying people’s attitudes and emotional states from language. It is sometimes also referred to as opinion mining. Sentiment words are particularly effective at reflecting speakers’ personal opinions.

With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs (e.g., tweets in which authors write about their lives, share opinions on a variety of topics, and discuss current issues), the importance of sentiment analysis has been increasingly recognized. Pak and Paroubek [18] built a classifier that determines positive, negative, and neutral sentiment in a given document. They performed a linguistic analysis of the collected Twitter corpora and explained their findings.

Pang et al. [19] focus on methods that aim to address new challenges associated with sentiment-aware applications. This work helps in controlling other well-known time-series patterns, and news content helps predict stock returns at the daily frequency. Garcia [8] studied the effect of sentiment on asset prices during the first half of the 20th century.

Some research and applications have already been dedicated to sentiment words. Their results make the importance of sentiment words in text mining evident. In finance, studies such as [14] have applied textual analysis to examine the sentiment of numerous news items, articles, financial reports, and tweets about public companies. Motivated by the results and guidance provided by these previous studies, we conducted a sentiment analysis on financial risks.

2.3 Cost-Sensitive Classification

Cost-sensitive classification acknowledges that some types of misclassification are worse than others. In many real-life cases, misclassifications incur penalties. For instance, consider recommending music to a subscriber who prefers jazz to popular music and likes country music least of all. Under these conditions, the cost of incorrectly predicting a jazz composition as country music should be significantly higher than the cost of misidentifying it as popular music. In our experiments, we assume that the cost is assigned based on the needs of the application. Lin [13] extended the one-versus-one approach to the field of cost-sensitive classification. The improved approach performed better than the original one-versus-one cost-sensitive classification approach. Chen et al. [4] proposed a cost-sensitive learning vector quantization (LVQ) approach that incorporates cost information into the model.
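The music example can be made concrete with a cost matrix. The following is a minimal sketch, not the method of Lin [13] or Chen et al. [4]: it assumes a classifier that outputs class probabilities and picks the label with the lowest expected cost. The names `GENRES`, `COST`, `expected_costs`, and `min_cost_prediction`, as well as the cost values themselves, are hypothetical and chosen only for illustration.

```python
# Hypothetical class labels for the music recommendation example.
GENRES = ["jazz", "popular", "country"]

# COST[true][predicted]; the numbers are illustrative assumptions, not from the text.
# For a true jazz piece, predicting country (5.0) costs far more than popular (1.0).
COST = [
    [0.0, 1.0, 5.0],  # true class: jazz
    [1.0, 0.0, 1.0],  # true class: popular
    [5.0, 1.0, 0.0],  # true class: country
]


def expected_costs(probs, cost):
    """Expected cost of each possible prediction, given class probabilities."""
    n = len(cost)
    return [sum(probs[t] * cost[t][p] for t in range(n)) for p in range(n)]


def min_cost_prediction(probs, cost):
    """Index of the prediction with the lowest expected cost."""
    costs = expected_costs(probs, cost)
    return min(range(len(costs)), key=costs.__getitem__)


# The most probable class is jazz, but the cost-sensitive choice differs.
probs = [0.40, 0.25, 0.35]
print(GENRES[probs.index(max(probs))])             # most probable label
print(GENRES[min_cost_prediction(probs, COST)])    # lowest expected-cost label
```

With these assumed numbers, the most probable label is jazz, yet predicting popular music has the lowest expected cost, because a confusion between jazz and country is penalized heavily in both directions.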

These techniques apply to many problems in finance. For instance, with increasing competition and complexity in the credit industry, financial distress prediction is of crucial importance in credit risk analysis. Accurate predictions that minimize or eliminate misclassification costs are particularly critical in applications such as credit risk analysis and fraud detection. Recently, an increasing number of studies have applied cost information in different applications. The cost-sensitive concept is also applied in our work: we hypothesize that misclassifying a high-risk company has worse consequences than misclassifying a low-risk company. To verify this hypothesis, we focused on the risk associated with companies, using hard information contained in their financial reports as the cost weights defined in cost-sensitive learning techniques.


Chapter 3 Methodology

The proposed approach evaluates soft information using a finance-specific sentiment lexicon. Afterwards, the influence of sentiment in this information is analyzed. The result of this analysis is integrated with hard information to predict cost-sensitive risk with a ranking approach. The following subsections provide further details and the reasoning behind the proposed approach.

3.1 Definition of Financial Terms

3.1.1 Daily Stock Returns

Daily return is a ratio that assesses the profit or loss derived from an investment in a trading day. In business, a trading day is defined as the period during which a particular stock exchange is open. Return on investment is a measure of investment performance used by both professional and novice investors. By dividing the loss or gain on an investment over the period of one day by the original cost of the investment, potential investors can compare investment opportunities by examining the daily return percentage rate. Daily returns provide a general overview of one’s investments, and the value indicates the associated gain or loss ratio. Positive daily return values represent gains, while negative daily return values represent investment losses. The results are easy to interpret, which is why the value is widely used by investors to evaluate investments. In this study, we calculated the daily return R_i of a company on day i using Equation (3.1):

$$R_i = \frac{P_i - P_{i-1}}{P_{i-1}}, \tag{3.1}$$

where P_i is the closing price on day i and P_{i−1} is the closing price on day i − 1.
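As an illustration of Equation (3.1), the following minimal sketch computes daily returns from a list of consecutive closing prices. The function name `daily_returns` and the sample prices are hypothetical; the sketch assumes the prices are ordered by trading day with no gaps.

```python
def daily_returns(closing_prices):
    """Daily returns R_1..R_k for closing prices P_0..P_k, following Equation (3.1).

    Each return is (P_i - P_{i-1}) / P_{i-1}; the first price has no return.
    """
    return [
        (today - previous) / previous
        for previous, today in zip(closing_prices, closing_prices[1:])
    ]


# Hypothetical closing prices over three consecutive trading days.
prices = [100.0, 102.0, 99.96]
returns = daily_returns(prices)
print(returns)  # one gain of about 2%, then one loss of about 2%
```

A positive entry marks a gain relative to the previous close, and a negative entry marks a loss, matching the interpretation given above.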


In finance, volatility is a common risk metric measured by the standard deviation of a stock’s returns over a period of time. Volatility was selected as a risk metric for the following reasons. It is closely related to the stock price because of its formulation. The volatility of an investment is also closely connected to risk because it reflects the tendency of prices to fluctuate. A stock with high volatility presents opportunities to buy assets at a lower cost and sell them when they are overpriced, because its price fluctuates sharply. A stock has low volatility when its price remains stable. In other words, volatility helps us to understand the characteristics of an investment over time. Another advantage is that stock prices are easy to track. Moreover, previous studies have used stock return volatility as a risk metric [11], so we can extend their findings and compare the effectiveness of our approach with them.

In this study, all returns throughout the trading days are used to determine the stock return volatility for each of the companies of interest. Let S_t be the price of a stock at time t. Holding the stock from time t − 1 to time t would result in a net return of R_t = S_t / S_{t−1} − 1 [25]. Volatility of returns for a stock from time t − n to time t can be

The training data contain an average of 220 trading days per year. Accordingly, time t denotes the date on which the financial report was published, and n is set to 220. In this study, the average stock return volatility is also used as the weight of each company in the cost-sensitive models. The tuning weight section describes how these weights are applied to the cost-sensitive models.
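The volatility computation can be sketched as follows, using the description of volatility as the standard deviation of returns over the window from t − n to t. This is a sketch under stated assumptions rather than the exact formula used in the study: the choice of the population standard deviation (dividing by the number of returns) is an assumption, and the names `net_returns` and `return_volatility` are hypothetical.

```python
import statistics


def net_returns(prices):
    """Net returns R_t = S_t / S_{t-1} - 1 for consecutive prices."""
    return [curr / prev - 1.0 for prev, curr in zip(prices, prices[1:])]


def return_volatility(returns, n=220):
    """Standard deviation of the returns from time t - n to time t.

    `returns` is ordered by time and ends at time t, so the window is the
    last n + 1 entries (or all entries if fewer are available).
    Assumption: the population standard deviation is used; swap in
    statistics.stdev for the sample (n - 1) normalization.
    """
    window = returns[-(n + 1):]
    if not window:
        raise ValueError("at least one return is required")
    return statistics.pstdev(window)


# Hypothetical returns that alternate between +1% and -1%.
sample_returns = [0.01, -0.01, 0.01, -0.01]
print(return_volatility(sample_returns, n=3))
```

With n = 220, the window covers roughly one year of trading days before the publication date of the financial report.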

