National Chengchi University
Department of Management Information Systems
Master's Thesis

The Effects of Public Opinions on Exchange Rate Movements
(輿論對外匯趨勢的影響)

Advisor: Dr. 蔡瑞煌
Author: 林子翔

July 2017

摘要 (Chinese Abstract)

This study investigates the hypothesis that relevant information in news reports, forum posts and social media discussions really affects the movements of exchange rates. For this research objective, we set up an experiment in which text mining techniques are first applied to news, forums and social media to generate numerical representations related to the exchange rate. Machine learning techniques are then applied to learn the relationship between the derived numerical representations and exchange rate fluctuations. Finally, we justify the hypothesis by examining the effectiveness of the obtained relationship. In this study, we propose a two-stage neural network to learn and forecast the daily movements of the USD/TWD exchange rate. Unlike other studies that focus on news or social media alone, we integrate them and include forum discussions as input data. Different data combinations yield multiple views, and different combinations of the three data sources may affect forecasting accuracy in different ways. Preliminary experimental results show that this method outperforms the random walk model.

Keywords: text mining, machine learning, exchange rates, artificial neural networks, TensorFlow, graphics processing units

Abstract

This study explores the hypothesis that the relevant information in the news, the posts in forums and the discussions on social media can really affect the daily movement of exchange rates. For this study objective, we set up an experiment in which the text mining technique is first applied to the news, the forum and the social media to generate numerical representations of the textual information relevant to the exchange rate. Then the machine learning technique is applied to learn the relationship between the derived numerical representations and the movement of exchange rates. At the end, we justify the hypothesis through examining the effectiveness of the obtained relationship. In this paper, we propose a hybrid neural network to learn and forecast the daily movements of the USD/TWD exchange rate. Different from other studies, which focus on news or social media, we integrate them and add the discussions of a forum as input data. Different data combinations yield many views, and different combinations of the three data sources might affect the forecasting accuracy rate in different ways. The experimental results of this method were better than those of the random walk model.

Keywords: text mining, machine learning, exchange rates, artificial neural networks, TensorFlow, graphics processing units

Index

Introduction
  1.1 Background
  1.2 Motivation
  1.3 Objective
Literature Review
  2.1 Background of Exchange Rates
  2.2 Purchasing Power Parity (PPP)
  2.3 Autoregressive Integrated Moving Average (ARIMA) Model
  2.4 Random Walk Theory
  2.5 Text Mining
  2.6 Decision Support Mechanism
    1. Concept Drifting
    2. SLFN
    3. The Resistant Learning with Envelope Module
    4. Moving Window
  2.7 Background of TensorFlow and GPU
    1. TensorFlow
    2. Graphics Processing Unit
    3. TensorFlow and GPU implementation
  2.8 Reasoning Neural Networks
Experiment Design
  3.1 Data Collection and Data Pre-Processing
    1. Text Segmentation
    2. Stop Words Removal
    3. Part-of-Speech Tagging (PoS)
    4. Sentiment Analysis
  3.2 Build a Neural Network in TensorFlow
Experimental Results
  1. Facebook
  2. News
  3. Forum (PTT)
  4. Facebook and News
  5. Facebook and Forum (PTT)
  6. News and Forum (PTT)
  7. Facebook, News and Forum
Conclusions and Future Works
  5.1 Conclusions
  5.2 Future Works
References

List of Tables

Table 1. The resistant learning with envelope module (Huang et al., 2014)
Table 2. Examples of TensorFlow operations (Abadi et al., 2016)
Table 3. Traditional Chinese text segmentation example
Table 4. Example of traditional Chinese stop words removal
Table 5. Example of traditional Chinese part-of-speech tagging
Table 6. Example of NTUSD words
Table 7. The definition of six variables in the input vector
Table 8. Representation for three movement groups
Table 9. Definition of all variables
Table 10. Example of the transformed data
Table 11. Forecasting accuracy rate (Facebook)
Table 12. Forecasting accuracy rate (News)
Table 13. Forecasting accuracy rate (Forum)
Table 14. Forecasting accuracy rate (Facebook and News)
Table 15. Forecasting accuracy rate (Facebook and Forum)
Table 16. Forecasting accuracy rate (News and Forum)
Table 17. Accuracy of forecasting exchange rate movements (three sources)

List of Figures

Figure 1. Computation graph in TensorFlow (Abadi et al., 2016)
Figure 2. Network structure of RN
Figure 3. The flow chart of data collection and data pre-processing
Figure 4. Real exchange rate movements for USD/TWD
Figure 5. The flowchart of the learning process of Reasoning Neural Networks
Figure 6. The Thinking Mechanism
Figure 7. The Cramming Mechanism
Figure 8. The Reasoning Mechanism
Figure 9. The proposed neural network initial architecture
Figure 10. Volume of discussion and their positive/negative emotion
Figure 11. The implementation of moving windows in this experiment
Figure 12. Facebook forecasting result (M = 1~3)
Figure 13. Facebook forecasting result (M = 4~6)
Figure 14. News forecasting result (M = 1~3)
Figure 15. News forecasting result (M = 4~6)
Figure 16. Forum forecasting result (M = 1~3)
Figure 17. Forum forecasting result (M = 4~6)
Figure 18. Facebook and News forecasting result (M = 1~3)
Figure 19. Facebook and News forecasting result (M = 4~6)
Figure 20. Facebook and Forum forecasting result (M = 1~3)
Figure 21. Facebook and Forum forecasting result (M = 4~6)
Figure 22. News and Forum forecasting result (M = 1~3)
Figure 23. News and Forum forecasting result (M = 4~6)
Figure 24. Facebook, News and Forum forecasting result (M = 1~3)
Figure 25. Facebook, News and Forum forecasting result (M = 4~6)

Introduction

1.1 Background

The exchange rate is the value of one nation's currency in comparison to another (Levinson, 2014). It has been an important issue in international trade and finance. Generally speaking, there are two kinds of exchange rate systems: the fixed exchange rate system and the floating exchange rate system. Under a fixed exchange rate system, the government of a country keeps the value of its currency fixed against another currency. In the mid-20th century, many countries operated under the Bretton Woods system (https://en.wikipedia.org/wiki/Bretton_Woods_system), including the United States, Canada, Western European countries and Japan. The Bretton Woods system was an international monetary system in which the currencies of most countries were pegged to the US dollar. In the early 1970s, most countries decided to let market forces determine exchange rates, and recently most major economic bodies have adopted floating exchange rates. Under a floating exchange rate system, the market forces of demand and supply of foreign and domestic currencies determine the exchange rate. As a result, to keep the exchange rate stable, the government of a country has to hold large reserves to control the demand and supply of foreign and domestic currencies (Mankiw, 2010).

With increasing international trade, fluctuations in exchange rates have a tremendous impact on the economy. Exchange rates are influenced by many factors, such as inflation rates, interest rates, political stability and so on (Levinson, 2014). As a result, many experts and scholars are committed to research on forecasting exchange rates.

There are four types of forecasting methods: technical forecasting, fundamental forecasting, market-based forecasting and mixed forecasting (Madura, 2011). Technical forecasting uses historical data to predict the future trend of the exchange rate. Fundamental forecasting uses fundamental data related to macroeconomic variables, such as Gross Domestic Product (GDP), the unemployment rate, the balance of trade and inflation rates. Market-based forecasting uses the spot rate or the forward rate to forecast the future spot rate. Mixed forecasting assigns weights to the outcomes of the aforementioned forecasting techniques; using a weighted average of various forecasts is a useful method of mixed forecasting (Madura, 2011).

Recently, computers have been widely utilized in almost every field, such as defense, education, business, medicine, transport systems and so on. In the meantime, computers have changed economic transactions and financial markets. In economics, computer models and simulations can be used to predict how trends will change. One of the fundamental techniques used by econometricians is regression analysis.

Random walk theory (https://en.wikipedia.org/wiki/Random_walk) states that all price movements are unpredictable random walks. That is, it is difficult to forecast whether an exchange rate will appreciate or depreciate; like a coin flip, there is roughly a 50/50 probability of landing on heads or tails. Meese and Rogoff (1983a, b) argued that exchange rate forecasts based on structural models are worse than those of the random walk model.

One of the major applications of machine learning techniques is prediction. Machine learning techniques such as support vector machines, neural networks,

deep learning and so forth may provide effective ways to deal with more complex relationships (Varian, 2014). Neural networks have some good features for a forecasting task. First, neural networks are data-driven, self-adaptive methods, different from traditional model-based methods. Second, neural networks learn from data sets where the desired output is provided in advance; hence, a neural network learns by adjusting itself toward the right answer, increasing the accuracy of prediction. Finally, neural networks are non-linear. In fact, linear models only provide limited support for large data sets (Varian, 2014), and it is difficult to hand-craft a nonlinear model for a particular data set. Therefore, neural networks are nonlinear data-driven methods in comparison to traditional model-based methods. They are able to accomplish nonlinear modeling without a priori knowledge about the relationships between input and output (Zhang et al., 1998). However, the performance of neural networks has been an important issue over the past few decades. Neural networks generally require tuning lots of parameters and consume the bulk of the processing time, especially multi-layer neural networks. With the advent of powerful hardware and deep learning frameworks, the learning speed of neural networks has improved a lot.

1.2 Motivation

Nowadays, people have become more reliant on social media for accessing and sharing information. News, social media and forums have become a main source of information for many people when making decisions. Those data could provide invaluable forecasting insight into financial markets. Therefore, academics and practitioners are wondering whether the information and the discussions on social media affect the daily movement of exchange rates. To this end, this study explores the hypothesis that the relevant information in the news, the posts in forums and the discussions on social

media can really affect the movement of exchange rates.

On the other hand, forecasting the movement of exchange rates has always been a difficult task: most econometric models are not capable of forecasting exchange rates with markedly higher accuracy than a naive random walk model. Moreover, the variety, the volume and the velocity of data on social media keep growing, and the veracity of data on social media is an issue. Therefore, justifying the aforementioned hypothesis requires the techniques of big data analytics and artificial intelligence (AI) (Walker, 2014), and the challenge of such a study is huge.

Recently, the concepts of big data analytics and artificial intelligence (AI) have been widely utilized in almost every field, such as defense, education, business, medicine, transport systems and so on. On the other hand, the exchange rate is the value of one nation's currency in comparison to another. With increasing international trade, fluctuations in exchange rates have a tremendous impact on the economy. The exchange rate is influenced by many factors, such as inflation rates, interest rates, political stability and so on (Levinson, 2014).

The time series model has been widely discussed and valued by many experts and scholars in forecasting the exchange rate. Time series data is an ordered sequence of observations in chronological order. Many financial data, such as the unemployment rate, can be thought of as time series data. A popular time series model is the autoregressive integrated moving average (ARIMA) model (Zhang, 2003).

Since data is getting bigger and bigger, we may need powerful tools such as big data analytics, artificial neural networks (ANN), deep learning and so on to deal with more complex relationships. As a result, more and more economists are interested in the applications of big data analysis with machine learning (Varian, 2014),

since these techniques may provide effective ways to deal with more complex relationships (Zhang, 2003).

Big data refers to large amounts of structured and unstructured data that are too large and difficult to process using traditional database and software techniques. In general, big data is often characterized by its volume, variety, velocity and veracity. Volume is how much data we have; data once measured in petabytes is now measured in zettabytes or even more. We can collect data from all kinds of sources, including social media, scientific instruments, mobile devices, the Internet of Things and so on. Variety means that there are many different types of data, from textual to numerical to video. Velocity refers to the increasing speed at which data are generated and must be coped with in a timely manner. Veracity is about ensuring that the data is accurate, which requires a process to prevent wrong data from accumulating in the system; the data is worthless if it is not accurate. Different from conventional approaches, big data analytics has already changed the way we process and manage data (Walker, 2014).

Over the coming decades, AI is the future trend around the world. Machine learning is a particular approach to AI. Machine learning techniques have increased the ability of computers to recognize patterns in big data. They also have the ability to learn from data and make predictions based on information. However, the performance of ANN, for instance, has been an important issue over the past few decades. ANN generally consumes the bulk of the processing time and requires tuning lots of parameters. With the advent of more powerful hardware and deep learning frameworks recently, the performance of ANN has been improving.

In 2015, Google released its second-generation machine learning software, TensorFlow. It made it easier for everyone to participate in machine learning.

Not only algorithms but also hardware has made great progress. Besides the CPU, one of the most important pieces of hardware in machine learning is the graphics processing unit (GPU). Both CPUs and GPUs can handle graphical operations, but GPUs have additional advantages over CPUs. First, a single GPU might have thousands of cores, while a CPU usually has no more than a handful of cores. Second, GPUs are designed for handling multiple tasks simultaneously. Finally, GPUs are really fast at certain types of math calculations, particularly vector and matrix operations. To reinforce the performance of machine learning, Google even successfully developed a custom machine learning accelerator chip, the tensor processing unit (TPU) (Osborne, 2016).

1.3 Objective

We set up an experiment to justify the hypothesis that the relevant information in the news, the posts in forums and the discussions on social media can really affect the daily movement of exchange rates. The text mining technique is first applied to the news, the forum and the social media to generate numerical representations of the textual information relevant to the exchange rate. Then the machine learning technique is applied to learn the relationship between the derived numerical representations and the movement of exchange rates. This study uses the features of TensorFlow and GPUs to build the neural network system of Tsaih's work to forecast exchange rate movements. Tsaih (1998) proposed the Reasoning Neural Network (RN), which uses a softening learning procedure to emulate the behavior of the human brain. At the end, we justify the hypothesis through examining the effectiveness of the obtained relationship.

Literature Review

2.1 Background of Exchange Rates

The initial system of exchange rates was the gold standard, a monetary system in which a country's currency has a value directly pegged to gold. The system used for exchange rates has changed from the gold standard, to fixed exchange rates, to a floating rate system.

The Bretton Woods Agreement created a new international monetary system of fixed exchange rates between currencies in 1944. The main objectives of the agreement were to ensure a stable foreign exchange rate system, prevent competitive devaluations and promote economic growth. The Bretton Woods Agreement among countries lasted until 1971 and was replaced by the Smithsonian Agreement (https://en.wikipedia.org/wiki/Smithsonian_Agreement). Even with the wider bands allowed by the Smithsonian Agreement, governments still had difficulty maintaining exchange rates within the stated boundaries. The Smithsonian Agreement began to lose its popularity, and economic forces compelled many countries to change.

Nowadays, most economies have shifted to a floating exchange rate system. Under a floating exchange rate system, exchange rates are influenced by a variety of factors including political stability, inflation, interest rates, public debt and terms of trade. Those factors make forecasting exchange rates difficult. Trading at the right time can bring significant profit, but a trade based on an incorrect movement risks big losses. Using suitable tools or methods can reduce the effect of mistakes and can also increase profitability.

The exchange rate markets are changing all the time. As a result, exchange rates matter more than ever in today's globalized world. They affect the prices of international trades, can be a focus of government policy, and often play an important role in economic and political affairs.

The ability to forecast the direction of exchange rate movements, up or down, is vital to people and governments. To this end, a wide variety of approaches and techniques have been tried and used by participants around the world. Furthermore, over the last few decades, academics and financial practitioners have shown an interest in this field and attempted to quantify and justify the wide variety of techniques used. This chapter presents a basic understanding of subjects associated with exchange rate forecasting through a literature review.

2.2 Purchasing Power Parity (PPP)

Purchasing power parity (PPP) is an economic theory which tries to quantify the relationship between inflation and the exchange rate. It consists of the absolute form of PPP and the relative form of PPP.

Absolute purchasing power parity states that the price of the same goods in two different countries should be equal when measured in the same currency. If a difference in prices as measured in the same currency exists, then we can expect demand to move until the goods have the same price in the two countries.

Relative purchasing power parity considers that, owing to market imperfections such as transportation costs, tariffs, transaction costs and so on, the price of the same goods in different countries will not necessarily be the same when measured in a common currency. In other words, it describes the differences in the inflation rates between two countries. For example, if the inflation rate in the United States is higher than that in Taiwan, this causes

the price of goods in the United States to rise relative to those in Taiwan.

Dornbusch and Krugman (1976) observed that most macroeconomists have a deep-rooted belief that a variant of PPP is justified in some sense. Since it forms a basis of many macroeconomic models of trade and of exchange rate determination, failure to support this parity empirically would somewhat destroy the foundation of such models.

PPP theory does not always hold when applied in the real world. There are quite a few reasons, such as trade barriers, trade protection, transportation costs, information costs, non-tradable products and services, and other market participants (Salvatore, 2012). Abuaf and Jorion (1990) considered that exchange rates should be related to relative changes in price levels, with deviations that might be only minimal or immediate, while empirical work could hardly find evidence in support of purchasing power parity. There is a lot of evidence of long-term PPP supporting the notion that temporary disequilibrium may occur, but the deviations will be stationary in the long term (Frenkel, 1981).

MacDonald and Marsh (1997) used an expanded version of the purchasing power parity condition to build simultaneous equation models that incorporate meaningful long-run equilibrium relationships and complex short-run dynamics. They showed that fully dynamic out-of-sample forecasts from the expanded model are able to outperform those of a random walk model over horizons as short as three months.

In sum, there are more recent and exhaustive studies that use econometric techniques to research PPP. Many economists still believe that relative prices should form some type of anchor in the determination of long-run real exchange rates.
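The two forms of PPP can be summarized by the standard textbook relations below (a sketch in notation of our own choosing, not taken from the thesis), where S_t is the exchange rate (domestic currency per unit of foreign currency), P_t and P_t^* are the domestic and foreign price levels, and π_t and π_t^* are the corresponding inflation rates:

```latex
% Absolute PPP: the same basket of goods costs the same in both countries
S_t = \frac{P_t}{P_t^{*}}

% Relative PPP: the exchange rate change offsets the inflation differential
\frac{S_{t+1} - S_t}{S_t} \approx \pi_t - \pi_t^{*}
```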

2.3 Autoregressive Integrated Moving Average (ARIMA) Model

ARIMA is an abbreviation of AutoRegressive Integrated Moving Average, introduced by Box and Jenkins. In an ARIMA model, the future value of a variable is assumed to be a linear combination of past observations and random errors, expressed as follows:

$$y_t = \theta_0 + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q}$$

where $y_t$ is the observed value and $\varepsilon_t$ the random error at time t, $\phi_i$ and $\theta_j$ are the coefficients, and p and q are integers often referred to as the orders of the autoregressive and moving average polynomials, respectively. Fundamentally, this method has three phases: model identification, parameter estimation and diagnostic checking. For example, the ARIMA(1, 0, 1) model can be represented as follows:

$$y_t = \theta_0 + \phi_1 y_{t-1} + \varepsilon_t - \theta_1 \varepsilon_{t-1}$$

An ARIMA model has its limitations. Because the models depend directly on past values, they work best on long and stable series (Box and Jenkins, 1970). ARIMA is often called the Box-Jenkins time series method. Its accuracy in short-term forecasting is good in comparison with long-term forecasting. The ARIMA model ignores independent variables completely and uses past and present values of the dependent variable to generate accurate short-term forecasts; ARIMA is appropriate when the observations of a time series are statistically related to the dependent variable (Hendranata, 2003).

Sabur et al. (1993) used ARIMA models to examine the trend, annual and seasonal variability and relative profitability of spices in Bangladesh. They considered that the

ARIMA model should be used only for short-term forecasts. Meyler et al. (1998) adopted ARIMA time series models for forecasting Irish inflation. They considered two alternative approaches to the issue of identifying ARIMA models, including objective penalty function methods. The results showed that ARIMA models are theoretically justified and can be surprisingly robust with respect to alternative modelling approaches. Sekine (2001) estimated an inflation function and forecasted one-year-ahead inflation for Japan. He found that the markup relationship, excess money supply and the output gap were particularly important for determining a long-term equilibrium model of inflation. He emphasized the importance of adjusting a pure model-based forecast by utilizing information from alternative models.

Hogan (1986) compared four different models: the purchasing power parity model, static and dynamic specifications of both the flexible-price and sticky-price monetary models, forward exchange theory, and univariate ARIMA models. The author found that forward rates have preferable forecasting performance at a horizon of one quarter, while uncovered interest parity is the better model at two-quarter forecasting horizons. The static and dynamic specifications of both the flexible-price and sticky-price monetary models outperformed all other models. Wood and Dasgupta (1996) used neural networks to forecast the MSCI U.S.A. Capital Market Index. They wanted to test the ability of a non-parametric learning network to provide valuable information to a global portfolio manager. The results showed that their system was the best in comparison with multiple linear regression and two ARIMA models. Pai and Lin (2005) proposed a hybrid ARIMA and support vector machine model for stock price forecasting. Their data consisted of the closing prices of ten publicly traded company stocks in the United States. Their results showed a significant improvement in forecasting accuracy.
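For concreteness, the following is a minimal sketch of fitting and forecasting with an ARIMA(1, 0, 1) model using the statsmodels library (an illustration of ours; the thesis itself does not use statsmodels, and the synthetic series merely stands in for an exchange rate series):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
series = 30 + np.cumsum(rng.normal(0, 0.05, size=300))  # synthetic daily rates

# Parameter estimation for an ARIMA(1, 0, 1) specification:
# y_t = theta_0 + phi_1 * y_{t-1} + eps_t - theta_1 * eps_{t-1}
model = ARIMA(series, order=(1, 0, 1))
fitted = model.fit()

print(fitted.params)             # estimated constant, AR, MA and variance terms
print(fitted.forecast(steps=5))  # short-term out-of-sample forecast
```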

From the above-mentioned studies, it is clear that the ARIMA model can be used for forecasting. We can conclude that the ARIMA model has sufficient predictive ability.

2.4 Random Walk Theory

The term "random walk" was first proposed by Karl Pearson in 1905 (Pearson, 1905). Random walk theory assumes that all price shifts are unpredictable and random.

Suppose that we have a series of historical values of the exchange rate, and we want to forecast its future value. In order to forecast, we need to estimate the parameters of a statistical model that describes the relationship between past and future values of the exchange rate. Let $e_t$ be the exchange rate observed at time t. A possible equation describing the dynamics of the exchange rate would be

$$e_t = \rho e_{t-1} + \varepsilon_t$$

where the $\varepsilon_t$ are random innovations (James et al., 2012). Empirical estimates of the autoregressive parameter $\rho$ lie systematically around 1 (Cheung et al., 2005; Meese and Rogoff, 1983a, b; Mussa, 1979). A value of $\rho = 1$ implies that

$$e_t = e_{t-1} + \varepsilon_t$$

That is, the exchange rate follows a random walk.

Meese and Rogoff (1983a, b) provided evidence showing that no structural model of the exchange rate can reliably outperform the forecasting capacity of a naive random walk at short- and medium-term horizons. Since their study, the random walk has become the standard benchmark against which the forecasting performance of every exchange rate model is compared (James et al., 2012). The argument for the random walk is usually traced back to Fama in 1965.
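The practical force of this benchmark is easy to demonstrate. The following NumPy sketch (our own illustration, not from the thesis) simulates a series with ρ = 1 and shows that guessing the direction of tomorrow's movement is no better than a coin flip:

```python
import numpy as np

rng = np.random.default_rng(42)
# Simulate e_t = e_{t-1} + eps_t, a pure random walk around 30 TWD per USD.
e = 30 + np.cumsum(rng.normal(0.0, 0.05, size=1000))

actual_direction = np.sign(np.diff(e))                         # +1 up, -1 down
guessed_direction = rng.choice([-1.0, 1.0], size=actual_direction.size)

# Directional accuracy hovers around 0.5, the coin-flip baseline.
print((guessed_direction == actual_direction).mean())
```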

Regnault (1863) observed that the longer you hold a security, the more you can win or lose on its price variations; the price deviation is directly proportional to the square root of time. Bachelier (1900) considered that there is no helpful information contained in the historical price movements of securities. He thought that today's return signals nothing about the sign of tomorrow's return. This result led him to model stock returns as a random walk; consequently, the expected return of speculation is zero. Macauley (1925) thought that there was a noticeable similarity between the fluctuations of the stock market and those of a chance curve which may be obtained by throwing dice.

Kendall (1953) examined the patterns of UK stocks. He considered that the stock market was inefficient; these random price movements were believed to be the effect of a well-functioning market. His findings showed that stock prices are a random series. Fama (1965) thought that stock prices fluctuate randomly around their intrinsic values, return quickly to equilibrium and fully reflect the latest information available in the market. Sometimes the market will over-adjust or under-adjust, but it cannot be predicted which one will occur at any time. Samuelson (1965) explained that such randomness in returns should be expected from a well-functioning stock market.

The random walk theory assumes that information is openly and easily available and that there are large numbers of competing rational profit-maximizing participants with enough resources to make good use of any profit opportunity arising from a systematic price shift of an individual stock. Competition causes the full impact of new information on a stock's intrinsic value to be adjusted instantaneously into actual stock prices, making all non-random fluctuations so small that they cannot be exploited profitably. When this holds true, neither those who study

critically the trends in past stock price movements will be able to make future predictions based on trend analysis, nor will fundamental analysis, which examines publicly available information to determine mispriced securities, be capable of earning abnormal returns over and above those that could be earned with a buy-and-hold strategy (Seelenfreud et al., 1968; Robert, 1959; Fama, 1970).

2.5 Text Mining

Text mining is a method which gathers structured information from unstructured or semi-structured text. Text mining tries to find interesting patterns in large amounts of different unstructured textual resources (Feldman & Sanger, 2007). The text mining technique provides a valuable tool for the extraction of information from text. These techniques are designed to generate numerical representations of the text for several potential uses.

Peramunetilleke & Wong (2002) used news headlines to predict intraday movements in exchange rates. Their approach is different from analysis based on quantifiable information; the predictions are generated from text describing world financial, political and general economic news. They claim their system produces results that are significantly better than random prediction on a publicly available commercial data set. Zhang et al. (2005) used a framework in which news articles and economic data are used to model the exchange rate movements between the Euro and the US dollar. They considered that this approach yields an exchange rate model with improved accuracy.

According to research from Markit, a global financial information and services company, from December 2011 to November 2013 stocks with positive social media sentiment showed growing returns of 76%, in comparison with -14% for

stocks with negative sentiment. The sentiment data show whether the common talk is good or bad news for a given stock. Depending on that information, people can trade the stock based on the sentiment data (Kilburn, 2014).

2.6 Decision Support Mechanism

Huang et al. (2014) proposed a decision support mechanism (DSM) for dealing with the outlier detection problem in a concept drifting environment. Their algorithm is the integration of (1) an implementation of the resistant learning concept with an envelope module via an adaptive single-layer feed-forward neural network (SLFN) and (2) an implementation of the incremental learning concept via the moving window technique.

1. Concept Drifting

Concept drifting means that concepts are not stable and change with time (Tsymbal, 2004). Some scholars have classified concept drifting into several types. For example, Stanley (2003) classified concept drifting into two types: sudden concept drifting and gradual concept drifting. Furthermore, according to the rate of change, gradual concept drifting can be subdivided into moderate and slow drifts.

In order to obtain a model from a concept drifting environment, Gama et al. (2014) considered that learning requires not only updating the predictive model with new patterns but also forgetting old information. Because incremental learning can not only learn new concepts but also retain existing and still relevant concepts, Elwell and Polikar (2011) integrated incremental learning with an ensemble classifier system to address the concept drifting problem. Masud et al. (2011) added a time constraint that causes the system to wait for more test instances to discover similarities; when they had

sufficient candidates, they decided whether to perform a correlation function and classify them as a novel class.

There are many methods that address the concept drifting problem in many types of problems. Solutions for each type of concept drifting or concept evolution require distinct methods. In sum, it is difficult to identify the hidden trends evolving over time or the implicit concepts changing over time.

2. SLFN

In order to deal with outlier detection via resistant learning, Tsaih and Cheng (2009) implemented an adaptive SLFN. The SLFN's fitting function is defined as:

$$a_i(\mathbf{x}) = \tanh\Big(w_{i0}^H + \sum_{j=1}^{m} w_{ij}^H x_j\Big)$$

$$f(\mathbf{x}) \equiv w_0^o + \sum_{i=1}^{p} w_i^o \tanh\Big(w_{i0}^H + \sum_{j=1}^{m} w_{ij}^H x_j\Big)$$

where $\tanh(x) \equiv \frac{e^x - e^{-x}}{e^x + e^{-x}}$; m is the number of explanatory variables $x_j$; $\mathbf{x} \equiv (x_1, x_2, \ldots, x_m)^T$; p is the adaptive number of adopted hidden nodes; $w_{i0}^H$ is the bias value of the ith hidden node; the superscript H throughout the paper refers to quantities related to the hidden layer; $w_{ij}^H$ is the weight between the jth explanatory variable $x_j$ and the ith hidden node; $w_0^o$ is the bias value of the output node; the superscript o throughout the paper refers to quantities related to the output layer; and $w_i^o$ is the weight between the ith hidden node and the output node. In their study, a character in bold represents a column vector, a matrix, or a set, and the superscript T indicates transposition.
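A literal NumPy transcription of the fitting function above (a sketch of ours, not the authors' implementation) makes the structure of the adaptive SLFN explicit:

```python
import numpy as np

def slfn(x, wH0, wH, wo0, wo):
    """Evaluate the SLFN fitting function f(x) defined above.

    x   : (m,)   explanatory variables
    wH0 : (p,)   hidden-node biases   w_i0^H
    wH  : (p, m) hidden-layer weights w_ij^H
    wo0 : float  output bias          w_0^o
    wo  : (p,)   output weights       w_i^o
    """
    a = np.tanh(wH0 + wH @ x)  # hidden-node activations a_i(x)
    return wo0 + wo @ a        # f(x) = g(a), linear in the activations
```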

Through this SLFN, the input information $\mathbf{x}$ is first transformed into $\mathbf{a} \equiv (a_1, a_2, \ldots, a_p)^T$, and the corresponding value of f is generated from $\mathbf{a}$ rather than $\mathbf{x}$. Namely, given an observation, all the corresponding values of the hidden nodes are first calculated with $a_i \equiv \tanh(w_{i0}^H + \sum_{j=1}^m w_{ij}^H x_j)$ for all i, and the corresponding value $f(\mathbf{x})$ is then calculated as $f(\mathbf{x}) = g(\mathbf{a}) \equiv w_0^o + \sum_{i=1}^p w_i^o a_i$.

3. The Resistant Learning with Envelope Module

Tsaih and Cheng (2009) proposed a resistant learning mechanism with the SLFN and a tiny pre-specified value to deduce a function form. The mechanism dynamically adapts the number of adopted hidden nodes and the associated weights of the SLFN during the training process. They also implemented both robustness analysis and deletion diagnostics to exclude potential outliers at an early stage, thus preventing the SLFN from learning them (Rousseeuw and Driessen, 2006). Above all, the weight-tuning mechanism, the recruiting mechanism, and the reasoning mechanism are implemented to allow the SLFN to evolve dynamically during the learning process and to explore an acceptable nonlinear relationship between the explanatory variables and the response in the presence of outliers.

Huang et al. (2014) proposed an envelope bulk mechanism integrated with the SLFN to handle the outlier detection problem. This outlier detection algorithm is performed with an envelope bulk whose half width is 2ε. The ε is changed from a tiny value ($10^{-6}$) to a non-tiny value (1.96) by the envelope module; the value 1.96 corresponds to the 5% significance level, given that the distribution is normal. The standard for deciding whether an instance is an outlier is that the instance's residual is greater than ε · γ · σ, where σ is the standard deviation of the residuals of the current reference observations and γ is a constant that is equal to or greater than 1.0,

depending on the user's stringency in outlier detection. The smaller the γ value is, the more stringent the outlier detection is. Furthermore, if our requirements are stricter, we can also modify the ε value to an appropriate value. The envelope module results in a fitting function with an envelope that contains the majority of the training data but no outliers. Furthermore, the outliers are expected to be included at later stages.

Table 1. The resistant learning with envelope module (Huang et al., 2014)

In brief, this envelope module allows us to wrap the response elements seen as inliers in the envelope; vice versa, the responses seen as outliers are not wrapped in the envelope. The quantity of inliers is decided by ε and γ. The stricter the parameters are, the fewer

inliers remain inside the envelope; on the other side, more potential outliers are determined by the envelope module.

4. Moving Window

Babcock et al. (2002) defined a sequence-based window by choosing a specified size uniformly over a moving window of the last elements in a data stream. The mechanism has been adopted in several fields. Navvab et al. (2012) used a robust artificial neural network with the moving window concept to build a dynamical model for predicting crude oil fouling behavior. They claimed that the implementation of the moving window updated the model whenever a new data block came in and helped catch slowly changing dynamic trends.

2.7 Background of TensorFlow and GPU

Neural networks have large computational demands, with many algorithms requiring extensive computing resources to execute. As a consequence, many improvements to algorithms need to be made to enable them to execute in reasonable time. This situation, combined with stagnating increases in the clock speed of CPUs, has led to the increasing use of parallelism in neural networks, where a traditional CPU can be combined with parallel accelerators such as GPUs.

1. TensorFlow

In November 2015, Google released TensorFlow, which is an open source software library for defining, deploying and training neural network models. It supports CPU, GPU, distributed computing, and hybrid CPU and GPU execution.

Computational Graph Architecture

In TensorFlow, machine learning algorithms are represented as computation graphs. A computation graph is a form of directed graph where the nodes describe operations, while edges represent the data flowing between these operations. An example of a data flow graph is shown in Figure 1. The following paragraphs review the basic concepts of the model, such as tensors, operations, variables and sessions.

Figure 1. Computation graph in TensorFlow (Abadi et al., 2016)

1) Tensor: In TensorFlow, a tensor is a typed multi-dimensional array; for example, a 3-D array of floating point numbers representing a batch of images with dimensions [height, width, channels]. All kinds of tensor element types are supported, such as signed and unsigned integers ranging in size from 8 bits to 64 bits, a complex number type, IEEE float and double types and so on.

2) Operations: The nodes of the TensorFlow graph are called operations or ops. An operation has a name and represents an abstract computation (e.g., "subtract"). Each operation can be handed a constant, array, matrix or n-dimensional matrix.

3) Variables: A variable is a special kind of operation that returns a handle to a persistent mutable tensor that survives across executions of a graph. Variables must be initialized before they have values in TensorFlow.

4) Sessions: In TensorFlow, the execution of operations and the evaluation of tensors are only performed in a session. A session differs from a graph; a session allows executing a graph or part of a graph. It allocates resources for that and holds the actual values of intermediate results and variables.

Table 2. Examples of TensorFlow operations (Abadi et al., 2016)

Category                     Examples
Element-wise operations      Add, Mul, Exp
Matrix operations            MatMul, MatrixInverse
Value-producing operations   Constant, Variable
Neural network units         SoftMax, ReLU, Conv2D
Checkpoint operations        Save, Restore

Implementation

The main component in TensorFlow is the client. A client communicates with the master using the session interface. The master manages one or more worker processes. Each worker is responsible for arbitrating one or more computational devices and for executing operations on those devices.
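To make tensors, operations, variables and sessions concrete, here is a minimal sketch in the TensorFlow 1.x graph-and-session style of the period (an illustration of ours, not code from the thesis):

```python
import tensorflow as tf  # TensorFlow 1.x style API

# Build the computation graph: nodes are operations, edges carry tensors.
a = tf.constant(2.0)              # a value-producing operation
b = tf.placeholder(tf.float32)    # fed with data at execution time
w = tf.Variable(1.0)              # persistent mutable tensor
y = tf.add(tf.multiply(w, b), a)  # element-wise operations

# Operations can also be pinned to a device, e.g. with tf.device('/gpu:0'),
# although TensorFlow places them on the first GPU by default when one exists.

init = tf.global_variables_initializer()
with tf.Session() as sess:        # only a session executes the graph
    sess.run(init)                # variables must be initialized first
    print(sess.run(y, feed_dict={b: 3.0}))  # 1.0 * 3.0 + 2.0 = 5.0
```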

The TensorFlow implementation translates the computational graph definition into executable operations distributed across available compute resources, such as the CPU or GPU. In general, you do not have to specify CPUs or GPUs explicitly; TensorFlow uses your first GPU, if you have one, for as many operations as possible.

2. Graphics Processing Unit

GPU stands for graphics processing unit. The world's first GPU, the GeForce 256, was released by NVIDIA in 1999. GPUs are very efficient at image processing and math calculations, especially vector and matrix operations, and their massive parallel processing makes them more efficient than general-purpose CPUs (NVidia, 2009).

Parallelism is the future trend of computing. Microprocessor development efforts will keep focusing on adding cores instead of increasing single-thread performance. GPUs have developed rapidly in recent years and are used not only in graphics applications but also for general purpose computing.

The GPU is designed for a specific type of application with the following characteristics. Over the past few years, a growing community has identified other applications with similar features and successfully mapped these applications to the GPU (Owens, 2008).

1. Computational requirements are large: GPUs must deliver an enormous amount of compute performance to satisfy the demand of complex real-time applications.

2. Parallelism is substantial: The graphics pipeline is well suited to

parallelism. Operations on vertices and fragments are well matched to fine-grained, closely coupled, programmable parallel compute units, which in turn are applicable to many other computational domains.

3. Throughput is more important than latency.

In today's computing, GPUs can take on many tasks, such as accelerating video, image recognition and others. CPUs integrated with GPUs can deliver the best system performance.

3. TensorFlow and GPU implementation

Szegedy et al. (2015) proposed a new version of a neural network model called Inception-v4, which has a more uniform, simplified architecture than Inception-v3. Their study used the TensorFlow distributed machine learning system on NVidia Kepler GPUs. They report a 3.08% top-5 error on the test set of the ImageNet classification challenge. Barzdins et al. (2016) used TensorFlow with a sliding-window technique for character-level English-to-Latvian translation of audio and video content. They built an LSTM layer of size 400 with batch size 16; training the network on a TitanX GPU took 24 hours. Babaeizadeh et al. (2017) proposed a hybrid CPU and GPU version of the Asynchronous Advantage Actor-Critic (A3C) algorithm in TensorFlow. They demonstrated that the hybrid CPU/GPU GA3C algorithm achieves a significant speed-up with respect to its CPU counterpart.

2.8 Reasoning Neural Networks

The Reasoning Neural Network (RN) was first propounded by Tsaih (1998). The RN adopts the layered feedforward network structure. Its learning algorithm is a member of the weight-and-structure-change category, because it starts with only one hidden node and autonomously recruits and prunes hidden nodes during the learning process. Tsaih (1998) considered that the learning of the RN is guaranteed to complete, that the number of required hidden nodes is sensible, and that the speed of learning is better than that of back-propagation networks.

As mentioned above, the RN has a layered feedforward network structure consisting of the input layer, the hidden layer and the output layer. Figure 2 shows the network structure of the RN.

Figure 2. Network structure of RN

The values of m and q are set depending on the problem, and they are fixed. The value of p is set to 1 initially; hidden nodes are added or removed during the learning process. The RN adopts the hyperbolic tangent as the activation function of the hidden layer:

$$\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$$

Experiment Design

Text mining techniques provide a valuable tool for the extraction of information from text. These techniques are designed to generate numerical representations of the text for several potential uses. A common usage of text mining is information extraction. Social media offers some text mining opportunities; there were 13 million daily active Facebook users in Taiwan as of 2016. We suppose that social media and forums do have a lot of precious information that other sources do not have.

In this experiment, the text mining techniques are first applied to the relevant news, forum contents and opinions on social media to derive numerical representations of the information regarding the momentum changes in the exchange rate. That is, what are being extracted here are the news, the posts in forums and the discussions on social media. The derived numerical representations are related to the movement in exchange rates depending on the negativity or positivity of the discussion board content. Therefore, the unstructured and semi-structured data should be converted to structured numerical data before being fed into the artificial neural network system.

3.1 Data Collection and Data Pre-Processing

In this experiment, we would like to forecast USD/TWD daily exchange rate movements. Our input data are fetched from financial news, a forum and social media: cnYes (http://www.cnyes.com/), moneyDJ (https://www.moneydj.com/), the ForeignEX board of PTT (https://www.ptt.cc/bbs/ForeignEX/index.html) and Facebook. The crawler for these data sources is implemented in Python 2.7. When the input data is available, it must

be preprocessed into numerical values before being fed into the neural networks. That is, we need to convert the unstructured data and the semi-structured data into structured numerical data so that they can be processed by the neural network. The data collection and data pre-processing are divided into the phases shown in the following graph.

Figure 3. The flow chart of data collection and data pre-processing

1. Text Segmentation

Text segmentation means dividing textual data into meaningful units, such as words and sentences (Hearst, 1997). Unlike English, there are no spaces in a Chinese sentence; therefore, word segmentation adds spaces between words. For traditional Chinese and simplified Chinese, there are many tools for segmentation, such as

CKIP (http://ckipsvr.iis.sinica.edu.tw/), ICTCLAS (http://ictclas.nlpir.org/), jieba (https://github.com/fxsjy/jieba) and so on. In this experiment, we use jieba, a Python package, to do the text segmentation.

Table 3. Traditional Chinese text segmentation example

Original: 在台股開紅及韓元反彈下,新台幣一開盤就升值 7.1 分、以 30.4 元開出
Result:   在 台股 開紅及 韓元 反彈 下 , 新台幣 一 開盤 就 升值 7.1 分 、以 30.4 元 開出

2. Stop Words Removal

Stop words are words which have nothing to do with the content, such as "a", "an", "or" and others. Removing these words yields more refined data for sentiment analysis purposes. We can use the Natural Language Toolkit (NLTK), an NLP tool for Python, to remove these words (Feldman, 2007). If we remove stop words that are very commonly used in Chinese, we can concentrate on the important words instead.

Table 4. Example of traditional Chinese stop words removal

Original: 在台股開紅及韓元反彈下,新台幣一開盤就升值 7.1 分、以 30.4 元開出
Result:   台股 開紅及 韓元 反彈 下 新台幣 一 開盤 就 升值 分以 元 開出

3. Part-of-Speech Tagging (PoS)

Part-of-speech tagging tries to label each word with the appropriate part of speech. We can distinguish the word type by introducing part-of-speech tags, making sure each word is marked as a noun, verb, adjective or other. We also use jieba to tag all textual data.

Table 5. Example of traditional Chinese part-of-speech tagging

Original: 新台幣一開盤就升值 7.1 分
Result:   新台幣 (n) 一 (m) 開盤 (x) 就 (d) 升值 (v) 7.1 (m) 分 (n)

4. Sentiment Analysis

Sentiment analysis, also known as opinion mining, refers to the use of natural language processing, text mining and other methods to systematically identify, extract, quantify and study affective states and subjective information. Sentiment analysis can be applied to any textual form of opinions, such as blogs, social media and so on (Feldman, 2007). We use NTUSD, a semantic dictionary developed by National Taiwan University, to do the sentiment analysis (Ku & Chen, 2007). It is a lexicon of about 11,100 words with positive/negative sentiment.

Table 6. Example of NTUSD words

Positive words: 一帆風順的、了不起的、令人驚奇的、可信賴的…
Negative words: 下等的、不公平的、不合時宜的、不守規矩的…
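The jieba-based preprocessing steps can be strung together as in the following minimal sketch (ours, not the thesis's crawler code; the tiny stop-word list is illustrative, whereas the experiment uses NLTK's):

```python
# -*- coding: utf-8 -*-
import jieba
import jieba.posseg as pseg

sentence = u"在台股開紅及韓元反彈下,新台幣一開盤就升值7.1分、以30.4元開出"

# 1. Text segmentation: insert word boundaries into the Chinese sentence.
words = jieba.lcut(sentence)

# 2. Stop words removal (illustrative list; the experiment uses NLTK).
stop_words = {u"在", u"就", u"一", u"、", u","}
words = [w for w in words if w not in stop_words]

# 3. Part-of-speech tagging: each token gets a PoS flag such as n, v, m.
tagged = [(pair.word, pair.flag) for pair in pseg.lcut(sentence)]

print(u" ".join(words))
print(tagged)
```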

3.2 Build a Neural Network in TensorFlow

In this study, the next day's exchange rate movements are used as the forecasting target of our experiment. They refer to the movements on each transaction date in the Taiwan forex market.

Figure 4. Real exchange rate movements for USD/TWD

The data set of USD/TWD exchange rate movements in this study consists of six months of data, from 1st Jan. 2017 to 30th Jun. 2017. The data used comprise the emotions and the volumes of discussion on Facebook, in the news and on the forum. The real transaction-date exchange rate movements were downloaded from the Taiwan Futures Exchange (http://www.taifex.com.tw/chinese/3/3_5.asp). For simplicity, real exchange rate movements were encoded in a quaternary representation. Our proposed system uses the outlier detection algorithm of Huang et al. (2014) and the reasoning neural network algorithm of Tsaih (1998).

Input nodes: There are six input nodes in our proposed neural network system. As shown in Table 7, these values represent the emotion and the volume of discussion of the three data sources, respectively.

Table 7. The definition of the six variables in the input vector
Variable  Definition
x1        Facebook emotion
x2        volume of Facebook discussion and its positive/negative emotion
x3        News emotion
x4        volume of news and its positive/negative emotion
x5        emotion of the ForeignEX board on PTT
x6        volume of PTT discussion and its positive/negative emotion

Hidden nodes: The number of adopted hidden nodes is 1 initially. During the learning process, the number of adopted hidden nodes is altered.

Output nodes: The output layer has two output nodes whose values are binary. The output (1, 1) represents appreciation, (1, -1) and (-1, 1) represent unchanged, and (-1, -1) represents depreciation. That is, the movements of the exchange rate have been divided into three groups, appreciation, unchanged, and depreciation, as shown in Table 8.

Desired output: The desired output is the real movement of the exchange rate transaction. The representation of the desired output regarding appreciation is (1, 1), depreciation is (-1, -1), and unchanged is (1, -1).

Table 8. Representation of the three movement groups

Category            Description
(1, 1)              Appreciation
(1, -1) or (-1, 1)  Unchanged
(-1, -1)            Depreciation

The reasoning neural network can be used as a classifier that learns to distinguish which of two classes a stimulus belongs to.

Figure 5. The flowchart of the learning process of Reasoning Neural Networks

The stopping criterion of learning is the Linearly Separating Condition (LSC). At the stage of learning the kth training case, let K = K1 ∪ K2, where K1 and K2 are the sets of indices of training cases in class 1 and class 2, respectively. The condition is

$$\mathrm{LSC}(k) = \begin{cases} \text{True}, & \text{if } \displaystyle\min_{c \in K_1(k)} O(B_c, Y, X) > \max_{c \in K_2(k)} O(B_c, Y, X) \\ \text{False}, & \text{otherwise.} \end{cases}$$
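A minimal sketch of this test, with the network output O(B_c, Y, X) abbreviated to one number per training case; the function name and the data layout are illustrative assumptions, not part of the original algorithm's notation.

```python
def lsc_holds(outputs, k1, k2):
    """outputs: {case index c: O(B_c, Y, X)}; k1, k2: index sets K1(k), K2(k).
    LSC(k) is True when every class-1 output exceeds every class-2 output."""
    if not k1 or not k2:
        return True  # with one class absent, the cases are trivially separable
    return min(outputs[c] for c in k1) > max(outputs[c] for c in k2)
```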

Figure 6. The Thinking Mechanism

Figure 7. The Cramming Mechanism

Figure 8. The Reasoning Mechanism

When LSC(k) = True, the following value v can be used for the purpose of correct classification regarding the output node:

$$v = \frac{\min_{c \in K_1(k)} O(B_c, Y, X) + \max_{c \in K_2(k)} O(B_c, Y, X)}{2}$$

so that

$$B \in \begin{cases} \text{Class 1}, & \text{if } O(B, Y, X) \ge v \\ \text{Class 2}, & \text{if } O(B, Y, X) < v. \end{cases}$$

Note that LSC(k) = True is a sufficient, but not a necessary, condition for the goal of two-class learning.

Table 9. Definition of all variables
Variable  Definition
B_c       the input vector of the cth case
X         (X_1^T, X_2^T, …, X_p^T)^T
Y         (Y_1^T, Y_2^T, …, Y_q^T)^T
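The separating value and the resulting decision rule can be sketched in the same style as the LSC test above; again, the names are illustrative.

```python
def separating_value(outputs, k1, k2):
    """v is the midpoint between the smallest class-1 output and the
    largest class-2 output, defined only once LSC(k) holds."""
    return (min(outputs[c] for c in k1) + max(outputs[c] for c in k2)) / 2.0

def classify(o, v):
    """Assign a new case to class 1 when its output O(B, Y, X) >= v,
    and to class 2 otherwise."""
    return 1 if o >= v else 2
```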

Figure 9. The proposed neural network initial architecture
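A minimal TensorFlow sketch of this initial architecture follows: six input nodes, one hidden node to start, and two bipolar output nodes. The tanh activations, the squared-error loss, and the gradient-descent step are illustrative choices for the sketch (written against the 1.x-era TensorFlow API), not the exact settings of the proposed two-stage algorithm.

```python
import tensorflow as tf

n_in, n_hidden, n_out = 6, 1, 2  # six inputs, one initial hidden node, two outputs

x = tf.placeholder(tf.float32, [None, n_in])   # input vectors (Table 7)
t = tf.placeholder(tf.float32, [None, n_out])  # desired (o1, o2) targets (Table 8)

# Hidden layer with tanh activation, so values lie in (-1, 1).
w_h = tf.Variable(tf.random_normal([n_in, n_hidden]))
b_h = tf.Variable(tf.zeros([n_hidden]))
hidden = tf.tanh(tf.matmul(x, w_h) + b_h)

# Two bipolar output nodes.
w_o = tf.Variable(tf.random_normal([n_hidden, n_out]))
b_o = tf.Variable(tf.zeros([n_out]))
output = tf.tanh(tf.matmul(hidden, w_o) + b_o)

loss = tf.reduce_mean(tf.square(t - output))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # sess.run(train_step, feed_dict={x: batch_x, t: batch_t})
```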

Experimental Results

The process of data collection and data processing averages six minutes a day over the three sources. In order to feed the data to the neural networks, we need to transform the fetched textual data into a numerical representation. We derive sentiment scores by counting positive and negative words with NTUSD. First, we read all the positive and negative words of NTUSD into a large Python dictionary. Then, we read each fetched document and make a word-by-word comparison with that dictionary. The emotion value starts at zero; each matched positive word adds one to it, and each matched negative word subtracts one. In this study, we do not distinguish between weak and strong words: the result is obtained by simply counting positive and negative words for every day.

Table 10. Example of the transformed data. Columns: date; Facebook emotion; volume of Facebook discussion; news emotion; news volume; forum emotion; forum volume (daily signed counts for 1/3, 1/4, and 1/5)
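A sketch of this daily scoring procedure, assuming the NTUSD positive and negative words have already been loaded; two Python sets are used here instead of one large dictionary, purely for clarity.

```python
def emotion_score(words, positive_words, negative_words):
    """Daily emotion value: +1 per NTUSD positive word, -1 per negative word.
    Weak and strong words are deliberately not distinguished."""
    score = 0
    for w in words:
        if w in positive_words:
            score += 1
        elif w in negative_words:
            score -= 1
    return score

# e.g. emotion_score(segmented_words, ntusd_positive, ntusd_negative)
```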

Figure 10. Volume of discussion and positive/negative emotion for Facebook, news, and the forum

The 120 daily transaction data are in chronological order, so we can regard them as time-series data. We believe that concept drifting and outliers exist in the exchange rate environment. In order to deal with the outlier problem in a concept-drifting environment, we implement the work of Huang et al. (2014) in the first part of the neural networks. First, we take the first window, where M = 1, into consideration. The training block is made up of N elements, and the testing block consists of B elements. In this study, the first training block is made up of the 1st to the 90th data, and the first testing block is composed of the 91st to the 95th data. Moreover, the initial SLFN is trained on the training block. The envelope module wraps the 95% of the data with the smallest residuals, treating the data far away from the fitting function as potential outliers.
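The window arithmetic can be sketched as follows; the function name is ours, while the sizes (90 training days, 5 testing days, a step of 5, six windows) are those stated above and in the next paragraph.

```python
def moving_windows(train=90, test=5, step=5, n_windows=6):
    """Yield (M, training day numbers, testing day numbers), 1-based."""
    for m in range(n_windows):
        start = m * step + 1
        train_days = list(range(start, start + train))
        test_days = list(range(start + train, start + train + test))
        yield m + 1, train_days, test_days

for m, tr, te in moving_windows():
    print(m, tr[0], tr[-1], te[0], te[-1])
# M=1: 1-90 / 91-95, M=2: 6-95 / 96-100, ..., M=6: 26-115 / 116-120
```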

Figure 11. The implementation of moving windows in this experiment

Over time, M becomes 2: the first 5 data are discarded, the training block moves to the 6th to the 95th data, and the testing block slides to the 96th to the 100th. This mechanism is repeated until there are no incoming data, so in the first part we have six windows.

After the first part of the learning process, we remove the potential outliers from the input data and, at the same time, obtain a trained SLFN. Then, we use the learning algorithm of the RN to learn. First of all, the learning reads the data in sequence. The second step is the thinking mechanism, which checks whether the LSC is satisfied. If the LSC is satisfied, the next training case is added to the neural network; if not, the process goes to the cramming mechanism, whose strategy is to recruit extra hidden nodes. The total number of hidden nodes would become large if extra hidden nodes were added frequently. In order to avoid a huge number of hidden nodes, the next step is the reasoning mechanism, which contains a pruning mechanism designed to decrease the total number of adopted hidden nodes: sequentially ignore one hidden node and redo the thinking mechanism. If the Linearly Separating

Condition is still satisfied, remove the nth hidden node; otherwise, recover it. When the learning process finishes, the testing data are put into the reasoning neural network to be classified. Then, we check whether the forecast matches the truth, and in the end we obtain the accuracy rate of the reasoning neural network. We divide the data sources into seven cases for forecasting the exchange rate movements.

1. Facebook

Figure 12. Facebook Forecasting Result (M = 1~3)

Figure 13. Facebook Forecasting Result (M = 4~6)

In the first part of the proposed neural networks, it took nearly 50 minutes per window to remove the potential outliers. Then, in the second part, it took almost 6 hours per window to classify the output into the three conditions. Finally, using Facebook to make the forecast, we obtained a 43% forecasting accuracy rate.

Table 11. Forecasting Accuracy Rate (Facebook); rows: actual movement (A), columns: forecast (F)

Window     A\F  1st NN, inliers (85)    1st NN, all (90)        2nd NN, all (85)        2nd NN, testing (5)
                ↑      -      ↓         ↑      -      ↓         ↑      -      ↓         ↑      -      ↓
M=1        ↑    47/47  N/A    N/A       52/52  N/A    N/A       47/47  N/A    N/A       N/A    N/A    2/2
(1-90)     -    N/A    1/1    N/A       N/A    1/1    N/A       N/A    1/1    N/A       N/A    N/A    N/A
           ↓    N/A    N/A    37/37     N/A    N/A    37/37     N/A    N/A    37/37     2/3    N/A    1/3
M=2        ↑    47/47  N/A    N/A       52/52  N/A    N/A       47/47  N/A    N/A       1/1    N/A    N/A
(6-95)     -    N/A    1/1    N/A       N/A    1/1    N/A       N/A    1/1    N/A       N/A    N/A    1/1
           ↓    N/A    N/A    37/37     N/A    N/A    37/37     N/A    N/A    37/37     2/3    N/A    1/3
M=3        ↑    44/44  N/A    N/A       49/49  N/A    N/A       44/44  N/A    N/A       N/A    N/A    N/A
(11-100)   -    N/A    2/2    N/A       N/A    2/2    N/A       N/A    2/2    N/A       N/A    N/A    N/A
           ↓    N/A    N/A    39/39     N/A    N/A    39/39     N/A    N/A    39/39     2/5    N/A    3/5
M=4        ↑    40/40  N/A    N/A       45/45  N/A    N/A       40/40  N/A    N/A       N/A    N/A    1/1
(16-105)   -    N/A    2/2    N/A       N/A    2/2    N/A       N/A    2/2    N/A       N/A    N/A    N/A
           ↓    N/A    N/A    43/43     N/A    N/A    43/43     N/A    N/A    43/43     1/4    N/A    3/4
M=5        ↑    38/38  N/A    N/A       42/42  N/A    N/A       38/38  N/A    N/A       2/3    N/A    1/3
(21-110)   -    N/A    2/2    N/A       N/A    2/2    N/A       N/A    2/2    N/A       N/A    N/A    N/A
           ↓    N/A    N/A    45/45     N/A    N/A    46/46     N/A    N/A    45/45     2/2    N/A    N/A
M=6        ↑    37/37  N/A    N/A       42/42  N/A    N/A       37/37  N/A    N/A       1/3    N/A    2/3
(26-115)   -    N/A    2/2    N/A       N/A    2/2    N/A       N/A    2/2    N/A       N/A    N/A    N/A
           ↓    N/A    N/A    46/46     N/A    N/A    46/46     N/A    N/A    46/46     1/2    N/A    1/2

2. News

Figure 14. News Forecasting Result (M = 1~3)

Figure 15. News Forecasting Result (M = 4~6)

In the first part of the proposed neural networks, it took approximately 50 minutes per window to remove the potential outliers. After that, in the second part, it took almost 6 hours per window to classify the output into the three conditions. Finally, using the financial news to make the forecast, we obtained a 37% forecasting accuracy rate.

Table 12. Forecasting Accuracy Rate (News); rows: actual movement (A), columns: forecast (F)

Window     A\F  1st NN, inliers (85)    1st NN, all (90)        2nd NN, all (85)        2nd NN, testing (5)
                ↑      -      ↓         ↑      -      ↓         ↑      -      ↓         ↑      -      ↓
M=1        ↑    49/49  N/A    N/A       52/52  N/A    N/A       49/49  N/A    N/A       1/2    N/A    1/2
(1-90)     -    N/A    1/1    N/A       N/A    1/1    N/A       N/A    1/1    N/A       N/A    N/A    N/A
           ↓    N/A    N/A    35/35     N/A    N/A    37/37     N/A    N/A    35/35     3/3    N/A    N/A
M=2        ↑    49/49  N/A    N/A       52/52  N/A    N/A       49/49  N/A    N/A       N/A    N/A    1/1
(6-95)     -    N/A    1/1    N/A       N/A    1/1    N/A       N/A    1/1    N/A       N/A    1/1    N/A
           ↓    N/A    N/A    35/35     N/A    N/A    37/37     N/A    N/A    35/35     3/3    N/A    N/A
M=3        ↑    46/46  N/A    N/A       49/49  N/A    N/A       46/46  N/A    N/A       N/A    N/A    N/A
(11-100)   -    N/A    2/2    N/A       N/A    2/2    N/A       N/A    2/2    N/A       N/A    N/A    N/A
           ↓    N/A    N/A    37/37     N/A    N/A    39/39     N/A    N/A    37/37     3/5    N/A    2/5
M=4        ↑    41/41  N/A    N/A       45/45  N/A    N/A       41/41  N/A    N/A       1/1    N/A    N/A
(16-105)   -    N/A    2/2    N/A       N/A    2/2    N/A       N/A    2/2    N/A       N/A    N/A    N/A
           ↓    N/A    N/A    42/42     N/A    N/A    43/43     N/A    N/A    42/42     3/4    N/A    1/4
M=5        ↑    39/39  N/A    N/A       42/42  N/A    N/A       39/39  N/A    N/A       1/3    N/A    2/3
(21-110)   -    N/A    2/2    N/A       N/A    2/2    N/A       N/A    2/2    N/A       N/A    N/A    N/A
           ↓    N/A    N/A    44/44     N/A    N/A    46/46     N/A    N/A    44/44     N/A    N/A    2/2
M=6        ↑    40/40  N/A    N/A       42/42  N/A    N/A       40/40  N/A    N/A       2/3    N/A    1/3
(26-115)   -    N/A    2/2    N/A       N/A    2/2    N/A       N/A    2/2    N/A       N/A    N/A    N/A
           ↓    N/A    N/A    43/43     N/A    N/A    46/46     N/A    N/A    43/43     2/2    N/A    N/A

3. Forum (PTT)

Figure 16. Forum Forecasting Result (M = 1~3)

Figure 17. Forum Forecasting Result (M = 4~6)

In the first part of the proposed neural networks, it took close to 40 minutes per window to remove the potential outliers. Further, in the second part, it took almost 5 hours per window to classify the output into the three conditions. Finally, using the forum to make the forecast, we obtained a 23% forecasting accuracy rate.

Table 13. Forecasting Accuracy Rate (Forum); rows: actual movement (A), columns: forecast (F)

Window     A\F  1st NN, inliers (85)    1st NN, all (90)        2nd NN, all (85)        2nd NN, testing (5)
                ↑      -      ↓         ↑      -      ↓         ↑      -      ↓         ↑      -      ↓
M=1        ↑    49/49  N/A    N/A       52/52  N/A    N/A       49/49  N/A    N/A       N/A    N/A    2/2
(1-90)     -    N/A    1/1    N/A       N/A    1/1    N/A       N/A    1/1    N/A       N/A    N/A    N/A
           ↓    N/A    N/A    35/35     N/A    N/A    37/37     N/A    N/A    35/35     2/3    N/A    1/3
M=2        ↑    48/48  N/A    N/A       52/52  N/A    N/A       48/48  N/A    N/A       N/A    N/A    1/1
(6-95)     -    N/A    1/1    N/A       N/A    1/1    N/A       N/A    1/1    N/A       N/A    1/1    N/A
           ↓    N/A    N/A    36/36     N/A    N/A    37/37     N/A    N/A    36/36     3/3    N/A    N/A
M=3        ↑    46/46  N/A    N/A       49/49  N/A    N/A       46/46  N/A    N/A       N/A    N/A    N/A
(11-100)   -    N/A    2/2    N/A       N/A    2/2    N/A       N/A    2/2    N/A       N/A    N/A    N/A
           ↓    N/A    N/A    37/37     N/A    N/A    39/39     N/A    N/A    37/37     4/5    N/A    1/5
M=4        ↑    41/41  N/A    N/A       45/45  N/A    N/A       41/41  N/A    N/A       N/A    N/A    1/1
(16-105)   -    N/A    2/2    N/A       N/A    2/2    N/A       N/A    2/2    N/A       N/A    N/A    N/A
           ↓    N/A    N/A    42/42     N/A    N/A    43/43     N/A    N/A    42/42     2/4    N/A    2/4
M=5        ↑    39/39  N/A    N/A       42/42  N/A    N/A       39/39  N/A    N/A       N/A    N/A    3/3
(21-110)   -    N/A    2/2    N/A       N/A    2/2    N/A       N/A    2/2    N/A       N/A    N/A    N/A
           ↓    N/A    N/A    44/44     N/A    N/A    46/46     N/A    N/A    44/44     2/2    N/A    N/A
M=6        ↑    40/40  N/A    N/A       42/42  N/A    N/A       40/40  N/A    N/A       1/3    1/3    1/3
(26-115)   -    N/A    2/2    N/A       N/A    2/2    N/A       N/A    2/2    N/A       N/A    N/A    N/A
           ↓    N/A    N/A    43/43     N/A    N/A    46/46     N/A    N/A    43/43     1/2    N/A    1/2

4. Facebook and News

Figure 18. Facebook and News Forecasting Result (M = 1~3)

Figure 19. Facebook and News Forecasting Result (M = 4~6)

In the first part of the proposed neural networks, it took just about one hour per window to remove the potential outliers. Moreover, in the second part, it took nearly 7 hours per window to classify the output into the three conditions. Finally, using Facebook and the financial news to make the forecast, we obtained a 53% forecasting accuracy rate.

Table 14. Forecasting Accuracy Rate (Facebook and News); rows: actual movement (A), columns: forecast (F)

Window     A\F  1st NN, inliers (85)    1st NN, all (90)        2nd NN, all (85)        2nd NN, testing (5)
                ↑      -      ↓         ↑      -      ↓         ↑      -      ↓         ↑      -      ↓
M=1        ↑    47/47  N/A    N/A       52/52  N/A    N/A       47/47  N/A    N/A       1/2    N/A    1/2
(1-90)     -    N/A    1/1    N/A       N/A    1/1    N/A       N/A    1/1    N/A       N/A    N/A    N/A
           ↓    N/A    N/A    37/37     N/A    N/A    37/37     N/A    N/A    37/37     2/3    N/A    1/3
M=2        ↑    47/47  N/A    N/A       52/52  N/A    N/A       47/47  N/A    N/A       N/A    N/A    1/1
(6-95)     -    N/A    1/1    N/A       N/A    1/1    N/A       N/A    1/1    N/A       N/A    N/A    1/1
           ↓    N/A    N/A    37/37     N/A    N/A    37/37     N/A    N/A    37/37     N/A    N/A    3/3
M=3        ↑    44/44  N/A    N/A       49/49  N/A    N/A       44/44  N/A    N/A       N/A    N/A    N/A
(11-100)   -    N/A    2/2    N/A       N/A    2/2    N/A       N/A    2/2    N/A       N/A    N/A    N/A
           ↓    N/A    N/A    39/39     N/A    N/A    39/39     N/A    N/A    39/39     2/5    N/A    3/5
M=4        ↑    40/40  N/A    N/A       45/45  N/A    N/A       40/40  N/A    N/A       1/1    N/A    N/A
(16-105)   -    N/A    2/2    N/A       N/A    2/2    N/A       N/A    2/2    N/A       N/A    N/A    N/A
           ↓    N/A    N/A    43/43     N/A    N/A    43/43     N/A    N/A    43/43     3/4    N/A    1/4
M=5        ↑    38/38  N/A    N/A       42/42  N/A    N/A       38/38  N/A    N/A       1/3    N/A    2/3
(21-110)   -    N/A    2/2    N/A       N/A    2/2    N/A       N/A    2/2    N/A       N/A    N/A    N/A
           ↓    N/A    N/A    45/45     N/A    N/A    46/46     N/A    N/A    45/45     N/A    N/A    2/2
M=6        ↑    37/37  N/A    N/A       42/42  N/A    N/A       37/37  N/A    N/A       2/3    N/A    1/3
(26-115)   -    N/A    2/2    N/A       N/A    2/2    N/A       N/A    2/2    N/A       N/A    N/A    N/A
           ↓    N/A    N/A    46/46     N/A    N/A    46/46     N/A    N/A    46/46     1/2    N/A    1/2

5. Facebook and Forum (PTT)

Figure 20. Facebook and Forum Forecasting Result (M = 1~3)
