
Running head: DISTRIBUTED ANN FOR HFT STRATEGIES

國立政治大學資訊管理學系 (Department of Management Information Systems, National Chengchi University)
碩士學位論文 (Master's Thesis)
指導教授 (Advisor): 劉文卿 博士

運用於高頻交易策略規劃之分散式類神經網路框架
Distributed Framework of Artificial Neural Network for Planning High-Frequency Trading Strategies

研究生 (Author): 何善豪
中華民國一百零三年七月 (July 2014)

Abstract

In this research, we introduce a distributed framework of artificial neural network (ANN) as a subproject under the research of a high-frequency trading (HFT) system. In the system, ANNs are used in the data mining process for identifying patterns in financial time series. We implement a framework for training ANNs on a distributed computing platform. We adopt Apache Spark to build the base computing cluster because it is capable of high-performance in-memory computing. We investigate a number of distributed backpropagation algorithms and techniques, especially ones for time series prediction, and incorporate them into our framework with some modifications. With various options for the details, we provide the user with flexibility in neural network modeling.

Keywords: high-frequency trading, time series, data mining, artificial neural network, multilayer perceptron, backpropagation, distributed computing, cluster computing

Abstract in Chinese

在這份研究中,我們提出一個分散式類神經網路框架,此框架為高頻交易系統研究下之子專案。在系統中,我們透過資料探勘程序發掘財務時間序列中的模式,其中所採用的資料探勘演算法之一即為類神經網路。我們實作一個在分散式平台上訓練類神經網路的框架。我們採用 Apache Spark 來建立底層的運算叢集,因為它提供高效能的記憶體內運算(in-memory computing)。我們分析一些分散式後向傳導演算法(特別是用來預測財務時間序列的),加以調整,並將其用於我們的框架。我們提供了許多細部的選項,讓使用者在進行類神經網路建模時有很高的彈性。

關鍵詞:高頻交易、時間序列、資料探勘、類神經網路、多層感知器、後向傳導、分散式運算、叢集運算

Table of Contents

Abstract
Abstract in Chinese
Table of Contents
List of Tables
List of Figures
Introduction
    High-Frequency Trading in the Markets around the World
    Research for High-Frequency Trading System
    Strategy Planning with Artificial Neural Network
    Training ANNs on a Distributed Computing Platform
Literature Review
    Time Series Prediction with ANN
    Distributed Backpropagation Algorithm
    Technologies for Parallel and Distributed Computing
Research Method
    Overview of the HFT System
        Architecture of the HFT system
        Data mining process in the HFT system
    Conceptual Model of the Distributed Framework of ANN
        Single-model training session
        Multi-model training session
    Architecture of the Distributed Framework of ANN
        Components for single-model training session
        Components for multi-model training session
Experiments
    Training Time
        Experiment Method
        Result
    Trading Simulation
        Experiment Method
        Testing for Firing Threshold
        Testing for Voting Threshold
Conclusion
References

List of Tables

Table 1 Training Time
Table 2 Result of Firing Threshold Test
Table 3 Result of Voting Threshold Test

List of Figures

Figure 1. Architecture of the HFT System
Figure 2. Data Mining Process
Figure 3. Training Set and Training Subsets
Figure 4. Single-model Training Session
Figure 5. Multi-model Training Session
Figure 6. Components for Single-model Training Session
Figure 7. Components for Multi-model Training Session
Figure 8. Training Time in Percentage

Introduction

High-frequency trading (HFT) is a special class of algorithmic trading and perhaps the one attracting the most attention in recent years. The basic concept of HFT is to make low profit margins by buying and selling securities rapidly over very short terms. While the margin is quite low, with a large volume of trades a considerable amount of return can be expected. In order to determine when to trade, we need to predict the trend over a very short period (usually a few seconds). Such prediction is a problem of high complexity. In this research we approach the problem with the artificial neural network (ANN), which is known as one of the methods suitable for solving complex problems.

High-Frequency Trading in the Markets around the World

Since the emergence of high-frequency trading, financial markets around the world have been experiencing its remarkable impact. In the United States, the percentage of trading volume accounted for by high-frequency trading firms once reached a peak of 73% (Kenett, Ben-Jacob, & Stanley, 2013). Although the trend of HFT seems to be declining, these firms still account for about half of the trading volume (Popper, 2012; Price, 2013). In the European market, the Bank of England suggests that HFT accounts for about 40% of the trading volume of equity orders (Haldane, 2010); similarly, the Wall Street Journal claims that HFT accounted for about 40% of trading value in 2012 (Price, 2013). In Japan and Australia, high-frequency traders also take a significant part, with 50% to 60% of trading volume and 27% of trading value respectively (Kingsley, Phadnis, & Stone, 2013; Price, 2013). As for the rest of the Asia-Pacific area (excluding Japan and Australia), however, HFT accounts for only 12% of trading value (Price, 2013).

One of the reasons for low HFT participation is the stamp duty (a form of financial transaction tax, FTT), because it can threaten the thin profit margins of high-frequency traders. Neither Japan nor Australia has a stamp duty, while the exchanges with a low percentage of HFT (namely Korea, Taiwan, Singapore and Hong Kong) have stamp duties imposed (Kingsley et al., 2013; Kwong, 2011; Price, 2013). Another reason for the hesitation to adopt HFT in Asia might be the concerns raised by the May 2010 "flash crash" in the US, which is still a main topic in the debates on HFT today (Grant, 2013; Ranasinghe, 2014). It is true that misuse of technology can cause disastrous consequences, which is why rules and regulations are needed. But the policy perspective on HFT is not going to be discussed here because it is outside the scope of this research.

Yet another reason is the current market regulations and limitations of the trading platform. In Taiwan, for example, equity orders were matched every 20 seconds, which is far slower than major exchanges around the world. However, the Taiwan Stock Exchange (TWSE) has been upgrading its trading platform in recent years (Kwong, 2011). Subsequently, starting from last year (July 1, 2013 to be specific), TWSE and the GreTai Securities Market (GTSM) have been progressively accelerating the cycle of order matching in order to "keep abreast of international practice" (TWSE, 2013; GTSM, 2013). Along with this policy, the frequency of information disclosure on the website of the Market Information System is also being increased accordingly. By the end of 2014, the cycle time is going to be reduced to 5 seconds and is expected to achieve order-by-order matching ultimately. These new measures are considered new doorways to high-frequency trading in the equity market in Taiwan.

Research for High-Frequency Trading System

For research, we have a project on HFT system development in the KMLab at NCCU. The system in development provides an environment for the simulation of high-frequency trades using historical data or realtime quotes and is able to visualize the data and results. Most importantly, the system is designed with the ability to formulate trading strategies by data mining.

The system allows the user to browse data of investment objects and plan a trading strategy by deciding the parameters of the strategy. With these parameters, the system can start a strategy planning process, using a data mining algorithm to formulate a trading strategy in the form of a mathematical model. Once a strategy is formulated, it is used for backtesting in the simulation environment. The strategy and the simulation result will be presented to the user as tables, charts and graphs. The user can examine the model of the strategy along with the result of backtesting to assess the trading strategy. The user can then decide whether to use the strategy to trade in the realtime simulation environment or, if the system is connected to a real-world trading platform, in a real market.

In other words, the key concept of the system is to discover patterns in short-term market trends with data mining algorithms. Currently, we have several algorithms such as logistic regression, genetic algorithm and, of course, artificial neural network. The algorithms are developed as modules used in data mining processes and trading actions, which makes the system flexible. The system is able to evolve as more algorithms are added. In this research, we focus on the method of artificial neural network.

Strategy Planning with Artificial Neural Network

An artificial neural network, or simply a neural network, is an information processing paradigm inspired by the central nervous systems of animals and is capable of machine learning and pattern recognition. Like a biological neural system, an artificial neural network consists of multiple interconnected neurons which receive and process input signals to produce output signals. A trading strategy, in our definition, is a mathematical model which takes the values of quotes and financial indicators at a specific time point as the variables and, through computation, produces a trading signal as the output. Hence in this context, the input of the neural network is quotes and indicators, and the output is a trading signal.

In this research, we adopt a commonly used model of feedforward ANN known as the multilayer perceptron (MLP), which uses the backpropagation algorithm for learning. In the backpropagation algorithm, a training pattern in the training set is a pair of an input vector and an output vector. Hence in this context, the input vector of a training pattern contains quotes and indicators at a specific time point, and the output vector contains a trading signal at the same time point. Consequently, the training set is a time series. Learning of an MLP is the adaptation of the weights (the multiplication effects applied to inputs) in the network. An MLP consists of multiple layers; each layer consists of multiple neurons; each neuron holds a weight vector. Therefore, an MLP trading strategy can be represented as a mathematical model with the network structure as the function, quotes and financial indicators as the variables, and a third-order tensor of weights (or a weight tensor) as the coefficients. (A brief sketch of this representation is given at the end of this chapter.)

Training ANNs on a Distributed Computing Platform

The training process of neural networks is usually time- and CPU-consuming, especially when using a large volume of time series data. For performance, scalability and fault tolerance, we decide to train our neural networks distributedly. Instead of the renowned Hadoop MapReduce, we choose to adopt Apache Spark to build the base computing cluster because it is able to execute distributed tasks in memory (avoiding the overheads of disk I/O) and promises up to 100 times the performance of Hadoop MapReduce (Xin et al., 2013). It has also been proven that Spark is highly scalable, with a cluster size of 100 nodes in the lab (Zaharia et al., 2012) and 80 nodes in production at Yahoo (Feng, 2013). Hence in this research, we propose a distributed framework of artificial neural network for planning high-frequency trading strategies.
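To make this representation concrete, the following minimal Scala sketch (written for this text, not taken from the system's source; all names and numbers are hypothetical) stores an MLP strategy as a third-order weight tensor and evaluates it on a vector of indicator values, using the standard logistic sigmoid as the transfer function.

```scala
object MlpStrategySketch {
  type Vec    = Array[Double]   // one neuron's weight vector (bias stored last)
  type Layer  = Array[Vec]      // all neurons in one layer
  type Tensor = Array[Layer]    // the whole network: a third-order weight tensor

  def sigmoid(x: Double): Double = 1.0 / (1.0 + math.exp(-x))

  // Forward pass: indicator values in, trading-signal value (between 0 and 1) out.
  def output(weights: Tensor, indicators: Vec): Vec =
    weights.foldLeft(indicators) { (input, layer) =>
      layer.map { w =>
        val net = input.zip(w.init).map { case (x, wi) => x * wi }.sum + w.last
        sigmoid(net)
      }
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical network: 3 indicators -> 2 hidden neurons -> 1 output signal.
    val weights: Tensor = Array(
      Array(Array(0.4, -0.2, 0.1, 0.0), Array(-0.3, 0.5, 0.2, 0.1)),
      Array(Array(0.7, -0.6, 0.05))
    )
    println(f"signal value = ${output(weights, Array(0.12, -0.03, 0.48)).head}%.4f")
  }
}
```

Planning a strategy then amounts to finding a weight tensor that maps indicator vectors to the desired trading signals, which is exactly what the training sessions described later do.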

Literature Review

Time Series Prediction with ANN

The application of artificial neural networks in predicting time series data can be traced back to the late 1980s. The research of White (1988) presents a method to identify patterns in asset price movements with neural networks. Although the author considers the result a failure, his research provides some valuable insights and modifications to contemporary machine learning techniques.

Jones et al. (1990) use neural networks for function approximation and time series prediction. They propose an improved learning algorithm which is able to reduce the size of the training set while maintaining the performance of a network with linear weights.

Kimoto, Asakawa, Yoda and Takeoka (1990) present a TOPIX (Tokyo Stock Exchange Prices Indexes) prediction system based on neural networks. Into the system they incorporate some improvements to the learning process of the neural network, including a modified backpropagation algorithm and the moving window (or sliding window) technique for selecting training sets. The system exhibits excellent results in trading simulation. Compared with multiple regression analysis, the neural network produces a much higher correlation coefficient.

Kaastra and Boyd (1996) provide a practical introductory guide to the design of a neural network for forecasting financial and economic time series. An eight-step procedure focusing on the backpropagation neural network is defined as follows.

1. Variable selection: The researcher must decide the input of the network by choosing which indicators to use. It depends on the research objectives whether to use technical indicators, fundamental indicators or both, and what the frequency of data should be.

2. Data collection: When collecting data, the researcher must consider cost, availability, reliability of the data source, consistency in the calculation over time, etc.

3. Data preprocessing: The input and output are rarely fed into the network in raw form. It is crucial to analyze and transform the data in order to minimize noise, detect trends, highlight important relationships, and flatten the distribution.

4. Training, testing, and validation sets: Common practice is to divide the time series into training, testing, and validation sets. There are different methods to determine these three sets. A rigorous approach is to use moving window testing.

5. Neural network paradigms: An improperly constructed neural network could cost more time and computing power without producing better results. It is important to properly determine the appropriate number of layers, the number of neurons in each layer, and the transfer function.

6. Evaluation criteria: There are various error functions which can be used for evaluation. The most common one minimized in neural networks is the sum of squared errors.

7. Neural network training: A neural network learns by iteratively processing examples and adjusting its weights. A properly assigned number of epochs (an epoch is an iteration of presenting all training data to the network) and values of learning rate and momentum can ensure performance and the chance to reach the global minimum of error.

8. Implementation: The environment in which the neural network will be deployed requires careful consideration. Also, deployed networks need periodic retraining to maintain their performance.

In addition to the steps above, the authors also list some key factors to successful application. Fortunately, with a self-developed system, such requirements are not difficult to achieve in our research.

Distributed Backpropagation Algorithm

One way to execute backpropagation distributedly is to partition neurons into several disjoint subsets and assign them to different processors or computing nodes in the distributed system. Each processor/node processes its local neurons and passes on the outputs to neurons in the next layer by communicating with other processors/nodes. There are two schemes of partitioning called vertical partitioning and hybrid partitioning (Ganeshamoorthy & Ranasinghe, 2008; Sudhakar & Murthy, 1998; Suresh, Omkar, & Mani, 2005; Yoon, Nang, & Maeng, 1990). While hybrid partitioning is claimed to improve performance over vertical partitioning by reducing computation and communication overheads, the network still requires frequent communication. Therefore, this neuron parallelism is considered suitable for systems with low inter-processor communication costs. The communication overheads would be overwhelming if we implemented neuron parallelism on a computing cluster.

Another type of parallelism is called training pattern parallelism (Andonie, Chronopoulos, Grosu, & Galmeanu, 1998; Dahl, McAvinney, & Newhall, 2008; Gu, Shen, & Huang, 2013; Liu, Li, & Miao, 2010; Pethick, Liddle, Werstein, & Huang, 2003). The neural network is not partitioned across the cluster; instead, it is copied onto nodes in the cluster. Each node maintains a clone of the neural network, learns a subset of the whole training set and returns its local weight deltas, which are later aggregated.

Various techniques for constructing subsets have been proposed. The most straightforward one is to divide the training set into disjoint subsets by the number of worker nodes. The system in the research of Gu et al. (2013) is implemented in this way and the partitions are kept static, i.e., each node holds its designated subset throughout the entire training process. The advantage of static partitioning is that subsets can be cached locally on their corresponding nodes to increase performance. Dahl et al. (2008) use a technique of random sampling to avoid possible anomalies caused by a predefined ordering of the data. In their algorithm, each node is presented with a randomly sampled subset of a certain size every epoch. A potential problem of random sampling is that each pattern might not be presented to the neural network the same number of times, allowing the possibility of sampling bias.

The frequency of updating weights also varies. In the standard backpropagation algorithm, weights can be updated after each presentation of a pattern or at the end of an epoch (Pethick et al., 2003). In a distributed algorithm, it is preferable to commit a batch update at the end of each epoch to reduce communication overheads. Each clone accumulates its own set of local weight deltas and returns it to the master node. Weight deltas of all clones are aggregated by taking either sums or averages, and updated into the weights on the original instance of the neural network on the master node (Andonie et al., 1998; Gu et al., 2013; Liu et al., 2010). A hybrid technique is used in the algorithm proposed by Dahl et al. (2008). The local weights on each node are updated on a per-pattern basis in order to support the momentum term in the delta function. At the end of each epoch, the weights are updated in batch with local weight deltas from all nodes.
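As a rough illustration of the training-pattern parallelism and per-epoch batch update summarized above (our own sketch under simplifying assumptions, not the algorithm of any single cited paper), the aggregation of local weight deltas can be written with Spark's RDD API along the following lines; localEpoch is a stub standing in for the per-node backpropagation routine.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DeltaAggregationSketch {
  type Weights = Array[Array[Array[Double]]]       // layer -> neuron -> weights (bias last)
  type Pattern = (Array[Double], Array[Double])    // (indicator vector, target signal vector)

  // Stub: in a real framework this would run backpropagation over the local
  // partition and return the accumulated weight deltas; here it returns zeros
  // just to keep the sketch runnable.
  def localEpoch(w: Weights, part: Iterator[Pattern]): Weights =
    w.map(_.map(_.map(_ => 0.0)))

  def add(a: Weights, b: Weights): Weights =
    a.zip(b).map { case (la, lb) =>
      la.zip(lb).map { case (na, nb) => na.zip(nb).map { case (x, y) => x + y } } }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("delta-aggregation").setMaster("local[*]"))
    var weights: Weights = Array(Array(Array(0.1, -0.2, 0.05)))   // tiny 2-input, 1-neuron network
    val patterns = sc.parallelize(Seq.fill(1000)((Array(0.3, 0.7), Array(1.0))), numSlices = 4)

    // One distributed epoch: broadcast the current weights, let every partition
    // learn its local patterns, then sum the returned delta tensors and commit
    // a single batch update on the driver (the master node).
    val bc = sc.broadcast(weights)
    val deltaSum = patterns.mapPartitions(p => Iterator(localEpoch(bc.value, p))).reduce(add)
    weights = add(weights, deltaSum)
    println(s"updated weights: ${weights.head.head.mkString(", ")}")
    sc.stop()
  }
}
```

Summing the deltas corresponds to the "sums" variant mentioned above; dividing the sum by the number of partitions would give the "averages" variant.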

Technologies for Parallel and Distributed Computing

To implement the framework (and the underlying system, of course), we consider criteria such as performance, scalability, and fault tolerance. Based on these criteria, we have surveyed a number of software technologies.

We use the programming language Scala mainly because it supports concurrency and data parallelism at the language level (in contrast to the API-level support in Java). This feature makes it easier to develop and maintain modules for parallel tasks which fully utilize the multi-core processors of modern computers. Another good reason for using Scala is the ability to integrate with Java seamlessly ("Scala Documentation", n.d.).

As mentioned above, we adopt Apache Spark for our computing platform. The core concept of Spark is the Resilient Distributed Dataset (RDD), a distributed memory abstraction which enables in-memory distributed computations and is fault-tolerant (Zaharia et al., 2012). It has been proven to be suitable for a number of applications including machine learning. It also provides native support for the Scala language.

Considering the concepts and technologies above, in our framework Spark is used to address parallelism across the computing cluster, and Scala is used to address parallelism within one computing node.
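A minimal sketch of the two levels of parallelism just described, again our own illustration with hypothetical data: Scala parallel collections exploit the cores within one node, while a Spark RDD spreads the same kind of work across the cluster. The CollectionConverters import is only needed on Scala 2.13, where parallel collections live in the separate scala-parallel-collections module.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import scala.collection.parallel.CollectionConverters._   // .par on Scala 2.13; built in on 2.12

object ParallelismLevelsSketch {
  def main(args: Array[String]): Unit = {
    // Within one node: a Scala parallel collection uses all local cores.
    val localSquares = (1 to 1000).par.map(i => i * i).sum

    // Across the cluster: Spark distributes the same work over worker nodes.
    val sc = new SparkContext(new SparkConf().setAppName("levels").setMaster("local[*]"))
    val clusterSquares = sc.parallelize(1 to 1000).map(i => i * i).reduce(_ + _)
    println(s"local = $localSquares, cluster = $clusterSquares")
    sc.stop()
  }
}
```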

Research Method

This research basically follows the eight-step guide proposed by Kaastra and Boyd (1996). However, our premise is quite different from that of the guide since we have an in-house system under construction, and this research involves merely a part of the whole. Some of the steps are actually of concern in other parts of the system and therefore outside the scope of this research. Before presenting the distributed framework of ANN, we first make a brief introduction to the HFT system. Next, the data mining process in the system will be explained. Finally, the conceptual model and the architecture of the framework will be introduced.

Overview of the HFT System

The system allows the user to decide how a trading strategy will be formulated by selecting the algorithm, investment object, technical analysis indicators (TAIs), time intervals of data and other parameters. These parameters constitute a strategy configuration. With a strategy configuration, the user can instruct the system to start the strategy planning process.

Based on the strategy configuration, the system initiates a data mining task. First, the input data is prepared according to the investment object, TAIs, time intervals and other parameters set by the user. Next, with other algorithm-specific parameters, the data mining algorithm selected by the user is executed on the input data. The output of the algorithm is formulated into a trading strategy in the form of a mathematical model. The final step is to use the strategy for backtesting, i.e., for trading in simulation using the historical data within the time intervals selected by the user.

The user can examine the visual representation of the mathematical model of the strategy and the result of backtesting showing the performance of the strategy. Depending on the user's assessment of the trading strategy, the strategy might be discarded or stored into the user's strategy library. The user can register strategies from the library for trading in realtime. The system is currently connected to an external simulated trading platform (from the same provider as our data source), which means it is capable of trading in real markets if connected to a real trading platform.

Architecture of the HFT system. The system is divided into several separate but intercommunicating subsystems, as illustrated by the UML component diagram in Figure 1. Each subsystem serves a dedicated purpose explained as follows.

GUI. The GUI is where the user interacts with the system. It exposes to the user the functions of account management, historical data browsing, strategy planning, and trading. Strategy planning involves the user selecting TAIs and other parameters for the strategy, and we provide as many TAIs as we are able to. Therefore, in our system, the first step (variable selection) of the guide by Kaastra and Boyd (1996) is not actually addressed, or rather is addressed by the user.

[Figure 1. Architecture of the HFT System: a UML component diagram of the GUI (AccountManagement, HistoricalDataBrowsing, StrategyPlanning, Trading), the Facade (HFTService), the Plan Center, the Trade Center, the Market State Center, the Data Access Center, and the external QuoteSource and TradingPlatform.]

Facade. The Facade is where the backend system is exposed to the frontend GUI as a service. It redirects requests from the frontend to other subsystems to execute the tasks and passes the results back to the frontend.

Plan Center. The Plan Center is where all strategies are formulated. It receives strategy planning requests, gathers the data required, executes mining algorithms, performs backtesting, and stores strategies via the Data Access Center. Strategies and backtesting results are returned to the GUI for visualization. Since the mining algorithms, including ANN, are developed as modules in this subsystem, steps three to seven (data preprocessing; training, testing, and validation sets; neural network paradigms; evaluation criteria; neural network training) of the guide by Kaastra and Boyd (1996) are addressed here.

Trade Center. The Trade Center is where strategies are put to use. It receives requests for strategy registration and activates the strategies such that when realtime data is passed from the Market State Center, the strategies can produce signals for trading actions. To actually make orders, it connects to the external trading platform from our data provider.

Market State Center. The Market State Center is where realtime quotes enter the system. It connects to the quote source from our data provider to receive quote ticks, and calculates OHLC prices (candlesticks) and TAIs which represent the current state of the market. The result of the calculation is passed to the Trade Center for immediate use and stored via the Data Access Center for future use. Consequently, step two (data collection) and a part of step three (data preprocessing) of the guide by Kaastra and Boyd (1996) are addressed here.

Data Access Center. The Data Access Center is where the system accesses the database. It is the only route between the system and the database. All other parts of the system must delegate data access tasks to it.

The system follows the paradigm of service-oriented architecture (SOA). The framework we propose in this research can also be viewed as a service.
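Purely as an illustration of this service-oriented decomposition (the thesis does not publish the system's actual interfaces, so every name and signature below is hypothetical), each subsystem can be modeled as a Scala trait and the Facade as a thin delegator:

```scala
object HftSubsystemsSketch {
  // Hypothetical domain types, only so the signatures compile.
  case class StrategyConfiguration(algorithm: String, parameters: Map[String, String])
  case class Strategy(id: String)
  case class BacktestResult(ppv: Double)

  trait PlanCenter        { def plan(config: StrategyConfiguration): (Strategy, BacktestResult) }
  trait TradeCenter       { def register(strategy: Strategy): Unit }
  trait MarketStateCenter { def latestIndicators(objectId: String): Map[String, Double] }
  trait DataAccessCenter  { def store(strategy: Strategy): Unit }

  // The Facade exposes the backend to the GUI and merely forwards requests.
  class Facade(plans: PlanCenter, trades: TradeCenter) {
    def planStrategy(config: StrategyConfiguration): (Strategy, BacktestResult) = plans.plan(config)
    def activate(strategy: Strategy): Unit = trades.register(strategy)
  }
}
```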

Data mining process in the HFT system. The distributed framework of ANN is used as one of the data mining algorithms for planning strategies in the Plan Center. The data mining process is composed of three stages, as illustrated by the UML activity diagram in Figure 2. The input is a strategy configuration and the output is a trading strategy along with its backtesting result. Each stage contains a few steps explained as follows.

[Figure 2. Data Mining Process: a UML activity diagram of the three stages Preprocessing, Mining and Postprocessing.]

Preprocessing. There are three steps in this stage, namely gathering, signalizing and transforming, as explained below.

1. In step one, the time series of indicators used for mining is gathered from the database based on the user's selection of investment object, TAIs and time intervals.

2. In step two, the time series of indicators is used to compute target trading signals at each time point, producing a time series of signals. The two time series are then mapped into a training set, a time series of indicator-signal pairs.

3. In the optional step three, the training set is transformed with one of the techniques including normalization, first differencing or natural log, as specified by the user (a sketch of these transformations is given at the end of this subsection).

Consequently, this stage is where step three (data preprocessing) of the guide by Kaastra and Boyd (1996) is mainly addressed.

Mining. In this stage, the algorithm, along with the algorithm-specific parameters set by the user, is executed on the training set prepared in the previous stage. In the case of the ANN algorithm, steps four to seven (training, testing, and validation sets; neural network paradigms; evaluation criteria; neural network training) of the guide by Kaastra and Boyd (1996) are addressed, with one exception: the validation set is not of consideration in this stage.

Postprocessing. There are two steps in this stage, namely formulating and backtesting, as explained below.

1. In step one, a trading strategy is formulated as a mathematical model with indicators as the variables and the output of the mining algorithm as the function and coefficients.

2. In step two, the formulated strategy is put to backtesting on the simulation set specified by the user. The strategy and the backtesting result constitute the output of the data mining task.

The gathering of the simulation set is also the last piece of step three (data preprocessing) of the guide by Kaastra and Boyd (1996), and the simulation set is effectively the validation set from step four (training, testing, and validation sets) of the guide.

In the system, stage one and stage three are handled by shared modules. In other words, any mining algorithm depends on the same modules for preprocessing and postprocessing. Some algorithms might have needs for data transformation for better results. We provide options and suggestions on the GUI while leaving the decision to the user.
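The optional transformations of step three admit a compact expression. The sketch below is our own illustration rather than the system's shared preprocessing module; it applies the three techniques to a univariate series, and in the system the same idea would be applied column-wise to each indicator.

```scala
object TransformSketch {
  // Min-max normalization into [0, 1].
  def normalize(xs: Seq[Double]): Seq[Double] = {
    val (lo, hi) = (xs.min, xs.max)
    if (hi == lo) xs.map(_ => 0.0) else xs.map(x => (x - lo) / (hi - lo))
  }

  // First differencing: x(t) - x(t - 1); the series becomes one element shorter.
  def firstDifference(xs: Seq[Double]): Seq[Double] =
    xs.sliding(2).map { case Seq(prev, next) => next - prev }.toSeq

  // Natural log (assumes strictly positive values such as prices).
  def naturalLog(xs: Seq[Double]): Seq[Double] = xs.map(math.log)

  def main(args: Array[String]): Unit = {
    val closes = Seq(100.0, 101.5, 101.0, 102.3)
    println(normalize(closes))
    println(firstDifference(closes))
    println(naturalLog(closes))
  }
}
```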

Conceptual Model of the Distributed Framework of ANN

With the system and the data mining process explained above, the distributed framework of artificial neural network can finally be introduced. In the framework, we adopt a commonly used model of feedforward ANN known as the multilayer perceptron (MLP), which uses a supervised learning algorithm called backpropagation for learning. The training set used in backpropagation is the output of the preprocessing stage, a time series of training patterns (indicator-signal pairs). The training set is divided into a learning set and a testing set. The learning set is used for the MLP to learn, that is, to adapt its weights in order to minimize the error between the actual output signals and the target signals. The testing set is used to test whether the error is minimized.

In our definition, the trading signal in a training pattern is either zero or one, where a signal of one suggests taking an action of either buying or selling, depending on the strategy configuration, and a signal of zero suggests that no action should be taken. Patterns with a target signal of one usually make up a small portion of a training set. In other words, most signals are zero, which might affect the mining algorithm by over-emphasizing signals of zero. On the contrary, the objective of a mining algorithm is to find reliable signals of one. Because of this characteristic of the training set, our mining algorithms should be able to handle two different paradigms of model building tasks, namely the single-model task and the multi-model task. The single-model task takes the whole training set as the input and returns one trained model. The multi-model task takes a collection of subsets, each of which is composed of all patterns with a signal of one and a part of the patterns with a signal of zero, as illustrated in Figure 3.

[Figure 3. Training Set and Training Subsets: (a) the full training set of interleaved signal-zero (0s) and signal-one (1s) parts; (b) the training subsets, each combining the signal-one parts with a portion of the signal-zero parts.]

To process the two different types of mining task, two variants of the distributed algorithm for MLP training are needed. In the single-model training session, the algorithm adopts pattern-level parallelism. On the other hand, in the multi-model training session, the algorithm adopts perceptron-level parallelism (or network-level parallelism), where each training subset is learned by one MLP and multiple MLPs are processed simultaneously and independently. The single-model training session can be modeled by the UML activity diagram in Figure 4, and the multi-model training session can be modeled by the UML activity diagram in Figure 5.

[Figure 4. Single-model Training Session: a UML activity diagram spanning TrainingSessionManager, MLPTrainer, MLPDistributed and the MLP clones, with the steps 1. creating a trainer; 2. initializing and starting; 3.1 distributing MLP clones; 3.2 partitioning the training set; 4.1 learning local partitions; 4.2 aggregating weight deltas; 4.3 updating MLP weights; 5. evaluating the MLP; 6. broadcasting new weights (if the evaluation failed); 7. returning the result (if it passed).]

Single-model training session. The single-model training session realizes distributed computing with pattern-level parallelism across the worker nodes of the cluster. The distributed backpropagation algorithm divides the computation of traditional backpropagation into smaller tasks and executes them on the cluster. The update of weights is a hybrid method of per-pattern updating and per-epoch updating. The steps of the single-model training session are explained as follows.

Step 1: Creating a trainer. A single-model training session receives a single training set as the input. An MLP trainer is created to train an instance of the distributed version of the MLP with the training set.

Step 2: Initializing and starting. The trainer is initialized with parameters from the strategy configuration. An MLP is constructed with the parameters selected by the user. The number of input neurons is the same as the number of indicators selected by the user. The number of output neurons is one, since we have only one trading signal in a training pattern. The number of hidden layers and neurons can be decided by the user. The transfer function is a standard sigmoid function. Learning rate and momentum are also decided by the user. We provide default values and suggestions about the user-adjustable parameters on the GUI. The training set is divided into a learning set and a testing set. With a new MLP and the learning set, the trainer then starts the training process. A part of step four (training, testing, and validation sets), step five (neural network paradigms) and a part of step seven (neural network training) of the guide by Kaastra and Boyd (1996) are addressed here.

Step 3: Distributing and partitioning. The newly created MLP is cloned according to the configuration of the computing cluster. The clones are then sent to the cluster and cached on worker nodes. The learning set is partitioned and each partition is assigned to one clone. A part of step four (training, testing, and validation sets) of the guide by Kaastra and Boyd (1996) is addressed here.

Step 4: Learning, aggregating and updating. Each MLP clone learns its local partition of the learning set through the backpropagation algorithm and updates its local weights. The weights are updated on a per-pattern basis because we include a momentum term in the delta function of backpropagation (a sketch of this local epoch follows Step 7). Once every clone finishes its local epoch, the weight deltas from all clones are aggregated into a delta tensor and returned to the master node. The aggregated delta tensor is then updated into the original instance of the MLP. A part of step seven (neural network training) of the guide by Kaastra and Boyd (1996) is addressed here.

Step 5: Evaluating MLP. At the end of each epoch, the MLP is evaluated with the testing set to determine whether to stop training. Step six (evaluation criteria) and a part of step seven (neural network training) of the guide by Kaastra and Boyd (1996) are addressed here.

Step 6: Broadcasting new weights. If the MLP did not pass the evaluation, the weight tensor of the original MLP is broadcast to the cluster, and another epoch is started. Each MLP clone synchronizes with the latest weight tensor and starts its new local epoch.

Step 7: Returning result. If the MLP passed the evaluation, the structure and weight values of the MLP, along with other information of the training session, are returned as the result of the mining algorithm.
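The hybrid update of Step 4 can be sketched as follows (our own simplification; gradient is a placeholder for the full backward pass, which is omitted). Each clone updates its local weights after every pattern while keeping a momentum term, and only the difference between its final and initial weights travels back to the master node at the end of the local epoch.

```scala
object LocalEpochSketch {
  type Weights = Array[Array[Array[Double]]]   // layer -> neuron -> weights
  type Pattern = (Array[Double], Array[Double])

  // Placeholder: the real backward pass would return dE/dw for one pattern.
  def gradient(w: Weights, p: Pattern): Weights = w.map(_.map(_.map(_ => 0.0)))

  def localEpoch(initial: Weights, patterns: Seq[Pattern], eta: Double, alpha: Double): Weights = {
    // Deep-copy the broadcast weights so the clone can update them freely.
    val w = initial.map(_.map(_.clone()))
    var prevDelta: Weights = initial.map(_.map(_.map(_ => 0.0)))   // momentum memory

    for (p <- patterns) {
      val g = gradient(w, p)
      // Per-pattern update with momentum: delta = -eta * gradient + alpha * previous delta.
      val delta: Weights = Array.tabulate(w.length) { l =>
        Array.tabulate(w(l).length) { n =>
          Array.tabulate(w(l)(n).length) { i => -eta * g(l)(n)(i) + alpha * prevDelta(l)(n)(i) }
        }
      }
      for (l <- w.indices; n <- w(l).indices; i <- w(l)(n).indices) w(l)(n)(i) += delta(l)(n)(i)
      prevDelta = delta
    }

    // What is returned to the master node: the accumulated local weight delta.
    Array.tabulate(w.length) { l =>
      Array.tabulate(w(l).length) { n =>
        Array.tabulate(w(l)(n).length) { i => w(l)(n)(i) - initial(l)(n)(i) }
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val w0: Weights = Array(Array(Array(0.1, 0.2, 0.0)))
    val delta = localEpoch(w0, Seq((Array(1.0, 0.5), Array(1.0))), eta = 0.3, alpha = 0.9)
    println(delta.head.head.mkString(", "))
  }
}
```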

[Figure 5. Multi-model Training Session: a UML activity diagram spanning TrainingSessionManager, MLPTrainer and MLP, with the steps 1.1 distributing trainers; 1.2 assigning subsets; 2. initializing and starting; 3. learning the training set; 4. evaluating the MLP; 5. returning the result; 6. collecting results.]

Multi-model training session. The multi-model training session realizes distributed computing with perceptron-level parallelism across the worker nodes of the cluster. It is in practice less complicated than the single-model session because it uses only the traditional backpropagation algorithm. Each perceptron is trained independently; there is no need for communication during the training process. The weights are updated on a per-pattern basis. The result of a multi-model session is a collection of the results from all trainers. The steps of the multi-model training session are explained as follows.

Step 1: Distributing and assigning. A multi-model training session receives a collection of training subsets as the input. For each subset, an MLP trainer is created to train an instance of a plain, simple MLP. These trainers are then distributed across the cluster to process their own subsets respectively.

Step 2: Initializing and starting. This step is the same as step two of a single-model session. From the perspective of a trainer, there is no difference between a simple MLP and a distributed MLP.

Step 3: Learning training set. The simple MLP is presented with the learning set once. The MLP learns by backpropagation with momentum, updating weights on a per-pattern basis. A part of step seven (neural network training) of the guide by Kaastra and Boyd (1996) is addressed here.

Step 4: Evaluating MLP. Since there is no difference between a simple MLP and a distributed one from the perspective of a trainer, this step is the same as step five of a single-model session.

Step 5: Returning result. For the same reason, this step is also the same as step six of a single-model session.

Step 6: Collecting results. The results from all trainers are gathered (not aggregated) into a collection and returned as the result of the multi-model session. The result will be converted into a multi-model strategy in the postprocessing stage. A multi-model strategy processes the input vector with all models and decides the final output signal by voting.
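Because every trainer in a multi-model session is independent, the whole session maps naturally onto a single Spark job. The sketch below is our own simplification with a stub in place of the real trainer: the training subsets are distributed, one plain MLP is trained per subset on whichever node the subset lands on, and the trained models are collected on the driver (the master node).

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MultiModelSessionSketch {
  type Pattern = (Array[Double], Array[Double])
  case class TrainedModel(weights: Array[Array[Array[Double]]], testError: Double)

  // Stub for a plain, single-node MLP trainer (traditional backpropagation with momentum).
  def trainOne(subset: Seq[Pattern]): TrainedModel =
    TrainedModel(Array(Array(Array(0.0))), testError = 0.0)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("multi-model").setMaster("local[*]"))

    // Each subset would hold all signal-one patterns plus one group of signal-zero patterns.
    val subsets: Seq[Seq[Pattern]] = Seq.fill(8)(Seq((Array(0.1, 0.2), Array(1.0))))

    // Perceptron-level parallelism: one independent trainer per subset,
    // with no communication between trainers during training.
    val models: Array[TrainedModel] =
      sc.parallelize(subsets, numSlices = subsets.length).map(trainOne).collect()

    println(s"trained ${models.length} models for one multi-model strategy")
    sc.stop()
  }
}
```

A voting multi-model strategy then only needs to apply every collected model to an input vector and compare the fraction of signal-one outputs with the voting threshold discussed in the experiments.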

Architecture of the Distributed Framework of ANN

The architecture of the distributed framework of ANN is designed to be object-oriented and can theoretically be implemented in any object-oriented programming language. In this research we implement it with Scala. The design is based on the real-world analogy that the trainer teaches the learner and the learner is tested by the evaluator to determine the result of learning. Since the two types of training sessions adopt different levels of parallelism, the components used in the two types of training session also vary slightly. The components and their relationships in each context are illustrated by Figure 6 and Figure 7.

Components for single-model training session. In the distributed algorithm for the single-model training session illustrated by Figure 4, the learning process is actually performed by multiple learning units, but in the architecture they are wrapped into one to simplify the relationships between components. In addition, a training set manager is designed for handling the training set. The functions of and relationships between components are explained as follows.

[Figure 6. Components for Single-model Training Session: MLPService, TrainingSessionManager with its TrainingSet input, MLPTrainer (Trainer), TrainingSetManager, MLPEvaluator (Evaluator), MLPDistributed (Perceptron), ClusterAdapter with SparkContext, and the MLP clones on the cluster.]

TrainingSessionManager. This component works as the entry point of the framework. It receives the training set and other parameters, instantiates the trainer component, starts the training process, and collects the result of training. It performs three major functions, namely instantiation, starting, and completion, as explained below.

1. Instantiation: The training session manager instantiates and assembles the components needed for the training session with the parameters received.

2. Starting: Since there is only one trainer, the training session manager instructs the trainer to start training by a local function call.

3. Completion: Once the trainer finishes training, the training session manager collects the necessary information, composes the simulation result and returns.

MLPTrainer. The trainer works as the coordinator during the training process. It controls the rest of the components. It performs two major functions, namely initialization and epoch control, as explained below.

1. Initialization: The trainer uses the training set and other parameters to initialize the training set manager, the evaluator, and the perceptron.

2. Epoch control: The trainer controls the learning process by presenting the learning set to the learner and, when the learner finishes each epoch, calling the evaluator to test the perceptron with the testing set. If the test is failed, another epoch will be started. If the test is passed or the maximum number of iterations is reached, the trainer stops iterating and finishes training.

TrainingSetManager. The training set manager provides access to the training set. It receives and encapsulates the training set. The learner and evaluator can only access the training set through the manager. It performs two major functions, namely sliding and dividing, as explained below.

1. Sliding: The training set manager slides the window of the training set progressively. The window is then used for dividing.

2. Dividing: The training set manager divides the current window of the training set into a learning set and a testing set. The content of the learning set and the testing set changes as the window slides.

MLPEvaluator. The evaluator tests the perceptron with the testing set. It performs two major functions, namely testing and evaluation, as explained below.

1. Testing: The evaluator computes the error of the MLP on the testing set. The error function is the mean squared error.

2. Evaluation: If there is no significant improvement in the error function, the MLP is considered to have reached convergence and is marked as passed; otherwise it is marked as failed.

MLPDistributed. The distributed version of the MLP connects to the Spark cluster through the adapter. It is exposed to the trainer as a single perceptron, while delegating the learning process to clone units on worker nodes. It performs four major functions, namely construction, duplication, distribution, and learning, as explained below.

1. Construction: This component constructs an instance of the MLP with the parameters received.

2. Duplication: This component creates clones from the original instance of the MLP, the number of clones depending on the configuration of the cluster.

3. Distribution: This component distributes clones to worker nodes, partitions the learning set, and assigns partitions to clones.

4. Learning: This component instructs the cluster to process the MLP clones distributedly and return weight delta tensors. The delta tensors are aggregated and updated into the original MLP.

Components for multi-model training session. The components used in a multi-model training session are mostly the same as those in a single-model session. The differences are that the simple MLP is used instead of the distributed MLP and that the training session manager connects to the cluster.

[Figure 7. Components for Multi-model Training Session: MLPService, TrainingSessionManager with its TrainingSet input, ClusterAdapter with SparkContext, MLPTrainer (Trainer), TrainingSetManager, MLPEvaluator (Evaluator), and the plain MLP (Perceptron).]

TrainingSessionManager. This component works as the entry point of the framework. It performs the same three functions as in a single-model session, but with slight differences.

1. Instantiation: The same as in a single-model session.

2. Starting: In a multi-model session, multiple trainers and perceptrons are created. Trainers are distributed across the cluster, and therefore the training session manager connects to the cluster through the adapter.

3. Completion: Once the trainers finish training, the training session manager collects information from all trainers, composes a collection of simulation results and returns.

ClusterAdapter. This component initializes an instance of SparkContext and connects to a Spark cluster. Theoretically, we can connect to a different distributed computing platform by replacing the adapter.

MLPTrainer. The trainer works as the coordinator during the training process. It works the same as in a single-model session, since both the simple MLP and the distributed MLP implement the interface (a trait in Scala) of Perceptron.

TrainingSetManager. The training set manager provides access to the training set. It works the same as in a single-model session. The only difference is that an instance of the training set manager handles only one training subset.
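A minimal sketch of the sliding and dividing functions of the training set manager (hypothetical names; the real component also re-divides the window as it slides): a window is moved over the chronologically ordered training set and each window is split, in time order, into a learning set and a testing set.

```scala
object TrainingSetManagerSketch {
  type Pattern = (Array[Double], Array[Double])   // (indicators, target signal)

  // Slide a fixed-size window over the chronologically ordered training set.
  def windows(series: Seq[Pattern], windowSize: Int, step: Int): Iterator[Seq[Pattern]] =
    series.sliding(windowSize, step)

  // Divide one window into a learning set and a testing set (e.g. 80% / 20%),
  // keeping the order so the testing set is always the most recent part.
  def divide(window: Seq[Pattern], learningRatio: Double = 0.8): (Seq[Pattern], Seq[Pattern]) =
    window.splitAt((window.length * learningRatio).toInt)

  def main(args: Array[String]): Unit = {
    val series = (1 to 100).map(i => (Array(i.toDouble), Array(if (i % 10 == 0) 1.0 else 0.0)))
    val (learning, testing) = divide(windows(series, windowSize = 50, step = 10).next())
    println(s"learning = ${learning.length} patterns, testing = ${testing.length} patterns")
  }
}
```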

MLPEvaluator. The evaluator tests the perceptron with the testing set. It works the same as in a single-model session.

MLP. The plain, simple MLP which learns locally on one computing node. It performs two major functions, namely construction and learning, as explained below.

1. Construction: This component constructs itself with the parameters received.

2. Learning: This component learns a learning set by the backpropagation (with momentum) algorithm.
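For reference, the per-pattern backpropagation update with a momentum term used by both kinds of MLP takes the standard textbook form (the thesis does not spell out its exact variant, so this is the usual definition rather than the system's code):

\[
\Delta w_{ij}(t) = \eta \, \delta_j \, o_i + \alpha \, \Delta w_{ij}(t - 1), \qquad
w_{ij}(t + 1) = w_{ij}(t) + \Delta w_{ij}(t),
\]

where \(\eta\) is the learning rate, \(\alpha\) the momentum, \(o_i\) the output of neuron \(i\), \(\delta_j\) the backpropagated error term of neuron \(j\), and \(\Delta w_{ij}\) the change applied to the weight connecting them.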

Experiments

Training Time

Experiment Method. To analyze the performance of the distributed framework of ANN, we designed the following experiment. The distributed framework is deployed on a cluster of desktop computers (each with a PC-class quad-core CPU and 4 GB of RAM).

Input data. The TAIs in our system are generated every second. In other words, a training set is a time series of training patterns with an interval of one second. In the experiment, we use one trading day (about 18,000 seconds) of training patterns as the training set. As mentioned previously, there are two types of training session, and therefore the training time experiment is conducted on the single-model training session and the multi-model training session respectively.

Independent variables. A worker node is a quad-core PC. To analyze the performance, we adjust the number of worker nodes (i.e., the number of cores) used for training. It is possible to use part of one worker node's resources (e.g., two cores or three cores) for computing, but in the experiment we only consider the situation of using all cores on each worker node.

Dependent Variables. We measure the time consumed by a complete training session, including the construction of the MLP(s), training until convergence, and the extraction and collection of the simulation result. The time needed for preprocessing and postprocessing is not taken into account.

Result. We use different numbers of cores to execute the single-model and multi-model training sessions respectively and measure the training time. The result is shown in Table 1.

Table 1
Training Time

Number of cores   Single-model (seconds)   Single-model (%)   Multi-model (seconds)   Multi-model (%)
4                 2608.865                 100.00%            98.948                  100.00%
8                 1658.437                  63.57%            57.682                   58.30%
12                1116.823                  42.81%            37.075                   37.47%
16                 844.588                  32.37%            26.936                   27.22%
20                 679.535                  26.05%            20.746                   21.00%
24                 569.673                  21.84%            17.135                   16.31%

It is surprising that the single-model session with 4 cores takes about 43.5 minutes (2608.865 seconds), while the multi-model session with 4 cores takes only about 1.6 minutes (98.948 seconds). When we dig deeper into the training process, we find that the single-model session takes more than 1,800 epochs to converge, while each MLP in the multi-model session takes about 250 to 500 epochs to converge. If we consider the input data for the multi-model session, the training subset of each model is smaller in size and more balanced in terms of signal-zero patterns and signal-one patterns (i.e., less biased). It makes sense that such characteristics result in faster convergence.
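Read as speedups instead of percentages, the same figures give 2608.865 / 569.673 ≈ 4.6x for the single-model session and 98.948 / 17.135 ≈ 5.8x for the multi-model session when moving from 4 to 24 cores (a sixfold increase in cores), which already hints at the lower distribution overhead of the multi-model session discussed next.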

(42) DISTRIBUTED ANN FOR HFT STRATEGIES. 35. Furthermore, the percentage columns in Table 1 can be illustrated by Figure 8. The percentage is calculated based on the training time using 4 cores. In other words, the training time using 4 cores is defined as 100%, and the rest cell are derived from the training time using n cores dividing by the training time using 4 cores. The same applies to both the singlemodel session and the multi-model session. 100% 80% 60%. io. Single-model. al. 20. 24. sit. 12 16 Number of cores. er. 8. Nat. 4. y. ‧. 0%. ‧ 國. 20%. 立. 學. 40%. 政 治 大. Multi-model. n. v i n CTraining Figure 8. in Percentage h e n gTime chi U. It shows in Figure 8 that the trend is steeper for the percentages of multi-model session. It means the overhead of distributed computing is less for the multi-model session. It makes sense that the multi-model session has a low overhead because each MLP model is trained independently. There is no communication required during the training process. On the other hand, the communication during the training process is intensive in the single-model.

(43) DISTRIBUTED ANN FOR HFT STRATEGIES. 36. session because the distributed MLP on the master node exchanges weights with MLP clones on worker nodes frequently. In conclusion, the multi-model training session is far more efficient than the singlemodel training session. In practice, it the single-model session would need a dedicated cluster since it requires intensive network communication and longer CPU time. On the other hand, the multi-model session should be deployed along with other systems of services because it is. 政 治 大. relatively lightweight in terms of network communication and CPU consumption.. 立. Trading Simulation. ‧ 國. 學. Experiment Method. As mention previously, a trading strategy will be activated in. ‧. the Trade Center. An active MLP strategy receives realtime TAIs as the input vector and. Nat. io. sit. y. compute trading signals as the output. In this experiment, we registered some MLP strategies. n. al. er. in the Trade Center and perform simulated trading. This experiment can also be considered as cross-validation.. Ch. engchi. i n U. v. Input data. As illustrated by Figure 3, there are two types of training input (a single training set and a collection of training subsets) from the preprocessing stage and each is processed by a dedicated type of training session, i.e., variant of the distributed algorithm. Furthermore, there three options for the sampling (grouping) of signal-zero parts of training subsets, as explained below. 1.. Random without replacement..

In the following simulation we compare single-model strategies and multi-model strategies with different grouping techniques. In addition, the period of both the training set and the trading simulation is one trading day, and the training set comes from the trading day immediately before the day of the simulation.

Independent variables. A number of parameters which might have an effect on the simulation result are selected, as explained below; a sketch of how the two thresholds are applied follows the list.

1. Firing threshold: The output value of an MLP is a real number between zero and one, but we require the output signal to be either zero or one. If the output value is greater than or equal to the firing threshold, the signal is fired as one, otherwise as zero.

2. Voting threshold: For a multi-model strategy, the final output signal is determined through voting. If the number of models firing signals of one exceeds a certain percentage of all models, the final output signal is one, otherwise zero. The percentage is defined as the voting threshold. In other words, the voting threshold is the confidence level required for supporting a signal-one decision.
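The sketch below shows how these two thresholds could be applied to raw MLP outputs; the function names fire and vote and their default values simply mirror the description above and are not the framework's actual API.

```scala
// Firing threshold: map a raw MLP output in [0, 1] to a binary signal.
def fire(output: Double, firingThreshold: Double = 0.5): Int =
  if (output >= firingThreshold) 1 else 0

// Voting threshold: for a multi-model strategy, the final signal is one only
// when the fraction of models firing one reaches the required confidence level.
def vote(outputs: Seq[Double], firingThreshold: Double = 0.5, votingThreshold: Double = 0.8): Int = {
  val ones = outputs.count(o => fire(o, firingThreshold) == 1)
  if (outputs.nonEmpty && ones.toDouble / outputs.size >= votingThreshold) 1 else 0
}
```

With the defaults, for example, five of six models firing one (about 0.83) yields a final signal of one, while four of six (about 0.67) does not.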

Dependent variables. The simulation result is analyzed with receiver operating characteristic (ROC) measures. We have selected the three most relevant measures as the dependent variables, as explained below.

1. True positive (TP): A positive is a signal of one, and therefore a true positive is a correctly predicted signal of one. In other words, for a buying put strategy, a true positive is a correct prediction of price rising (in a certain period in the future, e.g., in 10 seconds).

2. False positive (FP): On the other hand, a false positive is an incorrect prediction of price rising, which represents a trading loss.

3. Positive predictive value (PPV): The positive predictive value (also called precision) is the post-test probability of an outcome positive being a true positive. It is the ratio of the number of true positives to the number of outcome positives (i.e., the number of true positives plus the number of false positives). In practice, the value should be higher than 0.5; otherwise the strategy would incur a loss. The equation is given below, where P′ stands for the number of outcome positives; a small illustrative sketch follows the list.

PPV = TP / P′ = TP / (TP + FP)
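As a minimal illustration of how these measures could be obtained from a simulation log, the helper below counts TP and FP over paired predicted/actual signal sequences and derives PPV; the function name and the representation of signals as 0/1 integers are assumptions for exposition, not the system's actual code.

```scala
// Illustrative helper: given predicted and actual signals (1 = price rises
// within the horizon, 0 = otherwise), count true positives and false
// positives and derive PPV = TP / (TP + FP).
def rocMeasures(predicted: Seq[Int], actual: Seq[Int]): (Int, Int, Double) = {
  val pairs = predicted.zip(actual)
  val tp = pairs.count { case (p, a) => p == 1 && a == 1 }
  val fp = pairs.count { case (p, a) => p == 1 && a == 0 }
  val ppv = if (tp + fp > 0) tp.toDouble / (tp + fp) else 0.0
  (tp, fp, ppv)
}
```

Plugging in the single-model averages at FT = 0.5 from Table 2 below, for example, gives PPV = 120.7 / (120.7 + 104.8) ≈ 0.535, only slightly above the break-even level of 0.5.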

A valid strategy requires PPV to be higher than 0.5, yet a good strategy requires TP and PPV to be as high as possible. However, minimizing the error function does not guarantee maximized TP and PPV, which is why we need to adjust other parameters for better results.

Testing for Firing Threshold. The first test takes the firing threshold (FT) as the independent variable. The value of the firing threshold defaults to 0.5 (i.e., rounding of the output value) but can be increased. The simulation results of strategies with different levels of FT are shown in Table 2.

Table 2
Result of Firing Threshold Test

      Single-model          MMR                  MMS                  MMC
FT    TP     FP     PPV     TP    FP    PPV      TP     FP    PPV     TP    FP   PPV
0.5   120.7  104.8  0.535   96.6  15.9  0.859    102.2  16.4  0.862   57.2  2.9  0.952
0.6    77.3   63.0  0.551   45.4   0.6  0.988     70.1   3.4  0.953   33.2  0.2  0.993
0.7    74.5   62.3  0.544   39.0   0.3  0.994     38.5   0.3  0.994   22.8  0.0  1.000
0.8    50.7   39.6  0.561   25.4   0.0  1.000     26.1   0.0  1.000   11.6  0.0  1.000
0.9    32.3   27.9  0.536    4.8   0.0  1.000      7.6   0.0  1.000    4.8  0.0  1.000

Notes. FT = firing threshold, MMR = multi-model with random sampling, MMS = multi-model with stratification, MMC = multi-model with clustering.

When FT increases, all multi-model strategies show increasing PPV. As we would normally expect, a higher firing threshold yields a more conservative result.

A conservative trading strategy makes fewer orders, but the probability for each order to actually make a profit is higher. However, single-model strategies do not show the same kind of trend.

In addition, when comparing the four types of strategies, the single-model ones perform the worst. While producing more outcome positives (TP + FP) than multi-model strategies, single-model strategies produce PPVs only a bit over 0.5. MMR and MMS strategies produce similar results in terms of PPV, but MMS strategies produce more positives. MMC strategies produce the highest PPVs but the fewest positives. In general, there is no need to raise the firing threshold.

Testing for Voting Threshold. The second test takes the voting threshold (VT) as the independent variable. The value of the voting threshold defaults to 0.8 but can be adjusted between 0.5 and 1.0. The simulation results of strategies with different levels of VT are shown in Table 3. Only multi-model strategies are tested since there is no need for voting in single-model strategies. In addition, the firing threshold in this test is set to the default value of 0.5.

Table 3
Result of Voting Threshold Test

      MMR                    MMS                    MMC
VT    TP     FP     PPV      TP     FP     PPV      TP     FP     PPV
0.5   587.1  358.6  0.621    438.1  245.4  0.641    860.9  711.5  0.548
0.6   311.0  164.7  0.654    288.7  140.2  0.673    300.0  154.0  0.661
0.7   214.5   80.8  0.726    237.1  109.3  0.684    174.3   61.8  0.738
0.8    96.6   15.9  0.859    102.2   16.4  0.862     57.2    2.9  0.952
0.9    53.4    1.7  0.969     71.4    4.7  0.938     33.1    0.5  0.984
1.0    23.3    0.3  0.988     37.9    4.1  0.901     11.5    0.1  0.989

Notes. VT = voting threshold.

When VT increases, all multi-model strategies show increasing PPV. As we would normally expect, a higher voting threshold yields a more conservative result. Considering both the number of outcome positives and PPV, MMR strategies perform better at low levels of VT (0.5 and 0.6) but have little advantage over MMS strategies at VTs above 0.7. MMS strategies produce the fewest positives at low VTs but the most positives at VTs above 0.7. While MMS does not always produce the highest PPV, when taking outcome positives into account it seems to be the most stable. MMC strategies produce the most positives at a VT of 0.5, but the PPV is only 0.548. When the VT increases, the outcome positives produced by MMC decrease dramatically. Comparing all strategies, the default value of VT = 0.8 might be too conservative; it is acceptable to use 0.7 or an even lower level of VT.

Considering the results of all experiments, single-model strategies have no advantage over multi-model strategies and require far more time and computing resources. On the other hand, MMS seems to be the most recommendable of the three types of multi-model strategies, but MMR and MMC do have the advantage of risk aversion at medium and high levels of VT.

Conclusion

In the HFT system, trading strategies are formulated through data mining on financial time series. Among the several mining algorithms adopted in the system, this research focuses on the artificial neural network. Following the guide by Kaastra and Boyd (1996), we propose a distributed framework of ANN. The framework is designed with various options for the construction and training of a neural network. For performance, scalability and fault tolerance, we implement the framework using Scala and Apache Spark, while retaining the flexibility to build on a different underlying cluster.

Although the motivation for developing a distributed framework is to accelerate the single-model training session, the results of the experiments show that single-model strategies have no advantage over multi-model strategies. However, the framework also works well with multi-model training sessions, and multi-model strategies produce excellent prediction results in the trading simulation.

There is no optimal solution in neural network modeling; one can only find a better approximation through repeated trial-and-error processes. Hence the framework provides flexibility, allowing the user to experiment with various MLPs under different configurations. In this research, only a few combinations of parameters are used in the experiments, yet the results demonstrate the MLP's exceptional ability in predicting financial time series.

References

縮短集合競價秒數提升交易效能. (2013, May 28). TWSE 臺灣證券交易所. Retrieved March 14, 2014, from http://www.twse.com.tw/ch/about/press_room/tsec_news_detail.php?id=11972 [Reducing Cycle Time of Call Auction to Increase Performance. (2013, May 28). TWSE Taiwan Stock Exchange. Retrieved March 14, 2014, from http://www.twse.com.tw/ch/about/press_room/tsec_news_detail.php?id=11972]

Andonie, R., Chronopoulos, A. T., Grosu, D., & Galmeanu, H. (1998, October). Distributed backpropagation neural networks on a PVM heterogeneous system. In Parallel and Distributed Computing and Systems Conference (PDCS'98) (p. 555).

Dahl, G., McAvinney, A., & Newhall, T. (2008, February). Parallelizing neural network training for cluster systems. In Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks (pp. 220-225). ACTA Press.

Feng, A. (2013). Spark and Hadoop at Yahoo: Brought to you by YARN [Slides]. Retrieved March 21, 2014, from http://ampcamp.berkeley.edu/wp-content/uploads/2013/07/andy-feng-ampcamp-3-presentation-Spark_on_YARN.pdf

Ganeshamoorthy, K., & Ranasinghe, D. N. (2008, May). On the performance of parallel neural network implementations on distributed memory architectures. In Cluster Computing and the Grid, 2008. CCGRID'08. 8th IEEE International Symposium on (pp. 90-97). IEEE.

Grant, J. (2013, November 12). Asia stock exchanges and watchdogs grapple with HFT dilemma. Financial Times. Retrieved March 14, 2014, from http://www.ft.com/cms/s/0/5ff181f6-4b4c-11e3-8c4c-00144feabdc0.html

GTSM to Re-adjust Securities Matching Time to 15 seconds Starting July 1, 2013. (2013, June 28). GreTai Securities Market. Retrieved March 14, 2014, from http://hist.gretai.org.tw/en/about/news/otc_news/otc_news_detail.php?doc_id=783

Gu, R., Shen, F., & Huang, Y. (2013, October). A parallel computing platform for training large scale neural networks. In Big Data, 2013 IEEE International Conference on (pp. 376-384). IEEE.

Haldane, A. (2010). Patience and finance. Remarks at the Oxford China Business Forum, Beijing.

Jones, R. D., Lee, Y. C., Barnes, C. W., Flake, G. W., Lee, K., Lewis, P. S., & Qian, S. (1990, June). Function approximation and time series prediction with neural networks. In Neural Networks, 1990., 1990 IJCNN International Joint Conference on (pp. 649-665). IEEE.

Kaastra, I., & Boyd, M. (1996). Designing a neural network for forecasting financial and economic time series. Neurocomputing, 10(3), 215-236.

Kenett, D. Y., Ben-Jacob, E., & Stanley, H. E. (2013). How high frequency trading affects a market index. Scientific Reports, 3.

Kimoto, T., Asakawa, K., Yoda, M., & Takeoka, M. (1990, June). Stock market prediction system with modular neural networks. In Neural Networks, 1990., 1990 IJCNN International Joint Conference on (pp. 1-6). IEEE.

Kingsley, T., Phadnis, K., & Stone, G. (2013, June 11). HFT: Perspectives from Asia-Part I. Bloomberg Tradebook. Retrieved March 14, 2014, from http://www.bloombergtradebook.com/blog/hft-perspectives-from-asia-part-i/

Kwong, R. (2011, November 18). Taiwan Stock Exchange plans IT upgrade. Financial Times. Retrieved March 14, 2014, from http://www.ft.com/cms/s/0/f9803820-0fa4-11e1-a468-00144feabdc0.html

Liu, Z., Li, H., & Miao, G. (2010, August). MapReduce-based backpropagation neural network over large scale mobile data. In Natural Computation (ICNC), 2010 Sixth International Conference on (Vol. 4, pp. 1726-1730). IEEE.

Pethick, M., Liddle, M., Werstein, P., & Huang, Z. (2003, November). Parallelization of a backpropagation neural network on a cluster computer. In International Conference on Parallel and Distributed Computing and Systems (PDCS 2003).

Popper, N. (2012, October 14). High-speed trading no longer hurtling forward. The New York Times. Retrieved March 14, 2014, from http://www.nytimes.com/2012/10/15/business/with-profits-dropping-high-speed-trading-cools-down.html

Price, M. (2013, October 7). Asia goes slow on high-speed trading. MoneyBeat - The Wall Street Journal. Retrieved March 14, 2014, from http://blogs.wsj.com/moneybeat/2013/10/07/asia-goes-slow-on-high-speed-trading/

Ranasinghe, D. (2014, April 2). Are markets rigged? Asia experts weigh in on debate. CNBC. Retrieved April 6, 2014, from http://www.cnbc.com/id/101546147

Scala Documentation. (n.d.). Scala Documentation. Retrieved March 21, 2014, from http://docs.scala-lang.org/

Sudhakar, V., & Murthy, C. S. R. (1998). Efficient mapping of backpropagation algorithm onto a network of workstations. Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, 28(6), 841-848.

Suresh, S., Omkar, S. N., & Mani, V. (2005). Parallel implementation of back-propagation algorithm in networks of workstations. Parallel and Distributed Systems, IEEE Transactions on, 16(1), 24-34.

White, H. (1988, July). Economic prediction using neural networks: The case of IBM daily stock returns. In Neural Networks, 1988., IEEE International Conference on (pp. 451-458). IEEE.

Xin, R. S., Rosen, J., Zaharia, M., Franklin, M. J., Shenker, S., & Stoica, I. (2013, June). Shark: SQL and rich analytics at scale. In Proceedings of the 2013 international conference on Management of data (pp. 13-24). ACM.

Yoon, H., Nang, J. H., & Maeng, S. R. (1990, October). A distributed backpropagation algorithm of neural networks on distributed-memory multiprocessors. In Frontiers of Massively Parallel Computation, 1990. Proceedings., 3rd Symposium on the (pp. 358-363). IEEE.

Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauley, M., Franklin, M. J., Shenker, S., & Stoica, I. (2012, April). Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation (pp. 2-2). USENIX Association.
