
National Chengchi University

Step 6: Collecting results. The results from all trainers are gathered, not aggregated, into a collection and returned as the result of the multi-model session. The result is converted into a multi-model strategy in the postprocessing stage. A multi-model strategy processes the input vector with all models and decides the final output signal by voting.

Architecture of the Distributed Framework of ANN

The architecture of the distributed framework of ANN is object-oriented and can, in principle, be implemented in any object-oriented programming language. In this research we implement it in Scala. The design is based on a real-world analogy: the trainer teaches the learner, and the learner is tested by the evaluator to determine the result of learning. Since the two types of training sessions adopt different levels of parallelism, the components used in each type of session also vary slightly. The components and their relationships in each context are illustrated by Figure 6 and Figure 7.

Components for single-model training session. In the distributed algorithm for the single-model training session illustrated by Figure 4, the learning process is actually performed by multiple learning units, but in the architecture they are wrapped into one component to simplify the relationships between components. In addition, a training set manager is designed for handling the training set. The functions of, and relationships between, the components are explained as follows.

Figure 6. Components for Single-model Training Session

TrainingSessionManager. This component works as the entry point of the framework. It receives the training set and other parameters, instantiates the trainer component, starts the training process, and collects the result of training. It performs three major functions, namely instantiation, starting, and completion, as explained below.



1. Instantiation: The training session manager instantiates and assembles components needed for the training session with the parameters received.

2. Starting: Since there is only one trainer, the training session manager instructs the trainer to start training through a local function call.

3. Completion: Once the trainer finishes training, the training session manager collects the necessary information, composes the simulation result, and returns it.

MLPTrainer. The trainer works as the coordinator during the training process. It controls the rest of the components. It performs two major functions, namely initialization and epoch control, as explained below.

1. Initialization: The trainer uses the training set and other parameters to initialize the training set manager, the evaluator, and the perceptron.

2. Epoch control: The trainer controls the learning process by presenting the learning set to the learner and, when the learner finishes each epoch, calling the evaluator to test the perceptron with the testing set. If the test fails, another epoch is started. If the test passes or the maximum number of iterations is reached, the trainer stops iterating and finishes training.
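The epoch control described above can be sketched as follows. This is an illustrative sketch, not the framework's actual API: the perceptron is reduced to an error value, and `learnEpoch`, `tolerance`, and `maxIterations` are hypothetical names.

```scala
// Illustrative sketch of the trainer's epoch control (not the framework's exact API).
final case class EpochResult(epochs: Int, converged: Boolean)

def trainUntilConvergence(
    initialError: Double,
    learnEpoch: Double => Double,  // one epoch of learning: old error => new error
    tolerance: Double,             // minimum improvement considered significant
    maxIterations: Int
): EpochResult = {
  var error = initialError
  var epoch = 0
  var converged = false
  while (!converged && epoch < maxIterations) {
    val newError = learnEpoch(error)            // present the learning set to the learner
    converged = (error - newError) < tolerance  // evaluator: no significant improvement => passed
    error = newError
    epoch += 1
  }
  EpochResult(epoch, converged)
}
```

For example, a learner whose error halves each epoch under a tolerance of 0.01 converges after a handful of epochs, while a learner that keeps improving by more than the tolerance runs until the iteration cap.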

TrainingSetManager. The training set manager provides access to the training set. It receives and encapsulates the training set. The learner and the evaluator can only access the


training set through the manager. It performs two major functions, namely sliding and dividing, as explained below.

1. Sliding: The training set manager slides the window over the training set progressively. The current window is then used for dividing.

2. Dividing: The training set manager divides the current window of the training set into a learning set and a testing set. The contents of the learning set and the testing set change as the window slides.
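The sliding and dividing behavior can be sketched with Scala's standard `sliding` iterator; the names `Window` and `slideAndDivide`, and the learning/testing split ratio, are hypothetical rather than the framework's actual API.

```scala
// Illustrative sketch of the training set manager's sliding and dividing.
final case class Window[A](learningSet: Vector[A], testingSet: Vector[A])

def slideAndDivide[A](trainingSet: Vector[A], windowSize: Int,
                      step: Int, learnRatio: Double): Iterator[Window[A]] =
  trainingSet
    .sliding(windowSize, step)                 // slide the window progressively
    .map { w =>
      val split = (w.size * learnRatio).toInt  // divide the window into learning/testing
      Window(w.take(split), w.drop(split))
    }
```

Each emitted window contains a fresh learning/testing split, so the contents of both sets change as the window advances over the time series.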

MLPEvaluator. The evaluator tests the perceptron with the testing set. It performs two major functions, namely testing and evaluation, as explained below.

1. Testing: The evaluator computes the error of the MLP on the testing set. The error function is the mean squared error.

2. Evaluation: If there is no significant improvement in the error function, the MLP is considered to have converged and is marked as passed; otherwise it is marked as failed.
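A minimal sketch of the two functions, assuming a mean-squared-error computation and a simple improvement-based convergence check; the function names and the tolerance value are illustrative, not the framework's.

```scala
// Testing: mean squared error of the MLP's outputs against the targets.
def meanSquaredError(outputs: Seq[Double], targets: Seq[Double]): Double = {
  require(outputs.size == targets.size && outputs.nonEmpty)
  outputs.zip(targets).map { case (o, t) => (o - t) * (o - t) }.sum / outputs.size
}

// Evaluation: "passed" when the error no longer improves significantly.
def hasConverged(previousError: Double, currentError: Double,
                 tolerance: Double = 1e-4): Boolean =
  (previousError - currentError) < tolerance
```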

MLPDistributed. The distributed version of the MLP connects to the Spark cluster through the adapter. It is exposed to the trainer as a single perceptron while delegating the learning process to clone units on worker nodes. It performs four major functions, namely construction, duplication, distribution, and learning, as explained below.

1. Construction: This component constructs an instance of MLP with parameters received.

2. Duplication: This component creates clones of the original MLP instance; the number of clones depends on the configuration of the cluster.

3. Distribution: This component distributes clones to worker nodes, partitions the learning set, and assigns partitions to clones.

4. Learning: This component instructs the cluster to process the MLP clones in a distributed manner and to return weight delta tensors. The delta tensors are aggregated and applied to the original MLP.
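The duplicate-distribute-learn-aggregate cycle can be sketched as below. To keep the sketch self-contained it uses plain Scala collections in place of Spark RDDs, and the names (`distributedEpoch`, `cloneCount`, `localDelta`) are hypothetical.

```scala
// Illustrative sketch of one distributed epoch of the distributed MLP.
type Weights = Vector[Double]

def distributedEpoch(
    weights: Weights,
    learningSet: Vector[Vector[Double]],
    cloneCount: Int,
    localDelta: (Weights, Vector[Vector[Double]]) => Weights  // one clone's learning step
): Weights = {
  // Distribution: partition the learning set, one partition per clone.
  val partitionSize = math.max(1, math.ceil(learningSet.size.toDouble / cloneCount).toInt)
  val partitions = learningSet.grouped(partitionSize).toVector
  // Learning: each clone computes a weight delta tensor on its partition.
  val deltas = partitions.map(part => localDelta(weights, part))
  // Aggregation: average the deltas and apply them to the original MLP's weights.
  val averaged = deltas.transpose.map(ds => ds.sum / deltas.size)
  weights.zip(averaged).map { case (w, d) => w + d }
}
```

In the real framework the `map` over partitions runs on Spark worker nodes and the aggregation happens on the master; the data flow, however, is the same.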

Components for multi-model training session. The components used in the multi-model training session are mostly the same as those in the single-model session. The differences are that the simple MLP is used instead of the distributed MLP, and that the training session manager connects to the cluster.

Figure 7. Components for Multi-model Training Session



TrainingSessionManager. This component works as the entry point of the framework.

It performs the same three functions as in a single-model session, but with slight differences.

1. Instantiation: The same as in a single-model session.

2. Starting: In a multi-model session, multiple trainers and perceptrons are created. The trainers are distributed across the cluster, and therefore the training session manager connects to the cluster through the adapter.

3. Completion: Once the trainers finish training, the training session manager collects information from all of them, composes a collection of simulation results, and returns it.

ClusterAdapter. This component initializes an instance of SparkContext and connects to a Spark cluster. In principle, we can connect to a different distributed computing platform by replacing the adapter.

MLPTrainer. The trainer works as the coordinator during the training process. It works the same as in a single-model session, since both the simple MLP and the distributed MLP implement the Perceptron interface (a trait in Scala).
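The point of the shared trait can be illustrated as follows; the actual Perceptron trait in the framework may declare different members, so this is only a sketch of the idea that the trainer depends on the interface rather than on a concrete MLP.

```scala
// Illustrative sketch of the shared Perceptron trait (member names are hypothetical).
trait Perceptron {
  def learn(learningSet: Vector[Vector[Double]]): Unit
  def compute(input: Vector[Double]): Vector[Double]
}

// The simple MLP learns locally on one node ...
final class SimpleMLP extends Perceptron {
  def learn(learningSet: Vector[Vector[Double]]): Unit = ()  // local backpropagation
  def compute(input: Vector[Double]): Vector[Double] = input // placeholder forward pass
}

// ... while the distributed MLP delegates learning to clones on worker nodes,
// yet exposes the same interface, so MLPTrainer needs no changes.
final class DistributedMLP extends Perceptron {
  def learn(learningSet: Vector[Vector[Double]]): Unit = ()  // delegate to the cluster
  def compute(input: Vector[Double]): Vector[Double] = input // placeholder forward pass
}
```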

TrainingSetManager. The training set manager provides access to the training set. It works the same as in a single-model session. The only difference is that each instance of the training set manager handles only one training subset.


MLPEvaluator. The evaluator tests the perceptron with the training set. It works the

same as in a single-model session.

MLP. The plain, simple MLP, which learns locally on one computing node. It performs two major functions, namely construction and learning, as explained below.

1. Construction: This component constructs itself with parameters received.

2. Learning: This component learns from a learning set using the backpropagation (with momentum) algorithm.
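The momentum update at the core of the learning step can be sketched as below, assuming the standard rule Δw(t) = −η·∇E + α·Δw(t−1); the parameter names are generic, not the framework's.

```scala
// Illustrative sketch of one momentum weight update in backpropagation.
// eta = learning rate, alpha = momentum coefficient.
def momentumStep(weights: Vector[Double], gradient: Vector[Double],
                 previousDelta: Vector[Double], eta: Double, alpha: Double)
    : (Vector[Double], Vector[Double]) = {
  // delta_w(t) = -eta * gradient + alpha * delta_w(t-1)
  val delta = gradient.zip(previousDelta).map { case (g, d) => -eta * g + alpha * d }
  // Return the updated weights and the delta to remember for the next step.
  (weights.zip(delta).map { case (w, d) => w + d }, delta)
}
```

The momentum term reuses a fraction of the previous update, which smooths the trajectory of the weights and often speeds up convergence on noisy financial data.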


Experiments

Training Time

Experiment Method. To analyze the performance of the distributed framework of ANN, we designed the following experiment. The distributed framework is deployed on a cluster of desktop computers, each with a PC-class quad-core CPU and 4 GB of RAM.

Input data. The TAIs in our system are generated every second. In other words, a training set is a time series of training patterns with an interval of one second. In the experiment, we use one trading day (about 18,000 seconds) of training patterns as the training set. As mentioned previously, there are two types of training sessions, and therefore the training time experiment is performed on the single-model training session and the multi-model training session respectively.

Independent variables. A worker node is a quad-core PC. To analyze the performance, we adjust the number of worker nodes (i.e., the number of cores) used for training. It is possible to use part of a worker node's resources (e.g., two or three cores) for computing, but in the experiment we only consider the situation of using all cores on each worker node.

Dependent variables. We measure the time consumed by a complete training session, including the construction of the MLP(s), training until convergence, and extracting and collecting the simulation result. The time needed for preprocessing and postprocessing is not taken into account.

Result. We use different numbers of cores to execute the single-model and multi-model training sessions respectively and measure the training time. The result is shown in Table 1.

Table 1

It is surprising that the single-model session with 4 cores takes about 43.5 minutes (2608.865 seconds), while the multi-model session with 4 cores takes only 1.6 minutes (98.948 seconds). When we dig deeper into the training process, we find that the single-model session takes more than 1,800 epochs to converge, while each MLP in the multi-model session takes about 250 to 500 epochs to converge. If we consider the input data for the multi-model session, the training subset of each model is smaller in size and more balanced in terms of signal-zero patterns and signal-one patterns (i.e., less biased). It makes sense that such characteristics result in faster convergence.


Furthermore, the percentage columns in Table 1 are illustrated by Figure 8. The percentage is calculated relative to the training time using 4 cores. In other words, the training time using 4 cores is defined as 100%, and the remaining cells are derived by dividing the training time using n cores by the training time using 4 cores. The same applies to both the single-model session and the multi-model session.

Figure 8. Training Time in Percentage

Figure 8 shows that the trend is steeper for the percentages of the multi-model session. This means the overhead of distributed computing is lower for the multi-model session. It makes sense that the multi-model session has a low overhead, because each MLP model is trained independently and no communication is required during the training process. On the other hand, communication during the training process is intensive in the single-model


session, because the distributed MLP on the master node frequently exchanges weights with the MLP clones on the worker nodes.

In conclusion, the multi-model training session is far more efficient than the single-model training session. In practice, the single-model session would need a dedicated cluster, since it requires intensive network communication and longer CPU time. On the other hand, the multi-model session can be deployed alongside other systems or services, because it is relatively lightweight in terms of network communication and CPU consumption.

Trading Simulation

Experiment Method. As mentioned previously, a trading strategy is activated in the Trade Center. An active MLP strategy receives realtime TAIs as the input vector and computes trading signals as the output. In this experiment, we registered several MLP strategies in the Trade Center and performed simulated trading. This experiment can also be considered a form of cross-validation.

Input data. As illustrated by Figure 3, there are two types of training input (a single training set and a collection of training subsets) from the preprocessing stage, and each is processed by a dedicated type of training session, i.e., a variant of the distributed algorithm. Furthermore, there are three options for the sampling (grouping) of the signal-zero parts of the training subsets, as explained below.

1. Random without replacement.


2. Stratification: Signal-zero parts have low intra-group similarity and high inter-group similarity.

3. Clustering: Signal-zero parts have high intra-group similarity and low inter-group similarity.

In the following simulation we compare single-model strategies and multi-model strategies with different grouping techniques. In addition, the period of both the training set and the trading simulation is one trading day, and the training set is taken from the day just before that of the trading simulation.

Independent variables. A number of parameters which might have effect on the

simulation result are selected, as explained below.

1. Firing threshold: The output value of an MLP is a real number between zero and one, but we require the output signal to be either zero or one. If the output value is greater than or equal to the firing threshold, the signal is fired as one; otherwise it is zero.

2. Voting threshold: For a multi-model strategy, the final output signal is determined through voting. If the number of models firing a signal of one exceeds a certain percentage of all models, the final output signal is one; otherwise it is zero. The percentage is defined as the voting threshold. In other


words, the voting threshold is the confidence level required for supporting a signal-one decision.
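The two thresholds can be sketched together: each model fires a binary signal against the firing threshold, and the multi-model strategy then votes against the voting threshold. Function names are illustrative, not the system's actual API.

```scala
// Firing: the output signal is one when the output value reaches the firing threshold.
def fire(outputValue: Double, firingThreshold: Double = 0.5): Int =
  if (outputValue >= firingThreshold) 1 else 0  // default 0.5 = rounding of the output

// Voting: the final signal is one only when the fraction of models firing one
// reaches the voting threshold (the required confidence level).
def voteSignal(modelOutputs: Seq[Double], votingThreshold: Double,
               firingThreshold: Double = 0.5): Int = {
  val ones = modelOutputs.map(fire(_, firingThreshold)).count(_ == 1)
  val support = ones.toDouble / modelOutputs.size
  if (support >= votingThreshold) 1 else 0
}
```

For example, four models with outputs 0.9, 0.8, 0.6, and 0.2 fire three signals of one; with a voting threshold of 0.7 the support of 0.75 suffices, so the final signal is one.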

Dependent variables. The simulation result is analyzed with the receiver operating characteristic (ROC). We have selected the three most relevant measures as the dependent variables, as explained below.

1. True positive (TP): A positive is a signal of one, and therefore a true positive is a correctly predicted signal of one. In other words, for a buying put strategy, a true positive is a correct prediction of the price rising (in a certain period in the future, e.g., in 10 seconds).

2. False positive (FP): On the other hand, a false positive is an incorrect prediction of price rising which represents a trading loss.

3. Positive predictive value (PPV): The positive predictive value (also called precision) is the post-test probability of an outcome positive being a true positive. It is the ratio of the number of true positives to the number of outcome positives (i.e., the number of true positives plus the number of false positives). In practice, the value should be higher than 0.5; otherwise there would be a loss. The equation is as below, where P′ stands for the number of outcome positives.

PPV = TP/P′ = TP/(TP + FP)

A valid strategy requires the PPV to be higher than 0.5, yet a good strategy requires TP and PPV to be as high as possible. However, minimizing the error function does not guarantee maximized TP and PPV, which is why we need to adjust other parameters for better results.
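The PPV computation itself is a one-liner; for instance, 30 true positives with 10 false positives give a PPV of 0.75, which would satisfy the validity criterion above.

```scala
// PPV = TP / (TP + FP); a strategy is considered valid when PPV > 0.5.
def ppv(tp: Int, fp: Int): Double = tp.toDouble / (tp + fp)
```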

Testing for Firing Threshold. The first test takes the firing threshold (FT) as the independent variable. The firing threshold defaults to 0.5 (i.e., rounding of the output value) but can be increased. The simulation results of strategies with different levels of FT are shown in Table 2.

Table 2

Result of Firing Threshold Test

Notes. FT = firing threshold, MMR = multi-model with random sampling, MMS = multi-model with stratification, MMC = multi-model with clustering.

When FT increases, all multi-model strategies show increasing PPV. As we would normally expect, a higher firing threshold yields more conservative results. A conservative trading strategy makes fewer orders, but the probability of each order actually making a profit is higher. However, single-model strategies do not show the same kind of trend.

In addition, when comparing the four types of strategies, the single-model ones perform the worst. While producing more outcome positives (TP + FP) than the multi-model strategies, single-model strategies produce PPVs only slightly over 0.5. MMR and MMS strategies produce similar results in terms of PPV, but MMS strategies produce more positives. MMC strategies produce the highest PPVs but the fewest positives. In general, there is no need to raise the firing threshold.

Testing for Voting Threshold. The second test takes the voting threshold (VT) as the independent variable. The voting threshold defaults to 0.8 but can be adjusted between 0.5 and 1.0. The simulation results of strategies with different levels of VT are shown in Table 3. Only multi-model strategies are tested, since there is no need for voting in single-model strategies. In addition, the firing threshold in this test is set to its default value of 0.5.

Table 3

Result of Voting Threshold Test

VT 0.9: 53.4 1.7 0.969; 71.4 4.7 0.938; 33.1 0.5 0.984
VT 1.0: 23.3 0.3 0.988; 37.9 4.1 0.901; 11.5 0.1 0.989

Notes. VT = voting threshold.

When VT increases, all multi-model strategies show increasing PPV. As we would normally expect, a higher voting threshold yields more conservative results. Considering both the number of outcome positives and the PPV, MMR strategies perform better at low levels of VT (0.5 and 0.6) but have little advantage over MMS strategies at VTs above 0.7. MMS strategies produce the fewest positives at low VTs but the most positives at VTs above 0.7.

While MMS does not always produce the highest PPV, when outcome positives are taken into account it seems to be the most stable. MMC strategies produce the most positives at a VT of 0.5, but the PPV is only 0.548. When the VT increases, the number of outcome positives produced by MMC decreases dramatically. Comparing all strategies, the default value of VT = 0.8 might be too conservative; it is acceptable to use 0.7 or an even lower level of VT.

Considering the results of all experiments, single-model strategies have no advantage over multi-model strategies and require far more time and computing resources. On the other hand, MMS seems to be the most recommendable of the three types of multi-model strategies, but MMR and MMC do have the advantage of risk aversion at medium and high levels of VT.


Conclusion

In the HFT system, trading strategies are formulated through data mining on financial time series. Among the several mining algorithms adopted in the system, this research focuses on the artificial neural network. Following the guide by Kaastra and Boyd (1996), we propose a distributed framework of ANN. The framework is designed with various options for the construction and training of a neural network. For performance, scalability, and fault tolerance, we implement the framework using Scala and Apache Spark, while retaining the flexibility to switch to a different underlying cluster platform.

Although the motivation for developing a distributed framework was to accelerate the single-model training session, the results of the experiments show that single-model strategies have no advantage over multi-model strategies. However, the framework also works well with multi-model training sessions, and multi-model strategies produce excellent prediction results in the trading simulation.

There is no optimal solution in neural network modeling; one can only find a better approximation through repeated trial and error. Hence the framework provides flexibility, allowing the user to experiment with various MLPs under different configurations. In this research, only a few combinations of parameters were used in the experiments, yet the results demonstrate the MLP's exceptional ability to predict financial time series.


References

縮短集合競價秒數提升交易效能 [Reducing cycle time of call auction to increase performance]. (2013, May 28). TWSE 臺灣證券交易所 [Taiwan Stock Exchange]. Retrieved March 14, 2014, from http://www.twse.com.tw/ch/about/press_room/tsec_news_detail.php?id=11972

Andonie, R., Chronopoulos, A. T., Grosu, D., & Galmeanu, H. (1998, October). Distributed backpropagation neural networks on a PVM heterogeneous system. In Parallel and Distributed Computing and Systems Conference (PDCS'98) (p. 555).

Dahl, G., McAvinney, A., & Newhall, T. (2008, February). Parallelizing neural network training for cluster systems. In Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks (pp. 220-225). ACTA Press.

Feng, A. (2013). Spark and Hadoop at Yahoo: Brought to you by YARN [Slides]. Retrieved March 21, 2014, from
