Would Evolutionary Computation Help in Designs of ANNs in Forecasting Exchange Rates?

Would Evolutionary Computation Help in Designs of Artificial Neural Nets in Forecasting Financial Time Series?

Shu-Heng Chen

AI-ECON Research Group

Department of Economics

National Chengchi University

Taipei, Taiwan 11623

TEL: 886-2-9387308

FAX: 886-2-9390344

Masterlink Securities Corporation

19th Fl., 99, Tun Hua S. Rd., Sec. 2

Taipei, Taiwan, 106

TEL: 886-2-2325-5818-808

FAX: 886-2-2325-9408

E-mail: [email protected]

Key Words: Artificial Neural Networks, Evolutionary Computation, Genetic Algorithms, Tick-by-Tick Data, Forecasting.

1 Motivation and Introduction

Since the pioneering work by White (1988), the application of artificial neural networks (ANNs) to finance has enjoyed exponential growth in research and publications. The evidence accumulated over the last decade indicates that the success of financial applications of ANNs depends crucially on the design of the ANN.

Let us first consider the structure of the ANN. It is well known that nonlinear ARMA time series can be approximated more efficiently by recurrent neural nets than by their layered feedforward counterparts (Kamijo and Tanigawa, 1990; Lee and Park, 1992; Kuan and Liu, 1995). Also, in the presence of structural switches over time, it may be more appropriate to use modular neural networks (Kimoto, Asakawa, Yoda, and Takeoka, 1990). The many successful applications of radial basis neural networks in finance further suggest the relevance of this special class of artificial neural nets to finance (Hutchinson, Lo and Poggio, 1994).

Once the structure has been decided, the next complicated task in the design of the ANN is the architecture. In this area, one of the most frequently discussed issues is the number of hidden nodes. It is widely accepted by finance people that the number of hidden nodes is closely related to the issue of overfitting, and that a proper choice of the number of hidden nodes (hidden layer size) can enhance the generalization capability of ANNs. Researchers have tried different techniques to control hidden layer size, among them cross validation (Moody and Utans, 1995), early stopping (Kimoto and Asakawa, 1990; Hoptroff, Bramson and Hall, 1991), complexity regularization (Weigend, Huberman and Rumelhart, 1992), and construction-and-destruction algorithms (Jang and Lai, 1994).

In contrast, the significance of transfer functions has received less attention among finance people in their design of ANNs. While the relevance of the transfer function to the design of ANNs was shown earlier by Mani (1990), Lovell and Tsoi (1992) and, recently, by Sebald and Chellapilla (1998), its role in the financial domain has not been well documented. Even though the radial basis network has been promoted, the choice among sigmoid, hyperbolic tangent, or radial basis functions seems to be based largely on the hunch of the researcher rather than on the performance of each function. In their option pricing model, Hutchinson, Lo and Poggio (1994) compared the performance of the radial basis function and the sigmoid function and did not find a significant difference between them. However, in a similar application, Chen and Lee (1998) provided some evidence of the superiority of the hyperbolic tangent function over the sigmoid function.

Apart from hidden layer size and transfer functions, other potential contributing factors, such as the learning rate, momentum, and training methods, have also attracted the attention of some finance people, though their significance is far less clear.

The brief review above roughly summarizes the list of factors which have been considered relevant to a successful application of ANNs in the financial domain. Needless to say, the ANN space based on such a list is incredibly large. While experts' prior knowledge or domain-specific knowledge may reduce the size of this space, financial theory can help us little in this regard. At best, financial theory can only guide us in the selection of structure (recurrent or feedforward net) and inputs, and it can hardly say anything about the hidden layer size or transfer functions.

In addition to financial theory, statistical theory and information theory have also been employed to solve the design problem. In most cases, they are applied in such a manner that only one factor can be addressed at a time. For example, when the only design issue is the control of hidden layer size, the ANN can be designed in accordance with a statistical or an information-theoretic approach. However, if the choice of transfer functions is also at issue, then this approach may no longer be applicable. So, when both financial theory and statistical theory reach their limits, some search procedure must be adopted, and this is where evolutionary artificial neural nets come into play. Evolutionary ANNs (EANNs) can be considered a combination of ANNs and evolutionary search procedures. A prominent feature of EANNs is that they can evolve towards the fittest in a task environment without outside interference, and thus eliminate the tedious trial-and-error work of manually finding an optimal (fittest) ANN for a task about which little prior knowledge is available. Yao (1993) provided a survey of developments in this research area. Yao distinguished among three kinds of evolution in EANNs, i.e., the evolution of connection weights, architectures, and learning rules.

However, like many other "solutions", an EANN is not a panacea, and, what is worse, it may create more problems before it can solve any. One of the major issues in EANNs is the representation problem. While encoding connection weights is straightforward, encoding the architecture and the learning rule is a daunting task. As Yao (1993) stated:

Trying to develop a universal representation scheme which can specify any kind of dynamic behaviours of an EANN is clearly impractical, let alone the prohibitive long computation time required to search such a learning rule space.

Therefore, he further suggested that it may not be a good strategy to use evolutionary computation at all levels of the evolution of an ANN. So, the question left is: at what level can evolutionary computation be helpful for the design of EANNs in finance?

The last few years have seen a series of financial applications of EANNs. Margarita (1991) applied a genetic search to the weights of a recurrent network for trading FIAT shares on the Milan Stock Exchange. Dorsey, Johnson and Mayer (1995) found that the genetic algorithm (GA) performed well when optimizing neural networks (NNs). Sexton, Johnson and Dorsey (1995) also found the GA-optimized NN to outperform the back-propagation NN (BPNN) in out-of-sample tests, thereby addressing the problem of overfitting. Harrald and Kamstra (1997) used evolutionary programming in place of the more familiar backpropagation method to fine-tune the connection weights of feedforward nets for forecasting volatility. White (1998) showed that a Genetic Adaptive Neural Network (GANN) is able to approximate, to a high degree of accuracy, the complex, nonlinear option-pricing function used to produce the simulated option prices.

While these studies clearly evidenced the promise of using evolutionary computation in the design of ANNs, in Yao's categorization they are all concerned with the lowest level of evolution, namely, connection weights. Other dimensions of the design of an artificial neural net, such as the number of hidden layers, number of hidden nodes, inputs, and transfer functions, have not been tackled. Therefore, the purpose of this paper is to extend the current financial applications of EANNs to a higher level of evolution, and to evaluate its relevance.

Table 1: Stylized Facts of DM/US Returns: 6/3/98, 3799 Observations.

Procedure      Result      Procedure         Result
Skewness       -0.0196     Kurtosis          4.3912
Jarque-Bera    306.6182    AR(1)             -0.4202
BDS            23.997      BDS[AR(1)]        15.871
BDS[MA(1)]     13.798      Hurst Exponent    0.38

The BDS tests reported here were conducted by taking ε = 1 and m (embedding dimension) = 5. BDS[AR(1)] refers to the BDS test on the residuals after prewhitening by an AR(1) filter, and similarly for BDS[MA(1)].

Table 2: Designs of BPNNs

Setting               BP1   BP3   BP5   BP7   BP2   BP4   BP6   BP8
# of Inputs           1     15    2     16    1     15    2     16
# of Hidden Layers    1 (all settings)
# of Hidden Nodes     1-10 (all settings)
Initial Weights       -0.3 to 0.3 (all settings)
Transfer Function     tanh for the hidden layer (BP1, BP3, BP5, BP7); sigmoid for the hidden layer (BP2, BP4, BP6, BP8); linear for the output layer (all settings)
Learning Rate         0.1 or 0.8 for the hidden layer; 0.1 or 0.4 for the output layer
Momentum              0.1 or 0.6 for the hidden layer; 0.1 or 0.2 for the output layer
Fitness Function      Mean Square Error (MSE)

Following Lapedes and Farber (1987, 1988), the transfer functions for the output layers are all linear. Initial weights range from -0.3 to 0.3. The learning rate is set at either 0.1 or 0.8 for the hidden layer, and 0.1 or 0.4 for the output layer. Momentum is set at either 0.1 or 0.6 for the hidden layer, and 0.1 or 0.2 for the output layer. The energy function is defined by the mean square error.

2 Data Description

Our data set is composed of intradaily foreign-exchange-rate returns for the USD/DEM. The main source of this data set is the interbank spot prices published by Dow Jones on a multiple-contributors page (the TELERATE page). This covers markets worldwide, 24 hours a day. These prices are quotations of bid and ask prices, not actual trading prices. Furthermore, they are irregularly sampled and are therefore termed tick-by-tick prices. The specific date of the intraday data employed is June 3, 1998, which is free from the weekend or the Monday effect. There are 3,800 observations in the original series. By taking logs and first differences, we have 3,799 observations in the return series.
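The log-difference step can be illustrated with a minimal Python sketch. The function name `log_returns` and the sample quotes are hypothetical and serve only to show why a series of N prices yields N - 1 returns; they are not the TELERATE data.

```python
import math


def log_returns(prices):
    """Return r_t = ln(P_t) - ln(P_{t-1}) for each pair of consecutive quotes."""
    return [math.log(curr) - math.log(prev)
            for prev, curr in zip(prices, prices[1:])]


# Hypothetical quotes, purely for illustration.
quotes = [1.7801, 1.7803, 1.7800, 1.7806]
returns = log_returns(quotes)
print(len(quotes), len(returns))
```

Applied to a series of 3,800 quotes, the same function would return 3,799 values, matching the count given in the text.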

We first conducted a series of statistical procedures to examine the existence of stylized facts in this dataset. The procedures and results are exhibited in Table 1. From Table 1, we can see that this intraday data exhibits many stylized facts, such as being not IID, not normally distributed, and negatively correlated at the first order (Zhou, 1996). The data indicates the possible existence of hidden nonlinearity and hence opens a door for the application of ANNs.
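As a rough consistency check on the non-normality claim, the Jarque-Bera statistic can be recomputed from the skewness and kurtosis reported in Table 1. The sketch below assumes the standard textbook formula JB = n/6 (S^2 + (K - 3)^2 / 4) and assumes that the reported kurtosis is the raw (non-excess) value; the paper does not state which software or convention produced its figure.

```python
def jarque_bera(n, skewness, kurtosis):
    """JB = n/6 * (S^2 + (K - 3)^2 / 4), with K the raw (non-excess) kurtosis."""
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


# Inputs taken from Table 1 (3,799 return observations).
jb = jarque_bera(n=3799, skewness=-0.0196, kurtosis=4.3912)
print(round(jb, 2))
```

Under these assumptions the result comes out near 306.6, in line with the tabulated value once the rounding of the inputs is allowed for, and far above the 5% critical value of about 5.99 for a chi-square distribution with two degrees of freedom.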

3 Experimental Design

In the following, we discuss whether ANNs can help extract nonlinear structures from the data and, if so, whether we would further benefit from using evolutionary computation to guide the design of ANNs. This leads us to consider three classes of models. The first is the random-walk model. The second is a class of back-propagation ANNs with prespecified architectures. The third is a class of ANNs whose architectures are determined by evolution. For brevity, the second class of models is referred to as BPNNs, and the third as EANNs. As mentioned earlier, a higher level of evolution is what we have in mind. Therefore, our EANNs include evolution of the number of inputs, the number of hidden layers, the number of hidden neurons, transfer functions (sigmoid and tanh), the learning coefficients, and momentum. We used the first 2,500 observations as the training set, the next 500 observations as the validation set, and the last 800 observations as the testing set. The software NeuroGenetic Optimizer (version 2.5) is used to implement the computation.
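The chronological split can be sketched as follows. The helper `chronological_split` is a hypothetical name, and the placeholder series simply has the length implied by the 2,500 + 500 + 800 partition quoted above. The sketch assigns whatever remains after the first 3,000 observations to the test set, and it does not shuffle, since the ordering of ticks must be preserved.

```python
def chronological_split(series, n_train=2500, n_valid=500):
    """Split a time-ordered series into training, validation, and test segments."""
    train = series[:n_train]
    valid = series[n_train:n_train + n_valid]
    test = series[n_train + n_valid:]
    return train, valid, test


# Placeholder series with the length implied by the split in the text.
series = [0.0] * 3800
train, valid, test = chronological_split(series)
print(len(train), len(valid), len(test))
```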

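The relevant parameters for the BPNNs are given in Table 2. Eight settings of BPNNs are considered in this study. Four different input sets are used. They contain 1 ($\{r_{t-1}\}$), 2 ($\{r_{t-1}, \sigma^2_{t-1}\}$), 15 ($\{r_{t-1}, r_{t-2}, \ldots, r_{t-15}\}$), and 16 ($\{r_{t-1}, r_{t-2}, \ldots, r_{t-15}, \sigma^2_{t-1}\}$) inputs, where $r_{t-i}$ is the return for the $(t-i)$th tick, and $\sigma^2_{t-1}$ refers to the volatility of the last tick. $\sigma^2_{t-1}$ is derived by the simple moving-average formula. More precisely,

$$
\sigma^2_{t-1} =
\begin{cases}
\dfrac{\sum_{i=1}^{5} (r_{t-i} - \bar{r}_{5})^2}{4}, & \text{if the 2-input set is used;} \\[2ex]
\dfrac{\sum_{i=1}^{15} (r_{t-i} - \bar{r}_{15})^2}{14}, & \text{if the 16-input set is used,}
\end{cases}
\tag{1}
$$

where

$$
\bar{r}_{5} = \frac{\sum_{i=1}^{5} r_{t-i}}{5}
\tag{2}
$$

$$
\bar{r}_{15} = \frac{\sum_{i=1}^{15} r_{t-i}}{15}.
\tag{3}
$$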
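The moving-window volatility input in Equations (1) to (3) amounts to a sample variance over the last 5 or 15 returns. A minimal Python sketch follows; the function names, the list-based indexing (position `t` in a zero-based list), and the sample returns are assumptions made for illustration only.

```python
from statistics import variance


def moving_variance(returns, t, window):
    """Sample variance of r_{t-1}, ..., r_{t-window} (divisor window - 1)."""
    if t < window:
        raise ValueError("not enough past ticks for this window")
    return variance(returns[t - window:t])


def input_vector(returns, t, n_lags, with_volatility):
    """Build the lagged-return inputs, optionally followed by the volatility term."""
    lags = [returns[t - i] for i in range(1, n_lags + 1)]
    if with_volatility:
        # 5-tick window for the 2-input set, 15-tick window for the 16-input set.
        window = 5 if n_lags == 1 else 15
        lags.append(moving_variance(returns, t, window))
    return lags


# Hypothetical returns, purely for illustration.
sample = [0.01, -0.02, 0.03, -0.01, 0.02, 0.00]
print(input_vector(sample, t=5, n_lags=1, with_volatility=True))
```

`statistics.variance` divides by the number of observations minus one, which matches the divisors 4 and 14 in Equation (1).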

The number of hidden nodes is set from 1 to 10. For each number of hidden nodes and each setting of the learning rate and momentum, an ANN is randomly generated and trained using the backpropagation method and the early stopping rule. We then use the validation set to test the inferred networks, and the best among these 10 ANNs is kept for the post-sample forecasting competition. This procedure is applied to each setting, and in the end, eight BPNNs were selected out of 80 BPNNs. A similar procedure was used to generate 8 EANNs, each distinguished by its design of evolution. Table 3 summarizes their differences.
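The validation-based selection for one BPNN setting can be sketched as a simple loop over hidden-layer sizes. Everything here is illustrative: `select_best` is a hypothetical helper, and `toy_train_and_validate` is a stand-in for "train a network with h hidden nodes by backpropagation with early stopping and return its validation MSE", which the paper carried out inside the NeuroGenetic Optimizer.

```python
def select_best(hidden_sizes, train_and_validate):
    """Return (best_size, best_mse) over the candidate hidden-layer sizes."""
    scored = [(train_and_validate(h), h) for h in hidden_sizes]
    best_mse, best_size = min(scored)
    return best_size, best_mse


def toy_train_and_validate(hidden_nodes):
    """Stand-in validation MSE; a real version would train and score a network."""
    return (hidden_nodes - 3) ** 2 * 1e-4 + 1e-3


best_size, best_mse = select_best(range(1, 11), toy_train_and_validate)
print(best_size, best_mse)
```

Running this selection once per setting gives one survivor out of 10 candidates, and hence 8 survivors out of 80 across the eight settings.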

Notice that the "Max # of Inputs" assigned to each setting refers to the number of inputs potentially available. The actual number of inputs used is finalized by evolution. The input sets considered here are the same as for the BPNNs. The only difference is that we now let the genetic search decide which inputs should be included. Similarly, for "Max # of Hidden Layers", in the case where there are two hidden layers, the number of hidden layers actually used is determined by evolution. Given the parameters chosen in Table 3, the weights, the architectures, and the learning coefficients are all determined by the genetic search driven by the NeuroGenetic Optimizer, Version 2.5.
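To make the idea of a genetic search over network designs concrete, here is a minimal Python sketch of a generational GA with roulette-wheel selection. The population size, number of generations, crossover rate, and mutation rate are the values listed in Table 3. Everything else is an assumption for illustration: the bit-string genome layout, the `decode` mapping, the conversion of MSE into a selection weight via 1 / (1 + MSE), and the toy fitness function are hypothetical and do not describe the NeuroGenetic Optimizer's internal encoding.

```python
import random

POP_SIZE, GENERATIONS = 50, 50              # Table 3
CROSSOVER_RATE, MUTATION_RATE = 0.6, 0.001  # Table 3
N_INPUT_BITS, N_NODE_BITS = 16, 4           # hypothetical genome layout
GENOME_LEN = N_INPUT_BITS + N_NODE_BITS + 1


def decode(genome):
    """Map a bit list to a (hypothetical) network design."""
    inputs = [i for i in range(N_INPUT_BITS) if genome[i]]
    node_bits = genome[N_INPUT_BITS:N_INPUT_BITS + N_NODE_BITS]
    hidden_nodes = 1 + sum(b << k for k, b in enumerate(node_bits))  # 1..16
    transfer = "tanh" if genome[-1] else "sigmoid"
    return {"inputs": inputs, "hidden_nodes": hidden_nodes, "transfer": transfer}


def toy_validation_mse(genome):
    """Stand-in for training the decoded network and measuring validation MSE."""
    design = decode(genome)
    return 0.01 * abs(design["hidden_nodes"] - 4) + 0.001 * len(design["inputs"])


def evolve(validation_mse, rng):
    """Run the GA and return the best genome seen in any generation."""
    population = [[rng.randint(0, 1) for _ in range(GENOME_LEN)]
                  for _ in range(POP_SIZE)]
    best = min(population, key=validation_mse)
    for _ in range(GENERATIONS):
        # Roulette wheel: lower MSE gets a larger slice (assumed transformation).
        weights = [1.0 / (1.0 + validation_mse(g)) for g in population]
        children = []
        while len(children) < POP_SIZE:
            a = rng.choices(population, weights=weights, k=1)[0][:]
            b = rng.choices(population, weights=weights, k=1)[0][:]
            if rng.random() < CROSSOVER_RATE:
                cut = rng.randrange(1, GENOME_LEN)  # one-point crossover
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):
                for i in range(GENOME_LEN):
                    if rng.random() < MUTATION_RATE:
                        child[i] ^= 1  # bit-flip mutation
                children.append(child)
        population = children[:POP_SIZE]
        # Record keeping only; the best genome is not re-injected (no elitism).
        best = min(population + [best], key=validation_mse)
    return best


best_genome = evolve(toy_validation_mse, random.Random(0))
print(decode(best_genome))
```

In a real application, the stand-in fitness would be replaced by a routine that trains the decoded network on the training set and returns its MSE on the validation set, which is by far the most expensive step in the loop.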

4 Experimental Results

Based on the testing procedures described above, we ran a forecasting competition among 16 ANNs (8 BPNNs and 8 EANNs) and the random-walk (RW) model. The performance criteria are mean absolute error, mean squared error, Theil's U, hit ratio, and Sharpe ratio. Together, these five criteria enable us to see differences, if any, among these models, not only from a statistical standpoint, but also from an economic perspective. Furthermore, several statistical tests were run to see whether the differences are significant. The tests employed are the Granger-Newbold test, the sign test, the Pesaran-Timmermann test, and the Diebold-Mariano test.

Table 3: Designs of EANNs

Setting                   GA1   GA2   GA3   GA4   GA5   GA6   GA7   GA8
Max # of Inputs           1     15    2     16    1     15    2     16
Max # of Hidden Layers    1 (GA1-GA4); 2 (GA5-GA8)
Transfer Functions        tanh, sigmoid, linear
Learning Rate             [0.1, 0.8] for the hidden layer; [0.1, 0.4] for the output layer
Momentum                  [0.1, 0.8] for the hidden layer; [0.1, 0.4] for the output layer
Population Size           50
Generations               50
Crossover Rate            0.6
Mutation Rate             0.001
Fitness Function          Mean Square Error (MSE)
Selection Mode            Roulette Wheel Selection

The range of the learning rate is set at [0.1, 0.8] for the hidden layer, and [0.1, 0.4] for the output layer. The range of momentum is set at [0.1, 0.6] for the hidden layer, and [0.1, 0.2] for the output layer.

Due to space limits, we summarize only a few major experimental results here. First, can ANNs beat the random-walk model on intraday data? The answer is positive. All 16 of our ANNs beat the RW model on all five criteria above. This result is interesting in its own right. It indicates that, to beat the RW, what we need is a reasonable, rather than an optimal, design of the ANN. Second, are the differences found significant? The answer is "not always". In terms of the Granger-Newbold test and the sign test, the result is clear: all 16 ANNs beat the RW model at the 1% significance level. However, in terms of the Pesaran-Timmermann test, only two of these sixteen (BPNN4 and EANN7) outperformed the RW model significantly. As to the Diebold-Mariano test, none of these ANNs beat the RW significantly. Thus, the superiority of nonlinear models, such as ANNs, over the RW in forecasting is not statistically evident as far as high-frequency data are concerned.
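For readers unfamiliar with the Diebold-Mariano comparison, a minimal sketch of the statistic is given below. It assumes squared-error loss, one-step-ahead forecasts (so no autocovariance correction is applied), the 1/n variance estimator, and a normal approximation for the p-value. The paper does not report which variant it used, and the forecast errors shown are hypothetical.

```python
import math
from statistics import mean, pvariance


def diebold_mariano(errors_a, errors_b):
    """Return (DM statistic, two-sided p-value) for squared-error loss, h = 1."""
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    n = len(d)
    dm = mean(d) / math.sqrt(pvariance(d) / n)
    # Two-sided p-value under the standard normal approximation.
    p_value = math.erfc(abs(dm) / math.sqrt(2.0))
    return dm, p_value


# Hypothetical forecast errors for two competing models.
errors_a = [0.1, -0.2, 0.1, -0.1, 0.2, -0.1]
errors_b = [0.3, -0.1, 0.4, -0.2, 0.1, -0.3]
dm_stat, p_value = diebold_mariano(errors_a, errors_b)
print(dm_stat, p_value)
```

A negative statistic means the first model has the smaller average squared error. Whether that difference counts as significant depends on the p-value, which is how a model can win numerically on MSE and still fail this test.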

Now comes the core issue: can evolution help the design of ANNs? The results are mixed. While EANN8 took the lead in the MAE, MSE, and Theil's U criteria, BPNN4 was the top performer in the hit ratio and the Sharpe ratio. If we look further down the ranking, we see that more BPNNs ranked higher than the EANNs. In fact, in all five criteria, the worst performers belonged to the EANN family. Therefore, at the current stage, our findings can be summarized as follows.

ANNs in general can be anticipated to outperform, at least numerically, the RW in the intraday data.

To get this superior performance, what we need is only a reasonable design of the ANN rather than an optimal one.

While the design can be crucial for the financial application of ANNs, evolutionary computation may not help that much. So far, we have seen that evolutionary computation can be helpful only up to the level of connection weights. Beyond that, there is no evidence for any significant improvement.

References

[1] Harrald, P. G. and M. Kamstra (1997), "Evolving Artificial Neural Networks to Combine Financial Forecasts," IEEE Transactions on Evolutionary Computation, Vol. 1, No. 1, pp. 40-52.

[2] Hoptroff, R. C., J. Bramson, and T. J. Hall (1991), "Forecasting Economic Turning Points with Neural Nets," Proceedings of International Joint Conference on Neural Nets, Vol. 1, pp. 347-352.

[3] Hutchinson, J. M., A. W. Lo, and T. Poggio (1994), "A Nonparametric Approach to Pricing and Hedging Derivative Securities Via Learning Networks," Journal of Finance, Vol. XLIX, No. 3, pp. 851-889.

[4] Jang, G.-S. and F. Lai (1994), "Intelligent Trading of an Emerging Market," in Deboeck, G. J. (ed.), Trading on the Edge: Neural, Genetic, and Fuzzy Systems for Chaotic Financial Markets, Wiley.

[5] Kamijo, K.-I., and T. Tanigawa (1990), "Stock Price Pattern Recognition: A Recurrent Neural Network

[6] Kimoto, T., K. Asakawa, M. Yoda, and M. Takeoka (1990), "Stock Market Prediction System with Modular Neural Networks," Proceedings of International Joint Conference on Neural Networks, Vol. 1, pp. 1-6.

[7] Kuan, C.-M. and T. Liu (1995), "Forecasting Exchange Rates Using Feedforward and Recurrent Neural Networks," Journal of Applied Econometrics, Vol. 10, pp. 347-364.

[8] Lee, C. H. and K. C. Park (1992), "Prediction of Monthly Transition of the Composition of Stock Price Index Using Recurrent Back-Propagation," Artificial Neural Network 2, pp. 1629-1632.

[9] Lovell, D. R. and A. C. Tsoi (1992), "The Performance of the Neocognitron with Various S-cell and C-cell Transfer Functions," Intelligent Machines Laboratory, Department of Electronic Engineering, Univ. of Queensland.

[10] Mani, G. (1990), "Learning by Gradient Descent in Function Space," Proceedings of International Conference on System, Man and Cybernetics, pp. 242-247.

[11] Margarita, S. (1991), "Neural Network, Genetic Algorithms and Stock Trading," Artificial Neural Networks 1, pp. 1763-1766.

[12] Moody, J., and J. Utans (1995), "Architecture Selection Strategies for Neural Networks: Application to Corporate Bond Rating Prediction," in A.-P. Refenes (ed.), Neural Networks in the Capital Market, John Wiley, New York.

[13] Sebald, A. V. and K. Chellapilla (1998), "On Making Problems Evolutionary Friendly," in V. W. Porto, N. Saravanan, D. Waagen and A. E. Eiben (eds.), Evolutionary Programming VII, pp. 271-290.

[14] Weigend, A. S., B. A. Huberman and D. E. Rumelhart (1992), "Predicting Sunspots and Exchange Rates with Connectionist Networks," Nonlinear Modeling and Forecasting, SFI Studies in the Science of Complexity 7, pp. 395-432.

[15] White, A. J. (1998), "A Genetic Adaptive Neural Network Approach to Pricing Options: A Simulation Analysis," Journal of Computational Intelligence, Vol. 6, No. 2, pp. 13-23.

[16] White, H. (1988), "Economic Prediction Using Neural Networks: The Case of IBM Daily Stock Returns," in Proceedings of the IEEE International Conference on Neural Networks, Vol. II, pp. 451-458.

[17] Yao, X. (1993), "A Review of Evolutionary Artificial Neural Networks," International Journal of Intelligent Systems, Vol. 8, No. 4, pp. 539-567.

[18] Zhou, B. (1996), "High-frequency Data and Volatility in Foreign-Exchange Rate," Journal of Business and Economic Statistics, Vol. 14, No. 1, pp. 45-52.
