Evolutionary artificial neural networks for hydrological systems forecasting

Yung-hsiang Chen a,b, Fi-John Chang a,*

a Department of Bioenvironmental Systems Engineering, National Taiwan University, Taiwan, ROC
b Water Resources Agency, Ministry of Economic Affairs, Taiwan, ROC
* Corresponding author. Tel.: +886 2 23639461; fax: +886 2 23635854. E-mail address: changfj@ntu.edu.tw (F.-J. Chang).

Journal of Hydrology, doi:10.1016/j.jhydrol.2009.01.009

Article info

Article history:
Received 20 June 2008
Received in revised form 8 December 2008
Accepted 12 January 2009

This manuscript was handled by K. Georgakakos, Editor-in-Chief, with the assistance of Enrique R. Vivoni, Associate Editor.

Keywords:
Evolutionary artificial neural network (EANN)
Genetic algorithm (GA)
Time series
Forecasting
Hydrology
Water resources

Summary

The conventional ways of constructing artificial neural networks (ANNs) for a problem generally presume a specific architecture and do not automatically discover network modules appropriate for specific training data. Evolutionary algorithms are used to automatically adapt the network architecture and connection weights according to the problem environment without substantial human intervention. To improve on the drawbacks of the conventional optimal process, this study presents a novel evolutionary artificial neural network (EANN) for time series forecasting. The EANN has a hybrid procedure, including the genetic algorithm and the scaled conjugate gradient algorithm, where the feedforward ANN architecture and its connection weights of neurons are simultaneously identified and optimized. We first explored the performance of the proposed EANN for the Mackey–Glass chaotic time series. The performance of the different networks was evaluated. The excellent performance in forecasting of the chaotic series shows that the proposed algorithm concurrently possesses efficiency, effectiveness, and robustness. We further explored the applicability and reliability of the EANN in a real hydrological time series. Again, the results indicate the EANN can effectively and efficiently construct a viable forecast module for the 10-day reservoir inflow, and its accuracy is superior to that of the AR and ARMAX models.

© 2009 Elsevier B.V. All rights reserved.

Introduction

In Darwin's concept of survival of the fittest, creatures must be able to adjust to the environment for survival. Learning and evolution are two fundamental forms of adaptation. Learning refers to the process by which an individual modifies its behaviour to adjust to the environment at different stages of its life. An individual that learns well is capable of adapting to similar environments. In contrast with the learning of single individuals, evolution refers to the process by which a population of parent species passes genes to its offspring through reproduction, crossover, and mutation over generations. Fig. 1 shows a comparison of an individual's learning and a population's evolution. Among various research fields, artificial neural networks (ANNs) and evolutionary algorithms (EAs) are typical applications of learning and evolution, respectively.

With the ability to process massive information and deal with high non-linearity, ANNs have been widely studied and successfully applied to various fields, e.g., hydrology and water resources, in recent years (Hsu et al., 1995; Sajikumar and Thandaveswara, 1999; Chang et al., 2005, 2007; Karunasinghe and Liong, 2006; Sahoo and Ray, 2006; Chiang et al., 2007; Kim and Kim, 2008). Most present ANNs, however, rely heavily on human experts who have sufficient knowledge about the different aspects of the network and the problem domain (Abraham, 2004). These generally presume a specific architecture and do not automatically discover network modules appropriate for specific training data. The selection of network architecture has a significant influence on the performance of ANNs. Too simple a network architecture might not meet the demand for accuracy, while a too complicated architecture might reduce the generalization ability of the network due to over-fitting. As the complexity of the problem increases, manual design becomes more difficult. The conventional ways to design network architecture involve destructive algorithms (Abrahart et al., 1998) and constructive algorithms (Kwok and Yeung, 1997). However, applying those two algorithms to increase or decrease the hidden layers or neurons starts from a predefined network architecture, which usually misleads the search toward a structurally restricted local optimum (Angeline et al., 1994; Castillo et al., 2007). The major challenge of applying ANNs is how to evolve a unique neural network architecture and its corresponding weight values for a specific problem.

The genetic algorithm (GA), a branch of EAs, was proposed by Holland (1975) based on Darwin's concept of "survival of the fittest". The GA regards optimization problems as natural evolutionary species and transfers the search process into an evolutionary process. Having global search ability and evolutionary adaptation properties, the GA is able to supplement the insufficiency of ANNs. In such a way it couples the evolution of the population with the learning process of each individual, achieving better adaptation of the whole population to a generic fitness landscape. These investigations led to the birth of a new framework generally referred to as evolutionary artificial neural networks (EANNs). Such creative evolutionary systems could have a major impact on the effectiveness and efficiency of designing neural networks. Evolutionary algorithms are used to automatically adapt the connection weights, network architecture, and learning rules according to the problem environment without substantial human intervention. The evolution of connection weights improves the adaptation ability of the connection weights by global training based on a predefined network architecture. The evolution of network architecture allows the training to adapt to different optimization problems. The objects to be evolved could be the network topology only, or both the network topology and the connection weights. The evolution of an ANN's architecture is a process of optimizing the parameters of the network architecture. The optimization of these parameters generally depends on the characteristics of the problem, and it is often not easy to use an ANN alone to find the best network architecture, especially for highly non-linear data. A better way is to introduce other global optimization algorithms (e.g., GAs), which increase the opportunities to search for a near-optimal solution. For example, the conventional approach for a feedforward ANN is to predefine the network architecture and then implement the iterated process of optimizing connection weights. Different from a traditional ANN, which requires a predefined network architecture, an EANN is able to automatically search for the best network architecture as well as a near-optimal solution through evolution and learning, more efficiently and effectively. In other words, the EANN implements the GA to locate a good region in the space (i.e., the evolved network architecture), and then a local search procedure (gradient search) is used to find a near-optimal solution in this region (Yao, 1999).

The main purpose of this study is to propose an EANN that automatically constructs the optimal network architecture and connection weights of an ANN for the investigated time series. The Mackey–Glass chaotic time series is first taken as a theoretical series to evaluate the efficiency, effectiveness, and robustness of the constructed EANN. Then we apply the EANN to the forecasting of 10-day reservoir inflows of the Shihmen Reservoir and compare the forecasted inflows with those of the lag-one autoregressive (AR(1)) and autoregressive moving-average with exogenous inputs (ARMAX) models.

The status of EANN

EANNs have been widely explored in the last few years. Yao (1993, 1999) provided two excellent reviews of the different combinations between ANNs and EAs. He roughly divided the combinations into three evolutions of ANNs, then described a framework for EANNs and pointed out an important concept:

"Design of the optimal architecture for an ANN can be formulated as a search problem in the architecture space where each point represents an architecture". Several studies have explored the applicability of EANNs in the hydrological sciences. Cortez et al. (1996) used a genetic algorithm neural network (GANN) to forecast time series. Abraham (2004) proposed an effective evolutionary neural network, the meta-learning evolutionary artificial neural network (MLEANN), and applied MLEANN to several different time series. Dawson et al. (2006) applied JavaSANE, a package developed from a symbiotic adaptive neuro-evolutionary (SANE) algorithm (Moriarty and Miikkulainen, 1998), to evolving and optimizing individual neurons of a rainfall–runoff network. Leahy et al. (2008) showed that a global optimization methodology for ANN architecture and weights can be applied successfully to river level prediction. Chaves and Chang (2008) recently presented an Evolving ANN Intelligent System (ENNIS) for reservoir operation.

The parameters of the ANN and GA, the types and divisions of observed or generated data, and the criteria of performance evaluation in the above studies and in this study are tabulated in Table 1. The primary research questions to be addressed in this paper are as follows:

1. How can the architecture of ANN (e.g., inputs, the number of hidden layers, and the number of neurons in hidden layers) be automatically optimized?

2. How can the inputs be appropriately selected among all possible ones?

3. How can the parameters of ANN’s architecture be encoded to artificial chromosomes?

4. How can the chromosomes be encoded so that crossover can be performed on chromosomes of different lengths?

5. How well do EANNs perform in comparison to conventional ANNs in terms of effectiveness, efficiency, and robustness?

Modeling the EANN

Artificial neural networks

An ANN is a network system composed of a set of interconnected processing elements (neurons), with each element expressed by a function of the sum of weighted inputs as follows:

Y_i = f_i\!\left(\sum_{j} W_{ij} X_j - \theta_i\right) \qquad (1)

where Y_i is the output of the ith neuron; f_i is the transfer function of the ith neuron; W_{ij} is the connection weight between the ith and jth neurons; X_j is the input of the jth neuron; and \theta_i is the threshold of the ith neuron.
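As a concrete reading of Eq. (1), the following Python/NumPy sketch (illustrative only; the function and variable names are ours, not the authors' code) evaluates one fully connected layer of such neurons.

```python
import numpy as np

def layer_output(X, W, theta, f=np.tanh):
    """Eq. (1) applied to a whole layer: Y_i = f_i(sum_j W_ij * X_j - theta_i).

    X     : (n_inputs,) input vector X_j
    W     : (n_neurons, n_inputs) connection weights W_ij
    theta : (n_neurons,) thresholds
    f     : transfer function applied element-wise
    """
    return f(W @ X - theta)

# tiny usage example with arbitrary numbers
X = np.array([0.2, 0.5, 0.1])
W = np.random.randn(4, 3)        # 4 neurons receiving 3 inputs
theta = np.zeros(4)
Y = layer_output(X, W, theta)    # outputs of the 4 neurons
```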

The architecture of a multi-layer feedforward BPNN can be roughly divided into three layers, including the input layer, hidden layer, and output layer (Fig. 2). The input layer acts as the receiver of input data without a weighted sum. The hidden layer could be a single layer or multiple layers. In addition to receiving the information from all neurons of the input layer, each neuron of the hidden layers sums the weighted inputs and then delivers the result to the neurons of the next hidden layer or the output layer. The output layer is a single layer including one or several output variables.

The estimated output value of the output layer is compared with a target output value by objective functions (or error functions). If the error does not meet the criteria of the objective functions, then the connection weights between any two feedforward-connected neurons are modified. To search for optimal connection weights and biases, we may use search algorithms to minimize the error functions. Common search algorithms include the gradient descent method, the Newton method, and the conjugate gradient method.

Fig. 1. Comparison of an individual's learning and a population's evolution.


Genetic algorithm

As briefly mentioned above, the GA, by mimicking the evolutionary process of natural genetic heredity, transfers the search process of an optimization problem into an evolutionary process. The parameters of an optimization problem are first encoded as an artificial chromosome. Starting with a number of randomly initialized chromosomes, the optimization process is then implemented for each chromosome and the fitness of each chromosome is evaluated by objective functions. If the optimal solution does not meet the criteria of the objective function or the predefined number of iterations has not been reached, the optimization process is iterated through reproduction, crossover, and mutation until the stop criteria are satisfied. The key elements of the GA used in this study are briefly given as follows.

Encoding: Encoding is the first step of the GA. All the parameters of a problem to be optimized have to be encoded as an artificial chromosome, i.e., a string of genes with fixed length.

Table 1
Comparison of three previous studies and this study applying EANNs.

Types of ANN
- Cortez et al. (1996): Genetic algorithm neural network (GANN)
- Abraham (2004): Meta-learning evolutionary artificial neural networks (MLEANN)
- Dawson et al. (2006): Symbiotic adaptive neuro-evolutionary (SANE) algorithm
- This study: Hybrid-encoding evolutionary artificial neural networks (HEEANN)

Types of data
- Cortez et al. (1996): 91–349 generated data of time series
- Abraham (2004): a. 475 observations of wastewater time series f(t); b. 1000 generated data of Mackey–Glass chaotic time series x(t); c. 292 pairs of methane u(t) and carbon dioxide y(t) time series
- Dawson et al. (2006): 6-h streamflow and rainfall observations for 3 years
- This study: a. 1000 generated data of Mackey–Glass chaotic time series x(t); b. 10-day reservoir inflow and rainfall observations for 39 years

Division of data
- Cortez et al. (1996): Training and validation
- Abraham (2004): Training and testing
- Dawson et al. (2006): Training, validation, and testing
- This study: Training and testing

Input variables
- Cortez et al. (1996): 3–14 neurons (unspecified variable)
- Abraham (2004): a. f(t − 1), f(t); b. x(t − 18), x(t − 12), x(t − 6), x(t); c. u(t), y(t)
- Dawson et al. (2006): Predefined 3 streamflow variables and 5 rainfall variables
- This study: a. Any selection of all variables between x(t − 18) and x(t); b. Constant or any selection of variables between P(t − 2), P(t − 1), P(t), Q(t − 2), Q(t − 1), and Q(t) for observed and standardized data

Number of hidden layers and number of neurons in hidden layers
- Cortez et al. (1996): 1 layer (3–14 neurons)
- Abraham (2004): Maximum number of neurons = 16
- Dawson et al. (2006): Maximum number of neurons = 1000
- This study: Maximum number of hidden layers = 3; maximum number of neurons = 15

Neuron connection
- Cortez et al. (1996): Fully connected
- Abraham (2004): N/A
- Dawson et al. (2006): N/A
- This study: Fully connected

Transfer function
- Cortez et al. (1996): N/A
- Abraham (2004): Tanh, logistic, sigmoidal, tanh–sigmoidal, log–sigmoidal
- Dawson et al. (2006): N/A
- This study: Tanh–sigmoidal and linear

Output variables
- Cortez et al. (1996): 1-h ahead data
- Abraham (2004): a. f(t + 1); b. x(t + 6); c. y(t + 1)
- Dawson et al. (2006): 6-h and 24-h ahead streamflow
- This study: a. x(t + 6); b. Q(t + 1)

Learning rules
- Cortez et al. (1996): Backpropagation algorithm
- Abraham (2004): Backpropagation algorithm, conjugate gradient algorithm, Levenberg–Marquardt algorithm, quasi-Newton algorithm
- Dawson et al. (2006): Backpropagation algorithm
- This study: Scaled conjugate gradient algorithm

Encoding scheme
- Cortez et al. (1996): Binary indirect encoding
- Abraham (2004): Binary direct encoding
- Dawson et al. (2006): N/A
- This study: Binary hybrid encoding (direct encoding and indirect encoding)

Encoded parameters
- Cortez et al. (1996): Number of inputs, transfer functions, learning rate, and number of neurons in hidden layer
- Abraham (2004): Number of inputs, transfer functions, learning algorithms, number of hidden layers, number of neurons in hidden layers, and connection weights
- Dawson et al. (2006): Neuron
- This study: Inputs, number of neurons in hidden layers

Chromosome length
- Cortez et al. (1996): 14 bits
- Abraham (2004): N/A
- Dawson et al. (2006): N/A
- This study: a. 31; b. 18

Initial population
- Cortez et al. (1996): 20
- Abraham (2004): 40
- Dawson et al. (2006): N/A
- This study: 20

Genetic operators
- Cortez et al. (1996): Selection, crossover, and mutation (roulette-wheel selection; crossover rate = 1; mutation rate = 0.02)
- Abraham (2004): Selection and mutation (rank selection; elite rate = 0.05; mutation rate = 0.4)
- Dawson et al. (2006): Selection, crossover, and mutation
- This study: Selection, crossover, and mutation (elite number = 1; tournament selection; crossover rate = 0.5; mutation rate = 0.05)

Generations
- Cortez et al. (1996): N/A
- Abraham (2004): 40
- Dawson et al. (2006): 200
- This study: 10

Computing time
- Cortez et al. (1996): N/A
- Abraham (2004): a. 288–1463 min; b. 62–696 min; c. 146–1176 min
- Dawson et al. (2006): N/A
- This study: a. 3–223 min; b. 1.3–160 min

Iteration
- Cortez et al. (1996): N/A
- Abraham (2004): N/A
- Dawson et al. (2006): Random 50 times
- This study: At least five successive times

Criteria of performance evaluation
- Cortez et al. (1996): MSE and SMSE
- Abraham (2004): RMSE
- Dawson et al. (2006): RMSE, COE, MAE, and COD
- This study: RMSE and CC


Initialization: Randomly generating a number of individual chromosomes is the common way to initialize the population.

Fitness: Fitness may be regarded as the degree to which an individual in each generation satisfies the objective function during the search process.

Genetic operators

(1) Reproduction. The elite strategy and selection of chromosomes are two strategies of reproduction. In the elite strategy, the chromosome with the best fitness in each generation is regarded as the elite. All features of the elite chromosome remain unchanged in the next generation. Selection means that some of the non-elite chromosomes with better fitness are put into the match pool to await the opportunity of crossover.

(2) Crossover. The purpose of crossover is to increase the diversity of genes. Crossover randomly selects any two parent individuals from the match pool and then exchanges some genes between them.

(3) Mutation. Mutation can prevent the search from being trapped in local minima of the error function. A typical way of gene mutation is to randomly select some genes of parent individuals and then change the values of the selected genes into different ones. A small sketch of these operators follows below.
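As an illustration of the reproduction step described above (elite strategy plus selection into a match pool), the sketch below uses a tournament of size two. It is a hedged example under our own assumptions (chromosomes as lists of binary genes, lower fitness is better), not code from the paper.

```python
import random

def reproduce(population, fitness, no_elite=1, pool_size=None):
    """Elite strategy plus tournament selection into a match pool.

    population : list of chromosomes (lists of 0/1 genes)
    fitness    : list of fitness values (lower is better, e.g., an RMSE)
    Returns (elites, match_pool).
    """
    order = sorted(range(len(population)), key=lambda i: fitness[i])
    elites = [population[i] for i in order[:no_elite]]     # kept unchanged
    non_elite = order[no_elite:]
    pool_size = pool_size or len(non_elite)
    match_pool = []
    for _ in range(pool_size):
        a, b = random.sample(non_elite, 2)                 # tournament of two
        winner = a if fitness[a] <= fitness[b] else b
        match_pool.append(population[winner][:])
    return elites, match_pool
```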

Fig. 2. Schematic architecture of a feedforward BPNN (input layer, hidden layers, and output layer).

Fig. 3. Flowchart of the proposed EANN modeling: settings (termination criteria, chromosomal genes for the direct/indirect encoding, crossover and mutation rates, and division of the data into training and testing sets), initialization of a fixed population of chromosomes with different hidden layers, decoding of the chromosomes into network architecture parameters, construction and SCGA training of the corresponding ANNs, computation of the fitness values, and genetic operations (elite reproduction and selection, crossover, mutation), repeated generation by generation until one of the two preset termination criteria is met.


Termination of genetic search

The evolutionary search process is iterated until the termination criteria are met.

Evolutionary artificial neural networks

This study evolves the network architecture of a BPNN by the GA and optimizes the connection weights by a gradient search. The detailed steps of our framework are described as follows and shown in Fig. 3.

Encoding of architecture

The encoding of ANN’s architecture can be roughly classified into direct encoding scheme and indirect encoding scheme as follows:

Direct encoding scheme:

Direct encoding encodes the connection between any two of all the neurons of an ANN. In terms of network architecture, the connection between any two neurons is generally expressed by a binary digit: 1 means connection and 0 means no connection. Having complete information is the advantage of a direct encoding scheme. However, if the network architecture becomes more complicated, the chromosome length or connectivity matrix increases significantly and causes inefficiency. Therefore, direct encoding is more suitable for a small network.

Indirect encoding scheme:

An indirect encoding scheme encodes only some characteristics of the network architecture, such as the number of hidden layers or the number of neurons, so the search process is more efficient; however, some important parameters of the network architecture have to be pre-selected before they can be encoded (Abraham, 2004).

Hybridized encoding scheme:

A hybridized encoding scheme, a combination of direct and indirect encoding, is proposed in this study. The network is composed of an input layer, multiple hidden layers, and an output layer, and the network is fully connected. To automatically identify appropriate inputs, a direct encoding scheme is used to encode the input vector of the network, which forms the first part of the chromosome. Indirect encoding is used to encode the number of neurons in the hidden layers, which forms the remaining part of the chromosome. Fig. 4 shows a schematic encoding of the network architecture for a fully connected feedforward ANN. Fig. 5 illustrates an example of a feedforward ANN's architecture corresponding to its chromosome.

The details of the hybridized encoding schemes are described as follows:

1. Direct encoding of the inputs:
(1) Predefine the number of possible inputs (no_input).
(2) Randomly generate a series of 0s and 1s for all the possible inputs, where "1" means the input is selected and "0" means it is not selected.

2. Indirect encoding of the number of neurons in the hidden layers:
(1) Predefine a maximal number of hidden layers (no_maxlayer) and a maximal number of neurons in each hidden layer, represented by no_gene binary genes per layer. This implies that the maximal string of genes from the indirect encoding of the hidden-layer neurons is no_maxlayer × no_gene.
(2) Randomly generate a number no_layer, which should not be larger than the predefined maximal number of hidden layers. The generated number (no_layer) represents the number of hidden layers of an individual chromosome.
(3) Randomly generate a series of 0s and 1s for the no_gene genes of each layer. The total number of generated binary digits is no_layer × no_gene.
(4) Decode the binary genes of each hidden layer into the number of neurons in that layer by the following expression:

f(1) × 2^(no_gene−1) + f(2) × 2^(no_gene−2) + . . . + f(no_gene) × 2^0, where f(i) = 0 or 1 (i = 1, 2, . . ., no_gene).

(5) If the generated numbers of neurons in all hidden layers are zero, steps (1)–(3) are repeated until at least one gene is not zero. A small sketch of this encoding and decoding follows this list.
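A minimal Python sketch of this hybrid encoding and decoding follows; the parameter names (no_input, no_maxlayer, no_gene) follow the text, but the functions themselves are illustrative, not the authors' implementation.

```python
import random

def encode_chromosome(no_input, no_maxlayer, no_gene=4):
    """Direct 0/1 genes for the candidate inputs, followed by no_layer groups
    of no_gene binary genes describing the neurons of each hidden layer."""
    inputs = [random.randint(0, 1) for _ in range(no_input)]
    no_layer = random.randint(1, no_maxlayer)
    hidden = [random.randint(0, 1) for _ in range(no_layer * no_gene)]
    if all(g == 0 for g in hidden):         # step (5): retry if every gene is zero
        return encode_chromosome(no_input, no_maxlayer, no_gene)
    return inputs + hidden

def decode_chromosome(chrom, no_input, no_gene=4):
    """Recover the input mask and the neurons per hidden layer using
    f(1)*2^(no_gene-1) + ... + f(no_gene)*2^0; zero-neuron layers are dropped."""
    input_mask = chrom[:no_input]
    hidden_genes = chrom[no_input:]
    layers = []
    for k in range(0, len(hidden_genes), no_gene):
        bits = hidden_genes[k:k + no_gene]
        neurons = sum(b * 2 ** (no_gene - 1 - j) for j, b in enumerate(bits))
        if neurons > 0:
            layers.append(neurons)
    return input_mask, layers
```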

Initialization of architectural chromosomes and network neurons

A number of network architectural chromosomes (no_individual) are initially randomized. It should be noted that the lengths of the chromosomes are not the same because of the randomly generated numbers of inputs and hidden layers. Each randomized chromosome is regarded as an individual network. For each network, the transfer functions of the neurons in the hidden layers and the output layer are defined as the tan-sigmoid transfer function and the linear transfer function, respectively. The initial connection weights of each constructed network are also randomized.

Fig. 4. Schematic encoding of the network architecture of a fully connected feedforward ANN: the first part of the chromosome encodes the inputs (no_input), the following parts encode the neurons of each hidden layer (no_gene genes per layer), and the single output variable is not encoded.


Construction of a feedforward ANN

1. Construct feedforward ANNs from the corresponding decoded chromosomes one by one. The network architecture is determined by the following rules:

(1) If the number of hidden layers is randomized as one (no_layer = 1) and the number of neurons in that layer is not zero, there is only one hidden layer in the network.

(2) If no_layer = 2 and the numbers of neurons in both layers are not zero, there are two hidden layers in the network. However, if either of the two numbers is zero, there is only one hidden layer in the network.

(3) The above rules are applied analogously when the randomized number of hidden layers is more than two.

2. Use the scaled conjugate gradient algorithm (SCGA) (Moller, 1993) as the learning rule to search for the optimal connection weights of the constructed ANN. The SCGA is a supervised learning algorithm with a superlinear convergence rate, based upon the class of conjugate gradient (CG) methods, which are well-known numerical techniques for solving various optimization problems (Ham and Kostanic, 2001). In practice, the CG process makes good uniform progress toward the solution at every step and has been found to be more effective in finding good optima than the standard BP algorithm (Chiang et al., 2004).

3. Set an objective function (e.g., the minimal error between the target and forecasted outputs of a network) and search for the optimal connection weights of each constructed network by the SCGA using the training set, then compute the value of the objective function using the testing set. A small illustrative training sketch follows this list.
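The sketch below shows how the connection weights of one decoded architecture could be fitted by a conjugate gradient search on the training mean squared error. It is illustrative only: the authors use Møller's scaled conjugate gradient (SCGA), which SciPy does not provide, so the generic nonlinear conjugate gradient method (method="CG") is substituted here, and all function names and shapes are our assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def weight_shapes(n_in, hidden):
    """Shapes of (weights, biases) for tanh hidden layers and one linear output."""
    sizes = [n_in] + list(hidden) + [1]
    return [(sizes[i + 1], sizes[i]) for i in range(len(sizes) - 1)]

def unpack(flat, shapes):
    params, k = [], 0
    for r, c in shapes:
        W = flat[k:k + r * c].reshape(r, c); k += r * c
        b = flat[k:k + r]; k += r
        params.append((W, b))
    return params

def forward(params, X):
    A = X
    for W, b in params[:-1]:
        A = np.tanh(A @ W.T + b)          # tan-sigmoid hidden layers
    W, b = params[-1]
    return (A @ W.T + b).ravel()          # linear output neuron

def train(X, y, hidden, seed=0):
    shapes = weight_shapes(X.shape[1], hidden)
    n = sum(r * c + r for r, c in shapes)
    x0 = 0.1 * np.random.default_rng(seed).standard_normal(n)  # random initial weights
    obj = lambda flat: np.mean((forward(unpack(flat, shapes), X) - y) ** 2)
    res = minimize(obj, x0, method="CG", options={"maxiter": 2000})
    return unpack(res.x, shapes), res.fun
```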

Genetic operations

1. Set the number of generations (no_generation) and then perform the following genetic processes generation by generation.

2. Compute the values of the objective function for all chromosomes in the current generation and record the best one. Each value of the objective function could be regarded as the fitness of the specific chromosome.

3. Perform genetic operations for each generation. Parent chromosomes with better fitness have higher probabilities of generating offspring chromosomes. The genetic operators include reproduction, crossover, and mutation, described as follows:

(1) Reproduction: First, one or a few parent chromosomes with better fitness (no_elite) in the current generation, called elite chromosomes, are selected and kept unchanged in the next generation. Then two non-elite chromosomes are randomly selected by tournament selection and the one with better fitness is put into a match pool.

(2) Crossover: Set a crossover rate (prob_crossover) and define the number of crossovers (no_crossover, which should be an integer) as follows:

no_crossover = prob_crossover × (no_individual − no_elite)   (2)

Chromosomes are repeatedly selected from the match pool for crossover (no_crossover/2 times). To perform crossover, two chromosomes exchange their post-parts. That is, one offspring chromosome combines the pre-part of the first parent chromosome with the post-part of the second, whereas the other offspring chromosome combines the pre-part of the second parent with the post-part of the first. Fig. 6 illustrates a schematic crossover of two chromosomes.

(3) Mutation: Set a mutation rate (prob_mutation) and define the number of genes to be mutated (no_mutation, also an integer) as follows:

no_mutation = prob_mutation × (no_individual − no_elite) × (no_input + no_layer)   (3)

Chromosomes, except the elites, are repeatedly selected for mutation. One parent chromosome is randomly selected at a time; then a gene of the chromosome is randomly selected for mutation and changed into a different binary value.

(4) Repeat step 2 for the offspring chromosomes in the next generations until the termination criterion has been met.

(5) Iterate all steps for a number of times. A small sketch of the variable-length crossover and mutation is given below.
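As a rough illustration of Eqs. (2) and (3) and of crossover between chromosomes of different lengths, the sketch below cuts each parent at an independently chosen point and exchanges the post-parts; the exact placement of the match point shown in Fig. 6 is not reproduced here, and all names are ours.

```python
import random

def operator_counts(prob_crossover, prob_mutation,
                    no_individual, no_elite, no_input, no_layer):
    """Eqs. (2) and (3): number of crossovers and number of genes to mutate."""
    no_crossover = int(prob_crossover * (no_individual - no_elite))
    no_mutation = int(prob_mutation * (no_individual - no_elite)
                      * (no_input + no_layer))
    return no_crossover, no_mutation

def crossover(parent1, parent2):
    """Exchange post-parts of two (possibly different-length) chromosomes."""
    c1 = random.randint(1, len(parent1) - 1)
    c2 = random.randint(1, len(parent2) - 1)
    return parent1[:c1] + parent2[c2:], parent2[:c2] + parent1[c1:]

def mutate(chrom):
    """Flip one randomly selected binary gene."""
    chrom = chrom[:]
    i = random.randrange(len(chrom))
    chrom[i] = 1 - chrom[i]
    return chrom
```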

Fig. 5. An example of a feedforward ANN's architecture corresponding to its chromosome: the inputs are directly encoded (e.g., genes 1 1 1 give no_input = 1 + 1 + 1 = 3), each hidden layer is indirectly encoded (e.g., the genes 0 1 0 0 give 0×2³ + 1×2² + 0×2¹ + 0×2⁰ = 4 neurons), and a hidden layer whose decoded number of neurons is zero is not used.


Applications

Mackey–Glass chaotic time series

Mackey and Glass (1977) proposed the following first-order differential-delay equation:

\frac{dx(t)}{dt} = \frac{0.2\, x(t-\tau)}{1 + x^{10}(t-\tau)} - 0.1\, x(t) \qquad (4)

The time series generated by Eq. (4) is well known as the Mackey–Glass chaotic time series; it is periodic when τ is small and chaotic when τ > 17. The Mackey–Glass time series is usually used as a benchmark when investigating the performance of artificial neural networks.
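Readers who want a comparable series can integrate Eq. (4) numerically. The sketch below uses a simple Euler scheme with a constant pre-history, which is only an approximation; the MATLAB sample data set used in this study was produced differently, so the values will not match it exactly, and the function name is ours.

```python
import numpy as np

def mackey_glass(n=1200, tau=17, dt=1.0, x0=1.2):
    """Euler integration of Eq. (4); tau = 17 is the usual benchmark setting."""
    lag = int(round(tau / dt))
    x = np.empty(n + lag)
    x[:lag + 1] = x0              # constant history up to t = 0
    for t in range(lag, n + lag - 1):
        x_tau = x[t - lag]
        dx = 0.2 * x_tau / (1.0 + x_tau ** 10) - 0.1 * x[t]
        x[t + 1] = x[t] + dt * dx
    return x[lag:]

series = mackey_glass()           # 1200 values of x(t)
```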

To evaluate the performance of the proposed EANN, this study uses the Mackey–Glass time series available from the data file of MATLAB®, a widely used computing platform. The 1200 data used have a mean of 0.9194 and a standard deviation of 0.2357. To reduce the effect of noise, the first 118 and the last 82 data of the time series were not used. Eight hundred of the 1000 available data are used as the training set and the remaining two hundred as the testing set. It should be noted here that a validation set is not necessary for EANN modeling. Dawson et al. (2006) pointed out that, for conventional ANNs, the purpose of constructing one model on one dataset and evaluating it on a second dataset is to prevent an over-fitted solution, whereas for GA-based ANNs, constructing several models on one dataset and using a second dataset serves to select the best available model from what might be a set of over-fitted solutions. To further explore this issue and to establish the degree to which the evolved models are capable of being over-fitted would require the second dataset to be used directly in the neuro-evolutionary process. Abraham (2004) used the same way of dividing the data into training and testing sets for his meta-learning EANN.

Parameter settings

The parameters of EANN are described as follows:

1. Most previous studies used the four input variables x(t − 18), x(t − 12), x(t − 6), and x(t) to estimate the single output variable x(t + 6). Since direct encoding allows all possible variables as inputs to the ANN, this study uses all variables between x(t − 18) and x(t) as input variables (i.e., x(t − Δt), where Δt = 0, 1, 2, . . ., 18) and x(t + 6) as the output variable.

2. The maximal number of hidden layers is set to 3 and the number of neuron genes in each layer is 4 (i.e., the number of neurons can range from 0 to 15). The total number of neuron genes in the three hidden layers is therefore not larger than 12 (i.e., 3 × 4).

3. The population of chromosomes is initialized as 20, the same as in the study of Cortez et al. (1996).

4. The number of elites is set to be 1.

5. The crossover rate and mutation rate are set to be 0.5 and 0.05, respectively.

6. The generations are set to be 10.

7. The SCGA is used as the learning rule of the ANN and, to make sure the search process can reach optimal solutions, the number of training epochs is set to 8000.

8. The root mean square error (RMSE), expressed in Eq. (5), is used as the termination criterion. Three different RMSE values (0.001, 0.0005, and 0.0004) are set, where 0.0004 was the smallest RMSE value used in Abraham's (2004) study (see Table 2).

RMSE = \sqrt{\frac{\sum_{t=1}^{N}\left[x_p(t+6) - x_o(t+6)\right]^2}{N}} \qquad (5)

in which x_p(t + 6) is the forecast value at t + 6; x_o(t + 6) is the generated data at t + 6; and N is the number of generated data.

9. To avoid over-fitting, the maximal number of connections of the constructed network architecture is restricted to be less than 200.

Results and discussion

1. Table 3 shows the results of three arbitrary runs of EANN modeling under three different termination criteria, (a) RMSE ≤ 0.001, (b) RMSE ≤ 0.0005, and (c) RMSE ≤ 0.0004, on the testing data set. It appears that the EANN can effectively obtain suitable networks. With the same termination criterion, several other runs have been done and various network architectures were obtained. The results indicate that the selected inputs are different and the number of hidden layers could be one, two, or three. The difference is mainly due to the initial randomness and the genetic operations in EANN modeling. It should be noted that the initial random selection of inputs might leave out important inputs, and prior information would be helpful to determine the potential inputs in the settings phase. In this study we aim at automatically identifying the optimal architecture of the ANN; a large number of input combinations are generated and evaluated through the GA process, and initially selected inputs that do not contribute to meeting the objective function in the first generation are changed into more important inputs after generations of genetic operation.

Fig. 6. Schematic crossover of two chromosomes (match pool, match point, before and after crossover).


Table 2
Parameter settings of EANN modeling for the Mackey–Glass time series.

- Number of input genes (no_input): 19
- Maximal hidden layers (no_maxlayer): 3
- Number of neurons [genes] in each hidden layer (no_gene): 15 [4]
- Initial population of individual chromosomes (no_individual): 20
- Number of elite chromosomes (no_elite): 1
- Crossover rate (prob_crossover): 0.5
- Mutation rate (prob_mutation): 0.05
- Training epochs: 8000
- Learning rule: scaled conjugate gradient algorithm (SCGA)
- Termination criteria: (1) generations (no_generation): 10; (2) RMSE value (training): (a) RMSE ≤ 0.0010, (b) RMSE ≤ 0.0005, (c) RMSE ≤ 0.0004

Table 3
Results of three arbitrary runs for EANN modeling under different termination criteria.

Termination criterion (a) RMSE ≤ 0.001:
- Generation when the termination criterion is met: 1st
- Optimal inputs: 18, 16, 15, 14, 12, 11, 6, 3, 2, 0
- Optimal number of neurons in hidden layers: 14
- RMSE (training): 0.0009998
- RMSE (testing): 0.0009306

Termination criterion (b) RMSE ≤ 0.0005:
- Generation when the termination criterion is met: 2nd
- Optimal inputs: 18, 17, 16, 13, 12, 11, 9, 8, 7, 6, 5, 4
- Optimal number of neurons in hidden layers: 3-8-9
- RMSE (training): 0.0004983
- RMSE (testing): 0.0005505

Termination criterion (c) RMSE ≤ 0.0004:
- Generation when the termination criterion is met: 2nd
- Optimal inputs: 17, 16, 15, 13, 12, 11, 10, 9, 7, 5, 3, 2
- Optimal number of neurons in hidden layers: 10
- RMSE (training): 0.0003998
- RMSE (testing): 0.0004266

Note:
1. In each run, EANNs are modeled for ten generations. The modeling is terminated once the termination criterion is met.
2. The numbers listed as optimal inputs, e.g., "18", mean x(t − 18).
3. The entries for the optimal number of neurons in hidden layers, e.g., "3-8-9", mean there are three hidden layers with 3 neurons in the first, 8 in the second, and 9 in the third.

Fig. 7. Comparison between the generated and forecasted Mackey–Glass time series for the run under termination criterion (c) in Table 3: (a), (b) training; (c), (d) testing.


2. Another important point to be addressed is whether the optimal architecture should be unique or consistent. This can be seen from a comparison of two results: the forecasting results for RMSE ≤ 0.0004 are compared with those of Abraham's (2004) study, whose termination criterion for the RMSE is also 0.0004. Even under such a strict requirement of accuracy, there are still several optimal results, i.e., multiple optimal solutions with different network architectures. A stricter termination criterion might be helpful for a more consistent result, but a unique optimal solution cannot be assured unless the exact global optimal solution is known.

3. There are chances that the initial population happens to include a set of solutions close to the "stopping criterion". For the termination criterion RMSE ≤ 0.001 in Table 3, the optimal results are obtained in the first generation. In addition to the chance that the initial population is close to the termination criterion, this is also due to the powerful searching ability of the SCGA.

4. Fig. 7 displays a comparison between the forecasted and generated time series for the arbitrary run under termination criterion (c) in Table 3. It appears that almost all pairs of forecasted and generated data points lie on the diagonal line, which means nearly perfect forecasting has been achieved. Therefore, the EANN can be regarded as having the ability to forecast with high accuracy.

5. In addition to effectiveness, the other two criteria used to evaluate the performance are robustness and efficiency. Robustness is the ability to find an optimum for every search under a given restriction, while efficiency is the ability to take the least time to search for the optimum. Table 4 shows the results of five arbitrary runs of the EANN under three different termination criteria. An optimal network architecture is found for every run under its restriction, which means the proposed EANN can be regarded as robust. It can also be seen that the time to search for optimal network architectures ranges from 3 to 223 min for the three termination criteria (based on a laptop with an Intel Pentium M 1.73 GHz CPU and 512 MB RAM). Compared with the computing time in Abraham's (2004) study, the proposed EANN can be regarded as efficient, in spite of the different evolutionary computation schemes and computers.

6. The parameters of the GA inevitably require a number of trials to increase the possibility of finding the global optimal solution. The trials of parameters usually rely on the features of the data. In this study, the settings of the crossover rate and mutation rate were determined after a number of trials. A larger maximum number of hidden layers and maximum number of neurons in the hidden layers might increase the effectiveness of the EANN model; however, they would also increase the possibility of over-fitting and the training time. With respect to the parameters of the SCGA, we also noticed that increasing the training epochs to a certain number (e.g., 8000 epochs in the case of the chaotic series) improves the performance.

Table 4
The computation time (in minutes) of five arbitrary runs for EANN modeling under different termination criteria: (a) RMSE ≤ 0.001, (b) RMSE ≤ 0.0005, (c) RMSE ≤ 0.0004.

- 1st run: (a) 8, (b) 87, (c) 57
- 2nd run: (a) 6, (b) 180, (c) 223
- 3rd run: (a) 10, (b) 57, (c) 184
- 4th run: (a) 8, (b) 110, (c) 77
- 5th run: (a) 3, (b) 67, (c) 161

Fig. 8. Locations of the study area (the Shihmen Reservoir catchment on the Tahan River, a branch of the Tanshui River in northern Taiwan) and the rain and streamflow gauge stations.


Reservoir inflow time series

Description of study area

Streamflow forecasting is of significant importance for the planning and operation of water resource systems. For hydrologic components, there is a need for short-term (hourly or daily), mid-term (10-day or monthly), and long-term (yearly) forecasts of streamflow events in order to optimize the systems or to plan for future expansion or reduction. Mid-term streamflow forecasting is especially important for the operation of water supply systems over drought seasons. Among the water supply systems, reservoirs should be regarded as the most important and effective water storage facilities, which have the functions of modifying the uneven distribution of water and allocating water resources. Precise forecasting of seasonal inflows of reservoirs will benefit reservoir operation and management.

The Shihmen Reservoir is located in the upper reaches of the Tahan River, a branch of the Tanshui River in northern Taiwan. The watershed area of the reservoir is 763.4 km² and the effective water storage is 251.88 million cubic meters. The reservoir serves a number of purposes, including irrigation, hydroelectric power, fresh water supply, flood prevention, and sightseeing. Supplying water to 28 districts with 3.4 million people, the reservoir is a very important water facility for the livelihoods of the people living in northern Taiwan.

Locations of the study area and gauge stations are shown in Fig. 8. The weighted average rainfalls over the watershed are computed by the Thiessen method, and the reservoir inflow measurements with a time step of ten days are available from the gauging stations. The data were collected from the gauging stations during the period from 1965 to 2003 and divided into a training set of 30 years and a testing set of 9 years. The total numbers of data are 1080 ten-day periods for training and 324 ten-day periods for testing, respectively.

Parameter settings

The parameters of the proposed EANN are described as follows and tabulated in Table 5:

1. In order to compare with the performance of the AR(1) and ARMAX models, which are usually applied to periodical streamflow forecasting, two sets of rainfall and inflow variables are taken as the inputs of the EANN. One set includes only the inflow variable Q(t). The other set includes both rainfall and inflow variables, P(t − Δt) and Q(t − Δt) (Δt = 0, 1, . . ., Δt_max), where the time step is 10 days and the value of Δt_max is arbitrarily set to two. The output variable of both sets is Q(t + 1). The value of Δt_max could be set either arbitrarily or based on prior information about the inputs.

2. The SCGA is again used as the learning rule of the ANN and the training epochs are set to 5000.

3. The termination criterion of the EANN can be set by an RMSE value. To compare with the performance of the AR(1) and ARMAX models in Table 6, the RMSE value on the testing data set is initially set to 60 and then decreased or increased depending on whether or not an optimal result is found. In addition to the RMSE, Eqs. (6) and (7) are also used to estimate the correlation and the mean relative error between the forecasted and observed inflows, respectively.

r = \frac{\sum_{t=1}^{N}\left[Q_f(t+1)-\bar{Q}_f(t+1)\right]\left[Q_o(t+1)-\bar{Q}_o(t+1)\right]}{\sqrt{\sum_{t=1}^{N}\left[Q_f(t+1)-\bar{Q}_f(t+1)\right]^2}\,\sqrt{\sum_{t=1}^{N}\left[Q_o(t+1)-\bar{Q}_o(t+1)\right]^2}} \qquad (6)

MRE = \frac{1}{N}\sum_{t=1}^{N}\frac{Q_f(t+1)-Q_o(t+1)}{Q_o(t+1)} \qquad (7)

where Q_f(t + 1) is the forecasted inflow at time t + 1; \bar{Q}_f(t + 1) the average of Q_f(t + 1) over the N data; Q_o(t + 1) the observed inflow at time t + 1; \bar{Q}_o(t + 1) the average of Q_o(t + 1) over the N data; and N is the number of data.
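The three performance measures used in this application, the RMSE of Eq. (5), the coefficient of correlation of Eq. (6), and the mean relative error of Eq. (7), can be computed directly; the following NumPy sketch is illustrative, with our own function names.

```python
import numpy as np

def rmse(forecast, observed):
    """Eq. (5): root mean square error."""
    f, o = np.asarray(forecast, float), np.asarray(observed, float)
    return np.sqrt(np.mean((f - o) ** 2))

def corr_coef(forecast, observed):
    """Eq. (6): coefficient of correlation between forecasts and observations."""
    f, o = np.asarray(forecast, float), np.asarray(observed, float)
    fd, od = f - f.mean(), o - o.mean()
    return np.sum(fd * od) / (np.sqrt(np.sum(fd ** 2)) * np.sqrt(np.sum(od ** 2)))

def mre(forecast, observed):
    """Eq. (7): mean relative error."""
    f, o = np.asarray(forecast, float), np.asarray(observed, float)
    return np.mean((f - o) / o)
```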

Table 5
Parameter settings of EANN modeling for the reservoir inflow time series.

- Number of input genes (no_input): 6
- Maximal hidden layers (no_maxlayer): 3
- Number of neurons [genes] in each hidden layer (no_gene): 15 [4]
- Initial population of individual chromosomes (no_individual): 20
- Number of elite chromosomes (no_elite): 1
- Crossover rate (prob_crossover): 0.5
- Mutation rate (prob_mutation): 0.05
- Training epochs: 5000
- Learning rule: scaled conjugate gradient algorithm (SCGA)
- Termination criteria: (1) generations (no_generation): 10; (2) initial value of RMSE (testing): RMSE ≤ 60

Table 6
Performance of the AR(1) and ARMAX models.

- AR(1): forecasting Q(t + 1); RMSE (training) 47.2; RMSE (testing) 58.9
- ARMAX(1, 2, 1): forecasting Q(t + 1); RMSE (training) 46.8; RMSE (testing) 59.2

Note:
1. AR(1) model: Q(t + 1) = 0.6186 Q(t) + e(t + 1), where e(t + 1) is an error term at time t + 1.
2. ARMAX(1, 2, 1) model: Q(t + 1) = 0.7257 Q(t) + 0.1866 P(t) − 0.06515 P(t − 1) + e(t + 1) − 0.2239 e(t).
3. The input time series were standardized before modeling for both the AR(1) and ARMAX models.
4. A time step is defined as ten days, so "t + 1" means ten days ahead.

Table 7
Case settings with different inputs.

- Case 1: inputs = random selection of P(t), P(t − 1), P(t − 2), Q(t), Q(t − 1), Q(t − 2); forecasting Q(t + 1); initial termination criterion RMSE ≤ 60
- Case 2: inputs = P(t), P(t − 1), Q(t); forecasting Q(t + 1); initial termination criterion RMSE ≤ 60
- Case 3: inputs = Q(t); forecasting Q(t + 1); initial termination criterion RMSE ≤ 60
- Case 4: inputs = random selection of P^S(t), P^S(t − 1), P^S(t − 2), Q^S(t), Q^S(t − 1), Q^S(t − 2); forecasting Q(t + 1); initial termination criterion RMSE ≤ 60
- Case 5: inputs = P^S(t), P^S(t − 1), Q^S(t); forecasting Q(t + 1); initial termination criterion RMSE ≤ 60
- Case 6: inputs = Q^S(t); forecasting Q(t + 1); initial termination criterion RMSE ≤ 60

Note:
1. P and Q denote observed rainfall and inflow, respectively. The superscript S denotes standardization, i.e., P^S and Q^S denote standardized rainfall and inflow, respectively.
2. The time step is defined as ten days.



Case settings

To investigate the performance of the proposed EANN, we designed six different cases. Table 7 displays the case settings with different inputs and the same forecasting target and initial termination criterion. The six cases can be divided into two groups: the inputs with observed data (cases 1, 2, and 3) and those with standardized data (cases 4, 5, and 6). Since standardization of seasonal time series data is usually used to eliminate the effects of periodical trends, the purpose of the second group is to investigate whether standardization could improve the performance of the proposed EANN. The standardization of the rainfall and inflow data can be attained by Eqs. (8) and (9), respectively.

P^{S}_{ij}(t) = \frac{P_{ij}(t) - \bar{P}_{j}(t)}{\sigma_{P_{j}}(t)} \qquad (8)

Q^{S}_{ij}(t) = \frac{Q_{ij}(t) - \bar{Q}_{j}(t)}{\sigma_{Q_{j}}(t)} \qquad (9)

where P^S_{ij}(t) is the standardized rainfall of the jth 10-day period in the ith year; P_{ij}(t) the observed rainfall of the jth 10-day period in the ith year; \bar{P}_{j}(t) the average of P_{ij}(t); \sigma_{P_{j}}(t) the standard deviation of P_{ij}(t); Q^S_{ij}(t) the standardized inflow of the jth 10-day period in the ith year; Q_{ij}(t) the observed inflow of the jth 10-day period in the ith year; \bar{Q}_{j}(t) the average of Q_{ij}(t); and \sigma_{Q_{j}}(t) the standard deviation of Q_{ij}(t).
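A minimal sketch of the standardization of Eqs. (8) and (9), assuming the series starts at the first 10-day period of a year and contains only whole years (36 ten-day periods per year, which matches the 1080 training and 324 testing values used here), is given below; the function name is ours.

```python
import numpy as np

def standardize_dekadal(series, periods_per_year=36):
    """Eqs. (8)-(9): subtract the long-term mean and divide by the long-term
    standard deviation of each 10-day period of the year."""
    x = np.asarray(series, dtype=float).reshape(-1, periods_per_year)  # years x periods
    mean_j = x.mean(axis=0)           # mean of each 10-day period across years
    std_j = x.std(axis=0)             # standard deviation of each 10-day period
    return ((x - mean_j) / std_j).ravel()
```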

Results and discussion

1. Fig. 9 shows the ranges (black bars) of the optimal RMSE values on the testing data for the six cases. From the figure it can be observed that the RMSE value on the testing data in every case is smaller than that of the ARMAX(1, 2, 1) model with standardized inputs, 59.15, and the RMSE values in cases 2, 4, 5, and 6 are smaller than that of the AR(1) model with standardized inputs, 58.75. Moreover, the RMSE values in cases 4, 5, and 6 are much smaller than those in cases 1, 2, and 3. The result reveals that standardization of the input data can improve the performance of the proposed EANN. This could be because the standardized streamflow used as input provides the flow anomaly (e.g., below or above average flow) and supports information relevant to dry or wet climate scenarios; as a result, it gives better 10-day forecasts than non-standardized flow as input.

Fig. 9. Ranges of the optimal RMSE values on the testing data for the six cases, compared with the RMSE of the AR(1) model (58.75) and the ARMAX(1, 2, 1) model (59.15) with standardized inputs.

Table 8
The best optimal results of the EANN modeling in the six cases.

- Case 1: termination criterion met in generation 2; optimal inputs P: 1; Q: 0; optimal neurons in hidden layers 2-13; RMSE 44.9/58.9; CC 0.57/0.52; MRE 0.002/0.122
- Case 2: generation 3; optimal inputs P: 0, 1; Q: 0; neurons 2-15; RMSE 45.2/58.5; CC 0.56/0.54; MRE 0.005/0.056
- Case 3: generation 1; optimal inputs Q: 0; neurons 1-6; RMSE 46.4/58.8; CC 0.52/0.52; MRE 0.015/0.041
- Case 4: generation 1; optimal inputs P^S: 1; Q^S: 1; neurons 3-3; RMSE 43.6/55.3; CC 0.60/0.59; MRE 0.018/0.062
- Case 5: generation 4; optimal inputs P^S: 0, 1; Q^S: 0; neurons 10-4-11; RMSE 41.5/55.4; CC 0.65/0.62; MRE 0.012/0.088
- Case 6: generation 7; optimal inputs Q^S: 0; neurons 7-12-9; RMSE 43.5/53.1; CC 0.60/0.63; MRE 0.010/0.030

Note:
1. Q and P in the optimal inputs denote inflows and rainfalls, respectively. For example, "P: 0, 1" means the rainfall variables P(t) and P(t − 1), while "Q: 0" means the inflow variable Q(t). Q^S and P^S denote standardized inflows and rainfalls, respectively.
2. RMSE = root mean square error; CC = coefficient of correlation; MRE = mean relative error. Values are given as training/testing.


2. Table 8 shows the best optimal results of the EANN modeling in the six cases. For example, in case 6 the termination criterion is met in the seventh generation; the input is only the standardized inflow at time t; and there are three optimal hidden layers, with 7 neurons in the first hidden layer, 12 in the second, and 9 in the third. The optimal coefficients of correlation on the training and testing data sets are 0.596 and 0.633, respectively. The optimal mean relative errors (MRE) are very small on both the training and testing data sets (i.e., 0.01 and 0.03). Fig. 10(a) and (b) display the comparison between the forecasted and observed inflow time series for case 6 in Table 8. Since the forecasting performance is our focus, we pay more attention to the testing results. From Fig. 10(b) it can be seen that the proposed EANN performs well on the testing data set in the forecasting of low inflows. However, most of the high-inflow forecasts, especially for streamflow above 150 m³/s, show underestimation. This is mainly because there are only a few typhoon events with limited observed high-flow data available to model and train the constructed networks.

3. The results in Table 8 also show that case 5 (including the rainfall data) has better performance than case 6 (excluding the rainfall data) in the training phase; however, the reverse result is obtained in the testing phase. Since the quick response to heavy precipitation occurs within a matter of a few hours for a catchment of 763.4 km², we believe the rainfall information in these study cases could be regarded as a disturbance in forecasting the 10-day-ahead reservoir inflow. It can also be seen that the RMSE of AR(1) is smaller than that of ARMAX(1, 2, 1).

Fig. 10. Comparison between the forecasted and observed inflow time series for case 6: (a) training; (b) testing.

Table 9
Percentage of improvement of the RMSE values on the training and testing data of EANN modeling, compared with the ARMAX(1, 2, 1) and AR(1) models.

Compared with ARMAX(1, 2, 1) (RMSE 46.8 training, 59.2 testing):
- EANN, case 4: RMSE 43.6 training, 55.3 testing; improvement 6.8% (training), 6.6% (testing)
- EANN, case 5: RMSE 41.5 training, 55.4 testing; improvement 11.3% (training), 6.4% (testing)

Compared with AR(1) (RMSE 47.2 training, 58.9 testing):
- EANN, case 6: RMSE 43.5 training, 53.1 testing; improvement 7.8% (training), 9.9% (testing)


4. The percentages of improvement of the RMSE values on the training and testing data of the EANN, compared with the ARMAX(1, 2, 1) and AR(1) models with standardized inputs, are shown in Table 9. Since the cases with standardized input data have better performance, cases 4, 5, and 6 are selected for comparison with the above two models. Compared with the ARMAX(1, 2, 1) model, the percentages of improvement in terms of the RMSE values on the training and testing data are 6.8% and 6.6% for case 4 and 11.3% and 6.4% for case 5, respectively. Compared with the AR(1) model, the percentages of improvement in terms of the RMSE on the training and testing data for case 6 are 7.8% and 9.9%, respectively, which is a valuable and by no means easy accomplishment for long-term hydrological time series forecasting. The results reveal the admirable effectiveness of the proposed EANN and show that standardization is beneficial to modeling seasonal time series.

Conclusions

To pursue adaptivity and to increase the efficiency of optimization systems, there has been increasing interest in a new general framework for adaptive systems, namely evolutionary artificial neural networks, in which the modeling potential of artificial neural networks is matched with the adaptation properties of evolutionary algorithms. The need for adaptation arises from several real-world applications in non-stationary environments, such as non-linear control tasks and time series forecasting. This paper proposes a novel evolutionary neural network for hydrological time series forecasting. The excellent performance in forecasting the Mackey–Glass chaotic time series shows that the proposed EANN concurrently possesses efficiency, effectiveness, and robustness. Furthermore, the forecasting of 10-day reservoir inflows again reveals the excellent effectiveness of the proposed EANN, and standardization is beneficial to modeling seasonal time series.

The proposed EANN in this study consists of the following features:

1. The optimal architecture of a feedforward ANN, including the inputs, hidden layers, and neurons in each hidden layer, can be automatically searched. The automatic search algorithm improves on the drawbacks of the conventional approach, which requires predefining the network architecture and involves a tedious trial-and-error process.

2. Binary direct encoding and indirect encoding are hybridized to encode the important parameters of the network architecture into an artificial chromosome. One part of the chromosome, the genes of the inputs, is encoded by a direct encoding scheme, while the other part, the neurons in the hidden layers, is encoded by an indirect encoding scheme. A relatively short chromosome length and a simple encoding scheme are the advantages of such a hybrid encoding scheme. The proposed EANN, which allows evolving all possible input variables, is better than the conventional evolutionary design of ANNs, which only allows evolving the number of inputs or keeps the inputs unchanged.

3. Since the number of hidden layers is randomly initialized, the lengths of the chromosomes in each generation are not the same. After crossover between two parent chromosomes with different lengths, the lengths of the offspring chromosomes also differ. In comparison with some conventional evolutionary designs of ANNs that only allow mutation, the proposed EANN can perform crossover on chromosomes of non-constant lengths in addition to mutation.

Acknowledgments

This study was partially supported by the National Science Council, ROC (Grant No. NSC 97-2313-B-002-013-MY3). In addition, the authors are indebted to the Editors and Reviewers for their valuable comments and suggestions.

References

Abraham, A., 2004. Meta learning evolutionary artificial neural networks. Neurocomputing 56, 1–38.

Abrahart, R.J., See, L., Kneale, P.E., 1998. New tools for neurohydrologists: using network pruning and model breeding algorithms to discover optimum inputs and architecture. In: Proceedings of the 3rd International Conference on Geocomputation. University of Bristol.

Angeline, P.J., Saunders, G.B., Pollack, J.B., 1994. An evolutionary algorithm that evolves recurrent neural networks. IEEE Transactions on Neural Networks 5 (1), 54–65.

Castillo, P.A., Merelo, J.J., Arenas, M.G., Romero, G., 2007. Comparing evolutionary hybrid systems for design and optimization of multilayer perceptron structure along training parameters. Information Sciences: An International Journal 177 (14), 2884–2905.

Chaves, P., Chang, F.J., 2008. Intelligent reservoir operation system based on evolving artificial neural networks. Advances in Water Resources 31, 926–936.

Chiang, Y.M., Chang, F.J., Jou, B.J.D., Lin, P.F., 2007. Dynamic ANN for precipitation estimation and forecasting from radar observations. Journal of Hydrology 334, 250–261.

Chiang, Y.M., Chang, L.C., Chang, F.J., 2004. Comparison of static-feedforward and dynamic-feedback neural networks for rainfall–runoff modeling. Journal of Hydrology 290, 297–311.

Chang, F.J., Chang, L.C., Wang, Y.S., 2007. Enforced self-organizing map neural networks for river flood forecasting. Hydrological Processes 21, 741–749.

Chang, Y.T., Chang, L.C., Chang, F.J., 2005. Intelligent control for modeling of real-time reservoir operation: part II ANN with operating curves. Hydrological Processes 19, 1431–1444.

Cortez, P., Machado, J., Neves, J., 1996. An evolutionary artificial neural network time series forecasting system. In: Proceedings of the IASTED International Conference on Artificial Intelligence, Expert Systems and Neural Networks, pp. 278–281.

Dawson, C.W., See, L.M., Abrahart, R., Heppenstall, A.J., 2006. Symbiotic adaptive neuro-evolution applied to rainfall–runoff modeling in northern England. Neural Networks 19, 236–247.

Ham, F.M., Kostanic, I., 2001. Principles of Neurocomputing for Science and Engineering. McGraw-Hill.

Holland, J.H., 1975. Adaptation in Natural and Artificial Systems, second ed. Massachusetts Institute of Technology, Cambridge.

Hsu, K.L., Gupta, H.V., Sorooshian, S., 1995. Artificial neural network modeling of the rainfall–runoff process. Water Resources Research 31 (10), 2517–2530.

Karunasinghe, D.S.K., Liong, S.Y., 2006. Chaotic time series prediction with a global model: artificial neural network. Journal of Hydrology 323, 92–105.

Kim, S., Kim, H.S., 2008. Neural networks and genetic algorithm approach for nonlinear evaporation and evapotranspiration modeling. Journal of Hydrology 351, 299–317.

Kwok, T.Y., Yeung, D.Y., 1997. Constructive algorithm for structure learning in feedforward neural networks for regression problems. IEEE Transactions on Neural Networks 3, 630–645.

Leahy, P., Kiely, G., Corcoran, G., 2008. Structural optimisation and input selection of an artificial neural network for river level prediction. Journal of Hydrology 355 (1), 192–201.

Mackey, M.C., Glass, L., 1977. Oscillation and chaos in physiological control systems. Science 197, 287–289.

Moller, M.F., 1993. A scaled conjugate gradient algorithm for fast supervised learning. Neural Networks 6, 525–533.

Moriarty, D.E., Miikkulainen, R., 1998. Forming neural networks through efficient and adaptive coevolution. Evolutionary Computation 5, 373–399.

Sajikumar, N., Thandaveswara, B.S., 1999. A non-linear rainfall–runoff model using an artificial neural network. Journal of Hydrology 216, 32–55.

Sahoo, G.B., Ray, C., 2006. Flow forecasting for a Hawaii stream using rating curves and neural networks. Journal of Hydrology 317, 63–80.

Yao, X., 1993. A review of evolutionary artificial neural networks. International Journal of Intelligent Systems 8, 539–567.

Yao, X., 1999. Evolving artificial neural networks. Proceedings of the IEEE 87 (9), 1423–1447.
