Age-Based Survival Selection - Evolutionary Constructive and Pruning Algorithm

Chapter 4 Network-Based Structural Learning for Prediction

4.2 Evolutionary Constructive and Pruning Algorithm

4.2.5 Age-Based Survival Selection

After network crossover, network mutation, and CBP are completed, the individuals in the next generation are chosen through survival selection. If a conventional survival selection is adopted, the evolved NNs tend toward fully connected topologies because network mutation keeps adding inputs to the hidden neurons. As a result, hardware implementation costs increase and the generalization capability of the evolved NNs is reduced. To avoid this problem, we propose a different survival selection method, age-based survival selection (ABSS), which favors younger NNs with partial rather than full connections for the next generation.

ABSS is performed in two steps. The first step uses traditional tournament selection to choose N_p candidates for the next stage. The age of an NN is defined as the number of generations it has survived in the population: a newborn NN has an age of one, and its age increases by one each time it survives to the next generation. The N_p candidates may therefore have different ages. The second step deletes older NNs from the N_p candidates according to the aging index, defined as follows:

1 2

where Age_j is the age of the jth NN. Selection proceeds by generating a uniform random number r in the range [0, 1]. If A_j > r, the jth NN is deleted and replaced by a newborn NN produced by Step 1; otherwise, the jth NN is retained in the population. As a result, the population size N_p is unchanged after ABSS, and the average age of the NNs tends to be lower, which discourages the evolved NNs from converging to a fully connected topology.
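The two steps of ABSS can be sketched in Python as follows. This is a minimal illustration, not the implementation used in this work: the `Individual` class, the binary tournament size, the `make_newborn` callable, and especially `placeholder_aging_index` are hypothetical. The placeholder is a stand-in for the aging index A_j and is not the formula defined in the text; it only provides a value that grows with age. One random number r is drawn per candidate, which is also an assumption.

```python
import random
from dataclasses import dataclass


@dataclass
class Individual:
    fitness: float  # lower is better, e.g. RMSE (assumption)
    age: int = 1    # a newborn NN has an age of one


def placeholder_aging_index(age, max_age=10):
    # HYPOTHETICAL stand-in for A_j, NOT the formula from the text:
    # grows linearly with age and saturates at 1.
    return min(1.0, (age - 1) / max_age)


def tournament(population, n_select, rng, k=2):
    # Step 1: each slot goes to the fittest of k randomly drawn entrants.
    return [min(rng.sample(population, k), key=lambda ind: ind.fitness)
            for _ in range(n_select)]


def abss(population, n_p, make_newborn, rng,
         aging_index=placeholder_aging_index):
    # Step 2: replace candidate j by a newborn NN when A_j > r.
    next_generation = []
    for candidate in tournament(population, n_p, rng):
        r = rng.random()  # uniform random number in [0, 1)
        if aging_index(candidate.age) > r:
            next_generation.append(make_newborn())
        else:
            # The survivor is copied and its age increases by one.
            next_generation.append(
                Individual(candidate.fitness, candidate.age + 1))
    return next_generation
```

Because every deleted candidate is replaced by exactly one newborn, the returned list always has N_p entries, which mirrors the statement that the population size is unchanged after ABSS.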

In summary, the network crossover operator constructs an NN by adding hidden neurons so that the NN has more processing ability to approximate the target function accurately. The network mutation operator adds one connection from an input to a hidden neuron so that the hidden neuron can process more input information. CBP prunes the worse hidden neurons from an NN to prevent overfitting of the training data. ABSS deletes older NNs that are potentially fully connected. Thus, network crossover and mutation direct the evolution of NNs in a constructive manner that improves their ability to approximate the true function, whereas CBP and ABSS direct the evolution in a destructive manner that improves generalization while reducing hardware requirements.

4.3 Numerical Results

In this section, we demonstrate the performance of the proposed algorithm on three time series prediction problems: Mackey-Glass, sunspots, and vehicle count. The first time series is generated from the Mackey-Glass differential equation, the second is the recorded sunspot series, and the third is the hourly vehicle count for the Monash Freeway outside Melbourne in Victoria, Australia, beginning in August 1995. To make a fair comparison with previous works, the first problem adopts the root mean squared error (RMSE) as the fitness, which is calculated as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\bigl(d_i - y_i\bigr)^2}$$

where N is the number of data points, d_i is the desired (observed) value, and y_i is the NN output. The second and third problems are compared with previous works using the normalized mean squared error (NMSE) and the mean absolute percentage error (MAPE). The NMSE is defined as the ratio of the mean squared error to the variance of the time series:

$$\mathrm{NMSE} = \frac{1}{\sigma^2 N}\sum_{i=1}^{N}\bigl(d_i - y_i\bigr)^2$$

where σ is the standard deviation of the time series. The MAPE is determined as

$$\mathrm{MAPE} = \frac{100\%}{N}\sum_{i=1}^{N}\left|\frac{d_i - y_i}{d_i}\right|$$

Furthermore, the number of hidden neurons Nh and the number of connections Nc are recorded to observe the evolutionary progress. The following parameters are used in each problem.

1) The population size N_p is 30.

2) The crossover probability p_c is 0.8.

3) The mutation probability p_m is 0.6.

4) The value of ϕ for training NN by BP is 15.

5) The maximum number of generations G is 500.

As described in Section 3.5, N_p, G, and ϕ affect the computational complexity of ECPA. A larger N_p reduces the effect of genetic drift, a larger G gives more chances to find better NNs, and a larger ϕ yields higher prediction accuracy. However, larger values of N_p, G, and ϕ also lead to longer computation times. To select suitable values, N_p is set to 30 following the suggestion in [109]. To select appropriate values of ϕ and G, preliminary runs were performed with ϕ chosen from 5, 10, 15, 20, 25, and 30 and G chosen from 300, 400, 500, and 600. As a result, G = 500 and ϕ = 15 were adopted in the following experiments because they give sufficient prediction accuracy with acceptable computation time. Because the parameters were chosen from a few preliminary runs, these values are not claimed to be optimal. Setting p_c = 0.8 and p_m = 0.6 makes an increase in the number of hidden neurons more likely than an increase in the number of connections. It was expected that structures with more hidden neurons would be found first and that these structures would then be pruned. We evaluate the performance of ECPA on the three examples over 10 independent runs.
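The error measures used in this section (RMSE, NMSE, and MAPE) can be sketched in Python as follows. This is a minimal illustration under the standard definitions; the function and argument names are hypothetical, the variance in NMSE is taken as the population variance of the observed series, and MAPE is expressed in percent. These choices are assumptions rather than details stated in the text.

```python
import math


def rmse(actual, predicted):
    # Root mean squared error between observed and predicted values.
    n = len(actual)
    return math.sqrt(
        sum((d - y) ** 2 for d, y in zip(actual, predicted)) / n)


def nmse(actual, predicted):
    # Mean squared error divided by the variance of the observed series
    # (population variance, i.e. divided by N: an assumption).
    n = len(actual)
    mean = sum(actual) / n
    variance = sum((d - mean) ** 2 for d in actual) / n
    mse = sum((d - y) ** 2 for d, y in zip(actual, predicted)) / n
    return mse / variance


def mape(actual, predicted):
    # Mean absolute percentage error in percent; observed values
    # must be nonzero.
    n = len(actual)
    return 100.0 / n * sum(
        abs((d - y) / d) for d, y in zip(actual, predicted))
```

For the toy series `actual = [1, 2, 3, 4]` and `predicted = [1, 2, 3, 6]`, only the last point is wrong by 2, so the squared errors sum to 4 and the RMSE is 1.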

4.3.1 Example 1: Chaotic Time Series Prediction

The Mackey-Glass time series prediction is recognized as a benchmark problem in the area of NNs. This chaotic time series prediction was considered to be a suitable way to evaluate the performance of the proposed ECPA. The Mackey-Glass time series is generated from the following delay differential equation:

$$\frac{dx(t)}{dt} = \frac{0.2\,x(t-\tau)}{1 + x^{10}(t-\tau)} - 0.1\,x(t)$$

where τ = 17 and x(0) = 1.2 in the simulation. The fourth-order Runge-Kutta method is used to generate 1,000 data points ranging from t = 118 to 1,117. The task involves predicting the value of x(t+6) from the input vector [x(t−18) x(t−12) x(t−6) x(t)] for any t. Therefore, the input-output data pairs for prediction are

[x(t−18), x(t−12), x(t−6), x(t); x(t+6)]

where the first 500 data pairs are used as the training set and the remaining 500 data pairs as the testing set.
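The data preparation described above can be sketched in Python as follows. This is a rough illustration under several assumptions that the text does not state: the standard benchmark coefficients 0.2 and 0.1, an integration step h = 0.1, a zero history x(t) = 0 for t < 0, and a simplified fourth-order Runge-Kutta step that holds the delayed term fixed within each step. The function names are hypothetical.

```python
def mackey_glass(n_steps, tau=17.0, x0=1.2, a=0.2, b=0.1, h=0.1):
    # Returns samples x(0), x(1), ..., x(n_steps) at integer times.
    per_unit = round(1 / h)   # integration steps per time unit
    delay = round(tau / h)    # delay expressed in integration steps

    def f(x, x_tau):
        return a * x_tau / (1.0 + x_tau ** 10) - b * x

    xs = [x0]
    for i in range(n_steps * per_unit):
        # Assumed history: x(t) = 0 for t < 0.
        x_tau = xs[i - delay] if i >= delay else 0.0
        x = xs[i]
        # RK4 step with the delayed value held constant (simplification).
        k1 = f(x, x_tau)
        k2 = f(x + 0.5 * h * k1, x_tau)
        k3 = f(x + 0.5 * h * k2, x_tau)
        k4 = f(x + h * k3, x_tau)
        xs.append(x + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0)
    return xs[::per_unit]


def make_pairs(series, t_start=118, t_end=1117):
    # Build ([x(t-18), x(t-12), x(t-6), x(t)], x(t+6)) for each t.
    return [([series[t - 18], series[t - 12], series[t - 6], series[t]],
             series[t + 6])
            for t in range(t_start, t_end + 1)]


series = mackey_glass(1123)           # need x(t+6) up to t = 1117
pairs = make_pairs(series)            # 1,000 input-output pairs
train, test = pairs[:500], pairs[500:]
```

A different step size or history function changes the generated values slightly, so results from this sketch are not expected to match the reported errors exactly.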

The evolutionary progress of NNs for the Mackey-Glass time series prediction problem is illustrated in Fig. 4.5. The top panel of Fig. 4.5 shows the decrease in RMSE as the NNs evolve. The middle and bottom panels present N_h and N_c, respectively, and show the structural evolution of the NNs. Fig. 4.6 illustrates how the topologies of the NNs evolve in selected generations. The input vector [u(4) u(3) u(2) u(1)] represents [x(t−18) x(t−12) x(t−6) x(t)], and the output y represents x(t+6). Blue lines represent positive-valued weights, and red lines represent negative-valued weights. The widths of the lines indicate the relative strengths of the weights. The NN produced in the 1st generation starts with two neurons and a few connections from the inputs and thus has limited information processing ability for the task. As evolution progresses, N_h gradually increases to 38 by the 25th generation, is reduced to 28 by the 30th generation, and then increases to 40 by the 490th generation. The NNs grow rapidly, but the growth is not monotonic because of the use of CBP and ABSS. Note that the resulting NN does not have a fully connected topology; fewer than 85% of the synapses are connected.

Many approaches have been developed to design both the architecture and the weights of NNs for the same prediction problem. Table 4.1 compares the experimental results of the proposed algorithm with those of other algorithms. The best (i.e., lowest) RMSE, N_h, and N_c values among the various approaches are shown in boldface type, and the RMSE, N_h, and N_c of ECPA are averages over 10 independent runs. As shown in the table, although ECPA achieved a larger RMSE than Du and Zhang [110] on the training set, it obtained a lower RMSE than the other methods on the testing data. Interestingly, ECPA obtained a lower RMSE on the testing data than on the training data in this experiment; this phenomenon has been observed previously [111]. In terms of the average number of hidden neurons over 10 independent runs, ECPA obtained a lower N_h than Du and Zhang [110] and Harpham and Dawson [112]. Although ECPA obtained a higher N_h than Rojas et al. [113], Chen et al. [114], and Cho and Wang [115], it achieved a lower RMSE.

On average, ECPA evolved NNs with a training RMSE, testing RMSE, N_h, and N_c of 6.76×10⁻⁴, 6.30×10⁻⁴, 40.5, and 203.2, respectively. The evolved NNs possess a partially connected topology; these results indicate that ECPA can evolve NNs with a lower testing RMSE and a more compact structure than the other methods.

Fig. 4.5 Evolution progress for Example 1.

Fig. 4.6 Evolved NNs for Example 1. Panels show the 1st, 5th, 10th, 25th, 30th, 40th, 50th, 350th, and 490th generations.

Table 4.1 Prediction results for Example 1.

| Method | RMSE, t = 6 (Train) | RMSE, t = 6 (Test) | RMSE, t = 84 (First 500 points) | RMSE, t = 84 (Last 500 points) | N_h | N_c |
|---|---|---|---|---|---|---|
| Du and Zhang [110] | 2.87×10⁻⁴ | 7.67×10⁻⁴ | 1.93×10⁻² | 2.07×10⁻² | 294 | --- |
| Harpham and Dawson [112] | --- | 1.50×10⁻³ | --- | --- | 116 | --- |
| Rojas et al. [113] | 2.87×10⁻³ | --- | 2.63×10⁻² | --- | 12 | --- |
| Chen et al. [114] | 3.30×10⁻³ | 3.60×10⁻³ | --- | --- | 10 | 110 |
| Cho and Wang [115] | 9.60×10⁻³ | 1.14×10⁻² | --- | --- | 23 | --- |
| ECPA | 6.76×10⁻⁴ | 6.30×10⁻⁴ | 6.20×10⁻³ | 3.10×10⁻³ | 40.5 | 203.2 |

The result for the one-step prediction of x(t+6) is shown in Fig. 4.7. In addition to the one-step prediction of x(t+6), the evolved NN was applied to another general testing case: the multiple-step prediction of x(t+84) [111]. To perform a multiple-step prediction, the proposed algorithm iteratively predicts x(t+6), x(t+12), and so on until it reaches x(t+84) after 14 such iterations. The result for the multiple-step prediction of x(t+84) is shown in Fig. 4.8, where the NN was evolved using the training data for the one-step prediction of x(t+6). Compared with Fig. 4.7, the prediction error on the testing data in Fig. 4.8 increases from 6.30×10⁻⁴ to 3.10×10⁻³ because multiple-step prediction is more complex than one-step prediction. As shown in Table 4.1, the prediction errors for the multiple-step prediction of x(t+84) on the first and last 500 points were 6.20×10⁻³ and 3.10×10⁻³, respectively. Both values are lower than the corresponding errors reported for the other methods in Table 4.1, so ECPA was superior in the multiple-step prediction.
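The iterative procedure for multiple-step prediction can be sketched in Python as follows. This is a minimal illustration that assumes each predicted value is fed back as the newest input while the oldest input is dropped; the text does not spell out the feedback scheme, and `predict` is a hypothetical callable standing in for the evolved NN.

```python
def multi_step_predict(predict, window, n_iter=14):
    # window holds [x(t-18), x(t-12), x(t-6), x(t)]; each iteration
    # advances the horizon by 6, so 14 iterations reach x(t+84).
    window = list(window)
    for _ in range(n_iter):
        next_value = predict(window)
        # Slide the window: drop the oldest input, append the prediction.
        window = window[1:] + [next_value]
    return window[-1]


# Toy stand-in for the evolved NN: predicts the last input plus one.
toy_predict = lambda w: w[-1] + 1
result = multi_step_predict(toy_predict, [0, 1, 2, 3])
```

With the toy predictor, each iteration adds one to the last value, so 14 iterations starting from 3 give 17. With a real NN, errors in early predictions propagate into later inputs.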

Fig. 4.7 Prediction error for Example 1.

Fig. 4.8 Multiple-step prediction error for Example 1.

From the document: Neural Networks with Evolutionary Structural Learning Capability and Their Application to Prediction (pp. 83-91)