

Accordingly, the use of (3.17), (3.23), (3.24) and (3.25) in (3.22) yields the weight update rule (3.27). Equation (3.27) is the formula that we use to adjust the synaptic weights WH. By using (3.21) and (3.27), we can adjust the synaptic weights of the network.

3.3. Dynamical Optimal Training via Lyapunov’s Method

In control systems, we know that the Lyapunov function can be used to analyze the stability of a system. The basic philosophy of Lyapunov's direct method is the mathematical extension of a fundamental physical observation: if the total energy of a mechanical (or electrical) system is continuously dissipated, then the system, whether linear or nonlinear, must eventually settle down to an equilibrium point [19]. In the same spirit, we define the Lyapunov function as

V = J    (3.28)

where J is the total square error defined in (3.11). Equation (3.28) is positive definite, which means that V = J > 0. The difference of the Lyapunov function is

∆V = Jt+1 − Jt    (3.29)

where Jt+1 expresses the total square error of the (t+1)th iteration. If Equation (3.29) is negative, the system is guaranteed to be stable. Then, for ∆V < 0 we have

Jt+1 − Jt < 0    (3.30)

Since Jt+1 and Jt both depend on the learning rate β(t) through the weight updates (3.21) and (3.27), the difference Jt+1 − Jt in (3.31) is a nonlinear function of the single parameter β(t), which we denote G(β(t)). Equation (3.31) can be rewritten as

Jt+1 − Jt = G(β(t))    (3.32)

If the parameter β(t) satisfies Jt+1 − Jt = G(β(t)) < 0, then the set of such β(t) is the stable range of the learning rate of the system at the tth iteration. In this stable range, if βopt(t) is such that Jt+1 − Jt is at its minimum, we call βopt(t) the optimal learning rate at the tth iteration. The optimal learning rate βopt(t) will not only guarantee the stability of the training process, but also give the fastest speed of convergence.

In order to find the optimal learning rate βopt(t) from the function Jt+1 − Jt analytically, we need an explicit form of Jt+1 − Jt, like the simple form of Jt+1 − Jt (for a simple two-layer neural network) in [13]. However, the function Jt+1 − Jt here is a very complicated nonlinear algebraic equation, so it is nearly impossible to obtain a simple explicit form. Nevertheless, we have defined the function Jt+1 − Jt in (3.31) by progressively evolving the equations from the beginning of this chapter. Therefore we can define (3.31) implicitly in Matlab code. In this case, we can apply the Matlab routine fminbnd to find the optimal learning rate βopt(t) from Jt+1 − Jt. The calling sequence of fminbnd is:

FMINBND Scalar bounded nonlinear function minimization.

X = FMINBND(FUN,x1,x2) attempts to find a local minimizer X of the function FUN in the interval x1 <= X <= x2. FUN accepts scalar input X and returns a scalar function value F evaluated at X.

The Matlab routine “fminbnd” finds a local minimizer βopt of the function G(β), which has only one independent variable β, in a given interval. So we have to specify an interval when we use this routine. For the two examples in Chapter 4, we set the interval to [0.01, 100], which restricts the allowable learning rates to between 0.01 and 100. Note that for simplicity, we assume that βH(t) = βY(t) = β(t) in (3.31), so there is only one variable in (3.31) and the routine “fminbnd” can be used to find the optimal learning rate. Alternatively, we could find the learning rates βH(t) and βY(t) separately by using another Matlab routine, “fminunc”. However, “fminunc” can only search for the minimizers βH(t) and βY(t) around one specific starting point, not over an interval. Therefore it is very limited in application and is not appropriate for this case.
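To make this concrete, the following is a minimal Matlab sketch of how fminbnd can be called for this step. The helper trialError (which applies the weight updates (3.21) and (3.27) with a given learning rate and returns the resulting total square error) and the variables WH, WY, X, D and Jt are assumptions for illustration, not code from this thesis.

% Minimal sketch: search the allowable interval [0.01, 100] for beta_opt(t).
% trialError is a hypothetical helper that applies (3.21) and (3.27) with the
% given beta and returns the new total square error J(t+1); Jt is the current J.
G = @(beta) trialError(WH, WY, X, D, beta) - Jt;   % G(beta) = Jt+1 - Jt, as in (3.32)
[beta_opt, Gmin] = fminbnd(G, 0.01, 100);          % bounded scalar minimization
if Gmin >= 0                                       % require G(beta_opt) < 0 so that dV < 0
    error('No learning rate in [0.01, 100] decreases the total square error.');
end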

Algorithm 1: Dynamic optimal training algorithm for a three-layer neural network

Step 0: The following WH(t), WY(t), G(β(t)), βopt(t) and Y(t) denote their respective

values at iteration t.

Step 1: Given the initial weighting matrices WH(1) and WY(1), the training input matrix X, and the desired output matrix D, find the actual output matrix of the network Y(1) and the nonlinear function G(β(1)).

Step 2: Use the Matlab routine “fminbnd” with the interval [0.01, 100] to minimize the nonlinear function G(β(1)) and find the optimal learning rate βopt(1).

Step 3: Iteration count t=1. Start the back propagation training process.

Step 4: Check whether the desired output matrix D and the actual output matrix of the network Y(t) are close enough. If yes, GOTO Step 9.

Step 5: Update the synaptic weights matrix to yield WH(t+1) and WY(t+1) by using (3.27) and (3.21) respectively.

Step 6: Find the actual output matrix of the network Y(t+1) and the nonlinear function

G(β(t+1)).

Step 7: Use the Matlab routine “fminbnd” to find the optimal learning rate βopt(t+1) for the next iteration.

Step 8: t=t+1 and GOTO Step 4.

Step 9: End.
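For reference, Algorithm 1 can be sketched in Matlab roughly as follows. This is only a minimal sketch under assumptions: forwardPass, trialError and updateWeights are hypothetical helpers (computing Y and J, evaluating J after a trial update with a given learning rate, and applying (3.27) and (3.21), respectively), and the stopping tolerance and iteration limit are arbitrary choices.

tol = 1e-3;  maxIter = 15000;                         % assumed stopping criteria
[Y, J] = forwardPass(WH, WY, X);                      % Step 1: initial output and error
t = 1;                                                % Step 3
while norm(D - Y, 'fro') > tol && t <= maxIter        % Step 4: close enough?
    G = @(beta) trialError(WH, WY, X, D, beta) - J;   % nonlinear function G(beta(t))
    beta_opt = fminbnd(G, 0.01, 100);                 % Steps 2 and 7: optimal learning rate
    [WH, WY] = updateWeights(WH, WY, X, D, beta_opt); % Step 5: apply (3.27) and (3.21)
    [Y, J] = forwardPass(WH, WY, X);                  % Step 6: new output and error
    t = t + 1;                                        % Step 8
end                                                   % Step 9: end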

CHAPTER 4 Experimental Results

In this chapter, the classification problems of XOR and the Iris data set will be solved via the new dynamical optimal training algorithm of Chapter 3. The training results will be compared with those of conventional BP training using fixed learning rates.

4.1. Example 1: The XOR Problem

The task is to train the network to produce the Boolean “Exclusive OR” (XOR) function of two variables. The XOR operator yields true if exactly one (but not both) of two conditions is true; otherwise it yields false. We need only consider the four input data (0,0), (0,1), (1,1), and (1,0) in this problem. The first and third input patterns are in class 0, which means the XOR operator yields “False” when the input data is (0,0) or (1,1). The distribution of the input data is shown in Figure 4-1. Because the XOR function has two variables, we choose an input layer with two nodes and an output layer with one node. We then use one hidden layer with two neurons to solve the XOR problem [14], as shown in Figure 4-2. The architecture of the neural network is thus a 2-2-1 network.
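For illustration, the four XOR training patterns and their targets can be written in Matlab as below (a minimal sketch; the variable names X and D simply follow the notation of Algorithm 1).

X = [0 0 1 1;     % input x1 of the four patterns
     0 1 0 1];    % input x2 of the four patterns
D = [0 1 1 0];    % desired XOR outputs: class 0 for (0,0) and (1,1), class 1 otherwise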

Figure 4-1. The distribution of XOR input data sets

Figure 4-2. The neural network for solving XOR (inputs x1, x2; output y; weights W1-W6)

First, we use the standard BP algorithm with fixed learning rates (β = 1.5, 0.9, 0.5 and 0.1) to train the XOR network, and the training results are shown in Figures 4-3-1 ~ 4-3-4. The result of using the BP algorithm with dynamic optimal learning rates to train the XOR network is shown in Figure 4-4.

Figure 4-3-1. The square error J of the standard BPA with fixed β = 1.5

Figure 4-3-2. The square error J of the standard BPA with fixed β = 0.9

Figure 4-3-3. The square error J of the standard BPA with fixed β = 0.5

Figure 4-3-4. The square error J of the standard BPA with fixed β = 0.1

Figure 4-4. The square error J of the BPA with dynamic optimal training

The following Figure 4-5 shows the plot of (3.32) for -1 < β < 100, which is G(β) = ∆J(β) = Jt+1 − Jt, at iteration count t = 1. The Matlab routine fminbnd will be invoked to find βopt with the constraint that G(βopt) < 0 with maximum absolute value. This βopt is the learning rate for iteration count t = 2. The βopt is found to be 7.2572 for iteration count 2. The dynamic learning rate of every iteration is shown in Figure 4-6.

Figure 4-5. The difference equation G(β(n)) and βopt = 7.2572

Figure 4-6. The dynamic learning rates of every iteration

The comparison of these cases is shown in Figure 4-7, where it is obvious that our dynamical optimal training yields the best training results in the minimum number of epochs.

Figure 4-7. Training errors of dynamic optimal learning rates and fixed learning rates

Table 4.1 shows the training result via dynamical optimal training for the XOR problem.

Table 4.1. The training result for XOR using dynamical optimal training

                                          Iterations
Training Results                      1000       5000       10000      15000
W1 (after training)                   6.5500     7.7097     8.8191     8.4681
W2 (after training)                   6.5652     7.7145     8.1921     8.4703
W3 (after training)                   0.8591     0.9265     0.9473     0.9573
W4 (after training)                   0.8592     0.9265     0.9473     0.9573
W5 (after training)                  14.9536    26.2062    33.0393    37.8155
W6 (after training)                 -19.0670   -32.9550   -41.3692   -47.2513
Actual Output Y for (x1, x2) = (0,0)  0.1134     0.0331     0.0153     0.0089
Actual Output Y for (x1, x2) = (0,1)  0.8232     0.9300     0.9616     0.9750
Actual Output Y for (x1, x2) = (1,0)  0.8232     0.9300     0.9616     0.9750
Actual Output Y for (x1, x2) = (1,1)  0.2291     0.0925     0.0511     0.0334
J                                     0.0639     0.0097     0.0029     0.0012

Table 4.2 shows the training result via the standard BP algorithm with fixed β = 0.9 for the XOR problem.

Table 4.2. The training result for XOR using fixed learning rate β = 0.9

                                          Iterations
Training Results                      1000       5000       10000      15000
W1 (after training)                   4.7659     7.2154     7.6576     7.8631
W2 (after training)                   4.8474     7.2228     7.6624     7.8670
W3 (after training)                   0.7199     0.8996     0.9234     0.9331
W4 (after training)                   0.7228     0.8996     0.9234     0.9331
W5 (after training)                   6.2435    20.6467    25.5617    28.2288
W6 (after training)                  -8.1214   -26.1034   -32.1610   -35.4456
Actual Output Y for (x1, x2) = (0,0)  0.2811     0.0613     0.0356     0.0264
Actual Output Y for (x1, x2) = (0,1)  0.6742     0.8885     0.9263     0.9415
Actual Output Y for (x1, x2) = (1,0)  0.6745     0.8885     0.9263     0.9415
Actual Output Y for (x1, x2) = (1,1)  0.4192     0.1479     0.0982     0.0781
J                                     0.2334     0.0252     0.0109     0.0068

Comparing Table 4.1 with Table 4.2, we can see that dynamical optimal training converges faster and yields better results than the fixed learning rate approach.

Now, we will use another method, the back-propagation algorithm with momentum, to solve the XOR problem again, and then compare its training errors with those of dynamic optimal training to see whether dynamic optimal training is indeed better. The back-propagation algorithm with momentum modifies Equation (2.13) by adding a momentum term α∆W(t−1) to the weight change, where α is a positive number called the momentum constant, usually in the range [0, 1). The training results of the BPA with momentum for the XOR problem are shown in Figures 4-8-1 ~ 4-8-3. The comparison of these cases is shown in Figure 4-9.

In Figure 4-9, we can see that some training results of the BPA with momentum are as good as those of dynamic optimal training, but most results of the BPA with momentum are unpredictable. The training results still depend on the chosen learning rate and momentum constant.
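As an illustration of the momentum term, one generic weight update with momentum can be sketched in Matlab as follows. This is an assumption-laden sketch: W, dJdW and dW_prev stand for a weight matrix, the current gradient of J with respect to W, and the previous weight change; they are not variables from the thesis code.

beta  = 0.9;                                % learning rate
alpha = 0.7;                                % momentum constant, chosen in [0, 1)
dW      = -beta * dJdW + alpha * dW_prev;   % gradient step plus momentum term
W       = W + dW;                           % apply the weight change
dW_prev = dW;                               % remember the change for the next iteration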

Figure 4-8-1. The square error J of the BPA with variant momentum (β = 0.9)

Figure 4-8-2. The square error J of the BPA with variant momentum (β = 0.5)

Figure 4-8-3. The square error J of the BPA with variant momentum (β = 0.1)

Figure 4-9. Total square errors of dynamic training and the BPA with different learning rates and momentum

4.2. Example 2: Classification of Iris Data Set

In this example, we will use the same kind of neural network as before to classify the Iris data set [15], [16]. The Iris flower has three subspecies, and the classification depends on the length and width of the petal and the length and width of the sepal. The total Iris data are shown in Figures 4-10-1 and 4-10-2, and the training data set, consisting of the first 75 samples of the total data, is shown in Figures 4-11-1 and 4-11-2. The Iris data samples are available in [20]. There are 150 samples of the three species of Iris flowers in this data set. We choose 75 samples to train the network and use the other 75 samples to test the network. Since there are four input features, we adopt a network with four nodes in the input layer and three nodes in the output layer for this problem. The architecture of the neural network is thus a 4-4-3 network, with four nodes in the hidden layer, as shown in Figure 4-12.
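A minimal Matlab sketch of preparing the Iris data for this 4-4-3 network is given below. It assumes iris.data from [20] has been downloaded to the working directory and that the rows are arranged so that the first 75 samples form the training set; the file name, ordering, and variable names are assumptions for illustration.

T = readtable('iris.data', 'FileType', 'text', 'ReadVariableNames', false);
X = T{:, 1:4}';                              % 4 input features per sample (4 x 150)
[~, ~, labels] = unique(T{:, 5}, 'stable');  % class indices 1, 2, 3
D = zeros(3, 150);                           % 3 x 150 one-hot desired outputs
D(sub2ind(size(D), labels', 1:150)) = 1;
Xtrain = X(:, 1:75);    Dtrain = D(:, 1:75);     % training set (first 75 samples)
Xtest  = X(:, 76:150);  Dtest  = D(:, 76:150);   % testing set (remaining 75 samples)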

Figure 4-10-1. The total Iris data set (Sepal): Class 1-setosa, Class 2-versicolor, Class 3-virginica

Figure 4-10-2. The total Iris data set (Petal): Class 1-setosa, Class 2-versicolor, Class 3-virginica

Figure 4-11-1. The training set of Iris data (Sepal): Class 1-setosa, Class 2-versicolor, Class 3-virginica

Figure 4-11-2. The training set of Iris data (Petal): Class 1-setosa, Class 2-versicolor, Class 3-virginica

Figure 4-12. The neural network for solving the Iris problem

First, we use the standard BPA with fixed learning rates (β = 0.1, 0.01 and 0.001) to solve the classification of Iris data sets, and the training results are shown in Figure 4-13-1 ~ 4-13-3. The result of BPA with dynamic optimal learning rates is shown in Figure 4-14.

Figure 4-13-1. The square error J of the standard BPA with fixed β = 0.1

Figure 4-13-2. The square error J of the standard BPA with fixed β = 0.01

Figure 4-13-3. The square error J of the standard BPA with fixed β = 0.001

Figure 4-14. The square error J of the BPA with dynamic optimal training

Figure 4-15 shows that the network with the dynamic learning rate converges considerably faster than the networks with fixed learning rates. Because the optimal learning rate at every iteration lies almost always in the range [0.01, 0.02], the convergence speed with the fixed learning rate β = 0.01 is similar to that with the dynamic learning rate. Even so, the dynamic learning rate approach still performs better than the fixed learning rates.

Figure 4-15. Training errors of dynamic optimal learning rates and fixed learning rates

After 10000 training iterations, the resulting weights and total square error J are shown below.

WH = [  1.2337   -0.5033    1.3225    1.3074
       -0.3751    3.4714   -2.6777   -1.6052
        3.7235    5.1603   -4.0019  -10.4289
        1.9876   -2.7186    4.6171    4.3400 ]

Total square error J = 0.1582

The actual and desired outputs after 10000 training iterations are shown in Table 4.3, and the testing outputs and desired outputs are shown in Table 4.4. After substituting the above weighting matrix into the network and performing real testing, we find that there is no classification error on the training set (the first 75 samples). However, there are 5 classification errors on the testing set (the latter 75 samples), at indices 34, 51, 55, 57, and 59 in Table 4.4.
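The classification errors reported above can be counted by taking the largest output component as the predicted class, for example with the following minimal Matlab sketch (Ytest denotes the 3 x 75 actual network outputs on the testing set and Dtest the corresponding desired outputs; both names are assumptions).

[~, predicted] = max(Ytest, [], 1);      % index of the largest of the three outputs
[~, desired]   = max(Dtest, [], 1);      % index of the desired class
errorIdx  = find(predicted ~= desired);  % misclassified testing samples
numErrors = numel(errorIdx);             % 5 errors in the run reported in Table 4.4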

Table 4.3. Actual and desired outputs after 10000 iterations

                 Actual Output                    Desired Output
Index     Class 1   Class 2   Class 3      Class 1   Class 2   Class 3
  28      0.0211    0.9909    0.0050       0.0000    1.0000    0.0000
  65      0.0007    0.0093    0.9940       0.0000    0.0000    1.0000

Table 4.4. Actual and desired outputs in real testing

                 Actual Output                    Desired Output
Index     Class 1   Class 2   Class 3      Class 1   Class 2   Class 3
   1      0.9794    0.0249    0.0001       1.0000    0.0000    0.0000
   2      0.9812    0.0224    0.0001       1.0000    0.0000    0.0000
   3      0.9818    0.0218    0.0001       1.0000    0.0000    0.0000
   4      0.9818    0.0218    0.0001       1.0000    0.0000    0.0000
   5      0.9810    0.0229    0.0001       1.0000    0.0000    0.0000
   6      0.9804    0.0236    0.0001       1.0000    0.0000    0.0000
   7      0.9813    0.0223    0.0001       1.0000    0.0000    0.0000
   8      0.9823    0.0214    0.0001       1.0000    0.0000    0.0000
   9      0.9823    0.0213    0.0001       1.0000    0.0000    0.0000
  10      0.9810    0.0228    0.0001       1.0000    0.0000    0.0000
  11      0.9818    0.0219    0.0001       1.0000    0.0000    0.0000
  12      0.9820    0.0216    0.0001       1.0000    0.0000    0.0000
  13      0.9810    0.0228    0.0001       1.0000    0.0000    0.0000
  14      0.9813    0.0226    0.0001       1.0000    0.0000    0.0000
  15      0.9817    0.0219    0.0001       1.0000    0.0000    0.0000
  16      0.9820    0.0216    0.0001       1.0000    0.0000    0.0000
  17      0.9650    0.0442    0.0002       1.0000    0.0000    0.0000
  18      0.9819    0.0220    0.0001       1.0000    0.0000    0.0000
  19      0.9813    0.0224    0.0001       1.0000    0.0000    0.0000
  20      0.9816    0.0219    0.0001       1.0000    0.0000    0.0000
  21      0.9804    0.0236    0.0001       1.0000    0.0000    0.0000
  22      0.9820    0.0215    0.0001       1.0000    0.0000    0.0000
  23      0.9816    0.0222    0.0001       1.0000    0.0000    0.0000
  24      0.9820    0.0215    0.0001       1.0000    0.0000    0.0000
  25      0.9817    0.0220    0.0001       1.0000    0.0000    0.0000
  26      0.0215    0.9909    0.0049       0.0000    1.0000    0.0000
  27      0.0210    0.9909    0.0051       0.0000    1.0000    0.0000
  28      0.0170    0.9839    0.0095       0.0000    1.0000    0.0000
  29      0.0196    0.9886    0.0064       0.0000    1.0000    0.0000
  30      0.0239    0.9898    0.0046       0.0000    1.0000    0.0000
  31      0.0215    0.9907    0.0050       0.0000    1.0000    0.0000
  32      0.0219    0.9907    0.0049       0.0000    1.0000    0.0000
  33      0.0220    0.9906    0.0049       0.0000    1.0000    0.0000
 *34      0.0019    0.1415    0.8726       0.0000    1.0000    0.0000
  35      0.0140    0.9710    0.0179       0.0000    1.0000    0.0000
  36      0.0217    0.9901    0.0051       0.0000    1.0000    0.0000
  37      0.0212    0.9909    0.0050       0.0000    1.0000    0.0000
  38      0.0201    0.9897    0.0058       0.0000    1.0000    0.0000
  39      0.0224    0.9903    0.0049       0.0000    1.0000    0.0000
  40      0.0199    0.9889    0.0061       0.0000    1.0000    0.0000
  41      0.0197    0.9889    0.0062       0.0000    1.0000    0.0000
  42      0.0210    0.9905    0.0052       0.0000    1.0000    0.0000
  43      0.0215    0.9907    0.0050       0.0000    1.0000    0.0000
  44      0.0232    0.9900    0.0047       0.0000    1.0000    0.0000
  45      0.0206    0.9899    0.0055       0.0000    1.0000    0.0000
  46      0.0223    0.9905    0.0048       0.0000    1.0000    0.0000
  47      0.0216    0.9905    0.0050       0.0000    1.0000    0.0000
  48      0.0215    0.9908    0.0049       0.0000    1.0000    0.0000
  49      0.0300    0.9869    0.0042       0.0000    1.0000    0.0000
  50      0.0215    0.9905    0.0050       0.0000    1.0000    0.0000
 *51      0.0060    0.7787    0.1866       0.0000    0.0000    1.0000
  52      0.0026    0.2674    0.7388       0.0000    0.0000    1.0000
  53      0.0032    0.3974    0.5942       0.0000    0.0000    1.0000
  54      0.0007    0.0095    0.9938       0.0000    0.0000    1.0000
 *55      0.0159    0.9807    0.0117       0.0000    0.0000    1.0000
  56      0.0009    0.0230    0.9835       0.0000    0.0000    1.0000
 *57      0.0165    0.9827    0.0104       0.0000    0.0000    1.0000
  58      0.0007    0.0094    0.9939       0.0000    0.0000    1.0000
 *59      0.0133    0.9690    0.0199       0.0000    0.0000    1.0000
  60      0.0018    0.1281    0.8862       0.0000    0.0000    1.0000
  61      0.0007    0.0106    0.9930       0.0000    0.0000    1.0000
  62      0.0007    0.0095    0.9939       0.0000    0.0000    1.0000
  63      0.0018    0.1130    0.9011       0.0000    0.0000    1.0000
  64      0.0033    0.4128    0.5766       0.0000    0.0000    1.0000
  65      0.0011    0.0335    0.9748       0.0000    0.0000    1.0000
  66      0.0007    0.0094    0.9939       0.0000    0.0000    1.0000
  67      0.0008    0.0166    0.9885       0.0000    0.0000    1.0000
  68      0.0007    0.0101    0.9934       0.0000    0.0000    1.0000
  69      0.0007    0.0096    0.9938       0.0000    0.0000    1.0000
  70      0.0007    0.0094    0.9939       0.0000    0.0000    1.0000
  71      0.0007    0.0105    0.9931       0.0000    0.0000    1.0000
  72      0.0007    0.0123    0.9917       0.0000    0.0000    1.0000
  73      0.0010    0.0284    0.9791       0.0000    0.0000    1.0000
  74      0.0007    0.0099    0.9935       0.0000    0.0000    1.0000
  75      0.0011    0.0370    0.9717       0.0000    0.0000    1.0000

CHAPTER 5 Conclusions

Although the back-propagation algorithm is a useful tool for solving problems of classification, optimization, prediction, etc., it still has several defects. One of these defects is that we do not know how to choose a suitable learning rate to obtain converged training results. By using the dynamical optimal training algorithm for a three-layer neural network proposed at the end of Chapter 3, we can find the dynamic optimal learning rate very easily, and this dynamic learning rate guarantees that the total square error J is a decreasing function, which means that the actual outputs move closer to the desired outputs as the iterations proceed. The classification problems of XOR and the Iris data set are presented in Chapter 4 and solved by using dynamical optimal training for a three-layer neural network with sigmoid activation functions in the hidden and output layers. Excellent results are obtained in both the XOR and Iris data problems. Therefore the dynamic training algorithm is indeed very effective at obtaining better results than the conventional back-propagation algorithm with arbitrarily chosen fixed learning rates. The goal of removing the defects of the back-propagation algorithm with a fixed learning rate is thus achieved by using the dynamical optimal training algorithm.

REFERENCES

[1] T. Yoshida and S. Omatu, “Neural network approach to land cover mapping,” IEEE Trans. Geoscience and Remote Sensing, Vol. 32, pp. 1103-1109, Sept. 1994.

[2] H. Bischof, W. Schneider, and A. J. Pinz, “Multispectral classification of Landsat-images using neural networks,” IEEE Trans. Geoscience and Remote Sensing, Vol. 30, pp. 482-490, May 1992.

[3] M. Gopal, L. Behera, and S. Choudhury, “On adaptive trajectory tracking of a robot manipulator using inversion of its neural emulator,” IEEE Trans. Neural Networks, 1996.

[4] L. Behera, “Query based model learning and stable tracking of a robot arm using radial basis function network,” Computers and Electrical Engineering, Elsevier Science Ltd., 2003.

[5] F. Amini, H. M. Chen, G. Z. Qi, and J. C. S. Yang, “Generalized neural network based model for structural dynamic identification, analytical and experimental studies,” Intelligent Information Systems, Proceedings, pp. 138-142, 8-10 Dec. 1997.

[6] K. S. Narendra and S. Mukhopadhyay, “Intelligent control using neural networks,” IEEE Control Systems Magazine, Vol. 12, Issue 2, pp. 11-18, April 1992.

[7] L. Yinghua and G. A. Cunningham, “A new approach to fuzzy-neural system modeling,” IEEE Trans. Fuzzy Systems, Vol. 3, pp. 190-198, May 1995.

[8] L. J. Zhang and W. B. Wang, “Scattering signal extracting using system modeling method based on a back propagation neural network,” Antennas and Propagation Society International Symposium, 1992 AP-S Digest (held in conjunction with the URSI Radio Science Meeting and Nuclear EMP Meeting), Vol. 4, pp. 2272, July 1992.

[9] P. Poddar and K. P. Unnikrishnan, “Nonlinear prediction of speech signals using memory neuron networks,” Neural Networks for Signal Processing, Proceedings of the 1991 IEEE Workshop, pp. 395-404, Oct. 1991.

[10] R. P. Lippmann, “An introduction to computing with neural networks,” IEEE ASSP Magazine, 1987.

[11] D. E. Rumelhart et al., “Learning representations by back-propagating errors,” Nature, Vol. 323, pp. 533-536, 1986.

[12] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning internal representations by error propagation,” in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, D. E. Rumelhart and J. L. McClelland, eds., Cambridge, MA: MIT Press, 1986.

[13] C. H. Wang, H. L. Liu, and C. T. Lin, “Dynamic Optimal Learning Rates of a Certain Class of Fuzzy Neural Networks and its Applications with Genetic Algorithm,” IEEE Trans. Syst., Man, Cybern. Part B, Vol. 31, pp. 467-475, June 2001.

[14] L. Behera, S. Kumar, and A. Patnaik, “A novel learning algorithm for feedforward networks using Lyapunov function approach,” Intelligent Sensing and Information Processing, Proceedings of International Conference, pp. 277-282, 2004.

[15] M. A. AL-Alaoui, R. Mouci, M. M. Mansour, and R. Ferzli, “A Cloning Approach to Classifier Training,” IEEE Trans. Syst., Man, Cybern. Part A, Vol. 32, pp. 746-752, Nov. 2002.

[16] R. Kozma, M. Kitamura, A. Malinowski, and J. M. Zurada, “On performance measures of artificial neural networks trained by structural learning algorithms,” Artificial Neural Networks and Expert Systems, Proceedings, Second New Zealand International Two-Stream Conference, pp. 22-25, Nov. 20-23, 1995.

[17] F. Rosenblatt, “Principles of Neurodynamics,” Spartan Books, New York, 1962.

[18] S. Haykin, “Neural Networks: A Comprehensive Foundation,” 2nd ed., New Jersey: Prentice-Hall, 1999.

[19] J. E. Slotine and W. Li, “Applied Nonlinear Control,” New Jersey: Prentice-Hall, 1991.

[20] Iris Data Samples [Online]. Available: ftp.ics.uci.edu/pub/machine-learning-databases/iris/iris.data.
