
4.2. Example 2: Classification of Iris Data Set

In this example, we use the same type of neural network as before to classify the Iris data set [15], [16]. The Iris data set contains three subspecies, and classification depends on four features: the length and width of the petal and the length and width of the sepal. The complete Iris data are shown in Figures 4-10-1 and 4-10-2, and the training set, consisting of the first 75 of the samples, is shown in Figures 4-11-1 and 4-11-2. The Iris data samples are available from [20]. The data set contains 150 samples drawn from three species of Iris flowers; we use 75 samples to train the network and the remaining 75 samples to test it. Since there are four input features and three classes, we adopt a network with four nodes in the input layer and three nodes in the output layer. With four nodes in the hidden layer, the resulting architecture is a 4-4-3 network, as shown in Figure 4-12.
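To make the set-up concrete, the following Python sketch (not the thesis code) loads the Iris data and defines the 4-4-3 sigmoid network of Figure 4-12. It assumes scikit-learn's bundled copy of the UCI data [20], assumes the 75 training samples are the first 25 samples of each class (consistent with the ordering in Tables 4.3 and 4.4), and uses an arbitrary random weight initialisation.

```python
# Minimal sketch of the data split and the 4-4-3 network (assumptions noted above).
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target              # 150 samples, 4 features, labels 0/1/2

# One-hot desired outputs: class 1/2/3 -> rows of the 3x3 identity matrix.
D = np.eye(3)[y]

# Assumed split: first 25 samples of each class for training, the rest for testing.
train_idx = np.hstack([np.where(y == c)[0][:25] for c in range(3)])
test_idx  = np.hstack([np.where(y == c)[0][25:] for c in range(3)])
X_train, D_train = X[train_idx], D[train_idx]
X_test,  D_test  = X[test_idx],  D[test_idx]

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

# 4-4-3 network of Figure 4-12: 4 inputs, 4 hidden nodes, 3 outputs.
rng = np.random.default_rng(0)
W_H = rng.uniform(-1.0, 1.0, (4, 4))       # hidden-layer weights
W_O = rng.uniform(-1.0, 1.0, (3, 4))       # output-layer weights

def forward(X, W_H, W_O):
    """Forward pass with sigmoid activations in the hidden and output layers."""
    H = sigmoid(X @ W_H.T)                 # hidden-layer outputs, shape (n, 4)
    Y = sigmoid(H @ W_O.T)                 # network outputs, shape (n, 3)
    return H, Y
```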

Figure 4-10-1. The total Iris data set (Sepal)
Figure 4-10-2. The total Iris data set (Petal)
Figure 4-11-1. The training set of Iris data (Sepal)
Figure 4-11-2. The training set of Iris data (Petal)
(In Figures 4-10 and 4-11, Class 1 is setosa, Class 2 is versicolor, and Class 3 is virginica.)

Figure 4-12. The neural network for solving the Iris problem

First, we use the standard BPA with fixed learning rates (β = 0.1, 0.01, and 0.001) to solve the classification of the Iris data set; the training results are shown in Figures 4-13-1 to 4-13-3. The result of the BPA with dynamic optimal learning rate is shown in Figure 4-14.

Figure 4-13-1. The square error J of the standard BPA with fixed β = 0.1

Figure 4-13-2. The square error J of the standard BPA with fixed β = 0.01

Figure 4-13-3. The square error J of the standard BPA with fixed β = 0.001

Figure 4-14. The square error J of the BPA with dynamic optimal training
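For reference, the sketch below (continuing the names X_train, D_train, forward, W_H, W_O defined earlier) shows a plain batch implementation of the standard BPA with a fixed learning rate β. The 1/2 factor in the total square error J is an assumption; the exact scaling used in the thesis may differ.

```python
# Standard BPA sketch with a fixed learning rate beta (a sketch, not the thesis code).
def gradients(X, D, W_H, W_O):
    """Gradients of J with respect to W_H and W_O for the 4-4-3 sigmoid network."""
    H, Y = forward(X, W_H, W_O)
    delta_O = (Y - D) * Y * (1.0 - Y)           # output-layer deltas
    delta_H = (delta_O @ W_O) * H * (1.0 - H)   # deltas back-propagated to the hidden layer
    return delta_H.T @ X, delta_O.T @ H         # dJ/dW_H, dJ/dW_O

def total_square_error(X, D, W_H, W_O):
    _, Y = forward(X, W_H, W_O)
    return 0.5 * np.sum((D - Y) ** 2)           # the 1/2 factor is an assumption

def train_fixed(W_H, W_O, beta=0.01, iterations=10000):
    """Standard BPA: the same fixed beta is used at every iteration."""
    for _ in range(iterations):
        g_H, g_O = gradients(X_train, D_train, W_H, W_O)
        W_H -= beta * g_H
        W_O -= beta * g_O
    return W_H, W_O
```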

Figure 4-15 shows that the convergence speed of the network with the dynamic learning rate is clearly faster than that of the networks with fixed learning rates. Because the optimal learning rate at each iteration lies mostly in the range [0.01, 0.02], the convergence speed with the fixed learning rate β = 0.01 is similar to that with the dynamic learning rate; nevertheless, the dynamic learning rate approach still performs better than all of the fixed learning rates.
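The dynamic optimal learning rate itself is derived in Chapter 3. As an illustrative stand-in only, the sketch below searches a grid of candidate rates at each iteration and keeps the one that minimises J along the current gradient direction, which mimics the behaviour described above (a per-iteration rate chosen so that J keeps decreasing).

```python
# Illustrative stand-in for the dynamic optimal learning rate (not the Chapter 3 formula):
# each iteration evaluates J for a grid of candidate rates and keeps the best one.
def train_dynamic(W_H, W_O, candidates=np.logspace(-4, 0, 30), iterations=10000):
    for _ in range(iterations):
        g_H, g_O = gradients(X_train, D_train, W_H, W_O)
        errors = [total_square_error(X_train, D_train,
                                     W_H - b * g_H, W_O - b * g_O)
                  for b in candidates]
        beta = candidates[int(np.argmin(errors))]   # per-iteration "optimal" rate
        W_H -= beta * g_H
        W_O -= beta * g_O
    return W_H, W_O
```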

Figure 4-15. Training errors of dynamic optimal learning rates and fixed learning rates

After 10000 training iterations, the resulting weights and total square error J are shown below.

W^H = \begin{bmatrix}
 1.2337 & -0.5033 &  1.3225 &   1.3074 \\
-0.3751 &  3.4714 & -2.6777 &  -1.6052 \\
 3.7235 &  5.1603 & -4.0019 & -10.4289 \\
 1.9876 & -2.7186 &  4.6171 &   4.3400
\end{bmatrix}

Total square error J = 0.1582
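Under the assumptions of the sketches above, the experiment can be reproduced only approximately; the resulting weights and J will differ from the values quoted here because the initialisation and the exact optimal-rate formula differ.

```python
# Approximate reproduction with the sketches above; J will not match 0.1582 exactly.
W_H_final, W_O_final = train_dynamic(W_H.copy(), W_O.copy(), iterations=10000)
print("Total square error J =", total_square_error(X_train, D_train, W_H_final, W_O_final))
```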

The actual outputs and desired outputs after 10000 training iterations are shown in Table 4.3, and the testing outputs and desired outputs are shown in Table 4.4. After substituting the above weights into the network and performing real testing, we find that there is no classification error on the training set (the first 75 samples).

However, there are 5 classification errors on the testing set (the remaining 75 samples), at indices 34, 51, 55, 57, and 59 in Table 4.4.
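The errors marked with * are the samples whose largest network output does not correspond to the desired class. A short sketch of this counting, reusing the helpers defined earlier, is given below.

```python
# Count misclassifications: assign each sample to the class with the largest output.
def count_errors(X, D, W_H, W_O):
    _, Y = forward(X, W_H, W_O)
    predicted = np.argmax(Y, axis=1)
    desired = np.argmax(D, axis=1)
    return int(np.sum(predicted != desired))

print("training errors:", count_errors(X_train, D_train, W_H_final, W_O_final))
print("testing errors: ", count_errors(X_test,  D_test,  W_H_final, W_O_final))
# The thesis reports 0 training errors and 5 testing errors with its trained weights.
```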

Table 4.3. Actual and desired outputs after 10000 iterations

            Actual Output                  Desired Output
Index   Class 1  Class 2  Class 3     Class 1  Class 2  Class 3
   28   0.0211   0.9909   0.0050      0.0000   1.0000   0.0000
   65   0.0007   0.0093   0.9940      0.0000   0.0000   1.0000

Table 4.4. Actual and desired outputs in real testing (* marks a classification error)

            Actual Output                  Desired Output
Index   Class 1  Class 2  Class 3     Class 1  Class 2  Class 3
    1   0.9794   0.0249   0.0001      1.0000   0.0000   0.0000
    2   0.9812   0.0224   0.0001      1.0000   0.0000   0.0000
    3   0.9818   0.0218   0.0001      1.0000   0.0000   0.0000
    4   0.9818   0.0218   0.0001      1.0000   0.0000   0.0000
    5   0.9810   0.0229   0.0001      1.0000   0.0000   0.0000
    6   0.9804   0.0236   0.0001      1.0000   0.0000   0.0000
    7   0.9813   0.0223   0.0001      1.0000   0.0000   0.0000
    8   0.9823   0.0214   0.0001      1.0000   0.0000   0.0000
    9   0.9823   0.0213   0.0001      1.0000   0.0000   0.0000
   10   0.9810   0.0228   0.0001      1.0000   0.0000   0.0000
   11   0.9818   0.0219   0.0001      1.0000   0.0000   0.0000
   12   0.9820   0.0216   0.0001      1.0000   0.0000   0.0000
   13   0.9810   0.0228   0.0001      1.0000   0.0000   0.0000
   14   0.9813   0.0226   0.0001      1.0000   0.0000   0.0000
   15   0.9817   0.0219   0.0001      1.0000   0.0000   0.0000
   16   0.9820   0.0216   0.0001      1.0000   0.0000   0.0000
   17   0.9650   0.0442   0.0002      1.0000   0.0000   0.0000
   18   0.9819   0.0220   0.0001      1.0000   0.0000   0.0000
   19   0.9813   0.0224   0.0001      1.0000   0.0000   0.0000
   20   0.9816   0.0219   0.0001      1.0000   0.0000   0.0000
   21   0.9804   0.0236   0.0001      1.0000   0.0000   0.0000
   22   0.9820   0.0215   0.0001      1.0000   0.0000   0.0000
   23   0.9816   0.0222   0.0001      1.0000   0.0000   0.0000
   24   0.9820   0.0215   0.0001      1.0000   0.0000   0.0000
   25   0.9817   0.0220   0.0001      1.0000   0.0000   0.0000
   26   0.0215   0.9909   0.0049      0.0000   1.0000   0.0000
   27   0.0210   0.9909   0.0051      0.0000   1.0000   0.0000
   28   0.0170   0.9839   0.0095      0.0000   1.0000   0.0000
   29   0.0196   0.9886   0.0064      0.0000   1.0000   0.0000
   30   0.0239   0.9898   0.0046      0.0000   1.0000   0.0000
   31   0.0215   0.9907   0.0050      0.0000   1.0000   0.0000
   32   0.0219   0.9907   0.0049      0.0000   1.0000   0.0000
   33   0.0220   0.9906   0.0049      0.0000   1.0000   0.0000
  *34   0.0019   0.1415   0.8726      0.0000   1.0000   0.0000
   35   0.0140   0.9710   0.0179      0.0000   1.0000   0.0000
   36   0.0217   0.9901   0.0051      0.0000   1.0000   0.0000
   37   0.0212   0.9909   0.0050      0.0000   1.0000   0.0000
   38   0.0201   0.9897   0.0058      0.0000   1.0000   0.0000
   39   0.0224   0.9903   0.0049      0.0000   1.0000   0.0000
   40   0.0199   0.9889   0.0061      0.0000   1.0000   0.0000
   41   0.0197   0.9889   0.0062      0.0000   1.0000   0.0000
   42   0.0210   0.9905   0.0052      0.0000   1.0000   0.0000
   43   0.0215   0.9907   0.0050      0.0000   1.0000   0.0000
   44   0.0232   0.9900   0.0047      0.0000   1.0000   0.0000
   45   0.0206   0.9899   0.0055      0.0000   1.0000   0.0000
   46   0.0223   0.9905   0.0048      0.0000   1.0000   0.0000
   47   0.0216   0.9905   0.0050      0.0000   1.0000   0.0000
   48   0.0215   0.9908   0.0049      0.0000   1.0000   0.0000
   49   0.0300   0.9869   0.0042      0.0000   1.0000   0.0000
   50   0.0215   0.9905   0.0050      0.0000   1.0000   0.0000
  *51   0.0060   0.7787   0.1866      0.0000   0.0000   1.0000
   52   0.0026   0.2674   0.7388      0.0000   0.0000   1.0000
   53   0.0032   0.3974   0.5942      0.0000   0.0000   1.0000
   54   0.0007   0.0095   0.9938      0.0000   0.0000   1.0000
  *55   0.0159   0.9807   0.0117      0.0000   0.0000   1.0000
   56   0.0009   0.0230   0.9835      0.0000   0.0000   1.0000
  *57   0.0165   0.9827   0.0104      0.0000   0.0000   1.0000
   58   0.0007   0.0094   0.9939      0.0000   0.0000   1.0000
  *59   0.0133   0.9690   0.0199      0.0000   0.0000   1.0000
   60   0.0018   0.1281   0.8862      0.0000   0.0000   1.0000
   61   0.0007   0.0106   0.9930      0.0000   0.0000   1.0000
   62   0.0007   0.0095   0.9939      0.0000   0.0000   1.0000
   63   0.0018   0.1130   0.9011      0.0000   0.0000   1.0000
   64   0.0033   0.4128   0.5766      0.0000   0.0000   1.0000
   65   0.0011   0.0335   0.9748      0.0000   0.0000   1.0000
   66   0.0007   0.0094   0.9939      0.0000   0.0000   1.0000
   67   0.0008   0.0166   0.9885      0.0000   0.0000   1.0000
   68   0.0007   0.0101   0.9934      0.0000   0.0000   1.0000
   69   0.0007   0.0096   0.9938      0.0000   0.0000   1.0000
   70   0.0007   0.0094   0.9939      0.0000   0.0000   1.0000
   71   0.0007   0.0105   0.9931      0.0000   0.0000   1.0000
   72   0.0007   0.0123   0.9917      0.0000   0.0000   1.0000
   73   0.0010   0.0284   0.9791      0.0000   0.0000   1.0000
   74   0.0007   0.0099   0.9935      0.0000   0.0000   1.0000
   75   0.0011   0.0370   0.9717      0.0000   0.0000   1.0000

CHAPTER 5 Conclusions

Although the back propagation algorithm is a useful tool for solving classification, optimization, and prediction problems, it still has several defects. One of them is that there is no systematic way to choose a suitable learning rate that yields converged training results. By using the dynamical training algorithm for the three-layer neural network proposed at the end of Chapter 3, we can find the dynamic optimal learning rate very easily. Moreover, the dynamic learning rate guarantees that the total square error J is a decreasing function of the iteration number, which means that the actual outputs move closer to the desired outputs as training proceeds. The classification problems of XOR and the Iris data are presented in Chapter 4 and are solved by dynamical optimal training of a three-layer neural network with sigmoid activation functions in the hidden and output layers. Excellent results are obtained for both the XOR and the Iris data problems. The dynamic training algorithm is therefore very effective at obtaining better results than the conventional back propagation algorithm with an arbitrarily chosen fixed learning rate, so the goal of removing this defect of the fixed-learning-rate back propagation algorithm is achieved by the dynamical optimal training algorithm.

REFERENCES

[1] T. Yoshida and S. Omatu, “Neural network approach to land cover mapping,” IEEE Trans. Geoscience and Remote Sensing, Vol. 32, pp. 1103-1109, Sept. 1994.

[2] H. Bischof, W. Schneider, and A. J. Pinz, “Multispectral classification of Landsat-images using neural networks,” IEEE Trans. Geoscience and Remote Sensing, Vol. 30, pp. 482-490, May 1992.

[3] M. Gopal, L. Behera, and S. Choudhury, “On adaptive trajectory tracking of a robot manipulator using inversion of its neural emulator,” IEEE Trans. Neural Networks, 1996.

[4] L. Behera, “Query based model learning and stable tracking of a robot arm using radial basis function network,” Elsevier Science Ltd., Computers and Electrical Engineering, 2003.

[5] F. Amini, H. M. Chen, G. Z. Qi, and J. C. S. Yang, “Generalized neural network based model for structural dynamic identification, analytical and experimental studies,” Intelligent Information Systems, Proceedings, pp. 138-142, 8-10 Dec. 1997.

[6] K. S. Narendra and S. Mukhopadhyay, “Intelligent control using neural networks,” IEEE Control Systems Magazine, Vol. 12, Issue 2, pp. 11-18, April 1992.

[7] L. Yinghua and G. A. Cunningham, “A new approach to fuzzy-neural system modeling,” IEEE Trans. Fuzzy Systems, Vol. 3, pp. 190-198, May 1995.

[8] L. J. Zhang and W. B. Wang, “Scattering signal extracting using system modeling method based on a back propagation neural network,” Antennas and Propagation Society International Symposium, 1992 AP-S Digest, held in conjunction with the URSI Radio Science Meeting and Nuclear EMP Meeting, Vol. 4, p. 2272, 18-25 July 1992.

[9] P. Poddar and K. P. Unnikrishnan, “Nonlinear prediction of speech signals using memory neuron networks,” Neural Networks for Signal Processing, Proceedings of the 1991 IEEE Workshop, pp. 395-404, Oct. 1991.

[10] R. P. Lippmann, “An introduction to computing with neural networks,” IEEE ASSP Magazine, 1987.

[11] D. E. Rumelhart et al., “Learning representations by back-propagating errors,” Nature, Vol. 323, pp. 533-536, 1986.

[12] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning internal representations by error propagation,” Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, D. E. Rumelhart and J. L. McClelland, eds., Cambridge, MA: MIT Press, 1986.

[13] C. H. Wang, H. L. Liu, and C. T. Lin, “Dynamic optimal learning rates of a certain class of fuzzy neural networks and its applications with genetic algorithm,” IEEE Trans. Syst., Man, Cybern. Part B, Vol. 31, pp. 467-475, June 2001.

[14] L. Behera, S. Kumar, and A. Patnaik, “A novel learning algorithm for feedforward networks using Lyapunov function approach,” Intelligent Sensing and Information Processing, Proceedings of International Conference, pp. 277-282, 2004.

[15] M. A. Al-Alaoui, R. Mouci, M. M. Mansour, and R. Ferzli, “A cloning approach to classifier training,” IEEE Trans. Syst., Man, Cybern. Part A, Vol. 32, pp. 746-752, Nov. 2002.

[16] R. Kozma, M. Kitamura, A. Malinowski, and J. M. Zurada, “On performance measures of artificial neural networks trained by structural learning algorithms,” Artificial Neural Networks and Expert Systems, Proceedings of the Second New Zealand International Two-Stream Conference, pp. 22-25, Nov. 20-23, 1995.

[17] F. Rosenblatt, “Principles of Neurodynamics,” Spartan Books, New York, 1962.

[18] S. Haykin, “Neural Networks: A Comprehensive Foundation,” 2nd ed., New Jersey: Prentice-Hall, 1999.

[19] J. E. Slotine, “Applied Nonlinear Control,” New Jersey: Prentice-Hall, 1991.

[20] Iris Data Samples [Online]. Available: ftp.ics.uci.edu/pub/machine-learning-databases/iris/iris.data.
