There is no explicit way to solve the problem of choosing parameters for SVMs. One approach, discussed in [44], applies gradient descent over the set of parameters to minimize an estimate of the generalization error of SVMs. Exhaustive search (grid search) is the more popular method, but it becomes intractable in this application as the number of parameters grows.

To select parameters in this kind of problem, we divide the training procedure into two main parts and propose the following steps.

1. Use the original algorithm of SVMs to get the optimal kernel parameters and the regularization parameter C.

2. Fix the kernel parameters and the regularization parameter C obtained in the previous step, and find the other parameters of FSVMs.

a) Define the heuristic function h(x)

b) Use exhaustive search (grid search) to choose the confident factor $h_C$, the trashy factor $h_T$, the mapping degree $d$, and the fuzzy membership lower bound $\sigma$.
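The two steps above can be sketched as two independent grid searches. This is a minimal illustration, not the authors' implementation: `grid_search`, `two_stage_search`, and the two toy error functions are hypothetical names, and the toy functions stand in for "train a model and measure its error".

```python
from itertools import product

def grid_search(error, grid, fixed=None):
    """Exhaustively search `grid` (name -> list of values); return (params, error)."""
    fixed = dict(fixed or {})
    names = list(grid)
    best_params, best_error = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = {**fixed, **dict(zip(names, values))}
        e = error(params)
        if e < best_error:
            best_params, best_error = params, e
    return best_params, best_error

def two_stage_search(svm_error, fsvm_error, svm_grid, fsvm_grid):
    # Step 1: plain SVM; search only the kernel parameter(s) and C.
    svm_best, _ = grid_search(svm_error, svm_grid)
    # Step 2: keep the step-1 values fixed; search only the FSVM parameters.
    return grid_search(fsvm_error, fsvm_grid, fixed=svm_best)

# Toy stand-ins for "train and measure error" (hypothetical, for illustration only).
def toy_svm_error(p):
    return (p["gamma"] - 1.0) ** 2 + (p["C"] - 10.0) ** 2

def toy_fsvm_error(p):
    return toy_svm_error(p) + (p["sigma"] - 0.3) ** 2 + (p["d"] - 2.0) ** 2

best, err = two_stage_search(
    toy_svm_error, toy_fsvm_error,
    svm_grid={"gamma": [0.1, 1.0, 10.0], "C": [1.0, 10.0, 100.0]},
    fsvm_grid={"sigma": [0.1, 0.3, 0.5], "d": [0.5, 1.0, 2.0]},
)
print(best, err)
```

With three candidate values per parameter, the two-step search evaluates 9 + 9 = 18 settings, whereas a joint grid over all four parameters would evaluate 81.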

3.5 Experiments

In these simulations, we use the RBF kernel

$$K(x_i, x_j) = e^{-\gamma \|x_i - x_j\|^2}. \tag{44}$$

We conducted computer simulations of SVMs and FSVMs using the same data sets as in [45]. Each data set is split into 100 sample sets of training and test sets. For each sample set, the test set is independent of the training set. For each data set, we train and test on the first 5 sample sets to find the parameters with the best average test error. We then use these parameters to train and test on all 100 sample sets and report the average test error.
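As a small illustration of the RBF kernel in Eq. (44), the sketch below computes the kernel matrix between two sets of points with NumPy. The function name `rbf_kernel` and the array layout (one sample per row) are assumptions made for this example.

```python
import numpy as np

def rbf_kernel(X, Z, gamma):
    """Return the matrix K[i, j] = exp(-gamma * ||X[i] - Z[j]||^2)."""
    X = np.asarray(X, dtype=float)
    Z = np.asarray(Z, dtype=float)
    # Squared distances via ||x||^2 + ||z||^2 - 2 x.z, clipped at 0 for round-off.
    sq = (X ** 2).sum(axis=1)[:, None] + (Z ** 2).sum(axis=1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = rbf_kernel(X, X, gamma=0.5)
print(K)
```

The diagonal of `K` is 1 because each point has zero distance to itself, and larger values of `gamma` make the kernel decay faster with distance.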

Since FSVMs have more parameters than the original SVM algorithm, we find the parameters in two steps, as described in the previous section. In the first step, we search the kernel parameters and C using the original algorithm of SVMs. In the second step, we fix the kernel parameters and C found in the first step, and search the parameters of the fuzzy membership mapping function.
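The evaluation protocol described in this section (select parameters on the first 5 sample sets, then report the average over all 100) can be sketched as follows. The function `select_then_evaluate` and the callback `error_fn(params, split_index)` are hypothetical; the callback is assumed to train on the given sample set and return its test error.

```python
def select_then_evaluate(error_fn, candidates, n_splits=100, n_select=5):
    """Pick the candidate with the lowest mean error on the first `n_select`
    sample sets, then return it with its mean error over all `n_splits`."""
    def mean_error(params, splits):
        return sum(error_fn(params, i) for i in splits) / len(splits)

    best = min(candidates, key=lambda p: mean_error(p, range(n_select)))
    return best, mean_error(best, range(n_splits))

# Toy error function (hypothetical): lowest for the candidate 3, drifting with the split index.
def toy_error(params, split_index):
    return abs(params - 3) + 0.01 * split_index

best_params, avg_error = select_then_evaluate(toy_error, candidates=[1, 2, 3, 4])
print(best_params, avg_error)
```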

To find the parameters of the strategy using kernel-target alignment, we first fix $h_C = \max_i f_K(x_i, y_i)$ and $h_T = \min_i f_K(x_i, y_i)$, and perform a two-dimensional search over the parameters $\sigma$ and $d$. The value of $\sigma$ is chosen from 0.1 to 0.9 in steps of 0.1. In some cases, we also compare the result for $\sigma = 0.01$. The value of $d$ is chosen from $2^{-8}$ to $2^{8}$, multiplying by 2 at each step. Then, we fix $\sigma$ and $d$, and perform a two-dimensional search over the parameters $h_C$ and $h_T$. The value of $h_C$ is chosen such that 0%, 10%, 20%, 30%, 40%, and 50% of data points have a fuzzy membership of 1. The value of $h_T$ is chosen such that 0%, 10%, 20%, 30%, 40%, and 50% of data points have a fuzzy membership of $\sigma$.
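The search grids above, and the choice of the two thresholds by fraction of data points, can be sketched as below. The piecewise mapping in `fuzzy_membership` is an assumption for illustration (membership 1 at or above the confident factor, the lower bound at or below the trashy factor, and a degree-d interpolation in between); the exact mapping is the one defined earlier in the report. All function names are hypothetical.

```python
import numpy as np

# Grids as described in the text.
sigma_grid = [i / 10 for i in range(1, 10)]      # 0.1, 0.2, ..., 0.9
d_grid = [2.0 ** e for e in range(-8, 9)]        # 2^-8, 2^-7, ..., 2^8
fraction_grid = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]   # fractions used for h_C and h_T

def thresholds_from_fractions(h, frac_one, frac_sigma):
    """Choose h_C and h_T from quantiles of the heuristic values h."""
    h = np.asarray(h, dtype=float)
    h_C = np.quantile(h, 1.0 - frac_one)   # roughly the top frac_one get membership 1
    h_T = np.quantile(h, frac_sigma)       # roughly the bottom frac_sigma get sigma
    return h_C, h_T

def fuzzy_membership(h, h_C, h_T, sigma, d):
    """Assumed mapping from heuristic value to membership in [sigma, 1]."""
    h = np.asarray(h, dtype=float)
    if h_C <= h_T:  # degenerate thresholds: fall back to a hard step at h_C
        return np.where(h >= h_C, 1.0, sigma)
    t = np.clip((h - h_T) / (h_C - h_T), 0.0, 1.0)
    return sigma + (1.0 - sigma) * t ** d

h = np.arange(11.0)  # toy heuristic values 0, 1, ..., 10
h_C, h_T = thresholds_from_fractions(h, frac_one=0.2, frac_sigma=0.3)
m = fuzzy_membership(h, h_C, h_T, sigma=0.1, d=2.0)
print(h_C, h_T, m)
```

With both fractions set to 0, the thresholds reduce to the maximum and minimum of the heuristic values, which matches the initial setting used for the search over the lower bound and the mapping degree.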

To find the parameters of the strategy using k-NN, we perform only a two-dimensional search over the parameters $\sigma$ and $k$. We fix $h_C = k/2$, $h_T = 0$, and $d = 1$; since we did not find much gain or loss from other values of these parameters, we skip searching them to save time. The value of $\sigma$ is chosen from 0.1 to 0.9 in steps of 0.1. In some cases, we also compare the result for $\sigma = 0.01$. The value of $k$ is chosen from $2^1$ to $2^8$, multiplying by 2 at each step.
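A minimal sketch of the k-NN setting above follows. It assumes, for illustration only, that the heuristic value of a point is the number of its k nearest neighbors that share its label; the heuristic actually used is the one defined earlier in the report. With the confident factor fixed at k/2, the trashy factor at 0, and the mapping degree at 1, the assumed mapping becomes linear up to k/2 agreeing neighbors. The function names are hypothetical.

```python
import numpy as np

k_grid = [2 ** e for e in range(1, 9)]  # 2, 4, ..., 256

def knn_agreement(X, y, k):
    """For each point, count how many of its k nearest neighbors share its label."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)            # a point is not its own neighbor
    nn = np.argsort(d2, axis=1)[:, :k]      # indices of the k nearest neighbors
    return (y[nn] == y[:, None]).sum(axis=1)

def knn_membership(X, y, k, sigma):
    """Assumed mapping with h_C = k/2, h_T = 0, d = 1."""
    n = knn_agreement(X, y, k)
    t = np.clip(n / (k / 2.0), 0.0, 1.0)
    return sigma + (1.0 - sigma) * t

# Toy 1-D data: the last point sits inside the other class and looks like noise.
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [2.5]])
y = np.array([1, 1, 1, -1, -1, -1])
counts = knn_agreement(X, y, k=2)
m = knn_membership(X, y, k=2, sigma=0.1)
print(counts, m)
```

In this toy example the mislabeled-looking point receives the lower-bound membership, so it would contribute least to the training objective.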

Table 1 shows the results of our simulations. Compared with SVMs, FSVMs with kernel-target alignment perform better on 9 data sets, and FSVMs with k-NN perform better on 5 data sets. By checking the average training error of SVMs on each data set, we find that FSVMs perform well on data sets where the average training error is high. These results show that our algorithm can improve the performance of SVMs when the data set contains noisy data.

Table 1. The test error of SVMs, FSVMs using strategy of kernel-target alignment (KT), and FSVMs using strategy of k-NN (k-NN), and the average training error of SVMs (TR) on 13 datasets.

| Data set | SVMs | KT | k-NN | TR |
|---|---|---|---|---|
| Banana | 11.5±0.7 | 10.4±0.5 | 11.4±0.6 | 6.7 |
| B. Cancer | 26.0±4.7 | 25.3±4.4 | 25.2±4.1 | 18.3 |
| Diabetes | 23.5±1.7 | 23.3±1.7 | 23.5±1.7 | 19.4 |
| F. Solar | 32.4±1.8 | 32.4±1.8 | 32.4±1.8 | 32.6 |
| German | 23.6±2.1 | 23.3±2.3 | 23.6±2.1 | 16.2 |
| Heart | 16.0±3.3 | 15.2±3.1 | 15.5±3.4 | 12.8 |
| Image | 3.0±0.6 | 2.9±0.7 | - | 1.3 |
| Ringnorm | 1.7±0.1 | - | - | 0.0 |
| Splice | 10.9±0.7 | - | - | 0.0 |
| Thyroid | 4.8±2.2 | 4.7±2.3 | - | 0.4 |
| Titanic | 22.4±1.0 | 22.3±0.9 | 22.3±1.1 | 19.6 |
| Twonorm | 3.0±0.2 | 2.4±0.1 | 2.9±0.2 | 0.4 |
| Waveform | 9.9±0.4 | 9.9±0.4 | - | 3.5 |
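The counts quoted in the text (9 data sets for KT, 5 for k-NN) can be checked directly against the mean test errors in Table 1. The short script below copies the means from the table, uses `None` for entries shown as "-", and counts a win only when the mean is strictly lower than that of SVMs; it ignores the standard deviations.

```python
# Mean test errors from Table 1: (SVMs, KT, k-NN); None marks a missing entry.
results = {
    "Banana":    (11.5, 10.4, 11.4),
    "B. Cancer": (26.0, 25.3, 25.2),
    "Diabetes":  (23.5, 23.3, 23.5),
    "F. Solar":  (32.4, 32.4, 32.4),
    "German":    (23.6, 23.3, 23.6),
    "Heart":     (16.0, 15.2, 15.5),
    "Image":     (3.0, 2.9, None),
    "Ringnorm":  (1.7, None, None),
    "Splice":    (10.9, None, None),
    "Thyroid":   (4.8, 4.7, None),
    "Titanic":   (22.4, 22.3, 22.3),
    "Twonorm":   (3.0, 2.4, 2.9),
    "Waveform":  (9.9, 9.9, None),
}

kt_wins = [name for name, (svm, kt, _) in results.items()
           if kt is not None and kt < svm]
knn_wins = [name for name, (svm, _, knn) in results.items()
            if knn is not None and knn < svm]

print(len(kt_wins), kt_wins)
print(len(knn_wins), knn_wins)
```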

4 Conclusions

In this report, we reviewed the concept of fuzzy support vector machines and proposed training procedures for FSVMs. By associating the data points with fuzzy memberships, FSVMs treat data points with different memberships differently when learning the decision function. However, the extra freedom in selecting the membership complicates learning, so systematic methods are required to make FSVMs applicable. The proposed training procedures, along with two strategies for setting the fuzzy membership, can effectively solve the membership selection problem. This makes FSVMs more feasible for reducing the effects of noise or outliers. The experiments show that the performance is better in applications with noisy data.

Selecting a proper fuzzy model for a given problem remains an open issue for FSVMs. Some problems may involve domains that are outside the discipline of learning techniques. For example, economic trend prediction may work better when the domain knowledge of economists is combined with the learning techniques of computer scientists. The examples in this report show only the basic application of FSVMs. More versatile applications are expected in the near future.

References

1. C. Cortes and V. Vapnik, “Support vector networks,” Machine Learning, vol. 20, pp. 273–297, 1995.

2. V. Vapnik, The Nature of Statistical Learning Theory. New York: Springer, 1995.

3. V. Vapnik, Statistical Learning Theory. New York: Wiley, 1998.

4. B. Schölkopf, S. Mika, C. Burges, P. Knirsch, K.-R. Müller, G. Rätsch, and A. Smola, “Input space vs. feature space in kernel-based methods,” IEEE Transactions on Neural Networks, vol. 10, no. 5, pp. 1000–1017, 1999.

5. E. Osuna, R. Freund, and F. Girosi, “Support vector machines: Training and applications,” Tech. Rep. AIM-1602, MIT A.I. Lab., 1996.

6. V. Vapnik, Estimation of Dependences Based on Empirical Data. Springer-Verlag, 1982.

7. C. J. C. Burges, “A tutorial on support vector machines for pattern recognition,” Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.

8. A. Smola and B. Schölkopf, “A tutorial on support vector regression,” Tech. Rep. NC2-TR-1998-030, Neural and Computational Learning II, 1998.

9. C. J. C. Burges and B. Schölkopf, “Improving the accuracy and speed of support vector learning machines,” in Advances in Neural Information Processing Systems 9 (M. Mozer, M. Jordan, and T. Petsche, eds.), pp. 375–381, Cambridge, MA: MIT Press, 1997.

10. M. Schmidt, “Identifying speaker with support vector networks,” in Interface ’96 Proceedings, (Sydney), 1996.

11. S. Ben-Yacoub, Y. Abdeljaoued, and E. Mayoraz, “Fusion of face and speech data for person identity verification,” IEEE Transactions on Neural Networks, vol. 10, no. 5, pp. 1065–1074, 1999.

12. E. Osuna, R. Freund, and F. Girosi, “An improved training algorithm for support vector machines,” in 1997 IEEE Workshop on Neural Networks for Signal Processing, pp. 276–285, 1997.

13. G. Fung, O. L. Mangasarian, and J. Shavlik, “Knowledge-based support vector machine classifiers,” in Advances in Neural Information Processing, 2002.

14. T. Joachims, “Text categorization with support vector machines: learning with many relevant features,” in Proceedings of ECML-98, 10th European Conference on Machine Learning (C. Nédellec and C. Rouveirol, eds.), (Chemnitz, DE), pp. 137–142, Springer Verlag, Heidelberg, DE, 1998.

15. K. Crammer and Y. Singer, “On the learnability and design of output codes for multiclass problems,” in Computational Learning Theory, pp. 35–46, 2000.

16. K.-R. Müller, A. Smola, G. Rätsch, B. Schölkopf, J. Kohlmorgen, and V. Vapnik, “Predicting time series with support vector machines,” in Artificial Neural Networks - ICANN’97 (W. Gerstner, A. Germond, M. Hasler, and J.-D. Nicoud, eds.), pp. 999–1004, 1997.

17. S. Mukherjee, E. Osuna, and F. Girosi, “Nonlinear prediction of chaotic time series using support vector machines,” in 1997 IEEE Workshop on Neural Networks for Signal Processing, pp. 511–519, 1997.

18. F. E. H. Tay and L. Cao, “Application of support vector machines in financial time series forecasting,” Omega, vol. 29, pp. 309–317, 2001.

19. L. J. Cao, K. S. Chua, and L. K. Guan, “c-ascending support vector machines for financial time series forecasting,” in 2003 International Conference on Computational Intelligence for Financial Engineering (CIFEr2003), (Hong Kong), pp. 317–323, 2003.

20. H. Drucker, C. J. C. Burges, L. Kaufman, A. Smola, and V. Vapnik, “Support vector regression machines,” in Advances in Neural Information Processing Systems, vol. 9, p. 155, The MIT Press, 1997.

21. R. Fletcher, Practical methods of optimization. Chichester and New York: John Wiley and Sons, 1987.

22. M. Aizerman, E. Braverman, and L. Rozonoer, “Theoretical foundations of the potential function method in pattern recognition learning,” Automations and Remote Control, vol. 25, pp. 821–837, 1964.

23. N. J. Nilsson, Learning machines: Foundations of trainable pattern classifying systems. McGraw-Hill, 1965.

24. L. Kaufman, “Solving the quadratic programming problem arising in support vector classification,” in Advances in Kernel Methods: Support Vector Learning (B. Schölkopf, C. Burges, and A. Smola, eds.), pp. 147–168, Cambridge, MA: MIT Press, 1998.

25. J. Platt, “Sequential minimal optimization: A fast algorithm for training support vector machines,” Tech. Rep. 98-14, Microsoft Research, Washington, 1998.

26. J. Platt, “Fast training of support vector machines using sequential minimal optimization,” in Advances in Kernel Methods: Support Vector Learning (B. Schölkopf, C. Burges, and A. Smola, eds.), pp. 185–208, Cambridge, MA: MIT Press, 1998.

27. C.-C. Chang and C.-J. Lin, “LIBSVM: a library for support vector machines,” 2001. Software available at http://www.csie.ntu.edu.tw/cjlin/libsvm/.

28. T. Joachims, “Making large-scale SVM learning practical,” in Advances in Kernel Methods: Support Vector Learning (B. Schölkopf, C. Burges, and A. Smola, eds.), pp. 169–184, Cambridge, MA: MIT Press, 1998.

29. B. E. Boser, I. Guyon, and V. Vapnik, “A training algorithm for optimal margin classifiers,” in Computational Learning Theory, pp. 144–152, 1992.

30. X. Zhang, “Using class-center vectors to build support vector machines,” in 1999 IEEE Workshop on Neural Networks for Signal Processing, pp. 3–11, 1999.

31. C.-F. Lin and S.-D. Wang, “Fuzzy support vector machines,” IEEE Transactions on Neural Networks, vol. 13, no. 2, pp. 464–471, 2002.

32. N. D. Freitas, M. Milo, P. Clarkson, M. Niranjan, and A. Gee, “Sequential support vector machines,” in 1999 IEEE Workshop on Neural Networks for Signal Processing, pp. 31–40, 1999.

33. S. A. Yaser and A. F. Atiya, “Introduction to financial forecasting,” Applied Intelligence, vol. 6, pp. 205–213, 1996.

34. K. K. Lee, S. R. Gunn, C. J. Harris, and P. A. S. Reed, “Classification of unbalanced data with transparent kernels,” in International Joint Conference on Neural Networks (IJCNN ’01), vol. 4, pp. 2445–2450, July 2001.

35. A. T. Quang, Q.-L. Zhang, and X. Li, “Evolving support vector machine parameters,” in 2002 International Conference on Machine Learning and Cybernetics, vol. 1, pp. 548–551, 2002.

36. L. J. Cao, H. P. Lee, and W. K. Chong, “Modified support vector novelty detector using training data with outliers,” Pattern Recognition Letters, vol. 24, pp. 2479–2487, 2003.

37. J. Weston, “Leave-one-out support vector machines,” in Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, IJCAI 99 (T. Dean, ed.), pp. 727–733, Morgan Kaufmann, 1999.

38. J. Weston and R. Herbrich, “Adaptive margin support vector machines,” in Advances in Large Margin Classifiers, pp. 281–295, Cambridge, MA: MIT Press, 2000.

39. C.-F. Lin and S.-D. Wang, “Training algorithms for fuzzy support vector machines with noisy data,” in 2003 IEEE Workshop on Neural Networks for Signal Processing, 2003.

40. J. A. K. Suykens and J. Vandewalle, “Least squares support vector machine classifiers,” Neural Processing Letters, vol. 9, no. 3, pp. 293–300, 1999.

41. K. S. Chua, “Efficient computations for large least square support vector machine classifiers,” Pattern Recognition Letters, vol. 24, pp. 75–80, 2003.

42. D. S. Chen and R. C. Jain, “A robust back propagation learning algorithm for function approximation,” IEEE Transactions on Neural Networks, vol. 5, no. 3, pp. 467–479, 1994.

43. N. Cristianini, J. Shawe-Taylor, A. Elisseeff, and J. Kandola, “On kernel-target alignment,” in Advances in Neural Information Processing Systems 14, pp. 367–373, MIT Press, 2002.

44. O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee, “Choosing multiple parameters for support vector machines,” Machine Learning, vol. 46, no. 1-3, pp. 131–159, 2002.

45. G. Rätsch, T. Onoda, and K.-R. Müller, “Soft margins for AdaBoost,” Machine Learning, vol. 42, no. 3, pp. 287–320, 2001.
