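The Overall Procedure - Research and Implementation of Key Technologies for Mobile E-Commerce Systems, Subproject 1: A Distributed Intelligent Agent Architecture with Multi-Strategy Learning in Mobile E-Commerce (3/3)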

There is no explicit method for choosing the parameters of SVMs. A gradient descent algorithm over the set of parameters that minimizes an estimate of the generalization error of SVMs is discussed in [44]. Exhaustive search (grid search), on the other hand, is the popular method for choosing parameters, but it becomes intractable in this application as the number of parameters grows.

To select parameters in this kind of problem, we divide the training procedure into two main parts and propose the following procedure.

1. Use the original algorithm of SVMs to get the optimal kernel parameters and the regularization parameter C.

2. Fix the kernel parameters and the regularization parameter C obtained in the previous step, and find the other parameters of FSVMs.

a) Define the heuristic function h(x).

b) Use exhaustive search or grid search to choose the confident factor $h_C$, the trashy factor $h_T$, the mapping degree $d$, and the fuzzy membership lower bound $\sigma$.
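The two-stage procedure above can be sketched as follows. This is a minimal illustration, not the implementation used in the report: `grid_search`, `two_stage_search`, and the error callbacks `svm_error` and `fsvm_error` are hypothetical names, and each callback is assumed to return an error estimate for a given parameter dictionary (lower is better).

```python
from itertools import product


def grid_search(evaluate, grid, fixed=None):
    """Return (best_params, best_error) over the Cartesian product of `grid`.

    `evaluate` is a hypothetical callback mapping a parameter dict to an
    error estimate; `fixed` holds parameters that are kept constant.
    """
    fixed = dict(fixed or {})
    names = list(grid)
    best_params, best_error = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = {**fixed, **dict(zip(names, values))}
        error = evaluate(params)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error


def two_stage_search(svm_error, fsvm_error, svm_grid, fsvm_grid):
    """Tune the SVM parameters first, then the FSVM-specific ones."""
    # Stage 1: kernel parameters and C, using the plain SVM error estimate.
    svm_params, _ = grid_search(svm_error, svm_grid)
    # Stage 2: freeze the stage-1 result and search only the FSVM parameters.
    return grid_search(fsvm_error, fsvm_grid, fixed=svm_params)
```

The point of the split is cost: with, say, three candidate values for each of four parameters, a joint grid needs 3^4 = 81 evaluations, whereas the two stages need 9 + 9 = 18.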

3.5 Experiments

In these simulations, we use the RBF kernel

$$K(x_i, x_j) = e^{-\gamma \|x_i - x_j\|^2}. \tag{44}$$

We conducted computer simulations of SVMs and FSVMs using the same data sets as in [45]. Each data set is split into 100 sample sets, each consisting of a training set and a test set. For each sample set, the test set is independent of the training set. For each data set, we train and test on the first 5 sample sets to find the parameters that give the best average test error. We then use these parameters to train and test on all the sample sets and report the average test error.
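As a rough sketch of the kernel in Eq. (44) and of the selection protocol just described, consider the code below. The function names and the `test_error(params, split)` callback are assumptions made for illustration; the callback stands for "train on the split's training set and return the error on its test set".

```python
import math
from statistics import mean


def rbf_kernel(x_i, x_j, gamma):
    """K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2), as in Eq. (44)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * squared_distance)


def select_and_evaluate(splits, candidates, test_error, n_select=5):
    """Choose parameters on the first `n_select` splits, then score on all.

    Returns (best_params, average test error of best_params over all splits).
    """
    def mean_error(params, subset):
        return mean(test_error(params, split) for split in subset)

    # Selection uses only the first few sample sets, as in the text.
    best = min(candidates, key=lambda params: mean_error(params, splits[:n_select]))
    # The reported figure is the average over every sample set.
    return best, mean_error(best, splits)
```

Selecting on only the first five sample sets keeps the search cost low, since the full set of 100 splits is visited once, with the chosen parameters.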

Since FSVMs have more parameters than the original SVM algorithm, we use the two procedures described in the previous section to find them. In the first procedure, we search for the kernel parameters and C using the original SVM algorithm. In the second procedure, we fix the kernel parameters and C found in the first procedure, and search for the parameters of the fuzzy membership mapping function.
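The fuzzy membership mapping function itself is defined earlier in the report, outside this section. Purely as an illustration of how the four parameters interact, the sketch below assumes a common form of such a mapping: membership 1 at or above the confident factor, the lower bound σ at or below the trashy factor, and a polynomial of degree d in between. The exact form is an assumption, as is the function name.

```python
def fuzzy_membership(h, h_c, h_t, d, sigma):
    """Map a heuristic value h to a membership in [sigma, 1].

    Assumed form (not taken from this section):
      h >= h_c       -> 1
      h <= h_t       -> sigma
      otherwise      -> sigma + (1 - sigma) * ((h - h_t) / (h_c - h_t)) ** d
    """
    if h >= h_c:
        return 1.0
    if h <= h_t:
        return sigma
    # Only reached when h_t < h < h_c, so the denominator is positive.
    ratio = (h - h_t) / (h_c - h_t)
    return sigma + (1.0 - sigma) * ratio ** d
```

Under this assumed form, a larger d pushes intermediate points toward the lower bound σ, while d below 1 pushes them toward 1.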

To find the parameters of the strategy using kernel-target alignment, we first fix $h_C = \max_i f_K(x_i, y_i)$ and $h_T = \min_i f_K(x_i, y_i)$, and perform a two-dimensional search over the parameters $\sigma$ and $d$. The value of $\sigma$ is chosen from 0.1 to 0.9 in steps of 0.1. In some cases, we also compare the result for $\sigma = 0.01$. The value of $d$ is chosen from $2^{-8}$ to $2^{8}$ in multiplicative steps of 2. Then we fix $\sigma$ and $d$, and perform a two-dimensional search over the parameters $h_C$ and $h_T$. The value of $h_C$ is chosen such that 0%, 10%, 20%, 30%, 40%, and 50% of the data points have a fuzzy membership of 1. The value of $h_T$ is chosen such that 0%, 10%, 20%, 30%, 40%, and 50% of the data points have a fuzzy membership of $\sigma$.
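The search grids described above can be written down directly. The sketch below also shows one way to turn a target percentage into a threshold by ranking the heuristic values; the helper names are hypothetical, and the convention that a point counts as "confident" when its value lies strictly above $h_C$ (and "trashy" when strictly below $h_T$) is an assumption chosen so that 0% reproduces the max/min starting values.

```python
SIGMAS = [i / 10 for i in range(1, 10)]        # 0.1, 0.2, ..., 0.9
DEGREES = [2.0 ** e for e in range(-8, 9)]     # 2^-8, 2^-7, ..., 2^8
PERCENTS = [0, 10, 20, 30, 40, 50]


def confident_threshold(scores, percent):
    """h_C with about `percent`% of the scores strictly above it."""
    ordered = sorted(scores, reverse=True)
    index = min(percent * len(ordered) // 100, len(ordered) - 1)
    return ordered[index]


def trashy_threshold(scores, percent):
    """h_T with about `percent`% of the scores strictly below it."""
    ordered = sorted(scores)
    index = min(percent * len(ordered) // 100, len(ordered) - 1)
    return ordered[index]


# Number of evaluations in each two-dimensional search.
first_search = len(SIGMAS) * len(DEGREES)      # sigma x d
second_search = len(PERCENTS) * len(PERCENTS)  # h_C x h_T
```

Here the two successive searches cost 153 + 36 = 189 evaluations, against 153 × 36 = 5508 for a joint four-dimensional grid over the same values.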

To find the parameters of the strategy using k-NN, we perform only a two-dimensional search over the parameters $\sigma$ and $k$. We fix $h_C = k/2$, $h_T = 0$, and $d = 1$: we did not observe much gain or loss from choosing other values of these parameters, so we skip searching over them to save time. The value of $\sigma$ is chosen from 0.1 to 0.9 in steps of 0.1. In some cases, we also compare the result for $\sigma = 0.01$. The value of $k$ is chosen from $2^{1}$ to $2^{8}$ in multiplicative steps of 2.
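The k-NN heuristic is defined earlier in the report, not in this section. As a hedged illustration, the sketch below assumes that the heuristic value of a point is the number of its k nearest neighbours (Euclidean distance) that share its label, which is consistent with the choice $h_C = k/2$ and $h_T = 0$. The function name and this definition are assumptions.

```python
def knn_same_label_counts(points, labels, k):
    """For each point, count how many of its k nearest neighbours share its label.

    Assumed heuristic: a low count suggests the point may be noise or an outlier.
    Ties in distance are broken by index order.
    """
    counts = []
    for i, (x_i, y_i) in enumerate(zip(points, labels)):
        # Squared Euclidean distance from point i to every other point.
        distances = sorted(
            (sum((a - b) ** 2 for a, b in zip(x_i, x_j)), j)
            for j, x_j in enumerate(points)
            if j != i
        )
        neighbours = [j for _, j in distances[:k]]
        counts.append(sum(1 for j in neighbours if labels[j] == y_i))
    return counts


# Toy data: the last point sits inside the +1 cluster but carries label -1.
points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,), (1.5,)]
labels = [1, 1, 1, -1, -1, -1, -1]
counts = knn_same_label_counts(points, labels, k=2)
```

In this toy example the suspicious last point gets a count of 0, the value of $h_T$ used in the text, while every other point has at least $k/2 = 1$ neighbour with its own label.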

Table 1 shows the results of our simulations. Compared with SVMs, FSVMs with kernel-target alignment perform better on 9 data sets, and FSVMs with k-NN perform better on 5 data sets. By checking the average training error of SVMs on each data set, we find that FSVMs perform well on data sets where the average training error is high. These results suggest that our algorithm can improve the performance of SVMs when the data set contains noisy data.

Table 1. The test error of SVMs, FSVMs using the kernel-target alignment strategy (KT), and FSVMs using the k-NN strategy (k-NN), and the average training error of SVMs (TR) on 13 data sets.

| Data set | SVMs | KT | k-NN | TR |
|---|---|---|---|---|
| Banana | 11.5±0.7 | 10.4±0.5 | 11.4±0.6 | 6.7 |
| B. Cancer | 26.0±4.7 | 25.3±4.4 | 25.2±4.1 | 18.3 |
| Diabetes | 23.5±1.7 | 23.3±1.7 | 23.5±1.7 | 19.4 |
| F. Solar | 32.4±1.8 | 32.4±1.8 | 32.4±1.8 | 32.6 |
| German | 23.6±2.1 | 23.3±2.3 | 23.6±2.1 | 16.2 |
| Heart | 16.0±3.3 | 15.2±3.1 | 15.5±3.4 | 12.8 |
| Image | 3.0±0.6 | 2.9±0.7 | - | 1.3 |
| Ringnorm | 1.7±0.1 | - | - | 0.0 |
| Splice | 10.9±0.7 | - | - | 0.0 |
| Thyroid | 4.8±2.2 | 4.7±2.3 | - | 0.4 |
| Titanic | 22.4±1.0 | 22.3±0.9 | 22.3±1.1 | 19.6 |
| Twonorm | 3.0±0.2 | 2.4±0.1 | 2.9±0.2 | 0.4 |
| Waveform | 9.9±0.4 | 9.9±0.4 | - | 3.5 |
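The counts of 9 and 5 data sets quoted in the text can be checked against Table 1 with a short tally. The sketch compares mean test errors only and ignores the ± spread; missing entries ("-") are encoded as `None`, and the variable and function names are chosen for illustration.

```python
# (SVM, KT, k-NN) mean test errors from Table 1; None marks a "-" entry.
RESULTS = {
    "Banana":    (11.5, 10.4, 11.4),
    "B. Cancer": (26.0, 25.3, 25.2),
    "Diabetes":  (23.5, 23.3, 23.5),
    "F. Solar":  (32.4, 32.4, 32.4),
    "German":    (23.6, 23.3, 23.6),
    "Heart":     (16.0, 15.2, 15.5),
    "Image":     (3.0, 2.9, None),
    "Ringnorm":  (1.7, None, None),
    "Splice":    (10.9, None, None),
    "Thyroid":   (4.8, 4.7, None),
    "Titanic":   (22.4, 22.3, 22.3),
    "Twonorm":   (3.0, 2.4, 2.9),
    "Waveform":  (9.9, 9.9, None),
}


def count_wins(results, column):
    """Number of data sets where the given FSVM column beats the SVM mean."""
    return sum(
        1
        for row in results.values()
        if row[column] is not None and row[column] < row[0]
    )


kt_wins = count_wins(RESULTS, 1)    # FSVMs with kernel-target alignment
knn_wins = count_wins(RESULTS, 2)   # FSVMs with k-NN
print(kt_wins, knn_wins)            # prints "9 5"
```

Many of these differences are small relative to the reported ± values, so the tally indicates direction rather than statistical significance.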

4 Conclusions

In this report, we reviewed the concept of fuzzy support vector machines and proposed training procedures for FSVMs. By associating the data points with fuzzy memberships, FSVMs treat data points with different memberships differently when learning the decision function. However, the extra freedom in selecting the memberships makes learning harder, so systematic methods are required for FSVMs to be applicable. The proposed training procedures, along with two strategies for setting the fuzzy memberships, can effectively address the membership selection problem. This makes FSVMs more feasible for reducing the effects of noise and outliers. The experiments show that the performance is better in applications with noisy data.

Selecting a proper fuzzy model for a given specific problem remains an open issue for FSVMs. Some problems may involve domains that lie outside the discipline of learning techniques. For example, economic trend prediction may work better when the domain knowledge of economics is combined with the learning techniques of computer science. The examples illustrated in this report show only the basic application of FSVMs. More versatile applications are expected in the near future.

References

1. C. Cortes and V. Vapnik, “Support vector networks,” Machine Learning, vol. 20, pp. 273–297, 1995.

2. V. Vapnik, The Nature of Statistical Learning Theory. New York: Springer, 1995.

3. V. Vapnik, Statistical Learning Theory. New York: Wiley, 1998.

4. B. Schölkopf, S. Mika, C. Burges, P. Knirsch, K.-R. Müller, G. Rätsch, and A. Smola, “Input space vs. feature space in kernel-based methods,” IEEE Transactions on Neural Networks, vol. 10, no. 5, pp. 1000–1017, 1999.

5. E. Osuna, R. Freund, and F. Girosi, “Support vector machines: Training and applications,” Tech. Rep. AIM-1602, MIT A.I. Lab., 1996.

6. V. Vapnik, Estimation of Dependences Based on Empirical Data. Springer-Verlag, 1982.

7. C. J. C. Burges, “A tutorial on support vector machines for pattern recognition,” Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.

8. A. Smola and B. Schölkopf, “A tutorial on support vector regression,” Tech. Rep. NC2-TR-1998-030, Neural and Computational Learning II, 1998.

9. C. J. C. Burges and B. Schölkopf, “Improving the accuracy and speed of support vector learning machines,” in Advances in Neural Information Processing Systems 9 (M. Mozer, M. Jordan, and T. Petsche, eds.), pp. 375–381, Cambridge, MA: MIT Press, 1997.

10. M. Schmidt, “Identifying speaker with support vector networks,” in Interface ’96 Proceedings, (Sydney), 1996.

11. S. Ben-Yacoub, Y. Abdeljaoued, and E. Mayoraz, “Fusion of face and speech data for person identity verification,” IEEE Transactions on Neural Networks, vol. 10, no. 5, pp. 1065–1074, 1999.

12. E. Osuna, R. Freund, and F. Girosi, “An improved training algorithm for support vector machines,” in 1997 IEEE Workshop on Neural Networks for Signal Processing, pp. 276–285, 1997.

13. G. Fung, O. L. Mangasarian, and J. Shavlik, “Knowledge-based support vector machine classifiers,” in Advances in Neural Information Processing, 2002.

14. T. Joachims, “Text categorization with support vector machines: learning with many relevant features,” in Proceedings of ECML-98, 10th European Conference on Machine Learning (C. Nédellec and C. Rouveirol, eds.), (Chemnitz, DE), pp. 137–142, Springer Verlag, Heidelberg, DE, 1998.

15. K. Crammer and Y. Singer, “On the learnability and design of output codes for multiclass problems,” in Computational Learning Theory, pp. 35–46, 2000.

16. K.-R. Müller, A. Smola, G. Rätsch, B. Schölkopf, J. Kohlmorgen, and V. Vapnik, “Predicting time series with support vector machines,” in Artificial Neural Networks - ICANN’97 (W. Gerstner, A. Germond, M. Hasler, and J.-D. Nicoud, eds.), pp. 999–1004, 1997.

17. S. Mukherjee, E. Osuna, and F. Girosi, “Nonlinear prediction of chaotic time series using support vector machines,” in 1997 IEEE Workshop on Neural Networks for Signal Processing, pp. 511–519, 1997.

18. F. E. H. Tay and L. Cao, “Application of support vector machines in financial time series forecasting,” Omega, vol. 29, pp. 309–317, 2001.

19. L. J. Cao, K. S. Chua, and L. K. Guan, “c-ascending support vector machines for financial time series forecasting,” in 2003 International Conference on Computational Intelligence for Financial Engineering (CIFEr2003), (Hong Kong), pp. 317–323, 2003.

20. H. Drucker, C. J. C. Burges, L. Kaufman, A. Smola, and V. Vapnik, “Support vector regression machines,” in Advances in Neural Information Processing Systems, vol. 9, p. 155, The MIT Press, 1997.

21. R. Fletcher, Practical methods of optimization. Chichester and New York: John Wiley and Sons, 1987.

22. M. Aizerman, E. Braverman, and L. Rozonoer, “Theoretical foundations of the potential function method in pattern recognition learning,” Automation and Remote Control, vol. 25, pp. 821–837, 1964.

23. N. J. Nilsson, Learning machines: Foundations of trainable pattern classifying systems. McGraw-Hill, 1965.

24. L. Kaufman, “Solving the quadratic programming problem arising in support vector classification,” in Advances in Kernel Methods: Support Vector Learning (B. Schölkopf, C. Burges, and A. Smola, eds.), pp. 147–168, Cambridge, MA: MIT Press, 1998.

25. J. Platt, “Sequential minimal optimization: A fast algorithm for training support vector machines,” Tech. Rep. 98-14, Microsoft Research, Washington, 1998.

26. J. Platt, “Fast training of support vector machines using sequential minimal optimization,” in Advances in Kernel Methods: Support Vector Learning (B. Schölkopf, C. Burges, and A. Smola, eds.), pp. 185–208, Cambridge, MA: MIT Press, 1998.

27. C.-C. Chang and C.-J. Lin, “LIBSVM: a library for support vector machines,” 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm/.

28. T. Joachims, “Making large-scale SVM learning practical,” in Advances in Kernel Methods: Support Vector Learning (B. Schölkopf, C. Burges, and A. Smola, eds.), pp. 169–184, Cambridge, MA: MIT Press, 1998.

29. B. E. Boser, I. Guyon, and V. Vapnik, “A training algorithm for optimal margin classifiers,” in Computational Learning Theory, pp. 144–152, 1992.

30. X. Zhang, “Using class-center vectors to build support vector machines,” in 1999 IEEE Workshop on Neural Networks for Signal Processing, pp. 3–11, 1999.

31. C.-F. Lin and S.-D. Wang, “Fuzzy support vector machines,” IEEE Transactions on Neural Networks, vol. 13, no. 2, pp. 464–471, 2002.

32. N. D. Freitas, M. Milo, P. Clarkson, M. Niranjan, and A. Gee, “Sequential support vector machines,” in 1999 IEEE Workshop on Neural Networks for Signal Processing, pp. 31–40, 1999.

33. S. A. Yaser and A. F. Atiya, “Introduction to financial forecasting,” Applied Intelligence, vol. 6, pp. 205–213, 1996.

34. K. K. Lee, S. R. Gunn, C. J. Harris, and P. A. S. Reed, “Classification of unbalanced data with transparent kernels,” in International Joint Conference on Neural Networks (IJCNN ’01), vol. 4, pp. 2445–2450, July 2001.

35. A. T. Quang, Q.-L. Zhang, and X. Li, “Evolving support vector machine parameters,” in 2002 International Conference on Machine Learning and Cybernetics, vol. 1, pp. 548–551, 2002.

36. L. J. Cao, H. P. Lee, and W. K. Chong, “Modified support vector novelty detector using training data with outliers,” Pattern Recognition Letters, vol. 24, pp. 2479–2487, 2003.

37. J. Weston, “Leave-one-out support vector machines,” in Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, IJCAI 99 (T. Dean, ed.), pp. 727–733, Morgan Kaufmann, 1999.

38. J. Weston and R. Herbrich, “Adaptive margin support vector machines,” in Advances in Large Margin Classifiers, pp. 281–295, Cambridge, MA: MIT Press, 2000.

39. C.-F. Lin and S.-D. Wang, “Training algorithms for fuzzy support vector machines with noisy data,” in 2003 IEEE Workshop on Neural Networks for Signal Processing, 2003.

40. J. A. K. Suykens and J. Vandewalle, “Least squares support vector machine classifiers,” Neural Processing Letters, vol. 9, no. 3, pp. 293–300, 1999.

41. K. S. Chua, “Efficient computations for large least square support vector machine classifiers,” Pattern Recognition Letters, vol. 24, pp. 75–80, 2003.

42. D. S. Chen and R. C. Jain, “A robust back propagation learning algorithm for function approximation,” IEEE Transactions on Neural Networks, vol. 5, no. 3, pp. 467–479, 1994.

43. N. Cristianini, J. Shawe-Taylor, A. Elisseeff, and J. Kandola, “On kernel-target alignment,” in Advances in Neural Information Processing Systems 14, pp. 367–373, MIT Press, 2002.

44. O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee, “Choosing multiple parameters for support vector machines,” Machine Learning, vol. 46, no. 1-3, pp. 131–159, 2002.

45. G. Rätsch, T. Onoda, and K.-R. Müller, “Soft margins for AdaBoost,” Machine Learning, vol. 42, no. 3, pp. 287–320, 2001.
