
9 Applications of ν-SV Classifiers

Researchers have applied the ν-SVM to a variety of applications. Some of them find it easier and more intuitive to work with ν ∈ [0, 1] than with C ∈ [0, ∞). Here, we briefly summarize some work that uses LIBSVM to solve ν-SVM problems.
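As a concrete illustration (not taken from the works cited below), the following minimal sketch trains a ν-SV classifier through scikit-learn's NuSVC class, which is built on top of LIBSVM; the toy data and the value ν = 0.1 are arbitrary choices made only to show where the parameter enters.

```python
# Minimal sketch (illustration only): training a nu-SV classifier via
# scikit-learn's NuSVC, which is built on top of LIBSVM.
# The toy data and the choice nu = 0.1 are illustrative assumptions.
import numpy as np
from sklearn.svm import NuSVC

rng = np.random.RandomState(0)

# Two Gaussian blobs as a toy binary classification problem.
X = np.vstack([rng.randn(100, 2) + [2, 2],
               rng.randn(100, 2) - [2, 2]])
y = np.hstack([np.ones(100), -np.ones(100)])

# nu in (0, 1] plays the role that C plays in the C-SVM formulation.
clf = NuSVC(nu=0.1, kernel="rbf", gamma="scale")
clf.fit(X, y)

print("training accuracy:", clf.score(X, y))
print("support vectors per class:", clf.n_support_)
```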

In [7], researchers from HP Labs discuss a personal email agent. Data classification is an important component of this system, for which the authors use ν-SVM because they find that “the ν parameter is more intuitive than the C parameter.”

The work in [23] applies machine learning methods to detect and localize boundaries in natural images. Several classifiers are tested; for the SVM, the authors consider ν-SVM.

10 Conclusion

One of the most appealing features of kernel algorithms is the solid foundation provided by both statistical learning theory and functional analysis. Kernel methods let us interpret (and design) learning algorithms geometrically in feature spaces nonlinearly related to the input space, and combine statistics and geometry in a promising way. Kernels provide an elegant framework for studying three fundamental issues of machine learning:

– Similarity measures — the kernel can be viewed as a (nonlinear) similarity measure, and should ideally incorporate prior knowledge about the problem at hand

– Data representation — as described above, kernels induce representations of the data in a linear space

– Function class — due to the representer theorem, the kernel implicitly also determines the function class which is used for learning.

The support vector machine is one of the major kernel methods for data classification. Its original form requires a parameter C ∈ [0, ∞), which controls the trade-off between the classifier capacity and the training errors. In the ν-parameterization, the parameter C is replaced by a parameter ν ∈ [0, 1]. In this tutorial, we have given the derivation of the ν-support vector classifier and presented possible advantages of using it.
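To make the role of ν concrete, the following hedged sketch (again using scikit-learn's LIBSVM-based NuSVC; the synthetic data and the grid of ν values are assumptions for illustration) trains ν-SV classifiers for several values of ν and reports the fraction of training points that become support vectors, which the ν-SVC theory discussed earlier bounds from below by ν.

```python
# Sketch: the fraction of support vectors is bounded below by nu.
# Synthetic data and the grid of nu values are illustrative assumptions.
import numpy as np
from sklearn.svm import NuSVC  # LIBSVM-based nu-SV classifier

rng = np.random.RandomState(1)
X = rng.randn(300, 2)
# Noisy linear labels, so that some training errors are unavoidable.
y = np.where(X[:, 0] + X[:, 1] + 0.5 * rng.randn(300) > 0, 1, -1)

for nu in (0.1, 0.3, 0.5):
    clf = NuSVC(nu=nu, kernel="rbf", gamma="scale").fit(X, y)
    frac_sv = clf.support_.size / len(y)
    print(f"nu = {nu:.1f}: fraction of support vectors = {frac_sv:.2f}")
```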

Acknowledgments

The authors thank Ingo Steinwart and Arthur Gretton for some helpful comments.

References

1. M. A. Aizerman, É. M. Braverman, and L. I. Rozonoér. Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control, 25:821–837, 1964.

2. N. Alon, S. Ben-David, N. Cesa-Bianchi, and D. Haussler. Scale-sensitive dimensions, uniform convergence, and learnability. Journal of the ACM, 44(4):615–631, 1997.

3. M. Avriel. Nonlinear Programming. Prentice-Hall Inc., New Jersey, 1976.

4. P. L. Bartlett and J. Shawe-Taylor. Generalization performance of support vector machines and other pattern classifiers. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods — Support Vector Learning, pages 43–54, Cambridge, MA, 1999. MIT Press.

5. M. S. Bazaraa, H. D. Sherali, and C. M. Shetty. Nonlinear Programming: Theory and Algorithms. Wiley, second edition, 1993.

6. K. P. Bennett and E. J. Bredensteiner. Duality and geometry in SVM classifiers. In P. Langley, editor, Proceedings of the 17th International Conference on Machine Learning, pages 57–64, San Francisco, California, 2000. Morgan Kaufmann.

7. R. Bergman, M. Griss, and C. Staelin. A personal email assistant. Technical Report HPL-2002-236, HP Laboratories, Palo Alto, CA, 2002.

8. B. E. Boser, I. M. Guyon, and V. Vapnik. A training algorithm for optimal margin classifiers. In D. Haussler, editor, Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pages 144–152, Pittsburgh, PA, July 1992. ACM Press.

9. C. J. C. Burges and B. Schölkopf. Improving the accuracy and speed of support vector learning machines. In M. Mozer, M. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems 9, pages 375–381, Cambridge, MA, 1997. MIT Press.

10. C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

11. C.-C. Chang and C.-J. Lin. Training ν-support vector classifiers: Theory and algorithms. Neural Computation, 13(9):2119–2147, 2001.

12. H.-G. Chew, R. E. Bogner, and C.-C. Lim. Dual ν-support vector machine with error rate and training size biasing. In Proceedings of ICASSP, pages 1269–72, 2001.

13. H. G. Chew, C. C. Lim, and R. E. Bogner. An implementation of training dual-nu support vector machines. In Qi, Teo, and Yang, editors, Optimization and Control with Applications. Kluwer, 2003.

14. K.-M. Chung, W.-C. Kao, C.-L. Sun, and C.-J. Lin. Decomposition methods for linear support vector machines. Technical report, Department of Computer Science and Information Engineering, National Taiwan University, 2002.

15. C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20:273–297, 1995.

16. D. J. Crisp and C. J. C. Burges. A geometric interpretation of ν-SVM classifiers. In S. A. Solla, T. K. Leen, and K.-R. Müller, editors, Advances in Neural Information Processing Systems 12. MIT Press, 2000.

17. A. Gretton, R. Herbrich, O. Chapelle, B. Schölkopf, and P. J. W. Rayner. Estimating the Leave-One-Out Error for Classification Learning with SVMs. Technical Report CUED/F-INFENG/TR.424, Cambridge University Engineering Department, 2001.

18. C.-W. Hsu and C.-J. Lin. A comparison of methods for multi-class support vector machines. IEEE Transactions on Neural Networks, 13(2):415–425, 2002.

19. T. Joachims. Making large-scale SVM learning practical. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods — Support Vector Learning, pages 169–184, Cambridge, MA, 1999. MIT Press.

20. C.-J. Lin. Formulations of support vector machines: a note from an optimization point of view. Neural Computation, 13(2):307–317, 2001.

21. C.-J. Lin. On the convergence of the decomposition method for support vector machines. IEEE Transactions on Neural Networks, 12(6):1288–1298, 2001.

22. A. Luntz and V. Brailovsky. On estimation of characters obtained in statistical procedure of recognition. Technicheskaya Kibernetica, 3, 1969.

23. D. R. Martin, C. C. Fowlkes, and J. Malik. Learning to detect natural image boundaries using brightness and texture. In Advances in Neural Information Processing Systems, volume 14, 2002.

24. J. Mercer. Functions of positive and negative type and their connection with the theory of integral equations. Philosophical Transactions of the Royal Society, London, A 209:415–446, 1909.

25. D. Michie, D. J. Spiegelhalter, and C. C. Taylor. Machine Learning, Neural and Statistical Classification. Prentice Hall, Englewood Cliffs, N.J., 1994. Data available at http://www.ncc.up.pt/liacc/ML/statlog/datasets.html.

26. M. Opper and O. Winther. Gaussian processes and SVM: Mean field and leave-one-out estimator. In A. Smola, P. Bartlett, B. Schölkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers, Cambridge, MA, 2000. MIT Press.

27. F. Perez Cruz, J. Weston, D. J. L. Herrmann, and B. Schölkopf. Extension of the ν-SVM range for classification. In J. Suykens, G. Horvath, S. Basu, C. Micchelli, and J. Vandewalle, editors, Advances in Learning Theory: Methods, Models and Applications, 190, pages 179–196, Amsterdam, 2003. IOS Press.

28. J. C. Platt. Fast training of support vector machines using sequential minimal optimization. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods — Support Vector Learning, Cambridge, MA, 1998. MIT Press.

29. B. Schölkopf. Support Vector Learning. R. Oldenbourg Verlag, München, 1997. Doktorarbeit, Technische Universität Berlin. Available from http://www.kyb.tuebingen.mpg.de/~bs.

30. B. Schölkopf, C. J. C. Burges, and A. J. Smola. Advances in Kernel Methods — Support Vector Learning. MIT Press, Cambridge, MA, 1999.

31. B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.

32. B. Schölkopf, A. J. Smola, R. C. Williamson, and P. L. Bartlett. New support vector algorithms. Neural Computation, 12:1207–1245, 2000.

33. A. J. Smola, P. L. Bartlett, B. Schölkopf, and D. Schuurmans. Advances in Large Margin Classifiers. MIT Press, Cambridge, MA, 2000.

34. I. Steinwart. Support vector machines are universally consistent. Journal of Complexity, 18:768–791, 2002.

35. I. Steinwart. On the optimal parameter choice for ν-support vector machines. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003. To appear.

36. I. Steinwart. Sparseness of support vector machines. Technical report, 2003.

37. V. Vapnik. Estimation of Dependences Based on Empirical Data (in Russian). Nauka, Moscow, 1979. (English translation: Springer Verlag, New York, 1982).

38. V. Vapnik. The Nature of Statistical Learning Theory. Springer Verlag, New York, 1995.

39. V. Vapnik. Statistical Learning Theory. Wiley, NY, 1998.

40. V. Vapnik and O. Chapelle. Bounds on error expectation for support vector machines. Neural Computation, 12(9):2013–2036, 2000.

41. V. Vapnik and A. Chervonenkis. Theory of Pattern Recognition (in Russian). Nauka, Moscow, 1974. (German Translation: W. Wapnik & A. Tscherwonenkis, Theorie der Zeichenerkennung, Akademie-Verlag, Berlin, 1979).

42. V. Vapnik and A. Lerner. Pattern recognition using generalized portrait method. Automation and Remote Control, 24:774–780, 1963.

43. G. Wahba. Spline Models for Observational Data, volume 59 of CBMS-NSF Regional Conference Series in Applied Mathematics. Society for Industrial and Applied Mathematics, Philadelphia, Pennsylvania, 1990.

44. R. C. Williamson, A. J. Smola, and B. Schölkopf. Generalization performance of regularization networks and support vector machines via entropy numbers of compact operators. IEEE Transactions on Information Theory, 47(6):2516–2532, 2001.

45. P. Wolfe. A duality theorem for non-linear programming. Quarterly of Applied Mathematics, 19:239–244, 1961.
