

5.2 Benchmark Datasets

In this section we present experimental results on several benchmark datasets.

From the UCI Repository [Blake, 1998] we choose the following datasets: iris, wine, dermatology, house-vote-88, ionosphere, and sonar. From the Statlog Collection [Michie, 1994] we choose the following datasets: vehicle and german. Furthermore, we choose face and imox from [Mitchell, 1997] and [Chen, 2005], respectively. In this experiment, we compare our approach with the hyperplane-based multi-class SVM approaches, including one-against-all, one-against-one, and DAGSVM. In addition, we compare our approach with the hypersphere-based multi-class SVMs, including M-SVDD and M-SVDD-NEG. For the hypersphere-based SVMs, we evaluate the classification performance using the different similarity functions mentioned in Section 3.2.

The most important criterion for evaluating the performance of these methods is their accuracy rate. However, it is unfair to compare the methods using only one parameter set; in practice, the best parameters for each method are first obtained by performing model selection. For each problem, we estimate the generalized accuracy using different kernel parameters γ = [2^2, 2^1, 2^0, …, 2^-15] and regularization parameters C = [2^15, 2^12, 2^11, …, 2^-4]. Therefore, for each problem we try 1820 combinations [Hsu, 2002]. We apply five-fold cross-validation to select the model parameters.

Namely, for each problem we partition the available examples into five disjoint subsets ('folds') of approximately equal size. The classifier is trained on all the subsets except one, and the validation error is measured by testing it on the subset left out. This procedure is repeated for a total of five trials, each time using a different subset for validation. The performance of the model is assessed by averaging the validation error over all five trials. Based on the resulting cross-validation rates, we infer proper values for the model parameters.
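
To make the procedure concrete, the following is a minimal sketch of this grid search with five-fold cross-validation, using scikit-learn's SVC as a stand-in classifier (the hypersphere-based solvers evaluated here are not part of that library); the exponent ranges follow the grid endpoints quoted above.

```python
# A minimal sketch of the grid-search model selection described above,
# using scikit-learn's SVC as a stand-in classifier; the grid endpoints
# follow the text (C from 2^15 down to 2^-4, gamma from 2^2 down to 2^-15).
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_grid = {
    "C":     [2.0**e for e in range(15, -5, -1)],   # 2^15 ... 2^-4
    "gamma": [2.0**e for e in range(2, -16, -1)],   # 2^2  ... 2^-15
}

# Five-fold cross-validation: each (C, gamma) pair is scored by the
# average validation accuracy over the five held-out folds.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

print("best (C, gamma):", search.best_params_)
print("cross-validation rate:", search.best_score_)
```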

Table II. A comparison of classification performance (best rates boldfaced) on the datasets iris, wine, dermatology, house-vote-88, ionosphere, sonar, vehicle, german, face, and imox, for one-against-all, one-against-one, DAGSVM, M-SVDD, M-SVDD-NEG, and the proposed method.

Table II presents the results of comparing these methods. We report the optimal parameters (C, γ) and the corresponding cross-validation rates. Note that (a)-(e) denote the five similarity functions mentioned in Section 3.2, respectively. In addition, we report the optimal model parameters C, γ, and M as their base-2 logarithms. It can be seen that the optimal model parameters fall in different ranges for different problems, so it is critical to perform the model selection task.

The previous hypersphere-based SVM classifiers, M-SVDD and M-SVDD-NEG, give worse results than the standard hyperplane-based SVM classifiers on most of the datasets. However, with our proposed algorithm, which incorporates the concept of maximal margin, the classification performance of the resulting hypersphere-based classifiers improves significantly and exceeds that of the standard hyperplane-based SVM classifiers on most of the datasets tested. In addition, Wu's similarity function and the proposed Chiang's similarity function achieve better accuracy rates than the other similarity functions.

6. Conclusions

The solution of the binary classification problem using the SVM has been well developed. For multi-class classification problems, two types of multi-class SVMs have been proposed: the hyperplane-based SVM and the hypersphere-based SVM. Wang et al. [Wang, 2005] first incorporated the concept of maximal margin into the hypersphere-based SVM for the two-class classification problem via a single sphere, by adjusting the ratio of the radius of the sphere to the separation margin. In this paper, we extend Wang's approach to multi-class problems and propose a maximal-margin spherical-structured multi-class support vector machine (MSM-SVM). The proposed MSM-SVM approach finds several class-specific hyperspheres, each of which encloses all positive examples but excludes all negative examples; moreover, each hypersphere separates the positive examples from the negative examples with maximal margin. The proposed MSM-SVM has the advantage of using the parameters M and C to control the number of support vectors: with M and C limiting the maximum number of outlier support vectors (OSVs) and the minimum number of total support vectors (SVs), the selection of (M, C) is more intuitive. We also propose a new fuzzy similarity function and give an experimental comparison of the similarity functions proposed in previous spherical-structured SVMs. Experimental results show that the proposed method performs fairly well on both artificial and benchmark datasets.
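
To make the geometry concrete, the following is a hedged sketch of the decision stage only, assuming each class has already been summarized by a trained center and radius; the distance-to-radius score used here is a simple stand-in for the similarity functions of Section 3.2 and is not the paper's exact rule.

```python
# A sketch of classification with K class-specific hyperspheres.
# centers and radii are assumed to come from training (not shown);
# the paper works in a kernel-induced feature space, whereas this toy
# example stays in the input space for readability.
import numpy as np

def predict(x, centers, radii):
    """Assign x to the class whose hypersphere it fits best.

    centers: (K, d) array of class-specific sphere centers
    radii:   (K,)   array of the corresponding sphere radii
    """
    dists = np.linalg.norm(centers - x, axis=1)
    # Smaller distance-to-radius ratio = deeper inside that class's sphere.
    scores = dists / radii
    return int(np.argmin(scores))

# Toy usage with two made-up spheres.
centers = np.array([[0.0, 0.0], [3.0, 3.0]])
radii = np.array([1.0, 1.5])
print(predict(np.array([2.5, 2.8]), centers, radii))  # -> 1
```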

We now discuss the time complexity of the proposed approach. Empirically, SVM training is observed to scale super-linearly with the training size N [Platt, 1999], according to the power law T ≈ cN^r, where r ≈ 2 for algorithms based on the Sequential Minimal Optimization (SMO) decomposition method and c is a proportionality constant. In our training phase, we need to solve for K optimal class-specific hyperspheres, each with training size N, so the training time complexity is O(KN^r). The time complexity of the proposed approach thus equals that of the one-against-all method, which is satisfactory for many real-world applications.
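
A small numeric illustration of this estimate, with a made-up proportionality constant c, shows how the cost scales:

```python
# Numeric illustration of the power-law training-cost estimate T ~ c * N**r;
# c and the resulting numbers are made-up values for illustration only.
def total_training_time(K, N, c=1e-6, r=2.0):
    """Estimated cost of solving K class-specific hypersphere problems,
    each over all N training examples: K * c * N**r, i.e. O(K * N**r)."""
    return K * c * N**r

# Doubling N roughly quadruples the cost when r ~= 2, independent of K.
print(total_training_time(K=3, N=1000))   # 3.0
print(total_training_time(K=3, N=2000))   # 12.0
```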

References

[1] C. L. Blake and C. J. Merz, UCI Repository of Machine Learning Databases, Univ. California, Dept. Inform. Comput. Sci., Irvine, CA, 1998. [Online]. Available: http://kdd.ics.uci.edu/

[2] L. Bottou, C. Cortes, J. Denker, H. Drucker, I. Guyon, L. Jackel, Y. LeCun, U. Muller, E. Sackinger, P. Simard, and V. Vapnik, “Comparison of classifier methods: A case study in handwritten digit recognition,” in Proc. Int. Conf. Pattern Recognition, pp. 77-87, 1994.

[3] C. C. Chang and C. J. Lin, LIBSVM: A Library for Support Vector Machines, 2001. [Online]. Available: http://www.csie.ntu.edu.tw/~cjlin/libsvm/

[4] C. C. Chen, Computational Mathematics, Univ. Tsing Hua, Institute of Information Systems & Applications, 2005. Data available at http://www.cs.nthu.edu.tw/~cchen/ISA5305/isa5305.html

[5] J.-H. Chiang and P.-Y. Hao, “A New Kernel-Based Fuzzy Clustering Approach: Support Vector Clustering with Cell Growing,” IEEE Trans. on Fuzzy Systems, vol. 11, pp. 518-527, 2003.

[6] P. W. Cooper, “The hypersphere in pattern recognition,” Information and Control, vol. 5, pp. 324-346, 1962.

[7] P. W. Cooper, “Note on adaptive hypersphere decision boundary,” IEEE Transactions on Electronic Computers, pp. 948-949, 1966.

[8] C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, pp. 273-297, 1995.

[9] K. Crammer and Y. Singer, “On the learnability and design of output codes for multiclass problems,” in Computational Learning Theory, pp. 35-46, 2000.

[10] R. E. Fan, P. H. Chen, and C. J. Lin, “Working Set Selection Using Second Order Information for Training Support Vector Machines,” Journal of Machine Learning Research, vol. 6, pp. 1889-1918, 2005.

[11] K. Fukunaga, Introduction to Statistical Pattern Recognition (Second Edition), Academic Press, New York, 1990.

[12] C. W. Hsu and C. J. Lin, “A comparison of methods for multiclass support vector machines,” IEEE Trans. on Neural Networks, vol. 13, pp. 415-425, 2002.

[13] U. Kreel, “Pairwise classification and support vector machines,” in Advances in

Kernel Methods—Support Vector Learning, B. Schölkopf, C. J. C. Burges, and A. J.

Smola, Eds. MIT Press, Cambridge, MA, pp. 255-268, 1999.

[14] L. M. Manevitz and M. Yousef, “One-class SVMs for document classification,” Journal of Machine Learning Research, vol. 2, pp. 139-154, 2001.

[15] M. Marchand and J. Shawe-Taylor, “The set covering machine,” Journal of Machine Learning Research, vol. 3, pp. 723-746, 2002.

[16] D. Michie, D. J. Spiegelhalter, and C. C. Taylor, Machine Learning, Neural and Statistical Classification, Ellis Horwood, 1994. [Online]. Available: http://www.maths.leeds.ac.uk/~charles/statlog/

[17] T. Mitchell, Machine Learning, McGraw Hill, 1997. Data available at http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mitchell/ftp/faces.html

[18] S. Mukherjee and V. Vapnik, “Multivariate density estimation: A support vector machine approach,” Technical Report A.I. Memo No. 1653, MIT AI Lab, 1999.

[19] J. C. Platt, “Fast training of support vector machines using sequential minimal optimization,” in Advances in Kernel Methods—Support Vector Learning, B. Schölkopf, C. J. C. Burges, and A. J. Smola, Eds. MIT Press, Cambridge, MA, pp. 185-208, 1999.

[20] J. C. Platt, N. Cristianini, and J. Shawe-Taylor, “Large margin DAGs for multiclass classification,” in Advances in Neural Information Processing Systems, MIT Press, Cambridge, MA, vol. 12, pp. 547-553, 2000.

[21] D. L. Reilly, L. N. Cooper, and C. Elbaum, “A neural model for category learning,” Biological Cybernetics, vol. 45, pp. 35–41, 1982.

[22] B. Schölkopf, C. Burges, and V. Vapnik, “Extracting support data for a given task,” in Proceedings of the First International Conference on Knowledge Discovery and Data Mining, pp. 252-257, 1995.

[23] B. Schölkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson, “Estimating the support of a high-dimensional distribution,” Neural Computation, vol. 13, pp. 1443-1471, 2001.

[24] D. Tax and R. Duin, “Support vector domain description,” Pattern Recognition Letters, vol. 20, no. 11-13, pp. 1191-1199, 1999.

[25] D. Tax and R. Duin, “Support Vector Data Description,” Machine Learning, vol. 54, pp. 45-66, 2004.

[26] V. Vapnik, Statistical Learning Theory, Wiley, New York, 1998.

[27] J. Wang, P. Neskovic, and L. N. Cooper, “Pattern Classification via Single Spheres,” Lecture Notes in Artificial Intelligence, vol. 3735, pp. 241-252, 2005.

[28] J. Weston and C. Watkins, “Multi-class support vector machines,” in Proceedings of ESANN99, M. Verleysen, Ed. Brussels, 1999.

[29] Q. Wu, X. Shen, Y. Li, G. Xu, W. Yan, G. Dong, and Q. Yang, “Classifying the Multiplicity of the EEG Source Models Using Sphere-Shaped Support Vector Machines,” IEEE Trans. on Magnetics, vol. 41, pp. 1912-1915, 2005.

[30] M. L. Zhu, S. F. Chen, and X. D. Liu, “Sphere-structured support vector machines for multi-class pattern recognition,” Lecture Notes in Computer Science, vol. 2639, pp. 589-593, 2003.
