
Probabilistic Outputs for One-class SVM

Without the label information, it is difficult to obtain probabilistic outputs for one-class SVM. Que and Lin (2022) broadly checked existing approaches for one-class classification, but showed that almost none of them are suitable for one-class SVM.

They suggest that a feasible setting is to have probabilities mimic the decision values of the training data. The idea is to group instances with negative decision values into five bins and instances with positive decision values into five bins. Then we obtain 11 marks corresponding to the probability values

0, 0.1, . . . , 1.

In the prediction phase, we check which mark the decision value $\hat{f}$ of a test instance $x$ is closest to. The corresponding value is our estimate of the probability that the instance is normal (i.e., not an outlier). Specifically, after solving (8), we sort the decision values of the training data in ascending order and obtain the following points:

$$\bar{m}_0 \le \bar{m}_1 \le \cdots \le \bar{m}_5 = 0 \le \cdots \le \bar{m}_{10},$$

where $\bar{m}_i$, $i = 0, \ldots, 4$, are the $20 \times i$ percentiles of the sorted negative decision values, and $\bar{m}_i$, $i = 6, \ldots, 10$, are the $20 \times (i - 5)$ percentiles of the sorted positive decision values. The setting of $\bar{m}_5 = 0$ follows from the assumption that $P(\text{normal} \mid \hat{f} = 0) = 0.5$.

By defining
$$m_i = \frac{\bar{m}_i + \bar{m}_{i+1}}{2}, \quad i = 0, \ldots, 9,$$
identifying the closest $\bar{m}_i$ is equivalent to checking which of the following intervals the decision value falls into.

$$
\begin{array}{c|ccccc}
\hat{f} & (-\infty, m_0) & [m_0, m_1) & [m_1, m_2) & \cdots & [m_9, \infty) \\
\hline
P(\text{normal} \mid \hat{f}) & 0 & 0.1 & 0.2 & \cdots & 1
\end{array}
$$

In our implementation, $\hat{f}$ is sequentially compared with $m_i$ to locate the interval and thus the corresponding probability value.
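The construction of the marks and the interval lookup can be summarized in a short sketch. The Python code below is only illustrative and is not LIBSVM's internal implementation; the function names and the use of NumPy percentiles are our own assumptions, and the training decision values are assumed to be available from an already trained one-class model.

```python
# Illustrative sketch of the mark construction and interval lookup described
# above (not LIBSVM's internal code; names and NumPy usage are assumptions).
import numpy as np

def build_boundaries(train_dec_values):
    """Return interval boundaries m_0, ..., m_9 built from marks m-bar_0, ..., m-bar_10."""
    dec = np.asarray(train_dec_values, dtype=float)
    neg = np.sort(dec[dec < 0])
    pos = np.sort(dec[dec > 0])
    # m-bar_0..m-bar_4: the 20*i percentiles (i = 0,...,4) of negative decision values
    m_bar = [np.percentile(neg, 20 * i) for i in range(5)]
    # m-bar_5 = 0, following the assumption P(normal | f_hat = 0) = 0.5
    m_bar.append(0.0)
    # m-bar_6..m-bar_10: the 20*(i-5) percentiles (i = 6,...,10) of positive decision values
    m_bar += [np.percentile(pos, 20 * (i - 5)) for i in range(6, 11)]
    # Midpoints m_i = (m-bar_i + m-bar_{i+1}) / 2, i = 0,...,9
    return [(m_bar[i] + m_bar[i + 1]) / 2 for i in range(10)]

def predict_probability(f_hat, m):
    """Sequentially compare f_hat with m_0, ..., m_9; interval i maps to probability i/10."""
    i = 0
    while i < len(m) and f_hat >= m[i]:
        i += 1
    return i / 10.0
```

For instance, a test decision value falling in $(-\infty, m_0)$ leaves the loop at $i = 0$ and yields probability 0, while a value in $[m_9, \infty)$ yields 1, matching the table above.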

9 Parameter Selection

To train an SVM, users must specify some parameters. LIBSVM provides a simple tool to check a grid of parameters. For each parameter setting, LIBSVM obtains the cross-validation (CV) accuracy. Finally, the parameters with the highest CV accuracy are returned. The parameter selection tool assumes that the RBF (Gaussian) kernel is used, although extensions to other kernels and to SVR can easily be made. The RBF kernel takes the form

$$K(x_i, x_j) = e^{-\gamma \|x_i - x_j\|^2}, \qquad (50)$$

so $(C, \gamma)$ are the parameters to be decided. Users can provide a range of $C$ (and $\gamma$) values together with the grid spacing. Then, all grid points of $(C, \gamma)$ are tried to find the one giving the highest CV accuracy. Users then use the best parameters to train the whole training set and generate the final model.

We do not consider more advanced parameter selection methods because for only two parameters (C and γ), the number of grid points is not too large. Further, because SVM problems under different (C, γ) parameters are independent, LIBSVM provides a simple tool so that jobs can be run in a parallel (multi-core, shared memory, or distributed) environment.
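As a concrete illustration of this grid search, the following sketch uses the LIBSVM Python interface to loop over a grid of $(C, \gamma)$ values and keep the pair with the highest CV accuracy. The import path assumes the pip-installed libsvm package (older installations use `from svmutil import *`), and the grid ranges are only an example, not the tool's exact defaults; the tool shipped with LIBSVM additionally produces the contour plot and can dispatch the independent $(C, \gamma)$ jobs in parallel.

```python
# Illustrative grid search over (C, gamma) with cross-validation,
# using the LIBSVM Python interface. Grid ranges are an example only.
from libsvm.svmutil import svm_read_problem, svm_train

y, x = svm_read_problem('heart_scale')     # LIBSVM-format training data

best = (None, None, -1.0)                  # (C, gamma, CV accuracy)
for log2c in range(-5, 16, 2):             # C = 2^log2c
    for log2g in range(-15, 4, 2):         # gamma = 2^log2g
        # '-v 5' makes svm_train return 5-fold CV accuracy instead of a
        # model; '-q' suppresses solver output.
        acc = svm_train(y, x, f'-c {2.0**log2c} -g {2.0**log2g} -v 5 -q')
        if acc > best[2]:
            best = (2.0**log2c, 2.0**log2g, acc)

c, g, acc = best
print(f'best C={c}, gamma={g}, CV accuracy={acc:.2f}%')
# Retrain on the whole training set with the best parameters.
model = svm_train(y, x, f'-c {c} -g {g} -q')
```

Because each $(C, \gamma)$ setting is an independent training problem, the two loops above can be distributed over multiple cores or machines without any coordination beyond collecting the CV accuracies.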

Figure 3: Contour plot of running the parameter selection tool in LIBSVM. The data set heart_scale (included in the package) is used. The x-axis is $\log_2 C$ and the y-axis is $\log_2 \gamma$.

For multi-class classification, under a given (C, γ), LIBSVM uses the one-against-one method to obtain the CV accuracy. Hence, the parameter selection tool suggests the same (C, γ) for all k(k − 1)/2 decision functions. Chen et al. (2005, Section 8) discuss issues of using the same or different parameters for the k(k − 1)/2 two-class problems.

LIBSVM outputs the contour plot of cross-validation accuracy. An example is in Figure 3.

10 Conclusions

When we released the first version of LIBSVM in 2000, only two-class C-SVC was supported. Gradually, we added other SVM variants and supported functions such as multi-class classification and probability estimates. LIBSVM has thus become a complete SVM package. We add a function only if it is needed by enough users. By keeping the system simple, we strive to ensure good system reliability.

In summary, this article gives implementation details of LIBSVM. We are still actively updating and maintaining this package. We hope the community will benefit more from our continuing development of LIBSVM.

Acknowledgments

This work was supported in part by the National Science Council of Taiwan via the grants NSC 89-2213-E-002-013 and NSC 89-2213-E-002-106. The authors thank their group members and users for many helpful comments. A list of acknowledgments is at http://www.csie.ntu.edu.tw/~cjlin/libsvm/acknowledgements.

References

B. E. Boser, I. Guyon, and V. Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pages 144–152. ACM Press, 1992.

C.-C. Chang and C.-J. Lin. Training ν-support vector classifiers: Theory and algorithms. Neural Computation, 13(9):2119–2147, 2001.

C.-C. Chang and C.-J. Lin. Training ν-support vector regression: Theory and algorithms. Neural Computation, 14(8):1959–1977, 2002.

C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2(3):27:1–27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

P.-H. Chen, C.-J. Lin, and B. Schölkopf. A tutorial on ν-support vector machines. Applied Stochastic Models in Business and Industry, 21:111–136, 2005. URL http://www.csie.ntu.edu.tw/~cjlin/papers/nusvmtoturial.pdf.

P.-H. Chen, R.-E. Fan, and C.-J. Lin. A study on SMO-type decomposition methods for support vector machines. IEEE Transactions on Neural Networks, 17:893–908, July 2006. URL http://www.csie.ntu.edu.tw/~cjlin/papers/generalSMO.pdf.

C. Cortes and V. Vapnik. Support-vector network. Machine Learning, 20:273–297, 1995.

D. J. Crisp and C. J. C. Burges. A geometric interpretation of ν-SVM classifiers. In S. Solla, T. Leen, and K.-R. Müller, editors, Advances in Neural Information Processing Systems, volume 12, Cambridge, MA, 2000. MIT Press.

K. C. Dorff, N. Chambwe, M. Srdanovic, and F. Campagne. BDVal: reproducible large-scale predictive model development and validation in high-throughput datasets. Bioinformatics, 26(19):2472–2473, 2010.

R.-E. Fan, P.-H. Chen, and C.-J. Lin. Working set selection using second order information for training SVM. Journal of Machine Learning Research, 6:1889–1918, 2005. URL http://www.csie.ntu.edu.tw/~cjlin/papers/quadworkset.pdf.

S. Fine and K. Scheinberg. Efficient SVM training using low-rank kernel representations. Journal of Machine Learning Research, 2:243–264, 2001.

T. Glasmachers and C. Igel. Maximum-gain working set selection for support vector machines. Journal of Machine Learning Research, 7:1437–1466, 2006.

K. Grauman and T. Darrell. The pyramid match kernel: Discriminative classification with sets of image features. In Proceedings of IEEE International Conference on Computer Vision, 2005.

M. Hanke, Y. O. Halchenko, P. B. Sederberg, S. J. Hanson, J. V. Haxby, and S. Pollmann. PyMVPA: A Python toolbox for multivariate pattern analysis of fMRI data. Neuroinformatics, 7(1):37–53, 2009. ISSN 1539-2791.

C.-W. Hsu and C.-J. Lin. A comparison of methods for multi-class support vector machines. IEEE Transactions on Neural Networks, 13(2):415–425, 2002a.

C.-W. Hsu and C.-J. Lin. A simple decomposition method for support vector machines. Machine Learning, 46:291–314, 2002b.

C.-W. Hsu, C.-C. Chang, and C.-J. Lin. A practical guide to support vector classification. Technical report, Department of Computer Science, National Taiwan University, 2003. URL http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf.

T. Joachims. Making large-scale SVM learning practical. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods – Support Vector Learning, pages 169–184, Cambridge, MA, 1998. MIT Press.

S. S. Keerthi, S. K. Shevade, C. Bhattacharyya, and K. R. K. Murthy. Improvements to Platt’s SMO algorithm for SVM classifier design. Neural Computation, 13:637–649, 2001.

S. S. Keerthi, O. Chapelle, and D. DeCoste. Building support vector machines with reduced classifier complexity. Journal of Machine Learning Research, 7:1493–1515, 2006.

S. Knerr, L. Personnaz, and G. Dreyfus. Single-layer learning revisited: a stepwise procedure for building and training a neural network. In J. Fogelman, editor, Neurocomputing: Algorithms, Architectures and Applications. Springer-Verlag, 1990.

U. H.-G. Kressel. Pairwise classification and support vector machines. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods – Support Vector Learning, pages 255–268, Cambridge, MA, 1998. MIT Press.

Y.-J. Lee and O. L. Mangasarian. RSVM: Reduced support vector machines. In Proceedings of the First SIAM International Conference on Data Mining, 2001.

C.-J. Lin and R. C. Weng. Simple probabilistic predictions for support vector regression. Technical report, Department of Computer Science, National Taiwan University, 2004. URL http://www.csie.ntu.edu.tw/~cjlin/papers/svrprob.pdf.

H.-T. Lin, C.-J. Lin, and R. C. Weng. A note on Platt’s probabilistic outputs for support vector machines. Machine Learning, 68:267–276, 2007. URL http://www.csie.ntu.edu.tw/~cjlin/papers/plattprob.pdf.

N. List and H. U. Simon. General polynomial time decomposition algorithms. Journal of Machine Learning Research, 8:303–321, 2007.

N. List and H. U. Simon. SVM-optimization and steepest-descent line search. In Proceedings of the 22nd Annual Conference on Computational Learning Theory, 2009.

J. Nivre, J. Hall, J. Nilsson, A. Chanev, G. Eryigit, S. Kubler, S. Marinov, and E. Marsi. MaltParser: A language-independent system for data-driven dependency parsing. Natural Language Engineering, 13(2):95–135, 2007.

E. Osuna, R. Freund, and F. Girosi. Training support vector machines: An application to face detection. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 1997.
