
DOI 10.1007/s10489-007-0101-z

A new maximal-margin spherical-structured multi-class support vector machine

Pei-Yi Hao · Jung-Hsien Chiang · Yen-Hsiu Lin

Published online: 18 October 2007

© Springer Science+Business Media, LLC 2007

Abstract Support vector machines (SVMs), initially proposed for two-class classification problems, have been very successful in pattern recognition problems. For multi-class classification problems, the standard hyperplane-based SVMs are built by constructing and combining several maximal-margin hyperplanes, and each class of data is confined to a certain region bounded by those hyperplanes. Instead of using hyperplanes, hyperspheres that tightly enclose the data of each class can be used. Since a class-specific hypersphere is constructed for each class separately, spherical-structured SVMs can deal with the multi-class classification problem easily. In addition, the center and radius of a class-specific hypersphere characterize the distribution of examples from that class, and may be useful for dealing with class-imbalance problems. In this paper, we incorporate the concept of maximal margin into the spherical-structured SVMs. The proposed approach also has the advantage of using a new parameter for controlling the number of support vectors. Experimental results show that the proposed method performs well on both artificial and benchmark datasets.

P.-Y. Hao
Department of Information Management, National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan
e-mail: haupy@cc.kuas.edu.tw

J.-H. Chiang · Y.-H. Lin
Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan

J.-H. Chiang
e-mail: jchiang@mail.ncku.edu.tw

Y.-H. Lin
e-mail: yanxiu@cad.csie.ncku.edu.tw

Keywords Support vector machines (SVMs) · Multi-class classification · Spherical classifier · Maximal-margin classifier · Quadratic programming

1 Introduction

The support vector machine (SVM) is a very promising machine learning method proposed by Vapnik et al. [8, 26]. Based on the idea of VC-dimension and the principle of structural risk minimization, an SVM is intuitively a two-class classifier in the form of a hyperplane that leaves the largest possible fraction of points of the same class on the same side, while maximizing the distance of either class from the hyperplane. The optimal maximal-margin hyperplane minimizes not only the empirical risk but also the upper bound on the expected risk, and thus has better generalization ability than traditional classifiers. Because of their excellent performance and simple structure, SVMs have been applied to many fields. Although SVMs were initially proposed for two-class classification problems, two main approaches have been proposed to solve multi-class classification problems [12]. One is the all-together multi-class SVM, which directly considers all classes in one optimization formulation [9, 28]; the other is the combined multi-class SVM, which constructs several binary classifiers through methods such as one-against-all [2], one-against-one [13], and the directed acyclic graph SVM (DAG-SVM) [20]. All these SVM classifiers are algorithms based on a maximal-margin hyperplane. For a multi-class problem, such SVMs divide the data space with several hyperplanes, and each class of data is confined to a certain region bounded by a number of hyperplanes.

Instead of using a hyperplane, a hypersphere around the examples of one class can be used. Given a set of training data, the minimum bounding hypersphere is defined as the smallest hypersphere that encloses all the data. The minimum bounding hypersphere was first used by Schölkopf, Burges, and Vapnik to estimate the VC-dimension of a classifier [22], and later applied by Tax and Duin to data domain description [24, 25]. In [23], Schölkopf et al. proposed a hyperplane-based one-class SVM and showed that it is equivalent to the minimum bounding hypersphere when using the Gaussian RBF kernel. Inspired by the minimum bounding hypersphere, spherical-structured SVMs have been proposed to solve multi-class classification problems [14, 29, 30]. With each class of data bounded by an optimal class-specific hypersphere, the whole data space is then divided by a number of such hyperspheres. Experimental results have shown that spherical-structured SVMs perform comparably to the standard hyperplane-based SVMs [14, 27, 29, 30].

Motivated by Bayes decision theory, an optimal classifier should consider the probability distribution, including the mean and variance, of each class. The use of the classical hyperplane-based SVM for probability density estimation was first introduced in [18]. Schölkopf et al. also showed that the hyperplane-based one-class SVM [23] can be used as a probability density estimator. In contrast to hyperplane-based SVMs, hypersphere-based SVMs provide the spherical center and radius, which characterize the probability distribution of each class more intuitively and may be useful for dealing with class-imbalance problems. In this paper, inspired by the maximal-margin SVM classifier and the spherical-structured SVMs, we propose a novel maximal-margin spherical-structured multi-class support vector machine (MSM-SVM). The MSM-SVM finds several class-specific hyperspheres, each of which encloses all examples from one class but excludes all examples from the remaining classes. In addition, each hypersphere separates the classes with maximal margin. Moreover, the proposed approach has the advantage of using a new parameter for controlling the number of support vectors.

The rest of this paper is organized as follows. First, we give a brief overview of the spherical-structured SVMs. In Sect. 3 we address the proposed maximal-margin spherical-structured multi-class SVM, and then analyze its properties theoretically in Sect. 4. Experiments are then discussed in Sect. 5.

2 Previous works

Spherical classifiers [27] were first introduced into pattern classification by Cooper in 1962 [6, 7]. One well-known classification algorithm consisting of spheres is the Restricted Coulomb Energy (RCE) network [21]. The RCE network is a supervised learning algorithm that learns pattern categories by representing each class as a set of prototype regions, usually spheres. Another well-known spherical classifier is the set covering machine (SCM) proposed by Marchand and Shawe-Taylor [15]. In their approach, the final classifier is a conjunction or disjunction of a set of spherical classifiers, where every spherical classifier dichotomizes the whole input space into two different classes with a sphere.

Spherical classification algorithms normally need a number of spheres in order to achieve good classification performance, and therefore have to deal with difficult theoretical and practical issues such as how many spheres are needed and how to determine the centers and radii of the spheres [27]. In contrast to previous spherical classifiers that construct spheres in the input space, the basic idea of the spherical-structured SVMs [14, 27, 29, 30] is to construct class-specific hyperspheres in the feature space induced by the kernel.

2.1 Support vector domain description (SVDD)

Tax and Duin introduced the use of a data domain description method [24, 25], inspired by the SVM developed by Vapnik. Their approach is known as the support vector domain description (SVDD). In domain description, the task is to give a description of a set of objects. This description should cover the class of objects represented by the training set, and should ideally reject all other possible objects in the object space.

We now illustrate the support vector domain description method as shown in Fig. 1. To begin, let φ denote a nonlinear transformation that maps the original input space into a high-dimensional feature space. Data domain description gives a closed boundary around the data: a hypersphere. As shown in Fig. 1(a), the SVDD may be viewed as finding the smallest sphere, characterized by its center a and radius R, that encloses all the data points in the feature space. The contour diagram shown in Fig. 1(b) can be obtained by estimating the distance between the spherical center and the corresponding input point in the feature space. The larger the distance, the darker the gray level in the contour diagram. The algorithm then further identifies the domain description using the data points enclosed inside the boundary curve. The mathematical formulation of the SVDD is as follows. Given a set of input patterns {x_i}_{i=1,...,N}, the support vector domain description method solves the problem

\[
\begin{aligned}
\min_{R,\,a,\,\xi_i}\quad & R^2 + C\sum_i \xi_i \\
\text{s.t.}\quad & \|\phi(x_i) - a\|^2 \le R^2 + \xi_i, \quad \xi_i \ge 0 \quad \forall i,
\end{aligned}
\tag{2.1}
\]


Fig. 1 Support vector domain description: (a) the minimum bounding hypersphere in the feature space; (b) the corresponding contour diagram in the input space

where R is the radius and a is the center of the enclosing hypersphere, ξ_i is a slack variable, and C > 0 is a constant that controls the penalty on errors. Using the Lagrangian theorem, we can formulate the Wolfe dual problem as

\[
\begin{aligned}
\max_{\alpha_i}\quad & \sum_i \alpha_i\,\phi(x_i)\cdot\phi(x_i) \;-\; \sum_{i,j} \alpha_i\alpha_j\,\phi(x_i)\cdot\phi(x_j) \\
\text{s.t.}\quad & \sum_i \alpha_i = 1 \quad\text{and}\quad 0 \le \alpha_i \le C \quad \forall i,
\end{aligned}
\tag{2.2}
\]

where {α_i}_{i=1,...,N} are the Lagrange multipliers and · denotes the inner product. Solving the dual quadratic programming problem gives the Lagrange multipliers α_i for all i. The center a and the radius R can subsequently be determined by

\[
a = \sum_i \alpha_i\,\phi(x_i)
\qquad\text{and}\qquad
R = \|\phi(x_i) - a\| \quad \forall i \text{ such that } 0 < \alpha_i < C.
\]

In real-world applications, the training data of a class is rarely distributed spherically [27]. To obtain more flexible descriptions of a class, one can first transform the training examples into a high-dimensional feature space using a nonlinear mapping φ and then compute the minimum bounding hypersphere in that feature space. For certain nonlinear mappings, there exists a highly effective method, known as the kernel trick, for computing inner products in the feature space. A key property of SVMs is that the only quantities one needs to compute are inner products of the form φ(x) · φ(y). It is therefore convenient to introduce the so-called kernel function K(x, y) ≡ φ(x) · φ(y). The functional form of the mapping φ(x) does not need to be known, since it is implicitly defined by the choice of kernel function. The power of the kernel trick lies in the fact that, by simply replacing the inner product φ(x) · φ(y) with a kernel function K(x, y), the resulting hypersphere may actually represent a highly complex shape in the original input space, as illustrated in Fig. 1.
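For concreteness, the following minimal sketch solves the SVDD dual (2.2) with an RBF kernel and then recovers the radius from the unbounded support vectors. The toy data, the values of gamma and C, and helper names such as rbf_kernel are illustrative assumptions, not settings from the paper.

```python
# A minimal sketch of the SVDD dual (2.2) with an RBF kernel.
import numpy as np
import cvxpy as cp

def rbf_kernel(X, Y, gamma):
    # K(x, y) = exp(-gamma * ||x - y||^2)
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))                # toy one-class training set
gamma, C = 0.5, 1.0

K = rbf_kernel(X, X, gamma) + 1e-8 * np.eye(len(X))  # jitter keeps K positive definite
L = np.linalg.cholesky(K)                   # so that a^T K a = ||L^T a||^2

alpha = cp.Variable(len(X), nonneg=True)
objective = cp.Maximize(np.diag(K) @ alpha - cp.sum_squares(L.T @ alpha))
constraints = [cp.sum(alpha) == 1, alpha <= C]
cp.Problem(objective, constraints).solve()

a = alpha.value
# Squared distance to the center a = sum_i alpha_i phi(x_i):
#   ||phi(x) - a||^2 = K(x,x) - 2 sum_i alpha_i K(x_i,x) + sum_{i,j} alpha_i alpha_j K(x_i,x_j)
dist2 = np.diag(K) - 2 * K @ a + a @ K @ a
boundary = (a > 1e-6) & (a < C - 1e-6)      # unbounded support vectors lie on the sphere
R = np.sqrt(dist2[boundary].mean())
print("estimated radius R:", R)
```

Only kernel evaluations appear in the sketch, which is exactly the point of the kernel trick: swapping rbf_kernel for any other positive-definite kernel leaves the rest unchanged.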

2.2 Support vector domain description for multi-class problem (M-SVDD)

Inspired by the support vector domain description, spherical-structured SVMs have been proposed to solve multi-class classification problems [14, 29, 30]. Given a set of training data (x_1, y_1), ..., (x_N, y_N), where x_i ∈ R^n and y_i ∈ {1, ..., K} is the class of x_i, the minimum bounding hypersphere of each class is the smallest hypersphere enclosing all the training examples of that class. The minimum bounding hypersphere S_k for each class k, which is characterized by its center a_k and radius R_k, can be found by solving the following constrained quadratic optimization problem:

\[
\begin{aligned}
\min_{R_k,\,a_k,\,\xi_i}\quad & R_k^2 + \frac{C}{N_k}\sum_{i:\,y_i=k}\xi_i && \text{(2.3a)}\\
\text{s.t.}\quad & \|\phi(x_i)-a_k\|^2 \le R_k^2 + \xi_i \quad \forall i \text{ such that } y_i=k, && \text{(2.3b)}\\
& \xi_i \ge 0 \quad \forall i, && \text{(2.3c)}
\end{aligned}
\]

where ξ_i are slack variables, C > 0 is a constant that controls the penalty on errors, and N_k is the number of examples within class k. Using the Lagrange multiplier method, this quadratic programming problem can be formulated as the following Wolfe dual form:

\[
\begin{aligned}
\max_{\alpha_i:\,y_i=k}\quad & \sum_{i:\,y_i=k}\alpha_i K(x_i,x_i) \;-\; \sum_{i,j:\,y_i,y_j=k}\alpha_i\alpha_j K(x_i,x_j) && \text{(2.4a)}\\
\text{s.t.}\quad & \sum_{i:\,y_i=k}\alpha_i = 1, && \text{(2.4b)}\\
& 0 \le \alpha_i \le \frac{C}{N_k} \quad \forall i \text{ such that } y_i=k. && \text{(2.4c)}
\end{aligned}
\]

After all K class-specific hyperspheres are constructed, we can determine the membership degree of a data point x in class k based on the center a_k and radius R_k of the class-specific hypersphere S_k. Using a similarity function sim(x, S_k), we assign x to the class with the largest value of the similarity function:

\[
\text{class of } x \equiv \arg\max_{k=1,\dots,K} \mathrm{sim}(x, S_k).
\]
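The decision rule is therefore an argmax over per-class similarity scores. The paper compares several similarity functions in Sect. 3.2; the sketch below assumes one simple choice, sim(x, S_k) = R_k − ||φ(x) − a_k||, purely for illustration, and the helper names are hypothetical.

```python
# Sketch of the M-SVDD decision rule: assign x to the class whose hypersphere
# "likes" it most.  The similarity R_k - ||phi(x) - a_k|| is an assumed choice.
import numpy as np

def rbf_kernel(X, Y, gamma):
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * d2)

def distance_to_center(x, X_k, alpha_k, gamma):
    # ||phi(x) - a_k||, with a_k = sum_i alpha_i phi(x_i) over class k
    Kxx = 1.0                                   # RBF kernel: K(x, x) = 1
    Kx = rbf_kernel(x[None, :], X_k, gamma)[0]
    d2 = Kxx - 2 * alpha_k @ Kx + alpha_k @ rbf_kernel(X_k, X_k, gamma) @ alpha_k
    return np.sqrt(max(d2, 0.0))

def classify(x, spheres, gamma):
    # spheres: list of (X_k, alpha_k, R_k), one per class, from the dual (2.4)
    sims = [R_k - distance_to_center(x, X_k, a_k, gamma)
            for (X_k, a_k, R_k) in spheres]
    return int(np.argmax(sims))                 # class with the largest similarity
```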

Imbalanced data distribution is a familiar phenomenon in classification problems. For instance, Fig. 2 shows two classes of examples generated from Gaussian distributions with different variances and numbers of data points. In some iterative, update-based learning machines (such as back-propagation neural networks), the final decision function is biased toward the class with the smaller number of data points, because the class with the larger number of data points contributes more back-propagation update iterations. The hyperplane-based SVM classifier does not suffer from this bias. Hyperplane-based SVMs construct a hyperplane that leaves the largest possible fraction of points of the same class on the same side while maximizing the distance of either class from the hyperplane; this distance is called the margin, as shown in Fig. 2. However, the hyperplane-based SVM classifier only considers information about the boundaries of each class, not information about the distribution, for instance the mean and variance, of each class. According to the Bayesian decision rule, the optimal decision function is the one minimizing the error probability [11], and its location is slightly shifted toward the class with the smaller variance. In contrast to the hyperplane-based SVM, the hypersphere-based SVM takes the distribution of each class into account. Therefore, by taking into consideration the spherical center (mean) and the spherical radius (variance) in the feature space, the hypersphere-based SVM more closely resembles the optimal Bayesian classifier.

Fig. 2 A comparison of hyperplane-based SVMs and hypersphere-based SVMs in decision boundary

2.3 Support vector domain description with negative examples for multi-class problem (M-SVDD-NEG)

Given that the minimum bounding hypersphere of each class is constructed without considering the distribution of training examples of the other classes, it is not immediately clear whether or not an effective classifier can be built from these class-specific minimum bounding hyperspheres. Tax et al. proposed a new algorithm that incorporates the negative examples in the training phase to improve the description [25]. In contrast with the positive examples (examples in the kth class), which should lie within the class-specific hypersphere S_k, the negative examples (all other examples) should lie outside it. The minimum bounding hypersphere S_k for each class k can be found by solving the following constrained quadratic optimization problem:

\[
\begin{aligned}
\min_{R_k,\,a_k,\,\xi_i,\,\xi_l}\quad & R_k^2 + \frac{C}{N_k}\sum_{i:\,y_i=k}\xi_i + \frac{C}{N_{\bar k}}\sum_{l:\,y_l\ne k}\xi_l && \text{(2.5a)}\\
\text{s.t.}\quad & \|\phi(x_i)-a_k\|^2 \le R_k^2 + \xi_i \quad \forall i \text{ such that } y_i=k, && \text{(2.5b)}\\
& \|\phi(x_l)-a_k\|^2 \ge R_k^2 - \xi_l \quad \forall l \text{ such that } y_l\ne k, && \text{(2.5c)}\\
& \xi_i \ge 0,\ \xi_l \ge 0 \quad \forall i, l, && \text{(2.5d)}
\end{aligned}
\]

where ξ_i and ξ_l are slack variables, and N_k and N_{\bar k} are the number of examples within the kth class and the rest of the classes, respectively. Using the Lagrange multiplier method, this quadratic programming problem can be formulated as the following Wolfe dual form:

\[
\begin{aligned}
\max\quad & \sum_{i:\,y_i=k}\alpha_i K(x_i,x_i) - \sum_{l:\,y_l\ne k}\alpha_l K(x_l,x_l)
 - \sum_{i,j:\,y_i,y_j=k}\alpha_i\alpha_j K(x_i,x_j) \\
 & \quad + 2\sum_{i:\,y_i=k,\ l:\,y_l\ne k}\alpha_i\alpha_l K(x_i,x_l)
 - \sum_{l,m:\,y_l,y_m\ne k}\alpha_l\alpha_m K(x_l,x_m) && \text{(2.6a)}\\
\text{s.t.}\quad & \sum_{i:\,y_i=k}\alpha_i - \sum_{l:\,y_l\ne k}\alpha_l = 1, && \text{(2.6b)}\\
& 0 \le \alpha_i \le \frac{C}{N_k} \quad \forall i \text{ such that } y_i=k, && \text{(2.6c)}\\
& 0 \le \alpha_l \le \frac{C}{N_{\bar k}} \quad \forall l \text{ such that } y_l\ne k. && \text{(2.6d)}
\end{aligned}
\]

Solving the dual quadratic programming problem gives the Lagrange multipliers α_i and α_l. The center a_k and the radius R_k of the class-specific hypersphere S_k can subsequently be determined by

\[
a_k = \sum_{i:\,y_i=k}\alpha_i\,\phi(x_i) - \sum_{l:\,y_l\ne k}\alpha_l\,\phi(x_l),
\tag{2.7}
\]

\[
R_k = \|\phi(x_i) - a_k\| \quad \forall i \text{ such that } y_i=k \text{ and } 0 < \alpha_i < \frac{C}{N_k}.
\tag{2.8}
\]
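Once the multipliers of (2.6) are known, evaluating (2.7) and (2.8) reduces to kernel evaluations with signed coefficients. A minimal numpy sketch of this step, assuming the dual solution alpha has already been obtained from some QP solver and using hypothetical helper names:

```python
# Evaluating (2.7)-(2.8): the (implicit) center and the radius of the k-th
# hypersphere from a dual solution of (2.6).  `alpha` and `K` are assumed given.
import numpy as np

def radius_from_dual(K, alpha, pos_mask, C, N_k, tol=1e-6):
    """K: kernel matrix over all training points; alpha: multipliers of (2.6);
    pos_mask: boolean array, True where y_i == k."""
    s = np.where(pos_mask, 1.0, -1.0)           # expansion signs in (2.7)
    beta = s * alpha                            # a_k = sum_j beta_j phi(x_j)
    # ||phi(x_i) - a_k||^2 = K_ii - 2 (K beta)_i + beta^T K beta
    d2 = np.diag(K) - 2 * K @ beta + beta @ K @ beta
    # (2.8): positive examples with 0 < alpha_i < C/N_k lie exactly on the sphere
    sv = pos_mask & (alpha > tol) & (alpha < C / N_k - tol)
    R_k = np.sqrt(np.maximum(d2[sv], 0.0).mean())
    return R_k, d2
```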

3 Methodology

Inspired by the maximal-margin hyperplane-based SVM [8, 26] and the support vector domain description (SVDD) [24, 25], Wang et al. [27] first incorporated the concept of maximal margin into the hypersphere-based SVM for the two-class classification problem via a single sphere. In this section, we propose a modification of Wang's approach, called the maximal-margin spherical-structured multi-class support vector machine (MSM-SVM). The MSM-SVM finds several class-specific hyperspheres, each of which encloses all examples from one class but excludes all examples from the remaining classes. In addition, each hypersphere separates the positive examples from the negative examples with maximal margin.

3.1 The quadratic programming problem

Given a set of training data (x_1, y_1), ..., (x_N, y_N), where y_i ∈ {1, ..., K} is the class of x_i, we first map the training points into a high-dimensional feature space via a nonlinear transform φ, and then find K class-specific hyperspheres with minimal radius in the feature space such that the kth hypersphere encloses all data points within the kth class (positive examples) but excludes all examples from the remaining classes (negative examples). In addition, according to the concept of maximal margin, the negative examples should lie far away from the kth hypersphere. As illustrated in Fig. 3, the margin is defined as the distance from the nearest negative example to the boundary of the hypersphere. Maximizing the margin pushes the negative examples far away from that hypersphere. By introducing a new margin factor d_k, the MSM-SVM is derived to separate the positive examples from the negative examples with maximal margin. Mathematically, the class-specific hypersphere S_k for each class k, which is characterized by its center a_k and radius R_k, can be found by solving the following constrained quadratic optimization problem (QP3.1):

\[
\begin{aligned}
\min_{R_k,\,a_k,\,d_k,\,\xi_i,\,\xi_l}\quad & R_k^2 - M d_k^2 + \frac{C}{N_k}\sum_{i:\,y_i=k}\xi_i + \frac{C}{N_{\bar k}}\sum_{l:\,y_l\ne k}\xi_l && \text{(3.1a)}\\
\text{s.t.}\quad & \|\phi(x_i)-a_k\|^2 \le R_k^2 + \xi_i \quad \forall i \text{ such that } y_i=k, && \text{(3.1b)}\\
& \|\phi(x_l)-a_k\|^2 \ge R_k^2 + d_k^2 - \xi_l \quad \forall l \text{ such that } y_l\ne k, && \text{(3.1c)}\\
& \xi_i \ge 0,\ \xi_l \ge 0 \quad \forall i, l, && \text{(3.1d)}
\end{aligned}
\]

where N_k and N_{\bar k} are the number of examples within the kth class and the rest of the classes, respectively. The positive examples are enumerated by the indices i, j and the negative examples by l, m; ξ_i and ξ_l are slack variables, and the parameter C ≥ 0 controls the penalty on errors. Figure 3 gives a geometrical point of view explaining the relationship between the margin factor d_k and the width of the margin. As shown in Fig. 3, the positive examples lie within distance $R_k$ of the center while the negative examples lie at distance at least $\sqrt{R_k^2 + d_k^2}$, so the width of the margin in our approach is $\sqrt{R_k^2 + d_k^2} - R_k$.
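For illustration, one class-specific MSM-SVM subproblem can be solved in dual form with an off-the-shelf convex solver. The sketch below is built on an assumption rather than on the paper's own derivation: following the same Lagrangian steps used for (2.5) and (2.6), the Wolfe dual of QP3.1 is taken to have the objective and box constraints of (2.6a)-(2.6d) together with the equalities Σ_{i:y_i=k} α_i − Σ_{l:y_l≠k} α_l = 1 (stationarity in R_k) and Σ_{l:y_l≠k} α_l = M (stationarity in d_k). All data, parameter values, and helper names are illustrative.

```python
# A hedged sketch of one class-specific MSM-SVM dual subproblem (see the
# assumptions in the surrounding text).
import numpy as np
import cvxpy as cp

def rbf_kernel(X, Y, gamma):
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * d2)

def fit_sphere_k(X, y, k, C=2.0, M=0.5, gamma=0.5):
    pos = np.where(y == k)[0]
    neg = np.where(y != k)[0]
    s = np.where(y == k, 1.0, -1.0)             # expansion signs, as in (2.7)
    K = rbf_kernel(X, X, gamma) + 1e-8 * np.eye(len(X))
    Ls = np.linalg.cholesky((s[:, None] * s[None, :]) * K)  # signed kernel, PD

    alpha = cp.Variable(len(X), nonneg=True)
    # assumed dual objective: sum_j s_j alpha_j K_jj - alpha^T (s s^T * K) alpha
    objective = cp.Maximize((s * np.diag(K)) @ alpha - cp.sum_squares(Ls.T @ alpha))
    constraints = [cp.sum(alpha[pos]) - cp.sum(alpha[neg]) == 1,
                   cp.sum(alpha[neg]) == M,     # feasible here only when C >= 1 + M
                   alpha[pos] <= C / len(pos),
                   alpha[neg] <= C / len(neg)]
    cp.Problem(objective, constraints).solve()
    return alpha.value, s
```

Given such a solution, R_k and d_k can be read off from unbounded positive and negative support vectors in the same way as (2.8), and the decision rule of Sect. 2.2 applies unchanged.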


we choose the following datasets: vehicle and german. Furthermore, we choose face and imox from [17] and [4], respectively. In this experiment, we compare our approach with the hyperplane-based multi-class SVM approaches, including one-against-all, one-against-one, and DAG-SVM. Besides, we compare our approach with the hypersphere-based multi-class SVMs, including M-SVDD and M-SVDD-NEG. For the hypersphere-based SVMs, we evaluate the classification performance using the different similarity functions mentioned in Sect. 3.2.

The most important criterion for evaluating the performance of these methods is their accuracy rate. However, it is unfair to use only one parameter set for comparing these methods. In practice, for the SVM the best parameters are first obtained by performing model-parameter selection. For each problem, we estimate the generalized accuracy using different kernel parameters $\gamma = [2^{2}, 2^{1}, 2^{0}, \dots, 2^{-15}]$ and regularization parameters $C = [2^{15}, 2^{14}, 2^{13}, \dots, 2^{-4}]$. Therefore, for each problem we try 18 × 20 combinations [12]. We apply the five-fold cross-validation method to select the model parameters. Namely, for each problem we partition the available examples into five disjoint subsets (called 'folds') of approximately equal size. The classifier is trained on all the subsets except one, and the validation error is measured by testing it on the subset left out. This procedure is repeated for a total of five trials, each time using a different subset for validation. The performance of the model is assessed by averaging the validation error over all the trials for this problem. According to the cross-validation rate, we try to infer the proper values of the model parameters.
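This model-selection protocol, a base-2 grid over (C, γ) scored by five-fold cross-validation, can be reproduced with standard tooling. The sketch below uses scikit-learn's RBF-kernel SVC purely as a stand-in classifier and the iris data as a placeholder dataset; the grid endpoints follow the text above.

```python
# Sketch of the (C, gamma) grid search with five-fold cross-validation.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)            # placeholder multi-class dataset

param_grid = {
    "gamma": [2.0**e for e in range(2, -16, -1)],   # 2^2 ... 2^-15 (18 values)
    "C":     [2.0**e for e in range(15, -5, -1)],   # 2^15 ... 2^-4 (20 values)
}

search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)  # five-fold CV
search.fit(X, y)
print("best (C, gamma):", search.best_params_)
print("cross-validation accuracy:", search.best_score_)
```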

Table 2 presents the results of comparing these methods. We present the optimal parameters (C, γ) and the corresponding cross-validation rate. Note that (a)-(e) denote the five similarity functions mentioned in Sect. 3.2, respectively. In addition, we denote by $\hat C$, $\hat\gamma$, $\hat M$ the logarithm (to base 2) of the optimal model parameters C, γ, M, respectively. It can be seen that the optimal model parameters fall in various ranges for different problems, so it is critical to perform the model-parameter selection task. The previous hypersphere-based SVM classifiers, M-SVDD and M-SVDD-NEG, give worse results than the standard hyperplane-based SVM classifiers on most of the datasets. However, with our proposed algorithm, which incorporates the concept of maximal margin, the classification performance of the resulting hypersphere-based classifiers improves significantly and is better than that of the standard hyperplane-based SVM classifiers on most of the datasets tested. In addition, Wu's and the proposed Chiang's similarity functions achieve better accuracy rates than the other similarity functions.

6 Conclusions

The solution of the binary classification problem using the SVM has been well developed. For multi-class classification problems, two types of multi-class SVMs have been proposed: one is the hyperplane-based SVM, while the other is the hypersphere-based SVM. Wang et al. [27] first incorporated the concept of maximal margin into the hypersphere-based SVM for the two-class classification problem via a single sphere, by adjusting the ratio of the radius of the sphere to the separation margin. In this paper, we extend Wang's approach to multi-class problems and propose a maximal-margin spherical-structured multi-class support vector machine (MSM-SVM). The proposed MSM-SVM approach finds several class-specific hyperspheres, each of which encloses all positive examples but excludes all negative examples. Besides, each hypersphere separates the positive examples from the negative examples with maximal margin. The proposed MSM-SVM has the advantage of using the parameters M and C to control the number of support vectors. With M and C limiting the maximum number of outlier support vectors (OSVs), as well as the minimum number of total support vectors (SVs), the selection of (M, C) is more intuitive. We propose a new fuzzy similarity function and give an experimental comparison of the similarity functions that have been proposed in previous spherical-structured SVMs. Experimental results show that the proposed method performs fairly well on both artificial and benchmark datasets.

Acknowledgements The authors would like to thank the anonymous reviewers for their constructive comments and suggestions. This work was partially supported by the National Science Council, Taiwan, under grant NSC-95-2221-E-151-037.

References

1. Blake CL, Merz CJ (1998) UCI repository of machine learning databases. Dept Inform Comput Sci, Univ California, Irvine, CA (online). Available: http://kdd.ics.uci.edu/
2. Bottou L, Cortes C, Denker J, Drucker H, Guyon I, Jackel L, LeCun Y, Muller U, Sackinger E, Simard P, Vapnik V (1994) Comparison of classifier methods: a case study in handwriting digit recognition. In: Proceedings of the international conference on pattern recognition, pp 77-87
3. Chang CC, Lin CJ (2001) LIBSVM: a library for support vector machines (online). Available: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
4. Chen CC (2005) Computational mathematics. Univ Tsing Hua, Institute of Information Systems & Applications. Data available at http://www.cs.nthu.edu.tw/~cchen/ISA5305/isa5305.html
5. Chiang J-H, Hao P-Y (2003) A new kernel-based fuzzy clustering approach: support vector clustering with cell growing. IEEE Trans Fuzzy Syst 11:518-527
6. Cooper PW (1962) The hypersphere in pattern recognition. Inf Control 5:324-346
7. Cooper PW (1966) Note on adaptive hypersphere decision boundary. IEEE Trans Electron Comput 948-949
8. Cortes C, Vapnik V (1995) Support-vector network. Mach Learn 20:273-297
9. Crammer K, Singer Y (2000) On the learnability and design of output codes for multiclass problems. In: Computational learning theory, pp 35-46
10. Fan RE, Chen PH, Lin CJ (2005) Working set selection using second order information for training support vector machines. J Mach Learn Res 6:1889-1918
11. Fukunaga K (1990) Introduction to statistical pattern recognition, 2nd edn. Academic Press, New York
12. Hsu CW, Lin CJ (2002) A comparison of methods for multiclass support vector machines. IEEE Trans Neural Netw 13:415-425
13. Kreßel U (1999) Pairwise classification and support vector machines. In: Schölkopf B, Burges CJC, Smola AJ (eds) Advances in kernel methods: support vector learning. MIT Press, Cambridge, pp 255-268
14. Manevitz LM, Yousef M (2001) One-class SVMs for document classification. J Mach Learn Res 2:139-154
15. Marchand M, Shawe-Taylor J (2002) The set covering machine. J Mach Learn Res 3:723-746
16. Michie D, Spiegelhalter DJ, Taylor CC (1994) Machine learning, neural and statistical classification. Ellis Horwood, Chichester. Available: http://www.maths.leeds.ac.uk/~charles/statlog/
17. Mitchell T (1997) Machine learning. McGraw-Hill, New York. Data available at http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mitchell/ftp/faces.html
18. Mukherjee S, Vapnik V (1999) Multivariate density estimation: a support vector machine approach. Technical Report: AI Memo No 1653, MIT AI Lab
19. Platt JC (1999) Fast training of support vector machines using sequential minimal optimization. In: Schölkopf B, Burges CJC, Smola AJ (eds) Advances in kernel methods: support vector learning. MIT Press, Cambridge, pp 185-208
20. Platt JC, Cristianini N, Shawe-Taylor J (2000) Large margin DAGs for multiclass classification. In: Advances in neural information processing systems, vol 12. MIT Press, Cambridge, pp 547-553
21. Reilly DL, Cooper LN, Elbaum C (1982) A neural model for category learning. Biol Cybern 45:35-41
22. Schölkopf B, Burges C, Vapnik V (1995) Extracting support data for a given task. In: Proceedings of the first international conference on knowledge discovery and data mining, pp 252-257
23. Schölkopf B, Platt JC, Shawe-Taylor J, Smola AJ, Williamson RC (2001) Estimating the support of a high-dimensional distribution. Neural Comput 13:1443-1471
24. Tax D, Duin R (1999) Support vector domain description. Pattern Recognit Lett 20(11-13):1191-1199
25. Tax D, Duin R (2004) Support vector data description. Mach Learn 54:45-66
26. Vapnik V (1998) Statistical learning theory. Wiley, New York
27. Wang J, Neskovic P, Cooper LN (2005) Pattern classification via single spheres. In: Lecture notes in artificial intelligence, vol 3735, pp 241-252
28. Weston J, Watkins C (1999) Multi-class support vector machines. In: Verleysen M (ed) Proceedings of ESANN99, Brussels
29. Wu Q, Shen X, Li Y, Xu G, Yan W, Dong G, Yang Q (2005) Classifying the multiplicity of the EEG source models using sphere-shaped support vector machines. IEEE Trans Magn 41:1912-1915
30. Zhu ML, Chen SF, Liu XD (2003) Sphere-structured support vector machines for multi-class pattern recognition. In: Lecture notes in computer science, vol 2639, pp 589-593
