
Fuzzy one-class support vector machines

Pei-Yi Hao

Department of Information Management, National Kaohsiung University of Applied Sciences, 415 Chien Kung Road, Kaohsiung 807, Taiwan, ROC

Received 19 June 2006; received in revised form 13 July 2007; accepted 10 January 2008. Available online 25 January 2008.

Abstract

In one-class classification, the problem is to distinguish one class of data from the rest of the feature space. This is important in many applications where one of the classes is characterized well, while no measurements are available for the other class. Schölkopf et al. first introduced a method of adapting the support vector machine (SVM) methodology to the one-class classification problem, called the one-class SVM. In this paper, we incorporate the concept of fuzzy set theory into the one-class SVM. We assign a fuzzy membership to each input point and reformulate the one-class SVM such that different input points can make different contributions to the learning of the decision surface. In addition, the parameters to be identified in the proposed fuzzy one-class SVM, such as the components of the weight vector and the bias term, are fuzzy numbers. This integration preserves the benefits of SVM learning theory and fuzzy set theory, where SVM learning theory characterizes the properties of learning machines that enable them to generalize effectively to unseen data, and fuzzy set theory may be very useful for finding a fuzzy structure in an evaluation system.

© 2008 Elsevier B.V. All rights reserved.

Keywords: Support vector machines (SVMs); One-class classification; One-class SVMs; Fuzzy system models

1. Introduction

In modeling some systems where the available information is uncertain, we must deal with a fuzzy structure of the system considered. This structure is represented as a fuzzy function whose parameters are given by fuzzy sets. The fuzzy functions are defined by Zadeh's extension principle [7,8,21,31,32]. Basically, the fuzzy function provides an effective means of capturing the approximate, inexact nature of the real world. Fuzzy theory appears very useful when the processes are too complex for analysis by conventional quantitative techniques or when the available sources of information are interpreted qualitatively, inexactly, or uncertainly.

Support vector machines (SVMs) were developed at AT&T Bell Laboratories by Vapnik and co-workers [4,11]. The SVM is based on the idea of structural risk minimization, which shows that the generalization error is bounded by the sum of the training error and a term depending on the Vapnik–Chervonenkis (VC) dimension. By minimizing this bound, high generalization performance can be achieved. Due to this industrial context, SVM research is up to date and has a sound orientation toward real-world applications. In many applications, the SVM has been shown to provide higher performance than traditional learning machines and has been introduced as a powerful tool for solving classification problems.


A comprehensive tutorial on the SVM classifier has been published by Burges [1]. Since the SVM has been very successful in pattern recognition problems, Lin et al. [16] first introduced the use of fuzzy set theory for SVM classification problems, whereas Chiang et al. applied SVM theory to fuzzy rule-based modeling [5] and clustering [6].

However, those studies are based on training with both positive and negative examples, as the basic SVM paradigm suggests. We are interested, however, in classification using only positive examples for training. This is important in many applications [18,27]. Consider, for example, trying to classify Internet sites for a web surfer when the only information available is the history of the user's activities. One can envisage identifying typical positive examples by such tracking, but it would be hard to identify representative negative examples. Another example arises when we want to monitor a machine. A classifier should detect when the machine is showing abnormal, faulty behavior. Measurements of the normal operation of the machine are easy to obtain. In faulty situations, on the other hand, the machine might be destroyed completely.

In one-class classification, one class of data has to be distinguished from the rest of the feature space. In this type of classification problem, one of the classes is characterized well, while for the other class (almost) no measurements are available. It is assumed that we have examples from just one of the classes, called the target class, and that all other possible objects, by definition the outlier objects, are uniformly distributed around the target class. This one-class classification problem is often solved by estimating the target density [20], or by fitting a model to the data, e.g. the support vector classifier [29]. Schölkopf et al. [23] suggested a method of adapting the SVM methodology to the one-class classification problem, called the one-class SVM. An alternative method uses a hypersphere around the target set instead of a hyperplane; this method, called the support vector data description (SVDD), was developed by Tax and Duin [26,27].

In this paper, we incorporate the concept of fuzzy set theory into the one-class SVM model proposed by Schölkopf et al. [23]. Unlike the one-class SVM, the proposed fuzzy one-class SVM treats the training data points with different importance in the training process. Namely, the fuzzy one-class SVM fuzzifies the penalty term of the cost function to be minimized and reformulates the constrained optimization problem. Hence, the central concept of the proposed fuzzy one-class SVM is to assign each data point a membership value according to its relative importance in the class. Besides, the parameters to be identified in the fuzzy one-class SVM model, such as the components of the weight vector and the bias term, are fuzzy numbers. In other words, we construct a fuzzy hyperplane in the feature space to distinguish the target class from the rest. Moreover, the decision function of the proposed fuzzy one-class SVM is based on a fuzzy partial ordering relation. This integrates the benefits of the one-class SVM and fuzzy set theory, where VC theory characterizes the properties of learning machines that enable them to generalize well to unseen data, whereas fuzzy set theory might be very useful for finding a fuzzy structure in an evaluation system.

This is important in many applications. Consider, for example, trying to predict unknown protein–protein interactions using computational methods [19]. All currently existing data sets of protein–protein interactions contain positive examples only. Besides, in view of the large fraction of false positive interactions generated by high-throughput methods, the data sets of protein–protein interactions are usually incomplete, incorrect, and noisy [14,30]. We would like to have a learning machine such that the interactions generated by reliable methods (small-scale experiments) are given more weighting than the interactions generated by high-throughput experiments.

The rest of this paper is organized as follows. A brief review of the theory of the one-class SVM is given in Section 2. The fuzzy one-class SVM is derived in Section 3. Experiments are presented in Section 4, and some concluding remarks are given in Section 5. Details of the derivation are relegated to the Appendix.

2. One-class SVM

Suppose you are given some data set drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S equals some a priori specified value between 0 and 1. Schölkopf et al. [23] propose a method to approach this problem by trying to estimate a function f that is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data.

Consider the training data $x_1, \ldots, x_N \in \mathbb{R}^n$, where $N$ is the number of observations. Let $\Phi$ be a feature map $\mathbb{R}^n \rightarrow F$, i.e., a map into a dot product space $F$ such that the dot product in the image of $\Phi$ can be computed by evaluating some simple kernel [28,1,22], such as the Gaussian kernel

$$k(x, y) = e^{-q\|x - y\|^2},$$

where $\cdot$ denotes the inner product. Schölkopf et al. suggested a method of adapting the SVM methodology to the one-class classification problem. Essentially, after transforming the training data via a feature map, they treat the origin as the only member of the second class. Then, using "relaxation parameters", they separate the image of the target class from the origin with maximum margin. The one-class SVM algorithm returns a function $f$ that takes the value $+1$ in a "small" region capturing most of the data points, and $-1$ elsewhere. For a new point $x$, the value $f(x)$ is determined by evaluating which side of the hyperplane it falls on in the feature space.

To separate the data set from the origin, they solve the following quadratic program:

$$\min_{w, b, \xi_i} \quad \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}\xi_i + b \qquad \text{subject to} \quad w \cdot \Phi(x_i) + b \geq 0 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \ldots, N. \tag{1}$$

If $w$ and $b$ solve this problem, then the decision function

$$f(x) = \operatorname{sign}(w \cdot \Phi(x) + b)$$

will be positive for most examples $x_i$ in the training set. Using the Lagrangian theorem, we can formulate the dual problem as

$$\min_{\alpha_i} \quad \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j k(x_i, x_j) \qquad \text{subject to} \quad 0 \leq \alpha_i \leq C, \quad i = 1, \ldots, N, \quad \sum_i \alpha_i = 1, \tag{2}$$

where $\alpha_i$ are the nonnegative Lagrange multipliers.

A larger C assigns a higher penalty to errors and thus reduces the number of misclassified data points. On the contrary, a smaller C ignores more plausibly misclassified data points and thus gives wider margins. Whether the value of C is large or small, this parameter is fixed during the training process of the one-class SVM. Namely, all training data points are treated equally during the training of the one-class SVM. This leads to a higher sensitivity to some special cases, such as outliers and noise. In addition, the decision function f(x) of the one-class SVM takes a value of either +1 or −1, i.e., each data point is dichotomized into two groups: members and nonmembers. A sharp, unambiguous distinction exists between the members and nonmembers of the set. However, when modeling a system where human estimation is influential, we perceive this set as having an imprecise boundary. The decision function should be generalized such that the values assigned to the elements fall within a specified range and indicate the membership grade of these elements in the set in question.
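The dual (2) is a small box-constrained quadratic program with a single equality constraint, so any generic QP routine can solve it. The following is a minimal sketch of that step, assuming a Gaussian kernel and illustrative values of q and C; it is not the author's implementation.

```python
import numpy as np
from scipy.optimize import minimize

def gaussian_kernel(X, Y, q=8.0):
    # k(x, y) = exp(-q * ||x - y||^2)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-q * d2)

def one_class_svm_dual(X, C=0.4, q=8.0):
    """Solve dual (2): minimize 0.5 a'Ka  s.t.  0 <= a_i <= C and sum(a) = 1."""
    N = X.shape[0]
    K = gaussian_kernel(X, X, q)
    objective = lambda a: 0.5 * a @ K @ a
    constraints = [{"type": "eq", "fun": lambda a: a.sum() - 1.0}]
    bounds = [(0.0, C)] * N
    a0 = np.full(N, 1.0 / N)          # feasible start (requires C >= 1/N)
    return minimize(objective, a0, bounds=bounds, constraints=constraints).x
```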

3. Fuzzy one-class SVM

In many real-world applications, the effects of the training points are different. It is often seen that some training points are more important than others in a classification problem. We would require that the meaningful training points be classified correctly, and we would not care whether some training points, such as noise, are misclassified.

That is, each training point no longer exactly belongs to the target class. It may belong 90% to the target class and be 10% meaningless, or it may belong 20% to the target class and be 80% meaningless. In other words, there is a fuzzy membership $0 < s_i \leq 1$ associated with each training point $x_i$. This fuzzy membership $s_i$ can be regarded as the attitude of the corresponding training point toward the target class in the classification problem, and the value $(1 - s_i)$ can be regarded as the attitude of meaninglessness [16]. As mentioned above, equal treatment of all data points may cause unsuitable overfitting in one-class SVMs. Hence, the central concept of the proposed fuzzy one-class SVM is to assign each data point a membership value according to its relative importance in the class and to treat the training data points with different importance in the training process.

Another application is sequential learning and inference, which is important in many applications involving real-time signal processing [9]. For example, we would like to have a learning machine such that points from the recent past are given more weighting than points far back in the past. For this purpose, we can select the fuzzy membership as a function of the time at which each point was generated, and this kind of problem can be easily implemented.
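For illustration, one possible (purely hypothetical) choice is an exponential decay of the membership with the age of each point; the decay rate below is an assumption, not a value from the paper.

```python
import numpy as np

def time_decay_memberships(timestamps, decay=0.1):
    """s_i = exp(-decay * age_i): the most recent point gets membership 1."""
    t = np.asarray(timestamps, dtype=float)
    return np.exp(-decay * (t.max() - t))

# Example: five points observed at times 0..4; the oldest gets the smallest s_i.
# time_decay_memberships([0, 1, 2, 3, 4]) -> approx. [0.67, 0.74, 0.82, 0.90, 1.0]
```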

Now, we consider the following two models, M1 and M2:

$$\text{M1:} \quad f(x) = w \cdot x + b, \quad w \in \mathbb{R}^n, \ b \in \mathbb{R},$$

$$\text{M2:} \quad f(x) = W \cdot x + B, \quad W \in T(\mathbb{R})^n, \ B \in T(\mathbb{R}).$$

In model M1 we construct a crisp hyperplane that separates the image of the target class from the origin, and we associate a fuzzy membership with each data point so that different data points can have different effects on the learning of the separating hyperplane. We can treat noise or outliers as less important and give these points lower fuzzy memberships. In model M2 we construct a fuzzy hyperplane to distinguish the target class from the rest. For computational simplicity, we assume that the fuzzy parameters to be identified in model M2 are symmetric triangular fuzzy numbers, where $T(\mathbb{R})$ denotes the space of all symmetric triangular fuzzy numbers. We now consider how to obtain solutions for models M1 and M2.

3.1. Model M1

Suppose we are given a set of training points with associated fuzzy memberships $(x_1, s_1), \ldots, (x_N, s_N)$. Each training point $x_i \in \mathbb{R}^n$ is given a fuzzy membership $0 < s_i \leq 1$. Since the fuzzy membership $s_i$ is the attitude of the corresponding point $x_i$ toward the target class and the parameter $\xi_i$ is a measure of error in the one-class SVM, the term $s_i \xi_i$ is a measure of error with different weighting [16]. The constrained optimization problem of the fuzzy one-class SVM is then formulated as

$$\min_{w, b, \xi_i} \quad \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N} s_i \xi_i + b \qquad \text{subject to} \quad w \cdot \Phi(x_i) + b \geq 0 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \ldots, N. \tag{3}$$

We can find the solution of the optimization problem given by Eq. (3) in dual variables by finding the saddle point of the Lagrangian:

$$L = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N} s_i \xi_i + b - \sum_{i=1}^{N} \alpha_i \left(w \cdot \Phi(x_i) + b + \xi_i\right) - \sum_{i=1}^{N} \beta_i \xi_i, \tag{4}$$

where $\alpha_i$ and $\beta_i$ are the nonnegative Lagrange multipliers. Differentiating $L$ with respect to $w$, $b$ and $\xi_i$ and setting the results to zero, we obtain

$$\partial L / \partial w = w - \sum_{i=1}^{N} \alpha_i \Phi(x_i) = 0 \ \Rightarrow \ w = \sum_{i=1}^{N} \alpha_i \Phi(x_i), \tag{5}$$

$$\partial L / \partial b = 1 - \sum_{i=1}^{N} \alpha_i = 0 \ \Rightarrow \ \sum_{i=1}^{N} \alpha_i = 1, \tag{6}$$

$$\partial L / \partial \xi_i = C s_i - \alpha_i - \beta_i = 0 \ \Rightarrow \ \beta_i = C s_i - \alpha_i \ \text{and} \ \alpha_i \leq C s_i. \tag{7}$$


Table 1
The numerical example

No.   1             2            3             4             5
x_i   [0.25 0.5]^t  [0.5 0.5]^t  [0.75 0.5]^t  [0.5 0.25]^t  [0.5 0.75]^t
s_i   0.2           0.9          0.9           0.9           0.9

Fig. 1. The decision boundary obtained by the original one-class SVM for the numerical example.

Substituting Eqs. (5)–(7) into Eq. (4), the Lagrangian becomes a function of $\alpha$ only. The dual problem becomes

$$\min_{\alpha_i} \quad \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j k(x_i, x_j) \qquad \text{subject to} \quad 0 \leq \alpha_i \leq C s_i, \quad i = 1, \ldots, N, \quad \sum_i \alpha_i = 1. \tag{8}$$

Note that the only difference from the original one-class SVM is the upper bound $C s_i$ on the Lagrange multiplier $\alpha_i$ corresponding to each training point $x_i$.
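Since only the upper bounds change, the dual (8) can be solved with the same generic QP sketch as before; the snippet below reuses the gaussian_kernel function from the earlier sketch and assumes the memberships s_i are given.

```python
import numpy as np
from scipy.optimize import minimize

def fuzzy_one_class_svm_dual(X, s, C=0.4, q=8.0):
    """Solve the model M1 dual (8): only the bounds 0 <= a_i <= C*s_i differ from (2)."""
    N = X.shape[0]
    K = gaussian_kernel(X, X, q)                  # kernel sketch defined earlier
    objective = lambda a: 0.5 * a @ K @ a
    constraints = [{"type": "eq", "fun": lambda a: a.sum() - 1.0}]
    bounds = [(0.0, C * s[i]) for i in range(N)]  # per-point upper bound C*s_i
    a0 = C * s / (C * s).sum()                    # feasible when sum(C*s) >= 1
    return minimize(objective, a0, bounds=bounds, constraints=constraints).x
```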

Toy Example 1. We first consider the simple toy data set shown in Table 1. The proposed fuzzy one-class SVM model M1 is applied to this data set from an illustrative point of view. Figs. 1 and 2 illustrate the results of the original one-class SVM and the proposed model M1, respectively. In this example, we use the Gaussian kernel with parameters $q = 8$ and $C = 0.4$. The optimal choice of parameters $C$ and $q$ for the original one-class SVM was tuned using a grid search.


Fig. 2. The decision boundary obtained by the proposed fuzzy one-class SVM model M1 for the numerical example.

The original one-class SVM encloses all data points, while model M1 rejects data point No. 1 since its attitude toward the target class is 0.2. Data point No. 1 may belong 20% to the target class and be 80% meaningless.
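As a hypothetical check, the earlier sketch can be applied to the Table 1 data. Recovering b from an unbounded support vector (0 < α_i < C s_i, so the constraint in (3) is active with ξ_i = 0) is our own reading of the standard KKT argument, not code from the paper.

```python
X = np.array([[0.25, 0.5], [0.5, 0.5], [0.75, 0.5], [0.5, 0.25], [0.5, 0.75]])
s = np.array([0.2, 0.9, 0.9, 0.9, 0.9])
C, q = 0.4, 8.0

alpha = fuzzy_one_class_svm_dual(X, s, C=C, q=q)
K = gaussian_kernel(X, X, q)

# For 0 < alpha_i < C*s_i the constraint in (3) is active with xi_i = 0,
# hence b = -w . Phi(x_i) = -sum_j alpha_j k(x_j, x_i).
i = int(np.argmax((alpha > 1e-6) & (alpha < C * s - 1e-6)))
b = -(alpha @ K[:, i])

f = lambda x: np.sign(alpha @ gaussian_kernel(X, np.atleast_2d(x), q).ravel() + b)
print(f([0.25, 0.5]), f([0.5, 0.5]))   # point No. 1 vs. point No. 2
```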

3.2. Model M2

Now, we incorporate the concept of fuzzy set theory into the one-class SVM model. We construct a "fuzzy" hyperplane and separate the image of the target class from the fuzzy origin. The parameters to be identified, such as the components of the weight vector and the bias term, are fuzzy numbers. The fuzzy parameters studied in this work are restricted to a class of "triangular" membership functions. To do this, we need some preliminaries.

Preliminary 1 (Klir and Yuan [15]). For any fuzzy numbers $A$, $B$ and $\lambda \in (0, 1]$, let $A_\lambda = [a_1, a_2]$ and $B_\lambda = [b_1, b_2]$ denote the $\lambda$-cuts of $A$ and $B$, respectively. If we define the partial ordering of closed intervals in the usual way, that is,

$$[a_1, a_2] \succeq [b_1, b_2] \quad \text{iff} \quad a_1 \geq b_1 \ \text{and} \ a_2 \geq b_2,$$

then for any fuzzy numbers $A$, $B$ we have

$$A \succeq_f B \quad \text{iff} \quad A_\lambda \succeq B_\lambda \quad \text{for all } \lambda \in (0, 1],$$

where "$\succeq_f$" denotes the fuzzy "larger than".

Let $X = (m, c)$ be a symmetric triangular fuzzy number, where $m$ is the center and $c$ is the width. From Preliminary 1, for any two symmetric triangular fuzzy numbers $A = (m_A, c_A)$ and $B = (m_B, c_B)$ in $T(\mathbb{R})$, we have $A \succeq_f B$ iff $m_A + c_A \geq m_B + c_B$ and $m_A - c_A \geq m_B - c_B$.


Moreover, the components of the weight vector and the bias term used in the fuzzy hyperplane are symmetric triangular fuzzy numbers. We write the fuzzy weight vector as $W = (\mathbf{w}, \mathbf{c})$, where each component $W_i = (w_i, c_i)$ is a fuzzy number; in vector form, $\mathbf{w} = [w_1, \ldots, w_n]^t$ and $\mathbf{c} = [c_1, \ldots, c_n]^t$, so that $W$ means "approximately $\mathbf{w}$", described by the center $\mathbf{w}$ and the width $\mathbf{c}$. Similarly, $B = (b, d)$ is the fuzzy bias term, which means "approximately $b$", described by the center $b$ and the width $d$.

Preliminary 2 (Tanaka et al. [25]). The fuzzy hyperplane

$$Y = W_1 x_1 + \cdots + W_n x_n + B = W \cdot x + B$$

is defined by the following membership function:

$$\mu_Y(y) = \begin{cases} 1 - \dfrac{|y - (\mathbf{w} \cdot x + b)|}{\mathbf{c} \cdot |x| + d}, & x \neq 0, \\ 1, & x = 0, \ y = 0, \\ 0, & x = 0, \ y \neq 0, \end{cases}$$

where $\mu_Y(y) = 0$ when $\mathbf{c} \cdot |x| + d \leq |y - (\mathbf{w} \cdot x + b)|$.
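For concreteness, the membership function of Preliminary 2 can be evaluated directly from the centers and widths; the sketch below assumes plain NumPy vectors for w and c and is only an illustration of the formula, not code from the paper.

```python
import numpy as np

def fuzzy_hyperplane_membership(y, x, w, c, b, d):
    """mu_Y(y) for the fuzzy hyperplane Y = W.x + B with W = (w, c), B = (b, d)."""
    x = np.asarray(x, dtype=float)
    if not np.any(x):                      # x = 0: membership 1 only at y = 0
        return 1.0 if y == 0 else 0.0
    spread = c @ np.abs(x) + d             # c . |x| + d
    value = 1.0 - abs(y - (w @ x + b)) / spread
    return max(value, 0.0)                 # mu_Y(y) = 0 once the deviation exceeds the spread
```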

Our fuzzy one-class SVM task here is therefore to

$$\min_{\mathbf{w}, b, \mathbf{c}, d, \xi_i} \quad J = \frac{1}{2}\|\mathbf{w}\|^2 + b + M\left(\frac{1}{2}\|\mathbf{c}\|^2 + d\right) + C\sum_{i=1}^{N} s_i \xi_i \qquad \text{subject to} \quad W \cdot x_i + B \succeq_f \Theta - \xi_i \quad \text{for all } i = 1, \ldots, N, \tag{9}$$

where $\Theta$ denotes the "fuzzy origin", which is also a triangular fuzzy number with center zero and width $O_w$. $\|\mathbf{w}\|^2$ is the term which characterizes the model complexity; the minimization of $\|\mathbf{w}\|^2$ can be understood in the context of regularization operators [24]. $\frac{1}{2}\|\mathbf{c}\|^2 + d$ is the term which characterizes the vagueness of the model: more vagueness in the fuzzy one-class SVM model means more inexactness in the result, and $M$ is a trade-off parameter chosen by the decision-maker. The $\{\xi_i\}_{i=1,\ldots,N}$ are slack variables that measure the amount of violation of the constraints for each point, and $C$ is a fixed penalty parameter chosen by the user. The fuzzy membership $s_i$ is the attitude of the corresponding point $x_i$ toward the target class.

More specifically, from the above preliminaries, our problem is to find the fuzzy weight vector $W = (\mathbf{w}, \mathbf{c})$ and fuzzy bias term $B = (b, d)$ that solve the following quadratic programming problem:

$$\min_{\mathbf{w}, \mathbf{c}, b, d, \xi_{1i}, \xi_{2i}} \quad J = \frac{1}{2}\|\mathbf{w}\|^2 + b + M\left(\frac{1}{2}\|\mathbf{c}\|^2 + d\right) + C\sum_{i=1}^{N} s_i(\xi_{1i} + \xi_{2i}) \tag{10.1}$$

$$\text{subject to} \quad (\mathbf{w} \cdot x_i + b) + (\mathbf{c} \cdot |x_i| + d) \geq 0 + O_w - \xi_{1i}, \tag{10.2}$$

$$(\mathbf{w} \cdot x_i + b) - (\mathbf{c} \cdot |x_i| + d) \geq 0 - O_w - \xi_{2i}, \quad d \geq 0, \quad \xi_{1i}, \xi_{2i} \geq 0 \quad \text{for } i = 1, \ldots, N. \tag{10.3}$$

In the Appendix we show that the fuzzy weight vector $W^* = (\mathbf{w}^*, \mathbf{c}^*)$ that determines the optimal fuzzy hyperplane can be written as a linear combination of training vectors:

$$\mathbf{w}^* = \sum_{i=1}^{N} (\alpha_{1i} + \alpha_{2i})\, x_i \quad \text{and} \quad \mathbf{c}^* = \frac{1}{M}\sum_{i=1}^{N} (\alpha_{1i} - \alpha_{2i})\, |x_i|, \tag{11}$$

where $\alpha_{1i}, \alpha_{2i} \geq 0$. Since $\alpha_{1i}, \alpha_{2i} > 0$ only for support vectors, this expression represents a compact form of writing $W^*$. We also show that to find $\alpha_{1i}, \alpha_{2i}$, one has to solve the following quadratic programming problem:


$$\max_{\alpha_{1i}, \alpha_{2i}} \quad -\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}(\alpha_{1i}+\alpha_{2i})(\alpha_{1j}+\alpha_{2j})\, x_i \cdot x_j - \frac{1}{2M}\sum_{i=1}^{N}\sum_{j=1}^{N}(\alpha_{1i}-\alpha_{2i})(\alpha_{1j}-\alpha_{2j})\, |x_i| \cdot |x_j| + \sum_{i=1}^{N}(\alpha_{1i}-\alpha_{2i})\, O_w \tag{12.1}$$

$$\text{subject to} \quad \sum_{i=1}^{N}(\alpha_{1i}+\alpha_{2i}) = 1, \tag{12.2}$$

$$\sum_{i=1}^{N}(\alpha_{1i}-\alpha_{2i}) \leq M, \tag{12.3}$$

$$0 \leq \alpha_{1i} \leq C s_i, \quad 0 \leq \alpha_{2i} \leq C s_i, \quad i = 1, \ldots, N. \tag{12.4}$$

The fuzzy bias term $B^* = (b, d)$ can be determined from the Karush–Kuhn–Tucker (KKT) conditions (see Appendix). Hence, for some $i, j$ such that $\alpha_{1i} \in (0, C s_i)$ and $\alpha_{2j} \in (0, C s_j)$, $b$ and $d$ can be computed as

$$b = -\tfrac{1}{2}\left(\mathbf{w} \cdot x_i + \mathbf{w} \cdot x_j + \mathbf{c} \cdot |x_i| - \mathbf{c} \cdot |x_j|\right), \tag{13}$$

$$d = -\tfrac{1}{2}\left(\mathbf{w} \cdot x_i - \mathbf{w} \cdot x_j + \mathbf{c} \cdot |x_i| + \mathbf{c} \cdot |x_j| - 2 O_w\right). \tag{14}$$

The fuzzy hyperplane is defined by the following membership function:

$$\mu_{Y_i}(y) = 1 - \frac{\left|y - \left(\sum_{k=1}^{N}(\alpha_{1k}+\alpha_{2k})\, x_i \cdot x_k + b\right)\right|}{\left(\frac{1}{M}\sum_{k=1}^{N}(\alpha_{1k}-\alpha_{2k})\, |x_i| \cdot |x_k|\right) + d}. \tag{15}$$

For any $x_i$, $Y_i = W^* \cdot x_i + B^*$ is a symmetric triangular fuzzy number with center $\mathbf{w} \cdot x_i + b$ and width $\mathbf{c} \cdot |x_i| + d$.

The fuzzy origin, $\Theta$, is also a symmetric triangular fuzzy number, with center zero and width $O_w$. For a new point $x$, we evaluate which side of the hyperplane it falls on by defining the following fuzzy partial ordering relation. For any two symmetric triangular fuzzy numbers $A = (m_A, c_A)$ and $B = (m_B, c_B)$ in $T(\mathbb{R})$, the degree to which $A$ is larger than $B$ (i.e., $A$ falls on the right side of $B$) is defined by the following membership function:

$$R_B(A) = R(A, B) = \begin{cases} 1 & \text{if } \zeta > 0 \text{ and } \eta > 0, \\ 0 & \text{if } \zeta < 0 \text{ and } \eta < 0, \\ 0.5\left(1 + \dfrac{\zeta + \eta}{\max(|\zeta|, |\eta|)}\right) & \text{otherwise}, \end{cases} \tag{16}$$

where $\zeta = (m_A + c_A) - (m_B + c_B)$ and $\eta = (m_A - c_A) - (m_B - c_B)$. Notice that $R_B(A) = 0.5$ if $m_A = m_B$, $R_B(A) < 0.5$ if $m_A < m_B$, and $R_B(A) > 0.5$ if $m_A > m_B$. The decision function of the proposed model M2 is

$$f(x) = R_\Theta(W^* \cdot x + B^*) = R(W^* \cdot x + B^*, \Theta).$$

This decision function takes a value within a specified range that indicates the membership grade with which the new point $x$ belongs to the target class. A vague, fuzzy boundary exists between the members and nonmembers of the set.
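A direct implementation of the ordering degree (16) might look as follows, under the assumption that the degenerate case max(|ζ|, |η|) = 0 (two identical fuzzy numbers) should return 0.5; the M2 decision value is then obtained by comparing W*·x + B* with the fuzzy origin Θ = (0, O_w).

```python
def fuzzy_greater(mA, cA, mB, cB):
    """Degree R(A, B) that A = (mA, cA) is larger than B = (mB, cB), Eq. (16)."""
    zeta = (mA + cA) - (mB + cB)
    eta = (mA - cA) - (mB - cB)
    if zeta > 0 and eta > 0:
        return 1.0
    if zeta < 0 and eta < 0:
        return 0.0
    denom = max(abs(zeta), abs(eta))
    return 0.5 if denom == 0 else 0.5 * (1.0 + (zeta + eta) / denom)

# Linear-case decision value of model M2 for a new point x:
# f(x) = fuzzy_greater(w @ x + b, c @ abs(x) + d, 0.0, Ow)
```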

3.3. Extension to the nonlinear case

To extend model M2 to nonlinear one-class classification, we use the idea of SVMs for crisp nonlinear classification [1,24,28,29]. The basic idea is simply to map the input patterns $x_i$ by $\Phi : \mathbb{R}^n \rightarrow F$ into a higher-dimensional feature space $F$. Note that the only way in which the data appear in the algorithm for the model is in the form of the inner products $x_i \cdot x_j$ and $|x_i| \cdot |x_j|$. The algorithm therefore depends on the data only through inner products in $F$, i.e., on functions


of the form $\Phi(x_i) \cdot \Phi(x_j)$ and $\Phi(|x_i|) \cdot \Phi(|x_j|)$. Hence it suffices to know and use $k(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j)$ and $k(|x_i|, |x_j|) = \Phi(|x_i|) \cdot \Phi(|x_j|)$ instead of $\Phi(\cdot)$ explicitly [12]. By replacing $x_i \cdot x_j$ and $|x_i| \cdot |x_j|$ with $k(x_i, x_j)$ and $k(|x_i|, |x_j|)$, respectively, we obtain the dual quadratic optimization problem given by Eq. (17). Note that the constraints are not changed.

$$\max_{\alpha_{1i}, \alpha_{2i}} \quad -\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}(\alpha_{1i}+\alpha_{2i})(\alpha_{1j}+\alpha_{2j})\, k(x_i, x_j) - \frac{1}{2M}\sum_{i=1}^{N}\sum_{j=1}^{N}(\alpha_{1i}-\alpha_{2i})(\alpha_{1j}-\alpha_{2j})\, k(|x_i|, |x_j|) + \sum_{i=1}^{N}(\alpha_{1i}-\alpha_{2i})\, O_w \tag{17.1}$$

$$\text{subject to} \quad \sum_{i=1}^{N}(\alpha_{1i}+\alpha_{2i}) = 1, \tag{17.2}$$

$$\sum_{i=1}^{N}(\alpha_{1i}-\alpha_{2i}) \leq M, \tag{17.3}$$

$$0 \leq \alpha_{1i} \leq C s_i, \quad 0 \leq \alpha_{2i} \leq C s_i, \quad i = 1, \ldots, N. \tag{17.4}$$

The fuzzy hyperplane is defined by the following membership function:

$$\mu_{Y_i}(y) = 1 - \frac{\left|y - \left(\sum_{k=1}^{N}(\alpha_{1k}+\alpha_{2k})\, k(x_i, x_k) + b\right)\right|}{\left(\frac{1}{M}\sum_{k=1}^{N}(\alpha_{1k}-\alpha_{2k})\, k(|x_i|, |x_k|)\right) + d}. \tag{18}$$

Considering the quadratic programming problem given by Eq. (17), the linear term in the objective function,

$$\sum_{i=1}^{N}(\alpha_{1i}-\alpha_{2i})\, O_w,$$

will increase the values of $\alpha_{1i}$, whereas the values of $\alpha_{2i}$ will be decreased. Hence, the vector

$$\mathbf{c} = \frac{1}{M}\sum_{i=1}^{N}(\alpha_{1i}-\alpha_{2i})\, |x_i|$$

is nonnegative in all our experiments. In case $\mathbf{c}$ is negative, we can take $\mathbf{c} = \max(\mathbf{c}, 0)$, since each $c_i$ should be nonnegative. From constraints (17.2) and (17.3), it is easy to see that

$$\sum_i \alpha_{2i} \geq \frac{1 - M}{2} \quad \text{and} \quad \sum_i \alpha_{1i} \leq \frac{1 + M}{2}.$$

Hence, to guarantee the existence of some $j$ such that $\alpha_{2j} > 0$, we suggest that $M$ should be set between 0 and 1. As for the vagueness of the fuzzy hyperplane,

$$\frac{1}{M}\sum_{k=1}^{N}(\alpha_{1k}-\alpha_{2k})\, k(|x|, |x_k|) + d,$$

the first term is always in the range 0–1 (using Eq. (17.3) and the characteristics of the Gaussian kernel), while the second term, $d$, is minimized in the primal optimization problem. Hence, we suggest that the width of the fuzzy origin, $O_w$, should be set between 0 and 1.
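Putting the pieces together, the kernelized dual (17) and the recovery of (b, d) via Eqs. (13)–(14) can be prototyped with a generic solver. This is a minimal sketch under several assumptions: it reuses the gaussian_kernel sketch from Section 2, the memberships s_i and all parameter values are illustrative, SLSQP is an arbitrary solver choice, and the selection of unbounded support vectors follows our reading of the KKT conditions rather than the author's code.

```python
import numpy as np
from scipy.optimize import minimize

def fuzzy_one_class_svm_M2(X, s, C=0.4, q=8.0, M=0.5, Ow=0.5):
    """Solve the kernel dual (17) and recover (b, d) from Eqs. (13)-(14)."""
    N = X.shape[0]
    K = gaussian_kernel(X, X, q)                        # k(x_i, x_j)
    Kabs = gaussian_kernel(np.abs(X), np.abs(X), q)     # k(|x_i|, |x_j|)

    def neg_dual(z):                                    # negative of (17.1), minimized
        a1, a2 = z[:N], z[N:]
        p, m = a1 + a2, a1 - a2
        return 0.5 * p @ K @ p + (0.5 / M) * m @ Kabs @ m - Ow * m.sum()

    constraints = [
        {"type": "eq", "fun": lambda z: z.sum() - 1.0},                      # (17.2)
        {"type": "ineq", "fun": lambda z: M - (z[:N].sum() - z[N:].sum())},  # (17.3)
    ]
    bounds = [(0.0, C * s[i]) for i in range(N)] * 2                         # (17.4)
    z0 = np.concatenate([C * s, C * s])
    z0 = z0 / z0.sum()                                  # feasible start if sum(C*s) >= 0.5
    z = minimize(neg_dual, z0, bounds=bounds, constraints=constraints,
                 method="SLSQP").x
    a1, a2 = z[:N], z[N:]

    wx = K @ (a1 + a2)                                  # w . Phi(x_i) for all i
    cx = Kabs @ (a1 - a2) / M                           # c . Phi(|x_i|) for all i
    i = int(np.argmax((a1 > 1e-6) & (a1 < C * s - 1e-6)))   # unbounded alpha_1i
    j = int(np.argmax((a2 > 1e-6) & (a2 < C * s - 1e-6)))   # unbounded alpha_2j
    b = -0.5 * (wx[i] + wx[j] + cx[i] - cx[j])          # Eq. (13)
    d = -0.5 * (wx[i] - wx[j] + cx[i] + cx[j] - 2 * Ow)  # Eq. (14)
    return a1, a2, b, d
```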

3.4. The upper bound on the number of errors

In this section we give a theoretical analysis explaining the characteristics of the parameters $C$ and $M$ in model M2. The KKT conditions (see Appendix) lead to several useful conclusions. The training points for which $\alpha_{1i}$ (or $\alpha_{2i}$) $> 0$ are termed support vectors, since only those points determine the final fuzzy hyperplane among all training points.


References

[1] C.J.C. Burges, A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery 2 (2) (1998) 955–974.
[2] O. Chapelle, V. Vapnik, O. Bousquet, S. Mukherjee, Choosing multiple parameters for support vector machines, Mach. Learning 46 (2002) 131–159.
[3] J.-H. Chiang, P. Gader, Recognition of handprinted numerals in VISA card application forms, Mach. Vision Appl. 10 (1997) 144–149.
[4] J.-H. Chiang, P.-Y. Hao, A new kernel-based fuzzy clustering approach: support vector clustering with cell growing, IEEE Trans. Fuzzy Systems 11 (4) (2003) 518–527.
[5] J.-H. Chiang, P.-Y. Hao, Support vector learning mechanism for fuzzy rule-based modeling: a new approach, IEEE Trans. Fuzzy Systems 12 (1) (2004) 1–12.
[6] C. Cortes, V.N. Vapnik, Support vector network, Mach. Learning 20 (1995) 1–25.
[7] D. Dubois, H. Prade, Operations on fuzzy numbers, Internat. J. Systems Sci. 9 (1978) 613–626.
[8] D. Dubois, H. Prade, Fuzzy Sets and Systems: Theory and Applications, Academic Press, New York, 1980.
[9] N. de Freitas, M. Milo, P. Clarkson, M. Niranjan, A. Gee, Sequential support vector machines, in: Proc. IEEE NNSP'99, 1999, pp. 31–40.
[10] P. Gader, M. Mohamed, J.-H. Chiang, Handwritten word recognition with character and inter-character neural networks, IEEE Trans. Systems Man Cybernet. 27 (1) (1997) 158–164.
[11] I. Guyon, B. Boser, V. Vapnik, Automatic capacity tuning of very large VC-dimension classifiers, Adv. Neural Inform. Process. Systems 5 (1993) 147–155.
[12] D.H. Hong, C. Hwang, Support vector fuzzy regression machines, Fuzzy Sets and Systems 138 (2003) 271–281.
[13] J.J. Hull, A database for handwritten text recognition research, IEEE Trans. Pattern Anal. Mach. Intelligence 16 (1994) 550–554.
[14] R. Jansen, H. Yu, D. Greenbaum, Y. Kluger, N.J. Krogan, S. Chung, A. Emili, M. Snyder, J.F. Greenblatt, M. Gerstein, A Bayesian networks approach for predicting protein–protein interactions from genomic data, Science 302 (2003) 449–453.
[15] G.J. Klir, B. Yuan, Fuzzy Sets and Fuzzy Logic: Theory and Applications, Prentice-Hall, New Jersey, 1995.
[16] C.-F. Lin, S.-D. Wang, Fuzzy support vector machines, IEEE Trans. Neural Networks 13 (2) (2002) 464–471.
[17] C.-F. Lin, S.-D. Wang, Training algorithms for fuzzy support vector machines with noisy data, Pattern Recognition Lett. 25 (14) (2004) 1647–1656.
[18] L.M. Manevitz, M. Yousef, One-class SVMs for document classification, J. Machine Learning Res. 2 (2001) 139–154.
[19] E.M. Marcotte, M. Pellegrini, H.L. Ng, D.W. Rice, T.O. Yeates, D. Eisenberg, Detecting protein function and protein–protein interactions from genome sequences, Science 285 (1999) 751–753.
[20] M.R. Moya, M.W. Koch, L.D. Hostetler, One-class classifier networks for target recognition applications, in: Proc. World Congress on Neural Networks, International Neural Network Society, Portland, OR, 1993, pp. 797–801.
[21] C.V. Negoita, D.A. Ralescu, Application of Fuzzy Sets to Systems Analysis, Birkhauser, Basel, 1975, pp. 12–24.
[22] B. Schölkopf, C.J.C. Burges, A.J. Smola, Advances in Kernel Methods—Support Vector Learning, MIT Press, Cambridge, MA, 1999.
[23] B. Schölkopf, J.C. Platt, J. Shawe-Taylor, A.J. Smola, R.C. Williamson, Estimating the support of a high-dimensional distribution, Neural Comput. 13 (2001) 1443–1471.
[24] A.J. Smola, B. Schölkopf, K.-R. Müller, The connection between regularization operators and support vector kernels, Neural Networks 11 (1998) 637–649.
[25] H. Tanaka, S. Uejima, K. Asai, Linear regression analysis with fuzzy model, IEEE Trans. Systems Man Cybernet. 12 (6) (1982) 903–907.
[26] D.M.J. Tax, R.P.W. Duin, Support vector domain description, Pattern Recognition Lett. 20 (1999) 1191–1199.
[27] D.M.J. Tax, R.P.W. Duin, Support vector data description, Mach. Learning 54 (2004) 45–66.
[28] V.N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, 1995.
[29] V.N. Vapnik, Statistical Learning Theory, Wiley, New York, 1998.
[30] C. von Mering, R. Krause, B. Snel, M. Cornell, S.G. Oliver, S. Fields, P. Bork, Comparative assessment of large-scale data sets of protein–protein interactions 31 (2002) 399–403.
[31] R.R. Yager, On solving fuzzy mathematical relationships, Inform. and Control 41 (1979) 29–55.
