
International Journal of Fuzzy Systems, Vol. 9, No. 1, March 2007 45

© 2007 TFSA

A Fuzzy Model of Support Vector Regression Machine

Pei-Yi Hao and Jung-Hsien Chiang

Abstract


Fuzziness must be considered in systems where the available information is uncertain. A model of such a vague phenomenon can be represented as a fuzzy system equation, described by fuzzy functions defined via Zadeh's extension principle. In this paper, we incorporate the concept of fuzzy set theory into the support vector machine (SVM). This integration preserves the benefits of both the SVM regression model and the fuzzy regression model: SVM learning theory characterizes the properties of learning machines that enable them to generalize well, while fuzzy set theory provides an effective means of capturing the approximate, inexact nature of the real world.

Keywords: Support Vector Machines (SVMs), Support Vector Regression, Fuzzy Regression.

1. Introduction

In modeling systems where the available information is uncertain, we must deal with the fuzzy structure of the system under consideration. This structure is represented as a fuzzy function whose parameters are given by fuzzy sets. Fuzzy functions are defined by Zadeh's extension principle [5], [13], [14]. Taking a fuzzy function as a model of the fuzzy structure of such a system, a fuzzy regression analysis can be formulated [10], [11]. The fuzzy parameters of the resulting fuzzy model represent a possibility distribution which corresponds to the fuzziness of the system. This fuzzy regression model can be very useful for finding a fuzzy structure in an evaluation system.

Support Vector Machines (SVMs), developed at AT&T Bell Laboratories by Vapnik and co-workers [3], [12], have been very successful in pattern classification and function estimation problems for crisp data. The SVM is based on the idea of structural risk minimization, which shows that the generalization error is bounded by the

Corresponding Author: J.-H. Chiang is with the Department of Computer Science and Information Engineering, National Cheng Kung University, Taiwan, R.O.C.

E-mail: jchiang@mail.ncku.edu.tw

Manuscript received 20 Jun. 2006; revised 11 Oct. 2006; accepted 12 Mar. 2007.

sum of the training error and a term depending on the Vapnik–Chervonenkis (VC) dimension. By minimizing this bound, high generalization performance can be achieved. A comprehensive tutorial on the SVM classifier has been published by Burges [1]. Excellent performance has also been obtained in regression and time-series prediction applications [4]. Lin et al. [8] and Huang et al. [7] first introduced fuzzy set theory into SVM classification problems, while Chiang et al. [2] applied SVM theory to fuzzy rule-based modeling.

In this paper, we incorporate the concept of fuzzy set theory into the SVM regression model. The parameters to be identified in the SVM regression model, such as the components of the weight vector and the bias term, are set to be fuzzy numbers. In addition, the desired outputs in the training samples are also fuzzy numbers. Incorporating fuzzy set theory into SVM regression preserves the benefits of both SVM regression and fuzzy regression: VC theory characterizes the properties of learning machines that enable them to generalize well to unseen data, while the fuzzy model captures inexact information. The resulting fuzzy SVM regression model can be very useful for finding a fuzzy structure in an evaluation system where the available information is inexact.

2. SVM Regression Model

Suppose we are given a training data set {(x_1, y_1), ..., (x_N, y_N)} ⊂ ℵ × R, where ℵ denotes the space of input patterns, for instance R^k. In ε-SVM regression [4], [12], the goal is to find a function f(x) that has at most ε deviation from the actually obtained targets y_i for all the training data. In other words, we do not care about errors as long as they are less than ε, but we will not accept any deviation larger than ε. The ε-insensitive loss function

|ξ|_ε := 0 if |ξ| ≤ ε,  and  |ξ|_ε := |ξ| − ε otherwise,   (1)

is used, so that an error is penalized only if it falls outside the ε-tube. Figure 1 depicts this situation graphically. The SVM regression can be made nonlinear simply by mapping the training patterns x_i

by a nonlinear transform Φ: ℵ → F into some high-dimensional feature space F. A simple example of the


nonlinear transform Φ is the polynomial transform function:

Fig. 1. The ε-insensitive loss setting for a linear SV regression machine.

Φ(x) = Φ(x_1, x_2) = (x_1³, x_2³, √3 x_1²x_2, √3 x_1x_2², √6 x_1x_2, √3 x_1², √3 x_2², √3 x_1, √3 x_2)   (2)

where the degree of the polynomial transform function is 3, and the dimensions of the input space ℵ and the feature space F are 2 and 9, respectively. A best-fitting function

f(x) = w·Φ(x) + b   (3)

is estimated in the feature space F, where "·" denotes the dot product in F. To avoid overfitting in the very high-dimensional feature space, one should add a capacity control term, which in the SVM case turns out to be ‖w‖². Formally, the SVM regression model can be written as the convex optimization problem:

minimize (over w, b, ξ_i, ξ_i*):   (1/2)‖w‖² + C Σ_{i=1..N} (ξ_i + ξ_i*)
subject to:   y_i − (w·Φ(x_i) + b) ≤ ε + ξ_i,
              (w·Φ(x_i) + b) − y_i ≤ ε + ξ_i*,
              ξ_i, ξ_i* ≥ 0,  for all i.   (4)

The constant C > 0 determines the trade-off between the complexity of f(x) and the amount up to which deviations larger than ε are tolerated. What makes SVM regression attractive is that we can estimate a linear function in the feature space which is a nonlinear function in the original space, as shown in Fig. 2. Moreover, the regression task is achieved by solving a convex program with linear constraints; in other words, it has a unique solution.
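The feature map in (2) can be sanity-checked numerically: its dot products reproduce the inhomogeneous cubic kernel, Φ(x)·Φ(y) = (1 + x·y)³ − 1. A minimal sketch (the helper names `phi3` and `dot` are ours, not from the paper):

```python
import math

def phi3(x1, x2):
    """Degree-3 polynomial feature map of Eq. (2): R^2 -> R^9."""
    s3, s6 = math.sqrt(3.0), math.sqrt(6.0)
    return [x1**3, x2**3,
            s3 * x1**2 * x2, s3 * x1 * x2**2,
            s6 * x1 * x2,
            s3 * x1**2, s3 * x2**2,
            s3 * x1, s3 * x2]

def dot(u, v):
    """Dot product in the feature space F."""
    return sum(a * b for a, b in zip(u, v))
```

Because the dot product in F collapses to a closed-form kernel of the inputs, the 9-dimensional map never has to be evaluated explicitly, which is the kernel trick discussed in Section 6.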

However, the size of the ε-tube is a pre-defined constant, and the selection of the parameter ε may seriously affect the modeling performance. Moreover, the ε-insensitive zone in the SVM regression model has a tube (or slab) shape: all training data points are treated equally during the training of the SVM regression model, and are penalized only if they fall outside the ε-tube. In many real-world applications, the training points do not all have the same effect. We would require that precise training points be regressed correctly, while allowing larger errors on imprecise training points.
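Eliminating the slack variables from problem (4) gives the equivalent unconstrained objective (1/2)‖w‖² + C Σ_i |y_i − f(x_i)|_ε, with |·|_ε the loss from Eq. (1). The following sketch solves the linear one-dimensional case by plain subgradient descent; it is a stand-in for a proper QP solver, and all function names are ours:

```python
def eps_loss(xi, eps):
    """Epsilon-insensitive loss of Eq. (1)."""
    return max(0.0, abs(xi) - eps)

def fit_linear_svr(xs, ys, C=10.0, eps=0.1, iters=4000):
    """Linear eps-SVR: subgradient descent on
    0.5*w**2 + C * sum(eps_loss(y - (w*x + b), eps))."""
    w = b = 0.0
    w_sum = b_sum = 0.0
    for t in range(iters):
        gw, gb = w, 0.0              # gradient of the 0.5*w**2 term
        for x, y in zip(xs, ys):
            e = (w * x + b) - y      # signed residual
            if abs(e) > eps:         # only points outside the eps-tube contribute
                s = 1.0 if e > 0 else -1.0
                gw += C * s * x
                gb += C * s
        lr = 0.01 / (1.0 + 0.01 * t)
        w -= lr * gw
        b -= lr * gb
        if t >= iters // 2:          # average late iterates for stability
            w_sum += w
            b_sum += b
    n = iters - iters // 2
    return w_sum / n, b_sum / n
```

On noiseless data y = 2x over [0, 1] with ε = 0.1, the minimizer is not the exact slope 2 but the flattest line still inside the ε-tube (slope 1.8, intercept 0.1), illustrating the capacity-control effect of the (1/2)‖w‖² term.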

Fig. 2. The ε-insensitive loss setting for a nonlinear SV regression machine.

3. Fuzzy SVM Regression

In many real-world applications, the available information is often uncertain, imprecise, and incomplete, and is thus usually represented by fuzzy sets or by a generalization of interval data. For handling such fuzzy data, fuzzy regression analysis is an important tool and has been successfully applied in a variety of applications. In this section, we apply fuzzy set theory to the SVM regression model. The fuzzified parameters make the SVM regression more elastic.

First, we deal with fuzzy desired outputs in the regression task. The given output data, denoted by Ỹ_i = (y_i, e_i), are symmetric triangular fuzzy numbers, where y_i is the center and e_i is the width. The membership function of Ỹ_i is given by

μ_Ỹi(y) = 1 − |y − y_i| / e_i.   (5)

Then, to handle these fuzzified training data, the components of the weight vector and the bias term used in the SVM regression model are also set to be fuzzy numbers. Let W = {w, c} denote the fuzzy weight vector and B = {b, d} the fuzzy bias term. Each component of W is a fuzzy number; in vector form, w = [w_1, ..., w_n]^t and c = [c_1, ..., c_n]^t, so that W means "approximately w", described by the center vector w and the width vector c. Similarly, B = {b, d} means "approximately b", described by the center b and the width d. The fuzzy parameters studied in this work are restricted to the class of "triangular" membership functions. The fuzzy function is

Y = W_1Φ(x)_1 + ... + W_nΦ(x)_n + B = W·Φ(x) + B,   (6)
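Because each coefficient in (6) is a symmetric triangular fuzzy number and each Φ(x)_j is crisp, the output Y is again symmetric triangular, with center w·Φ(x) + b and width c·|Φ(x)| + d (widths scale by absolute values under the extension principle). A small sketch of Eqs. (5)–(6) under that standard fuzzy-arithmetic rule (class and function names are ours):

```python
class TriFuzzy:
    """Symmetric triangular fuzzy number (center, width), as in Eq. (5)."""
    def __init__(self, center, width):
        self.center, self.width = center, width

    def membership(self, y):
        """mu(y) = 1 - |y - center| / width, clipped at 0 (Eq. (5))."""
        return max(0.0, 1.0 - abs(y - self.center) / self.width)

def fuzzy_output(phi_x, w, c, b, d):
    """Fuzzy linear function of Eq. (6): Y = W.phi(x) + B, where
    W = {w, c} and B = {b, d}; returns a symmetric triangular number."""
    center = sum(wj * pj for wj, pj in zip(w, phi_x)) + b
    width = sum(cj * abs(pj) for cj, pj in zip(c, phi_x)) + d
    return TriFuzzy(center, width)
```

The membership degree is 1 at the center and falls linearly to 0 at center ± width, matching Eq. (5).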


6. Conclusions

The difference between the SVM regression model and the fuzzy SVM regression model is that the SVM regression model seeks a linear function that has at most ε deviation from the actually obtained targets y_i for all the training data, whereas the fuzzy SVM regression model seeks a fuzzy linear function, with fuzzy parameters, that has at least H fitting degree with respect to the fuzzy desired targets Ỹ_i for all the training data.
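This excerpt does not restate the definition of the H fitting degree; one common formalization, in the style of Tanaka's fuzzy regression [10], requires the H-level interval of each observed fuzzy target Ỹ_i to be contained in the H-level interval of the estimated fuzzy output. A sketch under that assumption, for symmetric triangular numbers (function name is ours):

```python
def fits_at_level(y_center, y_width, est_center, est_width, H):
    """Tanaka-style inclusion test: the H-level interval of the observed
    fuzzy target must lie inside the H-level interval of the estimated
    fuzzy output (both symmetric triangular fuzzy numbers)."""
    half_obs = (1.0 - H) * y_width    # half-length of the observed H-cut
    half_est = (1.0 - H) * est_width  # half-length of the estimated H-cut
    return (est_center - half_est <= y_center - half_obs and
            y_center + half_obs <= est_center + half_est)
```

An estimate whose H-cut is wide enough to cover the target's H-cut fits at degree H; a narrower one does not.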

Another important feature of the SVM algorithm is that the primal optimization problem can be reformulated as the Wolfe dual problem by using the Lagrange multiplier method. The benefit of transforming to the dual problem is that, by using the kernel trick (K(x, y) ≡ Φ(x)·Φ(y)), the SVM can deal with feature spaces of very high (possibly infinite) dimension. In our approach, we do not reformulate the primal fuzzy SVM regression problem as a dual problem, because of the definitions of the operations on fuzzy numbers used here. This limits the dimensionality of the feature space in the proposed fuzzy SVM regression model to be finite. Hence, we only use polynomial regression models. However, almost all fuzzy regression approaches suggest polynomials as regression models, since any function can be represented by a polynomial approximation. How to transform the primal fuzzy SVM regression problem into the dual problem will be our future work.
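For the crisp problem (4), the Wolfe dual alluded to above takes the standard form; with kernel K(x_i, x_j) = Φ(x_i)·Φ(x_j) it reads:

```latex
\max_{\alpha,\,\alpha^{*}}\;
 -\tfrac{1}{2}\sum_{i,j=1}^{N}(\alpha_i-\alpha_i^{*})(\alpha_j-\alpha_j^{*})K(\mathbf{x}_i,\mathbf{x}_j)
 -\varepsilon\sum_{i=1}^{N}(\alpha_i+\alpha_i^{*})
 +\sum_{i=1}^{N}y_i(\alpha_i-\alpha_i^{*})
\quad\text{s.t.}\quad
\sum_{i=1}^{N}(\alpha_i-\alpha_i^{*})=0,\;
0\le\alpha_i,\alpha_i^{*}\le C .
```

Here w = Σ_i (α_i − α_i*) Φ(x_i), so f(x) = Σ_i (α_i − α_i*) K(x_i, x) + b depends on the data only through kernel evaluations, which is why the dual can handle feature spaces of arbitrary dimension.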

7. References

[1] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121-167, 1998.

[2] J.-H. Chiang and P.-Y. Hao, "Support vector learning mechanism for fuzzy rule-based modeling: a new approach," IEEE Trans. on Fuzzy Systems, vol. 12, no. 1, pp. 1-12, 2004.

[3] C. Cortes and V. N. Vapnik, "Support-vector networks," Machine Learning, vol. 20, pp. 273-297, 1995.

[4] H. Drucker, C. Burges, L. Kaufman, A. Smola, and V. N. Vapnik, "Support vector regression machines," in Advances in Neural Information Processing Systems, vol. 9, pp. 155-161, 1996.

[5] D. Dubois and H. Prade, "Operations on fuzzy numbers," Int. J. Syst. Sci., vol. 9, pp. 613-626, 1978.

[6] P.-Y. Hao and J.-H. Chiang, "A fuzzy model of support vector regression," IEEE Int. Conf. on Fuzzy Systems, vol. 1, pp. 738-742, 2003.

[7] H.-P. Huang and Y.-H. Liu, "Fuzzy support vector machines for pattern recognition and data mining," International Journal of Fuzzy Systems, vol. 4, no. 3, pp. 826-835, Sep. 2002.

[8] C.-F. Lin and S.-D. Wang, "Fuzzy support vector machines," IEEE Transactions on Neural Networks, vol. 13, no. 2, pp. 464-471, Mar. 2002.

[9] A. Smola, B. Schölkopf, and K.-R. Müller, "The connection between regularization operators and support vector kernels," Neural Networks, vol. 11, pp. 637-649, 1998.

[10] H. Tanaka, S. Uejima, and K. Asai, "Linear regression analysis with fuzzy model," IEEE Trans. on Syst., Man, and Cyber., vol. 12, no. 6, pp. 903-907, 1982.

[11] H. Tanaka and H. Lee, "Interval regression analysis by quadratic programming approach," IEEE Trans. on Fuzzy Systems, vol. 6, no. 4, pp. 473-481, 1998.

[12] V. N. Vapnik, The Nature of Statistical Learning Theory. Springer-Verlag, New York, 1995.

[13] R. R. Yager, "On solving fuzzy mathematical relationships," Information and Control, vol. 41, pp. 29-55, 1979.

[14] L. A. Zadeh, "The concept of a linguistic variable and its application to approximate reasoning," Information Sciences, vol. 8, pp. 199-249, 1975.
