
CHAPTER 4 SUPPORT-VECTOR BASED FUZZY NEURAL NETWORK FOR FUNCTION APPROXIMATION

4.1 Support Vector Regression Algorithm

In ε-SV regression, the goal is to find a function f(x) that has at most ε deviation from the actually obtained targets yi for all the training data, and at the same time is as flat as possible. In other words, we do not care about errors as long as they are less than ε, but will not accept any deviation larger than this.

For this reason, the linear regression function is considered first as follows:

f(x) = w^T x + b, (4.1)

where w is the weight vector and b is a bias. The error of approximation is used instead of the margin between an optimal separating hyperplane and the support vectors.

Vapnik introduced a general type of loss function, the linear loss function with an ε-insensitivity zone:

|y − f(x)|_ε = 0 if |y − f(x)| ≤ ε, and |y − f(x)| − ε otherwise. (4.2)

The loss is equal to zero if the difference between the predicted f(x) and the measured value is less than ε. The ε-insensitive loss function thus defines an ε tube. If the predicted value is within the tube, the loss is zero. For all predicted points outside the tube, the loss equals the amount by which the deviation exceeds the radius ε of the tube. Figure 4.1 shows the soft margin loss setting for a regression problem.
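As a small illustration of (4.2), the ε-insensitive loss can be written in a few lines of Python; the helper name and the NumPy implementation below are illustrative, not part of the original text:

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps):
    """Vapnik's epsilon-insensitive loss: zero inside the eps-tube,
    linear in the excess deviation outside it."""
    deviation = np.abs(y_true - y_pred)
    return np.maximum(0.0, deviation - eps)

# Example: deviations of 0.05 and 0.3 with eps = 0.1 give losses 0.0 and 0.2.
print(eps_insensitive_loss(np.array([1.0, 1.0]), np.array([1.05, 1.3]), eps=0.1))
```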

Fig. 4.1 The soft margin loss setting for a regression problem

From Fig. 4.1, the slack variables ξ_i, ξ_i^* cope with the large outliers in the regression problem. In formulating the support vector algorithm for regression, the objective is to minimize the empirical risk and ||w||^2 simultaneously. The primal problem can therefore be defined as follows:

minimize (1/2)||w||^2 + C Σ_{i=1}^{v} (ξ_i + ξ_i^*)
subject to y_i − w^T x_i − b ≤ ε + ξ_i,
           w^T x_i + b − y_i ≤ ε + ξ_i^*,
           ξ_i, ξ_i^* ≥ 0, i = 1, …, v. (4.3)

The constant C > 0 determines the trade-off between the flatness of f(x) and the amount up to which deviations larger than ε are tolerated. The optimization problem can be converted to the dual optimization problem, which can be formulated as follows:

maximize −ε Σ_{i=1}^{v} (α_i^* + α_i) + Σ_{i=1}^{v} (α_i^* − α_i) y_i − (1/2) Σ_{i=1}^{v} Σ_{j=1}^{v} (α_i^* − α_i)(α_j^* − α_j) x_i^T x_j
subject to Σ_{i=1}^{v} (α_i − α_i^*) = 0, 0 ≤ α_i, α_i^* ≤ C, i = 1, …, v. (4.4)

The kernel method can be added to the above optimization to solve nonlinear problems as well. The parameter ε of the ε-insensitive function and the regularization constant C are powerful means for regularization and adaptation to the noise in the training data. Both parameters control the network complexity and the generalization capability of the SVR.
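For orientation, the roles of ε and C can be seen in a standard ε-SVR. The sketch below uses scikit-learn's SVR with an ordinary RBF kernel purely for illustration (the SVFNN itself replaces this kernel with the fuzzy kernel introduced later); the data set and parameter values are assumptions:

```python
import numpy as np
from sklearn.svm import SVR

# 50 noise-free samples of the one-variable sinc function sin(x)/x on [-10, 10].
rng = np.random.default_rng(0)
x = rng.uniform(-10, 10, size=(50, 1))
y = (np.sin(x) / x).ravel()

# epsilon is the half-width of the insensitivity tube; C trades off flatness
# against tolerance of deviations larger than epsilon.
model = SVR(kernel="rbf", C=100.0, epsilon=0.01)
model.fit(x, y)
print("number of support vectors:", len(model.support_))
```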

In the next section, we propose the learning algorithm of the SVFNN, which combines the good robustness of support vector learning against noise with the efficient human-like reasoning of the FNN in handling uncertainty information. The SVFNN uses fuzzy kernels to provide the SVR with adaptive local representation power, such that the number of support vectors can be further reduced.

4.2 Learning Algorithm of SVFNN

The proposed learning algorithm of the SVFNN consists of three phases. In the first phase, the initial fuzzy rules (clusters) and membership functions of the network structure are automatically established by the fuzzy clustering method. The input space partitioning determines the initial fuzzy rules, which are used to determine the fuzzy kernels. In the second phase, the means of the membership functions and the connecting weights between layer 3 and layer 4 of the SVFNN (see Fig. 2.1) are optimized by using the result of the support vector learning method with the fuzzy kernels for function approximation. In the third phase, unnecessary fuzzy rules are recognized and eliminated, and the relevant fuzzy rules are determined.

Learning Phase 1 – Establishing initial fuzzy rules

The first phase establishes the initial fuzzy rules. The input space partitioning determines the number of fuzzy rules extracted from the training set and also the number of fuzzy sets. We use the centers and widths of the clusters to represent the rules. To determine the cluster to which a point belongs, we consider the value of the firing strength for the given cluster. The highest value of the firing strength determines the cluster to which the point belongs. Each input vector x_i is combined with its corresponding output value y_i in the training set S = {(x_1, y_1), (x_2, y_2), …, (x_v, y_v)} as the input of learning phase 1. For generating a compact structure, the Cartesian product space of the input and output is applied to the clustering algorithm [60]. The training samples are partitioned into characteristic regions where the system behaviors are approximated. The input data set is formed by combining the input vector x = [x_1, x_2, x_3, …, x_M]^T and the corresponding output value y_i. Based on the clustering-based approach to constructing the initial fuzzy rules of the FNN, the input data are first partitioned. Each incoming pattern b is formed as

b = [x; y]^T. (4.5)

The whole algorithm of the SVFNN for the generation of new fuzzy rules as well as fuzzy sets for each input variable is as follows. Suppose no rules exist initially.

IF b = [x; y]^T ∈ R^{(n+1)×1} is the first incoming input pattern THEN do

PART 1. { Generate a new rule with center m_1 = b and width σ_1 = σ_init. After decomposition, we have n+1 one-dimensional membership functions, with m_{1i} = b_i and σ_{1i} = σ_init, i = 1, …, n+1. }

ELSE for each newly incoming input b = [x; y], do

PART 2. { Find J = arg max_{1≤j≤c(t)} F_j(b) as defined in (2.10). IF F_J ≥ F_in, do nothing. ELSE set c(t+1) = c(t) + 1 and generate a new fuzzy rule with center m_new = b and width σ_new = −χ × ln(F_J), where the constant χ decides the overlap degree between two clusters. In addition, after decomposition, we have m_{new,i} = b_i and σ_{new,i} = −χ × ln(F_J), i = 1, …, M. Do the following fuzzy measure for each input variable i:

Degree(i, t) = max_{1≤j≤k_i} E[µ(m_{new,i}, σ_{new,i}), µ(m_{ij}, σ_{ij})],

where E(·) is defined in (2.14).

IF Degree(i, t) ≤ ρ(t)
THEN adopt this new membership function, and set k_i = k_i + 1, where k_i is the number of partitions of the ith input variable.
ELSE merge the new membership function with the closest existing one, whose center and width are replaced by the averages (m_{new,i} + m_{ij})/2 and (σ_{new,i} + σ_{ij})/2, respectively. }

In the above algorithm, σ_init is a prespecified constant, c(t) is the rule number at time t, χ decides the overlap degree between two clusters, and the threshold F_in determines the number of the generated rules. For a higher value of F_in, more rules are generated and, in general, a higher accuracy is achieved. The value ρ(t) is a scalar similarity criterion, which is monotonically decreasing such that higher similarity between two fuzzy sets is allowed in the initial stage of learning. The pre-specified values are given heuristically. In addition, after we determine the precondition part of a fuzzy rule, we also need to properly assign its consequence part. Hence, initially, we should assign a proper weight w_Con−1 for the consequence part of the fuzzy rule. The above procedure gives us the means (m_ij) and variances (σ_ij^2) in (2.12). Another parameter in (2.7) that needs attention is the weight d_j associated with each a_j^(4). Learning Phase 2 below shows how the results of the SVR method are used to determine these weights.
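A simplified, self-contained sketch of this rule-generation phase is given below. It assumes Gaussian firing strengths and keeps only the F_in test (the similarity-based merging of membership functions via E(·) and ρ(t) is omitted); all parameter values and function names are illustrative rather than taken from the thesis:

```python
import numpy as np

def firing_strength(b, center, width):
    """Gaussian firing strength of one cluster for the joint pattern b = [x; y]."""
    return float(np.exp(-np.sum((b - center) ** 2 / (2.0 * width ** 2))))

def generate_initial_rules(B, sigma_init=2.0, chi=1.0, F_in=0.2):
    """Incrementally create cluster centers and widths from patterns b = [x; y]."""
    centers, widths = [], []
    for b in B:
        if not centers:                                  # first pattern: first rule
            centers.append(b.copy())
            widths.append(np.full_like(b, sigma_init))
            continue
        strengths = [firing_strength(b, m, s) for m, s in zip(centers, widths)]
        J = int(np.argmax(strengths))
        if strengths[J] >= F_in:                         # already covered by rule J
            continue
        centers.append(b.copy())                         # otherwise add a new rule
        widths.append(np.full_like(b, -chi * np.log(max(strengths[J], 1e-12))))
    return np.array(centers), np.array(widths)

# Joint input-output patterns for y = sin(x)/x on [-10, 10].
x = np.linspace(-10.0, 10.0, 50)
B = np.column_stack([x, np.sin(x) / x])
centers, widths = generate_initial_rules(B)
print("number of initial fuzzy rules:", len(centers))
```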

Learning Phase 2 – Calculating the parameters of SVFNN

Through the above method, the optimal parameters of the SVFNN are trained by using the ε-insensitivity loss function SVR [35] based on the fuzzy kernels [61]. The dual quadratic optimization of SVR [36], [62] is solved in order to obtain an optimal hyperplane for any linear or nonlinear space:

maximize L(α, α^*) = −ε Σ_{i=1}^{v} (α_i^* + α_i) + Σ_{i=1}^{v} (α_i^* − α_i) y_i − (1/2) Σ_{i=1}^{v} Σ_{j=1}^{v} (α_i^* − α_i)(α_j^* − α_j) K(x_i, x_j)
subject to Σ_{i=1}^{v} (α_i − α_i^*) = 0, 0 ≤ α_i, α_i^* ≤ C, i = 1, …, v, (4.6)

where K(x_i, x_j) is the fuzzy kernel defined as (2.17), ε is a previously chosen nonnegative number for the ε-insensitive loss function, and C is a user-specified positive parameter that controls the tradeoff between the complexity of the SVR and the number of nonseparable points. This quadratic optimization problem can be solved to obtain the solution (α_1, α_1^*, …, α_v, α_v^*) and the corresponding support vectors sx = [sx_1, sx_2, …, sx_i, …, sx_nsv], together with the constant (threshold) d, where nsv is the number of fuzzy rules (support vectors). Hence, the fuzzy rules of the SVFNN are reconstructed by using the result of the SVR learning with fuzzy kernels.

The means and variances of the membership functions can be calculated from the support vectors m_j = sx_j, j = 1, 2, …, nsv, in (2.6) and (2.7), and from the variances of the multidimensional membership function of the cluster that each support vector belongs to, respectively. The coefficients d_j in (2.8) corresponding to m_j = sx_j can be calculated by d_j = y_j(α_j − α_j^*). In this phase, the number of fuzzy rules can be increased or decreased. The adaptive fuzzy kernel is advantageous to both the SVR and the FNN. The use of variable-width fuzzy kernels makes the SVR more efficient in terms of the number of required support vectors, which correspond to the fuzzy rules of the SVFNN.
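The extraction of rule parameters from the SVR solution can be sketched as follows. A plain Gaussian kernel stands in for the fuzzy kernel of (2.17), scikit-learn's SVR stands in for the dual QP solver of (4.6), and all names and parameter values are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import SVR

def gaussian_kernel(A, B, gamma=0.5):
    """Stand-in for the fuzzy kernel of (2.17): a plain Gaussian kernel."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

# Training data for the second test function x^(2/3) on [-2, 2].
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=(50, 1))
y = ((x ** 2) ** (1.0 / 3.0)).ravel()

model = SVR(kernel=gaussian_kernel, C=100.0, epsilon=0.001).fit(x, y)

# The support vectors become the rule centers m_j = sx_j, and dual_coef_
# stores alpha_j - alpha_j^*, which determines the consequent weights d_j.
rule_centers = x[model.support_]
rule_weights = model.dual_coef_.ravel()
print("fuzzy rules (support vectors):", len(rule_centers))
```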

Learning Phase 3 – Removing irrelevant fuzzy rules

In this phase, the fuzzy rules obtained in Phases 1 and 2 are reduced by removing some irrelevant fuzzy rules. The rule-reduction method attempts to reduce the number of fuzzy rules by minimizing the distance measure between the original fuzzy rules and the reduced fuzzy rules without losing the generalization performance. The reduction method is the same as in Section 2 of Chapter 3.

4.3 Experimental Results

In this section we present some experimental results to demonstrate the performance and capabilities of the proposed SVFNN. First, we apply the SVFNN to four function approximation problems to examine its rule-reduction performance.

Then the robustness of the SVFNN is evaluated using these functions corrupted by noise.

A. Setup

1) Functions for approximation:

The function approximation problems include one- and two-variable functions which have been widely used in the literature [63]-[65]:

The first function is a one-variable sinc function defined as

f^(1)(x) = sin(x)/x with x ∈ [−10, 10]. (4.8)

The second function is a one-variable function defined as

f^(2)(x) = x^(2/3) with x ∈ [−2, 2]. (4.9)

The third function is a two-variable Gaussian function defined as

f^(3)(x, y) = exp{−2(x^2 + y^2)} with x ∈ [−1, 1], y ∈ [−1, 1]. (4.10)

The fourth function, which exhibits a more complex structure, is defined as

f^(4)(x, y) = sin(10(x^2 + y^2)) / (10(x^2 + y^2)) with x ∈ [−1, 1], y ∈ [−1, 1]. (4.11)

Plots of these four functions are shown in subplots (a) of Figs. 4.2-4.5.

Fig. 4.2 (a) The desired output of the function shown in (4.8). (b) The resulting approximation by SVFNN.

Fig. 4.3 (a) The desired output of the function shown in (4.9). (b) The resulting approximation by SVFNN.

Fig. 4.4 (a) The desired output of the function shown in (4.10). (b) The resulting approximation by SVFNN.

Fig. 4.5 (a) The desired output of the function shown in (4.11). (b) The resulting approximation by SVFNN.

2) Training and Testing data:

There are two sets of training data for each function: one is noiseless and the other is noisy. For the first function, the noiseless training set has 50 points that are generated by randomly selecting x ∈ [−10, 10]. The testing set has 200 points that are randomly generated by the same function in the same range. The training and testing sets of the second function are generated in the same way, with x ∈ [−2, 2]. For the third function, the 150 training examples are generated by randomly selecting x ∈ [−1, 1] and y ∈ [−1, 1]; the testing set has 600 points that are randomly generated by the same function in the same range. For the fourth function, the 150 training examples are likewise generated by randomly selecting x ∈ [−1, 1] and y ∈ [−1, 1], and the testing set has 600 points that are randomly generated by the same function in the same range. The noisy training sets are generated by adding independent and identically distributed (i.i.d.) Gaussian noise, with zero mean and 0.25 standard deviation, to the original training sets. With this noise level, the signal to noise ratio (SNR) is roughly equal to 4 (1/0.25 = 4).
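A sketch of how the noiseless and noisy training sets can be generated, using the first function as an example; the sample sizes and the noise level follow the text, while the random seed and variable names are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# 50 training points and 200 testing points for the first function, drawn
# uniformly from [-10, 10].
x_train = rng.uniform(-10, 10, size=50)
x_test = rng.uniform(-10, 10, size=200)
y_train = np.sin(x_train) / x_train
y_test = np.sin(x_test) / x_test

# Noisy variant: add i.i.d. zero-mean Gaussian noise with standard deviation
# 0.25 to the training targets, giving an SNR of roughly 1/0.25 = 4.
y_train_noisy = y_train + rng.normal(0.0, 0.25, size=y_train.shape)
```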

3) Experimental details:

The computational experiments were done on a Pentium III-1000 with 1024 MB RAM using the Microsoft Windows operating system. The simulations were conducted in the Matlab environment. The root mean square error (RMSE) is used to quantify the performance of the methods and is defined as

RMSE = sqrt( Σ_{i=1}^{v} (y_i − f(x_i))^2 / v ), (4.7)

where v is the number of the used training or testing data. The ε-insensitivity parameter and the cost parameter C in (4.6) are selected from the ranges ε = [0.1, 0.01, 0.001, 0.0001] and C = [10^−1, 10^0, 10^1, …, 10^5], respectively. For the SVFNN training, we choose the ε-insensitivity parameter and cost parameter C that result in the best average RMSE to calculate the testing RMSE. Similarly, the parameters of the SVR used for comparison are selected by the same method.
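The RMSE of (4.7) and the grid search over ε and C can be sketched as below; a plain RBF-kernel ε-SVR stands in for the SVFNN training routine, and selecting the parameter pair by the lowest training RMSE is one possible reading of the selection rule described above:

```python
import numpy as np
from itertools import product
from sklearn.svm import SVR

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def grid_search(x_train, y_train, x_test, y_test):
    best = None
    for eps, C in product([0.1, 0.01, 0.001, 0.0001],
                          [10.0 ** k for k in range(-1, 6)]):
        model = SVR(kernel="rbf", C=C, epsilon=eps).fit(x_train, y_train)
        train_err = rmse(y_train, model.predict(x_train))
        if best is None or train_err < best[0]:
            best = (train_err, rmse(y_test, model.predict(x_test)), eps, C)
    return best                      # (training RMSE, testing RMSE, eps, C)

rng = np.random.default_rng(0)
x = rng.uniform(-10, 10, size=(250, 1))
y = (np.sin(x) / x).ravel()
print(grid_search(x[:50], y[:50], x[50:], y[50:]))
```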

B. Experimental Results

Tables 4.1 to 4.4 show the training and testing RMSEs and the number of used fuzzy rules (i.e., support vectors) of the SVFNN on the approximation of the four functions (4.8) to (4.11), respectively. The training and testing RMSEs can reach a satisfactory level by selecting a proper parameter set {ε, C}. The criterion for determining the number of reduced fuzzy rules is the difference between the accuracy values before and after reducing one fuzzy rule. If the difference is larger than 0.2%, meaning that some important support vector has been removed, then we stop the rule reduction. In Table 4.1 (a), the SVFNN is verified on the one-variable sinc function defined in (4.8), where the constant n in the symbol SVFNN-n denotes the number of the learned fuzzy rules. With sixteen fuzzy rules, the SVFNN achieves a root mean square error (RMSE) of 0.0007 on the training data and 0.0026 on the testing data. When the number of fuzzy rules is reduced to twelve, the testing RMSE increases to 0.0029. When the number of fuzzy rules is reduced to eleven, the testing RMSE increases to 0.01. Further decreasing the number of fuzzy rules keeps increasing the error. Therefore, twelve fuzzy rules are used in this case. Tables 4.2 (a) to 4.4 (a) show similar experimental results to those in Table 4.1 (a). Plots of these experimental results are shown in subplots (b) of Figs. 4.2-4.5.

In Table 4.1 (b), independent and identically distributed (i.i.d.) Gaussian noise, with zero mean and 0.25 standard deviation, is added to the function for approximation. With sixteen fuzzy rules, the SVFNN achieves an RMSE of 0.0085 on the training data and 0.042 on the testing data. When the number of fuzzy rules is reduced to twelve, the testing RMSE increases to 0.045. When the number of fuzzy rules is reduced to eleven, the testing RMSE increases to 0.091. Therefore, twelve fuzzy rules are also used in this case. Tables 4.2 (b) to 4.4 (b) show similar experimental results to those in Table 4.1 (b). These experimental results show that the proposed SVFNN can properly reduce the number of required fuzzy rules and maintain robustness against noise.
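The stopping rule for rule reduction can be stated compactly as below; `evaluate_error` and `remove_one_rule` are placeholders for the accuracy evaluation and the reduction step of Chapter 3, Section 2, and the 0.2% threshold is interpreted here as an absolute difference of 0.002 in the error measure:

```python
def reduce_rules(rules, evaluate_error, remove_one_rule, tol=0.002):
    """Remove rules one at a time; stop once the accuracy difference caused by
    removing a single rule exceeds the 0.2% threshold."""
    current_error = evaluate_error(rules)
    while len(rules) > 1:
        candidate = remove_one_rule(rules)       # reduction step of Chapter 3, Section 2
        candidate_error = evaluate_error(candidate)
        if abs(candidate_error - current_error) > tol:
            break                                # an important support vector would be removed
        rules, current_error = candidate, candidate_error
    return rules
```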

The performance comparisons among the Adaptive-network-based fuzzy inference system (ANFIS) [66], the robust neural network [67], the RBF-kernel-based SVR (without support vector reduction) [68], and the proposed SVFNN are made in Tables 4.5 and 4.6.

TABLE 4.1 (a) Experimental results of SVFNN on the first function using the training data without noise. (b) Experimental results of SVFNN on the first function using the training data with noise.

(a)

SVFNN-n (SVFNN with n fuzzy rules)   C   Training RMSE   Testing RMSE

1. The first function is f^(1)(x) = sin(x)/x with x ∈ [−10, 10].
2. The number of training data is 50.
3. The number of testing data is 200.

(b)

SVFNN-n (SVFNN with n fuzzy rules)   C   Training RMSE   Testing RMSE

1. The first function is f^(1)(x) = sin(x)/x with x ∈ [−10, 10].
2. The number of training data is 50.
3. The number of testing data is 200.

TABLE 4.2 (a) Experimental results of SVFNN on the second function using the training data without noise. (b) Experimental results of SVFNN on the second function using the training data with noise.

(a)

SVFNN-n (SVFNN with n fuzzy rules)   C     Training RMSE   Testing RMSE
SVFNN-19                             100   0.0009          0.0056
SVFNN-16                             100   0.0009          0.0056
SVFNN-12                             100   0.0009          0.0060
SVFNN-11                             100   0.0015          0.0092

1. The second function is f^(2)(x) = x^(2/3) with x ∈ [−2, 2].
2. The number of training data is 50.
3. The number of testing data is 200.

(b)

SVFNN-n (SVFNN with n fuzzy rules)   C     Training RMSE   Testing RMSE
SVFNN-25                             100   0.001           0.078
SVFNN-20                             100   0.001           0.078
SVFNN-15                             100   0.001           0.081
SVFNN-14                             100   0.0057          0.139

1. The second function is f^(2)(x) = x^(2/3) with x ∈ [−2, 2].
2. The number of training data is 50.
3. The number of testing data is 200.

TABLE 4.3 (a) Experimental results of SVFNN on the third function using the training data without noise. (b) Experimental results of SVFNN on the third function using the training data with noise.

(a)

SVFNN-n (SVFNN with n fuzzy rules)   C   Training RMSE   Testing RMSE

1. The third function is f^(3)(x, y) = exp{−2(x^2 + y^2)} with x ∈ [−1, 1], y ∈ [−1, 1].
2. The number of training data is 150.
3. The number of testing data is 600.

(b)

SVFNN-n (SVFNN with n fuzzy rules)   C   Training RMSE   Testing RMSE

1. The third function is f^(3)(x, y) = exp{−2(x^2 + y^2)} with x ∈ [−1, 1], y ∈ [−1, 1].
2. The number of training data is 150.
3. The number of testing data is 600.

TABLE 4.4 (a) Experimental results of SVFNN on the fourth function using the training data without noise. (b) Experimental results of SVFNN on the fourth function using the training data with noise.

(a)

SVFNN-n (SVFNN with n fuzzy rules)   C   Training RMSE   Testing RMSE

1. The fourth function is f^(4)(x, y) as defined in (4.11) with x ∈ [−1, 1], y ∈ [−1, 1].
2. The number of training data is 150.
3. The number of testing data is 600.

(b)

SVFNN-n (SVFNN with n fuzzy rules)   C   Training RMSE   Testing RMSE

1. The fourth function is f^(4)(x, y) as defined in (4.11) with x ∈ [−1, 1], y ∈ [−1, 1].
2. The number of training data is 150.
3. The number of testing data is 600.

TABLE 4.5 Comparison of RMSEs using the training data without noise.

FUNCTION   ANFIS [66] (number of fuzzy rules, RMSE)   Robust NN [67] (number of neurons, RMSE)   RBF-kernel-based SVR [68] (number of support vectors, RMSE)   SVFNN (number of fuzzy rules, RMSE)

TABLE 4.6 Comparison of RMSEs using the training data with noise.

FUNCTION   ANFIS [66] (number of fuzzy rules, RMSE)   Robust NN [67] (number of neurons, RMSE)   RBF-kernel-based SVR [68] (number of support vectors, RMSE)   SVFNN (number of fuzzy rules, RMSE)

4.4 Discussions

These results indicate that the SVFNN maintains the function approximation accuracy while using fewer support vectors than the regular SVR with fixed-width RBF kernels. The computational cost of the proposed SVFNN in the testing stage is also lower than that of the regular SVR. In addition, according to Table 4.6, the testing results of the SVFNN trained on the noisy data are close to those of the SVFNN trained on the data without noise. This demonstrates that the proposed SVFNN has better robustness than ANFIS and the robust neural network, although the SVFNN uses slightly more rules than ANFIS. In summary, the proposed SVFNN exhibits better generalization ability, maintains more robustness, and uses fewer fuzzy rules.

CHAPTER 5 CONCLUSIONS

In this dissertation we proposed support-vector-based fuzzy neural networks (SVFNNs) for solving more complex classification and function approximation problems. The SVFNNs combine the superior classification power of the support vector machine (SVM) in high-dimensional data spaces with the efficient human-like reasoning of the FNN in handling uncertainty information. The SVFNNs are the realization of a new idea of adaptive kernel functions used in the SVM. The use of the proposed fuzzy kernels provides the SVM with adaptive local representation power, and thus brings the advantages of the FNN (such as adaptive learning and an economical network structure) into the SVM directly. The SVFNNs combine the good robustness against noise and the global generalization of support vector learning with the efficient human-like reasoning of the FNN in handling uncertainty information. A novel adaptive fuzzy kernel function is also proposed to bring the advantages of FNNs to the SVR directly, and the use of the proposed fuzzy kernels provides the SVR with adaptive local representation power. The major advantages of the proposed SVFNNs are as follows:

(1) The proposed SVFNNs can automatically generate fuzzy rules, and improve the accuracy and learning speed of classification.

(2) They combine the optimal classification ability of the SVM and the human-like reasoning of fuzzy systems. The classification ability is improved by giving the SVM adaptive fuzzy kernels, and the classification speed is increased by the reduced fuzzy rules.

(3) The fuzzy kernels using variable-width fuzzy membership functions can make the SVM more efficient in terms of the number of required support vectors, and also make the learned FNN more understandable to humans.

(4) The ability of the structural risk minimization induction principle, which forms the basis for the SVM method to minimize the expected risk, gives better generalization ability to the FNN classification.

(5) The proposed SVFNN can automatically generate fuzzy rules and improve the accuracy of function approximation.

(6) The combination of the robust regression ability of the SVR and the human-like reasoning of fuzzy systems improves the robust regression ability of the FNN through SVR training and increases the speed of execution through the reduced fuzzy rules.

In future work, we will try to develop a mechanism to automatically select appropriate initial values for the parameters used in the first training phase and the penalty parameter in the second training phase. We will also apply the proposed method to complex and large classification problems and to more complex and noisy functions.

REFERENCES

[1] K. Tanaka and H. O. Wang, Fuzzy Control Systems Design and Analysis, New York: Wiley, 2001.

[2] B. Kosko, Neural Networks and Fuzzy Systems, Englewood Cliffs, NJ: Prentice-Hall, 1992.

[3] M. Y. Chen and D. A. Linkens, “Rule-base self-generation and simplification for data-driven fuzzy models,” Fuzzy Sets Syst., Vol. 142, pp. 243-265, March 2004.

[4] J. S. Jang, “ANFIS: Adaptive-network-based fuzzy inference system,” IEEE Trans. Syst., Man, Cybern., Vol. 23, pp. 665-685, May 1993.

[5] K. Tanaka, M. Sano, and H. Watanabe, “Modeling and control of carbon monoxide concentration using a neuro-fuzzy technique,” IEEE Trans. Fuzzy Syst., Vol. 3, pp. 271-279, Aug. 1995.

[6] L. Y. Cai and H. K. Kwan, “Fuzzy classifications using fuzzy inference networks,” IEEE Trans. Syst., Man, Cybern. Pt. B, Vol. 28, pp. 334-347, June 1998.

[7] J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, New York: Plenum, 1981.

[8] J. C. Bezdek, S. K. Chuah, and D. Leep, “Generalized K-nearest neighbor rules,”

