

3.2 Learning Algorithm of SVFNN

The learning algorithm of the SVFNN consists of three phases. The details are given below:

Learning Phase 1 – Establishing initial fuzzy rules

The first phase establishes the initial fuzzy rules. Such rules are usually derived from human experts as linguistic knowledge, but because deriving fuzzy rules from human experts is not always easy, a method that automatically generates fuzzy rules from numerical data is adopted here. The input space partitioning determines the number of fuzzy rules extracted from the training set, as well as the number of fuzzy sets. We use the centers and widths of the clusters to represent the rules. To determine the cluster to which a point belongs, we consider the value of the firing strength for each cluster; the highest firing strength determines the cluster to which the point belongs. The whole algorithm for the generation of new fuzzy rules, as well as the fuzzy sets in each input variable, is as follows. Suppose no rules exist initially.

IF x is the first incoming input pattern THEN do

PART 1. { Generate a new rule with center m_1 = x and width σ_1,i = σ_init, i = 1, …, M, where σ_init is a prespecified constant, and assign the consequent weight w_Con(1) = y. }

ELSE for each newly incoming input pattern x, do

PART 1. { Find J = arg max_{1≤j≤c(t)} F^j(x).

IF F^J ≥ F_in, do nothing.

ELSE generate a new rule, with c(t+1) = c(t) + 1 and m_c(t+1) = x, where χ decides the overlap degree between two clusters. In addition, after decomposition, we have m_new,i = x_i and σ_new,i = −ln(F^J) × χ, i = 1, …, M. Do the following fuzzy measure for each input variable i:

Degree(i, t) = max_{1≤j≤k_i} E[μ(m_new,i, σ_new,i), μ(m_ij, σ_ij)],

where E(·) is defined in (2.14).

IF Degree(i, t) ≤ ρ(t)

THEN adopt this new membership function, and set k_i = k_i + 1, where k_i is the number of partitions of the ith input variable.

ELSE merge the new membership function with the closest existing one, taking the average of their centers and widths, i.e., m_merged,i = (m_new,i + m_closest,i)/2 and σ_merged,i = (σ_new,i + σ_closest,i)/2.

Finally, set the respective consequent weight w_Con(t+1) = y. In addition, we also need to do the fuzzy measure for each input variable i. } }

In the above algorithm, σ_init is a prespecified constant, c(t) is the number of rules at time t, χ decides the overlap degree between two clusters, and the threshold F_in determines the number of rules generated. For a higher value of F_in, more rules are generated and, in general, a higher accuracy is achieved. The value ρ(t) is a scalar similarity criterion that is monotonically decreasing, so that higher similarity between two fuzzy sets is allowed in the initial stage of learning. The pre-specified values are chosen heuristically; in general, F_in = 0.35, β = 0.05, σ_init = 0.5, and χ = 2.
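As a concrete illustration, the following is a minimal Python sketch of this rule-generation loop, assuming Gaussian firing strengths F^j(x) = exp(−Σ_i (x_i − m_j,i)²/σ_j,i²). The class and method names are illustrative rather than taken from the thesis, and the per-dimension similarity check against ρ(t), with its merge step, is omitted for brevity.

```python
import numpy as np

class Phase1RuleGenerator:
    """Sketch of Learning Phase 1: online generation of initial fuzzy rules.
    All names are illustrative; Gaussian firing strengths are assumed."""

    def __init__(self, sigma_init=0.5, f_in=0.35, chi=2.0):
        self.sigma_init = sigma_init  # width of the very first rule
        self.f_in = f_in              # firing-strength threshold F_in
        self.chi = chi                # overlap degree between two clusters
        self.centers = []             # rule centers m_j
        self.widths = []              # per-dimension rule widths sigma_j,i
        self.weights = []             # consequent weights w_Con

    def firing_strengths(self, x):
        # F^j(x) = exp(-sum_i (x_i - m_j,i)^2 / sigma_j,i^2)
        m = np.asarray(self.centers)
        s = np.asarray(self.widths)
        return np.exp(-np.sum((x - m) ** 2 / s ** 2, axis=1))

    def update(self, x, y):
        x = np.asarray(x, dtype=float)
        if not self.centers:
            # First incoming pattern: generate the initial rule m_1 = x.
            self.centers.append(x)
            self.widths.append(np.full_like(x, self.sigma_init))
            self.weights.append(y)
            return
        F = self.firing_strengths(x)
        J = int(np.argmax(F))         # best-matching existing cluster
        if F[J] >= self.f_in:
            return                    # x is already covered; do nothing
        # Generate a new rule: m_new,i = x_i, sigma_new,i = -ln(F^J) * chi.
        sigma_new = -np.log(F[J]) * self.chi
        self.centers.append(x)
        self.widths.append(np.full_like(x, sigma_new))
        self.weights.append(y)        # consequent weight w_Con(t+1) = y
```

In the full algorithm, each newly created one-dimensional membership function would additionally be compared with the existing partitions of its input variable through the fuzzy measure E(·) of (2.14), and merged with the closest one whenever Degree(i, t) exceeds ρ(t); that bookkeeping is independent of the rule-creation logic shown here.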

In addition, after we determine the precondition part of a fuzzy rule, we also need to properly assign its consequence part. Here we define two output nodes for two-cluster recognition: if output node 1 obtains the higher excitation value, the input-output pattern belongs to class 1. Hence, initially, we should assign the proper weight w_Con for the consequence part of each fuzzy rule. The above procedure gives us the means (m_ij) and variances (σ_ij²) in (2.9). Another parameter in (2.7) that needs attention is the weight d_j associated with each a_j^(4). We shall see later in Learning Phase 2 how we can use the results from the SVM method to determine these weights.

Learning Phase 2 – Calculating the parameters of SVFNN

Through Learning Phase 1, the initial structure of SVFNN is established, and we can then use SVM [34], [35] to find the optimal parameters of SVFNN based on the proposed fuzzy kernels. The dual quadratic optimization problem of SVM [36] is solved in order to obtain an optimal hyperplane for any linear or nonlinear space:

maximize  L(α) = Σ_{i=1}^{v} α_i − (1/2) Σ_{i=1}^{v} Σ_{j=1}^{v} y_i y_j α_i α_j K(x_i, x_j)

subject to  0 ≤ α_i ≤ C, i = 1, …, v,  and  Σ_{i=1}^{v} y_i α_i = 0,

where K(x_i, x_j) is the fuzzy kernel in (2.17) and C is a user-specified positive parameter that controls the tradeoff between the complexity of the SVM and the number of nonseparable points. This quadratic optimization problem can be solved and a solution α* = (α_1*, α_2*, …, α_nsv*) obtained, where the α_i* are the nonzero Lagrange coefficients and nsv is the number of support vectors. The corresponding support vectors sx_j can be obtained, and the constant (threshold) d is determined by

d = −(1/2) Σ_{j=1}^{nsv} y_j α_j [K(x_j, x*(1)) + K(x_j, x*(−1))],

where nsv is the number of fuzzy rules (support vectors); the support vector x*(1) belongs to the first class and the support vector x*(−1) belongs to the second class. Hence, the fuzzy rules of SVFNN are reconstructed by using the result of the SVM learning with fuzzy kernels. The means of the membership functions in (2.5) and (2.6) are given by the support vectors, m_j = sx_j, j = 1, 2, …, nsv, and the variances are taken from the multidimensional membership functions of the clusters to which the support vectors belong. The coefficients d_j in (2.7) corresponding to m_j = sx_j can be calculated by d_j = y_j α_j. In this phase, the number of fuzzy rules can be increased or decreased. The adaptive fuzzy kernel is advantageous to both the SVM and the FNN. The use of variable-width fuzzy kernels makes the SVM more efficient in terms of the number of required support vectors, which correspond to the fuzzy rules in SVFNN.
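As a concrete illustration of this phase, the sketch below solves the dual problem with scikit-learn's SVC by passing the fuzzy kernel as a precomputed Gram matrix, then rebuilds the rule parameters from the solution (scikit-learn exposes y_j α_j as dual_coef_ and the threshold as intercept_). The fuzzy_kernel shown is only a Gaussian-product stand-in for the kernel of (2.17), whose exact form in the thesis depends on the membership functions learned in Phase 1.

```python
import numpy as np
from sklearn.svm import SVC

def fuzzy_kernel(A, B, widths):
    """Stand-in for the fuzzy kernel of (2.17): a product of Gaussian
    memberships with shared per-dimension widths."""
    d2 = (((A[:, None, :] - B[None, :, :]) ** 2) / widths ** 2).sum(axis=2)
    return np.exp(-d2)

def fit_phase2(X, y, widths, C=10.0):
    """Solve the dual quadratic optimization with the fuzzy kernel.
    X: (n, M) training patterns; y: labels in {-1, +1}."""
    K = fuzzy_kernel(X, X, widths)
    svm = SVC(C=C, kernel="precomputed")
    svm.fit(K, y)
    return svm

def rules_from_svm(svm, X):
    """Rebuild the SVFNN rules: one fuzzy rule per support vector."""
    centers = X[svm.support_]        # rule centers m_j = sx_j
    d = svm.dual_coef_.ravel()       # coefficients d_j = y_j * alpha_j
    threshold = svm.intercept_[0]    # the constant (threshold) d
    return centers, d, threshold
```

Note that the number of rules after this phase equals the number of support vectors, so it can be larger or smaller than the rule count produced by Phase 1.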

Learning Phase 3 – Removing irrelevant fuzzy rules

In this phase, we propose a method for reducing the number of fuzzy rules learned in Phases 1 and 2 by removing some irrelevant fuzzy rules and retuning the consequent parameters of the remaining rules, under the condition that the classification accuracy of the SVFNN is kept almost the same. Several methods, including the orthogonal least squares (OLS) method and the singular value decomposition QR (SVD-QR) method, have been proposed to select important fuzzy rules from a given rule base [37]-[39]. In [37], the SVD-QR algorithm selects a set of independent fuzzy basis functions that minimizes the residual error in a least-squares sense. In [38], an orthogonal least-squares method tries to minimize the fitting error according to the error reduction ratio, rather than simplifying the model structure [39]. The proposed method reduces the number of fuzzy rules by minimizing the distance measure between the original fuzzy rules and the reduced fuzzy rules, without losing generalization performance. To achieve this goal, we rewrite (2.8) as

a^(4) = Σ_{j=1}^{N} d_j a_j^(4),

where N is the number of fuzzy rules after Learning Phases 1 and 2. Now we try to approximate it by the expansion of a reduced set:

a^Re(4) = Σ_{q=1}^{Rz} β_q a_q^Re(4),    (3.4)

where Rz is the number of reduced fuzzy rules and (σ_q^Re)² is the variance of the reduced fuzzy rules. To this end, one can minimize [40]

‖a^(4) − a^Re(4)‖²,

Evidently, the problem of finding the reduced fuzzy rules consists of two parts: one is to determine the reduced fuzzy rules m_q^Re = [m_1q^Re, m_2q^Re, …, m_Mq^Re]^T, and the other is to compute the expansion coefficients β_q. This problem can be solved by choosing the Rz most important fuzzy rules from the original N fuzzy rules. By adopting the sequential optimization approach of the reduced support vector method in [41], the approximation in (3.4) can be achieved by computing a whole sequence of reduced-set approximations a_q^Re(4), q = 1, 2, …, Rz.

The expansion of the reduced fuzzy-rule set in (3.4) can be obtained by the following iterative optimization rule [41]: the rule found by this optimization corresponds to the first most important fuzzy rule; this rule is removed from the original fuzzy rule set represented by m_j, j = 1, 2, …, N, and added to the reduced fuzzy-rule set. The procedure for obtaining the reduced rules is then repeated. The optimal coefficients β_q, q = 1, 2, …, Rz, are then computed to approximate the original expansion, as given in (3.16).

The whole learning scheme is iterated until the new rules are sufficiently sparse.
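A compact Python sketch of this reduction step follows, under the same assumptions as the earlier code. Rules are ranked here by |d_j| as a simple stand-in importance measure (the thesis selects rules through the iterative optimization rule of [41]); the coefficient computation, however, is exact: for a fixed choice of reduced rules, the β minimizing ‖a^(4) − a^Re(4)‖² in the kernel-induced norm solves the normal equations K_zz β = K_zm d.

```python
import numpy as np

def reduce_rules(centers, d, widths, Rz, kernel):
    """Pick Rz of the N fuzzy rules and recompute expansion coefficients.
    centers: (N, M) rule centers; d: (N,) coefficients from Phase 2;
    kernel: a Gram-matrix function such as fuzzy_kernel above."""
    remaining = list(range(len(centers)))
    chosen = []
    for _ in range(Rz):
        # Stand-in importance measure: largest remaining |d_j|.
        j = max(remaining, key=lambda i: abs(d[i]))
        chosen.append(j)
        remaining.remove(j)
    Z = centers[chosen]
    Kzz = kernel(Z, Z, widths)        # (Rz, Rz) Gram matrix of kept rules
    Kzm = kernel(Z, centers, widths)  # (Rz, N) cross-Gram with all rules
    beta = np.linalg.solve(Kzz, Kzm @ d)   # normal equations for beta
    return Z, beta
```

In practice, one would re-check the classification accuracy after each removal and stop once it begins to degrade, mirroring the stopping condition stated above.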
