
Dynamic Optimal Learning Rates of a Certain Class of Fuzzy Neural Networks and its Applications with Genetic Algorithm

Chi-Hsu Wang, Han-Leih Liu, and Chin-Teng Lin

Abstract—The stability analysis of the learning rate for a two-layer neural network (NN) is discussed first by minimizing the total squared error between the actual and desired outputs for a set of training vectors. The stable and optimal learning rate, in the sense of maximum error reduction, for each iteration in the training (back propagation) process can therefore be found for this two-layer NN. It has also been proven in this paper that the dynamic stable learning rate for this two-layer NN must be greater than zero. Thus it is guaranteed that the maximum error reduction can be achieved by choosing the optimal learning rate for the next training iteration. A dynamic fuzzy neural network (FNN) that consists of the fuzzy linguistic process as the premise part and the two-layer NN as the consequence part is then illustrated as an immediate application of our approach. Each part of this dynamic FNN has its own learning rate for training purpose. A genetic algorithm is designed to allow a more efficient tuning process of the two learning rates of the FNN. The objective of the genetic algorithm is to reduce the searching time by searching for only one learning rate, which is the learning rate of the premise part, in the FNN. The dynamic optimal learning rates of the two-layer NN can be found directly using our innovative approach. Several examples are fully illustrated and excellent results are obtained for the model car backing up problem and the identification of nonlinear first order and second order systems.

Index Terms—Backpropagation, fuzzy neural networks, genetic algorithm, learning rate.

Manuscript received November 26, 1999; revised September 29, 2000. This paper was recommended by Associate Editor T. Sudkamp.

C.-H. Wang and H.-L. Liu are with the School of Microelectronic Engineering, Griffith University, Queensland, Australia (e-mail: c.wang@me.gu.edu.au).

C.-T. Lin is with the Department of Electrical and Control Engineering, National Chiao-Tung University, Hsinchu, Taiwan, R.O.C.

Publisher Item Identifier S 1083-4419(01)02511-0.

I. INTRODUCTION

During the past decade, fuzzy neural networks (FNNs) have found a variety of applications in various fields [1]–[3]. Most notably, FNN systems have been applied to control nonlinear, ill-defined systems [4]. These systems used the back-propagation (BP) algorithm to tune the parameters of the fuzzy sets and the weights of the neural network (NN). Basically, the BP algorithm is of descent type, which attempts to minimize the difference (or error) between the desired and actual outputs in an iterative manner. For each iteration, the parameters and weights are adjusted by the algorithm so as to reduce the error along a descent direction. In doing so, certain values, called learning rates, must be set properly in the BP algorithm. The authors of [5] proposed dynamic optimization of the learning rate using derivative information. It was shown in [5] that overly large or small learning rates may affect the progress of the BP algorithm and may even lead to failure of the learning process. However, the analysis of stable learning rates was not discussed in [5]. Recently, genetic algorithms (GAs) [6]–[10] have emerged as a popular family of methods for global optimization. GAs perform a search by evolving a population of potential solutions through the use of genetic operators. The authors of [9] proposed GAs to tune the parameters of Gaussian membership functions. Although reasonable results were obtained in [9], the analysis of stable learning rates was likewise not discussed.

In order to perform the stability analysis of the learning rate [11] in the FNN, we start from the stability analysis of the learning rate for a two-layer neural network (NN) by minimizing the total squared error between the actual and desired outputs for a set of training vectors. The stable and optimal learning rate, in the sense of maximum error reduction, for each iteration during the back-propagation process can be found for this two-layer NN. It is proven in this paper that the stable learning rate for this two-layer NN must be greater than zero. Following Theorem 1, it is guaranteed that the maximum error reduction can be achieved by choosing the optimal learning rate for the next training iteration. We then propose a dynamic fuzzy neural network that consists of the fuzzy linguistic process as the premise part and the two-layer NN as the consequence part. Each part has its own learning rate to be decided. The stable and optimal learning rate of the two-layer NN in the proposed FNN can also be found directly by our method, provided that the output of the premise part (or the input of the consequent part) remains the same during the training process of the consequent part. In order to find the best learning rate for the premise part, a new genetic search algorithm is proposed together with the stable and optimal learning rate in the consequent part. The major advantage of this new genetic algorithm is that it reduces the searching time by searching for only one learning rate, namely the learning rate of the premise part, in the dynamic FNN. In comparison with the searching process proposed in [9], our proposed GA reduces the searching complexity dramatically.

It is well known that backing up a truck is a very difficult exercise for all but the most skilled truck drivers, since its dynamics are nonlinear and unstable. Based on our new methodology, an FNN controller for backing up a truck is successfully designed. Using the optimal learning rates, the trained FNN system can indeed let the truck reach its loading zone successfully. The nonlinear system identification of first-order and second-order systems is also fully illustrated, with excellent results. The applicability of our approach to other FNN models, such as that of Horikawa et al. [12], is left as a future research topic.


Fig. 1. Two-layer NN.

II. DYNAMIC OPTIMAL LEARNING RATES FOR A TWO-LAYER NN

Consider the simple two-layer NN in Fig. 1, which will form the consequent part of the FNN adopted in this paper, where

$$r = [r_1\ r_2\ \cdots\ r_L]^T \in \mathbb{R}^L \qquad \text{the training data vector} \qquad (1)$$
$$W = [w_1\ w_2\ \cdots\ w_Z] \in \mathbb{R}^{L\times Z} \qquad \text{the weighting matrix} \qquad (2)$$
$$w_i = [w_i^1\ w_i^2\ \cdots\ w_i^L]^T \in \mathbb{R}^L \qquad \text{the } i\text{th weighting vector} \qquad (3)$$
$$y = [y_1\ y_2\ \cdots\ y_Z]^T \in \mathbb{R}^Z \qquad \text{the actual output vector} \qquad (4)$$
$$d = [d_1\ d_2\ \cdots\ d_Z]^T \in \mathbb{R}^Z \qquad \text{the desired output vector} \qquad (5)$$

and "$T$" denotes matrix transpose.

Given a set of training vectors, which form the training matrix $R$ in (7), it is desired to use the back-propagation technique to train the above NN so that the actual outputs converge to the desired outputs. The actual output $y_z$ is defined as

$$y_z = \sum_{l=1}^{L} r_l\, w_z^l = r^T w_z. \qquad (6)$$

Given $P$ training vectors, there are $P$ desired output vectors. In matrix notation, we let

$$R = [r_1\ r_2\ \cdots\ r_P] \in \mathbb{R}^{L\times P} \qquad \text{the input training matrix} \qquad (7)$$
$$Y = [y_1\ y_2\ \cdots\ y_P]^T \in \mathbb{R}^{P\times Z} \qquad \text{the actual output matrix} \qquad (8)$$
$$D = [d_1\ d_2\ \cdots\ d_P]^T \in \mathbb{R}^{P\times Z} \qquad \text{the desired output matrix.} \qquad (9)$$

The actual output matrix $Y$ in (8) can be written as

$$Y = R^T W. \qquad (10)$$

It is desired to update (or train) the weighting matrix $W$ so that the actual output $y_z$ converges to the desired output $d_z$. To do so, we define the total squared error $J$ as follows:

$$J = \frac{1}{2PZ}\sum_{p=1}^{P}\sum_{z=1}^{Z}\big(y_z^p - d_z^p\big)^2. \qquad (11)$$

The above $J$ can also be reorganized using matrix notation. To do so, we define the error matrix $E$ as

$$E = Y - D = R^T W - D. \qquad (12)$$

Then $J$ can be written as

$$J = \frac{1}{2PZ}\,\mathrm{Tr}\big(E E^T\big). \qquad (13)$$
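As a concrete companion to (10)–(13), the following NumPy sketch computes the actual output, the error matrix, and the total squared error; the function name and argument layout are our own choices rather than anything prescribed by the paper.

```python
import numpy as np

def forward_error(R, W, D):
    """Forward pass and total squared error for the two-layer NN.

    R : (L, P) input training matrix, one training vector per column, eq. (7)
    W : (L, Z) weighting matrix, eq. (2)
    D : (P, Z) desired output matrix, eq. (9)
    Returns the actual output Y (10), the error E (12), and J (11)/(13).
    """
    P, Z = D.shape
    Y = R.T @ W                              # eq. (10): Y = R^T W, shape (P, Z)
    E = Y - D                                # eq. (12): E = Y - D
    J = np.trace(E @ E.T) / (2 * P * Z)      # eq. (13), identical to the double sum in (11)
    return Y, E, J
```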

To update $W$, we apply the back-propagation method as follows:

$$W_{t+1} = W_t - \beta_t \left.\frac{\partial J}{\partial W}\right|_t \qquad (14)$$

where $t$ denotes the $t$th iteration and $\beta_t$ is the learning rate. Using the chain rule, we get

$$W_{t+1} = W_t - \frac{\beta_t}{PZ}\, R E_t. \qquad (15)$$

After training, assuming zero error, we should have, in matrix form, $D = R^T W$. It should be noted that we assume the learning rate for each iteration during the back-propagation process is different, i.e., the learning rates are not fixed. In order to find the optimal learning rate $\beta_t$, we have the following theorem.

Theorem 1: The optimal learning rate $\beta_t$ defined in (15) can be found at the minimum of the quadratic polynomial $A\beta^2 + B\beta$, where $A\,(>0)$ and $B\,(<0)$ can be obtained from the training vector $r$, the desired output vector $d$, and the weighting matrix $W$.

Proof: First, we must find the stable range for $\beta_t$. To do so, we define the Lyapunov function as

$$V = J^2 \qquad (16)$$

where $J$ is defined in (13). The change of the Lyapunov function is $\Delta V = J_{t+1}^2 - J_t^2$. It is well known that if $\Delta V < 0$, the response of the system is guaranteed to be stable. For $\Delta V < 0$ we need

$$J_{t+1} - J_t < 0. \qquad (17)$$

Here we consider all $P$ training vectors $\{r_i = [r_i^1\ r_i^2\ \cdots\ r_i^L]^T \mid i = 1, \ldots, P\}$. From (15) (applied to $w_z^l(t+1)$) and the fact that the training vectors remain the same during the training process, i.e., $r_l^p(t+1) = r_l^p(t) = r_l^p$, we have $J_{t+1}$ [from (13)] as follows:

$$
\begin{aligned}
J_{t+1} &= \frac{1}{2PZ}\,\mathrm{Tr}\big(E_{t+1}E_{t+1}^T\big)
         = \frac{1}{2PZ}\,\mathrm{Tr}\big[(R^T W_{t+1} - D)(R^T W_{t+1} - D)^T\big]\\
&= \frac{1}{2PZ}\,\mathrm{Tr}\Big\{\Big[R^T\Big(W_t - \tfrac{\beta_t}{PZ} R E_t\Big) - D\Big]\Big[R^T\Big(W_t - \tfrac{\beta_t}{PZ} R E_t\Big) - D\Big]^T\Big\}\\
&= \frac{1}{2PZ}\,\mathrm{Tr}\Big[\Big(E_t - \tfrac{\beta_t}{PZ} R^T R E_t\Big)\Big(E_t^T - \tfrac{\beta_t}{PZ} E_t^T R^T R\Big)\Big]\\
&= \frac{1}{2PZ}\,\mathrm{Tr}\big(E_t E_t^T\big) - \frac{\beta_t}{PZ}\,\mathrm{Tr}\Big[\tfrac{1}{PZ}\, E_t^T R^T R E_t\Big] + \frac{\beta_t^2}{2PZ}\,\mathrm{Tr}\Big[\tfrac{1}{(PZ)^2}\, R^T R E_t E_t^T R^T R\Big]\\
&= J_t - \frac{\beta_t}{PZ}\,\mathrm{Tr}\Big[\tfrac{1}{PZ}\, E_t^T R^T R E_t\Big] + \frac{\beta_t^2}{2PZ}\,\mathrm{Tr}\Big[\tfrac{1}{(PZ)^2}\, R^T R E_t E_t^T R^T R\Big].
\end{aligned}
$$


Fig. 2. Parabolic trajectory of $J_{t+1} - J_t$ (or $A\beta^2 + B\beta$) versus $\beta$.

Hence

$$J_{t+1} - J_t = -\frac{\beta_t}{(PZ)^2}\,\mathrm{Tr}\big[E_t^T R^T R E_t\big] + \frac{\beta_t^2}{2(PZ)^3}\,\mathrm{Tr}\big[R^T R E_t E_t^T R^T R\big] = A\beta_t^2 + B\beta_t \qquad (18)$$

where

$$A = \frac{1}{2(PZ)^3}\,\mathrm{Tr}\big[R^T R E_t E_t^T R^T R\big] = \frac{1}{2(PZ)^3}\sum_{p=1}^{P}\sum_{z=1}^{Z}\left[\sum_{l=1}^{L} r_l^p(t)\sum_{i=1}^{P} r_l^i(t)\big(y_z^i - d_z^i\big)\right]^2 \qquad (19)$$

$$B = -\frac{1}{(PZ)^2}\,\mathrm{Tr}\big[E_t^T R^T R E_t\big] = -\frac{1}{(PZ)^2}\sum_{p=1}^{P}\sum_{z=1}^{Z}\big(y_z^p - d_z^p\big)\left[\sum_{l=1}^{L} r_l^p(t)\sum_{i=1}^{P} r_l^i(t)\big(y_z^i - d_z^i\big)\right]. \qquad (20)$$

It is obvious that (19) and (20) are quadratic forms; therefore $A$ must be greater than zero and $B$ must be less than zero. Therefore we require

$$J_{t+1} - J_t = A\beta^2 + B\beta < 0.$$

Fig. 2 shows the parabolic trajectory of $A\beta^2 + B\beta$ versus $\beta$. In order to satisfy (17), we must have $A\beta^2 + B\beta < 0$. Since $A > 0$, it is obvious that the stable range of $\beta$ is $(\beta_l, \beta_u)$, where $\beta_l$ and $\beta_u$ are the two roots of $A\beta^2 + B\beta = 0$. From Fig. 2, we also know that the optimal $\beta\,(=\beta_{\mathrm{opt}})$ is the midpoint of $\beta_l$ and $\beta_u$, i.e., when

$$\beta_{\mathrm{opt}} = (\beta_u + \beta_l)/2 \qquad (21)$$

$A\beta_{\mathrm{opt}}^2 + B\beta_{\mathrm{opt}}$ is at its minimum. This is due to the symmetry of the parabola in Fig. 2. The $\beta_{\mathrm{opt}}$ not only guarantees the stability of the training process, but also gives the fastest speed of convergence. Q.E.D.
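The quantities in Theorem 1 can be evaluated directly from $R$, $W$, and $D$. The sketch below does so with NumPy; it is a minimal illustration under the shapes defined in (2), (7), and (9), not the authors' implementation.

```python
import numpy as np

def optimal_learning_rate(R, W, D):
    """Stable range and optimal learning rate of Theorem 1 (a sketch).

    A and B follow eqs. (19)-(20); the stable range of the learning rate is
    the interval between the two roots of A*beta^2 + B*beta = 0, and the
    optimal rate is its midpoint, eq. (21).
    """
    P, Z = D.shape
    E = R.T @ W - D                                      # error matrix, eq. (12)
    M = R.T @ R                                          # R^T R, shape (P, P)
    A = np.trace(M @ E @ E.T @ M) / (2 * (P * Z) ** 3)   # eq. (19), A > 0
    B = -np.trace(E.T @ M @ E) / (P * Z) ** 2            # eq. (20), B < 0
    beta_l, beta_u = 0.0, -B / A                         # roots of A*beta^2 + B*beta = 0
    beta_opt = 0.5 * (beta_l + beta_u)                   # eq. (21): midpoint of the parabola
    return A, B, (beta_l, beta_u), beta_opt
```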

By inspecting (19) and (20), it is obvious that the stable range of $\beta$ is a function of $r$, $d$, and $W$. Theorem 2 shows that the stable learning rate must be positive in the two-layer NN with a set of fixed training vectors.

Theorem 2: For the two-layer NN defined in Fig. 1, the stable learning rate $\beta$ must be positive, i.e., $\beta > 0$.

Proof: From Theorem 1, we know that $A > 0$ and $B < 0$. Therefore $A\beta^2 + B\beta < 0$ implies $B\beta < -A\beta^2 < 0$. Since $B < 0$, we have the end result $\beta > 0$. Q.E.D.

Algorithm I shows the overall computational aspects of the back-propagation training process for the above two-layer NN.

Fig. 3. Two-layer NN with three inputs and two outputs.

Algorithm I: Dynamic Optimal Learning Rates for a Two-Layer NN

Step 1: Given the initial weighting matrix $W_0$, training matrix $R$, and desired output matrix $D$, find the initial actual output matrix $Y_0$ by (10) and the optimal learning rate $\beta_0$ (Theorem 1).

Step 2: Start the back-propagation training process. Set the iteration count $t = 0$.

Step 3: Check whether $D$ and $Y_t$ (10) are close enough. If yes, GOTO Step 7.

Step 4: Update the weighting matrix to yield $W_{t+1}$ by (15).

Step 5: Find the optimal learning rate $\beta_{t+1}$ (Theorem 1) for the next iteration.

Step 6: $t = t + 1$. GOTO Step 3.

Step 7: End.
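Putting Theorem 1 and Algorithm I together, a minimal training loop might look as follows; it reuses the two helper sketches above and treats the stopping test of Step 3 as a simple threshold on $J$, which is an assumption on our part.

```python
def train_two_layer_nn(R, W0, D, tol=1e-6, max_iter=1000):
    """A sketch of Algorithm I: back propagation with the dynamic optimal
    learning rate of Theorem 1 recomputed at every iteration."""
    P, Z = D.shape
    W = W0.copy()
    for _ in range(max_iter):
        _, E, J = forward_error(R, W, D)                      # Step 3: eqs. (10)-(13)
        if J < tol:                                           # D and Y_t close enough
            break
        _, _, _, beta_opt = optimal_learning_rate(R, W, D)    # Steps 1/5: Theorem 1
        W = W - (beta_opt / (P * Z)) * (R @ E)                # Step 4: eq. (15)
    _, _, J = forward_error(R, W, D)
    return W, J
```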

The following Example 1 illustrates the major concept in this section.

Example 1: Fig. 3 shows a two-layer NN with three inputs and two outputs.

Given the input training matrix $R$ and desired output matrix $D$ (defined in (7) and (9)) as

$$R = [r_1\ r_2\ r_3\ r_4] = \begin{bmatrix} -3.0852 & 1.0449 & 2.9027 & 5.0642\\ -4.1030 & -4.3199 & 0.5842 & 1.4118\\ -5.0811 & 6.3161 & -0.9816 & 1.2853 \end{bmatrix}_{L(=3)\times P(=4)}$$

$$D = \begin{bmatrix} -0.9346 & -0.0882\\ 1.0108 & 0.1857\\ 0.0664 & -0.9783\\ 0.4995 & -1.3264 \end{bmatrix}_{P(=4)\times Z(=2)}.$$

The initial weighting matrix $W_i$ is chosen to be

$$W_i = \begin{bmatrix} -0.0531 & 0.1050\\ -1.7333 & 1.3398\\ -0.9498 & -1.2728 \end{bmatrix}_{L(=3)\times Z(=2)}.$$

The initial $J$ is 28.1832. After 30 iterations, the stable range of the learning rate for each iteration can be found from (14)–(20) and is listed in Table I.

After finding the stable range for each iteration, we choose $0.5\,\beta_u$ to be the actual learning rate for that iteration and perform the update of the weighting matrix $W$. Fig. 4 shows the trajectory of the total squared error $J$. It is obvious that the total squared error decreases as expected.

The final weighting matrix $W_f$ is

$$W_f = \begin{bmatrix} 0.0657 & -0.3169\\ -0.0039 & 0.0852\\ 0.1461 & 0.1396 \end{bmatrix}_{L(=3)\times Z(=2)}.$$


Fig. 4. Total squared error $J$ versus iteration $t$.

In the end, we have

$$D = \begin{bmatrix} -0.9346 & -0.0882\\ 1.0108 & 0.1857\\ 0.0664 & -0.9783\\ 0.4995 & -1.3264 \end{bmatrix} \approx R^T W_f = \begin{bmatrix} -0.9295 & -0.0814\\ 1.0083 & 0.1823\\ 0.0452 & -1.0071\\ 0.5153 & -1.3050 \end{bmatrix}.$$
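For illustration, the matrices of Example 1 can be fed straight into the Algorithm I sketch above; the numbers are transcribed from the example, and the call is meant only to reproduce the qualitative behaviour (a monotonically decreasing $J$), since the example itself uses $0.5\,\beta_u$ rather than $\beta_{\mathrm{opt}}$.

```python
import numpy as np

# Values transcribed from Example 1 (L = 3, P = 4, Z = 2).
R = np.array([[-3.0852,  1.0449,  2.9027,  5.0642],
              [-4.1030, -4.3199,  0.5842,  1.4118],
              [-5.0811,  6.3161, -0.9816,  1.2853]])
D = np.array([[-0.9346, -0.0882],
              [ 1.0108,  0.1857],
              [ 0.0664, -0.9783],
              [ 0.4995, -1.3264]])
W0 = np.array([[-0.0531,  0.1050],
               [-1.7333,  1.3398],
               [-0.9498, -1.2728]])

Wf, J_final = train_two_layer_nn(R, W0, D, max_iter=30)
print(J_final)   # J starts near 28.18 and decreases monotonically
```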

III. FNN WITH DYNAMIC STABLE LEARNING RATE

The FNN in Fig. 5 was proposed in [4] for the control of a model car to follow a specified path, but without the stability analysis of learning rates. Here we adopt the identical structure shown in [4] but replace the B-spline membership functions with Gaussian membership functions. Fig. 5 contains the premise part and the consequent part, each with its own learning rate. The learning rate in the premise part is used to fine-tune the Gaussian membership functions, whereas the learning rate in the consequent part is used to adjust the weighting factors. Fig. 6 redraws the consequent part of Fig. 5, which clearly shows that the two-layer NN in Fig. 1 is the consequent part of Fig. 5. The stability analysis in Section II will be used to analyze the stability of the FNN, and a more efficient GA is then devised in Section IV to tune this FNN.

The reasoning rule can be established as follows:

Rule $l$: If $x_1$ is $F_1^l$ and $\cdots$ and $x_N$ is $F_N^l$, then $y_1$ is $w_1^l$ and $\cdots$ and $y_Z$ is $w_Z^l$

where $l = 1, 2, \ldots, L$, the $F_q^l$'s are fuzzy membership functions of the antecedent part, and $w_z^l \in W$ are neural network weights of the consequent part. The $F_q^l$'s, which are Gaussian functions, and the truth value $\mu_l$ are

$$F_q^l(x_q) = \exp\left[-\left(\frac{x_q - m_q^l}{\sigma_q^l}\right)^2\right] \qquad (22)$$

Fig. 5. Proposed FNN in this paper.

Fig. 6. Another look at the consequent part in Fig. 5.

$$r_l = \mu_l = \prod_{i=1}^{N} F_i^l(x_i) \qquad (23)$$

where $\mu_l$ is the truth value of the premise of the $l$th rule. The output $y_z$ of the fuzzy reasoning can be derived from the following equation:

$$y_z = a/b, \qquad a = \sum_{i=1}^{L} w_z^i \mu_i, \qquad b = \sum_{i=1}^{L} \mu_i \qquad (z = 1, 2, \ldots, Z), \qquad y = [y_1\ y_2\ \cdots\ y_Z]^T. \qquad (24)$$
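A compact sketch of the fuzzy reasoning in (22)–(24) for a single input vector is given below; the array shapes and the symbols $m$ and $s$ for the Gaussian centers and widths are our own conventions.

```python
import numpy as np

def fnn_forward(x, m, s, W):
    """A sketch of the fuzzy reasoning of eqs. (22)-(24).

    x : (N,)   crisp input vector
    m : (L, N) Gaussian centers of the premise part, one row per rule
    s : (L, N) Gaussian widths of the premise part
    W : (L, Z) consequent weights w_z^l
    """
    F = np.exp(-((x - m) / s) ** 2)      # eq. (22): membership grades F_q^l(x_q)
    mu = F.prod(axis=1)                  # eq. (23): firing strength of each rule
    b = mu.sum()                         # eq. (24): normalizer b
    y = (W.T @ mu) / b                   # eq. (24): y_z = sum_l w_z^l * mu_l / b
    return y, mu, b
```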

By adjusting the weighting factors and the parameters of the Gaussian functions of the neural network, the learning algorithm can be derived to minimize the total squared error $J$ defined in (11). To update the center $m_i^l$, we use

$$m_i^l(t+1) = m_i^l(t) - \alpha_t \left.\frac{\partial J}{\partial m_i^l}\right|_t. \qquad (25)$$

Using the chain rule, we get

$$m_i^l(t+1) = m_i^l(t) - \frac{\alpha_t}{PZ}\sum_{p=1}^{P}\sum_{z=1}^{Z}\big(y_z^p - d_z^p\big)\,\frac{\big(w_z^l - y_z^p\big)}{b}\,\mu_l\,\frac{\big(x_i^p - m_i^l(t)\big)}{\big(\sigma_i^l(t)\big)^2} \qquad (26)$$

where $\alpha_t$ is the current learning rate for tuning the premise part. Again, using a similar method, we have the following for $\sigma_i^l$ and $w_z^l$:

$$\sigma_i^l(t+1) = \sigma_i^l(t) - \frac{\alpha_t}{PZ}\sum_{p=1}^{P}\sum_{z=1}^{Z}\big(y_z^p - d_z^p\big)\,\frac{\big(w_z^l - y_z^p\big)}{b}\,\mu_l\,\frac{\big(x_i^p - m_i^l(t)\big)^2}{\big(\sigma_i^l(t)\big)^3} \qquad (27)$$

$$w_z^l(t+1) = w_z^l(t) - \frac{\beta_t}{PZ}\sum_{p=1}^{P}\big(y_z^p - d_z^p\big)\,\frac{\mu_l}{b} \qquad (28)$$


Fig. 7. Training process of the proposed FNN.

where $\beta_t$ is the current learning rate for tuning $w_z^l$ and $b$ is defined in (24). Hence the input matrix $R$ of the consequent part, i.e., the two-layer NN, becomes

$$R = \begin{bmatrix} \dfrac{\mu_1^1}{\sum_{i=1}^{L}\mu_i^1} & \dfrac{\mu_1^2}{\sum_{i=1}^{L}\mu_i^2} & \cdots & \dfrac{\mu_1^P}{\sum_{i=1}^{L}\mu_i^P}\\[2mm] \dfrac{\mu_2^1}{\sum_{i=1}^{L}\mu_i^1} & \dfrac{\mu_2^2}{\sum_{i=1}^{L}\mu_i^2} & \cdots & \dfrac{\mu_2^P}{\sum_{i=1}^{L}\mu_i^P}\\[2mm] \vdots & \vdots & \ddots & \vdots\\[2mm] \dfrac{\mu_L^1}{\sum_{i=1}^{L}\mu_i^1} & \dfrac{\mu_L^2}{\sum_{i=1}^{L}\mu_i^2} & \cdots & \dfrac{\mu_L^P}{\sum_{i=1}^{L}\mu_i^P} \end{bmatrix} \in \mathbb{R}^{L\times P}. \qquad (29)$$

For each iteration during the back propagation training process of the premise part (with a chosen learning rate), we can have the above R matrix for the consequent part. Then we can apply the results of Theorems 1 and 2 to find the dynamic optimal learning rates for all the iterations during the training process of the consequent part. The following Fig. 7 shows the proposed training process of the whole FNN in Fig. 5.
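The premise-part update (26)–(27) and the construction of $R$ in (29) can be sketched as follows, evaluating $\mu_l$ and $b$ per training pattern; the factor handling follows the equations as written above, and the helper reuses the fnn_forward sketch.

```python
import numpy as np

def premise_update_and_R(X, D, m, s, W, alpha):
    """A sketch of the premise-part gradient step (25)-(27) and of building
    the consequent input matrix R of eq. (29).

    X : (P, N) training inputs, D : (P, Z) desired outputs,
    m, s : (L, N) Gaussian centers/widths, W : (L, Z) weights, alpha : premise rate.
    """
    P, Z = D.shape
    L, N = m.shape
    grad_m = np.zeros_like(m)
    grad_s = np.zeros_like(s)
    R = np.zeros((L, P))
    for p in range(P):
        y, mu, b = fnn_forward(X[p], m, s, W)          # eqs. (22)-(24)
        R[:, p] = mu / b                               # column p of eq. (29)
        err = y - D[p]                                 # (y_z^p - d_z^p), z = 1..Z
        for l in range(L):
            c = np.dot(err, W[l] - y) * mu[l] / b      # common factor in (26)-(27)
            grad_m[l] += c * (X[p] - m[l]) / s[l] ** 2
            grad_s[l] += c * (X[p] - m[l]) ** 2 / s[l] ** 3
    m_new = m - alpha / (P * Z) * grad_m               # eq. (26)
    s_new = s - alpha / (P * Z) * grad_s               # eq. (27)
    return m_new, s_new, R
```

Once $R$ is formed, the weight update (28) is exactly the element-wise form of (15), so the optimal $\beta$ of Theorem 1 can be obtained from the optimal_learning_rate sketch of Section II.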

The number of iterations $M$ in the consequent part of Fig. 7 depends upon the convergence rate set by the designer. In order to find the optimal learning rate of the premise part, we rely on a genetic search algorithm. The following section explains the details of the proposed genetic search algorithm based on Fig. 7.

IV. TUNING FNN USING A GENETIC ALGORITHM

GAs are iterative search algorithms based on an analogy with the process of natural selection (Darwinism) and evolutionary genetics. The main goal is to search for a solution that optimizes a user-defined function called the fitness function. To perform this task, a GA maintains a population, or gene pool, of Pop_size randomly encoded chromosomes (individuals, or solution candidates) for each generation $t$. Each chromosome is selected randomly following a uniform distribution over the search space and can be a binary string or a real value. It represents a potential solution to the problem at hand and is evaluated. Then, a new population (generation $t+1$) is formed by selecting the fitter chromosomes. Some members of the new population undergo transformation by means of genetic operators to form new solutions. After some generations, it is hoped that the best chromosome represents a near-optimal solution.

There are three operators: selection, crossover, and mutation. Selection decides which of the chromosomes in a population are chosen for further genetic operations. Each chromosome $i$ in a population is assigned a fitness value $\varphi_i$. The fitness values are used to assign a selection probability $p_i$ to each chromosome, defined as

$$p_i = \frac{\varphi_i}{\sum_{k=1}^{Pop\_size} \varphi_k}. \qquad (30)$$

A chromosome with a larger fitness value has a larger probability of being selected. The crossover operation combines the features of two parent chromosomes to form two similar offspring by swapping corresponding segments of the parents. The parameters defining the crossover operation are the probability of crossover $P_c$ and the crossover position. Mutation is a process of occasional alteration of some gene values in a chromosome by a random change with a probability less than the mutation rate $P_m$.

GAs [10] can be used either to maximize or to minimize a function. In our application, the error function $J$ needs to be scaled and transformed into another function to meet the fitness-evaluation requirement. For a given $J$, written as $J = \varepsilon \times 10^{\eta}$ with $1 \le \varepsilon < 10$, the fitness function $\varphi(J)$ is defined as [8]

$$\varphi(J) = \begin{cases} -\eta + 1 - \dfrac{\varepsilon}{10}, & \text{if } \eta < 0\\[2mm] \dfrac{2 - \varepsilon/10}{10^{\eta+1}}, & \text{if } \eta \ge 0. \end{cases} \qquad (31)$$

Equation (31) assigns a larger fitness value to a smaller $J$. In other words, a larger value of $J$ is mapped to a smaller fitness value, and vice versa. For example, if $J$ is 0.007, then $\eta = -3$ and (31) yields a fitness value of 3.3. If $J$ is 10238, then $\eta = 4$ and (31) maps it to $1.8976\times 10^{-5}$.
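A small sketch of the fitness mapping (31) is given below; the branch expressions are written so as to reproduce the two numerical examples quoted above and should be read as a reconstruction rather than a verbatim transcription of [8].

```python
import math

def fitness(J):
    """Fitness mapping in the spirit of eq. (31): smaller J gives larger fitness."""
    eta = math.floor(math.log10(J))      # J = eps * 10^eta with 1 <= eps < 10
    eps = J / 10.0 ** eta
    if eta < 0:
        return -eta + 1.0 - eps / 10.0
    return (2.0 - eps / 10.0) / 10.0 ** (eta + 1)

print(round(fitness(0.007), 4))   # 3.3, as quoted in the text
print(fitness(10238))             # about 1.8976e-05, as quoted in the text
```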

Following the training process explained in Fig. 7, we start with an initial learning rate $\alpha_0$ in the premise part and proceed to train the NN with the dynamic optimal rates obtained from Theorems 1 and 2 in the consequent part. By choosing the optimal $\beta_{\mathrm{opt}}$ in each iteration of the NN training process, the total squared error $J$ can be found for this initial $\alpha_0$. The search must then be continued to yield the optimal $\alpha_{\mathrm{opt}}$ such that the total squared error $J$ is a minimum. It is obvious that we only have to search for $\alpha_{\mathrm{opt}}$ in the FNN; the determination of $\beta_{\mathrm{opt}}$ follows from Theorems 1 and 2. Otherwise, an FNN with two learning rates to be searched for by GAs [9] would require much more searching time. The overall search algorithm, which summarizes the whole concept, is listed below.

Algorithm II: Tuning of FNN via Genetic Algorithm

Step 1: Initialize the weighting matrix $W$ randomly. Initialize the centers $m$'s and the widths $\sigma$'s. Set values for Iteration, $\alpha_l$, $\alpha_u$, Pop_size, Max_gen, and Threshold.

Step 2:
For t = 1 : Iteration
    Initialize the population $Pop = \{\alpha_i\}$, $\alpha_i \in (\alpha_l, \alpha_u)$, $i = 1, \ldots,$ Pop_size.
    For generation = 1 : Max_gen    % GA
        For i = 1 : Pop_size
            Get the $i$th $\alpha$.
            Compute the $i$th centers $m(t+1)$'s by (26) and the $i$th widths $\sigma(t+1)$'s by (27).
            Establish a new matrix $R$ by (29).
            While $|J_{t,\min} - J_{t-2,\min}|/J_{t-2,\min} >$ Threshold    % update $W_{t+1}$
                Compute the matrix $E$ by (12). Compute $A$ by (19) and $B$ by (20).
                Compute the $i$th $\beta_{\mathrm{opt}}$ by (21), the $i$th $W_{t+1}$ by (28), and the $i$th $J_{t+1,\min}$ by (11).
            End
            Put the $i$th $J_{t+1,\min}$ into the fitness vector.
        End
        Perform selection, crossover, and mutation.    % for the next generation
    End
    The optimal $\alpha_{\mathrm{opt}}$ is found.
    For the premise part: the centers $m(t+1)$, widths $\sigma(t+1)$, and matrix $R$ are found.
    For the consequent part: $(\beta_l = 0, \beta_u)$ of $\beta$, $\beta_{\mathrm{opt}}$, $W_{t+1}$, and $J_{t+1,\min}$ are found.
End

Fig. 8. Diagram of simulated truck and loading zone.
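To make the outer search of Algorithm II concrete, the sketch below evolves only the premise learning rate $\alpha$; the inner consequent-part training is abstracted into an assumed callback evaluate_J(alpha), and real-valued chromosomes with arithmetic crossover stand in for the 16-bit encoding used later in Example 2.

```python
import numpy as np

def ga_search_alpha(evaluate_J, alpha_l, alpha_u,
                    pop_size=10, max_gen=20, p_c=0.8, p_m=0.05, seed=0):
    """A sketch of the outer GA loop of Algorithm II.

    evaluate_J(alpha) is an assumed callback that runs the inner loop
    (premise update, matrix R, Theorem 1 beta_opt, consequent update)
    and returns the resulting total squared error J.
    """
    rng = np.random.default_rng(seed)
    pop = rng.uniform(alpha_l, alpha_u, pop_size)
    best_alpha, best_J = pop[0], np.inf
    for _ in range(max_gen):
        J_vals = np.array([evaluate_J(a) for a in pop])
        fit = np.array([fitness(J) for J in J_vals])          # eq. (31)
        if J_vals.min() < best_J:
            best_J, best_alpha = J_vals.min(), pop[J_vals.argmin()]
        prob = fit / fit.sum()                                # eq. (30): selection probability
        pop = rng.choice(pop, size=pop_size, p=prob)          # roulette-wheel selection
        for i in range(0, pop_size - 1, 2):                   # arithmetic crossover
            if rng.random() < p_c:
                lam = rng.random()
                pop[i], pop[i + 1] = (lam * pop[i] + (1 - lam) * pop[i + 1],
                                      lam * pop[i + 1] + (1 - lam) * pop[i])
        mutate = rng.random(pop_size) < p_m                   # mutation
        pop[mutate] = rng.uniform(alpha_l, alpha_u, mutate.sum())
    return best_alpha, best_J
```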

The performance of the algorithm will be illustrated using three popular examples.

V. EXAMPLES

The applications of the above GAs will be fully illustrated in this section. Example 2 is the truck back up problem. Examples 3 and 4 are nonlinear system identifications.

Example 2: Truck Back Up Problem: The well-known problem of backing up a truck into a loading dock via an FNN controller [9], [13], [14] is considered in this section. The FNN in Fig. 5 is fully utilized and tuned by our GA. Fig. 8 shows the truck and loading zone. The truck is located by three state variables $x$, $y$, and $\phi$, where $\phi$ is the angle of the truck with the horizontal axis, $0 \le x \le 20$, and $-115^\circ \le \phi \le 295^\circ$. The steering angle $\theta$, which controls the truck, lies within $[-40^\circ, 40^\circ]$. The truck moves backward by a fixed unit distance at every step. Because we assume enough clearance between the truck and the loading zone, $y$ is not considered. We must first prepare many pairs of data for $x$, $\phi$, and $\theta$ as the training data such that the final state $(x_f, \phi_f)$ is equal or close to $(10, 90^\circ)$. In this simulation, we normalized $[-40^\circ, 40^\circ]$ into $[0, 1]$.

To cover the whole situation, the following 14 initial states are used to generate the desired input–output (I/O) pairs: $(x_0, \phi_0)$ = (1, 0), (1, 90), (1, −90), (7, 0), (7, 90), (7, 180), (7, −90), (13, 0), (13, 90), (13, 180), (13, 270), (19, 90), (19, 180), and (19, 270). Also, the following approximate kinematics are used:

$$x_{t+1} = x_t + \cos(\phi_t + \theta_t) + \sin(\theta_t)\sin(\phi_t) \qquad (32)$$
$$y_{t+1} = y_t + \sin(\phi_t + \theta_t) - \sin(\theta_t)\cos(\phi_t) \qquad (33)$$
$$\phi_{t+1} = \phi_t - \sin^{-1}\left[\frac{2\sin(\theta_t)}{l}\right] \qquad (34)$$

where $l$ is the length of the truck. In this simulation, we assumed $l = 4$. Equations (32)–(34) are used to obtain the next state when the present state and control are given. Since $y$ is not considered, (33) is not used.
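A minimal sketch of the approximate kinematics (32)–(34); the function name and the degree-to-radian handling are our own choices, and (33) is included only for completeness since $y$ is not used in this simulation.

```python
import numpy as np

def truck_step(x, y, phi_deg, theta_deg, l=4.0):
    """One backward step of the approximate truck kinematics (32)-(34).

    phi_deg is the truck angle and theta_deg the steering angle, both in
    degrees as in the example; l is the truck length (l = 4 here).
    """
    phi, theta = np.radians(phi_deg), np.radians(theta_deg)
    x_next = x + np.cos(phi + theta) + np.sin(theta) * np.sin(phi)     # eq. (32)
    y_next = y + np.sin(phi + theta) - np.sin(theta) * np.cos(phi)     # eq. (33), unused here
    phi_next = phi - np.arcsin(2.0 * np.sin(theta) / l)                # eq. (34)
    return x_next, y_next, np.degrees(phi_next)
```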

Further, we let the $T$'s be the fuzzy membership functions (Gaussian functions) of the steering angle $\theta$. The centers of $T_1$, $T_2$, $T_3$, $T_4$, $T_5$, $T_6$, and $T_7$ are $-40$, $-20$, $-7$, $0$, $7$, $20$, and $40$, respectively. Table II shows the initial fuzzy sets of $x$ ($Q_1$–$Q_5$), which are represented by the centers and widths of Gaussian functions. The centers and widths of the membership functions of $\phi$ ($R_1$–$R_7$) are listed in Table III. Table IV shows the fuzzy rules.

TABLE III
CENTER AND WIDTH FOR FUZZY SETS OF $\phi$

TABLE IV
FUZZY RULES

We use 16 bits to form the chromosome pattern. The chromosomes are mapped to real values in the range $(\alpha_l, \alpha_u)$. To increase efficiency, we define the mutation rate $P_m$ and crossover rate $P_c$ [9] as

$$P_m = \exp(0.05k/\mathrm{Max\_gen}) - 1 \qquad (35)$$
$$P_c = \exp(-k/\mathrm{Max\_gen}) \qquad (36)$$

where $k$ denotes the $k$th generation. Table V shows all the parameters of the GA process.
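The generation-dependent rates (35)–(36) are straightforward to compute; a small sketch follows.

```python
import math

def ga_rates(k, max_gen):
    """Generation-dependent mutation and crossover rates, eqs. (35)-(36)."""
    p_m = math.exp(0.05 * k / max_gen) - 1.0    # eq. (35): grows from 0 to about 0.051
    p_c = math.exp(-k / max_gen)                # eq. (36): decays from 1 to about 0.368
    return p_m, p_c
```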

The initial value of $J$ is 0.05176. After five iterations, we have an excellent result, as shown in Fig. 9. Fig. 9 also shows the performance comparison with other cases (b)–(d) in which the learning rates are fixed.

Tables II and III show that the final centers and widths of the membership functions have not changed much from the initial ones. The optimal learning rates $\alpha_{\mathrm{opt}}$, $\beta_{\mathrm{opt}}$, $(\beta_l, \beta_u)$, and $J$ of the five iterations are shown in Table VI.

From Table VI, the value of $\alpha_{\mathrm{opt}}$ is very close to one, which is $\alpha$'s upper bound $\alpha_u$, and $\beta_{\mathrm{opt}}$ is derived from (21). The final weighting factors of the fuzzy rules are shown in Table IV. During the simulation, the tolerated ranges of $x$ and $\phi$ are defined as [9.85, 10.15] and [89, 91], respectively. The truck trajectories of this simulation are shown in Figs. 10 and 11 with different initial positions (cases (a)–(j)).

Example 3: Nonlinear System Identification, Second-Order System: The plant to be identified is described by the second-order difference equation [14]–[17]

$$y(k+1) = g[y(k), y(k-1)] + u(k) \qquad (37)$$


TABLE V
PARAMETERS FOR GA

Fig. 9. Performance comparison for Example 2. Case a: our optimal $\alpha$, $\beta$; Case b: $\alpha = 0$, $\beta = 0.6$; Case c: $\alpha = 0.2$, $\beta = 0.7$; Case d: $\alpha = 0.9$, $\beta = 0.1$.

TABLE VI
LEARNING RATES $\alpha$, $\beta$, $(\beta_l, \beta_u)$, AND $J$

where

$$g[y(k), y(k-1)] = \frac{y(k)\,y(k-1)\,[y(k) + 2.5]}{1 + y^2(k) + y^2(k-1)}. \qquad (38)$$

A series–parallel FNN identifier [14], [15] described by the following equation

$$\hat{y}(k+1) = \hat{f}[y(k), y(k-1)] + u(k) \qquad (39)$$

will be adopted, where $\hat{f}[y(k), y(k-1)]$ is in the form of (24) with two fuzzy variables $y(k)$ and $y(k-1)$. Training data of 500 points are generated from the plant model, assuming a random input signal $u(k)$ uniformly distributed in the interval $[-2, 2]$. The data are used to build the fuzzy model for $\hat{f}$. The variables $y(k)$ and $y(k-1)$ are each allocated four fuzzy sets ($Q$'s and $R$'s in Tables VII and VIII, respectively), so $16\,(=4^2)$ fuzzy rules are required. The initial centers and widths of $y(k)$ and $y(k-1)$ are shown in Tables VII and VIII. The mutation rate $P_m$ in (35) and crossover rate $P_c$ in (36) are applied again. The parameters for the GA are listed in Table IX.
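One way to generate the 500 training points for this plant, following (37)–(38) with $u(k)$ uniform in $[-2, 2]$, is sketched below; the choice of regression target $y(k+1) - u(k)$ for $\hat{f}$ follows from the series-parallel form (39) and is our reading, not an explicit prescription of the paper.

```python
import numpy as np

def generate_example3_data(n=500, seed=0):
    """Generate training data for the second-order plant of eqs. (37)-(38)
    with u(k) uniformly distributed in [-2, 2]."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(-2.0, 2.0, n)
    y = np.zeros(n + 1)
    for k in range(1, n):
        g = y[k] * y[k - 1] * (y[k] + 2.5) / (1.0 + y[k] ** 2 + y[k - 1] ** 2)  # eq. (38)
        y[k + 1] = g + u[k]                                                     # eq. (37)
    # inputs to the fuzzy model f_hat are (y(k), y(k-1)); target is y(k+1) - u(k)
    X = np.column_stack([y[1:n], y[0:n - 1]])
    T = y[2:n + 1] - u[1:n]
    return X, T
```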

The initial value of $J$ is 3.2268. After 606 iterations, the trajectory of $J$ for the first 29 iterations versus iteration $t$ is shown in Fig. 12, which also shows the performance comparison with other cases (b)–(e).

After the training process is finished, the model is tested by applying a sinusoidal input signal $u(k) = \sin(2\pi k/25)$ to the fuzzy model. Fig. 13 shows the outputs of both the fuzzy model and the actual model. The total squared error $J$ using 120 data items is 0.0023. This example once again shows the tremendous effect of using optimal learning rates. The final centers and widths of the membership functions for $y(k)$ and $y(k-1)$ are also listed in Tables VII and VIII, respectively. Table X shows the fuzzy rules and the final weighting matrix.

Fig. 10. Truck trajectories using fuzzy neural network controller.

Fig. 11. Another five truck trajectories using fuzzy neural network controller.

TABLE VII
CENTER AND WIDTH FOR FUZZY SETS OF $y(k)$

TABLE VIII
CENTER AND WIDTH FOR FUZZY SETS OF $y(k-1)$

TABLE IX
PARAMETERS OF GA

The optimal learning rates $\alpha_{\mathrm{opt}}$, $\beta_{\mathrm{opt}}$, $(\beta_l, \beta_u)$, and $J$ of certain iterations are shown in Table XI.


Fig. 12. Performance comparison for Example 3. Case a: our optimal $\alpha$, $\beta$; Case b: $\alpha = 0.5$, $\beta = 0.8$; Case c: $\alpha = 0.4$, $\beta = 0.4$; Case d: $\alpha = 0$, $\beta = 0.6$; Case e: $\alpha = 0.1$, $\beta = 0.2$.

Fig. 13. Outputs of the plant $y$ (solid line) and the identification model $\hat{y}$ (dashed line).

Example 4: Nonlinear System Identification, First-Order System: In Example 3, the input is seen to occur linearly in the difference equation describing the plant. In this example [15], the plant to be identified is of the following nonlinear form:

$$y(k+1) = g[y(k), u(k)] \qquad (40)$$

where the unknown function $g$ has the following nonlinear form:

$$g(x_1, x_2) = \frac{x_1}{1 + x_1^2} + x_2^3 \qquad (41)$$

and $u(k) = \sin(2\pi k/25) + \sin(2\pi k/10)$; $y(k)$ is distributed in the range $[-7.3713, 7.4410]$. The series–parallel identification model is

$$\hat{y}(k+1) = \hat{f}[y(k), u(k)] \qquad (42)$$

where $\hat{f}$ is in the form of (24) with two fuzzy variables $x_1$ and $x_2$. The fuzzy variables $x_1$ and $x_2$ are each defined to have five fuzzy sets, so $25\,(=5^2)$ fuzzy rules are required. Also, 60 training data items are generated for training purposes. The initial centers and widths of the two fuzzy variables $x_1$ and $x_2$ are shown in Tables XII and XIII. The mutation rate $P_m$ in (35) and crossover rate $P_c$ in (36) are applied again. The parameters for the GAs are listed in Table XIV.
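A corresponding sketch for the first-order plant (40)–(41) with the stated input $u(k)$ is given below; the pairing of $(y(k), u(k))$ with the target $y(k+1)$ follows the series-parallel model (42).

```python
import numpy as np

def generate_example4_data(n=60):
    """Generate training data for the first-order plant of eqs. (40)-(41)
    with u(k) = sin(2*pi*k/25) + sin(2*pi*k/10)."""
    k = np.arange(n + 1)
    u = np.sin(2 * np.pi * k / 25) + np.sin(2 * np.pi * k / 10)
    y = np.zeros(n + 1)
    for i in range(n):
        y[i + 1] = y[i] / (1.0 + y[i] ** 2) + u[i] ** 3        # eqs. (40)-(41)
    X = np.column_stack([y[:n], u[:n]])     # fuzzy variables x1 = y(k), x2 = u(k)
    T = y[1:n + 1]                          # target y(k+1) for the model f_hat of (42)
    return X, T
```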

After 589 iterations, the trajectory of $J$ for the first 29 iterations versus iteration $t$ is shown in Fig. 14, which also shows the performance comparison with other cases (b)–(e) in which the learning rates are fixed.

TABLE XI
LEARNING RATES $\alpha$, $\beta$, $(\beta_l, \beta_u)$, AND $J$

TABLE XII
CENTER AND WIDTH FOR FUZZY SETS OF $x_1$

TABLE XIII
CENTER AND WIDTH FOR FUZZY SETS OF $x_2$

Fig. 15 shows the outputs of the plant and the model after the identification procedure was terminated. After recall, the total squared error $J$ using 120 data items is 0.0131. The final centers and widths of the membership functions of $x_1$ and $x_2$ are listed in Tables XII and XIII, respectively. Table XV shows the fuzzy rules and the final weighting matrix.

The optimal learning rates $\alpha_{\mathrm{opt}}$, $\beta_u$, and $J$ of certain iterations are shown in Table XVI. Because $\beta_{\mathrm{opt}} = [\beta_l(=0) + \beta_u]/2$, we do not show the value of $\beta_{\mathrm{opt}}$ explicitly.

TABLE XIV
PARAMETERS FOR GA

Fig. 14. Performance comparison for Example 4. Case a: our optimal $\alpha$, $\beta$; Case b: $\alpha = 0.5$, $\beta = 0.8$; Case c: $\alpha = 0$, $\beta = 0.6$; Case d: $\alpha = 0.4$, $\beta = 0.4$; Case e: $\alpha = 0.1$, $\beta = 0.2$.

Fig. 15. Outputs of the plant $y$ (solid line) and the identification model $\hat{y}$ (dashed line).

TABLE XV
FUZZY RULES AND FINAL WEIGHTING FACTORS $w$

TABLE XVI
LEARNING RATES $\alpha$, $\beta$, $(\beta_l, \beta_u)$, AND $J$

VI. CONCLUSION

The stability analysis of a dynamic learning rate in a simple two-layer neural network is first explored in this paper. It has been found that the dynamic stable optimal learning rates, in the sense of maximum error reduction, must be positive. This result can be used in any dynamic FNN that includes the simple two-layer NN as its consequent part. In order to demonstrate this effectiveness, a genetic algorithm is devised to fully utilize this result to fine-tune the two learning rates in a dynamic FNN. In this case, we only have to search for the optimal learning rate in the premise part of the FNN; the other optimal learning rate, for the consequent part, can be determined immediately from the innovative approach proposed for the two-layer NN in this paper. Several popular examples are considered, and it has been found that in all the examples the FNNs are trained in a convergent way, as expected. Performance comparisons with different learning rates in all examples are also presented. It is believed that the new results presented in this paper can be applied to other applications that utilize dynamic FNNs that include two-layer NNs.

REFERENCES

[1] C. T. Lin and Y. C. Lu, “A neural fuzzy system with fuzzy supervised learning,” IEEE Trans. Syst., Man, Cybern. B, vol. 26, pp. 744–763, Oct. 1996.

[2] W. Y. Wang et al., “Function approximation using fuzzy neural networks with robust learning algorithm,” IEEE Trans. Syst., Man, Cybern. B, vol. 27, pp. 740–747, Aug. 1997.

[3] J. T. Spooner and K. M. Passino, “Stable adaptive control using fuzzy systems and neural networks,” IEEE Trans. Fuzzy Syst., vol. 4, pp. 339–359, Aug. 1996.

[4] C. H. Wang et al., “Fuzzy B-spline membership function (BMF) and its applications in fuzzy-neural control,” IEEE Trans. Syst., Man, Cybern., vol. 25, pp. 841–851, May 1995.

[5] X. H. Yu et al., “Dynamic learning rate optimization of the backpropa-gation algorithm,” IEEE Trans. Neural Networks, vol. 6, pp. 669–677, May 1995.

[6] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, 3rd ed. New York: Springer-Verlag, 1996.

[7] K. S. Tang et al., “Structured genetic algorithm for robust H∞ control systems design,” IEEE Trans. Ind. Electron., vol. 43, pp. 575–582, Oct. 1996.

[8] C.-C. Hsu et al., “Digital redesign of continuous systems with improved suitability using genetic algorithms,” IEE Electron. Lett., vol. 33, no. 15, pp. 1345–1347, July 1997.

[9] T. L. Seng et al., “Tuning of a neural-fuzzy controller by genetic algo-rithm,” IEEE Trans. Syst., Man, Cybern. B, vol. 29, pp. 226–236, Apr. 1999.

[10] J. H. Holland, Adaptation in Natural and Artificial Systems. Cambridge, MA: MIT Press, 1992.

[11] V. Solo and X. Kong, Adaptive Signal Processing Algorithms: Stability and Performance. Englewood Cliffs, NJ: Prentice-Hall, 1995.

[12] S. Horikawa et al., “On fuzzy modeling using fuzzy neural networks with back-propagation algorithm,” IEEE Trans. Neural Networks, vol. 3, pp. 801–806, Sept. 1992.

[13] B. Kosko, Neural Networks and Fuzzy Systems. Englewood Cliffs, NJ: Prentice-Hall, 1992.

[14] L. X. Wang, Adaptive Fuzzy Systems and Control: Design and Stability Analysis. Englewood Cliffs, NJ: Prentice-Hall, 1994.

[15] K. S. Narendra and K. Parthasarathy, “Identification and control of dy-namical systems using neural networks,” IEEE Trans. Neural Networks, vol. 1, pp. 4–26, Mar. 1990.

[16] S. Barada and H. Singh, “Generating optimal adaptive fuzzy-neural models of dynamical systems with applications to control,” IEEE Trans. Syst., Man, Cybern. B, vol. 28, pp. 371–390, Aug. 1998.

[17] W. A. Farag et al., “A genetic-based neural-fuzzy approach for modeling and control of dynamical systems,” IEEE Trans. Neural Networks, vol. 9, pp. 756–767, Sept. 1998.
