
We propose an improved TSK-type recurrent fuzzy network (ITRFN) for dynamic system identification. For the network structure, we extend the internal dynamics to be high-order and add adaptive parameters for tuning the membership functions of the internal variables. The network is constructed and trained concurrently through a learning algorithm that consists of two phases, i.e., structure learning and parameter learning. We propose a new incremental self-clustering method to initialize the network structure and weights in the structure learning phase. Our clustering method generates clusters that fit the real data distribution better than those of the original TSK-type recurrent fuzzy network (TRFN). Besides, we derive learning rules for refining the adaptive parameters of the network with a real-time recurrent learning algorithm.

2.1 Overview of ITRFN

In the following, the structure and operation of the ITRFN are introduced. For convenience, we consider a dynamic system with a single input x(t) and a single output y(t + 1); the extension to multiple inputs and multiple outputs is straightforward. Figure 10 shows the whole structure of the ITRFN. Two external inputs, x(t) and y(t), are fed into the network; we denote them as x1(t) and x2(t), respectively, in the following description. In total, the ITRFN contains five layers. Such a network realizes a recurrent fuzzy system with J fuzzy rules.

Each rule j is of the following form:

IF x1(t) IS µ1j AND x2(t) IS µ2j AND hj(t) IS Gj

where hj is an internal variable, aij and wijk are adaptive constant parameters, Pi and Qi are user-defined parameters which decide the order of the internal recurrency, and µij is a Gaussian membership function with mean mij and deviation σij, i.e.,

µij(xi(t)) = g(xi(t); mij, σij)

and Gj is a sigmoid function defined as

$$G_j(h_j(t)) = s(h_j(t); b_j, c_j) = \frac{1}{1 + e^{-b_j h_j(t) - c_j}} \qquad (80)$$

where bj and cj are two constant parameters.
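As a quick illustration, the two membership-function families can be written as small Python helpers. The explicit exponential form of the Gaussian below is an assumption (the text only writes g(·; mij, σij)), while the sigmoid follows Equation 80 directly.

```python
import numpy as np

def gaussian_mf(x, m, sigma):
    # Gaussian membership g(x; m, sigma); the exact exponent form is assumed here.
    return np.exp(-((x - m) ** 2) / (2.0 * sigma ** 2))

def sigmoid_mf(h, b, c):
    # Sigmoid membership G_j(h) = s(h; b, c) = 1 / (1 + exp(-b*h - c)), as in Equation 80.
    return 1.0 / (1.0 + np.exp(-b * h - c))
```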

The operation of the ITRFN is described as follows.

1. Layer 1. Layer 1 contains two nodes. Node i of this layer produces its output o(1)i(t) by transmitting its input signal xi(t) directly to layer 2, i.e.,

$$o^{(1)}_i(t) = x_i(t). \qquad (81)$$

2. Layer 2. Layer 2 contains J groups and each group contains three nodes. Two types of membership functions are used in this layer.

Table 24: Comparison of the efficiency of different clustering methods with different numbers of rules for the STOCK dataset of Experiment 2.3.

                 7 Rules                                 10 Rules
                 Training MSE   Testing MSE   Time       Training MSE   Testing MSE   Time
SVD-QR-CP        2.22×10−2      3.21×10−2     0.06       2.13×10−2      2.83×10−2     0.11
ACA              8.77×10−3      1.25×10−2     0.01       6.63×10−3      9.58×10−3     0.02
SCRG             7.52×10−3      8.26×10−3     0.01       5.45×10−3      6.64×10−3     0.03
MFC              5.58×10−3      5.79×10−3     0.14       4.57×10−3      5.06×10−3     0.16

Table 25: Comparison of the learning performance of different systems for the STOCK dataset of Experiment 2.3.

                 7 Rules                                          10 Rules
                 Training MSE   Testing MSE   Iters   Time        Training MSE   Testing MSE   Iters   Time
Yen's system     1.62×10−3      6.11×10−2     1       0.16        4.17×10−4      7.02×10−3     1       0.22
Juang's system   1.30×10−4      1.94×10−4     2011    193.50      1.00×10−4      2.28×10−4     2677    238.56
Lee's system     1.28×10−4      1.66×10−4     1035    88.35       1.08×10−4      1.62×10−4     2000    244.61
Our system       1.25×10−4      2.13×10−4     5       0.37        1.87×10−6      9.79×10−6     1       0.11

Table 26: Comparison of the efficiency of different clustering methods with different numbers of rules for the HOUSING dataset of Experiment 2.3.

                 8 Rules                                 12 Rules
                 Training MSE   Testing MSE   Time       Training MSE   Testing MSE   Time
SVD-QR-CP        2.77×10−2      2.62×10−2     0.44       2.76×10−2      2.64×10−2     0.55
ACA              1.98×10−2      2.08×10−2     0.10       1.31×10−2      1.74×10−2     0.09
SCRG             2.03×10−2      2.56×10−2     0.11       1.16×10−2      1.32×10−2     0.11
MFC              1.85×10−2      2.01×10−2     0.34       1.06×10−2      1.24×10−2     0.49

Table 27: Comparison of the learning performance of different systems for the HOUSING dataset of Experiment 2.3.

                 8 Rules                                          12 Rules
                 Training MSE   Testing MSE   Iters   Time        Training MSE   Testing MSE   Iters   Time
Yen's system     3.74×10−3      3.53×10−2     1       1.32        2.98×10−3      3.17×10−2     1       1.70
Juang's system   3.26×10−3      9.88×10−3     2478    280.47      2.50×10−3      1.26×10−2     2749    318.02
Lee's system     3.12×10−3      5.24×10−3     2000    212.42      2.77×10−3      8.94×10−3     2395    255.73
Our system       2.97×10−3      7.35×10−3     12      5.88        2.50×10−3      7.11×10−3     10      4.92

Figure 10: Architecture of the ITRFN.

For the external input xi(t), the corresponding node in group j of this layer produces its output, E.o(2)ij(t), by computing the value of the corresponding Gaussian function, i.e.,

$$E.o^{(2)}_{ij}(t) = \mu_{ij}\big(o^{(1)}_i(t)\big) = g\big(o^{(1)}_i(t); m_{ij}, \sigma_{ij}\big). \qquad (82)$$

For the internal variable hj(t), the corresponding node in group j of this layer produces its output, I.o(2)j(t), by computing the value of the corresponding sigmoid function, i.e.,

$$I.o^{(2)}_j(t) = G_j(h_j(t)) = s(h_j(t); b_j, c_j). \qquad (83)$$

3. Layer 3. Layer 3 contains J nodes. The output of node j in this layer, o(3)j(t), is the product of all its inputs from layer 2, i.e.,

$$o^{(3)}_j(t) = \prod_{i=1}^{2} E.o^{(2)}_{ij}(t) \times I.o^{(2)}_j(t). \qquad (84)$$

4. Layer 4. Layer 4 contains J nodes. Node j in this layer computes the centroid defuzzification result of the internal variable hj(t + 1); this value is produced as its output, o(4)j(t).

5. Layer 5. Layer 5 contains only one node, whose output, o(5)(t), represents the centroid defuzzification result of the network output y(t + 1).

Note that mij, σij, bj, cj, aij, and wijk are parameters that can be tuned to improve the performance of the network. We call these parameters the adaptive parameters of the network.
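To make the layer-by-layer operation concrete, the following Python sketch runs one forward step through Layers 1–3 and then combines the rule outputs with a TSK-style weighted average standing in for Layers 4 and 5. The linear consequent a0j + a1j x1(t) + a2j x2(t) and the omission of the internal-variable update are assumptions of this sketch, not the paper's exact equations.

```python
import numpy as np

def gaussian_mf(x, m, sigma):
    return np.exp(-((x - m) ** 2) / (2.0 * sigma ** 2))

def sigmoid_mf(h, b, c):
    return 1.0 / (1.0 + np.exp(-b * h - c))

def itrfn_forward(x, h, params):
    """One forward step through the five layers.

    x: external inputs [x1(t), x2(t)]; h: internal variables h_j(t), shape (J,).
    params: dict of numpy arrays -- "m", "sigma" of shape (2, J), "b", "c" of
    shape (J,), and "a" of shape (3, J) for the assumed linear consequents.
    """
    m, sigma, b, c, a = (params[k] for k in ("m", "sigma", "b", "c", "a"))
    # Layer 1: pass the external inputs through unchanged (Equation 81).
    o1 = np.asarray(x, dtype=float)
    # Layer 2: Gaussian memberships of the external inputs (Equation 82) and
    # sigmoid memberships of the internal variables (Equation 83).
    E_o2 = gaussian_mf(o1[:, None], m, sigma)      # shape (2, J)
    I_o2 = sigmoid_mf(h, b, c)                     # shape (J,)
    # Layer 3: firing strength of rule j = product of its layer-2 inputs (Equation 84).
    o3 = E_o2.prod(axis=0) * I_o2                  # shape (J,)
    # Layers 4-5 (assumed form): TSK-style weighted average of rule consequents.
    rule_out = a[0] + a[1] * o1[0] + a[2] * o1[1]  # assumed consequent per rule
    y_next = float(np.sum(o3 * rule_out) / (np.sum(o3) + 1e-12))
    return y_next, o3
```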

2.2 Learning of ITRFN

In the following, we develop a new learning algorithm for the ITRFN. There are two phases included in the learning algorithm, i.e., structure learning and parameter learning. In the structure learning phase, new fuzzy rules are generated incrementally, and the corresponding nodes and weights in the network structure are constructed and initialized. Then, the adaptive weights of the network are refined by a real-time recurrent learning algorithm in the parameter learning phase.

This two-phase learning scheme is performed for each incoming training pattern. We describe the two learning phases in detail as follows.

2.2.1 Structure learning phase

As mentioned earlier, fuzzy rules are generated incrementally by a self-clustering method in the structure learning phase. Also, the corresponding nodes and weights in the network structure are constructed and initialized.

We define a fuzzy cluster j as a pair (Gj(x), cj), where Gj(x) is the product of two one-dimensional Gaussian functions, i.e.,

$$G_j(\mathbf{x}) = \prod_{i=1}^{2} g(x_i; m_{ij}, \sigma_{ij}),$$

mj = [m1j, m2j] and σj = [σ1j, σ2j] are the mean vector and the deviation vector, respectively, and cj denotes the height of cluster j. Let J be the number of existing fuzzy clusters and Sj be the size of cluster j. Initially, J is 0. For an incoming training pattern t, (x(t), yd(t + 1)), where x(t) = [x1(t), x2(t)], we calculate Gj(x(t)) for each existing cluster j, 1 ≤ j ≤ J. We say that training pattern t passes the input-similarity test on cluster j if

$$G_j(x(t)) \ge \rho \qquad (90)$$

where ρ, 0 ≤ ρ ≤ 1, is a predefined threshold. Then we calculate

$$e_j(t) = |y_d(t+1) - c_j| \qquad (91)$$

for each cluster j on which training pattern t has passed the input-similarity test. Let d = qmax − qmin, where qmax and qmin are the maximum output and the minimum output, respectively, of the given data set. We say that training pattern t passes the output-similarity test on cluster j if

$$e_j(t) \le \tau d \qquad (92)$$

where τ, 0 ≤ τ ≤ 1, is another predefined threshold.

Two cases may occur. First, there are no existing fuzzy clusters on which training pattern t has passed both the input-similarity test and the output-similarity test. In this case, we assume that training pattern t is not close enough to any existing cluster, and a new fuzzy cluster k = J + 1 is created with

$$m_k = x(t), \quad \sigma_k = \sigma_0, \quad c_k = y_d(t+1) \qquad (93)$$

where σ0 = [σ0, σ0] is a user-defined constant vector. Note that the new cluster k contains only one member, training pattern t, at this time. Of course, the number of clusters is increased by 1 and the size of cluster k should be initialized, i.e.,

$$J = J + 1, \quad S_k = 1. \qquad (94)$$

Moreover, the corresponding fuzzy rule as in Equation 79 is also defined. mik and σik are initially set to the values in Equation 93. The initial values of bk and ck are assigned as 1 and 0, respectively. a0k is set to ck in Equation 93, and the other aij parameters are initially set to random values in [−0.05, 0.05]. The initial values of the wijk parameters are assigned as random values in [−1, 1]. Besides, the corresponding nodes and weights in the network structure are constructed and initialized with the above parameters.

On the other hand, if there are existing fuzzy clusters on which training pattern t has passed both the input-similarity test and the output-similarity test, let clusters j1, j2, . . . , jf be such clusters and let the cluster with the largest membership degree be cluster l, i.e.,

$$G_l(x(t)) = \max\big(G_{j_1}(x(t)), \ldots, G_{j_f}(x(t))\big). \qquad (95)$$

In this case, we assume that training pattern t is closest to cluster l, and cluster l should be modified to include training pattern t as its member. The modification to cluster l is as follows:

σil = ((Sl − 1)(σil − σ0)^2 + Sl mil^2 + · · ·

Note that J is not changed in this case.
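A sketch of one structure-learning step is given below. The input-similarity test (Equation 90), the output-similarity test (Equations 91–92), and the new-cluster initialization (Equations 93–94) follow the text, while the update of the winning cluster is reduced to a size increment plus a running-mean placeholder, which is an assumption rather than the paper's exact update.

```python
import numpy as np

def cluster_membership(x, m, sigma):
    # G_j(x): product of the two one-dimensional Gaussian memberships.
    return float(np.prod(np.exp(-((np.asarray(x) - m) ** 2) / (2.0 * sigma ** 2))))

def structure_learning_step(x, yd, clusters, rho, tau, d, sigma0):
    """One self-clustering step for the pattern (x(t), yd(t+1)).

    clusters: list of dicts with keys "m", "sigma", "c", "S".
    rho, tau: predefined thresholds; d = q_max - q_min over the data set.
    Returns the cluster list, the index of the affected cluster, and whether a
    new cluster (and hence a new fuzzy rule) was created.
    """
    passed = []
    for j, cl in enumerate(clusters):
        G = cluster_membership(x, cl["m"], cl["sigma"])
        if G >= rho and abs(yd - cl["c"]) <= tau * d:      # Equations 90-92
            passed.append((G, j))

    if not passed:
        # Case 1: no cluster is close enough, so create cluster k = J + 1
        # (Equations 93-94).  The corresponding fuzzy rule would also be created
        # here: b_k = 1, c_k = 0, a_0k = c_k, other a and w parameters random.
        clusters.append({"m": np.asarray(x, dtype=float),
                         "sigma": np.full(2, float(sigma0)),
                         "c": float(yd),
                         "S": 1})
        return clusters, len(clusters) - 1, True

    # Case 2: add the pattern to the cluster l with the largest membership (Equation 95).
    _, l = max(passed)
    cl = clusters[l]
    cl["S"] += 1
    # Placeholder update (assumption): running mean of the members; the paper
    # also updates the deviation vector of cluster l, which is not sketched here.
    cl["m"] = cl["m"] + (np.asarray(x, dtype=float) - cl["m"]) / cl["S"]
    return clusters, l, False
```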

2.2.2 Parameter learning phase

After the structure learning phase, the parameter learning phase is performed successively with the same incoming training pattern. As mentioned earlier, the objective of the parameter learning phase is to refine the adaptive parameters, including mij, σij, bj, cj, aij, and wijk, of the network through a learning algorithm. In this paper, we derive the learning algorithm based on real-time recurrent learning as follows.

The error function we want to minimize is

$$E(t+1) = \frac{1}{2}\big(y(t+1) - y_d(t+1)\big)^2 \qquad (100)$$

where y(t + 1) is the actual network output and yd(t + 1) is the desired output. The learning rule for aij is derived as

$$a_{ij}(t+1) = a_{ij}(t) - \eta\,\frac{\partial E(t+1)}{\partial a_{ij}}.$$

Note that η is a learning rate.

For the parameters mij and σij, we have learning rules of the same gradient-descent form,

$$\phi_{ij}(t+1) = \phi_{ij}(t) - \eta\,\frac{\partial E(t+1)}{\partial \phi_{ij}},$$

where the parameter φij represents either mij or σij. For the parameter wijk, the learning rule is derived in the same way. For the parameters bj and cj, we likewise have learning rules of this form, where the parameter ψj represents either bj or cj.
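All of these updates share the pattern θ ← θ − η ∂E(t+1)/∂θ. The sketch below applies that pattern with a numerical (central-difference) gradient in place of the analytic real-time recurrent learning derivatives; the numerical gradient and the parameter layout are assumptions used only for illustration.

```python
import numpy as np

def squared_error(y, yd):
    # E(t+1) = (1/2) * (y(t+1) - y_d(t+1))**2, as in Equation 100.
    return 0.5 * (y - yd) ** 2

def gradient_step(params, forward, x, h, yd, eta=0.01, eps=1e-6):
    """Apply theta <- theta - eta * dE/dtheta to every adaptive parameter.

    params: dict of numpy arrays; forward: a callable such as the itrfn_forward
    sketch above, returning (network output, firing strengths).  The derivative
    is approximated by central differences here; the paper derives the exact
    recurrent gradients analytically.
    """
    for value in params.values():
        flat = value.reshape(-1)
        for k in range(flat.size):
            original = flat[k]
            flat[k] = original + eps
            e_plus = squared_error(forward(x, h, params)[0], yd)
            flat[k] = original - eps
            e_minus = squared_error(forward(x, h, params)[0], yd)
            # Restore the parameter, then take the gradient-descent step.
            flat[k] = original - eta * (e_plus - e_minus) / (2.0 * eps)
    return params
```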

2.3 Experimental Results and Discussion

We demonstrate the performance of the ITRFN by showing the results of two experiments. A comparison between our system and the original TRFN is also presented.

2.3.1 Experiment 3.1

The first experiment concerns the identification of the following dynamic system [18] for a plant:

$$y_p(t+1) = f\big(y_p(t), y_p(t-1), y_p(t-2), u(t), u(t-1)\big) \qquad (118)$$

where

$$f(x_1, x_2, x_3, x_4, x_5) = \frac{x_1 x_2 x_3 x_5 (x_3 - 1) + x_4}{1 + x_3^2 + x_2^2}. \qquad (119)$$

Clearly, the present output of the plant depends on three past outputs and two past inputs.
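The plant of Equations 118 and 119 translates directly into a small Python helper (the variable names below are ours):

```python
def plant_exp31(y0, y1, y2, u0, u1):
    # y_p(t+1) = f(y_p(t), y_p(t-1), y_p(t-2), u(t), u(t-1)) with
    # f(x1,...,x5) = (x1*x2*x3*x5*(x3 - 1) + x4) / (1 + x3**2 + x2**2).
    return (y0 * y1 * y2 * u1 * (y2 - 1.0) + u0) / (1.0 + y2 ** 2 + y1 ** 2)
```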

As in [18], one training epoch consists of 900 time steps. The training input signal u(t) is an iid (independent and identically distributed) uniform sequence over [−2, 2] for about half of the 900 time steps and the sinusoid 1.05 sin(πt/45) for the remaining time steps. Two external inputs, the current state yp(t) and the control input u(t), are fed into the network. For the test data, the following input signal u(t), as used


Figure 12: Simulated output functions after ten training epochs for experiment 3.1. (a) By TRFN. (b) By ITRFN.

Table 28: Comparison of the TRFN and ITRFN for experiment 3.1 and experiment 3.2.

Experiment 3.1           TRFN       ITRFN
(P1, Q1, P2, Q2)         —          (1,1,1,1)   (2,2,2,2)   (2,3,2,3)
RMSE                     0.0346     0.024       0.020       0.015
time steps               9000       8000        6000        6000

Experiment 3.2           TRFN       ITRFN
(P1, Q1, P2, Q2)         —          (1,1,1,1)   (2,2,2,2)   (3,2,3,2)
RMSE                     0.0313     0.027       0.022       0.018
time steps               9000       8000        6000        6000


Figure 11: Clustering results after one training epoch for experiment 3.1. (a) By TRFN. (b) By ITRFN.

in [18], is adopted:

$$u(t) = \begin{cases} \sin(\pi t/25), & t < 250 \\ 1.0, & 250 \le t < 500 \\ -1.0, & 500 \le t < 750 \\ 0.3\sin(\pi t/25) + 0.1\sin(\pi t/32) + 0.6\sin(\pi t/10), & 750 \le t < 1000. \end{cases}$$
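In code, this piecewise test signal can be generated as follows (a direct transcription):

```python
import numpy as np

def test_input(t):
    # Piecewise test signal u(t) used for the 1000 test time steps.
    if t < 250:
        return np.sin(np.pi * t / 25.0)
    if t < 500:
        return 1.0
    if t < 750:
        return -1.0
    return (0.3 * np.sin(np.pi * t / 25.0)
            + 0.1 * np.sin(np.pi * t / 32.0)
            + 0.6 * np.sin(np.pi * t / 10.0))
```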

To demonstrate that the clustering method of the original TRFN is unreasonable for some input orders of the training data, Figure 11 shows the clustering results produced by the original TRFN and our ITRFN after one training epoch. From Figure 11(a), we can see that the two clusters generated by the TRFN are too big, and each of them covers a wide area in which no training data exists. On the other hand, the three clusters generated by the ITRFN are more reasonable, as shown in Figure 11(b). After training for ten epochs, the simulated output functions produced by the TRFN and the ITRFN are shown in Figure 12. Note that the solid curve denotes the actual output and the dotted curve denotes the output produced by the TRFN or the ITRFN. The original TRFN therefore takes more training time to meet the desired precision, or it may even fail to achieve the desired precision at all. The comparison between the two methods is shown in Table 28. Note that the number of fuzzy rules adopted in this experiment is three, RMSE denotes the root-mean-square error on the test data, and the number of time steps means the number of training time steps. We can see that the ITRFN takes fewer training time steps to achieve a higher precision.

2.3.2 Experiment 3.2

The dynamic system considered in the second experiment is the following [18]:

$$y_p(t+1) = 0.72\, y_p(t) + 0.025\, y_p(t-1)\, u(t-1) + 0.01\, u^2(t-2) + 0.2\, u(t-3). \qquad (120)$$

Clearly, the present output of the plant depends on two past outputs and four past inputs.
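Equation 120 likewise maps to a small helper (the variable names below are ours):

```python
def plant_exp32(y0, y1, u1, u2, u3):
    # y_p(t+1) = 0.72*y_p(t) + 0.025*y_p(t-1)*u(t-1) + 0.01*u(t-2)**2 + 0.2*u(t-3).
    return 0.72 * y0 + 0.025 * y1 * u1 + 0.01 * u2 ** 2 + 0.2 * u3
```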

We use the same training and test input signals u(t) adopted in experiment 3.1. A comparison between the TRFN and ITRFN is also shown in Table 28. Again, the ITRFN takes fewer training time steps to achieve a higher precision.
