www.elsevier.com/locate/fss

### Water bath temperature control with a neural fuzzy inference network

### Chin-Teng Lin∗, Chia-Feng Juang, Chung-Ping Li

Department of Electrical and Control Engineering, National Chiao-Tung University, Hsinchu, Taiwan, ROC

Received July 1996; received in revised form March 1998

Abstract

Although multilayered backpropagation neural networks (BPNN) have demonstrated high potential in the nonconventional branch of adaptive control, their long training time usually discourages their application in industry. Moreover, when they are trained on-line to adapt to plant variations, the overtuned phenomenon usually occurs. To overcome these weaknesses of the BPNN, we propose a neural fuzzy inference network (NFIN) suitable for adaptive control of practical plant systems in general, and for adaptive temperature control of a water bath system in particular. The NFIN is inherently a modified TSK (Takagi–Sugeno–Kang)-type fuzzy rule-based model possessing a neural network's learning ability. In contrast to general adaptive neural fuzzy networks, where the rules must be decided before parameter learning is performed, there are no rules initially in the NFIN. The rules in the NFIN are created and adapted as on-line learning proceeds via simultaneous structure and parameter identification. The NFIN has been applied to a water bath temperature control system. Compared to the BPNN under the same training procedure, the control results show that the NFIN not only greatly reduces the training time and avoids the overtuned phenomenon, but also has perfect regulation ability. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Neural fuzzy network; Backpropagation network; TSK fuzzy rules; Recursive least square; Structure/parameter learning; Similarity measure; Water bath temperature control

1. Introduction

The advent of fuzzy logic controllers (FLC) [4,8,9] and neural controllers based on multilayered neural networks [15,30] has opened new possibilities for realizing better and more efficient control. They offer a key advantage over traditional adaptive control systems [1,16]: they do not require mathematical models of the plants. In this paper, a neural fuzzy inference network (NFIN) is proposed to combine the advantages of fuzzy logic and neural networks. The NFIN is a fuzzy rule-based network possessing a neural network's learning ability. A major characteristic of the network is that no preassignment or design of the rules is required. The rules are constructed automatically during on-line operation.

∗ Corresponding author.

0165-0114/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved.
PII: S0165-0114(98)00075-X

Fig. 1. Fuzzy partition of a two-dimensional input space: (a) clustering-based partitioning; (b) proposed aligned clustering-based partitioning.

Two learning phases, the structure learning phase and the parameter learning phase, are adopted on-line for the construction task. One important task in the structure identification of the NFIN is the partition of the input space, which influences the number of fuzzy rules generated. Unlike the general clustering-based partitioning method [22,29], where the membership functions projected from different clusters may be very similar, as shown in Fig. 1(a), this paper proposes a novel on-line input space partitioning method which is an aligned clustering-based approach. This method can produce a partition result like the one shown in Fig. 1(b). Basically, it aligns the clusters formed in the input space, so it reduces not only the number of rules but also the number of membership functions under a prespecified accuracy requirement. The proposed method creates only the significant membership functions on the universe of discourse of each input variable based on a fuzzy measure. It can thus generate significant fuzzy rules from numerical data dynamically.

Another feature of the NFIN is that it can optimally determine the consequent part of fuzzy if–then rules during the structure learning phase. A fuzzy rule of the following form is adopted in our system initially,

$$\text{Rule } i:\ \text{IF } x_1 \text{ is } A_{i1} \text{ and } \ldots \text{ and } x_n \text{ is } A_{in} \text{ THEN } y \text{ is } m_i, \tag{1}$$

where $x_j$ and $y$ are the input and output variables, respectively, $A_{ij}$ is a fuzzy set, and $m_i$ is the position of a symmetric membership function of the output variable with its width neglected during the defuzzification process. This type of fuzzy rule is used as the main body of the NFIN. We call an NFIN consisting of such rules a basic NFIN. By monitoring the change of the network output error, additional terms (the linear terms used in the consequent part of the TSK model) will be included when necessary to further reduce the output error. If it is decided that some auxiliary terms should be added to the consequent part during the on-line learning process, a projection-based correlation measure will be performed on each rule to select the most significant terms to be incorporated into the rule. This consequent identification process is employed in conjunction with the precondition identification process to reduce both the number of rules and the number of consequent terms. For the parameter identification scheme, the consequent parameters are tuned by the recursive least-squares (RLS) algorithm, and the precondition parameters are tuned by the backpropagation learning algorithm. Both the structure and parameter learning are done simultaneously to achieve fast learning.

The proposed NFIN is used for temperature control in this paper. Temperature control is an important factor in many process control systems. If the temperature is too high or too low, the final product is seriously affected. Therefore, it is necessary to reach the desired temperature points quickly and avoid large overshoot. Since process control systems are often nonlinear and tend to change in an unpredictable way, it is not easy to control them accurately. To verify whether the NFIN has good control performance on the temperature control system, we compare it to the backpropagation neural network (BPNN) under the same training process via a water bath temperature control system. In the training process, we adopt both off-line and on-line training schemes. First, a network (BPNN or NFIN) is trained off-line to learn the inverse dynamics model of a plant, and then the network is configured as a feedforward network controller to the plant. Next, a conventional on-line training scheme is used to adapt the network to the practical environment.

This paper is organized as follows. In Section 2, the structure and the learning algorithm of the NFIN are proposed. In Section 3, the configuration of the NFIN-based control and the training process are introduced. In Section 4, the control results of using the NFIN on the water bath temperature control problem are presented. Conclusions are drawn in Section 5.

2. Neural fuzzy inference network (NFIN)

In this section, the structure of the NFIN as shown in Fig. 2 is introduced. This six-layered network realizes a fuzzy model of the following form

$$\text{Rule } i:\ \text{IF } x_1 \text{ is } A_{i1} \text{ and } \ldots \text{ and } x_n \text{ is } A_{in} \text{ THEN } y \text{ is } m_{i0} + a_{ij}x_j + \cdots,$$

where $A_{ij}$ is a fuzzy set, $m_{i0}$ is the center of a symmetric membership function on $y$, and $a_{ij}$ is a consequent parameter. It is noted that unlike the traditional TSK model [23,27], where all the input variables are used in the output linear equation, only the significant ones are used in the NFIN, i.e., some $a_{ij}$'s in the above fuzzy rules are zero. With this six-layered network structure of the NFIN, we shall define the function of each node of the NFIN in Section 2.1, and the learning algorithm of the NFIN in Section 2.2.

2.1. Structure of the NFIN

The NFIN consists of nodes, each of which has some finite fan-in of connections represented by weight values from other nodes and fan-out of connections to other nodes. Associated with the fan-in of a node is an integration function $f$ which serves to combine information, activation, or evidence from other nodes. This function provides the net input for this node,

$$\text{net-input} = f\bigl(u_1^{(k)}, u_2^{(k)}, \ldots, u_p^{(k)};\ w_1^{(k)}, w_2^{(k)}, \ldots, w_p^{(k)}\bigr),$$

where $u_1^{(k)}, u_2^{(k)}, \ldots, u_p^{(k)}$ are inputs to this node, and $w_1^{(k)}, w_2^{(k)}, \ldots, w_p^{(k)}$ are the associated link weights. The superscript $(k)$ in the above equation indicates the layer number. This notation will also be used in the following equations. A second action of each node is to produce an activation value as a function of its net-input,

$$\text{output} = o_i^{(k)} = a(\text{net-input}) = a(f),$$

where $a(\cdot)$ denotes the activation function. We shall next describe the functions of the nodes in each of the six layers of the NFIN.

Layer 1: No computation is done in this layer. Each node in this layer, which corresponds to one input variable, only transmits input values to the next layer directly. That is,

$$f = u_i^{(1)} \quad \text{and} \quad a^{(1)} = f. \tag{2}$$

From the above equation, the link weight in layer one ($w_i^{(1)}$) is unity.

Layer 2: Each node in this layer corresponds to one linguistic label (small, large, etc.) of one of the input variables in Layer 1. In other words, the membership value which specifies the degree to which an input value belongs to a fuzzy set is calculated in Layer 2. With the choice of Gaussian membership function, the operations performed in this layer are

$$f\bigl(u_{ij}^{(2)}\bigr) = -\frac{\bigl(u_i^{(2)} - m_{ij}\bigr)^2}{\sigma_{ij}^2} \quad \text{and} \quad a^{(2)}(f) = e^{f}, \tag{3}$$

where $m_{ij}$ and $\sigma_{ij}$ are, respectively, the center (or mean) and the width (or variance) of the Gaussian membership function of the $j$th term of the $i$th input variable $x_i$. Hence, the link weight in this layer can be interpreted as $m_{ij}$. Unlike other clustering-based partitioning methods, where each input variable has the same number of fuzzy sets, the number of fuzzy sets of each input variable is not necessarily identical in the NFIN.

Fig. 2. The structure of the neural fuzzy inference network (NFIN).

Layer 3: A node in this layer represents one fuzzy logic rule and performs precondition matching of a rule. Here, we use the following AND operation for each Layer-3 node,

$$f\bigl(u_i^{(3)}\bigr) = \prod_i u_i^{(3)} = e^{-[D_i(x - m_i)]^T [D_i(x - m_i)]} \quad \text{and} \quad a^{(3)}(f) = f, \tag{4}$$

where $n$ is the number of Layer 2 nodes participating in the IF part of the rule, $D_i = \mathrm{diag}(1/\sigma_{i1}, 1/\sigma_{i2}, \ldots, 1/\sigma_{in})$, and $m_i = (m_{i1}, m_{i2}, \ldots, m_{in})^T$. The link weight in Layer 3 ($w_i^{(3)}$) is then unity. The output $f$ of a Layer-3 node represents the firing strength of the corresponding fuzzy rule.

Layer 4: The number of nodes in this layer is equal to that in Layer 3, and the firing strength calculated in Layer 3 is normalized in this layer by

$$f\bigl(u_i^{(4)}\bigr) = \sum_i u_i^{(4)} \quad \text{and} \quad a^{(4)}(f) = u_i^{(4)}/f. \tag{5}$$

Like Layer 3, the link weight ($w_i^{(4)}$) in this layer is unity, too.

Layer 5: This layer is called the consequent layer. Two types of nodes are used in this layer, and they are denoted as blank and shaded circles in Fig. 2, respectively. The node denoted by a blank circle (blank node) is the essential node representing a fuzzy set (described by a Gaussian membership function) of the output variable. Only the center of each Gaussian membership function is delivered to the next layer for the LMOM (local mean of maximum) defuzzification operation [2], and the width is used for output clustering only. Different nodes in Layer 4 may be connected to the same blank node in Layer 5, meaning that the same consequent fuzzy set is specified for different rules. The function of the blank node is

$$f = \sum_i u_i^{(5)} \quad \text{and} \quad a^{(5)}(f) = f \cdot a_{0i}, \tag{6}$$

where $a_{0i} = m_{0i}$, the center of a Gaussian membership function. As to the shaded node, it is generated only when necessary. Each node in Layer 4 has its own corresponding shaded node in Layer 5. One of the inputs to a shaded node is the output delivered from Layer 4, and the other possible inputs (terms) are the input variables from Layer 1. The shaded node function is

$$f = \sum_j a_{ji} x_j \quad \text{and} \quad a^{(5)}(f) = f \cdot u_i^{(5)}, \tag{7}$$

where the summation is over the significant terms connected to the shaded node only, and $a_{ji}$ is the corresponding parameter. Combining these two types of nodes in Layer 5, we obtain the whole function performed by this layer as

$$a^{(5)}(f) = \Bigl(\sum_j a_{ji} x_j + a_{0i}\Bigr) u_i^{(5)}. \tag{8}$$

Layer 6: Each node in this layer corresponds to one output variable. The node integrates all the actions recommended by Layer 5 and acts as a defuzzifier with

$$f\bigl(u_i^{(6)}\bigr) = \sum_i u_i^{(6)} \quad \text{and} \quad a^{(6)}(f) = f. \tag{9}$$
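The six layer functions in Eqs. (2)–(9) can be chained into a single forward pass. The following is a minimal sketch in Python/NumPy, not the authors' implementation. It assumes a single output, that every rule stores its own row of centers and widths (so fuzzy sets shared between rules are not modeled), and that unused linear terms are represented by zero coefficients. The function name `nfin_forward` and all argument names are hypothetical.

```python
import numpy as np

def nfin_forward(x, centers, widths, a0, a_lin):
    """Forward pass of a single-output NFIN sketch.

    x       : (n,)   input vector
    centers : (c, n) Gaussian centers m_ij, one row per rule (assumed layout)
    widths  : (c, n) Gaussian widths sigma_ij, one row per rule
    a0      : (c,)   singleton consequent centers a_0i
    a_lin   : (c, n) linear consequent coefficients; zero means "term not used"
    """
    x = np.asarray(x, dtype=float)              # Layer 1: inputs pass through, Eq. (2)
    f2 = -((x - centers) / widths) ** 2         # Layer 2: exponent of each Gaussian, Eq. (3)
    firing = np.exp(f2.sum(axis=1))             # Layer 3: product of memberships, Eq. (4)
    normalized = firing / firing.sum()          # Layer 4: normalized firing strength, Eq. (5)
    consequents = a0 + a_lin @ x                # Layer 5: a_0i + sum_j a_ji x_j
    rule_outputs = consequents * normalized     # Layer 5: weighted by u_i^(5), Eq. (8)
    return rule_outputs.sum(), normalized       # Layer 6: sum of rule outputs, Eq. (9)

# Two rules placed symmetrically around the input fire equally,
# so the output is the average of the two singleton consequents.
y, w = nfin_forward([1.0, 1.0],
                    centers=np.array([[0.0, 0.0], [2.0, 2.0]]),
                    widths=np.ones((2, 2)),
                    a0=np.array([1.0, 3.0]),
                    a_lin=np.zeros((2, 2)))
```

The sketch does not guard against all firing strengths underflowing to zero for an input far from every rule; a practical version would need such a check.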

2.2. Learning algorithms for the NFIN

Two types of learning, structure and parameter learning, are used concurrently for constructing the NFIN. The structure learning includes both the precondition and consequent structure identification of a fuzzy if–then rule. Here the precondition structure identification corresponds to the input space partitioning and can be formulated as a combinatorial optimization problem with the following two objectives: to minimize the number of rules generated and to minimize the number of fuzzy sets on the universe of discourse of each input variable. As to the consequent structure identification, the main task is to decide when a new membership function is generated for the output variable and which significant terms (input variables) should be added to the consequent part (a linear equation) when necessary. For the parameter learning, the parameters of the linear equations in the consequent parts are adjusted by the RLS algorithm, and the parameters in the precondition part are adjusted by the backpropagation algorithm to minimize a given cost function. There are no rules (i.e., no nodes in the network except the input/output nodes) in the NFIN initially. They are created dynamically as learning proceeds upon receiving on-line incoming training data by performing the following learning processes simultaneously:

A. Input/output space partitioning.

B. Construction of fuzzy rules.

C. Optimal consequent structure identification.

D. Parameter identification.

In the above, processes A, B, and C belong to the structure learning phase and process D belongs to the parameter learning phase. The details of these learning processes are described in the rest of this section.

A. Input/output space partitioning

The way the input space is partitioned determines the number of rules extracted from the training data as well as the number of fuzzy sets on the universe of discourse of each input variable. For each incoming pattern $x$, the strength with which a rule is fired can be interpreted as the degree to which the incoming pattern belongs to the corresponding cluster. For computational efficiency, we can use the firing strength derived in Eq. (4) directly as this degree measure,

$$F^i(x) = \prod_i u_i^{(3)} = e^{-[D_i(x - m_i)]^T [D_i(x - m_i)]}, \tag{10}$$

where $F^i \in [0, 1]$. Using this measure, we can obtain the following criterion for the generation of a new fuzzy rule. Let $x(t)$ be the newly incoming pattern. Find

$$J = \arg\max_{1 \le j \le c(t)} F^j(x), \tag{11}$$

where $c(t)$ is the number of existing rules at time $t$. If $F^J \le \bar{F}(t)$, then a new rule is generated, where $\bar{F}(t) \in (0, 1)$ is a prespecified threshold that decays during the learning process. Once a new rule is generated, the next step is to assign initial centers and widths of the corresponding membership functions. According to the first-nearest-neighbor heuristic [10], we set

$$m_{(c(t)+1)} = x, \tag{12}$$

$$D_{(c(t)+1)} = -\frac{1}{\beta}\,\mathrm{diag}\!\left(\frac{1}{\ln(F^J)}, \ldots, \frac{1}{\ln(F^J)}\right), \tag{13}$$

where $\beta > 0$ decides the overlap degree between two clusters.

After a rule is generated, the next step is to decompose the multidimensional membership function formed in Eqs. (12) and (13) into the corresponding one-dimensional membership function for each input variable. For the Gaussian membership function used in the NFIN, the task can be easily done as

$$e^{-[D_i(x - m_i)]^T [D_i(x - m_i)]} = \prod_j e^{-(x_j - m_{ij})^2/\sigma_{ij}^2}, \tag{14}$$

where $m_{ij}$ and $\sigma_{ij}$ are, respectively, the projected center and width of the membership function in each input dimension. To reduce the number of fuzzy sets of each input variable and to avoid the existence of redundant ones, we should check the similarities between the newly projected membership function and the existing ones in each input dimension. Before going into the details of how this overall process works, let us consider the similarity measure first. Since bell-shaped membership functions are used in the NFIN, we use the formula of the similarity measure of two fuzzy sets with bell-shaped membership functions derived previously in [7,11–14].

Fig. 3. Flow chart of the algorithm for input/output space partitioning in NFIN.

Fig. 4. Flow chart of the algorithm for rule construction in NFIN.

Suppose the fuzzy sets to be measured are fuzzy sets $A$ and $B$ with membership functions $\mu_A(x) = \exp\{-(x - m_1)^2/\sigma_1^2\}$ and $\mu_B(x) = \exp\{-(x - m_2)^2/\sigma_2^2\}$, respectively. Assuming $m_1 \ge m_2$, we can compute $|A \cap B|$ by

$$|A \cap B| = \frac{1}{2}\,\frac{h^2\bigl(m_2 - m_1 + \sqrt{\pi}(\sigma_1 + \sigma_2)\bigr)}{\sqrt{\pi}(\sigma_1 + \sigma_2)} + \frac{1}{2}\,\frac{h^2\bigl(m_2 - m_1 + \sqrt{\pi}(\sigma_1 - \sigma_2)\bigr)}{\sqrt{\pi}(\sigma_2 - \sigma_1)} + \frac{1}{2}\,\frac{h^2\bigl(m_2 - m_1 - \sqrt{\pi}(\sigma_1 - \sigma_2)\bigr)}{\sqrt{\pi}(\sigma_1 - \sigma_2)}, \tag{15}$$

where $h(x) = \max\{0, x\}$. So the approximate similarity measure is

$$E(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{\sigma_1\sqrt{\pi} + \sigma_2\sqrt{\pi} - |A \cap B|}, \tag{16}$$

where we use the fact that $|A| + |B| = |A \cap B| + |A \cup B|$.
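The similarity measure of Eqs. (15) and (16) is easy to check numerically. The sketch below is illustrative only, and the function names are hypothetical. It takes the numerator of the third term of Eq. (15) as $m_2 - m_1 - \sqrt{\pi}(\sigma_1 - \sigma_2)$, which is the form that returns the area of the narrower set when one set is nested in the other. It also assumes that the second and third terms are dropped when $\sigma_1 = \sigma_2$, where they would otherwise evaluate to 0/0.

```python
import math

SQRT_PI = math.sqrt(math.pi)

def intersection_area(m1, s1, m2, s2):
    """Approximate |A ∩ B| for two Gaussian fuzzy sets, following Eq. (15)."""
    if m1 < m2:                                   # enforce the assumption m1 >= m2
        m1, s1, m2, s2 = m2, s2, m1, s1
    h = lambda v: max(0.0, v)
    d = m2 - m1
    area = 0.5 * h(d + SQRT_PI * (s1 + s2)) ** 2 / (SQRT_PI * (s1 + s2))
    if not math.isclose(s1, s2):                  # assumed guard: terms are 0/0 for equal widths
        area += 0.5 * h(d + SQRT_PI * (s1 - s2)) ** 2 / (SQRT_PI * (s2 - s1))
        area += 0.5 * h(d - SQRT_PI * (s1 - s2)) ** 2 / (SQRT_PI * (s1 - s2))
    return area

def similarity(m1, s1, m2, s2):
    """Approximate similarity E(A, B) = |A ∩ B| / |A ∪ B|, following Eq. (16)."""
    inter = intersection_area(m1, s1, m2, s2)
    union = SQRT_PI * (s1 + s2) - inter
    return inter / union

same = similarity(0.0, 1.0, 0.0, 1.0)      # identical sets
far = similarity(0.0, 1.0, 100.0, 1.0)     # non-overlapping sets
nested = similarity(0.0, 2.0, 0.0, 1.0)    # B nested inside A
```

Under these assumptions identical sets give a similarity of 1, well-separated sets give 0, and a set of width 1 nested in a concentric set of width 2 gives 0.5.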

Let $\mu(m_i, \sigma_i)$ represent the Gaussian membership function with center $m_i$ and width $\sigma_i$. The whole algorithm for the generation of new fuzzy rules as well as fuzzy sets in each input dimension is as follows, and the corresponding flowchart is shown in Fig. 3.

IF $x$ is the first incoming pattern THEN do

PART 1. { Generate a new rule with center $m_1 = x$ and width $D_1 = \mathrm{diag}(1/\sigma_{init}, \ldots, 1/\sigma_{init})$, where $\sigma_{init}$ is a prespecified constant.

After decomposition, we have $n$ one-dimensional membership functions, with $m_{1i} = x_i$ and $\sigma_{1i} = \sigma_{init}$, $i = 1, \ldots, n$. }

ELSE for each newly incoming $x$, do

PART 2. { Find $J = \arg\max_{1 \le j \le c(t)} F^j(x)$.

IF $F^J > \bar{F}_{in}(t)$, do nothing;

ELSE { set $c(t + 1) = c(t) + 1$ and generate a new fuzzy rule, with $m_{c(t+1)} = x$ and $D_{c(t+1)} = -\frac{1}{\beta}\,\mathrm{diag}(1/\ln(F^J), \ldots, 1/\ln(F^J))$.

After decomposition, we have $m_{new\text{-}i} = x_i$ and $\sigma_{new\text{-}i} = -\beta \ln(F^J)$, $i = 1, \ldots, n$.

Do the following fuzzy measure for each input variable $i$:

{ $degree(i, t) \equiv \max_{1 \le j \le k_i} E[\mu(m_{new\text{-}i}, \sigma_{new\text{-}i}), \mu(m_{ji}, \sigma_{ji})]$, where $k_i$ is the number of partitions of the $i$th input variable.

IF $degree(i, t) \le \rho(t)$, THEN adopt this new membership function, and set $k_i = k_i + 1$,

ELSE set the projected membership function as the closest one. } } }

In the above algorithm, $\rho(t)$ is a scalar similarity criterion which is monotonically decreasing such that higher similarity between two fuzzy sets is allowed in the initial stage of learning. For the output space partitioning, the same measure in Eq. (11) is used. Since the criterion for the generation of a new output cluster is related to the construction of a rule, we shall describe it together with the rule construction process in learning process B below.
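The rule-generation part of this algorithm (PART 1 and the first half of PART 2) can be sketched as follows. This is a simplified illustration rather than the full procedure: the per-dimension similarity check that merges a projected membership function with an existing one is omitted, so each rule keeps its own centers and widths. The function names, the array layout, and the guard against taking the logarithm of zero are assumptions of the sketch.

```python
import numpy as np

def firing_strengths(x, centers, widths):
    """Firing strength F^j(x) of every existing rule, as in Eq. (10)."""
    z = (np.asarray(x, dtype=float) - centers) / widths
    return np.exp(-np.sum(z ** 2, axis=1))

def add_rule_if_needed(x, centers, widths, f_in, beta, sigma_init):
    """Return (centers, widths, added) after presenting pattern x.

    centers, widths : (c, n) arrays, one row per rule (c may be 0)
    f_in            : current threshold in (0, 1)
    beta            : overlap parameter, beta > 0
    sigma_init      : width used for the very first rule
    """
    x = np.asarray(x, dtype=float)
    if centers.shape[0] == 0:                       # PART 1: first incoming pattern
        return x[None, :].copy(), np.full((1, x.size), float(sigma_init)), True
    f_best = firing_strengths(x, centers, widths).max()
    if f_best > f_in:                               # pattern is already covered
        return centers, widths, False
    f_best = max(f_best, np.finfo(float).tiny)      # assumed guard against log(0)
    sigma_new = -beta * np.log(f_best)              # width from Eq. (13)
    centers = np.vstack([centers, x])               # center from Eq. (12)
    widths = np.vstack([widths, np.full(x.size, sigma_new)])
    return centers, widths, True

c = np.empty((0, 2))
w = np.empty((0, 2))
c, w, first = add_rule_if_needed([0.0, 0.0], c, w, f_in=0.5, beta=0.5, sigma_init=1.0)
c, w, near = add_rule_if_needed([0.1, 0.0], c, w, f_in=0.5, beta=0.5, sigma_init=1.0)
c, w, distant = add_rule_if_needed([3.0, 0.0], c, w, f_in=0.5, beta=0.5, sigma_init=1.0)
```

In this usage the nearby pattern is absorbed by the first rule, while the distant pattern has firing strength $e^{-9}$ and therefore creates a second rule of width $-\beta \ln(e^{-9}) = 4.5$.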

B. Construction of fuzzy rules

As mentioned in learning process A, the generation of a new input cluster corresponds to the generation of a new fuzzy rule, with its precondition part constructed by the learning algorithm in Process A. At the same time, we have to decide the consequent part of the generated rule. Suppose a new input cluster is formed after the presentation of the current input–output training pair $(x, d)$; then the consequent part is constructed by the following algorithm (see the flowchart in Fig. 4):

IF there are no output clusters,

do { PART 1 in Process A, with $x$ replaced by $d$ }

ELSE

do { Find $J = \arg\max_j F^j(x)$.

IF $F^J > \bar{F}_{out}(t)$,

connect input cluster $c(t + 1)$ to the existing output cluster $J$,

ELSE

do the decomposition process in PART 2 of Process A, and

connect input cluster $c(t + 1)$ to the newly generated output cluster. }

The algorithm is based on the fact that different preconditions of different rules may be mapped to the same consequent fuzzy set. Since only the center of each output membership function is used for defuzzification, the consequent part of each rule may simply be regarded as a singleton. Compared to the general fuzzy rule-based models with singleton output, where each rule has its own individual singleton value, fewer parameters are needed in the consequent part of the NFIN, especially in the case with a large number of rules.

C. Optimal consequent structure identification

Up to now, the NFIN contains fuzzy rules in the form of Eq. (1). Even though such a basic NFIN can be used directly for system modeling, a large number of rules are necessary for modeling sophisticated systems under a tolerable modeling accuracy. To cope with this problem, we adopt the spirit of the TSK model [27] in the NFIN. In the TSK model, each consequent part is represented by a linear equation of the input variables. It is reported in [23] that the TSK model can model a sophisticated system using a few rules. However, even for the TSK model, if the dimension of the input or output space is high, the number of terms used in the linear equation is large even though some terms are in fact of little significance. Hence, instead of using the linear combination of all the input variables as the consequent part, only the most significant input variables are used as the consequent terms of the NFIN. The significant terms will be chosen and added to the network incrementally whenever the parameter learning cannot improve the network output accuracy any more during the on-line learning process.

In the choice of the significant terms participating in the consequent part, since the dependence between the candidates $u_i^{(5)} x_i$ and the desired output $y_m$ is linear ($y_m = \sum_i u_i^{(5)} (\sum_j a_{mji} x_j)$), we can consider the training sequences $u_i^{(5)} x_i(1), u_i^{(5)} x_i(2), \ldots, u_i^{(5)} x_i(t)$ and $y_m(1), y_m(2), \ldots, y_m(t)$ as vectors and find the correlation between $\hat{x}_i = u_i^{(5)} [x_i(1), \ldots, x_i(t)]^T$ and $y = [y_m(1), \ldots, y_m(t)]^T$. The correlation between two vectors $\hat{x}$ and $y$ is estimated by the cosine value of their angle $\theta$, $\mathrm{Deg} = \cos^2(\theta) = (\hat{x}^T y)^2 / \bigl((\hat{x}^T \hat{x})(y^T y)\bigr)$. If vectors $\hat{x}$ and $y$ are dependent then $\mathrm{Deg} = 1$; otherwise, if $\hat{x}$ and $y$ are orthogonal, then $\mathrm{Deg} = 0$. The main idea of the proposed choice scheme is as follows. Suppose we have chosen $k - 1$ vectors from $n$ candidates to form a space $P_{k-1} = p_1 \oplus p_2 \oplus \cdots \oplus p_{k-1}$. To find the next important vector from the remaining $n - k + 1$ vectors, we first project each of the remaining $n - k + 1$ vectors to the null space of $P_{k-1}$, find the correlation value $\mathrm{Deg}$ between the $n - k + 1$ projected vectors and $y$, then choose the maximum one, which is the $k$th important term of the $n$ candidates, and finally set $P_k = p_1 \oplus p_2 \oplus \cdots \oplus p_k$. Here, $p_1 = \hat{x}_0$ is the vector formed by the essential singleton values. To find the projected vector $p_k$, the Gram–Schmidt orthogonalization procedure [18] is adopted as

$$\alpha_{ik} = p_i^T \hat{x}_k / (p_i^T p_i), \tag{17}$$

$$p_k = \hat{x}_k - \sum_{i=1}^{k-1} \alpha_{ik}\, p_i. \tag{18}$$
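The selection scheme built on Eqs. (17) and (18) can be illustrated in batch form, with the full training sequences held in memory. The sketch below is a hypothetical helper, not the on-line algorithm of the paper. It assumes the candidate vectors are the columns of a matrix, that the target vector is nonzero, and that selection stops once the best remaining degree is at or below a tolerance `tol`.

```python
import numpy as np

def select_terms(candidates, y, p1, max_terms, tol=0.0):
    """Greedy projection-based selection of consequent terms.

    candidates : (t, n) array whose columns are the candidate vectors x̂_i
    y          : (t,)   desired output sequence (assumed nonzero)
    p1         : (t,)   vector of the essential singleton term, always in the basis
    Returns the indices of the selected columns, most significant first.
    """
    candidates = np.asarray(candidates, dtype=float)
    y = np.asarray(y, dtype=float)
    basis = [np.asarray(p1, dtype=float)]
    chosen = []
    remaining = list(range(candidates.shape[1]))
    for _ in range(min(max_terms, len(remaining))):
        best_i, best_deg, best_p = None, 0.0, None
        for i in remaining:
            x_hat = candidates[:, i]
            p = x_hat.copy()
            for b in basis:                        # Gram-Schmidt step, Eqs. (17)-(18)
                p -= (b @ x_hat) / (b @ b) * b
            norm2 = p @ p
            # squared cosine between the projected candidate and y
            deg = 0.0 if norm2 < 1e-12 else (p @ y) ** 2 / (norm2 * (y @ y))
            if best_i is None or deg > best_deg:
                best_i, best_deg, best_p = i, deg, p
        if best_deg <= tol:                        # nothing significant is left
            break
        chosen.append(best_i)
        basis.append(best_p)
        remaining.remove(best_i)
    return chosen

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
target = 1.0 + 3.0 * X[:, 1]                       # depends only on column 1
picked_one = select_terms(X, target, np.ones(200), max_terms=1)
picked_tol = select_terms(X, target, np.ones(200), max_terms=3, tol=0.5)
```

Because the target depends only on the constant term and on column 1, both calls select column 1 alone: once it is in the basis, the remaining projected candidates are orthogonal to the target up to rounding.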

If there are $c$ rules, then we have $cn$ candidate vectors, a large number that may lead to a high computation load in the calculation of the projected vectors above. To reduce the computation cost and to keep the parallel-processing advantage assumed in fuzzy rule-based systems, the terms in the consequent part are selected independently for each rule; i.e., the projection operation is done only for the $n$ vectors in each rule, not for other rules. This assumption is based upon the local property of a fuzzy rule-based system, so the vectors from different rules can be regarded as being orthogonal.

For on-line learning, to calculate the correlation degree, we have to store all the input/output sequences before these degrees are calculated. The memory required for every rule and output is of order $O(nt + Mt)$, where $n$ and $M$ are, respectively, the number of input and output variables, which is huge for large $t$. To cope with this problem, instead of storing the input–output sequences, we store the correlation values only. Let $C^j_{x_i y_m}$ denote the correlation between the sequences $u_j^{(5)} x_i$ and $y_m$, and $C^j_{x_i x_p}$ the correlation between the sequences $u_j^{(5)} x_i$ and $u_j^{(5)} x_p$. For each incoming datum, these values are calculated on-line, respectively, for each rule $j$ by

$$C^j_{x_i y_m}(t + 1) = C^j_{x_i y_m}(t) + \gamma(t)\bigl(u_j^{(5)}(t + 1)\, x_i(t + 1)\, y_m(t + 1) - C^j_{x_i y_m}(t)\bigr), \tag{19}$$

$$C^j_{x_i x_p}(t + 1) = C^j_{x_i x_p}(t) + \gamma(t)\bigl(u_j^{(5)}(t + 1)\, x_i(t + 1)\, u_j^{(5)}(t + 1)\, x_p(t + 1) - C^j_{x_i x_p}(t)\bigr), \tag{20}$$

where $i, p = 0, \ldots, n$, $m = 1, \ldots, M$, and $C^j_{x_i y_m}(0)$ and $C^j_{x_i x_p}(0)$ are equal to zero initially. For normal correlation computation, $\gamma(t) = 1/(t + 1)$ is used, but for computational efficiency and for a changing environment where the recent calculations dominate, a constant value, say $0 < \gamma < 1$, can be used. Using the stored correlation values in Eqs. (19) and (20), we can compute the correlation values and choose the significant ones. The algorithm for this computation is described as follows.

Projection-based Correlation Measure Algorithm. For each rule do

{

For $k = 1, \ldots, K$, where $K$ denotes the number of terms to be selected from the $n$ candidates.

For $i = 1, \ldots, n$; $i \ne i_1, \ldots, i_{k-1}$, where $i_1, \ldots, i_{k-1}$ denote the terms already selected.

Compute

$$\alpha(m, i) = \frac{A(m, i)}{B(m, m)}, \quad 0 \le m \le k - 1, \tag{21}$$

$$E(i, j) = C_{x_i y_j} - \sum_{m=0}^{k-1} \alpha(m, i)\, E(i_m, j), \tag{22}$$

$$G(i, i) = C_{x_i x_i} - 2 \sum_{m=0}^{k-1} \alpha(m, i)\, A(m, i) + \sum_{m=0}^{k-1} \sum_{q=0}^{k-1} \alpha(m, i)\, \alpha(q, i)\, B(m, q), \tag{23}$$

$$\mathrm{Deg}_k(i) = \frac{E^2(i, j)}{G(i, i)\, C_{y_j y_j}}, \tag{24}$$

where

$$A(m, i) = C_{x_{i_m} x_i} - \sum_{l=0}^{m-1} \alpha(l, i_m)\, A(l, i), \tag{25}$$

$$B(l, n) = \begin{cases} C_{x_0 x_0}, & l = n = 0, \\ G(i_l, i_l), & l = n \ne 0, \\ A(n, i_l) - \sum_{m=0}^{l-1} \alpha(m, i_l)\, B(m, n), & 0 \le n \le l - 1, \\ B(n, l), & 0 \le l \le n - 1. \end{cases} \tag{26}$$

Then find $i_k \in \{1, 2, \ldots, n\}$ such that

$$\mathrm{Deg}_k(i_k) = \max_{i = 1, \ldots, n;\ i \ne i_1, \ldots,\ i \ne i_{k-1}} \bigl(\mathrm{Deg}_k(i)\bigr). \tag{27}$$

}

The procedure is terminated at the $K$th step, when

$$\mathrm{Deg}_K(i_K) \le \delta, \tag{28}$$

where $0 \le \delta \le 1$ is the tolerable dependence degree, and $K$ terms are added to the consequent part of the rule. The consequent structure identification scheme in the NFIN is a kind of node growing method in neural networks. For a node growing method, in general there is a question of when to perform node growing. The criterion used in the NFIN is based on monitoring the learning curve. When the effect of parameter learning diminishes (i.e., the output error does not decrease over a period of time), it is time to apply the above algorithm to add additional terms to the consequent part.
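The running correlations of Eqs. (19) and (20) and the first selection step of the projection-based measure can be sketched for a single rule and a single output as follows. The class and method names are hypothetical. The sketch assumes that the index-0 candidate is a constant input $x_0 = 1$ standing for the singleton term, and that $C_{yy}$, which Eq. (24) needs but whose update is not written out in the text, is tracked with the same recursion.

```python
import numpy as np

class RuleCorrelations:
    """Running correlations for one rule and one output, Eqs. (19)-(20)."""

    def __init__(self, n_inputs):
        size = n_inputs + 1                  # index 0: assumed constant term x_0 = 1
        self.c_xy = np.zeros(size)           # C_{x_i y}
        self.c_xx = np.zeros((size, size))   # C_{x_i x_p}
        self.c_yy = 0.0                      # assumed: same recursion, used in Eq. (24)
        self.t = 0

    def update(self, u, x, y, gamma=None):
        """u: rule's normalized firing strength, x: inputs, y: desired output."""
        g = 1.0 / (self.t + 1) if gamma is None else gamma
        v = u * np.concatenate(([1.0], np.asarray(x, dtype=float)))
        self.c_xy += g * (v * y - self.c_xy)
        self.c_xx += g * (np.outer(v, v) - self.c_xx)
        self.c_yy += g * (y * y - self.c_yy)
        self.t += 1

    def first_degree(self, i):
        """Deg_1(i): Eqs. (21)-(24) when only the index-0 term is in the basis."""
        alpha = self.c_xx[0, i] / self.c_xx[0, 0]          # Eq. (21), A(0,i)/B(0,0)
        e = self.c_xy[i] - alpha * self.c_xy[0]            # Eq. (22)
        g = self.c_xx[i, i] - alpha * self.c_xx[0, i]      # Eq. (23), simplified for k = 1
        return e * e / (g * self.c_yy)                     # Eq. (24)

rng = np.random.default_rng(1)
U = rng.uniform(0.1, 1.0, size=40)
X = rng.normal(size=(40, 2))
Y = rng.normal(size=40)
rc = RuleCorrelations(n_inputs=2)
for u_t, x_t, y_t in zip(U, X, Y):
    rc.update(u_t, x_t, y_t)
```

With $\gamma(t) = 1/(t + 1)$ the stored values are exact running means, so the degree computed from them matches the value obtained by explicitly projecting the stored sequences, without keeping those sequences in memory.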

D. Parameter identification

After the network structure is adjusted according to the current training pattern, the network then enters the parameter identication phase to adjust the parameters of the network optimally based on the same training pattern. Notice that the following parameter learning is performed on the whole network after structure learning, no matter whether the nodes (links) are newly added or are existent originally. The idea of backpropagation is used for this supervised learning. Considering the single-output case for clarity, our goal is to minimize the error function

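$$E = \tfrac{1}{2}\bigl(y(t) - y^d(t)\bigr)^2, \tag{29}$$

where $y^d(t)$ is the desired output, and $y(t)$ is the current output. The parameters $a_{ji}$ in layer 5 are tuned by RLS as

$$P(t + 1) = \frac{1}{\lambda}\left[P(t) - \frac{P(t)\, u^{(5)}(t + 1)\, u^{(5)T}(t + 1)\, P(t)}{\lambda + u^{(5)T}(t + 1)\, P(t)\, u^{(5)}(t + 1)}\right], \tag{30}$$

$$a(t + 1) = a(t) + P(t + 1)\, u^{(5)}(t + 1)\bigl(y^d(t) - y(t)\bigr), \tag{31}$$

where $0 < \lambda \le 1$ is the forgetting factor, $u^{(5)}$ is the current input vector for nodes in layer 5, $a$ is the corresponding parameter vector, and $P$ is the covariance matrix. The initial parameter vector $a(0)$ is determined in the structure learning phase and $P(0) = \sigma I$, where $\sigma$ is a large positive constant. To cope with a changing environment, in general, $0.9 < \lambda < 1$ is used. Also, to avoid the unstable effect caused by a small $\lambda$, we may reset $P(t)$ as $P(t) = \sigma I$ after a period of learning. As to the free parameters $m_{ij}$ and $\sigma_{ij}$ of the input membership functions in layer 2, they are updated by the backpropagation algorithm. Using the chain rule, we have

$$\frac{\partial E}{\partial m_{ij}^{(2)}} = \frac{\partial E}{\partial y} \sum_k \frac{\partial y}{\partial a_k^{(3)}}\, \frac{\partial a_k^{(3)}}{\partial m_{ij}^{(2)}}, \tag{32}$$

where

$$\frac{\partial E}{\partial y} = y(t) - y^d(t), \tag{33}$$

$$\frac{\partial y}{\partial a_k^{(3)}} = \frac{a_k^{(5)} - y}{\sum_i a_i^{(3)}}, \tag{34}$$

$$\frac{\partial a_k^{(3)}}{\partial m_{ij}^{(2)}} = \begin{cases} a_k^{(2)}\, \dfrac{2(x_i - m_{ij})}{\sigma_{ij}^2} & \text{if term node } j \text{ is connected to rule node } k, \\ 0 & \text{otherwise,} \end{cases} \tag{35}$$

and $m_{ij}^{(2)}$ is updated by

$$m_{ij}^{(2)}(t + 1) = m_{ij}^{(2)}(t) - \eta\, \frac{\partial E}{\partial m_{ij}^{(2)}} \tag{36}$$

$$= m_{ij}^{(2)}(t) - \eta\, \bigl(y(t) - y^d(t)\bigr) \sum_k \frac{\partial y}{\partial a_k^{(3)}}\, \frac{\partial a_k^{(3)}}{\partial m_{ij}^{(2)}}. \tag{37}$$

Similarly, we have

$$\frac{\partial E}{\partial \sigma_{ij}^{(2)}} = \frac{\partial E}{\partial y} \sum_k \frac{\partial y}{\partial a_k^{(3)}}\, \frac{\partial a_k^{(3)}}{\partial \sigma_{ij}^{(2)}}, \tag{38}$$

where

$$\frac{\partial a_k^{(3)}}{\partial \sigma_{ij}^{(2)}} = \begin{cases} a_k^{(2)}\, \dfrac{2(x_i - m_{ij})^2}{\sigma_{ij}^3} & \text{if term node } j \text{ is connected to rule node } k, \\ 0 & \text{otherwise,} \end{cases} \tag{39}$$

and $\sigma_{ij}^{(2)}$ is updated by

$$\sigma_{ij}^{(2)}(t + 1) = \sigma_{ij}^{(2)}(t) - \eta\, \frac{\partial E}{\partial \sigma_{ij}^{(2)}}. \tag{40}$$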
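The RLS recursion of Eqs. (30) and (31) for the consequent parameters can be sketched in a few lines. This is an illustrative helper with hypothetical names, not the authors' code. It assumes the network output is linear in the consequent parameters, $y = a^T u^{(5)}$, that $u^{(5)}$ is a column vector (so the numerator of Eq. (30) is $P u u^T P$), and that the error is evaluated on the current sample.

```python
import numpy as np

def rls_step(a, P, u, y_desired, lam=0.99):
    """One recursive least-squares update.

    a         : (p,)   current consequent parameter vector
    P         : (p, p) current covariance matrix
    u         : (p,)   layer-5 input vector for this sample
    y_desired : float  desired output for this sample
    lam       : forgetting factor, 0 < lam <= 1
    """
    u = np.asarray(u, dtype=float)
    Pu = P @ u
    # Eq. (30): P <- (P - P u u^T P / (lam + u^T P u)) / lam
    P_new = (P - np.outer(Pu, u @ P) / (lam + u @ Pu)) / lam
    error = y_desired - a @ u                 # assumed linear-in-parameters output
    a_new = a + (P_new @ u) * error           # Eq. (31)
    return a_new, P_new

# Recover a known parameter vector from noise-free samples.
rng = np.random.default_rng(2)
a_true = np.array([2.0, -1.0, 0.5])
a_est = np.zeros(3)
P_est = 1e6 * np.eye(3)                       # P(0) = sigma * I with a large sigma
for _ in range(100):
    u_t = rng.normal(size=3)
    a_est, P_est = rls_step(a_est, P_est, u_t, a_true @ u_t, lam=1.0)
```

With a forgetting factor below 1 the same function tracks slowly varying parameters, at the cost of a covariance matrix that can grow when the inputs are not persistently exciting, which is one motivation for the periodic reset of $P(t)$ mentioned above.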

3. NFIN-based adaptive control

Consider a classical discrete-time single-input–single-output (SISO) plant

$$y_p(k + 1) = f[y_p(k), y_p(k - 1), \ldots, y_p(k - m + 1), u(k), u(k - 1), \ldots, u(k - n)], \tag{41}$$

where $u$ denotes the input, $y_p$ is the output, $k$ is the discrete-time index, $m, n \ge 0$ and $m, n \in Z$, and the function $f(\cdot)$ is $f: \Re^{m+n} \to \Re$. In many practical systems, the plant input is limited in magnitude, i.e., there exist $u_{\max}$ and $u_{\min}$ such that, for any $k$,

$$u_{\min} \le u(k) \le u_{\max}. \tag{42}$$

In this paper, the task is to control the plant described in Eq. (41) with respect to a specified reference output $y_{ref}(k)$ when there is no a priori knowledge regarding its dynamics, i.e., the function $f(\cdot)$ is unknown.

3.1. Network controller

Assume that the plant described in Eq. (41) is invertible, i.e., there exists a function $g(\cdot)$ such that

$$u(k) = g[y_p(k + 1), y_p(k), \ldots, y_p(k - m + 1), u(k - 1), \ldots, u(k - n)]. \tag{43}$$

Fig. 5. Schematic diagram of the network controller.

Consider a network (BPNN or NFIN) with input vector $I(k)$, single output $\hat{u}(k)$, and an input–output relationship represented by

$$\hat{u}(k) = N(I(k)), \tag{44}$$

where

$$I(k) = [y_p(k + 1), y_p(k), \ldots, y_p(k - \hat{m} + 1), u(k - 1), \ldots, u(k - \hat{n})]^T, \tag{45}$$

and the function $N(\cdot)$ denotes the input–output mapping of the network. If the output of $N(\cdot)$ approximates the output of $g(\cdot)$ for the same input, the network can be viewed as a controller in the feedforward control path. At any instant $k$, the control input to the plant can be obtained from Eq. (44) by setting the input vector as

$$I'(k) = [y_{ref}(k + 1), y_p(k), \ldots, y_p(k - \hat{m} + 1), u(k - 1), \ldots, u(k - \hat{n})]^T, \tag{46}$$

where the reference output $y_{ref}(k + 1)$ is used instead of the unknown $y_p(k + 1)$ in Eq. (45).

Therefore, if the network can be trained so that its output approximates the output of the inverse dynamics model of the plant described in Eq. (43) for the same input, it can be viewed as a controller, called a network controller. A schematic diagram of the network controller is shown in Fig. 5. Before a network controller is used, however, one must perform sufficient training on the network. In the following subsection, a training procedure for the network is presented.
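The only difference between the training input of Eq. (45) and the control input of Eq. (46) is the first element. The sketch below makes the indexing explicit. The function names are hypothetical, and it assumes the histories are stored so that `y_hist[j]` holds $y_p(j)$ and `u_hist[j]` holds $u(j)$, with enough past samples available.

```python
import numpy as np

def training_input(y_hist, u_hist, k, m_hat, n_hat):
    """I(k) of Eq. (45): [y_p(k+1), y_p(k), ..., y_p(k-m_hat+1), u(k-1), ..., u(k-n_hat)]."""
    ys = [y_hist[k + 1 - j] for j in range(m_hat + 1)]
    us = [u_hist[k - j] for j in range(1, n_hat + 1)]
    return np.array(ys + us, dtype=float)

def control_input(y_ref_next, y_hist, u_hist, k, m_hat, n_hat):
    """I'(k) of Eq. (46): the unknown y_p(k+1) is replaced by y_ref(k+1)."""
    ys = [y_hist[k - j] for j in range(m_hat)]
    us = [u_hist[k - j] for j in range(1, n_hat + 1)]
    return np.array([y_ref_next] + ys + us, dtype=float)

y_hist = [10.0, 11.0, 12.0, 13.0, 14.0]    # y_p(0) ... y_p(4)
u_hist = [0.0, 1.0, 2.0, 3.0, 4.0]         # u(0) ... u(4)
I_train = training_input(y_hist, u_hist, k=3, m_hat=2, n_hat=1)
I_ctrl = control_input(99.0, y_hist, u_hist, k=3, m_hat=2, n_hat=1)
```

Here `I_train` is `[14, 13, 12, 2]` and `I_ctrl` is `[99, 13, 12, 2]`: the two vectors share every element except the leading one, which is why a network trained on $I(k)$ can be driven with $I'(k)$ at control time.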

3.2. Off-line and on-line combined training procedure

In the NFIN, we adopt both off-line and on-line training schemes to train a network controller. For the off-line training, the general inverse-modeling learning scheme proposed by Psaltis et al. [20] is used. Off-line training requires the training patterns to be decided in advance. Ideally the training patterns are sufficient and proper; however, no procedure or rule for choosing them suits all cases. In some approaches [5,6], the training patterns are obtained by injecting specified input signals into the plant. In practical applications, such a technique is rather impractical, because we normally do not know what plant input u will cause the reference output yref. Therefore, in our scheme, the training patterns are obtained by probing the plant input with random signals. That is, a sequence of random input signals $u_{rd}(k)$ within the magnitude limits of the plant input is injected directly into the plant, and an open-loop input–output characteristic of the plant is obtained. According to this characteristic, proper training patterns are selected to cover the entire reference output space. Using the collected training patterns, with the values of the selected input variables as the input pattern and the corresponding control signal $u_{rd}(k)$ as the target pattern, the network is updated in a supervised manner to minimize an error function $E$ defined by

$$E = \sum_{k=1}^{k_n} \frac{1}{2}\,[u_{rd}(k) - \hat{u}(k)]^2, \tag{47}$$

where $k_n$ is the number of training patterns.

Fig. 6. Conventional on-line training scheme.
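The random-probing step and the error function of Eq. (47) can be sketched as follows. This is only an illustration under stated assumptions: it covers the first-order case in which each input pattern is `[y_p(k+1), y_p(k)]`, it keeps every probed sample instead of selecting a subset, and the names `collect_patterns`, `offline_error` and `plant_step` are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

# (input pattern, target control signal u_rd(k))
Pattern = Tuple[List[float], float]


def collect_patterns(plant_step: Callable[[float, float], float],
                     y_init: float, u_min: float, u_max: float,
                     steps: int, seed: int = 0) -> List[Pattern]:
    """Inject random inputs within [u_min, u_max] and record inverse-model patterns."""
    rng = random.Random(seed)
    patterns: List[Pattern] = []
    y = y_init
    for _ in range(steps):
        u_rd = rng.uniform(u_min, u_max)    # random probing signal
        y_next = plant_step(y, u_rd)        # open-loop plant response
        patterns.append(([y_next, y], u_rd))
        y = y_next
    return patterns


def offline_error(patterns: Sequence[Pattern],
                  controller: Callable[[List[float]], float]) -> float:
    """E = sum over patterns of 0.5 * (u_rd(k) - u_hat(k))^2, as in Eq. (47)."""
    return sum(0.5 * (u_rd - controller(x)) ** 2 for x, u_rd in patterns)
```

A controller that reproduces the inverse dynamics exactly drives `offline_error` to zero, which is the target of the off-line stage.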

After sufficient off-line training, the trained network is configured as a network controller for the plant, as shown in Fig. 5. Generally speaking, such a controller realizes fair but not perfect control. To improve control performance further, on-line training is usually required: while the trained network controls the plant, training patterns near the specific operating points are gathered to train the network further so that it suits the current control environment. On-line training is also necessary to adapt to parameter variations in the plant and changes in the environment. Hence, our training procedure also includes an on-line training scheme.

For the on-line training, a conventional on-line training scheme is used; Fig. 6 shows its block diagram. The scheme alternates between two phases, an application phase and a training phase. In the application phase, the switches S1 and S2 are connected to nodes 1 and 2, respectively, to form a control loop. In this loop, the control signal $u(k)$ is generated according to the input vector $I'(k) = [y_{ref}(k+1), y_p(k), \ldots, y_p(k-\hat{m}+1), u(k-1), \ldots, u(k-\hat{n})]^T$ (see Eq. (46)). In the training phase, the switches S1 and S2 are connected to nodes 3 and 4, respectively, to form a training loop. In this loop, we can define a training pattern with input vector $I(k) = [y_p(k+1), y_p(k), \ldots, y_p(k-\hat{m}+1), u(k-1), \ldots, u(k-\hat{n})]^T$ (see Eq. (45)) and desired output $u(k)$, where the input vector of the network controller is the same as that used in the off-line training scheme. With this training pattern, the network controller can be trained in a supervised manner to minimize the error function $E(k+1)$ defined by

$$E(k+1) = \frac{1}{2}\,[u(k) - \hat{u}(k)]^2, \tag{48}$$

where $\hat{u}(k)$ is the actual output of the network controller when it receives the input vector $I(k)$ in the training phase.
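The alternation between the application phase and the training phase can be sketched in Python as below. Several things here are assumptions for illustration only: the first-order input vectors `[y_ref(k+1), y_p(k)]` and `[y_p(k+1), y_p(k)]`, the clipping of the control signal to the plant limits, and the class `LinearController`, which is a hypothetical linear stand-in and is neither the NFIN nor the BPNN.

```python
from typing import Callable, List, Sequence


class LinearController:
    """Hypothetical stand-in for a trainable network controller."""

    def __init__(self, weights: Sequence[float], lr: float = 0.01) -> None:
        self.w = list(weights)
        self.lr = lr

    def predict(self, x: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def train(self, x: Sequence[float], target: float) -> None:
        # One gradient step on E = 0.5 * (target - u_hat)^2, cf. Eq. (48).
        err = target - self.predict(x)
        self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, x)]


def online_trial(controller: LinearController,
                 plant_step: Callable[[float, float], float],
                 y_ref_next: Sequence[float],
                 y_init: float, u_min: float, u_max: float) -> List[float]:
    """Run one trial; y_ref_next[k] is the reference for y_p(k+1)."""
    y = y_init
    outputs: List[float] = []
    for y_ref in y_ref_next:
        # Application phase: control signal from I'(k) = [y_ref(k+1), y_p(k)].
        u = controller.predict([y_ref, y])
        u = min(max(u, u_min), u_max)       # respect plant input limits (assumed)
        y_next = plant_step(y, u)
        # Training phase: pattern I(k) = [y_p(k+1), y_p(k)] with target u(k).
        controller.train([y_next, y], u)
        outputs.append(y_next)
        y = y_next
    return outputs
```

Each pass through the loop performs one control action followed by one weight update, so the controller keeps adapting while it is in use.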

In the next section, the off-line and on-line combined training procedure is used to train the BPNN and the proposed NFIN to control a water bath temperature control system.

4. NFIN for water bath temperature control

4.1. Problem statement

To see whether the proposed NFIN can achieve good performance and overcome the disadvantages of the BPNN, we compare it with the BPNN under the same aforementioned training procedure on a water bath temperature control system. Consider a discrete-time SISO temperature control system

$$y_p(k+1) = A(T_s)\,y_p(k) + \frac{B(T_s)}{1 + e^{\frac{1}{2} y_p(k) - \gamma}}\,u(k) + [1 - A(T_s)]\,y_0, \tag{49}$$

where

$$A(T_s) = e^{-aT_s}, \tag{50}$$

$$B(T_s) = \frac{b}{a}\,(1 - e^{-aT_s}). \tag{51}$$

The above equation models a real water bath temperature control system given in [25]. The parameters are set as $a = 1.00151 \times 10^{-4}$, $b = 8.67973 \times 10^{-3}$, $\gamma = 40.0$ and $y_0 = 25.0\,^{\circ}$C. The plant input $u(k)$ is limited between 0 and 5 V, and the sampling period is $T_s = 30$ s. The task is to control the water bath system to follow three set-points:

$$y_{ref}(k) = \begin{cases} 35\,^{\circ}\mathrm{C} & \text{for } k \le 40, \\ 55\,^{\circ}\mathrm{C} & \text{for } 40 < k \le 80, \\ 75\,^{\circ}\mathrm{C} & \text{for } 80 < k \le 120. \end{cases}$$
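A minimal simulation sketch of the plant in Eqs. (49) to (51) and of the set-point schedule is given below. The function and constant names are hypothetical. The constant in the exponent is written `GAMMA` (value 40.0), the input is clipped to the stated 0 to 5 V range inside the step function, and the sample k = 80 is assigned to the middle set-point; these choices are assumptions of the sketch.

```python
import math

A_COEF = 1.00151e-4   # parameter a
B_COEF = 8.67973e-3   # parameter b
GAMMA = 40.0          # constant in the exponent of Eq. (49)
Y0 = 25.0             # y0 in degrees Celsius
TS = 30.0             # sampling period in seconds


def water_bath_step(y: float, u: float, ts: float = TS) -> float:
    """Return y_p(k+1) from y_p(k) and u(k) according to Eq. (49)."""
    u = min(max(u, 0.0), 5.0)                       # plant input limits in volts
    a_ts = math.exp(-A_COEF * ts)                   # A(Ts), Eq. (50)
    b_ts = (B_COEF / A_COEF) * (1.0 - a_ts)         # B(Ts), Eq. (51)
    gain = b_ts / (1.0 + math.exp(0.5 * y - GAMMA))
    return a_ts * y + gain * u + (1.0 - a_ts) * Y0


def reference(k: int) -> float:
    """Set-point schedule for k = 1, ..., 120 (k = 80 taken as 55 degrees)."""
    if k <= 40:
        return 35.0
    if k <= 80:
        return 55.0
    return 75.0
```

With zero input the bath stays at `Y0` when it starts there, and a full 5 V input raises the temperature by roughly 1.3 °C per sampling period at low temperatures.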
4.2. Control performance

In implementing the off-line training scheme, a sequence of random input signals $u_{rd}(k)$ limited between 0 and 5 V is injected directly into the water bath system described in Eq. (49). An open-loop input–output characteristic of the system is then obtained, as shown in Fig. 7. The system behaves linearly up to about 70.0 °C, then becomes nonlinear and saturates at about 80.0 °C. From the input–output characteristic of the water bath system, 90 training patterns are selected to cover the entire reference output space. Comparing Eqs. (41) and (49), it is clear that $m = 1$ and $n = 0$ in Eq. (41). Hence, the input vector of the network controller is chosen with $\hat{m} = 1$ and $\hat{n} = 0$ in Eq. (45) to obtain a perfect match.
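One way to see where this behaviour comes from, using only Eq. (49) and the stated parameter values, is to look at the state-dependent gain that multiplies the input. The name $G(y)$ is introduced here for illustration and does not appear in the original; $\gamma$ denotes the constant 40.0 in the exponent.

$$G(y) = \frac{B(T_s)}{1 + e^{y/2 - \gamma}}, \qquad \gamma = 40.0,$$

$$G(70) = \frac{B(T_s)}{1 + e^{-5}} \approx 0.993\,B(T_s), \qquad G(80) = \frac{B(T_s)}{1 + e^{0}} = 0.5\,B(T_s), \qquad G(90) = \frac{B(T_s)}{1 + e^{5}} \approx 0.0067\,B(T_s).$$

The heating effect of $u(k)$ is therefore almost undiminished up to about 70 °C, is halved at 80 °C and is nearly lost beyond that, which is consistent with the characteristic described above.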

For the BPNN, a four-layer feedforward network with two hidden layers is used. The hidden and output nodes have hyperbolic tangent sigmoid activation functions. The choice of the number of hidden nodes is a fundamental question in applying the BPNN and is usually settled by trial and error. Therefore, to find a suitable number of hidden nodes, three networks, BPNN(5, 5), BPNN(10, 10) and BPNN(15, 15), are tried, where the notation BPNN(a, b) denotes that the numbers of nodes in the first and second hidden layers are a and b, respectively. To increase the convergence speed, a modified form of the generalized delta rule [17] is used. According to this form, the weights of the connections from the second hidden layer to the output layer are updated by

$$\Delta w_{iq}(k+1) = \eta\,\delta_{oi} Z_q + \alpha\,\Delta w_{iq}(k) + \beta\,\Delta w_{iq}(k-1), \tag{52}$$

where

$$\delta_{oi} = e\,a'(net_i), \tag{53}$$

and $Z_q$ is the output of the $q$th node in the second hidden layer. The parameter $e$ is the difference between the desired output and the actual output of the network, and $\eta$, $\alpha$ and $\beta$ are the learning rate, momentum and acceleration coefficients, respectively. The value $\delta_{oi}$ is the error signal and its double subscript indicates the $i$th node in the output layer. The value $net_i$ is the net input to node $i$ in the output layer, and $a'(net_i) = \partial a(net_i)/\partial net_i$.

Fig. 7. Input–output characteristic of the water bath temperature control system.

Fig. 8. Convergence curves of the control system through off-line training using (a) BPNN(5, 5), (b) BPNN(10, 10), (c) BPNN(15, 15) and (d) NFIN.

The weights of the connections from the first hidden layer to the second hidden layer are updated by

$$\Delta s_{qp}(k+1) = \eta\,\delta_{hq} L_p + \alpha\,\Delta s_{qp}(k) + \beta\,\Delta s_{qp}(k-1), \tag{54}$$

where

$$\delta_{hq} = a'(net_q) \sum_i \delta_{oi}\,w_{iq}, \tag{55}$$

and $L_p$ is the output of the $p$th node in the first hidden layer. The value $\delta_{hq}$ is the error signal of node $q$ in the second hidden layer. The value $net_q$ is the net input to node $q$ in the second hidden layer, and $a'(net_q) = \partial a(net_q)/\partial net_q$.

The weights of the connections from the input layer to the first hidden layer are updated by

$$\Delta v_{pj}(k+1) = \eta\,\delta_{hp} X_j + \alpha\,\Delta v_{pj}(k) + \beta\,\Delta v_{pj}(k-1), \tag{56}$$

where

$$\delta_{hp} = a'(net_p) \sum_q \delta_{hq}\,s_{qp}, \tag{57}$$

and $X_j$ is the output of the $j$th node in the input layer. The value $\delta_{hp}$ is the error signal of node $p$ in the first hidden layer. The value $net_p$ is the net input to node $p$ in the first hidden layer, and $a'(net_p) = \partial a(net_p)/\partial net_p$.

The learning rate, momentum and acceleration coefficients are set to 0.01, 0.9 and −0.1, respectively. The weights are initialized to small random values.
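The update rule with a learning rate, a momentum term and an acceleration term can be sketched for a single output-layer weight as follows. The function names are hypothetical, the symbols `eta`, `alpha` and `beta` stand for the learning rate, momentum and acceleration coefficients, and the activation is assumed to be the plain `tanh` function, so that its derivative is `1 - tanh(net)**2`; the exact scaling of the activation used in the experiments is not stated here.

```python
import math


def output_error_signal(e: float, net: float) -> float:
    """Error signal of an output node: e * a'(net), with a = tanh assumed."""
    return e * (1.0 - math.tanh(net) ** 2)


def weight_change(eta: float, alpha: float, beta: float,
                  delta: float, node_output: float,
                  dw_prev: float, dw_prev2: float) -> float:
    """Weight change at step k+1 from the error signal and the two previous changes.

    eta * delta * node_output : gradient term
    alpha * dw_prev           : momentum term (change made at step k)
    beta * dw_prev2           : acceleration term (change made at step k-1)
    """
    return eta * delta * node_output + alpha * dw_prev + beta * dw_prev2


# Example with the coefficient values quoted in the text (0.01, 0.9, -0.1).
delta_o = output_error_signal(e=0.5, net=0.2)
dw = weight_change(0.01, 0.9, -0.1, delta_o, node_output=0.3,
                   dw_prev=0.002, dw_prev2=0.001)
new_weight = 0.1 + dw   # the weight itself is then incremented by dw
```

The same `weight_change` function applies to the hidden-layer weights once the corresponding error signals have been back-propagated.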

For the NFIN, the learning parameters are set as = 0.005, = 0.7, Fin = 0.1, Fout = 0.7, = 0.4

Fig. 9. Regulation performance of the control system after off-line training using (a) NFIN and (b) BPNN(15, 15). (c) The corresponding errors of (a) and (b).

$u_{rd}(k)$, each of the NFIN and the three BPNNs described above is trained to minimize the error function $E$ defined as

$$E = \sum_{k=1}^{90} \frac{1}{2}\,[u_{rd}(k) - \hat{u}(k)]^2. \tag{58}$$

In the off-line training, the convergence curves of the NFIN and the three BPNNs are shown in Fig. 8. The curves show the sum of squared errors per iteration as a function of the number of iterations. Among the three BPNNs, the BPNN(15, 15) converges fastest. However, its convergence speed is still unsatisfactory: after 10 000 iterations, its error only reaches about 0.7031. The NFIN, as expected, converges faster than the BPNN(15, 15): its error reaches a relatively small value of about 0.6731 after only 5 iterations. After the off-line learning, the NFIN and the BPNN(15, 15) are each configured as a direct feedforward controller for the water bath system. The regulation performance of these two network controllers and the corresponding errors are shown in Figs. 9(a)–(c). The curves in Fig. 9(c) show the error between the reference output and the actual output of the control system. The NFIN controller achieves better performance than the BPNN(15, 15) controller. However, the errors of both are still too large, meaning that off-line training alone is not enough and on-line training is necessary to achieve high accuracy.

Fig. 10. Regulation performance of each trial of the control system through on-line training using (a) BPNN(15, 15) based on the conventional on-line training scheme, (b) BPNN(15, 15) based on the new on-line training scheme, and (c) NFIN based on the conventional on-line training scheme.

In the on-line training, the NFIN and BPNN(15, 15) controllers are trained by the conventional on-line training scheme shown in Fig. 6. Moreover, the new on-line training method that performs multiple updating operations during each sampling period [25] is also applied to the BPNN(15, 15) for comparison. In the new on-line training method, we choose an additional 20 adjacent training patterns per sampling period. To test the regulation performance in each trial, a performance index, the sum of absolute error (SAE), is defined by

$$SAE = \sum_{k} |y_{ref}(k) - y_p(k)|, \tag{59}$$

where $y_{ref}(k)$ and $y_p(k)$ are the reference output and the actual output of the control system, respectively. The performance index SAE is calculated for k ranging from 1 to 120, which is called a trial.
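The performance index of Eq. (59) is straightforward to compute from the recorded reference and output sequences of one trial. A minimal Python sketch, with the hypothetical function name `sae`, is:

```python
from typing import Sequence


def sae(y_ref: Sequence[float], y_p: Sequence[float]) -> float:
    """Sum of absolute error between reference and plant output over one trial."""
    if len(y_ref) != len(y_p):
        raise ValueError("reference and output sequences must have equal length")
    return sum(abs(r - y) for r, y in zip(y_ref, y_p))


# Toy three-sample example (values are illustrative, not from the experiments).
example = sae([35.0, 35.0, 55.0], [34.0, 36.0, 55.5])
```

In the setting described above, both sequences would hold the 120 samples of a trial.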

After 20 trials of on-line training, the regulation performance of each trial for the three cases is shown in Fig. 10, and the final regulation performance and the corresponding errors in the 20th trial are shown in Figs. 11(a)–(d). The curves in Fig. 10 show the sum of absolute error per trial as a function of the number of trials. For the BPNN(15, 15) based on the conventional on-line training scheme, as expected, its global learning ability leads to a serious overtuned phenomenon. The corresponding errors in Fig. 11(d) show that it regulates well only at the upper set-point and deteriorates at the others. Furthermore, its performance index after 20 trials of on-line training is larger than its initial value. For the BPNN(15, 15) based on the new on-line training scheme, although the method increases the on-line convergence speed and improves control performance as expected, the overtuned phenomenon still exists. The corresponding errors in Fig. 11(d) show that it improves the performance only at the upper and middle set-points, not at the lower set-point. Furthermore, the additional training on the adjacent patterns increases the computational load per sampling period. The NFIN based on the conventional on-line training scheme shows the highest on-line convergence speed and, owing to the local tuning property of fuzzy rule-based systems, shows no overtuned phenomenon after 20 trials of on-line training. According to the corresponding errors in Fig. 11(d), the NFIN regulates well at all set-points, with only small errors remaining. The NFIN also shows better regulation performance after 4 trials of on-line training than the other two cases after 20 trials, and its best performance is achieved after about 13 trials. According to the performance index, the NFIN shows the best regulation performance of the three cases over the whole process.

Fig. 11. Regulation performance of the control system in the 20th trial of on-line training using (a) BPNN(15, 15) based on the conventional on-line training scheme, (b) BPNN(15, 15) based on the new on-line training scheme, and (c) NFIN based on the conventional on-line training scheme. (d) The corresponding errors of (a), (b) and (c).

Fig. 12(a) shows the final assignment of fuzzy rules of the NFIN after 20 trials of on-line training in the [y(k), y(k + 1)] plane. The number of generated rules is 7, and the numbers of fuzzy sets on the y(k) and y(k + 1) dimensions are 4 and 4, respectively, as shown in Fig. 12(b). In total, the number of network structure parameters is 31, whereas that of the BPNN(15, 15) is 270.
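The figure of 270 parameters for the BPNN(15, 15) can be reproduced by counting connection weights, under the assumption that the network has two inputs (the first-order input vector with $\hat{m} = 1$ and $\hat{n} = 0$), one output, and no bias terms. This reading is an inference from the numbers and is not stated explicitly; the function name `count_weights` is hypothetical.

```python
from typing import Sequence


def count_weights(layer_sizes: Sequence[int]) -> int:
    """Number of connection weights in a fully connected feedforward network
    (bias terms are not counted, which is an assumption of this sketch)."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


# 2 inputs -> 15 -> 15 -> 1 output: 2*15 + 15*15 + 15*1 weights
bpnn_15_15 = count_weights([2, 15, 15, 1])
```

Under the same assumption, BPNN(5, 5) and BPNN(10, 10) would have 40 and 130 weights, respectively.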

From the above results, we verify that the NFIN can achieve good control performance and overcome the disadvantages of the BPNN. Not only does it reduce the long off-line and on-line training time and avoid the overtuned phenomenon in the on-line training, but it also shows good regulation control capability on the water bath system. Moreover, in terms of the number of network structure parameters, the NFIN uses fewer parameters than the BPNN.

Fig. 12. Final structure of the NFIN after 20 trials of on-line training. (a) The final assignment of fuzzy rules in the [y(k), y(k + 1)] plane. (b) The corresponding membership functions on the y(k) and y(k + 1) dimensions.

5. Conclusion

Based on the studies of a water bath temperature control system, a summary of comparisons between the two networks, NFIN and BPNN(15, 15), is shown in Table 1. In terms of the number of network structure parameters, there are 270 structure parameters to be tuned for the BPNN(15, 15), but only 31 for the NFIN. As to convergence speed, the NFIN not only has a higher off-line convergence speed than the BPNN(15, 15), but the NFIN based on the conventional on-line training scheme also has a higher on-line convergence speed than the BPNN(15, 15) based on either the conventional or the new on-line training scheme proposed in [25]. For the regulation performance after off-line training, the NFIN controller shows a better regulation capability than the BPNN(15, 15) controller. Moreover, the final regulation performance of the NFIN controller after 20 trials of the conventional on-line training is the best of all cases. According to these comparisons, the NFIN not only overcomes the disadvantages of the BPNN but also has good control capability in the water bath temperature control system. Hence, it is promising to apply the NFIN controller to other real cases in industry in the near future.

Table 1

Summary of comparisons among the network controllers on the water bath temperature control system

| Criteria | NFIN | BPNN(15, 15) | BPNN(15, 15) (using new on-line training scheme [25]) |
|---|---|---|---|
| Network structure parameters | Light | Heavy | |
| Convergence speed (off-line) | Fast | Slow | |
| Regulation performance (off-line) | SAE = 351.6840 | SAE = 354.8154 | |
| Convergence speed (on-line) | Fast | Slow | Medium |
| Final regulation performance | SAE = 341.7778 | SAE = 356.7284 | SAE = 344.1059 |
| Computational load (on-line) | Light | Light | Heavy |

In both regulation performance rows, $SAE = \sum_{k=1}^{120} |y_{ref}(k) - y_p(k)|$.

References

[1] K.J. Astrom, B. Wittenmark, Adaptive Control, Addison-Wesley, Reading, MA, 1989.

[2] H.R. Berenji, P. Khedkar, Learning and tuning fuzzy logic controllers through reinforcements, IEEE Trans. Neural Networks 3 (5) (1992) 724–740.

[3] M. Jordan, D.E. Rumelhart, Internal world models and supervised learning, in: Machine Learning, Proc. 8th Internat. Workshop, 1991.

[4] C.L. Karr, E.J. Gentry, Fuzzy control of pH using genetic algorithms, IEEE Trans. Fuzzy System 1 (1993) 46–53.

[5] M. Khalid, S. Omatu, A neural network controller for a temperature control system, IEEE Control System Mag. 12 (3) (1992) 58–64.

[6] M. Khalid, S. Omatu, R. Yusof, MIMO furnace control with neural networks, IEEE Trans. Control System Technol. 1 (4) (1993) 238–245.

[7] B. Kosko, Neural Networks and Fuzzy Systems, Prentice-Hall, Englewood Cliffs, NJ, 1992.

[8] C.C. Lee, Fuzzy logic in control systems: fuzzy logic controllers – Parts I, II, IEEE Trans. System Man Cybernet. 20 (1990) 404–435.

[9] C. Ling, T. Edgar, A new fuzzy gain scheduling algorithm for process control, in: Proc. ACC'92, Chicago, IL, 1992, pp. 2284–2290.

[10] C.T. Lin, C.S.G. Lee, Neural-network-based fuzzy logic control and decision system, IEEE Trans. Comput. 40 (12) (1991) 1320–1336.

[11] C.T. Lin, C.S.G. Lee, Neural Fuzzy Systems: A Neural-Fuzzy Synergism to Intelligent Systems (with disk), Prentice-Hall, Englewood Cliffs, NJ, 1996.

[12] C.T. Lin, Neural Fuzzy Control Systems with Structure and Parameter Learning, World Scientific, Singapore, 1994.

[13] C.J. Lin, C.T. Lin, Reinforcement learning for ART-based fuzzy adaptive learning control networks, IEEE Trans. Neural Networks 7 (3) (1996).

[14] C.T. Lin, A neural fuzzy control system with structure and parameter learning, Fuzzy Sets and Systems 70 (1995) 183–212.

[15] W.T. Miller III, R.S. Sutton, P.J. Werbos (Eds.), Neural Networks for Control, MIT Press, Cambridge, MA, 1990.

[16] K.S. Narendra, R. Ortega, P. Dorato (Eds.), Advances in Adaptive Control, IEEE Press, New York, 1991.

[17] S. Nagata, M. Sekiguchi, K. Asakawa, Mobile robot control by a structured hierarchical neural network, IEEE Control System Mag. 10 (3) (1990) 69–76.

[18] B. Noble, J.W. Daniel, Applied Linear Algebra, 3rd ed., Prentice-Hall, Englewood Cliffs, NJ, 1988.

[19] K. Ogata, Discrete-Time Control Systems, Prentice-Hall, Englewood Cliffs, NJ, p. 202.

[20] D. Psaltis, A. Sideris, A. Yamamura, A multilayered neural network controller, IEEE Control System Mag. 10 (3) (1989) 44–48.

[21] D.E. Rumelhart et al., Learning internal representation by error propagation, in: Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. I, MIT Press, Cambridge, MA, 1986.

[22] E.H. Ruspini, Recent development in fuzzy clustering, Fuzzy Set and Possibility Theory, North-Holland, New York, 1982, pp. 113–147.

[23] M. Sugeno, K. Tanaka, Successive identification of a fuzzy model and its applications to prediction of a complex system, Fuzzy Sets and Systems 42 (3) (1991) 315–334.

[24] C.T. Sun, J.S. Jang, A neuro-fuzzy classifier and its applications, Proc. IEEE Internat. Conf. Fuzzy Systems, San Francisco, CA, vol. I, 1993, pp. 94–98.

[25] J. Tanomaru, S. Omatu, Process control by on-line trained neural controllers, IEEE Trans. Ind. Electron. 39 (6) (1992) 511–521.

[26] K. Tanaka, M. Sano, H. Watanabe, Modeling and control of carbon monoxide concentration using a neuro-fuzzy technique, IEEE Trans. Fuzzy Systems 3 (3) (1995) 271–279.

[27] T. Takagi, M. Sugeno, Fuzzy identification of systems and its applications to modeling and control, IEEE Trans. System Man Cybernet. 15 (1) (1985) 116–132.

[28] L.X. Wang, J.M. Mendel, Fuzzy basis functions, universal approximation, and orthogonal least-squares learning, IEEE Trans. Neural Networks 3 (5) (1992) 807–814.

[29] L. Wang, R. Langari, Building Sugeno-type models using fuzzy discretization and orthogonal parameter estimation techniques, IEEE Trans. Fuzzy Systems 3 (4) (1995) 454–458.

[30] T. Yabuta, T. Yamada, Learning control using neural networks, in: Proc. 1991 IEEE Internat. Conf. Robotics and Automation, Sacramento, CA, April 1991.

[31] L.A. Zadeh, Fuzzy sets, Inform. and Control 8 (1965) 338–352.

[32] L.A. Zadeh, Outline of a new approach to the analysis of complex systems and decision processes, IEEE Trans. System Man Cybernet. 3 (1973) 28–44.