Reinforcement group cooperation-based symbiotic evolution for recurrent wavelet-based neuro-fuzzy systems


Yung-Chi Hsu, Sheng-Fuu Lin

Department of Electrical and Control Engineering, National Chiao-Tung University, 1001 Ta Hsueh Road, Hsinchu, Taiwan 300, ROC

Article info

Article history:
Received 9 August 2007
Received in revised form 26 June 2008
Accepted 8 December 2008
Communicated by T. Heskes
Available online 23 January 2009

Keywords: Neuro-fuzzy system; Symbiotic evolution; Control; Reinforcement learning; Recurrent network

Abstract

This paper proposes a recurrent wavelet-based neuro-fuzzy system (RWNFS) trained with reinforcement group cooperation-based symbiotic evolution (R-GCSE) for solving control problems. The R-GCSE differs from traditional symbiotic evolution in that the population is divided into several groups. Each group is formed by a set of chromosomes that represent one fuzzy rule, and it cooperates with other groups to generate better chromosomes by using the proposed elite-based compensation crossover strategy (ECCS). The proposed R-GCSE is evaluated on numerical control problems. In the simulations, the R-GCSE compares favorably with other existing models.

1. Introduction

In recent years, the use of fuzzy logic and artificial neural networks to solve control problems has become a popular research topic [1–10]. The reason is that classical control theory usually requires a mathematical model for designing controllers. Inaccuracy in the mathematical model of a plant usually degrades the performance of the controller, especially for nonlinear and complex control problems [11–14]. Fuzzy logic has the ability to express the ambiguity of human thinking and to translate expert knowledge into computable numerical data.

A fuzzy system consists of a set of fuzzy IF–THEN rules that describe the input–output mapping relationship of the network. It is difficult for human experts to examine all the input–output data from a complex system to find proper rules for a fuzzy system. To cope with this difficulty, several approaches for generating fuzzy IF–THEN rules from numerical data have been proposed [2,3,6]. These methods were developed for supervised learning; i.e., the correct "target" output values are given for each input pattern to guide the learning of the network. However, most supervised learning algorithms for neural fuzzy networks require precise training data in order to tune the networks for various applications. For some real-world applications, precise training data are usually difficult and expensive, if not impossible, to obtain. For this reason, there has been growing interest in reinforcement learning algorithms for neural controller [15–18] or fuzzy controller [19–21] design.

In designing a fuzzy controller, adjusting the required parameters is important. To do this, back-propagation (BP) training was used in [3,6–8]. It is a powerful training technique that can be applied to networks with a forward structure. Since BP training uses the steepest descent technique to minimize the error function, the algorithm may converge quickly to a local minimum and never find the global solution. To solve this problem, several evolutionary algorithms, such as the genetic algorithm (GA) [22], genetic programming [23], evolutionary programming [24], and evolution strategies [25], have been proposed. They are parallel and global search techniques. Because they simultaneously evaluate many points in the search space, they are more likely to converge toward the global solution. For this reason, the use of evolutionary methods for training fuzzy models has become an important field.

The evolutionary fuzzy model generates a fuzzy system automatically by incorporating evolutionary learning procedures [26–33]. The best-known evolutionary learning procedure is the GA, and several genetic fuzzy models have been proposed [26–31]. In [26], Karr applied GAs to design the membership functions of a fuzzy controller whose fuzzy rule set was assigned in advance. Since the membership functions and rule sets are co-dependent, designing the two simultaneously is a more appropriate methodology. Based on this concept, many researchers have applied GAs to optimize both the parameters of the membership functions and the rule sets [27–29]. Lin and Jou [30] proposed GA-based fuzzy reinforcement learning to control magnetic bearing systems. Juang et al. [31] proposed using genetic reinforcement learning to design fuzzy controllers. The GA adopted in [31] was based on traditional symbiotic evolution which, when applied to fuzzy controller design, complements the local mapping property of a fuzzy rule. In [32], Tang proposed a hierarchical genetic algorithm, which enables the optimization of the fuzzy system design for a particular application. Juang [33] proposed the combination of online clustering and Q-value based GA for reinforcement fuzzy system (CQGAF) to simultaneously design the number of fuzzy rules and the free parameters in a fuzzy system.

Neurocomputing (journal homepage: www.elsevier.com/locate/neucom)

However, these approaches encounter one or more of the following major problems: (1) all the fuzzy rules are encoded into one chromosome; (2) the population cannot evaluate each fuzzy rule locally.

Recently, Gomez and Schmidhuber [34,35] proposed solutions for these problems. Their enforced sub-populations (ESP) method uses sub-populations of neurons for the fitness evaluation and overall control. As shown in [34,35], sub-populations that evaluate the solution locally can obtain better performance than systems that use only one population to evaluate the solution.

As in [34,35], this paper proposes a RWNFS with reinforcement group cooperation-based symbiotic evolution (R-GCSE) for solving the problems mentioned above. In the proposed R-GCSE, each chromosome represents only one fuzzy rule, and an n-rule fuzzy system is constructed by selecting and combining n chromosomes from several groups. The R-GCSE, which promotes both cooperation and specialization, ensures diversity and prevents a population from converging to suboptimal solutions. Unlike normal symbiotic evolution, the R-GCSE divides the population into several groups, and each group, formed by a set of chromosomes, represents one fuzzy rule. In contrast to [34,35], an elite-based compensation crossover strategy (ECCS) is proposed in this paper to let the well-performing groups of individuals cooperate to create better generations. In the ECCS, the groups cooperate to perform the crossover steps, so that the better chromosomes of each group are selected to perform the crossover steps in the next generation.

The advantages of the R-GCSE are summarized as follows: (1) the R-GCSE uses group-based populations to evaluate the fuzzy rules locally; (2) the R-GCSE uses the ECCS to allow better solutions from different groups to cooperate in order to generate better solutions in the next generation; (3) it performs better and converges more quickly than some traditional genetic methods.

This paper is organized as follows. In Section 2, the RWNFS is introduced. In Section 3, the proposed group cooperation-based symbiotic evolution (GCSE) is described. In Section 4, the reinforcement group cooperation-based symbiotic evolution (R-GCSE) used for constructing the RWNFS model is introduced. In Section 5, the simulation results are presented. The conclusions are summarized in the last section.

2. Structure of a RWNFS

In this section, the structure of the RWNFS shown in Fig. 1 is introduced. For TSK-type fuzzy networks [1,5], the consequence of each rule is a function of the input linguistic variables. A widely adopted function is a linear combination of input variables plus a constant term. This study adopts a nonlinear combination of input variables, i.e., a wavelet neural network (WNN). The advantages of the WNN are as follows: (1) it has the "universal approximation" property; (2) there is an explicit link between the wavelet transform and the network coefficients, so an initial guess of the network parameters can be derived from the wavelet decomposition formula; (3) it may achieve the same approximation performance with a smaller network; in addition, wavelet networks are optimal approximators in the sense that the smallest number of bits is required to attain an arbitrary precision [36]. In the RWNFS, each fuzzy rule corresponds to a sub-WNN that consists of single-scaling wavelets [37]. Non-orthogonal and compact wavelet functions are adopted as the node functions (wavelet bases) in this paper. The purpose of introducing a fuzzy model into the WNN is to improve the accuracy of function approximation based on the dilation and translation parameters of the wavelets without increasing the number of wavelet bases. A RWNFS is composed of fuzzy rules that can be presented in the following general form:

$$
\begin{aligned}
R_j:\ &\text{IF } I_{1j} \text{ is } A_{1j} \text{ and } \ldots \text{ and } I_{ij} \text{ is } A_{ij} \text{ and } \ldots \text{ and } I_{nj} \text{ is } A_{nj} \\
&\text{THEN } \hat{y}^{1}_{j} = \sum_{k=1}^{M} w^{1}_{jk}\,\varphi_{a.b} = w^{1}_{j1}\varphi_{0.0} + w^{1}_{j2}\varphi_{1.0} + w^{1}_{j3}\varphi_{1.1} + \cdots \\
&\text{and } \hat{y}^{2}_{j} = \sum_{k=1}^{M} w^{2}_{jk}\,\varphi_{a.b} = w^{2}_{j1}\varphi_{0.0} + w^{2}_{j2}\varphi_{1.0} + w^{2}_{j3}\varphi_{1.1} + \cdots
\end{aligned}
\tag{1}
$$

where $R_j$ denotes the $j$th rule; $(I_{1j}, \ldots, I_{ij}, \ldots, I_{nj})$ is the network input pattern $(x_1, \ldots, x_i, \ldots, x_n)$ plus the temporal term for the linguistic term of the precondition part $A_j = (A_{1j}, \ldots, A_{ij}, \ldots, A_{nj})$; and the local WNN model's outputs $\hat{y}^{1}_{j}$ and $\hat{y}^{2}_{j}$ are calculated for outputs $y_1$ and $y_2$ of rule $R_j$.

Next, the signal propagation is described, along with the operation functions of the nodes in each layer. In the following description, $I^{(h)}_i$ denotes the input of the $i$th node in layer $h$, and $O^{(h)}_i$ denotes the output of the $i$th node in layer $h$.

In layer 1, nodes just transmit input signals to the next layer directly, that is,

$$O^{(1)}_i = I^{(1)}_i \tag{2}$$

where the layer-1 inputs are $(I^{(1)}_1, \ldots, I^{(1)}_i, \ldots, I^{(1)}_n) = (x_1, \ldots, x_i, \ldots, x_n)$. Each precondition part of the $j$th rule, $A_j = (A_{1j}, \ldots, A_{ij}, \ldots, A_{nj})$ (a group of fuzzy sets), is described here by a Gaussian-type membership function; that is, the membership value specifying the degree to which an input value belongs to a fuzzy set is determined in layer 2. The Gaussian function is defined by

$$O^{(2)}_{ij} = \exp\left(-\frac{\left(I^{(2)}_{ij} - m_{ij}\right)^2}{\sigma_{ij}^2}\right) \tag{3}$$

where $m_{ij}$ and $\sigma_{ij}$ are the mean and standard deviation for the $i$th dimension and the $j$th rule node, respectively. Additionally, the input of this layer at discrete time $t$ can be denoted by

$$I^{(2)}_{ij}(t) = O^{(1)}_i(t) + O^{(f)}_{ij}(t), \qquad O^{(f)}_{ij}(t) = O^{(2)}_{ij}(t-1)\,\theta_{ij} \tag{4}$$

where $\theta_{ij}$ is the feedback weight. Clearly, the input of this layer contains the memory terms $O^{(2)}_{ij}(t-1)$, which store the past information of the network. In the RWNFS, the recurrent property is achieved by feeding the output of each membership function back to itself so that each membership value is influenced by its previous value.
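A minimal Python sketch of one layer-2 node may help make Eqs. (3) and (4) concrete. The class name `RecurrentMembership`, the method name `step`, and the choice of a zero initial memory term are illustrative assumptions, not part of the original description.

```python
import math


class RecurrentMembership:
    """One recurrent Gaussian membership node (layer 2), following Eqs. (3)-(4)."""

    def __init__(self, mean, sigma, theta):
        self.mean = mean          # m_ij
        self.sigma = sigma        # sigma_ij
        self.theta = theta        # feedback weight theta_ij
        self.prev_output = 0.0    # O^(2)_ij(t-1); starting at zero is an assumption

    def step(self, x):
        # Eq. (4): layer input = external input + weighted previous output.
        net_input = x + self.prev_output * self.theta
        # Eq. (3): Gaussian membership value.
        output = math.exp(-((net_input - self.mean) ** 2) / self.sigma ** 2)
        self.prev_output = output
        return output
```

Calling `step` repeatedly with the same input generally returns different values, because the previous membership value is fed back through `theta`.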

Although some recurrent neural fuzzy networks have been proposed and applied to dynamic system identification and control, there are still disadvantages to these network structures. In [38], the order of both the control input and the network output in the Auto Regressive with eXogenous (ARX) model needs to be known. In the RWNFS, this problem is solved by feeding back the output of each membership function. Only the current control input and system state are fed to the network input, and the past values are memorized by the feedback structure. In [39], a global feedback structure is adopted, and the outputs of all the rule nodes (the firing strengths) are fed back and summed. In this case, the TRFN [39] needs more adjustable parameters.

In layer 3, defining the number and the locations of the membership functions leads to the partition of the space $D = D_1 \times \cdots \times D_n$. The collection of fuzzy sets $A_j = (A_{1j}, \ldots, A_{ij}, \ldots, A_{nj})$ pertaining to the premise part of $R_j$ formulates a fuzzy region in $D$ that can be regarded as a multi-dimensional fuzzy set whose membership function is determined by

$$O^{(3)}_j = \prod_{i=1}^{n} I^{(3)}_{ij} = \prod_{i=1}^{n} \exp\left(-\frac{\left(I^{(2)}_{ij} - m_{ij}\right)^2}{\sigma_{ij}^2}\right) \tag{5}$$

where $n$ is the number of external dimensions.

Layer 4 only receives the signal $\hat{y}^{s}_{j}$ from the output of the WNN for an output $y_s$ and the $j$th rule. The mathematical function of each node $j$ is

$$\hat{y}^{s}_{j} = O^{(4)}_{sj} = \sum_{k=1}^{M} w^{s}_{jk}\,\varphi_{a.b}. \tag{6}$$

The crisp value $\varphi_{a.b}$ can be obtained as follows:

$$\varphi_{a.b} = \frac{\sum_{i=1}^{n} \phi_{a.b}(x_i)}{|X|} \tag{7}$$

where $|X|$ is the number of input dimensions. The functions $\phi_{a.b}(x_i)$, which are applied to the input vector to fire the wavelet interval, are calculated as follows:

$$
\phi(x_i) =
\begin{cases}
\cos(x_i), & -0.5 \le x_i \le 0.5 \\
0, & \text{otherwise}
\end{cases},
\qquad
\phi_{a.b}(x_i) = \cos(a x_i - b),
\quad \text{where } a = 1, \ldots, m;\ b = 1, \ldots, a. \tag{8}
$$

The above equation formulates the non-orthogonal wavelets in a ﬁnite range, where b denotes a shifting parameter with its maximum value equal to the corresponding scaling parameter a.
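The wavelet bases of Eqs. (7) and (8) can be sketched in a few lines of Python. This sketch assumes that the compact support of the mother wavelet is applied to the dilated and shifted argument $a x_i - b$, which is one possible reading of Eq. (8); the function names are hypothetical.

```python
import math


def mother_wavelet(u):
    """phi(u) = cos(u) on [-0.5, 0.5] and 0 elsewhere (first part of Eq. (8))."""
    return math.cos(u) if -0.5 <= u <= 0.5 else 0.0


def wavelet_basis(a, b, inputs):
    """varphi_{a.b}: mean of phi(a * x_i - b) over the input dimensions (Eq. (7)).

    Applying the compact support to (a * x_i - b) is an assumption of this sketch.
    """
    return sum(mother_wavelet(a * x - b) for x in inputs) / len(inputs)
```

With `a = 0` and `b = 0` the basis is constant, so the corresponding weight acts like a bias term in the local model.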

The final output of the model $(y_1, \ldots, y_s, \ldots, y_p)$ is calculated in layer 5, and the node's output together with the related links acts as a defuzzifier. The mathematical function is

$$
y_s = O^{(5)}_s
= \frac{\sum_{j=1}^{M} I^{(5)}_{sj}\, I^{(5)}_{j}}{\sum_{j=1}^{M} I^{(5)}_{j}}
= \frac{\sum_{j=1}^{M} \left(w^{s}_{j1}\varphi_{0.0} + \cdots + w^{s}_{jk}\varphi_{a.b} + \cdots + w^{s}_{jM}\varphi_{m.m}\right) I^{(5)}_{j}}{\sum_{j=1}^{M} I^{(5)}_{j}}
\tag{9}
$$

where $I^{(5)}_{sj} = O^{(4)}_{sj}$ denotes the output of the local WNN model for an output $y_s$ and the $j$th rule, $I^{(5)}_{j} = O^{(3)}_{j}$ is the output of layer 3, and $y_s$ is the $s$th output of the RWNFS.
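Layers 3 to 5 can be summarized in a short Python sketch: the firing strength is the product of the membership values (Eq. (5)), each rule's local output is a weighted sum of wavelet bases (Eq. (6)), and the network output is the firing-strength-weighted average of the local outputs (Eq. (9)). The function names and the zero return value when no rule fires are assumptions made for illustration.

```python
def rule_firing_strength(memberships):
    """Eq. (5): product of the layer-2 membership values of one rule."""
    strength = 1.0
    for value in memberships:
        strength *= value
    return strength


def defuzzify(firing_strengths, rule_weights, basis_values):
    """Eq. (9) for a single output s.

    rule_weights[j][k] plays the role of w^s_jk and basis_values[k] the role of
    the k-th wavelet basis varphi_{a.b}.
    """
    # Eq. (6): local WNN output of each rule.
    local_outputs = [
        sum(w * phi for w, phi in zip(weights, basis_values))
        for weights in rule_weights
    ]
    total_strength = sum(firing_strengths)
    if total_strength == 0.0:
        return 0.0  # assumption: output zero when no rule fires
    weighted = sum(o * s for o, s in zip(local_outputs, firing_strengths))
    return weighted / total_strength
```

For example, with firing strengths `[1.0, 3.0]` and local outputs 2.0 and 1.0, the result is (2.0 × 1.0 + 1.0 × 3.0) / 4.0 = 1.25.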

3. A group cooperation-based symbiotic evolution (GCSE)

In this section, the proposed GCSE method is discussed. Recently, many studies have tried to enhance traditional GAs [40–43]. One category of these studies modifies the structure of the population. Examples in this category include the distributed GA [41], the cellular GA [42], and the symbiotic GA [43].

This study proposes the GCSE to improve the symbiotic GA [43]. The GCSE algorithm is developed from symbiotic evolution. The idea of symbiotic evolution was first proposed in an implicit fitness-sharing algorithm used in an immune system model [44]. The authors developed artificial antibodies to identify artificial antigens. Because each antibody can match only one antigen, a diverse population of antibodies is required to effectively defend against a variety of antigens. As shown in [31,43], partial solutions can be characterized as specializations. The specialization property ensures diversity and prevents a population from converging to suboptimal solutions. A single partial solution cannot "take over" a population since it must cooperate with other specializations. Unlike the standard evolutionary approach, which always causes a given population to converge, hopefully at the global optimum, symbiotic evolution finds solutions in different, unconverged populations [31,43]. Unlike normal symbiotic evolution, the GCSE divides the population into several groups, and each group, formed by a set of chromosomes, represents one fuzzy rule.

In the GCSE, the structure of the population consists of several groups. The structure of the chromosome in the GCSE is shown in Fig. 2. To allow groups to cooperate with each other in order to generate better solutions, the GCSE uses an ECCS.

In the GCSE, the coding structure of the chromosomes must be suitable for each chromosome to represent only one fuzzy rule. A fuzzy rule with the form introduced in Eq. (1) is coded as shown in Fig. 3. The learning process of the GCSE in each group involves five major steps: initialization, fitness assignment, elite-based reproduction strategy (ERS), ECCS, and mutation strategy. The flowchart of the learning process is shown in Fig. 4. The learning process is described step-by-step as follows:

3.1. Initialization

Before the GCSE is run, the individuals forming the initial groups must be generated. The following formulations show how to generate the initial chromosomes in each group:

$$\text{Deviation: } \mathrm{Chr}_{g,c}[p] = \mathrm{random}[\sigma_{\min}, \sigma_{\max}], \quad \text{where } p = 2, 4, \ldots, 2n;\ g = 1, 2, \ldots, M;\ c = 1, 2, \ldots, N_C \tag{10}$$

$$\text{Mean: } \mathrm{Chr}_{g,c}[p] = \mathrm{random}[m_{\min}, m_{\max}], \quad \text{where } p = 1, 3, \ldots, 2n-1 \tag{11}$$

$$\text{Theta: } \mathrm{Chr}_{g,c}[p] = \mathrm{random}[\theta_{\min}, \theta_{\max}], \quad \text{where } p = 2n+1, 2n+2, \ldots, 2n+1 \tag{12}$$

$$\text{Weight: } \mathrm{Chr}_{g,c}[p] = \mathrm{random}[w_{\min}, w_{\max}], \quad \text{where } p = 2(n+1)+1, 2(n+1)+2, \ldots, 2(n+1)+M \tag{13}$$

where $\mathrm{Chr}_{g,c}$ represents the $c$th chromosome in the $g$th group; $M$ represents the total number of groups; $N_C$ is the total number of chromosomes in each group; $p$ represents the $p$th gene in $\mathrm{Chr}_{g,c}$; and $[\sigma_{\min}, \sigma_{\max}]$, $[m_{\min}, m_{\max}]$, $[\theta_{\min}, \theta_{\max}]$, and $[w_{\min}, w_{\max}]$ represent the predefined ranges.

Fig. 3. Coding a rule of a RWNFS into a chromosome in GCSE.

3.2. Fitness assignment

As discussed previously, in the GCSE, the fitness value of a single rule (an individual) is calculated by summing up the fitness values of all the possible combinations which contain that single rule. The details for assigning the fitness value are described step-by-step as follows:

Step 1. Randomly choose one fuzzy rule from each of the M groups (each of size N_C), giving M rules.

Step 2. Evaluate the RWNFS constructed from the rules chosen in step 1 to obtain a fitness value.

Step 3. Divide the fitness value by M and accumulate the divided fitness value to each of the selected rules, whose fitness value records are initially set to zero.

Step 4. Repeat the above steps until each rule (an individual) in each group has been selected a sufficiently large number of times, and record the number of RWNFS models in which each individual has participated.

Step 5. Divide the accumulated fitness value of each chromosome by the number of times it has been selected. The average fitness value represents the performance of a single rule.
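The initialization of Section 3.1 and Steps 1 to 5 above can be sketched as follows. This is only an illustration under stated assumptions: the dictionary layout of a chromosome (rather than the flat gene indexing of Eqs. (10)-(13)), the function names, and the `min_selections` stopping rule for "a sufficiently large number of times" are all hypothetical, and `evaluate` stands for any routine that scores a complete set of M rules.

```python
import random


def init_chromosome(num_inputs, num_weights, ranges, rng):
    """One chromosome = one fuzzy rule; the dict layout is an assumption."""
    def draw(key, count):
        low, high = ranges[key]
        return [rng.uniform(low, high) for _ in range(count)]
    return {
        "mean": draw("mean", num_inputs),        # cf. Eq. (11)
        "sigma": draw("sigma", num_inputs),      # cf. Eq. (10)
        "theta": draw("theta", num_inputs),      # cf. Eq. (12)
        "weight": draw("weight", num_weights),   # cf. Eq. (13)
    }


def init_groups(num_groups, group_size, num_inputs, num_weights, ranges, rng):
    """Create M groups of N_C randomly initialized chromosomes."""
    return [[init_chromosome(num_inputs, num_weights, ranges, rng)
             for _ in range(group_size)]
            for _ in range(num_groups)]


def assign_fitness(groups, evaluate, min_selections, rng):
    """Average fitness share of every chromosome (Steps 1-5)."""
    num_groups = len(groups)
    totals = [[0.0] * len(group) for group in groups]
    counts = [[0] * len(group) for group in groups]
    # Step 4: repeat until every chromosome has been selected often enough.
    while min(min(row) for row in counts) < min_selections:
        # Step 1: pick one rule from each group.
        picks = [rng.randrange(len(group)) for group in groups]
        system = [groups[g][c] for g, c in enumerate(picks)]
        # Steps 2-3: evaluate the combined system and share its fitness.
        share = evaluate(system) / num_groups
        for g, c in enumerate(picks):
            totals[g][c] += share
            counts[g][c] += 1
    # Step 5: average share per chromosome.
    return [[total / count for total, count in zip(total_row, count_row)]
            for total_row, count_row in zip(totals, counts)]
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which is convenient when comparing parameter settings.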

3.3. Elite-based reproduction strategy (ERS)

Reproduction is a process in which individuals are copied according to their fitness values. A fitness value is assigned to each chromosome in the fitness assignment step, in which high values denote a good fit. The goal of the GCSE is to maximize the fitness value. For stability, this study proposes an ERS to allow the best combination of chromosomes to be kept in the next generation. In the GCSE, the chromosome with the best fitness value may not be in the best combination. Therefore, every chromosome in the best combination must be kept by applying the ERS. Other chromosomes in each group are selected by the roulette-wheel selection method [45], in which a simulated roulette wheel is spun. The best performing chromosomes in the top half of each group [31] advance to the next generation. The other half is generated by applying the crossover and mutation operations on the chromosomes in the top half of the parent generation. In the reproduction step, the top half of each group must keep the same number of chromosomes.
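One simplified reading of the ERS for a single group is sketched below: the top half of the group by fitness survives, and the member that belongs to the best combination is always kept. Ranking by fitness instead of spinning a roulette wheel, and overwriting the weakest survivor when the elite member is missing, are simplifying assumptions of this sketch, as is the function name.

```python
def elite_reproduction(fitness, elite_index):
    """Return the indices of the chromosomes of one group that survive.

    fitness[c] is the fitness of chromosome c; elite_index is the group's
    member in the best combination found so far.
    """
    half = len(fitness) // 2
    ranked = sorted(range(len(fitness)), key=lambda c: fitness[c], reverse=True)
    survivors = ranked[:half]
    if elite_index not in survivors:
        # Keep the elite member by replacing the weakest survivor (assumption).
        survivors[-1] = elite_index
    return survivors
```

The remaining half of the group would then be refilled by crossover and mutation, as described in the following subsections.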

3.4. Elite-based compensation crossover strategy (ECCS)

Although the ERS can search for the best existing individuals, it does not create any new individuals. In nature, an offspring has two parents and inherits genes from both. The main step that works on the parents is the crossover step, which is applied to a selected pair at a given crossover rate. In this paper, an ECCS is proposed to improve the crossover operation. The ECCS mimics the cooperation phenomenon in society, in which individuals become more suitable for the environment as they acquire and share more knowledge of their surroundings. The best performing individuals in the top half of each group, called elites, are used to select the parents so that the ECCS can be applied. Details of the ECCS are shown below.

Step 1. The first parent used in the crossover operation is selected from the original group by using the following equations:

$$\mathrm{Fitness\_Ratio}_{g,t} = \frac{\sum_{u=1}^{t} \mathrm{fitness}_{g,u}}{\sum_{c=1}^{N_C} \mathrm{fitness}_{g,c}}, \quad \text{where } t = 1, 2, \ldots, N_C \tag{14}$$

$$\mathrm{Rand\_Value}[g] = \mathrm{Random}[0, 1], \quad \text{where } g = 1, 2, \ldots, M \tag{15}$$

$$\mathrm{Parent\_Site}_A[g] = t, \quad \text{if } \mathrm{Fitness\_Ratio}_{g,t-1} < \mathrm{Rand\_Value}[g] \le \mathrm{Fitness\_Ratio}_{g,t} \tag{16}$$

where $\mathrm{Fitness\_Ratio}_{g,t}$ is the fitness ratio of the $t$th chromosome in the $g$th group; $\mathrm{Rand\_Value}[g] \in [0, 1]$ is a random value for the $g$th group; and $\mathrm{Parent\_Site}_A[g]$ is the site of the first parent. According to Eq. (16), if $\mathrm{Rand\_Value}[g]$ is greater than the fitness ratio at the $(t-1)$th chromosome in the $g$th group and equal to or smaller than the fitness ratio at the $t$th chromosome in the $g$th group, the site of the first parent of the $g$th group is assigned to $t$.

Step 2. After the first parent is determined, the best performing elites in every group are used to determine the other parent. In this step, the total fitness ratio of every group is computed as follows:

$$\mathrm{Total\_Fitness}_{g} = \sum_{c=1}^{N_C} \mathrm{fitness}_{g,c}, \quad \text{where } g = 1, 2, \ldots, M \tag{17}$$

$$\mathrm{Total\_Fitness\_Ratio}_{w} = \frac{\sum_{u=1}^{w} \mathrm{Total\_Fitness}_{u}}{\sum_{g=1}^{M} \mathrm{Total\_Fitness}_{g}}, \quad \text{where } w = 1, 2, \ldots, M \tag{18}$$

where $\mathrm{Total\_Fitness}_{g}$ represents the summation of all the chromosomes' fitness values in the $g$th group and $\mathrm{Total\_Fitness\_Ratio}_{w}$ is the total fitness ratio of the $w$th group.

Step 3. Determine the other parental group for applying crossover with the $\mathrm{Parent\_Site}_A[g]$th chromosome in the $g$th group according to the following equations:

$$\mathrm{Group\_Rand\_Value}[g] = \mathrm{Random}[0, 1], \quad \text{where } g = 1, 2, \ldots, M \tag{19}$$

$$\mathrm{Parent\_Group\_Site}_B[g] = w, \quad \text{if } \mathrm{Total\_Fitness\_Ratio}_{w-1} < \mathrm{Group\_Rand\_Value}[g] \le \mathrm{Total\_Fitness\_Ratio}_{w} \tag{20}$$

where $\mathrm{Group\_Rand\_Value}[g] \in [0, 1]$ is a random value for the $g$th group and $\mathrm{Parent\_Group\_Site}_B[g]$ represents the site of the group from which the second parent is selected.

Step 4. After the $\mathrm{Parent\_Group\_Site}_B[g]$th group is selected, the other parent, which is selected from the $\mathrm{Parent\_Group\_Site}_B[g]$th group, is determined by the ECCS according to the following equations:

$$\mathrm{Fitness\_Ratio}_{\mathrm{Selected\_g},t} = \frac{\sum_{u=1}^{t} \mathrm{fitness}_{\mathrm{Selected\_g},u}}{\sum_{c=1}^{N_C} \mathrm{fitness}_{\mathrm{Selected\_g},c}}, \quad \text{where } t = 1, 2, \ldots, N_C;\ \mathrm{Selected\_g} = \mathrm{Parent\_Group\_Site}_B[g] \tag{21}$$

$$\mathrm{Rand\_Value}[g] = \mathrm{Random}[0, 1], \quad \text{where } g = 1, 2, \ldots, M \tag{22}$$

$$\mathrm{Parent\_Site}_B[g] = l, \quad \text{if } \mathrm{Fitness\_Ratio}_{\mathrm{Selected\_g},l-1} < \mathrm{Rand\_Value}[g] \le \mathrm{Fitness\_Ratio}_{\mathrm{Selected\_g},l} \tag{23}$$

where $\mathrm{Fitness\_Ratio}_{\mathrm{Selected\_g},t}$ is the fitness ratio of the $t$th chromosome in the $\mathrm{Parent\_Group\_Site}_B[g]$th group and $\mathrm{Parent\_Site}_B[g]$ is the site of the second parent. The pseudo code of the ECCS is listed in Fig. 5. After the parents from the $g$th group and the $\mathrm{Parent\_Group\_Site}_B[g]$th group are selected using the ECCS, the individuals (the $\mathrm{Parent\_Site}_A[g]$th chromosome and the $\mathrm{Parent\_Site}_B[g]$th chromosome) are crossed and separated by using a two-point crossover in the $g$th group, as shown in Fig. 6. In Fig. 6, new individuals are created by exchanging the values between the selected sites of the parents. After this operation, the individuals with poor performance are replaced by the newly produced offspring.
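The parent selection in Steps 1 to 4 is a roulette-wheel draw applied three times: within the original group, over the group totals, and within the chosen group. The Python sketch below illustrates this under a few assumptions: indices are zero-based, all fitness values are positive, the caller passes only the fitness values of the elites, and the function names are hypothetical.

```python
import random
from itertools import accumulate


def roulette(fitness_values, rand_value):
    """Index t with ratio[t-1] < rand_value <= ratio[t] (cf. Eqs. (14)-(16))."""
    total = sum(fitness_values)
    ratios = [cumulative / total for cumulative in accumulate(fitness_values)]
    for t, ratio in enumerate(ratios):
        if rand_value <= ratio:
            return t
    return len(ratios) - 1  # guard against floating-point rounding


def eccs_select_parents(fitness, g, rng):
    """Return (site_a, group_b, site_b) for group g, where fitness[g][c] > 0."""
    site_a = roulette(fitness[g], rng.random())           # Step 1
    group_totals = [sum(row) for row in fitness]          # Step 2, Eq. (17)
    group_b = roulette(group_totals, rng.random())        # Step 3
    site_b = roulette(fitness[group_b], rng.random())     # Step 4
    return site_a, group_b, site_b


def two_point_crossover(parent_a, parent_b, rng):
    """Exchange the genes between two randomly chosen cut points."""
    i, j = sorted(rng.sample(range(len(parent_a) + 1), 2))
    child_a = parent_a[:i] + parent_b[i:j] + parent_a[j:]
    child_b = parent_b[:i] + parent_a[i:j] + parent_b[j:]
    return child_a, child_b
```

Because groups with a larger total fitness occupy a larger slice of the wheel in Step 3, well-performing groups contribute second parents more often.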

3.5. Mutation strategy

Although the ERS and ECCS would produce many new strings, they do not introduce any new information to the population at the site of an individual. Mutation can randomly alter the allele of a gene. In this paper, to emphasize the capability of the ECCS, the GCSE tries to simplify the mutation operation. Therefore, a uniform mutation [45] is adopted, and the mutated gene is generated randomly from the domain of the corresponding variable.

The aforementioned steps are done repeatedly and stopped when the predetermined condition is achieved.
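Uniform mutation can be sketched in a few lines. The flat list representation of a chromosome, the per-gene `(low, high)` ranges, and the function name are assumptions of this illustration.

```python
import random


def uniform_mutation(chromosome, gene_ranges, mutation_rate, rng):
    """Redraw each gene uniformly from its own range with probability mutation_rate."""
    return [
        rng.uniform(low, high) if rng.random() < mutation_rate else gene
        for gene, (low, high) in zip(chromosome, gene_ranges)
    ]
```

A rate of 0 leaves the chromosome unchanged, while a rate of 1 redraws every gene.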

4. Reinforcement learning for a RWNFS

Unlike the supervised learning problem, in which the correct ‘‘target’’ output values are given for each input pattern, the reinforcement learning problem has only very simple ‘‘evaluative’’ or ‘‘critical’’ information rather than ‘‘instructive’’ information. In the extreme case, there is only a single bit of information to indicate whether the output is right or wrong. The training environment of reinforcement group cooperation-based symbiotic evolution (R-GCSE), which interacts with reinforcement learning problems, is shown in Fig. 7. In this paper, the reinforcement signal indicates whether a success or a failure occurs.

As shown in Fig. 7, the R-GCSE uses a RWNFS to determine a proper action according to the current input vector (environment state). The structure of the R-GCSE is different from Barto and his colleagues' actor-critic architecture [17], which consists of a control network and a critic network. The input of the RWNFS is the state of the plant, and the output is a control action for that state, denoted by f. The only available feedback is a reinforcement signal that notifies the RWNFS only when a failure occurs. An accumulator, shown in Fig. 7, plays the role of a relative performance measure: it accumulates the number of time steps before a failure occurs. In this paper, the feedback is decided by this accumulator, which determines how long the experiment remains a "success," and the accumulated value is used as a relative measure of the fitness in the R-GCSE. The key to the R-GCSE is to count the number of time steps before a failure occurs and to use this count as the fitness function. An advantage of the R-GCSE is its global optimization capability.

A flowchart of the R-GCSE is shown in Fig. 8. The R-GCSE runs in a feed-forward fashion to control the environment (plant) until a failure occurs. In this paper, the fitness function is defined as the number of time steps before a failure occurs, and the goal of the R-GCSE is to maximize the fitness value. The fitness function is defined by

$$\text{Fitness Value} = \text{TIME-STEP} \tag{24}$$

where TIME-STEP represents how long the experiment remains a "success." Eq. (24) indicates that a larger number of time steps before a failure occurs (i.e., keeping the desired control goal longer) means a higher fitness of the R-GCSE.

Fig. 5. The pseudo code of the ECCS method.
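The fitness of Eq. (24) amounts to running the controller on the plant and counting the time steps until the failure signal arrives. The sketch below assumes hypothetical callables `controller`, `step`, and `failed`, and a `max_steps` cap at which a run is treated as a success.

```python
def time_step_fitness(controller, step, initial_state, failed, max_steps):
    """Number of time steps completed before a failure occurs (cf. Eq. (24)).

    controller(state) returns an action, step(state, action) returns the next
    state, and failed(state) is the reinforcement (failure) signal.
    """
    state = initial_state
    for t in range(max_steps):
        if failed(state):
            return t
        state = step(state, controller(state))
    return max_steps
```

A function of this shape could serve as the `evaluate` routine in the fitness assignment step, with the controller built from one rule per group.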

5. Illustrative examples

Two applications are discussed in this section. The first simulation balances a cart-pole system, as described in [46–48]. The second simulation balances a ball and beam system, as described in [49,50]. The initial parameters for these two examples are given in Table 1. They were determined by practical experimentation or trial-and-error tests.

5.1. Example 1: Control of a cart-pole balancing system

In this example, the R-GCSE was applied to the classic control problem of a cart-pole balancing system. This problem is often used as an example of inherently unstable and dynamic systems to demonstrate both modern and classic control techniques [46–48] or reinforcement learning schemes [15–21], and is now used as a control benchmark. As shown in Fig. 9, the cart-pole balancing problem is the problem of learning how to balance an upright pole. The bottom of the pole is hinged to a cart that travels along a finite-length track to its right or left. Both the cart and the pole can move only in the vertical plane; that is, each has only one degree of freedom.

There are four state variables in the system: $\theta$, the angle of the pole from an upright position (in degrees); $\dot{\theta}$, the angular velocity of the pole (in degrees/second); $x$, the horizontal position of the cart's center (in meters); and $\dot{x}$, the velocity of the cart (in meters/second). The only control action is $f$, which is the amount of force (in Newtons) applied to the cart to move it left or right. The system fails when the pole falls past a certain angle ($\pm 12^\circ$ is used here) or when the cart runs into the bounds of its track (the distance is 2.4 m from the center to each bound of the track). The goal of this control problem is to determine a sequence of forces that is applied to the cart to balance the pole upright. The equations of motion are as follows:

$$\theta(t+1) = \theta(t) + \Delta\,\dot{\theta}(t), \tag{25}$$

$$
\begin{aligned}
\dot{\theta}(t+1) = \dot{\theta}(t) + \Delta \Bigg[ &\frac{(m + m_p)\,g \sin\theta(t)}{(4/3)(m + m_p)\,l - m_p l \cos^2\theta(t)}
- \frac{\cos\theta(t)\left[f(t) + m_p l\,\dot{\theta}(t)^2 \sin\theta(t) - \mu_c\,\mathrm{sgn}(\dot{x}(t))\right]}{(4/3)(m + m_p)\,l - m_p l \cos^2\theta(t)} \\
&- \frac{\mu_p (m + m_p)\,\dot{\theta}(t) / (m_p l)}{(4/3)(m + m_p)\,l - m_p l \cos^2\theta(t)} \Bigg],
\end{aligned}
\tag{26}
$$

$$x(t+1) = x(t) + \Delta\,\dot{x}(t), \tag{27}$$

$$
\dot{x}(t+1) = \dot{x}(t) + \Delta \left[ \frac{f(t) + m_p l \left[\dot{\theta}(t)^2 \sin\theta(t) - \ddot{\theta}(t) \cos\theta(t)\right]}{(m + m_p)} - \frac{\mu_c\,\mathrm{sgn}(\dot{x}(t))}{(m + m_p)} \right],
\tag{28}
$$

where

$$
\begin{aligned}
l &= 0.5\ \text{m}, && \text{the length of the pole;} \\
m &= 1.1\ \text{kg}, && \text{combined mass of the pole and the cart;} \\
m_p &= 0.1\ \text{kg}, && \text{mass of the pole;} \\
g &= 9.8\ \text{m/s}^2, && \text{acceleration due to gravity;} \\
\mu_c &= 0.0005, && \text{coefficient of friction of the cart on the track;} \\
\mu_p &= 0.000002, && \text{coefficient of friction of the pole on the cart;} \\
\Delta &= 0.02\ \text{s}, && \text{sampling interval.}
\end{aligned}
\tag{29}
$$
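As an illustration of how Eqs. (25)-(28) might be simulated, the following Python sketch performs one Euler step of the cart-pole dynamics with the parameter values of Eq. (29). Several points are assumptions of this sketch rather than statements from the paper: the signs inside the angular-acceleration term follow the standard cart-pole model, the factor (m + m_p) is used literally as it appears in the equations, angles are handled in radians (converting from the degrees used in the text), and all names are hypothetical.

```python
import math

POLE_LENGTH = 0.5     # l (m)
MASS = 1.1            # m (kg), as listed in Eq. (29)
POLE_MASS = 0.1       # m_p (kg)
GRAVITY = 9.8         # g (m/s^2)
MU_CART = 0.0005      # mu_c
MU_POLE = 0.000002    # mu_p
DELTA = 0.02          # sampling interval (s)


def sign(value):
    return (value > 0) - (value < 0)


def cart_pole_step(state, force):
    """One Euler step; state = (theta [rad], theta_dot, x, x_dot)."""
    theta, theta_dot, x, x_dot = state
    mass_sum = MASS + POLE_MASS  # (m + m_p), used as written in the equations
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    denom = (4.0 / 3.0) * mass_sum * POLE_LENGTH - POLE_MASS * POLE_LENGTH * cos_t ** 2
    drive = force + POLE_MASS * POLE_LENGTH * theta_dot ** 2 * sin_t - MU_CART * sign(x_dot)
    # Angular acceleration (bracketed term of Eq. (26)).
    theta_acc = (mass_sum * GRAVITY * sin_t
                 - cos_t * drive
                 - MU_POLE * mass_sum * theta_dot / (POLE_MASS * POLE_LENGTH)) / denom
    # Cart acceleration (bracketed term of Eq. (28)).
    x_acc = (force
             + POLE_MASS * POLE_LENGTH * (theta_dot ** 2 * sin_t - theta_acc * cos_t)
             - MU_CART * sign(x_dot)) / mass_sum
    return (theta + DELTA * theta_dot,
            theta_dot + DELTA * theta_acc,
            x + DELTA * x_dot,
            x_dot + DELTA * x_acc)


def has_failed(state):
    """Failure: pole beyond 12 degrees or cart beyond 2.4 m from the center."""
    theta, _, x, _ = state
    return abs(theta) > math.radians(12.0) or abs(x) > 2.4
```

Starting from rest with zero force, the state stays at the upright equilibrium; a positive force accelerates the cart to the right and tips the pole backward.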

Fig. 8. Flowchart of the R-GCSE.

Fig. 7. Schematic diagram of the R-GCSE for the RWNFS model.

Table 1
The initial parameters before training.

Parameters            Value
[σ_min, σ_max]        [0, 2]
[m_min, m_max]        [0, 2]
[θ_min, θ_max]        [−2, 2]
[w_min, w_max]        [−20, 20]

The constraints on the variables were $-12^\circ \le \theta \le 12^\circ$, $-2.4\ \text{m} \le x \le 2.4\ \text{m}$, and $-10\ \text{N} \le f \le 10\ \text{N}$. A control strategy is deemed successful if it can balance the pole for 100,000 time steps.

The four input variables $(\theta, \dot{\theta}, x, \dot{x})$ and the output $f(t)$ were normalized between 0 and 1 over the following ranges: $\theta$: [−12, 12], $\dot{\theta}$: [−240, 240], $x$: [−2.4, 2.4], $\dot{x}$: [−2.4, 2.4], and $f(t)$: [−10, 10]. The ranges of $\dot{\theta}$ and $\dot{x}$ were obtained from experiments with extreme boundary conditions. The cart was placed at ±2.4 m with the pole angle set at ±12°, and the maximum force of ±10 N was then applied to the cart. When the system failed, the observed $\dot{\theta}$ and $\dot{x}$ were taken as the boundaries.
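The normalization to [0, 1] is a linear min-max mapping over each stated range. A minimal sketch, with hypothetical function names and an inverse mapping added for the controller output, might look like this:

```python
def normalize(value, low, high):
    """Map value from [low, high] linearly onto [0, 1]."""
    return (value - low) / (high - low)


def denormalize(unit_value, low, high):
    """Map a value in [0, 1] back onto [low, high], e.g. for the force f(t)."""
    return low + unit_value * (high - low)
```

For instance, an upright pole (angle 0 in the range [−12, 12]) maps to 0.5, and a normalized controller output of 0.5 maps back to zero force in the range [−10, 10].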

The four normalized state variables were used as inputs to the RWNFS. The coding of a rule in a chromosome has the form shown in Fig. 3. The values are floating-point numbers initially assigned by the R-GCSE. The fitness function used to train the RWNFS in this example is defined in Eq. (24) and represents how long it takes before the pole falls past a certain angle ($|\theta| > 12^\circ$) or before the cart runs into the bounds of its track ($|x| > 2.4\ \text{m}$).

The initial parameters of the R-GCSE were determined by parameter exploration. The first study in parameter exploration was proposed by De Jong [51]. As shown in [51], a small population size is good for the initial performance, and a large population size is good for long-term performance. Moreover, a low mutation rate is good for on-line performance, and a high mutation rate is good for off-line performance. In [52], the author found from his simulation that the best population size and mutation rate were 30 and 0.01, respectively.

In this study, the parameters were found using the method given in [52]. The number of fuzzy rules was varied from 2 to 20 in increments of 1, the group size from 10 to 100 in increments of 10, the crossover rate from 0.25 to 1 in increments of 0.05, and the mutation rate from 0 to 0.3 in exponential increments. The parameters set for the R-GCSE are shown in Table 2.

Table 2

Parameters        Value
Fuzzy rules       4
Group size        30
Crossover rate    0.4
Mutation rate     0.15

Four rules were used to construct the RWNFS. A total of 30 runs were performed. Each run started at a different initial state ($\dot{\theta}$ and $\dot{x}$ were set to 0, and $\theta$ and $x$ were set randomly according to the predefined ranges). The learning curve of the R-GCSE after 30 runs is shown in Fig. 10(a). The learning curve represents how long it took before the cart-pole balancing system failed. As shown in this figure, the RWNFS learned to balance the pole in the 198th generation on average. The standard deviation in Fig. 10(a) is 72.43. When the R-GCSE was stopped, the best combination of strings from the groups in the final generation was selected and tested on the cart-pole balancing system. The obtained fuzzy rules of the RWNFS are as follows:

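$$
\begin{aligned}
R_1:\ &\text{IF } I_{11} \text{ is } A_{1,1}(0.134, 0.038) \text{ and } I_{12} \text{ is } A_{2,1}(0.61, 1.12) \text{ and } I_{13} \text{ is } A_{3,1}(1.02, 0.71) \text{ and } I_{14} \text{ is } A_{4,1}(0.83, 0.12) \\
&\text{THEN } \hat{y}^{1}_{1} = 1.533\varphi_{0.0} - 0.147\varphi_{1.0} + 0.011\varphi_{1.1} + 0.147\varphi_{2.0} \\
R_2:\ &\text{IF } I_{21} \text{ is } A_{1,2}(0.43, 0.93) \text{ and } I_{22} \text{ is } A_{2,2}(0.28, 0.81) \text{ and } I_{23} \text{ is } A_{3,2}(0.32, 0.18) \text{ and } I_{24} \text{ is } A_{4,2}(0.61, 0.87) \\
&\text{THEN } \hat{y}^{1}_{2} = 0.45\varphi_{0.0} + 0.42\varphi_{1.0} + 0.013\varphi_{1.1} - 0.783\varphi_{2.0} \\
R_3:\ &\text{IF } I_{31} \text{ is } A_{1,3}(0.96, 0.21) \text{ and } I_{32} \text{ is } A_{2,3}(0.56, 0.14) \text{ and } I_{33} \text{ is } A_{3,3}(1.01, 0.46) \text{ and } I_{34} \text{ is } A_{4,3}(0.38, 0.39) \\
&\text{THEN } \hat{y}^{1}_{3} = 0.074\varphi_{0.0} + 0.23\varphi_{1.0} - 0.17\varphi_{1.1} - 0.38\varphi_{2.0} \\
R_4:\ &\text{IF } I_{41} \text{ is } A_{1,4}(0.64, 0.58) \text{ and } I_{42} \text{ is } A_{2,4}(0.21, 0.34) \text{ and } I_{43} \text{ is } A_{3,4}(0.56, 0.31) \text{ and } I_{44} \text{ is } A_{4,4}(0.82, 0.58) \\
&\text{THEN } \hat{y}^{1}_{4} = 0.183\varphi_{0.0} - 0.193\varphi_{1.0} + 0.412\varphi_{1.1} + 0.039\varphi_{2.0}
\end{aligned}
$$

The simulation was carried out for 30 runs. The results, which consist of the pole angle, cart position, and controller output, are shown in Fig. 11. Each line in Fig. 11 represents one run with a different initial state. The results shown in this figure are the first 1,000 time steps of the 100,000 control time steps. As shown in Fig. 11, the R-GCSE successfully controlled the cart-pole balancing system in all 30 runs.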

In this example, in order to demonstrate the effectiveness and efficiency of the R-GCSE, the reinforcement symbiotic evolution (R-SE) [49] and the reinforcement genetic algorithm (R-GA) [26] were applied to the same problem. In the R-SE and R-GA, the parameters were set according to [52]. The search ranges were as follows: the number of fuzzy rules ranged from 2 to 20 in increments of 1, the population size from 10 to 250 in increments of 10, the crossover rate from 0.25 to 1 in increments of 0.05, and the mutation rate from 0 to 0.3 in exponential increments. The parameters set for the two methods (the R-SE and R-GA) were as follows: (1) the numbers of fuzzy rules were both set to 4; (2) the population sizes of the R-SE and R-GA were 170 and 70, respectively; (3) the crossover rates of the R-SE and R-GA were 0.55 and 0.6, respectively; (4) the mutation rates of the R-SE and R-GA were 0.08 and 0.12, respectively.

A total of 30 runs were performed. Each run started from a different initial state. The fitness is defined in Eq. (24). The learning curves of the R-SE and R-GA after 30 runs are shown in Fig. 10(b) and (c). The R-SE and R-GA learned to balance the pole in the 346th and 514th generations on average, respectively. The standard deviations in Fig. 10(b) and (c) are 141.37 and 199.12, respectively. The R-GCSE is compared with the R-SE and R-GA only in terms of the fitness value. This is because, in the reinforcement learning signal design adopted in this study, a well-performing controller is defined as a controller that does not exceed the predefined boundaries. As shown in Fig. 11, the control capabilities of the R-GCSE are better than those of [26,49].

Genetic reinforcement learning for neuro control (GENITOR) [48], symbiotic adaptive neuro-evolution (SANE) [43], temporal difference and genetic algorithm-based reinforcement learning (TDGAR) [30], combination of online clustering and Q-value based GA for reinforcement fuzzy system (CQGAF) [33], and enforced sub-population (ESP) [34] methods were applied to the same control problem. The simulation results are listed in Table 3, which shows the number of pole-balance trials (which reflects the number of training episodes required) and the CPU time. The experiments were run on a 400 MHz Pentium III with 512 MB of memory, using Visual C++ 6.0.

A total of 30 runs were performed. Each run started from a different initial state. The initial parameters of these methods [30,33,34,43,48] were determined according to [52]. In [48], the normal evolution algorithm was used to evolve the weights of a fully-connected two-layer neural network, with additional connections from each input unit to the output layer. After trial-and-error tests, the network size in [48] was set to 10. In [43], the symbiotic evolution algorithm was used to evolve a two-layer neural network, and the network size was also 10.

The learning system of the TDGAR [30] consists of a critic network and an action network. The critic network is a standard three-layer feedforward network that uses sigmoid functions in the hidden layer and output layer. The action network is a fuzzy neural network with five layers of nodes, and each layer performs one stage of the fuzzy inference process. There are five hidden nodes in the critic network and five rules in the action network. In the CQGAF [33], a fuzzy controller with a Q-value based GA was proposed to solve control problems. After trial-and-error tests, the final average number of rules in the CQGAF over 30 runs was 8 using the on-line clustering algorithm.

In the ESP [34], the author proposed using enforced sub-populations to evaluate the solution locally. There are five sub-populations in the ESP. The other parameters set for the five methods [30,33,34,43,48] were as follows: (1) the population sizes of the five methods were 130, 170, 100, 130 and 40, respectively; (2) the crossover rates of the five methods were 0.45, 0.55, 0.35, 0.45 and 0.5, respectively; (3) the mutation rates of the five methods were 0.21, 0.17, 0.16, 0.24 and 0.18, respectively.

As shown in Table 3, the proposed R-GCSE method is feasible and effective, and it requires fewer time steps and less CPU time than the other existing methods.

To demonstrate the efficiency of the RWNFS, two different networks are compared in this example: the RWNFS and the TSK-type recurrent neuro-fuzzy network (TRFN) [39]. Four rules are used to construct the TRFN. The parameters of the R-GCSE used to train the TRFN are the same as those used to train the RWNFS. A performance (time steps and CPU time) comparison of the two models is shown in Table 4.

To demonstrate the efficiency of the proposed GSE and ECCS, three different methods were used in this example: the R-GCSE without the ECCS (Type I), the R-SE method (Type II), and the R-GCSE (Type III). In the Type I method, each group performed the two-point crossover strategy independently. In the Type II method, the R-SE [49] was adopted. In the Type III method, the R-GCSE used the ECCS to perform the crossover strategy. In the Type I method, the parameters were set according to [52] as follows: (1) the number of fuzzy rules was 4; (2) the population size was 40; (3) the crossover rate was 0.45; (4) the mutation rate was 0.07. The performance (time steps and CPU time) of the three types of methods is shown in Table 5. The R-GCSE (Type III) performs better than the other two types of methods. The comparison of the Type III and Type I methods in Table 5 shows that the ECCS reduces both the time steps and the CPU time.
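The effect of the ECCS can be quantified directly from the mean values reported in Table 5. The short sketch below is only an arithmetic illustration using those two table rows; the helper name `relative_reduction` is hypothetical.

```python
def relative_reduction(baseline, improved):
    """Fraction by which `improved` is smaller than `baseline`."""
    return (baseline - improved) / baseline


# Mean values taken from Table 5 (Type I: without ECCS, Type III: with ECCS).
type_i = {"steps": 257, "seconds": 15.34}
type_iii = {"steps": 198, "seconds": 11.64}

step_saving = relative_reduction(type_i["steps"], type_iii["steps"])
time_saving = relative_reduction(type_i["seconds"], type_iii["seconds"])

print(f"steps reduced by {step_saving:.1%}, CPU time reduced by {time_saving:.1%}")
```

On these figures, adding the ECCS cuts the mean number of steps and the mean CPU time by a little under a quarter each.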

Although the R-GCSE performs better than the other methods on the cart-pole balancing problem, solutions to this problem can be found too quickly for it to be a demanding benchmark. For this reason, extensions of the basic cart-pole balancing problem have been used. In [53], the author proposed several variations of the cart-pole balancing problem. The most challenging extension in [53] was a double pole balancing problem, in which two poles of different lengths must be balanced simultaneously.

Table 3
Comparison of time steps and CPU time for various existing models in Example 1.

| Method | Mean (steps) | Mean (s) | Best (steps) | Best (s) | Worst (steps) | Worst (s) | Std. dev. (steps) | Std. dev. (s) |
|---|---|---|---|---|---|---|---|---|
| GENITOR [48] | 1981 | 69.65 | 519 | 20.54 | 3143 | 185.51 | 598.78 | 62.54 |
| SANE [43] | 879 | 34.25 | 89 | 11.15 | 1541 | 75.34 | 337.91 | 24.97 |
| R-GA [26] | 514 | 25.34 | 78 | 8.23 | 887 | 64.75 | 199.12 | 23.88 |
| R-SE [49] | 346 | 21.37 | 56 | 7.87 | 658 | 61.39 | 141.37 | 23.67 |
| TDGAR [30] | 327 | 31.34 | 23 | 10.84 | 469 | 69.91 | 124.77 | 24.28 |
| ESP [34] | 294 | 18.92 | 14 | 3.08 | 401 | 34.74 | 91.56 | 8.37 |
| CQGAF [33] | 264 | 28.77 | 15 | 6.24 | 376 | 57.49 | 95.82 | 14.67 |
| R-GCSE | 198 | 11.64 | 12 | 2.34 | 314 | 26.54 | 72.43 | 7.29 |

Table 4
Comparison of time steps and CPU time for two different networks in Example 1.

| Method | Mean (steps) | Mean (s) | Best (steps) | Best (s) | Worst (steps) | Worst (s) | Std. dev. (steps) | Std. dev. (s) |
|---|---|---|---|---|---|---|---|---|
| RWNFS | 198 | 12.64 | 12 | 2.34 | 314 | 28.54 | 72.43 | 7.29 |
| RTNFN | 232 | 15.83 | 14 | 3.43 | 331 | 31.45 | 81.58 | 7.93 |

Table 5
Comparison of time steps and CPU time for three different methods.

| Method | Mean (steps) | Mean (s) | Best (steps) | Best (s) | Worst (steps) | Worst (s) | Std. dev. (steps) | Std. dev. (s) |
|---|---|---|---|---|---|---|---|---|
| Type I | 257 | 15.34 | 13 | 2.94 | 371 | 31.58 | 89.67 | 8.19 |
| Type II | 346 | 21.37 | 56 | 7.87 | 658 | 61.39 | 141.37 | 23.67 |
| Type III | 198 | 11.64 | 12 | 2.34 | 314 | 27.54 | 72.43 | 7.29 |

Therefore, a double pole balancing problem was used to evaluate the R-GCSE. There are six state variables in the system: $\theta_i$, the angle of the $i$th pole; $\dot{\theta}_i$, the angular velocity of the $i$th pole; $x$, the position of the cart; and $\dot{x}$, the velocity of the cart, with $i = 1, 2$. The only control action is $F$, the amount of force applied to the cart to move it left or right. The system fails when a pole falls past a certain angle ($\pm 36^\circ$ was used here) or when the cart runs into the bounds of its track (the distance from the center to each bound of the track is 2.4 m). The equations of motion for $N$ poles balanced on a single cart are as follows:

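$$\ddot{x} = \frac{F - \mu_c\,\mathrm{sgn}(\dot{x}) + \sum_{i=1}^{N}\tilde{F}_i}{M + \sum_{i=1}^{N}\tilde{m}_i}, \tag{30}$$

$$\ddot{\theta}_i = -\frac{3}{4 l_i}\left(\ddot{x}\cos\theta_i + g\sin\theta_i + \frac{\mu_{pi}\dot{\theta}_i}{m_i l_i}\right), \tag{31}$$

where $\tilde{F}_i$ is the effective force from the $i$th pole, given by

$$\tilde{F}_i = m_i l_i \dot{\theta}_i^2 \sin\theta_i + \frac{3}{4} m_i \cos\theta_i\left(\frac{\mu_{pi}\dot{\theta}_i}{m_i l_i} + g\sin\theta_i\right), \tag{32}$$

and $\tilde{m}_i$ is the effective mass of the $i$th pole, given by

$$\tilde{m}_i = m_i\left(1 - \frac{3}{4}\cos^2\theta_i\right). \tag{33}$$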

The parameters used for the double pole problem are shown in Table 6. The parameters set for the R-GCSE are shown in Table 7. A total of 30 runs were performed. Each run started from a different initial state.
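Eqs. (30)-(33) with the Table 6 parameters can be evaluated as in the following minimal sketch. It is an illustration, not the authors' simulator. The value of $g$ is not given in the text: $g = -9.8\ \mathrm{m/s^2}$ is assumed here, the sign convention under which these equations make the upright position unstable. All function and constant names are hypothetical.

```python
import math

GRAVITY = -9.8        # assumption: not stated in the text (see lead-in)
M_CART = 1.0          # mass of cart, kg (Table 6)
MU_C = 0.0005         # friction of cart on track (Table 6)
MU_P = 0.000002       # friction of each pole's hinge (Table 6)
POLES = [(0.1, 0.5), (0.01, 0.05)]   # (mass m_i in kg, half length l_i in m)


def accelerations(x_dot, thetas, theta_dots, force):
    """Return (cart acceleration, list of pole angular accelerations)."""
    f_sum = 0.0   # sum of effective forces, Eq. (32)
    m_sum = 0.0   # sum of effective masses, Eq. (33)
    for (m, l), th, thd in zip(POLES, thetas, theta_dots):
        hinge = MU_P * thd / (m * l)
        f_sum += (m * l * thd ** 2 * math.sin(th)
                  + 0.75 * m * math.cos(th) * (hinge + GRAVITY * math.sin(th)))
        m_sum += m * (1.0 - 0.75 * math.cos(th) ** 2)

    sgn = (x_dot > 0) - (x_dot < 0)
    x_acc = (force - MU_C * sgn + f_sum) / (M_CART + m_sum)          # Eq. (30)

    theta_accs = [
        -0.75 / l * (x_acc * math.cos(th) + GRAVITY * math.sin(th)
                     + MU_P * thd / (m * l))                         # Eq. (31)
        for (m, l), th, thd in zip(POLES, thetas, theta_dots)
    ]
    return x_acc, theta_accs


def failed(x, thetas):
    """Failure test from the text: |x| > 2.4 m or any |theta_i| > 36 degrees."""
    limit = math.radians(36.0)
    return abs(x) > 2.4 or any(abs(th) > limit for th in thetas)
```

With both poles upright and at rest and no applied force, all accelerations are zero; a small tilt with no force makes both poles accelerate further in the direction of the tilt, as expected for an unstable equilibrium.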

The R-SE [49], R-GA [26], GENITOR [48], SANE [43], TDGAR [30], CQGAF [33], and ESP [34] were also applied to the same problem. In these seven methods, the parameters were set according to [52]. A total of 30 runs were performed. Each run started from a different initial state. In [26,49], the numbers of fuzzy rules were both set to 6. In [48], the network size was 18. In [43], the network size was 16. In the TDGAR [30], there were 10 hidden nodes in the critic network and 10 rules in the action network. In the CQGAF [33], the final average number of rules over 30 runs was 13. In the ESP [34], there were eight sub-populations. The parameters set for the seven methods [26,30,33,34,43,48,49] were as follows: (1) the population sizes of the seven methods were 210, 120, 180, 240, 160, 200 and 60, respectively; (2) the crossover rates of the seven methods were 0.5, 0.6, 0.40, 0.55, 0.45, 0.35 and 0.45, respectively; (3) the mutation rates of the seven methods were 0.12, 0.22, 0.21, 0.15, 0.26, 0.14 and 0.16, respectively.

Table 8 compares the time steps and CPU time of the R-GCSE with those of the other existing methods [26,30,33,34,43,48,49] on the double pole balancing problem. The comparison shows that the R-GCSE is feasible and effective and requires less CPU time than the other existing models on the double pole balancing problem.

5.2. Example 2: Control of a ball and beam system

The ball and beam system [49,50] is shown in Fig. 12. The beam is made to rotate in a vertical plane by applying a torque at the center of rotation, and the ball is free to roll along the beam. The goal is for the ball to remain in contact with the beam. The ball and beam system can be written in state space form as

$$\begin{bmatrix}\dot{x}_1\\ \dot{x}_2\\ \dot{x}_3\\ \dot{x}_4\end{bmatrix} = \begin{bmatrix}x_2\\ B(x_1 x_4^2 - G\sin x_3)\\ x_4\\ 0\end{bmatrix} + \begin{bmatrix}0\\ 0\\ 0\\ 1\end{bmatrix}u, \qquad y = x_1, \tag{34}$$

where $x = (x_1, x_2, x_3, x_4)^T \equiv (r, \dot{r}, \theta, \dot{\theta})^T$ is the state of the system and $y = x_1 \equiv r$ is the output of the system. The control $u$ is the angular acceleration ($\ddot{\theta}$), and the parameters $B = 0.7143$ and $G = 9.81$ were chosen in this system. The purpose of control is to determine $u(x)$ such that the closed-loop system output $y$ will converge to zero from different initial conditions.
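The state-space model of Eq. (34) can be written as a short function. The sketch below is illustrative only: the forward-Euler integrator and the step size `dt` are assumptions that do not come from the text, and the function names are hypothetical.

```python
import math

B = 0.7143   # parameter B from the text
G = 9.81     # parameter G from the text


def ball_beam_derivative(x, u):
    """Right-hand side of Eq. (34) for state x = (r, r_dot, theta, theta_dot)."""
    x1, x2, x3, x4 = x
    return (x2,
            B * (x1 * x4 ** 2 - G * math.sin(x3)),
            x4,
            u)


def euler_step(x, u, dt=0.01):
    """One forward-Euler step; the integrator and dt are assumptions."""
    dx = ball_beam_derivative(x, u)
    return tuple(xi + dt * dxi for xi, dxi in zip(x, dx))
```

The origin with $u = 0$ is an equilibrium of this model, which is the state the controller is meant to reach.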

According to the input/output-linearization algorithm [54], the control law $u(x)$ is determined as follows: in state $x$, $v(x) = -\alpha_3\phi_4(x) - \alpha_2\phi_3(x) - \alpha_1\phi_2(x) - \alpha_0\phi_1(x)$, where $\phi_1(x) = x_1$, $\phi_2(x) = x_2$, $\phi_3(x) = -BG\sin x_3$, $\phi_4(x) = -BGx_4\cos x_3$, and the $\alpha_i$ are chosen so that $s^4 + \alpha_3 s^3 + \alpha_2 s^2 + \alpha_1 s + \alpha_0$ is a Hurwitz polynomial. We compute $a(x) = -BG\cos x_3$ and $b(x) = BGx_4^2\sin x_3$; then $u(x) = [v(x) - b(x)]/a(x)$.
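The control law above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the text does not give the coefficients $\alpha_i$, so the values below, taken from $(s+1)^4 = s^4 + 4s^3 + 6s^2 + 4s + 1$ (a Hurwitz polynomial), are a hypothetical choice, and the function name is not from the paper.

```python
import math

B = 0.7143
G = 9.81

# Hypothetical coefficients (alpha_0, alpha_1, alpha_2, alpha_3) from (s + 1)^4.
ALPHA = (1.0, 4.0, 6.0, 4.0)


def linearizing_control(x):
    """Approximate input/output-linearizing control u(x) for the ball and beam."""
    x1, x2, x3, x4 = x
    phi1 = x1
    phi2 = x2
    phi3 = -B * G * math.sin(x3)
    phi4 = -B * G * x4 * math.cos(x3)

    a0, a1, a2, a3 = ALPHA
    v = -a3 * phi4 - a2 * phi3 - a1 * phi2 - a0 * phi1

    a = -B * G * math.cos(x3)            # vanishes when the beam is vertical
    b = B * G * x4 ** 2 * math.sin(x3)
    return (v - b) / a
```

Note that $a(x)$ is zero at $x_3 = \pm\pi/2$, so the law is only defined away from a vertical beam; within the small beam angles considered in this example this is not a restriction.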

Table 6
The parameters for the double pole balancing problem.

| Parameters | Description | Value |
|---|---|---|
| $x$ | Position of the cart | [−2.4, 2.4] m |
| $\theta$ | Angle of the pole | [−36, 36] deg |
| $F$ | Force applied to cart | [−10, 10] N |
| $l_1$ | Half length of 1st pole | 0.5 m |
| $l_2$ | Half length of 2nd pole | 0.05 m |
| $M$ | Mass of cart | 1.0 kg |
| $m_1$ | Mass of the 1st pole | 0.1 kg |
| $m_2$ | Mass of the 2nd pole | 0.01 kg |
| $\mu_c$ | Coefficient of friction of cart on track | 0.0005 |
| $\mu_p$ | Coefficient of friction of the $i$th pole's hinge | 0.000002 |

Table 7

| Parameters | Value |
|---|---|
| Fuzzy rules | 6 |
| Group size | 60 |
| Crossover rate | 0.5 |
| Mutation rate | 0.18 |

Table 8
Comparison of time steps and CPU time in a double pole balancing problem.

| Method | Mean (steps) | Mean (s) | Best (steps) | Best (s) | Worst (steps) | Worst (s) | Std. dev. (steps) | Std. dev. (s) |
|---|---|---|---|---|---|---|---|---|
| GENITOR [48] | 31760 | 412.49 | 6560 | 91.85 | 60120 | 572.54 | 14892.45 | 109.69 |
| SANE [43] | 14800 | 225.73 | 3140 | 63.87 | 25560 | 276.54 | 4589.87 | 60.23 |
| R-GA [26] | 12250 | 218.34 | 2870 | 49.56 | 21150 | 251.68 | 3404.33 | 52.97 |
| R-SE [49] | 9790 | 192.67 | 2150 | 47.49 | 16870 | 241.67 | 2943.62 | 47.38 |
| TDGAR [30] | 8970 | 231.45 | 1987 | 56.37 | 15230 | 258.74 | 2314.24 | 54.05 |
| CQGAF [33] | 6790 | 187.96 | 1230 | 39.54 | 9870 | 238.95 | 1691.38 | 46.32 |
| ESP [34] | 4120 | 123.73 | 396 | 26.18 | 7190 | 214.51 | 1186.73 | 37.89 |
| R-GCSE | 3380 | 98.47 | 267 | 21.48 | 6390 | 191.78 | 1056.54 | 32.45 |

The four input variables $(r, \dot{r}, \theta, \dot{\theta})$ and the output $u(x)$ were normalized between 0 and 1. The values were floating-point numbers initially assigned to the R-GCSE. In the R-GCSE, the fitness function defined in Eq. (24) is again used to train the RWNFS; it represents how long the system runs before the beam deviates beyond a certain angle ($|\theta| > 12^\circ$) or before the ball reaches the end of the beam ($|r| > 2$ m). In this example, the parameters were set according to [52]. The parameters set for the R-GCSE are shown in Table 9.
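The idea of a fitness that measures how long the system runs before failure can be sketched as a step counter. This is a hedged illustration only: Eq. (24) is not reproduced in this section, so the exact fitness form is not claimed here. The angle limit is read as 12 degrees, the cap of 100,000 steps is taken from the number of control time steps mentioned in the text, and all names (`ball_beam_failed`, `steps_until_failure`, `controller`, `step`) are hypothetical.

```python
import math


def ball_beam_failed(r, theta):
    """Failure test: beam angle beyond 12 degrees or ball beyond 2 m."""
    return abs(theta) > math.radians(12.0) or abs(r) > 2.0


def steps_until_failure(controller, step, x0, max_steps=100_000):
    """Count control steps survived from state x0 = (r, r_dot, theta, theta_dot).

    `controller(x)` returns the control u, and `step(x, u)` returns the next
    state; both are supplied by the caller.
    """
    x = x0
    for t in range(max_steps):
        r, _, theta, _ = x
        if ball_beam_failed(r, theta):
            return t
        x = step(x, controller(x))
    return max_steps
```

A controller that never violates the boundaries reaches `max_steps`, which matches the notion of a well-performing controller used in this study.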

Five rules were used to construct the RWNFS. A total of 30 runs were performed. Each run started from the same initial state. The learning curve of the RWNFS is shown in Fig. 13(a). The RWNFS learned to balance the ball in the 121st generation on average. The standard deviation in Fig. 13(a) is 37.21. When the learning process was stopped, the best combination of strings from the groups in the final generation was selected and tested on the ball and beam system. The obtained fuzzy rules of the RWNFS are as follows:

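R1: If $I_{11}$ is $A_{1,1}(0.37, 0.42)$ and $I_{12}$ is $A_{2,1}(0.51, 0.21)$ and $I_{13}$ is $A_{3,1}(0.66, 0.18)$ and $I_{14}$ is $A_{4,1}(0.13, 0.31)$ Then $\hat{y}_{11} = 1.06\varphi_{0.0} + 2.41\varphi_{1.0} - 0.31\varphi_{1.1} - 5.06\varphi_{2.0} + 0.36\varphi_{2.1}$

R2: If $I_{21}$ is $A_{1,2}(0.81, 0.37)$ and $I_{22}$ is $A_{2,2}(0.63, 0.63)$ and $I_{23}$ is $A_{3,2}(0.57, 0.23)$ and $I_{24}$ is $A_{4,2}(0.33, 0.011)$ Then $\hat{y}_{12} = 3.11\varphi_{0.0} + 1.01\varphi_{1.0} + 0.07\varphi_{1.1} + 0.29\varphi_{2.0} - 1.33\varphi_{2.1}$

R3: If $I_{31}$ is $A_{1,3}(0.37, 0.61)$ and $I_{32}$ is $A_{2,3}(0.72, 0.22)$ and $I_{33}$ is $A_{3,3}(0.93, 0.35)$ and $I_{34}$ is $A_{4,3}(0.89, 0.17)$ Then $\hat{y}_{13} = 0.23\varphi_{0.0} - 0.28\varphi_{1.0} - 1.19\varphi_{1.1} + 0.38\varphi_{2.0} - 0.04\varphi_{2.1}$

R4: If $I_{41}$ is $A_{1,4}(0.46, 0.33)$ and $I_{42}$ is $A_{2,4}(0.12, 0.94)$ and $I_{43}$ is $A_{3,4}(0.39, 0.77)$ and $I_{44}$ is $A_{4,4}(0.07, 0.83)$ Then $\hat{y}_{14} = 0.06\varphi_{0.0} + 0.17\varphi_{1.0} + 0.35\varphi_{1.1} + 0.15\varphi_{2.0} - 1.46\varphi_{2.1}$

R5: If $I_{51}$ is $A_{1,5}(0.68, 0.03)$ and $I_{52}$ is $A_{2,5}(0.33, 0.24)$ and $I_{53}$ is $A_{3,5}(0.87, 0.41)$ and $I_{54}$ is $A_{4,5}(0.94, 0.06)$ Then $\hat{y}_{15} = 0.74\varphi_{0.0} - 0.63\varphi_{1.0} + 1.24\varphi_{1.1} + 0.91\varphi_{2.0} + 0.04\varphi_{2.1}$

Fig. 12. The ball and beam system.

Table 9

| Parameters | Value |
|---|---|
| Fuzzy rules | 5 |
| Group size | 40 |
| Crossover rate | 0.45 |
| Mutation rate | 0.18 |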

The simulation was run 30 times. The results, which consist of the beam angle, ball position, and controller output, are shown in Fig. 14. The results shown in this figure are the first 1,000 of the 100,000 control time steps. As shown in Fig. 14, the R-GCSE successfully controlled the ball and beam system in all 30 runs. The results show that the trained RWNFS controls the ball and beam system well.

In this example, as in Example 1, the performance of the R-GCSE was compared with that of other methods (the R-SE [49] and R-GA [26]). In [26,49], the parameters were set according to [52]. The parameters set for the R-SE and R-GA were as follows: (1) the numbers of fuzzy rules were both set to 5; (2) the population sizes of the R-SE and R-GA were 180 and 100, respectively; (3) the crossover rates of the R-SE and R-GA were 0.4 and 0.5, respectively; (4) the mutation rates of the R-SE and R-GA were 0.10 and 0.05, respectively. A total of 30 runs were performed. Each run started from the same initial state. The learning curves of the R-SE and R-GA are shown in Fig. 13(b) and (c). The R-SE [49] and R-GA [26] learned to balance the ball in the 217th and 386th generations on average, respectively. The standard deviations in Fig. 13(b) and (c) are 74.21 and 132.68, respectively.

The performance (time steps and CPU time) in this example compared with various existing models [26,30,33,34,43,48,49] is shown in Table 10. In [48], the network size was 11. In [43], the network size was 10. In the TDGAR, there were six hidden nodes in the critic network and six rules in the action network. In the CQGAF, the final average number of rules over 30 runs was 8. In the ESP, there were five sub-populations. The parameters set for the five methods [30,33,34,43,48] were as follows: (1) the population sizes of the five methods were 140, 170, 120, 150 and 60, respectively; (2) the crossover rates of the five methods were 0.45, 0.45, 0.35, 0.4 and 0.5, respectively; and (3) the mutation rates of the five methods were 0.16, 0.24, 0.18, 0.12 and 0.21, respectively. A total of 30 runs were performed. Each run started from the same initial state. As shown in Table 10, the R-GCSE requires fewer time steps and less CPU time than the existing models.

6. Conclusion

In this paper, a recurrent wavelet-based neuro-fuzzy system (RWNFS) with the reinforcement group cooperation-based symbiotic evolution method (R-GCSE) was proposed. The R-GCSE can evaluate fuzzy rules locally and make groups cooperate with each other to generate better chromosomes by using an elite-based compensation crossover strategy (ECCS). The advantages of the R-GCSE are summarized as follows: (1) the R-GCSE uses a group-based population to evaluate fuzzy rules locally; (2) the R-GCSE uses the ECCS to let the better solutions from different groups cooperate in order to generate better solutions in the next generation; and (3) in the simulations, the R-GCSE converges in fewer generations and requires less CPU time than the genetic methods it was compared with. Computer simulations on the cart-pole, double pole, and ball and beam problems show that the R-GCSE performs better than the other methods.

Acknowledgment

This work is supported in part by the National Science Council, Taiwan, ROC, under Grants NSC 2221-E-009-214 and NSC 95-2752-E-009-011-PAE.

References

[1] C.T. Lin, C.S.G. Lee, Neural Fuzzy Systems: A Neuro-Fuzzy Synergism to Intelligent System, Prentice-Hall, NJ, 1996.

[2] G.G. Towell, J.W. Shavlik, Extracting reﬁned rules from knowledge-based neural networks, Mach. Learn. 13 (1993) 71–101.

[3] C.J. Lin, C.T. Lin, An ART-based fuzzy adaptive learning control network, IEEE Trans. Fuzzy Syst. 5 (4) (1997) 477–496.

[4] L.X. Wang, J.M. Mendel, Generating fuzzy rules by learning from examples, IEEE Trans. Syst. Man Cybern. 22 (6) (1992) 1414–1427.

[5] T. Takagi, M. Sugeno, Fuzzy identification of systems and its applications to modeling and control, IEEE Trans. Syst. Man Cybern. 15 (1985) 116–132.

[6] C.F. Juang, C.T. Lin, An on-line self-constructing neural fuzzy inference network and its applications, IEEE Trans. Fuzzy Syst. 6 (1) (1998) 12–31.

[7] J.S.R. Jang, ANFIS: adaptive-network-based fuzzy inference system, IEEE Trans. Syst. Man Cybern. 23 (3) (1993) 665–685.

[8] F.J. Lin, C.H. Lin, P.H. Shen, Self-constructing fuzzy neural network speed controller for permanent-magnet synchronous motor drive, IEEE Trans. Fuzzy Syst. 9 (5) (2001) 751–759.

[9] H. Takagi, N. Suzuki, T. Koda, Y. Kojima, Neural networks designed on approximate reasoning architecture and their application, IEEE Trans. Neural Networks 3 (5) (1992) 752–759.

[10] E. Mizutani, J.S.R. Jang, Coactive neural fuzzy modeling, in: Proceedings of International Conference on Neural Networks, 1995, pp. 760–765.

[11] C.J. Lin, C.C. Chin, Prediction and identiﬁcation using wavelet-based recurrent fuzzy neural networks, IEEE Trans. Syst. Man Cybern. Part B 34 (5) (2004) 2144–2154.

[12] K.S. Narendra, K. Parthasarathy, Identiﬁcation and control of dynamical systems using neural networks, IEEE Trans. Neural Networks 1 (1) (1990) 4–27.

[13] C.F. Juang, C.T. Lin, A recurrent self-organizing neural fuzzy inference network, IEEE Trans. Neural Networks 10 (4) (1999) 828–845.

[14] P.A. Mastorocostas, J.B. Theocharis, A recurrent fuzzy-neural model for dynamic system identiﬁcation, IEEE Trans. Syst. Man Cybern. Part B 32 (2) (2002) 176–190.

[15] X. Xu, H.G. He, Residual-gradient-based neural reinforcement learning for the optimal control of an acrobat, in: Proceedings of IEEE International Conference on Intelligent Control, 2002, pp. 27–30.

[16] O. Grigore, Reinforcement learning neural network used in control of nonlinear systems, in: Proceedings of IEEE International Conference on Industrial Technology, vol. 1, 2000, pp. 19–22.

[17] A.G. Barto, R.S. Sutton, C.W. Anderson, Neuronlike adaptive elements that can solve difficult learning control problems, IEEE Trans. Syst. Man Cybern. 13 (5) (1983) 834–847.

[18] C.J. Lin, A GA-based neural network with supervised and reinforcement learning, J. Chin. Inst. Electr. Eng. 9 (1) (2002) 11–25.

[19] X.W. Yan, Z.D. Deng, Z.Q. Sun, Competitive Takagi–Sugeno fuzzy reinforcement learning, in: Proceedings of IEEE International Conference on Control Applications, 2001, pp. 878–883.

[20] C.T. Lin, C.P. Jou, GA-based fuzzy reinforcement learning for control of a magnetic bearing system, IEEE Trans. Syst. Man Cybern. Part B 30 (2) (2000) 276–289.

[21] H.R. Berenji, P. Khedkar, Learning and tuning fuzzy logic controllers through reinforcements, IEEE Trans. Neural Networks 3 (5) (1992) 724–740.

[22] D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, Reading, MA, 1989.

[23] J.R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection, MIT Press, Cambridge, MA, 1992.

[24] L.J. Fogel, Evolutionary programming in perspective: the top-down view, in: J.M. Zurada, R.J. Marks II, C. Goldberg (Eds.), Computational Intelligence: Imitating Life, IEEE Press, Piscataway, NJ, 1994.

[25] I. Rechenberg, Evolution strategy, in: J.M. Zurada, R.J. Marks II, C. Goldberg (Eds.), Computational Intelligence: Imitating Life, IEEE Press, Piscataway, NJ, 1994.

[26] C.L. Karr, Design of an adaptive fuzzy logic controller using a genetic algorithm, in: Proceedings of the Fourth International Conference on Genetic Algorithms, 1991, pp. 450–457.

[27] M. Lee, H. Takagi, Integrating design stages of fuzzy systems using genetic algorithms, in: Proceedings of the Second IEEE International Conference on Fuzzy Systems, San Francisco, CA, 1993, pp. 612–617.

[28] K. Belarbi, F. Titel, Genetic algorithm for the design of a class of fuzzy controllers: an alternative approach, IEEE Trans. Fuzzy Syst. 8 (4) (2000) 398–405.

[29] C.F. Juang, A hybrid of genetic algorithm and particle swarm optimization for recurrent network design, IEEE Trans. Syst. Man Cybern. Part B 34 (2) (2004) 997–1006.

[30] C.T. Lin, C.P. Jou, GA-based fuzzy reinforcement learning for control of a magnetic bearing system, IEEE Trans. Syst. Man Cybern. Part B 30 (2) (2000) 276–289.

[31] C.F. Juang, J.Y. Lin, C.T. Lin, Genetic reinforcement learning through symbiotic evolution for fuzzy controller design, IEEE Trans. Syst. Man Cybern. Part B 30 (2) (2000) 290–302.

[32] K.S. Tang, Genetic algorithms in modeling and optimization, Ph.D. Dissertation, Department of Electronic Engineering, City University Hong Kong, Hong Kong, 1996.

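Table 10
Comparison of time steps and CPU time for various existing models in Example 2.

| Method | Mean (steps) | Mean (s) | Best (steps) | Best (s) | Worst (steps) | Worst (s) | Std. dev. (steps) | Std. dev. (s) |
|---|---|---|---|---|---|---|---|---|
| GENITOR [48] | 914 | 78.43 | 79 | 13.04 | 1531 | 98.64 | 334.37 | 23.80 |
| SANE [43] | 697 | 50.26 | 52 | 10.73 | 912 | 74.52 | 218.39 | 15.19 |
| R-GA [26] | 386 | 30.21 | 37 | 8.05 | 536 | 42.36 | 132.68 | 8.04 |
| R-SE [49] | 217 | 28.51 | 21 | 6.87 | 358 | 39.91 | 74.21 | 7.12 |
| TDGAR [30] | 194 | 31.25 | 24 | 9.23 | 347 | 46.43 | 67.36 | 8.45 |
| ESP [34] | 167 | 17.46 | 60 | 3.51 | 276 | 31.24 | 51.91 | 5.37 |
| CQGAF [33] | 148 | 21.51 | 18 | 5.27 | 264 | 34.21 | 48.67 | 5.91 |
| R-GCSE | 121 | 14.29 | 14 | 3.04 | 218 | 26.71 | 37.21 | 4.64 |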

[33] C.F. Juang, Combination of online clustering and Q-value based GA for reinforcement fuzzy system design, IEEE Trans. Fuzzy Syst. 13 (3) (2005) 289–302.

[34] F.J. Gomez, Robust non-linear control through neuroevolution, Ph.D. Dissertation, The University of Texas at Austin, 2003.

[35] F. Gomez, J. Schmidhuber, Co-evolving recurrent neurons learn deep memory POMDPs, in: Proceedings of Conference on Genetic and Evolutionary Computation, 2005, pp. 491–498.

[36] V. Kreinovich, O. Sirisaengtaksin, S. Cabrera, Wavelet neural networks are asymptotically optimal approximators for functions of one variable, in: Proceedings of IEEE Conference on Neural Networks, vol. 1, 1994, pp. 299–304.

[37] D.W.C. Ho, P.A. Zhang, J. Xu, Fuzzy wavelet networks for function learning, IEEE Trans. Fuzzy Syst. 9 (1) (2001) 200–211.

[38] J. Zhang, A.J. Morris, Recurrent neuro-fuzzy networks for nonlinear process modeling, IEEE Trans. Neural Networks 10 (2) (1999) 313–326.

[39] S.F. Su, F.Y. Yang, On the dynamical modeling with neural fuzzy networks, IEEE Trans. Neural Networks 13 (6) (2002) 1548–1553.

[40] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, Springer, New York, 1999.

[41] R. Tanese, Distributed genetic algorithm, in: Proceedings of International Conference on Genetic Algorithms, 1989, pp. 434–439.

[42] J. Arabas, Z. Michalewicz, J. Mulawka, GAVaPS—A genetic algorithm with varying population size, in: Proceedings of IEEE International Conference on Evolutionary Computation, Orlando, 1994, pp. 73–78.

[43] D.E. Moriarty, R. Miikkulainen, Efﬁcient reinforcement learning through symbiotic evolution, Mach. Learn. 22 (1996) 11–32.

[44] R.E. Smith, S. Forrest, A.S. Perelson, Searching for diverse, cooperative populations with genetic algorithms, Evol. Comput. 1 (2) (1993) 127–149.

[45] O. Cordon, F. Herrera, F. Hoffmann, L. Magdalena, Genetic Fuzzy Systems: Evolutionary Tuning and Learning of Fuzzy Knowledge Bases, Advances in Fuzzy Systems—Applications and Theory, vol. 19, World Scientific Publishing, NJ, 2001.

[46] C.J. Lin, Y.J. Xu, The design of TSK-type fuzzy controllers using a new hybrid learning approach, Int. J. Adaptive Control Signal Process. 20 (2006) 1–25.

[47] K.C. Cheok, N.K. Loh, A ball-balancing demonstration of optimal and disturbance-accommodating control, IEEE Control Syst. Mag. (1987) 54–57.

[48] D. Whitley, S. Dominic, R. Das, C.W. Anderson, Genetic reinforcement learning for neuro control problems, Mach. Learn. 13 (1993) 259–284.

[49] C.J. Lin, Y.J. Xu, Efﬁcient reinforcement learning through dynamical symbiotic evolution for TSK-type fuzzy controller design, Int. J. Gen. Syst. 34 (5) (2005) 559–578.

[50] J. Hauser, S. Sastry, P. Kokotovic, Nonlinear control via approximate input–output linearization: the ball and beam example, IEEE Trans. Autom. Control 37 (3) (1992) 392–398.

[51] K.A. De Jong, Analysis of the behavior of a class of genetic adaptive systems, Ph.D. Dissertation, The University of Michigan, Ann Arbor, MI, 1975.

[52] J.J. Grefenstette, Optimization of control parameters for genetic algorithms, IEEE Trans. Syst. Man Cybern. 16 (1) (1986) 122–128.

[53] A. Wieland, Evolving neural network controllers for unstable systems, in: Proceedings of IEEE Conference on Neural Networks, vol. 2, 1991, pp. 667–673.

Yung-Chi Hsu received the B.S. degree in Information Management from Ming-Hsin University of Science and Technology, Taiwan, ROC, in 2002 and the M.S. degree in Computer Science and Information Engineering from Chaoyang University of Technology, Taiwan, ROC. He is currently pursuing the Ph.D. degree in the Department of Electrical and Control Engineering at National Chiao Tung University, Taiwan, ROC. He is a member of the Phi Tau Phi. He is also a member of the Taiwanese Association for Artificial Intelligence (TAAI). His research interests include neural networks, fuzzy systems, and genetic algorithms.

Sheng-Fuu Lin was born in Tainan, the Republic of China, in 1954. He received the B.S. and M.S. degrees in Mathematics from National Normal University in 1976 and 1979, respectively, the M.S. degree in Computer Science from the University of Maryland in 1985, and the Ph.D. degree in Electrical Engineering from the University of Illinois, Champaign, in 1988.

Since 1988, he has been on the faculty of the Department of Electrical and Control Engineering at National Chiao Tung University, Hsinchu, Taiwan, where he is currently a professor.

His research interests include fuzzy systems, genetic algorithms, neural networks, automatic target recognition, scheduling, image processing, and image recognition.
