Incorporation family competition into Gaussian and Cauchy mutations to training neural networks using an evolutionary algorithm


Jinn-Moon Yang
Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan

Jorng-Tzong Horng
Department of Computer Science and Information Engineering, National Central University, ChungLi, Taiwan
horng@db.csie.ntu.edu.tw

Cheng-Yen Kao
Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan

Abstract- This paper presents an evolutionary technique to train neural networks in tasks requiring learning behavior. Based on family competition principles and adaptive rules, the proposed approach integrates decreasing-based mutations and self-adaptive mutations. The different mutations act separately as global and local search strategies to balance the trade-off between solution quality and convergence speed. The proposed algorithm is applied to two different task domains: Boolean functions and the artificial ant problem. Experimental results indicate that, on all tested problems, the proposed algorithm performs better than other canonical evolutionary algorithms, such as genetic algorithms, evolution strategies, and evolutionary programming. Moreover, essential components of the proposed algorithm, such as the mutation operators and adaptive rules, are thoroughly analyzed.

1 Introduction

As widely recognized, artificial neural networks (ANNs) [9] accomplish complex computational tasks such as language recognition, autonomous robotic control [13], and time-series prediction [10]. In addition to the approximation capabilities of multilayer feedforward networks for numerous functions [7], ANNs avoid the bias of a designer in shaping system development owing to their flexibility, robustness, and tolerance of noise. Training an ANN is usually formulated as a weight-training process, which seeks an optimal set of connection weights for a network according to some optimality criterion. Back propagation [12], a conventional training algorithm, implements a gradient descent search that attempts to minimize the total error between the actual output and the target output of an ANN. However, back propagation is susceptible to being trapped in local optima and is inefficient at searching for the global minimum of a vast, multimodal, and nondifferentiable function.

As global search approaches, evolutionary algorithms deal effectively with complex and nondifferentiable search spaces. Pertinent research [11], [19] has demonstrated that the search speed of evolutionary algorithms is comparable to that of back propagation if the genetic operators are well designed. Evolutionary algorithms have been used to train or evolve various ANN structures in many application domains.

Evolutionary methodologies can be categorized as genetic algorithms [6], evolutionary programming, and evolution strategies. Applying genetic algorithms to train neural networks may be unsatisfactory because recombination operators incur several problems, such as competing conventions [15] and the epistasis effect [2]. Epistasis, a nonlinear interaction among genes, dramatically retards genetic algorithms. To achieve better performance, modified approaches called real-coded genetic algorithms use a real-valued representation and strengthen the role of mutation operators to reduce these drawbacks. However, these real-coded genetic algorithms employ random mutations that make large jumps in the search space, which may be insufficient to achieve good solution quality. On the other hand, evolution strategies and evolutionary programming use a real-valued representation and focus on self-adaptive Gaussian mutation. Despite its successful application to various numerical optimization problems and its reputation as a good operator for local search, self-adaptive Gaussian mutation does not perform well on certain functions and is easily trapped in local optima on rugged functions [20], [21].

This paper presents an evolutionary algorithm, called the Family Competition Evolutionary Algorithm (FCEA), to train neural networks. The proposed algorithm combines four mutation operators: self-adaptive Gaussian mutation, self-adaptive Cauchy mutation, decreasing-based Gaussian mutation, and decreasing-based Cauchy mutation. FCEA relates these four operators to one another, balancing the exploration and exploitation power of the search by applying family competition and by automatically controlling the step sizes of the mutations. The operators compensate for one another's weaknesses, which enhances the performance of FCEA. To our knowledge, FCEA is the first approach to successfully integrate self-adaptive mutations with decreasing-based mutations via adaptive rules based on family competition principles.

The proposed algorithm is applied to two different problem areas: Boolean function learning [12] and an artificial ant problem [8], [14]. First, FCEA is applied to two well-known Boolean function problems, Xor and the 2-bit adder, to compare with previous results. Then, the proposed algorithm trains networks to learn how to generate different tracks based on the sensory inputs of an ant robot. FCEA performs better than genetic algorithms, evolution strategies, and evolutionary programming on both problems. This work also thoroughly analyzes the essential components of FCEA, such as the mutation operators and step sizes, and investigates the influence of the adaptive rules and strategy parameters of the proposed algorithm.

2 Family Competition Evolutionary Algorithm

The Family Competition Evolutionary Algorithm proposed herein is a multi-operator approach. FCEA incorporates four mutation operators: decreasing-based Gaussian mutation, self-adaptive Gaussian mutation, self-adaptive Cauchy mutation, and decreasing-based Cauchy mutation. Fig. 1 depicts the flow of FCEA. Each block in Fig. 1(a) indicates the use of a mutation operator together with the FC-adaptive procedure shown in Fig. 1(b) to generate a population of offspring. The power of these four mutation operators depends significantly on a factor that determines the perturbation size; this important factor is called the step size. The four mutation operators are the main operators of FCEA and are applied sequentially in four stages.

Figure 1: Overview of our algorithm: (a) FCEA; (b) FC-adaptive procedure.

FCEA in Fig. 1 works as follows. Initially, N networks are generated, and the fitness value of each network is evaluated. FCEA then enters the main evolutionary loop, which consists of four stages: the decreasing-based Gaussian mutation stage, the self-adaptive Cauchy mutation stage, the self-adaptive Gaussian mutation stage, and the decreasing-based Cauchy mutation stage. Each stage is realized by calling the FC-adaptive procedure illustrated in Fig. 1(b).

The FC-adaptive procedure takes four parameters, i.e., the parent population (P), the mutation operator (M), the selection operator (S), and the family competition length (L), and generates a new quasi-population, which becomes the parent population of the next stage. The kernel of FC-adaptive consists of family competition and adaptive rules for step sizes. In the family competition, each individual in the population sequentially becomes a "family father". The term "family father" is used herein to distinguish this individual from an ordinary parent, because a family is built around the "family father" in the family competition procedure. Next, the recombination operator and the mutation operator (M) are applied to the "family father" and another individual selected from the population to generate an offspring. This process is repeated according to the family length L, so that a family of L offspring is built from the "family father". These L offspring then compete with each other, and the one with the best objective value survives. The adaptive rules are then applied to adapt the step-size vector of this individual for the mutation operators. Therefore, the size of each new quasi-population remains N. Finally, the selection operator (S) chooses the N fittest individuals from the union of the parent population and the new quasi-population, and they become the parent population of the next stage. The following subsections describe the components of FCEA, including the chromosome representation, family competition, recombination operators, mutation operators, selection methods, and control rules.

2.1 Chromosome and Initialization

Each network is represented as a quadruple of n-dimensional vectors $(x, \sigma, v, \psi)$, where $n$ denotes the number of connection links of an ANN. The vector $x$ is the vector of optimized variables, i.e., the weights of the connection links of the ANN. In addition, $\sigma$, $v$, and $\psi$ represent the step-size vectors of the decreasing-based mutations, self-adaptive Gaussian mutation, and self-adaptive Cauchy mutation, respectively. Herein, the initial value of each entry of $x$ is randomly chosen over $[-0.1, 0.1]$, and the initial values of the entries of the vectors $\sigma$, $v$, and $\psi$ are set to 1.0, 0.25, and 0.25, respectively. In the upcoming subsections, we use $a = (x^a, \sigma^a, v^a, \psi^a)$ to represent an individual called the "family father" and $b = (x^b, \sigma^b, v^b, \psi^b)$ to denote another parent. The offspring $c = (x^c, \sigma^c, v^c, \psi^c)$ is generated by applying the recombination or mutation operators. The symbol $x_j^a$ denotes the $j$-th connection weight of the individual $a$.
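A minimal Python sketch of the chromosome and its initialization, assuming each of the four vectors is a plain list of floats. The names `Individual`, `random_individual`, and `initial_population`, and the `fitness` field, are hypothetical; only the quadruple structure and the initial values come from the text above.

```python
import random
from dataclasses import dataclass
from typing import List

# Initial values stated in Section 2.1.
INIT_WEIGHT_RANGE = 0.1  # each x_j ~ U[-0.1, 0.1]
INIT_SIGMA = 1.0         # step size of the decreasing-based mutations
INIT_V = 0.25            # step size of self-adaptive Gaussian mutation
INIT_PSI = 0.25          # step size of self-adaptive Cauchy mutation


@dataclass
class Individual:
    """One network encoded as the quadruple (x, sigma, v, psi), each of length n."""
    x: List[float]
    sigma: List[float]
    v: List[float]
    psi: List[float]
    fitness: float = float("inf")  # hypothetical field: error, lower is better


def random_individual(n: int, rng: random.Random) -> Individual:
    """Create one individual with the initial values described in Section 2.1."""
    return Individual(
        x=[rng.uniform(-INIT_WEIGHT_RANGE, INIT_WEIGHT_RANGE) for _ in range(n)],
        sigma=[INIT_SIGMA] * n,
        v=[INIT_V] * n,
        psi=[INIT_PSI] * n,
    )


def initial_population(size: int, n: int, seed: int = 0) -> List[Individual]:
    rng = random.Random(seed)
    return [random_individual(n, rng) for _ in range(size)]


# Usage: 10 networks for Xor, which has n = 9 connection weights.
pop = initial_population(size=10, n=9)
print(len(pop), len(pop[0].x), pop[0].sigma[0], pop[0].v[0])  # 10 9 1.0 0.25
```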

2.2 Family Competition

The family competition in FCEA can be viewed as a local search procedure and works as follows. An individual, referred to as the "family parent", plays the leading role for the genetic operators. The "family parent" generates offspring by applying recombination operators with probability p and the mutation operator with probability 1. When recombination is applied, recombination selection is used to select two parents: the "family parent" and another individual randomly selected from the population. Recombination generates only one offspring $c$. If recombination operators are not applied, the offspring $c$ is an exact copy of the "family parent". Then, the mutation operator (M) is applied to the offspring $c$ to generate an offspring $d$. The "family parent" generates L offspring by repeatedly applying these procedures. These L offspring compete with each other, and only the one with the best fitness survives. FCEA employs this strategy to avoid premature convergence by maintaining the diversity of the population: the L offspring generated from the same "family parent" may resemble each other, so keeping only one survivor per family prevents near-identical siblings from crowding the population. The family competition principle is that each individual in the population sequentially becomes the "family father" and performs a local search to generate L offspring, of which only the one with the best fitness survives. Therefore, FCEA generates $L \cdot N$ offspring in each stage, so that it generates $2 \cdot (L_d + L_a) \cdot N$ offspring in one generation, where $L_d$ and $L_a$ are the family competition lengths of the decreasing-based and self-adaptive stages, respectively.
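The family competition and the final population selection of one FC-adaptive stage can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes minimization of an error, hides recombination and mutation behind a hypothetical `make_offspring` callback, and omits the adaptive rules and replacement selection. The helper `offspring_per_generation` (also hypothetical) checks the offspring count $2 \cdot (L_d + L_a) \cdot N$.

```python
import random
from typing import Callable, List, TypeVar

T = TypeVar("T")


def fc_stage(population: List[T],
             make_offspring: Callable[[T, List[T]], T],
             fitness: Callable[[T], float],
             family_length: int) -> List[T]:
    """One FC-adaptive stage, assuming lower fitness (error) is better."""
    n = len(population)
    best_children = []
    for father in population:  # each individual becomes the "family father" in turn
        family = [make_offspring(father, population) for _ in range(family_length)]
        best_children.append(min(family, key=fitness))  # only the best of the L offspring survives
    # Population selection, (mu + mu)-style: keep the N best of parents plus best children.
    return sorted(population + best_children, key=fitness)[:n]


def offspring_per_generation(n: int, l_d: int, l_a: int) -> int:
    """Two decreasing-based stages (L_d each) and two self-adaptive stages (L_a each)."""
    return 2 * (l_d + l_a) * n


# Usage on a toy 1-D problem: minimize x**2, with a Gaussian perturbation as the "mutation".
rng = random.Random(1)


def toy_mutation(father: float, _population: List[float]) -> float:
    return father + rng.gauss(0.0, 0.5)


pop = [rng.uniform(-5.0, 5.0) for _ in range(10)]
for _ in range(20):
    pop = fc_stage(pop, toy_mutation, lambda x: x * x, family_length=9)
print(offspring_per_generation(n=30, l_d=3, l_a=9))  # 720, matching Section 3
```

Because the parents take part in population selection, the best error never increases from one stage to the next.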

2.3 Recombination Operators

FCEA uses three kinds of recombination operators: modified discrete recombination, blend crossover (BLX-0.5) [3], and intermediate recombination [1]. Intermediate recombination is a special case of BLX-0.5.

Modified discrete recombination: The original discrete recombination [1] generates a child that inherits genes from two parents with equal probability. Herein, this recombination is modified so that a child inherits genes from the "family father" $a$ with probability 0.8 and from another parent $b$ with probability 0.2. The modified discrete recombination is given below:

$$x_j^c = \begin{cases} x_j^a & \text{with probability } 0.8,\\ x_j^b & \text{with probability } 0.2. \end{cases} \tag{1}$$

The probabilities in (1) can reduce the undesired effects of competing conventions on training neural networks.

BLX-0.5 and intermediate recombination: BLX-0.5 [3] has been used successfully in real-coded genetic algorithms. It is defined as follows:

$$w_j^c = w_j^a + \beta\,(w_j^b - w_j^a), \tag{2}$$

where $w$ may be any of the vectors $x$, $\sigma$, $v$, or $\psi$, and $\beta$ is chosen uniformly from the range $[-0.5, 1.5]$. BLX-0.5 is called intermediate recombination when $\beta$ equals 0.5, which is why intermediate recombination is considered herein to be a special case of BLX-0.5.
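Equations (1) and (2) can be sketched on plain lists of floats as below. The function names are hypothetical, and drawing $\beta$ separately for every gene in BLX-0.5 is an assumption, since the text does not say whether one $\beta$ is shared by the whole vector.

```python
import random
from typing import List, Optional


def modified_discrete(xa: List[float], xb: List[float], rng: random.Random) -> List[float]:
    """Eq. (1): take each gene from the family father a w.p. 0.8, otherwise from parent b."""
    return [a if rng.random() < 0.8 else b for a, b in zip(xa, xb)]


def blx(wa: List[float], wb: List[float], rng: random.Random,
        beta: Optional[float] = None) -> List[float]:
    """Eq. (2): w_j^c = w_j^a + beta * (w_j^b - w_j^a).

    beta=None draws beta ~ U[-0.5, 1.5] for every gene (BLX-0.5; per-gene draw assumed);
    beta=0.5 gives intermediate recombination.
    """
    child = []
    for a, b in zip(wa, wb):
        bj = rng.uniform(-0.5, 1.5) if beta is None else beta
        child.append(a + bj * (b - a))
    return child


rng = random.Random(0)
print(blx([0.0, 0.0, 0.0], [1.0, 2.0, 3.0], rng, beta=0.5))  # [0.5, 1.0, 1.5]
```

With $\beta = 0.5$ the child is the midpoint of the parents, while BLX-0.5 can also place a gene up to half the parental distance outside either parent.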

Following the evolution strategies community, this work employs only intermediate recombination on the step-size vectors, i.e., $\sigma$, $v$, and $\psi$. Meanwhile, FCEA applies discrete recombination, BLX-0.5, and intermediate recombination to the connection links $x$; in the following experiments, their probabilities are 0.2, 0.1, and 0.1, respectively.

2.4 Mutation Operators

Mutations are the main operators of FCEA. As mentioned earlier, FCEA uses four mutation operators, which are described as follows.

Self-adaptive Gaussian mutation: Schwefel [17] proposed a self-adaptive technique called self-adaptive Gaussian mutation, which performs well on parameter optimization problems. It first mutates the step size $v_j$; then the connection link $x_j$ is mutated by adding a normally distributed random value with mean zero and standard deviation $v_j$. This operator is realized by equations (3) and (4):

$$v_j^c = v_j^a \exp\left[\tau' N(0,1) + \tau N_j(0,1)\right] \tag{3}$$

$$x_j^c = x_j^a + v_j^c N_j(0,1) \tag{4}$$

where $N(0,1)$ is a normal distribution with mean 0 and standard deviation 1, sampled once per individual. The solid line in Fig. 2 shows the density of $N(0,1)$. In addition, $N_j(0,1)$ denotes a standard normal random variable sampled anew for the $j$-th connection link. In our experiments, $\tau$ and $\tau'$ are set to $\left(\sqrt{2\sqrt{n}}\right)^{-1}$ and $\left(\sqrt{2n}\right)^{-1}$, respectively.

Figure 2: Comparison of the Gaussian and Cauchy distributions.
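Equations (3) and (4) can be sketched as below, assuming $x$ and $v$ are plain lists of equal length $n$. The function names are hypothetical; the learning rates follow the settings stated above.

```python
import math
import random
from typing import List, Tuple


def taus(n: int) -> Tuple[float, float]:
    """Learning rates from the text: tau = (sqrt(2*sqrt(n)))^-1, tau' = (sqrt(2n))^-1."""
    return 1.0 / math.sqrt(2.0 * math.sqrt(n)), 1.0 / math.sqrt(2.0 * n)


def self_adaptive_gaussian(x: List[float], v: List[float],
                           rng: random.Random) -> Tuple[List[float], List[float]]:
    """Eqs. (3)-(4): mutate the step sizes v first, then the weights x."""
    tau, tau_prime = taus(len(x))
    shared = tau_prime * rng.gauss(0.0, 1.0)  # N(0,1): one draw for the whole individual
    new_v = [vj * math.exp(shared + tau * rng.gauss(0.0, 1.0)) for vj in v]  # Eq. (3), N_j(0,1)
    new_x = [xj + vj * rng.gauss(0.0, 1.0) for xj, vj in zip(x, new_v)]      # Eq. (4)
    return new_x, new_v


# Usage: mutate a freshly initialized Xor network (n = 9, v_j = 0.25).
x, v = self_adaptive_gaussian([0.0] * 9, [0.25] * 9, random.Random(0))
```

The multiplicative log-normal update keeps every step size positive, which is why (3) mutates $v$ through an exponential rather than by adding noise.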

Self-adaptive Cauchy mutation: The Cauchy density function, shown as the dashed line in Fig. 2, is defined as follows:

$$f_t(x) = \frac{1}{\pi}\,\frac{t}{t^2 + x^2}, \qquad -\infty < x < \infty, \tag{5}$$

where $t > 0$ is a scale parameter [21]. Self-adaptive Cauchy mutation behaves exactly like self-adaptive Gaussian mutation except that the Cauchy distribution replaces the normal distribution. That is, the step size is controlled by (6), which is analogous to (3), and the connection link $x_j$ is then mutated by adding a Cauchy-distributed random value scaled by $\psi_j$. Self-adaptive Cauchy mutation is given by (6) and (7):

$$\psi_j^c = \psi_j^a \exp\left[\tau' N(0,1) + \tau N_j(0,1)\right] \tag{6}$$

$$x_j^c = x_j^a + \psi_j^c\, C_j(t) \tag{7}$$

where $C_j(t)$ is a random variable drawn from the Cauchy distribution with scale parameter $t$, sampled anew for each $j$. In our experiments, $t$ is 1.
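A sketch of (6) and (7). Sampling $C(t)$ by inverting the Cauchy CDF, $t \tan(\pi(u - 1/2))$ with $u$ uniform, is a standard technique but an implementation choice of this sketch, not something the text specifies; the function names are hypothetical.

```python
import math
import random
from typing import List, Tuple


def cauchy(rng: random.Random, t: float = 1.0) -> float:
    """Draw C(t) by inverting the Cauchy CDF: t * tan(pi * (u - 1/2)), u ~ U[0, 1)."""
    return t * math.tan(math.pi * (rng.random() - 0.5))


def self_adaptive_cauchy(x: List[float], psi: List[float], rng: random.Random,
                         t: float = 1.0) -> Tuple[List[float], List[float]]:
    """Eqs. (6)-(7): same step-size update as Eq. (3), Cauchy noise instead of Gaussian."""
    n = len(x)
    tau, tau_prime = 1.0 / math.sqrt(2.0 * math.sqrt(n)), 1.0 / math.sqrt(2.0 * n)
    shared = tau_prime * rng.gauss(0.0, 1.0)                                   # N(0,1)
    new_psi = [p * math.exp(shared + tau * rng.gauss(0.0, 1.0)) for p in psi]  # Eq. (6)
    new_x = [xj + pj * cauchy(rng, t) for xj, pj in zip(x, new_psi)]           # Eq. (7)
    return new_x, new_psi


# Heavier tails: P(|C(1)| > 3) = 1 - (2/pi) * atan(3), about 0.205, versus about 0.0027 for N(0,1).
rng = random.Random(0)
samples = [cauchy(rng) for _ in range(20000)]
print(sum(abs(s) > 3 for s in samples) / len(samples))
```

The heavy tail is what lets Cauchy mutation make occasional long jumps out of a local basin while still concentrating most steps near zero.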

Decreasing-based mutations: Decreasing-based Gaussian mutation and decreasing-based Cauchy mutation share the same step-size vector $\sigma$, which is decreased by a decreasing rate $\gamma$, $0 < \gamma < 1$, as in (8). The two mutations use (9) and (10), respectively, to mutate connection links:

$$\sigma_j^c = \gamma\, \sigma_j^a \tag{8}$$

$$x_j^c = x_j^a + \sigma_j^c N_j(0,1) \tag{9}$$

$$x_j^c = x_j^a + \sigma_j^c\, C_j(1) \tag{10}$$

where $\gamma$ is 0.95 in our experiments. Comparing (3), (6), and (8) reveals two interesting points. First, (8) saves computational time because it is a single multiplication, whereas (3) and (6) must sample normally distributed values and evaluate an exponential function. Second, the search behavior of decreasing-based mutations differs markedly from that of self-adaptive mutations, because (8) decreases the step sizes at a fixed rate, whereas (3) and (6) adapt the step sizes stochastically.
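Equations (8) to (10) can be sketched as one function with a switch between Gaussian and Cauchy noise; the name `decreasing_based_mutation` and the `use_cauchy` flag are hypothetical. The usage applies (8) along one lineage of ten successive mutations, showing that the step size shrinks geometrically as $\gamma^k$.

```python
import math
import random
from typing import List, Tuple

GAMMA = 0.95  # decreasing rate gamma used in the experiments


def decreasing_based_mutation(x: List[float], sigma: List[float], rng: random.Random,
                              use_cauchy: bool = False,
                              gamma: float = GAMMA) -> Tuple[List[float], List[float]]:
    """Eqs. (8)-(10): shrink sigma deterministically, then perturb x."""
    new_sigma = [gamma * s for s in sigma]  # Eq. (8)
    new_x = []
    for xj, sj in zip(x, new_sigma):
        if use_cauchy:
            noise = math.tan(math.pi * (rng.random() - 0.5))  # C_j(1), Eq. (10)
        else:
            noise = rng.gauss(0.0, 1.0)                        # N_j(0,1), Eq. (9)
        new_x.append(xj + sj * noise)
    return new_x, new_sigma


rng = random.Random(0)
x, sigma = [0.0] * 9, [1.0] * 9
for _ in range(10):
    x, sigma = decreasing_based_mutation(x, sigma, rng)
print(round(sigma[0], 4))  # 0.5987, i.e. 0.95**10
```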

2.5 Selections

FCEA uses four selections: recombination selection, family selection, replacement selection, and population selection. FCEA employs recombination selection to select two individuals for recombination: one is the "family father" and the other is randomly selected from the population. Family selection selects the offspring with the best objective value among the L offspring generated from the same "family father". The best-children population set is then formed by repeatedly applying this procedure. In the three mutation stages other than the decreasing-based Gaussian mutation stage, FCEA employs replacement selection to choose the better of the "family father" and its best child selected by family selection. The combination of family selection and replacement selection can be viewed as a local search procedure. Population selection selects the best N individuals from the union of the parent population set and the best-children population set. Population selection resembles the $(\mu+\mu)$-ES used in traditional evolution strategies.

2.6 Adaptive Rules

Controlling the step size heavily influences the performance of Gaussian and Cauchy mutations. Following the adaptation classification of [5], FCEA relates self-adaptive mutations and decreasing-based mutations by combining deterministic, self-adaptive, and adaptive techniques to control the step sizes of Gaussian and Cauchy mutations effectively. Herein, these rules are summarized as the A-rules (the A-adaptive-rule and the A-decrease-rule) for self-adaptive mutations and the D-rules (the D-decrease-rule and the D-increase-rule) for decreasing-based mutations.

1. A-rules:

• A-adaptive-rule: This self-adaptive rule controls the step sizes $v$ and $\psi$ according to (3) and (6). It is called a self-adaptive rule because the step-size vectors $v$ and $\psi$ are directly encoded into the chromosome of an individual and undergo mutation and recombination. The rule is applied when the mutation is a self-adaptive one.

• A-decrease-rule: This rule decreases the step-size vectors $v$ and $\psi$ of a "family parent" when the "family parent" is better than its best child generated by family competition. The step sizes $v$ and $\psi$ are adapted when self-adaptive Gaussian and self-adaptive Cauchy mutation, respectively, are applied. They are adapted in the following manner:

$$w_j^a = \gamma\, w_j^a \quad \text{if the family parent } a \text{ is better than its best child,} \tag{11}$$

where $w$ denotes $v$ or $\psi$, and $\gamma$ is the decreasing rate, set to 0.95 in our experiments.

2. D-rules:

• D-decrease-rule: This rule is deterministic because it decreases the step size $\sigma$ according to (8). The rule is applied when the mutation is a decreasing-based one.

• D-increase-rule: This adaptive rule enlarges the step size $\sigma$ of the best child when family competition is applied and the best child is better than its "family father" in the two self-adaptive mutation stages. It updates the step sizes as follows:

$$\sigma_j^c = \beta\, \bar{v}^c \quad \text{if } \sigma_j^c < \beta\, \bar{v}^c \text{ and the best child } c \text{ is better than its family parent } a, \tag{12}$$

where $v^c$ is the step-size vector of the best child, $\bar{v}^c$ is the mean value of the vector $v^c$, and $\beta$ is 0.2 in our experiments.

FCEA successfully combines self-adaptive mutations and decreasing-based mutations via A-rules and D-rules to enhance the performance. Later we demonstrate how these rules can enhance the performance of FCEA.
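The two adaptive rules, (11) and (12), can be sketched as pure functions on step-size lists. This is a simplified reading under stated assumptions: fitness is an error (lower is better), the helper names are hypothetical, and the bookkeeping of exactly when each rule fires inside a stage is left to the caller.

```python
from statistics import fmean
from typing import List

GAMMA = 0.95  # decreasing rate in Eq. (11)
BETA = 0.2    # factor in Eq. (12)


def a_decrease_rule(father_steps: List[float], father_error: float,
                    best_child_error: float, gamma: float = GAMMA) -> List[float]:
    """Eq. (11): shrink the father's v (or psi) when it beats its best child."""
    if father_error < best_child_error:  # lower error = better
        return [gamma * w for w in father_steps]
    return list(father_steps)


def d_increase_rule(child_sigma: List[float], child_v: List[float],
                    child_error: float, father_error: float,
                    beta: float = BETA) -> List[float]:
    """Eq. (12): raise each sigma_j below beta * mean(v) up to that floor,
    but only when the best child beats its family father."""
    if child_error >= father_error:
        return list(child_sigma)
    floor = beta * fmean(child_v)
    return [max(s, floor) for s in child_sigma]


print(a_decrease_rule([0.25, 0.5], father_error=1.0, best_child_error=2.0))
print(d_increase_rule([0.001, 0.5], [0.25, 0.25], child_error=1.0, father_error=2.0))
```

Read together, the two rules pull in opposite directions: (11) narrows the self-adaptive search around a parent that keeps winning, while (12) keeps the shared decreasing-based step size from collapsing below a fraction of the self-adaptive one.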

3 Boolean Functions Learning

FCEA is applied to optimize the connection weights for two well-known Boolean function problems [12]. To compare with previous work, FCEA uses standard fully connected network structures that have one hidden layer and a bias neuron. The two problems are described as follows:

1. Xor: The ANN has 2 input nodes, 2 hidden nodes, and 1 output node. There are 9 connection weights and 4 input patterns. The output value is the exclusive OR of the input bits.

2. Addition: The ANN has 4 input nodes, 4 hidden nodes, and 3 output nodes. There are 35 connection weights and 16 input patterns. The output pattern is the sum of the two 2-bit input strings.
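The stated weight counts and pattern counts can be checked with a short sketch. It assumes a single bias unit feeding every hidden and output node (an interpretation that reproduces 9 and 35) and a hypothetical input bit order (a1, a0, b1, b0) for the adder; the function names are also hypothetical.

```python
from itertools import product
from typing import List, Tuple

Pattern = Tuple[List[int], List[int]]


def count_weights(n_in: int, n_hidden: int, n_out: int) -> int:
    """One fully connected hidden layer; a single bias unit feeds every hidden and output node."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out


def xor_patterns() -> List[Pattern]:
    return [([a, b], [a ^ b]) for a, b in product((0, 1), repeat=2)]


def adder_patterns() -> List[Pattern]:
    """Inputs (a1, a0, b1, b0); target is the 3-bit sum a + b, most significant bit first."""
    patterns = []
    for a, b in product(range(4), repeat=2):
        s = a + b
        bits_in = [(a >> 1) & 1, a & 1, (b >> 1) & 1, b & 1]
        bits_out = [(s >> 2) & 1, (s >> 1) & 1, s & 1]
        patterns.append((bits_in, bits_out))
    return patterns


print(count_weights(2, 2, 1), len(xor_patterns()))    # 9 4
print(count_weights(4, 4, 3), len(adder_patterns()))  # 35 16
```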

Herein, binary input patterns are used, and a network is trained to generate output values ranging from 0 to 1. The fitness function of a network is based on the mean square error between $O_{kj}$ and $T_{kj}$, which denote, respectively, the output value and the training (target) value of the $j$-th output neuron for the $k$-th input pattern, taken over all $m$ input patterns and $N_o$ output neurons. A training input pattern is classified correctly if $|T_{kj} - O_{kj}|$ is below 0.1 for each output neuron. A network is convergent if it correctly classifies all the training input patterns.

Table 1: Comparison of the results of FCEA with previous works on two Boolean functions.

Method                            Xor              Addition
Evolutionary Programming [4]      2000.0 (100%)    N/A
Standard Genetic Algorithm [18]   6120 (80%)       N/A
Adaptive Genetic Algorithm [18]   …                …
GENITOR [19]                      …                …
GENITOR II [19]                   …                (93%)

GENITOR is a well-known modified genetic algorithm; GENITOR II is a distributed version of GENITOR. N/A denotes not available in the literature. The values in parentheses are the successful classification rates.
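The error measure and the convergence criterion described before Table 1 can be sketched as follows. The plain mean over all $m \cdot N_o$ squared errors is an assumption, since the exact fitness formula is not recoverable from the text; the 0.1 tolerance per output neuron comes from the text. Function names and the example outputs are hypothetical, not results from the paper.

```python
from typing import List

Vector = List[float]


def mse(outputs: List[Vector], targets: List[Vector]) -> float:
    """Mean squared error over all m patterns and N_o output neurons (normalization assumed)."""
    m, n_o = len(outputs), len(outputs[0])
    total = sum((o - t) ** 2
                for out, tgt in zip(outputs, targets)
                for o, t in zip(out, tgt))
    return total / (m * n_o)


def pattern_correct(out: Vector, tgt: Vector, tol: float = 0.1) -> bool:
    """A pattern is classified correctly if |T_kj - O_kj| < tol for every output neuron."""
    return all(abs(t - o) < tol for o, t in zip(out, tgt))


def converged(outputs: List[Vector], targets: List[Vector], tol: float = 0.1) -> bool:
    """A network is convergent if it classifies every training pattern correctly."""
    return all(pattern_correct(o, t, tol) for o, t in zip(outputs, targets))


# Hypothetical Xor outputs of some network.
outs = [[0.05], [0.95], [0.92], [0.08]]
tgts = [[0.0], [1.0], [1.0], [0.0]]
print(converged(outs, tgts))  # True
```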

Evolution begins by initializing all the connection weights $x$ of each network to random values between -0.1 and 0.1. The initial values of the step sizes for decreasing-based mutations, self-adaptive Gaussian mutation, and self-adaptive Cauchy mutation are 1.0, 0.25, and 0.25, respectively. The family competition lengths $L_d$ and $L_a$ in the decreasing-based stages and self-adaptive stages are 3 and 9, respectively. In this case, FCEA generates 720 networks, i.e., $(3+9+9+3) \cdot 30$, in one generation if the population size is 30. The population size is 10 for Xor and 30 for the Addition problem. The recombination rate is 0.2. These parameter values, except for the population size, are applied to all problems addressed herein. Table 1 compares FCEA, evolutionary programming [4], and genetic algorithms [18], [19] on the Boolean functions. Detailed implementations of the compared approaches can be found in the original papers. According to the pertinent literature, the performance of these evolutionary algorithms is competitive with back propagation. FCEA is executed for 50 runs on each problem, with up to 500,000 function evaluations (i.e., generated offspring) per run. FCEA solves all the Boolean functions within a reasonable number of function evaluations; the successful classification rate is 96% for the Addition problem.

Standard evolutionary algorithms, such as the simple genetic algorithm [18] and the (1+6)-ES [16], cannot solve the Xor problem in all runs. The modified evolutionary algorithms [16], [18] can solve simple problems such as Xor, but only a few such simple problems. GENITOR needed only around 500 recombinations to solve the Xor problem. However, it required a population of 5000 and 2 million function evaluations to solve the 2-bit adder, and its classification rate was only 56%. These results indicate that, although efficient for simple problems, these evolutionary algorithms cannot solve more complicated problems such as Addition. GENITOR II, a distributed version of GENITOR, increases the classification rate to 93% on the Addition problem. However, its population size is also 5000, and the number of function evaluations also reaches 2 million. In contrast to these approaches, FCEA needs only 256,464 function evaluations and achieves a successful classification rate of up to 96% with a small population size of 30 on the Addition problem. These results demonstrate that FCEA is a robust approach to training feedforward networks for Boolean function learning.

Figure 3: Artificial ant problem: "John Muir Trail".

4 The Ant Problem

This study applies FCEA to a complex search-and-collection task, the tracker task "John Muir Trail" [8]. In this problem, a simulated ant is placed on a two-dimensional toroidal grid that contains a trail of food. The ant traverses the grid to collect any food encountered along the trail. The task is to train a neural network, i.e., a simulated ant, that collects the maximum number of pieces of food within the given time steps.

Fig. 3 shows this trail. Each black box in the trail stands for a food unit. Following the environment of [8], the ant stands on one cell, facing one of the cardinal directions, and can sense only the cell ahead of it. After sensing that cell, the ant must take one of four actions: move forward one step, turn right 90°, turn left 90°, or no-op (do nothing). The optimal trail of the "John Muir Trail" contains 89 food cells, 38 no-food cells, and 20 turns, so eating all the food requires at least 89 + 38 + 20 = 147 time steps. By comparison, an ant requires at least 165 time steps to travel the optimal trail of the "Santa Fe Trail" completely.

Figure 4: A typical convergence curve for the "John Muir Trail" problem.

Figure 5: A typical search behavior of a simulated ant controlled by our evolved neural controller on the "John Muir Trail" ant problem.

… of [8]. That investigation not only used finite state machines and recurrent neural networks to represent the problem but also used the traditional bit-string genetic algorithm to train the structures. Each simulated ant is controlled by a network having two input nodes and four output nodes. The "food" input is 1 when food is present in the cell ahead of the ant, and the "no-food" input is 1 when that cell is empty. Each output unit corresponds to a unique action: move forward one step, turn right 90°, turn left 90°, or no-op. Each input node is connected to each of the five hidden nodes and to each of the four output nodes. The five hidden nodes are fully connected within the hidden layer. Therefore, this structure is a fully connected recurrent neural network with shortcut connections; its total number of links, including the bias input, is 72. To compare with previous results, the fitness is defined as the number of pieces of food eaten within 200 time steps on the "John Muir Trail".
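The controller structure described above can be sketched as one recurrent time step. Several details are assumptions of this sketch, not taken from the text: the bias is treated as a third input connected to both hidden and output nodes (which reproduces the 72 links), the hidden units use a logistic activation, the outputs are linear, and the action is the output with the largest value. The names `ant_links`, `controller_step`, and `ACTIONS` are hypothetical.

```python
import math
import random
from typing import List, Tuple

ACTIONS = ("forward", "right", "left", "no-op")


def ant_links(n_in: int = 2, n_hidden: int = 5, n_out: int = 4) -> int:
    k = n_in + 1  # sensory inputs plus one bias input
    return k * n_hidden + n_hidden * n_hidden + n_hidden * n_out + k * n_out


def controller_step(w: List[float], food_ahead: bool, h: List[float],
                    n_hidden: int = 5, n_out: int = 4) -> Tuple[str, List[float]]:
    """One time step of the recurrent controller; h is the previous hidden state."""
    assert len(w) == ant_links(2, n_hidden, n_out)
    inp = [1.0 if food_ahead else 0.0, 0.0 if food_ahead else 1.0, 1.0]  # food, no-food, bias
    it = iter(w)

    def take(rows: int, cols: int) -> List[List[float]]:
        return [[next(it) for _ in range(cols)] for _ in range(rows)]

    def dot(a: List[float], b: List[float]) -> float:
        return sum(p * q for p, q in zip(a, b))

    w_ih, w_hh = take(n_hidden, 3), take(n_hidden, n_hidden)  # input->hidden, hidden->hidden
    w_ho, w_io = take(n_out, n_hidden), take(n_out, 3)        # hidden->output, shortcut input->output
    new_h = [1.0 / (1.0 + math.exp(-(dot(w_ih[i], inp) + dot(w_hh[i], h))))
             for i in range(n_hidden)]
    out = [dot(w_ho[o], new_h) + dot(w_io[o], inp) for o in range(n_out)]
    return ACTIONS[max(range(n_out), key=lambda o: out[o])], new_h


print(ant_links())  # 72
rng = random.Random(0)
weights = [rng.uniform(-1.0, 1.0) for _ in range(ant_links())]
action, hidden = controller_step(weights, True, [0.0] * 5)
```

Feeding the new hidden state back in at the next step is what gives the ant a memory of recent sensations, which a purely feedforward controller with only one sensed cell would lack.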

Fig. 4 displays the convergence curve on the ant problem. It indicates that FCEA requires only about 12,000 function evaluations to train a neural controller that finds 82 food pieces within 200 time steps. To find 85 and 88 food pieces within 200 time steps, FCEA requires about 35,000 and 58,000 function evaluations, respectively. On average, FCEA found 81, 87, and 88 food pieces within 200 time steps after about 20,000, 65,000, and 80,000 function evaluations, respectively. The "John Muir Trail" was tested over 25 runs, and the success rate of finding all 89 food pieces was 80%. In the remaining 20% of runs, the ant foraged at least 86 food pieces. The success rate improves to 96% when the population size is 100 and the number of function evaluations is 500,000.

Table 2: Comparison among the genetic algorithm, evolutionary programming, and FCEA on the "John Muir Trail" ant problem.

Fig. 5 depicts a typical search behavior and the traveled path of a simulated ant controlled by our evolved neural network. The number in each cell is the time step at which the food in that cell was eaten, and the symbol '*' denotes an empty cell traveled by the ant. Fig. 5 indicates that the ant requires 195 time steps to find all 89 food pieces in the "John Muir Trail" environment.

Table 2 compares FCEA, evolutionary programming [14], and the genetic algorithm [8] on the "John Muir Trail" ant problem. Jefferson et al. used a traditional genetic algorithm to solve the "John Muir Trail". That investigation encoded the problem with 448 bits and used a population of 65,536 to achieve the task in 100 generations, so their approach required 6,553,600 networks to forage all 89 food pieces within 200 time steps. In contrast to Jefferson's solution, FCEA with population sizes of 50 and 100 requires only about 126,000 and 284,000 function evaluations, respectively, to eat 89 food pieces within 195 time steps. Table 2 also indicates that FCEA performs better than evolutionary programming.

5 The Characteristics of FCEA

In this section, we briefly discuss several characteristics of FCEA through experimental designs. Table 3 compares ten approaches on the 2-bit adder function and the ant problem. Each approach is a combination of operators applied in FCEA: decreasing-based Gaussian mutation ($M_{DG}$), self-adaptive Cauchy mutation ($M_C$), self-adaptive Gaussian mutation ($M_G$), and decreasing-based Cauchy mutation ($M_{DC}$). For example, the $M_C$ approach uses only self-adaptive Cauchy mutation; the $M_{DG} + M_C$ approach integrates decreasing-based Gaussian mutation with self-adaptive Cauchy mutation and also applies the control rules. The FCL1FCEA approach is a special case of FCEA in which the family competition lengths ($L_d$ and $L_a$) are set to 1. The NCRFCEA approach is also a special case of FCEA, but it does not apply the adaptive rules, i.e., the A-decrease-rule and the D-increase-rule. The final approach in Table 3 is a standard evolution strategy, i.e., a $(\mu+\lambda)$-ES with $\mu = 20$ and $\lambda = 120$. Each approach is executed for 50 runs on the Boolean functions and 25 runs on the ant problem. The maximum numbers of function evaluations per run on the Boolean functions and the ant problem are 500,000 and 250,000, respectively. The value in parentheses for the ant problem denotes the average number of food pieces eaten.

We observe the following properties from the experimental results in Table 3 and Fig. 6:

• Each mutation operator in FCEA performs differently on the selected problems, which indicates that each operator has a different search behavior.

• Approaches that combine multiple mutations generally perform better than single-operator approaches, and their number of function evaluations does not increase proportionally. For example, FCEA, which combines $M_{DG}$, $M_C$, $M_G$, and $M_{DC}$, has the best performance among all approaches on all test problems, yet its number of function evaluations is not larger than that of the other approaches on any test problem.

• The control rules for step sizes are useful, because NCRFCEA performs worse than FCEA. Fig. 6(b) indicates that the step size ($\sigma$) of decreasing-based mutation becomes too small when FCEA does not apply the D-increase-rule. Fig. 6(a) indicates that the step size of self-adaptive Gaussian mutation remains too large to improve solutions when FCEA does not apply the A-decrease-rule.

• The family competition length is one of the critical factors that allow FCEA to obtain better performance on complex problems. For example, FCEA has to enlarge the length in order to solve the ant problem.

• Cauchy mutations perform better than Gaussian mutations for training neural networks.

Figure 6: Comparison of the average step size between FCEA with adaptive rules and FCEA without adaptive rules on the ant problem: (a) self-adaptive mutation; (b) decreasing-based mutation.

6 Conclusions

This study has demonstrated that FCEA is an efficient approach for training neural networks. The proposed algorithm combines decreasing-based mutations with self-adaptive mutations, based on family competition and adaptive rules, to enhance performance. FCEA balances the exploitation and exploration abilities of the search. Results on Boolean functions and an ant problem confirm the flexibility and robustness of this evolutionary approach.

A global optimization method must include both global and local search strategies. In FCEA, decreasing-based mutations with large initial step sizes are global search strategies, and self-adaptive mutations with the family competition procedure and replacement selection are local search strategies. As demonstrated in the proposed approach, Cauchy mutations are better suited than Gaussian mutations to global search strategies. These mutation operators can be integrated to cooperate closely with each other. These smoothly integrated strategies make FCEA applicable to training neural networks for various applications as well as to solving various numerical optimization problems. Under appropriate conditions, FCEA is able to converge to a global solution.

In summary, experiments on these well-known problems verify that the proposed approach consistently performs more robustly than other algorithms, such as genetic algorithms, evolution strategies, and evolutionary programming. We believe that the flexibility and robustness of FCEA make it a highly effective global optimization tool.


Bibliography

[1] T. Bäck, F. Hoffmeister, and H.-P. Schwefel. A survey of evolution strategies. In Proc. Fourth Int. Conf. on Genetic Algorithms, pages 2-9, 1991.

[2] Y. Davidor. Epistasis variance: Suitability of a representation to genetic algorithms. Complex Systems, 4:368-383, 1990.

[3] L. J. Eshelman and J. D. Schaffer. Real-coded genetic algorithms and interval-schemata. In L. D. Whitley, editor, Foundations of Genetic Algorithms, volume 2, pages 187-202. Morgan Kaufmann Publishers, Inc., 1993.

[4] D. B. Fogel, L. J. Fogel, and V. W. Porto. Evolving neural networks. Biological Cybernetics, 63:487-493, 1990.

[5] R. Hinterding, Z. Michalewicz, and A. E. Eiben. Adaptation in evolutionary computation: A survey. In Proc. of IEEE Conf. on Evolutionary Computation, pages 65-69, 1997.

[6] John H. Holland. Adaptation in natural and artificial systems. The University of Michigan Press, Ann Arbor, MI, 1975.

[7] K. Hornik. Approximation capabilities of multilayer feedforward networks. Neural Networks, 4:251-257, 1991.

[8] D. Jefferson, R. Collins, C. Cooper, M. Dyer, M. Flowers, R. Korf, C. Taylor, and A. Wang. Evolution as a theme in artificial life: The Genesys/Tracker system. In Artificial Life II: Proc. of the Workshop on Artificial Life, pages 549-577, 1990.

[9] R. P. Lippmann. An introduction to computing with neural nets. IEEE ASSP Magazine, pages 4-22, 1987.

[10] J. R. McDonnell and D. Waagen. Evolving recurrent perceptrons for time-series modeling. IEEE Trans. on Neural Networks, 5(1):24-38, 1994.

[11] D. J. Montana and L. Davis. Training feedforward neural networks using genetic algorithms. In Proc. of Eleventh Int. Joint Conf. on Artificial Intelligence, pages 762-767, 1989.

[12] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning internal representations by error propagation. In D. E. Rumelhart and J. L. McClelland, editors, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, pages 318-362. Cambridge, MA: MIT Press, 1986.

[13] R. Salomon. Scaling behavior of the evolution strategy when evolving neuronal control architectures for autonomous agents. In P. J. Angeline et al., editors, Lecture Notes in Computer Science: Evolutionary Programming VI, pages 47-57, 1997.

[14] P. J. Angeline, G. M. Saunders, and J. B. Pollack. An evolutionary algorithm that constructs recurrent neural networks. IEEE Trans. on Neural Networks, 5(1):54-65, 1994.

[15] J. D. Schaffer, D. Whitley, and L. J. Eshelman. Combinations of genetic algorithms and neural networks: A survey of the state of the art. In Proc. of Int. Workshop on Combinations of Genetic Algorithms and Neural Networks, pages 1-37, 1992.

[16] M. Scholz. A learning strategy for neural networks based on a modified evolutionary strategy. In Parallel Problem Solving from Nature: Proc. 1st Workshop PPSN I (Lecture Notes in Computer Science), volume 496, pages 316-318, 1991.

[17] Hans-Paul Schwefel. Numerical Optimization of Computer Models. Chichester: Wiley, 1981.

[18] M. Srinivas and L. M. Patnaik. Adaptive probabilities of crossover and mutation in genetic algorithms. IEEE Trans. Systems, Man, and Cybernetics, 24(4):656-667, 1994.

[19] D. Whitley, T. Starkweather, and C. Bogart. Genetic algorithms and neural networks: Optimizing connections and connectivity. Parallel Computing, 14:347-361, 1990.

[20] J. M. Yang, C. Y. Kao, and J. T. Horng. A continuous genetic algorithm for global optimization. In Proc. of the Seventh Int. Conf. on Genetic Algorithms, pages 230-237, 1997.

[21] X. Yao and Y. Liu. Fast evolutionary programming. In Proc. of the Fifth Annual Conf. on Evolutionary Programming, pages 451-460, 1996.