**MOEA Neural Network Design**

**Design Specification**

## • Essential requirements for neural network design

o A training algorithm that can search for the optimal parameters (i.e., weights or biases) for the specified network structure and training task.

o A rule or algorithm that can determine the network complexity and ensure it is sufficient for the given training problem.

o A metric or measure to evaluate the reliability and generalization of the produced neural network.

**Design Dilemma**

## •

A single optimal neural network is very difficult to find:

o Weights are tuned by training sets, which are finite – it is difficult to extract the underlying mapping *f*_{NS}^{*}(**x**, **ω**^{*}) from a finite training data set.

o There is a trade-off between NN learning capability and the variation of the hidden neuron numbers:

A network with insufficient neurons – low training performance.

A network with an excessive number of neurons – difficult to generalize.

## •

Instead of trying to obtain a single optimal neural network, finding a set of **near-optimal** networks with different network structures is more feasible.

## •

NN design is a multiobjective optimization problem (achieve better performance while simplifying the network structure).

**Design Principle**

### •

A population-based, parallel-searching multiobjective genetic algorithm is suitable for neural network design – it can find a set of non-dominated neural network solutions.

o Find a uniformly distributed Pareto front in one run

o Equally treat discontinuous, concave and other shapes of Pareto front

[Figure: Pareto front in objective space; f1 – network complexity (neuron number), f2 – network training error]

**Hierarchical Genotype**

### •

Using a GA to evolve an RBF neural network

o Evolving NN topology along with parameters

o Hierarchical gene structure in genotype design

**Phenotype** (RBF network output):

*f*(**x**) = ∑_{i=1}^{m} ω_{i} exp(−||**x** − **c**_{i}||^{2})

**Genotype** (hierarchical): binary control genes (e.g., 1 0 0, 1 0 1) switch hidden neurons on or off, and each neuron carries its own weight genes and center genes.
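The hierarchical genotype above can be sketched in a few lines. This is an illustrative assumption of one possible encoding (the names `decode_rbf`, `control`, `centers`, `weights` are not from the slides): each candidate neuron has a binary control gene that gates whether its center and weight genes contribute to the phenotype.

```python
import math
import random

def decode_rbf(control, centers, weights, x):
    """Evaluate f(x) = sum_i w_i * exp(-||x - c_i||^2) over active neurons only."""
    out = 0.0
    for on, c, w in zip(control, centers, weights):
        if on:  # control gene switches this hidden neuron on/off
            d2 = sum((xj - cj) ** 2 for xj, cj in zip(x, c))
            out += w * math.exp(-d2)
    return out

random.seed(0)
m, dim = 5, 2                                        # 5 candidate neurons, 2-D input
control = [random.randint(0, 1) for _ in range(m)]   # topology (control) genes
centers = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(m)]
weights = [random.gauss(0, 1) for _ in range(m)]
value = decode_rbf(control, centers, weights, [0.0, 0.0])
```

Crossover and mutation can then act on all three gene layers, while switched-off neurons carry dormant genetic material that may be reactivated later.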

**Domination Ranking**

## • Goal: To convert the multiple fitness values of multiple objectives into a single rank value.

## • The rank value represents the domination relationship, obtained by layerizing the resulting population in objective space (f1, f2).
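A minimal sketch of the layerizing idea, assuming two minimization objectives (f1 = network complexity, f2 = training error): nondominated points get rank 1, are peeled off, and the next layer is ranked, and so on.

```python
def dominates(a, b):
    """a dominates b if a is no worse in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def layer_ranks(points):
    """Assign rank 1 to nondominated points, remove them, and repeat layer by layer."""
    ranks, remaining, rank = {}, set(range(len(points))), 1
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining if j != i)}
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return [ranks[i] for i in range(len(points))]

pop = [(1, 5.0), (2, 3.0), (3, 4.0), (4, 1.0)]  # (neuron count, training error)
```

Here `layer_ranks(pop)` gives rank 1 to every point except (3, 4.0), which is dominated by (2, 3.0).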

**Diversity Preservation**

## • Goal: to maintain population diversity during the evolutionary process to obtain a uniformly distributed Pareto front.

**Automatic Accumulated Ranking**

### •

Combining the hierarchical genotype representation with the Rank-Density based Genetic Algorithm (RDGA) gives the Automatic Accumulated Ranking Strategy, which includes both dominance and diversity information.

### •

All nondominated individuals have rank = 1. For a dominated individual *y* at generation *t*, its rank value is given by

*rank*(*y*, *t*) = 1 + ∑_{j=1}^{p^{(t)}} *rank*(*y*_{j}, *t*)

where *p*^{(t)} is the number of individuals who dominate *y*.
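The accumulated ranking rule above can be sketched as follows (a sketch of one possible implementation, not HRDGA's actual code): each dominated individual's rank is one plus the sum of the ranks of all its dominators, resolved once those dominators have been ranked.

```python
def dominates(a, b):
    """Pareto dominance for minimization objectives."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def accumulated_ranks(points):
    """rank(y) = 1 + sum of ranks of all individuals dominating y."""
    n = len(points)
    doms = [[j for j in range(n) if j != i and dominates(points[j], points[i])]
            for i in range(n)]
    ranks = [None] * n
    while any(r is None for r in ranks):
        for i in range(n):
            # rank i once every dominator of i already has a rank
            if ranks[i] is None and all(ranks[j] is not None for j in doms[i]):
                ranks[i] = 1 + sum(ranks[j] for j in doms[i])
    return ranks
```

Because dominance is a strict partial order (no cycles), the loop always makes progress; heavily dominated individuals accumulate large rank values, which is the point of the scheme.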

**Adaptive Cell Density Estimation**

### •

In HRDGA, an adaptive grid density estimation approach is proposed. The length of an adaptive grid cell in objective space is computed as

*d*_{i} = (max_{**x**∈*X*} *f*_{i}(**x**) − min_{**x**∈*X*} *f*_{i}(**x**)) / *K*_{i},   *i* = 1, ..., *n*

### •

The density value of each individual is the number of individuals located in the same cell (e.g., Density = 4 when four individuals share a cell).
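A small sketch of the grid density estimate, assuming the cell-length formula above (the function names and clamping choice are illustrative assumptions): divide each objective's range into K_i cells and count cell occupants.

```python
def cell_densities(points, K):
    """Density of each point = number of points sharing its grid cell."""
    n = len(points[0])
    lo = [min(p[i] for p in points) for i in range(n)]
    hi = [max(p[i] for p in points) for i in range(n)]
    # cell length d_i = objective range / K_i (fall back to 1.0 for zero range)
    d = [(hi[i] - lo[i]) / K[i] or 1.0 for i in range(n)]

    def cell(p):
        # clamp so the maximum value falls in the last cell, not one past it
        return tuple(min(int((p[i] - lo[i]) / d[i]), K[i] - 1) for i in range(n))

    cells = [cell(p) for p in points]
    return [cells.count(c) for c in cells]
```

Individuals in crowded cells get high density values, which HRDGA then tries to minimize to spread the front.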

**Fitness Assignment**

## • In HRDGA, the population is divided into two sub-populations that optimize the rank and density values independently, so both values can be optimized simultaneously:

o Sub-pop 1 minimizes the rank value

o Sub-pop 2 minimizes the density value

At each generation, selection, crossover and mutation act on the sub-populations, which together form the (i+1)-th generation.

**Elitism**

## • An elitist archive is used, and an adaptive probability of sampling an individual from the archive (versus the main population) is applied.

**Local Search**

## • Diffusion scheme from Cellular GA: a selected parent is combined with the best individual in its local neighborhood to produce the offspring.

**Forbidden Region**

## • A forbidden region around the selected parent is used to prevent the appearance of offspring with high rank value and low density value.

**Mackey-Glass Time Series Prediction**

### •

Testing problem – Mackey-Glass chaotic time series:

*dx*(*t*)/*dt* = *a* × *x*(*t* − *τ*) / (1 + *x*^{c}(*t* − *τ*)) − *b* × *x*(*t*)

o To predict *x*(*t*+6) based on *x*(*t*), *x*(*t*−6), *x*(*t*−12) and *x*(*t*−18)

o *a* = 0.2, *b* = 0.1, *c* = 10, *τ* = 150

### •

Algorithms for comparison:

o K-Nearest Neighbor (KNN, Kaylani & Dasgupta, 2000)

o Generalized Regression Neural Network (GRNN, Wasserman, 1993)

o Orthogonal Least Square (OLS, Chen et al., 1991)
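The Mackey-Glass equation above can be simulated with a simple Euler integration using the slide's parameters a = 0.2, b = 0.1, c = 10, τ = 150. The step size, initial history, and function name are illustrative assumptions; this is only a sketch of how the training/testing series could be generated.

```python
import math

def mackey_glass(n_steps, a=0.2, b=0.1, c=10, tau=150, dt=1.0, x0=1.2):
    """Euler-integrate dx/dt = a*x(t-tau)/(1 + x(t-tau)^c) - b*x(t)."""
    hist = [x0] * (tau + 1)              # constant initial history for t <= 0
    for _ in range(n_steps):
        x_t, x_lag = hist[-1], hist[-1 - tau]
        dx = a * x_lag / (1.0 + x_lag ** c) - b * x_t
        hist.append(x_t + dt * dx)
    return hist[tau:]                    # drop the artificial initial history

series = mackey_glass(500)               # e.g., first 250 s training, rest testing
```

From such a series, input/target pairs (x(t), x(t−6), x(t−12), x(t−18)) → x(t+6) can be assembled for the prediction task.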

**Parameter Setting**

### •

For HRDGA, the length of all three layers of genes is chosen to be 150; the population size is 400.

### •

For the KNN and GRNN methods, 70 individual networks with 11~80 neurons are used for comparison.

### •

For the OLS method, 40 different tolerance parameter ρ values are chosen from 0.01 to 0.4 with step 0.01. ρ determines the trade-off between the performance and complexity of a network.

### •

Stopping criteria: either the number of epochs (generations) exceeds 5,000, or the training Sum Square Error (SSE) between two sequential generations is smaller than 0.01.

### •

Training data set: the first 250 seconds of data.

### •

Testing data sets: data from 250 – 499, 500 – 749, 750 – 999 and 1,000 – 1,249 seconds.

**Simulation Study –** **Training Set**

Training SSE vs. Neuron Numbers for the training set; non-dominated front for the training set.

**Simulation Study –** **Testing Set #1**

Training SSE vs. Neuron Numbers for testing set #1; non-dominated front for testing set #1.

**Simulation Study –** **Testing Set #2**

Training SSE vs. Neuron Numbers for testing set #2; non-dominated front for testing set #2.

**Simulation Study –** **Testing Set #3**

Training SSE vs. Neuron Numbers for testing set #3; non-dominated front for testing set #3.

**Simulation Study –** **Testing Set #4**

Training SSE vs. Neuron Numbers for testing set #4; non-dominated front for testing set #4.

**Performance Comparison**

Best performance for the training set and testing sets #1–#4:

| Method | Training SSE | Neurons | Test #1 SSE | Neurons | Test #2 SSE | Neurons | Test #3 SSE | Neurons | Test #4 SSE | Neurons |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **KNN** | 2.8339 | 69 | 3.3693 | 42 | 3.4520 | 42 | 4.8586 | 48 | 4.8074 | 19 |
| **GRNN** | 2.3382 | 68 | 2.7720 | 38 | 3.0711 | 43 | 2.9644 | 40 | 3.2348 | 37 |
| **OLS** | 2.3329 | 60 | 2.4601 | 46 | 2.5856 | 50 | 2.5369 | 37 | 2.7199 | 54 |
| **HRDGA** | 2.2901 | 74 | 2.4633 | 47 | 2.5534 | 52 | 2.5226 | 48 | 2.7216 | 58 |

**Problem for OLS**

### •

The trade-off between network performance and complexity depends entirely on the value of the tolerance parameter ρ.

### •

The same ρ value means completely different trade-off features for different NN design problems – the designer cannot control the network complexity.

### •

The relationship between the ρ value and the network topology is nonlinear, a many-to-one mapping.

**Observations**

### •

A new genetic-algorithm-assisted neural network design approach, the Hierarchical Rank-Density Genetic Algorithm (HRDGA), is proposed.

### •

Characteristics of HRDGA:

o Hierarchical genotype representation – evolves topology along with parameters

o Multiobjective optimization – finds a set of near-optimal network candidates in one run

o Elitism, local search and forbidden region techniques help HRDGA find near-optimal networks with low training errors.

o The network complexity is nearly completely and uniformly sampled due to HRDGA's diversity-maintaining scheme.

**Particle Swarm Optimization**

**Definition**

### •

Swarm Intelligence (SI) is the property of a system whereby the collective behaviors of (unsophisticated) *agents interacting locally with their environment* cause coherent functional global patterns to emerge.

### •

SI provides a basis with which it is possible to explore collective (or distributed) problem solving without centralized control or the provision of a global model.

**In Simple Language…**

### •

SI systems are typically made up of a population of simple agents interacting locally with one another and with their environment to accomplish a simple goal.

### •

Although there is normally no centralized control structure dictating how individual agents should behave, local interactions between such agents often lead to the emergence of global behavior (swarming behavior).

**SI in Nature**

Fish schooling; bird flocking in V formation (© CORO, Caltech)

### •

The benefit of forming a swarm of thousands of animals (agents) is to increase foraging efficiency and defense against predators.

### •

Another example: nest building in termites – an eighteen-feet-tall termite nest at the University of Ghana, Legon (© Kjell B. Sandved / Visuals Unlimited).

**How Termites Build their Nest**

### 1. Each termite scoops up a mudball from its environment and invests the ball with pheromones.

### 2. Then it randomly drops the ball on the ground or on elevated positions to form small heaps.

### 3. Termites are attracted to their nestmates' pheromones. Hence, they are more likely to drop their own mudballs near or on top of their neighbors'.

### 4. The process stops when a heap reaches a specific height.

### 5. Termites look for heap clusters, choose the heap clusters, and connect them with each other by building walls.

### 6. Over time this leads to the construction of pillars, arches, tunnels and chambers.

**Fundamental Concepts of SI**

### •

**Self-organization process**

– Each agent plays its role by interacting with its environment to gather the latest information, constantly making decisions based on some simple local rules and the information received, and interacting locally with other agents.

### •

**Division of labor**

– Different groups of agents have their own specializations to carry out certain tasks.

– They collaborate with other groups and perform their own tasks simultaneously.

**Applications of ‘Swarm’ Principle**

### •

Robotics

– Swarm-bots http://www.swarm-bots.org/

### •

Computer Animation

– http://gvu.cc.gatech.edu/animation/Areas/group_behavior/group.html

### •

Games, interactive graphics and virtual reality

### •

"Swarm-like" algorithms for solving optimization problems

– Particle Swarm Optimization (PSO)

– Ant Colony Optimization (ACO)

**Early Particle Swarm Optimization**

### •

Craig Reynolds's flock motion (1986–1987)

– Models the flocking behavior of simple agents (boids)

– The resulting flock motion is contributed by the interaction between the behaviors of individual boids

– Each boid has its own coordinate system and applies a geometric flight model to support its flight movement

– The geometric flight model includes translation and the flight dynamic parameters of yaw, pitch, and banking (roll)

– Incorporates three steering behaviors (local rules), which are the underlying concept of flocking

– Each boid has its own local neighborhood (the limited perception range of birds or fishes in nature)

[Figure: a boid's local neighborhood, defined by a distance and an angle]

### The three steering behaviors describe the maneuverability of an individual boid:

**Separation**: steer to avoid crowding local flockmates (neighbors)

**Alignment**: steer towards the average heading of local flockmates (neighbors)

**Cohesion**: steer to move toward the average position of local flockmates (neighbors)

Simulated boid flock avoiding cylindrical obstacles (1986)

Reference: http://www.red3d.com/cwr/boids/index.html

### •

Reynolds's pioneering work became the stepping stone for the development of a computer graphics area known as *behavior animation*:

– Georgia Tech's physically-based models of group behaviors

– Stampede sequence (The Lion King)

– *Orc* army (The Lord of the Rings)

### •

Heppner's artificial bird flock simulation

– Studied bird flocks from movies

– Collaborated with Grenander and Potter to develop a program that simulates artificial bird flocks

– Through observation, Heppner realized that chaos theory can be used to explain the emergent behavior in flocking

– Designed four simple rules to model an individual bird's behavior

*F. Heppner and U. Grenander, "A stochastic nonlinear model for coordinated bird flocks," The Ubiquity of Chaos, ed. S. Krasner, AAAS Publications, Washington, DC, 1990*

### 1. An attractive force allows the birds to attract each other, and a repulsive force forbids the birds from flying too close to each other.

### 2. Each bird maintains the same velocity as its neighboring birds.

### 3. Occasionally, a bird's flight path can be altered by a random input (craziness).

### 4. Any bird is attracted to a roost, and the attraction increases as the bird flies closer to the roost.

### •

The whole concept is as follows:

– The birds begin to fly around with no particular destination.

– Once a bird discovers the roost, it will move away from the flock and land on the roost.

– Hence, it will pull its nearby neighbors to move towards the roost.

– As these neighbors discover the roost, they will land on it and bring in more birds.

– This process goes on until the entire flock lands on the roost.

**Particle Swarm Optimization (PSO)**

### •

Inspired by the "roosting area" concept, James Kennedy (social psychologist) and Russell Eberhart (electrical engineer) revised Heppner and Grenander's proposed methodology.

### •

In nature, how do the birds know where to locate food (the "roost") when they are hundreds of feet in the air?

### •

They explored the area of social psychology, which relates to the social behavior of human beings.

Reference: http://www.adaptiveview.com/articles/ipsop1.html

### •

Their conclusion: knowledge is shared within the flock.

### •

They also include the "social mind" viewpoint:

– Individuals want to be individualistic, i.e., to improve themselves.

– Individuals want to learn from the success of their neighbors (both locally and globally), primarily learning from their "experiences."

### •

Hence, they developed Particle Swarm Optimization (PSO).

**About PSO**

### •

PSO is a population-based optimization technique.

### •

The population is called a swarm (swarm population).

### •

The potential solutions (particles) form a swarm, "flying" around the search space to search for the best solution.

### •

Particles = candidate solutions (decision variables).

### •

Particles' flights are governed by the historical information:

– Velocity (*v*)

– Own personal best position found so far (*pbest*)

– Global best position discovered so far by any particle in the swarm (*gbest*)

[Figure: in the (*x*_{1}, *x*_{2}) plane, a particle at *x*_{i}(*t*) with velocity *v*_{i}(*t*) moves to its new position *x*_{i}(*t*+1), pulled towards its personal best position (*pbest*_{i}) and the global best position (*gbest*)]

**Standard PSO Equation**

At each iteration *t*, for each particle *i* and each dimension *j*:

Velocity update:

*v*_{i,j}(*t*+1) = *w* × *v*_{i,j}(*t*) + *c*_{1} × *r*_{1} × (*pbest*_{i,j} − *x*_{i,j}(*t*)) + *c*_{2} × *r*_{2} × (*gbest*_{j} − *x*_{i,j}(*t*))

(momentum component + cognitive component + social component)

Position update:

*x*_{i,j}(*t*+1) = *x*_{i,j}(*t*) + *v*_{i,j}(*t*+1)

### •

*r*_{1}, *r*_{2} – random numbers within [0,1]

### •

*c*_{1}, *c*_{2} – acceleration constants

### •

Momentum component, *w* – the inertia weight controls the impact of the previous velocity

### •

Cognitive component – the personal thinking of each particle; its personal desire to exceed its current achievement

### •

Social component – social knowledge attained via the collaborative effort of all the particles

**Velocity Clipping Criterion**

### •

Kennedy investigated the swarm behavior when the velocity clipping criterion is not introduced.

### •

Without the velocity clipping criterion, the swarm would diverge in sinus-like waves of increasing amplitude without being able to converge to the global optimum.

### •

Hence, the velocity clipping criterion is necessary for the swarm to converge close to or equal to the global optimum.

J. Kennedy and R. C. Eberhart, Swarm Intelligence, ISBN 1-55860-595-9, Academic Press (2001)

[Figures: a particle without the velocity clipping criterion vs. a particle with the velocity clipping criterion]

Jakob Vesterstrøm and Jacques Riget, "Particle Swarms: Extensions for improved local, multi-modal, and dynamic search in numerical optimization," May 2002.

### •

To prevent the particles from leaving the search space, one of the following steps can be taken:

– Velocity clipping criterion: each particle is not allowed to have velocities exceeding a user-defined range, i.e., [−*v*_{max}, *v*_{max}]. Usually, *v*_{max} is chosen to be *k* × *x*_{i}^{max}, where *x*_{i}^{max} is the feasible bound for the variables, i.e., [*x*_{i}^{L}, *x*_{i}^{U}].

– Position clipping criterion: each particle is not allowed to have decision variables exceeding the feasible bound for the variables ([*x*_{i}^{L}, *x*_{i}^{U}]).
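The two clipping options can be written as small helpers (function names are illustrative; the bounds are whatever the problem defines):

```python
def clip_velocity(v, v_max):
    """Clip every velocity component to [-v_max, v_max]."""
    return [max(-v_max, min(v_max, vj)) for vj in v]

def clip_position(x, x_lo, x_hi):
    """Clip every decision variable to its feasible bound [x_L, x_U]."""
    return [max(lo, min(hi, xj)) for xj, lo, hi in zip(x, x_lo, x_hi)]
```

For example, with v_max = k × x_max and k = 0.2 on a variable bounded by [−5, 5], velocities would be clipped to [−1, 1].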

**PSO Algorithm (Pseudo Code)**

**/Initialization/**

**Initialize** swarm randomly (particles and velocities)
**Set** w, c1, c2, max number of iterations (tmax)
**Store** particles' positions (pbest)

**begin**
**While** t < tmax
&nbsp;&nbsp;**for each** particle
&nbsp;&nbsp;&nbsp;&nbsp;• Calculate fitness.
&nbsp;&nbsp;&nbsp;&nbsp;• Update pbest if the current position is better than the position contained in the memory.
&nbsp;&nbsp;**End**
&nbsp;&nbsp;Find the global best position (gbest)
&nbsp;&nbsp;**for each** particle
&nbsp;&nbsp;&nbsp;&nbsp;• Update the velocity and position of the particle.
&nbsp;&nbsp;&nbsp;&nbsp;• Apply the velocity/position clipping criterion.
&nbsp;&nbsp;**End**
**End while**
*Report optimum solution (gbest)*
**End**
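The pseudo code above maps directly onto a short runnable sketch. The objective (the sphere function) and all parameter values here are illustrative assumptions, not settings from the slides:

```python
import random

def pso(f, dim, n_particles=20, t_max=200, w=0.7, c1=1.5, c2=1.5, v_max=1.0):
    """Minimize f over R^dim with the standard PSO update and velocity clipping."""
    rng = random.Random(1)
    x = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [xi[:] for xi in x]
    pbest_f = [f(xi) for xi in x]
    g = pbest_f.index(min(pbest_f))
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(t_max):
        for i in range(n_particles):
            for j in range(dim):
                v[i][j] = (w * v[i][j]                                   # momentum
                           + c1 * rng.random() * (pbest[i][j] - x[i][j])  # cognitive
                           + c2 * rng.random() * (gbest[j] - x[i][j]))    # social
                v[i][j] = max(-v_max, min(v_max, v[i][j]))                # clipping
                x[i][j] += v[i][j]
            fx = f(x[i])
            if fx < pbest_f[i]:            # update personal best
                pbest[i], pbest_f[i] = x[i][:], fx
                if fx < gbest_f:           # update global best
                    gbest, gbest_f = x[i][:], fx
    return gbest, gbest_f

best_x, best_f = pso(lambda p: sum(t * t for t in p), dim=3)
```

On this simple unimodal objective the swarm should settle near the origin within the given iteration budget.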

**Animation of PSO **

### •

Function:

*F*(*x*_{1}, *x*_{2}) = (∑_{k=1}^{5} *k* cos((*k*+1)*x*_{1} + *k*)) × (∑_{k=1}^{5} *k* cos((*k*+1)*x*_{2} + *k*)) + (*x*_{1} + 1.42513)^{2} + (*x*_{2} + 0.80032)^{2}

– Minimization problem

– Two decision variables

– No decision variable bounds

**Modification in PSO for Solving SOPs**

### 1. Parameter Settings

### 2. Modifications of the PSO Equation

### 3. Neighborhood Topology

### 4. Mutation/Perturbation Operators

### 5. Multiple-Swarm Concept in PSO

**1. Parameter Settings**

[Figure: when the inertia weight *w* is large, the momentum term dominates and the particle continues far along its previous velocity *v*_{i}(*t*)]

[Figure: when c_{1} >> c_{2}, the particle is pulled mainly towards its personal best position (*pbest*_{i})]

[Figure: when c_{2} >> c_{1}, the particle is pulled mainly towards the global best position (*gbest*)]

### •

Random inertia weight

– Experiments indicate this strategy accelerates the convergence of the particle swarm in the early stages of the algorithm:

*w* = 0.5 + *rand*()/2

– rand() is a uniformly distributed random number within [0,1]

*Y. H. Shi and R. C. Eberhart, "Empirical Study of Particle Swarm Optimization," Proceedings Congress on Evolutionary Computation, Piscataway, pp. 1945-1949, 1999*

### •

Linearly decreasing the inertia weight:

*w*(*t*) = (*w*_{1} − *w*_{2}) × (*tmax* − *t*)/*tmax* + *w*_{2}

– *w*_{1} and *w*_{2} are the initial and final values of the inertia weight

– A larger value of *w* facilitates global search at the beginning of the run

– A smaller *w* encourages more local search ability near the end of the run

– Experiments indicate good performance when the inertia weight descends from 0.9 to 0.4

*R. C. Eberhart and Y. Shi, "Comparing inertia weight and constriction factors in particle swarm optimization," Proceedings Congress on Evolutionary Computation, San Diego, pp. 84-88, 2000*
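The two schedules above are one-liners. The defaults w1 = 0.9, w2 = 0.4 follow the experiment quoted in the slide; the function names are illustrative:

```python
import random

def w_random(rng=random):
    """Random inertia weight: w = 0.5 + rand()/2, uniform over [0.5, 1.0]."""
    return 0.5 + rng.random() / 2.0

def w_linear(t, t_max, w1=0.9, w2=0.4):
    """Linearly decreasing inertia weight from w1 at t=0 down to w2 at t=t_max."""
    return (w1 - w2) * (t_max - t) / t_max + w2
```

A typical run would call `w_linear(t, t_max)` once per iteration inside the velocity update.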

### •

Chaotic inertia weight

– Use a chaotic mapping to set the inertia weight coefficient

– Logistic mapping: *z*(*t*+1) = *μ* × *z*(*t*) × (1 − *z*(*t*))

– Distribution of the logistic map when *μ* = 4, iterated 30,000 times: the iterates land near 0.1 and 0.9 very often; the mean number of times landing in the interval [0.1, 0.9] is 200

– Strategy of chaotic inertia weight:

1. Select a random number *z* in the interval (0, 1)

2. Calculate the logistic mapping, *z*, with *μ* = 4

3. Apply *z* to either the linearly decreasing inertia weight or the random inertia weight:

*w*(*t*) = (*w*_{1} − *w*_{2}) × (*tmax* − *t*)/*tmax* + *w*_{2} × *z*   or   *w* = 0.5 × *rand*() + 0.5 × *z*

### •

Experiments: good convergence precision, quick convergence velocity, and better global search ability

*Yong Feng, Gui-Fa Teng, Ai-Xin Wang, and Yong-Mei Yao, "Chaotic inertia weight in particle swarm optimization," Proceedings 2nd International Conference on Innovative Computing, Information and Control, Kumamoto, Japan, pp. 475-475, 2007*
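A sketch of the chaotic strategy, assuming the linearly decreasing variant of the blend (the function name and seed are illustrative): the logistic map is iterated once per call and its output z modulates the final-weight term.

```python
import random

def chaotic_w(t, t_max, z, mu=4.0, w1=0.9, w2=0.4):
    """One step of chaotic inertia weight: update z by the logistic map,
    then blend it into the linearly decreasing schedule."""
    z = mu * z * (1.0 - z)                           # logistic map, mu = 4
    w = (w1 - w2) * (t_max - t) / t_max + w2 * z
    return w, z

z = random.Random(3).random()                        # initial z in (0, 1)
ws = []
for t in range(5):
    w, z = chaotic_w(t, 100, z)
    ws.append(w)
```

Because z wanders chaotically in (0, 1), the weight keeps a decreasing trend while retaining irregular fluctuations that help global search.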

### •

Time-varying acceleration coefficients (*c*_{1}, *c*_{2})

– A large *c*_{1} and small *c*_{2} in the early stage encourage particles to explore the search space

– Quick convergence to the optimum solution is promoted in the later stage with a larger *c*_{2} and smaller *c*_{1}

*c*_{1}(*t*) = (*c*_{1f} − *c*_{1i}) × *t*/*tmax* + *c*_{1i}

*c*_{2}(*t*) = (*c*_{2f} − *c*_{2i}) × *t*/*tmax* + *c*_{2i}

*A. Ratnaweera, S. K. Halgamuge, and H. C. Watson, "Self-organizing hierarchical particle swarm optimizer with time-varying acceleration coefficients," IEEE Transactions on Evolutionary Computation, Vol. 8, No. 3, June 2004*

**2. Modifications of PSO Equation**

### •

Canonical PSO by Maurice Clerc

– Studied the swarm behavior using second-order differential equations

– The study shows that it is possible to determine under which conditions the swarm will converge

– Introduces a constriction factor χ to ensure convergence by restraining the velocity of the particles

– Observation: the amplitude of a particle's oscillations decreases and increases depending on the distance between *pbest* and *gbest*

*M. Clerc and J. Kennedy, "The Particle Swarm - Explosion, Stability, and Convergence in a Multidimensional Complex Space," IEEE Transactions on Evolutionary Computation, Vol. 6, No. 1, February 2002: 58-73*

– The particle will oscillate around the weighted mean of *pbest* and *gbest*

– If *pbest* and *gbest* are near each other, the particle will perform a local search

– If *pbest* and *gbest* are far apart from each other, the particle will perform a global search

– During the search process, the particle will shift from local search back to global search depending on *pbest* and *gbest*

– The constriction factor balances the need for local and global search depending on how the social conditions are in place

### •

The velocity update equation is

*v*_{i,j}(*t*+1) = χ × [*v*_{i,j}(*t*) + *c*_{1} × *r*_{1} × (*pbest*_{i,j} − *x*_{i,j}(*t*)) + *c*_{2} × *r*_{2} × (*gbest*_{j} − *x*_{i,j}(*t*))]

where

χ = 2κ / |2 − φ − √(φ² − 4φ)|;   φ = *c*_{1} + *c*_{2}, φ > 4;   and κ ∈ [0, 1]

### •

The parameter κ controls the convergence speed to the point of attraction.

### •

If κ is close to zero, χ will be close to zero, and the resulting velocity will be small. A small velocity encourages local search, so the convergence speed is high.

### •

If κ is close to one, there is high exploration behavior but the slowest possible convergence speed.

### •

Experiments: even without the velocity clipping criterion, the constriction factor can prevent the particles from leaving the search space and ensure convergence.
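The constriction factor is a single closed-form expression. With κ = 1 and the widely used setting c1 = c2 = 2.05 (φ = 4.1), it evaluates to χ ≈ 0.7298:

```python
import math

def constriction(c1, c2, kappa=1.0):
    """chi = 2*kappa / |2 - phi - sqrt(phi^2 - 4*phi)|, with phi = c1 + c2 > 4."""
    phi = c1 + c2
    if phi <= 4:
        raise ValueError("phi = c1 + c2 must exceed 4")
    return 2.0 * kappa / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))

chi = constriction(2.05, 2.05)   # approximately 0.7298
```

Multiplying the bracketed velocity update by this χ is algebraically close to using an inertia weight w ≈ 0.73 with scaled acceleration constants, which is why the two formulations behave similarly in practice.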

### •

Gaussian Particle Swarm Model (GPSO)

– Observation shows the expected values for the coefficients of (*pbest* − *x*) and (*gbest* − *x*) are 0.729 and 0.85

– A probability distribution that generates random values with expected values of [0.729, 0.85] is used:

*v*_{i,j}(*t*+1) = |*randn*| × (*pbest*_{i,j} − *x*_{i,j}(*t*)) + |*randn*| × (*gbest*_{j} − *x*_{i,j}(*t*))

*x*_{i,j}(*t*+1) = *x*_{i,j}(*t*) + *v*_{i,j}(*t*+1)

– |*randn*| are positive random numbers generated according to abs[*N*(0,1)]

*R. A. Krohling, "Gaussian swarm: a novel particle swarm optimization algorithm," Proceedings of the IEEE Conference on Cybernetics and Intelligent Systems, Singapore, pp. 372-376, 2004*

**3. Neighborhood Topology**

### •

From the standard PSO equation, the movement of a particle is influenced by both the personal best (*pbest*) and the global best (*gbest*).

### •

Neighborhood topology – the topology of a swarm (usually replacing *gbest*).

### •

Neighborhood topology – how the particles in the swarm are connected with each other in terms of sharing their knowledge.

### •

The convergence rate can be estimated by calculating the average distance between two particles in the neighborhood topology.

### •

A shorter average distance facilitates quicker convergence speed – this corresponds to a higher degree of connectivity.

### •

Global Topology (STAR)

– Sharing happens in every iteration

– All the particles are connected such that knowledge is shared by all particles

– The best solution found by any particle in the swarm will influence all the particles at the next iteration

[Figure: star topology with particles a–g all connected to each other]

### •

Wheel Topology

– When a particle (example: b) finds the global best, the hub particle a will immediately be drawn to it (at the next iteration)

– Only when particle a moves to that location does it influence the rest of the particles

– One or more iterations are required before all the particles in the neighborhood are influenced by the global best

[Figure: wheel topology with hub particle a connected to particles b–g; the rest of the particles are drawn towards particle a]

### •

Ring Topology (Circle or lbest)

– Each particle is connected with its *K* immediate neighbors

– When one particle (example: b) finds the global best, only its immediate neighbors (i.e., a & c) will be drawn to b

– Other particles are not influenced by b until their immediate neighbors have moved towards that location

– A few iterations may be required before all the particles in the neighborhood are influenced by the global best

[Figure: ring topology with particles a–g connected in a circle]

### Each neighborhood topology has strengths and weaknesses

### STAR – Fast information flow; converges fast, but with potential for premature convergence

### Wheel – Moderate information flow; converges more quickly but sometimes to a suboptimal point in the space

### Circle – Slow information flow; converges more slowly and favors more exploration; may have a better chance of finding better solutions, albeit slowly

### There are many more neighborhood topologies, and the choice among them is problem dependent

**4. Mutation/Perturbation Operators**


### Problem of PSO – lack of diversity; easily trapped in a local optimum.

### A mutating particle may be able to lead the other particles away from their current position if this particle becomes the global best.

### Hence, apply mutation operators to PSO – a strategy to enhance the exploration of the particles and to escape local minima.

### Where to apply the mutation operators in PSO?

– Apply to the updated particle's position (decision variable)

– Apply to the user-defined maximum velocity threshold

### The mutation operators are the mutation approaches used for GA or MOEA. Examples:

– Random mutation; non-uniform mutation; normally distributed mutation

– Example of Gaussian mutation used for PSO*:

*x*_{mutation} = *x* × (1 + Gaussian(σ))

where σ is set to 0.1 times the length of the search space in one dimension; OR σ can be set at 1.0 and linearly decreased to 0 as the iteration count reaches the maximum criterion
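A minimal sketch of this Gaussian mutation, with both σ settings from the slide; the per-dimension mutation probability `rate` is an added assumption, since the slide does not specify how many dimensions are mutated:

```python
import random

def gaussian_mutate(position, sigma, rate=0.1):
    """x' = x * (1 + Gaussian(0, sigma)), applied per dimension with
    probability `rate` (the rate itself is an illustrative assumption)."""
    return [x * (1.0 + random.gauss(0.0, sigma)) if random.random() < rate else x
            for x in position]

# Option 1: sigma fixed at 0.1 times the search-space length in one dimension
lo, hi = -5.0, 5.0
sigma = 0.1 * (hi - lo)

# Option 2: sigma starts at 1.0 and linearly decreases to 0 over the run
def decayed_sigma(iteration, max_iterations):
    return 1.0 * (1.0 - iteration / max_iterations)
```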

**5. Multiple-Swarm Concept in PSO**


### Improve performance by promoting exploration and diversity

– Counter PSO's tendency toward premature convergence

### Three main groups

– Improve the performance of PSO by promoting diversity

– Solve multimodal problems

– Locate and track the optima of a multimodal problem in a dynamic environment


**Kennedy proposed using a k-means clustering**

### algorithm to identify the centers of different clusters of particles in the population, and then use these cluster centers to substitute for the personal bests

– Requires a pre-specified number of iterations to determine the cluster centers and a pre-specified number of clusters

– Not suitable for multimodal problems, since the cluster centers are not necessarily the best-fit particles in their clusters

*J. Kennedy, "Stereotyping: improving particle swarm performance with cluster analysis," Proceedings of Congress on Evolutionary Computation, San Diego, CA, pp. 1507-1512, 2000

– Number of clusters and number of iterations to identify the cluster centers must be predetermined.

– Cluster A's center performs better than all members of cluster A, whereas cluster B's center performs better than some and worse than others*
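The stereotyping idea can be sketched as follows: run a plain k-means (with k and the iteration count fixed in advance, as noted above) over the personal bests, then hand every particle its cluster's center instead of its own pbest. Both helper names are illustrative, not from Kennedy's paper:

```python
import random

def kmeans(points, k, iters=10):
    """Plain k-means over lists of coordinates; returns (centers, labels)."""
    centers = random.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        for i, p in enumerate(points):   # assign each point to nearest center
            labels[i] = min(range(k),
                            key=lambda c: sum((a - b) ** 2
                                              for a, b in zip(p, centers[c])))
        for c in range(k):               # recompute centers as cluster means
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centers[c] = [sum(d) / len(members) for d in zip(*members)]
    return centers, labels

def stereotyped_pbests(pbests, k):
    """Substitute each personal best with its cluster's center
    ('stereotyping'); k and the k-means iteration count are fixed up front."""
    centers, labels = kmeans(pbests, k)
    return [centers[labels[i]] for i in range(len(pbests))]
```

Note the weakness from the slide: the returned center may be worse than some members of its cluster, which is why this is unsuited to multimodal problems.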


### Chen and Yu's TPSO

– The **first subswarm** will optimize following the global best; the **second subswarm** will move in the opposite direction

– *A particle's pbest is updated based on its local best, the corresponding subswarm's best, and the global best collected from the two subswarms*

– If the global best has not improved for 15 successive iterations, the worst particles of a subswarm are replaced by the best ones of the other subswarm. Then the subswarms switch their flight directions

*G. Chen and J. Yu, “Two sub-swarms particle swarm optimization algorithm,” Proceeding of International Conference on *
*Natural Computation, Changsha, China, pp. 515-524, 2005 *
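The stagnation-triggered exchange in TPSO can be sketched as below. The function name, the symmetric swap of copies, and treating particles as plain values are simplifications for illustration – in the paper the particles carry positions and velocities, and the subswarms switch flight directions afterwards:

```python
def exchange_on_stagnation(swarm_a, swarm_b, fitness, stagnation,
                           limit=15, n_swap=1):
    """If the global best has stagnated for `limit` iterations, replace the
    worst n_swap particles of each subswarm with copies of the best of the
    other (illustrative sketch of the TPSO exchange step, minimization)."""
    if stagnation < limit:
        return swarm_a, swarm_b, False
    a_sorted = sorted(swarm_a, key=fitness)   # best first
    b_sorted = sorted(swarm_b, key=fitness)
    new_a = a_sorted[:-n_swap] + b_sorted[:n_swap]   # drop worst, add other's best
    new_b = b_sorted[:-n_swap] + a_sorted[:n_swap]
    return new_a, new_b, True
```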


### Multi-population cooperative optimization (MCPSO)

– Based on the concept of master-slave mode

– The swarm population has one master swarm and multiple slave swarms

– Slave swarms explore the search space independently to maintain diversity of particles

– The master swarm updates via the best particles collected from the slave swarms

*B. Niu, Y. Zhu, and X. He, “Multi-population cooperative particle swarm optimization,” Proceeding of European *
*Conference on Artificial Life, Canterbury, UK, pp. 874-883, 2005 *
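A one-dimensional sketch of the master-swarm update: in addition to the usual pbest and swarm-best terms, the master particle is attracted toward the best particle collected from the slave swarms. The coefficient values below are common PSO defaults, not taken from the MCPSO paper:

```python
import random

def master_velocity(v, x, pbest, master_best, slave_best,
                    w=0.729, c1=1.49, c2=1.49, c3=1.49):
    """Illustrative MCPSO-style master update (one dimension): the master
    particle is pulled toward its own pbest, the master swarm's best, and
    the best particle collected from the slave swarms."""
    return (w * v
            + c1 * random.random() * (pbest - x)
            + c2 * random.random() * (master_best - x)
            + c3 * random.random() * (slave_best - x))
```

When all three attractors coincide with the current position, only the inertia term remains, as in standard PSO.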


### Speciation-based PSO (SPSO) is proposed by Parrott and Li

– Notion of species: a group of individuals sharing common attributes according to some similarity metric

– **A radius, r**_{s}, is measured in Euclidean distance from the center of a species to its boundary

– *The center of a species, the species seed, is always the fittest individual in the species*

– All particles that fall within that distance from the species seed are classified as the same species

D. Parrott and X. Li, "Locating and tracking multiple dynamic optima by a particle swarm model using speciation," IEEE Transactions on Evolutionary Computation, Vol. 10, No. 4, pp. 440-458, 2006
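Determining the species seeds can be sketched as follows, assuming minimization: particles are ranked by fitness, and each particle either joins the species of the first existing seed within r_s or founds a new species with itself as the seed. The function and variable names are illustrative:

```python
def species_seeds(particles, fitness, r_s):
    """Speciation sketch: rank by fitness (best first); a particle within
    Euclidean distance r_s of an existing seed joins that seed's species,
    otherwise it becomes a new seed. Returns (seeds, particle -> seed map)."""
    ranked = sorted(particles, key=fitness)
    seeds, species = [], {}
    for p in ranked:
        for s in seeds:
            if sum((a - b) ** 2 for a, b in zip(p, s)) ** 0.5 <= r_s:
                species[tuple(p)] = tuple(s)   # joins an existing species
                break
        else:
            seeds.append(p)                    # fittest so far in its region
            species[tuple(p)] = tuple(p)
    return seeds, species
```

Because seeds are taken in fitness order, every seed is by construction the fittest individual of its species, matching the definition above.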

**PSO Designs Inspired by Other Fields**


### Emotion Particle Swarm Optimization

– Each particle has two emotions (joy and sadness)

– The emotional state of a particle is determined by comparing its emotional factor with a random value

– If a certain condition is met, the particle is updated using the "joyful" velocity equation; otherwise the "sad" velocity equation is applied

– A psychological model is incorporated in both the "joyful" and "sad" velocity equations

*B. Niu, Y.L. Zhu, K.Y. Hu, S.F. Li, X.X. He, "A novel particle swarm optimizer using optimal foraging theory," Proceedings of Computational Intelligence and Bioinformatics, Part 3, Kunming, China, Vol. 4115, pp. 61-71, 2006*

**A**^{NT} **C**^{OLONY} **O**^{PTIMIZATION}

**Biological Fact**

### •

Ant Colony Optimization (ACO) studies artificial systems that take inspiration from the behavior of real ant colonies and which are used to solve discrete optimization problems.

### •

Real ants are capable of finding the shortest path from a food source to their nest, without using visual cues, by exploiting pheromone information.

### •

While walking, ants deposit pheromone on the ground and follow, in probability, pheromone previously deposited by other ants.

### •

They are also capable of adapting to changes in the environment, for example finding a new shortest path once the old one is no longer feasible due to a new obstacle (dynamic environments).

### •

Consider the following figure, in which ants are moving on a straight line that connects a food source to the nest:

### •

It is well known that the primary means used by ants to form and maintain the line is a pheromone trail. Ants deposit a certain amount of pheromone while walking, and each ant probabilistically prefers to follow a direction rich in pheromone rather than a poorer one. This elementary behavior of real ants can be used to explain how they can find the shortest path that reconnects a broken line after the sudden appearance of an unexpected obstacle has interrupted the initial path:

### •

In fact, once the obstacle has appeared, those ants which are just in front of the obstacle cannot continue to follow the pheromone trail and therefore have to choose between turning right or left. In this situation we can expect half the ants to choose to turn right and the other half to turn left. The very same situation can be found on the other side of the obstacle.

### •

It is interesting to note that those ants which choose, by chance, the shorter path around the obstacle will more rapidly reconstitute the interrupted pheromone trail compared to those which choose the longer path. Hence, the shorter path will receive a higher amount of pheromone per unit time, and this in turn causes a higher number of ants to choose the shorter path. Due to this positive feedback (autocatalytic) process, very soon all the ants will choose the shorter path.

### •

The most interesting aspect of this autocatalytic process is that finding the shortest path around the obstacle seems to be an *emergent property of the interaction between the obstacle shape and the ants' distributed behavior*: although all ants move at approximately the same speed and deposit a pheromone trail at approximately the same rate, it takes longer to contour an obstacle on its longer side than on its shorter side, which makes the pheromone trail accumulate more quickly on the shorter side. It is the ants' preference for higher pheromone trail levels which makes this accumulation still quicker on the shorter path.

• [R1] Beckers R., Deneubourg J.L. and S. Goss (1992). Trails and U-turns in the selection of the shortest path by the ant Lasius niger. Journal of Theoretical Biology, 159, 397-415.

• [R2] Hölldobler B. and E.O. Wilson (1990). The Ants. Springer-Verlag, Berlin.
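The positive-feedback argument above can be illustrated with a toy double-bridge simulation: each ant picks a branch with probability proportional to its pheromone level, and a branch of length L receives 1/L pheromone per trip (more deposits per unit time on the shorter branch), so the shorter branch reinforces itself. All parameter values and names here are illustrative:

```python
import random

def double_bridge(n_ants=1000, short_len=1.0, long_len=2.0, steps=50, seed=1):
    """Toy simulation of the double-bridge experiment: branch choice is
    proportional to pheromone; deposit rate is inversely proportional to
    branch length, producing the autocatalytic feedback described above."""
    random.seed(seed)
    tau = {"short": 1.0, "long": 1.0}       # equal pheromone at the start
    for _ in range(steps):
        for _ in range(n_ants):
            p_short = tau["short"] / (tau["short"] + tau["long"])
            branch = "short" if random.random() < p_short else "long"
            length = short_len if branch == "short" else long_len
            tau[branch] += 1.0 / length     # shorter path: more pheromone per trip
    return tau
```

After a run, the short branch carries substantially more pheromone, so almost every ant ends up choosing it.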

**History**

### •

The first ACO system was introduced by Marco Dorigo in his Ph.D. thesis (1992), and was called Ant System (AS).

### •

AS is the result of research on computational intelligence approaches to combinatorial optimization that Dorigo conducted at Politecnico di Milano in collaboration with Alberto Colorni and Vittorio Maniezzo. AS was initially applied to the travelling salesman problem and to the quadratic assignment problem.

### •

^{Si}

^{1995 D i}

^{G}

^{b d ll}

^{d Stüt l h}

^{b}

^{ki}

### •

Since 1995 Dorigo, Gambardella and Stützle have been working on various extended versions of the AS paradigm. Dorigo and Gambardella have proposed Ant Colony System (ACS), while Stützle and Hoos have proposed MAX-MIN Ant System (MMAS).### ••

Both have been applied to the symmetric and asymmetric traveling salesman problem, with excellent results. Dorigo, Gambardella and Stützle have also proposed new hybrid versions of ant colony optimization with local search. In problems like the quadratic assignment problem and the sequential ordering problem, these ACO algorithms outperform all known algorithms on vast classes of benchmark problems.