MOEA NEURAL NETWORK DESIGN
Design Specification
• Essential requirements for neural network design
o A training algorithm that can search for the optimal parameters (i.e., weights or biases) for the specified network structure and training task.
o A rule or algorithm that can determine the network complexity and ensure that it is sufficient for the given training problem.
o A metric or measure to evaluate the reliability and generalization of the produced neural network.
Design Dilemma
• A single optimal neural network $f^*_{NS}(x, \omega^*)$ is very difficult to find.
o Weights are tuned by training sets, which are finite: it is difficult to extract $f^*_{NS}(x, \omega^*)$ from a finite training data set.
o There is a trade-off between NN learning capability and the number of hidden neurons:
  – A network with insufficient neurons gives low training performance.
  – A network with an excessive number of neurons generalizes poorly.
• Instead of trying to obtain a single optimal neural network, finding a set of near-optimal networks with different network structures is more feasible.
• NN design is a multiobjective optimization problem (achieve better performance while simplifying the network structure).
Design Principle
• A population-based, parallel-searching multiobjective genetic algorithm is suitable for neural network design: it finds a set of non-dominated neural network solutions.
o Find a uniformly distributed Pareto front in one run
o Treat discontinuous, concave and other shapes of Pareto front equally
[Figure: Pareto front in objective space; f1: network complexity (neuron number), f2: network training error]
Hierarchical Genotype
• Using GA to evolve RBF neural networks
o Evolving NN topology along with parameters
o Hierarchical gene structure in genotype design
Phenotype ($\omega_i$: weight, $c_i$: center):
$$ f(x) = \sum_{i=1}^{m} \omega_i \exp\left(-\|x - c_i\|^2\right) $$
[Figure: genotype made of control genes (e.g. 1 0 0 1 0 1), weight genes and center genes, and the phenotype it decodes to]
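The decoding from genotype to phenotype can be sketched as follows. This is a minimal illustration, assuming that a control gene of 1 switches on the hidden neuron at the same index and that the RBF has unit width as in the phenotype formula; the function name `decode_rbf` and the sample gene values are hypothetical, not taken from the slides.

```python
import numpy as np

def decode_rbf(control_genes, weight_genes, center_genes):
    """Decode a hierarchical genotype into an RBF network f(x).

    Assumption: control gene 1 activates the neuron at that index,
    control gene 0 leaves its weight and center genes dormant.
    """
    active = np.asarray(control_genes, dtype=bool)
    weights = np.asarray(weight_genes, dtype=float)[active]
    centers = np.asarray(center_genes, dtype=float)[active]

    def f(x):
        x = np.asarray(x, dtype=float)
        sq_dist = np.sum((x - centers) ** 2, axis=1)   # ||x - c_i||^2
        return float(np.sum(weights * np.exp(-sq_dist)))

    return f, int(active.sum())

# Hypothetical genotype with six candidate neurons, three of them active.
control = [1, 0, 0, 1, 0, 1]
weights = [0.5, 9.0, 9.0, -1.0, 9.0, 2.0]
centers = [[0.0, 0.0], [5.0, 5.0], [5.0, 5.0], [1.0, 0.0], [5.0, 5.0], [0.0, 1.0]]
net, n_neurons = decode_rbf(control, weights, centers)
print(n_neurons, net([0.0, 0.0]))
```

Because dormant genes stay in the chromosome, a single mutation of a control gene can change the network complexity (objective f1) without discarding the weights and centers already evolved.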
Domination Ranking
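• Goal: to convert the multiple fitness values of a multiobjective problem into a single rank value.
• The rank value represents the dominance relationship, obtained by dividing the resulting population into layers.
[Figure: ranked population in the f1-f2 objective space]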
Diversity Preservation
• Goal: to maintain population diversity during the evolutionary process in order to obtain a uniformly distributed Pareto front
[Figure: Pareto fronts in the f1-f2 objective space]
Automatic Accumulated Ranking
• Combine the hierarchical genotype representation with the Rank-Density based Genetic Algorithm (RDGA). The Automatic Accumulated Ranking Strategy includes both dominance and diversity information.
• Every nondominated individual has rank = 1. For a dominated individual y at generation t, the rank value is given by
$$ rank(y, t) = 1 + \sum_{j=1}^{p(t)} rank(y_j, t) $$
where p(t) is the number of individuals that dominate y, and $y_1, \ldots, y_{p(t)}$ are those individuals.
[Figure: example population in the f1-f2 objective space with accumulated rank values 1, 1, 2, 2, 3, 6, 8 and 12]
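The accumulated rank can be computed directly from the definition. The sketch below assumes that both objectives are minimized; the helper names and the sample points are illustrative only.

```python
def dominates(a, b):
    """True if objective vector a dominates b (all objectives minimized)."""
    return all(ai <= bi for ai, bi in zip(a, b)) and any(ai < bi for ai, bi in zip(a, b))

def accumulated_ranks(objectives):
    """rank(y) = 1 + sum of the ranks of every individual that dominates y."""
    n = len(objectives)
    dominators = [[j for j in range(n) if dominates(objectives[j], objectives[i])]
                  for i in range(n)]
    ranks = [0] * n
    # A dominator always has strictly fewer dominators than the individual it
    # dominates, so visiting in this order ranks every dominator first.
    for i in sorted(range(n), key=lambda k: len(dominators[k])):
        ranks[i] = 1 + sum(ranks[j] for j in dominators[i])
    return ranks

# Hypothetical population: (f1, f2) pairs.
points = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
print(accumulated_ranks(points))  # → [1, 1, 1, 2, 6]
```

Individuals buried behind several dominated layers accumulate large rank values quickly, which is why the example figure contains ranks such as 8 and 12.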
Adaptive Cell Density Estimation
• In HRDGA, an adaptive grid density estimation approach is proposed. The length of an adaptive grid cell in the i-th dimension of the objective space is computed as
$$ d_i = \frac{\max_{x \in X} f_i(x) - \min_{x \in X} f_i(x)}{K_i}, \quad i = 1, \ldots, n $$
• The density value of each individual is the number of individuals located in the same cell.
[Figure: grid cells in the f1-f2 objective space; an individual in a cell holding four individuals has density = 4]
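A minimal sketch of the cell-density computation follows. It assumes that K_i is the number of cells along objective i and that an individual sitting exactly on the maximum is placed in the last cell; both are assumptions for illustration, as are the function name and the data.

```python
import numpy as np
from collections import Counter

def cell_density(objectives, cells_per_dim):
    """Density of each individual = number of individuals in its grid cell."""
    F = np.asarray(objectives, dtype=float)
    K = np.asarray(cells_per_dim, dtype=int)
    f_min, f_max = F.min(axis=0), F.max(axis=0)
    d = (f_max - f_min) / K                  # cell length d_i per objective
    d[d == 0] = 1.0                          # guard for a degenerate objective
    idx = np.floor((F - f_min) / d).astype(int)
    idx = np.minimum(idx, K - 1)             # the maximum falls in the last cell
    cells = [tuple(int(v) for v in row) for row in idx]
    counts = Counter(cells)
    return [counts[c] for c in cells]

# Hypothetical (f1, f2) values with a 2 x 2 grid.
objs = [(0.0, 0.0), (0.1, 0.1), (1.0, 1.0), (0.9, 0.95), (0.2, 0.9)]
print(cell_density(objs, (2, 2)))  # → [2, 2, 2, 2, 1]
```

Because the grid is rebuilt from the current minimum and maximum of each objective, the cell size adapts as the population moves through objective space.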
Fitness Assignment
• In HRDGA, the original population is divided into two sub-populations that optimize the rank and density values independently, so the rank and density values can be optimized simultaneously!
[Figure: from the i-th generation to the (i+1)-th generation: the population is split into sub-population 1 (minimize rank value) and sub-population 2 (minimize density value); selection is applied to each, followed by crossover and mutation]
• An elitist archive is used, and an adaptive probability of sampling an individual from the archive is applied.
[Figure: main population and elitists’ archive]
Local Search
• Diffusion scheme from Cellular GA
[Figure: diffusion scheme showing the selected parent, the best individual and the offspring]
Forbidden Region
• To prevent the appearance of offspring with a high rank value and a low density value
[Figure: selected parent, offspring and the forbidden region]
Mackey-Glass Time Series Prediction
• Algorithms for comparison
o K-Nearest Neighbor (KNN, Kaylani & Dasgupta, 2000)
o Generalized Regression Neural Network (GRNN, Wasserman, 1993)
o Orthogonal Least Square (OLS, Chen et al., 1991)
• Testing problem: Mackey-Glass chaotic time series
$$ \frac{dx(t)}{dt} = \frac{a \times x(t - \tau)}{1 + x^{c}(t - \tau)} - b \times x(t) $$
o To predict x(t+6) based on x(t), x(t-6), x(t-12) and x(t-18)
o a = 0.2, b = 0.1, c = 10, τ = 150
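The benchmark data can be generated along the following lines. This is only a sketch: the slides do not say how the delay differential equation was integrated, so the forward-Euler scheme, the step `dt = 1.0`, the constant initial history `x0 = 1.2` and the function names are all assumptions.

```python
import numpy as np

def mackey_glass(n_steps, a=0.2, b=0.1, c=10, tau=150, dt=1.0, x0=1.2):
    """Forward-Euler integration of the Mackey-Glass equation (assumed scheme)."""
    lag = int(round(tau / dt))
    x = np.full(n_steps + lag, x0)           # constant history for t <= 0
    for t in range(lag, n_steps + lag - 1):
        x_tau = x[t - lag]
        x[t + 1] = x[t] + dt * (a * x_tau / (1.0 + x_tau ** c) - b * x[t])
    return x[lag:]

def make_patterns(series, lags=(0, 6, 12, 18), horizon=6):
    """Inputs [x(t), x(t-6), x(t-12), x(t-18)] and target x(t+6)."""
    start, stop = max(lags), len(series) - horizon
    X = np.array([[series[t - k] for k in lags] for t in range(start, stop)])
    y = np.array([series[t + horizon] for t in range(start, stop)])
    return X, y

series = mackey_glass(1250)
X, y = make_patterns(series)
print(X.shape, y.shape)  # → (1226, 4) (1226,)
```

Splitting `X` and `y` by time index then yields a training segment and later testing segments of the kind used in the simulation study.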
Parameter Setting
• For HRDGA, the lengths of all three layers of genes are chosen to be 150; the population size is 400.
• For the KNN and GRNN methods, 70 individual networks with 11 to 80 neurons are used for comparison.
• For the OLS method, 40 different values of the tolerance parameter ρ are chosen from 0.01 to 0.4 with step 0.01. ρ determines the trade-off between the performance and complexity of a network.
• Stopping criteria: either the number of epochs (generations) exceeds 5,000, or the change in training Sum Square Error (SSE) between two sequential epochs (generations) is smaller than 0.01.
• Training data set: the first 250 seconds of data.
• Testing data sets: data from 250 – 499, 500 – 749, 750 – 999 and 1,000 – 1,249 seconds
Simulation Study- Training Set
[Figures: training SSE vs. neuron numbers for training set; non-dominated front for training set]
Simulation Study- Testing Set #1
[Figures: training SSE vs. neuron numbers for testing set #1; non-dominated front for testing set #1]
Simulation Study- Testing Set #2
[Figures: training SSE vs. neuron numbers for testing set #2; non-dominated front for testing set #2]
Simulation Study- Testing Set #3
[Figures: training SSE vs. neuron numbers for testing set #3; non-dominated front for testing set #3]
Simulation Study- Testing Set #4
[Figures: training SSE vs. neuron numbers for testing set #4; non-dominated front for testing set #4]
Performance Comparison
Best performance (SSE and the corresponding neuron number) for the training set and for testing sets #1 to #4:

         Training set       Testing set #1     Testing set #2     Testing set #3     Testing set #4
Method   SSE      Neurons   SSE      Neurons   SSE      Neurons   SSE      Neurons   SSE      Neurons
KNN      2.8339   69        3.3693   42        3.4520   42        4.8586   48        4.8074   19
GRNN     2.3382   68        2.7720   38        3.0711   43        2.9644   40        3.2348   37
OLS      2.3329   60        2.4601   46        2.5856   50        2.5369   37        2.7199   54
HRDGA    2.2901   74        2.4633   47        2.5534   52        2.5226   48        2.7216   58
Problem for OLS
• The trade-off characteristic between network performance and complexity depends entirely on the value of the tolerance parameter ρ.
• The same ρ value means completely different trade-off features for different NN design problems: the designer cannot control the network complexity.
• The relationship between the ρ value and the network topology is a nonlinear, many-to-one mapping.
Observations
• A new genetic-algorithm-assisted neural network design approach, the Hierarchical Rank-Density Genetic Algorithm (HRDGA), is proposed.
• Characteristics of HRDGA:
o Hierarchical genotype representation: evolve topology along with parameters
o Multiobjective optimization: find a set of near-optimal network candidates in one run
o Elitism, local search and forbidden region techniques help HRDGA find near-optimal networks with low training errors.
o The network complexity is nearly completely and uniformly sampled due to HRDGA’s diversity maintaining scheme.
PARTICLE SWARM OPTIMIZATION
Definition
• Swarm Intelligence (SI) is the property of a system whereby the collective behaviors of (unsophisticated) agents interacting locally with their environment cause coherent functional global patterns to emerge.
• SI provides a basis with which it is possible to explore collective (or distributed) problem solving without centralized control or the provision of a global model.
In Simple Language…
• SI systems are typically made up of a population of simple agents interacting locally with one another and with their environment to accomplish a simple goal.
• Although there is normally no centralized control structure dictating how individual agents should behave, local interactions between such agents often lead to the emergence of global behavior (swarming behavior).
SI in Nature
[Figures: fish schooling; bird flocking in V formation (©CORO, Caltech)]
The benefits of forming a swarm of thousands of animals (agents) are increased foraging efficiency and better defense against predators.
• Another example: nest building in termites
• An eighteen-foot-tall termite nest at the University of Ghana, Legon. (© Kjell B. Sandved / Visuals Unlimited)
How Termites Build their Nest
1. Each termite scoops up a mudball from its environment and invests the ball with pheromones.
2. Then it randomly drops the mudball on the ground or at an elevated position to form small heaps.
3. Termites are attracted to their nestmates' pheromones. Hence, they are more likely to drop their own mudballs near or on top of their neighbors'.
4. The process stops when a heap reaches a specific height.
5. Termites look for clusters of heaps, choose heap clusters, and connect them with each other by building walls.
6. Over time this leads to the construction of pillars, arches, tunnels and chambers.
Fundamental Concepts of SI
• Self-organization process
– Each agent plays its role by interacting with its environment to gather the latest information, constantly making decisions based on some simple local rules and the information received, and interacting locally with other agents
• Division of labor
– Different groups of agents have their own specializations to carry out certain tasks
– They collaborate with other groups and perform their own tasks simultaneously
Applications of ‘Swarm’ Principle
• Robotics
– Swarm bots http://www.swarm-bots.org/
• Computer Animation
– http://gvu.cc.gatech.edu/animation/Areas/group_behavior/group.html
• Games, interactive graphics and virtual reality
• “Swarm-like” algorithms for solving optimization problems
– Particle Swarm Optimization (PSO)
– Ant Colony Optimization (ACO)
Early Particle Swarm Optimization
• Craig Reynolds’s flock motion (1986-1987)
– Models the flocking behavior of simple agents (boids)
– The resulting flock motion arises from the interaction between the behaviors of individual boids
– Each boid has its own coordinate system and applies a geometric flight model to support its flight movement
– The geometric flight model includes translation and the flight dynamic parameters of yaw, pitch, and banking (roll)
– Incorporates three steering behaviors (local rules), which are the underlying concept of flocking
– Each boid has its own local neighborhood (the limited perception range of birds or fish in nature)
[Figure: a boid's local neighborhood, defined by an angle and a distance]
The three steering behaviors describe the maneuverability of an individual boid:
Separation: steer to avoid crowding local flockmates (neighbors)
Alignment: steer towards the average heading of local flockmates (neighbors)
Cohesion: steer to move toward the average position of local flockmates (neighbors)
[Figure: simulated boid flock avoiding cylindrical obstacles (1986)]
Reference: http://www.red3d.com/cwr/boids/index.html
• Reynolds’s pioneering work became the stepping stone for the development of a computer graphics area known as behavioral animation
– Georgia Tech's physically-based models of group behaviors
– Stampede sequence (The Lion King)
– Orc army (The Lord of the Rings)
• Heppner’s artificial bird flock simulation
– Studied bird flocks from movies
– Collaborated with Grenander and Potter to develop a program that simulates artificial bird flocks
– Through observation, Heppner realized that chaos theory can be used to explain the emergent behavior in flocking
– Designed four simple rules to model an individual bird’s behavior
F. Heppner and U. Grenander, “A stochastic nonlinear model for coordinated bird flocks,” The Ubiquity of Chaos, ed. S. Krasner, AAAS Publications, Washington, DC, 1990
1. An attractive force allows the birds to attract each other, and a repulsive force forbids the birds from flying too close to each other
2. Each bird maintains the same velocity as its neighboring birds
3. Occasionally, the birds’ flight path can be altered by a random input (craziness)
4. All birds are attracted to a roost, and the attraction increases as the birds fly closer to the roost
• The whole concept is as follows:
– The birds begin to fly around with no particular destination
– Once a bird discovers the roost, it will move away from the flock and land on the roost
– Hence, it will pull its nearby neighbors towards the roost
– As these neighbors discover the roost, they will land on the roost and bring more birds with them
– This process goes on until the entire flock lands on the roost
Particle Swarm Optimization (PSO)
• Inspired by the “roosting area” concept, James Kennedy (social psychologist) and Russell Eberhart (electrical engineer) revised the methodology proposed by Heppner and Grenander
• In nature, how do the birds know where to locate food (“roost”) when they are hundreds of feet in the air?
• They explored the area of social psychology, which relates to the social behavior of human beings
Reference: http://www.adaptiveview.com/articles/ipsop1.html
• Their conclusion: knowledge is shared within the flock
• They also included the “mind of social” viewpoint, which holds that
– Individuals want to be individualistic, i.e. to improve themselves.
– Individuals want to learn from the success of their neighbors (both locally and globally), primarily by learning from their “experiences”.
• Hence, they developed Particle Swarm Optimization (PSO)
About PSO
• PSO is a population-based optimization technique
• The population is called a swarm (swarm population)
• The potential solutions (particles) form a swarm “flying” around the search space to search for the best solution
• Particles = candidate solutions (decision variables)
• Particles’ flights are governed by historical information:
– Velocity (v)
– Own personal best position found so far (pbest)
– Global best position discovered so far by any particle in the swarm (gbest)
[Figure: a particle in the (x1, x2) search space, showing its current position x_i(t), its velocity v_i(t), its personal best position pbest_i, the global best position gbest, and its new position x_i(t+1)]
Standard PSO Equation
At each iteration t, for each particle i, for each dimension j:

Update velocity:
$$ v_{i,j}(t+1) = w \times v_{i,j}(t) + c_1 r_{1,j} \left( pbest_{i,j} - x_{i,j}(t) \right) + c_2 r_{2,j} \left( gbest_j - x_{i,j}(t) \right) $$
The three terms are the momentum component, the cognitive component and the social component, respectively.

Update position:
$$ x_{i,j}(t+1) = x_{i,j}(t) + v_{i,j}(t+1) $$

• r1, r2: random numbers within [0,1]
• c1, c2: acceleration constants
• Momentum component: the inertial weight w controls the impact of the previous velocity
• Cognitive component: personal thinking of each particle; the personal desire to exceed its current achievement
• Social component: social knowledge attained via the collaborative effort of all the particles
Velocity Clipping Criterion
• Kennedy investigated the swarm behavior when the velocity clipping criterion is not introduced.
• Without the velocity clipping criterion, the swarm would diverge in sine-like waves of increasing amplitude, without being able to converge to the global optimum
• Hence, the velocity clipping criterion is necessary for the swarm to converge close to or onto the global optimum
J. Kennedy, and R. C. Eberhart, Swarm Intelligence, ISBN 1-55860-595-9, Academic Press (2001)
[Figures: trajectory of a particle without the velocity clipping criterion; trajectory of a particle with the velocity clipping criterion]
Jakob Vesterstrøm and Jacques Riget, “Particle Swarms: Extensions for improved local, multi-modal, and dynamic search in numerical optimization,” May 2002.
• To prevent the particles from leaving the search space, one of the following steps can be taken:
– Velocity clipping criterion: each particle is not allowed to have a velocity outside the user-defined range $[-v_{max}, v_{max}]$. Usually, $v_{max}$ is chosen to be $k \times x_i^{max}$, where $x_i^{max}$ is the feasible bound for the variables, i.e. $[x_i^L, x_i^U]$
– Position clipping criterion: each particle is not allowed to have decision variables that exceed the feasible bound for the variables, $[x_i^L, x_i^U]$
PSO Algorithm (Pseudo Code)
/Initialization/
Initialize swarm randomly (particles and velocities)
Set w, c1, c2, max number of iterations (tmax)
Store particles’ positions (pbest)
Begin
  While t < tmax
    For each particle
      • Calculate fitness.
      • Update pbest if the current position is better than the position contained in the memory.
    End
    Find global best position (gbest)
    For each particle
      • Update velocity and position of particle.
      • Apply velocity/position clipping criterion
    End
  End while
  Report optimum solution (gbest)
End
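The pseudo code can be turned into a small runnable sketch. The parameter values (`w=0.7`, `c1=c2=1.5`, `k=0.5`), the choice of `v_max = k × (upper − lower)`, the sphere test function and the function name `pso` are assumptions made for illustration; the loop also evaluates fitness right after the move rather than at the top of the iteration, which is equivalent in effect.

```python
import numpy as np

def pso(fitness, lower, upper, n_particles=30, t_max=200,
        w=0.7, c1=1.5, c2=1.5, k=0.5, seed=0):
    """Minimal gbest PSO for minimization with velocity and position clipping."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    v_max = k * (upper - lower)                      # assumed clipping range
    x = rng.uniform(lower, upper, size=(n_particles, lower.size))
    v = rng.uniform(-v_max, v_max, size=x.shape)
    pbest = x.copy()
    pbest_val = np.array([fitness(p) for p in x])
    gbest = pbest[np.argmin(pbest_val)].copy()
    for _ in range(t_max):
        r1 = rng.random(x.shape)
        r2 = rng.random(x.shape)
        # momentum + cognitive + social components
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        v = np.clip(v, -v_max, v_max)                # velocity clipping
        x = np.clip(x + v, lower, upper)             # position clipping
        val = np.array([fitness(p) for p in x])
        better = val < pbest_val
        pbest[better] = x[better]
        pbest_val[better] = val[better]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, float(pbest_val.min())

def sphere(p):
    return float(np.sum(p ** 2))

best_x, best_f = pso(sphere, lower=[-5, -5], upper=[5, 5])
print(best_x, best_f)
```

Storing `pbest_val` next to `pbest` avoids re-evaluating the fitness of remembered positions on every iteration.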
Animation of PSO
• Function:
$$ F(x_1, x_2) = \left( \sum_{k=1}^{5} k \cos\big((k+1) x_1 + k\big) \right) \times \left( \sum_{k=1}^{5} k \cos\big((k+1) x_2 + k\big) \right) + (x_1 + 1.42513)^2 + (x_2 + 0.80032)^2 $$
– Minimization problem
– Two decision variables
– No decision variable bounds
Modification in PSO for Solving SOPs
1. Parameter Settings
2. Modifications of PSO Equation
3. Neighborhood Topology
4. Mutation/Perturbation Operators
5. Multiple-Swarm Concept in PSO
1. Parameter Settings
[Figure: particle in the (x1, x2) search space when the inertial weight w is large, showing x_i(t), v_i(t), pbest_i, gbest and the new position x_i(t+1)]
[Figure: new position x_i(t+1) when c1 >> c2]
[Figure: new position x_i(t+1) when c2 >> c1]
• Random inertia weight
– Experiments indicate that this strategy accelerates the convergence of the particle swarm in the early stage of the algorithm
$$ w = 0.5 + \frac{rand()}{2} $$
– rand() is a uniformly distributed random number within [0,1]
*Y. H. Shi, R. C. Eberhart, “Empirical Study of Particle Swarm Optimization”, Proceeding Congress on Evolutionary Computation, Piscataway, pp.1945-1949, 1999
• Linearly decreasing inertia weight
$$ w = (w_1 - w_2) \times \frac{t_{max} - t}{t_{max}} + w_2 $$
– w1 and w2 are the initial and final values of the inertia weight; t_max is the maximum number of iterations
– A larger value of w facilitates global search at the beginning of the run
– A smaller w encourages more local search ability near the end of the run
– Experiments indicate good performance when the inertia weight descends from 0.9 to 0.4
*R. C. Eberhart, Y. Shi, “Comparing inertia weight and constriction factors in particle swarm optimization”, Proceeding Congress on Evolutionary Computation, San Diego, pp. 84-88, 2000
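Both inertia weight schedules are one-liners. The sketch below uses the 0.9 to 0.4 range quoted on the slide as defaults; the function names are illustrative.

```python
import random

def random_inertia(rng=random):
    """Random inertia weight: w = 0.5 + rand()/2, so w lies in [0.5, 1.0)."""
    return 0.5 + rng.random() / 2.0

def linear_inertia(t, t_max, w1=0.9, w2=0.4):
    """Linearly decreasing inertia weight from w1 (t = 0) to w2 (t = t_max)."""
    return (w1 - w2) * (t_max - t) / t_max + w2

for t in (0, 50, 100):
    print(t, round(linear_inertia(t, 100), 3))
```

In a PSO loop, `linear_inertia(t, t_max)` would simply replace the constant `w` in the velocity update at iteration `t`.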
• Chaotic inertia weight
– Use chaotic mapping to set the inertia weight coefficient
– Logistic mapping
$$ z(t+1) = \mu \times z(t) \times (1 - z(t)) $$
• Distribution of the logistic map when µ = 4
• The logistic mapping is iterated 30,000 times
• The counts in both end intervals are very high
• The mean count per interval within [0.1, 0.9] is 200
– Strategy of chaotic inertia weight
1. Select a random number z in the interval (0, 1)
2. Calculate the logistic mapping of z with µ = 4
3. Apply z to either the linearly decreasing inertia weight or the random inertia weight:
$$ w = (w_1 - w_2) \times \frac{t_{max} - t}{t_{max}} + w_2 \times z \quad \text{or} \quad w = 0.5 \times z + \frac{rand()}{2} $$
• Experiment: good convergence precision, quick convergence velocity, and better global search ability
Yong Feng, Gui-Fa Teng, Ai-Xin Wang, and Yong-Mei Yao, “Chaotic inertia weight in particle swarm optimization,” Proceeding 2nd International Conference on Innovative Computing, Information and Control, Kumamoto, Japan, pp. 475-475, 2007
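The three-step strategy can be sketched as below. The starting value `z = 0.3`, the defaults `w1 = 0.9` and `w2 = 0.4`, and the function names are assumptions for illustration.

```python
import random

def logistic_map(z, mu=4.0):
    """One iteration of the logistic mapping z <- mu * z * (1 - z)."""
    return mu * z * (1.0 - z)

def chaotic_linear_inertia(t, t_max, z, w1=0.9, w2=0.4):
    """Linearly decreasing inertia weight with the final value scaled by z."""
    return (w1 - w2) * (t_max - t) / t_max + w2 * z

def chaotic_random_inertia(z, rng=random):
    """Random inertia weight with the constant 0.5 scaled by z."""
    return 0.5 * z + rng.random() / 2.0

z = 0.3                      # step 1: a number in (0, 1)
for t in range(3):
    z = logistic_map(z)      # step 2: logistic mapping with mu = 4
    w = chaotic_linear_inertia(t, 100, z)   # step 3
    print(t, round(z, 4), round(w, 4))
```

Starting values such as 0, 0.25, 0.5, 0.75 and 1 should be avoided, since with µ = 4 they map onto fixed points of the logistic map.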
• Time-varying acceleration coefficients (c1, c2)
– Large c1 and small c2 in the early stage, to encourage particles to explore the search space
– Larger c2 and smaller c1 in the later stage, to promote quick convergence to the optimum solution
$$ c_1 = (c_{1f} - c_{1i}) \frac{t}{t_{max}} + c_{1i} $$
$$ c_2 = (c_{2f} - c_{2i}) \frac{t}{t_{max}} + c_{2i} $$
where the subscripts i and f denote the initial and final values of each coefficient
Ratnaweera A., Halgamuge S. K., and Watson H. C., “Self-Organizing Hierarchical Particle Swarm Optimizer,” IEEE Transactions on Evolutionary Computation, Vol. 8, No. 3, June 2004
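A short sketch of the two schedules follows. The default end points (c1 from 2.5 down to 0.5, c2 from 0.5 up to 2.5) are illustrative assumptions and are not given on the slide; the function name `tvac` is hypothetical.

```python
def tvac(t, t_max, c1_i=2.5, c1_f=0.5, c2_i=0.5, c2_f=2.5):
    """Time-varying acceleration coefficients at iteration t."""
    c1 = (c1_f - c1_i) * t / t_max + c1_i   # cognitive weight shrinks over time
    c2 = (c2_f - c2_i) * t / t_max + c2_i   # social weight grows over time
    return c1, c2

for t in (0, 50, 100):
    print(t, tvac(t, 100))
```

Early iterations therefore favor each particle's own memory (exploration), while late iterations favor the swarm's best (convergence).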
2. Modifications of PSO Equation
• Canonical PSO by Maurice Clerc
– Studied the swarm behavior using second-order differential equations
– The study shows that it is possible to determine the conditions under which the swarm will converge
– Introduces a constriction factor, χ, which ensures convergence of the particles by restraining the velocity
– Observation: the amplitude of the particle’s oscillations increases or decreases depending on the distance between pbest and gbest
M. Clerc, and J. Kennedy, “The Particle Swarm - Explosion, Stability, and Convergence in a Multidimensional Complex Space,” IEEE Transactions on Evolutionary Computation, Vol. 6, No. 1, February, (2002): 58-73
– The particle will oscillate around the weighted mean of pbest and gbest
– If pbest and gbest are near each other, the particle will perform local search
– If pbest and gbest are far apart from each other, the particle will perform global search
– During the search process, the particle will shift between local search and global search depending on pbest and gbest
– The constriction factor balances the need for local and global search depending on the social conditions in place
• The velocity update equation is
$$ v_{i,j}(t+1) = \chi \left[ v_{i,j}(t) + c_1 r_{1,j} \left( pbest_{i,j} - x_{i,j}(t) \right) + c_2 r_{2,j} \left( gbest_j - x_{i,j}(t) \right) \right] $$
where
$$ \chi = \frac{2\kappa}{\left| 2 - \varphi - \sqrt{\varphi(\varphi - 4)} \right|}; \quad \varphi = c_1 + c_2, \ \varphi > 4; \quad \kappa \in [0, 1] $$
• The parameter κ controls the convergence speed to the point of attraction.
• If κ is close to zero, χ will be close to zero, and the resulting velocity will be small. A small velocity encourages local search, so the convergence speed is high
• If κ is close to one, the swarm shows high exploration behavior but the slowest possible convergence speed
• Experiment: even without the velocity clipping criterion, the constriction factor can prevent the particles from leaving the search space and ensure convergence
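The constriction factor is easy to evaluate numerically. In the sketch below the defaults `c1 = c2 = 2.05` and `kappa = 1` are a commonly used setting chosen for illustration, not values stated on the slide; with them χ comes out at roughly 0.73.

```python
import math

def constriction_factor(c1=2.05, c2=2.05, kappa=1.0):
    """chi = 2*kappa / |2 - phi - sqrt(phi*(phi - 4))| with phi = c1 + c2 > 4."""
    phi = c1 + c2
    if phi <= 4.0:
        raise ValueError("phi = c1 + c2 must be greater than 4")
    return 2.0 * kappa / abs(2.0 - phi - math.sqrt(phi * (phi - 4.0)))

print(round(constriction_factor(), 4))
print(round(constriction_factor(kappa=0.5), 4))
```

Since χ scales linearly with κ, halving κ halves every velocity component, which is the "small velocity, local search" regime described above.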
• Gaussian Particle Swarm Model (GPSO)
– Observation shows that the expected values of the coefficients on (pbest − x) and (gbest − x) are 0.729 and 0.85
– Use a probability distribution that generates random values with expected values within [0.729, 0.85]
$$ v_{i,j}(t+1) = |randn| \left( pbest_{i,j} - x_{i,j}(t) \right) + |randn| \left( gbest_j - x_{i,j}(t) \right) $$
$$ x_{i,j}(t+1) = x_{i,j}(t) + v_{i,j}(t+1) $$
– The two |randn| terms are positive random numbers generated according to abs[N(0,1)]
R. A. Krohling, “Gaussian swarm: a novel particle swarm optimization algorithm,” Proceedings of the IEEE Conference on Cybernetics and Intelligent Systems, Singapore, pp. 372-376, 2004.
3. Neighborhood Topology
• From the standard PSO equation, the movement of particles is influenced by both the personal best (pbest) and the global best (gbest)
• Neighborhood topology: the topology of a swarm (the neighborhood best usually replaces gbest)
• Neighborhood topology: how the particles in the swarm are connected with each other in terms of sharing their knowledge
• The convergence rate can be estimated by calculating the average distance between two particles in the neighborhood topology
• A shorter average distance facilitates quicker convergence; a shorter average distance means a higher degree of connectivity
• Global Topology (STAR)
– Happens in every iteration
– All the particles are connected such that the knowledge is shared by all particles
– The best solution found by any particle in the swarm influences all the particles at the next iteration
[Figure: global (star) topology connecting particles a to g]
• Wheel Topology
– When a particle (example: b) finds the global best, particle a will immediately be drawn to it (at the next iteration)
– Only when particle a moves to that location does it influence the rest of the particles
– One or more iterations are required before all the particles in the neighborhood are influenced by the global best
[Figure: wheel topology with particle a connected to particles b to g; the rest of the particles will be drawn to particle a]
• Ring Topology (Circle or lbest)
– Each particle is connected with its K immediate neighbors
– When one particle (example: b) finds the global best, only its immediate neighbors (i.e., a & c) will be drawn to b
– Other particles are not influenced by b until their immediate neighbors have moved towards that location
– A few iterations may be required before all the particles in the neighborhood are influenced by the global best
[Figure: ring topology connecting particles a to g]
• Each neighborhood topology has strengths and weaknesses
• STAR: fast information flow; converges fast, but with the potential for premature convergence
• Wheel: moderate information flow; converges more quickly, but sometimes to a suboptimal point in the space
• Circle: slow information flow; converges more slowly and favors more exploration; might have more chances to find better solutions, slowly
• There are many more neighborhood topologies, and the choice of topology depends on the problem (i.e., it is problem dependent)
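For the ring (lbest) topology, the neighborhood best that replaces gbest can be found as sketched below. The sketch assumes minimization, K = 2 immediate neighbors (one on each side) and that a particle counts as part of its own neighborhood; the function names and fitness values are illustrative.

```python
def ring_neighbors(i, n, k=2):
    """Indices of particle i and its k immediate ring neighbors (k/2 per side)."""
    half = k // 2
    return [(i + off) % n for off in range(-half, half + 1)]

def local_best(pbest_val, k=2):
    """For each particle, the index of the best pbest in its ring neighborhood."""
    n = len(pbest_val)
    return [min(ring_neighbors(i, n, k), key=lambda j: pbest_val[j])
            for i in range(n)]

# Hypothetical pbest fitness values for particles a..g (lower is better).
vals = [5.0, 1.0, 4.0, 3.0, 6.0, 2.0, 7.0]
print(local_best(vals))  # → [1, 1, 1, 3, 5, 5, 5]
```

In the example, particle b (index 1) holds the best value, yet only a and c see it immediately; d, e, f and g follow other local leaders until the information propagates around the ring.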
4. Mutation/Perturbation Operators
• Problem of PSO: lack of diversity, so it is easily trapped in a local optimum.
• A mutated particle may be able to lead the other particles away from their current position if this particle becomes the global best.
• Hence, apply mutation operators to PSO: a strategy to enhance the exploration of the particles and to escape local minima.
• Where to apply the mutation operators in PSO?
– Apply to the updated particle’s position (decision variable)
– Apply to the user-defined maximum velocity threshold
• The mutation operators are the mutation approaches used for GA or MOEA. Examples:
– Random mutation; non-uniform mutation; normally distributed mutation
– Example of Gaussian mutation used for PSO*:
$$ mutation(x) = x \times (1 - Gaussian(\sigma)) $$
where σ is set to 0.1 times the length of the search space in one dimension; OR σ can be set to 1.0 and linearly decreased to 0 as the iteration count reaches its maximum
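The Gaussian mutation can be sketched as follows, using the first option for σ (0.1 times the search-space length in each dimension). The per-dimension mutation probability `rate` and the clamping of mutated values back into the bounds are assumptions added for illustration; the function name is hypothetical.

```python
import random

def gaussian_mutation(x, lower, upper, rate=0.1, sigma_scale=0.1, rng=random):
    """Apply mutation(x) = x * (1 - Gaussian(sigma)) to randomly chosen dimensions."""
    mutated = []
    for xj, lo, hi in zip(x, lower, upper):
        if rng.random() < rate:                  # assumed mutation probability
            sigma = sigma_scale * (hi - lo)      # 0.1 x search-space length
            xj = xj * (1.0 - rng.gauss(0.0, sigma))
            xj = min(max(xj, lo), hi)            # assumed: keep within bounds
        mutated.append(xj)
    return mutated

position = [1.0, -2.0, 3.0]
print(gaussian_mutation(position, [-5, -5, -5], [5, 5, 5], rate=1.0))
```

Because the perturbation is multiplicative, a coordinate that is exactly zero is left unchanged by this operator.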
5. Multiple-Swarm Concept in PSO
• Improve performance by promoting exploration and diversity
– Counters PSO's tendency toward premature convergence
• Three main groups
– Improve the performance of PSO by promoting diversity
– Solve multimodal problems
– Locate and track the optima of a multimodal problem in a dynamic environment
• Kennedy proposed using a k-means clustering algorithm to identify the centers of different clusters of particles in the population, and then using these cluster centers to substitute for the personal bests
– Requires a pre-specified number of iterations to determine the cluster centers and a pre-specified number of clusters
– Not suitable for multimodal problems, since the cluster centers are not necessarily the best-fit particles in their clusters
[Figure: cluster A’s center performs better than all members of cluster A, whereas cluster B’s center performs better than some members and worse than others*]
*J. Kennedy, “Stereotyping: improving particle swarm performance with cluster analysis,” Proceedings of Congress on Evolutionary Computation, San Diego, CA. pp. 1507-1512, 2000
Chen and Yu’s TPSO
– First subswarm will optimize following the global best;
second subswarm will move in the opposite direction pp
– Particle’s pbest is updated based on its local best, their corresponding subswarm’s best, and the global best collected from two subswams
– If the global best has not improved for 15 successive
iterations, the worst particles of a subswarm are replaced by the best ones of the other subswarm. Then, the subswarms switch their flight directions
switch their flight directions
G. Chen and J. Yu, “Two sub-swarms particle swarm optimization algorithm,” Proceeding of International Conference on Natural Computation, Changsha, China, pp. 515-524, 2005
• Multi-population cooperative optimization (MCPSO)
– Based on the concept of master-slave mode
– The swarm population has a master swarm and multiple slave swarms
– Slave swarms explore the search space independently to maintain the diversity of particles
– The master swarm updates via the best particles collected from the slave swarms
B. Niu, Y. Zhu, and X. He, “Multi-population cooperative particle swarm optimization,” Proceeding of European Conference on Artificial Life, Canterbury, UK, pp. 874-883, 2005
•
Speciation-based PSO (SPSO), proposed by Parrott and Li
– Notion of species: a group of individuals sharing common attributes according to some similarity metric
– A radius, rs, is measured in Euclidean distance from the center of a species to its boundary
– The center of a species, the species seed, is always the fittest individual in the species
– All particles that fall within the distance rs from the species seed are classified as the same species
D. Parrott and X. Li, “Locating and tracking multiple dynamic optima by a particle swarm model using speciation,” IEEE Transactions on Evolutionary Computation, Vol. 10, No. 4, pp. 440-458, 2006
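The species definition on this slide suggests a simple greedy procedure, sketched below: visit particles from fittest to least fit, and let each one either join the first seed within radius rs or become a new seed. The function name `find_species`, the use of `<=` at the boundary, and the convention that higher fitness is better are assumptions made for this example.

```python
import math


def find_species(positions, fitness, r_s):
    """Return (seeds, membership).

    seeds      : indices of species seeds, fittest first
    membership : for each particle, the index of the seed of its species
    Assumes maximization (larger fitness means fitter).
    """
    order = sorted(range(len(positions)), key=lambda i: fitness[i], reverse=True)
    seeds = []
    membership = [None] * len(positions)
    for i in order:
        for s in seeds:
            # Euclidean distance to an existing, fitter seed.
            if math.dist(positions[i], positions[s]) <= r_s:
                membership[i] = s
                break
        else:
            # No seed is close enough: this particle founds a new species.
            seeds.append(i)
            membership[i] = i
    return seeds, membership


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.2, 5.1), (0.1, 0.2)]
    fit = [1.0, 3.0, 2.0, 0.5, 0.2]
    print(find_species(pts, fit, r_s=1.0))
```

Because particles are visited in order of decreasing fitness, every seed is automatically the fittest member of its species, which matches the definition of the species seed.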
PSO Designs Inspired by Other Fields
•
Emotion Particle Swarm Optimization
– Each particle has two emotions (joy and sadness)
– The emotional state of a particle is determined by comparing its emotional factor with a random value
– If a certain condition is met, the particle is updated using the “joyful” velocity equation; otherwise, the “sad” velocity equation is applied
– A psychological model is incorporated in both the “joyful” and “sad” velocity equations
B. Niu, Y.L. Zhu, K.Y. Hu, S.F. Li, X.X. He, “A novel particle swarm optimizer using optimal foraging theory,” Proceedings of Computational Intelligence and Bioinformatics, Part 3, Kunming, China , Vol. 4115, pp. 61-71, 2006
ANT COLONY OPTIMIZATION
Biological Fact
•
Ant Colony Optimization (ACO) studies artificial systems that take inspiration from the behavior of real ant colonies and which are used to solve discrete optimization problems.
•
Real ants are capable of finding the shortest path from a food source to their nest without using visual cues by exploiting pheromone information.
•
While walking, ants deposit pheromone on the ground and follow, with some probability, pheromone previously deposited by other ants.
•
Also, they are capable of adapting to changes in the environment, for example finding a new shortest path once the old one is no longer feasible due to a new obstacle (dynamic environments).
•
Consider the following figure in which ants are moving on a straight line which connects a food source to the nest:
•
It is well known that the primary means used by ants to form and maintain the line is a pheromone trail. Ants deposit a certain amount of pheromone while walking, and each ant probabilistically prefers to follow a direction rich in pheromone rather than a poorer one. This elementary behavior of real ants can be used to explain how they can find the shortest path which reconnects a broken line after the sudden appearance of an unexpected obstacle has interrupted the initial path:
•
In fact, once the obstacle has appeared, those ants which are just in front of the obstacle cannot continue to follow the pheromone trail and therefore they have to choose between turning right or left. In this situation we can expect half the ants to choose to turn right and the other half to turn left. The very same situation can be found on the other side of the obstacle.
•
It is interesting to note that those ants which choose, by chance, the shorter path around the obstacle will more rapidly reconstitute the interrupted pheromone trail compared to those which choose the longer path. Hence, the shorter path will receive a higher amount of pheromone per unit time, and this will in turn cause a higher number of ants to choose the shorter path. Due to this positive feedback (autocatalytic) process, very soon all the ants will choose the shorter path.
•
The most interesting aspect of this autocatalytic process is that finding the shortest path around the obstacle seems to be an emergent property of the interaction between the obstacle shape and the ants’ distributed behavior: although all ants move at approximately the same speed and deposit a pheromone trail at approximately the same rate, it takes longer to contour obstacles on their longer side than on their shorter side, which makes the pheromone trail accumulate more quickly on the shorter side. It is the ants’ preference for higher pheromone trail levels which makes this accumulation still quicker on the shorter path.
• [R1] Beckers R., Deneubourg J.L. and S. Goss (1992). Trails and U-turns in the selection of the shortest path by the ant Lasius niger. Journal of theoretical biology, 159, 397-415.
• [R2] Hölldobler B. and E.O. Wilson (1990). The ants. Springer-Verlag, Berlin.
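The positive-feedback argument in this section can be illustrated with a small deterministic model of two branches around an obstacle. This is a sketch under simplifying assumptions, not a model taken from the cited references: the share of ants choosing a branch is taken to be proportional to its pheromone level, pheromone laid per unit time is taken to scale with (share of ants) / (branch length), and there is no evaporation. The function name `double_bridge` and all parameter values are hypothetical.

```python
def double_bridge(len_short=1.0, len_long=2.0, steps=200, tau0=1.0):
    """Return the share of ants choosing the short branch at each step."""
    tau_short = tau_long = tau0  # both branches start with equal pheromone
    history = []
    for _ in range(steps):
        # Ants prefer the branch with more pheromone (linear choice rule).
        p_short = tau_short / (tau_short + tau_long)
        history.append(p_short)
        # Ants on the shorter branch complete trips faster, so that branch
        # receives more pheromone per unit time for the same share of ants.
        tau_short += p_short / len_short
        tau_long += (1.0 - p_short) / len_long
    return history


if __name__ == "__main__":
    shares = double_bridge()
    print(shares[0], shares[-1])
```

The share starts at one half, as in the text, and then drifts steadily toward the shorter branch. With this linear choice rule the drift is gradual; the text's "very soon" would correspond to a stronger preference for pheromone-rich branches than is modeled here.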
History
•
The first ACO system was introduced by Marco Dorigo in his Ph.D. thesis (1992), and was called Ant System (AS).
•
AS is the result of research on computational intelligence approaches to combinatorial optimization that Dorigo conducted at Politecnico di Milano in collaboration with Alberto Colorni and Vittorio Maniezzo. AS was initially applied to the traveling salesman problem and to the quadratic assignment problem.
•
Since 1995 Dorigo, Gambardella and Stützle have been working on various extended versions of the AS paradigm. Dorigo and Gambardella have proposed Ant Colony System (ACS), while Stützle and Hoos have proposed MAX-MIN Ant System (MMAS).
•
They have both been applied to the symmetric and asymmetric traveling salesman problem, with excellent results. Dorigo, Gambardella and Stützle have also proposed new hybrid versions of ant colony optimization with local search. In problems like the quadratic assignment problem and the sequential ordering problem these ACO algorithms outperform all known algorithms on vast classes of benchmark problems.