(1)

MOEA Neural Network Design

(2)

Design Specification

• Essential requirements for neural network design

o A training algorithm that can search for the optimal parameters (i.e., weights or biases) for the specified network structure and training task.

o A rule or algorithm that can determine the network complexity and ensure it is sufficient for the given training problem.

o A metric or measure to evaluate the reliability and generalization of the produced neural network.

(3)

Design Dilemma

• A single optimal neural network is very difficult to find because:

o Weights are tuned by training sets, which are finite, so it is difficult to extract $f_{NS}^*(x, \omega^*)$ from a finite training data set.

o There is a trade-off between NN learning capability and the number of hidden neurons:

A network with insufficient neurons: low training performance

A network with an excessive number of neurons: difficult to generalize

• Instead of trying to obtain a single optimal neural network, finding a set of near-optimal networks with different network structures is more feasible.

• NN design is a multiobjective optimization problem (achieve better performance while simplifying the network structure).

(4)

Design Principle

• A population-based, parallel-searching multiobjective genetic algorithm is suitable for neural network design: it can find a set of non-dominated neural network solutions

o Find a uniformly distributed Pareto front in one run

o Treat discontinuous, concave and other shapes of Pareto front equally

[Figure: Pareto front in the objective plane; f1 = network complexity (neuron number), f2 = network training error]

(5)

Hierarchical Genotype

• Using GA to evolve an RBF neural network

o Evolving the NN topology along with its parameters

o Hierarchical gene structure in genotype design

Phenotype (RBF network with weights $\omega_i$ and centers $c_i$):

$$f(x) = \sum_{i=1}^{m} \omega_i \exp\left(-\|x - c_i\|^2\right)$$

Genotype: control genes (e.g., 1 0 0 1 0 1), weight genes, center genes
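A minimal sketch of how such a hierarchical genotype might be decoded into an RBF network. It assumes that a control gene of 1 switches the corresponding hidden neuron on and 0 switches it off; the function names and the example gene values are hypothetical and are not taken from the slides.

```python
import math

def decode_rbf(control_genes, weight_genes, center_genes):
    """Keep only the hidden neurons whose control gene is 1 (assumed meaning)."""
    return [(w, c)
            for g, w, c in zip(control_genes, weight_genes, center_genes)
            if g == 1]

def rbf_output(x, neurons):
    """f(x) = sum_i w_i * exp(-||x - c_i||^2) over the active neurons."""
    total = 0.0
    for w, c in neurons:
        sq_dist = sum((xj - cj) ** 2 for xj, cj in zip(x, c))
        total += w * math.exp(-sq_dist)
    return total

# Illustrative genotype: six candidate neurons, three of them active.
control = [1, 0, 0, 1, 0, 1]
weights = [0.5, 9.0, 9.0, -1.0, 9.0, 2.0]
centers = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
neurons = decode_rbf(control, weights, centers)
```

With this encoding, flipping a control bit changes the network size (objective f1) without discarding the weight and center genes behind it.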

(6)

Domination Ranking

• Goal: to convert the multiple fitness values of a multiobjective problem into a single rank value.

• The rank value represents the dominance relationship, obtained by dividing the resulting population into layers.

[Figure: population layers in the (f1, f2) plane]

(7)

Diversity Preservation

• Goal: to maintain population diversity during the evolutionary process in order to obtain a uniformly distributed Pareto front

[Figures: two population distributions in the (f1, f2) plane]

(8)

Automatic Accumulated Ranking

• Combine the hierarchical genotype representation and the Rank-Density based Genetic Algorithm (RDGA). The Automatic Accumulated Ranking Strategy includes both dominance and diversity information.

• All nondominated individuals have rank = 1. For a dominated individual y at generation t, its rank value is given by

$$rank(y, t) = 1 + \sum_{j=1}^{p^{(t)}} rank(y_j, t)$$

where $p^{(t)}$ is the number of individuals that dominate y, and $y_1, \dots, y_{p^{(t)}}$ are those individuals.

[Figure: example population in the (f1, f2) plane labeled with rank values such as 1, 2, 3, 6, 8 and 12]
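The accumulated ranking rule can be sketched as follows, assuming both objectives are minimized and that the sum runs over every individual that dominates y. The helper names and the sample points are illustrative only.

```python
def dominates(a, b):
    """a dominates b (minimization): no worse everywhere, strictly better somewhere."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def accumulated_ranks(objs):
    """rank(y) = 1 + sum of the ranks of all individuals that dominate y."""
    n = len(objs)
    dominators = [[j for j in range(n) if dominates(objs[j], objs[i])]
                  for i in range(n)]
    ranks = [0] * n
    # A dominator always has strictly fewer dominators than the point it
    # dominates, so this order guarantees its rank is already computed.
    for i in sorted(range(n), key=lambda i: len(dominators[i])):
        ranks[i] = 1 + sum(ranks[j] for j in dominators[i])
    return ranks

points = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
ranks = accumulated_ranks(points)   # [1, 1, 1, 2, 6]
```

Nondominated points get rank 1 because the sum is empty, which matches the first half of the rule.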

(9)

Adaptive Cell Density Estimation

• In HRDGA, an adaptive grid density estimation approach is proposed. The length of an adaptive grid cell in the i-th dimension of objective space is computed as

$$d_i = \frac{\max_{x \in X} f_i(x) - \min_{x \in X} f_i(x)}{K_i}, \quad i = 1, \dots, n$$

where $K_i$ denotes the number of cells along the i-th objective.

• The density value of each individual is the number of individuals located in the same cell.

[Figure: grid cells in the (f1, f2) plane; a cell containing four individuals has Density = 4]
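A small sketch of the adaptive cell density estimate. It assumes the grid is rebuilt from the current population's objective ranges and that a point lying exactly on the upper boundary is placed in the last cell; `cells_per_dim` plays the role of K_i and is a hypothetical parameter name.

```python
from collections import Counter

def cell_densities(objs, cells_per_dim):
    """Density of each individual = number of individuals sharing its grid cell."""
    n_obj = len(objs[0])
    lows = [min(p[i] for p in objs) for i in range(n_obj)]
    highs = [max(p[i] for p in objs) for i in range(n_obj)]
    # d_i = (max f_i - min f_i) / K_i
    widths = [(highs[i] - lows[i]) / cells_per_dim[i] for i in range(n_obj)]

    def cell_of(p):
        idx = []
        for i in range(n_obj):
            k = int((p[i] - lows[i]) / widths[i]) if widths[i] > 0 else 0
            idx.append(min(k, cells_per_dim[i] - 1))  # clamp the maximum point
        return tuple(idx)

    cells = [cell_of(p) for p in objs]
    counts = Counter(cells)
    return [counts[c] for c in cells]

points = [(0.0, 10.0), (0.5, 9.5), (1.0, 9.0), (10.0, 0.0)]
densities = cell_densities(points, cells_per_dim=[2, 2])   # [3, 3, 3, 1]
```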

(10)

Fitness Assignment

• In HRDGA, the original population is divided into sub-populations that optimize the rank and density values independently, so the rank and density values can be optimized simultaneously!

[Figure: the population at the i-th generation is split by selection into Sub-pop 1 (to minimize population rank value) and Sub-pop 2 (to minimize population density value); selection, crossover & mutation produce the population at the (i+1)-th generation]

(11)

Elitism

• An elitist archive is used, and an adaptive probability of sampling an individual from the archive is applied.

[Figure: main population and elitists' archive]

(12)

Local Search

• Diffusion scheme from Cellular GA

[Figure: diffusion scheme; selected parent, best individual, offspring]

(13)

Forbidden Region

• To prevent the appearance of offspring with a high rank value and low density value

[Figure: selected parent, offspring, and the forbidden region]

(14)

Mackey-Glass Time Series Prediction

• Algorithms for comparison

o K-Nearest Neighbor (KNN, Kaylani & Dasgupta, 2000)

o Generalized Regression Neural Network (GRNN, Wasserman, 1993)

o Orthogonal Least Square (OLS, Chen et al., 1991)

• Testing problem: Mackey-Glass chaotic time series

$$\frac{dx(t)}{dt} = \frac{a \times x(t-\tau)}{1 + x^{c}(t-\tau)} - b \times x(t)$$

o To predict x(t+6) based on x(t), x(t-6), x(t-12) and x(t-18)

o a = 0.2, b = 0.1, c = 10, τ = 150
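A rough sketch of how the series and the prediction samples could be generated. The slides do not say how the delay differential equation was integrated, so the forward Euler step `dt = 1`, the constant initial history `x0 = 1.2`, and the function names are all assumptions made for illustration.

```python
def mackey_glass(n_steps, a=0.2, b=0.1, c=10, tau=150, dt=1.0, x0=1.2):
    """Forward-Euler integration of the Mackey-Glass equation (assumed scheme)."""
    delay = int(round(tau / dt))
    x = [x0] * (delay + 1)              # constant history for t <= 0 (assumption)
    for _ in range(n_steps):
        x_tau = x[-delay - 1]           # x(t - tau)
        dx = a * x_tau / (1.0 + x_tau ** c) - b * x[-1]
        x.append(x[-1] + dt * dx)
    return x[delay + 1:]                # drop the artificial history

def make_samples(series):
    """Inputs x(t), x(t-6), x(t-12), x(t-18); target x(t+6)."""
    samples = []
    for t in range(18, len(series) - 6):
        inputs = [series[t], series[t - 6], series[t - 12], series[t - 18]]
        samples.append((inputs, series[t + 6]))
    return samples

series = mackey_glass(300)
samples = make_samples(series)
```

Each sample pairs four lagged values with the value six steps ahead, which is the input/target layout described on the slide.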

(15)

Parameter Setting

• For HRDGA, the length of all three layers of genes is chosen to be 150, and the population size is 400.

• For the KNN and GRNN methods, 70 individual networks with 11 to 80 neurons are used for comparison.

• For the OLS method, 40 different values of the tolerance parameter ρ are chosen, from 0.01 to 0.4 with step 0.01. ρ determines the trade-off between the performance and complexity of a network.

• Stopping criteria: either the number of epochs (generations) exceeds 5,000, or the difference in training Sum Square Error (SSE) between two sequential epochs (generations) is smaller than 0.01.

• Training data set: data from the first 250 seconds.

• Testing data sets: data from 250 – 499, 500 – 749, 750 – 999 and 1,000 – 1,249 seconds

(16)

Simulation Study- Training Set

[Figures: training SSE vs. neuron numbers for training set; non-dominated front for training set]

(17)

Simulation Study- Testing Set #1

[Figures: training SSE vs. neuron numbers for testing set #1; non-dominated front for testing set #1]

(18)

Simulation Study- Testing Set #2

[Figures: training SSE vs. neuron numbers for testing set #2; non-dominated front for testing set #2]

(19)

Simulation Study- Testing Set #3

[Figures: training SSE vs. neuron numbers for testing set #3; non-dominated front for testing set #3]

(20)

Simulation Study- Testing Set #4

[Figures: training SSE vs. neuron numbers for testing set #4; non-dominated front for testing set #4]

(21)

Performance Comparison

Best performance on each data set (SSE and the corresponding neuron number)

Method   Training set      Testing set #1    Testing set #2    Testing set #3    Testing set #4
         SSE      Neurons  SSE      Neurons  SSE      Neurons  SSE      Neurons  SSE      Neurons
KNN      2.8339   69       3.3693   42       3.4520   42       4.8586   48       4.8074   19
GRNN     2.3382   68       2.7720   38       3.0711   43       2.9644   40       3.2348   37
OLS      2.3329   60       2.4601   46       2.5856   50       2.5369   37       2.7199   54
HRDGA    2.2901   74       2.4633   47       2.5534   52       2.5226   48       2.7216   58

(22)

Problem for OLS

• The trade-off characteristic between network performance and complexity totally depends on the value of the tolerance parameter ρ.

• The same ρ value means completely different trade-off features for different NN design problems, so the designer cannot control the network complexity.

• The relationship between the ρ value and network topology is a nonlinear, many-to-one mapping.

(23)

Observations

• A new genetic algorithm assisted neural network design approach, the Hierarchical Rank-Density Genetic Algorithm (HRDGA), is proposed.

• Characteristics of HRDGA:

o Hierarchical genotype representation: evolve topology with parameters

o Multiobjective optimization: find a set of near-optimal network candidates by one run

o Elitism, local search and forbidden region techniques help HRDGA in finding near-optimal networks with low training errors.

o The network complexity is near completely and uniformly sampled due to HRDGA’s diversity maintaining scheme.

(24)

Particle Swarm Optimization

(25)

Definition

• Swarm Intelligence (SI) is the property of a system whereby the collective behaviors of (unsophisticated) agents interacting locally with their environment cause coherent functional global patterns to emerge.

• SI provides a basis with which it is possible to explore collective (or distributed) problem solving without centralized control or the provision of a global model.

(26)

In Simple Language…

• SI systems are typically made up of a population of simple agents interacting locally with one another and with their environment to accomplish a simple goal.

• Although there is normally no centralized control structure dictating how individual agents should behave, local interactions between such agents often lead to the emergence of global behavior (swarming behavior).

(27)

SI in Nature

[Figures: fish schooling; bird flocking in V formation (©CORO, Caltech)]

The benefit of forming a swarm of thousands of animals (agents) is increased foraging efficiency and better defense against predators.

(28)

• Another example: nest building in termites

• An eighteen-foot-tall termite nest at the University of Ghana, Legon.

(29)

How Termites Build their Nest

1. Each termite scoops up a mudball from its environment and invests the ball with pheromones.

2. Then it randomly drops the mudball on the ground or on elevated positions to form small heaps.

3. Termites are attracted to their nestmates' pheromones. Hence, they are more likely to drop their own mudballs near or on top of their neighbors'.

4. The process stops when a heap reaches a specific height.

5. Termites look for heap clusters, choose the heap clusters, and connect them with each other by building walls.

6. Over time this leads to the construction of pillars, arches, tunnels and chambers.

(30)

Fundamental Concepts of SI

• Self-organization process

– Each agent plays its role by interacting with its environment to gather the latest information, constantly making decisions based on some simple local rules and the information received, and interacting locally with other agents

• Division of labor

– Different groups of agents have their own specializations to carry out certain tasks

– They collaborate with other groups and perform their own tasks simultaneously

(31)

Applications of ‘Swarm’ Principle

• Robotics

– Swarm bots http://www.swarm-bots.org/

• Computer Animation

– http://gvu.cc.gatech.edu/animation/Areas/group_behavior/group.html

• Games, interactive graphics and virtual reality

• “Swarm-like” algorithms for solving optimization problems

– Particle Swarm Optimization (PSO)

– Ant Colony Optimization (ACO)

(32)

Early Particle Swarm Optimization

• Craig Reynolds’s flock motion (1986-1987)

– To model the flocking behavior of simple agents (boids)

– The resulting flock motion arises from the interaction between the behaviors of individual boids

– Each boid has its own coordinate system and applies a geometric flight model to support its flight movement

– The geometric flight model includes translation and the flight dynamic parameters of yaw, pitch, and banking (roll)

(33)

– Incorporates three steering behaviors (local rules), which are the underlying concept of flocking

– Each boid has its own local neighborhood (the limited perception range of birds or fish in nature)

[Figure: a boid's local neighborhood, defined by an angle and a distance]

(34)

The three steering behaviors describe the maneuverability of an individual boid:

Separation: steer to avoid crowding local flockmates (neighbors)

Alignment: steer towards the average heading of local flockmates (neighbors)

Cohesion: steer to move toward the average position of local flockmates (neighbors)

Reference: http://www.red3d.com/cwr/boids/index.html
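A toy 2-D sketch of the three steering vectors for one boid. It simplifies the neighborhood to a distance radius only (the perception angle is ignored) and weights separation by inverse squared distance; both choices, and the function name `steering`, are assumptions for illustration rather than Reynolds's actual flight model.

```python
import math

def steering(i, positions, velocities, radius=5.0):
    """Return (separation, alignment, cohesion) vectors for boid i."""
    px, py = positions[i]
    nbrs = [j for j in range(len(positions))
            if j != i and math.dist(positions[i], positions[j]) <= radius]
    if not nbrs:
        return (0.0, 0.0), (0.0, 0.0), (0.0, 0.0)
    n = len(nbrs)
    # Separation: push away from each neighbor, more strongly when it is close.
    sep_x = sep_y = 0.0
    for j in nbrs:
        d2 = max(math.dist(positions[i], positions[j]) ** 2, 1e-12)
        sep_x += (px - positions[j][0]) / d2
        sep_y += (py - positions[j][1]) / d2
    # Alignment: neighbors' mean velocity minus own velocity.
    ali = (sum(velocities[j][0] for j in nbrs) / n - velocities[i][0],
           sum(velocities[j][1] for j in nbrs) / n - velocities[i][1])
    # Cohesion: vector from own position to the neighbors' mean position.
    coh = (sum(positions[j][0] for j in nbrs) / n - px,
           sum(positions[j][1] for j in nbrs) / n - py)
    return (sep_x, sep_y), ali, coh

positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
velocities = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0)]
sep, ali, coh = steering(0, positions, velocities)
```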

(35)

Simulated Boid Flock

[Figure: simulated boid flock avoiding cylindrical obstacles (1986)]

Reference: http://www.red3d.com/cwr/boids/index.html

(36)

• Reynolds’s pioneering work became the stepping stone for the development of a computer graphics area known as behavioral animation

– Georgia Tech's physically-based models of group behaviors

– Stampede sequence (The Lion King)

– Orc army (The Lord of the Rings)

(37)

• Heppner’s artificial bird flock simulation

– Studied bird flocks from movies

– Collaborated with Grenander and Potter to develop a program that simulates an artificial bird flock

– Through observation, Heppner realized that chaos theory can be used to explain the emergent behavior in flocking

– Designed four simple rules to model an individual bird’s behavior

F. Heppner and U. Grenander, “A stochastic nonlinear model for coordinated bird flocks,” The Ubiquity of Chaos, ed. S. Krasner, AAAS Publications, Washington, DC, 1990

(38)

1. An attractive force allows the birds to attract each other, and a repulsive force keeps the birds from flying too close to each other

2. Each bird maintains the same velocity as its neighboring birds

3. Occasionally, the birds’ flight path can be altered by a random input (craziness)

4. All birds are attracted to a roost, and the attraction increases as the birds fly closer to the roost

(39)

• The whole concept is as follows:

– The birds begin to fly around with no particular destination

– Once a bird discovers the roost, it will move away from the flock and land on the roost

– Hence, it will pull its nearby neighbors towards the roost

– As these neighbors discover the roost, they will land on the roost and bring more birds with them

– This process goes on until the entire flock has landed on the roost

(40)

Particle Swarm Optimization (PSO)

• Inspired by the “roosting area” concept, James Kennedy (social psychologist) and Russell Eberhart (electrical engineer) revised the methodology proposed by Heppner and Grenander

• In nature, how do the birds know where to locate food (“roost”) when they are hundreds of feet in the air?

• They explored the area of social psychology, which relates to the social behavior of human beings

Reference: http://www.adaptiveview.com/articles/ipsop1.html

(41)

• Their conclusion: knowledge is shared within the flock

• They also include the “mind of social” viewpoint, in which

– Individuals want to be individualistic, i.e., to improve themselves.

– Individuals want to learn from the success of their neighbors (both locally and globally), primarily by learning from their “experiences”.

• Hence, they developed Particle Swarm Optimization (PSO)

(42)

About PSO

• PSO is a population-based optimization technique

• The population is called a swarm (swarm population)

• The potential solutions (particles) form a swarm “flying” around the search space to search for the best solution

• Particles = candidate solutions (decision variables)

• Particles’ flights are governed by historical information:

– Velocity (v)

– Own personal best position found so far (pbest)

– Global best position discovered so far by any particle in the swarm (gbest)

(43)

[Figure: a particle in the (x₁, x₂) plane at its current position x_i(t) (“I’m here”) with velocity v_i(t), its personal best position pbest_i, the global best position gbest, and its new position x_i(t+1)]

(44)

Standard PSO Equation

At each iteration t, for each particle i and each dimension j:

Update velocity:

$$v_{i,j}(t+1) = w \times v_{i,j}(t) + c_1 \times r_1 \times \left(pbest_{i,j} - x_{i,j}(t)\right) + c_2 \times r_2 \times \left(gbest_j - x_{i,j}(t)\right)$$

The three terms are the momentum component, the cognitive component, and the social component, respectively.

Update position:

$$x_{i,j}(t+1) = x_{i,j}(t) + v_{i,j}(t+1)$$
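The two update equations translate almost line for line into code. The sketch below updates a single particle; the function name and the default values for w, c1 and c2 are illustrative assumptions, and fresh random numbers r1 and r2 are drawn per dimension, which is one common reading of the equation.

```python
import random

def update_particle(x, v, pbest, gbest, w=0.7, c1=2.0, c2=2.0, rng=random):
    """One standard PSO step for one particle; returns (new_x, new_v)."""
    new_x, new_v = [], []
    for j in range(len(x)):
        r1, r2 = rng.random(), rng.random()
        vj = (w * v[j]                              # momentum component
              + c1 * r1 * (pbest[j] - x[j])         # cognitive component
              + c2 * r2 * (gbest[j] - x[j]))        # social component
        new_v.append(vj)
        new_x.append(x[j] + vj)                     # position update
    return new_x, new_v

# When the particle already sits on pbest and gbest, only momentum remains.
x, v = [1.0, 2.0], [0.5, -0.5]
new_x, new_v = update_particle(x, v, pbest=x, gbest=x, w=0.5)
```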

(45)

• r₁, r₂ – Random numbers within [0,1]

• c₁, c₂ – Acceleration constants

• Momentum component – the inertial weight, w, controls the impact of the previous velocity

• Cognitive component – Personal thinking of each particle; personal desire to exceed its current achievement

• Social component – Social knowledge attained via the collaborative effort of all the particles

(46)

Velocity Clipping Criterion

• Kennedy investigated the swarm behavior when the velocity clipping criterion is not introduced.

• Without the velocity clipping criterion, the swarm would diverge in sinusoid-like waves of increasing amplitude without being able to converge to the global optimum

• Hence, the velocity clipping criterion is necessary for the swarm to converge close to or equal to the global optimum

J. Kennedy, and R. C. Eberhart, Swarm Intelligence, ISBN 1-55860-595-9, Academic Press (2001)

(47)

[Figures: a particle without the velocity clipping criterion; a particle with the velocity clipping criterion]

Jakob Vesterstrøm and Jacques Riget, “Particle Swarms: Extensions for improved local, multi-modal, and dynamic search in numerical optimization,” May 2002.

(48)

• To prevent the particles from leaving the search space, one of the following steps can be taken:

– Velocity clipping criterion: each particle is not allowed to have velocities that exceed the user-defined range, i.e., $[-v_{max}, v_{max}]$. Usually, $v_{max}$ is chosen to be $k \times x_i^{max}$, where $x_i^{max}$ is the feasible bound for variables, i.e., $[x_i^L, x_i^U]$

– Position clipping criterion: each particle is not allowed to have decision variables that exceed the feasible bound for variables ($[x_i^L, x_i^U]$)
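Both clipping criteria amount to clamping each component to an interval. In the sketch below, v_max is taken as k times the width of the feasible range in each dimension; that reading of "k × x_i^max", the default k = 0.5, and the function names are assumptions.

```python
def clip(value, low, high):
    """Clamp value to the closed interval [low, high]."""
    return max(low, min(high, value))

def clip_velocity(v, lower, upper, k=0.5):
    """Velocity clipping: keep each v_j inside [-v_max_j, v_max_j]."""
    v_max = [k * (u - l) for l, u in zip(lower, upper)]   # assumed definition
    return [clip(vj, -m, m) for vj, m in zip(v, v_max)]

def clip_position(x, lower, upper):
    """Position clipping: keep each x_j inside [x_j^L, x_j^U]."""
    return [clip(xj, l, u) for xj, l, u in zip(x, lower, upper)]

lower, upper = [-1.0, -1.0, -1.0], [1.0, 1.0, 1.0]
clipped_v = clip_velocity([3.0, -3.0, 0.1], lower, upper)   # [1.0, -1.0, 0.1]
clipped_x = clip_position([2.0, -5.0, 0.3], lower, upper)   # [1.0, -1.0, 0.3]
```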

(49)

PSO Algorithm (Pseudo Code)

/Initialization/

Initialize swarm randomly (particles and velocity)
Set w, c1, c2, max number of iterations (tmax)
Store particles' positions (pbest)

begin
  while t < tmax
    for each particle
      • Calculate fitness.
      • Update pbest if current position is better than the position contained in the memory.
    end
    Find global best position (gbest)
    for each particle
      • Update velocity and position of particle.
      • Apply velocity/position clipping criterion
    end
  end while
  Report optimum solution (gbest)
end
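The pseudo code can be turned into a short runnable program. This is a sketch under stated assumptions: minimization, both velocity and position clipping applied, v_max set to half the variable range, and illustrative parameter defaults; the `sphere` test function is not from the slides.

```python
import random

def pso(f, lower, upper, n_particles=20, t_max=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over the box [lower, upper]; returns (gbest, f(gbest))."""
    rng = random.Random(seed)
    dim = len(lower)
    v_max = [0.5 * (u - l) for l, u in zip(lower, upper)]      # assumed choice
    xs = [[rng.uniform(l, u) for l, u in zip(lower, upper)] for _ in range(n_particles)]
    vs = [[rng.uniform(-m, m) for m in v_max] for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_val = [f(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(t_max):
        for i in range(n_particles):                 # fitness and pbest update
            val = f(xs[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = xs[i][:], val
        g = min(range(n_particles), key=lambda i: pbest_val[i])
        gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best
        for i in range(n_particles):                 # velocity and position update
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                vs[i][j] = (w * vs[i][j]
                            + c1 * r1 * (pbest[i][j] - xs[i][j])
                            + c2 * r2 * (gbest[j] - xs[i][j]))
                vs[i][j] = max(-v_max[j], min(v_max[j], vs[i][j]))            # velocity clipping
                xs[i][j] = max(lower[j], min(upper[j], xs[i][j] + vs[i][j]))  # position clipping
    return gbest, gbest_val

def sphere(x):
    return sum(xj * xj for xj in x)

best, best_val = pso(sphere, [-5.0, -5.0], [5.0, 5.0])
```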

(50)

Animation of PSO

• Function:

$$F(x_1, x_2) = \left(\sum_{k=1}^{5} k \cos\left((k+1)x_1 + k\right)\right) \times \left(\sum_{k=1}^{5} k \cos\left((k+1)x_2 + k\right)\right) + (x_1 + 1.42513)^2 + (x_2 + 0.80032)^2$$

– Minimization problem

– Two decision variables

– No decision variable bounds

(51)

(52)

Modification in PSO for Solving SOPs

1. Parameter Settings

2. Modifications of PSO Equation

3. Neighborhood Topology

4. Mutation/Perturbation Operators

5. Multiple-Swarm Concept in PSO

(53)

1. Parameter Settings

[Figure: particle update in the (x₁, x₂) plane when the inertial weight w is large, showing x_i(t), v_i(t), pbest_i, gbest and the new position x_i(t+1)]

(54)

[Figure: particle update in the (x₁, x₂) plane when c₁ >> c₂, showing x_i(t), v_i(t) and the new position x_i(t+1)]

(55)

[Figure: particle update in the (x₁, x₂) plane when c₂ >> c₁, showing x_i(t), v_i(t) and the new position x_i(t+1)]

(56)

• Random inertia weight

– Experiments indicate this strategy accelerates the convergence of the particle swarm in the early stage of the algorithm

$$w = 0.5 + \frac{rand()}{2}$$

– rand() is a uniformly distributed random number within [0,1]

*Y. H. Shi, R. C. Eberhart, “Empirical Study of Particle Swarm Optimization”, Proceeding Congress on Evolutionary Computation, Piscataway, pp.1945-1949, 1999

(57)

• Linearly decreasing inertia weight

$$w = (w_1 - w_2) \times \frac{t_{max} - t}{t_{max}} + w_2$$

– w₁ and w₂ are the initial and final values of the inertia weight

– A larger value for w facilitates global search at the beginning of the run

– A smaller w encourages more local search ability near the end of the run

– Experiments indicate good performance when the inertia weight descends from 0.9 to 0.4

*R. C. Eberhart, Y. Shi, “Comparing inertia weight and constriction factors in particle swarm optimization”, Proceeding Congress on Evolutionary Computation, San Diego, pp. 84-88, 2000
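The random and linearly decreasing inertia weight rules are each a one-line function. The function names are hypothetical; the defaults 0.9 and 0.4 are the values the slide reports as working well.

```python
import random

def random_inertia(rng=random):
    """w = 0.5 + rand()/2, so w is drawn uniformly from [0.5, 1.0)."""
    return 0.5 + rng.random() / 2.0

def linear_inertia(t, t_max, w1=0.9, w2=0.4):
    """w = (w1 - w2) * (t_max - t) / t_max + w2, from w1 at t=0 to w2 at t=t_max."""
    return (w1 - w2) * (t_max - t) / t_max + w2

schedule = [linear_inertia(t, 100) for t in (0, 50, 100)]   # about 0.9, 0.65, 0.4
```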

(58)

• Chaotic inertia weight

– Use chaotic mapping to set the inertia weight coefficient

– Logistic mapping

$$z(t+1) = \mu \times z(t) \times (1 - z(t))$$

• Distribution of the logistic map when µ = 4

• The logistic mapping is iterated 30,000 times

• The number of occurrences in both end intervals is very high

• The mean number of occurrences in the interval [0.1, 0.9] is 200

[Figure: histogram of the logistic map values, with 0.1 and 0.9 marked on the horizontal axis]

(59)

– Strategy of chaotic inertia weight

1. Select a random number z in the interval (0, 1)

2. Calculate the logistic mapping of z with µ = 4

3. Apply z to either the linearly decreasing inertia weight or the random inertia weight:

$$w = (w_1 - w_2) \times \frac{t_{max} - t}{t_{max}} + w_2 \times z$$

or

$$w = 0.5 \times rand() + 0.5 \times z$$

• Experiment: good convergence precision, quick convergence velocity, and better global search ability

Yong Feng, Gui-Fa Teng, Ai-Xin Wang, and Yong-Mei Yao, “Chaotic inertia weight in particle swarm optimization,” Proceeding 2nd International Conference on Innovative Computing, Information and Control, Kumamoto, Japan, pp. 475-475, 2007
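A short sketch of the chaotic inertia weight strategy: iterate the logistic map, then feed z into either weight formula. Function names and the default w1 = 0.9, w2 = 0.4 are assumptions carried over from the linearly decreasing schedule.

```python
import random

def logistic_map(z, mu=4.0):
    """z(t+1) = mu * z(t) * (1 - z(t))."""
    return mu * z * (1.0 - z)

def chaotic_decreasing_inertia(t, t_max, z, w1=0.9, w2=0.4):
    """w = (w1 - w2) * (t_max - t) / t_max + w2 * z."""
    return (w1 - w2) * (t_max - t) / t_max + w2 * z

def chaotic_random_inertia(z, rng=random):
    """w = 0.5 * rand() + 0.5 * z."""
    return 0.5 * rng.random() + 0.5 * z

rng = random.Random(1)
z = rng.uniform(0.01, 0.99)      # step 1: random z in (0, 1)
weights = []
for t in range(5):
    z = logistic_map(z)          # step 2: logistic mapping with mu = 4
    weights.append(chaotic_decreasing_inertia(t, 100, z))   # step 3
```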

(60)

• Time-varying acceleration coefficients (c₁, c₂)

– Large c₁ and small c₂ in the early stage, to encourage particles to explore the search space

– Larger c₂ and smaller c₁ in the later stage, to promote quick convergence to the optimum solution

$$c_1 = (c_{1f} - c_{1i}) \frac{t}{t_{max}} + c_{1i}$$

$$c_2 = (c_{2f} - c_{2i}) \frac{t}{t_{max}} + c_{2i}$$

where the subscripts i and f denote the initial and final values of each coefficient.

Ratnaweera A, Halgamuge S.K., and Watson H. C., “SELF-ORGANIZING HIERARCHICAL PARTICLE SWARM OPTIMIZER,” IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 8, NO. 3, JUNE 2004
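Both coefficient schedules share the same linear form, so a single helper covers them. The start and end values used below (c₁ from 2.5 to 0.5, c₂ from 0.5 to 2.5) are illustrative assumptions, not values given on the slide.

```python
def tvac(t, t_max, c_initial, c_final):
    """c(t) = (c_final - c_initial) * t / t_max + c_initial."""
    return (c_final - c_initial) * t / t_max + c_initial

t_max = 100
c1_schedule = [tvac(t, t_max, 2.5, 0.5) for t in (0, 50, 100)]   # [2.5, 1.5, 0.5]
c2_schedule = [tvac(t, t_max, 0.5, 2.5) for t in (0, 50, 100)]   # [0.5, 1.5, 2.5]
```

Early in the run the cognitive term dominates (exploration); late in the run the social term dominates (convergence).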

(61)

2. Modifications of PSO Equation

• Canonical PSO by Maurice Clerc

– Studied the swarm behavior using second-order differential equations

– The study shows that it is possible to determine under which conditions the swarm will converge

– Introduces a constriction factor, χ, which restrains the velocity to guarantee convergence of the particles

– Observation: the amplitude of the particle’s oscillations decreases or increases depending on the distance between pbest and gbest

M. Clerc, and J. Kennedy, “The Particle Swarm - Explosion, Stability, and Convergence in a Multidimensional Complex Space,” IEEE Transactions on Evolutionary Computation, Vol. 6, No. 1, February, (2002): 58-73

(62)

– The particle will oscillate around the weighted mean of pbest and gbest

– If pbest and gbest are near each other, the particle will perform local search

– If pbest and gbest are far apart from each other, the particle will perform global search

– During the search process, the particle will shift from local search back to global search depending on pbest and gbest

– The constriction factor balances the need for local and global search depending on the social conditions in place

(63)

• The velocity update equation is

$$v_{i,j}(t+1) = \chi \left[ v_{i,j}(t) + c_1 r_1 \left(pbest_{i,j} - x_{i,j}(t)\right) + c_2 r_2 \left(gbest_j - x_{i,j}(t)\right) \right]$$

where

$$\chi = \frac{2\kappa}{\left| 2 - \phi - \sqrt{\phi(\phi - 4)} \right|}; \quad \phi = c_1 + c_2, \ \phi > 4; \quad \kappa \in [0, 1]$$
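As a quick numeric check of the constriction factor, the sketch below evaluates χ for c₁ = c₂ = 2.05 and κ = 1. Those parameter values are a commonly quoted choice and are assumed here, not taken from the slide; the function name is hypothetical.

```python
import math

def constriction(c1=2.05, c2=2.05, kappa=1.0):
    """chi = 2*kappa / |2 - phi - sqrt(phi*(phi - 4))| with phi = c1 + c2 > 4."""
    phi = c1 + c2
    if phi <= 4.0:
        raise ValueError("phi = c1 + c2 must be greater than 4")
    if not 0.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [0, 1]")
    return 2.0 * kappa / abs(2.0 - phi - math.sqrt(phi * (phi - 4.0)))

chi = constriction()          # roughly 0.73 for phi = 4.1
```

Halving κ halves χ, which is how κ trades exploration against convergence speed.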

(64)

• The parameter κ controls the convergence speed to the point of attraction.

• If κ is close to zero, χ will be close to zero, and the resulting velocity will be small. A small velocity encourages local search, so the convergence speed is high

• If κ is close to one, the swarm shows high exploration behavior but the slowest possible convergence speed

• Experiment: even without the velocity clipping criterion, the constriction factor can prevent the particles from leaving the search space and ensure convergence

(65)

• Gaussian Particle Swarm Model (GPSO)

– Observation shows the expected values for the weights on (pbest − x) and (gbest − x) are 0.729 and 0.85

– Use a probability distribution that generates random values with expected values of [0.729 0.85]

$$v_{i,j}(t+1) = |randn| \left(pbest_{i,j} - x_{i,j}(t)\right) + |randn| \left(gbest_j - x_{i,j}(t)\right)$$

$$x_{i,j}(t+1) = x_{i,j}(t) + v_{i,j}(t+1)$$

– |randn| and |randn| are positive random numbers generated according to abs[N(0,1)]

R. A. Krohling, “Gaussian swarm: a novel particle swarm optimization algorithm,” Proceedings of the IEEE Conference on Cybernetics and Intelligent Systems, Singapore, pp. 372-376, 2004.
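A minimal sketch of one GPSO step for a single particle, following the two equations above. The function name is hypothetical, and drawing independent |N(0,1)| values per dimension is an assumption.

```python
import random

def gpso_step(x, pbest, gbest, rng=random):
    """Gaussian PSO step: no inertia weight and no acceleration constants."""
    new_x, new_v = [], []
    for j in range(len(x)):
        vj = (abs(rng.gauss(0.0, 1.0)) * (pbest[j] - x[j])
              + abs(rng.gauss(0.0, 1.0)) * (gbest[j] - x[j]))
        new_v.append(vj)
        new_x.append(x[j] + vj)
    return new_x, new_v

rng = random.Random(0)
new_x, new_v = gpso_step([0.0, 0.0], pbest=[1.0, 2.0], gbest=[3.0, 4.0], rng=rng)
```

The mean of |N(0,1)| is sqrt(2/π) ≈ 0.798, which falls between the two expected values quoted above.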

(66)

3. Neighborhood Topology

• From the standard PSO equation, the movement of particles is influenced by both the personal best (pbest) and the global best (gbest)

• Neighborhood topology: the topology of a swarm (usually replaces gbest)

• Neighborhood topology: how the particles in the swarm are connected with each other in terms of sharing their knowledge

(67)

• The convergence rate can be estimated by calculating the average distance between two particles in the neighborhood topology

• A shorter average distance facilitates faster convergence – This means lower degree of connectivity

(68)

• Global Topology (STAR)

– Happens in every iteration

– All the particles are connected such that the knowledge is shared by all particles

– The best solution found by any particle in the swarm will influence all the particles at the next iteration

[Figure: fully connected topology among particles a to g]

(69)

• Wheel Topology

– When a particle (example: b) finds the global best, particle a will immediately be drawn to it (at the next iteration)

– Only when particle a moves to that location does it influence the rest of the particles

– One or more iterations are required before all the particles in the neighborhood are influenced by the global best

[Figure: wheel topology with focal particle a connected to particles b to g; the rest of the particles will be drawn to particle a]

(70)

• Ring Topology (Circle or lbest)

– Each particle is connected with K immediate neighbors

– When one particle (example: b) finds the global best, only its immediate neighbors (i.e., a & c) will be drawn to b

– Other particles are not influenced by b until their immediate neighbors have moved towards that location

– A few iterations may be required before all the particles in the neighborhood are influenced by the global best

[Figure: ring topology among particles a to g]
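In a ring topology each particle replaces gbest with the best pbest among itself and its K immediate neighbors. The sketch below assumes minimization and an even K split equally between both sides of the ring; the function names are hypothetical.

```python
def ring_neighbors(i, n, k=2):
    """Indices of the k immediate neighbors of particle i on a ring of n particles."""
    half = k // 2
    return [(i + d) % n for d in range(-half, half + 1) if d != 0]

def local_best(i, pbest_val, k=2):
    """Index of the best (lowest) pbest value among particle i and its ring neighbors."""
    candidates = [i] + ring_neighbors(i, len(pbest_val), k)
    return min(candidates, key=lambda j: pbest_val[j])

pbest_val = [5.0, 4.0, 3.0, 9.0, 8.0, 7.0, 1.0]   # particles a to g
lbest_for_d = local_best(3, pbest_val)            # neighbors c and e -> index 2
lbest_for_a = local_best(0, pbest_val)            # neighbors g and b -> index 6
```

Particle g holds the best value overall, but particle d only sees c and e, so the information has to travel around the ring over several iterations.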

(71)

• Each neighborhood has strengths and weaknesses

• STAR – Fast information flow, converges fast, but with potential for premature convergence

• Wheel – Moderate information flow, converges more quickly but sometimes to a suboptimal point in the space

• Circle – Slow information flow, converges more slowly, favors more exploration; might have more chances to find better solutions slowly

• There are many more neighborhood topologies, and the choice of topology depends on the problem (i.e., it is problem dependent)

(72)

4. Mutation/Perturbation Operators

• Problem of PSO – lack of diversity and easily trapped in a local optimum.

• A mutating particle may be able to lead the other particles away from their current position if this particle becomes the global best.

• Hence, apply mutation operators to PSO – a strategy to enhance the exploration of the particles and to escape local minima.

(73)

• Where to apply the mutation operators in PSO?

– Apply to the updated particle’s position (decision variable)

– Apply to the user-defined maximum velocity threshold

(74)

• The mutation operators are the mutation approaches used for GA or MOEA. Example:

– Random mutation; non-uniform mutation; normally distributed mutation

– Example of Gaussian mutation used for PSO*:

$$mutation(x) = x \times (1 - Gaussian(\sigma))$$

where σ is set to be 0.1 times the length of the search space in one dimension; OR σ can be set at 1.0 and linearly decreased to 0 as the iteration count reaches the maximum
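The Gaussian mutation formula can be sketched as below, using the first σ option (0.1 times the length of the search space in each dimension). It assumes Gaussian(σ) means a zero-mean normal sample with standard deviation σ; the function name and the `scale` parameter are hypothetical.

```python
import random

def gaussian_mutation(x, lower, upper, rng=random, scale=0.1):
    """mutation(x_j) = x_j * (1 - Gaussian(sigma_j)), sigma_j = scale * (upper_j - lower_j)."""
    mutated = []
    for xj, l, u in zip(x, lower, upper):
        sigma = scale * (u - l)
        mutated.append(xj * (1.0 - rng.gauss(0.0, sigma)))
    return mutated

rng = random.Random(1)
mutated = gaussian_mutation([1.0, 2.0], [-5.0, -5.0], [5.0, 5.0], rng=rng)
```

Because the perturbation is multiplicative, a component that is exactly zero is left unchanged by this operator.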

(75)

5. Multiple-Swarm Concept in PSO

• Improve performance by promoting exploration and diversity

– Counter its tendency of premature convergence

• Three main groups

– Improve the performance of PSO by promoting diversity

– Solve multimodal problems

– Locate and track the optima of a multimodal problem in a dynamic environment

(76)

• Kennedy proposed using a k-means clustering algorithm to identify the centers of different clusters of particles in the population, and then using these cluster centers to substitute for the personal bests

– Requires a pre-specified number of iterations to determine the cluster centers and a pre-specified number of clusters

– Not suitable for multimodal problems, since the cluster centers are not necessarily the best-fit particles in their clusters

*J. Kennedy, “Stereotyping: improving particle swarm performance with cluster analysis,” Proceedings of Congress on Evolutionary Computation, San Diego, CA. pp. 1507-1512, 2000

(77)

– The number of clusters and the number of iterations to identify the cluster centers must be predetermined.

[Figure: cluster A’s center performs better than all members of cluster A, whereas cluster B’s center performs better than some and worse than others*]

(78)

• Chen and Yu’s TPSO

– The first subswarm optimizes following the global best; the second subswarm moves in the opposite direction

– A particle’s pbest is updated based on its local best, its corresponding subswarm’s best, and the global best collected from the two subswarms

– If the global best has not improved for 15 successive iterations, the worst particles of a subswarm are replaced by the best ones of the other subswarm. Then, the subswarms switch their flight directions

G. Chen and J. Yu, “Two sub-swarms particle swarm optimization algorithm,” Proceeding of International Conference on Natural Computation, Changsha, China, pp. 515-524, 2005
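The stagnation rule can be sketched as below. The slide does not say how many particles are exchanged or whether both subswarms exchange at once, so this sketch assumes a symmetric exchange of `n_exchange` particles (a hypothetical parameter), minimization, and flight directions encoded as +1 / -1. It is an illustration under those assumptions, not the authors' implementation.

```python
def update_stall(stall, previous_best, current_best):
    """Count successive iterations without improvement (minimization assumed)."""
    return 0 if current_best < previous_best else stall + 1

def exchange(subswarm_a, subswarm_b, fitness, n_exchange):
    """Replace the worst n_exchange particles of each subswarm with
    copies of the best n_exchange particles of the other one."""
    a = sorted(subswarm_a, key=fitness)  # best first
    b = sorted(subswarm_b, key=fitness)
    new_a = a[:len(a) - n_exchange] + [list(p) for p in b[:n_exchange]]
    new_b = b[:len(b) - n_exchange] + [list(p) for p in a[:n_exchange]]
    return new_a, new_b

def apply_stagnation_rule(subswarms, directions, stall, fitness,
                          n_exchange=1, stall_limit=15):
    """After stall_limit stagnant iterations: exchange particles,
    switch both flight directions, and reset the counter."""
    if stall < stall_limit:
        return subswarms, directions, stall
    new_a, new_b = exchange(subswarms[0], subswarms[1], fitness, n_exchange)
    return (new_a, new_b), (-directions[0], -directions[1]), 0
```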

(79)

• Multi-population cooperative particle swarm optimization (MCPSO)

– Based on the concept of master-slave mode

– The swarm population has one master swarm and multiple slave swarms

– Slave swarms explore the search space independently to maintain diversity of particles

– The master swarm updates via the best particles collected from the slave swarms

B. Niu, Y. Zhu, and X. He, “Multi-population cooperative particle swarm optimization,” Proceedings of European Conference on Artificial Life, Canterbury, UK, pp. 874-883, 2005
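One way to picture the master-slave interaction is through the velocity updates. In the sketch below the slaves use the standard PSO update, and the master adds a third attraction term toward the best position collected from the slaves. That extra term, the coefficient `c3`, and all function names are assumptions for illustration; the exact equation in the cited paper may weight the terms differently. Minimization is assumed.

```python
def slave_velocity(v, x, pbest, gbest, r1, r2, w=0.7, c1=1.5, c2=1.5):
    """Standard PSO velocity update, run independently in each slave swarm."""
    return [w * vi + c1 * r1 * (pb - xi) + c2 * r2 * (gb - xi)
            for vi, xi, pb, gb in zip(v, x, pbest, gbest)]

def best_of_slaves(slave_gbests, fitness):
    """Collect the best position found by any slave swarm."""
    return min(slave_gbests, key=fitness)

def master_velocity(v, x, pbest, gbest_master, gbest_slaves, r1, r2, r3,
                    w=0.7, c1=1.5, c2=1.5, c3=1.5):
    """Master update with an assumed extra pull toward the slaves' best."""
    return [w * vi
            + c1 * r1 * (pb - xi)
            + c2 * r2 * (gm - xi)
            + c3 * r3 * (gs - xi)
            for vi, xi, pb, gm, gs
            in zip(v, x, pbest, gbest_master, gbest_slaves)]
```

The random factors `r1`, `r2`, `r3` are passed in explicitly so the update stays deterministic and easy to check; in a full loop they would be drawn uniformly from [0, 1] at each step.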

(80)

• Speciation-based PSO (SPSO), proposed by Parrott and Li

– Notion of species: a group of individuals sharing common attributes according to some similarity metric

– A radius, r_s, is measured in Euclidean distance from the center of a species to its boundary

– The center of a species, the species seed, is always the fittest individual in the species

– All particles that fall within distance r_s of the species seed are classified as the same species

D. Parrott and X. Li, “Locating and tracking multiple dynamic optima by a particle swarm model using speciation,” IEEE Transactions on Evolutionary Computation, Vol. 10, No. 4, pp. 440-458, 2006
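The species definition above suggests a simple seed-finding pass, sketched here in Python: visit particles from fittest to least fit, attach each one to the first existing seed within r_s, and otherwise make it a new seed. The function name and the convention that larger fitness is better are assumptions for illustration.

```python
import math

def find_species(positions, fitness_values, r_s):
    """Return {seed_index: [member indices]}; larger fitness assumed better."""
    # Visit particles from fittest to least fit.
    order = sorted(range(len(positions)),
                   key=lambda i: fitness_values[i], reverse=True)
    species = {}
    for i in order:
        for seed in species:  # seeds are checked in order of creation (fittest first)
            if math.dist(positions[i], positions[seed]) <= r_s:
                species[seed].append(i)
                break
        else:
            # No seed within r_s: this particle starts a new species.
            species[i] = [i]
    return species
```

Because particles are processed in fitness order, each seed is by construction the fittest member of its species, matching the third bullet.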

(81)

PSO Designs Inspired by Other Fields

• Emotion Particle Swarm Optimization

– Each particle has two emotions (joy and sadness)

– The emotional state of a particle is determined by comparing its emotional factor with a random value

– If a certain condition is met, the particle is updated using the “joyful” velocity equation; otherwise, the “sad” velocity equation is applied

– A psychological model is incorporated in both the “joyful” and “sad” velocity equations

B. Niu, Y.L. Zhu, K.Y. Hu, S.F. Li, X.X. He, “A novel particle swarm optimizer using optimal foraging theory,” Proceedings of Computational Intelligence and Bioinformatics, Part 3, Kunming, China , Vol. 4115, pp. 61-71, 2006

(82)

ANT COLONY OPTIMIZATION

(83)

Biological Fact

• Ant Colony Optimization (ACO) studies artificial systems that take inspiration from the behavior of real ant colonies and which are used to solve discrete optimization problems.

• Real ants are capable of finding the shortest path from a food source to their nest without using visual cues by exploiting pheromone information.

• While walking, ants deposit pheromone on the ground and follow, in probability, pheromone previously deposited by other ants.

• Also, they are capable of adapting to changes in the environment, for example finding a new shortest path once the old one is no longer feasible due to a new obstacle (dynamic environments).

(84)

• Consider the following figure in which ants are moving on a straight line which connects a food source to the nest:

• It is well-known that the primary means used by ants to form and maintain the line is a pheromone trail. Ants deposit a certain amount of pheromone while walking, and each ant probabilistically prefers to follow a direction rich in pheromone rather than a poorer one. This elementary behavior of real ants can be used to explain how they can find the shortest path which reconnects a broken line after the sudden appearance of an unexpected obstacle has interrupted the initial path:

(85)

• In fact, once the obstacle has appeared, those ants which are just in front of the obstacle cannot continue to follow the pheromone trail and therefore they have to choose between turning right or left. In this situation we can expect half the ants to choose to turn right and the other half to turn left. The very same situation can be found on the other side of the obstacle.

• It is interesting to note that those ants which choose, by chance, the shorter path around the obstacle will more rapidly reconstitute the interrupted pheromone trail compared to those which choose the longer path. Hence, the shorter path will receive a higher amount of pheromone per unit of time, and this will in turn cause a higher number of ants to choose the shorter path. Due to this positive feedback (autocatalytic) process, very soon all the ants will choose the shorter path.

(86)

(87)

• The most interesting aspect of this autocatalytic process is that finding the shortest path around the obstacle seems to be an emergent property of the interaction between the obstacle shape and the ants’ distributed behavior: although all ants move at approximately the same speed and deposit a pheromone trail at approximately the same rate, it takes longer to contour obstacles on their longer side than on their shorter side, which makes the pheromone trail accumulate quicker on the shorter side. It is the ants’ preference for higher pheromone trail levels which makes this accumulation still quicker on the shorter path.
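The positive feedback described above can be illustrated with a small deterministic toy model. It assumes that the fraction of ants choosing a branch equals that branch's share of the total pheromone, and that the pheromone laid on a branch per time step is that fraction divided by the branch length (shorter branches are traversed faster). There is no evaporation. This is an illustrative mean-field sketch with hypothetical names, not the Ant System algorithm.

```python
def double_bridge(short_len, long_len, steps, tau0=1.0):
    """Return the fraction of ants choosing the short branch at each step."""
    tau_short = tau_long = tau0
    history = []
    for _ in range(steps):
        # Ants prefer a branch in proportion to its pheromone level.
        p_short = tau_short / (tau_short + tau_long)
        history.append(p_short)
        # Deposit per time step is inversely proportional to branch length.
        tau_short += p_short / short_len
        tau_long += (1.0 - p_short) / long_len
    return history
```

For example, `double_bridge(1.0, 2.0, 200)` starts at 0.5 (half the ants on each side) and increases at every step, while two branches of equal length stay at 0.5 indefinitely.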

• [R1] Beckers R., Deneubourg J.L. and S. Goss (1992). Trails and U-turns in the selection of the shortest path by the ant Lasius niger. Journal of theoretical biology, 159, 397-415.

• [R2] Hölldobler B. and E.O. Wilson (1990). The ants. Springer-Verlag, Berlin.

(88)

History

• The first ACO system was introduced by Marco Dorigo in his Ph.D. thesis (1992), and was called Ant System (AS).

• AS is the result of research on computational intelligence approaches to combinatorial optimization that Dorigo conducted at Politecnico di Milano in collaboration with Alberto Colorni and Vittorio Maniezzo. AS was initially applied to the travelling salesman problem and to the quadratic assignment problem.

• Since 1995, Dorigo, Gambardella and Stützle have been working on various extended versions of the AS paradigm. Dorigo and Gambardella have proposed Ant Colony System (ACS), while Stützle and Hoos have proposed MAX-MIN Ant System (MMAS).

• They have both been applied to the symmetric and asymmetric travelling salesman problem, with excellent results. Dorigo, Gambardella and Stützle have also proposed new hybrid versions of ant colony optimization with local search. In problems like the quadratic assignment problem and the sequential ordering problem, these ACO algorithms outperform all known algorithms on vast classes of benchmark problems.
