
Analysis of particle interaction in particle swarm optimization

Ying-ping Chen, Pei Jiang

Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan

E-mail addresses: ypchen@nclab.tw (Y.-p. Chen), pjiang@nclab.tw (P. Jiang)

Keywords: Particle swarm optimization; Particle interaction; Social-only model; Statistical interpretation; Theoretical analysis; Progress rate; Convergence; Sphere function

Abstract

In this paper, we analyze the behavior of particle swarm optimization (PSO) on the facet of particle interaction. We first propose a statistical interpretation of particle swarm optimization in order to capture the stochastic behavior of the entire swarm. Based on the statistical interpretation, we investigate the effect of particle interaction by focusing on the social-only model and derive the upper and lower bounds of the expected particle norm. Accordingly, the lower and upper bounds of the expected progress rate on the sphere function are also obtained. Furthermore, the sufficient and necessary condition for the swarm to converge is derived to demonstrate the PSO convergence caused by the effect of particle interaction.

© 2010 Elsevier B.V. All rights reserved.

1. Introduction

Particle swarm optimization (PSO), introduced by Kennedy and Eberhart [1] in 1995, was inspired by the social behavior of insects and animals, in which exchanging and sharing information among a group of individuals benefits the survival of the group by improving its capability of foraging. In the framework of PSO, the insects or animals are considered as particles flying through the multi-dimensional search space and searching for the optimal position. The movement of particles is affected by three factors: the inertia, personal experience (the cognitive part), and particle interaction (the social part).

Since its introduction, PSO has been empirically shown to be a very useful and effective optimization framework [2] for its ease of implementation and flexibility of use. Although PSO is widely applied in many research fields nowadays, theoretical analysis of PSO is still quite limited. To the best of our knowledge, the first analysis was proposed by Kennedy [3], who showed particle trajectories for various design choices. Ozcan and Mohan [4,5] assumed fixed attractors and constant coefficients to demonstrate the particle trajectory as a sinusoidal wave. With similar assumptions, Clerc and Kennedy [6] simplified PSO to a deterministic dynamical system and analyzed its stability. Such simplified, deterministic versions of PSO or similar systems, employing a single particle, fixed attractors, or constant coefficients, were analyzed by many researchers for stability, convergence, and parameter selection [7–11]. Kadirkamanathan et al. [12] and Jian et al. [13] started to consider the randomness in acceleration coefficients, but attractors were still fixed. Away from the common PSO configuration, Emara and Fattah [14] as well as Gazi and Passino [15] analyzed PSO in a continuous-time setting.

Most of the existing studies do not provide analysis on the facet of particle interaction, which is definitely an essential mechanism of PSO. In this paper, under more practical assumptions, including multiple particles, unfixed attractors, and stochastic acceleration coefficients, we make the first attempt to analyze the effect of particle interaction. In particular, we consider the PSO system from a macrostate viewpoint, analyze the swarm behavior, and obtain theoretical results on the progress rate as well as the convergence criterion.

The paper is organized as follows. In Section 2, we will describe the particle swarm optimization algorithm and propose the statistical interpretation. In Section 3, we will analyze the mean positions of particles by considering the effect of particle interaction and derive the expected progress rate of the swarm on the sphere function. Next, we will look into the variance of the particle positions and show that the swarm will converge under a certain condition in Section 4. Finally, Section 5 summarizes and concludes this paper.

2. PSO and particle interaction

In this section, we will first describe the standard PSO algorithm and then discuss the operations of PSO step by step, followed by the proposal of our statistical interpretation.

2.1. The standard PSO algorithm

First of all, to facilitate an abstraction of PSO based on statistics and probability distributions, we restate the standard PSO system as the following algorithm:

Algorithm 1 (Standard PSO).

procedure Standard PSO(objective function $F: \mathbb{R}^n \to \mathbb{R}$)
  Initialize a swarm of $m$ particles
  while the stopping criterion is not satisfied do
    Evaluate each particle
    for particle $i$, $i = 1, 2, \ldots, m$ do    ▷ Update the best positions
      if $F(X_i) < F(Pb_i)$ then
        $Pb_i \leftarrow X_i$
        if $F(Pb_i) < F(Nb)$ then
          $Nb \leftarrow Pb_i$
        end if
      end if
    end for
    for particle $i$, $i = 1, 2, \ldots, m$ do    ▷ Generate the next generation
      $V_i(t+1) \leftarrow w V_i(t) + C_p \otimes (Pb_i - X_i) + C_n \otimes (Nb - X_i)$
      $X_i(t+1) \leftarrow X_i(t) + V_i(t+1)$
    end for
  end while
end procedure

Throughout this paper, boldface is used to distinguish vectors from scalars, and $\|\cdot\|$ denotes the $L_2$ norm of a vector. The notation $\otimes$ indicates component-by-component multiplication. According to Algorithm 1, we can see that a standard PSO system comprises the following two main operations regarding the sharing and utilization of information:

(1) Updating attractors: Update the personal best position, $Pb_i$, found by each particle, and the neighborhood best position, $Nb$, found by any member within the neighborhood. Since $Pb_i$ and $Nb$ exert gravity on other particles, they are referred to as attractors in this study.

(2) Updating particles: Update the velocities at time $t$ by using a linear combination of the inertia, $V_i(t)$, and the gravitation from the cognitive part, $Pb_i$, and the social part, $Nb$, respectively. $w$ is the weight for the inertia and is usually a constant. $C_p$ and $C_n$ are random vectors with each component sampled from the uniform distributions $U(0, c_p)$ and $U(0, c_n)$, with $c_p > 0$ and $c_n > 0$ as acceleration coefficients. The position is then assigned according to the current position with application of the updated velocity.

As we can observe, the inherent characteristic of PSO, the interaction among particles, is implemented with the shared knowledge of the best position found by neighbors. When a particle within the neighborhood locates a position with an objective value better than $F(Nb)$, the other particles will make corresponding adjustments and tend to move toward that position. Therefore, the neighborhood attractor can be viewed as a channel through which each particle can emulate the others, and the update of the neighborhood attractor can be considered as a signal urging the swarm to adjust their movements in order to respond to the new discovery in the search space.
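To make the update rules above concrete, the following is a minimal NumPy sketch of Algorithm 1 (not the authors' code); the sphere objective, the initialization range, and the parameter values $w = 0.73$ and $c_p = c_n = 1.49$ are illustrative assumptions borrowed from the experiment in Section 2.2.

import numpy as np

def standard_pso(F, n, m=20, w=0.73, cp=1.49, cn=1.49, iters=100, seed=None):
    """Sketch of Algorithm 1 (standard PSO) for an n-dimensional objective F."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-5.0, 5.0, size=(m, n))        # particle positions (assumed init range)
    V = np.zeros((m, n))                           # velocities
    Pb = X.copy()                                  # personal best positions Pb_i
    Nb = Pb[np.argmin([F(p) for p in Pb])].copy()  # neighborhood best position Nb
    for _ in range(iters):
        for i in range(m):                         # update the attractors
            if F(X[i]) < F(Pb[i]):
                Pb[i] = X[i]
                if F(Pb[i]) < F(Nb):
                    Nb = Pb[i].copy()
        Cp = rng.uniform(0.0, cp, size=(m, n))     # C_p ~ U(0, c_p) per component
        Cn = rng.uniform(0.0, cn, size=(m, n))     # C_n ~ U(0, c_n) per component
        V = w * V + Cp * (Pb - X) + Cn * (Nb - X)  # '*' is the component-wise product
        X = X + V
    return Nb

sphere = lambda x: float(np.sum(x ** 2))
print(sphere(standard_pso(sphere, n=10, seed=0)))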

2.2. A macroscopic view of PSO

In spite of its importance, the effect of particle interaction in PSO is hardly investigated in the literature. Although a number of remarkable theoretical studies that bring insights into the properties and behavior of PSO have been conducted in the past, most of those studies are based on the assumption that the attractor is fixed, e.g., the trajectory analysis [4,5] mentioned in Section 1. Such a setting seems an inevitable path to simplify the PSO system to the extent that rigorous analysis can be done, because the highly decentralized property of a particle swarm leads the system away from a unified depiction of the entire swarm. Each particle keeps its own position and memory, in the form of the inertia and the cognitive part, $Pb_i$. In addition to the personal experience, the swarm also shares collective knowledge, $Nb$, and any slight change in these quantities substantially defines a new state of the whole system. The analysis of the overall behavior of a swarm is thus intractable due to the complication of state transition, and the simplification of invariant attractors becomes an unpleasant but necessary means that makes a particle observable independently, without interference from the other factors of the entire swarm.

As a consequence, in order to take particle interaction into consideration in a theoretical analysis, an alternative interpretation of PSO that regards the swarm as a unity becomes necessary. With this point of view, the state of a PSO system should be considered as a measurement that reflects the overall behavior and characteristics of a swarm rather than as a detailed configuration directly related to each individual particle. For this purpose, the development of statistical mechanics may be a good example to learn from, especially the employment of statistical methods to bridge the macroscopic and microscopic descriptions. Accordingly, the state of the entire swarm can be considered as the "macrostate," an abstraction of the detailed description of particles, i.e., the "microstate." Hence, in the macrostate space, the precise configuration of particles is converted into a statistical abstraction and characterization of the entire swarm.

More specifically, the exact locations of particles are no longer traced but instead modeled and expressed by using a distribution $\theta(t)$ over $\mathbb{R}^n$. The velocities on each dimension are viewed as a random vector $\mathcal{V}(t) \in \mathbb{R}^n$. To concentrate on the social behavior, i.e., particle interaction, we use the social-only model of PSO categorized by Kennedy [16], in which PSO works without the cognitive component, to make the system more concise. The swarm size $m$ is considered as the number of realizations or samples of the distribution. As to the neighborhood attractor, since the geographic knowledge about the search space is embodied in the positional distribution, it can be viewed as the best observed value of the current time step. When the neighborhood attractor is determined, the social gravitation is also accordingly determined. Formally, each particle $P_i$ is a random vector sampled from $\theta(t)$, and its velocity vector $V_i$ is distributed as $\mathcal{V}(t)$. Since the neighborhood attractor is the best observed value, it can be defined as
$$P^* := \arg\min\{F(P_1), F(P_2), \ldots, F(P_m)\},$$
and each particle $P_i$ updates its position to $P_i + w V_i + C \otimes (P^* - P_i)$. The distributions of the next time step $\theta(t+1)$ and $\mathcal{V}(t+1)$ are thus the statistical characterization, denoted as functions $T_P$ and $T_V$, of the observed values:
$$\theta(t+1) \leftarrow T_P(P_1, P_2, \ldots, P_m);$$
$$\mathcal{V}(t+1) \leftarrow T_V(P_1, P_2, \ldots, P_m; V_1, V_2, \ldots, V_m).$$

By considering PSO in this way, the search/optimization process is conducted through repeated observations of the search space by realizing particles and modifying the distribution to accommodate the newly discovered results. Furthermore, going deeper into the notion of distribution, since the inertia weight $w$ is usually a constant, $\mathcal{V}(t)$ can be considered redundant and may be removed: given two random vectors $X \sim \theta(t)$ and $V \sim \mathcal{V}(t)$, where the notation "$\sim$" indicates "is distributed according to," we can simply let $\tilde{\theta}(t)$ be the distribution of $X' := X + wV$, which includes the effects of both the position and the velocity. Therefore, in the following, we will alter the notation $\theta$ to denote this compound distribution and parameterize it based on varied contexts.

The remaining questions would be what distribution is suitable for the description of a swarm without sacrificing too much of the essence of PSO and how to update the distribution as the search process proceeds. We can consider the random vector $X \sim \theta(t)$, denote $E[X] = \mu$, and decompose the region $\mathcal{R} := \{y \in \mathbb{R}^n \mid \mathrm{Prob}\{X = y\} > 0\}$ into $s$ disjoint regions $R_1, R_2, \ldots, R_s$ such that $\mathrm{Prob}\{X \in R_i\} = 1/s$ for all $i \in \{1, 2, \ldots, s\}$. Each region is associated with a random variable of velocity $V_i \sim \mathcal{V}(t)$. If one point $x_i$ is respectively selected from each region $R_i$, when $s$ is sufficiently large, the average behavior of a swarm can therefore be characterized by
$$\sum_{i=1}^{s}\frac{1}{s}(x_i + V_i) = \sum_{i=1}^{s}\frac{1}{s}x_i + \sum_{i=1}^{s}\frac{1}{s}V_i \approx \mu + \sum_{i=1}^{s}\frac{1}{s}V_i,$$
and each component of the term $\sum_{i=1}^{s}(1/s)V_i$ can be approximated with a normal distribution according to the central limit theorem. Thus, as an attempt to characterize the overall behavior of a swarm, the normal distribution should be a reasonable starting point. It is assumed that, at time $t$, each particle is sampled from $c(t) + Z$, where $c(t) \in \mathbb{R}^n$ is the center of the distribution and $Z \in \mathbb{R}^n$ is a random vector of which each coordinate is distributed according to $N(0, \sigma^2)$, the normal distribution with zero mean and variance $\sigma^2$. In this paper, $\phi(\cdot)$ and $\Phi(\cdot)$ are used as the probability density function (pdf) and the cumulative distribution function (cdf) of the standard normal distribution, respectively. We can then reparameterize $\theta(t)$, the distribution of $c(t) + Z$, as $\theta(c(t), \sigma^2)$.

The update of distributions can now be simplified into the modification of the mean and the variance. The mean is the arithmetic average of the updated positions of particles, and the variance is estimated by maximum likelihood estimation (MLE), which will be addressed later. Under such an interpretation, the PSO system can be described with the following algorithm:

Table 1
Average p-values of normality tests.

Swarm size   Shapiro–Wilk [17]   Anderson–Darling [18]   D'Agostino–Pearson [19]
10           0.3879              0.3621                  0.3985
20           0.3257              0.2842                  0.3393
30           0.2903              0.2518                  0.2876

Algorithm 2 (Statistical interpretation of PSO).

procedure PSO(objective function $F: \mathbb{R}^n \to \mathbb{R}$)
  Initialize $\theta$
  while the stopping criterion is not satisfied do
    for $i = 1, 2, \ldots, m$ do
      $P_i \sim \theta$
    end for
    $P^* = \arg\min\{F(P_1), F(P_2), \ldots, F(P_m)\}$
    for $i = 1, 2, \ldots, m$ do
      $P'_i \leftarrow P_i + C_i \otimes (P^* - P_i)$
    end for
    $\mu_{t+1} \leftarrow (\sum_{i=1}^{m} P'_i)/m$
    $\sigma^2_{t+1} \leftarrow \mathrm{MLE}(P'_1, P'_2, \ldots, P'_m)$
    $\theta \leftarrow \theta(\mu_{t+1}, \sigma^2_{t+1})$
    $t \leftarrow t + 1$
  end while
end procedure

In order to validate the utilization of normal distributions for describing swarms, we conducted three well-known normality tests, the Shapiro–Wilk test [17], the Anderson–Darling test [18], and the D'Agostino–Pearson test [19], on the social-only PSO on the sphere function. Table 1 displays the test results, which were obtained for 100 independent runs with 10 iterations in each run. The weight for the inertia is 0.73, and the acceleration coefficient is 1.49. Since all p-values of the three normality tests significantly surpass the conventional significance level 0.05, none of these tests is able to reject the null hypothesis. As a result, in this study, adopting the normal distribution as the description of swarms is an acceptable assumption.

In summary, the macrostate model transforms the detailed configuration of PSO into a corresponding stochastic representation embodied by normal distributions. As a consequence, the update of particles is simplified into the modification of the parameters of normal distributions. In each iteration, Algorithm 2 generates a swarm of particles by sampling from the current distribution, and thereafter, the distribution is updated according to particle interaction. In other words, a state of Algorithm 2 is a distribution, and the sampled swarm serves as a medium for state transition. In this manner, the analysis of the behavior of the entire swarm is thus reduced to the analysis of parameterized distributions. The inclusion of particle interaction into the analysis supplies numerous facets of PSO typically absent in related theoretical studies, e.g., the progress rate and the influence of objective functions, because the restriction of fixed attractors makes objective functions irrelevant. Since the No-Free-Lunch theorem states that all optimization algorithms perform identically on average [20], the effectiveness of PSO can hardly be theoretically identified unless the scope of functions is specified.

In the remainder of this paper, Algorithm 2 will be the study subject and will be formally investigated on the sphere function, which is commonly adopted in the theoretical analysis of evolutionary algorithms (e.g., [21]) and can be formulated as
$$F(x) = \sum_{i=1}^{n} x_i^2,$$
where $x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$.

3. Progress rate analysis

The major benefit of developing and adopting the probabilistic-distribution abstraction of PSO is that the mathematical model can be analyzed without the assumption of fixed attractors, because particles are in essence random vectors in the search space, and consequently their behavior can be described and predicted in a statistical sense. In this section, we will demonstrate how the statistical interpretation of PSO proposed in the present work facilitates the analysis of inter-particle effects and how these effects account for the progress rate of a swarm. We will begin with the n-ball hitting probability.

3.1. n-ball hitting probability

Given a distribution $\theta$ over $\mathbb{R}^n$, the term n-ball hitting probability refers to the probability that a random vector sampled from $\theta$ falls into a specific n-dimensional ball. This probability is fundamental to the sphere model, because in the sphere model the objective function is simply the squared $L_2$ norm, and a subset of $\mathbb{R}^n$ constructed by collecting all the vectors with their norms bounded by a specific non-negative quantity forms an n-ball located at the origin with a radius defined by that non-negative quantity. Therefore, the n-ball hitting probability is equal to the probability that the norm of a random vector is less than or equal to the radius. In other words, it is essentially the cumulative distribution function (cdf) of the norm of a random vector.

Given the center of the distribution at time $t$, $c(t) = (c_1, c_2, \ldots, c_n) \in \mathbb{R}^n$, we would like to calculate the probability that $c(t) + Z \sim \theta$ lies in an n-ball $B_k(o)$ located at the origin with radius $k$, where $Z = (Z_1, Z_2, \ldots, Z_n) \in \mathbb{R}^n$ is a random vector and each coordinate of $Z$ is normally distributed. Since $Z_1, Z_2, \ldots, Z_n$ are independent and identically distributed (i.i.d.) random variables, $Z$ is an isotropic random vector, i.e., all directions of $Z$ are equally likely to occur [22]. We elaborate this property as follows. Given $Z_1, Z_2, \ldots, Z_n \sim N(0, \sigma^2)$, for all $x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$,
$$\mathrm{Prob}\{c(t) + Z = x\} = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x_i - c_i)^2}{2\sigma^2}\right) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n}\exp\left(-\sum_{i=1}^{n}\frac{(x_i - c_i)^2}{2\sigma^2}\right) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n}\exp\left(-\frac{d(x, c(t))^2}{2\sigma^2}\right),$$

where $d(\cdot, \cdot)$ denotes the Euclidean distance. It is obvious that the density at point $x$ is determined by $d(x, c(t))$, regardless of the direction of $x$ relative to $c(t)$. Therefore, without loss of generality, we may assume that $c(t)$ is on the first axis by conducting a coordinate transformation. Let $r := d(c(t), o) \ge 0$. As a result, $c(t)$ can be expressed, after the coordinate transformation, as $(r, 0, 0, \ldots, 0)$, and the distribution is denoted as $\theta(r, \sigma^2)$. Now, the n-ball hitting probability can be formally defined as follows.

Definition 1. Given an n-ball $B_k(o) \subset \mathbb{R}^n$ and a random vector $c(t) + Z \sim \theta(r, \sigma^2)$ over $\mathbb{R}^n$, where $c(t) = (r, 0, 0, \ldots, 0)$ and all coordinates of $Z$ are distributed according to $N(0, \sigma^2)$, the n-ball hitting probability is
$$H_B(k, \theta(r, \sigma^2)) := \mathrm{Prob}\{c(t) + Z \in B_k(o)\}.$$

The analysis approach adopted in the present work is similar to that used by Beyer in 2001 [21]. The vector $Z$ is decomposed into two orthogonal vectors: $Z_1 e_1 = (Z_1, 0, 0, \ldots, 0)$ and $Z' = (0, Z_2, Z_3, \ldots, Z_n)$. We can take a closer look at the n-ball hitting probability $H_B(k, \theta(r, \sigma^2))$:
$$H_B(k, \theta(r, \sigma^2)) = \mathrm{Prob}\{c(t) + Z \in B_k(o)\} = \mathrm{Prob}\left\{\|(r + Z_1)e_1 + Z'\| \le k\right\} = \mathrm{Prob}\left\{(r + Z_1)^2 + \|Z'\|^2 \le k^2\right\} = \mathrm{Prob}\left\{-k - r \le Z_1 \le k - r,\; 0 \le \|Z'\|^2 \le k^2 - (r + Z_1)^2\right\}.$$

The equation shows that the n-ball hitting probability is the joint distribution of $Z_1$ and $W := \|Z'\|^2$. Since $Z_1 \sim N(0, \sigma^2)$, its probability density function can be expressed as
$$p(Z_1, x) := \mathrm{Prob}\{Z_1 = x\} = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{x^2}{2\sigma^2}\right),$$
and $W$ is a chi-square random variable with $n' := n - 1$ degrees of freedom:
$$p(W, y) := \mathrm{Prob}\{W = y\} = \frac{1}{\sigma^2}\,\frac{(y/\sigma^2)^{n'/2 - 1}\exp\left(-y/(2\sigma^2)\right)}{2^{n'/2}\,\Gamma(n'/2)}.$$

As a result, we can get
$$H_B(k, \theta(r, \sigma^2)) = \mathrm{Prob}\left\{-k - r \le Z_1 \le k - r,\; 0 \le \|Z'\|^2 \le k^2 - (r + Z_1)^2\right\} = \int_{x=-k-r}^{k-r}\int_{y=0}^{k^2-(x+r)^2} p(Z_1, x)\, p(W, y)\, dy\, dx = \int_{x=-k-r}^{k-r} p(Z_1, x)\int_{u=0}^{(k^2-(x+r)^2)/\sigma^2}\frac{u^{n'/2-1}\exp(-u/2)}{2^{n'/2}\,\Gamma(n'/2)}\, du\, dx \quad (\text{let } u := y/\sigma^2) = \int_{x=-k-r}^{k-r} p(Z_1, x)\, P\!\left(\frac{n'}{2}, \frac{k^2 - (x+r)^2}{2\sigma^2}\right) dx,$$
where $P(\cdot, \cdot)$ is the regularized Gamma function.
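Since the regularized Gamma function $P(\cdot,\cdot)$ is available as scipy.special.gammainc, the final expression for $H_B$ reduces to a one-dimensional quadrature; the sketch below (with illustrative parameter values, not from the paper) also cross-checks the integral against a direct Monte Carlo estimate.

import numpy as np
from scipy.integrate import quad
from scipy.special import gammainc
from scipy.stats import norm

def H_B(k, r, sigma, n):
    """n-ball hitting probability: integrate p(Z1, x) * P(n'/2, (k^2-(x+r)^2)/(2 sigma^2))."""
    n_prime = n - 1
    def integrand(x):
        return norm.pdf(x, scale=sigma) * gammainc(n_prime / 2.0,
                                                   (k**2 - (x + r)**2) / (2.0 * sigma**2))
    return quad(integrand, -k - r, k - r)[0]

rng = np.random.default_rng(1)
k, r, sigma, n = 3.0, 1.0, 0.8, 10
Z = rng.normal(0.0, sigma, size=(200_000, n))
Z[:, 0] += r                                    # realizations of c(t) + Z
print(H_B(k, r, sigma, n))                      # quadrature value
print(np.mean(np.linalg.norm(Z, axis=1) <= k))  # Monte Carlo estimate, should agree closely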

Remark 2. If an asymptotic approximation is desired for the n-ball hitting probability $H_B(k, \theta(r, \sigma^2))$, we can utilize the normal approximation to the regularized Gamma function [23, chapter 7] as
$$P\!\left(\frac{n'}{2}, \frac{k^2 - (x+r)^2}{2\sigma^2}\right) \approx \Phi\!\left(\frac{1}{\sqrt{2n'}}\left[\frac{k^2 - (x+r)^2}{\sigma^2} - n'\right]\right).$$
For the asymptotic approximation, when $n$ is sufficiently large, the term $\frac{1}{\sqrt{2n'}}\cdot\frac{k^2 - (x+r)^2}{\sigma^2}$ vanishes. Thanks to the continuity of $\Phi(\cdot)$, we can obtain
$$P\!\left(\frac{n'}{2}, \frac{k^2 - (x+r)^2}{2\sigma^2}\right) \approx \Phi\!\left(-\sqrt{\frac{n'}{2}}\right).$$
Hence,
$$H_B(k, \theta(r, \sigma^2)) \approx \Phi\!\left(-\sqrt{\frac{n'}{2}}\right)\int_{x=-k-r}^{k-r} p(Z_1, x)\, dx = \Phi\!\left(-\sqrt{\frac{n'}{2}}\right)\left[\Phi\!\left(\frac{k - r}{\sigma}\right) - \Phi\!\left(\frac{-k - r}{\sigma}\right)\right] = \Phi\!\left(-\sqrt{\frac{n'}{2}}\right)\left[\Phi\!\left(\frac{r + k}{\sigma}\right) - \Phi\!\left(\frac{r - k}{\sigma}\right)\right].$$

In addition to the asymptotic properties of $H_B(k, \theta(r, \sigma^2))$, it would be helpful to derive a lower bound for $H_B(k, \theta(r, \sigma^2))$ to facilitate our analysis in the present work.

Lemma 3 (Lower Bound for $H_B(k, \theta(r, \sigma^2))$).
$$H_B(k, \theta(r, \sigma^2)) \ge \left[\Phi\!\left(\frac{-r + k/\sqrt{n}}{\sigma}\right) - \Phi\!\left(\frac{-r - k/\sqrt{n}}{\sigma}\right)\right]\left[1 - 2\Phi\!\left(-\frac{k}{\sqrt{n}\,\sigma}\right)\right]^{n-1}.$$

Proof. Let $Y := c(t) + Z$, where $c(t) = (r, 0, 0, \ldots, 0)$ and $Z = (Z_1, Z_2, \ldots, Z_n)$. Let $\mathcal{D} := [-k/\sqrt{n}, k/\sqrt{n}]^n \subset \mathbb{R}^n$. For all $x \in \mathcal{D}$, because $\|x\| \le \sqrt{n}\,\|x\|_\infty \le \sqrt{n}\,(k/\sqrt{n}) = k$, we know that $x \in B_k(o)$. Hence, $\mathcal{D} \subset B_k(o)$, and
$$\mathrm{Prob}\{Y \in B_k(o)\} \ge \mathrm{Prob}\{Y \in \mathcal{D}\} = \mathrm{Prob}\left\{-\frac{k}{\sqrt{n}} \le r + Z_1 \le \frac{k}{\sqrt{n}}\right\}\prod_{i=2}^{n}\mathrm{Prob}\left\{-\frac{k}{\sqrt{n}} \le Z_i \le \frac{k}{\sqrt{n}}\right\} = \left[\Phi\!\left(\frac{k/\sqrt{n} - r}{\sigma}\right) - \Phi\!\left(\frac{-k/\sqrt{n} - r}{\sigma}\right)\right]\left[\Phi\!\left(\frac{k}{\sqrt{n}\,\sigma}\right) - \Phi\!\left(-\frac{k}{\sqrt{n}\,\sigma}\right)\right]^{n-1} = \left[\Phi\!\left(\frac{-r + k/\sqrt{n}}{\sigma}\right) - \Phi\!\left(\frac{-r - k/\sqrt{n}}{\sigma}\right)\right]\left[1 - 2\Phi\!\left(-\frac{k}{\sqrt{n}\,\sigma}\right)\right]^{n-1}. \qquad \square$$

For notational purposes, we let
$$\psi'(k) := \left[\Phi\!\left(\frac{-r + k/\sqrt{n}}{\sigma}\right) - \Phi\!\left(\frac{-r - k/\sqrt{n}}{\sigma}\right)\right]\left[1 - 2\Phi\!\left(-\frac{k}{\sqrt{n}\,\sigma}\right)\right]^{n-1},$$
and the antiderivative is defined as $\psi(k) := \int_{t=0}^{k}\psi'(t)\, dt$.
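Both $\psi'$ and $\psi$ are straightforward to evaluate numerically; the following sketch (illustrative parameters, not from the paper) can be used to confirm that $\psi'(k)$ indeed stays below the quadrature value of $H_B(k, \theta(r, \sigma^2))$ computed above.

import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def psi_prime(k, r, sigma, n):
    """psi'(k): the Lemma 3 lower bound for H_B(k, theta(r, sigma^2))."""
    a = k / np.sqrt(n)
    box = norm.cdf((-r + a) / sigma) - norm.cdf((-r - a) / sigma)
    return box * (1.0 - 2.0 * norm.cdf(-a / sigma)) ** (n - 1)

def psi(k, r, sigma, n):
    """psi(k) = integral of psi'(t) from 0 to k."""
    return quad(psi_prime, 0.0, k, args=(r, sigma, n))[0]

print(psi_prime(3.0, 1.0, 0.8, 10))  # should not exceed H_B(3.0, 1.0, 0.8, 10)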

Remark 4. Similarly, we can also define the n-sphere hitting density $H_S(k, \theta(r, \sigma^2))$ for the random vector $c(t) + Z$ as
$$H_S(k, \theta(r, \sigma^2)) := \mathrm{Prob}\{\|c(t) + Z\| = k\} = \mathrm{Prob}\left\{-k - r \le Z_1 \le k - r,\; W = k^2 - x^2\right\} = \int_{x=-k-r}^{k-r} p(Z_1, x)\, p(W, k^2 - x^2)\, dx.$$
Therefore, the n-ball hitting probability, $H_B(k, \theta(r, \sigma^2))$, as the cumulative function of $H_S(k, \theta(r, \sigma^2))$, can be alternatively defined as
$$\int_{y=0}^{k}\int_{x=-y-r}^{y-r} p(Z_1, x)\, p(W, y^2 - x^2)\, dx\, dy.$$
However, the density function $H_S(k, \theta(r, \sigma^2))$ serves no purpose other than a definition in the following analysis. We leave it as a side note for completeness without further discussion.

3.2. Expected particle norm

The entire PSO system can be decomposed into two fundamental components: (1) the update of attractors, to share and exchange information among particles, and (2) the update of particle positions, through the interaction between particles and attractors. Hence, as we gain understanding of the characteristics of attractors and particles, we may capture the stochastic behavior of the PSO system. More specifically, because the distance from the origin is the most important characteristic of the sphere model for its unimodality, in this section, we highlight the expected distance between particles and the global optimum. Given a probabilistic model according to which particles are distributed, we would like to know how close to the global optimum in expectation the sampled particles are. Since the global optimum is simply the origin in the sphere model, we concentrate on the $L_2$ norm of sampled particles. The expected norms of the attractor and of particles are examined, respectively. As the analysis proceeds, it can be shown that these two expectations influence the progress rate of PSO.

Given the center of a particle distribution $c(t) = (r, 0, \ldots, 0)$ and $Z = (Z_1, Z_2, \ldots, Z_n)$ with $Z_1, Z_2, \ldots, Z_n \sim N(0, \sigma^2)$, suppose that there are $m$ particles, $P_1, P_2, \ldots, P_m$, sampled as $c(t) + Z$. The expected norm of particles can be defined as
$$\overline{P} := E\left[\|c(t) + Z\|\right],$$
which can be considered as the mean solution quality of the current swarm on the sphere function. The following lemma gives an upper bound for $\overline{P}$.

Lemma 5 (Upper Bound for the Expected Particle Norm). If $c(t) = (r, 0, 0, \ldots, 0)$ and $Z = (Z_1, Z_2, \ldots, Z_n)$ with $Z_1, Z_2, \ldots, Z_n \sim N(0, \sigma^2)$, then $\overline{P} \le \sqrt{r^2 + n\sigma^2}$.

Proof. For any positive random variable $X$, since the square root is a concave function, we have $E[\sqrt{X}] \le \sqrt{E[X]}$ according to Jensen's inequality. By utilizing this property, we have the following derivation:
$$\overline{P} = E\left[\|c(t) + Z\|\right] = E\left[\sqrt{(Z_1 + r)^2 + \sum_{i=2}^{n} Z_i^2}\right] \le \sqrt{E\left[(Z_1 + r)^2 + \sum_{i=2}^{n} Z_i^2\right]} = \sqrt{r^2 + 2rE[Z_1] + \sum_{i=1}^{n} E\left[Z_i^2\right]} = \sqrt{r^2 + n\sigma^2}.$$
Because $Z_i \sim N(0, \sigma^2)$, we have $E[Z_i^2] = \sigma^2$ and $E[Z_i] = 0$. An upper bound for the expected particle norm, $\overline{P}$, is therefore obtained. $\square$
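Lemma 5 is easy to sanity-check by simulation; a short sketch with illustrative parameters:

import numpy as np

rng = np.random.default_rng(2)
r, sigma, n = 1.0, 0.8, 10
Z = rng.normal(0.0, sigma, size=(200_000, n))
Z[:, 0] += r                             # particles sampled as c(t) + Z
print(np.linalg.norm(Z, axis=1).mean())  # empirical expected particle norm
print(np.sqrt(r**2 + n * sigma**2))      # Lemma 5 upper bound, slightly larger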


The expected particle norm describes how close on average a swarm is to the global optimum, i.e., the origin, of the sphere function. In order to capture the characteristic of the essential mechanism of PSO, particle interaction, we also need to investigate the attractor. As stated in the previous section, the attractor is the best observed value, i.e., in our case, the particle with the minimum objective value within the neighborhood in the current swarm. Under the adopted statistical interpretation of PSO, the expected minimum objective value of a swarm becomes traceable through order statistics, because particles are viewed as random vectors over $\mathbb{R}^n$.

Let $P_{(i,m)}$ denote the $i$th order statistic of $\|P_1\|, \|P_2\|, \ldots, \|P_m\|$, e.g., $P_{(1,m)} = \min\{\|P_1\|, \|P_2\|, \ldots, \|P_m\|\}$. Denoting the event $\|P_i\| = x$ as $\{\|P_i\| = x\}$, the density of $P_{(1,m)}$ at a non-negative real number $x$ can be given as
$$\mathrm{Prob}\left\{P_{(1,m)} = x\right\} = \mathrm{Prob}\left\{\bigcup_{i=1}^{m}\left[\{\|P_i\| = x\} \cap \bigcap_{j \in \{1,2,\ldots,m\}\setminus\{i\}}\left\{\|P_j\| > x\right\}\right]\right\} = \binom{m}{1} H_S(x, \theta(r, \sigma^2))\left[1 - H_B(x, \theta(r, \sigma^2))\right]^{m-1}.$$

Denoting $E\left[P_{(1,m)}\right]$ as $\overline{P}_{(1,m)}$, a naive upper bound for $\overline{P}_{(1,m)}$ is derived in the following lemma.

Lemma 6. $\overline{P}_{(1,m)} \le \overline{P}$.

Proof. The general upper bound for the expected $i$th order statistic states
$$\overline{P}_{(i,m)} \le \overline{P} + \left(\mathrm{Var}\left[\|c(t) + Z\|\right]\right)^{1/2}\sqrt{\frac{i-1}{m-i+1}}.$$
As a result,
$$\overline{P}_{(1,m)} \le \overline{P} + \left(\mathrm{Var}\left[\|c(t) + Z\|\right]\right)^{1/2}\sqrt{\frac{1-1}{m-1+1}} = \overline{P}. \qquad \square$$

Lemma 6 causes no surprise. The expected minimum particle norm is obviously less than or equal to the expected norm. However, inspired by Lemma 6, we can seek another upper bound for $\overline{P}_{(1,m)}$ by definition.

Lemma 7 (Upper Bound for $\overline{P}_{(1,m)}$).
(1) $\overline{P}_{(1,m)} = \int_{x=0}^{\infty}\left[1 - H_B(x, \theta(r, \sigma^2))\right]^{m} dx$, and
(2) $\overline{P}_{(1,m)} \le \left(\lim_{h\to\infty}\left[h - \psi(h)\right]\right)^{m/2}$.

Proof. (1) For any random variable $X$, $E[|X|^r] = r\int_0^\infty t^{r-1}\,\mathrm{Prob}\{|X| > t\}\, dt$ with $r > 0$ [24]. Since $P_{(1,m)}$ is a non-negative random variable, by letting $r = 1$ we have
$$\overline{P}_{(1,m)} = \int_{x=0}^{\infty}\mathrm{Prob}\left\{P_{(1,m)} > x\right\} dx = \int_{x=0}^{\infty}\mathrm{Prob}\left\{\bigcap_{i=1}^{m}\{\|P_i\| > x\}\right\} dx = \int_{x=0}^{\infty}\left[1 - H_B(x, \theta(r, \sigma^2))\right]^{m} dx.$$
(2) Based on the result of (1), we obtain
$$\overline{P}_{(1,m)} = \int_{x=0}^{\infty}\left[1 - H_B(x, \theta(r, \sigma^2))\right]^{m} dx \le \int_{x=0}^{\infty}\left[1 - \psi'(x)\right]^{m} dx.$$
By resorting to Hölder's inequality, we can move $m$ outside of the integration to obtain a more comprehensible bound as
$$\int_{x=0}^{\infty}\left[1 - \psi'(x)\right]^{m} dx \le \left(\int_{x=0}^{\infty}\left[1 - \psi'(x)\right]^{2} dx\right)^{m/2} \le \left(\int_{x=0}^{\infty}\left[1 - \psi'(x)\right] dx\right)^{m/2} = \left(\lim_{h\to\infty}\left[h - \psi(h)\right]\right)^{m/2}. \qquad \square$$


Because this upper bound is presented in a limit form, a subsequent question would be whether or not it converges. The following lemma guarantees the convergence of the quantity.

Lemma 8. $\left(\lim_{h\to\infty}\left[h - \psi(h)\right]\right)^{m/2}$ is convergent.

Proof. Denote $\int_{x=0}^{h}\left[1 - \psi'(x)\right] dx$ as $G(h)$. Since $m$ is a constant, $\left(\lim_{h\to\infty}\left[h - \psi(h)\right]\right)^{m/2}$ converges if $\lim_{h\to\infty} G(h)$ converges. $G(h)$ is increasing because $1 - \psi'(x)$ is always non-negative. Thus, it is sufficient to show that $G(h)$ is bounded from above. When $h > r\sqrt{n}$,
$$G(h) = \int_{x=0}^{h}\left[1 - \psi'(x)\right] dx \le \int_{x=0}^{r\sqrt{n}} dx + \int_{x=r\sqrt{n}}^{h}\left(1 - \left[\Phi\!\left(\frac{x/\sqrt{n} - r}{\sigma}\right) - \Phi\!\left(\frac{-x/\sqrt{n} - r}{\sigma}\right)\right]\left[1 - 2\Phi\!\left(-\frac{x}{\sqrt{n}\,\sigma}\right)\right]^{n-1}\right) dx \le r\sqrt{n} + \int_{x=r\sqrt{n}}^{h}\left(1 - \left[\Phi\!\left(\frac{x/\sqrt{n} - r}{\sigma}\right) - \Phi\!\left(\frac{r - x/\sqrt{n}}{\sigma}\right)\right]\left[1 - 2\Phi\!\left(\frac{r - x/\sqrt{n}}{\sigma}\right)\right]^{n-1}\right) dx = r\sqrt{n} + \int_{x=r\sqrt{n}}^{h}\left(1 - \left[1 - 2\Phi\!\left(\frac{r - x/\sqrt{n}}{\sigma}\right)\right]^{n}\right) dx.$$
When $x \ge r\sqrt{n}$,
$$\Phi\!\left(\frac{r - x/\sqrt{n}}{\sigma}\right) \le \frac{1}{2}.$$
By applying Bernoulli's inequality, we can get
$$G(h) \le r\sqrt{n} + \int_{x=r\sqrt{n}}^{h} 2n\,\Phi\!\left(\frac{r - x/\sqrt{n}}{\sigma}\right) dx = r\sqrt{n} + 2n\left[\left(x - r\sqrt{n}\right)\Phi\!\left(\frac{r - x/\sqrt{n}}{\sigma}\right) - \sigma\sqrt{n}\cdot\phi\!\left(\frac{r - x/\sqrt{n}}{\sigma}\right)\right]_{x=r\sqrt{n}}^{x=h}.$$
The integration of the normal distribution is given in [25]. When $h \to \infty$, the term $\sigma\sqrt{n}\cdot\phi\!\left(\frac{r - h/\sqrt{n}}{\sigma}\right)$ vanishes. Thus, we only need to show
$$\lim_{h\to\infty}\left(h - r\sqrt{n}\right)\Phi\!\left(\frac{r - h/\sqrt{n}}{\sigma}\right) < \infty.$$
Here we apply Mill's ratio to replace $\Phi(\cdot)$ with $\phi(\cdot)$ and get
$$\left(h - r\sqrt{n}\right)\Phi\!\left(\frac{r - h/\sqrt{n}}{\sigma}\right) = \left(h - r\sqrt{n}\right)\left[1 - \Phi\!\left(\frac{h/\sqrt{n} - r}{\sigma}\right)\right] \le \left(h - r\sqrt{n}\right)\cdot\phi\!\left(\frac{h/\sqrt{n} - r}{\sigma}\right)\cdot\left(\frac{h/\sqrt{n} - r}{\sigma}\right)^{-1} = \sigma\sqrt{n}\cdot\phi\!\left(\frac{h/\sqrt{n} - r}{\sigma}\right) \to 0 \text{ as } h \to \infty. \qquad \square$$

3.3. Lower and upper bounds for the expected progress rate

With the groundwork laid in the previous sections, the progress rate of the social-only model of PSO can now be formally investigated under the proposed statistical interpretation. The term "progress rate" was introduced by Rechenberg in 1973 [26]. As the name suggests, the progress rate should be a quantity indicating how a particle swarm progresses, and hence, in the present work, it is defined as the difference of the norms of the two distribution centers in successive time steps, because the distance to the optimum is the $L_2$ norm for the sphere function. Given the current center of distribution $c(t) = (r, 0, 0, \ldots, 0)$ and a random vector $Z = (Z_1, Z_2, \ldots, Z_n)$ with $Z_1, Z_2, \ldots, Z_n \sim N(0, \sigma^2)$, the $m$ particles $P_1, P_2, \ldots, P_m$ are sampled as $c(t) + Z$. Let $P_{(i,m)}$ denote the $i$th order statistic of $\|P_1\|, \|P_2\|, \ldots, \|P_m\|$, and let $P^* := \arg\min\{F(P_1), F(P_2), \ldots, F(P_m)\}$. By definition, $\|P^*\| = P_{(1,m)}$. According to the update rules described in Section 2.2, the updated position $P'_i$ is computed as $P'_i = P_i + C_i \otimes (P^* - P_i)$, where each coordinate of $C_i$ is distributed according to $U(0, c)$, with $c$ being the coefficient representing the compound effect of both the inertia weight and the acceleration coefficient of the social part. For simplicity, we still call $c$ the acceleration coefficient in this paper because the inertia weight is usually constant. The center of distribution in the next step, $c(t+1)$, is the mean of $P'_1, P'_2, \ldots, P'_m$, i.e., $c(t+1) = (\sum_{i=1}^{m} P'_i)/m$.

Definition 9. Given $c(t) = (r, 0, 0, \ldots, 0)$, the progress rate is
$$\Delta_t := \|c(t)\| - \|c(t+1)\| = r - \|c(t+1)\|.$$

The following theorem shows that, when $c \le 1/2$, the expected norm of the center of distribution in the next time step is bounded from above by a linear combination of the expected particle norm $\overline{P}$ and the expected minimum particle norm $\overline{P}_{(1,m)}$.

Lemma 10. Suppose $C = (C_1, C_2, \ldots, C_n)$ is a random vector of $\mathbb{R}^n$ with i.i.d. components and $X$ is a random vector of $\mathbb{R}^n$. If $C$ and $X$ are independent, then $E\left[\|C \otimes X\|\right] \le \sqrt{\mu_2'}\, E\left[\|X\|\right]$, where $\mu_2'$ is the second moment of $C_i$.

Proof. For any fixed vector $x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$,
$$E\left[\|C \otimes x\|\right] = E\left[\sqrt{\sum_{i=1}^{n} C_i^2 x_i^2}\right] \le \sqrt{E\left[\sum_{i=1}^{n} C_i^2 x_i^2\right]} = \sqrt{\sum_{i=1}^{n} E\left[C_i^2\right] x_i^2} = \sqrt{\mu_2'}\,\|x\|.$$
Since $C$ and $X$ are independent, by the law of total expectation conditional on $X$, this lemma is proved. $\square$

Theorem 11 (Upper Bound for the Expected Norm of the Next Center). (1) $E\left[\|c(t+1)\|\right] \le E\left[|1 - C|\right]\overline{P} + E\left[|C|\right]\overline{P}_{(1,m)}$, where $C \sim U(0, c)$; and (2) if $c \le 1/2$, then $E\left[\|c(t+1)\|\right] \le (1 - c)\,\overline{P} + c\,\overline{P}_{(1,m)}$; otherwise, $E\left[\|c(t+1)\|\right] \le \left[(2c^2 - 2c + 1)/2c\right]\overline{P} + c\,\overline{P}_{(1,m)}$.

Proof. This result is derived from the triangle inequality for the $L_2$ norm and the previous lemma:
$$E\left[\|c(t+1)\|\right] = E\left[\left\|\frac{\sum_{i=1}^{m}\left(P_i + C_i \otimes (P^* - P_i)\right)}{m}\right\|\right] = \frac{1}{m}\, E\left[\left\|\sum_{i=1}^{m}(1 - C_i)\otimes P_i + C_i \otimes P^*\right\|\right] \le \frac{1}{m}\left(\sum_{i=1}^{m} E\left[\|(1 - C_i)\otimes P_i\|\right] + m\, E\left[\|C_i \otimes P^*\|\right]\right) \le \left(c^2/3 - c + 1\right)^{1/2}\overline{P} + \left(c^2/3\right)^{1/2}\overline{P}_{(1,m)}. \qquad \square$$

Corollary 12 (Lower Bound for the Progress Rate). $E[\Delta_t] \ge r - \left(c^2/3 - c + 1\right)^{1/2}\overline{P} - \left(c^2/3\right)^{1/2}\overline{P}_{(1,m)}$.

After the lower bound for $E[\Delta_t]$ is established in Corollary 12, the next theorem sets a lower bound for $E\left[\|c(t+1)\|\right]$. An upper bound for $E[\Delta_t]$ will be accordingly obtained as a corollary.

Theorem 13 (Lower Bound for the Expected Norm of the Next Center). If $c \le 1$, then
$$E\left[\|c(t+1)\|\right] \ge r\left(1 - \exp\left(-2n'\left[\Phi(-r/\sigma)\right]^{m}\right)\right).$$

Proof. Since $\|c(t+1)\|$ is a non-negative random variable, from Markov's inequality we have, for any positive number $a$,
$$\mathrm{Prob}\left\{\|c(t+1)\| > a\right\} \le a^{-1} E\left[\|c(t+1)\|\right].$$
Substituting $a$ with $r$,
$$r\,\mathrm{Prob}\left\{\|c(t+1)\| > r\right\} \le E\left[\|c(t+1)\|\right].$$
Let the $j$th coordinates of $P_i$, $C_i$, and $c(t+1)$ be $P_{ij}$, $C_{ij}$, and $c(t+1)_j$, respectively. If there exists a coordinate $j$ such that $\min\{P_{1j}, P_{2j}, \ldots, P_{mj}\} \ge r$, then
$$\|c(t+1)\| \ge |c(t+1)_j| = \left|\frac{\sum_{i=1}^{m}\left[P_{ij} + C_{ij}\left(P^*_j - P_{ij}\right)\right]}{m}\right| = \left|\frac{\sum_{i=1}^{m}\left[\left(1 - C_{ij}\right)P_{ij} + C_{ij}P^*_j\right]}{m}\right| \ge \frac{\left(1 - C_{ij}\right)mr + C_{ij}mr}{m} = r.$$
Similarly, $\max\{P_{1j}, P_{2j}, \ldots, P_{mj}\} \le -r$ implies $\|c(t+1)\| \ge r$. Let $E_j^+$ be the event that $\min\{P_{1j}, P_{2j}, \ldots, P_{mj}\} \ge r$ and $E_j^-$ be the event that $\max\{P_{1j}, P_{2j}, \ldots, P_{mj}\} \le -r$. Let $E_j := E_j^+ \cup E_j^-$ and $E := \bigcup_{j=1}^{n} E_j$. We have
$$\mathrm{Prob}\{E\} = \mathrm{Prob}\left\{E \cap E_1^+\right\} + \mathrm{Prob}\left\{E \cap \left(E_1^+\right)^c\right\} \ge \mathrm{Prob}\left\{E_1^+\right\} + \mathrm{Prob}\left\{\left(\bigcup_{i=2}^{n} E_i\right)\cap\left(E_1^+\right)^c\right\} = \mathrm{Prob}\left\{E_1^+\right\} + \left(1 - \mathrm{Prob}\left\{E_1^+\right\}\right)\mathrm{Prob}\left\{\bigcup_{i=2}^{n} E_i\right\}.$$
Because $P_1, P_2, \ldots, P_m$ are i.i.d. and, for each particle, all of its coordinates other than the first one are identically distributed, for all $i > 1$ the symmetry and disjointness of $E_i^+$ and $E_i^-$ imply that $\mathrm{Prob}\{E_i\} = 2\,\mathrm{Prob}\left\{E_i^+\right\} = 2\left[1 - \Phi(r/\sigma)\right]^{m} = 2\left[\Phi(-r/\sigma)\right]^{m}$. Let $q := 2\left[\Phi(-r/\sigma)\right]^{m}$ for convenience of notation. By using the inclusion–exclusion principle, we have
$$\mathrm{Prob}\left\{\bigcup_{i=2}^{n} E_i\right\} = \sum_{i=1}^{n'}\binom{n'}{i}q^{i}(-1)^{i+1} = 1 - \sum_{i=0}^{n'}\binom{n'}{i}(-q)^{i} = 1 - (1 - q)^{n'} \ge 1 - \exp\left(-n'q\right).$$
As a result,
$$E\left[\|c(t+1)\|\right] \ge r\left[\mathrm{Prob}\left\{E_1^+\right\} + \left(1 - \mathrm{Prob}\left\{E_1^+\right\}\right)\left(1 - \exp\left(-n'q\right)\right)\right] \ge r\left[\mathrm{Prob}\left\{E_1^+\right\} + 1 - \mathrm{Prob}\left\{E_1^+\right\} - \exp\left(-n'q\right)\right] = r\left(1 - \exp\left(-2n'\left[\Phi(-r/\sigma)\right]^{m}\right)\right). \qquad \square$$

Corollary 14 (Upper Bound for the Progress Rate). If $c \le 1$, then $E[\Delta_t] \le r\exp\left(-2n'\left[\Phi(-r/\sigma)\right]^{m}\right)$.

With Theorems 11 and 13, we established the upper and lower bounds of the expected norm of the next distribution center. Accordingly, with Corollaries 12 and 14, we derived the lower and upper bounds of the expected progress rate of a swarm in the social-only model. As aforementioned, by statistically interpreting the social-only model of PSO, we can describe the "macrostate" of the particle swarm and are therefore able to analyze the stochastic behavior of PSO on the facet of particle interaction.
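Corollaries 12 and 14 bracket the one-step progress on the sphere function; the following simulation sketch of one update step of Algorithm 2 (illustrative parameters, not from the paper) estimates $E[\Delta_t]$ and evaluates the Corollary 14 bound for comparison.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
m, n, c, r, sigma, trials = 20, 10, 0.5, 1.0, 0.8, 20_000
deltas = []
for _ in range(trials):
    P = rng.normal(0.0, sigma, size=(m, n))
    P[:, 0] += r                                       # center c(t) = (r, 0, ..., 0)
    P_star = P[np.argmin(np.sum(P ** 2, axis=1))]      # attractor: best sphere value
    C = rng.uniform(0.0, c, size=(m, n))
    center_next = (P + C * (P_star - P)).mean(axis=0)  # c(t + 1)
    deltas.append(r - np.linalg.norm(center_next))     # progress rate Delta_t
upper = r * np.exp(-2 * (n - 1) * norm.cdf(-r / sigma) ** m)
print(np.mean(deltas), "<=", upper)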


4. Convergence analysis

As stated in Section 2.2, the transition from the current time step to the next consists of updating the positions of particles, calculating the distribution center by means of the updated positions, and using maximum likelihood estimation to calculate the distribution variance. The issues related to the centers of distributions have been addressed in Section 3. Thus, the variance is considered in this section. While the center of a distribution can be viewed as an indication of the average quality of the swarm at a specific time step, the variance is a direct measurement of convergence, because, from the viewpoint of the statistical interpretation, a swarm converges as the variance of the distribution reduces to zero. The word "converge" is not a unified term in the research domain of PSO [27, p. 132]. It has been used to describe the behavior of a swarm approaching the local optimum in some papers, while in other papers it simply indicates the phenomenon that a swarm of particles crowds into a specific point in the search space, sometimes called the equilibrium, not necessarily the local optimum. Here, in the present work, we adopt the latter definition. We concentrate on the condition under which a swarm of particles may go into a stable state. We will demonstrate that, if a certain condition on the relationship between the swarm size and the acceleration coefficient is satisfied, a swarm in the social-only model does converge under the mechanism of particle interaction.

Given $m$ observed vectors $y_1, y_2, \ldots, y_m$ that stand for the updated positions, the distribution center is denoted as $c(t+1) = \bar{y} := (\sum_{i=1}^{m} y_i)/m$. Let $Y_1, Y_2, \ldots, Y_m$ be random vectors sampled from $\theta(\|\bar{y}\|, \sigma_{t+1}^2)$. These vectors are $n$-dimensional random vectors centered at $\bar{y}$, and the coordinate on each dimension is a random variable sampled from $N(0, \sigma_{t+1}^2)$, where $\sigma_{t+1}^2$ is the variance that we wish to estimate. In order to estimate the variance, the likelihood function of $\sigma_{t+1}^2$, $L(\sigma_{t+1}^2)$, can be defined as the joint probability:
$$L(\sigma_{t+1}^2) := \prod_{i=1}^{m}\left(\frac{1}{\sqrt{2\pi}\,\sigma_{t+1}}\right)^{n}\exp\left(-\frac{d(y_i, \bar{y})^2}{2\sigma_{t+1}^2}\right) = \left(\frac{1}{\sqrt{2\pi}\,\sigma_{t+1}}\right)^{mn}\exp\left(-\sum_{i=1}^{m}\frac{d(y_i, \bar{y})^2}{2\sigma_{t+1}^2}\right) = K\,\sigma_{t+1}^{-mn}\exp\left(-\frac{R}{2\sigma_{t+1}^2}\right),$$
where
$$K := \left(\frac{1}{\sqrt{2\pi}}\right)^{mn}, \qquad R := \sum_{i=1}^{m} d(y_i, \bar{y})^2.$$

In order to get the $\sigma_{t+1}^2$ that maximizes $L(\sigma_{t+1}^2)$, we differentiate $L(\sigma_{t+1}^2)$ with respect to $\sigma_{t+1}^2$:
$$L'(\sigma_{t+1}^2) = -\frac{mn}{2}\,K\,\sigma_{t+1}^{-mn-2}\exp\left(-\frac{R}{2\sigma_{t+1}^2}\right) + \frac{R}{2}\,K\,\sigma_{t+1}^{-mn-4}\exp\left(-\frac{R}{2\sigma_{t+1}^2}\right).$$

$L'(\sigma_{t+1}^2) = 0$ implies $\sigma_{t+1}^2 = R/(mn)$, and it is routine to check the maximality. Since both $m$ and $n$ are fixed, the only quantity that needs to be examined is $R$, the sum of the squared distances between each updated particle and the center. Given $c(t) = (r, 0, 0, \ldots, 0)$ and $Z = (Z_1, Z_2, \ldots, Z_n)$ with $Z_1, Z_2, \ldots, Z_n \sim N(0, \sigma_t^2)$, the $m$ particles $P_1, P_2, \ldots, P_m$ are sampled from $c(t) + Z$, and the updated position is calculated as $P_i + C_i \otimes (P^* - P_i)$, where $P^*$ is the attractor. Since $c(t+1) = \sum_{i=1}^{m}\left[P_i + C_i \otimes (P^* - P_i)\right]/m$, $R$, as a random variable, can be defined by $P_1, P_2, \ldots, P_m$ and $P^*$:
$$R = \sum_{i=1}^{m}\left\|P_i + C_i \otimes (P^* - P_i) - \sum_{j=1}^{m}\frac{P_j + C_j \otimes (P^* - P_j)}{m}\right\|^2.$$

Denoting the $k$th coordinates of $P_i$ and $P^*$ as $P_{ik}$ and $P^*_k$, respectively, the expectation of $R$, $E[R]$, can be derived in the following lemma:

Lemma 15. Given the swarm size $m$ and the variance of the distribution at time $t$, $\sigma_t^2 = \sigma^2$,
$$E\left[\sigma_{t+1}^2\right] \le \frac{(m-1)\,\sigma^2}{12m}\left[\left(5 + \frac{3(m-1)}{n}\right)c^2 - 6c + 12\right].$$

Proof. Defining $R_j$ as
$$R_j := \sum_{i=1}^{m}\left(P_{ij} + C_{ij}\left(P^*_j - P_{ij}\right) - \sum_{k=1}^{m}\frac{P_{kj} + C_{kj}\left(P^*_j - P_{kj}\right)}{m}\right)^2$$
yields $R = \sum_{j=1}^{n} R_j$, and we can obtain, for $j > 1$,
$$E\left[R_j\right] = \left(\frac{m-1}{12m}\right)\left\{\sigma^2\left(4c^2 - 6c + 12\right) + c^2 E\left[\left(P^*_j\right)^2\right]\right\} \le \left(\frac{m-1}{12m}\right)\sigma^2\left(5c^2 - 6c + 12\right).$$
The last inequality follows from the fact that the independence of coordinates implies $E\left[\left(P^*_j\right)^2\right] \le \sigma^2$. Moreover,
$$E\left[R_1\right] = \left(\frac{m-1}{12m}\right)\left\{\sigma^2\left(4c^2 - 6c + 12\right) + c^2 E\left[\left(P^*_1 - r\right)^2\right]\right\}.$$
Since $E\left[\left(P^*_1 - r\right)^2\right]$ is less than or equal to the expected value of the extreme order statistic of $T_1^2, T_2^2, \ldots, T_m^2$, where $T_i \sim N(0, \sigma^2)$, by using the upper bound for the extreme order statistic [28],
$$E\left[\left(P^*_1 - r\right)^2\right] \le \sigma^2\left(1 + \sqrt{3(m-1)}\right).$$
As a consequence,
$$E\left[\sigma_{t+1}^2\right] = E[R]/(mn) \le \frac{(m-1)\,\sigma^2}{12m}\left[\left(5 + \frac{3(m-1)}{n}\right)c^2 - 6c + 12\right]. \qquad \square$$

While Lemma 15 is under the assumption that $\sigma_t^2$ is given, or, more formally, the conditional expectation $E\left[\sigma_{t+1}^2 \mid \sigma_t^2 = \sigma^2\right]$ is derived, the following theorem indicates the relationship between $E\left[\sigma_t^2\right]$ and $E\left[\sigma_{t+1}^2\right]$ and gives a sufficient and necessary condition that the sequence $\left\{E\left[\sigma_t^2\right]\right\}$ converges to zero. Without loss of generality for the normal operation of PSO, we assume that $E\left[\sigma_0^2\right] < \infty$.

Theorem 16 (Convergence of the Expectation of Variance). Let $\kappa := 3(m-1)/n$. If $c$ satisfies the condition that
$$\frac{3 - \sqrt{9 + \frac{60 + 12\kappa}{m-1}}}{5 + \kappa} < c < \frac{3 + \sqrt{9 + \frac{60 + 12\kappa}{m-1}}}{5 + \kappa},$$
then $\lim_{t\to\infty} E\left[\sigma_t^2\right] = 0$.

Proof. The law of total expectation and Lemma 15 imply that
$$E\left[\sigma_{t+1}^2\right] \le \frac{(m-1)}{12m}\left[\left(5 + \frac{3(m-1)}{n}\right)c^2 - 6c + 12\right] E\left[\sigma_t^2\right].$$
Therefore, $\left\{E\left[\sigma_t^2\right]\right\}$ is upper-bounded by the geometric sequence with the first term $E\left[\sigma_0^2\right]$ and the ratio
$$\frac{(m-1)}{12m}\left[\left(5 + \frac{3(m-1)}{n}\right)c^2 - 6c + 12\right].$$
By solving
$$\frac{(m-1)}{12m}\left[\left(5 + \frac{3(m-1)}{n}\right)c^2 - 6c + 12\right] < 1,$$
the theorem is proved. $\square$
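The admissible range of the acceleration coefficient in Theorem 16 is explicit, so a configuration can be checked with a few lines of arithmetic; a sketch:

import math

def convergence_range(m, n):
    """c-interval of Theorem 16 under which E[sigma_t^2] -> 0 in the social-only model."""
    kappa = 3.0 * (m - 1) / n
    disc = math.sqrt(9.0 + (60.0 + 12.0 * kappa) / (m - 1))
    return (3.0 - disc) / (5.0 + kappa), (3.0 + disc) / (5.0 + kappa)

print(convergence_range(m=20, n=30))  # e.g., a swarm of 20 on the 30-dimensional sphere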

Since $\sigma_t^2$ takes values on the non-negative real numbers, the convergence of the sequence $\left\{E\left[\sigma_t^2\right]\right\}$ implies that the sequence $\left\{\sigma_t^2\right\}$ converges to zero in probability, as shown in the following corollary.

Corollary 17 (Convergence of Variance). If $\lim_{t\to\infty} E\left[\sigma_t^2\right] = 0$, then $\sigma_t^2 \xrightarrow{p} 0$, i.e., for every $\epsilon > 0$, $\lim_{t\to\infty}\mathrm{Prob}\left\{\sigma_t^2 \ge \epsilon\right\} = 0$.
