

Convergence Models of Genetic Algorithm Selection Schemes

Dirk Thierens¹* and David Goldberg²

1 Department of Electrical Engineering ESAT-lab KU.Leuven

Kardinaal Mercierlaan 94 B-3001 Leuven, Belgium

2 Department of General Engineering University of Illinois at Urbana-Champaign

104 South Mathews Avenue Urbana, IL 61801, USA

Abstract. We discuss the use of normal distribution theory as a tool to model the convergence characteristics of different GA selection schemes. The models predict the proportion of optimal alleles as a function of the number of generations when optimizing the bit-counting function. The selection schemes analyzed are proportionate selection, tournament selection, truncation selection and elitist recombination. Simple yet accurate models are derived that deviate only slightly from the experimental results. It is argued that this small difference is due to the build-up of covariances between the alleles, a phenomenon called linkage disequilibrium in quantitative genetics. We conclude with a brief discussion of this linkage disequilibrium.

1 Introduction

Many different genetic algorithm implementations exist, and due to the lack of a closed-form analytical GA theory it is difficult to compare them objectively. By assuming that the fitness function is normally distributed, we can apply the theory of normal distributions as an analysis tool. This kind of analysis is a standard approach in quantitative genetics [2,3] and allows accurate modeling of the GA convergence characteristics.

In [5] a comparative analysis of GA selection schemes was given that modeled the selection behavior in terms of deterministic finite difference equations. The difference equations described the change in proportion of different classes of individuals, assuming fixed and identical fitness function values within each class. Since only selection is used, no new fitness values are ever generated. Convergence time is characterized by the takeover time, which is the number of generations needed for the best individual in the population to take over the entire population.

* The first author acknowledges the support provided by the Flemish Community under the Concerted Action Project No. 90/94-4. Correspondence can be sent to [email protected].

In this paper we take an alternative view by modeling different selection schemes when optimizing a normally distributed fitness function. There are two main advantages of using this approach. First, restricting ourselves to normally distributed fitness functions allows simple yet accurate modeling of the convergence characteristics of selection and recombination together. In order to maintain the normal distribution assumptions as well as possible, we use uniform crossover, which maximally decorrelates the alleles. Since there is no epistatic effect between the genes, the population mean fitness does not change after recombination.

Second, the normal fitness distribution modeling allows us to analyze the build-up of covariances between the alleles caused by selection, the so-called linkage disequilibrium [2,3]. In previous work we discussed the mixing problem in genetic algorithms [7,10], and since mixing failure is basically an extreme build-up of linkage disequilibrium it is instructive to model the covariance between the genes.

The convergence models predict the proportion of optimal alleles $p(t)$ in the total population as a function of the number of generations $t$. Experimental results are obtained by optimizing the bit-counting function, which is binomially distributed and can be well approximated by a normal distribution. The population mean fitness at generation $t$ is given by $\bar{f}(t) = l\,p(t)$ and the variance by $\sigma^2(t) = l\,p(t)(1 - p(t))$, where $l$ is the string length.

The experiments in this paper all use $l = 100$ and a population size $n = 200$. As shown in [6], a population size $n = 2l$ allows reliable decision making for the bit-counting function.
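The binomial mean and variance quoted above can be checked on a random population. The following is a minimal sketch, assuming each bit is drawn independently with probability p; the helper names (`bit_count`, `random_population`, `mean_and_variance`) and the seed are illustrative and not taken from the paper.

```python
import random

def bit_count(individual):
    """Bit-counting fitness: the number of ones in the string."""
    return sum(individual)

def random_population(n, l, p, rng):
    """n bit strings of length l; each bit is 1 with probability p."""
    return [[1 if rng.random() < p else 0 for _ in range(l)] for _ in range(n)]

def mean_and_variance(values):
    """Population (not sample) mean and variance."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / len(values)
    return m, v

rng = random.Random(0)          # fixed seed, chosen arbitrarily
l, n, p = 100, 200, 0.5         # string length and population size used in the paper
population = random_population(n, l, p, rng)
fitness = [bit_count(ind) for ind in population]
mean_f, var_f = mean_and_variance(fitness)

print("mean fitness:", mean_f, "model l*p:", l * p)
print("variance:", var_f, "model l*p*(1-p):", l * p * (1 - p))
```

With only 200 individuals the measured values fluctuate around the model values of 50 and 25.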

In the next four sections we analyze different GA selection schemes, namely proportionate selection, tournament selection, truncation selection and finally elitist recombination. We conclude with a brief discussion of the allele covariance build-up or linkage disequilibrium.

2 Proportionate Selection

Proportionate selection is the oldest and still best-known selection scheme in evolutionary computation and implements the idea that reproduction rates are proportional to the fitness value $f$ [4,8]. The probability of selecting an individual $i$ with fitness $f_i$ and proportion $P_i(t)$ at generation $t$ is given by:

\[ P_i(t^s) = P_i(t)\,\frac{f_i}{\bar{f}(t)} \]

with $\bar{f}(t)$ the mean fitness of the population at generation $t$, and $t^s$ denoting generation $t$ after selection.

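The increment in population mean fitness can easily be computed:

\[
\begin{aligned}
\bar{f}(t^s) - \bar{f}(t) &= \sum_{i=1}^{n} P_i(t^s)\,f_i - \bar{f}(t) \\
&= \sum_{i=1}^{n} P_i(t)\,\frac{f_i^2}{\bar{f}(t)} - \bar{f}(t) \\
&= \frac{1}{\bar{f}(t)}\left(\overline{f^2}(t) - \bar{f}(t)^2\right) \\
&= \frac{\sigma^2(t)}{\bar{f}(t)}
\end{aligned}
\]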

Recombination does not change the population mean fitness when optimizing the bit-counting function ($\bar{f}(t+1) = \bar{f}(t^s)$), so the increment of the population average fitness with proportionate selection is proportional to the fitness variance and inversely proportional to the mean fitness:

\[ \bar{f}(t+1) - \bar{f}(t) = \frac{\sigma^2(t)}{\bar{f}(t)} \tag{1} \]
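Equation 1 can be checked numerically on any list of positive fitness values. The sketch below computes the expected mean fitness after proportionate selection directly from the selection probabilities and compares the gain with the variance divided by the mean. The fitness values and function names are made up for illustration.

```python
def proportionate_selection_gain(fitness):
    """Expected mean-fitness gain after proportionate selection.

    Individual i is selected with probability f_i / sum(f), so the expected
    fitness of a selected individual is sum(f_i^2) / sum(f_i).
    """
    mean_before = sum(fitness) / len(fitness)
    mean_after = sum(f * f for f in fitness) / sum(fitness)
    return mean_after - mean_before

def variance_over_mean(fitness):
    """Right-hand side of equation 1: population variance divided by the mean."""
    mean_f = sum(fitness) / len(fitness)
    var_f = sum((f - mean_f) ** 2 for f in fitness) / len(fitness)
    return var_f / mean_f

# Hypothetical fitness values with mean 50 (illustration only).
fitness = [48, 52, 55, 45, 50, 61, 39, 50]
gain = proportionate_selection_gain(fitness)
ratio = variance_over_mean(fitness)
print(gain, ratio)  # both 0.75 for this example
```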

Since $\bar{f}(t) = l\,p(t)$ and $\sigma^2(t) = l\,p(t)(1 - p(t))$, the increase of the proportion $p(t)$ of optimal alleles is given by

\[ l\,\bigl(p(t+1) - p(t)\bigr) = \frac{l\,p(t)(1 - p(t))}{l\,p(t)} = 1 - p(t) \]

Approximating the difference equation with the corresponding differential equation, we obtain a simple convergence model expressing the proportion $p(t)$ as a function of the number of generations $t$:

\[ \frac{dp(t)}{dt} = \frac{1}{l}\,\bigl(1 - p(t)\bigr) \]

which has as solution

\[ p(t) = 1 - \bigl(1 - p(0)\bigr)\,e^{-t/l}. \]

Starting from a random initial population, $p(0) = 0.5$, the convergence model becomes:

\[ p(t) = 1 - 0.5\,e^{-t/l} \tag{2} \]

To calculate the convergence speed we compute the number of generations $g_{conv}$ it takes to let the proportion $p(t)$ come arbitrarily close to 1, that is $p(g_{conv}) = 1 - \epsilon$:

\[ g_{conv} = -l \ln(2\epsilon) \]

For $\epsilon = 1/(2l)$ the convergence time is:

\[ g_{conv} = l \ln(l) \tag{3} \]
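As a quick numerical illustration of equations 2 and 3, the following sketch evaluates the model and the convergence time for the string length used in the experiments. The function names are illustrative.

```python
import math

def proportionate_model(t, l, p0=0.5):
    """Proportion of optimal alleles after t generations: 1 - (1 - p0) * exp(-t / l)."""
    return 1.0 - (1.0 - p0) * math.exp(-t / l)

def generations_to_reach(epsilon, l):
    """Generations until p(t) = 1 - epsilon under equation 2, i.e. with p(0) = 0.5."""
    return -l * math.log(2.0 * epsilon)

l = 100
g_conv = generations_to_reach(1.0 / (2 * l), l)
print(g_conv, l * math.log(l))         # both about 460.5 generations
print(proportionate_model(g_conv, l))  # 1 - 1/(2l) = 0.995
```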

In figure 1 we have plotted the model for a string length $l = 100$ and compared it with experimental results when implementing proportionate selection with stochastic universal sampling. The model predicts the experimental results very well. The rate of convergence with proportionate selection is extremely slow, and it slows down drastically when approaching the solution.

Instead of calculating the convergence speed by computing how long it takes to let the proportion $p(t)$ come arbitrarily close to 1, we might also compute the generation at which the global optimum is expected to be found with a given probability. The probability that at least one of the strings in the population consists of all ones is given by:

\[ \mathrm{prob}(opt) = 1 - \left[1 - p^l(t)\right]^n \]

From this equation we can calculate the needed allele proportion $p(g_{opt})$ and plug this into equation 2 to obtain the number of generations it takes to find the optimum with given probability $\mathrm{prob}(opt)$. For instance, in our example with string length $l = 100$, population size $n = 200$ and $\mathrm{prob}(opt) = 99\%$, we find that $p(g_{opt}) = 0.96$ and $g_{opt} = 260$.
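The two numbers in this example can be reproduced by inverting the probability expression for $p$ and then inverting equation 2 for $t$. The sketch below does exactly that. The function names are illustrative.

```python
import math

def required_allele_proportion(prob_opt, l, n):
    """Invert prob(opt) = 1 - (1 - p**l)**n for the allele proportion p."""
    return (1.0 - (1.0 - prob_opt) ** (1.0 / n)) ** (1.0 / l)

def generations_to_proportion(p_target, l):
    """Invert equation 2, p(t) = 1 - 0.5 * exp(-t / l), for t."""
    return -l * math.log(2.0 * (1.0 - p_target))

l, n = 100, 200
p_opt = required_allele_proportion(0.99, l, n)
g_opt = generations_to_proportion(p_opt, l)
print(round(p_opt, 2), round(g_opt))  # 0.96 260
```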

In order to obtain better convergence characteristics for proportionate selection one has to use some form of fitness scaling [4]. Instead of scaling the fitness function explicitly, it is also possible to use a selection scheme that does not look at the absolute function value but instead performs selection according to the number of individuals in the population that are better or worse. Examples of this rank-based selection principle are tournament selection, truncation selection and elitist recombination, which we will discuss in the next sections.

3 Tournament Selection

Tournament selection randomly chooses a set of individuals and picks out the best for reproduction. The number of individuals in the set is mostly equal to two, but larger tournament sizes can be used in order to increase the selection pressure [10]. Here we consider the case of optimizing the bit-counting function with a tournament size $s = 2$. Under the assumption of a normally distributed function, the fitness difference between two randomly sampled individuals in each tournament is also normally distributed, with mean $\overline{\Delta f}(t) = 0$ and variance $\sigma^2_{\Delta f}(t) = 2\sigma^2(t)$. Since we are only selecting the best of the two competing individuals, we are actually only looking at the absolute value of the fitness difference. The average fitness difference between two randomly sampled individuals is thus given by the mean value of those differences that are greater than $\Delta f = 0$, which is equivalent to the mean value of one half of the normal distribution truncated at its mean value 0. The mean value of the right half of a standard normal distribution is given by $\sqrt{2/\pi} = 0.7979$.
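For completeness, the value 0.7979 follows from a one-line integral. With $\varphi(z)$ the standard normal density (notation introduced here only for this derivation):

```latex
\mathrm{E}[\,z \mid z > 0\,]
  = \frac{\int_0^\infty z\,\varphi(z)\,dz}{\Pr(z > 0)}
  = 2 \int_0^\infty \frac{z}{\sqrt{2\pi}}\,e^{-z^2/2}\,dz
  = \frac{2}{\sqrt{2\pi}} \Bigl[-e^{-z^2/2}\Bigr]_0^\infty
  = \sqrt{\frac{2}{\pi}} \approx 0.7979
```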

Fig. 1. Convergence model and experimental results of the proportion of optimal alleles when optimizing the bit-counting function with proportionate selection (Stochastic Universal Sampling) and uniform crossover. [Plot: proportion p(t), 0.5 to 1, versus generations, 0 to 500; curves: SUS, model.]

Tournament selection selects the best out of every random pair of individuals, so the population average fitness increase from one generation to the next is equal to half the mean absolute fitness difference between two randomly sampled individuals:

\[ \bar{f}(t+1) - \bar{f}(t) = \tfrac{1}{2}\,0.7979\,\sigma_{\Delta f}(t) = \tfrac{1}{2}\,0.7979\,\sqrt{2}\,\sigma(t) = \frac{1}{\sqrt{\pi}}\,\sigma(t) \]

or

\[ \bar{f}(t+1) - \bar{f}(t) = 0.5642\,\sigma(t) \tag{4} \]

Since $\bar{f}(t) = l\,p(t)$ and $\sigma^2(t) = l\,p(t)(1 - p(t))$ we get

\[ p(t+1) - p(t) = \sqrt{\frac{p(t)(1 - p(t))}{\pi l}}. \]
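The constant in equation 4 can be checked with a small Monte Carlo experiment: draw two fitness values from the same normal distribution, keep the larger one, and average the gain over the mean. This is a sketch under the normality assumption of the model, not a GA run. The seed, sample size and parameter values are arbitrary choices.

```python
import math
import random

def tournament_gain(mean, sigma, trials, rng):
    """Average gain over the mean for the winner of a size-2 tournament
    between two independent normal(mean, sigma) fitness values."""
    total = 0.0
    for _ in range(trials):
        a = rng.gauss(mean, sigma)
        b = rng.gauss(mean, sigma)
        total += max(a, b)
    return total / trials - mean

rng = random.Random(1)            # arbitrary seed
sigma = 5.0                       # sqrt(l * p * (1 - p)) for l = 100, p = 0.5
estimate = tournament_gain(50.0, sigma, 200_000, rng)
predicted = sigma / math.sqrt(math.pi)   # equation 4: 0.5642 * sigma
print(estimate, predicted)
```

The estimate should agree with the prediction to within sampling noise of roughly 0.01.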

Approximating the difference equation with the differential equation

\[ \frac{dp(t)}{dt} = \sqrt{\frac{p(t)(1 - p(t))}{\pi l}} \]

gives us the solution:

\[ p(t) = 0.5\left(1 + \sin\left(\frac{t}{\sqrt{\pi l}} + \arcsin\bigl(2p(0) - 1\bigr)\right)\right). \]
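The solution can be verified with a trigonometric substitution. The auxiliary angle $\theta(t)$ is introduced here only for this check, and the steps hold while $-\pi/2 \le \theta \le \pi/2$, so that $\cos\theta \ge 0$:

```latex
\begin{aligned}
p(t) = \tfrac{1}{2}\bigl(1 + \sin\theta(t)\bigr)
  \;&\Rightarrow\;
  \sqrt{p(1-p)} = \tfrac{1}{2}\cos\theta ,
  \qquad
  \frac{dp}{dt} = \tfrac{1}{2}\cos\theta\,\frac{d\theta}{dt} \\
\frac{dp}{dt} = \sqrt{\frac{p(1-p)}{\pi l}}
  \;&\Rightarrow\;
  \frac{d\theta}{dt} = \frac{1}{\sqrt{\pi l}}
  \;\Rightarrow\;
  \theta(t) = \frac{t}{\sqrt{\pi l}} + \arcsin\bigl(2p(0) - 1\bigr)
\end{aligned}
```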

For a randomly initialized population, $p(0) = 0.5$, the convergence model becomes:

\[ p(t) = 0.5\left(1 + \sin\left(\frac{t}{\sqrt{\pi l}}\right)\right) \tag{5} \]

The number of generations $g_{conv}$ it takes for the population to converge fully is obtained by setting $p(g_{conv}) = 1$:

\[ g_{conv} = \frac{\pi}{2}\sqrt{\pi l} \tag{6} \]
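A short numerical illustration of equations 5 and 6 follows. Clipping the phase at $\pi/2$, so that the model stays at 1 after convergence instead of following the sine back down, is an assumption added here for convenience. The function names are illustrative.

```python
import math

def tournament_model(t, l, p0=0.5):
    """Equation 5 for a general p(0); the phase is clipped at pi/2 (an added
    assumption) so the proportion stays at 1 once the population has converged."""
    phase = t / math.sqrt(math.pi * l) + math.asin(2.0 * p0 - 1.0)
    return 0.5 * (1.0 + math.sin(min(phase, math.pi / 2)))

def tournament_convergence_time(l):
    """Equation 6: g_conv = (pi / 2) * sqrt(pi * l)."""
    return 0.5 * math.pi * math.sqrt(math.pi * l)

l = 100
g_conv = tournament_convergence_time(l)
print(g_conv)                        # about 27.8 generations
print(tournament_model(g_conv, l))   # 1.0 at convergence
```

For $l = 100$ this is roughly 28 generations, against $l \ln(l) \approx 460$ generations from equation 3 for proportionate selection.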

The convergence time complexity for tournament selection is thus $O(\sqrt{l})$, which compares very favorably to $O(l \ln(l))$ for unscaled proportionate selection.

The proposed convergence model is compared with experimental results in figure 2. The model slightly overestimates the proportion of optimal alleles for tournament selection and recombination with uniform crossover. In a second experiment we recombine the population twice at each generation, so after the usual procedure of tournament selection followed by recombination we randomly shuffle the population and again recombine the individuals. By doing this the alleles of each individual are more decorrelated and the assumptions we made when building the convergence model are violated less. Figure 2 shows that the model now coincides very well with the experimental results, so the slight overestimation is due to the build-up of covariances between the alleles caused by selection. In the last section we will have a closer look at this phenomenon.
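The experiment described above can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation: it assumes tournaments sample with replacement, an even population size, a crossover probability of 1, and an arbitrary seed. The `recombinations` parameter switches between recombining once or twice per generation.

```python
import math
import random

def tournament_select(population, rng):
    """Size-2 tournaments (sampling with replacement); fitness = number of ones."""
    selected = []
    for _ in range(len(population)):
        a, b = rng.choice(population), rng.choice(population)
        selected.append(a if sum(a) >= sum(b) else b)
    return selected

def uniform_crossover(mother, father, rng):
    """Swap each allele between the two parents with probability 0.5."""
    child1, child2 = [], []
    for x, y in zip(mother, father):
        if rng.random() < 0.5:
            x, y = y, x
        child1.append(x)
        child2.append(y)
    return child1, child2

def recombine(population, rng):
    """Shuffle the population and cross consecutive pairs (even size assumed)."""
    shuffled = population[:]
    rng.shuffle(shuffled)
    offspring = []
    for k in range(0, len(shuffled) - 1, 2):
        offspring.extend(uniform_crossover(shuffled[k], shuffled[k + 1], rng))
    return offspring

def run(l=100, n=200, generations=30, recombinations=1, seed=0):
    """Return the proportion of optimal alleles at generations 0..generations."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(l)] for _ in range(n)]
    history = []
    for _ in range(generations + 1):
        history.append(sum(map(sum, population)) / (n * l))
        population = tournament_select(population, rng)
        for _ in range(recombinations):
            population = recombine(population, rng)
    return history

def model(t, l):
    """Equation 5, held at 1 after convergence."""
    return 0.5 * (1.0 + math.sin(min(t / math.sqrt(math.pi * l), math.pi / 2)))

history = run()
for t in (0, 10, 20, 30):
    print(t, round(history[t], 3), round(model(t, 100), 3))
```

Calling `run(recombinations=2)` gives the doubly recombined variant, which can be compared against the same model curve.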

Fig. 2. Convergence model and experimental results of the proportion of optimal alleles when optimizing the bit-counting function with tournament selection and uniform crossover. [Plot: proportion p(t), 0.5 to 1, versus generations, 0 to 40; curves: tour + recomb, tour + 2.recomb, model.]

4 Truncation Selection

Truncation selection or block selection ranks all individuals according to their fitness and selects the best ones as parents. In truncation selection a threshold $T$ is defined such that the $T\%$ best individuals are selected. Truncation selection has been used extensively in evolution strategies [1]. It is also often used in quantitative genetics, where artificial selection performed by breeders is studied. In [9] a specific genetic algorithm, the Breeder Genetic Algorithm, is proposed that incorporates ideas from breeders to perform parameter optimization tasks, and the convergence model for the bit-counting function has also been computed there.

Block selection [10] is equivalent to truncation selection, since for a given population size $n$ one simply gives $s$ copies to the $n/s$ best individuals. Both implementations are identical when $s = 100/T$.

If the fitness function is normally distributed, quantitative genetics defines the selection intensity $i$ that expresses the selection differential $S(t) = \bar{f}(t^s) - \bar{f}(t)$ as a function of the standard deviation $\sigma(t)$:

\[ S(t) = i\,\sigma(t) \]

Using this definition one can easily compute the convergence model:

\[ \bar{f}(t+1) - \bar{f}(t) = i\,\sigma(t) \tag{7} \]

Since $\bar{f}(t) = l\,p(t)$ and $\sigma^2(t) = l\,p(t)(1 - p(t))$ we get

\[ p(t+1) - p(t) = \frac{i}{\sqrt{l}}\,\sqrt{p(t)(1 - p(t))}. \]

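Approximating the difference equation with the differential equation

\[ \frac{dp(t)}{dt} = \frac{i}{\sqrt{l}}\,\sqrt{p(t)(1 - p(t))} \]

the solution becomes

\[ p(t) = 0.5\left(1 + \sin\left(\frac{i}{\sqrt{l}}\,t + \arcsin\bigl(2p(0) - 1\bigr)\right)\right). \]

For a randomly initialized population, $p(0) = 0.5$, this becomes:

\[ p(t) = 0.5\left(1 + \sin\left(\frac{i}{\sqrt{l}}\,t\right)\right) \tag{8} \]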

Calculating the number of generations $g_{conv}$ to convergence ($p(g_{conv}) = 1$) is straightforward:

\[ g_{conv} = \frac{\pi}{2}\,\frac{\sqrt{l}}{i} \tag{9} \]

It is rather remarkable that the convergence models for tournament selection and truncation selection have the same functional form. The only difference is the magnitude of the selection intensity constant $i$, which can be changed by adjusting the truncation threshold or tournament size of the respective algorithms. For the standard tournament selection with tournament size $s = 2$, the selection intensity is $i = 1/\sqrt{\pi} = 0.56$.

Experimental results are shown in figure 3 for a block size $s = 2$ or, equivalently, a truncation threshold $T = 50\%$, which gives a selection intensity $i = 0.8$ [2,3,9]. Again the model slightly overestimates the proportion $p(t)$ of optimal alleles, and predictions become more accurate when the alleles are decorrelated by repeating the recombination phase.
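The selection intensity for other truncation thresholds can be computed from the standard normal distribution. The sketch below uses the textbook quantitative-genetics relation $i = \varphi(x)/q$, where $q = T/100$ is the selected fraction and $x$ the truncation point in standard-deviation units. This formula is not spelled out in the text above and is assumed here, and the function names are illustrative. It then plugs the result into equation 9.

```python
import math
from statistics import NormalDist

def truncation_selection_intensity(threshold_percent):
    """Selection intensity when the best T% of a normally distributed
    population are kept (requires 0 < T < 100)."""
    q = threshold_percent / 100.0
    std_normal = NormalDist()
    x = std_normal.inv_cdf(1.0 - q)   # truncation point in standard deviations
    return std_normal.pdf(x) / q

def truncation_convergence_time(l, intensity):
    """Equation 9: g_conv = (pi / 2) * sqrt(l) / i."""
    return 0.5 * math.pi * math.sqrt(l) / intensity

i_50 = truncation_selection_intensity(50)
g_50 = truncation_convergence_time(100, i_50)
print(round(i_50, 4))   # 0.7979, i.e. about 0.8
print(round(g_50, 1))   # 19.7 generations for l = 100
```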

Fig. 3. Convergence model and experimental results of the proportion of optimal alleles when optimizing the bit-counting function with truncation selection and uniform crossover. [Plot: proportion p(t), 0.5 to 1, versus generations, 0 to 40; curves: trunc + recomb, trunc + 2.recomb, model.]

5 Elitist Recombination

Recently we have introduced an extremely simple GA implementation [11] where the selection and recombination phases are intertwined. Competition for survival takes place at the level of each family (the mating parents and their offspring), which results in a local elitist selection operator, called elitist recombination. For every mating pair two offspring are created and the best two of these four individuals go to the next generation, so individuals can only be replaced by other individuals with a higher fitness value. Crossover is applied for every mating pair, so there is no need to choose a specific value for the crossover probability parameter $p_c$.

Elitist Recombination algorithm:

1. initialize population

2. for every generation

   (a) randomly shuffle population

   (b) for every mating pair:

       - generate offspring

       - keep best two of each family
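The steps above can be sketched as follows. This is a minimal illustration rather than the implementation of [11]: it assumes uniform crossover with swap probability 0.5, the bit-counting fitness, ties resolved in favour of the parents, and an unpaired individual (odd population size) carried over unchanged.

```python
import random

def uniform_crossover(mother, father, rng):
    """Swap each allele between the two parents with probability 0.5."""
    child1, child2 = [], []
    for x, y in zip(mother, father):
        if rng.random() < 0.5:
            x, y = y, x
        child1.append(x)
        child2.append(y)
    return child1, child2

def elitist_recombination_step(population, rng):
    """One generation: shuffle, mate consecutive pairs, keep the best two of
    each family of four (fitness = number of ones)."""
    shuffled = population[:]
    rng.shuffle(shuffled)
    next_population = []
    for k in range(0, len(shuffled) - 1, 2):
        parents = [shuffled[k], shuffled[k + 1]]
        children = list(uniform_crossover(parents[0], parents[1], rng))
        # sorted() is stable, so with parents listed first a child replaces
        # a parent only when it is strictly better.
        family = sorted(parents + children, key=sum, reverse=True)
        next_population.extend(family[:2])
    if len(shuffled) % 2:                 # unpaired individual survives as is
        next_population.append(shuffled[-1])
    return next_population

rng = random.Random(0)                    # arbitrary seed
l, n = 100, 200
population = [[rng.randint(0, 1) for _ in range(l)] for _ in range(n)]
history = []
for _ in range(31):
    history.append(sum(map(sum, population)) / (n * l))
    population = elitist_recombination_step(population, rng)
print([round(history[t], 3) for t in (0, 10, 20, 30)])
```

Because the best two of each family are at least as fit as the two parents, the population mean fitness never decreases from one generation to the next.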

It is easy to show that, when optimizing the bit-counting function, the best individual of every mating pair will go to the next generation and the worst parent will be replaced by the best of the two offspring [11]. As a result the population average fitness increase $\bar{f}(t+1) - \bar{f}(t)$ is equal to half the average difference between the worst parent and the best child.

Since the children are basically random samples of a binomial distribution with the same parameter $p(t)$ as the parent population, we can compute the population average fitness increase as half the mean fitness difference between two random samples of a normal distribution with mean $\mu = l\,p(t)$ and variance $\sigma^2(t) = l\,p(t)(1 - p(t))$.

This is exactly what we have done when modeling tournament selection, so the convergence models for elitist recombination and tournament selection with tournament size $s = 2$ are the same:

\[ p(t) = 0.5\left(1 + \sin\left(\frac{t}{\sqrt{\pi l}}\right)\right) \tag{10} \]

and

\[ g_{conv} = \frac{\pi}{2}\sqrt{\pi l}. \tag{11} \]

Experimental results are shown in figure 4. As with tournament and truncation selection, the model slightly overestimates the proportion $p(t)$, and predictions become more accurate when the alleles are decorrelated by repeating the recombination phase. Comparing figure 4 with figure 2, we note that due to the elitist mechanism the build-up of covariance between the alleles is higher for elitist recombination than for tournament selection and standard recombination.

6 Linkage Disequilibrium

The convergence models discussed in the previous sections all assume that the fitness is normally distributed. In practice, however, there are a number of factors, such as a finite population size, that violate this assumption. From the previous experiments it is clear that the most important factor is the build-up of correlations between the alleles. Repeating the recombination phase during each generation made the model predictions significantly more accurate. In this section we will briefly discuss this problem, but due to space limitations a detailed analysis is beyond the scope of this paper and can be found in [12].

Fig. 4. Convergence model and experimental results of the proportion of optimal alleles when optimizing the bit-counting function with elitist recombination and uniform crossover. [Plot: proportion p(t), 0.5 to 1, versus generations, 0 to 40; curves: ElRe, ElRe + recomb, ElRe + 2.recomb, model.]

Selection picks out individuals that are mainly concentrated at the higher level of the fitness distribution. The variance of the population is reduced, but this reduction is not only caused by the change in allele frequency but also by the introduction of covariances between the alleles. If we represent the covariance with $\mathrm{cov}$ then the variance after selection is given by:

\[ \sigma^2(t^s) = \sigma^2(t) + 2\,\mathrm{cov}(t^s) \tag{12} \]

With truncation selection, for instance, the variance is reduced by a factor $k$ such that

\[ \sigma^2(t^s) = (1 - k)\,\sigma^2(t) \tag{13} \]

with $k$ given by $k = i(i - x)$, where $i$ is the selection intensity and $x$ is the corresponding deviation of the truncation point from the population mean [3].

The covariance after selection can now be computed from equations 12 and 13:

\[ \mathrm{cov}(t^s) = -\frac{k}{2}\,\sigma^2(t) \]

Crossover decorrelates the genes with a decorrelation factor $\delta = 1 - p_c\,p_r$ with $p_r = 2\,p_x(1 - p_x)$ ($p_c$: crossover probability, $p_r$: disruption probability and $p_x$: allele swap probability for uniform crossover). The covariance after selection and recombination is thus given by:

\[ \mathrm{cov}(t+1) = -\delta\,\frac{k}{2}\,\sigma^2(t) \]
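To give a feel for the magnitudes involved, the sketch below evaluates these quantities for truncation selection with $T = 50\%$ and uniform crossover. Two things are assumptions of this sketch rather than statements from the text: the selection intensity is computed from the standard normal-theory relation (density at the truncation point divided by the selected fraction), and recombination is taken to scale the post-selection covariance by the decorrelation factor. All function names are illustrative.

```python
import math
from statistics import NormalDist

def variance_reduction_factor(threshold_percent):
    """k = i * (i - x) for truncation selection of the best T% (0 < T < 100)."""
    q = threshold_percent / 100.0
    std_normal = NormalDist()
    x = std_normal.inv_cdf(1.0 - q)   # truncation point, in standard deviations
    i = std_normal.pdf(x) / q         # selection intensity (assumed formula)
    return i * (i - x)

def decorrelation_factor(p_c, p_x):
    """1 - p_c * p_r with disruption probability p_r = 2 * p_x * (1 - p_x)."""
    p_r = 2.0 * p_x * (1.0 - p_x)
    return 1.0 - p_c * p_r

def covariance_after_selection(k, variance):
    """From equations 12 and 13: cov(t^s) = -(k / 2) * variance."""
    return -0.5 * k * variance

k = variance_reduction_factor(50)            # equals 2/pi, about 0.637
factor = decorrelation_factor(1.0, 0.5)      # 0.5 for uniform crossover
variance = 25.0                              # l*p*(1-p) for l = 100, p = 0.5
cov_selected = covariance_after_selection(k, variance)
cov_next = factor * cov_selected             # assumed effect of recombination
print(round(k, 4), 2 / math.pi)
print(round(cov_selected, 3), round(cov_next, 3))
```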

Using this expression for the covariance we can calculate the population fitness variance more accurately, so that the convergence models predict the experimental results even better (for details see [12]).

7 Conclusion

Simple yet accurate convergence models for different selection schemes are derived that allow us to compare the GA selection schemes in terms of their convergence characteristics. The models assume a normally distributed fitness function and predict the proportion of optimal alleles as a function of the number of generations. Experimental results for optimizing the bit-counting function show good agreement with the predictions. It is argued that the slight deviations are due to linkage disequilibrium, the build-up of covariances between the alleles, which can be included in the models to increase their accuracy.

References

[1] Bäck, T., Hoffmeister, F., & Schwefel, H.P. (1991). A Survey of Evolution Strategies. Proceedings of the Fourth International Conference on Genetic Algorithms, 2-9. Morgan Kaufmann.

[2] Bulmer, M.G. (1985). The Mathematical Theory of Quantitative Genetics. Oxford University Press.

[3] Falconer, D.S. (1989). Introduction to Quantitative Genetics. Longman Scientific & Technical.

[4] Goldberg, D.E. (1989). Genetic Algorithms in Search, Optimization and Machine Learning. Addison Wesley Publishing Company.

[5] Goldberg, D.E., & Deb, K. (1991). A comparative analysis of selection schemes used in genetic algorithms. Foundations of Genetic Algorithms I, 69-93.

[6] Goldberg, D.E., Deb, K., & Clark, J.H. (1991). Genetic algorithms, noise, and the sizing of populations. Complex Systems, 6, 333-362.

[7] Goldberg, D.E., Deb, K., & Thierens, D. (1992). Toward a better understanding of mixing in genetic algorithms. Journal of the Society for Instrumentation and Control Engineers, 32(1), 10-16.

[8] Holland, J.H. (1975). Adaptation in natural and artificial systems. Ann Arbor: University of Michigan Press.

[9] Mühlenbein, H., & Schlierkamp-Voosen, D. (1993). Predictive Models for the Breeder Genetic Algorithm. I. Continuous Parameter Optimization. Evolutionary Computation, 1(1), 25-49. MIT Press.

[10] Thierens, D., & Goldberg, D.E. (1993). Mixing in Genetic Algorithms. Proceedings of the Fifth International Conference on Genetic Algorithms, 38-45. Morgan Kaufmann.

[11] Thierens, D., & Goldberg, D.E. (1993). Elitist Recombination: an integrated selection recombination GA. Technical Report ESAT-SISTA/TR 1993-76. To appear in Proceedings of the IEEE World Congress on Computational Intelligence. Orlando, June 1994.

[12] Thierens, D., & Goldberg, D.E. (1994). Effects of Linkage Disequilibrium on GA convergence. Technical Report ESAT-SISTA/TR 1994-41.
