Conclusion and Future Work

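The main goal of this thesis is to improve the mutation mechanism of the conventional CMA-ES. The conventional mechanism relies on self-adaptation to govern mutation: it adjusts the strategy parameters during evolution and thereby steers the mutation direction, but mutating from a single normal distribution still limits how far an individual can mutate. Chapter 3 of this thesis proposed a clustering-based mutation mechanism. Its central idea is to exploit the great distributional flexibility of mixture probability models, so that more flexible mutations can search in many directions and relax the restriction to local search. The visualized experiments indicate that clustering-based mutation gives individuals a more global search capability during mutation and therefore helps locate the optimum effectively. The main future work is to address the slow convergence, which is especially severe in the experiments on multi-funnel functions. Several feasible approaches are listed below: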
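To make the idea of mutating from a mixture model concrete, the following is a minimal sketch and not the MS-CMA-ES implementation. It assumes axis-aligned (diagonal-covariance) Gaussian components for brevity, whereas CMA-ES adapts full covariance matrices; the function name `sample_mixture` and all parameter values are hypothetical.

```python
import random

def sample_mixture(weights, means, sigmas, n_samples, rng):
    """Draw offspring from a Gaussian mixture with axis-aligned components.

    weights: mixing coefficients, one per component (should sum to 1)
    means:   one mean vector per component
    sigmas:  one vector of per-axis standard deviations per component
             (diagonal covariance is a simplification made for this sketch)
    Returns a list of (component_index, point) pairs.
    """
    components = range(len(weights))
    offspring = []
    for _ in range(n_samples):
        # Choose which component generates this offspring.
        k = rng.choices(components, weights=weights, k=1)[0]
        # Mutate around that component's mean with its own step sizes.
        point = [rng.gauss(m, s) for m, s in zip(means[k], sigmas[k])]
        offspring.append((k, point))
    return offspring

rng = random.Random(0)                   # fixed seed for reproducibility
weights = [0.5, 0.5]                     # hypothetical mixing coefficients
means = [[-3.0, 0.0], [3.0, 0.0]]        # two separated search regions
sigmas = [[1.0, 0.3], [0.3, 1.0]]        # each component has its own shape
offspring = sample_mixture(weights, means, sigmas, 200, rng)
```

A single normal distribution would have to cover both regions with one ellipsoid, while the mixture places samples around each mean with a separate shape.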

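• Start from theory. [39] recently proved that part of the CMA-ES update equations is a natural gradient descent on the expectation of fitness. Extending this descent to Gaussian mixture distributions may reveal important clues about a mixture-model version of CMA-ES. A theoretical result derived entirely from the Gaussian mixture model may contain a more natural mechanism for updating the number of components. Such a result would be more significant than MS-CMA-ES, which in every generation deliberately applies some clustering method and then forces the outcome into a mixture model.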

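• Revise the update mechanism for the mixture parameters. On the simple sphere test function, the slow convergence of MS-CMA-ES may be caused not only by the different sample size but also by improper clustering; how unnecessary clustering affects convergence speed is another direction for future research. On complex multi-funnel, multimodal functions, this thesis attributes the slow convergence to a poor update mechanism for the mixing probabilities; another indicator, such as the improvement in fitness, might work better. In addition, the update parameter for the mixing probabilities needs to be redesigned: it should not be a fixed constant that is independent of the sample size, the dimension, and the number of components.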
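The natural-gradient view mentioned in the first item can be stated compactly. The following is a standard formulation in notation assumed here (the symbols $\theta$, $p_\theta$, $F$, $\eta$, $\pi_k$, $m_k$, and $C_k$ are not taken from the thesis). It only sketches the objects such a derivation would work with and does not reproduce the result of [39].

$$J(\theta) = \mathbb{E}_{x \sim p_\theta}\left[f(x)\right], \qquad \nabla_\theta J(\theta) = \mathbb{E}_{x \sim p_\theta}\left[f(x)\,\nabla_\theta \ln p_\theta(x)\right]$$

$$\tilde{\nabla}_\theta J(\theta) = F(\theta)^{-1}\,\nabla_\theta J(\theta), \qquad F(\theta) = \mathbb{E}_{x \sim p_\theta}\left[\nabla_\theta \ln p_\theta(x)\,\nabla_\theta \ln p_\theta(x)^{\top}\right]$$

$$\theta \leftarrow \theta - \eta\,\tilde{\nabla}_\theta J(\theta)$$

For a single Gaussian, $\theta = (m, C)$. The extension discussed above would replace $p_\theta$ with a mixture $p_\theta(x) = \sum_{k=1}^{K} \pi_k\,\mathcal{N}(x; m_k, C_k)$ with $\sum_k \pi_k = 1$. The Fisher matrix of such a mixture generally has no simple closed form, which is part of what makes the extension nontrivial.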

References

[1] M. Lunacek, D. Whitley, and A. Sutton, “The impact of global structure on search,” Parallel Problem Solving from Nature - PPSN X, vol. 5199, pp. 498-507, 2008.

[2] C. L. Muller and I. F. Sbalzarini, “A tunable real-world multi-funnel benchmark problem for evolutionary optimization and why parallel island models might remedy the failure of CMA-ES on it,” in Proc. of 2009 International Joint Conference on Computational Intelligence, Paris, France, pp. 248-253, 2009.

[3] D. J. Wales, “Energy landscapes and properties of biomolecules,” Physical Biology, vol. 2, pp. 86-93, Dec 2005.

[4] P. L. Clark, “Protein folding in the cell: reshaping the folding funnel,” Trends in Biochemical Sciences, vol. 29, pp. 527-534, Oct 2004.

[5] C. L. Muller, B. Benedikt, and F. S. Ivo, “Particle Swarm CMA Evolution Strategy for the optimization of multi-funnel landscapes,” in Proc. of 2009 IEEE Congress on Evolutionary Computation, New York, USA, pp. 2685-2692, 2009.

[6] D. Comaniciu and P. Meer, “Mean shift: A robust approach toward feature space analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, pp. 603-619, May 2002.

[7] N. Hansen and A. Ostermeier, “Completely derandomized self-adaptation in evolution strategies,” Evolutionary Computation, vol. 9, pp. 159-195, Jun 2001.

[8] N. Hansen, S. D. Muller, and P. Koumoutsakos, “Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES),” Evolutionary Computation, vol. 11, pp. 1-18, Spr 2003.

[9] M. A. T. Figueiredo and A. K. Jain, “Unsupervised learning of finite mixture models,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, pp. 381-396, Mar 2002.

[10] C. M. Bishop, Pattern recognition and machine learning, New York: Springer, 2006.

[11] M. P. Wand and M. C. Jones, “Comparison of smoothing parameterizations in bivariate kernel density-estimation,” Journal of the American Statistical Association, vol. 88, pp. 520-528, Jun 1993.

[12] V. Granville, M. Křivánek, and J. Rasson, “Simulated Annealing: a proof of convergence,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, no. 6, pp. 652-656, 1994.

[13] D. B. Fogel, T. Bäck, and Z. Michalewicz, Evolutionary Computation 1: Basic Algorithms and Operators, Taylor & Francis, 2000.

[14] A. E. Eiben and J. E. Smith, Introduction to evolutionary computing, Springer, 2003.

[15] H. G. Beyer and H. P. Schwefel, “Evolution strategies: A comprehensive introduction,” Natural Computing, vol. 1, pp. 3-52, 2002.

[16] A. Auger and N. Hansen, “Performance evaluation of an advanced local search evolutionary algorithm,” in Proc. of 2005 IEEE Congress on Evolutionary Computation, New York, USA, pp. 1777-1784, 2005.

[17] A. Auger and N. Hansen, “A restart CMA evolution strategy with increasing population size,” in Proc. of 2005 IEEE Congress on Evolutionary Computation, New York, pp. 1769-1776, 2005.

[18] N. Hansen and S. Kern, “Evaluating the CMA evolution strategy on multimodal test functions,” Parallel Problem Solving from Nature - PPSN VIII, vol. 3242, pp. 282-291, 2004.

[19] C. T. Hsieh, C. M. Chen, and Y. P. Chen, “Particle Swarm Guided Evolution Strategy,” in Proc. of the 9th Annual Conference on Genetic and Evolutionary Computation (GECCO'07), New York, USA, 2007.

[20] R. S. Stephan, “Defining a standard for particle swarm optimization,” IEEE Swarm Intelligence Symposium, 2007, Honolulu, Hawaii, USA, pp. 120-128, April 2007.

[21] J. Kennedy, “The particle swarm: social adaptation of knowledge,” in Proc. of 1997 IEEE International Conference on Evolutionary Computation (ICEC '97), Indianapolis, IN, pp. 303-308, 1997.

[22] S. Kern, S. D. Müller, N. Hansen, D. Büche, J. Ocenasek, and P. Koumoutsakos, “Learning probability distributions in continuous evolutionary algorithms-a comparative review,” Natural Computing, vol. 3, no. 1, pp. 77-112, 2004.

[23] N. Hansen and A. Ostermeier, “Adapting arbitrary normal mutation distributions in evolution strategies: The covariance matrix adaptation,” in Proc. of the 1996 IEEE International Conference on Evolutionary Computation, pp. 312-317, 1996.

[24] M. P. Wand and M. C. Jones, Kernel smoothing, London: Chapman & Hall, 1995.

[25] M. C. Jones, J. S. Marron, and S. J. Sheather, “A brief survey of bandwidth selection for density estimation,” Journal of the American Statistical Association, vol. 91, no. 433, pp. 401-407, Mar 1996.

[26] R. S. Stephan, “Multivariate locally adaptive density estimation,” Computational Statistics & Data Analysis, vol. 39, no. 2, pp. 165-186, Jan 2001.

[27] R. S. Stephan and D. W. Scott, “On locally adaptive density estimation,” Journal of the American Statistical Association, vol. 91, no. 436, pp. 1525-1534, Dec 1996.

[28] Y. Cheng, “Mean shift, mode seeking, and clustering,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, no. 8, pp. 790-799, Aug 1995.

[29] S. T. Tokdar and R. E. Kass, “Importance sampling: a review,” Wiley Interdisciplinary Reviews: Computational Statistics, vol. 2, no. 1, pp. 54-60, Feb 2010.

[30] D. J. C. MacKay, Information theory, inference, and learning algorithms, Cambridge University Press, 2003.

[31] P. Zhang, “Nonparametric importance sampling,” Journal of the American Statistical Association, vol. 91, no. 435, pp. 1245-1253, Sep 1996.

[32] D. Koller and N. Friedman, Probabilistic graphical models, MIT Press, 2009.

[33] M. Lynch, “Evolution of the mutation rate,” Trends in Genetics, vol. 26, no. 436, pp. 345-352, 2010.

[34] D. Comaniciu, “An algorithm for data-driven bandwidth selection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 2, pp. 281-288, Feb 2003.

[35] D. Comaniciu, V. Ramesh, and P. Meer, “The variable bandwidth mean shift and data-driven scale selection,” in Proc. of 2001 Eighth IEEE International Conference on Computer Vision, Vancouver, BC, Canada, vol. 1, pp. 438-445, 2001.

[36] G. R. Terrell, “The maximal smoothing principle in density estimation,” Journal of the American Statistical Association, vol. 85, no. 7, pp. 470-480, Dec 1990.

[37] N. Hansen, A. Auger, S. Finck, and R. Ros, “Real-parameter black-box optimization benchmarking 2010: experimental setup,” INRIA research report RR-7215, Sep 2010.

[38] P. N. Suganthan, N. Hansen, J. J. Liang, K. Deb, Y. P. Chen, A. Auger, and S. Tiwari, “Problem definitions and evaluation criteria for the CEC 2005 special session on real-parameter optimization,” Nanyang Technological University, Singapore, and KanGAL Report #2005005, 2005.

[39] Y. Akimoto, Y. Nagata, I. Ono, and S. Kobayashi, “Bidirectional relation between CMA evolution strategies and natural evolution strategies,” Parallel Problem Solving from Nature - PPSN XI, vol. 6238, pp. 154-163, 2010.

Appendix 1: Test Function Definitions

$D$: dimensions,

$o = [o_1, o_2, \ldots, o_D]$: the shifted global optimum.

F2: Elliptic Function

[equation not recoverable; only the exponent fragment $i-1$ survives]

$M$: orthogonal matrix.

F3: Rosenbrock’s Function

[equation not recoverable]

$D$: dimensions,

$o = [o_1, o_2, \ldots, o_D]$: the shifted global optimum.

F4: Rastrigin’s Function

$$F_4(x) = \sum_{i=1}^{D} \left( z_i^2 - 10\cos(2\pi z_i) + 10 \right) + f\_bias,$$

$z = x - o$, $x = [x_1, x_2, \ldots, x_D]$, $D$: dimensions,

$o = [o_1, o_2, \ldots, o_D]$: the shifted global optimum.

[function heading and equation not recoverable]

$D$: dimensions,

$o = [o_1, o_2, \ldots, o_D]$: the shifted global optimum,

$M'$: linear transformation matrix, condition number = 3, $M = M'(1 + 0.3\,N(0,1))$.
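As a minimal sketch of the shifted Rastrigin function F4 defined in this appendix, assuming plain Python lists for $x$ and $o$ and an `f_bias` that defaults to 0 (the function name `shifted_rastrigin` and the example shift vector are illustrative and not from the thesis):

```python
import math

def shifted_rastrigin(x, o, f_bias=0.0):
    """Sum over i of (z_i^2 - 10 cos(2 pi z_i) + 10), plus f_bias, with z = x - o."""
    z = [xi - oi for xi, oi in zip(x, o)]
    return sum(zi * zi - 10.0 * math.cos(2.0 * math.pi * zi) + 10.0 for zi in z) + f_bias

o = [1.5, -2.0, 0.25]                       # hypothetical shift vector
at_optimum = shifted_rastrigin(o, o)        # z = 0, so only f_bias remains
nearby = shifted_rastrigin([2.5, -2.0, 0.25], o)
```

At $x = o$ every term of the sum vanishes, so the function value equals `f_bias`. Moving one coordinate by an integer step lands in a neighbouring local basin with a strictly larger value.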

F6: Schwefel’s Function

[equation not recoverable]

$A$, $B$ are two $D \times D$ matrices; $a_{ij}$, $b_{ij}$ are integer random numbers in the range $[-100, 100]$,

$\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_D]$, where $\alpha_j$ are random numbers in the range $[-\pi, \pi]$.

F7: Double-Rastrigin’s Function

[equation not recoverable]

$z = x - o$, $x = [x_1, x_2, \ldots, x_D]$, $D$: dimensions,

$o = [o_1, o_2, \ldots, o_D]$: the shifted global optimum.

F8: Weierstrass’s Function

$$F_8(x) = \sum_{i=1}^{D} \left( \sum_{k=0}^{k_{\max}} \left[ a^k \cos\left( 2\pi b^k (z_i + 0.5) \right) \right] \right) - D \sum_{k=0}^{k_{\max}} \left[ a^k \cos\left( 2\pi b^k \cdot 0.5 \right) \right] + f\_bias,$$

$a = 0.5$, $b = 3$, $k_{\max} = 20$, $z = (x - o) * M$, $x = [x_1, x_2, \ldots, x_D]$,

$D$: dimensions,

$o = [o_1, o_2, \ldots, o_D]$: the shifted global optimum,

$M$: linear transformation matrix, condition number = 5.
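The Weierstrass definition above can be sketched as follows. This is a minimal illustration and not the thesis code: the function name `weierstrass` is hypothetical, $M$ is passed as a list of rows and defaults to no rotation, and `f_bias` defaults to 0, all of which are assumptions made here.

```python
import math

def weierstrass(x, o, M=None, a=0.5, b=3.0, k_max=20, f_bias=0.0):
    """Evaluate F8 with z = (x - o) * M; M=None means no linear transformation."""
    D = len(x)
    shifted = [xi - oi for xi, oi in zip(x, o)]
    if M is None:
        z = shifted
    else:
        # Row vector times matrix: z_j = sum_i shifted_i * M[i][j]
        z = [sum(shifted[i] * M[i][j] for i in range(D)) for j in range(D)]
    # Precompute the amplitude a^k and angular frequency 2*pi*b^k for k = 0..k_max.
    terms = [(a ** k, 2.0 * math.pi * b ** k) for k in range(k_max + 1)]
    first = sum(ak * math.cos(wk * (zi + 0.5)) for zi in z for ak, wk in terms)
    second = D * sum(ak * math.cos(wk * 0.5) for ak, wk in terms)
    return first - second + f_bias

o = [0.2, -0.1, 0.4]                                     # hypothetical shift vector
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
value_at_optimum = weierstrass(o, o, M=identity)         # close to f_bias
value_elsewhere = weierstrass([0.5, 0.2, 0.7], o, M=identity)
```

At $x = o$ the two sums cancel up to floating-point rounding, so the result is approximately `f_bias`. An actual benchmark run would supply a transformation matrix with condition number 5 in place of the identity.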

