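The main purpose of this thesis is to improve the mutation mechanism of the conventional CMA-ES. The conventional mechanism relies on self-adaptation to determine mutation behavior. It adjusts the strategy parameters during evolution to steer the mutation direction, but normally distributed mutation still limits how far an individual can move. Chapter 3 of this thesis proposed a cluster-based mutation mechanism. Its main idea is to borrow the great distributional flexibility of mixture probability models, so that more flexible mutations can search in many directions and the restrictions of local search are reduced. Visualization experiments show that cluster-based mutation gives individuals a more comprehensive search capability during mutation and can therefore locate the optimum effectively.

Future work will mainly address the slow convergence of the method, a problem that is even more severe in the experiments on multi-funnel functions. Several possible approaches are listed below: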
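• A theoretical approach. Akimoto et al. [39] recently proved that part of the CMA-ES update equations is a "natural gradient descent on the expectation of fitness." If this result can be generalized to Gaussian mixture distributions, it may reveal important insights for a mixture-model version of CMA-ES. A theory derived entirely from the Gaussian mixture model may contain a more natural mechanism for updating the number of components. Such a mechanism would be more significant than MS-CMA-ES, which deliberately applies a clustering method in every generation and then forces the result into a mixture model.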
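• Revising the update mechanism of the mixing probabilities. On the simple sphere test function, MS-CMA-ES may converge slowly not only because it uses a different number of samples but also because of improper clustering; how unnecessary clustering affects the convergence speed is another direction for future research. On complex multi-funnel multimodal functions, this thesis attributes the slow convergence to a poorly designed mixing-probability update; other indicators, such as the amount of fitness improvement, might work better. In addition, the parameter of the mixing-probability update needs to be redesigned: it should not be a fixed constant that is independent of the number of samples, the dimension, and the number of components.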
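Both the cluster-based mutation and the mixing probabilities discussed above rest on drawing offspring from a mixture of normal distributions rather than from a single one. The following is a minimal, generic sketch of sampling from a Gaussian mixture with mixing probabilities π_k, means m_k, and covariances C_k, using NumPy. It is not the MS-CMA-ES procedure of Chapter 3; the function name `sample_gaussian_mixture` and its parameters are hypothetical and chosen only for illustration.

```python
import numpy as np

def sample_gaussian_mixture(weights, means, covs, n_samples, rng):
    """Draw n_samples points from sum_k pi_k N(m_k, C_k).

    weights: mixing probabilities pi_k, shape (K,), non-negative and summing to 1
    means:   component means m_k, shape (K, D)
    covs:    component covariance matrices C_k, shape (K, D, D), symmetric positive definite
    Returns the samples, shape (n_samples, D), and the component chosen for each sample.
    """
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("mixing probabilities must sum to 1")
    n_components, dim = means.shape
    # Step 1: pick a component for every sample according to the mixing probabilities.
    labels = rng.choice(n_components, size=n_samples, p=weights)
    samples = np.empty((n_samples, dim))
    for k in range(n_components):
        idx = np.flatnonzero(labels == k)
        if idx.size == 0:
            continue
        # Step 2: x = m_k + A_k z with z ~ N(0, I) and C_k = A_k A_k^T (Cholesky factor).
        a_k = np.linalg.cholesky(covs[k])
        z = rng.standard_normal((idx.size, dim))
        samples[idx] = means[k] + z @ a_k.T
    return samples, labels

# Two well-separated components: offspring land around both centres instead of one.
rng = np.random.default_rng(0)
x, labels = sample_gaussian_mixture(
    weights=[0.5, 0.5],
    means=[[-3.0, 0.0], [3.0, 0.0]],
    covs=[0.25 * np.eye(2), 0.25 * np.eye(2)],
    n_samples=1000,
    rng=rng,
)
```

With a single normal distribution, covering both regions would require a very large step size; the mixture instead places each offspring near centre k with probability π_k, which is the kind of flexibility the mixing-probability update controls.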
References
[1] M. Lunacek, D. Whitley, and A. Sutton, “The impact of global structure on search,” Parallel Problem Solving from Nature - PPSN X, vol. 5199, pp. 498-507, 2008.
[2] C. L. Muller and I. F. Sbalzarini, “A tunable real-world multi-funnel benchmark problem for evolutionary optimization and why parallel island models might remedy the failure of CMA-ES on it,” in Proc. of 2009 International Joint Conference on Computational Intelligence, Paris, France, pp. 248-253, 2009.
[3] D. J. Wales, “Energy landscapes and properties of biomolecules,” Physical Biology, vol. 2, pp. 86-93, Dec 2005.
[4] P. L. Clark, “Protein folding in the cell: reshaping the folding funnel,” Trends in Biochemical Sciences, vol. 29, pp. 527-534, Oct 2004.
[5] C. L. Muller, B. Baumgartner, and I. F. Sbalzarini, “Particle Swarm CMA Evolution Strategy for the optimization of multi-funnel landscapes,” in Proc. of 2009 IEEE Congress on Evolutionary Computation, New York, USA, pp. 2685-2692, 2009.
[6] D. Comaniciu and P. Meer, “Mean shift: A robust approach toward feature space analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, pp. 603-619, May 2002.
[7] N. Hansen and A. Ostermeier, “Completely derandomized self-adaptation in evolution strategies,” Evolutionary Computation, vol. 9, pp. 159-195, Jun 2001.
[8] N. Hansen, S. D. Muller, and P. Koumoutsakos, “Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES),” Evolutionary Computation, vol. 11, pp. 1-18, Spring 2003.
[9] M. A. T. Figueiredo and A. K. Jain, “Unsupervised learning of finite mixture models,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, pp. 381-396, Mar 2002.
[10] C. M. Bishop, Pattern recognition and machine learning, New York: Springer, 2006.
[11] M. P. Wand and M. C. Jones, “Comparison of smoothing parameterizations in bivariate kernel density-estimation,” Journal of the American Statistical Association, vol. 88, pp. 520-528, Jun 1993.
[12] V. Granville, M. Křivánek, and J. Rasson, “Simulated Annealing: a proof of convergence,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, no. 6, pp. 652-656, 1994.
[13] D. B. Fogel, T. Bäck, and Z. Michalewicz, Evolutionary Computation 1: Basic Algorithms and Operators, Taylor & Francis, 2000.
[14] A. E. Eiben and J. E. Smith, Introduction to evolutionary computing, Springer, 2003.
[15] H. G. Beyer and H. P. Schwefel, “Evolution strategies: A comprehensive introduction,” Natural Computing, vol. 1, pp. 3-52, 2002.
[16] A. Auger and N. Hansen, “Performance evaluation of an advanced local search evolutionary algorithm,” in Proc. of 2005 IEEE Congress on Evolutionary Computation, New York, USA, pp. 1777-1784, 2005.
[17] A. Auger and N. Hansen, “A restart CMA evolution strategy with increasing population size,” in Proc. of 2005 IEEE Congress on Evolutionary Computation, New York, pp. 1769-1776, 2005.
[18] N. Hansen and S. Kern, “Evaluating the CMA evolution strategy on multimodal test functions,” Parallel Problem Solving from Nature - PPSN VIII, vol. 3242, pp. 282-291, 2004.
[19] C. T. Hsieh, C. M. Chen, and Y. P. Chen, “Particle Swarm Guided Evolution Strategy,” in Proc. of the 9th Annual Conference on Genetic and Evolutionary Computation (GECCO '07), New York, USA, 2007.
[20] D. Bratton and J. Kennedy, “Defining a standard for particle swarm optimization,” IEEE Swarm Intelligence Symposium, 2007, Honolulu, Hawaii, USA, pp. 120-128, April 2007.
[21] J. Kennedy, “The particle swarm: social adaptation of knowledge,” in Proc. of 1997 IEEE International Conference on Evolutionary Computation (ICEC '97), Indianapolis, IN, pp. 303-308, 1997.
[22] S. Kern, S. D. Müller, N. Hansen, D. Büche, J. Ocenasek, and P. Koumoutsakos, “Learning probability distributions in continuous evolutionary algorithms-a comparative review,” Natural Computing, vol. 3, no. 1, pp. 77-112, 2004.
[23] N. Hansen and A. Ostermeier, “Adapting arbitrary normal mutation distributions in evolution strategies: The covariance matrix adaptation,” in Proc. of the 1996 IEEE International Conference on Evolutionary Computation, pp. 312-317, 1996.
[24] M. P. Wand and M. C. Jones, Kernel smoothing, London: Chapman & Hall, 1995.
[25] M. C. Jones, J. S. Marron, and S. J. Sheather, “A brief survey of bandwidth selection for density estimation,” Journal of the American Statistical Association, vol. 91, no. 433, pp. 401-407, Mar 1996.
[26] S. R. Sain, “Multivariate locally adaptive density estimation,” Computational Statistics & Data Analysis, vol. 39, no. 2, pp. 165-186, Jan 2001.
[27] S. R. Sain and D. W. Scott, “On locally adaptive density estimation,” Journal of the American Statistical Association, vol. 91, no. 436, pp. 1525-1534, Dec 1996.
[28] Y. Cheng, “Mean shift, mode seeking, and clustering,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, no. 8, pp. 790-799, Aug 1995.
[29] S. T. Tokdar and R. E. Kass, “Importance sampling: a review,” Wiley Interdisciplinary Reviews: Computational Statistics, vol. 2, no. 1, pp. 54-60, Feb 2010.
[30] D. J. C. MacKay, Information theory, inference, and learning algorithms, Cambridge University Press, 2003.
[31] P. Zhang, “Nonparametric importance sampling,” Journal of the American Statistical Association, vol. 91, no. 435, pp. 1245-1253, Sep 1996.
[32] D. Koller and N. Friedman, Probabilistic graphical models, MIT Press, 2009.
[33] M. Lynch, “Evolution of the mutation rate,” Trends in Genetics, vol. 26, no. 8, pp. 345-352, 2010.
[34] D. Comaniciu, “An algorithm for data-driven bandwidth selection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 2, pp. 281-288, Feb 2003.
[35] D. Comaniciu, V. Ramesh, and P. Meer, “The variable bandwidth mean shift and data-driven scale selection,” in Proc. of 2001 Eighth IEEE International Conference on Computer Vision, Vancouver, BC, Canada, vol. 1, pp. 438-445, 2001.
[36] G. R. Terrell, “The maximal smoothing principle in density estimation,” Journal of the American Statistical Association, vol. 85, no. 7, pp. 470-480, Dec 1990.
[37] N. Hansen, A. Auger, S. Finck, and R. Ros, “Real-parameter black-box optimization benchmarking 2010: experimental setup,” INRIA research report RR-7215, Sep 2010.
[38] P. N. Suganthan, N. Hansen, J. J. Liang, K. Deb, Y. P. Chen, A. Auger, and S. Tiwari, “Problem definitions and evaluation criteria for the CEC 2005 special session on real-parameter optimization,” Nanyang Technological University, Singapore, and KanGAL Report #2005005, 2005.
[39] Y. Akimoto, Y. Nagata, I. Ono, and S. Kobayashi, “Bidirectional relation between CMA evolution strategies and natural evolution strategies,” Parallel Problem Solving from Nature - PPSN XI, vol. 6238, pp. 154-163, 2010.
Appendix 1: Test Function Definitions
D: dimensions, o = [o_1, o_2, ..., o_D]: the shifted global optimum.
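F2: Elliptic Function
$$F_2(x) = \sum_{i=1}^{D} \left(10^{6}\right)^{\frac{i-1}{D-1}} z_i^{2} + f_{\text{bias}}$$
z = (x - o) * M, M: orthogonal matrix.
F3: Rosenbrock’s Function
$$F_3(x) = \sum_{i=1}^{D-1} \left( 100\left(z_i^{2} - z_{i+1}\right)^{2} + \left(z_i - 1\right)^{2} \right) + f_{\text{bias}}$$
z = x - o + 1, x = [x_1, x_2, ..., x_D], D: dimensions, o = [o_1, o_2, ..., o_D]: the shifted global optimum.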
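For readers who want to evaluate F2 and F3, the following minimal NumPy sketch may help. It assumes the CEC 2005 conventions of [38]: row vectors, z = (x − o)M for F2, z = x − o + 1 for F3, and f_bias defaulting to 0. The function names are hypothetical, and the code is not taken from the experiments of this thesis.

```python
import numpy as np

def elliptic_f2(x, o, M, f_bias=0.0):
    """F2: sum_i (1e6)^((i-1)/(D-1)) z_i^2 + f_bias with z = (x - o) M (requires D >= 2)."""
    z = (np.asarray(x, dtype=float) - np.asarray(o, dtype=float)) @ np.asarray(M, dtype=float)
    d = z.size
    # (i - 1)/(D - 1) for i = 1..D, i.e. 0, 1/(D-1), ..., 1.
    exponents = np.arange(d) / (d - 1)
    return float(np.sum(1e6 ** exponents * z**2) + f_bias)

def rosenbrock_f3(x, o, f_bias=0.0):
    """F3: sum_{i=1}^{D-1} [100 (z_i^2 - z_{i+1})^2 + (z_i - 1)^2] + f_bias with z = x - o + 1."""
    # The +1 shift moves Rosenbrock's optimum (all ones) to x = o.
    z = np.asarray(x, dtype=float) - np.asarray(o, dtype=float) + 1.0
    return float(np.sum(100.0 * (z[:-1] ** 2 - z[1:]) ** 2 + (z[:-1] - 1.0) ** 2) + f_bias)
```

The weights of F2 grow from 1 for z_1 to 10^6 for z_D, so the condition number of the quadratic form is 10^6 regardless of the dimension.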
F4: Rastrigin’s Function
$$F_4(x) = \sum_{i=1}^{D} \left( z_i^{2} - 10\cos\left(2\pi z_i\right) + 10 \right) + f_{\text{bias}}$$
z = x - o, x = [x_1, x_2, ..., x_D], D: dimensions, o = [o_1, o_2, ..., o_D]: the shifted global optimum.
D: dimensions, o = [o_1, o_2, ..., o_D]: the shifted global optimum, M': linear transformation matrix, condition number = 3, M = M'(1 + 0.3|N(0,1)|).
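The shifted Rastrigin function F4 defined above can be evaluated with the following NumPy sketch. The function name and the default f_bias = 0 are assumptions for illustration.

```python
import numpy as np

def rastrigin_f4(x, o, f_bias=0.0):
    """F4: sum_i (z_i^2 - 10 cos(2 pi z_i) + 10) + f_bias with z = x - o."""
    z = np.asarray(x, dtype=float) - np.asarray(o, dtype=float)
    # Each coordinate contributes z_i^2 plus a cosine ripple of amplitude 10.
    return float(np.sum(z**2 - 10.0 * np.cos(2.0 * np.pi * z) + 10.0) + f_bias)
```

Because cos(2πz_i) = 1 at every integer z_i, the cosine term creates a regular grid of local minima near the integer offsets from o, which is what makes F4 highly multimodal.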
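F6: Schwefel’s Function
$$F_6(x) = \sum_{i=1}^{D} \left( A_i - B_i(x) \right)^{2} + f_{\text{bias}}$$
$$A_i = \sum_{j=1}^{D} \left( a_{ij}\sin\alpha_j + b_{ij}\cos\alpha_j \right), \qquad B_i(x) = \sum_{j=1}^{D} \left( a_{ij}\sin x_j + b_{ij}\cos x_j \right)$$
A, B are two D×D matrices, a_ij, b_ij are integer random numbers in the range [-100, 100], α = [α_1, α_2, ..., α_D], α_j are random numbers in the range [-π, π].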
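A NumPy sketch of F6 follows. It draws A, B, and α as described above from a seeded generator; the seed, the dimension D = 4, and the function name are illustrative assumptions. By construction, x = α gives A_i = B_i(x) for every i, so F6 attains its minimum f_bias there.

```python
import numpy as np

def schwefel_f6(x, A, B, alpha, f_bias=0.0):
    """F6: sum_i (A_i - B_i(x))^2 + f_bias."""
    x = np.asarray(x, dtype=float)
    # A_i = sum_j (a_ij sin(alpha_j) + b_ij cos(alpha_j)), computed for all i at once.
    target = A @ np.sin(alpha) + B @ np.cos(alpha)
    # B_i(x) = sum_j (a_ij sin(x_j) + b_ij cos(x_j)).
    current = A @ np.sin(x) + B @ np.cos(x)
    return float(np.sum((target - current) ** 2) + f_bias)

# Hypothetical problem instance: D = 4, seed 1.
rng = np.random.default_rng(1)
D = 4
A = rng.integers(-100, 101, size=(D, D))    # integer entries a_ij in [-100, 100]
B = rng.integers(-100, 101, size=(D, D))    # integer entries b_ij in [-100, 100]
alpha = rng.uniform(-np.pi, np.pi, size=D)  # alpha_j in [-pi, pi]
```

Since sine and cosine are periodic, shifting any coordinate of α by a multiple of 2π gives the same value, so the function has repeated copies of its optimum.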
F7: Double-Rastrigin’s Function
z = x - o, x = [x_1, x_2, ..., x_D], D: dimensions, o = [o_1, o_2, ..., o_D]: the shifted global optimum.
F8: Weierstrass’s Function
$$F_8(x) = \sum_{i=1}^{D} \left( \sum_{k=0}^{k_{\max}} \left[ a^{k}\cos\left(2\pi b^{k}\left(z_i + 0.5\right)\right) \right] \right) - D\sum_{k=0}^{k_{\max}} \left[ a^{k}\cos\left(2\pi b^{k}\cdot 0.5\right) \right] + f_{\text{bias}}$$
a = 0.5, b = 3, k_max = 20, z = (x - o) * M, x = [x_1, x_2, ..., x_D], D: dimensions, o = [o_1, o_2, ..., o_D]: the shifted global optimum, M: linear transformation matrix, condition number = 5.
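F8 can be evaluated with the following NumPy sketch, using a = 0.5, b = 3, and k_max = 20 from the definition above. The function name, the default f_bias = 0, and accepting any matrix M (rather than generating the condition-number-5 transformation of [38]) are assumptions for illustration. Every inner term is minimized when cos(2πb^k(z_i + 0.5)) = −1, which holds at z_i = 0 because b^k is odd. The subtracted constant therefore makes F8 equal f_bias at x = o.

```python
import numpy as np

def weierstrass_f8(x, o, M, a=0.5, b=3.0, k_max=20, f_bias=0.0):
    """F8: shifted, linearly transformed Weierstrass function (row-vector convention)."""
    # z = (x - o) M, with x and o treated as row vectors.
    z = (np.asarray(x, dtype=float) - np.asarray(o, dtype=float)) @ np.asarray(M, dtype=float)
    k = np.arange(k_max + 1)  # k = 0, 1, ..., k_max
    a_k = a ** k              # a^k
    b_k = b ** k              # b^k
    # First double sum over i (rows) and k (columns): a^k cos(2 pi b^k (z_i + 0.5)).
    first = np.sum(a_k * np.cos(2.0 * np.pi * b_k * (z[:, None] + 0.5)))
    # Constant term D * sum_k a^k cos(2 pi b^k * 0.5), which makes F8(o) = f_bias.
    second = z.size * np.sum(a_k * np.cos(2.0 * np.pi * b_k * 0.5))
    return float(first - second + f_bias)
```

Truncating the series at k_max keeps the function finite, while the factors b^k = 3^k make it oscillate rapidly at small scales, so it is continuous but very rugged.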