National University of Kaohsiung Repository System:Item 310360000Q/10511


Master's Thesis, Institute of Statistics, National University of Kaohsiung
Adaptive Search Region Methods with Derivative Information in Computer Experiment
Student: Hui-Chan Chien (簡暉展)
Advisor: Dr. Ray-Bing Chen (陳瑞彬)
July 2009

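Acknowledgments

Two years of master's study have passed in the blink of an eye. I have learned a great deal, going from not knowing how to write programs to being able to write them on my own, and I have finally completed this thesis. First, I thank Professor Ray-Bing Chen for his guidance; he always patiently re-explained the points I did not understand, and his constant encouragement gave me the confidence to complete complicated tasks. I also thank Professors 王偉仲 and 莊曜遠 for their guidance; their frequent suggestions gave me much to consider and learn from. Thanks also to my seniors 依帆, 鍊奇, and 能凱, who often answered my programming questions when I had just begun my second year. Thank you all. I also thank 藍屏 and 怡如 for their advice and help during the full year I served as class representative; although I was not a very capable representative, I am grateful for their effort and hard work! I also thank my classmates at the Institute of Statistics, who discussed problems and helped one another whenever we had difficulties with coursework; they have all been role models for me. Finally, I thank my parents, who spared me any financial burden during my studies. Thank you for all your hard work!

Hui-Chan Chien
Institute of Statistics, National University of Kaohsiung
July 2009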

Adaptive Search Region Methods with Derivative Information in Computer Experiment
by Hui-Chan Chien
Advisor: Ray-Bing Chen
Institute of Statistics, National University of Kaohsiung
Kaohsiung, Taiwan 811, R.O.C.
July 2009

Contents
Chinese Abstract
English Abstract
1 Introduction
2 Algorithm
  2.1 Adaptive Search Regions Method
    2.1.1 Defining the Grid
    2.1.2 Choosing Initial Experimental Points
    2.1.3 Constructing the Surrogate Surface
    2.1.4 Choosing a New Point and Verifying the Stopping Criterion
    2.1.5 Shrinking the Search Region and Refining Grids
    2.1.6 Stopping Criteria for Accuracy Control
  2.2 Cooperate with Derivative Information
    2.2.1 Surrogate Construction with Derivative Information
    2.2.2 Search Region by Derivative Information
  2.3 Adaptive Search Region Method with Derivative Information
  2.4 Multiple Minima
    2.4.1 Retracing Way
    2.4.2 Determining a New Optimal Point and Stopping Criteria
  2.5 The whole algorithm
    2.5.1 Stopping Criteria for the Algorithm
    2.5.2 The theoretical algorithm
3 Convergence Result
4 Numerical Experiments
  4.1 Smooth Function with a Single Minimum
    4.1.1 Banana shape function
  4.2 Smooth Function with a Multiple Local Minimum
    4.2.1 Camel function
    4.2.2 Müller-Brown Surface
    4.2.3 Branin function
    4.2.4 2-D Ackley's function
    4.2.5 2-D Griewank function
  4.3 High Dimension
    4.3.1 3-D Ackley function
    4.3.2 Shekel's family
    4.3.3 Hartman's family
    4.3.4 Rosenbrock family
  4.4 Summary of the result
5 Conclusion
Reference


Adaptive Search Region Methods with Derivative Information in Computer Experiment
Advisor: Dr. Ray-Bing Chen, Institute of Statistics, National University of Kaohsiung
Student: Hui-Chan Chien, Institute of Statistics, National University of Kaohsiung

ABSTRACT

The Adaptive Search Region Method (ASRM), proposed by Lai (2008), is a surrogate-assisted method for finding all local extrema of a response function in a bounded experimental region. In this work, in addition to the usual assumptions on the response function in computer experiments, we assume that derivative information is also available. We therefore propose a modified ASRM that incorporates derivative information into both surrogate construction and the search for extreme points. Several numerical experiments with different dimensionalities demonstrate the efficiency of the new method.

Keywords: Adaptive Search Region Method, retracing, surrogate surface, uniform design, optimization, derivative information.

1. Introduction.

In this article, we consider the problem of minimizing an objective function $f: \mathbb{R}^n \to \mathbb{R}$ subject to bound constraints, i.e.,

$$\min_{x} f(x) \quad \text{s.t.} \quad a \le x \le b, \tag{1.1}$$

where $a, b \in \mathbb{R}^n$ are the lower and upper bounds on the decision variable. We are interested in unknown functions with the following characteristics. First, $f(x)$ is either complicated or defined only implicitly. Second, simulating each function value is computationally expensive. The most common challenge in developing optimization methods for such problems is reducing the cost of computing or simulating function values. The goal in designing an optimization algorithm is therefore to avoid excessive function evaluations and to make full use of the information obtained at each iteration.

During the past 50 years, many advances have been made in optimization methods. Response surface methodology (RSM) is one of the most common approaches for building an approximate model; its foundations were established by Box and Wilson [4]. Pattern search methods evolved directly from the statistical literature on RSM. Generalized Pattern Search (GPS) was described in [13] and [19]. Lewis and Torczon discussed pattern search methods for bound-constrained and linearly constrained minimization and analyzed their convergence in [11] and [12]. Audet and Dennis proposed a new convergence analysis for GPS in [2], which combines the arguments of [19], [11], and [12]. Furthermore, Abramson, Audet and Dennis presented GPS with derivative information in [1].

The methods above are what are usually referred to as "direct methods." Another class of methods uses a "surrogate approximation" scheme for optimization. One popular method is the Design and Analysis of Computer Experiments (DACE), described in [16]; some analyses of DACE and global optimization are presented in [8], [9], and [10]. Trosset and Torczon proposed a new iterative optimization algorithm called the Model-Assisted Grid Search (MAGS) [20]. MAGS was later modified by Siefert and renamed the Model-Assisted Pattern Search (MAPS) [17]; it is essentially a combination of DACE and the pattern search method.

Lai proposed the Adaptive Search Regions Method (ASRM) in 2008. Its purpose is to locate a local minimum more accurately and to identify multiple local minima. ASRM was originally derived from the Basis-based Response Surface Method (BRSM). The main idea of ASRM is to construct a global surrogate surface as a rough fit to the true surface; the algorithm then distinguishes multiple local minima by shrinking the search region. BRSM and Kriging are used to construct the global surrogate surface. BRSM, proposed by Wang and Chen in 2008, approximates the response surface by a linear combination of basis functions. Kriging is a geostatistical method for interpolating a random field; it approximates the response by a linear combination of regression functions plus a correlated random process.

Here we combine ASRM with derivative information (ASRM-DI). The method modifies ASRM in two places: surrogate construction and reduction of the search region. For surrogate construction, we use Hermite interpolation and Cokriging approximation models. Hermite interpolation is used to construct gradient-enhanced radial basis function (RBF) approximations; for details, see [15]. Cokriging approximation models were proposed by Chung and Alonso [6]; the method extends Kriging by incorporating secondary information, such as gradients, in addition to the primary function values. For the search region, Abramson, Audet and Dennis [1] used gradients to decrease the worst-case cost of the POLL step at each iterate, and we employ the same idea to reduce the search region. We also give a convergence proof for this algorithm that essentially follows the analysis of Lin (2009). Another concern of the method is finding multiple local minima automatically.

2. Algorithm.

The Adaptive Search Regions Method (ASRM) is a framework for finding a minimum accurately and for finding multiple local minima automatically. In Lai (2008), ASRM was applied to several numerical experiments to identify all local extrema. In this thesis we assume that derivative information is available, and we want to combine ASRM with this information efficiently; the resulting method is called ASRM-DI for short. In this section, we first describe ASRM and then focus on how to combine it with derivative information.

2.1. Adaptive Search Regions Method.

Here we describe the Adaptive Search Regions Method step by step. ASRM consists of the following steps:

Table 1: Adaptive Search Regions Method
(1) Defining the grid.
(2) Choosing initial experimental points.
(3) Constructing the surrogate surface.
(4) Choosing a new point and verifying the stopping criterion.
(5) Shrinking the search region and refining grids.
(6) Stopping criteria for accuracy control.

2.1.1. Defining the Grid.

Given an experimental domain of interest, the grid points are chosen first; the grid is denoted by $G_k$. Following the ideas of pattern search methods, the grid is defined through two components, a main matrix and a minor matrix. The minor matrix $B \in \mathbb{R}^{d \times d}$ is a diagonal matrix with positive diagonal elements, where $d$ is the dimension. The main matrix is $A \in \mathbb{Z}^{d \times N}$, where $N = p_1 \times p_2 \times \cdots \times p_d$ and $p_i$ is the number of grid points in dimension $i$. We decompose $A$ as

$$A = [M \;\; -M \;\; L] = [\Gamma \;\; L], \tag{2.2}$$

where $M$ and $-M$ determine the boundary of the grid, and $L$ contains the other grid points, including the column of zeros that corresponds to the center of the grid. Given a grid size $\Delta_k > 0$ and a center $C_k$, we define a trial step in the grid as

$$g_k^i = \Delta_k \cdot B \cdot a_i + C_k, \tag{2.3}$$

where $a_i$ is the $i$-th column of $A = [a_1 \; a_2 \; \cdots \; a_N]$. Here $B \cdot a_i$ determines the direction of the step, and $\Delta_k$ is a step-length parameter.

To illustrate how the grid $G_k$ is constructed from $A$ and $B$, consider the two-dimensional case with $p_1 = p_2 = 3$ grid points in each dimension. We choose the minor matrix $B = I_2$, so the grid size is the same in each dimension, and

$$M = \begin{bmatrix} \frac{p_1-1}{2} & 0 \\ 0 & \frac{p_2-1}{2} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad -M = \begin{bmatrix} -\frac{p_1-1}{2} & 0 \\ 0 & -\frac{p_2-1}{2} \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}.$$

All columns of $A$ (those of $M$, $-M$ and $L$) can be generated as follows:

for i = 1 to p_1 do
  for j = 1 to p_2 do
    a^(i,j) = ( i − (p_1 + 1)/2 , j − (p_2 + 1)/2 )^T
  end
end

With $d = 2$ and $p_1 = p_2 = 3$, this gives

$$A = \begin{bmatrix} 1 & 0 & -1 & 0 & -1 & -1 & 0 & 1 & 1 \\ 0 & 1 & 0 & -1 & 1 & -1 & 0 & -1 & 1 \end{bmatrix}.$$

Letting $\Delta_k = 0.5$ and $C_k = (0, 0)^T$, we obtain the grid

$$G_k = \begin{bmatrix} 0.5 & 0 & -0.5 & 0 & -0.5 & -0.5 & 0 & 0.5 & 0.5 \\ 0 & 0.5 & 0 & -0.5 & 0.5 & -0.5 & 0 & -0.5 & 0.5 \end{bmatrix}.$$

2.1.2. Choosing Initial Experimental Points.

The choice of initial experimental points is an important issue. We use the uniform design (Fang et al., 2000) to pick the initial experimental points $P_{init} \subset G_k$. The uniform design seeks design points that are uniformly scattered over the domain; the benefit of this choice is that it does not introduce operator bias into the analysis. After choosing and evaluating the initial experimental points, we have a temporary optimal point $O_{init} = \arg\min\{f(x) \mid x \in P_{init}\}$. As described in Section 2.1.4, a specific criterion then lets the algorithm decide whether the search region should be shrunk.

2.1.3. Constructing the Surrogate Surface.

• Kriging
Kriging is a geostatistical method for interpolating a random field. It was developed in spatial statistics and can also be applied to computer experiments. The surrogate model is

$$\tilde S(x) = \sum_{j=1}^{k} \beta_j r_j(x) + Z(x), \tag{2.4}$$

where $\beta_j$ are the regression coefficients and $r_j(x)$, $j = 1, \ldots, k$, are the regression functions. Here $Z(x)$ is a random process with mean zero and covariance function

$$c(s, t) = \sigma^2 C_\theta(s, t), \tag{2.5}$$

where $\sigma^2 > 0$ is the process variance and $C_\theta(s, t)$ is the correlation function. The best linear unbiased predictor (BLUP) $\tilde f(x) = w^T(x) f(P_{exp})$ is obtained by choosing the weight vector $w(x)$ to minimize

$$\mathrm{MSE}[\tilde f(x)] = E\left[w^T(x) f(P_{exp}) - f(x)\right]^2, \tag{2.6}$$

subject to the unbiasedness constraint

$$E[w^T(x) f(P_{exp})] = E[f(x)]. \tag{2.7}$$

Suppose that $P_{exp}$ consists of $p$ experimental points $x_1, \ldots, x_p$ ($1 \le p \le N$). To define the BLUP of the response at an untried point $x$, let

$$r(x) = [r_1(x), \ldots, r_k(x)]^T \tag{2.8}$$

be the vector of the $k$ regression functions,

$$R = [r(x_1), \ldots, r(x_p)]^T \tag{2.9}$$

the $p \times k$ regression matrix,

$$C = \{C(x_i, x_j)\}, \quad 1 \le i, j \le p, \tag{2.10}$$

the $p \times p$ matrix of correlations of $Z$ at the points of $P_{exp}$, and

$$c(x) = [C(x_1, x), \ldots, C(x_p, x)]^T \tag{2.11}$$

the vector of correlations between $Z$ at $P_{exp}$ and at the untried point $x$. The MSE can then be written as

$$\sigma^2\left[1 + w^T(x) C w(x) - 2 w^T(x) c(x)\right], \tag{2.12}$$

and the unbiasedness constraint becomes $R^T w(x) = r(x)$. Introducing Lagrange multipliers $\lambda(x)$ for this constrained minimization of the MSE, the BLUP weights $w(x)$ must satisfy

$$\begin{bmatrix} 0 & R^T \\ R & C \end{bmatrix} \begin{bmatrix} \lambda(x) \\ w(x) \end{bmatrix} = \begin{bmatrix} r(x) \\ c(x) \end{bmatrix}. \tag{2.13}$$

The surrogate surface can then be written as

$$\tilde S(x) = r^T(x) \tilde\beta + c^T(x) C^{-1}\left(f(P_{exp}) - R\tilde\beta\right), \tag{2.14}$$

where $\tilde\beta = (R^T C^{-1} R)^{-1} R^T C^{-1} f(P_{exp})$ is the generalized least-squares estimate of $\beta$.
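To make (2.13) and (2.14) concrete, here is a minimal NumPy sketch of the Kriging predictor. It assumes ordinary kriging (a single constant regression function, so $k = 1$ and $R$ is a column of ones) and a Gaussian correlation $C_\theta(s,t) = \exp(-\theta\|s-t\|^2)$ with $\theta$ fixed by hand rather than estimated; the function names `gaussian_corr` and `fit_kriging` are hypothetical, not taken from the thesis.

```python
import numpy as np

def gaussian_corr(X1, X2, theta):
    """Gaussian correlation C(s, t) = exp(-theta * ||s - t||^2) between the rows of X1 and X2."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-theta * d2)

def fit_kriging(X, y, theta=1.0):
    """Ordinary kriging: one constant regression function, so R is a column of ones (k = 1)."""
    p = X.shape[0]
    C = gaussian_corr(X, X, theta)                        # p x p correlation matrix, eq. (2.10)
    R = np.ones((p, 1))                                   # p x k regression matrix, eq. (2.9)
    Cinv_R = np.linalg.solve(C, R)
    Cinv_y = np.linalg.solve(C, y)
    beta = np.linalg.solve(R.T @ Cinv_R, R.T @ Cinv_y)    # generalized least-squares estimate
    weights = np.linalg.solve(C, y - R @ beta)            # C^{-1} (f(P_exp) - R beta)

    def predict(Xnew):
        Xnew = np.atleast_2d(Xnew)
        c = gaussian_corr(Xnew, X, theta)                 # each row is c(x)^T, eq. (2.11)
        r = np.ones((Xnew.shape[0], 1))                   # each row is r(x)^T, eq. (2.8)
        return r @ beta + c @ weights                     # eq. (2.14)
    return predict

# usage: fit on a 3 x 3 grid in [0, 1]^2 and check that the surrogate interpolates the data
g = np.linspace(0.0, 1.0, 3)
X = np.array([[u, v] for u in g for v in g])
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2
S = fit_kriging(X, y, theta=10.0)
print(np.allclose(S(X), y))  # True: kriging reproduces the observed values
```

Because the predictor reproduces $f$ exactly at the points of $P_{exp}$, only its values at untried grid points carry new information when it is minimized.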

2.1.4. Choosing a New Point and Verifying the Stopping Criterion.

In this subsection, we use the surrogate surface $\tilde S$ to determine the next new point $P_{new}$ from $G_k \setminus P_{exp}$. An optimization method is applied to $\tilde S$ to find $P_{new}$, i.e., $P_{new} = \arg\min\{\tilde S(x) \mid x \in G_k \setminus P_{exp}\}$. We then add $P_{new}$ to $P_{exp}$ and obtain a new candidate optimal point $O_{k+1} = \arg\min\{f(x) \mid x \in P_{exp}\}$. Next, the algorithm checks the criterion for shrinking the search region: once $O_{k+1}$ is identified, it checks whether $f(O_{k+1})$ is lower than $f(O_k)$ by computing $\rho_k = f(O_k) - f(O_{k+1})$. If $\rho_k > 0$, let $O_k = O_{k+1}$ and continue adding new points. If instead $\rho_k \le 0$, the search region is shrunk and the grid refined. In this way the search proceeds over a sequence of ever-smaller regions.

2.1.5. Shrinking the Search Region and Refining Grids.

If $\rho_k \le 0$, the current search region has failed to produce an improving $P_{new}$, so the search region is shrunk and the grid refined; the algorithm then tries to find a $P_{new}$ with a lower function value in the new search region. When no point of the current grid $G_k$ has a function value lower than the current optimal point $O_k = \arg\min\{f(x) \mid x \in P_{exp}\}$, we assume that a lower point can be found near $O_k$ with a smaller grid size. We therefore take the current optimal point as the center $C_{k+1} = O_k$ of a new grid $G_{k+1}$, and refine the grid-size parameter by the ratio $1/R$ with $R \ge 2$: letting $\Delta_{k+1} = \Delta_k / R$, we have

$$G_{k+1} = \bigcup_{i=1}^{N} \{g_{k+1}^i\}, \qquad g_{k+1}^i = \Delta_{k+1} \cdot B \cdot a_i + C_{k+1}.$$

The main reasons for this kind of grid reduction are as follows. We want to reuse function values that have already been computed, i.e., those at experimental points in $P_{exp}$. This is possible because the new experimental region is centered at the optimal point of $P_{exp}$ found before the reduction; we believe that the local minimum lies near this center, even though the minimum itself has not yet been found. The range of each dimension becomes half of its value in the previous experimental region (for $R = 2$), while the number of grid points stays the same. However, some parts of the new domain might not lie completely inside the previous experimental region; in that case the algorithm shifts the new region. By the same principle, the grid size is always $1/R$ times the previous grid size. Hence some grid points of the previous experimental region also belong to the new one, and the algorithm reuses the information obtained there: in the new search region, only the points whose function values have already been evaluated are taken as the initial experimental points. The steps of the grid reduction are:

1. The optimal point in $P_{exp}$ determines the center of the new experimental region.
2. A region-checking algorithm verifies whether the new experimental region extends beyond the previous one and, if so, shifts it back inside. For simplicity, we assume that the number of grid points $p_i$ in each dimension is odd.

Algorithm. CHECKING REGION
for i = 1 to d do
  if C(i) − (1/2)(p_i − 1)·(1/2)Δ_i ≤ x(i)_min then
    new_x(i)_min = x(i)_min;  new_x(i)_max = x(i)_min + (p_i − 1)·(1/2)Δ_i
  elseif C(i) + (1/2)(p_i − 1)·(1/2)Δ_i ≥ x(i)_max then
    new_x(i)_min = x(i)_max − (p_i − 1)·(1/2)Δ_i;  new_x(i)_max = x(i)_max
  else
    new_x(i)_min = C(i) − (1/2)(p_i − 1)·(1/2)Δ_i;  new_x(i)_max = C(i) + (1/2)(p_i − 1)·(1/2)Δ_i
  endif
end

Here C(i) is the i-th coordinate of the new center, [x(i)_min, x(i)_max] is the range of the previous region in dimension i, and Δ_i is the previous grid size in dimension i.

2.1.6. Stopping Criteria for Accuracy Control.

The procedure is iterated until some stopping criteria are met. These criteria may be tuned to suit a given experiment and the desired degree of accuracy. Some commonly used stopping conditions are:
• the mesh size is less than a certain numerical tolerance;
• $\rho_k = f(O_k) - f(O_{k+1})$ becomes less than a certain numerical tolerance;
• the available computational resources run out.
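The grid bookkeeping of Sections 2.1.1 and 2.1.5 can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the thesis code: the helper names are hypothetical, every $p_i$ is assumed odd, the previous region is passed as coordinate-wise bounds `x_min`, `x_max`, and the shrink ratio $R$ is a parameter (CHECKING REGION above corresponds to $R = 2$).

```python
import itertools
import numpy as np

def main_matrix(p):
    """Main matrix A (d x N): all integer offsets i - (p_j + 1)/2, i = 1..p_j, for odd p_j."""
    offsets = [np.arange(1, pj + 1) - (pj + 1) // 2 for pj in p]
    return np.array(list(itertools.product(*offsets))).T

def make_grid(A, B, delta, center):
    """Grid points g^i = delta * B a_i + C as the columns of a d x N array (eq. 2.3)."""
    return delta * (B @ A) + np.asarray(center, float)[:, None]

def check_region(center, p, new_step, x_min, x_max):
    """CHECKING REGION: box of width (p_i - 1) * new_step around center, shifted inside [x_min, x_max]."""
    half = 0.5 * (np.asarray(p) - 1) * new_step
    lo, hi = np.empty(len(p)), np.empty(len(p))
    for i in range(len(p)):
        if center[i] - half[i] <= x_min[i]:        # would stick out below: align with x_min
            lo[i], hi[i] = x_min[i], x_min[i] + 2 * half[i]
        elif center[i] + half[i] >= x_max[i]:      # would stick out above: align with x_max
            lo[i], hi[i] = x_max[i] - 2 * half[i], x_max[i]
        else:                                      # fits: keep it centered at the best point
            lo[i], hi[i] = center[i] - half[i], center[i] + half[i]
    return lo, hi

def refine(best, delta, p, B, x_min, x_max, R=2.0):
    """Shrink step: Delta_{k+1} = Delta_k / R, recentered at the best point, kept inside the old region."""
    new_delta = delta / R
    lo, hi = check_region(best, p, new_delta * np.diag(B), x_min, x_max)
    return new_delta, 0.5 * (lo + hi)

# usage: reproduce the 3 x 3 example of Section 2.1.1 (column order may differ from the text)
A = main_matrix([3, 3])
G = make_grid(A, np.eye(2), 0.5, [0.0, 0.0])
print(G.T)
```

Returning the center of the shifted box, rather than the best point itself, keeps the refined grid inside the previous region, which is what lets earlier evaluations be reused.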

2.2. Cooperate with Derivative Information.

2.2.1. Surrogate Construction with Derivative Information.

In Section 2.1.3 we presented methods for constructing the surrogate (BRSM and Kriging). Here we assume that gradient information about the response surface is also available, and we describe two surrogate construction methods that use it: Hermite interpolation and Cokriging approximation models.

• Hermite Interpolation
We use Hermite interpolation to construct gradient-enhanced radial basis function (RBF) approximations. The following outlines how Hermite interpolation can be implemented with RBFs when sensitivity information is cheaply available, for example via an adjoint CFD solver. Denote the explored dataset by $\{z^i, f(z^i), \nabla f(z^i)\}$, $i = 1, 2, \ldots, m$, where $z^i \in \mathbb{R}^d$ is the input vector, $f(z^i)$ is the output to be approximated, and $\nabla f = (\partial f/\partial z_1, \partial f/\partial z_2, \ldots, \partial f/\partial z_d)$ is the vector of partial derivatives of $f(z)$ with respect to the components of the input vector. A Hermite interpolant for $f(z)$ can then be written in terms of RBFs as

$$\hat f(z) = \sum_{i=1}^{m} \beta_i \phi(\|z - z^i\|) + \sum_{i=1}^{m} \sum_{j=1}^{d} \tilde\beta_{ij} \frac{\partial \phi}{\partial z_j}(\|z - z^i\|),$$

where $\phi(\|z - z^i\|)$ is a radial basis function that is at least twice differentiable, and $\beta_i$ and $\tilde\beta_{ij}$ ($i = 1, \ldots, m$; $j = 1, \ldots, d$) are $m(d+1)$ undetermined weights. Since the training dataset contains both $f(z)$ and $\nabla f(z)$ at each point, we obtain $m(d+1)$ linear algebraic equations for the undetermined coefficients of the RBF model. The first $m$ equations come from the function values at the points $z^i$:

$$\hat f(z^i) = f(z^i), \quad i = 1, 2, \ldots, m.$$

An additional $md$ equations follow from the derivative information in the training dataset:

$$\nabla \hat f(z^i) = \nabla f(z^i), \quad i = 1, 2, \ldots, m.$$

To impose these conditions, we differentiate the interpolant $\hat f(z)$ with respect to $z_k$, which gives

$$\frac{\partial \hat f(z)}{\partial z_k} = \sum_{i=1}^{m} \beta_i \frac{\partial \phi}{\partial z_k}(\|z - z^i\|) + \sum_{i=1}^{m} \sum_{j=1}^{d} \tilde\beta_{ij} \frac{\partial^2 \phi}{\partial z_k \partial z_j}(\|z - z^i\|).$$

Given $m$ data points for a problem with $d$ variables, the function-value and gradient conditions give a total of $m(d+1)$ linear algebraic equations, which can be written compactly as $A\beta = y$, where

$$\beta = (\beta_1, \tilde\beta_{11}, \ldots, \tilde\beta_{1d}, \beta_2, \tilde\beta_{21}, \ldots, \tilde\beta_{2d}, \ldots, \beta_m, \tilde\beta_{m1}, \ldots, \tilde\beta_{md})^T \in \mathbb{R}^{m(d+1)},$$

$$y = \left(f(z^1), \frac{\partial f}{\partial z_1}(z^1), \ldots, \frac{\partial f}{\partial z_d}(z^1), \ldots, f(z^m), \frac{\partial f}{\partial z_1}(z^m), \ldots, \frac{\partial f}{\partial z_d}(z^m)\right)^T \in \mathbb{R}^{m(d+1)}.$$

The coefficient matrix $A \in \mathbb{R}^{m(d+1) \times m(d+1)}$ can be written in partitioned form in terms of $m \times m$ blocks:

$$A = \begin{bmatrix} \Phi_{11} & \Phi_{12} & \cdots & \Phi_{1m} \\ \Phi_{21} & \Phi_{22} & \cdots & \Phi_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \Phi_{m1} & \Phi_{m2} & \cdots & \Phi_{mm} \end{bmatrix},$$

where each $(d+1) \times (d+1)$ block, with every entry evaluated at $\|z^i - z^j\|$, is

$$\Phi_{ij} = \begin{bmatrix} \phi & \frac{\partial \phi}{\partial z_1} & \cdots & \frac{\partial \phi}{\partial z_d} \\ \frac{\partial \phi}{\partial z_1} & \frac{\partial^2 \phi}{\partial z_1^2} & \cdots & \frac{\partial^2 \phi}{\partial z_1 \partial z_d} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial \phi}{\partial z_d} & \frac{\partial^2 \phi}{\partial z_d \partial z_1} & \cdots & \frac{\partial^2 \phi}{\partial z_d^2} \end{bmatrix}.$$
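A minimal NumPy sketch of assembling and solving $A\beta = y$ for the Hermite interpolant follows. The text does not fix $\phi$ at this point, so the sketch assumes a Gaussian RBF $\phi(r) = \exp(-\varepsilon r^2)$ with a hand-picked shape parameter (`EPS = 1.0` is a hypothetical choice) and uses a dense solver; the function names are hypothetical.

```python
import numpy as np

EPS = 1.0  # hypothetical shape parameter of the Gaussian RBF phi(r) = exp(-EPS * r^2)

def psi_derivs(u, eps=EPS):
    """Value, gradient and Hessian of psi(u) = exp(-eps * ||u||^2), evaluated at u = z - z^i."""
    val = np.exp(-eps * (u @ u))
    grad = -2.0 * eps * u * val
    hess = (4.0 * eps**2 * np.outer(u, u) - 2.0 * eps * np.eye(len(u))) * val
    return val, grad, hess

def fit_hermite_rbf(Z, F, G, eps=EPS):
    """Solve A beta = y for the Hermite (gradient-enhanced) RBF interpolant.

    Z: (m, d) sites z^i, F: (m,) values f(z^i), G: (m, d) gradients grad f(z^i).
    """
    m, d = Z.shape
    n = d + 1
    A = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(m):
            val, grad, hess = psi_derivs(Z[i] - Z[j], eps)
            block = np.empty((n, n))                 # Phi_ij in the text
            block[0, 0] = val
            block[0, 1:] = grad                       # row f_hat(z^i): weights beta~_jl
            block[1:, 0] = grad                       # rows d f_hat / d z_k (z^i): weight beta_j
            block[1:, 1:] = hess
            A[i * n:(i + 1) * n, j * n:(j + 1) * n] = block
    y = np.column_stack([F, G]).ravel()              # (f, df/dz_1, ..., df/dz_d) for each point
    coef = np.linalg.solve(A, y).reshape(m, n)       # row i: (beta_i, beta~_i1, ..., beta~_id)

    def predict(z):
        """Return f_hat(z) and its gradient."""
        z = np.asarray(z, float)
        f, g = 0.0, np.zeros(d)
        for i in range(m):
            val, grad, hess = psi_derivs(z - Z[i], eps)
            f += coef[i, 0] * val + coef[i, 1:] @ grad
            g += coef[i, 0] * grad + hess @ coef[i, 1:]
        return f, g
    return predict

# usage: f(x, y) = x^2 + 3y sampled at the corners of the unit square
Z = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
F = Z[:, 0] ** 2 + 3 * Z[:, 1]
Grad = np.column_stack([2 * Z[:, 0], np.full(4, 3.0)])
fhat = fit_hermite_rbf(Z, F, Grad)
print(fhat([1.0, 0.0]))  # close to f = 1 and gradient (2, 3)
```

The interpolant matches both the values and the gradients at every site, so four points already supply twelve conditions in two dimensions.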

As the derivation of $\Phi_{ij}$ shows, Hermite interpolation requires the RBF $\phi$ to be at least twice differentiable.

• Cokriging Approximation Models
This construction includes gradient information in the Kriging method by predicting additional function values from the gradients in a close neighborhood of each explored point. In this approach, the original Kriging formulation can be used with an increased number of sample points located close to the original ones. These additional function values play the role of the gradients, because they are strongly correlated with the original sample points given their short distances.

2.2.2. Search Region by Derivative Information.

GPS with derivative information was proposed by Abramson, Audet and Dennis [1], who showed how to use gradients to decrease the worst-case cost of the POLL step at each iterate. To find a descent direction quickly, we employ the same idea to reduce the grid $G_k$ to a reduced region $G_{red}$, from which the next new point $P_{new}$ is chosen. The detailed process is as follows. Given a grid $G_k$, the indicator point $P_I$ is

$$P_I = \arg\min\{\tilde S(x) \mid x \in G_k\}. \tag{2.15}$$

When the gradient $\nabla \tilde S(P_I)$ is available, we define the indicator direction

$$\dot P_I = -\nabla \tilde S(P_I). \tag{2.16}$$

This direction determines the search region: we keep the points $x \in G_k$ for which the angle between the vector $x - P_I$ and $\dot P_I$ is less than 45 degrees. The set of these points is

$$G_{red} = \left\{ x \in G_k \;\middle|\; \arccos\left\langle \frac{x - P_I}{\|x - P_I\|}, \frac{\dot P_I}{\|\dot P_I\|} \right\rangle < 45^\circ \right\}. \tag{2.17}$$
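Equations (2.15) to (2.17), together with the choice of $P_{new}$ in (2.18), translate directly into a small selection routine. The sketch below is a hypothetical NumPy illustration: grid points are rows of an array, the surrogate gradient at $P_I$ is passed in, points are compared after rounding to 12 decimals, and the optional `excluded` argument can hold $P_{exp}$ as well as the disregarded areas $G_d$ of Section 2.4.2.

```python
import numpy as np

def reduced_region(grid, P_I, grad_at_PI, max_angle_deg=45.0):
    """G_red (eq. 2.17): grid points x whose angle between x - P_I and -grad S(P_I) is < max_angle_deg.

    grid: (n, d) array whose rows are the points of G_k; P_I: (d,); grad_at_PI: (d,).
    """
    direction = -np.asarray(grad_at_PI, float)        # indicator direction, eq. (2.16)
    dnorm = np.linalg.norm(direction)
    if dnorm == 0.0:                                  # no descent information at P_I
        return grid[:0]
    v = grid - P_I                                    # vectors x - P_I
    norms = np.linalg.norm(v, axis=1)
    cosang = np.full(len(grid), -1.0)                 # x = P_I itself has no direction
    ok = norms > 0
    cosang[ok] = (v[ok] @ direction) / (norms[ok] * dnorm)
    return grid[cosang > np.cos(np.deg2rad(max_angle_deg))]

def next_point(surrogate, G_red, excluded=()):
    """P_new = argmin of the surrogate over G_red minus the excluded points (eq. 2.18)."""
    skip = {tuple(np.round(x, 12)) for x in excluded}
    cand = [x for x in G_red if tuple(np.round(x, 12)) not in skip]
    return min(cand, key=surrogate) if cand else None

def S(x):
    """Hypothetical surrogate used only for this example."""
    return (x[0] - 1.0) ** 2 + (x[1] - 0.9) ** 2

# usage: 3 x 3 grid, P_I at the origin, surrogate gradient (-2, -1) there
grid = np.array([[a, b] for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0)])
G_red = reduced_region(grid, np.zeros(2), np.array([-2.0, -1.0]))
print(G_red, next_point(S, G_red, excluded=[np.array([1.0, 1.0])]))
```

Comparing cosines instead of angles avoids calling `arccos`; since arccos is decreasing, "angle < 45°" is the same as "cosine > cos 45°".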

When the reduced region $G_{red}$ is known, the next new point $P_{new}$ is chosen as

$$P_{new} = \arg\min\{\tilde S(x) \mid x \in G_{red} \setminus P_{exp}\}. \tag{2.18}$$

2.3. Adaptive Search Region Method with Derivative Information.

The adaptive search region method with derivative information (ASRM-DI) is a modified ASRM that uses derivative information to reduce the computational cost of constructing the surrogate and to find the reduced region from which the new point is chosen. Sections 2.1.1 to 2.1.6 describe the components of ASRM. ASRM-DI follows the same process but modifies Sections 2.1.3 and 2.1.4: the surrogate constructors are replaced by Hermite interpolation and Cokriging (see Section 2.2.1), and the search region is reduced to find a local minimum (see Section 2.2.2). The components of ASRM-DI are shown in Table 2.

Table 2: Adaptive Search Regions Method with Derivative Information
(1) Defining the grid.
(2) Choosing initial experimental points.
(3) Constructing the surrogate surface with derivative information.
(4) Using derivative information to choose a new point and verifying the stopping criterion.
(5) Shrinking the search region and refining grids.
(6) Stopping criteria for accuracy control.

2.4. Multiple Minima.

2.4.1. Retracing Way.

In this subsubsection, we propose a way to find multiple local minima. The idea is that, at each iteration, there may be interesting regions that have not yet been searched, and these regions may contain other local minima; such regions are revisited by retracing. We have two choices suited to different situations:
• Bottom-Up: trace the search region back node by node, i.e., only one node at a time.
• Top-Down: trace the search region all the way back to the root, i.e., to the original experimental region.

2.4.2. Determining a New Optimal Point and Stopping Criteria.

As in Section 2.1.4, the algorithm uses the surrogate surface $\tilde S$ to determine the next new point $P_{new}$. Since we want to find another local minimum, the algorithm ignores the already refined areas of the current experimental region; that is, it does not pick points located in these disregarded areas $G_d$. The next new point is therefore chosen from $G_{red} \setminus (P_{exp} \cup G_d)$: we apply an optimization method to $\tilde S$ to find $P_{new} = \arg\min\{\tilde S(x) \mid x \in G_{red} \setminus (P_{exp} \cup G_d)\}$. We then add $P_{new}$ to $P_{exp}$ and obtain a new candidate optimal point $O_{k+1} = \arg\min\{f(x) \mid x \in P_{exp}\}$. The stopping criterion is the same as in Section 2.1.4: if $\rho_k > 0$, let $O_k = O_{k+1}$ and keep adding new points; otherwise, if $\rho_k \le 0$, the search region is shrunk, the grid refined, and a new search path is produced. Once the new region is obtained, the algorithm returns to the previous section and repeats components (2) through (5) of Table 2. The next section discusses the stopping criteria of the complete algorithm.

2.5. The whole algorithm.

2.5.1. Stopping Criteria for the Algorithm.

The process of tracing back over the search region and finding additional local minima continues until some stopping criteria are satisfied. These stopping conditions vary between experiments and lead to different results. Some commonly used stopping conditions are listed below.

• When the degree of each node reaches a certain number.
• When the ratio of each search region to its refined areas exceeds a certain numerical tolerance.
• When the available computational resources run out.
• When all the targets we set have been found.

2.5.2. The theoretical algorithm.

Algorithm 1. Initialization
1. Define a main matrix $A \in \mathbb{Z}^{d \times N}$, a minor matrix $B \in \mathbb{R}^{d \times d}$, a center $C_0$, and a step-size parameter $\Delta_0$.
2. Let $\Delta_k = \Delta_0$ and $C_k = C_0$.

Algorithm 2. Finding the optimum $O_{k+1}$
1. Construct the grid $G_k = C_k + \Delta_k \cdot B \cdot A$. There are $N = p_1 \times p_2 \times \cdots \times p_d$ grid points in $G_k$, where $d$ is the dimension.
2. Choose initial experimental points $P_{init}$ uniformly over the grid $G_k$.
3. Let $P_{exp} = P_{init} \cup P_{eval}$ and $O_k = \arg\min f(P_{exp})$.
4. Repeat the following until $\rho_k \le 0$, then shrink the search region by Algorithm 3:
   (a) Construct the surrogate $\tilde S$. On the first run, or after tracing back from Algorithm 4, let the indicator point be $P_{I_k} = \arg\min \tilde S$.
   (b) Define the region $G_{red}$ in the grid $G_k$ using the descent direction given by the gradient at $P_{I_k}$.
   (c) Find a new candidate point $P_{new}$ by minimizing $\tilde S$ over $G_{red}$.

   (d) Let $P_{exp} = P_{exp} \cup \{P_{new}\}$, $O_{k+1} = \arg\min \tilde S(P_{exp})$, and $P_{I_k} = \arg\min \tilde S(P_{exp})$.
   (e) Compute $\rho_k = f(O_k) - f(O_{k+1})$.
   (f) If $\rho_k > 0$, then let $O_k = O_{k+1}$.

Algorithm 3. Contracting the search region
1. Let $\Delta_k = \theta \cdot \Delta_k$ (where $0 < \theta < 1$; $\theta = 1/R$ in the notation of Section 2.1.5) and $C_k = O_{k+1}$.
2. Store the search region $G_k$ in $G_{all}$.
3. If $\Delta_k \to 0$, then trace back the search region $G_k$ by Algorithm 4.
4. Otherwise, return to Algorithm 2.

Algorithm 4. Tracing back
1. For $i = \mathrm{length}(G_{all}), \ldots, 1$:
   (a) Check whether $C_k \in G_{all}^i$ and $\Delta_k = \theta \cdot \Delta_{all}^i$.
   (b) If so, let $C_k = C_{all}^i$, $\Delta_k = \Delta_{all}^i$, and $G_k = G_{all}^i$.
   (c) Stop and return.
2. Identify the reduced area $R_k$ in $G_k$: for $i = \mathrm{length}(G_{all}), \ldots, 1$,
   (a) check whether $G_{all}^i \subset G_k$;
   (b) if so, include it in the reduced area $R_k = \bigcup G_{all}^i$.
   Return.
3. Compute the reduced ratio $r_k = \dfrac{\text{area of } R_k}{\text{area of } G_k}$.
4. If $r_k = 1$, then return to step 1 of Algorithm 4.
5. Otherwise, ignore $R_k$ within $G_k$ and return to Algorithm 2.
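Steps 2 and 3 of Algorithm 4 need the area of a union of stored regions. A minimal sketch, assuming every stored region and $G_k$ are axis-aligned boxes given by their lower and upper corners (the thesis stores grids, so this box representation is an assumption), computes $r_k$ exactly by coordinate compression; `reduced_ratio` is a hypothetical name.

```python
import itertools
import numpy as np

def reduced_ratio(region, stored):
    """r_k = area(R_k) / area(G_k) for axis-aligned boxes.

    region: (lo, hi) corners of G_k; stored: list of (lo, hi) corners of regions in G_all.
    Only stored boxes contained in G_k enter R_k (Algorithm 4, step 2).
    """
    lo, hi = (np.asarray(b, float) for b in region)
    inside = [(np.asarray(a, float), np.asarray(b, float)) for a, b in stored
              if np.all(np.asarray(a) >= lo) and np.all(np.asarray(b) <= hi)]
    if not inside:
        return 0.0
    d = len(lo)
    # coordinate compression: cut every axis at all box boundaries, then test each cell once
    cuts = [np.unique([lo[j], hi[j]] + [t for a, b in inside for t in (a[j], b[j])])
            for j in range(d)]
    covered = 0.0
    for cell in itertools.product(*(range(len(c) - 1) for c in cuts)):
        c_lo = np.array([cuts[j][cell[j]] for j in range(d)])
        c_hi = np.array([cuts[j][cell[j] + 1] for j in range(d)])
        mid = 0.5 * (c_lo + c_hi)                   # a cell is covered iff its midpoint is
        if any(np.all((mid >= a) & (mid <= b)) for a, b in inside):
            covered += np.prod(c_hi - c_lo)
    return covered / np.prod(hi - lo)

# usage: two overlapping refined boxes inside the unit square, one box outside it
G_k = ([0.0, 0.0], [1.0, 1.0])
G_all = [([0.0, 0.0], [0.5, 0.5]), ([0.25, 0.25], [0.75, 0.75]), ([1.0, 1.0], [2.0, 2.0])]
print(reduced_ratio(G_k, G_all))  # 0.4375 = (0.25 + 0.25 - 0.0625) / 1
```

A ratio of 1 means that $G_k$ is completely covered by refined regions, which is the case in which step 4 keeps tracing back.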

3. Convergence Result.

The following theorem gives the limit of the step-size control parameter; its proof is the same as that in Lin (2009).

Theorem 3.1. Assume that $L(x_0)$ is compact. Then the mesh size parameters satisfy $\liminf_{k \to +\infty} \Delta_k = 0$.

Definition 3.2. A subsequence of the ASRM-DI iterates consisting of mesh local optimizers, $\{x_k\}_{k \in K}$ (for some subset of indices $K$), is said to be a refining subsequence if $\{\Delta_k\}_{k \in K}$ converges to zero.

We now present global convergence results; the proofs are adapted from [1] to our notation.

Theorem 3.3. Assume that $L(x_0)$ is compact, that $\hat x$ is any limit of a refining subsequence, and that $Ba_{i_k}$ is any direction in $G_{red}$. If $f$ is Lipschitz near $\hat x$, then the generalized directional derivative of $f$ at $\hat x$ in the direction $Ba_{i_k}$ is nonnegative, i.e., $f^\circ(\hat x; Ba_{i_k}) \ge 0$.

Proof. Let $\{x_k\}_{k \in K}$ be a refining subsequence with limit point $\hat x$, and let $Ba_{i_k}$ be obtained as in Section 2.2.2. The analysis is divided into two cases. First, consider the case where the gradient is evaluated only a finite number of times in the subsequence $\{x_k\}_{k \in K}$. Since $f$ is Lipschitz near $\hat x$, it is finite near $\hat x$. Thus infinitely many of the right-hand quotients in Clarke's generalized directional derivative definition [7],

$$f^\circ(\hat x; Ba_{i_k}) \equiv \limsup_{y \to \hat x,\; t \downarrow 0} \frac{f(y + t Ba_{i_k}) - f(y)}{t} \ge \limsup_{k \in K} \frac{f(x_k + \Delta_k Ba_{i_k}) - f(x_k)}{\Delta_k}, \tag{3.19}$$

are defined. Since each $x_k$ is a mesh local optimizer, $f(x_k + \Delta_k Ba_{i_k}) \ge f(x_k)$, so all of these quotients are nonnegative. Second, consider the case where the gradient is used at infinitely many iterates of the subsequence. Then there is a subsequence converging to $\hat x$ for which $(Ba_{i_k})^T \nabla f(x_k) > 0$, and thus the right-hand side of (3.19) is bounded below by zero.

Theorem 3.4. Assume that $L(x_0)$ is compact and that $\hat x$ is any limit of a refining subsequence. If $f$ is strictly differentiable at $\hat x$, then $\nabla f(\hat x) = 0$.

Proof. If $f$ is strictly differentiable at $\hat x$, then for any direction $Ba_{i_0} \neq 0$, $f^\circ(\hat x; Ba_{i_0}) = (Ba_{i_0})^T \nabla f(\hat x)$. By Theorem 3.3, $0 \le (Ba_{i_k})^T \nabla f(\hat x)$ for each $Ba_{i_k}$. Hence $(Ba_{i_0})^T \nabla f(\hat x) \ge 0$, and the same argument applied to $-Ba_{i_0}$ gives $-(Ba_{i_0})^T \nabla f(\hat x) \ge 0$. Since $Ba_{i_0} \neq 0$ is arbitrary, $\nabla f(\hat x) = 0$.

The following result shows that the set of all points searched by ASRM-DI is dense in the experimental region as $t \to \infty$. It requires two assumptions:
1. For every experimental region $G_k$, the initial experimental points $P_{init_k} \subset G_k$ form a uniform design.
2. The search region is traced back only one node at a time (Bottom-Up). In addition, the search region is not traced back until it is covered by the union of the refined areas.
The proof is omitted since it is identical to that in Lin (2009).

Theorem 3.5. The set of all points searched by ASRM-DI is dense in the experimental region $D$ when $\Delta_k$ converges to zero and time goes to infinity.

4. Numerical Experiments.

In this section, we present the results of numerical experiments and compare ASRM-DI with ASRM in terms of function evaluations. Before presenting the numerical results, we describe the basis functions used to construct the surface with Hermite interpolation and Cokriging. Some commonly used correlation functions for the Hermite and Cokriging methods are listed in Table 3.

Table 3: Correlation functions used for Hermite interpolation and Cokriging.
Linear spline: $\|x_i - x_j\|$
Thin plate spline: $\|x_i - x_j\|^k \ln \|x_i - x_j\|^k$
Cubic spline: $\|x_i - x_j\|^3$
Gaussian: $e^{-\|x_i - x_j\|^2/\beta}$
Multiquadrics: $\sqrt{1 + \|x_i - x_j\|^2/\beta}$
Inverse multiquadrics: $(1 + \|x_i - x_j\|^2/\beta)^{-1/2}$

4.1. Smooth Function with a Single Minimum.

4.1.1. Banana shape function.

The first test function is defined as

$$\min f(x, y) = -\frac{100}{10\left((x+1)^2 - (y+1)\right)^2 + x^2 + 4} \quad \text{s.t.} \quad -1.5 \le x \le 1.5, \; -2.5 \le y \le 0.5. \tag{4.20}$$

This test function has a single minimum and has been studied previously in [3] and [5]; its only local minimum is at (0, 0). In ASRM-DI, we define the initial grid

$$G = \{(x, y) \mid x \in \{-1.5, -\tfrac{69}{50}, -\tfrac{63}{50}, \ldots, 1.5\}, \; y \in \{-2.5, -\tfrac{119}{50}, -\tfrac{113}{50}, \ldots, 0.5\}\},$$

which consists of 676 (26 × 26) grid points. The initial experimental points are chosen by a two-factor uniform design with 21 levels for each factor. The true response and its contour are shown in Figure 1, and the results of ASRM-DI (Hermite) are illustrated in Figure 2.

Figure 1: The true response of the banana-shaped function.

Figure 2: The coverage rate at each level is 25%. Function evaluations: 70. △: local minimum found by ASRM-DI(Hermite). ×: true local minimum.

4.2 Smooth Functions with Multiple Local Minima

4.2.1 Camel function

This test function is an artificial function with two symmetric basins, originally studied in [5]. It is defined as
$$\min f(x, y) = -\frac{-x^4 + 4.5x^2 + 2}{e^{2y^2}}, \qquad \text{s.t. } -2 \le x, y \le 2. \tag{4.21}$$
The true surface has a saddle point at $(0, 0)$ and two minima at $(-1.5, 0)$ and $(1.5, 0)$. The true response and its contours are shown in Figure 3. Figures 4 and 5 show the results of the two retracing strategies (Bottom-Up and Top-Down) on this test function.

Figure 3: The true response of the Camel function.

Figure 4: Bottom-Up search. The coverage rate at each level is 30%. ×: local minimum found by ASRM-DI(H).

Figure 5: Top-Down search. The finer the grid, the lower the coverage rate used. ×: local minimum found by ASRM-DI(H).

4.2.2 Müller-Brown surface

The next test function comes from a chemical application and was originally studied in [14]. It has three local minima and is defined as
$$\min f(x, y) = \sum_{i=1}^{4} A_i \exp\left[a_i (x - x_i^0)^2 + b_i (x - x_i^0)(y - y_i^0) + c_i (y - y_i^0)^2\right], \tag{4.22}$$
where $A = (-200, -100, -170, 15)$, $a = (-1, -1, -6.5, 0.7)$, $b = (0, 0, 11, 0.6)$, $c = (-10, -10, -6.5, 0.7)$, $x^0 = (1, 0, -0.5, -1)$ and $y^0 = (0, 0.5, 1.5, 1)$, s.t. $-1.5 \le x \le 1.0$ and $-0.5 \le y \le 2.5$. The true response surface of the Müller-Brown function has three local minima, at $(-0.558, 1.442)$, $(0.623, 0.028)$ and $(-0.05, 0.467)$. Figures 6 and 7 show the corresponding results.

Figure 6: The true response of the Müller-Brown function.

Figure 7: The coverage rate at each level is 45%. Function evaluations: 289. △: local minimum found by ASRM-DI(Hermite). ×: true local minima.
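To make the parameters of (4.22) concrete, the Python sketch below (NumPy assumed; `mueller_brown` and `mueller_brown_grad` are illustrative names) evaluates the surface together with its analytic gradient, which is the kind of derivative information ASRM-DI assumes is available at each evaluated point.

```python
import numpy as np

# Parameters of the Müller-Brown surface (4.22), copied from the text.
A = np.array([-200.0, -100.0, -170.0, 15.0])
a = np.array([-1.0, -1.0, -6.5, 0.7])
b = np.array([0.0, 0.0, 11.0, 0.6])
c = np.array([-10.0, -10.0, -6.5, 0.7])
x0 = np.array([1.0, 0.0, -0.5, -1.0])
y0 = np.array([0.0, 0.5, 1.5, 1.0])

def mueller_brown(x, y):
    """Value of the Müller-Brown surface at a single point (x, y)."""
    dx, dy = x - x0, y - y0
    return float(np.sum(A * np.exp(a * dx**2 + b * dx * dy + c * dy**2)))

def mueller_brown_grad(x, y):
    """Analytic gradient of (4.22): derivative of each exponent times its term."""
    dx, dy = x - x0, y - y0
    terms = A * np.exp(a * dx**2 + b * dx * dy + c * dy**2)
    gx = np.sum(terms * (2.0 * a * dx + b * dy))
    gy = np.sum(terms * (b * dx + 2.0 * c * dy))
    return np.array([gx, gy])

# The three minima listed in the text (rounded to three decimals).
for p in [(-0.558, 1.442), (0.623, 0.028), (-0.05, 0.467)]:
    print(p, round(mueller_brown(*p), 2), mueller_brown_grad(*p))
```

At the first listed point the value is about −146.7, the lowest of the three. Since the coordinates are rounded, the printed gradients are small but not exactly zero.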

4.2.3 Branin function

The Branin test function is defined as
$$\min f(x, y) = \left(y - \frac{5.1}{4\pi^2}x^2 + \frac{5}{\pi}x - 6\right)^2 + 10\left(1 - \frac{1}{8\pi}\right)\cos x + 10, \qquad \text{s.t. } -5 \le x \le 10 \text{ and } 0 \le y \le 15. \tag{4.23}$$
The true response surface of the Branin function has three local minima, at $(-3.1415, 12.2745)$, $(3.1419, 2.275)$ and $(9.4247, 2.475)$.

Figure 8: The true response of the Branin function.

Figure 9: The coverage rate at each level is 60%. Function evaluations: 126. △: local minimum found by ASRM-DI(Hermite). ×: true local minima.
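The coefficient $5.1/(4\pi^2)$ in (4.23) is what places the minima at $y = 12.275, 2.275, 2.475$. The short Python sketch below (standard library only; `branin` is our name) checks this: at $x = -\pi, \pi, 3\pi$ the squared term vanishes for those $y$ values and $\cos x = -1$, so all three minima share the value $10/(8\pi)$.

```python
import math

def branin(x, y):
    """Branin function (4.23)."""
    b = 5.1 / (4.0 * math.pi ** 2)
    c = 5.0 / math.pi
    t = 1.0 / (8.0 * math.pi)
    return (y - b * x ** 2 + c * x - 6.0) ** 2 + 10.0 * (1.0 - t) * math.cos(x) + 10.0

# Exact minimizers: x = -pi, pi, 3*pi with the y values that zero the squared term.
for x, y in [(-math.pi, 12.275), (math.pi, 2.275), (3 * math.pi, 2.475)]:
    print(round(branin(x, y), 6))   # 0.397887, i.e. 10 / (8 pi)
```

The three minima in the text ($x \approx -3.1415, 3.1419, 9.4247$) are these points rounded to four decimals.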

4.2.4 2-D Ackley function

The 2-D Ackley function has been studied in [18]. It is defined as
$$\min f(x, y) = \frac{1}{f}\left\{-a\exp\left[-b\sqrt{\frac{1}{n}\left(x^2 + y^2\right)}\right] - \exp\left[\frac{1}{n}\left(\cos(cx) + \cos(cy)\right)\right] + a + \exp(1) + d\right\}, \tag{4.24}$$
where $a = 20$, $b = 0.2$, $c = 2\pi$, $d = 5.7$, $f = 0.8$ and $n = 2$, s.t. $-1.5 \le x, y \le 1.5$. The true response surface of the 2-D Ackley function has nine local minima, at $(0, 0)$, $(-0.9522, 0)$, $(0.9522, 0)$, $(0, -0.9522)$, $(0, 0.9522)$, $(-0.9685, -0.9685)$, $(0.9685, 0.9685)$, $(0.9685, -0.9685)$ and $(-0.9685, 0.9685)$. The results for the 2-D Ackley function are presented in Figure 11. The local minima found by ASRM-DI are marked with red "△" symbols; they lie close to the true local minima, which are marked with white "×" symbols.

Figure 10: The true response of the Ackley function.

4.2.5 2-D Griewank function

The final 2-D test function is the 2-D Griewank function,
$$\min f(x, y) = 1 + \frac{1}{200}\left(x^2 + y^2\right) - \cos(x)\cos\left(\frac{y}{\sqrt{2}}\right), \tag{4.25}$$

s.t. $-10 \le x, y \le 10$. The true response surface of the 2-D Griewank function has seventeen local minima. The true response and the corresponding result are displayed in Figures 12 and 13.

Figure 11: The coverage rate is 100%, and the decreasing coverage rate is 75%. Function evaluations: 354. △: local minimum found by ASRM-DI(Hermite). ×: true local minima.

Figure 12: The true response of the 2-D Griewank function.
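The scaled and shifted 2-D Ackley function (4.24) is straightforward to code. The sketch below (Python standard library only; `ackley2d` is our name) uses the parameter values listed with (4.24). At the origin both exponential terms cancel against $a$ and $\exp(1)$, leaving $d/f = 5.7/0.8 = 7.125$, which is lower than the value at the neighbouring local minimum $(0.9522, 0)$.

```python
import math

def ackley2d(x, y, a=20.0, b=0.2, c=2 * math.pi, d=5.7, f=0.8, n=2):
    """Scaled and shifted 2-D Ackley function (4.24) with the parameters from the text."""
    term1 = -a * math.exp(-b * math.sqrt((x * x + y * y) / n))
    term2 = -math.exp((math.cos(c * x) + math.cos(c * y)) / n)
    return (term1 + term2 + a + math.exp(1.0) + d) / f

print(ackley2d(0.0, 0.0))     # approximately 7.125 = 5.7 / 0.8
print(ackley2d(0.9522, 0.0))  # a shallower local minimum, about 10.35
```

Because cosine is even and the radial term depends only on $x^2 + y^2$, the function is symmetric under sign changes of $x$ and $y$, which is why the nine minima listed in the text come in symmetric groups.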

Figure 13: The coverage rate is 100%, and the decreasing coverage rate is 85%. Function evaluations: 943. △: local minimum found by ASRM-DI(Hermite). ×: true local minima.

4.3 High-Dimensional Functions

4.3.1 3-D Ackley function

This function has been studied in [18]. It is defined as
$$\min f_A(x, y, z) = \frac{1}{f}\left\{-a\exp\left[-b\sqrt{\frac{1}{n}\left(x^2 + y^2 + z^2\right)}\right] - \exp\left[\frac{1}{n}\left(\cos cx + \cos cy + \cos cz\right)\right] + a + \exp(1) + d\right\}, \tag{4.26}$$
where $a = 20$, $b = 0.2$, $c = 2\pi$, $d = 5.7$, $f = 0.8$ and $n = 2$, s.t. $-1.5 \le x \le 1.5$, $-1.5 \le y \le 1.5$ and $-1.5 \le z \le 1.5$. The true surface has twenty-seven local minima. The true response and the corresponding result are displayed in Figure 14.

4.3.2 Shekel's family

This is a family of functions defined as
$$f(x) = -\sum_{i=1}^{m} \frac{1}{(x - a_i)^T (x - a_i) + c_i},$$
where $x = (x_1, \ldots, x_n)$, $0 \le x_i \le 10$, $a_i = (a_{i1}, \ldots, a_{in})$ and $c_i > 0$.

Figure 14: The results generated by ASRM-DI(H). The coverage rate at each level is 90%, and the maximum refinement factor is 2. Four views are shown: (a) true slice of the 3-D Ackley function; (b) true slice and minima found by ASRM-DI(H) in the x-y plane; (c) true slice and minima found by ASRM-DI(H) in the x-z plane; (d) true slice and minima found by ASRM-DI(H) in the y-z plane.

Shekel5, Shekel7 and Shekel10 have 5, 7 and 10 local minima, respectively; their parameters are given in Table 4. Tables 5, 6 and 7 report the results for the 4-D Shekel functions with 5, 7 and 10 local minima, respectively. For Shekel5 and Shekel7, the initial coverage rate is 40% and the decrease in each level is 70%. Shekel10 uses an initial coverage rate of 50% with the same decrease in each level. The initial experimental points are 100 points from a Latin hypercube design.
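The Shekel family is easy to reproduce from Table 4. The Python sketch below (NumPy assumed; `shekel`, `A_SHEKEL` and `C_SHEKEL` are our names) stores the ten rows of Table 4 and uses the first $m$ rows for Shekel-$m$. Evaluating Shekel5 at $(4, 4, 4, 4)$ gives the TRUE value of Table 5, and evaluating it at the grid point $(4.0625, \ldots, 4.0625)$ gives the value reported for ASRM-DI(H), which suggests that the reported value is the true response at the best grid point found.

```python
import numpy as np

# Table 4: rows a_i and constants c_i; Shekel-m uses the first m rows.
A_SHEKEL = np.array([
    [4, 4, 4, 4], [1, 1, 1, 1], [8, 8, 8, 8], [6, 6, 6, 6], [3, 7, 3, 7],
    [2, 9, 2, 9], [5, 5, 3, 3], [8, 1, 8, 1], [6, 2, 6, 2], [7, 3.6, 7, 3.6],
], dtype=float)
C_SHEKEL = np.array([0.1, 0.2, 0.2, 0.4, 0.4, 0.6, 0.3, 0.7, 0.5, 0.4])

def shekel(x, m):
    """Shekel-m function: -sum_i 1 / ((x - a_i)^T (x - a_i) + c_i) on [0, 10]^4."""
    x = np.asarray(x, dtype=float)
    sq_dist = np.sum((x - A_SHEKEL[:m]) ** 2, axis=1)
    return float(-np.sum(1.0 / (sq_dist + C_SHEKEL[:m])))

print(round(shekel([4, 4, 4, 4], 5), 4))   # -10.1532, the TRUE value in Table 5
print(round(shekel([4.0625] * 4, 5), 6))   # about -8.806317, the ASRM-DI(H) value in Table 5
```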

Table 4: Parameters of the 4-D Shekel functions with 5, 7 and 10 local minima (Shekel-m uses rows i = 1, ..., m). Each $a_i$ corresponds to a local minimum.
i | a_i | c_i
1 | 4, 4, 4, 4 | 0.1
2 | 1, 1, 1, 1 | 0.2
3 | 8, 8, 8, 8 | 0.2
4 | 6, 6, 6, 6 | 0.4
5 | 3, 7, 3, 7 | 0.4
6 | 2, 9, 2, 9 | 0.6
7 | 5, 5, 3, 3 | 0.3
8 | 8, 1, 8, 1 | 0.7
9 | 6, 2, 6, 2 | 0.5
10 | 7, 3.6, 7, 3.6 | 0.4

4.3.3 Hartman's family

The function is defined as
$$f(x) = -\sum_{i=1}^{m} c_i \exp\left\{-\sum_{j=1}^{n} a_{ij}\left(x_j - p_{ij}\right)^2\right\},$$
where $x = (x_1, \ldots, x_n)$, $0 \le x_i \le 1$, $p_i = (p_{i1}, \ldots, p_{in})$ and $a_i = (a_{i1}, \ldots, a_{in})$. The 3-D Hartman function has three local minima and the 6-D Hartman function has two. The parameters are listed in Table 8 (3-D) and Table 9 (6-D).

4.3.4 Rosenbrock family

The $n$-D function is defined as
$$f(x) = \sum_{i=1}^{n-1}\left[100\left(x_{i+1} - x_i^2\right)^2 + \left(x_i - 1\right)^2\right],$$
where $x = (x_1, \ldots, x_n)$ and $-2 \le x_i \le 2$. The global optimum is at $(1, 1, \ldots, 1)$, with optimal value 0.
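Since ASRM-DI needs a gradient at every evaluated point, here is a minimal Python sketch (NumPy assumed; `rosenbrock` and `rosenbrock_grad` are our names) of the $n$-D Rosenbrock function together with its analytic gradient. The sum runs over $i = 1, \ldots, n-1$ because each term involves $x_{i+1}$.

```python
import numpy as np

def rosenbrock(x):
    """n-D Rosenbrock function, summing over i = 1, ..., n-1."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (x[:-1] - 1.0) ** 2))

def rosenbrock_grad(x):
    """Analytic gradient: each term touches x_i and x_{i+1}."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    # d/dx_i of 100 (x_{i+1} - x_i^2)^2 + (x_i - 1)^2
    g[:-1] += -400.0 * x[:-1] * (x[1:] - x[:-1] ** 2) + 2.0 * (x[:-1] - 1.0)
    # d/dx_{i+1} of 100 (x_{i+1} - x_i^2)^2
    g[1:] += 200.0 * (x[1:] - x[:-1] ** 2)
    return g

print(rosenbrock(np.ones(8)))       # 0.0 at the global optimum of the 8-D case
print(rosenbrock(np.zeros(8)))      # 7.0: seven terms of (0 - 1)^2
print(rosenbrock_grad(np.ones(8)))  # all zeros
```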

Table 5: Numerical results for the 4-D Shekel5 function. Initial grid: $9^4$; final grid: $65^4$; final grid size: 0.1563; function evaluations: 900.
TRUE
Local min | f | x1 | x2 | x3 | x4
1 | −10.1532 | 4.0004 | 4.00013 | 4.0004 | 4.00013
2 | −5.10051 | 7.99958 | 7.99964 | 7.99958 | 7.99964
3 | −5.05520 | 1.00013 | 1.00016 | 1.00013 | 1.00016
4 | −2.68286 | 5.99875 | 6.00029 | 5.99875 | 6.00029
5 | −2.63047 | 3.0018 | 6.99833 | 3.0018 | 6.99833
ASRM-DI(H)
Local min | f | x1 | x2 | x3 | x4
1 | −8.806317 | 4.0625 | 4.0625 | 4.0625 | 4.0625
2 | −5.007738 | 7.9688 | 7.9688 | 7.9688 | 7.9688
3 | −4.691123 | 0.9375 | 0.9375 | 0.9375 | 0.9375
4 | −2.590704 | 5.9375 | 5.9375 | 5.9375 | 5.9375
5 | −2.603559 | 2.9688 | 7.0313 | 2.9688 | 7.0313

Table 6: Numerical results for the 4-D Shekel7 function. Initial grid: $9^4$; final grid: $65^4$; final grid size: 0.1563; function evaluations: 2715.
TRUE
Local min | f | x1 | x2 | x3 | x4
1 | −10.47029 | 4.0006 | 4.0007 | 3.9995 | 3.9996
2 | −5.1288 | 7.9995 | 7.9996 | 7.9995 | 7.9996
3 | −5.0877 | 1.00013 | 1.00016 | 1.00013 | 1.00016
4 | −3.7243 | 4.9942 | 4.885 | 3.0061 | 3.0068
5 | −2.7659 | 3.0009 | 7.0006 | 3.0004 | 7.0001
6 | −2.7519 | 5.9981 | 6.0008 | 5.9973 | 5.9993
7 | −1.8376 | 2.0048 | 8.9917 | 2.0046 | 8.9915
ASRM-DI(H)
Local min | f | x1 | x2 | x3 | x4
1 | −9.055315 | 4.0625 | 4.0625 | 4.0625 | 4.0625
2 | −5.036113 | 7.9688 | 7.9688 | 7.9688 | 7.9688
3 | −4.722564 | 0.9375 | 0.9375 | 0.9375 | 0.9375
4 | −3.692580 | 5.0000 | 5.0000 | 2.9688 | 2.9688
5 | −2.741714 | 2.9688 | 7.0312 | 2.9688 | 7.0312
6 | −2.662330 | 5.9375 | 5.9375 | 5.9375 | 5.9375
7 | −1.806288 | 2.0312 | 9.0625 | 2.0312 | 9.0625

4.4 Summary of the Results

To compare the number of function evaluations, we compare the proposed method (ASRM-DI) with ASRM, in which Kriging is used to construct the surrogate. Tables 12 to 16 summarize the results for all the test functions used in this thesis. Each table reports three methods: ASRM(K), ASRM-DI(C) and ASRM-DI(H). The last column gives the relative efficiency (R.E.), defined as the ratio of a method's number of function evaluations (fifth column) to that of ASRM(K). For both ASRM-DI(C) and ASRM-DI(H), every relative efficiency is smaller than 1, which indicates that the proposed method requires fewer function evaluations than ASRM. Moreover, for every test function except Banana and Shekel10, the relative efficiency of ASRM-DI(H) is smaller than or equal to that of ASRM-DI(C). This is expected: the Cokriging construction uses the derivative information only indirectly, creating additional function values through a first-order Taylor series expansion in a small neighborhood of each sample point. Hermite interpolation, by contrast, uses the gradient information directly: with $n$ sample points in $d$ dimensions, it yields a total of $n(d + 1)$ linear equations. This is why ASRM-DI(H) generally performs better than ASRM-DI(C).
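To illustrate where the $n(d+1)$ equations come from, the Python sketch below (NumPy assumed) builds a Hermite interpolant with the Gaussian correlation function of Table 3. It is a generic Hermite radial-basis-function construction written for illustration, not the thesis's implementation; the function names, the choice $\beta = 1$ and the toy test function are our assumptions. Each sample point contributes one value equation and $d$ gradient equations, and the resulting system matrix is symmetric.

```python
import numpy as np

def hermite_rbf_fit(X, f, G, beta=1.0):
    """Fit a Gaussian Hermite RBF interpolant to values f (n,) and gradients G (n, d) at X (n, d).

    Kernel: phi(x, y) = exp(-||x - y||^2 / beta). Returns value weights alpha (n,),
    gradient weights gamma (n, d) and the n(d+1) x n(d+1) system matrix.
    """
    n, d = X.shape
    R = X[:, None, :] - X[None, :, :]                  # R[i, j] = x_i - x_j
    Phi = np.exp(-np.sum(R ** 2, axis=-1) / beta)      # phi(x_i, x_j)
    # d phi(x_i, x_j) / d (x_j)_k = (2 / beta) (x_i - x_j)_k phi
    B = ((2.0 / beta) * R * Phi[:, :, None]).reshape(n, n * d)
    # d^2 phi / d(x_i)_l d(x_j)_k = (2 delta_lk / beta - 4 r_l r_k / beta^2) phi, indexed [i, j, l, k]
    D4 = (2.0 / beta) * np.eye(d)[None, None] - (4.0 / beta ** 2) * R[:, :, :, None] * R[:, :, None, :]
    D4 = D4 * Phi[:, :, None, None]
    D = D4.transpose(0, 2, 1, 3).reshape(n * d, n * d)  # rows (i, l), columns (j, k)
    M = np.block([[Phi, B], [B.T, D]])                   # value rows, then gradient rows
    rhs = np.concatenate([f, G.reshape(-1)])
    w = np.linalg.solve(M, rhs)
    return w[:n], w[n:].reshape(n, d), M

def hermite_rbf_predict(x, X, alpha, gamma, beta=1.0):
    """s(x) = sum_j alpha_j phi(x, x_j) + sum_{j,k} gamma_jk d phi(x, x_j) / d (x_j)_k."""
    r = x[None, :] - X
    phi = np.exp(-np.sum(r ** 2, axis=1) / beta)
    return float(phi @ alpha + np.sum(gamma * (2.0 / beta) * r * phi[:, None]))

# Toy example: f(x) = sin(x1) + x2^2 with exact gradients at n = 5 points in d = 2.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
f = np.sin(X[:, 0]) + X[:, 1] ** 2
G = np.column_stack([np.cos(X[:, 0]), 2.0 * X[:, 1]])
alpha, gamma, M = hermite_rbf_fit(X, f, G)
print(M.shape)                                          # (15, 15) = n(d + 1) equations
print(hermite_rbf_predict(X[4], X, alpha, gamma), f[4])  # interpolates the data value
```

A Cokriging-style alternative would instead add Taylor-extrapolated pseudo-observations such as $f(x_i) + h\,\nabla f(x_i)^T e_k$ at $x_i + h e_k$, which only approximates the gradient constraints that the Hermite system imposes exactly.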

Table 7: Numerical results for the 4-D Shekel10 function. Initial grid: $9^4$; final grid: $65^4$; final grid size: 0.1563; function evaluations: 2609.
TRUE
Local min | f | x1 | x2 | x3 | x4
1 | −10.5364 | 4.0008 | 4.0006 | 3.9997 | 3.9995
2 | −5.1756 | 7.9995 | 7.9994 | 7.9995 | 7.9994
3 | −5.1285 | 1.0004 | 1.0003 | 1.0003 | 1.0002
4 | −3.8354 | 4.9949 | 4.994 | 3.0076 | 3.0067
5 | −2.8711 | 5.999 | 5.9979 | 5.9982 | 5.9965
6 | −2.8066 | 3.0013 | 7.0002 | 3.0007 | 6.9997
7 | −2.4273 | 6.9916 | 3.5956 | 6.9907 | 3.5946
8 | −2.4217 | 6.0056 | 2.01 | 6.0044 | 2.0088
9 | −1.8595 | 2.0051 | 8.9913 | 2.0049 | 8.9911
10 | −1.6766 | 7.9869 | 1.0122 | 7.9864 | 1.0119
ASRM-DI(H)
Local min | f | x1 | x2 | x3 | x4
1 | −9.190667 | 4.0625 | 4.0625 | 4.0625 | 4.0625
2 | −5.083573 | 7.9588 | 7.9588 | 7.9588 | 7.9588
3 | −4.762371 | 0.9375 | 0.9375 | 0.9375 | 0.9375
4 | −3.802553 | 5.0000 | 5.0000 | 2.9688 | 2.9688
5 | −2.784382 | 5.9375 | 5.9375 | 5.9375 | 5.9375
6 | −2.781770 | 2.9688 | 7.0313 | 2.9688 | 7.0313
7 | −2.414486 | 7.0313 | 3.5938 | 7.0313 | 3.5938
8 | −2.382442 | 5.9375 | 2.0312 | 5.9375 | 2.0312
9 | −1.827928 | 2.0312 | 9.0625 | 2.0312 | 9.0625
10 | −1.652839 | 7.9688 | 0.9375 | 7.9688 | 0.9375

Table 8: Parameters of the 3-D Hartman function ($m = 4$, $n = 3$).
i | a_ij | c_i | p_ij
1 | 3, 10, 30 | 1 | 0.3689, 0.1170, 0.2673
2 | 0.1, 10, 35 | 1.2 | 0.4699, 0.4387, 0.7470
3 | 3, 10, 30 | 3 | 0.1091, 0.8732, 0.5547
4 | 0.1, 10, 35 | 3.2 | 0.03815, 0.5743, 0.8828

Table 9: Parameters of the 6-D Hartman function ($m = 4$, $n = 6$).
i | a_ij | c_i | p_ij
1 | 10, 3, 17, 3.5, 1.7, 8 | 1 | 0.1312, 0.1696, 0.5569, 0.0124, 0.8283, 0.5886
2 | 0.05, 10, 17, 0.1, 8, 14 | 1.2 | 0.2329, 0.4135, 0.8307, 0.3736, 0.1004, 0.9991
3 | 3, 3.5, 1.7, 10, 17, 8 | 3 | 0.2348, 0.1451, 0.3522, 0.2883, 0.3047, 0.6650
4 | 17, 8, 0.05, 10, 0.1, 14 | 3.2 | 0.4047, 0.8828, 0.8732, 0.5743, 0.1091, 0.0381

Table 10: "Top-Down" search for the 3-D Hartman function by ASRM-DI(H). Initial grid: $9^3$; final grid: $81 \times 81 \times 81$; final grid size: 0.04167; function evaluations: 292.
TRUE
Local min | f | x1 | x2 | x3
1 | −3.8628 | 0.1146 | 0.5556 | 0.8525
2 | −3.0898 | 0.1093 | 0.8605 | 0.5641
3 | −1.0008 | 0.3687 | 0.1176 | 0.2676
ASRM-DI(H)
Local min | f | x1 | x2 | x3
1 | −3.860970 | 0.1000 | 0.5500 | 0.8500
2 | −3.054583 | 0.1500 | 0.8500 | 0.5500
3 | −0.998518 | 0.3750 | 0.1250 | 0.2750
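The Hartman family can be checked directly against Tables 8 and 10. The Python sketch below (NumPy assumed; `hartman`, `A_H3`, `C_H3` and `P_H3` are our names) hard-codes the 3-D parameters of Table 8 and evaluates the function at the TRUE global minimizer and at the grid point reported for ASRM-DI(H) in Table 10.

```python
import numpy as np

# Table 8: parameters of the 3-D Hartman function (m = 4, n = 3).
A_H3 = np.array([[3.0, 10.0, 30.0], [0.1, 10.0, 35.0], [3.0, 10.0, 30.0], [0.1, 10.0, 35.0]])
C_H3 = np.array([1.0, 1.2, 3.0, 3.2])
P_H3 = np.array([[0.3689, 0.1170, 0.2673], [0.4699, 0.4387, 0.7470],
                 [0.1091, 0.8732, 0.5547], [0.03815, 0.5743, 0.8828]])

def hartman(x, A=A_H3, C=C_H3, P=P_H3):
    """f(x) = -sum_i c_i exp(-sum_j a_ij (x_j - p_ij)^2) on [0, 1]^n."""
    x = np.asarray(x, dtype=float)
    inner = np.sum(A * (x - P) ** 2, axis=1)
    return float(-np.sum(C * np.exp(-inner)))

print(round(hartman([0.1146, 0.5556, 0.8525]), 4))  # about -3.8628, the TRUE value in Table 10
print(round(hartman([0.1000, 0.5500, 0.8500]), 4))  # about -3.861 (Table 10 reports -3.860970)
```

Passing the Table 9 arrays as `A`, `C` and `P` gives the 6-D case with the same function.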

Table 11: "Top-Down" search for the 6-D Hartman function by ASRM-DI(H). Initial grid: $7^6$; final grid: $25^6$; final grid size: 0.04167; function evaluations: 586.
TRUE
Local min | f | x1 | x2 | x3 | x4 | x5 | x6
1 | −3.32237 | 0.20169 | 0.150011 | 0.476874 | 0.275332 | 0.311652 | 0.657301
2 | −3.20316 | 0.404653 | 0.882445 | 0.846102 | 0.57399 | 0.138927 | 0.0384959
ASRM-DI(H)
Local min | f | x1 | x2 | x3 | x4 | x5 | x6
1 | −3.288033 | 0.2083 | 0.1667 | 0.4583 | 0.2917 | 0.2917 | 0.6667
2 | −3.190570 | 0.4167 | 0.8750 | 0.8333 | 0.5833 | 0.1250 | 0.0417

Table 12: 2-D test functions and relevant parameters. Initial grid: $26^2$.
Method | Test function | Ranges | Local minima | Function evaluations | R.E.
ASRM(K) | Banana | −1.5 ≤ x ≤ 1.5, −2.5 ≤ y ≤ 0.5 | 1 | 123 | 1
ASRM-DI(C) | Banana | | | 67 | 0.54
ASRM-DI(H) | Banana | | | 70 | 0.57
ASRM(K) | Camel | −2 ≤ x, y ≤ 2 | 2 | 146 | 1
ASRM-DI(C) | Camel | | | 44 | 0.30
ASRM-DI(H) | Camel | | | 44 | 0.30
ASRM(K) | MBS | −1.5 ≤ x ≤ 1, −0.5 ≤ y ≤ 2 | 3 | 597 | 1
ASRM-DI(C) | MBS | | | 303 | 0.51
ASRM-DI(H) | MBS | | | 289 | 0.48
ASRM(K) | Branin | −5 ≤ x ≤ 10, 0 ≤ y ≤ 15 | 3 | 293 | 1
ASRM-DI(C) | Branin | | | 149 | 0.51
ASRM-DI(H) | Branin | | | 126 | 0.43
ASRM(K) | Ackley | −1.5 ≤ x, y ≤ 1.5 | 9 | 635 | 1
ASRM-DI(C) | Ackley | | | 355 | 0.56
ASRM-DI(H) | Ackley | | | 354 | 0.56
ASRM(K) | Griewank | −10 ≤ x, y ≤ 10 | 17 | 2114 | 1
ASRM-DI(C) | Griewank | | | 1022 | 0.48
ASRM-DI(H) | Griewank | | | 943 | 0.45
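The R.E. column is a plain ratio, so it can be recomputed from the function-evaluation column. The Python sketch below (standard library only; `table12` and `relative_efficiency` are our names) copies the counts of Table 12 and recomputes the relative efficiencies of ASRM-DI(C) and ASRM-DI(H) with respect to ASRM(K).

```python
# Function evaluations from Table 12: (ASRM(K), ASRM-DI(C), ASRM-DI(H)).
table12 = {
    "Banana": (123, 67, 70),
    "Camel": (146, 44, 44),
    "MBS": (597, 303, 289),
    "Branin": (293, 149, 126),
    "Ackley": (635, 355, 354),
    "Griewank": (2114, 1022, 943),
}

def relative_efficiency(n_method, n_baseline):
    """R.E. = function evaluations of a method / function evaluations of ASRM(K)."""
    return n_method / n_baseline

for name, (k, c, h) in table12.items():
    print(f"{name:9s} R.E.(C) = {relative_efficiency(c, k):.2f}  R.E.(H) = {relative_efficiency(h, k):.2f}")
```

Rounded to two decimals, the printed ratios reproduce the R.E. column of Table 12; the same function applies unchanged to Tables 13 to 16.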

5 Conclusion

In this thesis, we incorporate derivative information into ASRM. In terms of the number of function evaluations, ASRM-DI outperforms ASRM on every test function considered. The asymptotic convergence properties of ASRM-DI follow from those established in Lin (2009). In addition, the algorithm has a theoretical global dense-search property, which ensures that all local minima, including the global minimum, can be found. Preliminary numerical results show that ASRM-DI performs well, even for oscillatory and high-dimensional problems, and yields a significant improvement in finding all local minima in a given experimental region.

Table 13: 3-D test functions and relevant parameters. Initial grid: $11^3$.
Method | Test function | Ranges | Local minima | Function evaluations | R.E.
ASRM(K) | 3-D Ackley | −1.5 ≤ x_i ≤ 1.5, i = 1, ..., 3 | 27 | 2400 | 1
ASRM-DI(C) | 3-D Ackley | | | 1586 | 0.66
ASRM-DI(H) | 3-D Ackley | | | 1526 | 0.64
ASRM(K) | 3-D Hartman | 0 ≤ x_i ≤ 1, i = 1, ..., 3 | 3 | 542 | 1
ASRM-DI(C) | 3-D Hartman | | | 298 | 0.55
ASRM-DI(H) | 3-D Hartman | | | 292 | 0.54

Table 14: 4-D test functions and relevant parameters. Initial grid: $9^4$.
Method | Test function | Ranges | Local minima | Function evaluations | R.E.
ASRM(K) | Shekel5 | 0 ≤ x_i ≤ 10, i = 1, ..., 4 | 5 | 1242 | 1
ASRM-DI(C) | Shekel5 | | | 943 | 0.76
ASRM-DI(H) | Shekel5 | | | 900 | 0.72
ASRM(K) | Shekel7 | 0 ≤ x_i ≤ 10, i = 1, ..., 4 | 7 | 3555 | 1
ASRM-DI(C) | Shekel7 | | | 2794 | 0.79
ASRM-DI(H) | Shekel7 | | | 2715 | 0.76
ASRM(K) | Shekel10 | 0 ≤ x_i ≤ 10, i = 1, ..., 4 | 10 | 3630 | 1
ASRM-DI(C) | Shekel10 | | | 2529 | 0.70
ASRM-DI(H) | Shekel10 | | | 2609 | 0.72

Table 15: 6-D test function and relevant parameters. Initial grid: $7^6$.
Method | Test function | Ranges | Local minima | Function evaluations | R.E.
ASRM(K) | 6-D Hartman | 0 ≤ x_i ≤ 1, i = 1, ..., 6 | 2 | 886 | 1
ASRM-DI(C) | 6-D Hartman | | | 593 | 0.67
ASRM-DI(H) | 6-D Hartman | | | 586 | 0.66

Table 16: 8-D test function and relevant parameters. Initial grid: $3^8$.
Method | Test function | Ranges | Local minima | Function evaluations | R.E.
ASRM(K) | 8-D Rosenbrock | −2 ≤ x_i ≤ 2, i = 1, ..., 8 | 1 | 205 | 1
ASRM-DI(C) | 8-D Rosenbrock | | | 135 | 0.66
ASRM-DI(H) | 8-D Rosenbrock | | | 122 | 0.60

References

[1] M. A. Abramson, C. Audet, and J. E. Dennis, Jr. Generalized pattern searches with derivative information. Mathematical Programming, 100(1):3–25, 2004.
[2] C. Audet and J. E. Dennis, Jr. Analysis of generalized pattern searches. SIAM Journal on Optimization, 17(1):188–217, 2006.
[3] S. D. Balkin and D. K. J. Lin. A neural network approach to response surface methodology. Communications in Statistics: Theory and Methods, 29(9):2215–2227, 2000.
[4] G. E. P. Box and K. B. Wilson. On the experimental attainment of optimum conditions. Journal of the Royal Statistical Society, Series B, 13(1):1–45, 1951.
[5] R. B. Chen, W. Wang, and F. Tsai. Basis-based response surface method in computer experiments optimization. Technical report, Institute of Statistics and Department of Applied Mathematics, National University of Kaohsiung, 2006.
[6] H. S. Chung and J. J. Alonso. Using gradients to construct cokriging approximation models for high-dimensional design optimization problems. AIAA Paper, 317:14–17, 2002.
[7] F. H. Clarke. Optimization and Nonsmooth Analysis. Society for Industrial and Applied Mathematics, 1990.
[8] H. M. Gutmann. A radial basis function method for global optimization. Journal of Global Optimization, 19(3):201–227, 2001.
[9] D. R. Jones. A taxonomy of global optimization methods based on response surfaces. Journal of Global Optimization, 21(4):345–383, 2001.
[10] D. R. Jones, M. Schonlau, and W. J. Welch. Efficient global optimization of expensive black-box functions. Journal of Global Optimization, 13(4):455–492, 1998.

[11] R. M. Lewis and V. Torczon. Pattern search algorithms for bound constrained minimization. Citeseer, 1996.
[12] R. M. Lewis and V. Torczon. Pattern search methods for linearly constrained minimization. SIAM Journal on Optimization, 10(3):917–941, 2000.
[13] R. M. Lewis, V. Torczon, and M. W. Trosset. Why pattern search works. Institute for Computer Applications in Science and Engineering, NASA Langley Research Center, 1998.
[14] K. Müller and L. D. Brown. Location of saddle points and minimum energy paths by a constrained simplex optimization procedure. Theoretical Chemistry Accounts: Theory, Computation, and Modeling (Theoretica Chimica Acta), 53(1):75–93, 1979.
[15] Y. S. Ong, P. B. Nair, K. Y. Lum, and Z. K. Zhang. Hybrid evolutionary algorithm for aerodynamic design using Hermite radial basis function interpolants. American Institute of Aeronautics and Astronautics Journal, in communication, 2004.
[16] J. Sacks, W. J. Welch, T. J. Mitchell, and H. P. Wynn. Design and analysis of computer experiments. Statistical Science, 4(4):409–423, 1989.
[17] C. M. Siefert. Model-assisted pattern search. PhD thesis, College of William & Mary, 2000.
[18] A. Sobester, S. J. Leary, and A. J. Keane. A parallel updating scheme for approximating and optimizing high fidelity computer simulations. Structural and Multidisciplinary Optimization, 27(5):371–383, 2004.
[19] V. Torczon. On the convergence of pattern search algorithms. SIAM Journal on Optimization, 1997.
[20] M. W. Trosset and V. Torczon. Numerical optimization using computer experiments. Defense Technical Information Center, 1997.
