Literature Review - National University of Kaohsiung Repository System:Item 310360000Q/10829

• Early Works

Response surface methodology. (Box and Wilson 1951)

The Response Surface Methodology was proposed by Box and Wilson [15]. The notion of response surface methodology is to use a local approximation for an unknown true surface. A better path toward optimization is determined by such an approximation. Pattern search methods were inspired by response surface methodology.

Coordinate Search. (Davidon 1959)

The coordinate search algorithm was first proposed by Davidon in [11]. It is the simplest of the pattern search methods. Coordinate search uses a cross-shaped pattern, and the algorithm uses a parameter to control whether the step is increased or decreased. Its iterations are slow but precise.
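To make the cross-shaped pattern concrete, the following is a minimal sketch of a coordinate search, not Davidon's original formulation. The function name `coordinate_search` and the parameters `step`, `shrink`, and `tol` are assumptions introduced here for illustration; in this sketch the step is only ever decreased.

```python
def coordinate_search(f, x0, step=1.0, shrink=0.5, tol=1e-6, max_iter=10_000):
    """Minimize f by polling +/- step along each coordinate axis (illustrative sketch)."""
    x = list(x0)
    fx = f(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            # Poll the two arms of the cross along coordinate i.
            for sign in (1.0, -1.0):
                trial = list(x)
                trial[i] += sign * step
                ft = f(trial)
                if ft < fx:
                    x, fx = trial, ft
                    improved = True
                    break
        if not improved:
            # No arm of the cross improved f: reduce the step (assumed rule).
            step *= shrink
    return x, fx
```

For example, `coordinate_search(lambda v: (v[0] - 1) ** 2 + (v[1] + 2) ** 2, [0.0, 0.0])` moves along one axis at a time until it reaches the minimizer.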

Hooke and Jeeves’ pattern search. (Hooke and Jeeves 1961)

Hooke and Jeeves introduced the general notion of a “direct search” method in [17]. Their pattern search does not follow a prescribed order; in other words, it makes speculative, exploratory moves. It is a variant of coordinate search with a parameter that controls the size of the pattern, and it speeds up the progress of the algorithm by using information obtained from the search in previous steps.

Simplex Designs in Optimisation and Evolutionary Operation. (Spendley et al. 1962)

This idea originated with Box and Wilson [15] and Box [7], and was then proposed by Spendley et al. in [33]. There are two significant ideas in the article. First, by using “Evolutionary Operation”, it is easier and faster to approach the target and reach the optimal condition.

Second, the numerical computation becomes more formal and simpler, as does the process of judgment. Moreover, the method can be carried out on a digital computer.

Parallel Direct Search Methods. (Dennis and Torczon 1991)

This article was written by Dennis and Torczon [19]. It is the first study of parallel direct search methods. The authors focus on nonlinear unconstrained optimization problems that do not require derivatives, and they point out that direct search methods are easy to implement in a parallel scheme.

In this paper, they also review the multidirectional search algorithm and implement it in a parallel scheme.

• Pattern Search (Direct Search)

– Convergence Analysis

Multidirectional Search. (Torczon 1991)

The convergence analysis for the multidirectional search algorithm was presented by Torczon [34]. The analysis follows the typical convergence proof for gradient-related methods.

The difference is that the gradient is not required.

Pattern Search. (Torczon 1997, Lewis et al. 1998)

Generally speaking, Pattern Search refers to the method introduced in [35] and [25].

The general form of all pattern search methods and their convergence proofs were described and demonstrated in [35]. A clearer and more general discussion of pattern search methods was presented in [25].

Lewis and Torczon discuss pattern search methods for bound and linearly constrained minimization in [23] and [24]. They also investigate the convergence of these methods.

Grid-Based Methods. (Coope and Price 2001)

Convergence analyses of derivative-free methods fall into three main classes: line search methods, trust region methods, and grid-based methods. This article, by Coope and Price [9], focuses on the convergence of grid-based methods. The authors point out that the analysis is not restricted by the grid; that is, the discussion allows the grid to be modified freely.

Derivative-Free Methods. (Lucidi and Sciandrone 2002)

Lucidi and Sciandrone present a global convergence analysis in [26] for unconstrained minimization methods that use only function values. They also compare the pros and cons of pattern search and line search and give a clear view of both. In addition, they design new algorithms that combine the advantages of both methods.

Asynchronous Parallel Pattern Search. (Kolda and Torczon 2004)

Kolda and Torczon introduce the global convergence proof for the Asynchronous Parallel Pattern Search (APPS) method in [22]. As the name implies, the iterations and the searches along each direction are not necessarily synchronous. The search along each direction proceeds independently and semi-autonomously.

Revisiting Asynchronous Parallel Pattern Search. (Kolda 2005)

A new Asynchronous Parallel Pattern Search (APPS) method was proposed by Kolda in [21].

It modifies the original APPS so that it is more adaptable in its use of distributed computing.

The convergence theory covers unconstrained and bound-constrained problems, with both simple and sufficient decrease.

– General Pattern Search

Analysis of Generalized Pattern Search. (Audet and Dennis 2003)

Torczon originally considers unconstrained optimization problems in [35]. Lewis and Torczon discuss pattern search methods for bound and linearly constrained minimization in [23] and [24], and they also present convergence theories for them. Audet and Dennis bring up a fresh convergence analysis for the Generalized Pattern Search (GPS) [35].

The analysis of GPS described in [3] combines the arguments of [35], [23], and [24]. Abramson et al. show how to use derivative information in GPS in [2].

– Precision Control

Precision Control Algorithm

Wetter and Polak present a precision control algorithm for general pattern search in [38]. The algorithm increases the precision gradually over the iterations. They give a convergence analysis in [28].

– Mesh Adaptive Direct Search

Mesh Adaptive Direct Search. (Audet and Dennis 2006)

Audet and Dennis first introduce the Mesh Adaptive Direct Search (MADS) in [4]. The main issues they address are constrained non-smooth optimization problems. Abramson and Audet present a general convergence analysis of MADS in [1].

– Parallel

Parallel Direct Search Methods. (Dennis and Torczon 1991)

This article was written by Dennis and Torczon [19]. It is the first study of parallel direct search methods. The authors focus on nonlinear unconstrained optimization problems that do not require derivatives, and they point out that direct search methods are easy to implement in a parallel scheme. In this paper, they also review the multidirectional search algorithm and implement it in a parallel scheme.

Asynchronous Parallel Pattern Search. (Hough et al. 2001)

The Asynchronous Parallel Pattern Search (APPS) was proposed by Hough et al. [18]. They address some failure situations of parallel direct search methods. In this article, they also propose fault-tolerant strategies to ensure that the computation does not break down even when a process fails. The global convergence of APPS was introduced in [22].

• Model-Assisted

Model-assisted Grid Search. (Trosset and Torczon 1997)

The Model-assisted Grid Search algorithm (MAGS) [37] implements both pattern search and the surrogate management framework. MAGS is a method that combines numerical optimization with computer experiments. The algorithm uses kriging to generate surrogate surfaces of the objective function. It then exploits the surrogate surfaces to find the optimal point by grid search.

Surrogate Management Framework. (Booker et al. 1999)

The Surrogate Management Framework was described by Booker et al. [6]. The main difficulties it is meant to address are: (i) The computation of the objective function is very expensive. (ii) It is difficult to obtain derivatives of the objective function. To solve such problems, the basic idea of the surrogate management framework is to employ a global approximation to the objective function to speed up the search for a minimizer.

Model-assisted Pattern Search. (Siefert 2000)

The Model-assisted Pattern Search (MAPS) algorithm maintains a pattern search structure and constructs surrogates to speed up the optimization search. MAPS refines its grid when its refinement condition is satisfied [36]. The MAPS algorithm makes use of a design criterion, namely the estimated mean squared error of the approximation. This design criterion was described by Cox and John [10]. The MAGS algorithm [37] does not use the design criterion.

• Design and Analysis of Computer Experiments

Kriging.

Kriging is a geostatistical technique for interpolating the value of a random field. It interpolates the values at unknown points from the function values at known points, using the correlation between points. The kriging approximation is a linear combination of regression models and correlation parts. More information is available at http://en.wikipedia.org/wiki/Kriging.

Design and Analysis of Computer Experiments. (Sacks et al. 1989)

Design and Analysis of Computer Experiments was outlined by Sacks et al. [30]. More and more scientific computing problems are now studied through computer models and codes. These problems have something in common: (i) The codes are expensive to run. (ii) The experiment aims to fit the objective function with cheaper models. This article takes a statistical viewpoint, and its statistical model is adopted from the kriging method.

• Global Optimization Analysis

Radial Basis. (Jones et al. 1998)

The main issue in these optimization problems is the computational cost of expensive function evaluations. In [13], the authors describe a response surface method that addresses this issue. They also show how to use the information in the approximating functions to create a global optimization method with a certain stopping criterion.

A Radial Basis Function Method for Global Optimization. (Gutmann 2001)

This article [16] covers a wide variety of radial basis functions to illustrate the advantages of interpolation for global optimization. It also establishes convergence and proposes a global optimization algorithm. Finally, the article gives a comparison with other global optimization methods.

A taxonomy of global optimization. (Jones 2001)

Jones introduces a taxonomy of global optimization for employing response surfaces in [20]. In this article, the methods that are using approximative models are compared with each other. The paper also presents the advantages and disadvantages of them.

Constrained global optimization. (Regis and Shoemaker 2005)

In [29], Regis and Shoemaker introduce a new strategy that uses radial basis functions for the constrained global optimization of expensive black-box functions. They establish the convergence of this new method and show that it performs better than other constrained optimization methods based on response surfaces.

2 Algorithm

The Adaptive Search Regions Method (ASRM) is a framework for finding a minimum accurately and for finding multiple local minima automatically. The algorithm first finds a local minimum with the global surrogate surface optimization method. It then shrinks the search region after finding the local minimum in the original mesh. These steps are repeated until the mesh size is small enough. This sequence of searches can be treated as a branch of a tree. When it meets a node, the algorithm traces back toward the root and decides whether to search another branch or to continue tracing back.

We discuss the main components of the algorithm in Section 2.1. We then specify how to find multiple minima in Section 2.2. The complete algorithm and comparison with the pattern search method are summarized in Section 2.3.

2.1 Main Components of ASRM

In this section, we outline the ASRM algorithm as follows:

a. Defining the Grid.

We would first like to set up an experimental domain of interest. From this testing range, the algorithm will perform several procedures of grid enlarging and grid reducing. We discuss details regarding multiple minima in Section 2.2. All the grids of the ASRM process satisfy the form that we define as follows.

The grid we define is composed of a main matrix and a minor matrix. The minor matrix $B \in \mathbb{R}^{d \times d}$ is a diagonal matrix, where $d$ is the dimension. The main matrix is $A \in \mathbb{Z}^{d \times N}$, where $N = p_1 \times p_2 \times \dots \times p_d$. We can decompose $A$ as:

A = [M − M L] = [Γ L] (2)

where M and −M determine the boundary of the grid, and L determines the other points of the grid and contains the column of zeros that is identified as the center of the grid.

Given a grid size parameter $\Delta_k \in \mathbb{R}$, $\Delta_k > 0$, we can then determine all points in the grid and the grid size in each dimension.

Now we discuss the construction of the grid $G_k$ from the matrices $A$ and $B$. For simplicity, we only discuss the two-dimensional case with $p_1 = p_2 = 5$ grid points per dimension.

We choose the minor matrix $B = I_2$, so the grid size of each dimension is the same.

The matrix A contains M , −M, and L.

M =

The matrix L contains the other points of the grid and the column of zeros. The preceding comments suggest that we can generate the matrix A as follows:

For i = 1, . . . , p1 do
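As an illustration of the grid definition, the following is a minimal sketch of one way to enumerate the integer matrix $A$ and the grid points $\Delta_k \cdot B \cdot A + C_k$. It assumes an odd number of points in each dimension, so that the grid is symmetric about its center. The function name `build_grid` and the column ordering are assumptions made here; in particular, the ordering does not reproduce the partition $A = [M \; -M \; L]$.

```python
import itertools

import numpy as np


def build_grid(p, delta, B, center):
    """Return (A, G): integer matrix A (d x N) and grid points G = delta * B @ A + center.

    Sketch only: assumes every p_i is odd, so the zero column (the grid center)
    is one of the columns of A.
    """
    axes = [np.arange(-(pi // 2), pi // 2 + 1) for pi in p]
    # Each column of A is one integer offset from the center; N = p_1 * ... * p_d.
    A = np.array(list(itertools.product(*axes))).T
    G = delta * (B @ A) + np.asarray(center, dtype=float).reshape(-1, 1)
    return A, G


# Two-dimensional case with p_1 = p_2 = 5 and B = I_2.
A, G = build_grid((5, 5), delta=0.5, B=np.eye(2), center=(1.0, 1.0))
```

With these inputs `A` has 25 columns with integer entries between -2 and 2, and each coordinate of `G` ranges from 0.0 to 2.0 around the center.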

The choice of initial experimental points is an important one. If there is no extra information regarding the objective surface, we could use the uniform design (Fang et al. 2000) to pick the initial experimental points $P_{init} \in G_k$. The benefit of choosing the initial experimental points in this way is that we do not introduce any operator bias into the analysis. After choosing the initial experimental points, we have a temporarily optimal point $O_{init} = \arg\min \{ f(x) \mid x \in P_{init} \}$. We will see that there is a specific criterion that allows the algorithm to determine whether the search region should be shrunk.

c. Constructing the Surrogate Surface.

Here the surrogate surface can be constructed by any method. In this article, we implement two surrogate surface constructors for ASRM, namely the Basis-based Response Surface Method and Kriging.

• Basis-based Response Surface Method (BRSM)

BRSM was proposed by Wang and Chen in 2007. It involves the following two main parts.

1. Determination of overcomplete bases.

The idea of determining a set of overcomplete bases originated from the use of linear combinations to estimate a surrogate surface that is similar to the true response surface. Here we define the set of bases i = 1, . . . , d (d is the dimension). The corresponding dimension contains $p_i$ grid points.

2. Forming the surrogate surfaces by the overcomplete bases.

In each iteration, the response values $f(x)$ of all the points in $P_{exp}$ are evaluated. Suppose that the number of experimental points in $P_{exp}$ is $p$ ($1 \le p \le N$). If these points are $x_1, \dots, x_p$ and the corresponding response values are $f(x_1), \dots, f(x_p)$, then we define the $p \times 1$ vector

$\tilde{V}_{P_{exp}} = (f(x_1), \dots, f(x_p))^\top.$

Let $e_{x_i}$ be the $N \times 1$ unit vector whose entries are all zero except for the entry corresponding to the point $x_i$, which is one. We define the $p \times N$ identification matrix $I_p$ of the form:

$I_p = [e_{x_1}, \dots, e_{x_p}]^\top.$

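Our main target is to construct a surrogate surface $\sum_{i=1}^{M} \tilde{c}_i \tilde{\varphi}_i$ so that

$\tilde{V}_{P_{exp}} \approx \sum_{i=1}^{M} \tilde{c}_i \tilde{\varphi}_i,$

where $\tilde{\varphi}_i = I_p \varphi_i$, $i = 1, \dots, M$. Since $M \gg p$, a matching pursuit algorithm is usually used to compute $\tilde{c}_i$ so that the error is reduced to a certain tolerance. Once the $\tilde{c}_i$ are identified, the surrogate surface can be constructed over the whole experimental region as:

$\tilde{S} = \sum_{i=1}^{M} \tilde{c}_i \varphi_i.$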
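The coefficient computation can be sketched with a plain matching pursuit. This is a generic textbook variant, not necessarily the one used by Wang and Chen. The names `Phi` (an $N \times M$ array holding the bases on the full grid), `idx` (row indices of the experimental points), and `V` (their responses) are assumptions introduced for this example.

```python
import numpy as np


def matching_pursuit_surrogate(Phi, idx, V, tol=1e-8, max_iter=1000):
    """Greedy matching pursuit on the sampled rows of Phi (illustrative sketch).

    Returns the coefficients c and the surrogate Phi @ c on the whole grid.
    """
    Phi_p = Phi[idx, :]                       # plays the role of I_p applied to each basis
    norms = np.linalg.norm(Phi_p, axis=0)
    usable = norms > 0                        # bases that vanish at every sample are skipped
    c = np.zeros(Phi.shape[1])
    r = np.asarray(V, dtype=float).copy()     # residual at the experimental points
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol:
            break
        corr = np.zeros_like(norms)
        corr[usable] = (Phi_p[:, usable].T @ r) / norms[usable]
        j = int(np.argmax(np.abs(corr)))      # basis most correlated with the residual
        if corr[j] == 0.0:
            break
        step = corr[j] / norms[j]
        c[j] += step
        r -= step * Phi_p[:, j]
    return c, Phi @ c
```

The returned surrogate is defined on all $N$ grid points, while only the $p$ sampled rows are used to fit the coefficients.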

• Kriging

Kriging is a geostatistical method used to interpolate the surrogate surface of a random field. It was developed in the field of spatial statistics and is commonly applied to computer experiments.

The surrogate surface can be denoted as follows:

$\tilde{S} = \sum_{j=1}^{N} \beta_j \alpha(x_j) + Z(x), \qquad (9)$

where $\beta_j$ are the regression coefficients of a system of linear equations, and $\alpha(x_j)$ are the functions in the regression.

Z(x) is a random process with mean zero and covariance function of the form:

c(s, t) = σ²Cθ(s, t) (10)

where σ²> 0 is the process variance and Cθ(s, t) is the correlation function.

We compute the weights $w(x)$ that minimize

$\mathrm{MSE}[\tilde{f}(x)] = E[w^\top(x) f(P_{exp}) - f(x)]^2 \qquad (11)$

subject to the unbiasedness constraint

$E[w^\top(x) f(P_{exp})] = E[f(x)] \qquad (12)$

to obtain the best linear unbiased predictor (BLUP).

Suppose that the number of experimental points in $P_{exp}$ is $p$ ($1 \le p \le N$). If these points are $x_1, \dots, x_p$, we define the following quantities for the BLUP of the response at an untried point $x$:

$r(x) = [r_1(x), \dots, r_k(x)]^\top \qquad (13)$

for the $k$ functions in the regression,

$R = [r(x_1), \dots, r(x_p)]^\top \qquad (14)$

for the $p \times k$ matrix,

$C = \{ C(x_i, x_j) \}, \quad 1 \le i \le p, \ 1 \le j \le p,$

for the $p \times p$ matrix of random process correlations between the $Z$'s at $P_{exp}$, and

$c(x) = [C(x_1, x), \dots, C(x_p, x)]^\top$

for the correlations between the $Z$'s at $P_{exp}$ and an untried point $x$. The MSE can then be denoted as follows:

$\sigma^2 [1 + w^\top(x) C w(x) - 2 w^\top(x) c(x)],$

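and the unbiasedness constraint is $R^\top w(x) = r(x)$. Employing Lagrange multipliers $\lambda(x)$ for the constrained minimization of the MSE, the weight $w(x)$ of the BLUP has to satisfy

$\begin{pmatrix} 0 & R^\top \\ R & C \end{pmatrix} \begin{pmatrix} \lambda(x) \\ w(x) \end{pmatrix} = \begin{pmatrix} r(x) \\ c(x) \end{pmatrix}. \qquad (15)$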

The surrogate surface can then be rewritten in the following form:

$\tilde{S} = r^\top(x) \tilde{\beta} + c^\top(x) C^{-1} (f(P_{exp}) - R \tilde{\beta}), \qquad (16)$

where $\tilde{\beta} = (R^\top C^{-1} R)^{-1} R^\top C^{-1} f(P_{exp})$ is the usual generalized least-squares estimate of $\beta$.
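The predictor in (16) can be sketched numerically as follows. The text does not fix a correlation function, so the Gaussian correlation `gauss_corr`, its parameter `theta`, and the constant regression basis used as the default are assumptions made for illustration. The function names are likewise hypothetical.

```python
import numpy as np


def gauss_corr(s, t, theta=1.0):
    """Assumed correlation function C_theta(s, t); the source does not specify one."""
    diff = np.asarray(s, dtype=float) - np.asarray(t, dtype=float)
    return float(np.exp(-theta * np.sum(diff ** 2)))


def kriging_predictor(X, y, regress=lambda x: np.array([1.0]), corr=gauss_corr):
    """Build S(x) = r(x)^T beta + c(x)^T C^{-1} (y - R beta) from data (X, y)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p = len(X)
    R = np.array([regress(x) for x in X])                                   # p x k
    C = np.array([[corr(X[i], X[j]) for j in range(p)] for i in range(p)])  # p x p
    # Generalized least-squares estimate of beta, using solves instead of inverses.
    Ci_R = np.linalg.solve(C, R)
    Ci_y = np.linalg.solve(C, y)
    beta = np.linalg.solve(R.T @ Ci_R, R.T @ Ci_y)
    weights = np.linalg.solve(C, y - R @ beta)                              # C^{-1}(y - R beta)

    def predict(x):
        c = np.array([corr(xi, x) for xi in X])
        return float(regress(x) @ beta + c @ weights)

    return predict
```

A quick sanity check is that the predictor interpolates the data: evaluating `predict` at any experimental point returns the observed response there, up to rounding.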

d. Choosing a New Point and Verifying the Stopping Criterion.

We employ the surrogate surface $\tilde{S}$ to determine the next new point $P_{new}$ from $G_k \setminus P_{exp}$. Here we apply an optimization method to $\tilde{S}$ to find $P_{new}$, e.g. $P_{new} = \arg\min \{ \tilde{S}(x) \mid x \in G_k \setminus P_{exp} \}$. We then add the point $P_{new}$ to $P_{exp}$, and we have a new possible optimal point $O_{k+1} = \arg\min \{ f(x) \mid x \in P_{exp} \}$.
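The selection step can be sketched as a search over the unevaluated grid points. This is a minimal illustration that uses exhaustive enumeration as the optimization method. Grid points are represented as tuples and the evaluated set as a dictionary from point to function value, and the function names are assumptions made here.

```python
def choose_new_point(grid_points, evaluated, surrogate):
    """Return the minimizer of the surrogate over the grid minus the evaluated points."""
    candidates = [x for x in grid_points if x not in evaluated]
    if not candidates:
        return None                      # every grid point has already been evaluated
    return min(candidates, key=surrogate)


def add_point_step(grid_points, evaluated, surrogate, f):
    """Evaluate f at the chosen point and return (new point, best point so far)."""
    p_new = choose_new_point(grid_points, evaluated, surrogate)
    if p_new is not None:
        evaluated[p_new] = f(p_new)      # add the new point to the experimental set
    o_next = min(evaluated, key=evaluated.get)
    return p_new, o_next
```

Here `evaluated` is updated in place, so the same dictionary can be reused across iterations.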

After this the algorithm is ready to check the stopping criteria for shrinking the search region.

In order to prove the convergence of the ASRM, we impose some conditions on the searching paths.

These conditions consist of the following hypotheses on searching paths.

Hypotheses On Searching Paths:

1. $O_{k+1} \in \Delta_k \cdot B \cdot A + C_k \equiv \Delta_k [B\Gamma \;\; BL] + C_k$.

2. If $\min \{ f(O_k + y) \mid y \in \Delta_k B \Gamma \} < f(O_k)$, then $f(O_{k+1}) < f(O_k)$.

The hypotheses on searching paths, which imply that $f(O_{k+1}) < f(O_k)$, force a point with a lower function value to be found.

When $O_{k+1}$ is identified, the algorithm checks whether the function value of $O_{k+1}$ is lower than that of $O_k$. In other words, $\rho_k = f(O_k) - f(O_{k+1})$ is computed. If $\rho_k > 0$, then let $O_k = O_{k+1}$ and keep adding new points. Conversely, if $\rho_k \le 0$, then the searching region should be shrunk and the grid refined. Thus the searching sequence proceeds with ever-decreasing function values.
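The decision rule based on $\rho_k$ can be sketched as a loop that keeps adding points while the best value improves and reports when the region should be shrunk. The callable `propose`, which stands in for the surrogate-based choice of the next point, and the status strings are assumptions made for this illustration.

```python
def search_until_stall(f, propose, evaluated):
    """Add points while rho_k > 0; stop when rho_k <= 0 or the grid is exhausted.

    propose(evaluated) must return an unevaluated grid point, or None if none is left.
    evaluated maps points (tuples) to function values and is updated in place.
    """
    o_k = min(evaluated, key=evaluated.get)
    while True:
        p_new = propose(evaluated)
        if p_new is None:
            return o_k, "exhausted"
        evaluated[p_new] = f(p_new)
        o_next = min(evaluated, key=evaluated.get)
        rho = evaluated[o_k] - evaluated[o_next]   # rho_k = f(O_k) - f(O_{k+1})
        if rho > 0:
            o_k = o_next                           # improvement: keep adding points
        else:
            return o_k, "shrink"                   # no improvement: shrink and refine
```

In this sketch the caller is responsible for shrinking the region and rebuilding the grid when the status is `"shrink"`.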

e. Shrinking the Search Region and Refining Grids.

If $\rho_k \le 0$, then no improving $P_{new}$ could be found in the original searching region. The searching region should be shrunk and the grid should be refined. The algorithm then attempts to find a $P_{new}$ with a lower function value in a new search region.

When the algorithm cannot find a point with a function value lower than that of the current optimal point $O_k = \arg\min \{ f(x) \mid x \in P_{exp} \}$ in the current grid $G_k$, we suppose that a lower point exists near $O_k$ on a grid of smaller size. Thus we choose the current optimal point as the center $C_{k+1} = O_k$ of a new grid $G_{k+1}$. The grid size control parameter $\Delta_k$ is refined with a ratio $\frac{1}{R}$, where $R \ge 2$. Let $\Delta_{k+1} = \frac{1}{R} \cdot \Delta_k$; then we have

$G_{k+1} = \bigcup_{i=1}^{N} g_{k+1}^{(x_i)}, \quad \text{where } g_{k+1}^{(x_i)} = \Delta_{k+1} \cdot B \cdot a_i + C_{k+1}$

and $a_i$ denotes the $i$-th column of $A$.
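The refinement step can be sketched as follows, under the assumption that refining with ratio $1/R$ means dividing the grid size parameter by $R$ and that each new grid point is obtained from a column of $A$. The function name `refine_grid` and its argument names are hypothetical.

```python
import numpy as np


def refine_grid(A, B, delta_k, o_k, R=2.0):
    """Shrink the grid around the current best point (illustrative sketch).

    Assumes delta_{k+1} = delta_k / R with R >= 2 and center C_{k+1} = O_k.
    """
    if R < 2:
        raise ValueError("R must be at least 2")
    delta_next = delta_k / R
    center_next = np.asarray(o_k, dtype=float).reshape(-1, 1)
    # Column i of G_next is delta_{k+1} * B @ a_i + C_{k+1}.
    G_next = delta_next * (B @ A) + center_next
    return delta_next, center_next.ravel(), G_next
```

The number of grid points is unchanged by this step; only the spacing and the center move.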

The main reasons for this kind of grid reduction are as follows. We would like to reuse function values that have already been computed for some of the experimental points in $P_{exp}$. This is possible because the new experimental region takes the optimal point of $P_{exp}$ as its center before the reduction. Here, we believe that the local minimum is located near the center, even though it is not actually the best point found thus far. The range of each dimension becomes one half of its value in the last experimental region, while the number of grid points stays the same.

However, some parts of the new domain might not be completely included in the last experimental region; the algorithm will then change their position. By the same principle, the grid size is a factor of $\frac{1}{R}$ times the prior grid size. Also, some grid points of the former experimental region are contained in the latter experimental region, so the algorithm can reuse some of the information already obtained. In the new search region, only the points whose function values have already been evaluated are taken as the initial experimental points.

The following describes the step of grid reduction in detail:

1. The optimal point in Pexp determines the center of the new experimental region.

2. We verify whether the new experimental region extends beyond the last experimental region by using a region checking algorithm. For convenience of explanation, we suppose that the number of grid points in each dimension is odd.

Algorithm. CHECKING REGION.

Let the i-th dimension coordinate of center be C(i).

For i = 1, . . . , d (Here d is the dimension.)

When the new region is obtained, the algorithm goes back and repeats components b. through e. Iterations stop when the stopping criteria are satisfied.

f. Stopping Criteria for Accuracy Control (Downward).

Shrinking the search region and refining grids will not terminate until the procedure has satisfied some stopping criteria. These stopping criteria may generally be tuned to suit a given experiment and desired degree of accuracy. Some stopping conditions that are in common use are:

• When the mesh size becomes less than a certain numerical tolerance.

• When $\rho_k = f(O_k) - f(O_{k+1}) > 0$ and $\rho_k$ becomes less than a certain numerical tolerance.

• When the available computational resources run out.
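The three conditions above can be combined into a single check. The following is a minimal sketch. The tolerance values and the use of a function-evaluation budget as the measure of computational resources are assumptions chosen for illustration.

```python
def should_stop(mesh_size, rho, evals_used,
                mesh_tol=1e-6, rho_tol=1e-8, max_evals=1000):
    """Return True if any of the downward stopping criteria is met (illustrative)."""
    if mesh_size < mesh_tol:          # mesh size below tolerance
        return True
    if 0 < rho < rho_tol:             # improvement is positive but negligible
        return True
    if evals_used >= max_evals:       # evaluation budget exhausted (assumed resource measure)
        return True
    return False
```

In practice the tolerances would be tuned to the experiment and the desired accuracy, as noted above.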
