
OUTLINE OF THE TRUST-REGION APPROACH

Trust-Region Methods

$O(\|p\|^2)$, which is small when $p$ is small.

When $B_k$ is equal to the true Hessian $\nabla^2 f(x_k)$, the approximation error in the model function $m_k$ is $O(\|p\|^3)$, so this model is especially accurate when $\|p\|$ is small. This choice $B_k = \nabla^2 f(x_k)$ leads to the trust-region Newton method, and will be discussed further in Section 4.4. In other sections of this chapter, we emphasize the generality of the trust-region approach by assuming little about $B_k$ except symmetry and uniform boundedness.

To obtain each step, we seek a solution of the subproblem

$$\min_{p \in \mathbb{R}^n} \; m_k(p) = f_k + g_k^T p + \tfrac{1}{2} p^T B_k p \quad \text{s.t. } \|p\| \le \Delta_k, \tag{4.3}$$

where $\Delta_k > 0$ is the trust-region radius. In most of our discussions, we define $\|\cdot\|$ to be the Euclidean norm, so that the solution $p_k^*$ of (4.3) is the minimizer of $m_k$ in the ball of radius $\Delta_k$. Thus, the trust-region approach requires us to solve a sequence of subproblems (4.3) in which the objective function and constraint (which can be written as $p^T p \le \Delta_k^2$) are both quadratic. When $B_k$ is positive definite and $\|B_k^{-1} g_k\| \le \Delta_k$, the solution of (4.3) is easy to identify: it is simply the unconstrained minimum $p_k^B = -B_k^{-1} g_k$ of the quadratic $m_k(p)$. In this case, we call $p_k^B$ the full step. The solution of (4.3) is not so obvious in other cases, but it can usually be found without too much computational expense. In any case, as described below, we need only an approximate solution to obtain convergence and good practical behavior.
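The easy case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `full_step` is hypothetical, $B_k$ is assumed symmetric, and a NumPy Cholesky factorization is used as the positive-definiteness test. The function returns the full step when it exists and lies inside the trust region, and `None` otherwise.

```python
import numpy as np

def full_step(B, g, delta):
    """Return p^B = -B^{-1} g if B is positive definite and ||p^B|| <= delta, else None.

    Hypothetical helper for illustration; B is assumed to be symmetric.
    """
    try:
        # Cholesky succeeds only for (numerically) positive definite matrices.
        np.linalg.cholesky(B)
    except np.linalg.LinAlgError:
        return None
    p = np.linalg.solve(B, -g)  # unconstrained minimizer of the quadratic model
    return p if np.linalg.norm(p) <= delta else None

# Example data (assumed for illustration only).
B = np.array([[2.0, 0.0], [0.0, 4.0]])
g = np.array([2.0, -4.0])
print(full_step(B, g, delta=2.0))  # the full step fits inside the region
print(full_step(B, g, delta=1.0))  # the full step is too long, so None
```

When the function returns `None`, the subproblem has to be handled by one of the other strategies outlined later in this section.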

One of the key ingredients in a trust-region algorithm is the strategy for choosing the trust-region radius $\Delta_k$ at each iteration. We base this choice on the agreement between the model function $m_k$ and the objective function $f$ at previous iterations. Given a step $p_k$ we define the ratio

$$\rho_k = \frac{f(x_k) - f(x_k + p_k)}{m_k(0) - m_k(p_k)}; \tag{4.4}$$

the numerator is called the actual reduction, and the denominator is the predicted reduction (that is, the reduction in $f$ predicted by the model function). Note that since the step $p_k$ is obtained by minimizing the model $m_k$ over a region that includes $p = 0$, the predicted reduction will always be nonnegative. Hence, if $\rho_k$ is negative, the new objective value $f(x_k + p_k)$ is greater than the current value $f(x_k)$, so the step must be rejected. On the other hand, if $\rho_k$ is close to 1, there is good agreement between the model $m_k$ and the function $f$ over this step, so it is safe to expand the trust region for the next iteration. If $\rho_k$ is positive but significantly smaller than 1, we do not alter the trust region, but if it is close to zero or negative, we shrink the trust region by reducing $\Delta_k$ at the next iteration.
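To make the ratio (4.4) concrete, here is a small Python sketch. The function name `reduction_ratio` and the one-dimensional test function $f(x) = x^4$ are assumptions chosen for illustration; the model is the quadratic $m_k(p) = f(x_k) + g_k^T p + \tfrac12 p^T B_k p$ with exact first and second derivatives.

```python
import numpy as np

def reduction_ratio(f, x, p, g, B):
    """Compute rho_k of (4.4) for the model m_k(p) = f(x) + g^T p + 0.5 p^T B p."""
    actual = f(x) - f(x + p)                  # numerator: actual reduction
    predicted = -(g @ p + 0.5 * p @ B @ p)    # denominator: m_k(0) - m_k(p)
    return actual / predicted

# Illustrative example: f(x) = x^4 at x_k = 1 with exact derivatives.
f = lambda x: x[0] ** 4
x = np.array([1.0])
g = np.array([4.0])          # f'(1)  = 4
B = np.array([[12.0]])       # f''(1) = 12
p = np.array([-1.0 / 3.0])   # unconstrained model minimizer -g/B
rho = reduction_ratio(f, x, p, g, B)
print(rho)
```

In this example the actual reduction is $1 - (2/3)^4 = 65/81$ and the predicted reduction is $2/3$, so $\rho_k = 65/54 \approx 1.2$: the model underestimates the decrease, and the step would be accepted.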

The following algorithm describes the process.

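Algorithm 4.1 (Trust Region).

Given $\hat\Delta > 0$, $\Delta_0 \in (0, \hat\Delta)$, and $\eta \in [0, \tfrac{1}{4})$:

for $k = 0, 1, 2, \ldots$
    Obtain $p_k$ by (approximately) solving (4.3);
    Evaluate $\rho_k$ from (4.4);
    if $\rho_k < \tfrac{1}{4}$
        $\Delta_{k+1} = \tfrac{1}{4} \Delta_k$
    else
        if $\rho_k > \tfrac{3}{4}$ and $\|p_k\| = \Delta_k$
            $\Delta_{k+1} = \min(2\Delta_k, \hat\Delta)$
        else
            $\Delta_{k+1} = \Delta_k$;
    if $\rho_k > \eta$
        $x_{k+1} = x_k + p_k$
    else
        $x_{k+1} = x_k$;
end (for).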

Here $\hat\Delta$ is an overall bound on the step lengths. Note that the radius is increased only if $\|p_k\|$ actually reaches the boundary of the trust region. If the step stays strictly inside the region, we infer that the current value of $\Delta_k$ is not interfering with the progress of the algorithm, so we leave its value unchanged for the next iteration.
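The following Python sketch shows one way Algorithm 4.1 might be coded. Several pieces are assumptions rather than part of the algorithm as stated: the subproblem (4.3) is solved only approximately, using the standard closed-form Cauchy point (the model minimizer along $-g_k$ within the trust region); the gradient tolerance `tol` and iteration cap `max_iter` are added stopping rules; the boundary test $\|p_k\| = \Delta_k$ is applied up to floating-point tolerance; and the default parameter values and the test function are arbitrary.

```python
import numpy as np

def cauchy_point(g, B, delta):
    """Minimizer of the quadratic model along -g within the ball of radius delta."""
    g_norm = np.linalg.norm(g)
    gBg = g @ B @ g
    # tau = 1 (step to the boundary) when the curvature along -g is nonpositive.
    tau = 1.0 if gBg <= 0 else min(g_norm**3 / (delta * gBg), 1.0)
    return -tau * (delta / g_norm) * g

def trust_region(f, grad, hess, x0, delta_hat=10.0, delta0=1.0, eta=0.1,
                 tol=1e-6, max_iter=500):
    """Sketch of Algorithm 4.1 with Cauchy-point steps (tol, max_iter are additions)."""
    x, delta = np.asarray(x0, dtype=float), delta0
    for _ in range(max_iter):
        g, B = grad(x), hess(x)
        if np.linalg.norm(g) < tol:               # assumed stopping rule
            break
        p = cauchy_point(g, B, delta)             # approximate solution of (4.3)
        predicted = -(g @ p + 0.5 * p @ B @ p)    # m_k(0) - m_k(p_k)
        rho = (f(x) - f(x + p)) / predicted       # ratio (4.4)
        if rho < 0.25:
            delta = 0.25 * delta                  # poor agreement: shrink
        elif rho > 0.75 and np.isclose(np.linalg.norm(p), delta):
            delta = min(2.0 * delta, delta_hat)   # good agreement at the boundary: expand
        if rho > eta:
            x = x + p                             # accept the step
    return x

# Illustrative smooth convex test function with minimizer at the origin.
f = lambda x: np.exp(x[0]) - x[0] + x[1] ** 2
grad = lambda x: np.array([np.exp(x[0]) - 1.0, 2.0 * x[1]])
hess = lambda x: np.array([[np.exp(x[0]), 0.0], [0.0, 2.0]])
x_star = trust_region(f, grad, hess, x0=[2.0, 1.0])
print(x_star)
```

With Cauchy-point steps the iteration behaves much like steepest descent with a safeguarded step length, so it is a baseline rather than a recommended solver; the dogleg and subspace strategies mentioned later in this section aim to do better.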

To turn Algorithm 4.1 into a practical algorithm, we need to focus on solving the trust-region subproblem (4.3). In discussing this matter, we sometimes drop the iteration subscript k and restate the problem (4.3) as follows:

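$$\min_{p \in \mathbb{R}^n} \; m(p) \stackrel{\text{def}}{=} f + g^T p + \tfrac{1}{2} p^T B p \quad \text{s.t. } \|p\| \le \Delta. \tag{4.5}$$

A first step to characterizing exact solutions of (4.5) is given by the following theorem (due to Moré and Sorensen [214]), which shows that the solution $p^*$ of (4.5) satisfies

$$(B + \lambda I) p^* = -g \tag{4.6}$$

for some $\lambda \ge 0$.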

Theorem 4.1.

The vector $p^*$ is a global solution of the trust-region problem

$$\min_{p \in \mathbb{R}^n} \; m(p) = f + g^T p + \tfrac{1}{2} p^T B p, \quad \text{s.t. } \|p\| \le \Delta, \tag{4.7}$$

if and only if $p^*$ is feasible and there is a scalar $\lambda \ge 0$ such that the following conditions are satisfied:

$$(B + \lambda I) p^* = -g, \tag{4.8a}$$

$$\lambda (\Delta - \|p^*\|) = 0, \tag{4.8b}$$

$$(B + \lambda I) \text{ is positive semidefinite.} \tag{4.8c}$$

We delay the proof of this result until Section 4.3, and instead discuss just its key features here with the help of Figure 4.2. The condition (4.8b) is a complementarity condition that states that at least one of the nonnegative quantities $\lambda$ and $(\Delta - \|p^*\|)$ must be zero. Hence, when the solution lies strictly inside the trust region (as it does when $\Delta = \Delta^1$ in Figure 4.2), we must have $\lambda = 0$ and so $B p^* = -g$ with $B$ positive semidefinite, from (4.8a) and (4.8c), respectively. In the other cases $\Delta = \Delta^2$ and $\Delta = \Delta^3$, we have $\|p^*\| = \Delta$, and so $\lambda$ is allowed to take a positive value. Note from (4.8a) that

$$\lambda p^* = -B p^* - g = -\nabla m(p^*).$$

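[Figure: contours of $m$, with trust regions of radii $\Delta^1$, $\Delta^2$, $\Delta^3$ and the corresponding solutions $p^{*1}$, $p^{*2}$, $p^{*3}$.]

Figure 4.2 Solution of trust-region subproblem for different radii $\Delta^1$, $\Delta^2$, $\Delta^3$.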

Thus, when $\lambda > 0$, the solution $p^*$ is collinear with the negative gradient of $m$ and normal to its contours. These properties can be seen in Figure 4.2.
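The conditions of Theorem 4.1 are easy to test numerically for a candidate pair $(p, \lambda)$. The sketch below is illustrative only: the function name `satisfies_conditions`, the tolerance `tol`, and the small example with $B = I$ and $g = (-2, 0)^T$ are assumptions, and positive semidefiniteness is checked through the smallest eigenvalue of the symmetric matrix $B + \lambda I$.

```python
import numpy as np

def satisfies_conditions(B, g, delta, p, lam, tol=1e-8):
    """Check feasibility, lam >= 0, and conditions (4.8a)-(4.8c) up to tolerance tol."""
    shifted = B + lam * np.eye(len(g))
    p_norm = np.linalg.norm(p)
    feasible = p_norm <= delta + tol
    stationary = np.allclose(shifted @ p, -g, atol=tol)      # (4.8a)
    complementary = abs(lam * (delta - p_norm)) <= tol       # (4.8b)
    psd = np.linalg.eigvalsh(shifted).min() >= -tol          # (4.8c)
    return bool(feasible and lam >= 0 and stationary and complementary and psd)

# Illustrative data: the unconstrained minimizer of m is (2, 0).
B = np.eye(2)
g = np.array([-2.0, 0.0])

# Radius 3: the solution is interior, so lam = 0.
print(satisfies_conditions(B, g, 3.0, np.array([2.0, 0.0]), 0.0))
# Radius 1: the solution is on the boundary, with lam = 1.
print(satisfies_conditions(B, g, 1.0, np.array([1.0, 0.0]), 1.0))
# Radius 1: (4.8a) holds for this pair, but complementarity (4.8b) fails.
print(satisfies_conditions(B, g, 1.0, np.array([0.5, 0.0]), 3.0))
```

In the boundary case, $\lambda p^* = (1, 0)^T = -\nabla m(p^*)$, so the solution is collinear with the negative gradient of the model, as noted above.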

In Section 4.1, we describe two strategies for finding approximate solutions of the subproblem (4.3), which achieve at least as much reduction in $m_k$ as the reduction achieved by the so-called Cauchy point. This point is simply the minimizer of $m_k$ along the steepest descent direction $-g_k$, subject to the trust-region bound. The first approximate strategy is the dogleg method, which is appropriate when the model Hessian $B_k$ is positive definite. The second strategy, known as two-dimensional subspace minimization, can be applied when $B_k$ is indefinite, though it requires an estimate of the most negative eigenvalue of this matrix.

A third strategy, described in Section 7.1, uses an approach based on the conjugate gradient method to minimize $m_k$, and can therefore be applied when $B$ is large and sparse.

Section 4.3 is devoted to a strategy in which an iterative method is used to identify the value of $\lambda$ for which (4.6) is satisfied by the solution of the subproblem. We prove global convergence results in Section 4.2. Section 4.4 discusses the trust-region Newton method, in which the Hessian $B_k$ of the model function is equal to the Hessian $\nabla^2 f(x_k)$ of the objective function. The key result of this section is that, when the trust-region Newton algorithm converges to a point $x^*$ satisfying second-order sufficient conditions, it converges superlinearly.

4.1 ALGORITHMS BASED ON THE CAUCHY POINT