OUTLINE OF THE TRUST-REGION APPROACH

Trust-Region Methods

$O(\|p\|^2)$, which is small when $p$ is small.

When $B_k$ is equal to the true Hessian $\nabla^2 f(x_k)$, the approximation error in the model function $m_k$ is $O(\|p\|^3)$, so this model is especially accurate when $\|p\|$ is small. This choice $B_k = \nabla^2 f(x_k)$ leads to the trust-region Newton method, and will be discussed further in Section 4.4. In other sections of this chapter, we emphasize the generality of the trust-region approach by assuming little about $B_k$ except symmetry and uniform boundedness.
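The two error orders quoted above can be traced to Taylor's theorem. The following sketch assumes that $f$ is twice continuously differentiable with a Lipschitz continuous Hessian near $x_k$, and uses the shorthand $f_k = f(x_k)$ and $g_k = \nabla f(x_k)$; these assumptions are not stated in this passage.

```latex
\begin{align*}
f(x_k + p) &= f_k + g_k^T p + \tfrac{1}{2} p^T \nabla^2 f(x_k)\, p + O(\|p\|^3), \\
m_k(p)     &= f_k + g_k^T p + \tfrac{1}{2} p^T B_k\, p, \\
m_k(p) - f(x_k + p) &= \tfrac{1}{2} p^T \bigl( B_k - \nabla^2 f(x_k) \bigr) p + O(\|p\|^3).
\end{align*}
```

The first term on the last line is $O(\|p\|^2)$ for a general bounded $B_k$ and vanishes when $B_k = \nabla^2 f(x_k)$, leaving only the $O(\|p\|^3)$ remainder.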

To obtain each step, we seek a solution of the subproblem

$$\min_{p \in \mathbb{R}^n} \; m_k(p) = f_k + g_k^T p + \tfrac{1}{2} p^T B_k p \quad \text{s.t. } \|p\| \le \Delta_k, \tag{4.3}$$

where $\Delta_k > 0$ is the trust-region radius. In most of our discussions, we define $\|\cdot\|$ to be the Euclidean norm, so that the solution $p_k^*$ of (4.3) is the minimizer of $m_k$ in the ball of radius $\Delta_k$. Thus, the trust-region approach requires us to solve a sequence of subproblems (4.3) in which the objective function and constraint (which can be written as $p^T p \le \Delta_k^2$) are both quadratic. When $B_k$ is positive definite and $\|B_k^{-1} g_k\| \le \Delta_k$, the solution of (4.3) is easy to identify: it is simply the unconstrained minimum $p_k^B = -B_k^{-1} g_k$ of the quadratic $m_k(p)$. In this case, we call $p_k^B$ the full step. The solution of (4.3) is not so obvious in other cases, but it can usually be found without too much computational expense. In any case, as described below, we need only an approximate solution to obtain convergence and good practical behavior.
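The easy case described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `full_step_if_feasible` is hypothetical, $B_k$ is assumed symmetric, and a Cholesky factorization is used here as one convenient test for positive definiteness.

```python
import numpy as np

def full_step_if_feasible(g, B, delta):
    """Return the full step p^B = -B^{-1} g when B is positive definite and
    ||p^B|| <= delta; return None otherwise. B is assumed symmetric."""
    try:
        L = np.linalg.cholesky(B)  # raises LinAlgError if B is not positive definite
    except np.linalg.LinAlgError:
        return None
    # Solve B p = -g using the factorization B = L L^T.
    y = np.linalg.solve(L, -g)
    p = np.linalg.solve(L.T, y)
    return p if np.linalg.norm(p) <= delta else None

g = np.array([1.0, 2.0])
B = np.array([[2.0, 0.0], [0.0, 4.0]])
print(full_step_if_feasible(g, B, delta=1.0))  # full step (-0.5, -0.5), norm about 0.707
print(full_step_if_feasible(g, B, delta=0.5))  # None: the full step lies outside the region
```

When the function returns `None`, the subproblem has to be solved (exactly or approximately) by one of the other strategies discussed in this chapter.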

One of the key ingredients in a trust-region algorithm is the strategy for choosing the trust-region radius $\Delta_k$ at each iteration. We base this choice on the agreement between the model function $m_k$ and the objective function $f$ at previous iterations. Given a step $p_k$ we define the ratio

$$\rho_k = \frac{f(x_k) - f(x_k + p_k)}{m_k(0) - m_k(p_k)}; \tag{4.4}$$

the numerator is called the actual reduction, and the denominator is the predicted reduction (that is, the reduction in $f$ predicted by the model function). Note that since the step $p_k$ is obtained by minimizing the model $m_k$ over a region that includes $p = 0$, the predicted reduction will always be nonnegative. Hence, if $\rho_k$ is negative, the new objective value $f(x_k + p_k)$ is greater than the current value $f(x_k)$, so the step must be rejected. On the other hand, if $\rho_k$ is close to 1, there is good agreement between the model $m_k$ and the function $f$ over this step, so it is safe to expand the trust region for the next iteration. If $\rho_k$ is positive but significantly smaller than 1, we do not alter the trust region, but if it is close to zero or negative, we shrink the trust region by reducing $\Delta_k$ at the next iteration.
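As a small worked illustration of the ratio (4.4), the sketch below uses a one-dimensional example that is not from the text: $f(x) = e^x$ at $x = 0$, its second-order Taylor model $m(p) = 1 + p + \tfrac{1}{2}p^2$, and a radius of $0.5$, so that the model minimizer within the region is $p = -0.5$. The helper name `reduction_ratio` is hypothetical.

```python
import math

def reduction_ratio(f, model, x, p):
    """rho = (f(x) - f(x + p)) / (m(0) - m(p)), following (4.4)."""
    actual = f(x) - f(x + p)            # actual reduction in f
    predicted = model(0.0) - model(p)   # reduction predicted by the model
    return actual / predicted

f = math.exp
model = lambda p: 1.0 + p + 0.5 * p * p  # second-order Taylor model of exp at x = 0

rho = reduction_ratio(f, model, x=0.0, p=-0.5)
print(round(rho, 3))  # 1.049
```

Here the actual reduction is about $0.393$ and the predicted reduction is $0.375$, so the ratio is close to 1 and the step reaches the boundary of the region; under the rules described above, this is a case in which the radius would be expanded.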

The following algorithm describes the process.

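Algorithm 4.1 (Trust Region).

Given $\hat{\Delta} > 0$, $\Delta_0 \in (0, \hat{\Delta})$, and $\eta \in \left[0, \tfrac{1}{4}\right)$:

for $k = 0, 1, 2, \ldots$
    Obtain $p_k$ by (approximately) solving (4.3);
    Evaluate $\rho_k$ from (4.4);
    if $\rho_k < \tfrac{1}{4}$
        $\Delta_{k+1} = \tfrac{1}{4} \Delta_k$
    else
        if $\rho_k > \tfrac{3}{4}$ and $\|p_k\| = \Delta_k$
            $\Delta_{k+1} = \min(2\Delta_k, \hat{\Delta})$
        else
            $\Delta_{k+1} = \Delta_k$;
    if $\rho_k > \eta$
        $x_{k+1} = x_k + p_k$
    else
        $x_{k+1} = x_k$;
end (for).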

Here $\hat{\Delta}$ is an overall bound on the step lengths. Note that the radius is increased only if $\|p_k\|$ actually reaches the boundary of the trust region. If the step stays strictly inside the region, we infer that the current value of $\Delta_k$ is not interfering with the progress of the algorithm, so we leave its value unchanged for the next iteration.
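A minimal Python sketch of Algorithm 4.1 may help make the radius and acceptance rules concrete. Several pieces are assumptions rather than part of the algorithm as stated: the names `trust_region` and `cauchy_point`, the default parameter values, the gradient-norm stopping test, and the choice of the Cauchy point (using its standard closed form, which the book develops in Section 4.1) as the approximate subproblem solver. The test function is likewise chosen only for illustration.

```python
import numpy as np

def cauchy_point(g, B, delta):
    """Minimizer of the quadratic model along -g, restricted to ||p|| <= delta."""
    gnorm = np.linalg.norm(g)
    gBg = g @ B @ g
    tau = 1.0 if gBg <= 0 else min(gnorm**3 / (delta * gBg), 1.0)
    return -tau * (delta / gnorm) * g

def trust_region(f, grad, hess, x0, delta_hat=2.0, delta0=1.0, eta=0.1,
                 tol=1e-6, max_iter=1000):
    x, delta = np.asarray(x0, dtype=float), delta0
    for _ in range(max_iter):
        g, B = grad(x), hess(x)
        if np.linalg.norm(g) < tol:            # stopping test (not in Algorithm 4.1)
            break
        p = cauchy_point(g, B, delta)          # approximate solution of (4.3)
        predicted = -(g @ p + 0.5 * p @ B @ p)  # m_k(0) - m_k(p_k)
        rho = (f(x) - f(x + p)) / predicted     # ratio (4.4)
        if rho < 0.25:
            delta = 0.25 * delta
        elif rho > 0.75 and np.isclose(np.linalg.norm(p), delta):
            delta = min(2.0 * delta, delta_hat)
        if rho > eta:                          # accept the step; otherwise keep x
            x = x + p
    return x

# Illustrative convex test function with minimizer at the origin.
f = lambda x: 0.5 * (x[0]**2 + 10.0 * x[1]**2) + 0.25 * x[0]**4
grad = lambda x: np.array([x[0] + x[0]**3, 10.0 * x[1]])
hess = lambda x: np.array([[1.0 + 3.0 * x[0]**2, 0.0], [0.0, 10.0]])
print(trust_region(f, grad, hess, [2.0, 1.0]))  # a point close to (0, 0)
```

Because the Cauchy point only moves along the steepest descent direction, this sketch converges slowly on ill-conditioned problems; the loop itself does not depend on that choice, and any routine returning an approximate solution of (4.3) could be substituted for `cauchy_point`.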

To turn Algorithm 4.1 into a practical algorithm, we need to focus on solving the trust-region subproblem (4.3). In discussing this matter, we sometimes drop the iteration subscript k and restate the problem (4.3) as follows:

$$\min_{p \in \mathbb{R}^n} \; m(p) \stackrel{\text{def}}{=} f + g^T p + \tfrac{1}{2} p^T B p \quad \text{s.t. } \|p\| \le \Delta. \tag{4.5}$$

A first step to characterizing exact solutions of (4.5) is given by the following theorem (due to Moré and Sorensen [214]), which shows that the solution $p^*$ of (4.5) satisfies

$$(B + \lambda I) p^* = -g \tag{4.6}$$

for some $\lambda \ge 0$.

Theorem 4.1.

The vector $p^*$ is a global solution of the trust-region problem

$$\min_{p \in \mathbb{R}^n} \; m(p) = f + g^T p + \tfrac{1}{2} p^T B p, \quad \text{s.t. } \|p\| \le \Delta, \tag{4.7}$$

if and only if $p^*$ is feasible and there is a scalar $\lambda \ge 0$ such that the following conditions are satisfied:

$$(B + \lambda I) p^* = -g, \tag{4.8a}$$

$$\lambda (\Delta - \|p^*\|) = 0, \tag{4.8b}$$

$$(B + \lambda I) \text{ is positive semidefinite.} \tag{4.8c}$$

We delay the proof of this result until Section 4.3, and instead discuss just its key features here with the help of Figure 4.2. The condition (4.8b) is a complementarity condition that states that at least one of the nonnegative quantities $\lambda$ and $(\Delta - \|p^*\|)$ must be zero.

Hence, when the solution lies strictly inside the trust region (as it does when $\Delta = \Delta_1$ in Figure 4.2), we must have $\lambda = 0$ and so $Bp^* = -g$ with $B$ positive semidefinite, from (4.8a) and (4.8c), respectively. In the other cases $\Delta = \Delta_2$ and $\Delta = \Delta_3$, we have $\|p^*\| = \Delta$, and so $\lambda$ is allowed to take a positive value. Note from (4.8a) that

$$\lambda p^* = -Bp^* - g = -\nabla m(p^*).$$

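Figure 4.2 Solution of trust-region subproblem for different radii $\Delta_1$, $\Delta_2$, $\Delta_3$. (The figure shows contours of the model $m$ and the corresponding solutions $p^{*1}$, $p^{*2}$, $p^{*3}$.)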

Thus, when $\lambda > 0$, the solution $p^*$ is collinear with the negative gradient of $m$ and normal to its contours. These properties can be seen in Figure 4.2.
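The conditions of Theorem 4.1 are straightforward to test numerically for a candidate pair $(p, \lambda)$. The sketch below is illustrative only: the function name `satisfies_theorem_4_1`, the tolerance, and both small examples are assumptions introduced here, and $B$ is assumed symmetric.

```python
import numpy as np

def satisfies_theorem_4_1(p, lam, g, B, delta, tol=1e-10):
    """Check feasibility and conditions (4.8a)-(4.8c) for a candidate (p, lam)."""
    shifted = B + lam * np.eye(len(g))
    pnorm = np.linalg.norm(p)
    feasible = pnorm <= delta + tol
    stationarity = np.allclose(shifted @ p, -g, atol=tol)              # (4.8a)
    complementarity = lam >= 0 and abs(lam * (delta - pnorm)) <= tol   # (4.8b)
    psd = np.linalg.eigvalsh(shifted).min() >= -tol                    # (4.8c)
    return bool(feasible and stationarity and complementarity and psd)

# Boundary solution: B positive definite, but the full step is too long.
B1, g1 = np.diag([1.0, 2.0]), np.array([-1.2, -2.4])
print(satisfies_theorem_4_1(np.array([0.6, 0.8]), 1.0, g1, B1, delta=1.0))   # True

# Interior stationary point of an indefinite model: (4.8c) fails.
B2, g2 = np.diag([-1.0, 2.0]), np.array([0.1, 0.2])
print(satisfies_theorem_4_1(np.array([0.1, -0.1]), 0.0, g2, B2, delta=1.0))  # False
```

In the first example the unconstrained minimizer $(1.2, 1.2)$ lies outside the unit ball, and $p^* = (0.6, 0.8)$ with $\lambda = 1$ satisfies all three conditions. In the second, $p = (0.1, -0.1)$ satisfies $Bp = -g$ inside the region, but it is a saddle point of the model, which condition (4.8c) detects.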

In Section 4.1, we describe two strategies for finding approximate solutions of the subproblem (4.3), which achieve at least as much reduction in $m_k$ as the reduction achieved by the so-called Cauchy point. This point is simply the minimizer of $m_k$ along the steepest descent direction $-g_k$, subject to the trust-region bound. The first approximate strategy is the dogleg method, which is appropriate when the model Hessian $B_k$ is positive definite. The second strategy, known as two-dimensional subspace minimization, can be applied when $B_k$ is indefinite, though it requires an estimate of the most negative eigenvalue of this matrix.

A third strategy, described in Section 7.1, uses an approach based on the conjugate gradient method to minimize $m_k$, and can therefore be applied when $B_k$ is large and sparse.

Section 4.3 is devoted to a strategy in which an iterative method is used to identify the value of $\lambda$ for which (4.6) is satisfied by the solution of the subproblem. We prove global convergence results in Section 4.2. Section 4.4 discusses the trust-region Newton method, in which the Hessian $B_k$ of the model function is equal to the Hessian $\nabla^2 f(x_k)$ of the objective function. The key result of this section is that, when the trust-region Newton algorithm converges to a point $x^*$ satisfying second-order sufficient conditions, it converges superlinearly.

4.1 ALGORITHMS BASED ON THE CAUCHY POINT

In document: Numerical Optimization, Second Edition, Jorge Nocedal and Stephen J. Wright (pages 87-90)