TRUST REGIONS IN OTHER NORMS

Trust-Region Methods

TRUST REGIONS IN OTHER NORMS

Trust regions may also be defined in terms of norms other than the Euclidean norm.

For instance, we may have

$$\|p\|_1 \le \Delta_k \quad \text{or} \quad \|p\|_\infty \le \Delta_k,$$

or their scaled counterparts

$$\|Dp\|_1 \le \Delta_k \quad \text{or} \quad \|Dp\|_\infty \le \Delta_k,$$

where $D$ is a positive diagonal matrix as before. Norms such as these offer no obvious advantages for small and medium-sized unconstrained problems, but they may be useful for constrained problems. For instance, for the bound-constrained problem

$$\min_{x \in \mathbb{R}^n} f(x) \quad \text{subject to} \quad x \ge 0,$$

the trust-region subproblem may take the form

$$\min_{p \in \mathbb{R}^n} m_k(p) = f_k + g_k^T p + \tfrac{1}{2} p^T B_k p \quad \text{s.t.} \quad x_k + p \ge 0, \quad \|p\| \le \Delta_k. \qquad (4.62)$$

When the trust region is defined by a Euclidean norm, the feasible region for (4.62) consists of the intersection of a ball and the nonnegative orthant, an awkward object, geometrically speaking. When the $\infty$-norm is used, however, the feasible region is simply the rectangular box defined by

$$x_k + p \ge 0, \quad p \ge -\Delta_k e, \quad p \le \Delta_k e,$$

where $e = (1, 1, \ldots, 1)^T$, so the solution of the subproblem is easily calculated by using techniques for bound-constrained quadratic programming.
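Under the $\infty$-norm the constraints of (4.62) reduce to per-coordinate bounds $\max(-\Delta_k, -x_{k,i}) \le p_i \le \Delta_k$. The sketch below builds those bounds and, purely to keep the example short, assumes $B_k$ is diagonal with positive entries; the box-constrained quadratic then separates, and each coordinate is its one-dimensional minimizer clipped to its interval. The function names are illustrative, and a general $B_k$ needs a genuine bound-constrained QP solver.

```python
from typing import List, Tuple

Vector = List[float]


def box_bounds(x: Vector, delta: float) -> Tuple[Vector, Vector]:
    """Bounds on p implied by x + p >= 0 and ||p||_inf <= delta.

    Assumes x is feasible (x >= 0), so every interval contains p_i = 0.
    """
    lower = [max(-delta, -xi) for xi in x]
    upper = [delta for _ in x]
    return lower, upper


def solve_diag_box_qp(g: Vector, b: Vector, lower: Vector, upper: Vector) -> Vector:
    """Minimize g^T p + 0.5 p^T diag(b) p subject to lower <= p <= upper.

    Assumes every b[i] > 0: the objective separates by coordinate, so each
    p_i is the 1-D minimizer -g_i / b_i clipped to [lower_i, upper_i].
    """
    return [min(max(-gi / bi, lo), hi) for gi, bi, lo, hi in zip(g, b, lower, upper)]


if __name__ == "__main__":
    lower, upper = box_bounds([0.0, 2.0], delta=1.0)
    # First coordinate stops at x + p >= 0, second at the box bound p <= delta.
    print(solve_diag_box_qp([1.0, -3.0], [1.0, 1.0], lower, upper))
```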

For large problems, in which factorization or formation of the model Hessian $B_k$ is not computationally desirable, the use of a trust region defined by $\|\cdot\|_\infty$ will also give rise to a bound-constrained subproblem, which may be more convenient to solve than the standard subproblem (4.3). To our knowledge, there has not been much research on the relative performance of methods that use trust regions of different shapes on large problems.
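To make the large-problem remark concrete, here is a minimal sketch that solves the box subproblem while touching $B_k$ only through Hessian-vector products, so $B_k$ is never formed or factorized. It assumes a callable `hess_vec(v)` returning $B_k v$ and a known upper bound `lipschitz` on $\|B_k\|_2$ (both hypothetical names). Projected gradient is our illustrative choice, not a method the text prescribes; for indefinite $B_k$ it reaches only a stationary point of the box-constrained model.

```python
from typing import Callable, List

Vector = List[float]


def projected_gradient_box_qp(
    g: Vector,
    hess_vec: Callable[[Vector], Vector],
    lower: Vector,
    upper: Vector,
    lipschitz: float,
    iters: int = 500,
) -> Vector:
    """Approximately minimize g^T p + 0.5 p^T B p over lower <= p <= upper.

    B is used only via hess_vec(v) = B v. With step 1/L, where L bounds
    ||B||_2, every projected step does not increase the model value.
    """
    def clip(v: Vector) -> Vector:
        return [min(max(vi, lo), hi) for vi, lo, hi in zip(v, lower, upper)]

    step = 1.0 / lipschitz
    p = clip([0.0] * len(g))  # feasible start (lower <= 0 <= upper for (4.62))
    for _ in range(iters):
        grad = [gi + bpi for gi, bpi in zip(g, hess_vec(p))]  # model gradient g + B p
        p = clip([pi - step * di for pi, di in zip(p, grad)])
    return p
```

With a diagonal positive definite $B$ the result should agree with simply clipping $-g_i/b_i$ to the box, which is a convenient check.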

NOTES AND REFERENCES

One of the earliest works on trust-region methods is Winfield [307]. The influential paper of Powell [244] proves a result like Theorem 4.5 for the case of $\eta = 0$, where the algorithm takes a step whenever it decreases the function value. Powell uses a weaker assumption than ours on the matrices $B$, but his analysis is more complicated. Moré [211] summarizes developments in algorithms and software before 1982, paying particular attention to the importance of using a scaled trust-region norm.

Byrd, Schnabel, and Schultz [279], [54] provide a general theory for inexact trust-region methods; they introduce the idea of two-dimensional subspace minimization and also focus on proper handling of the case of indefinite B to ensure stronger local convergence results than Theorems 4.5 and 4.6. Dennis and Schnabel [93] survey trust-region methods as part of their overview of unconstrained optimization, providing pointers to many important developments in the literature.

The monograph of Conn, Gould, and Toint [74] is an exhaustive treatment of the state of the art in trust-region methods for both unconstrained and constrained optimization. It includes a comprehensive annotated bibliography of the literature in the area.

EXERCISES

4.1 Let $f(x) = 10(x_2 - x_1^2)^2 + (1 - x_1)^2$. At $x = (0, -1)$, draw the contour lines of the quadratic model (4.2), assuming that $B$ is the Hessian of $f$. Draw the family of solutions of (4.3) as the trust-region radius varies from $\Delta = 0$ to $\Delta = 2$. Repeat this at $x = (0, 0.5)$.
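For Exercise 4.1 the data of the model are easy to compute by hand; the sketch below does it in code. At both points $x_1 = 0$, so the Hessian is diagonal ($\operatorname{diag}(42, 20)$ at $(0,-1)$ and the indefinite $\operatorname{diag}(-18, 20)$ at $(0, 0.5)$), which lets us trace the solution of (4.3) by bisection on $\lambda$ in $p(\lambda) = -(B + \lambda I)^{-1} g$. The helper `tr_solution_diag` is hypothetical and relies on that diagonal structure and on the hard case not occurring.

```python
import math
from typing import List

Vector = List[float]


def f(x: Vector) -> float:
    return 10.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2


def grad(x: Vector) -> Vector:
    return [-40.0 * x[0] * (x[1] - x[0] ** 2) - 2.0 * (1.0 - x[0]),
            20.0 * (x[1] - x[0] ** 2)]


def hess(x: Vector) -> List[Vector]:
    return [[120.0 * x[0] ** 2 - 40.0 * x[1] + 2.0, -40.0 * x[0]],
            [-40.0 * x[0], 20.0]]


def tr_solution_diag(g: Vector, b: Vector, delta: float, iters: int = 200) -> Vector:
    """Solve min g^T p + 0.5 p^T diag(b) p s.t. ||p||_2 <= delta.

    Bisection on lam in p(lam) = -(B + lam I)^{-1} g. Assumes B = diag(b) and
    that g has a nonzero entry where b is most negative (no hard case).
    """
    def p_of(lam: float) -> Vector:
        return [-gi / (bi + lam) for gi, bi in zip(g, b)]

    def norm(v: Vector) -> float:
        return math.sqrt(sum(vi * vi for vi in v))

    lo = max(0.0, -min(b))
    if min(b) > 0.0 and norm(p_of(0.0)) <= delta:
        return p_of(0.0)  # Newton step already lies inside the trust region
    hi = lo + 1.0
    while norm(p_of(hi)) > delta:  # enlarge hi until the step fits
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if norm(p_of(mid)) > delta:
            lo = mid
        else:
            hi = mid
    return p_of(hi)


if __name__ == "__main__":
    for x in ([0.0, -1.0], [0.0, 0.5]):
        g, H = grad(x), hess(x)
        b = [H[0][0], H[1][1]]  # off-diagonal entry -40 x_1 is zero here
        for delta in (0.25, 0.5, 1.0, 1.5, 2.0):
            print(x, delta, tr_solution_diag(g, b, delta))
```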

4.2 Write a program that implements the dogleg method. Choose $B_k$ to be the exact Hessian. Apply it to solve Rosenbrock's function (2.22). Experiment with the update rule for the trust region by changing the constants in Algorithm 4.1, or by designing your own rules.
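A starting point for Exercise 4.2, written as a sketch under several assumptions. We take (2.22) to be the standard Rosenbrock function $100(x_2 - x_1^2)^2 + (1 - x_1)^2$. The radius update uses the familiar constants $1/4$, $3/4$ and a doubling factor, together with $\eta = 0.15$ and $\hat\Delta = 10$; treat all of these as illustrative values to be varied, as the exercise suggests. Because the dogleg step needs $B_k$ positive definite and the exact Hessian need not be, the sketch falls back to the Cauchy point in that case; that fallback is our choice, not something the exercise prescribes.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def rosenbrock(x: Vector) -> float:
    return 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2


def rosen_grad(x: Vector) -> Vector:
    return [-400.0 * x[0] * (x[1] - x[0] ** 2) - 2.0 * (1.0 - x[0]),
            200.0 * (x[1] - x[0] ** 2)]


def rosen_hess(x: Vector) -> Matrix:
    return [[1200.0 * x[0] ** 2 - 400.0 * x[1] + 2.0, -400.0 * x[0]],
            [-400.0 * x[0], 200.0]]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(B: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in B]


def norm(v: Vector) -> float:
    return math.sqrt(dot(v, v))


def dogleg_step(g: Vector, B: Matrix, delta: float) -> Vector:
    """Dogleg step for a 2x2 model; Cauchy point if B is not positive definite."""
    gnorm, gBg = norm(g), dot(g, matvec(B, g))
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    if B[0][0] <= 0.0 or det <= 0.0:
        tau = 1.0 if gBg <= 0.0 else min(gnorm ** 3 / (delta * gBg), 1.0)
        return [-tau * delta * gi / gnorm for gi in g]
    p_b = [-(B[1][1] * g[0] - B[0][1] * g[1]) / det,  # full step -B^{-1} g
           -(B[0][0] * g[1] - B[1][0] * g[0]) / det]
    if norm(p_b) <= delta:
        return p_b
    p_u = [-(dot(g, g) / gBg) * gi for gi in g]  # minimizer of the model along -g
    if norm(p_u) >= delta:
        return [-delta * gi / gnorm for gi in g]
    d = [pb - pu for pb, pu in zip(p_b, p_u)]
    a, b, c = dot(d, d), 2.0 * dot(p_u, d), dot(p_u, p_u) - delta ** 2
    s = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)  # second leg meets boundary
    return [pu + s * di for pu, di in zip(p_u, d)]


def trust_region(x: Vector, delta: float = 1.0, delta_max: float = 10.0,
                 eta: float = 0.15, tol: float = 1e-8,
                 max_iter: int = 1000) -> Tuple[Vector, int]:
    """Basic trust-region loop with dogleg steps; returns (x, iterations)."""
    for k in range(max_iter):
        g = rosen_grad(x)
        if norm(g) < tol:
            return x, k
        B = rosen_hess(x)
        p = dogleg_step(g, B, delta)
        predicted = -(dot(g, p) + 0.5 * dot(p, matvec(B, p)))
        if predicted <= 0.0:  # model cannot decrease further (numerical corner case)
            return x, k
        x_new = [xi + pi for xi, pi in zip(x, p)]
        rho = (rosenbrock(x) - rosenbrock(x_new)) / predicted
        if rho < 0.25:
            delta *= 0.25
        elif rho > 0.75 and math.isclose(norm(p), delta, rel_tol=1e-9):
            delta = min(2.0 * delta, delta_max)
        if rho > eta:
            x = x_new
    return x, max_iter


if __name__ == "__main__":
    print(trust_region([-1.2, 1.0]))
```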

4.3 Program the trust-region method based on Algorithm 7.2. Choose $B_k$ to be the exact Hessian, and use it to minimize the function

$$\min f(x) = \sum_{i=1}^{n} \left[ (1 - x_{2i-1})^2 + 10\,(x_{2i} - x_{2i-1}^2)^2 \right]$$

with $n = 10$. Experiment with the starting point and the stopping test for the CG iteration.
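For experimenting with Exercise 4.3, here is a sketch of the objective, its gradient, and a Hessian-vector product, which is the only access to the Hessian a CG-based inner solver needs. We read the sum as running over $n$ pairs $(x_{2i-1}, x_{2i})$, so $x$ has $2n$ entries; that reading and the function names are our assumptions. Indices in the code are 0-based, so `x[i], x[i + 1]` with even `i` plays the role of $(x_{2i-1}, x_{2i})$.

```python
from typing import List

Vector = List[float]


def ext_rosen(x: Vector) -> float:
    """Sum over pairs (a, b) of (1 - a)^2 + 10 (b - a^2)^2."""
    assert len(x) % 2 == 0, "x must have an even number of entries"
    total = 0.0
    for i in range(0, len(x), 2):
        a, b = x[i], x[i + 1]
        total += (1.0 - a) ** 2 + 10.0 * (b - a * a) ** 2
    return total


def ext_rosen_grad(x: Vector) -> Vector:
    g = [0.0] * len(x)
    for i in range(0, len(x), 2):
        a, b = x[i], x[i + 1]
        g[i] = -2.0 * (1.0 - a) - 40.0 * a * (b - a * a)
        g[i + 1] = 20.0 * (b - a * a)
    return g


def ext_rosen_hessvec(x: Vector, v: Vector) -> Vector:
    """Exact Hessian times v; the Hessian is block diagonal with 2x2 blocks."""
    hv = [0.0] * len(x)
    for i in range(0, len(x), 2):
        a, b = x[i], x[i + 1]
        h_aa = 2.0 - 40.0 * b + 120.0 * a * a
        h_ab = -40.0 * a
        hv[i] = h_aa * v[i] + h_ab * v[i + 1]
        hv[i + 1] = h_ab * v[i] + 20.0 * v[i + 1]
    return hv
```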

Repeat the computation with $n = 50$.

Your program should indicate, at every iteration, whether Algorithm 7.2 encountered negative curvature, reached the trust-region boundary, or met the stopping test.
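A sketch of the inner solver for Exercise 4.3, assuming Algorithm 7.2 is the CG-Steihaug iteration: conjugate gradients on $B_k p = -g_k$ from $p = 0$, stopped on negative curvature, on reaching the trust-region boundary, or on a small residual. The returned status string is exactly what the exercise asks the program to report. The parameter names `hess_vec` and `tol` are hypothetical; pass a Hessian-vector product for the objective and your chosen stopping tolerance.

```python
import math
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def _dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def _boundary_taus(z: Vector, d: Vector, delta: float) -> Tuple[float, float]:
    """Roots of ||z + tau d|| = delta; z is strictly inside, so one is negative, one positive."""
    a, b, c = _dot(d, d), 2.0 * _dot(z, d), _dot(z, z) - delta ** 2
    disc = math.sqrt(b * b - 4.0 * a * c)
    return (-b - disc) / (2.0 * a), (-b + disc) / (2.0 * a)


def cg_steihaug(g: Vector, hess_vec: Callable[[Vector], Vector], delta: float,
                tol: float, max_iter: Optional[int] = None) -> Tuple[Vector, str]:
    """Approximately minimize g^T p + 0.5 p^T B p subject to ||p|| <= delta.

    Returns (p, status) with status "converged", "boundary",
    "negative curvature" or "max iterations".
    """
    n = len(g)
    if max_iter is None:
        max_iter = 2 * n  # n steps suffice in exact arithmetic; allow slack
    z = [0.0] * n
    r = list(g)  # residual B z + g at z = 0
    d = [-ri for ri in r]
    if math.sqrt(_dot(r, r)) < tol:
        return z, "converged"
    for _ in range(max_iter):
        Bd = hess_vec(d)
        dBd = _dot(d, Bd)
        if dBd <= 0.0:
            # Negative curvature: take the boundary point with the lower model value.
            def model(p: Vector) -> float:
                return _dot(g, p) + 0.5 * _dot(p, hess_vec(p))
            candidates = [[zi + t * di for zi, di in zip(z, d)]
                          for t in _boundary_taus(z, d, delta)]
            return min(candidates, key=model), "negative curvature"
        rr = _dot(r, r)
        alpha = rr / dBd
        z_next = [zi + alpha * di for zi, di in zip(z, d)]
        if math.sqrt(_dot(z_next, z_next)) >= delta:
            tau = _boundary_taus(z, d, delta)[1]  # positive root
            return [zi + tau * di for zi, di in zip(z, d)], "boundary"
        r = [ri + alpha * bdi for ri, bdi in zip(r, Bd)]
        z = z_next
        if math.sqrt(_dot(r, r)) < tol:
            return z, "converged"
        beta = _dot(r, r) / rr
        d = [-ri + beta * di for ri, di in zip(r, d)]
    return z, "max iterations"
```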

4.4 Theorem 4.5 shows that the sequence $\{g_k\}$ has an accumulation point at zero.

Show that if the iterates $x_k$ stay in a bounded set $\mathcal{B}$, then there is a limit point $x_\infty$ of the sequence $\{x_k\}$ such that $g(x_\infty) = 0$.

4.5 Show that $\tau_k$ defined by (4.12) does indeed identify the minimizer of $m_k$ along the direction $-g_k$.
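A numerical companion to Exercise 4.5, taking (4.12) to be the usual Cauchy-point rule: $\tau_k = 1$ if $g_k^T B_k g_k \le 0$, and $\tau_k = \min\left(\|g_k\|^3 / (\Delta_k\, g_k^T B_k g_k),\, 1\right)$ otherwise, with $p_k^C = -\tau_k \Delta_k g_k / \|g_k\|$. Comparing the model value at $p_k^C$ against a fine grid along $-g_k$ inside the trust region is a sanity check, not a proof.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def cauchy_point(g: Vector, B: Matrix, delta: float) -> Tuple[Vector, float]:
    """Return (p_C, tau) with p_C = -tau * delta * g / ||g||."""
    gnorm = math.sqrt(sum(gi * gi for gi in g))
    Bg = [sum(bij * gj for bij, gj in zip(row, g)) for row in B]
    gBg = sum(gi * bgi for gi, bgi in zip(g, Bg))
    tau = 1.0 if gBg <= 0.0 else min(gnorm ** 3 / (delta * gBg), 1.0)
    return [-tau * delta * gi / gnorm for gi in g], tau


def model(g: Vector, B: Matrix, p: Vector) -> float:
    """m(p) - f = g^T p + 0.5 p^T B p."""
    Bp = [sum(bij * pj for bij, pj in zip(row, p)) for row in B]
    return sum(gi * pi for gi, pi in zip(g, p)) + 0.5 * sum(pi * bpi for pi, bpi in zip(p, Bp))
```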

4.6 The Cauchy–Schwarz inequality states that for any vectors $u$ and $v$, we have

$$|u^T v|^2 \le (u^T u)(v^T v),$$

with equality only when $u$ and $v$ are parallel. When $B$ is positive definite, use this inequality to show that

$$\gamma \stackrel{\text{def}}{=} \frac{\|g\|^4}{(g^T B g)(g^T B^{-1} g)} \le 1,$$

with equality only if $g$ and $Bg$ (and $B^{-1}g$) are parallel.
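For a diagonal positive definite $B$ the quantity $\gamma$ is cheap to evaluate, which makes the inequality easy to probe numerically. The helper below is an illustration under that diagonal assumption only. Equality holds, for instance, when $g$ is a coordinate vector or when $g$ is supported on coordinates sharing the same diagonal entry, since $Bg$ is then parallel to $g$.

```python
from typing import List


def gamma(g: List[float], b: List[float]) -> float:
    """gamma = ||g||^4 / ((g^T B g)(g^T B^{-1} g)) for B = diag(b), all b_i > 0."""
    gg = sum(gi * gi for gi in g)
    gBg = sum(bi * gi * gi for gi, bi in zip(g, b))
    gBinvg = sum(gi * gi / bi for gi, bi in zip(g, b))
    return gg ** 2 / (gBg * gBinvg)
```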

4.7 When B is positive definite, the double-dogleg method constructs a path with three line segments from the origin to the full step. The four points that define the path are

• the origin;

• the unconstrained Cauchy step $p^C = -\dfrac{g^T g}{g^T B g}\, g$;

• a fraction of the full step $\bar{\gamma} p^B = -\bar{\gamma} B^{-1} g$, for some $\bar{\gamma} \in (\gamma, 1]$, where $\gamma$ is defined in the previous question; and

• the full step $p^B = -B^{-1} g$.

Show that $\|p\|$ increases monotonically along this path.

(Note: The double-dogleg method, as discussed in Dennis and Schnabel [92, Section 6.4.2], was for some time thought to be superior to the standard dogleg method, but later testing has not shown much difference in performance.)
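A numerical check of the claim in Exercise 4.7, assuming a diagonal positive definite $B$ only so that $B^{-1}g$ is trivial to form. The default $\bar\gamma = (\gamma + 1)/2$ is a hypothetical choice inside $(\gamma, 1]$. Sampling the piecewise-linear path and checking that $\|p\|$ never decreases illustrates the result but does not prove it.

```python
import math
from typing import List, Optional

Vector = List[float]


def double_dogleg_points(g: Vector, b: Vector,
                         gamma_bar: Optional[float] = None) -> List[Vector]:
    """Vertices 0, p_C, gamma_bar * p_B, p_B of the double-dogleg path, B = diag(b) > 0."""
    gg = sum(gi * gi for gi in g)
    gBg = sum(bi * gi * gi for gi, bi in zip(g, b))
    gBinvg = sum(gi * gi / bi for gi, bi in zip(g, b))
    gamma = gg ** 2 / (gBg * gBinvg)
    if gamma_bar is None:
        gamma_bar = 0.5 * (gamma + 1.0)  # hypothetical choice in (gamma, 1]
    p_c = [-(gg / gBg) * gi for gi in g]
    p_b = [-gi / bi for gi, bi in zip(g, b)]
    return [[0.0] * len(g), p_c, [gamma_bar * x for x in p_b], p_b]


def norms_along_path(points: List[Vector], samples: int = 200) -> List[float]:
    """||p|| at evenly spaced points along each segment of the path."""
    out = []
    for start, end in zip(points[:-1], points[1:]):
        for k in range(samples):
            t = k / samples
            q = [s + t * (e - s) for s, e in zip(start, end)]
            out.append(math.sqrt(sum(qi * qi for qi in q)))
    out.append(math.sqrt(sum(x * x for x in points[-1])))
    return out
```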

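4.8 Show that (4.43) and (4.44) are equivalent. Hints: Note that

$$\frac{d}{d\lambda}\|p(\lambda)\| = \frac{1}{2}\left(\|p(\lambda)\|^2\right)^{-1/2} \frac{d}{d\lambda}\|p(\lambda)\|^2, \qquad \frac{d}{d\lambda}\|p(\lambda)\|^2 = -2\sum_{j=1}^{n} \frac{(q_j^T g)^2}{(\lambda_j + \lambda)^3}$$

(from (4.39)), and

$$\|q\|^2 = \|R^{-T} p\|^2 = p^T (B + \lambda I)^{-1} p = \sum_{j=1}^{n} \frac{(q_j^T g)^2}{(\lambda_j + \lambda)^3}.$$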
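A numerical check of the hint identities, for diagonal $B$ so that the eigenvectors $q_j$ are coordinate vectors, $q_j^T g = g_j$, and $R = \operatorname{diag}(\sqrt{b_j + \lambda})$. We assume (4.39) is the expansion $\|p(\lambda)\|^2 = \sum_j (q_j^T g)^2 / (\lambda_j + \lambda)^2$. Combining the hints, Newton's method applied to $1/\Delta - 1/\|p(\lambda)\|$ gives the update $\lambda + (\|p\|/\|q\|)^2 (\|p\| - \Delta)/\Delta$; the sketch compares that closed form with a finite-difference Newton step. Function names are illustrative.

```python
import math
from typing import List

Vector = List[float]


def p_of(lam: float, g: Vector, b: Vector) -> Vector:
    """p(lam) = -(B + lam I)^{-1} g for B = diag(b)."""
    return [-gi / (bi + lam) for gi, bi in zip(g, b)]


def p_sq(lam: float, g: Vector, b: Vector) -> float:
    return sum(pi * pi for pi in p_of(lam, g, b))


def dp_sq_formula(lam: float, g: Vector, b: Vector) -> float:
    """Hint: d/dlam ||p||^2 = -2 * sum (q_j^T g)^2 / (lambda_j + lam)^3."""
    return -2.0 * sum(gi * gi / (bi + lam) ** 3 for gi, bi in zip(g, b))


def q_sq(lam: float, g: Vector, b: Vector) -> float:
    """||q||^2 with q = R^{-T} p, where B + lam I = R^T R and R = diag(sqrt(b_j + lam))."""
    q = [pi / math.sqrt(bi + lam) for pi, bi in zip(p_of(lam, g, b), b)]
    return sum(qi * qi for qi in q)


def q_sq_formula(lam: float, g: Vector, b: Vector) -> float:
    """Hint: ||q||^2 = sum (q_j^T g)^2 / (lambda_j + lam)^3."""
    return sum(gi * gi / (bi + lam) ** 3 for gi, bi in zip(g, b))


def newton_update(lam: float, g: Vector, b: Vector, delta: float) -> float:
    """Closed-form Newton step for phi(lam) = 1/delta - 1/||p(lam)||."""
    pn = math.sqrt(p_sq(lam, g, b))
    return lam + (pn * pn / q_sq(lam, g, b)) * (pn - delta) / delta
```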

4.9 Derive the solution of the two-dimensional subspace minimization problem in the case where B is positive definite.

4.10 Show that if $B$ is any symmetric matrix, then there exists $\lambda \ge 0$ such that $B + \lambda I$ is positive definite.

4.11 Verify that the definitions (4.60) for $p_k^S$ and (4.61) for $\tau_k$ are valid for the Cauchy point in the case of an elliptical trust region. (Hint: Using the theory of Chapter 12, we can show that the solution of (4.58) satisfies $g_k + \alpha D^2 p_k^S = 0$ for some scalar $\alpha \ge 0$.)

4.12 The following example shows that the reduction in the model function m achieved by the two-dimensional minimization strategy can be much smaller than that achieved by the exact solution of (4.5).

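In (4.5), set

$$g = \left(-\frac{1}{\epsilon},\; -1,\; -\epsilon^2\right)^T,$$

where $\epsilon$ is a small positive number. Set

$$B = \operatorname{diag}\left(\frac{1}{\epsilon^3},\; 1,\; \epsilon^3\right), \qquad \Delta = 0.5.$$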

Show that the solution of (4.5) has components

$$\left(O(\epsilon),\; \tfrac{1}{2} + O(\epsilon),\; O(\epsilon)\right)^T$$

and that the reduction in the model $m$ is $\tfrac{3}{8} + O(\epsilon)$. For the two-dimensional minimization strategy, show that the solution is a multiple of $B^{-1}g$ and that the reduction in $m$ is $O(\epsilon)$.
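To see the gap numerically, here is a sketch that takes (4.5) to be $\min g^T p + \tfrac{1}{2} p^T B p$ subject to $\|p\| \le \Delta$. It solves that problem exactly by bisection on $\lambda$ (valid here because $B$ is diagonal and positive definite) and separately minimizes over $\operatorname{span}\{g, B^{-1}g\}$ by reducing to a $2 \times 2$ problem in an orthonormal basis. The helper names and the value $\epsilon = 0.01$ are illustrative choices; for this $\epsilon$ the exact reduction should land near $3/8$, while the subspace reduction is much smaller.

```python
import math
from typing import Callable, List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def norm(v: Vector) -> float:
    return math.sqrt(dot(v, v))


def model(g: Vector, b: Vector, p: Vector) -> float:
    """m(p) - f = g^T p + 0.5 p^T diag(b) p."""
    return dot(g, p) + 0.5 * sum(bi * pi * pi for bi, pi in zip(b, p))


def bisect_radius(step: Callable[[float], Vector], delta: float, iters: int = 300) -> Vector:
    """Find lam >= 0 with ||step(lam)|| = delta, assuming ||step(lam)|| decreases in lam."""
    if norm(step(0.0)) <= delta:
        return step(0.0)
    lo, hi = 0.0, 1.0
    while norm(step(hi)) > delta:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if norm(step(mid)) > delta:
            lo = mid
        else:
            hi = mid
    return step(hi)


def exact_step(g: Vector, b: Vector, delta: float) -> Vector:
    """Exact trust-region step for B = diag(b) > 0."""
    return bisect_radius(lambda lam: [-gi / (bi + lam) for gi, bi in zip(g, b)], delta)


def subspace_step(g: Vector, b: Vector, delta: float) -> Vector:
    """Minimize the model over span{g, B^{-1} g} within the trust region, B = diag(b) > 0."""
    v1 = [gi / norm(g) for gi in g]
    w = [gi / bi for gi, bi in zip(g, b)]  # B^{-1} g
    c = dot(w, v1)
    w_perp = [wi - c * x for wi, x in zip(w, v1)]  # Gram-Schmidt
    v2 = [x / norm(w_perp) for x in w_perp]
    basis = [v1, v2]
    g2 = [dot(g, v) for v in basis]  # reduced gradient
    B2 = [[sum(bk * vi[k] * vj[k] for k, bk in enumerate(b)) for vj in basis] for vi in basis]

    def reduced(lam: float) -> Vector:
        # Solve (B2 + lam I) y = -g2 with the explicit 2x2 inverse.
        a11, a12, a22 = B2[0][0] + lam, B2[0][1], B2[1][1] + lam
        det = a11 * a22 - a12 * a12
        return [-(a22 * g2[0] - a12 * g2[1]) / det, -(a11 * g2[1] - a12 * g2[0]) / det]

    y = bisect_radius(reduced, delta)  # orthonormal basis, so ||p|| = ||y||
    return [y[0] * x1 + y[1] * x2 for x1, x2 in zip(v1, v2)]


if __name__ == "__main__":
    eps = 0.01
    g = [-1.0 / eps, -1.0, -eps ** 2]
    b = [1.0 / eps ** 3, 1.0, eps ** 3]
    print("exact reduction:   ", -model(g, b, exact_step(g, b, 0.5)))
    print("subspace reduction:", -model(g, b, subspace_step(g, b, 0.5)))
```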

CHAPTER 5

Conjugate