
Quasi-Newton Methods


6.3 THE BROYDEN CLASS

So far, we have described the BFGS, DFP, and SR1 quasi-Newton updating formulae, but there are many others. Of particular interest is the Broyden class, a family of updates specified by the following general formula:

$$
B_{k+1} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{y_k y_k^T}{y_k^T s_k} + \phi_k (s_k^T B_k s_k)\, v_k v_k^T, \tag{6.32}
$$

where $\phi_k$ is a scalar parameter and

$$
v_k = \left[ \frac{y_k}{y_k^T s_k} - \frac{B_k s_k}{s_k^T B_k s_k} \right]. \tag{6.33}
$$

The BFGS and DFP methods are members of the Broyden class: we recover BFGS by setting $\phi_k = 0$ and DFP by setting $\phi_k = 1$ in (6.32). We can therefore rewrite (6.32) as a "linear combination" of these two methods, that is,

$$
B_{k+1} = (1 - \phi_k)\, B_{k+1}^{\rm BFGS} + \phi_k\, B_{k+1}^{\rm DFP}.
$$

This relationship indicates that all members of the Broyden class satisfy the secant equation (6.6), since the BFGS and DFP matrices themselves satisfy this equation. Also, since BFGS and DFP updating preserve positive definiteness of the Hessian approximations when $s_k^T y_k > 0$, this relation implies that the same property will hold for the Broyden family if $0 \le \phi_k \le 1$.
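The claims in this paragraph can be checked numerically. The following is a minimal sketch, not a reference implementation: the function names `broyden_update`, `dfp_update`, and `random_spd` are hypothetical, and the test data are random matrices chosen only so that $s_k^T y_k > 0$. It compares the $\phi_k = 1$ member of (6.32) with an independently coded DFP update, and checks the linear-combination identity, the secant equation, and positive definiteness for a few values of $\phi_k$ in $[0, 1]$.

```python
import numpy as np

def broyden_update(B, s, y, phi):
    """One Broyden-class update of B, following (6.32) and (6.33)."""
    Bs = B @ s
    sBs = s @ Bs                  # s_k^T B_k s_k
    ys = y @ s                    # y_k^T s_k, assumed positive
    v = y / ys - Bs / sBs         # v_k from (6.33)
    return (B - np.outer(Bs, Bs) / sBs
              + np.outer(y, y) / ys
              + phi * sBs * np.outer(v, v))

def dfp_update(B, s, y):
    """DFP update in its usual product form, coded independently of (6.32)."""
    rho = 1.0 / (y @ s)
    V = np.eye(len(s)) - rho * np.outer(y, s)
    return V @ B @ V.T + rho * np.outer(y, y)

def random_spd(rng, n):
    """Hypothetical helper: a random symmetric positive definite matrix."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

rng = np.random.default_rng(0)
n = 5
B = random_spd(rng, n)            # current approximation B_k
s = rng.standard_normal(n)
y = random_spd(rng, n) @ s        # y = A s for an SPD A, so s^T y > 0

B_bfgs = broyden_update(B, s, y, 0.0)
B_dfp = broyden_update(B, s, y, 1.0)

for phi in (0.0, 0.3, 1.0):
    B_new = broyden_update(B, s, y, phi)
    combo = (1 - phi) * B_bfgs + phi * B_dfp
    print(phi,
          np.allclose(B_new, combo),               # linear combination
          np.allclose(B_new @ s, y),               # secant equation
          np.linalg.eigvalsh(B_new).min() > 0)     # positive definiteness
```

The secant equation holds for every $\phi_k$ because $v_k^T s_k = 0$, so the last term of (6.32) contributes nothing to $B_{k+1} s_k$.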

Much attention has been given to the so-called restricted Broyden class, which is obtained by restricting $\phi_k$ to the interval $[0, 1]$. It enjoys the following property when applied to quadratic functions. Since the analysis is independent of the step length, we assume for simplicity that each iteration has the form

$$
p_k = -B_k^{-1} \nabla f_k, \qquad x_{k+1} = x_k + p_k. \tag{6.34}
$$

Theorem 6.3.

Suppose that $f : \mathbb{R}^n \to \mathbb{R}$ is the strongly convex quadratic function $f(x) = b^T x + \frac{1}{2} x^T A x$, where $A$ is symmetric and positive definite. Let $x_0$ be any starting point for the iteration (6.34) and $B_0$ be any symmetric positive definite starting matrix, and suppose that the matrices $B_k$ are updated by the Broyden formula (6.32) with $\phi_k \in [0, 1]$. Define $\lambda_1^k \le \lambda_2^k \le \cdots \le \lambda_n^k$ to be the eigenvalues of the matrix

$$
A^{1/2} B_k^{-1} A^{1/2}. \tag{6.35}
$$

Then for all $k$, we have

$$
\min\{\lambda_i^k, 1\} \le \lambda_i^{k+1} \le \max\{\lambda_i^k, 1\}, \qquad i = 1, 2, \ldots, n. \tag{6.36}
$$

Moreover, the property (6.36) does not hold if the Broyden parameter $\phi_k$ is chosen outside the interval $[0, 1]$.

Let us discuss the significance of this result. If the eigenvalues $\lambda_i^k$ of the matrix (6.35) are all 1, then the quasi-Newton approximation $B_k$ is identical to the Hessian $A$ of the quadratic objective function. This situation is the ideal one, so we should be hoping for these eigenvalues to be as close to 1 as possible. In fact, relation (6.36) tells us that the eigenvalues $\{\lambda_i^k\}$ converge monotonically (but not strictly monotonically) to 1. Suppose, for example, that at iteration $k$ the smallest eigenvalue is $\lambda_1^k = 0.7$. Then (6.36) tells us that at the next iteration $\lambda_1^{k+1} \in [0.7, 1]$. We cannot be sure that this eigenvalue has actually moved closer to 1, but it is reasonable to expect that it has. In contrast, the first eigenvalue can become smaller than 0.7 if we allow $\phi_k$ to be outside $[0, 1]$. Significantly, the result of Theorem 6.3 holds even if the line searches are not exact.
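To make the bound (6.36) concrete, the sketch below runs the unit-step iteration (6.34) on a small random quadratic and tracks the eigenvalues of (6.35). It is an illustration under assumed data only: the spectrum chosen for $A$, the random choice of $\phi_k$ in $[0, 1]$, the tolerances, and the names `broyden_update` and `scaled_eigs` are all hypothetical and not part of the text.

```python
import numpy as np

def broyden_update(B, s, y, phi):
    """One Broyden-class update of B, following (6.32) and (6.33)."""
    Bs = B @ s
    sBs, ys = s @ Bs, y @ s
    v = y / ys - Bs / sBs
    return (B - np.outer(Bs, Bs) / sBs + np.outer(y, y) / ys
              + phi * sBs * np.outer(v, v))

rng = np.random.default_rng(1)
n = 4
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal basis
d = np.array([0.4, 0.9, 2.0, 3.5])                  # assumed spectrum of A
A = Q @ np.diag(d) @ Q.T
A_half = Q @ np.diag(np.sqrt(d)) @ Q.T              # A^{1/2}
b = rng.standard_normal(n)

def scaled_eigs(B):
    """Ascending eigenvalues of A^{1/2} B^{-1} A^{1/2}, the matrix (6.35)."""
    C = A_half @ np.linalg.solve(B, A_half)
    return np.linalg.eigvalsh((C + C.T) / 2)        # symmetrize against rounding

x = rng.standard_normal(n)
B = np.eye(n)
history = [scaled_eigs(B)]
bounds_hold = True
for k in range(6):
    g = b + A @ x                       # gradient of b^T x + 0.5 x^T A x
    s = -np.linalg.solve(B, g)          # unit step, as in (6.34)
    if np.linalg.norm(s) < 1e-8:        # stop once the step is negligible
        break
    y = A @ s                           # gradient difference for a quadratic
    phi = rng.uniform(0.0, 1.0)         # any value in the restricted class
    B = broyden_update(B, s, y, phi)
    lam_old, lam_new = history[-1], scaled_eigs(B)
    lower = np.minimum(lam_old, 1.0) - 1e-7
    upper = np.maximum(lam_old, 1.0) + 1e-7
    bounds_hold = bounds_hold and bool(np.all(lower <= lam_new)
                                       and np.all(lam_new <= upper))
    history.append(lam_new)
    x = x + s

print(bounds_hold)
for lam in history:
    print(np.round(lam, 4))
```

Printing `history` row by row shows each eigenvalue staying between its previous value and 1, which is the behavior the theorem describes; no line search is used anywhere in the loop.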

Although Theorem 6.3 seems to suggest that the best update formulas belong to the restricted Broyden class, the situation is not at all clear. Some analysis and computational testing suggest that algorithms that allow $\phi_k$ to be negative (in a strictly controlled manner) may in fact be superior to the BFGS method. The SR1 formula is a case in point: It is a member of the Broyden class, obtained by setting

$$
\phi_k = \frac{s_k^T y_k}{s_k^T y_k - s_k^T B_k s_k},
$$

but it does not belong to the restricted Broyden class, because this value of $\phi_k$ may fall outside the interval $[0, 1]$.
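The SR1 membership claim can be verified directly. The sketch below, with hypothetical function names and random test data, codes the SR1 update in its usual rank-one form and compares it with (6.32) evaluated at the value of $\phi_k$ given above. When $s_k^T y_k$ and $s_k^T B_k s_k$ are both positive, that ratio is either negative or greater than 1, which the example also reports.

```python
import numpy as np

def broyden_update(B, s, y, phi):
    """One Broyden-class update of B, following (6.32) and (6.33)."""
    Bs = B @ s
    sBs, ys = s @ Bs, y @ s
    v = y / ys - Bs / sBs
    return (B - np.outer(Bs, Bs) / sBs + np.outer(y, y) / ys
              + phi * sBs * np.outer(v, v))

def sr1_update(B, s, y):
    """Symmetric rank-one update: B + r r^T / (r^T s) with r = y - B s."""
    r = y - B @ s
    return B + np.outer(r, r) / (r @ s)

def random_spd(rng, n):
    """Hypothetical helper: a random symmetric positive definite matrix."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

rng = np.random.default_rng(2)
n = 5
B = random_spd(rng, n)
s = rng.standard_normal(n)
y = random_spd(rng, n) @ s                 # guarantees s^T y > 0

sy = s @ y
sBs = s @ B @ s
phi_sr1 = sy / (sy - sBs)                  # Broyden parameter that yields SR1

same_matrix = np.allclose(broyden_update(B, s, y, phi_sr1), sr1_update(B, s, y))
outside_unit_interval = (phi_sr1 < 0.0) or (phi_sr1 > 1.0)
print(phi_sr1, same_matrix, outside_unit_interval)
```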

In the remaining discussion of this section, we determine more precisely the range of values of $\phi_k$ that preserve positive definiteness.

The last term in (6.32) is a rank-one correction, which by the interlacing eigenvalue theorem (Theorem A.1) increases the eigenvalues of the matrix when $\phi_k$ is positive. Therefore $B_{k+1}$ is positive definite for all $\phi_k \ge 0$. On the other hand, by Theorem A.1 the last term in (6.32) decreases the eigenvalues of the matrix when $\phi_k$ is negative. As we decrease $\phi_k$, this matrix eventually becomes singular and then indefinite. A little computation shows that $B_{k+1}$ is singular when $\phi_k$ has the value

$$
\phi_k^c = \frac{1}{1 - \mu_k}, \tag{6.37}
$$

where

$$
\mu_k = \frac{(y_k^T B_k^{-1} y_k)(s_k^T B_k s_k)}{(y_k^T s_k)^2}. \tag{6.38}
$$

By applying the Cauchy–Schwarz inequality (A.5) to (6.38), we see that $\mu_k \ge 1$ and therefore $\phi_k^c \le 0$. Hence, if the initial Hessian approximation $B_0$ is symmetric and positive definite, and if $s_k^T y_k > 0$ and $\phi_k > \phi_k^c$ for each $k$, then all the matrices $B_k$ generated by Broyden's formula (6.32) remain symmetric and positive definite.
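A short numerical experiment illustrates the role of the critical value. This is a sketch on assumed random data, with hypothetical helper names: it computes $\mu_k$ from (6.38) and $\phi_k^c$ from (6.37), then inspects the extreme eigenvalues of $B_{k+1}$ at $\phi_k^c$, slightly above it, and slightly below it. The example needs $n \ge 2$ and data for which $\mu_k > 1$ strictly, since (6.37) is undefined when $\mu_k = 1$.

```python
import numpy as np

def broyden_update(B, s, y, phi):
    """One Broyden-class update of B, following (6.32) and (6.33)."""
    Bs = B @ s
    sBs, ys = s @ Bs, y @ s
    v = y / ys - Bs / sBs
    return (B - np.outer(Bs, Bs) / sBs + np.outer(y, y) / ys
              + phi * sBs * np.outer(v, v))

def random_spd(rng, n):
    """Hypothetical helper: a random symmetric positive definite matrix."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

rng = np.random.default_rng(3)
n = 5
B = random_spd(rng, n)
s = rng.standard_normal(n)
y = random_spd(rng, n) @ s                 # guarantees s^T y > 0

# mu_k from (6.38) and the critical parameter phi_k^c from (6.37)
mu = (y @ np.linalg.solve(B, y)) * (s @ B @ s) / (y @ s) ** 2
phi_c = 1.0 / (1.0 - mu)

eig_at_c = np.linalg.eigvalsh(broyden_update(B, s, y, phi_c))
eig_above = np.linalg.eigvalsh(broyden_update(B, s, y, 0.9 * phi_c))  # phi > phi_c
eig_below = np.linalg.eigvalsh(broyden_update(B, s, y, 1.1 * phi_c))  # phi < phi_c

print("mu =", mu, " phi_c =", phi_c)
print("smallest |eigenvalue| at phi_c:", np.abs(eig_at_c).min())
print("smallest eigenvalue above phi_c:", eig_above.min())
print("smallest eigenvalue below phi_c:", eig_below.min())
```

Because $\phi_k^c$ is negative, multiplying it by 0.9 moves the parameter toward zero (inside the safe range) and multiplying by 1.1 moves it past the critical value, where one eigenvalue changes sign.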

When the line search is exact, all methods in the Broyden class with $\phi_k \ge \phi_k^c$ generate the same sequence of iterates. This result applies to general nonlinear functions and is based on the observation that when all the line searches are exact, the directions generated by Broyden-class methods differ only in their lengths. The line searches identify the same minima along the chosen search direction, though the values of the step lengths may differ because of the different scaling.

The Broyden class has several remarkable properties when applied with exact line searches to quadratic functions. We state some of these properties in the next theorem, whose proof is omitted.

Theorem 6.4.

Suppose that a method in the Broyden class is applied to the strongly convex quadratic function $f(x) = b^T x + \frac{1}{2} x^T A x$, where $x_0$ is the starting point and $B_0$ is any symmetric positive definite matrix. Assume that $\alpha_k$ is the exact step length and that $\phi_k \ge \phi_k^c$ for all $k$, where $\phi_k^c$ is defined by (6.37). Then the following statements are true.

(i) The iterates are independent of $\phi_k$ and converge to the solution in at most $n$ iterations.

(ii) The secant equation is satisfied for all previous search directions, that is,

$$
B_k s_j = y_j, \qquad j = k - 1, k - 2, \ldots, 1.
$$

(iii) If the starting matrix is $B_0 = I$, then the iterates are identical to those generated by the conjugate gradient method (see Chapter 5). In particular, the search directions are conjugate, that is,

$$
s_i^T A s_j = 0, \qquad \text{for } i \ne j.
$$

(iv) If $n$ iterations are performed, we have $B_n = A$.

Note that parts (i), (ii), and (iv) of this result echo the statement and proof of Theorem 6.1, where similar results were derived for the SR1 update formula.
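The statements of Theorem 6.4 can be observed on a small example. The sketch below is illustrative only: the quadratic is random, the step length uses the closed-form exact minimizer along $p_k$ for a quadratic, the three values of $\phi_k$ are arbitrary choices in $[0, 1]$ (which all satisfy $\phi_k \ge \phi_k^c$ because $\phi_k^c \le 0$), and the names `broyden_update` and `run_broyden` are hypothetical.

```python
import numpy as np

def broyden_update(B, s, y, phi):
    """One Broyden-class update of B, following (6.32) and (6.33)."""
    Bs = B @ s
    sBs, ys = s @ Bs, y @ s
    v = y / ys - Bs / sBs
    return (B - np.outer(Bs, Bs) / sBs + np.outer(y, y) / ys
              + phi * sBs * np.outer(v, v))

def run_broyden(A, b, x0, phi):
    """n iterations with exact line search on f(x) = b^T x + 0.5 x^T A x, B_0 = I."""
    n = len(b)
    x, B = x0.copy(), np.eye(n)
    iterates, steps = [x.copy()], []
    for _ in range(n):
        g = b + A @ x
        p = -np.linalg.solve(B, g)
        alpha = -(g @ p) / (p @ A @ p)     # exact minimizer of f along p
        s = alpha * p
        y = A @ s                          # gradient difference for a quadratic
        B = broyden_update(B, s, y, phi)
        x = x + s
        iterates.append(x.copy())
        steps.append(s)
    return np.array(iterates), np.array(steps), B

rng = np.random.default_rng(4)
n = 4
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag([0.5, 1.3, 2.2, 4.0]) @ Q.T     # assumed SPD Hessian
b = rng.standard_normal(n)
x0 = rng.standard_normal(n)
x_star = np.linalg.solve(A, -b)                  # minimizer: A x = -b

results = {phi: run_broyden(A, b, x0, phi) for phi in (0.0, 0.5, 1.0)}
ref_iterates = results[0.0][0]

for phi, (iterates, steps, B_n) in results.items():
    G = steps @ A @ steps.T                      # Gram matrix s_i^T A s_j
    off_diag = G - np.diag(np.diag(G))
    print(phi,
          np.allclose(iterates, ref_iterates, atol=1e-8),   # (i) same iterates
          np.allclose(iterates[-1], x_star, atol=1e-8),     # (i) n-step termination
          np.allclose(B_n, A, atol=1e-6),                   # (iv) B_n = A
          np.abs(off_diag).max() < 1e-8)                    # (iii) conjugacy
```

The intermediate matrices $B_k$ differ from one value of $\phi_k$ to another; only the iterates and the final matrix $B_n$ agree.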

We can generalize Theorem 6.4 slightly: It continues to hold if the Hessian approximations remain nonsingular but not necessarily positive definite. (Hence, we could allow $\phi_k$ to be smaller than $\phi_k^c$, provided that the chosen value did not produce a singular updated matrix.) We can also generalize point (iii) as follows. If the starting matrix $B_0$ is not the identity matrix, then the Broyden-class method is identical to the preconditioned conjugate gradient method that uses $B_0$ as preconditioner.

We conclude by commenting that results like Theorem 6.4 would appear to be of mainly theoretical interest, since the inexact line searches used in practical implementations of Broyden-class methods (and all other quasi-Newton methods) cause their performance to differ markedly. Nevertheless, it is worth noting that this type of analysis guided much of the development of quasi-Newton methods.