THE BROYDEN CLASS - PROPERTIES OF SR1 UPDATING

Quasi-Newton Methods


6.3 THE BROYDEN CLASS

So far, we have described the BFGS, DFP, and SR1 quasi-Newton updating formulae, but there are many others. Of particular interest is the Broyden class, a family of updates specified by the following general formula:

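$$B_{k+1} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{y_k y_k^T}{y_k^T s_k} + \phi_k (s_k^T B_k s_k) v_k v_k^T, \qquad (6.32)$$

where $\phi_k$ is a scalar parameter and

$$v_k = \frac{y_k}{y_k^T s_k} - \frac{B_k s_k}{s_k^T B_k s_k}. \qquad (6.33)$$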
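As a minimal sketch, assuming dense NumPy arrays (the text itself prescribes no implementation), the general update (6.32) with the vector (6.33) can be coded directly. The helper name `broyden_update` and the random test data are hypothetical. The usage lines check the secant equation $B_{k+1} s_k = y_k$, which holds for every $\phi_k$ because $v_k^T s_k = 1 - 1 = 0$, so the last term of (6.32) annihilates $s_k$.

```python
import numpy as np


def broyden_update(B, s, y, phi):
    """Broyden-class update (6.32) of a symmetric matrix B = B_k.

    s, y : step s_k and gradient difference y_k
    phi  : scalar Broyden parameter phi_k (0 gives BFGS, 1 gives DFP)
    """
    Bs = B @ s
    sBs = s @ Bs                    # s_k^T B_k s_k
    ys = y @ s                      # y_k^T s_k
    v = y / ys - Bs / sBs           # v_k from (6.33)
    return (B
            - np.outer(Bs, Bs) / sBs
            + np.outer(y, y) / ys
            + phi * sBs * np.outer(v, v))


# Usage on random data (illustrative only).
rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
B = M @ M.T + n * np.eye(n)         # symmetric positive definite B_k
s = rng.standard_normal(n)
y = rng.standard_normal(n)
if y @ s < 0:
    y = -y                          # enforce the curvature condition s_k^T y_k > 0
for phi in (-0.3, 0.0, 0.5, 1.0, 2.0):
    B_next = broyden_update(B, s, y, phi)
    print(phi, np.allclose(B_next @ s, y), np.allclose(B_next, B_next.T))
```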

The BFGS and DFP methods are members of the Broyden class: we recover BFGS by setting $\phi_k = 0$ and DFP by setting $\phi_k = 1$ in (6.32). We can therefore rewrite (6.32) as a "linear combination" of these two methods, that is,

$$B_{k+1} = (1 - \phi_k) B_{k+1}^{\mathrm{BFGS}} + \phi_k B_{k+1}^{\mathrm{DFP}}.$$

This relationship indicates that all members of the Broyden class satisfy the secant equation (6.6), since the BFGS and DFP matrices themselves satisfy this equation. Also, since BFGS and DFP updating preserve positive definiteness of the Hessian approximations when $s_k^T y_k > 0$, this relation implies that the same property will hold for the Broyden family if $0 \le \phi_k \le 1$.
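The linear-combination identity can be checked numerically. The sketch below assumes NumPy and uses hypothetical helper names; it codes the standard BFGS and DFP formulas for $B_{k+1}$ separately and compares their combination with (6.32). Because (6.32) is affine in $\phi_k$, the identity holds for every $\phi_k$, whereas the convexity argument in the paragraph above guarantees positive definiteness only for $\phi_k \in [0, 1]$.

```python
import numpy as np


def broyden_update(B, s, y, phi):
    # General Broyden-class formula (6.32)-(6.33)
    Bs = B @ s
    sBs, ys = s @ Bs, y @ s
    v = y / ys - Bs / sBs
    return B - np.outer(Bs, Bs) / sBs + np.outer(y, y) / ys + phi * sBs * np.outer(v, v)


def bfgs_update(B, s, y):
    # BFGS update of the Hessian approximation
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)


def dfp_update(B, s, y):
    # DFP update: (I - rho y s^T) B (I - rho s y^T) + rho y y^T, with rho = 1 / (y^T s)
    rho = 1.0 / (y @ s)
    E = np.eye(len(s)) - rho * np.outer(y, s)
    return E @ B @ E.T + rho * np.outer(y, y)


rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n))
B = M @ M.T + np.eye(n)             # symmetric positive definite B_k
s = rng.standard_normal(n)
y = rng.standard_normal(n)
if y @ s < 0:
    y = -y                          # ensure s_k^T y_k > 0
for phi in (0.0, 0.25, 0.5, 0.75, 1.0):
    combo = (1 - phi) * bfgs_update(B, s, y) + phi * dfp_update(B, s, y)
    B_next = broyden_update(B, s, y, phi)
    print(phi, np.allclose(B_next, combo), np.linalg.eigvalsh(B_next).min() > 0)
```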

Much attention has been given to the so-called restricted Broyden class, which is obtained by restricting $\phi_k$ to the interval $[0, 1]$. It enjoys the following property when applied to quadratic functions. Since the analysis is independent of the step length, we assume for simplicity that each iteration has the form

$$p_k = -B_k^{-1} \nabla f_k, \qquad x_{k+1} = x_k + p_k. \qquad (6.34)$$

Theorem 6.3.

Suppose that $f : \mathbb{R}^n \to \mathbb{R}$ is the strongly convex quadratic function $f(x) = b^T x + \frac{1}{2} x^T A x$, where $A$ is symmetric and positive definite. Let $x_0$ be any starting point for the iteration (6.34) and $B_0$ be any symmetric positive definite starting matrix, and suppose that the matrices $B_k$ are updated by the Broyden formula (6.32) with $\phi_k \in [0, 1]$. Define $\lambda_1^k \le \lambda_2^k \le \cdots \le \lambda_n^k$ to be the eigenvalues of the matrix

$$A^{1/2} B_k^{-1} A^{1/2}. \qquad (6.35)$$

Then for all $k$, we have

$$\min\{\lambda_i^k, 1\} \le \lambda_i^{k+1} \le \max\{\lambda_i^k, 1\}, \qquad i = 1, 2, \ldots, n. \qquad (6.36)$$

Moreover, the property (6.36) does not hold if the Broyden parameter $\phi_k$ is chosen outside the interval $[0, 1]$.
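Theorem 6.3 can be probed (not proved) with a small experiment. The sketch below assumes NumPy; the quadratic, the starting point, the choice $B_0 = I$, the tolerance `tol`, and the helper names are all hypothetical choices for illustration. It runs the unit-step iteration (6.34), recomputes the sorted eigenvalues of (6.35) after every update, and checks the bounds (6.36) up to floating-point tolerance. The Hessian is chosen with eigenvalues on both sides of 1 so that both halves of the bound are exercised.

```python
import numpy as np


def broyden_update(B, s, y, phi):
    # Broyden-class formula (6.32)-(6.33)
    Bs = B @ s
    sBs, ys = s @ Bs, y @ s
    v = y / ys - Bs / sBs
    return B - np.outer(Bs, Bs) / sBs + np.outer(y, y) / ys + phi * sBs * np.outer(v, v)


def scaled_eigs(A_half, B):
    # Ascending eigenvalues of A^{1/2} B^{-1} A^{1/2}, the matrix (6.35)
    M = A_half @ np.linalg.solve(B, A_half)
    return np.linalg.eigvalsh((M + M.T) / 2)


def check_theorem_6_3(A, b, x0, phi, iters=6, tol=1e-8):
    """Run iteration (6.34) on f(x) = b^T x + 0.5 x^T A x with B_0 = I and
    report whether the eigenvalue bounds (6.36) held after every update."""
    w, Q = np.linalg.eigh(A)
    A_half = Q @ np.diag(np.sqrt(w)) @ Q.T
    B = np.eye(len(b))
    x = x0.astype(float)
    lam = scaled_eigs(A_half, B)
    ok = True
    for _ in range(iters):
        g = A @ x + b                         # gradient of the quadratic
        if np.linalg.norm(g) < 1e-12:         # already at the minimizer
            break
        s = -np.linalg.solve(B, g)            # p_k, taken with unit step length
        x = x + s
        B = broyden_update(B, s, A @ s, phi)  # y_k = A s_k for a quadratic
        lam_next = scaled_eigs(A_half, B)
        lower = np.minimum(lam, 1.0) - tol
        upper = np.maximum(lam, 1.0) + tol
        ok = ok and bool(np.all((lower <= lam_next) & (lam_next <= upper)))
        lam = lam_next
    return ok


rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = Q @ np.diag([0.2, 0.5, 3.0, 8.0]) @ Q.T   # eigenvalues on both sides of 1
A = (A + A.T) / 2
b = rng.standard_normal(4)
x0 = rng.standard_normal(4)
for phi in (0.0, 0.5, 1.0):
    print(phi, check_theorem_6_3(A, b, x0, phi))
```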

Let us discuss the significance of this result. If the eigenvalues $\lambda_i^k$ of the matrix (6.35) are all 1, then the quasi-Newton approximation $B_k$ is identical to the Hessian $A$ of the quadratic objective function. This situation is the ideal one, so we should be hoping for these eigenvalues to be as close to 1 as possible. In fact, relation (6.36) tells us that the eigenvalues $\{\lambda_i^k\}$ move monotonically (but not strictly monotonically) toward 1 and never overshoot it. Suppose, for example, that at iteration $k$ the smallest eigenvalue is $\lambda_1^k = 0.7$. Then (6.36) tells us that at the next iteration $\lambda_1^{k+1} \in [0.7, 1]$. We cannot be sure that this eigenvalue has actually moved closer to 1, but it is reasonable to expect that it has. In contrast, the first eigenvalue can become smaller than 0.7 if we allow $\phi_k$ to be outside $[0, 1]$. Significantly, the result of Theorem 6.3 holds even if the line searches are not exact.

Although Theorem 6.3 seems to suggest that the best update formulas belong to the restricted Broyden class, the situation is not at all clear. Some analysis and computational testing suggest that algorithms that allow $\phi_k$ to be negative (in a strictly controlled manner) may in fact be superior to the BFGS method. The SR1 formula is a case in point: it is a member of the Broyden class, obtained by setting

$$\phi_k = \frac{s_k^T y_k}{s_k^T y_k - s_k^T B_k s_k},$$

but it does not belong to the restricted Broyden class, because this value of $\phi_k$ may fall outside the interval $[0, 1]$.
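A quick numerical check of the SR1 claim, assuming NumPy and hypothetical helper names: plugging the stated $\phi_k$ into (6.32) should reproduce the SR1 update $B_{k+1} = B_k + (y_k - B_k s_k)(y_k - B_k s_k)^T / ((y_k - B_k s_k)^T s_k)$. A short calculation also shows that when $s_k^T y_k > 0$ and $B_k$ is positive definite, this $\phi_k$ is greater than 1 if $s_k^T y_k > s_k^T B_k s_k$ and negative if $s_k^T y_k < s_k^T B_k s_k$, so in that setting it never lies in $[0, 1]$.

```python
import numpy as np


def broyden_update(B, s, y, phi):
    # Broyden-class formula (6.32)-(6.33)
    Bs = B @ s
    sBs, ys = s @ Bs, y @ s
    v = y / ys - Bs / sBs
    return B - np.outer(Bs, Bs) / sBs + np.outer(y, y) / ys + phi * sBs * np.outer(v, v)


def sr1_update(B, s, y):
    # Symmetric rank-one update; assumes (y - B s)^T s is safely nonzero
    r = y - B @ s
    return B + np.outer(r, r) / (r @ s)


def sr1_phi(B, s, y):
    # Broyden parameter phi_k that reproduces SR1
    ys, sBs = s @ y, s @ B @ s
    return ys / (ys - sBs)


rng = np.random.default_rng(3)
n = 4
M = rng.standard_normal((n, n))
B = M @ M.T + np.eye(n)                 # symmetric positive definite B_k
s = rng.standard_normal(n)
for scale in (0.5, 3.0):                # roughly: s^T y below, then above, s^T B s
    y = scale * (B @ s) + 0.01 * rng.standard_normal(n)
    phi = sr1_phi(B, s, y)
    same = np.allclose(broyden_update(B, s, y, phi), sr1_update(B, s, y))
    print(f"phi = {phi:.3f}, matches SR1: {same}")
```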

In the remaining discussion of this section, we determine more precisely the range of values of $\phi_k$ that preserve positive definiteness.

The last term in (6.32) is a rank-one correction, which by the interlacing eigenvalue theorem (Theorem A.1) cannot decrease the eigenvalues of the matrix when $\phi_k$ is positive. Therefore, when $B_k$ is positive definite and $s_k^T y_k > 0$ (so that the BFGS member $\phi_k = 0$ is positive definite), $B_{k+1}$ is positive definite for all $\phi_k \ge 0$. On the other hand, by Theorem A.1 the last term in (6.32) cannot increase the eigenvalues of the matrix when $\phi_k$ is negative. As we decrease $\phi_k$, this matrix eventually becomes singular and then indefinite. A little computation shows that $B_{k+1}$ is singular when $\phi_k$ has the value

$$\phi_k^c = \frac{1}{1 - \mu_k}, \qquad (6.37)$$

where

$$\mu_k = \frac{(y_k^T B_k^{-1} y_k)(s_k^T B_k s_k)}{(y_k^T s_k)^2}. \qquad (6.38)$$

By applying the Cauchy–Schwarz inequality (A.5) to the vectors $B_k^{1/2} s_k$ and $B_k^{-1/2} y_k$ in (6.38), we see that $\mu_k \ge 1$ and therefore $\phi_k^c \le 0$. Hence, if the initial Hessian approximation $B_0$ is symmetric and positive definite, and if $s_k^T y_k > 0$ and $\phi_k > \phi_k^c$ for each $k$, then all the matrices $B_k$ generated by Broyden's formula (6.32) remain symmetric and positive definite.
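The formulas (6.37) and (6.38) are easy to evaluate. The sketch below assumes NumPy, and the helper names and the 3-by-3 data are hypothetical. For $B_k = \mathrm{diag}(1, 2, 3)$, $s_k = e_1$, and $y_k = (1, 1, 0)^T$, hand computation gives $\mu_k = 1.5$, $\phi_k^c = -2$, and $\det B_{k+1} = 3(2 + \phi_k)$, so $B_{k+1}$ should be positive definite for $\phi_k > -2$, singular at $\phi_k = -2$, and indefinite below it.

```python
import numpy as np


def broyden_update(B, s, y, phi):
    # Broyden-class formula (6.32)-(6.33)
    Bs = B @ s
    sBs, ys = s @ Bs, y @ s
    v = y / ys - Bs / sBs
    return B - np.outer(Bs, Bs) / sBs + np.outer(y, y) / ys + phi * sBs * np.outer(v, v)


def critical_phi(B, s, y):
    """Return (phi_c, mu) from (6.37)-(6.38); mu == 1 would make phi_c infinite."""
    ys = y @ s
    mu = (y @ np.linalg.solve(B, y)) * (s @ B @ s) / ys**2   # (6.38)
    return 1.0 / (1.0 - mu), mu                               # (6.37)


B = np.diag([1.0, 2.0, 3.0])
s = np.array([1.0, 0.0, 0.0])
y = np.array([1.0, 1.0, 0.0])
phi_c, mu = critical_phi(B, s, y)          # mu = 1.5, phi_c = -2 for this data
for phi in (phi_c + 0.1, phi_c, phi_c - 0.1):
    lam_min = np.linalg.eigvalsh(broyden_update(B, s, y, phi)).min()
    print(f"phi = {phi:+.2f}: smallest eigenvalue of B_(k+1) = {lam_min:+.4f}")
```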

When the line search is exact, all methods in the Broyden class with $\phi_k > \phi_k^c$ generate the same sequence of iterates. This result applies to general nonlinear functions and is based on the observation that when all the line searches are exact, the directions generated by Broyden-class methods differ only in their lengths. The line searches identify the same minima along the chosen search direction, though the values of the step lengths may differ because of the different scaling.

The Broyden class has several remarkable properties when applied with exact line searches to quadratic functions. We state some of these properties in the next theorem, whose proof is omitted.

Theorem 6.4.

Suppose that a method in the Broyden class is applied to the strongly convex quadratic function $f(x) = b^T x + \frac{1}{2} x^T A x$, where $x_0$ is the starting point and $B_0$ is any symmetric positive definite matrix. Assume that $\alpha_k$ is the exact step length and that $\phi_k > \phi_k^c$ for all $k$, where $\phi_k^c$ is defined by (6.37). Then the following statements are true.

(i) The iterates are independent of $\phi_k$ and converge to the solution in at most $n$ iterations.

(ii) The secant equation is satisfied for all previous search directions, that is,

$$B_k s_j = y_j, \qquad j = k-1, k-2, \ldots, 0.$$

(iii) If the starting matrix is $B_0 = I$, then the iterates are identical to those generated by the conjugate gradient method (see Chapter 5). In particular, the search directions are conjugate, that is,

$$s_i^T A s_j = 0, \qquad \text{for } i \ne j.$$

(iv) If $n$ iterations are performed, we have $B_n = A$.
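Theorem 6.4 can be illustrated (not proved) on a small random quadratic. The sketch below assumes NumPy; the problem data, the choice $B_0 = I$, and the helper names are hypothetical. It performs $n$ exact line-search steps, using the closed-form step length $\alpha_k = -\nabla f_k^T p_k / (p_k^T A p_k)$ that minimizes a quadratic along $p_k$, for three values $\phi_k \ge 0$ (all above $\phi_k^c \le 0$), and then checks statements (i) to (iv) up to rounding error.

```python
import numpy as np


def broyden_update(B, s, y, phi):
    # Broyden-class formula (6.32)-(6.33)
    Bs = B @ s
    sBs, ys = s @ Bs, y @ s
    v = y / ys - Bs / sBs
    return B - np.outer(Bs, Bs) / sBs + np.outer(y, y) / ys + phi * sBs * np.outer(v, v)


def broyden_exact(A, b, x0, B0, phi):
    """n Broyden-class steps with exact line searches on f(x) = b^T x + 0.5 x^T A x.
    Returns the iterates x_0..x_n, the steps s_0..s_{n-1} (as rows), and B_0..B_n."""
    x, B = x0.astype(float), B0.astype(float)
    xs, steps, Bk = [x], [], [B]
    for _ in range(len(b)):
        g = A @ x + b                          # gradient of the quadratic
        p = -np.linalg.solve(B, g)             # quasi-Newton search direction
        alpha = -(g @ p) / (p @ A @ p)         # exact minimizer of f(x + alpha p)
        s = alpha * p
        x = x + s
        B = broyden_update(B, s, A @ s, phi)   # y_k = A s_k on a quadratic
        xs.append(x)
        steps.append(s)
        Bk.append(B)
    return np.array(xs), np.array(steps), Bk


rng = np.random.default_rng(4)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)                    # symmetric positive definite Hessian
b = rng.standard_normal(n)
x0 = np.zeros(n)
runs = {phi: broyden_exact(A, b, x0, np.eye(n), phi) for phi in (0.0, 0.5, 1.0)}
xs, S, Bk = runs[0.0]
SAS = S @ A @ S.T                              # entries s_i^T A s_j
print("(i)   same iterates for all phi:", all(np.allclose(r[0], xs) for r in runs.values()))
print("(i)   x_n is the minimizer:", np.allclose(A @ xs[-1] + b, 0.0))
print("(ii)  B_k s_j = y_j for j < k:",
      all(np.allclose(Bk[k] @ S[j], A @ S[j]) for k in range(1, n + 1) for j in range(k)))
print("(iii) conjugate steps:", np.allclose(SAS - np.diag(np.diag(SAS)), 0.0))
print("(iv)  B_n = A:", np.allclose(Bk[-1], A))
```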

Note that parts (i), (ii), and (iv) of this result echo the statement and proof of Theorem 6.1, where similar results were derived for the SR1 update formula.

We can generalize Theorem 6.4 slightly: it continues to hold if the Hessian approximations remain nonsingular but not necessarily positive definite. (Hence, we could allow $\phi_k$ to be smaller than $\phi_k^c$, provided that the chosen value did not produce a singular updated matrix.) We can also generalize point (iii) as follows. If the starting matrix $B_0$ is not the identity matrix, then the Broyden-class method is identical to the preconditioned conjugate gradient method that uses $B_0$ as preconditioner.
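The preconditioned-CG equivalence can likewise be checked numerically. In the sketch below (NumPy assumed; data and helper names hypothetical), `pcg` follows the usual preconditioned CG recurrence with residual $r = Ax + b = \nabla f(x)$ and preconditioner solves $z = B_0^{-1} r$, and its iterates are compared with those of a Broyden-class method with exact line searches started from the same $B_0$. Taking $B_0 = I$ recovers ordinary CG, that is, point (iii).

```python
import numpy as np


def broyden_update(B, s, y, phi):
    # Broyden-class formula (6.32)-(6.33)
    Bs = B @ s
    sBs, ys = s @ Bs, y @ s
    v = y / ys - Bs / sBs
    return B - np.outer(Bs, Bs) / sBs + np.outer(y, y) / ys + phi * sBs * np.outer(v, v)


def broyden_iterates(A, b, x0, B0, phi):
    # n quasi-Newton steps with exact line searches on f(x) = b^T x + 0.5 x^T A x
    x, B = x0.astype(float), B0.astype(float)
    xs = [x]
    for _ in range(len(b)):
        g = A @ x + b
        p = -np.linalg.solve(B, g)
        s = (-(g @ p) / (p @ A @ p)) * p      # exact step length times direction
        x = x + s
        B = broyden_update(B, s, A @ s, phi)
        xs.append(x)
    return np.array(xs)


def pcg(A, b, x0, P):
    # Preconditioned CG for A x = -b with preconditioner P (solves P z = r)
    x = x0.astype(float)
    r = A @ x + b                             # residual = gradient of f
    z = np.linalg.solve(P, r)
    p = -z
    xs = [x]
    for _ in range(len(b)):
        Ap = A @ p
        alpha = (r @ z) / (p @ Ap)
        x = x + alpha * p
        r_next = r + alpha * Ap
        z_next = np.linalg.solve(P, r_next)
        beta = (r_next @ z_next) / (r @ z)
        p = -z_next + beta * p
        r, z = r_next, z_next
        xs.append(x)
    return np.array(xs)


rng = np.random.default_rng(5)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)                   # symmetric positive definite Hessian
b = rng.standard_normal(n)
x0 = rng.standard_normal(n)
N = rng.standard_normal((n, n))
B0 = N @ N.T + np.eye(n)                      # a non-identity SPD starting matrix
for start in (np.eye(n), B0):
    for phi in (0.0, 1.0):
        same = np.allclose(broyden_iterates(A, b, x0, start, phi), pcg(A, b, x0, start))
        print(f"phi = {phi}: matches preconditioned CG: {same}")
```

The comparison is done on the iterates $x_k$ rather than on the search directions, since the theorem asserts that the directions agree only up to scaling.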

We conclude by commenting that results like Theorem 6.4 would appear to be of mainly theoretical interest, since the inexact line searches used in practical implementations of Broyden-class methods (and all other quasi-Newton methods) cause their performance to differ markedly. Nevertheless, it is worth noting that this type of analysis guided much of the development of quasi-Newton methods.

From Numerical Optimization, Second Edition, by Jorge Nocedal and Stephen J. Wright (pp. 168-172).