Modifications of Trust Region Algorithm

2 Trust Region Method

2.4 Modifications of Trust Region Algorithm

We make two modifications to the TRS algorithm [9]; the conventional TRS algorithm is detailed in Appendix B. First, because the eigenvectors associated with the smallest eigenvalue are needed to solve the hard case, we compute the eigen-system with a numerically more robust method, the Singular Value Decomposition (SVD). The Cholesky factorization is therefore replaced by the SVD.


We first derive all the ingredients of the Trust Region algorithm. Applying Newton's method to the root-finding problem generates a sequence of iterates by setting

$$ \tilde{\mu} = \mu - \frac{\phi(\mu)}{\phi'(\mu)}, \tag{2.16} $$

where $\mu$ is the current iterate, $\tilde{\mu}$ is the next Lagrange multiplier produced by the Newton iteration, and

$$ \phi'(\mu) = \frac{d}{d\mu}\left\{\frac{1}{\Delta} - \left[\beta^T (G+\mu I)^{-2}\beta\right]^{-1/2}\right\} = -\left[\beta^T (G+\mu I)^{-2}\beta\right]^{-3/2}\left[\beta^T (G+\mu I)^{-3}\beta\right]. \tag{2.17} $$

In the trust region literature, the first-order derivative $\phi'(\mu)$ is computed by solving linear systems. Because the matrix $(G+\mu I)$ is positive definite for $\mu$ on the interval $(-\lambda_1, \infty)$ and is also symmetric, it can be factorized by the Cholesky factorization

$$ (G+\mu I) = U^T U, \tag{2.18} $$

where $U$ is an upper triangular matrix. Substituting (2.18) into (2.4) yields

$$ (G+\mu I)\,d(\mu) = U^T U\,d(\mu) = -\beta. \tag{2.19} $$

Solving the linear system (2.19), we obtain

$$ d(\mu) = -U^{-1}U^{-T}\beta \tag{2.20} $$

and

$$ d(\mu)^T d(\mu) = \beta^T U^{-1}U^{-T}U^{-1}U^{-T}\beta = \beta^T (G+\mu I)^{-2}\beta. \tag{2.21} $$

Also, solving the linear system $U^T U\,y(\mu) = d(\mu)$ gives

$$ y(\mu) = U^{-1}U^{-T}d(\mu) = -U^{-1}U^{-T}U^{-1}U^{-T}\beta = -(G+\mu I)^{-2}\beta. \tag{2.22} $$

Besides, we also have

$$ d(\mu)^T y(\mu) = \beta^T (G+\mu I)^{-3}\beta. \tag{2.23} $$

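Substituting (2.21) and (2.23) into (2.17) yields

$$ \phi'(\mu) = -\left[d(\mu)^T d(\mu)\right]^{-3/2}\left[d(\mu)^T y(\mu)\right]. \tag{2.24} $$

Also, substituting (2.12) and (2.24) into (2.16), we obtain the formula for the Newton iterates:

$$ \tilde{\mu} = \mu + \frac{\left[d(\mu)^T d(\mu)\right]^{3/2}}{d(\mu)^T y(\mu)}\left(\frac{1}{\Delta} - \frac{1}{\|d(\mu)\|}\right). \tag{2.25} $$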
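The Cholesky route of (2.18) to (2.25) can be sketched in Python with NumPy as follows. This is a minimal illustration, not the original implementation: the function name `newton_step_cholesky` is hypothetical, and the data assume that Example 1.2 uses $G = \begin{bmatrix}1 & 2\\ 2 & 1\end{bmatrix}$ and $\beta = (1, 0.5)^T$. That pair is our reconstruction; it has eigenvalues 3 and $-1$ and reproduces, to the reported precision, every $d(\mu)$ listed in Table 2.1. Note that `numpy.linalg.cholesky` returns the lower factor $L$, so $U = L^T$.

```python
import numpy as np


def newton_step_cholesky(G, beta, Delta, mu):
    """One Newton update (2.25) on phi(mu) = 1/Delta - 1/||d(mu)||.

    Requires G + mu*I to be symmetric positive definite (mu > -lambda_1).
    """
    n = G.shape[0]
    # (2.18): numpy returns the lower factor L, i.e. G + mu I = L L^T = U^T U with U = L^T
    L = np.linalg.cholesky(G + mu * np.eye(n))
    # (2.19)-(2.20): solve U^T U d = -beta as two triangular systems
    d = np.linalg.solve(L.T, np.linalg.solve(L, -beta))
    # (2.22): solve U^T U y = d
    y = np.linalg.solve(L.T, np.linalg.solve(L, d))
    dd = d @ d      # (2.21): beta^T (G + mu I)^{-2} beta
    dy = d @ y      # (2.23): beta^T (G + mu I)^{-3} beta
    norm_d = np.sqrt(dd)
    # (2.24)-(2.25): phi'(mu) = -(d^T d)^{-3/2} (d^T y), mu_new = mu - phi / phi'
    mu_new = mu + dd ** 1.5 / dy * (1.0 / Delta - 1.0 / norm_d)
    return mu_new, d, norm_d


# Assumed reconstruction of Example 1.2 (see the lead-in), hypothetical data.
G = np.array([[1.0, 2.0], [2.0, 1.0]])
beta = np.array([1.0, 0.5])

mu_next, d2, norm_d2 = newton_step_cholesky(G, beta, Delta=1.0, mu=2.0)
# d2 is about (-0.4, 0.1), norm_d2 about 0.41231, mu_next about 1.2544
```

Starting from $\mu = 2$ the step lands near 1.2544, the third entry of Table 2.1; starting from $\mu = 3$ it gives 0.75, the iterate that the safeguard rejects in the worked example.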

Now we have derived the formula for performing Newton’s iteration. The detailed algorithm is described in Appendix A.

Because (2.12) and (2.17) involve terms of the form $(G+\mu I)^{-p}$, where $p \in \Re$, we apply the SVD to $(G+\mu I)$ and obtain

$$ (G+\mu I) = Q\Sigma Q^T, \tag{2.26} $$

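where

$$ \Sigma = \begin{bmatrix}\sigma_1 & & O\\ & \ddots & \\ O & & \sigma_n\end{bmatrix} = \begin{bmatrix}\lambda_1+\mu & & O\\ & \ddots & \\ O & & \lambda_n+\mu\end{bmatrix}, $$

$\lambda_1 \le \cdots \le \lambda_n$ are the eigenvalues of $G$, and $Q$ is the $n \times n$ matrix whose columns $q_1, \ldots, q_n$ are orthonormal eigenvectors of $(G+\mu I)$ (and hence of $G$). Because $(G+\mu I)$ is symmetric positive definite for $\mu > -\lambda_1$, its SVD coincides with its eigendecomposition, which is why the same $Q$ appears on both sides of (2.26).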

Therefore, the inverse of $(G+\mu I)$ of any order $p$ can be calculated by the following formula:

$$ (G+\mu I)^{-p} = Q\Sigma^{-p}Q^T = Q\begin{bmatrix}(\lambda_1+\mu)^{-p} & & O\\ & \ddots & \\ O & & (\lambda_n+\mu)^{-p}\end{bmatrix}Q^T. \tag{2.27} $$

To proceed with our algorithm, we also apply the following transformation:

$$ \theta = Q^T\beta, \tag{2.28} $$

where the $i$-th component $\theta_i = q_i^T\beta$ is the inner product of the eigenvector $q_i$ and the gradient vector $\beta$.
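Equations (2.26) to (2.28) can be illustrated with a short NumPy sketch. It assumes $(G+\mu I)$ is symmetric positive definite, so that the SVD returned by `numpy.linalg.svd` has $V = U$ and $U$ can play the role of $Q$. The helper name `inverse_power_svd` is hypothetical, and the data are our assumed reconstruction of Example 1.2 ($G = \begin{bmatrix}1 & 2\\ 2 & 1\end{bmatrix}$, $\beta = (1, 0.5)^T$, evaluated at $\mu = 2$). Combining (2.21), (2.23), (2.27) and (2.28) gives $\|d(\mu)\|^2 = \sum_i \theta_i^2/\sigma_i^2$ and $d(\mu)^T y(\mu) = \sum_i \theta_i^2/\sigma_i^3$.

```python
import numpy as np


def inverse_power_svd(A, p):
    """(2.26)-(2.27): A^{-p} = Q Sigma^{-p} Q^T for A = G + mu I.

    Assumes A is symmetric positive definite (mu > -lambda_1); then the
    SVD A = U S V^T has V = U, so U serves as Q and S as Sigma.
    """
    Q, sigma, _ = np.linalg.svd(A)
    return Q @ np.diag(sigma ** (-float(p))) @ Q.T, Q, sigma


# Assumed reconstruction of Example 1.2 (see the lead-in), hypothetical data.
G = np.array([[1.0, 2.0], [2.0, 1.0]])
beta = np.array([1.0, 0.5])
mu = 2.0

A_inv2, Q, sigma = inverse_power_svd(G + mu * np.eye(2), 2)
theta = Q.T @ beta                  # (2.28): theta_i = q_i^T beta
dd = np.sum(theta**2 / sigma**2)    # ||d(mu)||^2 = beta^T (G + mu I)^{-2} beta
dy = np.sum(theta**2 / sigma**3)    # d(mu)^T y(mu) = beta^T (G + mu I)^{-3} beta
```

Since only $\sigma_i = \lambda_i + \mu$ changes with $\mu$, one decomposition of $G$ could in principle be reused for every trial $\mu$; the sketch decomposes $G + \mu I$ directly to mirror (2.26).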

The second modification of the TRS algorithm is to find a lower bound for the Lagrange multiplier $\mu$ more efficiently. The purpose of the lower bound is to prevent unsuccessful Newton iterates; as presented in Section 2.2, the safeguard mechanism of the TRS algorithm is designed to prevent exactly this situation.

Figure 2.6 shows Newton's method driving $\mu$ beyond the admissible interval. The information in $-\lambda_1$ can therefore be used to safeguard against this possible failure of Newton's method. Since the SVD has been used to solve the problem, the smallest eigenvalue $\lambda_1$ of the Hessian matrix is also available. We can then establish a new lower bound based on the current Lagrange multiplier, and we show below that this new lower bound is better than the lower bound $\mu_{\min}^{(S)}$ proposed by Semple, J. (1997) [18]. To derive the lower bound, we first define

$$ \Phi(\mu) = \|d(\mu)\| - \Delta = \left[d(\mu)^T d(\mu)\right]^{1/2} - \Delta. \tag{2.29} $$

Differentiating (2.29) with respect to $\mu$ produces

$$ \Phi'(\mu) = -\frac{\beta^T (G+\mu I)^{-3}\beta}{\|d(\mu)\|} = -\frac{d(\mu)^T y(\mu)}{\|d(\mu)\|}. \tag{2.30} $$

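The lower bound is estimated by the following inequality, which holds because $\|d(\mu)\|$ is convex on $(-\lambda_1, \infty)$ and $\Phi(\mu^*) = 0$ at the optimal multiplier $\mu^*$:

$$ 0 = \Phi(\mu^*) \ge \Phi(\mu) + \Phi'(\mu)\,(\mu^* - \mu). $$

Both quantities in (2.29) and (2.30) are already calculated in the Newton iteration, and $\Phi'(\mu) < 0$, so the estimated lower bound becomes

$$ \mu^* \ge \mu - \frac{\Phi(\mu)}{\Phi'(\mu)}. $$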

The lower bound proposed by Semple, denoted $\mu_{\min}^{(S)}$, can be written as

$$ \mu_{\min}^{(S)} = \mu - \frac{d(\mu)^T d(\mu)}{d(\mu)^T y(\mu)}. $$

Figure 2.6 A failed Newton iteration

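On the other hand, by using the SVD, our lower bound $\mu_{\min}^{(T)}$ can be calculated by the following formula:

$$ \mu_{\min}^{(T)} = \mu - \frac{\Phi(\mu)}{\Phi'(\mu)} = \mu + \frac{\|d(\mu)\|\left(\|d(\mu)\| - \Delta\right)}{d(\mu)^T y(\mu)} = \mu + \frac{\left[\beta^T(G+\mu I)^{-2}\beta\right]^{1/2}\left(\left[\beta^T(G+\mu I)^{-2}\beta\right]^{1/2} - \Delta\right)}{\beta^T(G+\mu I)^{-3}\beta}. \tag{2.33} $$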

Thus the lower bound of $\mu$ can then be set to

$$ \mu_{\min} = \max\left(-\lambda_1,\ \mu_{\min}^{(T)}\right). \tag{2.34} $$

That is, the larger of the negative smallest eigenvalue and the lower bound in (2.33) is taken as $\mu_{\min}$. We use Example 1.2 to show that (2.34) is better than $\mu_{\min}^{(S)}$ and to explain their geometric meanings in three different situations, i.e., three different positions of the current point.

Situation 1:

With $G + \mu I$ kept positive definite, let $\Delta = 1$ and consider $\mu_0 = 3$ as the current point. The two lower bounds are calculated and illustrated in Figure 2.7.

Figure 2.7 Comparison of the lower bounds for $\mu$ in Situation 1 (annotated points: (3, 0.25), (1, 1), (−6, 1), (0, 0), (1.3644, 1))

Since $\mu_{\min}^{(T)} = -6 < -\lambda_1 = 1$, we set $\mu_{\min}$ to $-\lambda_1$ according to (2.34). With the help of $-\lambda_1$, we obtain a better $\mu_{\min}$ than $\mu_{\min}^{(S)}$.

Situation 2:

In this situation, we consider $\mu_0 = 1.5$ (to the right of the optimal $\mu^*$) as the current point. The two lower bounds are calculated and shown in Figure 2.8.

Figure 2.8 Comparison of the lower bounds for $\mu$ in Situation 2 (annotated points: (1.5, 0.74), (1.31, 1), (0.95, 0), (1, 1), (1.3644, 1))

As shown in Figure 2.8, $\mu_{\min}^{(T)}$ is 1.31, which is very close to the optimal $\mu^* = 1.3644$ while still lying below it, and is much larger than $\mu_{\min}^{(S)} \approx 0.95$.


Situation 3:

In this situation, we consider $\mu_0 = 1.2$ (to the left of the optimal $\mu^*$) as the current point. Again, the two lower bounds are calculated and shown in Figure 2.9.

Figure 2.9 Comparison of the lower bounds for $\mu$ in Situation 3 (annotated points: (1.2, 1.78), (1.289, 1), (0.996, 0), (1, 1), (1.3644, 1))

We find that $\mu_{\min}^{(T)} \approx 1.289$ is still larger than $\mu_{\min}^{(S)} \approx 0.996$ even though the current point $\mu_0$ lies to the left of $\mu^*$. These three situations demonstrate, without proof, that the lower bound $\mu_{\min}$ in (2.34) is better than $\mu_{\min}^{(S)}$. When the lower bound is used to safeguard Newton's method against invalid iterates, this new lower bound helps Newton's method converge quickly.
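The three situations can be reproduced numerically. The sketch below is a minimal NumPy illustration under an assumed reconstruction of Example 1.2 ($G = \begin{bmatrix}1 & 2\\ 2 & 1\end{bmatrix}$, $\beta = (1, 0.5)^T$, $\Delta = 1$), which reproduces the annotated points in Figures 2.7 to 2.9; `lower_bounds` is a hypothetical helper, and `numpy.linalg.eigh` (a symmetric eigensolver) stands in for the SVD because it returns the signed $\lambda_1$ directly. A short piece of algebra, ours rather than the text's, explains the pattern: subtracting Semple's bound from (2.33) gives $\mu_{\min}^{(T)} - \mu_{\min}^{(S)} = \|d(\mu)\|\left(2\|d(\mu)\| - \Delta\right)/\left(d(\mu)^T y(\mu)\right)$, so $\mu_{\min}^{(T)}$ is the larger of the two exactly when $\|d(\mu)\| > \Delta/2$. In Situation 1, $\|d(3)\| = 0.25 < \Delta/2$, which is why the improvement there comes from $-\lambda_1$ in (2.34).

```python
import numpy as np


def lower_bounds(G, beta, Delta, mu):
    """Return (mu_min_S, mu_min_T, mu_min) at a current point mu > -lambda_1.

    mu_min_S: Semple's bound; mu_min_T: the tangent bound (2.33);
    mu_min: the safeguarded bound (2.34).
    """
    lam, Q = np.linalg.eigh(G)          # ascending eigenvalues; lam[0] = lambda_1
    theta = Q.T @ beta                  # (2.28)
    sigma = lam + mu                    # eigenvalues of G + mu I, all > 0 here
    dd = np.sum(theta**2 / sigma**2)    # d^T d, (2.21)
    dy = np.sum(theta**2 / sigma**3)    # d^T y, (2.23)
    norm_d = np.sqrt(dd)
    mu_min_S = mu - dd / dy                          # Semple's bound
    mu_min_T = mu + norm_d * (norm_d - Delta) / dy   # (2.33): mu - Phi/Phi'
    return mu_min_S, mu_min_T, max(-lam[0], mu_min_T)  # (2.34)


# Assumed reconstruction of Example 1.2 (see the lead-in), hypothetical data.
G = np.array([[1.0, 2.0], [2.0, 1.0]])
beta = np.array([1.0, 0.5])
for mu0 in (3.0, 1.5, 1.2):  # Situations 1, 2 and 3
    S, T, m = lower_bounds(G, beta, 1.0, mu0)
    print(f"mu0={mu0}: S={S:.4f}  T={T:.4f}  mu_min={m:.4f}")
```

For Situation 1 the values are $\mu_{\min}^{(S)} \approx 0$, $\mu_{\min}^{(T)} = -6$ and $\mu_{\min} = 1$; for Situation 2 about 0.951, 1.3125 and 1.3125; for Situation 3 about 0.996, 1.2897 and 1.2897.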


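Algorithm 2.1 (Trust Region Subproblem Algorithm)

Begin
  Perform the SVD of $(G+\mu I)$ by (2.26) and (2.27); calculate $\theta$ by (2.28)
  If $G$ is positive definite and $\|G^{-1}\beta\| \le \Delta$ then
    Return $\mu^* = 0$ and the solution
    $$ d(0) = -G^{-1}\beta \tag{2.35} $$
  Else If $\beta \perp E_{\min}$ then
    Calculate, with $\mu = -\lambda_1$,
    $$ \tau = \left\{\Delta^2 - \left[\frac{\theta_k^2}{(\lambda_k+\mu)^2} + \cdots + \frac{\theta_n^2}{(\lambda_n+\mu)^2}\right]\right\}^{1/2} \tag{2.36} $$
    If $\tau > 0$ (i.e., the quantity in braces is positive) then
      $$ d \leftarrow -Q\begin{bmatrix}0\\ \vdots\\ 0\\ \dfrac{\theta_k}{\lambda_k+\mu}\\ \vdots\\ \dfrac{\theta_n}{\lambda_n+\mu}\end{bmatrix} \pm \tau\, q_1 \tag{2.37} $$
      Return the better solution $d$ by evaluating the original objective function
    Else
      Go to Algorithm 2.2 (the problem is a good case)
    End If
  Else
    Go to Algorithm 2.2 (the problem is a good case)
  End If
End

Here $E_{\min}$ denotes the eigenspace of the smallest eigenvalue $\lambda_1$, and $\lambda_k$ is the smallest eigenvalue different from $\lambda_1$.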
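The hard-case branch of Algorithm 2.1 can be sketched as below. Assumptions: `numpy.linalg.eigh` stands in for the SVD (for symmetric $G$ it yields the same eigenvectors and the signed $\lambda_1$); the function name `hard_case_step` and the tolerance `tol` used to decide $\beta \perp E_{\min}$ are hypothetical; and the test problem $G = \mathrm{diag}(-1, 3)$, $\beta = (0, 1.5)^T$ is our own small hard case, not an example from the text. The sketch returns both candidates $d$ from (2.37); choosing between them requires the original objective function, which is outside its scope.

```python
import numpy as np


def hard_case_step(G, beta, Delta, tol=1e-10):
    """Sketch of the hard-case branch of Algorithm 2.1, (2.36)-(2.37).

    Returns the two candidates (d_plus, d_minus), or None when the problem
    is not a hard case (Algorithm 2.2 would then be used instead).
    """
    lam, Q = np.linalg.eigh(G)            # ascending: lam[0] = lambda_1
    if lam[0] > 0:
        return None                       # G positive definite: no hard case here
    theta = Q.T @ beta                    # (2.28)
    in_min = np.abs(lam - lam[0]) <= tol  # eigenvalues belonging to E_min
    if np.any(np.abs(theta[in_min]) > tol):
        return None                       # beta not orthogonal to E_min
    mu = -lam[0]
    rest = ~in_min                        # indices k, ..., n
    # (2.36): tau^2 = Delta^2 - sum_{i>=k} theta_i^2 / (lambda_i + mu)^2
    tau_sq = Delta**2 - np.sum(theta[rest]**2 / (lam[rest] + mu)**2)
    if tau_sq <= 0.0:
        return None                       # tau not positive: good case
    tau = np.sqrt(tau_sq)
    # (2.37): d = -sum_{i>=k} theta_i / (lambda_i + mu) q_i, then add +/- tau q_1
    p = -Q[:, rest] @ (theta[rest] / (lam[rest] + mu))
    q1 = Q[:, 0]
    return p + tau * q1, p - tau * q1


# Hypothetical hard-case problem: beta is orthogonal to the eigenvector of -1.
G_h = np.diag([-1.0, 3.0])
beta_h = np.array([0.0, 1.5])
candidates = hard_case_step(G_h, beta_h, Delta=1.0)
```

Both candidates lie on the trust region boundary, $\|d\| = \Delta$, and differ only in the sign of the $q_1$ component.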

Notice that when the hard case occurs, the optimal solution must be chosen by evaluating the original objective function. This completes the algorithm for the TR hard case. For the good case of the TRS, we summarize the Newton iteration algorithm as follows.

Algorithm 2.2 (Algorithm for Good Case)

Input:
  $\delta$: the tolerance for convergence of the solution $\|d(\mu)\|$
  $\varepsilon$: the tolerance ensuring that $(G+\mu I)$ is P.D.
  $G$: the Hessian matrix of (2.2)
  $\beta$: the gradient vector of (2.2)
  $\Delta$: the given trust region radius
  $\lambda_1$: the smallest eigenvalue of $G$

Begin
  $$ \mu \leftarrow -\lambda_1 + \varepsilon \quad \text{(ensure } (G+\mu I) \text{ is P.D.)} \tag{2.38} $$
  Repeat while $\left|\,\|d(\mu)\| - \Delta\,\right| > \delta$
    $$ \mu_{\min} \leftarrow \max\left(-\lambda_1,\ \mu + \frac{\|d(\mu)\|\left(\|d(\mu)\| - \Delta\right)}{\beta^T(G+\mu I)^{-3}\beta}\right) \quad \text{(set the lower bound for } \mu) \tag{2.39} $$
    If $\|d(\mu)\| < \Delta$ (to the right of the root) then
      $$ \mu_{\max} \leftarrow \mu \tag{2.40} $$
    Else
      $$ \mu_{\min} \leftarrow \mu \tag{2.41} $$
    End If
    $$ \tilde{\mu} \leftarrow \mu + \frac{\left[\beta^T(G+\mu I)^{-2}\beta\right]^{3/2}}{\beta^T(G+\mu I)^{-3}\beta}\left(\frac{1}{\Delta} - \frac{1}{\|d(\mu)\|}\right) \tag{2.42} $$
    If $\tilde{\mu} \notin (\mu_{\min}, \mu_{\max})$ then
      $$ \tilde{\mu} \leftarrow \frac{\mu_{\min} + \mu_{\max}}{2} \tag{2.43} $$
    End If
    $\mu \leftarrow \tilde{\mu}$
  End Repeat
  Return $\mu^* = \mu$
End
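A compact sketch of Algorithm 2.2 follows. It is an illustration under stated assumptions, not the original code: `trs_good_case` is a hypothetical name, `numpy.linalg.eigh` replaces the SVD, step (2.41) keeps the larger of $\mu$ and the (2.39) bound (our choice), and when no upper bound $\mu_{\max}$ is known yet the safeguard steps one unit past $\mu_{\min}$, a fallback the text does not specify. With the assumed reconstruction of Example 1.2 ($G = \begin{bmatrix}1 & 2\\ 2 & 1\end{bmatrix}$, $\beta = (1, 0.5)^T$, $\Delta = 1$, $\varepsilon = 2$), the recorded $\mu$ values follow Table 2.1: 3, 2, about 1.2544, about 1.3623, and finally about 1.3645 (reported as 1.3644).

```python
import numpy as np


def trs_good_case(G, beta, Delta, eps=2.0, delta=1e-4, max_iter=50):
    """Sketch of Algorithm 2.2: safeguarded Newton iteration on
    phi(mu) = 1/Delta - 1/||d(mu)||. Returns (mu, d(mu), history of mu)."""
    lam, Q = np.linalg.eigh(G)
    theta = Q.T @ beta
    lam1 = lam[0]

    def parts(mu):
        sigma = lam + mu
        d = -Q @ (theta / sigma)            # d(mu) = -(G + mu I)^{-1} beta
        dy = np.sum(theta**2 / sigma**3)    # beta^T (G + mu I)^{-3} beta
        return d, np.linalg.norm(d), dy

    mu = -lam1 + eps                        # (2.38)
    mu_min, mu_max = -lam1, np.inf
    history = []
    d, norm_d, dy = parts(mu)
    for _ in range(max_iter):
        history.append(mu)
        if abs(norm_d - Delta) <= delta:
            break
        # (2.39): lower bound from -lambda_1 and the tangent bound (2.33)
        mu_min = max(-lam1, mu + norm_d * (norm_d - Delta) / dy)
        if norm_d < Delta:                  # (2.40): mu is right of the root
            mu_max = mu
        else:                               # (2.41): mu is left of the root
            mu_min = max(mu_min, mu)        # keep the larger bound (our choice)
        # (2.42): Newton step
        mu_new = mu + norm_d**3 / dy * (1.0 / Delta - 1.0 / norm_d)
        if not (mu_min < mu_new < mu_max):  # (2.43): safeguard
            # bisect the bracket; with no upper bound yet, step one unit past
            # mu_min (hypothetical fallback, not from the text)
            mu_new = 0.5 * (mu_min + mu_max) if np.isfinite(mu_max) else mu_min + 1.0
        mu = mu_new
        d, norm_d, dy = parts(mu)
    return mu, d, history


# Assumed reconstruction of Example 1.2 (see the lead-in), hypothetical data.
G = np.array([[1.0, 2.0], [2.0, 1.0]])
beta = np.array([1.0, 0.5])
mu_star, d_star, history = trs_good_case(G, beta, Delta=1.0)
```

The first pass reproduces the safeguard in action: the Newton step from $\mu = 3$ lands at 0.75, outside $(1, 3)$, so the midpoint 2 is used instead.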

Example 1.2 is used again to demonstrate the TRS good-case algorithm, together with the geometric meaning of each step. Let the given trust region radius be $\Delta = 1$ and $\varepsilon = 2$; the procedure is detailed as follows.

Preparation:

The eigenvalues of the Hessian matrix are 3 and −1, so $G$ is an indefinite matrix. We first set $\mu = -(-1) + 2 = 3$ according to (2.38) and then proceed with the algorithm.

Iteration 1:

To evaluate (2.39), we first compute

$$ d(\mu) = d(3) = \begin{bmatrix}-0.25\\ 0\end{bmatrix} $$

and $\|d(\mu)\| = 0.25 < 1$. Thus we enter the iteration of the TRS algorithm. The lower bound $\mu_{\min}$ is first found to be $\max(1, -6) = 1$ according to (2.39), as shown in Figure 2.10. Because $\|d(\mu)\| = 0.25 < 1$, the current point lies to the right of the root, so we set $\mu_{\max} = 3$. With $\mu^*$ known to lie in the interval $(\mu_{\min}, \mu_{\max}) = (1, 3)$, the first Newton iterate is computed by (2.42), as shown in Figure 2.11:

$$ \mu^{(1)} = 0.75. $$

Because $\mu^{(1)}$ is not in $(1, 3)$, the safeguard mechanism (2.43) replaces it with the average of $\mu_{\min}$ and $\mu_{\max}$, i.e., $\tilde{\mu} = (1 + 3)/2 = 2$. The two bold dashed lines in Figure 2.11 indicate $\mu_{\min}$ and $\mu_{\max}$ in the $1/\Delta$ space.
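The numbers in Iteration 1 can be checked by hand. Assuming, as a reconstruction not stated in this section, that Example 1.2 uses $G = \begin{bmatrix}1 & 2\\ 2 & 1\end{bmatrix}$ and $\beta = (1, 0.5)^T$ (this pair has eigenvalues 3 and −1 and gives $d(3) = (-0.25, 0)^T$), we have $y(3) = (G+3I)^{-1}d(3) = (-1/12,\ 1/24)^T$, and the quantities in (2.33), (2.39), (2.42) and (2.43) evaluate to:

$$
\begin{aligned}
d(3)^T d(3) &= 0.0625, \qquad d(3)^T y(3) = \tfrac{1}{48},\\
\mu_{\min}^{(T)} &= 3 + \frac{0.25\,(0.25 - 1)}{1/48} = -6, \qquad \mu_{\min} = \max(1,\,-6) = 1,\\
\mu^{(1)} &= 3 + \frac{0.0625^{3/2}}{1/48}\left(\frac{1}{1} - \frac{1}{0.25}\right) = 3 + 0.75\,(-3) = 0.75,\\
\tilde{\mu} &= \frac{1 + 3}{2} = 2.
\end{aligned}
$$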

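Figure 2.10 $\mu_{\min}$, $\mu_{\max}$, $\mu_{\min}^{(T)}$ and $\mu$ in the two-dimensional space ($\mu_{\max} = \mu = 3$, $\mu_{\min} = -\lambda_1 = 1$, $\mu_{\min}^{(T)} = -6$)

Figure 2.11 Safeguard mechanism for the Newton iterate in the $1/\Delta$ space ($\mu_{\min} = 1$, $\mu_{\max} = \mu = 3$, $\mu^{(1)} = 0.75$, $\tilde{\mu} = 2$)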

With $\tilde{\mu} = 2$, the remaining iterations are listed in the following table.

Table 2.1 The iterative results of Example 1.2 solved by the TRS algorithm

| Iteration | μ | d(μ) | ‖d(μ)‖ |
|---|---|---|---|
| 1 | 3 | Safeguarded | Safeguarded |
| 2 | 2 | (−0.4, 0.1) | 0.41231 |
| 3 | 1.2544 | (−1.1588, 0.8063) | 1.41181 |
| 4 | 1.3623 | (−0.8618, 0.5179) | 1.0055 |
| 5 | 1.3644 | (−0.8577, 0.5140) | 1 |


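In the document: Generalized Reduced Trust Region Search and Its Application to the Optimization of Multi-Objective Statistical Models (pp. 58-72)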
