2 Trust Region Method
2.4 Modifications of Trust Region Algorithm
We do some modification to the TRS algorithm [9], the conventional TRS algorithm is detailed in Appendix B. The first is that because we need to compute the eigenvectors with respect to the smallest eigenvalue to solve the hard case, we use a more numerical computation robust method, namely, Singular Value Decomposition (SVD), to compute the eigen-system. Cholesky factorization is therefore replaced by SVD.
-17.0865
-17.0865
-15.9231 -15.9231
-15.9231 -15.9231
-14.7596 -14.7596
-14.7596 -14.7596
-13.5962 -13.5962
-13.5962
-13.5962 -12.4327
-12.4327
-12.4327
-12.4327
-12.4327 -11.2692
-11.2692 -11.2692
-11.2692
-11.2692
-11.2692 -10.1058
-10.1058 -10.1058
-10.1058
-10.1058
-10.1058 -8.94231
-8.94231 -8.94231
-8.94231
-8.94231
-8.94231 -7.77885
-7.77885
-7.77885
-7.77885
-7.77885
-7.77885 -6.61538
-6.61538
-6.61538
-6.61538
-6.61538 -6.61538
-5.45192
-5.45192
-5.45192
-5.45192
-5.45192
-5.45192 -4.28846
-4.28846
-4.28846
-4.28846
-4.28846
-4.28846 -3.125
-3.125 -3.125
-3.125
-3.125 -3.125
-1.96154
-1.96154
-1.96154
-1.96154
-1.96154 -1.96154
-0.798077
-0.798077 1.52885
1.52885
1.52885
1.52885
1.52885 2.69231
2.69231
We first derive the all ingredients for the Trust Region algorithm. The root-finding problem applied Newton’s method generates a sequence of iterate of
μ ~
by setting
( ) ( ) μ φ μ
φ μ
μ ~ = −
1/
1' (2.16), where k is the k-th search index;
μ ~
is the next Lagrangian multiplier found in the Newton’s iterates and( ) ( ) [ ( ) ]
( )
[ ] [ ( ) ] .
2 2 ) 1
) ( ( '
2 3 2 3
2 3 3 2 1
1
β I G β β I G β
β I G β β
I G d β
− −
−
− −
−
−
−
−
−
=
+
⎥⎦ −
⎢⎣ ⎤
⎡ − +
−
− =
=
μ μ
μ μ μ
μ μ φ
T T
T T
d d
(2.17)
In trust region literature, the first order derivative can solve by solving linear system.
Due to the matrix
( G
+μ I )
is positive definite withμ
on the interval (−λ1,∞) and( G
+μ I )
is also a symmetric matrix so it can be factorized by Cholesky factorizationas
( G +
μI ) = U
TU
(2.18), where U is a upper triangular matrix.
By substituting (2.18) into (2.4) yields
U
TU d= U
TU d( μ
)=-β. (2.19)Solve the linear system (2.19) we have the the solution d(
μ
) becomesβ
U U
d (
μ) = −
−1 −T(2.20)
, and
d
T(μ
)d
(μ
)=β
TU
−1U
−TU
−1U
−Tβ
=β
T( G
+μ I )
−2β
. (2.21)Also, solve the linear system UT
U y( μ
) = d(μ
), the solution y (μ
) isy ( μ
)=U
−1U
−Td ( ) μ
=−U
−1U
−TU
−1U
−Tβ
=−( G
+μ I )
−2β
. (2.22)Besides we also have
( ) ( ) y β ( G I ) β
d
Tμ μ
= T +μ
−3(2.23)
Substituting (2.21) and (2.23) into (2.17) yields
[ ] [ ( ) ( ) ] [ ] [ ( ) ( ) ]
Also, substitute (2.12) and (2.24) into (2.16), and we have the formula of Newton’s iterates
( ) ( )
Now we have derived the formula for performing Newton’s iteration. The detailed algorithm is described in Appendix A.
Because (2.12) and (2.17) involve the term
( G
+ Iμ )
−p wherep ∈ ℜ
. By applying the SVD method to( G
+μ I )
we have( G + μ I ) = Q Σ Q
T (2.26), where
⎥ ⎥
⎥
⎦
⎤
⎢ ⎢
⎢
⎣
⎡
=
⎥ ⎥
⎥
⎦
⎤
⎢ ⎢
⎢
⎣
⎡
= Σ
n
n λ
λ σ
σ
O O
1 1
and
λ
1,..., λ
nare the eigenvalues of( G
+μ I )
; Q is the n by n matrix with columns consisting of orthonormal eigenvectors of( G
+μ I )
.Therefore, the inverse of
( G
+μ I )
with any order p could be calculated by the following formula.( ) ( )
TP n p
T p p
p
G I Q Q Q Q
I G
⎥ ⎥
⎥ ⎥
⎥
⎦
⎤
⎢ ⎢
⎢ ⎢
⎢
⎣
⎡
= Σ
= +
=
+
− + +σ μ σ
μ
1
1
1
O
. (2.27)To proceed with our algorithm, we also have to do the following transformation.
Qβ
θ =
(2.28), where the components of
θ
denotes asθ
i which is the product of the eigenvectorsq
i and the gradient vectorβ
.The second modification of the TRS algorithm is to find the lower-bound for the Lagrangian multiplier μ more efficiently. The purpose of the lower-bound is to prevent the unsuccessful iterates of the Newton’s method. As presented in Section 2.2, the safeguard mechanism of the TRS algorithm is designed to prevent this situation.
Figure 2.6 shows Newton’s method leads
μ
beyond the logical interval. Moreover, theinformation of− to safeguard the possible failure of Newton’s method. Since the
λ
1 S.V.D has been used to help us solve the problem, and the smallest eigenvalue of the Hessian matrix is also obtained. We may establish a new lower-bound based on the current Lagrangian multiplier. It will be shown that this new lower-bound will be better than the lower boundμ
min( )
S proposed by Semple, J. (1997) [18]. To derive the lower-bound, we first define( ) ( ) ( ) ( )
Differentiating (2.29) with respect to
μ
produces( ) μ
2β ( G μ I ) β
2d
T( ) ( ) μ y μ
' =− + 3 =−
Φ −
(2.30)
The lower bound is estimated by the following inequality:
( ) ( ) ( )
Both elements in (2.29) and (2.30) are calculated in the Newton’s iterate, so it is easy to identify the estimated lower bound becomes
Then the lower-bound proposed by Semple, J denotes as
μ
min( )
S can be written as:( ) μ ( ) ( ) μ μ μ
μ d y
d
T S
2
min
)
− (
=
.Figure 2.6 The failure Newton’s iteration
On the other hand, by using SVD our lower-bound
μ
min( )
T can be calculated by the following formula:( )
( )
( ) ( )
( )
( ) ( ( ) )
( ( ) ) ( )
( )
( )
,
'
3 2 3 2 1 min
β I G β
d d
β I G β β I G β
d
−
− −
−
+ Δ + −
=
+ +
−
Δ
− −
=
−
=
μ μ μ μ
μ μ
μ μ μ φ φ μ μ μ
T(2.33)
Thus the lower bound of
μ
can be then set to( λ ,μ ( )
T)
μ
min= max −
1 min . (2.34)That is, the larger value between the negative smallest eigenvalue and the lower
bound in (2.33) is set to be
μ
min. We use Example 1.2 to show (2.34) is better than( )
Sμ
min and their geometric meanings in three different situations, i.e., three different positions of the current point.Situation 1:
For being a positive definite matrix G +
μ I, let Δ = 1
and considerμ
0 = 3 as the current point. Calculate the two lower-bounds and the two lower-bounds are illustrated in Figure 2.7.Figure 2.7 Comparison of lower-bound for
μ in situation 1 (0, 0)(-6,1)
(3, 0.25) (1, 1)
( )
Sμ
min( )
Tμ
minμ
0μ
Δ
λ
1−
μ
*(1.3644,1)
Since
μ
min( )
T< − λ
1, we setμ
min to be − according to (2.34). With the help ofλ
1λ
1− , we a obtain better
μ
min thanμ
min( )
S .Situation 2:
In this situation, we consider
μ
0 = 1.5 (at the right of the optimalμ
*) to be the current point as shown in Figure 2.8. The two lower-bound are calculated and shown in Figure 2.8.Figure 2.8 Comparison of lower-bound for
μ in situation 2As shown in Figure 2.8,
μ
min( )
T is set to be 1.31, which appears to be very close to the optimalμ
* and also lower thanμ
min( )
T .( )
Sμ
minλ
1− (1.5, 0.74)
(1.31, 1)
μ
Δμ
1μ
*(0.95, 0)
(1.3644, 1)
( )T
μ
min(1, 1)
Situation 3:
In this situation, we consider
μ
0 = 1.2 (at the left of the optimalμ
*) to be the current point as shown in Figure 2.9. Again the two lower-bound are also calculated and shown in Figure 2.9.Figure 2.9 Comparison of lower-bound for
μ in situation 3We find that
μ
min( )
T is still larger thanμ
min( )
S even if the current pointμ
0 is on the left hand side ofμ
*. We already demonstrate, without proof, thatμ
min( )
T better thanμ
min( )
S . When we use the lower-bound to safeguard the Newton’s method from invalid solutions, this new lower bound helps the Newton’s method to converge quickly.μ
0−λ1
(1.2, 1.78)
(1, 1) (1.289, 1)
(0.996, 0)
μ
Δ
( )
Sμ
min( )
Tμ
minμ
∗ (1.3644, 1)Algorithm 2.1 (Trust Region Subproblem Algorithm) Begin
Perform S.V.D to
( G
+μ I )
by (2.27) Calculate θ by (2.28)If G is positive definite then
Return μ
*= 0
and the solutiond ( ) 0 = − G
−1β
(2.35)Else If β
⊥E
minthen
Calculate ( ) ( )2)
1 2
1 2 2 2 2
⎭⎬
⎫
⎩⎨
⎧ ⎥
⎦
⎢ ⎤
⎣
⎡
+ + + +
− Δ
=
μ λ
θ λ
μ τ θ
k
K k (2.36)
If τ
2> 0
then0 0 0 1
1 1 2
2
1 1 2
2
⎥⎥
⎥⎥
⎥⎥
⎦
⎤
⎢⎢
⎢⎢
⎢⎢
⎣
⎡
+ +
±
=
⎥⎥
⎥⎥
⎦
⎤
⎢⎢
⎢⎢
⎣
⎡
±
⎥⎥
⎥⎥
⎥⎥
⎦
⎤
⎢⎢
⎢⎢
⎢⎢
⎣
⎡
+
← +
λ μ
θ λ μ
θ τ τ
λ μ
θ λ μ
θ
k k k
k
M M
D
M (2.37)Return a better solution d by evaluating the objective of original
objective functionElse
Go to Algorithm 2.2 (the problem is a good case)
End If
Else
Go to Algorithm 2.2 (the problem is a good case).
End If End
We summarize the algorithm to solve the Trust Region Subproblem as follows.
Algorithm 2.2 (Algorithm for Good Case) Input:
δ : the tolerance for convergence of the solution
d ( ) μ ε
: the tolerance for ensuring (ensure (G+μ I) is P.D.) G:the hessian matrix of (2.2)
β: the gradient vector of (2.2) Δ: the given trust region radius λ1: the smallest eigenvalue
Begin
ε λ
+−
← 1
μ
(ensure( G μ
+I )
is P.D.). (2.38)Repeat while d ( ) μ − Δ > δ
( ( ) ) ( )
( )
( ) ⎟⎟ ⎠ ⎞
⎜⎜ ⎝
⎛
+ Δ + −
−
←
−β I G β
d d
1 3
min
max
μ μ μ μ
, λ
μ
(2.39)(set the lower-bound for
μ
).If d ( ) μ
<Δ (at the right of the root) thenμ
μ
max←
(2.40)Else
μ
μ
min ← (2.41)Notice that, when the hard case occurs the optimal solution must be chosen by evaluating the objective value of original objective function. The algorithm for the TR Hard Case is complete. For “Good Case” of TRS, the Newton’s iterates algorithm is shown below:
( )
( )
[ β G I β ] [ β ( G I ) β ]
d
2 3 2 3
1 Δ
1
~
− −
− +
+
− +
←
μ μ
μ μ μ
T T
(2.42)
If
~μ
<μ
minthen
2
~
μ
maxμ
minμ
← + . (2.43)End If End Repeat Return μ
*= μ ~ End
Example 1.2 is used again to demonstrate the TRS good case algorithm. With the explanation of the geometric meanings, first let the given trust region radius to be 1;
ε
= 2; the procedure is detailed as follows.
Preparation:
The eigenvalue of the Hessian matrix are 3 and −1 respectively, that is, G is an indefinite matrix. We first set
μ
=−( )
−1 +2=3 according to (2.38), and then proceed to the algorithm.Iteration 1:
By (2.39), we have
( ) ( )
⎥⎦
⎢ ⎤
⎣
⎡−
=
= 0
25 . 3 0
d
d μ
andd ( ) μ
=0.25<1. Thus we enter the TRS problem solving step. Theμ
minis first found to be max( )
1,−6 =1 according to (2.39) and is shown in Figure 2.10. Becaused ( ) μ
=0.25<1, to obtain an valid μ , we can setμ
max= 3
. Withμ
* known to be in the interval of( μ
min, μ
max) ( ) = 1 , 3
, thefirst Newton’s iterate can be performed by (2.42), as shown in Figure 2.11
μ ( = 0.75
Becauseμ (
is not in (1, 3) according to the safeguard mechanism (2.43), we take the average ofμ
min andμ
max to replaceμ (
, i.e.,μ
~=( )
1+3 /2=2 . The twobold-dashed line in Figure 2.11 indicate
μ
min andμ
max in the space of1 / Δ
.Figure 2.10 μ , μ , μ ( )
Tand μ on two-dimensional space
max
= 3
= μ
μ μ
Δ
min 1
1 = =
−
λ μ
( ) 6
minT
= −
μ μ
*Figure 2.11 Safeguard mechanism for Newton’s iterate in the 1 Δ space
With
μ ~ = 2
, the remaining iteration is listed in the following table.Table 2.1 The iterative results of example 1.2 solved by the TRS algorithm Iteration m
k1 3 Safeguarded Safeguarded
2 2 (-0.4,0.1) 0.41231
3 1.2544 (-1.1588,0.8063) 1.41181
4 1.3623 (-08618,0.5179) 1.0055
5 1.3644 (-0.8577,0.5140) 1
( ) μ
k( ) μ
kd
d
≈
max
3
1
= μ =
μ 75
.
= 0 μ (
~ = 2 μ
min =1
μ
μ
Δ1