Shortcomings of Generalized Reduced Gradient Method

1 Introduction

1.3 Shortcomings of Current NLP Methods

1.3.1 Shortcomings of Generalized Reduced Gradient Method

One motivation of this study is to overcome some unexpected phenomenon rose by the GRG method, although the GRG method is applied intensively in practice. The phenomenon is called “zigzagging” or “jamming”. Zigzagging usually appears at the later phase of search and causes a poor convergence. As mentioned earlier, the GRG method employs the first-order approximation:

f ( x

⁽^k⁾

+

d ) = f ( x

⁽^k⁾

) +

∇ f ( x

⁽^k⁾

)

d

+ , where d is the search direction, e is the error of the linearization approximation.

e

When x^(k) is close to the stationary point,

∇ f x (

^(k⁾

)

becomes very small so is the term

-8.9419 -8.4438

-8.4438 -7.9457

-7.9457 -7.4476

-7.4476 -6.9

495

-6.9495 -6.9495

-6.451 4

-6.4514 -6.4514

-5.95 33

-5.9533 -5.9533

-5.45 52

-5.4552

-5.4552 -4.95

-4.9571

-4.9571 -4.459

-4.459

-4.459 -4.459

-3.961

-3.961 -3.961

-3.46 29

-3.4629

-3.4629 -3.4629

-2.9 648

-2.9648

-2.9648 -2.46

-2.4667

-2.4667 -1.96

-1.9686

-1.9686 -1.47

-1.4705 -1.4705

-1.4705 -0.97238

-0.97238 -0.47429

-0.47429 0.02381

x1

x2

0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2

0 0.2 0.4 0.6 0.8

Zoutendijk’s direction

5 5 ₂ ₄

x

2 + x + x =

x

d x

^k ^T

f (

⁽ ⁾

)

∇

. The error term, thus, becomes relatively significant. The error term caused by the linearization approximation thus induces the search path to zigzag.

For example the response surface resembling an inclined trough, will cause the GRG method to zigzag easily. That is, the search direction of the GRG method, i.e., the reduced gradient, moves toward the bottom of the trough, not to the inclined direction. The number of moving steps will be enormous, and it becomes difficult to reach the optimal point. The objective function of the SMOO problem in (1.8) could certainly form an inclined trough and cause the zigzagging problem.

As described earlier, the quartic objective function in SMOO problem is the “the quadratic of the quadratic”. If the response, expressed as a quadratic function of input variables, isn’t absolutely positive or negative, the quartic objective function could form a trough. For an example with two input variables, the quadratic function, z = 10x² + 10y² – 4, will exhibit a shape as shown in Figure 1.10. After squaring the quadratic function, a trough ring will be created. Figure 1.11 shows the quartic response surface. Because in a typical SMOO problem, the objective function is the sum of multiple quartic functions, a trough will be easily formed and cause the zigzagging problem.

Figure 1.10 Quadratic response surface

Figure 1.11 Quartic response surface

Example 1.3 is a well known problem for testing optimization algorithm. The objective function is called Rosenbrock’s Function or Rosenbrock’s valley. The function is a summation of a square of quadratic function and a square of a linear function. The value of this function thus must be greater or equal to zero. Because of the global minimum f (x1, x2) =0 at (x1=1, x2=1) is inside a long, narrow and

minimum is difficult. The function form is expressed in the following equation and figure 1.12 shows the corresponding surface and contour plot.

Example 1.3:

Minimize: (1－x

1)²+100×(x2－x12)²,

Subject to: －2

≤ x1≤ 2; 0 ≤ x2≤ 4.

Figure 1.12 Surface plot and contour map of Example 1.3

Suppose the initial point is (x1, x2) = (−2, 0.5) with the terminal criterion to be 10^-10 and the maximal iteration number to be 3000. The objective value of the second iteration is 0.0173683 with (x1, x2) = (1.1316, 1.2812). The objective value of the latest iteration is 0.000657 with (x1, x2) = (1.0256, 1.0520). There are 2417 iterations in total from the second iteration to the latest iteration. From the result, there are two major drawbacks of GRG method. The search path of the GRG is sketched on the

-2

Obj. Value

40.19

358.54194 358.54194

637.09677

716.68387

Figure 1.13. Figure 1.15 shows the zigzagging phenomenon of the GRG method. Due to zigzagging phenomenon, the GRG method sometimes is unable to converge to the global minimum (x1, x2) = (1, 1).

Figure 1.13 Search path of GRG in Example 1.3

Figure 1.14 Dash-line region in Figure 1.13

24.19

96.784314

314.54902

338.7451

701.68627

701.6 8627 725.8

8235

725.88235

725.8

Figure 1.15 Zigzagging path in dash-line region of Figure 1.14 1.3.2 Shortcomings of Ridge Search Method

Although the ridge analysis helps us to find the minimum without zigzagging phenomenon, the required optimal Lagrangian multiplier is difficult to find or is inefficiently found. What is known is that the optimal Lagrangian multiplier should be smaller than the smallest eigenvalue of the quadratic coefficient matrix G if we want to minimize the objective function. The RS search uses the following formula to search for the optimal Lagrangian multiplier to calculate the stationary point and obtain the corresponding objective value. The updating formula of Lagrangian multiplier is:

γ γ

γ μ α

μ⁽ ^{+ )}¹

= − Δ ×

(1.24)

, where

γ

is the search step index; Δ is the step size and is set to be proportional to the smallest eigenvalue and

α

is the parameter to approximate the exponential

0.00065825

0.00067105

1.0254 1.0256 1.0258 1.026 1.0262 1.0264 1.026

1.0515

relationship between the radius and

μ

. When we use (1.24) to search the optimal Lagrangian multiplier, there exist three drawbacks. In (1.24) there are two manipulatable parameters Δ and

α

, the search result of the GRR algorithm is in fact quite sensitive to these two parameters. This is also an important reason motivating us to develop a new algorithm with less parameter settings.

Considering Example 1.3, with an initial point

( ^x

¹^,

^x

) (

⁼ ⁻²^,⁰^.⁵

)

, we first set

α

=10 and Δ=100 and then change the setting to α=100 and Δ=100. Figure 1.16, Figure 1.17 and Table 1.1 show the search processes and the comparisons including objective value, iterations and computing time under the two different settings.

Table 1.1 Comparison of two different settings

Settings Obj. Value Number of

Iterations Computing time

1st 3.368948E-07 181 0.06

2nd 1.100042E-10 976 0.30

Figure 1.16 Search process of 1

^st

setting

Figure 1.17 Search process of 2

^nd

setting

51.90

674.74194

778.54839

1193.7742 1245.6774 1297.5801349.48391401.38711453.29031505.19356

-2 -1.5 -1 -0.5 0 0.5 1 1.5 2

155.70968

467.12903

519.03226

622.83871

622.83871 1193.7742 1245.67741297.58061453.29031505.19351349.1401.38714839

-2 -1.5 -1 -0.5 0 0.5 1 1.5 2

Although the GRR search method can find the near-optimal objective value by different setting, but the difference between the two settings is large, and the computing time is also greatly depending to the setting. That is, the first drawback of the GRR search is the difficulty of parameter selection. The second drawback is due to some issues of numerical calculation. For a large SMOO problem, the quadratic coefficient matrix G is easy to be singular or near singular with the smallest eigenvalue near zero. Under the circumstances, the

_Δ

, set to be proportional to the smallest eigenvalue, in (1.24) is also near zero and cause the search to be extremely slow. Finally, the GRR uses (1.24) to approximate the relationship between the radius and the Lagrangian multiplier. This sometimes results in an over-large radius is large sometimes. In fact, the objective function of the SMOO problem is a quartic function.

In order to perform the GRR search, the algorithm needs to approximate the objective function to second-order function by the Taylor series expansion. If the radius is too large, the solution solved by GRR may not in the region of trusted approximate. This is the third drawback of the GRR algorithm.

1.4 Research Objectives

The formulation of the statistical multi-objective optimization (SMOO) problem is exactly a nonlinear programming problem (NLP) with nonlinear inequality and

optimization algorithm for solving the quadratic programming NLP problem.

Furthermore, the objective function of SMOO problem is a quartic function and is not guaranteed to be a convex function. This is the first challenge we need to face. On the other hand, there are three drawbacks of GRR we discussed in subsection 1.3.2. we then attempt to develop a new algorithm that prevents the three drawbacks of the GRR search.

There are some commercial optimization softwares, such as “Lingo”, adopts

“Generalized Reduced Gradient method” together with “Successive Linear Programming method” in its algorithm. The two methods used by Lingo are actually Feasible Direction Methods. These methods are also subject to the zigzagging. Since one of our research objectives is to avoid zigzagging, our research results will be compared to Lingo’s to validate the proposed algorithm. To Summary, our research objectives are to develop a constrained optimization algorithm for solving the SMOO problem and this algorithm must (1) overcome the three drawbacks of GRR and (2) avoid the zigzagging phenomenon.

In specific, there are four research objectives:

1. Develop a nonlinear constrained optimization algorithm called Generalized Reduced Trust-Region (GRT) search method based on trust-region method.

2. Develop a algorithm using the developed GRT method and the Zoutendijk’s method.

3. Propose the convergence proof of GRT search algorithm

4. Test the proposed search algorithm with four cases: (1) A well-known test problem for NLP algorithm called Rosenbrock’s function, (2) Geometric Layout Design for Semiconductor Manufacturability, (3) Robust Configuration of Semiconductor Supply Chain, (4) Track System PED CDU Optimization.

1.5 Thesis Organization

In this chapter, we describe the background, problem definition, current methodology review, and drawbacks of these NLP algorithm and the research objectives. Chapter 2 introduces the trust-region method and subproblem of the trust-region method. Moreover, the hard case of the trust-region method will be also mentioned in this chapter. Finally we do some modification of the traditional trust-region algorithm is also be introduced here. In chapter 3, we describe the algorithm of generalized reduced trust region method. The convergence proof of GRT is also proposed in this chapter. In chapter 4, the test problem and result will be presented. Every result will be compared against Lingo’s result. Finally, some conclusions and suggestions are presented in Chapter 5.

2 Trust Region Method

Due to the drawbacks of the GRG method and the GRR method, this research develops an algorithm based on a method known in numerical optimization are “Trust Region” (TR) method. In Section 2.1, the basic ideas and the problem formulation of the TR method will be introduced. In Section 2.2, we study an algorithm to help us solve the “Trust Region Subproblem” (TRS). Some numerical issues called “Hard Case” of the TR method in the literature will be discussed in Section 2.3. Finally, we make some modifications to the TR method to improve its numerical implementation in Section 2.4.

2.1 Trust Region Method

The TR method and the Ridge Analysis (RA), in effect, share the same mathematical formulation, i.e., minimizing or maximizing a quadratic function subject to a spheral region constraint. The quadratic function can be an approximation of any objective function. For example, we can approximate the quartic SMOO problem to a quadratic function and solve it by the TR method or the RA method.

Though the problem formulation is the same, there are still fundamental differences between two methods. First, the TR method finds a solution inside the spheral region, while the RA method only considers the boundary solutions. Second, as we discussed

and makes guess on the value of the Lagrangian multiplier iteratively by (1.24). In contrast, the TR method finds the optimal Lagrangian multiplier directly by solving a sequence of Trust Region Subproblems (TRS). This method for TRS is discussed in the next section. The TR method allows us to adjust the radius directly without guessing on the value of the Lagrangian multiplier.

Determination of the trust region radius with TR method is critical. If the radius of the region is too small, the algorithm misses the chance to move faster to a minimum of the objective function. If it’s too large, the approximated model may become a poor approximate of the objective function and the minimum found inside the region may be far from the global minimum. Thus the TR algorithm gradually shrinks the size of the region in its search steps. In every iteration, the algorithm uses the approximate performance of the previous iterate to determine radius of the trust region. If the approximation is good, we enlarge the size else we shrink the size of the trust region. Such update of the trust region radius is introduced in next Chapter.

Figure 2.1 shows the TR approach for a function f of two variables on a contour plot.

The contour of quadratic model function

φ

(in dashed line) is constructed from the derivative information at the current iterate

~x

₀

Figure 2.1 Trust-Region and Trust-Region step

The TR method approximates any differentiable function to a quadratic function

by the Taylor series expansion. Consider the following TR problem

( )

, :

2 : 1

Δ

≤

+ + x

Hx x x g x to Subject

f

Minimize

^T ^T

(2.1)

where ⋅ is the Euclidean norm;

g

^T is the gradient vector, i.e., ^∇

^f ( ) ^x

; H is the Hessian Matrix, i.e.,

^∇

^f ( ) ^x

^{; and}Δ is the radius of the trust region.

Now considering the SMOO problem, the objective function of (1.8) is a quartic function. We approximate the objective function of the SMOO problem with respect

to a given point

x by the second-order approximation and apply the TR method as

^{( )}^k follows:

~x

Contour of f Contour of

φ

Trust Region Trust Region Step

( ) ( )

denotes the trust region radius at current iteration; the partial derivative matrix

( )

^{f x} ( ) ( )

β = ∇

and the Hessian matrix

^G ^{( )}

⁼ ^∇

^{f x} ( ) ^{( )}

^k are calculated with the following formulas:

( )

The solution of (2.2) is derived in the next Section.

2.2 Iterative Solution of Trust Region Subproblem (TRS)

To solve (2.2), a sequence of the “Trust Region Subproblems” (TRS) has to be solved. The TRS is to find the minimum of (2.2) with a given trust region radius Δ.

Actually,

( ^x ^{( )}

^k⁺¹

⁻ ^x ^{( )}

)

in (2.2) is the improving direction

d ^{( )}

^k to be found.

Therefore we replace

( ^x ^{( )}

^k⁺¹

⁻ ^x ^{( )}

)

by the improving direction

d ^{( )}

^k . Without loss generality, we drop the superscript

( ) k and consider the following direction-finding

problem.

Gd d d β

x

f

Minimize

2 ) 1

(

: + + (2.3)

≤

d d

to subject :

First, we characterize the exact solution of (2.3) by the theorem 2.1 which shows that the improving direction dsatisfies

β d I G + ) = −

( μ

^(2.4)

Theorem 2.1 [9, 13]

The vector

d

is a global solution of the TR problem

Δ

≤ + + d

Gd d d

β :

2 : 1

to Subject

f

Minimize

^T ^T

(2.5)

if and only if d is feasible with the Lagrange multiplier μ ≥ 0 such that the following

conditions are satisfied:

0 ) (Δ d− =

μ

^; ^(2.7)

)

( G + μ I

is positive semidefinite. (2.8)

Any solutions of (2.5) lies either in the interior or on the boundary of the feasible set (trust region), i.e. the set

{ ^d

^d

^≤^Δ

}

. Equation (2.5) has no solution on the boundary if and only if G is positive define and

G

⁻¹

β

<Δ. In this case, the solution of (2.5) is

d = G

⁻¹

β

with the Lagrangian multiplier

μ

^*^{= 0.}

In (2.4), the hessian matrix G and the gradient vector β are known. The unknowns in (2.4) are the solution d and the Lagrangian multiplier

μ

. Τhe solution of

d in (2.3) is shown to be:

β I G

d = − ( +

)

⁻¹

.

(2.9)

According to (2.7), either Δ d− =0 or

μ

= 0 must hold. If

μ

= 0 then the solution d is in the interior of the region else d is on the boundary. In the latter caseΔ d− =0 hold and then the norm of the solution d,

d , equals to the trust region radius

Δ, i.e.,

d

Δ . Due to the equality relationship between the radius and the norm of the

solution, (2.9) becomes Δ

= +

−

G I

⁻

β

d

(

μ

) ¹ ^. ^(2.10)

From (2.10), the solution d is a function of

μ

. To find d, we have to find

μ

^first.

Finding

μ

is a typical root-finding problem of a nonlinear equation. We can apply

Newton method to help us find the optimal

μ

with a given radius Δ. Now we define the function

φ ( ) μ

^as

( )

⁼ ⁻ ⁺ ⁼^Δ

d

(

G I

)⁻¹

β

)

(

μ μ μ

φ

^. ^(2.11)

Equation (2.11) describes the equality relationship between the radius and the Lagrangian multiplier, much like what we have discussed for (1.22) in Subsection 1.2.2 where we also have sketched the relationship on a two dimension space like Figure 1.8. It shows that if the Newton’s method is applied to find the root of (2.11), the root finding procedure is slow and inefficient due to the nonlinearity of the function

φ

(

μ

)

^with μ

on the interval of (−λ1,∞).

Fortunately, the Newton’s method can perform quite efficiently with the following transformation to (2.11). The attempt is to reformulate (2.11) to become almost linear with

μ

on the interval of (−λ1,∞). We define the reformulated equation as follows:

) 0 (

1 1

) (

1 ) 1

( ₁

1 =

− −

= Δ Δ−

= ₋

β I

d μ G μ

μ

φ

^(2.12)

As shown in Figure 2.2,

φ

( ) μ

becomes a near-linear function of

μ

^{. Now the} Newton’s method can perform better to find the root.

Figure 2.2 The relationship of

1 Δ

and

μ in example 1.2

To apply the one-dimensional Newton's method:

( ) ( ) μ φ μ μ φ

μ

~= − 1 , where

^φ

¹^'

( ) ^μ

is the first derivative of

^φ

( ) ^μ

and

μ ^~

denotes the next Lagrangian multiplier found by the Newton’s iterates. In order to perform the Newton’s method

φ

( ) μ

and

^φ

¹^'

( ) ^μ

must be evaluated. That can be obtained by solving a linear system involving

( G + μ I )

. Because in the range of interest,

( G + μ I )

is definite positive, we may use its Cholesky factors (

^G

+ )

μ ^I

^U ( ) ( ) μ

^U μ

, where

( ) μ

U

is an upper triangular matrix. To solve the problem, computation demanding calculation of the eigen-system of G is thus avoided.

−λ¹ μ

However, to be able to use the Cholesky factorization, we have to ensure that

)

( G + μ I

is positive definite. In other words,

μ

has to be in the interval of

(

− ,

λ

1 ∞

)

A safeguard mechanism is therefore needed to ensure the success of the Newton’s method. Here, we don’t discuss the Newton’s method and the safeguard mechanism in detail. For a more detailed explanation, please see Section 2.4 and Appendix B.

2.3 The Hard Case

Although the Newton’s iterates can be used to find root of

^φ

( ) ^μ

, there are some computation difficulties. The numerical difficulty is called “Hard Case” in the TR literature. The hard case occurs when the eigenvector corresponding to the smallest eigenvalue is perpendicular to the gradient vector

β

_{, i.e.,}

q

₁^T

β

=0, where the eigenvector with respect to the smallest eigenvalue is denoted as q1. When there are multiple eigenvectors, i.e., an eigenspace corresponding to the smallest eigenvalue provided that

Q

₁^T

b

=0, where Q1 is the matrix whose columns span the eigenspace

corresponding to the smallest eigenvalue. The hard case is caused by the failure of the limit condition

( )

=∞

→_λ

μ

d

lim . Therefore, there may not exist a value in

(

^{− ,}

^λ

¹ ^∞

)

^to

solve

^d ( ) μ

=Δ. We use an example to illustrate the hard case condition followed by a geometric interpretation.

Consider the following example with a current point at (d1, d2) = (0, 0).

Example 2.1

Express (2.14) in matrix notation:

( ) ( ) ( )

Minimize

^T ^T

where the gradient vector ⎥

⎦ that the gradient vector β be perpendicular to the eigenvector corresponding to smallest eigenvalue −1. The relationship among the radius,

μ

and the solution

^d ( ) μ

becomes:

( ) ( )

( )

( )( ) ( )( )

( )( )

( )

3 2 1

3 1 2 1

3 1

1 3

1 1

2 2

⎟⎟ ⎠

⎜⎜ ⎞

⎝

⎛

= +

⎟⎟ ⎠

⎜⎜ ⎞

⎝

⎛

+

−

= −

⎟⎟ ⎠

⎜⎜ ⎞

⎝

⎛

+

− + + −

⎟⎟ ⎠

⎜⎜ ⎞

⎝

⎛

+

− +

= −

= Δ

μ μ μ μ

μ μ

d

(2.14)

We also sketch (2.14) in a two dimension space. Figure 2.2 shows that there is only one pole corresponding to the second eigenvalue, that is, the pole with respect to the first eigenvalue is vanished. This is because the denominator

( μ

+³

)( μ

−¹

)

^is

eliminated by its factor

( μ

−¹

)

. However the pole which is corresponding to the second smallest eigenspace is still existed but the solution

^μ

^∈

(

⁻³^,¹

)

is only a local minimum on the spheral constraint. Fortunately, More, J. J. and D. C. Sorensen (1983) [13] propose a solution to solve the hard case which will be discussed latter.

Figure 2.3 The vanished pole with respect to the smallest eigenspace

The hard case is a special situation in which the boundary solution of (2.3) is not unique. It can be shown that the hard case can only occur when the hessian matrix G is positive semidefinite, indefinite; the gradient vector β is perpendicular to the eigenspace with respect to the smallest eigenvalue of G; and ^Δ^> ⁻

( ^G

⁻

λ

^I )

⁺

^β

where the superscript (⁺) indicates the pseudo inverse. That is, if any of the above

three conditions is not met, the hard case cannot occur see also [17]. Let

( )

1 2 2

2 φ λ

= Δ −

and

^d

⁼⁻

( ^G

⁻

^λ

^I )

⁺

^β

^{. If}

^λ

¹^≤⁰^and

^d

^<^Δ then the solution to the hard case is defined as:

( G

−

λ

I ) β

τ q

d

τ q

− ⁺ (2.15)

−3 1 μ

If d solves the (2.4) then d must satisfy the condition (2.7) to (2.9), see also Appendix A.

Now we explain the geometric interpretation of the hard-case solution. Consider the following example with a current point at x0 =

( x

1^,

x

) (

= −³^,⁰

)

, :

. 2 15

:

2 2 2 1

2 2 2 1 1

Δ

≤ +

− + x x to subject

x x +x Minimize

where the Hessian matrix ⎥

⎦

⎢ ⎤

⎣

⎡

= −

4 0

G

2 and the gradient vector ⎥

⎦

⎢ ⎤

⎣

⎡−

= 0

β

5 at

current point

(

−³^,⁰

)

and −

( G

−

λ

I )

⁺

β

=⁵^/⁶^.

With the indefinite Hessian matrix G, we firstly show the geometric interpretation for

the case with the gradient vector β orthogonal to E^min but ^Δ^< ⁻

( ^G

⁻

λ

^I )

⁺

^β

^{. Figure}

2.4 shows a two-dimensional example where

_Δ

is chosen to be 0.6. When the radius

is chosen to be Δ< −

( G

−

λ

)

⁺

β

=⁵ ⁶, there is still a unique solution because the intersection of the sphere and the contour of the optimal y along the d direction is a unique point.

Figure 2.4 The Easy Case for an Indefinite Hessian

Suppose the trust-region radius Δ is chosen to be 1.5 and is greater than

⁵ ^/ ⁶

, i.e., the gradient vector β is still orthogonal to E^min but^Δ^> ⁻

( ^G

⁻

λ

^I )

⁺

^β

. Figure 2.5 shows how the solution for this case becomes not unique. In Figure 2.5 the length from the current point

x

= ( − ³ ^, ⁰ )

to x0 + d, ⎟

bold line in Figure 2.5) and less than 1.5. In this case, there are actually two solutions by adding d with

τ q

₁ ^and −

τ q

1 (dotted lines):

d

H1=−

( G

−

λ

I )

⁺

β

τ q

1 and

-15.3226 -15.3226

-14.3468 -14.3468

-13.371

-13.371 -13.371

-12.3952

-12.3952 -11.4194

-11.4194

-11.4194 -10.4435

-10.4435 -10.4435

-10.4435

-10.4435 -9.46774

-9.46774

-9.46774 -9.46774

-8.49194

-8.49194 -8.49194

-8.49194

-8.49194 -7.51613

-7.51613 -7.51613

-7.51613

-7.51613 -6.54

032

-6.54032 -6.54032

-6.54032

-6.54032 -5.56452

-5.56452 -5.56452

-5.56452

-5.56452 -4.58871

-4.58871 -4.58871

-4.58871

-4.58871 -4.58871

-3.6129

-3.6129 -3.6129

-2.6371

-2.6371 -2.6371

-1.66129

-1.66129 -1.66129 0.290323

0.290323

0.290323 1.26613

1.26613

1.2661 3 2.24194

2.24194

2.24194 3.21774

3.21774

Figure 2.5 The Hard Case for an Indefinite Hessian

2.4 Modifications of Trust Region Algorithm

We do some modification to the TRS algorithm [9], the conventional TRS algorithm is detailed in Appendix B. The first is that because we need to compute the eigenvectors with respect to the smallest eigenvalue to solve the hard case, we use a more numerical computation robust method, namely, Singular Value Decomposition (SVD), to compute the eigen-system. Cholesky factorization is therefore replaced by SVD.

-17.0865

-15.9231 -15.9231

-14.7596 -14.7596

-13.5962 -13.5962

-13.5962

-13.5962 -12.4327

-12.4327

-12.4327 -11.2692

-11.2692 -11.2692

-11.2692

-11.2692 -10.1058

-10.1058 -10.1058

-10.1058

-10.1058 -8.94231

-8.94231 -8.94231

-8.94231

-8.94231 -7.77885

-7.77885

-7.77885 -6.61538

-6.61538

-6.61538 -6.61538

-5.45192

-5.45192 -4.28846

-4.28846

-4.28846 -3.125

-3.125 -3.125

-3.125

-3.125 -3.125

-1.96154

-1.96154 -1.96154

-0.798077

-0.798077 1.52885

1.52885

1.52885 2.69231

2.69231

We first derive the all ingredients for the Trust Region algorithm. The root-finding problem applied Newton’s method generates a sequence of iterate of

μ ^~

by setting

( ) ( ) ^μ ^φ ^μ

φ μ

μ ~ = −

/

1^' ^(2.16)

, where k is the k-th search index;

μ ^~

is the next Lagrangian multiplier found in the Newton’s iterates and

( ) ₍ ₎ _[ ₍ ₎ _]

( )

[ ] [ ⁽ ⁾ ] ^.

2 2 ) 1

) ( ( '

2 3 2 3

2 3 3 2 1

β I G β β I G β

β I G β β

I G d β

− −

−

− −

−

=

+

⎥⎦ −

⎢⎣ ⎤

⎡ − +

−

− =

=

μ μ

μ μ μ

μ μ φ

T T

d d

(2.17)

In trust region literature, the first order derivative can solve by solving linear system.

Due to the matrix

( ^G

⁺

^μ ^I )

is positive definite with

μ

on the interval (−λ1,∞) and

( ^G

⁺

^μ ^I )

is also a symmetric matrix so it can be factorized by Cholesky factorization

( ^G ⁺

^I ) ⁼ ^U

^U

(2.18)

, where U is a upper triangular matrix.

By substituting (2.18) into (2.4) yields

U

U d= U

U d( μ

^)=－β. (2.19)

Solve the linear system (2.19) we have the the solution d(

μ

^{) becomes}

β

U U

d (

) = −

⁻¹ ⁻^T

(2.20)

, and

^d

^T⁽

μ

⁾

^d

⁽

μ

⁾⁼

^β

^U

⁻¹

^U

⁻^T

^U

⁻¹

^U

⁻^T

^β

⁼

^β

( ^G

⁺

μ ^I )

⁻²

^β

^. ^(2.21)

Also, solve the linear system U^T

U y( μ

^{) = d(}

μ

), the solution y (

μ

^{) is}

y ( μ

⁾⁼

^U

⁻¹

^U

⁻^T

^d ( ) μ

⁼⁻

^U

⁻¹

^U

⁻^T

^U

⁻¹

^U

⁻^T

^β

⁼⁻

( ^G

⁺

μ ^I )

⁻²

^β

^. ^(2.22)

Besides we also have

( ) ( ) ^y ^β ( ^G ^I ) ^β

d

μ μ

= ^T +

μ

⁻³

^(2.23)

Substituting (2.21) and (2.23) into (2.17) yields

[ ] [ ^{( ) ( )} ] [ ] ^[ ^{( ) ( )} ^]

Also, substitute (2.12) and (2.24) into (2.16), and we have the formula of Newton’s iterates

( ) ( )

Now we have derived the formula for performing Newton’s iteration. The detailed algorithm is described in Appendix A.

Because (2.12) and (2.17) involve the term

( ^G

^{+ I}

μ )

⁻^p^where

p ∈ ℜ

. By applying the SVD method to

( ^G

μ ^I )

we have

( ^G ⁺ ^μ ^I ) ⁼ ^Q ^Σ ^Q

^T ^(2.26)

, where

⎥ ⎥

⎥

⎦

⎤

⎢ ⎢

⎢

⎣

⎡

=

⎥ ⎥

⎥

⎦

⎤

⎢ ⎢

⎢

⎣

⎡

= Σ

n λ

λ σ

O O

1 1

and

_λ

₁

,..., _λ

_nare the eigenvalues of

( ^G

μ ^I )

; Q is the n by n matrix with columns consisting of orthonormal eigenvectors of

( ^G

μ ^I )

Therefore, the inverse of

( ^G

μ ^I )

with any order p could be calculated by the following formula.

( ) ( )

P n p

T p p

G I Q Q Q Q

I G

⎥ ⎥

⎥

⎦

⎤

⎢ ⎢

⎢

⎣

⎡

= Σ

= +

=

+

⁻ ⁺ ⁺

σ μ σ

1 O

. (2.27)

To proceed with our algorithm, we also have to do the following transformation.

Qβ

θ =

(2.28)

, where the components of

θ

denotes as

θ

_i which is the product of the eigenvectors

q

i and the gradient vector

β

The second modification of the TRS algorithm is to find the lower-bound for the Lagrangian multiplier μ more efficiently. The purpose of the lower-bound is to prevent the unsuccessful iterates of the Newton’s method. As presented in Section 2.2, the safeguard mechanism of the TRS algorithm is designed to prevent this situation.

Figure 2.6 shows Newton’s method leads

μ

beyond the logical interval. Moreover, the

在文檔中一般化縮減信賴域搜尋及其在多目標統計模型最佳化之應用 (頁 32-0)

Shortcomings of Generalized Reduced Gradient Method

1 Introduction

1.3 Shortcomings of Current NLP Methods

1.3.1 Shortcomings of Generalized Reduced Gradient Method

f ( x

+

d ) = f ( x

) +

∇ f ( x

)

d

e

∇ f x (

)

x1

x2

Zoutendijk’s direction

x

x

x

2

+ x + x =

x

d x

f (

)

∇

Figure 1.10 Quadratic response surface

Figure 1.11 Quartic response surface

Minimize: (1－x

Subject to: －2

Figure 1.12 Surface plot and contour map of Example 1.3

Figure 1.13 Search path of GRG in Example 1.3

Figure 1.14 Dash-line region in Figure 1.13

Figure 1.15 Zigzagging path in dash-line region of Figure 1.14 1.3.2 Shortcomings of Ridge Search Method

= − Δ ×

γ

α

μ

α

( x

x

) (

)

α

Table 1.1 Comparison of two different settings

Settings Obj. Value Number of

Iterations Computing time

1st 3.368948E-07 181 0.06

2nd 1.100042E-10 976 0.30

Figure 1.16 Search process of 1

setting

Figure 1.17 Search process of 2

setting

Δ

1.4 Research Objectives

1.5 Thesis Organization

2 Trust Region Method

2.1 Trust Region Method

φ

~x

Figure 2.1 Trust-Region and Trust-Region step

( )

, :

2 : 1

Δ

≤

+ + x

Hx x x g x to Subject

f

Minimize

g

f ( ) x

∇

f ( ) x

x by the second-order approximation and apply the TR method as

~x

φ

( )

f x ( ) ( )

( ^x

^x

_Δ

^f ( ) ^x

^∇

^f ( ) ^x

^{f x} ( ) ( )

^G ^{( )}

⁼ ^∇

^{f x} ( ) ^{( )}

( ^x ^{( )}

⁻ ^x ^{( )}

d ^{( )}

( ^x ^{( )}

⁻ ^x ^{( )}

d ^{( )}

{ ^d

^d