Numerical solutions of nonlinear systems of equations


Tsung-Ming Huang

Department of Mathematics National Taiwan Normal University, Taiwan

E-mail: min@ntnu.edu.tw

September 12, 2015


Outline

1 Fixed points for functions of several variables

2 Newton’s method

3 Quasi-Newton methods

4 Steepest Descent Techniques


Fixed points for functions of several variables

Theorem 1

Let $f : D \subset \mathbb{R}^n \to \mathbb{R}$ be a function and $x_0 \in D$. Suppose all the partial derivatives of $f$ exist and there are $\delta > 0$ and $\alpha > 0$ such that for all $x \in D$ with $\|x - x_0\| < \delta$,
$$\left| \frac{\partial f(x)}{\partial x_j} \right| \le \alpha, \quad j = 1, 2, \ldots, n.$$
Then $f$ is continuous at $x_0$.

Definition 2 (Fixed Point)

A function $G$ from $D \subset \mathbb{R}^n$ into $\mathbb{R}^n$ has a fixed point at $p \in D$ if $G(p) = p$.


Theorem 3 (Contraction Mapping Theorem)

Let $D = \{(x_1, \ldots, x_n)^T : a_i \le x_i \le b_i,\ i = 1, \ldots, n\} \subset \mathbb{R}^n$. Suppose $G : D \to \mathbb{R}^n$ is a continuous function with $G(x) \in D$ whenever $x \in D$. Then $G$ has a fixed point in $D$.

Suppose, in addition, that $G$ has continuous partial derivatives and that a constant $\alpha < 1$ exists with
$$\left| \frac{\partial g_i(x)}{\partial x_j} \right| \le \frac{\alpha}{n}, \quad \text{whenever } x \in D,$$
for $i, j = 1, \ldots, n$. Then, for any $x^{(0)} \in D$, the sequence
$$x^{(k)} = G(x^{(k-1)}), \quad k \ge 1,$$
converges to the unique fixed point $p \in D$, and
$$\| x^{(k)} - p \| \le \frac{\alpha^k}{1 - \alpha}\, \| x^{(1)} - x^{(0)} \|.$$


Example 4

Consider the nonlinear system
$$\begin{aligned}
3x_1 - \cos(x_2 x_3) - \tfrac{1}{2} &= 0,\\
x_1^2 - 81(x_2 + 0.1)^2 + \sin x_3 + 1.06 &= 0,\\
e^{-x_1 x_2} + 20 x_3 + \tfrac{10\pi - 3}{3} &= 0.
\end{aligned}$$

Fixed-point problem: change the system into the equivalent fixed-point problem
$$\begin{aligned}
x_1 &= \tfrac{1}{3}\cos(x_2 x_3) + \tfrac{1}{6} \equiv g_1(x_1, x_2, x_3),\\
x_2 &= \tfrac{1}{9}\sqrt{x_1^2 + \sin x_3 + 1.06} - 0.1 \equiv g_2(x_1, x_2, x_3),\\
x_3 &= -\tfrac{1}{20} e^{-x_1 x_2} - \tfrac{10\pi - 3}{60} \equiv g_3(x_1, x_2, x_3).
\end{aligned}$$

Let $G : \mathbb{R}^3 \to \mathbb{R}^3$ be defined by $G(x) = [g_1(x), g_2(x), g_3(x)]^T$.


• G has a unique fixed point in $D \equiv [-1, 1] \times [-1, 1] \times [-1, 1]$:

Existence: for all $x \in D$,
$$|g_1(x)| \le \tfrac{1}{3} |\cos(x_2 x_3)| + \tfrac{1}{6} \le 0.5,$$
$$|g_2(x)| = \left| \tfrac{1}{9}\sqrt{x_1^2 + \sin x_3 + 1.06} - 0.1 \right| \le \tfrac{1}{9}\sqrt{1 + \sin 1 + 1.06} - 0.1 < 0.09,$$
$$|g_3(x)| = \tfrac{1}{20} e^{-x_1 x_2} + \tfrac{10\pi - 3}{60} \le \tfrac{1}{20} e + \tfrac{10\pi - 3}{60} < 0.61,$$
which implies that $G(x) \in D$ whenever $x \in D$.

Uniqueness:
$$\frac{\partial g_1}{\partial x_1} = 0, \quad \frac{\partial g_2}{\partial x_2} = 0 \quad \text{and} \quad \frac{\partial g_3}{\partial x_3} = 0,$$
as well as
$$\left| \frac{\partial g_1}{\partial x_2} \right| \le \tfrac{1}{3} |x_3| \, |\sin(x_2 x_3)| \le \tfrac{1}{3} \sin 1 < 0.281,$$


$$\left| \frac{\partial g_1}{\partial x_3} \right| \le \tfrac{1}{3} |x_2| \, |\sin(x_2 x_3)| \le \tfrac{1}{3} \sin 1 < 0.281,$$
$$\left| \frac{\partial g_2}{\partial x_1} \right| = \frac{|x_1|}{9 \sqrt{x_1^2 + \sin x_3 + 1.06}} < \frac{1}{9 \sqrt{0.218}} < 0.238,$$
$$\left| \frac{\partial g_2}{\partial x_3} \right| = \frac{|\cos x_3|}{18 \sqrt{x_1^2 + \sin x_3 + 1.06}} < \frac{1}{18 \sqrt{0.218}} < 0.119,$$
$$\left| \frac{\partial g_3}{\partial x_1} \right| = \frac{|x_2|}{20} \, e^{-x_1 x_2} \le \tfrac{1}{20} e < 0.14,$$
$$\left| \frac{\partial g_3}{\partial x_2} \right| = \frac{|x_1|}{20} \, e^{-x_1 x_2} \le \tfrac{1}{20} e < 0.14.$$

These imply that $g_1$, $g_2$ and $g_3$ are continuous on $D$ and, for all $x \in D$,
$$\left| \frac{\partial g_i}{\partial x_j} \right| \le 0.281, \quad \forall\, i, j.$$
Similarly, the $\partial g_i / \partial x_j$ are continuous on $D$ for all $i$ and $j$. Consequently, $G$ has a unique fixed point in $D$.


• Approximate solution:

Fixed-point iteration (I): choosing $x^{(0)} = [0.1, 0.1, -0.1]^T$, the sequence $\{x^{(k)}\}$ is generated by
$$\begin{aligned}
x_1^{(k)} &= \tfrac{1}{3}\cos\!\big(x_2^{(k-1)} x_3^{(k-1)}\big) + \tfrac{1}{6},\\
x_2^{(k)} &= \tfrac{1}{9}\sqrt{\big(x_1^{(k-1)}\big)^2 + \sin x_3^{(k-1)} + 1.06} - 0.1,\\
x_3^{(k)} &= -\tfrac{1}{20} e^{-x_1^{(k-1)} x_2^{(k-1)}} - \tfrac{10\pi - 3}{60}.
\end{aligned}$$

Result:

k    x1(k)         x2(k)         x3(k)          ||x(k) − x(k−1)||
0    0.10000000    0.10000000    −0.10000000
1    0.49998333    0.00944115    −0.52310127    0.423
2    0.49999593    0.00002557    −0.52336331    9.4 × 10^{−3}
3    0.50000000    0.00001234    −0.52359814    2.3 × 10^{−4}
4    0.50000000    0.00000003    −0.52359847    1.2 × 10^{−5}
5    0.50000000    0.00000002    −0.52359877    3.1 × 10^{−7}
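This iteration can be sketched in a few lines of Python. The function names and the stopping rule below are my own choices, not part of the slides; only `math` from the standard library is used.

```python
import math

def G(x):
    """The fixed-point map G(x) = [g1(x), g2(x), g3(x)] of Example 4."""
    x1, x2, x3 = x
    return [math.cos(x2 * x3) / 3 + 1 / 6,
            math.sqrt(x1 ** 2 + math.sin(x3) + 1.06) / 9 - 0.1,
            -math.exp(-x1 * x2) / 20 - (10 * math.pi - 3) / 60]

def fixed_point(G, x, tol=1e-7, max_iter=100):
    """Iterate x <- G(x) until successive iterates differ by less than tol."""
    for k in range(1, max_iter + 1):
        y = G(x)
        if max(abs(a - b) for a, b in zip(y, x)) < tol:
            return y, k
        x = y
    return x, max_iter

sol, iters = fixed_point(G, [0.1, 0.1, -0.1])
# sol approaches [0.5, 0.0, -pi/6], matching the fixed point of the table
```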


• Approximate solution (cont.):

Accelerate convergence of the fixed-point iteration by using each new component as soon as it is available, as in the Gauss-Seidel method for linear systems:
$$\begin{aligned}
x_1^{(k)} &= \tfrac{1}{3}\cos\!\big(x_2^{(k-1)} x_3^{(k-1)}\big) + \tfrac{1}{6},\\
x_2^{(k)} &= \tfrac{1}{9}\sqrt{\big(x_1^{(k)}\big)^2 + \sin x_3^{(k-1)} + 1.06} - 0.1,\\
x_3^{(k)} &= -\tfrac{1}{20} e^{-x_1^{(k)} x_2^{(k)}} - \tfrac{10\pi - 3}{60}.
\end{aligned}$$

Result:

k    x1(k)         x2(k)         x3(k)          ||x(k) − x(k−1)||
0    0.10000000    0.10000000    −0.10000000
1    0.49998333    0.02222979    −0.52304613    0.423
2    0.49997747    0.00002815    −0.52359807    2.2 × 10^{−2}
3    0.50000000    0.00000004    −0.52359877    2.8 × 10^{−5}
4    0.50000000    0.00000000    −0.52359877    3.8 × 10^{−8}
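The Gauss-Seidel-style acceleration changes only which iterate each component reads from. A minimal sketch, assuming the same example system (the function name is mine):

```python
import math

def seidel_step(x):
    """One Gauss-Seidel-style sweep: each component uses the newest values."""
    x1, x2, x3 = x
    x1 = math.cos(x2 * x3) / 3 + 1 / 6                       # uses old x2, x3
    x2 = math.sqrt(x1 ** 2 + math.sin(x3) + 1.06) / 9 - 0.1  # uses NEW x1
    x3 = -math.exp(-x1 * x2) / 20 - (10 * math.pi - 3) / 60  # uses NEW x1, x2
    return [x1, x2, x3]

x = [0.1, 0.1, -0.1]
for k in range(4):
    x = seidel_step(x)
# after 4 sweeps x is close to [0.5, 0.0, -pi/6]
```

The only difference from the previous sketch is that the locals `x1`, `x2`, `x3` are overwritten in place, so later components automatically see the fresh values.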


Exercise

Page 636: 5, 7.b, 7.d


Newton’s method

First consider solving the following system of two nonlinear equations:
$$f_1(x_1, x_2) = 0, \qquad f_2(x_1, x_2) = 0.$$
Suppose $(x_1^{(k)}, x_2^{(k)})$ is an approximation to the solution of the system above, and we try to compute $h_1^{(k)}$ and $h_2^{(k)}$ such that $(x_1^{(k)} + h_1^{(k)}, x_2^{(k)} + h_2^{(k)})$ satisfies the system. By Taylor's theorem for two variables,
$$\begin{aligned}
0 &= f_1(x_1^{(k)} + h_1^{(k)}, x_2^{(k)} + h_2^{(k)})
\approx f_1(x_1^{(k)}, x_2^{(k)}) + h_1^{(k)} \frac{\partial f_1}{\partial x_1}(x_1^{(k)}, x_2^{(k)}) + h_2^{(k)} \frac{\partial f_1}{\partial x_2}(x_1^{(k)}, x_2^{(k)}),\\
0 &= f_2(x_1^{(k)} + h_1^{(k)}, x_2^{(k)} + h_2^{(k)})
\approx f_2(x_1^{(k)}, x_2^{(k)}) + h_1^{(k)} \frac{\partial f_2}{\partial x_1}(x_1^{(k)}, x_2^{(k)}) + h_2^{(k)} \frac{\partial f_2}{\partial x_2}(x_1^{(k)}, x_2^{(k)}).
\end{aligned}$$


Put this in matrix form:
$$\begin{bmatrix}
\frac{\partial f_1}{\partial x_1}(x_1^{(k)}, x_2^{(k)}) & \frac{\partial f_1}{\partial x_2}(x_1^{(k)}, x_2^{(k)}) \\
\frac{\partial f_2}{\partial x_1}(x_1^{(k)}, x_2^{(k)}) & \frac{\partial f_2}{\partial x_2}(x_1^{(k)}, x_2^{(k)})
\end{bmatrix}
\begin{bmatrix} h_1^{(k)} \\ h_2^{(k)} \end{bmatrix}
+ \begin{bmatrix} f_1(x_1^{(k)}, x_2^{(k)}) \\ f_2(x_1^{(k)}, x_2^{(k)}) \end{bmatrix}
\approx \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$
The matrix
$$J(x_1^{(k)}, x_2^{(k)}) \equiv \begin{bmatrix}
\frac{\partial f_1}{\partial x_1}(x_1^{(k)}, x_2^{(k)}) & \frac{\partial f_1}{\partial x_2}(x_1^{(k)}, x_2^{(k)}) \\
\frac{\partial f_2}{\partial x_1}(x_1^{(k)}, x_2^{(k)}) & \frac{\partial f_2}{\partial x_2}(x_1^{(k)}, x_2^{(k)})
\end{bmatrix}$$
is called the Jacobian matrix. Let $h_1^{(k)}$ and $h_2^{(k)}$ be the solution of the linear system
$$J(x_1^{(k)}, x_2^{(k)}) \begin{bmatrix} h_1^{(k)} \\ h_2^{(k)} \end{bmatrix}
= - \begin{bmatrix} f_1(x_1^{(k)}, x_2^{(k)}) \\ f_2(x_1^{(k)}, x_2^{(k)}) \end{bmatrix};$$
then
$$\begin{bmatrix} x_1^{(k+1)} \\ x_2^{(k+1)} \end{bmatrix}
= \begin{bmatrix} x_1^{(k)} \\ x_2^{(k)} \end{bmatrix}
+ \begin{bmatrix} h_1^{(k)} \\ h_2^{(k)} \end{bmatrix}$$
is expected to be a better approximation.


In general, we solve the system of $n$ nonlinear equations $f_i(x_1, \ldots, x_n) = 0$, $i = 1, \ldots, n$. Let
$$x = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}^T
\quad \text{and} \quad
F(x) = \begin{bmatrix} f_1(x) & f_2(x) & \cdots & f_n(x) \end{bmatrix}^T.$$
The problem can be formulated as solving
$$F(x) = 0, \quad F : \mathbb{R}^n \to \mathbb{R}^n.$$
Let $J(x)$, whose $(i, j)$ entry is $\frac{\partial f_i}{\partial x_j}(x)$, be the $n \times n$ Jacobian matrix. Then Newton's iteration is defined as
$$x^{(k+1)} = x^{(k)} + h^{(k)},$$
where $h^{(k)} \in \mathbb{R}^n$ is the solution of the linear system
$$J(x^{(k)}) h^{(k)} = -F(x^{(k)}).$$


Algorithm 1 (Newton's Method for Systems)

Given a function $F : \mathbb{R}^n \to \mathbb{R}^n$, an initial guess $x^{(0)}$ to the zero of $F$, and stopping criteria $M$, $\delta$, and $\varepsilon$, this algorithm performs Newton's iteration to approximate one root of $F$.

Set $k = 0$ and $h^{(-1)} = e_1$.
While ($k < M$) and ($\| h^{(k-1)} \| \ge \delta$) and ($\| F(x^{(k)}) \| \ge \varepsilon$)
    Calculate $J(x^{(k)}) = [\partial F_i(x^{(k)}) / \partial x_j]$.
    Solve the $n \times n$ linear system $J(x^{(k)}) h^{(k)} = -F(x^{(k)})$.
    Set $x^{(k+1)} = x^{(k)} + h^{(k)}$ and $k = k + 1$.
End while
Output ("Convergent $x^{(k)}$") or ("Maximum number of iterations exceeded")
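The algorithm can be sketched in Python for the example system used throughout these slides. The helper names (`solve`, `newton`) and the analytic Jacobian layout are my own; a tiny Gaussian elimination stands in for a library linear solver.

```python
import math

def F(x):
    x1, x2, x3 = x
    return [3 * x1 - math.cos(x2 * x3) - 0.5,
            x1 ** 2 - 81 * (x2 + 0.1) ** 2 + math.sin(x3) + 1.06,
            math.exp(-x1 * x2) + 20 * x3 + (10 * math.pi - 3) / 3]

def J(x):
    x1, x2, x3 = x
    return [[3, x3 * math.sin(x2 * x3), x2 * math.sin(x2 * x3)],
            [2 * x1, -162 * (x2 + 0.1), math.cos(x3)],
            [-x2 * math.exp(-x1 * x2), -x1 * math.exp(-x1 * x2), 20]]

def solve(A, b):
    """Solve A h = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    h = [0.0] * n
    for r in range(n - 1, -1, -1):
        h[r] = (M[r][n] - sum(M[r][j] * h[j] for j in range(r + 1, n))) / M[r][r]
    return h

def newton(x, tol=1e-10, max_iter=20):
    for k in range(max_iter):
        h = solve(J(x), [-fi for fi in F(x)])   # J(x) h = -F(x)
        x = [xi + hi for xi, hi in zip(x, h)]
        if max(abs(hi) for hi in h) < tol:
            break
    return x

x = newton([0.1, 0.1, -0.1])
# converges to [0.5, 0.0, -pi/6] in a handful of iterations
```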


Theorem 5

Let $p$ be a solution of $G(x) = x$. Suppose there exists $\delta > 0$ such that

(i) $\partial g_i / \partial x_j$ is continuous on $N_\delta = \{x : \|x - p\| < \delta\}$ for all $i$ and $j$;

(ii) $\partial^2 g_i(x) / (\partial x_j \partial x_k)$ is continuous and
$$\left| \frac{\partial^2 g_i(x)}{\partial x_j \partial x_k} \right| \le M$$
for some constant $M$ whenever $x \in N_\delta$, for each $i$, $j$ and $k$;

(iii) $\partial g_i(p) / \partial x_k = 0$ for each $i$ and $k$.

Then there exists $\hat{\delta} < \delta$ such that the sequence $\{x^{(k)}\}$ generated by $x^{(k)} = G(x^{(k-1)})$ converges quadratically to $p$ for any $x^{(0)}$ satisfying $\|x^{(0)} - p\|_\infty < \hat{\delta}$. Moreover,
$$\|x^{(k)} - p\|_\infty \le \frac{n^2 M}{2} \, \|x^{(k-1)} - p\|_\infty^2, \quad \forall\, k \ge 1.$$


Example 6

Consider the nonlinear system
$$\begin{aligned}
3x_1 - \cos(x_2 x_3) - \tfrac{1}{2} &= 0,\\
x_1^2 - 81(x_2 + 0.1)^2 + \sin x_3 + 1.06 &= 0,\\
e^{-x_1 x_2} + 20 x_3 + \tfrac{10\pi - 3}{3} &= 0.
\end{aligned}$$

Nonlinear functions: let
$$F(x_1, x_2, x_3) = [f_1(x_1, x_2, x_3), f_2(x_1, x_2, x_3), f_3(x_1, x_2, x_3)]^T,$$
where
$$\begin{aligned}
f_1(x_1, x_2, x_3) &= 3x_1 - \cos(x_2 x_3) - \tfrac{1}{2},\\
f_2(x_1, x_2, x_3) &= x_1^2 - 81(x_2 + 0.1)^2 + \sin x_3 + 1.06,\\
f_3(x_1, x_2, x_3) &= e^{-x_1 x_2} + 20 x_3 + \tfrac{10\pi - 3}{3}.
\end{aligned}$$


Nonlinear functions (cont.):

The Jacobian matrix $J(x)$ for this system is
$$J(x_1, x_2, x_3) = \begin{bmatrix}
3 & x_3 \sin(x_2 x_3) & x_2 \sin(x_2 x_3) \\
2 x_1 & -162 (x_2 + 0.1) & \cos x_3 \\
-x_2 e^{-x_1 x_2} & -x_1 e^{-x_1 x_2} & 20
\end{bmatrix}.$$
Newton's iteration with initial $x^{(0)} = [0.1, 0.1, -0.1]^T$:
$$\begin{bmatrix} x_1^{(k)} \\ x_2^{(k)} \\ x_3^{(k)} \end{bmatrix}
= \begin{bmatrix} x_1^{(k-1)} \\ x_2^{(k-1)} \\ x_3^{(k-1)} \end{bmatrix}
- \begin{bmatrix} h_1^{(k-1)} \\ h_2^{(k-1)} \\ h_3^{(k-1)} \end{bmatrix},$$
where
$$\begin{bmatrix} h_1^{(k-1)} \\ h_2^{(k-1)} \\ h_3^{(k-1)} \end{bmatrix}
= J\big(x_1^{(k-1)}, x_2^{(k-1)}, x_3^{(k-1)}\big)^{-1} F\big(x_1^{(k-1)}, x_2^{(k-1)}, x_3^{(k-1)}\big).$$


Result:

k    x1(k)         x2(k)         x3(k)          ||x(k) − x(k−1)||
0    0.10000000    0.10000000    −0.10000000
1    0.50003702    0.01946686    −0.52152047    0.422
2    0.50004593    0.00158859    −0.52355711    1.79 × 10^{−2}
3    0.50000034    0.00001244    −0.52359845    1.58 × 10^{−3}
4    0.50000000    0.00000000    −0.52359877    1.24 × 10^{−5}
5    0.50000000    0.00000000    −0.52359877    0


Exercise

Page 644: 2, 8


Quasi-Newton methods

Newton's method

Disadvantage: each iteration requires $O(n^3) + O(n^2) + O(n)$ arithmetic operations:

$n^2$ partial derivatives to form the Jacobian matrix; in most situations the exact evaluation of the partial derivatives is inconvenient.

$n$ scalar functional evaluations of $F$.

$O(n^3)$ arithmetic operations to solve the linear system.

Quasi-Newton methods

Advantage: only $n$ scalar functional evaluations are required per iteration, together with $O(n^2)$ arithmetic operations.

Recall that in the one-dimensional case one uses the linear model
$$\ell_k(x) = f(x_k) + a_k (x - x_k)$$
to approximate the function $f(x)$ at $x_k$. That is, $\ell_k(x_k) = f(x_k)$ for any $a_k \in \mathbb{R}$. If we further require that $\ell_k'(x_k) = f'(x_k)$, then $a_k = f'(x_k)$.


The zero of $\ell_k(x)$ is used to give a new approximation to the zero of $f(x)$, that is,
$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)},$$
which yields Newton's method.

The secant method instead requires
$$\ell_k(x_k) = f(x_k) \quad \text{and} \quad \ell_k(x_{k-1}) = f(x_{k-1}).$$
In doing this, the identity
$$f(x_{k-1}) = \ell_k(x_{k-1}) = f(x_k) + a_k (x_{k-1} - x_k)$$
gives
$$a_k = \frac{f(x_k) - f(x_{k-1})}{x_k - x_{k-1}}.$$
Solving $\ell_k(x) = 0$ yields the secant iteration
$$x_{k+1} = x_k - \frac{x_k - x_{k-1}}{f(x_k) - f(x_{k-1})} \, f(x_k).$$
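The secant iteration is short enough to state directly in code. A minimal sketch (function name and stopping rule are mine), applied to $f(x) = x^2 - 2$:

```python
def secant(f, x0, x1, tol=1e-10, max_iter=50):
    """Secant iteration: x_{k+1} = x_k - (x_k - x_{k-1})/(f(x_k) - f(x_{k-1})) * f(x_k)."""
    for _ in range(max_iter):
        f0, f1 = f(x0), f(x1)
        if f1 == f0:          # denominator vanished; iterates have converged
            return x1
        x2 = x1 - (x1 - x0) / (f1 - f0) * f1
        if abs(x2 - x1) < tol:
            return x2
        x0, x1 = x1, x2
    return x1

root = secant(lambda x: x * x - 2.0, 1.0, 2.0)
# root approaches sqrt(2)
```

Note that, unlike Newton's method, no derivative $f'$ is ever evaluated; the difference quotient $a_k$ plays its role.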


In multiple dimensions, the analogous affine model becomes
$$M_k(x) = F(x^{(k)}) + A_k (x - x^{(k)}),$$
where $x, x^{(k)} \in \mathbb{R}^n$ and $A_k \in \mathbb{R}^{n \times n}$, and it satisfies
$$M_k(x^{(k)}) = F(x^{(k)})$$
for any $A_k$. The zero of $M_k(x)$ is then used to give a new approximation to the zero of $F(x)$, that is,
$$x^{(k+1)} = x^{(k)} - A_k^{-1} F(x^{(k)}).$$
Newton's method chooses
$$A_k = F'(x^{(k)}) \equiv J(x^{(k)}) = \text{the Jacobian matrix}$$
and yields the iteration
$$x^{(k+1)} = x^{(k)} - \big[ F'(x^{(k)}) \big]^{-1} F(x^{(k)}).$$


When the Jacobian matrix $J(x^{(k)}) \equiv F'(x^{(k)})$ is not available, one can instead require
$$M_k(x^{(k-1)}) = F(x^{(k-1)}).$$
Then
$$F(x^{(k-1)}) = M_k(x^{(k-1)}) = F(x^{(k)}) + A_k (x^{(k-1)} - x^{(k)}),$$
which gives
$$A_k (x^{(k)} - x^{(k-1)}) = F(x^{(k)}) - F(x^{(k-1)});$$
this is the so-called secant equation. Let
$$h^{(k)} = x^{(k)} - x^{(k-1)} \quad \text{and} \quad y^{(k)} = F(x^{(k)}) - F(x^{(k-1)}).$$
The secant equation becomes
$$A_k h^{(k)} = y^{(k)}.$$


However, this secant equation cannot uniquely determine $A_k$. One way of choosing $A_k$ is to minimize $M_k - M_{k-1}$ subject to the secant equation. Note that
$$\begin{aligned}
M_k(x) - M_{k-1}(x)
&= F(x^{(k)}) + A_k (x - x^{(k)}) - F(x^{(k-1)}) - A_{k-1}(x - x^{(k-1)})\\
&= \big( F(x^{(k)}) - F(x^{(k-1)}) \big) + A_k (x - x^{(k)}) - A_{k-1}(x - x^{(k-1)})\\
&= A_k (x^{(k)} - x^{(k-1)}) + A_k (x - x^{(k)}) - A_{k-1}(x - x^{(k-1)})\\
&= A_k (x - x^{(k-1)}) - A_{k-1}(x - x^{(k-1)})\\
&= (A_k - A_{k-1})(x - x^{(k-1)}).
\end{aligned}$$
For any $x \in \mathbb{R}^n$, we express
$$x - x^{(k-1)} = \alpha h^{(k)} + t^{(k)}$$
for some $\alpha \in \mathbb{R}$ and $t^{(k)} \in \mathbb{R}^n$ with $(h^{(k)})^T t^{(k)} = 0$. Then
$$M_k - M_{k-1} = (A_k - A_{k-1})(\alpha h^{(k)} + t^{(k)}) = \alpha (A_k - A_{k-1}) h^{(k)} + (A_k - A_{k-1}) t^{(k)}.$$


Since
$$(A_k - A_{k-1}) h^{(k)} = A_k h^{(k)} - A_{k-1} h^{(k)} = y^{(k)} - A_{k-1} h^{(k)},$$
and both $y^{(k)}$ and $A_{k-1} h^{(k)}$ are old values, we have no control over the first part $(A_k - A_{k-1}) h^{(k)}$. In order to minimize $M_k(x) - M_{k-1}(x)$, we try to choose $A_k$ so that
$$(A_k - A_{k-1}) t^{(k)} = 0 \quad \text{for all } t^{(k)} \in \mathbb{R}^n \text{ with } (h^{(k)})^T t^{(k)} = 0.$$
This requires $A_k - A_{k-1}$ to be a rank-one matrix of the form
$$A_k - A_{k-1} = u^{(k)} (h^{(k)})^T$$
for some $u^{(k)} \in \mathbb{R}^n$. Then
$$u^{(k)} (h^{(k)})^T h^{(k)} = (A_k - A_{k-1}) h^{(k)} = y^{(k)} - A_{k-1} h^{(k)},$$


which gives
$$u^{(k)} = \frac{y^{(k)} - A_{k-1} h^{(k)}}{(h^{(k)})^T h^{(k)}}.$$
Therefore,
$$A_k = A_{k-1} + \frac{\big( y^{(k)} - A_{k-1} h^{(k)} \big) (h^{(k)})^T}{(h^{(k)})^T h^{(k)}}. \tag{1}$$
After $A_k$ is determined, the new iterate $x^{(k+1)}$ is derived from solving $M_k(x) = 0$. It can be done by first noting that
$$h^{(k+1)} = x^{(k+1)} - x^{(k)} \implies x^{(k+1)} = x^{(k)} + h^{(k+1)}$$
and
$$M_k(x^{(k+1)}) = 0 \implies A_k h^{(k+1)} = -F(x^{(k)}).$$
These formulations give Broyden's method.


Algorithm 2 (Broyden's Method)

Given $F : \mathbb{R}^n \to \mathbb{R}^n$, an initial vector $x^{(0)}$, an initial Jacobian approximation $A_0 \in \mathbb{R}^{n \times n}$ (e.g., $A_0 = I$), a tolerance $TOL$, and a maximum number of iterations $M$.

Set $k = 1$.
While $k \le M$ and $\| x^{(k)} - x^{(k-1)} \|_2 \ge TOL$
    Solve $A_k h^{(k+1)} = -F(x^{(k)})$ for $h^{(k+1)}$.
    Update $x^{(k+1)} = x^{(k)} + h^{(k+1)}$.
    Compute $y^{(k+1)} = F(x^{(k+1)}) - F(x^{(k)})$.
    Update
    $$A_{k+1} = A_k + \frac{\big( y^{(k+1)} + F(x^{(k)}) \big) (h^{(k+1)})^T}{(h^{(k+1)})^T h^{(k+1)}}.$$
    Set $k = k + 1$.
End While

(Note that $y^{(k+1)} + F(x^{(k)}) = y^{(k+1)} - A_k h^{(k+1)}$, since $A_k h^{(k+1)} = -F(x^{(k)})$, so this update is exactly (1).)
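A Python sketch of Broyden's method for the example system, under one assumption of mine: instead of $A_0 = I$, the initial $A_0$ is a forward-difference approximation of the Jacobian (a common practical choice). All helper names are mine.

```python
import math

def F(x):
    x1, x2, x3 = x
    return [3 * x1 - math.cos(x2 * x3) - 0.5,
            x1 ** 2 - 81 * (x2 + 0.1) ** 2 + math.sin(x3) + 1.06,
            math.exp(-x1 * x2) + 20 * x3 + (10 * math.pi - 3) / 3]

def solve(A, b):
    """Solve A h = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    h = [0.0] * n
    for r in range(n - 1, -1, -1):
        h[r] = (M[r][n] - sum(M[r][j] * h[j] for j in range(r + 1, n))) / M[r][r]
    return h

def fd_jacobian(F, x, eps=1e-7):
    """Forward-difference approximation to the Jacobian, used as A0."""
    n, Fx = len(x), F(x)
    A = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp = x[:]
        xp[j] += eps
        Fp = F(xp)
        for i in range(n):
            A[i][j] = (Fp[i] - Fx[i]) / eps
    return A

def broyden(F, x, tol=1e-10, max_iter=50):
    n = len(x)
    A = fd_jacobian(F, x)
    for _ in range(max_iter):
        Fx = F(x)
        h = solve(A, [-v for v in Fx])                 # A h = -F(x)
        x_new = [xi + hi for xi, hi in zip(x, h)]
        if max(abs(hi) for hi in h) < tol:
            return x_new
        y = [a - b for a, b in zip(F(x_new), Fx)]      # secant data
        Ah = [sum(A[i][j] * h[j] for j in range(n)) for i in range(n)]
        hh = sum(v * v for v in h)
        for i in range(n):                             # A += (y - A h) h^T / (h^T h)
            c = (y[i] - Ah[i]) / hh
            for j in range(n):
                A[i][j] += c * h[j]
        x = x_new
    return x

x = broyden(F, [0.1, 0.1, -0.1])
```

Only one evaluation of `F` per iteration feeds the rank-one update; no Jacobian is recomputed after the initial $A_0$.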


Solve the linear system $A_k h^{(k+1)} = -F(x^{(k)})$ for $h^{(k+1)}$:

LU factorization costs $\tfrac{2}{3} n^3 + O(n^2)$ floating-point operations.

Applying the Sherman-Morrison-Woodbury formula
$$\big( B + U V^T \big)^{-1} = B^{-1} - B^{-1} U \big( I + V^T B^{-1} U \big)^{-1} V^T B^{-1}$$
to (1), we have
$$\begin{aligned}
A_k^{-1}
&= \left( A_{k-1} + \frac{\big( y^{(k)} - A_{k-1} h^{(k)} \big) (h^{(k)})^T}{(h^{(k)})^T h^{(k)}} \right)^{-1}\\
&= A_{k-1}^{-1} - \frac{A_{k-1}^{-1} \big( y^{(k)} - A_{k-1} h^{(k)} \big)}{(h^{(k)})^T h^{(k)}}
\left( 1 + \frac{(h^{(k)})^T A_{k-1}^{-1} \big( y^{(k)} - A_{k-1} h^{(k)} \big)}{(h^{(k)})^T h^{(k)}} \right)^{-1}
(h^{(k)})^T A_{k-1}^{-1}\\
&= A_{k-1}^{-1} + \frac{\big( h^{(k)} - A_{k-1}^{-1} y^{(k)} \big) (h^{(k)})^T A_{k-1}^{-1}}{(h^{(k)})^T A_{k-1}^{-1} y^{(k)}}.
\end{aligned}$$
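The closed-form inverse update can be sanity-checked numerically. A small check of my own on a concrete $2 \times 2$ example (the matrices and vectors are arbitrary test data, not from the slides): form $A_k$ by the rank-one update (1), form $A_k^{-1}$ by the Sherman-Morrison expression above, and verify their product is the identity.

```python
A     = [[4.0, 1.0], [2.0, 3.0]]          # A_{k-1}, det = 10
A_inv = [[0.3, -0.1], [-0.2, 0.4]]        # its exact inverse
h, y  = [1.0, 2.0], [3.0, 1.0]            # secant data h^{(k)}, y^{(k)}

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

# Updated matrix A_k = A + (y - A h) h^T / (h^T h)
Ah = matvec(A, h)
hh = sum(v * v for v in h)
Ak = [[A[i][j] + (y[i] - Ah[i]) * h[j] / hh for j in range(2)] for i in range(2)]

# Closed-form inverse: A_k^{-1} = A^{-1} + (h - A^{-1} y) h^T A^{-1} / (h^T A^{-1} y)
Ainv_y = matvec(A_inv, y)
hT_Ainv = [sum(h[i] * A_inv[i][j] for i in range(2)) for j in range(2)]
denom = sum(a * b for a, b in zip(h, Ainv_y))     # h^T A^{-1} y
Ak_inv = [[A_inv[i][j] + (h[i] - Ainv_y[i]) * hT_Ainv[j] / denom
           for j in range(2)] for i in range(2)]

# The product A_k * A_k^{-1} should be the identity
P = [[sum(Ak[i][k] * Ak_inv[k][j] for k in range(2)) for j in range(2)]
     for i in range(2)]
```

Since the update is rank-one, computing $A_k^{-1}$ this way costs only $O(n^2)$ operations per iteration instead of the $O(n^3)$ of a fresh factorization.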


Steepest Descent Techniques

Newton-based methods

Advantage: high speed of convergence once a sufficiently accurate approximation is obtained.

Weakness: an accurate initial approximation to the solution is needed to ensure convergence.

The Steepest Descent method converges only linearly to the solution, but it will usually converge even for poor initial approximations. A common strategy: "find a sufficiently accurate starting approximation using the Steepest Descent method" + "compute the convergent solution using a Newton-based method".

The method of Steepest Descent determines a local minimum of a multivariable function $g : \mathbb{R}^n \to \mathbb{R}$. A system of the form $f_i(x_1, \ldots, x_n) = 0$, $i = 1, 2, \ldots, n$, has a solution at $x$ if and only if the function $g$ defined by
$$g(x_1, \ldots, x_n) = \sum_{i=1}^{n} \big[ f_i(x_1, \ldots, x_n) \big]^2$$
has the minimal value zero.


Basic idea of the steepest descent method:

(i) Evaluate $g$ at an initial approximation $x^{(0)}$;

(ii) Determine a direction from $x^{(0)}$ that results in a decrease in the value of $g$;

(iii) Move an appropriate distance in this direction and call the new vector $x^{(1)}$;

(iv) Repeat steps (i) through (iii) with $x^{(0)}$ replaced by $x^{(1)}$.

Definition 7 (Gradient)

If $g : \mathbb{R}^n \to \mathbb{R}$, the gradient of $g$ at $x$, $\nabla g(x)$, is defined by
$$\nabla g(x) = \left( \frac{\partial g}{\partial x_1}(x), \ \cdots, \ \frac{\partial g}{\partial x_n}(x) \right).$$

Definition 8 (Directional Derivative)

The directional derivative of $g$ at $x$ in the direction of $v$ with $\| v \|_2 = 1$ is defined by
$$D_v g(x) = \lim_{h \to 0} \frac{g(x + h v) - g(x)}{h} = v^T \nabla g(x).$$


Theorem 9

The direction of the greatest decrease in the value of $g$ at $x$ is the direction given by $-\nabla g(x)$.

Objective: reduce $g(x)$ to its minimal value zero.

$\Rightarrow$ for an initial approximation $x^{(0)}$, an appropriate choice for the new vector $x^{(1)}$ is
$$x^{(1)} = x^{(0)} - \alpha \nabla g(x^{(0)}), \quad \text{for some constant } \alpha > 0.$$

To choose $\alpha > 0$ such that $g(x^{(1)}) < g(x^{(0)})$: define
$$h(\alpha) = g\big( x^{(0)} - \alpha \nabla g(x^{(0)}) \big),$$
then find $\alpha^*$ such that
$$h(\alpha^*) = \min_{\alpha} h(\alpha).$$


How to find $\alpha$?

Solving the root-finding problem $h'(\alpha) = 0$ directly is, in general, too costly.

Instead, choose three numbers $\alpha_1 < \alpha_2 < \alpha_3$ and construct the quadratic polynomial $P(\alpha)$ that interpolates $h$ at $\alpha_1$, $\alpha_2$ and $\alpha_3$, i.e.,
$$P(\alpha_1) = h(\alpha_1), \quad P(\alpha_2) = h(\alpha_2), \quad P(\alpha_3) = h(\alpha_3),$$
to approximate $h$. Use the minimizer $\hat{\alpha}$ of $P$ in $[\alpha_1, \alpha_3]$ to approximate the minimizer of $h(\alpha)$. The new iterate is
$$x^{(1)} = x^{(0)} - \hat{\alpha} \nabla g(x^{(0)}).$$

To minimize the computation: set $\alpha_1 = 0$; find $\alpha_3$ with $h(\alpha_3) < h(\alpha_1)$; choose $\alpha_2 = \alpha_3 / 2$.


Example 10

Use the Steepest Descent method with $x^{(0)} = (0, 0, 0)^T$ to find a reasonable starting approximation to the solution of the nonlinear system
$$\begin{aligned}
f_1(x_1, x_2, x_3) &= 3x_1 - \cos(x_2 x_3) - \tfrac{1}{2} = 0,\\
f_2(x_1, x_2, x_3) &= x_1^2 - 81(x_2 + 0.1)^2 + \sin x_3 + 1.06 = 0,\\
f_3(x_1, x_2, x_3) &= e^{-x_1 x_2} + 20 x_3 + \tfrac{10\pi - 3}{3} = 0.
\end{aligned}$$

Let $g(x_1, x_2, x_3) = [f_1(x_1, x_2, x_3)]^2 + [f_2(x_1, x_2, x_3)]^2 + [f_3(x_1, x_2, x_3)]^2$. Then
$$\nabla g(x) = \left( 2 \sum_{i=1}^{3} f_i(x) \frac{\partial f_i}{\partial x_1}(x), \
2 \sum_{i=1}^{3} f_i(x) \frac{\partial f_i}{\partial x_2}(x), \
2 \sum_{i=1}^{3} f_i(x) \frac{\partial f_i}{\partial x_3}(x) \right).$$


For $x^{(0)} = [0, 0, 0]^T$, we have
$$g(x^{(0)}) = 111.975 \quad \text{and} \quad z_0 = \| \nabla g(x^{(0)}) \|_2 = 419.554.$$
Let
$$z = \frac{1}{z_0} \nabla g(x^{(0)}) = [-0.0214514, \ -0.0193062, \ 0.999583]^T.$$
With $\alpha_1 = 0$, we have
$$g_1 = g(x^{(0)} - \alpha_1 z) = g(x^{(0)}) = 111.975.$$
Let $\alpha_3 = 1$, so that
$$g_3 = g(x^{(0)} - \alpha_3 z) = 93.5649 < g_1.$$
Set $\alpha_2 = \alpha_3 / 2 = 0.5$. Thus
$$g_2 = g(x^{(0)} - \alpha_2 z) = 2.53557.$$

Form the quadratic polynomial
$$P(\alpha) = g_1 + h_1 \alpha + h_3 \alpha (\alpha - \alpha_2)$$
that interpolates $g(x^{(0)} - \alpha z)$ at $\alpha_1 = 0$, $\alpha_2 = 0.5$ and $\alpha_3 = 1$ as follows:
$$g_2 = P(\alpha_2) = g_1 + h_1 \alpha_2 \implies h_1 = \frac{g_2 - g_1}{\alpha_2} = -218.878,$$
$$g_3 = P(\alpha_3) = g_1 + h_1 \alpha_3 + h_3 \alpha_3 (\alpha_3 - \alpha_2) \implies h_3 = 400.937.$$
Thus
$$P(\alpha) = 111.975 - 218.878 \alpha + 400.937 \alpha (\alpha - 0.5),$$
so that
$$0 = P'(\alpha_0) = -419.346 + 801.872 \alpha_0 \implies \alpha_0 = 0.522959.$$
Since
$$g_0 = g(x^{(0)} - \alpha_0 z) = 2.32762 < \min\{ g_1, g_3 \},$$
we set
$$x^{(1)} = x^{(0)} - \alpha_0 z = [0.0112182, \ 0.0100964, \ -0.522741]^T.$$
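The first steepest-descent step above can be reproduced in Python. A sketch with my own helper names; the gradient is computed as $2 J(x)^T F(x)$, using the analytic Jacobian from Example 6.

```python
import math

def F(x):
    x1, x2, x3 = x
    return [3 * x1 - math.cos(x2 * x3) - 0.5,
            x1 ** 2 - 81 * (x2 + 0.1) ** 2 + math.sin(x3) + 1.06,
            math.exp(-x1 * x2) + 20 * x3 + (10 * math.pi - 3) / 3]

def Jac(x):
    x1, x2, x3 = x
    return [[3, x3 * math.sin(x2 * x3), x2 * math.sin(x2 * x3)],
            [2 * x1, -162 * (x2 + 0.1), math.cos(x3)],
            [-x2 * math.exp(-x1 * x2), -x1 * math.exp(-x1 * x2), 20]]

def g(x):
    return sum(fi * fi for fi in F(x))

def grad_g(x):
    """grad g(x) = 2 J(x)^T F(x)."""
    Fx, Jx = F(x), Jac(x)
    return [2 * sum(Fx[i] * Jx[i][j] for i in range(3)) for j in range(3)]

x0 = [0.0, 0.0, 0.0]
g1 = g(x0)                                    # 111.975
gr = grad_g(x0)
z0 = math.sqrt(sum(v * v for v in gr))        # 419.554
z = [v / z0 for v in gr]                      # normalized descent direction

# Quadratic interpolation of h(a) = g(x0 - a z) at a1 = 0, a2 = 0.5, a3 = 1
a2, a3 = 0.5, 1.0
g2 = g([xi - a2 * zi for xi, zi in zip(x0, z)])
g3 = g([xi - a3 * zi for xi, zi in zip(x0, z)])
h1 = (g2 - g1) / a2
h3 = (g3 - g1 - h1 * a3) / (a3 * (a3 - a2))
a0 = 0.5 * (a2 - h1 / h3)                     # zero of P'(a)

x1 = [xi - a0 * zi for xi, zi in zip(x0, z)]  # first steepest-descent iterate
```

Repeating these steps with `x0` replaced by `x1` drives $g$ toward zero, at which point a Newton-based method can take over.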

