師大

**Nonlinear systems of equations**

Tsung-Ming Huang

Department of Mathematics, National Taiwan Normal University, Taiwan

E-mail: min@ntnu.edu.tw

September 12, 2015


**Outline**

**1** **Fixed points for functions of several variables**

**2** **Newton’s method**

**3** **Quasi-Newton methods**

**4** **Steepest Descent Techniques**


**Fixed points for functions of several variables**

**Theorem 1**

Let $f : D \subset \mathbb{R}^n \to \mathbb{R}$ be a function and $\mathbf{x}_0 \in D$. If all the partial derivatives of $f$ exist and there exist $\delta > 0$ and $\alpha > 0$ such that for all $\mathbf{x} \in D$ with $\|\mathbf{x} - \mathbf{x}_0\| < \delta$, we have

$$\left| \frac{\partial f(\mathbf{x})}{\partial x_j} \right| \le \alpha, \quad j = 1, 2, \ldots, n,$$

then $f$ is continuous at $\mathbf{x}_0$.

**Definition 2 (Fixed Point)**

A function $G$ from $D \subset \mathbb{R}^n$ into $\mathbb{R}^n$ has a fixed point at $\mathbf{p} \in D$ if $G(\mathbf{p}) = \mathbf{p}$.


**Theorem 3 (Contraction Mapping Theorem)**

Let $D = \{(x_1, \ldots, x_n)^T : a_i \le x_i \le b_i,\ i = 1, \ldots, n\} \subset \mathbb{R}^n$. Suppose $G : D \to \mathbb{R}^n$ is a continuous function with $G(\mathbf{x}) \in D$ whenever $\mathbf{x} \in D$. Then $G$ has a fixed point in $D$.

Suppose, in addition, that $G$ has continuous partial derivatives and a constant $\alpha < 1$ exists with

$$\left| \frac{\partial g_i(\mathbf{x})}{\partial x_j} \right| \le \frac{\alpha}{n}, \quad \text{whenever } \mathbf{x} \in D,$$

for $i, j = 1, \ldots, n$. Then, for any $\mathbf{x}^{(0)} \in D$, the sequence

$$\mathbf{x}^{(k)} = G(\mathbf{x}^{(k-1)}), \quad k \ge 1,$$

converges to the unique fixed point $\mathbf{p} \in D$, and

$$\|\mathbf{x}^{(k)} - \mathbf{p}\|_\infty \le \frac{\alpha^k}{1 - \alpha} \|\mathbf{x}^{(1)} - \mathbf{x}^{(0)}\|_\infty.$$


**Example 4**

Consider the nonlinear system

$$\begin{aligned}
3x_1 - \cos(x_2 x_3) - \tfrac{1}{2} &= 0, \\
x_1^2 - 81(x_2 + 0.1)^2 + \sin x_3 + 1.06 &= 0, \\
e^{-x_1 x_2} + 20 x_3 + \tfrac{10\pi - 3}{3} &= 0.
\end{aligned}$$

Fixed-point problem:

Change the system into the fixed-point problem:

$$\begin{aligned}
x_1 &= \tfrac{1}{3} \cos(x_2 x_3) + \tfrac{1}{6} \equiv g_1(x_1, x_2, x_3), \\
x_2 &= \tfrac{1}{9} \sqrt{x_1^2 + \sin x_3 + 1.06} - 0.1 \equiv g_2(x_1, x_2, x_3), \\
x_3 &= -\tfrac{1}{20} e^{-x_1 x_2} - \tfrac{10\pi - 3}{60} \equiv g_3(x_1, x_2, x_3).
\end{aligned}$$

Let $G : \mathbb{R}^3 \to \mathbb{R}^3$ be defined by $G(\mathbf{x}) = [g_1(\mathbf{x}), g_2(\mathbf{x}), g_3(\mathbf{x})]^T$.


• G has a unique fixed point in $D \equiv [-1, 1] \times [-1, 1] \times [-1, 1]$:

Existence: for all $\mathbf{x} \in D$,

$$\begin{aligned}
|g_1(\mathbf{x})| &\le \tfrac{1}{3} |\cos(x_2 x_3)| + \tfrac{1}{6} \le 0.5, \\
|g_2(\mathbf{x})| &= \left| \tfrac{1}{9} \sqrt{x_1^2 + \sin x_3 + 1.06} - 0.1 \right| \le \tfrac{1}{9} \sqrt{1 + \sin 1 + 1.06} - 0.1 < 0.09, \\
|g_3(\mathbf{x})| &= \tfrac{1}{20} e^{-x_1 x_2} + \tfrac{10\pi - 3}{60} \le \tfrac{1}{20} e + \tfrac{10\pi - 3}{60} < 0.61,
\end{aligned}$$

which implies that $G(\mathbf{x}) \in D$ whenever $\mathbf{x} \in D$.

Uniqueness:

$$\frac{\partial g_1}{\partial x_1} = 0, \quad \frac{\partial g_2}{\partial x_2} = 0 \quad \text{and} \quad \frac{\partial g_3}{\partial x_3} = 0,$$

as well as

$$\left| \frac{\partial g_1}{\partial x_2} \right| \le \tfrac{1}{3} |x_3| \cdot |\sin(x_2 x_3)| \le \tfrac{1}{3} \sin 1 < 0.281,$$


$$\begin{aligned}
\left| \frac{\partial g_1}{\partial x_3} \right| &\le \tfrac{1}{3} |x_2| \cdot |\sin(x_2 x_3)| \le \tfrac{1}{3} \sin 1 < 0.281, \\
\left| \frac{\partial g_2}{\partial x_1} \right| &= \frac{|x_1|}{9 \sqrt{x_1^2 + \sin x_3 + 1.06}} < \frac{1}{9 \sqrt{0.218}} < 0.238, \\
\left| \frac{\partial g_2}{\partial x_3} \right| &= \frac{|\cos x_3|}{18 \sqrt{x_1^2 + \sin x_3 + 1.06}} < \frac{1}{18 \sqrt{0.218}} < 0.119, \\
\left| \frac{\partial g_3}{\partial x_1} \right| &= \frac{|x_2|}{20} e^{-x_1 x_2} \le \tfrac{1}{20} e < 0.14, \\
\left| \frac{\partial g_3}{\partial x_2} \right| &= \frac{|x_1|}{20} e^{-x_1 x_2} \le \tfrac{1}{20} e < 0.14.
\end{aligned}$$

These imply that $g_1$, $g_2$ and $g_3$ are continuous on $D$ and, for all $\mathbf{x} \in D$,

$$\left| \frac{\partial g_i}{\partial x_j} \right| \le 0.281, \quad \forall\, i, j.$$

Similarly, the $\partial g_i / \partial x_j$ are continuous on $D$ for all $i$ and $j$. Consequently, $G$ has a unique fixed point in $D$.
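The tabulated bounds can be double-checked numerically. The following sketch (my own, not part of the original slides) evaluates the analytic partial derivatives of $g_1$, $g_2$, $g_3$ on a grid over $D = [-1, 1]^3$ and confirms the uniform bound $0.281$, which gives $\alpha = 3 \cdot 0.281 = 0.843 < 1$ in Theorem 3:

```python
import math
import itertools

def abs_partials(x1, x2, x3):
    """Absolute values of all partial derivatives of g1, g2, g3 at a point."""
    s = abs(math.sin(x2 * x3))
    r = math.sqrt(x1**2 + math.sin(x3) + 1.06)
    e = math.exp(-x1 * x2)
    return [
        0.0, abs(x3) * s / 3, abs(x2) * s / 3,                 # |dg1/dx1|, |dg1/dx2|, |dg1/dx3|
        abs(x1) / (9 * r), 0.0, abs(math.cos(x3)) / (18 * r),  # |dg2/dx1|, |dg2/dx2|, |dg2/dx3|
        abs(x2) * e / 20, abs(x1) * e / 20, 0.0,               # |dg3/dx1|, |dg3/dx2|, |dg3/dx3|
    ]

grid = [i / 10 - 1 for i in range(21)]   # 21 equally spaced points in [-1, 1]
bound = max(p for x in itertools.product(grid, repeat=3) for p in abs_partials(*x))
```

The extreme values here occur at corners of $D$, so the grid maximum matches the analytic maximum $\tfrac{1}{3}\sin 1 \approx 0.2805$.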


• Approximated solution:

Fixed-point iteration (I):

Choosing $\mathbf{x}^{(0)} = [0.1, 0.1, -0.1]^T$, the sequence $\{\mathbf{x}^{(k)}\}$ is generated by

$$\begin{aligned}
x_1^{(k)} &= \tfrac{1}{3} \cos\big(x_2^{(k-1)} x_3^{(k-1)}\big) + \tfrac{1}{6}, \\
x_2^{(k)} &= \tfrac{1}{9} \sqrt{\big(x_1^{(k-1)}\big)^2 + \sin x_3^{(k-1)} + 1.06} - 0.1, \\
x_3^{(k)} &= -\tfrac{1}{20} e^{-x_1^{(k-1)} x_2^{(k-1)}} - \tfrac{10\pi - 3}{60}.
\end{aligned}$$

Result:

| $k$ | $x_1^{(k)}$ | $x_2^{(k)}$ | $x_3^{(k)}$ | $\|\mathbf{x}^{(k)} - \mathbf{x}^{(k-1)}\|_\infty$ |
|---|---|---|---|---|
| 0 | 0.10000000 | 0.10000000 | -0.10000000 | |
| 1 | 0.49998333 | 0.00944115 | -0.52310127 | 0.423 |
| 2 | 0.49999593 | 0.00002557 | -0.52336331 | $9.4 \times 10^{-3}$ |
| 3 | 0.50000000 | 0.00001234 | -0.52359814 | $2.3 \times 10^{-4}$ |
| 4 | 0.50000000 | 0.00000003 | -0.52359847 | $1.2 \times 10^{-5}$ |
| 5 | 0.50000000 | 0.00000002 | -0.52359877 | $3.1 \times 10^{-7}$ |
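The iteration above is easy to reproduce. A minimal sketch (function names are mine):

```python
import math

def G(x1, x2, x3):
    """One application of the fixed-point map from Example 4."""
    return (math.cos(x2 * x3) / 3 + 1 / 6,
            math.sqrt(x1**2 + math.sin(x3) + 1.06) / 9 - 0.1,
            -math.exp(-x1 * x2) / 20 - (10 * math.pi - 3) / 60)

def fixed_point(x, tol=1e-7, max_iter=50):
    """Iterate x^(k) = G(x^(k-1)) until the infinity-norm update is below tol."""
    for k in range(1, max_iter + 1):
        x_new = G(*x)
        if max(abs(a - b) for a, b in zip(x_new, x)) < tol:
            return x_new, k
        x = x_new
    return x, max_iter

x, k = fixed_point((0.1, 0.1, -0.1))
# x approaches the true solution (0.5, 0, -pi/6)
```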


• Approximated solution (cont.):

Accelerate convergence of the fixed-point iteration:

$$\begin{aligned}
x_1^{(k)} &= \tfrac{1}{3} \cos\big(x_2^{(k-1)} x_3^{(k-1)}\big) + \tfrac{1}{6}, \\
x_2^{(k)} &= \tfrac{1}{9} \sqrt{\big(x_1^{(k)}\big)^2 + \sin x_3^{(k-1)} + 1.06} - 0.1, \\
x_3^{(k)} &= -\tfrac{1}{20} e^{-x_1^{(k)} x_2^{(k)}} - \tfrac{10\pi - 3}{60},
\end{aligned}$$

as in the Gauss-Seidel method for linear systems.

Result:

| $k$ | $x_1^{(k)}$ | $x_2^{(k)}$ | $x_3^{(k)}$ | $\|\mathbf{x}^{(k)} - \mathbf{x}^{(k-1)}\|_\infty$ |
|---|---|---|---|---|
| 0 | 0.10000000 | 0.10000000 | -0.10000000 | |
| 1 | 0.49998333 | 0.02222979 | -0.52304613 | 0.423 |
| 2 | 0.49997747 | 0.00002815 | -0.52359807 | $2.2 \times 10^{-2}$ |
| 3 | 0.50000000 | 0.00000004 | -0.52359877 | $2.8 \times 10^{-5}$ |
| 4 | 0.50000000 | 0.00000000 | -0.52359877 | $3.8 \times 10^{-8}$ |
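The Gauss-Seidel-style acceleration only changes which components feed each update; a sketch:

```python
import math

def seidel_step(x1, x2, x3):
    """One sweep using the most recently computed components, as in Gauss-Seidel."""
    x1 = math.cos(x2 * x3) / 3 + 1 / 6
    x2 = math.sqrt(x1**2 + math.sin(x3) + 1.06) / 9 - 0.1     # uses the new x1
    x3 = -math.exp(-x1 * x2) / 20 - (10 * math.pi - 3) / 60   # uses the new x1, x2
    return x1, x2, x3

x = (0.1, 0.1, -0.1)
for _ in range(4):
    x = seidel_step(*x)
# after 4 sweeps, x agrees with the solution (0.5, 0, -pi/6) to about 1e-7
```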


**Exercise**

Page 636: 5, 7.b, 7.d


**Newton’s method**

First consider solving the following system of two nonlinear equations:

$$\begin{cases} f_1(x_1, x_2) = 0, \\ f_2(x_1, x_2) = 0. \end{cases}$$

Suppose $(x_1^{(k)}, x_2^{(k)})$ is an approximation to the solution of the system above, and we try to compute $h_1^{(k)}$ and $h_2^{(k)}$ such that $(x_1^{(k)} + h_1^{(k)}, x_2^{(k)} + h_2^{(k)})$ satisfies the system. By Taylor's theorem for two variables,

$$\begin{aligned}
0 = f_1\big(x_1^{(k)} + h_1^{(k)}, x_2^{(k)} + h_2^{(k)}\big) &\approx f_1\big(x_1^{(k)}, x_2^{(k)}\big) + h_1^{(k)} \frac{\partial f_1}{\partial x_1}\big(x_1^{(k)}, x_2^{(k)}\big) + h_2^{(k)} \frac{\partial f_1}{\partial x_2}\big(x_1^{(k)}, x_2^{(k)}\big), \\
0 = f_2\big(x_1^{(k)} + h_1^{(k)}, x_2^{(k)} + h_2^{(k)}\big) &\approx f_2\big(x_1^{(k)}, x_2^{(k)}\big) + h_1^{(k)} \frac{\partial f_2}{\partial x_1}\big(x_1^{(k)}, x_2^{(k)}\big) + h_2^{(k)} \frac{\partial f_2}{\partial x_2}\big(x_1^{(k)}, x_2^{(k)}\big).
\end{aligned}$$


Put this in matrix form:

$$\begin{bmatrix} \frac{\partial f_1}{\partial x_1}\big(x_1^{(k)}, x_2^{(k)}\big) & \frac{\partial f_1}{\partial x_2}\big(x_1^{(k)}, x_2^{(k)}\big) \\ \frac{\partial f_2}{\partial x_1}\big(x_1^{(k)}, x_2^{(k)}\big) & \frac{\partial f_2}{\partial x_2}\big(x_1^{(k)}, x_2^{(k)}\big) \end{bmatrix} \begin{bmatrix} h_1^{(k)} \\ h_2^{(k)} \end{bmatrix} + \begin{bmatrix} f_1\big(x_1^{(k)}, x_2^{(k)}\big) \\ f_2\big(x_1^{(k)}, x_2^{(k)}\big) \end{bmatrix} \approx \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$

The matrix

$$J\big(x_1^{(k)}, x_2^{(k)}\big) \equiv \begin{bmatrix} \frac{\partial f_1}{\partial x_1}\big(x_1^{(k)}, x_2^{(k)}\big) & \frac{\partial f_1}{\partial x_2}\big(x_1^{(k)}, x_2^{(k)}\big) \\ \frac{\partial f_2}{\partial x_1}\big(x_1^{(k)}, x_2^{(k)}\big) & \frac{\partial f_2}{\partial x_2}\big(x_1^{(k)}, x_2^{(k)}\big) \end{bmatrix}$$

is called the Jacobian matrix. Let $h_1^{(k)}$ and $h_2^{(k)}$ be the solution of the linear system

$$J\big(x_1^{(k)}, x_2^{(k)}\big) \begin{bmatrix} h_1^{(k)} \\ h_2^{(k)} \end{bmatrix} = - \begin{bmatrix} f_1\big(x_1^{(k)}, x_2^{(k)}\big) \\ f_2\big(x_1^{(k)}, x_2^{(k)}\big) \end{bmatrix};$$

then

$$\begin{bmatrix} x_1^{(k+1)} \\ x_2^{(k+1)} \end{bmatrix} = \begin{bmatrix} x_1^{(k)} \\ x_2^{(k)} \end{bmatrix} + \begin{bmatrix} h_1^{(k)} \\ h_2^{(k)} \end{bmatrix}$$

is expected to be a better approximation.


In general, we solve the system of $n$ nonlinear equations $f_i(x_1, \ldots, x_n) = 0$, $i = 1, \ldots, n$. Let

$$\mathbf{x} = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}^T \quad \text{and} \quad F(\mathbf{x}) = \begin{bmatrix} f_1(\mathbf{x}) & f_2(\mathbf{x}) & \cdots & f_n(\mathbf{x}) \end{bmatrix}^T.$$

The problem can be formulated as solving

$$F(\mathbf{x}) = \mathbf{0}, \quad F : \mathbb{R}^n \to \mathbb{R}^n.$$

Let $J(\mathbf{x})$, whose $(i, j)$ entry is $\frac{\partial f_i}{\partial x_j}(\mathbf{x})$, be the $n \times n$ Jacobian matrix. Then Newton's iteration is defined as

$$\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \mathbf{h}^{(k)},$$

where $\mathbf{h}^{(k)} \in \mathbb{R}^n$ is the solution of the linear system

$$J(\mathbf{x}^{(k)}) \mathbf{h}^{(k)} = -F(\mathbf{x}^{(k)}).$$


**Algorithm 1 (Newton’s Method for Systems)**

Given a function $F : \mathbb{R}^n \to \mathbb{R}^n$, an initial guess $\mathbf{x}^{(0)}$ to the zero of $F$, and stopping criteria $M$, $\delta$, and $\varepsilon$, this algorithm performs Newton's iteration to approximate one root of $F$.

Set $k = 0$ and $\mathbf{h}^{(-1)} = \mathbf{e}_1$.

While ($k < M$) and ($\|\mathbf{h}^{(k-1)}\| \ge \delta$) and ($\|F(\mathbf{x}^{(k)})\| \ge \varepsilon$)

  Calculate $J(\mathbf{x}^{(k)}) = [\partial f_i(\mathbf{x}^{(k)}) / \partial x_j]$.

  Solve the $n \times n$ linear system $J(\mathbf{x}^{(k)}) \mathbf{h}^{(k)} = -F(\mathbf{x}^{(k)})$.

  Set $\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \mathbf{h}^{(k)}$ and $k = k + 1$.

End while

Output ("Convergent $\mathbf{x}^{(k)}$") or ("Maximum number of iterations exceeded").
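A compact Python sketch of Algorithm 1 (using NumPy for the linear solve), applied to the running three-equation example; the analytic Jacobian below anticipates Example 6:

```python
import numpy as np

def F(x):
    x1, x2, x3 = x
    return np.array([
        3 * x1 - np.cos(x2 * x3) - 0.5,
        x1**2 - 81 * (x2 + 0.1)**2 + np.sin(x3) + 1.06,
        np.exp(-x1 * x2) + 20 * x3 + (10 * np.pi - 3) / 3,
    ])

def J(x):
    x1, x2, x3 = x
    return np.array([
        [3.0, x3 * np.sin(x2 * x3), x2 * np.sin(x2 * x3)],
        [2 * x1, -162 * (x2 + 0.1), np.cos(x3)],
        [-x2 * np.exp(-x1 * x2), -x1 * np.exp(-x1 * x2), 20.0],
    ])

def newton(x0, tol=1e-10, max_iter=20):
    """Newton's method for systems: solve J h = -F, take the step, repeat."""
    x = np.asarray(x0, dtype=float)
    for k in range(1, max_iter + 1):
        h = np.linalg.solve(J(x), -F(x))
        x = x + h
        if np.linalg.norm(h, np.inf) < tol:
            return x, k
    return x, max_iter

root, k = newton([0.1, 0.1, -0.1])
```

With this starting point the iteration reaches the solution $(0.5, 0, -\pi/6)$ in a handful of steps, consistent with the table in Example 6.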


**Theorem 5**

Let $\mathbf{x}^*$ be a solution of $G(\mathbf{x}) = \mathbf{x}$. Suppose there exists $\delta > 0$ with

**(i)** $\partial g_i / \partial x_j$ is continuous on $N_\delta = \{\mathbf{x} : \|\mathbf{x} - \mathbf{x}^*\| < \delta\}$ for all $i$ and $j$;

**(ii)** $\partial^2 g_i(\mathbf{x}) / (\partial x_j \partial x_k)$ is continuous and $\left| \partial^2 g_i(\mathbf{x}) / (\partial x_j \partial x_k) \right| \le M$ for some constant $M$ whenever $\mathbf{x} \in N_\delta$, for each $i$, $j$ and $k$;

**(iii)** $\partial g_i(\mathbf{x}^*) / \partial x_k = 0$ for each $i$ and $k$.

Then there exists $\hat{\delta} < \delta$ such that the sequence $\{\mathbf{x}^{(k)}\}$ generated by $\mathbf{x}^{(k)} = G(\mathbf{x}^{(k-1)})$ converges quadratically to $\mathbf{x}^*$ for any $\mathbf{x}^{(0)}$ satisfying $\|\mathbf{x}^{(0)} - \mathbf{x}^*\|_\infty < \hat{\delta}$. Moreover,

$$\|\mathbf{x}^{(k)} - \mathbf{x}^*\|_\infty \le \frac{n^2 M}{2} \|\mathbf{x}^{(k-1)} - \mathbf{x}^*\|_\infty^2, \quad \forall\, k \ge 1.$$


**Example 6**

Consider the nonlinear system

$$\begin{aligned}
3x_1 - \cos(x_2 x_3) - \tfrac{1}{2} &= 0, \\
x_1^2 - 81(x_2 + 0.1)^2 + \sin x_3 + 1.06 &= 0, \\
e^{-x_1 x_2} + 20 x_3 + \tfrac{10\pi - 3}{3} &= 0.
\end{aligned}$$

Nonlinear functions: Let

$$F(x_1, x_2, x_3) = [f_1(x_1, x_2, x_3), f_2(x_1, x_2, x_3), f_3(x_1, x_2, x_3)]^T,$$

where

$$\begin{aligned}
f_1(x_1, x_2, x_3) &= 3x_1 - \cos(x_2 x_3) - \tfrac{1}{2}, \\
f_2(x_1, x_2, x_3) &= x_1^2 - 81(x_2 + 0.1)^2 + \sin x_3 + 1.06, \\
f_3(x_1, x_2, x_3) &= e^{-x_1 x_2} + 20 x_3 + \tfrac{10\pi - 3}{3}.
\end{aligned}$$


Nonlinear functions (cont.):

The Jacobian matrix $J(\mathbf{x})$ for this system is

$$J(x_1, x_2, x_3) = \begin{bmatrix} 3 & x_3 \sin(x_2 x_3) & x_2 \sin(x_2 x_3) \\ 2x_1 & -162(x_2 + 0.1) & \cos x_3 \\ -x_2 e^{-x_1 x_2} & -x_1 e^{-x_1 x_2} & 20 \end{bmatrix}.$$

Newton's iteration with initial $\mathbf{x}^{(0)} = [0.1, 0.1, -0.1]^T$:

$$\begin{bmatrix} x_1^{(k)} \\ x_2^{(k)} \\ x_3^{(k)} \end{bmatrix} = \begin{bmatrix} x_1^{(k-1)} \\ x_2^{(k-1)} \\ x_3^{(k-1)} \end{bmatrix} - \begin{bmatrix} h_1^{(k-1)} \\ h_2^{(k-1)} \\ h_3^{(k-1)} \end{bmatrix}, \quad \text{where} \quad \begin{bmatrix} h_1^{(k-1)} \\ h_2^{(k-1)} \\ h_3^{(k-1)} \end{bmatrix} = J\big(x_1^{(k-1)}, x_2^{(k-1)}, x_3^{(k-1)}\big)^{-1} F\big(x_1^{(k-1)}, x_2^{(k-1)}, x_3^{(k-1)}\big).$$


Result:

| $k$ | $x_1^{(k)}$ | $x_2^{(k)}$ | $x_3^{(k)}$ | $\|\mathbf{x}^{(k)} - \mathbf{x}^{(k-1)}\|_\infty$ |
|---|---|---|---|---|
| 0 | 0.10000000 | 0.10000000 | -0.10000000 | |
| 1 | 0.50003702 | 0.01946686 | -0.52152047 | 0.422 |
| 2 | 0.50004593 | 0.00158859 | -0.52355711 | $1.79 \times 10^{-2}$ |
| 3 | 0.50000034 | 0.00001244 | -0.52359845 | $1.58 \times 10^{-3}$ |
| 4 | 0.50000000 | 0.00000000 | -0.52359877 | $1.24 \times 10^{-5}$ |
| 5 | 0.50000000 | 0.00000000 | -0.52359877 | 0 |


**Exercise**
Page 644: 2, 8


**Quasi-Newton methods**

Newton's method

- Advantage: quadratic convergence.
- Disadvantage: each iteration requires $O(n^3) + O(n^2) + O(n)$ arithmetic operations:
  - $n^2$ partial derivatives for the Jacobian matrix; in most situations, the exact evaluation of the partial derivatives is inconvenient.
  - $n$ scalar functional evaluations of $F$.
  - $O(n^3)$ arithmetic operations to solve the linear system.

Quasi-Newton methods

- Advantage: only $n$ scalar functional evaluations per iteration and $O(n^2)$ arithmetic operations.
- Disadvantage: superlinear (rather than quadratic) convergence.

Recall that in the one-dimensional case, one uses the linear model

$$\ell_k(x) = f(x_k) + a_k (x - x_k)$$

to approximate the function $f(x)$ at $x_k$. That is, $\ell_k(x_k) = f(x_k)$ for any $a_k \in \mathbb{R}$. If we further require that $\ell_k'(x_k) = f'(x_k)$, then $a_k = f'(x_k)$.


The zero of $\ell_k(x)$ is used to give a new approximation for the zero of $f(x)$, that is,

$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)},$$

which yields Newton's method.

If $f'(x_k)$ is not available, one instead asks the linear model to satisfy

$$\ell_k(x_k) = f(x_k) \quad \text{and} \quad \ell_k(x_{k-1}) = f(x_{k-1}).$$

In doing this, the identity

$$f(x_{k-1}) = \ell_k(x_{k-1}) = f(x_k) + a_k (x_{k-1} - x_k)$$

gives

$$a_k = \frac{f(x_k) - f(x_{k-1})}{x_k - x_{k-1}}.$$

Solving $\ell_k(x) = 0$ yields the secant iteration

$$x_{k+1} = x_k - \frac{x_k - x_{k-1}}{f(x_k) - f(x_{k-1})} f(x_k).$$
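A one-dimensional sketch of the secant iteration just derived:

```python
def secant(f, x0, x1, tol=1e-12, max_iter=50):
    """Secant iteration: x_{k+1} = x_k - (x_k - x_{k-1})/(f(x_k) - f(x_{k-1})) * f(x_k)."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        x2 = x1 - (x1 - x0) / (f1 - f0) * f1
        if abs(x2 - x1) < tol:
            return x2
        x0, f0 = x1, f1
        x1, f1 = x2, f(x2)
    return x1

root = secant(lambda x: x * x - 2.0, 1.0, 2.0)   # converges to sqrt(2)
```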


In multiple dimensions, the analogous affine model becomes

$$M_k(\mathbf{x}) = F(\mathbf{x}^{(k)}) + A_k (\mathbf{x} - \mathbf{x}^{(k)}),$$

where $\mathbf{x}, \mathbf{x}^{(k)} \in \mathbb{R}^n$ and $A_k \in \mathbb{R}^{n \times n}$, and it satisfies

$$M_k(\mathbf{x}^{(k)}) = F(\mathbf{x}^{(k)})$$

for any $A_k$. The zero of $M_k(\mathbf{x})$ is then used to give a new approximation for the zero of $F(\mathbf{x})$, that is,

$$\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} - A_k^{-1} F(\mathbf{x}^{(k)}).$$

Newton's method chooses

$$A_k = F'(\mathbf{x}^{(k)}) \equiv J(\mathbf{x}^{(k)}) = \text{the Jacobian matrix}$$

and yields the iteration

$$\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} - \big( F'(\mathbf{x}^{(k)}) \big)^{-1} F(\mathbf{x}^{(k)}).$$


When the Jacobian matrix $J(\mathbf{x}^{(k)}) \equiv F'(\mathbf{x}^{(k)})$ is not available, one can instead require

$$M_k(\mathbf{x}^{(k-1)}) = F(\mathbf{x}^{(k-1)}).$$

Then

$$F(\mathbf{x}^{(k-1)}) = M_k(\mathbf{x}^{(k-1)}) = F(\mathbf{x}^{(k)}) + A_k (\mathbf{x}^{(k-1)} - \mathbf{x}^{(k)}),$$

which gives

$$A_k (\mathbf{x}^{(k)} - \mathbf{x}^{(k-1)}) = F(\mathbf{x}^{(k)}) - F(\mathbf{x}^{(k-1)});$$

this is the so-called secant equation. Let

$$\mathbf{h}^{(k)} = \mathbf{x}^{(k)} - \mathbf{x}^{(k-1)} \quad \text{and} \quad \mathbf{y}^{(k)} = F(\mathbf{x}^{(k)}) - F(\mathbf{x}^{(k-1)}).$$

The secant equation becomes

$$A_k \mathbf{h}^{(k)} = \mathbf{y}^{(k)}.$$


However, the secant equation cannot uniquely determine $A_k$. One way of choosing $A_k$ is to minimize $M_k - M_{k-1}$ subject to the secant equation. Note that

$$\begin{aligned}
M_k(\mathbf{x}) - M_{k-1}(\mathbf{x}) &= F(\mathbf{x}^{(k)}) + A_k(\mathbf{x} - \mathbf{x}^{(k)}) - F(\mathbf{x}^{(k-1)}) - A_{k-1}(\mathbf{x} - \mathbf{x}^{(k-1)}) \\
&= \big( F(\mathbf{x}^{(k)}) - F(\mathbf{x}^{(k-1)}) \big) + A_k(\mathbf{x} - \mathbf{x}^{(k)}) - A_{k-1}(\mathbf{x} - \mathbf{x}^{(k-1)}) \\
&= A_k(\mathbf{x}^{(k)} - \mathbf{x}^{(k-1)}) + A_k(\mathbf{x} - \mathbf{x}^{(k)}) - A_{k-1}(\mathbf{x} - \mathbf{x}^{(k-1)}) \\
&= A_k(\mathbf{x} - \mathbf{x}^{(k-1)}) - A_{k-1}(\mathbf{x} - \mathbf{x}^{(k-1)}) \\
&= (A_k - A_{k-1})(\mathbf{x} - \mathbf{x}^{(k-1)}).
\end{aligned}$$

For any $\mathbf{x} \in \mathbb{R}^n$, we express

$$\mathbf{x} - \mathbf{x}^{(k-1)} = \alpha \mathbf{h}^{(k)} + \mathbf{t}^{(k)}$$

for some $\alpha \in \mathbb{R}$ and $\mathbf{t}^{(k)} \in \mathbb{R}^n$ with $(\mathbf{h}^{(k)})^T \mathbf{t}^{(k)} = 0$. Then

$$M_k - M_{k-1} = (A_k - A_{k-1})(\alpha \mathbf{h}^{(k)} + \mathbf{t}^{(k)}) = \alpha (A_k - A_{k-1}) \mathbf{h}^{(k)} + (A_k - A_{k-1}) \mathbf{t}^{(k)}.$$


Since

$$(A_k - A_{k-1}) \mathbf{h}^{(k)} = A_k \mathbf{h}^{(k)} - A_{k-1} \mathbf{h}^{(k)} = \mathbf{y}^{(k)} - A_{k-1} \mathbf{h}^{(k)},$$

and both $\mathbf{y}^{(k)}$ and $A_{k-1} \mathbf{h}^{(k)}$ are old values, we have no control over the first part $(A_k - A_{k-1}) \mathbf{h}^{(k)}$. In order to minimize $M_k(\mathbf{x}) - M_{k-1}(\mathbf{x})$, we try to choose $A_k$ so that

$$(A_k - A_{k-1}) \mathbf{t}^{(k)} = 0$$

for all $\mathbf{t}^{(k)} \in \mathbb{R}^n$ with $(\mathbf{h}^{(k)})^T \mathbf{t}^{(k)} = 0$. This requires $A_k - A_{k-1}$ to be a rank-one matrix of the form

$$A_k - A_{k-1} = \mathbf{u}^{(k)} (\mathbf{h}^{(k)})^T$$

for some $\mathbf{u}^{(k)} \in \mathbb{R}^n$. Then

$$\mathbf{u}^{(k)} (\mathbf{h}^{(k)})^T \mathbf{h}^{(k)} = (A_k - A_{k-1}) \mathbf{h}^{(k)} = \mathbf{y}^{(k)} - A_{k-1} \mathbf{h}^{(k)},$$


which gives

$$\mathbf{u}^{(k)} = \frac{\mathbf{y}^{(k)} - A_{k-1} \mathbf{h}^{(k)}}{(\mathbf{h}^{(k)})^T \mathbf{h}^{(k)}}.$$

Therefore,

$$A_k = A_{k-1} + \frac{(\mathbf{y}^{(k)} - A_{k-1} \mathbf{h}^{(k)})(\mathbf{h}^{(k)})^T}{(\mathbf{h}^{(k)})^T \mathbf{h}^{(k)}}. \tag{1}$$

After $A_k$ is determined, the new iterate $\mathbf{x}^{(k+1)}$ is obtained by solving $M_k(\mathbf{x}) = 0$. This can be done by first noting that

$$\mathbf{h}^{(k+1)} = \mathbf{x}^{(k+1)} - \mathbf{x}^{(k)} \implies \mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \mathbf{h}^{(k+1)}$$

and

$$M_k(\mathbf{x}^{(k+1)}) = 0 \implies A_k \mathbf{h}^{(k+1)} = -F(\mathbf{x}^{(k)}).$$

These formulations give Broyden's method.


**Algorithm 2 (Broyden’s Method)**

Given $F : \mathbb{R}^n \to \mathbb{R}^n$, an initial vector $\mathbf{x}^{(0)}$, an initial Jacobian approximation $A_0 \in \mathbb{R}^{n \times n}$ (e.g., $A_0 = I$), a tolerance $TOL$, and a maximum number of iterations $M$:

Set $k = 1$.

While $k \le M$ and $\|\mathbf{x}^{(k)} - \mathbf{x}^{(k-1)}\|_2 \ge TOL$

  Solve $A_k \mathbf{h}^{(k+1)} = -F(\mathbf{x}^{(k)})$ for $\mathbf{h}^{(k+1)}$.

  Update $\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \mathbf{h}^{(k+1)}$.

  Compute $\mathbf{y}^{(k+1)} = F(\mathbf{x}^{(k+1)}) - F(\mathbf{x}^{(k)})$.

  Update
  $$A_{k+1} = A_k + \frac{\big(\mathbf{y}^{(k+1)} + F(\mathbf{x}^{(k)})\big)(\mathbf{h}^{(k+1)})^T}{(\mathbf{h}^{(k+1)})^T \mathbf{h}^{(k+1)}}.$$

  Set $k = k + 1$.

End While

(Note that $A_k \mathbf{h}^{(k+1)} = -F(\mathbf{x}^{(k)})$ implies $\mathbf{y}^{(k+1)} + F(\mathbf{x}^{(k)}) = \mathbf{y}^{(k+1)} - A_k \mathbf{h}^{(k+1)} = F(\mathbf{x}^{(k+1)})$, so this update agrees with (1).)
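A sketch of Algorithm 2 in Python. Here $A_0$ is taken as a finite-difference approximation to the Jacobian rather than $A_0 = I$ (with the identity, the first step for this example is far too large); `F` is the system from Example 6:

```python
import numpy as np

def F(x):
    x1, x2, x3 = x
    return np.array([
        3 * x1 - np.cos(x2 * x3) - 0.5,
        x1**2 - 81 * (x2 + 0.1)**2 + np.sin(x3) + 1.06,
        np.exp(-x1 * x2) + 20 * x3 + (10 * np.pi - 3) / 3,
    ])

def fd_jacobian(F, x, eps=1e-7):
    """Forward-difference approximation to the Jacobian of F at x."""
    Fx = F(x)
    Jac = np.empty((len(x), len(x)))
    for j in range(len(x)):
        xp = x.copy()
        xp[j] += eps
        Jac[:, j] = (F(xp) - Fx) / eps
    return Jac

def broyden(F, x0, tol=1e-10, max_iter=100):
    x = np.asarray(x0, dtype=float)
    A = fd_jacobian(F, x)
    Fx = F(x)
    for k in range(1, max_iter + 1):
        h = np.linalg.solve(A, -Fx)               # A_k h^{(k+1)} = -F(x^{(k)})
        x = x + h
        Fx_new = F(x)
        y = Fx_new - Fx
        A = A + np.outer(y - A @ h, h) / (h @ h)  # rank-one secant update (1)
        Fx = Fx_new
        if np.linalg.norm(h, np.inf) < tol:
            return x, k
    return x, max_iter

root, k = broyden(F, [0.1, 0.1, -0.1])
```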


Solve the linear system $A_k \mathbf{h}^{(k+1)} = -F(\mathbf{x}^{(k)})$ for $\mathbf{h}^{(k+1)}$:

LU factorization costs $\tfrac{2}{3} n^3 + O(n^2)$ floating-point operations.

Applying the Sherman-Morrison-Woodbury formula

$$\big( B + UV^T \big)^{-1} = B^{-1} - B^{-1} U \big( I + V^T B^{-1} U \big)^{-1} V^T B^{-1}$$

to (1), we have

$$\begin{aligned}
A_k^{-1} &= \left( A_{k-1} + \frac{(\mathbf{y}^{(k)} - A_{k-1} \mathbf{h}^{(k)})(\mathbf{h}^{(k)})^T}{(\mathbf{h}^{(k)})^T \mathbf{h}^{(k)}} \right)^{-1} \\
&= A_{k-1}^{-1} - A_{k-1}^{-1} \frac{\mathbf{y}^{(k)} - A_{k-1} \mathbf{h}^{(k)}}{(\mathbf{h}^{(k)})^T \mathbf{h}^{(k)}} \left( 1 + (\mathbf{h}^{(k)})^T A_{k-1}^{-1} \frac{\mathbf{y}^{(k)} - A_{k-1} \mathbf{h}^{(k)}}{(\mathbf{h}^{(k)})^T \mathbf{h}^{(k)}} \right)^{-1} (\mathbf{h}^{(k)})^T A_{k-1}^{-1} \\
&= A_{k-1}^{-1} + \frac{(\mathbf{h}^{(k)} - A_{k-1}^{-1} \mathbf{y}^{(k)})(\mathbf{h}^{(k)})^T A_{k-1}^{-1}}{(\mathbf{h}^{(k)})^T A_{k-1}^{-1} \mathbf{y}^{(k)}}.
\end{aligned}$$
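A quick numerical check (my own, with small fixed matrices) that the inverse-update formula agrees with inverting the rank-one-updated matrix directly:

```python
import numpy as np

A_prev = np.array([[4.0, 1.0, 0.0],
                   [1.0, 3.0, 1.0],
                   [0.0, 1.0, 5.0]])
h = np.array([1.0, 2.0, 1.0])
y = np.array([2.0, 0.0, 1.0])

# Broyden update (1): A_k = A_{k-1} + (y - A_{k-1} h) h^T / (h^T h)
A_new = A_prev + np.outer(y - A_prev @ h, h) / (h @ h)

# Sherman-Morrison form: update the inverse directly in O(n^2) operations
A_prev_inv = np.linalg.inv(A_prev)
v = A_prev_inv @ y
A_new_inv = A_prev_inv + np.outer(h - v, h @ A_prev_inv) / (h @ v)

# the two routes agree
match = np.allclose(A_new_inv, np.linalg.inv(A_new))
```

This is the reason Broyden's method needs only $O(n^2)$ arithmetic per iteration once $A_0^{-1}$ is available.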


Newton-based methods

- Advantage: high speed of convergence once a sufficiently accurate approximation is available.
- Weakness: an accurate initial approximation to the solution is needed to ensure convergence.

The Steepest Descent method converges only linearly to the solution, but it will usually converge even for poor initial approximations.

Strategy: "find a sufficiently accurate starting approximation using the Steepest Descent method" + "compute a convergent solution using a Newton-based method."

The method of Steepest Descent determines a local minimum of a multivariable function $g : \mathbb{R}^n \to \mathbb{R}$.

A system of the form $f_i(x_1, \ldots, x_n) = 0$, $i = 1, 2, \ldots, n$, has a solution at $\mathbf{x}$ if and only if the function $g$ defined by

$$g(x_1, \ldots, x_n) = \sum_{i=1}^n [f_i(x_1, \ldots, x_n)]^2$$

has the minimal value zero.


Basic idea of the steepest descent method:

**(i)** Evaluate $g$ at an initial approximation $\mathbf{x}^{(0)}$;

**(ii)** Determine a direction from $\mathbf{x}^{(0)}$ that results in a decrease in the value of $g$;

**(iii)** Move an appropriate distance in this direction and call the new vector $\mathbf{x}^{(1)}$;

**(iv)** Repeat steps (i) through (iii) with $\mathbf{x}^{(0)}$ replaced by $\mathbf{x}^{(1)}$.

**Definition 7 (Gradient)**

If $g : \mathbb{R}^n \to \mathbb{R}$, the gradient of $g$ at $\mathbf{x}$, $\nabla g(\mathbf{x})$, is defined by

$$\nabla g(\mathbf{x}) = \left( \frac{\partial g}{\partial x_1}(\mathbf{x}), \ldots, \frac{\partial g}{\partial x_n}(\mathbf{x}) \right).$$

**Definition 8 (Directional Derivative)**

The directional derivative of $g$ at $\mathbf{x}$ in the direction of $\mathbf{v}$ with $\|\mathbf{v}\|_2 = 1$ is defined by

$$D_{\mathbf{v}} g(\mathbf{x}) = \lim_{h \to 0} \frac{g(\mathbf{x} + h\mathbf{v}) - g(\mathbf{x})}{h} = \mathbf{v}^T \nabla g(\mathbf{x}).$$


**Theorem 9**

The direction of the greatest decrease in the value of $g$ at $\mathbf{x}$ is the direction given by $-\nabla g(\mathbf{x})$.

Objective: reduce $g(\mathbf{x})$ to its minimal value zero. Hence, for an initial approximation $\mathbf{x}^{(0)}$, an appropriate choice for the new vector $\mathbf{x}^{(1)}$ is

$$\mathbf{x}^{(1)} = \mathbf{x}^{(0)} - \alpha \nabla g(\mathbf{x}^{(0)}), \quad \text{for some constant } \alpha > 0.$$

To choose $\alpha > 0$ such that $g(\mathbf{x}^{(1)}) < g(\mathbf{x}^{(0)})$, define

$$h(\alpha) = g\big(\mathbf{x}^{(0)} - \alpha \nabla g(\mathbf{x}^{(0)})\big),$$

then find $\alpha^*$ such that

$$h(\alpha^*) = \min_\alpha h(\alpha).$$


How to find $\alpha^*$?

- Solve the root-finding problem $h'(\alpha) = 0$. This is, in general, too costly.
- Choose three numbers $\alpha_1 < \alpha_2 < \alpha_3$ and construct the quadratic polynomial $P(\alpha)$ that interpolates $h$ at $\alpha_1$, $\alpha_2$ and $\alpha_3$, i.e.,

  $$P(\alpha_1) = h(\alpha_1), \quad P(\alpha_2) = h(\alpha_2), \quad P(\alpha_3) = h(\alpha_3),$$

  to approximate $h$. Use the minimizer $\hat{\alpha}$ of $P$ in $[\alpha_1, \alpha_3]$ to approximate $h(\alpha^*)$. The new iterate is

  $$\mathbf{x}^{(1)} = \mathbf{x}^{(0)} - \hat{\alpha} \nabla g(\mathbf{x}^{(0)}).$$

- To minimize the computation, set $\alpha_1 = 0$, find $\alpha_3$ with $h(\alpha_3) < h(\alpha_1)$, and choose $\alpha_2 = \alpha_3 / 2$.
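The three-point quadratic line search can be sketched as follows (the helper name is mine); $h_1$ and $h_3$ are the divided-difference coefficients of the Newton form $P(\alpha) = g_1 + h_1 \alpha + h_3 \alpha (\alpha - \alpha_2)$:

```python
def quad_line_search(h, alpha3):
    """One quadratic-interpolation line search with alpha1 = 0, alpha2 = alpha3/2.

    Assumes h(alpha3) < h(0) has already been arranged and P is convex (h3 > 0).
    """
    a2 = alpha3 / 2
    g1, g2, g3 = h(0.0), h(a2), h(alpha3)
    h1 = (g2 - g1) / a2
    h3 = (g3 - g1 - h1 * alpha3) / (alpha3 * (alpha3 - a2))
    a0 = 0.5 * (a2 - h1 / h3)     # critical point: P'(a) = h1 + h3*(2a - a2) = 0
    # keep whichever of the critical point and alpha3 gives the smaller h-value
    return min((a0, alpha3), key=h)

# sanity check on an exactly quadratic h: the minimizer is recovered exactly
alpha = quad_line_search(lambda a: (a - 0.7)**2 + 1.0, 1.0)
```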


**Example 10**

Use the Steepest Descent method with $\mathbf{x}^{(0)} = (0, 0, 0)^T$ to find a reasonable starting approximation to the solution of the nonlinear system

$$\begin{aligned}
f_1(x_1, x_2, x_3) &= 3x_1 - \cos(x_2 x_3) - \tfrac{1}{2} = 0, \\
f_2(x_1, x_2, x_3) &= x_1^2 - 81(x_2 + 0.1)^2 + \sin x_3 + 1.06 = 0, \\
f_3(x_1, x_2, x_3) &= e^{-x_1 x_2} + 20 x_3 + \tfrac{10\pi - 3}{3} = 0.
\end{aligned}$$

Let $g(x_1, x_2, x_3) = [f_1(x_1, x_2, x_3)]^2 + [f_2(x_1, x_2, x_3)]^2 + [f_3(x_1, x_2, x_3)]^2$. Then

$$\nabla g(x_1, x_2, x_3) \equiv \nabla g(\mathbf{x}) = \left( 2 \sum_{i=1}^3 f_i(\mathbf{x}) \frac{\partial f_i}{\partial x_1}(\mathbf{x}),\ 2 \sum_{i=1}^3 f_i(\mathbf{x}) \frac{\partial f_i}{\partial x_2}(\mathbf{x}),\ 2 \sum_{i=1}^3 f_i(\mathbf{x}) \frac{\partial f_i}{\partial x_3}(\mathbf{x}) \right),$$

i.e., $\nabla g(\mathbf{x}) = 2 J(\mathbf{x})^T F(\mathbf{x})$, where $J$ is the Jacobian matrix of $F$.


For $\mathbf{x}^{(0)} = [0, 0, 0]^T$, we have

$$g(\mathbf{x}^{(0)}) = 111.975 \quad \text{and} \quad z_0 = \|\nabla g(\mathbf{x}^{(0)})\|_2 = 419.554.$$

Let

$$\mathbf{z} = \frac{1}{z_0} \nabla g(\mathbf{x}^{(0)}) = [-0.0214514, -0.0193062, 0.999583]^T.$$

With $\alpha_1 = 0$, we have

$$g_1 = g(\mathbf{x}^{(0)} - \alpha_1 \mathbf{z}) = g(\mathbf{x}^{(0)}) = 111.975.$$

Let $\alpha_3 = 1$, so that

$$g_3 = g(\mathbf{x}^{(0)} - \alpha_3 \mathbf{z}) = 93.5649 < g_1.$$

Set $\alpha_2 = \alpha_3 / 2 = 0.5$. Thus

$$g_2 = g(\mathbf{x}^{(0)} - \alpha_2 \mathbf{z}) = 2.53557.$$


Form the quadratic polynomial $P(\alpha)$ defined by

$$P(\alpha) = g_1 + h_1 \alpha + h_3 \alpha (\alpha - \alpha_2)$$

that interpolates $g(\mathbf{x}^{(0)} - \alpha \mathbf{z})$ at $\alpha_1 = 0$, $\alpha_2 = 0.5$ and $\alpha_3 = 1$, as follows:

$$\begin{aligned}
g_2 = P(\alpha_2) = g_1 + h_1 \alpha_2 &\implies h_1 = \frac{g_2 - g_1}{\alpha_2} = -218.878, \\
g_3 = P(\alpha_3) = g_1 + h_1 \alpha_3 + h_3 \alpha_3 (\alpha_3 - \alpha_2) &\implies h_3 = 400.937.
\end{aligned}$$

Thus

$$P(\alpha) = 111.975 - 218.878\,\alpha + 400.937\,\alpha(\alpha - 0.5),$$

so that

$$0 = P'(\alpha_0) = -419.346 + 801.872\,\alpha_0 \implies \alpha_0 = 0.522959.$$

Since

$$g_0 = g(\mathbf{x}^{(0)} - \alpha_0 \mathbf{z}) = 2.32762 < \min\{g_1, g_3\},$$

we set

$$\mathbf{x}^{(1)} = \mathbf{x}^{(0)} - \alpha_0 \mathbf{z} = [0.0112182, 0.0100964, -0.522741]^T.$$
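The whole computation of Example 10 can be reproduced in a few lines (a sketch, using $\nabla g = 2 J^T F$):

```python
import numpy as np

def F(x):
    x1, x2, x3 = x
    return np.array([
        3 * x1 - np.cos(x2 * x3) - 0.5,
        x1**2 - 81 * (x2 + 0.1)**2 + np.sin(x3) + 1.06,
        np.exp(-x1 * x2) + 20 * x3 + (10 * np.pi - 3) / 3,
    ])

def J(x):
    x1, x2, x3 = x
    return np.array([
        [3.0, x3 * np.sin(x2 * x3), x2 * np.sin(x2 * x3)],
        [2 * x1, -162 * (x2 + 0.1), np.cos(x3)],
        [-x2 * np.exp(-x1 * x2), -x1 * np.exp(-x1 * x2), 20.0],
    ])

def g(x):
    return float(F(x) @ F(x))          # g = sum of squares = F^T F

x0 = np.zeros(3)
grad = 2 * J(x0).T @ F(x0)             # gradient of g at x0
z = grad / np.linalg.norm(grad)        # normalized descent direction
h = lambda a: g(x0 - a * z)

a1, a3 = 0.0, 1.0
while h(a3) >= h(a1):                  # halve a3 until g decreases (not needed here)
    a3 /= 2
a2 = a3 / 2
g1, g2, g3 = h(a1), h(a2), h(a3)
h1 = (g2 - g1) / a2
h3 = (g3 - g1 - h1 * a3) / (a3 * (a3 - a2))
a0 = 0.5 * (a2 - h1 / h3)              # zero of P'(alpha)
alpha = a0 if h(a0) < min(g1, g3) else a3
x1 = x0 - alpha * z
```

Running this reproduces the slide's numbers: $g(\mathbf{x}^{(0)}) \approx 111.975$ and $\mathbf{x}^{(1)} \approx [0.0112, 0.0101, -0.5227]^T$.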
