師大
of equations
Tsung-Ming Huang
Department of Mathematics National Taiwan Normal University, Taiwan
E-mail: min@ntnu.edu.tw
September 12, 2015
1 / 35
師大
Outline
1 Fixed points for functions of several variables
2 Newton’s method
3 Quasi-Newton methods
4 Steepest Descent Techniques
2 / 35
師大
Fixed points for functions of several variables
Theorem 1
Let f : D ⊂ Rn→ R be a function and x0∈ D. If all the partial derivatives of f exist and ∃ δ > 0 and α > 0 such that
∀ kx − x0k < δ and x ∈ D, we have
∂f (x)
∂xj
≤ α, ∀ j = 1, 2, . . . , n, then f is continuous at x0.
Definition 2 (Fixed Point)
A function G from D ⊂ Rninto Rnhas a fixed point at p ∈ D if G(p) = p.
3 / 35
師大
Theorem 3 (Contraction Mapping Theorem)
Let D = {(x1, · · · , xn)T; ai≤ xi ≤ bi, ∀ i = 1, . . . , n} ⊂ Rn. Suppose G : D → Rnis a continuous function with G(x) ∈ D whenever x ∈ D. Then G has a fixed point in D.
Suppose, in addition, G has continuous partial derivatives and a constant α < 1 exists with
∂gi(x)
∂xj
≤ α
n, whenever x ∈ D,
for j = 1, . . . , n and i = 1, . . . , n. Then, for any x(0) ∈ D, x(k)= G(x(k−1)), for each k ≥ 1
converges to the unique fixed point p ∈ D and k x(k)− p k∞≤ αk
1 − α k x(1)− x(0) k∞.
4 / 35
師大
Example 4
Consider the nonlinear system
3x1− cos(x2x3) −1
2 = 0, x21− 81(x2+ 0.1)2+ sin x3+ 1.06 = 0,
e−x1x2+ 20x3+10π − 3
3 = 0.
Fixed-point problem:
Change the system into the fixed-point problem:
x1 = 1
3cos(x2x3) +1
6 ≡ g1(x1, x2, x3), x2 = 1
9 q
x21+ sin x3+ 1.06 − 0.1 ≡ g2(x1, x2, x3), x3 = − 1
20e−x1x2−10π − 3
60 ≡ g3(x1, x2, x3).
Let G : R3→ R3be defined by G(x) = [g1(x), g2(x), g3(x)]T.
5 / 35
師大
• G has a unique point in D ≡ [−1, 1] × [−1, 1] × [−1, 1]:
Existence: ∀ x ∈ D,
|g1(x)| ≤1
3| cos(x2x3)| +1 6 ≤ 0.5,
|g2(x)| = 1 9
q
x21+ sin x3+ 1.06 − 0.1
≤1 9
√1 + sin 1 + 1.06 − 0.1 < 0.09,
|g3(x)| = 1
20e−x1x2+10π − 3 60 ≤ 1
20e +10π − 3
60 < 0.61, it implies that G(x) ∈ D whenever x ∈ D.
Uniqueness:
∂g1
∂x1
= 0,
∂g2
∂x2
= 0 and
∂g3
∂x3
= 0, as well as
∂g1
∂x2
≤ 1
3|x3| · | sin(x2x3)| ≤ 1
3sin 1 < 0.281,
6 / 35
師大
∂g1
∂x3
≤ 1
3|x2| · | sin(x2x3)| ≤1
3sin 1 < 0.281,
∂g2
∂x1
= |x1|
9px21+ sin x3+ 1.06 < 1 9√
0.218 < 0.238,
∂g2
∂x3
= | cos x3|
18px21+ sin x3+ 1.06< 1 18√
0.218 < 0.119,
∂g3
∂x1
= |x2|
20 e−x1x2≤ 1
20e < 0.14,
∂g3
∂x2
= |x1|
20 e−x1x2≤ 1
20e < 0.14.
These imply that g1, g2and g3are continuous on D and ∀ x ∈ D,
∂gi
∂xj
≤ 0.281, ∀ i, j.
Similarly, ∂gi/∂xjare continuous on D for all i and j. Consequently, Ghas a unique fixed point in D.
7 / 35
師大
• Approximated solution:
Fixed-point iteration (I):
Choosing x(0)= [0.1, 0.1, −0.1]T, {x(k)} is generated by x(k)1 = 1
3cos x(k−1)2 x(k−1)3 +1 6, x(k)2 = 1
9 r
x(k−1)1 2
+ sin x(k−1)3 + 1.06 − 0.1, x(k)3 = −1
20e−x(k−1)1 x(k−1)2 −10π − 3 60 . Result:
k x(k)1 x(k)2 x(k)3 kx(k)− x(k−1)k∞
0 0.10000000 0.10000000 -0.10000000
1 0.49998333 0.00944115 -0.52310127 0.423 2 0.49999593 0.00002557 -0.52336331 9.4 × 10−3 3 0.50000000 0.00001234 -0.52359814 2.3 × 10−4 4 0.50000000 0.00000003 -0.52359847 1.2 × 10−5 5 0.50000000 0.00000002 -0.52359877 3.1 × 10−7
8 / 35
師大
• Approximated solution (cont.):
Accelerate convergence of the fixed-point iteration:
x(k)1 = 1
3cos x(k−1)2 x(k−1)3 +1 6, x(k)2 = 1
9 r
x(k)1 2
+ sin x(k−1)3 + 1.06 − 0.1, x(k)3 = −1
20e−x(k)1 x(k)2 −10π − 3 60 , as in the Gauss-Seidel method for linear systems.
Result:
k x(k)1 x(k)2 x(k)3 kx(k)− x(k−1)k∞
0 0.10000000 0.10000000 -0.10000000
1 0.49998333 0.02222979 -0.52304613 0.423 2 0.49997747 0.00002815 -0.52359807 2.2 × 10−2 3 0.50000000 0.00000004 -0.52359877 2.8 × 10−5 4 0.50000000 0.00000000 -0.52359877 3.8 × 10−8
9 / 35
師大
Exercise
Page 636: 5, 7.b, 7.d
10 / 35
師大
Newton’s method
First consider solving the following system of nonlinear eqs.:
(f1(x1, x2) = 0, f2(x1, x2) = 0.
Suppose (x(k)1 , x(k)2 )is an approximation to the solution of the system above, and we try to compute h(k)1 and h(k)2 such that (x(k)1 + h(k)1 , x(k)2 + h(k)2 )satisfies the system. By the Taylor’s theorem for two variables,
0 = f1(x(k)1 + h(k)1 , x(k)2 + h(k)2 )
≈ f1(x(k)1 , x(k)2 ) + h(k)1 ∂f1
∂x1
(x(k)1 , x(k)2 ) + h(k)2 ∂f1
∂x2
(x(k)1 , x(k)2 ) 0 = f2(x(k)1 + h(k)1 , x(k)2 + h(k)2 )
≈ f2(x(k)1 , x(k)2 ) + h(k)1 ∂f2
∂x1(x(k)1 , x(k)2 ) + h(k)2 ∂f2
∂x2(x(k)1 , x(k)2 )
11 / 35
師大
Put this in matrix form
" ∂f
1
∂x1(x(k)1 , x(k)2 ) ∂x∂f1
2(x(k)1 , x(k)2 )
∂f2
∂x1(x(k)1 , x(k)2 ) ∂x∂f2
2(x(k)1 , x(k)2 )
# "
h(k)1 h(k)2
# +
"
f1(x(k)1 , x(k)2 ) f2(x(k)1 , x(k)2 )
#
≈
0 0
. The matrix
J (x(k)1 , x(k)2 ) ≡
" ∂f
1
∂x1(x(k)1 , x(k)2 ) ∂f∂x1
2(x(k)1 , x(k)2 )
∂f2
∂x1(x(k)1 , x(k)2 ) ∂f∂x2
2(x(k)1 , x(k)2 )
#
is called theJacobian matrix. Set h(k)1 and h(k)2 be the solution of the linear system
J (x(k)1 , x(k)2 )
"
h(k)1 h(k)2
#
= −
"
f1(x(k)1 , x(k)2 ) f2(x(k)1 , x(k)2 )
# ,
then
"
x(k+1)1 x(k+1)2
#
=
"
x(k)1 x(k)2
# +
"
h(k)1 h(k)2
#
is expected to be a better approximation.
12 / 35
師大
In general, we solve the system of n nonlinear equations fi(x1, · · · , xn) = 0, i = 1, . . . , n. Let
x =
x1 x2 · · · xn
T
and
F (x) =
f1(x) f2(x) · · · fn(x) T
. The problem can be formulated as solving
F (x) = 0, F : Rn→ Rn. LetJ (x), where the(i, j)entryis ∂x∂fi
j(x), be the n × nJacobian matrix. Then the Newton’s iteration is defined as
x(k+1) = x(k)+ h(k),
where h(k)∈ Rnis the solution of the linear system J (x(k))h(k)= −F (x(k)).
13 / 35
師大
Algorithm 1 (Newton’s Method for Systems)
Given a function F : Rn→ Rn, an initial guess x(0) to the zero of F , and stop criteria M , δ, and ε, this algorithm performs the Newton’s iteration to approximate one root of F .
Set k = 0 and h(−1)= e1.
While (k < M ) and (k h(k−1)k≥ δ) and (k F (x(k)) k≥ ε) Calculate J (x(k)) = [∂Fi(x(k))/∂xj].
Solve the n × n linear system J (x(k))h(k)= −F (x(k)).
Set x(k+1)= x(k)+ h(k) and k = k + 1.
End while
Output (“Convergent x(k)”) or
(“Maximum number of iterations exceeded”)
14 / 35
師大
Theorem 5
Let x∗be a solution of G(x) = x. Suppose ∃ δ > 0 with
(i) ∂gi/∂xjis continuous on Nδ = {x; kx − x∗k < δ} for all i and j.
(ii) ∂2gi(x)/(∂xj∂xk)is continuous and
∂2gi(x)
∂xj∂xk
≤ M
for some M whenever x ∈ Nδ for each i, j and k.
(iii) ∂gi(x∗)/∂xk = 0for each i and k.
Then ∃ ˆδ < δsuch that the sequence {x(k)} generated by x(k)= G(x(k−1))
convergesquadraticallyto x∗for any x(0)satisfying kx(0)− x∗k∞< ˆδ.
Moreover,
kx(k)− x∗k∞≤n2M
2 kx(k−1)− x∗k2∞, ∀ k ≥ 1. 15 / 35
師大
Example 6
Consider the nonlinear system
3x1− cos(x2x3) −1
2 = 0, x21− 81(x2+ 0.1)2+ sin x3+ 1.06 = 0,
e−x1x2 + 20x3+10π − 3
3 = 0.
Nonlinear functions: Let
F (x1, x2, x3) = [f1(x1, x2, x3), f2(x1, x2, x3), f3(x1, x2, x3)]T , where
f1(x1, x2, x3) = 3x1− cos(x2x3) −1 2,
f2(x1, x2, x3) = x21− 81(x2+ 0.1)2+ sin x3+ 1.06, f3(x1, x2, x3) = e−x1x2+ 20x3+10π − 3
3 .
16 / 35
師大
Nonlinear functions (cont.):
The Jacobian matrix J (x) for this system is
J (x1, x2, x3) =
3 x3sin x2x3 x2sin x2x3 2x1 −162(x2+ 0.1) cos x3
−x2e−x1x2 −x1e−x1x2 20
. Newton’s iteration with initial x(0)= [0.1, 0.1, −0.1]T:
x(k)1 x(k)2 x(k)3
=
x(k−1)1 x(k−1)2 x(k−1)3
−
h(k−1)1 h(k−1)2 h(k−1)3
, where
h(k−1)1 h(k−1)2 h(k−1)3
= J
x(k−1)1 , x(k−1)2 , x(k−1)3 −1
F (x(k−1)1 , x(k−1)2 , x(k−1)3 ).
17 / 35
師大
Result:
k x(k)1 x(k)2 x(k)3 kx(k)− x(k−1)k∞
0 0.10000000 0.10000000 −0.10000000
1 0.50003702 0.01946686 −0.52152047 0.422 2 0.50004593 0.00158859 −0.52355711 1.79 × 10−2 3 0.50000034 0.00001244 −0.52359845 1.58 × 10−3 4 0.50000000 0.00000000 −0.52359877 1.24 × 10−5 5 0.50000000 0.00000000 −0.52359877 0
18 / 35
師大
Exercise Page 644: 2, 8
19 / 35
師大
Quasi-Newton methods
Newton’s Methods
Advantage:quadraticconvergence
Disadvantage: For each iteration, it requires O(n3) + O(n2) + O(n)arithmetic operations:
n2partial derivatives for Jacobian matrix – in most situations, the exact evaluation of the partial derivatives is inconvenient.
nscalar functional evaluations of F
O(n3)arithmetic operations to solve linear system.
quasi-Newton methods
Advantage: it requires only n scalar functional evaluations per iteration and O(n2)arithmetic operations
Disadvantage:superlinearconvergence
Recall that in one dimensional case, one uses thelinearmodel
`k(x) = f (xk) + ak(x − xk)
toapproximatethe functionf (x)at xk. That is,`k(xk) = f (xk) for any ak∈ R. If we further require that`0(xk) = f0(xk), then
ak= f0(xk). 20 / 35
師大
The zero of `k(x)is used to give a new approximate for the zero of f (x), that is,
xk+1= xk− 1
f0(xk)f (xk) which yieldsNewton’smethod.
Iff0(xk)isnot available, one instead asks the linear model to satisfy
`k(xk) = f (xk) and `k(xk−1) = f (xk−1).
In doing this, the identity
f (xk−1) = `k(xk−1) = f (xk) + ak(xk−1− xk)
gives
ak =f (xk) − f (xk−1) xk− xk−1
.
Solving `k(x) = 0yields thesecantiteration xk+1= xk− xk− xk−1
f (xk) − f (xk−1)f (xk).
21 / 35
師大
In multiple dimension, the analogueaffine modelbecomes Mk(x) = F (x(k)) + Ak(x − x(k)),
where x, x(k)∈ Rnand Ak∈ Rn×n, and satisfies Mk(x(k)) = F (x(k)),
for any Ak. The zero of Mk(x)is then used to give a new approximate for the zero of F (x), that is,
x(k+1) = x(k)− A−1k F (x(k)).
TheNewton’smethod chooses
Ak= F0(x(k)) ≡ J (x(k)) =the Jacobian matrix
and yields the iteration x(k+1)= x(k)−
F0(x(k))−1
F (x(k)).
22 / 35
師大
When the Jacobian matrix J (x(k)) ≡ F0(x(k))is not available, one can require
Mk(x(k−1)) = F (x(k−1)).
Then
F (x(k−1)) = Mk(x(k−1)) = F (x(k)) + Ak(x(k−1)− x(k)), which gives
Ak(x(k)− x(k−1)) = F (x(k)) − F (x(k−1)) and this is the so-called secant equation. Let
h(k)= x(k)− x(k−1) and y(k)= F (x(k)) − F (x(k−1)).
The secant equation becomes Akh(k)= y(k).
23 / 35
師大
However, this secant equation can not uniquely determine Ak. One way of choosing Ak is to minimize Mk− Mk−1 subject to the secant equation. Note
Mk(x) − Mk−1(x)
=F (x(k)) + Ak(x − x(k)) − F (x(k−1)) − Ak−1(x − x(k−1))
=(F (x(k)) − F (x(k−1))) + Ak(x − x(k)) − Ak−1(x − x(k−1))
=Ak(x(k)− x(k−1)) + Ak(x − x(k)) − Ak−1(x − x(k−1))
=Ak(x − x(k−1)) − Ak−1(x − x(k−1))
=(Ak− Ak−1)(x − x(k−1)).
For any x ∈ Rn, we express
x − x(k−1) = αh(k)+ t(k),
for some α ∈ R, t(k)∈ Rn, and (h(k))Tt(k)= 0. Then
Mk−Mk−1= (Ak−Ak−1)(αh(k)+t(k)) = α(Ak−Ak−1)h(k)+(Ak−Ak−1)t(k).
24 / 35
師大
Since
(Ak− Ak−1)h(k)= Akh(k)− Ak−1h(k)= y(k)− Ak−1h(k), both y(k)and Ak−1h(k)are old values, we have no control over the first part (Ak− Ak−1)h(k). In order to minimize
Mk(x) − Mk−1(x), we try to choose Akso that (Ak− Ak−1)t(k) = 0
for all t(k)∈ Rn, (h(k))Tt(k)= 0. This requires that Ak− Ak−1to be a rank-one matrix of the form
Ak− Ak−1 = u(k)(h(k))T
for some u(k)∈ Rn. Then
u(k)(h(k))Th(k)= (Ak− Ak−1)h(k)= y(k)− Ak−1h(k)
25 / 35
師大
which gives
u(k)= y(k)− Ak−1h(k) (h(k))Th(k) .
Therefore,
Ak= Ak−1+(y(k)− Ak−1h(k))(h(k))T
(h(k))Th(k) . (1) After Akis determined, the new iterate x(k+1) is derived from solving Mk(x) = 0. It can be done by first noting that
h(k+1) = x(k+1)− x(k) =⇒ x(k+1) = x(k)+ h(k+1) and
Mk(x(k+1)) = 0 ⇒ Akh(k+1)= −F (x(k)) These formulations give theBroyden’smethod.
26 / 35
師大
Algorithm 2 (Broyden’s Method)
Given F : Rn→ Rn, an initial vector x(0) and initial Jacobian matrix A0 ∈ Rn×n (e.g., A0 = I), tolerance T OL, maximum number of iteration M .
Set k = 1.
While k ≤ M and kx(k)− x(k−1)k2 ≥ T OL Solve Akh(k+1)= −F (x(k))for h(k+1) Update x(k+1)= x(k)+ h(k+1)
Compute y(k+1)= F (x(k+1)) − F (x(k)) Update
Ak+1= Ak+(y(k+1)+ F (x(k)))(h(k+1))T (h(k+1))Th(k+1) Set k = k + 1
End While
27 / 35
師大
Solve the linear system Akh(k+1) = −F (x(k))for h(k+1): LU-factorization: cost 23n3+ O(n2)floating-point operations.
Applying the Shermann-Morrison-Woodbury formula B + U VT−1
= B−1− B−1U I + VTB−1U−1
VTB−1 to (1), we have
A−1k
=
Ak−1+(y(k)− Ak−1h(k))(h(k))T (h(k))Th(k)
−1
= A−1k−1− A−1k−1y(k)− Ak−1h(k) (h(k))Th(k)
1 + (h(k))TA−1k−1y(k)− Ak−1h(k) (h(k))Th(k)
−1
(h(k))TA−1k−1
= A−1k−1+(h(k)− A−1k−1y(k))(h(k))TA−1k−1 (h(k))TA−1k−1y(k) .
28 / 35
師大
Newton-based methods
Advantage: high speed of convergence once a sufficiently accurate approximation
Weakness: an accurate initial approximation to the solution is needed to ensure convergence.
Steepest Descent method converges only linearly to the sol., but it will usually converge even for poor initial approximations.
“Find sufficiently accurate starting approximate solution by using Steepest Descent method” + ”Compute convergent solution by using Newton-based methods”
The method of Steepest Descent determines a local minimum for a multivariable function of g : Rn → R.
A system of the form fi(x1, . . . , xn) = 0, i = 1, 2, . . . , nhas a solution at x iff the function g defined by
g(x1, . . . , xn) =
n
X
i=1
[fi(x1, . . . , xn)]2
has the minimal value zero. 29 / 35
師大
Basic idea of steepest descent method:
(i) Evaluate g at an initial approximation x(0);
(ii) Determine a direction from x(0)that results in a decrease in the value of g;
(iii) Move an appropriate distance in this direction and call the new vector x(1);
(iv) Repeat steps (i) through (iii) with x(0)replaced by x(1). Definition 7 (Gradient)
If g : Rn→ R, the gradient, ∇g(x), at x is defined by
∇g(x) = ∂g
∂x1
(x), · · · , ∂g
∂xn
(x)
.
Definition 8 (Directional Derivative)
The directional derivative of g at x in the direction of v with k v k2= 1 is defined by
Dvg(x) = lim
h→0
g(x + hv) − g(x)
h = vT∇g(x).
30 / 35
師大
Theorem 9
The direction of the greatest decrease in the value of g at x is the direction given by−∇g(x).
Object: reduce g(x) to its minimal value zero.
⇒ for an initial approximation x(0), an appropriate choice for new vector x(1) is
x(1)= x(0)− α∇g(x(0)), for some constant α > 0.
Choose α > 0 such that g(x(1)) < g(x(0)): define h(α) = g(x(0)− α∇g(x(0))),
then find α∗ such that
h(α∗) = min
α h(α).
31 / 35
師大
How to find α∗?
Solve a root-finding problem h0(α) = 0 ⇒ Too costly, in general.
Choose three number α1< α2< α3, construct quadratic polynomial P (x) that interpolates h at α1, α2and α3, i.e.,
P (α1) = h(α1), P (α2) = h(α2), P (α3) = h(α3), to approximate h. Use the minimum value P ( ˆα)in [α1, α3] to approximate h(α∗). The new iteration is
x(1) = x(0)− ˆα∇g(x(0)).
Set α1 = 0to minimize the computation α3is found with h(α3) < h(α1).
Choose α2= α3/2.
32 / 35
師大
Example 10
Use the Steepest Descent method with x(0)= (0, 0, 0)T to find a reasonable starting approximation to the solution of the nonlinear system
f1(x1, x2, x3) = 3x1− cos(x2x3) −1 2 = 0,
f2(x1, x2, x3) = x21− 81(x2+ 0.1)2+ sin x3+ 1.06 = 0, f3(x1, x2, x3) = e−x1x2+ 20x3+10π − 3
3 = 0.
Let g(x1, x2, x3) = [f1(x1, x2, x3)]2+ [f2(x1, x2, x3)]2+ [f3(x1, x2, x3)]2. Then∇g(x1, x2, x3) ≡ ∇g(x)
=
2f1(x)∂f1
∂x1(x) + 2f2(x)∂f2
∂x1(x) + 2f3(x)∂f3
∂x1(x), 2f1(x)∂f1
∂x2
(x) + 2f2(x)∂f2
∂x2
(x) + 2f3(x)∂f3
∂x2
(x), 2f1(x)∂f1
∂x3(x) + 2f2(x)∂f2
∂x3(x) + 2f3(x)∂f3
∂x3(x)
33 / 35
師大
For x(0) = [0, 0, 0]T, we have
g(x(0)) = 111.975 and z0 = k∇g(x(0))k2 = 419.554.
Let z = 1
z0
∇g(x(0)) = [−0.0214514, −0.0193062, 0.999583]T. With α1 = 0, we have
g1= g(x(0)− α1z) = g(x(0)) = 111.975.
Let α3 = 1so that
g3 = g(x(0)− α3z) = 93.5649 < g1. Set α2 = α3/2 = 0.5. Thus
g2= g(x(0)− α2z) = 2.53557.
34 / 35
師大
Form quadratic polynomial P (α) defined as P (α) = g1+ h1α + h3α(α − α2)
that interpolates g(x(0)− αz) at α1= 0, α2= 0.5and α3= 1as follows g2= P (α2) = g1+ h1α2 ⇒ h1= g2− g1
α2 = −218.878, g3= P (α3) = g1+ h1α3+ h3α3(α3− α2) ⇒ h3= 400.937.
Thus
P (α) = 111.975 − 218.878α + 400.937α(α − 0.5) so that
0 = P0(α0) = −419.346 + 801.872α0 ⇒ α0= 0.522959 Since
g0= g(x(0)− α0z) = 2.32762 < min{g1, g3}, we set
x(1)= x(0)− α0z = [0.0112182, 0.0100964, −0.522741]T.
35 / 35