Tsung-Ming Huang
Matrix Computation, 2016, NTNU
Plan
- Gradient method
- Conjugate gradient method
- Preconditioner
Gradient method
We want to solve $Ax = b$, where $A$ is symmetric positive definite (s.p.d.).

Inner product: $\langle x, y\rangle = x^\top y$ for any $x, y \in \mathbb{R}^n$.

Define

$$g(x) = \langle x, Ax\rangle - 2\langle x, b\rangle = x^\top A x - 2x^\top b.$$

Definition. $A$ is symmetric positive definite if $A^\top = A$ and $x^\top A x > 0$ for all $x \neq 0$.

Theorem. Let $A$ be s.p.d. Then $x^*$ is the solution of $Ax = b$ if and only if

$$g(x^*) = \min_{x\in\mathbb{R}^n} g(x).$$
Proof. ($\Rightarrow$) Assume $x^*$ is the solution of $Ax = b$. Then

$$\begin{aligned}
g(x) &= \langle x - x^*, A(x - x^*)\rangle + \langle x, Ax^*\rangle + \langle x^*, Ax\rangle - \langle x^*, Ax^*\rangle - 2\langle x, b\rangle \\
&= \langle x - x^*, A(x - x^*)\rangle - \langle x^*, Ax^*\rangle + 2\langle x, Ax^*\rangle - 2\langle x, b\rangle \\
&= \langle x - x^*, A(x - x^*)\rangle - \langle x^*, Ax^*\rangle + 2\langle x, Ax^* - b\rangle.
\end{aligned}$$

Since $Ax^* = b$, the last term vanishes:

$$g(x) = \langle x - x^*, A(x - x^*)\rangle - \langle x^*, Ax^*\rangle.$$

Because $A$ is s.p.d., $\langle x - x^*, A(x - x^*)\rangle \ge 0$, so $g$ attains its minimum at $x = x^*$, i.e., $g(x^*) = \min_{x\in\mathbb{R}^n} g(x)$.
($\Leftarrow$) Assume $g(x^*) = \min_{x\in\mathbb{R}^n} g(x)$. Fix vectors $x$ and $v$, and for any $\alpha \in \mathbb{R}$ define

$$\begin{aligned}
f(\alpha) \equiv g(x + \alpha v) &= \langle x + \alpha v, Ax + \alpha Av\rangle - 2\langle x + \alpha v, b\rangle \\
&= \langle x, Ax\rangle + \alpha\langle v, Ax\rangle + \alpha\langle x, Av\rangle + \alpha^2\langle v, Av\rangle - 2\langle x, b\rangle - 2\alpha\langle v, b\rangle \\
&= g(x) + 2\alpha\langle v, Ax - b\rangle + \alpha^2\langle v, Av\rangle.
\end{aligned}$$

$f$ is a quadratic function of $\alpha$, and since $A$ is s.p.d., $\langle v, Av\rangle > 0$, so $f$ has a minimal value where $f'(\alpha) = 0$:

$$f'(\hat\alpha) = 2\langle v, Ax - b\rangle + 2\hat\alpha\langle v, Av\rangle = 0
\quad\Longrightarrow\quad
\hat\alpha = -\frac{\langle v, Ax - b\rangle}{\langle v, Av\rangle} = \frac{\langle v, b - Ax\rangle}{\langle v, Av\rangle}.$$

Substituting $\hat\alpha$ into $f$,

$$g(x + \hat\alpha v) = f(\hat\alpha)
= g(x) - 2\,\frac{\langle v, b - Ax\rangle}{\langle v, Av\rangle}\langle v, b - Ax\rangle + \left(\frac{\langle v, b - Ax\rangle}{\langle v, Av\rangle}\right)^2\langle v, Av\rangle
= g(x) - \frac{\langle v, b - Ax\rangle^2}{\langle v, Av\rangle}.$$

Hence, for every $v \neq 0$,

$$g(x + \hat\alpha v) = g(x) \;\text{ if }\; \langle v, b - Ax\rangle = 0,
\qquad
g(x + \hat\alpha v) < g(x) \;\text{ if }\; \langle v, b - Ax\rangle \neq 0.$$

Now suppose $g(x^*) = \min_{x\in\mathbb{R}^n} g(x)$. Then $g(x^* + \hat\alpha v) \ge g(x^*)$ for any $v$, which forces $\langle v, b - Ax^*\rangle = 0$ for all $v$, and therefore $Ax^* = b$. $\blacksquare$
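As a quick numerical sanity check of the theorem (a sketch, not from the slides; it assumes NumPy and SciPy are available), minimizing $g$ directly recovers the solution of $Ax = b$:

```python
import numpy as np
from scipy.optimize import minimize

# A = B^T B + I is symmetric positive definite.
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B.T @ B + np.eye(4)
b = rng.standard_normal(4)

g = lambda x: x @ A @ x - 2 * (x @ b)    # g(x) = <x, Ax> - 2<x, b>

x_star = np.linalg.solve(A, b)           # exact solution of Ax = b
x_min = minimize(g, np.zeros(4)).x       # unconstrained minimizer of g

print(np.allclose(x_star, x_min, atol=1e-5))  # True: the minimizer solves Ax = b
```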
Write

$$\hat\alpha = \frac{\langle v, b - Ax\rangle}{\langle v, Av\rangle} = \frac{\langle v, r\rangle}{\langle v, Av\rangle},
\qquad r \equiv b - Ax.$$

If $r \neq 0$ and $\langle v, r\rangle \neq 0$, then

$$g(x + \hat\alpha v) = g(x) - \frac{\langle v, b - Ax\rangle^2}{\langle v, Av\rangle} < g(x),$$

i.e., $x + \hat\alpha v$ is closer to $x^*$ than $x$ is. This suggests an iteration: given $x^{(0)}$ and $v^{(1)} \neq 0$, for $k = 1, 2, 3, \ldots$

$$\alpha_k = \frac{\langle v^{(k)}, b - Ax^{(k-1)}\rangle}{\langle v^{(k)}, Av^{(k)}\rangle},
\qquad x^{(k)} = x^{(k-1)} + \alpha_k v^{(k)},$$

then choose a new search direction $v^{(k+1)}$. Taking $v^{(k)}$ to be the direction of steepest descent gives the method below.
Question. How to choose $\{v^{(k)}\}$ such that $x^{(k)} \to x^*$ rapidly?

Let $\Phi : \mathbb{R}^n \to \mathbb{R}$ be differentiable at $x$. Then

$$\frac{\Phi(x + \varepsilon p) - \Phi(x)}{\varepsilon} = \nabla\Phi(x)^\top p + O(\varepsilon).$$

Neglecting $O(\varepsilon)$, over all $p$ with $\|p\| = 1$ the right-hand side takes its minimum at

$$p = -\frac{\nabla\Phi(x)}{\|\nabla\Phi(x)\|}$$

(i.e., the direction of largest descent).
Steepest descent direction of g

Denote $x = [x_1, x_2, \ldots, x_n]^\top$. Then

$$g(x) = \langle x, Ax\rangle - 2\langle x, b\rangle = \sum_{i=1}^n \sum_{j=1}^n a_{ij}\, x_i x_j - 2\sum_{i=1}^n x_i b_i,$$

$$\frac{\partial g}{\partial x_k}(x) = 2\sum_{i=1}^n a_{ki}\, x_i - 2b_k = 2\left(A(k,:)\,x - b_k\right),$$

$$\nabla g(x) = \left[\frac{\partial g}{\partial x_1}(x), \frac{\partial g}{\partial x_2}(x), \ldots, \frac{\partial g}{\partial x_n}(x)\right]^\top = 2(Ax - b) = -2r.$$

So the steepest descent direction of $g$ at $x$ is the residual $r = b - Ax$.
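A finite-difference check of $\nabla g(x) = 2(Ax - b)$ (a minimal sketch; the test matrix is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
A = B.T @ B + np.eye(5)                  # symmetric positive definite
b = rng.standard_normal(5)
x = rng.standard_normal(5)

g = lambda y: y @ A @ y - 2 * (y @ b)
eps = 1e-6
# Central differences, one coordinate direction at a time.
fd = np.array([(g(x + eps * e) - g(x - eps * e)) / (2 * eps) for e in np.eye(5)])

print(np.allclose(fd, 2 * (A @ x - b)))  # True: grad g = 2(Ax - b) = -2r
```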
Steepest descent method (gradient method)

Given $x^{(0)} \neq 0$.
For $k = 1, 2, 3, \ldots$
    $r_{k-1} = b - Ax^{(k-1)}$
    If $r_{k-1} = 0$, then Stop;
    Else
        $\alpha_k = \langle r_{k-1}, r_{k-1}\rangle / \langle r_{k-1}, Ar_{k-1}\rangle$
        $x^{(k)} = x^{(k-1)} + \alpha_k r_{k-1}$
    End
End

Convergence Theorem. Let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n > 0$ be the eigenvalues of $A$, $x^{(k)}, x^{(k-1)}$ the approximate solutions, and $x^*$ the exact solution. Then

$$\|x^{(k)} - x^*\|_A \le \left(\frac{\lambda_1 - \lambda_n}{\lambda_1 + \lambda_n}\right)\|x^{(k-1)} - x^*\|_A,$$

where $\|x\|_A = \sqrt{x^\top A x}$.
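The method transcribes directly into NumPy (a minimal sketch; the exact test $r_{k-1} = 0$ is replaced by a tolerance, and the names are assumptions, not from the slides):

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, maxit=10_000):
    """Gradient (steepest descent) method for symmetric positive definite A."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(maxit):
        r = b - A @ x                    # residual = steepest descent direction
        if np.linalg.norm(r) < tol:      # r = 0 up to rounding: done
            break
        alpha = (r @ r) / (r @ (A @ r))  # exact line search along r
        x = x + alpha * r
    return x
```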
Conjugate gradient method

If $\kappa(A) = \lambda_1/\lambda_n$ is large, then

$$\frac{\lambda_1 - \lambda_n}{\lambda_1 + \lambda_n} \approx 1,$$

so convergence of the gradient method is very slow and the method is not recommended. Improvement: choose A-orthogonal search directions.

Definition. $p, q \in \mathbb{R}^n$ are called A-orthogonal (A-conjugate) if $p^\top A q = 0$.
Lemma. If $v_1, \ldots, v_n \neq 0$ are pairwise A-conjugate, then $v_1, \ldots, v_n$ are linearly independent.

Proof. Suppose $0 = \sum_{j=1}^n c_j v_j$. Then for each $k$,

$$0 = (v_k)^\top A\left(\sum_{j=1}^n c_j v_j\right) = \sum_{j=1}^n c_j\,(v_k)^\top A v_j = c_k\,(v_k)^\top A v_k,$$

and since $(v_k)^\top A v_k > 0$, it follows that $c_k = 0$ for $k = 1, \ldots, n$. Hence $v_1, \ldots, v_n$ are linearly independent. $\blacksquare$
Theorem. Let $A$ be symmetric positive definite, let $v_1, \ldots, v_n \neq 0 \in \mathbb{R}^n$ be pairwise A-conjugate, and let $x_0$ be given. For $k = 1, \ldots, n$, let

$$\alpha_k = \frac{\langle v_k, b - Ax_{k-1}\rangle}{\langle v_k, Av_k\rangle},
\qquad x_k = x_{k-1} + \alpha_k v_k.$$

Then $Ax_n = b$ and $\langle b - Ax_k, v_j\rangle = 0$ for $j = 1, 2, \ldots, k$.
Proof. From $x_k = x_{k-1} + \alpha_k v_k$,

$$Ax_n = Ax_{n-1} + \alpha_n Av_n = (Ax_{n-2} + \alpha_{n-1}Av_{n-1}) + \alpha_n Av_n = \cdots = Ax_0 + \alpha_1 Av_1 + \alpha_2 Av_2 + \cdots + \alpha_n Av_n.$$

Hence, for each $k$,

$$\langle Ax_n - b, v_k\rangle = \langle Ax_0 - b, v_k\rangle + \alpha_1\langle Av_1, v_k\rangle + \cdots + \alpha_n\langle Av_n, v_k\rangle = \langle Ax_0 - b, v_k\rangle + \alpha_k\langle v_k, Av_k\rangle,$$

since $\langle Av_j, v_k\rangle = \langle v_j, Av_k\rangle = 0$ for $j \neq k$. By the definition of $\alpha_k$, $\alpha_k\langle v_k, Av_k\rangle = \langle v_k, b - Ax_{k-1}\rangle$, so

$$\langle Ax_n - b, v_k\rangle = \langle Ax_0 - b, v_k\rangle + \langle v_k, b - Ax_{k-1}\rangle.$$

Writing $b - Ax_{k-1}$ as a telescoping sum,

$$\begin{aligned}
\langle Ax_n - b, v_k\rangle
&= \langle Ax_0 - b, v_k\rangle + \langle v_k,\, b - Ax_0 + Ax_0 - Ax_1 + \cdots + Ax_{k-2} - Ax_{k-1}\rangle \\
&= \langle Ax_0 - b, v_k\rangle + \langle v_k, b - Ax_0\rangle + \langle v_k, Ax_0 - Ax_1\rangle + \cdots + \langle v_k, Ax_{k-2} - Ax_{k-1}\rangle \\
&= \langle v_k, Ax_0 - Ax_1\rangle + \cdots + \langle v_k, Ax_{k-2} - Ax_{k-1}\rangle.
\end{aligned}$$

Since $x_i = x_{i-1} + \alpha_i v_i$ for all $i$, we have $Ax_{i-1} - Ax_i = -\alpha_i Av_i$, and therefore

$$\langle Ax_n - b, v_k\rangle = -\alpha_1\langle v_k, Av_1\rangle - \cdots - \alpha_{k-1}\langle v_k, Av_{k-1}\rangle = 0.$$

This holds for every $k$, and since $v_1, \ldots, v_n$ span $\mathbb{R}^n$, it follows that $Ax_n = b$.
For the second claim, $\langle b - Ax_k, v_j\rangle = 0$ for $j = 1, 2, \ldots, k$, we use induction: assume $\langle r_{k-1}, v_j\rangle = 0$ for $j = 1, \ldots, k-1$. Since

$$r_k = b - Ax_k = b - A(x_{k-1} + \alpha_k v_k) = r_{k-1} - \alpha_k Av_k,$$

we get

$$\langle r_k, v_k\rangle = \langle r_{k-1}, v_k\rangle - \alpha_k\langle Av_k, v_k\rangle = \langle r_{k-1}, v_k\rangle - \frac{\langle v_k, b - Ax_{k-1}\rangle}{\langle v_k, Av_k\rangle}\,\langle Av_k, v_k\rangle = 0,$$

and for $j = 1, \ldots, k-1$,

$$\langle r_k, v_j\rangle = \langle r_{k-1}, v_j\rangle - \alpha_k\langle Av_k, v_j\rangle = 0$$

by the induction assumption and A-conjugacy, which completes the proof by mathematical induction. $\blacksquare$
Method of conjugate directions

Given $x^{(0)}$ and pairwise A-orthogonal $v_1, \ldots, v_n \in \mathbb{R}^n \setminus \{0\}$; $r_0 = b - Ax^{(0)}$.
For $k = 1, \ldots, n$
    $\alpha_k = \langle v_k, r_{k-1}\rangle / \langle v_k, Av_k\rangle$
    $x^{(k)} = x^{(k-1)} + \alpha_k v_k$
    $r_k = r_{k-1} - \alpha_k Av_k \;(= b - Ax^{(k)})$
End
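A sketch of this loop in NumPy (it assumes the columns of `V` are already pairwise A-orthogonal; the function name is hypothetical):

```python
import numpy as np

def conjugate_directions(A, b, x0, V):
    """One pass over n pairwise A-orthogonal directions (the columns of V)."""
    x = np.asarray(x0, dtype=float).copy()
    r = b - A @ x
    for k in range(V.shape[1]):
        v = V[:, k]
        alpha = (v @ r) / (v @ (A @ v))  # alpha_k = <v_k, r_{k-1}> / <v_k, A v_k>
        x = x + alpha * v
        r = r - alpha * (A @ v)          # r_k = r_{k-1} - alpha_k A v_k
    return x                             # by the theorem, Ax = b after n steps
```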
Question. How to find A-orthogonal search directions?
A-orthogonalization

Ordinary orthogonalization: given $v_1, v_2$, set $\tilde v_2 = v_2 - \alpha v_1 \perp v_1$:

$$0 = v_1^\top \tilde v_2 = v_1^\top v_2 - \alpha\, v_1^\top v_1
\quad\Longrightarrow\quad
\alpha = \frac{v_1^\top v_2}{v_1^\top v_1}.$$

A-orthogonalization: set $\tilde v_2 = v_2 - \alpha v_1$ with $\tilde v_2 \perp Av_1$:

$$0 = v_1^\top A \tilde v_2 = v_1^\top A v_2 - \alpha\, v_1^\top A v_1
\quad\Longrightarrow\quad
\alpha = \frac{v_1^\top A v_2}{v_1^\top A v_1}.$$
Hence

$$\tilde v_2 = v_2 - \frac{v_1^\top A v_2}{v_1^\top A v_1}\, v_1 \perp Av_1,$$

so $\{v_1, v_2\}$ is replaced by the A-orthogonal set $\{v_1, \tilde v_2\}$. Similarly, $\{v_1, v_2, v_3\}$ is replaced by the A-orthogonal set $\{v_1, \tilde v_2, \tilde v_3\}$, where

$$\tilde v_3 = v_3 - \alpha_1 v_1 - \alpha_2 \tilde v_2 \perp A\{v_1, \tilde v_2\},$$

$$0 = v_1^\top A \tilde v_3 = v_1^\top A v_3 - \alpha_1\, v_1^\top A v_1
\quad\Longrightarrow\quad
\alpha_1 = v_1^\top A v_3 / v_1^\top A v_1,$$

$$0 = \tilde v_2^\top A \tilde v_3 = \tilde v_2^\top A v_3 - \alpha_2\, \tilde v_2^\top A \tilde v_2
\quad\Longrightarrow\quad
\alpha_2 = \tilde v_2^\top A v_3 / \tilde v_2^\top A \tilde v_2.$$
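This is Gram-Schmidt in the A-inner product $\langle p, q\rangle_A = p^\top A q$; a sketch for a full set of directions (hypothetical helper, not from the slides):

```python
import numpy as np

def a_orthogonalize(A, V):
    """A-orthogonalize the linearly independent columns of V.

    Returns W with W[:, i]^T A W[:, j] = 0 for i != j.
    """
    W = np.asarray(V, dtype=float).copy()
    for k in range(W.shape[1]):
        for j in range(k):
            wj = W[:, j]
            alpha = (wj @ (A @ W[:, k])) / (wj @ (A @ wj))
            W[:, k] -= alpha * wj        # remove the A-component along w_j
    return W
```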
Practical Implementation

Given $x^{(0)}$, compute $r_0 = b - Ax^{(0)}$ and take $v_1 = r_0$ (the steepest descent direction):

$$\alpha_1 = \frac{\langle v_1, r_0\rangle}{\langle v_1, Av_1\rangle},
\qquad x^{(1)} = x^{(0)} + \alpha_1 v_1,
\qquad r_1 = r_0 - \alpha_1 Av_1.$$

$\{v_1, r_1\}$ is NOT A-orthogonal, so construct an A-orthogonal vector by setting

$$v_2 = r_1 + \beta_1 v_1,
\qquad \beta_1 = -\frac{\langle v_1, Ar_1\rangle}{\langle v_1, Av_1\rangle},$$

and then

$$\alpha_2 = \frac{\langle v_2, r_1\rangle}{\langle v_2, Av_2\rangle},
\qquad x^{(2)} = x^{(1)} + \alpha_2 v_2,
\qquad r_2 = r_1 - \alpha_2 Av_2.$$
Next, given $\{v_1, v_2, r_2\}$, construct an A-orthogonal vector

$$v_3 = r_2 + \beta_{21} v_1 + \beta_{22} v_2,
\qquad \beta_{21} = -\frac{v_1^\top A r_2}{v_1^\top A v_1},
\qquad \beta_{22} = -\frac{v_2^\top A r_2}{v_2^\top A v_2}.$$

Several identities simplify $\beta_{21}$. First,

$$v_2^\top r_2 = v_2^\top r_1 - \alpha_2\, v_2^\top A v_2 = v_2^\top r_1 - \frac{v_2^\top r_1}{v_2^\top A v_2}\, v_2^\top A v_2 = 0.$$

Second, from $r_1 = r_0 - \alpha_1 Av_1$, i.e., $Av_1 = \alpha_1^{-1}(r_0 - r_1)$,

$$v_1^\top A r_2 = r_2^\top A v_1 = \alpha_1^{-1}\left(r_2^\top r_0 - r_2^\top r_1\right).$$

Third, using $v_1^\top A v_2 = 0$,

$$\begin{aligned}
0 = v_2^\top r_2 &= \left(r_1^\top + \beta_1 v_1^\top\right) r_2 = r_1^\top r_2 + \beta_1 v_1^\top r_2
= r_1^\top r_2 + \beta_1 v_1^\top\left(r_1 - \alpha_2 Av_2\right) = r_1^\top r_2 + \beta_1 v_1^\top r_1 \\
&= r_1^\top r_2 + \beta_1 v_1^\top\left(r_0 - \alpha_1 Av_1\right)
= r_1^\top r_2 + \beta_1\left(v_1^\top r_0 - \frac{\langle v_1, r_0\rangle}{\langle v_1, Av_1\rangle}\, v_1^\top A v_1\right)
= r_1^\top r_2,
\end{aligned}$$

so $r_1^\top r_2 = 0$.
25
r1 = r0 − α1Av1, α1 = < v1,r0 >
< v1, Av1 >
< v1,r1 > = < v1,r0 > −α1 < v1, Av1 > = 0
< r2,r0 > = < r2,v1 > = < r1,v1 > −α2 < Av2,v1 > = 0 v1⊤Ar2 =
α
1−1(
r2⊤r0 − r2⊤r1)
= 0β21 = − v1⊤Ar2
v1⊤Av1 = 0
v3 = r2 + β2v2, β2 = − v2⊤Ar2 v2⊤Av2
In the general case, take

$$v_k = r_{k-1} + \beta_{k-1} v_{k-1} \quad \text{if } r_{k-1} \neq 0,$$

where $\beta_{k-1}$ is determined by requiring $\langle v_{k-1}, Av_k\rangle = 0$:

$$0 = \langle v_{k-1}, Av_k\rangle = \langle v_{k-1}, Ar_{k-1} + \beta_{k-1} Av_{k-1}\rangle = \langle v_{k-1}, Ar_{k-1}\rangle + \beta_{k-1}\langle v_{k-1}, Av_{k-1}\rangle,$$

$$\beta_{k-1} = -\frac{\langle v_{k-1}, Ar_{k-1}\rangle}{\langle v_{k-1}, Av_{k-1}\rangle}.$$

Theorem. (i) $\{r_0, r_1, \ldots, r_{k-1}\}$ is an orthogonal set. (ii) $\{v_1, \ldots, v_k\}$ is an A-orthogonal set.
Reformulation of $\alpha_k$ and $\beta_k$

Using $v_k = r_{k-1} + \beta_{k-1} v_{k-1}$ and $\langle v_{k-1}, r_{k-1}\rangle = 0$,

$$\alpha_k = \frac{\langle v_k, r_{k-1}\rangle}{\langle v_k, Av_k\rangle}
= \frac{\langle r_{k-1} + \beta_{k-1}v_{k-1}, r_{k-1}\rangle}{\langle v_k, Av_k\rangle}
= \frac{\langle r_{k-1}, r_{k-1}\rangle}{\langle v_k, Av_k\rangle} + \beta_{k-1}\frac{\langle v_{k-1}, r_{k-1}\rangle}{\langle v_k, Av_k\rangle}
= \frac{\langle r_{k-1}, r_{k-1}\rangle}{\langle v_k, Av_k\rangle},$$

so $\langle r_{k-1}, r_{k-1}\rangle = \alpha_k\langle v_k, Av_k\rangle$. From $r_k = r_{k-1} - \alpha_k Av_k$ and the orthogonality $\langle r_{k-1}, r_k\rangle = 0$,

$$\langle r_k, r_k\rangle = \langle r_{k-1}, r_k\rangle - \alpha_k\langle Av_k, r_k\rangle = -\alpha_k\langle r_k, Av_k\rangle.$$

Therefore

$$\beta_k = -\frac{\langle v_k, Ar_k\rangle}{\langle v_k, Av_k\rangle}
= -\frac{\langle r_k, Av_k\rangle}{\langle v_k, Av_k\rangle}
= \frac{\langle r_k, r_k\rangle}{\alpha_k\langle v_k, Av_k\rangle}
= \frac{\langle r_k, r_k\rangle}{\langle r_{k-1}, r_{k-1}\rangle}.$$
Algorithm (Conjugate Gradient Method)

Given $x^{(0)}$, compute $r_0 = b - Ax^{(0)} = v_0$.
For $k = 0, 1, \ldots$
    $\alpha_k = \langle r_k, r_k\rangle / \langle v_k, Av_k\rangle$
    $x^{(k+1)} = x^{(k)} + \alpha_k v_k$
    $r_{k+1} = r_k - \alpha_k Av_k$
    If $r_{k+1} = 0$, then Stop;
    Else
        $\beta_k = \langle r_{k+1}, r_{k+1}\rangle / \langle r_k, r_k\rangle$
        $v_{k+1} = r_{k+1} + \beta_k v_k$
    End
End

By the theorem, $Ax_n = b$ in exact arithmetic, so for a well-conditioned $A$ one expects $\|r_n\| < \text{tol}$ within $n$ steps; for an ill-conditioned $A$, rounding errors can require $k > n$ iterations before $\|r_k\| < \text{tol}$.
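The algorithm in NumPy (a minimal sketch; the exact test $r_{k+1} = 0$ again becomes a tolerance test):

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10, maxit=None):
    """Conjugate gradient method for symmetric positive definite A."""
    x = np.asarray(x0, dtype=float).copy()
    r = b - A @ x
    v = r.copy()                          # v_0 = r_0
    rr = r @ r
    for _ in range(maxit or len(b)):
        Av = A @ v
        alpha = rr / (v @ Av)             # <r_k, r_k> / <v_k, A v_k>
        x = x + alpha * v
        r = r - alpha * Av
        rr_new = r @ r
        if np.sqrt(rr_new) < tol:
            break
        beta = rr_new / rr                # <r_{k+1}, r_{k+1}> / <r_k, r_k>
        v = r + beta * v
        rr = rr_new
    return x
```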
Convergence Theorem. Let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n > 0$ be the eigenvalues of $A$, $\kappa = \lambda_1/\lambda_n$, $x^*$ the exact solution, $\{x_G^{(k)}\}$ the iterates produced by the gradient method, and $\{x^{(k)}\}$ the iterates produced by the CG method. Then

$$\|x_G^{(k)} - x^*\|_A \le \left(\frac{\lambda_1 - \lambda_n}{\lambda_1 + \lambda_n}\right)^k \|x_G^{(0)} - x^*\|_A
= \left(\frac{\kappa - 1}{\kappa + 1}\right)^k \|x_G^{(0)} - x^*\|_A,$$

$$\|x^{(k)} - x^*\|_A \le 2\left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^k \|x^{(0)} - x^*\|_A.$$

Since

$$\frac{\kappa - 1}{\kappa + 1} > \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1},$$

CG is much better than the gradient method.
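Comparing the two contraction factors numerically makes the gap concrete (a sketch):

```python
import numpy as np

for kappa in [10.0, 100.0, 10_000.0]:
    grad_factor = (kappa - 1) / (kappa + 1)
    cg_factor = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)
    print(f"kappa = {kappa:8.0f}: gradient {grad_factor:.6f}, CG {cg_factor:.6f}")

# For kappa = 10^4 the gradient factor is ~0.9998 while CG's is ~0.9802,
# i.e., this bound shrinks roughly 100x faster per CG iteration.
```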
Preconditioner

Transform $Ax = b$ into

$$\tilde A\tilde x = \tilde b,
\qquad \tilde A = C^{-1}AC^{-\top},
\quad \tilde x = C^\top x,
\quad \tilde b = C^{-1}b,$$

using $Ax = b \iff \left(C^{-1}AC^{-\top}\right)C^\top x = C^{-1}b$.

Goal: choose $C$ such that $\kappa(C^{-1}AC^{-\top}) < \kappa(A)$.

One could apply the CG method to $\tilde A\tilde x = \tilde b$, get $\tilde x$, and solve $x = C^{-\top}\tilde x$; that is nothing new. Question: apply the CG method to $\tilde A\tilde x = \tilde b$ and get $x$ directly.
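A quick check of the goal with the simple diagonal choice $C = D^{1/2}$ (an illustrative assumption; the slides do not fix a particular $C$ here):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
# An s.p.d. matrix with a badly scaled diagonal.
G = rng.standard_normal((n, n))
A = G @ G.T + np.diag(10.0 ** rng.uniform(0, 4, n))

d = np.sqrt(np.diag(A))
A_tilde = A / np.outer(d, d)             # C^{-1} A C^{-T} with C = diag(d)

print(np.linalg.cond(A))                 # kappa(A)
print(np.linalg.cond(A_tilde))           # kappa(C^{-1} A C^{-T}): typically much smaller
```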
Algorithm (Conjugate Gradient Method for $\tilde A\tilde x = \tilde b$)

Given $\tilde x^{(0)}$, compute $\tilde r_0 = \tilde b - \tilde A\tilde x^{(0)} = \tilde v_0$.
For $k = 0, 1, \ldots$
    $\tilde\alpha_k = \langle \tilde r_k, \tilde r_k\rangle / \langle \tilde v_k, \tilde A\tilde v_k\rangle$
    $\tilde x^{(k+1)} = \tilde x^{(k)} + \tilde\alpha_k\tilde v_k$
    $\tilde r_{k+1} = \tilde r_k - \tilde\alpha_k\tilde A\tilde v_k$
    If $\tilde r_{k+1} = 0$, then Stop;
    $\tilde\beta_k = \langle \tilde r_{k+1}, \tilde r_{k+1}\rangle / \langle \tilde r_k, \tilde r_k\rangle$
    $\tilde v_{k+1} = \tilde r_{k+1} + \tilde\beta_k\tilde v_k$
End

Now translate each quantity back to the original variables. The residual satisfies

$$\tilde r_{k+1} = \tilde b - \tilde A\tilde x^{(k+1)} = C^{-1}b - \left(C^{-1}AC^{-\top}\right)C^\top x^{(k+1)} = C^{-1}\left(b - Ax^{(k+1)}\right) = C^{-1}r_{k+1}.$$

Let $\tilde v_k = C^\top v_k$ and $w_k = C^{-1}r_k$. Then

$$\tilde\beta_k = \frac{\langle C^{-1}r_{k+1}, C^{-1}r_{k+1}\rangle}{\langle C^{-1}r_k, C^{-1}r_k\rangle} = \frac{\langle w_{k+1}, w_{k+1}\rangle}{\langle w_k, w_k\rangle}.$$
Similarly,

$$\tilde\alpha_k = \frac{\langle \tilde r_k, \tilde r_k\rangle}{\langle \tilde v_k, \tilde A\tilde v_k\rangle}
= \frac{\langle C^{-1}r_k, C^{-1}r_k\rangle}{\langle C^\top v_k, C^{-1}AC^{-\top}C^\top v_k\rangle}
= \frac{\langle w_k, w_k\rangle}{\langle C^\top v_k, C^{-1}Av_k\rangle},$$

and since

$$\langle C^\top v_k, C^{-1}Av_k\rangle = v_k^\top C C^{-1} A v_k = v_k^\top A v_k,$$

we obtain

$$\tilde\alpha_k = \frac{\langle w_k, w_k\rangle}{\langle v_k, Av_k\rangle}.$$
The updates translate likewise. With $\tilde v_k = C^\top v_k$ and $w_k = C^{-1}r_k$:

$$\tilde x^{(k+1)} = \tilde x^{(k)} + \tilde\alpha_k\tilde v_k
\;\Longrightarrow\; C^\top x^{(k+1)} = C^\top x^{(k)} + \tilde\alpha_k C^\top v_k
\;\Longrightarrow\; x^{(k+1)} = x^{(k)} + \tilde\alpha_k v_k,$$

$$\tilde r_{k+1} = \tilde r_k - \tilde\alpha_k\tilde A\tilde v_k
\;\Longrightarrow\; C^{-1}r_{k+1} = C^{-1}r_k - \tilde\alpha_k C^{-1}AC^{-\top}C^\top v_k
\;\Longrightarrow\; r_{k+1} = r_k - \tilde\alpha_k Av_k,$$

$$\tilde v_{k+1} = \tilde r_{k+1} + \tilde\beta_k\tilde v_k
\;\Longrightarrow\; C^\top v_{k+1} = C^{-1}r_{k+1} + \tilde\beta_k C^\top v_k
\;\Longrightarrow\; v_{k+1} = C^{-\top}C^{-1}r_{k+1} + \tilde\beta_k v_k = C^{-\top}w_{k+1} + \tilde\beta_k v_k.$$
Rewriting everything in the original variables, the iteration becomes

$$x^{(k+1)} = x^{(k)} + \tilde\alpha_k v_k,
\qquad r_{k+1} = r_k - \tilde\alpha_k Av_k,
\qquad v_{k+1} = C^{-\top}w_{k+1} + \tilde\beta_k v_k,$$

where each step solves $Cw_{k+1} = r_{k+1}$ for $w_{k+1} = C^{-1}r_{k+1}$. To start, we need $w_0 = C^{-1}r_0 = C^{-1}(b - Ax^{(0)})$ and $v_0 = C^{-\top}w_0$.
Algorithm (CG Method with preconditioner C)

Given $C$ and $x^{(0)}$, compute $r_0 = b - Ax^{(0)}$; solve $Cw_0 = r_0$ and $C^\top v_0 = w_0$.
For $k = 0, 1, \ldots$
    $\alpha_k = \langle w_k, w_k\rangle / \langle v_k, Av_k\rangle$
    $x^{(k+1)} = x^{(k)} + \alpha_k v_k$
    $r_{k+1} = r_k - \alpha_k Av_k$
    If $r_{k+1} = 0$, then Stop;
    Solve $Cw_{k+1} = r_{k+1}$ and $C^\top z_{k+1} = w_{k+1}$
    $\beta_k = \langle w_{k+1}, w_{k+1}\rangle / \langle w_k, w_k\rangle$
    $v_{k+1} = z_{k+1} + \beta_k v_k$
End

Since $z_{k+1} = C^{-\top}C^{-1}r_{k+1}$, we have $r_{k+1} = CC^\top z_{k+1} \equiv Mz_{k+1}$, and the inner products can be written without $C$:

$$\beta_k = \frac{\langle C^{-1}r_{k+1}, C^{-1}r_{k+1}\rangle}{\langle C^{-1}r_k, C^{-1}r_k\rangle} = \frac{\langle z_{k+1}, r_{k+1}\rangle}{\langle z_k, r_k\rangle},
\qquad
\alpha_k = \frac{\langle C^{-1}r_k, C^{-1}r_k\rangle}{\langle C^\top v_k, C^{-1}Av_k\rangle} = \frac{\langle z_k, r_k\rangle}{\langle v_k, Av_k\rangle}.$$
Algorithm (CG Method with preconditioner M)

Given $M$ and $x^{(0)}$, compute $r_0 = b - Ax^{(0)}$; solve $Mz_0 = r_0$ and set $v_0 = z_0$.
For $k = 0, 1, \ldots$
    Compute $\alpha_k = \langle z_k, r_k\rangle / \langle v_k, Av_k\rangle$
    Compute $x^{(k+1)} = x^{(k)} + \alpha_k v_k$
    Compute $r_{k+1} = r_k - \alpha_k Av_k$
    If $r_{k+1} = 0$, then Stop;
    Solve $Mz_{k+1} = r_{k+1}$
    Compute $\beta_k = \langle z_{k+1}, r_{k+1}\rangle / \langle z_k, r_k\rangle$
    Compute $v_{k+1} = z_{k+1} + \beta_k v_k$
End
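In NumPy, with the preconditioner supplied as a solve callback (a sketch; `M_solve(r)` is assumed to return $M^{-1}r$):

```python
import numpy as np

def pcg(A, b, x0, M_solve, tol=1e-10, maxit=None):
    """CG with an s.p.d. preconditioner M, given as a callback r -> M^{-1} r."""
    x = np.asarray(x0, dtype=float).copy()
    r = b - A @ x
    z = M_solve(r)                        # solve M z_0 = r_0
    v = z.copy()                          # v_0 = z_0
    zr = z @ r
    for _ in range(maxit or len(b)):
        Av = A @ v
        alpha = zr / (v @ Av)             # <z_k, r_k> / <v_k, A v_k>
        x = x + alpha * v
        r = r - alpha * Av
        if np.linalg.norm(r) < tol:
            break
        z = M_solve(r)                    # solve M z_{k+1} = r_{k+1}
        zr_new = z @ r
        beta = zr_new / zr                # <z_{k+1}, r_{k+1}> / <z_k, r_k>
        v = z + beta * v
        zr = zr_new
    return x
```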
Choices of M (Criterion)

- $\operatorname{cond}(M^{-1/2}AM^{-1/2})$ is nearly 1, i.e., $M^{-1/2}AM^{-1/2} \approx I$, $A \approx M$.
- The linear system $Mz = r$ must be easily solved, e.g., $M = LL^\top$.
- $M$ is symmetric positive definite.
MPreconditioner M
Jacobi method
39
A = D + (L +U), M = D
xk+1 = −D−1(L +U)xk + D−1b
= −D−1(A − D)xk + D−1b
= xk + D−1rk
Gauss-Seidel
A = (D + L) +U, M = D + L xk+1 = −(D + L)−1Uxk + (D + L)−1b= (D + L)−1(D + L − A)xk + (D + L)−1b
= xk + (D + L)−1rk
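Both splittings plug directly into the `pcg` sketch above; for Jacobi, $M = D$ is just a diagonal solve (the test matrix is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
G = rng.standard_normal((n, n))
A = G @ G.T + np.diag(10.0 ** rng.uniform(0, 3, n))  # s.p.d., badly scaled
b = rng.standard_normal(n)

d = np.diag(A)
x = pcg(A, b, np.zeros(n), M_solve=lambda r: r / d)  # Jacobi: M = D
print(np.linalg.norm(b - A @ x))
```

For Gauss-Seidel, $M = D + L$ is lower triangular, so `M_solve` becomes a forward substitution (e.g., `scipy.linalg.solve_triangular(np.tril(A), r, lower=True)`); note, though, that $D + L$ is not symmetric, so this choice does not satisfy the s.p.d. criterion above.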