Gradient method
(1)

Tsung-Ming Huang


Matrix Computation, 2016, NTNU

1

(2)

Plan

Gradient method

Conjugate gradient method

Preconditioner

2

(3)

Gradient method

3

(4)

Theorem

4

Definition

A is symmetric positive definite (s.p.d.) if A = Aᵀ and xᵀAx > 0 for all x ≠ 0.

Consider Ax = b with A : s.p.d., and the inner product ⟨x, y⟩ = xᵀy for any x, y ∈ ℝⁿ.

Define g(x) = ⟨x, Ax⟩ − 2⟨x, b⟩ = xᵀAx − 2xᵀb

Theorem

Let A be s.p.d. Then x* is the solution of Ax = b  ⟺  g(x*) = min_{x∈ℝⁿ} g(x)
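A quick numerical sanity check of this equivalence (a sketch in NumPy; the 2×2 matrix and right-hand side are arbitrary illustrative choices, not from the slides):

```python
import numpy as np

# Arbitrary 2x2 s.p.d. example
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

def g(x):
    # g(x) = <x, Ax> - 2 <x, b>
    return x @ A @ x - 2 * (x @ b)

x_star = np.linalg.solve(A, b)   # exact solution of Ax = b

# g at the solution is no larger than g at nearby perturbed points
rng = np.random.default_rng(0)
for _ in range(5):
    x = x_star + 0.1 * rng.standard_normal(2)
    assert g(x_star) <= g(x)
print("g(x*) =", g(x_star))
```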

(5)

Proof

5

Assume x* is the sol. of Ax = b. Then

g(x) = ⟨x, Ax⟩ − 2⟨x, b⟩

     = ⟨x − x*, A(x − x*)⟩ + ⟨x, Ax*⟩ + ⟨x*, Ax⟩ − ⟨x*, Ax*⟩ − 2⟨x, b⟩

     = ⟨x − x*, A(x − x*)⟩ − ⟨x*, Ax*⟩ + 2⟨x, Ax*⟩ − 2⟨x, b⟩

     = ⟨x − x*, A(x − x*)⟩ − ⟨x*, Ax*⟩ + 2⟨x, Ax* − b⟩

Since Ax* = b,

g(x) = ⟨x − x*, A(x − x*)⟩ − ⟨x*, Ax*⟩

Because A is s.p.d., ⟨x − x*, A(x − x*)⟩ ≥ 0, so g is smallest at x = x*, i.e., g(x*) = min_{x∈ℝⁿ} g(x).

(6)

Proof

6

Assume g(x*) = min_{x∈ℝⁿ} g(x).

Fix vectors x and v. For any α ∈ ℝ,

f(α) ≡ g(x + αv) = ⟨x + αv, Ax + αAv⟩ − 2⟨x + αv, b⟩

     = ⟨x, Ax⟩ + α⟨v, Ax⟩ + α⟨x, Av⟩ + α²⟨v, Av⟩ − 2⟨x, b⟩ − 2α⟨v, b⟩

     = ⟨x, Ax⟩ − 2⟨x, b⟩ + 2α⟨v, Ax⟩ − 2α⟨v, b⟩ + α²⟨v, Av⟩

     = g(x) + 2α⟨v, Ax − b⟩ + α²⟨v, Av⟩

(7)

Proof

7

f(α) = g(x) + 2α⟨v, Ax − b⟩ + α²⟨v, Av⟩

f is a quadratic function of α. Since A is s.p.d., ⟨v, Av⟩ > 0, so f has a minimal value where f′(α) = 0:

f′(α̂) = 2⟨v, Ax − b⟩ + 2α̂⟨v, Av⟩ = 0  ⟹  α̂ = −⟨v, Ax − b⟩ / ⟨v, Av⟩ = ⟨v, b − Ax⟩ / ⟨v, Av⟩

g(x + α̂v) = f(α̂) = g(x) − 2⟨v, b − Ax⟩ · ⟨v, b − Ax⟩ / ⟨v, Av⟩ + ( ⟨v, b − Ax⟩ / ⟨v, Av⟩ )² ⟨v, Av⟩

           = g(x) − ⟨v, b − Ax⟩² / ⟨v, Av⟩

(8)

Proof

8

g(x + α̂v) = g(x) − ⟨v, b − Ax⟩² / ⟨v, Av⟩

⟹ g(x + α̂v) = g(x) if ⟨v, b − Ax⟩ = 0, and g(x + α̂v) < g(x) if ⟨v, b − Ax⟩ ≠ 0.

Suppose that g(x*) = min_{x∈ℝⁿ} g(x). Then g(x* + α̂v) ≥ g(x*) for any v,

so ⟨v, b − Ax*⟩ = 0 for all v ≠ 0, and hence Ax* = b.

(9)

9

α̂ = ⟨v, b − Ax⟩ / ⟨v, Av⟩ = ⟨v, r⟩ / ⟨v, Av⟩,  r ≡ b − Ax

If r ≠ 0 and ⟨v, r⟩ ≠ 0, then

g(x + α̂v) = g(x) − ⟨v, b − Ax⟩² / ⟨v, Av⟩ < g(x),

i.e., x + α̂v is closer to x* than x is. This suggests the following iteration.

Given x^(0) and v^(1) ≠ 0

For k = 1, 2, 3, …

  α_k = ⟨v^(k), b − Ax^(k−1)⟩ / ⟨v^(k), Av^(k)⟩,  x^(k) = x^(k−1) + α_k v^(k)

  Choose a new search direction v^(k+1)

(10)

Steepest descent

10

Question: How to choose {v^(k)} such that x^(k) → x* rapidly?

Let Φ : ℝⁿ → ℝ be differentiable at x. Then

( Φ(x + εp) − Φ(x) ) / ε = ∇Φ(x)ᵀp + O(ε)

Neglecting the O(ε) term, the right-hand side is minimized over all p with ‖p‖ = 1 at

p = −∇Φ(x) / ‖∇Φ(x)‖   (i.e., the direction of largest descent)

(11)

Steepest descent direction of g

11

Denote x = [x₁, x₂, …, xₙ]ᵀ. Then

g(x) = ⟨x, Ax⟩ − 2⟨x, b⟩ = Σ_{i=1}^{n} Σ_{j=1}^{n} a_{ij} x_i x_j − 2 Σ_{i=1}^{n} x_i b_i

∂g/∂x_k (x) = 2 Σ_{i=1}^{n} a_{ki} x_i − 2 b_k = 2 ( A(k,:) x − b_k )

∇g(x) = [ ∂g/∂x₁(x), ∂g/∂x₂(x), …, ∂g/∂xₙ(x) ]ᵀ = 2(Ax − b) = −2r

(12)

Steepest descent method (gradient method)

12

Given x^(0) ≠ 0

For k = 1, 2, 3, …

  r_{k−1} = b − Ax^(k−1)

  If r_{k−1} = 0, then

    Stop;

  Else

    α_k = ⟨r_{k−1}, r_{k−1}⟩ / ⟨r_{k−1}, A r_{k−1}⟩

    x^(k) = x^(k−1) + α_k r_{k−1}

  End

End
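A minimal NumPy sketch of the steepest descent iteration above (the tolerance and iteration cap are practical additions for floating-point arithmetic, not part of the slide's algorithm):

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=10000):
    """Gradient (steepest descent) method for Ax = b with A s.p.d."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        r = b - A @ x                      # r_{k-1} = b - A x^(k-1)
        if np.linalg.norm(r) < tol:        # the slide stops when r = 0
            break
        alpha = (r @ r) / (r @ (A @ r))    # alpha_k = <r, r> / <r, A r>
        x = x + alpha * r                  # x^(k) = x^(k-1) + alpha_k r_{k-1}
    return x
```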

Convergence Theorem

Let λ₁ ≥ λ₂ ≥ … ≥ λₙ > 0 be the eigenvalues of A, let x^(k), x^(k−1) be the approximate solutions and x* the exact solution. Then

‖x^(k) − x*‖_A ≤ ( (λ₁ − λₙ) / (λ₁ + λₙ) ) ‖x^(k−1) − x*‖_A,  where ‖x‖_A = √(xᵀAx)

(13)

Conjugate gradient method

13

(14)

A-orthogonal

14

If κ(A) = λ₁ / λₙ is large, then

(λ₁ − λₙ) / (λ₁ + λₙ) ≈ 1

so convergence is very slow and the gradient method is not recommended in this case.

Improvement: choose A-orthogonal search directions.

Definition

p, q ∈ ℝⁿ are called A-orthogonal (A-conjugate) if pᵀAq = 0.

(15)

Lemma

15

Let v₁, …, vₙ ≠ 0 be pairwise A-conjugate. Then v₁, …, vₙ are linearly independent.

Proof

Suppose 0 = Σ_{j=1}^{n} c_j v_j. Then, for each k,

0 = (v_k)ᵀ A ( Σ_{j=1}^{n} c_j v_j ) = Σ_{j=1}^{n} c_j (v_k)ᵀ A v_j = c_k (v_k)ᵀ A v_k

⟹  c_k = 0, k = 1, …, n  ⟹  v₁, …, vₙ are linearly independent.

(16)

Theorem

16

Let A be symmetric positive definite, let v₁, …, vₙ ≠ 0 ∈ ℝⁿ be pairwise A-conjugate, and let x₀ be given.

For k = 1, …, n, let

α_k = ⟨v_k, b − Ax_{k−1}⟩ / ⟨v_k, Av_k⟩,  x_k = x_{k−1} + α_k v_k.

Then

Ax_n = b  and  ⟨b − Ax_k, v_j⟩ = 0, for j = 1, 2, …, k.

(17)

Proof

17

x_k = x_{k−1} + α_k v_k  ⟹  Ax_n = Ax_{n−1} + α_n Av_n = (Ax_{n−2} + α_{n−1} Av_{n−1}) + α_n Av_n

    = ⋯ = Ax₀ + α₁ Av₁ + α₂ Av₂ + ⋯ + α_n Av_n

Therefore, for each k,

⟨Ax_n − b, v_k⟩ = ⟨Ax₀ − b, v_k⟩ + α₁⟨Av₁, v_k⟩ + ⋯ + α_n⟨Av_n, v_k⟩

  = ⟨Ax₀ − b, v_k⟩ + α₁⟨v₁, Av_k⟩ + ⋯ + α_n⟨v_n, Av_k⟩

  = ⟨Ax₀ − b, v_k⟩ + α_k⟨v_k, Av_k⟩        (A-conjugacy)

  = ⟨Ax₀ − b, v_k⟩ + ( ⟨v_k, b − Ax_{k−1}⟩ / ⟨v_k, Av_k⟩ ) ⟨v_k, Av_k⟩

  = ⟨Ax₀ − b, v_k⟩ + ⟨v_k, b − Ax_{k−1}⟩

(18)

Proof

18

⟨Ax_n − b, v_k⟩ = ⟨Ax₀ − b, v_k⟩ + ⟨v_k, b − Ax_{k−1}⟩

  = ⟨Ax₀ − b, v_k⟩ + ⟨v_k, b − Ax₀ + Ax₀ − Ax₁ + ⋯ − Ax_{k−2} + Ax_{k−2} − Ax_{k−1}⟩

  = ⟨Ax₀ − b, v_k⟩ + ⟨v_k, b − Ax₀⟩ + ⟨v_k, Ax₀ − Ax₁⟩ + ⋯ + ⟨v_k, Ax_{k−2} − Ax_{k−1}⟩

  = ⟨v_k, Ax₀ − Ax₁⟩ + ⋯ + ⟨v_k, Ax_{k−2} − Ax_{k−1}⟩

Since x_i = x_{i−1} + α_i v_i for all i, we have Ax_i = Ax_{i−1} + α_i Av_i, i.e., Ax_{i−1} − Ax_i = −α_i Av_i. Hence

⟨Ax_n − b, v_k⟩ = −α₁⟨v_k, Av₁⟩ − ⋯ − α_{k−1}⟨v_k, Av_{k−1}⟩ = 0  ⟹  Ax_n = b

(19)

Proof

19

It remains to show ⟨b − Ax_k, v_j⟩ = 0, for j = 1, 2, …, k. We use induction on k.

Assume ⟨r_{k−1}, v_j⟩ = 0, for j = 1, 2, …, k − 1. Then

r_k = b − Ax_k = b − A(x_{k−1} + α_k v_k) = r_{k−1} − α_k Av_k

⟨r_k, v_k⟩ = ⟨r_{k−1}, v_k⟩ − α_k⟨Av_k, v_k⟩

           = ⟨r_{k−1}, v_k⟩ − ( ⟨v_k, b − Ax_{k−1}⟩ / ⟨v_k, Av_k⟩ ) ⟨Av_k, v_k⟩ = 0

For j = 1, …, k − 1,

⟨r_k, v_j⟩ = ⟨r_{k−1}, v_j⟩ − α_k⟨Av_k, v_j⟩ = 0   (induction assumption and A-conjugacy)

which completes the proof by mathematical induction.

(20)

Method of conjugate directions

20

Given x^(0) and v₁, …, vₙ ∈ ℝⁿ \ {0} pairwise A-orthogonal

r₀ = b − Ax^(0)

For k = 1, …, n

  α_k = ⟨v_k, r_{k−1}⟩ / ⟨v_k, Av_k⟩,  x^(k) = x^(k−1) + α_k v_k

  r_k = r_{k−1} − α_k Av_k  ( = b − Ax^(k) )

End

Question

How to find A-orthogonal search directions?

(21)

A-orthogonalization

21

Orthogonal: given v₁ and v₂, set ṽ₂ = v₂ − αv₁ so that v₁ᵀṽ₂ = 0:

0 = v₁ᵀṽ₂ = v₁ᵀv₂ − α v₁ᵀv₁  ⟹  α = v₁ᵀv₂ / v₁ᵀv₁

A-orthogonal: set ṽ₂ = v₂ − αv₁ so that v₁ᵀAṽ₂ = 0:

0 = v₁ᵀAṽ₂ = v₁ᵀAv₂ − α v₁ᵀAv₁  ⟹  α = v₁ᵀAv₂ / v₁ᵀAv₁

(22)

A-orthogonalization

22

ṽ₂ = v₂ − ( v₁ᵀAv₂ / v₁ᵀAv₁ ) v₁

{v₁, v₂}  ⟶  {v₁, ṽ₂} : A-orthogonal

{v₁, v₂, v₃}  ⟶  {v₁, ṽ₂, ṽ₃} : A-orthogonal

ṽ₃ = v₃ − α₁ v₁ − α₂ ṽ₂ is required to be A-orthogonal to {v₁, ṽ₂}:

0 = v₁ᵀAṽ₃ = v₁ᵀAv₃ − α₁ v₁ᵀAv₁  ⟹  α₁ = v₁ᵀAv₃ / v₁ᵀAv₁

0 = ṽ₂ᵀAṽ₃ = ṽ₂ᵀAv₃ − α₂ ṽ₂ᵀAṽ₂  ⟹  α₂ = ṽ₂ᵀAv₃ / ṽ₂ᵀAṽ₂
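A small NumPy sketch of this Gram–Schmidt-type A-orthogonalization (the helper name a_orthogonalize and the 3×3 s.p.d. matrix are illustrative choices, not from the slides):

```python
import numpy as np

def a_orthogonalize(V, A):
    """Make the columns of V pairwise A-orthogonal, following the
    recursion above: each new column has its A-projections onto the
    previously constructed columns subtracted off."""
    n, m = V.shape
    W = np.zeros((n, m))
    for j in range(m):
        w = V[:, j].astype(float)
        for i in range(j):
            wi = W[:, i]
            # alpha_i = (w_i^T A v_j) / (w_i^T A w_i)
            alpha = (wi @ (A @ V[:, j])) / (wi @ (A @ wi))
            w = w - alpha * wi
        W[:, j] = w
    return W

# Quick check on an arbitrary s.p.d. example: W^T A W should be diagonal
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
W = a_orthogonalize(np.eye(3), A)
print(np.round(W.T @ A @ W, 10))
```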

(23)

Practical Implementation

23

Given x^(0), let r₀ = b − Ax^(0) and take v₁ = r₀ (the steepest descent direction).

α₁ = ⟨v₁, r₀⟩ / ⟨v₁, Av₁⟩,  x^(1) = x^(0) + α₁ v₁,  r₁ = r₀ − α₁ Av₁

{v₁, r₁} is NOT an A-orthogonal set, so construct an A-orthogonal vector:

v₂ = r₁ + β₁ v₁,  β₁ = − ⟨v₁, Ar₁⟩ / ⟨v₁, Av₁⟩

α₂ = ⟨v₂, r₁⟩ / ⟨v₂, Av₂⟩,  x^(2) = x^(1) + α₂ v₂,  r₂ = r₁ − α₂ Av₂

(24)

Construct A-orthogonal vector

24

From {v₁, v₂, r₂}, the next A-orthogonal direction would be

v₃ = r₂ + β₂₁ v₁ + β₂₂ v₂,  β₂₁ = − v₁ᵀAr₂ / v₁ᵀAv₁,  β₂₂ = − v₂ᵀAr₂ / v₂ᵀAv₂

Some useful relations:

v₂ᵀr₂ = v₂ᵀr₁ − α₂ v₂ᵀAv₂ = v₂ᵀr₁ − ( v₂ᵀr₁ / v₂ᵀAv₂ ) v₂ᵀAv₂ = 0

r₁ = r₀ − α₁ Av₁  ⟹  v₁ᵀAr₂ = r₂ᵀAv₁ = α₁⁻¹ ( r₂ᵀr₀ − r₂ᵀr₁ )

0 = v₂ᵀr₂ = ( r₁ + β₁ v₁ )ᵀ r₂ = r₁ᵀr₂ + β₁ v₁ᵀr₂

  = r₁ᵀr₂ + β₁ v₁ᵀ( r₁ − α₂ Av₂ )

  = r₁ᵀr₂ + β₁ v₁ᵀr₁                                    (v₁ᵀAv₂ = 0)

  = r₁ᵀr₂ + β₁ v₁ᵀ( r₀ − α₁ Av₁ )

  = r₁ᵀr₂ + β₁ ( v₁ᵀr₀ − ( ⟨v₁, r₀⟩ / ⟨v₁, Av₁⟩ ) v₁ᵀAv₁ ) = r₁ᵀr₂

⟹  r₁ᵀr₂ = 0

(25)

25

Moreover, since r₁ = r₀ − α₁ Av₁ with α₁ = ⟨v₁, r₀⟩ / ⟨v₁, Av₁⟩,

⟨v₁, r₁⟩ = ⟨v₁, r₀⟩ − α₁ ⟨v₁, Av₁⟩ = 0

and, using v₁ = r₀ and r₂ = r₁ − α₂ Av₂,

⟨r₂, r₀⟩ = ⟨r₂, v₁⟩ = ⟨r₁, v₁⟩ − α₂ ⟨Av₂, v₁⟩ = 0

Therefore v₁ᵀAr₂ = α₁⁻¹ ( r₂ᵀr₀ − r₂ᵀr₁ ) = 0, so

β₂₁ = − v₁ᵀAr₂ / v₁ᵀAv₁ = 0

and v₃ = r₂ + β₂ v₂ with β₂ = − v₂ᵀAr₂ / v₂ᵀAv₂.

(26)

In general case

26

Take v_k = r_{k−1} + β_{k−1} v_{k−1} if r_{k−1} ≠ 0. Requiring ⟨v_{k−1}, Av_k⟩ = 0 gives

0 = ⟨v_{k−1}, Av_k⟩ = ⟨v_{k−1}, Ar_{k−1} + β_{k−1} Av_{k−1}⟩ = ⟨v_{k−1}, Ar_{k−1}⟩ + β_{k−1} ⟨v_{k−1}, Av_{k−1}⟩

⟹  β_{k−1} = − ⟨v_{k−1}, Ar_{k−1}⟩ / ⟨v_{k−1}, Av_{k−1}⟩

Theorem

(i)  {r₀, r₁, …, r_{k−1}} is an orthogonal set.

(ii) {v₁, …, v_k} is an A-orthogonal set.

(27)

Reformulation of α_k and β_k

27

α_k = ⟨v_k, r_{k−1}⟩ / ⟨v_k, Av_k⟩ = ⟨r_{k−1} + β_{k−1} v_{k−1}, r_{k−1}⟩ / ⟨v_k, Av_k⟩

    = ⟨r_{k−1}, r_{k−1}⟩ / ⟨v_k, Av_k⟩ + β_{k−1} ⟨v_{k−1}, r_{k−1}⟩ / ⟨v_k, Av_k⟩

    = ⟨r_{k−1}, r_{k−1}⟩ / ⟨v_k, Av_k⟩                 (⟨v_{k−1}, r_{k−1}⟩ = 0)

⟹  ⟨r_{k−1}, r_{k−1}⟩ = α_k ⟨v_k, Av_k⟩

From r_k = r_{k−1} − α_k Av_k and the orthogonality of the residuals,

⟨r_k, r_k⟩ = ⟨r_{k−1}, r_k⟩ − α_k ⟨Av_k, r_k⟩ = − α_k ⟨r_k, Av_k⟩

β_k = − ⟨v_k, Ar_k⟩ / ⟨v_k, Av_k⟩ = − ⟨r_k, Av_k⟩ / ⟨v_k, Av_k⟩

    = ( ⟨r_k, r_k⟩ / α_k ) / ( ⟨r_{k−1}, r_{k−1}⟩ / α_k ) = ⟨r_k, r_k⟩ / ⟨r_{k−1}, r_{k−1}⟩

(28)

Algorithm (Conjugate Gradient Method)

28

Given x^(0), compute r₀ = b − Ax^(0) = v₀

For k = 0, 1, …

  α_k = ⟨r_k, r_k⟩ / ⟨v_k, Av_k⟩,  x^(k+1) = x^(k) + α_k v_k

  r_{k+1} = r_k − α_k Av_k

  If r_{k+1} = 0, then

    Stop;

  Else

    β_k = ⟨r_{k+1}, r_{k+1}⟩ / ⟨r_k, r_k⟩,  v_{k+1} = r_{k+1} + β_k v_k

  End

End

Theorem: Ax_n = b, i.e., in exact arithmetic CG reaches the solution in at most n steps.

In floating-point arithmetic: if A is well-conditioned, ‖r_n‖ < tol; if A is ill-conditioned, ‖r_k‖ < tol may require k > n iterations.
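A minimal NumPy sketch of the CG algorithm above (the tolerance and iteration cap are practical additions; in exact arithmetic the method terminates in at most n steps):

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=None):
    """Conjugate gradient method for Ax = b with A s.p.d."""
    x = np.asarray(x0, dtype=float)
    r = b - A @ x                    # r_0 = b - A x^(0)
    v = r.copy()                     # v_0 = r_0
    rr = r @ r
    max_iter = max_iter if max_iter is not None else len(b)
    for _ in range(max_iter):
        if np.sqrt(rr) < tol:
            break
        Av = A @ v
        alpha = rr / (v @ Av)        # alpha_k = <r_k, r_k> / <v_k, A v_k>
        x = x + alpha * v            # x^(k+1) = x^(k) + alpha_k v_k
        r = r - alpha * Av           # r_{k+1} = r_k - alpha_k A v_k
        rr_new = r @ r
        beta = rr_new / rr           # beta_k = <r_{k+1}, r_{k+1}> / <r_k, r_k>
        v = r + beta * v             # v_{k+1} = r_{k+1} + beta_k v_k
        rr = rr_new
    return x
```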

(29)

Conjugate Gradient Method

29

Convergence Theorem

Let λ₁ ≥ λ₂ ≥ … ≥ λₙ > 0 be the eigenvalues of A, let {x_G^(k)} be produced by the Gradient method, {x^(k)} by the CG method, and let x* be the exact solution. With κ = λ₁ / λₙ,

‖x_G^(k) − x*‖_A ≤ ( (λ₁ − λₙ) / (λ₁ + λₙ) )^k ‖x_G^(0) − x*‖_A = ( (κ − 1) / (κ + 1) )^k ‖x_G^(0) − x*‖_A

‖x^(k) − x*‖_A ≤ 2 ( (√κ − 1) / (√κ + 1) )^k ‖x^(0) − x*‖_A

Since (κ − 1) / (κ + 1) > (√κ − 1) / (√κ + 1), CG is much better than the Gradient method.
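For example, if κ = 100, the gradient-method factor is (κ − 1)/(κ + 1) = 99/101 ≈ 0.980 per iteration, while the CG factor is (√κ − 1)/(√κ + 1) = 9/11 ≈ 0.818, so the CG error bound shrinks far faster.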

(30)

Preconditioner

30

(31)

C⁻¹AC⁻ᵀ x̃ = C⁻¹b

31

Ax = b  ⟺  A C⁻ᵀCᵀ x = b  ⟺  C⁻¹AC⁻ᵀ (Cᵀx) = C⁻¹b

Goal: choose C such that κ(C⁻¹AC⁻ᵀ) < κ(A).

Write Ã = C⁻¹AC⁻ᵀ, x̃ = Cᵀx, b̃ = C⁻¹b, so that Ãx̃ = b̃.

Apply CG method to Ãx̃ = b̃ → get x̃ → solve x = C⁻ᵀx̃   (Nothing NEW)

Question: apply the CG method to Ãx̃ = b̃ and get x directly.

(32)

Algorithm (Conjugate Gradient Method)

32

Given x̃^(0), compute r̃₀ = b̃ − Ãx̃^(0) = ṽ₀

For k = 0, 1, …

  α̃_k = ⟨r̃_k, r̃_k⟩ / ⟨ṽ_k, Ãṽ_k⟩,  x̃^(k+1) = x̃^(k) + α̃_k ṽ_k

  r̃_{k+1} = b̃ − Ãx̃^(k+1)

  If r̃_{k+1} = 0, then Stop

  β̃_k = ⟨r̃_{k+1}, r̃_{k+1}⟩ / ⟨r̃_k, r̃_k⟩,  ṽ_{k+1} = r̃_{k+1} + β̃_k ṽ_k

End

Rewrite the quantities in terms of the original variables:

r̃_{k+1} = C⁻¹b − (C⁻¹AC⁻ᵀ) Cᵀx^(k+1) = C⁻¹( b − Ax^(k+1) ) = C⁻¹r_{k+1}

Let ṽ_k = Cᵀv_k and w_k = C⁻¹r_k. Then

β̃_k = ⟨C⁻¹r_{k+1}, C⁻¹r_{k+1}⟩ / ⟨C⁻¹r_k, C⁻¹r_k⟩ = ⟨w_{k+1}, w_{k+1}⟩ / ⟨w_k, w_k⟩

(33)

Algorithm (Conjugate Gradient Method)

33

Similarly, α̃_k can be written in terms of the original variables (the other steps of the algorithm are unchanged):

α̃_k = ⟨r̃_k, r̃_k⟩ / ⟨ṽ_k, Ãṽ_k⟩ = ⟨C⁻¹r_k, C⁻¹r_k⟩ / ⟨Cᵀv_k, C⁻¹AC⁻ᵀCᵀv_k⟩ = ⟨w_k, w_k⟩ / ⟨Cᵀv_k, C⁻¹Av_k⟩

⟨Cᵀv_k, C⁻¹Av_k⟩ = v_kᵀ C C⁻¹ A v_k = v_kᵀ A v_k

⟹  α̃_k = ⟨w_k, w_k⟩ / ⟨v_k, Av_k⟩

(34)

Algorithm (Conjugate Gradient Method)

34

With ṽ_k = Cᵀv_k and w_k = C⁻¹r_k, the remaining updates transform as follows:

x̃^(k+1) = x̃^(k) + α̃_k ṽ_k  ⟹  Cᵀx^(k+1) = Cᵀx^(k) + α̃_k Cᵀv_k  ⟹  x^(k+1) = x^(k) + α̃_k v_k

r̃_{k+1} = r̃_k − α̃_k Ãṽ_k  ⟹  C⁻¹r_{k+1} = C⁻¹r_k − α̃_k C⁻¹AC⁻ᵀCᵀv_k  ⟹  r_{k+1} = r_k − α̃_k Av_k

ṽ_{k+1} = r̃_{k+1} + β̃_k ṽ_k  ⟹  Cᵀv_{k+1} = C⁻¹r_{k+1} + β̃_k Cᵀv_k  ⟹  v_{k+1} = C⁻ᵀw_{k+1} + β̃_k v_k

(35)

Algorithm (Conjugate Gradient Method)

35

The CG iteration can therefore be carried out in the original variables:

For k = 0, 1, …

  α̃_k = ⟨w_k, w_k⟩ / ⟨v_k, Av_k⟩,  x^(k+1) = x^(k) + α̃_k v_k

  r_{k+1} = r_k − α̃_k Av_k

  If r_{k+1} = 0, then Stop

  Solve C w_{k+1} = r_{k+1}   (w_k = C⁻¹r_k)

  β̃_k = ⟨w_{k+1}, w_{k+1}⟩ / ⟨w_k, w_k⟩,  v_{k+1} = C⁻ᵀw_{k+1} + β̃_k v_k

End

Need w₀: w₀ = C⁻¹r₀ = C⁻¹( b − Ax^(0) )

Need v₀: v₀ = C⁻ᵀw₀

(36)

Algorithm (CG Method with preconditioner C)

36

Given C and x^(0), compute r₀ = b − Ax^(0)

Solve Cw₀ = r₀ and Cᵀv₀ = w₀

For k = 0, 1, …

  α_k = ⟨w_k, w_k⟩ / ⟨v_k, Av_k⟩,  x^(k+1) = x^(k) + α_k v_k

  r_{k+1} = r_k − α_k Av_k

  If r_{k+1} = 0, then Stop

  Solve Cw_{k+1} = r_{k+1} and Cᵀz_{k+1} = w_{k+1}

  β_k = ⟨w_{k+1}, w_{k+1}⟩ / ⟨w_k, w_k⟩,  v_{k+1} = z_{k+1} + β_k v_k

End

Note that r_{k+1} = CCᵀz_{k+1} ≡ Mz_{k+1}, so

β_k = ⟨C⁻¹r_{k+1}, C⁻¹r_{k+1}⟩ / ⟨C⁻¹r_k, C⁻¹r_k⟩ = ⟨z_{k+1}, r_{k+1}⟩ / ⟨z_k, r_k⟩

α_k = ⟨C⁻¹r_k, C⁻¹r_k⟩ / ⟨Cᵀv_k, C⁻¹Av_k⟩ = ⟨z_k, r_k⟩ / ⟨v_k, Av_k⟩

(37)

Algorithm (CG Method with preconditioner M)

37

Given M and x^(0), compute r₀ = b − Ax^(0)

Solve Mz₀ = r₀ and set v₀ = z₀

For k = 0, 1, …

  Compute α_k = ⟨z_k, r_k⟩ / ⟨v_k, Av_k⟩

  Compute x^(k+1) = x^(k) + α_k v_k

  Compute r_{k+1} = r_k − α_k Av_k

  If r_{k+1} = 0, then Stop

  Solve Mz_{k+1} = r_{k+1}

  Compute β_k = ⟨z_{k+1}, r_{k+1}⟩ / ⟨z_k, r_k⟩

  Compute v_{k+1} = z_{k+1} + β_k v_k

End
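A minimal NumPy sketch of this preconditioned CG algorithm (solve_M is an assumed callback returning z with M z = r; the tolerance and iteration cap are practical additions, not part of the slide's algorithm):

```python
import numpy as np

def preconditioned_cg(A, b, x0, solve_M, tol=1e-10, max_iter=None):
    """CG with preconditioner M for Ax = b, A s.p.d."""
    x = np.asarray(x0, dtype=float)
    r = b - A @ x                    # r_0 = b - A x^(0)
    z = solve_M(r)                   # M z_0 = r_0
    v = z.copy()                     # v_0 = z_0
    zr = z @ r
    max_iter = max_iter if max_iter is not None else 10 * len(b)
    for _ in range(max_iter):
        if np.linalg.norm(r) < tol:
            break
        Av = A @ v
        alpha = zr / (v @ Av)        # alpha_k = <z_k, r_k> / <v_k, A v_k>
        x = x + alpha * v            # x^(k+1) = x^(k) + alpha_k v_k
        r = r - alpha * Av           # r_{k+1} = r_k - alpha_k A v_k
        z = solve_M(r)               # solve M z_{k+1} = r_{k+1}
        zr_new = z @ r
        beta = zr_new / zr           # beta_k = <z_{k+1}, r_{k+1}> / <z_k, r_k>
        v = z + beta * v             # v_{k+1} = z_{k+1} + beta_k v_k
        zr = zr_new
    return x
```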

(38)

Choices of M (Criterion)

38

cond(M^(−1/2) A M^(−1/2)) is nearly 1, i.e., M^(−1/2) A M^(−1/2) ≈ I, A ≈ M

The linear system Mz = r must be easily solved, e.g., M = LLᵀ

M is symmetric positive definite

(39)

Preconditioner M

Jacobi method

39

A = D + (L + U),  M = D

x_{k+1} = −D⁻¹(L + U) x_k + D⁻¹b = −D⁻¹(A − D) x_k + D⁻¹b = x_k + D⁻¹ r_k

Gauss-Seidel

A = (D + L) + U,  M = D + L

x_{k+1} = −(D + L)⁻¹ U x_k + (D + L)⁻¹ b = (D + L)⁻¹ (D + L − A) x_k + (D + L)⁻¹ b = x_k + (D + L)⁻¹ r_k
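For instance, with the Jacobi choice M = D, solving Mz = r is just a componentwise division by the diagonal of A. A short sketch reusing the preconditioned_cg helper from the sketch above (the matrix and right-hand side are arbitrary examples):

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])

d = np.diag(A)                                    # M = D = diag(A)
x = preconditioned_cg(A, b, np.zeros(3), solve_M=lambda r: r / d)
print(x, A @ x - b)                               # residual should be ~0
```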
