Methods for Accelerating the
Arnoldi Iteration
Chao Yang
Computational Research Division
Lawrence Berkeley National Laboratory, Berkeley, California, USA
Background
Large-scale eigenvalue problems

    Ax = λx   or   Ax = λBx

A, B large, sparse, or structured;
y ← Ax and y ← Bx can be computed efficiently.

Krylov subspace methods
Acceleration techniques
Arnoldi Iteration
Produces an orthonormal basis V_m = {v_1, v_2, ..., v_m}
associated with a Krylov subspace

    K(A, v_1; m) = span{v_1, Av_1, ..., A^{m-1}v_1}

A single step (Gram-Schmidt):

    f_j ← (I − V_j V_j^H) A v_j;   v_{j+1} ← f_j / ‖f_j‖
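A minimal NumPy sketch of the iteration (dense test matrix, a single classical Gram-Schmidt pass; the `arnoldi` helper name is illustrative, and practical codes reorthogonalize):

```python
import numpy as np

def arnoldi(A, v1, m):
    """m steps of Arnoldi: returns V (n x (m+1)) with orthonormal columns
    and H ((m+1) x m) upper Hessenberg such that A V[:, :m] = V H."""
    n = A.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = A @ V[:, j]
        # Gram-Schmidt: f_j = (I - V_j V_j^H) A v_j
        H[:j + 1, j] = V[:, :j + 1].conj().T @ w
        f = w - V[:, :j + 1] @ H[:j + 1, j]
        H[j + 1, j] = np.linalg.norm(f)
        if H[j + 1, j] == 0.0:       # invariant subspace found
            break
        V[:, j + 1] = f / H[j + 1, j]
    return V, H
```

Dropping the last row of H recovers the truncated relation A V_m = V_m H_m + f e_m^H used throughout.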
After m steps:
Approximate eigenvalues and vectors from K(A, v_1; m).

Find (θ_i, y_i) such that

    V_m^H (A V_m y_i − θ_i V_m y_i) = 0    (Galerkin condition)

Equivalent to solving H_m y_i = θ_i y_i, where H_m = V_m^H A V_m.

θ_i is an approximation to an eigenvalue (Ritz value).
Checking Convergence
Let z = V_m y, where H_m y = θ y. Residual norm:

    ‖Az − θz‖ = ‖A V_m y − θ V_m y‖
              = ‖(V_m H_m + f e_m^H) y − θ V_m y‖
              = ‖f‖ |e_m^H y|
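The identity above yields a convergence check that costs no extra matrix-vector product: ‖f‖ |e_m^H y| equals the true residual exactly. A self-contained NumPy sketch (random test matrix and the inline Arnoldi loop are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 200, 12
A = rng.standard_normal((n, n))      # general (nonsymmetric) test matrix

# Plain m-step Arnoldi: A V_m = V_m H_m + f e_m^H
V = np.zeros((n, m))
H = np.zeros((m, m))
v = rng.standard_normal(n)
V[:, 0] = v / np.linalg.norm(v)
f = np.zeros(n)
for j in range(m):
    w = A @ V[:, j]
    H[:j + 1, j] = V[:, :j + 1].T @ w
    f = w - V[:, :j + 1] @ H[:j + 1, j]
    if j + 1 < m:
        H[j + 1, j] = np.linalg.norm(f)
        V[:, j + 1] = f / H[j + 1, j]

# Ritz pairs from H_m y = theta y (Galerkin condition)
theta, Y = np.linalg.eig(H)
i = int(np.argmax(np.abs(theta)))
z = V @ Y[:, i]                      # Ritz vector z = V_m y_i

# Cheap estimate ||f|| * |e_m^H y| vs. the true residual ||A z - theta z||
est = np.linalg.norm(f) * abs(Y[-1, i])
true = np.linalg.norm(A @ z - theta[i] * z)
# est and true agree to rounding error
```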
Convergence of Arnoldi
If v_1 lies in an m-dimensional invariant subspace of A, Arnoldi converges in m or fewer steps:

    A V_m = V_m H_m.

One rarely finds such a good v_1. In general, extremal and well-separated eigenvalues emerge rapidly (Kaniel, Paige, Saad).
Convergence of Ritz values
[Figure: trajectories of Ritz values against Lanczos iteration number]
Computational Cost
Storage for V_m, H_m: O(nm + m^2);
Orthogonalization f_m ← (I − V_m V_m^H) A v_m: O(nm^2);
Eigen-analysis of H_m: O(m^3);
MATVEC y ← Ax: varies with the application.

Acceleration Methods
Method of implicit restart
    polynomial
    rational
Method of spectral transformation
    polynomial
    rational
Preconditioning (tomorrow's lecture)
    solve an eigenvalue problem as either a system of nonlinear equations or an optimization problem
Restarting an Arnoldi iteration
1. Fix the dimension m of K(A, v_1; m) at a moderate value;
2. Modify the starting vector v_1 ← ψ(A) v_1;
3. Repeat the Arnoldi process with the modified v_1.

How to choose ψ(λ)?
Suppose the eigenvalues of A are

    λ_1, ..., λ_k (wanted),   λ_{k+1}, ..., λ_n (unwanted),

and the corresponding eigenvectors are x_1, x_2, ..., x_n. Then

    ψ(A) v_1 = γ_1 ψ(λ_1) x_1 + ··· + γ_k ψ(λ_k) x_k                 (wanted)
             + γ_{k+1} ψ(λ_{k+1}) x_{k+1} + ··· + γ_n ψ(λ_n) x_n     (unwanted)

Two types of restarts
1. ψ(λ) is a polynomial;
2. ψ(λ) is a rational function.

ψ(λ) must be large on λ_1, ..., λ_k and small on λ_{k+1}, ..., λ_n.
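One simple explicit choice is to restart with the current Ritz vector of interest, which is implicitly a polynomial ψ (since V_m y = ψ(A) v_1 for some ψ of degree m − 1). A toy NumPy sketch for the largest eigenvalue; the diagonal test matrix and `arnoldi` helper are illustrative:

```python
import numpy as np

def arnoldi(A, v, m):
    """m-step Arnoldi: A V_m = V_m H_m + f e_m^H."""
    n = A.shape[0]
    V = np.zeros((n, m))
    H = np.zeros((m, m))
    V[:, 0] = v / np.linalg.norm(v)
    for j in range(m):
        w = A @ V[:, j]
        H[:j + 1, j] = V[:, :j + 1].T @ w
        f = w - V[:, :j + 1] @ H[:j + 1, j]
        if j + 1 < m:
            H[j + 1, j] = np.linalg.norm(f)
            V[:, j + 1] = f / H[j + 1, j]
    return V, H

# Diagonal test matrix with a well-separated largest eigenvalue
A = np.diag(np.concatenate([np.linspace(0.0, 1.0, 99), [2.0]]))
v = np.ones(100)

for restart in range(10):           # steps 1.-3.: fixed m, restart from V_m y
    V, H = arnoldi(A, v, 8)
    theta, Y = np.linalg.eigh((H + H.T) / 2)
    v = V @ Y[:, -1]                # Ritz vector of the largest Ritz value
    resid = np.linalg.norm(A @ v - theta[-1] * v)
    if resid < 1e-10:
        break
```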
Filter Example
[Figure: filtering polynomial p(λ) on a logarithmic λ axis]

Implicit Restart
1. Do not form v_1 ← ψ(A) v_1 explicitly;
2. Do not repeat the Arnoldi iteration from the first column.

Need to understand the connection between Arnoldi and QR (RQ).
QR and RQ iteration
                 AV = VH
            ↙              ↘
    AV = V(QR)          AV = V(RQ)
        ⇓                   ⇓
A(VQ) = (VQ)(RQ)    A(VQ^H) = (VQ^H)(QR)
            ↘              ↙
              A V_+ = V_+ H_+

Difference

    QR:  AV = (VQ)R    ⇒  A v_1 = v_1^+ ρ_{1,1}     (power)
    RQ:  A(VQ^H) = VR  ⇒  A v_1^+ = v_1 ρ_{1,1}     (inverse power)

Arnoldi = Truncated Hessenberg Reduction
[Diagram: the Arnoldi relation A V_m = V_m H_m + f e_m^H drawn as a partial Hessenberg reduction]
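The power / inverse-power behavior of the first columns in the QR/RQ comparison above can be checked numerically; a small sketch using `numpy.linalg.qr` and `scipy.linalg.rq` (random test matrix assumed):

```python
import numpy as np
from scipy.linalg import rq

rng = np.random.default_rng(3)
n = 6
A = rng.standard_normal((n, n))
e1 = np.zeros(n)
e1[0] = 1.0

# QR step from V = I: A = QR, so A e1 = r11 q1 -- the first column of Q
# is one power-method step applied to e1
Q, R = np.linalg.qr(A)
p = A @ e1
p /= np.linalg.norm(p)
ok_qr = abs(Q[:, 0] @ p)            # = 1: same direction

# RQ step from V = I: A = RQ, so A (Q^H e1) = R e1 = r11 e1 -- the first
# column of Q^H is one inverse-power step, proportional to A^{-1} e1
R2, Q2 = rq(A)
q = Q2.conj().T[:, 0]
ip = np.linalg.solve(A, e1)
ip /= np.linalg.norm(ip)
ok_rq = abs(q @ ip)                 # = 1: same direction
```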
Implicit Restart Arnoldi = Truncated QR Iteration
    A V_m = V_m H_m + f e_m^H
    A V_m = V_m (QR) + f e_m^H
    A V_m Q = (V_m Q)(RQ) + f e_m^H Q
Shifts & Polynomial filter
Truncated Hessenberg reduction is shift-invariant:

    (A − µI) V_m = V_m (H_m − µI) + f e_m^H

Applying p shifts = running p QR iterations on H_m:

    v_1^+ = β (A − µ_1 I)(A − µ_2 I) ··· (A − µ_p I) v_1

What to use for shifts? The eigenvalues of H_m:

    θ_1, ..., θ_k (wanted),   θ_{k+1}, ..., θ_m (unwanted),   m = k + p
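The equivalence can be verified directly: p shifted QR steps on H_m, keeping V_m Q e_1, reproduce the filtered vector β(A − µ_1 I)···(A − µ_p I) v_1. A self-contained NumPy sketch (the `arnoldi` helper and symmetric test matrix are illustrative):

```python
import numpy as np

def arnoldi(A, v, m):
    """m-step Arnoldi: A V_m = V_m H_m + f e_m^H."""
    n = A.shape[0]
    V = np.zeros((n, m))
    H = np.zeros((m, m))
    V[:, 0] = v / np.linalg.norm(v)
    for j in range(m):
        w = A @ V[:, j]
        H[:j + 1, j] = V[:, :j + 1].T @ w
        f = w - V[:, :j + 1] @ H[:j + 1, j]
        if j + 1 < m:
            H[j + 1, j] = np.linalg.norm(f)
            V[:, j + 1] = f / H[j + 1, j]
    return V, H

rng = np.random.default_rng(2)
n, k, p = 100, 4, 4
m = k + p
A = rng.standard_normal((n, n))
A = A + A.T                          # symmetric, so Ritz values are real
v1 = rng.standard_normal(n)
V, H = arnoldi(A, v1, m)

# Exact shifts: the p unwanted (largest) Ritz values when seeking the smallest
theta = np.sort(np.linalg.eigvalsh((H + H.T) / 2))
shifts = theta[k:]

# p shifted QR steps on H_m, accumulating Q
Q = np.eye(m)
Hs = H.copy()
for mu in shifts:
    Qj, Rj = np.linalg.qr(Hs - mu * np.eye(m))
    Hs = Rj @ Qj + mu * np.eye(m)
    Q = Q @ Qj

v_plus = V @ Q[:, 0]                 # restarted vector v1+ = V_m Q e_1

# Same direction as the explicitly filtered vector psi(A) v1
w = v1.copy()
for mu in shifts:
    w = A @ w - mu * w
w /= np.linalg.norm(w)
match = abs(v_plus @ w)              # = 1 up to rounding
```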
Filtering Polynomial (seek the smallest eigenvalues)

[Figure: filtering polynomial p(λ) on a logarithmic λ axis]
ARPACK
http://www.caam.rice.edu/software/ARPACK

Solves a variety of problems (sym, nonsym, real, complex)
Location of the eigenvalues: which = LM, SM, LA, SA, BE, LR, SR, LI, SI
Choose k and p (nev = k, ncv = k + p); p is the degree of the filtering polynomial
Reverse communication interface
Level-2 BLAS
Reverse Communication
 10   continue
      call dsaupd(ido, ..., workd(in),
     &            workd(out))
      if (ido .eq. 1) then
         call matvec(..., workd(in),
     &               workd(out))
         go to 10
      endif
      ...
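SciPy's `scipy.sparse.linalg.eigsh` wraps this ARPACK interface (the reverse-communication loop is driven internally), with nev and ncv exposed as the `k` and `ncv` arguments. A small usage sketch on a 1-D discrete Laplacian (sizes chosen for illustration):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

# 1-D discrete Laplacian (tridiagonal), n = 100
n = 100
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format='csr')

# nev = k = 4, ncv = k + p = 25: p is the implicit-restart filter degree
vals = eigsh(A, k=4, which='SA', ncv=25, return_eigenvectors=False)

# Known eigenvalues: lambda_j = 4 sin^2(j*pi / (2(n+1)))
exact = 4 * np.sin(np.arange(1, 5) * np.pi / (2 * (n + 1))) ** 2
```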
Truncating the RQ Iteration
RQ iteration:

    AV = VH  ⇒  AV = V(RQ)  ⇒  A(VQ^H) = (VQ^H)(QR)

Let V_+ = V Q^H and v_1^+ = V_+ e_1; then A v_1^+ = v_1 ρ_{1,1}.

Truncating RQ
The TRQ Equation
    (A − µI) v_+ = V_k h + vα,   V_k^H v_+ = 0,   ‖v_+‖ = 1

⇓

    [ A − µI   V_k ] [ v_+ ]   [ vα ]
    [ V_k^H     0  ] [ −h  ] = [ 0  ],    ‖v_+‖ = 1.
Solving the TRQ equation
Use A V_k = V_k H_k + f_k e_k^H to simplify the bordered system:

1. w ← (I − V_k V_k^H)(A − µI)^{-1}(V_k s);
2. v_+ ← w / ‖w‖.

Example
The matrix A: 2-D discrete Laplacian (100 × 100);
Target: smallest eigenvalues;
k = 4; convergence tolerance: 10^{-15}.

    [ α_1  β_1                ]
    [ β_1  α_2  β_2           ]
    [      β_2  α_3  β_3      ]
    [           β_3  α_4  β_4 ]
    [                β_4  α_5 ]
Convergence
iter. β1 β2 β3 β4
1 1.5e-2 4.2e+1 2.3e+1 1.9e+1
2 5.1e-4 3.2e-2 4.3e+1 2.3e+1
3 1.1e-6 9.1e-5 7.1e-4 3.8e-2
4 8.3e-13 7.0e-5 1.0e-4 8.5e-2
5 7.1e-24 2.9e-5 2.9e-4 2.5e-4
6 0.0 9.6e-6 1.1e-5 1.1e-4
7 0.0 3.7e-8 1.0e-6 4.8e-1
8 0.0 4.4e-19 1.5e-11 6.0e-3
Inexact TRQ
    w ← (I − V_k V_k^H) (A − µI)^{-1}(V_k s)    [the inner system is solved iteratively]
    ṽ_+ ← w / ‖w‖
    h̃ ← V_k^H A ṽ_+
    α̃ ← v^H (A − µI) ṽ_+
Error Damping
    (A − µI) ṽ_1^+ = ρ_{1,1} v_1 + zδ,

where δ is a product of k − 1 sines (error damping).

Linear Convergence
    ‖r_+‖ ≤ ψ(µ, ν) ‖r‖,   |ψ(µ, ν)| ≤ ν / √(ζ^2 − ν^2) + O(‖r‖),

where r = (A − µI) v_1 and r_+ = (A − µ_+ I) v_1^+;
µ and µ_+ are Rayleigh quotient shifts (converging);
ν is the damped residual error (from the TRQ equation).
Observation
    ‖r_+‖ ≤ ψ(µ, ν) ‖r‖,   |ψ(µ, ν)| ≤ ν / √(ζ^2 − ν^2) + O(‖r‖).

ν is normally several orders of magnitude smaller than ‖z‖;
Asymptotically, if |ν| < ζ/√2, then ψ < 1: monotonic convergence;
If ν = 0, then ‖r_+‖ ≤ γ ‖r‖^2 (the quadratic convergence of RQI).

Convergence of Inexact TRQ
iter.   α_1       ‖z‖       β_1
1       3.0e-3    -         7.8e-3
2       9.7e-4    1.9e-3    7.3e-5
3       9.7e-4    2.5e-3    3.2e-6
4       9.7e-4    5.7e-4    1.5e-7
5       9.7e-4    1.6e-3    6.7e-9
6       9.7e-4    9.2e-4    3.2e-10
7       9.7e-4    1.3e-3    1.6e-11

Example (Reactive Scattering)
n = 256, symmetric; target: smallest k = 6;
Solver: MINRES; tol = 1.0e-8; max iter = 100.

[Figure: residual norm vs. flops for IRAM, preconditioned, and unpreconditioned runs; matrix size = 256]

Spectral Transformation
Compute the eigenvalues of ψ(A) instead: clustered or interior eigenvalues of A become dominant and well-separated eigenvalues of ψ(A);
eigenvectors are preserved; recover eigenvalues by the Rayleigh quotient.

Two types of ψ(λ):
    rational function: ψ(A) = (A − µI)^{-1};
    polynomial.

Rational Transformation
Shift-invert:           (A − σI)^{-1} x = 1/(λ − σ) x
Cayley transformation:  (A − σ_1 I)^{-1}(A + σ_2 I) x = (λ + σ_2)/(λ − σ_1) x

[Figure: λ vs. λ^{-1} on a logarithmic axis]

Generalized Problems
B symmetric positive definite, B = LL^T:

    L^{-1} A L^{-T} (L^T x) = λ (L^T x)

B symmetric positive semidefinite:

    (A − σB)^{-1} B x = 1/(λ − σ) x,

symmetric with respect to the B-semi-inner product.

B nonsymmetric:

    B^{-1} A x = λ x
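Both the standard and the generalized shift-invert transformations are available through `scipy.sparse.linalg.eigsh` via its `sigma` and `M` arguments (a sparse factorization of A − σB is done internally); a sketch on 1-D finite-difference/finite-element test matrices (sizes and shifts are illustrative):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

n = 500
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format='csc')

# Standard problem, shift-invert: Arnoldi runs on (A - sigma*I)^{-1}
vals = eigsh(A, k=3, sigma=0.0, which='LM', return_eigenvectors=False)

# Generalized problem A x = lambda B x with an SPD "mass" matrix B:
# Arnoldi runs on (A - sigma*B)^{-1} B, symmetric w.r.t. the B-inner product
B = sp.diags([1, 4, 1], [-1, 0, 1], shape=(n, n), format='csc') / 6.0
gvals = eigsh(A, k=3, M=B, sigma=0.0, which='LM', return_eigenvectors=False)
```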
Practical Issues
Sparse matrix ordering and factorization
Sparse triangular solves

[Figure: sparsity pattern of a factor, nz = 95138]
Polynomial Transformations
Only requires MATVEC;
Fewer iterations imply fewer inner products;
Polynomial Construction
Chebyshev and Kernel polynomials; Minmax polynomials;
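A Chebyshev filter needs only MATVECs via the three-term recurrence T_{k+1}(t) = 2t T_k(t) − T_{k-1}(t). A sketch that damps an interval [a, b] containing the unwanted spectrum; the `cheb_filter` helper and diagonal test matrix are illustrative:

```python
import numpy as np

def cheb_filter(matvec, v, deg, a, b):
    """Apply p(A) v, where p is the degree-`deg` Chebyshev polynomial
    mapped to [a, b]: bounded by 1 on [a, b], growing fast outside.
    Requires only matrix-vector products (deg >= 1)."""
    c = (a + b) / 2.0                # center of the damped interval
    e = (b - a) / 2.0                # half-width
    t0 = v                           # T_0(A) v
    t1 = (matvec(v) - c * v) / e     # T_1(A) v
    for _ in range(2, deg + 1):
        t0, t1 = t1, 2.0 * (matvec(t1) - c * t1) / e - t0
    return t1

# Amplify the smallest eigenvalue of a diagonal test matrix by damping
# the rest of the spectrum, contained in [a, b] = [1, 10]
d = np.array([0.05, 1.0, 2.0, 5.0, 10.0])
A = np.diag(d)
v = np.ones(5)
w = cheb_filter(lambda x: A @ x, v, deg=20, a=1.0, b=10.0)
w /= np.linalg.norm(w)
# w is now nearly the eigenvector e_1 of the smallest eigenvalue 0.05
```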
Polynomials
Chebyshev polynomials

[Figure: Chebyshev vs. kernel polynomials on a logarithmic λ axis]

Kernel polynomial:  K̂_m(λ; ξ) = Σ_{j=0}^{m} φ_j(λ) φ_j(ξ).

Kernel Polynomial
[Figure: kernel polynomial p(λ) on λ ∈ [−20, 20]]

Constructing Polynomials
Minmax polynomial:

    min_{p ∈ P_m} ‖f(λ) − p(λ)‖_∞,   λ ∈ (α, β)

Least-squares polynomial:

    min_{p ∈ P_m} ‖f(λ) − p(λ)‖_w,   λ ∈ (α, β)
Least-squares polynomial
[Figure: least-squares polynomial approximation of 1/x, with the eigenvalues marked]

Least-squares error
[Figure: least-squares approximation error, with the error at the eigenvalues marked]
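As an illustration of the least-squares construction for f(λ) = 1/λ, so that p(A)v ≈ A^{-1}v using only MATVECs; NumPy's discrete Chebyshev fitting serves as a stand-in for the weighted least-squares fit (interval, degree, and test eigenvalues are assumptions):

```python
import numpy as np
from numpy.polynomial.chebyshev import Chebyshev

# Discrete least-squares fit of f(lambda) = 1/lambda on (alpha, beta), degree 20
alpha, beta = 0.1, 1.0
lam = np.linspace(alpha, beta, 2000)
p = Chebyshev.fit(lam, 1.0 / lam, deg=20)

# p(A) v ~= A^{-1} v for any A whose spectrum lies in (alpha, beta);
# check the pointwise error at the eigenvalues of a diagonal test matrix
d = np.array([0.12, 0.3, 0.55, 0.95])
err = np.max(np.abs(p(d) - 1.0 / d))
```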
Comparison (n = 2500, degree = 20)
polynomial MATVECs CPU time (sec)
Minmax 5240 32.4
Chebyshev 5230 35.6
Leja 6790 41.8
Kernel 6760 42.0
Chebyshev least square 6770 42.2
Ritz least square 6100 42.4
Legendre least square 7290 50.8