Methods for Accelerating the
Arnoldi Iteration
Chao Yang
Computational Research Division
Lawrence Berkeley National Laboratory, Berkeley, California, USA
Background
Large-scale eigenvalue problems

    Ax = λx   or   Ax = λBx

A, B large, sparse, or structured;
y ← Ax and y ← Bx can be computed efficiently.

Krylov subspace methods
Acceleration techniques
Arnoldi Iteration
Produces an orthonormal basis V_m = {v_1, v_2, ..., v_m}
associated with a Krylov subspace

    K(A, v_1; m) = span{v_1, Av_1, ..., A^{m-1}v_1}

A single step (Gram-Schmidt):

    f_j ← (I − V_j V_j^H) A v_j;   v_{j+1} ← f_j / ‖f_j‖
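A minimal NumPy sketch of the iteration (dense test matrix, a single classical Gram-Schmidt pass; the `arnoldi` helper name is illustrative, and practical codes reorthogonalize):

```python
import numpy as np

def arnoldi(A, v1, m):
    """m steps of Arnoldi: returns V (n x (m+1)) with orthonormal columns
    and H ((m+1) x m) upper Hessenberg such that A V[:, :m] = V H."""
    n = A.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = A @ V[:, j]
        # Gram-Schmidt: f_j = (I - V_j V_j^H) A v_j
        H[:j + 1, j] = V[:, :j + 1].conj().T @ w
        f = w - V[:, :j + 1] @ H[:j + 1, j]
        H[j + 1, j] = np.linalg.norm(f)
        if H[j + 1, j] == 0.0:       # invariant subspace found
            break
        V[:, j + 1] = f / H[j + 1, j]
    return V, H
```

Dropping the last row of H recovers the truncated relation A V_m = V_m H_m + f e_m^H used throughout.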
After m steps:
Approximate eigenvalues and vectors from K(A, v_1; m).

Find (θ_i, y_i) such that

    V_m^H (A V_m y_i − θ_i V_m y_i) = 0    (Galerkin condition)

Equivalent to solving H_m y_i = θ_i y_i, where H_m = V_m^H A V_m.

θ_i is an approximation to an eigenvalue (Ritz value).
Checking Convergence
Let z = V_m y, where H_m y = θ y. Residual norm:

    ‖Az − θz‖ = ‖A V_m y − θ V_m y‖
              = ‖(V_m H_m + f e_m^H) y − θ V_m y‖
              = ‖f‖ |e_m^H y|
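The identity above yields a convergence check that costs no extra matrix-vector product: ‖f‖ |e_m^H y| equals the true residual exactly. A self-contained NumPy sketch (random test matrix and the inline Arnoldi loop are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 200, 12
A = rng.standard_normal((n, n))      # general (nonsymmetric) test matrix

# Plain m-step Arnoldi: A V_m = V_m H_m + f e_m^H
V = np.zeros((n, m))
H = np.zeros((m, m))
v = rng.standard_normal(n)
V[:, 0] = v / np.linalg.norm(v)
f = np.zeros(n)
for j in range(m):
    w = A @ V[:, j]
    H[:j + 1, j] = V[:, :j + 1].T @ w
    f = w - V[:, :j + 1] @ H[:j + 1, j]
    if j + 1 < m:
        H[j + 1, j] = np.linalg.norm(f)
        V[:, j + 1] = f / H[j + 1, j]

# Ritz pairs from H_m y = theta y (Galerkin condition)
theta, Y = np.linalg.eig(H)
i = int(np.argmax(np.abs(theta)))
z = V @ Y[:, i]                      # Ritz vector z = V_m y_i

# Cheap estimate ||f|| * |e_m^H y| vs. the true residual ||A z - theta z||
est = np.linalg.norm(f) * abs(Y[-1, i])
true = np.linalg.norm(A @ z - theta[i] * z)
# est and true agree to rounding error
```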
Convergence of Arnoldi
If v_1 lies in an m-dimensional invariant subspace of A, Arnoldi converges in m or fewer steps:

    A V_m = V_m H_m.

One rarely finds such a good v_1. In general, extremal and well-separated eigenvalues emerge rapidly (Kaniel, Paige, Saad).
Convergence of Ritz values
[Figure: trajectories of Ritz values against Lanczos iteration number]
Computational Cost
Storage for V_m, H_m: O(nm + m^2);
Orthogonalization f_m ← (I − V_m V_m^H) A v_m: O(nm^2);
Eigen-analysis of H_m: O(m^3);
MATVEC y ← Ax: varies with the application.

Acceleration Methods
Method of implicit restart
    polynomial
    rational
Method of spectral transformation
    polynomial
    rational
Preconditioning (tomorrow's lecture)
    solve an eigenvalue problem as either a system of nonlinear equations or an optimization problem
Restarting an Arnoldi iteration
1. Fix the dimension m of K(A, v_1; m) at a moderate value;
2. Modify the starting vector v_1 ← ψ(A) v_1;
3. Repeat the Arnoldi process with the modified v_1.

How to choose ψ(λ)?
Suppose the eigenvalues of A are

    λ_1, ..., λ_k (wanted),   λ_{k+1}, ..., λ_n (unwanted),

and the corresponding eigenvectors are x_1, x_2, ..., x_n. Then

    ψ(A) v_1 = γ_1 ψ(λ_1) x_1 + ··· + γ_k ψ(λ_k) x_k                 (wanted)
             + γ_{k+1} ψ(λ_{k+1}) x_{k+1} + ··· + γ_n ψ(λ_n) x_n     (unwanted)

Two types of restarts
1. ψ(λ) is a polynomial;
2. ψ(λ) is a rational function.

ψ(λ) must be large on λ_1, ..., λ_k and small on λ_{k+1}, ..., λ_n.
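One simple explicit choice is to restart with the current Ritz vector of interest, which is implicitly a polynomial ψ (since V_m y = ψ(A) v_1 for some ψ of degree m − 1). A toy NumPy sketch for the largest eigenvalue; the diagonal test matrix and `arnoldi` helper are illustrative:

```python
import numpy as np

def arnoldi(A, v, m):
    """m-step Arnoldi: A V_m = V_m H_m + f e_m^H."""
    n = A.shape[0]
    V = np.zeros((n, m))
    H = np.zeros((m, m))
    V[:, 0] = v / np.linalg.norm(v)
    for j in range(m):
        w = A @ V[:, j]
        H[:j + 1, j] = V[:, :j + 1].T @ w
        f = w - V[:, :j + 1] @ H[:j + 1, j]
        if j + 1 < m:
            H[j + 1, j] = np.linalg.norm(f)
            V[:, j + 1] = f / H[j + 1, j]
    return V, H

# Diagonal test matrix with a well-separated largest eigenvalue
A = np.diag(np.concatenate([np.linspace(0.0, 1.0, 99), [2.0]]))
v = np.ones(100)

for restart in range(10):           # steps 1.-3.: fixed m, restart from V_m y
    V, H = arnoldi(A, v, 8)
    theta, Y = np.linalg.eigh((H + H.T) / 2)
    v = V @ Y[:, -1]                # Ritz vector of the largest Ritz value
    resid = np.linalg.norm(A @ v - theta[-1] * v)
    if resid < 1e-10:
        break
```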
Filter Example
[Figure: filtering polynomial p(λ) on a logarithmic λ axis]

Implicit Restart
1. Do not form v_1 ← ψ(A) v_1 explicitly;
2. Do not repeat the Arnoldi iteration from the first column.

Need to understand the connection between Arnoldi and QR (RQ).
QR and RQ iteration
                 AV = VH
            ↙              ↘
    AV = V(QR)          AV = V(RQ)
        ⇓                   ⇓
A(VQ) = (VQ)(RQ)    A(VQ^H) = (VQ^H)(QR)
            ↘              ↙
              A V_+ = V_+ H_+

Difference

    QR:  AV = (VQ)R    ⇒  A v_1 = v_1^+ ρ_{1,1}     (power)
    RQ:  A(VQ^H) = VR  ⇒  A v_1^+ = v_1 ρ_{1,1}     (inverse power)

Arnoldi = Truncated Hessenberg Reduction
[Diagram: the Arnoldi relation A V_m = V_m H_m + f e_m^H drawn as a partial Hessenberg reduction]
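The power / inverse-power behavior of the first columns in the QR/RQ comparison above can be checked numerically; a small sketch using `numpy.linalg.qr` and `scipy.linalg.rq` (random test matrix assumed):

```python
import numpy as np
from scipy.linalg import rq

rng = np.random.default_rng(3)
n = 6
A = rng.standard_normal((n, n))
e1 = np.zeros(n)
e1[0] = 1.0

# QR step from V = I: A = QR, so A e1 = r11 q1 -- the first column of Q
# is one power-method step applied to e1
Q, R = np.linalg.qr(A)
p = A @ e1
p /= np.linalg.norm(p)
ok_qr = abs(Q[:, 0] @ p)            # = 1: same direction

# RQ step from V = I: A = RQ, so A (Q^H e1) = R e1 = r11 e1 -- the first
# column of Q^H is one inverse-power step, proportional to A^{-1} e1
R2, Q2 = rq(A)
q = Q2.conj().T[:, 0]
ip = np.linalg.solve(A, e1)
ip /= np.linalg.norm(ip)
ok_rq = abs(q @ ip)                 # = 1: same direction
```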
Implicit Restart Arnoldi = Truncated QR Iteration
    A V_m = V_m H_m + f e_m^H
    A V_m = V_m (QR) + f e_m^H
    A V_m Q = (V_m Q)(RQ) + f e_m^H Q
Shifts & Polynomial filter
Truncated Hessenberg reduction is shift-invariant:

    (A − µI) V_m = V_m (H_m − µI) + f e_m^H

Applying p shifts = running p QR iterations on H_m:

    v_1^+ = β (A − µ_1 I)(A − µ_2 I) ··· (A − µ_p I) v_1

What to use for shifts? The eigenvalues of H_m:

    θ_1, ..., θ_k (wanted),   θ_{k+1}, ..., θ_m (unwanted),   m = k + p
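The equivalence can be verified directly: p shifted QR steps on H_m, keeping V_m Q e_1, reproduce the filtered vector β(A − µ_1 I)···(A − µ_p I) v_1. A self-contained NumPy sketch (the `arnoldi` helper and symmetric test matrix are illustrative):

```python
import numpy as np

def arnoldi(A, v, m):
    """m-step Arnoldi: A V_m = V_m H_m + f e_m^H."""
    n = A.shape[0]
    V = np.zeros((n, m))
    H = np.zeros((m, m))
    V[:, 0] = v / np.linalg.norm(v)
    for j in range(m):
        w = A @ V[:, j]
        H[:j + 1, j] = V[:, :j + 1].T @ w
        f = w - V[:, :j + 1] @ H[:j + 1, j]
        if j + 1 < m:
            H[j + 1, j] = np.linalg.norm(f)
            V[:, j + 1] = f / H[j + 1, j]
    return V, H

rng = np.random.default_rng(2)
n, k, p = 100, 4, 4
m = k + p
A = rng.standard_normal((n, n))
A = A + A.T                          # symmetric, so Ritz values are real
v1 = rng.standard_normal(n)
V, H = arnoldi(A, v1, m)

# Exact shifts: the p unwanted (largest) Ritz values when seeking the smallest
theta = np.sort(np.linalg.eigvalsh((H + H.T) / 2))
shifts = theta[k:]

# p shifted QR steps on H_m, accumulating Q
Q = np.eye(m)
Hs = H.copy()
for mu in shifts:
    Qj, Rj = np.linalg.qr(Hs - mu * np.eye(m))
    Hs = Rj @ Qj + mu * np.eye(m)
    Q = Q @ Qj

v_plus = V @ Q[:, 0]                 # restarted vector v1+ = V_m Q e_1

# Same direction as the explicitly filtered vector psi(A) v1
w = v1.copy()
for mu in shifts:
    w = A @ w - mu * w
w /= np.linalg.norm(w)
match = abs(v_plus @ w)              # = 1 up to rounding
```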
Filtering Polynomial (seek the smallest eigenvalues)

[Figure: filtering polynomial p(λ) on a logarithmic λ axis]
ARPACK
http://www.caam.rice.edu/software/ARPACK

Solves a variety of problems (sym, nonsym, real, complex)
Location of the eigenvalues: which = LM, SM, LA, SA, BE, LR, SR, LI, SI
Choose k and p (nev = k, ncv = k + p); p is the degree of the filtering polynomial
Reverse communication interface
Level-2 BLAS
Reverse Communication
 10   continue
      call dsaupd(ido, ..., workd(in),
     &            workd(out))
      if (ido .eq. 1) then
         call matvec(..., workd(in),
     &               workd(out))
         go to 10
      endif
      ...
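SciPy's `scipy.sparse.linalg.eigsh` wraps this ARPACK interface (the reverse-communication loop is driven internally), with nev and ncv exposed as the `k` and `ncv` arguments. A small usage sketch on a 1-D discrete Laplacian (sizes chosen for illustration):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

# 1-D discrete Laplacian (tridiagonal), n = 100
n = 100
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format='csr')

# nev = k = 4, ncv = k + p = 25: p is the implicit-restart filter degree
vals = eigsh(A, k=4, which='SA', ncv=25, return_eigenvectors=False)

# Known eigenvalues: lambda_j = 4 sin^2(j*pi / (2(n+1)))
exact = 4 * np.sin(np.arange(1, 5) * np.pi / (2 * (n + 1))) ** 2
```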
Truncating the RQ Iteration
RQ iteration:

    AV = VH  ⇒  AV = V(RQ)  ⇒  A(VQ^H) = (VQ^H)(QR)

Let V_+ = V Q^H and v_1^+ = V_+ e_1; then A v_1^+ = v_1 ρ_{1,1}.

Truncating RQ
The TRQ Equation
    (A − µI) v_+ = V_k h + vα,   V_k^H v_+ = 0,   ‖v_+‖ = 1

⇓

    [ A − µI   V_k ] [ v_+ ]   [ vα ]
    [ V_k^H     0  ] [ −h  ] = [ 0  ],    ‖v_+‖ = 1.
Solving the TRQ equation
Use A V_k = V_k H_k + f_k e_k^H to simplify the bordered system:

1. w ← (I − V_k V_k^H)(A − µI)^{-1}(V_k s);
2. v_+ ← w / ‖w‖.

Example
The matrix A: 2-D discrete Laplacian (100 × 100);
Target: smallest eigenvalues;
k = 4; convergence tolerance: 10^{-15}.

    [ α_1  β_1                ]
    [ β_1  α_2  β_2           ]
    [      β_2  α_3  β_3      ]
    [           β_3  α_4  β_4 ]
    [                β_4  α_5 ]
Convergence
iter. β1 β2 β3 β4
1 1.5e-2 4.2e+1 2.3e+1 1.9e+1
2 5.1e-4 3.2e-2 4.3e+1 2.3e+1
3 1.1e-6 9.1e-5 7.1e-4 3.8e-2
4 8.3e-13 7.0e-5 1.0e-4 8.5e-2
5 7.1e-24 2.9e-5 2.9e-4 2.5e-4
6 0.0 9.6e-6 1.1e-5 1.1e-4
7 0.0 3.7e-8 1.0e-6 4.8e-1
8 0.0 4.4e-19 1.5e-11 6.0e-3
Inexact TRQ
    w ← (I − V_k V_k^H) (A − µI)^{-1}(V_k s)    [the inner system is solved iteratively]
    ṽ_+ ← w / ‖w‖
    h̃ ← V_k^H A ṽ_+
    α̃ ← v^H (A − µI) ṽ_+
Error Damping
    (A − µI) ṽ_1^+ = ρ_{1,1} v_1 + zδ,

where δ is a product of k − 1 sines (error damping).

Linear Convergence
    ‖r_+‖ ≤ ψ(µ, ν) ‖r‖,   |ψ(µ, ν)| ≤ ν / √(ζ^2 − ν^2) + O(‖r‖),

where r = (A − µI) v_1 and r_+ = (A − µ_+ I) v_1^+;
µ and µ_+ are Rayleigh quotient shifts (converging);
ν is the damped residual error (from the TRQ equation).
Observation
    ‖r_+‖ ≤ ψ(µ, ν) ‖r‖,   |ψ(µ, ν)| ≤ ν / √(ζ^2 − ν^2) + O(‖r‖).

ν is normally several orders of magnitude smaller than ‖z‖;
Asymptotically, if |ν| < ζ/√2, then ψ < 1: monotonic convergence;
If ν = 0, then ‖r_+‖ ≤ γ ‖r‖^2 (the quadratic convergence of RQI).

Convergence of Inexact TRQ
iter.   α_1       ‖z‖       β_1
1       3.0e-3    -         7.8e-3
2       9.7e-4    1.9e-3    7.3e-5
3       9.7e-4    2.5e-3    3.2e-6
4       9.7e-4    5.7e-4    1.5e-7
5       9.7e-4    1.6e-3    6.7e-9
6       9.7e-4    9.2e-4    3.2e-10
7       9.7e-4    1.3e-3    1.6e-11

Example (Reactive Scattering)
n = 256, symmetric; target: smallest k = 6;
Solver: MINRES; tol = 1.0e-8; max iter = 100.

[Figure: residual norm vs. flops for IRAM, preconditioned, and unpreconditioned runs; matrix size = 256]

Spectral Transformation
Compute the eigenvalues of ψ(A) instead: clustered or interior eigenvalues of A become dominant and well-separated eigenvalues of ψ(A);
eigenvectors are preserved; recover eigenvalues by the Rayleigh quotient.

Two types of ψ(λ):
    rational function: ψ(A) = (A − µI)^{-1};
    polynomial.

Rational Transformation
Shift-invert:           (A − σI)^{-1} x = 1/(λ − σ) x
Cayley transformation:  (A − σ_1 I)^{-1}(A + σ_2 I) x = (λ + σ_2)/(λ − σ_1) x

[Figure: λ vs. λ^{-1} on a logarithmic axis]

Generalized Problems
B symmetric positive definite, B = LL^T:

    L^{-1} A L^{-T} (L^T x) = λ (L^T x)

B symmetric positive semidefinite:

    (A − σB)^{-1} B x = 1/(λ − σ) x,

symmetric with respect to the B-semi-inner product.

B nonsymmetric:

    B^{-1} A x = λ x
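Both the standard and the generalized shift-invert transformations are available through `scipy.sparse.linalg.eigsh` via its `sigma` and `M` arguments (a sparse factorization of A − σB is done internally); a sketch on 1-D finite-difference/finite-element test matrices (sizes and shifts are illustrative):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

n = 500
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format='csc')

# Standard problem, shift-invert: Arnoldi runs on (A - sigma*I)^{-1}
vals = eigsh(A, k=3, sigma=0.0, which='LM', return_eigenvectors=False)

# Generalized problem A x = lambda B x with an SPD "mass" matrix B:
# Arnoldi runs on (A - sigma*B)^{-1} B, symmetric w.r.t. the B-inner product
B = sp.diags([1, 4, 1], [-1, 0, 1], shape=(n, n), format='csc') / 6.0
gvals = eigsh(A, k=3, M=B, sigma=0.0, which='LM', return_eigenvectors=False)
```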
Practical Issues
Sparse matrix ordering and factorization
Sparse triangular solves

[Figure: sparsity pattern of a factor, nz = 95138]
Polynomial Transformations
Only requires MATVEC;
Fewer iterations imply fewer inner products;
Polynomial Construction
Chebyshev and Kernel polynomials; Minmax polynomials;
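A Chebyshev filter needs only MATVECs via the three-term recurrence T_{k+1}(t) = 2t T_k(t) − T_{k-1}(t). A sketch that damps an interval [a, b] containing the unwanted spectrum; the `cheb_filter` helper and diagonal test matrix are illustrative:

```python
import numpy as np

def cheb_filter(matvec, v, deg, a, b):
    """Apply p(A) v, where p is the degree-`deg` Chebyshev polynomial
    mapped to [a, b]: bounded by 1 on [a, b], growing fast outside.
    Requires only matrix-vector products (deg >= 1)."""
    c = (a + b) / 2.0                # center of the damped interval
    e = (b - a) / 2.0                # half-width
    t0 = v                           # T_0(A) v
    t1 = (matvec(v) - c * v) / e     # T_1(A) v
    for _ in range(2, deg + 1):
        t0, t1 = t1, 2.0 * (matvec(t1) - c * t1) / e - t0
    return t1

# Amplify the smallest eigenvalue of a diagonal test matrix by damping
# the rest of the spectrum, contained in [a, b] = [1, 10]
d = np.array([0.05, 1.0, 2.0, 5.0, 10.0])
A = np.diag(d)
v = np.ones(5)
w = cheb_filter(lambda x: A @ x, v, deg=20, a=1.0, b=10.0)
w /= np.linalg.norm(w)
# w is now nearly the eigenvector e_1 of the smallest eigenvalue 0.05
```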
Polynomials
Chebyshev polynomials

[Figure: Chebyshev vs. kernel polynomials on a logarithmic λ axis]

Kernel polynomial:  K̂_m(λ; ξ) = Σ_{j=0}^{m} φ_j(λ) φ_j(ξ).

Kernel Polynomial
[Figure: kernel polynomial p(λ) on λ ∈ [−20, 20]]

Constructing Polynomials
Minmax polynomial:

    min_{p ∈ P_m} ‖f(λ) − p(λ)‖_∞,   λ ∈ (α, β)

Least-squares polynomial:

    min_{p ∈ P_m} ‖f(λ) − p(λ)‖_w,   λ ∈ (α, β)
Least-squares polynomial
[Figure: least-squares polynomial approximation of 1/x, with the eigenvalues marked]

Least-squares error
[Figure: least-squares approximation error, with the error at the eigenvalues marked]
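As an illustration of the least-squares construction for f(λ) = 1/λ, so that p(A)v ≈ A^{-1}v using only MATVECs; NumPy's discrete Chebyshev fitting serves as a stand-in for the weighted least-squares fit (interval, degree, and test eigenvalues are assumptions):

```python
import numpy as np
from numpy.polynomial.chebyshev import Chebyshev

# Discrete least-squares fit of f(lambda) = 1/lambda on (alpha, beta), degree 20
alpha, beta = 0.1, 1.0
lam = np.linspace(alpha, beta, 2000)
p = Chebyshev.fit(lam, 1.0 / lam, deg=20)

# p(A) v ~= A^{-1} v for any A whose spectrum lies in (alpha, beta);
# check the pointwise error at the eigenvalues of a diagonal test matrix
d = np.array([0.12, 0.3, 0.55, 0.95])
err = np.max(np.abs(p(d) - 1.0 / d))
```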
Comparison (n = 2500, degree = 20)
polynomial MATVECs CPU time (sec)
Minmax 5240 32.4
Chebyshev 5230 35.6
Leja 6790 41.8
Kernel 6760 42.0
Chebyshev least square 6770 42.2
Ritz least square 6100 42.4
Legendre least square 7290 50.8