The Use of a Preconditioner in
Iterative Methods for Solving
Large-scale Eigenvalue
Problems
Chao Yang
Computational Research Division
Lawrence Berkeley National Laboratory, Berkeley, California, USA
Limitation of a Krylov Subspace
May require a high-degree polynomial φ(λ) to produce an accurate approximation z = φ(A)v0;
Subspace of large dimension; many restarts;
Spectral transformation may be prohibitively costly (sometimes impossible)
Preconditioner for Ax = b
Solve K^{-1}Ax = K^{-1}b;
Choose K such that κ(K^{-1}A) ≪ κ(A), or so that the eigenvalues of K^{-1}A are clustered;
K is easy to construct, and solving Ky = z is more efficient than solving Ax = b.
Preconditioner for Ax = λx?
Eigenvectors are not preserved under K^{-1}A;
Cannot extract correct spectral info from K(K^{-1}A, b; m) directly;
However, preconditioning does make sense if we treat an eigenvalue problem as
- a system of nonlinear equations (JD), or
- an optimization problem (LOBPCG)
Nonlinear Equation Point of View
Because λ(x) = x^T Ax / x^T x, the equation Ax = λ(x)x is nonlinear in x;
Alternative formulation: Ax = (x^T Ax)x, x^T x = 1;
Many solutions; solve by Newton's correction.
Given a starting guess u such that u^T u = 1; let θ = u^T Au;
Seek a pair (z, δ) such that A(u + z) = (θ + δ)(u + z);
Ignore the 2nd-order term δz (Newton correction) and impose u^T z = 0.
The Correction Equation
Augmented form:
  [ A − θI   u ] [  z ]   [ −r ]
  [  u^T     0 ] [ −δ ] = [  0 ],
where r = Au − θu;
Projected form:
  (I − uu^T)(A − θI)(I − uu^T) z = −r, where u^T z = 0.
Solving the Correction Eq (Direct)
Assume θ has not yet converged; block elimination yields
  [ I                   0 ] [ A − θI   u ] [  z ]   [ −r ]
  [ u^T(A − θI)^{-1}    1 ] [   0      γ ] [ −δ ] = [  0 ],
where γ = −u^T(A − θI)^{-1}u, so that
  δ = u^T(A − θI)^{-1}r / u^T(A − θI)^{-1}u.
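The eliminated formula for δ and the constraint u^T z = 0 can be checked numerically; a small sketch with an arbitrary random symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
A = rng.standard_normal((n, n)); A = (A + A.T) / 2   # symmetric test matrix
u = rng.standard_normal(n); u /= np.linalg.norm(u)
theta = u @ A @ u
r = A @ u - theta * u

# Augmented system [[A - theta*I, u], [u^T, 0]] [z; -delta] = [-r; 0]
M = np.zeros((n + 1, n + 1))
M[:n, :n] = A - theta * np.eye(n)
M[:n, n] = u
M[n, :n] = u
sol = np.linalg.solve(M, np.concatenate([-r, [0.0]]))
z, delta = sol[:n], -sol[n]

# Block-elimination formula: delta = u^T(A-tI)^{-1} r / u^T(A-tI)^{-1} u
w_r = np.linalg.solve(A - theta * np.eye(n), r)
w_u = np.linalg.solve(A - theta * np.eye(n), u)
delta_formula = (u @ w_r) / (u @ w_u)
print(abs(delta - delta_formula), abs(u @ z))
```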
Connection with Inverse Iteration
Adding the correction z to u directly:
  x = u + z = u + δ(A − θI)^{-1}u − u = δ(A − θI)^{-1}u;
Quadratic convergence in general; cubic convergence for symmetric problems;
But requires solving (A − θI)x = u.
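Combined with the Rayleigh-quotient update θ = u^T Au, this is Rayleigh quotient iteration; a minimal sketch (random symmetric test matrix, arbitrary seed) that typically locks onto an eigenpair in a handful of steps:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
A = rng.standard_normal((n, n)); A = (A + A.T) / 2  # symmetric test matrix

u = rng.standard_normal(n); u /= np.linalg.norm(u)
for i in range(20):
    theta = u @ A @ u                      # Rayleigh quotient
    res = np.linalg.norm(A @ u - theta * u)
    if res < 1e-10:
        break
    # Inverse iteration step with the Rayleigh-quotient shift
    x = np.linalg.solve(A - theta * np.eye(n), u)
    u = x / np.linalg.norm(x)
print(i, res)
```

The cubic convergence shows up as the residual collapsing from ~1e-2 to machine precision in two or three steps; the price is a new factorization of A − θI at every step, which is what JD avoids.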
Jacobi-Davidson (JD)
Solve the correction equation iteratively:
  (I − uu^T)(A − θI)(I − uu^T) z = −r, where u^T z = 0;
Allows the use of a preconditioner;
Instead of adding z to u, construct a search space S = {u, z}.
JD Inner-outer Iteration
Input: A, v0, tol;
Output: (u, θ) such that ‖Au − θu‖ is small
1. u ← v0/‖v0‖, V ← (u), θ = u^T Au, r ← Au − θu;
2. while ( ‖r‖ > tol )
   (a) Iteratively solve the correction equation
       (I − VV^T)(A − θI)(I − VV^T) z = −r approximately;
   (b) z ← (I − VV^T) z;
   (c) V ← (V, z), H = V^T AV;
   (d) Solve Hy = θy and select the desired (y, θ);
   (e) u ← V y, r ← Au − θu;
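The inner-outer iteration can be sketched in NumPy. This simplified version solves the correction equation exactly through the augmented system (the slides solve it approximately with a preconditioned iterative solver, which is the whole point of JD); the test matrix, seed, and basis limit are arbitrary:

```python
import numpy as np

def jd(A, v0, tol=1e-9, max_basis=30):
    """Minimal Jacobi-Davidson sketch for one eigenpair of a symmetric A."""
    n = A.shape[0]
    u = v0 / np.linalg.norm(v0)
    V = u[:, None]
    theta = u @ A @ u
    r = A @ u - theta * u
    while np.linalg.norm(r) > tol and V.shape[1] < max_basis:
        k = V.shape[1]
        # (a) augmented system equivalent to (I-VV^T)(A-theta I)(I-VV^T) z = -r
        M = np.zeros((n + k, n + k))
        M[:n, :n] = A - theta * np.eye(n)
        M[:n, n:] = V
        M[n:, :n] = V.T
        z = np.linalg.solve(M, np.concatenate([-r, np.zeros(k)]))[:n]
        z -= V @ (V.T @ z)           # (b) project: z <- (I - VV^T) z
        z /= np.linalg.norm(z)
        V = np.column_stack([V, z])  # (c) expand the search space
        H = V.T @ A @ V
        w, Y = np.linalg.eigh(H)     # (d) solve the projected problem
        theta, y = w[0], Y[:, 0]     #     target the smallest Ritz pair
        u = V @ y                    # (e) new Ritz vector and residual
        r = A @ u - theta * u
    return theta, u

rng = np.random.default_rng(2)
n = 60
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
theta, u = jd(A, rng.standard_normal(n))
print(np.linalg.norm(A @ u - theta * u))
```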
Practical Issues
Choose an iterative solver and a preconditioner (for the correction equation);
Set the tolerance for the inner iteration;
Shift selection;
Restart (set a limit on the dimension of V);
Compute more than one eigenpair.
Preconditioner for the Correction Eq
If K ≈ A, then K̂ ≈ Â, where
  K̂ = (I − VV^T) K (I − VV^T),   Â = (I − VV^T) A (I − VV^T);
To precondition, one must solve K̂x = b;
Use the augmented formulation and block elimination:
  K̂† = (I − Y G^{-1}V^T) K^{-1} = K^{-1} (I − V G^{-1}U^T),
where Y = K^{-1}V, G = V^T Y, U = K^{-T}V;
MATVEC: y ← (I − V G^{-1}Y^T) K^{-1} A (I − Y G^{-1}V^T) x
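The pseudoinverse formula can be checked numerically: for any z orthogonal to V, applying K̂† after K̂ should return z. A small sketch with an arbitrary well-conditioned (not necessarily symmetric) K:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 30, 4
K = rng.standard_normal((n, n)) + n * np.eye(n)   # arbitrary well-conditioned K
V, _ = np.linalg.qr(rng.standard_normal((n, k)))  # orthonormal V
P = np.eye(n) - V @ V.T                           # projector onto span(V)-perp

Y = np.linalg.solve(K, V)    # Y = K^{-1} V
G = V.T @ Y                  # G = V^T Y

def Khat(x):                 # Khat = (I - VV^T) K (I - VV^T)
    return P @ (K @ (P @ x))

def Khat_dag(b):             # Khat^dag = (I - Y G^{-1} V^T) K^{-1}
    w = np.linalg.solve(K, b)
    return w - Y @ np.linalg.solve(G, V.T @ w)

z = P @ rng.standard_normal(n)   # any z with V^T z = 0
print(np.linalg.norm(Khat_dag(Khat(z)) - z))
```

Note that only one solve with K (and the small k-by-k matrix G) is needed per application; the projected preconditioner never has to be formed.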
Termination Criterion - Inner Iteration
Fixed tolerance (larger than that required for the eigen-residual);
Dynamic tolerance (tighter when the eigen-residual is small);
Can estimate the eigen-residual from the correction-equation residual for certain solvers (CG, sQMR).
Example
Normal mode vibration analysis for macromolecules;
3000 atoms, n = 9000; interested in low-frequency modes (small eigenvalues).
Effect of Preconditioner
[Figure: eigenvalue distributions λ(F) vs. λ(B^{-1}F) and λ(F) vs. λ(L^{-1}CL^{-T}); preconditioning clusters the spectrum.]
Convergence History
[Figure: convergence history of the leading 20 eigenpairs of problem pe3k; residual norm vs. work (flops)/n^2.]
Other Issues
Block version (not trivial);
Can extend JD to polynomial or rational eigenvalue problems (Van der Vorst, Voss, Lin);
Automatic parameter tuning;
Missing eigenvalues?
The Optimization View
Only valid for symmetric problems and extreme eigenvalues.
Constrained optimization:
  min_{x^T x = 1} x^T Ax
Lagrangian:
  L(x, λ) = x^T Ax − λ(x^T x − 1)
KKT condition:
  Ax − λx = 0, x^T x = 1.
Geometry
[Figure: unit-circle constraint and Rayleigh-quotient contours.]
Direct Constrained Minimization
Assume x_k is the current approximation; update by
  x_{k+1} = α x_k + β p_k;
p_k is a descent (search) direction;
α, β are chosen so that x_{k+1}^T x_{k+1} = 1.
Search Direction
Steepest descent:
  r_k = −∇_x L(x_k, θ_k) = −(Ax_k − θ_k x_k)
Conjugate gradient:
  p_k = p_{k−1} + γ r_k, with γ chosen so that p_k^T A p_{k−1} = 0
How do we maintain the orthonormality constraint?
Subspace Minimization
Let V = (x_k, p_{k−1}, r_k); then x_{k+1} = V y_k for some y_k ∈ R^3;
Must solve
  min_{y_k^T V^T V y_k = 1} y_k^T V^T AV y_k
Equivalent to solving
  G y_k = λ B y_k, y_k^T B y_k = 1,
where B = V^T V and G = V^T AV.
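The loop above can be sketched with an orthonormalized basis, so that B = V^T V = I and the projected problem becomes a plain (at most) 3x3 eigenproblem; the test matrix, seed, and tolerances are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
A = np.diag(np.arange(1.0, n + 1)) + 0.1 * np.ones((n, n))  # symmetric test matrix

x = rng.standard_normal(n); x /= np.linalg.norm(x)
p = np.zeros(n)
for it in range(300):
    theta = x @ A @ x                 # Rayleigh quotient
    r = A @ x - theta * x             # residual = constrained gradient
    if np.linalg.norm(r) < 1e-8:
        break
    # Rayleigh-Ritz over span{x, p, r}; orthonormalizing V makes B = I
    cols = [x, r] if np.linalg.norm(p) == 0 else [x, p, r]
    Q, _ = np.linalg.qr(np.column_stack(cols))
    w, Y = np.linalg.eigh(Q.T @ A @ Q)
    x_new = Q @ Y[:, 0]               # minimizer within the subspace
    p = x_new - x * (x @ x_new)       # next CG-like direction
    x = x_new
print(it, theta)
```

Keeping x normalized is free here: the Ritz vector Q Y[:, 0] has unit norm by construction, which is one way the orthonormality constraint is maintained in practice.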
Compute More Eigenpairs
Trace minimization:
  min_{X^T X = I_m} (1/2) trace(X^T AX), where X ∈ R^{n×m};
Gradient:
  R_k = ∇_X L(X_k, Λ_k) = AX_k − X_kΛ_k, where Λ_k = X_k^T AX_k.
LOBPCG (Knyazev)
Input: A, K, X0 ∈ R^{n×m}, tol;
Output: (X, Λ) such that ‖AX − XΛ‖ is small
1. Orthonormalize the columns of X0; Λ = X0^* AX0, i = 1, P0 = [ ], R1 = AX0 − X0Λ;
2. while ( ‖Ri‖ > tol )
   (a) Set V = (X_{i−1}, P_{i−1}, K^{-1}R_i);
   (b) Compute the eigenvectors G corresponding to the m smallest eigenvalues of (H, B), where B = V^T V and H = V^T AV;
   (c) X_i = V G(1:m, :); Λ_i = X_i^T AX_i, P_i = V G(m+1:3m, :);
Practical Issues
Choice of preconditioner;
Linear dependency between columns of V;
Deflation (not all eigenpairs converge at the same rate);
Extension to the (symmetric) generalized eigenvalue problem (straightforward).
Example
Small accelerator model (n = 750);
Generalized problem;
Interested in the smallest eigenvalue.
[Figure: sparsity pattern of the matrix, nz = 26634.]
[Figure: change in residual norm, no preconditioner vs. diagonal preconditioner.]
Extension to a Nonlinear EV Problem
Quantum many-body problem reduced to single-particle wavefunctions through DFT;
Single-particle wavefunctions (orbitals):
  X = (x1, x2, ..., xk), X^*X = I_k, x_i ∈ C^n;
n - real-space grid size; k - number of occupied states;
Charge density ρ(X) = diag(XX^*);
Kohn-Sham total energy
KS Total Energy Minimization
min_{X^*X = I_k} E_tot(X) ≡ E_kinetic(X) + E_ion(X) + E_Hartree(X) + E_xc(X),
where
  E_kinetic = (1/2) trace(X^* LX)
  E_ionic = (1/2) [ trace(X D_ion X^*) + Σ_i Σ_ℓ (x_i^* w_ℓ)^2 ]
  E_Hartree = (1/4) ρ(X)^T S ρ(X)
  E_xc = e^T f_xc[ρ(X)]
First order (KKT) Condition
KKT condition:
  ∇_X L(X, Λ) = 0, X^*X = I.
Kohn-Sham equation:
  H(X)X = XΛ, X^*X = I.
Kohn-Sham Hamiltonian:
  H(X) = L + D_ion + Σ_ℓ w_ℓ w_ℓ^* + diag(Sρ(X)) + diag(g_xc(ρ(X)))
Two Approaches
Work with the KS equation: Self-Consistent Field (SCF) Iteration;
Minimize the total energy directly: Direct Constrained Minimization (extension of LOBPCG).
The SCF Iteration
Initial X → solve H(X)X̂ = X̂Λ̂ → check self-consistency: H(X̂)X̂ ≈ X̂Λ̂?
If yes, terminate; if no, set X ← X̂ and repeat.
Problems with SCF
Does not always converge; E_tot(X) may increase;
Little convergence theory;
Fixed-point iteration X^{(i+1)} = F(X^{(i)}) — but what is F(·)?
Solve a large-scale linear eigenvalue problem at each iteration?
Direct Constrained Minimization
Minimize the total energy directly; block method;
Wavefunction update similar to LOBPCG:
  X^{(i+1)} = X^{(i)}G_x + P^{(i−1)}G_p + R^{(i)}G_r,
  R^{(i)} = K^{-1}(H(X^{(i)})X^{(i)} − X^{(i)}Θ^{(i)}),
  Θ^{(i)} = X^{(i)*}H(X^{(i)})X^{(i)};
Choose G_x, G_p, G_r to minimize E_tot in Y = (X^{(i)}, P^{(i−1)}, R^{(i)}).
Minimization within a Subspace
Let Y = (X^{(i)}, P^{(i−1)}, R^{(i)}); solve
  min_G E_tot(Y G) s.t. G^* Y^*Y G = I_k
Equivalent to solving
  Ĥ(G)G = BGΩ_k, G^*BG = I_k,
where Ĥ(G) = Y^* H(Y G) Y and B = Y^*Y.
Solving the Projected Problem
A smaller nonlinear (generalized) eigenvalue problem
  Ĥ(G)G = BGΩ_k, G^*BG = I_k,
where Ĥ(G) ∈ C^{3k×3k}, G ∈ C^{3k×k};
Modify SCF by introducing a trust region (TRSCF).
The Optimization View of SCF
SCF minimizes a sequence of surrogate models.
Objective:
  E_tot(X) = E_kinetic(X) + E_ion(X) + E_Hartree(X) + E_xc(X)
Surrogate:
  E_surrogate(X) = (1/2) trace(X^* H(X^{(i)}) X)
Gradient:
  ∇E_tot(X) = H(X)X
Toy Example
E_tot(x) = (1/2) x^T Lx + (α/4) ρ(x)^T L^{-1} ρ(x),
  L = [ 2 −1; −1 2 ],  x = (x1, x2)^T,  ρ(x) = (x1^2, x2^2)^T;
min E_tot(x) s.t. x1^2 + x2^2 = 1.
When does SCF work? How can it fail?
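The toy problem can be run directly. In this model the gradient of E_tot is H(x)x with H(x) = L + α·diag(L^{-1}ρ(x)), and SCF repeatedly replaces x by the smallest eigenvector of H(x); the starting vector and tolerances below are arbitrary choices:

```python
import numpy as np

L = np.array([[2.0, -1.0], [-1.0, 2.0]])
Linv = np.linalg.inv(L)
ALPHA = 2.0   # the slides' "SCF works" case

def H(x, alpha):
    """Toy 'Hamiltonian': grad E_tot(x) = H(x) x, H(x) = L + alpha*diag(L^{-1} rho(x))."""
    return L + alpha * np.diag(Linv @ x**2)

def scf(alpha, x0, maxit=200):
    x = x0 / np.linalg.norm(x0)
    for i in range(maxit):
        w, V = np.linalg.eigh(H(x, alpha))
        x_new = V[:, 0]                 # smallest eigenvector (unit norm)
        if x_new[0] < 0:
            x_new = -x_new              # fix sign to avoid artificial flip-flop
        if np.linalg.norm(x_new - x) < 1e-12:
            return x_new, i
        x = x_new
    return x, maxit

x, its = scf(ALPHA, np.array([0.9, 0.436]))
print(x, its)
```

For α = 2.0 the iteration settles at the symmetric solution x = (1, 1)/√2, which is a fixed point of the SCF map; per the slides, rerunning with α = 12.0 typically fails to settle.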
SCF Works (α = 2.0)
[Figure: constraint, surrogate, and total energy; SCF converges.]
SCF Fails (α = 12.0)
[Figure: constraint, surrogate, and total energy; SCF fails to converge.]
Improving SCF
Construct a better surrogate;
Cannot afford a local quadratic approximation (Hessian too expensive);
Charge-mixing;
Use a trust region to restrict the wavefunction update to a small neighborhood of the gradient-matching point;
TRSCF (Thogersen, Olsen, Yeager & Jorgensen 2004)
Trust Region
Must be defined on quantities that are "rotationally invariant":
  Density matrix D(X) = D(XQ) = XX^*;
  Charge density ρ(X) = ρ(XQ) = diag(D(X));
Use ‖D(X) − D(X^{(0)})‖_F ≤ Δ;
The trust-region subproblem must be easy to solve.
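The rotational invariance of D(X) is easy to verify numerically; a small sketch with an arbitrary orthonormal X and a random orthogonal Q:

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 10, 3
X, _ = np.linalg.qr(rng.standard_normal((n, k)))  # X^T X = I_k
Q, _ = np.linalg.qr(rng.standard_normal((k, k)))  # arbitrary orthogonal "rotation"

D = X @ X.T                    # density matrix D(X)
D_rot = (X @ Q) @ (X @ Q).T    # D(XQ) -- identical, since QQ^T = I
print(np.linalg.norm(D - D_rot))
```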
Trust Region Subproblem
Instead of solving
  min_{X^*X = I} q(X) s.t. ‖D(X) − D(X^{(0)})‖_F ≤ Δ,
we solve the penalized problem
  min_{X^*X = I} q(X) + (σ/2) ‖D(X) − D(X^{(0)})‖_F^2, or equivalently:
TRSCF
In each TRSCF iteration, we solve
In each TRSCF iteration, we solve
  min_{X^*X = I} (1/2) trace[ X^* H(X^{(0)}) X − σ X^* X^{(0)} X^{(0)*} X ]
First order (KKT) condition:
  [ H(X^{(0)}) − σ X^{(0)} X^{(0)*} ] X = XΛ, X^*X = I.
At convergence the eigenvalues are λ1 − σ, λ2 − σ, ..., λk − σ, λ_{k+1}, ..., λn.
How to pick σ?
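The level-shift effect on the spectrum can be verified directly: subtracting σ X^{(0)}X^{(0)*} moves only the occupied eigenvalues down by σ. A sketch with an arbitrary symmetric stand-in for H(X^{(0)}):

```python
import numpy as np

rng = np.random.default_rng(8)
n, k, sigma = 12, 3, 5.0
A = rng.standard_normal((n, n)); A = (A + A.T) / 2  # stand-in for H(X^{(0)})
w, V = np.linalg.eigh(A)
X0 = V[:, :k]                                       # "occupied" eigenvectors

# Level shift: only the occupied eigenvalues move down by sigma
w_shift = np.linalg.eigvalsh(A - sigma * (X0 @ X0.T))
expected = np.sort(np.concatenate([w[:k] - sigma, w[k:]]))
print(np.max(np.abs(w_shift - expected)))
```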
The Effect of Trust Region
[Figure: constraint, surrogate, and total energy with the trust-region shift.]
DCM
Input: L, D_ion, w_ℓ, X0 ∈ R^{n×m};
Output: X such that E_tot(X) is minimized
1. Orthonormalize the columns of X0; Θ = X0^* H(X0) X0, i = 1, P0 = [ ];
2. while ( not converged )
   (a) R_i = H(X_i)X_i − X_iΘ_i;
   (b) Set Y = (X_{i−1}, P_{i−1}, L^{-1}R_i);
   (c) Solve min_{G^*Y^*YG = I_k} E_tot(Y G);
   (d) X_i = Y G(1:m, :); Θ_i = X_i^* H(X_i) X_i, P_i = Y G(m+1:3m, :);
Numerical Example
Atomistic system: NiPtO;
Discretization: spectral method with a plane-wave basis;
n = 96 × 48 × 48 in real space, N = 15179 (number of basis functions) in frequency space;
Number of occupied states k = 43;
The PETOT version of SCF does 10 PCG steps per outer iteration;
DCM does 5 inner iterations.
Convergence
[Figure: ΔE(X^{(i)}) vs. wall clock time (seconds) for DCM and SCF.]
Conclusion
The use of a preconditioner is natural if we treat an eigenvalue problem (both linear and nonlinear) as
- a nonlinear system of equations, or
- a constrained nonlinear optimization problem (for certain classes of problems).
The choice of preconditioner is application dependent.