The Use of a Preconditioner in
Iterative Methods for Solving
Large-scale Eigenvalue
Problems
Chao Yang
Computational Research Division
Lawrence Berkeley National Laboratory, Berkeley, California, USA
Limitation of a Krylov Subspace
May require a high-degree polynomial φ(λ) to produce an accurate approximation z = φ(A)v0;
Subspace of large dimension; many restarts;
Spectral transformation may be prohibitively costly (sometimes impossible)
Preconditioner for Ax = b
Solve K^{-1}Ax = K^{-1}b;
Choose K such that κ(K^{-1}A) ≪ κ(A), or so that the eigenvalues of K^{-1}A are clustered;
K is easy to construct, and solving Ky = z is more efficient than solving Ax = b.
Preconditioner for Ax = λx?
Eigenvectors are not preserved under K^{-1}A;
Cannot extract correct spectral info from K(K^{-1}A, b; m) directly;
However, preconditioning does make sense if we treat an eigenvalue problem as
- a system of nonlinear equations (JD), or
- an optimization problem (LOBPCG)
Nonlinear Equation Point of View
Because λ(x) = x^T Ax / x^T x, the equation Ax = λ(x)x is nonlinear in x;
Alternative formulation: Ax = (x^T Ax)x, x^T x = 1;
Many solutions; solve by Newton's correction.
Given a starting guess u such that u^T u = 1; let θ = u^T Au;
Seek a pair (z, δ) such that A(u + z) = (θ + δ)(u + z);
Ignore the 2nd-order term δz (Newton correction) and impose u^T z = 0.
The Correction Equation
Augmented form:
  [ A − θI   u ] [  z ]   [ −r ]
  [  u^T     0 ] [ −δ ] = [  0 ],
where r = Au − θu;
Projected form:
  (I − uu^T)(A − θI)(I − uu^T) z = −r, where u^T z = 0.
Solving the Correction Eq (Direct)
Assume θ has not yet converged; block elimination yields
  [ I                   0 ] [ A − θI   u ] [  z ]   [ −r ]
  [ u^T(A − θI)^{-1}    1 ] [   0      γ ] [ −δ ] = [  0 ],
where γ = −u^T(A − θI)^{-1}u, so that
  δ = u^T(A − θI)^{-1}r / u^T(A − θI)^{-1}u.
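The eliminated formula for δ and the constraint u^T z = 0 can be checked numerically; a small sketch with an arbitrary random symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
A = rng.standard_normal((n, n)); A = (A + A.T) / 2   # symmetric test matrix
u = rng.standard_normal(n); u /= np.linalg.norm(u)
theta = u @ A @ u
r = A @ u - theta * u

# Augmented system [[A - theta*I, u], [u^T, 0]] [z; -delta] = [-r; 0]
M = np.zeros((n + 1, n + 1))
M[:n, :n] = A - theta * np.eye(n)
M[:n, n] = u
M[n, :n] = u
sol = np.linalg.solve(M, np.concatenate([-r, [0.0]]))
z, delta = sol[:n], -sol[n]

# Block-elimination formula: delta = u^T(A-tI)^{-1} r / u^T(A-tI)^{-1} u
w_r = np.linalg.solve(A - theta * np.eye(n), r)
w_u = np.linalg.solve(A - theta * np.eye(n), u)
delta_formula = (u @ w_r) / (u @ w_u)
print(abs(delta - delta_formula), abs(u @ z))
```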
Connection with Inverse Iteration
Adding the correction z to u directly:
  x = u + z = u + δ(A − θI)^{-1}u − u = δ(A − θI)^{-1}u;
Quadratic convergence in general; cubic convergence for symmetric problems;
But requires solving (A − θI)x = u.
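Combined with the Rayleigh-quotient update θ = u^T Au, this is Rayleigh quotient iteration; a minimal sketch (random symmetric test matrix, arbitrary seed) that typically locks onto an eigenpair in a handful of steps:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
A = rng.standard_normal((n, n)); A = (A + A.T) / 2  # symmetric test matrix

u = rng.standard_normal(n); u /= np.linalg.norm(u)
for i in range(20):
    theta = u @ A @ u                      # Rayleigh quotient
    res = np.linalg.norm(A @ u - theta * u)
    if res < 1e-10:
        break
    # Inverse iteration step with the Rayleigh-quotient shift
    x = np.linalg.solve(A - theta * np.eye(n), u)
    u = x / np.linalg.norm(x)
print(i, res)
```

The cubic convergence shows up as the residual collapsing from ~1e-2 to machine precision in two or three steps; the price is a new factorization of A − θI at every step, which is what JD avoids.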
Jacobi-Davidson (JD)
Solve the correction equation iteratively:
  (I − uu^T)(A − θI)(I − uu^T) z = −r, where u^T z = 0;
Allows the use of a preconditioner;
Instead of adding z to u, construct a search space S = {u, z}.
JD Inner-outer Iteration
Input: A, v0, tol;
Output: (u, θ) such that ‖Au − θu‖ is small
1. u ← v0/‖v0‖, V ← (u), θ = u^T Au, r ← Au − θu;
2. while ( ‖r‖ > tol )
   (a) Iteratively solve the correction equation
       (I − VV^T)(A − θI)(I − VV^T) z = −r approximately;
   (b) z ← (I − VV^T) z;
   (c) V ← (V, z), H = V^T AV;
   (d) Solve Hy = θy and select the desired (y, θ);
   (e) u ← V y, r ← Au − θu;
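The inner-outer iteration can be sketched in NumPy. This simplified version solves the correction equation exactly through the augmented system (the slides solve it approximately with a preconditioned iterative solver, which is the whole point of JD); the test matrix, seed, and basis limit are arbitrary:

```python
import numpy as np

def jd(A, v0, tol=1e-9, max_basis=30):
    """Minimal Jacobi-Davidson sketch for one eigenpair of a symmetric A."""
    n = A.shape[0]
    u = v0 / np.linalg.norm(v0)
    V = u[:, None]
    theta = u @ A @ u
    r = A @ u - theta * u
    while np.linalg.norm(r) > tol and V.shape[1] < max_basis:
        k = V.shape[1]
        # (a) augmented system equivalent to (I-VV^T)(A-theta I)(I-VV^T) z = -r
        M = np.zeros((n + k, n + k))
        M[:n, :n] = A - theta * np.eye(n)
        M[:n, n:] = V
        M[n:, :n] = V.T
        z = np.linalg.solve(M, np.concatenate([-r, np.zeros(k)]))[:n]
        z -= V @ (V.T @ z)           # (b) project: z <- (I - VV^T) z
        z /= np.linalg.norm(z)
        V = np.column_stack([V, z])  # (c) expand the search space
        H = V.T @ A @ V
        w, Y = np.linalg.eigh(H)     # (d) solve the projected problem
        theta, y = w[0], Y[:, 0]     #     target the smallest Ritz pair
        u = V @ y                    # (e) new Ritz vector and residual
        r = A @ u - theta * u
    return theta, u

rng = np.random.default_rng(2)
n = 60
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
theta, u = jd(A, rng.standard_normal(n))
print(np.linalg.norm(A @ u - theta * u))
```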
Practical Issues
Choose an iterative solver and a preconditioner (for the correction equation);
Set the tolerance for the inner iteration;
Shift selection;
Restart (set a limit on the dimension of V);
Compute more than one eigenpair.
Preconditioner for the Correction Eq
If K ≈ A, then K̂ ≈ Â, where
  K̂ = (I − VV^T) K (I − VV^T),   Â = (I − VV^T) A (I − VV^T);
To precondition, one must solve K̂x = b;
Use the augmented formulation and block elimination:
  K̂† = (I − Y G^{-1}V^T) K^{-1} = K^{-1} (I − V G^{-1}U^T),
where Y = K^{-1}V, G = V^T Y, U = K^{-T}V;
MATVEC: y ← (I − V G^{-1}Y^T) K^{-1} A (I − Y G^{-1}V^T) x
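The pseudoinverse formula can be checked numerically: for any z orthogonal to V, applying K̂† after K̂ should return z. A small sketch with an arbitrary well-conditioned (not necessarily symmetric) K:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 30, 4
K = rng.standard_normal((n, n)) + n * np.eye(n)   # arbitrary well-conditioned K
V, _ = np.linalg.qr(rng.standard_normal((n, k)))  # orthonormal V
P = np.eye(n) - V @ V.T                           # projector onto span(V)-perp

Y = np.linalg.solve(K, V)    # Y = K^{-1} V
G = V.T @ Y                  # G = V^T Y

def Khat(x):                 # Khat = (I - VV^T) K (I - VV^T)
    return P @ (K @ (P @ x))

def Khat_dag(b):             # Khat^dag = (I - Y G^{-1} V^T) K^{-1}
    w = np.linalg.solve(K, b)
    return w - Y @ np.linalg.solve(G, V.T @ w)

z = P @ rng.standard_normal(n)   # any z with V^T z = 0
print(np.linalg.norm(Khat_dag(Khat(z)) - z))
```

Note that only one solve with K (and the small k-by-k matrix G) is needed per application; the projected preconditioner never has to be formed.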
Termination Criterion - Inner Iteration
Fixed tolerance (larger than that required for the eigen-residual);
Dynamic tolerance (tighter when the eigen-residual is small);
Can estimate the eigen-residual from the correction-equation residual for certain solvers (CG, sQMR).
Example
Normal mode vibration analysis for macromolecules;
3000 atoms, n = 9000; interested in low-frequency modes (small eigenvalues).
Effect of Preconditioner
[Figure: eigenvalue distributions λ(F) vs. λ(B^{-1}F) and λ(F) vs. λ(L^{-1}CL^{-T}); preconditioning clusters the spectrum.]
Convergence History
[Figure: convergence history of the leading 20 eigenpairs of problem pe3k; residual norm vs. work (flops)/n^2.]
Other Issues
Block version (not trivial);
Can extend JD to polynomial or rational eigenvalue problems (Van der Vorst, Voss, Lin);
Automatic parameter tuning;
Missing eigenvalues?
The Optimization View
Only valid for symmetric problems and extreme eigenvalues.
Constrained optimization:
  min_{x^T x = 1} x^T Ax
Lagrangian:
  L(x, λ) = x^T Ax − λ(x^T x − 1)
KKT condition:
  Ax − λx = 0, x^T x = 1.
Geometry
[Figure: unit-circle constraint and Rayleigh-quotient contours.]
Direct Constrained Minimization
Assume x_k is the current approximation; update by
  x_{k+1} = α x_k + β p_k;
p_k is a descent (search) direction;
α, β are chosen so that x_{k+1}^T x_{k+1} = 1.
Search Direction
Steepest descent:
  r_k = −∇_x L(x_k, θ_k) = −(Ax_k − θ_k x_k)
Conjugate gradient:
  p_k = p_{k−1} + γ r_k, with γ chosen so that p_k^T A p_{k−1} = 0
How do we maintain the orthonormality constraint?
Subspace Minimization
Let V = (x_k, p_{k−1}, r_k); then x_{k+1} = V y_k for some y_k ∈ R^3;
Must solve
  min_{y_k^T V^T V y_k = 1} y_k^T V^T AV y_k
Equivalent to solving
  G y_k = λ B y_k, y_k^T B y_k = 1,
where B = V^T V and G = V^T AV.
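The loop above can be sketched with an orthonormalized basis, so that B = V^T V = I and the projected problem becomes a plain (at most) 3x3 eigenproblem; the test matrix, seed, and tolerances are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
A = np.diag(np.arange(1.0, n + 1)) + 0.1 * np.ones((n, n))  # symmetric test matrix

x = rng.standard_normal(n); x /= np.linalg.norm(x)
p = np.zeros(n)
for it in range(300):
    theta = x @ A @ x                 # Rayleigh quotient
    r = A @ x - theta * x             # residual = constrained gradient
    if np.linalg.norm(r) < 1e-8:
        break
    # Rayleigh-Ritz over span{x, p, r}; orthonormalizing V makes B = I
    cols = [x, r] if np.linalg.norm(p) == 0 else [x, p, r]
    Q, _ = np.linalg.qr(np.column_stack(cols))
    w, Y = np.linalg.eigh(Q.T @ A @ Q)
    x_new = Q @ Y[:, 0]               # minimizer within the subspace
    p = x_new - x * (x @ x_new)       # next CG-like direction
    x = x_new
print(it, theta)
```

Keeping x normalized is free here: the Ritz vector Q Y[:, 0] has unit norm by construction, which is one way the orthonormality constraint is maintained in practice.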
Compute More Eigenpairs
Trace minimization:
  min_{X^T X = I_m} (1/2) trace(X^T AX), where X ∈ R^{n×m};
Gradient:
  R_k = ∇_X L(X_k, Λ_k) = AX_k − X_kΛ_k, where Λ_k = X_k^T AX_k.
LOBPCG (Knyazev)
Input: A, K, X0 ∈ R^{n×m}, tol;
Output: (X, Λ) such that ‖AX − XΛ‖ is small
1. Orthonormalize the columns of X0; Λ = X0^* AX0, i = 1, P0 = [ ], R1 = AX0 − X0Λ;
2. while ( ‖Ri‖ > tol )
   (a) Set V = (X_{i−1}, P_{i−1}, K^{-1}R_i);
   (b) Compute the eigenvectors G corresponding to the m smallest eigenvalues of (H, B), where B = V^T V and H = V^T AV;
   (c) X_i = V G(1:m, :); Λ_i = X_i^T AX_i, P_i = V G(m+1:3m, :);
Practical Issues
Choice of preconditioner;
Linear dependency between columns of V;
Deflation (not all eigenpairs converge at the same rate);
Extension to the (symmetric) generalized eigenvalue problem (straightforward).
Example
Small accelerator model (n = 750);
Generalized problem;
Interested in the smallest eigenvalue.
[Figure: sparsity pattern of the matrix, nz = 26634.]
[Figure: change in residual norm, no preconditioner vs. diagonal preconditioner.]
Extension to a Nonlinear EV Problem
Quantum many-body problem reduced to single-particle wavefunctions through DFT;
Single-particle wavefunctions (orbitals):
  X = (x1, x2, ..., xk), X^*X = I_k, x_i ∈ C^n;
n - real-space grid size; k - number of occupied states;
Charge density ρ(X) = diag(XX^*);
Kohn-Sham total energy
KS Total Energy Minimization
min_{X^*X = I_k} E_tot(X) ≡ E_kinetic(X) + E_ion(X) + E_Hartree(X) + E_xc(X),
where
  E_kinetic = (1/2) trace(X^* LX)
  E_ionic = (1/2) [ trace(X D_ion X^*) + Σ_i Σ_ℓ (x_i^* w_ℓ)^2 ]
  E_Hartree = (1/4) ρ(X)^T S ρ(X)
  E_xc = e^T f_xc[ρ(X)]
First order (KKT) Condition
KKT condition:
  ∇_X L(X, Λ) = 0, X^*X = I.
Kohn-Sham equation:
  H(X)X = XΛ, X^*X = I.
Kohn-Sham Hamiltonian:
  H(X) = L + D_ion + Σ_ℓ w_ℓ w_ℓ^* + diag(Sρ(X)) + diag(g_xc(ρ(X)))
Two Approaches
Work with the KS equation: Self-Consistent Field (SCF) Iteration;
Minimize the total energy directly: Direct Constrained Minimization (extension of LOBPCG).
The SCF Iteration
Initial X → solve H(X)X̂ = X̂Λ̂ → check self-consistency: H(X̂)X̂ ≈ X̂Λ̂?
If yes, terminate; if no, set X ← X̂ and repeat.
Problems with SCF
Does not always converge; E_tot(X) may increase;
Little convergence theory;
Fixed-point iteration X^{(i+1)} = F(X^{(i)}) — but what is F(·)?
Solve a large-scale linear eigenvalue problem at each iteration?
Direct Constrained Minimization
Minimize the total energy directly; block method;
Wavefunction update similar to LOBPCG:
  X^{(i+1)} = X^{(i)}G_x + P^{(i−1)}G_p + R^{(i)}G_r,
  R^{(i)} = K^{-1}(H(X^{(i)})X^{(i)} − X^{(i)}Θ^{(i)}),
  Θ^{(i)} = X^{(i)*}H(X^{(i)})X^{(i)};
Choose G_x, G_p, G_r to minimize E_tot in Y = (X^{(i)}, P^{(i−1)}, R^{(i)}).
Minimization within a Subspace
Let Y = (X^{(i)}, P^{(i−1)}, R^{(i)}); solve
  min_G E_tot(Y G) s.t. G^* Y^*Y G = I_k
Equivalent to solving
  Ĥ(G)G = BGΩ_k, G^*BG = I_k,
where Ĥ(G) = Y^* H(Y G) Y and B = Y^*Y.
Solving the Projected Problem
A smaller nonlinear (generalized) eigenvalue problem
  Ĥ(G)G = BGΩ_k, G^*BG = I_k,
where Ĥ(G) ∈ C^{3k×3k}, G ∈ C^{3k×k};
Modify SCF by introducing a trust region (TRSCF).
The Optimization View of SCF
SCF minimizes a sequence of surrogate models.
Objective:
  E_tot(X) = E_kinetic(X) + E_ion(X) + E_Hartree(X) + E_xc(X)
Surrogate:
  E_surrogate(X) = (1/2) trace(X^* H(X^{(i)}) X)
Gradient:
  ∇E_tot(X) = H(X)X
Toy Example
E_tot(x) = (1/2) x^T Lx + (α/4) ρ(x)^T L^{-1} ρ(x),
  L = [ 2 −1; −1 2 ],  x = (x1, x2)^T,  ρ(x) = (x1^2, x2^2)^T;
min E_tot(x) s.t. x1^2 + x2^2 = 1.
When does SCF work? How can it fail?
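The toy problem can be run directly. In this model the gradient of E_tot is H(x)x with H(x) = L + α·diag(L^{-1}ρ(x)), and SCF repeatedly replaces x by the smallest eigenvector of H(x); the starting vector and tolerances below are arbitrary choices:

```python
import numpy as np

L = np.array([[2.0, -1.0], [-1.0, 2.0]])
Linv = np.linalg.inv(L)
ALPHA = 2.0   # the slides' "SCF works" case

def H(x, alpha):
    """Toy 'Hamiltonian': grad E_tot(x) = H(x) x, H(x) = L + alpha*diag(L^{-1} rho(x))."""
    return L + alpha * np.diag(Linv @ x**2)

def scf(alpha, x0, maxit=200):
    x = x0 / np.linalg.norm(x0)
    for i in range(maxit):
        w, V = np.linalg.eigh(H(x, alpha))
        x_new = V[:, 0]                 # smallest eigenvector (unit norm)
        if x_new[0] < 0:
            x_new = -x_new              # fix sign to avoid artificial flip-flop
        if np.linalg.norm(x_new - x) < 1e-12:
            return x_new, i
        x = x_new
    return x, maxit

x, its = scf(ALPHA, np.array([0.9, 0.436]))
print(x, its)
```

For α = 2.0 the iteration settles at the symmetric solution x = (1, 1)/√2, which is a fixed point of the SCF map; per the slides, rerunning with α = 12.0 typically fails to settle.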
SCF Works (α = 2.0)
[Figure: constraint, surrogate, and total energy; SCF converges.]
SCF Fails (α = 12.0)
[Figure: constraint, surrogate, and total energy; SCF fails to converge.]
Improving SCF
Construct a better surrogate;
Cannot afford a local quadratic approximation (Hessian too expensive);
Charge-mixing;
Use a trust region to restrict the wavefunction update to a small neighborhood of the gradient-matching point;
TRSCF (Thogersen, Olsen, Yeager & Jorgensen 2004)
Trust Region
Must be defined on quantities that are "rotationally invariant":
  Density matrix D(X) = D(XQ) = XX^*;
  Charge density ρ(X) = ρ(XQ) = diag(D(X));
Use ‖D(X) − D(X^{(0)})‖_F ≤ Δ;
The trust-region subproblem must be easy to solve.
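The rotational invariance of D(X) is easy to verify numerically; a small sketch with an arbitrary orthonormal X and a random orthogonal Q:

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 10, 3
X, _ = np.linalg.qr(rng.standard_normal((n, k)))  # X^T X = I_k
Q, _ = np.linalg.qr(rng.standard_normal((k, k)))  # arbitrary orthogonal "rotation"

D = X @ X.T                    # density matrix D(X)
D_rot = (X @ Q) @ (X @ Q).T    # D(XQ) -- identical, since QQ^T = I
print(np.linalg.norm(D - D_rot))
```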
Trust Region Subproblem
Instead of solving
  min_{X^*X = I} q(X) s.t. ‖D(X) − D(X^{(0)})‖_F ≤ Δ,
we solve the penalized problem
  min_{X^*X = I} q(X) + (σ/2) ‖D(X) − D(X^{(0)})‖_F^2, or equivalently:
TRSCF
In each TRSCF iteration, we solve
In each TRSCF iteration, we solve
  min_{X^*X = I} (1/2) trace[ X^* H(X^{(0)}) X − σ X^* X^{(0)} X^{(0)*} X ]
First order (KKT) condition:
  [ H(X^{(0)}) − σ X^{(0)} X^{(0)*} ] X = XΛ, X^*X = I.
At convergence the eigenvalues are λ1 − σ, λ2 − σ, ..., λk − σ, λ_{k+1}, ..., λn.
How to pick σ?
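The level-shift effect on the spectrum can be verified directly: subtracting σ X^{(0)}X^{(0)*} moves only the occupied eigenvalues down by σ. A sketch with an arbitrary symmetric stand-in for H(X^{(0)}):

```python
import numpy as np

rng = np.random.default_rng(8)
n, k, sigma = 12, 3, 5.0
A = rng.standard_normal((n, n)); A = (A + A.T) / 2  # stand-in for H(X^{(0)})
w, V = np.linalg.eigh(A)
X0 = V[:, :k]                                       # "occupied" eigenvectors

# Level shift: only the occupied eigenvalues move down by sigma
w_shift = np.linalg.eigvalsh(A - sigma * (X0 @ X0.T))
expected = np.sort(np.concatenate([w[:k] - sigma, w[k:]]))
print(np.max(np.abs(w_shift - expected)))
```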
The Effect of Trust Region
[Figure: constraint, surrogate, and total energy with the trust-region shift.]
DCM
Input: L, D_ion, w_ℓ, X0 ∈ R^{n×m};
Output: X such that E_tot(X) is minimized
1. Orthonormalize the columns of X0; Θ = X0^* H(X0) X0, i = 1, P0 = [ ];
2. while ( not converged )
   (a) R_i = H(X_i)X_i − X_iΘ_i;
   (b) Set Y = (X_{i−1}, P_{i−1}, L^{-1}R_i);
   (c) Solve min_{G^*Y^*YG = I_k} E_tot(Y G);
   (d) X_i = Y G(1:m, :); Θ_i = X_i^* H(X_i) X_i, P_i = Y G(m+1:3m, :);
Numerical Example
Atomistic system: NiPtO;
Discretization: spectral method with a plane-wave basis;
n = 96 × 48 × 48 in real space, N = 15179 (number of basis functions) in frequency space;
Number of occupied states k = 43;
The PETOT version of SCF does 10 PCG steps per outer iteration;
DCM does 5 inner iterations.
Convergence
[Figure: ΔE(X^{(i)}) vs. wall clock time (seconds) for DCM and SCF.]
Conclusion
The use of a preconditioner is natural if we treat an eigenvalue problem (both linear and nonlinear) as
- a nonlinear system of equations, or
- a constrained nonlinear optimization problem (for certain classes of problems).
The choice of preconditioner is application dependent.