A NULL SPACE FREE JACOBI-DAVIDSON ITERATION FOR MAXWELL'S OPERATOR

(1)

A NULL SPACE FREE JACOBI–DAVIDSON ITERATION

FOR MAXWELL’S OPERATOR∗

YIN-LIANG HUANG†, TSUNG-MING HUANG‡, WEN-WEI LIN§,

AND WEI-CHENG WANG¶

Abstract. We present an eﬃcient null space free Jacobi–Davidson method to compute the

positive eigenvalues of time harmonic Maxwell’s equations. We focus on a class of spatial discretiza-tions that guarantee the existence of discrete vector potentials, such as Yee’s scheme and the edge elements. During the Jacobi–Davidson iteration, the correction process is applied to the vector po-tential instead. The correction equation is solved approximately as in the standard Jacobi–Davidson approach. The computational cost of the transformation from the vector potential to the corrector is negligible. As a consequence, the expanding subspace automatically stays out of the null space and no extra projection step is needed. Numerical evidence conﬁrms that the proposed scheme indeed outperforms the standard and projection-based Jacobi–Davidson methods by a signiﬁcant margin.

Key words. time harmonic Maxwell’s equations, Yee’s scheme, edge elements, generalized eigenvalue problem, discrete vector potential, discrete deRham complex, Poincar´e Lemma, Jacobi– Davidson method

AMS subject classifications. 15A18, 15A90, 65F15 DOI. 10.1137/140954714

1. Introduction. Photonic crystals are made of dielectric materials with

pe-riodic structure. The shape and permittivity of the dielectric material completely determines the band structure of the photonic crystal. Over the past few decades, photonic crystals with speciﬁc band structures have been of practical interest and have been extensively studied. The governing equation for three-dimensional pho-tonic crystals is the time harmonic Maxwell’s equations:

∇ × H = iωεE, (1.1) ∇ × E = −iωμH, (1.2) ∇ · (εE) = 0, (1.3) ∇ · (μH) = 0, (1.4)

where ω is the frequency and ε = εrε0, μ = μrμ0. The constants ε0 = 8.854 × 10−12 _{Farad/meter and μ}0= 1.257×10−6Henry/meter represent vacuum permittivity ∗_{Submitted to the journal’s Methods and Algorithms for Scientiﬁc Computing section January 28,}

2014; accepted for publication (in revised form) August 25, 2014; published electronically January 8, 2015. This work is sponsored in part by the National Center of Theoretical Sciences of Taiwan and ST Yau Center at National Chiao Tung University.

http://www.siam.org/journals/sisc/37-1/95471.html

†_{Department of Applied Mathematics, National University of Tainan, Tainan, 700, Taiwan}

([email protected]). This author was supported by the Ministry of Science and Technology (101-2115-M-024-002).

‡_{Department of Mathematics,} _{National Taiwan Normal University, Taipei, 116,} _Taiwan

([email protected]). This author was supported by the Ministry of Science and Technology (102-2115-M-003-011-MY3).

§_{Department of Applied Mathematics, National Chiao Tung University, HsinChu, 300, Taiwan}

([email protected]). This author was supported by the Ministry of Science and Technology (103-2628-M-009-005).

¶_{Department of Mathematics, National Tsing Hua University and National Center for Theoretical}

Sciences, HsinChu, 300, Taiwan ([email protected]). This author was supported by the Ministry of Science and Technology (101-2115-M-007-002-MY2).

A1

(2)

and vacuum permeability, respectively. _{The relative permittivity εr} and relative permeability μrare dimensionless, material dependent parameters.

We can recast the Maxwell’s equations in terms of the electric ﬁeldE alone:

∇ × μ−1

r ∇ × E = λ εrE, (1.5)

∇ · (εrE) = 0, (1.6)

where λ = ε0μ0ω2 is the eigenvalue. The degenerate elliptic operator∇ × μ−1_r ∇× is self-adjoint and nonnegative. Since μr and εr are material dependent and therefore piecewise constant, (1.5) constitutes an elliptic interface problem [35, 16]. Equations (1.5) and (1.6) need to be supplied with appropriate boundary conditions which we will elaborate in the beginning of section 2.

Equation (1.6) serves as a constraint for the degenerate elliptic equation (1.5) and is redundant for the nonzero eigenvalues λ = 0 as a consequence of the basic identity from calculus

(1.7) ∇ · ∇× ≡ 0.

A traditional wisdom to reﬂect this fact is to adopt spatial discretizations that ad-mit discrete analogue of (1.7). This class of spatial discretizations includes the Yee’s scheme [37], the Whitney form [7, 36], the co-volume discretization [25], the mimetic discretization [20], and the edge element [23, 24, 28]. With this approach, the di-vergence free constraint (1.6) is ignored and the resulting discretized system is a generalized eigenvalue problem

(1.8) Ae = λBe,

whereA is the matrix representation of the discretized ∇ × μ−1_r ∇× and B is the mass matrix.

In the unconstrained formulation (1.8), the matrix A is symmetric and non-negative semideﬁnite. The major diﬃculty with this approach, however, is the large null space associated with (1.5) in the absence of (1.6). This can be seen easily from the identity

(1.9) ∇ × ∇φ ≡ 0.

In other words, the null space of∇ × μ−1_r ∇× contains all the gradient vectors. The discrete counterpart of (1.9) holds true for the generalized eigenvalue problem (1.8) with the class of spatial discretizations under consideration. As a result, a huge spurious null space arises from discarding the divergence free constraint (1.6). In Yee’s discretization, for example, the dimension of the null space is the same as the number of cells, constituting one third of the total degrees of freedom. This causes severe numerical diﬃculties in numerical computation since in practice, we are mainly interested in the lowest nonzero eigenvalues of (1.5), (1.6) which are now located deep in the interior of the spectrum.

In this paper, we propose a novel numerical scheme to handle the null space issue using a modified Jacobi–Davidson (JD) iteration. The JD method [1, 2, 3, 5, 12, 26, 31, 32] is a well-established, efficient eigensolver based on expanding subspace iteration. In contrast to classical eigensolvers such as the inverse power method [9, 15] and various Lanczos [2, 30] or Arnoldi methods [5], which require the shift-and-invert technique to compute interior eigenpairs, the linear system for the corrector in the JD iteration only needs to be solved approximately. This is the key to efficiency of

(3)

the JD method. On the other hand, since the corrector is only solved approximately, direct application of the JD method inevitably brings null vectors into the expanding subspace. As a result, the eﬃciency of the JD method is signiﬁcantly deteriorated. The pollution of the spurious null space is common to other eigensolvers as well. The standard remedy is to project the approximate eigenvector back to the orthogonal complement of the null space. A well-known approach for the projection is through the Helmholtz decomposition [1, 3, 15]. See also [9] for the approach of combining the CG method with multigrid iteration.

In this paper, we take a diﬀerent approach and propose a null space free Jacobi– Davidson (NFJD) iteration inspired by the Poincar´e lemma [10, 11, 29], whose discrete analogue remains valid for the class of spatial discretizations under consideration:

Poincar´e lemma. If v ∈ C1(Ω) and ∇ · v = 0 on a contractible domain Ω, thenv = ∇ × ˆv for some vector potential ˆv.

The Poincar´e lemma on general domains can be found in, for example, [33, Theorem 1.5]. See also [14, Theorem 2.1] for the discrete counterpart.

Motivated by the Poincaré lemma, we will show in Theorem 2.2the existence of discrete vector potentials for the relevant vector fields. Instead of solving the cor-rector directly, the novelty of our approach is to derive and solve an equation for the vector potential of the corrector. The vector potential only needs to be solved approx-imately as in the original Jacobi–Davidson approach. By taking the discrete curl of the approximate vector potential, we annihilate completely the components in the null space and obtain a good approximation of the corrector which is null vector free. As a result, the expanding subspace automatically satisfies the divergence free constraint (1.6). The total cost to get an approximate corrector from an approximate vector potential is a single sparse matrix-vector multiplication. This is by far much cheaper than any projection. Numerical experiments have confirmed that our approach ef-fectively resolves the slow convergence issue caused by the spurious null space in the original JD iteration. In addition, our scheme also outperforms the projection-based method by a significant margin over a wide range of parameter regime and provides a competitive alternative to existing schemes.

The rest of the paper is organized as follows. In section 2, we present detailed mathematical formulations for the eigenvalue problem (1.8), including Theorem 2.2, which characterizes the subspace spanned by the eigenvectors perpendicular to the null space in terms of discrete vector potentials. This is the foundation of our new scheme. The vector potential approach admitsa magnetic field interpretation which leads to an isospectral reformulation of (1.8). We then further interplay between these equivalent formulations in section 3 and derive our NFJD method as an application of Theorem 2.2. We also present a dual version of NFJD with the roles of electric field and magnetic field interchanged during the iteration. In section 4, we conduct extensive numerical comparison among the original JD method, the Helmholtz pro-jected Jacobi–Davidson (HPJD), and NFJD to show the robustness and efficiency of NFJD. The finite element version of our scheme is described in the appendix.

2. Background and mathematical formulation. The photonic crystals

con-sist of dielectric materials fabricated in periodic structure. The relative permittivity

εr(x) and relative permeability μr(x), as material dependent parameters, are therefore periodic and piecewise constant. In other words,

(2.1) (εr(x), μr(x)) =

(εr,1, μr,1) in material 1, (εr,2, μr,2) in material 2,

εr(x + a) = εr(x), μr(x + a) = μr(x),  = 1, 2, 3,

(4)

where the lattice translation vectors a_{, = 1, 2, 3, span the primitive cell which} extends periodically to form the photonic crystal.

From Bloch’s theorem [21], the eigenfunctions of (1.5) satisfy the k-periodic boundary condition:

(2.2) E(x + a_{) = e}√−1k·a_{E(x), = 1, 2, 3,} for some vector k in the ﬁrst Brillouin zone.

To apply our scheme, we will only consider a class of spatially compatible dis-cretizations satisfying discrete analogue of (1.7). Such disdis-cretizations include Yee’s scheme, the co-volume discretization, and the edge elements. In this paper, we only report numerical results from Yee’s discretization due to its simplicity in implemen-tation. In addition, it is also easier to ﬁnd an eﬃcient preconditioner for the corre-sponding linear system.

For simplicity of presentation, we assume that the primitive cell is a unit cube spanned by the basis vectors

3= {(xi−21, yj−12, zk)}, (2.9)

V

= {(x_i_{, y}_j_{, z}_k)}, where 1≤ i ≤ N1, 1≤ j ≤ N2, 1≤ k ≤ N3.

We denote byV_E,V_F, andV_V the spaces of complex valued, k-periodic functions on

E

,

F

_{, and}

V

, respectively: VE = E1(xi−1 2, yj, zk), E2(xi, yj−12, zk), E3(xi, yj, zk−12) Em(x + a) = e √ −1k·a_E_{m(x), m, = 1, 2, 3} ∼₌C3N1N2N3_, (2.10) VF = F1(xi, y_j−1 2, zk−12), F2(xi−12, yj, zk−12), F3(xi−12, yj−12, zk) Fn(x + a_{) = e}√−1k·a_F_{n(x), n, = 1, 2, 3} ∼₌_C3N1N2N3_, (2.11) (2.12) V_V =_Φ(xi_{, y}_j_{, z}_k)Φ(x + a_{) = e}√−1k·a_{Φ(x), = 1, 2, 3} ∼₌CN1N2N3_.

(5)

The curl operator can be discretized naturally using standard centered diﬀerencing:

(2.13) ∇_h× : V_E → V_F_.

For example, ifE ∈ V_E, then the ﬁrst component of∇_h× E is given by

(∇_h× E)1(xi, y_j−1 2, zk−12) = E3(xi, yj, zk−1 2)− E3(xi, yj−1, zk−12) h2 −E2(xi, yj−12, zk)− E2(xi, yj−12, zk−1) h3 , (2.14)

where h1= 1/N1, h2= 1/N2, h3= 1/N3 are mesh sizes and N1, N2, N3 are numbers of partitions in x, y, and z directions, respectively.

Similarly, one can deﬁne the discrete curl operator onV_F,

(2.15) ∇_h×∗:V_F → V_E with (∇_h×∗H)1(xi−1 2, yj, zk) = H3(xi−1 2, yj+12, zk)− H3(xi−12, yj−12, zk) h2 −H2(xi−12, yj, zk+12)− H2(xi−12, yj, zk−12) h3 (2.16)

and so on, as well as the discrete divergence operator onV_E, (2.17) −∇∗_h:V_E → V_V_, (−∇∗_hE)(x_i_{, y}_j_{, z}_k) =E1(xi+ 1 2, yj, zk)− E1(xi−12, yj, zk) h1 +E2(xi, yj+12, zk)− E2(xi, yj−12, zk) h2 +E3(xi, yj, zk+12)− E3(xi, yj, zk−12) h3 . (2.18)

One can show that −∇∗_h is indeed the adjoint of −∇_h, where the discrete gradient

∇h is given by (2.19) ∇_h:V_V → V_E_, (∇_hΦ)1(x_i−1 2, yj, zk) = Φ(xi, yj, zk)− Φ(x_i−1_{, y}_j_{, z}_k) h1 , (2.20)

and similarly for (∇_hΦ)2(xi, yj−1

2, zk) and (∇hΦ)3(xi, yj, zk−12).

Even though (2.18) is a natural ﬁnite diﬀerence interpretation of discrete diver-gence operator on V_E, we have adopted −∇∗_h instead of the usual notation∇_h· so that the notation is consistent with that of the edge element discretization detailed in AppendixA. In the latter case, the divergence free constraint can only be realized through the adjoint of the gradient operator.

(6)

The crucial identity

(2.21) −∇∗_h ∇_h×∗≡ 0,

a discrete analogue of the identity (1.7), follows from straightforward calculation. The resulting generalized eigenvalue problem can be written as

(2.22) ∇_h×∗_μ−1_r,h∇_h× E = λε_r,hE, E ∈ V_E_,

where εr,h : V_E → V_E _{and μ}−1_r,h : V_F → V_F represent (multiplication by) numerical approximations of εr and μ−1_r , respectively. Both ∇_h×∗ and ∇_h× are discrete ap-proximations of the curl operator∇×; they act on diﬀerent spaces and are adjoint to each other. See Lemma 2.1 below.

ForE, U ∈ V_E andF , H ∈ V_F, we denote by·, ·_E and·, ·_F the standard inner products onV_E andV_F_{, respectively:}

E, UE = h1h2h3 N1 i=1 N2 j=1 N3 k=1 (_E1U1)_i−1 2,j,k+ (E2U2)i,j−21,k+ (E3U3)i,j,k−12 , F , HF = h1h2h3 N1 i=1 N2 j=1 N3 k=1 (F1H1)_i,j−1 2,k−12 (2.23) + (_F2H2)_i−1 2,j,k−12 + (F3H3)i−12,j−12,k . (2.24)

The following lemma explains the notation used for the two discrete curl operators in (2.16), (2.14) and is crucial to the development of our scheme.

Lemma 2.1. If U, V ∈ V_E, then

(2.25) U, ∇_h×∗_μ−1_r,h∇_h× V _E =∇_h× U, μ−1_r,h∇_h× V _F_.

Proof. It suﬃces to show that

(2.26) U, ∇_h×∗H_E =∇_h× U, H_F for allU ∈ V_E_{, H ∈ V}_F_.

This follows from the summation by parts identity and the k-periodic boundary con-dition. To see this, we denote byW = ∇_h×∗H ∈ V_E. We have

U, ∇h×∗HE=U, W E =h1h2h3 N1 i=1 N2 j=1 N3 k=1 (U1W1)_i−1 2,j,k+ (U2W2)i,j−21,k+ (U3W3)i,j,k−12 :=h1h2h3(I1+I2+I3), (2.27) where I1= N1 i=1 N2 j=1 N3 k=1 U1,i−1 2,j,k H3,i−1 2,j+12,k− H3,i−12,j−12,k h2 −H2,i−12,j,k+12− H2,i−12,j,k−12 h3 , (2.28)

(7)

and I2, I3 are deﬁned similarly. The ﬁrst term in I1 can be rearranged as N1 i=1 N3 k=1 N2 j=1 U1,i−1 2,j,k H_3,i−1 2,j+12,k− H3,i−12,j−12,k h2 =− N1 i=1 N3 k=1 U1,i−1 2,1,k h2 H3,i−12,12,k+ N2 j=2 U1,i−1 2,j,k− U1,i−12,j−1,k h2 H3,i−12,j−12,k −U1,i−12,N2,k h2 H3,i−12,N2+12,k =− N1 i=1 N3 k=1 N2 j=1 U1,i−1 2,j,k− U1,i−12,j−1,k h2 H3,i−12,j−12,k, (2.29)

where in the last equality we have used (2.30) _U_1,i−1 2,N2,k= e √ −1k2_U 1,i−1 2,0,k, H3,i−12,N2+12,k= e √ −1k2_H 3,i−1 2,12,k, which follows from the k-periodic boundary conditions (2.2), (2.3) imposed onU and

H. The second term in I1 can be treated similarly. Overall, we have

I1= N1 i=1 N2 j=1 N3 k=1 −U1,i−12,j,k− U1,i−12,j−1,k h2 H3,i−12,j−12,k +U1,i−12,j,k− U1,i−12,j,k−1 h3 H2,i−12,j,k−12 , (2.31) and similarly I2= N1 i=1 N2 j=1 N3 k=1 −U2,i,j−12,k− U2,i,j−12,k−1 h3 H1,i,j−12,k−12 +U2,i,j− 1 2,k− U2,i−1,j−12,k h1 H3,i−12,j−12,k , (2.32) I3= N1 i=1 N2 j=1 N3 k=1 −U3,i,j,k−12 − U3,i−1,j,k−12 h1 H2,i−12,j,k−12 +U3,i,j,k−12 − U3,i,j−1,k−12 h2 H1,i,j−12,k−12 . (2.33)

Consequently, (2.26) follows from (2.27), (2.31), (2.32), and (2.33).

From Lemma 2.1, the operator ∇_h ×∗ _μ−1_r,h∇_h× is self-adjoint on V_E and semideﬁnite. In addition, the operator εr,h· is positive deﬁnite. It follows that the

(8)

eigenvalues in (2.22) are real and nonnegative. The eigenvectors constitute a basis in

VE and are orthogonal with respect to the inner product induced by εr,h:

E, UE,εr,h = h1h2h3 N1 i=1 N2 j=1 N3 k=1 (E1εr,hU1)_i−1 2,j,k + (E2εr,hU2)_i,j−1 2,k+ (E3εr,hU3)i,j,k−12 . (2.34)

We therefore have the following eigendecomposition: (2.35) V_E = ker(∇_h×) ⊕ ker(∇_h×)⊥εr,h_, where (2.36) ker(∇_h×) = ker∇_h×∗_μ−1_r,h∇_h×={V ∈ V_E_{, ∇}_h× V = 0} and ker(∇_h×)⊥εr,h ₌_{U ∈ V}_E_{| U, V}_ε r,h = 0, ∀ V ∈ ker(∇h×) = Span{V_j ∈ V_E| ∇_h×∗_μ−1_r,h∇_h× V_j _{= λj}_ε_r,hV_j_{, λ}_j _{> 0}.} (2.37)

The following main theorem is the foundation of our null space free algorithm. Theorem 2.2. U ∈ ker(∇_h×)⊥εr,h if and only if ε_r,hU = ∇_h×∗μ−1

r,hU for some

discrete vector potential U ∈ V_F.

Proof. Denote by (λj, Vj) the eigenpairs of the generalized eigenvalue problem (2.22).

Suppose that U ∈ ker(∇_h×)⊥εr,h_{; then there exist constants aj} such that U = λj>0ajVj. Thus εr,hU = εr,h λj>0 ajVj= λj>0 aj∇h×∗μ−1r,h∇h×λ−1j Vj =∇_h×∗ μ−1_r,h λj>0 aj∇h× λ−1j Vj =∇_h×∗_μ−1_r,hU, (2.38) where U = λj>0 aj∇h× (λ−1j Vj)∈ VF.

Conversely, we suppose that εr,hU = ∇h×∗μ−1r,hU with U ∈ VF. Then for any

V ∈ ker(∇h×), we have

(2.39) U, V _E,ε_r,h =ε_r,hU, V _E =∇_h×∗_μ−1_r,hU, V _E =μ−1_r,hU, ∇ _h× V _F _{= 0.} ThusU ∈ ker(∇_h×)⊥εr,h_{, completing the proof.}

The matrix representation of the discrete curl operator∇_h× : V_E → V_F is given by

C = ⎡ ⎢ ⎢ ⎢ ⎢ ⎣ 0 −1 h3K 3 1 h2 diag (K2, . . . , K2) 1 h3K 3 0 −1 h1 diag (K1, . . . , K1) −1 h2 diag (K2, . . . , K2) 1 h1 diag (K1, . . . , K1) 0 ⎤ ⎥ ⎥ ⎥ ⎥ ⎦∈ C 3N×3N_,

(9)

where K1= ⎡ ⎢ ⎢ ⎢ ⎣ 1 −e−√−1k1 −1 1 . ._. . ._. −1 1 ⎤ ⎥ ⎥ ⎥ ⎦∈ CN1×N1, (2.40a) K2= ⎡ ⎢ ⎢ ⎢ ⎣ IN1 −e− √ −1k2I N1 −IN1 IN1 . ._. . ._. −IN1 IN1 ⎤ ⎥ ⎥ ⎥ ⎦∈ C(N1N2)×(N1N2), (2.40b) K3= ⎡ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣ IN1×N2 −e− √ −1k3IN 1×N2 −IN1×N2 IN1×N2 . ._. . ._. −IN1×N2 IN1×N2 ⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ ∈ CN×N_, (2.40c) and N = N1N2N3.

The matrix representation of (2.22) is thus

(2.41) Ae = λBe,

whereA = C∗ BC. The diagonal matrices B and B represent multiplication by εr,h(x) for x∈

_E

_{and multiplication by μ}−1_r,h(x) for x∈

_F

, respectively.

Before we proceed, we brieﬂy summarize our notation for the reader’s convenience. (a) Hatted uppercase boldface characters denote discrete vector potentials of el-ements in V_E. The latter are denoted by (nonhatted) uppercase boldface characters. For example, V ∈ V_F is a vector potential of V ∈ V_E with

∇h×∗μ−1_r,hV = ε r,hV .

(b) Lowercase boldface characters denote the column vector representation of corresponding elements inV_E orV_F. For example, bothT ∈ V_E and T ∈ V_F have three components, each of them being an N1×N2×N3array, whilet and t are 3N1N2N3× 1 column vector representation of T and T , respectively. (c) Uppercase blackboard bold characters such as A, B, I denote matrices of

varying dimensions.

3. JD method, Helmholtz projection, and vector potential. The JD

method [32] is a subspace iteration algorithm for large sparse eigenvalue problems and has been proved successful in many practical applications such as various quan-tum dot models [17, 18, 19, 34].

The major advantage of the JD method for general eigenvalue problem is that the correction equation only needs to be solved approximately. However, this advantage becomes a drawback for (2.22) as the correcting procedure inevitably brings null space components into the expanding subspace, causing slow convergence.

A simple remedy is to project out the null space components by means of the Helmholtz decomposition. The Helmholtz projection has been applied in conjunction with both the inverse power iteration [15] and the JD iteration [3] and obtaineddecent

convergence rate. However, there is an essential diﬀerence in the two projection-based

(10)

approaches. While inverse power iteration requires accurate Helmholtz projection in order to maintain accuracy of the numerical eigenpairs, the JD method seems to be quite sensitive to inexact projections and therefore much more demanding on the accuracy of the Helmholtz projection step. We found that for the Helmholtz projected JD method to work properly, the projection step needs to be solved much more accurately than the generalized eigenvalue problem(2.22)itself. More detailed, quantitative demonstration of this assertion can be found in section 4.

In contrast, our remedy to the spurious null space components is to initialize and expand the approximating subspace in terms of the discrete vector potential given in Theorem 2.2. Instead of solving the correction equation directly, we lift the correc-tion equacorrec-tion and the approximate solver to the vector potential level. The vector potential version of the correction equation is similar to the original one. As in the original JD method, the vector potential only needs to be solved approximately. An approximate corrector is then obtained by taking the discrete curl of the approximate vector potential. The null space components are then annihilated completely to ma-chine accuracy at the expense of a single sparse matrix multiplication. As a result, in our vector potential based algorithm, the approximating subspace remains null space free throughout the iteration.

In this section, we ﬁrst review the original JD method and HPJD method and then introduce our NFJD method. A detailed numerical comparison of these methods will be given in section 4. Our numerical experiment conﬁrms that the vector potential approach indeed yields superior performance against original and projection-based JD methods.

3.1. JD method. The JD method for the generalized eigenvalue problem (2.22)

consists of the following steps:

1. To compute the ith eigenpair (λi, Ei), one initializes a subspace V1 := Span{ E1, . . . , Ei−1, V1} ⊂ VE, where E1, . . . , Ei−1, i ≥ 2, are previously computed eigenvectors corresponding to eigenvalues λ1, . . . , λi−1.

2. For k = 1, 2, 3, . . ., do

(i) Find θ ∈ R \ { λ1, . . . , λi−1} nearest to the target and U ∈ Vk, with

Uεr,h = 1 such that

(3.1) (∇_h×∗_μ−1_r,h∇_h× − θε_r,h)U , V

E = 0 for all V ∈ Vk. Denote by (θk, Uk) the solution to (3.1).

(ii) Set

(3.2) R_k := (∇_h×∗_μ_r,h−1∇_h× − θ_k_ε_r,h)U_k_.

(iii) If R_k_E is less than a prescribed tolerance, then _θ_k_{, U}_kis accepted as an eigenpair. Output (λi, Ei) = (θk, Uk). Stop.

Else, solve approximately forT from (3.3)

I − εr,hUk⊗ Uk∇h×∗μ−1_r,h∇h× −θkεr,hI − Uk⊗ (εr,hUk)T = −R_k_,

T ⊥εr,h Uk,

where (I − U ⊗ V )W := W − V , W _EU.

(11)

OrthonormalizeT against V_kwith respect to the inner product·, ·_E,ε_r,h to getV_k+1.

ExpandV_k+1:= Span{ V_k_{, V}_k+1}.

The implementation detail is summarized as Algorithm 1 in matrix-vector notation. Algorithm 1.JD andHPJDmethods. Additional projection steps in HPJD are marked by double parentheses.

SetE0= [ ], Λ0=∅.

for i = 1, 2, 3, . . . , imax do

Initialize a vectorv1 withv1B= 1, PBv1=v1andE∗_i−1Bv1= 0. SetV1= [Ei−1_{, v}1].

ComputeW1=V∗₁AV1.

for k = 1, 2, 3, . . . do

(i) Compute all the eigenpairs of (Wk− θ I)s = 0.

Select the desired eigenpair (θk, sk) with θk ∈ Λ/ i−1 nearest to the target ands_k2= 1.

(ii) Computeu_k=Vks_k, r_k= (A − θkB)uk. (iii) if r_k2< τJD then

Output λi = θk,e_i=u_k.

UpdateE_i= [E_i−1_{, e}_i], Λ_i= Λ_i−1∪ {λ_i}. Exit k.

else

Solve (approximately)

(I − Bu_ku∗_k)(A − θ_kB)(I − u_ku∗_kB)t = −r_k, t ⊥_Bu_k.

Apply Helmholtz projection: t ←− P_B _t PBt . B-orthonormalize t against Vk: vk+1= t − _k =1(v∗Bt)v t −k₌₁(v∗Bt)v_B. ExpandV_k+1= [V_k_{, v}_k+1],W_k+1= Wk V∗ kAvk+1 v∗ k+1AVk v∗k+1Avk+1 . end if end for k end for i

The main computational cost in the original JD method is in step (iii), where the correction equation

(3.4) (I − Bu_ku∗_k)(A − θ_kB)(I − u_ku∗_kB)t = −r_k_, t ⊥_Bu_k_,

is solved by standard iterative methods such as GMRES with a preconditioner [32], (3.5) Mp := (I − Buku∗_k)M(I − uku∗_kB),

where M is a preconditioner for (A − θkB). In jth iteration of the linear solver, one solves forz(j) from

(3.6) Mpz(j)=y(j)_, z(j)⊥_Bu_k_. The solution to (3.6) is given by

(3.7) z(j)=M−1y(j)− ζ(j)M−1Bu_k_, where _ζ(j)= u ∗

kBM−1y(j)

u∗

kBM−1Buk.

(12)

Remark 3.1. The correction equation (3.3) is equivalent to

∇h×∗μ−1_r,h∇h× − θkεr,hT = −Rk+ ηεr,hUk

=−∇_h×∗_μ−1_r,h∇_h× − θ_k_ε_r,hU_k_{+ ηεr,h}U_k_, (3.8)

where η = εr,hUk, ∇h ×∗μ−1r,h∇h × − θkεr,h−1εr,hUk−1_E . If (3.3) were solved accurately, then one would haveT = _θ1

kε −1

r,h∇h×∗μ−1r,h∇h× (T + Uk)−θk+η_θ_k Uk ∈ ker(∇_h×)⊥εr,h_{. In other words, the corrector T and hence the approximating subspace}

V

k+1would remain null vectors free. However, the same should not be expected when (3.3) is only solved approximately as in the original JD method (3.5)–(3.7). Under such circumstances, null vectors are inevitably introduced into

_V

_k+1 to deteriorate the overall performance.

3.2. HPJD method. To overcome the slowing down caused by the null space, a

standard approach is to remove the null space components by means of Helmholtz de-composition. An approximate solutiont of (3.4) is postprocessed with the Helmholtz projection before it is appended to the expanding subspace:

(3.9) t ←− P_B t tB =I − G(G∗BG)−1G∗B t tB .

HereG is the matrix representation of the discrete gradient ∇_h:V_V → V_E, and−G∗ is precisely the matrix representation of the discrete divergence operator (2.18).

The Helmholtz projection (3.9) requires solving an elliptic equation

(3.10) −G∗BG φ = −G∗B

t

tB

for each vector appended to the expanding subspace. The combination of the JD method and Helmholtz projection has been proposed in the literature [15]. In ad-dition to a linear solver and preconad-ditioner for the correction equation (3.4), the Helmholtz projection (3.9) also requires an eﬃcient Poisson solver and preconditioner for (3.10). The overall performance of HPJD depends on the solver/preconditioner selected for the correction equation and the Poisson equation. Although solving the Poisson equation (3.10) is straightforward and much easier compared to solving the correction equation (3.4), the load balance between the Poisson equation (3.10) and the correction equation (3.4) is somewhat delicate. We will elaborate this issue in section 4. Both standard JD and HPJD methods are summarized in Algorithm 1.

3.3. Discrete vector potential and the NFJD method. Instead of the

Helmholtz projection, we propose an alternative approach by vector potential for-mulation in order to ﬁlter out the null space components. The novelty of our scheme is to derive and solve a new correction equation satisﬁed by the vector potential. The vector potential T only needs to be solved approximately as in the original JD method. An approximate corrector for (3.3) is then obtained by takingT := ε−1_r,h∇_h×∗_μ−1_r,hT .

3.3.1. Vector potential approach for the correction equation (step (iii)).

We ﬁrst explain how to derive a new correction equation for the vector potential. This procedure can be illustrated, for example, by the evaluation of M−1Buk in (3.7), or equivalently by getting an approximate solution of

(3.11) (∇_h×∗_μ−1_r,h∇_h× −θ_k_ε_r,h)Q = ε_r,hU_k_.

(13)

By construction,U_k∈ ker(∇_h×)⊥εr,h_{. Therefore from Theorem 2.2, we have} (3.12) _ε_r,hU_k=∇_h×∗_μ−1_r,hU _k_.

We seek a solution to (3.11) of the form

(3.13) _ε_r,hQ = ∇_h×∗_μ−1_r,hQ. Substituting (3.12), (3.13) into (3.11), we have

(3.14) ∇_h×∗_μ−1_r,h∇_h× ε−1_r,h∇_h×∗_μ−1_r,h− θ_k_μ−1_r,h Q=∇_h×∗_μ−1_r,hU _k_. Instead of solving (3.11) directly, we propose to solve for the vector potential (3.15) _μ−1_r,h∇_h× ε−1_r,h∇_h×∗_μ−1_r,h− θ_k_μ−1_r,h Q_{= μ}−1_r,hU _k_,

or equivalently

(3.16) ∇_h× ε−1_r,h∇_h×∗−θ_k_μ_r,h_(μ−1_r,hQ) = U_k_,

then take Q = ε−1_r,h∇_h×∗_μ−1_r,hQ as an approximate solution of (3.11) which lies in ker(∇_h×)⊥εr,h automatically.

The same idea can be used to derive an equation for the vector potential of the corrector. We now elaborate the procedure in matrix notation. From Theorem 2.2, we have

(3.17) u_k=B−1C∗ B uk_, t = B−1C∗ B t. We now substitute (3.17) into the correction equation

(3.18) (I − Bu_ku∗_k)(A − θ_kB)(I − u_ku∗_kB)t = −r_k_, t ⊥_Bu_k_. The left-hand side of (3.18) becomes

(I − Bu_ku∗_k)(A − θ_kB)(I − u_ku∗_kB)t

= (I − C∗ B uku ∗_k BCB−1)(C∗ BC − θkB)B−1C∗ B(I − uku ∗_k BCB−1C∗ B) t = (I − C∗ B uku ∗_k BCB−1)C∗( BCB−1C∗ B − θk B)(I − uku ∗_k A) t

=C∗(I − B u_ku ∗_k A B−1)( A − θ_k B)(I − u_k u∗_k A) t,

where A = BCB−1C∗ B. The right-hand side of (3.18) reduces to

(3.19) r_k = (A − θkB)uk = (C∗ BCB−1C∗ B − θkC∗ B) uk =C∗( A − θk B) uk :=C∗ rk_, which implies that

(3.20) C∗(I − B u_ku ∗_k A B−1)( A − θk B)(I − uku ∗_k A) t+ rk_{= 0,} where

(3.21) rk= ( A − θk B) uk_. It suﬃces to solve

(I − B u_ku ∗_k A B−1)( A − θ_k B)(I − uku ∗_k A) t= − r_k (3.22)

(14)

under the original constraintt ⊥_Bu_k. From (3.17), it follows that (3.23) t∗Bu_k = t∗ BCB−1BB−1C∗ B u_k= t∗ A u_k_.

Thus the condition t ⊥_Bu_k now translates to t ⊥_Au _k, and the correction equation for the vector potential is given by

(I − B u_ku ∗_k A B−1)( A − θ_k B)(I − u_ku ∗_k A) t= − r_k_, t⊥_Au _k_. (3.24)

The solution procedure for (3.24) remains the same. One solves it iteratively with a preconditioner similar to the one given in (3.5):

(3.25) M _p:= (I − B u_ku _k∗ A B−1) M(I − u_ku ∗_k A), where M is a preconditioner for ( A − θ_k B).

In jth GMRES iteration, one solves for

Mp z(j)=y (j), u ∗k A z(j)= 0. (3.26)

A solution to (3.26) is therefore given by

(3.27) z(j)= M−1y (j)− ˆζ(j)M −1 B uk_, where _ζˆ(j)= u ∗

k A M−1y (j)

u∗_k A M−1 B uk.

Note that if t is an approximate solution of (3.24), then the transformation (3.17) gives rise to an approximate solution of (3.18) which lies in ker(∇_h×)⊥εr,h _{automatically.} The only diﬀerence between NFJD and HPJD is how an approximate solution to (3.18) (that lies in ker(∇_h×)⊥εr,h) is obtained. A more detailed comparison between NFJD and HPJD can be found in section 4.

For the purpose of implementation, it is more convenient to work with the vector potential variables. We now express the rest of the steps in Algorithm 1 in terms of the vector potential. The B-orthonormalization for the corrector in step (iii) of Algorithm 1,

(3.28) v_k+1= t −

_k

=1(v∗Bt)v

t −k₌₁(v∗Bt)v_B, can now be recast to

(3.29) B−1C∗ B vk+1= B

−1_C∗ B t− k

=1( v∗ A t) v

t −k₌₁(v∗Bt)v_B .

We can therefore express the orthogonalization procedure in terms of vector potentials and A as (3.30) vk+1= t−  k =1( v∗ A t) v t −k₌₁(v∗Bt)v_B = t− k =1( v∗ A t) v t− k =1( v∗ A t) v_A ,

where we have adopted the notation

(3.31) | z|_A:= ( z∗ A z)12 = ( z∗ BCB−1C∗ B z)12 =B−1C∗ B z_B_.

(15)

3.3.2. Isospectral reformulation and the subspace eigenvalue problem (step (i)). It remains to formulate the subspace eigenvalue problem in terms of vector

potentials. This can be done by retaining step (i) in Algorithm 1 and substituting

v = B−1_C∗ B v to get (3.32)

(W_k− θ_kI)s_k _{= 0,}

(W_k)_m=v∗Av_m= v∗ BCB−1C∗ BCB−1C∗ B v_m= v∗ A B−1 A v_m; u _k= V_ks_k_. The subspace eigenvalue problem (3.32) amounts to the following:

(SEP-1) Find uk ∈ Range( V_{k), θk} _{> 0, nearest to the target such} that

(B−1C∗ B v)∗(A − θkB)(B−1C∗ B uk) = 0 for all v ∈ Range( Vk).

This is essentially identical to standard approximation of the original pencil (A, B) on the subspaceV_k, except the subspace now takes the particular formV_k=B−1C∗ B V_k and remains perpendicular to the null space.

There is an alternative approach to formulate the subspace eigenvalue problem in terms of the vector potential. Recall from (3.19) and (3.21) that (θk, B−1C B uk) is an approximate eigenpair of the pencil (A, B) with residual rk = (A − θkB)uk, provided that (θk, uk) is an approximate eigenpair of the pencil ( A, B) with residual rk = ( A − θk B) uk andr_k=C∗ rk.

In fact, it is not diﬃcult to verify that the two pencils (A, B) = (C∗ BC, B) and ( A, B) = ( BCB−1C∗ B, B) have identical spectrum. This is not surprising since the eigenvalue problem A e = λ B e, or equivalently

(3.33) CB−1C∗h = λ B−1h,

is nothing but the matrix representation of the Maxwell’s equations (1.1)–(1.4) written in terms of the magnetic ﬁeldH:

(3.34) ∇_h× ε−1_r,h∇_h×∗H = λμ_r,hH.

In view of this, we now have another formulation for the subspace eigenvalue problem:

(SEP-2) Find u _k ∈ Range( V_k_{), θk} _{> 0 nearest to the target such} that

v∗( A − θk B) uk _{= 0, for all}v ∈ Range( V_k).

Or, in matrix notation (recall the normalization v∗_l A vm_{= δlm} in step (iii)), (3.35) (I − θ_k Z_k)s_k_{= 0,} ( Z_k)_m= v∗ B v_m; u _k = V_ks_k_.

Even though the pencils (A, B) and ( A, B) are isospectral, their subspace ap-proximations (SEP-1) and (SEP-2) are generally diﬀerent and correspond to the pencils ( V∗_k A B−1 A Vk_,V∗_k A Vk) and ( V∗_k A Vk_,V∗_k B Vk), respectively. The u _k selected from (SEP-1) or (SEP-2) is implicitly processed to get an approximate eigenpair (θk, uk) = (θk, B−1C∗ B uk) for the pencil (A, B). Similar to A, the matrix A also possesses a huge null space consisting of those v such that C∗ B v = 0. When the

(16)

selectedu _k lies very close to ker( A), the corresponding (θ_k_{, u}_k) is no longer a good approximate eigenpair for (A, B). In this situation, θ_k ≈ 0 and the dominant part of

uk (the ker( A) component) is completely wiped out upon multiplication by B−1C∗ B. The remaining components in ker( A)⊥B are therefore ampliﬁed to get an essentially random vector u_k = B−1C∗ B uk in ker(A)⊥B_{. To prevent this from happening, we} have set up a threshold in the selection of the nearest-to-target approximate eigen-value for (SEP-2). More precisely, we select the smallest θ such that θ ≥ θc > 0

in (SEP-2). The threshold value θc and other parameters are detailed in section 4. On the other hand, a threshold is not needed for (SEP-1) since it also corresponds to a subspace eigenvalue problem for the pencil ( A B−1 A, A). The above mentioned scenario (θk≈ 0, with uk very close to ker(A)) does not occur. We simply choose the smallest θ > 0 for (SEP-1).

In view of (3.32) and (3.35), it is obvious that the magnetic ﬁeld approach (SEP-2) requires fewer arithmetic operations in forming the subspace matrix (W_k vs Z_k). We have implemented both versions and found that (SEP-2) with the threshold indeed prevails under otherwise identical settings and they both outperform HPJD.

Note, however, that setting up a threshold does not help in the original JD method (that is, without Helmholtz projection). In general, JD with the same threshold per-forms worse than the original JD and frequently fails to converge.

We summarize the NFJD method using the magnetic ﬁeld subspace eigenvalue problem (SEP-2) as Algorithm 2. A detailed comparison between HPJD and NFJD will be given in section 4.

3.3.3. Stopping tolerance, error bounds, and variant of NFJD. In NFJD,

even though the actual working variables are the vector potentials (or magnetic ﬁeld vectors), the underlying eigenvalue problem remains the original one for the electric ﬁeld, namely, Ae = λBe. An error bound for the computed eigenpair follows from the standard estimate [27]:

| sin ∠B(ei, eNFJDi )| ≤ 1 γi rNFJD B−1 eNFJD i B ≤ 1 γi |||B−1_|||₂_rNFJD 2 eNFJD i B , |λi− λNFJDi | ≤min rNFJD B−1 eNFJD i B , 1 γi rNFJD B−1 eNFJD i B 2 ≤min |||B−1_|||₂_rNFJD 2 eNFJD i B , 1 γi |||B−1_|||₂_rNFJD 2 eNFJD i B 2 , (3.36) where (3.37) | sin ∠_B(u, v)| = 1− u∗_Bv uBvB 2 , γi = min λj=λi|λj− λ NFJD i |.

Since rNFJD = C∗ r, where rNFJD = (A − θB)u, r = ( A − θ B) u, the stopping criterion r2< τNFJD implies (3.38) rNFJD2=C∗ r ≤ |||C∗|||2 r2< |||C∗|||2τNFJD with (3.39) |||C∗|||2≤ |||C∗_|||₁_|||C∗_|||_∞_{= 2 max} 1 h1 + 1 h2, 1 h2 + 1 h3, 1 h3 + 1 h1 .

(17)

Algorithm 2.NFJD method. Primary working variables (hatted vectors) are inV_F and output eigenvectors are inV_E.

Set E0= [ ], Λ0=∅.

for i = 1, 2, 3, . . . , imax do

Initialize a vector potential v1 with| v1|_A= 1 and E∗_i−1 A v1= 0. Set V1= [ Ei−1_,v1].

Compute Z1= V∗₁ B V1.

for k = 1, 2, 3, . . . do

(i) Compute all the eigenpairs of (I − θ Zk)s = 0.

(ii) Computeu _k= V_ks_k, rk= ( A − θ_k B) uk. (iii) if ( r_k2< τNFJD) then

Output λi = θk ande_i=B−1C∗ B u_k. Update E_i= [ E_i−1_,u_k], Λ_i= Λ_i−1∪ {λ_i}. Exit k.

else

Solve (approximately)

(I − B uku ∗_k A B−1)( A − θk B)(I − uku ∗_k A) t= − rk, t⊥_Au _k. A-orthonormalize t against Vk: vk+1= t−  k =1( v∗ A t) v t− k =1( v∗ A t) v_A . Expand V_k+1= [ V_k_,v_k+1], Z_k+1= Zk V∗k B vk+1 v∗k+1 B Vk v∗k+1 B vk+1 . end if end for k end for i

Upon convergence, an approximate eigenvectore is obtained by taking eNFJD=u = B−1_C∗ B u, which is B-normalized as in standard JD and HPJD computation: (3.40) eNFJD_B=u_B=B−1C∗ B u_B=| u|_A_{= 1.}

An error bound like (3.36) applies equally to JD and HPJD. In view of (3.38) and (3.40), it follows that for NFJD to give comparable accuracy with JD or HPJD computation, it suﬃces to take τJD=|||C∗|||2τNFJD. This is the basis of our numerical comparison. See section 4 for details.

With the magnetic field interpretation for NFJD, an alternative null space free ap-proach emerges naturally. First observe that a slight change in step (iii) of Algorithm 2, from “Output λi = θk ande_i=B−1C∗ B uk_{” to “Output λi} _{= θk} andh_i= B u_k,” results in a (null space free) numerical scheme for the magnetic field eigenvalue prob-lem (3.33). Alternatively, one could start with the JD method for (3.33) instead and then apply a similar derivation as in NFJD. This approach is dual to NFJD with the roles of electric field vectors and magnetic field vectors interchanged during the iteration. In this new scheme, denoted as NFJD∗, both the primary working variables and output eigenvectors are the electric field vectors. The magnetic field vectors (i.e., the vector potentials) only appear as auxiliary variables in the derivation. We omit the details and summarize the result in Algorithm 3.

(18)

Algorithm 3. _NFJD∗_{: dual version of JD method for Maxwell’s} equations_{. Primary working variables and output eigenvectors are both in}V_E_. SetE0= [ ], Λ0=∅.

for i = 1, 2, 3, . . . , imax do

Initialize a vectorv1 with|v1|A= 1 and E∗_i−1Av1= 0. SetV1= [Ei−1_{, v}1].

ComputeZ1=V∗1BV1.

for k = 1, 2, 3, . . . do

(i) Compute all the eigenpairs of (I − θZk)s = 0.

(ii) Computeu_k= Vksk

VkskB,rk = (A − θkB)uk. (iii) if r_k2< τNFJD∗ then

Output λi = θk,e_i=u_k.

UpdateEi= [Ei−1_{, e}_i], Λ_i= Λ_i−1∪ {λ_i}. Exit k. else Solve (approximately) (I − Bu_ku∗_kAB−1)(A − θ_kB)(I − u_ku∗_kA)t = −r_k, t ⊥_Au_k. A-orthonormalize t against Vk: vk+1= t − _k =1(v∗At)v |t −k₌₁(v∗At)v|_A. ExpandVk+1= [Vk_{, v}_k+1],Zk+1= Zk V∗ kBvk+1 v∗ k+1BVk v∗k+1Bvk+1 . end if end for k end for i

The major difference between NFJD and NFJD∗lies in the routine matrix-vector multiplication. Namely, A v = BCB−1C∗ B v in NFJD versus Av = C∗ BCv in NFJD∗. For Yee’s discretization, both B and B are diagonal matrices (with μ_r≡ 1, B = I in most applications). The difference between NFJD and NFJD∗is insignificant. In the finite element case, the mass matricesB and B are sparse and banded. In addition, multiplication by A in NFJD requires solving a linear system Bv = c and is more expensive than NFJD∗. In Appendix A, we will give a brief derivation for the finite element version of NFJD∗.

4. Numerical tests. Our numerical tests are based on the benchmark example

shown in Figure 1, where the periodic dielectric structure within a primitive cubic cell is depicted. The structure consists of dielectric spheres with radius r connected by circular cylinders with radius s. Here r/a = 0.345, s/a = 0.11, and a is the edge length of the cube. Inside the structure is the dielectric material with permittivity contrast εr,i/εr,o= 13 and μr,i= μr,o= 1 (corresponding to B = I).

Figure 2 shows the plot of w = a√λ/(2π) versus sample points k in the ﬁrst

Brillouin zone for the benchmark problem computed using NFJD with N1 = N2 =

N3 = 100. The smallest nonzero eigenvalues are calculated for 40 sample points k distributed along the segments connecting Γ = (0, 0, 0), X = (π, 0, 0), M = (π, π, 0),

R = (π, π, π), and back to Γ in the ﬁrst Brillouin zone. A clear band gap lies between

the ﬁfth and sixth smallest positive eigenvalues.

(19)

a

Fig. 1_{. The periodic dielectric structure within a primitive cell. Inside: dielectric material.} Outside: air. Herer/a = 0.345, s/a = 0.11, and ε_r,i/εr,o= 13.

As a preliminary test, we summarize the result with various grid resolutions in Table 1. Here wlow denotes the maximum of the ﬁfth eigenvalue, wup the minimum of the sixth eigenvalue, and

γgm:= wup− wlow (wup+ wlow)/2.

The result shows clear convergence and agrees well with those reported in the litera-ture [6, 9].

To further illustrate the accuracy and eﬃciency of NFJD, we have devised several numerical experiments for JD, HPJD, and NFJD in various settings. In the following tests, all examples are computed using 1003cells with the initial vector obtained from interpolating the ground state of 503 calculation. All computations are conducted under identical settings for JD, HPJD, and NFJD, except the tolerances are scaled according to (3.38) and (3.39),

(4.1) _τNFJD:= τJD/400.

The fast Fourier transform (FFT) is used as a preconditioner for (3.10), (3.18), and (3.24). More precisely, we take

(4.2) M = (C∗C − θεrI), M = (ε −1_r CC∗− θI),

where εr and ε−1r are spatial averages of εr and ε−1r , respectively. Similarly, we use G∗_{G as the preconditioner for the Poisson equation (3.10). The matrices M,}_{M, and} G∗_{G are all FFT-invertible. Since both the correction equations and their} precondi-tioners are highly indeﬁnite (with about one third of eigenvalues being negative), we

(20)

0 0.1 0.2 0.3 0.4 0.5 0.6 Γ Γ X M R w = a √ λ/ (2 π )

Fig. 2. Band structure computed with 100× 100 × 100 grid. Table 1

The computed gap-midgap ratio with various grid sizes. grids 50× 50 × 50 100× 100 × 100 200× 200 × 200

wlow 0.41785 0.41789 0.41782

wup 0.48023 0.48079 0.48096

γgm 0.1389 0.1400 0.1405

adopt GMRES as the linear solver for (3.18), (3.24), and PCG as the linear solver for (3.10), in conjunction with the preconditionersM, M, and G∗G. All numerical tests are performed on a PC equipped with an Intel Q9550 2.83-GHz processor and 16 GB main memory using Intel Fortran compiler version 11.1.

We start with investigating the eﬀect of the null space on the original JD method.

Example 4.1. Standard JD and dragging eﬀect of the null space. Figure 3 shows

a typical convergence history of the Ritz value and the residual for the first two eigenvalues in standard JD. The computed Ritz value θj’s are constantly dragged toward zero during the subspace iteration. This effect of dragging is also reflected in the convergence history of the residual. Without further treatment on the null space, a significant portion of the CPU time in standard JD is wasted in producing dragged Ritz values and the resulting scheme is much too slow for practical applications. In contrast, the Ritz value converges monotonically for NFJD, as shown in Figure 4.

A standard approach to accelerating convergence in JD is to apply Helmholtz projection as described in section 3.2. In addition to eﬃcient solvers and precondi-tioners for the correction equation and the Helmholtz projection, the performance of HPJD also relies on proper load balance between them. Denote by τHP and τJD the stopping tolerance for (3.10) and (3.2), respectively. More precisely, an approximate eigenpair (θ, u) with uB= 1 is accepted as a solution to (1.8) provided

(4.3) (A − θB)u2< τJD,

(21)

10 20 30 40 0 1 2 3 4 (a) 10 20 30 40 10-10 10-5 100 105 (c) 10 20 30 40 0 1 2 3 4 5 (b) 10 20 30 40 10-10 10-5 100 105 (d) λ1= 3.4505 _λ₂_{= 4.3901} θk θk  rk 2  rk 2 k k k k

Fig. 3_{. (a)–(b) Convergence history of the Ritz value}θ_k_forλ₁ _andλ₂_{using standard JD with}

k = (π, 0.6π, 0). (c)–(d) Convergence history of the residual.

while an approximate solution φ of (3.10) is accepted if

(4.4) − G∗BG φ + G∗B t tB 2< τHP.

Roughly speaking, there is a critical τHPc (depending on τJD) such that HPJD does not converge if τHP> τHPc . Since the overall accuracy of HPJD is governed by τJD only, one should take τHP≈ τHPc in order to minimize CPU time spent on the Helmholtz projection and get optimal performance. However, it is diﬃcult to predict a priori what τHPc is. This is a fairly good reason to advocate NFJD and NFJD∗.

We have conducted extensive numerical experiments and concluded that

(4.5) _τ_HPc τJD

in the sense that τHPc is generally two or three orders of magnitude smaller than τJD. We now proceed to illustrate (4.5) with the following examples.

(22)

10 20 30 101 102 (a) 10 20 30 10-10 100 (c) 2 4 6 8 10 4.35 4.4 4.45 (b) 2 4 6 8 10 10-10 10-5 (d) λ1= 3.4505 λ2= 4.3901 θk θk  rk 2  rk 2 k k k k

Fig. 4_{. (a)–(b) Convergence history of the Ritz value for}λ₁ _and λ₂ _{using NFJD with k =} (π, 0.6π, 0). (c)–(d) Convergence history of the residual.

Example 4.2. HPJD with (τJD, τHP) = (10−6_{, 10}−7_{). This is the case where τ}HP exceeds the critical value τHPc , resulting in an inexact Helmholtz projection. The remaining null space components accumulate gradually as the subspaceVk grows. As a result, HPJD fails to converge at higher eigenvalues. We have observed a similar dragging eﬀect as shown in Figure 5,

Example 4.3. Detailed comparison between HPJD and NFJD. We now compare

NFJD against HPJD at its optimal setting. We set τJD= 10−6_{and τ}NFJD= 2.5×10−9 according to (4.1). The critical τHP roughly corresponds to τHPc = 10−9 ∼ 10−8, depending on other parameters (see below). As τHP decreases, the overall CPU time for HPJD increases constantly. The smallest positive eigenvalue in our simulation occurs at k = (0.1π, 0, 0) with λ1∼ 0.045. Accordingly, we have set our threshold for NFJD as θc= 0.01 (see section 3.3.2 for the reason for setting up the threshold). An approximate eigenvalue in step (i) of NFJD is considered admissible only if θ ≥ θc.

In Table 2, we summarize the result for the computation of λ1 through λ10 with diﬀerent parameters. One of the varying parameters is the restart dimension, denoted by NS. To accelerate convergence, we recycle a number of Ritz vectors after λi−1 converges (i ≥ 2) and after a restart. These NR Ritz vectors are used to build up initial subspace for λiin the beginning and after a restart. This accelerated version of NFJD is summarized as Algorithm 4. The accelerated version of HPJD and NFJD∗ can be obtained through similar modiﬁcation.

(23)

0 20 40 60 80 100 120 140 3 4 5 6 7 8 theta 0 20 40 60 80 100 120 140 10−5 100 105

Convergence history of the Ritz value forλ5= 6.8940166

Convergence history of the residual

rk 2

k k

Fig. 5_{. Convergence history of Ritz value} θ_k _{and residual}r_k _forλ₅ _{at k = (0.5π, 0, 0) when} using HPJD withτJD= 10−6,τHP= 10−7.

Table 2

CPU time comparison for ﬁrst 10 eigenvalues (averaged over k) with τNFJD = 2.5 × 10−9, τJD= 10−6, andτHP ≈ τ_HPc . NS: restart dimension. NR: number of recycled Ritz vectors from λi−1’s subspaceV_k. (NR, NS) τHP HPJD NFJD Ratio (3, 15) 10−8 5473 sec 3789 sec 1.444 (5, 15) 10−8 4897 sec 3536 sec 1.385 (7, 15) 10−8 4537 sec 3442 sec 1.318 (9, 15) 10−8 4347 sec 3528 sec 1.232 (NR, NS) τHP HPJD NFJD Ratio (3, 25) 10−9 5097 sec 3695 sec 1.379 (5, 25) 10−9 4720 sec 3412 sec 1.383 (7, 25) 10−9 4493 sec 3308 sec 1.358 (9, 25) 10−9 4299 sec 3307 sec 1.300

The results in Example 4.3 show that NFJD outperforms HPJD by a signiﬁcant margin in all cases. In addition, HPJD is considerably slower for small NR’s. On the other hand, the performance of NFJD is relatively insensitive to (NR, NS).

Next, in Table 3, we document in more detail the relevant components of HPJD and NFJD. One can see that, for sufficiently large NR, the CPU time spent on the JD part are comparable in HPJD and NFJD. The additional CPU time spent in Helmholtz projection constitutes a significant portion in HPJD, even though such an FFT-based Helmholtz projector is indeed extremely efficient. For smaller NR, NFJD is even more efficient on the JD part.

Finally, in Table 4, we record the total CPU time needed for HPJD as τHP de-creases below τHPc for various cases of (NR, NS). As expected, the CPU time spent in