• 沒有找到結果。

A NULL SPACE FREE JACOBI-DAVIDSON ITERATION FOR MAXWELL'S OPERATOR

N/A
N/A
Protected

Academic year: 2021

Share "A NULL SPACE FREE JACOBI-DAVIDSON ITERATION FOR MAXWELL'S OPERATOR"

Copied!
29
0
0

加載中.... (立即查看全文)

全文

(1)

A NULL SPACE FREE JACOBI–DAVIDSON ITERATION

FOR MAXWELL’S OPERATOR

YIN-LIANG HUANG, TSUNG-MING HUANG, WEN-WEI LIN§,

AND WEI-CHENG WANG

Abstract. We present an efficient null space free Jacobi–Davidson method to compute the

positive eigenvalues of time harmonic Maxwell’s equations. We focus on a class of spatial discretiza-tions that guarantee the existence of discrete vector potentials, such as Yee’s scheme and the edge elements. During the Jacobi–Davidson iteration, the correction process is applied to the vector po-tential instead. The correction equation is solved approximately as in the standard Jacobi–Davidson approach. The computational cost of the transformation from the vector potential to the corrector is negligible. As a consequence, the expanding subspace automatically stays out of the null space and no extra projection step is needed. Numerical evidence confirms that the proposed scheme indeed outperforms the standard and projection-based Jacobi–Davidson methods by a significant margin.

Key words. time harmonic Maxwell’s equations, Yee’s scheme, edge elements, generalized eigenvalue problem, discrete vector potential, discrete deRham complex, Poincar´e Lemma, Jacobi– Davidson method

AMS subject classifications. 15A18, 15A90, 65F15 DOI. 10.1137/140954714

1. Introduction. Photonic crystals are made of dielectric materials with

pe-riodic structure. The shape and permittivity of the dielectric material completely determines the band structure of the photonic crystal. Over the past few decades, photonic crystals with specific band structures have been of practical interest and have been extensively studied. The governing equation for three-dimensional pho-tonic crystals is the time harmonic Maxwell’s equations:

∇ × H = iωεE, (1.1) ∇ × E = −iωμH, (1.2) ∇ · (εE) = 0, (1.3) ∇ · (μH) = 0, (1.4)

where ω is the frequency and ε = εrε0, μ = μrμ0. The constants ε0 = 8.854 × 10−12 Farad/meter and μ0= 1.257×10−6Henry/meter represent vacuum permittivity Submitted to the journal’s Methods and Algorithms for Scientific Computing section January 28,

2014; accepted for publication (in revised form) August 25, 2014; published electronically January 8, 2015. This work is sponsored in part by the National Center of Theoretical Sciences of Taiwan and ST Yau Center at National Chiao Tung University.

http://www.siam.org/journals/sisc/37-1/95471.html

Department of Applied Mathematics, National University of Tainan, Tainan, 700, Taiwan

([email protected]). This author was supported by the Ministry of Science and Technology (101-2115-M-024-002).

Department of Mathematics, National Taiwan Normal University, Taipei, 116, Taiwan

([email protected]). This author was supported by the Ministry of Science and Technology (102-2115-M-003-011-MY3).

§Department of Applied Mathematics, National Chiao Tung University, HsinChu, 300, Taiwan

([email protected]). This author was supported by the Ministry of Science and Technology (103-2628-M-009-005).

Department of Mathematics, National Tsing Hua University and National Center for Theoretical

Sciences, HsinChu, 300, Taiwan ([email protected]). This author was supported by the Ministry of Science and Technology (101-2115-M-007-002-MY2).

A1

(2)

and vacuum permeability, respectively. The relative permittivity εr and relative permeability μrare dimensionless, material dependent parameters.

We can recast the Maxwell’s equations in terms of the electric fieldE alone:

∇ × μ−1

r ∇ × E = λ εrE, (1.5)

∇ · (εrE) = 0, (1.6)

where λ = ε0μ0ω2 is the eigenvalue. The degenerate elliptic operator∇ × μ−1r ∇× is self-adjoint and nonnegative. Since μr and εr are material dependent and therefore piecewise constant, (1.5) constitutes an elliptic interface problem [35, 16]. Equations (1.5) and (1.6) need to be supplied with appropriate boundary conditions which we will elaborate in the beginning of section 2.

Equation (1.6) serves as a constraint for the degenerate elliptic equation (1.5) and is redundant for the nonzero eigenvalues λ = 0 as a consequence of the basic identity from calculus

(1.7) ∇ · ∇× ≡ 0.

A traditional wisdom to reflect this fact is to adopt spatial discretizations that ad-mit discrete analogue of (1.7). This class of spatial discretizations includes the Yee’s scheme [37], the Whitney form [7, 36], the co-volume discretization [25], the mimetic discretization [20], and the edge element [23, 24, 28]. With this approach, the di-vergence free constraint (1.6) is ignored and the resulting discretized system is a generalized eigenvalue problem

(1.8) Ae = λBe,

whereA is the matrix representation of the discretized ∇ × μ−1r ∇× and B is the mass matrix.

In the unconstrained formulation (1.8), the matrix A is symmetric and non-negative semidefinite. The major difficulty with this approach, however, is the large null space associated with (1.5) in the absence of (1.6). This can be seen easily from the identity

(1.9) ∇ × ∇φ ≡ 0.

In other words, the null space of∇ × μ−1r ∇× contains all the gradient vectors. The discrete counterpart of (1.9) holds true for the generalized eigenvalue problem (1.8) with the class of spatial discretizations under consideration. As a result, a huge spurious null space arises from discarding the divergence free constraint (1.6). In Yee’s discretization, for example, the dimension of the null space is the same as the number of cells, constituting one third of the total degrees of freedom. This causes severe numerical difficulties in numerical computation since in practice, we are mainly interested in the lowest nonzero eigenvalues of (1.5), (1.6) which are now located deep in the interior of the spectrum.

In this paper, we propose a novel numerical scheme to handle the null space issue using a modified Jacobi–Davidson (JD) iteration. The JD method [1, 2, 3, 5, 12, 26, 31, 32] is a well-established, efficient eigensolver based on expanding subspace iteration. In contrast to classical eigensolvers such as the inverse power method [9, 15] and various Lanczos [2, 30] or Arnoldi methods [5], which require the shift-and-invert technique to compute interior eigenpairs, the linear system for the corrector in the JD iteration only needs to be solved approximately. This is the key to efficiency of

(3)

the JD method. On the other hand, since the corrector is only solved approximately, direct application of the JD method inevitably brings null vectors into the expanding subspace. As a result, the efficiency of the JD method is significantly deteriorated. The pollution of the spurious null space is common to other eigensolvers as well. The standard remedy is to project the approximate eigenvector back to the orthogonal complement of the null space. A well-known approach for the projection is through the Helmholtz decomposition [1, 3, 15]. See also [9] for the approach of combining the CG method with multigrid iteration.

In this paper, we take a different approach and propose a null space free Jacobi– Davidson (NFJD) iteration inspired by the Poincar´e lemma [10, 11, 29], whose discrete analogue remains valid for the class of spatial discretizations under consideration:

Poincar´e lemma. If v ∈ C1(Ω) and ∇ · v = 0 on a contractible domain Ω, thenv = ∇ × ˆv for some vector potential ˆv.

The Poincar´e lemma on general domains can be found in, for example, [33, Theorem 1.5]. See also [14, Theorem 2.1] for the discrete counterpart.

Motivated by the Poincar´e lemma, we will show in Theorem 2.2the existence of discrete vector potentials for the relevant vector fields. Instead of solving the cor-rector directly, the novelty of our approach is to derive and solve an equation for the vector potential of the corrector. The vector potential only needs to be solved approx-imately as in the original Jacobi–Davidson approach. By taking the discrete curl of the approximate vector potential, we annihilate completely the components in the null space and obtain a good approximation of the corrector which is null vector free. As a result, the expanding subspace automatically satisfies the divergence free constraint (1.6). The total cost to get an approximate corrector from an approximate vector potential is a single sparse matrix-vector multiplication. This is by far much cheaper than any projection. Numerical experiments have confirmed that our approach ef-fectively resolves the slow convergence issue caused by the spurious null space in the original JD iteration. In addition, our scheme also outperforms the projection-based method by a significant margin over a wide range of parameter regime and provides a competitive alternative to existing schemes.

The rest of the paper is organized as follows. In section 2, we present detailed mathematical formulations for the eigenvalue problem (1.8), including Theorem 2.2, which characterizes the subspace spanned by the eigenvectors perpendicular to the null space in terms of discrete vector potentials. This is the foundation of our new scheme. The vector potential approach admitsa magnetic field interpretation which leads to an isospectral reformulation of (1.8). We then further interplay between these equivalent formulations in section 3 and derive our NFJD method as an application of Theorem 2.2. We also present a dual version of NFJD with the roles of electric field and magnetic field interchanged during the iteration. In section 4, we conduct extensive numerical comparison among the original JD method, the Helmholtz pro-jected Jacobi–Davidson (HPJD), and NFJD to show the robustness and efficiency of NFJD. The finite element version of our scheme is described in the appendix.

2. Background and mathematical formulation. The photonic crystals

con-sist of dielectric materials fabricated in periodic structure. The relative permittivity

εr(x) and relative permeability μr(x), as material dependent parameters, are therefore periodic and piecewise constant. In other words,

(2.1) (εr(x), μr(x)) = 

(εr,1, μr,1) in material 1, (εr,2, μr,2) in material 2,

εr(x + a) = εr(x), μr(x + a) = μr(x),  = 1, 2, 3,

(4)

where the lattice translation vectors a,  = 1, 2, 3, span the primitive cell which extends periodically to form the photonic crystal.

From Bloch’s theorem [21], the eigenfunctions of (1.5) satisfy the k-periodic boundary condition:

(2.2) E(x + a) = e−1k·aE(x),  = 1, 2, 3, for some vector k in the first Brillouin zone.

To apply our scheme, we will only consider a class of spatially compatible dis-cretizations satisfying discrete analogue of (1.7). Such disdis-cretizations include Yee’s scheme, the co-volume discretization, and the edge elements. In this paper, we only report numerical results from Yee’s discretization due to its simplicity in implemen-tation. In addition, it is also easier to find an efficient preconditioner for the corre-sponding linear system.

For simplicity of presentation, we assume that the primitive cell is a unit cube spanned by the basis vectors

(2.3) a1= (1, 0, 0), a2= (0, 1, 0), a3= (0, 0, 1). The corresponding first Brillouin zone is given by

(2.4) {k = (k1, k2, k3)∈ R3| − π ≤ kj≤ π, j = 1, 2, 3}.

In Yee’s discretization for (1.1)–(1.4), the magnetic field and electric field are de-fined on different locations. The centers of cell edges, cell faces, and cell centers are abbreviated as

E

=

E

1

E

2

E

3, (2.5)

E

1= {(xi−1 2, yj, zk)},

E

2= {(xi, yj−12, zk)},

E

3= {(xi, yj, zk−12)}, (2.6)

F

=

F

1

F

2

F

3, (2.7) (2.8)

F

1= {(xi, yj−1 2, zk−12)},

F

2= {(xi−12, yj, zk−12)},

F

3= {(xi−21, yj−12, zk)}, (2.9)

V

= {(xi, yj, zk)}, where 1≤ i ≤ N1, 1≤ j ≤ N2, 1≤ k ≤ N3.

We denote byVE,VF, andVV the spaces of complex valued, k-periodic functions on

E

,

F

, and

V

, respectively: VE =  E1(xi−1 2, yj, zk), E2(xi, yj−12, zk), E3(xi, yj, zk−12) Em(x + a) = e −1k·aEm(x), m,  = 1, 2, 3 ∼=C3N1N2N3, (2.10) VF =  F1(xi, yj−1 2, zk−12), F2(xi−12, yj, zk−12), F3(xi−12, yj−12, zk) Fn(x + a) = e−1k·aFn(x), n,  = 1, 2, 3 ∼=C3N1N2N3, (2.11) (2.12) VV =Φ(xi, yj, zk)Φ(x + a) = e−1k·aΦ(x),  = 1, 2, 3 ∼=CN1N2N3.

(5)

The curl operator can be discretized naturally using standard centered differencing:

(2.13) h× : VE → VF.

For example, ifE ∈ VE, then the first component ofh× E is given by

(h× E)1(xi, yj−1 2, zk−12) = E3(xi, yj, zk−1 2)− E3(xi, yj−1, zk−12) h2 −E2(xi, yj−12, zk)− E2(xi, yj−12, zk−1) h3 , (2.14)

where h1= 1/N1, h2= 1/N2, h3= 1/N3 are mesh sizes and N1, N2, N3 are numbers of partitions in x, y, and z directions, respectively.

Similarly, one can define the discrete curl operator onVF,

(2.15) h×∗:VF → VE with (h×∗H)1(xi−1 2, yj, zk) = H3(xi−1 2, yj+12, zk)− H3(xi−12, yj−12, zk) h2 −H2(xi−12, yj, zk+12)− H2(xi−12, yj, zk−12) h3 (2.16)

and so on, as well as the discrete divergence operator onVE, (2.17) −∇∗h:VE → VV, (−∇∗hE)(xi, yj, zk) =E1(xi+ 1 2, yj, zk)− E1(xi−12, yj, zk) h1 +E2(xi, yj+12, zk)− E2(xi, yj−12, zk) h2 +E3(xi, yj, zk+12)− E3(xi, yj, zk−12) h3 . (2.18)

One can show that −∇∗h is indeed the adjoint of −∇h, where the discrete gradient

∇h is given by (2.19) h:VV → VE, (hΦ)1(xi−1 2, yj, zk) = Φ(xi, yj, zk)− Φ(xi−1, yj, zk) h1 , (2.20)

and similarly for (hΦ)2(xi, yj−1

2, zk) and (∇hΦ)3(xi, yj, zk−12).

Even though (2.18) is a natural finite difference interpretation of discrete diver-gence operator on VE, we have adopted −∇∗h instead of the usual notationh· so that the notation is consistent with that of the edge element discretization detailed in AppendixA. In the latter case, the divergence free constraint can only be realized through the adjoint of the gradient operator.

(6)

The crucial identity

(2.21) −∇∗h h×∗≡ 0,

a discrete analogue of the identity (1.7), follows from straightforward calculation. The resulting generalized eigenvalue problem can be written as

(2.22) h×∗μ−1r,hh× E = λεr,hE, E ∈ VE,

where εr,h : VE → VE and μ−1r,h : VF → VF represent (multiplication by) numerical approximations of εr and μ−1r , respectively. Both h×∗ and h× are discrete ap-proximations of the curl operator∇×; they act on different spaces and are adjoint to each other. See Lemma 2.1 below.

ForE, U ∈ VE andF , H ∈ VF, we denote by ·, · E and ·, · F the standard inner products onVE andVF, respectively:

E, U E = h1h2h3 N1  i=1 N2  j=1 N3  k=1  (E1U1)i−1 2,j,k+ (E2U2)i,j−21,k+ (E3U3)i,j,k−12  , F , H F = h1h2h3 N1  i=1 N2  j=1 N3  k=1  (F1H1)i,j−1 2,k−12 (2.23) + (F2H2)i−1 2,j,k−12 + (F3H3)i−12,j−12,k  . (2.24)

The following lemma explains the notation used for the two discrete curl operators in (2.16), (2.14) and is crucial to the development of our scheme.

Lemma 2.1. If U, V ∈ VE, then

(2.25) U, ∇h×∗μ−1r,hh× V E =h× U, μ−1r,hh× V F.

Proof. It suffices to show that

(2.26) U, ∇h×∗H E =h× U, H F for allU ∈ VE, H ∈ VF.

This follows from the summation by parts identity and the k-periodic boundary con-dition. To see this, we denote byW = ∇h×∗H ∈ VE. We have

U, ∇h×∗HE=U, W E =h1h2h3 N1  i=1 N2  j=1 N3  k=1  (U1W1)i−1 2,j,k+ (U2W2)i,j−21,k+ (U3W3)i,j,k−12  :=h1h2h3(I1+I2+I3), (2.27) where I1= N1  i=1 N2  j=1 N3  k=1 U1,i−1 2,j,k H3,i−1 2,j+12,k− H3,i−12,j−12,k h2 −H2,i−12,j,k+12− H2,i−12,j,k−12 h3 , (2.28)

(7)

and I2, I3 are defined similarly. The first term in I1 can be rearranged as N1  i=1 N3  k=1 N2  j=1 U1,i−1 2,j,k H3,i−1 2,j+12,k− H3,i−12,j−12,k h2 = N1  i=1 N3  k=1 U1,i−1 2,1,k h2 H3,i−12,12,k+ N2  j=2 U1,i−1 2,j,k− U1,i−12,j−1,k h2 H3,i−12,j−12,k −U1,i−12,N2,k h2 H3,i−12,N2+12,k = N1  i=1 N3  k=1 N2  j=1 U1,i−1 2,j,k− U1,i−12,j−1,k h2 H3,i−12,j−12,k, (2.29)

where in the last equality we have used (2.30) U1,i−1 2,N2,k= e −1k2U 1,i−1 2,0,k, H3,i−12,N2+12,k= e −1k2H 3,i−1 2,12,k, which follows from the k-periodic boundary conditions (2.2), (2.3) imposed onU and

H. The second term in I1 can be treated similarly. Overall, we have

I1= N1  i=1 N2  j=1 N3  k=1 −U1,i−12,j,k− U1,i−12,j−1,k h2 H3,i−12,j−12,k +U1,i−12,j,k− U1,i−12,j,k−1 h3 H2,i−12,j,k−12 , (2.31) and similarly I2= N1  i=1 N2  j=1 N3  k=1 −U2,i,j−12,k− U2,i,j−12,k−1 h3 H1,i,j−12,k−12 +U2,i,j− 1 2,k− U2,i−1,j−12,k h1 H3,i−12,j−12,k , (2.32) I3= N1  i=1 N2  j=1 N3  k=1 −U3,i,j,k−12 − U3,i−1,j,k−12 h1 H2,i−12,j,k−12 +U3,i,j,k−12 − U3,i,j−1,k−12 h2 H1,i,j−12,k−12 . (2.33)

Consequently, (2.26) follows from (2.27), (2.31), (2.32), and (2.33).

From Lemma 2.1, the operator h ×∗ μ−1r,hh× is self-adjoint on VE and semidefinite. In addition, the operator εr,h· is positive definite. It follows that the

(8)

eigenvalues in (2.22) are real and nonnegative. The eigenvectors constitute a basis in

VE and are orthogonal with respect to the inner product induced by εr,h:

E, U E,εr,h = h1h2h3 N1  i=1 N2  j=1 N3  k=1  (E1εr,hU1)i−1 2,j,k + (E2εr,hU2)i,j−1 2,k+ (E3εr,hU3)i,j,k−12  . (2.34)

We therefore have the following eigendecomposition: (2.35) VE = ker(h×) ⊕ ker(∇h×)⊥εr,h, where (2.36) ker(h×) = ker h×∗μ−1r,hh× ={V ∈ VE, ∇h× V = 0} and ker(h×)⊥εr,h =U ∈ VE| U, V ε r,h = 0, ∀ V ∈ ker(∇h×)  = Span{Vj ∈ VE| ∇h×∗μ−1r,hh× Vj = λjεr,hVj, λj > 0}. (2.37)

The following main theorem is the foundation of our null space free algorithm. Theorem 2.2. U ∈ ker(∇h×)⊥εr,h if and only if εr,hU = ∇h×∗μ−1

r,hU for some

discrete vector potential U ∈ VF.

Proof. Denote by (λj, Vj) the eigenpairs of the generalized eigenvalue problem (2.22).

Suppose that U ∈ ker(∇h×)⊥εr,h; then there exist constants aj such that U =  λj>0ajVj. Thus εr,hU = εr,h  λj>0 ajVj=  λj>0 aj∇h×∗μ−1r,h∇h×λ−1j Vj =h×∗  μ−1r,h  λj>0 aj∇h× λ−1j Vj  =h×∗μ−1r,hU, (2.38) where U =  λj>0 aj∇h× (λ−1j Vj)∈ VF.

Conversely, we suppose that εr,hU = ∇h×∗μ−1r,hU with U ∈ VF. Then for any

V ∈ ker(∇h×), we have

(2.39) U, V E,εr,h = εr,hU, V E =h×∗μ−1r,hU, V E = μ−1r,hU, ∇ h× V F = 0. ThusU ∈ ker(∇h×)⊥εr,h, completing the proof.

The matrix representation of the discrete curl operatorh× : VE → VF is given by

C = ⎡ ⎢ ⎢ ⎢ ⎢ ⎣ 0 −1 h3K 3 1 h2 diag (K2, . . . , K2) 1 h3K 3 0 −1 h1 diag (K1, . . . , K1) −1 h2 diag (K2, . . . , K2) 1 h1 diag (K1, . . . , K1) 0 ⎤ ⎥ ⎥ ⎥ ⎥ ⎦∈ C 3N×3N,

(9)

where K1= ⎡ ⎢ ⎢ ⎢ ⎣ 1 −e−√−1k1 −1 1 . .. . .. −1 1 ⎤ ⎥ ⎥ ⎥ ⎦∈ CN1×N1, (2.40a) K2= ⎡ ⎢ ⎢ ⎢ ⎣ IN1 −e− −1k2I N1 −IN1 IN1 . .. . .. −IN1 IN1 ⎤ ⎥ ⎥ ⎥ ⎦∈ C(N1N2)×(N1N2), (2.40b) K3= ⎡ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣ IN1×N2 −e− −1k3IN 1×N2 −IN1×N2 IN1×N2 . .. . .. −IN1×N2 IN1×N2 ⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ ∈ CN×N, (2.40c) and N = N1N2N3.

The matrix representation of (2.22) is thus

(2.41) Ae = λBe,

whereA = C∗ BC. The diagonal matrices B and B represent multiplication by εr,h(x) for x

E

and multiplication by μ−1r,h(x) for x

F

, respectively.

Before we proceed, we briefly summarize our notation for the reader’s convenience. (a) Hatted uppercase boldface characters denote discrete vector potentials of el-ements in VE. The latter are denoted by (nonhatted) uppercase boldface characters. For example, V ∈ VF is a vector potential of V ∈ VE with

∇h×∗μ−1r,hV = ε r,hV .

(b) Lowercase boldface characters denote the column vector representation of corresponding elements inVE orVF. For example, bothT ∈ VE and T ∈ VF have three components, each of them being an N1×N2×N3array, whilet and t are 3N1N2N3× 1 column vector representation of T and T , respectively. (c) Uppercase blackboard bold characters such as A, B, I denote matrices of

varying dimensions.

3. JD method, Helmholtz projection, and vector potential. The JD

method [32] is a subspace iteration algorithm for large sparse eigenvalue problems and has been proved successful in many practical applications such as various quan-tum dot models [17, 18, 19, 34].

The major advantage of the JD method for general eigenvalue problem is that the correction equation only needs to be solved approximately. However, this advantage becomes a drawback for (2.22) as the correcting procedure inevitably brings null space components into the expanding subspace, causing slow convergence.

A simple remedy is to project out the null space components by means of the Helmholtz decomposition. The Helmholtz projection has been applied in conjunction with both the inverse power iteration [15] and the JD iteration [3] and obtaineddecent

convergence rate. However, there is an essential difference in the two projection-based

(10)

approaches. While inverse power iteration requires accurate Helmholtz projection in order to maintain accuracy of the numerical eigenpairs, the JD method seems to be quite sensitive to inexact projections and therefore much more demanding on the accuracy of the Helmholtz projection step. We found that for the Helmholtz projected JD method to work properly, the projection step needs to be solved much more accurately than the generalized eigenvalue problem(2.22)itself. More detailed, quantitative demonstration of this assertion can be found in section 4.

In contrast, our remedy to the spurious null space components is to initialize and expand the approximating subspace in terms of the discrete vector potential given in Theorem 2.2. Instead of solving the correction equation directly, we lift the correc-tion equacorrec-tion and the approximate solver to the vector potential level. The vector potential version of the correction equation is similar to the original one. As in the original JD method, the vector potential only needs to be solved approximately. An approximate corrector is then obtained by taking the discrete curl of the approximate vector potential. The null space components are then annihilated completely to ma-chine accuracy at the expense of a single sparse matrix multiplication. As a result, in our vector potential based algorithm, the approximating subspace remains null space free throughout the iteration.

In this section, we first review the original JD method and HPJD method and then introduce our NFJD method. A detailed numerical comparison of these methods will be given in section 4. Our numerical experiment confirms that the vector potential approach indeed yields superior performance against original and projection-based JD methods.

3.1. JD method. The JD method for the generalized eigenvalue problem (2.22)

consists of the following steps:

1. To compute the ith eigenpair (λi, Ei), one initializes a subspace V1 := Span{ E1, . . . , Ei−1, V1} ⊂ VE, where E1, . . . , Ei−1, i ≥ 2, are previously computed eigenvectors corresponding to eigenvalues λ1, . . . , λi−1.

2. For k = 1, 2, 3, . . ., do

(i) Find θ ∈ R \ { λ1, . . . , λi−1} nearest to the target and U ∈ Vk, with

Uεr,h = 1 such that

(3.1) (h×∗μ−1r,hh× − θεr,h)U , V 

E = 0 for all V ∈ Vk. Denote by (θk, Uk) the solution to (3.1).

(ii) Set

(3.2) Rk := (h×∗μr,h−1∇h× − θkεr,h)Uk.

(iii) If RkE is less than a prescribed tolerance, then θk, Uk is accepted as an eigenpair. Output (λi, Ei) = (θk, Uk). Stop.

Else, solve approximately forT from (3.3)

I − εr,hUk⊗ Uk ∇h×∗μ−1r,h∇h× −θkεr,h I − Uk⊗ (εr,hUk) T = −Rk,

T ⊥εr,h Uk,

where (I − U ⊗ V )W := W − V , W EU.

(11)

OrthonormalizeT against Vkwith respect to the inner product ·, · E,εr,h to getVk+1.

ExpandVk+1:= Span{ Vk, Vk+1}.

The implementation detail is summarized as Algorithm 1 in matrix-vector notation. Algorithm 1.JD andHPJDmethods. Additional projection steps in HPJD are marked by double parentheses.

SetE0= [ ], Λ0=∅.

for i = 1, 2, 3, . . . , imax do

Initialize a vectorv1 withv1B= 1, PBv1=v1 andEi−1Bv1= 0. SetV1= [Ei−1, v1].

ComputeW1=V1AV1.

for k = 1, 2, 3, . . . do

(i) Compute all the eigenpairs of (Wk− θ I)s = 0.

Select the desired eigenpair (θk, sk) with θk ∈ Λ/ i−1 nearest to the target andsk2= 1.

(ii) Computeuk=Vksk, rk= (A − θkB)uk. (iii) if rk2< τJD then

Output λi = θk,ei=uk.

UpdateEi= [Ei−1, ei], Λi= Λi−1∪ {λi}. Exit k.

else

Solve (approximately)

(I − Bukuk)(A − θkB)(I − ukukB)t = −rk, t ⊥Buk.



Apply Helmholtz projection: t ←− PB  t PBt  . B-orthonormalize t against Vk: vk+1= t − k =1(v∗Bt)v t −k=1(vBt)vB. ExpandVk+1= [Vk, vk+1],Wk+1=  Wk V kAvk+1 v k+1AVk v∗k+1Avk+1  . end if end for k end for i

The main computational cost in the original JD method is in step (iii), where the correction equation

(3.4) (I − Bukuk)(A − θkB)(I − ukukB)t = −rk, t ⊥Buk,

is solved by standard iterative methods such as GMRES with a preconditioner [32], (3.5) Mp := (I − Bukuk)M(I − ukukB),

where M is a preconditioner for (A − θkB). In jth iteration of the linear solver, one solves forz(j) from

(3.6) Mpz(j)=y(j), z(j)⊥Buk. The solution to (3.6) is given by

(3.7) z(j)=M−1y(j)− ζ(j)M−1Buk, where ζ(j)= u

kBM−1y(j)

u

kBM−1Buk.

(12)

Remark 3.1. The correction equation (3.3) is equivalent to

∇h×∗μ−1r,h∇h× − θkεr,h T = −Rk+ ηεr,hUk

= h×∗μ−1r,hh× − θkεr,h Uk+ ηεr,hUk, (3.8)

where η = εr,hUk, ∇h ×∗μ−1r,h∇h × − θkεr,h −1εr,hUk−1E . If (3.3) were solved accurately, then one would haveT = θ1

−1

r,h∇h×∗ μ−1r,h∇h× (T + Uk) −θk+ηθk Uk ker(h×)⊥εr,h. In other words, the corrector T and hence the approximating subspace

V

k+1would remain null vectors free. However, the same should not be expected when (3.3) is only solved approximately as in the original JD method (3.5)–(3.7). Under such circumstances, null vectors are inevitably introduced into

V

k+1 to deteriorate the overall performance.

3.2. HPJD method. To overcome the slowing down caused by the null space, a

standard approach is to remove the null space components by means of Helmholtz de-composition. An approximate solutiont of (3.4) is postprocessed with the Helmholtz projection before it is appended to the expanding subspace:

(3.9) t ←− PB  t tB  = I − G(G∗BG)−1GB  t tB  .

HereG is the matrix representation of the discrete gradient ∇h:VV → VE, and−G∗ is precisely the matrix representation of the discrete divergence operator (2.18).

The Helmholtz projection (3.9) requires solving an elliptic equation

(3.10) −G∗BG φ = −G∗B



t

tB 

for each vector appended to the expanding subspace. The combination of the JD method and Helmholtz projection has been proposed in the literature [15]. In ad-dition to a linear solver and preconad-ditioner for the correction equation (3.4), the Helmholtz projection (3.9) also requires an efficient Poisson solver and preconditioner for (3.10). The overall performance of HPJD depends on the solver/preconditioner selected for the correction equation and the Poisson equation. Although solving the Poisson equation (3.10) is straightforward and much easier compared to solving the correction equation (3.4), the load balance between the Poisson equation (3.10) and the correction equation (3.4) is somewhat delicate. We will elaborate this issue in section 4. Both standard JD and HPJD methods are summarized in Algorithm 1.

3.3. Discrete vector potential and the NFJD method. Instead of the

Helmholtz projection, we propose an alternative approach by vector potential for-mulation in order to filter out the null space components. The novelty of our scheme is to derive and solve a new correction equation satisfied by the vector potential. The vector potential T only needs to be solved approximately as in the original JD method. An approximate corrector for (3.3) is then obtained by takingT := ε−1r,hh×∗μ−1r,hT .

3.3.1. Vector potential approach for the correction equation (step (iii)).

We first explain how to derive a new correction equation for the vector potential. This procedure can be illustrated, for example, by the evaluation of M−1Buk in (3.7), or equivalently by getting an approximate solution of

(3.11) (h×∗μ−1r,hh× −θkεr,h)Q = εr,hUk.

(13)

By construction,Uk∈ ker(∇h×)⊥εr,h. Therefore from Theorem 2.2, we have (3.12) εr,hUk=h×∗μ−1r,hU k.

We seek a solution to (3.11) of the form

(3.13) εr,hQ = ∇h×∗μ−1r,hQ. Substituting (3.12), (3.13) into (3.11), we have

(3.14) h×∗ μ−1r,hh× ε−1r,hh×∗μ−1r,h− θkμ−1r,h Q=h×∗μ−1r,hU k. Instead of solving (3.11) directly, we propose to solve for the vector potential (3.15) μ−1r,hh× ε−1r,hh×∗μ−1r,h− θkμ−1r,h Q= μ−1r,hU k,

or equivalently

(3.16) h× ε−1r,hh×∗−θkμr,h −1r,hQ) = Uk,

then take Q = ε−1r,hh×∗μ−1r,hQ as an approximate solution of (3.11) which lies in ker(h×)⊥εr,h automatically.

The same idea can be used to derive an equation for the vector potential of the corrector. We now elaborate the procedure in matrix notation. From Theorem 2.2, we have

(3.17) uk=B−1C B uk, t = B−1C B t. We now substitute (3.17) into the correction equation

(3.18) (I − Bukuk)(A − θkB)(I − ukukB)t = −rk, t ⊥Buk. The left-hand side of (3.18) becomes

(I − Bukuk)(A − θkB)(I − ukukB)t

= (I − C∗ B uku k BCB−1)(C∗ BC − θkB)B−1C B(I − uku k BCB−1C B) t = (I − C∗ B uku k BCB−1)C( BCB−1C∗ B − θk B)(I − uku k A) t

=C(I − B uku k A B−1)( A − θk B)(I − uk uk A) t,

where A = BCB−1C B. The right-hand side of (3.18) reduces to

(3.19) rk = (A − θkB)uk = (C BCB−1C∗ B − θkC B) uk =C( A − θk B) uk :=C rk, which implies that

(3.20) C (I − B uku k A B−1)( A − θk B)(I − uku k A) t+ rk = 0, where

(3.21) rk= ( A − θk B) uk. It suffices to solve

(I − B uku k A B−1)( A − θk B)(I − uku k A) t= − rk (3.22)

(14)

under the original constraintt ⊥Buk. From (3.17), it follows that (3.23) tBuk = t BCB−1BB−1C B uk= t A uk.

Thus the condition t ⊥Buk now translates to t ⊥Au k, and the correction equation for the vector potential is given by

(I − B uku k A B−1)( A − θk B)(I − uku k A) t= − rk, t⊥Au k. (3.24)

The solution procedure for (3.24) remains the same. One solves it iteratively with a preconditioner similar to the one given in (3.5):

(3.25) M p:= (I − B uku k A B−1) M(I − uku k A), where M is a preconditioner for ( A − θk B).

In jth GMRES iteration, one solves for

Mp z(j)=y (j), u ∗k A z(j)= 0. (3.26)

A solution to (3.26) is therefore given by

(3.27) z(j)= M−1y (j)− ˆζ(j)M −1 B uk, where ζˆ(j)= u

k A M−1y (j)

uk A M−1 B uk.

Note that if t is an approximate solution of (3.24), then the transformation (3.17) gives rise to an approximate solution of (3.18) which lies in ker(h×)⊥εr,h automatically. The only difference between NFJD and HPJD is how an approximate solution to (3.18) (that lies in ker(h×)⊥εr,h) is obtained. A more detailed comparison between NFJD and HPJD can be found in section 4.

For the purpose of implementation, it is more convenient to work with the vector potential variables. We now express the rest of the steps in Algorithm 1 in terms of the vector potential. The B-orthonormalization for the corrector in step (iii) of Algorithm 1,

(3.28) vk+1= t −

k

=1(v∗Bt)v

t −k=1(vBt)vB, can now be recast to

(3.29) B−1C B vk+1= B

−1C B t− k

=1( v∗ A t) v

t −k=1(vBt)vB .

We can therefore express the orthogonalization procedure in terms of vector potentials and A as (3.30) vk+1= t−  k =1( v∗ A t) v t −k=1(vBt)vB = t− k =1( v∗ A t) v  t− k =1( v∗ A t) vA ,

where we have adopted the notation

(3.31) | z|A:= ( z A z)12 = ( z BCB−1C B z)12 =B−1C B zB.

(15)

3.3.2. Isospectral reformulation and the subspace eigenvalue problem (step (i)). It remains to formulate the subspace eigenvalue problem in terms of vector

potentials. This can be done by retaining step (i) in Algorithm 1 and substituting

v = B−1C B v to get (3.32)

(Wk− θkI)sk = 0,

(Wk)m=vAvm= v BCB−1C BCB−1C B vm= v A B−1 A vm; u k= Vksk. The subspace eigenvalue problem (3.32) amounts to the following:

(SEP-1) Find uk ∈ Range( Vk), θk > 0, nearest to the target such that

(B−1C B v)(A − θkB)(B−1C B uk) = 0 for all v ∈ Range( Vk).

This is essentially identical to standard approximation of the original pencil (A, B) on the subspaceVk, except the subspace now takes the particular formVk=B−1C B Vk and remains perpendicular to the null space.

There is an alternative approach to formulate the subspace eigenvalue problem in terms of the vector potential. Recall from (3.19) and (3.21) that (θk, B−1C B uk) is an approximate eigenpair of the pencil (A, B) with residual rk = (A − θkB)uk, provided that (θk, uk) is an approximate eigenpair of the pencil ( A, B) with residual rk = ( A − θk B) uk andrk=C rk.

In fact, it is not difficult to verify that the two pencils (A, B) = (C∗ BC, B) and ( A, B) = ( BCB−1C∗ B, B) have identical spectrum. This is not surprising since the eigenvalue problem A e = λ B e, or equivalently

(3.33) CB−1Ch = λ B−1h,

is nothing but the matrix representation of the Maxwell’s equations (1.1)–(1.4) written in terms of the magnetic fieldH:

(3.34) h× ε−1r,hh×∗H = λμr,hH.

In view of this, we now have another formulation for the subspace eigenvalue problem:

(SEP-2) Find u k ∈ Range( Vk), θk > 0 nearest to the target such that

v( A − θk B) uk = 0, for all v ∈ Range( Vk).

Or, in matrix notation (recall the normalization vl A vm= δlm in step (iii)), (3.35) (I − θk Zk)sk= 0, ( Zk)m= v B vm; u k = Vksk.

Even though the pencils (A, B) and ( A, B) are isospectral, their subspace ap-proximations (SEP-1) and (SEP-2) are generally different and correspond to the pencils ( Vk A B−1 A Vk, Vk A Vk) and ( Vk A Vk, Vk B Vk), respectively. The u k selected from (SEP-1) or (SEP-2) is implicitly processed to get an approximate eigenpair (θk, uk) = (θk, B−1C B uk) for the pencil (A, B). Similar to A, the matrix A also possesses a huge null space consisting of those v such that C B v = 0. When the

(16)

selectedu k lies very close to ker( A), the corresponding (θk, uk) is no longer a good approximate eigenpair for (A, B). In this situation, θk ≈ 0 and the dominant part of

uk (the ker( A) component) is completely wiped out upon multiplication by B−1C B. The remaining components in ker( A)B are therefore amplified to get an essentially random vector uk = B−1C B uk in ker(A)B. To prevent this from happening, we have set up a threshold in the selection of the nearest-to-target approximate eigen-value for (SEP-2). More precisely, we select the smallest θ such that θ ≥ θc > 0

in (SEP-2). The threshold value θc and other parameters are detailed in section 4. On the other hand, a threshold is not needed for (SEP-1) since it also corresponds to a subspace eigenvalue problem for the pencil ( A B−1 A, A). The above mentioned scenario (θk≈ 0, with uk very close to ker(A)) does not occur. We simply choose the smallest θ > 0 for (SEP-1).

In view of (3.32) and (3.35), it is obvious that the magnetic field approach (SEP-2) requires fewer arithmetic operations in forming the subspace matrix (Wk vs Zk). We have implemented both versions and found that (SEP-2) with the threshold indeed prevails under otherwise identical settings and they both outperform HPJD.

Note, however, that setting up a threshold does not help in the original JD method (that is, without Helmholtz projection). In general, JD with the same threshold per-forms worse than the original JD and frequently fails to converge.

We summarize the NFJD method using the magnetic field subspace eigenvalue problem (SEP-2) as Algorithm 2. A detailed comparison between HPJD and NFJD will be given in section 4.

3.3.3. Stopping tolerance, error bounds, and variant of NFJD. In NFJD,

even though the actual working variables are the vector potentials (or magnetic field vectors), the underlying eigenvalue problem remains the original one for the electric field, namely, Ae = λBe. An error bound for the computed eigenpair follows from the standard estimate [27]:

| sin ∠B(ei, eNFJDi )| ≤ 1 γi rNFJD B−1 eNFJD i B 1 γi |||B−1|||2rNFJD 2 eNFJD i B , |λi− λNFJDi | ≤min  rNFJD B−1 eNFJD i B , 1 γi rNFJD B−1 eNFJD i B 2 ≤min  |||B−1|||2rNFJD 2 eNFJD i B , 1 γi |||B−1|||2rNFJD 2 eNFJD i B 2 , (3.36) where (3.37) | sin ∠B(u, v)| =  1  uBv uBvB 2 , γi = min λj=λi|λj− λ NFJD i |.

Since rNFJD = C r, where rNFJD = (A − θB)u, r = ( A − θ B) u, the stopping criterion r2< τNFJD implies (3.38) rNFJD2=C∗ r ≤ |||C∗|||2 r2< |||C∗|||2τNFJD with (3.39) |||C∗|||2  |||C∗|||1|||C|||= 2 max  1 h1 + 1 h2, 1 h2 + 1 h3, 1 h3 + 1 h1  .

(17)

Algorithm 2.NFJD method. Primary working variables (hatted vectors) are inVF and output eigenvectors are inVE.

Set E0= [ ], Λ0=∅.

for i = 1, 2, 3, . . . , imax do

Initialize a vector potential v1 with| v1|A= 1 and Ei−1 A v1= 0. Set V1= [ Ei−1, v1].

Compute Z1= V1 B V1.

for k = 1, 2, 3, . . . do

(i) Compute all the eigenpairs of (I − θ Zk)s = 0.

Select the desired eigenpair (θk, sk) with θk ∈ Λ/ i−1 nearest to the target andsk2= 1.

(ii) Computeu k= Vksk, rk= ( A − θk B) uk. (iii) if ( rk2< τNFJD) then

Output λi = θk andei=B−1C B uk. Update Ei= [ Ei−1, uk], Λi= Λi−1∪ {λi}. Exit k.

else

Solve (approximately)

(I − B uku k A B−1)( A − θk B)(I − uku k A) t= − rk, t⊥Au k. A-orthonormalize t against Vk: vk+1= t−  k =1( v∗ A t) v  t− k =1( v∗ A t) vA . Expand Vk+1= [ Vk, vk+1], Zk+1=  Zk V∗k B vk+1 v∗k+1 B Vk v∗k+1 B vk+1  . end if end for k end for i

Upon convergence, an approximate eigenvectore is obtained by taking eNFJD=u = B−1C B u, which is B-normalized as in standard JD and HPJD computation: (3.40) eNFJDB=uB=B−1C B uB=| u|A= 1.

An error bound like (3.36) applies equally to JD and HPJD. In view of (3.38) and (3.40), it follows that for NFJD to give comparable accuracy with JD or HPJD computation, it suffices to take τJD=|||C∗|||2τNFJD. This is the basis of our numerical comparison. See section 4 for details.

With the magnetic field interpretation for NFJD, an alternative null space free ap-proach emerges naturally. First observe that a slight change in step (iii) of Algorithm 2, from “Output λi = θk andei=B−1C B uk” to “Output λi = θk andhi= B uk,” results in a (null space free) numerical scheme for the magnetic field eigenvalue prob-lem (3.33). Alternatively, one could start with the JD method for (3.33) instead and then apply a similar derivation as in NFJD. This approach is dual to NFJD with the roles of electric field vectors and magnetic field vectors interchanged during the iteration. In this new scheme, denoted as NFJD, both the primary working variables and output eigenvectors are the electric field vectors. The magnetic field vectors (i.e., the vector potentials) only appear as auxiliary variables in the derivation. We omit the details and summarize the result in Algorithm 3.

(18)

Algorithm 3. NFJD: dual version of JD method for Maxwell’s equations. Primary working variables and output eigenvectors are both inVE. SetE0= [ ], Λ0=∅.

for i = 1, 2, 3, . . . , imax do

Initialize a vectorv1 with|v1|A= 1 and Ei−1Av1= 0. SetV1= [Ei−1, v1].

ComputeZ1=V1BV1.

for k = 1, 2, 3, . . . do

(i) Compute all the eigenpairs of (I − θZk)s = 0.

Select the desired eigenpair (θk, sk) with θk ∈ Λ/ i−1 nearest to the target andsk2= 1.

(ii) Computeuk= Vksk

VkskB,rk = (A − θkB)uk. (iii) if rk2< τNFJD then

Output λi = θk,ei=uk.

UpdateEi= [Ei−1, ei], Λi= Λi−1∪ {λi}. Exit k. else Solve (approximately) (I − BukukAB−1)(A − θkB)(I − ukukA)t = −rk, t ⊥Auk. A-orthonormalize t against Vk: vk+1= t − k =1(v∗At)v |t −k=1(vAt)v|A. ExpandVk+1= [Vk, vk+1],Zk+1=  Zk V kBvk+1 v k+1BVk v∗k+1Bvk+1  . end if end for k end for i

The major difference between NFJD and NFJDlies in the routine matrix-vector multiplication. Namely, A v = BCB−1C B v in NFJD versus Av = C BCv in NFJD. For Yee’s discretization, both B and B are diagonal matrices (with μr≡ 1, B = I in most applications). The difference between NFJD and NFJDis insignificant. In the finite element case, the mass matricesB and B are sparse and banded. In addition, multiplication by A in NFJD requires solving a linear system Bv = c and is more expensive than NFJD. In Appendix A, we will give a brief derivation for the finite element version of NFJD.

4. Numerical tests. Our numerical tests are based on the benchmark example

shown in Figure 1, where the periodic dielectric structure within a primitive cubic cell is depicted. The structure consists of dielectric spheres with radius r connected by circular cylinders with radius s. Here r/a = 0.345, s/a = 0.11, and a is the edge length of the cube. Inside the structure is the dielectric material with permittivity contrast εr,i/εr,o= 13 and μr,i= μr,o= 1 (corresponding to B = I).

Figure 2 shows the plot of w = a√λ/(2π) versus sample points k in the first

Brillouin zone for the benchmark problem computed using NFJD with N1 = N2 =

N3 = 100. The smallest nonzero eigenvalues are calculated for 40 sample points k distributed along the segments connecting Γ = (0, 0, 0), X = (π, 0, 0), M = (π, π, 0),

R = (π, π, π), and back to Γ in the first Brillouin zone. A clear band gap lies between

the fifth and sixth smallest positive eigenvalues.

(19)

a

Fig. 1. The periodic dielectric structure within a primitive cell. Inside: dielectric material. Outside: air. Herer/a = 0.345, s/a = 0.11, and εr,i/εr,o= 13.

As a preliminary test, we summarize the result with various grid resolutions in Table 1. Here wlow denotes the maximum of the fifth eigenvalue, wup the minimum of the sixth eigenvalue, and

γgm:= wup− wlow (wup+ wlow)/2.

The result shows clear convergence and agrees well with those reported in the litera-ture [6, 9].

To further illustrate the accuracy and efficiency of NFJD, we have devised several numerical experiments for JD, HPJD, and NFJD in various settings. In the following tests, all examples are computed using 1003cells with the initial vector obtained from interpolating the ground state of 503 calculation. All computations are conducted under identical settings for JD, HPJD, and NFJD, except the tolerances are scaled according to (3.38) and (3.39),

(4.1) τNFJD:= τJD/400.

The fast Fourier transform (FFT) is used as a preconditioner for (3.10), (3.18), and (3.24). More precisely, we take

(4.2) M = (C∗C − θεrI), M = (ε −1r CC∗− θI),

where εr and ε−1r are spatial averages of εr and ε−1r , respectively. Similarly, we use GG as the preconditioner for the Poisson equation (3.10). The matrices M, M, and GG are all FFT-invertible. Since both the correction equations and their precondi-tioners are highly indefinite (with about one third of eigenvalues being negative), we

(20)

0 0.1 0.2 0.3 0.4 0.5 0.6 Γ Γ X M R w = a λ/ (2 π )

Fig. 2. Band structure computed with 100× 100 × 100 grid. Table 1

The computed gap-midgap ratio with various grid sizes. grids 50× 50 × 50 100× 100 × 100 200× 200 × 200

wlow 0.41785 0.41789 0.41782

wup 0.48023 0.48079 0.48096

γgm 0.1389 0.1400 0.1405

adopt GMRES as the linear solver for (3.18), (3.24), and PCG as the linear solver for (3.10), in conjunction with the preconditionersM, M, and GG. All numerical tests are performed on a PC equipped with an Intel Q9550 2.83-GHz processor and 16 GB main memory using Intel Fortran compiler version 11.1.

We start with investigating the effect of the null space on the original JD method.

Example 4.1. Standard JD and dragging effect of the null space. Figure 3 shows

a typical convergence history of the Ritz value and the residual for the first two eigenvalues in standard JD. The computed Ritz value θj’s are constantly dragged toward zero during the subspace iteration. This effect of dragging is also reflected in the convergence history of the residual. Without further treatment on the null space, a significant portion of the CPU time in standard JD is wasted in producing dragged Ritz values and the resulting scheme is much too slow for practical applications. In contrast, the Ritz value converges monotonically for NFJD, as shown in Figure 4.

A standard approach to accelerating convergence in JD is to apply Helmholtz projection as described in section 3.2. In addition to efficient solvers and precondi-tioners for the correction equation and the Helmholtz projection, the performance of HPJD also relies on proper load balance between them. Denote by τHP and τJD the stopping tolerance for (3.10) and (3.2), respectively. More precisely, an approximate eigenpair (θ, u) with uB= 1 is accepted as a solution to (1.8) provided

(4.3) (A − θB)u2< τJD,

(21)

10 20 30 40 0 1 2 3 4 (a) 10 20 30 40 10-10 10-5 100 105 (c) 10 20 30 40 0 1 2 3 4 5 (b) 10 20 30 40 10-10 10-5 100 105 (d) λ1= 3.4505 λ2= 4.3901 θk θk  rk 2  rk 2 k k k k

Fig. 3. (a)–(b) Convergence history of the Ritz valueθkforλ1 andλ2using standard JD with

k = (π, 0.6π, 0). (c)–(d) Convergence history of the residual.

while an approximate solution φ of (3.10) is accepted if

(4.4) − G∗BG φ + G∗B  t tB  2< τHP.

Roughly speaking, there is a critical τHPc (depending on τJD) such that HPJD does not converge if τHP> τHPc . Since the overall accuracy of HPJD is governed by τJD only, one should take τHP≈ τHPc in order to minimize CPU time spent on the Helmholtz projection and get optimal performance. However, it is difficult to predict a priori what τHPc is. This is a fairly good reason to advocate NFJD and NFJD.

We have conducted extensive numerical experiments and concluded that

(4.5) τHPc  τJD

in the sense that τHPc is generally two or three orders of magnitude smaller than τJD. We now proceed to illustrate (4.5) with the following examples.

(22)

10 20 30 101 102 (a) 10 20 30 10-10 100 (c) 2 4 6 8 10 4.35 4.4 4.45 (b) 2 4 6 8 10 10-10 10-5 (d) λ1= 3.4505 λ2= 4.3901 θk θk  rk 2  rk 2 k k k k

Fig. 4. (a)–(b) Convergence history of the Ritz value forλ1 and λ2 using NFJD with k = (π, 0.6π, 0). (c)–(d) Convergence history of the residual.

Example 4.2. HPJD with (τJD, τHP) = (10−6, 10−7). This is the case where τHP exceeds the critical value τHPc , resulting in an inexact Helmholtz projection. The remaining null space components accumulate gradually as the subspaceVk grows. As a result, HPJD fails to converge at higher eigenvalues. We have observed a similar dragging effect as shown in Figure 5,

Example 4.3. Detailed comparison between HPJD and NFJD. We now compare

NFJD against HPJD at its optimal setting. We set τJD= 10−6and τNFJD= 2.5×10−9 according to (4.1). The critical τHP roughly corresponds to τHPc = 10−9 ∼ 10−8, depending on other parameters (see below). As τHP decreases, the overall CPU time for HPJD increases constantly. The smallest positive eigenvalue in our simulation occurs at k = (0.1π, 0, 0) with λ1∼ 0.045. Accordingly, we have set our threshold for NFJD as θc= 0.01 (see section 3.3.2 for the reason for setting up the threshold). An approximate eigenvalue in step (i) of NFJD is considered admissible only if θ ≥ θc.

In Table 2, we summarize the result for the computation of λ1 through λ10 with different parameters. One of the varying parameters is the restart dimension, denoted by NS. To accelerate convergence, we recycle a number of Ritz vectors after λi−1 converges (i ≥ 2) and after a restart. These NR Ritz vectors are used to build up initial subspace for λiin the beginning and after a restart. This accelerated version of NFJD is summarized as Algorithm 4. The accelerated version of HPJD and NFJD can be obtained through similar modification.

(23)

0 20 40 60 80 100 120 140 3 4 5 6 7 8 theta 0 20 40 60 80 100 120 140 10−5 100 105

Convergence history of the Ritz value forλ5= 6.8940166

Convergence history of the residual

rk 2

k k

Fig. 5. Convergence history of Ritz value θk and residualrk forλ5 at k = (0.5π, 0, 0) when using HPJD withτJD= 10−6,τHP= 10−7.

Table 2

CPU time comparison for first 10 eigenvalues (averaged over k) with τNFJD = 2.5 × 10−9, τJD= 10−6, andτHP ≈ τHPc . NS: restart dimension. NR: number of recycled Ritz vectors from λi−1’s subspaceVk. (NR, NS) τHP HPJD NFJD Ratio (3, 15) 10−8 5473 sec 3789 sec 1.444 (5, 15) 10−8 4897 sec 3536 sec 1.385 (7, 15) 10−8 4537 sec 3442 sec 1.318 (9, 15) 10−8 4347 sec 3528 sec 1.232 (NR, NS) τHP HPJD NFJD Ratio (3, 25) 10−9 5097 sec 3695 sec 1.379 (5, 25) 10−9 4720 sec 3412 sec 1.383 (7, 25) 10−9 4493 sec 3308 sec 1.358 (9, 25) 10−9 4299 sec 3307 sec 1.300

The results in Example 4.3 show that NFJD outperforms HPJD by a significant margin in all cases. In addition, HPJD is considerably slower for small NR’s. On the other hand, the performance of NFJD is relatively insensitive to (NR, NS).

Next, in Table 3, we document in more detail the relevant components of HPJD and NFJD. One can see that, for sufficiently large NR, the CPU time spent on the JD part are comparable in HPJD and NFJD. The additional CPU time spent in Helmholtz projection constitutes a significant portion in HPJD, even though such an FFT-based Helmholtz projector is indeed extremely efficient. For smaller NR, NFJD is even more efficient on the JD part.

Finally, in Table 4, we record the total CPU time needed for HPJD as τHP de-creases below τHPc for various cases of (NR, NS). As expected, the CPU time spent in

數據

Figure 2 shows the plot of w = a √ λ/(2π) versus sample points k in the first
Fig. 1 . The periodic dielectric structure within a primitive cell. Inside: dielectric material
Fig. 2 . Band structure computed with 100 × 100 × 100 grid. Table 1
Fig. 3 . (a)–(b) Convergence history of the Ritz value θ k for λ 1 and λ 2 using standard JD with
+3

參考文獻

相關文件

2.1.1 The pre-primary educator must have specialised knowledge about the characteristics of child development before they can be responsive to the needs of children, set

Reading Task 6: Genre Structure and Language Features. • Now let’s look at how language features (e.g. sentence patterns) are connected to the structure

Understanding and inferring information, ideas, feelings and opinions in a range of texts with some degree of complexity, using and integrating a small range of reading

 Promote project learning, mathematical modeling, and problem-based learning to strengthen the ability to integrate and apply knowledge and skills, and make. calculated

volume suppressed mass: (TeV) 2 /M P ∼ 10 −4 eV → mm range can be experimentally tested for any number of extra dimensions - Light U(1) gauge bosons: no derivative couplings. =&gt;

The temperature angular power spectrum of the primary CMB from Planck, showing a precise measurement of seven acoustic peaks, that are well fit by a simple six-parameter

• Formation of massive primordial stars as origin of objects in the early universe. • Supernova explosions might be visible to the most

We investigate some properties related to the generalized Newton method for the Fischer-Burmeister (FB) function over second-order cones, which allows us to reformulate the