Eigendecomposition of The Discrete Double-Curl Operator with Application to Fast Eigensolver for Three Dimensional Photonic Crystals

(1)

OPERATOR WITH APPLICATION TO FAST EIGENSOLVER FOR

THREE DIMENSIONAL PHOTONIC CRYSTALS∗

TSUNG-MING HUANG†_{, HAN-EN HSIEH}‡_{, WEN-WEI LIN}§_, AND WEICHUNG WANG¶

Abstract. This article focuses on the discrete double-curl operator arising in the Maxwell equation that models three dimensional photonic crystals with face centered cubic lattice. The dis-crete double-curl operator is the degenerate coefficient matrix of the generalized eigenvalue problems (GEVP) due to the Maxwell equation. We derive an eigendecomposition of the degenerate coefficient matrix and explore an explicit form of orthogonal basis for the range and null spaces of this matrix. To solve the GEVP, we apply these theoretical results to project the GEVP to a standard eigenvalue problem (SEVP), which involves only the eigenspace associated with the nonzero eigenvalues of the GEVP and therefore the zero eigenvalues are excluded and will not degrade the computational ef-ficiency. This projected SEVP can be solved efficiently by the inverse Lanczos method. The linear systems within the inverse Lanczos method are well-conditioned and can be solved efficiently by the conjugate gradient method without using a preconditioner. We also demonstrate how two forms of matrix-vector multiplications, which are the most costly part of the inverse Lanczos method, can be computed by fast Fourier transformation due to the eigendecomposion to significantly reduce the computation cost. Integrating all of these findings and techniques, we obtain a fast eigenvalue solver. The solver has been implemented by MATLAB and successfully solves each of a set of 5.184 million dimension eigenvalue problems within 50 to 104 minutes on an ordinary computer.

Key words. The Maxwell equation, discrete double-curl operator, eigendecomposition, fast Fourier transform, photonic crystals, face centered cubic lattice.

AMS subject classifications.65F15, 65T50, 15A18, 15A23.

1. Introduction. The theory of electromagnetic effects with periodic structures suggests that the time-harmonic electric fields for linear, isotropic and constant di-electric material in three dimension (3D) can be modelled by the Maxwell equation

∇ × ∇ × E = λεE. (1.1)

Here, the variable E stands for the electric field, ε stands for the permittivity, and λ = µ0ω2 where µ0 and ω are the magnetic constant and the frequency of time,

respectively [9, Chap. 2]. Consider a primitive cell that is spanned by the lattice translation vectors a1, a2, and a3and assume the primitive cell extends a 3D periodic

structure. The Bloch theorem [10] suggests that, for the Bloch wave vector 2πk in the first Brillouin zone [9], the electric field E satisfies the quasi-periodic condition

E(x + aℓ) = eı2πk·aℓE(x), (1.2)

for ℓ = 1, 2, 3, and it is sufficient to solve the problem (1.1) in one primitive cell. Two examples of lattices are the simple cubic (SC) lattice whose lattice translation vectors

∗_{Version April 5, 2012}

†_{Department of Mathematics, National Taiwan Normal University, Taipei 116, Taiwan}

([email protected]).

‡_Department _of _Mathematics, _National _Taiwan _University, _Taipei _106, _Taiwan

§_{Department of Applied Mathematics, National Chiao Tung University, Hsinchu 300, Taiwan}

¶_Department _of _Mathematics, _National _Taiwan _University, _Taipei _106, _Taiwan

(2)

are the unit vectors in R3 _{and the face-centered cubic (FCC) lattice that} a1=√a 2[1, 0, 0] ⊤_{, a} 2= √a 2 " 1 2, √ 3 2 , 0 #⊤ , and a3= √a 2 " 1 2, 1 2√3, r 2 3 #⊤ , (1.3)

for the lattice constant a. Note that any pairwise angle formed by a1, a2, and a3 is

π/2 and π/3 in SC and FCC lattice, respectively.

Among various applications modelled by (1.1), we focus on 3D photonic crystals with FCC lattice due to their importance and challenges. It has been shown that the photonic crystals with FCC lattice have a larger photonic band-gap, compared with SC lattice, [3] and larger band-gaps are favored in many innovative practical applications [1, 4, 11, 12]. Despite their broad applications, numerical simulations based on the numerical solutions to Eq. (1.1) with FCC lattice in 3D remain a chal-lenge. To predict the shape of the photonic crystals achieving maximal band-gap, one needs to solve a sequence of eigenvalue problems associated with different geometric shape parameters and Bloch wave vectors. This is a very time consuming process as many large-scale eigenvalue problems need to be solved. It is thus of great interest to develop a fast eigenvalue solver for the target eigenvalue problems, so that we can significantly shorten the computational time and thereby make the already widely used numerical simulations an even powerful tool.

Discretizing Eq. (1.1) on a primitive cell with FCC lattice vectors (1.3) by Yee’s scheme [14] leads to a generalized eigenvalue problem (GEVP)

Ax = λBx. (1.4)

Here, A is Hermitian semi-positive definite and B is positive and diagonal. The matrix A is the discrete double-curl operator ∇ × ∇× and the diagonal elements in B are the material dependent dielectric constants. As the GEVP contains many zero eigenvalues [2, 3, 8] and only a few of the smallest positive eigenvalues are of interest, the target eigenvalues are in the interior of the spectrum. It is natural to solve the GEVP by, for example, shift-and-invert Lanczos method or Jacobi-Davidson method. In these two types of methods, we need to solve linear systems in the form of

(A − σB)y = d

for a certain vector d and value σ. Unlike the problem associated with the SC lattice where an efficient preconditioning scheme exists [6], the problem associated with FCC lattice introduces a much more complicated coefficient matrix A. While the problems associated with FCC lattice have been considered in [3], this article aims to further improve the timing performance by proposing novel algorithms to solve the target eigenvalue problems.

To tackle this challenging problem, we make the following contributions to de-rive an eigendecomposition of A and then to develop a fast eigenvalue solver for the GEVP. We analyze the discretized matrices of the partial derivative and double-curl operators to derive the eigendecompositions of these matrices explicitly. We also rep-resent the range and null spaces of A by a set of orthogonal basis. By exploiting the advantage of the aforementioned properties, we propose a novel and efficient inverse projective Lancsoz method that is restricted in the eigenspace associated with nonzero eigenvalues of (1.4) and thus the method is not affected by the zero eigenvalues. In ad-dition, the resulting linear systems are well-conditioned, so that the conjugate gradient

(3)

a1 a2 a3 a √ 2 q 3 4 a √₂ a √₂ q 2 3

Fig. 2.1. The primitive cell and lattice translation vectors a1, a2, and a3. Any pair of the

vectors make an angle of 60 degrees. The length, width and height of the primitive cell are a √ 2, a √ 2 q 3 4, and a √ 2 q 2 3, respectively.

method can solve the associated linear systems efficiently without preconditioner. In addition, we demonstrate how two types of particular matrix-vector multiplications, which are the most costly part of the computation, can be computed by fast Fourier Transform (FFT) and thus significantly reduce the computational time. Numerical results justify the efficiency of the proposed algorithms by showing the promising tim-ing results. In particular, a MATLAB implementation of this new algorithm can find the target eigenvalues of a sequence of 5.184 million dimension GEVP in the form of (1.4) within 50 to 104 minutes.

This paper is outlined as follows. In Section 2, we illustrate the degenerate coeffi-cient matrix A corresponding to the discrete double-curl operator with FCC lattices. In Section 3, we find an eigendecomposition of A and give explicit representations of orthogonal basis for range and null spaces of A. We develop the inverse projec-tive Lancsoz method and an efficient way to compute the associated matrix-vector multiplications in Sections 4 and 5, respectively. Numerical experiments to validate and measure the timing performance of the proposed schemes are demonstrated in Section 6. Finally, we conclude the paper in Section 7.

Throughout this paper, we let ⊤ and ∗ denote the transpose and the conjugate transpose of a matrix by the superscript, respectively. For the matrix operations, we let ⊗ and ⊕ denote the Kronecker product and direct sum of two matrices, respec-tively. The imaginary number√−1 is written as ı and the identity matrix of order n is written as In. The conjugate of a complex scalar z ∈ C and a complex vector z ∈ Cn

are represented by ¯z and ¯z_{, respectively. The vec(·) is the operator that vectorizes a} matrix by stacking the columns of the matrix.

2. Discrete Double-Curl Operator with FCC Lattice. We use Yee’s scheme [14] to discretize Eq. (1.1) in the primitive cell that is illustrated in Figure 2.1. As the details of discretization are complicated, we refer readers to [7] that describes the whole discretization process in details. Let n1, n2 and n3 be the multiplies of 6 and

denote numbers of grid points in x-, y- and z-axis, respectively, and n = n1n2n3.

The grid size in three axes are δx= √a₂_n1₁, δy = √a₂

q 3 4 1 n2, and δz = a √ 2 q 2 3 1 n3. The

(4)

double-curl operator ∇ × ∇× is of the form A = C∗C ∈ C3n×3n, (2.1) where C =   0 −C3 C2 C3 0 −C1 −C2 C1 0  ∈ C3n×3n, (2.2a) C1= In2n3⊗ K1∈ Cn×n, C2= In3⊗ K2∈ Cn×n, C3= K3∈ Cn×n, (2.2b) K1= 1 δx      −1 1 . .. ... −1 1 eı2πk·a1 ₋₁      ∈ Cn1×n1_, _(2.3a) K2= 1 δy      −In1 In1 . .. ... −In1 In1 eı2πk·a2_J 2 −In1      ∈ C(n1n2)×(n1n2)_, _(2.3b) K3= 1 δz         −In1n2 In1n2 . .. . .. −In1n2 In1n2 eı2πk·a3_J 3 −In1n2         ∈ Cn×n, (2.3c) J2= 0 e−ı2πk·a1_I n1/2 In1/2 0 ∈ Cn1×n1_{, and} _(2.4a) J3= ₀ _e−ı2πk·a2_I₁ 3n2⊗ In1 I2 3n2⊗ J2 0 ∈ C(n1n2)×(n1n2)_. _(2.4b)

These matrices are associated with particular operators. The matrices C1, C2,

and C3are the discretizations of the operators ∂x, ∂y, and ∂z, respectively. Let

G = [C1⊤, C2⊤, C3⊤]⊤. (2.5)

The matrices C∗_{C, I}

3⊗ (G∗G), and GG∗ are the discretizations of the operators

∇ × ∇×, ∇2_{, and ∇(∇·), respectively. The block cyclic matrices K}

1, K2, and K3 are

the finite difference discretizations associated with periodicity. The entries −I and I in the same row of Kℓ, ℓ = 1, 2, 3, correspond to the regular finite differences. The

entries eı2πk·a1 _{and −1 in the last row of K}

1are associated with the periodicity along

a1. Similarly, eı2πk·a2J2 and −In1 in K2are associated with the periodicity along a1

and a2. The matrices eı2πk·a3J3and −In1n2 in K3are associated with the periodicity

(5)

3. Eigendecomposition of the Discrete Operators. In the following two sub-sections, we derive eigendecompositions of the discrete partial derivative operators Cℓ’s and then the discrete double-curl operator A = C∗C in explicit forms.

3.1. Eigendecomposition of the partial derivative operators. To find an eigendecomposition of Cℓ defined in (2.2a), our approach is divided into the following

steps. First, we find the eigenpairs of K1, K2, and K3 defined in (2.3). By using

these eigenpairs, we show that the matrices C1, C2, and C3 can be diagonalized by a

common unitary matrix. Combining these results, we obtain the eigendecompositions of Cℓ.

Theorem 3.1 (Eigenpairs of K1). The eigenpairs of K1in(2.3a) are δx−1(eθi− 1), xi,

where θi= ı2π(i + k · a1) n1 , (3.1) xi= 1 eθi e2θi · · · e(n1−1)θi ⊤, (3.2) for i = 1, . . . , n1.

Proof. Verify δxK1xi= eθi− 1 xi directly, for i = 1, . . . , n1.

Theorem 3.2 (Eigenpairs of K2). The eigenpairs of K2 in (2.3b) are

δ−1y (eθi,j− 1), yi,j⊗ xi ,

where xi is given in (3.2) and

θi,j=ı2π(j − i 2+ k · ˆa2) n2 with ˆa2= a2− 1 2a1, (3.3)

yi,j = 1 eθi,j e2θi,j · · · e(n2−1)θi,j ⊤, (3.4)

for i = 1, . . . , n1, andj = 1, . . . , n2.

Proof. Suppose (λ, [y1, · · · , yn2]⊤ ⊗ xi) is an eigenpair of K2. From (2.3b) it

satisfies that y2− y1= λδyy1, (3.5a) .. . yn2− yn2−1= λδyyn2−1, (3.5b) y1eı2πk·a2J2xi− yn2xi= λδyyn2xi. (3.5c)

By the definition of J2 in (2.4a), Eq. (3.5c) implies that

y1eı2πk·a2e−ı2πk·a1e n1 2 θi_{− y} n2 = λδyyn2, (3.6a) y1eı2πk·a2e− n1 2 θi_{− y} n2 = λδyyn2. (3.6b)

Plugging θi in (3.1) into (3.6), we show that two equations of (3.6) are equivalent to

y1eı2π(k·ˆa2−

i

2)− y

n2 = λδyyn2. (3.7)

Combining the results in (3.5a), (3.5b), (3.7), and using Theorem 3.1, we get λ = δ−1

(6)

Theorem 3.3 (Eigenpairs of K3). The eigenpairs of K3 in (2.3c) are

(δ−1z (eθi,j,k− 1), zi,j,k⊗ yi,j⊗ xi),

where xi and yi,j are given in (3.2) and (3.4), respectively, and

θi,j,k= ı2π(k − 1 3(i + j) + k · ˆa3) n3 with aˆ3= a3− 1 3(a1+ a2) , (3.8) zi,j,k=

1 eθi,j,k _e2θi,j,k _{· · ·} _e(n3−1)θi,j,k ⊤_, _(3.9)

for i = 1, . . . , n1,j = 1, . . . , n2 andk = 1, . . . , n3.

Proof. Assume that (λ, [z1, · · · , zn3]⊤⊗ yi,j⊗ xi) is an eigenpair of K3. By the

definition of K3 in (2.3c), it satisfies that

z2− z1= λδzz1, (3.10a)

.. .

zn3− zn3−1= λδzzn3−1, (3.10b)

z1eı2πk·a3J3(yi,j⊗ xi) − zn3(yi,j⊗ xi) = λδzzn3(yi,j⊗ xi). (3.10c)

By the definitions of J3 in (2.4b) and yi,j in (3.4), Eq. (3.10c) implies that

z1eı2πk·a3e−ı2πk·a2e 2 3n2θi,j_{− z} n3 = λδzzn3, (3.11a) z1eı2πk·a3e− n2 3 θi,j_J 2xi− zn3xi= λδzzn3xi. (3.11b)

By the definitions of J2 in (2.4a) and xi in (3.2), Eq. (3.11b) implies that

z1eı2πk·a3e− n2 3θi,j_e−n12 θi_{− z} n3= λδzzn3, (3.12a) z1eı2πk·a3e− n2 3 θi,j_e−ı2πk·a1_en1₂ θi_{− z} n3= λδzzn3. (3.12b)

From the definitions of θi and θi,j in (3.1) and (3.3), respectively, the exponents in

(3.11a), (3.12a), and (3.12b) satisfy ı2πk · a3− ı2πk · a2+ 2 3n2θi,j = ı2πk · â3− ı2π (i + j) 3 + ı2πj, (3.13a) ı2πk · a3− 1 3n2θi,j− n1 2 θi = ı2πk · â3− ı2π (i + j) 3 , and (3.13b) ı2πk · a3− 1 3n2θi,j− ı2πk · a1+ n1 2 θi = ı2πk · â3− ı2π (i + j) 3 + ı2πi. (3.13c) Plugging (3.13a), (3.13b), and (3.13c) into (3.11a), (3.12a), and (3.12b), respectively, we see that Eq. (3.10c) can be reduced to

z1eı2π(k·ˆa3−

(i+j)

3 )− z

n3= λδzzn3. (3.14)

Combining the results in (3.10a), (3.10b) with (3.14), and using Theorem 3.1, we get λ = δ−1

z (eθi,j,k− 1), and zs+1= esθi,j,k, for s = 0, . . . , n3− 1.

It is worth noting that the sub-vectors yi,j and zi,j,k in the eigenvectors of K2

and K3 in Theorems 3.2 and 3.3 depend on the indices (i, j) and (i, j, k),

respec-tively. Such coupling relations are due to the periodic structure over the skew lattice translation vectors in (1.3). These coupling relations complicates the derivation of

(7)

the eigendecomposition. However, as Cℓ’s consists of Kℓ’s, we can suitably use the

eigenvectors of Kℓ’s to form the eigenvectors of Cℓ’s. This idea is developed as follows.

Now, we proceed to show that C1, C2, and C3 in (2.2a) can be diagonalized by

the following unitary matrix T =_√ 1 n1n2n3 T1 T2 · · · Tn1 ∈ Cn×n. (3.15) Here Ti=

Ti,1 Ti,2 · · · Ti,n2 ∈ Cn×(n

2n3)_and

Ti,j= zi,j,1⊗ yi,j⊗ xi zi,j,2⊗ yi,j⊗ xi · · · zi,j,n3⊗ yi,j⊗ xi ∈ Cn×n 3

for i = 1, . . . , n1, j = 1, . . . , n2, and k = 1, . . . , n3.

Theorem 3.4. The matrix T defined in (3.15) is unitary.

Proof. Let ϕs=ı2πsm for s = 1, . . . , m. By a simple calculation, we have

1 + e(ϕs2−ϕs1)_{+ e}2(ϕs2−ϕs1)_{+ · · · + e}(m−1)(ϕs2−ϕs1)_{= mδ}_s

1,s2, (3.16)

where δs1,s2 denotes the Kronecker delta. From the definitions of θi, θi,j, and θi,j,kin

(3.1), (3.3), and (3.8), respectively, it follows that θi2− θi1 = ı2π(i2+ k · a1) n1 − ı2π(i1+ k · a1) n1 = ı2π(i2− i1) n1 , (3.17a) θi1,j2− θi1,j1 = ı2π(j2−i₂1 + k · â2) n2 − ı2π(j1−i₂1 + k · â2) n2 = ı2π(j2− j1) n2 , (3.17b) θi1,j1,k2− θi1,j1,k1 = ı2π(k2−i1+j₃ 1 + k · â3) n3 − ı2π(k1−i1+j₃ 1 + k · â3) n3 = ı2π(k2− k1) n3 . (3.17c)

By the definitions of xi, yi,j and zi,j,kin (3.2), (3.4), and (3.9), respectively, and using

(3.16) and (3.17), we have x∗

i1xi2 = n1δi1,i2, y∗i1,j1yi1,j2 = n2δj1,j2, z∗i1,j1,k1zi1,j1,k2 =

n3δk1,k2. This implies that if (i1, j1, k1) 6= (i2, j2, k2), then

(zi1,j1,k1⊗ yi1,j1⊗ xi1) ∗_(z i2,j2,k2⊗ yi2,j2⊗ xi2) = z∗ i1,j1,k1zi2,j2,k2 y_i∗₁_,j₁yi2,j2 x∗_i₁xi2 = 0

and (zi,j,k⊗ yi,j⊗ xi)∗(zi,j,k⊗ yi,j⊗ xi) = n1n2n3. Therefore, T is unitary.

Define

Λn1= δ−1x diag eθ1− 1, eθ2− 1, · · · , eθn1− 1 ,

Λi,n2 = δ−1y diag eθi,1− 1, eθi,2− 1, · · · , eθi,n2 − 1 ,

Λi,j,n3 = δ−1z diag eθi,j,1− 1, eθi,j,2− 1, · · · , eθi,j,n3− 1 .

By the results of Theorems 3.1 to 3.4, we have C1Ti,j= (In2n3⊗ K1) Ti,j=

eθi_{− 1}

δx Ti,j,

C2Ti= (In3⊗ K2) Ti= Ti(Λi,n2⊗ In3) ,

(8)

and therefore the following theorem holds.

Theorem 3.5 (Eigendecompositions of Ci’s). The unitary matrix T defined in

(3.15) leads to the Eigendecompositions of C1,C2, andC3 in the forms

C1T = T Λx, C2T = T Λy, and C3T = T Λz, (3.18)

whereΛx= T (Λn1⊗ In2n3) , Λy= T (⊕

n1

i=1Λi,n2)⊗In3, andΛz= T ⊕

n1

i=1⊕ n2

j=1Λi,j,n3 .

3.2. Eigendecomposition of the double-curl operator. Now, we proceed to find the eigendecomposition of the discrete double-curl operator A = C∗_{C defined}

in (2.1). We first define several intermediate diagonal matrices and show that these matrices are positive definite and invertible in a particular space in Lemma 3.6. These matrices will be used later to describe the eigendecomposition of A. Then we demon-strate an explicit representation of the corresponding range and null spaces of A. The eigendecomposition of A is finally presented in Theorem 3.7 by applying Lemma 3.6 and the representation of the range and null spaces.

Based on the diagonal matrices Λx, Λy, and Λzdefined in (3.18), we define

Λq = Λ∗xΛx+ Λy∗Λy+ Λ∗zΛz and (3.19a)

Λp= ΛsΛ∗s= (Λx+ Λy+ Λz)(Λx+ Λy+ Λz)∗. (3.19b)

As shown in Section 3.1, these diagonal matrices actually depend on the Bloch wave vector 2πk. To determine the band-gap of a photonic crystals with FCC lattice, we need to solve a sequence of eigenvalue problems. These eigenvalue problems are associated with the Bloch wave vector whose k ∈ B, where

B = ( k= (k1, k2, k3)⊤ | 0 ≤ k1, k2, k3≤ 1, k 6= 0, and k 6= 1,√1 3, 1 √ 6 ⊤) .

The following lemma shows that the intermediate diagonal matrices Λq and 3Λq− Λp

are positive definite.

Lemma 3.6. For every k ∈ B, Λq is positive definite and 3Λq − Λp is positive

definite for δx6= δy or δx6= δz.

Proof_{. The ((i − 1)n}2n3+ (j − 1)n3+ k)-th diagonal element µi,j,k of Λq is equal

to µi,j,k= eıθi_{− 1} δx 2 + eıθi,j_{− 1} δy 2 + eıθi,j,k_{− 1} δz 2

for i = 1, . . . , n1, j = 1, . . . , n2 and k = 1, . . . , n3. This implies that µi,j,k = 0 if

and only if θi/(ı2π), θi,j/(ı2π), and θi,j,k/(ı2π) are integers. By a tedious calculation

(see Theorem A.1 in the appendix), one can show that these conditions hold only if k= [1, 1/√3, 1/√6]⊤_{. Therefore, Λ}

q is nonsingular for k ∈ B.

From (2.2a) and (2.5), it holds that

C∗C = I3⊗ (G∗G) − GG∗. (3.20)

Let T1= [T⊤ T⊤ T⊤]⊤. From (3.20) and the results of Theorem 3.5, we have

T1∗CC∗T1= 3T∗(C1C1∗+ C2C2∗+ C3C3∗) T

(9)

Now, we show that C∗_T

1 is of full column rank. Suppose that C∗T1v = 0, that is,

(C∗

3− C2∗)T v = (C1∗− C3∗)T v = (C2∗− C1∗)T v = 0. From Theorem 3.5, it follows that

Λ∗

z− Λ∗y v = (Λ∗x− Λ∗z) v = Λ∗y− Λ∗x v = 0. Suppose that the ((i − 1)n2n3+ (j −

1)n3+ k)-th element of v is nonzero. Then we have

sin θi δx =sin θi,j δy = sin θi,j,k δz and cos θi− 1 δx = cos θi,j− 1 δy =cos θi,j,k− 1 δz , (3.21)

which imply that sin θi δx 2 + cos θi− 1 δx 2 = sin θi,j δy 2 + cos θi,j− 1 δy 2 = sin θi,j,k δz 2 + cos θi,j,k− 1 δz 2 and then cos θi− 1 δx = δx δy cos θi,j− 1 δy = δx δz cos θi,j,k− 1 δz . (3.22)

From (3.21) and (3.22), we can see that if δx6= δy or δx6= δz, then cos θj= cos θi,j=

cos θi,j,k = 1. That is θi/(ı2π), θi,j/(ı2π) and θi,j,k/(ı2π) must be integers. This

contradicts that k ∈ B. Thus, C∗_T

1is of full column rank which implies that 3Λq−Λp

is positive definite.

The range and null spaces of A are derived as follows. First, we assert that Q0

forms an orthogonal basis for the null space of A, where

Q0=   T Λx T Λy T Λz  ∈ C3n×n. (3.23)

The orthogonality of Q0 holds as Lemma 3.6 suggests that Q∗0Q0 = Λq > 0. Using

the definition of A in (2.1), the eigendecompositions of Cℓ in Theorem 3.5, and the

fact that Cℓ are normal and commute each other (see Theorem A.2 in appendix), we

can show that Q0spans the null space of A as

AQ0= A   T Λx T Λy T Λz  = A   C1T C2T C3T  = 0. (3.24)

Next, we form the orthogonal basis for the range space of A. Considering the full column rank matrix T1 and taking the orthogonal projection of T1 with respect to

Q0, we have I − Q0Λ−1q Q∗0 T1=   T (Λq− ΛxΛ∗s) T (Λq− ΛyΛ∗s) T (Λq− ΛzΛ∗s)  Λ−1_q ,

(10)

where Λsis defined in (3.19b). That is, Q1belongs to the range space of A, where Q1= I − Q0Λ−1q Q∗0 T1Λq =   T (Λq− ΛxΛ∗s) T (Λq− ΛyΛ∗s) T (Λq− ΛzΛ∗_s)  . (3.25)

It is then naturally to form the rest part of the orthogonal basis for the range space of A as the curl of T1 by defining

Q2= C∗T1=   T Λ∗ z− Λ∗y T (Λ∗ x− Λ∗z) T Λ∗ y− Λ∗x  . (3.26)

In short, we have shown that Q0 and [Q1 Q2] are orthogonal bases for the null and

range space of A, respectively.

In the next theorem, we derive the eigendecompositions of A and GG∗_{. Note that}

A is defined in (2.1) and G is defined in (2.5). Theorem 3.7. Define Q =Q0 Q1 Q2 diag Λ−12 q , 3Λ2q− ΛqΛp− 1 2_{, (3Λ} q− Λp)− 1 2 . (3.27) Then Q is unitary. Furthermore,

Q∗AQ = diag (0, Λq, Λq) and Q∗GG∗Q = diag (Λq, 0, 0) . (3.28)

Proof. Since

Λ∗x Λ∗z− Λ∗y + Λ∗y(Λx∗− Λ∗z) + Λ∗z Λ∗y− Λ∗x = 0, (3.29a)

Λ∗x(Λq− ΛxΛ∗s) + Λ∗y(Λq− ΛyΛ∗s) + Λ∗z(Λq− ΛzΛ∗s) = Λ∗sΛq− ΛqΛ∗s= 0, (3.29b)

(Λz− Λy) (Λq− ΛxΛ∗s) + (Λx− Λz) (Λq− ΛyΛ∗s) + (Λy− Λx) (Λq− ΛzΛ∗s) = 0,

(3.29c) and T∗_{T = I}_n_{, it follows that the matrices Q}₀_{, Q}₁_{, and Q}₂ _{in (3.27) are mutually}

orthogonal. Furthermore, we can directly verify that

Q∗0Q0= Λq, Q∗1Q1= 3Λ2q− ΛqΛp, Q∗2Q2= 3Λq− Λp. (3.30)

By Lemma 3.6, it follows that Q0, Q1, and Q2 are of full column rank. Therefore, by

(3.29), Q in (3.27) is unitary.

Eq. (3.24) shows that Q0 forms an orthogonal basis for the null space of A.

Eqs. (3.29a) and (3.29b) lead to

G∗Q1= 0 and G∗Q2= 0. (3.31)

From Theorem 3.5, (3.31), and the factP3

ℓ=1Cℓ∗CℓT = T Λq, it follows that AQ1= (I3⊗ (G∗G) − GG∗) Q = (I3⊗ G∗G) Q1= " I3⊗ 3 X ℓ=1 Cℓ∗Cℓ !# Q1= Q1Λq.

(11)

Similarly,

AQ2= (I3⊗ (G∗G) − GG∗) Q2= (I3⊗ (G∗G)) Q2= Q2Λq.

Consequently, we have proved that Q∗_{AQ = diag (0, Λ}_q_{, Λ}_q_).

Finally, from (3.19) and Theorem 3.5, we have

GG∗Q0=   C1 C2 C3  C₁∗ C₂∗ C₃∗   T Λx T Λy T Λz  =   C1T C2T C3T  Λq = Q0Λq. (3.32)

Combining (3.32) with (3.31), we show that Q∗_GG∗_{Q = diag (Λ}_q_{, 0, 0).}

4. Inverse Projective Lanczos Method. The eigendecomposition of the dis-crete double-curl operator derived in Theorem 3.7 is actually a powerful tool to solve the GEVP (1.4). Via this eigendecomposition, we can form the eigendecomposition of A in terms of its range space. This particular decomposition allows us to project GEVP into a standard eigenvalue problem (SEVP) that is equipped with several attractive computational properties as shown below.

The eigendecomposition (3.28) suggests that Qrforms an orthogonal basis for the

range space of A, where Qr= h Q1 3Λ2q− ΛqΛp− 1 2 _Q 2(3Λq− Λp)− 1 2 i = (I3⊗ T ) Λ. (4.1)

This basis, together with Λr= diag (Λq, Λq) > 0, leads to the fact that A = QrΛrQ∗r.

In addition, B−1/2_Q rΛ

1 2

r forms a basis for the invariant subspace of B−1/2AB−1/2

corresponding to the nonzero eigenvalues of the GEVP (1.4). Letting x= B−1QrΛ

1 2

ry (4.2)

and substituting x into (1.4), we have AB−1_Q rΛ 1 2 ry = λQrΛ 1 2 ry . (4.3) Pre-multiplying (4.3) by Λ− 1 2

r Q∗r and using the facts that A = QrΛrQ∗r and Q∗rQr=

I2n, we can form the SEVP

Ary= λy, (4.4) where Ar= Λ 1 2 rQ∗rB−1QrΛ 1 2 r.

This SEVP has the following computational advantages. First, while both of the GEVP and SEVP have the same 2n positive eigenvalues, the dimension of the GEVP and SEVP are 3n × 3n and 2n × 2n, respectively. The SEVP is a much smaller eigenvalue problem. More importantly, as we are interested in several of the smallest positive eigenvalues among all of the 2n positive eigenvalues in SEVP, we can find these desired eigenvalues efficiently by standard inverse Lanczos method [5]. In contrast, the GEVP contains n zero eigenvalues and 2n positive eigenvalues. The desired eigenvalues are located in the interior of the spectrum and the property usually causes numerical inefficiency [6].

(12)

Second, to solve the SEVP by the inverse Lanczos method, we need to solve the linear system

Q∗rB−1Qru= c (4.5)

at each Lanczos step for a certain u and c. The conjugate gradient (CG) method [5] fits this symmetric positive definite system nicely. In addition, as shown in Theorem 4.1, we can bound the condition number κ(Q∗

rB−1Qr) associated with (4.5) and then

estimate the convergence performance of the CG method. In practice, the condition number is small as demonstrated in Section 6 and there is therefore no need to find preconditioner for (4.5) consequently.

Third, to solve (4.5) by the CG method, the most costly computation is the matrix-vector multiplication in terms of the coefficient matrix Q∗

rB−1Qr or

particu-larly the matrix-vector multiplications T∗_p _{and T q for certain vectors p and q due}

to the definition of Qr in (4.1). At first glance, the components in the three

co-ordinates are coupled together in the matrix T . Consequently, these matrix-vector multiplications are general dense operations with cost O(n3_{). However, as discussed}

in Section 5, these matrix-vector multiplications can be performed by a sequence of diagonal matrix-vector multiplications and one-dimensional FFT with cost O(k) and O(k log(k)), respectively, for k = n1, n2, or n3.

Now, we assert an upper bound of κ(Q∗

rB−1Qr) in Theorem 4.1 and summarize

the aforementioned ideas by proposing the inverse projective Lanczos method (IPL) to solve the GEVP (1.4) in Algorithm 1.

Theorem 4.1. Let Qr be defined in (4.1). Then

κ Q∗rB−1Qr ≤ κ(B−1). (4.6) Proof. Since Q∗ rQr= I2n, it implies that λmax Q∗rB−1Qr = max kzk2=1 z∗Q∗rB−1Qrz≤ max k˜zk2=1 ˜ z∗B−1˜z= λmax(B−1), (4.7) and λmin Q∗rB−1Qr = min kzk2=1 z∗Q∗rB−1Qrz≥ min k˜zk2=1 ˜ z∗B−1˜z= λmin(B−1). (4.8)

From (4.7) and (4.8), the result of (4.6) is proved.

Algorithm 1inverse projective Lanczos method for solving (1.4)

1: Compute Λx, Λy and Λzin Theorem 3.5; 2: Compute Λq, Λp and Λsin (3.19); 3: Compute Λ in (4.1);

4: Apply the inverse Lanczos method to solve the following SEVP diagΛ21 q, Λ 1 2 q Λ∗(I3⊗ T∗) B−1(I3⊗ T ) Λ diag Λ12 q, Λ 1 2 q y= λy; 5: Compute x = B−1_(I₃_{⊗ T ) Λ diag}_Λ12 q, Λ 1 2 q y.

(13)

5. Fast Matrix-Vector Multiplication forT∗_{p and}_{T q. The most expensive}

computational cost for solving (4.5) by CG method has been pinned down to the matrix-vector multiplications T∗_p _{and T q. To derive fast algorithms to compute}

these multiplications, our strategy is to rewrite each of the eigenvector entries in Kℓ’s

as a multiplication of diagonal matrix and a periodical matrix. Then we carefully explore the recursive and periodical matrix representations, so that we can rewrite the multiplication of T∗_p_{and T q as a sequence of operations involving diagonal and}

FFT matrices, which can significantly reduce the computational cost.

First, we rewrite θi, θi,j, θi,j,k and xi, yi,j, and zi,j,k in Theorems 3.1 to 3.3 as

follows. θi= ı2πi n1 +ı2πk · a1 n1 ≡ θ x,i+ εx, θi,j= ı2πj n2 +ı2π n2 k_{· ˆa}2− i 2 ≡ θy,j+ εy,i, θi,j,k= ı2πk n3 +ı2π n3 k_{· ˆa}3−1 3(i + j) ≡ θz,k+ εz,i+j. and xi= Ex1 eθx,i · · · e(n1−1)θx,i ⊤ ≡ Exux,i, (5.1a)

yi,j= Ey,i1 eθy,j · · · e(n2−1)θy,j

⊤

≡ Ey,iuy,j, (5.1b)

zi,j,k= Ez,i+j1 eθz,k · · · e(n3−1)θz,k⊤≡ Ez,i+juz,k, (5.1c)

where Ex = diag 1, eεx, · · · , e(n1−1)εx, E

y,i = diag 1, eεy,i, · · · , e(n2−1)εy,i and

Ez,i+j= diag 1, eεz,i+j, · · · , e(n3−1)εz,i+j. From (5.1) we denote

Ux=ux,1 ux,2 · · · ux,n1 , (5.2a)

Uy=uy,1 uy,2 · · · uy,n2 , (5.2b)

Uz=uz,1 uz,2 · · · uz,n3 . (5.2c)

5.1. Matrix-vector multiplication for T∗_p. _{For a given vector p ∈ C}n1n2n3_,

we denote p recursively by letting p =p⊤

1 · · · p⊤n3 ⊤ , pk=p⊤1,k · · · p⊤n2,k ⊤ ∈ Cn1n2_{, and p} j,k=p1,j,k · · · pn1,j,k ⊤ ∈ Cn1 _{for j = 1, . . . , n} 2 and k = 1, . . . , n3.

By the properties of tensor products, we have

(zi,j,k⊗ yi,j⊗ xi)∗p= (yi,j⊗ xi)∗p1 p2 · · · pn3 ¯zi,j,k

≡ (yi,j⊗ xi)∗P ¯zi,j,k (5.3)

and

(14)

for k = 1, . . . , n3. From (5.3), (5.1c), and (5.2c), it follows that

Ti,j∗ p≡zi,j,1⊗ yi,j⊗ xi zi,j,2⊗ yi,j⊗ xi · · · zi,j,n3⊗ yi,j⊗ xi

∗ p =      (yi,j⊗ xi)∗P zi,j,1 (yi,j⊗ xi)∗P zi,j,2 .. . (yi,j⊗ xi)∗P zi,j,n3      =      z∗ i,j,1P⊤(yi,j⊗ xi) z∗_i,j,2P⊤_(y i,j⊗ xi) .. . z∗ i,j,n3P ⊤_(y_i,j_{⊗ x}_i₎      =zi,j,1 zi,j,2 · · · zi,j,n3

∗ P⊤_(y

i,j⊗ xi)

=Uz∗Ez,i+j∗ P⊤(yi,j⊗ xi) ,

which implies that

Ti∗p=      T∗ i,1p T∗ i,2p .. . T∗ i,n2p      =      U∗ zEz,i+1∗ P⊤(yi,1⊗ xi) Uz∗Ez,i+2∗ P⊤(yi,2⊗ xi) .. . U∗ zEz,i+n∗ 2P ⊤_(y i,n2⊗ xi)      . (5.5)

From the definition of P in (5.3) and the result in (5.4), the vectors P⊤_(y

i,j⊗ xi) for

j = 1, . . . , n2in (5.5) can be calculated by

p⊤_k yi,1⊗ xi yi,2⊗ xi · · · yi,n2 ⊗ xi

⊤

=yi,1⊗ xi yi,2⊗ xi · · · yi,n2 ⊗ xi

∗ pk=    x∗_iPky¯i,1 .. . x∗_iPky¯i,n2   =    y∗_i,1P⊤ k x¯i .. . y_i,n∗ ₂P⊤ k x¯i    =yi,1 · · · yi,n2 ∗ Pk⊤x¯i (5.6) for k = 1, . . . , n3. Let ξ1,k ξ2,k · · · ξn1,k = Pk∗x1 x2 · · · xn1 = (Pk∗Ex) Ux (5.7) for k = 1, . . . , n3and

ηi,1 ηi,2 · · · ηi,n2 =

yi,1 · · · yi,n2 ∗ P⊤ 1 x¯i P2⊤x¯i · · · Pn⊤3x¯i ∗ =ξi,1 ξi,2 · · · ξi,n3

⊤ Ey,i

Uy (5.8)

for i = 1, . . . , n1. Then, by (5.6),

p⊤_k yi,1⊗ xi yi,2⊗ xi · · · yi,n2⊗ xi = ηi,1,k ηi,2,k · · · ηi,n2,k

for k = 1, . . . , n3, where ηi,j,k is the kth component of ηi,j. This implies that

P⊤(yi,j⊗ xi) =      p⊤₁ (yi,j⊗ xi) p⊤₂ (yi,j⊗ xi) .. . p⊤_n₃(yi,j⊗ xi)      =      ηi,j,1 ηi,j,2 .. . ηi,j,n3      = ηi,j. (5.9)

(15)

Substituting (5.9) into (5.5), we have T∗ ip=      U∗ zE∗z,i+1ηi,1 U∗ zEz,i+2∗ ηi,2 .. . U∗ zEz,i+n∗ 2ηi,n2      = vec U∗

zEz,i+1∗ ηi,1 · · · Ez,i+n∗ 2ηi,n2 ≡ vec (Zi) .

(5.10) By the definition of T in (3.15) and the result in (5.10), we obtain

T∗p= _√ 1 n1n2n3

vec Z1 · · · Zn1 . (5.11)

We summarize this new way to compute T∗_p_{in Algorithm 2.}

Algorithm 2FFT-based matrix-vector multiplication for T∗_p

Input: Any vector p =p⊤

1 · · · p⊤n3 ⊤ ∈ Cn _{with p} k =p⊤1,k · · · p⊤n2,k ⊤ and pj,k∈ Cn1 for j = 1, . . . , n2, k = 1, . . . , n3.

Output: _{The vector f ≡ T}∗p.

1: fork = 1, . . . , n3 do

2: Compute ξi,k withξ1,k ξ2,k · · · ξn1,k = (Pk∗Ex) Ux in (5.7). 3: end for

4: fori = 1, . . . , n1 do

5: Compute ηi,j withηi,1 · · · ηi,n2 =

ξi,1 · · · ξi,n3 ⊤ Ey,i Uy in (5.8). 6: Compute Zi with Zi= Uz∗Ez,i+1∗ ηi,1 · · · Ez,i+n∗ 2ηi,n2 in (5.10).

7: _{Set f ((i − 1)n}2n3+ 1 : in2n3) = √_n₁1_n₂_n₃vec(Zi). 8: end for

5.2. Matrix-vector multiplication for _{T q. For a given vector q ∈ C}n, we denote q recursively by letting q =q⊤

1 · · · q⊤n1

⊤

with qi=q⊤i,1 · · · q⊤i,n2

⊤ ∈ Cn2n3 _{and q}

i,j=qi,j,1 · · · qi,j,n3

⊤

∈ Cn3_{. Then, by the definition of T in (3.15),}

and the results in (5.1c) and (5.2c), we have

T q =√_n1 1n2n3 n1 X i=1 n2 X j=1 n3 X k=1

qi,j,k(zi,j,k⊗ yi,j⊗ xi)

=√_n1 1n2n3 n1 X i=1 n2 X j=1

zi,j,1 · · · zi,j,n3 qi,j ⊗ yi,j⊗ xi

=√_n1 1n2n3 n1 X i=1 n2 X j=1

(Ez,i+jUzqi,j) ⊗ yi,j⊗ xi

=_√ 1 n1n2n3 n1 X i=1 n2 X j=1

vec(yi,j⊗ xi) (Ez,i+jUzqi,j)⊤

=_√ 1 n1n2n3 n1 X i=1 vec   n2 X j=1

(yi,j⊗ xi) (Ez,i+jUzqi,j)⊤



(16)

Let

Qz,i≡Ez,i+1Uzqi,1 Ez,i+2Uzqi,2 · · · Ez,i+n2Uzqi,n2

⊤

∈ Cn2×n3 _(5.13)

for i = 1, . . . , n1. Then Eq. (5.12) can be rewritten as

T q =_√ 1 n1n2n3

n1

X

i=1

vec yi,1⊗ xi · · · yi,n2⊗ xi Qz,i

=_√ 1 n1n2n3 n1 X i=1

vec yi,1 yi,2 · · · yi,n2 Qz,i ⊗ xi

=_√ 1 n1n2n3 n1 X i=1

vec ((Ey,iUyQz,i) ⊗ xi) . (5.14)

Define

Gi≡gi,1 gi,2 · · · gi,n3 ≡ Ey,i(UyQz,i) ∈ C

n2×n3 _(5.15)

for i = 1, . . . , n1. Rewrite Eq. (5.14) as

T q =_√ 1 n1n2n3

n1

X

i=1

vec gi,1 gi,2 · · · gi,n3 ⊗ xi

=√_n1 1n2n3 n1 X i=1

vec gi,1⊗ xi gi,2⊗ xi · · · gi,n3⊗ xi

=√_n1 1n2n3 vec " vec n1 X i=1 xigi,1⊤ ! · · · vec n1 X i=1 xig⊤i,n3 !#! . Since n1 X i=1 xig⊤i,k=x1 x2 · · · xn1 g1,k g2,k · · · gn1,k ⊤ = ExUxg1,k g2,k · · · gn1,k ⊤ for k = 1, . . . , n3, it implies that

T q =√_n1 1n2n3× vec           vec      ExUx      g⊤_1,1 g⊤_2,1 .. . g⊤_n₁_,1           · · · vec      ExUx      g⊤ 1,n3 g_2,n⊤ ₃ .. . g⊤_n₁_,n₃                     . (5.16)

We summarize above processes for computing T q in Algorithm 3.

6. Numerical Results. We implement Algorithms 1, 2, and 3 by MATLAB to evaluate their timing performance. The matrices [ξi,k], [ηi,j], and Zi in Lines 2,

5 and 6 of Algorithm 2 are computed by fft, which is the built-in discrete Fourier transform function in MATLAB. The matrices Qz,i, [gi,k], and Q in Lines 2, 3, and

(17)

Algorithm 3FFT-based matrix-vector multiplication for T q Input: Any vector q = q⊤

1 · · · q⊤n1 ⊤ ∈ Cn _{with q} i =q⊤i,1 · · · q⊤i,n2 ⊤ and qi,j ∈ Cn3 for i = 1, . . . , n1, j = 1, . . . , n2.

Output: _{The vector g ≡ T q.}

1: fori = 1, . . . , n1 do

2: Compute Qz,iwith Q = Uzqi,1 qi,2 · · · qi,n2,

Qz,i=Ez,i+1Q(:, 1) Ez,i+2Q(:, 2) · · · Ez,i+n2Q(:, n2)

⊤

in (5.13).

3: Compute gi,k withgi,1 gi,2 · · · gi,n3 = Ey,i(UyQz,i) in (5.15).

4: end for 5: fork = 1, . . . , n3 do 6: Compute Q = ExUxg1,k g2,k · · · gn1,k ⊤ in (5.16). 7: _{Set g((k − 1)n}1n2+ 1 : kn1n2) = √_n₁1_n₂_n₃vec(Q). 8: end for 0 1 2 3 4 5 6 7 8 9 10 x 107 0 1 2 3 4 5 6 7 8 9 10

CPU times (sec.)

n Tq

Tp

Fig. 6.1. CPU time for computing T∗_pand Tq with various n.

6 of Algorithm 3 are computed by ifft, which is the built-in inverse discrete Fourier transform function in MATLAB. The functions eigs and pcg in MATLAB are used for the IPL and CG methods, respectively. All computations are carried out on a workstation with two Intel Quad-Core Xeon X5687 3.6 GHz CPUs, 48 GB main memory, the RedHat Linux operation system, and IEEE double-precision floating-point arithmetic operations.

Figure 6.1 shows the timing results for computing T∗_p_{and T q by Algorithm 2}

and 3, respectively. The matrix size of T ranges from 884, 736 to 94, 818, 816. In particular, the dimension of T is ¯n3

j, where ¯nj = 96 + 24j = n1 = n2 = n3 for

j = 0, 1, . . . , 15. The average CPU time out of ten trials for each j is then plotted in the figure. We can see Algorithms 2 and 3 are extraordinarily efficient. They take less than 10 seconds to finish a T∗_p_{or T q matrix-vector multiplication even for the}

matrix T whose dimension is as large as 95 million.

Being equipped with these fast T∗_p _{or T q computational kernels, we evaluate}

how the IPL method (Algorithm 1) performs, in terms of CPU time and iteration numbers, to solve the eigenvalue problems for the band-gap structure of the target photonic crystals. In the numerical experiments, we assume the periodic dielectric diamond structure with face centered cubic lattice consists of dielectric spheres and connecting spheroid [3]. The radius of the spheres is r = 0.12a and the connecting spheroid has minor axis length s = 0.11a with a = 1. Inside the structure is the

(18)

X U L G X W K 3500 4000 4500 5000 5500 6000

CPU times (sec.)

(a) CPU time

X U L G X W K 45 50 55 60 65 70 75 80 85 90 95 Iteration numbers

(b) IPL iteration numbers Fig. 6.2. CPU time and iteration numbers of IPL method with various wave vector 2πk.

dielectric material with permittivity contrast εi/εo = 13. We solve the eigenvalue

problems associated with the wave vector 2πk’s along the segments connecting X, U , L, G, X, W , and K in the first Brillouin zone. In each of the segments, fifteen uniform distributed sampling wave vectors are chosen. For each wave vector, we compute the five smallest positive eigenvalues of the corresponding GEVP. Note that the dimension of the GEVP is 3¯n3

1= 3 × 1203= 5, 184, 000 and the dimension of the SEVP we are

actually solving is reduced to 2¯n3

1= 2 × 1203= 3, 456, 000.

Computational results in CPU time of the IPL are shown in Figure 6.2(a). Out of all of the 90 test problems, the timing to solve each of the eigenvalue problems ranges from 50 to 104 minutes with average of 65 minutes. For the problems with dimension as large as 3.5 million, the timing results of the MATLAB codes are quite satisfactory.

In addition to the fast T∗_p_{and T q multiplications, another factor contributing}

to the outstanding timing performance is the small number of iterations in the IPL. The total iteration numbers that the IPL takes to solve an eigenvalue problem for the five target eigenvalues are shown in Figure 6.2(b). Among the 90 cases we tested, the IPL takes 47 to 91 iterations (57 on average) to solve each of the eigenvalues with relative residual less than 3.34 × 10−15_{. These small iteration numbers for such large}

problems are again remarkable.

Finally, to solve the linear systems within the IPL solver, the CG method takes around 40 iterations consistently for all of the test problems to fulfill the relative residual tolerance 6.40 × 10−15_{. This fast convergence behavior is due to the}

well-conditioned coefficient matrix defined in (4.5) and can be justified by the following theoretical analysis. Convergence of the CG method for solving Eq. (4.5) depends on the ratio γ = √ κ(Q∗ rB−1Qr)−1 √ κ(Q∗

rB−1Qr)+1 as shown in [13]. In the numerical experiments, we have

εi/εo = 13. Theorem 4.1 suggests that γ ≤ γB = √

13−1 √

13+1 ≈ 0.5657. In other words,

after 40 iterations, the residual is predicted to be less than (γB)40≈ 1.27 × 10−10.

7. Conclusions. Aiming to solve the Maxwell equation that models the 3D photonic crystals with FCC lattice, we have derived an explicit eigendecomposition of the discrete double-curl operator and the orthogonal bases spanning the associated range and null spaces. Based on these results, we propose the inverse projective Lanczos method with the FFT-based matrix-vector multiplications that can solve the

(19)

eigenvalue problems efficiently. This fast eigenvalue solver can significantly reduce the time needed to find the optimal shape of photonic crystals with larger band-gap.

Acknowledgements. This work is partially supported by the National Science Council, the Taida Institute of Mathematical Sciences, the Center for Advanced Study in Theoretical Sciences, and the National Center for Theoretical Sciences in Taiwan.

Appendix A.

Theorem A.1. Let θi, θi,j and θi,j,k be defined in (3.1), (3.3), and (3.8),

re-spectively, and nℓ (ℓ = 1, 2, 3) be multiplies of 6. Assume k = (k1, k2, k3) 6= 0 with

0 ≤ kℓ ≤ 1 (ℓ = 1, 2, 3). Then p1 = θi/(ı2π), p2 = θi,j/(ı2π), and p3= θi,j,k/(ı2π)

are integers if and only if k= (1, 1/√3, 1/√6)⊤_.

Proof. From the definitions of θi, θi,j, θi,j,k, and the lattice vectors a1, a2 and a3

in (1.3), it follows that p1, p2, and p3are integers if and only if k satisfies that k · a1=

n1p1−i, k·a2= n2p2−j +1₂n1p1, and k·a3= n3p3−k+1₂n1p1+1₃n2p2. This is

equiv-alent to k1 = n1p1− i, k2 = √2₃ n2p2+₂i − j and k3 = q 3 2 n3p3− k + 1 3(i + j).

By assumption 0 ≤ kℓ≤ 1 (ℓ = 1, 2, 3), it implies that

0 ≤ n1p1− i ≤ 1, (A.1) 0 ≤ n2p2+ i 2− j < 1, and (A.2) 0 ≤ n3p3− k + 1 3(i + j) < 1. (A.3)

Case 1. i = 0: Since n2/6 is an integer, from (A.1) to (A.3), we have p1= 0, j = n2p2,

and n3p3− k +1₃n2p2 = 0. It follows that k1 = k2 = k3 = 0, or k = 0, which is a

contradiction.

Case 2. i 6= 0 and even: Since n1/6 is an integer, (A.1) suggests that n1p1− i = 0, or

k1= 0. From (A.2), it follows that n2p2+₂i−j = 0, which implies that i = 2(j−n2p2),

or k2 = 0. Then Eq. (A.3) becomes 0 ≤ n3p3− k + j −2₃n2p2 < 1. Since 2₃n2 is

an integer, this implies that n3p3− k + j −2₃n2p2 = 0, or k3= 0, which contradicts

k_{6= 0.}

Case 3. i 6= 0 and odd: From (A.1), we have i = n1p1− 1 and therefore k1 = 1.

From (A.2), it follows that 0 < n1p1−1

2 − j < 1. The fact implies that j = n1p1−2

2

and p2 = 0, and therefore k2 = √1₃. Consequently, from (A.3), it holds that 0 ≤

n3p3− k +1₆(3n1p1− 4) < 1, which implies that p3= 0, k = 1₂n1p1− 1, and therefore

k3=√1₆, or k = (1, 1/

√

3, 1/√6)⊤_.

The following theorem asserts that C1, C2, and C3 are normal and commute each

other.

Theorem A.2. For C1,C2, andC3defined in (2.2a), it holds that Ci∗Cj= CjCi∗

andCiCj = CjCi fori, j = 1, 2, 3.

Proof. First, by definitions of Ki and Ji in (2.3), (2.4a), and (2.4b), it holds that

J∗

2J2 = In1, J3∗J3 = In1n2 as well as K1∗K1 = In1, K2∗K2 = In1n2, and K3∗K3 = In.

We then have C∗

jCj = CjCj∗ for j = 1, 2, 3. Furthermore, it is easy to check that

K1J2∗= J2∗K1, K1∗J2= J2K1∗, and K1J2= J2K1. Therefore, we have C1∗C2= C2C1∗,

C1C2∗= C2∗C1, and C1C2= C2C1. Second, we partition K2as K2=   K22 (eme⊤1) ⊗ In1 0 0 K22 (eme⊤1) ⊗ In1 eı2πk·a2_(e me⊤1) ⊗ J2 0 K22  ,

(20)

where m = n2/3, K22 =       −1 1 . .. ... . .. 1 −1       ⊗ In1 ∈ C n1n2 3 ×n1n23 , and e_j is the jth column of Im. Consequently, K2J3∗=   0 K22(Im⊗ J2∗) (eme⊤1) ⊗ J2∗ eı2πk·a2_(e me⊤1) ⊗ In1 0 K22(Im⊗ J2∗) eı2πk·a2_K 22 eı2πk·a2((eme⊤1) ⊗ In1) 0   and J3∗K2=   0 (Im⊗ J2∗)K22 (eme⊤1) ⊗ J2∗ eı2πk·a2_(e me⊤1) ⊗ In1 0 (Im⊗ J2∗)K22 eı2πk·a2_K 22 eı2πk·a2((eme⊤1) ⊗ In1) 0  .

Since K22(Im⊗ J2∗) = (Im⊗ J2∗)K22, it follows that K2J3∗= J3∗K2. Similarly, K2∗J3=

J3K2∗ and K2J3 = J3K2. We then have C2∗C3 = C3C2∗, C2C3∗ = C3∗C2 and C2C3 =

C3C2.

Finally, by the definitions of C1and C3in (2.2a), it is easy to check that C1∗C3=

C3C1∗, C1C3∗= C3∗C1, and C1C3= C3C1.

REFERENCES

[1] D. L. Bullock, C.-C. Shih, , and R. S. Margulies. Photonic band structure investigation of two-dimensional Bragg reflector mirrors for semiconductor laser mode control. J. Opt. Soc. Am. B, 10:399–403, 1993.

[2] A. Chatterjee, J.M. Jin, and J.L. Volakis. Computation of cavity resonances using edge-based finite elements. IEEE Trans. Microwave Theory Tech., 40:2106–2108, 1992.

[3] R.L. Chern, C.Chung Chang, Chien-C. Chang, and R.R. Hwang. Numerical study of three-dimensional photonic crystals with large band gaps. J. Phys. Soc. Jpn., 73:727–737, 2004. [4] J. S. Foresi, P. R. Villeneuve, J. Ferrera, E. R. Thoen, G. Steinmeyer, S. Fan, J. D. Joannopou-los, L. C. Kimerling, H. I. Smith, and E. P. Ippen. Photonic-bandgap microcavities in optical waveguides. Nature, 390:143–145, 1997.

[5] G.H. Golub and C.F. Van Loan. Matrix computations. Johns Hopkins Univ Pr, 1996. [6] T.-M. Huang, W.-J. Chang, Y.-L. Huang, W.-W. Lin, W.-C. Wang, and W. Wang.

Precondi-tioning bandgap eigenvalue problems in three dimensional photonic crystals simulations. J. Comput. Phys., 229:8684–8703, 2010.

[7] T.-M. Huang, H.-E. Hsieh, W.-W. Lin, and W. Wang. Matrix representation of the double-curl operator for simulating three dimensional photonic crystals. Technical report, National Center for Theoretical Sciences, 2012.

[8] T.-M. Huang, Y.-C. Kuo, and W. Wang. Computing spectrum boundary eigenvalues for three-dimensional photonic crystals. Preprint, 2012.

[9] J. D. Joannopoulos, S. G. Johnson, and R. D. Winn, J. N.and Meade. Photonic Crystals: Molding the Flow of Light. Princeton University Press, 2008.

[10] C. Kittel. Introduction to solid state physics. Wiley, New York, 2005.

[11] A. Mekis, J. C. Chen, I. Kurland, S. Fan, P. R. Villeneuve, and J. D. Joannopoulos. High trans-mission through sharp bends in photonic crystal waveguides. Phys. Rev. Lett., 77:3787– 3790, 1996.

[12] O. Painter, R. K. Lee, A. Scherer, A. Yariv, J. D. O’Brien, P. D. Dapkus, and I. Kim. Two-dimensional photonic band-gap defect mode laser. Science, 284:1819–1821, 1999. [13] Y. Saad. Iterative methods for sparse linear systems. PWS Publishing Company, 1996. [14] K. Yee. Numerical solution of initial boundary value problems involving Maxwell’s equations