National University of Kaohsiung Repository System:Item 310360000Q/10845

全文

(1)國立高雄大學應用數學研究所碩士論文. Preconditioning Effects in Numerical Simulations of Three-Dimensional Photonic Crystals 三維光子晶體數值模擬中的矩陣預處理效應. 研究生：張為仁撰指導教授 : 王偉仲博士. 中華民國九十九年元月.

(2)

(3) 致. 謝. 兩年半的碩士生涯告一段落了。這兩年半來因為受到許多人的幫忙與支持，才能使我順利完成碩士學業，在此一併致上由衷的感謝。首先感謝我的指導教授，台大數學系的王偉仲老師。除了在程式上及理論上的指導之外，也謝謝老師總是鼓勵我要不斷接收新的知識，鼓勵我接觸一些不同的領域，鼓勵我多參加研習並鼓勵我學習英文。在這兩年半內，謝謝你給了我不同的觀點及視野，讓我更有野心想看看更大的世界。接著感謝師大數學系的黃聰明教授在理論方面細心及耐心的教導，並在程式部分提供建議，讓我能完成所有程式的撰寫。這兩年半來因為老師們的指導，讓我看到了更專業，更深一層的領域。感謝碩士班陪著我的同學，學弟妹們。因為有你們的陪伴，讓我的研究所生涯不會枯燥乏味。除了謝謝你們願意陪著我聊天玩牌，也謝謝你們願意分享你們各個領域的觀念和想法，讓我也能收穫不少。最後感謝家人的支持，每每看到電視上有些人沒錢讀書，就會想到自己是何其幸運能夠完成碩士學業。感謝家人不論是在經濟上或是其他各個方面的支持，讓我能夠順利拿到碩士學位。再次感謝所有幫助我的人，沒有你們就不會有現在的我。謝謝。張為仁. 20100114 中壢.

(4) Preconditioning Effects in Numerical Simulations of Three-Dimensional Photonic Crystals. by Wei-Jen Chang Advisor Wei-Chung Wang. Department of Applied Mathematics, National University of Kaohsiung Kaohsiung, Taiwan 811 R.O.C. January 2010.

(5) Contents 1. Introduction…………………………………………………………………………………………2 2. Yee’s discretization……………………………………………………………………………….3 3. Jacobi-Davidson Algorithm…………………………………………………………………..8 4. Shift-and-invert Lanczos with Krylov-Schur restarting ……………………….10 5. FFT preconditioner………………………………………………………………………………13 5.1 FFT usage in solving Laplacian operator………………………………….…..13 5.2 A crucial transformation for applying FFT……………………………………14 6. Numerical Results…………………………………………………………………….………….15 6.1 Results of the Arnoldi method…………………………………..………….…….16 6.2 Results of the Jacobi-Davidson method……………………..………….…….17 6.3 Results of the Krylov-Schur method…………………………...………….……19 6.4 Results of different vector k……………………………………..…………….……23 6.5 Results as k is near zero…………………………………………..……………….….24 7. Conclusion……………………………………………………………………………………………26.

(6) 三維光子晶體數值模擬中的矩陣預處理效應指導教授:王偉仲博士國立臺灣大學數學系研究生:張為仁國立高雄大學應用數學研究所. 摘要. 預處理在線性系統 Ax=b 中扮演了很重要的角色，一個適當的預處理能夠加速線性系統的收斂並且保持線性系統的穩定性。在本篇論文中，我們將討論矩陣的各種預處理效應。我們的問題來自三維光子晶體的馬克斯威爾方程，最主要的目的是希望找出一個適合的預處理來加速原始特徵值問題的收斂速度。我們的實驗包含了 Arnoldi 法、Jacobi-Davidson 法、 Krylov-Schur 法並且結合不同的預處理像是 Jacobi、SSOR、FFT 等等。最後我們發現 Krylov-Schur 法搭配 FFT 預處理的效能要比其他與處理來的好。關鍵字:馬克斯威爾方程、三維光子晶體、Jacobi-Davidson 法、 Krylov-Schur 法、快速傅立葉轉換.

(7) Preconditioning Effects in Numerical Simulations of Three-Dimensional Photonic Crystals Advisor: Dr. Wei-Chung Wang Department of Mathematics National Taiwan University. Student:Wei-Jen Chang Department of Applied Mathematics National University of Kaohsiung. Abstract Preconditioner plays an important role in solving linear system Ax=b. A suitable preconditioner makes the system converge quickly and more stable. In this thesis, we’ll discuss the effects of different preconditioners in the linear system which comes from an eigenvalue problem derived from the governing Maxwell equation. The purpose is to find a suitable preconditioner to accelerate convergent rate of the eigenvalue problem. We conduct our experiment by combining Arnoldi’s method, Jacobi-Davidson’s method and Krylov-Schur method with preconditioners like Jacobi, SSOR, FFT, etc. Finally, we find that the Krylov-Schur method with FFT preconditioner is more effective than other common preconditioners. Key words: The Maxwell equations, three-dimensional photonic crystals, JacobiDavidson’s method, Krylov-Schur method, FFT preconditioner.. 1.

(8) 2. 1. Introduction Three-dimensional photonic band-gap structures are periodic dielectric structures with a frequency region. Over past few decades, photonic crystals with specific band gaps are practical interest and extensive studies. In computing band-gap structures of three-dimensional photonic crystals, we are facing the time harmonic Maxwell’s equations as following: ∇ × H(x, t). =. ε(x)∂t E(x, t),. = =. −µ0 ∂t H(x, t), 0,. ∇ · (H(x, t)). =. 0,. ∇ × E(x, t) ∇ · (εE(x, t)). where H is magnetic field, E is electric field, ε is permittivity (or dielectric constant), and µ0 is permeability. By eliminating magnetic field, we obtain the second order Maxwell’s equation. ∇ × ∇ × E(x, t) = −µ0 ε(x)∂t2 E(x, t). ˜ Next, we take E(x, t) = eiωt E(x) (time harmonic modes), then we obtain ˜ ˜ ˜ ∇ × ∇ × E(x) = µ0 ω 2 ε(x)E(x) ≡ λε(x)E(x) Thus, by using separation of time and space variables and after eliminating the magnetic field H, the differential equation ˜ = λεE ˜ ∇×∇×E (1.1) ˜ ∇ · (εE) = 0. are obtained with λ = µ0 ω 2 and frequency ω. Numerical methods for solving Maxwell’s equations have been developed. Those methods include the finite difference methods [8, 9, 25, 36], the finite volume methods [10, 11, 24] and the finite element methods [4, 6, 7, 13, 21, 26]. These methods usually provide for divergence conservation and yield generalized eigenvalue problems (1.2). A~e = λB~e. where B is a diagonal matrix. The number of zero eigenvalues of (1.2) was found to be equal to the number of grid points for the finite difference formulation [9] or the number of internal nodes of finite elements for the finite element method [5]. In the study of photonic band gaps, the first few branches of non-zero eigenvalues are of primary interest. Hence the desired eigenpairs are located in the interior of the spectrum of (1.2). The eigenvalue problem (1.2) can be solved by inverse power method [1, 8, 9], various Lanczos [1, 28] or Arnoldi methods [2]. These methods require the use of the shift-and-invert technique to compute interior eigenpairs. Consequently, the computational cost for solving linear system is excessive. Another approach is Jacobi-Davidson method [2, 12, 29, 30]. In this method, solving linear system is also required but it is solved only approximately. In this paper, we introduce Yee’s scheme in order to discretize the Maxwell equation and give an explicit matrix form of the associated generalized eigenvalue problem in Section 2. In Section 3, we introduce the Jacobi-Davidson Algorithm and our heuristic strategy for computing target eigenvalues in order to get a better performance. In Section4, we give a brief introduction of the shift-and-invert Lanczos method and Krylov-Schur restart method. In Section 5, we first explain how to solve the Laplacian operator by using FFT and then we give a crucial transformation of our original equation so that we can use FFT as the preconditioner..

(9) 3. In Section 6, our numerical experiment, we first use a traditional method-Arnoldi’s method to solve the eigenvalue problem and then do some comparisons between the different preconditioners. After that, we try to use the Jacobi-Davidson method with tuning some parameters to get a better performance and also do some comparisons between FFT preconditioner and other preconditioners in both small and larger matrix size. We also use Krylov decomposition with Krylov-Schur restarting to solve our generalized problem and discuss the effects of preconditioners and the comparisons of performance between different discretization. Finally, we conclude this paper in Section 7. 2. Yee’s discretization In this section, we introduce the Yee’s scheme in order to discretize the Maxwell’s equations. Since the photonic crystals consist of dielectric materials fabricated in periodic structure, from Bloch’s Theorem [22], the eigenfunctions of (1.1) can be written as ˜ (2.1) E(x) = eik·x e(x) for some vector k in the first Brillouin zone and e satisfies the periodic condition (2.2). e(x + aℓ ) = e(x),. ℓ = 1, 2, 3,. where the lattice translation vectors aℓ , ℓ = 1, 2, 3, span the primitive cell which extends periodically to form the photonic crystal. Hence, Eq. (1.1) can be rewritten as (2.3) Let. e−ik·x ∇ × ∇ × eik·x e = λεe. h = ∇ × eik·x e.. (2.4). Substituting (2.4) for (2.3), we get. ∇ × h = λεeik·x e.. (2.5). For simplicity of presentation, we assume that the primitive cell is a unit cube spanned by the basis vectors a1 = (1, 0, 0),. a2 = (0, 1, 0),. a3 = (0, 0, 1).. The corresponding first Brillouin zone is given by {k = (k1 , k2 , k3 ) ∈ R3 | − π ≤ kj ≤ π, j = 1, 2, 3}.. Let δ1 , δ2 and δ3 denote the mesh length along the x, y and z axial directions, respectively. The indices n1 , n2 and n3 are grid point number in x, y and z directions, respectively. In Yee’s scheme, (2.4) and (2.5) are discretized on centers F = F1 ∪ F2 ∪ F3 of cell faces and centers E = E1 ∪ E2 ∪ E3 of cell edges, respectively, which we denote by E1 = {(xi+ 21 , yj , zk )},. F1 = {(xi , yj+ 21 , zk+ 12 )},. E2 = {(xi , yj+ 12 , zk )},. E3 = {(xi , yj , zk+ 12 )},. F2 = {(xi+ 21 , yj , zk+ 12 )},. F3 = {(xi+ 21 , yj+ 12 , zk )}. for i = 0, . . . , n1 − 1, j = 0, . . . , n2 − 1 and k = 0, . . . , n3 − 1. Under such discretizations, we give some notations first. Let e1 (i, j, k), e2 (i, j, k) and e3 (i, j, k) denote approximated values of functions e1 , e2 and e3 at the edge points (xi+1/2 , yj , zk ), (xi , yj+1/2 , zk ) and (xi , yj , zk+1/2 ), respectively. h1 (i, j, k), h2 (i, j, k) and h3 (i, j, k) denote approximated values of functions h1 , h2 and h3 at the face points (xi , yj+1/2 , zk+1/2 ), (xi+1/2 , yj , zk+1/2 ) and (xi+1/2 , yj+1/2 , zk ), respectively. Let α1 (i, j, k), α2 (i, j, k) and α3 (i, j, k) denote the complex values of function eik·x at the edge points (xi+1/2 , yj , zk ), (xi , yj+1/2 , zk ) and (xi , yj , zk+1/2 ),.

(10) 4. ∂y eik·x3 e3 − ∂z eik·x2 e2 = h1 (i, j + 1/2, k + 1/2). 6. ∂z eik·x1 e1 − ∂x eik·x3 e3 = h2 (i + 1/2, j, k + 1/2). . ? (i + 1/2, j + 1/2, k) ∂x eik·x2 e2 − ∂y eik·x1 e1 = h3. Figure 1. Discretized h = ∇ × eik·x e at face respectively. Set ε1 (i, j, k) = ε1 (xi+ 21 , yj , zk ), ε2 (i, j, k) = ε2 (xi , yj+ 12 , zk ) and ε3 (i, j, k) = ε3 (xi , yj , zk+ 12 ). As Figure 1 shows, in the first step of the Yee’s scheme, we rewrite Eq. (2.4) as ∂y eik·x3 e3 − ∂z eik·x2 e2 = h1 ,. (2.6a). ∂z eik·x1 e1 − ∂x eik·x3 e3 = h2 ,. (2.6b). ∂x eik·x2 e2 − ∂y eik·x1 e1 = h3. (2.6c). and discretize (2.6a), (2.6b) and (2.6c) by using central difference at the face points (xi , yj+1/2 , zk+1/2 ), (xi+1/2 , yj , zk+1/2 ) and (xi+1/2 , yj+1/2 , zk ), respectively. Then we obtain α3 (i, j + 1, k)e3 (i, j + 1, k) − α3 (i, j, k)e3 (i, j, k) δ2 α2 (i, j, k + 1)e2 (i, j, k + 1) − α2 (i, j, k)e2 (i, j, k) (2.7a) , − δ3 α1 (i, j, k + 1)e1 (i, j, k + 1) − α1 (i, j, k)e1 (i, j, k) h2 (i, j, k) = δ3 α3 (i + 1, j, k)e3 (i + 1, j, k) − α3 (i, j, k)e3 (i, j, k) (2.7b) − , δ1 α2 (i + 1, j, k)e2 (i + 1, j, k) − α2 (i, j, k)e2 (i, j, k) h3 (i, j, k) = δ1 α1 (i, j + 1, k)e1 (i, j + 1, k) − α1 (i, j, k)e1 (i, j, k) (2.7c) − δ2 h1 (i, j, k) =. for i = 0, . . . , n1 − 1, j = 0, . . . , n2 − 1 and k = 0, . . . , n3 − 1..

(11) 5. According to above discretizations, we define ~eℓ , ~hℓ ∈ Cn and Dℓ ∈ Cn×n with n = n1 n2 n3 for ℓ = 1, 2, 3 as follows: .  ~eℓ =  . .  ~ hℓ =  . . (0). ~eℓ .. .. .    with ~e(k) =  ℓ  . (n3 −1). ~eℓ. . (0) ~ hℓ .. .. . (0,k). ~eℓ. (n2 −1,k). ~eℓ.    with ~h(k) =  ℓ  . ~h(n3 −1) ℓ. .. .. ~h(0,k) ℓ .. . ~h(n2 −1,k) ℓ. . .  (j,k)   , ~e =  ℓ .   , ~h(j,k) ℓ . eℓ (0, j, k) .. .. eℓ (n1 − 1, j, k)  hℓ (0, j, k)  .. = .. .  ,. hℓ (n1 − 1, j, k). .  ,. and (2.8a) (2.8b). (j,k). =. diag. (k). =. diag. Dℓ. Dℓ. Dℓ. (2.8c). =. diag. . . αℓ (n1 − 1, j, k) ∈ Cn1 ×n1 , (n −1,k) ∈ C(n1 n2 )×(n1 n2 ) , · · · Dℓ 2 (n −1) ∈ Cn×n , · · · Dℓ 3. αℓ (0, j, k) · · · (0,k). Dℓ. (0). Dℓ. for j = 0, . . . , n2 − 1, k = 0, . . . , n3 − 1. Using the periodic property of e1 , e2 and e3 , we have (2.9a) (2.9b). e2 (i, j, n3 ) = e2 (i, j, 0), e1 (i, j, n3 ) = e1 (i, j, 0),. e3 (i, n2 , k) = e3 (i, 0, k), e3 (n1 , j, k) = e3 (0, j, k),. (2.9c). e1 (i, n2 , k) = e1 (i, 0, k),. e2 (n1 , j, k) = e2 (0, j, k),. for i = 0, . . . , n1 − 1, j = 0, . . . , n2 − 1, k = 0, . . . , n3 − 1. Substituting (2.9a), (2.9b), (2.9c) for (2.7a), (2.7b), (2.7c), respectively, Eq. (2.7) can be represented as the following matrix: ~h = G∗ D~e,. (2.10) where. (2.11). .  ~e1 ~e =  ~e2  , ~e3. .  ~h1  ~h =   ~h2  , ~h3. D = diag. D1. D2. D3. . ,. and . 0 1  G= δ3 Γ3 1 − δ2 diag (Γ2 , · · · , Γ2 ). (2.12).  1 − δ13 Γ3 δ2 diag (Γ2 , · · · , Γ2 ) 0 − δ11 diag (Γ1 , · · · , Γ1 )  1 0 δ1 diag (Γ1 , · · · , Γ1 ).

(12) 6. with. (2.13a) Γ1. (2.13b) Γ2. (2.13c) Γ3. =. . =. . =. . 1  −1   . In1  −In1   . −e−in1 δ1 k1. 1 .. .. ... . −1. In1 .. .. In1 ×n2  −In1 ×n2   . 1. .    ∈ Cn1 ×n1 , . −e−in2 δ2 k2 In1 ... . −In1 In1 ×n2 .. .. In1. .    ∈ C(n1 n2 )×(n1 n2 ) , . −e−in3 δ3 k3 In1 ×n2 ... . −In1 ×n2. In1 ×n2. .    ∈ Cn×n . . Next, as Figure 2 shows, we rewrite (2.5) as ∂y h3 − ∂z h2 = λε1 eik·x1 e1 ,. (2.14a). ∂z h1 − ∂x h3 = λε2 eik·x2 e2 ,. (2.14b). ∂x h2 − ∂y h1 = λε3 eik·x3 e3 ,. (2.14c). and discretize (2.14a), (2.14b) and (2.14c) by using central difference at edge points (xi+1/2 , yj , zk ), (xi , yj+1/2 , zk ) and (xi , yj , zk+1/2 ), respectively.. ∂x h2 − ∂y h1 = λε3 eik·x3 e3 (i, j, k + 1/2). 6. ∂y h3 − ∂z h2 = λε1 eik·x1 e1 (i + 1/2, j, k). . (i, j + 1/2, k). -. ∂z h1 − ∂x h3 = λε2 eik·x2 e2. Figure 2. Discretized ∇ × h = λεeik·x e at edge.

(13) 7. Then we obtain h3 (i, j, k) − h3 (i, j − 1, k) h2 (i, j, k) − h2 (i, j, k − 1) − δ2 δ3 (2.15a) h1 (i, j, k) − h1 (i, j, k − 1) h3 (i, j, k) − h3 (i − 1, j, k) − δ3 δ1 (2.15b) h2 (i, j, k) − h2 (i − 1, j, k) h1 (i, j, k) − h1 (i, j − 1, k) − δ1 δ2 (2.15c). = λ(α1 ε1 e1 )(i, j, k),. = λ(α2 ε2 e2 )(i, j, k),. = λ(α3 ε3 E3 )(i, j, k). for i = 0, . . . , n1 − 1, j = 0, . . . , n2 − 1 and k = 0, . . . , n3 − 1. Under the discretizations in (2.15), we define diagonal matrices as follows: for ℓ = 1, 2, 3, Bℓ = diag Bℓ(0) · · · Bℓ(n3 −1) with. (k). =. diag. (j,k). =. diag. Bℓ Bℓ. . (0,k). Bℓ. ···. (n2 −1,k). Bℓ. εℓ (0, j, k) · · ·. . ,. εℓ (n1 − 1, j, k). . ,. for j = 0, . . . , n2 − 1, k = 0, . . . , n3 − 1. By the periodic property of e1 , e2 and e3 , h1 , h2 and h3 satisfy the following properties: h2 (i, j, −1) = e−in3 δ3 k3 h2 (i, j, n3 − 1), (2.16a) h3 (i, −1, k) = e−in2 δ2 k2 h3 (i, n2 − 1, k), h1 (i, j, −1) = e−in3 δ3 k3 h1 (i, j, n3 − 1), (2.16b) h3 (−1, j, k) = e−in1 δ1 k1 h3 (n1 − 1, j, k), h1 (i, −1, k) = e−in2 δ2 k2 h1 (i, n2 − 1, k), (2.16c) h2 (−1, j, k) = e−in1 δ1 k1 h2 (n1 − 1, j, k), for i = 0, . . . , n1 − 1, j = 0, . . . , n2 − 1 and k = 0, . . . , n3 − 1. Substituting (2.16a), (2.16b) and (2.16c) for (2.15a), (2.15b) and (2.15c), respectively, the discretization of (2.5) at edges can be represented as the following matrix: (2.17). D∗ G~h = λB~e,. where D, G are defined in (2.11) and (2.12), respectively, and (2.18) B = diag B1 B2 B3 .. Substituting (2.10) for (2.17), the discretization of (2.3) at edges forms the following generalized eigenvalue problem: (2.19). A~e = λB~e. where (2.20). A = D∗ GG∗ D. whose sparsity is shown in Figure 3..

(14) 8 0. 500. 1000. 1500. 2000. 2500. 3000. 0. 500. 1000. 1500 nz = 39000. 2000. 2500. 3000. Figure 3. Sparsity of A. Algorithm 1 Jacobi-Davidson Algorithm for solving Ax = λBx Input: Coefficient matrices A and B, the number of desired eigenvalues ℓ and an initial B-orthonormal vector Vini . Output: the desired eigenpairs (λj , xj ) for j = 1, . . . , ℓ. 1: Set V1 = [Vint ], Vx = [ ], and Λ = ∅. 2: for j = 1 to ℓ do 3: Compute W1 = AV1 and M1 = V1∗ W1 . 4: for k = 1, 2, . . . do 5: Compute all the eigenpairs of Mk s = θs. 6: Select the desired eigenpair (θk , sk ) with ksk k2 = 1 and θk ∈ / Λ. 7: Compute uk = Vk sk , pk = Buk , rk = (A − θk B)uk . 8: If (krk k2 < ε), then goto line 14. 9: Solve the correction equation uk p∗k pk u∗k (A − θk B) I − ∗ tk = −rk I− ∗ u k pk u k pk approximately for tk ⊥B uk . 10: B-orthogonalize tk against Vk ; set vk+1 = tk /ktk kB . 11: Compute wk+1 = Avk+1 , Mk Vk∗ wk+1 Mk+1 = . ∗ ∗ vk+1 Wk vk+1 wk+1 12: 13: 14: 15: 16: 17:. Expand Vk+1 = [Vk , vk+1 ] and Wk+1 = [Wk , wk+1 ]. end for Set λj = θk , xj = uk , Λ = Λ ∪ {λj }. Perform locking by B-orthogonalizing xj against Vx ; Compute xj = xj /kxj kB ; Update Vx = [Vx , xj ]. Choose an orthonormal matrix Vini ⊥B Vx ; Set V1 = [Vx , Vini ]. end for 3. Jacobi-Davidson Algorithm. For solving large-scale sparse eigenvalue problems, we use the Jacobi–Davidson (JD) Method sketched in Algorithm 1. The preference of the JD is based on the.

(15) 9. successful experience on solving the eigenvalue systems arising in various quantum dot models detailed in [18, 19, 20, 35]. As suggested in [17, 30], the correction equation up∗ pu∗ (A − θB) I − ∗ t = −r (3.1) I− ∗ u p u p. in line 9 of Algorithm 1 is solved by using an iterative method with preconditioner up∗ pu∗ M I− ∗ , Mp ≡ I − ∗ u p u p. where M is an approximation of (A − θB). In each of the iterative steps for solving (3.1), it needs to solve the linear system Mp z = y. for a certain given vector y and z⊥B u. That is, up∗ pu∗ M I− ∗ z=y I− ∗ u p u p or pu∗ up∗ z (3.2) I− ∗ M z− ∗ =y u p u p. Notice that equation (3.2) has infinite solutions, and we have to utilize the property z⊥B u to make sure the linear system has unique solution. We explain how to make sure the uniqueness as follows. Since z⊥Bu , p∗ z = 0. Equation (3.2) becomes pu∗ (3.3) Mz = y I− ∗ u p Form equation (3.3), we have Mz −. (3.4). . p(u∗ )(Mz) u∗ p. . =y. Let constant ξ = (u∗ )(Mz)/u∗ p, equation (3.4) becomes Mz −ξp = y. This implies z = M−1 y − ξM−1 Bu. By multiplying the above equation by p∗ , we obtain Since p∗ z = 0, we obtain. p∗ z = p∗ M−1 y − ξp∗ M−1 Bu. 0 = p∗ M−1 y − ξp∗ M−1 Bu = u∗ BM−1 y − ξu∗ BM−1 Bu. Finally, we get. u∗ BM−1 y u∗ BM−1 Bu Therefore, only one linear system Md = y in each iterative step is need to be solved. The vector M−1 Bu and inner product u∗ BM−1 Bu are computed only once in all steps of the iterative process for solving (3.1). There are various iterative methods available and many parameters allowing tuning up for solving (3.1). Here, we adopt the following heuristic strategies to compute the approximated solution of the correction equation (3.1). For a given preconditioner M of A−θk B, we determine the maximal iteration number of solving correction equation so that Jacobi-Davidson method is convergent. These heuristic strategies are determined by the numerical experience. The iterative methods are stopped if the residual of the linear systems are less than the ξ=.

(16) 10. accuracy requirement. The heuristics are specified by the linear system solver GMRES [27] and Bi-CGSTAB [34], preconditioning strategy M, maximum iteration number, and residual stopping criterion. For example, {GMRES, M, 30, 10−4 } indicates that GMRES with preconditioner M is used to solve a linear system, the maximum iteration number is 30, and the residual stopping criterion is 10−4 . The specific heuristic strategies used to solve the linear system (3.1) are described as follows. • To compute the first smallest positive eigenvalue (λ1 ): In such case, Line 9 of Algorithm 1 is changed to the pseudo-code illustrated in Algorithm 2. Algorithm 2 The heuristic strategy for computing λ1 . 1: 2: 3: 4: 5: 6:. Solve Eq. (3.1) (approximately) to obtain a tk ⊥B uk by the method determined below. if ( k ≤ 9 ) then Use {BiCGSTAB, No precond., 6, 10−4 }. else Use {GMRES, M, m1 , 10−4 }. end if • To compute the other positive eigenvalues (λ2 , λ3 , . . .): In such case, Line 9 of Algorithm 1 is changed to the pseudo-code illustrated in Algorithm 3. Note that the variable i is set to be equal to 15 initially.. Algorithm 3 The heuristic strategy for computing {λ2 , . . . , λℓ }. 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14:. Solve Eq. (3.1) (approximately) to obtain a tk ⊥B uk by the method determined below. if (krk k2 > 0.1 and k < 6) then Use {BiCGSTAB, No precond., 6, 10−4 }. else if (krk k2 ≥ 0.1 and k > 14) then Use {GMRES, M, m2 , 10−4 }. Set flag = 1. else if (flag == 1) then Set i = m3 and flag = 0. else if (krk k2 < 0.1 andkrk−1 k2 /krk k2 < 4) then Set i = min(m4 , i + m5 ). Use {GMRES, M, i, 10−4 }. else Use {GMRES, M, i, 10−4 }. end if 4. Shift-and-invert Lanczos with Krylov-Schur restarting. In this section, we introduce shift-and-invert Lanczos with Krylov-Schur restarting [32, 33] to solve the generalized Hermitian eigenvalue problem (2.19). It is shown in [16] that the dimension of zero eigenspace of (A, B) is n/3 and some of smallest positive eigenvalues are interesting. Consequently, we need to transform (2.19) into (4.1). (A − σB)−1 Bx = (λ − σ)−1 x. for a given shift value σ > 0 so that the Lanczos method can be applied to solve the eigenvalues which are close to σ..

(17) 11. The basic recursion for applying Lanczos method to (4.1) is (A − σB)−1 BVj = Vj Tj + βj vj+1 eTj ,. (4.2). where the basis Vj is B-orthogonal and Tj is a real defined by  α1 β1  ..  β1 α2 . Tj =   . . .. .. β  j−1 βj−1 αj or equivalent to. symmetric tridiagonal matrix      . (A − σB)−1 Bvj = αj vj + βj−1 vj−1 + βj vj+1 . By the condition Vj∗ BVj = Ij , it is easily to infer that αj = vj∗ B(A − σB)−1 Bvj ,. βj2 = t∗j Btj ,. where tj ≡ (A − σB)−1 Bvj − αj vj − βj−1 vj−1 = βj vj+1 . An eigenpair (θk , sk ) of Tj is used to get an approximate eigenpair (λk , xk ) of (A, B) by 1 λk = σ + , xk = Vj sk . θk The corresponding residual is rk. = = = =. AVj sk − λk BVj sk = (A − σB)Vj sk − θk−1 BVj sk −θk−1 [BVj − (A − σB)Vj Tj ] sk −θk−1 (A − σB) (A − σB)−1 BVj − Vj Tj −θk−1 βj (eTj sk )(A − σB)vj+1. which implies that the norm of rk is small whenever |βj (eTj sk )/θk | is small. We summarize above process of shift-and-invert Lanczos method to solve the generalized Hermitian eigenvalue problem (2.19) in Algorithm 4. Algorithm 4 Shift-and-invert Lanczos method for generalized Hermitian eigenvalue problem p 1: Given starting vector t, compute q = Bt and β0 = |q ∗ t|. 2: for j = 1, 2, . . . , do 3: Compute wj = q/βj−1 and vj = t/βj−1 . 4: Solve linear system (A − σB)t = wj . 5: Set t := t − βj−1 vj−1 ; compute αj = wj∗ t and reset t := t − αj vj . 6: B-reorthogonalize t to v1 , . .p . , vj if necessary. 7: Compute q = Bt and βj = |q ∗ t|. 8: Compute approximate eigenvalues Tj = Sj Θj Sj∗ . 9: Test for convergence. 10: end for 11: Compute approximate eigenvectors X = Vj Sj . In principle, we can keep expanding the Lanczos decomposition in (4.2) until the Ritz eigenpairs converge to the wanted eigenpairs. Unfortunately, the expanding.

(18) 12. process is limited by the amount of memory to store Vj and avoiding loss of numerical orthogonality in Vj . Therefore, we restart the Lanczos process once when j becomes large enough. For restarting process, an implicit restart is proposed in [31] which Krylov process is combined with the implicitly shifted QR algorithm. This technique has been implemented in the ARPACK software [23]. A simpler way of achieving the same effect of the implicit restarting is by the so-called Krylov-Schur method proposed by Stewart [32, 33]. We state the Krylov-Schur restart in the following. Use Lanczos process to generate the Lanczos decomposition of order j + p (4.3). (A − σB)−1 BVj+p = Vj+p Tj+p + βj+p vj+p+1 eTj+p .. Let (4.4). T Tj+p = Uj+p Dj+p Uj+p ≡. . Uj. Up. . diag (Dj , Dp ). . UjT UpT. . be a Schur decomposition of Tj+p where the diagonal elements of Dj and Dp are the j wanted and p unwanted Ritz values, respectively. Substituting (4.4) for (4.3), it holds that (A − σB)−1 B(Vj+p Uj+p ). T = (Vj+p Uj+p )(Uj+p Tj+p Uj+p ) + βj+p vj+p+1 (eTj+p Uj+p ). which implies that (A − σB)−1 B V˜j = V˜j Dj + βj+p vj+p+1 tTj. is a Krylov decomposition of order j where V˜j ≡ Vj+p Uj and eTj+p Uj+p ≡ tTj , tTp . Let H1 be a Householder transformation such that tTj H1 = βeTj . Reduce H1T Dj H1 to tridiagonal form by using Householder transformations Hi for i = 2, . . . , j − 1 as the following illustration:     × × × × × × × 0  × × × ×   × × × 0  T    Dj :=   × × × ×  ⇒Dj := H2 Dj H2 =  × × × ×  × × × × 0 0 × ×   × × 0 0  × × × 0  T  ⇒ Dj := H3 Dj H3 =   0 × × ×  0 0 × × Let. Q = H1 H2 · · · Hj−1 . Then T˜j ≡ QT Dj Q is symmetric tridiagonal and tTj Q = (tTj H1 )(H2 · · · Hj−1 ) = βeTj (H2 · · · Hj−1 ) = βeTj . Therefore, the Krylov decomposition (A − σB)−1 B(V˜j Q) = (V˜j Q)T˜j + (βj+p β)vj+p+1 eTj is a Lanczos decomposition of order j and we can use it to generate a new Lanczos decomposition of order j + p if the j eigenpairs of (A, B) do not converge..

(19) 13. 5. FFT preconditioner For solving the correction equation in Jacobi-Davidson method or Krylov-Schur method, it needs to solve linear system Md = y where M is a preconditioner of A − θB. Jacobi, SSOR, ILU and ICC are some common traditional preconditioners for folving linear systems. For solving three-dimensional Maxwell’s operator, a FFT preconditioner is suggested in [16]. In this section, we’ll explain the most important transformation so that we can utilize Fast Fourier Transform (FFT) to solve the linear system Md = y. 5.1. FFT usage in solving Laplacian operator. In this subsection, we demonstrate how to solve Laplacian operator by using Discrete Fourier Transform. In 1-D case, given a sequence of data {uj }N j=1 , we can write N −1 2π 1 X uˆk eik N j , for j = 1, · · · , N uj = √ N k=0. and. To solve the PDE. N 2π 1 X uj e−ik N j , for k = 0, · · · , N − 1 u ˆk = √ N j=1. uxx = f. for x ∈ [0, 2π]. with the periodic boundary condition u(0) = u(2π), we approximate ∂x2 at xj by the short stencil second order centered difference uj−1 − 2uj + uj+1 D 2 uj = h2 with the periodic boundary condition u0 = uN and uN +1 = u1 for j = 1, · · · , N . Apply D2 to eikxj where k = 0, · · · , N − 1, we have D2 eikxj. = =. eikxj−1 − 2eikxj + eikxj+1 h2 −ikh e − 2 + eikh eikxj = h2. −4 sin2 h2. kh 2. !. eikxj. Under the periodic boundary condition, the discretised PDE D 2 u j = fj ⇔ where. N −1 X k=0. −λ2k u ˆk eikxj =. N −1 X k=0. fˆk eikxj. for j = 1, · · · , N. −4 sin2 kh 2 . h2 They are eigenvalues of D2 under periodic boundary conditions. Therefore, we have −λ2k ,. −λ20. −λ2k u ˆk = fˆk ,. 2. for k = 0, · · · , N − 1.. Since = 0, the kernel of D under periodic boundary conditions consists of all constant functions. Then u ˆk = −fˆk /λ2k , for k = 1, · · · , N − 1 u ˆ0 = 0, (which is called mean-zero solution). In 3-D case, that is, ∇2h uijk , Dx2 uijk + Dy2 uijk + Dz2 uijk = fijk , given { uijk | i = 1, · · · , M ; j = 1, · · · , N ; k = 1, · · · , L }, we can write.

(20) 14. uijk = u(xi , yj , zk ) =. M−1 X m=0. =. M−1 X m=0. =. u bm (yj , zk )eimxi N −1 X n=0. u bmn (zk )einyj. M−1 −1 L−1 X NX X m=0 n=0 ℓ=0. !. eimxi. u bmnℓ eiℓzk einyj eimxi. where xi = 2πi/M , yj = 2πj/N and zk = 2πk/L. In the Fourier space, we have. umnℓ = fbmnℓ . −(λ2m + λ2n + λ2ℓ )b. Thus, the first step for solving the linear system is to do 3-D discrete fourier transformation of b (right-hand side), which is then divided by the eigenvalue −(λ2m + λ2n + λ2ℓ ) and mean-zero solution is set. Finally, 3-D inverse discrete fourier transform is conducted. 5.2. A crucial transformation for applying FFT. In this subsection, we give a transformation so that we can utilize FFT as our preconditioner. First, we take M = θε0 I with ε0 =. nX 1 −1 n 2 −1 n 3 −1 X X. (ε1 (i, j, k) + ε2 (i, j, k) + ε3 (i, j, k))/(3n),. i=0. j=0 k=0. i.e., ε = ε0 in (2.3). Then the linear system (A − ε0 θI)d = y is obtained from discretizing the following equation e−ik·x ∇ × ∇ × eik·x d − ε0 θd = y or ∇ × ∇ × eik·x d − ε0 θeik·x d = eik·x y.. (5.1) Using the fact. ∇ × (∇ × eik·x d) = ∇(∇ · eik·x d) − ∇2 (eik·x d), (5.1) can be rewritten as eik·x y (5.2). = ∇(∇ · eik·x d) − ∇2 (eik·x d) − ε0 θeik·x d = (−△ − ε0 θ) eik·x d + ∇(∇ · eik·x d).. On the other hand, according to (5.1) and the fact ∇ · (∇ × v) = 0, we obtain (5.3) ∇ · (eik·x y) = ∇ · ∇ × ∇ × eik·x d − ε0 θ∇ · (eik·x d) = −ε0 θ∇ · (eik·x d). Substituting (5.3) for (5.2), we obtain (5.4). (−△ − ε0 θ) eik·x d = eik·x y + (ε0 θ)−1 ∇(∇ · eik·x y). which can be discretized to form a linear system (5.5). M(Dd) = y˜.. Since the boundary condition of (5.4) is periodic and ε0 θ is not an eigenvalue, (5.5) can be solved by the FFT..

(21) 15. 6. Numerical Results In numerical results, we concern two important things. The first one is accuracy, and the second one is efficiency. In the following subsections, we discuss the accuracy of different methods and also show the performance of varied eigen solvers and linear solvers with different preconditioners. For examples, Arnoldi’s method with different preconditioners, Krylov decomposition with Krylov-Schur restarting and Jacobi-Davidson methods with preconditioners Jacobi, SSOR(ωssor ), ILU(ℓ) (incomplete LU factorization), ICC(ℓ) (incomplete Cholesky factorization) and FFT, where ωssor is a parameter of SSOR preconditioner with 0 < ωssor < 2 and ℓ is the fill-in level in ILU/ICC precinditioner. The preconditioners Jacobi, SSOR, ILU and ICC are provided by PETSc (Portable, Extensible Toolkit for Scientific Computation) [3], which is a suite of data structures and routines for the scalable (parallel) solution of scientific applications modeled by partial differential equations. The Arnoldi’s method is provided by SLEPc (Scalable Library for Eigenvalue Problem Computations) and ARPACK, and the Krylov decomposition with Krylov-Schur restarting is provided by SLEPc [14, 15]. SLEPc is a software library for the solution of large scale sparse eigenvalue problems on both sequential and parallel computers. All the numerical experiments are conducted on a workstation running RedHat Linux. The workstation is equipped with an Intel Quad-Core Xeon E5355 2.66 GHz × 2 and 16-gigabyte main memory. The proposed schemes are implemented by Fortran 90 and compiled by the Intel Fortran Compiler. Our numerical tests are based on the benchmark example [16] shown in Figure 4 where the periodic dielectric structure within a primitive cubic cell is depicted. The structure consists of dielectric spheres with radius r, connected by circular cylinders with radius s. Here r/a = 0.345, s/a = 0.11 and a is the edge length of the cube. Inside the structure is the dielectric material with permittivity contrast εi /εo = 13. The matrix dimension of A is equal to 375000 = 3 × 50 × 50 × 50 or 46875 = 3 × 25 × 25 × 25. The smallest ten positive eigenvalues λ1 , . . . , λ10 with 0 < λ1 ≤ λ2 ≤ · · · ≤ λ10 are required.. a. Figure 4. The periodic dielectric structure within a primitive cell. Inside: dielectric material. Outside: air. Here r/a = 0.345, s/a = 0.11 and εi /εo = 13..

(22) 16. Before presenting our numerical results, it’s necessary to define some common notions in the following tables and figures. First, the notation “ncv” in the following figures and tables denotes the largest dimension of the Krylov subspace, “conv” prescribe the amount of convergent eigenvalues which are computed by the solution algorithm which conforms our stop criteria. Nest, if the absolute value of the eigenvalues computed by the eigen solver is less than 1.0e-7, we set the eigenvalues to be zero eigenvalue, and “nzero” denotes the number of eigenvalues which are zero eigenvalue. “nwew” denotes the number of convergent eigenvalues which belongs to {λ1 , . . . , λ10 }, that is, the eigenvalues we are interested in. “iter” denotes the number of restarting in Krylov decomposition. Notice that, if iter equals to one, it means the program didn’t restart. “iter” in the Jacobi-Davidson method means the number of iterations (Iterate once means do Algorithm 1 from line 4 to line 13 once). “time” denotes the total CPU time we took in eigen solver. 6.1. Results of the Arnoldi method. In this subsection, we report the performances of solving generalized eigenvalue problem (2.19) by using shift-and-invert Arnoldi’s method. In eigen solver, since ARPACK is a popular package for solving eigensystem, we try to evaluate our eigenvalue problem by using ARPACK first. In fact, we combine SLEPc with ARPACK, that is, we call some subroutines provided by ARPACK with SLEPc as the interface. And in the linear system, we use GMRES method with preconditioners such as Jacobi, SSOR, ICC provided by PETSc and we also use FFT preconditioner to solve the associated linear systems. The stopping tolerance of the Arnoldi method is set to be 10−8 , the tolerance of linear system is 10−10 , the shift target is σ = 0.1 and k = (π, π, 0). In Table 1, Table 2, and Table 3, the matrix size is 46875 (25 × 25 × 24 × 3). ”Direct” means we solve the linear system by using direct method, that is, LU factorization. First, we focus on the accuracy part. Table 1 shows the amount of wanted eigenvalues computed by the ARPACK. It’s easy to perceive that we didn’t get enough eigenvalues we desire especially in FFT preconditioner. In Fact, we really got ten convergent eigenvalues in different preconditioners, but most of these eigenvalues are larger than our target eigenvalues. ncv 20 25 30 35 40 45 50 55 60. Direct Jacobi ICC(1) SSOR(0.95) FFT 1 0 0 0 0 6 0 6 6 3 3 4 5 0 0 4 0 4 0 2 4 4 4 4 2 2 2 5 3 1 4 4 4 2 2 2 2 4 2 0 5 3 3 3 0. Table 1. The amount of wanted eigenvalues in Arnoldi’s method. Next, we pay attention to the efficiency part. Table 2 shows the CPU Time of different preconditioners. We ignore ICC(0) preconditioner since it didn’t converge even though it took more than 40000 seconds. According to this table, in general, SSOR, ICC, and Jacobi preconditioners took more than five thousand seconds. It seems that direct method and FFT preconditioner might be two nice choices. But if we consider a large-scale linear system, direct method isn’t a suitable choice since we only have limited memory. Although we didn’t get ten eigenvalues we need by.

(23) 17. using ARPACK, this table shows a crucial information that FFT preconditioner might be a suitable choice. ncv 20 25 30 35 40 45 50 55 60. Direct 663.7 637.6 554.4 558.6 673.5 580.3 556.6 560.6 571.7. Jacobi 25567.7 21248.7 35091.2 42043.3 36040.8 41806.6 32546.6 38404.5 43120.4. ICC(1) SSOR(0.95) FFT 5804.9 5182.8 49.5 10747.6 5591.2 62.1 10535.1 4335.3 5254.5 9370.8 5756.2 70.2 11096.9 6075.9 68.0 10464.0 7340.2 78.4 10411.8 5911.0 64.9 11588.5 6825.1 72.7 12687.4 7219.7 80.6. Table 2. The time (seconds) of Arnoldi’s method In Table 3, we present the result of SSOR preconditioner with different ωssor in order to observe if we can get a better accuracy and efficiency. Unfortunately, we didn’t get a better performance. There are only five to six eigenvalues we are interested in. But we still took lots of time in computing these eigenvalues. ωssor iter time conv nwew nzero 0.55 7 10515.4 9 5 0 0.60 7 8516.8 10 6 0 0.65 7 8915.1 10 6 0 0.70 7 9177.1 10 6 0 0.75 7 8646.3 9 5 0 0.80 7 6486.7 9 5 0 0.85 7 6818.0 9 5 0 0.90 6 5445.9 0 0 0 0.95 7 5591.2 10 6 0 1.00 7 5849.7 9 5 0 1.05 7 7949.2 10 6 0 1.10 6 8789.2 1 0 1 1.15 7 9845.7 10 6 0 1.20 8 19368.6 0 0 2 Table 3. SSOR with different ωssor , ncv = 25 In the following subsections, we’ll try some other methods to observe if we can get all ten eigenvalues we need and if FFT preconditioner is the best choice when we consider a larger matrix size. More detailed results of Arnoldi’s method by using ARPACK with different preconditioners are shown in Appendix D. 6.2. Results of the Jacobi-Davidson method. In this subsection, we report the performances of Jacobi-Davidson’s method with different preconditioners in solving correction equation (Algorithm 1:9). In order to get a better performance, we tune some parameters ω, ℓ and mi where i = 1, . . . , 5 for different preconditioners such as Jacobi, ICC(ℓ), SSOR(ωssor ), FFT, etc in Algorithms 2 and 3 so that each target eigenpairs can be computed by using the Jacobi-Davidson method and converge in 300 iterations. The target value and stopping tolerance ε in Line 8 of Algorithm 1 are set to 0.1 and 10−8 , respectively. The vector k and the matrix dimension of A are equal to (π, π, 0) and 3 × 25 × 25 × 25 = 46875, respectively..

(24) 18. The numerical results are shown in Table 4. We ignore ILU preconditioner since the linear system doesn’t converge in limited steps. According to the last column of this table, it’s easy to observe that we got a great accuracy progress. In most of conditions, we can get all ten eigenvalues we need by using the Jacobi-Davidson method. Next, we discuss the efficiency. We tune some parameters and then present the results of the best performance in every preconditioner. The first information is, according to the table, that it took more time when the level of ICC preconditioner becomes higher and higher. The second information is that we achieve the best efficiency by using SSOR preconditioner. M Jacobi SSOR(0.75) SSOR(0.90) ICC(1) ICC(2) ICC(3). m1 m2 m3 m4 m5 100 150 200 150 50 50 55 75 65 10 50 55 75 65 10 50 60 70 50 10 50 60 70 50 10 100 150 100 100 10. iter 555 671 643 904 1093 1709. time nwew 1006.4 10 934.7 10 940.1 10 1756.5 10 3593.2 10 7086.6 10. Table 4. Numerical results of Jacobi-Davidson method with different preconditioners M. The matrix size is 46875. In Table 4, the matrix comes from the discretization matrix of double curls operator. Next, as section 5 mentioned, we choose M ≈ L where L is the discretization matrix of three-dimensional Laplacian operator by using finite difference method because we’d like to utilize FFT preconditioner. Now we are interested in the efficiency after we apply some common preconditioners involving FFT to the matrix L. The results are shown in Table 5. According to Table 5, we detect that the performance of FFT preconditioner is obviously better than other preconditioners like Jacobi, SSOR and ICC. If we don’t consider FFT preconditioner, as Tables 4 and 5 show, we don’t get a better performance if we apply other common preconditioners to matrix L. M Jacobi SSOR(0.85) ICC(1) ILU(1) FFT. m1 m2 m3 m4 m5 100 300 350 300 50 100 100 150 100 25 35 50 65 45 5 35 50 75 40 10 25 20 25 25 2. iter 811 651 775 762 236. time nwew 8048.6 10 2473.5 10 2862.7 10 4148.9 10 103.6 10. Table 5. Numerical results of Jacobi-Davidson method with different preconditioner (Laplacian operator). The matrix size is equal to 46875.. In order to observe the effect of mi for a given preconditioner M, we extend the matrix dimension of A to be 3×50×50×50 = 375000 and the corresponding results are shown in Table 6. Notice that in this table, we apply the preconditioners to the matrix which comes from the double curls operator except FFT preconditioner since we don’t get a better performance by applying some common preconditioners to the matrix with small size which comes from the Laplacian operator. As Figure 5 shows, if we only consider the double curls operator, SSOR provides the best performance. Now we present what ωssor makes a better efficiency. The.

(25) 19. M Jacobi SSOR(0.80) SSOR(1.00) ICC(0) ICC(1) FFT. m1 m2 m3 m4 m5 iter time nwew 150 150 250 150 100 1105 32429.1 10 50 55 75 60 2 1278 20647.2 10 45 45 65 55 2 1418 20261.7 10 350 350 450 300 100 1200 94346.7 10 100 100 200 100 100 1066 38226.7 10 25 20 25 25 2 243 1187.4 10. Table 6. Numerical results of Jacobi-Davidson method with different preconditioner M. The matrix size is 375000. results of different choices of ωssor are shown in Figure 5. According to this figure, we suggest that the better choice of ωssor is from 0.75 to 1.10. 4. 4.5. x 10. CPU Time (sec.). 4. 3.5. 3. 2.5. 2. 0.7. 0.8. 0.9. 1. ωssor. 1.1. 1.2. 1.3. 1.4. Figure 5. CPU times of Jacobi-Davidson method with preconditioner SSOR, where ωssor from 0.65 to 1.25. And m1 = 50, m2 = 80, m3 = 100, m4 = 60, m5 = 2. The matrix size is 375000 More detailed data of using Jacobi-Davidson’s method are presented in Appendix E. 6.3. Results of the Krylov-Schur method. In this subsection, we report the performances of solving generalized eigenvalue problem (2.19) by using shift-andinvert Krylov decomposition with Krylov-Schur restarting (Krylov-Schur method). The GMRES iterative solver with preconditioner Jacobi, SSOR, ILU, ICC in PETSc and FFT preconditioner are used to solve the associated linear systems. The stopping tolerance of the Krylov decomposition is set to be 10−8 , and the shift target is taken σ = 0.1 and k = (π, π, 0). First, we discuss the performance of preconditioners Jacobi, SSOR, ILU, ICC provided by PETSc. If we apply Jacobi and ILU preconditioner to the matrix M ≈ A−σB, the GMRES method for solving the linear system (A−σB)x = b in KrylovSchur method doesn’t converge. In accuracy part, we can get all ten eigenvalues we need by using the Krylov-Schur method with ICC preconditioners but we only get eight with SSOR(0.8) preconditioner. The result is shown in Table 7. In fact, if we require higher accuracy of linear system, we can get all ten eigenvalues in SSOR(0.8) preconditioner. The efficiency results of using SSOR and ICC preconditioner are shown in Figure 6. As the figure shows, though we can get more eigenvalues than.

(26) 20. ARPACK, we still don’t satisfy the CPU time we took. The detailed results are shown in Appendix A. The notation “ncv” in the following figures and tables denotes the largest dimension of the Krylov subspace. If the dimension of the Krylov subspace is larger than “ncv”, the process of Krylov decomposition will be restarted. ncv 20 25 30 35 40 45 50 55 60. ICC(1) SSOR(0.8) 10 8 10 8 10 8 10 8 10 8 10 8 10 8 10 8 10 8. Table 7. The amount of wanted eigenvalues in Arnoldi’s method. 5. 10. CPU times (sec.). SSOR(0.8) ICC(1). 4. 10. 3. 10. 20. 40. 60 ncv. 80. 100. Figure 6. CPU times of Krylov-Schur method with preconditioners ICC(1) and SSOR(0.8). Note that in order to reduce the computational time, we take matrix size to be 46875. In Section 5, we introduce a new partial differential equation (5.4) which is associated with a preconditioning linear system (5.5). That is, we use GMRES with preconditioning linear system (5.5) to solve (A − σB)x = b. An accurate solution of (5.5) can be obtained by Fast Fourier Transform. The numerical results of such FFT preconditioner with ncv from 20 to 100 is shown in Table 8. As Table 8 shows, we get all ten eigenvalues we desire. It means we got a great accuracy. And the CPU time is less than one hundred seconds by using FFT as our preconditioner. In Table 9, we present the result of Krylov-Schur method with larger matrix size. We also get all ten eigenvalues we desire except ncv = 40, and we get great performance in a larger matrix size. Since (5.5) is a preconditioning linear system, we can find an approximate solution to reduce the computation costs for solving (5.5). Here we replace the linear.

(27) 21. ncv 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100. iter 11 7 5 4 3 3 3 2 2 2 2 1 1 1 1 1 1. time 77.2 70.5 67.1 67.6 65.0 70.4 80.0 69.1 74.2 80.6 87.9 65.1 69.5 74.2 79.3 83.2 87.8. conv 13 15 16 16 16 14 14 16 14 14 17 16 16 14 14 13 14. nwew nzero 10 0 10 0 10 0 10 0 10 0 10 0 10 0 10 0 10 0 10 0 10 0 10 0 10 0 10 0 10 0 10 0 10 0. Table 8. The results of Krylov-Schur method with FFT preconditioner. The matrix size is 46875. ncv 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100. iter 10 7 5 4 2 3 3 2 2 2 2 1 1 1 1 1 1. time conv nwew 657.2 13 10 642.3 14 10 611.8 15 10 607.8 16 10 446.9 12 9 648.8 14 10 720.8 14 10 620.9 14 10 667.8 14 10 725.0 14 10 789.6 16 10 584.5 16 10 624.0 14 10 664.7 14 10 704.9 14 10 746.0 14 10 786.3 16 10. nzero 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0. Table 9. The results of Krylov-Schur method with FFT preconditioner.The matrix size is 375000.. ˜ ˜ is a preconditioner of M. As we do system in (5.5) with M(Dd) = y˜ where M in the previous section, we apply some common preconditioner Jacobi, SSOR(0.8), ICC(1) and ILU(1) to the new matrix. And we can get all ten eigenvalues we want by using these preconditioners. The numerical results are shown in Figure 7. According to Figures 6 and 7, the performance of Krylov-Schur method with ˜ is better than that of Krylov-Schur method with preconditioner preconditioner M M of A − σB. This result is opposite to the Jacobi-Davidson method. Moreover, as Table 8 shows, FFT preconditioner is still faster than other preconditioners..

(28) 22. 3.6. CPU times (sec.). 10. Jacobi SSOR(0.8) ICC(1) ILU(1). 3.5. 10. 3.4. 10. 3.3. 10. 3.2. 10. 20. 40. 60 ncv. 80. 100. Figure 7. CPU times of Krylov-Schur method with using preconditioners Jacobi, SSOR(0.8), ICC(1) and ILU(1) to solve (5.5). Note that in order to reduce the computational time, we take matrix size to be 46875. Finally, we pay more attention to the linear system. We observe the iteration number of the first Krylov iteration. Here we set ncv equal to 20, that is, we have to solve 20 linear systems in the first Krylov-Schur iteration. Figure 8 shows the iteration number of every linear system with different preconditioners.. SSOR ICC ILU Jacobi FFT. Iteration Number. 1500. 1000. 500. 0 0. 5. 10. 15. 20. Figure 8. The linear system iteration number of Krylov-Schur method with different preconditioners. The matrix size is 375000 As Figure 8 shows, it is easy to perceive the efficient contrast between FFT preconditioner and other preconditioners. If we use FFT as our preconditioner, it only needs less than 30 iterations in comparison with other preconditioners which need more than hundreds of iterations. FFT preconditioner really affect the linear system a lot. We might be confused that the results of SSOR(0.8) preconditioner and ICC(1) preconditioner since we need more iterations in SSOR(0.8), but we got a better performance. That’s because PETSc takes more time in matrix factorizations.

(29) 23. if we use ICC(1) as the preconditioner. We can observe how many percentage PETSc takes in different PETSc functions by using the flag ”-log summary”. 6.4. Results of different vector k. From the previous three subsections, we know that the performance of FFT preconditioner is obviously better than other preconditioners. In this subsection, we’ll do the comparison of different vector k between Arnoldi’s method, Krylov-Schur method and Jacobi-Davidson’s method with FFT preconditioner. In this subsection, our matrix size is 375000. In Table 10, we present the number of desirous eigenvalues computed by using Arnoldi’s method, that is, nwew. Clearly, we can’t get all ten eigenvalues by using Arnoldi’s method no matter how we choose ncv and k. The notation ”*” in Table 10 means we don’t get any result since it takes more than 15000 seconds. The detailed data are shown in Appendix C. In the following paragraphs, we focus on the comparison between Krylov-Schur method and Jacobi-Davidson’s method with FFT preconditioner in different k. √ Figure 9 [16] shows the plot of ω = a λ/(2π) vs. representative vectors k in the first Brillouin zone for the benchmark problem. The smallest nonzero eigenvalues are calculated for 40 sample points k, distributing along the segments connecting Γ = (0, 0, 0), X = (π, 0, 0), M = (π, π, 0), R = (π, π, π) and back to Γ in the first Brillouin zone. 0.6. √ ω = a λ/(2π). 0.5. 0.4. 0.3. 0.2. 0.1. 0. Γ. X. M. R. Γ. Figure 9. Band structure of the benchmark problem. At every segment, the ten smallest positive eigenvalues are computed for four sample vectors k at point π4 , π2 , 3π 4 , π. All the shift σ are set to 0.1 and the values of ncv in Krylov-Schur method are taken as 25, 30, 35 and 40. The numerical results are shown in Figure 10. From Figure 10, we conclude that the performance of Krylov-Schur method with FFT preconditioner is better than the Jacobi-Davidson method with FFT preconditioner. And more detailed data are shown in Appendix G and F. It has been proven in [16] that the number of zero eigenvalue for Hermitian generalized eigenvalue problem (2.19) is n3 where n is the matrix size. The effect of the null space on Jacobi-Davidson method has also reported that the Ritz value θk in Algorithm 1 is constantly dragged towards zero during the subspace iteration. This effect is also reflected in the convergence history of the residual of the computed Ritz vector. This is a reason why Jacobi-Davidson method needs more computational.

(30) 24. ncv k (π/4, 0.0, 0.0) (π/2, 0.0, 0.0) (3π/4, 0.0, 0.0) (π, 0.0, 0.0) (π, π/4, 0.0) (π, π/2, 0.0) (π, 3π/4, 0.0) (π, π, 0.0) (π, π, π/4) (π, π, π/2) (π, π, 3π/4) (π, π, π) (3π/4, 3π/4, 3π/4) (π/2, π/2, π/2) (π/4, π/4, π/4). 20 25 30 35 40 45 50 0 0 0 0 0 0 0 0 1 1 0 5 * 0 5. 6 3 4 6 0 0 0 4 0 8 8 * 8 0 0. 6 5 5 2 3 0 7 5 0 7 0 2 8 0 6. 3 0 3 7 5 4 6 4 3 7 6 3 3 4 4. 6 4 4 4 5 4 8 4 * 8 8 2 0 0 0. 4 2 0 3 5 4 5 2 5 6 8 3 7 6 4. 3 7 4 3 3 4 4 4 4 5 5 2 3 4 4. Table 10. The amount of wanted eigenvalues by using Arnoldi’s method with FFT preconditioner,different k. 3000 ncv=25 ncv=30 ncv=35 ncv=40 JD. CPU Time(sec.). 2500 2000 1500 1000 500 0. Γ. X. M. R. Γ. Figure 10. CPU times of Jacobi-Davidson and Krylov-Schur methods versus k. costs. In order to avoid the effect of the null space, a null space free Jacobi-Davidson method is proposed in [16]. From the results in Figure 10, it is worth studying the effect of the null space on Lanczos method and developing a null space free Lanczos method. 6.5. Results as k is near zero. In this subsection, we observe the behavior as k tends to be zero from both X-Direction and Diagonal-Direction. In this section, we ˜ and the relation is k = 2π k. ˜ In this subsection, our matrix use a new notation k, size is 375000. In the following numerical results, we use different shift in order to get the correct smallest eigenvalues. Table 11 shows the shifts of different k. On the upper right.

(31) 25. boxes of Figure 11 and Figure 12, the prefix ”JD” is abbreviation of the JacobiDavidson method and ”Kry” is the Krylov-Schur method with ncv = 20. And the ˜ =(x, 0.0, 0.0). The suffix ”x” means k tends to be zero from x direction, that is, k ˜ =(x, x, suffix ”diag” means k tends be to zero from diagonal direction, that is, k x). ˜ k (1.0e-1,0.0,0.0) (7.5e-2,0.0,0.0) (5.0e-2,0.0,0.0) (2.5e-2,0.0,0.0) (1.0e-2,0.0,0.0) (5.0e-3,0.0,0.0) (1.0e-3,0.0,0.0) (1.0e-1,1.0e-1,1.0e-1) (7.5e-2,7.5e-2,7.5e-2) (5.0e-2,5.0e-2,5.0e-2) (2.5e-2,2.5e-2,2.5e-2) (1.0e-2,1.0e-2,1.0e-2) (5.0e-3,5.0e-3,5.0e-3) (1.0e-3,1.0e-3,1.0e-3) Table 11. Different shifts as. Shift 1.0e-2 1.0e-2 1.0e-2 1.0e-2 1.0e-3 1.0e-4 1.0e-5 1.0e-2 1.0e-2 1.0e-2 1.0e-2 1.0e-3 1.0e-3 1.0e-5 k is near zero. First, we consider the accuracy, Figure 11 presents the maximum residual of the ten eigenvalues computed by the Krylov-Schur method and the Jacobi-Davidson method as k tends to be zero. As the figure shows, when k is close zero, the accuracy becomes lower and lower. In fact, most of eigenvalues can’t achieve the criterion we set in SLEPc. 0. 10. Maximum Residual. −2. 10. Kry_x Kry_diag JD_x JD_diag. −4. 10. −6. 10. −8. 10 1.0e−3 5.0e−3 1.0e−2 2.5e−2 5.0e−2 7.5e−2 1.0e−1 x. Figure 11. The maximum residual as k is near zero. In the efficiency part, Figure 12 presents the CPU time of these two methods as k tends to be zero. As the figure shows, we have to take more and more time as k is close to zero, and the Krylov-Schur method takes more times than the ˜ is smaller than 1.0e-2. The detailed data are shown Jacobi-Davidson method as k in Appendix H..

(32) 26. 4500 JD_x JD_diag Kry_x Kry_diag. 4000 CPU Time(sec.). 3500 3000 2500 2000 1500 1000. 500 1.0e−3 5.0e−3 1.0e−2 2.5e−2 5.0e−2 7.5e−2 1.0e−1 x. Figure 12. The CPU Time as k is near zero. Finally, we also focus on the linear system. Figure 13 shows the iteration number of three different k, they are (0.125, 0.0, 0.0), (5.0e-3, 0.0, 0.0), and (1.0e-3, 0.0, 0.0). We also only present the first Krylov-Schur iteration, and ncv is also 20. As the figure shows, we need more iterations in linear system as k close to zero. By ˜ = (1.0e-3, 0.0, 0.0), the way, because we need more Krylov-Schur iterations as k ˜ we take more CPU time in comparison with k = (1.0e-5, 0.0, 0.0) even though it seems the iteration numbers of the linear systems of these two k are close. More detailed data are shown in Appendix I. 100 0.125 5.0e−3 1.0e−3. 90 Iteration Number. 80 70 60 50 40 30 20 0. 5. 10. 15. 20. Figure 13. The iteration number as k is near zero.. 7. Conclusion For solving the eigenvalue problem of the Maxwell equation arising from threedimensional photonic crystals, we try Arnodli’s method first. Although we don’t get all eigenvalues, we detected that FFT preconditioner is more suitable than other preconditioners like Jacobi, SSOR, etc. After that, we try Jacobi-Davidson’s method and the Krylov-Schur method with larger matrix size. The good news are.

(33) 27. that we get all ten eigenvalues we want, and FFT preconditioner shows its extraordinary efficient no matter what method you used. We tune some parameters in the linear system of Jacobi-Davidson’s method, but we don’t get better performance than Krylov-Schur method. We can’t promise if there are some batches of parameters which make Jacobi-Davidson converge faster than Krylov-Schur method, but in practical application, they don’t waste time on tuning these parameters. As k is near zero, we find that the Jacobi-Davidson method provided a better performance and a better accuracy than the Krylov-Schur method. But in general, combining Krylov-Schur method with FFT preconditioner is the most appropriate choice in the eigenvalue problem of Maxwell’s equation arising from three-dimensional photonic crystals. References [1] P. Arbenz and R. Geus. A comparison of solvers for large eigenvalue problems occuring in the design of resonant cavities. Numer. Linear Algebr. Appl., 6:3–16, 1999. [2] Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. van der Vorst. Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide. SIAM, Philadelphia, 2000. [3] S. Balay, K. Buschelman, W. D. Gropp, D. Kaushik, M. G. Knepley, L. C. McInnes, B. F. Smith, and H. Zhang. PETSc Web page, 2009. http://www.mcs.anl.gov/petsc. [4] A. Bossavit and J.-C. Verite. A mixed fem-biem method to solve 3-D eddy-current problems. IEEE Trans. on Magnetics, 18:431–435, 1982. [5] A. Chatterjee, J.M. Jin, and J.L. Volakis. Computation of cavity resonances using edge-based finite elements. IEEE Trans. Microwave Theory Tech., 40:2106–2108, 1992. [6] A. Chatterjee, L. C. Kempel, and J. L. Volakis. Finite Element Method for Electromagnetics: Antennas, Microwave Circuits, and Scattering Applications. IEEE Press, 1998. [7] Z. Chen, Q. Du, and J. Zou. Finite element methods with matching and nonmatching meshes for Maxwell equations with discontinuous coefficients. SIAM J. Numer. Anal., 37:1542–1570, 2000. [8] R.L. Chern, C.Chung Chang, Chien-C. Chang, and R.R. Hwang. Large full band gaps for photonic crystals in two dimensions computed by an inverse method with multgrid acceleration. Phys. Rev. E, 68:26704, 2003. [9] R.L. Chern, C.Chung Chang, Chien-C. Chang, and R.R. Hwang. Numerical study of threedimensional photonic crystals with large band gaps. J. Phys. Soc. Jpn., 73:727–737, 2004. [10] E. Chung, Q. Du, and J. Zou. Convergence analysis of a finite volume method for Maxwell’s equations in nonhomogeneous media. SIAM J. Numer. Anal., 41:37–63, 2003. [11] T. Chung and J. Zou. A finite volume method for Maxwell’s equations with discontinuous physical coefficients. Int. J. Appl. Math., 7:201–223, 2001. [12] D.R. Fokkema, G.L.G. Sleijpen, and H.A. van der Vorst. Jacobi-Davidson style QR and QZ algorithms for the reduction of matrix pencils. SIAM J. Sci. Comput., 20:94–125, 1998. [13] M. Hano. Finite-element analysis of dielectric-loaded waveguides. IEEE. Trans. Microwave Theory Tech., 32:1275–1279, 1984. [14] V. Hernandez, J.E. Roman, A. Tomas, and V. Vidal. SLEPc users manual. Technical Report DSIC-II/24/02 - Revision 2.3.2, D. Sistemas Inform´ aticos y Computaci´ on, Universidad Polit´ ecnica de Valencia, 2006. [15] V. Hernandez, J.E. Roman, and V. Vidal. SLEPc: A scalable and flexible toolkit for the solution of eigenvalue problems. ACM Trans. Math. Software, 31:351–362, 2005. [16] T.-M. Huang, Y.-L. Huang, W.-W. Lin, and W.-C. Wang. A null space free Jacobi-Davidson iteration for Maxwell’s operator. Technical report, National Center for Theoretical Sciences, National Tsing Hua University, Taiwan, Preprints in Mathematics 2009-7-004, 2009. [17] T.-M. Huang, W. Wang, and C.-T. Lee. Jacobi-Davidson methods for polynomial eigenvalue problems arising in quantum dot simulations. Preprint, 2009. [18] T.-M. Hwang, W.-W. Lin, J.-L. Liu, and W. Wang. Jacobi–Davidson methods for cubic eigenvalue problems. Numer. Linear Algebr. Appl., 12:605–624, 2005. [19] T.-M. Hwang, W.-W. Lin, W.-C. Wang, and W. Wang. Numerical simulation of three dimensional pyramid quantum dot. J. Comput. Phys., 196:208–232, 2004. [20] T.-M. Hwang, W.-C. Wang, and W. Wang. Numerical schemes for three-dimensional irregular shape quantum dots over curvilinear coordinate systems. J. Comput. Phys., 226:754–773, 2007. [21] J. Jin. The finite element method in electromagnetics. John Wiley, 2002. [22] C. Kittel. Introduction to solid state physics. Wiley, New York, 2005..

(34) 28. [23] R.B. Lehoucq, D.C. Sorensen, and C. Yang. ARPACK USERS GUIDE: Solution of large scale eigenvalue problems with implicitely restarted Arnoldi methods. SIAM, Philadelphia, 1998. [24] N. Madsen. Divergence preserving discrete surface integral methods for Maxwell’s curl equations using non-orthogonal unstructured grids. J. Comput. Phys., 119:34–45, 1995. [25] P. Monk and E. S¨ uli. A convergence analysis of Yee’s scheme on nonuniform grids. SIAM J. Numer. Anal., 31:393–412, 1994. [26] G. Mur and A. de Hoop. A finite-element method for computing three-dimensional electromagnetic fields in inhomogeneous media. IEEE Trans. on Magnetics, 21:2188–2191, 1985. [27] Y. Saad and M. Schultz. GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Statist. Comput., 7:856–869, 1986. [28] V. Simoncini. Algebraic formulations for the solution of the nullspace-free eigenvalue problem using the inexact shift-and-invert Lanczos method. Numer. Linear Algebr. Appl., 10:357–375, 2003. [29] G. L. G. Sleijpen, A. G. L. Booten, D. R. Fokkema, and H. A. van der Vorst. Jacobi-Davidson type methods for generalized eigenproblems and polynomial eigenproblems. BIT, 36:595–633, 1996. [30] G. L. G. Sleijpen and H. A. van der Vorst. A Jacobi-Davidson iteration method for linear eigenvalue problems. SIAM J. Matrix Anal. Appl., 17:401–425, 1996. [31] D. C. Sorensen. Implicit application of polynomial filters in a k-step Arnoldi method. SIAM J. Matrix Anal. Appl., 13:357–385, 1992. [32] G.W. Stewart. A Krylov–Schur algorithm for large eigenproblems. SIAM J. Matrix Anal. Appl., 23:601–614, 2001. [33] G.W. Stewart. Addendum to ”a Krylov–Schur algorithm for large eigenproblems”. SIAM J. Matrix Anal. Appl., 24:599–601, 2002. [34] H. A. Van Der Vorst. Bi-CGSTAB: A fast and smoothly converging variant of Bi- CG for the solution of nonsymmetric linear systems. SIAM J. Sci. Statist. Comput., 13:631–644, 1992. [35] W. Wang, T.-M. Hwang, W.-W. Lin, and J.-L. Liu. Numerical methods for semiconductor heterostructures with band nonparabolicity. J. Comput. Phys., 190:141–158, 2003. [36] K. Yee. Numerical solution of initial boundary value problems involving Maxwell’s equations in isotropic media. IEEE Trans. Antennas Propag., 14:302–307, 1966..

(35) 29. Appendix A. Krylov-Schur method with different preconditioners without changing formula ncv iter time conv nwew nzero 20 11 8147.6 8 8 0 25 7 6958.2 8 8 0 30 5 6631.2 8 8 0 35 4 6616.5 8 8 0 40 3 6436.9 8 8 0 45 3 6908.2 8 8 0 50 3 7970.6 8 8 0 55 2 6893.9 8 8 0 60 2 7430.0 8 8 0 65 2 8016.7 8 8 0 70 2 8833.5 8 8 0 75 1 6470.0 8 8 0 80 1 6909.2 8 8 0 85 1 7346.4 8 8 0 90 1 7724.9 8 8 0 95 1 8149.0 8 8 0 100 1 8556.2 9 8 0 Table 12. The results of Krylov-Schur method with SSOR(0.8) preconditioner.. ncv iter time conv nwew nzero 20 11 7238.9 13 10 0 25 7 6293.7 15 10 0 30 5 6099.8 16 10 0 35 4 6113.2 16 10 0 40 3 5873.7 16 10 0 45 3 6400.6 14 10 0 50 3 7115.9 13 10 0 55 2 6205.2 16 10 0 60 2 6817.6 14 10 0 65 2 7150.0 14 10 0 70 2 7892.2 17 10 0 75 1 5836.4 16 10 0 80 1 6214.6 14 10 0 85 1 6622.2 14 10 0 90 1 6981.7 13 10 0 95 1 7390.3 13 10 0 100 1 7755.2 17 10 0 Table 13. The results of Krylov-Schur method with SSOR(1.0) preconditioner. tol of linear system is 10−11.

(36) 30. ncv iter time conv nwew nzero 20 10 11100.1 13 10 0 25 6 9945.7 14 10 0 30 5 10084.8 14 10 0 35 4 10382.8 14 10 0 40 3 9666.3 15 10 0 45 3 10835.5 14 10 0 50 3 9360.3 15 10 0 55 2 10282.4 14 10 0 60 2 11067.3 13 10 0 65 2 12146.2 17 10 0 70 2 13022.4 17 10 0 75 1 9630.7 14 10 0 80 1 10348.1 14 10 0 85 1 11028.4 13 10 0 90 1 11606.8 14 10 0 95 1 12216.1 17 10 0 100 1 12854.2 17 10 0 Table 14. The results of Krylov-Schur method with ICC(1) preconditioner..

(37) 31. Appendix B. Krylovschur with different preconditioners. Changed formula ncv iter time conv nwew nzero 20 10 3420.9 13 10 0 25 7 3236.0 15 10 0 30 5 2844.8 14 10 0 35 4 2883.3 15 10 0 40 3 2647.2 15 10 0 45 3 2988.0 14 10 0 50 2 2493.3 14 10 0 55 2 2801.9 14 10 0 60 2 3046.1 14 10 0 65 2 3239.2 13 10 0 70 2 3494.8 16 10 0 75 1 2517.1 15 10 0 80 1 2694.7 14 10 0 85 1 2843.4 14 10 0 90 1 3018.9 13 10 0 95 1 3166.4 14 10 0 100 1 3332.4 14 10 0 Table 15. Krylovschur change formula Jacobi k = (π, π, 0) size 46875. ncv iter time conv nwew nzero 20 10 1638.0 13 10 0 25 6 1445.6 14 10 0 30 5 1432.7 14 10 0 35 4 1466.1 14 10 0 40 3 1411.5 15 10 0 45 3 1584.0 14 10 0 50 2 1326.6 14 10 0 55 2 1452.3 14 10 0 60 2 1556.2 14 10 0 65 2 1703.3 13 10 0 70 2 1815.6 16 10 0 75 1 1352.2 15 10 0 80 1 1457.0 14 10 0 85 1 1553.3 14 10 0 90 1 1620.1 13 10 0 95 1 1694.1 13 10 0 100 1 1773.2 16 10 0 Table 16. Krylovschur change formula SSOR(0.8) k = (π, π, 0) size 46875.

(38) 32. ncv iter time conv nwew nzero 20 10 1968.4 13 10 0 25 7 1969.5 14 10 0 30 5 1822.2 14 10 0 35 4 1856.0 14 10 0 40 3 1726.8 14 10 0 45 3 1923.6 13 10 0 50 2 1668.2 14 10 0 55 2 1839.6 14 10 0 60 2 1966.9 13 10 0 65 2 2148.8 13 10 0 70 1 1593.6 14 10 0 75 1 1709.5 14 10 0 80 1 1814.4 13 10 0 85 1 1924.2 13 10 0 90 1 2041.4 13 10 0 95 1 2159.2 16 10 0 100 1 2258.2 16 10 0 Table 17. Krylovschur change formula ICC(1) k = (π, π, 0) size 46875 ncv iter time conv nwew nzero 20 9 2898.6 13 10 0 25 7 2952.8 14 10 0 30 5 2724.8 14 10 0 35 4 2772.0 14 10 0 40 3 2521.4 14 10 0 45 3 2862.6 13 10 0 50 2 2481.7 14 10 0 55 2 2704.7 14 10 0 60 2 2911.3 13 10 0 65 2 3159.0 13 10 0 70 1 2410.4 14 10 0 75 1 2745.0 14 10 0 80 1 2968.9 13 10 0 85 1 3027.1 13 10 0 90 1 3198.4 13 10 0 95 1 3546.0 16 10 0 100 1 3529.4 16 10 0 Table 18. Krylovschur change formula ILU(1) k = (π, π, 0) size 46875.

(39) 33. Appendix C. ARPACK with FFT preconditioner ncv iter time conv nwew nzero 20 5 404.9 0 0 2 25 6 590.2 8 6 0 30 4 567.2 8 6 0 35 3 599.7 7 3 0 40 2 549.3 8 6 0 45 3 815.4 9 4 0 50 2 677.2 10 3 0 55 2 752.5 9 4 0 60 2 845.1 10 5 0 65 1 539.1 8 6 0 70 2 1005.7 10 1 0 75 1 624.6 9 3 0 80 1 667.0 10 3 0 85 1 709.7 9 3 0 90 1 753.5 9 4 0 95 1 796.1 10 4 0 100 1 839.8 6 0 0 Table 19. ARPACK FFT k = (π/4, 0.0, 0.0) size 375000. ncv iter time conv nwew nzero 20 7 446.9 0 0 1 25 6 572.3 7 3 0 30 4 557.6 8 5 0 35 4 715.7 0 0 1 40 3 701.3 8 4 0 45 3 791.0 8 2 0 50 1 401.8 7 7 2 55 2 729.5 10 4 0 60 2 810.8 9 2 0 65 2 891.7 8 1 0 70 2 972.6 10 1 0 75 1 605.5 8 2 0 80 1 646.7 9 3 0 85 1 688.4 10 4 0 90 1 730.1 9 3 0 95 1 771.9 10 1 0 100 1 814.0 4 0 0 Table 20. ARPACK FFT k = (π/2, 0.0, 0.0) size 375000.

(40) 34. ncv iter time conv nwew nzero 20 7 461.9 0 0 2 25 7 639.3 8 4 0 30 5 654.5 10 5 0 35 4 710.4 7 3 0 40 3 679.1 10 4 0 45 3 927.8 0 0 4 50 2 641.5 10 4 0 55 2 722.5 9 3 0 60 2 802.7 8 2 0 65 2 891.2 10 2 0 70 2 972.6 10 3 0 75 1 599.5 7 3 0 80 1 640.6 10 4 0 85 1 681.7 7 3 0 90 1 723.7 9 2 0 95 1 765.2 10 1 0 100 1 806.5 9 1 0 Table 21. ARPACK FFT k = (3π/4, 0.0, 0.0) size 375000 ncv iter time 20 6 403.4 25 4 464.9 30 5 654.9 35 2 443.0 40 3 679.7 45 3 798.9 50 2 651.0 55 1 437.7 60 2 804.9 65 2 885.5 70 2 966.5 75 2 1049.2 80 1 641.8 85 1 683.5 90 1 725.6 95 1 766.6 100 1 809.0 Table 22. ARPACK FFT. conv nwew nzero 0 0 1 8 6 2 6 2 0 7 7 0 9 4 0 10 3 0 8 3 0 0 0 0 9 2 0 9 1 0 9 1 0 10 1 0 8 3 0 7 1 0 9 1 0 10 2 0 8 1 0 k = (π, 0.0, 0.0) size 375000.

(41) 35. ncv iter time conv nwew nzero 20 3 268.4 0 0 0 25 4 448.4 0 0 1 30 3 473.5 3 3 3 35 3 583.7 9 5 1 40 2 513.4 5 5 2 45 2 593.8 8 5 0 50 2 664.9 10 3 0 55 2 754.7 10 3 0 60 1 477.9 2 2 4 65 1 517.9 5 5 2 70 1 558.7 8 5 2 75 1 600.1 8 3 0 80 1 641.7 10 3 0 85 1 682.5 10 3 0 90 1 724.7 7 0 0 95 1 766.0 7 0 0 100 1 807.2 10 0 0 Table 23. ARPACK FFT k = (π, π/4, 0.0) size 375000 ncv iter time conv nwew nzero 20 5 346.6 0 0 1 25 4 439.0 0 0 0 30 3 505.4 0 0 5 35 3 573.4 9 4 0 40 2 511.1 5 4 3 45 2 581.3 9 4 0 50 2 663.5 10 4 0 55 2 736.0 10 3 0 60 2 798.9 10 2 0 65 1 515.8 0 0 10 70 1 555.9 9 4 0 75 1 597.0 9 3 0 80 1 638.0 10 3 0 85 1 678.9 10 3 0 90 1 720.7 10 2 0 95 1 762.0 8 0 0 100 1 803.2 10 0 0 Table 24. ARPACK FFT k = (π, π/2, 0.0) size 375000.

(42) 36. ncv iter time conv nwew nzero 20 5 353.2 0 0 0 25 5 503.0 0 0 0 30 3 453.4 7 7 0 35 3 561.8 9 6 0 40 2 500.1 10 8 0 45 2 570.9 10 5 1 50 2 642.3 8 4 0 55 2 723.0 9 2 0 60 1 472.6 8 8 0 65 1 511.8 10 7 0 70 1 511.6 8 5 1 75 1 592.2 9 5 1 80 1 632.7 9 4 0 85 1 673.2 10 3 0 90 1 713.9 10 2 0 95 1 754.3 8 0 0 100 1 795.5 9 0 0 Table 25. ARPACK FFT k = (π, 3π/4, 0.0) size 375000 ncv iter time conv nwew nzero 20 9 525.3 2 0 2 25 7 609.7 8 4 0 30 3 464.8 7 5 0 35 4 688.7 10 4 0 40 3 656.4 10 4 0 45 3 765.7 8 2 0 50 2 636.3 8 4 0 55 2 708.9 8 2 0 60 2 785.9 9 2 0 65 2 866.1 9 2 0 70 2 945.7 9 2 0 75 1 587.5 7 3 0 80 1 627.0 8 4 0 85 1 666.9 9 3 0 90 1 707.8 8 2 0 95 1 747.8 0 0 10 100 1 788.7 9 2 0 Table 26. ARPACK FFT k = (π, π, 0.0) size 375000.

(43) 37. ncv iter time conv nwew nzero 20 11 626.5 4 1 1 25 4 450.5 0 0 0 30 5 599.0 0 0 3 35 4 751.1 9 3 0 40 >50000 45 2 571.3 9 5 0 50 2 634.5 9 4 0 55 2 704.4 8 2 0 60 2 784.5 9 2 0 65 2 869.0 9 2 0 70 2 942.5 10 2 0 75 1 585.6 8 4 0 80 1 625.8 9 4 0 85 1 665.8 9 3 0 90 1 706.9 7 2 0 95 1 747.0 0 0 10 100 1 787.9 7 2 0 Table 27. ARPACK FFT k = (π, π, π/4) size 375000 ncv iter time conv nwew nzero 20 9 536.8 1 1 0 25 5 494.2 10 8 0 30 3 454.2 9 7 0 35 3 548.6 9 7 0 40 2 501.1 10 8 0 45 2 563.4 9 6 0 50 2 627.8 8 5 0 55 2 705.2 8 3 0 60 1 466.9 9 7 0 65 1 505.8 10 8 0 70 1 545.2 8 6 0 75 1 585.7 9 6 0 80 1 625.7 9 5 0 85 1 665.9 10 6 0 90 1 706.5 8 2 0 95 1 746.4 9 2 0 100 1 786.7 7 1 0 Table 28. ARPACK FFT k = (π, π, π/2) size 375000.