An Efficiency Study of Polynomial Eigenvalue Problem Solvers for Quantum Dot Simulations

(1)

PROBLEM SOLVERS FOR QUANTUM DOT SIMULATIONS

TSUNG-MING HUANG, WEICHUNG WANG, AND CHANG-TSE LEE

Abstract. Nano-scale quantum dot simulations result in large-scale polyno-mial eigenvalue problems. It remains unclear how these problems can be solved efficiently. We fill this gap in capability partially by proposing a polynomial Jacobi-Davidson method framework, including several varied schemes for solv-ing the associated correction equations. We investigate the performance of the proposed Jacobi-Davidson methods for solving the polynomial eigenvalue problems and several Krylov subspace methods for solving the linear eigenvalue problems with the use of various linear solvers and preconditioning schemes. This study finds the most efficient scheme combinations for different types of target problems.

1. Introduction

A standard matrix polynomial eigenvalue problem can be written as

(1) P(λ)u ≡ (

τ

X

i=0

λiAi)u = 0,

where (λ, u) is the corresponding eigenpair with λ ∈ C and u ∈ CN, integer τ is the degree of the matrix polynomial, and Ai∈ RN ×N are the coefficient matrices.

Solving large-scale polynomial eigenvalue problems has a long term history and still remains an active research topic due to its computational challenges and wide applications. Scientific and engineering studies that lead to polynomial eigenvalue problems include nano-scale quantum mechanism of degree 1 [7], 3 [34], or 5 [15]; high speed railway vibration of degree 2 [3, 14]; and plasma physics of degree 3 [13]. In this article, we focus on the polynomial eigenvalue problems arising in sim-ulations of nano-scale quantum dots (QDs). In particular, we consider how the Jacobi-Davidson methods and Krylov subspace methods may be used to solve these problems efficiently. This study aims to do the following:

• derive a framework of the Jacobi-Davidson method for solving polynomial eigenvalue problems;

• propose various schemes for solving the correction equation in the Jacobi-Davidson method;

• provide a set of large-scale polynomial eigenvalue benchmark problems aris-ing in the numerical simulation of quantum dots; and

• to perform intensive numerical performance comparisons for the various schemes against the benchmark problems.

Date: December 1, 2009.

Key words and phrases. polynomial eigenvalue problems, Jacobi-Davidson methods, correction equations, Krylov subspace methods, Schr¨odinger equation, quantum dot.

(2)

The paper is organized as follows. We first introduce the model problems aris-ing in quantum dot simulations in Section 2. Several variants of Jacobi-Davidson methods are discussed in Section 3. Several Krylov subspace methods for solving linear eigenvalue problems are discussed in Section 4. Numerical experiments are presented and analyzed in Section 5. We conclude the paper in Section 6.

2. Model Problems

Nano-scale semiconductor quantum dots are materials in which the carriers are confined within the dots in all three dimensions. These carries consequently have wavelike properties with discrete energy levels that are induced. Over the past few years, numerous studies regarding nano-scale quantum dots have been conducted to examine their physical properties [4, 10, 20, 25] and applications [2, 8, 9, 19, 24]. Other than theoretical and experimental methods, numerical simulations also play important roles to investigate insights into a QD’s electronic and optical properties [26, 28, 35].

We focus on a single particle conduction band model in this article. As shown in Figure 1, which is a structure scheme of the model, a QD is embedded in a matrix. The governing equation of this model can be described by the Schr¨odinger equation in general

(2) −∇ · ( ~

2

2m(x)∇u) + c(x)u = λu, or, in cylindrical coordinates,

(3) −1 r ∂ ∂r(r ~2 2m(x)∂ru) − 1 r∂θ( 1 r ~2 2m(x)∂θu) − ∂z( ~2 2m(x)∂zu) + c(x)u = λu. Here, ~ is the reduced Plank constant, the eigenvalue λ is the total electron energy, and the corresponding eigenvector u denotes the wave function. The variable m(x) represents the electron effective mass, and c(x) is the confinement potential.

In hetero-structures, both m(x) and c(x) are discontinuous at the interface. Par-ticularly, in a constant effective mass model, m(x) and c(x) are piecewise constant functions with respect to the space variable x:

m(x) = m1 in the dot, m2 in the matrix, c(x) = c1 in the dot, c2 in the matrix.

The Ben Daniel-Duke interface conditions associated with the discontinuity in m are imposed as follows

(4) u|_D += u|D−, ~2 2m2 ∂u ∂n _∂D + = ~2 2m1 ∂u ∂n _∂D − ,

where D stands for the QD domain. The normal direction n with the subscripts + and − denotes the corresponding outward normal derivatives of the interface defined for the matrix and dot regions, respectively. If the non-parabolicity of the electron’s dispersion relation is taken into account [5], the effective mass model for the interface conditions (4) becomes

(5) 1 m`(λ) =P 2 ` ~2 ₂ λ + g`− c` + 1 λ + g`− c`+ δ` ,

(3)

where P`, g`, and δ` are the momentum, main energy gap, and spin-orbit splitting

in the `th region, respectively. The following parameters are used in our numerical experiments: c1 = 0.000, g1 = 0.235, δ1 = 0.81, P1 = 0.2875, c2 = 0.350, g2 =

1.590, δ2 = 0.80, and P2= 0.1993. Additionally, we apply homogeneous Dirichlet

conditions on the boundary of the quantum matrix.

To simulate QDs fabricated in laboratories, various discretization schemes have been developed for various QD geometries such as a cylinder [16, 18, 34], cone [23], pyramid [6, 15, 27, 33], or irregular shape [17]. Among these QD geometries and discretizations, we pick the following five settings and use the induced polynomial eigenvalue problems as the test problems. We omit detailed derivations of the discretizations and the resulting eigen-systems here as they can be found in the corresponding references.

• Cylindrical QD with the constant effective mass model [18]:

(6) Pcc(λ)u ≡ (Acc0 + λI)u,

where Acc0 is unsymmetric.

• Cylindrical QD with the non-parabolic effective mass model [34]:

(7) Pcn(λ)u ≡ 3 X i=0 λiAcn_i ! u.

• Pyramidal QD with the constant effective mass model [15]:

(8) Ppc(λ)u ≡ (Apc0 − λI)u,

where Apc₀ is a symmetric positive definite matrix.

• Pyramidal QD with non-parabolic effective mass model [15]:

(9) Ppn(λ)u ≡ 5 X i=0 λiApn_i ! u.

• Irregular QD with the constant effective mass model over the skewed coor-dinate system [17]:

(10) Pic(λ)u ≡ (Aic0 + λA

ic 1)u,

where Aic₀ is symmetric positive definite matrix and Aic₁ is a diagonal matrix. The above test problems have different degrees in the polynomial eigenvalue problems and different distributions of matrix entry magnitudes. We use the eigen-solvers proposed in Sections 3 and 4 to solve these problems, to investigate the efficiency, and to justify the performance.

3. Polynomial Jacobi-Davidson Algorithm

Jacobi-Davidson type methods are competitive numerical methods when inte-rior eigenvalues are of interest, especially for large-scale eigenvalue problems. Such attractive properties mainly result from the fact that: (i) the coefficient matrices are used implicitly in the matrix-vector multiplication forms; (ii) no inverse of the matrix is needed even to compute interior eigenvalues; and (iii) the desired eigen-pairs are approximated iteratively by a gradually expanding subspace, and then corrector vectors can be approximately computed and preconditioned to achieve efficiency. However, there is only sparse literature that addresses how variants of Jacobi-Davidson methods perform on large-scale polynomial eigenvalue problems.

(4)

We describe the main idea of the Jacobi-Davidson method for the polynomial eigenvalue problem as follows from the viewpoint of Taylor expansion. We see that the following derivation is a straightforward generation of the standard lin-ear eigenvalue problem discussed in [29]. Suppose Vk is a k-dimensional subspace

that has an orthogonal unitary basis v1, v2, . . . , vk. Let (θk, uk) be a Ritz pair (an

approximate eigenpair) of P(λ) and (θk, sk) be an eigenpair of Vk∗P(λ)Vks = 0,

where kskk2 = 1, uk = Vksk, and Vk = [v1, · · · , vk]. To expand the subspace Vk

successively, the Jacobi-Davidson method finds the orthogonal complement for the current approximation uk. In other words, starting from the Ritz pair (θk, uk), we

intend to find a corrector t ⊥ uk such that

(11) P(λ)(uk+ t) = 0.

Obviously, it is impractical to solve Eq. (11) directly as it is another polynomial eigenvalue problem that is equivalent to the original target problem (1). Instead, we may expand Eq. (11) and rewrite P(λ) in terms of θk by using a Taylor expansion

to obtain an approximation to the corrector t. First, we rewrite Eq. (11) as P(λ)t = −P(λ)uk = −rk+ (P(θk) − P(λ))uk,

where the residual vector

rk= P(θk)uk.

Furthermore, by assuming θkis close to λ, we may use Taylor’s Theorem to obtain

(P(θk) − P(λ))uk = (θk− λ)P0(θk) − 1 2(θk− λ) 2_P00_(ξ k) uk = (θk− λ)pk− 1 2(θk− λ) 2_P00_(ξ k)uk,

where ξk is between λ and θk and

pk= P0(θk)uk ≡ d dλP(λ) _λ=θ k uk= τ X i=1 iθi−1_k Ai ! uk.

Consequently, Eq. (11) is equivalent to P(λ)t = −rk+ (θk− λ)pk− 1 2(θk− λ) 2_P00_(ξ k)uk. (12)

We can further manipulate the terms containing (θk − λ) to derive practical

computational schemes for finding t. First, by using the fact that rk⊥ uk, or

u∗_krk= u∗kP(θk)uk= s∗kV ∗ kP(θk)Vksk = 0, we multiplyI −pku∗k u∗ kpk

on both sides of Eq. (12) to eliminate the term containing (θk− λ) and to obtain I −pku ∗ k u∗ kpk P(λ)t = −rk− 1 2(θk− λ) 2 I −pku ∗ k u∗ kpk P00(ξk)uk.

Second, we neglect the second order term containing (θk − λ)2 on the right hand

side to obtain a linear (in terms of λ) approximation of P(λ)t satisfying I −pku ∗ k u∗_kpk P(λ)t = −rk and t ⊥ uk. (13)

(5)

Practically, we further apply the orthogonal projection (I − uku∗k) and approximate

P(λ) by P(θk) and then form the following correction equation

I −pku ∗ k u∗_kpk P(θk)(I − uku∗k)˜t = −rk and ˜t ⊥ uk. (14)

Based on the above discussions, a general polynomial Jacobi-Davidson method designed to compute all the desired eigenvalues for the problem (1) is shown in Algorithm 1.

Algorithm 1 Polynomial Jacobi-Davidson Algorithm for (Pτ

i=0λ i_A

i)u = 0.

Input: Coefficient matrices Ai for i = 0, . . . , τ , the number of desired eigenvalues

k and an initial orthonormal vector Vini.

Output: The desired eigenpairs (λj, uj) for j = 1, . . . , k. 1: Set V = [Vini], Vu= [ ], and Λ = ∅.

2: for j = 1 to k do

3: Compute Wi= AiV and Mi= V∗Wi for i = 0, . . . , τ . 4: while (user defined stopping criteria are not satisfied) do

5: Compute the eigenpairs (θ, s) of (Pτ_i=0θiMi)s = 0.

6: Select the desired eigenpair (θ, s) with ksk2= 1 and θ /∈ Λ. 7: Compute u = V s, p = P0(θ)u, r = P(θ)u.

8: Solve the correction equation I −pu ∗ u∗_p P(θ)(I − uu∗)t = −r approximately for t ⊥ u .

9: Orthogonalize t against V ; set v = t/ktk2. 10: Compute wi= Aiv, Mi= Mi V∗wi v∗Wi v∗wi for i = 0, . . . , τ .

11: Expand V = [V, v] and Wi= [Wi, wi] for i = 1, . . . , τ . 12: end while

13: Set λj= θ, uj = u, Λ = Λ ∪ {λj}.

14: Perform locking by orthogonalizing uj against Vu; Compute uj = uj/kujk2;

Update Vu= [Vu, uj].

15: Choose an orthonormal matrix Vini⊥ Vu; Set V = [Vu, Vini]. 16: end for

Now, we focus on three schemes for approximately solving Eq. (14). This is an essential step in the polynomial Jacobi-Davidson method that may affect the over-all performance significantly. We approximately solve Eq. (14) by a preconditioned iterative method, e.g., GMRES with SSOR preconditioner. We call this precondi-tioned iterative process corresponding to line 8 of Algorithm 1 as the “inner loop” of the algorithm. On the other hand, we call the while-loop in lines 4 to 12 as the “outer loop” of the algorithm.

(6)

Three schemes, SOneLS, ST woLS, and SOneStep, are proposed below to solve

Eq. (14) approximately. SOneLS solves one linear system by a preconditioned

it-eration method. ST woLS and SOneStep solve two linear systems by preconditioned

iterations, but SOneStep conducts only one step in the preconditioned iterations.

• Scheme SOneLS. In each step of the preconditioned iterations for solving

correction equation (14), we need to solve a linear system in a form such that

Mpz = y, z ⊥ uk

(15)

where y is a certain given vector that is orthogonal to uk and

Mp≡ I −pku ∗ k u∗_kpk M (I − uku∗k)

with preconditioner M of P(θk). Under the requirement z ⊥ uk, the

solu-tion of Eq. (15) is given by

z = M−1y + ηkM−1pk with ηk= − u∗_kM−1_y u∗_kM−1_p k . (16)

It is clear that the vector M−1pk and the inner product u∗kM −1_p

k need

to be computed only once in the first step of the preconditioned iteration. Consequently, other iterative steps need only the preconditioning operations in the form of M−1y.

• Scheme ST woLS. By (14) and t ⊥ uk, it follows that

P(θk)t =

u∗_kP(θk)t

u∗_kpk

pk− rk ≡ ηk(t)pk− rk.

(17)

We can then solve the two linear systems P(θk)z = −rk and P(θk)z = pk

(18)

approximately by a preconditioned iterative method to obtain the approx-imate solution z1 and z2, respectively. Then we compute

˜ t = z1+ ηkz2 for ηk = − u∗_kz1 u∗_kz2 . (19)

Here ˜t as an approximate solution to Eq. (17) that satisfies the requirement ˜

t ⊥ uk. In ST woLS, two linear systems (18) need to be solved approximately

by a preconditioned iteration method to obtain the approximated solution ˜

t.

• Scheme SOneStep. We may reduce the cost for computing ˜t that solves

Eq (17) in the inner loop. Here we only conduct one preconditioned itera-tion. Namely, ˜t is computed by

˜ t = −M−1rk+ ηkM−1pk for ηk = u∗_kM−1_r k u∗ kM−1pk , (20)

(7)

4. Krylov subspace methods

While the polynomial eigenvalue problems (6)-(10) can be solved by the Jacobi-Davidson methods presented in Section 3, the linear eigenvalue problems (6), (8), and (10) can also be solved by the the so-called Krylov subspace methods, such as the Lanczos, Arnoldi, and Krylov-Schur methods. In this section, we briefly describe how the Krylov subspace methods can be used to solve the standard eigen-value problem A0u = λu. Similar ideas can be generalized to general eigenvalue

problems A0u = λA1u by the A1-inner product.

First, we define the Krylov decomposition [30] as follows. Let Vk+1= [Vk, vk+1] ∈

RN ×(k+1) be an orthonormal matrix, where Vk ∈ RN ×k and vk+1 ∈ RN ×1. An

orthonormal Krylov decomposition of order k is a relation of the form A0Vk= VkBk+ vk+1bTk+1,

(21)

where Bk ∈ Rk×k is a Rayleigh quotient R(A0; Vk) = VkTA0Vk and bk+1 ∈ Rk×1.

Note that if A0 is symmetric, Bk is tridiagonal, and bk+1 = βkek, then (21) is a

Lanczos decomposition of order k. If Bk is upper Hessenberg, bk+1 = βkek, then

(21) is an Arnoldi decomposition of order k. Here βk ∈ R and ek ∈ Rk×1 is the

standard unit vector (kth column of identity matrix).

Consequently, the Ritz pairs associated with Ukcan be computed as follows. Let

(θi, vi) be an eigenpair of Bk and let (θi, yi) = (θi, Ukvi) be the Ritz pair of A. By

(21), we have

kAyi− θiyik2 = kAUkvi− θiUkvik2= k(UkBk+ uk+1bTk+1)vi− θiUkvik2

= kUk(Bkvi− θivi) + (bk+1T vi)uk+1k2= |bTk+1vi|.

As k increases, some of these Ritz pairs will approach the eigenpairs of A. That is, |bT

k+1vi| → 0 for some i.

Theoretically, we can keep expanding the Krylov decomposition until the Ritz eigenpairs converge to the desired eigenpairs. Practically, however, the expanding process is limited to avoid loss of numerical orthogonality of Vk and to use a

rea-sonable amount of memory for storing Vk. A general idea of restarting is that, after

Vp has been computed, a new Krylov process is performed to compute a different

Krylov decomposition of order p with “better” initial vectors. For example, (i) an “explicit restart” strategy [11, 12] reruns the Krylov decomposition by using the approximate Schur vectors associated with the first not-yet-converged eigenvalue as an initial vector. (ii) An “implicit restart” [21] combines the Krylov decomposition process with the implicitly shifted QR algorithm. This implicit restart process is more efficient and numerically stable than explicit restart. (iii) A “Krylov-Schur method” [31, 32] that can be seen as an improvement on traditional Krylov subspace methods. We sum up all the processes in Algorithm 2.

5. Numerical Results

We study how various Jacobi-Davidson and Krylov subspace methods perform when solving the polynomial eigenvalue problems arising in quantum dot simula-tions. The properties of the test problems are shown in Table 1. All of the test problems are solved by the Jacobi-Davidson methods. Only problems of degree 1 (linear or generalized) problems are solved by the Krylov type methods. Note that, as the eigenvector solutions of the cylindrical and irregular QDs are periodical in the azimuthal direction, the 3D problems are transformed to a sequence of 2D

(8)

Algorithm 2 Restarting Krylov Subspace Algorithm for A0u = λu.

Input: Coefficient matrix A0; initial orthonormal vector v1; number of desired

eigenpairs k; size of subspace for restarting p. Output: The desired k eigenpairs of A0.

1: Generate the Krylov decomposition of order j (j ≥ k), by starting from v1:

A0Vj = VjBj+ vj+1bTj+1. 2: Compute the Ritz pairs of A0 from Bj and Vj.

3: while (the desired k eigenpairs of A0 are not convergent) do 4: Extend the Krylov decomposition from order j to p:

AVp= VpBp+ vp+1bTp+1. 5: Compute the Ritz pairs of A0 from Bp and Vp.

6: Reformat a new Krylov decomposition with order j by a restarting process.

7: end while

problems by the truncated Fourier series. Consequently, the discretization domain dimensions of problems P11

2D, P232D, and P612D are over the two dimensional

radial-longitude planes. Except for the results shown on Figure 2, all of the numerical experiments are conducted on an HP BL460c workstation composed of two Intel Dual-Core 5160 3.0 GHz CPUs and 32 GB main memory.

In Section 5.1 and 5.2, we study how the correction equation solution schemes and the preconditioners affect the performance of the Jacobi-Davidson methods pre-sented in Algorithm 1. We implement the Jacobi-Davidson methods with Fortran 90. In Section 5.3, we investigate the performance of various Krylov subspace meth-ods. In Section 5.4, we make an overall comparison of all methods and conclude the most efficient scheme combinations for each of the problems.

5.1. Correction Equation Solution Schemes. To compare the efficiency of the three schemes (SOneLS, ST woLS, and SOneStep) for solving the correction equation

(14), we use GMRES to solve the linear systems in (14) and (18) with the SSOR (symmetric successive over-relaxation) preconditioner M = (D+ωL)D−1(D+ωU ). Here P(θk) = L + D + U and L, D, and U are the strict lower triangular, diagonal,

and strict upper triangular matrices, respectively. The parameter ω is chosen from 0.8 to 1.95. The timing results for problems P11

2D, P2 3 2D, P3 1 3D, P5 5 3D, and P6 1 2Dare

shown in Figure 2. The CPU timing results are computed by summing up the total cost for computing the smallest five positive eigenvalues and their corresponding eigenvectors.

Observing the figure, we can see that SOneLS outperforms the other schemes in

almost all cases. Furthermore, (i) the cost of ST woLS is roughly 2 times of the cost

of SOneLS. Such numerical results coincide with the corresponding algorithms. (ii)

SOneStep is usually the slowest method. While solving Pic(λ)u = 0, it even fails to

converge for ω < 1.7.

Remark. As mentioned in [29], the scheme involving (17) is not efficient for solving linear eigenvalue problems. We have similar observations for higher order polynomial eigenvalue problems and we believe the reasons behind this are similar to the linear cases. Using the definition of residual rk = P(θk)uk, we can rewrite

(17) as

(9)

Such choice is actually equivalent with t = P(θk)−1pk, as t is made orthogonal

to uk afterwards. Let ˜t be an approximated solution of P(θk)t = pk. The angle

between ˜t and uk may be small. Consequently, we do not expect that the subspace

expansion would be efficient. Our numerical experiments of the five test problems verify this conjecture. The norm of residuals can only be reduced to 10−2_.

5.2. Effects of Preconditioning. We have observed that SOneLS outperforms

another two schemes while using the preconditioner SSOR(ω). In this subsection, we further compare the performance of preconditioners, including SSOR(ω), ILU(`) (incomplete LU factorization), ICC(`) (incomplete Cholesky factorization), and Ja-cobi when using the the SOneLSscheme. The numerical results for solving problems

P11_2D, P232D, P3 1 3D, P5 5 3D, and P6 1

2Dare shown in Table 2. Note that we have scanned

the parameter ω of SSOR(ω) from 0.8 to 1.98 and the fill-in level ` of ILU(`) and ICC(`) from 0 to 12 for each of the test problems. However, Table 2 presents only the particular parameters that achieve better timing results. The table suggests the following observations.

• For P11

2D, P232D, and P612D, the preconditioner ILU(6) or ILU(7) results in

the best timing results. But, for P31

3D and P553D, SSOR(1.85) achieves the

best timing results.

• For ILU and ICC, the best ` for P11

2D, P232D, and P612D is either 6 or 7.

However, the best ` for P313D and P5 5

3D is 0 or 1.

The above behaviors are mainly due to the bandwidths of the corresponding coefficient matrices. In P11

2D, P2 3

2D, and P6 1

2D, the discretizations are associated

with two-dimensional planes and thus have smaller bandwidths. In contrast, the matrices associated with P31

3D and P5 5

3D have larger bandwidths due to the

three-dimensional discretization. Figure 3-(a), Figure 4, and Figure 3-(c) illustrate the sparsity of the coefficient matrices associated with P11

2D, P232D, and P612D with

smaller matrix sizes, respectively. Figure 3-(b) and Figure 5 illustrate the sparsity of the coefficient matrices associated with P31

3Dand P553Dwith smaller matrix sizes,

respectively.

While ILU and ICC preconditioners with higher level fill-ins can result in less iterations, such preconditioners may also decrease the overall efficiency. The over-all efficiency between different `’s and preconditioners (SSOR versus ILU/ICC) depends on such a trade-off.

5.3. Performance of Krylov subspace methods. We also solve the linear eigen-value problems P11

2D, P413D and P612D by the explicit restarting Lanczos/Arnoldi

method (ER), implicit restarting the Lanczos/Arnoldi method (IR), and Krylov-Schur method (KS). Note that another linear problem P31

3Dis skipped as the matrix

size is too large for the direct solvers in our computers.

We use the ER and KS provided by the software package SLEPc (Scalable Li-brary for Eigenvalue Problem Computations) [11, 12]. For IR, we use the ARPACK [22] wrapper included in SLEPc. We use PETSc (Portable, Extensible Toolkit for Scientific Computation) [1] to solve the linear system solvers and to perform the preconditionings within ER, KS, and IR. In particular, to solve the associated lin-ear systems within these three methods, we consider (i) direct solvers based on LU or Cholesky factorization and (ii) GMRES iterative solvers with preconditioners SSOR, ILU, or ICC.

(10)

The results for computing the five smallest positive eigenvalues of P112D, P4 1 3D

and P612D are summarized in Tables 3, 4 and 5. In the tables, “Rstno”, “Itno”, and

“Time” stand for the restarting number p in Algorithm 2, the number of the while-loop starting from line 3 to line 7 in Algorithm 2, and total CPU time in seconds for computing five target eigenpairs, respectively. Note that we have scanned the parameter ω in SSOR from 0.8 to 1.98 and the fill-in level parameter ` of ILU and ICC from 0 to 12 in our numerical experiments. The tables only present the results with better timing results. We highlight some observations from the tables as follows.

• KS outperforms ER and IR in almost all cases. In particular, KS and IR are more efficient and numerically stable than ER. Furthermore, the performance of KS is slightly better than that of IR.

• The bandwidths play an important role in determining efficiency of the lin-ear system solvers. In particular, for P112D and P6

1

2D, the Krylov subspace

methods with direct linear solvers are better than the Krylov subspace methods with iterative linear solvers. For P41

3D, the Krylov subspace

meth-ods with iterative linear solvers perform better.

Such behavior is again due to the bandwidths of the coefficient matrices. As the discretizations of these two problems involve only two-dimensional planes, the corresponding bandwidths of matrices Acc

0 and Aic0 in (6) and

(10) are small. Direct solvers remain efficient even when dense band matri-ces are introduced after LU or Cholesky factorizations have been performed. In contrast, the discretization of P41

3Dinvolves all three dimensions and the

corresponding bandwidth is large. In such cases, direct solvers are not effi-cient due to the fill-ins of LU or Cholesky factorizations. See Figure 3 for sparsity examples Acc₀ , Apc₀ , and Aic₀.

We conclude this subsection with the following remarks.

(1) For the problems with symmetric positive definite coefficient matrices (e.g. P31_3D and P61

2D), the eigenvalues are positive and the five smallest

posi-tive eigenvalues can be computed “without” using shift-and-invert. This approach involves matrix-vector multiplications and does not need to solve any linear system. However, our numerical experiments suggest that such an approach is not efficient.

In particular, if we do not perform shift-and-invert in the Krylov sub-space methods, the methods either converge slowly (P313D, as shown in

Ta-ble 7) or fail to convergence (P612D). Table 7 also suggests that the methods

take much more CPU time as compared to the Jacobi-Davidson methods shown Table 2.

(2) Direct solvers with LU factorization outperforms Cholesky factorization, as shown in Table 4. This is because a Cholesky factorization spends much more time in symbolic factorization and leads to a much larger fill-in ratio in order to maintain the symmetry of the matrices. See Table 6 for a detailed breakdown analysis for LU- and Cholesky-based direct solvers.

5.4. Overall Comparisons. Observing the results shown in Tables 2, 3, 4, 5, we can make an overall comparison between the Jacobi-Davidson methods and the Krylov subspace methods and conclude the best scheme combinations for the

(11)

different types of problems in Table 8. As discussed above, the bandwidth of the coefficient matrices is the key component that affects the choice of schemes.

6. Conclusion

We consider degree 1, 3, and 5 eigenvalue problems arising in numerical sim-ulations of nano-scale quantum dots. We have shown that a polynomial Jacobi-Davidson method can solve all of these problems without linearizing the higher de-gree problems. As the efficiency of the polynomial Jacobi-Davidson method mainly relies on solving the correction equation, we have discussed three schemes regarding how to compute the approximate solutions of the correction equations. We have also conducted intensive numerical experiments by using the Jacobi-Davidson and Krylov subspace methods with various linear solvers and preconditioners. The nu-merical results suggest the most efficient scheme combinations for different types of problems, which are shown in Table 8. We also find that the bandwidth of the coefficient matrices is the key component that affects the choice of schemes.

It is possible to further improve the solver efficiencies by parallel computing. We then need to find efficient preconditioners that are suitable to the target problems and the particular parallel architectures of interest. The best scheme combinations for different types of problems may also change consequently on parallel computers.

Acknowledgments

We are grateful to Wen-Wei Lin for many helpful discussions and comments. This work is partially supported by the National Science Council, the Taida Institute of Mathematical Sciences, and the National Center for Theoretical Sciences in Taiwan.

(12)

Pyramid Dot Cuboid Matrix

Figure 1. Structure schema of a pyramidal and a cylindrical quantum dot. Each of the quantum dots is embedded in a hetero-structure matrix.

EVP P112D P232D P33D1 P413D P553D P612D

QD Geo. cylinder cylinder pyramid pyramid pyramid irregular

QD Geo. Sym. radial radial non-rad. non-rad. non-rad. radial

Disc. Dom. 2D 2D 3D 3D 3D 2D

QD Eff. Mass constant non-para. constant constant non-para. constant

Mtx. Degree (τ ) 1 3 1 1 5 1

Mtx. Prop. Unsym. Unsym. S.P.D. S.P.D. Sym. S.P.D.

Mtx. Size 1, 200, 000 1, 228, 150 5, 216, 783 77, 315 5, 216, 783 1, 228, 150

Table 1. Test problem properties. The table shows the names of the eigenvalue problems, QD geometry, QD geometric symmetry, dimension of discretization domains, QD effective mass, degree of the eigenvalue problems, properties of the coefficient matrices, and coefficient matrix size of the eigenvalue problems. The superscript and subscript of problem names denotes the degree and the dis-cretization domain dimension, respectively. Unsym. and S.P.D. stands for unsymmetric and symmetric positive definite, respec-tively.

References

1. S. Balay, K. Buschelman, V. Eijkhout, W.D. Gropp, D. Kaushik, M.G. Knepley, L.C. McInnes, B.F. Smith, and H. Zhang, PETSc users manual, Argonne National Laboratory, Tech. Rep. ANL-95/11-Revision 2 (2002), no. 5.

2. G. Burkard, D. Loss, and D. P. DiVincenzo, Couple quantum dots as quantum gates, Phys. Rev. B 59 (1999), 2070–2078.

3. E. K.-W. Chu, T.-M. Hwang, W.-W. Lin, and C.-T. Wu, Vibration of fast trains, palindromic eigenvalue problems and structure-preserving doubling algorithms, J. Comput. Appl. Math. 219 (2008), 237–252.

4. M. A. Cusack, P. R. Briddon, and M. Jaros, Electronic structure of InAs/GaAs self-assembled quantum dots, Phys. Rev. B 54 (1996), 2300–2303.

5. E. A. de Andrada e Silva, G. C. La Rocca, and F. Bassani, Spin-orbit splitting of electronic states in semiconductor asymmetric quantum wells, Phys. Rev. B 55 (1997), no. 24, 16293. 6. D. El-Moghraby, R.G. Johnson, and P. Harrison, Calculating modes of quantum wire and dot

systems using a finite differencing technique, Computer Physics Communications 150 (2003), 235–246.

(13)

0.8 0.9 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2 103

104 Matrix size = 1,200,000

ω

CPU Time (log)

S_OneStep S_TwoLS S_OneLS

(a) P11

2D(Pcc(λ)u = 0 type EVP)

0.8 0.9 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2 103

104

Matrix size = 1,228,150

ω

CPU Time (log)

S_OneStep S_OneLS

(b) P23

2D (Pcn(λ)u = 0 type EVP)

0.8 0.9 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2 103

ω

CPU Time (log)

(c) P31

3D (Ppc(λ)u = 0 type EVP)

0.8 0.9 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2 103

ω

CPU Time (log)

(d) P55

3D(Ppn(λ)u = 0 type EVP)

0.8 0.9 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2 103

104

ω

CPU Time (log)

SOneStep S_TwoLS SOneLS

(e) P61_2D(Pic(λ)u = 0 type EVP)

Figure 2. Timing results for eigenvalue problems with a matrix size larger than 1.2 million. The numerical experiments are con-ducted on a workstation equipped with an Intel 1.6 GHz Itanium II CPU, 32-gigabyte main memory, and an HP Unix operating system.

(14)

(a) 2D Problems

P11_2D P23_2D P61_2D

Precond. ω or ` Itno Time ω or ` Itno Time ω or ` Itno Time

SSOR(ω) 1.80 285 1,706 1.25 367 1,440 1.85 458 3,897 1.85 256 1,536 1.30 292 1,077 1.90 392 3,304 1.90 232 1,388 1.35 395 1,734 1.95 388 3,258 1.95 1,213 11,206 1.40 377 1,584 1.98 530 4,548 ILU(`) 5 148 1,415 5 157 867 5 161 1,798 6 126 1,309 6 85 483 6 150 1,767 7 112 1,253 7 100 570 7 143 1,782 8 106 1,278 8 102 631 8 136 1,792 ICC(`) - - - 0 446 3,148 Jacobi - 2,427 11,982 - 837 2,588 - 8,580 48,590 (b) 3D Problems P31_3D P41_3D P55_3D

Precond. ω or ` Itno Time ω or ` Itno Time ω or ` Itno Time

SSOR(ω) 1.80 86 1,905 1.55 50 11 1.80 98 4,087 1.85 82 1,818 1.60 48 11 1.85 98 3,831 1.90 93 2,090 1.65 51 12 1.90 101 4,028 1.95 114 2,659 1.70 50 12 1.95 97 4,036 ILU(`) 0 141 3,273 0 53 12 0 102 3,981 1 106 3,196 1 52 17 1 83 4,008 2 89 3,284 2 52 24 2 72 4,231 ICC(`) 0 143 2,990 0 54 12 0 130 4,662 1 101 2,627 1 51 14 1 99 4,394 2 85 2,887 2 50 20 2 94 5,093 Jacobi - 388 6,342 - 106 16 - 312 9,633

Table 2. The total Jacobi-Davidson iteration numbers (Itno) and CPU times in second (Time) used for solving the test problems P11_2D, P23 2D, P3 1 3D, P4 1 3D, P5 5 3D, and P6 1 2D by Algorithm 1, scheme

SOneLS and different preconditioners. The column “ω or `” shows

the parameter ω in the preconditioner SSOR(ω) or the fill-in level ` in the preconditioners ILU(`) and ICC(`). The timing results are computed by summing the five smallest positive eigenvalues in each of the test problems.

7. J.L. Fattebert and M.B. Nardelli, Finite difference methods for ab initio electronic structure and quantum transport calculations of nanostructures, Handbook of Numerical Analysis 10 (2003), 571–612.

8. J. W. Gray, D. Childs, S. Malik, P. Siverns, C. Roberts, P. N. Stavrinou, M. Whitehead, R. Murray, and G. Parry, Quantum dot resonant cavitylight emitting diode operating near 1300 nm, Electron. Lett. 35 (1999), 242.

9. L. Harris, D. J. Mowbray, M. S. Skolnick, M. Hopkinson, and G. Hill, Emission spectra and mode structure of InAs/GaAs self-organized quantum dot lasers, Appl. Phys. Lett. 73 (1998), 969–971.

(15)

(a) Explicit restarting

Direct GMRES

LU ILU(6) SSOR(1.85)

Rstno Itno Time Itno Time Itno Time

10 29 291 - - - -15 12 251 - - - -20 6 216 - - - -25 3 186 4 8,690 - -30 3 203 3 7,870 3 8,386 35 2 189 2 6,395 2 6,721 40 1 164 1 3,934 1 4,073 45 1 171 1 4,273 1 4,558 50 1 178 1 4,748 1 5,031 (b) Implicit restarting Direct GMRES LU ILU(6) SSOR(1.85)

10 11 165 14 5,251 * * 15 5 164 5 4,465 * * 20 3 166 3 4,469 * * 25 2 165 3 5,723 * * 30 2 176 2 5,162 * * 35 2 188 2 6,015 * * 40 1 166 1 3,903 * * 45 1 173 1 4,369 * * 50 1 180 1 4,830 * *

(c) Krylov Schur restarting

Direct GMRES

LU ILU(6) SSOR(1.85)

10 10 163 12 5,334 12 5,527 15 5 161 6 4,669 5 4,264 20 3 159 4 4,620 4 4,745 25 2 159 3 1,656 3 4,825 30 2 169 2 4,164 2 4,436 35 2 179 2 3,819 2 5,128 40 1 165 1 3,819 1 4,043 45 1 172 1 4,275 1 4,527 50 1 180 1 4,738 1 5,017

Table 3. Numerical results for solving P112D. The notation “*”

means that the method does not converge. Since the coefficient matrices are unsymmetric, the Cholesky-based direct solver and ICC preconditioners are not used.

(16)

(a) Explicit restarting

Direct GMRES

LU Cholesky ILU(2) SSOR(1.30) ICC(2)

Rstno Itno Time Itno Time Itno Time Itno Time Itno Time

10 61 415 68 4,520 147 581 49 143 69 183 15 26 376 18 4,046 27 191 1981 11,003 19 91 20 12 338 12 4,027 13 131 252 1,871 13 87 25 10 346 10 4,056 23 308 * * 378 3,444 30 8 344 8 4,040 7 113 8 87 7 76 35 6 335 18 4,700 6 114 6 79 6 77 40 4 320 4 3,913 9 199 17 260 8 117 45 5 341 5 4,026 15 377 439 7,599 17 285 50 5 349 5 4,087 115 3,160 10 194 68 1,264 (b) Implicit restarting Direct GMRES

10 17 290 19 3,782 20 47 20 33 20 31 15 7 290 8 3,773 9 46 8 30 9 31 20 5 290 5 3,773 5 43 5 30 5 29 25 3 287 4 3,786 4 47 4 32 4 31 30 3 292 3 3,779 3 45 3 32 3 31 35 2 287 3 3,804 3 52 3 36 3 35 40 2 291 2 3,786 2 43 2 30 2 29 45 2 294 2 3,973 2 49 2 34 2 34 50 2 298 2 3,809 2 55 2 39 2 38

(c) Krylov Schur restarting

Direct GMRES

10 15 289 15 3,788 17 48 16 32 17 33 15 8 288 8 3,778 9 40 9 29 9 28 20 6 288 6 3,791 6 40 6 28 6 27 25 4 287 4 3,807 5 42 5 30 5 29 30 4 291 4 3,797 4 43 4 30 4 29 35 3 290 3 3,789 3 40 3 29 3 28 40 3 293 2 3,769 3 46 3 33 3 32 45 2 289 2 3,783 2 39 2 29 2 27 50 2 292 2 3,800 2 44 2 33 2 30

Table 4. Numerical results for solving P413D. The notation “*”

(17)

(a) Implicit restarting

Direct GMRES

LU ILU(6) ICC(0)

10 19 298 22 5,553 22 15,165 15 6 275 7 4,281 7 11,924 20 4 284 5 4,832 5 13,244 25 3 288 3 4,339 3 11,965 30 2 279 3 5,230 3 14,836 35 2 296 2 4,423 2 12,474 40 2 312 2 5,100 2 14,623 45 1 273 2 5,730 2 16,847 50 1 282 1 3,564 1 10,143

(b) Krylov Schur restarting

Direct GMRES

LU ILU(6) ICC(0)

10 12 274 18 5,323 18 148,803 15 5 263 8 4,343 8 12,284 20 4 272 5 4,015 5 10,956 25 3 277 4 4,201 4 11,848 30 2 274 3 4,116 3 11,783 35 2 288 3 4,740 2 11,875 40 1 268 3 5,423 2 11,877 45 1 278 2 4,712 2 13,410 50 1 289 1 3,613 1 10,262

Table 5. Numerical results for solving P612D. Results of explicit

restarting are not shown here as the method does not converge in most of the cases. The results of Cholesky-based direct method cannot be completed in our computer due to the out of memory errors. The results of GMRES with SSOR are not shown here as the method does not converge. Note that the fill-ins of the Cholesky factorizations in P612Dcause out of memory storage errors

in our experiments. Therefore, no Cholesky results are listed here even though the coefficient matrix is symmetric positive definite.

10. R. Heitz, M. Veit, N. N. Ledentsov, A. Hoffmann, D. Bimberg, V. M. Ustinov, P. S. Kop´ev, and Zh. I. Alferov, Energy relaxation by multiphonon processes in InAs/GaAs quantum dots, Phys. Rev. B 56 (1997), 10435–10445.

11. V. Hernandez, J.E. Roman, A. Tomas, and V. Vidal, SLEPc users manual, Tech. Re-port DSIC-II/24/02 - Revision 2.3.2, D. Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, 2006.

12. V. Hernandez, J.E. Roman, and V. Vidal, SLEPc: A scalable and flexible toolkit for the solution of eigenvalue problems, ACM Trans. Math. Software 31 (2005), 351–362.

(18)

0 50 100 150 200 250 300 350 0 50 100 150 200 250 300 350 nz = 1840 (a) Acc 0 0 50 100 150 200 0 50 100 150 200 nz = 1477 (b) Apc₀ 0 20 40 60 80 100 0 20 40 60 80 100 nz = 880 (c) Aic 0

Figure 3. Sparsity of the coefficient matrices Acc0 , A pc

0 , and Aic0.

(a) Acc

0 with matrix size 384, (b) A pc

0 with matrix size 245, and (c)

Aic

0 with matrix size 114.

0 50 100 150 200 250 300 350 400 450 500 0 50 100 150 200 250 300 350 400 450 500 nz = 2364 (a) Acn 0 , Acn1 0 50 100 150 200 250 300 350 400 450 500 0 50 100 150 200 250 300 350 400 450 500 nz = 548 (b) Acn 2 , Acn3

Figure 4. Sparsity of the coefficient matrices for Pcn(λ)u = 0

with matrix size 500. (a) Acn₀ and Acn₁ , (b) Acn₂ and Acn₃ .

0 50 100 150 200 0 50 100 150 200 nz = 1477 (a) Apn₀ , Apn₁ 0 50 100 150 200 0 50 100 150 200 nz = 449 (b) Apn₂ , Apn₃ 0 50 100 150 200 0 50 100 150 200 nz = 34 (c) Apn₄ , Apn₅

Figure 5. Sparsity of the coefficient matrices for Ppn(λ)u = 0

with matrix size 245. (a) Apn₀ and Apn₁ , (b) Apn₂ and Apn₃ , (c) Apn₄ and Apn₅ .

13. M. HOCHBRUCK and D. Lochel, A multilevel Jacobi-Davidson method for polynomial pde eigenvalue problems arising in plasma physics, Tech. report, Technical Report, Math. Institut HHU Dusseldorf, 2009.

(19)

LU Cholesky Time for Symbolic factorization 3 (1%) 2,378 (63%) Time for numerical factorization 273 (90%) 1,257 (33%)

Total CPU time 304 3,773

Fill-in ratio 102% 309%

Table 6. Timing in seconds and the corresponding percentage (shown within parentheses) breakdown of the key components in the LU- and Cholesky-based direct solvers for solving P413D.

Krylov Schur restarting Implicit restarting

Rstno Itno Time Itno Time

20 1052 19,055 449 9,646

25 409 11,430 250 7,956

30 305 11,734 198 8,645

35 268 13,771 118 6,770

40 183 12,055 95 7,132

Table 7. Numerical results for solving P313D by Krylov subspace

methods “without” shift-and-invert.

3D 2D

Higher order EVP JD + SOneLS + SSOR JD + SOneLS + ILU

(P553D) (P2

3 2D)

Standard/General EVP JD + SOneLS + SSOR KS + LU-based Direct

(P31

3D, P413D) (P112D, P612D)

Table 8. The most efficient scheme combinations for different test problems.

14. T.-M. Huang, W.-W. Lin, and J. Qian, Structured algorithms for palindromic quadratic eigen-value problems arising in vibration of fast trains, SIAM J. Matrix Anal. Appl. 30 (2009), 1566–1592.

15. T.-M. Hwang, W.-W. Lin, W.-C. Wang, and W. Wang, Numerical simulation of three dimen-sional pyramid quantum dot, Journal of Computational Physics 196 (2004), 208–232. 16. T.-M. Hwang and W. Wang, Energy states of vertically aligned quantum dot array with

nonparabolic effective mass, Comput. Math. Appl. 49 (2005), 39–51.

17. T.-M. Hwang, W.-C. Wang, and W. Wang, Numerical schemes for three-dimensional irregular shape quantum dots over curvilinear coordinate systems, J. Comput. Phys. 226 (2007), 754– 773.

18. T.-M. Hwang, W.-H. Wang, and W. Wang, Efficient numerical schemes for electronic states in coupled quantum dots, J. Nanosci. Nanotechnol. 8 (2008), 3695–3709.

19. G. Iannaccone, A. Trellakis, and U. Ravaioli, Simulation of a quantum-dot flash memory, J. Appl. Physi. 84 (1998), no. 9, 5032–5036.

20. L. Jacak, P. Hawrylak, and A. W´ojs, Quantum dots, Springer, Berlin, 1998.

21. R.B. Lehoucq and D.C. Sorensen, Deflation techniques for an implicitly restarted arnoldi iteration, SIAM J. Matrix Anal. Appl. 17 (1996), 789–821.

22. R.B. Lehoucq, D.C. Sorensen, and C. Yang, ARPACK USERS GUIDE: Solution of large scale eigenvalue problems with implicitely restarted arnoldi methods, SIAM, Philadelphia, 1998.

(20)

23. Y. Li, O. Voskoboynikov, C.P. Lee, and S.M. Sze, Computer simulation of electron energy levels for different shape InAs/GaAs semiconductor quantum dots, Computer Physics Com-munications (2001), no. 141, 66–72.

24. S. Maimon, E. Finkman, G. Bahir, S. E. Schacham, J. M. Garcia, and P. M. Petroff, Inter-sublevel transitions in InAs/GaAs quantum dots infrared photodetectors, Appl. Phys. Lett. 73 (1998), 2003–2005.

25. G. Medeiros-Ribeiro, J. M. Garcia, and P. M. Petroff, Charging dynamics of InAs self-assembled quantum dots, Phys. Rev. B 56 (1997), 3609–3612.

26. F. M. Peeters and V. A. Schweigert, Two-electron quantum disks, Phys. Rev. B 53 (1996), 1468–1474.

27. C. Pryor, Eight-band calculations of strained InAs/GaAs quantum dots compared with one-, four-, and six-band approximations, Phys. Rev. B 57 (1998), no. 7190–7195.

28. J. Shumway, L. R. C. Fonseca, J. P. Leburton, R. M. Martin, and D. M. Ceperley, Electronic structure of self-assembled quantum dots: comparison between density functional theory and diffusion quantum monte carlo, Physica E 8 (2000), 260–268.

29. G. L. G. Sleijpen and H. A. van der Vorst, A Jacobi-Davidson iteration method for linear eigenvalue problems, SIAM J. Matrix Anal. Appl. 17 (1996), no. 2, 401–425.

30. G. W. Stewart, Matrix algorithms, volume ii: eigensystems, SIAM, Philadelphia, 2001. 31. G.W. Stewart, A Krylov–Schur algorithm for large eigenproblems, SIAM J. Matrix Anal.

Appl. 23 (2001), 601–614.

32. , Addendum to ”a Krylov–Schur algorithm for large eigenproblems”, SIAM J. Matrix Anal. Appl. 24 (2002), 599–601.

33. W. Wang, T.-M. Hwang, and J.-C. Jang, A second-order finite volume scheme for three dimensional truncated pyramidal quantum dot, Comput. Phys. Comm. 174 (2006), 371–385. 34. W. Wang, T.-M. Hwang, W.-W. Lin, and J.-L. Liu, Numerical methods for semiconductor

heterostructures with band nonparabolicity, J. Comput. Phys. 190 (2003), 141–158.

35. A. J. Williamson and A. Zunger, InAs quantum dots: Predicted electronic structure of free-standing versus GaAs-embedded structures, Phys. Rev. B 59 (1999), 15819–15824.

Department of Mathematics, National Taiwan Normal University, Taipei 116, Taiwan. E-mail address: [email protected]

Department of Mathematics, National Taiwan University, Taipei 106, Taiwan. E-mail address: [email protected]

Department of Mathematics, National Taiwan Normal University, Taipei 116, Taiwan. E-mail address: [email protected]