PROBLEM SOLVERS FOR QUANTUM DOT SIMULATIONS
TSUNG-MING HUANG, WEICHUNG WANG, AND CHANG-TSE LEE
Abstract. Nano-scale quantum dot simulations result in large-scale polyno-mial eigenvalue problems. It remains unclear how these problems can be solved efficiently. We fill this gap in capability partially by proposing a polynomial Jacobi-Davidson method framework, including several varied schemes for solv-ing the associated correction equations. We investigate the performance of the proposed Jacobi-Davidson methods for solving the polynomial eigenvalue problems and several Krylov subspace methods for solving the linear eigenvalue problems with the use of various linear solvers and preconditioning schemes. This study finds the most efficient scheme combinations for different types of target problems.
1. Introduction
A standard matrix polynomial eigenvalue problem can be written as
(1) P(λ)u ≡ (
τ
X
i=0
λiAi)u = 0,
where (λ, u) is the corresponding eigenpair with λ ∈ C and u ∈ CN, integer τ is the degree of the matrix polynomial, and Ai∈ RN ×N are the coefficient matrices.
Solving large-scale polynomial eigenvalue problems has a long term history and still remains an active research topic due to its computational challenges and wide applications. Scientific and engineering studies that lead to polynomial eigenvalue problems include nano-scale quantum mechanism of degree 1 [7], 3 [34], or 5 [15]; high speed railway vibration of degree 2 [3, 14]; and plasma physics of degree 3 [13]. In this article, we focus on the polynomial eigenvalue problems arising in sim-ulations of nano-scale quantum dots (QDs). In particular, we consider how the Jacobi-Davidson methods and Krylov subspace methods may be used to solve these problems efficiently. This study aims to do the following:
• derive a framework of the Jacobi-Davidson method for solving polynomial eigenvalue problems;
• propose various schemes for solving the correction equation in the Jacobi-Davidson method;
• provide a set of large-scale polynomial eigenvalue benchmark problems aris-ing in the numerical simulation of quantum dots; and
• to perform intensive numerical performance comparisons for the various schemes against the benchmark problems.
Date: December 1, 2009.
Key words and phrases. polynomial eigenvalue problems, Jacobi-Davidson methods, correction equations, Krylov subspace methods, Schr¨odinger equation, quantum dot.
The paper is organized as follows. We first introduce the model problems aris-ing in quantum dot simulations in Section 2. Several variants of Jacobi-Davidson methods are discussed in Section 3. Several Krylov subspace methods for solving linear eigenvalue problems are discussed in Section 4. Numerical experiments are presented and analyzed in Section 5. We conclude the paper in Section 6.
2. Model Problems
Nano-scale semiconductor quantum dots are materials in which the carriers are confined within the dots in all three dimensions. These carries consequently have wavelike properties with discrete energy levels that are induced. Over the past few years, numerous studies regarding nano-scale quantum dots have been conducted to examine their physical properties [4, 10, 20, 25] and applications [2, 8, 9, 19, 24]. Other than theoretical and experimental methods, numerical simulations also play important roles to investigate insights into a QD’s electronic and optical properties [26, 28, 35].
We focus on a single particle conduction band model in this article. As shown in Figure 1, which is a structure scheme of the model, a QD is embedded in a matrix. The governing equation of this model can be described by the Schr¨odinger equation in general
(2) −∇ · ( ~
2
2m(x)∇u) + c(x)u = λu, or, in cylindrical coordinates,
(3) −1 r ∂ ∂r(r ~2 2m(x)∂ru) − 1 r∂θ( 1 r ~2 2m(x)∂θu) − ∂z( ~2 2m(x)∂zu) + c(x)u = λu. Here, ~ is the reduced Plank constant, the eigenvalue λ is the total electron energy, and the corresponding eigenvector u denotes the wave function. The variable m(x) represents the electron effective mass, and c(x) is the confinement potential.
In hetero-structures, both m(x) and c(x) are discontinuous at the interface. Par-ticularly, in a constant effective mass model, m(x) and c(x) are piecewise constant functions with respect to the space variable x:
m(x) = m1 in the dot, m2 in the matrix, c(x) = c1 in the dot, c2 in the matrix.
The Ben Daniel-Duke interface conditions associated with the discontinuity in m are imposed as follows
(4) u|D += u|D−, ~2 2m2 ∂u ∂n ∂D + = ~2 2m1 ∂u ∂n ∂D − ,
where D stands for the QD domain. The normal direction n with the subscripts + and − denotes the corresponding outward normal derivatives of the interface defined for the matrix and dot regions, respectively. If the non-parabolicity of the electron’s dispersion relation is taken into account [5], the effective mass model for the interface conditions (4) becomes
(5) 1 m`(λ) =P 2 ` ~2 2 λ + g`− c` + 1 λ + g`− c`+ δ` ,
where P`, g`, and δ` are the momentum, main energy gap, and spin-orbit splitting
in the `th region, respectively. The following parameters are used in our numerical experiments: c1 = 0.000, g1 = 0.235, δ1 = 0.81, P1 = 0.2875, c2 = 0.350, g2 =
1.590, δ2 = 0.80, and P2= 0.1993. Additionally, we apply homogeneous Dirichlet
conditions on the boundary of the quantum matrix.
To simulate QDs fabricated in laboratories, various discretization schemes have been developed for various QD geometries such as a cylinder [16, 18, 34], cone [23], pyramid [6, 15, 27, 33], or irregular shape [17]. Among these QD geometries and discretizations, we pick the following five settings and use the induced polynomial eigenvalue problems as the test problems. We omit detailed derivations of the discretizations and the resulting eigen-systems here as they can be found in the corresponding references.
• Cylindrical QD with the constant effective mass model [18]:
(6) Pcc(λ)u ≡ (Acc0 + λI)u,
where Acc0 is unsymmetric.
• Cylindrical QD with the non-parabolic effective mass model [34]:
(7) Pcn(λ)u ≡ 3 X i=0 λiAcni ! u.
• Pyramidal QD with the constant effective mass model [15]:
(8) Ppc(λ)u ≡ (Apc0 − λI)u,
where Apc0 is a symmetric positive definite matrix.
• Pyramidal QD with non-parabolic effective mass model [15]:
(9) Ppn(λ)u ≡ 5 X i=0 λiApni ! u.
• Irregular QD with the constant effective mass model over the skewed coor-dinate system [17]:
(10) Pic(λ)u ≡ (Aic0 + λA
ic 1)u,
where Aic0 is symmetric positive definite matrix and Aic1 is a diagonal matrix. The above test problems have different degrees in the polynomial eigenvalue problems and different distributions of matrix entry magnitudes. We use the eigen-solvers proposed in Sections 3 and 4 to solve these problems, to investigate the efficiency, and to justify the performance.
3. Polynomial Jacobi-Davidson Algorithm
Jacobi-Davidson type methods are competitive numerical methods when inte-rior eigenvalues are of interest, especially for large-scale eigenvalue problems. Such attractive properties mainly result from the fact that: (i) the coefficient matrices are used implicitly in the matrix-vector multiplication forms; (ii) no inverse of the matrix is needed even to compute interior eigenvalues; and (iii) the desired eigen-pairs are approximated iteratively by a gradually expanding subspace, and then corrector vectors can be approximately computed and preconditioned to achieve efficiency. However, there is only sparse literature that addresses how variants of Jacobi-Davidson methods perform on large-scale polynomial eigenvalue problems.
We describe the main idea of the Jacobi-Davidson method for the polynomial eigenvalue problem as follows from the viewpoint of Taylor expansion. We see that the following derivation is a straightforward generation of the standard lin-ear eigenvalue problem discussed in [29]. Suppose Vk is a k-dimensional subspace
that has an orthogonal unitary basis v1, v2, . . . , vk. Let (θk, uk) be a Ritz pair (an
approximate eigenpair) of P(λ) and (θk, sk) be an eigenpair of Vk∗P(λ)Vks = 0,
where kskk2 = 1, uk = Vksk, and Vk = [v1, · · · , vk]. To expand the subspace Vk
successively, the Jacobi-Davidson method finds the orthogonal complement for the current approximation uk. In other words, starting from the Ritz pair (θk, uk), we
intend to find a corrector t ⊥ uk such that
(11) P(λ)(uk+ t) = 0.
Obviously, it is impractical to solve Eq. (11) directly as it is another polynomial eigenvalue problem that is equivalent to the original target problem (1). Instead, we may expand Eq. (11) and rewrite P(λ) in terms of θk by using a Taylor expansion
to obtain an approximation to the corrector t. First, we rewrite Eq. (11) as P(λ)t = −P(λ)uk = −rk+ (P(θk) − P(λ))uk,
where the residual vector
rk= P(θk)uk.
Furthermore, by assuming θkis close to λ, we may use Taylor’s Theorem to obtain
(P(θk) − P(λ))uk = (θk− λ)P0(θk) − 1 2(θk− λ) 2P00(ξ k) uk = (θk− λ)pk− 1 2(θk− λ) 2P00(ξ k)uk,
where ξk is between λ and θk and
pk= P0(θk)uk ≡ d dλP(λ) λ=θ k uk= τ X i=1 iθi−1k Ai ! uk.
Consequently, Eq. (11) is equivalent to P(λ)t = −rk+ (θk− λ)pk− 1 2(θk− λ) 2P00(ξ k)uk. (12)
We can further manipulate the terms containing (θk − λ) to derive practical
computational schemes for finding t. First, by using the fact that rk⊥ uk, or
u∗krk= u∗kP(θk)uk= s∗kV ∗ kP(θk)Vksk = 0, we multiplyI −pku∗k u∗ kpk
on both sides of Eq. (12) to eliminate the term containing (θk− λ) and to obtain I −pku ∗ k u∗ kpk P(λ)t = −rk− 1 2(θk− λ) 2 I −pku ∗ k u∗ kpk P00(ξk)uk.
Second, we neglect the second order term containing (θk − λ)2 on the right hand
side to obtain a linear (in terms of λ) approximation of P(λ)t satisfying I −pku ∗ k u∗kpk P(λ)t = −rk and t ⊥ uk. (13)
Practically, we further apply the orthogonal projection (I − uku∗k) and approximate
P(λ) by P(θk) and then form the following correction equation
I −pku ∗ k u∗kpk P(θk)(I − uku∗k)˜t = −rk and ˜t ⊥ uk. (14)
Based on the above discussions, a general polynomial Jacobi-Davidson method designed to compute all the desired eigenvalues for the problem (1) is shown in Algorithm 1.
Algorithm 1 Polynomial Jacobi-Davidson Algorithm for (Pτ
i=0λ iA
i)u = 0.
Input: Coefficient matrices Ai for i = 0, . . . , τ , the number of desired eigenvalues
k and an initial orthonormal vector Vini.
Output: The desired eigenpairs (λj, uj) for j = 1, . . . , k. 1: Set V = [Vini], Vu= [ ], and Λ = ∅.
2: for j = 1 to k do
3: Compute Wi= AiV and Mi= V∗Wi for i = 0, . . . , τ . 4: while (user defined stopping criteria are not satisfied) do
5: Compute the eigenpairs (θ, s) of (Pτi=0θiMi)s = 0.
6: Select the desired eigenpair (θ, s) with ksk2= 1 and θ /∈ Λ. 7: Compute u = V s, p = P0(θ)u, r = P(θ)u.
8: Solve the correction equation I −pu ∗ u∗p P(θ)(I − uu∗)t = −r approximately for t ⊥ u .
9: Orthogonalize t against V ; set v = t/ktk2. 10: Compute wi= Aiv, Mi= Mi V∗wi v∗Wi v∗wi for i = 0, . . . , τ .
11: Expand V = [V, v] and Wi= [Wi, wi] for i = 1, . . . , τ . 12: end while
13: Set λj= θ, uj = u, Λ = Λ ∪ {λj}.
14: Perform locking by orthogonalizing uj against Vu; Compute uj = uj/kujk2;
Update Vu= [Vu, uj].
15: Choose an orthonormal matrix Vini⊥ Vu; Set V = [Vu, Vini]. 16: end for
Now, we focus on three schemes for approximately solving Eq. (14). This is an essential step in the polynomial Jacobi-Davidson method that may affect the over-all performance significantly. We approximately solve Eq. (14) by a preconditioned iterative method, e.g., GMRES with SSOR preconditioner. We call this precondi-tioned iterative process corresponding to line 8 of Algorithm 1 as the “inner loop” of the algorithm. On the other hand, we call the while-loop in lines 4 to 12 as the “outer loop” of the algorithm.
Three schemes, SOneLS, ST woLS, and SOneStep, are proposed below to solve
Eq. (14) approximately. SOneLS solves one linear system by a preconditioned
it-eration method. ST woLS and SOneStep solve two linear systems by preconditioned
iterations, but SOneStep conducts only one step in the preconditioned iterations.
• Scheme SOneLS. In each step of the preconditioned iterations for solving
correction equation (14), we need to solve a linear system in a form such that
Mpz = y, z ⊥ uk
(15)
where y is a certain given vector that is orthogonal to uk and
Mp≡ I −pku ∗ k u∗kpk M (I − uku∗k)
with preconditioner M of P(θk). Under the requirement z ⊥ uk, the
solu-tion of Eq. (15) is given by
z = M−1y + ηkM−1pk with ηk= − u∗kM−1y u∗kM−1p k . (16)
It is clear that the vector M−1pk and the inner product u∗kM −1p
k need
to be computed only once in the first step of the preconditioned iteration. Consequently, other iterative steps need only the preconditioning operations in the form of M−1y.
• Scheme ST woLS. By (14) and t ⊥ uk, it follows that
P(θk)t =
u∗kP(θk)t
u∗kpk
pk− rk ≡ ηk(t)pk− rk.
(17)
We can then solve the two linear systems P(θk)z = −rk and P(θk)z = pk
(18)
approximately by a preconditioned iterative method to obtain the approx-imate solution z1 and z2, respectively. Then we compute
˜ t = z1+ ηkz2 for ηk = − u∗kz1 u∗kz2 . (19)
Here ˜t as an approximate solution to Eq. (17) that satisfies the requirement ˜
t ⊥ uk. In ST woLS, two linear systems (18) need to be solved approximately
by a preconditioned iteration method to obtain the approximated solution ˜
t.
• Scheme SOneStep. We may reduce the cost for computing ˜t that solves
Eq (17) in the inner loop. Here we only conduct one preconditioned itera-tion. Namely, ˜t is computed by
˜ t = −M−1rk+ ηkM−1pk for ηk = u∗kM−1r k u∗ kM−1pk , (20)
4. Krylov subspace methods
While the polynomial eigenvalue problems (6)-(10) can be solved by the Jacobi-Davidson methods presented in Section 3, the linear eigenvalue problems (6), (8), and (10) can also be solved by the the so-called Krylov subspace methods, such as the Lanczos, Arnoldi, and Krylov-Schur methods. In this section, we briefly describe how the Krylov subspace methods can be used to solve the standard eigen-value problem A0u = λu. Similar ideas can be generalized to general eigenvalue
problems A0u = λA1u by the A1-inner product.
First, we define the Krylov decomposition [30] as follows. Let Vk+1= [Vk, vk+1] ∈
RN ×(k+1) be an orthonormal matrix, where Vk ∈ RN ×k and vk+1 ∈ RN ×1. An
orthonormal Krylov decomposition of order k is a relation of the form A0Vk= VkBk+ vk+1bTk+1,
(21)
where Bk ∈ Rk×k is a Rayleigh quotient R(A0; Vk) = VkTA0Vk and bk+1 ∈ Rk×1.
Note that if A0 is symmetric, Bk is tridiagonal, and bk+1 = βkek, then (21) is a
Lanczos decomposition of order k. If Bk is upper Hessenberg, bk+1 = βkek, then
(21) is an Arnoldi decomposition of order k. Here βk ∈ R and ek ∈ Rk×1 is the
standard unit vector (kth column of identity matrix).
Consequently, the Ritz pairs associated with Ukcan be computed as follows. Let
(θi, vi) be an eigenpair of Bk and let (θi, yi) = (θi, Ukvi) be the Ritz pair of A. By
(21), we have
kAyi− θiyik2 = kAUkvi− θiUkvik2= k(UkBk+ uk+1bTk+1)vi− θiUkvik2
= kUk(Bkvi− θivi) + (bk+1T vi)uk+1k2= |bTk+1vi|.
As k increases, some of these Ritz pairs will approach the eigenpairs of A. That is, |bT
k+1vi| → 0 for some i.
Theoretically, we can keep expanding the Krylov decomposition until the Ritz eigenpairs converge to the desired eigenpairs. Practically, however, the expanding process is limited to avoid loss of numerical orthogonality of Vk and to use a
rea-sonable amount of memory for storing Vk. A general idea of restarting is that, after
Vp has been computed, a new Krylov process is performed to compute a different
Krylov decomposition of order p with “better” initial vectors. For example, (i) an “explicit restart” strategy [11, 12] reruns the Krylov decomposition by using the approximate Schur vectors associated with the first not-yet-converged eigenvalue as an initial vector. (ii) An “implicit restart” [21] combines the Krylov decomposition process with the implicitly shifted QR algorithm. This implicit restart process is more efficient and numerically stable than explicit restart. (iii) A “Krylov-Schur method” [31, 32] that can be seen as an improvement on traditional Krylov subspace methods. We sum up all the processes in Algorithm 2.
5. Numerical Results
We study how various Jacobi-Davidson and Krylov subspace methods perform when solving the polynomial eigenvalue problems arising in quantum dot simula-tions. The properties of the test problems are shown in Table 1. All of the test problems are solved by the Jacobi-Davidson methods. Only problems of degree 1 (linear or generalized) problems are solved by the Krylov type methods. Note that, as the eigenvector solutions of the cylindrical and irregular QDs are periodical in the azimuthal direction, the 3D problems are transformed to a sequence of 2D
Algorithm 2 Restarting Krylov Subspace Algorithm for A0u = λu.
Input: Coefficient matrix A0; initial orthonormal vector v1; number of desired
eigenpairs k; size of subspace for restarting p. Output: The desired k eigenpairs of A0.
1: Generate the Krylov decomposition of order j (j ≥ k), by starting from v1:
A0Vj = VjBj+ vj+1bTj+1. 2: Compute the Ritz pairs of A0 from Bj and Vj.
3: while (the desired k eigenpairs of A0 are not convergent) do 4: Extend the Krylov decomposition from order j to p:
AVp= VpBp+ vp+1bTp+1. 5: Compute the Ritz pairs of A0 from Bp and Vp.
6: Reformat a new Krylov decomposition with order j by a restarting process.
7: end while
problems by the truncated Fourier series. Consequently, the discretization domain dimensions of problems P11
2D, P232D, and P612D are over the two dimensional
radial-longitude planes. Except for the results shown on Figure 2, all of the numerical experiments are conducted on an HP BL460c workstation composed of two Intel Dual-Core 5160 3.0 GHz CPUs and 32 GB main memory.
In Section 5.1 and 5.2, we study how the correction equation solution schemes and the preconditioners affect the performance of the Jacobi-Davidson methods pre-sented in Algorithm 1. We implement the Jacobi-Davidson methods with Fortran 90. In Section 5.3, we investigate the performance of various Krylov subspace meth-ods. In Section 5.4, we make an overall comparison of all methods and conclude the most efficient scheme combinations for each of the problems.
5.1. Correction Equation Solution Schemes. To compare the efficiency of the three schemes (SOneLS, ST woLS, and SOneStep) for solving the correction equation
(14), we use GMRES to solve the linear systems in (14) and (18) with the SSOR (symmetric successive over-relaxation) preconditioner M = (D+ωL)D−1(D+ωU ). Here P(θk) = L + D + U and L, D, and U are the strict lower triangular, diagonal,
and strict upper triangular matrices, respectively. The parameter ω is chosen from 0.8 to 1.95. The timing results for problems P11
2D, P2 3 2D, P3 1 3D, P5 5 3D, and P6 1 2Dare
shown in Figure 2. The CPU timing results are computed by summing up the total cost for computing the smallest five positive eigenvalues and their corresponding eigenvectors.
Observing the figure, we can see that SOneLS outperforms the other schemes in
almost all cases. Furthermore, (i) the cost of ST woLS is roughly 2 times of the cost
of SOneLS. Such numerical results coincide with the corresponding algorithms. (ii)
SOneStep is usually the slowest method. While solving Pic(λ)u = 0, it even fails to
converge for ω < 1.7.
Remark. As mentioned in [29], the scheme involving (17) is not efficient for solving linear eigenvalue problems. We have similar observations for higher order polynomial eigenvalue problems and we believe the reasons behind this are similar to the linear cases. Using the definition of residual rk = P(θk)uk, we can rewrite
(17) as
Such choice is actually equivalent with t = P(θk)−1pk, as t is made orthogonal
to uk afterwards. Let ˜t be an approximated solution of P(θk)t = pk. The angle
between ˜t and uk may be small. Consequently, we do not expect that the subspace
expansion would be efficient. Our numerical experiments of the five test problems verify this conjecture. The norm of residuals can only be reduced to 10−2.
5.2. Effects of Preconditioning. We have observed that SOneLS outperforms
another two schemes while using the preconditioner SSOR(ω). In this subsection, we further compare the performance of preconditioners, including SSOR(ω), ILU(`) (incomplete LU factorization), ICC(`) (incomplete Cholesky factorization), and Ja-cobi when using the the SOneLSscheme. The numerical results for solving problems
P112D, P232D, P3 1 3D, P5 5 3D, and P6 1
2Dare shown in Table 2. Note that we have scanned
the parameter ω of SSOR(ω) from 0.8 to 1.98 and the fill-in level ` of ILU(`) and ICC(`) from 0 to 12 for each of the test problems. However, Table 2 presents only the particular parameters that achieve better timing results. The table suggests the following observations.
• For P11
2D, P232D, and P612D, the preconditioner ILU(6) or ILU(7) results in
the best timing results. But, for P31
3D and P553D, SSOR(1.85) achieves the
best timing results.
• For ILU and ICC, the best ` for P11
2D, P232D, and P612D is either 6 or 7.
However, the best ` for P313D and P5 5
3D is 0 or 1.
The above behaviors are mainly due to the bandwidths of the corresponding coefficient matrices. In P11
2D, P2 3
2D, and P6 1
2D, the discretizations are associated
with two-dimensional planes and thus have smaller bandwidths. In contrast, the matrices associated with P31
3D and P5 5
3D have larger bandwidths due to the
three-dimensional discretization. Figure 3-(a), Figure 4, and Figure 3-(c) illustrate the sparsity of the coefficient matrices associated with P11
2D, P232D, and P612D with
smaller matrix sizes, respectively. Figure 3-(b) and Figure 5 illustrate the sparsity of the coefficient matrices associated with P31
3Dand P553Dwith smaller matrix sizes,
respectively.
While ILU and ICC preconditioners with higher level fill-ins can result in less iterations, such preconditioners may also decrease the overall efficiency. The over-all efficiency between different `’s and preconditioners (SSOR versus ILU/ICC) depends on such a trade-off.
5.3. Performance of Krylov subspace methods. We also solve the linear eigen-value problems P11
2D, P413D and P612D by the explicit restarting Lanczos/Arnoldi
method (ER), implicit restarting the Lanczos/Arnoldi method (IR), and Krylov-Schur method (KS). Note that another linear problem P31
3Dis skipped as the matrix
size is too large for the direct solvers in our computers.
We use the ER and KS provided by the software package SLEPc (Scalable Li-brary for Eigenvalue Problem Computations) [11, 12]. For IR, we use the ARPACK [22] wrapper included in SLEPc. We use PETSc (Portable, Extensible Toolkit for Scientific Computation) [1] to solve the linear system solvers and to perform the preconditionings within ER, KS, and IR. In particular, to solve the associated lin-ear systems within these three methods, we consider (i) direct solvers based on LU or Cholesky factorization and (ii) GMRES iterative solvers with preconditioners SSOR, ILU, or ICC.
The results for computing the five smallest positive eigenvalues of P112D, P4 1 3D
and P612D are summarized in Tables 3, 4 and 5. In the tables, “Rstno”, “Itno”, and
“Time” stand for the restarting number p in Algorithm 2, the number of the while-loop starting from line 3 to line 7 in Algorithm 2, and total CPU time in seconds for computing five target eigenpairs, respectively. Note that we have scanned the parameter ω in SSOR from 0.8 to 1.98 and the fill-in level parameter ` of ILU and ICC from 0 to 12 in our numerical experiments. The tables only present the results with better timing results. We highlight some observations from the tables as follows.
• KS outperforms ER and IR in almost all cases. In particular, KS and IR are more efficient and numerically stable than ER. Furthermore, the performance of KS is slightly better than that of IR.
• The bandwidths play an important role in determining efficiency of the lin-ear system solvers. In particular, for P112D and P6
1
2D, the Krylov subspace
methods with direct linear solvers are better than the Krylov subspace methods with iterative linear solvers. For P41
3D, the Krylov subspace
meth-ods with iterative linear solvers perform better.
Such behavior is again due to the bandwidths of the coefficient matrices. As the discretizations of these two problems involve only two-dimensional planes, the corresponding bandwidths of matrices Acc
0 and Aic0 in (6) and
(10) are small. Direct solvers remain efficient even when dense band matri-ces are introduced after LU or Cholesky factorizations have been performed. In contrast, the discretization of P41
3Dinvolves all three dimensions and the
corresponding bandwidth is large. In such cases, direct solvers are not effi-cient due to the fill-ins of LU or Cholesky factorizations. See Figure 3 for sparsity examples Acc0 , Apc0 , and Aic0.
We conclude this subsection with the following remarks.
(1) For the problems with symmetric positive definite coefficient matrices (e.g. P313D and P61
2D), the eigenvalues are positive and the five smallest
posi-tive eigenvalues can be computed “without” using shift-and-invert. This approach involves matrix-vector multiplications and does not need to solve any linear system. However, our numerical experiments suggest that such an approach is not efficient.
In particular, if we do not perform shift-and-invert in the Krylov sub-space methods, the methods either converge slowly (P313D, as shown in
Ta-ble 7) or fail to convergence (P612D). Table 7 also suggests that the methods
take much more CPU time as compared to the Jacobi-Davidson methods shown Table 2.
(2) Direct solvers with LU factorization outperforms Cholesky factorization, as shown in Table 4. This is because a Cholesky factorization spends much more time in symbolic factorization and leads to a much larger fill-in ratio in order to maintain the symmetry of the matrices. See Table 6 for a detailed breakdown analysis for LU- and Cholesky-based direct solvers.
5.4. Overall Comparisons. Observing the results shown in Tables 2, 3, 4, 5, we can make an overall comparison between the Jacobi-Davidson methods and the Krylov subspace methods and conclude the best scheme combinations for the
different types of problems in Table 8. As discussed above, the bandwidth of the coefficient matrices is the key component that affects the choice of schemes.
6. Conclusion
We consider degree 1, 3, and 5 eigenvalue problems arising in numerical sim-ulations of nano-scale quantum dots. We have shown that a polynomial Jacobi-Davidson method can solve all of these problems without linearizing the higher de-gree problems. As the efficiency of the polynomial Jacobi-Davidson method mainly relies on solving the correction equation, we have discussed three schemes regarding how to compute the approximate solutions of the correction equations. We have also conducted intensive numerical experiments by using the Jacobi-Davidson and Krylov subspace methods with various linear solvers and preconditioners. The nu-merical results suggest the most efficient scheme combinations for different types of problems, which are shown in Table 8. We also find that the bandwidth of the coefficient matrices is the key component that affects the choice of schemes.
It is possible to further improve the solver efficiencies by parallel computing. We then need to find efficient preconditioners that are suitable to the target problems and the particular parallel architectures of interest. The best scheme combinations for different types of problems may also change consequently on parallel computers.
Acknowledgments
We are grateful to Wen-Wei Lin for many helpful discussions and comments. This work is partially supported by the National Science Council, the Taida Institute of Mathematical Sciences, and the National Center for Theoretical Sciences in Taiwan.
Pyramid Dot Cuboid Matrix
Figure 1. Structure schema of a pyramidal and a cylindrical quantum dot. Each of the quantum dots is embedded in a hetero-structure matrix.
EVP P112D P232D P33D1 P413D P553D P612D
QD Geo. cylinder cylinder pyramid pyramid pyramid irregular
QD Geo. Sym. radial radial non-rad. non-rad. non-rad. radial
Disc. Dom. 2D 2D 3D 3D 3D 2D
QD Eff. Mass constant non-para. constant constant non-para. constant
Mtx. Degree (τ ) 1 3 1 1 5 1
Mtx. Prop. Unsym. Unsym. S.P.D. S.P.D. Sym. S.P.D.
Mtx. Size 1, 200, 000 1, 228, 150 5, 216, 783 77, 315 5, 216, 783 1, 228, 150
Table 1. Test problem properties. The table shows the names of the eigenvalue problems, QD geometry, QD geometric symmetry, dimension of discretization domains, QD effective mass, degree of the eigenvalue problems, properties of the coefficient matrices, and coefficient matrix size of the eigenvalue problems. The superscript and subscript of problem names denotes the degree and the dis-cretization domain dimension, respectively. Unsym. and S.P.D. stands for unsymmetric and symmetric positive definite, respec-tively.
References
1. S. Balay, K. Buschelman, V. Eijkhout, W.D. Gropp, D. Kaushik, M.G. Knepley, L.C. McInnes, B.F. Smith, and H. Zhang, PETSc users manual, Argonne National Laboratory, Tech. Rep. ANL-95/11-Revision 2 (2002), no. 5.
2. G. Burkard, D. Loss, and D. P. DiVincenzo, Couple quantum dots as quantum gates, Phys. Rev. B 59 (1999), 2070–2078.
3. E. K.-W. Chu, T.-M. Hwang, W.-W. Lin, and C.-T. Wu, Vibration of fast trains, palindromic eigenvalue problems and structure-preserving doubling algorithms, J. Comput. Appl. Math. 219 (2008), 237–252.
4. M. A. Cusack, P. R. Briddon, and M. Jaros, Electronic structure of InAs/GaAs self-assembled quantum dots, Phys. Rev. B 54 (1996), 2300–2303.
5. E. A. de Andrada e Silva, G. C. La Rocca, and F. Bassani, Spin-orbit splitting of electronic states in semiconductor asymmetric quantum wells, Phys. Rev. B 55 (1997), no. 24, 16293. 6. D. El-Moghraby, R.G. Johnson, and P. Harrison, Calculating modes of quantum wire and dot
systems using a finite differencing technique, Computer Physics Communications 150 (2003), 235–246.
0.8 0.9 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2 103
104 Matrix size = 1,200,000
ω
CPU Time (log)
SOneStep STwoLS SOneLS
(a) P11
2D(Pcc(λ)u = 0 type EVP)
0.8 0.9 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2 103
104
Matrix size = 1,228,150
ω
CPU Time (log)
SOneStep SOneLS
(b) P23
2D (Pcn(λ)u = 0 type EVP)
0.8 0.9 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2 103
Matrix size = 5,216,783
ω
CPU Time (log)
SOneStep STwoLS SOneLS
(c) P31
3D (Ppc(λ)u = 0 type EVP)
0.8 0.9 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2 103
Matrix size = 5,216,783
ω
CPU Time (log)
SOneStep STwoLS SOneLS
(d) P55
3D(Ppn(λ)u = 0 type EVP)
0.8 0.9 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2 103
104
Matrix size = 1,228,150
ω
CPU Time (log)
SOneStep STwoLS SOneLS
(e) P612D(Pic(λ)u = 0 type EVP)
Figure 2. Timing results for eigenvalue problems with a matrix size larger than 1.2 million. The numerical experiments are con-ducted on a workstation equipped with an Intel 1.6 GHz Itanium II CPU, 32-gigabyte main memory, and an HP Unix operating system.
(a) 2D Problems
P112D P232D P612D
Precond. ω or ` Itno Time ω or ` Itno Time ω or ` Itno Time
SSOR(ω) 1.80 285 1,706 1.25 367 1,440 1.85 458 3,897 1.85 256 1,536 1.30 292 1,077 1.90 392 3,304 1.90 232 1,388 1.35 395 1,734 1.95 388 3,258 1.95 1,213 11,206 1.40 377 1,584 1.98 530 4,548 ILU(`) 5 148 1,415 5 157 867 5 161 1,798 6 126 1,309 6 85 483 6 150 1,767 7 112 1,253 7 100 570 7 143 1,782 8 106 1,278 8 102 631 8 136 1,792 ICC(`) - - - 0 446 3,148 Jacobi - 2,427 11,982 - 837 2,588 - 8,580 48,590 (b) 3D Problems P313D P413D P553D
Precond. ω or ` Itno Time ω or ` Itno Time ω or ` Itno Time
SSOR(ω) 1.80 86 1,905 1.55 50 11 1.80 98 4,087 1.85 82 1,818 1.60 48 11 1.85 98 3,831 1.90 93 2,090 1.65 51 12 1.90 101 4,028 1.95 114 2,659 1.70 50 12 1.95 97 4,036 ILU(`) 0 141 3,273 0 53 12 0 102 3,981 1 106 3,196 1 52 17 1 83 4,008 2 89 3,284 2 52 24 2 72 4,231 ICC(`) 0 143 2,990 0 54 12 0 130 4,662 1 101 2,627 1 51 14 1 99 4,394 2 85 2,887 2 50 20 2 94 5,093 Jacobi - 388 6,342 - 106 16 - 312 9,633
Table 2. The total Jacobi-Davidson iteration numbers (Itno) and CPU times in second (Time) used for solving the test problems P112D, P23 2D, P3 1 3D, P4 1 3D, P5 5 3D, and P6 1 2D by Algorithm 1, scheme
SOneLS and different preconditioners. The column “ω or `” shows
the parameter ω in the preconditioner SSOR(ω) or the fill-in level ` in the preconditioners ILU(`) and ICC(`). The timing results are computed by summing the five smallest positive eigenvalues in each of the test problems.
7. J.L. Fattebert and M.B. Nardelli, Finite difference methods for ab initio electronic structure and quantum transport calculations of nanostructures, Handbook of Numerical Analysis 10 (2003), 571–612.
8. J. W. Gray, D. Childs, S. Malik, P. Siverns, C. Roberts, P. N. Stavrinou, M. Whitehead, R. Murray, and G. Parry, Quantum dot resonant cavitylight emitting diode operating near 1300 nm, Electron. Lett. 35 (1999), 242.
9. L. Harris, D. J. Mowbray, M. S. Skolnick, M. Hopkinson, and G. Hill, Emission spectra and mode structure of InAs/GaAs self-organized quantum dot lasers, Appl. Phys. Lett. 73 (1998), 969–971.
(a) Explicit restarting
Direct GMRES
LU ILU(6) SSOR(1.85)
Rstno Itno Time Itno Time Itno Time
10 29 291 - - - -15 12 251 - - - -20 6 216 - - - -25 3 186 4 8,690 - -30 3 203 3 7,870 3 8,386 35 2 189 2 6,395 2 6,721 40 1 164 1 3,934 1 4,073 45 1 171 1 4,273 1 4,558 50 1 178 1 4,748 1 5,031 (b) Implicit restarting Direct GMRES LU ILU(6) SSOR(1.85)
Rstno Itno Time Itno Time Itno Time
10 11 165 14 5,251 * * 15 5 164 5 4,465 * * 20 3 166 3 4,469 * * 25 2 165 3 5,723 * * 30 2 176 2 5,162 * * 35 2 188 2 6,015 * * 40 1 166 1 3,903 * * 45 1 173 1 4,369 * * 50 1 180 1 4,830 * *
(c) Krylov Schur restarting
Direct GMRES
LU ILU(6) SSOR(1.85)
Rstno Itno Time Itno Time Itno Time
10 10 163 12 5,334 12 5,527 15 5 161 6 4,669 5 4,264 20 3 159 4 4,620 4 4,745 25 2 159 3 1,656 3 4,825 30 2 169 2 4,164 2 4,436 35 2 179 2 3,819 2 5,128 40 1 165 1 3,819 1 4,043 45 1 172 1 4,275 1 4,527 50 1 180 1 4,738 1 5,017
Table 3. Numerical results for solving P112D. The notation “*”
means that the method does not converge. Since the coefficient matrices are unsymmetric, the Cholesky-based direct solver and ICC preconditioners are not used.
(a) Explicit restarting
Direct GMRES
LU Cholesky ILU(2) SSOR(1.30) ICC(2)
Rstno Itno Time Itno Time Itno Time Itno Time Itno Time
10 61 415 68 4,520 147 581 49 143 69 183 15 26 376 18 4,046 27 191 1981 11,003 19 91 20 12 338 12 4,027 13 131 252 1,871 13 87 25 10 346 10 4,056 23 308 * * 378 3,444 30 8 344 8 4,040 7 113 8 87 7 76 35 6 335 18 4,700 6 114 6 79 6 77 40 4 320 4 3,913 9 199 17 260 8 117 45 5 341 5 4,026 15 377 439 7,599 17 285 50 5 349 5 4,087 115 3,160 10 194 68 1,264 (b) Implicit restarting Direct GMRES
LU Cholesky ILU(2) SSOR(1.30) ICC(2)
Rstno Itno Time Itno Time Itno Time Itno Time Itno Time
10 17 290 19 3,782 20 47 20 33 20 31 15 7 290 8 3,773 9 46 8 30 9 31 20 5 290 5 3,773 5 43 5 30 5 29 25 3 287 4 3,786 4 47 4 32 4 31 30 3 292 3 3,779 3 45 3 32 3 31 35 2 287 3 3,804 3 52 3 36 3 35 40 2 291 2 3,786 2 43 2 30 2 29 45 2 294 2 3,973 2 49 2 34 2 34 50 2 298 2 3,809 2 55 2 39 2 38
(c) Krylov Schur restarting
Direct GMRES
LU Cholesky ILU(2) SSOR(1.55) ICC(2)
Rstno Itno Time Itno Time Itno Time Itno Time Itno Time
10 15 289 15 3,788 17 48 16 32 17 33 15 8 288 8 3,778 9 40 9 29 9 28 20 6 288 6 3,791 6 40 6 28 6 27 25 4 287 4 3,807 5 42 5 30 5 29 30 4 291 4 3,797 4 43 4 30 4 29 35 3 290 3 3,789 3 40 3 29 3 28 40 3 293 2 3,769 3 46 3 33 3 32 45 2 289 2 3,783 2 39 2 29 2 27 50 2 292 2 3,800 2 44 2 33 2 30
Table 4. Numerical results for solving P413D. The notation “*”
(a) Implicit restarting
Direct GMRES
LU ILU(6) ICC(0)
Rstno Itno Time Itno Time Itno Time
10 19 298 22 5,553 22 15,165 15 6 275 7 4,281 7 11,924 20 4 284 5 4,832 5 13,244 25 3 288 3 4,339 3 11,965 30 2 279 3 5,230 3 14,836 35 2 296 2 4,423 2 12,474 40 2 312 2 5,100 2 14,623 45 1 273 2 5,730 2 16,847 50 1 282 1 3,564 1 10,143
(b) Krylov Schur restarting
Direct GMRES
LU ILU(6) ICC(0)
Rstno Itno Time Itno Time Itno Time
10 12 274 18 5,323 18 148,803 15 5 263 8 4,343 8 12,284 20 4 272 5 4,015 5 10,956 25 3 277 4 4,201 4 11,848 30 2 274 3 4,116 3 11,783 35 2 288 3 4,740 2 11,875 40 1 268 3 5,423 2 11,877 45 1 278 2 4,712 2 13,410 50 1 289 1 3,613 1 10,262
Table 5. Numerical results for solving P612D. Results of explicit
restarting are not shown here as the method does not converge in most of the cases. The results of Cholesky-based direct method cannot be completed in our computer due to the out of memory errors. The results of GMRES with SSOR are not shown here as the method does not converge. Note that the fill-ins of the Cholesky factorizations in P612Dcause out of memory storage errors
in our experiments. Therefore, no Cholesky results are listed here even though the coefficient matrix is symmetric positive definite.
10. R. Heitz, M. Veit, N. N. Ledentsov, A. Hoffmann, D. Bimberg, V. M. Ustinov, P. S. Kop´ev, and Zh. I. Alferov, Energy relaxation by multiphonon processes in InAs/GaAs quantum dots, Phys. Rev. B 56 (1997), 10435–10445.
11. V. Hernandez, J.E. Roman, A. Tomas, and V. Vidal, SLEPc users manual, Tech. Re-port DSIC-II/24/02 - Revision 2.3.2, D. Sistemas Inform´aticos y Computaci´on, Universidad Polit´ecnica de Valencia, 2006.
12. V. Hernandez, J.E. Roman, and V. Vidal, SLEPc: A scalable and flexible toolkit for the solution of eigenvalue problems, ACM Trans. Math. Software 31 (2005), 351–362.
0 50 100 150 200 250 300 350 0 50 100 150 200 250 300 350 nz = 1840 (a) Acc 0 0 50 100 150 200 0 50 100 150 200 nz = 1477 (b) Apc0 0 20 40 60 80 100 0 20 40 60 80 100 nz = 880 (c) Aic 0
Figure 3. Sparsity of the coefficient matrices Acc0 , A pc
0 , and Aic0.
(a) Acc
0 with matrix size 384, (b) A pc
0 with matrix size 245, and (c)
Aic
0 with matrix size 114.
0 50 100 150 200 250 300 350 400 450 500 0 50 100 150 200 250 300 350 400 450 500 nz = 2364 (a) Acn 0 , Acn1 0 50 100 150 200 250 300 350 400 450 500 0 50 100 150 200 250 300 350 400 450 500 nz = 548 (b) Acn 2 , Acn3
Figure 4. Sparsity of the coefficient matrices for Pcn(λ)u = 0
with matrix size 500. (a) Acn0 and Acn1 , (b) Acn2 and Acn3 .
0 50 100 150 200 0 50 100 150 200 nz = 1477 (a) Apn0 , Apn1 0 50 100 150 200 0 50 100 150 200 nz = 449 (b) Apn2 , Apn3 0 50 100 150 200 0 50 100 150 200 nz = 34 (c) Apn4 , Apn5
Figure 5. Sparsity of the coefficient matrices for Ppn(λ)u = 0
with matrix size 245. (a) Apn0 and Apn1 , (b) Apn2 and Apn3 , (c) Apn4 and Apn5 .
13. M. HOCHBRUCK and D. Lochel, A multilevel Jacobi-Davidson method for polynomial pde eigenvalue problems arising in plasma physics, Tech. report, Technical Report, Math. Institut HHU Dusseldorf, 2009.
LU Cholesky Time for Symbolic factorization 3 (1%) 2,378 (63%) Time for numerical factorization 273 (90%) 1,257 (33%)
Total CPU time 304 3,773
Fill-in ratio 102% 309%
Table 6. Timing in seconds and the corresponding percentage (shown within parentheses) breakdown of the key components in the LU- and Cholesky-based direct solvers for solving P413D.
Krylov Schur restarting Implicit restarting
Rstno Itno Time Itno Time
20 1052 19,055 449 9,646
25 409 11,430 250 7,956
30 305 11,734 198 8,645
35 268 13,771 118 6,770
40 183 12,055 95 7,132
Table 7. Numerical results for solving P313D by Krylov subspace
methods “without” shift-and-invert.
3D 2D
Higher order EVP JD + SOneLS + SSOR JD + SOneLS + ILU
(P553D) (P2
3 2D)
Standard/General EVP JD + SOneLS + SSOR KS + LU-based Direct
(P31
3D, P413D) (P112D, P612D)
Table 8. The most efficient scheme combinations for different test problems.
14. T.-M. Huang, W.-W. Lin, and J. Qian, Structured algorithms for palindromic quadratic eigen-value problems arising in vibration of fast trains, SIAM J. Matrix Anal. Appl. 30 (2009), 1566–1592.
15. T.-M. Hwang, W.-W. Lin, W.-C. Wang, and W. Wang, Numerical simulation of three dimen-sional pyramid quantum dot, Journal of Computational Physics 196 (2004), 208–232. 16. T.-M. Hwang and W. Wang, Energy states of vertically aligned quantum dot array with
nonparabolic effective mass, Comput. Math. Appl. 49 (2005), 39–51.
17. T.-M. Hwang, W.-C. Wang, and W. Wang, Numerical schemes for three-dimensional irregular shape quantum dots over curvilinear coordinate systems, J. Comput. Phys. 226 (2007), 754– 773.
18. T.-M. Hwang, W.-H. Wang, and W. Wang, Efficient numerical schemes for electronic states in coupled quantum dots, J. Nanosci. Nanotechnol. 8 (2008), 3695–3709.
19. G. Iannaccone, A. Trellakis, and U. Ravaioli, Simulation of a quantum-dot flash memory, J. Appl. Physi. 84 (1998), no. 9, 5032–5036.
20. L. Jacak, P. Hawrylak, and A. W´ojs, Quantum dots, Springer, Berlin, 1998.
21. R.B. Lehoucq and D.C. Sorensen, Deflation techniques for an implicitly restarted arnoldi iteration, SIAM J. Matrix Anal. Appl. 17 (1996), 789–821.
22. R.B. Lehoucq, D.C. Sorensen, and C. Yang, ARPACK USERS GUIDE: Solution of large scale eigenvalue problems with implicitely restarted arnoldi methods, SIAM, Philadelphia, 1998.
23. Y. Li, O. Voskoboynikov, C.P. Lee, and S.M. Sze, Computer simulation of electron energy levels for different shape InAs/GaAs semiconductor quantum dots, Computer Physics Com-munications (2001), no. 141, 66–72.
24. S. Maimon, E. Finkman, G. Bahir, S. E. Schacham, J. M. Garcia, and P. M. Petroff, Inter-sublevel transitions in InAs/GaAs quantum dots infrared photodetectors, Appl. Phys. Lett. 73 (1998), 2003–2005.
25. G. Medeiros-Ribeiro, J. M. Garcia, and P. M. Petroff, Charging dynamics of InAs self-assembled quantum dots, Phys. Rev. B 56 (1997), 3609–3612.
26. F. M. Peeters and V. A. Schweigert, Two-electron quantum disks, Phys. Rev. B 53 (1996), 1468–1474.
27. C. Pryor, Eight-band calculations of strained InAs/GaAs quantum dots compared with one-, four-, and six-band approximations, Phys. Rev. B 57 (1998), no. 7190–7195.
28. J. Shumway, L. R. C. Fonseca, J. P. Leburton, R. M. Martin, and D. M. Ceperley, Electronic structure of self-assembled quantum dots: comparison between density functional theory and diffusion quantum monte carlo, Physica E 8 (2000), 260–268.
29. G. L. G. Sleijpen and H. A. van der Vorst, A Jacobi-Davidson iteration method for linear eigenvalue problems, SIAM J. Matrix Anal. Appl. 17 (1996), no. 2, 401–425.
30. G. W. Stewart, Matrix algorithms, volume ii: eigensystems, SIAM, Philadelphia, 2001. 31. G.W. Stewart, A Krylov–Schur algorithm for large eigenproblems, SIAM J. Matrix Anal.
Appl. 23 (2001), 601–614.
32. , Addendum to ”a Krylov–Schur algorithm for large eigenproblems”, SIAM J. Matrix Anal. Appl. 24 (2002), 599–601.
33. W. Wang, T.-M. Hwang, and J.-C. Jang, A second-order finite volume scheme for three dimensional truncated pyramidal quantum dot, Comput. Phys. Comm. 174 (2006), 371–385. 34. W. Wang, T.-M. Hwang, W.-W. Lin, and J.-L. Liu, Numerical methods for semiconductor
heterostructures with band nonparabolicity, J. Comput. Phys. 190 (2003), 141–158.
35. A. J. Williamson and A. Zunger, InAs quantum dots: Predicted electronic structure of free-standing versus GaAs-embedded structures, Phys. Rev. B 59 (1999), 15819–15824.
Department of Mathematics, National Taiwan Normal University, Taipei 116, Taiwan. E-mail address: [email protected]
Department of Mathematics, National Taiwan University, Taipei 106, Taiwan. E-mail address: [email protected]
Department of Mathematics, National Taiwan Normal University, Taipei 116, Taiwan. E-mail address: [email protected]