A numerical study for polynomial eigenvalue problems arising in quantum dot simulations

Chang-Tse Lee

June 18, 2009

Abstract

In this paper, we study how to use the polynomial Jacobi-Davidson iterative method to solve polynomial eigenvalue problems. We use a locking scheme to lock the convergent eigenpairs, and give four schemes for solving the correction equation in the polynomial Jacobi-Davidson method. A set of polynomial eigenvalue benchmark problems is derived from quantum dot simulations, with three different shapes of quantum dot and two kinds of effective mass models. In the numerical results, we illustrate the non-zero elements of all coefficient matrices in these five benchmark problems and compare the performance of the various schemes for solving the correction equation with different preconditioners, in order to suggest the best choice in each case.

Keywords: polynomial eigenvalue problem; Jacobi-Davidson method; the Schrödinger equation.

1. Introduction

A polynomial eigenvalue problem can be defined as

$$A(\lambda)x \equiv \left( \sum_{i=0}^{\tau} \lambda^i A_i \right) x = \left( \lambda^\tau A_\tau + \lambda^{\tau-1} A_{\tau-1} + \cdots + \lambda A_1 + A_0 \right) x = 0, \tag{1.1}$$

where λ ∈ C, x ∈ C^n, τ is the degree of the matrix polynomial, A_i ∈ R^{n×n} is a coefficient matrix for i = 0, 1, ..., τ, and n is the total number of unknowns. If (λ, x) is a non-trivial solution of A(λ)x = 0, we say that (λ, x) is an eigenpair of the polynomial eigenvalue problem A(λ)x = 0. This definition is a natural extension of the linear eigenvalue problem. For example, the standard eigenvalue problem Ax = λx can be represented as

A(λ)x ≡ (−λI + A)x = 0  or  (A − λI)x = 0,

and the generalized eigenvalue problem Ax = λBx can be represented as

A(λ)x ≡ (−λB + A)x = 0  or  (A − λB)x = 0.

For small and dense eigenvalue problems, the QR and QZ methods are popular eigensolvers, but they only work for linear eigenvalue problems.

So, how do we solve polynomial eigenvalue problems with small and dense coefficient matrices? A standard way to deal with this problem is linearization, that is, to transform the polynomial eigenvalue problem (1.1) into the generalized eigenvalue problem

$$
\begin{bmatrix}
 & I & & \\
 & & \ddots & \\
 & & & I \\
A_0 & A_1 & \cdots & A_{\tau-1}
\end{bmatrix}
\begin{bmatrix} x \\ \lambda x \\ \vdots \\ \lambda^{\tau-1} x \end{bmatrix}
= \lambda
\begin{bmatrix}
I & & & \\
 & \ddots & & \\
 & & I & \\
 & & & -A_\tau
\end{bmatrix}
\begin{bmatrix} x \\ \lambda x \\ \vdots \\ \lambda^{\tau-1} x \end{bmatrix},
\tag{1.2}
$$

so that the QR or QZ method can be used to solve it (a small sketch is given at the end of this introduction). But linearization has the following disadvantages:

• The size of the linearized problem (1.2) is τ times larger than the size of the original problem (1.1).
• The condition numbers of the eigenvalues and eigenvectors of the linearized problem may increase.
• The matrices of the linearized problem do not preserve the structure of the original matrices.

For large-scale and sparse polynomial eigenvalue problems, we do not use the idea of linearization, because the matrix size becomes too large to handle, and in most large-scale cases we only need a few target eigenpairs rather than all of them. Three popular iterative methods, Arnoldi [1], Lanczos [13], and Jacobi-Davidson [18], are usually used to solve large-scale and sparse linear eigenvalue problems. Arnoldi and Lanczos are based on computing a Krylov subspace to converge to the target eigenpairs (they can converge several eigenpairs at one time), while Jacobi-Davidson is based on building a search space that contains the target eigenvector. These three methods are established for linear eigenvalue problems, but there are several methods built on them that can solve nonlinear eigenvalue problems. For the Arnoldi method, Bai and Su [2] give a quadratic eigensolver, SOAR (second-order Arnoldi method), which is based on the second-order Krylov subspace: if the quadratic eigenvalue problem is (λ² M + λ C + K)x = 0, then SOAR computes the second-order Krylov subspace G_n(A, B, u) with A = −M⁻¹C and B = −M⁻¹K. Later, Meerbergen [15] gives another quadratic eigensolver, the quadratic Arnoldi method, which computes the Krylov subspace K_n(S, u) with

$$S = \begin{bmatrix} O & I \\ -K^{-1} M & -K^{-1} C \end{bmatrix}.$$

On the other hand, Voss extends the Arnoldi method to the nonlinear Arnoldi method based on inverse iteration [22, 23]. For the Jacobi-Davidson method, Sleijpen et al. extend it to generalized eigenvalue problems and polynomial eigenvalue problems [17]; in fact, however, they only formulate the algorithm for the quadratic eigenvalue problem and do not report results in that paper. Hwang et al. extend the Jacobi-Davidson method to cubic [9] and quintic [10] eigenvalue problems with great success.
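As an illustration of the linearization (1.2) above, the following minimal Python/NumPy sketch builds the pencil for a small dense polynomial eigenvalue problem and hands it to the QZ-based solver scipy.linalg.eig. The function name and the random test problem are only illustrative and are not part of the thesis code.

```python
import numpy as np
from scipy.linalg import eig

def linearized_polyeig(coeffs):
    """Eigenvalues of sum_i lambda^i A_i x = 0 via the linearization (1.2).

    coeffs = [A0, A1, ..., Atau] is a list of dense (n, n) arrays.
    Returns the eigenvalues of the pencil L0 z = lambda * L1 z.
    """
    tau, n = len(coeffs) - 1, coeffs[0].shape[0]
    L0 = np.zeros((tau * n, tau * n), dtype=complex)
    L1 = np.zeros_like(L0)
    for i in range(tau - 1):
        # identity blocks on the superdiagonal of L0 and the diagonal of L1
        L0[i*n:(i+1)*n, (i+1)*n:(i+2)*n] = np.eye(n)
        L1[i*n:(i+1)*n, i*n:(i+1)*n] = np.eye(n)
    for i in range(tau):
        # last block row of L0 carries [A0 A1 ... A_{tau-1}]
        L0[-n:, i*n:(i+1)*n] = coeffs[i]
    L1[-n:, -n:] = -coeffs[tau]          # and -A_tau closes the pencil
    lam, _ = eig(L0, L1)                 # QZ on the linearized problem
    return lam

# tiny random quadratic example: (lambda^2 A2 + lambda A1 + A0) x = 0
rng = np.random.default_rng(0)
A0, A1, A2 = (rng.standard_normal((4, 4)) for _ in range(3))
print(np.sort_complex(linearized_polyeig([A0, A1, A2])))
```

Note how the τn × τn size of the pencil makes this approach attractive only for small, dense problems, which is exactly the disadvantage listed above.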

The polynomial Jacobi-Davidson method in this thesis is based on the above methods.

The outline of this paper is as follows. In Section 2, we describe the polynomial Jacobi-Davidson iterative method for solving the eigenvalue problem in detail; we also use a locking scheme based on the Schur form, and give four schemes for solving the correction equation in this method. In Section 3, the quantum dot model, our main problem, is described, and we discretize the Schrödinger equation for two different effective mass models and different shapes of quantum dots to obtain five problems. Numerical results are given in Section 4: we discuss the matrix structure of each problem model, compare the correction equation schemes, and apply several preconditioners when solving the linear systems. Finally, we close the paper with a few concluding remarks in Section 5.

2. The Jacobi-Davidson algorithm for polynomial eigenvalue problems

Before we go through the algorithm of the polynomial Jacobi-Davidson method, let us first look at some theorems. These theorems give some insight into why Jacobi-Davidson works.

Theorem 2.1 ([20, Chapter 4, Theorem 1.2]). Let X be an eigenspace (or invariant subspace) of a matrix A and let X be a basis of X. Then there is a unique matrix L such that AX = XL, and the matrix L is given by L = X^I A X, where X^I is a matrix satisfying X^I X = I. If (λ, u) is an eigenpair of A with u ∈ X, then (λ, X^I u) is an eigenpair of L. Conversely, if (θ, s) is an eigenpair of L, then (θ, Xs) is an eigenpair of A.

Let A be an n × n matrix with n large. Theorem 2.1 tells us that if we only need some of the eigenpairs of A, we can compute them from the smaller matrix L when dim X << n, rather than from the larger matrix A. However, if we do not know what the eigenspace X is, we cannot use this theorem to find the desired eigenpairs of A. But we can use this idea, computing eigenpairs from a smaller matrix L, together with the following theorems, to approximate the desired eigenpairs of A.

Theorem 2.2 ([20, Chapter 4, Theorem 2.6]). Let the matrices A and V be given. Let V⊥ be a matrix such that [V, V⊥] is unitary, and let R = AV − VL. Then ||R|| is minimized by taking L = V^H A V, and in this case ||R|| = ||V⊥^H A V||.

Theorem 2.3 ([20, Chapter 4, Theorem 2.9]). Let V be orthogonal, A be Hermitian, and R = AV − VL. If θ_1, θ_2, ..., θ_k are the eigenvalues of L, then there

are eigenvalues λ_{l_1}, λ_{l_2}, ..., λ_{l_k} of A such that

$$|\theta_i - \lambda_{l_i}| \le \|R\|_2 \quad \text{and} \quad \sqrt{\sum_{i=1}^{k} (\theta_i - \lambda_{l_i})^2} \le \sqrt{2}\, \|R\|_F .$$

Notice that the matrix V in these two theorems is arbitrary, so span(V) may not be an eigenspace of A; but if we take L = V^H A V (by Theorem 2.2), it gives a good approximation to the desired eigenvalues of A (by Theorem 2.3). Since the approximation depends on V, what we want to do is to find a new matrix V, or update the original matrix V, such that the residual ||R|| tends to 0. The polynomial Jacobi-Davidson method is a way to update the orthonormal matrix V so as to approximate the desired eigenpairs of the polynomial eigenvalue problem A(λ)x = 0.

2.1. Correction equation for the Jacobi-Davidson method

Suppose we have an orthogonal search space V = V_k = [v_1, v_2, ..., v_k], and (θ_k, s_k) is an eigenpair of the small polynomial eigenvalue problem

$$(V_k^H A(\theta) V_k)s = \left( \sum_{i=0}^{\tau} \theta^i\, V_k^H A_i V_k \right) s = 0, \tag{2.1}$$

where θ_k is an approximation to the target eigenvalue λ̂ of A(λ)x = 0. Let

u_k = V_k s_k.  (2.2)

Then the residual of (θ_k, u_k) is given by

r_k = A(θ_k) u_k.  (2.3)

From (2.1), (2.2) and (2.3), we have

u_k^H r_k = s_k^H V_k^H A(θ_k) V_k s_k = s_k^H 0 = 0.  (2.4)

Now we want to find a correction vector t_k ⊥ u_k such that

A(λ̂)(u_k + t_k) = 0.  (2.5)

Equation (2.5) can be written as

A(λ̂) t_k = −A(λ̂) u_k = (−A(θ_k) + A(θ_k) − A(λ̂)) u_k = −r_k + (A(θ_k) − A(λ̂)) u_k.  (2.6)

Define

$$p_k = A'(\theta_k)\, u_k \equiv \left. \frac{d}{d\lambda} A(\lambda) \right|_{\lambda=\theta_k} u_k = \left( \sum_{i=1}^{\tau} i\, \theta_k^{\,i-1} A_i \right) u_k .$$

Then (A(θ_k) − A(λ̂)) u_k can be represented as follows:

• If A(λ) = A − λI, then A'(λ) = −I, which implies p_k = A'(θ_k) u_k = −u_k, and then

(A(θ_k) − A(λ̂)) u_k = (−θ_k I + λ̂ I) u_k = (θ_k − λ̂) p_k;

• If A(λ) = A − λB, then A'(λ) = −B, which implies p_k = A'(θ_k) u_k = −B u_k, and then

(A(θ_k) − A(λ̂)) u_k = (−θ_k B + λ̂ B) u_k = (θ_k − λ̂) p_k;

• If A(λ) = Σ_{i=0}^{τ} λ^i A_i, then by Taylor's theorem,

A(λ̂) = A(θ_k) + (λ̂ − θ_k) A'(θ_k) + ½ (θ_k − λ̂)² A''(ξ_k),

where ξ_k lies between λ̂ and θ_k. Thus,

(A(θ_k) − A(λ̂)) u_k = ( (θ_k − λ̂) A'(θ_k) − ½ (θ_k − λ̂)² A''(ξ_k) ) u_k = (θ_k − λ̂) p_k − ½ (θ_k − λ̂)² A''(ξ_k) u_k.

Hence, no matter what the degree of A(λ) is, equation (2.6) can be reformulated as

A(λ̂) t_k = −r_k + (θ_k − λ̂) p_k − ½ (θ_k − λ̂)² A''(ξ_k) u_k.

Using (2.4) and the fact that

( I − p_k u_k^H / (u_k^H p_k) ) p_k = 0,

we get

( I − p_k u_k^H / (u_k^H p_k) ) A(λ̂) t_k = −r_k − ½ (θ_k − λ̂)² ( I − p_k u_k^H / (u_k^H p_k) ) A''(ξ_k) u_k.

However, we do not know λ̂, but we have θ_k as an approximation to it. So we can drop the term ½ (θ_k − λ̂)² ( I − p_k u_k^H / (u_k^H p_k) ) A''(ξ_k) u_k to get

( I − p_k u_k^H / (u_k^H p_k) ) A(θ_k) t_k ≈ −r_k.

Therefore, one has to select a suitable approximation t_k ⊥ u_k as the solution of the following correction equation:

$$\left( I - \frac{p_k u_k^H}{u_k^H p_k} \right) A(\theta_k) \left( I - u_k u_k^H \right) t_k = -r_k \quad \text{and} \quad t_k \perp u_k. \tag{2.7}$$

We then update the matrix V by inserting t_k as the (k+1)th column. We describe the algorithm in Algorithm 2.1. There are three questions concerning Algorithm 2.1.

Algorithm 2.1 Polynomial Jacobi-Davidson algorithm
Input: Coefficient matrices A_i for i = 0, 1, ..., τ.
Output: The desired eigenpair (λ, x).
1: Initialize the search space V = V_1 = [v_1] as an orthonormal matrix.
2: for k = 1, 2, ... until convergence do
3:   Compute all the eigenpairs (θ, s) of the small problem (V_k^H A(θ) V_k)s = 0, where A(θ) = Σ_{i=0}^{τ} θ^i A_i.
4:   Select the approximate desired (target) eigenpair (θ_k, s_k) with ||s_k||_2 = 1.
5:   Compute u_k = V_k s_k, p_k = A'(θ_k) u_k, and r_k = A(θ_k) u_k.
6:   if (||r_k||_2 < ε) then
7:     Set λ = θ_k, x = u_k. Stop the for loop.
8:   end if
9:   Solve (approximately) a t_k ⊥ u_k from
       ( I − p_k u_k^H / (u_k^H p_k) ) A(θ_k) ( I − u_k u_k^H ) t_k = −r_k.
10:  Orthogonalize t_k against V_k, set v_{k+1} = t_k / ||t_k||_2, and expand V_{k+1} = [V_k, v_{k+1}].
11: end for

There are three questions for Algorithm 2.1:

• How to lock eigenpairs that have already been found in the problem (V_k^H A(θ) V_k)s = 0 when solving for further eigenpairs.
• As the polynomial Jacobi-Davidson iteration number k increases, the size of the problem (V_k^H A(θ) V_k)s = 0 also increases, so it costs more time to solve and more memory to store.
• How to solve the correction equation.

We deal with the first and second questions by the locking and restart schemes in Section 2.2, and give several schemes for the last question in Section 2.3. A minimal sketch of the basic iteration appears at the end of Section 2.2.

2.2. Locking and restart scheme

The main idea of the locking scheme is to keep the eigenvalues we have already computed in the spectrum of (V^H A(θ) V)s = 0; that is, whenever we solve the problem (V^H A(θ) V)s = 0, we always find these locked eigenvalues again. For the linear eigenvalue problem, Stewart [19] uses the Schur form for restarting and locking. Suppose we already have eigenvalues λ_1, λ_2, ..., λ_j of A and the associated Schur vectors v_1, v_2, ..., v_j; that is, with V_X = [v_1, v_2, ..., v_j] and V_X^H V_X = I, we have A V_X = V_X T for some upper triangular T whose spectrum is Λ(T) = {λ_1, λ_2, ..., λ_j}. Set V = [V_X, V_i] with V^H V = I. Then

$$V^H A V = \begin{bmatrix} V_X^H A V_X & V_X^H A V_i \\ V_i^H A V_X & V_i^H A V_i \end{bmatrix} = \begin{bmatrix} V_X^H V_X T & V_X^H A V_i \\ V_i^H V_X T & V_i^H A V_i \end{bmatrix} = \begin{bmatrix} T & V_X^H A V_i \\ 0 & V_i^H A V_i \end{bmatrix}.$$

It follows that Λ(V^H A V) = Λ(T) ∪ Λ(V_i^H A V_i), which means that the eigenvalues λ_1, λ_2, ..., λ_j of A are locked in the spectrum of V^H A V. The Schur form, however, is not defined for the polynomial eigenvalue problem. But Meerbergen [14] gives a link between methods for quadratic eigenvalue problems and their linearized problems, and illustrates a quadratic Jacobi-Davidson method with locking based on the Schur form. Hwang et al. also use the Schur form for the locking scheme in cubic [9] and quintic [10] Jacobi-Davidson methods, with good results. Thus, we can adopt this locking scheme in the polynomial Jacobi-Davidson method, as follows.

Assume we want to find several smallest positive eigenvalues of a polynomial eigenvalue problem. The following procedure calculates the smallest positive eigenvalue first, and then computes successively all other desired eigenvalues by choosing the orthonormal search space V = [V_X, V_ini] suitably [10]:

• We compute the smallest positive eigenvalue λ_1 and the associated eigenvector x_1 by Algorithm 2.1 first. In this stage, V_X is empty, and V_ini is any matrix satisfying V_ini^H V_ini = I.

• Next, we compute the second smallest positive eigenvalue λ_2. First we normalize the eigenvector x_1 and add it to the matrix V_X, and choose an orthonormal matrix V_ini such that the search space V = [V_X, V_ini] satisfies V^H V = I. The first eigenvalue λ_1 is then locked in the spectrum of (V^H A(θ) V)s = 0, and we take as the approximate desired eigenvalue θ_k the second smallest positive eigenvalue of this small problem (because the smallest is λ_1).

• Suppose we have already computed the eigenvalues λ_1, λ_2, ..., λ_j. We set V_X to be an orthonormal basis of the eigenspace spanned by x_1, x_2, ..., x_j, and choose an orthonormal matrix V_ini satisfying V^H V = I where V = [V_X, V_ini]. Then the eigenvalues λ_1, λ_2, ..., λ_j are locked in the spectrum of (V^H A(θ) V)s = 0, and we take the (j+1)th smallest positive eigenvalue to approximate λ_{j+1}.

For the restart scheme, we use the same idea as in the locking scheme: we reset the search space V using eigenvectors of (V^H A(θ) V)s = 0, so that most of the information gathered for approaching the target eigenpair is retained. The following steps reset the search space V for restarting.

• When we compute the smallest positive eigenvalue λ_1, we save ν eigenvectors s_1, s_2, ..., s_ν of (V^H A(θ) V)s = 0, where s_i is the eigenvector associated with the ith smallest positive eigenvalue of (V^H A(θ) V)s = 0. When we need to restart, we compute s̃_i = V s_i, i = 1, 2, ..., ν, and reset the search space V = [s̃_1, s̃_2, ..., s̃_ν].

• When we compute λ_j, j ≥ 2, we save ν eigenvectors s_j, s_{j+1}, ..., s_{j+ν−1} of (V^H A(θ) V)s = 0, where s_i is the eigenvector associated with the ith smallest positive eigenvalue, i = j, j+1, ..., j+ν−1. When we restart, we compute s̃_i = V s_i, i = j, j+1, ..., j+ν−1, and reset the search space V = [V_X, s̃_j, s̃_{j+1}, ..., s̃_{j+ν−1}].

Algorithm 2.2 is a polynomial Jacobi-Davidson method with locking and restarting based on the Schur form, designed to compute the σ smallest positive eigenvalues and the associated eigenvectors of the polynomial eigenvalue problem (1.1).

Algorithm 2.2 Polynomial Jacobi-Davidson method with locking and restarting scheme
Input: Coefficient matrices A_i for i = 0, 1, ..., τ, and the number of desired eigenvalues σ.
Output: The desired eigenpairs (λ_j, x_j) for j = 1, 2, ..., σ.
1: Initialize the search space V = [V_ini] as an orthonormal matrix, and let V_X = [ ].
2: for j = 1 to σ do
3:   for k = 1, 2, ... until convergence do
4:     Compute all the eigenpairs (θ, s) of the small problem (V^H A(θ) V)s = 0, where A(θ) = Σ_{i=0}^{τ} θ^i A_i.
5:     Save s_j, s_{j+1}, ..., s_{j+ν−1}, associated with the jth, (j+1)th, ..., (j+ν−1)th smallest positive eigenvalues, respectively.
6:     Select θ_k as the jth smallest positive eigenvalue and let s_k be the associated normalized eigenvector.
7:     Compute u_k = V s_k, p_k = A'(θ_k) u_k, and r_k = A(θ_k) u_k.
8:     if (||r_k||_2 < ε) then
9:       Set λ_j = θ_k and x_j = u_k. Go to step 19.
10:    end if
11:    Solve (approximately) a t_k ⊥ u_k from
         ( I − p_k u_k^H / (u_k^H p_k) ) A(θ_k) ( I − u_k u_k^H ) t_k = −r_k.
12:    if the size of V ≤ MaxV then
13:      Orthogonalize t_k against V, set v_{k+1} = t_k / ||t_k||_2, and expand V = [V, v_{k+1}].
14:    else
15:      Compute s̃_j = V s_j, s̃_{j+1} = V s_{j+1}, ..., s̃_{j+ν−1} = V s_{j+ν−1}.
16:      Reset V = [V_X, s̃_j, s̃_{j+1}, ..., s̃_{j+ν−1}] and orthonormalize V.
17:    end if
18:  end for
19:  Orthogonalize x_j against V_X, compute x_j = x_j / ||x_j||_2, and update V_X = [V_X, x_j].
20:  Choose an orthonormal matrix V_ini ⊥ V_X, and set V = [V_X, V_ini].
21: end for
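Before turning to the correction equation, the following dense Python sketch carries out the basic iteration of Algorithm 2.1 for one eigenpair, without the locking and restarting of Algorithm 2.2. It solves the small projected problem via the linearization (1.2) and the correction equation by a dense least-squares solve; all function names are illustrative assumptions, and a practical code would use the preconditioned schemes of Section 2.3 instead.

```python
import numpy as np
from scipy.linalg import eig

def small_polyeig(coeffs):
    """Eigenpairs of the projected problem sum_i theta^i B_i s = 0 via linearization (1.2)."""
    tau, m = len(coeffs) - 1, coeffs[0].shape[0]
    L0 = np.zeros((tau * m, tau * m), dtype=complex)
    L1 = np.zeros_like(L0)
    for i in range(tau - 1):
        L0[i*m:(i+1)*m, (i+1)*m:(i+2)*m] = np.eye(m)
        L1[i*m:(i+1)*m, i*m:(i+1)*m] = np.eye(m)
    for i in range(tau):
        L0[-m:, i*m:(i+1)*m] = coeffs[i]
    L1[-m:, -m:] = -coeffs[tau]
    vals, vecs = eig(L0, L1)
    return vals, vecs[:m, :]              # leading block of z = [s; theta*s; ...] is s

def poly_jd(coeffs, target=0.0, tol=1e-10, max_iter=200, seed=0):
    """Minimal dense sketch of Algorithm 2.1 for one eigenpair near `target`."""
    rng = np.random.default_rng(seed)
    n = coeffs[0].shape[0]
    A  = lambda t: sum((t ** i) * Ai for i, Ai in enumerate(coeffs))
    dA = lambda t: sum(i * (t ** (i - 1)) * Ai for i, Ai in enumerate(coeffs) if i >= 1)
    V = rng.standard_normal((n, 1)).astype(complex)
    V /= np.linalg.norm(V)
    for _ in range(max_iter):
        vals, svecs = small_polyeig([V.conj().T @ Ai @ V for Ai in coeffs])
        vals[~np.isfinite(vals)] = np.inf             # ignore infinite Ritz values
        j = np.argmin(np.abs(vals - target))          # Ritz value closest to the target
        theta, s = vals[j], svecs[:, j]
        u = V @ s
        u /= np.linalg.norm(u)
        r = A(theta) @ u                              # residual r_k = A(theta_k) u_k
        if np.linalg.norm(r) < tol:
            return theta, u
        p = dA(theta) @ u                             # p_k = A'(theta_k) u_k
        Pl = np.eye(n) - np.outer(p, u.conj()) / (u.conj() @ p)
        Pr = np.eye(n) - np.outer(u, u.conj())
        # correction equation (2.7), solved here by a dense least-squares solve
        t = np.linalg.lstsq(Pl @ A(theta) @ Pr, -r, rcond=None)[0]
        t -= V @ (V.conj().T @ t)                     # orthogonalize against the search space
        V = np.hstack([V, (t / np.linalg.norm(t)).reshape(-1, 1)])
    raise RuntimeError("polynomial Jacobi-Davidson did not converge")
```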

2.3. Solving the correction equation

In this section, we discuss how to solve the correction equation (2.7),

$$\left( I - \frac{p_k u_k^H}{u_k^H p_k} \right) A(\theta_k) \left( I - u_k u_k^H \right) t_k = -r_k \quad \text{and} \quad t_k \perp u_k, \tag{2.7}$$

approximately in step 9 of Algorithm 2.1 and step 11 of Algorithm 2.2, since this is an important and difficult step in the polynomial Jacobi-Davidson method. We discuss two different points of view for solving (2.7). One is to use the condition t_k ⊥ u_k to rewrite (2.7) as

$$\left( I - \frac{p_k u_k^H}{u_k^H p_k} \right) A(\theta_k)\, t_k = -r_k \quad \text{and} \quad t_k \perp u_k, \tag{2.8}$$

and give three schemes that approximate the solution of (2.8) using preconditioners for A(θ_k) [6, 18]; the other is to view (2.7) as a linear system and solve it by a preconditioned iterative linear solver [18]. In Section 4, we compare the performance of these schemes with different preconditioners for A(θ_k) in quantum dot simulations.

First, from (2.8) it follows that

A(θ_k) t_k − (u_k^H A(θ_k) t_k / (u_k^H p_k)) p_k = −r_k.

Let ε = u_k^H A(θ_k) t_k / (u_k^H p_k). Then A(θ_k) t_k = ε p_k − r_k, which implies that

t_k = ε A(θ_k)⁻¹ p_k − A(θ_k)⁻¹ r_k.  (2.9)

Now we discuss three schemes that approximate t_k in (2.9) using different approximations of A(θ_k)⁻¹.

• Scheme PrecOnly [18]. One way to approximate A(θ_k)⁻¹ is to take a suitable preconditioner M ≈ A(θ_k) directly; then we can compute an approximation t̃ to t_k in (2.9):

t̃ = ε M⁻¹ p_k − M⁻¹ r_k.  (2.10)

From the condition t̃ ⊥ u_k, we have

u_k^H t̃ = 0 = ε u_k^H M⁻¹ p_k − u_k^H M⁻¹ r_k,

and we obtain

ε = (u_k^H M⁻¹ r_k) / (u_k^H M⁻¹ p_k).

Algorithm 2.3 describes how to compute an approximate solution of the correction equation (2.7) using Scheme PrecOnly; a small sketch follows below. Notice that we compute M⁻¹ r_k and M⁻¹ p_k once, and reuse them in steps 3 and 4 to save computing time.
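As a small illustration, the following sketch carries out Scheme PrecOnly (compare Algorithm 2.3 below). It assumes the preconditioner is available through a callable M_solve(b) that applies M⁻¹b; both the callable and the function name are assumptions made for this example only.

```python
import numpy as np

def prec_only_step(M_solve, r, u, p):
    """Scheme PrecOnly: t = eps*M^{-1}p - M^{-1}r with t orthogonal to u.

    M_solve(b) applies the preconditioner inverse M^{-1}b (assumed callable);
    r, u, p are the vectors r_k, u_k, p_k of the correction equation (2.7).
    """
    Minv_r = M_solve(r)
    Minv_p = M_solve(p)
    eps = (u.conj() @ Minv_r) / (u.conj() @ Minv_p)   # enforces u^H t = 0
    return eps * Minv_p - Minv_r
```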

Algorithm 2.3 Solving the correction equation with Scheme PrecOnly
Input: A(θ_k), r_k, u_k, and p_k from (2.7).
Output: The approximation t̃ to t_k in (2.7).
1: Compute the preconditioner M ≈ A(θ_k).
2: Solve M⁻¹ r_k and M⁻¹ p_k.
3: Compute ε = (u_k^H M⁻¹ r_k) / (u_k^H M⁻¹ p_k).
4: Set t̃ = ε M⁻¹ p_k − M⁻¹ r_k.

• Scheme TwoLS. Another way to approximate A(θ_k)⁻¹ p_k and −A(θ_k)⁻¹ r_k is to treat them as two linear systems,

A(θ_k) t = p_k  and  A(θ_k) t = −r_k.

Let t̃_1 and t̃_2 be their approximate solutions, respectively. We obtain a formula for an approximation t̃ to t_k in (2.9):

t̃ = ε t̃_1 + t̃_2.  (2.11)

Similarly, since

u_k^H t̃ = 0 = ε u_k^H t̃_1 + u_k^H t̃_2,

we get

ε = −(u_k^H t̃_2) / (u_k^H t̃_1).

Algorithm 2.4 describes how to compute an approximate solution of the correction equation (2.7) using Scheme TwoLS. Notice that we use a preconditioned iterative linear solver to solve these two linear systems in step 2, taking the preconditioner M ≈ A(θ_k) directly.

Algorithm 2.4 Solving the correction equation with Scheme TwoLS
Input: A(θ_k), r_k, u_k, and p_k from (2.7).
Output: The approximation t̃ to t_k in (2.7).
1: Compute the preconditioner M ≈ A(θ_k).
2: Use a preconditioned iterative linear solver with preconditioner M to solve A(θ_k)t = p_k and A(θ_k)t = −r_k, and get the approximate solutions t̃_1 and t̃_2, respectively.
3: Compute ε = −(u_k^H t̃_2) / (u_k^H t̃_1).
4: Set t̃ = ε t̃_1 + t̃_2.

• Scheme ExactOneLS [6]. The third scheme is similar to Scheme TwoLS but does not solve the linear system A(θ_k)t = −r_k. This is because, by (2.3), we already have r_k = A(θ_k) u_k, and thus A(θ_k)⁻¹ r_k = u_k. Therefore, we let t̃_1 be the approximate solution of the linear system A(θ_k)t = p_k, and then the approximation t̃ to t_k in (2.9) is

t̃ = ε t̃_1 − u_k.

Multiplying both sides of this equation by u_k^H, with ||u_k||_2 = 1, gives

u_k^H t̃ = 0 = ε u_k^H t̃_1 − u_k^H u_k = ε u_k^H t̃_1 − 1,

and we obtain

ε = (u_k^H t̃_1)⁻¹.

This scheme is described in Algorithm 2.5.

Algorithm 2.5 Solving the correction equation with Scheme ExactOne
Input: A(θ_k), r_k, u_k, and p_k from (2.7).
Output: The approximation t̃ to t_k in (2.7).
1: Compute the preconditioner M ≈ A(θ_k).
2: Use a preconditioned iterative linear solver with preconditioner M to solve A(θ_k)t = p_k, and get the approximate solution t̃_1.
3: Compute ε = (u_k^H t̃_1)⁻¹.
4: Set t̃ = ε t̃_1 − u_k.

On the other hand, the correction equation (2.7) can be seen as a linear system

A t_k = −r_k  and  t_k ⊥ u_k,  (2.12)

where

A = ( I − p_k u_k^H / (u_k^H p_k) ) A(θ_k) ( I − u_k u_k^H ).

Thus we can use a preconditioned iterative linear solver, e.g., GMRES, to solve it. Notice that, when using a preconditioned iterative solver, a "preconditioning equation" has to be solved in each step of the iteration. In general, we take an approximation (preconditioner) M_p of A, and the preconditioning equation is M_p z_i = y_i. In the following, we give a special approximation of A(θ_k) such that the preconditioning equation is easy to solve.

• Scheme OneLS [18]. Let M be some approximation (preconditioner) of A(θ_k); then we take as the special approximation M_p of A

M_p ≡ ( I − p_k u_k^H / (u_k^H p_k) ) M ( I − u_k u_k^H ).

Hence in each iterative step the preconditioning equation is M_p z_i = y_i, where y_i is some given vector orthogonal to u_k and z_i ⊥ u_k. We rewrite the preconditioning equation as

( I − p_k u_k^H / (u_k^H p_k) ) M ( I − u_k u_k^H ) z_i = y_i.

Since z_i ⊥ u_k, z_i can be computed as (cf. (2.7) and (2.9))

z_i = η_i M⁻¹ p_k − M⁻¹ y_i  with  η_i = (u_k^H M⁻¹ y_i) / (u_k^H M⁻¹ p_k).
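A sketch of how Scheme OneLS can be realized with an off-the-shelf Krylov solver follows (compare Algorithm 2.6 below): the projected operator in (2.12) and the projected preconditioner M_p are wrapped as SciPy LinearOperators, with the preconditioning equation solved by the closed form for z_i and η_i above. The use of SciPy's GMRES and a caller-supplied M_solve routine are assumptions of this example, not the Fortran/PETSc setup used for the experiments in this thesis.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def one_ls_correction(A_theta, M_solve, r, u, p, maxiter=30):
    """Scheme OneLS: solve the projected correction equation (2.7) with GMRES.

    A_theta is a (sparse) matrix for A(theta_k); M_solve(b) applies M^{-1}b for
    a preconditioner M ~ A(theta_k) (both assumed supplied by the caller).
    """
    n = r.shape[0]
    uHp = u.conj() @ p
    Minv_p = M_solve(p)                       # computed once per JD iteration
    uH_Minv_p = u.conj() @ Minv_p

    def op(t):
        # t -> (I - p u^H / u^H p) A(theta) (I - u u^H) t
        w = A_theta @ (t - u * (u.conj() @ t))
        return w - p * ((u.conj() @ w) / uHp)

    def prec(y):
        # closed-form solve of M_p z = y:  z = eta * M^{-1}p - M^{-1}y
        Minv_y = M_solve(y)
        eta = (u.conj() @ Minv_y) / uH_Minv_p
        return eta * Minv_p - Minv_y

    Aop = LinearOperator((n, n), matvec=op, dtype=complex)
    Mop = LinearOperator((n, n), matvec=prec, dtype=complex)
    t, _ = gmres(Aop, -r, M=Mop, maxiter=maxiter)
    return t - u * (u.conj() @ t)             # enforce t orthogonal to u
```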

Notice that, in this approach, we never form the approximation M_p explicitly. Moreover, M, M⁻¹ p_k, and u_k^H M⁻¹ p_k are computed only once per Jacobi-Davidson iteration, and can be reused in all steps of the iterative process for (2.7). The process of this scheme is described in Algorithm 2.6.

Algorithm 2.6 Solving the correction equation with Scheme OneLS
Input: A(θ_k), r_k, u_k, and p_k from (2.7).
Output: The approximation t̃ to t_k in (2.7).
1: Compute the preconditioner M ≈ A(θ_k).
2: Compute M⁻¹ p_k and u_k^H M⁻¹ p_k.
3: Use a preconditioned iterative linear solver to solve
     ( I − p_k u_k^H / (u_k^H p_k) ) A(θ_k) ( I − u_k u_k^H ) t_k = −r_k.
   In each iterative step, compute M⁻¹ y_i and η_i = (u_k^H M⁻¹ y_i) / (u_k^H M⁻¹ p_k), and then compute
     z_i = η_i M⁻¹ p_k − M⁻¹ y_i
   as the approximate solution of the preconditioning equation.

3. Model Problems

In this article, the polynomial eigenvalue problems arise in numerical simulations of nano-scale quantum dots. A quantum dot is a semiconductor whose excitons are confined in all three spatial dimensions. Here we consider a quantum dot confined at the interface between two different semiconductors, so that the quantum dot carries discrete energy levels. That is, if we consider the single-particle conduction band model, the total energy of this particle is one of these discrete energy levels. Our goal is to find these discrete energy levels.

3.1. The Schrödinger equation for quantum dots

The wave function Ψ(x, t) is a mathematical tool used in quantum mechanics to describe a physical system, and the probability density P(x, t) for a particle to be located at the point x at time t is the square of the absolute value of the wave function, that is,

P(x, t) dx = |Ψ(x, t)|² dx,

where dx is a small three-dimensional volume element. By the properties of probability,

∫_{all space} P(x, t) dx = ∫_{all space} |Ψ(x, t)|² dx = 1,

which we call the normalization condition.

The Schrödinger equation is the law that describes how the wave function evolves over time. For a general quantum system, the Schrödinger equation takes the form

iħ ∂Ψ(x, t)/∂t = HΨ(x, t),

where i ≡ √−1 is the imaginary unit, ħ is the reduced Planck constant, and H is the Hamiltonian operator. The Hamiltonian operator H is the observable corresponding to the total energy of the system; that is, HΨ(x, t) gives the total energy associated with this wave function of the system. Hence, the way to find the discrete energy levels of the quantum dot is to solve the Schrödinger equation with a suitable Hamiltonian operator. We focus on a single-particle conduction band model in this article, so the Schrödinger equation becomes

$$i\hbar \frac{\partial}{\partial t}\Psi(x,t) = \left( -\frac{\hbar^2}{2m}\nabla^2 + V(x,t) \right)\Psi(x,t), \tag{3.1}$$

where m is the effective mass of the particle, which depends on the location x and the energy of the particle, ∇² is the Laplacian operator, and V(x, t) is the confinement potential. Here we consider the stationary state, so we assume V(x, t) = V(x) is independent of t, and we separate the x and t variables by substituting Ψ(x, t) = φ(x)f(t) into (3.1). By the method of separation of variables, it follows that

$$i\hbar \left( \frac{\partial}{\partial t} f(t) \right) \phi(x) = \left( -\frac{\hbar^2}{2m}\nabla^2 + V(x) \right)\phi(x) f(t)
\;\Longrightarrow\;
\frac{i\hbar}{f(t)} \frac{\partial}{\partial t} f(t) = \frac{1}{\phi(x)} \left( -\frac{\hbar^2}{2m}\nabla^2 + V(x) \right)\phi(x).$$

In this equation, the left-hand side is a function of t only and the right-hand side is a function of x only, so both sides are equal to the same constant λ. We then obtain the time-independent Schrödinger equation

$$\left( -\frac{\hbar^2}{2m}\nabla^2 + V(x) \right)\phi(x) = \lambda \phi(x), \tag{3.2}$$

where λ and φ(x) are the total energy and the wave function of this single particle, respectively.

In heterostructures, both m and V(x) are discontinuous at the interface. Let Ω_d and Ω_m ⊂ R³ denote the domains occupied by the dot and the matrix, respectively. The confinement potential V(x) is a piecewise constant function of the space variable x:

$$V(x) = \begin{cases} V_d, & x \in \Omega_d, \\ V_m, & x \in \Omega_m. \end{cases} \tag{3.3}$$

In the constant effective mass model, m = m(x) is also a piecewise constant function of the space variable x:

$$m(x) = \begin{cases} m_d, & x \in \Omega_d, \\ m_m, & x \in \Omega_m. \end{cases} \tag{3.4}$$

Moreover, in the non-parabolic effective mass model [7], m = m(λ, x) depends on the total energy λ and is also discontinuous across the interface between the matrix and the dot:

$$\frac{1}{m(\lambda, x)}\bigg|_{x \in \Omega_j} = \frac{1}{m_j(\lambda)} \equiv \frac{P_j^2}{\hbar^2} \left( \frac{2}{\lambda + E_j - V_j} + \frac{1}{\lambda + E_j - V_j + \Delta_j} \right), \quad \text{for } j \in \{d, m\}, \tag{3.5}$$

where P_j, E_j, and Δ_j are the momentum, main energy gap, and spin-orbit splitting in the j-region, respectively. On the boundary of the matrix, we apply the Dirichlet condition φ = 0, and on the interface between the quantum structure material and the matrix, ∂Ω_interface = ∂Ω_d ∩ ∂Ω_m, the BenDaniel-Duke condition

$$\frac{1}{m_d(\lambda)} \frac{\partial \phi}{\partial n_d}\bigg|_{\partial\Omega_{\mathrm{interface}}} = \frac{1}{m_m(\lambda)} \frac{\partial \phi}{\partial n_m}\bigg|_{\partial\Omega_{\mathrm{interface}}}$$

holds, where n_d and n_m denote the outward unit normals on ∂Ω_d ∩ ∂Ω_interface and ∂Ω_m ∩ ∂Ω_interface, respectively.
3.2. The discretization of the Schrödinger equation

Among the various shapes of quantum dots and the different effective mass models, we pick five particular settings as the test problem models. Three problem models use the constant effective mass model (3.4) and are described in Section 3.2.1, and the other two use the non-parabolic effective mass model (3.5) and are described in Section 3.2.2.

3.2.1. Discretization in the constant effective mass model

In the constant effective mass model (3.4), we have three test models, for pyramidal, cylindrical, and irregularly shaped quantum dots.

• Model PyC [10]. First we consider a pyramidal quantum dot embedded in a cuboid matrix, as shown in Figure 3.1. We discretize the Schrödinger equation (3.2) of this model by the finite volume method in Cartesian coordinates with a uniform mesh. This discretization results in a standard eigenvalue problem

A_0 u = −λ u ⟺ A(λ) u = (λ I + A_0) u = 0,  (3.6)

where A_0 is a symmetric positive definite sparse matrix with non-zero entries located on the main diagonal and six off-diagonals (see Figure 4.1).

• Model CyC [12]. Second, we consider an array of two vertically aligned cylindrical quantum dots embedded in a cylindrical matrix, as shown in Figure 3.2. This quantum dot model can be described by the Schrödinger equation in the cylindrical coordinates x = (r, θ, z):

$$-\frac{\hbar^2}{2m}\left( \frac{\partial^2}{\partial r^2}\phi(x) + \frac{1}{r}\frac{\partial}{\partial r}\phi(x) + \frac{1}{r^2}\frac{\partial^2}{\partial \theta^2}\phi(x) + \frac{\partial^2}{\partial z^2}\phi(x) \right) + V(x)\phi(x) = \lambda\phi(x). \tag{3.7}$$

Figure 3.1: The pyramidal quantum dot embedded in a cuboid matrix in Model PyC and Model PyN.

Since the solution φ(x) is periodic in the θ-direction, we reduce the three-dimensional system to a two-dimensional system (in the r- and z-directions only) by a Fourier transformation. Here we use the finite volume method with a uniform mesh to discretize this two-dimensional system, and it results in a standard eigenvalue problem

A_0 u = λ u ⟺ A(λ) u = (−λ I + A_0) u = 0,  (3.8)

where A_0 is a non-symmetric sparse matrix with non-zero entries located on the main diagonal and four off-diagonals (see Figure 4.2).

Figure 3.2: The array of two vertically aligned cylindrical quantum dots embedded in a cylindrical matrix in Model CyC.

• Model Irr [11]. Last, we consider a radially symmetric, irregularly shaped quantum dot embedded in the center of a cylindrical matrix; Figure 3.3 shows the structure of the model. Consider the cylindrical Schrödinger equation (3.7), again reduced to a two-dimensional system by a Fourier transformation. We use the finite difference method in a curvilinear coordinate system to discretize this two-dimensional system,

and it results in a generalized eigenvalue problem

A_0 u = λ A_1 u ⟺ A(λ) u = (−λ A_1 + A_0) u = 0,  (3.9)

where A_1 is a positive diagonal matrix and A_0 is a symmetric positive definite sparse matrix with non-zero entries located on the main diagonal and eight off-diagonals (see Figure 4.4).

Figure 3.3: The radially symmetric, irregularly shaped quantum dot embedded in the center of a cylindrical matrix in Model Irr.

3.2.2. Discretization in the non-parabolic effective mass model

Now we consider the non-parabolic effective mass model (3.5). In this model, we have two test models, for cylindrical and pyramidal quantum dots. A much-simplified one-dimensional sketch of such a discretization is given after this list.

• Model CyN [9]. We consider a cylindrical quantum dot embedded in the center of a cylindrical matrix with the same rotation axis. Figure 3.4 illustrates the quantum dot structure. Like Model CyC, this quantum dot model is described by the cylindrical Schrödinger equation (3.7) and reduced to a two-dimensional system by a Fourier transformation. However, we use the non-parabolic effective mass model and discretize by the central finite difference method with non-uniform grid points, and it results in a cubic polynomial eigenvalue problem

A(λ) u = (λ³ A_3 + λ² A_2 + λ A_1 + A_0) u = 0,  (3.10)

where all four matrices A_3, A_2, A_1, and A_0 are non-symmetric sparse matrices with non-zero entries located on the main diagonal and four off-diagonals (see Figure 4.6).

• Model PyN [10]. In the last model, the pyramidal quantum dot is again embedded in a cuboid matrix, but with the non-parabolic effective mass model. Using the same discretization as Model PyC, the discretization results in a quintic polynomial eigenvalue problem

A(λ) u = (λ⁵ A_5 + λ⁴ A_4 + λ³ A_3 + λ² A_2 + λ A_1 + A_0) u = 0,  (3.11)

where A_5 and A_4 are diagonal matrices, and the others are non-symmetric sparse matrices with non-zero entries located on the main diagonal and six off-diagonals (see Figure 4.8).
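To illustrate how such matrix eigenvalue problems arise from (3.2), the following sketch discretizes a much-simplified one-dimensional analogue with a piecewise constant effective mass, a uniform grid, and Dirichlet boundary conditions. It is not the three-dimensional finite volume/finite difference discretization used for the five models above; the domain size, dot width, and unit choice (ħ = 1) are illustrative assumptions, while the mass and potential values are those quoted for the constant effective mass model in Section 4.

```python
import numpy as np
from scipy.linalg import eigh_tridiagonal

# 1D analogue of (3.2): -(d/dx)( (1/(2 m(x))) dphi/dx ) + V(x) phi = lambda phi
# with phi = 0 at both ends; units with hbar = 1 for simplicity.
n, length = 400, 20.0                 # grid points and domain size (illustrative)
h = length / (n + 1)
x = np.linspace(h, length - h, n)

dot = np.abs(x - length / 2) < 2.0    # a "dot" region inside the surrounding "matrix"
m = np.where(dot, 0.024, 0.067)       # effective masses as in Section 4
V = np.where(dot, 0.0, 0.77)          # confinement potential as in Section 4

# harmonic averaging of 1/(2m) at the half grid points keeps the matrix symmetric
c = 1.0 / (2.0 * m)
c_half = 2.0 * c[:-1] * c[1:] / (c[:-1] + c[1:])
c_half = np.concatenate(([c[0]], c_half, [c[-1]]))    # pad for the boundary faces

diag = (c_half[:-1] + c_half[1:]) / h**2 + V          # main diagonal of A0
off = -c_half[1:-1] / h**2                            # sub/super diagonal of A0
vals, vecs = eigh_tridiagonal(diag, off)
print("smallest discrete energy levels:", vals[:3])
```

The result is a standard eigenvalue problem A_0 u = λ u of the same type as (3.6), only with a tridiagonal A_0 instead of the seven-diagonal matrix of the three-dimensional model.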

Figure 3.4: The cylindrical quantum dot embedded in the center of a cylindrical matrix in Model CyN.

4. Numerical Results

The quantum dot model we consider in this paper is an InAs quantum dot embedded in a GaAs matrix. The parameters used in the numerical computations for the constant effective mass model (i.e., Model PyC [10], Model CyC [12], and Model Irr [11]) are m_d = 0.024, V_d = 0.0 for the InAs quantum dot, and m_m = 0.067, V_m = 0.77 for the GaAs matrix. The parameters for the non-parabolic effective mass model (3.5) (i.e., Model CyN [9] and Model PyN [10]) are given in Table 4.1. We discretize the time-independent Schrödinger equation (3.2) to obtain the five problem models in Section 3.2.

                InAs quantum dot                    GaAs matrix
                Pd      Ed     Δd    Vd             Pm      Em     Δm    Vm
  Model CyN     0.2875  0.235  0.81  0.0            0.1993  1.590  0.80  0.350
  Model PyN     0.8503  0.420  0.48  0.0            0.8878  1.520  0.34  0.770

Table 4.1: The physical parameters for the non-parabolic effective mass model.

In Section 4.1 and Section 4.2, we work on a workstation running HP Unix, equipped with an Intel 1.6 GHz Itanium II CPU and 32 gigabytes of main memory. The proposed schemes are implemented in Fortran 90 and compiled by the HP Fortran compiler with options +DSitanium2 +O2 +U77 +DD64 +ofast +Onolimit.

4.1. Coefficient matrices of the problem models

In this section, we examine the sparsity and the element values of each coefficient matrix of these five problem models.

• Model PyC. The coefficient matrix A_0 of the associated linear eigenvalue problem (3.6) is symmetric positive definite. The sparsity pattern of A_0 is shown in Figure 4.1(a); its non-zero entries are located on the main diagonal and six off-diagonals. The values of the non-zero elements on the main diagonal and off-diagonals are shown in Figure 4.1(b). In Figure 4.1(b), we divide these seven (off-)diagonals into seven subfigures, diag(225), diag(15), diag(1), diag(0), diag(−1), diag(−15), and diag(−225), to denote the values of the non-zero elements of A_0: if a_ij is

a non-zero element of A_0, we take k = i − j and plot a point at the position (i, log_10 |a_ij|) in the subfigure diag(k). In other words, a point at (m, n) in the subfigure diag(k) means that a_{m,m+k} in A_0 is non-zero, with absolute value |a_{m,m+k}| = 10^n.

Figure 4.1: (a) The sparsity pattern of the matrix A_0 for the problem model Model PyC with matrix size 245; (b) the values of the non-zero elements of A_0 for the problem model Model PyC with matrix size 2475.

• Model CyC. The coefficient matrix A_0 of the associated linear eigenvalue problem is non-symmetric. Its non-zero entries are located on the main diagonal and four off-diagonals. The corresponding sparsity pattern and the values of the non-zero entries are shown in Figure 4.2(a) and (b), respectively.

Figure 4.2: (a) The sparsity pattern of the matrix A_0 for the problem model Model CyC with matrix size 384; (b) the values of the non-zero elements of A_0 with matrix size 1080.

• Model Irr. The coefficient matrices A_0 and A_1 of the associated generalized eigenvalue problem are symmetric positive definite. The non-zero entries of A_0 are located on the main diagonal and eight off-diagonals, and A_1 is diagonal. The associated sparsity patterns of A_0 and A_1 are shown

in Figure 4.3(a) and (b), respectively. The values of the non-zero elements of A_0 and A_1 are shown in Figure 4.4(a) and (b), respectively.

Figure 4.3: The sparsity patterns of the matrices A_0 and A_1 for the problem model Model Irr with matrix size 114.

Figure 4.4: The values of the non-zero elements of A_0 and A_1 with matrix size 1180.

• Model CyN. The coefficient matrices A_0, A_1, A_2, and A_3 of the associated cubic eigenvalue problem are non-symmetric. The non-zero entries of these matrices are all located on the main diagonal and four off-diagonals. The associated sparsity patterns of A_0, A_1, A_2, and A_3 are shown in Figure 4.5(a)-(d), respectively. The values of the non-zero elements of A_0, A_1, A_2, and A_3 are shown in Figure 4.6(a)-(d), respectively.

• Model PyN. The coefficient matrices A_0, A_1, A_2, and A_3 of the associated quintic eigenvalue problem are non-symmetric. The non-zero entries of A_0, A_1, A_2, and A_3 are all located on the main diagonal and six off-diagonals, and A_4 and A_5 are diagonal. The associated sparsity patterns of A_0, A_1, A_2, A_3, A_4, and A_5 are shown in Figure 4.7(a)-(f), respectively. The values of the non-zero elements of A_0, A_1, A_2, A_3, A_4, and A_5 are shown in Figure 4.8(a)-(f), respectively.

Figure 4.5: The sparsity patterns of the matrices A_0, A_1, A_2, and A_3 for the problem model Model CyN with matrix size 500.

Figure 4.6: The values of the non-zero elements of A_0, A_1, A_2, and A_3 with matrix size 1650.

Figure 4.7: The sparsity patterns of the matrices A_0, A_1, A_2, A_3, A_4, and A_5 for the problem model Model PyN with matrix size 245.

Figure 4.8: The values of the non-zero elements of A_0, A_1, A_2, A_3, A_4, and A_5 with matrix size 2475.

Finally, Table 4.2 collects the information about these five problem models from Section 3.2 and Section 4.1.

  model       EVP type      QD geometry   effective mass model   discretization   reference
  Model PyC   standard      pyramidal     constant               FVM              [10]
  Model CyC   standard      cylindrical   constant               FDM              [12]
  Model Irr   generalized   irregular     constant               FVM              [11]
  Model CyN   cubic         cylindrical   non-parabolic          FDM              [9]
  Model PyN   quintic       pyramidal     non-parabolic          FVM              [10]

  Model       property of coefficient matrices
  Model PyC   A0 is symmetric positive definite
  Model CyC   A0 is not symmetric
  Model Irr   A0 is symmetric positive definite, and A1 is diagonal
  Model CyN   non-uniform mesh is used, and all Ai are not symmetric
  Model PyN   A0-A3 are not symmetric, and A4, A5 are diagonal

Table 4.2: The characteristics of these five problem models.

4.2. Performance of different schemes for solving the correction equation

In Section 2.3, we gave four schemes for solving the correction equation (2.7) in the polynomial Jacobi-Davidson method. In this section, we want to find the scheme that performs best in most cases. Our approach is to compute the five smallest positive eigenvalues and corresponding eigenvectors for each scheme, with different preconditioners and different problem models. We report the average computing time and give a suggestion for choosing the scheme.

Here we use two kinds of preconditioned iterative linear solvers, BiCGSTAB (biconjugate gradient stabilized method) [21] and GMRES (generalized minimal residual method) [16], with the choosing rule of Algorithm 4.1 when computing the first smallest positive eigenvalue (λ_1), and of Algorithm 4.2 when computing the other desired positive eigenvalues (λ_2, λ_3, ...) [11]. We use the SSOR method [8, Section 10.1.6] to build the preconditioner; that is, we set M ≈ A by

M ≡ (D − ωL) D⁻¹ (D − ωU)

for a given ω ∈ (0, 2), where D, L, and U are the diagonal, strictly lower triangular, and strictly upper triangular parts of A. In the choosing rules, the notation {GMRES, SSOR, 30, 10⁻³} indicates that the linear system is solved by GMRES with the SSOR preconditioner, the maximum iteration number is 30, and the residual stopping criterion is 10⁻³. The iteration process within the polynomial Jacobi-Davidson subroutine terminates when the residual of the eigenvalue problem of each model reaches 1.0 × 10⁻¹⁰ for the computed eigenpairs. If the number of polynomial Jacobi-Davidson iterations exceeds 6000 but the residual is still larger than 1.0 × 10⁻¹⁰, we say this eigenpair fails to converge.

The timing results, averaged over computing the five smallest positive eigenvalues and corresponding eigenvectors, are shown in Figure 4.9.

Algorithm 4.1 The choosing rule for computing the smallest positive eigenvalue
Input: The PJD iteration number k.
1: if (k < 9) then
2:   Use {BiCGSTAB, no preconditioner, 7, 10⁻³}
3: else
4:   Use {GMRES, SSOR, 30, 10⁻³}
5: end if

Algorithm 4.2 The choosing rule for computing eigenvalues other than the smallest positive one
Input: The PJD iteration number k, and the residual r_k.
1: if (||r_k|| > 0.1 and k < 9) then
2:   Use {BiCGSTAB, no preconditioner, 7, 10⁻³}
3: else if (||r_k|| ≥ 10⁻¹ and k > 14) then
4:   Set iter = 30 and flag = 1
5:   Use {GMRES, SSOR, iter, 10⁻³}
6: else if (flag = 1) then
7:   Set iter = 15 and flag = 0
8:   Use {GMRES, SSOR, iter, 10⁻³}
9: else if (||r_k|| < 0.1 and ||r_{k−1}|| / ||r_k|| < 4) then
10:  Set iter = min(30, iter + 2)
11:  Use {GMRES, SSOR, iter, 10⁻³}
12: else
13:  Use {GMRES, SSOR, iter, 10⁻³}
14: end if
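For reference, a small sketch of how the SSOR preconditioner M = (D − ωL) D⁻¹ (D − ωU) defined above can be applied as M⁻¹y with two triangular solves is given below. It assumes the common splitting A = D − L − U (so L and U are the negated strictly triangular parts) and drops the usual scalar factor, as in the text; it is a SciPy illustration, not the Fortran/PETSc implementation used for the experiments.

```python
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, spsolve_triangular

def ssor_preconditioner(A, omega=1.85):
    """Return a LinearOperator applying M^{-1} for M = (D - wL) D^{-1} (D - wU)."""
    A = sp.csr_matrix(A)
    d = A.diagonal()
    D = sp.diags(d)
    L = -sp.tril(A, k=-1)                 # A = D - L - U (assumed sign convention)
    U = -sp.triu(A, k=1)
    lower = sp.csr_matrix(D - omega * L)  # lower triangular factor of M
    upper = sp.csr_matrix(D - omega * U)  # upper triangular factor of M

    def apply(y):
        # M z = y  <=>  (D - wL) w = y, then (D - wU) z = D w
        w = spsolve_triangular(lower, y, lower=True)
        return spsolve_triangular(upper, d * w, lower=False)

    return LinearOperator(A.shape, matvec=apply, dtype=A.dtype)

# usage with a Krylov solver, e.g. gmres(A, b, M=ssor_preconditioner(A, 1.85))
```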

The SSOR parameter ω is chosen from 0.80 to 1.95 (0.80 to 1.98 for Model Irr), and the matrix size of each model is given in Table 4.3.

  Model         Model PyC   Model CyC   Model Irr   Model CyN   Model PyN
  Matrix size   5,216,783   1,200,000   1,228,150   1,206,400   5,216,783

Table 4.3: The matrix size for each problem model.

We highlight some observations from Figure 4.9 for the four problem models Model PyC, Model CyC, Model Irr, and Model PyN:

• No matter what the problem model is, Scheme ExactOne never converges for any ω; that is, the residual r_k in (2.3) cannot reach 1.0 × 10⁻¹⁰. In most cases, Scheme ExactOne can only reduce the residual to 1.0 × 10⁻². Thus, we do not show results for Scheme ExactOne in Figure 4.9.

• In Model Irr, Scheme PrecOnly fails to converge when ω < 1.70.

• As ω increases up to 1.90, the computing times of the three schemes mostly decrease, but they increase rapidly when ω > 1.90 (for Model Irr, the threshold is 1.95).

• In these four problem models, Scheme OneLS usually has the best performance for most ω and most problem models. The exception is Model PyC, where, for ω ≥ 1.84, Scheme PrecOnly is better than Scheme OneLS. This may be because the coefficient matrix A_0 of Model PyC is symmetric positive definite.

• For each ω and each problem model, the computing time of Scheme TwoLS is roughly twice that of Scheme OneLS. This is because Scheme OneLS needs to solve only one linear system, whereas Scheme TwoLS solves two linear systems in each Jacobi-Davidson iteration.

In Model CyN, the curve pattern differs from the other four problem models in Figure 4.9: when ω ≤ 1.50, Scheme OneLS has better performance, and when ω ≥ 1.55, Scheme PrecOnly is better. We suspect the difference from the other four problem models arises because a non-uniform grid is used.

Therefore, we suggest using Scheme OneLS as the correction equation scheme, because its performance is the best in most cases, and even when it is not, the difference between Scheme OneLS and the best scheme is not too large.

Figure 4.9: Timing results (CPU time on a log scale versus the SSOR parameter ω) for the five models with three correction equation schemes: (a) Model PyC, (b) Model CyC, (c) Model Irr, (d) Model PyN, (e) Model CyN.

4.3. Performance of Scheme OneLS with different preconditioners

Based on Section 4.2, we choose Scheme OneLS to solve the correction equation in this section. Its numerical performance with different preconditioners for A(θ_k), such as Jacobi [8, Section 10.1.1], ILU, SSOR, and ICC [8, Section 10.3.2], is reported in the following.

In order to use different preconditioners, we include PETSc [3, 4, 5] in our Fortran code. PETSc (Portable, Extensible Toolkit for Scientific Computation) is a suite of data structures and routines for the scalable (parallel) solution of scientific applications modeled by partial differential equations. In addition, we change the workstation to an HP BL460c to run PETSc. This workstation is equipped with two Intel Dual-Core 5160 3.0 GHz CPUs, 32 GB of DDR2-667 fully buffered DIMM main memory, two HP 72 GB hot-plug SAS hard disks, and Linux 2.6.9-67 as the operating system.

There are some remarks about using PETSc from Fortran.

• We still use the original polynomial Jacobi-Davidson code. Only when we need to solve the linear system Mz = y do we call PETSc to build the preconditioner matrix M and compute z = M⁻¹y. (In PETSc, we only need to provide the matrix and call PCApply to get the vector z directly.)

• PETSc is primarily written in C, but our original code is written in Fortran, so we use the PETSc Fortran interface to call PETSc; these function calls are described in Chapter 11 of the PETSc manual [5].

• Although the PETSc manual [5] says "the PETSc Fortran interface works with both F77 and F90 compilers", we still ran into problems with the F90 compiler, so we use the F77 compiler to compile the PETSc Fortran interface code and the F90 compiler to compile the original Fortran code. (Note that when creating a large array in F77, one should write "vec ( 1000000 )" rather than "vec ( dim )", or a segmentation fault occurs.)

• In the original Fortran code, we use BiCGSTAB and GMRES to solve the linear systems and only need to provide the matrix-vector multiplication; that is, we do not need to form the coefficient matrices. But in PETSc, we need to form these coefficient matrices so that PETSc can build the preconditioner matrix M, and sometimes we get an out-of-memory error.

• Because we do not discuss parallel computation in this thesis, when we create the PETSc matrix we use the function MatCreateSeqAIJ to get a sequential matrix.

Here we use the rule in Algorithm 4.3 to decide whether the program uses BiCGSTAB or GMRES to solve the linear system. Notice that, in this paper, we take min_iter = 10 in Algorithm 4.3.

Algorithm 4.3 The choosing rule for computing eigenvalues in PETSc mode
Input: The PJD iteration number k, and the residual r_k.
1: if (||r_k|| > 0.1 and k < 9) then
2:   Use {BiCGSTAB, no preconditioner, 7, 10⁻³}
3: else if (||r_k|| ≥ 10⁻¹ and k > 14) then
4:   Use {GMRES, SSOR, min_iter × 2, 10⁻³}
5: else
6:   Use {GMRES, SSOR, min_iter, 10⁻³}
7: end if

The numerical results of the standard eigenvalue problems for Model PyC and Model CyC are given in Table 4.4 and Table 4.5, respectively, the generalized eigenvalue problem for Model Irr in Table 4.6, and the cubic and quintic eigenvalue problems in Table 4.7 and Table 4.8, respectively.

Model PyC (size 5,216,783)
                  SSOR(ω)                   ILU(l)                       ICC(l)                              Jacobi    None
                  1.80    1.85    1.90      0      1      2      3       0      1      2      3      4
  Ave. itno.      17.2    16.4    18.6      28.2   21.2   17.8   OM      28.6   20.2   17.0   DZP    OM     77.6      86.0
  Ave. CPU time   381.0   363.5   418.0     654.6  639.1  656.7  OM      598.0  525.4  577.4  DZP    OM     1268.3    1371.3

Table 4.4: The average PJD iteration number and the average CPU time of each preconditioning scheme for finding the five smallest positive eigenvalues in Model PyC. "OM" means out of memory when building the preconditioner in PETSc, and "DZP" means a zero pivot was detected in the Cholesky factorization in PETSc.

Model CyC (size 1,200,000)
                  SSOR(ω)                            ILU(l)                       ICC(l)          Jacobi    None
                  1.80    1.85    1.90    1.95       5      6      7      8       0       1
  Ave. itno.      57.0    51.2    46.4    242.6      29.6   25.2   22.4   21.2    268.8   NS      485.4     576.0
  Ave. CPU time   341.1   307.1   277.6   2241.2     283.0  261.7  250.5  255.5   1624.0  NS      2396.3    2779.3

Table 4.5: The average PJD iteration number and the average CPU time of each preconditioning scheme for finding the five smallest positive eigenvalues in Model CyC. "NS" means the linear system is not solvable with this preconditioner.

Model Irr (size 1,228,160)
                  SSOR(ω)                            ILU(l)                       ICC(l)                  Jacobi    None
                  1.85    1.90    1.95    1.98       5      6      7      8       0      1      2
  Ave. itno.      91.6    78.4    77.6    106.0      32.2   30.0   28.6   27.2    89.2   NAN    NAN      1716.0    NC
  Ave. CPU time   779.3   660.7   651.5   909.5      359.5  353.4  356.4  358.4   629.5  NAN    NAN      9717.9    NC

Table 4.6: The average PJD iteration number and the average CPU time of each preconditioning scheme for finding the five smallest positive eigenvalues in Model Irr. "NAN" means the answer of the linear system is NaN in PETSc, and "NC" means some eigenpair does not converge.

Model CyN (size 1,206,400)
                  SSOR(ω)                            ILU(l)                      ICC(l)        Jacobi    None
                  1.25    1.30    1.35    1.40       5      6      7      8      0     1
  Ave. itno.      73.3    58.3    79.0    75.3       31.3   17.0   20.0   20.3   NC    NC      167.3     NC
  Ave. CPU time   287.9   215.3   346.8   316.7      173.4  96.5   114.0  126.2  NC    NC      517.5     NC

Table 4.7: The average PJD iteration number and the average CPU time of each preconditioning scheme for finding the three smallest positive eigenvalues in Model CyN, which is a cubic EVP. "NC" means some eigenpair does not converge.

Model PyN (size 5,216,783)
                  SSOR(ω)                   ILU(l)                       ICC(l)                               Jacobi    None
                  1.80    1.85    1.90      0      1      2      3       0      1      2       3      4
  Ave. itno.      19.6    19.6    20.2      20.4   16.6   14.4   OM      26.0   19.8   18.8    DZP    OM     62.4      68.2
  Ave. CPU time   817.3   766.2   805.6     796.2  801.5  846.2  OM      932.3  878.8  1018.5  DZP    OM     1926.6    2057.3

Table 4.8: The average PJD iteration number and the average CPU time of each preconditioning scheme for finding the five smallest positive eigenvalues in Model PyN, which is a quintic EVP. "OM" means out of memory when building the preconditioner in PETSc, and "DZP" means a zero pivot was detected in the Cholesky factorization in PETSc.

The following are some details about these tables.

• In our tests, we tried several values of min_iter, such as 7, 10, 15, and 20, and found that for most preconditioners min_iter = 10 performs better than the others for all problem models.

• For every problem model, we tried the SSOR parameter ω from 0.8 to 1.98, and the fill levels of ILU and ICC from 0 to 12. The cases shown in the tables are the best case (smallest average CPU time) and some cases near the best case for each preconditioner.

• The best case of SSOR in PETSc is similar to the best case of SSOR in the original Fortran code for all problem models except Model CyN (see Figure 4.9).

• In Model CyC, Model Irr, and Model CyN, the best case of ILU is at fill level l equal to 6 or 7. At this level, its performance is better than the best case of SSOR.

• However, using ILU in Model PyC and Model PyN, we get an "out of memory" error from PETSc when the fill level is greater than 3. We guess the reason is that the problem sizes of these two models exceed 5 million. To verify this, we increase the problem sizes of Model CyC and Model Irr to more than 5 million and check which error messages occur in these two models. The results are shown in Table 4.9.

                 Model PyC             Model CyC             Model Irr
  matrix size    5,216,783             5,043,000             5,016,960
  original nnz   36,444,135            25,211,718            45,101,770
  ILU(0)         36,444,135 (1)        25,211,718 (1)        45,101,770 (1)
  ILU(1)         67,598,527 (1.85)     35,294,438 (1.40)     65,104,346 (1.44)
  ILU(2)         119,474,125 (3.28)    45,377,158 (1.80)     85,075,570 (1.89)
  ILU(3)         out of memory         65,542,594 (2.60)     105,015,442 (2.33)
  ILU(4)                               85,708,028 (3.40)     124,923,962 (2.77)
  ILU(5)                               105,872,460 (4.20)    out of memory
  ILU(6)                               126,038,890 (5.00)
  ILU(7)                               out of memory

Table 4.9: The number of non-zero elements (nnz) and the fill ratio for each ILU fill level. The notation "a (b)" means total nnz = a and fill ratio = b = a / original nnz.

In Table 4.9 we find that, after the fill-in of non-zero elements, if the total number of non-zero elements exceeds 126,038,890, we get the out-of-memory error message. Thus, this error does not depend on the matrix size, but on the number of non-zero elements.

• We also show the number of non-zero elements and the fill ratio for a smaller instance of Model PyC in Table 4.10. We find that the fill ratios are essentially independent of the matrix size. Thus, we can compute the fill ratio on a small problem first and use it to estimate the total number of non-zero elements of the original large problem.

• We use the ICC preconditioner from PETSc, and it fails easily with error messages. So, unless it is really necessary, we do not suggest using the ICC preconditioner.

• The Jacobi preconditioner is never a good choice for any of the problem models.

Collecting the details above, when we want to choose a preconditioner for solving the correction equation in PETSc, we suggest the following steps:

• Use a small instance of the problem to obtain the fill ratio of ILU(6), and use this fill ratio to estimate the number of non-zero elements of ILU(6), nnz, for the original large problem.

• If nnz < 120 million, we suggest building the preconditioner with ILU(6). (If the coefficient matrices are all symmetric, ICC(6) may give better performance, but there is a large probability that it will fail.)

• If nnz > 120 million, we suggest building the preconditioner with SSOR, using ω between 1.80 and 1.90.

A minimal sketch of how these two recommended settings could be configured through PETSc's KSP/PC interface is given below.
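The sketch below is not the thesis code; the calls (PCSetType, PCFactorSetLevels, PCSORSetSymmetric, PCSORSetOmega) are assumed from standard PETSc usage, the flag use_ilu stands for the outcome of the nnz estimate, and error checking and the actual operator setup are omitted.

#include <petscksp.h>

int main(int argc, char **argv)
{
    KSP       ksp;
    PC        pc;
    PetscBool use_ilu = PETSC_TRUE;  /* outcome of the nnz estimate above */

    PetscInitialize(&argc, &argv, NULL, NULL);
    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPGetPC(ksp, &pc);

    if (use_ilu) {
        /* Estimated ILU(6) nnz below ~120 million: incomplete LU with fill level 6. */
        PCSetType(pc, PCILU);
        PCFactorSetLevels(pc, 6);
    } else {
        /* Otherwise: SSOR, i.e. SOR with a symmetric sweep and omega near 1.85. */
        PCSetType(pc, PCSOR);
        PCSORSetSymmetric(pc, SOR_SYMMETRIC_SWEEP);
        PCSORSetOmega(pc, 1.85);
    }

    /* ... set the operator with KSPSetOperators and call KSPSolve as usual ... */

    KSPDestroy(&ksp);
    PetscFinalize();
    return 0;
}

The same choices can also be made at run time through PETSc's command-line options instead of being hard-coded.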

5. Conclusion

In this thesis, we study the polynomial Jacobi-Davidson method for solving polynomial eigenvalue problems. We test it on five different models arising in quantum dot simulations, covering linear, cubic, and quintic problems, and it works on all of them, so we believe this iterative method can be applied to general polynomial eigenvalue problems. When using this method, we suggest Scheme OneLS, which almost always gives the best performance. Moreover, if the ILU(6) preconditioner matrix has fewer than 120,000,000 non-zero elements, we suggest using the ILU preconditioner with fill level 6; otherwise, use the SSOR preconditioner with ω between 1.80 and 1.90 in PETSc.

References

[1] W. E. Arnoldi. The principle of minimized iterations in the solution of the matrix eigenvalue problem. Quarterly of Applied Mathematics, 9:17–29, 1951.

[2] Zhaojun Bai and Yangfeng Su. SOAR: A second-order Arnoldi method for the solution of the quadratic eigenvalue problem. SIAM J. Matrix Anal. Appl., 26(3):640–659, 2005.

[3] Satish Balay, William D. Gropp, Lois C. McInnes, and Barry F. Smith. Efficient Management of Parallelism in Object-Oriented Numerical Software Libraries. In E. Arge, A. M. Bruaset, and H. P. Langtangen, editors, Modern Software Tools in Scientific Computing, pages 163–202. Birkhäuser Press, 1997.

[4] Satish Balay, William D. Gropp, Lois C. McInnes, and Barry F. Smith. PETSc home page. http://www.mcs.anl.gov/petsc, 2001.

[5] Satish Balay, William D. Gropp, Lois C. McInnes, and Barry F. Smith. PETSc Users Manual. Technical Report ANL-95/11, Revision 2.1.5, Argonne National Laboratory, 2003.

[6] T. Betcke and H. Voss. A Jacobi-Davidson-type projection method for nonlinear eigenvalue problems. Future Generation Computer Systems, 20:363–372, 2004.

[7] E. A. de Andrada e Silva, G. C. La Rocca, and F. Bassani. Spin-orbit splitting of electronic states in semiconductor asymmetric quantum wells. Physical Review B, 55(24):16293–16299, 1997.

[8] Gene H. Golub and Charles F. Van Loan. Matrix Computations, third edition. The Johns Hopkins University Press, 1996.

[9] Tsung-Min Hwang, Wen-Wei Lin, and Jinn-Liang Liu. Jacobi-Davidson methods for cubic eigenvalue problems. Numer. Linear Algebra Appl., 12:605–624, 2005.

[10] Tsung-Min Hwang, Wen-Wei Lin, Wei-Cheng Wang, and Weichung Wang. Numerical simulation of three dimensional pyramid quantum dot. Journal of Computational Physics, 196(1):208–232, 2004.

[11] Tsung-Min Hwang, Wei-Cheng Wang, and Weichung Wang. Numerical schemes for three-dimensional irregular shape quantum dots over curvilinear coordinate systems. Journal of Computational Physics, 226(1):754–773, 2007.

[12] Tsung-Min Hwang, Wei-Hua Wang, and Weichung Wang. Efficient numerical schemes for electronic states in coupled quantum dots. Journal of Nanoscience and Nanotechnology, 8(7):3695–3709, 2008.

[13] Cornelius Lanczos. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J. Res. Nat. Bur. Standards, 45:255–282, 1950.

[14] Karl Meerbergen. Locking and restarting quadratic eigenvalue solvers. SIAM J. Sci. Comput., 22(5):1814–1839, 2001.

[15] Karl Meerbergen. The quadratic Arnoldi method for the solution of the quadratic eigenvalue problem. SIAM J. Matrix Anal. Appl., 30(4):1463–1482, 2008.

[16] Youcef Saad and Martin H. Schultz. GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Stat. Comput., 7(3):856–869, 1986.

[17] Gerard L. G. Sleijpen, Albert G. L. Booten, Diederik R. Fokkema, and Henk A. van der Vorst. Jacobi-Davidson type methods for generalized eigenproblems and polynomial eigenproblems. BIT Numerical Mathematics, 36(3):595–633, 1996.

[18] Gerard L. G. Sleijpen and Henk A. van der Vorst. A Jacobi-Davidson iteration method for linear eigenvalue problems. SIAM J. Matrix Anal. Appl., 17:401–425, 1996.

[19] G. W. Stewart. Simultaneous iteration for computing invariant subspaces of non-Hermitian matrices. Numer. Math., 25:123–136, 1976.

[20] G. W. Stewart. Matrix Algorithms, Volume II: Eigensystems. SIAM, 2001.

[21] H. A. van der Vorst. Bi-CGSTAB: a fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems. SIAM Journal on Scientific and Statistical Computing, 13:631–644, 1992.

[22] H. Voss. An Arnoldi method for nonlinear symmetric eigenvalue problems. In Online Proceedings of the SIAM Conference on Applied Linear Algebra, Williamsburg, http://www.siam.org/meetings/laa03/, 2003.

[23] H. Voss. An Arnoldi method for nonlinear eigenvalue problems. BIT Numerical Mathematics, 44:387–401, 2004.
