
Performance of Scheme OneLS with different preconditioners

Based on the results in Section 4.2, we choose Scheme OneLS to solve the correction equation in this section. Its numerical performance with different preconditioners of A(θk), such as Jacobi [8, Section 10.1.1], ILU, SSOR, and ICC [8, Section 10.3.2], is reported below.

In order to use different preconditioners, we link PETSc [3, 4, 5] into our Fortran code. PETSc (Portable, Extensible Toolkit for Scientific Computation) is a suite of data structures and routines for the scalable (parallel) solution of scientific applications modeled by partial differential equations.

Figure 4.9: Timing results for the five models with the three correction equation schemes (SOneStep, STwoLS, SOneLS). Each panel plots CPU time (log scale) against the SSOR parameter ω from 0.8 to 2.0: (a) Model PyC, matrix size 5,216,783; (b) Model CyC, 1,200,000; (c) Model Irr, 1,228,150; (d) Model PyN, 5,216,783; (e) Model CyN, 1,228,150.

In addition, we switch to an HP BL460c workstation to run PETSc. This workstation is equipped with two Intel Dual-Core 5160 3.0 GHz CPUs, 32 GB of DDR2-667 Fully Buffered DIMM main memory, two HP 72 GB Hot Plug SAS hard disks, and runs Linux 2.6.9-67.

Some remarks on using PETSc from Fortran code follow (a minimal C sketch of the corresponding PETSc calls appears after this list).

• We still use the original polynomial Jacobi-Davidson method code. Only when we need to solve the linear system Mz = y do we call PETSc to establish the preconditioner matrix M and compute z = M^{-1}y. (In PETSc, we only need to supply the matrix A and call PCApply to obtain the vector z directly.)

• PETSc is primarily written in C, but our original code is written in Fortran, so we use the PETSc Fortran interface to call PETSc; these function calls are documented in Chapter 11 of the PETSc manual [5].

• Although the PETSc manual [5] says "the PETSc Fortran interface works with both F77 and F90 compilers", we still ran into problems with the F90 compiler. Therefore we use the F77 compiler to compile the PETSc Fortran interface code and the F90 compiler to compile the original Fortran code. (Note that when creating a large array in F77, write "vec(1000000)" rather than "vec(dim)", otherwise a segmentation fault occurs.)

• In the original Fortran code, we use BiCGSTAB and GMRES to solve the linear system, and we only need to provide the matrix-vector multiplication routine; that is, we do not need to form the coefficient matrix explicitly. In PETSc, however, we must form these coefficient matrices so that PETSc can establish the preconditioner matrix M, and sometimes this produces an out-of-memory error.

• Because we do not discuss parallel computation in this thesis, we use the function MatCreateSeqAIJ to create a sequential PETSc matrix.
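To make these remarks concrete, the following is a minimal sketch in C (the language PETSc itself is written in; the thesis code goes through the PETSc Fortran interface instead) of the workflow described above: form a sequential AIJ matrix, build an ILU preconditioner from it, and apply the preconditioner once to compute z = M^{-1}y via PCApply. The matrix entries, size, and fill level are placeholders, error checking is omitted for brevity, and the calls follow the current PETSc C API rather than the revision 2.1.5 interface cited in the thesis.

```c
#include <petscpc.h>

int main(int argc, char **argv)
{
  Mat A;            /* coefficient matrix A(theta_k), formed explicitly for PETSc   */
  PC  pc;           /* preconditioner M built from A                                */
  Vec y, z;         /* right-hand side y and the preconditioned vector z = M^{-1} y */
  PetscInt n = 5;   /* placeholder size; the thesis problems have millions of rows  */

  PetscInitialize(&argc, &argv, NULL, NULL);

  /* Sequential AIJ matrix, as in the MatCreateSeqAIJ remark above
     (3 = rough number of non-zeros per row, a preallocation hint only). */
  MatCreateSeqAIJ(PETSC_COMM_SELF, n, n, 3, NULL, &A);
  for (PetscInt i = 0; i < n; ++i) {
    MatSetValue(A, i, i, 4.0, INSERT_VALUES);                     /* diagonal       */
    if (i > 0)     MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES); /* sub-diagonal   */
    if (i < n - 1) MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES); /* super-diagonal */
  }
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

  /* Build the preconditioner M: here ILU with fill level 2 as an example. */
  PCCreate(PETSC_COMM_SELF, &pc);
  PCSetType(pc, PCILU);
  PCFactorSetLevels(pc, 2);
  PCSetOperators(pc, A, A);
  PCSetUp(pc);

  /* z = M^{-1} y: give PETSc the vector y and call PCApply to get z directly. */
  VecCreateSeq(PETSC_COMM_SELF, n, &y);
  VecSet(y, 1.0);
  VecDuplicate(y, &z);
  PCApply(pc, y, z);

  VecDestroy(&y); VecDestroy(&z); PCDestroy(&pc); MatDestroy(&A);
  PetscFinalize();
  return 0;
}
```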

Here, we use the rule in Algorithm 4.3 to decide whether the program uses BiCGSTAB or GMRES to solve the linear system. Note that, in this thesis, we take min_iter = 10 in Algorithm 4.3.

Algorithm 4.3 The choosing rule for computing eigenvalues in PETSc mode
Input: the PJD iteration number k and the residual r_k.
1: if ||r_k|| > 0.1 and k < 9 then
2:     use {BiCGSTAB, no preconditioner, 7, 10^{-3}}
3: else if ||r_k|| ≥ 10^{-1} and k > 14 then
4:     use {GMRES, SSOR, min_iter × 2, 10^{-3}}
5: else
6:     use {GMRES, SSOR, min_iter, 10^{-3}}
7: end if
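As a minimal illustration, the following C sketch encodes the choosing rule of Algorithm 4.3. It assumes the four-tuple in the algorithm denotes {Krylov solver, preconditioner, maximum iterations, stopping tolerance}; the struct and function names are hypothetical and belong neither to the thesis code nor to PETSc.

```c
#include <stdio.h>

/* Hypothetical description of one linear-solve configuration:
   {Krylov solver, preconditioner, maximum iterations, stopping tolerance}. */
typedef struct {
  const char *solver;          /* "bcgs" (BiCGSTAB) or "gmres" in PETSc naming */
  const char *preconditioner;  /* "none" or "sor" (SSOR)                       */
  int         max_iterations;
  double      tolerance;
} SolveConfig;

/* Choosing rule of Algorithm 4.3: k is the PJD iteration number,
   rnorm is ||r_k||, and min_iter = 10 as in the thesis. */
static SolveConfig choose_solver(int k, double rnorm, int min_iter)
{
  SolveConfig cfg;
  if (rnorm > 0.1 && k < 9) {
    cfg = (SolveConfig){"bcgs", "none", 7, 1e-3};             /* line 2 */
  } else if (rnorm >= 0.1 && k > 14) {
    cfg = (SolveConfig){"gmres", "sor", 2 * min_iter, 1e-3};  /* line 4 */
  } else {
    cfg = (SolveConfig){"gmres", "sor", min_iter, 1e-3};      /* line 6 */
  }
  return cfg;
}

int main(void)
{
  /* Late PJD iteration with a large residual -> GMRES + SSOR, 2 * min_iter. */
  SolveConfig cfg = choose_solver(20, 0.5, 10);
  printf("%s + %s, maxits = %d, tol = %g\n",
         cfg.solver, cfg.preconditioner, cfg.max_iterations, cfg.tolerance);
  return 0;
}
```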

The numerical results of the standard eigenvalue problems for Model PyC and Model CyC are given in Table 4.4 and Table 4.5, respectively; the generalized eigenvalue problem for Model Irr is given in Table 4.6; and the cubic and quintic eigenvalue problems are shown in Table 4.7 and Table 4.8, respectively.

Model PyC (size: 5,216,783)    Ave. itno.   Ave. CPU time
SSOR (ω = 1.80)                17.2         381.0
SSOR (ω = 1.85)                16.4         363.5
SSOR (ω = 1.90)                18.6         418.0
ILU (l = 0)                    28.2         654.6
ILU (l = 1)                    21.2         639.1
ILU (l = 2)                    17.8         656.7
ILU (l = 3)                    OM           OM
ICC (l = 0)                    28.6         598.0
ICC (l = 1)                    20.2         525.4
ICC (l = 2)                    17.0         577.4
ICC (l = 3)                    DZP          DZP
ICC (l = 4)                    OM           OM
Jacobi                         77.6         1268.3
None                           86.0         1371.3

Table 4.4: The average PJD iteration number and the average CPU time for each preconditioning scheme when finding the five smallest positive eigenvalues of Model PyC. "OM" means PETSc ran Out of Memory while establishing the preconditioner, and "DZP" means PETSc Detected a Zero Pivot in the Cholesky factorization.

Model CyC (size: 1,200,000)    Ave. itno.   Ave. CPU time
SSOR (ω = 1.80)                57.0         341.1
SSOR (ω = 1.85)                51.2         307.1
SSOR (ω = 1.90)                46.4         277.6
SSOR (ω = 1.95)                242.6        2241.2
ILU (l = 5)                    29.6         283.0
ILU (l = 6)                    25.2         261.7
ILU (l = 7)                    22.4         250.5
ILU (l = 8)                    21.2         255.5
ICC (l = 0)                    268.8        1624.0
ICC (l = 1)                    NS           NS
Jacobi                         485.4        2396.3
None                           576.0        2779.3

Table 4.5: The average PJD iteration number and the average CPU time for each preconditioning scheme when finding the five smallest positive eigenvalues of Model CyC. "NS" means the linear system is Not Solvable with this preconditioner.

Model Irr (size: 1,228,160)    Ave. itno.   Ave. CPU time
SSOR (ω = 1.85)                91.6         779.3
SSOR (ω = 1.90)                78.4         660.7
SSOR (ω = 1.95)                77.6         651.5
SSOR (ω = 1.98)                106.0        909.5
ILU (l = 5)                    32.2         359.5
ILU (l = 6)                    30.0         353.4
ILU (l = 7)                    28.6         356.4
ILU (l = 8)                    27.2         358.4
ICC (l = 0)                    89.2         629.5
ICC (l = 1)                    NAN          NAN
ICC (l = 2)                    NAN          NAN
Jacobi                         1716.0       9717.9
None                           NC           NC

Table 4.6: The average PJD iteration number and the average CPU time for each preconditioning scheme when finding the five smallest positive eigenvalues of Model Irr. "NAN" means the computed solution of the linear system is NaN in PETSc, and "NC" means some eigenpair did Not Converge.


Model CyN (size: 1,206,400)    Ave. itno.   Ave. CPU time
SSOR (ω = 1.25)                73.3         287.9
SSOR (ω = 1.30)                58.3         215.3
SSOR (ω = 1.35)                79.0         346.8
SSOR (ω = 1.40)                75.3         316.7
ILU (l = 5)                    31.3         173.4
ILU (l = 6)                    17.0         96.5
ILU (l = 7)                    20.0         114.0
ILU (l = 8)                    20.3         126.2
ICC (l = 0)                    NC           NC
ICC (l = 1)                    NC           NC
Jacobi                         167.3        517.5
None                           NC           NC

Table 4.7: The average PJD iteration number and the average CPU time for each preconditioning scheme when finding the three smallest positive eigenvalues of Model CyN, which is a cubic EVP. "NC" means some eigenpair did Not Converge.

Model PyN (size: 5,216,783)    Ave. itno.   Ave. CPU time
SSOR (ω = 1.80)                19.6         817.3
SSOR (ω = 1.85)                19.6         766.2
SSOR (ω = 1.90)                20.2         805.6
ILU (l = 0)                    20.4         796.2
ILU (l = 1)                    16.6         801.5
ILU (l = 2)                    14.4         846.2
ILU (l = 3)                    OM           OM
ICC (l = 0)                    26.0         932.3
ICC (l = 1)                    19.8         878.8
ICC (l = 2)                    18.8         1018.5
ICC (l = 3)                    DZP          DZP
ICC (l = 4)                    OM           OM
Jacobi                         62.4         1926.6
None                           68.2         2057.3

Table 4.8: The average PJD iteration number and the average CPU time for each preconditioning scheme when finding the five smallest positive eigenvalues of Model PyN, which is a quintic EVP. "OM" means PETSc ran Out of Memory while establishing the preconditioner, and "DZP" means PETSc Detected a Zero Pivot in the Cholesky factorization.


The following are some details for these tables.

• In our tests, we tried several values of min_iter, such as 7, 10, 15, and 20, and found that for almost every preconditioner, min_iter = 10 performs better than the other values for all problem models.

• For every problem model, we tried the SSOR parameter ω from 0.8 to 1.98 and the fill levels of ILU and ICC from 0 to 12. For each preconditioner, the tables report the best case, i.e. the one with the smallest average CPU time, together with some cases near it. (A sketch of how these parameters are set in PETSc follows this list.)

• The best case of SSOR in PETSc is similar to the best case of SSOR in the original Fortran code for all problem models except Model CyN (see Figure 4.9).

• In Model CyC, Model Irr, and Model CyN, the best case of ILU occurs when the fill level l equals 6 or 7, and at this level its performance is better than the best case of SSOR.

• When using ILU in Model PyC and Model PyN, however, we get an "out of memory" error message from PETSc when the fill level is 3 or more. Our first guess was that the reason is that the problem sizes of these two models exceed 5 million. To verify this, we raised the problem sizes of Model CyC and Model Irr above 5 million and checked which error messages occur in these two models. The results are given in Table 4.9.

              Model PyC            Model CyC            Model Irr
matrix size   5,216,783            5,043,000            5,016,960
original nnz  36,444,135           25,211,718           45,101,770
ILU(0)        36,444,135 (1)       25,211,718 (1)       45,101,770 (1)
ILU(1)        67,598,527 (1.85)    35,294,438 (1.40)    65,104,346 (1.44)
ILU(2)        119,474,125 (3.28)   45,377,158 (1.80)    85,075,570 (1.89)
ILU(3)        out of memory        65,542,594 (2.60)    105,015,442 (2.33)
ILU(4)                             85,708,028 (3.40)    124,923,962 (2.77)
ILU(5)                             105,872,460 (4.20)   out of memory
ILU(6)                             126,038,890 (5.00)
ILU(7)                             out of memory

Table 4.9: The number of non-zero elements (nnz) and the fill ratio for each ILU fill level. The notation "a (b)" means total nnz = a and fill ratio b = a / (original nnz).

From Table 4.9 we find that, after filling in the non-zero elements, we get the out-of-memory error message when the total number of non-zero elements exceeds 126,038,890. Thus, this error does not depend on the matrix size, but on the number of non-zero elements.

• We also show the number of non-zero elements and the fill ratio for a small-size Model PyC in Table 4.10. We observe that the fill ratio is essentially independent of the matrix size; for example, the ILU(2) fill ratio 3.27 of the small problem predicts 3.27 × 36,444,135 ≈ 119 million non-zeros for the large problem, close to the observed 119,474,125. Thus, we can compute the fill ratio for the small size first, and use this ratio to estimate the total number of non-zero elements of the original large-size problem.

              Model PyC            Model PyC (small)
matrix size   5,216,783            640,775
original nnz  36,444,135           4,467,183
ILU(0)        36,444,135 (1)       4,467,183 (1)
ILU(1)        67,598,527 (1.85)    8,275,543 (1.85)
ILU(2)        119,474,125 (3.28)   14,610,901 (3.27)
ILU(3)        out of memory        27,281,041 (6.11)

Table 4.10: The number of non-zero elements (nnz) and the fill ratio for each ILU fill level for two matrix sizes of Model PyC.

• With the ICC preconditioner from PETSc, error messages occur easily. Therefore, unless it is necessary, we do not suggest using the ICC preconditioner.

• The Jacobi preconditioner is never a good choice in any of the problem models.
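The SSOR and ILU/ICC parameters swept in these experiments map onto a handful of PETSc calls. The following is a minimal sketch, assuming the current PETSc C API (the thesis itself uses the Fortran interface of revision 2.1.5); the matrix contents and the particular ω value are placeholders.

```c
#include <petscpc.h>

/* Hedged sketch: configure an SSOR preconditioner in PETSc with a chosen
   relaxation parameter omega, as used in the omega sweep reported above.
   (For ILU or ICC the analogous calls are PCSetType(pc, PCILU) or
   PCSetType(pc, PCICC) followed by PCFactorSetLevels(pc, l).) */
static PC make_ssor_pc(Mat A, PetscReal omega)
{
  PC pc;
  PCCreate(PETSC_COMM_SELF, &pc);
  PCSetType(pc, PCSOR);                        /* SOR family                 */
  PCSORSetSymmetric(pc, SOR_SYMMETRIC_SWEEP);  /* symmetric sweep, i.e. SSOR */
  PCSORSetOmega(pc, omega);                    /* relaxation parameter       */
  PCSetOperators(pc, A, A);
  PCSetUp(pc);
  return pc;
}

int main(int argc, char **argv)
{
  Mat A;
  PetscInitialize(&argc, &argv, NULL, NULL);

  /* Tiny placeholder matrix; the real coefficient matrices come from the
     quantum dot discretizations. */
  MatCreateSeqAIJ(PETSC_COMM_SELF, 4, 4, 1, NULL, &A);
  for (PetscInt i = 0; i < 4; ++i) MatSetValue(A, i, i, 2.0, INSERT_VALUES);
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

  /* One point of the sweep over omega from 0.8 to 1.98. */
  PC pc = make_ssor_pc(A, 1.85);

  PCDestroy(&pc);
  MatDestroy(&A);
  PetscFinalize();
  return 0;
}
```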

Collecting the details above, when we want to choose a preconditioner for solving the correction equation in PETSc, we suggest the following steps (a sketch of this procedure appears after the list):

• Use the small-size problem to obtain the fill ratio of ILU(6), and use this fill ratio to estimate the number of non-zero elements of ILU(6), nnz, in the original large-size problem.

• If nnz < 120 million, we suggest using ILU(6) to establish the preconditioner. (If the coefficient matrices are all symmetric, we can choose ICC(6) for better performance, but it is quite likely to fail.)

• If nnz > 120 million, we suggest using SSOR with ω between 1.80 and 1.90 to establish the preconditioner.
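A hedged sketch of this selection procedure follows. Since the thesis tables only report small-size fills up to ILU(3), the worked numbers below use the ILU(2) figures of Model PyC from Tables 4.9 and 4.10; in practice one would measure the ILU(6) fill ratio of the small problem as described in the first step.

```c
#include <stdio.h>

/* Suggested selection procedure: estimate the filled nnz of the large problem
   from the fill ratio of a small instance of the same model (Table 4.10 shows
   this ratio is roughly independent of the matrix size), then pick ILU if the
   estimate stays below roughly 120 million non-zeros, and SSOR otherwise. */
int main(void)
{
  /* Small-size Model PyC instance (Table 4.10). */
  double small_original_nnz = 4467183.0;    /* original nnz, size 640,775       */
  double small_filled_nnz   = 14610901.0;   /* nnz after ILU(2), ratio ~ 3.27   */

  /* Full-size Model PyC instance (Table 4.9). */
  double large_original_nnz = 36444135.0;   /* original nnz, size 5,216,783     */

  /* Fill ratio of the small problem, assumed to carry over to the large one. */
  double fill_ratio    = small_filled_nnz / small_original_nnz;
  double estimated_nnz = fill_ratio * large_original_nnz;   /* ~1.19e8, close to
                                                               the observed 119,474,125 */

  if (estimated_nnz < 120e6)
    printf("estimated nnz = %.0f -> use ILU with fill level 6\n", estimated_nnz);
  else
    printf("estimated nnz = %.0f -> use SSOR with omega in [1.80, 1.90]\n",
           estimated_nnz);
  return 0;
}
```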

5 Conclusion

In this thesis, we study the polynomial Jacobi-Davidson method for solving polynomial eigenvalue problems. We test it on five different kinds of quantum dot models giving rise to linear, cubic, and quintic problems, and it works in all cases. We therefore believe this iterative method works for polynomial eigenvalue problems in general.

When using this iterative method, we suggest Scheme OneLS, which almost always has the best performance. If the ILU(6) preconditioner matrix has fewer than 120,000,000 non-zero elements, we suggest using the ILU preconditioner with fill level 6; otherwise, use the SSOR preconditioner with ω between 1.80 and 1.90 in PETSc.

References

[1] W. E. Arnoldi. The principle of minimized iterations in the solution of the matrix eigenvalue problem. Quarterly of Applied Mathematics, 9:17–29, 1951.

[2] Zhaojun Bai and Yangfeng Su. SOAR: A second-order Arnoldi method for the solution of the quadratic eigenvalue problem. SIAM J. Matrix Anal. Appl., 26(3):640–659, 2005.

[3] Satish Balay, William D. Gropp, Lois C. McInnes, and Barry F. Smith. Efficient Management of Parallelism in Object Oriented Numerical Software Libraries. In E. Arge, A. M. Bruaset, and H. P. Langtangen, editors, Modern Software Tools in Scientific Computing, pages 163–202. Birkhäuser Press, 1997.

[4] Satish Balay, William D. Gropp, Lois C. McInnes, and Barry F. Smith. PETSc home page. http://www.mcs.anl.gov/petsc, 2001.

[5] Satish Balay, William D. Gropp, Lois C. McInnes, and Barry F. Smith. PETSc Users Manual. Technical Report ANL-95/11 - Revision 2.1.5, Argonne National Laboratory, 2003.

[6] T. Betcke and H. Voss. A Jacobi-Davidson-type projection method for nonlinear eigenvalue problems. Future Generation Computer Systems, 20:363–372, 2004.

[7] E. A. de Andrada e Silva, G. C. La Rocca, and F. Bassani. Spin-orbit splitting of electronic states in semiconductor asymmetric quantum wells. Physical Review B, 55(24):16293–16299, 1997.

[8] Gene H. Golub and Charles F. Van Loan. Matrix Computations, third edition. The Johns Hopkins University Press, 1996.

[9] Tsung-Min Hwang, Wen-Wei Lin, and Jinn-Liang Liu. Jacobi-Davidson methods for cubic eigenvalue problems. Numer. Linear Algebra Appl., 12:605–624, 2005.

[10] Tsung-Min Hwang, Wen-Wei Lin, Wei-Cheng Wang, and Weichung Wang. Numerical simulation of three dimensional pyramid quantum dot. Journal of Computational Physics, 196(1):208–232, 2004.

[11] Tsung-Min Hwang, Wei-Cheng Wang, and Weichung Wang. Numerical schemes for three-dimensional irregular shape quantum dots over curvilinear coordinate systems. Journal of Computational Physics, 226(1):754–773, 2007.

[12] Tsung-Min Hwang, Wei-Hua Wang, and Weichung Wang. Efficient numerical schemes for electronic states in coupled quantum dots. Journal of Nanoscience and Nanotechnology, 8(7):3695–3709, 2008.

[13] Cornelius Lanczos. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J. Res. Nat. Bur. Standards, 45:255–282, 1950.

[14] Karl Meerbergen. Locking and restarting quadratic eigenvalue solvers. SIAM J. Sci. Comput., 22(5):1814–1839, 2001.

[15] Karl Meerbergen. The quadratic Arnoldi method for the solution of the quadratic eigenvalue problem. SIAM J. Matrix Anal. Appl., 30(4):1463–1482, 2008.

[16] Youcef Saad and Martin H. Schultz. GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Stat. Comput., 7(3):856–869, 1986.

[17] Gerard L. G. Sleijpen, Albert G. L. Booten, Diederik R. Fokkema, and Henk A. van der Vorst. Jacobi-Davidson type methods for generalized eigenproblems and polynomial eigenproblems. BIT Numerical Mathematics, 36(3):595–633, 1996.

[18] Gerard L. G. Sleijpen and Henk A. van der Vorst. A Jacobi-Davidson iteration method for linear eigenvalue problems. SIAM J. Matrix Anal. Appl., 17:401–425, 1996.

[19] G. W. Stewart. Simultaneous iteration for computing invariant subspaces of non-Hermitian matrices. Numer. Math., 25:123–136, 1976.

[20] G. W. Stewart. Matrix Algorithms Volume II: Eigensystems. SIAM, 2001.

[21] H. A. van der Vorst. Bi-CGSTAB: A fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems. SIAM Journal on Scientific and Statistical Computing, 13:631–644, 1992.

[22] H. Voss. An Arnoldi method for nonlinear symmetric eigenvalue problems. In Online Proceedings of the SIAM Conference on Applied Linear Algebra, Williamsburg, http://www.siam.org/meetings/laa03/, 2003.

[23] H. Voss. An Arnoldi method for nonlinear eigenvalue problems. BIT Numerical Mathematics, 44:387–401, 2004.
