Numerical Results - 二次暨有理特徵值問題中高效能Arnoldi型態演算法

Q-SEP2 (2.19.2) R-SEP2 (2.31.2) Solving linear system Keux_u = bu Kepx_p = bp

Matrix-vector products Mfubu, eDubu Mfpbp, eDpbp, Lpcp, R^⊤_pc^⊤_p

Inner products j + 1 j + 1

Saxpy operators j + 1 j + 2

Scale-vector product 1 1

Table 2.1: Computational costs of the j-th Arnoldi step of Algorithm 2.1 for Q-SEP2 and R-Q-SEP2.

g_p = eK_p⁻¹v_j2 for j = 1, . . . , k can be saved in Gu ≡ [ eK_u⁻¹v₁₂ · · · eK_u⁻¹v_m2] and Gp ≡ [ eK_p⁻¹v₁₂ · · · eK_p⁻¹v_m2], respectively, so that the vectors u2, p2 in (2.50) and (2.54) can be computed by u2 = Guz and p2 = Gpz directly. Hence, it requires the same computational costs for computing u1, u2 in (2.49) and (2.50), as well as, p1, p2 in (2.53) and (2.54), respectively. Consequently, the computational costs of Q-SEP1 for the convergence test in Algorithm 2.2 need one extra matrix-vector product eKuv_m+1,2 than those of Q-SEP2 in computing ζ1 and ζ2. Similarly, the computational costs of R-SEP1 for the convergence test in Algorithm 2.3 need one extra matrix-vector product eKpv_m+1,2 than those of R-SEP2 in computing ξ1 and ξ2. Therefore, we conclude that Algorithm 2.2 for Q-SEP1 and Q-SEP2, as well as, Algorithm 2.3 for R-SEP1 and R-SEP2, respectively, almost have the same computational costs provided that they have the same outer iterations.

2.5 Numerical Results

We conduct numerical experiments to evaluate performance and accuracy of the eigenvalue solvers described in Section 2.4. To distinguish between various eigenvalue problems, we use notations Q1, Q2, R1 and R2 defined as follows:

• Q1: Applying Algorithm 2.2 to solve the QEP (2.13) with Q-SEP1 in (2.19.1).

2.5 Numerical Results

• Q2: Applying Algorithm 2.2 to solve the QEP (2.13) with Q-SEP2 in (2.19.2).

• R1: Applying Algorithm 2.3 to solve the REP (2.20) with R-SEP1 in (2.31.1).

• R2: Applying Algorithm 2.3 to solve the REP (2.20) with R-SEP2 in (2.31.2).

All computations are carried out in MATLAB 2009a on a HP workstation with an Intel Quad-Core Xeon X5570 2.93GHz and 72 GB main memory, using IEEE double-precision floating-point arithmetic. We apply Algorithms 2.2 and 2.3 to solve the following examples arising in fluid-solid systems. The order m of Arnoldi decomposition in Line 3 of Algorithm 2.1 is set m = 40, the maximum number rmax

of Schur-restartings is set rmax= 15 and the number of desired eigenpairs is k = 10.

The relative residuals of approximate eigenpairs (λi, ui) and (λi, pi) computed by Q1 and Q2, as well as, R1 and R2 are, respectively, defined by

kQ(λⁱ)uik

ϕ(λi)kuⁱk and kR(λⁱ)pik ψ(λi)kpⁱk,

where ϕ(λi) and ψ(λi) are given in Algorithm 2.2 and 2.3, respectively. Tolerances for relative residuals of QEPs and REPs are chosen by tolQ = tolR= 5× 10⁻¹⁵. The linear systems in Algorithms 2.2 and 2.3 are solved by LU-factorization with the shift value σ =−25 + 600πi. Fronbenius norm for matrices and 2-norm for vectors are used.

Example 2.1. [6] We take the geometrical data: the domain Ω = [0m, 1m] × [−0.75m, 0m], Γ^A= [0m, 1m]× {0m} given in Figure 2.1(i) and the following physical data: ρ = 1kg/m³, c = 340 m/s, α = 5× 10⁴ N/m³, and β = 200 Ns/m³.

The rectangular domain Ω is uniformly partitioned into nℓ by nw rectangles and each rectangle is further refined into two triangles, see Figure 2.1(ii). The dimen-sions of coefficient matrices in QEP (2.13) and REP (2.20) are (3nℓ− 1) × n^w and

2.5 Numerical Results

0.75 m b=

1.00 m a=

absorbing wall ΓA

ΓR n

rigid walls

(i) Fluid in a cavity with one absorbing wall. (ii) Initial mesh.

Figure 2.1: Fluid in a cavity with one absorbing wall and initial mesh

(nℓ+ 1)×(n^w+ 1), respectively. Figure 2.2 plots the analytic solutions of the desired eigenvalues λ1, . . . , λ10 of (2.5)–(2.8) (see [6]) with the lowest positive vibration fre-quencies satisfying 0 < ^Im(λ_2πⁱ⁾ < 600Hz.

Convergence test: We first demonstrate convergence rates of Q2 and R2 while computing the desired eigenvalues in Figure 2.2. To measure the convergence rate, we run the test over the five successively refined meshes (See the first column of Table 2.2) and then calculate the rates by

rate[i,j]= log₂

|λ^[i,j]− λ^[i,j+1]|

|λ^[i,j+1]− λ^[i,j+2]|

, for i = 1, . . . , 10, j = 1, 2, 3,

where λ[i,j]for j = 1, . . . , 5 denote the approximate eigenvalues computed by Q2 and R2 corresponding to λi obtained from the meshes described in Table 2.2. The 5-th and the 6-th columns of Table 2.2 illustrate the quadratic convergence of rate[1,j]

j = 1, 2, 3 for λ1 of QEP (2.13) and REP (2.20), respectively. In our numerical experiment, the convergence rate are always close to 2 for all desired eigenvalues, λi, i = 1, . . . , 10, computed by Q2 and R2 as well as Q1 and R1.

2.5 Numerical Results

(nℓ, nw) Matrix size (QEP) Matrix size (REP) Conv. rate (3nℓ− 1) × n^w (nℓ+ 1)× (n^w+ 1) λ1 Q2 R2

( 48, 36) 5, 148 1, 813

( 96, 72) 20, 664 7, 081

(192, 144) 82, 800 27, 985 rate_[1,1] 1.9979 2.0010 (384, 288) 331, 488 111, 265 rate[1,2] 1.9995 2.0003 (768, 576) 1, 326, 528 443, 713 rate[1,3] 1.9999 2.0001

Table 2.2: Dimension information and convergence rates of λ1.

Normwise scaling of QEP: Balancing norms of coefficient matrices is an im-portant issue [66] before solving a QEP of the form

P (λ)x≡ (λ²P2+ λP1+ P0)x = 0. (2.55)

In [15] authors give an elegant way to scale the norms of coefficient matrices of (2.55) as follows. Define

P (ν)xb ≡ (ν²Pb2+ ν bP1+ bP0)x = 0

with ν = λ/ζ, bP2 = ζ²ηP2, bP1 = ζηP1 and bP0 = ηP0, where ζ and η are scaling fac-tors. Taking ζ and η as ζ∗ =p

γ0/γ2 and η∗ = 2/(γ0+ γ1ζ∗) with γ2 :=kP2k2, γ1 :=

kP1k2, γ0 :=kP0k2, it is proved in [15] that the problem

minζ,η maxn

|k bP2k² − 1|, |k bP1k² − 1|, |k bP0k² − 1|o

achieves the optimum at ζ_∗ and η_∗. In our implementation, the values of γi, for i = 0, 1, 2 are computed by γ₂ = kfMuk^F, γ₁ = k eDuk^F, γ₀ = k eKuk^F and γ₂ = kfMpk^F, γ1 = k eDpk^F, γ0 = k eKpk^F for QEP (2.15) and REP (2.29), respectively.

We denote “#It” the number of Schur-restartings (outer iterations). In Table 2.3, we show #Its for computing 10 desired eigenvalues of Example 2.1 with (nℓ, nw) =

2.5 Numerical Results

−3500 −300 −250 −200 −150 −100 −50 0

500 1000 1500 2000 2500 3000 3500 4000

real part

imaginary part

λ1

λ2

λ3

λ4

λ5

λ6

λ7

λ8

λ9

λ10

Figure 2.2: The distribution of the ten desired eigenvalues λ1, . . . , λ10.

(768, 576) by Q1, Q2, R1 and R2 with/without scaling. The tolerances tolQ and tolR for relative residuals are chosen to be 5× 10⁻¹⁵. We see that the convergence rate of scaled Q-SEPs or SEPs is faster than that of unscaled Q-SEPs or R-SEPs. The performance of Q2 and R2 is also better than that of Q1 and R1, respectively. In the case of unscaled REP, the norms of fMp, eDp and eKp in (2.24)–

(2.26) are O(10⁻¹⁰),O(10⁻⁵) and O(1), respectively. Since the norms of coefficient matrices vary too much, R1 can even fail to converge to 10 eigenpairs after 15 outer iterations.

Q1 Q2 R1 R2

#It (scaled) 3 2 4 3

#It (unscaled) 4 3 15 3

Table 2.3: #Its for λ1, . . . , λ10 of Q-SEPs and R-SEPs with/without scaling.

2.5 Numerical Results

−3500 −300 −250 −200 −150 −100 −50 0

500 1000 1500 2000 2500 3000 3500 4000

Real part

Imaginary part

( 6, 5 ) ( 3, 2 )

( 6, 4 ) ( 2, 2 )

( 4, 4 ) ( 2, 2 )

Eigenvalues Shift values

Figure 2.3: The #Its of Q1 and Q2 with different shift values. “o” denotes desired eigenvalues λ1, . . . , λ10. “(i, j)” denotes the #Its for Q1 and Q2, respectively.

10⁻¹⁸ 10⁻¹⁷ 10⁻¹⁶ 10⁻¹⁵ 10⁻¹⁴ 10⁻¹³ 10⁻¹²

Relative Residual

R 1 Q 1 Q 2 R 2

λ₃

λ₁ λ₂ λ₄ λ₅ λ₆ λ₇ λ₈ λ₉ λ₁₀

Figure 2.4: The relative residuals of computed eigenpairs, obtained by Q1, Q2 for QEP (2.13) and R1, R2 for REP (2.20) with (nℓ, nw) = (768, 576).

2.5 Numerical Results

Q2 R2 TR2

TQ2

(nℓ, nw) #It TQ2 #It TR2

( 48, 36) 2 1.316 2 0.471 0.36 ( 96, 72) 2 7.717 2 2.387 0.31 (192, 144) 2 55.27 2 14.95 0.27 (384, 288) 2 567.8 2 134.0 0.24 (768, 576) 2 8152 2 1645 0.20

Table 2.4: Iteration numbers and CPU time for Q2 and R2.

No spurious eigenmodes: In [6], it has been proved that there are no spurious eigenmodes for the discretization based on Raviart-Thomas finite elements. We compute twenty desired eigenvalues of QEP (2.13) and REP (2.20) by Q2 and R2, respectively, with scaling and various mesh sizes as shown in Table 2.2 (we computed 20 instead of 10 eigenvalues to be better confirmed). The desired eigenvalues of REP are in one-to-one correspondence to those of QEP which match well with relative error less than 10⁻⁶, that is, no spurious eigenmodes ever appear. We numerically conclude that there are no spurious eigenmodes for the discretization in terms of pressure nodal finite elements.

Null space considerations: Theorem 2.1 shows that the dimension of the null space of QEP (2.13) is equal to the number of interior nodes, i.e., (nℓ− 1)(nw− 1).

In order to observe the interference of such a large null space in the convergence of Q1 and Q2, we give six different shift values denoted by the “+” in Figure 2.3 to observe variation in the #Its for Q1 and Q2. The integer pair (i, j) under each shift value “+” denotes the #Its for Q1 and Q2, respectively. The results in Figure 2.3 demonstrate that the #It needed decreases, as the shift value σ is chosen relatively far away from zero.

Comparison of pressure and displacement formulation: In this para-graph, we shall discuss the advantages of using the nodal pressure finite elements with various mesh sizes described in Table 2.2. The notations “TQ2” and “TR2”

de-2.5 Numerical Results

note the total CPU time for Q2 and R2, respectively. We summarize the results as follows:

• Accuracy of eigenpairs: From Remark 2.3, the upper bound for relative resid-ual of the approximate eigepairs of QEP (2.13) (or REP (2.20)) by using Q-SEP2 (2.19.2) (or R-SEP2 (2.31.2)) is much smaller than that by using Q-SEP1 (2.19.1) (or R-SEP1 (2.31.1)). On applying Q1 and Q2 to solve QEP (2.13) with #It = 2, in Figure 2.4, we see that the relative residuals of eigenpairs corresponding to λ₄ and λ₅ computed by Q2 are improved by about 1 significant digit than those by Q1. The other eigenpairs almost have the same accuracy. On applying R1 and R2 to solve REP (2.20) with #It = 2, in Figure 2.4, we see that the relative residuals of eigenpairs computed by R2 are improved by about 2 to 4 significant digits than those by R1.

• Comparison R2 with Q2: From Subsection 2.4.2 we see that Q1 and Q2, as well as, R1 and R2 have the same computational costs, respectively. From Figure 2.4, we favor applying Q2 and R2 to solve QEP (2.13) and REP (2.20), respectively. From column 12 of Table 2.4, we see that the CPU time for solving the REP (2.20) by R2 is only 1/5 to 1/3 of that for solving the QEP (2.13) by Q2. The accuracy of the computed eigenpairs for REP (2.20) is also better than that of QEP (2.13). These results tell us that using R2 to solve nodal pressure finite elements for the discrete problem (2.12) is better than that using Q2 to solve Raviart-Thomas displacement finite elements for the discrete problem (2.11).

We now want to apply our methods to a more complicated configuration in which the absorbing walls are located on three sides.

Example 2.2. We use the same geometric data and physical data in Example 2.1 except that the absorbing wall is extended to one half of the rigid walls in the left

在文檔中二次暨有理特徵值問題中高效能Arnoldi型態演算法 (頁 48-56)