GCG-type Methods for Nonsymmetric Linear Systems

Tsung-Ming Huang

Department of Mathematics National Taiwan Normal University

December 6, 2011

Outline

1 GCG method (Generalized Conjugate Gradient)

2 BCG method (A: nonsymmetric)

3 The polynomial equivalent method of the CG method

4 Squaring the CG algorithm

5 Bi-CGSTAB: A Fast and Smoothly Converging Variant of Bi-CG for the Solution of Nonsymmetric Linear Systems

T.M. Huang (NTNU) GCG-type Methods for Nonsymmetric LS December 6, 2011 2 / 45

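Recall: $A$ is s.p.d. Consider the quadratic functional

$$F(x) = \tfrac{1}{2} x^T A x - x^T b.$$

Then

$$A x^* = b \iff \min_{x \in \mathbb{R}^n} F(x) = F(x^*).$$

Consider

$$\varphi(x) = \tfrac{1}{2}(b - Ax)^T A^{-1}(b - Ax) = F(x) + \tfrac{1}{2} b^T A^{-1} b, \tag{1}$$

where $\tfrac{1}{2} b^T A^{-1} b$ is a constant. Then

$$A x^* = b \iff \varphi(x^*) = \min_{x \in \mathbb{R}^n} \varphi(x) = \Big[\min_{x \in \mathbb{R}^n} F(x)\Big] + \tfrac{1}{2} b^T A^{-1} b.$$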
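Identity (1) follows by expanding the quadratic form and using the symmetry of $A$ (and hence of $A^{-1}$). A short worked derivation:

```latex
\begin{aligned}
\varphi(x) &= \tfrac12 (b - Ax)^T A^{-1} (b - Ax) \\
  &= \tfrac12 b^T A^{-1} b - \tfrac12 b^T A^{-1} A x
     - \tfrac12 x^T A^T A^{-1} b + \tfrac12 x^T A^T A^{-1} A x \\
  &= \tfrac12 b^T A^{-1} b - x^T b + \tfrac12 x^T A x
     \qquad (\text{since } A^T = A) \\
  &= F(x) + \tfrac12 b^T A^{-1} b .
\end{aligned}
```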

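Algorithm: Conjugate Gradient method (CG-method)

Input: s.p.d. $A$, $b \in \mathbb{R}^n$, $x_0 \in \mathbb{R}^n$, and $r_0 = b - A x_0 = p_0$.

1: Set $k = 0$.

2: repeat

3: Compute $\alpha_k = \dfrac{p_k^T r_k}{p_k^T A p_k}$;

4: Compute $x_{k+1} = x_k + \alpha_k p_k$;

5: Compute $r_{k+1} = r_k - \alpha_k A p_k = b - A x_{k+1}$;

6: Compute $\beta_k = \dfrac{-r_{k+1}^T A p_k}{p_k^T A p_k}$;

7: Compute $p_{k+1} = r_{k+1} + \beta_k p_k$;

8: Set $k = k + 1$;

9: until $r_k = 0$

The quotient defining $\beta_k$ can be simplified:

Numerator: $r_{k+1}^T A p_k = r_{k+1}^T (r_k - r_{k+1})/\alpha_k = -(r_{k+1}^T r_{k+1})/\alpha_k$.

Denominator: $p_k^T A p_k = (r_k^T + \beta_{k-1} p_{k-1}^T)(r_k - r_{k+1})/\alpha_k = (r_k^T r_k)/\alpha_k$.

Hence $\beta_k = r_{k+1}^T r_{k+1} / r_k^T r_k$.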
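The CG algorithm above can be sketched in NumPy as follows. This is a minimal illustration, assuming a dense s.p.d. array `A`; the function name `conjugate_gradient` and the parameters `tol` and `max_iter` are illustrative choices, not part of the slides. It uses the simplified form of $\beta_k$ obtained from the numerator and denominator identities.

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Classical CG for s.p.d. A (sketch; tol/max_iter are assumptions)."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    r = b - A @ x                 # r_0 = b - A x_0
    p = r.copy()                  # p_0 = r_0
    rho = r @ r                   # r_k^T r_k
    max_iter = 10 * n if max_iter is None else max_iter
    for _ in range(max_iter):
        if np.sqrt(rho) <= tol:   # practical replacement for "until r_k = 0"
            break
        Ap = A @ p
        alpha = rho / (p @ Ap)    # p_k^T r_k equals r_k^T r_k for CG
        x = x + alpha * p
        r = r - alpha * Ap
        rho_new = r @ r
        beta = rho_new / rho      # simplified beta_k
        p = r + beta * p
        rho = rho_new
    return x

# Small usage example on a 2x2 s.p.d. system.
A_demo = np.array([[4.0, 1.0], [1.0, 3.0]])
b_demo = np.array([1.0, 2.0])
x_demo = conjugate_gradient(A_demo, b_demo)
```

Only matrix-vector products and inner products with `A` are used, so the same loop works unchanged if `A` is replaced by a sparse matrix type that supports `@`.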

Remark 1

The CG method does not require any user-supplied parameters. It only needs matrix-vector products and inner products of vectors, so it never modifies $A$ and preserves its sparse structure.

The vectors $r_k$ and $p_k$ generated by the CG method satisfy:

$$p_i^T r_k = (p_i, r_k) = 0, \quad i < k,$$

$$r_i^T r_j = (r_i, r_j) = 0, \quad i \neq j,$$

$$p_i^T A p_j = (p_i, A p_j) = 0, \quad i \neq j.$$

Moreover, $x_{k+1} = x_0 + \sum_{i=0}^{k} \alpha_i p_i$ minimizes $F(x)$ over $x \in x_0 + \langle p_0, \dots, p_k \rangle$.

GCG method (Generalized Conjugate Gradient)

The GCG method is designed to minimize the residual of the linear system with respect to a chosen functional. In the conjugate gradient method we take

$$\varphi(x) = \tfrac12 (b - Ax)^T A^{-1} (b - Ax) = \tfrac12 r^T A^{-1} r = \tfrac12 \|r\|_{A^{-1}}^2,$$

where $\|x\|_{A^{-1}} = \sqrt{x^T A^{-1} x}$.

Let $A$ be a nonsymmetric matrix. Consider the functional

$$f(x) = \tfrac12 (b - Ax)^T P (b - Ax),$$

where $P$ is s.p.d. Then $f(x) > 0$ unless $x = x^* = A^{-1} b$, in which case $f(x^*) = 0$, so $x^*$ minimizes the functional $f(x)$.

Different choices of $P$:

(i) $P = A^{-1}$ ($A$ is s.p.d.) ⇒ CG method (classical).

(ii) $P = I$ ⇒ GCR method (Generalized Conjugate Residual):

$$f(x) = \tfrac12 (b - Ax)^T (b - Ax) = \tfrac12 \|r\|_2^2.$$

Here the residuals $\{r_i\}$ are $A$-conjugate.

(iii) Consider $M^{-1} A x = M^{-1} b$. Take $P = M^T M > 0$ ⇒ GCGLS method (Generalized Conjugate Gradient Least Squares).

(iv) Similar to (iii), take $P = (A + A^T)/2$ (note: $P$ need not be positive definite) and $M = (A + A^T)/2$; this gives the GCG method of Concus, Golub and Widlund. In general $P$ need not be positive definite, but it must be symmetric ($P^T = P$). When $P$ is indefinite, the minimality property does not hold.

Let

$$(x, y)_o = x^T P y, \quad\text{so that}\quad (x, y)_o = (y, x)_o.$$

Algorithm: GCG method

Input: $A$, $b \in \mathbb{R}^n$, $x_0 \in \mathbb{R}^n$, and $r_0 = b - A x_0 = p_0$.

1: Set $k = 0$.

2: repeat

3: Compute $\alpha_k = (r_k, A p_k)_o / (A p_k, A p_k)_o$;

4: Compute $x_{k+1} = x_k + \alpha_k p_k$;

5: Compute $r_{k+1} = r_k - \alpha_k A p_k = b - A x_{k+1}$;

6: Compute $\beta_i^{(k)} = -(A r_{k+1}, A p_i)_o / (A p_i, A p_i)_o$, for $i = 0, 1, \dots, k$;

7: Compute $p_{k+1} = r_{k+1} + \sum_{i=0}^{k} \beta_i^{(k)} p_i$;

8: Set $k = k + 1$;

9: until $r_k = 0$
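A minimal NumPy sketch of the GCG iteration above, assuming dense arrays and an explicitly stored symmetric matrix `P`. The function name `gcg`, the stopping tolerance `tol`, and the cap `max_iter` are illustrative assumptions. All previous directions $p_i$ and products $A p_i$ are kept, which is exactly the storage cost the later remarks discuss. With `P` equal to the identity the iteration corresponds to the GCR choice.

```python
import numpy as np

def gcg(A, b, P, x0=None, tol=1e-10, max_iter=50):
    """GCG with inner product (x, y)_o = x^T P y (sketch, full recurrence)."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    r = b - A @ x
    p = r.copy()
    dirs, A_dirs = [], []              # stored p_i and A p_i
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol:   # practical stand-in for r_k = 0
            break
        Ap = A @ p
        dirs.append(p)
        A_dirs.append(Ap)
        PAp = P @ Ap
        alpha = (r @ PAp) / (Ap @ PAp)         # line 3
        x = x + alpha * p                      # line 4
        r = r - alpha * Ap                     # line 5
        Ar = A @ r
        p_new = r.copy()
        for p_i, Ap_i in zip(dirs, A_dirs):    # lines 6-7
            PAp_i = P @ Ap_i
            beta_i = -(Ar @ PAp_i) / (Ap_i @ PAp_i)
            p_new = p_new + beta_i * p_i
        p = p_new
    return x

# Usage: nonsymmetric A with positive definite symmetric part, P = I.
A_demo = np.array([[3.0, 1.0, 0.0], [-1.0, 3.0, 1.0], [0.0, -1.0, 3.0]])
b_demo = np.array([1.0, 2.0, 3.0])
x_demo = gcg(A_demo, b_demo, np.eye(3))
```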

In the GCG method, the coefficients $\{\beta_i^{(k)}\}_{i=0}^{k}$ are chosen so that

$$(r_{k+1}, A p_i)_o = 0, \quad i \le k, \tag{2a}$$

$$(r_{k+1}, A r_i)_o = 0, \quad i \le k, \tag{2b}$$

$$(A p_i, A p_j)_o = 0, \quad i \neq j. \tag{2c}$$

Theorem 1

$x_{k+1} = x_0 + \sum_{i=0}^{k} \alpha_i p_i$ minimizes $f(x) = \tfrac12 (b - Ax)^T P (b - Ax)$ over $x \in x_0 + \langle p_0, \dots, p_k \rangle$, where $P$ is s.p.d.

(The proof is the same as that of the classical CG method.)

If $P$ is indefinite, which is allowed in the GCG method, then the minimality property does not hold; $x_{k+1}$ is only a critical point of $f(x)$ over $x \in x_0 + \langle p_0, \dots, p_k \rangle$.

Question

Can the GCG method break down, i.e., can $\alpha_k$ in the GCG method be zero?

Consider the numerator of $\alpha_k$:

$$\begin{aligned}
(r_k, A p_k)_o &= (r_k, A r_k)_o && \text{[by Line 7 of the GCG algorithm and (2a)]} \\
&= r_k^T P A r_k \\
&= r_k^T A^T P r_k && \text{[take the transpose]} \\
&= r_k^T \, \frac{P A + A^T P}{2} \, r_k.
\end{aligned} \tag{3}$$

From (3), if $P A + A^T P$ is positive definite, then $\alpha_k \neq 0$ unless $r_k = 0$.

Hence, if $A$ is such that $P A + A^T P$ is positive definite, the GCG method cannot break down.
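The condition can be checked numerically for small dense problems. The sketch below, with the hypothetical helper name `gcg_breakdown_free`, forms $Z = (PA + A^T P)/2$ and tests its smallest eigenvalue; it also confirms on one vector that $r^T P A r = r^T Z r$, as in (3). For large sparse matrices an eigenvalue test like this is usually impractical, so it is meant only as an illustration.

```python
import numpy as np

def gcg_breakdown_free(A, P):
    """True if Z = (P A + A^T P)/2 is positive definite (dense check)."""
    Z = 0.5 * (P @ A + A.T @ P)
    return bool(np.linalg.eigvalsh(Z).min() > 0.0)

A_demo = np.array([[3.0, 1.0, 0.0], [-1.0, 3.0, 1.0], [0.0, -1.0, 3.0]])
P_demo = np.eye(3)
r_demo = np.array([1.0, -2.0, 0.5])

Z_demo = 0.5 * (P_demo @ A_demo + A_demo.T @ P_demo)
numerator = r_demo @ P_demo @ A_demo @ r_demo   # r^T P A r
quadratic_form = r_demo @ Z_demo @ r_demo       # r^T Z r, same value by (3)
```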

From Lines 5 and 7 of the GCG algorithm, $r_k$ and $p_k$ can be written as

$$r_k = \psi_k(A) r_0, \tag{4a}$$

$$p_k = \varphi_k(A) r_0, \tag{4b}$$

where $\psi_k$ and $\varphi_k$ are polynomials of degree $\le k$ with $\psi_k(0) = 1$. From (4a) and (2b) it follows that

$$(r_{k+1}, A^{i+1} r_0)_o = 0, \quad i = 0, 1, \dots, k. \tag{5}$$

From (4b) and Line 6 of the GCG algorithm, the numerator of $\beta_i^{(k)}$ can be expressed as

$$(A r_{k+1}, A p_i)_o = r_{k+1}^T A^T P A p_i = r_{k+1}^T A^T P A \varphi_i(A) r_0. \tag{6}$$

Suppose $A^T P$ can be expressed as

$$A^T P = P \, \theta_s(A), \tag{7}$$

where $\theta_s$ is some polynomial of degree $s$. Then (6) can be written as

$$\begin{aligned}
(A r_{k+1}, A p_i)_o &= r_{k+1}^T A^T P A \varphi_i(A) r_0 \\
&= r_{k+1}^T P \, \theta_s(A) A \varphi_i(A) r_0 \\
&= (r_{k+1}, A \theta_s(A) \varphi_i(A) r_0)_o.
\end{aligned} \tag{8}$$

From (5) we know that if $s + i \le k$, then (8) is zero, i.e., $(A r_{k+1}, A p_i)_o = 0$. Hence

$$\beta_i^{(k)} = 0, \quad i = 0, 1, \dots, k - s.$$

However, $s$ is small only in special cases.

For instance:

(i) In the classical CG method, $A$ is s.p.d. and $P = A^{-1}$. Then

$$A^T P = A A^{-1} = I = A^{-1} A = A^{-1} \theta_1(A),$$

where $\theta_1(x) = x$, $s = 1$. So $\beta_i^{(k)} = 0$ for all $i + 1 \le k$; only $\beta_k^{(k)} \neq 0$.

(ii) The GCG method proposed by Concus, Golub and Widlund solves $M^{-1} A x = M^{-1} b$ ($A$ nonsymmetric), where $M = (A + A^T)/2$ and $P = (A + A^T)/2$ ($P$ may be indefinite).

Check condition (7):

$$(M^{-1} A)^T P = A^T M^{-1} M = A^T = M (2I - M^{-1} A) = P (2I - M^{-1} A).$$

Then

$$\theta_s(M^{-1} A) = 2I - M^{-1} A,$$

where $\theta_1(x) = 2 - x$, $s = 1$. Thus $\beta_i^{(k)} = 0$, $i = 0, 1, \dots, k - 1$.

Therefore only $r_{k+1}$ and $p_k$ are needed to construct $p_{k+1}$.
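The identity used in case (ii) can be verified numerically. The sketch below uses an arbitrary small nonsymmetric test matrix (an assumption chosen only for illustration) and checks that both $(M^{-1}A)^T P$ and $P(2I - M^{-1}A)$ reproduce $A^T$ when $M = P = (A + A^T)/2$.

```python
import numpy as np

# Arbitrary nonsymmetric test matrix (illustrative assumption).
A_demo = np.array([[4.0, 1.0, 0.0],
                   [-2.0, 5.0, 1.0],
                   [0.0, 3.0, 6.0]])
n = A_demo.shape[0]

M = 0.5 * (A_demo + A_demo.T)        # symmetric part, also used as P
B = np.linalg.solve(M, A_demo)       # B = M^{-1} A

lhs = B.T @ M                        # (M^{-1} A)^T P
rhs = M @ (2.0 * np.eye(n) - B)      # P * theta_1(M^{-1} A), theta_1(x) = 2 - x
```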

Check the condition on $A^T P + P A$:

$$(M^{-1} A)^T M + M M^{-1} A = A^T + A,$$

which may be indefinite, so the method can break down.

(iii) The other case with $s = 1$ is BCG (BiCG); see the next section.

Remark 2

Except for the above three cases, the degree $s$ is usually very large. That is, all directions $p_i$ ($i = 0, 1, \dots, k$) must be saved in order to construct $p_{k+1}$ satisfying the conjugate orthogonalization condition (2c). In the GCG method, each iteration step needs to store $2k + 5$ vectors ($x_{k+1}$, $r_{k+1}$, $p_{k+1}$, $\{A p_i\}_{i=0}^{k}$, $\{p_i\}_{i=0}^{k}$) and to compute $k + 3$ inner products (here $k$ is the iteration number). Hence, if $k$ is large, the storage and the computational cost become unacceptably large, so the GCG method in general has practical difficulties. Methods such as GCR and GMRES (by Saad) preserve optimality ($P > 0$) but are too expensive ($s$ is very large).

Modification:

(i) Restarted: If the GCG method has not converged after $m + 1$ iterations, take $x_{k+1}$ as the new $x_0$ and restart the GCG method. At most $2m + 5$ vectors need to be stored.

(ii) Truncated: The most expensive step of the GCG method is computing $\beta_i^{(k)}$, $i = 0, 1, \dots, k$, so that $p_{k+1}$ satisfies (2c). We relax condition (2c) and require only that $p_{k+1}$ and the $m$ most recent directions $\{p_i\}_{i=k-m+1}^{k}$ satisfy the conjugate orthogonalization condition.

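BCG method (A: nonsymmetric)

BCG is similar to CG in that it does not need to save the previous search directions. However, the residual norm in BCG does not satisfy a minimization property.

Solve $Ax = b$ together with an auxiliary (phantom) system $A^T y = c$. Let

$$\tilde A = \begin{bmatrix} A & 0 \\ 0 & A^T \end{bmatrix}, \quad \tilde x = \begin{bmatrix} x \\ y \end{bmatrix}, \quad \tilde b = \begin{bmatrix} b \\ c \end{bmatrix},$$

and consider

$$\tilde A \tilde x = \tilde b.$$

Take

$$P = \begin{bmatrix} 0 & A^{-T} \\ A^{-1} & 0 \end{bmatrix} \quad (P = P^T).$$

This implies $\tilde A^T P = P \tilde A$.

From (7) we see that $s = 1$ for $\tilde A \tilde x = \tilde b$. Hence only one direction $p_k$ needs to be saved, as in the classical CG method.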
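A small numerical check of the block construction, assuming a dense nonsingular test matrix chosen only for illustration. It builds $\tilde A$ and $P$ with `np.block`, then confirms that $P$ is symmetric, that $\tilde A^T P = P \tilde A$, and that $P$ has eigenvalues of both signs (so it is indefinite).

```python
import numpy as np

A_demo = np.array([[3.0, 1.0, 0.0],
                   [-1.0, 3.0, 1.0],
                   [0.0, -1.0, 3.0]])   # nonsingular, nonsymmetric (assumption)
n = A_demo.shape[0]
A_inv = np.linalg.inv(A_demo)
zero = np.zeros((n, n))

A_tilde = np.block([[A_demo, zero], [zero, A_demo.T]])
P = np.block([[zero, A_inv.T], [A_inv, zero]])

commutes = np.allclose(A_tilde.T @ P, P @ A_tilde)   # A~^T P = P A~
eigs = np.linalg.eigvalsh(P)                         # real, since P = P^T
```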

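Apply the GCG method to $\tilde A \tilde x = \tilde b$

1: Given $\tilde x_0 = \begin{bmatrix} x_0 \\ \hat x_0 \end{bmatrix}$, compute $\tilde p_0 = \tilde r_0 = \tilde b - \tilde A \tilde x_0 = \begin{bmatrix} r_0 \\ \hat r_0 \end{bmatrix}$.

2: Set $k = 0$.

3: repeat

4: Compute $\alpha_k = (\tilde r_k, \tilde A \tilde p_k)_o / (\tilde A \tilde p_k, \tilde A \tilde p_k)_o$;

5: Compute $\tilde x_{k+1} = \tilde x_k + \alpha_k \tilde p_k$;

6: Compute $\tilde r_{k+1} = \tilde r_k - \alpha_k \tilde A \tilde p_k$;

7: Compute $\beta_k = -(\tilde A \tilde r_{k+1}, \tilde A \tilde p_k)_o / (\tilde A \tilde p_k, \tilde A \tilde p_k)_o$;

8: Compute $\tilde p_{k+1} = \tilde r_{k+1} + \beta_k \tilde p_k$;

9: Set $k = k + 1$;

10: until $\tilde r_k = 0$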

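Simplification (BCG method)

1: Given $x_0$, compute $p_0 = r_0 = b - A x_0$.

2: Choose $\hat r_0$, $\hat p_0 = \hat r_0$.

3: Set $k = 0$.

4: repeat

5: Compute $\alpha_k = (\hat r_k, r_k) / (\hat p_k, A p_k)$;

6: Compute $x_{k+1} = x_k + \alpha_k p_k$;

7: Compute $r_{k+1} = r_k - \alpha_k A p_k$, $\hat r_{k+1} = \hat r_k - \alpha_k A^T \hat p_k$;

8: Compute $\beta_k = (\hat r_{k+1}, r_{k+1}) / (\hat r_k, r_k)$;

9: Compute $p_{k+1} = r_{k+1} + \beta_k p_k$, $\hat p_{k+1} = \hat r_{k+1} + \beta_k \hat p_k$;

10: Set $k = k + 1$;

11: until $r_k = 0$

From the above we have

$$(\tilde A \tilde p_k, \tilde A \tilde p_k)_o = \begin{bmatrix} p_k^T A^T & \hat p_k^T A \end{bmatrix} \begin{bmatrix} 0 & A^{-T} \\ A^{-1} & 0 \end{bmatrix} \begin{bmatrix} A p_k \\ A^T \hat p_k \end{bmatrix} = 2 (\hat p_k, A p_k).$$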
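The simplified BCG iteration can be sketched in NumPy as below. The function name `bicg`, the default shadow residual $\hat r_0 = r_0$, the tolerance `tol`, and the explicit breakdown check are illustrative assumptions added for a runnable example; the slides leave the choice of $\hat r_0$ open.

```python
import numpy as np

def bicg(A, b, x0=None, r_hat0=None, tol=1e-10, max_iter=100):
    """BCG sketch: needs products with both A and A^T."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    r = b - A @ x
    r_hat = r.copy() if r_hat0 is None else np.array(r_hat0, dtype=float)
    p, p_hat = r.copy(), r_hat.copy()
    rho = r_hat @ r                          # (r^_k, r_k)
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol:         # stand-in for r_k = 0
            break
        Ap = A @ p
        sigma = p_hat @ Ap                   # (p^_k, A p_k)
        if rho == 0.0 or sigma == 0.0:
            raise ZeroDivisionError("BCG breakdown")
        alpha = rho / sigma
        x = x + alpha * p
        r = r - alpha * Ap
        r_hat = r_hat - alpha * (A.T @ p_hat)
        rho_new = r_hat @ r
        beta = rho_new / rho
        p = r + beta * p
        p_hat = r_hat + beta * p_hat
        rho = rho_new
    return x

A_demo = np.array([[3.0, 1.0, 0.0], [-1.0, 3.0, 1.0], [0.0, -1.0, 3.0]])
b_demo = np.array([1.0, 2.0, 3.0])
x_demo = bicg(A_demo, b_demo)
```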

The BCG method satisfies the following relations:

$$r_k^T \hat p_i = \hat r_k^T p_i = 0, \quad i < k, \tag{9a}$$

$$p_k^T A^T \hat p_i = \hat p_k^T A p_i = 0, \quad i < k, \tag{9b}$$

$$r_k^T \hat r_i = \hat r_k^T r_i = 0, \quad i < k. \tag{9c}$$

Definition 2

(9c) and (9b) are called the biorthogonality and biconjugacy conditions, respectively.

Property:

(i) In the BCG method, the residual of the linear system does not satisfy a minimization property, because $P$ is taken to be

$$P = \begin{bmatrix} 0 & A^{-T} \\ A^{-1} & 0 \end{bmatrix},$$

which is symmetric but not positive definite. The minimum of the functional $f(x)$ may not exist.

(ii) The BCG method can break down, because $Z = (\tilde A^T P + P \tilde A)/2$ is not positive definite. By the discussion above, $\alpha_k$ can be zero. In practice, however, this rarely occurs.

GCG

GCR, GCR(k)     BCG

Orthomin(k)     CGS

Orthodir        BiCGSTAB

Orthores        QMR

GMRES(m)        TFQMR

FOM             Axelsson LS

The polynomial equivalent method of the CG method

Consider first the case where $A$ is s.p.d.

CG-method

1: $r_0 = b - A x_0 = p_0$.

2: Set $k = 0$.

3: repeat

4: $\alpha_k = \dfrac{(r_k, p_k)}{(p_k, A p_k)} = \dfrac{(r_k, r_k)}{(p_k, A p_k)}$;

5: $x_{k+1} = x_k + \alpha_k p_k$;

6: $r_{k+1} = r_k - \alpha_k A p_k$;

7: $\beta_k = -\dfrac{(r_{k+1}, A p_k)}{(p_k, A p_k)} = \dfrac{(r_{k+1}, r_{k+1})}{(r_k, r_k)}$;

8: $p_{k+1} = r_{k+1} + \beta_k p_k$;

9: $k = k + 1$;

10: until $r_k = 0$

Equivalent CG-method

1: $r_0 = b - A x_0 = p_0$, $p_{-1} = 0$, $\rho_{-1} = 1$.

2: Set $k = 0$.

3: repeat

4: $\rho_k = r_k^T r_k$, $\beta_k = \rho_k / \rho_{k-1}$;

5: $p_k = r_k + \beta_k p_{k-1}$;

6: $\sigma_k = p_k^T A p_k$, $\alpha_k = \rho_k / \sigma_k$;

7: $x_{k+1} = x_k + \alpha_k p_k$;

8: $r_{k+1} = r_k - \alpha_k A p_k$;

9: $k = k + 1$;

10: until $r_k = 0$

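Remark 3

1. $E_k = r_k^T A^{-1} r_k = \min_{x \in x_0 + K_k} \|b - Ax\|_{A^{-1}}^2$.

2. $r_i^T r_j = 0$, $p_i^T A p_j = 0$, $i \neq j$.

From the structure of the new form of the CG method, we write

$$r_k = \varphi_k(A) r_0, \quad p_k = \psi_k(A) r_0,$$

where $\varphi_k$ and $\psi_k$ are polynomials of degree $\le k$. Define $\varphi_0(\tau) \equiv 1$ and $\psi_{-1}(\tau) \equiv 0$. Then we find

$$p_k = \varphi_k(A) r_0 + \beta_k \psi_{k-1}(A) r_0 \equiv \psi_k(A) r_0 \tag{10a}$$

with

$$\psi_k(\tau) \equiv \varphi_k(\tau) + \beta_k \psi_{k-1}(\tau), \tag{10b}$$

and

$$r_{k+1} = \varphi_k(A) r_0 - \alpha_k A \psi_k(A) r_0 \equiv \varphi_{k+1}(A) r_0 \tag{11a}$$

with

$$\varphi_{k+1}(\tau) \equiv \varphi_k(\tau) - \alpha_k \tau \psi_k(\tau). \tag{11b}$$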

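The polynomial equivalent method of the CG method:

1: $\varphi_0 \equiv 1$, $\psi_{-1} \equiv 0$, $\rho_{-1} = 1$.

2: for $k = 0, 1, 2, \dots$ do

3: $\rho_k = (\varphi_k, \varphi_k)$, $\beta_k = \rho_k / \rho_{k-1}$;

4: $\psi_k = \varphi_k + \beta_k \psi_{k-1}$;

5: $\sigma_k = (\psi_k, \theta \psi_k)$, $\alpha_k = \rho_k / \sigma_k$;

6: $\varphi_{k+1} = \varphi_k - \alpha_k \theta \psi_k$;

7: end for

where $\theta(\tau) = \tau$.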

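The minimization property reads

$$E_k = (\varphi_k, \theta^{-1} \varphi_k) = \min_{\varphi \in \mathcal{P}^N} \frac{(\varphi, \theta^{-1} \varphi)}{\varphi(0)^2}.$$

We also have

$(\varphi_i, \varphi_j) = 0$, $i \neq j$, from $(r_i, r_j) = 0$, $i \neq j$.

$(\psi_i, \theta \psi_j) = 0$, $i \neq j$, from $(p_i, A p_j) = 0$, $i \neq j$.

Theorem 3

Let $[\cdot, \cdot]$ be any symmetric bilinear form satisfying

$$[\varphi \chi, \psi] = [\varphi, \chi \psi] \quad \forall \varphi, \psi, \chi \in \mathcal{P}^N.$$

Let the sequences $\varphi_i$ and $\psi_i$ be constructed according to the polynomial equivalent (PE) algorithm, but using $[\cdot, \cdot]$ instead of $(\cdot, \cdot)$. Then, as long as the algorithm does not break down through division by zero, $\varphi_i$ and $\psi_i$ satisfy

$$[\varphi_i, \varphi_j] = \rho_i \delta_{ij}, \quad [\psi_i, \theta \psi_j] = \sigma_i \delta_{ij}, \quad \text{with } \theta(\tau) \equiv \tau.$$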

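Bi-Conjugate Gradient algorithm

1: Given $r_0 = b - A x_0$, $p_{-1} = \hat p_{-1} = 0$, $\rho_{-1} = 1$, and $\hat r_0$ arbitrary.

2: for $k = 0, 1, 2, \dots$ do

3: $\rho_k = \hat r_k^T r_k$, $\beta_k = \rho_k / \rho_{k-1}$;

4: $p_k = r_k + \beta_k p_{k-1}$, $\hat p_k = \hat r_k + \beta_k \hat p_{k-1}$;

5: $\sigma_k = \hat p_k^T A p_k$, $\alpha_k = \rho_k / \sigma_k$;

6: $r_{k+1} = r_k - \alpha_k A p_k$, $\hat r_{k+1} = \hat r_k - \alpha_k A^T \hat p_k$;

7: $x_{k+1} = x_k + \alpha_k p_k$;

8: end for

Property:

$r_k = b - A x_k$, $r_i^T \hat r_j = 0$ for $i \neq j$, and $p_i^T A^T \hat p_j = 0$ for $i \neq j$.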

Consider

$$Ax = b, \quad A \text{ nonsymmetric}.$$

Given $x_0$ and $r_0 = b - A x_0$, let $\hat r_0$ be a suitably chosen vector. Define $[\cdot, \cdot]$ by

$$[\varphi, \psi] = \hat r_0^T \varphi(A) \psi(A) r_0 = (\varphi(A^T) \hat r_0)^T \psi(A) r_0$$

and define $p_{-1} = \hat p_{-1} = 0$. (If $A$ is symmetric: $(\varphi, \psi) = r_0^T \varphi(A) \psi(A) r_0$.)

Then we have

$$r_k = \varphi_k(A) r_0, \quad \hat r_k = \varphi_k(A^T) \hat r_0, \quad p_k = \psi_k(A) r_0, \quad \hat p_k = \psi_k(A^T) \hat r_0,$$

with $\varphi_k$ and $\psi_k$ according to (10b) and (11b). Indeed, these vectors are produced by the Bi-Conjugate Gradient algorithm stated above.

Squaring the CG algorithm: CGS Algorithm (SISC, 1989, Sonneveld)

Assume that Bi-CG is converging well. Then $r_k \to 0$ as $k \to \infty$. Because $r_k = \varphi_k(A) r_0$, $\varphi_k(A)$ behaves like a contracting operator.

Expect: $\varphi_k(A^T)$ also behaves like a contracting operator (i.e., $\hat r_k \to 0$).

However, the "quasi-residuals" $\hat r_k$ are not exploited; they are computed only because they are needed for $\rho_k$ and $\sigma_k$.

Disadvantage: The work of Bi-CG is twice the work of CG, and in general $A^T v$ is not easy to compute, especially if $A$ is stored in a general data structure.

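• Improvement: use the polynomial equivalent algorithm of CG.

Since $\rho_k = [\varphi_k, \varphi_k]$ and $\sigma_k = [\psi_k, \theta \psi_k]$, and $[\cdot, \cdot]$ has the property $[\varphi \chi, \psi] = [\varphi, \chi \psi]$, letting $\varphi_0 = 1$ gives

$$\rho_k = [\varphi_0, \varphi_k^2], \quad \sigma_k = [\varphi_0, \theta \psi_k^2],$$

$$\varphi_{k+1} = \varphi_k - \alpha_k \theta \psi_k, \quad \psi_k = \varphi_k + \beta_k \psi_{k-1}.$$

Remark 4

$$\rho_k = \hat r_k^T r_k = (\varphi_k(A^T) \hat r_0)^T (\varphi_k(A) r_0) = \hat r_0^T \varphi_k^2(A) r_0,$$

$$\sigma_k = \hat p_k^T A p_k = (\psi_k(A^T) \hat r_0)^T A (\psi_k(A) r_0) = \hat r_0^T A \psi_k^2(A) r_0.$$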

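• Purpose:

1 Find an algorithm that generates the polynomials $\varphi_k^2$ and $\psi_k^2$ rather than $\varphi_k$ and $\psi_k$.

2 Compute approximate solutions $x_k$ whose residuals are $r_k = \varphi_k^2(A) r_0$. Because $\rho_k = \hat r_0^T r_k$ with $r_k = \varphi_k^2(A) r_0$, the vectors $\hat r_k$ and $\hat p_k$ need not be computed.

How to compute $\varphi_k^2$ and $\psi_k^2$?

$$\psi_k^2 = [\varphi_k + \beta_k \psi_{k-1}]^2 = \varphi_k^2 + 2 \beta_k \varphi_k \psi_{k-1} + \beta_k^2 \psi_{k-1}^2,$$

$$\varphi_{k+1}^2 = [\varphi_k - \alpha_k \theta \psi_k]^2 = \varphi_k^2 - 2 \alpha_k \theta \varphi_k \psi_k + \alpha_k^2 \theta^2 \psi_k^2.$$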

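Since

$$\varphi_k \psi_k = \varphi_k [\varphi_k + \beta_k \psi_{k-1}] = \varphi_k^2 + \beta_k \varphi_k \psi_{k-1},$$

we only need to compute $\varphi_k \psi_{k-1}$, $\varphi_k^2$ and $\psi_k^2$. Now define for $k \ge 0$:

$$\Phi_k = \varphi_k^2, \quad \Theta_k = \varphi_k \psi_{k-1}, \quad \Psi_{k-1} = \psi_{k-1}^2.$$

From

$$\begin{aligned}
\psi_k^2 &= \varphi_k^2 + 2 \beta_k \varphi_k \psi_{k-1} + \beta_k^2 \psi_{k-1}^2, \\
\varphi_{k+1} \psi_k &= (\varphi_k - \alpha_k \theta \psi_k) \psi_k = \varphi_k \psi_k - \alpha_k \theta \psi_k^2, \\
\varphi_{k+1}^2 &= \varphi_k^2 - 2 \alpha_k \theta \varphi_k \psi_k + \alpha_k^2 \theta^2 \psi_k^2 \\
&= \varphi_k^2 - \alpha_k \theta (\varphi_k \psi_k - \alpha_k \theta \psi_k^2) - \alpha_k \theta \varphi_k \psi_k \\
&= \varphi_k^2 - \alpha_k \theta (\varphi_{k+1} \psi_k + \varphi_k \psi_k),
\end{aligned}$$

we have

$$\begin{aligned}
Y_k &= \varphi_k \psi_k = \Phi_k + \beta_k \Theta_k, \\
\Psi_k &= \Phi_k + 2 \beta_k \Theta_k + \beta_k^2 \Psi_{k-1} = Y_k + \beta_k (\Theta_k + \beta_k \Psi_{k-1}), \\
\Theta_{k+1} &= Y_k - \alpha_k \theta \Psi_k, \\
\Phi_{k+1} &= \Phi_k - \alpha_k \theta (Y_k + \Theta_{k+1}).
\end{aligned}$$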
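These recurrences can be sanity-checked symbolically on polynomials. The sketch below uses `numpy.polynomial.Polynomial` with arbitrary sample polynomials and coefficients (all numeric values are assumptions chosen for illustration) and compares the recurrence outputs against the directly squared or multiplied polynomials at a few sample points.

```python
import numpy as np
from numpy.polynomial import Polynomial

theta = Polynomial([0.0, 1.0])            # theta(tau) = tau
phi = Polynomial([1.0, -0.5, 0.25])       # arbitrary phi_k
psi_prev = Polynomial([1.0, 2.0])         # arbitrary psi_{k-1}
alpha, beta = 0.7, -0.3                   # arbitrary alpha_k, beta_k

# Direct definitions from the Bi-CG polynomial recurrences.
psi = phi + beta * psi_prev
phi_next = phi - alpha * theta * psi

# Squared quantities and the CGS recurrences.
Phi, Theta, Psi_prev = phi**2, phi * psi_prev, psi_prev**2
Y = Phi + beta * Theta
Psi = Y + beta * (Theta + beta * Psi_prev)
Theta_next = Y - alpha * theta * Psi
Phi_next = Phi - alpha * theta * (Y + Theta_next)

ts = np.linspace(-1.0, 1.0, 7)            # sample points for comparison

def same(f, g):
    """Compare two polynomials by their values at the sample points."""
    return bool(np.allclose(f(ts), g(ts)))
```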

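CGS

1: $\Phi_0 \equiv 1$, $\Theta_0 \equiv \Psi_{-1} \equiv 0$, $\rho_{-1} = 1$.

2: for $k = 0, 1, 2, \dots$ do

3: $\rho_k = [1, \Phi_k]$, $\beta_k = \rho_k / \rho_{k-1}$;

4: $Y_k = \Phi_k + \beta_k \Theta_k$;

5: $\Psi_k = Y_k + \beta_k (\Theta_k + \beta_k \Psi_{k-1})$;

6: $\sigma_k = [1, \theta \Psi_k]$, $\alpha_k = \rho_k / \sigma_k$;

7: $\Theta_{k+1} = Y_k - \alpha_k \theta \Psi_k$;

8: $\Phi_{k+1} = \Phi_k - \alpha_k \theta (Y_k + \Theta_{k+1})$;

9: end for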

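CGS Variant

1: Given $r_0 = b - A x_0$, $\hat r_0$ arbitrary, $q_0 = p_{-1} = 0$, $\rho_{-1} = 1$.

2: for $k = 0, 1, 2, \dots$ do

3: $\rho_k = \hat r_0^T r_k$, $\beta_k = \rho_k / \rho_{k-1}$;

4: $u_k = r_k + \beta_k q_k$;

5: $p_k = u_k + \beta_k (q_k + \beta_k p_{k-1})$;

6: $v_k = A p_k$;

7: $\sigma_k = \hat r_0^T v_k$, $\alpha_k = \rho_k / \sigma_k$;

8: $q_{k+1} = u_k - \alpha_k v_k$;

9: $r_{k+1} = r_k - \alpha_k A (u_k + q_{k+1})$;

10: $x_{k+1} = x_k + \alpha_k (u_k + q_{k+1})$;

11: end for

Here $r_k = \Phi_k(A) r_0$, $q_k = \Theta_k(A) r_0$, $p_k = \Psi_k(A) r_0$ and $u_k = Y_k(A) r_0$.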
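A minimal NumPy sketch of the CGS variant above. The function name `cgs`, the default $\hat r_0 = r_0$, and the stopping parameters `tol` and `max_iter` are assumptions made for a runnable illustration. Note that only products with `A` appear; `A.T` is never used.

```python
import numpy as np

def cgs(A, b, x0=None, r_hat0=None, tol=1e-10, max_iter=100):
    """CGS sketch following the vector recurrences (no A^T products)."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    r = b - A @ x
    r_hat = r.copy() if r_hat0 is None else np.array(r_hat0, dtype=float)
    q = np.zeros(n)            # q_0 = 0
    p = np.zeros(n)            # p_{-1} = 0
    rho_old = 1.0              # rho_{-1} = 1
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol:
            break
        rho = r_hat @ r
        beta = rho / rho_old
        u = r + beta * q
        p = u + beta * (q + beta * p)
        v = A @ p
        sigma = r_hat @ v
        alpha = rho / sigma
        q = u - alpha * v      # q_{k+1}
        w = u + q              # u_k + q_{k+1}
        x = x + alpha * w
        r = r - alpha * (A @ w)
        rho_old = rho
    return x

A_demo = np.array([[3.0, 1.0, 0.0], [-1.0, 3.0, 1.0], [0.0, -1.0, 3.0]])
b_demo = np.array([1.0, 2.0, 3.0])
x_demo = cgs(A_demo, b_demo)
```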

Since $r_0 = b - A x_0$ and $r_{k+1} - r_k = A (x_k - x_{k+1})$, we have $r_k = b - A x_k$. So this algorithm produces iterates $x_k$ whose residuals satisfy $r_k = \varphi_k^2(A) r_0$.

Bi-CGSTAB: A Fast and Smoothly Converging Variant of Bi-CG for the Solution of Nonsymmetric Linear Systems (SISC, 1992, Van der Vorst)

Bi-CG method

1: Given $x_0$, $r_0 = b - A x_0$, $\hat r_0$ with $(\hat r_0, r_0) \neq 0$, $\rho_0 = 1$, $\hat p_0 = p_0 = 0$.

2: for $k = 1, 2, \dots$ do

3: $\rho_k = (\hat r_{k-1}, r_{k-1})$;

4: $\beta_k = \rho_k / \rho_{k-1}$;

5: $p_k = r_{k-1} + \beta_k p_{k-1}$, $\hat p_k = \hat r_{k-1} + \beta_k \hat p_{k-1}$;

6: $v_k = A p_k$;

7: $\alpha_k = \rho_k / (\hat p_k, v_k)$;

8: $x_k = x_{k-1} + \alpha_k p_k$;

9: Stop here if $x_k$ is accurate enough.

10: $r_k = r_{k-1} - \alpha_k v_k = r_{k-1} - \alpha_k A p_k$;

11: $\hat r_k = \hat r_{k-1} - \alpha_k A^T \hat p_k$;

12: end for

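Property:

(i) $r_k \perp \hat r_0, \dots, \hat r_{k-1}$ and $\hat r_k \perp r_0, \dots, r_{k-1}$.

(ii) The sequences $\{r_k\}$ and $\{\hat r_k\}$ satisfy three-term recurrence relations.

(iii) The method terminates within $n$ steps, but it has no minimization property.

Since

$$r_k^{\text{Bi-CG}} = \varphi_k(A) r_0, \quad \hat r_k^{\text{Bi-CG}} = \varphi_k(A^T) \hat r_0,$$

it follows that

$$(r_k, \hat r_i) = (\varphi_k(A) r_0, \varphi_i(A^T) \hat r_0) = (\varphi_i(A) \varphi_k(A) r_0, \hat r_0) = 0, \quad i < k.$$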

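CGS Method

1: Given $x_0$, $r_0 = b - A x_0$, $\hat r_0$ such that $(r_0, \hat r_0) \neq 0$, e.g. $\hat r_0 = r_0$; $\rho_0 = 1$, $p_0 = q_0 = 0$.

2: for $k = 1, 2, \dots$ do

3: $\rho_k = (\hat r_0, r_{k-1})$, $\beta_k = \rho_k / \rho_{k-1}$;

4: $u_k = r_{k-1} + \beta_k q_{k-1}$;

5: $p_k = u_k + \beta_k (q_{k-1} + \beta_k p_{k-1})$;

6: $v_k = A p_k$;

7: $\alpha_k = \rho_k / (\hat r_0, v_k)$;

8: $q_k = u_k - \alpha_k v_k$;

9: $w_k = u_k + q_k$;

10: $x_k = x_{k-1} + \alpha_k w_k$;

11: Stop here if $x_k$ is accurate enough.

12: $r_k = r_{k-1} - \alpha_k A w_k$;

13: end for

We have $r_k^{\text{CGS}} = \varphi_k(A)^2 r_0$.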

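From the Bi-CG method we have $r_k^{\text{Bi-CG}} = \varphi_k(A) r_0$ and $p_{k+1} = \psi_k(A) r_0$. Thus we get

$$\psi_k(A) r_0 = (\varphi_k(A) + \beta_{k+1} \psi_{k-1}(A)) r_0$$

and

$$\varphi_k(A) r_0 = (\varphi_{k-1}(A) - \alpha_k A \psi_{k-1}(A)) r_0,$$

where $\psi_k = \varphi_k + \beta_{k+1} \psi_{k-1}$ and $\varphi_k = \varphi_{k-1} - \alpha_k \theta \psi_{k-1}$ (the index $k + 1$ on $\beta$ comes from the Bi-CG update $p_{k+1} = r_k + \beta_{k+1} p_k$). Since

$$(\varphi_k(A) r_0, \varphi_j(A^T) \hat r_0) = 0, \quad j < k,$$

it holds that

$$\varphi_k(A) r_0 \perp \hat r_0, A^T \hat r_0, \dots, (A^T)^{k-1} \hat r_0$$

if and only if

$$(\tilde\varphi_j(A) \varphi_k(A) r_0, \hat r_0) = 0, \quad j = 0, 1, \dots, k - 1,$$

for a family of polynomials $\tilde\varphi_j$ of exact degree $j$.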

In the Bi-CG method we take $\tilde\varphi_j = \varphi_j$, i.e. $\hat r_k = \varphi_k(A^T) \hat r_0$, and CGS exploits this to get $r_k^{\text{CGS}} = \varphi_k^2(A) r_0$. Now let $r_k = \tilde\varphi_k(A) \varphi_k(A) r_0$. How should the polynomial $\tilde\varphi_k$ of degree $k$ be chosen so that $\|r_k\|$ is as small as possible? As with Chebyshev polynomials, one could determine optimal parameters of $\tilde\varphi_k$ so that $\|r_k\|$ is minimized, but the optimal parameters for the Chebyshev polynomial are in general not easily obtainable. Instead we take

$$\tilde\varphi_k \equiv \eta_k(x),$$

where

$$\eta_k(x) = (1 - \omega_1 x)(1 - \omega_2 x) \cdots (1 - \omega_k x).$$

Here the $\omega_j$ are suitable constants to be selected.

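Define

$$r_k = \eta_k(A) \varphi_k(A) r_0.$$

Then

$$\begin{aligned}
r_k &= \eta_k(A) \varphi_k(A) r_0 \\
&= (1 - \omega_k A) \eta_{k-1}(A) (\varphi_{k-1}(A) - \alpha_k A \psi_{k-1}(A)) r_0 \\
&= \{\eta_{k-1}(A) \varphi_{k-1}(A) - \alpha_k A \eta_{k-1}(A) \psi_{k-1}(A)\} r_0 \\
&\quad - \omega_k A \{\eta_{k-1}(A) \varphi_{k-1}(A) - \alpha_k A \eta_{k-1}(A) \psi_{k-1}(A)\} r_0 \\
&= r_{k-1} - \alpha_k A p_k - \omega_k A (r_{k-1} - \alpha_k A p_k)
\end{aligned}$$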

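and

$$\begin{aligned}
p_{k+1} &= \eta_k(A) \psi_k(A) r_0 \\
&= \eta_k(A) (\varphi_k(A) + \beta_{k+1} \psi_{k-1}(A)) r_0 \\
&= \eta_k(A) \varphi_k(A) r_0 + \beta_{k+1} (1 - \omega_k A) \eta_{k-1}(A) \psi_{k-1}(A) r_0 \\
&= \eta_k(A) \varphi_k(A) r_0 + \beta_{k+1} \eta_{k-1}(A) \psi_{k-1}(A) r_0 - \beta_{k+1} \omega_k A \eta_{k-1}(A) \psi_{k-1}(A) r_0 \\
&= r_k + \beta_{k+1} (p_k - \omega_k A p_k).
\end{aligned}$$

It remains to recover the constants $\rho_k$, $\beta_k$ and $\alpha_k$ of the Bi-CG method.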

We now compute $\beta_k$. Let

$$\hat\rho_{k+1} = (\hat r_0, \eta_k(A) \varphi_k(A) r_0) = (\eta_k(A^T) \hat r_0, \varphi_k(A) r_0).$$

From Bi-CG we have $\varphi_k(A) r_0 \perp \mu_{k-1}(A^T) \hat r_0$, where $\mu_{k-1}$ is an arbitrary polynomial of degree $k - 1$. Hence, when computing $\hat\rho_{k+1}$, only the highest-order term of $\eta_k(A^T)$ contributes, namely $(-1)^k \omega_1 \omega_2 \cdots \omega_k (A^T)^k$. From the Bi-CG method we also have

$$\rho_{k+1} = (\varphi_k(A^T) \hat r_0, \varphi_k(A) r_0).$$

The highest-order term of $\varphi_k(A^T)$ is $(-1)^k \alpha_1 \cdots \alpha_k (A^T)^k$. Thus

$$\beta_k = (\hat\rho_k / \hat\rho_{k-1}) (\alpha_{k-1} / \omega_{k-1}),$$

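because

$$\begin{aligned}
\beta_k = \frac{\rho_k}{\rho_{k-1}}
&= \frac{\big((-1)^{k-1} \alpha_1 \cdots \alpha_{k-1} (A^T)^{k-1} \hat r_0, \; \varphi_{k-1}(A) r_0\big)}{\big((-1)^{k-2} \alpha_1 \cdots \alpha_{k-2} (A^T)^{k-2} \hat r_0, \; \varphi_{k-2}(A) r_0\big)} \\
&= \frac{\dfrac{\alpha_1 \cdots \alpha_{k-1}}{\omega_1 \cdots \omega_{k-1}} \big((-1)^{k-1} \omega_1 \cdots \omega_{k-1} (A^T)^{k-1} \hat r_0, \; \varphi_{k-1}(A) r_0\big)}{\dfrac{\alpha_1 \cdots \alpha_{k-2}}{\omega_1 \cdots \omega_{k-2}} \big((-1)^{k-2} \omega_1 \cdots \omega_{k-2} (A^T)^{k-2} \hat r_0, \; \varphi_{k-2}(A) r_0\big)} \\
&= (\hat\rho_k / \hat\rho_{k-1}) (\alpha_{k-1} / \omega_{k-1}).
\end{aligned}$$

Similarly, we can compute $\rho_k$ and $\alpha_k$. Whenever the residual is updated as

$$r_k = r_{k-1} - \gamma A y,$$

the iterate is obtained as a by-product from $x_k = x_{k-1} + \gamma y$.

Compute $\omega_k$ so that $r_k = \eta_k(A) \varphi_k(A) r_0$ is minimized in the 2-norm as a function of $\omega_k$.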
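The minimizing $\omega_k$ has a closed form. As a short worked step, write $s = r_{k-1} - \alpha_k A p_k$ and $t = A s$ (names introduced here for the derivation), so that the residual recurrence reads $r_k = s - \omega_k t$:

```latex
\begin{aligned}
\|r_k\|_2^2 &= \|s - \omega_k t\|_2^2
  = (s, s) - 2\,\omega_k (t, s) + \omega_k^2 (t, t), \\
\frac{d}{d\omega_k} \|r_k\|_2^2 &= -2 (t, s) + 2\,\omega_k (t, t) = 0
  \quad\Longrightarrow\quad
  \omega_k = \frac{(t, s)}{(t, t)} .
\end{aligned}
```

This is a one-dimensional least-squares problem, and it requires $t \neq 0$.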

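Bi-CGSTAB Method

1: Given $x_0$, $r_0 = b - A x_0$, $\hat r_0$ arbitrary such that $(r_0, \hat r_0) \neq 0$, e.g. $\hat r_0 = r_0$; $\rho_0 = \alpha = \omega_0 = 1$, $v_0 = p_0 = 0$.

2: for $k = 1, 2, \dots$ do

3: $\rho_k = (\hat r_0, r_{k-1})$, $\beta = (\rho_k / \rho_{k-1})(\alpha / \omega_{k-1})$;

4: $p_k = r_{k-1} + \beta (p_{k-1} - \omega_{k-1} v_{k-1})$;

5: $v_k = A p_k$;

6: $\alpha = \rho_k / (\hat r_0, v_k)$;

7: $s = r_{k-1} - \alpha v_k$;

8: $t = A s$;

9: $\omega_k = (t, s) / (t, t)$;

10: $x_k = x_{k-1} + \alpha p_k + \omega_k s$ $\;(= x_{k-1} + \alpha p_k + \omega_k (r_{k-1} - \alpha A p_k))$;

11: Stop here if $x_k$ is accurate enough.

12: $r_k = s - \omega_k t$ $\;(= r_{k-1} - \alpha A p_k - \omega_k A (r_{k-1} - \alpha A p_k) = r_{k-1} - A (\alpha p_k + \omega_k (r_{k-1} - \alpha A p_k)))$;

13: end for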
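The Bi-CGSTAB steps above can be sketched in NumPy as follows. The function name `bicgstab`, the default $\hat r_0 = r_0$, and the parameters `tol` and `max_iter` are illustrative assumptions. The early exit when $s$ is already negligible is also an addition not shown in the algorithm above; it avoids computing $\omega_k$ as a ratio of two near-zero quantities.

```python
import numpy as np

def bicgstab(A, b, x0=None, r_hat0=None, tol=1e-10, max_iter=100):
    """Bi-CGSTAB sketch (unpreconditioned, dense or sparse A with @)."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    r = b - A @ x
    r_hat = r.copy() if r_hat0 is None else np.array(r_hat0, dtype=float)
    rho_old = alpha = omega = 1.0          # rho_0 = alpha = omega_0 = 1
    v = np.zeros(n)                        # v_0 = 0
    p = np.zeros(n)                        # p_0 = 0
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol:       # "accurate enough" test (assumption)
            break
        rho = r_hat @ r
        beta = (rho / rho_old) * (alpha / omega)
        p = r + beta * (p - omega * v)
        v = A @ p
        alpha = rho / (r_hat @ v)
        s = r - alpha * v
        if np.linalg.norm(s) <= tol:       # added safeguard: half-step converged
            x = x + alpha * p
            r = s
            break
        t = A @ s
        omega = (t @ s) / (t @ t)          # minimizes ||s - omega t||_2
        x = x + alpha * p + omega * s
        r = s - omega * t
        rho_old = rho
    return x

A_demo = np.array([[3.0, 1.0, 0.0], [-1.0, 3.0, 1.0], [0.0, -1.0, 3.0]])
b_demo = np.array([1.0, 2.0, 3.0])
x_demo = bicgstab(A_demo, b_demo)
```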

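Preconditioned Bi-CGSTAB-P:

Rewrite $Ax = b$ as

$$\tilde A \tilde x = \tilde b \quad\text{with}\quad \tilde A = K_1^{-1} A K_2^{-1},$$

where $x = K_2^{-1} \tilde x$ and $\tilde b = K_1^{-1} b$. Then

$$\tilde p_k := K_1^{-1} p_k, \quad \tilde v_k := K_1^{-1} v_k, \quad \tilde r_k := K_1^{-1} r_k, \quad \tilde s := K_1^{-1} s, \quad \tilde t := K_1^{-1} t, \quad \tilde x_k := K_2 x_k, \quad \tilde{\hat r}_0 := K_1^T \hat r_0.$$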
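The change of variables can be illustrated on a small dense example. In the sketch below the factors `K1` (the diagonal of `A`) and `K2` (a unit upper triangular matrix) are hypothetical preconditioner factors chosen only for illustration. A direct solver stands in for the iterative method, since the point is only that solving $\tilde A \tilde x = \tilde b$ and mapping back with $x = K_2^{-1} \tilde x$ recovers the solution of $Ax = b$.

```python
import numpy as np

A_demo = np.array([[3.0, 1.0, 0.0],
                   [-1.0, 3.0, 1.0],
                   [0.0, -1.0, 3.0]])
b_demo = np.array([1.0, 2.0, 3.0])

# Hypothetical preconditioner factors (assumptions for illustration only).
K1 = np.diag(np.diag(A_demo))
K2 = np.array([[1.0, 0.5, 0.0],
               [0.0, 1.0, 0.5],
               [0.0, 0.0, 1.0]])

A_tilde = np.linalg.solve(K1, A_demo) @ np.linalg.inv(K2)   # K1^{-1} A K2^{-1}
b_tilde = np.linalg.solve(K1, b_demo)                       # K1^{-1} b

x_tilde = np.linalg.solve(A_tilde, b_tilde)   # stand-in for the iterative solve
x_demo = np.linalg.solve(K2, x_tilde)         # x = K2^{-1} x~
```

In an actual preconditioned iteration the inverses are not formed; each application of $K_1^{-1}$ or $K_2^{-1}$ is replaced by a solve with the corresponding factor.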