Numerical methods for solving linear systems


Chapter 2 Numerical methods for solving linear systems

Let $A \in \mathbb{C}^{n\times n}$ be a nonsingular matrix. We want to solve the linear system $Ax = b$ by (a) direct methods (finite steps); (b) iterative methods (convergence; see Chapter 4).

2.1 Elementary matrices

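Let $X = K^n$ and $x, y \in X$. Then $y^*x \in K$ and

\[ xy^* = \begin{bmatrix} x_1\bar{y}_1 & \cdots & x_1\bar{y}_n \\ \vdots & & \vdots \\ x_n\bar{y}_1 & \cdots & x_n\bar{y}_n \end{bmatrix}. \]

The eigenvalues of $xy^*$ are $\{0, \ldots, 0, y^*x\}$, since $\operatorname{rank}(xy^*) = 1$ by $(xy^*)z = (y^*z)x$, and $(xy^*)x = (y^*x)x$.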

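Definition 2.1.1 A matrix of the form

\[ I - \alpha xy^* \qquad (\alpha \in K,\ x, y \in K^n) \tag{2.1.1} \]

is called an elementary matrix.

The eigenvalues of $I - \alpha xy^*$ are $\{1, 1, \ldots, 1, 1 - \alpha y^*x\}$. Compute

\[ (I - \alpha xy^*)(I - \beta xy^*) = I - (\alpha + \beta - \alpha\beta y^*x)xy^*. \tag{2.1.2} \]

If $\alpha y^*x - 1 \neq 0$ and we let $\beta = \dfrac{\alpha}{\alpha y^*x - 1}$, then $\alpha + \beta - \alpha\beta y^*x = 0$. We have

\[ (I - \alpha xy^*)^{-1} = I - \beta xy^*, \qquad \frac{1}{\alpha} + \frac{1}{\beta} = y^*x. \tag{2.1.3} \]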

Example 2.1.1 Let x∈ Kⁿ, and x^∗x = 1. Let H ={z : z^∗x = 0} and Q = I− 2xx^∗ (Q = Q^∗, Q⁻¹ = Q).

Then Q reflects each vector with respect to the hyperplane H. Let y = αx + w, w ∈ H.

Then, we have

Qy = αQx + Qw =−αx + w − 2(x^∗w)x =−αx + w.
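The reflection property in Example 2.1.1 can be checked numerically. The following is a minimal sketch for the real case $K = \mathbb{R}$; the function name `reflect` and the sample vectors are illustrative choices, not part of the notes.

```python
def reflect(x, y):
    """Apply Q = I - 2 x x^T to y without forming Q (x must have unit 2-norm)."""
    s = sum(xi * yi for xi, yi in zip(x, y))          # x^T y
    return [yi - 2.0 * s * xi for xi, yi in zip(x, y)]

x = [0.6, 0.8, 0.0]                                   # unit vector: 0.36 + 0.64 = 1
w = [0.8, -0.6, 2.0]                                  # w lies in H because x^T w = 0
y = [3.0 * xi + wi for xi, wi in zip(x, w)]           # y = alpha*x + w with alpha = 3

qy = reflect(x, y)                                    # should equal -alpha*x + w
expected = [-3.0 * xi + wi for xi, wi in zip(x, w)]
```

Applying `reflect` twice returns the original vector, which is the statement $Q^{-1} = Q$.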

Example 2.1.2 Let $y = e_i$, the $i$-th column of the identity matrix, and $x = l_i = [0, \ldots, 0, l_{i+1,i}, \ldots, l_{n,i}]^T$. Then

\[ I + l_ie_i^T = \begin{bmatrix} 1 & & & & \\ & \ddots & & & \\ & & 1 & & \\ & & l_{i+1,i} & \ddots & \\ & & \vdots & & \\ & & l_{n,i} & & 1 \end{bmatrix}. \tag{2.1.4} \]

Since $e_i^Tl_i = 0$, we have

\[ (I + l_ie_i^T)^{-1} = I - l_ie_i^T. \tag{2.1.5} \]

From the equality

\[ (I + l_1e_1^T)(I + l_2e_2^T) = I + l_1e_1^T + l_2e_2^T + l_1(e_1^Tl_2)e_2^T = I + l_1e_1^T + l_2e_2^T \]

it follows that

\[ (I + l_1e_1^T)\cdots(I + l_ie_i^T)\cdots(I + l_{n-1}e_{n-1}^T) = I + l_1e_1^T + l_2e_2^T + \cdots + l_{n-1}e_{n-1}^T = \begin{bmatrix} 1 & & & 0 \\ l_{21} & \ddots & & \\ \vdots & \ddots & \ddots & \\ l_{n1} & \cdots & l_{n,n-1} & 1 \end{bmatrix}. \tag{2.1.6} \]

Theorem 2.1.1 A lower triangular matrix with ones on the diagonal can be written as the product of $n-1$ elementary matrices of the form (2.1.4).

Remark 2.1.1 $(I + l_1e_1^T + \cdots + l_{n-1}e_{n-1}^T)^{-1} = (I - l_{n-1}e_{n-1}^T)\cdots(I - l_1e_1^T)$, which cannot be simplified as in (2.1.6).
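A small numerical illustration of (2.1.6) and of Remark 2.1.1, as a sketch with hypothetical helper names (`identity`, `matmul`, `elementary`) and arbitrarily chosen multipliers: in the forward product the multipliers simply sit in place, whereas in the reversed product of inverses they mix.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def elementary(n, col, entries, sign=1.0):
    """Return I + sign * l e_col^T, where l holds `entries` below row `col` (0-based)."""
    m = identity(n)
    for offset, value in enumerate(entries):
        m[col + 1 + offset][col] = sign * value
    return m

n = 3
l1 = [2.0, 3.0]        # l_21, l_31
l2 = [4.0]             # l_32

# (I + l1 e1^T)(I + l2 e2^T): the multipliers appear unchanged, as in (2.1.6)
forward = matmul(elementary(n, 0, l1), elementary(n, 1, l2))

# (I - l2 e2^T)(I - l1 e1^T): the inverse; its (3,1) entry is 4*2 - 3 = 5, not -3
reverse = matmul(elementary(n, 1, l2, -1.0), elementary(n, 0, l1, -1.0))
```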

2.2 LR-factorization

Definition 2.2.1 Let $A \in \mathbb{C}^{n\times n}$, let $L$ be a lower triangular matrix and $R$ an upper triangular matrix. If $A = LR$, then the product $LR$ is called an LR-factorization (or LR-decomposition) of $A$.

Basic problem:

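Given $b \neq 0$, $b \in K^n$, find a vector $l_1 = [0, l_{21}, \ldots, l_{n1}]^T$ and $c \in K$ such that $(I - l_1e_1^T)b = ce_1$.

Solution: Componentwise, the equation reads

\[ b_1 = c, \qquad b_i - l_{i1}b_1 = 0, \quad i = 2, \ldots, n. \]

If $b_1 = 0$, there is no solution (the equations would force $b = 0$). If $b_1 \neq 0$, then $c = b_1$ and $l_{i1} = b_i/b_1$, $i = 2, \ldots, n$.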

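Construction of LR-factorization:

Let $A = A^{(0)} = [a_1^{(0)} \mid \cdots \mid a_n^{(0)}]$. Apply the basic problem to $a_1^{(0)}$: if $a_{11}^{(0)} \neq 0$, then there exists $L_1 = I - l_1e_1^T$ such that $(I - l_1e_1^T)a_1^{(0)} = a_{11}^{(0)}e_1$. Thus

\[ A^{(1)} = L_1A^{(0)} = [L_1a_1^{(0)} \mid \cdots \mid L_1a_n^{(0)}] = \begin{bmatrix} a_{11}^{(0)} & a_{12}^{(0)} & \cdots & a_{1n}^{(0)} \\ 0 & a_{22}^{(1)} & \cdots & a_{2n}^{(1)} \\ \vdots & \vdots & & \vdots \\ 0 & a_{n2}^{(1)} & \cdots & a_{nn}^{(1)} \end{bmatrix}. \tag{2.2.1} \]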

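The $i$-th step:

\[ A^{(i)} = L_iA^{(i-1)} = L_iL_{i-1}\cdots L_1A^{(0)} = \begin{bmatrix}
a_{11}^{(0)} & \cdots & \cdots & \cdots & \cdots & \cdots & a_{1n}^{(0)} \\
0 & a_{22}^{(1)} & \cdots & \cdots & \cdots & \cdots & a_{2n}^{(1)} \\
\vdots & 0 & \ddots & & & & \vdots \\
\vdots & \vdots & & a_{ii}^{(i-1)} & a_{i,i+1}^{(i-1)} & \cdots & a_{in}^{(i-1)} \\
\vdots & \vdots & & 0 & a_{i+1,i+1}^{(i)} & \cdots & a_{i+1,n}^{(i)} \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0 & a_{n,i+1}^{(i)} & \cdots & a_{nn}^{(i)}
\end{bmatrix}. \tag{2.2.2} \]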

If $a_{ii}^{(i-1)} \neq 0$ for $i = 1, \ldots, n-1$, then the method is executable and

\[ A^{(n-1)} = L_{n-1}\cdots L_1A^{(0)} = R \tag{2.2.3} \]

is an upper triangular matrix. Thus $A = LR$. Explicit representation of $L$:

\[ L_i = I - l_ie_i^T, \qquad L_i^{-1} = I + l_ie_i^T, \]

\[ L = L_1^{-1}\cdots L_{n-1}^{-1} = (I + l_1e_1^T)\cdots(I + l_{n-1}e_{n-1}^T) = I + l_1e_1^T + \cdots + l_{n-1}e_{n-1}^T \quad \text{(by (2.1.6))}. \]
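The construction above translates directly into a short program. The sketch below (the function name `lr_factor` and the test matrix are assumptions made for illustration) performs the $n-1$ elimination steps without pivoting and stores the multipliers $l_{ik}$ in $L$ exactly where (2.1.6) says they belong.

```python
def lr_factor(a):
    """Return (L, R) with A = L R, L unit lower triangular, R upper triangular.

    No pivoting is done; a ValueError is raised if a zero pivot a_kk^(k-1) occurs.
    """
    n = len(a)
    r = [row[:] for row in a]                          # working copy, becomes R
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n - 1):
        if r[k][k] == 0:
            raise ValueError("zero pivot: factorization without pivoting fails")
        for i in range(k + 1, n):
            l[i][k] = r[i][k] / r[k][k]                # multiplier l_ik
            for j in range(k, n):
                r[i][j] -= l[i][k] * r[k][j]           # row_i := row_i - l_ik * row_k
    return l, r

A_example = [[2.0, 1.0, 1.0],
             [4.0, 3.0, 3.0],
             [8.0, 7.0, 9.0]]
L_example, R_example = lr_factor(A_example)
```

For this matrix all leading principal minors are nonzero, so no zero pivot is met.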

Theorem 2.2.1 Let $A$ be nonsingular. Then $A$ has an LR-factorization ($A = LR$) if and only if $k_i := \det(A_i) \neq 0$ for $i = 1, \ldots, n-1$, where $A_i$ is the $i$-th leading principal submatrix of $A$, i.e.,

\[ A_i = \begin{bmatrix} a_{11} & \cdots & a_{1i} \\ \vdots & & \vdots \\ a_{i1} & \cdots & a_{ii} \end{bmatrix}. \]

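Proof: (Necessity "⇒"): Since $A = LR$, we have

\[ \begin{bmatrix} a_{11} & \cdots & a_{1i} \\ \vdots & & \vdots \\ a_{i1} & \cdots & a_{ii} \end{bmatrix} = \begin{bmatrix} l_{11} & & O \\ \vdots & \ddots & \\ l_{i1} & \cdots & l_{ii} \end{bmatrix} \begin{bmatrix} r_{11} & \cdots & r_{1i} \\ & \ddots & \vdots \\ O & & r_{ii} \end{bmatrix}. \]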

From $\det(A) \neq 0$ it follows that $\det(L) \neq 0$ and $\det(R) \neq 0$. Thus $l_{jj} \neq 0$ and $r_{jj} \neq 0$ for $j = 1, \ldots, n$. Hence $k_i = l_{11}\cdots l_{ii}\,r_{11}\cdots r_{ii} \neq 0$.

(Sufficiency "⇐"): From (2.2.2) we have

\[ A^{(0)} = (L_1^{-1}\cdots L_i^{-1})A^{(i)}. \]

Consider the $(i+1)$-th leading principal determinant. From (2.2.3) we have

\[ \begin{bmatrix} a_{11} & \cdots & a_{1,i+1} \\ \vdots & & \vdots \\ a_{i+1,1} & \cdots & a_{i+1,i+1} \end{bmatrix} = \begin{bmatrix} 1 & & & 0 \\ l_{21} & \ddots & & \\ \vdots & \ddots & \ddots & \\ l_{i+1,1} & \cdots & l_{i+1,i} & 1 \end{bmatrix} \begin{bmatrix} a_{11}^{(0)} & a_{12}^{(0)} & \cdots & \cdots & * \\ & a_{22}^{(1)} & \cdots & \cdots & \vdots \\ & & \ddots & & \vdots \\ & & & a_{ii}^{(i-1)} & a_{i,i+1}^{(i-1)} \\ 0 & & & & a_{i+1,i+1}^{(i)} \end{bmatrix}. \]

Thus $k_{i+1} = 1 \cdot a_{11}^{(0)}a_{22}^{(1)}\cdots a_{i+1,i+1}^{(i)} \neq 0$, which implies $a_{i+1,i+1}^{(i)} \neq 0$. Therefore, the LR-factorization of $A$ exists.

Theorem 2.2.2 If a nonsingular matrix A has an LR-factorization with A = LR and l11 =· · · = lnn = 1, then the factorization is unique.

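Proof: Let $A = L_1R_1 = L_2R_2$. Then $L_2^{-1}L_1 = R_2R_1^{-1}$. The left-hand side is unit lower triangular and the right-hand side is upper triangular, so both equal $I$. Hence $L_1 = L_2$ and $R_1 = R_2$.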

Corollary 2.2.1 A nonsingular matrix $A$ has a factorization $A = LDR$, where $D$ is diagonal and $L$ and $R^T$ are unit lower triangular (with ones on the diagonal), if and only if $k_i \neq 0$ for $i = 1, \ldots, n-1$.

Theorem 2.2.3 Let A be a nonsingular matrix. Then there exists a permutation P , such that P A has an LR-factorization.

(Proof): By construction! Consider (2.2.2): there is a permutation $P_i$, which interchanges the $i$-th row with a row of index larger than $i$, such that the $(i,i)$ entry of $P_iA^{(i-1)}$ is nonzero.

This procedure is executable for $i = 1, \ldots, n-1$. So we have

\[ L_{n-1}P_{n-1}\cdots L_iP_i\cdots L_1P_1A^{(0)} = R. \tag{2.2.4} \]

Let $P$ be a permutation which affects only the elements $i+1, \ldots, n$. Then

\[ P(I - l_ie_i^T)P^{-1} = I - (Pl_i)e_i^T = I - \tilde{l}_ie_i^T = \tilde{L}_i \qquad (\text{since } e_i^TP^{-1} = e_i^T), \]

where $\tilde{L}_i$ is lower triangular. Hence we have

\[ PL_i = \tilde{L}_iP. \tag{2.2.5} \]

Now move all $P_i$ in (2.2.4) to the right:

\[ L_{n-1}\tilde{L}_{n-2}\cdots\tilde{L}_1P_{n-1}\cdots P_1A^{(0)} = R. \]

Then we have $PA = LR$ with $L^{-1} = L_{n-1}\tilde{L}_{n-2}\cdots\tilde{L}_1$ and $P = P_{n-1}\cdots P_1$.


2.3 Gaussian elimination

2.3.1 Practical implementation

Given a linear system

Ax = b (2.3.1)

with $A$ nonsingular. We first assume that $A$ has an LR-factorization, i.e., $A = LR$. Thus $LRx = b$.

We then (i) solve $Ly = b$; (ii) solve $Rx = y$. These imply that $LRx = Ly = b$. From (2.2.4), we have

\[ L_{n-1}\cdots L_2L_1(A \mid b) = (R \mid L^{-1}b). \]

Algorithm 2.3.1 (without permutation)

For k = 1, ..., n−1,
    if a_kk = 0 then stop (∗);
    else ω_j := a_kj (j = k+1, ..., n);
    for i = k+1, ..., n,
        η := a_ik / a_kk,  a_ik := η;
        for j = k+1, ..., n,
            a_ij := a_ij − η ω_j;
        b_i := b_i − η b_k.

For x (back substitution):
    x_n = b_n / a_nn;
    for i = n−1, n−2, ..., 1,
        x_i = (b_i − Σ_{j=i+1}^{n} a_ij x_j) / a_ii.

Cost of computation (one multiplication + one addition ≡ one flop):

(i) LR-factorization: n³/3− n/3 flops;

(ii) Computation of y: n(n− 1)/2 flops;

(iii) Computation of x: n(n + 1)/2 flops.

For $A^{-1}$: about $\tfrac{4}{3}n^3 \approx n^3/3 + kn^2$ flops, with $k = n$ right-hand sides (linear systems).

Pivoting: (a) Partial pivoting; (b) Complete pivoting.

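From (2.2.2), we have

\[ A^{(k-1)} = \begin{bmatrix}
a_{11}^{(0)} & \cdots & \cdots & \cdots & \cdots & a_{1n}^{(0)} \\
0 & \ddots & & & & \vdots \\
\vdots & & a_{k-1,k-1}^{(k-2)} & \cdots & \cdots & a_{k-1,n}^{(k-2)} \\
\vdots & & 0 & a_{kk}^{(k-1)} & \cdots & a_{kn}^{(k-1)} \\
\vdots & & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & a_{nk}^{(k-1)} & \cdots & a_{nn}^{(k-1)}
\end{bmatrix}. \]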

For (a):

\[ \left\{ \begin{array}{l}
\text{Find a } p \in \{k, \ldots, n\} \text{ such that } |a_{pk}| = \max_{k \le i \le n}|a_{ik}| \quad (r_k = p); \\
\text{swap } a_{kj}, b_k \text{ and } a_{pj}, b_p, \text{ respectively } (j = 1, \ldots, n).
\end{array} \right. \tag{2.3.2} \]

Replacing (∗) in Algorithm 2.3.1 by (2.3.2), we obtain a factorization of $A$ with partial pivoting, i.e., $PA = LR$ (by Theorem 2.2.3), with $|l_{ij}| \le 1$ for $i, j = 1, \ldots, n$. To solve the linear system $Ax = b$, we use

\[ PAx = Pb \ \Rightarrow\ L(Rx) = Pb \equiv \tilde{b}. \]

This needs an extra $n(n-1)/2$ comparisons.
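Partial pivoting as in (2.3.2) can be sketched as follows. The function name `lr_partial_pivoting`, the representation of $P$ as an index list `perm` (row $i$ of $PA$ is row `perm[i]` of $A$), and the sample matrix are assumptions made for this illustration.

```python
def lr_partial_pivoting(a):
    """Return (perm, L, R) such that the rows of A taken in order perm equal L R."""
    n = len(a)
    r = [row[:] for row in a]
    l = [[0.0] * n for _ in range(n)]
    perm = list(range(n))
    for k in range(n - 1):
        # choose p with |a_pk| = max_{k <= i <= n} |a_ik|
        p = max(range(k, n), key=lambda i: abs(r[i][k]))
        if r[p][k] == 0:
            raise ValueError("matrix is singular")
        if p != k:
            r[k], r[p] = r[p], r[k]
            l[k], l[p] = l[p], l[k]            # carry along multipliers already stored
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            l[i][k] = r[i][k] / r[k][k]        # |l_ik| <= 1 by the choice of the pivot
            for j in range(k, n):
                r[i][j] -= l[i][k] * r[k][j]
    for i in range(n):
        l[i][i] = 1.0
    return perm, l, r

A_pp = [[0.0, 2.0, 1.0],
        [1.0, 1.0, 0.0],
        [2.0, 1.0, 1.0]]
perm_pp, L_pp, R_pp = lr_partial_pivoting(A_pp)
```

Note that this matrix has $a_{11} = 0$, so Algorithm 2.3.1 without permutation would stop at the first step.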

For (b):

\[ \left\{ \begin{array}{l}
\text{Find } p, q \in \{k, \ldots, n\} \text{ such that } |a_{pq}| = \max_{k \le i, j \le n}|a_{ij}| \quad (r_k := p,\ c_k := q); \\
\text{swap } a_{kj}, b_k \text{ and } a_{pj}, b_p, \text{ respectively } (j = k, \ldots, n); \\
\text{swap } a_{ik} \text{ and } a_{iq} \ (i = 1, \ldots, n).
\end{array} \right. \tag{2.3.3} \]

Replacing (∗) in Algorithm 2.3.1 by (2.3.3), we obtain a factorization of $A$ with complete pivoting, i.e., $PA\Pi = LR$ (cf. Theorem 2.2.3), with $|l_{ij}| \le 1$ for $i, j = 1, \ldots, n$.

To solve the linear system $Ax = b$, we use

\[ PA\Pi(\Pi^Tx) = Pb \ \Rightarrow\ LR\tilde{x} = \tilde{b} \ \Rightarrow\ x = \Pi\tilde{x}. \]

This needs $n^3/3$ comparisons.

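Example 2.3.1 Let

\[ A = \begin{bmatrix} 10^{-4} & 1 \\ 1 & 1 \end{bmatrix} \]

be given in three-decimal-digit floating point arithmetic. Then $\kappa(A) = \|A\|_\infty\|A^{-1}\|_\infty \approx 4$, so $A$ is well-conditioned.

• Without pivoting:

\[ L = \begin{bmatrix} 1 & 0 \\ fl(1/10^{-4}) & 1 \end{bmatrix}, \quad fl(1/10^{-4}) = 10^4, \qquad R = \begin{bmatrix} 10^{-4} & 1 \\ 0 & fl(1 - 10^4\cdot 1) \end{bmatrix}, \quad fl(1 - 10^4\cdot 1) = -10^4, \]

\[ LR = \begin{bmatrix} 1 & 0 \\ 10^4 & 1 \end{bmatrix}\begin{bmatrix} 10^{-4} & 1 \\ 0 & -10^4 \end{bmatrix} = \begin{bmatrix} 10^{-4} & 1 \\ 1 & 0 \end{bmatrix} \neq \begin{bmatrix} 10^{-4} & 1 \\ 1 & 1 \end{bmatrix} = A. \]

Here $a_{22}$ is entirely "lost" from the computation; the method is numerically unstable. Let $Ax = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$. Then $x \approx \begin{bmatrix} 1 \\ 1 \end{bmatrix}$. But $Ly = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ gives $y_1 = 1$ and $y_2 = fl(2 - 10^4\cdot 1) = -10^4$, and $R\hat{x} = y$ gives $\hat{x}_2 = fl((-10^4)/(-10^4)) = 1$, $\hat{x}_1 = fl((1 - 1)/10^{-4}) = 0$. We have an erroneous solution, with $\mathrm{cond}(L), \mathrm{cond}(R) \approx 10^8$.

• Partial pivoting:

\[ L = \begin{bmatrix} 1 & 0 \\ fl(10^{-4}/1) & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 10^{-4} & 1 \end{bmatrix}, \qquad R = \begin{bmatrix} 1 & 1 \\ 0 & fl(1 - 10^{-4}) \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}. \]

$L$ and $R$ are both well-conditioned.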
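Example 2.3.1 can be reproduced with Python's standard `decimal` module, which rounds every arithmetic operation to a chosen number of significant digits. This is a sketch under the assumption that such decimal rounding is an adequate model of the three-digit arithmetic $fl(\cdot)$ used above; the function name `solve_2x2_no_pivot` is illustrative.

```python
from decimal import Decimal, localcontext

def solve_2x2_no_pivot(a, b, digits=3):
    """Gaussian elimination on a 2x2 system; every operation is rounded to `digits` digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        (a11, a12), (a21, a22) = [[Decimal(v) for v in row] for row in a]
        b1, b2 = (Decimal(v) for v in b)
        l21 = a21 / a11                 # multiplier
        r22 = a22 - l21 * a12           # fl(a22 - fl(l21 * a12))
        y2 = b2 - l21 * b1              # forward substitution
        x2 = y2 / r22                   # back substitution
        x1 = (b1 - a12 * x2) / a11
        return float(x1), float(x2)

a = [["1E-4", "1"], ["1", "1"]]
b = ["1", "2"]
x_plain = solve_2x2_no_pivot(a, b)               # pivot 10^-4: gives (0.0, 1.0)
x_pivot = solve_2x2_no_pivot(a[::-1], b[::-1])   # rows swapped, pivot 1: gives (1.0, 1.0)
```

Swapping the two rows before elimination is exactly the partial pivoting step for this matrix, and it recovers the correct solution to three digits.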

2.3.2 LDR- and LL^T-factorizations

Let A = LDR as in Corollary 2.2.1.

Algorithm 2.3.2 (Crout's factorization or compact method)

For k = 1, ..., n,
    for p = 1, 2, ..., k−1,
        r_p := d_p a_pk,  ω_p := a_kp d_p;
    d_k := a_kk − Σ_{p=1}^{k−1} a_kp r_p;
    if d_k = 0, then stop; else
    for i = k+1, ..., n,
        a_ik := (a_ik − Σ_{p=1}^{k−1} a_ip r_p) / d_k,
        a_ki := (a_ki − Σ_{p=1}^{k−1} ω_p a_pi) / d_k.

Cost: n³/3 flops.

• With partial pivoting: see Wilkinson EVP pp.225-.

• Advantage: One can use double precision for inner product.

Theorem 2.3.1 If $A$ is nonsingular, real and symmetric, and has an LDR-factorization as in Corollary 2.2.1, then $A$ has a unique $LDL^T$-factorization, where $D$ is diagonal and $L$ is a unit lower triangular matrix (with ones on the diagonal).

Proof: $A = LDR = A^T = R^TDL^T$. By the uniqueness of the factorization, this implies $L = R^T$.

Theorem 2.3.2 If A is symmetric and positive definite, then there exists a lower triangular G∈ R^n×n with positive diagonal elements such that A = GG^T.

Proof: $A$ is symmetric positive definite $\Leftrightarrow$ $x^TAx > 0$ for all nonzero vectors $x \in \mathbb{R}^n$ $\Leftrightarrow$ $k_i > 0$ for $i = 1, \ldots, n$ $\Leftrightarrow$ all eigenvalues of $A$ are positive.

From Corollary 2.2.1 and Theorem 2.3.1 we have $A = LDL^T$. From $L^{-1}AL^{-T} = D$ it follows that $d_k = (e_k^TL^{-1})A(L^{-T}e_k) > 0$. Thus $G = L\,\mathrm{diag}\{d_1^{1/2}, \ldots, d_n^{1/2}\}$ is real, and then $A = GG^T$.

Algorithm 2.3.3 (Cholesky factorization) Let A be symmetric positive definite. To find a lower triangular matrix G such that A = GG^T:

For k = 1, 2, ..., n,
    a_kk := (a_kk − Σ_{p=1}^{k−1} a_kp²)^{1/2};
    for i = k+1, ..., n,
        a_ik := (a_ik − Σ_{p=1}^{k−1} a_ip a_kp) / a_kk.

Cost: n³/6 flops.
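A direct transcription of the Cholesky algorithm, as a sketch: unlike the in-place formulation above, it writes $G$ into a separate array, and the name `cholesky` and the test matrix are illustrative assumptions.

```python
import math

def cholesky(a):
    """Return lower triangular G with positive diagonal such that A = G G^T."""
    n = len(a)
    g = [[0.0] * n for _ in range(n)]
    for k in range(n):
        d = a[k][k] - sum(g[k][p] ** 2 for p in range(k))
        if d <= 0:
            raise ValueError("matrix is not positive definite")
        g[k][k] = math.sqrt(d)
        for i in range(k + 1, n):
            s = sum(g[i][p] * g[k][p] for p in range(k))
            g[i][k] = (a[i][k] - s) / g[k][k]
    return g

A_spd = [[4.0, 2.0, 2.0],
         [2.0, 10.0, 7.0],
         [2.0, 7.0, 21.0]]
G = cholesky(A_spd)     # [[2, 0, 0], [1, 3, 0], [1, 2, 4]]
```

Only the lower triangle of the input is read, which reflects the saving by a factor of two compared with the LR-factorization.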

Remark 2.3.1 For solving symmetric, indefinite systems, see Golub/Van Loan, Matrix Computations, pp. 159-168: $PAP^T = LDL^T$, where $D$ is a block-diagonal matrix with $1 \times 1$ or $2 \times 2$ blocks, $P$ is a permutation and $L$ is lower triangular with ones on the diagonal.


2.3.3 Error estimation for linear systems

Consider the linear system

Ax = b, (2.3.4)

and the perturbed linear system

(A + δA)(x + δx) = b + δb, (2.3.5)

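where $\delta A$ and $\delta b$ are errors of measurement or round-off errors in the factorization.

Definition 2.3.1 Let $\|\cdot\|$ be an operator norm and $A$ be nonsingular. Then $\kappa \equiv \kappa(A) = \|A\|\|A^{-1}\|$ is the condition number of $A$ corresponding to $\|\cdot\|$.

Theorem 2.3.3 (Forward error bound) Let $x$ be the solution of (2.3.4) and $x + \delta x$ be the solution of the perturbed linear system (2.3.5). If $\|\delta A\|\|A^{-1}\| < 1$, then

\[ \frac{\|\delta x\|}{\|x\|} \le \frac{\kappa}{1 - \kappa\frac{\|\delta A\|}{\|A\|}}\left(\frac{\|\delta A\|}{\|A\|} + \frac{\|\delta b\|}{\|b\|}\right). \tag{2.3.6} \]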

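Proof: From (2.3.5) we have

\[ (A + \delta A)\delta x + Ax + \delta A\,x = b + \delta b. \]

Thus,

\[ \delta x = -(A + \delta A)^{-1}[(\delta A)x - \delta b]. \tag{2.3.7} \]

Here, Corollary 2.7 implies that $(A + \delta A)^{-1}$ exists. Now,

\[ \|(A + \delta A)^{-1}\| = \|(I + A^{-1}\delta A)^{-1}A^{-1}\| \le \|A^{-1}\|\frac{1}{1 - \|A^{-1}\|\|\delta A\|}. \]

On the other hand, $b = Ax$ implies $\|b\| \le \|A\|\|x\|$. So,

\[ \frac{1}{\|x\|} \le \frac{\|A\|}{\|b\|}. \tag{2.3.8} \]

From (2.3.7) it follows that $\|\delta x\| \le \frac{\|A^{-1}\|}{1 - \|A^{-1}\|\|\delta A\|}(\|\delta A\|\|x\| + \|\delta b\|)$. Dividing by $\|x\|$ and using (2.3.8), the inequality (2.3.6) is proved.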

Remark 2.3.2 If $\kappa(A)$ is large, then $A$ (for the linear system $Ax = b$) is called ill-conditioned, otherwise well-conditioned.
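The forward error bound (2.3.6) can be checked on a small example in the $\infty$-norm. The matrix, the right-hand side and the perturbations below are hypothetical values chosen for illustration, and the helper names are not from the notes.

```python
def inf_norm_vec(v):
    return max(abs(t) for t in v)

def inf_norm_mat(m):
    return max(sum(abs(t) for t in row) for row in m)

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

A = [[2.0, 1.0], [1.0, 3.0]]
b = [3.0, 5.0]
dA = [[0.01, 0.0], [0.0, -0.01]]      # hypothetical perturbation of A
db = [0.01, 0.0]                      # hypothetical perturbation of b

A_inv = inverse_2x2(A)
x = matvec(A_inv, b)                                   # exact solution (0.8, 1.4)
A_pert = [[A[i][j] + dA[i][j] for j in range(2)] for i in range(2)]
b_pert = [b[i] + db[i] for i in range(2)]
x_pert = matvec(inverse_2x2(A_pert), b_pert)           # x + delta x

kappa = inf_norm_mat(A) * inf_norm_mat(A_inv)          # 4 * 0.8 = 3.2
rel_dA = inf_norm_mat(dA) / inf_norm_mat(A)
rel_db = inf_norm_vec(db) / inf_norm_vec(b)
lhs = inf_norm_vec([p - q for p, q in zip(x_pert, x)]) / inf_norm_vec(x)
rhs = kappa / (1.0 - kappa * rel_dA) * (rel_dA + rel_db)
```

Here the actual relative error `lhs` is several times smaller than the bound `rhs`, which is typical: (2.3.6) is a worst-case estimate.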

2.3.4 Error analysis for Gaussian algorithm

A computer is characterized by four integers: (a) the machine base $\beta$; (b) the precision $t$; (c) the underflow limit $L$; (d) the overflow limit $U$. Define the set of floating point numbers

\[ F = \{ f = \pm 0.d_1d_2\cdots d_t \times \beta^e \mid 0 \le d_i < \beta,\ d_1 \neq 0,\ L \le e \le U \} \cup \{0\}. \tag{2.3.9} \]

Let $G = \{x \in \mathbb{R} \mid m \le |x| \le M\} \cup \{0\}$, where $m = \beta^{L-1}$ and $M = \beta^U(1 - \beta^{-t})$ are the minimal and maximal numbers of $F \setminus \{0\}$ in absolute value, respectively. We define an operator $fl : G \to F$ by

\[ fl(x) = \text{the nearest } c \in F \text{ to } x \text{ by rounding arithmetic}. \]

One can show that $fl$ satisfies

\[ fl(x) = x(1 + \varepsilon), \quad |\varepsilon| \le eps, \tag{2.3.10} \]

where $eps = \frac{1}{2}\beta^{1-t}$. (If $\beta = 2$, then $eps = 2^{-t}$.) It follows that

\[ fl(a \circ b) = (a \circ b)(1 + \varepsilon) \quad\text{or}\quad fl(a \circ b) = (a \circ b)/(1 + \varepsilon), \]

where $|\varepsilon| \le eps$ and $\circ = +, -, \times, /$.

Algorithm 2.3.4 Given x, y ∈ Rⁿ. The following algorithm computes x^Ty and stores the result in s.

s = 0,

for k = 1, . . . , n, s = s + xkyk.

Theorem 2.3.4 If $n2^{-t} \le 0.01$, then

\[ fl\Big(\sum_{k=1}^n x_ky_k\Big) = \sum_{k=1}^n x_ky_k[1 + 1.01(n + 2 - k)\theta_k2^{-t}], \quad |\theta_k| \le 1. \]

Proof: Let $s_p = fl(\sum_{k=1}^p x_ky_k)$ be the partial sum in Algorithm 2.3.4. Then $s_1 = x_1y_1(1 + \delta_1)$ with $|\delta_1| \le eps$, and for $p = 2, \ldots, n$,

\[ s_p = fl[s_{p-1} + fl(x_py_p)] = [s_{p-1} + x_py_p(1 + \delta_p)](1 + \varepsilon_p) \]

with $|\delta_p|, |\varepsilon_p| \le eps$. Therefore

\[ fl(x^Ty) = s_n = \sum_{k=1}^n x_ky_k(1 + \gamma_k), \]

where $(1 + \gamma_k) = (1 + \delta_k)\prod_{j=k}^n(1 + \varepsilon_j)$ and $\varepsilon_1 \equiv 0$. Thus,

\[ fl\Big(\sum_{k=1}^n x_ky_k\Big) = \sum_{k=1}^n x_ky_k[1 + 1.01(n + 2 - k)\theta_k2^{-t}]. \tag{2.3.11} \]

The result follows immediately from the following useful lemma.

Lemma 2.3.5 If $(1 + \alpha) = \prod_{k=1}^n(1 + \alpha_k)$, where $|\alpha_k| \le 2^{-t}$ and $n2^{-t} \le 0.01$, then

\[ \prod_{k=1}^n(1 + \alpha_k) = 1 + 1.01\,n\theta2^{-t} \quad\text{with } |\theta| \le 1. \]

Proof: From the assumption it is easily seen that

\[ (1 - 2^{-t})^n \le \prod_{k=1}^n(1 + \alpha_k) \le (1 + 2^{-t})^n. \tag{2.3.12} \]

Expanding $(1 - x)^n$ by Taylor's formula for $-1 < x < 1$, we get

\[ (1 - x)^n = 1 - nx + \frac{n(n-1)}{2}(1 - \theta x)^{n-2}x^2 \ge 1 - nx. \]

Hence

\[ (1 - 2^{-t})^n \ge 1 - n2^{-t}. \tag{2.3.13} \]

Now, we estimate the upper bound of $(1 + 2^{-t})^n$:

\[ e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots = 1 + x + \frac{x}{2}\,x\Big(1 + \frac{x}{3} + \frac{2x^2}{4!} + \cdots\Big). \]

If $0 \le x \le 0.01$, then

\[ 1 + x \le e^x \le 1 + x + 0.01x\,\tfrac{1}{2}e^x \le 1 + 1.01x. \tag{2.3.14} \]

(Here, we use the fact $e^{0.01} < 2$ for the last inequality.) Let $x = 2^{-t}$. Then the left inequality of (2.3.14) implies

\[ (1 + 2^{-t})^n \le e^{2^{-t}n}. \tag{2.3.15} \]

Let $x = 2^{-t}n$. Then the second inequality of (2.3.14) implies

\[ e^{2^{-t}n} \le 1 + 1.01\,n2^{-t}. \tag{2.3.16} \]

From (2.3.15) and (2.3.16) we have

\[ (1 + 2^{-t})^n \le 1 + 1.01\,n2^{-t}. \]

Let the exact LR-factorization of A be L and R (A = LR) and let ˜L, ˜R be the LR-factorization of A by using Gaussian Algorithm (without pivoting). There are two possibilities:

(i) Forward error analysis: Estimate |L − ˜L| and |R − ˜R|.

(ii) Backward error analysis: Let ˜L ˜R be the exact LR-factorization of a perturbed matrix ˜A = A + F . Then F will be estimated, i.e., |F | ≤ ?.

2.3.5 A priori error estimate for backward error bound of LR-factorization

From (2.2.2) we have

\[ A^{(k+1)} = L_kA^{(k)}, \]

for $k = 1, 2, \ldots, n-1$ ($A^{(1)} = A$). Denote the entries of $A^{(k)}$ by $a_{ij}^{(k)}$ and let $l_{ik} = fl(a_{ik}^{(k)}/a_{kk}^{(k)})$, $i \ge k+1$. From (2.2.2) we know that

\[ a_{ij}^{(k+1)} = \begin{cases} 0, & i \ge k+1,\ j = k, \\ fl(a_{ij}^{(k)} - fl(l_{ik}a_{kj}^{(k)})), & i \ge k+1,\ j \ge k+1, \\ a_{ij}^{(k)}, & \text{otherwise.} \end{cases} \tag{2.3.17} \]

From (2.3.10) we have $l_{ik} = (a_{ik}^{(k)}/a_{kk}^{(k)})(1 + \delta_{ik})$ with $|\delta_{ik}| \le 2^{-t}$. Then

\[ a_{ik}^{(k)} - l_{ik}a_{kk}^{(k)} + a_{ik}^{(k)}\delta_{ik} = 0, \quad i \ge k+1. \tag{2.3.18} \]

Let $a_{ik}^{(k)}\delta_{ik} \equiv \varepsilon_{ik}^{(k)}$. From (2.3.10) we also have

\[ a_{ij}^{(k+1)} = fl(a_{ij}^{(k)} - fl(l_{ik}a_{kj}^{(k)})) = \big(a_{ij}^{(k)} - l_{ik}a_{kj}^{(k)}(1 + \delta_{ij})\big)/(1 + \delta'_{ij}) \tag{2.3.19} \]

with $|\delta_{ij}|, |\delta'_{ij}| \le 2^{-t}$. Then

\[ a_{ij}^{(k+1)} = a_{ij}^{(k)} - l_{ik}a_{kj}^{(k)} - l_{ik}a_{kj}^{(k)}\delta_{ij} - a_{ij}^{(k+1)}\delta'_{ij}, \quad i, j \ge k+1. \tag{2.3.20} \]

Let $\varepsilon_{ij}^{(k)} \equiv -l_{ik}a_{kj}^{(k)}\delta_{ij} - a_{ij}^{(k+1)}\delta'_{ij}$, which is the computational error of the $(i,j)$ entry of $A^{(k+1)}$. From (2.3.17), (2.3.18) and (2.3.20) we obtain

\[ a_{ij}^{(k+1)} = \begin{cases} a_{ij}^{(k)} - l_{ik}a_{kk}^{(k)} + \varepsilon_{ij}^{(k)}, & i \ge k+1,\ j = k, \\ a_{ij}^{(k)} - l_{ik}a_{kj}^{(k)} + \varepsilon_{ij}^{(k)}, & i \ge k+1,\ j \ge k+1, \\ a_{ij}^{(k)} + \varepsilon_{ij}^{(k)}, & \text{otherwise,} \end{cases} \tag{2.3.21} \]

where

\[ \varepsilon_{ij}^{(k)} = \begin{cases} a_{ij}^{(k)}\delta_{ij}, & i \ge k+1,\ j = k, \\ -l_{ik}a_{kj}^{(k)}\delta_{ij} - a_{ij}^{(k+1)}\delta'_{ij}, & i \ge k+1,\ j \ge k+1, \\ 0, & \text{otherwise.} \end{cases} \tag{2.3.22} \]

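Let $E^{(k)}$ be the error matrix with entries $\varepsilon_{ij}^{(k)}$. Then (2.3.21) can be written as

\[ A^{(k+1)} = A^{(k)} - M_kA^{(k)} + E^{(k)}, \tag{2.3.23} \]

where

\[ M_k = \begin{bmatrix} 0 & & & & \\ & \ddots & & & \\ & & 0 & & \\ & & l_{k+1,k} & \ddots & \\ & & \vdots & & \\ & & l_{n,k} & & 0 \end{bmatrix}. \tag{2.3.24} \]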

For $k = 1, 2, \ldots, n-1$, we add the $n-1$ equations in (2.3.23) together and get

\[ M_1A^{(1)} + M_2A^{(2)} + \cdots + M_{n-1}A^{(n-1)} + A^{(n)} = A^{(1)} + E^{(1)} + \cdots + E^{(n-1)}. \]

From (2.3.17) we know that the $k$-th row of $A^{(k)}$ is equal to the $k$-th row of $A^{(k+1)}, \ldots, A^{(n)}$, respectively, and from (2.3.24) we also have

\[ M_kA^{(k)} = M_kA^{(n)} = M_k\tilde{R}. \]

Thus,

\[ (M_1 + M_2 + \cdots + M_{n-1} + I)\tilde{R} = A^{(1)} + E^{(1)} + \cdots + E^{(n-1)}. \]

Then

\[ \tilde{L}\tilde{R} = A + E, \tag{2.3.25} \]

where

\[ \tilde{L} = \begin{bmatrix} 1 & & & O \\ l_{21} & 1 & & \\ \vdots & & \ddots & \\ l_{n1} & \cdots & l_{n,n-1} & 1 \end{bmatrix} \quad\text{and}\quad E = E^{(1)} + \cdots + E^{(n-1)}. \tag{2.3.26} \]

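Now we assume that the partial pivoting in Gaussian elimination has already been arranged such that the pivot element $a_{kk}^{(k)}$ has the maximal absolute value. So we have $|l_{ik}| \le 1$.

Let

\[ \rho = \max_{i,j,k}|a_{ij}^{(k)}| / \|A\|_\infty. \tag{2.3.27} \]

Then

\[ |a_{ij}^{(k)}| \le \rho\|A\|_\infty. \tag{2.3.28} \]

From (2.3.22) and (2.3.28) it follows that

\[ |\varepsilon_{ij}^{(k)}| \le \rho\|A\|_\infty \cdot \begin{cases} 2^{-t}, & i \ge k+1,\ j = k, \\ 2^{1-t}, & i \ge k+1,\ j \ge k+1, \\ 0, & \text{otherwise.} \end{cases} \tag{2.3.29} \]

Therefore,

\[ |E^{(k)}| \le \rho\|A\|_\infty2^{-t} \cdot \begin{bmatrix} 0 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 2 & \cdots & 2 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 1 & 2 & \cdots & 2 \end{bmatrix}, \tag{2.3.30} \]

where the column of ones is the $k$-th column and the nonzero rows are rows $k+1, \ldots, n$.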

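From (2.3.26) we get

\[ |E| \le \rho\|A\|_\infty2^{-t} \begin{bmatrix}
0 & 0 & 0 & \cdots & 0 & 0 \\
1 & 2 & 2 & \cdots & 2 & 2 \\
1 & 3 & 4 & \cdots & 4 & 4 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
1 & 3 & 5 & \cdots & 2n-4 & 2n-4 \\
1 & 3 & 5 & \cdots & 2n-3 & 2n-2
\end{bmatrix}. \tag{2.3.31} \]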

Hence we have the following theorem.

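Theorem 2.3.6 The LR-factorization $\tilde{L}$ and $\tilde{R}$ of $A$ using Gaussian elimination with partial pivoting satisfies

\[ \tilde{L}\tilde{R} = A + E, \]

where

\[ \|E\|_\infty \le n^2\rho\|A\|_\infty2^{-t}. \tag{2.3.32} \]

Proof:

\[ \|E\|_\infty \le \rho\|A\|_\infty2^{-t}\Big(\sum_{j=1}^n(2j - 1) - 1\Big) < n^2\rho\|A\|_\infty2^{-t}. \]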

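Now we shall solve the linear system $Ax = b$ by using the factorization $\tilde{L}$ and $\tilde{R}$, i.e., $\tilde{L}y = b$ and $\tilde{R}x = y$.

• For $Ly = b$: From Algorithm 2.3.1 we have

\[ y_1 = fl(b_1/l_{11}), \qquad y_i = fl\left(\frac{-l_{i1}y_1 - l_{i2}y_2 - \cdots - l_{i,i-1}y_{i-1} + b_i}{l_{ii}}\right), \tag{2.3.33} \]

for $i = 2, 3, \ldots, n$. From (2.3.10) we have

\[ \begin{cases}
y_1 = \dfrac{b_1}{l_{11}(1 + \delta_{11})}, \quad |\delta_{11}| \le 2^{-t}, \\[2ex]
y_i = fl\left(\dfrac{fl(-l_{i1}y_1 - l_{i2}y_2 - \cdots - l_{i,i-1}y_{i-1}) + b_i}{l_{ii}(1 + \delta_{ii})}\right) = \dfrac{fl(-l_{i1}y_1 - l_{i2}y_2 - \cdots - l_{i,i-1}y_{i-1}) + b_i}{l_{ii}(1 + \delta_{ii})(1 + \delta'_{ii})}, \quad |\delta_{ii}|, |\delta'_{ii}| \le 2^{-t}.
\end{cases} \tag{2.3.34} \]

Applying Theorem 2.3.4 we get

\[ fl(-l_{i1}y_1 - l_{i2}y_2 - \cdots - l_{i,i-1}y_{i-1}) = -l_{i1}(1 + \delta_{i1})y_1 - \cdots - l_{i,i-1}(1 + \delta_{i,i-1})y_{i-1}, \]

where

\[ \begin{array}{ll}
|\delta_{i1}| \le (i - 1)\,1.01\cdot2^{-t}, & i = 2, 3, \ldots, n, \\
|\delta_{ij}| \le (i + 1 - j)\,1.01\cdot2^{-t}, & i = 2, 3, \ldots, n,\ j = 2, 3, \ldots, i-1.
\end{array} \tag{2.3.35} \]

So (2.3.34) can be written as

\[ \begin{cases}
l_{11}(1 + \delta_{11})y_1 = b_1, \\
l_{i1}(1 + \delta_{i1})y_1 + \cdots + l_{i,i-1}(1 + \delta_{i,i-1})y_{i-1} + l_{ii}(1 + \delta_{ii})(1 + \delta'_{ii})y_i = b_i, \quad i = 2, 3, \ldots, n,
\end{cases} \tag{2.3.36} \]

or

\[ (L + \delta L)y = b. \tag{2.3.37} \]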

From (2.3.35), (2.3.36) and (2.3.37) it follows that

\[ |\delta L| \le 1.01\cdot2^{-t} \begin{bmatrix}
|l_{11}| & & & & & 0 \\
|l_{21}| & 2|l_{22}| & & & & \\
2|l_{31}| & 2|l_{32}| & 2|l_{33}| & & & \\
3|l_{41}| & 3|l_{42}| & 2|l_{43}| & \ddots & & \\
\vdots & \vdots & \vdots & \ddots & \ddots & \\
(n-1)|l_{n1}| & (n-1)|l_{n2}| & (n-2)|l_{n3}| & \cdots & 2|l_{n,n-1}| & 2|l_{nn}|
\end{bmatrix}. \tag{2.3.38} \]

This implies

\[ \|\delta L\|_\infty \le \frac{n(n+1)}{2}\cdot1.01\cdot2^{-t}\max_{i,j}|l_{ij}| \le \frac{n(n+1)}{2}\cdot1.01\cdot2^{-t}, \tag{2.3.39} \]

where the last inequality uses $|l_{ij}| \le 1$.

Theorem 2.3.7 For the lower triangular linear system $Ly = b$, the computed solution $y$ is the exact solution of $(L + \delta L)y = b$, where $\delta L$ satisfies (2.3.38) and (2.3.39).

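Applying Theorem 2.3.7 to the linear systems $\tilde{L}y = b$ and $\tilde{R}x = y$, respectively, the solution $x$ satisfies

\[ (\tilde{L} + \delta\tilde{L})(\tilde{R} + \delta\tilde{R})x = b \]

or

\[ (\tilde{L}\tilde{R} + (\delta\tilde{L})\tilde{R} + \tilde{L}(\delta\tilde{R}) + (\delta\tilde{L})(\delta\tilde{R}))x = b. \tag{2.3.40} \]

Since $\tilde{L}\tilde{R} = A + E$, substituting this equation into (2.3.40) we get

\[ [A + E + (\delta\tilde{L})\tilde{R} + \tilde{L}(\delta\tilde{R}) + (\delta\tilde{L})(\delta\tilde{R})]x = b. \tag{2.3.41} \]

The entries of $\tilde{L}$ and $\tilde{R}$ satisfy

\[ |\tilde{l}_{ij}| \le 1 \quad\text{and}\quad |\tilde{r}_{ij}| \le \rho\|A\|_\infty. \]

Therefore, we get

\[ \begin{cases}
\|\tilde{L}\|_\infty \le n, \\
\|\tilde{R}\|_\infty \le n\rho\|A\|_\infty, \\
\|\delta\tilde{L}\|_\infty \le \frac{n(n+1)}{2}\,1.01\cdot2^{-t}, \\
\|\delta\tilde{R}\|_\infty \le \frac{n(n+1)}{2}\,1.01\,\rho\|A\|_\infty2^{-t}.
\end{cases} \tag{2.3.42} \]

In practical implementations we usually have $n^22^{-t} \ll 1$, so that $\|\delta\tilde{L}\|_\infty\|\delta\tilde{R}\|_\infty \le n^2\rho\|A\|_\infty2^{-t}$.

Let

\[ \delta A = E + (\delta\tilde{L})\tilde{R} + \tilde{L}(\delta\tilde{R}) + (\delta\tilde{L})(\delta\tilde{R}). \tag{2.3.43} \]

Then, from (2.3.32) and (2.3.42), we get

\[ \|\delta A\|_\infty \le \|E\|_\infty + \|\delta\tilde{L}\|_\infty\|\tilde{R}\|_\infty + \|\tilde{L}\|_\infty\|\delta\tilde{R}\|_\infty + \|\delta\tilde{L}\|_\infty\|\delta\tilde{R}\|_\infty \le 1.01(n^3 + 3n^2)\rho\|A\|_\infty2^{-t}. \tag{2.3.44} \]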

Theorem 2.3.8 For a linear system $Ax = b$, the solution $x$ computed by Gaussian elimination with partial pivoting is the exact solution of the equation $(A + \delta A)x = b$, where $\delta A$ satisfies (2.3.43) and (2.3.44).

Remark 2.3.3 The quantity $\rho$ defined by (2.3.27) is called a growth factor. The growth factor measures how large the numbers become during the process of elimination. In practice, $\rho$ is usually of order 10 for partial pivot selection. But it can be as large as $\rho = 2^{n-1}$, when

\[ A = \begin{bmatrix}
1 & 0 & \cdots & \cdots & 0 & 1 \\
-1 & 1 & 0 & \cdots & 0 & 1 \\
\vdots & -1 & \ddots & \ddots & \vdots & 1 \\
\vdots & \vdots & \ddots & \ddots & 0 & 1 \\
-1 & -1 & \cdots & -1 & 1 & 1 \\
-1 & -1 & \cdots & \cdots & -1 & 1
\end{bmatrix}. \]

Better estimates hold for special types of matrices. For example, in the case of upper Hessenberg matrices, that is, matrices of the form

\[ A = \begin{bmatrix} \times & \cdots & \cdots & \times \\ \times & \ddots & & \vdots \\ & \ddots & \ddots & \vdots \\ 0 & & \times & \times \end{bmatrix}, \]

the bound $\rho \le n - 1$ can be shown. (Hessenberg matrices arise in eigenvalue problems.) For tridiagonal matrices

\[ A = \begin{bmatrix} \alpha_1 & \beta_2 & & & 0 \\ \gamma_2 & \ddots & \ddots & & \\ & \ddots & \ddots & \ddots & \\ & & \ddots & \ddots & \beta_n \\ 0 & & & \gamma_n & \alpha_n \end{bmatrix} \]

it can even be shown that $\rho \le 2$ holds for partial pivot selection. Hence, Gaussian elimination is numerically quite stable in this case.
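The worst-case growth for the matrix with $-1$ below the diagonal and $1$ in the last column can be observed directly. The sketch below uses hypothetical function names and measures growth relative to $\max_{i,j}|a_{ij}|$ (which is 1 for this matrix), the normalization under which the factor is exactly $2^{n-1}$; note that (2.3.27) divides by $\|A\|_\infty$ instead.

```python
def worst_case_matrix(n):
    """1 on the diagonal and in the last column, -1 below the diagonal, 0 elsewhere."""
    return [[1.0 if (j == i or j == n - 1) else (-1.0 if j < i else 0.0)
             for j in range(n)] for i in range(n)]

def max_entry_during_elimination(a):
    """Run elimination with partial pivoting; return the largest |entry| of any A^(k)."""
    n = len(a)
    a = [row[:] for row in a]
    biggest = max(abs(v) for row in a for v in row)
    for k in range(n - 1):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))   # first row of maximal |a_ik|
        a[k], a[p] = a[p], a[k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
        biggest = max(biggest, max(abs(v) for row in a for v in row))
    return biggest

growth_5 = max_entry_during_elimination(worst_case_matrix(5))     # 2**4
growth_10 = max_entry_during_elimination(worst_case_matrix(10))   # 2**9
```

No row interchange ever takes place here, since all candidates in each column have the same absolute value; the last column simply doubles at every step.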

For complete pivot selection, Wilkinson (1965) has shown that

\[ |a_{ij}^{(k)}| \le f(k)\max_{i,j}|a_{ij}| \]

with the function

\[ f(k) := k^{1/2}\big[2^1\,3^{1/2}\,4^{1/3}\cdots k^{1/(k-1)}\big]^{1/2}. \]

This function grows relatively slowly with $k$:

k      10   20   50    100
f(k)   19   67   530   3300

Even this estimate is too pessimistic in practice. Up until now, no matrix has been found which fails to satisfy

\[ |a_{ij}^{(k)}| \le (k + 1)\max_{i,j}|a_{ij}|, \quad k = 1, 2, \ldots, n-1, \]

when complete pivot selection is used. This indicates that Gaussian elimination with complete pivot selection is usually a stable process. Despite this, partial pivot selection is preferred in practice, for the most part, because:

(i) Complete pivot selection is more costly than partial pivot selection. (To compute $A^{(i)}$, the maximum from among $(n - i + 1)^2$ elements must be determined instead of $n - i + 1$ elements as in partial pivot selection.)

(ii) Special structures in a matrix, e.g. the band structure of a tridiagonal matrix, are destroyed in complete pivot selection.

2.3.6 Improving and Estimating Accuracy

• Iterative Improvement:

Suppose that the linear system $Ax = b$ has been solved via the LR-factorization $PA = LR$. Now we want to improve the accuracy of the computed solution $x$. We compute

\[ r = b - Ax, \quad Ly = Pr, \quad Rz = y, \quad x_{new} = x + z. \tag{2.3.45} \]

Then in exact arithmetic we have

\[ Ax_{new} = A(x + z) = (b - r) + Az = b. \]

Unfortunately, $r = fl(b - Ax)$ renders an $x_{new}$ that is no more accurate than $x$. It is necessary to compute the residual $b - Ax$ with extended precision floating point arithmetic.

Algorithm 2.3.5

Compute PA = LR (t-digit)
Repeat:
    r := b − Ax (2t-digit)
    Solve Ly = Pr for y (t-digit)
    Solve Rz = y for z (t-digit)
    Update x = x + z (t-digit)
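The effect of computing the residual in $2t$ digits can be illustrated with Python's `decimal` module. This is a sketch under several assumptions: decimal rounding with $t = 3$ stands in for the $t$-digit arithmetic, the rows of the example are already ordered so that no interchange is needed (so $P = I$ is omitted), and all function names are hypothetical.

```python
from decimal import Decimal, localcontext

def factor(a, digits):
    """LR-factorization without pivoting; multipliers are stored below the diagonal."""
    n = len(a)
    with localcontext() as ctx:
        ctx.prec = digits
        lr = [row[:] for row in a]
        for k in range(n - 1):
            for i in range(k + 1, n):
                lr[i][k] = lr[i][k] / lr[k][k]
                for j in range(k + 1, n):
                    lr[i][j] = lr[i][j] - lr[i][k] * lr[k][j]
    return lr

def solve(lr, b, digits):
    """Forward substitution with unit lower L, then back substitution with R."""
    n = len(lr)
    with localcontext() as ctx:
        ctx.prec = digits
        y = b[:]
        for i in range(n):
            for j in range(i):
                y[i] = y[i] - lr[i][j] * y[j]
        x = y[:]
        for i in reversed(range(n)):
            for j in range(i + 1, n):
                x[i] = x[i] - lr[i][j] * x[j]
            x[i] = x[i] / lr[i][i]
    return x

def residual(a, x, b, digits):
    """r = b - A x, accumulated with `digits` significant digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        r = b[:]
        for i in range(len(a)):
            for j in range(len(a)):
                r[i] = r[i] - a[i][j] * x[j]
    return r

def refine(a, b, t, steps):
    """Return the list of iterates x_0, x_1, ..., x_steps of Algorithm 2.3.5."""
    lr = factor(a, t)
    x = solve(lr, b, t)
    history = [x]
    for _ in range(steps):
        r = residual(a, x, b, 2 * t)          # residual in 2t digits
        z = solve(lr, r, t)                   # correction in t digits
        with localcontext() as ctx:
            ctx.prec = t
            x = [xi + zi for xi, zi in zip(x, z)]
        history.append(x)
    return history

A_ref = [[Decimal(7), Decimal(10)], [Decimal(5), Decimal(7)]]   # exact solution (3, -2)
b_ref = [Decimal(1), Decimal(1)]
iterates = refine(A_ref, b_ref, t=3, steps=2)
```

For this example the first three-digit solve gives $(3.06, -2.04)$, and a single improvement step already yields $(3.00, -2.00)$.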

Algorithm 2.3.5 is referred to as iterative improvement. From (2.3.45) we have

\[ r_i = b_i - a_{i1}x_1 - a_{i2}x_2 - \cdots - a_{in}x_n. \tag{2.3.46} \]

Now, $r_i$ can be roughly estimated by $2^{-t}\max_j|a_{ij}||x_j|$. That is,

\[ \|r\| \approx 2^{-t}\|A\|\|x\|. \tag{2.3.47} \]

Let $e = x - A^{-1}b = A^{-1}(Ax - b) = -A^{-1}r$. Then we have

\[ \|e\| \le \|A^{-1}\|\|r\|. \tag{2.3.48} \]

From (2.3.47) it follows that

\[ \|e\| \approx \|A^{-1}\|\cdot2^{-t}\|A\|\|x\| = 2^{-t}\mathrm{cond}(A)\|x\|. \]

Let

\[ \mathrm{cond}(A) = 2^p, \quad 0 < p < t \quad (p \text{ an integer}). \tag{2.3.49} \]

Then we have

\[ \|e\|/\|x\| \approx 2^{-(t-p)}. \tag{2.3.50} \]

From (2.3.50) we know that $x$ has $q = t - p$ correct significant digits. Since $r$ is computed in double precision, we can assume that it has at least $t$ correct significant digits. Therefore, when solving $Az = r$, according to (2.3.50) the solution $z$ (compared with $-e = A^{-1}r$) has $q$ digits of accuracy, so that $x_{new} = x + z$ usually has $2q$ digits of accuracy. From the above discussion, the accuracy of $x_{new}$ is improved by about $q$ digits after one iteration. Hence we stop the iteration when the number of iterations $k$ satisfies $kq \ge t$.

From the above we have

\[ \|z\|/\|x\| \approx \|e\|/\|x\| \approx 2^{-q} = 2^{-t}2^p. \tag{2.3.51} \]

From (2.3.49) and (2.3.51) we have

\[ \mathrm{cond}(A) = 2^t\cdot(\|z\|/\|x\|). \]

By (2.3.51) we get

\[ q = \log_2\Big(\frac{\|x\|}{\|z\|}\Big) \quad\text{and}\quad k = \frac{t}{\log_2(\|x\|/\|z\|)}. \]

In the following we give a further discussion of the convergence of the iterative improvement. From Theorem 2.3.8 we know that $z$ in Algorithm 2.3.5 is computed by $(A + \delta A)z = r$. That is,

\[ A(I + F)z = r, \tag{2.3.52} \]

where $F = A^{-1}\delta A$.

Theorem 2.3.9 Let $\{x_k\}$ be the sequence of improved solutions in Algorithm 2.3.5 for solving $Ax = b$ and let $x^* = A^{-1}b$ be the exact solution. Assume that $F_k$ in (2.3.52) satisfies $\|F_k\| \le \sigma < 1/2$ for all $k$. Then $\{x_k\}$ converges to $x^*$, i.e., $\lim_{k\to\infty}\|x_k - x^*\| = 0$.

Proof: From (2.3.52) and $r_k = b - Ax_k$ we have

\[ A(I + F_k)z_k = b - Ax_k. \tag{2.3.53} \]

Since $A$ is nonsingular, multiplying both sides of (2.3.53) by $A^{-1}$ we get

\[ (I + F_k)z_k = x^* - x_k. \]

From $x_{k+1} = x_k + z_k$ we have $(I + F_k)(x_{k+1} - x_k) = x^* - x_k$, i.e.,

\[ (I + F_k)x_{k+1} = F_kx_k + x^*. \tag{2.3.54} \]

Subtracting $(I + F_k)x^*$ from both sides of (2.3.54) we get

\[ (I + F_k)(x_{k+1} - x^*) = F_k(x_k - x^*). \]

Applying Corollary 1.2.1 we have

\[ x_{k+1} - x^* = (I + F_k)^{-1}F_k(x_k - x^*). \]

Hence,

\[ \|x_{k+1} - x^*\| \le \frac{\|F_k\|}{1 - \|F_k\|}\|x_k - x^*\| \le \frac{\sigma}{1 - \sigma}\|x_k - x^*\|. \]

Let $\tau = \sigma/(1 - \sigma)$. Then

\[ \|x_k - x^*\| \le \tau^{k-1}\|x_1 - x^*\|. \]

Since $\sigma < 1/2$, we have $\tau < 1$. This implies convergence of Algorithm 2.3.5.

Corollary 2.3.1 If

\[ 1.01(n^3 + 3n^2)\rho2^{-t}\|A\|\|A^{-1}\| < 1/2, \]

then Algorithm 2.3.5 converges.

Proof: From (2.3.52) and (2.3.44) it follows that

\[ \|F_k\| \le 1.01(n^3 + 3n^2)\rho2^{-t}\mathrm{cond}(A) < 1/2. \]

2.3.7 Special Linear Systems

Toeplitz Systems

Definition 2.3.2 (i) $T \in \mathbb{R}^{n\times n}$ is called a Toeplitz matrix if there exist $r_{-n+1}, \ldots, r_0, \ldots, r_{n-1}$ such that $a_{ij} = r_{j-i}$ for all $i, j$, e.g.,

\[ T = \begin{bmatrix} r_0 & r_1 & r_2 & r_3 \\ r_{-1} & r_0 & r_1 & r_2 \\ r_{-2} & r_{-1} & r_0 & r_1 \\ r_{-3} & r_{-2} & r_{-1} & r_0 \end{bmatrix} \quad (n = 4). \]

(ii) $B \in \mathbb{R}^{n\times n}$ is called a persymmetric matrix if it is symmetric about the northeast-southwest diagonal, i.e., $b_{ij} = b_{n-j+1,n-i+1}$ for all $i, j$. That is,

\[ B = EB^TE, \quad\text{where } E = [e_n, \ldots, e_1]. \]

Given scalars $r_1, \ldots, r_{n-1}$ such that the matrices

\[ T_k = \begin{bmatrix} 1 & r_1 & \cdots & r_{k-1} \\ r_1 & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & r_1 \\ r_{k-1} & \cdots & r_1 & 1 \end{bmatrix} \]

are all positive definite, for $k = 1, \ldots, n$. Three algorithms will be described:

(a) Durbin's Algorithm for the Yule-Walker problem $T_ny = -(r_1, \ldots, r_n)^T$.

(b) Levinson's Algorithm for the general right-hand side $T_nx = b$.

(c) Trench's Algorithm for computing $B = T_n^{-1}$.

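• To (a): Let $E_k = [e_k^{(k)}, \ldots, e_1^{(k)}]$. Suppose the $k$-th order Yule-Walker system $T_ky = -(r_1, \ldots, r_k)^T = -r$ has been solved. Then the $(k+1)$-st order system

\[ \begin{bmatrix} T_k & E_kr \\ r^TE_k & 1 \end{bmatrix}\begin{bmatrix} z \\ \alpha \end{bmatrix} = -\begin{bmatrix} r \\ r_{k+1} \end{bmatrix} \]

can be solved in $O(k)$ flops. Observe that

\[ z = T_k^{-1}(-r - \alpha E_kr) = y - \alpha T_k^{-1}E_kr \tag{2.3.55} \]

and

\[ \alpha = -r_{k+1} - r^TE_kz. \tag{2.3.56} \]

Since $T_k^{-1}$ is persymmetric, $T_k^{-1}E_k = E_kT_k^{-1}$ and $z = y + \alpha E_ky$. Substituting into (2.3.56) we get

\[ \alpha = -r_{k+1} - r^TE_k(y + \alpha E_ky), \quad\text{i.e.,}\quad \alpha = -(r_{k+1} + r^TE_ky)/(1 + r^Ty). \]

Here $(1 + r^Ty)$ is positive, because $T_{k+1}$ is positive definite and

\[ \begin{bmatrix} I & E_ky \\ 0 & 1 \end{bmatrix}^T\begin{bmatrix} T_k & E_kr \\ r^TE_k & 1 \end{bmatrix}\begin{bmatrix} I & E_ky \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} T_k & 0 \\ 0 & 1 + r^Ty \end{bmatrix}. \]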

Algorithm 2.3.6 (Durbin Algorithm, 1960) Let $T_ky^{(k)} = -r^{(k)} = -(r_1, \ldots, r_k)^T$.

y^(1) = −r_1,
for k = 1, ..., n−1,
    β_k = 1 + r^(k)T y^(k),
    α_k = −(r_{k+1} + r^(k)T E_k y^(k)) / β_k,
    z^(k) = y^(k) + α_k E_k y^(k),
    y^(k+1) = [z^(k); α_k].

This algorithm requires $\frac{3}{2}n^2$ flops to generate $y = y^{(n)}$.

Further reduction:

\begin{align*}
\beta_k &= 1 + r^{(k)T}y^{(k)} \\
&= 1 + [r^{(k-1)T}, r_k]\begin{bmatrix} y^{(k-1)} + \alpha_{k-1}E_{k-1}y^{(k-1)} \\ \alpha_{k-1} \end{bmatrix} \\
&= 1 + r^{(k-1)T}y^{(k-1)} + \alpha_{k-1}(r^{(k-1)T}E_{k-1}y^{(k-1)} + r_k) \\
&= \beta_{k-1} + \alpha_{k-1}(-\beta_{k-1}\alpha_{k-1}) = (1 - \alpha_{k-1}^2)\beta_{k-1}.
\end{align*}
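The Durbin recursion can be sketched in a few lines. The function name `durbin` and the sample values of $r$ are illustrative assumptions; indices are 0-based, so `r[k]` plays the role of $r_{k+1}$, and reversing a list corresponds to multiplication by $E_k$.

```python
def durbin(r):
    """Solve T_n y = -(r_1, ..., r_n)^T, where T_n is the symmetric positive definite
    Toeplitz matrix with unit diagonal and first row (1, r_1, ..., r_{n-1})."""
    n = len(r)
    y = [-r[0]]
    for k in range(1, n):
        # on entry y solves the order-k system T_k y = -(r_1, ..., r_k)^T
        beta = 1.0 + sum(r[p] * y[p] for p in range(k))                      # 1 + r^T y
        alpha = -(r[k] + sum(r[p] * y[k - 1 - p] for p in range(k))) / beta  # uses r^T E_k y
        y = [y[p] + alpha * y[k - 1 - p] for p in range(k)] + [alpha]        # [z; alpha]
    return y

r_example = [0.5, 0.2, 0.1]          # hypothetical data; T_3 is positive definite here
y_example = durbin(r_example)
```

The sketch recomputes $\beta_k$ from its definition; using the reduction $\beta_k = (1 - \alpha_{k-1}^2)\beta_{k-1}$ instead would save one inner product per step.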

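• To (b): Suppose that

\[ T_kx = b = (b_1, \ldots, b_k)^T, \quad 1 \le k \le n, \tag{2.3.57} \]

has been solved. We want to solve

\[ \begin{bmatrix} T_k & E_kr \\ r^TE_k & 1 \end{bmatrix}\begin{bmatrix} \nu \\ \mu \end{bmatrix} = \begin{bmatrix} b \\ b_{k+1} \end{bmatrix}, \tag{2.3.58} \]

where $r = (r_1, \ldots, r_k)^T$. Since $\nu = T_k^{-1}(b - \mu E_kr) = x + \mu E_ky$, it follows that

\[ \mu = b_{k+1} - r^TE_k\nu = b_{k+1} - r^TE_kx - \mu r^Ty, \quad\text{i.e.,}\quad \mu = (b_{k+1} - r^TE_kx)/(1 + r^Ty). \]

We can effect the transition from (2.3.57) to (2.3.58) in $O(k)$ flops. We can solve the system $T_nx = b$ by solving

\[ T_kx^{(k)} = b^{(k)} = (b_1, \ldots, b_k)^T \quad\text{and}\quad T_ky^{(k)} = -r^{(k)} = -(r_1, \ldots, r_k)^T. \]

It needs $2n^2$ flops. See Algorithm Levinson (1947) in Matrix Computations, pp. 128-129 for details.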
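The two coupled recursions for $x^{(k)}$ and $y^{(k)}$ can be sketched as follows. This is an illustration built from the formulas above, not a transcription of the algorithm in Matrix Computations; the function name `levinson` and the sample data are assumptions, and indices are 0-based.

```python
def levinson(r, b):
    """Solve T_n x = b, where T_n is symmetric positive definite Toeplitz with unit
    diagonal and off-diagonal values r_1, ..., r_{n-1} (passed as the list r)."""
    n = len(b)
    x = [b[0]]
    y = [-r[0]] if n > 1 else []
    for k in range(1, n):
        # on entry: T_k x = (b_1, ..., b_k)^T and T_k y = -(r_1, ..., r_k)^T
        beta = 1.0 + sum(r[p] * y[p] for p in range(k))                     # 1 + r^T y
        mu = (b[k] - sum(r[p] * x[k - 1 - p] for p in range(k))) / beta     # uses r^T E_k x
        x = [x[p] + mu * y[k - 1 - p] for p in range(k)] + [mu]             # [x + mu E_k y; mu]
        if k < n - 1:
            # Durbin step: advance the Yule-Walker solution to order k+1
            alpha = -(r[k] + sum(r[p] * y[k - 1 - p] for p in range(k))) / beta
            y = [y[p] + alpha * y[k - 1 - p] for p in range(k)] + [alpha]
    return x

r_lev = [0.5, 0.2]                    # hypothetical data defining T_3
b_lev = [1.0, 2.0, 3.0]
x_lev = levinson(r_lev, b_lev)
```

The update of `x` must use the old `y`, which is why the Durbin step comes last in the loop body.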

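• To (c):

\[ T_n^{-1} = \begin{bmatrix} A & Er \\ r^TE & 1 \end{bmatrix}^{-1} = \begin{bmatrix} B & \nu \\ \nu^T & \gamma \end{bmatrix}, \]

where $A = T_{n-1}$, $E = E_{n-1}$ and $r = (r_1, \ldots, r_{n-1})^T$. From the equation

\[ \begin{bmatrix} A & Er \\ r^TE & 1 \end{bmatrix}\begin{bmatrix} \nu \\ \gamma \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \]

it follows that

\[ A\nu = -\gamma Er = -\gamma E(r_1, \ldots, r_{n-1})^T \quad\text{and}\quad \gamma = 1 - r^TE\nu. \]

If $y$ is the solution of the $(n-1)$-st Yule-Walker system $Ay = -r$, then

\[ \gamma = 1/(1 + r^Ty) \quad\text{and}\quad \nu = \gamma Ey. \]

Thus the last row and column of $T_n^{-1}$ are readily obtained. Since $AB + Er\nu^T = I_{n-1}$, we have

\[ B = A^{-1} - (A^{-1}Er)\nu^T = A^{-1} + \frac{\nu\nu^T}{\gamma}. \]

Since $A = T_{n-1}$ is nonsingular and Toeplitz, its inverse is persymmetric. Thus

\begin{align*}
b_{ij} &= (A^{-1})_{ij} + \frac{\nu_i\nu_j}{\gamma} = (A^{-1})_{n-j,n-i} + \frac{\nu_i\nu_j}{\gamma} \\
&= b_{n-j,n-i} - \frac{\nu_{n-i}\nu_{n-j}}{\gamma} + \frac{\nu_i\nu_j}{\gamma} \\
&= b_{n-j,n-i} + \frac{1}{\gamma}(\nu_i\nu_j - \nu_{n-i}\nu_{n-j}).
\end{align*}

It needs $\frac{7}{4}n^2$ flops. See Algorithm Trench (1964) in Matrix Computations, p. 132 for details.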

Banded Systems

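Definition 2.3.3 Let $A$ be an $n \times n$ matrix. $A$ is called a $(p, q)$-banded matrix if $a_{ij} = 0$ whenever $i - j > p$ or $j - i > q$. $A$ has the form

\[ A = \begin{bmatrix}
\times & \cdots & \times & & & O \\
\vdots & \ddots & & \ddots & & \\
\times & & \ddots & & \ddots & \\
& \ddots & & \ddots & & \times \\
& & \ddots & & \ddots & \vdots \\
O & & & \times & \cdots & \times
\end{bmatrix}, \]

with $p$ nonzero subdiagonals and $q$ nonzero superdiagonals, where $p$ and $q$ are the lower and upper bandwidths, respectively.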

Example 2.3.2 (1, 1): tridiagonal matrix; (1, n−1): upper Hessenberg matrix; (n−1, 1):

lower Hessenberg matrix.

Theorem 2.3.10 Let $A$ be a $(p, q)$-banded matrix. Suppose $A$ has an LR-factorization ($A = LR$). Then $L$ is a $(p, 0)$-banded matrix and $R$ is a $(0, q)$-banded matrix.

Algorithm 2.3.7 See Algorithm 4.3.1 in Matrix Computations, p. 150.

Theorem 2.3.11 Let $A$ be a $(p, q)$-banded nonsingular matrix. If Gaussian elimination with partial pivoting is used to compute Gaussian transformations $L_j = I - l_je_j^T$, for $j = 1, \ldots, n-1$, and permutations $P_1, \ldots, P_{n-1}$ such that

\[ L_{n-1}P_{n-1}\cdots L_1P_1A = R \]

is upper triangular, then $R$ is a $(0, p+q)$-banded matrix and $l_{ij} = 0$ whenever $i \le j$ or $i > j + p$. (Since the $j$-th column of $L$ is a permutation of the Gaussian vector $l_j$, it follows that $L$ has at most $p + 1$ nonzero elements per column.)

Symmetric Indefinite Systems

Consider the linear system $Ax = b$, where $A \in \mathbb{R}^{n\times n}$ is symmetric but indefinite. There is a method using $n^3/6$ flops, due to Aasen (1971), that computes the factorization $PAP^T = LTL^T$, where $L = [l_{ij}]$ is unit lower triangular, $P$ is a permutation chosen such that $|l_{ij}| \le 1$, and $T$ is tridiagonal.

Instead of the factorization $PAP^T = LTL^T$, one can compute

\[ PAP^T = LDL^T, \]

where $D$ is block diagonal with 1-by-1 and 2-by-2 blocks on the diagonal, $L = [l_{ij}]$ is unit lower triangular, and $P$ is a permutation chosen such that $|l_{ij}| \le 1$.

Bunch and Parlett (1971) have proposed a pivot strategy to do this, requiring $n^3/6$ flops. Unfortunately, the overall process requires between $n^3/12$ and $n^3/6$ comparisons. A better method described by Bunch and Kaufman (1977) requires $n^3/6$ flops and $O(n^2)$ comparisons.

For a detailed discussion of this subsection, see pp. 159-168 in Matrix Computations.