• 沒有找到結果。

# Numerical methods for solving linear systems

N/A
N/A
Protected

Share "Numerical methods for solving linear systems"

Copied!
22
0
0

(1)

## Numerical methods for solving linear systems

Let A ∈ Cn×n be a nonsingular matrix. We want to solve the linear system Ax = b by (a) Direct methods (finite steps); Iterative methods (convergence). (See Chapter 4)

### 2.1 Elementary matrices

Let X = Kn and x, y ∈ X. Then yx∈ K, xy =



x11 · · · x1n

... ... xn1 · · · xnn

. The eigenvalues

of xy are {0, · · · , 0, yx}, since rank(xy) = 1 by (xy)z = (yz)x and (xy)x = (yx)x.

Definition 2.1.1 A matrix of the form

I− αxy (α∈ K, x, y ∈ Kn) (2.1.1) is called an elementary matrix.

The eigenvalues of (I − αxy) are {1, 1, · · · , 1, 1 − αyx}. Compute

(I− αxy)(I − βxy) = I− (α + β − αβyx)xy. (2.1.2) If αyx− 1 6= 0 and letβ = αyαx−1, then α + β− αβyx = 0. We have

(I− αxy)−1 = (I − βxy), 1 α + 1

β = yx. (2.1.3)

Example 2.1.1 Let x∈ Kn, and xx = 1. Let H ={z : zx = 0} and Q = I− 2xx (Q = Q, Q−1 = Q).

Then Q reflects each vector with respect to the hyperplane H. Let y = αx + w, w ∈ H.

Then, we have

Qy = αQx + Qw =−αx + w − 2(xw)x =−αx + w.

(2)

24 Chapter 2. Numerical methods for solving linear systems Example 2.1.2 Let y = ei = the i-th column of unit matrix and x = li = [0,· · · , 0, li+1,i,· · · , ln,i]T. Then,

I + lieTi =







 1

. ..

1 li+1,i

... . ..

ln,i 1









(2.1.4)

Since eTi li = 0, we have

(I + lieTi )−1= (I − lieTi). (2.1.5) From the equality

(I + l1eT1)(I + l2eT2) = I + l1eT1 + l2eT2 + l1(eT1l2)eT2 = I + l1eT1 + l2eT2 follows that

(I + l1eT1)· · · (I + lieTi )· · · (I + ln−1eTn−1) = I + l1eT1 + l2eT2 +· · · + ln−1eTn−1

=



 1

l21 . .. 0 ... ... ...

ln1 · · · ln,n−1 1



. (2.1.6)

Theorem 2.1.1 A lower triangular with “1” on the diagonal can be written as the prod- uct of n− 1 elementary matrices of the form (2.1.4).

Remark 2.1.1 (I + l1eT1 + . . . + ln−1eTn−1)−1 = (l− ln−1eTn−1) . . . (I− l1eT1) which can not be simplified as in (2.1.6).

### 2.2 LR-factorization

Definition 2.2.1 Given A ∈ Cn×n, a lower triangular matrix L and an upper triangu- lar matrix R. If A = LR, then the product LR is called a LR-factorization (or LR- decomposition) of A.

Basic problem:

Given b 6= 0, b ∈ Kn. Find a vector l1 = [0, l21, . . . , ln1]T and c∈ K such that (I− l1eT1)b = ce1.

Solution:

 b1 = c,

bi− li1b1 = 0, i = 2, . . . , n.

 b1 = 0, it has no solution (since b6= 0), b1 6= 0, then c = b1, li1= bi/b1, i = 2, . . . , n.

(3)

Construction of LR-factorization:

Let A = A(0) = [a(0)1 | . . . | a(0)n ]. Apply basic problem to a(0)1 : If a(0)11 6= 0, then there exists L1 = I− l1eT1 such that (I − l1eT1)a(0)1 = a(0)11e1. Thus

A(1) = L1A(0) = [L1a(0)1 | . . . | L1a(0)n ] =





a(0)11 a(0)12 . . . a(0)1n 0 a(1)22 a(1)2n ... ... ...

0 a(1)n2 . . . a(1)nn



. (2.2.1)

The i-th step:

A(i) = LiA(i−1) = LiLi−1. . . L1A(0)

=













a(0)11 · · · a(0)1n 0 a(1)22 · · · a(1)2n

... 0 . .. ...

... ... a(i−1)ii · · · a(i−1)in

... ... 0 a(i)i+1,i+1 · · · a(i)i+1,n

... ... ... ... ...

0 0 · · · a(i)n,i+1 · · · a(i)nn













(2.2.2)

If a(i−1)ii 6= 0, for i = 1, . . . , n − 1, then the method is executable and we have that

A(n−1) = Ln−1. . . L1A(0) = R (2.2.3)

is an upper triangular matrix. Thus, A = LR. Explicit representation of L:

Li = I − lieTi , L−1i = I + lieTi

L = L−11 . . . L−1n−1 = (I + l1eT1) . . . (I + ln−1eTn−1)

= I + l1eT1 + . . . + ln−1eTn−1 (by (2.1.6)).

Theorem 2.2.1 Let A be nonsingular. Then A has an LR-factorization (A=LR) if and only if ki := det(Ai)6= 0, where Ai is the leading principal matrix of A, i.e.,

Ai =



a11 . . . a1i

... ...

ai1 . . . aii

 ,

for i = 1, . . . , n− 1.

Proof: (Necessity “⇒” ): Since A = LR, we have



a11 . . . a1i

... ... ai1 . . . aii

 =

 l11

... . .. O li1 . . . lii





r11 r1i

O . ..

rii

 .

(4)

26 Chapter 2. Numerical methods for solving linear systems From det(A) 6= 0 follows that det(L) 6= 0 and det(R) 6= 0. Thus, ljj 6= 0 and rjj 6= 0, for j = 1, . . . , n. Hence ki = l11. . . liir11. . . rii6= 0.

(Sufficiency “⇐”): From (2.2.2) we have

A(0) = (L−11 . . . L−1i )A(i).

Consider the (i + 1)-th leading principle determinant. From (2.2.3) we have



a11 . . . ai,i+1

... ...

ai+1 . . . ai+1,i+1



=







1 0

l21 . ..

... . .. ...

... . .. ...

li+1,1 · · · li+1,i 1













a(0)11 a(0)12 · · · ∗ a(1)22 · · · ...

. .. ...

a(i−1)ii a(i−1)i,i+1

0 a(i)i+1,i+1







 .

Thus, ki = 1· a(0)11a(1)22 . . . a(i)i+1,i+1 6= 0 which implies a(i)i+1,i+1 6= 0. Therefore, the LR- factorization of A exists.

Theorem 2.2.2 If a nonsingular matrix A has an LR-factorization with A = LR and l11 =· · · = lnn = 1, then the factorization is unique.

Proof: Let A = L1R1 = L2R2. Then L−12 L1 = R2R−11 = I.

Corollary 2.2.1 If a nonsingular matrix A has an LR-factorization with A = LDR, where D is diagonal, L and RT are unit lower triangular (with one on the diagonal) if and only if ki 6= 0.

Theorem 2.2.3 Let A be a nonsingular matrix. Then there exists a permutation P , such that P A has an LR-factorization.

(Proof ): By construction! Consider (2.2.2): There is a permutation Pi, which inter- changes the i-th row with a row of index large than i, such that 06= a(i−1)ii (∈ PiA(i−1)).

This procedure is executable, for i = 1, . . . , n− 1. So we have

Ln−1Pn−1. . . LiPi. . . L1P1A(0) = R. (2.2.4) Let P be a permutation which affects only elements i + 1,· · · , n. It holds

P (I − lieTi )P−1= I − (P li)eTi = I − ˜lieTi = ˜Li, (eTi P−1 = eTi ) where ˜Li is lower triangular. Hence we have

P Li = ˜LiP. (2.2.5)

Now write all Pi in (2.2.4) to the right as

Ln−1n−2. . . ˜L1Pn−1. . . P1A(0) = R.

Then we have P A = LR with L−1 = Ln−1n−2· · · ˜L1 and P = Pn−1· · · P1.

(5)

### 2.3.1 Practical implementation

Given a linear system

Ax = b (2.3.1)

with A nonsingular. We first assume that A has an LR-factorization. i.e., A = LR. Thus LRx = b.

We then (i) solve Ly = b; (ii) solve Rx = y. These imply that LRx = Ly = b. From (2.2.4), we have

Ln−1. . . L2L1(A| b) = (R | L−1b).

Algorithm 2.3.1 (without permutation) For k = 1, . . . , n− 1,

if akk = 0 then stop (∗);

else ωj := akj (j = k + 1, . . . , n);

for i = k + 1, . . . , n,

η := aik/akk, aik := η;

for j = k + 1, . . . , n,

aij := aij− ηωj, bj := bj − ηbk. For x: (back substitution!)

xn= bn/ann;

for i = n− 1, n − 2, . . . , 1, xi = (bi−Pn

j=i+1aijxj)/aii.

Cost of computation (one multiplication + one addition ≡ one flop):

(i) LR-factorization: n3/3− n/3 flops;

(ii) Computation of y: n(n− 1)/2 flops;

(iii) Computation of x: n(n + 1)/2 flops.

For A−1: 4/3n3 ≈ n3/3 + kn2 (k = n linear systems).

Pivoting: (a) Partial pivoting; (b) Complete pivoting.

From (2.2.2), we have

A(k−1) =











a(0)11 · · · a(0)1n

0 . .. ...

... a(k−2)k−1,k−1 · · · a(k−2)k−1,n ... 0 a(k−1)kk · · · a(k−1)kn

... ... ... ...

0 . . . 0 a(k−1)nk · · · a(k−1)nn









 .

(6)

28 Chapter 2. Numerical methods for solving linear systems

For (a): 

Find a p∈ {k, . . . , n} such that

|apk| = maxk≤i≤n|aik| (rk = p)

swap akj, bk and apj, bp respectively, (j = 1, . . . , n).

(2.3.2) Replacing (∗) in Algorithm 2.3.1 by (2.3.2), we have a new factorization of A with partial pivoting, i.e., P A = LR (by Theorem 2.2.1) and |lij| ≤ 1 for i, j = 1, . . . , n. For solving linear system Ax = b, we use

P Ax = P b⇒ L(Rx) = PTb≡ ˜b.

It needs extra n(n− 1)/2 comparisons.

For (b):









Find p, q∈ {k, . . . , n} such that

|apq| ≤ max

k≤i,j≤n|aij|, (rk := p, ck:= q)

swap akj, bk and apj, bp respectivly, (j = k, . . . , n), swap aik and aiq(i = 1, . . . , n).

(2.3.3)

Replacing (∗) in Algorithm 2.3.1 by (2.3.3), we also have a new factorization of A with complete pivoting, i.e., P AΠ = LR (by Theorem 2.2.1) and |lij| ≤ 1, for i, j = 1, . . . , n.

For solving linear system Ax = b, we use

P AΠ(ΠTx) = P b⇒ LR˜x = ˜b ⇒ x = Π˜x.

It needs n3/3 comparisons.

Example 2.3.1 Let A =

 10−4 1

1 1



be in three decimal-digit floating point arithmetic.

Then κ(A) =kAkkA−1k≈ 4. A is well-conditioned.

• Without pivoting:

L =

 1 0

f l(1/10−4) 1



, f l(1/10−4) = 104, R =

 10−4 1

0 f l(1− 104· 1)



, f l(1− 104· 1) = −104. LR =

 1 0

104 1

  10−4 1 0 −104



=

 10−4 1

1 0

 6=

 10−4 1

1 1



= A.

Here a22 entirely “lost” from computation. It is numerically unstable. Let Ax =

 1 2

 . Then x≈

 1 1



. But Ly =

 1 2



solves y1 = 1 and y2 = f l(2− 104· 1) = −104, Rˆx = y solves ˆx2 = f l((−104)/(−104)) = 1, ˆx1 = f l((1− 1)/10−4) = 0. We have an erroneous solution with cond(L), cond(R)≈ 108.

• Partial pivoting:

L =

 1 0

f l(10−4/1) 1



=

 1 0

10−4 1

 , R =

 1 1

0 f l(1− 10−4)



=

 1 1 0 1

 . L and R are both well-conditioned.

(7)

T

### -factorizations

Let A = LDR as in Corollary 2.2.1.

Algorithm 2.3.2 (Crout’s factorization or compact method) For k = 1, . . . , n,

for p = 1, 2, . . . , k− 1, rp := dpapk, ωp := akpdp, dk:= akk−Pk−1

p=1akprp, if dk = 0, then stop; else

for i = k + 1, . . . , n, aik := (aik−Pk−1

p=1aiprp)/dk, aki := (aki−Pk−1

p=1ωpapi)/dk. Cost: n3/3 flops.

• With partial pivoting: see Wilkinson EVP pp.225-.

• Advantage: One can use double precision for inner product.

Theorem 2.3.1 If A is nonsingular, real and symmetric, then A has a unique LDLT- factorization, where D is diagonal and L is a unit lower triangular matrix (with one on the diagonal).

Proof: A = LDR = AT = RTDLT. It implies L = RT.

Theorem 2.3.2 If A is symmetric and positive definite, then there exists a lower trian- gular G∈ Rn×n with positive diagonal elements such that A = GGT.

Proof: A is symmetric positive definite ⇔ xTAx ≥ 0, for all nonzero vector x ∈ Rn×n

⇔ ki ≥ 0, for i = 1, · · · , n, ⇔ all eigenvalues of A are positive.

From Corollary 2.2.1 and Theorem 2.3.1 we have A = LDLT. From L−1AL−T = D follows that dk = (eTkL−1)A(L−Tek) > 0. Thus, G = Ldiag{d1/21 ,· · · , d1/2n } is real, and then A = GGT.

Algorithm 2.3.3 (Cholesky factorization) Let A be symmetric positive definite. To find a lower triangular matrix G such that A = GGT.

For k = 1, 2, . . . , n, akk := (akk−Pk−1

p=1a2kp)1/2; for i = k + 1, . . . , n,

aik = (aik−Pk−1

p=1aipakp)/akk. Cost: n3/6 flops.

Remark 2.3.1 For solving symmetric, indefinite systems: See Golub/ Van Loan Matrix Computation pp. 159-168. P APT = LDLT, D is 1× 1 or 2 × 2 block-diagonal matrix, P is a permutation and L is lower triangular with one on the diagonal.

(8)

30 Chapter 2. Numerical methods for solving linear systems

### 2.3.3 Error estimation for linear systems

Consider the linear system

Ax = b, (2.3.4)

and the perturbed linear system

(A + δA)(x + δx) = b + δb, (2.3.5)

where δA and δb are errors of measure or round-off in factorization.

Definition 2.3.1 Letk k be an operator norm and A be nonsingular. Then κ ≡ κ(A) = kAkkA−1k is a condition number of A corresponding to k k.

Theorem 2.3.3 (Forward error bound) Let x be the solution of the (2.3.4) and x+δx be the solution of the perturbed linear system (2.3.5). If kδAkkA−1k < 1, then

kδxk

kxk ≤ κ

1− κkδAkkAk

kδAk

kAk + kδbk kbk



. (2.3.6)

Proof: From (2.3.5) we have

(A + δA)δx + Ax + δAx = b + δb.

Thus,

δx =−(A + δA)−1[(δA)x− δb]. (2.3.7) Here, Corollary 2.7 implies that (A + δA)−1 exists. Now,

k(A + δA)−1k = k(I + A−1δA)−1A−1k ≤ kA−1k 1

1− kA−1kkδAk. On the other hand, b = Ax implies kbk ≤ kAkkxk. So,

1

kxk ≤ kAk

kbk. (2.3.8)

From (2.3.7) follows that kδxk ≤ 1−kAkA−1−1kkδAkk (kδAkkxk + kδbk). By using (2.3.8), the inequality (2.3.6) is proved.

Remark 2.3.2 If κ(A) is large, then A (for the linear system Ax = b) is called ill- conditioned, else well-conditioned.

### 2.3.4 Error analysis for Gaussian algorithm

A computer in characterized by four integers: (a) the machine base β; (b) the precision t; (c) the underflow limit L; (d) the overflow limit U . Define the set of floating point numbers.

F = {f = ±0.d1d2· · · dt× βe | 0 ≤ di < β, d1 6= 0, L ≤ e ≤ U} ∪ {0}. (2.3.9)

(9)

Let G ={x ∈ R | m ≤ |x| ≤ M} ∪ {0}, where m = βL−1 and M = βU(1− β−t) are the minimal and maximal numbers of F \ {0} in absolute value, respectively. We define an operator f l : G→ F by

f l(x) = the nearest c∈ F to x by rounding arithmetic.

One can show that f l satisfies

f l(x) = x(1 + ε), |ε| ≤ eps, (2.3.10) where eps = 12β1−t. (If β = 2, then eps = 2−t). It follows that

f l(a◦ b) = (a ◦ b)(1 + ε) or

f l(a◦ b) = (a ◦ b)/(1 + ε), where |ε| ≤ eps and ◦ = +, −, ×, /.

Algorithm 2.3.4 Given x, y ∈ Rn. The following algorithm computes xTy and stores the result in s.

s = 0,

for k = 1, . . . , n, s = s + xkyk.

Theorem 2.3.4 If n2−t ≤ 0.01, then f l(

Xn k=1

xkyk) = Xn

k=1

xkyk[1 + 1.01(n + 2− k)θk2−t], |θk| ≤ 1

Proof: Let sp = f l(Pp

k=1xkyk) be the partial sum in Algorithm 2.3.4. Then s1 = x1y1(1 + δ1)

with |δ1| ≤ eps and for p = 2, . . . , n,

sp = f l[sp−1+ f l(xpyp)] = [sp−1+ xpyp(1 + δp)](1 + εp) with |δp|, |εp| ≤ eps. Therefore

f l(xTy) = sn= Xn k=1

xkyk(1 + γk),

where (1 + γk) = (1 + δk)Qn

j=k(1 + εj), and ε1 ≡ 0. Thus, f l(

Xn k=1

xkyk) = Xn k=1

xkyk[1 + 1.01(n + 2− k)θk2−t]. (2.3.11)

The result follows immediately from the following useful Lemma.

(10)

32 Chapter 2. Numerical methods for solving linear systems Lemma 2.3.5 If (1 + α) =Qn

k=1(1 + αk), where |αk| ≤ 2−t and n2−t ≤ 0.01, then Yn

k=1

(1 + αk) = 1 + 1.01nθ2−t with |θ| ≤ 1.

Proof: From assumption it is easily seen that

(1− 2−t)n≤ Yn k=1

(1 + αk)≤ (1 + 2−t)n. (2.3.12)

Expanding the Taylor expression of (1− x)n as −1 < x < 1, we get (1− x)n = 1− nx + n(n− 1)

2 (1− θx)n−2x2 ≥ 1 − nx.

Hence

(1− 2−t)n ≥ 1 − n2−t. (2.3.13) Now, we estimate the upper bound of (1 + 2−t)n:

ex = 1 + x +x2 2! +x3

3! +· · · = 1 + x +x

2x(1 + x 3 +2x2

4! +· · · ).

If 0≤ x ≤ 0.01, then

1 + x≤ ex ≤ 1 + x + 0.01x1

2ex ≤ 1 + 1.01x (2.3.14) (Here, we use the fact e0.01 < 2 to the last inequality.) Let x = 2−t. Then the left inequality of (2.3.14) implies

(1 + 2−t)n≤ e2−tn (2.3.15)

Let x = 2−tn. Then the second inequality of (2.3.14) implies

e2−tn ≤ 1 + 1.01n2−t (2.3.16)

From (2.3.15) and (2.3.16) we have

(1 + 2−t)n≤ 1 + 1.01n2−t.

Let the exact LR-factorization of A be L and R (A = LR) and let ˜L, ˜R be the LR-factorization of A by using Gaussian Algorithm (without pivoting). There are two possibilities:

(i) Forward error analysis: Estimate |L − ˜L| and |R − ˜R|.

(ii) Backward error analysis: Let ˜L ˜R be the exact LR-factorization of a perturbed matrix ˜A = A + F . Then F will be estimated, i.e., |F | ≤ ?.

(11)

### 2.3.5 Apriori error estimate for backward error bound of LR- ` factorization

From (2.2.2) we have

A(k+1)= LkA(k),

for k = 1, 2, . . . , n− 1 (A(1) = A). Denote the entries of A(k) by a(k)ij and let lik = f l(a(k)ik /a(k)kk), i≥ k + 1. From (2.2.2) we know that

a(k+1)ij =





0; for i≥ k + 1, j = k

f l(a(k)ij − fl(lika(k)kj )); for i≥ k + 1, j ≥ k + 1 a(k)ij ; otherwise.

(2.3.17)

From (2.3.10) we have lik = (a(k)ik /a(k)kk)(1 + δik) with|δik| ≤ 2−t. Then

a(k)ik − lika(k)kk + a(k)ij δik = 0, for i≥ k + 1. (2.3.18) Let a(k)ik δik ≡ ε(k)ik . From (2.3.10) we also have

a(k+1)ij = f l(a(k)ij − fl(lika(k)kj)) (2.3.19)

= (a(k)ij − (lika(k)kj (1 + δij)))/(1 + δij0 ) with |δij|, |δij0 | ≤ 2−t. Then

a(k+1)ij = a(k)ij − lika(k)kj − lika(k)kjδij+ a(k+1)ij δij0 , for i, j ≥ k + 1. (2.3.20) Let ε(k)ij ≡ −lika(k)kjδij+ a(k+1)ij δij0 which is the computational error of a(k)ij in A(k+1). From (2.3.17), (2.3.18) and (2.3.20) we obtain

a(k+1)ij =





a(k)ij − lika(k)kk + ε(k)ij ; for i≥ k + 1, j = k a(k)ij − lika(k)kj + ε(k)ij ; for i≥ k + 1, j ≥ k + 1 a(k)ij + ε(k)ij ; otherwise,

(2.3.21)

where

ε(k)ij =





a(k)ij δij; for i≥ k + 1, j = k,

−lika(k)kj δij− a(k+1)ij δij0 ; for i≥ k + 1, j ≥ k + 1

0; otherwise.

(2.3.22)

Let E(k) be the error matrix with entries ε(k)ij . Then (2.3.21) can be written as

A(k+1) = A(k)− MkA(k)+ E(k), (2.3.23) where

Mk =







 0

. ..

0 lk+1,k

... . ..

ln,k 0









(2.3.24)

(12)

34 Chapter 2. Numerical methods for solving linear systems For k = 1, 2 . . . , n− 1, we add the n − 1 equations in (2.3.23) together and get

M1A(1) + M2A(2)+· · · + Mn−1A(n−1)+ A(n)

= A(1)+ E(1)+· · · + E(n−1).

From (2.3.17) we know that the k-th row of A(k)is equal to the k-th row of A(k+1),· · · , A(n), respectively and from (2.3.24) we also have

MkA(k) = MkA(n) = MkR.˜ Thus,

(M1 + M2+· · · + Mn−1+ I) ˜R = A(1)+ E(1)+· · · + E(n−1). Then

L ˜˜R = A + E, (2.3.25)

where

L =˜



 1

l21 1 O

... . ..

ln1 . . . ln,n−1 1



 and E = E(1)+· · · + E(n−1). (2.3.26)

Now we assume that the partial pivotings in Gaussian Elimination are already ar- ranged such that pivot element a(k)kk has the maximal absolute value. So, we have|lik| ≤ 1.

Let

ρ = max

i,j,k |a(k)ij |/kAk. (2.3.27)

Then

|a(k)ij | ≤ ρkAk. (2.3.28)

From (2.3.22) and (2.3.28) follows that

(k)ij | ≤ ρkAk



2−t; for i≥ k + 1, j = k, 21−t; for i≥ k + 1, j ≥ k + 1,

0; otherwise.

(2.3.29)

Therefore,

|E(k)| ≤ ρkAk2−t·





0 0 0 · · · 0 0 1 2 · · · 2 ... ... ... ... 0 1 2 · · · 2



. (2.3.30)

From (2.3.26) we get

|E| ≤ ρkAk· 2−t









0 0 0 · · · 0 0

1 2 2 · · · 2 2

1 3 4 · · · 4 4

... ... ... ... ... 1 3 5 · · · 2n − 4 2n − 4 1 3 5 · · · 2n − 3 2n − 2









(2.3.31)

Hence we have the following theorem.

(13)

Theorem 2.3.6 The LR-factorization ˜L and ˜R of A using Gaussian Elimination with partial pivoting satisfies

L ˜˜R = A + E, where

kEk ≤ n2ρkAk2−t (2.3.32)

Proof:

kEk ≤ ρkAk2−t( Xn

j=1

(2j− 1) − 1) < n2ρkAk2−t.

Now we shall solve the linear system Ax = b by using the factorization ˜L and ˜R, i.e., Ly = b and ˜˜ Rx = y.

• For Ly = b: From Algorithm 2.3.1 we have y1 = f l(b1/l11),

yi = f l

−li1y1− li2y2− · · · − li,i−1yi−1+ bi

lii



, (2.3.33)

for i = 2, 3, . . . , n. From (2.3.10) we have













y1 = b1/l11(1 + δ11), with |δ11| ≤ 2−t yi = f l(f l(−li1y1−li2ly2−···−li,i−1yi−1)+bi

ii(1+δii) )

= f l(−li1y1−ll i2y2−···−li,i−1yi−1)+bi

ii(1+δii)(1+δ0ii) , with|δii|, |δii0| ≤ 2−t.

(2.3.34)

Applying Theorem 2.3.4 we get

f l(−li1y1− li2y2− · · · − li,i−1yi−1) =−li1(1 + δi1)y1− · · · − li,i−1(1 + δi,i−1)yi−1, where

i1| ≤ (i − 1)1.01 · 2−t; for i = 2, 3,· · · , n,

ij| ≤ (i + 1 − j)1.01 · 2−t; for

 i = 2, 3,· · · , n,

j = 2, 3,· · · , i − 1. (2.3.35) So, (2.3.34) can be written as



l11(1 + δ11)y1 = b1,

li1(1 + δi1)y1+· · · + li,i−1(1 + δi,i−1)yi−1+ lii(1 + δii)(1 + δii0)yi = bi, for i = 2, 3,· · · , n.

(2.3.36)

or

(L + δL)y = b. (2.3.37)

(14)

36 Chapter 2. Numerical methods for solving linear systems From (2.3.35) (2.3.36) and (2.3.37) follow that

|δL| ≤ 1.01 · 2−t









|l11| 0

|l21| 2|l22|

2|l31| 2|l32| 2|l33|

3|l41| 3|l42| 2|l43| . ..

... ... ... . .. . ..

(n− 1)|ln1| (n − 1)|ln2| (n − 2)|ln3| · · · 2|ln,n−1| 2|lnn|







 .

(2.3.38) This implies,

kδLk ≤ n(n + 1)

2 · 1.01 · 2−tmax

i,j |lij| ≤ n(n + 1)

2 · 1.01 · 2−t. (2.3.39) Theorem 2.3.7 For lower triangular linear system Ly = b, if y is the exact solution of (L + δL)y = b, then δL satisfies (2.3.38) and (2.3.39).

Applying Theorem 2.3.7 to the linear system ˜Ly = b and ˜Rx = y, respectively, the solution x satisfies

( ˜L + δ ˜L)( ˜R + δ ˜R)x = b or

( ˜L ˜R + (δ ˜L) ˜R + ˜L(δ ˜R) + (δ ˜L)(δ ˜R))x = b. (2.3.40) Since ˜L ˜R = A + E, substituting this equation into (2.3.40) we get

[A + E + (δ ˜L) ˜R + ˜L(δ ˜R) + (δ ˜L)(δ ˜R)]x = b. (2.3.41) The entries of ˜L and ˜R satisfy

|elij| ≤ 1, and |erij| ≤ ρkAk.

Therefore, we get 





















k ˜Lk ≤ n,

k ˜Rk≤ nρkAk,

kδ ˜Lkn(n+1)2 1.01· 2−t, kδ ˜Rkn(n+1)2 1.01ρ2−t.

(2.3.42)

In practical implementation we usually have n22−t << 1. So it holds kδ ˜Lkkδ ˜Rk≤ n2ρkAk2−t.

Let

δA = E + (δ ˜L) ˜R + ˜L(δ ˜R) + (δ ˜L)(δ ˜R). (2.3.43) Then, (2.3.32) and (2.3.42) we get

kδAk ≤ kEk+kδ ˜Lkk ˜Rk+k ˜Lkkδ ˜Rk+kδ ˜Lkkδ ˜Rk

≤ 1.01(n3+ 3n2)ρkAk2−t (2.3.44)

(15)

Theorem 2.3.8 For a linear system Ax = b the solution x computed by Gaussian Elim- ination with partial pivoting is the exact solution of the equation (A + δA)x = b and δA satisfies (2.3.43) and (2.3.44).

Remark 2.3.3 The quantity ρ defined by (2.3.27) is called a growth factor. The growth factor measures how large the numbers become during the process of elimination. In practice, ρ is usually of order 10 for partial pivot selection. But it can be as large as ρ = 2n−1, when

A =









1 0 · · · 0 1

−1 1 0 · · · 0 1

... −1 . .. ... ... 1 ... ... . .. ... 0 1

−1 −1 · · · −1 1 1

−1 −1 · · · −1 1







 .

Better estimates hold for special types of matrices. For example in the case of upper Hessenberg matrices, that is, matrices of the form

A =





× · · · ×

× . .. ... ...

. .. ... ...

0 × ×





the bound ρ≤ (n − 1) can be shown. (Hessenberg matrices arise in eigenvalus problems.) For tridiagonal matrices

A =







α1 β2 0

γ2 . .. ...

. .. ... ...

. .. ... βn

0 γn αn







it can even be shown that ρ ≤ 2 holds for partial pivot selection. Hence, Gaussian elimination is quite numerically stable in this case.

For complete pivot selection, Wilkinson (1965) has shown that

|akij| ≤ f(k) max

i,j |aij| with the function

f (k) := k12[21312 413 · · · k(k−1)1 ]12. This function grows relatively slowly with k:

k 10 20 50 100

f (k) 19 67 530 3300

(16)

38 Chapter 2. Numerical methods for solving linear systems Even this estimate is too pessimistic in practice. Up until now, no matrix has been found which fails to satisfy

|a(k)ij | ≤ (k + 1) max

i,j |aij| k = 1, 2, ..., n − 1,

when complete pivot selection is used. This indicates that Gaussian elimination with complete pivot selection is usually a stable process. Despite this, partial pivot selection is preferred in practice, for the most part, because:

(i) Complete pivot selection is more costly than partial pivot selection. (To compute A(i), the maximum from among (n− i + 1)2 elements must be determined instead of n− i + 1 elements as in partial pivot selection.)

(ii) Special structures in a matrix, i.e. the band structure of a tridiagonal matrix, are destroyed in complete pivot selection.

### 2.3.6 Improving and Estimating Accuracy

• Iterarive Improvement:

Suppose that the linear system Ax = b has been solved via the LR-factorization P A = LR. Now we want to improve the accuracy of the computed solution x. We compute



r = b− Ax, Ly = P r, Rz = y, xnew = x + z.

(2.3.45)

Then in exact arithmatic we have

Axnew = A(x + z) = (b− r) + Az = b.

Unfortunately, r = f l(b− Ax) renders an xnew that is no more accurate than x. It is necessary to compute the residual b− Ax with extended precision floating arithmetic.

Algorithm 2.3.5

Compute P A = LR (t-digit) Repeat: r := b− Ax (2t-digit)

Solve Ly = P r for y (t-digit) Solve Rz = y for z (t-digit) Update x = x + z (t-digit)

This is referred to as an iterative improvement. From (2.3.45) we have

ri = bi− ai1x1− ai2x2− · · · − ainxn. (2.3.46) Now, ri can be roughly estimated by 2−tmaxj|aij| |xj|. That is

krk ≈ 2−tkAkkxk. (2.3.47)

(17)

Let e = x− A−1b = A−1(Ax− b) = −A−1r. Then we have

kek ≤ kA−1kkrk. (2.3.48)

From (2.3.47) follows that

kek ≈ kA−1k · 2−tkAkkxk = 2−tcond(A)kxk.

Let

cond(A) = 2p, 0 < p < t, (p is integer). (2.3.49) Then we have

kek/kxk ≈ 2−(t−p). (2.3.50)

From (2.3.50) we know that x has q = t− p correct significant digits. Since r is computed by double precision, so we can assume that it has at least t correct significant digits.

Therefore for solving Az = r according to (2.3.50) the solution z (comparing with −e = A−1r) has q-digits accuracy so that xnew = x + z has usually 2q-digits accuracy. From above discussion, the accuracy of xnew is improved about q-digits after one iteration.

Hence we stop the iteration, when the number of the iterates k (say!) satifies kq ≥ t.

From above we have

kzk/kxk ≈ kek/kxk ≈ 2−q = 2−t2p. (2.3.51) From (2.3.49) and (2.3.51) we have

cond(A) = 2t· (kzk/kxk).

By (2.3.51) we get

q = log2(kxk

kzk) and k = t log2(kxkkzk).

In the following we shall give a further discussion of convergence of the iterative improve- ment. From Theorem 2.3.8 we know that z in Algorithm 5.5 is computed by (A+δA)z = r.

That is

A(I + F )z = r, (2.3.52)

where F = A−1δA.

Theorem 2.3.9 Let the sequence of vectors {xv} be the sequence of improved solutions in Algorithm 5.5 for solving Ax = b and x = A−1b be the exact solution. Assume that Fk in (2.3.52) satisfies kFkk ≤ σ < 1/2 for all k. Then {xk} converges to x, i.e., limv→∞kxk− xk = 0.

Proof: From (2.3.52) and rk= b− Axk we have

A(I + Fk)zk= b− Axk. (2.3.53) Since A is nonsingular, multiplying both sides of (2.3.53) by A−1 we get

(I + Fk)zk = x− xk.

(18)

40 Chapter 2. Numerical methods for solving linear systems From xk+1 = xk+ zk we have (I + Fk)(xk+1− xk) = x− xk, i.e.,

(I + Fk)xk+1 = Fkxk+ x. (2.3.54) Subtracting both sides of (2.3.54) from (I + Fk)x we get

(I + Fk)(xk+1− x) = Fk(xk− x).

Applying Corollary 1.2.1 we have

xk+1− x = (I + Fk)−1Fk(xk− x).

Hence,

kxk+1− xk ≤ kFkkkxk− xk

1− kFkk ≤ σ

1− σkxk− xk.

Let τ = σ/(1− σ). Then

kxk− xk ≤ τk−1kx1− xk.

But σ < 1/2 follows τ < 1. This implies convergence of Algorithm 2.3.5.

Corollary 2.3.1 If

1.01(n3+ 3n2)ρ2−tkAk kA−1k < 1/2, then Algorithm 2.3.5 converges.

Proof: From (2.3.52) and (2.3.44) follows that

kFkk ≤ 1.01(n3+ 3n2)ρ2−tcond(A) < 1/2.

### 2.3.7 Special Linear Systems

Toeplitz Systems

Definition 2.3.2 (i) T ∈ Rn×n is called a Toeplitz matrix if there exists r−n+1,· · · , r0,· · · , rn−1

such that aij = rj−i for all i, j. e.g.,

T =



r0 r1 r2 r3 r−1 r0 r1 r2

r−2 r−1 r0 r1

r−3 r−2 r−1 r0



 , (n = 4).

(ii) B ∈ Rn×nis called a Persymmetric matrix if it is symmetric about northest-southwest diagonal, i.e., bij= bn−j+1,n−i+1 for all i, j. That is,

B = EBTE, where E = [en,· · · e1].

(19)

Given scalars r1,· · · , rn−1 such that the matrices

Tk =





1 r1 r2 · · · rk−1

r1 1 r1 ...

... . ..

rk−1 · · · 1





are all positive definite, for k = 1, . . . , n. Three algorithms will be described:

(a) Durbin’s Algorithm for the Yule-Walker problem Tny =−(r1, . . . , rn)T.

(b) Levinson’s Algorithm for the general right hand side Tnx = b.

(c) Trench’s Algorithm for computing B = Tn−1.

• To (a): Let Ek = [e(k)k , . . . , e(k)1 ]. Suppose the k-th order Yule-Walker system Tky =−(r1, . . . , rk)T =−rT

has been solved. Consider the (k + 1)-st order system

 Tk Ekr rTEk 1

  z α



=

 −r

−rk+1



can be solved in O(k) flops. Observe that

z = Tk−1(−r − αEkr) = y− αTk−1Ekr (2.3.55) and

α =−rk+1 =−rTEkz. (2.3.56)

Since Tk−1 is persymmetric, Tk−1Ek= EkTk−1and z = y +αEky. Substituting into (2.3.56) we get

α =−rk+1− rTEk(y + αEky) =−(rk+1+ rTEky)/(1 + rTy).

Here (1 + rTy) is positive, because Tk+1 is positive definite and

 I Eky

0 1

T 

Tk Ekr rTEk 1

  I Eky

0 1



=

 Tk 0 0 1 + rTy

 .

Algorithm 2.3.6 (Durbin Algorithm, 1960) Let Tky(k)=−r(k) =−(r1, . . . , rk)T. For k = 1, . . . , n,

y(1) =−r1,

for k = 1, . . . , n− 1, βk = 1 + r(k)Ty(k),

αk=−(rk+1+ r(k)TEky(k))/βk, z(k) = y(k)+ αkEky(k),

y(k+1) =

 z(k) αk

 .

(20)

42 Chapter 2. Numerical methods for solving linear systems This algorithm requires 32n2 flops to generate y = y(n).

Further reduction:

βk = 1 + r(k)Ty(k)

= 1 + [r(k−1)T, r(k)]

 y(k−1)+ αk−1Ek−1y(k−1) αk−1



= 1 + r(k−1)Ty(k−1)+ αk−1(r(k−1)TEk−1y(k−1)+ rk)

= βk−1+ αk−1(−βk−1αk−1) = (1− α2k−1k−1.

• To (b):

Tkx = b = (b1,· · · , bk)T, for 1≤ k ≤ n. (2.3.57) Want to solve

 Tk Ekr rTEk 1

  ν µ

  b bk+1



, (2.3.58)

where r = (r1,· · · , rk)T. Since ν = Tk−1(b− µEkr) = x + µEky, it follows that µ = bk+1− rTEkν = bk+1− rTEkx− µrTy

= (bk+1− rTEkx)/(1 + rTy).

We can effect the transition form (2.3.57) to (2.3.58) in O(k) flops. We can solve the system Tnx = b by solving

Tkx(k) = b(k)= (b1, . . . , bk)T and

Tky(k)=−r(k) =−(r1, . . . , rk)T.

It needs 2n2 flops. See Algorithm Levinson (1947) in Matrix Computations, pp.128-129 for details.

• To (c):

Tn−1 =

 A Er

rTE 1

−1

=

 B ν νT γ

 ,

where A = Tn−1, E = En−1 and r = (r1, . . . , rn−1)T. From the equation

 A Er

rTE 1



=

 ν γ



=

 0 1



follows that

Aν =−γEr = −γE(r1, . . . , rn−1)T and γ = 1− rTEν.

If y is the solution of (n− 1)-st Yule-Walker system Ay = −r, then γ = 1/(1 + rTy) and ν = γEy.

Thus the last row and column of Tn−1 are readily obtained. Since AB + ErνT = In−1, we have

B = A−1− (A−1Er)νT = A−1 +ννT γ .

(21)

Since A = Tn−1 is nonsingular and Toeplitz, its inverse is persymmetric. Thus bij = (A−1)ij+ νiνj

γ = (A−1)n−j,n−iiνj

γ

= bn−j,n−i− νn−iνn−j

γ +νiνj

γ

= bn−j,n−i− 1

γ(νiνj− νn−iνn−j).

It needs 74n2 flops. See Algorithm Trench (1964) in Matrix Computations, pp.132 for details.

Banded Systems

Definition 2.3.3 Let A be a n× n matrix. A is called a (p, q)-banded matrix, if aij= 0 whenever i− j > p or j − i > q. A has the form

A =







× · · · ... ...

× . ..

O

× O

. ..

. .. ×

. .. ...

× · · · ×







| {z }

p

>

q

⊥ ,

where p and q are the lower and upper band widthes, respectively.

Example 2.3.2 (1, 1): tridiagonal matrix; (1, n−1): upper Hessenberg matrix; (n−1, 1):

lower Hessenberg matrix.

Theorem 2.3.10 Let A be a (p, q)-banded matrix. Suppose A has a LR-factorization (A = LR). Then L = (p, 0) and R = (0, q)-banded matrix, respectively.

Algorithm 2.3.7 See Algorithm 4.3.1 in Matrix Computations, pp.150.

Theorem 2.3.11 Let A be a (p, q)-banded nonsingular matrix. If Gaussian Elimination with partial pivoting is used to compute Gaussian transformations Lj = I − ljeTj, for j = 1,. . ., n− 1, and permutations P1, . . ., Pn−1 such that

Ln−1Pn−1· · · L1P1A = R

is upper triangular, then R is a (0, p + q)-banded matrix and lij = 0 whenever i ≤ j or i > j + p. (Since the j-th column of L is a permutation of the Gaussian vector lj, it follows that L has at most p + 1 nonzero elements per column.)

(22)

44 Chapter 2. Numerical methods for solving linear systems Symmetric Indefinite Systems

Consider the linear system Ax = b, where A ∈ Rn×n is symmetric but indefinite. There are a method using n3/6 flops due to Aasen (1971) that computes the factorization P APT = LT LT, where L = [lij] is unit lower triangular, P is a permutation chosen such that | lij |≤ 1, and T is tridiagonal.

Rather than the above factorization P APT = LT LT we have the calculation of P APT = LDLT,

where D is block diagonal with 1 by 1 and 2 by 2 blocks on diagonal, L = [lij] is unit lower triangular, and P is a permutation chosen such that | lij |≤ 1.

Bunch and Parlett (1971) has proposed a pivot strategy to do this, n3/6 flops are required. Unfortunately the overall process requires n3/12 ∼ n3/6 comparisons. A bet- ter method described by Bunch and Kaufmann (1977) requires n3/6 flops and O(n2) comparisons.

A detailed discussion of this subsection see p.159-168 in Matrix Computations.

In this paper, we provide new decidability and undecidability results for classes of linear hybrid systems, and we show that some algorithms for the analysis of timed automata can

Now given the volume fraction for the interface cell C i , we seek a reconstruction that mimics the sub-grid structure of the jump between 0 and 1 in the volume fraction

For an important class of matrices the more qualitative assertions of Theorems 13 and 14 can be considerably sharpened. This is the class of consistly

We show that, for the linear symmetric cone complementarity problem (SCLCP), both the EP merit functions and the implicit Lagrangian merit function are coercive if the underlying

Assume that the partial multiplicities of purely imaginary and unimodular eigenvalues (if any) of the associated Hamiltonian and symplectic pencil, respectively, are all even and

Particularly, combining the numerical results of the two papers, we may obtain such a conclusion that the merit function method based on ϕ p has a better a global convergence and

We must assume, further, that between a nucleon and an anti-nucleon strong attractive forces exist, capable of binding the two particles together.. *Now at the Institute for

where L is lower triangular and U is upper triangular, then the operation counts can be reduced to O(2n 2 )!.. The results are shown in the following table... 113) in