## Chapter 2

## Numerical methods for solving linear systems

Let A ∈ C^{n×n} be a nonsingular matrix. We want to solve the linear system Ax = b by
(a) direct methods (finitely many steps), or (b) iterative methods (convergence; see Chapter 4).

### 2.1 Elementary matrices

Let X = K^{n} and x, y ∈ X. Then y^{∗}x ∈ K and

$$xy^{*} = \begin{bmatrix} x_1\bar{y}_1 & \cdots & x_1\bar{y}_n \\ \vdots & & \vdots \\ x_n\bar{y}_1 & \cdots & x_n\bar{y}_n \end{bmatrix}.$$

The eigenvalues of xy^{∗} are {0, · · · , 0, y^{∗}x}, since rank(xy^{∗}) = 1 by (xy^{∗})z = (y^{∗}z)x, and (xy^{∗})x = (y^{∗}x)x.

Definition 2.1.1 A matrix of the form

I− αxy^{∗} (α∈ K, x, y ∈ K^{n}) (2.1.1)
is called an elementary matrix.

The eigenvalues of (I − αxy^{∗}) are {1, 1, · · · , 1, 1 − αy^{∗}x}. Compute

(I − αxy^{∗})(I − βxy^{∗}) = I − (α + β − αβy^{∗}x)xy^{∗}. (2.1.2)

If αy^{∗}x − 1 ≠ 0 and we let β = α/(αy^{∗}x − 1), then α + β − αβy^{∗}x = 0. We have

(I − αxy^{∗})^{−1} = I − βxy^{∗},  1/α + 1/β = y^{∗}x. (2.1.3)

Example 2.1.1 Let x ∈ K^{n} with x^{∗}x = 1. Let H = {z : z^{∗}x = 0} and
Q = I − 2xx^{∗} (Q = Q^{∗}, Q^{−1} = Q).

Then Q reflects each vector with respect to the hyperplane H. Let y = αx + w, w ∈ H.

Then we have

Qy = αQx + Qw = −αx + w − 2(x^{∗}w)x = −αx + w,

since Qx = −x and x^{∗}w = 0.
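A quick numerical check of Example 2.1.1: the sketch below builds Q = I − 2xx^{T} for a real unit vector and applies it to y = αx + w (the helper names are our own, not from the text).

```python
def householder(x):
    """Return Q = I - 2*x*x^T as a list of rows (x must be a unit vector)."""
    n = len(x)
    return [[(1.0 if i == j else 0.0) - 2.0 * x[i] * x[j] for j in range(n)]
            for i in range(n)]

def matvec(A, v):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

# x = e1, so H is spanned by e2, e3 and Q flips the first coordinate.
x = [1.0, 0.0, 0.0]
Q = householder(x)
y = [3.0, 4.0, 5.0]          # y = 3*x + w with w = (0, 4, 5) in H
print(matvec(Q, y))          # -> [-3.0, 4.0, 5.0]
```

Applying Q twice returns y, which illustrates Q^{−1} = Q.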

Example 2.1.2 Let y = e_{i} be the i-th column of the identity matrix and x = l_{i} = [0, · · · , 0, l_{i+1,i}, · · · , l_{n,i}]^{T}. Then

$$I + l_ie_i^{T} = \begin{bmatrix} 1 & & & & \\ & \ddots & & & \\ & & 1 & & \\ & & l_{i+1,i} & \ddots & \\ & & \vdots & & \ddots \\ & & l_{n,i} & & 1 \end{bmatrix}. \tag{2.1.4}$$

Since e_{i}^{T}l_{i} = 0, we have

(I + l_{i}e_{i}^{T})^{−1} = I − l_{i}e_{i}^{T}. (2.1.5)
From the equality

(I + l_1e_1^{T})(I + l_2e_2^{T}) = I + l_1e_1^{T} + l_2e_2^{T} + l_1(e_1^{T}l_2)e_2^{T} = I + l_1e_1^{T} + l_2e_2^{T}

it follows that

$$(I + l_1e_1^{T}) \cdots (I + l_ie_i^{T}) \cdots (I + l_{n-1}e_{n-1}^{T}) = I + l_1e_1^{T} + l_2e_2^{T} + \cdots + l_{n-1}e_{n-1}^{T} = \begin{bmatrix} 1 & & & O \\ l_{21} & \ddots & & \\ \vdots & \ddots & \ddots & \\ l_{n1} & \cdots & l_{n,n-1} & 1 \end{bmatrix}. \tag{2.1.6}$$

Theorem 2.1.1 A lower triangular matrix with ones on the diagonal can be written as the product of n − 1 elementary matrices of the form (2.1.4).

Remark 2.1.1 (I + l_{1}e_{1}^{T} + . . . + l_{n−1}e_{n−1}^{T})^{−1} = (I − l_{n−1}e_{n−1}^{T}) . . . (I − l_{1}e_{1}^{T}), which cannot
be simplified as in (2.1.6).

### 2.2 LR-factorization

Definition 2.2.1 Given A ∈ C^{n×n}, a lower triangular matrix L and an upper triangular
matrix R. If A = LR, then the product LR is called an LR-factorization (or LR-decomposition) of A.

Basic problem:

Given b ≠ 0, b ∈ K^{n}. Find a vector l_1 = [0, l_{21}, . . . , l_{n1}]^{T} and c ∈ K such that

(I − l_1e_1^{T})b = ce_1.

Solution:

b_1 = c,
b_i − l_{i1}b_1 = 0, i = 2, . . . , n.

If b_1 = 0, there is no solution (since b ≠ 0); if b_1 ≠ 0, then c = b_1 and l_{i1} = b_i/b_1, i = 2, . . . , n.

Construction of the LR-factorization:

Let A = A^{(0)} = [a_1^{(0)} | . . . | a_n^{(0)}]. Apply the basic problem to a_1^{(0)}: if a_{11}^{(0)} ≠ 0, then there
exists L_1 = I − l_1e_1^{T} such that (I − l_1e_1^{T})a_1^{(0)} = a_{11}^{(0)}e_1. Thus

$$A^{(1)} = L_1A^{(0)} = [L_1a_1^{(0)} | \ldots | L_1a_n^{(0)}] = \begin{bmatrix} a_{11}^{(0)} & a_{12}^{(0)} & \cdots & a_{1n}^{(0)} \\ 0 & a_{22}^{(1)} & \cdots & a_{2n}^{(1)} \\ \vdots & \vdots & & \vdots \\ 0 & a_{n2}^{(1)} & \cdots & a_{nn}^{(1)} \end{bmatrix}. \tag{2.2.1}$$

The i-th step:

$$A^{(i)} = L_iA^{(i-1)} = L_iL_{i-1}\cdots L_1A^{(0)} = \begin{bmatrix} a_{11}^{(0)} & \cdots & & & & & a_{1n}^{(0)} \\ 0 & a_{22}^{(1)} & & & & & a_{2n}^{(1)} \\ \vdots & \ddots & \ddots & & & & \vdots \\ \vdots & & 0 & a_{ii}^{(i-1)} & \cdots & \cdots & a_{in}^{(i-1)} \\ \vdots & & \vdots & 0 & a_{i+1,i+1}^{(i)} & \cdots & a_{i+1,n}^{(i)} \\ \vdots & & \vdots & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & 0 & a_{n,i+1}^{(i)} & \cdots & a_{nn}^{(i)} \end{bmatrix} \tag{2.2.2}$$

If a_{ii}^{(i−1)} ≠ 0 for i = 1, . . . , n − 1, then the method is executable and we have that

A^{(n−1)} = L_{n−1} · · · L_1A^{(0)} = R (2.2.3)

is an upper triangular matrix. Thus, A = LR. Explicit representation of L:

L_i = I − l_ie_i^{T},  L_i^{−1} = I + l_ie_i^{T},

L = L_1^{−1} · · · L_{n−1}^{−1} = (I + l_1e_1^{T}) · · · (I + l_{n−1}e_{n−1}^{T})
  = I + l_1e_1^{T} + · · · + l_{n−1}e_{n−1}^{T}  (by (2.1.6)).

Theorem 2.2.1 Let A be nonsingular. Then A has an LR-factorization (A = LR) if and
only if k_i := det(A_i) ≠ 0 for i = 1, . . . , n − 1, where A_i is the i-th leading principal submatrix of A, i.e.,

$$A_i = \begin{bmatrix} a_{11} & \cdots & a_{1i} \\ \vdots & & \vdots \\ a_{i1} & \cdots & a_{ii} \end{bmatrix}.$$

Proof: (Necessity “⇒”): Since A = LR, we have

$$\begin{bmatrix} a_{11} & \cdots & a_{1i} \\ \vdots & & \vdots \\ a_{i1} & \cdots & a_{ii} \end{bmatrix} = \begin{bmatrix} l_{11} & & O \\ \vdots & \ddots & \\ l_{i1} & \cdots & l_{ii} \end{bmatrix} \begin{bmatrix} r_{11} & \cdots & r_{1i} \\ & \ddots & \vdots \\ O & & r_{ii} \end{bmatrix}.$$

From det(A) ≠ 0 it follows that det(L) ≠ 0 and det(R) ≠ 0. Thus, l_{jj} ≠ 0 and r_{jj} ≠ 0, for j = 1, . . . , n. Hence k_i = l_{11} · · · l_{ii}r_{11} · · · r_{ii} ≠ 0.

(Sufficiency “⇐”): From (2.2.2) we have

A^{(0)} = (L_1^{−1} · · · L_i^{−1})A^{(i)}.

Consider the (i + 1)-th leading principal determinant. From the above equation we have

$$\det\begin{bmatrix} a_{11} & \cdots & a_{1,i+1} \\ \vdots & & \vdots \\ a_{i+1,1} & \cdots & a_{i+1,i+1} \end{bmatrix} = \det\begin{bmatrix} 1 & & & 0 \\ l_{21} & \ddots & & \\ \vdots & \ddots & \ddots & \\ l_{i+1,1} & \cdots & l_{i+1,i} & 1 \end{bmatrix} \det\begin{bmatrix} a_{11}^{(0)} & a_{12}^{(0)} & \cdots & \cdots & * \\ & a_{22}^{(1)} & \cdots & \cdots & \vdots \\ & & \ddots & & \vdots \\ & & & a_{ii}^{(i-1)} & a_{i,i+1}^{(i-1)} \\ & & & 0 & a_{i+1,i+1}^{(i)} \end{bmatrix}.$$

Thus, k_{i+1} = 1 · a_{11}^{(0)}a_{22}^{(1)} · · · a_{i+1,i+1}^{(i)} ≠ 0, which implies a_{i+1,i+1}^{(i)} ≠ 0. Therefore, the LR-factorization of A exists.

Theorem 2.2.2 If a nonsingular matrix A has an LR-factorization with A = LR and l_{11} = · · · = l_{nn} = 1, then the factorization is unique.

Proof: Let A = L_1R_1 = L_2R_2. Then L_2^{−1}L_1 = R_2R_1^{−1} is both unit lower triangular and upper triangular, hence equal to I.

Corollary 2.2.1 A nonsingular matrix A has a factorization A = LDR, where D is
diagonal and L and R^{T} are unit lower triangular (with ones on the diagonal), if
and only if k_i ≠ 0 for i = 1, . . . , n − 1.

Theorem 2.2.3 Let A be a nonsingular matrix. Then there exists a permutation P such that P A has an LR-factorization.

Proof: By construction! Consider (2.2.2): there is a permutation P_i, which interchanges the i-th row with a row of index larger than i, such that the new pivot satisfies a_{ii}^{(i−1)} ≠ 0 (in P_iA^{(i−1)}).

This procedure is executable for i = 1, . . . , n − 1. So we have

L_{n−1}P_{n−1} · · · L_iP_i · · · L_1P_1A^{(0)} = R. (2.2.4)

Let P be a permutation which affects only the elements i + 1, · · · , n. It holds that

P(I − l_ie_i^{T})P^{−1} = I − (P l_i)e_i^{T} = I − ˜l_ie_i^{T} = ˜L_i  (e_i^{T}P^{−1} = e_i^{T}),

where ˜L_i is lower triangular. Hence we have

P L_i = ˜L_iP. (2.2.5)

Now move all P_i in (2.2.4) to the right as

L_{n−1}˜L_{n−2} · · · ˜L_1P_{n−1} · · · P_1A^{(0)} = R.

Then we have P A = LR with L^{−1} = L_{n−1}˜L_{n−2} · · · ˜L_1 and P = P_{n−1} · · · P_1.

### 2.3 Gaussian elimination

### 2.3.1 Practical implementation

Given a linear system

Ax = b (2.3.1)

with A nonsingular. We first assume that A has an LR-factorization, i.e., A = LR. Thus LRx = b.

We then (i) solve Ly = b; (ii) solve Rx = y. These imply that LRx = Ly = b. From (2.2.4), we have

L_{n−1} · · · L_2L_1(A | b) = (R | L^{−1}b).

Algorithm 2.3.1 (without permutation) For k = 1, . . . , n − 1,

  if a_{kk} = 0, then stop (∗);
  else ω_j := a_{kj} (j = k + 1, . . . , n);
  for i = k + 1, . . . , n,
    η := a_{ik}/a_{kk}, a_{ik} := η;
    for j = k + 1, . . . , n,
      a_{ij} := a_{ij} − ηω_j;
    b_i := b_i − ηb_k.

For x (back substitution!):

  x_n = b_n/a_{nn};
  for i = n − 1, n − 2, . . . , 1,
    x_i = (b_i − Σ_{j=i+1}^{n} a_{ij}x_j)/a_{ii}.
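A minimal pure-Python sketch of Algorithm 2.3.1 together with the back substitution (function and variable names are our own):

```python
def gauss_solve(A, b):
    """Solve Ax = b by LR-factorization without pivoting (Algorithm 2.3.1 sketch).
    A is overwritten: the strict lower part holds L's multipliers, the rest holds R."""
    n = len(A)
    for k in range(n - 1):
        if A[k][k] == 0:
            raise ZeroDivisionError("zero pivot; pivoting would be required")
        for i in range(k + 1, n):
            eta = A[i][k] / A[k][k]
            A[i][k] = eta                      # store multiplier l_ik
            for j in range(k + 1, n):
                A[i][j] -= eta * A[k][j]
            b[i] -= eta * b[k]                 # forward elimination of b
    x = [0.0] * n
    for i in range(n - 1, -1, -1):             # back substitution
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x

print(gauss_solve([[2.0, 1.0], [4.0, 3.0]], [3.0, 7.0]))   # -> [1.0, 1.0]
```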

Cost of computation (one multiplication + one addition ≡ one flop):

(i) LR-factorization: n^{3}/3− n/3 flops;

(ii) Computation of y: n(n− 1)/2 flops;

(iii) Computation of x: n(n + 1)/2 flops.

For A^{−1}: 4n^3/3 flops (n^3/3 + kn^2 with k = n right-hand sides).

Pivoting: (a) partial pivoting; (b) complete pivoting.

From (2.2.2), we have

$$A^{(k-1)} = \begin{bmatrix} a_{11}^{(0)} & \cdots & & & & a_{1n}^{(0)} \\ 0 & \ddots & & & & \vdots \\ \vdots & & a_{k-1,k-1}^{(k-2)} & \cdots & \cdots & a_{k-1,n}^{(k-2)} \\ \vdots & & 0 & a_{kk}^{(k-1)} & \cdots & a_{kn}^{(k-1)} \\ \vdots & & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & a_{nk}^{(k-1)} & \cdots & a_{nn}^{(k-1)} \end{bmatrix}.$$

For (a):

Find p ∈ {k, . . . , n} such that

|a_{pk}| = max_{k≤i≤n} |a_{ik}|  (r_k := p);
swap a_{kj}, b_k and a_{pj}, b_p respectively (j = 1, . . . , n). (2.3.2)

Replacing (∗) in Algorithm 2.3.1 by (2.3.2), we have a new factorization of A with partial pivoting, i.e., P A = LR (by Theorem 2.2.3) with |l_{ij}| ≤ 1 for i, j = 1, . . . , n. For solving the linear system Ax = b, we use

P Ax = P b ⇒ L(Rx) = P b ≡ ˜b.

It needs an extra n(n − 1)/2 comparisons.

For (b):

Find p, q ∈ {k, . . . , n} such that

|a_{pq}| = max_{k≤i,j≤n} |a_{ij}|  (r_k := p, c_k := q);
swap a_{kj}, b_k and a_{pj}, b_p respectively (j = k, . . . , n); swap a_{ik} and a_{iq} (i = 1, . . . , n). (2.3.3)

Replacing (∗) in Algorithm 2.3.1 by (2.3.3), we also have a new factorization of A with complete pivoting, i.e., P AΠ = LR (by Theorem 2.2.3) with |l_{ij}| ≤ 1 for i, j = 1, . . . , n.

For solving the linear system Ax = b, we use

P AΠ(Π^{T}x) = P b ⇒ LR˜x = ˜b ⇒ x = Π˜x.

It needs n^3/3 comparisons.
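The row-interchange strategy (2.3.2) can be sketched as follows (an illustrative routine, not the book's exact implementation); note that the stored multipliers stay bounded by 1:

```python
def lr_partial_pivot(A):
    """LR-factorization with partial pivoting: returns (perm, LR), where perm
    records the final row order and LR stores L's multipliers below the diagonal."""
    n = len(A)
    A = [row[:] for row in A]            # work on a copy
    perm = list(range(n))
    for k in range(n - 1):
        p = max(range(k, n), key=lambda i: abs(A[i][k]))   # pivot search (2.3.2)
        A[k], A[p] = A[p], A[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]           # multiplier l_ik, |l_ik| <= 1
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]
    return perm, A

perm, LR = lr_partial_pivot([[1e-4, 1.0], [1.0, 1.0]])
print(perm)                  # the tiny pivot 1e-4 is avoided by a row swap
print(abs(LR[1][0]) <= 1.0)  # -> True
```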

Example 2.3.1 Let

$$A = \begin{bmatrix} 10^{-4} & 1 \\ 1 & 1 \end{bmatrix}$$

in three-decimal-digit floating point arithmetic. Then κ(A) = ‖A‖_∞‖A^{−1}‖_∞ ≈ 4, so A is well-conditioned.

• Without pivoting:

$$L = \begin{bmatrix} 1 & 0 \\ fl(1/10^{-4}) & 1 \end{bmatrix}, \quad fl(1/10^{-4}) = 10^{4},$$

$$R = \begin{bmatrix} 10^{-4} & 1 \\ 0 & fl(1 - 10^{4} \cdot 1) \end{bmatrix}, \quad fl(1 - 10^{4} \cdot 1) = -10^{4}.$$

$$LR = \begin{bmatrix} 1 & 0 \\ 10^{4} & 1 \end{bmatrix} \begin{bmatrix} 10^{-4} & 1 \\ 0 & -10^{4} \end{bmatrix} = \begin{bmatrix} 10^{-4} & 1 \\ 1 & 0 \end{bmatrix} \neq \begin{bmatrix} 10^{-4} & 1 \\ 1 & 1 \end{bmatrix} = A.$$

Here a_{22} is entirely “lost” from the computation; the process is numerically unstable. Let Ax = [1, 2]^{T}. Then x ≈ [1, 1]^{T}. But Ly = [1, 2]^{T} gives y_1 = 1 and y_2 = fl(2 − 10^{4} · 1) = −10^{4}, and Rx̂ = y gives x̂_2 = fl((−10^{4})/(−10^{4})) = 1, x̂_1 = fl((1 − 1)/10^{−4}) = 0. We have an erroneous solution with cond(L), cond(R) ≈ 10^{8}.

• Partial pivoting:

$$L = \begin{bmatrix} 1 & 0 \\ fl(10^{-4}/1) & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 10^{-4} & 1 \end{bmatrix}, \quad R = \begin{bmatrix} 1 & 1 \\ 0 & fl(1 - 10^{-4}) \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}.$$

L and R are both well-conditioned.
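The effect of the small pivot in Example 2.3.1 can be reproduced with a toy `fl` that rounds every result to three significant decimal digits (`fl` is our own stand-in for the rounding model, not a library function):

```python
def fl(x, t=3):
    """Round x to t significant decimal digits (toy floating point model)."""
    if x == 0:
        return 0.0
    return float(f"%.{t - 1}e" % x)

# Without pivoting: eliminate with the pivot 1e-4 in 3-digit arithmetic.
l21 = fl(1.0 / 1e-4)            # 1e4
r22 = fl(1.0 - fl(l21 * 1.0))   # fl(1 - 1e4) = -1e4: the entry a22 = 1 is lost
print(l21, r22)                 # -> 10000.0 -10000.0

# Solving Ax = (1, 2)^T with these factors gives the useless x = (0, 1)^T:
y2 = fl(2.0 - fl(l21 * 1.0))                # forward substitution: -1e4
x2 = fl(y2 / r22)                           # 1.0
x1 = fl(fl(1.0 - fl(1.0 * x2)) / 1e-4)      # 0.0, instead of x1 ≈ 1
print(x1, x2)
```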

### 2.3.2 LDR- and LL^{T}-factorizations

Let A = LDR as in Corollary 2.2.1.

Algorithm 2.3.2 (Crout’s factorization or compact method) For k = 1, . . . , n,

  for p = 1, 2, . . . , k − 1,
    r_p := d_pa_{pk},
    ω_p := a_{kp}d_p;
  d_k := a_{kk} − Σ_{p=1}^{k−1} a_{kp}r_p;
  if d_k = 0, then stop; else
  for i = k + 1, . . . , n,
    a_{ik} := (a_{ik} − Σ_{p=1}^{k−1} a_{ip}r_p)/d_k,
    a_{ki} := (a_{ki} − Σ_{p=1}^{k−1} ω_pa_{pi})/d_k.

Cost: n^3/3 flops.

• With partial pivoting: see Wilkinson, The Algebraic Eigenvalue Problem, pp. 225 ff.

• Advantage: one can use double precision for the inner products.
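A sketch of the compact method in pure Python (our own helper names; it returns the factors explicitly instead of overwriting A):

```python
def ldr(A):
    """Compact LDR-factorization (Crout-style sketch of Algorithm 2.3.2).
    Returns (L, d, R) with unit lower triangular L, diagonal d, unit upper R."""
    n = len(A)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    R = [[float(i == j) for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for k in range(n):
        d[k] = A[k][k] - sum(L[k][p] * d[p] * R[p][k] for p in range(k))
        if d[k] == 0:
            raise ZeroDivisionError("leading principal minor vanishes")
        for i in range(k + 1, n):
            L[i][k] = (A[i][k] - sum(L[i][p] * d[p] * R[p][k] for p in range(k))) / d[k]
            R[k][i] = (A[k][i] - sum(L[k][p] * d[p] * R[p][i] for p in range(k))) / d[k]
    return L, d, R

L, d, R = ldr([[4.0, 2.0], [2.0, 3.0]])
print(d)                   # -> [4.0, 2.0]
print(L[1][0], R[0][1])    # -> 0.5 0.5 (L = R^T here since A is symmetric)
```

For symmetric A this reproduces the LDL^{T}-factorization of Theorem 2.3.1 below (L = R^{T}).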

Theorem 2.3.1 If A is nonsingular, real and symmetric, then A has a unique LDL^{T}-
factorization, where D is diagonal and L is a unit lower triangular matrix (with ones on
the diagonal).

Proof: A = LDR and A = A^{T} = R^{T}DL^{T}. By the uniqueness of the factorization it follows that L = R^{T}.

Theorem 2.3.2 If A is symmetric and positive definite, then there exists a lower trian-
gular G ∈ R^{n×n} with positive diagonal elements such that A = GG^{T}.

Proof: A is symmetric positive definite ⇔ x^{T}Ax > 0 for all nonzero vectors x ∈ R^{n}
⇔ k_i > 0 for i = 1, · · · , n ⇔ all eigenvalues of A are positive.

From Corollary 2.2.1 and Theorem 2.3.1 we have A = LDL^{T}. From L^{−1}AL^{−T} = D
it follows that d_k = (e_k^{T}L^{−1})A(L^{−T}e_k) > 0. Thus, G = L · diag{d_1^{1/2}, · · · , d_n^{1/2}} is real, and
then A = GG^{T}.

Algorithm 2.3.3 (Cholesky factorization) Let A be symmetric positive definite. To
find a lower triangular matrix G such that A = GG^{T}:

For k = 1, 2, . . . , n,
  a_{kk} := (a_{kk} − Σ_{p=1}^{k−1} a_{kp}^2)^{1/2};
  for i = k + 1, . . . , n,
    a_{ik} := (a_{ik} − Σ_{p=1}^{k−1} a_{ip}a_{kp})/a_{kk}.

Cost: n^3/6 flops.
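A direct transcription of Algorithm 2.3.3, storing the result in a separate matrix G rather than overwriting A:

```python
import math

def cholesky(A):
    """Cholesky factorization A = G*G^T for symmetric positive definite A
    (Algorithm 2.3.3); returns the lower triangular factor G."""
    n = len(A)
    G = [[0.0] * n for _ in range(n)]
    for k in range(n):
        s = A[k][k] - sum(G[k][p] ** 2 for p in range(k))
        G[k][k] = math.sqrt(s)                 # positive diagonal element
        for i in range(k + 1, n):
            G[i][k] = (A[i][k] - sum(G[i][p] * G[k][p] for p in range(k))) / G[k][k]
    return G

G = cholesky([[4.0, 2.0], [2.0, 3.0]])
print(G[0][0], G[1][0])    # -> 2.0 1.0; G[1][1] = sqrt(2)
```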

Remark 2.3.1 For solving symmetric, indefinite systems, see Golub/Van Loan, Matrix
Computations, pp. 159–168: P AP^{T} = LDL^{T}, where D is a 1 × 1 or 2 × 2 block-diagonal matrix,
P is a permutation and L is lower triangular with ones on the diagonal.


### 2.3.3 Error estimation for linear systems

Consider the linear system

Ax = b, (2.3.4)

and the perturbed linear system

(A + δA)(x + δx) = b + δb, (2.3.5)

where δA and δb are errors of measure or round-off in factorization.

Definition 2.3.1 Let ‖·‖ be an operator norm and A be nonsingular. Then κ ≡ κ(A) =
‖A‖‖A^{−1}‖ is a condition number of A corresponding to ‖·‖.

Theorem 2.3.3 (Forward error bound) Let x be the solution of (2.3.4) and x + δx
be the solution of the perturbed linear system (2.3.5). If ‖δA‖‖A^{−1}‖ < 1, then

$$\frac{\|\delta x\|}{\|x\|} \le \frac{\kappa}{1 - \kappa\frac{\|\delta A\|}{\|A\|}} \left( \frac{\|\delta A\|}{\|A\|} + \frac{\|\delta b\|}{\|b\|} \right). \tag{2.3.6}$$

Proof: From (2.3.5) we have

(A + δA)δx + Ax + (δA)x = b + δb.

Thus,

δx = −(A + δA)^{−1}[(δA)x − δb]. (2.3.7)

Here, Corollary 2.7 implies that (A + δA)^{−1} exists. Now,

$$\|(A + \delta A)^{-1}\| = \|(I + A^{-1}\delta A)^{-1}A^{-1}\| \le \frac{\|A^{-1}\|}{1 - \|A^{-1}\|\|\delta A\|}.$$

On the other hand, b = Ax implies ‖b‖ ≤ ‖A‖‖x‖. So,

$$\frac{1}{\|x\|} \le \frac{\|A\|}{\|b\|}. \tag{2.3.8}$$

From (2.3.7) it follows that ‖δx‖ ≤ (‖A^{−1}‖/(1 − ‖A^{−1}‖‖δA‖))(‖δA‖‖x‖ + ‖δb‖). By using (2.3.8), the
inequality (2.3.6) is proved.

Remark 2.3.2 If κ(A) is large, then A (for the linear system Ax = b) is called ill-conditioned; otherwise well-conditioned.
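For the 2 × 2 matrix of Example 2.3.1 the condition number of Definition 2.3.1 can be checked directly (`inv2` and `norm_inf` are our own small helpers):

```python
def inv2(A):
    """Inverse of a 2x2 matrix via the explicit cofactor formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def norm_inf(A):
    """Infinity norm: maximal absolute row sum."""
    return max(sum(abs(x) for x in row) for row in A)

A = [[1e-4, 1.0], [1.0, 1.0]]
kappa = norm_inf(A) * norm_inf(inv2(A))
print(kappa)   # ~ 4: the matrix of Example 2.3.1 is indeed well-conditioned
```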

### 2.3.4 Error analysis for Gaussian algorithm

A computer is characterized by four integers: (a) the machine base β; (b) the precision t; (c) the underflow limit L; (d) the overflow limit U. Define the set of floating point numbers

F = {f = ±0.d_1d_2 · · · d_t × β^{e} | 0 ≤ d_i < β, d_1 ≠ 0, L ≤ e ≤ U} ∪ {0}. (2.3.9)

Let G = {x ∈ R | m ≤ |x| ≤ M} ∪ {0}, where m = β^{L−1} and M = β^{U}(1 − β^{−t}) are the
minimal and maximal numbers of F \ {0} in absolute value, respectively. We define an
operator fl : G → F by

fl(x) = the nearest c ∈ F to x by rounding arithmetic.

One can show that fl satisfies

fl(x) = x(1 + ε), |ε| ≤ eps, (2.3.10)

where eps = (1/2)β^{1−t}. (If β = 2, then eps = 2^{−t}.) It follows that

fl(a ◦ b) = (a ◦ b)(1 + ε) or fl(a ◦ b) = (a ◦ b)/(1 + ε),

where |ε| ≤ eps and ◦ = +, −, ×, /.

Algorithm 2.3.4 Given x, y ∈ R^{n}. The following algorithm computes x^{T}y and stores
the result in s:

s = 0;
for k = 1, . . . , n, s = s + x_ky_k.
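The error accumulation that Theorem 2.3.4 below quantifies can be observed with a toy `fl` that rounds every operation to t significant decimal digits (our own illustration of the model (2.3.10), not part of the text):

```python
def fl(x, t=6):
    """Round x to t significant decimal digits (toy model of (2.3.10))."""
    return 0.0 if x == 0 else float(f"%.{t - 1}e" % x)

def fl_dot(x, y, t=6):
    """Algorithm 2.3.4 with every operation rounded: s = fl(s + fl(x_k * y_k))."""
    s = 0.0
    for xk, yk in zip(x, y):
        s = fl(s + fl(xk * yk, t), t)
    return s

x = [1.0] * 1000
y = [1.0 / 3.0] * 1000
exact = 1000 / 3.0
print(fl_dot(x, y), exact)   # the rounded sum drifts away from the exact value
```

The observed drift stays well within the bound 1.01·n·eps per term of Theorem 2.3.4.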

Theorem 2.3.4 If n2^{−t} ≤ 0.01, then

$$fl\Bigl(\sum_{k=1}^{n} x_ky_k\Bigr) = \sum_{k=1}^{n} x_ky_k[1 + 1.01(n + 2 - k)\theta_k 2^{-t}], \quad |\theta_k| \le 1.$$

Proof: Let s_p = fl(Σ_{k=1}^{p} x_ky_k) be the partial sum in Algorithm 2.3.4. Then

s_1 = x_1y_1(1 + δ_1)

with |δ_1| ≤ eps, and for p = 2, . . . , n,

s_p = fl[s_{p−1} + fl(x_py_p)] = [s_{p−1} + x_py_p(1 + δ_p)](1 + ε_p)

with |δ_p|, |ε_p| ≤ eps. Therefore

$$fl(x^{T}y) = s_n = \sum_{k=1}^{n} x_ky_k(1 + \gamma_k),$$

where (1 + γ_k) = (1 + δ_k)\prod_{j=k}^{n}(1 + ε_j) and ε_1 ≡ 0. Thus,

$$fl\Bigl(\sum_{k=1}^{n} x_ky_k\Bigr) = \sum_{k=1}^{n} x_ky_k[1 + 1.01(n + 2 - k)\theta_k 2^{-t}]. \tag{2.3.11}$$

The result follows immediately from the following useful lemma.

Lemma 2.3.5 If (1 + α) = \prod_{k=1}^{n}(1 + α_k), where |α_k| ≤ 2^{−t} and n2^{−t} ≤ 0.01, then

$$\prod_{k=1}^{n}(1 + \alpha_k) = 1 + 1.01n\theta 2^{-t} \quad \text{with } |\theta| \le 1.$$

Proof: From the assumption it is easily seen that

$$(1 - 2^{-t})^n \le \prod_{k=1}^{n}(1 + \alpha_k) \le (1 + 2^{-t})^n. \tag{2.3.12}$$

Expanding (1 − x)^n by Taylor's theorem for −1 < x < 1, we get

$$(1 - x)^n = 1 - nx + \frac{n(n-1)}{2}(1 - \theta x)^{n-2}x^2 \ge 1 - nx.$$

Hence

(1 − 2^{−t})^{n} ≥ 1 − n2^{−t}. (2.3.13)

Now we estimate the upper bound of (1 + 2^{−t})^{n}:

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots = 1 + x + \frac{x}{2}\,x\Bigl(1 + \frac{x}{3} + \frac{2x^2}{4!} + \cdots\Bigr).$$

If 0 ≤ x ≤ 0.01, then

$$1 + x \le e^x \le 1 + x + 0.01x \cdot \tfrac{1}{2}e^x \le 1 + 1.01x. \tag{2.3.14}$$

(Here we use the fact e^{0.01} < 2 in the last inequality.) Let x = 2^{−t}. Then the left
inequality of (2.3.14) implies

(1 + 2^{−t})^{n} ≤ e^{2^{−t}n}. (2.3.15)

Let x = 2^{−t}n. Then the second inequality of (2.3.14) implies

e^{2^{−t}n} ≤ 1 + 1.01n2^{−t}. (2.3.16)

From (2.3.15) and (2.3.16) we have

(1 + 2^{−t})^{n} ≤ 1 + 1.01n2^{−t}.

Together with (2.3.13), the bounds (2.3.12) give the lemma.

Let the exact LR-factorization of A be given by L and R (A = LR), and let ˜L, ˜R be the LR-factorization of A computed by the Gaussian algorithm (without pivoting). There are two possibilities:

(i) Forward error analysis: estimate |L − ˜L| and |R − ˜R|.

(ii) Backward error analysis: let ˜L˜R be the exact LR-factorization of a perturbed matrix ˜A = A + F. Then F is to be estimated, i.e., |F| ≤ ?

### 2.3.5 A priori error estimate for the backward error bound of the LR-factorization

From (2.2.2) we have

A^{(k+1)} = L_kA^{(k)},

for k = 1, 2, . . . , n − 1 (A^{(1)} = A). Denote the entries of A^{(k)} by a_{ij}^{(k)} and let l_{ik} =
fl(a_{ik}^{(k)}/a_{kk}^{(k)}), i ≥ k + 1. From (2.2.2) we know that

$$a_{ij}^{(k+1)} = \begin{cases} 0, & i \ge k+1,\ j = k, \\ fl(a_{ij}^{(k)} - fl(l_{ik}a_{kj}^{(k)})), & i \ge k+1,\ j \ge k+1, \\ a_{ij}^{(k)}, & \text{otherwise}. \end{cases} \tag{2.3.17}$$

From (2.3.10) we have l_{ik} = (a_{ik}^{(k)}/a_{kk}^{(k)})(1 + δ_{ik}) with |δ_{ik}| ≤ 2^{−t}. Then

a_{ik}^{(k)} − l_{ik}a_{kk}^{(k)} + a_{ik}^{(k)}δ_{ik} = 0, for i ≥ k + 1. (2.3.18)
Let a_{ik}^{(k)}δ_{ik} ≡ ε_{ik}^{(k)}. From (2.3.10) we also have

a_{ij}^{(k+1)} = fl(a_{ij}^{(k)} − fl(l_{ik}a_{kj}^{(k)})) = (a_{ij}^{(k)} − l_{ik}a_{kj}^{(k)}(1 + δ_{ij}))/(1 + δ'_{ij}) (2.3.19)

with |δ_{ij}|, |δ'_{ij}| ≤ 2^{−t}. Then

a_{ij}^{(k+1)} = a_{ij}^{(k)} − l_{ik}a_{kj}^{(k)} − l_{ik}a_{kj}^{(k)}δ_{ij} − a_{ij}^{(k+1)}δ'_{ij}, for i, j ≥ k + 1. (2.3.20)

Let ε_{ij}^{(k)} ≡ −l_{ik}a_{kj}^{(k)}δ_{ij} − a_{ij}^{(k+1)}δ'_{ij}, which is the computational error of a_{ij}^{(k)} in A^{(k+1)}. From
(2.3.17), (2.3.18) and (2.3.20) we obtain

$$a_{ij}^{(k+1)} = \begin{cases} a_{ij}^{(k)} - l_{ik}a_{kk}^{(k)} + \varepsilon_{ij}^{(k)}, & i \ge k+1,\ j = k, \\ a_{ij}^{(k)} - l_{ik}a_{kj}^{(k)} + \varepsilon_{ij}^{(k)}, & i \ge k+1,\ j \ge k+1, \\ a_{ij}^{(k)} + \varepsilon_{ij}^{(k)}, & \text{otherwise}, \end{cases} \tag{2.3.21}$$

where

$$\varepsilon_{ij}^{(k)} = \begin{cases} a_{ij}^{(k)}\delta_{ij}, & i \ge k+1,\ j = k, \\ -l_{ik}a_{kj}^{(k)}\delta_{ij} - a_{ij}^{(k+1)}\delta'_{ij}, & i \ge k+1,\ j \ge k+1, \\ 0, & \text{otherwise}. \end{cases} \tag{2.3.22}$$

Let E^{(k)} be the error matrix with entries ε_{ij}^{(k)}. Then (2.3.21) can be written as

A^{(k+1)} = A^{(k)} − M_kA^{(k)} + E^{(k)}, (2.3.23)

where

$$M_k = \begin{bmatrix} 0 & & & & & \\ & \ddots & & & & \\ & & 0 & & & \\ & & l_{k+1,k} & 0 & & \\ & & \vdots & & \ddots & \\ & & l_{n,k} & & & 0 \end{bmatrix}. \tag{2.3.24}$$

For k = 1, 2, . . . , n − 1, we add the n − 1 equations in (2.3.23) together and get

M_1A^{(1)} + M_2A^{(2)} + · · · + M_{n−1}A^{(n−1)} + A^{(n)} = A^{(1)} + E^{(1)} + · · · + E^{(n−1)}.

From (2.3.17) we know that the k-th row of A^{(k)} is equal to the k-th row of A^{(k+1)}, · · · , A^{(n)},
respectively, and from (2.3.24) we also have

M_kA^{(k)} = M_kA^{(n)} = M_k˜R.

Thus,

(M_1 + M_2 + · · · + M_{n−1} + I)˜R = A^{(1)} + E^{(1)} + · · · + E^{(n−1)}.
Then

˜L˜R = A + E, (2.3.25)

where

$$\tilde{L} = \begin{bmatrix} 1 & & & O \\ l_{21} & 1 & & \\ \vdots & & \ddots & \\ l_{n1} & \cdots & l_{n,n-1} & 1 \end{bmatrix} \quad \text{and} \quad E = E^{(1)} + \cdots + E^{(n-1)}. \tag{2.3.26}$$

Now we assume that the partial pivotings in Gaussian elimination are already ar-
ranged such that the pivot element a_{kk}^{(k)} has the maximal absolute value. So we have |l_{ik}| ≤ 1.

Let

ρ = max_{i,j,k} |a_{ij}^{(k)}| / ‖A‖_∞. (2.3.27)

Then

|a_{ij}^{(k)}| ≤ ρ‖A‖_∞. (2.3.28)

From (2.3.22) and (2.3.28) it follows that

$$|\varepsilon_{ij}^{(k)}| \le \rho\|A\|_\infty \cdot \begin{cases} 2^{-t}, & i \ge k+1,\ j = k, \\ 2^{1-t}, & i \ge k+1,\ j \ge k+1, \\ 0, & \text{otherwise}. \end{cases} \tag{2.3.29}$$

Therefore,

$$|E^{(k)}| \le \rho\|A\|_\infty 2^{-t} \cdot \begin{bmatrix} 0 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 2 & \cdots & 2 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 1 & 2 & \cdots & 2 \end{bmatrix}. \tag{2.3.30}$$

From (2.3.26) we get

$$|E| \le \rho\|A\|_\infty 2^{-t} \begin{bmatrix} 0 & 0 & 0 & \cdots & 0 & 0 \\ 1 & 2 & 2 & \cdots & 2 & 2 \\ 1 & 3 & 4 & \cdots & 4 & 4 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 1 & 3 & 5 & \cdots & 2n-4 & 2n-4 \\ 1 & 3 & 5 & \cdots & 2n-3 & 2n-2 \end{bmatrix}. \tag{2.3.31}$$

Hence we have the following theorem.

Theorem 2.3.6 The LR-factorization ˜L and ˜R of A computed by Gaussian elimination with partial pivoting satisfies

˜L˜R = A + E, where ‖E‖_∞ ≤ n^2ρ‖A‖_∞2^{−t}. (2.3.32)

Proof:

$$\|E\|_\infty \le \rho\|A\|_\infty 2^{-t}\Bigl(\sum_{j=1}^{n}(2j - 1) - 1\Bigr) < n^2\rho\|A\|_\infty 2^{-t}.$$

Now we shall solve the linear system Ax = b by using the computed factors ˜L and ˜R, i.e., ˜Ly = b and ˜Rx = y.

• For Ly = b: From Algorithm 2.3.1 we have y_1 = fl(b_1/l_{11}),

$$y_i = fl\left(\frac{-l_{i1}y_1 - l_{i2}y_2 - \cdots - l_{i,i-1}y_{i-1} + b_i}{l_{ii}}\right), \tag{2.3.33}$$

for i = 2, 3, . . . , n. From (2.3.10) we have

$$y_1 = \frac{b_1}{l_{11}(1 + \delta_{11})}, \quad |\delta_{11}| \le 2^{-t},$$

$$y_i = \frac{fl(-l_{i1}y_1 - l_{i2}y_2 - \cdots - l_{i,i-1}y_{i-1}) + b_i}{l_{ii}(1 + \delta_{ii})(1 + \delta'_{ii})}, \quad |\delta_{ii}|, |\delta'_{ii}| \le 2^{-t}. \tag{2.3.34}$$

Applying Theorem 2.3.4 we get

fl(−l_{i1}y_1 − l_{i2}y_2 − · · · − l_{i,i−1}y_{i−1}) = −l_{i1}(1 + δ_{i1})y_1 − · · · − l_{i,i−1}(1 + δ_{i,i−1})y_{i−1},

where

|δ_{i1}| ≤ (i − 1) · 1.01 · 2^{−t}, for i = 2, 3, · · · , n,
|δ_{ij}| ≤ (i + 1 − j) · 1.01 · 2^{−t}, for i = 2, 3, · · · , n, j = 2, 3, · · · , i − 1. (2.3.35)

So, (2.3.34) can be written as

l_{11}(1 + δ_{11})y_1 = b_1,
l_{i1}(1 + δ_{i1})y_1 + · · · + l_{i,i−1}(1 + δ_{i,i−1})y_{i−1} + l_{ii}(1 + δ_{ii})(1 + δ'_{ii})y_i = b_i,
for i = 2, 3, · · · , n, (2.3.36)

or

(L + δL)y = b. (2.3.37)

From (2.3.35), (2.3.36) and (2.3.37) it follows that

$$|\delta L| \le 1.01 \cdot 2^{-t} \begin{bmatrix} |l_{11}| & & & & & \\ |l_{21}| & 2|l_{22}| & & & & \\ 2|l_{31}| & 2|l_{32}| & 2|l_{33}| & & & \\ 3|l_{41}| & 3|l_{42}| & 2|l_{43}| & \ddots & & \\ \vdots & \vdots & \vdots & \ddots & \ddots & \\ (n-1)|l_{n1}| & (n-1)|l_{n2}| & (n-2)|l_{n3}| & \cdots & 2|l_{n,n-1}| & 2|l_{nn}| \end{bmatrix}. \tag{2.3.38}$$

This implies

$$\|\delta L\|_\infty \le \frac{n(n+1)}{2} \cdot 1.01 \cdot 2^{-t} \max_{i,j}|l_{ij}| \le \frac{n(n+1)}{2} \cdot 1.01 \cdot 2^{-t}. \tag{2.3.39}$$
Theorem 2.3.7 For the lower triangular linear system Ly = b, the computed solution y is the exact solution of
(L + δL)y = b, where δL satisfies (2.3.38) and (2.3.39).

Applying Theorem 2.3.7 to the linear systems ˜Ly = b and ˜Rx = y, respectively, the computed solution x satisfies

(˜L + δ˜L)(˜R + δ˜R)x = b, or

(˜L˜R + (δ˜L)˜R + ˜L(δ˜R) + (δ˜L)(δ˜R))x = b. (2.3.40)

Since ˜L˜R = A + E, substituting this equation into (2.3.40) we get

[A + E + (δ˜L)˜R + ˜L(δ˜R) + (δ˜L)(δ˜R)]x = b. (2.3.41)

The entries of ˜L and ˜R satisfy

|˜l_{ij}| ≤ 1 and |˜r_{ij}| ≤ ρ‖A‖_∞.

Therefore, we get

‖˜L‖_∞ ≤ n,
‖˜R‖_∞ ≤ nρ‖A‖_∞,
‖δ˜L‖_∞ ≤ (n(n+1)/2) · 1.01 · 2^{−t},
‖δ˜R‖_∞ ≤ (n(n+1)/2) · 1.01 · ρ‖A‖_∞2^{−t}. (2.3.42)

In practical implementation we usually have n^22^{−t} ≪ 1. So it holds that

‖δ˜L‖_∞‖δ˜R‖_∞ ≤ n^2ρ‖A‖_∞2^{−t}.

Let

δA = E + (δ˜L)˜R + ˜L(δ˜R) + (δ˜L)(δ˜R). (2.3.43)

Then, by (2.3.32) and (2.3.42) we get

‖δA‖_∞ ≤ ‖E‖_∞ + ‖δ˜L‖_∞‖˜R‖_∞ + ‖˜L‖_∞‖δ˜R‖_∞ + ‖δ˜L‖_∞‖δ˜R‖_∞
       ≤ 1.01(n^3 + 3n^2)ρ‖A‖_∞2^{−t}. (2.3.44)

Theorem 2.3.8 For a linear system Ax = b the solution x computed by Gaussian elimination with partial pivoting is the exact solution of the equation (A + δA)x = b, where δA satisfies (2.3.43) and (2.3.44).

Remark 2.3.3 The quantity ρ defined by (2.3.27) is called the growth factor. The growth
factor measures how large the numbers become during the process of elimination. In
practice, ρ is usually of order 10 for partial pivot selection. But it can be as large as
ρ = 2^{n−1}, when

$$A = \begin{bmatrix} 1 & 0 & \cdots & 0 & 1 \\ -1 & 1 & \ddots & \vdots & 1 \\ \vdots & \ddots & \ddots & 0 & \vdots \\ -1 & \cdots & -1 & 1 & 1 \\ -1 & \cdots & -1 & -1 & 1 \end{bmatrix}.$$

Better estimates hold for special types of matrices. For example, in the case of upper Hessenberg matrices, that is, matrices of the form

$$A = \begin{bmatrix} \times & \cdots & \cdots & \times \\ \times & \ddots & & \vdots \\ & \ddots & \ddots & \vdots \\ 0 & & \times & \times \end{bmatrix},$$

the bound ρ ≤ n − 1 can be shown. (Hessenberg matrices arise in eigenvalue problems.) For tridiagonal matrices

$$A = \begin{bmatrix} \alpha_1 & \beta_2 & & 0 \\ \gamma_2 & \ddots & \ddots & \\ & \ddots & \ddots & \beta_n \\ 0 & & \gamma_n & \alpha_n \end{bmatrix}$$

it can even be shown that ρ ≤ 2 holds for partial pivot selection. Hence, Gaussian elimination is quite numerically stable in this case.

For complete pivot selection, Wilkinson (1965) has shown that

|a_{ij}^{(k)}| ≤ f(k) max_{i,j} |a_{ij}|

with the function

$$f(k) := k^{1/2}\bigl(2^{1}\,3^{1/2}\,4^{1/3}\cdots k^{1/(k-1)}\bigr)^{1/2}.$$

This function grows relatively slowly with k:

| k    | 10 | 20 | 50  | 100  |
|------|----|----|-----|------|
| f(k) | 19 | 67 | 530 | 3300 |

Even this estimate is too pessimistic in practice. Up until now, no matrix has been found which fails to satisfy

|a_{ij}^{(k)}| ≤ (k + 1) max_{i,j} |a_{ij}|, k = 1, 2, . . . , n − 1,

when complete pivot selection is used. This indicates that Gaussian elimination with complete pivot selection is usually a stable process. Despite this, partial pivot selection is preferred in practice, for the most part, because:

(i) Complete pivot selection is more costly than partial pivot selection. (To compute
A^{(i)}, the maximum from among (n − i + 1)^2 elements must be determined, instead
of n − i + 1 elements as in partial pivot selection.)

(ii) Special structures in a matrix, e.g. the band structure of a tridiagonal matrix, are destroyed under complete pivot selection.

### 2.3.6 Improving and Estimating Accuracy

• Iterative Improvement:

Suppose that the linear system Ax = b has been solved via the LR-factorization P A = LR. Now we want to improve the accuracy of the computed solution x. We compute

r = b − Ax,
Ly = P r,  Rz = y,
x_new = x + z. (2.3.45)

Then in exact arithmetic we have

Ax_new = A(x + z) = (b − r) + Az = b.

Unfortunately, r = fl(b − Ax) renders an x_new that is no more accurate than x. It is
necessary to compute the residual b − Ax with extended precision floating point arithmetic.

Algorithm 2.3.5

Compute P A = LR (t-digit);
Repeat: r := b − Ax (2t-digit);
  solve Ly = P r for y (t-digit);
  solve Rz = y for z (t-digit);
  update x = x + z (t-digit).

This is referred to as iterative improvement. From (2.3.45) we have

r_i = b_i − a_{i1}x_1 − a_{i2}x_2 − · · · − a_{in}x_n. (2.3.46)

Now, r_i can be roughly estimated by 2^{−t} max_j |a_{ij}||x_j|. That is,

‖r‖ ≈ 2^{−t}‖A‖‖x‖. (2.3.47)

Let e = x − A^{−1}b = A^{−1}(Ax − b) = −A^{−1}r. Then we have

‖e‖ ≤ ‖A^{−1}‖‖r‖. (2.3.48)

From (2.3.47) it follows that

‖e‖ ≈ ‖A^{−1}‖ · 2^{−t}‖A‖‖x‖ = 2^{−t} cond(A)‖x‖.

Let

cond(A) = 2^{p}, 0 < p < t (p an integer). (2.3.49)

Then we have

‖e‖/‖x‖ ≈ 2^{−(t−p)}. (2.3.50)

From (2.3.50) we know that x has q = t − p correct significant digits. Since r is computed in double precision, we can assume that it has at least t correct significant digits.

Therefore, for solving Az = r, according to (2.3.50) the solution z (compared with −e =
A^{−1}r) has q digits of accuracy, so that x_new = x + z usually has 2q digits of accuracy. From
the above discussion, the accuracy of x_new is improved by about q digits after one iteration.

Hence we stop the iteration when the number of iterates k (say) satisfies kq ≥ t.

From the above we have

‖z‖/‖x‖ ≈ ‖e‖/‖x‖ ≈ 2^{−q} = 2^{−t}2^{p}. (2.3.51)

From (2.3.49) and (2.3.51) we have

cond(A) = 2^{t} · (‖z‖/‖x‖).

By (2.3.51) we get

q = log_2(‖x‖/‖z‖) and k = t / log_2(‖x‖/‖z‖).
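The whole scheme can be illustrated on a 2 × 2 system, solving in simulated 4-digit arithmetic and computing residuals in full precision (the `fl` and `solve2_rounded` helpers below are our own stand-ins for the t-digit L/R solves of Algorithm 2.3.5):

```python
def fl(x, t):
    """Round x to t significant decimal digits (toy floating point model)."""
    return 0.0 if x == 0 else float(f"%.{t - 1}e" % x)

def solve2_rounded(A, b, t):
    """Solve a 2x2 system by elimination, rounding every result to t digits."""
    (a11, a12), (a21, a22) = A
    m = fl(a21 / a11, t)                       # multiplier
    r22 = fl(a22 - fl(m * a12, t), t)          # eliminated (2,2) entry
    y2 = fl(b[1] - fl(m * b[0], t), t)         # forward substitution
    x2 = fl(y2 / r22, t)                       # back substitution
    x1 = fl(fl(b[0] - fl(a12 * x2, t), t) / a11, t)
    return [x1, x2]

A = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, 1.0]                      # exact solution x = (0.2, 0.4)
x = solve2_rounded(A, b, t=4)       # low-precision solve
for _ in range(3):                  # iterative improvement with precise residuals
    r = [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(A, b)]
    z = solve2_rounded(A, r, t=4)
    x = [xi + zi for xi, zi in zip(x, z)]
print(x)   # close to [0.2, 0.4]
```

A few sweeps recover far more digits than the 4-digit solver delivers on its own.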

In the following we shall give a further discussion of the convergence of the iterative improvement. From Theorem 2.3.8 we know that z in Algorithm 2.3.5 is computed by (A + δA)z = r.

That is,

A(I + F)z = r, (2.3.52)

where F = A^{−1}δA.

Theorem 2.3.9 Let {x_k} be the sequence of improved solutions in Algorithm 2.3.5
for solving Ax = b and x^{∗} = A^{−1}b be the exact solution. Assume that
F_k in (2.3.52) satisfies ‖F_k‖ ≤ σ < 1/2 for all k. Then {x_k} converges to x^{∗}, i.e.,
lim_{k→∞} ‖x_k − x^{∗}‖ = 0.

Proof: From (2.3.52) and r_k = b − Ax_k we have

A(I + F_k)z_k = b − Ax_k. (2.3.53)

Since A is nonsingular, multiplying both sides of (2.3.53) by A^{−1} we get

(I + F_k)z_k = x^{∗} − x_k.

From x_{k+1} = x_k + z_k we have (I + F_k)(x_{k+1} − x_k) = x^{∗} − x_k, i.e.,

(I + F_k)x_{k+1} = F_kx_k + x^{∗}. (2.3.54)

Subtracting (I + F_k)x^{∗} from both sides of (2.3.54) we get

(I + F_k)(x_{k+1} − x^{∗}) = F_k(x_k − x^{∗}).

Applying Corollary 1.2.1 we have

x_{k+1} − x^{∗} = (I + F_k)^{−1}F_k(x_k − x^{∗}).

Hence,

$$\|x_{k+1} - x^{*}\| \le \frac{\|F_k\|\,\|x_k - x^{*}\|}{1 - \|F_k\|} \le \frac{\sigma}{1 - \sigma}\|x_k - x^{*}\|.$$

Let τ = σ/(1 − σ). Then

‖x_k − x^{∗}‖ ≤ τ^{k−1}‖x_1 − x^{∗}‖.

But σ < 1/2 implies τ < 1. This implies the convergence of Algorithm 2.3.5.

Corollary 2.3.1 If

1.01(n^3 + 3n^2)ρ2^{−t}‖A‖‖A^{−1}‖ < 1/2,

then Algorithm 2.3.5 converges.

Proof: From (2.3.52) and (2.3.44) it follows that

‖F_k‖ ≤ 1.01(n^3 + 3n^2)ρ2^{−t} cond(A) < 1/2.

### 2.3.7 Special Linear Systems

Toeplitz Systems

Definition 2.3.2 (i) T ∈ R^{n×n} is called a Toeplitz matrix if there exist scalars r_{−n+1}, · · · , r_0, · · · , r_{n−1}
such that a_{ij} = r_{j−i} for all i, j; e.g.,

$$T = \begin{bmatrix} r_0 & r_1 & r_2 & r_3 \\ r_{-1} & r_0 & r_1 & r_2 \\ r_{-2} & r_{-1} & r_0 & r_1 \\ r_{-3} & r_{-2} & r_{-1} & r_0 \end{bmatrix} \quad (n = 4).$$

(ii) B ∈ R^{n×n} is called a persymmetric matrix if it is symmetric about its northeast-southwest
diagonal, i.e., b_{ij} = b_{n−j+1,n−i+1} for all i, j. That is,

B = EB^{T}E, where E = [e_n, · · · , e_1].

Given scalars r_1, · · · , r_{n−1} such that the matrices

$$T_k = \begin{bmatrix} 1 & r_1 & \cdots & r_{k-1} \\ r_1 & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & r_1 \\ r_{k-1} & \cdots & r_1 & 1 \end{bmatrix}$$

are all positive definite, for k = 1, . . . , n. Three algorithms will be described:

(a) Durbin’s algorithm for the Yule-Walker problem T_ny = −(r_1, . . . , r_n)^{T}.

(b) Levinson’s algorithm for the general right-hand side T_nx = b.

(c) Trench’s algorithm for computing B = T_n^{−1}.

• To (a): Let E_k = [e_k^{(k)}, . . . , e_1^{(k)}]. Suppose the k-th order Yule-Walker system

T_ky = −(r_1, . . . , r_k)^{T} = −r

has been solved. Then the (k + 1)-st order system

$$\begin{bmatrix} T_k & E_kr \\ r^{T}E_k & 1 \end{bmatrix} \begin{bmatrix} z \\ \alpha \end{bmatrix} = \begin{bmatrix} -r \\ -r_{k+1} \end{bmatrix}$$

can be solved in O(k) flops. Observe that

z = T_k^{−1}(−r − αE_kr) = y − αT_k^{−1}E_kr (2.3.55)

and

α = −r_{k+1} − r^{T}E_kz. (2.3.56)

Since T_k^{−1} is persymmetric, T_k^{−1}E_k = E_kT_k^{−1} and z = y + αE_ky. Substituting into (2.3.56)
we get

α = −r_{k+1} − r^{T}E_k(y + αE_ky) = −(r_{k+1} + r^{T}E_ky)/(1 + r^{T}y).

Here (1 + r^{T}y) is positive, because T_{k+1} is positive definite and

$$\begin{bmatrix} I & E_ky \\ 0 & 1 \end{bmatrix}^{T} \begin{bmatrix} T_k & E_kr \\ r^{T}E_k & 1 \end{bmatrix} \begin{bmatrix} I & E_ky \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} T_k & 0 \\ 0 & 1 + r^{T}y \end{bmatrix}.$$

Algorithm 2.3.6 (Durbin’s algorithm, 1960) Let T_ky^{(k)} = −r^{(k)} = −(r_1, . . . , r_k)^{T}.

y^{(1)} = −r_1;
for k = 1, . . . , n − 1,
  β_k = 1 + r^{(k)T}y^{(k)},
  α_k = −(r_{k+1} + r^{(k)T}E_ky^{(k)})/β_k,
  z^{(k)} = y^{(k)} + α_kE_ky^{(k)},
  y^{(k+1)} = [z^{(k)T}, α_k]^{T}.

This algorithm requires (3/2)n^2 flops to generate y = y^{(n)}.
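A pure-Python sketch of the recursion (our own variable names, 0-based indexing), checked by multiplying back with T_3:

```python
def durbin(r):
    """Durbin's algorithm (sketch of Algorithm 2.3.6) for T_n y = -r, where T_n
    is the symmetric positive definite Toeplitz matrix with unit diagonal and
    off-diagonals r = (r_1, ..., r_n)."""
    y = [-r[0]]                                           # y^{(1)}
    for k in range(1, len(r)):
        beta = 1.0 + sum(r[i] * y[i] for i in range(k))   # beta_k
        alpha = -(r[k] + sum(r[i] * y[k - 1 - i] for i in range(k))) / beta
        y = [y[i] + alpha * y[k - 1 - i] for i in range(k)] + [alpha]
    return y

# Check: T_3 y + r should vanish for r = (0.5, 0.2, 0.1).
r = [0.5, 0.2, 0.1]
y = durbin(r)
T = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.5], [0.2, 0.5, 1.0]]
resid = [sum(T[i][j] * y[j] for j in range(3)) + r[i] for i in range(3)]
print(max(abs(v) for v in resid))   # ~ 0
```

The reversal `y[k - 1 - i]` plays the role of E_k y in the derivation above.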

Further reduction: β_k can be computed recursively, since

β_k = 1 + r^{(k)T}y^{(k)}
    = 1 + [r^{(k−1)T}, r_k] [(y^{(k−1)} + α_{k−1}E_{k−1}y^{(k−1)})^{T}, α_{k−1}]^{T}
    = 1 + r^{(k−1)T}y^{(k−1)} + α_{k−1}(r^{(k−1)T}E_{k−1}y^{(k−1)} + r_k)
    = β_{k−1} + α_{k−1}(−β_{k−1}α_{k−1}) = (1 − α_{k−1}^2)β_{k−1}.

• To (b): Consider

T_kx = b^{(k)} = (b_1, · · · , b_k)^{T}, for 1 ≤ k ≤ n. (2.3.57)

We want to solve

$$\begin{bmatrix} T_k & E_kr \\ r^{T}E_k & 1 \end{bmatrix} \begin{bmatrix} \nu \\ \mu \end{bmatrix} = \begin{bmatrix} b^{(k)} \\ b_{k+1} \end{bmatrix}, \tag{2.3.58}$$

where r = (r_1, · · · , r_k)^{T}. Since ν = T_k^{−1}(b^{(k)} − μE_kr) = x + μE_ky, it follows that

μ = b_{k+1} − r^{T}E_kν = b_{k+1} − r^{T}E_kx − μr^{T}y
  = (b_{k+1} − r^{T}E_kx)/(1 + r^{T}y).

We can effect the transition from (2.3.57) to (2.3.58) in O(k) flops. We can solve the system T_nx = b by solving

T_kx^{(k)} = b^{(k)} = (b_1, . . . , b_k)^{T}

and

T_ky^{(k)} = −r^{(k)} = −(r_1, . . . , r_k)^{T}

simultaneously. It needs 2n^2 flops. See Algorithm Levinson (1947) in Matrix Computations, pp. 128–129,
for details.

• To (c): Write

$$T_n^{-1} = \begin{bmatrix} A & Er \\ r^{T}E & 1 \end{bmatrix}^{-1} = \begin{bmatrix} B & \nu \\ \nu^{T} & \gamma \end{bmatrix},$$

where A = T_{n−1}, E = E_{n−1} and r = (r_1, . . . , r_{n−1})^{T}. From the equation

$$\begin{bmatrix} A & Er \\ r^{T}E & 1 \end{bmatrix} \begin{bmatrix} \nu \\ \gamma \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

it follows that

Aν = −γEr = −γE(r_1, . . . , r_{n−1})^{T} and γ = 1 − r^{T}Eν.

If y is the solution of the (n − 1)-st order Yule-Walker system Ay = −r, then

γ = 1/(1 + r^{T}y) and ν = γEy.

Thus the last row and column of T_n^{−1} are readily obtained. Since AB + Erν^{T} = I_{n−1}, we
have

B = A^{−1} − (A^{−1}Er)ν^{T} = A^{−1} + (νν^{T})/γ.

Since A = T_{n−1} is nonsingular and Toeplitz, its inverse is persymmetric. Thus

b_{ij} = (A^{−1})_{ij} + ν_iν_j/γ = (A^{−1})_{n−j,n−i} + ν_iν_j/γ
      = b_{n−j,n−i} − ν_{n−i}ν_{n−j}/γ + ν_iν_j/γ
      = b_{n−j,n−i} + (1/γ)(ν_iν_j − ν_{n−i}ν_{n−j}).

It needs (7/4)n^2 flops. See Algorithm Trench (1964) in Matrix Computations, p. 132, for
details.

Banded Systems

Definition 2.3.3 Let A be an n × n matrix. A is called a (p, q)-banded matrix if a_{ij} = 0
whenever i − j > p or j − i > q. A has the form

$$A = \begin{bmatrix} \times & \cdots & \times & & O \\ \vdots & \ddots & & \ddots & \\ \times & & \ddots & & \times \\ & \ddots & & \ddots & \vdots \\ O & & \times & \cdots & \times \end{bmatrix},$$

with q + 1 nonzero entries in the first row and p + 1 in the first column, where p and q are the lower and upper bandwidths, respectively.

Example 2.3.2 (1, 1): tridiagonal matrix; (1, n − 1): upper Hessenberg matrix; (n − 1, 1):
lower Hessenberg matrix.

Theorem 2.3.10 Let A be a (p, q)-banded matrix. Suppose A has an LR-factorization (A = LR). Then L is a (p, 0)- and R a (0, q)-banded matrix, respectively.

Algorithm 2.3.7 See Algorithm 4.3.1 in Matrix Computations, p. 150.

Theorem 2.3.11 Let A be a (p, q)-banded nonsingular matrix. If Gaussian elimination
with partial pivoting is used to compute Gaussian transformations L_j = I − l_je_j^{T}, for
j = 1, . . . , n − 1, and permutations P_1, . . . , P_{n−1} such that

L_{n−1}P_{n−1} · · · L_1P_1A = R

is upper triangular, then R is a (0, p + q)-banded matrix and l_{ij} = 0 whenever i ≤ j or
i > j + p. (Since the j-th column of L is a permutation of the Gaussian vector l_j, it
follows that L has at most p + 1 nonzero elements per column.)
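For the tridiagonal case (p = q = 1) the banded factorization and solve of Theorem 2.3.10 cost only O(n); the following is our own self-contained sketch (without pivoting), not Algorithm 4.3.1 itself:

```python
def tridiag_lr_solve(a, b, c, d):
    """Solve a tridiagonal system (a (1,1)-banded matrix): sub-diagonal
    a[1..n-1] (a[0] unused), diagonal b[0..n-1], super-diagonal c[0..n-2],
    right-hand side d. L and R inherit the band structure (Theorem 2.3.10)."""
    n = len(b)
    cb, db = b[:], d[:]
    for i in range(1, n):
        m = a[i] / cb[i - 1]            # the single multiplier per row of L
        cb[i] -= m * c[i - 1]           # update the diagonal of R
        db[i] -= m * db[i - 1]          # forward elimination of the RHS
    x = [0.0] * n
    x[-1] = db[-1] / cb[-1]
    for i in range(n - 2, -1, -1):      # back substitution along the band
        x[i] = (db[i] - c[i] * x[i + 1]) / cb[i]
    return x

# The -1/2/-1 stencil with right-hand side (1, 0, 1) has solution (1, 1, 1):
x = tridiag_lr_solve([0.0, -1.0, -1.0], [2.0, 2.0, 2.0], [-1.0, -1.0], [1.0, 0.0, 1.0])
print(x)
```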

Symmetric Indefinite Systems

Consider the linear system Ax = b, where A ∈ R^{n×n} is symmetric but indefinite. There
is a method using n^3/6 flops due to Aasen (1971) that computes the factorization
P AP^{T} = LT L^{T}, where L = [l_{ij}] is unit lower triangular, P is a permutation chosen such
that |l_{ij}| ≤ 1, and T is tridiagonal.

Rather than the above factorization P AP^{T} = LT L^{T}, we consider the calculation of

P AP^{T} = LDL^{T},

where D is block diagonal with 1 × 1 and 2 × 2 blocks on the diagonal, L = [l_{ij}] is unit
lower triangular, and P is a permutation chosen such that |l_{ij}| ≤ 1.

Bunch and Parlett (1971) proposed a pivot strategy to do this; n^3/6 flops are
required. Unfortunately the overall process requires n^3/12 ∼ n^3/6 comparisons. A better
method described by Bunch and Kaufman (1977) requires n^3/6 flops and only O(n^2)
comparisons.

For a detailed discussion of this subsection see pp. 159–168 in Matrix Computations.