Part I
On the Numerical Solutions of
Linear Systems
Chapter 1 Introduction
1.1 Mathematical auxiliary, definitions and relations
1.1.1 Vectors and matrices
A ∈ K^{m×n}, where K = R or C, ⇔
\[
A = [a_{ij}] = \begin{bmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & & \vdots\\ a_{m1} & \cdots & a_{mn} \end{bmatrix}.
\]
• Product of matrices (K^{m×n} × K^{n×p} → K^{m×p}): C = AB, where c_{ij} = Σ_{k=1}^n a_{ik}b_{kj}, i = 1, ..., m, j = 1, ..., p.
• Transpose (R^{m×n} → R^{n×m}): C = A^T, where c_{ij} = a_{ji} ∈ R.
• Conjugate transpose (C^{m×n} → C^{n×m}): C = A^* or C = A^H, where c_{ij} = ā_{ji} ∈ C.
• Differentiation (R^{m×n} → R^{m×n}): Let C(t) = [c_{ij}(t)]. Then Ċ(t) = [ċ_{ij}(t)].
• If A, B ∈ K^{n×n} satisfy AB = I, then B is the inverse of A and is denoted by A^{-1}. If A^{-1} exists, then A is said to be nonsingular; otherwise, A is singular. A is nonsingular if and only if det(A) ≠ 0.
• If A ∈ K^{m×n}, x ∈ K^n and y = Ax, then y_i = Σ_{j=1}^n a_{ij}x_j, i = 1, ..., m.
• Outer product of x ∈ K^m and y ∈ K^n:
\[
xy^* = \begin{bmatrix} x_1\bar{y}_1 & \cdots & x_1\bar{y}_n\\ \vdots & \ddots & \vdots\\ x_m\bar{y}_1 & \cdots & x_m\bar{y}_n \end{bmatrix} \in \mathbb{K}^{m\times n}.
\]
• Inner product of x and y ∈ K^n:
\[
(x, y) := x^Ty = \sum_{i=1}^n x_iy_i = y^Tx \in \mathbb{R} \quad (\mathbb{K} = \mathbb{R}),
\]
\[
(x, y) := x^*y = \sum_{i=1}^n \bar{x}_iy_i = y^*x \in \mathbb{C} \quad (\mathbb{K} = \mathbb{C}).
\]
• Sherman–Morrison formula:
Let A ∈ R^{n×n} be nonsingular and u, v ∈ R^n. If v^TA^{-1}u ≠ −1, then
\[
(A + uv^T)^{-1} = A^{-1} - (1 + v^TA^{-1}u)^{-1}A^{-1}uv^TA^{-1}. \tag{1.1.1}
\]
• Sherman–Morrison–Woodbury formula:
Let A ∈ R^{n×n} be nonsingular and U, V ∈ R^{n×k}. If I + V^TA^{-1}U is invertible, then
\[
(A + UV^T)^{-1} = A^{-1} - A^{-1}U(I + V^TA^{-1}U)^{-1}V^TA^{-1}.
\]
Proof of (1.1.1):
\[
\begin{aligned}
(A + uv^T)\big[A^{-1} - A^{-1}uv^TA^{-1}/(1 + v^TA^{-1}u)\big]
&= I + \frac{1}{1 + v^TA^{-1}u}\big[uv^TA^{-1}(1 + v^TA^{-1}u) - uv^TA^{-1} - uv^TA^{-1}uv^TA^{-1}\big]\\
&= I + \frac{1}{1 + v^TA^{-1}u}\big[u(v^TA^{-1}u)v^TA^{-1} - uv^TA^{-1}uv^TA^{-1}\big] = I,
\end{aligned}
\]
since v^TA^{-1}u is a scalar, so the bracketed term vanishes.
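The identity is easy to sanity-check numerically. The following is a minimal sketch in Python, assuming NumPy is available; the matrix A and vectors u, v are arbitrary test data, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)   # diagonal shift keeps A nonsingular
u = rng.standard_normal(n)
v = rng.standard_normal(n)

Ainv = np.linalg.inv(A)
denom = 1.0 + v @ Ainv @ u                        # nonzero by the hypothesis of (1.1.1)
lhs = np.linalg.inv(A + np.outer(u, v))           # direct inverse of the rank-one update
rhs = Ainv - np.outer(Ainv @ u, v @ Ainv) / denom # Sherman-Morrison
print(np.allclose(lhs, rhs))                      # True
```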
Example 1.1.1
\[
A = \begin{bmatrix} 3 & -1 & 1 & 1 & 1\\ 0 & 1 & 2 & 2 & 2\\ 0 & -1 & 4 & 1 & 1\\ 0 & 0 & 0 & 3 & 0\\ 0 & 0 & 0 & 0 & 3 \end{bmatrix}
= B + \begin{bmatrix} 0\\ 0\\ -1\\ 0\\ 0 \end{bmatrix}\begin{bmatrix} 0 & 1 & 0 & 0 & 0 \end{bmatrix},
\]
where B (obtained from A by deleting its (3,2) entry) is upper triangular, so (1.1.1) reduces the inversion of A to the inversion of the triangular matrix B.
1.1.2 Rank and orthogonality
Let A ∈ R^{m×n}. Then
• R(A) = {y ∈ R^m | y = Ax for some x ∈ R^n} ⊆ R^m is the range space of A.
• N(A) = {x ∈ R^n | Ax = 0} ⊆ R^n is the null space of A.
• rank(A) = dim[R(A)] = the number of maximal linearly independent columns of A.
• rank(A) = rank(A^T).
• dim(N(A)) + rank(A) = n.
• If m = n, then A is nonsingular ⇔ N(A) = {0} ⇔ rank(A) = n.
• Let {x_1, ..., x_p} ⊆ R^n. Then {x_1, ..., x_p} is said to be orthogonal if x_i^T x_j = 0 for i ≠ j, and orthonormal if x_i^T x_j = δ_{ij}, where δ_{ij} = 0 if i ≠ j and δ_{ij} = 1 if i = j.
• S^⊥ = {y ∈ R^m | y^T x = 0 for all x ∈ S} is the orthogonal complement of S ⊆ R^m.
• R^n = R(A^T) ⊕ N(A), R^m = R(A) ⊕ N(A^T).
• R(A^T) ⊥ N(A), R(A)^⊥ = N(A^T).
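As a quick numerical illustration of the rank–nullity relation dim(N(A)) + rank(A) = n above, here is a sketch assuming NumPy and SciPy are available; the rank-one matrix is arbitrary test data.

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1., 2., 3.],
              [2., 4., 6.]])            # rank 1: the second row is twice the first
r = int(np.linalg.matrix_rank(A))
N = null_space(A)                       # orthonormal basis of N(A)
print(r, N.shape[1], r + N.shape[1])    # 1 2 3: rank + dim N(A) = n = 3
print(np.allclose(A @ N, 0.0))          # True: columns of N lie in N(A)
```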
A ∈ R^{n×n}                                        A ∈ C^{n×n}
symmetric: A^T = A                                 Hermitian: A^* = A (A^H = A)
skew-symmetric: A^T = −A                           skew-Hermitian: A^* = −A
positive definite: x^TAx > 0, x ≠ 0                positive definite: x^*Ax > 0, x ≠ 0
non-negative definite: x^TAx ≥ 0                   non-negative definite: x^*Ax ≥ 0
indefinite: (x^TAx)(y^TAy) < 0 for some x, y       indefinite: (x^*Ax)(y^*Ay) < 0 for some x, y
orthogonal: A^TA = I_n                             unitary: A^*A = I_n
normal: A^TA = AA^T                                normal: A^*A = AA^*
positive: a_{ij} > 0
non-negative: a_{ij} ≥ 0

Table 1.1: Some definitions for matrices.
1.1.3 Special matrices
Let A ∈ K^{n×n}. Then the matrix A is
• diagonal if a_{ij} = 0 for i ≠ j. We write D = diag(d_1, ..., d_n) ∈ D_n, where D_n denotes the set of diagonal matrices;
• tridiagonal if a_{ij} = 0 for |i − j| > 1;
• upper bi-diagonal if a_{ij} = 0 for i > j or j > i + 1;
• (strictly) upper triangular if a_{ij} = 0 for i > j (i ≥ j);
• upper Hessenberg if a_{ij} = 0 for i > j + 1.
(The lower bi-diagonal, (strictly) lower triangular, and lower Hessenberg cases are defined analogously.)
Sparse matrix: the number of nonzero entries is about n^{1+r} with r < 1 (usually between 0.2 ∼ 0.5). For example, if n = 1000 and r = 0.9, then n^{1+r} ≈ 501187.
Example 1.1.2 If S is skew-symmetric, then I − S is nonsingular and (I − S)^{-1}(I + S) is orthogonal (the Cayley transformation of S).
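A short numerical check of Example 1.1.2 (a sketch, assuming NumPy; S is a random skew-symmetric test matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n))
S = (M - M.T) / 2                      # skew-symmetric: S^T = -S
I = np.eye(n)
Q = np.linalg.solve(I - S, I + S)      # Cayley transformation (I - S)^{-1}(I + S)
print(np.allclose(Q.T @ Q, I))         # True: Q is orthogonal
```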
1.1.4 Eigenvalues and eigenvectors
Definition 1.1.1 Let A ∈ C^{n×n}. Then λ ∈ C is called an eigenvalue of A if there exists x ≠ 0, x ∈ C^n, with Ax = λx; x is called an eigenvector corresponding to λ.
Notations:
σ(A) := spectrum of A = the set of eigenvalues of A.
ρ(A) := spectral radius of A = max{|λ| : λ ∈ σ(A)}.
• λ ∈ σ(A) ⇔ det(A − λI) = 0.
• p(λ) = det(λI − A) = the characteristic polynomial of A.
• p(λ) = Π_{i=1}^s (λ − λ_i)^{m(λ_i)}, where λ_i ≠ λ_j for i ≠ j and Σ_{i=1}^s m(λ_i) = n.
• m(λ_i) = the algebraic multiplicity of λ_i.
• n(λ_i) = n − rank(A − λ_iI) = the geometric multiplicity of λ_i.
• 1 ≤ n(λ_i) ≤ m(λ_i).
Definition 1.1.2 If there is some i such that n(λ_i) < m(λ_i), then A is called degenerate (defective).
The following statements are equivalent:
(1) There are n linearly independent eigenvectors;
(2) A is diagonalizable, i.e., there is a nonsingular matrix T such that T^{-1}AT ∈ D_n;
(3) For each λ ∈ σ(A), it holds that m(λ) = n(λ).
If A is degenerate, then its eigenvectors together with its principal vectors yield the Jordan form of A. (See Gantmacher: Matrix Theory I, II.)
Theorem 1.1.1 (Schur) (1) Let A ∈ C^{n×n}. There is a unitary matrix U such that U^*AU (= U^{-1}AU) is upper triangular.
(2) Let A ∈ R^{n×n}. There is an orthogonal matrix Q such that Q^TAQ (= Q^{-1}AQ) is quasi-upper triangular, i.e., an upper triangular matrix possibly with nonzero subdiagonal elements in non-consecutive positions.
(3) A is normal if and only if there is a unitary U such that U^*AU = D is diagonal.
(4) A is Hermitian if and only if A is normal and σ(A) ⊆ R.
(5) A is symmetric if and only if there is an orthogonal U such that U^TAU = D is diagonal and σ(A) ⊆ R.
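Part (1) can be explored with SciPy's Schur factorization. A minimal sketch, assuming SciPy is installed; A is random test data:

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
T, U = schur(A, output='complex')                # A = U T U^*, T upper triangular
print(np.allclose(U.conj().T @ U, np.eye(4)))    # True: U is unitary
print(np.allclose(np.tril(T, -1), 0.0))          # True: T is upper triangular
print(np.allclose(U @ T @ U.conj().T, A))        # True: A is recovered
```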
1.2 Norms and eigenvalues
Let X be a vector space over K = R or C.
Definition 1.2.1 (Vector norms) Let N be a real-valued function defined on X (N : X → R_+). Then N is a (vector) norm if
N1: N(αx) = |α|N(x), α ∈ K, for x ∈ X;
N2: N(x + y) ≤ N(x) + N(y), for x, y ∈ X;
N3: N(x) = 0 if and only if x = 0.
The usual notation is ‖x‖ = N(x).
Example 1.2.1 Let X = C^n, p ≥ 1. Then ‖x‖_p = (Σ_{i=1}^n |x_i|^p)^{1/p} is a p-norm. Especially,
\[
\|x\|_1 = \sum_{i=1}^n |x_i| \quad\text{(1-norm)},
\]
\[
\|x\|_2 = \Big(\sum_{i=1}^n |x_i|^2\Big)^{1/2} \quad\text{(2-norm = Euclidean norm)},
\]
\[
\|x\|_\infty = \max_{1\le i\le n}|x_i| \quad\text{(∞-norm = maximum norm)}.
\]
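These norms are easy to compute; a minimal sketch assuming NumPy, with x arbitrary test data:

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])
print(np.abs(x).sum(),               np.linalg.norm(x, 1))       # 1-norm
print(np.sqrt((np.abs(x)**2).sum()), np.linalg.norm(x, 2))       # 2-norm
print(np.abs(x).max(),               np.linalg.norm(x, np.inf))  # infinity-norm
```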
Lemma 1.2.1 N(x) is a continuous function of the components x_1, ..., x_n of x.
Proof:
\[
|N(x) - N(y)| \le N(x - y) \le \sum_{j=1}^n |x_j - y_j|\,N(e_j) \le \|x - y\|_\infty\sum_{j=1}^n N(e_j).
\]
Theorem 1.2.1 (Equivalence of norms) Let N and M be two norms on C^n. Then there are constants c_1, c_2 > 0 such that
\[
c_1M(x) \le N(x) \le c_2M(x), \quad \text{for all } x \in \mathbb{C}^n.
\]
Proof: Without loss of generality (W.L.O.G.) we can assume that M(x) = ‖x‖_∞ and N is arbitrary. We claim that
\[
c_1\|x\|_\infty \le N(x) \le c_2\|x\|_\infty, \quad\text{equivalently,}\quad c_1 \le N(z) \le c_2 \ \text{ for all } z \in S = \{z \in \mathbb{C}^n \mid \|z\|_\infty = 1\}.
\]
From Lemma 1.2.1, N is continuous on S (closed and bounded). By the maximum and minimum principle, there are c_1, c_2 ≥ 0 and z_1, z_2 ∈ S such that
\[
c_1 = N(z_1) \le N(z) \le N(z_2) = c_2.
\]
If c_1 = 0, then N(z_1) = 0, and thus z_1 = 0. This contradicts z_1 ∈ S.
Remark 1.2.1 Theorem 1.2.1 does not hold in infinite-dimensional spaces.
Definition 1.2.2 (Matrix norms) Let A ∈ C^{m×n}. A real-valued function ‖·‖ : C^{m×n} → R_+ satisfying
N1: ‖αA‖ = |α|‖A‖;
N2: ‖A + B‖ ≤ ‖A‖ + ‖B‖;
N3: ‖A‖ = 0 if and only if A = 0;
N4: ‖AB‖ ≤ ‖A‖‖B‖;
N5: ‖Ax‖_v ≤ ‖A‖‖x‖_v (matrix and vector norms are compatible for some vector norm ‖·‖_v)
is called a matrix norm. If ‖·‖ satisfies N1 to N4, then it is called a multiplicative or algebra norm.
Example 1.2.2 (Frobenius norm) Let ‖A‖_F = [Σ_{i,j=1}^n |a_{ij}|^2]^{1/2}. Then
\[
\begin{aligned}
\|AB\|_F &= \Big(\sum_{i,j}\Big|\sum_k a_{ik}b_{kj}\Big|^2\Big)^{1/2}\\
&\le \Big(\sum_{i,j}\Big\{\sum_k |a_{ik}|^2\Big\}\Big\{\sum_k |b_{kj}|^2\Big\}\Big)^{1/2} \quad\text{(Cauchy–Schwarz inequality)}\\
&= \Big(\sum_i\sum_k |a_{ik}|^2\Big)^{1/2}\Big(\sum_j\sum_k |b_{kj}|^2\Big)^{1/2} = \|A\|_F\|B\|_F. \tag{1.2.1}
\end{aligned}
\]
This implies that N4 holds. Furthermore, by the Cauchy–Schwarz inequality we have
\[
\|Ax\|_2 = \Big(\sum_i\Big|\sum_j a_{ij}x_j\Big|^2\Big)^{1/2} \le \Big(\sum_i\Big(\sum_j |a_{ij}|^2\Big)\Big(\sum_j |x_j|^2\Big)\Big)^{1/2} = \|A\|_F\|x\|_2. \tag{1.2.2}
\]
This implies that N5 holds. Also, N1, N2 and N3 hold obviously. (Here, ‖I‖_F = √n.)
Example 1.2.3 (Operator norm) Given a vector norm ‖·‖, an associated (induced) matrix norm is defined by
\[
\|A\| = \sup_{x\ne 0}\frac{\|Ax\|}{\|x\|} = \max_{x\ne 0}\frac{\|Ax\|}{\|x\|}. \tag{1.2.3}
\]
Then N5 holds immediately. On the other hand,
\[
\|(AB)x\| = \|A(Bx)\| \le \|A\|\|Bx\| \le \|A\|\|B\|\|x\| \tag{1.2.4}
\]
for all x ≠ 0. This implies that
\[
\|AB\| \le \|A\|\|B\|, \tag{1.2.5}
\]
so N4 holds. (Here ‖I‖ = 1.)
In the following, we present and verify three useful matrix norms:
\[
\|A\|_1 = \sup_{x\ne 0}\frac{\|Ax\|_1}{\|x\|_1} = \max_{1\le j\le n}\sum_{i=1}^n |a_{ij}|, \tag{1.2.6}
\]
\[
\|A\|_\infty = \sup_{x\ne 0}\frac{\|Ax\|_\infty}{\|x\|_\infty} = \max_{1\le i\le n}\sum_{j=1}^n |a_{ij}|, \tag{1.2.7}
\]
\[
\|A\|_2 = \sup_{x\ne 0}\frac{\|Ax\|_2}{\|x\|_2} = \sqrt{\rho(A^*A)}. \tag{1.2.8}
\]
Proof of (1.2.6):
\[
\|Ax\|_1 = \sum_i\Big|\sum_j a_{ij}x_j\Big| \le \sum_i\sum_j |a_{ij}||x_j| = \sum_j |x_j|\sum_i |a_{ij}|.
\]
Let C_1 := Σ_i |a_{ik}| = max_j Σ_i |a_{ij}|. Then ‖Ax‖_1 ≤ C_1‖x‖_1, thus ‖A‖_1 ≤ C_1. On the other hand, ‖e_k‖_1 = 1 and ‖Ae_k‖_1 = Σ_{i=1}^n |a_{ik}| = C_1.
Proof of (1.2.7):
\[
\|Ax\|_\infty = \max_i\Big|\sum_j a_{ij}x_j\Big| \le \max_i\sum_j |a_{ij}x_j| \le \Big(\max_i\sum_j |a_{ij}|\Big)\|x\|_\infty \equiv \sum_j |a_{kj}|\,\|x\|_\infty \equiv C_\infty\|x\|_\infty.
\]
This implies that ‖A‖_∞ ≤ C_∞. If A = 0, there is nothing to prove. Assume that A ≠ 0 and that the k-th row of A is nonzero. Define z = [z_j] ∈ C^n by
\[
z_j = \begin{cases} \bar{a}_{kj}/|a_{kj}|, & \text{if } a_{kj} \ne 0,\\ 1, & \text{if } a_{kj} = 0.\end{cases}
\]
Then ‖z‖_∞ = 1 and a_{kj}z_j = |a_{kj}| for j = 1, ..., n. It follows that
\[
\|A\|_\infty \ge \|Az\|_\infty = \max_i\Big|\sum_j a_{ij}z_j\Big| \ge \Big|\sum_j a_{kj}z_j\Big| = \sum_{j=1}^n |a_{kj}| \equiv C_\infty.
\]
Thus, ‖A‖_∞ ≥ max_{1≤i≤n} Σ_{j=1}^n |a_{ij}| ≡ C_∞.
Proof of (1.2.8): Let λ_1 ≥ λ_2 ≥ ... ≥ λ_n ≥ 0 be the eigenvalues of A^*A. There are mutually orthonormal vectors v_j, j = 1, ..., n, such that (A^*A)v_j = λ_jv_j. Let x = Σ_j α_jv_j. Since ‖Ax‖_2^2 = (Ax, Ax) = (x, A^*Ax),
\[
\|Ax\|_2^2 = \Big(\sum_j \alpha_jv_j, \sum_j \alpha_j\lambda_jv_j\Big) = \sum_j \lambda_j|\alpha_j|^2 \le \lambda_1\|x\|_2^2.
\]
Therefore ‖A‖_2^2 ≤ λ_1. Equality follows by choosing x = v_1, for which ‖Av_1‖_2^2 = (v_1, λ_1v_1) = λ_1. So we have ‖A‖_2 = √(ρ(A^*A)).
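The three formulas can be checked against library routines. A sketch assuming NumPy; A is random test data:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 4))
print(np.isclose(np.linalg.norm(A, 1),      np.abs(A).sum(axis=0).max()))  # (1.2.6)
print(np.isclose(np.linalg.norm(A, np.inf), np.abs(A).sum(axis=1).max()))  # (1.2.7)
rho = np.max(np.linalg.eigvalsh(A.T @ A))   # spectral radius of A^T A (A real here)
print(np.isclose(np.linalg.norm(A, 2), np.sqrt(rho)))                      # (1.2.8)
```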
Example 1.2.4 (Dual norm) Let 1/p + 1/q = 1. Then ‖·‖_p^* = ‖·‖_q (p = ∞, q = 1). (This follows from the Hölder inequality |y^*x| ≤ ‖x‖_p‖y‖_q.)
Theorem 1.2.2 Let A ∈ C^{n×n}. Then for any operator norm ‖·‖, it holds that ρ(A) ≤ ‖A‖. Moreover, for any ε > 0, there exists an operator norm ‖·‖_ε such that ‖A‖_ε ≤ ρ(A) + ε.
Proof: Let |λ| = ρ(A) ≡ ρ and let x be an associated eigenvector with ‖x‖ = 1. Then
\[
\rho(A) = |\lambda| = \|\lambda x\| = \|Ax\| \le \|A\|\|x\| = \|A\|.
\]
On the other hand, there is a unitary matrix U such that A = U^*RU, where R is upper triangular. Let D_t = diag(t, t^2, ..., t^n). Compute
\[
D_tRD_t^{-1} = \begin{bmatrix}
\lambda_1 & t^{-1}r_{12} & t^{-2}r_{13} & \cdots & t^{-n+1}r_{1n}\\
 & \lambda_2 & t^{-1}r_{23} & \cdots & t^{-n+2}r_{2n}\\
 & & \lambda_3 & \ddots & \vdots\\
 & & & \ddots & t^{-1}r_{n-1,n}\\
 & & & & \lambda_n
\end{bmatrix}.
\]
For t > 0 sufficiently large, the sum of all absolute values of the off-diagonal elements of D_tRD_t^{-1} is less than ε. So it holds that ‖D_tRD_t^{-1}‖_1 ≤ ρ(A) + ε for sufficiently large t(ε) > 0. Define ‖·‖_ε for any B by
\[
\|B\|_\varepsilon = \|D_tUBU^*D_t^{-1}\|_1 = \|(U^*D_t^{-1})^{-1}B(U^*D_t^{-1})\|_1.
\]
This implies that
\[
\|A\|_\varepsilon = \|D_tRD_t^{-1}\|_1 \le \rho(A) + \varepsilon.
\]
Remark 1.2.2 For unitary U and V,
\[
\|UAV\|_F = \|A\|_F \quad \big(\text{by } \|UA\|_F = \sqrt{\|Ua_1\|_2^2 + \cdots + \|Ua_n\|_2^2}\,\big), \tag{1.2.9}
\]
\[
\|UAV\|_2 = \|A\|_2 \quad \big(\text{by } \rho(A^*A) = \rho(AA^*)\big). \tag{1.2.10}
\]
Theorem 1.2.3 (Singular Value Decomposition (SVD)) Let A ∈ C^{m×n}. Then there exist unitary matrices U = [u_1, ..., u_m] ∈ C^{m×m} and V = [v_1, ..., v_n] ∈ C^{n×n} such that
\[
U^*AV = \mathrm{diag}(\sigma_1, \dots, \sigma_p) = \Sigma,
\]
where p = min{m, n} and σ_1 ≥ σ_2 ≥ ... ≥ σ_p ≥ 0. (Here, σ_i denotes the i-th largest singular value of A.)
Proof: There are x ∈ C^n, y ∈ C^m with ‖x‖_2 = ‖y‖_2 = 1 such that Ax = σy, where σ = ‖A‖_2 (‖A‖_2 = sup_{‖x‖_2=1}‖Ax‖_2). Let V = [x, V_1] ∈ C^{n×n} and U = [y, U_1] ∈ C^{m×m} be unitary. Then
\[
A_1 \equiv U^*AV = \begin{bmatrix} \sigma & w^*\\ 0 & B \end{bmatrix}.
\]
Since
\[
\left\|A_1\begin{bmatrix}\sigma\\ w\end{bmatrix}\right\|_2^2 \ge (\sigma^2 + w^*w)^2,
\]
it follows from
\[
\|A_1\|_2^2 \ge \left\|A_1\begin{bmatrix}\sigma\\ w\end{bmatrix}\right\|_2^2 \Big/ \left\|\begin{bmatrix}\sigma\\ w\end{bmatrix}\right\|_2^2 \ge \sigma^2 + w^*w
\]
that ‖A_1‖_2^2 ≥ σ^2 + w^*w. But σ^2 = ‖A‖_2^2 = ‖A_1‖_2^2, which implies w = 0. Hence the theorem holds by induction.
Remark 1.2.3 ‖A‖_2 = √(ρ(A^*A)) = σ_1 = the maximal singular value of A.
Let A = UΣV^*. Then we have
\[
\|ABC\|_F = \|U\Sigma V^*BC\|_F = \|\Sigma V^*BC\|_F \le \sigma_1\|BC\|_F = \|A\|_2\|BC\|_F.
\]
This implies
\[
\|ABC\|_F \le \|A\|_2\|B\|_F\|C\|_2. \tag{1.2.11}
\]
In addition, by (1.2.2) and (1.2.11), we get
\[
\|A\|_2 \le \|A\|_F \le \sqrt{n}\,\|A\|_2. \tag{1.2.12}
\]
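Remark 1.2.3 and (1.2.12) in a short numerical sketch (assuming NumPy; A is random test data):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 6, 4
A = rng.standard_normal((m, n))
sigma = np.linalg.svd(A, compute_uv=False)  # singular values, in descending order
n2 = np.linalg.norm(A, 2)
nF = np.linalg.norm(A, 'fro')
print(np.isclose(n2, sigma[0]))             # ||A||_2 = sigma_1
print(n2 <= nF <= np.sqrt(n) * n2)          # (1.2.12), with n = number of columns
```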
Theorem 1.2.4 Let A ∈ C^{n×n}. The following statements are equivalent:
(1) lim_{m→∞} A^m = 0;
(2) lim_{m→∞} A^m x = 0 for all x;
(3) ρ(A) < 1.
Proof: (1) ⇒ (2): Trivial. (2) ⇒ (3): Let λ ∈ σ(A), i.e., Ax = λx with x ≠ 0. Then A^m x = λ^m x → 0 forces λ^m → 0. Thus |λ| < 1, i.e., ρ(A) < 1. (3) ⇒ (1): There is an operator norm ‖·‖ with ‖A‖ < 1 (by Theorem 1.2.2). Therefore, ‖A^m‖ ≤ ‖A‖^m → 0, i.e., A^m → 0.
Theorem 1.2.5 It holds that
\[
\rho(A) = \lim_{k\to\infty}\|A^k\|^{1/k},
\]
where ‖·‖ is an operator norm.
Proof: Since ρ(A)^k = ρ(A^k) ≤ ‖A^k‖, we have ρ(A) ≤ ‖A^k‖^{1/k} for k = 1, 2, .... If ε > 0, then Ã = [ρ(A) + ε]^{-1}A has spectral radius less than 1, and by Theorem 1.2.4 we have ‖Ã^k‖ → 0 as k → ∞. There is an N = N(ε, A) such that ‖Ã^k‖ < 1 for all k ≥ N. Thus ‖A^k‖ ≤ [ρ(A) + ε]^k for all k ≥ N, or ‖A^k‖^{1/k} ≤ ρ(A) + ε for all k ≥ N. Since ρ(A) ≤ ‖A^k‖^{1/k} and ε is arbitrary, lim_{k→∞}‖A^k‖^{1/k} exists and equals ρ(A).
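The limit in Theorem 1.2.5 can be watched numerically. A sketch assuming NumPy; A is random test data scaled so that ρ(A) = 1/2:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))
A /= 2.0 * np.max(np.abs(np.linalg.eigvals(A)))   # scale so that rho(A) = 1/2
for k in (1, 5, 25, 125):
    nk = np.linalg.norm(np.linalg.matrix_power(A, k), 2) ** (1.0 / k)
    print(k, nk)                                  # decreases toward rho(A) = 0.5
```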
Theorem 1.2.6 Let A ∈ C^{n×n} with ρ(A) < 1. Then (I − A)^{-1} exists and
\[
(I - A)^{-1} = I + A + A^2 + \cdots.
\]
Proof: Since ρ(A) < 1, the eigenvalues 1 − λ, λ ∈ σ(A), of I − A are nonzero, so (I − A)^{-1} exists. From
\[
(I - A)(I + A + A^2 + \cdots + A^m) = I - A^{m+1} \to I \quad (\text{by Theorem 1.2.4}),
\]
the series converges to (I − A)^{-1}.
Corollary 1.2.1 If ‖A‖ < 1, then (I − A)^{-1} exists and
\[
\|(I - A)^{-1}\| \le \frac{1}{1 - \|A\|}.
\]
Proof: Since ρ(A) ≤ ‖A‖ < 1 (by Theorem 1.2.2),
\[
\|(I - A)^{-1}\| = \Big\|\sum_{i=0}^\infty A^i\Big\| \le \sum_{i=0}^\infty \|A\|^i = (1 - \|A\|)^{-1}.
\]
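Theorem 1.2.6 and Corollary 1.2.1, checked on a small example (a sketch assuming NumPy; A is random test data scaled so that ‖A‖_2 = 1/2):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 4))
A *= 0.5 / np.linalg.norm(A, 2)          # enforce ||A||_2 = 1/2 < 1
I = np.eye(4)
S = I.copy()
term = I.copy()
for _ in range(60):                      # partial sum I + A + ... + A^60
    term = term @ A
    S += term
print(np.allclose(S, np.linalg.inv(I - A)))                        # True
print(np.linalg.norm(np.linalg.inv(I - A), 2) <= 1.0 / (1 - 0.5))  # Corollary 1.2.1
```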
Theorem 1.2.7 (Without proof) For A ∈ K^{n×n} the following statements are equivalent:
(1) There is a multiplicative norm p with p(A^k) ≤ 1, k = 1, 2, ....
(2) For each multiplicative norm p, the powers p(A^k) are uniformly bounded, i.e., there exists an M(p) < ∞ such that p(A^k) ≤ M(p), k = 0, 1, 2, ....
(3) ρ(A) ≤ 1 and all eigenvalues λ with |λ| = 1 are not degenerate (i.e., m(λ) = n(λ)).
(See Householder's book: The Theory of Matrices in Numerical Analysis, pp. 45–47.)
In the following we prove some important inequalities for vector norms and matrix norms.
(a) It holds that
\[
1 \le \frac{\|x\|_p}{\|x\|_q} \le n^{(q-p)/pq}, \quad (p \le q). \tag{1.2.13}
\]
Proof: Claim ‖x‖_q ≤ ‖x‖_p (p ≤ q): It holds that
\[
\|x\|_q = \|x\|_p\left\|\frac{x}{\|x\|_p}\right\|_q \le C_{p,q}\|x\|_p,
\]
where
\[
C_{p,q} = \max_{\|e\|_p = 1}\|e\|_q, \quad e = (e_1, \dots, e_n)^T.
\]
We now show that C_{p,q} ≤ 1. From p ≤ q, we have
\[
\|e\|_q^q = \sum_{i=1}^n |e_i|^q \le \sum_{i=1}^n |e_i|^p = 1 \quad (\text{by } |e_i| \le 1).
\]
Hence C_{p,q} ≤ 1, thus ‖x‖_q ≤ ‖x‖_p.
To prove the second inequality, let α = q/p > 1. Then the Jensen inequality holds for the convex function ϕ:
\[
\varphi\Big(\int_\Omega f\,d\mu\Big) \le \int_\Omega (\varphi\circ f)\,d\mu, \quad \mu(\Omega) = 1.
\]
If we take ϕ(x) = x^α, then we have
\[
\int_\Omega |f|^q\,dx = \int_\Omega (|f|^p)^{q/p}\,dx \ge \Big(\int_\Omega |f|^p\,dx\Big)^{q/p}
\]
with |Ω| = 1. Consider the discrete measure Σ_{i=1}^n 1/n = 1 and f(i) = |x_i|. It follows that
\[
\sum_{i=1}^n |x_i|^q\,\frac{1}{n} \ge \Big(\sum_{i=1}^n |x_i|^p\,\frac{1}{n}\Big)^{q/p}.
\]
Hence we have n^{-1/q}‖x‖_q ≥ n^{-1/p}‖x‖_p. Thus,
\[
n^{(q-p)/pq}\|x\|_q \ge \|x\|_p.
\]
(b) It holds that
\[
1 \le \frac{\|x\|_p}{\|x\|_\infty} \le n^{1/p}. \tag{1.2.14}
\]
Proof: Let q → ∞ in (1.2.13) and note that lim_{q→∞}‖x‖_q = ‖x‖_∞. Indeed, with |x_k| = max_i |x_i|,
\[
\|x\|_\infty = |x_k| = (|x_k|^q)^{1/q} \le \Big(\sum_{i=1}^n |x_i|^q\Big)^{1/q} = \|x\|_q.
\]
On the other hand, we have
\[
\|x\|_q = \Big(\sum_{i=1}^n |x_i|^q\Big)^{1/q} \le (n\|x\|_\infty^q)^{1/q} = n^{1/q}\|x\|_\infty,
\]
which implies that lim_{q→∞}‖x‖_q = ‖x‖_∞.
(c) It holds that
\[
\max_{1\le j\le n}\|a_j\|_p \le \|A\|_p \le n^{(p-1)/p}\max_{1\le j\le n}\|a_j\|_p, \tag{1.2.15}
\]
where A = [a_1, ..., a_n] ∈ R^{m×n}.
Proof: The first inequality holds obviously. To show the second inequality, for ‖y‖_p = 1 we have
\[
\|Ay\|_p \le \sum_{j=1}^n |y_j|\,\|a_j\|_p \le \Big(\sum_{j=1}^n |y_j|\Big)\max_j\|a_j\|_p = \|y\|_1\max_j\|a_j\|_p \le n^{(p-1)/p}\max_j\|a_j\|_p \quad (\text{by } (1.2.13)).
\]
(d) It holds that
\[
\max_{i,j}|a_{ij}| \le \|A\|_p \le n^{(p-1)/p}m^{1/p}\max_{i,j}|a_{ij}|, \tag{1.2.16}
\]
where A ∈ R^{m×n}.
Proof: By (1.2.14) and (1.2.15) immediately.
(e) It holds that
\[
m^{(1-p)/p}\|A\|_1 \le \|A\|_p \le n^{(p-1)/p}\|A\|_1. \tag{1.2.17}
\]
Proof: By (1.2.15) and (1.2.13) immediately.
(f) By the Hölder inequality (see the Appendix below), we have
\[
|y^*x| \le \|x\|_p\|y\|_q, \quad \frac{1}{p} + \frac{1}{q} = 1,
\]
or
\[
\max\{|x^*y| : \|y\|_q = 1\} = \|x\|_p. \tag{1.2.18}
\]
Then it holds that
\[
\|A\|_p = \|A^T\|_q. \tag{1.2.19}
\]
Proof: By (1.2.18) we have
\[
\max_{\|x\|_p=1}\|Ax\|_p = \max_{\|x\|_p=1}\max_{\|y\|_q=1}|(Ax)^Ty| = \max_{\|y\|_q=1}\max_{\|x\|_p=1}|x^T(A^Ty)| = \max_{\|y\|_q=1}\|A^Ty\|_q = \|A^T\|_q.
\]
(g) It holds that
\[
n^{-1/p}\|A\|_\infty \le \|A\|_p \le m^{1/p}\|A\|_\infty. \tag{1.2.20}
\]
Proof: By (1.2.17) and (1.2.19), we get
\[
m^{1/p}\|A\|_\infty = m^{1/p}\|A^T\|_1 = m^{1 - 1/q}\|A^T\|_1 = m^{(q-1)/q}\|A^T\|_1 \ge \|A^T\|_q = \|A\|_p.
\]
(h) It holds that
\[
\|A\|_2 \le \sqrt{\|A\|_p\|A\|_q}, \quad \Big(\frac{1}{p} + \frac{1}{q} = 1\Big). \tag{1.2.21}
\]
Proof: By (1.2.19) we have
\[
\|A\|_p\|A\|_q = \|A^T\|_q\|A\|_q \ge \|A^TA\|_q \ge \|A^TA\|_2 = \|A\|_2^2.
\]
Here ‖A^TA‖_q ≥ ‖A^TA‖_2 holds by the following statement: let S be a symmetric matrix. Then ‖S‖_2 ≤ ‖S‖ for any matrix operator norm ‖·‖. Since |λ| ≤ ‖S‖ for every λ ∈ σ(S) and
\[
\|S\|_2 = \sqrt{\rho(S^*S)} = \sqrt{\rho(S^2)} = \max_{\lambda\in\sigma(S)}|\lambda| = |\lambda_{\max}|,
\]
this implies ‖S‖_2 ≤ ‖S‖.
(i) For A ∈ R^{m×n} and q ≥ p ≥ 1, it holds that
\[
n^{(p-q)/pq}\|A\|_q \le \|A\|_p \le m^{(q-p)/pq}\|A\|_q. \tag{1.2.22}
\]
Proof: By (1.2.13), we get
\[
\|A\|_p = \max_{\|x\|_p=1}\|Ax\|_p \le \max_{\|x\|_q\le 1}m^{(q-p)/pq}\|Ax\|_q = m^{(q-p)/pq}\|A\|_q.
\]
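Two of these inequalities, spot-checked numerically: (1.2.13) with p = 1, q = 2, and (1.2.20) with p = 1 (a sketch assuming NumPy; x and A are random test data):

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.standard_normal(10)
n = x.size
ratio = np.linalg.norm(x, 1) / np.linalg.norm(x, 2)
print(1.0 <= ratio <= np.sqrt(n))                 # (1.2.13): n^{(q-p)/pq} = sqrt(n)

m, n = 7, 5
A = rng.standard_normal((m, n))
a1   = np.linalg.norm(A, 1)
ainf = np.linalg.norm(A, np.inf)
print(ainf / n <= a1 <= m * ainf)                 # (1.2.20) with p = 1
```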
Appendix: Proof of the Hölder inequality and (1.2.18)
Taking ϕ(x) = e^x in Jensen's inequality, we have
\[
\exp\Big(\int_\Omega f\,d\mu\Big) \le \int_\Omega e^f\,d\mu.
\]
Let Ω be the finite set {p_1, ..., p_n} with μ({p_i}) = 1/n and f(p_i) = x_i. Then
\[
\exp\Big(\frac{1}{n}(x_1 + \cdots + x_n)\Big) \le \frac{1}{n}\big(e^{x_1} + \cdots + e^{x_n}\big).
\]
Taking y_i = e^{x_i}, we have
\[
(y_1\cdots y_n)^{1/n} \le \frac{1}{n}(y_1 + \cdots + y_n).
\]
Taking μ({p_i}) = q_i > 0 with Σ_{i=1}^n q_i = 1, we have
\[
y_1^{q_1}\cdots y_n^{q_n} \le q_1y_1 + \cdots + q_ny_n. \tag{1.2.23}
\]
Let α_i = x_i/‖x‖_p and β_i = y_i/‖y‖_q, where x = [x_1, ..., x_n]^T, y = [y_1, ..., y_n]^T, α = [α_1, ..., α_n]^T and β = [β_1, ..., β_n]^T. By (1.2.23) we have
\[
\alpha_i\beta_i \le \frac{1}{p}\alpha_i^p + \frac{1}{q}\beta_i^q.
\]
Since ‖α‖_p = 1 and ‖β‖_q = 1, it holds that
\[
\sum_{i=1}^n \alpha_i\beta_i \le \frac{1}{p} + \frac{1}{q} = 1.
\]
Thus,
\[
|x^Ty| \le \|x\|_p\|y\|_q.
\]
To show max{|x^Ty| : ‖x‖_p = 1} = ‖y‖_q, take x_i = y_i^{q-1}/‖y‖_q^{q/p}; then
\[
\|x\|_p^p = \frac{\sum_{i=1}^n |y_i|^{(q-1)p}}{\|y\|_q^q} = 1.
\]
Note that (q − 1)p = q. Then
\[
\sum_{i=1}^n x_iy_i = \frac{\sum_{i=1}^n |y_i|^q}{\|y\|_q^{q/p}} = \frac{\|y\|_q^q}{\|y\|_q^{q/p}} = \|y\|_q.
\]
The following two properties are useful in the following sections.
(i) There exists ẑ with ‖ẑ‖_p = 1 such that ‖y‖_q = ẑ^Ty. Let z = ẑ/‖y‖_q. Then we have z^Ty = 1 and ‖z‖_p = 1/‖y‖_q.
(ii) From duality, we have ‖y‖ = (‖y‖^*)^* = max_{‖u‖^*=1}|y^Tu| = y^Tẑ with ‖ẑ‖^* = 1. Let z = ẑ/‖y‖. Then we have z^Ty = 1 and ‖z‖^* = 1/‖y‖.
1.3 The Sensitivity of the Linear System Ax = b
1.3.1 Backward Error and Forward Error
Let x = F(a). We define backward and forward errors in Figure 1.1. In Figure 1.1, x̂ + Δx = F(a + Δa) is called a mixed forward-backward error, where |Δx| ≤ ε|x| and |Δa| ≤ η|a|.
Definition 1.3.1 (i) An algorithm is backward stable if, for all a, it produces a computed x̂ with a small backward error, i.e., x̂ = F(a + Δa) with Δa small.
(ii) An algorithm is numerically stable if it is stable in the mixed forward-backward error sense, i.e., x̂ + Δx = F(a + Δa) with both Δa and Δx small.
Figure 1.1: Relationship between backward and forward errors.
(iii) A method that produces answers with forward errors of similar magnitude to those produced by a backward stable method is called forward stable.
Remark 1.3.1 (i) Backward stable ⇒ forward stable, but not vice versa!
(ii) Forward error ≤ condition number × backward error. Consider
\[
\hat{x} - x = F(a + \Delta a) - F(a) = F'(a)\Delta a + \frac{F''(a + \theta\Delta a)}{2}(\Delta a)^2, \quad \theta \in (0, 1).
\]
Then we have
\[
\frac{\hat{x} - x}{x} = \left(\frac{aF'(a)}{F(a)}\right)\frac{\Delta a}{a} + O\big((\Delta a)^2\big).
\]
The quantity C(a) = |aF'(a)/F(a)| is called the condition number of F. If x or F is a vector, then the condition number is defined in a similar way using norms, and it measures the maximum relative change, which is attained for some, but not all, Δa.
A priori error estimate! A posteriori error estimate!
1.3.2 An SVD Analysis
Let A = Σ_{i=1}^n σ_iu_iv_i^T = UΣV^T be a singular value decomposition (SVD) of A. Then
\[
x = A^{-1}b = (U\Sigma V^T)^{-1}b = \sum_{i=1}^n \frac{u_i^Tb}{\sigma_i}\,v_i.
\]
If cos(θ) = |u_n^Tb|/‖b‖_2 and
\[
(A - \varepsilon u_nv_n^T)\,y = b + \varepsilon(u_n^Tb)u_n, \quad \sigma_n > \varepsilon \ge 0,
\]
then we have
\[
\|y - x\|_2 \ge \frac{\varepsilon}{\sigma_n}\,\|x\|_2\cos(\theta).
\]
Let E = diag{0, ..., 0, ε}. Then it holds that
\[
(\Sigma - E)V^Ty = U^Tb + \varepsilon(u_n^Tb)e_n.
\]
Therefore,
\[
\begin{aligned}
y - x &= V(\Sigma - E)^{-1}U^Tb + \varepsilon(u_n^Tb)(\sigma_n - \varepsilon)^{-1}v_n - V\Sigma^{-1}U^Tb\\
&= V\big((\Sigma - E)^{-1} - \Sigma^{-1}\big)U^Tb + \varepsilon(u_n^Tb)(\sigma_n - \varepsilon)^{-1}v_n\\
&= V\big(\Sigma^{-1}E(\Sigma - E)^{-1}\big)U^Tb + \varepsilon(u_n^Tb)(\sigma_n - \varepsilon)^{-1}v_n\\
&= V\,\mathrm{diag}\Big(0, \dots, 0, \frac{\varepsilon}{\sigma_n(\sigma_n - \varepsilon)}\Big)U^Tb + \varepsilon(u_n^Tb)(\sigma_n - \varepsilon)^{-1}v_n\\
&= \frac{\varepsilon}{\sigma_n(\sigma_n - \varepsilon)}(u_n^Tb)v_n + \varepsilon(u_n^Tb)(\sigma_n - \varepsilon)^{-1}v_n\\
&= (u_n^Tb)v_n\Big(\frac{\varepsilon}{\sigma_n(\sigma_n - \varepsilon)} + \frac{\varepsilon}{\sigma_n - \varepsilon}\Big)\\
&= \frac{\varepsilon(1 + \sigma_n)}{\sigma_n(\sigma_n - \varepsilon)}(u_n^Tb)v_n.
\end{aligned}
\]
From the inequality ‖x‖_2 ≤ ‖A^{-1}‖_2‖b‖_2 we have
\[
\frac{\|y - x\|_2}{\|x\|_2} \ge \frac{|u_n^Tb|}{\|b\|_2}\cdot\frac{\varepsilon(1 + \sigma_n)}{\sigma_n - \varepsilon} \ge \frac{|u_n^Tb|}{\|b\|_2}\cdot\frac{\varepsilon}{\sigma_n}.
\]
Theorem 1.3.1 Let A be nonsingular and ‖A^{-1}E‖ = r < 1. Then A + E is nonsingular and
\[
\|(A + E)^{-1} - A^{-1}\| \le \frac{\|E\|\,\|A^{-1}\|^2}{1 - r}.
\]
Proof: Since A is nonsingular, A + E = A(I − F), where F = −A^{-1}E. Since ‖F‖ = r < 1, it follows that I − F is nonsingular (by Corollary 1.2.1) and ‖(I − F)^{-1}‖ ≤ 1/(1 − r). Then
\[
(A + E)^{-1} = (I - F)^{-1}A^{-1} \;\Longrightarrow\; \|(A + E)^{-1}\| \le \frac{\|A^{-1}\|}{1 - r},
\]
and
\[
(A + E)^{-1} - A^{-1} = -A^{-1}E(A + E)^{-1}.
\]
It follows that
\[
\|(A + E)^{-1} - A^{-1}\| \le \|A^{-1}\|\,\|E\|\,\|(A + E)^{-1}\| \le \frac{\|A^{-1}\|^2\,\|E\|}{1 - r}.
\]
Lemma 1.3.1 Let
\[
Ax = b, \qquad (A + \Delta A)y = b + \Delta b,
\]
where ‖ΔA‖ ≤ δ‖A‖ and ‖Δb‖ ≤ δ‖b‖. If δκ(A) = r < 1, where κ(A) = ‖A‖‖A^{-1}‖, then A + ΔA is nonsingular and
\[
\frac{\|y\|}{\|x\|} \le \frac{1 + r}{1 - r}.
\]
Proof: Since ‖A^{-1}ΔA‖ ≤ δ‖A^{-1}‖‖A‖ = r < 1, it follows that A + ΔA is nonsingular. From the equality (I + A^{-1}ΔA)y = x + A^{-1}Δb it follows that
\[
\|y\| \le \|(I + A^{-1}\Delta A)^{-1}\|\,\big(\|x\| + \delta\|A^{-1}\|\|b\|\big) \le \frac{1}{1 - r}\big(\|x\| + \delta\|A^{-1}\|\|b\|\big) = \frac{1}{1 - r}\Big(\|x\| + r\,\frac{\|b\|}{\|A\|}\Big).
\]
From ‖b‖ = ‖Ax‖ ≤ ‖A‖‖x‖ the lemma follows.
1.3.3 Normwise Forward Error Bound
Theorem 1.3.2 If the assumptions of Lemma 1.3.1 hold, then
\[
\frac{\|x - y\|}{\|x\|} \le \frac{2\delta}{1 - r}\,\kappa(A).
\]
Proof: Since y − x = A^{-1}Δb − A^{-1}ΔAy, we have
\[
\|y - x\| \le \delta\|A^{-1}\|\|b\| + \delta\|A^{-1}\|\|A\|\|y\|.
\]
So by Lemma 1.3.1 it holds that
\[
\frac{\|y - x\|}{\|x\|} \le \delta\kappa(A)\,\frac{\|b\|}{\|A\|\|x\|} + \delta\kappa(A)\,\frac{\|y\|}{\|x\|} \le \delta\kappa(A)\Big(1 + \frac{1 + r}{1 - r}\Big) = \frac{2\delta}{1 - r}\,\kappa(A).
\]
1.3.4 Componentwise Forward Error Bound
Theorem 1.3.3 Let Ax = b and (A + ΔA)y = b + Δb, where |ΔA| ≤ δ|A| and |Δb| ≤ δ|b|. If δκ_∞(A) = r < 1, then A + ΔA is nonsingular and
\[
\frac{\|y - x\|_\infty}{\|x\|_\infty} \le \frac{2\delta}{1 - r}\,\big\|\,|A^{-1}|\,|A|\,\big\|_\infty.
\]
Here ‖|A^{-1}||A|‖_∞ is called the Skeel condition number.
Proof: Since ‖ΔA‖_∞ ≤ δ‖A‖_∞ and ‖Δb‖_∞ ≤ δ‖b‖_∞, the assumptions of Lemma 1.3.1 are satisfied in the ∞-norm. So A + ΔA is nonsingular and ‖y‖_∞/‖x‖_∞ ≤ (1 + r)/(1 − r). Since y − x = A^{-1}Δb − A^{-1}ΔAy, we have
\[
|y - x| \le |A^{-1}||\Delta b| + |A^{-1}||\Delta A||y| \le \delta|A^{-1}||b| + \delta|A^{-1}||A||y| \le \delta|A^{-1}||A|\,(|x| + |y|).
\]
Taking the ∞-norm, we have
\[
\|y - x\|_\infty \le \delta\,\big\|\,|A^{-1}|\,|A|\,\big\|_\infty\Big(\|x\|_\infty + \frac{1 + r}{1 - r}\|x\|_\infty\Big) = \frac{2\delta}{1 - r}\,\big\|\,|A^{-1}|\,|A|\,\big\|_\infty\,\|x\|_\infty.
\]
1.3.5 Derivation of Condition Number of Ax = b
Let
\[
(A + \varepsilon F)\,x(\varepsilon) = b + \varepsilon f, \quad\text{with } x(0) = x.
\]
Then we have ẋ(0) = A^{-1}(f − Fx) and x(ε) = x + εẋ(0) + O(ε²). Therefore,
\[
\frac{\|x(\varepsilon) - x\|}{\|x\|} \le \varepsilon\|A^{-1}\|\Big\{\frac{\|f\|}{\|x\|} + \|F\|\Big\} + O(\varepsilon^2).
\]
Define the condition number κ(A) := ‖A‖‖A^{-1}‖. Then we have
\[
\frac{\|x(\varepsilon) - x\|}{\|x\|} \le \kappa(A)(\rho_A + \rho_b) + O(\varepsilon^2),
\]
where ρ_A = ε‖F‖/‖A‖ and ρ_b = ε‖f‖/‖b‖.
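A numerical sketch of this estimate (assuming NumPy; the Hilbert matrix and the perturbation f are arbitrary test data, with F = 0 so that only b is perturbed):

```python
import numpy as np

n = 8
i = np.arange(n)
A = 1.0 / (i[:, None] + i[None, :] + 1.0)    # Hilbert matrix: ill-conditioned
x = np.ones(n)
b = A @ x
f = 1e-10 * np.random.default_rng(8).standard_normal(n)
y = np.linalg.solve(A, b + f)                # solution of the perturbed system

kappa = np.linalg.cond(A)                    # kappa(A) = ||A|| ||A^{-1}||
rel_err  = np.linalg.norm(y - x) / np.linalg.norm(x)
rel_pert = np.linalg.norm(f) / np.linalg.norm(b)
print(rel_err, kappa * rel_pert)             # forward error is below kappa * perturbation
```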
1.3.6 Normwise Backward Error
Theorem 1.3.4 Let y be the computed solution of Ax = b. Then the normwise backward error
\[
\eta(y) := \min\{\varepsilon \mid (A + \Delta A)y = b + \Delta b,\ \|\Delta A\| \le \varepsilon\|A\|,\ \|\Delta b\| \le \varepsilon\|b\|\}
\]
is given by
\[
\eta(y) = \frac{\|r\|}{\|A\|\|y\| + \|b\|}, \tag{1.3.24}
\]
where r = b − Ay is the residual.
Proof: The right-hand side of (1.3.24) is an upper bound for η(y). This upper bound is attained for the perturbations (by construction!)
\[
\Delta A_{\min} = \frac{\|A\|\|y\|}{\|A\|\|y\| + \|b\|}\,rz^T, \qquad \Delta b_{\min} = -\frac{\|b\|}{\|A\|\|y\| + \|b\|}\,r,
\]
where z is the dual vector of y, i.e., z^Ty = 1 and ‖z‖_* = 1/‖y‖.
Check that ‖ΔA_min‖ = η(y)‖A‖:
\[
\|\Delta A_{\min}\| = \frac{\|A\|\|y\|\,\|rz^T\|}{\|A\|\|y\| + \|b\|} = \frac{\|r\|}{\|A\|\|y\| + \|b\|}\,\|A\|,
\]
where the second equality requires ‖rz^T‖ = ‖r‖/‖y‖. Since
\[
\|rz^T\| = \max_{\|u\|=1}\|(rz^T)u\| = \|r\|\max_{\|u\|=1}|z^Tu| = \|r\|\,\|z\|_* = \|r\|\,\frac{1}{\|y\|},
\]
we are done. Similarly, ‖Δb_min‖ = η(y)‖b‖.
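Evaluating (1.3.24) for a solution computed by a library solver (a sketch assuming NumPy and the 2-norm; A and b are random test data):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 6
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)
y = np.linalg.solve(A, b)                     # computed solution
r = b - A @ y                                 # residual
eta = np.linalg.norm(r) / (np.linalg.norm(A, 2) * np.linalg.norm(y)
                           + np.linalg.norm(b))
print(eta)    # of the order of machine epsilon: the solver is backward stable
```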
1.3.7 Componentwise Backward Error
Theorem 1.3.5 The componentwise backward error
\[
\omega(y) := \min\{\varepsilon \mid (A + \Delta A)y = b + \Delta b,\ |\Delta A| \le \varepsilon|A|,\ |\Delta b| \le \varepsilon|b|\}
\]
is given by
\[
\omega(y) = \max_i \frac{|r|_i}{(|A||y| + |b|)_i}, \tag{1.3.25}
\]
where r = b − Ay. (Note: ξ/0 = 0 if ξ = 0; ξ/0 = ∞ if ξ ≠ 0.)
Proof: The right-hand side of (1.3.25) is an upper bound for ω(y). This bound is attained for the perturbations ΔA = D_1|A|D_2 and Δb = −D_1|b|, where D_1 = diag(r_i/(|A||y| + |b|)_i) and D_2 = diag(sign(y_i)).
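Evaluating (1.3.25), with the ξ/0 conventions handled explicitly (a sketch assuming NumPy; A and b are random test data):

```python
import numpy as np

rng = np.random.default_rng(10)
n = 6
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)
y = np.linalg.solve(A, b)
r = b - A @ y
denom = np.abs(A) @ np.abs(y) + np.abs(b)          # (|A||y| + |b|)_i
with np.errstate(divide='ignore', invalid='ignore'):
    ratios = np.abs(r) / denom                     # xi/0 -> inf, 0/0 -> nan
ratios = np.where(np.isnan(ratios), 0.0, ratios)   # the note's convention: 0/0 = 0
print(ratios.max())   # omega(y): near machine epsilon for a backward stable solver
```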
Remark 1.3.2 Theorems 1.3.4 and 1.3.5 are a posteriori error estimates.
1.3.8 Determinants and Nearness to Singularity
\[
B_n = \begin{bmatrix}
1 & -1 & \cdots & -1\\
 & 1 & \ddots & \vdots\\
 & & \ddots & -1\\
0 & & & 1
\end{bmatrix}, \qquad
B_n^{-1} = \begin{bmatrix}
1 & 1 & \cdots & 2^{n-2}\\
 & \ddots & \ddots & \vdots\\
 & & \ddots & 1\\
0 & & & 1
\end{bmatrix}.
\]
Then det(B_n) = 1, κ_∞(B_n) = n2^{n−1}, and σ_30(B_30) ≈ 10^{−8}.
\[
D_n = \begin{bmatrix}
10^{-1} & & 0\\
 & \ddots & \\
0 & & 10^{-1}
\end{bmatrix}.
\]
Then det(D_n) = 10^{−n}, κ_p(D_n) = 1 and σ_n(D_n) = 10^{−1}.
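The two examples, verified numerically (a sketch assuming NumPy, with n = 30): the determinant of B_n looks harmless although B_n is nearly singular, while D_n has a tiny determinant yet is perfectly conditioned.

```python
import numpy as np

n = 30
B = np.eye(n) - np.triu(np.ones((n, n)), 1)  # 1 on the diagonal, -1 above it
print(np.linalg.det(B))                      # 1.0
print(np.linalg.cond(B, np.inf))             # n * 2^(n-1), about 1.6e10

D = 0.1 * np.eye(n)
print(np.linalg.det(D))                      # 1e-30: tiny determinant
print(np.linalg.cond(D, np.inf))             # 1.0: perfectly conditioned
```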