Part II
On the Numerical Solutions of Eigenvalue Problems
Chapter 5
The Unsymmetric Eigenvalue Problem
Generalized eigenvalue problem (GEVP):
Given $A, B \in \mathbb{C}^{n\times n}$, determine $\lambda \in \mathbb{C}$ and $0 \neq x \in \mathbb{C}^n$ with $Ax = \lambda Bx$. Then $\lambda$ is called an eigenvalue of the pencil $A - \lambda B$ (or of the pair $(A, B)$) and $x$ is called an eigenvector corresponding to $\lambda$. Note that $\lambda$ is an eigenvalue of $A - \lambda B$ $\iff \det(A - \lambda B) = 0$; we write $\sigma(A, B) \equiv \{\lambda \in \mathbb{C} \mid \det(A - \lambda B) = 0\}$.
Definition 5.0.2 A pencil $A - \lambda B$ ($A, B \in \mathbb{R}^{m\times n}$), or a pair $(A, B)$, is called regular if
(i) $A$ and $B$ are square matrices of order $n$, and
(ii) $\det(A - \lambda B) \not\equiv 0$.
In all other cases ($m \neq n$, or $m = n$ but $\det(A - \lambda B) \equiv 0$), the pencil is called singular.
For the detailed algebraic structure of a pencil $A - \lambda B$, see Gantmacher (1959), Matrix Theory II, Chapter XII.
Eigenvalue problem (EVP):
In the special case of the GEVP with $B = I$, we seek $\lambda \in \mathbb{C}$ and $0 \neq x \in \mathbb{C}^n$ with $Ax = \lambda x$. Then $\lambda$ is an eigenvalue of $A$ and $x$ is an eigenvector corresponding to $\lambda$.
Definition 5.0.3 (a) $\sigma(A) = \{\lambda \in \mathbb{C} \mid \det(A - \lambda I) = 0\}$ is called the spectrum of $A$.
(b) $\rho(A) = \max\{|\lambda| : \lambda \in \sigma(A)\}$ is called the spectral radius of $A$.
(c) $P(\lambda) = \det(\lambda I - A)$ is called the characteristic polynomial of $A$.
Let
\[
P(\lambda) = \prod_{i=1}^{s} (\lambda - \lambda_i)^{m(\lambda_i)}, \qquad \lambda_i \neq \lambda_j \ (i \neq j), \qquad \sum_{i=1}^{s} m(\lambda_i) = n.
\]
Example 5.0.2 $A = \begin{bmatrix} 2 & 2 \\ 0 & 3 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ $\Longrightarrow \det(A - \lambda B) = 3(2 - \lambda)$ and $\sigma(A, B) = \{2\}$.
Example 5.0.3 $A = \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix}$, $B = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ $\Longrightarrow \det(A - \lambda B) = 3$ and $\sigma(A, B) = \emptyset$.
Example 5.0.4 $A = \begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ $\Longrightarrow \det(A - \lambda B) \equiv 0$ and $\sigma(A, B) = \mathbb{C}$.
Example 5.0.5 For $A$, $B$ as in Example 5.0.2, $\det(\mu A - \lambda B) = 3\mu(2\mu - \lambda)$.
$\mu = 1$: $Ax = \lambda Bx \Longrightarrow \lambda = 2$.
$\lambda = 1$: $Bx = \mu Ax \Longrightarrow \mu = 0$ or $\mu = \frac{1}{2} \Longrightarrow \lambda = \infty$ or $\lambda = 2$.
Hence $\sigma(A, B) = \{2, \infty\}$.
Example 5.0.6 For $A$, $B$ as in Example 5.0.3, $\det(\mu A - \lambda B) = \mu \cdot 3\mu = 3\mu^2$.
$\mu = 1$: no solution for $\lambda$.
$\lambda = 1$: $Bx = \mu Ax \Longrightarrow \mu = 0, 0$ (a double root). Hence $\sigma(A, B) = \{\infty, \infty\}$.
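These pencils are easy to examine in software. The following is a quick illustration (a sketch, not part of the original text; it assumes NumPy and SciPy are available): `scipy.linalg.eig` accepts a pair $(A, B)$ and reports the infinite eigenvalues of the pencil as `inf`, in agreement with Example 5.0.5.

```python
import numpy as np
from scipy.linalg import eig

# A, B as in Example 5.0.2; sigma(A, B) = {2, infinity} by Example 5.0.5
A = np.array([[2.0, 2.0],
              [0.0, 3.0]])
B = np.array([[1.0, 0.0],
              [0.0, 0.0]])
w, _ = eig(A, B)   # generalized eigenvalues of the pencil A - lambda*B
print(w)           # approximately [2., inf]
```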
Let
\[
m(\lambda_i) := \text{algebraic multiplicity of } \lambda_i, \qquad
n(\lambda_i) := n - \operatorname{rank}(A - \lambda_i I) = \text{geometric multiplicity of } \lambda_i.
\]
Then $1 \leq n(\lambda_i) \leq m(\lambda_i)$.
If for some $i$, $n(\lambda_i) < m(\lambda_i)$, then $A$ is degenerate (defective). The following statements are equivalent:
(a) $A$ is diagonalizable: there exists a nonsingular matrix $T$ such that $T^{-1}AT = \operatorname{diag}(\lambda_1, \cdots, \lambda_n)$.
(b) There are $n$ linearly independent eigenvectors.
(c) $A$ is nondefective, i.e. $m(\lambda) = n(\lambda)$ for all $\lambda \in \sigma(A)$.
If $A$ is defective, then eigenvectors together with principal vectors lead to the Jordan form.
Theorem 5.0.3 (Jordan decomposition) If $A \in \mathbb{C}^{n\times n}$, then there exists a nonsingular $X \in \mathbb{C}^{n\times n}$ such that $X^{-1}AX = \operatorname{diag}(J_1, \cdots, J_t)$, where
\[
J_i = \begin{bmatrix} \lambda_i & 1 & & 0 \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ 0 & & & \lambda_i \end{bmatrix}
\]
is $m_i \times m_i$ and $m_1 + \cdots + m_t = n$.
Theorem 5.0.4 (Schur decomposition) If $A \in \mathbb{C}^{n\times n}$, then there exists a unitary matrix $U \in \mathbb{C}^{n\times n}$ such that $U^*AU\,(= U^{-1}AU)$ is upper triangular.
- $A$ is normal (i.e. $AA^* = A^*A$) $\iff$ there exists a unitary $U$ such that $U^*AU = \operatorname{diag}(\lambda_1, \cdots, \lambda_n)$, i.e. $Au_i = \lambda_i u_i$, $u_i^* u_j = \delta_{ij}$.
- $A$ is Hermitian (i.e. $A^* = A$) $\iff$ $A$ is normal and $\sigma(A) \subseteq \mathbb{R}$.
- $A$ is symmetric (i.e. $A^T = A$, $A \in \mathbb{R}^{n\times n}$) $\iff$ there exists an orthogonal $U$ such that $U^TAU = \operatorname{diag}(\lambda_1, \cdots, \lambda_n)$ and $\sigma(A) \subseteq \mathbb{R}$.
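Theorem 5.0.4 can be checked directly with standard software. A minimal sketch (assuming NumPy and SciPy are available; the test matrix reappears in Example 5.2.2 below):

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[1.0, 2.0, 3.0],
              [0.0, 4.0, 5.0],
              [0.0, 0.0, 4.001]])
T, U = schur(A, output='complex')                  # A = U T U^*, T upper triangular
assert np.allclose(U.conj().T @ U, np.eye(3))      # U is unitary
assert np.allclose(np.tril(T, -1), 0, atol=1e-12)  # T is upper triangular
assert np.allclose(U @ T @ U.conj().T, A)          # reconstruction of A
print(np.diag(T))                                  # eigenvalues on the diagonal of T
```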
5.1 Orthogonal Projections and C-S Decomposition
Definition 5.1.1 Let $S \subseteq \mathbb{R}^n$ be a subspace. $P \in \mathbb{R}^{n\times n}$ is the orthogonal projection onto $S$ if
\[
\operatorname{Range}(P) = S, \qquad P^2 = P, \qquad P^T = P, \tag{5.1.1}
\]
where $\operatorname{Range}(P) = R(P) = \{y \in \mathbb{R}^n \mid y = Px \text{ for some } x \in \mathbb{R}^n\}$.
Remark 5.1.1 If $x \in \mathbb{R}^n$, then $Px \in S$ and $(I - P)x \in S^{\perp}$.
Example 5.1.1 $P = vv^T/v^Tv$ is the orthogonal projection onto $S = \operatorname{span}\{v\}$, $v \in \mathbb{R}^n$.
Figure 5.1: Orthogonal projection of $x$ onto $S = \operatorname{span}\{v\}$.
Remark 5.1.2 (i) If $P_1$ and $P_2$ are orthogonal projections, then for any $z \in \mathbb{R}^n$ we have
\[
\| (P_1 - P_2)z \|_2^2 = (P_1 z)^T (I - P_2) z + (P_2 z)^T (I - P_1) z. \tag{5.1.2}
\]
If $R(P_1) = R(P_2) = S$, then the right-hand side of (5.1.2) is zero. Thus the orthogonal projection onto a subspace is unique.
(ii) If $V = [v_1, \cdots, v_k]$ is an orthonormal basis for $S$, then $P = VV^T$ is the unique orthogonal projection onto $S$.
Definition 5.1.2 Suppose $S_1$ and $S_2$ are subspaces of $\mathbb{R}^n$ with $\dim(S_1) = \dim(S_2)$. We define the distance between $S_1$ and $S_2$ by
\[
\operatorname{dist}(S_1, S_2) = \| P_1 - P_2 \|_2, \tag{5.1.3}
\]
where $P_i$ is the orthogonal projection onto $S_i$, $i = 1, 2$.
Remark 5.1.3 By considering the case of one-dimensional subspaces in $\mathbb{R}^2$, we obtain a geometrical interpretation of $\operatorname{dist}(\cdot, \cdot)$. Suppose $S_1 = \operatorname{span}\{x\}$ and $S_2 = \operatorname{span}\{y\}$, where $\|x\|_2 = \|y\|_2 = 1$.
[Figure: the lines $S_1 = \operatorname{span}\{x\}$ and $S_2 = \operatorname{span}\{y\}$ meeting at angle $\theta$.]
Assume that $x^T y = \cos\theta$, $\theta \in [0, \frac{\pi}{2}]$. It follows that the difference between the projections onto these spaces satisfies
\[
P_1 - P_2 = xx^T - yy^T = x[x - (y^Tx)y]^T - [y - (x^Ty)x]y^T.
\]
If $\theta = 0$ (so $x = y$), then $\operatorname{dist}(S_1, S_2) = \| P_1 - P_2 \|_2 = \sin\theta = 0$.
If $\theta \neq 0$, then
\[
U_x = [u_1, u_2] = \big[\,x,\; -[y - (y^Tx)x]/\sin\theta\,\big]
\quad\text{and}\quad
V_x = [v_1, v_2] = \big[\,[x - (x^Ty)y]/\sin\theta,\; y\,\big]
\]
are defined and orthogonal. It follows that
\[
P_1 - P_2 = U_x \operatorname{diag}[\sin\theta, \sin\theta]\, V_x^T
\]
is the SVD of $P_1 - P_2$. Consequently, $\operatorname{dist}(S_1, S_2) = \sin\theta$, the sine of the angle between the two subspaces.
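A quick numerical check of this remark (a sketch, assuming NumPy): for unit vectors $x$, $y$ at angle $\theta$, both singular values of $P_1 - P_2$ equal $\sin\theta$, so $\operatorname{dist}(S_1, S_2) = \sin\theta$.

```python
import numpy as np

theta = 0.3
x = np.array([1.0, 0.0])
y = np.array([np.cos(theta), np.sin(theta)])   # x^T y = cos(theta)
P1, P2 = np.outer(x, x), np.outer(y, y)        # the two orthogonal projections
svals = np.linalg.svd(P1 - P2, compute_uv=False)
print(svals, np.sin(theta))                    # both singular values = sin(theta)
```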
Theorem 5.1.1 (C-S Decomposition; Davis/Kahan (1970) or Stewart (1977)) If
\[
Q = \begin{bmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{bmatrix}
\]
is orthogonal with $Q_{11} \in \mathbb{R}^{k\times k}$ and $Q_{22} \in \mathbb{R}^{j\times j}$ ($k \geq j$), then there exist orthogonal matrices $U_1, V_1 \in \mathbb{R}^{k\times k}$ and orthogonal matrices $U_2, V_2 \in \mathbb{R}^{j\times j}$ such that
\[
\begin{bmatrix} U_1 & 0 \\ 0 & U_2 \end{bmatrix}^T
\begin{bmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{bmatrix}
\begin{bmatrix} V_1 & 0 \\ 0 & V_2 \end{bmatrix}
=
\begin{bmatrix} I & 0 & 0 \\ 0 & C & S \\ 0 & -S & C \end{bmatrix}, \tag{5.1.4}
\]
where
\[
C = \operatorname{diag}(c_1, \cdots, c_j), \quad c_i = \cos\theta_i, \qquad
S = \operatorname{diag}(s_1, \cdots, s_j), \quad s_i = \sin\theta_i,
\]
and $0 \leq \theta_1 \leq \theta_2 \leq \cdots \leq \theta_j \leq \frac{\pi}{2}$.

Lemma 5.1.1 Let $Q = \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix}$ have orthonormal columns, with $Q_1 \in \mathbb{R}^{n\times n}$. Then there are unitary matrices $U_1$, $U_2$ and $W$ such that
\[
\begin{bmatrix} U_1^T & 0 \\ 0 & U_2^T \end{bmatrix}
\begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} W
=
\begin{bmatrix} C \\ S \end{bmatrix},
\]
where $C = \operatorname{diag}(c_1, \cdots, c_n) \geq 0$ and $S = \operatorname{diag}(s_1, \cdots, s_n) \geq 0$ with $c_i^2 + s_i^2 = 1$, $i = 1, \cdots, n$.
Proof: Let $U_1^T Q_1 W = C$ be the SVD of $Q_1$. Then
\[
\begin{bmatrix} U_1^T & 0 \\ 0 & I \end{bmatrix}
\begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} W
=
\begin{bmatrix} C \\ Q_2 W \end{bmatrix}
\]
has orthonormal columns. Define $\tilde{Q}_2 \equiv Q_2 W$. Then $C^2 + \tilde{Q}_2^T \tilde{Q}_2 = I$, so $\tilde{Q}_2^T \tilde{Q}_2 = I - C^2$ is diagonal, which means that the nonzero columns of $\tilde{Q}_2$ are orthogonal to one another. If all the columns of $\tilde{Q}_2$ are nonzero, set $S = (\tilde{Q}_2^T \tilde{Q}_2)^{1/2}$ and $U_2 = \tilde{Q}_2 S^{-1}$; then $U_2^T U_2 = I$ and $U_2^T \tilde{Q}_2 = S$, and the decomposition follows.
If $\tilde{Q}_2$ has zero columns, normalize the nonzero columns and replace the zero columns with an orthonormal basis for the orthogonal complement of the column space of $\tilde{Q}_2$. It is easily verified that $U_2$ so defined is orthogonal and that $S \equiv U_2^T \tilde{Q}_2$ is diagonal, and the decomposition again follows.
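The proof is constructive and can be traced in a few lines of code (a sketch, assuming NumPy, and assuming for simplicity that all columns of $\tilde{Q}_2$ are nonzero):

```python
import numpy as np

n = 4
Q = np.linalg.qr(np.random.randn(2 * n, n))[0]   # [Q1; Q2] with orthonormal columns
Q1, Q2 = Q[:n, :], Q[n:, :]

U1, c, Wt = np.linalg.svd(Q1)                # U1^T Q1 W = C = diag(c)
W = Wt.T
Q2t = Q2 @ W                                 # Q2 tilde
s = np.sqrt(np.clip(1.0 - c**2, 0.0, None))  # c_i^2 + s_i^2 = 1
U2 = Q2t / s                                 # normalize the columns (all nonzero here)
assert np.allclose(U1.T @ Q1 @ W, np.diag(c))
assert np.allclose(U2.T @ Q2t, np.diag(s), atol=1e-10)
```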
Theorem 5.1.2 (C-S Decomposition) Let the unitary matrix $W \in \mathbb{C}^{n\times n}$ be partitioned in the form
\[
W = \begin{bmatrix} W_{11} & W_{12} \\ W_{21} & W_{22} \end{bmatrix},
\]
where $W_{11} \in \mathbb{C}^{r\times r}$ with $r \leq \frac{n}{2}$. Then there exist unitary matrices $U = \operatorname{diag}(U_1, U_2)$ and $V = \operatorname{diag}(V_1, V_2)$, with $U_1, V_1 \in \mathbb{C}^{r\times r}$ and $U_2, V_2 \in \mathbb{C}^{(n-r)\times(n-r)}$, such that
\[
U^*WV = \begin{bmatrix} \Gamma & -\Sigma & 0 \\ \Sigma & \Gamma & 0 \\ 0 & 0 & I \end{bmatrix} \tag{5.1.5}
\]
(block sizes $r$, $r$, $n-2r$), where $\Gamma = \operatorname{diag}(\gamma_1, \cdots, \gamma_r) \geq 0$ and $\Sigma = \operatorname{diag}(\sigma_1, \cdots, \sigma_r) \geq 0$ with $\gamma_i^2 + \sigma_i^2 = 1$, $i = 1, \cdots, r$.
Proof: Let $\Gamma = U_1^* W_{11} V_1$ be the SVD of $W_{11}$, with the diagonal elements of $\Gamma$ ordered as $\gamma_1 \leq \gamma_2 \leq \cdots \leq \gamma_k < 1 = \gamma_{k+1} = \cdots = \gamma_r$, i.e.
\[
\Gamma = \operatorname{diag}(\Gamma_0, I_{r-k}).
\]
The matrix $\begin{bmatrix} W_{11} \\ W_{21} \end{bmatrix} V_1$ has orthonormal columns. Hence
\[
I = \left[ \begin{pmatrix} W_{11} \\ W_{21} \end{pmatrix} V_1 \right]^* \left[ \begin{pmatrix} W_{11} \\ W_{21} \end{pmatrix} V_1 \right] = \Gamma^2 + (W_{21}V_1)^*(W_{21}V_1).
\]
Since $I$ and $\Gamma^2$ are diagonal, $(W_{21}V_1)^*(W_{21}V_1)$ is diagonal, so the columns of $W_{21}V_1$ are orthogonal. Since the $i$th diagonal entry of $I - \Gamma^2$ is the squared norm of the $i$th column of $W_{21}V_1$, only the first $k$ ($k \leq r \leq n-r$) columns of $W_{21}V_1$ are nonzero. Let $\hat{U}_2$ be unitary with its first $k$ columns the normalized columns of $W_{21}V_1$. Then
\[
\hat{U}_2^* W_{21} V_1 = \begin{bmatrix} \Sigma \\ 0 \end{bmatrix},
\]
where $\Sigma = \operatorname{diag}(\sigma_1, \cdots, \sigma_k, 0, \cdots, 0) \equiv \operatorname{diag}(\Sigma_0, 0)$ and $\hat{U}_2 \in \mathbb{C}^{(n-r)\times(n-r)}$. Since
\[
\operatorname{diag}(U_1, \hat{U}_2)^* \begin{pmatrix} W_{11} \\ W_{21} \end{pmatrix} V_1 = \begin{bmatrix} \Gamma \\ \Sigma \\ 0 \end{bmatrix}
\]
has orthonormal columns, we have $\gamma_i^2 + \sigma_i^2 = 1$, $i = 1, \cdots, r$ ($\Sigma_0$ is nonsingular).
By the same argument as above, there is a unitary $V_2 \in \mathbb{C}^{(n-r)\times(n-r)}$ such that
\[
U_1^* W_{12} V_2 = (T, 0),
\]
where $T = \operatorname{diag}(\tau_1, \cdots, \tau_r)$ with $\tau_i \leq 0$. Since $\gamma_i^2 + \tau_i^2 = 1$, it follows from $\gamma_i^2 + \sigma_i^2 = 1$ that $T = -\Sigma$. Set $\hat{U} = \operatorname{diag}(U_1, \hat{U}_2)$ and $V = \operatorname{diag}(V_1, V_2)$. Then $X = \hat{U}^* W V$ can be partitioned in the form
\[
X =
\begin{bmatrix}
\Gamma_0 & 0 & -\Sigma_0 & 0 & 0 \\
0 & I & 0 & 0 & 0 \\
\Sigma_0 & 0 & X_{33} & X_{34} & X_{35} \\
0 & 0 & X_{43} & X_{44} & X_{45} \\
0 & 0 & X_{53} & X_{54} & X_{55}
\end{bmatrix}
\]
with block sizes $k$, $r-k$, $k$, $r-k$, $n-2r$. Since block columns 1 and 4 are orthogonal, it follows that $\Sigma_0 X_{34} = 0$, so $X_{34} = 0$ ($\Sigma_0$ is nonsingular). Likewise $X_{35} = 0$, $X_{43} = 0$, and $X_{53} = 0$. From the orthogonality of block columns 1 and 3 it follows that $-\Gamma_0\Sigma_0 + \Sigma_0 X_{33} = 0$, so $X_{33} = \Gamma_0$. The matrix
\[
\hat{U}_3 = \begin{bmatrix} X_{44} & X_{45} \\ X_{54} & X_{55} \end{bmatrix}
\]
is unitary. Set $U_2 = \hat{U}_2 \operatorname{diag}(I_k, \hat{U}_3)$ and $U = \operatorname{diag}(U_1, U_2)$. Then
\[
U^* W V = \operatorname{diag}(I_{r+k}, \hat{U}_3^*)\, X =
\begin{bmatrix}
\Gamma_0 & 0 & -\Sigma_0 & 0 & 0 \\
0 & I & 0 & 0 & 0 \\
\Sigma_0 & 0 & \Gamma_0 & 0 & 0 \\
0 & 0 & 0 & I & 0 \\
0 & 0 & 0 & 0 & I
\end{bmatrix}.
\]
The theorem is proved.
Theorem 5.1.3 Let $W = [W_1, W_2]$ and $Z = [Z_1, Z_2]$ be orthogonal, where $W_1, Z_1 \in \mathbb{R}^{n\times k}$ and $W_2, Z_2 \in \mathbb{R}^{n\times(n-k)}$. If $S_1 = R(W_1)$ and $S_2 = R(Z_1)$, then
\[
\operatorname{dist}(S_1, S_2) = \sqrt{1 - \sigma_{\min}^2(W_1^T Z_1)}. \tag{5.1.6}
\]
Proof: Let $Q = W^T Z$ and assume that $k \geq j = n - k$. Let the C-S decomposition of $Q$ be given by (5.1.4), with $Q_{ij} = W_i^T Z_j$, $i, j = 1, 2$. It follows that
\[
\| W_1^T Z_2 \|_2 = \| W_2^T Z_1 \|_2 = s_j = \sqrt{1 - c_j^2} = \sqrt{1 - \sigma_{\min}^2(W_1^T Z_1)}.
\]
Since $W_1 W_1^T$ and $Z_1 Z_1^T$ are the orthogonal projections onto $S_1$ and $S_2$, respectively, we have
\[
\operatorname{dist}(S_1, S_2) = \| W_1W_1^T - Z_1Z_1^T \|_2 = \| W^T (W_1W_1^T - Z_1Z_1^T) Z \|_2 = \left\| \begin{bmatrix} 0 & W_1^T Z_2 \\ W_2^T Z_1 & 0 \end{bmatrix} \right\|_2 = s_j.
\]
If $k < j$, the same argument applies with $Q = [W_2, W_1]^T[Z_2, Z_1]$, noting that $\sigma_{\min}(W_2^T Z_1) = \sigma_{\min}(W_1^T Z_2) = s_j$.
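Formula (5.1.6) is easy to verify numerically (a sketch, assuming NumPy): both expressions for the distance are computed independently and agree to machine precision.

```python
import numpy as np

n, k = 6, 2
W1 = np.linalg.qr(np.random.randn(n, k))[0]   # orthonormal basis of S1
Z1 = np.linalg.qr(np.random.randn(n, k))[0]   # orthonormal basis of S2
d1 = np.linalg.norm(W1 @ W1.T - Z1 @ Z1.T, 2)            # ||P1 - P2||_2
smin = np.linalg.svd(W1.T @ Z1, compute_uv=False).min()  # sigma_min(W1^T Z1)
d2 = np.sqrt(1.0 - smin**2)
print(d1, d2)   # equal up to roundoff
```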
5.2 Perturbation Theory
Theorem 5.2.1 (Gerschgorin Circle Theorem) If $X^{-1}AX = D + F$, where $D \equiv \operatorname{diag}(d_1, \cdots, d_n)$ and $F$ has zero diagonal entries, then $\sigma(A) \subset \bigcup_{i=1}^{n} D_i$, where
\[
D_i = \Big\{ z \in \mathbb{C} \;\Big|\; |z - d_i| \leq \sum_{j=1, j\neq i}^{n} |f_{ij}| \Big\}.
\]
Proof: Suppose $\lambda \in \sigma(A)$ and assume without loss of generality that $\lambda \neq d_i$ for $i = 1, \cdots, n$. Since $(D - \lambda I) + F$ is singular, so is $I + (D - \lambda I)^{-1}F$, and it follows that
\[
1 \leq \| (D - \lambda I)^{-1} F \|_\infty = \sum_{j=1}^{n} |f_{kj}| / |d_k - \lambda|
\]
for some $k$ ($1 \leq k \leq n$). But this implies that $\lambda \in D_k$.

Corollary 5.2.1 If the union $M_1 = \bigcup_{j=1}^{k} D_{i_j}$ of $k$ discs $D_{i_j}$, $j = 1, \cdots, k$, and the union $M_2$ of the remaining discs are disjoint, then $M_1$ contains exactly $k$ eigenvalues of $A$ and $M_2$ exactly $n - k$ eigenvalues.
Proof: Let $B = X^{-1}AX = D + F$ and, for $t \in [0, 1]$, let $B_t := D + tF$, so that $B_0 = D$ and $B_1 = B$. The eigenvalues of $B_t$ are continuous functions of $t$. Applying Theorem 5.2.1 to $B_t$, one finds that for $t = 0$ there are exactly $k$ eigenvalues of $B_0$ in $M_1$ and $n - k$ in $M_2$ (counting multiple eigenvalues). Since for $0 \leq t \leq 1$ all eigenvalues of $B_t$ likewise must lie in these discs, it follows for reasons of continuity that also $k$ eigenvalues of $A$ lie in $M_1$ and the remaining $n - k$ in $M_2$.
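With $X = I$ the discs of Theorem 5.2.1 are computed from the rows of $A$ itself. A sketch (assuming NumPy; `gerschgorin_discs` is our own helper, not a library routine):

```python
import numpy as np

def gerschgorin_discs(A):
    """Return (center, radius) per row; sigma(A) lies in the union of the discs."""
    centers = np.diag(A)
    radii = np.abs(A).sum(axis=1) - np.abs(centers)   # off-diagonal row sums
    return list(zip(centers, radii))

A = np.array([[4.0, 1.0, 0.1],
              [0.5, 2.0, 0.2],
              [0.1, 0.1, 9.0]])
print(gerschgorin_discs(A))
print(np.linalg.eigvals(A))   # each eigenvalue lies in some disc
```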
Remark 5.2.1 Take $X = I$, $A = \operatorname{diag}(A) + \operatorname{offdiag}(A)$. Consider the transformation $A \longrightarrow \Delta^{-1}A\Delta$ with $\Delta = \operatorname{diag}(\delta_1, \cdots, \delta_n)$. The Gerschgorin discs become
\[
D_i = \Big\{ z \in \mathbb{C} \;\Big|\; |z - a_{ii}| \leq \sum_{k=1, k\neq i}^{n} \Big| \frac{a_{ik}\delta_k}{\delta_i} \Big| =: \rho_i \Big\}.
\]
Example 5.2.1 Let
\[
A = \begin{bmatrix} 1 & \epsilon & \epsilon \\ \epsilon & 2 & \epsilon \\ \epsilon & \epsilon & 2 \end{bmatrix},
\qquad
D_1 = \{z \mid |z - 1| \leq 2\epsilon\}, \quad D_2 = D_3 = \{z \mid |z - 2| \leq 2\epsilon\}, \quad 0 < \epsilon \ll 1.
\]
Transformation with $\Delta = \operatorname{diag}(1, k\epsilon, k\epsilon)$, $k > 0$, yields
\[
\tilde{A} = \Delta^{-1}A\Delta = \begin{bmatrix} 1 & k\epsilon^2 & k\epsilon^2 \\ \frac{1}{k} & 2 & \epsilon \\ \frac{1}{k} & \epsilon & 2 \end{bmatrix}.
\]
For $\tilde{A}$ we have $\rho_1 = 2k\epsilon^2$ and $\rho_2 = \rho_3 = \frac{1}{k} + \epsilon$. The discs $D_1$ and $D_2 = D_3$ for $\tilde{A}$ are disjoint if
\[
\rho_1 + \rho_2 = 2k\epsilon^2 + \frac{1}{k} + \epsilon < 1.
\]
For this to be true we must clearly have $k > 1$. The optimal value $\tilde{k}$, for which $D_1$ and $D_2$ (for $\tilde{A}$) touch one another, is obtained from $\rho_1 + \rho_2 = 1$. One finds
\[
\tilde{k} = \frac{2}{1 - \epsilon + \sqrt{(1 - \epsilon)^2 - 8\epsilon^2}} = 1 + \epsilon + O(\epsilon^2)
\]
and thus $\rho_1 = 2\tilde{k}\epsilon^2 = 2\epsilon^2 + O(\epsilon^3)$. Through the transformation $A \longrightarrow \tilde{A}$ the radius $\rho_1$ of $D_1$ can thus be reduced from the initial $2\epsilon$ to about $2\epsilon^2$.
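The effect of the scaling can be confirmed numerically (a sketch, assuming NumPy; we take $k = 1$ rather than the optimal $\tilde{k}$, which already exhibits the $2\epsilon^2$ radius):

```python
import numpy as np

eps, k = 1e-3, 1.0
A = np.array([[1.0, eps, eps],
              [eps, 2.0, eps],
              [eps, eps, 2.0]])
Delta = np.diag([1.0, k * eps, k * eps])
At = np.linalg.inv(Delta) @ A @ Delta          # A tilde
rho1 = np.abs(At[0]).sum() - np.abs(At[0, 0])  # first Gerschgorin radius
print(rho1, 2 * k * eps**2)                    # both 2e-06, versus 2e-03 before
```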
Theorem 5.2.2 (Bauer-Fike) If $\mu$ is an eigenvalue of $A + E \in \mathbb{C}^{n\times n}$ and $X^{-1}AX = D = \operatorname{diag}(\lambda_1, \cdots, \lambda_n)$, then
\[
\min_{\lambda\in\sigma(A)} |\lambda - \mu| \leq \kappa_p(X) \| E \|_p,
\]
where $\|\cdot\|_p$ is the $p$-norm and $\kappa_p(X) = \| X \|_p \| X^{-1} \|_p$.
Proof: We need only consider the case $\mu \notin \sigma(A)$. Since $X^{-1}(A + E - \mu I)X$ is singular, so is $I + (D - \mu I)^{-1}(X^{-1}EX)$. Thus,
\[
1 \leq \| (D - \mu I)^{-1}(X^{-1}EX) \|_p \leq \frac{1}{\min_{\lambda\in\sigma(A)} |\lambda - \mu|} \, \| X \|_p \| E \|_p \| X^{-1} \|_p .
\]
Theorem 5.2.3 Let $Q^*AQ = D + N$ be a Schur decomposition of $A$, with $D = \operatorname{diag}(\lambda_1, \cdots, \lambda_n)$ and $N$ strictly upper triangular ($N^n = 0$). If $\mu \in \sigma(A + E)$, then
\[
\min_{\lambda\in\sigma(A)} |\lambda - \mu| \leq \max\{\theta, \theta^{1/n}\}, \qquad\text{where}\quad \theta = \| E \|_2 \sum_{k=0}^{n-1} \| N \|_2^k.
\]
Proof: Define $\delta = \min_{\lambda\in\sigma(A)} |\lambda - \mu|$. The theorem is true if $\delta = 0$. If $\delta > 0$, then $I - (\mu I - A)^{-1}E$ is singular and we have
\[
1 \leq \| (\mu I - A)^{-1}E \|_2 \leq \| (\mu I - A)^{-1} \|_2 \| E \|_2 = \| [(\mu I - D) - N]^{-1} \|_2 \| E \|_2 .
\]
Since $(\mu I - D)$ is diagonal, it follows that $[(\mu I - D)^{-1}N]^n = 0$ and therefore
\[
[(\mu I - D) - N]^{-1} = \sum_{k=0}^{n-1} [(\mu I - D)^{-1}N]^k (\mu I - D)^{-1}.
\]
Hence we have
\[
1 \leq \frac{\| E \|_2}{\delta} \max\Big\{1, \frac{1}{\delta^{n-1}}\Big\} \sum_{k=0}^{n-1} \| N \|_2^k,
\]
from which the theorem readily follows.
Example 5.2.2 If
\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 4.001 \end{bmatrix}
\quad\text{and}\quad
E = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0.001 & 0 & 0 \end{bmatrix},
\]
then $\sigma(A + E) \cong \{1.0001, 4.0582, 3.9427\}$, and $A$'s matrix of eigenvectors satisfies $\kappa_2(X) \cong 10^7$. The Bauer-Fike bound in Theorem 5.2.2 has order $10^4$, but the Schur bound in Theorem 5.2.3 has order $10^0$.
Theorems 5.2.2 and 5.2.3 each indicate potential eigenvalue sensitivity if $A$ is non-normal. Specifically, if $\kappa_2(X)$ or $\| N \|_2^{n-1}$ is large, then small changes in $A$ can induce large changes in the eigenvalues.
Example 5.2.3 If
\[
A = \begin{bmatrix} 0 & I_9 \\ 0 & 0 \end{bmatrix}
\quad\text{and}\quad
E = \begin{bmatrix} 0 & 0 \\ 10^{-10} & 0 \end{bmatrix},
\]
then for all $\lambda \in \sigma(A)$ and $\mu \in \sigma(A + E)$, $|\lambda - \mu| = 10^{-10/10} = 10^{-1}$. So a change of order $10^{-10}$ in $A$ results in a change of order $10^{-1}$ in its eigenvalues.
Let $\lambda$ be a simple eigenvalue of $A \in \mathbb{C}^{n\times n}$, and let $x$ and $y$ satisfy $Ax = \lambda x$ and $y^*A = \lambda y^*$ with $\| x \|_2 = \| y \|_2 = 1$. Using classical results from function theory, it can be shown that there exist differentiable $x(\varepsilon)$ and $\lambda(\varepsilon)$ such that
\[
(A + \varepsilon F)x(\varepsilon) = \lambda(\varepsilon)x(\varepsilon)
\]
with $\| x(\varepsilon) \|_2 = 1$ and $\| F \|_2 \leq 1$, and such that $\lambda(0) = \lambda$ and $x(0) = x$. Differentiating and setting $\varepsilon = 0$ gives
\[
A\dot{x}(0) + Fx = \dot{\lambda}(0)x + \lambda\dot{x}(0).
\]
Applying $y^*$ to both sides and dividing by $y^*x$ yields $\dot{\lambda}(0) = \dfrac{y^*Fx}{y^*x}$.
To justify such expansions, consider a polynomial in $y$ whose coefficients depend on a parameter $x$:
\[
f(x, y) = y^n + p_{n-1}(x)y^{n-1} + p_{n-2}(x)y^{n-2} + \cdots + p_1(x)y + p_0(x).
\]
For fixed $x$, $f(x, y) = 0$ has $n$ roots $y_1(x), \cdots, y_n(x)$; in particular, $f(0, y) = 0$ has the $n$ roots $y_1(0), \cdots, y_n(0)$.
Theorem 5.2.4 Suppose $y_i(0)$ is a simple root of $f(0, y) = 0$. Then there is $\delta_i > 0$ such that there is a simple root $y_i(x)$ of $f(x, y) = 0$ given by
\[
y_i(x) = y_i(0) + p_{i1}x + p_{i2}x^2 + \cdots \quad \text{(the series may terminate!)},
\]
where the series is convergent for $|x| < \delta_i$; thus $y_i(x) \longrightarrow y_i(0)$ as $x \longrightarrow 0$.
Theorem 5.2.5 If $y_1(0) = \cdots = y_m(0)$ is a root of multiplicity $m$ of $f(0, y) = 0$, then there exists $\delta > 0$ such that there are exactly $m$ roots of $f(x, y) = 0$ when $|x| < \delta$, having the following properties:
(a) The $m$ roots fall into $r$ groups of sizes $m_1, \cdots, m_r$ with $\sum_{i=1}^{r} m_i = m$, $m_i \geq 1$.
(b) The roots in the $i$th group are the $m_i$ values of a series $y_1(0) + p_{i1}z + p_{i2}z^2 + \cdots$ corresponding to the $m_i$ different values of $z$ defined by $z = x^{1/m_i}$.
Now let $\lambda_1$ be a simple eigenvalue of $A$ and $x_1$ a corresponding eigenvector. Since $\lambda_1$ is simple, $(A - \lambda_1 I)$ has at least one nonzero minor of order $n - 1$. Suppose this lies in the first $n - 1$ rows of $(A - \lambda_1 I)$, and take $x_1 = (A_{n1}, A_{n2}, \cdots, A_{nn})^T$, where $A_{ni}$ is the cofactor of $a_{ni}$. Then
\[
(A - \lambda_1 I)
\begin{bmatrix} A_{n1} \\ A_{n2} \\ \vdots \\ A_{nn} \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix},
\]
since $\sum_{j=1}^{n} a_{nj}A_{nj} = \det(A - \lambda_1 I) = 0$. Each cofactor $A_{ni}$ is a polynomial in $\lambda_1$ of degree not greater than $n - 1$.
Let $\lambda_1(\varepsilon)$ be the simple eigenvalue of $A + \varepsilon F$ and $x_1(\varepsilon)$ a corresponding eigenvector. Then the elements of $x_1(\varepsilon)$ are polynomials in $\lambda_1(\varepsilon)$ and $\varepsilon$. Since the power series for $\lambda_1(\varepsilon)$ is convergent for small $\varepsilon$, it follows that $x_1(\varepsilon) = x_1 + \varepsilon z_1 + \varepsilon^2 z_2 + \cdots$ is a convergent power series as well.
Returning to the derivative above, we obtain
\[
\big| \dot{\lambda}(0) \big| = \frac{|y^*Fx|}{|y^*x|} \leq \frac{1}{|y^*x|},
\]
and the upper bound is attained for $F = yx^*$. We refer to the reciprocal of $s(\lambda) \equiv |y^*x|$ as the condition number of the eigenvalue $\lambda$. Since $\lambda(\varepsilon) = \lambda(0) + \dot{\lambda}(0)\varepsilon + O(\varepsilon^2)$, an eigenvalue $\lambda$ may be perturbed by an amount $\varepsilon/s(\lambda)$; if $s(\lambda)$ is small, then $\lambda$ is appropriately regarded as ill-conditioned. Note that $s(\lambda)$ is the cosine of the angle between the left and right eigenvectors associated with $\lambda$ and is unique only if $\lambda$ is simple. A small $s(\lambda)$ implies that $A$ is near a matrix having a multiple eigenvalue. In particular, if $\lambda$ is distinct and $s(\lambda) < 1$, then there exists an $E$ with
\[
\| E \|_2 \leq \frac{s(\lambda)}{\sqrt{1 - s^2(\lambda)}}
\]
such that $\lambda$ is a repeated eigenvalue of $A + E$; this is proved in Wilkinson (1972).
Example 5.2.4 If
\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 4.001 \end{bmatrix}
\quad\text{and}\quad
E = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0.001 & 0 & 0 \end{bmatrix}
\]
(as in Example 5.2.2), then $\sigma(A + E) \cong \{1.0001, 4.0582, 3.9427\}$ and
\[
s(1) \cong 0.79 \times 10^0, \qquad s(4) \cong 0.16 \times 10^{-3}, \qquad s(4.001) \cong 0.16 \times 10^{-3}.
\]
Observe that $\| E \|_2 / s(\lambda)$ is a good estimate of the perturbation that each eigenvalue undergoes.
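The condition numbers $s(\lambda)$ can be computed from unit left and right eigenvectors (a sketch, assuming NumPy; pairing the left and right eigenvectors by nearest eigenvalue is our shortcut for this well-separated example):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [0.0, 4.0, 5.0],
              [0.0, 0.0, 4.001]])
lam, X = np.linalg.eig(A)    # right eigenvectors (unit columns)
mu, Y = np.linalg.eig(A.T)   # left eigenvectors of A = right eigenvectors of A^T
for i in range(3):
    j = np.argmin(np.abs(mu - lam[i]))   # match the left/right pair
    s = abs(Y[:, j] @ X[:, i])           # s(lambda) = |y^* x|
    print(lam[i], s)   # s(1) ~ 0.79, s(4) ~ 1.6e-4, s(4.001) ~ 1.6e-4
```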
If $\lambda$ is a repeated eigenvalue, then the eigenvalue sensitivity question is more complicated. For example, if
\[
A = \begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix}
\quad\text{and}\quad
F = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix},
\]
then $\sigma(A + \varepsilon F) = \{1 \pm \sqrt{\varepsilon a}\}$. Note that if $a \neq 0$, then the eigenvalues of $A + \varepsilon F$ are not differentiable at zero; their rate of change at the origin is infinite. In general, if $\lambda$ is a defective eigenvalue of $A$, then $O(\varepsilon)$ perturbations in $A$ result in $O(\varepsilon^{1/p})$ perturbations in $\lambda$, where $p \geq 2$ (see Wilkinson, AEP, p. 77 for a more detailed discussion).
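A quick check of this square-root behavior (a sketch, assuming NumPy):

```python
import numpy as np

a, eps = 2.0, 1e-8
ApF = np.array([[1.0, a],
                [eps, 1.0]])    # A + eps*F
print(np.linalg.eigvals(ApF))   # 1 +/- sqrt(eps*a) ~ 1 +/- 1.41e-4
print(1 + np.sqrt(eps * a), 1 - np.sqrt(eps * a))
```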
We now consider the perturbations of invariant subspaces. Assume $A \in \mathbb{C}^{n\times n}$ has distinct eigenvalues $\lambda_1, \cdots, \lambda_n$ and $\| F \|_2 = 1$. We have
\[
(A + \varepsilon F)x_k(\varepsilon) = \lambda_k(\varepsilon)x_k(\varepsilon), \qquad \| x_k(\varepsilon) \|_2 = 1,
\]
and
\[
y_k^*(\varepsilon)(A + \varepsilon F) = \lambda_k(\varepsilon)y_k^*(\varepsilon), \qquad \| y_k(\varepsilon) \|_2 = 1,
\]
for $k = 1, \cdots, n$, where each $\lambda_k(\varepsilon)$, $x_k(\varepsilon)$ and $y_k(\varepsilon)$ is differentiable. Differentiating the first equation and setting $\varepsilon = 0$ gives
\[
A\dot{x}_k(0) + Fx_k = \dot{\lambda}_k(0)x_k + \lambda_k\dot{x}_k(0),
\]
where $\lambda_k = \lambda_k(0)$ and $x_k = x_k(0)$. Since $\{x_i\}_{i=1}^n$ are linearly independent, we may write $\dot{x}_k(0) = \sum_{i=1}^n a_i x_i$, so we have
\[
\sum_{i=1, i\neq k}^{n} a_i(\lambda_i - \lambda_k)x_i + Fx_k = \dot{\lambda}_k(0)x_k.
\]
But $y_i^*(0)x_k = y_i^*x_k = 0$ for $i \neq k$, and thus
\[
a_i = y_i^*Fx_k/[(\lambda_k - \lambda_i)y_i^*x_i], \qquad i \neq k.
\]
Hence the Taylor expansion for $x_k(\varepsilon)$ is
\[
x_k(\varepsilon) = x_k + \varepsilon \sum_{i=1, i\neq k}^{n} \left\{ \frac{y_i^*Fx_k}{(\lambda_k - \lambda_i)y_i^*x_i} \right\} x_i + O(\varepsilon^2).
\]
Thus the sensitivity of $x_k$ depends upon both the eigenvalue sensitivity and the separation of $\lambda_k$ from the other eigenvalues.
Example 5.2.5 If
\[
A = \begin{bmatrix} 1.01 & 0.01 \\ 0.00 & 0.99 \end{bmatrix},
\]
then $\lambda = 0.99$ has condition number $\frac{1}{s(0.99)} \cong 1.118$ and associated eigenvector $x = (0.4472, -0.8944)^T$. On the other hand, the eigenvalue $\tilde{\lambda} = 1.00$ of the "nearby" matrix
\[
A + E = \begin{bmatrix} 1.01 & 0.01 \\ 0.00 & 1.00 \end{bmatrix}
\]
has eigenvector $\tilde{x} = (0.7071, -0.7071)^T$. Suppose
\[
Q^*AQ = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix}, \qquad T_{11} \in \mathbb{C}^{p\times p}, \quad T_{22} \in \mathbb{C}^{q\times q}, \quad q = n - p, \tag{5.2.1}
\]
is a Schur decomposition of $A$, with
\[
Q = [Q_1, Q_2], \qquad Q_1 \in \mathbb{C}^{n\times p}, \quad Q_2 \in \mathbb{C}^{n\times(n-p)}. \tag{5.2.2}
\]
Definition 5.2.1 We define the separation between $T_{11}$ and $T_{22}$ by
\[
\operatorname{sep}_F(T_{11}, T_{22}) = \min_{Z\neq 0} \frac{\| T_{11}Z - ZT_{22} \|_F}{\| Z \|_F}.
\]
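Since $\operatorname{vec}(T_{11}Z - ZT_{22}) = (I \otimes T_{11} - T_{22}^T \otimes I)\operatorname{vec}(Z)$, the quantity $\operatorname{sep}_F(T_{11}, T_{22})$ is the smallest singular value of this Kronecker matrix, which gives a direct (if expensive) way to compute it. A sketch (assuming NumPy; `sep_F` is our own helper):

```python
import numpy as np

def sep_F(T11, T22):
    p, q = T11.shape[0], T22.shape[0]
    K = np.kron(np.eye(q), T11) - np.kron(T22.T, np.eye(p))  # pq x pq matrix
    return np.linalg.svd(K, compute_uv=False).min()

T11 = np.array([[1.0, 2.0],
                [0.0, 1.5]])
T22 = np.array([[3.0]])
print(sep_F(T11, T22))
```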
Definition 5.2.2 Let $X$ be a subspace of $\mathbb{C}^n$. $X$ is called an invariant subspace of $A \in \mathbb{C}^{n\times n}$ if $AX \subset X$ (i.e. $x \in X \Longrightarrow Ax \in X$).
Theorem 5.2.6 Let $A \in \mathbb{C}^{n\times n}$ and $V \in \mathbb{C}^{n\times r}$ with $\operatorname{rank}(V) = r$. Then the following are equivalent:
(a) There exists $S \in \mathbb{C}^{r\times r}$ such that $AV = VS$.
(b) $R(V)$ is an invariant subspace of $A$.
Proof: Trivial!
Remark 5.2.2 (a) If $Sz = \mu z$, $z \neq 0$, then $\mu$ is an eigenvalue of $A$ with eigenvector $Vz$.
(b) If $V$ is a basis of $X$, then $\tilde{V} = V(V^*V)^{-1/2}$ is an orthonormal basis of $X$.
Theorem 5.2.7 Let $A \in \mathbb{C}^{n\times n}$ and let $Q = (Q_1, Q_2)$ be orthogonal. Then the following are equivalent:
(a) If $Q^*AQ = B = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}$, then $B_{21} = 0$.
(b) $R(Q_1)$ is an invariant subspace of $A$.
Proof: $Q^*AQ = B \iff AQ = QB = (Q_1, Q_2)\begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}$. Thus $AQ_1 = Q_1B_{11} + Q_2B_{21}$.
(a) $\Rightarrow$ (b): If $B_{21} = 0$, then $AQ_1 = Q_1B_{11}$, so $R(Q_1)$ is an invariant subspace of $A$ (by Theorem 5.2.6).
(b) $\Rightarrow$ (a): If $R(Q_1)$ is an invariant subspace, there exists $S$ such that $AQ_1 = Q_1S = Q_1B_{11} + Q_2B_{21}$. Multiplying by $Q_1^*$ gives
\[
S = Q_1^*Q_1S = Q_1^*Q_1B_{11} + Q_1^*Q_2B_{21} = B_{11}.
\]
Hence $Q_2B_{21} = 0$, so $Q_2^*Q_2B_{21} = 0$ and therefore $B_{21} = 0$.
Theorem 5.2.8 Suppose (5.2.1) and (5.2.2) hold, and for $E \in \mathbb{C}^{n\times n}$ partition $Q^*EQ$ as
\[
Q^*EQ = \begin{bmatrix} E_{11} & E_{12} \\ E_{21} & E_{22} \end{bmatrix}
\]
with $E_{11} \in \mathbb{C}^{p\times p}$ and $E_{22} \in \mathbb{C}^{(n-p)\times(n-p)}$. If
\[
\delta = \operatorname{sep}_2(T_{11}, T_{22}) - \| E_{11} \|_2 - \| E_{22} \|_2 > 0
\]
and
\[
\| E_{21} \|_2 \, \big( \| T_{12} \|_2 + \| E_{12} \|_2 \big) \leq \delta^2/4,
\]
then there exists $P \in \mathbb{C}^{(n-p)\times p}$ such that
\[
\| P \|_2 \leq 2\| E_{21} \|_2 / \delta
\]
and such that the columns of $\tilde{Q}_1 = (Q_1 + Q_2P)(I + P^*P)^{-1/2}$ form an orthonormal basis for an invariant subspace of $A + E$ (see Stewart 1973).
Lemma 5.2.1 Let $\{s_m\}$ and $\{p_m\}$ be the two sequences defined by
\[
s_{m+1} = s_m/(1 - 2\eta p_m s_m), \qquad p_{m+1} = \eta p_m^2 s_{m+1}, \qquad m = 0, 1, 2, \cdots \tag{5.2.3}
\]
with
\[
s_0 = \sigma, \qquad p_0 = \sigma\gamma, \tag{5.2.4}
\]
where $\sigma, \eta, \gamma > 0$ satisfy
\[
4\eta\sigma^2\gamma < 1. \tag{5.2.5}
\]
Then $\{s_m\}$ is monotonically increasing and bounded above, and $\{p_m\}$ is monotonically decreasing and converges quadratically to zero.
Proof: Let
\[
x_m = s_m p_m, \qquad m = 0, 1, 2, \cdots. \tag{5.2.6}
\]
From (5.2.3) we have
\[
x_{m+1} = s_{m+1}p_{m+1} = \eta p_m^2 s_m^2/(1 - 2\eta p_m s_m)^2 = \eta x_m^2/(1 - 2\eta x_m)^2, \tag{5.2.7}
\]
and (5.2.5) can be written as
\[
0 < x_0 < \frac{1}{4\eta} \qquad \Big(\text{since } x_0 = s_0 p_0 = \sigma^2\gamma < \frac{1}{4\eta}\Big). \tag{5.2.8}
\]
Consider the fixed-point equation
\[
x = f(x), \qquad f(x) = \eta x^2/(1 - 2\eta x)^2, \quad x \geq 0. \tag{5.2.9}
\]
By
\[
\frac{df(x)}{dx} = \frac{2\eta x}{(1 - 2\eta x)^3},
\]
we know that $f(x)$ is differentiable and monotonically increasing on $[0, 1/2\eta)$, with $\frac{df(x)}{dx}\big|_{x=0} = 0$; the equation (5.2.9) has exactly the roots $0$ and $1/4\eta$ in $[0, 1/2\eta)$. Under condition (5.2.8), the iterates $x_m$ of (5.2.7) therefore decrease monotonically and converge quadratically to zero (Isaacson & Keller, Analysis of Numerical Methods, 1966, Chapter 3, §1). Thus
\[
\frac{s_{m+1}}{s_m} = \frac{1}{1 - 2\eta x_m} = 1 + \frac{2\eta x_m}{1 - 2\eta x_m} = 1 + t_m,
\]
where $t_m$ is monotonically decreasing and converges quadratically to zero. Hence
\[
s_{m+1} = s_0 \prod_{j=0}^{m} \frac{s_{j+1}}{s_j} = s_0 \prod_{j=0}^{m} (1 + t_j)
\]
is monotonically increasing and converges to $s_0 \prod_{j=0}^{\infty} (1 + t_j) < \infty$, so $p_m = \dfrac{x_m}{s_m}$ is monotonically decreasing and quadratically convergent to zero.
Theorem 5.2.9 Let
\[
PA_{12}P + PA_{11} - A_{22}P - A_{21} = 0 \tag{5.2.10}
\]
be the quadratic matrix equation in $P \in \mathbb{C}^{(n-l)\times l}$ ($1 \leq l \leq n$), where
\[
A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \qquad \sigma(A_{11}) \cap \sigma(A_{22}) = \emptyset.
\]
Define the operator $T$ by
\[
TQ \equiv QA_{11} - A_{22}Q, \qquad Q \in \mathbb{C}^{(n-l)\times l}. \tag{5.2.11}
\]
Let
\[
\eta = \| A_{12} \|, \qquad \gamma = \| A_{21} \| \tag{5.2.12}
\]
and
\[
\sigma = \| T^{-1} \| = \sup_{\|P\|=1} \| T^{-1}P \|. \tag{5.2.13}
\]
If
\[
4\eta\sigma^2\gamma < 1, \tag{5.2.14}
\]
then the following iteration yields a solution $P$ of (5.2.10) satisfying
\[
\| P \| \leq 2\sigma\gamma, \tag{5.2.15}
\]
and the iteration converges quadratically.
Iteration: Let
\[
A_m = \begin{bmatrix} A_{11}^{(m)} & A_{12}^{(m)} \\ A_{21}^{(m)} & A_{22}^{(m)} \end{bmatrix}, \qquad A_0 = A.
\]
(i) Solve
\[
T_m P_m \equiv P_m A_{11}^{(m)} - A_{22}^{(m)} P_m = A_{21}^{(m)} \tag{5.2.16}
\]
for $P_m \in \mathbb{C}^{(n-l)\times l}$;
(ii) Compute
\[
A_{11}^{(m+1)} = A_{11}^{(m)} + A_{12}P_m, \qquad A_{22}^{(m+1)} = A_{22}^{(m)} - P_mA_{12}, \qquad A_{21}^{(m+1)} = -P_mA_{12}P_m,
\]
then go to (i) and solve for $P_{m+1}$. Then
\[
P = \lim_{m\to\infty} \sum_{i=0}^{m} P_i \tag{5.2.17}
\]
is a solution of (5.2.10) and satisfies (5.2.15).
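The iteration can be implemented directly, solving (5.2.16) as a Sylvester equation at each step. A sketch (assuming NumPy and SciPy; `scipy.linalg.solve_sylvester` solves $aX + Xb = q$, which matches (5.2.16) with $a = -A_{22}^{(m)}$, $b = A_{11}^{(m)}$; the helper `solve_quadratic` is our own name, and the residual printed at the end is the left-hand side of (5.2.10)):

```python
import numpy as np
from scipy.linalg import solve_sylvester

def solve_quadratic(A, l, steps=8):
    """Accumulate P = sum of P_m for the quadratic matrix equation (5.2.10)."""
    A11, A12 = A[:l, :l], A[:l, l:]
    A21, A22 = A[l:, :l], A[l:, l:]
    P = np.zeros((A.shape[0] - l, l))
    for _ in range(steps):
        Pm = solve_sylvester(-A22, A11, A21)   # P_m A11 - A22 P_m = A21, step (i)
        A11 = A11 + A12 @ Pm                   # block updates of step (ii)
        A22 = A22 - Pm @ A12
        A21 = -Pm @ A12 @ Pm
        P = P + Pm
    return P

A = np.array([[1.0, 0.2, 0.1],
              [0.1, 3.0, 0.3],
              [0.2, 0.1, 4.0]])
P = solve_quadratic(A, 1)
A11, A12, A21, A22 = A[:1, :1], A[:1, 1:], A[1:, :1], A[1:, 1:]
print(P @ A12 @ P + P @ A11 - A22 @ P - A21)   # approximately zero
```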
Proof: (a) We first prove that $T_m^{-1}$ exists for $m = 0, 1, 2, \cdots$. Denote
\[
\| T_m^{-1} \| = \sigma_m \qquad (T = T_0,\ \sigma = \sigma_0); \tag{5.2.18}
\]
we claim that
\[
4\| A_{12} \|\,\| P_m \|\,\sigma_m < 1. \tag{5.2.19}
\]
We proceed by induction. For $m = 0$: from $\sigma(A_{11}) \cap \sigma(A_{22}) = \emptyset$ it follows that $T_0 = T$ is nonsingular, and from (5.2.12)-(5.2.14),
\[
4\| A_{12} \|\,\| P_0 \|\,\sigma_0 = 4\eta\| T^{-1}A_{21} \|\sigma \leq 4\eta\sigma^2\gamma < 1.
\]
Suppose $T_m^{-1}$ exists and (5.2.19) holds; we show that $T_{m+1}^{-1}$ exists and $4\| A_{12} \|\,\| P_{m+1} \|\,\sigma_{m+1} < 1$. From the definition
\[
\operatorname{sep}(A_{11}, A_{22}) = \inf_{\|Q\|=1} \| QA_{11} - A_{22}Q \|
\]
and the existence of $T^{-1}$ it follows that $\operatorname{sep}(A_{11}, A_{22}) = \| T^{-1} \|^{-1} = \sigma^{-1}$, and by the perturbation property of "sep",
\[
\operatorname{sep}(A_{11}^{(m+1)}, A_{22}^{(m+1)}) = \operatorname{sep}(A_{11}^{(m)} + A_{12}P_m,\ A_{22}^{(m)} - P_mA_{12})
\geq \operatorname{sep}(A_{11}^{(m)}, A_{22}^{(m)}) - \| A_{12}P_m \| - \| P_mA_{12} \|
\geq \frac{1 - 2\| A_{12} \|\| P_m \|\sigma_m}{\sigma_m} > 0. \tag{5.2.20}
\]
From
\[
\operatorname{sep}(A_{11}, A_{22}) \leq \min\{|\lambda_1 - \lambda_2| : \lambda_1 \in \sigma(A_{11}),\ \lambda_2 \in \sigma(A_{22})\}
\]
we have $\sigma(A_{11}^{(m+1)}) \cap \sigma(A_{22}^{(m+1)}) = \emptyset$; hence $T_{m+1}^{-1}$ exists and $\operatorname{sep}(A_{11}^{(m+1)}, A_{22}^{(m+1)}) = \| T_{m+1}^{-1} \|^{-1} = \sigma_{m+1}^{-1}$. From (5.2.20) it follows that
\[
\sigma_{m+1} \leq \frac{\sigma_m}{1 - 2\| A_{12} \|\| P_m \|\sigma_m}. \tag{5.2.21}
\]
Substituting (5.2.19) into (5.2.21), we get $\sigma_{m+1} \leq 2\sigma_m$, and
\[
\| P_{m+1} \| \leq \| T_{m+1}^{-1} \|\,\| A_{21}^{(m+1)} \| \leq \sigma_{m+1}\| P_m \|^2\| A_{12} \| < \tfrac{1}{2}\| P_m \|.
\]
Hence
\[
2\| A_{12} \|\,\| P_{m+1} \|\,\sigma_{m+1} \leq 2\| A_{12} \|\,\| P_m \|\,\sigma_m < 1/2.
\]
This proves that $T_m^{-1}$ exists for all $m = 0, 1, 2, \cdots$ and that (5.2.19) holds.
(b) Next we prove that $\| P_m \|$ converges quadratically to zero. Construct sequences $\{q_m\}$, $\{s_m\}$, $\{p_m\}$ satisfying
\[
\| A_{21}^{(m)} \| \leq q_m, \qquad \sigma_m \leq s_m, \qquad \| P_m \| \leq p_m. \tag{5.2.22}
\]
From
\[
A_{21}^{(m+1)} = -P_mA_{12}P_m \tag{5.2.23}
\]
follows
\[
\| A_{21}^{(m+1)} \| \leq \| A_{12} \|\,\| P_m \|^2 \leq \eta p_m^2. \tag{5.2.24}
\]
Define $\{q_m\}$ by
\[
q_{m+1} = \eta p_m^2, \qquad q_0 = \gamma. \tag{5.2.25}
\]
From (5.2.21) we have
\[
\sigma_{m+1} \leq \frac{s_m}{1 - 2\eta p_m s_m}; \tag{5.2.26}
\]
define $\{s_m\}$ by
\[
s_{m+1} = \frac{s_m}{1 - 2\eta p_m s_m}, \qquad s_0 = \sigma. \tag{5.2.27}
\]
From (5.2.16) we have
\[
\| P_m \| \leq \| T_m^{-1} \|\,\| A_{21}^{(m)} \| = \sigma_m\| A_{21}^{(m)} \| \leq s_m q_m;
\]
define $\{p_m\}$ by
\[
p_{m+1} = s_{m+1}q_{m+1} = \eta p_m^2 s_{m+1}, \qquad p_0 = \sigma\gamma. \tag{5.2.28}
\]
By Lemma 5.2.1, $\{p_m\}$ decreases monotonically to zero, and from (5.2.22) it follows that $\| P_m \| \longrightarrow 0$ quadratically.
(c) Finally we prove that $P^{(m)} = \sum_{i=0}^{m} P_i \longrightarrow P$ and that (5.2.15) holds. Following the method of Lemma 5.2.1, construct $\{x_m\}$ (see (5.2.6), (5.2.7)), that is,
\[
x_{m+1} = \frac{\eta x_m^2}{(1 - 2\eta x_m)^2}, \qquad s_{m+1} = \frac{s_m}{1 - 2\eta x_m}, \tag{5.2.29}
\]
and then
\[
p_{m+1} = \frac{x_{m+1}}{s_{m+1}} = \frac{\eta x_m}{1 - 2\eta x_m}\, p_m. \tag{5.2.30}
\]
By induction, for all $m = 1, 2, \cdots$ we have
\[
p_m < \tfrac{1}{2}p_{m-1}, \qquad x_m < \frac{1}{4\eta}. \tag{5.2.31}
\]
In fact, substituting
\[
\frac{\eta x_0}{1 - 2\eta x_0} = \frac{\eta\sigma^2\gamma}{1 - 2\eta\sigma^2\gamma} < \frac{1}{2} \tag{5.2.32}
\]
into (5.2.30) gives $p_1 < \tfrac{1}{2}p_0$, and from (5.2.29) and (5.2.32) it follows that
\[
x_1 = \frac{1}{\eta}\left(\frac{\eta x_0}{1 - 2\eta x_0}\right)^2 < \frac{1}{4\eta}.
\]