Part II
On the Numerical Solutions of Eigenvalue Problems
Chapter 5
The Unsymmetric Eigenvalue Problem
Generalized eigenvalue problem (GEVP):
Given $A, B \in \mathbb{C}^{n\times n}$, determine $\lambda \in \mathbb{C}$ and $0 \neq x \in \mathbb{C}^n$ with $Ax = \lambda Bx$. Then $\lambda$ is called an eigenvalue of the pencil $A - \lambda B$ (or of the pair $(A, B)$) and $x$ is called an eigenvector corresponding to $\lambda$. Note that $\lambda$ is an eigenvalue of $A - \lambda B$ $\iff \det(A - \lambda B) = 0$; we write $\sigma(A, B) \equiv \{\lambda \in \mathbb{C} \mid \det(A - \lambda B) = 0\}$.
Definition 5.0.2 A pencil $A - \lambda B$ ($A, B \in \mathbb{R}^{m\times n}$), or a pair $(A, B)$, is called regular if
(i) $A$ and $B$ are square matrices of order $n$, and
(ii) $\det(A - \lambda B) \not\equiv 0$.
In all other cases ($m \neq n$, or $m = n$ but $\det(A - \lambda B) \equiv 0$), the pencil is called singular.
For the detailed algebraic structure of a pencil $A - \lambda B$, see Gantmacher (1959), Matrix Theory II, Chapter XII.
Eigenvalue problem (EVP):
In the special case of the GEVP with $B = I$, we seek $\lambda \in \mathbb{C}$ and $0 \neq x \in \mathbb{C}^n$ with $Ax = \lambda x$. Then $\lambda$ is an eigenvalue of $A$ and $x$ is an eigenvector corresponding to $\lambda$.
Definition 5.0.3 (a) $\sigma(A) = \{\lambda \in \mathbb{C} \mid \det(A - \lambda I) = 0\}$ is called the spectrum of $A$.
(b) $\rho(A) = \max\{|\lambda| : \lambda \in \sigma(A)\}$ is called the spectral radius of $A$.
(c) $P(\lambda) = \det(\lambda I - A)$ is called the characteristic polynomial of $A$.
Let
\[
P(\lambda) = \prod_{i=1}^{s} (\lambda - \lambda_i)^{m(\lambda_i)}, \qquad \lambda_i \neq \lambda_j \ (i \neq j), \qquad \sum_{i=1}^{s} m(\lambda_i) = n.
\]
Example 5.0.2 $A = \begin{bmatrix} 2 & 2 \\ 0 & 3 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ $\Longrightarrow \det(A - \lambda B) = 3(2 - \lambda)$ and $\sigma(A, B) = \{2\}$.
Example 5.0.3 $A = \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix}$, $B = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ $\Longrightarrow \det(A - \lambda B) = 3$ and $\sigma(A, B) = \emptyset$.
Example 5.0.4 $A = \begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ $\Longrightarrow \det(A - \lambda B) \equiv 0$ and $\sigma(A, B) = \mathbb{C}$.
Example 5.0.5 For $A$, $B$ as in Example 5.0.2, $\det(\mu A - \lambda B) = 3\mu(2\mu - \lambda)$.
$\mu = 1$: $Ax = \lambda Bx \Longrightarrow \lambda = 2$.
$\lambda = 1$: $Bx = \mu Ax \Longrightarrow \mu = 0$ or $\mu = \frac{1}{2} \Longrightarrow \lambda = \infty$ or $\lambda = 2$.
Hence $\sigma(A, B) = \{2, \infty\}$.
Example 5.0.6 For $A$, $B$ as in Example 5.0.3, $\det(\mu A - \lambda B) = \mu \cdot 3\mu = 3\mu^2$.
$\mu = 1$: no solution for $\lambda$.
$\lambda = 1$: $Bx = \mu Ax \Longrightarrow \mu = 0, 0$ (a double root). Hence $\sigma(A, B) = \{\infty, \infty\}$.
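These pencils are easy to examine in software. The following is a quick illustration (a sketch, not part of the original text; it assumes NumPy and SciPy are available): `scipy.linalg.eig` accepts a pair $(A, B)$ and reports the infinite eigenvalues of the pencil as `inf`, in agreement with Example 5.0.5.

```python
import numpy as np
from scipy.linalg import eig

# A, B as in Example 5.0.2; sigma(A, B) = {2, infinity} by Example 5.0.5
A = np.array([[2.0, 2.0],
              [0.0, 3.0]])
B = np.array([[1.0, 0.0],
              [0.0, 0.0]])
w, _ = eig(A, B)   # generalized eigenvalues of the pencil A - lambda*B
print(w)           # approximately [2., inf]
```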
Let
\[
m(\lambda_i) := \text{algebraic multiplicity of } \lambda_i, \qquad
n(\lambda_i) := n - \operatorname{rank}(A - \lambda_i I) = \text{geometric multiplicity of } \lambda_i.
\]
Then $1 \leq n(\lambda_i) \leq m(\lambda_i)$.
If for some $i$, $n(\lambda_i) < m(\lambda_i)$, then $A$ is degenerate (defective). The following statements are equivalent:
(a) $A$ is diagonalizable: there exists a nonsingular matrix $T$ such that $T^{-1}AT = \operatorname{diag}(\lambda_1, \cdots, \lambda_n)$.
(b) There are $n$ linearly independent eigenvectors.
(c) $A$ is nondefective, i.e. $m(\lambda) = n(\lambda)$ for all $\lambda \in \sigma(A)$.
If $A$ is defective, then eigenvectors together with principal vectors lead to the Jordan form.
Theorem 5.0.3 (Jordan decomposition) If $A \in \mathbb{C}^{n\times n}$, then there exists a nonsingular $X \in \mathbb{C}^{n\times n}$ such that $X^{-1}AX = \operatorname{diag}(J_1, \cdots, J_t)$, where
\[
J_i = \begin{bmatrix} \lambda_i & 1 & & 0 \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ 0 & & & \lambda_i \end{bmatrix}
\]
is $m_i \times m_i$ and $m_1 + \cdots + m_t = n$.
Theorem 5.0.4 (Schur decomposition) If $A \in \mathbb{C}^{n\times n}$, then there exists a unitary matrix $U \in \mathbb{C}^{n\times n}$ such that $U^*AU\,(= U^{-1}AU)$ is upper triangular.
- $A$ is normal (i.e. $AA^* = A^*A$) $\iff$ there exists a unitary $U$ such that $U^*AU = \operatorname{diag}(\lambda_1, \cdots, \lambda_n)$, i.e. $Au_i = \lambda_i u_i$, $u_i^* u_j = \delta_{ij}$.
- $A$ is Hermitian (i.e. $A^* = A$) $\iff$ $A$ is normal and $\sigma(A) \subseteq \mathbb{R}$.
- $A$ is symmetric (i.e. $A^T = A$, $A \in \mathbb{R}^{n\times n}$) $\iff$ there exists an orthogonal $U$ such that $U^TAU = \operatorname{diag}(\lambda_1, \cdots, \lambda_n)$ and $\sigma(A) \subseteq \mathbb{R}$.
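Theorem 5.0.4 can be checked directly with standard software. A minimal sketch (assuming NumPy and SciPy are available; the test matrix reappears in Example 5.2.2 below):

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[1.0, 2.0, 3.0],
              [0.0, 4.0, 5.0],
              [0.0, 0.0, 4.001]])
T, U = schur(A, output='complex')                  # A = U T U^*, T upper triangular
assert np.allclose(U.conj().T @ U, np.eye(3))      # U is unitary
assert np.allclose(np.tril(T, -1), 0, atol=1e-12)  # T is upper triangular
assert np.allclose(U @ T @ U.conj().T, A)          # reconstruction of A
print(np.diag(T))                                  # eigenvalues on the diagonal of T
```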
5.1 Orthogonal Projections and C-S Decomposition
Definition 5.1.1 Let $S \subseteq \mathbb{R}^n$ be a subspace. $P \in \mathbb{R}^{n\times n}$ is the orthogonal projection onto $S$ if
\[
\operatorname{Range}(P) = S, \qquad P^2 = P, \qquad P^T = P, \tag{5.1.1}
\]
where $\operatorname{Range}(P) = R(P) = \{y \in \mathbb{R}^n \mid y = Px \text{ for some } x \in \mathbb{R}^n\}$.
Remark 5.1.1 If $x \in \mathbb{R}^n$, then $Px \in S$ and $(I - P)x \in S^{\perp}$.
Example 5.1.1 $P = vv^T/v^Tv$ is the orthogonal projection onto $S = \operatorname{span}\{v\}$, $v \in \mathbb{R}^n$.
Figure 5.1: Orthogonal projection of $x$ onto $S = \operatorname{span}\{v\}$.
Remark 5.1.2 (i) If $P_1$ and $P_2$ are orthogonal projections, then for any $z \in \mathbb{R}^n$ we have
\[
\| (P_1 - P_2)z \|_2^2 = (P_1 z)^T (I - P_2) z + (P_2 z)^T (I - P_1) z. \tag{5.1.2}
\]
If $R(P_1) = R(P_2) = S$, then the right-hand side of (5.1.2) is zero. Thus the orthogonal projection onto a subspace is unique.
(ii) If $V = [v_1, \cdots, v_k]$ is an orthonormal basis for $S$, then $P = VV^T$ is the unique orthogonal projection onto $S$.
Definition 5.1.2 Suppose $S_1$ and $S_2$ are subspaces of $\mathbb{R}^n$ with $\dim(S_1) = \dim(S_2)$. We define the distance between $S_1$ and $S_2$ by
\[
\operatorname{dist}(S_1, S_2) = \| P_1 - P_2 \|_2, \tag{5.1.3}
\]
where $P_i$ is the orthogonal projection onto $S_i$, $i = 1, 2$.
Remark 5.1.3 By considering the case of one-dimensional subspaces in $\mathbb{R}^2$, we obtain a geometrical interpretation of $\operatorname{dist}(\cdot, \cdot)$. Suppose $S_1 = \operatorname{span}\{x\}$ and $S_2 = \operatorname{span}\{y\}$, where $\|x\|_2 = \|y\|_2 = 1$.
[Figure: the lines $S_1 = \operatorname{span}\{x\}$ and $S_2 = \operatorname{span}\{y\}$ meeting at angle $\theta$.]
Assume that $x^T y = \cos\theta$, $\theta \in [0, \frac{\pi}{2}]$. It follows that the difference between the projections onto these spaces satisfies
\[
P_1 - P_2 = xx^T - yy^T = x[x - (y^Tx)y]^T - [y - (x^Ty)x]y^T.
\]
If $\theta = 0$ (so $x = y$), then $\operatorname{dist}(S_1, S_2) = \| P_1 - P_2 \|_2 = \sin\theta = 0$.
If $\theta \neq 0$, then
\[
U_x = [u_1, u_2] = \big[\,x,\; -[y - (y^Tx)x]/\sin\theta\,\big]
\quad\text{and}\quad
V_x = [v_1, v_2] = \big[\,[x - (x^Ty)y]/\sin\theta,\; y\,\big]
\]
are defined and orthogonal. It follows that
\[
P_1 - P_2 = U_x \operatorname{diag}[\sin\theta, \sin\theta]\, V_x^T
\]
is the SVD of $P_1 - P_2$. Consequently, $\operatorname{dist}(S_1, S_2) = \sin\theta$, the sine of the angle between the two subspaces.
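A quick numerical check of this remark (a sketch, assuming NumPy): for unit vectors $x$, $y$ at angle $\theta$, both singular values of $P_1 - P_2$ equal $\sin\theta$, so $\operatorname{dist}(S_1, S_2) = \sin\theta$.

```python
import numpy as np

theta = 0.3
x = np.array([1.0, 0.0])
y = np.array([np.cos(theta), np.sin(theta)])   # x^T y = cos(theta)
P1, P2 = np.outer(x, x), np.outer(y, y)        # the two orthogonal projections
svals = np.linalg.svd(P1 - P2, compute_uv=False)
print(svals, np.sin(theta))                    # both singular values = sin(theta)
```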
Theorem 5.1.1 (C-S Decomposition; Davis/Kahan (1970) or Stewart (1977)) If
\[
Q = \begin{bmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{bmatrix}
\]
is orthogonal with $Q_{11} \in \mathbb{R}^{k\times k}$ and $Q_{22} \in \mathbb{R}^{j\times j}$ ($k \geq j$), then there exist orthogonal matrices $U_1, V_1 \in \mathbb{R}^{k\times k}$ and orthogonal matrices $U_2, V_2 \in \mathbb{R}^{j\times j}$ such that
\[
\begin{bmatrix} U_1 & 0 \\ 0 & U_2 \end{bmatrix}^T
\begin{bmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{bmatrix}
\begin{bmatrix} V_1 & 0 \\ 0 & V_2 \end{bmatrix}
=
\begin{bmatrix} I & 0 & 0 \\ 0 & C & S \\ 0 & -S & C \end{bmatrix}, \tag{5.1.4}
\]
where
\[
C = \operatorname{diag}(c_1, \cdots, c_j), \quad c_i = \cos\theta_i, \qquad
S = \operatorname{diag}(s_1, \cdots, s_j), \quad s_i = \sin\theta_i,
\]
and $0 \leq \theta_1 \leq \theta_2 \leq \cdots \leq \theta_j \leq \frac{\pi}{2}$.

Lemma 5.1.1 Let $Q = \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix}$ have orthonormal columns, with $Q_1 \in \mathbb{R}^{n\times n}$. Then there are unitary matrices $U_1$, $U_2$ and $W$ such that
\[
\begin{bmatrix} U_1^T & 0 \\ 0 & U_2^T \end{bmatrix}
\begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} W
=
\begin{bmatrix} C \\ S \end{bmatrix},
\]
where $C = \operatorname{diag}(c_1, \cdots, c_n) \geq 0$ and $S = \operatorname{diag}(s_1, \cdots, s_n) \geq 0$ with $c_i^2 + s_i^2 = 1$, $i = 1, \cdots, n$.
Proof: Let $U_1^T Q_1 W = C$ be the SVD of $Q_1$. Then
\[
\begin{bmatrix} U_1^T & 0 \\ 0 & I \end{bmatrix}
\begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} W
=
\begin{bmatrix} C \\ Q_2 W \end{bmatrix}
\]
has orthonormal columns. Define $\tilde{Q}_2 \equiv Q_2 W$. Then $C^2 + \tilde{Q}_2^T \tilde{Q}_2 = I$, so $\tilde{Q}_2^T \tilde{Q}_2 = I - C^2$ is diagonal, which means that the nonzero columns of $\tilde{Q}_2$ are orthogonal to one another. If all the columns of $\tilde{Q}_2$ are nonzero, set $S = (\tilde{Q}_2^T \tilde{Q}_2)^{1/2}$ and $U_2 = \tilde{Q}_2 S^{-1}$; then $U_2^T U_2 = I$ and $U_2^T \tilde{Q}_2 = S$, and the decomposition follows.
If $\tilde{Q}_2$ has zero columns, normalize the nonzero columns and replace the zero columns with an orthonormal basis for the orthogonal complement of the column space of $\tilde{Q}_2$. It is easily verified that $U_2$ so defined is orthogonal and that $S \equiv U_2^T \tilde{Q}_2$ is diagonal, and the decomposition again follows.
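The proof is constructive and can be traced in a few lines of code (a sketch, assuming NumPy, and assuming for simplicity that all columns of $\tilde{Q}_2$ are nonzero):

```python
import numpy as np

n = 4
Q = np.linalg.qr(np.random.randn(2 * n, n))[0]   # [Q1; Q2] with orthonormal columns
Q1, Q2 = Q[:n, :], Q[n:, :]

U1, c, Wt = np.linalg.svd(Q1)                # U1^T Q1 W = C = diag(c)
W = Wt.T
Q2t = Q2 @ W                                 # Q2 tilde
s = np.sqrt(np.clip(1.0 - c**2, 0.0, None))  # c_i^2 + s_i^2 = 1
U2 = Q2t / s                                 # normalize the columns (all nonzero here)
assert np.allclose(U1.T @ Q1 @ W, np.diag(c))
assert np.allclose(U2.T @ Q2t, np.diag(s), atol=1e-10)
```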
Theorem 5.1.2 (C-S Decomposition) Let the unitary matrix $W \in \mathbb{C}^{n\times n}$ be partitioned in the form
\[
W = \begin{bmatrix} W_{11} & W_{12} \\ W_{21} & W_{22} \end{bmatrix},
\]
where $W_{11} \in \mathbb{C}^{r\times r}$ with $r \leq \frac{n}{2}$. Then there exist unitary matrices $U = \operatorname{diag}(U_1, U_2)$ and $V = \operatorname{diag}(V_1, V_2)$, with $U_1, V_1 \in \mathbb{C}^{r\times r}$ and $U_2, V_2 \in \mathbb{C}^{(n-r)\times(n-r)}$, such that
\[
U^*WV = \begin{bmatrix} \Gamma & -\Sigma & 0 \\ \Sigma & \Gamma & 0 \\ 0 & 0 & I \end{bmatrix} \tag{5.1.5}
\]
(block sizes $r$, $r$, $n-2r$), where $\Gamma = \operatorname{diag}(\gamma_1, \cdots, \gamma_r) \geq 0$ and $\Sigma = \operatorname{diag}(\sigma_1, \cdots, \sigma_r) \geq 0$ with $\gamma_i^2 + \sigma_i^2 = 1$, $i = 1, \cdots, r$.
Proof: Let $\Gamma = U_1^* W_{11} V_1$ be the SVD of $W_{11}$, with the diagonal elements of $\Gamma$ ordered as $\gamma_1 \leq \gamma_2 \leq \cdots \leq \gamma_k < 1 = \gamma_{k+1} = \cdots = \gamma_r$, i.e.
\[
\Gamma = \operatorname{diag}(\Gamma_0, I_{r-k}).
\]
The matrix $\begin{bmatrix} W_{11} \\ W_{21} \end{bmatrix} V_1$ has orthonormal columns. Hence
\[
I = \left[ \begin{pmatrix} W_{11} \\ W_{21} \end{pmatrix} V_1 \right]^* \left[ \begin{pmatrix} W_{11} \\ W_{21} \end{pmatrix} V_1 \right] = \Gamma^2 + (W_{21}V_1)^*(W_{21}V_1).
\]
Since $I$ and $\Gamma^2$ are diagonal, $(W_{21}V_1)^*(W_{21}V_1)$ is diagonal, so the columns of $W_{21}V_1$ are orthogonal. Since the $i$th diagonal entry of $I - \Gamma^2$ is the squared norm of the $i$th column of $W_{21}V_1$, only the first $k$ ($k \leq r \leq n-r$) columns of $W_{21}V_1$ are nonzero. Let $\hat{U}_2$ be unitary with its first $k$ columns the normalized columns of $W_{21}V_1$. Then
\[
\hat{U}_2^* W_{21} V_1 = \begin{bmatrix} \Sigma \\ 0 \end{bmatrix},
\]
where $\Sigma = \operatorname{diag}(\sigma_1, \cdots, \sigma_k, 0, \cdots, 0) \equiv \operatorname{diag}(\Sigma_0, 0)$ and $\hat{U}_2 \in \mathbb{C}^{(n-r)\times(n-r)}$. Since
\[
\operatorname{diag}(U_1, \hat{U}_2)^* \begin{pmatrix} W_{11} \\ W_{21} \end{pmatrix} V_1 = \begin{bmatrix} \Gamma \\ \Sigma \\ 0 \end{bmatrix}
\]
has orthonormal columns, we have $\gamma_i^2 + \sigma_i^2 = 1$, $i = 1, \cdots, r$ ($\Sigma_0$ is nonsingular).
By the same argument as above, there is a unitary $V_2 \in \mathbb{C}^{(n-r)\times(n-r)}$ such that
\[
U_1^* W_{12} V_2 = (T, 0),
\]
where $T = \operatorname{diag}(\tau_1, \cdots, \tau_r)$ with $\tau_i \leq 0$. Since $\gamma_i^2 + \tau_i^2 = 1$, it follows from $\gamma_i^2 + \sigma_i^2 = 1$ that $T = -\Sigma$. Set $\hat{U} = \operatorname{diag}(U_1, \hat{U}_2)$ and $V = \operatorname{diag}(V_1, V_2)$. Then $X = \hat{U}^* W V$ can be partitioned in the form
\[
X =
\begin{bmatrix}
\Gamma_0 & 0 & -\Sigma_0 & 0 & 0 \\
0 & I & 0 & 0 & 0 \\
\Sigma_0 & 0 & X_{33} & X_{34} & X_{35} \\
0 & 0 & X_{43} & X_{44} & X_{45} \\
0 & 0 & X_{53} & X_{54} & X_{55}
\end{bmatrix}
\]
with block sizes $k$, $r-k$, $k$, $r-k$, $n-2r$. Since block columns 1 and 4 are orthogonal, it follows that $\Sigma_0 X_{34} = 0$, so $X_{34} = 0$ ($\Sigma_0$ is nonsingular). Likewise $X_{35} = 0$, $X_{43} = 0$, and $X_{53} = 0$. From the orthogonality of block columns 1 and 3 it follows that $-\Gamma_0\Sigma_0 + \Sigma_0 X_{33} = 0$, so $X_{33} = \Gamma_0$. The matrix
\[
\hat{U}_3 = \begin{bmatrix} X_{44} & X_{45} \\ X_{54} & X_{55} \end{bmatrix}
\]
is unitary. Set $U_2 = \hat{U}_2 \operatorname{diag}(I_k, \hat{U}_3)$ and $U = \operatorname{diag}(U_1, U_2)$. Then
\[
U^* W V = \operatorname{diag}(I_{r+k}, \hat{U}_3^*)\, X =
\begin{bmatrix}
\Gamma_0 & 0 & -\Sigma_0 & 0 & 0 \\
0 & I & 0 & 0 & 0 \\
\Sigma_0 & 0 & \Gamma_0 & 0 & 0 \\
0 & 0 & 0 & I & 0 \\
0 & 0 & 0 & 0 & I
\end{bmatrix}.
\]
The theorem is proved.
Theorem 5.1.3 Let $W = [W_1, W_2]$ and $Z = [Z_1, Z_2]$ be orthogonal, where $W_1, Z_1 \in \mathbb{R}^{n\times k}$ and $W_2, Z_2 \in \mathbb{R}^{n\times(n-k)}$. If $S_1 = R(W_1)$ and $S_2 = R(Z_1)$, then
\[
\operatorname{dist}(S_1, S_2) = \sqrt{1 - \sigma_{\min}^2(W_1^T Z_1)}. \tag{5.1.6}
\]
Proof: Let $Q = W^T Z$ and assume that $k \geq j = n - k$. Let the C-S decomposition of $Q$ be given by (5.1.4), with $Q_{ij} = W_i^T Z_j$, $i, j = 1, 2$. It follows that
\[
\| W_1^T Z_2 \|_2 = \| W_2^T Z_1 \|_2 = s_j = \sqrt{1 - c_j^2} = \sqrt{1 - \sigma_{\min}^2(W_1^T Z_1)}.
\]
Since $W_1 W_1^T$ and $Z_1 Z_1^T$ are the orthogonal projections onto $S_1$ and $S_2$, respectively, we have
\[
\operatorname{dist}(S_1, S_2) = \| W_1W_1^T - Z_1Z_1^T \|_2 = \| W^T (W_1W_1^T - Z_1Z_1^T) Z \|_2 = \left\| \begin{bmatrix} 0 & W_1^T Z_2 \\ W_2^T Z_1 & 0 \end{bmatrix} \right\|_2 = s_j.
\]
If $k < j$, the same argument applies with $Q = [W_2, W_1]^T[Z_2, Z_1]$, noting that $\sigma_{\min}(W_2^T Z_1) = \sigma_{\min}(W_1^T Z_2) = s_j$.
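Formula (5.1.6) is easy to verify numerically (a sketch, assuming NumPy): both expressions for the distance are computed independently and agree to machine precision.

```python
import numpy as np

n, k = 6, 2
W1 = np.linalg.qr(np.random.randn(n, k))[0]   # orthonormal basis of S1
Z1 = np.linalg.qr(np.random.randn(n, k))[0]   # orthonormal basis of S2
d1 = np.linalg.norm(W1 @ W1.T - Z1 @ Z1.T, 2)            # ||P1 - P2||_2
smin = np.linalg.svd(W1.T @ Z1, compute_uv=False).min()  # sigma_min(W1^T Z1)
d2 = np.sqrt(1.0 - smin**2)
print(d1, d2)   # equal up to roundoff
```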
5.2 Perturbation Theory
Theorem 5.2.1 (Gerschgorin Circle Theorem) If $X^{-1}AX = D + F$, where $D \equiv \operatorname{diag}(d_1, \cdots, d_n)$ and $F$ has zero diagonal entries, then $\sigma(A) \subset \bigcup_{i=1}^{n} D_i$, where
\[
D_i = \Big\{ z \in \mathbb{C} \;\Big|\; |z - d_i| \leq \sum_{j=1, j\neq i}^{n} |f_{ij}| \Big\}.
\]
Proof: Suppose $\lambda \in \sigma(A)$ and assume without loss of generality that $\lambda \neq d_i$ for $i = 1, \cdots, n$. Since $(D - \lambda I) + F$ is singular, so is $I + (D - \lambda I)^{-1}F$, and it follows that
\[
1 \leq \| (D - \lambda I)^{-1} F \|_\infty = \sum_{j=1}^{n} |f_{kj}| / |d_k - \lambda|
\]
for some $k$ ($1 \leq k \leq n$). But this implies that $\lambda \in D_k$.

Corollary 5.2.1 If the union $M_1 = \bigcup_{j=1}^{k} D_{i_j}$ of $k$ discs $D_{i_j}$, $j = 1, \cdots, k$, and the union $M_2$ of the remaining discs are disjoint, then $M_1$ contains exactly $k$ eigenvalues of $A$ and $M_2$ exactly $n - k$ eigenvalues.
Proof: Let $B = X^{-1}AX = D + F$ and, for $t \in [0, 1]$, let $B_t := D + tF$, so that $B_0 = D$ and $B_1 = B$. The eigenvalues of $B_t$ are continuous functions of $t$. Applying Theorem 5.2.1 to $B_t$, one finds that for $t = 0$ there are exactly $k$ eigenvalues of $B_0$ in $M_1$ and $n - k$ in $M_2$ (counting multiple eigenvalues). Since for $0 \leq t \leq 1$ all eigenvalues of $B_t$ likewise must lie in these discs, it follows for reasons of continuity that also $k$ eigenvalues of $A$ lie in $M_1$ and the remaining $n - k$ in $M_2$.
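With $X = I$ the discs of Theorem 5.2.1 are computed from the rows of $A$ itself. A sketch (assuming NumPy; `gerschgorin_discs` is our own helper, not a library routine):

```python
import numpy as np

def gerschgorin_discs(A):
    """Return (center, radius) per row; sigma(A) lies in the union of the discs."""
    centers = np.diag(A)
    radii = np.abs(A).sum(axis=1) - np.abs(centers)   # off-diagonal row sums
    return list(zip(centers, radii))

A = np.array([[4.0, 1.0, 0.1],
              [0.5, 2.0, 0.2],
              [0.1, 0.1, 9.0]])
print(gerschgorin_discs(A))
print(np.linalg.eigvals(A))   # each eigenvalue lies in some disc
```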
Remark 5.2.1 Take $X = I$, $A = \operatorname{diag}(A) + \operatorname{offdiag}(A)$. Consider the transformation $A \longrightarrow \Delta^{-1}A\Delta$ with $\Delta = \operatorname{diag}(\delta_1, \cdots, \delta_n)$. The Gerschgorin discs become
\[
D_i = \Big\{ z \in \mathbb{C} \;\Big|\; |z - a_{ii}| \leq \sum_{k=1, k\neq i}^{n} \Big| \frac{a_{ik}\delta_k}{\delta_i} \Big| =: \rho_i \Big\}.
\]
Example 5.2.1 Let
\[
A = \begin{bmatrix} 1 & \epsilon & \epsilon \\ \epsilon & 2 & \epsilon \\ \epsilon & \epsilon & 2 \end{bmatrix},
\qquad
D_1 = \{z \mid |z - 1| \leq 2\epsilon\}, \quad D_2 = D_3 = \{z \mid |z - 2| \leq 2\epsilon\}, \quad 0 < \epsilon \ll 1.
\]
Transformation with $\Delta = \operatorname{diag}(1, k\epsilon, k\epsilon)$, $k > 0$, yields
\[
\tilde{A} = \Delta^{-1}A\Delta = \begin{bmatrix} 1 & k\epsilon^2 & k\epsilon^2 \\ \frac{1}{k} & 2 & \epsilon \\ \frac{1}{k} & \epsilon & 2 \end{bmatrix}.
\]
For $\tilde{A}$ we have $\rho_1 = 2k\epsilon^2$ and $\rho_2 = \rho_3 = \frac{1}{k} + \epsilon$. The discs $D_1$ and $D_2 = D_3$ for $\tilde{A}$ are disjoint if
\[
\rho_1 + \rho_2 = 2k\epsilon^2 + \frac{1}{k} + \epsilon < 1.
\]
For this to be true we must clearly have $k > 1$. The optimal value $\tilde{k}$, for which $D_1$ and $D_2$ (for $\tilde{A}$) touch one another, is obtained from $\rho_1 + \rho_2 = 1$. One finds
\[
\tilde{k} = \frac{2}{1 - \epsilon + \sqrt{(1 - \epsilon)^2 - 8\epsilon^2}} = 1 + \epsilon + O(\epsilon^2)
\]
and thus $\rho_1 = 2\tilde{k}\epsilon^2 = 2\epsilon^2 + O(\epsilon^3)$. Through the transformation $A \longrightarrow \tilde{A}$ the radius $\rho_1$ of $D_1$ can thus be reduced from the initial $2\epsilon$ to about $2\epsilon^2$.
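The effect of the scaling can be confirmed numerically (a sketch, assuming NumPy; we take $k = 1$ rather than the optimal $\tilde{k}$, which already exhibits the $2\epsilon^2$ radius):

```python
import numpy as np

eps, k = 1e-3, 1.0
A = np.array([[1.0, eps, eps],
              [eps, 2.0, eps],
              [eps, eps, 2.0]])
Delta = np.diag([1.0, k * eps, k * eps])
At = np.linalg.inv(Delta) @ A @ Delta          # A tilde
rho1 = np.abs(At[0]).sum() - np.abs(At[0, 0])  # first Gerschgorin radius
print(rho1, 2 * k * eps**2)                    # both 2e-06, versus 2e-03 before
```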
Theorem 5.2.2 (Bauer-Fike) If $\mu$ is an eigenvalue of $A + E \in \mathbb{C}^{n\times n}$ and $X^{-1}AX = D = \operatorname{diag}(\lambda_1, \cdots, \lambda_n)$, then
\[
\min_{\lambda\in\sigma(A)} |\lambda - \mu| \leq \kappa_p(X) \| E \|_p,
\]
where $\|\cdot\|_p$ is the $p$-norm and $\kappa_p(X) = \| X \|_p \| X^{-1} \|_p$.
Proof: We need only consider the case $\mu \notin \sigma(A)$. Since $X^{-1}(A + E - \mu I)X$ is singular, so is $I + (D - \mu I)^{-1}(X^{-1}EX)$. Thus,
\[
1 \leq \| (D - \mu I)^{-1}(X^{-1}EX) \|_p \leq \frac{1}{\min_{\lambda\in\sigma(A)} |\lambda - \mu|} \, \| X \|_p \| E \|_p \| X^{-1} \|_p .
\]
Theorem 5.2.3 Let $Q^*AQ = D + N$ be a Schur decomposition of $A$, with $D = \operatorname{diag}(\lambda_1, \cdots, \lambda_n)$ and $N$ strictly upper triangular ($N^n = 0$). If $\mu \in \sigma(A + E)$, then
\[
\min_{\lambda\in\sigma(A)} |\lambda - \mu| \leq \max\{\theta, \theta^{1/n}\}, \qquad\text{where}\quad \theta = \| E \|_2 \sum_{k=0}^{n-1} \| N \|_2^k.
\]
Proof: Define $\delta = \min_{\lambda\in\sigma(A)} |\lambda - \mu|$. The theorem is true if $\delta = 0$. If $\delta > 0$, then $I - (\mu I - A)^{-1}E$ is singular and we have
\[
1 \leq \| (\mu I - A)^{-1}E \|_2 \leq \| (\mu I - A)^{-1} \|_2 \| E \|_2 = \| [(\mu I - D) - N]^{-1} \|_2 \| E \|_2 .
\]
Since $(\mu I - D)$ is diagonal, it follows that $[(\mu I - D)^{-1}N]^n = 0$ and therefore
\[
[(\mu I - D) - N]^{-1} = \sum_{k=0}^{n-1} [(\mu I - D)^{-1}N]^k (\mu I - D)^{-1}.
\]
Hence we have
\[
1 \leq \frac{\| E \|_2}{\delta} \max\Big\{1, \frac{1}{\delta^{n-1}}\Big\} \sum_{k=0}^{n-1} \| N \|_2^k,
\]
from which the theorem readily follows.
Example 5.2.2 If
\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 4.001 \end{bmatrix}
\quad\text{and}\quad
E = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0.001 & 0 & 0 \end{bmatrix},
\]
then $\sigma(A + E) \cong \{1.0001, 4.0582, 3.9427\}$, and $A$'s matrix of eigenvectors satisfies $\kappa_2(X) \cong 10^7$. The Bauer-Fike bound in Theorem 5.2.2 has order $10^4$, but the Schur bound in Theorem 5.2.3 has order $10^0$.
Theorems 5.2.2 and 5.2.3 each indicate potential eigenvalue sensitivity if $A$ is non-normal. Specifically, if $\kappa_2(X)$ or $\| N \|_2^{n-1}$ is large, then small changes in $A$ can induce large changes in the eigenvalues.
Example 5.2.3 If
\[
A = \begin{bmatrix} 0 & I_9 \\ 0 & 0 \end{bmatrix}
\quad\text{and}\quad
E = \begin{bmatrix} 0 & 0 \\ 10^{-10} & 0 \end{bmatrix},
\]
then for all $\lambda \in \sigma(A)$ and $\mu \in \sigma(A + E)$, $|\lambda - \mu| = 10^{-10/10} = 10^{-1}$. So a change of order $10^{-10}$ in $A$ results in a change of order $10^{-1}$ in its eigenvalues.
Let $\lambda$ be a simple eigenvalue of $A \in \mathbb{C}^{n\times n}$, and let $x$ and $y$ satisfy $Ax = \lambda x$ and $y^*A = \lambda y^*$ with $\| x \|_2 = \| y \|_2 = 1$. Using classical results from function theory, it can be shown that there exist differentiable $x(\varepsilon)$ and $\lambda(\varepsilon)$ such that
\[
(A + \varepsilon F)x(\varepsilon) = \lambda(\varepsilon)x(\varepsilon)
\]
with $\| x(\varepsilon) \|_2 = 1$ and $\| F \|_2 \leq 1$, and such that $\lambda(0) = \lambda$ and $x(0) = x$. Differentiating and setting $\varepsilon = 0$ gives
\[
A\dot{x}(0) + Fx = \dot{\lambda}(0)x + \lambda\dot{x}(0).
\]
Applying $y^*$ to both sides and dividing by $y^*x$ yields $\dot{\lambda}(0) = \dfrac{y^*Fx}{y^*x}$.
To justify such expansions, consider a polynomial in $y$ whose coefficients depend on a parameter $x$:
\[
f(x, y) = y^n + p_{n-1}(x)y^{n-1} + p_{n-2}(x)y^{n-2} + \cdots + p_1(x)y + p_0(x).
\]
For fixed $x$, $f(x, y) = 0$ has $n$ roots $y_1(x), \cdots, y_n(x)$; in particular, $f(0, y) = 0$ has the $n$ roots $y_1(0), \cdots, y_n(0)$.
Theorem 5.2.4 Suppose $y_i(0)$ is a simple root of $f(0, y) = 0$. Then there is $\delta_i > 0$ such that there is a simple root $y_i(x)$ of $f(x, y) = 0$ given by
\[
y_i(x) = y_i(0) + p_{i1}x + p_{i2}x^2 + \cdots \quad \text{(the series may terminate!)},
\]
where the series is convergent for $|x| < \delta_i$; thus $y_i(x) \longrightarrow y_i(0)$ as $x \longrightarrow 0$.
Theorem 5.2.5 If $y_1(0) = \cdots = y_m(0)$ is a root of multiplicity $m$ of $f(0, y) = 0$, then there exists $\delta > 0$ such that there are exactly $m$ roots of $f(x, y) = 0$ when $|x| < \delta$, having the following properties:
(a) The $m$ roots fall into $r$ groups of sizes $m_1, \cdots, m_r$ with $\sum_{i=1}^{r} m_i = m$, $m_i \geq 1$.
(b) The roots in the $i$th group are the $m_i$ values of a series $y_1(0) + p_{i1}z + p_{i2}z^2 + \cdots$ corresponding to the $m_i$ different values of $z$ defined by $z = x^{1/m_i}$.
Now let $\lambda_1$ be a simple eigenvalue of $A$ and $x_1$ a corresponding eigenvector. Since $\lambda_1$ is simple, $(A - \lambda_1 I)$ has at least one nonzero minor of order $n - 1$. Suppose this lies in the first $n - 1$ rows of $(A - \lambda_1 I)$, and take $x_1 = (A_{n1}, A_{n2}, \cdots, A_{nn})^T$, where $A_{ni}$ is the cofactor of $a_{ni}$. Then
\[
(A - \lambda_1 I)
\begin{bmatrix} A_{n1} \\ A_{n2} \\ \vdots \\ A_{nn} \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix},
\]
since $\sum_{j=1}^{n} a_{nj}A_{nj} = \det(A - \lambda_1 I) = 0$. Each cofactor $A_{ni}$ is a polynomial in $\lambda_1$ of degree not greater than $n - 1$.
Let $\lambda_1(\varepsilon)$ be the simple eigenvalue of $A + \varepsilon F$ and $x_1(\varepsilon)$ a corresponding eigenvector. Then the elements of $x_1(\varepsilon)$ are polynomials in $\lambda_1(\varepsilon)$ and $\varepsilon$. Since the power series for $\lambda_1(\varepsilon)$ is convergent for small $\varepsilon$, it follows that $x_1(\varepsilon) = x_1 + \varepsilon z_1 + \varepsilon^2 z_2 + \cdots$ is a convergent power series as well.
Returning to the derivative above, we obtain
\[
\big| \dot{\lambda}(0) \big| = \frac{|y^*Fx|}{|y^*x|} \leq \frac{1}{|y^*x|},
\]
and the upper bound is attained for $F = yx^*$. We refer to the reciprocal of $s(\lambda) \equiv |y^*x|$ as the condition number of the eigenvalue $\lambda$. Since $\lambda(\varepsilon) = \lambda(0) + \dot{\lambda}(0)\varepsilon + O(\varepsilon^2)$, an eigenvalue $\lambda$ may be perturbed by an amount $\varepsilon/s(\lambda)$; if $s(\lambda)$ is small, then $\lambda$ is appropriately regarded as ill-conditioned. Note that $s(\lambda)$ is the cosine of the angle between the left and right eigenvectors associated with $\lambda$ and is unique only if $\lambda$ is simple. A small $s(\lambda)$ implies that $A$ is near a matrix having a multiple eigenvalue. In particular, if $\lambda$ is distinct and $s(\lambda) < 1$, then there exists an $E$ with
\[
\| E \|_2 \leq \frac{s(\lambda)}{\sqrt{1 - s^2(\lambda)}}
\]
such that $\lambda$ is a repeated eigenvalue of $A + E$; this is proved in Wilkinson (1972).
Example 5.2.4 If
\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 4.001 \end{bmatrix}
\quad\text{and}\quad
E = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0.001 & 0 & 0 \end{bmatrix}
\]
(as in Example 5.2.2), then $\sigma(A + E) \cong \{1.0001, 4.0582, 3.9427\}$ and
\[
s(1) \cong 0.79 \times 10^0, \qquad s(4) \cong 0.16 \times 10^{-3}, \qquad s(4.001) \cong 0.16 \times 10^{-3}.
\]
Observe that $\| E \|_2 / s(\lambda)$ is a good estimate of the perturbation that each eigenvalue undergoes.
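The condition numbers $s(\lambda)$ can be computed from unit left and right eigenvectors (a sketch, assuming NumPy; pairing the left and right eigenvectors by nearest eigenvalue is our shortcut for this well-separated example):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [0.0, 4.0, 5.0],
              [0.0, 0.0, 4.001]])
lam, X = np.linalg.eig(A)    # right eigenvectors (unit columns)
mu, Y = np.linalg.eig(A.T)   # left eigenvectors of A = right eigenvectors of A^T
for i in range(3):
    j = np.argmin(np.abs(mu - lam[i]))   # match the left/right pair
    s = abs(Y[:, j] @ X[:, i])           # s(lambda) = |y^* x|
    print(lam[i], s)   # s(1) ~ 0.79, s(4) ~ 1.6e-4, s(4.001) ~ 1.6e-4
```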
If $\lambda$ is a repeated eigenvalue, then the eigenvalue sensitivity question is more complicated. For example, if
\[
A = \begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix}
\quad\text{and}\quad
F = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix},
\]
then $\sigma(A + \varepsilon F) = \{1 \pm \sqrt{\varepsilon a}\}$. Note that if $a \neq 0$, then the eigenvalues of $A + \varepsilon F$ are not differentiable at zero; their rate of change at the origin is infinite. In general, if $\lambda$ is a defective eigenvalue of $A$, then $O(\varepsilon)$ perturbations in $A$ result in $O(\varepsilon^{1/p})$ perturbations in $\lambda$, where $p \geq 2$ (see Wilkinson, AEP, p. 77 for a more detailed discussion).
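A quick check of this square-root behavior (a sketch, assuming NumPy):

```python
import numpy as np

a, eps = 2.0, 1e-8
ApF = np.array([[1.0, a],
                [eps, 1.0]])    # A + eps*F
print(np.linalg.eigvals(ApF))   # 1 +/- sqrt(eps*a) ~ 1 +/- 1.41e-4
print(1 + np.sqrt(eps * a), 1 - np.sqrt(eps * a))
```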
We now consider the perturbations of invariant subspaces. Assume $A \in \mathbb{C}^{n\times n}$ has distinct eigenvalues $\lambda_1, \cdots, \lambda_n$ and $\| F \|_2 = 1$. We have
\[
(A + \varepsilon F)x_k(\varepsilon) = \lambda_k(\varepsilon)x_k(\varepsilon), \qquad \| x_k(\varepsilon) \|_2 = 1,
\]
and
\[
y_k^*(\varepsilon)(A + \varepsilon F) = \lambda_k(\varepsilon)y_k^*(\varepsilon), \qquad \| y_k(\varepsilon) \|_2 = 1,
\]
for $k = 1, \cdots, n$, where each $\lambda_k(\varepsilon)$, $x_k(\varepsilon)$ and $y_k(\varepsilon)$ is differentiable. Differentiating the first equation and setting $\varepsilon = 0$ gives
\[
A\dot{x}_k(0) + Fx_k = \dot{\lambda}_k(0)x_k + \lambda_k\dot{x}_k(0),
\]
where $\lambda_k = \lambda_k(0)$ and $x_k = x_k(0)$. Since $\{x_i\}_{i=1}^n$ are linearly independent, we may write $\dot{x}_k(0) = \sum_{i=1}^n a_i x_i$, so we have
\[
\sum_{i=1, i\neq k}^{n} a_i(\lambda_i - \lambda_k)x_i + Fx_k = \dot{\lambda}_k(0)x_k.
\]
But $y_i^*(0)x_k = y_i^*x_k = 0$ for $i \neq k$, and thus
\[
a_i = y_i^*Fx_k/[(\lambda_k - \lambda_i)y_i^*x_i], \qquad i \neq k.
\]
Hence the Taylor expansion for $x_k(\varepsilon)$ is
\[
x_k(\varepsilon) = x_k + \varepsilon \sum_{i=1, i\neq k}^{n} \left\{ \frac{y_i^*Fx_k}{(\lambda_k - \lambda_i)y_i^*x_i} \right\} x_i + O(\varepsilon^2).
\]
Thus the sensitivity of $x_k$ depends upon both the eigenvalue sensitivity and the separation of $\lambda_k$ from the other eigenvalues.
Example 5.2.5 If
\[
A = \begin{bmatrix} 1.01 & 0.01 \\ 0.00 & 0.99 \end{bmatrix},
\]
then $\lambda = 0.99$ has condition number $\frac{1}{s(0.99)} \cong 1.118$ and associated eigenvector $x = (0.4472, -0.8944)^T$. On the other hand, the eigenvalue $\tilde{\lambda} = 1.00$ of the "nearby" matrix
\[
A + E = \begin{bmatrix} 1.01 & 0.01 \\ 0.00 & 1.00 \end{bmatrix}
\]
has eigenvector $\tilde{x} = (0.7071, -0.7071)^T$. Suppose
\[
Q^*AQ = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix}, \qquad T_{11} \in \mathbb{C}^{p\times p}, \quad T_{22} \in \mathbb{C}^{q\times q}, \quad q = n - p, \tag{5.2.1}
\]
is a Schur decomposition of $A$, with
\[
Q = [Q_1, Q_2], \qquad Q_1 \in \mathbb{C}^{n\times p}, \quad Q_2 \in \mathbb{C}^{n\times(n-p)}. \tag{5.2.2}
\]
Definition 5.2.1 We define the separation between $T_{11}$ and $T_{22}$ by
\[
\operatorname{sep}_F(T_{11}, T_{22}) = \min_{Z\neq 0} \frac{\| T_{11}Z - ZT_{22} \|_F}{\| Z \|_F}.
\]
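Since $\operatorname{vec}(T_{11}Z - ZT_{22}) = (I \otimes T_{11} - T_{22}^T \otimes I)\operatorname{vec}(Z)$, the quantity $\operatorname{sep}_F(T_{11}, T_{22})$ is the smallest singular value of this Kronecker matrix, which gives a direct (if expensive) way to compute it. A sketch (assuming NumPy; `sep_F` is our own helper):

```python
import numpy as np

def sep_F(T11, T22):
    p, q = T11.shape[0], T22.shape[0]
    K = np.kron(np.eye(q), T11) - np.kron(T22.T, np.eye(p))  # pq x pq matrix
    return np.linalg.svd(K, compute_uv=False).min()

T11 = np.array([[1.0, 2.0],
                [0.0, 1.5]])
T22 = np.array([[3.0]])
print(sep_F(T11, T22))
```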
Definition 5.2.2 Let $X$ be a subspace of $\mathbb{C}^n$. $X$ is called an invariant subspace of $A \in \mathbb{C}^{n\times n}$ if $AX \subset X$ (i.e. $x \in X \Longrightarrow Ax \in X$).
Theorem 5.2.6 Let $A \in \mathbb{C}^{n\times n}$ and $V \in \mathbb{C}^{n\times r}$ with $\operatorname{rank}(V) = r$. Then the following are equivalent:
(a) There exists $S \in \mathbb{C}^{r\times r}$ such that $AV = VS$.
(b) $R(V)$ is an invariant subspace of $A$.
Proof: Trivial!
Remark 5.2.2 (a) If $Sz = \mu z$, $z \neq 0$, then $\mu$ is an eigenvalue of $A$ with eigenvector $Vz$.
(b) If $V$ is a basis of $X$, then $\tilde{V} = V(V^*V)^{-1/2}$ is an orthonormal basis of $X$.
Theorem 5.2.7 Let $A \in \mathbb{C}^{n\times n}$ and let $Q = (Q_1, Q_2)$ be orthogonal. Then the following are equivalent:
(a) If $Q^*AQ = B = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}$, then $B_{21} = 0$.
(b) $R(Q_1)$ is an invariant subspace of $A$.
Proof: $Q^*AQ = B \iff AQ = QB = (Q_1, Q_2)\begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}$. Thus $AQ_1 = Q_1B_{11} + Q_2B_{21}$.
(a) $\Rightarrow$ (b): If $B_{21} = 0$, then $AQ_1 = Q_1B_{11}$, so $R(Q_1)$ is an invariant subspace of $A$ (by Theorem 5.2.6).
(b) $\Rightarrow$ (a): If $R(Q_1)$ is an invariant subspace, there exists $S$ such that $AQ_1 = Q_1S = Q_1B_{11} + Q_2B_{21}$. Multiplying by $Q_1^*$ gives
\[
S = Q_1^*Q_1S = Q_1^*Q_1B_{11} + Q_1^*Q_2B_{21} = B_{11}.
\]
Hence $Q_2B_{21} = 0$, so $Q_2^*Q_2B_{21} = 0$ and therefore $B_{21} = 0$.
Theorem 5.2.8 Suppose (5.2.1) and (5.2.2) hold, and for $E \in \mathbb{C}^{n\times n}$ partition $Q^*EQ$ as
\[
Q^*EQ = \begin{bmatrix} E_{11} & E_{12} \\ E_{21} & E_{22} \end{bmatrix}
\]
with $E_{11} \in \mathbb{C}^{p\times p}$ and $E_{22} \in \mathbb{C}^{(n-p)\times(n-p)}$. If
\[
\delta = \operatorname{sep}_2(T_{11}, T_{22}) - \| E_{11} \|_2 - \| E_{22} \|_2 > 0
\]
and
\[
\| E_{21} \|_2 \, \big( \| T_{12} \|_2 + \| E_{12} \|_2 \big) \leq \delta^2/4,
\]
then there exists $P \in \mathbb{C}^{(n-p)\times p}$ such that
\[
\| P \|_2 \leq 2\| E_{21} \|_2 / \delta
\]
and such that the columns of $\tilde{Q}_1 = (Q_1 + Q_2P)(I + P^*P)^{-1/2}$ form an orthonormal basis for an invariant subspace of $A + E$ (see Stewart 1973).
Lemma 5.2.1 Let $\{s_m\}$ and $\{p_m\}$ be the two sequences defined by
\[
s_{m+1} = s_m/(1 - 2\eta p_m s_m), \qquad p_{m+1} = \eta p_m^2 s_{m+1}, \qquad m = 0, 1, 2, \cdots \tag{5.2.3}
\]
with
\[
s_0 = \sigma, \qquad p_0 = \sigma\gamma, \tag{5.2.4}
\]
where $\sigma, \eta, \gamma > 0$ satisfy
\[
4\eta\sigma^2\gamma < 1. \tag{5.2.5}
\]
Then $\{s_m\}$ is monotonically increasing and bounded above, and $\{p_m\}$ is monotonically decreasing and converges quadratically to zero.
Proof: Let
\[
x_m = s_m p_m, \qquad m = 0, 1, 2, \cdots. \tag{5.2.6}
\]
From (5.2.3) we have
\[
x_{m+1} = s_{m+1}p_{m+1} = \eta p_m^2 s_m^2/(1 - 2\eta p_m s_m)^2 = \eta x_m^2/(1 - 2\eta x_m)^2, \tag{5.2.7}
\]
and (5.2.5) can be written as
\[
0 < x_0 < \frac{1}{4\eta} \qquad \Big(\text{since } x_0 = s_0 p_0 = \sigma^2\gamma < \frac{1}{4\eta}\Big). \tag{5.2.8}
\]
Consider the fixed-point equation
\[
x = f(x), \qquad f(x) = \eta x^2/(1 - 2\eta x)^2, \quad x \geq 0. \tag{5.2.9}
\]
By
\[
\frac{df(x)}{dx} = \frac{2\eta x}{(1 - 2\eta x)^3},
\]
we know that $f(x)$ is differentiable and monotonically increasing on $[0, 1/2\eta)$, with $\frac{df(x)}{dx}\big|_{x=0} = 0$; the equation (5.2.9) has exactly the roots $0$ and $1/4\eta$ in $[0, 1/2\eta)$. Under condition (5.2.8), the iterates $x_m$ of (5.2.7) therefore decrease monotonically and converge quadratically to zero (Isaacson & Keller, Analysis of Numerical Methods, 1966, Chapter 3, §1). Thus
\[
\frac{s_{m+1}}{s_m} = \frac{1}{1 - 2\eta x_m} = 1 + \frac{2\eta x_m}{1 - 2\eta x_m} = 1 + t_m,
\]
where $t_m$ is monotonically decreasing and converges quadratically to zero. Hence
\[
s_{m+1} = s_0 \prod_{j=0}^{m} \frac{s_{j+1}}{s_j} = s_0 \prod_{j=0}^{m} (1 + t_j)
\]
is monotonically increasing and converges to $s_0 \prod_{j=0}^{\infty} (1 + t_j) < \infty$, so $p_m = \dfrac{x_m}{s_m}$ is monotonically decreasing and quadratically convergent to zero.
Theorem 5.2.9 Let
\[
PA_{12}P + PA_{11} - A_{22}P - A_{21} = 0 \tag{5.2.10}
\]
be the quadratic matrix equation in $P \in \mathbb{C}^{(n-l)\times l}$ ($1 \leq l \leq n$), where
\[
A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \qquad \sigma(A_{11}) \cap \sigma(A_{22}) = \emptyset.
\]
Define the operator $T$ by
\[
TQ \equiv QA_{11} - A_{22}Q, \qquad Q \in \mathbb{C}^{(n-l)\times l}. \tag{5.2.11}
\]
Let
\[
\eta = \| A_{12} \|, \qquad \gamma = \| A_{21} \| \tag{5.2.12}
\]
and
\[
\sigma = \| T^{-1} \| = \sup_{\|P\|=1} \| T^{-1}P \|. \tag{5.2.13}
\]
If
\[
4\eta\sigma^2\gamma < 1, \tag{5.2.14}
\]
then the following iteration yields a solution $P$ of (5.2.10) satisfying
\[
\| P \| \leq 2\sigma\gamma, \tag{5.2.15}
\]
and the iteration converges quadratically.
Iteration: Let
\[
A_m = \begin{bmatrix} A_{11}^{(m)} & A_{12}^{(m)} \\ A_{21}^{(m)} & A_{22}^{(m)} \end{bmatrix}, \qquad A_0 = A.
\]
(i) Solve
\[
T_m P_m \equiv P_m A_{11}^{(m)} - A_{22}^{(m)} P_m = A_{21}^{(m)} \tag{5.2.16}
\]
for $P_m \in \mathbb{C}^{(n-l)\times l}$;
(ii) Compute
\[
A_{11}^{(m+1)} = A_{11}^{(m)} + A_{12}P_m, \qquad A_{22}^{(m+1)} = A_{22}^{(m)} - P_mA_{12}, \qquad A_{21}^{(m+1)} = -P_mA_{12}P_m,
\]
then go to (i) and solve for $P_{m+1}$. Then
\[
P = \lim_{m\to\infty} \sum_{i=0}^{m} P_i \tag{5.2.17}
\]
is a solution of (5.2.10) and satisfies (5.2.15).
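The iteration can be implemented directly, solving (5.2.16) as a Sylvester equation at each step. A sketch (assuming NumPy and SciPy; `scipy.linalg.solve_sylvester` solves $aX + Xb = q$, which matches (5.2.16) with $a = -A_{22}^{(m)}$, $b = A_{11}^{(m)}$; the helper `solve_quadratic` is our own name, and the residual printed at the end is the left-hand side of (5.2.10)):

```python
import numpy as np
from scipy.linalg import solve_sylvester

def solve_quadratic(A, l, steps=8):
    """Accumulate P = sum of P_m for the quadratic matrix equation (5.2.10)."""
    A11, A12 = A[:l, :l], A[:l, l:]
    A21, A22 = A[l:, :l], A[l:, l:]
    P = np.zeros((A.shape[0] - l, l))
    for _ in range(steps):
        Pm = solve_sylvester(-A22, A11, A21)   # P_m A11 - A22 P_m = A21, step (i)
        A11 = A11 + A12 @ Pm                   # block updates of step (ii)
        A22 = A22 - Pm @ A12
        A21 = -Pm @ A12 @ Pm
        P = P + Pm
    return P

A = np.array([[1.0, 0.2, 0.1],
              [0.1, 3.0, 0.3],
              [0.2, 0.1, 4.0]])
P = solve_quadratic(A, 1)
A11, A12, A21, A22 = A[:1, :1], A[:1, 1:], A[1:, :1], A[1:, 1:]
print(P @ A12 @ P + P @ A11 - A22 @ P - A21)   # approximately zero
```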
Proof: (a) We first prove that $T_m^{-1}$ exists for $m = 0, 1, 2, \cdots$. Denote
\[
\| T_m^{-1} \| = \sigma_m \qquad (T = T_0,\ \sigma = \sigma_0); \tag{5.2.18}
\]
we claim that
\[
4\| A_{12} \|\,\| P_m \|\,\sigma_m < 1. \tag{5.2.19}
\]
We proceed by induction. For $m = 0$: from $\sigma(A_{11}) \cap \sigma(A_{22}) = \emptyset$ it follows that $T_0 = T$ is nonsingular, and from (5.2.12)-(5.2.14),
\[
4\| A_{12} \|\,\| P_0 \|\,\sigma_0 = 4\eta\| T^{-1}A_{21} \|\sigma \leq 4\eta\sigma^2\gamma < 1.
\]
Suppose $T_m^{-1}$ exists and (5.2.19) holds; we show that $T_{m+1}^{-1}$ exists and $4\| A_{12} \|\,\| P_{m+1} \|\,\sigma_{m+1} < 1$. From the definition
\[
\operatorname{sep}(A_{11}, A_{22}) = \inf_{\|Q\|=1} \| QA_{11} - A_{22}Q \|
\]
and the existence of $T^{-1}$ it follows that $\operatorname{sep}(A_{11}, A_{22}) = \| T^{-1} \|^{-1} = \sigma^{-1}$, and by the perturbation property of "sep",
\[
\operatorname{sep}(A_{11}^{(m+1)}, A_{22}^{(m+1)}) = \operatorname{sep}(A_{11}^{(m)} + A_{12}P_m,\ A_{22}^{(m)} - P_mA_{12})
\geq \operatorname{sep}(A_{11}^{(m)}, A_{22}^{(m)}) - \| A_{12}P_m \| - \| P_mA_{12} \|
\geq \frac{1 - 2\| A_{12} \|\| P_m \|\sigma_m}{\sigma_m} > 0. \tag{5.2.20}
\]
From
\[
\operatorname{sep}(A_{11}, A_{22}) \leq \min\{|\lambda_1 - \lambda_2| : \lambda_1 \in \sigma(A_{11}),\ \lambda_2 \in \sigma(A_{22})\}
\]
we have $\sigma(A_{11}^{(m+1)}) \cap \sigma(A_{22}^{(m+1)}) = \emptyset$; hence $T_{m+1}^{-1}$ exists and $\operatorname{sep}(A_{11}^{(m+1)}, A_{22}^{(m+1)}) = \| T_{m+1}^{-1} \|^{-1} = \sigma_{m+1}^{-1}$. From (5.2.20) it follows that
\[
\sigma_{m+1} \leq \frac{\sigma_m}{1 - 2\| A_{12} \|\| P_m \|\sigma_m}. \tag{5.2.21}
\]
Substituting (5.2.19) into (5.2.21), we get $\sigma_{m+1} \leq 2\sigma_m$, and
\[
\| P_{m+1} \| \leq \| T_{m+1}^{-1} \|\,\| A_{21}^{(m+1)} \| \leq \sigma_{m+1}\| P_m \|^2\| A_{12} \| < \tfrac{1}{2}\| P_m \|.
\]
Hence
\[
2\| A_{12} \|\,\| P_{m+1} \|\,\sigma_{m+1} \leq 2\| A_{12} \|\,\| P_m \|\,\sigma_m < 1/2.
\]
This proves that $T_m^{-1}$ exists for all $m = 0, 1, 2, \cdots$ and that (5.2.19) holds.
(b) Next we prove that $\| P_m \|$ converges quadratically to zero. Construct sequences $\{q_m\}$, $\{s_m\}$, $\{p_m\}$ satisfying
\[
\| A_{21}^{(m)} \| \leq q_m, \qquad \sigma_m \leq s_m, \qquad \| P_m \| \leq p_m. \tag{5.2.22}
\]
From
\[
A_{21}^{(m+1)} = -P_mA_{12}P_m \tag{5.2.23}
\]
follows
\[
\| A_{21}^{(m+1)} \| \leq \| A_{12} \|\,\| P_m \|^2 \leq \eta p_m^2. \tag{5.2.24}
\]
Define $\{q_m\}$ by
\[
q_{m+1} = \eta p_m^2, \qquad q_0 = \gamma. \tag{5.2.25}
\]
From (5.2.21) we have
\[
\sigma_{m+1} \leq \frac{s_m}{1 - 2\eta p_m s_m}; \tag{5.2.26}
\]
define $\{s_m\}$ by
\[
s_{m+1} = \frac{s_m}{1 - 2\eta p_m s_m}, \qquad s_0 = \sigma. \tag{5.2.27}
\]
From (5.2.16) we have
\[
\| P_m \| \leq \| T_m^{-1} \|\,\| A_{21}^{(m)} \| = \sigma_m\| A_{21}^{(m)} \| \leq s_m q_m;
\]
define $\{p_m\}$ by
\[
p_{m+1} = s_{m+1}q_{m+1} = \eta p_m^2 s_{m+1}, \qquad p_0 = \sigma\gamma. \tag{5.2.28}
\]
By Lemma 5.2.1, $\{p_m\}$ decreases monotonically to zero, and from (5.2.22) it follows that $\| P_m \| \longrightarrow 0$ quadratically.
(c) Finally we prove that $P^{(m)} = \sum_{i=0}^{m} P_i \longrightarrow P$ and that (5.2.15) holds. Following the method of Lemma 5.2.1, construct $\{x_m\}$ (see (5.2.6), (5.2.7)), that is,
\[
x_{m+1} = \frac{\eta x_m^2}{(1 - 2\eta x_m)^2}, \qquad s_{m+1} = \frac{s_m}{1 - 2\eta x_m}, \tag{5.2.29}
\]
and then
\[
p_{m+1} = \frac{x_{m+1}}{s_{m+1}} = \frac{\eta x_m}{1 - 2\eta x_m}\, p_m. \tag{5.2.30}
\]
By induction, for all $m = 1, 2, \cdots$ we have
\[
p_m < \tfrac{1}{2}p_{m-1}, \qquad x_m < \frac{1}{4\eta}. \tag{5.2.31}
\]
In fact, substituting
\[
\frac{\eta x_0}{1 - 2\eta x_0} = \frac{\eta\sigma^2\gamma}{1 - 2\eta\sigma^2\gamma} < \frac{1}{2} \tag{5.2.32}
\]
into (5.2.30) gives $p_1 < \tfrac{1}{2}p_0$, and from (5.2.29) and (5.2.32) it follows that
\[
x_1 = \frac{1}{\eta}\left(\frac{\eta x_0}{1 - 2\eta x_0}\right)^2 < \frac{1}{4\eta}.
\]