Eigenspaces and their Approximation

師大

Tsung-Ming Huang

Department of Mathematics National Taiwan Normal University, Taiwan

April 8, 2009

Outline

1 Eigenspaces
    Definitions
    Simple eigenspaces

2 Perturbation Theory
    Canonical angles
    Residual analysis

3 Krylov subspaces
    Krylov sequences and Krylov spaces
    Convergence
    Block Krylov spaces

4 Rayleigh-Ritz Approximation
    Rayleigh-Ritz methods
    Convergence
    Refined Ritz vectors
    Harmonic Ritz vectors

Definition

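Let $A$ be of order $n$ and let $\mathcal{X}$ be a subspace of $\mathbb{C}^n$. Then $\mathcal{X}$ is an eigenspace or invariant subspace of $A$ if

$$A\mathcal{X} = \{Ax : x \in \mathcal{X}\} \subset \mathcal{X}.$$

If $(\lambda, x) \equiv (\alpha + i\beta,\ y + iz)$ is a complex eigenpair of a real matrix $A$, i.e.,

$$A(y + iz) = (\alpha + i\beta)(y + iz) = (\alpha y - \beta z) + i(\beta y + \alpha z) \;\Rightarrow\; Ay = \alpha y - \beta z, \quad Az = \beta y + \alpha z,$$

then

$$A \begin{bmatrix} y & z \end{bmatrix} = \begin{bmatrix} y & z \end{bmatrix} \begin{bmatrix} \alpha & \beta \\ -\beta & \alpha \end{bmatrix}.$$

It implies that $\mathcal{R}([\,y \;\; z\,])$ is an eigenspace of $A$.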
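The relation above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the 3-by-3 matrix is a hypothetical example chosen only because it has a complex-conjugate eigenvalue pair.

```python
import numpy as np

# Hypothetical real matrix with eigenvalues +/- i*sqrt(2) and 3.
A = np.array([[0.0, -2.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 3.0]])

eigvals, eigvecs = np.linalg.eig(A)
idx = int(np.argmax(eigvals.imag))       # eigenvalue with positive imaginary part
lam, x = eigvals[idx], eigvecs[:, idx]

alpha, beta = lam.real, lam.imag         # lambda = alpha + i*beta
y, z = x.real, x.imag                    # x = y + i*z

YZ = np.column_stack([y, z])             # real basis [y z]
L = np.array([[alpha, beta],
              [-beta, alpha]])           # real 2-by-2 eigenblock

# A [y z] - [y z] L should vanish up to rounding.
residual = np.linalg.norm(A @ YZ - YZ @ L)
```

Here `residual` is at the level of rounding error, so the two real vectors span a real invariant subspace even though the eigenpair itself is complex.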

Theorem

Let $\mathcal{X}$ be an eigenspace of $A$ and let $X$ be a basis for $\mathcal{X}$. Then there is a unique matrix $L$ such that

$$AX = XL.$$

The matrix $L$ is given by

$$L = X^I A X,$$

where $X^I$ is a matrix satisfying $X^I X = I$.

If $(\lambda, x)$ is an eigenpair of $A$ with $x \in \mathcal{X}$, then $(\lambda, X^I x)$ is an eigenpair of $L$. Conversely, if $(\lambda, u)$ is an eigenpair of $L$, then $(\lambda, Xu)$ is an eigenpair of $A$.

Proof: Let

$$X = [x_1 \cdots x_k] \quad\text{and}\quad Y = AX = [y_1 \cdots y_k].$$

Since $y_i \in \mathcal{X}$ and $X$ is a basis for $\mathcal{X}$, there is a unique vector $\ell_i$ such that

$$y_i = X\ell_i.$$

If we set $L = [\ell_1 \cdots \ell_k]$, then $AX = XL$ and $L = X^I X L = X^I A X$.

Now let $(\lambda, x)$ be an eigenpair of $A$ with $x \in \mathcal{X}$. Then there is a unique vector $u$ such that $x = Xu$, namely $u = X^I x$. Hence

$$\lambda x = Ax = AXu = XLu \;\Rightarrow\; \lambda u = \lambda X^I x = Lu.$$

Conversely, if $Lu = \lambda u$, then

$$A(Xu) = (AX)u = (XL)u = X(Lu) = \lambda(Xu),$$

so that $(\lambda, Xu)$ is an eigenpair of $A$.
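As an illustration of the theorem, the sketch below builds a basis of an invariant subspace from two eigenvectors of a random matrix, mixes the columns so that the basis is generic, and recovers $L = X^I A X$. It assumes NumPy; the pseudoinverse is only one possible choice of left inverse $X^I$, and the matrix sizes and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 2
A = rng.standard_normal((n, n))

# Two eigenvectors span an invariant subspace; mixing the columns gives
# a generic (non-eigenvector) basis X of the same subspace.
eigvals, V = np.linalg.eig(A)
X = V[:, :k] @ rng.standard_normal((k, k))

X_left_inv = np.linalg.pinv(X)     # assumed choice of X^I with X^I X = I
L = X_left_inv @ A @ X             # representation of A on the eigenspace

res = np.linalg.norm(A @ X - X @ L)          # should be ~ rounding error
eig_L = np.linalg.eigvals(L)
# Distance from each eigenvalue of L to the nearest selected eigenvalue of A.
mismatch = max(np.min(np.abs(mu - eigvals[:k])) for mu in eig_L)
```

Both `res` and `mismatch` come out at rounding level, which reflects $AX = XL$ and the fact that the eigenvalues of $L$ are eigenvalues of $A$.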

Definition

Let $A$ be of order $n$. For $X \in \mathbb{C}^{n\times k}$ and $L \in \mathbb{C}^{k\times k}$, we say that $(L, X)$ is an eigenpair of order $k$ or right eigenpair of order $k$ of $A$ if

1. $X$ is of full rank,

2. $AX = XL$.

The matrices $X$ and $L$ are called eigenbasis and eigenblock, respectively. If $X$ is orthonormal, we say that the eigenpair $(L, X)$ is orthonormal.

If $Y \in \mathbb{C}^{n\times k}$ has linearly independent columns and

$$Y^H A = L Y^H,$$

we say that $(L, Y)$ is a left eigenpair of order $k$ of $A$.

Question

How do eigenpairs transform under a change of basis and under similarity transformations?

Theorem

Let $(L, X)$ be an eigenpair of $A$. If $U$ is nonsingular, then the pair $(U^{-1}LU, XU)$ is also an eigenpair of $A$. If $W$ is nonsingular, then $(L, W^{-1}X)$ is an eigenpair of $W^{-1}AW$.

Proof:

$$A(XU) = (AX)U = (XL)U = (XU)(U^{-1}LU), \qquad (W^{-1}AW)(W^{-1}X) = W^{-1}AX = (W^{-1}X)L.$$

The eigenvalues of the matrix $L$ that represents $A$ on an eigenspace with respect to a basis are independent of the choice of the basis.

Theorem

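Let $\mathcal{L} = \{\lambda_1, \ldots, \lambda_k\} \subset \Lambda(A)$ be a multisubset of the eigenvalues of $A$. Then there is an eigenspace $\mathcal{X}$ of $A$ whose eigenvalues are $\lambda_1, \ldots, \lambda_k$.

Proof: Let

$$A \begin{bmatrix} U_1 & U_2 \end{bmatrix} = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix}$$

be a partitioned Schur decomposition of $A$ in which $T_{11}$ is of order $k$ and has the members of $\mathcal{L}$ on its diagonal. Then

$$AU_1 = U_1 T_{11}.$$

Hence the column space of $U_1$ is an eigenspace of $A$ whose eigenvalues are the members of $\mathcal{L}$.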

Definition

An eigenvalue whose geometric multiplicity is less than its algebraic multiplicity is defective.

Definition

Let X be an eigenspace of A with eigenvalues L. Then X is a simple eigenspace of A if

L ∩ [Λ(A) \ L] = ∅.

In other words, an eigenspace is simple if its eigenvalues are disjoint from the other eigenvalues of A.

Theorem

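Let $(L_1, X_1)$ be a simple orthonormal eigenpair of $A$ and let $(X_1, Y_2)$ be unitary so that

$$\begin{bmatrix} X_1^H \\ Y_2^H \end{bmatrix} A \begin{bmatrix} X_1 & Y_2 \end{bmatrix} = \begin{bmatrix} L_1 & H \\ 0 & L_2 \end{bmatrix}.$$

Then there is a matrix $Q$ satisfying the Sylvester equation

$$L_1 Q - Q L_2 = -H$$

such that if we set $X = \begin{bmatrix} X_1 & X_2 \end{bmatrix}$ and $Y = \begin{bmatrix} Y_1 & Y_2 \end{bmatrix}$, where

$$X_2 = Y_2 + X_1 Q \quad\text{and}\quad Y_1 = X_1 - Y_2 Q^H,$$

then

$$Y^H X = I \quad\text{and}\quad Y^H A X = \mathrm{diag}(L_1, L_2).$$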

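Proof: Since $(L_1, X_1)$ is a simple eigenpair of $A$, it follows that $\Lambda(L_1) \cap \Lambda(L_2) = \emptyset$.

By Theorem 1.18 in Chapter 1, there is a unique matrix $Q$ satisfying

$$L_1 Q - Q L_2 = -H$$

such that

$$\begin{bmatrix} I & -Q \\ 0 & I \end{bmatrix} \begin{bmatrix} L_1 & H \\ 0 & L_2 \end{bmatrix} \begin{bmatrix} I & Q \\ 0 & I \end{bmatrix} = \mathrm{diag}(L_1, L_2).$$

That is,

$$\begin{bmatrix} I & -Q \\ 0 & I \end{bmatrix} \begin{bmatrix} X_1^H \\ Y_2^H \end{bmatrix} A \begin{bmatrix} X_1 & Y_2 \end{bmatrix} \begin{bmatrix} I & Q \\ 0 & I \end{bmatrix} = \mathrm{diag}(L_1, L_2).$$

Therefore,

$$\begin{bmatrix} X_1^H - Q Y_2^H \\ Y_2^H \end{bmatrix} A \begin{bmatrix} X_1 & X_1 Q + Y_2 \end{bmatrix} = \mathrm{diag}(L_1, L_2).$$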

Observations

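1 $X$ and $Y$ are said to be biorthogonal.

2 Since

$$A \begin{bmatrix} X_1 & X_2 \end{bmatrix} = \begin{bmatrix} X_1 & X_2 \end{bmatrix} \mathrm{diag}(L_1, L_2),$$

we see that $AX_2 = X_2 L_2$, so that $(L_2, X_2)$ is an eigenpair of $A$. Likewise $(L_1, Y_1)$ is a left eigenpair of $A$.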
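The construction in the theorem can be sketched numerically. The example below assumes NumPy and builds a hypothetical real test matrix $A = U T U^T$ with a block upper triangular $T$, so that the blocks $L_1$, $H$, $L_2$ are known in advance. The Sylvester equation is solved through its Kronecker-product form, which is a simple stand-in and not necessarily the method one would use in practice.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 5, 2

# Assumed test setup: orthogonal U and block upper triangular T with
# well separated spectra for L1 (1, 2) and L2 (5, 6, 7).
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
L1 = np.diag([1.0, 2.0]) + np.triu(rng.standard_normal((k, k)), 1)
L2 = np.diag([5.0, 6.0, 7.0]) + np.triu(rng.standard_normal((n - k, n - k)), 1)
H = rng.standard_normal((k, n - k))
T = np.block([[L1, H], [np.zeros((n - k, k)), L2]])
A = U @ T @ U.T
X1, Y2 = U[:, :k], U[:, k:]

# Solve L1 Q - Q L2 = -H using column-major vec:
# (I kron L1 - L2^T kron I) vec(Q) = -vec(H).
K = np.kron(np.eye(n - k), L1) - np.kron(L2.T, np.eye(k))
Q = np.linalg.solve(K, -H.flatten(order="F")).reshape((k, n - k), order="F")

X2 = Y2 + X1 @ Q
Y1 = X1 - Y2 @ Q.T          # Q is real here, so Q^H = Q^T
X = np.hstack([X1, X2])
Y = np.hstack([Y1, Y2])

biorthogonal = np.allclose(Y.T @ X, np.eye(n))
D = Y.T @ A @ X             # expected to equal diag(L1, L2)
```

With this setup `D` agrees with $\mathrm{diag}(L_1, L_2)$ up to rounding, so the off-diagonal block $H$ has been eliminated by the oblique change of basis.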

Let $x$ and $y$ be nonzero vectors. Then the angle $\angle(x, y)$ between $x$ and $y$ is defined by

$$\cos\angle(x, y) = \frac{|x^H y|}{\|x\|_2 \|y\|_2}.$$

Extend this definition to subspaces of $\mathbb{C}^n$. Let $\mathcal{X}$ and $\mathcal{Y}$ be subspaces of the same dimension. Let $X$ and $Y$ be orthonormal bases for $\mathcal{X}$ and $\mathcal{Y}$, respectively, and define $C = Y^H X$. We have

$$\|C\|_2 \le \|X\|_2 \|Y\|_2 = 1.$$

Hence all the singular values of $C$ lie in $[0, 1]$ and can be regarded as cosines of angles.

Definition

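Let $\mathcal{X}$ and $\mathcal{Y}$ be subspaces of $\mathbb{C}^n$ of dimension $p$ and let $X$ and $Y$ be orthonormal bases for $\mathcal{X}$ and $\mathcal{Y}$, respectively. Then the canonical angles between $\mathcal{X}$ and $\mathcal{Y}$ are

$$\theta_i(\mathcal{X}, \mathcal{Y}) = \cos^{-1}\gamma_i, \qquad (1)$$

with

$$\theta_1(\mathcal{X}, \mathcal{Y}) \ge \theta_2(\mathcal{X}, \mathcal{Y}) \ge \cdots \ge \theta_p(\mathcal{X}, \mathcal{Y}),$$

where $\gamma_i$ are the singular values of $Y^H X$.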

If a canonical angle is small, then computing it from (1) gives inaccurate results.

For small $\theta$, $\cos\theta \approx 1 - \tfrac{1}{2}\theta^2$. If $\theta \le 10^{-8}$, then $\cos\theta$ evaluates to 1 in IEEE double-precision arithmetic, and we would conclude that $\theta = 0$.

The cure for this problem is to compute the sines of the canonical angles.

Theorem

Let $X$ and $Y$ be orthonormal bases for $\mathcal{X}$ and $\mathcal{Y}$, and let $Y_\perp$ be an orthonormal basis for the orthogonal complement of $\mathcal{Y}$.

Then the singular values of $Y_\perp^H X$ are the sines of the canonical angles between $\mathcal{X}$ and $\mathcal{Y}$.

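Proof: Let

$$\begin{bmatrix} Y^H \\ Y_\perp^H \end{bmatrix} X = \begin{bmatrix} C \\ S \end{bmatrix}.$$

By orthonormality, we have

$$I = C^H C + S^H S.$$

Let

$$V^H (C^H C) V = \Gamma^2 \equiv \mathrm{diag}(\gamma_1^2, \ldots, \gamma_p^2)$$

be the spectral decomposition of $C^H C$. Then by the definition of the canonical angles $\theta_i$ in (1), we have

$$\theta_i = \cos^{-1}\gamma_i.$$

But

$$I = V^H (C^H C + S^H S) V = \Gamma^2 + V^H (S^H S) V \equiv \Gamma^2 + \Sigma^2.$$

It follows that

$$\Sigma^2 \equiv \mathrm{diag}(\sigma_1^2, \ldots, \sigma_p^2) = \mathrm{diag}(1 - \gamma_1^2, \ldots, 1 - \gamma_p^2),$$

where $\sigma_i$ are the singular values of $S = Y_\perp^H X$. Therefore,

$$\sigma_i^2 = 1 - \gamma_i^2 = 1 - \cos^2\theta_i = \sin^2\theta_i \;\Rightarrow\; \theta_i = \sin^{-1}\sigma_i.$$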
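The loss of accuracy for small angles can be seen in a short experiment. The sketch below assumes NumPy; the function name `canonical_angles` is hypothetical. Instead of forming $Y_\perp$ explicitly it uses $(I - YY^H)X$, whose singular values coincide with those of $Y_\perp^H X$ when $n \ge 2p$.

```python
import numpy as np

def canonical_angles(X, Y):
    """Canonical angles between R(X) and R(Y), in descending order.

    X, Y: n-by-p matrices with orthonormal columns (n >= 2p assumed).
    Returns the angles computed from cosines and from sines."""
    cosines = np.linalg.svd(Y.conj().T @ X, compute_uv=False)
    theta_cos = np.arccos(np.clip(cosines, 0.0, 1.0))
    # (I - Y Y^H) X has the same singular values as Y_perp^H X.
    sines = np.linalg.svd(X - Y @ (Y.conj().T @ X), compute_uv=False)
    theta_sin = np.arcsin(np.clip(sines, 0.0, 1.0))
    return np.sort(theta_cos)[::-1], np.sort(theta_sin)[::-1]

# Two lines in R^3 separated by a tiny angle t.
t = 1e-9
X = np.array([[1.0], [0.0], [0.0]])
Y = np.array([[np.cos(t)], [np.sin(t)], [0.0]])
theta_cos, theta_sin = canonical_angles(X, Y)
```

In this example the sine-based value reproduces `t` to high relative accuracy, while the cosine-based value cannot resolve an angle of this size in double precision.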

Theorem

Let $x$ be a vector with $\|x\|_2 = 1$ and let $\mathcal{Y}$ be a subspace. Then

$$\sin\angle(x, \mathcal{Y}) = \min_{y \in \mathcal{Y}} \|x - y\|_2.$$

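Proof: Let $(Y, Y_\perp)$ be unitary with $\mathcal{R}(Y) = \mathcal{Y}$. Let $y \in \mathcal{Y}$. Then

$$\begin{bmatrix} Y^H \\ Y_\perp^H \end{bmatrix} (x - y) = \begin{bmatrix} \hat{x} \\ \hat{x}_\perp \end{bmatrix} - \begin{bmatrix} \hat{y} \\ 0 \end{bmatrix} = \begin{bmatrix} \hat{x} - \hat{y} \\ \hat{x}_\perp \end{bmatrix}.$$

It implies that

$$\|x - y\|_2 = \left\| \begin{bmatrix} Y^H \\ Y_\perp^H \end{bmatrix} (x - y) \right\|_2 = \left\| \begin{bmatrix} \hat{x} - \hat{y} \\ \hat{x}_\perp \end{bmatrix} \right\|_2$$

and hence

$$\min_{y \in \mathcal{Y}} \|x - y\|_2 = \|\hat{x}_\perp\|_2 = \|Y_\perp^H x\|_2. \qquad (2)$$

By Theorem 10 and (2), we have

$$\sin\angle(x, \mathcal{Y}) = \|Y_\perp^H x\|_2 = \min_{y \in \mathcal{Y}} \|x - y\|_2.$$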

Theorem

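Let $X$ and $Y$ be orthonormal matrices with $X^H Y = 0$ and let $Z = X + YQ$. Let

$$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_k > 0 \quad\text{and}\quad \zeta_1 \ge \zeta_2 \ge \cdots \ge \zeta_k > 0$$

denote the nonzero singular values of $Z$ and $Q$, respectively.

Set

$$\theta_1 \ge \theta_2 \ge \cdots \ge \theta_k$$

to be the nonzero canonical angles between $\mathcal{R}(X)$ and $\mathcal{R}(Z)$.

Then

$$\sigma_i = \sec\theta_i \quad\text{and}\quad \zeta_i = \tan\theta_i, \quad\text{for } i = 1, \ldots, k.$$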

Proof: Since

$$X^H X = I, \quad Y^H Y = I, \quad X^H Y = 0 \quad\text{and}\quad Z = X + YQ,$$

we have

$$Z^H Z = (X^H + Q^H Y^H)(X + YQ) = I + Q^H Q.$$

This implies that

$$\sigma_i^2 = 1 + \zeta_i^2, \quad\text{for } i = 1, \ldots, k. \qquad (3)$$

Define

$$\hat{Z} \equiv Z (Z^H Z)^{-1/2} = (X + YQ)(I + Q^H Q)^{-1/2},$$

where $(I + Q^H Q)^{-1/2}$ is the inverse of the positive definite square root of $I + Q^H Q$. Then $\hat{Z}$ is an orthonormal basis for $\mathcal{R}(Z)$ and

$$X^H \hat{Z} = (I + Q^H Q)^{-1/2}.$$

Hence the singular values $\gamma_i$ of $X^H \hat{Z}$ are

$$\gamma_i = \left(\sqrt{1 + \zeta_i^2}\right)^{-1}, \quad\text{for } i = 1, \ldots, k.$$

Using (3) and the definition of the canonical angles $\theta_i$ between $\mathcal{R}(X)$ and $\mathcal{R}(Z)$, we have

$$\cos\theta_i = \gamma_i = \sigma_i^{-1}.$$

That is,

$$\sigma_i = \frac{1}{\cos\theta_i} = \sec\theta_i.$$

The relation $\tan\theta_i = \zeta_i$ now follows from (3).

Let $(L_1, X_1)$ be a simple right orthonormal eigenpair of $A$ and let $(X_1, Y_2)$ be unitary. From Theorem 8, $(L_1, Y_1 \equiv X_1 - Y_2 Q^H)$ is a left eigenpair of $A$ and $Y_1^H X_1 = I$. By Theorem 12, we obtain the following corollary.

Corollary

Let $X$ be an orthonormal basis for a simple eigenspace $\mathcal{X}$ of $A$ and let $Y$ be a basis for the corresponding left eigenspace $\mathcal{Y}$ of $A$, normalized so that $Y^H X = I$. Then the singular values of $Y$ are the secants of the canonical angles between $\mathcal{X}$ and $\mathcal{Y}$. In particular,

$$\|Y\|_2 = \sec\theta_1(\mathcal{X}, \mathcal{Y}).$$

Theorem

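Let $[X \;\; X_\perp]$ be unitary. Let

$$R = AX - XL \quad\text{and}\quad S^H = X^H A - L X^H.$$

Then $\|R\|$ and $\|S\|$ are minimized when

$$L = X^H A X,$$

in which case

$$\|R\| = \|X_\perp^H A X\| \quad\text{and}\quad \|S\| = \|X^H A X_\perp\|.$$

Proof: Set

$$\begin{bmatrix} X^H \\ X_\perp^H \end{bmatrix} A \begin{bmatrix} X & X_\perp \end{bmatrix} = \begin{bmatrix} \hat{L} & H \\ G & M \end{bmatrix}.$$

Then

$$\begin{bmatrix} X^H \\ X_\perp^H \end{bmatrix} R = \begin{bmatrix} \hat{L} & H \\ G & M \end{bmatrix} \begin{bmatrix} X^H \\ X_\perp^H \end{bmatrix} X - \begin{bmatrix} X^H \\ X_\perp^H \end{bmatrix} X L = \begin{bmatrix} \hat{L} - L \\ G \end{bmatrix}.$$

It implies that

$$\|R\| = \left\| \begin{bmatrix} X^H \\ X_\perp^H \end{bmatrix} R \right\| = \left\| \begin{bmatrix} \hat{L} - L \\ G \end{bmatrix} \right\|,$$

which is minimized when $L = \hat{L} = X^H A X$, and

$$\min \|R\| = \|G\| = \|X_\perp^H A X\|.$$

The proof for $S$ is similar.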
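A quick numerical check of the minimization property, as a sketch assuming NumPy and the Frobenius norm (one particular unitarily invariant norm). The random matrix, the seed, and the size of the perturbations are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 8, 3
A = rng.standard_normal((n, n))

# Orthogonal matrix split into X (n-by-k) and X_perp (n-by-(n-k)).
Qfull, _ = np.linalg.qr(rng.standard_normal((n, n)))
X, X_perp = Qfull[:, :k], Qfull[:, k:]

def residual_norm(L):
    """Frobenius norm of R = A X - X L."""
    return np.linalg.norm(A @ X - X @ L)

L_opt = X.T @ A @ X                           # Rayleigh quotient X^H A X
r_opt = residual_norm(L_opt)
r_formula = np.linalg.norm(X_perp.T @ A @ X)  # ||X_perp^H A X||_F

# Any other choice of L should give a larger residual.
r_perturbed = [residual_norm(L_opt + 0.1 * rng.standard_normal((k, k)))
               for _ in range(5)]
```

Here `r_opt` matches `r_formula`, and every entry of `r_perturbed` is larger, in line with $\|R\|_F^2 = \|\hat{L} - L\|_F^2 + \|G\|_F^2$.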

Definition

Let $X$ be of full column rank and let $X^I$ be a left inverse of $X$.

Then $X^I A X$ is a Rayleigh quotient of $A$.

Theorem

Let $X$ be orthonormal and let

$$R = AX - XL.$$

Let $\ell_1, \ldots, \ell_k$ be the eigenvalues of $L$. Then there are eigenvalues $\lambda_{j_1}, \ldots, \lambda_{j_k}$ of $A$ such that

$$|\ell_i - \lambda_{j_i}| \le \|R\|_2$$

and

$$\sqrt{\sum_{i=1}^{k} (\ell_i - \lambda_{j_i})^2} \le \sqrt{2}\, \|R\|_F.$$

Power method: computes the dominant eigenpair.

Disadvantage: at each step it considers only the single vector $A^k u$, which amounts to throwing away the information contained in $u, Au, A^2 u, \ldots, A^{k-1} u$.

Definition

Let $A$ be of order $n$ and let $u \ne 0$ be an $n$-vector. Then

$$\{u, Au, A^2 u, A^3 u, \ldots\}$$

is a Krylov sequence based on $A$ and $u$. We call the matrix

$$K_k(A, u) = \begin{bmatrix} u & Au & A^2 u & \cdots & A^{k-1} u \end{bmatrix}$$

the $k$th Krylov matrix. The space

$$\mathcal{K}_k(A, u) = \mathcal{R}[K_k(A, u)]$$

is called the $k$th Krylov subspace.

Theorem

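Let $A$ and $u \ne 0$ be given. Then

1 The sequence of Krylov subspaces satisfies

$$\mathcal{K}_k(A, u) \subset \mathcal{K}_{k+1}(A, u), \qquad A\mathcal{K}_k(A, u) \subset \mathcal{K}_{k+1}(A, u).$$

2 If $\sigma \ne 0$, then

$$\mathcal{K}_k(A, u) = \mathcal{K}_k(\sigma A, u) = \mathcal{K}_k(A, \sigma u).$$

3 For any $\kappa$,

$$\mathcal{K}_k(A, u) = \mathcal{K}_k(A - \kappa I, u).$$

4 If $W$ is nonsingular, then

$$\mathcal{K}_k(W^{-1} A W, W^{-1} u) = W^{-1} \mathcal{K}_k(A, u).$$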
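The properties listed above can be illustrated with a small experiment. This is a sketch assuming NumPy; `krylov_matrix` and `same_span` are hypothetical helper names, and the span comparison is a rank-based heuristic with an arbitrary tolerance, not a robust subspace test.

```python
import numpy as np

def krylov_matrix(A, u, k):
    """Return the n-by-k Krylov matrix [u, Au, ..., A^(k-1) u]."""
    cols = [u]
    for _ in range(k - 1):
        cols.append(A @ cols[-1])
    return np.column_stack(cols)

def same_span(K1, K2, tol=1e-8):
    """Heuristic: equal column spaces iff rank(K1) = rank(K2) = rank([K1 K2])."""
    r1 = np.linalg.matrix_rank(K1, tol)
    r2 = np.linalg.matrix_rank(K2, tol)
    r12 = np.linalg.matrix_rank(np.hstack([K1, K2]), tol)
    return r1 == r2 == r12

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
u = rng.standard_normal(6)
K = krylov_matrix(A, u, 3)

# Property 3: shifting A does not change the Krylov subspace.
shift_ok = same_span(K, krylov_matrix(A - 0.7 * np.eye(6), u, 3))
# Property 2: scaling A or u does not change it either.
scale_ok = same_span(K, krylov_matrix(2.0 * A, 3.0 * u, 3))
# Property 4: a similarity transforms the Krylov matrix by W^{-1}.
W = rng.standard_normal((6, 6))
Winv = np.linalg.inv(W)
similarity_ok = np.allclose(krylov_matrix(Winv @ A @ W, Winv @ u, 3), Winv @ K)
```

Forming the Krylov matrix explicitly is fine for small `k`, but its columns tend to become nearly dependent as `k` grows, so practical methods orthogonalize as they go.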

A Krylov sequence terminates at $\ell$ if $\ell$ is the smallest integer such that

$$\mathcal{K}_{\ell+1}(A, u) = \mathcal{K}_\ell(A, u).$$

Theorem

A Krylov sequence based on $A$ and $u$ terminates at $\ell$ if and only if $\ell$ is the smallest integer for which

$$\dim[\mathcal{K}_{\ell+1}] = \dim[\mathcal{K}_\ell].$$

If the Krylov sequence terminates at $\ell$, then $\mathcal{K}_\ell$ is an eigenspace of $A$ of dimension $\ell$. On the other hand, if $u$ lies in an eigenspace of dimension $m$, then for some $\ell \le m$, the sequence terminates at $\ell$.

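Proof:

If $\mathcal{K}_{\ell+1} = \mathcal{K}_\ell$, then $\dim[\mathcal{K}_{\ell+1}] = \dim[\mathcal{K}_\ell]$. On the other hand, if $\dim[\mathcal{K}_{\ell+1}] = \dim[\mathcal{K}_\ell]$, then $\mathcal{K}_{\ell+1} = \mathcal{K}_\ell$ because $\mathcal{K}_\ell \subset \mathcal{K}_{\ell+1}$. If the sequence terminates at $\ell$, then

$$A\mathcal{K}_\ell \subset \mathcal{K}_{\ell+1} = \mathcal{K}_\ell,$$

so that $\mathcal{K}_\ell$ is an invariant subspace of $A$.

Let $\mathcal{X}$ be an invariant subspace of dimension $m$. If $u \in \mathcal{X}$, then $A^i u \in \mathcal{X}$ for all $i$. That is, $\mathcal{K}_i \subset \mathcal{X}$ and

$$\dim(\mathcal{K}_i) \le m \quad\text{for all } i.$$

If the sequence terminated at $\ell > m$, then $\mathcal{K}_\ell$ would be an invariant subspace with $\dim(\mathcal{K}_\ell) > \dim(\mathcal{K}_m) = \dim(\mathcal{X})$, which is impossible.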

By the definition of $\mathcal{K}_k(A, u)$, any vector $v \in \mathcal{K}_k(A, u)$ can be written in the form

$$v = \gamma_1 u + \gamma_2 Au + \cdots + \gamma_k A^{k-1} u \equiv p(A) u,$$

where

$$p(A) = \gamma_1 I + \gamma_2 A + \gamma_3 A^2 + \cdots + \gamma_k A^{k-1}.$$

Assume that $A$ is Hermitian and has orthonormal eigenpairs $(\lambda_i, x_i)$ for $i = 1, \ldots, n$. Write $u$ in the form

$$u = \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_n x_n,$$

where $\alpha_i = x_i^H u$. Since $p(A) x_i = p(\lambda_i) x_i$, we have

$$p(A) u = \alpha_1 p(\lambda_1) x_1 + \alpha_2 p(\lambda_2) x_2 + \cdots + \alpha_n p(\lambda_n) x_n. \qquad (4)$$

If $|p(\lambda_i)|$ is large compared with $|p(\lambda_j)|$ for $j \ne i$, then $p(A) u$ is a good approximation to $x_i$.

Theorem

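If $x_i^H u \ne 0$ and $p(\lambda_i) \ne 0$, then

$$\tan\angle(p(A)u, x_i) \le \max_{j \ne i} \frac{|p(\lambda_j)|}{|p(\lambda_i)|} \tan\angle(u, x_i).$$

Proof: From (4), we have

$$\cos\angle(p(A)u, x_i) = \frac{|x_i^H p(A) u|}{\|p(A)u\|_2 \|x_i\|_2} = \frac{|\alpha_i p(\lambda_i)|}{\sqrt{\sum_{j=1}^{n} |\alpha_j p(\lambda_j)|^2}}$$

and

$$\sin\angle(p(A)u, x_i) = \frac{\sqrt{\sum_{j \ne i} |\alpha_j p(\lambda_j)|^2}}{\sqrt{\sum_{j=1}^{n} |\alpha_j p(\lambda_j)|^2}}.$$

Hence

$$\tan^2\angle(p(A)u, x_i) = \sum_{j \ne i} \frac{|\alpha_j p(\lambda_j)|^2}{|\alpha_i p(\lambda_i)|^2} \le \max_{j \ne i} \frac{|p(\lambda_j)|^2}{|p(\lambda_i)|^2} \sum_{j \ne i} \frac{|\alpha_j|^2}{|\alpha_i|^2} = \max_{j \ne i} \frac{|p(\lambda_j)|^2}{|p(\lambda_i)|^2} \tan^2\angle(u, x_i).$$

Assume that $p(\lambda_i) = 1$. Then

$$\tan\angle(p(A)u, x_i) \le \max_{j \ne i} |p(\lambda_j)| \tan\angle(u, x_i) \quad \forall\, p(A)u \in \mathcal{K}_k \text{ with } p(\lambda_i) = 1.$$

Hence

$$\tan\angle(x_i, \mathcal{K}_k) \le \min_{\deg(p) \le k-1,\ p(\lambda_i) = 1}\ \max_{j \ne i} |p(\lambda_j)| \tan\angle(u, x_i).$$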

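Assume that

$$\lambda_1 > \lambda_2 \ge \cdots \ge \lambda_n$$

and that our interest is in the eigenvector $x_1$. Then

$$\tan\angle(x_1, \mathcal{K}_k) \le \min_{\deg(p) \le k-1,\ p(\lambda_1) = 1}\ \max_{\lambda \in [\lambda_n, \lambda_2]} |p(\lambda)| \tan\angle(u, x_1).$$

Question

How to compute

$$\min_{\deg(p) \le k-1,\ p(\lambda_1) = 1}\ \max_{\lambda \in [\lambda_n, \lambda_2]} |p(\lambda)|\,?$$

Definition

The Chebyshev polynomials are defined by

$$c_k(t) = \begin{cases} \cos(k \cos^{-1} t), & |t| \le 1, \\ \cosh(k \cosh^{-1} t), & |t| \ge 1. \end{cases}$$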

Theorem

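(a) $c_0(t) = 1$, $c_1(t) = t$ and

$$c_{k+1}(t) = 2t\, c_k(t) - c_{k-1}(t), \quad k = 1, 2, \ldots.$$

(b) For $|t| > 1$,

$$c_k(t) = \frac{1}{2}\left[ \left(t + \sqrt{t^2 - 1}\right)^k + \left(t + \sqrt{t^2 - 1}\right)^{-k} \right].$$

(c) For $t \in [-1, 1]$, $|c_k(t)| \le 1$. Moreover, if

$$t_{ik} = \cos\frac{(k - i)\pi}{k}, \quad i = 0, 1, \ldots, k,$$

then

$$c_k(t_{ik}) = (-1)^{k-i}.$$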

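(d) For $s > 1$,

$$\min_{\deg(p) \le k,\ p(s) = 1}\ \max_{t \in [-1, 1]} |p(t)| = \frac{1}{c_k(s)}, \qquad (5)$$

and the minimum is attained only for $p(t) = c_k(t)/c_k(s)$.

To apply (5), we define

$$\lambda = \lambda_2 + \frac{\mu - 1}{2}(\lambda_2 - \lambda_n)$$

to transform the interval $[\lambda_n, \lambda_2]$ to $[-1, 1]$. Then the value of $\mu$ at $\lambda_1$ is

$$\mu_1 = 1 + 2\,\frac{\lambda_1 - \lambda_2}{\lambda_2 - \lambda_n}$$

and

$$\min_{\deg(p) \le k-1,\ p(\lambda_1) = 1}\ \max_{\lambda \in [\lambda_n, \lambda_2]} |p(\lambda)| = \min_{\deg(p) \le k-1,\ p(\mu_1) = 1}\ \max_{\mu \in [-1, 1]} |p(\mu)| = \frac{1}{c_{k-1}(\mu_1)}.$$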
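The Chebyshev polynomials are cheap to evaluate. The sketch below uses only the Python standard library and compares the standard three-term recurrence $c_{j+1}(t) = 2t\,c_j(t) - c_{j-1}(t)$ with the trigonometric and hyperbolic definition; the function names are hypothetical.

```python
import math

def chebyshev(k, t):
    """Evaluate c_k(t) with the recurrence c_{j+1} = 2 t c_j - c_{j-1}."""
    c_prev, c_curr = 1.0, t          # c_0(t) and c_1(t)
    if k == 0:
        return c_prev
    for _ in range(k - 1):
        c_prev, c_curr = c_curr, 2.0 * t * c_curr - c_prev
    return c_curr

def chebyshev_closed_form(k, t):
    """cos/cosh definition of c_k(t); this sketch only handles t >= -1."""
    if abs(t) <= 1.0:
        return math.cos(k * math.acos(t))
    return math.cosh(k * math.acosh(t))

# Outside [-1, 1] the polynomial grows like (t + sqrt(t^2 - 1))^k / 2,
# which is what makes 1 / c_k(s) small for s > 1.
t = 1.5
w = t + math.sqrt(t * t - 1.0)
growth = 0.5 * (w ** 6 + w ** (-6))
```

For instance, `chebyshev(6, 1.5)` and `growth` agree up to rounding, while on $[-1, 1]$ the values stay bounded by 1 in magnitude.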

Theorem

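Let the Hermitian matrix $A$ have orthonormal eigenpairs $(\lambda_i, x_i)$ with

$$\lambda_1 > \lambda_2 \ge \cdots \ge \lambda_n.$$

Let

$$\eta = \frac{\lambda_1 - \lambda_2}{\lambda_2 - \lambda_n}.$$

Then

$$\tan\angle[x_1, \mathcal{K}_k(A, u)] \le \frac{\tan\angle(x_1, u)}{c_{k-1}(1 + 2\eta)} = \frac{2\tan\angle(x_1, u)}{\left(1 + 2\eta + 2\sqrt{\eta + \eta^2}\right)^{k-1} + \left(1 + 2\eta + 2\sqrt{\eta + \eta^2}\right)^{1-k}}.$$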

Remark

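For $k$ large, we have

$$\tan\angle[x_1, \mathcal{K}_k(A, u)] \lesssim \frac{2\tan\angle(x_1, u)}{\left(1 + 2\eta + 2\sqrt{\eta + \eta^2}\right)^{k-1}}.$$

For $k$ large and $\eta$ small, the bound becomes

$$\tan\angle[x_1, \mathcal{K}_k(A, u)] \lesssim \frac{2\tan\angle(x_1, u)}{\left(1 + 2\sqrt{\eta}\right)^{k-1}}.$$

Compare this with the power method:

If $|\lambda_1| > |\lambda_2| \ge \cdots \ge |\lambda_n|$, then the power method converges at the rate $|\lambda_2/\lambda_1|^k$.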

For example, let

$$\lambda_1 = 1,\ \lambda_2 = 0.95,\ \lambda_3 = 0.95^2,\ \ldots,\ \lambda_{100} = 0.95^{99}$$

be the eigenvalues of a 100-by-100 matrix $A$. Then $\eta = 0.0530$ and the bound on the convergence rate is $1/(1 + 2\sqrt{\eta}) = 0.6847$. Thus the square root effect gives a great improvement over the rate of 0.95 for the power method.

If $A$ is replaced by $-A$, then the Krylov sequence converges to the eigenvector corresponding to the smallest eigenvalue of $A$. However, the smallest eigenvalues of a matrix, particularly a positive definite matrix, often tend to cluster together, so that the bound will be unfavorable.
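The example can be explored numerically. The sketch below assumes NumPy and a diagonal test matrix with the eigenvalues $0.95^{i}$; it measures the angle between $x_1$ and the Krylov subspace and compares it with the Chebyshev bound in the form $\tan\angle(x_1, u)/c_{k-1}(1 + 2\eta)$, which corresponds to mapping $[\lambda_n, \lambda_2]$ onto $[-1, 1]$. The helper names, the starting vector, and the choice $k = 10$ are assumptions made for illustration.

```python
import math
import numpy as np

def orthonormal_krylov_basis(A, u, k):
    """Orthonormal basis of K_k(A, u) via Gram-Schmidt applied twice per step."""
    Q = np.zeros((len(u), k))
    Q[:, 0] = u / np.linalg.norm(u)
    for j in range(1, k):
        w = A @ Q[:, j - 1]
        for _ in range(2):                       # second pass for stability
            w -= Q[:, :j] @ (Q[:, :j].T @ w)
        Q[:, j] = w / np.linalg.norm(w)
    return Q

def tan_angle_to_subspace(x, Q):
    """tan of the angle between a unit vector x and span(Q), Q orthonormal."""
    c = np.linalg.norm(Q.T @ x)
    s = np.linalg.norm(x - Q @ (Q.T @ x))
    return s / c

lam = 0.95 ** np.arange(100)          # 1, 0.95, ..., 0.95^99
A = np.diag(lam)
x1 = np.eye(100)[:, 0]                # eigenvector for lambda_1 = 1
u = np.ones(100) / 10.0               # unit starting vector

eta = (lam[0] - lam[1]) / (lam[1] - lam[-1])
tan0 = tan_angle_to_subspace(x1, u[:, None])

k = 10
measured = tan_angle_to_subspace(x1, orthonormal_krylov_basis(A, u, k))
bound = tan0 / math.cosh((k - 1) * math.acosh(1.0 + 2.0 * eta))

# Power method iterate A^(k-1) u, which is one particular vector in K_k.
v = np.linalg.matrix_power(A, k - 1) @ u
power_tan = tan_angle_to_subspace(x1, (v / np.linalg.norm(v))[:, None])
```

In this experiment `measured` stays below `bound`, and both are well below `power_tan`, which reflects that the power iterate is only one member of the Krylov subspace.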

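The hypothesis $\lambda_1 > \lambda_2$ can be relaxed. Suppose that $\lambda_1 = \lambda_2 > \lambda_3$. Expand $u$ in the form

$$u = \alpha_1 x_1 + \alpha_2 x_2 + \alpha_3 x_3 + \cdots + \alpha_n x_n.$$

Then

$$A^k u = \lambda_1^k (\alpha_1 x_1 + \alpha_2 x_2) + \alpha_3 \lambda_3^k x_3 + \cdots + \alpha_n \lambda_n^k x_n.$$

This shows that the spaces $\mathcal{K}_k(A, u)$ contain only approximations to the single vector $\alpha_1 x_1 + \alpha_2 x_2$.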

Theorem

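Let $\lambda$ be a simple eigenvalue of $A$ and let

$$A = \begin{bmatrix} x & X \end{bmatrix} \begin{bmatrix} \lambda & 0 \\ 0 & L \end{bmatrix} \begin{bmatrix} y^H \\ Y^H \end{bmatrix} = \lambda x y^H + X L Y^H$$

be a spectral representation. Let

$$u = \alpha x + X a,$$

where

$$\alpha = y^H u \quad\text{and}\quad a = Y^H u.$$

Then

$$\sin\angle[x, \mathcal{K}_k(A, u)] \le |\alpha|^{-1} \min_{\deg(p) \le k-1,\ p(\lambda) = 1} \|X p(L) a\|_2.$$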

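Proof: From Theorem 11,

$$\sin\angle[x, \mathcal{K}_k(A, u)] = |\alpha|^{-1} \min_{y \in \mathcal{K}_k(A, u)} \|\alpha x - y\|_2 = |\alpha|^{-1} \min_{\deg(p) \le k-1} \|\alpha x - p(A) u\|_2 \le |\alpha|^{-1} \min_{\deg(p) \le k-1,\ p(\lambda) = 1} \|\alpha x - p(A) u\|_2.$$

Since $p(\lambda) = 1$ and $AX = XL$, we have

$$p(A) u = p(A)(\alpha x + X a) = \alpha p(\lambda) x + X p(L) a = \alpha x + X p(L) a.$$

Hence

$$\sin\angle[x, \mathcal{K}_k(A, u)] \le |\alpha|^{-1} \min_{\deg(p) \le k-1,\ p(\lambda) = 1} \|\alpha x - (\alpha x + X p(L) a)\|_2 = |\alpha|^{-1} \min_{\deg(p) \le k-1,\ p(\lambda) = 1} \|X p(L) a\|_2.$$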

Let $(\lambda_i, x_i)$ be an eigenpair of $A$ for $i = 1, \ldots, n$. Write the vector $u$ in the form

$$u = \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_n x_n.$$

Assume that $\lambda_1$ is double, i.e., $\lambda_1 = \lambda_2$. Then

$$A^k u = \lambda_1^k (\alpha_1 x_1 + \alpha_2 x_2) + \lambda_3^k \alpha_3 x_3 + \cdots + \lambda_n^k \alpha_n x_n.$$

Hence the Krylov sequence can only produce the approximation $\alpha_1 x_1 + \alpha_2 x_2$ to a vector in the eigenspace of $\lambda_1$.

Let $U$ be a matrix with linearly independent columns. Then the sequence

$$\{U, AU, A^2 U, \ldots\}$$

is called a block Krylov sequence and the space $\mathcal{K}_k(A, U)$ is called the $k$th block Krylov space.

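Goal: passing to block Krylov sequences improves the convergence bound.

Theorem

Let $A$ be Hermitian and let $(\lambda_i, x_i)$ be a complete system of orthonormal eigenpairs of $A$ (the $n$ eigenvectors are linearly independent) with

$$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n,$$

and assume that the multiplicity of $\lambda_1$ is not greater than $m$. If

$$B = \begin{bmatrix} x_1^H \\ \vdots \\ x_m^H \end{bmatrix} U$$

is nonsingular and we set

$$v = U B^{-1} e_1,$$

then

$$\tan\angle[x_1, \mathcal{K}_k(A, U)] \le \frac{\tan\angle(x_1, v)}{c_{k-1}(1 + 2\eta)} = \frac{2\tan\angle(x_1, v)}{\left(1 + 2\eta + 2\sqrt{\eta + \eta^2}\right)^{k-1} + \left(1 + 2\eta + 2\sqrt{\eta + \eta^2}\right)^{1-k}}, \qquad (6)$$

where

$$\eta = \frac{\lambda_1 - \lambda_{m+1}}{\lambda_{m+1} - \lambda_n}.$$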

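Proof: Since $v \in \mathcal{R}(U)$, we have $\mathcal{K}_k(A, v) \subset \mathcal{K}_k(A, U)$. By Theorem 11,

$$\sin\angle[x_1, \mathcal{K}_k(A, U)] = \min_{y \in \mathcal{K}_k(A, U)} \|x_1 - y\|_2 \le \min_{y \in \mathcal{K}_k(A, v)} \|x_1 - y\|_2 = \sin\angle[x_1, \mathcal{K}_k(A, v)].$$

This implies that

$$\angle[x_1, \mathcal{K}_k(A, U)] \le \angle[x_1, \mathcal{K}_k(A, v)].$$

By the definition of $B$, we have

$$\begin{bmatrix} x_1^H \\ \vdots \\ x_m^H \end{bmatrix} U B^{-1} = I \;\Rightarrow\; x_i^H U B^{-1} = e_i^T \quad\text{for } i = 1, \ldots, m.$$

By the definition of $v$,

$$x_i^H v = x_i^H U B^{-1} e_1 = 0 \quad\text{for } i = 2, \ldots, m.$$

On the other hand, $(\lambda_i, x_i)$ is an eigenpair of the Hermitian matrix $A$, i.e., $x_i^H A = \lambda_i x_i^H$. Hence

$$x_i^H A^j v = \lambda_i^j x_i^H v = 0 \quad\text{for } i = 2, \ldots, m.$$

This implies that the vectors in $\mathcal{K}_k(A, v)$ have no components along $x_2, \ldots, x_m$. That is, with $\alpha_i = x_i^H v$,

$$A^j v = \alpha_1 \lambda_1^j x_1 + \alpha_{m+1} \lambda_{m+1}^j x_{m+1} + \cdots + \alpha_n \lambda_n^j x_n$$

for $j = 0, \ldots, k - 1$. We may now apply Theorem 23 to get (6).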

Theorem

Let $\mathcal{U}$ be a subspace and let $U$ be a basis for $\mathcal{U}$. Let $V^H$ be a left inverse of $U$ and set

$$B = V^H A U.$$

If $\mathcal{X} \subset \mathcal{U}$ is an eigenspace of $A$, then there is an eigenpair $(L, W)$ of $B$ such that $(L, UW)$ is an eigenpair of $A$ with $\mathcal{R}(UW) = \mathcal{X}$.

Proof: Let $(L, X)$ be an eigenpair of $A$ with $\mathcal{R}(X) = \mathcal{X}$. Since $\mathcal{X} \subset \mathcal{U}$, we can write $X = UW$. Then from the relation

$$AUW = UWL$$

we obtain

$$BW = V^H A U W = V^H U W L = WL,$$

so that $(L, W)$ is an eigenpair of $B$.

We can find exact eigenspaces contained in $\mathcal{U}$ by looking at eigenpairs of the Rayleigh quotient $B$.

Algorithm (Rayleigh-Ritz procedure)

1 Let $U$ be a basis for $\mathcal{U}$ and let $V^H$ be a left inverse of $U$.

2 Form the Rayleigh quotient $B = V^H A U$.

3 Let $(M, W)$ be a suitable eigenpair of $B$.

4 Return $(M, UW)$ as an approximate eigenpair of $A$.

The pair $(M, UW)$ is called a Ritz pair. When a Ritz pair is written in the form $(\lambda, Uw)$, we call $\lambda$ a Ritz value and $Uw$ a Ritz vector.

Two difficulties for the Rayleigh-Ritz procedure:

(i) how to choose the eigenpair $(M, W)$ in statement 3;

(ii) there is no guarantee that the result approximates the desired eigenpair of $A$.
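The four steps of the procedure can be sketched as follows, assuming NumPy. The function name `rayleigh_ritz` is hypothetical, the pseudoinverse is only one admissible choice of left inverse $V^H$, and the function returns all eigenpairs of $B$ and leaves the selection in statement 3 to the caller.

```python
import numpy as np

def rayleigh_ritz(A, U, V=None):
    """Return all Ritz values and Ritz vectors of A from the basis U.

    V^H must be a left inverse of U (V^H U = I). If V is None, the
    pseudoinverse of U is used as V^H (an assumed default)."""
    VH = np.linalg.pinv(U) if V is None else V.conj().T
    B = VH @ A @ U                      # step 2: Rayleigh quotient
    ritz_values, W = np.linalg.eig(B)   # step 3: eigenpairs of B
    return ritz_values, U @ W           # step 4: columns of U W

# Illustration: if R(U) is an exact eigenspace, the Ritz pairs are exact.
rng = np.random.default_rng(3)
S = rng.standard_normal((6, 6))
A = S + S.T                                   # symmetric test matrix
evals, evecs = np.linalg.eigh(A)
U = evecs[:, :2] @ np.array([[1.0, 2.0],
                             [0.5, -1.0]])    # non-orthonormal basis
ritz_values, ritz_vectors = rayleigh_ritz(A, U)
residuals = np.linalg.norm(A @ ritz_vectors - ritz_vectors * ritz_values, axis=0)
```

In this illustration the residuals are at rounding level and the Ritz values coincide with the two smallest eigenvalues of `A`, as the preceding theorem on exact eigenspaces predicts.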

Example

Let $A = \mathrm{diag}(0, 1, -1)$ and suppose we are interested in approximating the eigenpair $(0, e_1)$. Assume

$$U = \begin{bmatrix} 1 & 0 \\ 0 & 1/\sqrt{2} \\ 0 & 1/\sqrt{2} \end{bmatrix}.$$

Then

$$B = U^H A U = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$$

and any nonzero vector $p$ is an eigenvector of $B$. If we take $p = [1, 1]^T$, then $Up = [1, 1/\sqrt{2}, 1/\sqrt{2}]^T$ is an approximate eigenvector of $A$, which is completely wrong. Thus the method can fail, even though the space $\mathcal{U}$ contains the desired eigenvector.
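The example is small enough to reproduce directly. The sketch below assumes NumPy; the names `p_bad` and `p_good` are introduced only for this illustration.

```python
import numpy as np

A = np.diag([0.0, 1.0, -1.0])
s = 1.0 / np.sqrt(2.0)
U = np.array([[1.0, 0.0],
              [0.0, s],
              [0.0, s]])

B = U.T @ A @ U                  # Rayleigh quotient: the 2-by-2 zero matrix

p_bad = np.array([1.0, 1.0])     # an eigenvector of B, but a poor choice
p_good = np.array([1.0, 0.0])    # also an eigenvector of B; U p_good = e_1

# Residual norms ||A x - 0 * x||_2 for the Ritz value 0.
res_bad = np.linalg.norm(A @ (U @ p_bad))
res_good = np.linalg.norm(A @ (U @ p_good))
```

Both vectors are eigenvectors of `B` for the Ritz value 0, yet only `p_good` gives a zero residual, so the Rayleigh quotient alone cannot tell them apart.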

The matrices $U$ and $V$ in Algorithm 1 satisfy the condition $V^H U = I$, and they can differ. Hence Algorithm 1 is called an oblique Rayleigh-Ritz method.

If the matrix $U$ is taken to be orthonormal with $V = U$ and, in addition, $W$ is taken to be orthonormal, so that $\hat{X} \equiv UW$ is also orthonormal, we call the procedure the orthogonal Rayleigh-Ritz method.

Theorem

Let $(M, \hat{X} \equiv UW)$ be an orthogonal Rayleigh-Ritz pair. Then

$$R = A\hat{X} - \hat{X}M$$

is minimal in any unitarily invariant norm.

Proof: By Theorem 14, we need to show $M = \hat{X}^H A \hat{X}$. Since $(M, W)$ is an eigenpair of $B$ and $W$ is orthonormal, we have

$$M = W^H B W$$

and

$$\hat{X}^H A \hat{X} = W^H U^H A U W = W^H B W = M.$$

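Let $(\lambda, x)$ be the desired eigenpair of $A$ and let $U_\theta$ be an orthonormal basis of a subspace $\mathcal{U}_\theta$ for which $\theta = \angle(x, \mathcal{U}_\theta)$ is small.

Theorem

Let

$$B_\theta = U_\theta^H A U_\theta.$$

Then there is a matrix $E_\theta$ satisfying

$$\|E_\theta\|_2 \le \frac{\sin\theta}{\sqrt{1 - \sin^2\theta}} \|A\|_2$$

such that $\lambda$ is an eigenvalue of $B_\theta + E_\theta$.

Proof. Let $(U_\theta, U_\perp)$ be unitary and set

$$y = U_\theta^H x \quad\text{and}\quad z = U_\perp^H x.$$

From Theorem 10, we have

$$\|z\|_2 = \sin\theta \quad\text{and}\quad \|y\|_2 = \sqrt{1 - \sin^2\theta}.$$

Since $Ax - \lambda x = 0$, we have

$$U_\theta^H A \begin{bmatrix} U_\theta & U_\perp \end{bmatrix} \begin{bmatrix} U_\theta^H \\ U_\perp^H \end{bmatrix} x - \lambda U_\theta^H x = 0,$$

or

$$B_\theta y + U_\theta^H A U_\perp z - \lambda y = 0.$$

Let $\hat{y} = y/\|y\|_2 = y/\sqrt{1 - \sin^2\theta}$. If

$$r \equiv B_\theta \hat{y} - \lambda \hat{y} = \frac{-1}{\sqrt{1 - \sin^2\theta}}\, U_\theta^H A U_\perp z,$$

it follows that

$$\|r\|_2 \le \frac{\sin\theta}{\sqrt{1 - \sin^2\theta}} \|A\|_2.$$

Now define

$$E_\theta = -r \hat{y}^H.$$

Then

$$\|E_\theta\|_2 = \sqrt{\lambda_{\max}\big((r\hat{y}^H)(\hat{y} r^H)\big)} = \sqrt{\lambda_{\max}(r r^H)} = \|r\|_2 \le \frac{\sin\theta}{\sqrt{1 - \sin^2\theta}} \|A\|_2$$

and

$$(B_\theta + E_\theta)\hat{y} = B_\theta \hat{y} - (r \hat{y}^H)\hat{y} = B_\theta \hat{y} - r = \lambda \hat{y}.$$

Therefore, $(\lambda, \hat{y})$ is an eigenpair of $B_\theta + E_\theta$.

Corollary

There is an eigenvalue $\mu_\theta$ of $B_\theta$ such that

$$|\mu_\theta - \lambda| \le 4\,(2\|A\|_2 + \|E_\theta\|_2)^{1 - 1/m}\, \|E_\theta\|_2^{1/m},$$

where $m$ is the order of $B_\theta$.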

Theorem

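Let $(\mu_\theta, w_\theta)$ be an eigenpair of $B_\theta$ and let $[w_\theta \;\; W_\theta]$ be unitary, so that

$$\begin{bmatrix} w_\theta^H \\ W_\theta^H \end{bmatrix} B_\theta \begin{bmatrix} w_\theta & W_\theta \end{bmatrix} = \begin{bmatrix} \mu_\theta & h_\theta^H \\ 0 & N_\theta \end{bmatrix}.$$

Then

$$\sin\angle(x, U_\theta w_\theta) \le \sin\theta \sqrt{1 + \frac{\|h_\theta\|_2^2}{\mathrm{sep}(\lambda, N_\theta)^2}},$$

where $\mathrm{sep}(\lambda, N_\theta) = \|(\lambda I - N_\theta)^{-1}\|^{-1}$.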

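By the continuity of sep, we have

$$|\mathrm{sep}(\lambda, N_\theta) - \mathrm{sep}(\mu_\theta, N_\theta)| \le |\mu_\theta - \lambda| \;\Rightarrow\; \mathrm{sep}(\lambda, N_\theta) \ge \mathrm{sep}(\mu_\theta, N_\theta) - |\mu_\theta - \lambda|.$$

Suppose $\mu_\theta \to \lambda$ and $\mathrm{sep}(\mu_\theta, N_\theta)$ is bounded below. Then $\mathrm{sep}(\lambda, N_\theta)$ is also bounded below. Since $\|h_\theta\|_2 \le \|A\|_2$, we have $\sin\angle(x, U_\theta w_\theta) \to 0$ along with $\theta$.

Corollary

Let $(\mu_\theta, U_\theta w_\theta)$ be a Ritz pair for which $\mu_\theta \to \lambda$. If there is a constant $\alpha > 0$ such that

$$\mathrm{sep}(\mu_\theta, N_\theta) \ge \alpha > 0, \qquad (7)$$

then

$$\sin\angle(x, U_\theta w_\theta) \lesssim \sin\theta \sqrt{1 + \frac{\|A\|_2^2}{\alpha^2}}.$$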

This corollary justifies the statement that eigenvalue convergence plus separation equals eigenvector convergence.

The condition (7) is called the uniform separation condition.

Let $(\mu_\theta, x_\theta)$ with $\|x_\theta\|_2 = 1$ be the Ritz approximation to $(\lambda, x)$.

Then by construction, we have

$$\mu_\theta = x_\theta^H A x_\theta. \qquad (8)$$

Write

$$x_\theta = \gamma x + \sigma y, \qquad (9)$$

where $y \perp x$ and $\|y\|_2 = 1$. Then

$$|\gamma| = |x^H x_\theta| = \cos\angle(x_\theta, x) \quad\text{and}\quad |\sigma| = |y^H x_\theta| = \sin\angle(x_\theta, x).$$

If the uniform separation condition is satisfied, we have

$$|\sigma| = \sin\angle(x_\theta, x) = O(\theta).$$

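Substituting (9) into (8) and using the facts that $Ax = \lambda x$ and $y^H x = 0$, we find that

$$\mu_\theta = (\bar{\gamma} x^H + \bar{\sigma} y^H)(\gamma \lambda x + \sigma A y) = |\gamma|^2 \lambda + \sigma x_\theta^H A y.$$

Hence, since $1 - |\gamma|^2 = |\sigma|^2$ and $|\lambda| \le \|A\|_2$,

$$|\mu_\theta - \lambda| = \big|(|\gamma|^2 - 1)\lambda + \sigma x_\theta^H A y\big| \le |\sigma|^2 |\lambda| + |\sigma| \|x_\theta\|_2 \|A\|_2 \|y\|_2 \le |\sigma|(1 + |\sigma|)\|A\|_2 = O(\theta).$$

Thus the Ritz value converges at least as fast as the approximation to $x$ in $\mathcal{R}(U_\theta)$.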

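If $A$ is Hermitian, then

$$\mu_\theta = (\bar{\gamma} x^H + \bar{\sigma} y^H)(\gamma \lambda x + \sigma A y) = |\gamma|^2 \lambda + \bar{\gamma}\sigma x^H A y + |\sigma|^2 y^H A y = |\gamma|^2 \lambda + \bar{\gamma}\sigma \bar{\lambda} x^H y + |\sigma|^2 y^H A y = |\gamma|^2 \lambda + |\sigma|^2 y^H A y$$

and

$$|\mu_\theta - \lambda| = \big|(|\gamma|^2 - 1)\lambda + |\sigma|^2 y^H A y\big| \le |\sigma|^2 |\lambda| + |\sigma|^2 \|y\|_2 \|A\|_2 \|y\|_2 \le 2|\sigma|^2 \|A\|_2 = O(\theta^2).$$

Since the angle $\theta = \angle(x, \mathcal{U}_\theta)$ is not known, these error bounds cannot be computed. Thus, we must look to the residual as an indication of convergence.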

Theorem

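Let $A$ have the spectral representation

$$A = \lambda x y^H + X L Y^H,$$

where $\|x\|_2 = 1$ and $Y$ is orthonormal. Let $(\mu, \tilde{x})$ be an approximation to $(\lambda, x)$ and let

$$\rho = \|A\tilde{x} - \mu\tilde{x}\|_2.$$

Then

$$\sin\angle(\tilde{x}, x) \le \frac{\rho}{\mathrm{sep}(\mu, L)} \le \frac{\rho}{\mathrm{sep}(\lambda, L) - |\mu - \lambda|}.$$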

Proof: Since $Y^H x = 0$, $Y$ is an orthonormal basis for the orthogonal complement of $\mathrm{span}\{x\}$, and then $\sin\angle(\tilde{x}, x) = \|Y^H \tilde{x}\|_2$. Let $r = A\tilde{x} - \mu\tilde{x}$. Then

$$Y^H r = Y^H A \tilde{x} - \mu Y^H \tilde{x} = (L - \mu I) Y^H \tilde{x}.$$

It follows that

$$\sin\angle(\tilde{x}, x) = \|(L - \mu I)^{-1} Y^H r\|_2 \le \frac{\|r\|_2}{\mathrm{sep}(\mu, L)}.$$

By the fact that $\mathrm{sep}(\mu, L) \ge \mathrm{sep}(\lambda, L) - |\mu - \lambda|$, the second inequality is obtained.

Since $\lambda$ is assumed to be simple, this theorem says that:

A sufficient condition for $\tilde{x}$ to converge to $x$ is for $\mu$ to converge to $\lambda$ and for the residual to converge to zero.

Definition

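Let $\mu_\theta$ be a Ritz value associated with $\mathcal{U}_\theta$. A refined Ritz vector is a solution of the problem

$$\min \|A\hat{x}_\theta - \mu_\theta \hat{x}_\theta\|_2 \quad\text{subject to}\quad \hat{x}_\theta \in \mathcal{U}_\theta,\ \|\hat{x}_\theta\|_2 = 1.$$

Theorem

Let $A$ have the spectral representation

$$A = \lambda x y^H + X L Y^H,$$

where $\|x\|_2 = 1$ and $Y$ is orthonormal. Let $\mu_\theta$ be a Ritz value and $\hat{x}_\theta$ the corresponding refined Ritz vector. If

$$\mathrm{sep}(\lambda, L) - |\mu_\theta - \lambda| > 0,$$

then

$$\sin\angle(x, \hat{x}_\theta) \le \frac{\|A - \mu_\theta I\|_2 \sin\theta + |\lambda - \mu_\theta|}{\sqrt{1 - \sin^2\theta}\,\big[\mathrm{sep}(\lambda, L) - |\lambda - \mu_\theta|\big]}.$$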

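Proof: Let $U$ be an orthonormal basis for $\mathcal{U}_\theta$, let $U_\perp$ be an orthonormal basis for its orthogonal complement, and let $x = y + z$, where $y = U U^H x$. Then

$$\|z\|_2 = \|U_\perp^H x\|_2 = \sin\theta.$$

Moreover, since $y$ and $z$ are orthogonal,

$$1 = \|x\|_2^2 = \|y\|_2^2 + \|z\|_2^2 \;\Rightarrow\; \|y\|_2^2 = 1 - \|z\|_2^2 = 1 - \sin^2\theta.$$

Let

$$\hat{y} = \frac{y}{\sqrt{1 - \sin^2\theta}}.$$

We have

$$(A - \mu_\theta I)\hat{y} = \frac{(A - \mu_\theta I) y}{\sqrt{1 - \sin^2\theta}} = \frac{(A - \mu_\theta I)(x - z)}{\sqrt{1 - \sin^2\theta}} = \frac{(\lambda - \mu_\theta) x - (A - \mu_\theta I) z}{\sqrt{1 - \sin^2\theta}}.$$

Hence

$$\|(A - \mu_\theta I)\hat{y}\|_2 \le \frac{|\lambda - \mu_\theta| + \|A - \mu_\theta I\|_2 \sin\theta}{\sqrt{1 - \sin^2\theta}}.$$

By the definition of a refined Ritz vector we have

$$\|(A - \mu_\theta I)\hat{x}_\theta\|_2 \le \frac{|\lambda - \mu_\theta| + \|A - \mu_\theta I\|_2 \sin\theta}{\sqrt{1 - \sin^2\theta}}.$$

The result now follows from Theorem 35.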

Remark

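By Corollary 30, $\mu_\theta \to \lambda$. It follows that $\sin\angle(x, \hat{x}_\theta) \to 0$. In other words, refined Ritz vectors are guaranteed to converge.

$\hat{\mu}_\theta = \hat{x}_\theta^H A \hat{x}_\theta$ is more accurate than $\mu_\theta$, and $\|A\hat{x}_\theta - \hat{\mu}_\theta \hat{x}_\theta\|_2$ is optimal.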

The computation of a refined Ritz vector amounts to solving

$$\min \|A\hat{x} - \mu\hat{x}\|_2 \quad\text{subject to}\quad \hat{x} \in \mathcal{U},\ \|\hat{x}\|_2 = 1. \qquad (10)$$

Let $U$ be an orthonormal basis for $\mathcal{U}$. Then (10) is equivalent to

$$\min \|(A - \mu I) U z\|_2 \quad\text{subject to}\quad \|z\|_2 = 1.$$

The solution of this problem is the right singular vector of $(A - \mu I)U$ corresponding to its smallest singular value. Thus a refined Ritz vector can be computed by the following algorithm.

1 $V = AU$

2 $W = V - \mu U$

3 Compute the smallest singular value of $W$ and its right singular vector $z$

4 $\hat{x} = Uz$
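The four steps above translate almost line by line into code. The sketch below assumes NumPy and a full SVD of the small matrix $W$; the function name `refined_ritz_vector` is hypothetical. As a test case it reuses the earlier example $A = \mathrm{diag}(0, 1, -1)$, for which the Ritz value is $\mu = 0$ and the ordinary Ritz vector is not determined.

```python
import numpy as np

def refined_ritz_vector(A, U, mu):
    """Refined Ritz vector for the Ritz value mu; U has orthonormal columns.

    Returns the vector and the residual norm ||(A - mu I) x_hat||_2."""
    W = A @ U - mu * U                          # steps 1-2: W = A U - mu U
    _, svals, Vh = np.linalg.svd(W, full_matrices=False)
    z = Vh[-1].conj()                           # step 3: right singular vector
                                                # for the smallest singular value
    return U @ z, svals[-1]                     # step 4: x_hat = U z

A = np.diag([0.0, 1.0, -1.0])
s = 1.0 / np.sqrt(2.0)
U = np.array([[1.0, 0.0],
              [0.0, s],
              [0.0, s]])

x_hat, sigma_min = refined_ritz_vector(A, U, 0.0)
```

In this case `x_hat` equals $e_1$ up to sign and `sigma_min` is zero up to rounding, so the refined vector picks out the desired eigenvector that the Rayleigh quotient could not identify.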

With the Rayleigh-Ritz method, exterior eigenvalues converge more readily than interior eigenvalues. The quality of the refined Ritz vector depends on the accuracy of the Ritz value µ, and each refined Ritz vector must be calculated independently from its own distinct value of µ.

Definition

Let $U$ be an orthonormal basis for the subspace $\mathcal{U}$. Then $(\kappa + \delta, Uw)$ is a harmonic Ritz pair with shift $\kappa$ if

$$U^H (A - \kappa I)^H (A - \kappa I) U w = \delta\, U^H (A - \kappa I)^H U w. \qquad (11)$$

Given the shift $\kappa$, (11) is a generalized eigenvalue problem with eigenvalue $\delta$.

Theorem

Let $(\lambda, x)$ be an eigenpair of $A$ with $x = Uw$. Then $(\lambda, Uw)$ is a harmonic Ritz pair.

Proof: Since (λ, x) is an eigenpair of A with x = U w, we have Ax = λx ⇒ AU w = λU w.

It implies that

U^H(A − κI)^H(A − κI)U w = (λ − κ)U^H(A − κI)^HU w.

Taking eigenvalue δ = λ − κ, we obtain

U^H(A − κI)^H(A − κI)U w = δU^H(A − κI)^HU w.

That is (κ + δ, U w) = (λ, U w) is a harmonic Ritz pair.

Given a shift κ, if we want to compute the eigenvalue λ of A which is closest to κ, then we need to compute the eigenvalue δ of (11) such that |δ| is the smallest value of all of the absolute values for the eigenvalues of (11).

Expect

If x is approximately represented in U , then the harmonic Rayleigh-Ritz will produce an approximation to x.

Question

How to compute the eigenpair (δ, w) of (11)?

Let

$$(A - \kappa I) U = QR$$

be the QR factorization of $(A - \kappa I)U$. Then (11) can be rewritten as

$$R^H R w = \delta R^H Q^H U w.$$

That is, when $R$ is nonsingular,

$$(Q^H U) w = \delta^{-1} R w.$$

This generalized eigenvalue problem can be solved by the QZ algorithm. The harmonic Ritz vector is $\hat{x} = Uw$ and the corresponding harmonic Ritz value is $\mu = \hat{x}^H A \hat{x}$.
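The QR-based formulation can be sketched in a few lines, assuming NumPy. Note the substitution: instead of the QZ algorithm, this sketch reduces the generalized problem to the standard eigenproblem $R^{-1}(Q^H U) w = \delta^{-1} w$, which requires $R$ to be nonsingular and is meant only as an illustration. The function name `harmonic_ritz` and the shift $\kappa = 0.1$ are assumptions; the test matrix is the earlier example $A = \mathrm{diag}(0, 1, -1)$.

```python
import numpy as np

def harmonic_ritz(A, U, kappa):
    """Harmonic Ritz pair with shift kappa whose value kappa + delta is closest to kappa.

    U has orthonormal columns; (A - kappa I) U is assumed to have full rank.
    Returns (kappa + delta, Rayleigh quotient mu, unit vector x_hat)."""
    n = A.shape[0]
    Q, R = np.linalg.qr((A - kappa * np.eye(n)) @ U)
    # (Q^H U) w = (1/delta) R w  <=>  R^{-1} (Q^H U) w = (1/delta) w
    inv_delta, W = np.linalg.eig(np.linalg.solve(R, Q.conj().T @ U))
    j = int(np.argmax(np.abs(inv_delta)))   # largest |1/delta| = smallest |delta|
    x_hat = U @ W[:, j]
    x_hat = x_hat / np.linalg.norm(x_hat)
    mu = x_hat.conj() @ A @ x_hat           # harmonic Ritz value via Rayleigh quotient
    return kappa + 1.0 / inv_delta[j], mu, x_hat

A = np.diag([0.0, 1.0, -1.0])
s = 1.0 / np.sqrt(2.0)
U = np.array([[1.0, 0.0],
              [0.0, s],
              [0.0, s]])

value, mu, x_hat = harmonic_ritz(A, U, 0.1)
```

For this example the pair closest to the shift is $(0, \pm e_1)$ up to rounding, which agrees with the theorem that an eigenvector contained in $\mathcal{U}$ yields a harmonic Ritz pair.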