# Eigenspaces and their Approximation

## Full text

(1)

Tsung-Ming Huang

Department of Mathematics National Taiwan Normal University, Taiwan

April 8, 2009

(2)

### Outline

1 Eigenspaces Definitions

Simple eigenspaces

2 Perturbation Theory Canonical angles Residual analysis

3 Krylov subspaces

Krylov sequences and Krylov spaces Convergence

Block Krylov spaces

4 Rayleigh-Ritz Approximation Rayleigh-Ritz methods Convergence

Refined Ritz vectors Harmonic Ritz vectors

(3)

Definition

Let A be of order n and let X be a subspace of Cn. Then X is aneigenspaceorinvariant subspaceof A if

AX = {Ax; x ∈ X } ⊂ X .

If (λ, x) ≡ (α + ıβ, y + ız) is a complex eigenpair of a real matrix A, i.e.,

A(y + ız) = (α + ıβ)(y + ız) = (αy − βz) + ı(βy + αz)

 Ay = αy − βz, Az = βy + αz, then

A

y z  =  y z 

 α β

−β α

 . It implies that R [ y z ] is an eigenspace of A.

(4)

Theorem

Let X be an eigenspace of A and let X be a basis for X . Then there is a unique matrix L such that

AX = XL.

The matrix L is given by

L = XIAX, where XI is a matrix satisfying XIX = I.

If (λ, x) is an eigenpair of A with x ∈ X , then (λ, XIx)is an eigenpair of L. Conversely, if (λ, u) is an eigenpair of L, then (λ, Xu)is an eigenpair of A.

(5)

Proof: Let

X = [x1· · · xk] and Y = AX = [y1· · · yk] .

Since yi ∈ X and X is a basis for X , there is a unique vector `i such that

yi = X`i.

If we set L = [`1· · · `k], then AX = XL and L = XIXL = XIAX.

Now let (λ, x) be an eigenpair of A with x ∈ X . Then there is a unique vector u such that x = Xu. However, u = XIx. Hence

λx = Ax = AXu = XLu ⇒ λu = λXIx = Lu.

Conversely, if Lu = λu, then

A(Xu) = (AX)u = (XL)u = X(Lu) = λ(Xu), so that (λ, Xu) is an eigenpair of A.

(6)

Definition

Let A be of order n. For X ∈ Cn×kand L ∈ Ck×k, we say that (L, X)is an eigenpair of order k or right eigenpair of order k of Aif

1. X is of full rank, 2. AX = XL.

The matrices X and L are called eigenbasis and eigenblock, respectively. If X is orthonormal, we say that the eigenpair (L, X)is orthonormal.

If Y ∈ Cn×k has linearly independent columns and

YHA = LYH, we say that (L, Y ) is a left eigenpair of order k of A.

(7)

Question

How eigenpairs transform under change of basis and similarities?

Theorem

Let (L, X) be an eigenpair of A. If U is nonsingular, then the pair (U−1LU, XU )is also eigenpair of A. If W is nonsingular, then (L, W−1X)is an eigenpair of W−1AW.

proof:

A(XU ) = (AX)U = (XL)U = (XU )(U−1LU ), (W−1AW )(W−1X) = W−1AX = (W−1X)L.

The eigenvalues of L of an eigenspace with respect to a basis are independent of the choices of the basis.

(8)

Theorem

Let L = {λ1, . . . , λk} ⊂ Λ(A) be a multisubset of the

eigenvalues of A. Then there is an eigenspace X of A whose eigenvalues are λ1, . . . , λk.

Proof: Let

A[ U1 U2 ] = [ U1 U2 ]

 T11 T12 0 T22



be a partitioned Schur decomposition of A in which T11is of order k and has the members of L on its diagonal. Then

AU1= U1T11.

Hence the column space of U1is an eigenspace of A whose eigenvalues are the members of L.

(9)

Definition

An eigenvalue whose geometric multiplicity is less than its algebraic multiplicity is defective.

Definition

Let X be an eigenspace of A with eigenvalues L. Then X is a simple eigenspace of A if

L ∩ [Λ(A) \ L] = ∅.

In the other words, an eigenspace is simple if its eigenvalues are disjoint from the other eigenvalues of A.

(10)

Theorem

Let (L1, X1)be a simple orthonormal eigenpairs of A and let (X1, Y2)be unitary so that

 X1H Y2H

 A

X1 Y2  =

 L1 H 0 L2

 .

Then there is a matrix Q satisfying the Sylvester equation L1Q − QL2= −H

such that if we set X =

X1 X2

 and Y =

Y1 Y2  , where

X2 = Y2+ X1Q and Y1 = X1− Y2QH,

(11)

then

YHX = I and YHAX =diag(L1, L2).

Proof: Since (L1, X1)is a simple eigenpairs of A, it implies that Λ(L1) ∩ λ(L2) = ∅.

By Theorem 1.18 in Chapter 1, there is a unique matrix Q satisfying

L1Q − QL2= −H such that

 I −Q

0 I

  L1 H 0 L2

  I Q 0 I



=diag(L1, L2).

That is

 I −Q

0 I

  X1H Y2H

 A

X1 Y2



 I Q 0 I



=diag(L1, L2).

(12)

Therefore,

 X1H− QY2H Y2H

 A

X1 X1Q + Y2  = diag(L1, L2).

Observations

1 X and Y are said to be biorthogonal.

2 Since A

X1 X2  =  X1 X2  diag(L1, L2), we see that

AX2 = X2L2,

so that (L2, X2)is an eigenpair of A. Likewise (L1, Y1)is a left eigenpair of A.

(13)

Let x and y be nonzero vectors. Then the angle ∠(x, y) of x and y is defined as

cos ∠(x, y) = |xHy|

k x k2k y k2.

Extend this definition to subspaces in Cn. Let X and Y be subspaces of the same dimension. Let X and Y be orthonormal bases for X and Y, respectively, and define C = YHX. We have

k C k2≤k X k2k Y k2= 1.

Hence all the singular value of C lie in [0, 1] and can be regarded as cosine of angles.

(14)

Definition

Let X and Y be subspaces of Cnof dimension p and let X and Y be orthonormal bases for X and Y, respectively. Then the canonical angles between X and Y are

θi(X , Y) = cos−1γi, (1) with

θ1(X , Y) ≥ θ2(X , Y) ≥ · · · ≥ θp(X , Y), where γi are the singular values of YHX.

(15)

If the canonical angle is small, then the computation of (1) will give inaccurate results.

For small θ, cos(θ) ∼= 1 −12θ2. If θ ≤ 10−8, then cos(θ) will evaluate to 1 in IEEE double-precision arithmetic, and we will conclude that θ = 0.

The cure for this problem is to compute the sine of the canonical angles.

Theorem

Let X and Y be orthonormal bases for X and Y, and let Ybe an orthonormal basis for the orthogonal complement of Y.

Then the singular values of YHX are the sines of the canonical angles between X and Y.

(16)

Proof: Let

 YH YH

 X =

 C S

 . By the orthonormality, we have

I = CHC + SHS.

Let

VH(CHC)V = Γ2 ≡ diag(γ12, · · · , γp2)

be the spectral decomposition of CHC. Then by the definition of canonical angle θiin (1), we have

θi= cos−1γi. But

I = VH(CHC + SHS)V = Γ2+ VH(SHS)V ≡ Γ2+ Σ2.

(17)

It follows that

Σ2 ≡ diag(σ21, · · · , σ2p) =diag(1 − γ21, · · · , 1 − γp2), where σi are singular values of S = YHX. Therefore,

σ2i = 1 − γi2= 1 − cos2θi = sin2θi ⇒ θi= sin−1σi.

Theorem

Let x be a vector with k x k2= 1and let Y be a subspace. Then sin ∠(x, Y) = min

y∈Y k x − y k2.

(18)

Proof: Let (Y, Y)be unitary with R(Y ) = Y. Let y ∈ Y, then

 YH YH



(x − y) =

 xˆ ˆ x



 yˆ 0



=

 x − ˆˆ y ˆ x

 . It implies that

k x − y k2=

 YH YH



(x − y) 2

=

 x − ˆˆ y ˆ x

 2

and hence

miny∈Ykx − yk = kˆxk2= kYHxk2. (2) By Theorem 10 and (2), we have

sin ∠(x, Y) = kYHxk2= min

y∈Ykx − yk.

(19)

Theorem

Let X and Y be orthonormal matrices with XHY = 0and let Z = X + Y Q. Let

σ1≥ σ2 ≥ · · · ≥ σk> 0 and ζ1 ≥ ζ2 ≥ · · · ≥ ζk> 0 denote the nonzero singular values of Z and Q, respectively.

Set

θ1 ≥ θ2≥ · · · ≥ θk

to be the nonzero canonical angle between R(X) and R(Z).

Then

σi = sec θi and ζi= tan θi, for i = 1, . . . , k.

(20)

Proof: Since

XHX = I, YHY = I, XHY = 0 and Z = X + Y Q, we have

ZHZ = (XH + QHYH)(X + Y Q) = I + QHQ.

This implies that

σi2 = 1 + ζi2, for i = 1, . . . , k. (3) Define

Z ≡ Z(Zˆ HZ)−1/2= (X + Y Q)(I + QHQ)−1/2, where (I + QHQ)−1/2is the inverse of the positive definite square root of I + QHQ. Then ˆZis an orthonormal basis for R(Z) and

XHZ = (I + QQˆ H)−1/2.

(21)

Hence the singular values γi of XHZˆare γi =

q 1 + ζi2

−1

for i = 1, . . . , k. Using (3) and the definition of canonical angles θi between R(X) and R(Z), we have

cos θi = γi= σi−1. That is

σi= 1 cos θi

= sec θi. The relation tan θi= ζinow follows from (3).

(22)

Let (L1, X1)be a simple right orthonormal eigenpair of A and let (X1, Y2)be unitary. From Theorem 8, (L1, Y1≡ X1− Y2QH) is left eigenpair of A and Y1HX1= I. By Theorem 12, we obtain the following corollary.

Corollary

Let X be an orthonormal basis for a simple eigenspace X of A and let Y be a basis for the corresponding left eigenspace Y of Anormalized so that YHX = I. Then the singular values of Y are the secants of the canonical angles between X and Y. In particular,

kY k2= sec θ1(X , Y).

(23)

Theorem

Let [X X]be unitary. Let R = AX − XL and

SH = XHA − LXH. Then kRk and kSk are minimized when L = XHAX,

in which case

kRk = kXHAXk and kSk = kXHAXk.

Proof: Set

 XH XH

 A

X X  =

 Lˆ H

G M

 . Then

 XH XH

 R =

 Lˆ H

G M

  XH XH

 X −

 XH XH

 XL =

 L − Lˆ G

 .

(24)

It implies that kRk =

 XH XH

 R

=

 L − Lˆ G

 ,

which is minimized when L = XHAXand min kRk = kGk = kXHAXk.

The proof for S is similar.

Definition

Let X be of full column rank and let XIbe a left inverse of X.

Then XIAX is a Rayleigh quotient of A.

(25)

Theorem

Let X be orthonormal and let

R = AX − XL.

Let `1, . . . , `kbe the eigenvalues of L. Then there are eigenvalues λj1, . . . , λjk of A such that

|`i− λji| ≤ kRk2 and

v u u t

k

X

i=1

(`i− λji)2≤√

2kRkF.

(26)

Power method: compute the dominant eigenpair Disadvantage: at each step it considers only the single vector Aku, which amounts to throwing away the information contained in u, Au, A2u, . . . , Ak−1u.

Definition

Let A be of order n and let u 6= 0 be an n vector. Then {u, Au, A2u, A3u, . . .}

is a Krylov sequence based on A and u. We call the matrix Kk(A, u) =

u Au A2u · · · Ak−1u  the kth Krylov matrix. The space

Kk(A, u) = R[Kk(A, u)]

is called the kth Krylov subspace.

(27)

Theorem

Let A and u 6= 0 be given. Then

1 The sequence of Krylov subspaces satisfies

Kk(A, u) ⊂ Kk+1(A, u), AKk(A, u) ⊂ Kk+1(A, u).

2 If σ 6= 0, then

Kk(A, u) = Kk(σA, u) = Kk(A, σu).

3 For any κ,

Kk(A, u) = Kk(A − κI, u).

4 If W is nonsingular, then

Kk(W−1AW, W−1u) = W−1Kk(A, u).

(28)

A Krylov sequence terminates at ` if ` is the smallest integer such that

K`+1(A, u) = K`(A, u).

Theorem

A Krylov sequence terminates based on A and u at ` if and only if ` is the smallest integer for which

dim[K`+1] =dim[K`].

If the Krylov sequence terminates at `, then K`is an

eigenspace of A of dimension `. On the other hand, if u lies in an eigenspace of dimension m, then for some ` ≤ m, the sequence terminates at `.

(29)

Proof:

If K`+1 = K`, then dim[K`+1] =dim[K`]. On the other hand, if dim[K`+1] =dim[K`], then K`+1 = K`because K`⊂ K`+1. If the sequence terminates at `, then

AK` ⊂ K`+1= K`, so that K` is an invariant subspace of A.

Let X be an invariant subspace with dimension m. If u ∈ X, then Aiu ∈ X for all i. That is Ki⊂ X and

dim(Ki) ≤ mfor all i. If the sequence terminates at ` > m, then K` is an invariant subspace and dim(K`) >

dim(Km) =dim(X ), which is impossible.

(30)

By the definition of Kk(A, u), for any vector v ∈ Kk(A, u)can be written in the form

v = γ1u + γ2Au + · · · + γkAk−1u ≡ p(A)u, where

p(A) = γ1I + γ2A + γ3A2+ · · · + γkAk−1.

Assume that A is Hermitian and has an orthonormal eigenpairs (λi, xi)for i = 1, . . . , n. Write u in the form

u = α1x1+ α2x2+ · · · + αnxn, where αi = xHi u. Since p(A)xi = p(λi)xi, we have

p(A)u = α1p(λ1)x1+ α2p(λ2)x2+ · · · + αnp(λn)xn. (4) If p(λi)is large compared with p(λj)for j 6= i, then p(A)u is a good approximation to xi.

(31)

Theorem

If xHi u 6= 0and p(λi) 6= 0, then tan ∠(p(A)u, xi) ≤ max

j6=i

|p(λj)|

|p(λi)|tan ∠(u, xi).

Proof: From (4), we have

cos ∠(p(A)u, xi) = |xHi p(A)u|

kp(A)uk2kxik2 = |αip(λi)|

qPn

j=1jp(λj)|2 and

sin ∠(p(A)u, xi) = qP

j6=ijp(λj)|2 qPn

j=1jp(λj)|2

(32)

Hence

tan2∠(p(A)u, xi) = X

j6=i

jp(λj)|2

ip(λi)|2

≤ max

j6=i

|p(λj)|2

|p(λi)|2 X

j6=i

j|2

i|2

= max

j6=i

|p(λj)|2

|p(λi)|2 tan2∠(u, xi).

Assume that p(λi) = 1, then tan ∠(p(A)u, xi) ≤ max

j6=i,p(λi)=1|p(λj)| tan ∠(u, xi) ∀ p(A)u ∈ Kk. Hence

tan ∠(xi, Kk) ≤ min

deg(p)≤k−1,p(λi)=1max

j6=i |p(λj)| tan ∠(u, xi).

(33)

Assume that

λ1> λ2 ≥ · · · ≥ λn

and that our interest is in the eigenvector x1. Then tan ∠(x1, Kk) ≤ min

deg(p)≤k−1,p(λ1)=1 max

λ∈[λn2]|p(λ)| tan ∠(u, x1).

Question

How to compute

deg(p)≤k−1,p(λmin 1)=1 max

λ∈[λn2]|p(λ)|?

Definition

The Chebyshev polynomials are defined by ck(t) =

 cos(k cos−1t), |t| ≤ 1, cosh(k cosh−1t), |t| ≥ 1.

(34)

Theorem

(a) c0(t) = 1, c1(t) = tand

ck+1(t) = 2ck(t) − ck−1(t), k = 1, 2, . . . . (b) For |t| > 1,

ck(t) = (1 +p

t2− 1)k+ (1 +p

t2− 1)−k. (c) For t ∈ [−1, 1], |ck(t)| ≤ 1. Moreover, if

tik = cos(k − i)π

k , i = 0, 1, . . . , k, then

ck(tik) = (−1)k−i.

(35)

(d) For s > 1,

min

deg(p)≤k,p(s)=1max

t∈[0,1]

|p(t)| = 1

ck(s), (5) and the minimum is obtained only for p(t) = ck(t)/ck(s).

For applying (5), we define

λ = λ2+ (µ − 1)(λ2− λn)

to transform interval [λn, λ2]to [0, 1]. Then the value of µ at λ1 is µ1 = 1 + λ1− λ2

λ2− λn and

min

deg(p)≤k−1,p(λ1)=1 max

λ∈[λn2]

|p(λ)|

= min

deg(p)≤k−1,p(µ1)=1 max

µ∈[0,1]

|p(µ)| = 1 ck−11).

(36)

Theorem

Let the Hermitian matrix A have an orthonormal eigenpair (λi, xi)with

λ1> λ2 ≥ · · · ≥ λn. Let

η = λ1− λ2 λ2− λn. Then

tan ∠[x1, Kk(A, u)] ≤ tan ∠(x1, u) ck−1(1 + η)

= tan ∠(x1, u)

(1 +p

2η + η2)k−1+ (1 +p

2η + η2)1−k.

(37)

Remark

For k large, we have

tan ∠[x1, Kk(A, u)] . tan ∠(x1, u) (1 +p

2η + η2)k−1. For k large and if η is small, then the bound becomes

tan ∠[x1, Kk(A, u)] . tan ∠(x1, u) (1 +√

2η)k−1. Compare it with power method:

If |λ1| > |λ2| ≥ · · · ≥ |λn|, then the convergence of the power method is |λ21|k.

(38)

For example, let

λ1 = 1, λ2 = 0.95, λ3= 0.952, · · · , λ100 = 0.9599 be the eigenvalues of 100-by-100 matrix A. Then η = 0.0530and the bound on the convergence rate is 1/(1 +√

2η) = 0.7544. Thus the square root effect gives a great improvement over the rate of 0.95 for the power method.

Replaced A by −A, then the Krylov sequence converges to the eigenvector corresponding to the smallest

eigenvalue of A. However, the smallest eigenvalues of a matrix – particularly a positive definite matrix – often tend to cluster together, so that the bound will be unfavorable.

(39)

The hypothesis λ1> λ2 can be relaxed. Suppose that λ1= λ2 > λ3. Expand u in the form

u = α1x1+ α2x2+ α3x3+ · · · + αnxn. Then

Aku = λk11x1+ α2x2) + α3λk3x3+ · · · + αnλknxn. This shows that the spaces Kk(A, u)contain only approximations to α1x1+ α2x2.

(40)

Theorem

Let λ be a simple eigenvalue of A and let A =

x X 

 λ 0 0 L

  yH YH



= λxyH + XLYH be a spectral representation. Let

u = αx + Xa, where

α = yHu and a = YHu.

Then

sin ∠[x, Kk(A, u)] ≤ |α|−1 min

deg(p)≤k−1,p(λ)=1

kXp(L)ak2.

(41)

Proof: From Theorem 11,

sin ∠[x, Kk(A, u)] = |α|−1 min

y∈Kk(A,u)

kαx − yk2

= |α|−1 min

deg(p)≤k−1

kαx − p(A)uk2

≤ |α|−1 min

deg(p)≤k−1,p(λ)=1kαx − p(A)uk2. Since

p(λ) = 1 and AX = XL, we have

p(A)u = p(A)(αx + Xa) = αp(λ)x + Xp(L)a = αx + Xp(L)a.

Hence

sin ∠[x, Kk(A, u)] ≤ |α|−1 min

deg(p)≤k−1,p(λ)=1

kαx − (αx + Xp(L)a)k2

= |α|−1 min

deg(p)≤k−1,p(λ)=1kXp(L)ak2.

(42)

Let (λi, xi)be an eigenpair of A for i = 1, . . . , n. Write vector u in the form

u = α1x1+ α2x2+ · · · + αnxn. Assume that λ1is double, i.e., λ1 = λ2. Then

Aku = λk11x1+ α2x2) + λk3α3x3+ · · · + λknαnxn. Hence the Krylov sequence can only produce the

approximation α1x1+ α2x2 to a vector in the eigenspace of λ1. Let U be a matrix with linearly independent columns. Then the sequence

{U, AU, A2U, . . .}

is called a block Krylov sequence and the space Kk(A, U )is called the k-th block Krylov space.

(43)

Gaol: passing to block Krylov sequence improves the convergence bound.

Theorem

Let A be Hermitian and let (λi, xi)be a complete system (n eigenvectors are linearly independent) of orthonormal eigenpairs of A with

λ1≥ λ2 ≥ · · · ≥ λn,

and assume that the multiplicity of λ1 is not greater than m. If

B =

 xH1

... xHm

U is nonsingular and we set

v = U B−1e1,

(44)

then

tan ∠[x1, Kk(A, U )] ≤ tan ∠(x1, v) ck−1(1 + 2η)

= tan ∠(x1, v)

(1 + 2p

η + η2)k−1+ (1 + 2p

η + η2)1−k(6), where

η = λ1− λm+1 λm+1− λn.

Proof: Since v ∈ R(U ), we have Kk(A, v) ⊂ Kk(A, U ). By Theorem 11,

sin ∠[x1, Kk(A, U )] = min

y∈Kk(A,U )

kx1− yk2

≤ min

y∈Kk(A,v)

kx1− yk2 = sin ∠[x1, Kk(A, v)].

(45)

This implies that

∠[x1, Kk(A, U )] ≤ ∠[x1, Kk(A, v)].

By the definition of B, we have

 xH1

... xHn

U B−1 = I ⇒ xHi U B−1 = eTi for i = 1, . . . , m.

By the definition of v,

xHi v = xHi U B−1e1= 0 for i = 2, . . . , m.

On the other hand, (λi, xi)is an eigenpair of Hermitian A, i.e., xHi A = λixHi . Hence

xHi Ajv = λjixHi v = 0 for i = 2, . . . , m.

(46)

This implies that x2, . . . , xm are not contained in Kk(A, v). That is

Ajv = α1λj1x1+ αm+1λjm+1xm+1+ · · · + αnλjnxn

for j = 1, . . . , k − 1. We may now apply Theorem 23 to get (6).

(47)

Theorem

Let U be a subspace and let U be a basis for U . Let V be a left inverse of U and set

B = VHAU.

If X ⊂ U is an eigenspace of A, then there is an eigenpair (L, W )of B such that (L, U W ) is an eigenpair of A with R(U W ) = X .

Proof: Let (L, X) be an eigenpair of A and let X = U W . Then from the relation

AU W = U W L we obtain

BW = VHAU W = VHU W L = W L, so that (L, W ) is an eigenpair of B.

(48)

We can find exact eigenspaces contained in U by looking at eigenpairs of the Rayleigh quotient B.

Algorithm (Rayleigh-Ritz procedure)

1 Let U be a basis for U and let VH be a left inverse of U .

2 Form the Rayleigh quotient B = VHAU.

3 Let (M, W ) be a suitable eigenpair of B.

4 Return (M, U W ) as an approximate eigenpair of A.

(M, U W )is called aRitz pair. Written Ritz pair in the form (λ, U w), we will call λ aRitz valueand U w aRitz vector.

Two difficulties for Rayleigh-Ritz procedure: (i) how to choose the eigenpair (M, W ) in statement 3. (ii) no guarantee that the result approximates the desired eigenpair of A.

(49)

Example

Let A = diag(0, 1, −1) and suppose we interested in approximating the eigenpair (0, e1). Assume

U =

1 0

0 1/√ 2 0 1/√

2

.

Then

B = UHAU =

 0 0 0 0



and any nonzero vector p is an eigenvector of B. If we take p = [1, 1]T, then U p = [1, 1/√

2, 1/√

2]is an approximate eigenvector of A, which is completely wrong. Thus the method can fail, even though the space U contains the desired

eigenvector.

(50)

The matrices U and V in Algorithm 1 satisfy the condition VHU = Iand they can differ. Hence Algorithm 1 is called anoblique Rayleigh-Ritz method.

If the matrix U is taken to be orthonormal and V = U . In addition, W is taken to be orthonormal, so that ˆX ≡ U W is also orthonormal. We call this procedure theorthogonal Rayleigh-Rite method.

Theorem

Let (M, ˆX ≡ U W )be an orthogonal Rayleigh-Rite pair. Then R = A ˆX − ˆXM

is minimal in any unitarily invariant norm.

(51)

Proof: By Theorem 14, we need to show M = ˆXHA ˆX. Since (M, W )is an eigenpair of B and W is orthonormal, we have

M = WHBW and

HA ˆX = WHUHAU W = WHBW = M.

(52)

Let (λ, x) be the desired eigenpair of A and Uθ be an orthonormal basis for which θ = ∠(x, Uθ)is small.

Theorem Let

Bθ= UθHAUθ. Then there is a matrix Eθsatisfying

kEθk2 ≤ sin θ p1 − sin2θ

kAk2 such that λ is an eigenvalue of Bθ+ Eθ. Proof. Let (Uθ, U)be unitary and set

y = UθHx and z = UHx.

(53)

From Theorem 10, we have

kzk2= sin θ and kyk2 =p

1 − sin2θ.

Since Ax − λx = 0, we have UθHA[Uθ, U]

 UθH UH



x − λUθHx = 0, or

Bθy + UθHAUz − λy = 0.

Let ˆy = y/kyk2 = y/p

1 − sin2θ. If r ≡ Bθy − λˆˆ y = −1

p1 − sin2θ

UθHAUz, it follows that

krk2≤ sin θ p1 − sin2θ

kAk2.

(54)

Now define

Eθ= −r ˆyH. Then

kEθk2 = q

λ1((r ˆyH)(ˆyrH)) = q

λ1(rrH) = krk2≤ sin θ p1 − sin2θ

kAk2 and

(Bθ+ Eθ)ˆy = Bθy − (r ˆˆ yH)ˆy = Bθy − r = λˆˆ y.

Therefore, (λ, ˆy)is an eigenpair of Bθ+ Eθ. Corollary

There is an eigenvalue µθof Bθsuch that

θ− λ| ≤ 4(2kAk2+ kEθk2)1−1/mkEθk1/m2 , where m is the order of Bθ.

(55)

Theorem

Let (µθ, wθ)be an eigenpair of Bθ and let

wθ Wθ  be unitary, so that

 wHθ WθH

 Bθ

 wθ Wθ  =

 µθ hHθ 0 Nθ

 . Then

sin ∠(x, Uθwθ) ≤ sin θ s

1 + khθk22 sep(λ, Nθ)2, where sep(λ, Nθ) = k(λI − Nθ)−1k−1.

(56)

By the continuity of sep, we have

|sep(λ, Nθ) −sep(µθ, Nθ)| ≤ |µθ− λ|

⇒ sep(λ, Nθ) ≥sep(µθ, Nθ) − |µθ− λ|.

Suppose µθ→ λ and sep(µθ, Nθ)is bounded below. Then sep(λ, Nθ)is also bounded below. Since khθk2 ≤ kAk2, we have sin ∠(x, Uθwθ) → 0along with θ.

Corollary

Let (µθ, Uθwθ)be a Ritz pair for which µθ→ λ. If there is a constant α > 0 such that

sep(µθ, Nθ) ≥ α > 0, (7) then

sin ∠(x, Uθwθ) . sin θ r

1 +kAk2 α2 .

(57)

This corollary justifies that eigenvalue convergence plus separation equals eigenvector convergence.

The condition (7) is called the uniform separation condition.

Let (µθ, xθ)with kxθk2 = 1be the Ritz approximation to (λ, x).

Then by construction, we have

µθ = xHθ Axθ. (8)

Write

xθ = γx + σy, (9)

where y⊥x and kyk2= 1. Then

|γ| = |xHxθ| = cos ∠(xθ, x) and |σ| = |yHxθ| = sin ∠(xθ, x).

If the uniform separation is satisfied, we have

|σ| = sin ∠(xθ, x) = O(θ).

(58)

Substituting (9) into (8) and using the facts of Ax = λx and yHx = 0, we find that

µθ = (¯γxH + ¯σyH)(γλx + σAy)

= |γ|2λ + σxHθ Ay.

Hence

θ− λ| = |(|γ|2− 1)λ + σxHθ Ay|

≤ |σ|2|λ| + |σ|kxθk2kAk2kyk2

≤ |σ|(1 + |σ|)kAk2

= O(θ).

Thus the Ritz value converges at least as fast as the eigenvector approximation of x in U (Uθ).

(59)

If A is Hermitian, then

µθ = (¯γxH + ¯σyH)(γλx + σAy)

= |γ|2λ + ¯rσxHAy + |σ|2yHAy = |γ|2λ + ¯rσ¯λxHy + |σ|2yHAy

= |γ|2λ + |σ|2yHAy and

θ− λ| = |(|γ|2− 1)λ + |σ|2yHAy|

≤ |σ|2|λ| + |σ|2kyk2kAk2kyk2

≤ 2|σ|2kAk2

= O(θ2).

Since the angle θ = ∠(x, Uθ)cannot be known and hence cannot compute error bounds. Thus, we must look to the residual as an indication of convergence.

(60)

Theorem

Let A have the spectral representation A = λxyH + XLYH,

where kxk2 = 1and Y is orthonormal. Let (µ, ˜x)be an approximation to (λ, x) and let

ρ = kA˜x − µ˜xk2. Then

sin ∠(˜x, x) ≤ ρ

sep(µ, L) ≤ ρ

sep(λ, L) − |µ − λ|.

(61)

Proof: Since YHx = 0, we have that Y is an orthonormal basis for the orthogonal complement of the space span{x} and then sin ∠(˜x, x) = kYHxk˜ 2. Let r = A˜x − µ˜x. Then

YHr = YHA˜x − µYHx = (L − µI)Y˜ Hx.˜ It follows that

sin ∠(˜x, x) = k(L − µI)−1YHrk2 ≤ krk2 sep(µ, L). By the fact that sep(µ, L) ≥ sep(λ, L) − |µ − λ|, the second inequality is obtained.

Since λ is assumed to be simple, this theorem says that:

Sufficient condition for ˜xto converge to x is for µ to converge to λand for the residual to converge to zero.

(62)

Definition

Let µθbe a Ritz value associated with Uθ. A refined Ritz vector is a solution of the problem

min kAˆxθ− µθθk2 subject to xˆθ ∈ Uθ, kˆxθk2 = 1.

Theorem

Let A have the spectral representation A = λxyH + XLYH,

where kxk2 = 1and Y is orthonormal. Let µθbe a Ritz value and ˆxθthe corresponding refined Ritz vector. If

sep(λ, L) − |µθ− λ| > 0, then

sin ∠(x, ˆxθ) ≤ kA − µθIk2sin θ + |λ − µθ| p1 − sin2θ[sep(λ, L) − |λ − µθ|].

(63)

Proof: Let U be an orthonormal basis for Uθ and let x = y + z, where z = U UHx. Then

kzk2 = kUHxk2 = sin θ.

Moreover, since y and x are orthogonal,

kzk22 = kx − yk22 = (xH− yH)(x − y)

= kxk22+ kyk22= 1 + kyk22

⇒ kyk22= 1 − kzk22= 1 − sin2θ.

Let

ˆ

y = y

p1 − sin2θ , we have

(A − µθI)ˆy = (A − µθI)y p1 − sin2θ

= (A − µθI)(x − z) p1 − sin2θ

= (λ − µθ)x − (A − µθI)z p1 − sin2θ

.

(64)

Hence

k(A − µθI)ˆyk2≤ |λ − µθ| + kA − µθIk sin θ p1 − sin2θ . By the definition of a refined Ritz vector we have

k(A − µθI)ˆxk2 ≤ |λ − µθ| + kA − µθIk sin θ p1 − sin2θ

. The result now follows from Theorem 35.

Remark

By Corollary 30, µθ→ λ. It follows that sin ∠(x, ˆxθ) → 0. In other words, refined Ritz vectors are guaranteed to

converge.

ˆ

µθ = ˆxHθ Aˆxθ is more accurate than µθ and kAˆxθ− ˆµθθk2 is optimal.

(65)

The computation of a refined Ritz vector amounts to solve min kAˆx − µˆxk2

subject to x ∈ U , kˆˆ xk2 = 1. (10) Let U be an orthonormal basis for U . Then (10) is equivalent to

min k(A − µI)U zk2 subject to kzk2 = 1.

The solution of this problem is the right singular vector of (A − µI)Ucorresponding to its smallest singular value. Thus refined Ritz vector can be computed by the following algorithm.

1 V = AU

2 W = V − µU

3 Compute the smallest singular value of W and its right singular vector z

4 x = U zˆ

(66)

Exterior eigenvalues are easily convergent than interior eigenvalues by Rayleigh quotient. The quality of the refined Ritz vector depends on the accuracy of the Ritz value µ and each refined Ritz vector must be calculated independently from its own distinct value of µ.

Definition

Let U be an orthonormal basis for subspace U . Then (κ + δ, U w)is aHarmonic Ritz pair with shift κif

UH(A − κI)H(A − κI)U w = δUH(A − κI)HU w. (11) Given shift κ, (11) is a generalized eigenvalue problem with eigenvalue δ.

Theorem

Let (λ, x) be an eigenpair of A with x = U w. Then (λ, U w) is a harmonic Ritz pair.

(67)

Proof: Since (λ, x) is an eigenpair of A with x = U w, we have Ax = λx ⇒ AU w = λU w.

It implies that

UH(A − κI)H(A − κI)U w = (λ − κ)UH(A − κI)HU w.

Taking eigenvalue δ = λ − κ, we obtain

UH(A − κI)H(A − κI)U w = δUH(A − κI)HU w.

That is (κ + δ, U w) = (λ, U w) is a harmonic Ritz pair.

(68)

Given a shift κ, if we want to compute the eigenvalue λ of A which is closest to κ, then we need to compute the eigenvalue δ of (11) such that |δ| is the smallest value of all of the absolute values for the eigenvalues of (11).

Expect

If x is approximately represented in U , then the harmonic Rayleigh-Ritz will produce an approximation to x.

Question

How to compute the eigenpair (δ, w) of (11)?

(69)

Let

(A − κI)U = QR

be the QR factorization of (A − κI)U . Then (11) can be rewritten as

RHRw = δRHQHU w.

That is

(QHU )w = δ−1Rw.

This eigenvalue can be solved by the QZ algorithm. The harmonic Ritz vector ˆx = U wand the corresponding harmonic Ritz value is µ = ˆxHAˆx.

Updating...

## References

Related subjects :