• 沒有找到結果。

# Eigenspaces and their Approximation

N/A
N/A
Protected

Share "Eigenspaces and their Approximation"

Copied!
69
0
0

(1)

Tsung-Ming Huang

Department of Mathematics National Taiwan Normal University, Taiwan

April 8, 2009

(2)

### Outline

1 Eigenspaces Definitions

Simple eigenspaces

2 Perturbation Theory Canonical angles Residual analysis

3 Krylov subspaces

Krylov sequences and Krylov spaces Convergence

Block Krylov spaces

4 Rayleigh-Ritz Approximation Rayleigh-Ritz methods Convergence

Refined Ritz vectors Harmonic Ritz vectors

(3)

Definition

Let A be of order n and let X be a subspace of Cn. Then X is aneigenspaceorinvariant subspaceof A if

AX = {Ax; x ∈ X } ⊂ X .

If (λ, x) ≡ (α + ıβ, y + ız) is a complex eigenpair of a real matrix A, i.e.,

A(y + ız) = (α + ıβ)(y + ız) = (αy − βz) + ı(βy + αz)

 Ay = αy − βz, Az = βy + αz, then

A

y z  =  y z 

 α β

−β α

 . It implies that R [ y z ] is an eigenspace of A.

(4)

Theorem

Let X be an eigenspace of A and let X be a basis for X . Then there is a unique matrix L such that

AX = XL.

The matrix L is given by

L = XIAX, where XI is a matrix satisfying XIX = I.

If (λ, x) is an eigenpair of A with x ∈ X , then (λ, XIx)is an eigenpair of L. Conversely, if (λ, u) is an eigenpair of L, then (λ, Xu)is an eigenpair of A.

(5)

Proof: Let

X = [x1· · · xk] and Y = AX = [y1· · · yk] .

Since yi ∈ X and X is a basis for X , there is a unique vector `i such that

yi = X`i.

If we set L = [`1· · · `k], then AX = XL and L = XIXL = XIAX.

Now let (λ, x) be an eigenpair of A with x ∈ X . Then there is a unique vector u such that x = Xu. However, u = XIx. Hence

λx = Ax = AXu = XLu ⇒ λu = λXIx = Lu.

Conversely, if Lu = λu, then

A(Xu) = (AX)u = (XL)u = X(Lu) = λ(Xu), so that (λ, Xu) is an eigenpair of A.

(6)

Definition

Let A be of order n. For X ∈ Cn×kand L ∈ Ck×k, we say that (L, X)is an eigenpair of order k or right eigenpair of order k of Aif

1. X is of full rank, 2. AX = XL.

The matrices X and L are called eigenbasis and eigenblock, respectively. If X is orthonormal, we say that the eigenpair (L, X)is orthonormal.

If Y ∈ Cn×k has linearly independent columns and

YHA = LYH, we say that (L, Y ) is a left eigenpair of order k of A.

(7)

Question

How eigenpairs transform under change of basis and similarities?

Theorem

Let (L, X) be an eigenpair of A. If U is nonsingular, then the pair (U−1LU, XU )is also eigenpair of A. If W is nonsingular, then (L, W−1X)is an eigenpair of W−1AW.

proof:

A(XU ) = (AX)U = (XL)U = (XU )(U−1LU ), (W−1AW )(W−1X) = W−1AX = (W−1X)L.

The eigenvalues of L of an eigenspace with respect to a basis are independent of the choices of the basis.

(8)

Theorem

Let L = {λ1, . . . , λk} ⊂ Λ(A) be a multisubset of the

eigenvalues of A. Then there is an eigenspace X of A whose eigenvalues are λ1, . . . , λk.

Proof: Let

A[ U1 U2 ] = [ U1 U2 ]

 T11 T12 0 T22



be a partitioned Schur decomposition of A in which T11is of order k and has the members of L on its diagonal. Then

AU1= U1T11.

Hence the column space of U1is an eigenspace of A whose eigenvalues are the members of L.

(9)

Definition

An eigenvalue whose geometric multiplicity is less than its algebraic multiplicity is defective.

Definition

Let X be an eigenspace of A with eigenvalues L. Then X is a simple eigenspace of A if

L ∩ [Λ(A) \ L] = ∅.

In the other words, an eigenspace is simple if its eigenvalues are disjoint from the other eigenvalues of A.

(10)

Theorem

Let (L1, X1)be a simple orthonormal eigenpairs of A and let (X1, Y2)be unitary so that

 X1H Y2H

 A

X1 Y2  =

 L1 H 0 L2

 .

Then there is a matrix Q satisfying the Sylvester equation L1Q − QL2= −H

such that if we set X =

X1 X2

 and Y =

Y1 Y2  , where

X2 = Y2+ X1Q and Y1 = X1− Y2QH,

(11)

then

YHX = I and YHAX =diag(L1, L2).

Proof: Since (L1, X1)is a simple eigenpairs of A, it implies that Λ(L1) ∩ λ(L2) = ∅.

By Theorem 1.18 in Chapter 1, there is a unique matrix Q satisfying

L1Q − QL2= −H such that

 I −Q

0 I

  L1 H 0 L2

  I Q 0 I



=diag(L1, L2).

That is

 I −Q

0 I

  X1H Y2H

 A

X1 Y2



 I Q 0 I



=diag(L1, L2).

(12)

Therefore,

 X1H− QY2H Y2H

 A

X1 X1Q + Y2  = diag(L1, L2).

Observations

1 X and Y are said to be biorthogonal.

2 Since A

X1 X2  =  X1 X2  diag(L1, L2), we see that

AX2 = X2L2,

so that (L2, X2)is an eigenpair of A. Likewise (L1, Y1)is a left eigenpair of A.

(13)

Let x and y be nonzero vectors. Then the angle ∠(x, y) of x and y is defined as

cos ∠(x, y) = |xHy|

k x k2k y k2.

Extend this definition to subspaces in Cn. Let X and Y be subspaces of the same dimension. Let X and Y be orthonormal bases for X and Y, respectively, and define C = YHX. We have

k C k2≤k X k2k Y k2= 1.

Hence all the singular value of C lie in [0, 1] and can be regarded as cosine of angles.

(14)

Definition

Let X and Y be subspaces of Cnof dimension p and let X and Y be orthonormal bases for X and Y, respectively. Then the canonical angles between X and Y are

θi(X , Y) = cos−1γi, (1) with

θ1(X , Y) ≥ θ2(X , Y) ≥ · · · ≥ θp(X , Y), where γi are the singular values of YHX.

(15)

If the canonical angle is small, then the computation of (1) will give inaccurate results.

For small θ, cos(θ) ∼= 1 −12θ2. If θ ≤ 10−8, then cos(θ) will evaluate to 1 in IEEE double-precision arithmetic, and we will conclude that θ = 0.

The cure for this problem is to compute the sine of the canonical angles.

Theorem

Let X and Y be orthonormal bases for X and Y, and let Ybe an orthonormal basis for the orthogonal complement of Y.

Then the singular values of YHX are the sines of the canonical angles between X and Y.

(16)

Proof: Let

 YH YH

 X =

 C S

 . By the orthonormality, we have

I = CHC + SHS.

Let

VH(CHC)V = Γ2 ≡ diag(γ12, · · · , γp2)

be the spectral decomposition of CHC. Then by the definition of canonical angle θiin (1), we have

θi= cos−1γi. But

I = VH(CHC + SHS)V = Γ2+ VH(SHS)V ≡ Γ2+ Σ2.

(17)

It follows that

Σ2 ≡ diag(σ21, · · · , σ2p) =diag(1 − γ21, · · · , 1 − γp2), where σi are singular values of S = YHX. Therefore,

σ2i = 1 − γi2= 1 − cos2θi = sin2θi ⇒ θi= sin−1σi.

Theorem

Let x be a vector with k x k2= 1and let Y be a subspace. Then sin ∠(x, Y) = min

y∈Y k x − y k2.

(18)

Proof: Let (Y, Y)be unitary with R(Y ) = Y. Let y ∈ Y, then

 YH YH



(x − y) =

 xˆ ˆ x



 yˆ 0



=

 x − ˆˆ y ˆ x

 . It implies that

k x − y k2=

 YH YH



(x − y) 2

=

 x − ˆˆ y ˆ x

 2

and hence

miny∈Ykx − yk = kˆxk2= kYHxk2. (2) By Theorem 10 and (2), we have

sin ∠(x, Y) = kYHxk2= min

y∈Ykx − yk.

(19)

Theorem

Let X and Y be orthonormal matrices with XHY = 0and let Z = X + Y Q. Let

σ1≥ σ2 ≥ · · · ≥ σk> 0 and ζ1 ≥ ζ2 ≥ · · · ≥ ζk> 0 denote the nonzero singular values of Z and Q, respectively.

Set

θ1 ≥ θ2≥ · · · ≥ θk

to be the nonzero canonical angle between R(X) and R(Z).

Then

σi = sec θi and ζi= tan θi, for i = 1, . . . , k.

(20)

Proof: Since

XHX = I, YHY = I, XHY = 0 and Z = X + Y Q, we have

ZHZ = (XH + QHYH)(X + Y Q) = I + QHQ.

This implies that

σi2 = 1 + ζi2, for i = 1, . . . , k. (3) Define

Z ≡ Z(Zˆ HZ)−1/2= (X + Y Q)(I + QHQ)−1/2, where (I + QHQ)−1/2is the inverse of the positive definite square root of I + QHQ. Then ˆZis an orthonormal basis for R(Z) and

XHZ = (I + QQˆ H)−1/2.

(21)

Hence the singular values γi of XHZˆare γi =

q 1 + ζi2

−1

for i = 1, . . . , k. Using (3) and the definition of canonical angles θi between R(X) and R(Z), we have

cos θi = γi= σi−1. That is

σi= 1 cos θi

= sec θi. The relation tan θi= ζinow follows from (3).

(22)

Let (L1, X1)be a simple right orthonormal eigenpair of A and let (X1, Y2)be unitary. From Theorem 8, (L1, Y1≡ X1− Y2QH) is left eigenpair of A and Y1HX1= I. By Theorem 12, we obtain the following corollary.

Corollary

Let X be an orthonormal basis for a simple eigenspace X of A and let Y be a basis for the corresponding left eigenspace Y of Anormalized so that YHX = I. Then the singular values of Y are the secants of the canonical angles between X and Y. In particular,

kY k2= sec θ1(X , Y).

(23)

Theorem

Let [X X]be unitary. Let R = AX − XL and

SH = XHA − LXH. Then kRk and kSk are minimized when L = XHAX,

in which case

kRk = kXHAXk and kSk = kXHAXk.

Proof: Set

 XH XH

 A

X X  =

 Lˆ H

G M

 . Then

 XH XH

 R =

 Lˆ H

G M

  XH XH

 X −

 XH XH

 XL =

 L − Lˆ G

 .

(24)

It implies that kRk =

 XH XH

 R

=

 L − Lˆ G

 ,

which is minimized when L = XHAXand min kRk = kGk = kXHAXk.

The proof for S is similar.

Definition

Let X be of full column rank and let XIbe a left inverse of X.

Then XIAX is a Rayleigh quotient of A.

(25)

Theorem

Let X be orthonormal and let

R = AX − XL.

Let `1, . . . , `kbe the eigenvalues of L. Then there are eigenvalues λj1, . . . , λjk of A such that

|`i− λji| ≤ kRk2 and

v u u t

k

X

i=1

(`i− λji)2≤√

2kRkF.

(26)

Power method: compute the dominant eigenpair Disadvantage: at each step it considers only the single vector Aku, which amounts to throwing away the information contained in u, Au, A2u, . . . , Ak−1u.

Definition

Let A be of order n and let u 6= 0 be an n vector. Then {u, Au, A2u, A3u, . . .}

is a Krylov sequence based on A and u. We call the matrix Kk(A, u) =

u Au A2u · · · Ak−1u  the kth Krylov matrix. The space

Kk(A, u) = R[Kk(A, u)]

is called the kth Krylov subspace.

(27)

Theorem

Let A and u 6= 0 be given. Then

1 The sequence of Krylov subspaces satisfies

Kk(A, u) ⊂ Kk+1(A, u), AKk(A, u) ⊂ Kk+1(A, u).

2 If σ 6= 0, then

Kk(A, u) = Kk(σA, u) = Kk(A, σu).

3 For any κ,

Kk(A, u) = Kk(A − κI, u).

4 If W is nonsingular, then

Kk(W−1AW, W−1u) = W−1Kk(A, u).

(28)

A Krylov sequence terminates at ` if ` is the smallest integer such that

K`+1(A, u) = K`(A, u).

Theorem

A Krylov sequence terminates based on A and u at ` if and only if ` is the smallest integer for which

dim[K`+1] =dim[K`].

If the Krylov sequence terminates at `, then K`is an

eigenspace of A of dimension `. On the other hand, if u lies in an eigenspace of dimension m, then for some ` ≤ m, the sequence terminates at `.

(29)

Proof:

If K`+1 = K`, then dim[K`+1] =dim[K`]. On the other hand, if dim[K`+1] =dim[K`], then K`+1 = K`because K`⊂ K`+1. If the sequence terminates at `, then

AK` ⊂ K`+1= K`, so that K` is an invariant subspace of A.

Let X be an invariant subspace with dimension m. If u ∈ X, then Aiu ∈ X for all i. That is Ki⊂ X and

dim(Ki) ≤ mfor all i. If the sequence terminates at ` > m, then K` is an invariant subspace and dim(K`) >

dim(Km) =dim(X ), which is impossible.

(30)

By the definition of Kk(A, u), for any vector v ∈ Kk(A, u)can be written in the form

v = γ1u + γ2Au + · · · + γkAk−1u ≡ p(A)u, where

p(A) = γ1I + γ2A + γ3A2+ · · · + γkAk−1.

Assume that A is Hermitian and has an orthonormal eigenpairs (λi, xi)for i = 1, . . . , n. Write u in the form

u = α1x1+ α2x2+ · · · + αnxn, where αi = xHi u. Since p(A)xi = p(λi)xi, we have

p(A)u = α1p(λ1)x1+ α2p(λ2)x2+ · · · + αnp(λn)xn. (4) If p(λi)is large compared with p(λj)for j 6= i, then p(A)u is a good approximation to xi.

(31)

Theorem

If xHi u 6= 0and p(λi) 6= 0, then tan ∠(p(A)u, xi) ≤ max

j6=i

|p(λj)|

|p(λi)|tan ∠(u, xi).

Proof: From (4), we have

cos ∠(p(A)u, xi) = |xHi p(A)u|

kp(A)uk2kxik2 = |αip(λi)|

qPn

j=1jp(λj)|2 and

sin ∠(p(A)u, xi) = qP

j6=ijp(λj)|2 qPn

j=1jp(λj)|2

(32)

Hence

tan2∠(p(A)u, xi) = X

j6=i

jp(λj)|2

ip(λi)|2

≤ max

j6=i

|p(λj)|2

|p(λi)|2 X

j6=i

j|2

i|2

= max

j6=i

|p(λj)|2

|p(λi)|2 tan2∠(u, xi).

Assume that p(λi) = 1, then tan ∠(p(A)u, xi) ≤ max

j6=i,p(λi)=1|p(λj)| tan ∠(u, xi) ∀ p(A)u ∈ Kk. Hence

tan ∠(xi, Kk) ≤ min

deg(p)≤k−1,p(λi)=1max

j6=i |p(λj)| tan ∠(u, xi).

(33)

Assume that

λ1> λ2 ≥ · · · ≥ λn

and that our interest is in the eigenvector x1. Then tan ∠(x1, Kk) ≤ min

deg(p)≤k−1,p(λ1)=1 max

λ∈[λn2]|p(λ)| tan ∠(u, x1).

Question

How to compute

deg(p)≤k−1,p(λmin 1)=1 max

λ∈[λn2]|p(λ)|?

Definition

The Chebyshev polynomials are defined by ck(t) =

 cos(k cos−1t), |t| ≤ 1, cosh(k cosh−1t), |t| ≥ 1.

(34)

Theorem

(a) c0(t) = 1, c1(t) = tand

ck+1(t) = 2ck(t) − ck−1(t), k = 1, 2, . . . . (b) For |t| > 1,

ck(t) = (1 +p

t2− 1)k+ (1 +p

t2− 1)−k. (c) For t ∈ [−1, 1], |ck(t)| ≤ 1. Moreover, if

tik = cos(k − i)π

k , i = 0, 1, . . . , k, then

ck(tik) = (−1)k−i.

(35)

(d) For s > 1,

min

deg(p)≤k,p(s)=1max

t∈[0,1]

|p(t)| = 1

ck(s), (5) and the minimum is obtained only for p(t) = ck(t)/ck(s).

For applying (5), we define

λ = λ2+ (µ − 1)(λ2− λn)

to transform interval [λn, λ2]to [0, 1]. Then the value of µ at λ1 is µ1 = 1 + λ1− λ2

λ2− λn and

min

deg(p)≤k−1,p(λ1)=1 max

λ∈[λn2]

|p(λ)|

= min

deg(p)≤k−1,p(µ1)=1 max

µ∈[0,1]

|p(µ)| = 1 ck−11).

(36)

Theorem

Let the Hermitian matrix A have an orthonormal eigenpair (λi, xi)with

λ1> λ2 ≥ · · · ≥ λn. Let

η = λ1− λ2 λ2− λn. Then

tan ∠[x1, Kk(A, u)] ≤ tan ∠(x1, u) ck−1(1 + η)

= tan ∠(x1, u)

(1 +p

2η + η2)k−1+ (1 +p

2η + η2)1−k.

(37)

Remark

For k large, we have

tan ∠[x1, Kk(A, u)] . tan ∠(x1, u) (1 +p

2η + η2)k−1. For k large and if η is small, then the bound becomes

tan ∠[x1, Kk(A, u)] . tan ∠(x1, u) (1 +√

2η)k−1. Compare it with power method:

If |λ1| > |λ2| ≥ · · · ≥ |λn|, then the convergence of the power method is |λ21|k.

(38)

For example, let

λ1 = 1, λ2 = 0.95, λ3= 0.952, · · · , λ100 = 0.9599 be the eigenvalues of 100-by-100 matrix A. Then η = 0.0530and the bound on the convergence rate is 1/(1 +√

2η) = 0.7544. Thus the square root effect gives a great improvement over the rate of 0.95 for the power method.

Replaced A by −A, then the Krylov sequence converges to the eigenvector corresponding to the smallest

eigenvalue of A. However, the smallest eigenvalues of a matrix – particularly a positive definite matrix – often tend to cluster together, so that the bound will be unfavorable.

(39)

The hypothesis λ1> λ2 can be relaxed. Suppose that λ1= λ2 > λ3. Expand u in the form

u = α1x1+ α2x2+ α3x3+ · · · + αnxn. Then

Aku = λk11x1+ α2x2) + α3λk3x3+ · · · + αnλknxn. This shows that the spaces Kk(A, u)contain only approximations to α1x1+ α2x2.

(40)

Theorem

Let λ be a simple eigenvalue of A and let A =

x X 

 λ 0 0 L

  yH YH



= λxyH + XLYH be a spectral representation. Let

u = αx + Xa, where

α = yHu and a = YHu.

Then

sin ∠[x, Kk(A, u)] ≤ |α|−1 min

deg(p)≤k−1,p(λ)=1

kXp(L)ak2.

(41)

Proof: From Theorem 11,

sin ∠[x, Kk(A, u)] = |α|−1 min

y∈Kk(A,u)

kαx − yk2

= |α|−1 min

deg(p)≤k−1

kαx − p(A)uk2

≤ |α|−1 min

deg(p)≤k−1,p(λ)=1kαx − p(A)uk2. Since

p(λ) = 1 and AX = XL, we have

p(A)u = p(A)(αx + Xa) = αp(λ)x + Xp(L)a = αx + Xp(L)a.

Hence

sin ∠[x, Kk(A, u)] ≤ |α|−1 min

deg(p)≤k−1,p(λ)=1

kαx − (αx + Xp(L)a)k2

= |α|−1 min

deg(p)≤k−1,p(λ)=1kXp(L)ak2.

(42)

Let (λi, xi)be an eigenpair of A for i = 1, . . . , n. Write vector u in the form

u = α1x1+ α2x2+ · · · + αnxn. Assume that λ1is double, i.e., λ1 = λ2. Then

Aku = λk11x1+ α2x2) + λk3α3x3+ · · · + λknαnxn. Hence the Krylov sequence can only produce the

approximation α1x1+ α2x2 to a vector in the eigenspace of λ1. Let U be a matrix with linearly independent columns. Then the sequence

{U, AU, A2U, . . .}

is called a block Krylov sequence and the space Kk(A, U )is called the k-th block Krylov space.

(43)

Gaol: passing to block Krylov sequence improves the convergence bound.

Theorem

Let A be Hermitian and let (λi, xi)be a complete system (n eigenvectors are linearly independent) of orthonormal eigenpairs of A with

λ1≥ λ2 ≥ · · · ≥ λn,

and assume that the multiplicity of λ1 is not greater than m. If

B =

 xH1

... xHm

U is nonsingular and we set

v = U B−1e1,

(44)

then

tan ∠[x1, Kk(A, U )] ≤ tan ∠(x1, v) ck−1(1 + 2η)

= tan ∠(x1, v)

(1 + 2p

η + η2)k−1+ (1 + 2p

η + η2)1−k(6), where

η = λ1− λm+1 λm+1− λn.

Proof: Since v ∈ R(U ), we have Kk(A, v) ⊂ Kk(A, U ). By Theorem 11,

sin ∠[x1, Kk(A, U )] = min

y∈Kk(A,U )

kx1− yk2

≤ min

y∈Kk(A,v)

kx1− yk2 = sin ∠[x1, Kk(A, v)].

(45)

This implies that

∠[x1, Kk(A, U )] ≤ ∠[x1, Kk(A, v)].

By the definition of B, we have

 xH1

... xHn

U B−1 = I ⇒ xHi U B−1 = eTi for i = 1, . . . , m.

By the definition of v,

xHi v = xHi U B−1e1= 0 for i = 2, . . . , m.

On the other hand, (λi, xi)is an eigenpair of Hermitian A, i.e., xHi A = λixHi . Hence

xHi Ajv = λjixHi v = 0 for i = 2, . . . , m.

(46)

This implies that x2, . . . , xm are not contained in Kk(A, v). That is

Ajv = α1λj1x1+ αm+1λjm+1xm+1+ · · · + αnλjnxn

for j = 1, . . . , k − 1. We may now apply Theorem 23 to get (6).

(47)

Theorem

Let U be a subspace and let U be a basis for U . Let V be a left inverse of U and set

B = VHAU.

If X ⊂ U is an eigenspace of A, then there is an eigenpair (L, W )of B such that (L, U W ) is an eigenpair of A with R(U W ) = X .

Proof: Let (L, X) be an eigenpair of A and let X = U W . Then from the relation

AU W = U W L we obtain

BW = VHAU W = VHU W L = W L, so that (L, W ) is an eigenpair of B.

(48)

We can find exact eigenspaces contained in U by looking at eigenpairs of the Rayleigh quotient B.

Algorithm (Rayleigh-Ritz procedure)

1 Let U be a basis for U and let VH be a left inverse of U .

2 Form the Rayleigh quotient B = VHAU.

3 Let (M, W ) be a suitable eigenpair of B.

4 Return (M, U W ) as an approximate eigenpair of A.

(M, U W )is called aRitz pair. Written Ritz pair in the form (λ, U w), we will call λ aRitz valueand U w aRitz vector.

Two difficulties for Rayleigh-Ritz procedure: (i) how to choose the eigenpair (M, W ) in statement 3. (ii) no guarantee that the result approximates the desired eigenpair of A.

(49)

Example

Let A = diag(0, 1, −1) and suppose we interested in approximating the eigenpair (0, e1). Assume

U =

1 0

0 1/√ 2 0 1/√

2

.

Then

B = UHAU =

 0 0 0 0



and any nonzero vector p is an eigenvector of B. If we take p = [1, 1]T, then U p = [1, 1/√

2, 1/√

2]is an approximate eigenvector of A, which is completely wrong. Thus the method can fail, even though the space U contains the desired

eigenvector.

(50)

The matrices U and V in Algorithm 1 satisfy the condition VHU = Iand they can differ. Hence Algorithm 1 is called anoblique Rayleigh-Ritz method.

If the matrix U is taken to be orthonormal and V = U . In addition, W is taken to be orthonormal, so that ˆX ≡ U W is also orthonormal. We call this procedure theorthogonal Rayleigh-Rite method.

Theorem

Let (M, ˆX ≡ U W )be an orthogonal Rayleigh-Rite pair. Then R = A ˆX − ˆXM

is minimal in any unitarily invariant norm.

(51)

Proof: By Theorem 14, we need to show M = ˆXHA ˆX. Since (M, W )is an eigenpair of B and W is orthonormal, we have

M = WHBW and

HA ˆX = WHUHAU W = WHBW = M.

(52)

Let (λ, x) be the desired eigenpair of A and Uθ be an orthonormal basis for which θ = ∠(x, Uθ)is small.

Theorem Let

Bθ= UθHAUθ. Then there is a matrix Eθsatisfying

kEθk2 ≤ sin θ p1 − sin2θ

kAk2 such that λ is an eigenvalue of Bθ+ Eθ. Proof. Let (Uθ, U)be unitary and set

y = UθHx and z = UHx.

(53)

From Theorem 10, we have

kzk2= sin θ and kyk2 =p

1 − sin2θ.

Since Ax − λx = 0, we have UθHA[Uθ, U]

 UθH UH



x − λUθHx = 0, or

Bθy + UθHAUz − λy = 0.

Let ˆy = y/kyk2 = y/p

1 − sin2θ. If r ≡ Bθy − λˆˆ y = −1

p1 − sin2θ

UθHAUz, it follows that

krk2≤ sin θ p1 − sin2θ

kAk2.

(54)

Now define

Eθ= −r ˆyH. Then

kEθk2 = q

λ1((r ˆyH)(ˆyrH)) = q

λ1(rrH) = krk2≤ sin θ p1 − sin2θ

kAk2 and

(Bθ+ Eθ)ˆy = Bθy − (r ˆˆ yH)ˆy = Bθy − r = λˆˆ y.

Therefore, (λ, ˆy)is an eigenpair of Bθ+ Eθ. Corollary

There is an eigenvalue µθof Bθsuch that

θ− λ| ≤ 4(2kAk2+ kEθk2)1−1/mkEθk1/m2 , where m is the order of Bθ.

(55)

Theorem

Let (µθ, wθ)be an eigenpair of Bθ and let

wθ Wθ  be unitary, so that

 wHθ WθH

 Bθ

 wθ Wθ  =

 µθ hHθ 0 Nθ

 . Then

sin ∠(x, Uθwθ) ≤ sin θ s

1 + khθk22 sep(λ, Nθ)2, where sep(λ, Nθ) = k(λI − Nθ)−1k−1.

(56)

By the continuity of sep, we have

|sep(λ, Nθ) −sep(µθ, Nθ)| ≤ |µθ− λ|

⇒ sep(λ, Nθ) ≥sep(µθ, Nθ) − |µθ− λ|.

Suppose µθ→ λ and sep(µθ, Nθ)is bounded below. Then sep(λ, Nθ)is also bounded below. Since khθk2 ≤ kAk2, we have sin ∠(x, Uθwθ) → 0along with θ.

Corollary

Let (µθ, Uθwθ)be a Ritz pair for which µθ→ λ. If there is a constant α > 0 such that

sep(µθ, Nθ) ≥ α > 0, (7) then

sin ∠(x, Uθwθ) . sin θ r

1 +kAk2 α2 .

(57)

This corollary justifies that eigenvalue convergence plus separation equals eigenvector convergence.

The condition (7) is called the uniform separation condition.

Let (µθ, xθ)with kxθk2 = 1be the Ritz approximation to (λ, x).

Then by construction, we have

µθ = xHθ Axθ. (8)

Write

xθ = γx + σy, (9)

where y⊥x and kyk2= 1. Then

|γ| = |xHxθ| = cos ∠(xθ, x) and |σ| = |yHxθ| = sin ∠(xθ, x).

If the uniform separation is satisfied, we have

|σ| = sin ∠(xθ, x) = O(θ).

(58)

Substituting (9) into (8) and using the facts of Ax = λx and yHx = 0, we find that

µθ = (¯γxH + ¯σyH)(γλx + σAy)

= |γ|2λ + σxHθ Ay.

Hence

θ− λ| = |(|γ|2− 1)λ + σxHθ Ay|

≤ |σ|2|λ| + |σ|kxθk2kAk2kyk2

≤ |σ|(1 + |σ|)kAk2

= O(θ).

Thus the Ritz value converges at least as fast as the eigenvector approximation of x in U (Uθ).

(59)

If A is Hermitian, then

µθ = (¯γxH + ¯σyH)(γλx + σAy)

= |γ|2λ + ¯rσxHAy + |σ|2yHAy = |γ|2λ + ¯rσ¯λxHy + |σ|2yHAy

= |γ|2λ + |σ|2yHAy and

θ− λ| = |(|γ|2− 1)λ + |σ|2yHAy|

≤ |σ|2|λ| + |σ|2kyk2kAk2kyk2

≤ 2|σ|2kAk2

= O(θ2).

Since the angle θ = ∠(x, Uθ)cannot be known and hence cannot compute error bounds. Thus, we must look to the residual as an indication of convergence.

(60)

Theorem

Let A have the spectral representation A = λxyH + XLYH,

where kxk2 = 1and Y is orthonormal. Let (µ, ˜x)be an approximation to (λ, x) and let

ρ = kA˜x − µ˜xk2. Then

sin ∠(˜x, x) ≤ ρ

sep(µ, L) ≤ ρ

sep(λ, L) − |µ − λ|.

(61)

Proof: Since YHx = 0, we have that Y is an orthonormal basis for the orthogonal complement of the space span{x} and then sin ∠(˜x, x) = kYHxk˜ 2. Let r = A˜x − µ˜x. Then

YHr = YHA˜x − µYHx = (L − µI)Y˜ Hx.˜ It follows that

sin ∠(˜x, x) = k(L − µI)−1YHrk2 ≤ krk2 sep(µ, L). By the fact that sep(µ, L) ≥ sep(λ, L) − |µ − λ|, the second inequality is obtained.

Since λ is assumed to be simple, this theorem says that:

Sufficient condition for ˜xto converge to x is for µ to converge to λand for the residual to converge to zero.

(62)

Definition

Let µθbe a Ritz value associated with Uθ. A refined Ritz vector is a solution of the problem

min kAˆxθ− µθθk2 subject to xˆθ ∈ Uθ, kˆxθk2 = 1.

Theorem

Let A have the spectral representation A = λxyH + XLYH,

where kxk2 = 1and Y is orthonormal. Let µθbe a Ritz value and ˆxθthe corresponding refined Ritz vector. If

sep(λ, L) − |µθ− λ| > 0, then

sin ∠(x, ˆxθ) ≤ kA − µθIk2sin θ + |λ − µθ| p1 − sin2θ[sep(λ, L) − |λ − µθ|].

(63)

Proof: Let U be an orthonormal basis for Uθ and let x = y + z, where z = U UHx. Then

kzk2 = kUHxk2 = sin θ.

Moreover, since y and x are orthogonal,

kzk22 = kx − yk22 = (xH− yH)(x − y)

= kxk22+ kyk22= 1 + kyk22

⇒ kyk22= 1 − kzk22= 1 − sin2θ.

Let

ˆ

y = y

p1 − sin2θ , we have

(A − µθI)ˆy = (A − µθI)y p1 − sin2θ

= (A − µθI)(x − z) p1 − sin2θ

= (λ − µθ)x − (A − µθI)z p1 − sin2θ

.

(64)

Hence

k(A − µθI)ˆyk2≤ |λ − µθ| + kA − µθIk sin θ p1 − sin2θ . By the definition of a refined Ritz vector we have

k(A − µθI)ˆxk2 ≤ |λ − µθ| + kA − µθIk sin θ p1 − sin2θ

. The result now follows from Theorem 35.

Remark

By Corollary 30, µθ→ λ. It follows that sin ∠(x, ˆxθ) → 0. In other words, refined Ritz vectors are guaranteed to

converge.

ˆ

µθ = ˆxHθ Aˆxθ is more accurate than µθ and kAˆxθ− ˆµθθk2 is optimal.

(65)

The computation of a refined Ritz vector amounts to solve min kAˆx − µˆxk2

subject to x ∈ U , kˆˆ xk2 = 1. (10) Let U be an orthonormal basis for U . Then (10) is equivalent to

min k(A − µI)U zk2 subject to kzk2 = 1.

The solution of this problem is the right singular vector of (A − µI)Ucorresponding to its smallest singular value. Thus refined Ritz vector can be computed by the following algorithm.

1 V = AU

2 W = V − µU

3 Compute the smallest singular value of W and its right singular vector z

4 x = U zˆ

(66)

Exterior eigenvalues are easily convergent than interior eigenvalues by Rayleigh quotient. The quality of the refined Ritz vector depends on the accuracy of the Ritz value µ and each refined Ritz vector must be calculated independently from its own distinct value of µ.

Definition

Let U be an orthonormal basis for subspace U . Then (κ + δ, U w)is aHarmonic Ritz pair with shift κif

UH(A − κI)H(A − κI)U w = δUH(A − κI)HU w. (11) Given shift κ, (11) is a generalized eigenvalue problem with eigenvalue δ.

Theorem

Let (λ, x) be an eigenpair of A with x = U w. Then (λ, U w) is a harmonic Ritz pair.

(67)

Proof: Since (λ, x) is an eigenpair of A with x = U w, we have Ax = λx ⇒ AU w = λU w.

It implies that

UH(A − κI)H(A − κI)U w = (λ − κ)UH(A − κI)HU w.

Taking eigenvalue δ = λ − κ, we obtain

UH(A − κI)H(A − κI)U w = δUH(A − κI)HU w.

That is (κ + δ, U w) = (λ, U w) is a harmonic Ritz pair.

(68)

Given a shift κ, if we want to compute the eigenvalue λ of A which is closest to κ, then we need to compute the eigenvalue δ of (11) such that |δ| is the smallest value of all of the absolute values for the eigenvalues of (11).

Expect

If x is approximately represented in U , then the harmonic Rayleigh-Ritz will produce an approximation to x.

Question

How to compute the eigenpair (δ, w) of (11)?

(69)

Let

(A − κI)U = QR

be the QR factorization of (A − κI)U . Then (11) can be rewritten as

RHRw = δRHQHU w.

That is

(QHU )w = δ−1Rw.

This eigenvalue can be solved by the QZ algorithm. The harmonic Ritz vector ˆx = U wand the corresponding harmonic Ritz value is µ = ˆxHAˆx.

[This function is named after the electrical engineer Oliver Heaviside (1850–1925) and can be used to describe an electric current that is switched on at time t = 0.] Its graph

Mean Value Theorem to F and G, we need to be sure the hypotheses of that result hold.... In order to apply

The new control is also similar to an R-format instruction, because we want to write the result of the ALU into a register (and thus MemtoReg = 0, RegWrite = 1) and of course we

Reading Task 6: Genre Structure and Language Features. • Now let’s look at how language features (e.g. sentence patterns) are connected to the structure

Show that the requirement in the definition of uniform continuity can be rephrased as follows, in terms of diameters of sets: To every  &gt; 0 there exists a δ &gt; 0 such that

Microphone and 600 ohm line conduits shall be mechanically and electrically connected to receptacle boxes and electrically grounded to the audio system ground point.. Lines in

The remaining positions contain //the rest of the original array elements //the rest of the original array elements.

Experiment a little with the Hello program. It will say that it has no clue what you mean by ouch. The exact wording of the error message is dependent on the compiler, but it might