師大

Tsung-Ming Huang

Department of Mathematics National Taiwan Normal University, Taiwan

April 8, 2009


### Outline

1. Eigenspaces
   - Definitions
   - Simple eigenspaces
2. Perturbation Theory
   - Canonical angles
   - Residual analysis
3. Krylov subspaces
   - Krylov sequences and Krylov spaces
   - Convergence
   - Block Krylov spaces
4. Rayleigh-Ritz Approximation
   - Rayleigh-Ritz methods
   - Convergence
   - Refined Ritz vectors
   - Harmonic Ritz vectors

### Definitions

Definition

Let A be of order n and let X be a subspace of C^{n}. Then X is an eigenspace or invariant subspace of A if

AX = {Ax : x ∈ X} ⊂ X.

If (λ, x) ≡ (α + ıβ, y + ız) is a complex eigenpair of a real matrix A, i.e.,

A(y + ız) = (α + ıβ)(y + ız) = (αy − βz) + ı(βy + αz)

⇒ Ay = αy − βz, Az = βy + αz,

then

A [y z] = [y z] [α β; −β α].

It implies that R([y z]) is an eigenspace of A.
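As a sanity check, the following sketch (a toy 3-by-3 matrix of my own choosing, not from the lecture) computes a complex eigenpair of a real matrix and verifies that A[y z] = [y z][α β; −β α]:

```python
import numpy as np

# A small real matrix with a complex eigenpair (illustrative example data).
A = np.array([[0.0, -2.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 3.0]])

lam, V = np.linalg.eig(A)
i = np.argmax(lam.imag)                 # pick an eigenvalue with positive imaginary part
x = V[:, i]                             # complex eigenvector x = y + i z
Y = np.column_stack([x.real, x.imag])   # basis [y z] of the candidate eigenspace

# A [y z] = [y z] [[alpha, beta], [-beta, alpha]] with lambda = alpha + i beta
alpha, beta = lam[i].real, lam[i].imag
M = np.array([[alpha, beta], [-beta, alpha]])
residual = np.linalg.norm(A @ Y - Y @ M)
print(residual)   # at roundoff level: R([y z]) is invariant under A
```

The residual is at the level of roundoff, confirming that the real and imaginary parts of the eigenvector span an invariant subspace.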

Theorem

Let X be an eigenspace of A and let X be a basis for X. Then there is a unique matrix L such that

AX = XL.

The matrix L is given by

L = X^{I}AX,

where X^{I} is a matrix satisfying X^{I}X = I.

If (λ, x) is an eigenpair of A with x ∈ X, then (λ, X^{I}x) is an eigenpair of L. Conversely, if (λ, u) is an eigenpair of L, then (λ, Xu) is an eigenpair of A.

Proof: Let

X = [x_{1} · · · x_{k}] and Y = AX = [y_{1} · · · y_{k}].

Since y_{i} ∈ X and X is a basis for X, there is a unique vector ℓ_{i} such that

y_{i} = Xℓ_{i}.

If we set L = [ℓ_{1} · · · ℓ_{k}], then AX = XL and

L = X^{I}XL = X^{I}AX.

Now let (λ, x) be an eigenpair of A with x ∈ X. Then there is a unique vector u such that x = Xu. However, u = X^{I}x. Hence

λx = Ax = AXu = XLu ⇒ λu = λX^{I}x = Lu.

Conversely, if Lu = λu, then

A(Xu) = (AX)u = (XL)u = X(Lu) = λ(Xu),

so that (λ, Xu) is an eigenpair of A.
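The theorem is easy to check numerically. In the sketch below (an illustrative diagonal matrix of my own choosing), the left inverse X^{I} is taken to be the pseudoinverse, so L = X^{I}AX:

```python
import numpy as np

A = np.diag([5.0, 4.0, 1.0, 0.5])        # toy matrix; span{e1, e2} is an eigenspace
X = np.array([[1.0,  1.0],
              [1.0, -1.0],
              [0.0,  0.0],
              [0.0,  0.0]])               # a (non-orthonormal) basis of that eigenspace

X_I = np.linalg.pinv(X)                   # a left inverse: X_I @ X = I
L = X_I @ A @ X                           # the unique representation L = X^I A X
res = np.linalg.norm(A @ X - X @ L)
eigs_L = sorted(np.linalg.eigvals(L).real)
print(res, eigs_L)                        # res ~ 0; eigenvalues of L are {4, 5}
```

The eigenvalues of L are exactly the eigenvalues of A associated with the eigenspace, as the theorem asserts.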

Definition

Let A be of order n. For X ∈ C^{n×k} and L ∈ C^{k×k}, we say that (L, X) is an eigenpair of order k, or right eigenpair of order k, of A if

1. X is of full rank,
2. AX = XL.

The matrices X and L are called the eigenbasis and eigenblock, respectively. If X is orthonormal, we say that the eigenpair (L, X) is orthonormal.

If Y ∈ C^{n×k} has linearly independent columns and Y^{H}A = LY^{H}, we say that (L, Y) is a left eigenpair of order k of A.

Question

How do eigenpairs transform under changes of basis and under similarities?

Theorem

Let (L, X) be an eigenpair of A. If U is nonsingular, then the pair (U^{−1}LU, XU) is also an eigenpair of A. If W is nonsingular, then (L, W^{−1}X) is an eigenpair of W^{−1}AW.

Proof:

A(XU) = (AX)U = (XL)U = (XU)(U^{−1}LU),
(W^{−1}AW)(W^{−1}X) = W^{−1}AX = (W^{−1}X)L.

In particular, the eigenvalues of the matrix L representing an eigenspace with respect to a basis are independent of the choice of basis.

Theorem

Let L = {λ_{1}, . . . , λ_{k}} ⊂ Λ(A) be a multisubset of the eigenvalues of A. Then there is an eigenspace X of A whose eigenvalues are λ_{1}, . . . , λ_{k}.

Proof: Let

A [U_{1} U_{2}] = [U_{1} U_{2}] [T_{11} T_{12}; 0 T_{22}]

be a partitioned Schur decomposition of A in which T_{11} is of order k and has the members of L on its diagonal. Then

AU_{1} = U_{1}T_{11}.

Hence the column space of U_{1} is an eigenspace of A whose eigenvalues are the members of L.
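One way to realize this partitioned Schur decomposition in practice is SciPy's `schur` with a `sort` callable, which reorders the selected eigenvalues into T_{11}. The matrix below is a made-up example with eigenvalues {2, −1, 0.5}:

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(1)
Q0, _ = np.linalg.qr(rng.standard_normal((3, 3)))
B = np.array([[2.0,  1.0, 0.0],
              [0.0, -1.0, 1.0],
              [0.0,  0.0, 0.5]])
A = Q0 @ B @ Q0.T                  # eigenvalues exactly {2, -1, 0.5}

# Sort the Schur form so eigenvalues with |lambda| > 1 lead; U1 spans their eigenspace.
T, U, k = schur(A, output='complex', sort=lambda lam: abs(lam) > 1)
U1, T11 = U[:, :k], T[:k, :k]
res = np.linalg.norm(A @ U1 - U1 @ T11)
print(k, res)                      # k = 2, res ~ 0: A U1 = U1 T11
```

Because T is block upper triangular, A U_{1} = U_{1} T_{11} holds exactly (up to roundoff) for the leading k columns.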

### Simple eigenspaces

Definition

An eigenvalue whose geometric multiplicity is less than its algebraic multiplicity is defective.

Definition

Let X be an eigenspace of A with eigenvalues L. Then X is a simple eigenspace of A if

L ∩ [Λ(A) \ L] = ∅.

In other words, an eigenspace is simple if its eigenvalues are disjoint from the other eigenvalues of A.

Theorem

Let (L_{1}, X_{1}) be a simple orthonormal eigenpair of A and let [X_{1} Y_{2}] be unitary, so that

[X_{1}^{H}; Y_{2}^{H}] A [X_{1} Y_{2}] = [L_{1} H; 0 L_{2}].

Then there is a matrix Q satisfying the Sylvester equation

L_{1}Q − QL_{2} = −H

such that if we set X = [X_{1} X_{2}] and Y = [Y_{1} Y_{2}], where

X_{2} = Y_{2} + X_{1}Q and Y_{1} = X_{1} − Y_{2}Q^{H},

then

Y^{H}X = I and Y^{H}AX = diag(L_{1}, L_{2}).

Proof: Since (L_{1}, X_{1}) is a simple eigenpair of A, it implies that

Λ(L_{1}) ∩ Λ(L_{2}) = ∅.

By Theorem 1.18 in Chapter 1, there is a unique matrix Q satisfying

L_{1}Q − QL_{2} = −H

such that

[I −Q; 0 I] [L_{1} H; 0 L_{2}] [I Q; 0 I] = diag(L_{1}, L_{2}).

That is,

[I −Q; 0 I] [X_{1}^{H}; Y_{2}^{H}] A [X_{1} Y_{2}] [I Q; 0 I] = diag(L_{1}, L_{2}).

Therefore,

[X_{1}^{H} − QY_{2}^{H}; Y_{2}^{H}] A [X_{1}  X_{1}Q + Y_{2}] = diag(L_{1}, L_{2}).

Observations

1. X and Y are said to be biorthogonal.
2. Since A [X_{1} X_{2}] = [X_{1} X_{2}] diag(L_{1}, L_{2}), we see that AX_{2} = X_{2}L_{2}, so that (L_{2}, X_{2}) is an eigenpair of A. Likewise (L_{1}, Y_{1}) is a left eigenpair of A.
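The construction in the theorem can be reproduced numerically. `scipy.linalg.solve_sylvester` solves aX + Xb = q, so the Sylvester equation above maps to `solve_sylvester(L1, -L2, -H)`; the random test matrix below is illustrative only:

```python
import numpy as np
from scipy.linalg import schur, solve_sylvester

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
T, U = schur(A, output='complex')     # [X1 Y2] := U, with T = [[L1, H], [0, L2]]
k = 2
X1, Y2 = U[:, :k], U[:, k:]
L1, H, L2 = T[:k, :k], T[:k, k:], T[k:, k:]

# Solve the Sylvester equation L1 Q - Q L2 = -H.
Q = solve_sylvester(L1, -L2, -H)
X2 = Y2 + X1 @ Q
Y1 = X1 - Y2 @ Q.conj().T
X = np.hstack([X1, X2]); Y = np.hstack([Y1, Y2])

biorth = np.linalg.norm(Y.conj().T @ X - np.eye(5))
D = Y.conj().T @ A @ X
coupling = np.linalg.norm(D[:k, k:]) + np.linalg.norm(D[k:, :k])
print(biorth, coupling)               # both ~ 0
```

Both the biorthogonality Y^{H}X = I and the block diagonalization Y^{H}AX = diag(L_{1}, L_{2}) hold to roundoff.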

### Canonical angles

Let x and y be nonzero vectors. Then the angle ∠(x, y) between x and y is defined by

cos ∠(x, y) = |x^{H}y| / (‖x‖_{2}‖y‖_{2}).

We extend this definition to subspaces of C^{n}. Let X and Y be subspaces of the same dimension, let X and Y be orthonormal bases for X and Y, respectively, and define C = Y^{H}X. We have

‖C‖_{2} ≤ ‖X‖_{2}‖Y‖_{2} = 1.

Hence all the singular values of C lie in [0, 1] and can be regarded as cosines of angles.

Definition

Let X and Y be subspaces of C^{n} of dimension p and let X and Y be orthonormal bases for X and Y, respectively. Then the canonical angles between X and Y are

θ_{i}(X, Y) = cos^{−1} γ_{i}, (1)

with

θ_{1}(X, Y) ≥ θ_{2}(X, Y) ≥ · · · ≥ θ_{p}(X, Y),

where the γ_{i} are the singular values of Y^{H}X.
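A sketch of this definition in code (the random subspaces are my own choosing): the cosines are the singular values of Y^{H}X, and the result can be cross-checked against `scipy.linalg.subspace_angles`, which uses the numerically safer sine-based formulation discussed next:

```python
import numpy as np
from scipy.linalg import subspace_angles

rng = np.random.default_rng(3)
# Two random 2-dimensional subspaces of R^5, via orthonormal bases.
X, _ = np.linalg.qr(rng.standard_normal((5, 2)))
Y, _ = np.linalg.qr(rng.standard_normal((5, 2)))

# Canonical angles from the singular values of Y^H X (cosines) ...
gammas = np.linalg.svd(Y.T @ X, compute_uv=False)
thetas = np.arccos(np.clip(gammas, -1.0, 1.0))

# ... agree with SciPy's reference implementation.
ref = subspace_angles(X, Y)
err = np.linalg.norm(np.sort(thetas) - np.sort(ref))
print(err)   # small, up to roundoff in the cosine formulation
```

For well-separated subspaces the two computations agree; for tiny angles the cosine route loses accuracy, which motivates the sine formulation below.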

If the canonical angle is small, then computing it from (1) gives inaccurate results:

- For small θ, cos(θ) ≅ 1 − θ^{2}/2. If θ ≤ 10^{−8}, then cos(θ) evaluates to 1 in IEEE double-precision arithmetic, and we would conclude that θ = 0.
- The cure for this problem is to compute the sine of the canonical angles.

Theorem

Let X and Y be orthonormal bases for X and Y, and let Y_{⊥} be an orthonormal basis for the orthogonal complement of Y. Then the singular values of Y_{⊥}^{H}X are the sines of the canonical angles between X and Y.

Proof: Let

[Y^{H}; Y_{⊥}^{H}] X = [C; S].

By orthonormality, we have

I = C^{H}C + S^{H}S.

Let

V^{H}(C^{H}C)V = Γ^{2} ≡ diag(γ_{1}^{2}, · · · , γ_{p}^{2})

be the spectral decomposition of C^{H}C. Then by the definition of the canonical angles θ_{i} in (1), we have

θ_{i} = cos^{−1} γ_{i}.

But

I = V^{H}(C^{H}C + S^{H}S)V = Γ^{2} + V^{H}(S^{H}S)V ≡ Γ^{2} + Σ^{2}.

It follows that

Σ^{2} ≡ diag(σ_{1}^{2}, · · · , σ_{p}^{2}) = diag(1 − γ_{1}^{2}, · · · , 1 − γ_{p}^{2}),

where the σ_{i} are the singular values of S = Y_{⊥}^{H}X. Therefore,

σ_{i}^{2} = 1 − γ_{i}^{2} = 1 − cos^{2}θ_{i} = sin^{2}θ_{i} ⇒ θ_{i} = sin^{−1}σ_{i}.

Theorem

Let x be a vector with ‖x‖_{2} = 1 and let Y be a subspace. Then

sin ∠(x, Y) = min_{y∈Y} ‖x − y‖_{2}.

Proof: Let [Y Y_{⊥}] be unitary with R(Y) = Y, and write x̂ = Y^{H}x, x̂_{⊥} = Y_{⊥}^{H}x. For y ∈ Y,

[Y^{H}; Y_{⊥}^{H}](x − y) = [x̂; x̂_{⊥}] − [ŷ; 0] = [x̂ − ŷ; x̂_{⊥}], where ŷ = Y^{H}y.

It implies that

‖x − y‖_{2} = ‖[Y^{H}; Y_{⊥}^{H}](x − y)‖_{2} = ‖[x̂ − ŷ; x̂_{⊥}]‖_{2},

and hence

min_{y∈Y} ‖x − y‖_{2} = ‖x̂_{⊥}‖_{2} = ‖Y_{⊥}^{H}x‖_{2}. (2)

By Theorem 10 and (2), we have

sin ∠(x, Y) = ‖Y_{⊥}^{H}x‖_{2} = min_{y∈Y} ‖x − y‖_{2}.

Theorem

Let X and Y be orthonormal matrices with X^{H}Y = 0 and let Z = X + YQ. Let

σ_{1} ≥ σ_{2} ≥ · · · ≥ σ_{k} > 0 and ζ_{1} ≥ ζ_{2} ≥ · · · ≥ ζ_{k} > 0

denote the nonzero singular values of Z and Q, respectively. Set

θ_{1} ≥ θ_{2} ≥ · · · ≥ θ_{k}

to be the nonzero canonical angles between R(X) and R(Z). Then

σ_{i} = sec θ_{i} and ζ_{i} = tan θ_{i}, for i = 1, . . . , k.

Proof: Since

X^{H}X = I, Y^{H}Y = I, X^{H}Y = 0 and Z = X + YQ,

we have

Z^{H}Z = (X^{H} + Q^{H}Y^{H})(X + YQ) = I + Q^{H}Q.

This implies that

σ_{i}^{2} = 1 + ζ_{i}^{2}, for i = 1, . . . , k. (3)

Define

Ẑ ≡ Z(Z^{H}Z)^{−1/2} = (X + YQ)(I + Q^{H}Q)^{−1/2},

where (I + Q^{H}Q)^{−1/2} is the inverse of the positive definite square root of I + Q^{H}Q. Then Ẑ is an orthonormal basis for R(Z) and

X^{H}Ẑ = (I + Q^{H}Q)^{−1/2}.

Hence the singular values γ_{i} of X^{H}Ẑ are

γ_{i} = (1 + ζ_{i}^{2})^{−1/2}, for i = 1, . . . , k.

Using (3) and the definition of the canonical angles θ_{i} between R(X) and R(Z), we have

cos θ_{i} = γ_{i} = σ_{i}^{−1}.

That is,

σ_{i} = 1/cos θ_{i} = sec θ_{i}.

The relation tan θ_{i} = ζ_{i} now follows from (3).

Let (L_{1}, X_{1}) be a simple right orthonormal eigenpair of A and let [X_{1} Y_{2}] be unitary. From Theorem 8, (L_{1}, Y_{1} ≡ X_{1} − Y_{2}Q^{H}) is a left eigenpair of A and Y_{1}^{H}X_{1} = I. By Theorem 12, we obtain the following corollary.

Corollary

Let X be an orthonormal basis for a simple eigenspace X of A and let Y be a basis for the corresponding left eigenspace Y of A, normalized so that Y^{H}X = I. Then the singular values of Y are the secants of the canonical angles between X and Y. In particular,

‖Y‖_{2} = sec θ_{1}(X, Y).

### Residual analysis

Theorem

Let [X X_{⊥}] be unitary. Let R = AX − XL and S^{H} = X^{H}A − LX^{H}. Then ‖R‖ and ‖S‖ are minimized when

L = X^{H}AX,

in which case

‖R‖ = ‖X_{⊥}^{H}AX‖ and ‖S‖ = ‖X^{H}AX_{⊥}‖.

Proof: Set

[X^{H}; X_{⊥}^{H}] A [X X_{⊥}] = [L̂ H; G M].

Then

[X^{H}; X_{⊥}^{H}] R = [X^{H}; X_{⊥}^{H}] AX − [X^{H}; X_{⊥}^{H}] XL = [L̂ − L; G].

It implies that

‖R‖ = ‖[X^{H}; X_{⊥}^{H}] R‖ = ‖[L̂ − L; G]‖,

which is minimized when L = L̂ = X^{H}AX, and then

min ‖R‖ = ‖G‖ = ‖X_{⊥}^{H}AX‖.

The proof for S is similar.

Definition

Let X be of full column rank and let X^{I} be a left inverse of X. Then X^{I}AX is a Rayleigh quotient of A.
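The minimizing property of the Rayleigh quotient is easy to probe empirically. In this sketch (a random test matrix, Frobenius norm), no random perturbation of L = X^{H}AX yields a smaller residual:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 6))
X, _ = np.linalg.qr(rng.standard_normal((6, 2)))   # orthonormal X

L_opt = X.T @ A @ X                                # Rayleigh quotient
r_opt = np.linalg.norm(A @ X - X @ L_opt)          # Frobenius norm by default

# Any perturbation of L can only increase the residual.
worse = []
for _ in range(100):
    L = L_opt + 0.1 * rng.standard_normal((2, 2))
    worse.append(np.linalg.norm(A @ X - X @ L))
print(r_opt, min(worse))     # r_opt <= every perturbed residual
```

This is only an empirical spot check, of course; the theorem above is the actual proof.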

Theorem

Let X be orthonormal and let

R = AX − XL.

Let ℓ_{1}, . . . , ℓ_{k} be the eigenvalues of L. Then there are eigenvalues λ_{j_{1}}, . . . , λ_{j_{k}} of A such that

|ℓ_{i} − λ_{j_{i}}| ≤ ‖R‖_{2}

and

( Σ_{i=1}^{k} |ℓ_{i} − λ_{j_{i}}|^{2} )^{1/2} ≤ √2 ‖R‖_{F}.

### Krylov sequences and Krylov spaces

- Power method: computes the dominant eigenpair.
- Disadvantage: at each step it considers only the single vector A^{k}u, which amounts to throwing away the information contained in u, Au, A^{2}u, . . . , A^{k−1}u.

Definition

Let A be of order n and let u ≠ 0 be an n-vector. Then

{u, Au, A^{2}u, A^{3}u, . . .}

is a Krylov sequence based on A and u. We call the matrix

K_{k}(A, u) = [u Au A^{2}u · · · A^{k−1}u]

the kth Krylov matrix. The space

K_{k}(A, u) = R[K_{k}(A, u)]

is called the kth Krylov subspace.
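A minimal helper for forming the Krylov matrix (the random matrix and vector are illustrative):

```python
import numpy as np

def krylov_matrix(A, u, k):
    """Return the kth Krylov matrix K_k(A, u) = [u, Au, ..., A^{k-1}u]."""
    cols = [u]
    for _ in range(k - 1):
        cols.append(A @ cols[-1])       # next column is A times the previous one
    return np.column_stack(cols)

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 6))
u = rng.standard_normal(6)

K4 = krylov_matrix(A, u, 4)
print(K4.shape)          # (6, 4)
```

In exact arithmetic K_{k} has full rank until the sequence terminates; numerically its columns quickly align with the dominant eigenvector, which is why practical methods (Arnoldi, Lanczos) orthonormalize the basis as it is generated rather than forming K_{k} explicitly.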

Theorem

Let A and u ≠ 0 be given. Then

1. The sequence of Krylov subspaces satisfies

   K_{k}(A, u) ⊂ K_{k+1}(A, u), AK_{k}(A, u) ⊂ K_{k+1}(A, u).

2. If σ ≠ 0, then

   K_{k}(A, u) = K_{k}(σA, u) = K_{k}(A, σu).

3. For any κ,

   K_{k}(A, u) = K_{k}(A − κI, u).

4. If W is nonsingular, then

   K_{k}(W^{−1}AW, W^{−1}u) = W^{−1}K_{k}(A, u).

A Krylov sequence terminates at ℓ if ℓ is the smallest integer such that

K_{ℓ+1}(A, u) = K_{ℓ}(A, u).

Theorem

A Krylov sequence based on A and u terminates at ℓ if and only if ℓ is the smallest integer for which

dim[K_{ℓ+1}] = dim[K_{ℓ}].

If the Krylov sequence terminates at ℓ, then K_{ℓ} is an eigenspace of A of dimension ℓ. On the other hand, if u lies in an eigenspace of dimension m, then the sequence terminates at some ℓ ≤ m.

Proof:

- If K_{ℓ+1} = K_{ℓ}, then dim[K_{ℓ+1}] = dim[K_{ℓ}]. On the other hand, if dim[K_{ℓ+1}] = dim[K_{ℓ}], then K_{ℓ+1} = K_{ℓ} because K_{ℓ} ⊂ K_{ℓ+1}.
- If the sequence terminates at ℓ, then AK_{ℓ} ⊂ K_{ℓ+1} = K_{ℓ}, so that K_{ℓ} is an invariant subspace of A.
- Let X be an invariant subspace of dimension m. If u ∈ X, then A^{i}u ∈ X for all i. That is, K_{i} ⊂ X and dim(K_{i}) ≤ m for all i. If the sequence terminated at some ℓ > m, we would have dim(K_{ℓ}) = ℓ > m = dim(X), contradicting K_{ℓ} ⊂ X.

### Convergence

By the definition of K_{k}(A, u), any vector v ∈ K_{k}(A, u) can be written in the form

v = γ_{1}u + γ_{2}Au + · · · + γ_{k}A^{k−1}u ≡ p(A)u,

where

p(A) = γ_{1}I + γ_{2}A + γ_{3}A^{2} + · · · + γ_{k}A^{k−1}.

Assume that A is Hermitian and has orthonormal eigenpairs (λ_{i}, x_{i}) for i = 1, . . . , n. Write u in the form

u = α_{1}x_{1} + α_{2}x_{2} + · · · + α_{n}x_{n},

where α_{i} = x_{i}^{H}u. Since p(A)x_{i} = p(λ_{i})x_{i}, we have

p(A)u = α_{1}p(λ_{1})x_{1} + α_{2}p(λ_{2})x_{2} + · · · + α_{n}p(λ_{n})x_{n}. (4)

If p(λ_{i}) is large compared with p(λ_{j}) for j ≠ i, then p(A)u is a good approximation to x_{i}.

Theorem

If x_{i}^{H}u ≠ 0 and p(λ_{i}) ≠ 0, then

tan ∠(p(A)u, x_{i}) ≤ max_{j≠i} (|p(λ_{j})|/|p(λ_{i})|) tan ∠(u, x_{i}).

Proof: From (4), we have

cos ∠(p(A)u, x_{i}) = |x_{i}^{H}p(A)u| / (‖p(A)u‖_{2}‖x_{i}‖_{2}) = |α_{i}p(λ_{i})| / √(Σ_{j=1}^{n} |α_{j}p(λ_{j})|^{2})

and

sin ∠(p(A)u, x_{i}) = √(Σ_{j≠i} |α_{j}p(λ_{j})|^{2}) / √(Σ_{j=1}^{n} |α_{j}p(λ_{j})|^{2}).

Hence

tan^{2}∠(p(A)u, x_{i}) = Σ_{j≠i} |α_{j}p(λ_{j})|^{2} / |α_{i}p(λ_{i})|^{2}
≤ max_{j≠i} (|p(λ_{j})|^{2}/|p(λ_{i})|^{2}) Σ_{j≠i} |α_{j}|^{2}/|α_{i}|^{2}
= max_{j≠i} (|p(λ_{j})|^{2}/|p(λ_{i})|^{2}) tan^{2}∠(u, x_{i}).

If we normalize p so that p(λ_{i}) = 1, then

tan ∠(p(A)u, x_{i}) ≤ max_{j≠i} |p(λ_{j})| tan ∠(u, x_{i}) for all p(A)u ∈ K_{k}.

Hence

tan ∠(x_{i}, K_{k}) ≤ min_{deg(p)≤k−1, p(λ_{i})=1} max_{j≠i} |p(λ_{j})| tan ∠(u, x_{i}).

Assume that

λ_{1} > λ_{2} ≥ · · · ≥ λ_{n}

and that our interest is in the eigenvector x_{1}. Then

tan ∠(x_{1}, K_{k}) ≤ min_{deg(p)≤k−1, p(λ_{1})=1} max_{λ∈[λ_{n},λ_{2}]} |p(λ)| tan ∠(u, x_{1}).

Question

How do we compute min_{deg(p)≤k−1, p(λ_{1})=1} max_{λ∈[λ_{n},λ_{2}]} |p(λ)|?

Definition

The Chebyshev polynomials are defined by

c_{k}(t) = cos(k cos^{−1} t) for |t| ≤ 1, and c_{k}(t) = cosh(k cosh^{−1} t) for |t| ≥ 1.

Theorem

(a) c_{0}(t) = 1, c_{1}(t) = t, and

c_{k+1}(t) = 2t c_{k}(t) − c_{k−1}(t), k = 1, 2, . . . .

(b) For |t| > 1,

c_{k}(t) = [ (t + √(t^{2} − 1))^{k} + (t + √(t^{2} − 1))^{−k} ] / 2.

(c) For t ∈ [−1, 1], |c_{k}(t)| ≤ 1. Moreover, if

t_{ik} = cos((k − i)π/k), i = 0, 1, . . . , k,

then

c_{k}(t_{ik}) = (−1)^{k−i}.

(d) For s > 1,

min_{deg(p)≤k, p(s)=1} max_{t∈[0,1]} |p(t)| = 1/c_{k}(s), (5)

and the minimum is attained only for p(t) = c_{k}(t)/c_{k}(s).
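The recurrence in (a) and the closed form in (b) can be cross-checked numerically; this small sketch evaluates c_{k}(t) both ways for a sample t > 1 of my own choosing:

```python
import numpy as np

def cheb(k, t):
    """Evaluate the Chebyshev polynomial c_k(t) by the three-term recurrence."""
    c_prev, c = 1.0, t            # c_0 = 1, c_1 = t
    if k == 0:
        return c_prev
    for _ in range(k - 1):
        c_prev, c = c, 2.0 * t * c - c_prev
    return c

# For |t| > 1 the recurrence matches the closed form
# c_k(t) = ((t + sqrt(t^2 - 1))^k + (t + sqrt(t^2 - 1))^{-k}) / 2.
t = 1.3
s = t + np.sqrt(t**2 - 1.0)
closed = (s**7 + s**-7) / 2.0
err = abs(cheb(7, t) - closed)
print(cheb(7, t), err)    # err ~ 0
```

The rapid growth of c_{k}(t) for t > 1 is exactly what makes the minimax value 1/c_{k}(s) in (5) decay so quickly with k.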

To apply (5), we define

λ = λ_{2} + (µ − 1)(λ_{2} − λ_{n})

to transform the interval [λ_{n}, λ_{2}] to [0, 1]. Then the value of µ at λ_{1} is

µ_{1} = 1 + (λ_{1} − λ_{2})/(λ_{2} − λ_{n}),

and

min_{deg(p)≤k−1, p(λ_{1})=1} max_{λ∈[λ_{n},λ_{2}]} |p(λ)| = min_{deg(p)≤k−1, p(µ_{1})=1} max_{µ∈[0,1]} |p(µ)| = 1/c_{k−1}(µ_{1}).

Theorem

Let the Hermitian matrix A have orthonormal eigenpairs (λ_{i}, x_{i}) with

λ_{1} > λ_{2} ≥ · · · ≥ λ_{n}.

Let

η = (λ_{1} − λ_{2})/(λ_{2} − λ_{n}).

Then

tan ∠[x_{1}, K_{k}(A, u)] ≤ tan ∠(x_{1}, u)/c_{k−1}(1 + η) = 2 tan ∠(x_{1}, u) / [ (1 + η + √(2η + η^{2}))^{k−1} + (1 + η + √(2η + η^{2}))^{1−k} ].

Remark

- For k large, we have

  tan ∠[x_{1}, K_{k}(A, u)] ≲ 2 tan ∠(x_{1}, u) / (1 + η + √(2η + η^{2}))^{k−1}.

- For k large and η small, the bound becomes

  tan ∠[x_{1}, K_{k}(A, u)] ≲ 2 tan ∠(x_{1}, u) / (1 + √(2η))^{k−1}.

- Compare this with the power method: if |λ_{1}| > |λ_{2}| ≥ · · · ≥ |λ_{n}|, then the power method converges like |λ_{2}/λ_{1}|^{k}.

- For example, let λ_{1} = 1, λ_{2} = 0.95, λ_{3} = 0.95^{2}, · · · , λ_{100} = 0.95^{99} be the eigenvalues of a 100-by-100 matrix A. Then η = 0.0530 and the bound on the convergence rate is 1/(1 + √(2η)) = 0.7544. Thus the square-root effect gives a great improvement over the rate of 0.95 for the power method.
- Replacing A by −A, the Krylov sequence converges to the eigenvector corresponding to the smallest eigenvalue of A. However, the smallest eigenvalues of a matrix (particularly a positive definite matrix) often tend to cluster together, so the bound will be unfavorable.
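The numbers in this example are easy to reproduce; the rate below uses the small-η bound 1/(1 + √(2η)) from the remark above:

```python
import numpy as np

# Eigenvalues 1, 0.95, 0.95^2, ..., 0.95^99 from the example above.
lams = 0.95 ** np.arange(100)
eta = (lams[0] - lams[1]) / (lams[1] - lams[-1])
krylov_rate = 1.0 / (1.0 + np.sqrt(2.0 * eta))   # small-eta bound from the remark
power_rate = lams[1] / lams[0]
print(round(eta, 4), round(krylov_rate, 4), power_rate)   # 0.053 0.7544 0.95
```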

The hypothesis λ_{1} > λ_{2} can be relaxed. Suppose that λ_{1} = λ_{2} > λ_{3}. Expand u in the form

u = α_{1}x_{1} + α_{2}x_{2} + α_{3}x_{3} + · · · + α_{n}x_{n}.

Then

A^{k}u = λ_{1}^{k}(α_{1}x_{1} + α_{2}x_{2}) + α_{3}λ_{3}^{k}x_{3} + · · · + α_{n}λ_{n}^{k}x_{n}.

This shows that the spaces K_{k}(A, u) contain approximations only to α_{1}x_{1} + α_{2}x_{2}.

Theorem

Let λ be a simple eigenvalue of A and let

A = [x X] [λ 0; 0 L] [y^{H}; Y^{H}] = λxy^{H} + XLY^{H}

be a spectral representation. Let

u = αx + Xa, where α = y^{H}u and a = Y^{H}u.

Then

sin ∠[x, K_{k}(A, u)] ≤ |α|^{−1} min_{deg(p)≤k−1, p(λ)=1} ‖Xp(L)a‖_{2}.

Proof: From Theorem 11,

sin ∠[x, K_{k}(A, u)] = |α|^{−1} min_{y∈K_{k}(A,u)} ‖αx − y‖_{2}
= |α|^{−1} min_{deg(p)≤k−1} ‖αx − p(A)u‖_{2}
≤ |α|^{−1} min_{deg(p)≤k−1, p(λ)=1} ‖αx − p(A)u‖_{2}.

Since p(λ) = 1 and AX = XL, we have

p(A)u = p(A)(αx + Xa) = αp(λ)x + Xp(L)a = αx + Xp(L)a.

Hence

sin ∠[x, K_{k}(A, u)] ≤ |α|^{−1} min_{deg(p)≤k−1, p(λ)=1} ‖αx − (αx + Xp(L)a)‖_{2} = |α|^{−1} min_{deg(p)≤k−1, p(λ)=1} ‖Xp(L)a‖_{2}.

### Block Krylov spaces

Let (λ_{i}, x_{i}) be an eigenpair of A for i = 1, . . . , n. Write the vector u in the form

u = α_{1}x_{1} + α_{2}x_{2} + · · · + α_{n}x_{n}.

Assume that λ_{1} is double, i.e., λ_{1} = λ_{2}. Then

A^{k}u = λ_{1}^{k}(α_{1}x_{1} + α_{2}x_{2}) + λ_{3}^{k}α_{3}x_{3} + · · · + λ_{n}^{k}α_{n}x_{n}.

Hence the Krylov sequence can only produce the approximation α_{1}x_{1} + α_{2}x_{2} to a vector in the eigenspace of λ_{1}.

Let U be a matrix with linearly independent columns. Then the sequence

{U, AU, A^{2}U, . . .}

is called a block Krylov sequence, and the space K_{k}(A, U) is called the kth block Krylov space.

Goal: passing to a block Krylov sequence improves the convergence bound.

Theorem

Let A be Hermitian and let (λ_{i}, x_{i}) be a complete system of orthonormal eigenpairs of A (the n eigenvectors are linearly independent) with

λ_{1} ≥ λ_{2} ≥ · · · ≥ λ_{n},

and assume that the multiplicity of λ_{1} is not greater than m. If

B = [x_{1}^{H}; . . . ; x_{m}^{H}] U

is nonsingular and we set

v = UB^{−1}e_{1},

then

tan ∠[x_{1}, K_{k}(A, U)] ≤ tan ∠(x_{1}, v)/c_{k−1}(1 + 2η) = 2 tan ∠(x_{1}, v) / [ (1 + 2η + 2√(η + η^{2}))^{k−1} + (1 + 2η + 2√(η + η^{2}))^{1−k} ], (6)

where

η = (λ_{1} − λ_{m+1})/(λ_{m+1} − λ_{n}).

Proof: Since v ∈ R(U), we have K_{k}(A, v) ⊂ K_{k}(A, U). By Theorem 11,

sin ∠[x_{1}, K_{k}(A, U)] = min_{y∈K_{k}(A,U)} ‖x_{1} − y‖_{2} ≤ min_{y∈K_{k}(A,v)} ‖x_{1} − y‖_{2} = sin ∠[x_{1}, K_{k}(A, v)].

This implies that

∠[x_{1}, K_{k}(A, U)] ≤ ∠[x_{1}, K_{k}(A, v)].

By the definition of B, we have

[x_{1}^{H}; . . . ; x_{m}^{H}] UB^{−1} = I ⇒ x_{i}^{H}UB^{−1} = e_{i}^{T} for i = 1, . . . , m.

By the definition of v,

x_{i}^{H}v = x_{i}^{H}UB^{−1}e_{1} = 0 for i = 2, . . . , m.

On the other hand, (λ_{i}, x_{i}) is an eigenpair of the Hermitian matrix A, i.e., x_{i}^{H}A = λ_{i}x_{i}^{H}. Hence

x_{i}^{H}A^{j}v = λ_{i}^{j}x_{i}^{H}v = 0 for i = 2, . . . , m.

This implies that A^{j}v has no components along x_{2}, . . . , x_{m}. That is,

A^{j}v = α_{1}λ_{1}^{j}x_{1} + α_{m+1}λ_{m+1}^{j}x_{m+1} + · · · + α_{n}λ_{n}^{j}x_{n}

for j = 1, . . . , k − 1. We may now apply Theorem 23 to get (6).

### Rayleigh-Ritz methods

Theorem

Let U be a subspace and let U be a basis for U. Let V be a left inverse of U and set

B = V^{H}AU.

If X ⊂ U is an eigenspace of A, then there is an eigenpair (L, W) of B such that (L, UW) is an eigenpair of A with R(UW) = X.

Proof: Let (L, X) be an eigenpair of A with R(X) = X, and write X = UW. Then from the relation

AUW = UWL

we obtain

BW = V^{H}AUW = V^{H}UWL = WL,

so that (L, W) is an eigenpair of B.

We can find exact eigenspaces contained in U by looking at eigenpairs of the Rayleigh quotient B.

Algorithm (Rayleigh-Ritz procedure)

1. Let U be a basis for U and let V^{H} be a left inverse of U.
2. Form the Rayleigh quotient B = V^{H}AU.
3. Let (M, W) be a suitable eigenpair of B.
4. Return (M, UW) as an approximate eigenpair of A.

(M, UW) is called a Ritz pair. Writing a Ritz pair in the form (λ, Uw), we call λ a Ritz value and Uw a Ritz vector.

There are two difficulties with the Rayleigh-Ritz procedure: (i) how to choose the eigenpair (M, W) in statement 3; (ii) there is no guarantee that the result approximates the desired eigenpair of A.
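A minimal sketch of the orthogonal variant of this procedure (V = U with U orthonormal), applied to a synthetic symmetric matrix whose trial subspace is spanned by three exact eigenvectors, so the Ritz pairs come out exact:

```python
import numpy as np

def orthogonal_rayleigh_ritz(A, U):
    """Orthogonal Rayleigh-Ritz: U orthonormal, V = U; returns Ritz values/vectors."""
    B = U.conj().T @ A @ U              # Rayleigh quotient
    mu, W = np.linalg.eigh(B)           # assumes A (hence B) is Hermitian
    return mu, U @ W                    # Ritz values and Ritz vectors

rng = np.random.default_rng(6)
n = 20
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(np.arange(1.0, n + 1)) @ Q.T      # symmetric, eigenvalues 1..20

U = Q[:, :3]                # exact eigenvectors for eigenvalues 1, 2, 3
mu, X = orthogonal_rayleigh_ritz(A, U)
print(np.round(mu, 6))      # recovers 1, 2, 3 up to roundoff
```

Because the subspace is exactly invariant here, the Ritz residual A X − X diag(µ) vanishes; for an approximate subspace the residual measures the quality of the pairs.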

Example

Let A = diag(0, 1, −1) and suppose we are interested in approximating the eigenpair (0, e_{1}). Assume

U = [1 0; 0 1/√2; 0 1/√2].

Then

B = U^{H}AU = [0 0; 0 0]

and any nonzero vector p is an eigenvector of B. If we take p = [1, 1]^{T}, then Up = [1, 1/√2, 1/√2]^{T} is an approximate eigenvector of A, which is completely wrong. Thus the method can fail, even though the space U contains the desired eigenvector.
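This failure mode is easy to reproduce numerically:

```python
import numpy as np

A = np.diag([0.0, 1.0, -1.0])
U = np.array([[1.0, 0.0],
              [0.0, 1.0 / np.sqrt(2.0)],
              [0.0, 1.0 / np.sqrt(2.0)]])

B = U.T @ A @ U
print(B)                      # the zero matrix: every nonzero p is an "eigenvector"

p = np.array([1.0, 1.0])      # a perfectly valid eigenvector of B ...
ritz_vec = U @ p              # ... whose Ritz vector is far from the target e1
target = np.array([1.0, 0.0, 0.0])
cos_angle = abs(ritz_vec @ target) / np.linalg.norm(ritz_vec)
print(cos_angle)              # ~ 0.707, i.e. a 45-degree error
```

The eigenvector routine is free to return any basis of B's (here two-dimensional) eigenspace, which is exactly why the method can go wrong.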

- The matrices U and V in Algorithm 1 satisfy the condition V^{H}U = I and they can differ. Hence Algorithm 1 is called an oblique Rayleigh-Ritz method.
- If the matrix U is taken to be orthonormal and V = U, and in addition W is taken to be orthonormal, then X̂ ≡ UW is also orthonormal. We call this procedure the orthogonal Rayleigh-Ritz method.

Theorem

Let (M, X̂ ≡ UW) be an orthogonal Rayleigh-Ritz pair. Then

R = AX̂ − X̂M

is minimal in any unitarily invariant norm.

Proof: By Theorem 14, we need to show M = X̂^{H}AX̂. Since (M, W) is an eigenpair of B and W is orthonormal, we have

M = W^{H}BW

and

X̂^{H}AX̂ = W^{H}U^{H}AUW = W^{H}BW = M.

### Convergence

Let (λ, x) be the desired eigenpair of A and let U_{θ} be an orthonormal basis for which θ = ∠(x, U_{θ}) is small.

Theorem

Let B_{θ} = U_{θ}^{H}AU_{θ}. Then there is a matrix E_{θ} satisfying

‖E_{θ}‖_{2} ≤ (sin θ/√(1 − sin^{2}θ)) ‖A‖_{2}

such that λ is an eigenvalue of B_{θ} + E_{θ}.

Proof: Let [U_{θ} U_{⊥}] be unitary and set

y = U_{θ}^{H}x and z = U_{⊥}^{H}x.

From Theorem 10, we have

‖z‖_{2} = sin θ and ‖y‖_{2} = √(1 − sin^{2}θ).

Since Ax − λx = 0, we have

U_{θ}^{H}A [U_{θ} U_{⊥}] [U_{θ}^{H}; U_{⊥}^{H}] x − λU_{θ}^{H}x = 0,

or

B_{θ}y + U_{θ}^{H}AU_{⊥}z − λy = 0.

Let ŷ = y/‖y‖_{2} = y/√(1 − sin^{2}θ). If

r ≡ B_{θ}ŷ − λŷ = −(1/√(1 − sin^{2}θ)) U_{θ}^{H}AU_{⊥}z,

it follows that

‖r‖_{2} ≤ (sin θ/√(1 − sin^{2}θ)) ‖A‖_{2}.

Now define

E_{θ} = −rŷ^{H}.

Then

‖E_{θ}‖_{2} = √(λ_{1}((rŷ^{H})(ŷr^{H}))) = √(λ_{1}(rr^{H})) = ‖r‖_{2} ≤ (sin θ/√(1 − sin^{2}θ)) ‖A‖_{2}

and

(B_{θ} + E_{θ})ŷ = B_{θ}ŷ − (rŷ^{H})ŷ = B_{θ}ŷ − r = λŷ.

Therefore, (λ, ŷ) is an eigenpair of B_{θ} + E_{θ}.

Corollary

There is an eigenvalue µ_{θ} of B_{θ} such that

|µ_{θ} − λ| ≤ 4(2‖A‖_{2} + ‖E_{θ}‖_{2})^{1−1/m} ‖E_{θ}‖_{2}^{1/m},

where m is the order of B_{θ}.

Theorem

Let (µ_{θ}, w_{θ}) be an eigenpair of B_{θ} and let [w_{θ} W_{θ}] be unitary, so that

[w_{θ}^{H}; W_{θ}^{H}] B_{θ} [w_{θ} W_{θ}] = [µ_{θ} h_{θ}^{H}; 0 N_{θ}].

Then

sin ∠(x, U_{θ}w_{θ}) ≤ sin θ √(1 + ‖h_{θ}‖_{2}^{2}/sep(λ, N_{θ})^{2}),

where sep(λ, N_{θ}) = ‖(λI − N_{θ})^{−1}‖^{−1}.

By the continuity of sep, we have

|sep(λ, N_{θ}) − sep(µ_{θ}, N_{θ})| ≤ |µ_{θ} − λ| ⇒ sep(λ, N_{θ}) ≥ sep(µ_{θ}, N_{θ}) − |µ_{θ} − λ|.

Suppose µ_{θ} → λ and sep(µ_{θ}, N_{θ}) is bounded below. Then sep(λ, N_{θ}) is also bounded below. Since ‖h_{θ}‖_{2} ≤ ‖A‖_{2}, we have sin ∠(x, U_{θ}w_{θ}) → 0 along with θ.

Corollary

Let (µ_{θ}, U_{θ}w_{θ}) be a Ritz pair for which µ_{θ} → λ. If there is a constant α > 0 such that

sep(µ_{θ}, N_{θ}) ≥ α > 0, (7)

then

sin ∠(x, U_{θ}w_{θ}) ≲ sin θ √(1 + ‖A‖_{2}^{2}/α^{2}).

- This corollary says that eigenvalue convergence plus separation implies eigenvector convergence.
- The condition (7) is called the uniform separation condition.

Let (µ_{θ}, x_{θ}) with ‖x_{θ}‖_{2} = 1 be the Ritz approximation to (λ, x). Then by construction, we have

µ_{θ} = x_{θ}^{H}Ax_{θ}. (8)

Write

x_{θ} = γx + σy, (9)

where y ⊥ x and ‖y‖_{2} = 1. Then

|γ| = |x^{H}x_{θ}| = cos ∠(x_{θ}, x) and |σ| = |y^{H}x_{θ}| = sin ∠(x_{θ}, x).

If the uniform separation condition is satisfied, we have

|σ| = sin ∠(x_{θ}, x) = O(θ).

Substituting (9) into (8) and using the facts Ax = λx and y^{H}x = 0, we find that

µ_{θ} = (γ̄x^{H} + σ̄y^{H})(γλx + σAy) = |γ|^{2}λ + σx_{θ}^{H}Ay.

Hence

|µ_{θ} − λ| = |(|γ|^{2} − 1)λ + σx_{θ}^{H}Ay|
≤ |σ|^{2}|λ| + |σ|‖x_{θ}‖_{2}‖A‖_{2}‖y‖_{2}
≤ |σ|(1 + |σ|)‖A‖_{2}
= O(θ).

Thus the Ritz value converges at least as fast as the eigenvector approximation of x in U_{θ}.

If A is Hermitian, then

µ_{θ} = (γ̄x^{H} + σ̄y^{H})(γλx + σAy)
= |γ|^{2}λ + γ̄σx^{H}Ay + |σ|^{2}y^{H}Ay
= |γ|^{2}λ + γ̄σλx^{H}y + |σ|^{2}y^{H}Ay
= |γ|^{2}λ + |σ|^{2}y^{H}Ay

and

|µ_{θ} − λ| = |(|γ|^{2} − 1)λ + |σ|^{2}y^{H}Ay|
≤ |σ|^{2}|λ| + |σ|^{2}‖y‖_{2}‖A‖_{2}‖y‖_{2}
≤ 2|σ|^{2}‖A‖_{2}
= O(θ^{2}).

Since the angle θ = ∠(x, U_{θ}) cannot be known, we cannot compute these error bounds. Thus we must look to the residual as an indication of convergence.

Theorem

Let A have the spectral representation

A = λxy^{H} + XLY^{H},

where ‖x‖_{2} = 1 and Y is orthonormal. Let (µ, x̃) be an approximation to (λ, x) and let

ρ = ‖Ax̃ − µx̃‖_{2}.

Then

sin ∠(x̃, x) ≤ ρ/sep(µ, L) ≤ ρ/(sep(λ, L) − |µ − λ|).

Proof: Since Y^{H}x = 0, Y is an orthonormal basis for the orthogonal complement of span{x}, and hence sin ∠(x̃, x) = ‖Y^{H}x̃‖_{2}. Let r = Ax̃ − µx̃. Then

Y^{H}r = Y^{H}Ax̃ − µY^{H}x̃ = (L − µI)Y^{H}x̃.

It follows that

sin ∠(x̃, x) = ‖(L − µI)^{−1}Y^{H}r‖_{2} ≤ ‖r‖_{2}/sep(µ, L).

By the fact that sep(µ, L) ≥ sep(λ, L) − |µ − λ|, the second inequality is obtained.

Since λ is assumed to be simple, this theorem says: a sufficient condition for x̃ to converge to x is for µ to converge to λ and for the residual to converge to zero.

### Refined Ritz vectors

Definition

Let µ_{θ} be a Ritz value associated with U_{θ}. A refined Ritz vector is a solution of the problem

min ‖Ax̂_{θ} − µ_{θ}x̂_{θ}‖_{2} subject to x̂_{θ} ∈ U_{θ}, ‖x̂_{θ}‖_{2} = 1.

Theorem

Let A have the spectral representation

A = λxy^{H} + XLY^{H},

where ‖x‖_{2} = 1 and Y is orthonormal. Let µ_{θ} be a Ritz value and x̂_{θ} the corresponding refined Ritz vector. If

sep(λ, L) − |µ_{θ} − λ| > 0,

then

sin ∠(x, x̂_{θ}) ≤ (‖A − µ_{θ}I‖_{2} sin θ + |λ − µ_{θ}|) / (√(1 − sin^{2}θ) [sep(λ, L) − |λ − µ_{θ}|]).

Proof: Let U be an orthonormal basis for U_{θ} and write x = y + z, where y = UU^{H}x and z = (I − UU^{H})x. Then

‖z‖_{2} = sin θ.

Moreover, since y and z are orthogonal,

1 = ‖x‖_{2}^{2} = ‖y‖_{2}^{2} + ‖z‖_{2}^{2} ⇒ ‖y‖_{2}^{2} = 1 − sin^{2}θ.

Let

ŷ = y/√(1 − sin^{2}θ);

then ŷ ∈ U_{θ}, ‖ŷ‖_{2} = 1, and

(A − µ_{θ}I)ŷ = (A − µ_{θ}I)y/√(1 − sin^{2}θ) = (A − µ_{θ}I)(x − z)/√(1 − sin^{2}θ) = [(λ − µ_{θ})x − (A − µ_{θ}I)z]/√(1 − sin^{2}θ).

Hence

‖(A − µ_{θ}I)ŷ‖_{2} ≤ (|λ − µ_{θ}| + ‖A − µ_{θ}I‖_{2} sin θ)/√(1 − sin^{2}θ).

By the definition of a refined Ritz vector, we have

‖(A − µ_{θ}I)x̂_{θ}‖_{2} ≤ ‖(A − µ_{θ}I)ŷ‖_{2} ≤ (|λ − µ_{θ}| + ‖A − µ_{θ}I‖_{2} sin θ)/√(1 − sin^{2}θ).

The result now follows from Theorem 35.

Remark

- By Corollary 30, µ_{θ} → λ. It follows that sin ∠(x, x̂_{θ}) → 0. In other words, refined Ritz vectors are guaranteed to converge.
- µ̂_{θ} = x̂_{θ}^{H}Ax̂_{θ} is more accurate than µ_{θ}, and ‖Ax̂_{θ} − µ̂_{θ}x̂_{θ}‖_{2} is optimal.

The computation of a refined Ritz vector amounts to solving

min ‖Ax̂ − µx̂‖_{2} subject to x̂ ∈ U, ‖x̂‖_{2} = 1. (10)

Let U be an orthonormal basis for U. Then (10) is equivalent to

min ‖(A − µI)Uz‖_{2} subject to ‖z‖_{2} = 1.

The solution of this problem is the right singular vector of (A − µI)U corresponding to its smallest singular value. Thus a refined Ritz vector can be computed by the following algorithm:

1. V = AU
2. W = V − µU
3. Compute the smallest singular value of W and its right singular vector z
4. x̂ = Uz
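A direct transcription of this algorithm (the matrix and subspace below are synthetic; the subspace contains the target eigenvector exactly, so the refined Ritz vector is exact up to roundoff):

```python
import numpy as np

def refined_ritz_vector(A, U, mu):
    """Refined Ritz vector: minimize ||(A - mu I) U z||_2 over unit z, return U z."""
    W = A @ U - mu * U                       # W = (A - mu I) U
    _, _, Vh = np.linalg.svd(W)              # smallest singular value -> last row of Vh
    z = Vh[-1].conj()
    return U @ z

rng = np.random.default_rng(7)
n = 30
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(np.arange(1.0, n + 1)) @ Q.T     # symmetric, eigenvalues 1..30

x = Q[:, 0]                                      # eigenvector for lambda = 1
# Subspace containing x exactly, plus two noise directions.
U, _ = np.linalg.qr(np.column_stack([x, rng.standard_normal((n, 2))]))
xh = refined_ritz_vector(A, U, mu=1.0)
err = 1.0 - abs(xh @ x)                          # misalignment with the true eigenvector
print(err)                                       # ~ 0
```

Since U is orthonormal and z has unit norm, x̂ = Uz automatically satisfies ‖x̂‖_{2} = 1.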

### Harmonic Ritz vectors

- Exterior eigenvalues converge more readily than interior eigenvalues under the Rayleigh quotient.
- The quality of the refined Ritz vector depends on the accuracy of the Ritz value µ, and each refined Ritz vector must be calculated independently from its own distinct value of µ.

Definition

Let U be an orthonormal basis for the subspace U. Then (κ + δ, Uw) is a harmonic Ritz pair with shift κ if

U^{H}(A − κI)^{H}(A − κI)Uw = δU^{H}(A − κI)^{H}Uw. (11)

Given a shift κ, (11) is a generalized eigenvalue problem with eigenvalue δ.

Theorem

Let (λ, x) be an eigenpair of A with x = Uw. Then (λ, Uw) is a harmonic Ritz pair.

Proof: Since (λ, x) is an eigenpair of A with x = Uw, we have

Ax = λx ⇒ AUw = λUw.

It implies that

U^{H}(A − κI)^{H}(A − κI)Uw = (λ − κ)U^{H}(A − κI)^{H}Uw.

Taking the eigenvalue δ = λ − κ, we obtain

U^{H}(A − κI)^{H}(A − κI)Uw = δU^{H}(A − κI)^{H}Uw.

That is, (κ + δ, Uw) = (λ, Uw) is a harmonic Ritz pair.

Given a shift κ, if we want to compute the eigenvalue λ of A that is closest to κ, then we need the eigenvalue δ of (11) whose absolute value |δ| is smallest among all eigenvalues of (11).

Expectation

If x is well represented in U, then harmonic Rayleigh-Ritz will produce an approximation to x.

Question

How do we compute the eigenpairs (δ, w) of (11)?

Let

(A − κI)U = QR

be the QR factorization of (A − κI)U. Then (11) can be rewritten as

R^{H}Rw = δR^{H}Q^{H}Uw,

that is,

(Q^{H}U)w = δ^{−1}Rw.

This generalized eigenvalue problem can be solved by the QZ algorithm. The harmonic Ritz vector is x̂ = Uw, and the corresponding harmonic Ritz value is µ = x̂^{H}Ax̂.
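A sketch of this computation (the symmetric test matrix is synthetic, and the subspace is spanned by five exact eigenvectors so the harmonic Ritz values come out exact; SciPy's generalized `eig` uses the QZ algorithm mentioned above):

```python
import numpy as np
from scipy.linalg import qr, eig

rng = np.random.default_rng(8)
n, kappa = 30, 10.2
Q0, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q0 @ np.diag(np.arange(1.0, n + 1)) @ Q0.T    # symmetric, eigenvalues 1..30

# Subspace spanned by the eigenvectors for eigenvalues 10, 1, 30, 5, 20.
U = Q0[:, [9, 0, 29, 4, 19]]

Q, R = qr(A @ U - kappa * U, mode='economic')     # (A - kappa I) U = Q R
# Harmonic pairs satisfy R w = delta (Q^H U) w, i.e. (Q^H U) w = delta^{-1} R w.
deltas, W = eig(R, Q.conj().T @ U)
j = np.argmin(np.abs(deltas))                     # smallest |delta| -> eigenvalue nearest kappa
x_harm = U @ W[:, j].real
x_harm /= np.linalg.norm(x_harm)
mu = x_harm @ A @ x_harm                          # harmonic Ritz value, ~ 10.0
print(round(float(mu), 6))
```

With the shift κ = 10.2, the smallest |δ| is |10 − 10.2| = 0.2 and the harmonic pair recovers the eigenvalue 10, the one closest to the shift.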