師大
The QR algorithm
Tsung-Ming Huang
Department of Mathematics National Taiwan Normal University, Taiwan
December 3, 2008
Outline
1 The power and inverse power methods
    The inverse power method
2 The explicitly shifted QR algorithm
    The QR algorithm and the inverse power method
    The unshifted QR algorithm
    Hessenberg form
3 The implicitly shifted QR algorithm
    The implicit double shift
4 The generalized eigenvalue problem
    Real Schur and Hessenberg-triangular forms
    The doubly shifted QZ algorithm
The power and inverse power methods

Let $A$ be a nondefective matrix and $(\lambda_i, x_i)$ for $i = 1, \dots, n$ be a complete set of eigenpairs of $A$; that is, $\{x_1, \dots, x_n\}$ is linearly independent. Hence, for any $u_0 \neq 0$ there exist $\alpha_1, \dots, \alpha_n$ such that
$$u_0 = \alpha_1 x_1 + \cdots + \alpha_n x_n.$$
Now $A^k x_i = \lambda_i^k x_i$, so that
$$A^k u_0 = \alpha_1 \lambda_1^k x_1 + \cdots + \alpha_n \lambda_n^k x_n. \qquad (1)$$
If $|\lambda_1| > |\lambda_i|$ for $i \geq 2$ and $\alpha_1 \neq 0$, then
$$\frac{1}{\lambda_1^k} A^k u_0 = \alpha_1 x_1 + \alpha_2 \Big(\frac{\lambda_2}{\lambda_1}\Big)^k x_2 + \cdots + \alpha_n \Big(\frac{\lambda_n}{\lambda_1}\Big)^k x_n \to \alpha_1 x_1 \quad \text{as } k \to \infty.$$
Theorem
Let $A$ have a unique dominant eigenpair $(\lambda_1, x_1)$ with $x_1^* x_1 = 1$, and let $X = [x_1 \; X_2]$ be a nonsingular matrix with $X_2^* X_2 = I$ such that
$$X^{-1} A X = \begin{bmatrix} \lambda_1 & 0 \\ 0 & M \end{bmatrix}.$$
Let $u_0 \neq 0$ be decomposed as $u_0 = \alpha_1 x_1 + X_2 c_2$. Then
$$\sin \angle(x_1, A^k u_0) \leq \frac{|\lambda_1|^{-k} \|M^k\|_2 \|c_2/\alpha_1\|_2}{1 - |\lambda_1|^{-k} \|M^k\|_2 \|c_2/\alpha_1\|_2}.$$
In particular, for every $\varepsilon > 0$ there exists $\sigma$ such that
$$\sin \angle(x_1, A^k u_0) \leq \frac{\sigma\,[\rho(M)/|\lambda_1| + \varepsilon]^k}{1 - \sigma\,[\rho(M)/|\lambda_1| + \varepsilon]^k},$$
where $\rho(M)$ is the spectral radius of $M$.
Proof: Since
$$u_0 = \alpha_1 x_1 + X_2 c_2 = [x_1 \; X_2] \begin{bmatrix} \alpha_1 \\ c_2 \end{bmatrix} = X \begin{bmatrix} \alpha_1 \\ c_2 \end{bmatrix},$$
it follows that
$$X^{-1} A^k u_0 = X^{-1} A^k X \begin{bmatrix} \alpha_1 \\ c_2 \end{bmatrix} = (X^{-1} A X)(X^{-1} A X) \cdots (X^{-1} A X) \begin{bmatrix} \alpha_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} \lambda_1^k & 0 \\ 0 & M^k \end{bmatrix} \begin{bmatrix} \alpha_1 \\ c_2 \end{bmatrix}.$$
Hence,
$$A^k u_0 = X \begin{bmatrix} \lambda_1^k \alpha_1 \\ M^k c_2 \end{bmatrix} = \alpha_1 \lambda_1^k x_1 + X_2 M^k c_2.$$
Let the columns of $Y$ form an orthonormal basis for the subspace orthogonal to $x_1$. By Lemma 3.12 in Chapter 1, we have
$$\sin \angle(x_1, A^k u_0) = \frac{\|Y^* A^k u_0\|_2}{\|A^k u_0\|_2} = \frac{\|Y^* X_2 M^k c_2\|_2}{\|\alpha_1 \lambda_1^k x_1 + X_2 M^k c_2\|_2}.$$
From
$$\|Y^* X_2 M^k c_2\|_2 \leq \|M^k\|_2 \|c_2\|_2$$
and
$$\|\alpha_1 \lambda_1^k x_1 + X_2 M^k c_2\|_2 \geq |\alpha_1| |\lambda_1|^k - \|M^k\|_2 \|c_2\|_2,$$
we get
$$\sin \angle(x_1, A^k u_0) \leq \frac{|\lambda_1|^{-k} \|M^k\|_2 \|c_2/\alpha_1\|_2}{1 - |\lambda_1|^{-k} \|M^k\|_2 \|c_2/\alpha_1\|_2}.$$
By Theorem 2.9 in Chapter 1, for every $\varepsilon > 0$ there exists $\hat\sigma$ such that
$$\|M^k\|_2 \leq \hat\sigma\,(\rho(M) + \varepsilon)^k.$$
Take $\sigma = \hat\sigma \|c_2/\alpha_1\|_2$. Then for every $\varepsilon > 0$,
$$\sin \angle(x_1, A^k u_0) \leq \frac{\sigma\,[\rho(M)/|\lambda_1| + \varepsilon]^k}{1 - \sigma\,[\rho(M)/|\lambda_1| + \varepsilon]^k}.$$
The error in the eigenvector approximation converges to zero at an asymptotic rate of $[\rho(M)/|\lambda_1|]^k$.
If $A$ has a complete system of eigenvectors with
$$|\lambda_1| > |\lambda_2| \geq \cdots \geq |\lambda_n|,$$
then the convergence behaves as $|\lambda_2/\lambda_1|^k$.
Algorithm 1 (Power method with 2-norm)
Choose an initial $u \neq 0$ with $\|u\|_2 = 1$.
Iterate until convergence:
    Compute $v = Au$; $k = \|v\|_2$; $u := v/k$.

Theorem
The sequences defined by Algorithm 1 satisfy
$$\lim_{i \to \infty} k_i = |\lambda_1|, \qquad \lim_{i \to \infty} \varepsilon^i u_i = \frac{x_1}{\|x_1\|} \frac{\alpha_1}{|\alpha_1|}, \quad \text{where } \varepsilon = \frac{|\lambda_1|}{\lambda_1}.$$
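As a concrete illustration (not part of the original notes), Algorithm 1 can be sketched in a few lines of NumPy; the matrix $A$ and the iteration count are arbitrary choices for the example:

```python
import numpy as np

def power_method(A, u0, iters=200):
    """Power method with 2-norm normalization (a sketch of Algorithm 1)."""
    u = u0 / np.linalg.norm(u0)
    k = 0.0
    for _ in range(iters):
        v = A @ u
        k = np.linalg.norm(v)   # k_i -> |lambda_1|
        u = v / k               # u_i -> (up to phase) x_1 / ||x_1||
    return k, u

# Example: the dominant eigenvalue of this symmetric matrix is (7 + sqrt(5))/2.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
k, u = power_method(A, np.array([1.0, 1.0]))
```

Here $k_i \to |\lambda_1|$ and, since $\lambda_1 > 0$ for this example, the iterates $u_i$ converge to a unit eigenvector without the phase factor $\varepsilon^i$.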
Proof: It is obvious that
$$u_s = \frac{A^s u_0}{\|A^s u_0\|}, \qquad k_s = \frac{\|A^s u_0\|}{\|A^{s-1} u_0\|}. \qquad (2)$$
It follows from $\lambda_1^{-s} A^s u_0 \to \alpha_1 x_1$ that
$$|\lambda_1|^{-s} \|A^s u_0\| \to |\alpha_1| \|x_1\|, \qquad |\lambda_1|^{-s+1} \|A^{s-1} u_0\| \to |\alpha_1| \|x_1\|,$$
and then
$$|\lambda_1|^{-1} k_s = |\lambda_1|^{-1}\,\|A^s u_0\| / \|A^{s-1} u_0\| \to 1.$$
From (1) it now follows for $s \to \infty$ that
$$\varepsilon^s u_s = \varepsilon^s \frac{A^s u_0}{\|A^s u_0\|} = \frac{\alpha_1 x_1 + \sum_{i=2}^n \alpha_i (\lambda_i/\lambda_1)^s x_i}{\big\|\alpha_1 x_1 + \sum_{i=2}^n \alpha_i (\lambda_i/\lambda_1)^s x_i\big\|} \to \frac{\alpha_1 x_1}{\|\alpha_1 x_1\|} = \frac{x_1}{\|x_1\|} \frac{\alpha_1}{|\alpha_1|}.$$
Algorithm 2 (Power method with linear functional)
Choose an initial $u \neq 0$.
Iterate until convergence:
    Compute $v = Au$; $k = \ell(v)$; $u := v/k$,
where $\ell(v)$, e.g. $e_1^T v$ or $e_n^T v$, is a linear functional.

Theorem
Suppose $\ell(x_1) \neq 0$ and $\ell(v_i) \neq 0$ for $i = 1, 2, \dots$. Then
$$\lim_{i \to \infty} k_i = \lambda_1, \qquad \lim_{i \to \infty} u_i = \frac{x_1}{\ell(x_1)}.$$
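A minimal sketch of Algorithm 2 with the linear functional $\ell(v) = e_n^T v$; the matrix is the same illustrative example as before, not taken from the notes:

```python
import numpy as np

def power_method_lf(A, u0, ell, iters=200):
    """Power method normalized by a linear functional (a sketch of Algorithm 2).

    Unlike the 2-norm variant, k_i converges to lambda_1 itself,
    provided ell(x_1) != 0 and ell does not vanish on any iterate."""
    u = np.asarray(u0, dtype=float)
    k = 0.0
    for _ in range(iters):
        v = A @ u
        k = ell(v)    # k_i -> lambda_1
        u = v / k     # u_i -> x_1 / ell(x_1)
    return k, u

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
ell = lambda v: v[-1]   # ell(v) = e_n^T v
k, u = power_method_lf(A, np.array([1.0, 1.0]), ell)
```

By construction $\ell(u_i) = 1$ at every step, so the limit is the eigenvector scaled to $x_1/\ell(x_1)$.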
Proof: As above we show that
$$u_i = \frac{A^i u_0}{\ell(A^i u_0)}, \qquad k_i = \frac{\ell(A^i u_0)}{\ell(A^{i-1} u_0)}.$$
From (1) we get for $s \to \infty$
$$\lambda_1^{-s}\,\ell(A^s u_0) \to \alpha_1 \ell(x_1), \qquad \lambda_1^{-s+1}\,\ell(A^{s-1} u_0) \to \alpha_1 \ell(x_1),$$
thus $\lambda_1^{-1} k_s \to 1$. Similarly, for $i \to \infty$,
$$u_i = \frac{A^i u_0}{\ell(A^i u_0)} = \frac{\alpha_1 x_1 + \sum_{j=2}^n \alpha_j (\lambda_j/\lambda_1)^i x_j}{\ell\big(\alpha_1 x_1 + \sum_{j=2}^n \alpha_j (\lambda_j/\lambda_1)^i x_j\big)} \to \frac{\alpha_1 x_1}{\alpha_1 \ell(x_1)} = \frac{x_1}{\ell(x_1)}.$$
Note that
$$k_s = \frac{\ell(A^s u_0)}{\ell(A^{s-1} u_0)} = \lambda_1\, \frac{\alpha_1 \ell(x_1) + \sum_{j=2}^n \alpha_j (\lambda_j/\lambda_1)^s \ell(x_j)}{\alpha_1 \ell(x_1) + \sum_{j=2}^n \alpha_j (\lambda_j/\lambda_1)^{s-1} \ell(x_j)} = \lambda_1 + O\big(|\lambda_2/\lambda_1|^{s-1}\big).$$
That is, the convergence rate is $|\lambda_2/\lambda_1|$.
Theorem
Let $u \neq 0$ and for any $\mu$ set $r_\mu = Au - \mu u$. Then $\|r_\mu\|_2$ is minimized when
$$\mu = u^* A u / u^* u.$$
In this case $r_\mu \perp u$.

Proof: W.l.o.g. assume $\|u\|_2 = 1$. Let $[u \; U]$ be unitary and set
$$\begin{bmatrix} u^* \\ U^* \end{bmatrix} A\, [u \; U] \equiv \begin{bmatrix} \nu & h^* \\ g & B \end{bmatrix} = \begin{bmatrix} u^* A u & u^* A U \\ U^* A u & U^* A U \end{bmatrix}.$$
Then
$$\begin{bmatrix} u^* \\ U^* \end{bmatrix} r_\mu = \begin{bmatrix} u^* \\ U^* \end{bmatrix} A u - \mu \begin{bmatrix} u^* \\ U^* \end{bmatrix} u = \begin{bmatrix} u^* \\ U^* \end{bmatrix} A\, [u \; U] \begin{bmatrix} u^* \\ U^* \end{bmatrix} u - \mu \begin{bmatrix} u^* \\ U^* \end{bmatrix} u = \begin{bmatrix} \nu & h^* \\ g & B \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} - \mu \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} \nu - \mu \\ g \end{bmatrix}.$$
It follows that
$$\|r_\mu\|_2^2 = \left\| \begin{bmatrix} u^* \\ U^* \end{bmatrix} r_\mu \right\|_2^2 = \left\| \begin{bmatrix} \nu - \mu \\ g \end{bmatrix} \right\|_2^2 = |\nu - \mu|^2 + \|g\|_2^2.$$
Hence
$$\min_\mu \|r_\mu\|_2 = \|g\|_2 = \|r_\nu\|_2,$$
that is, the minimizer is $\mu = \nu = u^* A u$. On the other hand, since
$$u^* r_\mu = u^*(Au - \mu u) = u^* A u - \mu = 0,$$
it implies that $r_\mu \perp u$.

Definition (Rayleigh quotient)
Let $u$ and $v$ be vectors with $v^* u \neq 0$. Then $v^* A u / v^* u$ is called a Rayleigh quotient.
If $u$ or $v$ is an eigenvector corresponding to an eigenvalue $\lambda$ of $A$, then
$$\frac{v^* A u}{v^* u} = \frac{\lambda v^* u}{v^* u} = \lambda.$$
Therefore, $u_k^* A u_k / u_k^* u_k$ provides a sequence of approximations to $\lambda$ in the power method.
The inverse power method

Goal
Find the eigenvalue of $A$ that lies in a given region, or closest to a given scalar $\sigma$, together with the corresponding eigenvector.

Let $\lambda_1, \dots, \lambda_n$ be the eigenvalues of $A$. Suppose $\lambda_1$ is simple and $\sigma \approx \lambda_1$. Then
$$\mu_1 = \frac{1}{\lambda_1 - \sigma}, \quad \mu_2 = \frac{1}{\lambda_2 - \sigma}, \quad \dots, \quad \mu_n = \frac{1}{\lambda_n - \sigma}$$
are eigenvalues of $(A - \sigma I)^{-1}$, and $\mu_1 \to \infty$ as $\sigma \to \lambda_1$. Thus we transform $\lambda_1$ into a dominant eigenvalue $\mu_1$.
The inverse power method is simply the power method applied to $(A - \sigma I)^{-1}$.
Let
$$y = (A - \sigma I)^{-1} x \quad \text{and} \quad \hat x = y/\|y\|_2.$$
It holds that
$$(A - \sigma I)\hat x = \frac{x}{\|y\|_2} \equiv w.$$
Set
$$\rho = \hat x^*(A - \sigma I)\hat x = \hat x^* w.$$
Then
$$r = [A - (\sigma + \rho) I]\hat x = (A - \sigma I)\hat x - \rho \hat x = w - \rho \hat x.$$

Algorithm 3 (Inverse power method with a fixed shift)
Choose an initial $u_0 \neq 0$.
For $i = 0, 1, 2, \dots$
    Compute $v_{i+1} = (A - \sigma I)^{-1} u_i$ and $k_{i+1} = \ell(v_{i+1})$;
    Set $u_{i+1} = v_{i+1}/k_{i+1}$.
The convergence rate of Algorithm 3 is $|\lambda_1 - \sigma| / |\lambda_2 - \sigma|$, where $\lambda_1$ and $\lambda_2$ are the closest and the second closest eigenvalues to $\sigma$. Algorithm 3 is linearly convergent.

Algorithm 4 (Inverse power method with variable shifts)
Choose an initial $u_0 \neq 0$ and set $\sigma_0 = \sigma$.
For $i = 0, 1, 2, \dots$
    Compute $v_{i+1} = (A - \sigma_i I)^{-1} u_i$ and $k_{i+1} = \ell(v_{i+1})$;
    Set $u_{i+1} = v_{i+1}/k_{i+1}$ and $\sigma_{i+1} = \sigma_i + 1/k_{i+1}$.

This algorithm is locally quadratically convergent.
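A sketch of Algorithm 4 in NumPy (illustrative names and data; here $\ell(v) = e_1^T v$). Starting from $\sigma_0 = 4$, the shift converges to $(7+\sqrt 5)/2 \approx 4.618$, the eigenvalue of the example matrix closest to $4$:

```python
import numpy as np

def inverse_power_variable_shift(A, sigma, u0, ell, iters=6):
    """Inverse power method with variable shifts (a sketch of Algorithm 4).

    The shift update sigma_{i+1} = sigma_i + 1/k_{i+1} is what makes
    the iteration locally quadratically convergent."""
    n = A.shape[0]
    u = np.asarray(u0, dtype=float)
    for _ in range(iters):
        v = np.linalg.solve(A - sigma * np.eye(n), u)  # v_{i+1} = (A - sigma_i I)^{-1} u_i
        k = ell(v)                                     # k_{i+1} = ell(v_{i+1})
        u = v / k
        sigma = sigma + 1.0 / k                        # sigma_{i+1} = sigma_i + 1/k_{i+1}
    return sigma, u

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
sigma, u = inverse_power_variable_shift(A, 4.0, np.array([1.0, 1.0]), lambda v: v[0])
```

A handful of iterations suffices: the shift error roughly squares at every step.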
Connection with Newton's method
Consider the nonlinear equations
$$F\begin{pmatrix} u \\ \lambda \end{pmatrix} \equiv \begin{bmatrix} Au - \lambda u \\ \ell^T u - 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}. \qquad (3)$$
Newton's method for (3): for $i = 0, 1, 2, \dots$
$$\begin{bmatrix} u_{i+1} \\ \lambda_{i+1} \end{bmatrix} = \begin{bmatrix} u_i \\ \lambda_i \end{bmatrix} - F'\begin{pmatrix} u_i \\ \lambda_i \end{pmatrix}^{-1} F\begin{pmatrix} u_i \\ \lambda_i \end{pmatrix}.$$
Since
$$F'\begin{pmatrix} u \\ \lambda \end{pmatrix} = \begin{bmatrix} A - \lambda I & -u \\ \ell^T & 0 \end{bmatrix},$$
Newton's method can be rewritten componentwise as
$$(A - \lambda_i I) u_{i+1} = (\lambda_{i+1} - \lambda_i) u_i, \qquad (4)$$
$$\ell^T u_{i+1} = 1. \qquad (5)$$
Let
$$v_{i+1} = \frac{u_{i+1}}{\lambda_{i+1} - \lambda_i}.$$
Substituting $v_{i+1}$ into (4), we get
$$(A - \lambda_i I) v_{i+1} = u_i.$$
By equation (5), we have
$$k_{i+1} = \ell(v_{i+1}) = \frac{\ell(u_{i+1})}{\lambda_{i+1} - \lambda_i} = \frac{1}{\lambda_{i+1} - \lambda_i}.$$
It follows that
$$\lambda_{i+1} = \lambda_i + \frac{1}{k_{i+1}}.$$
Hence Newton's iterations (4) and (5) are identical with Algorithm 4.
Algorithm 5 (Inverse power method with Rayleigh quotient shift)
Choose an initial $u_0 \neq 0$ with $\|u_0\|_2 = 1$ and compute $\sigma_0 = u_0^T A u_0$.
For $i = 0, 1, 2, \dots$
    Compute $v_{i+1} = (A - \sigma_i I)^{-1} u_i$;
    Set $u_{i+1} = v_{i+1}/\|v_{i+1}\|_2$ and $\sigma_{i+1} = u_{i+1}^T A u_{i+1}$.

For symmetric $A$, Algorithm 5 is cubically convergent.
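A sketch of Algorithm 5 for a symmetric example; the matrix and starting vector are illustrative choices, not data from the notes. The eigenvalues of this tridiagonal matrix are $3$ and $3 \pm \sqrt 3$, and the iteration converges to $3 + \sqrt 3$:

```python
import numpy as np

def rayleigh_quotient_iteration(A, u0, iters=5):
    """Inverse power method with Rayleigh quotient shifts (a sketch of
    Algorithm 5). For symmetric A the iteration is cubically convergent,
    so a handful of iterations reaches machine precision."""
    n = A.shape[0]
    u = np.asarray(u0, dtype=float)
    u = u / np.linalg.norm(u)
    sigma = u @ A @ u                 # sigma_0 = u_0^T A u_0
    for _ in range(iters):
        v = np.linalg.solve(A - sigma * np.eye(n), u)
        u = v / np.linalg.norm(v)
        sigma = u @ A @ u             # Rayleigh quotient shift
    return sigma, u

# Eigenvalues of this matrix are 3 and 3 +- sqrt(3).
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
sigma, u = rayleigh_quotient_iteration(A, np.array([0.0, 0.0, 1.0]))
```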
The explicitly shifted QR algorithm

The QR algorithm is an iterative method for reducing a matrix $A$ to triangular form by unitary similarity transformations.

Algorithm (explicitly shifted QR algorithm)
Set $A_0 = A$.
For $k = 0, 1, 2, \dots$
    Choose a shift $\sigma_k$;
    Factor $A_k - \sigma_k I = Q_k R_k$, where $Q_k$ is orthogonal and $R_k$ is upper triangular;
    $A_{k+1} = R_k Q_k + \sigma_k I$;
end for
Since
$$A_k - \sigma_k I = Q_k R_k \;\Longrightarrow\; R_k = Q_k^*(A_k - \sigma_k I),$$
it holds that
$$A_{k+1} = R_k Q_k + \sigma_k I = Q_k^*(A_k - \sigma_k I) Q_k + \sigma_k I = Q_k^* A_k Q_k.$$
The algorithm is a variant of the power method.
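One step of the explicitly shifted QR algorithm is easy to express with a library QR factorization. The following sketch (illustrative example matrix, not from the notes) uses the Rayleigh quotient shift $\sigma_k = (A_k)_{nn}$ and shows the trailing subdiagonal entry being driven to zero while the eigenvalues are preserved:

```python
import numpy as np

def qr_step(A, sigma):
    """One explicitly shifted QR step: factor A - sigma*I = QR,
    then recombine A_next = RQ + sigma*I = Q^* A Q."""
    n = A.shape[0]
    Q, R = np.linalg.qr(A - sigma * np.eye(n))
    return R @ Q + sigma * np.eye(n)

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 1.0]])
Ak = A.copy()
for _ in range(30):
    Ak = qr_step(Ak, Ak[-1, -1])   # Rayleigh quotient shift
```

After the sweep, `Ak[-1, -1]` approximates an eigenvalue of $A$ and `Ak[-1, -2]` is negligible, so the problem deflates.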
The QR algorithm and the inverse power method

Let $Q = [\hat Q \; q]$ be unitary and write
$$Q^* A Q = \begin{bmatrix} \hat Q^* A \hat Q & \hat Q^* A q \\ q^* A \hat Q & q^* A q \end{bmatrix} \equiv \begin{bmatrix} \hat B & \hat h \\ \hat g^* & \hat\mu \end{bmatrix}.$$
If $(\lambda, q)$ is a left eigenpair of $A$, then
$$\hat g^* = q^* A \hat Q = \lambda q^* \hat Q = 0 \quad \text{and} \quad \hat\mu = q^* A q = \lambda q^* q = \lambda.$$
That is,
$$Q^* A Q = \begin{bmatrix} \hat B & \hat h \\ 0 & \lambda \end{bmatrix}.$$
But this is not an effective computational procedure because it requires $q$ to be an eigenvector of $A$.
Let $q$ be an approximate left eigenvector of $A$ with $q^* q = 1$, and set $\hat\mu = q^* A q$ and $r^* = q^* A - \hat\mu q^*$. Then
$$r^* [\hat Q \; q] = (q^* A - \hat\mu q^*) [\hat Q \; q] = \big[\, q^* A \hat Q - \hat\mu q^* \hat Q \;\;\; q^* A q - \hat\mu q^* q \,\big] = \big[\, q^* A \hat Q \;\;\; 0 \,\big] = \big[\, \hat g^* \;\;\; 0 \,\big].$$
Therefore,
$$\|\hat g\|_2 = \big\| r^* [\hat Q \; q] \big\|_2 = \|r\|_2.$$
The QR algorithm implicitly chooses $q$ to be a vector produced by the inverse power method with shift $\sigma$.
Write the QR factorization of $A - \sigma I$ as
$$\begin{bmatrix} \hat Q^* \\ q^* \end{bmatrix} (A - \sigma I) = R \equiv \begin{bmatrix} \hat R \\ r^* \end{bmatrix}.$$
It holds that
$$q^*(A - \sigma I) = r^* = r_{nn} e_n^T \;\Longrightarrow\; q^* = r_{nn} e_n^T (A - \sigma I)^{-1}. \qquad (6)$$
Hence the last column of $Q$ generated by the QR algorithm is the result of the inverse power method with shift $\sigma$ applied to $e_n$.

Question
How should the shift $\sigma$ be chosen?
Let
$$A = \begin{bmatrix} B & h \\ g^* & \mu \end{bmatrix}.$$
Then
$$e_n^T A e_n = \mu \quad \text{and} \quad e_n^T A - \mu e_n^T = \big[\, g^* \;\;\; \mu \,\big] - \mu e_n^T = \big[\, g^* \;\;\; 0 \,\big].$$
If we take $(\mu, e_n)$ to be an approximate left eigenpair of $A$, then the corresponding residual norm is $\|g\|_2$.
If $g$ is small, then $\mu$ should approximate an eigenvalue of $A$, and we choose $\sigma = \mu = e_n^T A e_n$ (the Rayleigh quotient shift).

Question
Why does the QR algorithm converge?
Let
$$A - \sigma I \equiv \begin{bmatrix} B - \sigma I & h \\ g^* & \mu - \sigma \end{bmatrix} = QR \equiv \begin{bmatrix} P & f \\ e^* & \pi \end{bmatrix} \begin{bmatrix} S & r \\ 0 & \rho \end{bmatrix} \qquad (7)$$
be the QR factorization of $A - \sigma I$. Take
$$\hat A \equiv \begin{bmatrix} \hat B & \hat h \\ \hat g^* & \hat\mu \end{bmatrix} = RQ + \sigma I. \qquad (8)$$
Since $Q$ is unitary, we have
$$\|e\|_2^2 + |\pi|^2 = \|f\|_2^2 + |\pi|^2 = 1,$$
which implies that
$$\|e\|_2 = \|f\|_2 \quad \text{and} \quad |\pi| \leq 1.$$
From (7), we have $g^* = e^* S$. Assume $S$ is nonsingular and set $\kappa = \|S^{-1}\|_2$; then $\|e\|_2 \leq \kappa \|g\|_2$. Since
$$R \equiv \begin{bmatrix} S & r \\ 0 & \rho \end{bmatrix} = Q^*(A - \sigma I) \equiv \begin{bmatrix} P^* & e \\ f^* & \bar\pi \end{bmatrix} \begin{bmatrix} B - \sigma I & h \\ g^* & \mu - \sigma \end{bmatrix},$$
it implies that
$$\rho = f^* h + \bar\pi(\mu - \sigma),$$
and then
$$|\rho| \leq \|f\|_2 \|h\|_2 + |\pi|\,|\mu - \sigma| = \|e\|_2 \|h\|_2 + |\pi|\,|\mu - \sigma| \leq \kappa \|g\|_2 \|h\|_2 + |\mu - \sigma|.$$
From (8), we have $\hat g^* = \rho e^*$, which implies
$$\|\hat g\|_2 \leq |\rho|\,\|e\|_2 \leq |\rho|\,\kappa \|g\|_2 \leq \kappa^2 \|h\|_2 \|g\|_2^2 + \kappa\,|\mu - \sigma|\,\|g\|_2.$$
Consequently,
$$\|g_{j+1}\|_2 \leq \kappa_j^2 \|h_j\|_2 \|g_j\|_2^2 + \kappa_j\,|\mu_j - \sigma_j|\,\|g_j\|_2.$$
If $g_0$ is sufficiently small and $\mu_0$ is sufficiently near a simple eigenvalue $\lambda$, then $g_j \to 0$ and $\mu_j \to \lambda$. Assume there exist $\eta$ and $\kappa$ such that
$$\|h_j\|_2 \leq \eta \quad \text{and} \quad \kappa_j = \|S_j^{-1}\|_2 \leq \kappa.$$
Take the Rayleigh quotient shift $\sigma_j = \mu_j$. Then
$$\|g_{j+1}\|_2 \leq \kappa^2 \eta \|g_j\|_2^2,$$
which means that $\|g_j\|_2$ converges at least quadratically to zero.
If $A_0$ is Hermitian, then every $A_k$ is also Hermitian. It holds that $h_j = g_j$, and then
$$\|g_{j+1}\|_2 \leq \kappa^2 \|g_j\|_2^3.$$
Therefore, the convergence rate is cubic.
The unshifted QR algorithm

The QR algorithm computes
$$A_{k+1} = Q_k^* A_k Q_k,$$
or
$$A_{k+1} = Q_k^* Q_{k-1}^* \cdots Q_0^* A_0 Q_0 \cdots Q_{k-1} Q_k$$
for $k = 0, 1, 2, \dots$. Let
$$\hat Q_k = Q_0 \cdots Q_{k-1} Q_k.$$
Then
$$A_{k+1} = \hat Q_k^* A_0 \hat Q_k.$$
Theorem
Let $Q_0, \dots, Q_k$ and $R_0, \dots, R_k$ be the orthogonal and triangular matrices generated by the QR algorithm with shifts $\sigma_0, \dots, \sigma_k$ starting with $A$. Let
$$\hat Q_k = Q_0 \cdots Q_k \quad \text{and} \quad \hat R_k = R_k \cdots R_0.$$
Then
$$\hat Q_k \hat R_k = (A - \sigma_k I) \cdots (A - \sigma_0 I),$$
and since the shifted factors commute, this product also equals $(A - \sigma_0 I) \cdots (A - \sigma_k I)$.

Proof: Since
$$R_k = (A_{k+1} - \sigma_k I) Q_k^* = \hat Q_k^*(A - \sigma_k I)\hat Q_k Q_k^* = \hat Q_k^*(A - \sigma_k I)\hat Q_{k-1},$$
it follows that
$$\hat R_k = R_k \hat R_{k-1} = \hat Q_k^*(A - \sigma_k I)\hat Q_{k-1} \hat R_{k-1}$$
and
$$\hat Q_k \hat R_k = (A - \sigma_k I)\hat Q_{k-1}\hat R_{k-1}.$$
By induction on $\hat Q_{k-1}\hat R_{k-1}$, we have
$$\hat Q_k \hat R_k = (A - \sigma_k I) \cdots (A - \sigma_0 I).$$
If $\sigma_k = 0$ for $k = 0, 1, 2, \dots$, then $\hat Q_k \hat R_k = A^{k+1}$ and
$$\hat r_{11}^{(k)} \hat q_1^{(k)} = \hat Q_k \hat R_k e_1 = A^{k+1} e_1.$$
This implies that the first column of $\hat Q_k$ is the normalized result of applying $k+1$ iterations of the power method to $e_1$.
Hence $\hat q_1^{(k)}$ approaches the dominant eigenvector of $A$; i.e., if
$$A_k = \hat Q_k^* A \hat Q_k = \begin{bmatrix} \mu_k & h_k^* \\ g_k & B_k \end{bmatrix},$$
then $g_k \to 0$ and $\mu_k \to \lambda_1$, where $\lambda_1$ is the dominant eigenvalue of $A$.

Theorem
Let
$$X^{-1} A X = \Lambda \equiv \operatorname{diag}(\lambda_1, \dots, \lambda_n),$$
where $|\lambda_1| > |\lambda_2| > \cdots > |\lambda_n| > 0$. Suppose $X^{-1}$ has an LU factorization $X^{-1} = LU$, where $L$ is unit lower triangular, and let $X = QR$ be the QR factorization of $X$. If $A^k$ has the QR factorization $A^k = \hat Q_k \hat R_k$, then there exist diagonal matrices $D_k$ with $|D_k| = I$ such that $\hat Q_k D_k \to Q$.
Proof: By the assumptions, we get
$$A^k = X \Lambda^k X^{-1} = QR\Lambda^k LU = QR(\Lambda^k L \Lambda^{-k})(\Lambda^k U).$$
Since
$$(\Lambda^k L \Lambda^{-k})_{ij} = \ell_{ij} (\lambda_i/\lambda_j)^k \to 0 \quad \text{for } i > j,$$
it holds that
$$\Lambda^k L \Lambda^{-k} \to I \quad \text{as } k \to \infty.$$
Let
$$\Lambda^k L \Lambda^{-k} = I + E_k, \quad \text{where } E_k \to 0 \text{ as } k \to \infty.$$
Then
$$A^k = QR(I + E_k)(\Lambda^k U) = Q(I + R E_k R^{-1})(R \Lambda^k U).$$
Let
$$I + R E_k R^{-1} = \bar Q_k \bar R_k$$
be the QR factorization of $I + R E_k R^{-1}$. Then
$$A^k = (Q \bar Q_k)(\bar R_k R \Lambda^k U).$$
Since $I + R E_k R^{-1} \to I$ as $k \to \infty$, we have $\bar Q_k \to I$ as $k \to \infty$. Let the diagonals of $\bar R_k R \Lambda^k U$ be $\delta_1, \dots, \delta_n$ and set
$$D_k = \operatorname{diag}(\bar\delta_1/|\delta_1|, \dots, \bar\delta_n/|\delta_n|).$$
Then
$$A^k = (Q \bar Q_k D_k^{-1})(D_k \bar R_k R \Lambda^k U) = \hat Q_k \hat R_k.$$
Since the diagonals of $D_k \bar R_k R \Lambda^k U$ and $\hat R_k$ are positive, by the uniqueness of the QR factorization
$$\hat Q_k = Q \bar Q_k D_k^{-1},$$
which implies that
$$\hat Q_k D_k = Q \bar Q_k \to Q \quad \text{as } k \to \infty.$$

Remark:
(i) Since $X^{-1} A X = \Lambda \equiv \operatorname{diag}(\lambda_1, \dots, \lambda_n)$, we have
$$A = X \Lambda X^{-1} = (QR)\Lambda(QR)^{-1} = Q(R \Lambda R^{-1})Q^* \equiv Q T Q^*,$$
which is a Schur decomposition of $A$. Therefore, the columns of $\hat Q_k D_k$ converge to the Schur vectors of $A$, and $A_k = \hat Q_k^* A \hat Q_k$ converges to the triangular factor of the Schur decomposition of $A$.
(ii) Write
$$R(\Lambda^k L \Lambda^{-k}) = \begin{bmatrix} R_{11} & r_{1,i} & R_{1,i+1} \\ 0 & r_{ii} & r_{i,i+1}^* \\ 0 & 0 & R_{i+1,i+1} \end{bmatrix} \begin{bmatrix} L_{11}^{(k)} & 0 & 0 \\ \ell_{i,1}^{(k)*} & 1 & 0 \\ L_{i+1,1}^{(k)} & \ell_{i+1,i}^{(k)} & L_{i+1,i+1}^{(k)} \end{bmatrix}.$$
If $\ell_{i,1}^{(k)*}$, $L_{i+1,1}^{(k)}$ and $\ell_{i+1,i}^{(k)}$ are zero, then
$$R(\Lambda^k L \Lambda^{-k}) = \begin{bmatrix} R_{11} L_{11}^{(k)} & r_{1,i} & R_{1,i+1} L_{i+1,i+1}^{(k)} \\ 0 & r_{ii} & r_{i,i+1}^* L_{i+1,i+1}^{(k)} \\ 0 & 0 & R_{i+1,i+1} L_{i+1,i+1}^{(k)} \end{bmatrix}$$
and
$$I + R E_k R^{-1} = R(I + E_k)R^{-1} = R(\Lambda^k L \Lambda^{-k})R^{-1} = \begin{bmatrix} G_{11} & g_{1,i} & G_{1,i+1} \\ 0 & g_{ii} & g_{i,i+1}^* \\ 0 & 0 & G_{i+1,i+1} \end{bmatrix} = \bar Q_k \bar R_k \quad (\text{QR factorization}),$$
which implies that
$$\bar Q_k = \operatorname{diag}(\bar Q_{11}^{(k)}, \omega, \bar Q_{i+1,i+1}^{(k)})$$
and
$$A_k = \hat Q_k^* A \hat Q_k = \bar Q_k^* Q^* A Q \bar Q_k = \bar Q_k^* T \bar Q_k = \begin{bmatrix} A_{11}^{(k)} & a_{1,i}^{(k)} & A_{1,i+1}^{(k)} \\ 0 & \lambda_i & A_{i,i+1}^{(k)} \\ 0 & 0 & A_{i+1,i+1}^{(k)} \end{bmatrix}.$$
Therefore, $A_k$ decouples at its $i$th diagonal element.
The rate of convergence is at least as fast as the approach of $\max\{|\lambda_i/\lambda_{i-1}|, |\lambda_{i+1}/\lambda_i|\}^k$ to zero.
Hessenberg form

Definition
A Householder transformation or elementary reflector is a matrix of the form
$$H = I - uu^*, \quad \text{where } \|u\|_2 = \sqrt{2}.$$
Note that $H$ is Hermitian and unitary.

Theorem
Let $x$ be a vector such that $\|x\|_2 = 1$ and $x_1$ is real and nonnegative. Let
$$u = (x + e_1)/\sqrt{1 + x_1}.$$
Then
$$Hx = (I - uu^*)x = -e_1.$$
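The construction in the theorem can be checked directly in NumPy (a sketch for the real case; the vector $x$ below is an arbitrary unit vector with $x_1 \geq 0$):

```python
import numpy as np

def householder_unit(x):
    """Householder vector u = (x + e1)/sqrt(1 + x[0]) for a real unit
    vector x with x[0] >= 0, so that (I - u u^T) x = -e1."""
    e1 = np.zeros_like(x)
    e1[0] = 1.0
    return (x + e1) / np.sqrt(1.0 + x[0])

x = np.array([0.6, 0.8, 0.0])        # ||x||_2 = 1, x[0] >= 0
u = householder_unit(x)
H = np.eye(3) - np.outer(u, u)       # H = I - u u^*
```

As the definition requires, $\|u\|_2 = \sqrt 2$, and $H$ is symmetric and orthogonal, hence its own inverse.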
Proof:
$$(I - uu^*)x = x - (u^* x)u = x - \frac{x^* x + x_1}{\sqrt{1 + x_1}} \cdot \frac{x + e_1}{\sqrt{1 + x_1}} = x - (x + e_1) = -e_1.$$

Theorem
Let $x$ be a vector with $x_1 \neq 0$. Let
$$u = \frac{\rho\, x/\|x\|_2 + e_1}{\sqrt{1 + \rho\, x_1/\|x\|_2}}, \quad \text{where } \rho = \bar x_1/|x_1|.$$
Then
$$Hx = -\bar\rho \|x\|_2\, e_1.$$
Proof: Since
$$\big[\bar\rho\, x^*/\|x\|_2 + e_1^T\big]\big[\rho\, x/\|x\|_2 + e_1\big] = \bar\rho\rho + \rho\, x_1/\|x\|_2 + \bar\rho\, \bar x_1/\|x\|_2 + 1 = 2\big[1 + \rho\, x_1/\|x\|_2\big],$$
it follows that
$$u^* u = 2 \;\Rightarrow\; \|u\|_2 = \sqrt{2}$$
and
$$u^* x = \frac{\bar\rho \|x\|_2 + x_1}{\sqrt{1 + \rho\, x_1/\|x\|_2}}.$$
Hence,
$$Hx = x - (u^* x)u = x - \frac{\bar\rho\|x\|_2 + x_1}{\sqrt{1 + \rho\, x_1/\|x\|_2}} \cdot \frac{\rho\, x/\|x\|_2 + e_1}{\sqrt{1 + \rho\, x_1/\|x\|_2}}$$
$$= \left[1 - \frac{(\bar\rho\|x\|_2 + x_1)\,\rho/\|x\|_2}{1 + \rho\, x_1/\|x\|_2}\right] x \;-\; \frac{\bar\rho\|x\|_2 + x_1}{1 + \rho\, x_1/\|x\|_2}\, e_1 = -\frac{\bar\rho\|x\|_2 + x_1}{1 + \rho\, x_1/\|x\|_2}\, e_1 = -\bar\rho\|x\|_2\, e_1.$$
• Reduction to Hessenberg form
Take
$$A = \begin{bmatrix} \alpha_{11} & a_{12}^* \\ a_{21} & A_{22} \end{bmatrix}.$$
Let $\hat H_1$ be a Householder transformation such that $\hat H_1 a_{21} = \nu_1 e_1$. Set $H_1 = \operatorname{diag}(1, \hat H_1)$. Then
$$H_1 A H_1 = \begin{bmatrix} \alpha_{11} & a_{12}^* \hat H_1 \\ \hat H_1 a_{21} & \hat H_1 A_{22} \hat H_1 \end{bmatrix} = \begin{bmatrix} \alpha_{11} & a_{12}^* \hat H_1 \\ \nu_1 e_1 & \hat H_1 A_{22} \hat H_1 \end{bmatrix}.$$
For the general step, suppose $H_1, \dots, H_{k-1}$ are Householder transformations such that
$$H_{k-1} \cdots H_1 A H_1 \cdots H_{k-1} = \begin{bmatrix} A_{11} & a_{1,k} & A_{1,k+1} \\ 0 & \alpha_{kk} & a_{k,k+1}^* \\ 0 & a_{k+1,k} & A_{k+1,k+1} \end{bmatrix},$$
where $A_{11}$ is a Hessenberg matrix of order $k-1$. Let $\hat H_k$ be a Householder transformation such that $\hat H_k a_{k+1,k} = \nu_k e_1$. Set $H_k = \operatorname{diag}(I_k, \hat H_k)$; then
$$H_k H_{k-1} \cdots H_1 A H_1 \cdots H_{k-1} H_k = \begin{bmatrix} A_{11} & a_{1,k} & A_{1,k+1}\hat H_k \\ 0 & \alpha_{kk} & a_{k,k+1}^*\hat H_k \\ 0 & \nu_k e_1 & \hat H_k A_{k+1,k+1}\hat H_k \end{bmatrix}.$$
× × × × ×        × × × × ×        × × × × ×        × × × × ×
× × × × ×   H1   × × × × ×   H2   × × × × ×   H3   × × × × ×
× × × × ×   -->  0 × × × ×   -->  0 × × × ×   -->  0 × × × ×
× × × × ×        0 × × × ×        0 0 × × ×        0 0 × × ×
× × × × ×        0 × × × ×        0 0 × × ×        0 0 0 × ×
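The reduction sketched above can be implemented in a few lines for real matrices. This is an illustrative sketch, not the notes' own code; the `copysign` sign choice is the usual numerically safe variant of the reflector construction:

```python
import numpy as np

def to_hessenberg(A):
    """Reduce a real matrix to upper Hessenberg form by Householder
    similarity transformations H_{n-2} ... H_1 A H_1 ... H_{n-2}."""
    H = np.array(A, dtype=float)
    n = H.shape[0]
    for k in range(n - 2):
        x = H[k + 1:, k].copy()
        if np.linalg.norm(x[1:]) == 0.0:   # column already in Hessenberg shape
            continue
        v = x
        v[0] += np.copysign(np.linalg.norm(x), x[0])
        v /= np.linalg.norm(v)
        # two-sided update with I - 2 v v^T (a similarity transformation)
        H[k + 1:, k:] -= 2.0 * np.outer(v, v @ H[k + 1:, k:])
        H[:, k + 1:] -= 2.0 * np.outer(H[:, k + 1:] @ v, v)
    return H

A = np.array([[4.0, 1.0, 2.0, 3.0],
              [1.0, 3.0, 0.0, 1.0],
              [2.0, 0.0, 1.0, 2.0],
              [3.0, 1.0, 2.0, 5.0]])
Hm = to_hessenberg(A)
```

Because only orthogonal similarities are applied, the eigenvalues of the result agree with those of $A$.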
Definition (Givens rotation)
A plane rotation (also called a Givens rotation) is a matrix of the form
$$P = \begin{bmatrix} c & s \\ -\bar s & \bar c \end{bmatrix}, \quad \text{where } |c|^2 + |s|^2 = 1.$$
Given $a \neq 0$ and $b$, set
$$v = \sqrt{|a|^2 + |b|^2}, \qquad c = |a|/v \qquad \text{and} \qquad s = \frac{a}{|a|} \cdot \frac{\bar b}{v};$$
then
$$\begin{bmatrix} c & s \\ -\bar s & \bar c \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{pmatrix} v\, a/|a| \\ 0 \end{pmatrix}.$$
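The construction of $c$ and $s$ can be transcribed directly (a sketch; the example data is real, although the formulas cover the complex case):

```python
import numpy as np

def givens(a, b):
    """c, s such that [[c, s], [-conj(s), conj(c)]] @ [a, b] = [v*a/|a|, 0],
    with v = sqrt(|a|^2 + |b|^2); a != 0 is assumed."""
    v = np.hypot(abs(a), abs(b))
    c = abs(a) / v
    s = (a / abs(a)) * (np.conj(b) / v)
    return c, s

a, b = 3.0, 4.0
c, s = givens(a, b)
P = np.array([[c, s], [-np.conj(s), np.conj(c)]])
y = P @ np.array([a, b])
```

For this real example $v = 5$, so the rotation maps $(3, 4)$ to $(5, 0)$.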
Let $P_{ij}$ denote the identity matrix with the rotation embedded in rows and columns $i$ and $j$:
$$P_{ij} = \begin{bmatrix} I_{i-1} & & & & \\ & c & & s & \\ & & I_{j-i-1} & & \\ & -\bar s & & \bar c & \\ & & & & I_{n-j} \end{bmatrix}.$$
Premultiplying by $P_{12}$, $P_{13}$, $P_{14}$ in turn annihilates the subdiagonal entries of the first column ($+$ marks an entry that has just changed):

× × × ×        + + + +        + + + +        + + + +
× × × ×   P12  0 + + +   P13  0 × × ×   P14  0 × × ×
× × × ×   -->  × × × ×   -->  0 + + +   -->  0 × × ×
× × × ×        × × × ×        × × × ×        0 + + +
Similarly, postmultiplying by rotations annihilates the entries of the first row:

× × × ×        + 0 × ×        + 0 0 ×        + 0 0 0
× × × ×   P12  + + × ×   P13  + × + ×   P14  + × × +
× × × ×   -->  + + × ×   -->  + × + ×   -->  + × × +
× × × ×        + + × ×        + × + ×        + × × +
(i) Reduce a matrix to upper Hessenberg form by unitary similarity transformations:

× × × × ×             × × × × ×             × × × × ×             × × × × ×
× × × × ×   Q1AQ1*    × × × × ×   Q2AQ2*    × × × × ×   Q3AQ3*    × × × × ×
× × × × ×   ------>   0 × × × ×   ------>   0 × × × ×   ------>   0 × × × ×
× × × × ×             0 × × × ×             0 0 × × ×             0 0 × × ×
× × × × ×             0 × × × ×             0 0 × × ×             0 0 0 × ×

The final matrix is upper Hessenberg.
(ii) Reduce an upper Hessenberg matrix to upper triangular form by Givens rotations:

× × × × ×            × × × × ×            × × × × ×            × × × × ×            × × × × ×
× × × × ×   P12·A1   0 × × × ×   P23·A2   0 × × × ×   P34·A3   0 × × × ×   P45·A4   0 × × × ×
0 × × × ×   ----->   0 × × × ×   ----->   0 0 × × ×   ----->   0 0 × × ×   ----->   0 0 × × ×
0 0 × × ×            0 0 × × ×            0 0 × × ×            0 0 0 × ×            0 0 0 × ×
0 0 0 × ×            0 0 0 × ×            0 0 0 × ×            0 0 0 × ×            0 0 0 0 ×

The result is $T$ (upper triangular).
Multiplying $T$ on the right by the conjugate transposes of the same rotations restores upper Hessenberg form:

× × × × ×            + + × × ×            × + + × ×            × × + + ×            × × × + +
0 × × × ×   A1·P12*  + + × × ×   A2·P23*  × + + × ×   A3·P34*  × × + + ×   A4·P45*  × × × + +
0 0 × × ×   ----->   0 0 × × ×   ----->   0 + + × ×   ----->   0 × + + ×   ----->   0 × × + +
0 0 0 × ×            0 0 0 × ×            0 0 0 × ×            0 0 + + ×            0 0 × + +
0 0 0 0 ×            0 0 0 0 ×            0 0 0 0 ×            0 0 0 0 ×            0 0 0 + +

The result is $H$ (upper Hessenberg).
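The two sweeps above combine into one shifted QR step that maps a Hessenberg matrix to a Hessenberg matrix. A real-arithmetic sketch with illustrative example data:

```python
import numpy as np

def hessenberg_qr_step(H, mu):
    """One shifted QR step on an upper Hessenberg H:
    P_{n-1,n} ... P_{12} (H - mu I) = T (upper triangular), then
    H_next = T P_{12}^T ... P_{n-1,n}^T + mu I, again upper Hessenberg."""
    n = H.shape[0]
    T = H - mu * np.eye(n)
    rots = []
    for i in range(n - 1):                 # left sweep: eliminate T[i+1, i]
        a, b = T[i, i], T[i + 1, i]
        r = np.hypot(a, b)
        c, s = ((1.0, 0.0) if r == 0.0 else (a / r, b / r))
        G = np.array([[c, s], [-s, c]])
        T[i:i + 2, :] = G @ T[i:i + 2, :]
        rots.append(G)
    for i, G in enumerate(rots):           # right sweep: T := T P_{i,i+1}^T
        T[:, i:i + 2] = T[:, i:i + 2] @ G.T
    return T + mu * np.eye(n)

H = np.array([[3.0, 1.0, 2.0],
              [1.0, 2.0, 0.5],
              [0.0, 0.5, 1.0]])
H1 = hessenberg_qr_step(H, H[-1, -1])
```

Since the step is a similarity transformation, invariants such as the trace and determinant of $H$ are preserved.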
A practical algorithm for reducing an upper Hessenberg matrix $H$ to Schur form:
1 If the shifted QR algorithm is applied to $H$, then $h_{n,n-1}$ will tend rapidly to zero, and the other subdiagonal elements may also tend to zero, slowly.
2 If $h_{i+1,i} \approx 0$, then deflate the matrix to save computation.

How do we decide that $h_{i+1,i}$ is negligible? If
$$|h_{i+1,i}| \leq \varepsilon \|A\|_F$$
for a small number $\varepsilon$, then $h_{i+1,i}$ is negligible.
Let $Q$ be an orthogonal matrix such that $H = Q^* A Q \equiv [h_{ij}]$ is upper Hessenberg. Let
$$\tilde H = H - h_{i+1,i}\, e_{i+1} e_i^T \quad (\text{the deflated matrix}).$$
Set
$$E = Q(h_{i+1,i}\, e_{i+1} e_i^T)Q^*.$$
Then
$$\tilde H = Q^*(A - E)Q.$$
If $|h_{i+1,i}| \leq \varepsilon \|A\|_F$, then
$$\|E\|_F = \|Q(h_{i+1,i}\, e_{i+1} e_i^T)Q^*\|_F = |h_{i+1,i}| \leq \varepsilon \|A\|_F,$$
or
$$\frac{\|E\|_F}{\|A\|_F} \leq \varepsilon.$$
When $\varepsilon$ equals the rounding unit $\varepsilon_M$, the perturbation $E$ is of a size comparable with the perturbation due to rounding the elements of $A$.
The Wilkinson shift
1 The Rayleigh-quotient shift $\sigma = h_{n,n}$ gives local quadratic convergence to a simple eigenvalue.
2 If $H$ is real, the Rayleigh-quotient shift is also real, and hence cannot approximate a complex eigenvalue.
3 The Wilkinson shift $\mu$: if $\lambda_1, \lambda_2$ are the eigenvalues of
$$\begin{bmatrix} h_{n-1,n-1} & h_{n-1,n} \\ h_{n,n-1} & h_{n,n} \end{bmatrix}$$
with $|\lambda_1 - h_{n,n}| \leq |\lambda_2 - h_{n,n}|$, then $\mu = \lambda_1$.
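A direct transcription of the definition (a sketch; the $2\times 2$ eigenvalues come from the quadratic formula, and a complex shift falls out naturally for a real $H$ with a complex pair):

```python
import numpy as np

def wilkinson_shift(H):
    """Eigenvalue of the trailing 2x2 block of H closest to h_{nn}."""
    a, b = H[-2, -2], H[-2, -1]
    c, d = H[-1, -2], H[-1, -1]
    tr = a + d
    det = a * d - b * c
    disc = np.sqrt(complex(tr * tr - 4.0 * det))   # may be complex for real H
    lam1 = (tr + disc) / 2.0
    lam2 = (tr - disc) / 2.0
    return lam1 if abs(lam1 - d) <= abs(lam2 - d) else lam2

H = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 1.0]])
mu = wilkinson_shift(H)   # trailing block [[3,1],[1,1]] has eigenvalues 2 +- sqrt(2)
```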
Algorithm
do $k = 1, 2, \dots$
    compute the Wilkinson shift $\mu_k$;
    reduce the upper Hessenberg $H_k - \mu_k I$ to upper triangular $T_k$:
    $$P_{n-1,n}^{(k)} \cdots P_{12}^{(k)} (H_k - \mu_k I) = T_k;$$
    compute
    $$H_{k+1} = T_k P_{12}^{(k)*} \cdots P_{n-1,n}^{(k)*} + \mu_k I;$$
end do
$\Rightarrow$ Schur form of $A$ $\Rightarrow$ eigenvalues of $A$.

Question
How do we get the eigenvectors of $A$?
If $A = QTQ^*$ is the Schur decomposition of $A$ and $X$ is the matrix of right eigenvectors of $T$, then $QX$ is the matrix of right eigenvectors of $A$.
If
$$T = \begin{bmatrix} T_{11} & t_{1,k} & T_{1,k+1} \\ 0 & \tau_{kk} & t_{k,k+1}^* \\ 0 & 0 & T_{k+1,k+1} \end{bmatrix}$$
and $\tau_{kk}$ is a simple eigenvalue of $T$, then
$$\begin{bmatrix} -(T_{11} - \tau_{kk} I)^{-1} t_{1,k} \\ 1 \\ 0 \end{bmatrix}$$
is a right eigenvector of $T$ and
$$\big[\; 0 \quad 1 \quad -t_{k,k+1}^*(T_{k+1,k+1} - \tau_{kk} I)^{-1} \;\big]$$
is a left eigenvector of $T$ corresponding to $\tau_{kk}$.
The implicitly shifted QR algorithm

Theorem (Real Schur form)
Let $A$ be real of order $n$. Then there exists an orthogonal matrix $U$ such that
$$U^T A U = \begin{bmatrix} T_{11} & T_{12} & \cdots & T_{1k} \\ 0 & T_{22} & \cdots & T_{2k} \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & T_{kk} \end{bmatrix} \quad (\text{quasi-triangular}).$$
The diagonal blocks are of order one or two. The blocks of order one contain the real eigenvalues of $A$; the blocks of order two contain the pairs of complex conjugate eigenvalues of $A$. The blocks can be made to appear in any order.
Proof: Let $(\lambda, x)$ be a complex eigenpair with $\lambda = \mu + i\nu$ and $x = y + iz$. Then
$$2y = x + \bar x, \qquad 2iz = x - \bar x,$$
and
$$Ay = \tfrac{1}{2}[\lambda x + \bar\lambda \bar x] = \tfrac{1}{2}\big[(\mu y - \nu z) + i(\mu z + \nu y) + (\mu y - \nu z) - i(\nu y + \mu z)\big] = \mu y - \nu z. \qquad (9)$$
Similarly,
$$Az = \tfrac{1}{2i}[\lambda x - \bar\lambda \bar x] = \nu y + \mu z. \qquad (10)$$
From (9) and (10), we have
$$A\, [y \; z] = [\mu y - \nu z \;\;\; \nu y + \mu z] = [y \; z] \begin{bmatrix} \mu & \nu \\ -\nu & \mu \end{bmatrix} \equiv [y \; z]\, L.$$
Let
$$[y \; z] = [X_1 \; X_2] \begin{bmatrix} R \\ 0 \end{bmatrix} = X_1 R$$
be a QR factorization of $[y \; z]$. Since $y$ and $z$ are linearly independent, $R$ is nonsingular and $X_1 = [y \; z] R^{-1}$. Consequently,
$$A X_1 = A\, [y \; z] R^{-1} = [y \; z]\, L R^{-1} = X_1 R L R^{-1}.$$
Using this result and the fact that $[X_1 \; X_2]$ is unitary, we have
$$\begin{bmatrix} X_1^T \\ X_2^T \end{bmatrix} A\, [X_1 \; X_2] = \begin{bmatrix} X_1^T A X_1 & X_1^T A X_2 \\ X_2^T A X_1 & X_2^T A X_2 \end{bmatrix} = \begin{bmatrix} R L R^{-1} & X_1^T A X_2 \\ 0 & X_2^T A X_2 \end{bmatrix}. \qquad (11)$$
Since $\lambda$ and $\bar\lambda$ are eigenvalues of $L$ and $R L R^{-1}$ is similar to $L$, (11) completes the deflation of the complex conjugate pair $\lambda$ and $\bar\lambda$.

Remark
$$A X_1 = X_1 (R L R^{-1})$$
$\Rightarrow$ $A$ maps the column space of $X_1$ into itself
$\Rightarrow$ $\operatorname{span}(X_1)$ is called an eigenspace or invariant subspace.

• Francis double shift
1 If the Wilkinson shift $\sigma$ is complex, then $\bar\sigma$ is also a candidate for a shift.
2 Apply two steps of the QR algorithm, one with shift $\sigma$ and the other with shift $\bar\sigma$, to yield a matrix $\hat H$.
Let
$$\hat Q \hat R = (H - \sigma I)(H - \bar\sigma I)$$
be the QR factorization of $(H - \sigma I)(H - \bar\sigma I)$; then
$$\hat H = \hat Q^* H \hat Q.$$
Since
$$(H - \sigma I)(H - \bar\sigma I) = H^2 - 2\operatorname{Re}(\sigma) H + |\sigma|^2 I \in \mathbb{R}^{n \times n},$$
we have $\hat Q \in \mathbb{R}^{n \times n}$ and $\hat H \in \mathbb{R}^{n \times n}$. Therefore, the QR algorithm with two complex conjugate shifts preserves reality.

Francis double shift strategy:
1 Compute the Wilkinson shift $\sigma$;
2 Form the matrix $H^2 - 2\operatorname{Re}(\sigma) H + |\sigma|^2 I =: \tilde H$ ($O(n^3)$ operations);
3 Compute the QR factorization of $\tilde H$: $\tilde H = \hat Q \hat R$;
4 Compute $\hat H = \hat Q^* H \hat Q$.
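Step 2 is the $O(n^3)$ bottleneck that the implicit double shift avoids: for an upper Hessenberg $H$, only the first column of $\tilde H$ is needed to start the step, and it has just three nonzero entries, computable in $O(1)$ from the leading entries of $H$. A sketch with illustrative data:

```python
import numpy as np

def double_shift_first_column(H, sigma):
    """First column of H^2 - 2 Re(sigma) H + |sigma|^2 I for an upper
    Hessenberg H, using only the leading 2x2 block and h_{32}."""
    s = 2.0 * sigma.real            # sigma + conj(sigma)
    p = abs(sigma) ** 2             # sigma * conj(sigma)
    x = np.zeros(H.shape[0])
    x[0] = H[0, 0] * H[0, 0] + H[0, 1] * H[1, 0] - s * H[0, 0] + p
    x[1] = H[1, 0] * (H[0, 0] + H[1, 1] - s)
    x[2] = H[1, 0] * H[2, 1]
    return x

H = np.array([[3.0, 1.0, 2.0, 0.5],
              [1.0, 2.0, 0.5, 1.0],
              [0.0, 0.5, 1.0, 2.0],
              [0.0, 0.0, 1.0, 4.0]])
sigma = 1.0 + 2.0j
x = double_shift_first_column(H, sigma)
```

This column agrees with the first column of the explicitly formed polynomial, which is what the implicit construction exploits.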
• The uniqueness of Hessenberg reduction

Definition
Let $H$ be upper Hessenberg of order $n$. Then $H$ is unreduced if $h_{i+1,i} \neq 0$ for $i = 1, \dots, n-1$.

Theorem (Implicit Q theorem)
Suppose $Q = [q_1 \cdots q_n]$ and $V = [v_1 \cdots v_n]$ are unitary matrices with
$$Q^* A Q = H \quad \text{and} \quad V^* A V = G$$
both upper Hessenberg. Let $k$ denote the smallest positive integer for which $h_{k+1,k} = 0$, with the convention that $k = n$ if $H$ is unreduced. If $v_1 = q_1$, then $v_i = \pm q_i$ and $|h_{i,i-1}| = |g_{i,i-1}|$ for $i = 2, \dots, k$. Moreover, if $k < n$, then $g_{k+1,k} = 0$.
Proof: Define $W \equiv [w_1 \cdots w_n] = V^* Q$. Then
$$GW = G V^* Q = V^* A Q = V^* Q H = W H,$$
which implies that
$$h_{i,i-1} w_i = G w_{i-1} - \sum_{j=1}^{i-1} h_{j,i-1} w_j \quad \text{for } i = 2, \dots, k.$$
Since $v_1 = q_1$, it holds that $w_1 = e_1$ and
$$h_{21} w_2 = G w_1 - h_{11} w_1 = \alpha_{21} e_1 + \alpha_{22} e_2.$$
Assume
$$w_{i-1} = \alpha_{i-1,1} e_1 + \cdots + \alpha_{i-1,i-1} e_{i-1}.$$
Then
$$h_{i,i-1} w_i = G[\alpha_{i-1,1} e_1 + \cdots + \alpha_{i-1,i-1} e_{i-1}] - \sum_{j=1}^{i-1} \beta_{i,j} e_j = \bar\alpha_{i,1} e_1 + \cdots + \bar\alpha_{i,i} e_i.$$
By induction, $[w_1 \cdots w_k]$ is upper triangular. Since $V$ and $Q$ are unitary, $W = V^* Q$ is also unitary, and then
$$w_1^* w_j = 0 \quad \text{for } j = 2, \dots, k.$$
That is,
$$w_{1j} = 0 \quad \text{for } j = 2, \dots, k,$$
which implies that $w_2 = \pm e_2$. Similarly, by
$$w_2^* w_j = 0 \quad \text{for } j = 3, \dots, k,$$
i.e.,
$$w_{2j} = 0 \quad \text{for } j = 3, \dots, k,$$
we get $w_3 = \pm e_3$. By induction,
$$w_i = \pm e_i \quad \text{for } i = 2, \dots, k.$$
Since $w_i = V^* q_i$ and $h_{i,i-1} = w_i^* G w_{i-1}$, we have $v_i = V e_i = \pm V w_i = \pm q_i$ and
$$|h_{i,i-1}| = |g_{i,i-1}| \quad \text{for } i = 2, \dots, k.$$
If $h_{k+1,k} = 0$, then
$$g_{k+1,k} = e_{k+1}^T G e_k = \pm e_{k+1}^T G W e_k = \pm e_{k+1}^T W H e_k = \pm e_{k+1}^T \sum_{i=1}^k h_{ik} w_i = \pm \sum_{i=1}^k h_{ik}\, e_{k+1}^T e_i = 0.$$