Chapter 7

Lanczos Methods

In this chapter we develop the Lanczos method, a technique that is applicable to large, sparse, symmetric eigenproblems. The method involves tridiagonalizing the given matrix $A$. However, unlike the Householder approach, no intermediate full submatrices are generated. Equally important, information about $A$'s extremal eigenvalues tends to emerge long before the tridiagonalization is complete. This makes the Lanczos algorithm particularly useful in situations where a few of $A$'s largest or smallest eigenvalues are desired.

7.1 The Lanczos Algorithm

Suppose $A \in \mathbb{R}^{n \times n}$ is large, sparse, and symmetric. Then there exists an orthogonal matrix $Q$ which transforms $A$ into a tridiagonal matrix $T$:

$$Q^T A Q = T \equiv \text{tridiagonal}. \qquad (7.1.1)$$

Remark 7.1.1 (a) Such a $Q$ can be generated by Householder transformations or Givens rotations.

(b) For almost all $A$ (i.e., whenever all eigenvalues are distinct) and almost any $q_1 \in \mathbb{R}^n$ with $\|q_1\|_2 = 1$, there exists an orthogonal matrix $Q$ with first column $q_1$ satisfying (7.1.1). Moreover, $q_1$ determines $T$ uniquely up to the signs of the columns of $Q$ (that is, we may multiply each column by $-1$).

Let $x \in \mathbb{R}^n$ and

$$K[x, A, m] = [x, Ax, A^2 x, \cdots, A^{m-1} x] \in \mathbb{R}^{n \times m}. \qquad (7.1.2)$$

$K[x, A, m]$ is called a Krylov matrix. Let

$$\mathcal{K}(x, A, m) = \mathrm{Range}(K[x, A, m]) = \mathrm{Span}(x, Ax, \cdots, A^{m-1} x). \qquad (7.1.3)$$

$\mathcal{K}(x, A, m)$ is called the Krylov subspace generated by $K[x, A, m]$.
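For concreteness, a minimal numpy sketch of (7.1.2), building the Krylov matrix column by column (the helper name `krylov_matrix` is ours, not from the text):

```python
import numpy as np

def krylov_matrix(A, x, m):
    """Krylov matrix K[x, A, m] = [x, Ax, ..., A^{m-1} x]."""
    K = np.empty((x.shape[0], m))
    K[:, 0] = x
    for j in range(1, m):
        K[:, j] = A @ K[:, j - 1]   # A^j x from A^{j-1} x
    return K
```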

Remark 7.1.2 For each $H \in \mathbb{C}^{n \times m}$ (or $\mathbb{R}^{n \times m}$), $m \le n$, with $\mathrm{rank}(H) = m$, there exist a $Q \in \mathbb{C}^{n \times m}$ (or $\mathbb{R}^{n \times m}$) with $Q^* Q = I_m$ and an upper triangular $R \in \mathbb{C}^{m \times m}$ (or $\mathbb{R}^{m \times m}$) such that

$$H = QR. \qquad (7.1.4)$$

$Q$ is uniquely determined if we require all $r_{ii} > 0$.

Theorem 7.1.1 Let $A$ be symmetric (Hermitian), let $1 \le m \le n$ be given, and suppose $\dim \mathcal{K}(x, A, m) = m$. Then:

(a) If

$$K[x, A, m] = Q_m R_m \qquad (7.1.5)$$

is a QR factorization, then $Q_m^* A Q_m = T_m$ is an $m \times m$ tridiagonal matrix and satisfies

$$A Q_m = Q_m T_m + r_m e_m^T, \qquad Q_m^* r_m = 0. \qquad (7.1.6)$$

(b) Let $\|x\|_2 = 1$. If $Q_m \in \mathbb{C}^{n \times m}$ has first column $x$, satisfies $Q_m^* Q_m = I_m$, and satisfies

$$A Q_m = Q_m T_m + r_m e_m^T,$$

where $T_m$ is tridiagonal, then

$$K[x, A, m] = [x, Ax, \cdots, A^{m-1} x] = Q_m [e_1, T_m e_1, \cdots, T_m^{m-1} e_1] \qquad (7.1.7)$$

is a QR factorization of $K[x, A, m]$.

Proof: (a) Since

$$A \,\mathcal{K}(x, A, j) \subset \mathcal{K}(x, A, j+1), \qquad j < m, \qquad (7.1.8)$$

and from (7.1.5) we have

$$\mathrm{Span}(q_1, \cdots, q_i) = \mathcal{K}(x, A, i), \qquad i \le m, \qquad (7.1.9)$$

it follows that

$$q_{i+1} \perp \mathcal{K}(x, A, i) \overset{(7.1.8)}{\supset} A \,\mathcal{K}(x, A, i-1) = A \,\mathrm{Span}(q_1, \cdots, q_{i-1}).$$

This implies

$$q_{i+1}^* A q_j = 0, \qquad j = 1, \cdots, i-1, \quad i+1 \le m.$$

That is,

$$(Q_m^* A Q_m)_{ij} = (T_m)_{ij} = q_i^* A q_j = 0 \quad \text{for } i > j+1.$$

So $T_m$ is upper Hessenberg and hence tridiagonal (since $T_m$ is Hermitian).

It remains to show (7.1.6). Let $S_m$ denote the $m \times m$ down-shift matrix, with ones on the subdiagonal and zeros elsewhere. Since $[x, Ax, \cdots, A^{m-1} x] = Q_m R_m$ and

$$A K[x, A, m] = K[x, A, m] \, S_m + A^m x \, e_m^T,$$

we have

$$A Q_m R_m = Q_m R_m S_m + Q_m Q_m^* A^m x \, e_m^T + (I - Q_m Q_m^*) A^m x \, e_m^T.$$

Then, with $\gamma = (R_m^{-1})_{mm}$ and using $e_m^T R_m^{-1} = \gamma e_m^T$ (because $R_m^{-1}$ is upper triangular),

$$A Q_m = Q_m \left[ R_m S_m + Q_m^* A^m x \, e_m^T \right] R_m^{-1} + (I - Q_m Q_m^*) A^m x \, e_m^T R_m^{-1}$$
$$= Q_m \left[ R_m S_m R_m^{-1} + \gamma \, Q_m^* A^m x \, e_m^T \right] + \underbrace{\gamma (I - Q_m Q_m^*) A^m x}_{r_m} \, e_m^T$$
$$= Q_m H_m + r_m e_m^T \quad \text{with } Q_m^* r_m = 0,$$

where $H_m$ is an upper Hessenberg matrix. But $Q_m^* A Q_m = H_m$ is Hermitian, so $H_m = T_m$ is tridiagonal.

(b) We check (7.1.7). The first columns coincide: $x = Q_m e_1$. Suppose the $i$-th columns are equal, i.e., $A^{i-1} x = Q_m T_m^{i-1} e_1$. Then

$$A^i x = A Q_m T_m^{i-1} e_1 = (Q_m T_m + r_m e_m^T) T_m^{i-1} e_1 = Q_m T_m^i e_1 + r_m e_m^T T_m^{i-1} e_1.$$

But $e_m^T T_m^{i-1} e_1 = 0$ for $i < m$, since $T_m$ is tridiagonal. Therefore $A^i x = Q_m T_m^i e_1$, i.e., the $(i+1)$-th columns are equal.

It is clear that $[e_1, T_m e_1, \cdots, T_m^{m-1} e_1]$ is an upper triangular matrix.

Theorem 7.1.1 If $x = q_1$ with $\|q_1\|_2 = 1$ satisfies $\mathrm{rank}(K[x, A, n]) = n$ (that is, $\{x, Ax, \cdots, A^{n-1} x\}$ is linearly independent), then there exists a unitary matrix $Q$ with first column $q_1$ such that $Q^* A Q = T$ is tridiagonal.

Proof: Applying Theorem 7.1.1(a) with $m = n$, we have that $Q_n = Q$ is unitary and $AQ = QT$.

Uniqueness: Let $Q^* A Q = T$ and $\tilde{Q}^* A \tilde{Q} = \tilde{T}$ with $Q e_1 = \tilde{Q} e_1$. Then

$$K[q_1, A, n] = QR = \tilde{Q} \tilde{R} \quad \Rightarrow \quad Q = \tilde{Q} D, \quad R = D^* \tilde{R},$$

where $D = \mathrm{diag}(\varepsilon_1, \cdots, \varepsilon_n)$ with $|\varepsilon_i| = 1$. Replacing $Q$ by $QD$ gives

$$(QD)^* A (QD) = D^* Q^* A Q D = D^* T D = \text{tridiagonal}.$$

So $Q$ is unique up to multiplying the columns of $Q$ by factors $\varepsilon$ with $|\varepsilon| = 1$.

In the following paragraphs we investigate the Lanczos algorithm for the real case, i.e., $A \in \mathbb{R}^{n \times n}$.

We seek an orthogonal matrix $Q = [q_1, \cdots, q_n]$ with $Q^T Q = I_n$ such that $Q^T A Q = T$ is tridiagonal; by Remark 7.1.1, such a $Q$ is almost uniquely determined. Let

$$A Q = Q T, \qquad (7.1.10)$$

where

$$Q = [q_1, \cdots, q_n] \quad \text{and} \quad T = \begin{bmatrix} \alpha_1 & \beta_1 & & 0 \\ \beta_1 & \ddots & \ddots & \\ & \ddots & \ddots & \beta_{n-1} \\ 0 & & \beta_{n-1} & \alpha_n \end{bmatrix}.$$

Comparing the $j$-th columns on both sides of (7.1.10) gives

$$A q_j = \beta_{j-1} q_{j-1} + \alpha_j q_j + \beta_j q_{j+1} \qquad (7.1.11)$$

for $j = 1, \cdots, n$, with $\beta_0 = \beta_n = 0$. Multiplying (7.1.11) by $q_j^T$, we obtain

$$q_j^T A q_j = \alpha_j. \qquad (7.1.12)$$

Define $r_j = (A - \alpha_j I) q_j - \beta_{j-1} q_{j-1}$. Then $r_j = \beta_j q_{j+1}$ with

$$\beta_j = \pm \|r_j\|_2, \qquad (7.1.13)$$

and if $\beta_j \neq 0$ then

$$q_{j+1} = r_j / \beta_j. \qquad (7.1.14)$$

So, starting from $q_1$, we can determine the unknowns in the order $\alpha_1, r_1, \beta_1, q_2, \alpha_2, r_2, \beta_2, q_3, \cdots$. These formulas define the Lanczos iteration:

$j = 0$, $r_0 = q_1$, $\beta_0 = 1$, $q_0 = 0$
Do while ($\beta_j \neq 0$)
    $q_{j+1} = r_j / \beta_j$, $j := j + 1$, $\alpha_j = q_j^T A q_j$,
    $r_j = (A - \alpha_j I) q_j - \beta_{j-1} q_{j-1}$, $\beta_j = \|r_j\|_2$.

$$(7.1.15)$$

There is no loss of generality in choosing the $\beta_j$ to be positive. The $q_j$ are called Lanczos vectors. With careful overwriting and use of the formula $\alpha_j = q_j^T (A q_j - \beta_{j-1} q_{j-1})$, the whole process can be implemented with only a pair of $n$-vectors.
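As an illustration, here is a minimal numpy sketch of iteration (7.1.15); the function name `lanczos` and the stopping tolerance are our choices, not part of the text (a tiny tolerance stands in for the exact test $\beta_j \neq 0$ in floating point):

```python
import numpy as np

def lanczos(A, q1, max_steps, tol=1e-12):
    """Lanczos iteration (7.1.15): returns the alpha's, beta's and Lanczos vectors."""
    Q = [q1 / np.linalg.norm(q1)]      # q_1 with ||q_1||_2 = 1
    q_prev = np.zeros_like(Q[0])       # q_0 = 0
    beta = 0.0                         # irrelevant initial value, since q_0 = 0
    alphas, betas = [], []
    for _ in range(max_steps):
        q = Q[-1]
        r = A @ q - beta * q_prev      # A q_j - beta_{j-1} q_{j-1}
        alpha = q @ r                  # alpha_j = q_j^T (A q_j - beta_{j-1} q_{j-1})
        r = r - alpha * q              # r_j = (A - alpha_j I) q_j - beta_{j-1} q_{j-1}
        beta = np.linalg.norm(r)       # beta_j = ||r_j||_2
        alphas.append(alpha)
        betas.append(beta)
        if beta < tol:                 # beta_j = 0: exact invariant subspace found
            break
        q_prev = q
        Q.append(r / beta)             # q_{j+1} = r_j / beta_j
    return np.array(alphas), np.array(betas), np.column_stack(Q)
```

The tridiagonal $T_j$ of (7.1.16) below is then `np.diag(alphas) + np.diag(betas[:-1], 1) + np.diag(betas[:-1], -1)`.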


Algorithm 7.1.1 (Lanczos Algorithm) Given a symmetric $A \in \mathbb{R}^{n \times n}$ and $w \in \mathbb{R}^n$ of unit 2-norm, the following algorithm computes a $j \times j$ symmetric tridiagonal matrix $T_j$ with the property that $\sigma(T_j) \subset \sigma(A)$. The diagonal and subdiagonal elements of $T_j$ are stored in $\alpha_1, \cdots, \alpha_j$ and $\beta_1, \cdots, \beta_{j-1}$, respectively.

$v_i := 0$ $(i = 1, \cdots, n)$; $\beta_0 := 1$; $j := 0$
Do while ($\beta_j \neq 0$)
    if $j \neq 0$ then
        for $i = 1, \cdots, n$
            $t := w_i$, $w_i := v_i / \beta_j$, $v_i := -\beta_j t$
        end for
    end if
    $v := A w + v$, $j := j + 1$, $\alpha_j := w^T v$, $v := v - \alpha_j w$, $\beta_j := \|v\|_2$.

Remark 7.1.3 (a) If the sparsity is exploited and only $kn$ flops are involved in each matrix-vector product $Aw$ (with $k \ll n$), then each Lanczos step requires about $(4 + k)n$ flops to execute.

(b) The iteration stops before complete tridiagonalization if $q_1$ is contained in a proper invariant subspace. From the iteration (7.1.15) we have

$$A (q_1, \cdots, q_m) = (q_1, \cdots, q_m) \begin{bmatrix} \alpha_1 & \beta_1 & & \\ \beta_1 & \ddots & \ddots & \\ & \ddots & \ddots & \beta_{m-1} \\ & & \beta_{m-1} & \alpha_m \end{bmatrix} + (0, \cdots, 0, \underbrace{\beta_m q_{m+1}}_{r_m}),$$

where the last term is $r_m e_m^T$, and $\beta_m = 0$ if and only if $r_m = 0$. In that case

$$A (q_1, \cdots, q_m) = (q_1, \cdots, q_m) T_m.$$

That is,

$$\mathrm{Range}(q_1, \cdots, q_m) = \mathrm{Range}(K[q_1, A, m])$$

is an invariant subspace of $A$, and the eigenvalues of $T_m$ are eigenvalues of $A$.

Theorem 7.1.2 Let $A$ be symmetric and let $q_1$ be a given vector with $\|q_1\|_2 = 1$. The Lanczos iteration (7.1.15) runs until $j = m$, where $m = \mathrm{rank}(K[q_1, A, n])$. Moreover, for $j = 1, \cdots, m$ we have

$$A Q_j = Q_j T_j + r_j e_j^T \qquad (7.1.16)$$

with

$$T_j = \begin{bmatrix} \alpha_1 & \beta_1 & & \\ \beta_1 & \ddots & \ddots & \\ & \ddots & \ddots & \beta_{j-1} \\ & & \beta_{j-1} & \alpha_j \end{bmatrix} \quad \text{and} \quad Q_j = [q_1, \cdots, q_j],$$

where $Q_j$ has orthonormal columns satisfying $\mathrm{Range}(Q_j) = \mathcal{K}(q_1, A, j)$.

Proof: By induction on $j$. Suppose the iteration has produced $Q_j = [q_1, \cdots, q_j]$ such that $\mathrm{Range}(Q_j) = \mathcal{K}(q_1, A, j)$ and $Q_j^T Q_j = I_j$. It is easy to see from (7.1.15) that (7.1.16) holds. Thus

$$Q_j^T A Q_j = T_j + Q_j^T r_j e_j^T.$$

Since $\alpha_i = q_i^T A q_i$ for $i = 1, \cdots, j$ and

$$q_{i+1}^T A q_i = q_{i+1}^T (\beta_i q_{i+1} + \alpha_i q_i + \beta_{i-1} q_{i-1}) = q_{i+1}^T (\beta_i q_{i+1}) = \beta_i$$

for $i = 1, \cdots, j-1$, we have $Q_j^T A Q_j = T_j$. Consequently $Q_j^T r_j = 0$.

If $r_j \neq 0$, then $q_{j+1} = r_j / \|r_j\|_2$ is orthogonal to $q_1, \cdots, q_j$ and

$$q_{j+1} \in \mathrm{Span}\{A q_j, q_j, q_{j-1}\} \subset \mathcal{K}(q_1, A, j+1).$$

Thus $Q_{j+1}^T Q_{j+1} = I_{j+1}$ and $\mathrm{Range}(Q_{j+1}) = \mathcal{K}(q_1, A, j+1)$.

On the other hand, if $r_j = 0$, then $A Q_j = Q_j T_j$. This says that $\mathrm{Range}(Q_j) = \mathcal{K}(q_1, A, j)$ is invariant, and we conclude that $j = m = \dim \mathcal{K}(q_1, A, n)$.

Encountering a zero $\beta_j$ in the Lanczos iteration is a welcome event, in that it signals the computation of an exact invariant subspace. However, an exactly zero or even small $\beta_j$ is rare in practice. Consequently, other explanations for the convergence of $T_j$'s eigenvalues must be sought.

Theorem 7.1.3 Suppose that $j$ steps of the Lanczos algorithm have been performed and that

$$S_j^T T_j S_j = \mathrm{diag}(\theta_1, \cdots, \theta_j)$$

is the Schur decomposition of the tridiagonal matrix $T_j$. If $Y_j \in \mathbb{R}^{n \times j}$ is defined by

$$Y_j = [y_1, \cdots, y_j] = Q_j S_j,$$

then for $i = 1, \cdots, j$ we have

$$\|A y_i - \theta_i y_i\|_2 = |\beta_j| |s_{ji}|, \quad \text{where } S_j = (s_{pq}).$$

Proof: Post-multiplying (7.1.16) by $S_j$ gives

$$A Y_j = Y_j \,\mathrm{diag}(\theta_1, \cdots, \theta_j) + r_j e_j^T S_j,$$

i.e.,

$$A y_i = \theta_i y_i + r_j (e_j^T S_j e_i), \qquad i = 1, \cdots, j.$$

The proof is complete by taking norms and recalling $\|r_j\|_2 = |\beta_j|$.

Remark 7.1.4 The theorem provides error bounds for $T_j$'s eigenvalues:

$$\min_{\mu \in \sigma(A)} |\theta_i - \mu| \le |\beta_j| |s_{ji}|, \qquad i = 1, \cdots, j.$$


Note that in Section 10 the $(\theta_i, y_i)$ are Ritz pairs for the subspace $\mathcal{R}(Q_j)$.
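A small numerical check of Theorem 7.1.3, reusing the hypothetical `lanczos` sketch from above (the local three-term recurrence holds to roundoff even when orthogonality is lost, so the identity should hold to about machine precision):

```python
import numpy as np

rng = np.random.default_rng(0)
n, j = 200, 20
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                                  # random symmetric test matrix
alphas, betas, Q = lanczos(A, rng.standard_normal(n), j)
Tj = np.diag(alphas) + np.diag(betas[:-1], 1) + np.diag(betas[:-1], -1)
theta, S = np.linalg.eigh(Tj)                      # S^T T_j S = diag(theta)
Y = Q[:, :j] @ S                                   # Ritz vectors y_i = Q_j s_i
gap = max(abs(np.linalg.norm(A @ Y[:, i] - theta[i] * Y[:, i])
              - betas[-1] * abs(S[-1, i]))         # |beta_j| |s_ji|
          for i in range(j))
print(f"max deviation from |beta_j||s_ji|: {gap:.1e}")
```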

If we use the Lanczos method to compute $A Q_j = Q_j T_j + r_j e_j^T$ and set $E = \tau w w^T$, where $\tau = \pm 1$ and $w = a q_j + b r_j$, then it can be shown that

$$(A + E) Q_j = Q_j (T_j + \tau a^2 e_j e_j^T) + (1 + \tau a b) r_j e_j^T.$$

If $0 = 1 + \tau a b$, then the eigenvalues of the tridiagonal matrix

$$\tilde{T}_j = T_j + \tau a^2 e_j e_j^T$$

are also eigenvalues of $A + E$. We may then conclude from Theorem 6.1.2 that the intervals $[\lambda_i(T_j), \lambda_{i-1}(T_j)]$, $i = 2, \cdots, j$, each contain an eigenvalue of $A + E$.

Suppose we have an approximate eigenvalue $\tilde{\lambda}$ of $A$. One possibility is to choose $\tau a^2$ so that

$$\det(\tilde{T}_j - \tilde{\lambda} I_j) = (\alpha_j + \tau a^2 - \tilde{\lambda}) p_{j-1}(\tilde{\lambda}) - \beta_{j-1}^2 p_{j-2}(\tilde{\lambda}) = 0,$$

where the polynomial $p_i(x) = \det(T_i - x I_i)$ can be evaluated at $\tilde{\lambda}$ using (5.3).

The following theorems are known as the Kaniel-Paige theory for the estimation of the eigenvalues obtained via the Lanczos algorithm.

Theorem 7.1.4 Let $A$ be an $n \times n$ symmetric matrix with eigenvalues $\lambda_1 \ge \cdots \ge \lambda_n$ and corresponding orthonormal eigenvectors $z_1, \cdots, z_n$. If $\theta_1 \ge \cdots \ge \theta_j$ are the eigenvalues of $T_j$ obtained after $j$ steps of the Lanczos iteration, then

$$\lambda_1 \ge \theta_1 \ge \lambda_1 - \frac{(\lambda_1 - \lambda_n) \tan^2(\phi_1)}{[c_{j-1}(1 + 2\rho_1)]^2},$$

where $\cos \phi_1 = |q_1^T z_1|$, $\rho_1 = (\lambda_1 - \lambda_2)/(\lambda_2 - \lambda_n)$, and $c_{j-1}$ is the Chebyshev polynomial of degree $j - 1$.

Proof: From the Courant-Fischer theorem we have

$$\theta_1 = \max_{y \neq 0} \frac{y^T T_j y}{y^T y} = \max_{y \neq 0} \frac{(Q_j y)^T A (Q_j y)}{(Q_j y)^T (Q_j y)} = \max_{0 \neq w \in \mathcal{K}(q_1, A, j)} \frac{w^T A w}{w^T w}.$$

Since $\lambda_1$ is the maximum of $w^T A w / w^T w$ over all nonzero $w$, it follows that $\lambda_1 \ge \theta_1$. To obtain the lower bound for $\theta_1$, note that

$$\theta_1 = \max_{p \in \mathbb{P}_{j-1}} \frac{q_1^T p(A) A p(A) q_1}{q_1^T p(A)^2 q_1},$$

where $\mathbb{P}_{j-1}$ is the set of all polynomials of degree at most $j - 1$. If

$$q_1 = \sum_{i=1}^{n} d_i z_i,$$

then

$$\frac{q_1^T p(A) A p(A) q_1}{q_1^T p(A)^2 q_1} = \frac{\sum_{i=1}^{n} d_i^2 p(\lambda_i)^2 \lambda_i}{\sum_{i=1}^{n} d_i^2 p(\lambda_i)^2} \ge \lambda_1 - (\lambda_1 - \lambda_n) \frac{\sum_{i=2}^{n} d_i^2 p(\lambda_i)^2}{d_1^2 p(\lambda_1)^2 + \sum_{i=2}^{n} d_i^2 p(\lambda_i)^2}.$$

We can make the lower bound tight by selecting a polynomial $p(x)$ that is large at $x = \lambda_1$ in comparison with its values at the remaining eigenvalues. Set

$$p(x) = c_{j-1}\left[-1 + 2\, \frac{x - \lambda_n}{\lambda_2 - \lambda_n}\right],$$

where $c_{j-1}(z)$ is the $(j-1)$-th Chebyshev polynomial, generated by the recurrence

$$c_j(z) = 2 z c_{j-1}(z) - c_{j-2}(z), \qquad c_0 = 1, \quad c_1 = z.$$

These polynomials are bounded by unity on $[-1, 1]$. It follows that $|p(\lambda_i)|$ is bounded by unity for $i = 2, \cdots, n$, while $p(\lambda_1) = c_{j-1}(1 + 2\rho_1)$. Thus,

$$\theta_1 \ge \lambda_1 - (\lambda_1 - \lambda_n) \, \frac{1 - d_1^2}{d_1^2} \cdot \frac{1}{c_{j-1}^2(1 + 2\rho_1)}.$$

The desired lower bound is obtained by noting that $\tan^2(\phi_1) = (1 - d_1^2)/d_1^2$.

Corollary 7.1.5 Using the same notation as in Theorem 7.1.4,

$$\lambda_n \le \theta_j \le \lambda_n + \frac{(\lambda_1 - \lambda_n) \tan^2(\phi_n)}{c_{j-1}^2(1 + 2\rho_n)},$$

where $\rho_n = (\lambda_{n-1} - \lambda_n)/(\lambda_1 - \lambda_{n-1})$ and $\cos(\phi_n) = |q_1^T z_n|$.

Proof: Apply Theorem 7.1.4 with $A$ replaced by $-A$.

Example 7.1.1 Take $\lambda_n = 0$, so that $1 + 2\rho_1 = 2\lambda_1/\lambda_2 - 1$, and compare the Lanczos bound

$$L_{j-1} = \frac{1}{[c_{j-1}(2\lambda_1/\lambda_2 - 1)]^2} = \frac{1}{[c_{j-1}(1 + 2\rho_1)]^2}$$

with the corresponding power method bound $R_{j-1} = (\lambda_2/\lambda_1)^{2(j-1)}$:

$$\begin{array}{c|c|c}
\lambda_1/\lambda_2 & j = 5:\ L_{j-1} \,/\, R_{j-1} & j = 25:\ L_{j-1} \,/\, R_{j-1} \\ \hline
1.5 & 1.1 \times 10^{-4} \,/\, 3.9 \times 10^{-2} & 1.4 \times 10^{-27} \,/\, 3.5 \times 10^{-9} \\
1.01 & 5.6 \times 10^{-1} \,/\, 9.2 \times 10^{-1} & 2.8 \times 10^{-4} \,/\, 6.2 \times 10^{-1}
\end{array}$$

The Lanczos bound decays far faster than the power method bound.
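The entries of this table can be reproduced from the Chebyshev recurrence $c_j(z) = 2 z c_{j-1}(z) - c_{j-2}(z)$; a small sketch (the helper `cheb` is ours):

```python
def cheb(j, z):
    """Chebyshev polynomial c_j(z) via c_j = 2 z c_{j-1} - c_{j-2}, c_0 = 1, c_1 = z."""
    c_prev, c = 1.0, z
    if j == 0:
        return c_prev
    for _ in range(j - 1):
        c_prev, c = c, 2.0 * z * c - c_prev
    return c

for ratio in (1.5, 1.01):                            # lambda_1 / lambda_2
    for j in (5, 25):
        L = cheb(j - 1, 2.0 * ratio - 1.0) ** (-2)   # Lanczos bound L_{j-1}
        R = (1.0 / ratio) ** (2 * (j - 1))           # power method bound R_{j-1}
        print(f"ratio={ratio}, j={j}: L={L:.1e}, R={R:.1e}")
```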

Rounding errors greatly affect the behavior of Algorithm 7.1.1, the Lanczos iteration. The basic difficulty is caused by loss of orthogonality among the Lanczos vectors. To avoid these difficulties we can reorthogonalize the Lanczos vectors.


7.1.1 Reorthogonalization

Since

$$A Q_j = Q_j T_j + r_j e_j^T,$$

let

$$A Q_j - Q_j T_j = r_j e_j^T + F_j, \qquad (7.1.17)$$
$$I - Q_j^T Q_j = C_j^T + \Delta_j + C_j, \qquad (7.1.18)$$

where $C_j$ is strictly upper triangular and $\Delta_j$ is diagonal. (For simplicity, suppose $(C_j)_{i,i+1} = 0$ and $\Delta_j = 0$.)

Definition 7.1.1 $\theta_i$ and $y_i \equiv Q_j s_i$ are called a Ritz value and a Ritz vector, respectively, if $T_j s_i = \theta_i s_i$.

Let $\Theta_j \equiv \mathrm{diag}(\theta_1, \cdots, \theta_j) = S_j^T T_j S_j$, where $S_j = [s_1, \cdots, s_j]$.

Theorem 7.1.6 (Paige Theorem) Assume that (a) $S_j$ and $\Theta_j$ are exact (justified since $j \ll n$), and (b) local orthogonality is maintained (i.e., $q_{i+1}^T q_i = 0$ for $i = 1, \ldots, j-1$, $r_j^T q_j = 0$, and $(C_j)_{i,i+1} = 0$). Let

$$F_j^T Q_j - Q_j^T F_j = K_j - K_j^T,$$
$$\Delta_j T_j - T_j \Delta_j \equiv N_j - N_j^T,$$
$$G_j = S_j^T (K_j + N_j) S_j \equiv (r_{ik}).$$

Then:

(a) $y_i^T q_{j+1} = r_{ii} / \beta_{ji}$, where $y_i = Q_j s_i$ and $\beta_{ji} = \beta_j s_{ji}$.

(b) For $i \neq k$,

$$(\theta_i - \theta_k) \, y_i^T y_k = r_{ii} \, \frac{s_{jk}}{s_{ji}} - r_{kk} \, \frac{s_{ji}}{s_{jk}} - (r_{ik} - r_{ki}). \qquad (7.1.19)$$

Proof: Multiplying (7.1.17) from the left by $Q_j^T$, we get

$$Q_j^T A Q_j - Q_j^T Q_j T_j = Q_j^T r_j e_j^T + Q_j^T F_j, \qquad (7.1.20)$$

and transposing gives

$$Q_j^T A Q_j - T_j Q_j^T Q_j = e_j r_j^T Q_j + F_j^T Q_j. \qquad (7.1.21)$$

Subtracting (7.1.20) from (7.1.21) and using (7.1.18), we have

$$(Q_j^T r_j) e_j^T - e_j (Q_j^T r_j)^T = (C_j^T T_j - T_j C_j^T) + (C_j T_j - T_j C_j) + (\Delta_j T_j - T_j \Delta_j) + F_j^T Q_j - Q_j^T F_j$$
$$= (C_j^T T_j - T_j C_j^T) + (C_j T_j - T_j C_j) + (N_j - N_j^T) + (K_j - K_j^T).$$

This implies that

$$(Q_j^T r_j) e_j^T = C_j T_j - T_j C_j + N_j + K_j.$$

Thus,

$$y_i^T q_{j+1} \, \beta_{ji} = s_i^T (Q_j^T r_j) e_j^T s_i = s_i^T (C_j T_j - T_j C_j) s_i + s_i^T (N_j + K_j) s_i = (s_i^T C_j s_i) \theta_i - \theta_i (s_i^T C_j s_i) + r_{ii} = r_{ii},$$

which implies that

$$y_i^T q_{j+1} = \frac{r_{ii}}{\beta_{ji}}.$$

Similarly, (7.1.19) is obtained by multiplying (7.1.20) from the left by $s_i^T$ and from the right by $s_k$.

Remark 7.1.5 Since

$$y_i^T q_{j+1} = \frac{r_{ii}}{\beta_{ji}} = \begin{cases} O(\mathrm{eps}), & \text{if } |\beta_{ji}| = O(1) \quad \text{(not yet converged)}, \\ O(1), & \text{if } |\beta_{ji}| = O(\mathrm{eps}) \quad \text{(converged for } (\theta_i, y_i)\text{)}, \end{cases}$$

we have $q_{j+1}^T y_i = O(1)$ when the algorithm converges; that is, $q_{j+1}$ is far from orthogonal to $\mathrm{span}\{Q_j\}$, where $Q_j s_i = y_i$.

(i) Full reorthogonalization by MGS:

Orthogonalize $q_{j+1}$ against all of $q_1, \cdots, q_j$ by

$$q_{j+1} := q_{j+1} - \sum_{i=1}^{j} (q_{j+1}^T q_i) q_i.$$
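In code, one MGS pass against the stored Lanczos vectors might look as follows (a sketch; `Q` holds the $n \times j$ matrix of Lanczos vectors computed so far):

```python
import numpy as np

def mgs_reorthogonalize(r, Q):
    """Orthogonalize r against the columns of Q by modified Gram-Schmidt."""
    for i in range(Q.shape[1]):
        r = r - (Q[:, i] @ r) * Q[:, i]   # subtract the projection onto q_i
    return r
```

Calling this on $r_j$ just before the normalization $q_{j+1} = r_j / \beta_j$ keeps the Lanczos vectors orthogonal to working accuracy, at a cost of $O(nj)$ flops per step.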

If we incorporate the Householder computations into the Lanczos process, we can produce Lanczos vectors that are orthogonal to working accuracy:

$r_0 := q_1$ (given unit vector)
Determine $P_0 = I - 2 v_0 v_0^T / v_0^T v_0$ so that $P_0 r_0 = e_1$; $\alpha_1 := q_1^T A q_1$;
Do $j = 1, \cdots, n-1$
    $r_j := (A - \alpha_j I) q_j - \beta_{j-1} q_{j-1}$ $(\beta_0 q_0 \equiv 0)$,
    $w := (P_{j-1} \cdots P_0) r_j$,
    Determine $P_j = I - 2 v_j v_j^T / v_j^T v_j$ such that $P_j w = (w_1, \cdots, w_j, \beta_j, 0, \cdots, 0)^T$,
    $q_{j+1} := (P_0 \cdots P_j) e_{j+1}$,
    $\alpha_{j+1} := q_{j+1}^T A q_{j+1}$.

This is the complete reorthogonalization Lanczos scheme.


(ii) Selective reorthogonalization by MGS: if $|\beta_{ji}| = O(\sqrt{\mathrm{eps}})$, so that $(\theta_i, y_i)$ is a "good" Ritz pair, orthogonalize $q_{j+1}$ against $q_1, \ldots, q_j$; otherwise do no reorthogonalization.

(iii) Restart after $m$ steps (and then do full reorthogonalization).

(iv) Partial reorthogonalization: reorthogonalize against the previous $k$ (e.g. $k = 5$) Lanczos vectors.

For details see the books:

Parlett, "The Symmetric Eigenvalue Problem" (1980), pp. 257ff.

Golub & Van Loan, "Matrix Computations" (1983), pp. 332ff.

Concerning (7.1.19): duplicate Ritz pairs can occur! For $i \neq k$,

$$(\theta_i - \theta_k) \, y_i^T y_k = O(\mathrm{eps}), \quad \text{with } y_i^T y_k = O(1) \text{ if } y_i \approx y_k,$$

so nearly equal Ritz vectors force $\theta_i \approx \theta_k$. How can duplicate pairs be avoided? The answer is to use the implicitly restarted Lanczos algorithm.

Let

$$A Q_j = Q_j T_j + r_j e_j^T$$

be a Lanczos decomposition.

• In principle, we can keep expanding the Lanczos decomposition until the Ritz pairs have converged.

• Unfortunately, the expansion is limited by the amount of memory available for storing $Q_j$.

• We therefore restart the Lanczos process once $j$ becomes so large that we cannot store $Q_j$: this is the implicitly restarting method.

• We must choose a new starting vector for the underlying Krylov sequence.

• A natural choice is a linear combination of the Ritz vectors that we are interested in.

7.1.2 Filter polynomials

Assume $A$ has a complete system of eigenpairs $(\lambda_i, x_i)$ and we are interested in the first $k$ of these eigenpairs. Expand $u_1$ in the form

$$u_1 = \sum_{i=1}^{k} \gamma_i x_i + \sum_{i=k+1}^{n} \gamma_i x_i.$$

If $p$ is any polynomial, we have

$$p(A) u_1 = \sum_{i=1}^{k} \gamma_i p(\lambda_i) x_i + \sum_{i=k+1}^{n} \gamma_i p(\lambda_i) x_i.$$


• Choose p so that the values p(λi) (i = k +1, . . . , n) are small compared to the values p(λi) (i = 1, . . . , k).

• Then p(A)u1 is rich in the components of the xi that we want and deficient in the ones that we do not want.

• p is called a filter polynomial.

• Suppose we have Ritz values µ1, . . . , µm and µk+1, . . . , µm are not interesting. Then take

p(t) = (t − µk+1) · · · (t − µm).
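A filter polynomial in this product form can be applied with one matrix-vector product per root; a minimal sketch (the helper name `apply_filter` and the renormalization are ours):

```python
import numpy as np

def apply_filter(A, u, roots):
    """Vector proportional to p(A)u for p(t) = prod over mu in roots of (t - mu)."""
    for mu in roots:
        u = A @ u - mu * u            # u := (A - mu I) u
        u = u / np.linalg.norm(u)     # rescale to avoid overflow/underflow
    return u
```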

7.1.3 Implicitly restarted algorithm

Let

$$A Q_m = Q_m T_m + \beta_m q_{m+1} e_m^T \qquad (7.1.22)$$

be a Lanczos decomposition of order $m$. We choose a filter polynomial $p$ of degree $m - k$ and use the implicit restarting process to reduce the decomposition to a decomposition

$$A \tilde{Q}_k = \tilde{Q}_k \tilde{T}_k + \tilde{\beta}_k \tilde{q}_{k+1} e_k^T$$

of order $k$ with starting vector $p(A) u_1$.

Let $\nu_1, \ldots, \nu_m$ be the eigenvalues of $T_m$ and suppose that $\nu_1, \ldots, \nu_{m-k}$ correspond to the part of the spectrum we are not interested in. Then take

$$p(t) = (t - \nu_1)(t - \nu_2) \cdots (t - \nu_{m-k}).$$

The starting vector $p(A) u_1$ is equal to

$$p(A) u_1 = (A - \nu_{m-k} I) \cdots (A - \nu_2 I)(A - \nu_1 I) u_1 = (A - \nu_{m-k} I) \left[ \cdots \left[ (A - \nu_2 I) \left[ (A - \nu_1 I) u_1 \right] \right] \right].$$

In the first step, we construct a Lanczos decomposition with starting vector $(A - \nu_1 I) u_1$. From (7.1.22), we have

$$(A - \nu_1 I) Q_m = Q_m (T_m - \nu_1 I) + \beta_m q_{m+1} e_m^T = Q_m U_1 R_1 + \beta_m q_{m+1} e_m^T, \qquad (7.1.23)$$

where

$$T_m - \nu_1 I = U_1 R_1$$

is the QR factorization of $T_m - \nu_1 I$. Postmultiplying by $U_1$, we get

$$(A - \nu_1 I)(Q_m U_1) = (Q_m U_1)(R_1 U_1) + \beta_m q_{m+1} (e_m^T U_1).$$

It follows that

$$A Q_m^{(1)} = Q_m^{(1)} T_m^{(1)} + \beta_m q_{m+1} b_{m+1}^{(1)T},$$

where

$$Q_m^{(1)} = Q_m U_1, \qquad T_m^{(1)} = R_1 U_1 + \nu_1 I, \qquad b_{m+1}^{(1)T} = e_m^T U_1.$$

($Q_m^{(1)}$ is the result of one step of the single shifted QR algorithm.)
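One such shifted QR step is easy to express with a dense QR factorization; a sketch (for the tridiagonal $T_m$ one would use Givens rotations in practice, and the explicit symmetrization only removes roundoff):

```python
import numpy as np

def shifted_qr_step(T, Q, nu):
    """One shifted QR step: T - nu I = U R, then T^(1) = R U + nu I, Q^(1) = Q U."""
    m = T.shape[0]
    U, R = np.linalg.qr(T - nu * np.eye(m))
    T1 = R @ U + nu * np.eye(m)        # similar to T: upper Hessenberg and symmetric,
    T1 = (T1 + T1.T) / 2               # hence tridiagonal (up to roundoff)
    return T1, Q @ U, U[-1, :]         # last row of U is b_{m+1}^{(1)T} = e_m^T U
```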


Remark 7.1.6

• $Q_m^{(1)}$ is orthonormal.

• By the definition of $T_m^{(1)}$, we get

$$U_1 T_m^{(1)} U_1^T = U_1 (R_1 U_1 + \nu_1 I) U_1^T = U_1 R_1 + \nu_1 I = T_m. \qquad (7.1.24)$$

Therefore, $\nu_1, \nu_2, \ldots, \nu_m$ are also the eigenvalues of $T_m^{(1)}$.

• Since $T_m$ is tridiagonal and $U_1$ is the Q-factor of the QR factorization of $T_m - \nu_1 I$, both $U_1$ and $T_m^{(1)}$ are upper Hessenberg. From (7.1.24), $T_m^{(1)}$ is symmetric. Therefore, $T_m^{(1)}$ is also tridiagonal.

• The vector $b_{m+1}^{(1)T} = e_m^T U_1$ has the form

$$b_{m+1}^{(1)T} = \begin{bmatrix} 0 & \cdots & 0 & U_{m-1,m}^{(1)} & U_{m,m}^{(1)} \end{bmatrix};$$

i.e., only the last two components of $b_{m+1}^{(1)}$ are nonzero.

• Postmultiplying (7.1.23) by $e_1$, we get

$$(A - \nu_1 I) q_1 = (A - \nu_1 I)(Q_m e_1) = Q_m^{(1)} R_1 e_1 = r_{11}^{(1)} q_1^{(1)}.$$

Since $T_m$ is unreduced, $r_{11}^{(1)}$ is nonzero. Therefore, the first column of $Q_m^{(1)}$ is a multiple of $(A - \nu_1 I) q_1$.

Repeating this process with $\nu_2, \ldots, \nu_{m-k}$, the result is a Krylov decomposition

$$A Q_m^{(m-k)} = Q_m^{(m-k)} T_m^{(m-k)} + \beta_m q_{m+1} b_{m+1}^{(m-k)T}$$

with the following properties:

i. $Q_m^{(m-k)}$ is orthonormal.

ii. $T_m^{(m-k)}$ is tridiagonal.

iii. The first $k - 1$ components of $b_{m+1}^{(m-k)T}$ are zero.

iv. The first column of $Q_m^{(m-k)}$ is a multiple of $(A - \nu_1 I) \cdots (A - \nu_{m-k} I) q_1$.

Corollary 7.1.1 Let $\nu_1, \ldots, \nu_m$ be the eigenvalues of $T_m$. If the implicitly restarted QR step is performed with shifts $\nu_1, \ldots, \nu_{m-k}$, then the matrix $T_m^{(m-k)}$ has the form

$$T_m^{(m-k)} = \begin{bmatrix} T_{kk}^{(m-k)} & T_{k,m-k}^{(m-k)} \\ 0 & T_{k+1,k+1}^{(m-k)} \end{bmatrix},$$

where $T_{k+1,k+1}^{(m-k)}$ is an upper triangular matrix with the Ritz values $\nu_1, \ldots, \nu_{m-k}$ on its diagonal.


Therefore, the first $k$ columns of the decomposition can be written in the form

$$A Q_k^{(m-k)} = Q_k^{(m-k)} T_{kk}^{(m-k)} + t_{k+1,k} \, q_{k+1}^{(m-k)} e_k^T + \beta_m u_{mk} \, q_{m+1} e_k^T,$$

where $Q_k^{(m-k)}$ consists of the first $k$ columns of $Q_m^{(m-k)}$, $T_{kk}^{(m-k)}$ is the leading principal submatrix of order $k$ of $T_m^{(m-k)}$, and $u_{mk}$ is from the matrix $U = U_1 \cdots U_{m-k}$. Hence if we set

$$\tilde{Q}_k = Q_k^{(m-k)}, \qquad \tilde{T}_k = T_{kk}^{(m-k)},$$
$$\tilde{\beta}_k = \| t_{k+1,k} \, q_{k+1}^{(m-k)} + \beta_m u_{mk} \, q_{m+1} \|_2,$$
$$\tilde{q}_{k+1} = \tilde{\beta}_k^{-1} ( t_{k+1,k} \, q_{k+1}^{(m-k)} + \beta_m u_{mk} \, q_{m+1} ),$$

then

$$A \tilde{Q}_k = \tilde{Q}_k \tilde{T}_k + \tilde{\beta}_k \tilde{q}_{k+1} e_k^T$$

is a Lanczos decomposition whose starting vector is proportional to $(A - \nu_1 I) \cdots (A - \nu_{m-k} I) q_1$.

• We avoid any matrix-vector multiplications in forming the new starting vector.

• We get the Lanczos decomposition of order $k$ for free.

• For large $n$ the major cost will be in computing $QU$.

7.2 Approximation from a subspace

Assume that $A$ is symmetric and let $\{(\alpha_i, z_i)\}_{i=1}^{n}$ be the eigenpairs of $A$ with $\alpha_1 \le \alpha_2 \le \cdots \le \alpha_n$. Define the Rayleigh quotient

$$\rho(x) = \rho(x, A) = \frac{x^T A x}{x^T x}.$$

Algorithm 7.2.1 (Rayleigh-Ritz-Quotient procedure)

Given a subspace $S^{(m)} = \mathrm{span}\{Q\}$ with $Q^T Q = I_m$;
Set $H := \rho(Q) = Q^T A Q$;
Compute the $p$ $(\le m)$ eigenpairs of $H$ which are of interest, say $H g_i = \theta_i g_i$ for $i = 1, \ldots, p$;
Compute the Ritz vectors $y_i = Q g_i$, for $i = 1, \ldots, p$;
Check whether $\|A y_i - \theta_i y_i\|_2 \le \mathrm{Tol}$, for $i = 1, \ldots, p$.
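A direct transcription of Algorithm 7.2.1 as a sketch (we take the $p$ smallest eigenpairs of $H$ as the "pairs of interest"; that choice is ours):

```python
import numpy as np

def rayleigh_ritz(A, Q, p, tol=1e-8):
    """Ritz pairs of A from an orthonormal basis Q of the subspace S^(m)."""
    H = Q.T @ A @ Q                    # H = rho(Q) = Q^T A Q
    theta, G = np.linalg.eigh(H)       # H g_i = theta_i g_i (ascending order)
    theta, G = theta[:p], G[:, :p]     # keep the p pairs of interest
    Y = Q @ G                          # Ritz vectors y_i = Q g_i
    res = np.linalg.norm(A @ Y - Y * theta, axis=0)
    return theta, Y, res <= tol        # check ||A y_i - theta_i y_i||_2 <= Tol
```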

By the minimax characterization of eigenvalues, we have

$$\alpha_j = \lambda_j(A) = \min_{F^j \subseteq \mathbb{R}^n} \max_{f \in F^j} \rho(f, A),$$

where $F^j$ runs over all $j$-dimensional subspaces of $\mathbb{R}^n$.


Define

$$\beta_j = \min_{G^j \subseteq S^{(m)}} \max_{g \in G^j} \rho(g, A), \qquad j \le m.$$

Since $G^j \subseteq S^{(m)}$ and $S^{(m)} = \mathrm{span}\{Q\}$, each $G^j$ can be written as $G^j = Q \widetilde{G}^j$ for some $\widetilde{G}^j \subseteq \mathbb{R}^m$. Therefore,

$$\beta_j = \min_{\widetilde{G}^j \subseteq \mathbb{R}^m} \max_{s \in \widetilde{G}^j} \rho(s, H) = \lambda_j(H) \equiv \theta_j, \qquad j = 1, \ldots, m.$$

For every $m \times m$ matrix $B$ there is an associated residual matrix $R(B) \equiv AQ - QB$.

Theorem 7.2.1 For a given orthonormal $n \times m$ matrix $Q$,

$$\|R(H)\| \le \|R(B)\|$$

for all $m \times m$ matrices $B$.

Proof: Since

$$R(B)^T R(B) = Q^T A^2 Q - B^T (Q^T A Q) - (Q^T A Q) B + B^T B = Q^T A^2 Q - H^2 + (H - B)^T (H - B) = R(H)^T R(H) + (H - B)^T (H - B),$$

and $(H - B)^T (H - B)$ is positive semidefinite, it follows that $\|R(B)\|^2 \ge \|R(H)\|^2$.

Since

$$H g_i = \theta_i g_i, \qquad i = 1, \ldots, m,$$

we have $Q^T A Q \, g_i = \theta_i g_i$, which implies

$$Q Q^T A (Q g_i) = \theta_i (Q g_i).$$

Let $y_i = Q g_i$. Then $Q Q^T y_i = Q (Q^T Q) g_i = y_i$. Take $P_Q = Q Q^T$, the orthogonal projection onto $\mathrm{span}\{Q\}$. Then

$$(Q Q^T) A y_i = \theta_i (Q Q^T) y_i,$$

which implies

$$P_Q (A y_i - \theta_i y_i) = 0, \qquad \text{i.e.,} \qquad r_i = A y_i - \theta_i y_i \perp S^{(m)} = \mathrm{span}\{Q\}.$$
