## Chapter 7

## Lanczos Methods

In this chapter we develop the Lanczos method, a technique that is applicable to large,
sparse, symmetric eigenproblems. The method involves tridiagonalizing the given matrix
$A$. However, unlike the Householder approach, no intermediate (and full) submatrices
are generated. Equally important, information about $A$'s extremal eigenvalues tends to
emerge long before the tridiagonalization is complete. This makes the Lanczos algorithm
particularly useful in situations where a few of $A$'s largest or smallest eigenvalues are
desired.

### 7.1 The Lanczos Algorithm

Suppose $A \in \mathbb{R}^{n\times n}$ is large, sparse and symmetric. There exists an orthogonal matrix $Q$ which transforms $A$ to a tridiagonal matrix $T$:

$$Q^T A Q = T \equiv \text{tridiagonal}. \tag{7.1.1}$$

**Remark 7.1.1** (a) Such a $Q$ can be generated by Householder transformations or Givens rotations.

(b) For almost all $A$ (i.e., all eigenvalues distinct) and almost any $q_1 \in \mathbb{R}^n$ with $\|q_1\|_2 = 1$, there exists an orthogonal matrix $Q$ with first column $q_1$ satisfying (7.1.1). Moreover, $q_1$ determines $T$ uniquely up to the signs of the columns of $Q$ (that is, we can multiply each column by $-1$).

Let $x \in \mathbb{R}^n$ and define

$$K[x, A, m] = [x, Ax, A^2x, \cdots, A^{m-1}x] \in \mathbb{R}^{n\times m}. \tag{7.1.2}$$

$K[x, A, m]$ is called a Krylov matrix. Let

$$\mathcal{K}(x, A, m) = \mathrm{Range}(K[x, A, m]) = \mathrm{Span}(x, Ax, \cdots, A^{m-1}x). \tag{7.1.3}$$

$\mathcal{K}(x, A, m)$ is called the Krylov subspace generated by $K[x, A, m]$.
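As a small illustration, the Krylov matrix (7.1.2) can be built with one matrix-vector product per column. The following NumPy sketch (the helper name `krylov_matrix` is ours, not from the text) follows the definition directly:

```python
import numpy as np

def krylov_matrix(A, x, m):
    """Build K[x, A, m] = [x, Ax, ..., A^{m-1} x] of (7.1.2),
    using one matrix-vector product per column."""
    K = np.empty((A.shape[0], m))
    K[:, 0] = x
    for j in range(1, m):
        K[:, j] = A @ K[:, j - 1]   # A^j x = A (A^{j-1} x)
    return K
```

The Krylov subspace (7.1.3) is the column space of this matrix; in practice one works with an orthonormal basis of it (the Lanczos vectors below) rather than with the increasingly ill-conditioned columns $A^j x$ themselves.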

**Remark 7.1.2** For each $H \in \mathbb{C}^{n\times m}$ (or $\mathbb{R}^{n\times m}$), $m \le n$, with $\mathrm{rank}(H) = m$, there exist a $Q \in \mathbb{C}^{n\times m}$ (or $\mathbb{R}^{n\times m}$) with $Q^* Q = I_m$ and an upper triangular $R \in \mathbb{C}^{m\times m}$ (or $\mathbb{R}^{m\times m}$) such that

$$H = QR. \tag{7.1.4}$$

$Q$ is uniquely determined if we require all $r_{ii} > 0$.

**Theorem 7.1.1** Let $A$ be symmetric (Hermitian), let $1 \le m \le n$ be given, and suppose $\dim \mathcal{K}(x, A, m) = m$. Then:

(a) If

$$K[x, A, m] = Q_m R_m \tag{7.1.5}$$

is a QR factorization, then $Q_m^* A Q_m = T_m$ is an $m \times m$ tridiagonal matrix and satisfies

$$A Q_m = Q_m T_m + r_m e_m^T, \qquad Q_m^* r_m = 0. \tag{7.1.6}$$

(b) Let $\|x\|_2 = 1$. If $Q_m \in \mathbb{C}^{n\times m}$ has first column $x$, satisfies $Q_m^* Q_m = I_m$, and

$$A Q_m = Q_m T_m + r_m e_m^T,$$

where $T_m$ is tridiagonal, then

$$K[x, A, m] = [x, Ax, \cdots, A^{m-1}x] = Q_m [e_1, T_m e_1, \cdots, T_m^{m-1} e_1] \tag{7.1.7}$$

is a QR factorization of $K[x, A, m]$.

**Proof:** (a) Since

$$A\,\mathcal{K}(x, A, j) \subset \mathcal{K}(x, A, j+1), \qquad j < m, \tag{7.1.8}$$

and, from (7.1.5),

$$\mathrm{Span}(q_1, \cdots, q_i) = \mathcal{K}(x, A, i), \qquad i \le m, \tag{7.1.9}$$

we have

$$q_{i+1} \perp \mathcal{K}(x, A, i) \overset{(7.1.8)}{\supset} A\,\mathcal{K}(x, A, i-1) = A\,\mathrm{Span}(q_1, \cdots, q_{i-1}).$$

This implies

$$q_{i+1}^* A q_j = 0, \qquad j = 1, \cdots, i-1, \quad i+1 \le m.$$

That is,

$$(Q_m^* A Q_m)_{ij} = (T_m)_{ij} = q_i^* A q_j = 0 \quad \text{for } i > j+1.$$

So $T_m$ is upper Hessenberg and hence tridiagonal (since $T_m$ is Hermitian).

It remains to show (7.1.6). Let $S$ denote the $m \times m$ down-shift matrix

$$S = \begin{bmatrix} 0 & & & 0 \\ 1 & \ddots & & \\ & \ddots & \ddots & \\ 0 & & 1 & 0 \end{bmatrix}.$$

Since

$$[x, Ax, \cdots, A^{m-1}x] = Q_m R_m$$

and

$$A\,K[x, A, m] = K[x, A, m]\, S + A^m x\, e_m^T,$$

we have

$$A Q_m R_m = Q_m R_m S + Q_m Q_m^* A^m x\, e_m^T + (I - Q_m Q_m^*) A^m x\, e_m^T.$$

Then

$$A Q_m = Q_m \left[ R_m S + Q_m^* A^m x\, e_m^T \right] R_m^{-1} + (I - Q_m Q_m^*) A^m x\, e_m^T R_m^{-1}$$

$$= Q_m \left[ R_m S R_m^{-1} + \gamma\, Q_m^* A^m x\, e_m^T \right] + \underbrace{\gamma (I - Q_m Q_m^*) A^m x}_{r_m}\, e_m^T$$

$$= Q_m H_m + r_m e_m^T \quad \text{with} \quad Q_m^* r_m = 0,$$

where we used $e_m^T R_m^{-1} = \gamma e_m^T$ with $\gamma = 1/(R_m)_{mm}$, and where $H_m$ is an upper Hessenberg matrix. But $Q_m^* A Q_m = H_m$ is Hermitian, so $H_m = T_m$ is tridiagonal.

(b) We check (7.1.7) by induction. The first columns coincide: $x = Q_m e_1$. Suppose the $i$-th columns are equal, i.e., $A^{i-1} x = Q_m T_m^{i-1} e_1$. Then

$$A^i x = A Q_m T_m^{i-1} e_1 = (Q_m T_m + r_m e_m^T)\, T_m^{i-1} e_1 = Q_m T_m^i e_1 + r_m\, e_m^T T_m^{i-1} e_1.$$

But $e_m^T T_m^{i-1} e_1 = 0$ for $i < m$ (since $T_m$ is tridiagonal). Therefore $A^i x = Q_m T_m^i e_1$, i.e., the $(i+1)$-th columns are equal.

It is clear that $[e_1, T_m e_1, \cdots, T_m^{m-1} e_1]$ is upper triangular.

**Theorem 7.1.1** If $x = q_1$ with $\|q_1\|_2 = 1$ satisfies

$$\mathrm{rank}(K[x, A, n]) = n$$

(that is, $\{x, Ax, \cdots, A^{n-1}x\}$ is linearly independent), then there exists a unitary matrix $Q$ with first column $q_1$ such that $Q^* A Q = T$ is tridiagonal.

**Proof:** Apply Theorem 7.1.1(a) with $m = n$; then $Q_n = Q$ is unitary and $AQ = QT$.

Uniqueness: Let $Q^* A Q = T$, $\tilde{Q}^* A \tilde{Q} = \tilde{T}$ and $Q e_1 = \tilde{Q} e_1$. Then

$$K[q_1, A, n] = QR = \tilde{Q}\tilde{R} \quad \Rightarrow \quad Q = \tilde{Q} D, \quad R = D \tilde{R}.$$

Substituting $QD$ for $Q$, where $D = \mathrm{diag}(\epsilon_1, \cdots, \epsilon_n)$ with $|\epsilon_i| = 1$, gives

$$(QD)^* A (QD) = D^* Q^* A Q D = D^* T D = \text{tridiagonal}.$$

So $Q$ is unique up to multiplying its columns by factors $\epsilon$ with $|\epsilon| = 1$.

In the following we investigate the Lanczos algorithm for the real case, i.e., $A \in \mathbb{R}^{n\times n}$. How do we find an orthogonal matrix $Q = [q_1, \cdots, q_n]$ with $Q^T Q = I_n$ such that $Q^T A Q = T$ is tridiagonal ($Q$ being almost uniquely determined)? Let

$$AQ = QT, \tag{7.1.10}$$

where

$$Q = [q_1, \cdots, q_n] \quad \text{and} \quad T = \begin{bmatrix} \alpha_1 & \beta_1 & & 0 \\ \beta_1 & \ddots & \ddots & \\ & \ddots & \ddots & \beta_{n-1} \\ 0 & & \beta_{n-1} & \alpha_n \end{bmatrix}.$$

Comparing the $j$-th columns of (7.1.10) gives

$$A q_j = \beta_{j-1} q_{j-1} + \alpha_j q_j + \beta_j q_{j+1} \tag{7.1.11}$$

for $j = 1, \cdots, n$ with $\beta_0 = \beta_n = 0$. Multiplying (7.1.11) by $q_j^T$ we obtain

$$q_j^T A q_j = \alpha_j. \tag{7.1.12}$$

Define $r_j = (A - \alpha_j I) q_j - \beta_{j-1} q_{j-1}$. Then

$$r_j = \beta_j q_{j+1} \quad \text{with} \quad \beta_j = \pm\|r_j\|_2, \tag{7.1.13}$$

and if $\beta_j \ne 0$ then

$$q_{j+1} = r_j / \beta_j. \tag{7.1.14}$$

So we can determine the unknowns $\alpha_j, \beta_j, q_j$ in the following order: given $q_1$; then $\alpha_1, r_1, \beta_1, q_2, \alpha_2, r_2, \beta_2, q_3, \cdots$.
These formulas define the Lanczos iteration:

$$
\begin{aligned}
& j = 0,\ r_0 = q_1,\ \beta_0 = 1,\ q_0 = 0 \\
& \text{Do while } (\beta_j \ne 0) \\
& \qquad q_{j+1} = r_j/\beta_j, \quad j := j+1, \\
& \qquad \alpha_j = q_j^T A q_j, \\
& \qquad r_j = (A - \alpha_j I)\, q_j - \beta_{j-1} q_{j-1}, \\
& \qquad \beta_j = \|r_j\|_2.
\end{aligned}
\tag{7.1.15}
$$

There is no loss of generality in choosing the $\beta_j$ to be positive. The $q_j$ are called Lanczos vectors. With careful overwriting and use of the formula $\alpha_j = q_j^T (A q_j - \beta_{j-1} q_{j-1})$, the whole process can be implemented with only a pair of $n$-vectors.
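A minimal NumPy sketch of iteration (7.1.15) follows (the function name and return convention are ours; a production code would exploit sparsity and would not store all of $Q$ unless needed):

```python
import numpy as np

def lanczos(A, q1, m):
    """Run m steps of the Lanczos iteration (7.1.15).
    Returns Q_m with (ideally) orthonormal columns and the tridiagonal
    entries: alpha (diagonal) and beta (subdiagonal, length m-1)."""
    n = A.shape[0]
    Q = np.zeros((n, m))
    alpha, beta = np.zeros(m), np.zeros(m)
    q_prev, q = np.zeros(n), q1 / np.linalg.norm(q1)
    for j in range(m):
        Q[:, j] = q
        r = A @ q - beta[j - 1] * q_prev if j > 0 else A @ q
        alpha[j] = q @ r                 # alpha_j = q_j^T (A q_j - beta_{j-1} q_{j-1})
        r = r - alpha[j] * q
        beta[j] = np.linalg.norm(r)
        if beta[j] == 0.0:               # invariant subspace found: stop early
            break
        q_prev, q = q, r / beta[j]
    return Q, alpha, beta[:m - 1]
```

Then $Q_m^T A Q_m$ reproduces the tridiagonal matrix with diagonal $\alpha$ and subdiagonal $\beta$, as long as orthogonality has not yet been lost in floating-point arithmetic (see Section 7.1.1).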

**Algorithm 7.1.1 (Lanczos Algorithm)** Given a symmetric $A \in \mathbb{R}^{n\times n}$ and $w \in \mathbb{R}^n$ having unit 2-norm, the following algorithm computes a $j \times j$ symmetric tridiagonal matrix $T_j$ with the property that $\sigma(T_j) \subset \sigma(A)$. The diagonal and subdiagonal elements of $T_j$ are stored in $\alpha_1, \cdots, \alpha_j$ and $\beta_1, \cdots, \beta_{j-1}$, respectively.

$$
\begin{aligned}
& v_i := 0 \ (i = 1, \cdots, n),\quad \beta_0 := 1,\quad j := 0 \\
& \text{Do while } (\beta_j \ne 0) \\
& \qquad \text{if } (j \ne 0) \text{ then} \\
& \qquad\qquad \text{for } i = 1, \cdots, n:\ t := w_i,\ w_i := v_i/\beta_j,\ v_i := -\beta_j t \\
& \qquad\qquad \text{end for} \\
& \qquad \text{end if} \\
& \qquad v := A w + v, \\
& \qquad j := j + 1, \\
& \qquad \alpha_j := w^T v, \\
& \qquad v := v - \alpha_j w, \\
& \qquad \beta_j := \|v\|_2.
\end{aligned}
$$

**Remark 7.1.3** (a) If sparsity is exploited and only $kn$ flops are involved in each product $Aw$ ($k \ll n$), then each Lanczos step requires about $(4+k)n$ flops to execute.

(b) The iteration stops before complete tridiagonalization if $q_1$ is contained in a proper invariant subspace. From the iteration (7.1.15) we have

$$A\,[q_1, \cdots, q_m] = [q_1, \cdots, q_m] \begin{bmatrix} \alpha_1 & \beta_1 & & \\ \beta_1 & \ddots & \ddots & \\ & \ddots & \ddots & \beta_{m-1} \\ & & \beta_{m-1} & \alpha_m \end{bmatrix} + \underbrace{(0, \cdots, 0, \overbrace{\beta_m q_{m+1}}^{r_m})}_{r_m e_m^T},$$

and $\beta_m = 0$ if and only if $r_m = 0$. In that case,

$$A\,[q_1, \cdots, q_m] = [q_1, \cdots, q_m]\, T_m.$$

That is,

$$\mathrm{Range}(q_1, \cdots, q_m) = \mathrm{Range}(K[q_1, A, m])$$

is an invariant subspace of $A$, and the eigenvalues of $T_m$ are eigenvalues of $A$.

**Theorem 7.1.2** Let $A$ be symmetric and $q_1$ be a given vector with $\|q_1\|_2 = 1$. The Lanczos iteration (7.1.15) runs until $j = m$, where $m = \mathrm{rank}(K[q_1, A, n])$. Moreover, for $j = 1, \cdots, m$ we have

$$A Q_j = Q_j T_j + r_j e_j^T \tag{7.1.16}$$

with

$$T_j = \begin{bmatrix} \alpha_1 & \beta_1 & & \\ \beta_1 & \ddots & \ddots & \\ & \ddots & \ddots & \beta_{j-1} \\ & & \beta_{j-1} & \alpha_j \end{bmatrix} \quad \text{and} \quad Q_j = [q_1, \cdots, q_j],$$

where $Q_j$ has orthonormal columns satisfying $\mathrm{Range}(Q_j) = \mathcal{K}(q_1, A, j)$.

**Proof:** By induction on $j$. Suppose the iteration has produced $Q_j = [q_1, \cdots, q_j]$ such that $\mathrm{Range}(Q_j) = \mathcal{K}(q_1, A, j)$ and $Q_j^T Q_j = I_j$. It is easy to see from (7.1.15) that (7.1.16) holds. Thus

$$Q_j^T A Q_j = T_j + Q_j^T r_j e_j^T.$$

Since $\alpha_i = q_i^T A q_i$ for $i = 1, \cdots, j$ and

$$q_{i+1}^T A q_i = q_{i+1}^T (\beta_i q_{i+1} + \alpha_i q_i + \beta_{i-1} q_{i-1}) = q_{i+1}^T (\beta_i q_{i+1}) = \beta_i$$

for $i = 1, \cdots, j-1$, we have $Q_j^T A Q_j = T_j$. Consequently, $Q_j^T r_j = 0$.

If $r_j \ne 0$, then $q_{j+1} = r_j / \|r_j\|_2$ is orthogonal to $q_1, \cdots, q_j$ and

$$q_{j+1} \in \mathrm{Span}\{A q_j, q_j, q_{j-1}\} \subset \mathcal{K}(q_1, A, j+1).$$

Thus $Q_{j+1}^T Q_{j+1} = I_{j+1}$ and $\mathrm{Range}(Q_{j+1}) = \mathcal{K}(q_1, A, j+1)$.

On the other hand, if $r_j = 0$, then $A Q_j = Q_j T_j$. This says that $\mathrm{Range}(Q_j) = \mathcal{K}(q_1, A, j)$ is invariant. From this we conclude that $j = m = \dim \mathcal{K}(q_1, A, n)$.

Encountering a zero $\beta_j$ in the Lanczos iteration is a welcome event in that it signals the computation of an exact invariant subspace. However, an exactly zero or even small $\beta_j$ is rare in practice. Consequently, other explanations for the convergence of $T_j$'s eigenvalues must be sought.

**Theorem 7.1.3** Suppose that $j$ steps of the Lanczos algorithm have been performed and that

$$S_j^T T_j S_j = \mathrm{diag}(\theta_1, \cdots, \theta_j)$$

is the Schur decomposition of the tridiagonal matrix $T_j$. If $Y_j \in \mathbb{R}^{n\times j}$ is defined by

$$Y_j = [y_1, \cdots, y_j] = Q_j S_j,$$

then for $i = 1, \cdots, j$ we have

$$\|A y_i - \theta_i y_i\|_2 = |\beta_j|\,|s_{ji}|,$$

where $S_j = (s_{pq})$.

**Proof:** Post-multiplying (7.1.16) by $S_j$ gives

$$A Y_j = Y_j\, \mathrm{diag}(\theta_1, \cdots, \theta_j) + r_j e_j^T S_j,$$

i.e.,

$$A y_i = \theta_i y_i + r_j (e_j^T S_j e_i), \qquad i = 1, \cdots, j.$$

The proof is complete by taking norms and recalling $\|r_j\|_2 = |\beta_j|$.

**Remark 7.1.4** The theorem provides error bounds for $T_j$'s eigenvalues:

$$\min_{\mu \in \sigma(A)} |\theta_i - \mu| \le |\beta_j|\,|s_{ji}|, \qquad i = 1, \cdots, j.$$

Note that in Section 10 the $(\theta_i, y_i)$ are Ritz pairs for the subspace $\mathcal{R}(Q_j)$.
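The bound is cheap to evaluate: the residual of every Ritz pair is read off from $\beta_j$ and the last row of $S_j$, without ever forming $A y_i$. A NumPy sketch (function name and return convention are ours):

```python
import numpy as np

def lanczos_factorization(A, q1, j):
    """j Lanczos steps: returns Q_j, T_j and r_j with
    A Q_j = Q_j T_j + r_j e_j^T, as in (7.1.16)."""
    n = A.shape[0]
    Q, alpha, beta = np.zeros((n, j)), np.zeros(j), np.zeros(j)
    q_prev, q = np.zeros(n), q1 / np.linalg.norm(q1)
    for i in range(j):
        Q[:, i] = q
        r = A @ q - beta[i - 1] * q_prev if i > 0 else A @ q
        alpha[i] = q @ r
        r = r - alpha[i] * q
        beta[i] = np.linalg.norm(r)
        q_prev, q = q, r / beta[i]
    T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
    return Q, T, beta[-1] * q            # r_j = beta_j q_{j+1}

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 30)); A = (A + A.T) / 2
Q, T, r = lanczos_factorization(A, rng.standard_normal(30), 6)
theta, S = np.linalg.eigh(T)             # Ritz values theta_i, S_j = [s_1, ..., s_j]
Y = Q @ S                                # Ritz vectors y_i = Q_j s_i
resids = np.linalg.norm(A @ Y - Y * theta, axis=0)
bounds = np.linalg.norm(r) * np.abs(S[-1, :])   # |beta_j| |s_{ji}|
```

Here `resids` and `bounds` agree to rounding error, which is exactly the statement of Theorem 7.1.3.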

If we use the Lanczos method to compute $A Q_j = Q_j T_j + r_j e_j^T$ and set $E = \tau w w^T$, where $\tau = \pm 1$ and $w = a q_j + b r_j$, then it can be shown that

$$(A + E) Q_j = Q_j (T_j + \tau a^2 e_j e_j^T) + (1 + \tau a b)\, r_j e_j^T.$$

If $1 + \tau a b = 0$, then the eigenvalues of the tridiagonal matrix

$$\tilde{T}_j = T_j + \tau a^2 e_j e_j^T$$

are also eigenvalues of $A + E$. We may then conclude from Theorem 6.1.2 that each interval $[\lambda_i(T_j), \lambda_{i-1}(T_j)]$, $i = 2, \cdots, j$, contains an eigenvalue of $A + E$.

Suppose we have an approximate eigenvalue $\tilde{\lambda}$ of $A$. One possibility is to choose $\tau a^2$ so that

$$\det(\tilde{T}_j - \tilde{\lambda} I_j) = (\alpha_j + \tau a^2 - \tilde{\lambda})\, p_{j-1}(\tilde{\lambda}) - \beta_{j-1}^2\, p_{j-2}(\tilde{\lambda}) = 0,$$

where the polynomial $p_i(x) = \det(T_i - x I_i)$ can be evaluated at $\tilde{\lambda}$ using (5.3).

The following theorems, known as the Kaniel-Paige theory, estimate the eigenvalues obtained via the Lanczos algorithm.

**Theorem 7.1.4** Let $A$ be an $n \times n$ symmetric matrix with eigenvalues $\lambda_1 \ge \cdots \ge \lambda_n$ and corresponding orthonormal eigenvectors $z_1, \cdots, z_n$. If $\theta_1 \ge \cdots \ge \theta_j$ are the eigenvalues of $T_j$ obtained after $j$ steps of the Lanczos iteration, then

$$\lambda_1 \ge \theta_1 \ge \lambda_1 - \frac{(\lambda_1 - \lambda_n) \tan^2(\phi_1)}{[c_{j-1}(1 + 2\rho_1)]^2},$$

where $\cos \phi_1 = |q_1^T z_1|$, $\rho_1 = (\lambda_1 - \lambda_2)/(\lambda_2 - \lambda_n)$, and $c_{j-1}$ is the Chebyshev polynomial of degree $j-1$.

**Proof:** From the Courant-Fischer theorem we have

$$\theta_1 = \max_{y \ne 0} \frac{y^T T_j y}{y^T y} = \max_{y \ne 0} \frac{(Q_j y)^T A (Q_j y)}{(Q_j y)^T (Q_j y)} = \max_{0 \ne w \in \mathcal{K}(q_1, A, j)} \frac{w^T A w}{w^T w}.$$

Since $\lambda_1$ is the maximum of $w^T A w / w^T w$ over all nonzero $w$, it follows that $\lambda_1 \ge \theta_1$. To obtain the lower bound for $\theta_1$, note that

$$\theta_1 = \max_{p \in P_{j-1}} \frac{q_1^T p(A)\, A\, p(A) q_1}{q_1^T p(A)^2 q_1},$$

where $P_{j-1}$ is the set of all polynomials of degree at most $j-1$. If

$$q_1 = \sum_{i=1}^{n} d_i z_i,$$

then

$$\frac{q_1^T p(A)\, A\, p(A) q_1}{q_1^T p(A)^2 q_1} = \frac{\sum_{i=1}^{n} d_i^2\, p(\lambda_i)^2\, \lambda_i}{\sum_{i=1}^{n} d_i^2\, p(\lambda_i)^2} \ge \lambda_1 - (\lambda_1 - \lambda_n)\, \frac{\sum_{i=2}^{n} d_i^2\, p(\lambda_i)^2}{d_1^2\, p(\lambda_1)^2 + \sum_{i=2}^{n} d_i^2\, p(\lambda_i)^2}.$$

We can make the lower bound tight by selecting a polynomial $p(x)$ that is large at $x = \lambda_1$ in comparison with its values at the remaining eigenvalues. Set

$$p(x) = c_{j-1}\!\left(-1 + 2\,\frac{x - \lambda_n}{\lambda_2 - \lambda_n}\right),$$

where $c_{j-1}(z)$ is the $(j-1)$-st Chebyshev polynomial, generated by the recursion

$$c_j(z) = 2 z\, c_{j-1}(z) - c_{j-2}(z), \qquad c_0 = 1, \quad c_1 = z.$$

These polynomials are bounded by unity on $[-1, 1]$. It follows that $|p(\lambda_i)|$ is bounded by unity for $i = 2, \cdots, n$, while $p(\lambda_1) = c_{j-1}(1 + 2\rho_1)$. Thus,

$$\theta_1 \ge \lambda_1 - (\lambda_1 - \lambda_n)\, \frac{1 - d_1^2}{d_1^2}\, \frac{1}{c_{j-1}^2(1 + 2\rho_1)}.$$

The desired lower bound is obtained by noting that $\tan^2(\phi_1) = (1 - d_1^2)/d_1^2$.

**Corollary 7.1.5** Using the same notation as in Theorem 7.1.4,

$$\lambda_n \le \theta_j \le \lambda_n + \frac{(\lambda_1 - \lambda_n) \tan^2(\phi_n)}{c_{j-1}^2(1 + 2\rho_n)},$$

where $\rho_n = (\lambda_{n-1} - \lambda_n)/(\lambda_1 - \lambda_{n-1})$ and $\cos(\phi_n) = |q_1^T z_n|$.

**Proof:** Apply Theorem 7.1.4 with $A$ replaced by $-A$.

**Example 7.1.1** Compare the Lanczos bound

$$L_{j-1} \equiv \frac{1}{\left[c_{j-1}\!\left(2\frac{\lambda_1}{\lambda_2} - 1\right)\right]^2} \ \ge\ \frac{1}{[c_{j-1}(1 + 2\rho_1)]^2}$$

with the corresponding power-method bound

$$R_{j-1} = \left(\frac{\lambda_2}{\lambda_1}\right)^{2(j-1)}.$$

| $\lambda_1/\lambda_2$ | $L_{j-1}/R_{j-1}$, $j = 5$ | $L_{j-1}/R_{j-1}$, $j = 25$ |
|---|---|---|
| $1.5$ | $1.1 \times 10^{-4}\,/\,3.9 \times 10^{-2}$ | $1.4 \times 10^{-27}\,/\,3.5 \times 10^{-9}$ |
| $1.01$ | $5.6 \times 10^{-1}\,/\,9.2 \times 10^{-1}$ | $2.8 \times 10^{-4}\,/\,6.2 \times 10^{-1}$ |
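The table entries can be reproduced from the Chebyshev recursion; the following sketch assumes $\lambda_n = 0$, so that $1 + 2\rho_1 = 2\lambda_1/\lambda_2 - 1$ (the helper names are ours):

```python
def cheb(j, z):
    """Chebyshev polynomial c_j(z) via c_j = 2 z c_{j-1} - c_{j-2},
    with c_0 = 1 and c_1 = z."""
    c_prev, c = 1.0, z
    if j == 0:
        return c_prev
    for _ in range(j - 1):
        c_prev, c = c, 2.0 * z * c - c_prev
    return c

def bounds(ratio, j):
    """Lanczos bound L_{j-1} and power-method bound R_{j-1}
    for lambda_1/lambda_2 = ratio (assuming lambda_n = 0)."""
    L = 1.0 / cheb(j - 1, 2.0 * ratio - 1.0) ** 2
    R = (1.0 / ratio) ** (2 * (j - 1))
    return L, R
```

For `ratio = 1.5, j = 5` this gives $L_4 \approx 1.1 \times 10^{-4}$ versus $R_4 \approx 3.9 \times 10^{-2}$, matching the first row of the table: the Lanczos bound decays far faster than the power-method bound.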

Rounding errors greatly affect the behavior of Algorithm 7.1.1, the Lanczos iteration. The basic difficulty is caused by loss of orthogonality among the Lanczos vectors. To avoid these difficulties we can reorthogonalize the Lanczos vectors.

### 7.1.1 Reorthogonalization

Since

$$A Q_j = Q_j T_j + r_j e_j^T$$

holds only approximately in floating-point arithmetic, let

$$A Q_j - Q_j T_j = r_j e_j^T + F_j, \tag{7.1.17}$$

$$I - Q_j^T Q_j = C_j^T + \Delta_j + C_j, \tag{7.1.18}$$

where $C_j$ is strictly upper triangular and $\Delta_j$ is diagonal. (For simplicity, suppose $(C_j)_{i,i+1} = 0$ and $\Delta_j = 0$.)

**Definition 7.1.1** $\theta_i$ and $y_i \equiv Q_j s_i$ are called a Ritz value and Ritz vector, respectively, if $T_j s_i = \theta_i s_i$.

Let $\Theta_j \equiv \mathrm{diag}(\theta_1, \cdots, \theta_j) = S_j^T T_j S_j$, where $S_j = [s_1, \cdots, s_j]$.

**Theorem 7.1.6 (Paige Theorem)** Assume that (a) $S_j$ and $\Theta_j$ are exact (since $j \ll n$), and (b) local orthogonality is maintained (i.e., $q_{i+1}^T q_i = 0$ for $i = 1, \ldots, j-1$, $r_j^T q_j = 0$, and $(C_j)_{i,i+1} = 0$). Let

$$F_j^T Q_j - Q_j^T F_j = K_j - K_j^T,$$

$$\Delta_j T_j - T_j \Delta_j \equiv N_j - N_j^T,$$

$$G_j = S_j^T (K_j + N_j) S_j \equiv (r_{ik}).$$

Then:

(a) $y_i^T q_{j+1} = r_{ii}/\beta_{ji}$, where $y_i = Q_j s_i$ and $\beta_{ji} = \beta_j s_{ji}$.

(b) For $i \ne k$,

$$(\theta_i - \theta_k)\, y_i^T y_k = r_{ii}\left(\frac{s_{jk}}{s_{ji}}\right) - r_{kk}\left(\frac{s_{ji}}{s_{jk}}\right) - (r_{ik} - r_{ki}). \tag{7.1.19}$$
**Proof:** Multiplying (7.1.17) from the left by $Q_j^T$, we get

$$Q_j^T A Q_j - Q_j^T Q_j T_j = Q_j^T r_j e_j^T + Q_j^T F_j, \tag{7.1.20}$$

and transposing gives

$$Q_j^T A^T Q_j - T_j Q_j^T Q_j = e_j r_j^T Q_j + F_j^T Q_j. \tag{7.1.21}$$

Subtracting (7.1.20) from (7.1.21), we have

$$
\begin{aligned}
(Q_j^T r_j) e_j^T - e_j (Q_j^T r_j)^T &= (C_j^T T_j - T_j C_j^T) + (C_j T_j - T_j C_j) + (\Delta_j T_j - T_j \Delta_j) + F_j^T Q_j - Q_j^T F_j \\
&= (C_j^T T_j - T_j C_j^T) + (C_j T_j - T_j C_j) + (N_j - N_j^T) + (K_j - K_j^T).
\end{aligned}
$$

This implies that

$$(Q_j^T r_j) e_j^T = C_j T_j - T_j C_j + N_j + K_j.$$

Thus,

$$y_i^T q_{j+1}\, \beta_{ji} = s_i^T (Q_j^T r_j) e_j^T s_i = s_i^T (C_j T_j - T_j C_j) s_i + s_i^T (N_j + K_j) s_i = (s_i^T C_j s_i) \theta_i - \theta_i (s_i^T C_j s_i) + r_{ii} = r_{ii},$$

which implies that

$$y_i^T q_{j+1} = \frac{r_{ii}}{\beta_{ji}}.$$

Similarly, (7.1.19) can be obtained by multiplying (7.1.20) from the left by $s_i^T$ and from the right by $s_k$.

**Remark 7.1.5** Since

$$y_i^T q_{j+1} = \frac{r_{ii}}{\beta_{ji}} = \begin{cases} O(\mathrm{eps}), & \text{if } |\beta_{ji}| = O(1) \quad \text{(not converged)}, \\ O(1), & \text{if } |\beta_{ji}| = O(\mathrm{eps}) \quad \text{(converged for } (\theta_i, y_i)\text{)}, \end{cases}$$

we have $q_{j+1}^T y_i = O(1)$ when the algorithm converges, i.e., $q_{j+1}$ is not orthogonal to $\mathrm{span}\{Q_j\}$, where $Q_j s_i = y_i$.

(i) Full reorthogonalization by MGS:

Orthogonalize $q_{j+1}$ against all of $q_1, \cdots, q_j$ by

$$q_{j+1} := q_{j+1} - \sum_{i=1}^{j} (q_{j+1}^T q_i)\, q_i.$$
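In code, one MGS sweep against the stored Lanczos vectors looks as follows (a sketch; in practice this runs inside the Lanczos loop on the freshly computed residual vector):

```python
import numpy as np

def reorthogonalize(Q, q):
    """Orthogonalize a candidate vector q against the orthonormal
    columns of Q by modified Gram-Schmidt, then renormalize."""
    for i in range(Q.shape[1]):
        q = q - (Q[:, i] @ q) * Q[:, i]   # remove component along q_i
    return q / np.linalg.norm(q)
```

Unlike the classical sum above, MGS subtracts the projections sequentially, which behaves better in floating-point arithmetic.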

If we incorporate Householder computations into the Lanczos process, we can produce Lanczos vectors that are orthogonal to working accuracy:

$$
\begin{aligned}
& r_0 := q_1 \ (\text{given unit vector}); \\
& \text{Determine } P_0 = I - 2 v_0 v_0^T / v_0^T v_0 \text{ so that } P_0 r_0 = e_1; \\
& \alpha_1 := q_1^T A q_1; \\
& \text{Do } j = 1, \cdots, n-1: \\
& \qquad r_j := (A - \alpha_j I)\, q_j - \beta_{j-1} q_{j-1} \quad (\beta_0 q_0 \equiv 0), \\
& \qquad w := (P_{j-1} \cdots P_0)\, r_j, \\
& \qquad \text{Determine } P_j = I - 2 v_j v_j^T / v_j^T v_j \text{ such that } P_j w = (w_1, \cdots, w_j, \beta_j, 0, \cdots, 0)^T, \\
& \qquad q_{j+1} := (P_0 \cdots P_j)\, e_{j+1}, \\
& \qquad \alpha_{j+1} := q_{j+1}^T A q_{j+1}.
\end{aligned}
$$

This is the complete reorthogonalization Lanczos scheme.

(ii) Selective reorthogonalization by MGS:

If $|\beta_{ji}| = O(\sqrt{\mathrm{eps}})$, then $(\theta_i, y_i)$ is a "good" Ritz pair: orthogonalize $q_{j+1}$ against $q_1, \ldots, q_j$; otherwise do not reorthogonalize.

(iii) Restart after $m$ steps (then do full reorthogonalization).

(iv) Partial reorthogonalization:

Reorthogonalize against only a few previous (e.g., $k = 5$) Lanczos vectors.

For details see the books:

- Parlett, *The Symmetric Eigenvalue Problem* (1980), pp. 257ff.
- Golub & Van Loan, *Matrix Computations* (1981), pp. 332ff.

Concerning (7.1.19): duplicate Ritz pairs can occur! For $i \ne k$, the right-hand side of (7.1.19) gives

$$(\theta_i - \theta_k)\, y_i^T y_k = O(\mathrm{eps}),$$

so if $y_i^T y_k = O(1)$, i.e., $y_i \approx y_k$, then $\theta_i \approx \theta_k$: the same Ritz pair appears twice.

How can we avoid duplicate pairs? The answer is to use the implicitly restarted Lanczos algorithm.

Let

$$A Q_j = Q_j T_j + r_j e_j^T$$

be a Lanczos decomposition.

- In principle, we can keep expanding the Lanczos decomposition until the Ritz pairs have converged.
- Unfortunately, this is limited by the amount of memory available to store $Q_j$.
- Instead, we restart the Lanczos process once $j$ becomes so large that we cannot store $Q_j$: the implicitly restarting method.
- We choose a new starting vector for the underlying Krylov sequence.
- A natural choice would be a linear combination of the Ritz vectors that we are interested in.

### 7.1.2 Filter polynomials

Assume $A$ has a complete system of eigenpairs $(\lambda_i, x_i)$ and we are interested in the first $k$ of these eigenpairs. Expand $u_1$ in the form

$$u_1 = \sum_{i=1}^{k} \gamma_i x_i + \sum_{i=k+1}^{n} \gamma_i x_i.$$

If $p$ is any polynomial, we have

$$p(A) u_1 = \sum_{i=1}^{k} \gamma_i\, p(\lambda_i)\, x_i + \sum_{i=k+1}^{n} \gamma_i\, p(\lambda_i)\, x_i.$$

- Choose $p$ so that the values $p(\lambda_i)$ $(i = k+1, \ldots, n)$ are small compared to the values $p(\lambda_i)$ $(i = 1, \ldots, k)$.
- Then $p(A) u_1$ is rich in the components of the $x_i$ that we want and deficient in the ones that we do not want.
- $p$ is called a filter polynomial.
- Suppose we have Ritz values $\mu_1, \ldots, \mu_m$ and $\mu_{k+1}, \ldots, \mu_m$ are not interesting. Then take

$$p(t) = (t - \mu_{k+1}) \cdots (t - \mu_m).$$
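A tiny numerical sketch of the filtering effect (the data and names here are our own illustration, not from the text): with the roots of $p$ placed exactly at the unwanted eigenvalues, the unwanted components of $p(A) u_1$ vanish.

```python
import numpy as np

rng = np.random.default_rng(2)
lam = np.array([10.0, 9.0, 1.0, 0.5, 0.1])        # want the first k = 2 eigenpairs
Z = np.linalg.qr(rng.standard_normal((5, 5)))[0]  # orthonormal eigenvectors
A = Z @ np.diag(lam) @ Z.T
u1 = Z @ np.ones(5)                               # gamma_i = 1 for every i

v = u1.copy()
for mu in lam[2:]:                                # p(t) = (t - 1)(t - 0.5)(t - 0.1)
    v = A @ v - mu * v                            # v := (A - mu I) v
coeff = Z.T @ v                                   # components gamma_i p(lambda_i)
```

After the loop, `coeff[2:]` is zero to rounding error while `coeff[:2]` is large: $p(A) u_1$ is rich in $x_1, x_2$ and deficient in the rest.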

### 7.1.3 Implicitly restarted algorithm

Let

$$A Q_m = Q_m T_m + \beta_m q_{m+1} e_m^T \tag{7.1.22}$$

be a Lanczos decomposition of order $m$. Choose a filter polynomial $p$ of degree $m - k$ and use the implicit restarting process to reduce the decomposition to a decomposition

$$A \tilde{Q}_k = \tilde{Q}_k \tilde{T}_k + \tilde{\beta}_k \tilde{q}_{k+1} e_k^T$$

of order $k$ with starting vector $p(A) u_1$.

Let $\nu_1, \ldots, \nu_m$ be the eigenvalues of $T_m$ and suppose that $\nu_1, \ldots, \nu_{m-k}$ correspond to the part of the spectrum we are not interested in. Then take

$$p(t) = (t - \nu_1)(t - \nu_2) \cdots (t - \nu_{m-k}).$$

The starting vector $p(A) u_1$ is equal to

$$p(A) u_1 = (A - \nu_{m-k} I) \cdots (A - \nu_2 I)(A - \nu_1 I)\, u_1 = (A - \nu_{m-k} I)\left[\cdots \left[(A - \nu_2 I)\left[(A - \nu_1 I) u_1\right]\right]\right].$$

First, we construct a Lanczos decomposition with starting vector $(A - \nu_1 I) u_1$. From (7.1.22), we have

$$(A - \nu_1 I) Q_m = Q_m (T_m - \nu_1 I) + \beta_m q_{m+1} e_m^T = Q_m U_1 R_1 + \beta_m q_{m+1} e_m^T, \tag{7.1.23}$$

where

$$T_m - \nu_1 I = U_1 R_1$$

is the QR factorization of $T_m - \nu_1 I$. Postmultiplying by $U_1$, we get

$$(A - \nu_1 I)(Q_m U_1) = (Q_m U_1)(R_1 U_1) + \beta_m q_{m+1} (e_m^T U_1).$$

It implies that

$$A Q_m^{(1)} = Q_m^{(1)} T_m^{(1)} + \beta_m q_{m+1} b_{m+1}^{(1)T},$$

where

$$Q_m^{(1)} = Q_m U_1, \qquad T_m^{(1)} = R_1 U_1 + \nu_1 I, \qquad b_{m+1}^{(1)T} = e_m^T U_1.$$

($Q_m^{(1)}$ results from one step of the single shifted QR algorithm.)
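One such shifted QR step is easy to sketch in NumPy (the helper name is ours; `numpy.linalg.qr` supplies the factorization). It preserves the spectrum and the symmetric tridiagonal form:

```python
import numpy as np

def shifted_qr_step(T, nu):
    """One single-shift QR step: factor T - nu I = U R, then return
    T_new = R U + nu I (= U^T T U) together with U."""
    nI = nu * np.eye(T.shape[0])
    U, R = np.linalg.qr(T - nI)
    return R @ U + nI, U
```

Because $T_{\mathrm{new}} = U^T T U$ is a similarity transform, the eigenvalues are unchanged, which is exactly the content of (7.1.24) below.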

**Remark 7.1.6**

- $Q_m^{(1)}$ is orthonormal.
- By the definition of $T_m^{(1)}$, we get

$$U_1 T_m^{(1)} U_1^T = U_1 (R_1 U_1 + \nu_1 I) U_1^T = U_1 R_1 + \nu_1 I = T_m. \tag{7.1.24}$$

Therefore, $\nu_1, \nu_2, \ldots, \nu_m$ are also the eigenvalues of $T_m^{(1)}$.

- Since $T_m$ is tridiagonal and $U_1$ is the Q-factor of the QR factorization of $T_m - \nu_1 I$, both $U_1$ and $T_m^{(1)}$ are upper Hessenberg. From (7.1.24), $T_m^{(1)}$ is symmetric. Therefore, $T_m^{(1)}$ is also tridiagonal.
- The vector $b_{m+1}^{(1)T} = e_m^T U_1$ has the form

$$b_{m+1}^{(1)T} = \begin{bmatrix} 0 & \cdots & 0 & u_{m-1,m}^{(1)} & u_{m,m}^{(1)} \end{bmatrix};$$

i.e., only the last two components of $b_{m+1}^{(1)}$ are nonzero.

- On postmultiplying (7.1.23) by $e_1$, we get

$$(A - \nu_1 I) q_1 = (A - \nu_1 I)(Q_m e_1) = Q_m^{(1)} R_1 e_1 = r_{11}^{(1)} q_1^{(1)}.$$

Since $T_m$ is unreduced, $r_{11}^{(1)}$ is nonzero. Therefore, the first column of $Q_m^{(1)}$ is a multiple of $(A - \nu_1 I) q_1$.

Repeating this process with $\nu_2, \ldots, \nu_{m-k}$, the result is a Krylov decomposition

$$A Q_m^{(m-k)} = Q_m^{(m-k)} T_m^{(m-k)} + \beta_m q_{m+1} b_{m+1}^{(m-k)T}$$

with the following properties:

i. $Q_m^{(m-k)}$ is orthonormal.

ii. $T_m^{(m-k)}$ is tridiagonal.

iii. The first $k - 1$ components of $b_{m+1}^{(m-k)T}$ are zero.

iv. The first column of $Q_m^{(m-k)}$ is a multiple of $(A - \nu_1 I) \cdots (A - \nu_{m-k} I)\, q_1$.

**Corollary 7.1.1** Let $\nu_1, \ldots, \nu_m$ be the eigenvalues of $T_m$. If the implicitly restarted QR step is performed with shifts $\nu_1, \ldots, \nu_{m-k}$, then the matrix $T_m^{(m-k)}$ has the form

$$T_m^{(m-k)} = \begin{bmatrix} T_{kk}^{(m-k)} & T_{k,m-k}^{(m-k)} \\ 0 & T_{k+1,k+1}^{(m-k)} \end{bmatrix},$$

where $T_{k+1,k+1}^{(m-k)}$ is an upper triangular matrix with the Ritz values $\nu_1, \ldots, \nu_{m-k}$ on its diagonal.

Therefore, the first $k$ columns of the decomposition can be written in the form

$$A Q_k^{(m-k)} = Q_k^{(m-k)} T_{kk}^{(m-k)} + t_{k+1,k}\, q_{k+1}^{(m-k)} e_k^T + \beta_m u_{mk}\, q_{m+1} e_k^T,$$

where $Q_k^{(m-k)}$ consists of the first $k$ columns of $Q_m^{(m-k)}$, $T_{kk}^{(m-k)}$ is the leading principal submatrix of order $k$ of $T_m^{(m-k)}$, and $u_{mk}$ is from the matrix $U = U_1 \cdots U_{m-k}$. Hence, if we set

$$\tilde{Q}_k = Q_k^{(m-k)}, \qquad \tilde{T}_k = T_{kk}^{(m-k)},$$

$$\tilde{\beta}_k = \|t_{k+1,k}\, q_{k+1}^{(m-k)} + \beta_m u_{mk}\, q_{m+1}\|_2,$$

$$\tilde{q}_{k+1} = \tilde{\beta}_k^{-1} \left( t_{k+1,k}\, q_{k+1}^{(m-k)} + \beta_m u_{mk}\, q_{m+1} \right),$$

then

$$A \tilde{Q}_k = \tilde{Q}_k \tilde{T}_k + \tilde{\beta}_k \tilde{q}_{k+1} e_k^T$$

is a Lanczos decomposition whose starting vector is proportional to $(A - \nu_1 I) \cdots (A - \nu_{m-k} I)\, q_1$.

- We avoid any matrix-vector multiplications in forming the new starting vector.
- We get its Lanczos decomposition of order $k$ for free.
- For large $n$ the major cost will be in computing $QU$.

### 7.2 Approximation from a subspace

Assume that $A$ is symmetric and let $\{(\alpha_i, z_i)\}_{i=1}^n$ be eigenpairs of $A$ with $\alpha_1 \le \alpha_2 \le \cdots \le \alpha_n$. Define

$$\rho(x) = \rho(x, A) = \frac{x^T A x}{x^T x}.$$

**Algorithm 7.2.1 (Rayleigh-Ritz-Quotient procedure)**

Given a subspace $S^{(m)} = \mathrm{span}\{Q\}$ with $Q^T Q = I_m$;

Set $H := \rho(Q) = Q^T A Q$;

Compute the $p$ $(\le m)$ eigenpairs of $H$ which are of interest, say $H g_i = \theta_i g_i$ for $i = 1, \ldots, p$;

Compute the Ritz vectors $y_i = Q g_i$, for $i = 1, \ldots, p$;

Check $\|A y_i - \theta_i y_i\|_2 \le \mathrm{Tol}$, for $i = 1, \ldots, p$.
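Algorithm 7.2.1 is a few lines of NumPy (a sketch; the names are ours, and `numpy.linalg.eigh` stands in for "compute the eigenpairs of $H$"):

```python
import numpy as np

def rayleigh_ritz(A, Q, p):
    """Rayleigh-Ritz procedure: given orthonormal Q spanning S^(m),
    return the p smallest Ritz values, Ritz vectors and residuals."""
    H = Q.T @ A @ Q                       # H = rho(Q) = Q^T A Q
    theta, G = np.linalg.eigh(H)          # H g_i = theta_i g_i, ascending order
    Y = Q @ G[:, :p]                      # Ritz vectors y_i = Q g_i
    res = np.linalg.norm(A @ Y - Y * theta[:p], axis=0)
    return theta[:p], Y, res
```

If $\mathrm{span}\{Q\}$ happens to be an invariant subspace, the residuals vanish and the Ritz values are exact eigenvalues of $A$.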

By the minimax characterization of eigenvalues, we have

$$\alpha_j = \lambda_j(A) = \min_{F^j \subseteq \mathbb{R}^n} \max_{f \in F^j} \rho(f, A).$$

Define

$$\beta_j = \min_{G^j \subseteq S^{(m)}} \max_{g \in G^j} \rho(g, A), \qquad \text{for } j \le m.$$

Since $G^j \subseteq S^{(m)}$ and $S^{(m)} = \mathrm{span}\{Q\}$, it follows that $G^j = Q \tilde{G}^j$ for some $\tilde{G}^j \subseteq \mathbb{R}^m$. Therefore,

$$\beta_j = \min_{\tilde{G}^j \subseteq \mathbb{R}^m} \max_{s \in \tilde{G}^j} \rho(s, H) = \lambda_j(H) \equiv \theta_j, \qquad \text{for } j = 1, \ldots, m.$$

For any $m \times m$ matrix $B$, there is an associated residual matrix $R(B) \equiv AQ - QB$.

**Theorem 7.2.1** For a given orthonormal $n \times m$ matrix $Q$,

$$\|R(H)\| \le \|R(B)\|$$

for all $m \times m$ matrices $B$.

**Proof:** Since

$$
\begin{aligned}
R(B)^* R(B) &= Q^* A^2 Q - B^* (Q^* A Q) - (Q^* A Q) B + B^* B \\
&= Q^* A^2 Q - H^2 + (H - B)^* (H - B) \\
&= R(H)^* R(H) + (H - B)^* (H - B)
\end{aligned}
$$

and $(H - B)^* (H - B)$ is positive semidefinite, it follows that $\|R(B)\|^2 \ge \|R(H)\|^2$.
Since

$$H g_i = \theta_i g_i, \qquad i = 1, \ldots, m,$$

we have

$$Q^T A Q g_i = \theta_i g_i,$$

which implies

$$Q Q^T A (Q g_i) = \theta_i (Q g_i).$$

Let $y_i = Q g_i$. Then $Q Q^T y_i = Q (Q^T Q) g_i = y_i$. Take $P_Q = Q Q^T$, which is the orthogonal projection onto $\mathrm{span}\{Q\}$. Then

$$(Q Q^T) A y_i = \theta_i (Q Q^T) y_i,$$

which implies

$$P_Q (A y_i - \theta_i y_i) = 0,$$

i.e., $r_i = A y_i - \theta_i y_i \perp S^{(m)} = \mathrm{span}\{Q\}$.