Toeplitz and Circulant Matrices: A review
Robert M. Gray
Information Systems Laboratory Department of Electrical Engineering
Stanford University Stanford, California 94305
rmgray@stanford.edu
Revised August 2002

This document is available as an Adobe portable document format (pdf) file at
http://ee.stanford.edu/~gray/toeplitz.pdf
© Robert M. Gray, 1971, 1977, 1993, 1997, 1998, 2000, 2001, 2002.
The preparation of the original report was financed in part by the National Science Foundation and by the Joint Services Program at Stanford. Since then it has been done as a hobby.
Abstract
The fundamental theorems on the asymptotic behavior of eigenvalues, inverses, and products of "finite section" Toeplitz matrices and Toeplitz matrices with absolutely summable elements are derived in a tutorial manner. Mathematical elegance and generality are sacrificed for conceptual simplicity and insight in the hopes of making these results available to engineers lacking either the background or endurance to attack the mathematical literature on the subject. By limiting the generality of the matrices considered, the essential ideas and results can be conveyed in a more intuitive manner without the mathematical machinery required for the most general cases. As an application the results are applied to the study of the covariance matrices and their factors of linear models of discrete time random processes.
Acknowledgements
The author gratefully acknowledges the assistance of Ronald M. Aarts of the Philips Research Labs in correcting many typos and errors in the 1993 revision, Liu Mingyu in pointing out errors corrected in the 1998 revision, Paolo Tilli of the Scuola Normale Superiore of Pisa for pointing out an incorrect corollary and providing the correction, and to David Neuhoff of the University of Michigan for pointing out several typographical errors and some confusing notation. For corrections, comments, and improvements to the 2001 revision thanks are due to William Trench, John Dattorro, and Young-Han Kim. In particular, Professor Trench brought the Wielandt-Hoffman theorem and its use to prove strengthened results to my attention. Section 2.4 largely follows his suggestions, although I take the blame for any introduced errors. For the 2002 revision, particular thanks to Cynthia Pozun of ENST for several corrections.
Contents

1 Introduction
2 The Asymptotic Behavior of Matrices
  2.1 Eigenvalues
  2.2 Matrix Norms
  2.3 Asymptotically Equivalent Matrices
  2.4 Absolutely Equal Eigenvalue Distributions
3 Circulant Matrices
4 Toeplitz Matrices
  4.1 Bounded Toeplitz Matrices
  4.2 Finite Order Toeplitz Matrices
  4.3 Absolutely Summable Toeplitz Matrices
  4.4 Toeplitz Determinants
5 Applications to Stochastic Time Series
  5.1 Moving Average Processes
  5.2 Autoregressive Processes
  5.3 Factorization
  5.4 Differential Entropy Rate of Gaussian Processes
Bibliography
Chapter 1

Introduction
A Toeplitz matrix is an n × n matrix T_n = [t_{k,j}] with t_{k,j} = t_{k−j}, i.e., a matrix of the form

$$
T_n = \begin{bmatrix}
t_0 & t_{-1} & t_{-2} & \cdots & t_{-(n-1)} \\
t_1 & t_0 & t_{-1} & & \\
t_2 & t_1 & t_0 & & \vdots \\
\vdots & & & \ddots & \\
t_{n-1} & \cdots & & & t_0
\end{bmatrix}. \tag{1.1}
$$
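The structure in (1.1) is easy to realize numerically. The following sketch (not from the original text; the helper name and the sample sequence t_k = 2^{−|k|} are our own choices) builds T_n directly from the definition t_{k,j} = t_{k−j}:

```python
import numpy as np

def toeplitz_matrix(t, n):
    """Build the n-by-n Toeplitz matrix with entries T[k, j] = t(k - j).

    `t` maps an integer lag in -(n-1), ..., n-1 to the element t_lag.
    """
    return np.array([[t(k - j) for j in range(n)] for k in range(n)])

# Example: t_k = 2^{-|k|}, an absolutely summable symmetric sequence.
T4 = toeplitz_matrix(lambda k: 2.0 ** (-abs(k)), 4)

# Each diagonal is constant: T[k, j] depends only on k - j.
assert all(T4[k, j] == T4[k + 1, j + 1] for k in range(3) for j in range(3))
```

SciPy users can obtain the same matrix from `scipy.linalg.toeplitz`, which takes the first column and first row as arguments.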
Examples of such matrices are covariance matrices of weakly stationary stochastic time series and matrix representations of linear time-invariant discrete time filters. There are numerous other applications in mathematics, physics, information theory, estimation theory, etc. A great deal is known about the behavior of such matrices — the most common and complete references being Grenander and Szegö [13] and Widom [26]. A more recent text devoted to the subject is Böttcher and Silbermann [4]. Unfortunately, however, the necessary level of mathematical sophistication for understanding reference [13] is frequently beyond that of one species of applied mathematician for whom the theory can be quite useful but is relatively little understood. This caste consists of engineers doing relatively mathematical (for an engineering background) work in any of the areas mentioned. This apparent dilemma provides the motivation for attempting a tutorial introduction on Toeplitz matrices that proves the essential theorems using the simplest possible and most intuitive mathematics. Some simple and fundamental methods that are deeply buried (at least to the untrained mathematician) in [13] are here made explicit.
In addition to the fundamental theorems, several related results that naturally follow but do not appear to be collected together anywhere are presented.
The essential prerequisites for this book are a knowledge of matrix theory, an engineer's knowledge of Fourier series and random processes, and calculus (Riemann integration). A first course in analysis would be helpful, but it is not assumed. The few results from analysis that are occasionally required are usually contained in one or more courses in the usual engineering curriculum, e.g., the Cauchy-Schwarz and triangle inequalities. Hopefully the only unfamiliar results are a corollary to the Courant-Fischer theorem and the Weierstrass approximation theorem. The latter is an intuitive result which is easily believed even if not formally proved. More advanced results from Lebesgue integration, functional analysis, and Fourier series are not used.
The approach of this book is to relate the properties of Toeplitz matrices to those of their simpler, more structured cousin — the circulant or cyclic matrix. These two matrices are shown to be asymptotically equivalent in a certain sense, and this is shown to imply that eigenvalues, inverses, products, and determinants behave similarly. This approach provides a simplified and direct path (in the author's view) to the basic eigenvalue distribution and related theorems. This method is implicit but not immediately apparent in the more complicated and more general results of Grenander in Chapter 7 of [13]. The basic results for the special case of a finite order Toeplitz matrix appeared in [10], a tutorial treatment of the simplest case which was in turn based on the first draft of this work. The results were subsequently generalized using essentially the same simple methods, but they remain less general than those of [13].
As an application several of the results are applied to study certain models of discrete time random processes. Two common linear models are studied and some intuitively satisfying results on covariance matrices and their factors are given. As an example from Shannon information theory, the Toeplitz results regarding the limiting behavior of determinants are applied to find the differential entropy rate of a stationary Gaussian random process.
We sacrifice mathematical elegance and generality for conceptual simplicity in the hope that this will bring an understanding of the interesting and useful properties of Toeplitz matrices to a wider audience, specifically to those who have lacked either the background or the patience to tackle the mathematical literature on the subject.
Chapter 2
The Asymptotic Behavior of Matrices
In this chapter we begin with relevant definitions and a prerequisite theorem and proceed to a discussion of the asymptotic eigenvalue, product, and inverse behavior of sequences of matrices. The major use of the theorems of this chapter is that we can often study the asymptotic behavior of complicated matrices by studying a more structured and simpler asymptotically equivalent matrix, as will be developed in subsequent chapters.
2.1 Eigenvalues
The eigenvalues α_k and the (right) eigenvectors (n-tuples) x_k of an n × n matrix A are the solutions to the equation

$$Ax = \alpha x \tag{2.1}$$

and hence the eigenvalues are the roots of the characteristic equation of A:

$$\det(A - \alpha I) = 0. \tag{2.2}$$

Unless specifically stated otherwise, we will always assume that the eigenvalues are ordered in nonincreasing fashion, i.e., α_1 ≥ α_2 ≥ α_3 ≥ · · ·.

Any complex matrix A can be written as

$$A = U R U^*, \tag{2.3}$$
where the asterisk * denotes conjugate transpose, U is unitary, i.e., U^{-1} = U*, and R = {r_{k,j}} is an upper triangular matrix ([15], p. 79). The eigenvalues of A are the principal diagonal elements of R. If A is normal, i.e., if A*A = AA*, then R is a diagonal matrix, which we denote as R = diag(α_k). If A is Hermitian, i.e., if A* = A, then the eigenvalues are real. If a matrix is Hermitian, then it is also normal.
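The decomposition (2.3) can be inspected numerically via SciPy's complex Schur factorization; the sketch below (our own illustration, on an arbitrary random matrix) checks the properties just listed:

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

# Schur form (2.3): A = U R U* with U unitary and R upper triangular.
R, U = schur(A, output="complex")
assert np.allclose(U @ U.conj().T, np.eye(4))   # U is unitary
assert np.allclose(U @ R @ U.conj().T, A)       # decomposition reproduces A
assert np.allclose(np.tril(R, -1), 0)           # R is upper triangular
# The eigenvalues of A sit on the diagonal of R; their sum is the trace.
assert np.isclose(np.trace(A), np.diag(R).sum())

# If A is normal (here: Hermitian), R is diagonal with real entries.
H = A + A.conj().T
Rh, Uh = schur(H, output="complex")
assert np.allclose(Rh, np.diag(np.diag(Rh)))    # diagonal
assert np.allclose(np.diag(Rh).imag, 0)         # real eigenvalues
```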
For the case of Hermitian matrices, a useful description of the eigenvalues is the variational description given by the Courant-Fischer theorem ([15], p. 116). While we will not have direct need of this theorem, we will use the following important corollary, which is stated below without proof.
Corollary 2.1 Define the Rayleigh quotient of an Hermitian matrix H and a vector (complex n-tuple) x by

$$R_H(x) = \frac{x^* H x}{x^* x}. \tag{2.4}$$

Let η_M and η_m be the maximum and minimum eigenvalues of H, respectively. Then

$$\eta_m = \min_x R_H(x) = \min_{x\colon x^*x = 1} x^* H x \tag{2.5}$$

$$\eta_M = \max_x R_H(x) = \max_{x\colon x^*x = 1} x^* H x \tag{2.6}$$
This corollary will be useful in specifying the interval containing the eigen- values of an Hermitian matrix.
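A quick numerical sanity check of Corollary 2.1 (an illustration of ours, not part of the original): for a random real symmetric H, the Rayleigh quotient of every trial vector lies between η_m and η_M, and the extremes are attained at eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
H = B + B.T  # real symmetric, a special case of Hermitian

eigs = np.linalg.eigvalsh(H)       # real eigenvalues, ascending order
eta_m, eta_M = eigs[0], eigs[-1]

def rayleigh(H, x):
    return (x @ H @ x) / (x @ x)

# Random trial vectors: R_H(x) always lies in [eta_m, eta_M].
for _ in range(200):
    x = rng.standard_normal(5)
    assert eta_m - 1e-9 <= rayleigh(H, x) <= eta_M + 1e-9

# The extremes are attained at the corresponding eigenvectors.
w, V = np.linalg.eigh(H)
assert np.isclose(rayleigh(H, V[:, 0]), eta_m)
assert np.isclose(rayleigh(H, V[:, -1]), eta_M)
```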
The following lemma is useful when studying non-Hermitian matrices and products of Hermitian matrices.
Lemma 2.1 Let A be a matrix with eigenvalues α_k. Define the eigenvalues of the Hermitian matrix A*A to be λ_k. Then

$$\sum_{k=0}^{n-1} \lambda_k \ge \sum_{k=0}^{n-1} |\alpha_k|^2, \tag{2.7}$$

with equality iff (if and only if) A is normal.
Proof. The trace of a matrix is the sum of its diagonal elements. The trace is invariant to unitary operations, so it is also equal to the sum of the eigenvalues of a matrix, i.e.,

$$\operatorname{Tr}\{A^*A\} = \sum_{k=0}^{n-1} (A^*A)_{k,k} = \sum_{k=0}^{n-1} \lambda_k. \tag{2.8}$$
We have

$$\begin{aligned}
\operatorname{Tr}\{A^*A\} &= \operatorname{Tr}\{R^*R\} \\
&= \sum_{k=0}^{n-1} \sum_{j=0}^{n-1} |r_{j,k}|^2 \\
&= \sum_{k=0}^{n-1} |\alpha_k|^2 + \sum_{k \ne j} |r_{j,k}|^2 \\
&\ge \sum_{k=0}^{n-1} |\alpha_k|^2. \tag{2.9}
\end{aligned}$$

Equation (2.9) will hold with equality iff R is diagonal and hence iff A is normal. □
Lemma 2.1 is a direct consequence of Schur's theorem ([15], pp. 229-231) and is also proved in [13], p. 106.
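Lemma 2.1 is also easy to check numerically (our own illustration): a random real matrix is generically not normal, so the inequality (2.7) is strict, while a symmetric matrix gives equality.

```python
import numpy as np

rng = np.random.default_rng(2)

# Non-normal A: the sum of the eigenvalues of A*A exceeds sum |alpha_k|^2.
A = rng.standard_normal((4, 4))
alphas = np.linalg.eigvals(A)
lambdas = np.linalg.eigvalsh(A.T @ A)
assert lambdas.sum() > (np.abs(alphas) ** 2).sum()

# Normal (here symmetric) N: equality holds in (2.7).
N = A + A.T
alphas_n = np.linalg.eigvals(N)
lambdas_n = np.linalg.eigvalsh(N.T @ N)
assert np.isclose(lambdas_n.sum(), (np.abs(alphas_n) ** 2).sum())
```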
2.2 Matrix Norms
To study the asymptotic equivalence of matrices we require a metric, or equivalently a norm, of the appropriate kind. Two norms — the operator or strong norm and the Hilbert-Schmidt or weak norm (also called the Frobenius norm or Euclidean norm when the scaling term n is removed) — will be used here ([13], pp. 102-103).
Let A be a matrix with eigenvalues α_k and let λ_k be the eigenvalues of the Hermitian matrix A*A. The strong norm ‖A‖ is defined by

$$\| A \| = \max_x R_{A^*A}(x)^{1/2} = \max_{x\colon x^*x=1} \left[ x^* A^* A x \right]^{1/2}. \tag{2.10}$$

From Corollary 2.1,

$$\| A \|^2 = \max_k \lambda_k \stackrel{\Delta}{=} \lambda_M. \tag{2.11}$$

The strong norm of A can be bounded below by letting e_M be the eigenvector of A corresponding to α_M, the eigenvalue of A having largest absolute value:

$$\| A \|^2 = \max_{x\colon x^*x=1} x^* A^* A x \ge (e_M^* A^*)(A e_M) = |\alpha_M|^2. \tag{2.12}$$
If A is itself Hermitian, then its eigenvalues α_k are real and the eigenvalues λ_k of A*A are simply λ_k = α_k². This follows since if e^{(k)} is an eigenvector of A with eigenvalue α_k, then A*Ae^{(k)} = α_k A*e^{(k)} = α_k² e^{(k)}. Thus, in particular, if A is Hermitian then

$$\| A \| = \max_k |\alpha_k| = |\alpha_M|. \tag{2.13}$$
The weak norm (or Hilbert-Schmidt norm) of an n × n matrix A = {a_{k,j}} is defined by

$$|A| = \left( n^{-1} \sum_{k=0}^{n-1} \sum_{j=0}^{n-1} |a_{k,j}|^2 \right)^{1/2} = \left( n^{-1} \operatorname{Tr}[A^*A] \right)^{1/2} = \left( n^{-1} \sum_{k=0}^{n-1} \lambda_k \right)^{1/2}. \tag{2.14}$$

The quantity $\sqrt{n}\,|A|$ is sometimes called the Frobenius norm or Euclidean norm. From Lemma 2.1 we have

$$|A|^2 \ge n^{-1} \sum_{k=0}^{n-1} |\alpha_k|^2, \quad \text{with equality iff } A \text{ is normal.} \tag{2.15}$$

The Hilbert-Schmidt norm is the "weaker" of the two norms since

$$\| A \|^2 = \max_k \lambda_k \ge n^{-1} \sum_{k=0}^{n-1} \lambda_k = |A|^2. \tag{2.16}$$

A matrix is said to be bounded if it is bounded in both norms.
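In NumPy terms (an illustration of ours), the strong norm is the spectral norm and the weak norm is the Frobenius norm scaled by 1/√n; the sketch verifies (2.11), (2.14), and (2.16) on a random matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
A = rng.standard_normal((n, n))

strong = np.linalg.norm(A, 2)                 # operator (spectral) norm ||A||
weak = np.linalg.norm(A, "fro") / np.sqrt(n)  # Hilbert-Schmidt norm |A|

lambdas = np.linalg.eigvalsh(A.T @ A)         # eigenvalues of A*A

# (2.11): ||A||^2 is the largest eigenvalue of A*A.
assert np.isclose(strong ** 2, lambdas.max())

# (2.14): |A|^2 is the arithmetic mean of those eigenvalues.
assert np.isclose(weak ** 2, lambdas.mean())

# (2.16): the weak norm never exceeds the strong norm.
assert weak <= strong + 1e-12
```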
Note that both the strong and the weak norms are in fact norms in the linear space of matrices, i.e., both satisfy the following three axioms:

1. ‖A‖ ≥ 0, with equality iff A = 0, the all-zero matrix;
2. ‖A + B‖ ≤ ‖A‖ + ‖B‖;
3. ‖cA‖ = |c| · ‖A‖. (2.17)
The triangle inequality in (2.17) will be used often, as is the following direct consequence:

$$\| A - B \| \ge \Big| \| A \| - \| B \| \Big|. \tag{2.18}$$

The weak norm is usually the more useful and easier to handle of the two, but the strong norm is handy in providing a bound for the product of two matrices, as shown in the next lemma.
Lemma 2.2 Given two n × n matrices G = {g_{k,j}} and H = {h_{k,j}}, then

$$|GH| \le \| G \| \cdot |H|. \tag{2.19}$$
Proof. Expanding terms yields

$$\begin{aligned}
|GH|^2 &= n^{-1} \sum_i \sum_j \Big| \sum_k g_{i,k} h_{k,j} \Big|^2 \\
&= n^{-1} \sum_i \sum_j \sum_k \sum_m g_{i,k} \bar{g}_{i,m} h_{k,j} \bar{h}_{m,j} \\
&= n^{-1} \sum_j h_j^* G^* G h_j,
\end{aligned} \tag{2.20}$$

where h_j is the jth column of H. From (2.10), $(h_j^* G^* G h_j)/(h_j^* h_j) \le \| G \|^2$, and therefore

$$|GH|^2 \le n^{-1} \| G \|^2 \sum_j h_j^* h_j = \| G \|^2 \cdot |H|^2. \qquad \square$$

Lemma 2.2 is the matrix equivalent of 7.3a of ([13], p. 103). Note that the lemma does not require that G or H be Hermitian.
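The bound (2.19) is cheap to test numerically (our own check); note that neither factor is Hermitian here:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5

def weak_norm(M):
    """Hilbert-Schmidt norm |M| = Frobenius norm / sqrt(n), as in (2.14)."""
    return np.linalg.norm(M, "fro") / np.sqrt(M.shape[0])

# Check (2.19), |GH| <= ||G|| |H|, on random non-Hermitian pairs.
for _ in range(100):
    G = rng.standard_normal((n, n))
    H = rng.standard_normal((n, n))
    assert weak_norm(G @ H) <= np.linalg.norm(G, 2) * weak_norm(H) + 1e-12
```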
2.3 Asymptotically Equivalent Matrices
We will be considering n × n matrices that approximate each other when n is large. As might be expected, we will use the weak norm of the difference of two matrices as a measure of the "distance" between them. Two sequences of n × n matrices A_n and B_n are said to be asymptotically equivalent if

1. A_n and B_n are uniformly bounded in strong (and hence in weak) norm:

$$\| A_n \|, \| B_n \| \le M < \infty \tag{2.21}$$

and

2. A_n − B_n = D_n goes to zero in weak norm as n → ∞:

$$\lim_{n\to\infty} |A_n - B_n| = \lim_{n\to\infty} |D_n| = 0.$$

Asymptotic equivalence of A_n and B_n will be abbreviated A_n ∼ B_n. If one of the two matrices is Toeplitz, then the other is said to be asymptotically Toeplitz. We can immediately prove several properties of asymptotic equivalence, which are collected in the following theorem.
Theorem 2.1

1. If A_n ∼ B_n, then

$$\lim_{n\to\infty} |A_n| = \lim_{n\to\infty} |B_n|. \tag{2.22}$$

2. If A_n ∼ B_n and B_n ∼ C_n, then A_n ∼ C_n.

3. If A_n ∼ B_n and C_n ∼ D_n, then A_n C_n ∼ B_n D_n.

4. If A_n ∼ B_n and ‖A_n^{-1}‖, ‖B_n^{-1}‖ ≤ K < ∞, i.e., A_n^{-1} and B_n^{-1} exist and are uniformly bounded by some constant independent of n, then A_n^{-1} ∼ B_n^{-1}.

5. If A_n B_n ∼ C_n and ‖A_n^{-1}‖ ≤ K < ∞, then B_n ∼ A_n^{-1} C_n.

Proof.

1. Eq. (2.22) follows directly from (2.17).

2. $|A_n - C_n| = |A_n - B_n + B_n - C_n| \le |A_n - B_n| + |B_n - C_n| \xrightarrow{n\to\infty} 0.$

3. Applying Lemma 2.2 yields

$$|A_n C_n - B_n D_n| = |A_n C_n - A_n D_n + A_n D_n - B_n D_n| \le \| A_n \| \cdot |C_n - D_n| + \| D_n \| \cdot |A_n - B_n| \xrightarrow{n\to\infty} 0.$$

4.

$$|A_n^{-1} - B_n^{-1}| = |B_n^{-1} B_n A_n^{-1} - B_n^{-1} A_n A_n^{-1}| \le \| B_n^{-1} \| \cdot \| A_n^{-1} \| \cdot |B_n - A_n| \xrightarrow{n\to\infty} 0.$$

5.

$$|B_n - A_n^{-1} C_n| = |A_n^{-1} A_n B_n - A_n^{-1} C_n| \le \| A_n^{-1} \| \cdot |A_n B_n - C_n| \xrightarrow{n\to\infty} 0. \qquad \square$$

The above results will be useful in several of the later proofs.
Asymptotic equivalence of matrices will be shown to imply that eigenvalues, products, and inverses behave similarly. The following lemma provides a prelude of the type of result obtainable for eigenvalues and will itself serve as the essential part of the more general results to follow. It shows that if the weak norm of the difference of two matrices is small, then the sums of the eigenvalues of each must be close.
Lemma 2.3 Given two matrices A and B with eigenvalues α_k and β_k, respectively, then

$$\Big| n^{-1} \sum_{k=0}^{n-1} \alpha_k - n^{-1} \sum_{k=0}^{n-1} \beta_k \Big| = \Big| n^{-1} \sum_{k=0}^{n-1} (\alpha_k - \beta_k) \Big| \le |A - B|.$$

Proof: Define the difference matrix D = A − B = {d_{k,j}} so that

$$\sum_{k=0}^{n-1} \alpha_k - \sum_{k=0}^{n-1} \beta_k = \operatorname{Tr}(A) - \operatorname{Tr}(B) = \operatorname{Tr}(D).$$

Applying the Cauchy-Schwarz inequality (see, e.g., [19], p. 17) to Tr(D) yields

$$|\operatorname{Tr}(D)|^2 = \Big| \sum_{k=0}^{n-1} d_{k,k} \Big|^2 \le n \sum_{k=0}^{n-1} |d_{k,k}|^2 \le n \sum_{k=0}^{n-1} \sum_{j=0}^{n-1} |d_{k,j}|^2 = n^2 |D|^2. \tag{2.23}$$

Taking the square root and dividing by n proves the lemma. □

An immediate consequence of the lemma is the following corollary.
Corollary 2.2 Given two sequences of asymptotically equivalent matrices A_n and B_n with eigenvalues α_{n,k} and β_{n,k}, respectively, then

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \alpha_{n,k} = \lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \beta_{n,k} \tag{2.24}$$

or, equivalently,

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} (\alpha_{n,k} - \beta_{n,k}) = 0. \tag{2.25}$$

Proof. Let D_n = {d_{k,j}} = A_n − B_n. Eq. (2.24) is equivalent to

$$\lim_{n\to\infty} n^{-1} \operatorname{Tr}(D_n) = 0. \tag{2.26}$$

Dividing (2.23) by n² and taking the limit results in

$$0 \le |n^{-1} \operatorname{Tr}(D_n)|^2 \le |D_n|^2 \xrightarrow{n\to\infty} 0 \tag{2.27}$$

from the lemma, which implies (2.26) and hence (2.24). □

The previous corollary can be interpreted as saying that the sample or arithmetic means of the eigenvalues of two matrices are asymptotically equal if the matrices are asymptotically equivalent. It is easy to see that if the matrices are Hermitian, a similar result holds for the means of the squared eigenvalues.
From (2.18) and (2.15),

$$|D_n| \ge \big| |A_n| - |B_n| \big| = \Bigg| \sqrt{n^{-1} \sum_{k=0}^{n-1} \alpha_{n,k}^2} - \sqrt{n^{-1} \sum_{k=0}^{n-1} \beta_{n,k}^2} \Bigg| \xrightarrow{n\to\infty} 0$$

if $|D_n| \xrightarrow{n\to\infty} 0$, yielding the following corollary.
Corollary 2.3 Given two sequences of asymptotically equivalent Hermitian matrices A_n and B_n with eigenvalues α_{n,k} and β_{n,k}, respectively, then

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \alpha_{n,k}^2 = \lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \beta_{n,k}^2 \tag{2.28}$$

or, equivalently,

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} (\alpha_{n,k}^2 - \beta_{n,k}^2) = 0. \tag{2.29}$$
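The bounds behind Lemma 2.3 and Corollaries 2.2-2.3 can be verified directly (our own numerical illustration, using a small Hermitian perturbation):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 40
X = rng.standard_normal((n, n))
A = (X + X.T) / 2                       # Hermitian
P = rng.standard_normal((n, n))
B = A + 0.01 * (P + P.T) / 2            # nearby Hermitian matrix

weak = lambda M: np.linalg.norm(M, "fro") / np.sqrt(n)
a = np.linalg.eigvalsh(A)
b = np.linalg.eigvalsh(B)

# Lemma 2.3: the gap between eigenvalue means is at most |A - B|.
assert abs(a.mean() - b.mean()) <= weak(A - B) + 1e-12

# The squared-eigenvalue analogue: for Hermitian matrices,
# | sqrt(mean a^2) - sqrt(mean b^2) | = | |A| - |B| | <= |A - B|.
assert abs(np.sqrt((a**2).mean()) - np.sqrt((b**2).mean())) <= weak(A - B) + 1e-12
```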
Both corollaries relate limiting sample (arithmetic) averages of eigenvalues or moments of an eigenvalue distribution rather than individual eigenvalues. Equations (2.24) and (2.28) are special cases of the following fundamental theorem of asymptotic eigenvalue distribution.
Theorem 2.2 Let A_n and B_n be asymptotically equivalent sequences of matrices with eigenvalues α_{n,k} and β_{n,k}, respectively. Assume that the eigenvalue moments of either matrix converge, e.g., that

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \alpha_{n,k}^s$$

exists and is finite for any positive integer s. Then

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \alpha_{n,k}^s = \lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \beta_{n,k}^s. \tag{2.30}$$

Proof. Let A_n = B_n + D_n as in Corollary 2.2 and define $\Delta_n \stackrel{\Delta}{=} A_n^s - B_n^s$. Since the eigenvalues of A_n^s are α_{n,k}^s, (2.30) can be written in terms of Δ_n as

$$\lim_{n\to\infty} n^{-1} \operatorname{Tr} \Delta_n = 0. \tag{2.31}$$

The matrix Δ_n is a sum of several terms, each being a product of D_n's and B_n's but containing at least one D_n. Repeated application of Lemma 2.2 thus gives

$$|\Delta_n| \le K' |D_n| \xrightarrow{n\to\infty} 0, \tag{2.32}$$

where K' does not depend on n. Equation (2.32) allows us to apply Corollary 2.2 to the matrices A_n^s and B_n^s to obtain (2.31) and hence (2.30). □

Theorem 2.2 is the fundamental theorem concerning asymptotic eigenvalue behavior. Most of the succeeding results on eigenvalues will be applications or specializations of (2.30).
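Theorem 2.2 can be seen at work numerically (an illustrative sketch of ours; the Wigner-type scaling and the rank-one difference D_n are arbitrary choices that make |A_n − B_n| = 1/√n → 0 while keeping both sequences bounded):

```python
import numpy as np

rng = np.random.default_rng(6)

def weak(M):
    return np.linalg.norm(M, "fro") / np.sqrt(M.shape[0])

# A_n differs from B_n by a rank-one matrix, so |A_n - B_n| -> 0 while both
# stay bounded; the s-th eigenvalue moments draw together as in (2.30).
gaps = []
for n in [20, 80, 320]:
    C = rng.standard_normal((n, n))
    B = (C + C.T) / np.sqrt(n)            # Hermitian with bounded strong norm
    D = np.zeros((n, n)); D[0, 0] = 1.0   # |D_n| = 1/sqrt(n)
    A = B + D
    s = 3
    gaps.append(abs(np.mean(np.linalg.eigvalsh(A) ** s)
                    - np.mean(np.linalg.eigvalsh(B) ** s)))

assert gaps[-1] < gaps[0]   # the moment gap shrinks with n
```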
Since (2.30) holds for any positive integer s, we can add sums corresponding to different values of s to each side of (2.30). This observation immediately yields the following corollary.
Corollary 2.4 Let A_n and B_n be asymptotically equivalent sequences of matrices with eigenvalues α_{n,k} and β_{n,k}, respectively, and let f(x) be any polynomial. Then

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} f(\alpha_{n,k}) = \lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} f(\beta_{n,k}). \tag{2.33}$$

Whether or not A_n and B_n are Hermitian, Corollary 2.4 implies that (2.33) can hold for any analytic function f(x) since such functions can be expanded into complex Taylor series, i.e., into polynomials. If A_n and B_n are Hermitian, however, then a much stronger result is possible. In this case the eigenvalues of both matrices are real and we can invoke the Stone-Weierstrass approximation theorem ([19], p. 146) to immediately generalize Corollary 2.4. This theorem, our one real excursion into analysis, is stated below for reference.
Theorem 2.3 (Stone-Weierstrass) If F(x) is a continuous complex function on [a, b], there exists a sequence of polynomials p_n(x) such that

$$\lim_{n\to\infty} p_n(x) = F(x) \quad \text{uniformly on } [a, b].$$

Stated simply, any continuous function defined on a real interval can be approximated arbitrarily closely by a polynomial. Applying theorem 2.3 to Corollary 2.4 immediately yields the following theorem:
Theorem 2.4 Let A_n and B_n be asymptotically equivalent sequences of Hermitian matrices with eigenvalues α_{n,k} and β_{n,k}, respectively. Since A_n and B_n are bounded, there exist finite numbers m and M such that

$$m \le \alpha_{n,k}, \beta_{n,k} \le M, \qquad n = 1, 2, \ldots, \quad k = 0, 1, \ldots, n-1. \tag{2.34}$$

Let F(x) be an arbitrary function continuous on [m, M]. Then

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} F(\alpha_{n,k}) = \lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} F(\beta_{n,k}) \tag{2.35}$$

if either of the limits exists. Equivalently,

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \big( F(\alpha_{n,k}) - F(\beta_{n,k}) \big) = 0. \tag{2.36}$$

Theorem 2.4 is the matrix equivalent of theorem (7.4a) of [13]. When two real sequences {α_{n,k}; k = 0, 1, . . . , n−1} and {β_{n,k}; k = 0, 1, . . . , n−1} satisfy (2.34)-(2.35), they are said to be asymptotically equally distributed ([13], p. 62, where the definition is attributed to Weyl).
As an example of the use of theorem 2.4 we prove the following corollary on the determinants of asymptotically equivalent matrices.
Corollary 2.5 Let A_n and B_n be asymptotically equivalent Hermitian matrices with eigenvalues α_{n,k} and β_{n,k}, respectively, such that α_{n,k}, β_{n,k} ≥ m > 0. Then

$$\lim_{n\to\infty} (\det A_n)^{1/n} = \lim_{n\to\infty} (\det B_n)^{1/n}. \tag{2.37}$$

Proof. From theorem 2.4 we have for F(x) = ln x

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \ln \alpha_{n,k} = \lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \ln \beta_{n,k}$$

and hence

$$\lim_{n\to\infty} \exp\Bigg[ n^{-1} \ln \prod_{k=0}^{n-1} \alpha_{n,k} \Bigg] = \lim_{n\to\infty} \exp\Bigg[ n^{-1} \ln \prod_{k=0}^{n-1} \beta_{n,k} \Bigg]$$

or equivalently

$$\lim_{n\to\infty} \exp[ n^{-1} \ln \det A_n ] = \lim_{n\to\infty} \exp[ n^{-1} \ln \det B_n ],$$

from which (2.37) follows. □
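Since (det A_n)^{1/n} = exp(n^{-1} Σ_k ln α_{n,k}) is just the geometric mean of the eigenvalues, the quantity in (2.37) is easy to compute stably (our own illustration, using a log-determinant to avoid overflow):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50
C = rng.standard_normal((n, n))
A = C @ C.T / n + 2.0 * np.eye(n)   # Hermitian with eigenvalues >= m = 2 > 0

eigs = np.linalg.eigvalsh(A)
geo_mean = np.exp(np.mean(np.log(eigs)))   # exp(n^{-1} sum ln alpha_k)

# (det A)^{1/n} computed via the numerically stable log-determinant.
sign, logdet = np.linalg.slogdet(A)
assert sign > 0
assert np.isclose(geo_mean, np.exp(logdet / n))
```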
With suitable mathematical care the above corollary can be extended to cases where α_{n,k}, β_{n,k} > 0, but there is no m satisfying the hypothesis of the corollary, i.e., where the eigenvalues can get arbitrarily small but are still strictly positive. (In particular, see the discussion on p. 66 and in Section 3.1 of [13] for the required technical conditions.)
In the preceding sections the concept of asymptotic equivalence of matrices was defined and its implications studied. The main consequences have been the behavior of inverses and products (theorem 2.1) and eigenvalues (theorems 2.2 and 2.4). These theorems do not concern individual entries in the matrices or individual eigenvalues; rather they describe an "average" behavior. Thus saying A_n^{-1} ∼ B_n^{-1} means that |A_n^{-1} − B_n^{-1}| → 0 as n → ∞ and says nothing about convergence of individual entries in the matrix. In certain cases stronger results on a type of elementwise convergence are possible using the stronger norm of Baxter [1, 2]. Baxter's results are beyond the scope of this book.
2.4 Absolutely Equal Eigenvalue Distributions
It is possible to strengthen theorem 2.4 and some of the interim results used in its derivation using reasonably elementary methods. The key additional idea required is the Wielandt-Hoffman theorem [27], a result from matrix theory that is of independent interest. The theorem is stated and a proof following Wilkinson [28] is presented for completeness.
Lemma 2.4 (Wielandt-Hoffman theorem) Given two Hermitian matrices A and B with eigenvalues α_k and β_k in nonincreasing order, respectively, then

$$n^{-1} \sum_{k=0}^{n-1} |\alpha_k - \beta_k|^2 \le |A - B|^2.$$

Proof: Since A and B are Hermitian, we can write them as A = U diag(α_k) U*, B = W diag(β_k) W*, where U and W are unitary. Since the weak norm is not affected by multiplication by a unitary matrix,

$$\begin{aligned}
|A - B| &= |U \operatorname{diag}(\alpha_k) U^* - W \operatorname{diag}(\beta_k) W^*| \\
&= |\operatorname{diag}(\alpha_k) U^* - U^* W \operatorname{diag}(\beta_k) W^*| \\
&= |\operatorname{diag}(\alpha_k) U^* W - U^* W \operatorname{diag}(\beta_k)| \\
&= |\operatorname{diag}(\alpha_k) Q - Q \operatorname{diag}(\beta_k)|,
\end{aligned}$$

where Q = U*W = {q_{i,j}} is also unitary. The (i, j) entry in the matrix diag(α_k)Q − Q diag(β_k) is (α_i − β_j) q_{i,j}, and hence

$$|A - B|^2 = n^{-1} \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} |\alpha_i - \beta_j|^2 |q_{i,j}|^2 \stackrel{\Delta}{=} \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} |\alpha_i - \beta_j|^2 p_{i,j}, \tag{2.38}$$
where we have defined p_{i,j} = n^{-1} |q_{i,j}|^2. Since Q is unitary, we also have that

$$\sum_{i=0}^{n-1} |q_{i,j}|^2 = \sum_{j=0}^{n-1} |q_{i,j}|^2 = 1 \tag{2.39}$$

or

$$\sum_{i=0}^{n-1} p_{i,j} = \sum_{j=0}^{n-1} p_{i,j} = \frac{1}{n}. \tag{2.40}$$
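The Wielandt-Hoffman bound of Lemma 2.4 is easy to confirm numerically (our own check, pairing sorted eigenvalues as the lemma requires):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 6
X = rng.standard_normal((n, n)); A = (X + X.T) / 2   # Hermitian
Y = rng.standard_normal((n, n)); B = (Y + Y.T) / 2   # Hermitian

# Eigenvalues in nonincreasing order, as in Lemma 2.4.
a = np.sort(np.linalg.eigvalsh(A))[::-1]
b = np.sort(np.linalg.eigvalsh(B))[::-1]

lhs = np.mean(np.abs(a - b) ** 2)                    # n^{-1} sum |alpha_k - beta_k|^2
rhs = (np.linalg.norm(A - B, "fro") ** 2) / n        # |A - B|^2
assert lhs <= rhs + 1e-12
```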