Toeplitz and Circulant Matrices: A review
Robert M. Gray
Information Systems Laboratory Department of Electrical Engineering
Stanford University Stanford, California 94305
rmgray@stanford.edu
Revised August 2002

This document is available as an Adobe portable document format (pdf) file at
http://ee.stanford.edu/~gray/toeplitz.pdf
© Robert M. Gray, 1971, 1977, 1993, 1997, 1998, 2000, 2001, 2002.
The preparation of the original report was financed in part by the National Science Foundation and by the Joint Services Program at Stanford. Since then it has been done as a hobby.
Abstract
The fundamental theorems on the asymptotic behavior of eigenvalues, inverses, and products of "finite section" Toeplitz matrices and Toeplitz matrices with absolutely summable elements are derived in a tutorial manner. Mathematical elegance and generality are sacrificed for conceptual simplicity and insight in the hopes of making these results available to engineers lacking either the background or endurance to attack the mathematical literature on the subject. By limiting the generality of the matrices considered, the essential ideas and results can be conveyed in a more intuitive manner without the mathematical machinery required for the most general cases. As an application the results are applied to the study of the covariance matrices and their factors of linear models of discrete time random processes.
Acknowledgements
The author gratefully acknowledges the assistance of Ronald M. Aarts of the Philips Research Labs in correcting many typos and errors in the 1993 revision, Liu Mingyu in pointing out errors corrected in the 1998 revision, Paolo Tilli of the Scuola Normale Superiore of Pisa for pointing out an incorrect corollary and providing the correction, and to David Neuhoff of the University of Michigan for pointing out several typographical errors and some confusing notation. For corrections, comments, and improvements to the 2001 revision thanks are due to William Trench, John Dattorro, and Young-Han Kim. In particular, Professor Trench brought the Wielandt-Hoffman theorem and its use to prove strengthened results to my attention. Section 2.4 largely follows his suggestions, although I take the blame for any introduced errors. For the 2002 revision, particular thanks to Cynthia Pozun of ENST for several corrections.
Contents

1 Introduction
2 The Asymptotic Behavior of Matrices
  2.1 Eigenvalues
  2.2 Matrix Norms
  2.3 Asymptotically Equivalent Matrices
  2.4 Absolutely Equal Eigenvalue Distributions
3 Circulant Matrices
4 Toeplitz Matrices
  4.1 Bounded Toeplitz Matrices
  4.2 Finite Order Toeplitz Matrices
  4.3 Absolutely Summable Toeplitz Matrices
  4.4 Toeplitz Determinants
5 Applications to Stochastic Time Series
  5.1 Moving Average Processes
  5.2 Autoregressive Processes
  5.3 Factorization
  5.4 Differential Entropy Rate of Gaussian Processes
Bibliography
Chapter 1

Introduction
A Toeplitz matrix is an n × n matrix T_n = [t_{k,j}] with t_{k,j} = t_{k−j}, i.e., a matrix of the form

$$
T_n = \begin{bmatrix}
t_0 & t_{-1} & t_{-2} & \cdots & t_{-(n-1)} \\
t_1 & t_0 & t_{-1} & & \\
t_2 & t_1 & t_0 & & \vdots \\
\vdots & & & \ddots & \\
t_{n-1} & \cdots & & & t_0
\end{bmatrix}. \tag{1.1}
$$
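The structure in (1.1) is easy to realize numerically. The following sketch (not from the original text; the helper name and the sample sequence t_k = 2^{−|k|} are our own choices) builds T_n directly from the definition t_{k,j} = t_{k−j}:

```python
import numpy as np

def toeplitz_matrix(t, n):
    """Build the n-by-n Toeplitz matrix with entries T[k, j] = t(k - j).

    `t` maps an integer lag in -(n-1), ..., n-1 to the element t_lag.
    """
    return np.array([[t(k - j) for j in range(n)] for k in range(n)])

# Example: t_k = 2^{-|k|}, an absolutely summable symmetric sequence.
T4 = toeplitz_matrix(lambda k: 2.0 ** (-abs(k)), 4)

# Each diagonal is constant: T[k, j] depends only on k - j.
assert all(T4[k, j] == T4[k + 1, j + 1] for k in range(3) for j in range(3))
```

SciPy users can obtain the same matrix from `scipy.linalg.toeplitz`, which takes the first column and first row as arguments.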
Examples of such matrices are covariance matrices of weakly stationary stochastic time series and matrix representations of linear time-invariant discrete time filters. There are numerous other applications in mathematics, physics, information theory, estimation theory, etc. A great deal is known about the behavior of such matrices — the most common and complete references being Grenander and Szegö [13] and Widom [26]. A more recent text devoted to the subject is Böttcher and Silbermann [4]. Unfortunately, however, the necessary level of mathematical sophistication for understanding reference [13] is frequently beyond that of one species of applied mathematician for whom the theory can be quite useful but is relatively little understood. This caste consists of engineers doing relatively mathematical (for an engineering background) work in any of the areas mentioned. This apparent dilemma provides the motivation for attempting a tutorial introduction on Toeplitz matrices that proves the essential theorems using the simplest possible and most intuitive mathematics. Some simple and fundamental methods that are deeply buried (at least to the untrained mathematician) in [13] are here made explicit.
In addition to the fundamental theorems, several related results that naturally follow but do not appear to be collected together anywhere are presented.
The essential prerequisites for this book are a knowledge of matrix theory, an engineer's knowledge of Fourier series and random processes, and calculus (Riemann integration). A first course in analysis would be helpful, but it is not assumed. The few results from analysis that are occasionally required are usually contained in one or more courses in the usual engineering curriculum, e.g., the Cauchy-Schwarz and triangle inequalities. Hopefully the only unfamiliar results are a corollary to the Courant-Fischer theorem and the Weierstrass approximation theorem. The latter is an intuitive result which is easily believed even if not formally proved. More advanced results from Lebesgue integration, functional analysis, and Fourier series are not used.
The approach of this book is to relate the properties of Toeplitz matrices to those of their simpler, more structured cousin — the circulant or cyclic matrix. These two matrices are shown to be asymptotically equivalent in a certain sense, and this is shown to imply that eigenvalues, inverses, products, and determinants behave similarly. This approach provides a simplified and direct path (in the author's view) to the basic eigenvalue distribution and related theorems. This method is implicit but not immediately apparent in the more complicated and more general results of Grenander in Chapter 7 of [13]. The basic results for the special case of a finite order Toeplitz matrix appeared in [10], a tutorial treatment of the simplest case which was in turn based on the first draft of this work. The results were subsequently generalized using essentially the same simple methods, but they remain less general than those of [13].
As an application several of the results are applied to study certain models of discrete time random processes. Two common linear models are studied and some intuitively satisfying results on covariance matrices and their factors are given. As an example from Shannon information theory, the Toeplitz results regarding the limiting behavior of determinants are applied to find the differential entropy rate of a stationary Gaussian random process.
We sacrifice mathematical elegance and generality for conceptual simplicity in the hope that this will bring an understanding of the interesting and useful properties of Toeplitz matrices to a wider audience, specifically to those who have lacked either the background or the patience to tackle the mathematical literature on the subject.
Chapter 2
The Asymptotic Behavior of Matrices
In this chapter we begin with relevant definitions and a prerequisite theorem and proceed to a discussion of the asymptotic eigenvalue, product, and inverse behavior of sequences of matrices. The major use of the theorems of this chapter is that we can often study the asymptotic behavior of complicated matrices by studying a more structured and simpler asymptotically equivalent matrix, as will be developed in subsequent chapters.
2.1 Eigenvalues
The eigenvalues α_k and the (right) eigenvectors (n-tuples) x_k of an n × n matrix A are the solutions to the equation

$$Ax = \alpha x \tag{2.1}$$

and hence the eigenvalues are the roots of the characteristic equation of A:

$$\det(A - \alpha I) = 0. \tag{2.2}$$

Unless specifically stated otherwise, we will always assume that the eigenvalues are ordered in nonincreasing fashion, i.e., α_1 ≥ α_2 ≥ α_3 ≥ · · ·.

Any complex matrix A can be written as

$$A = U R U^*, \tag{2.3}$$
where the asterisk * denotes conjugate transpose, U is unitary, i.e., U^{-1} = U*, and R = {r_{k,j}} is an upper triangular matrix ([15], p. 79). The eigenvalues of A are the principal diagonal elements of R. If A is normal, i.e., if A*A = AA*, then R is a diagonal matrix, which we denote as R = diag(α_k). If A is Hermitian, i.e., if A* = A, then the eigenvalues are real. If a matrix is Hermitian, then it is also normal.
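The decomposition (2.3) can be inspected numerically via SciPy's complex Schur factorization; the sketch below (our own illustration, on an arbitrary random matrix) checks the properties just listed:

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

# Schur form (2.3): A = U R U* with U unitary and R upper triangular.
R, U = schur(A, output="complex")
assert np.allclose(U @ U.conj().T, np.eye(4))   # U is unitary
assert np.allclose(U @ R @ U.conj().T, A)       # decomposition reproduces A
assert np.allclose(np.tril(R, -1), 0)           # R is upper triangular
# The eigenvalues of A sit on the diagonal of R; their sum is the trace.
assert np.isclose(np.trace(A), np.diag(R).sum())

# If A is normal (here: Hermitian), R is diagonal with real entries.
H = A + A.conj().T
Rh, Uh = schur(H, output="complex")
assert np.allclose(Rh, np.diag(np.diag(Rh)))    # diagonal
assert np.allclose(np.diag(Rh).imag, 0)         # real eigenvalues
```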
For the case of Hermitian matrices, a useful description of the eigenvalues is the variational description given by the Courant-Fischer theorem ([15], p. 116). While we will not have direct need of this theorem, we will use the following important corollary, which is stated below without proof.
Corollary 2.1 Define the Rayleigh quotient of an Hermitian matrix H and a vector (complex n-tuple) x by

$$R_H(x) = \frac{x^* H x}{x^* x}. \tag{2.4}$$

Let η_M and η_m be the maximum and minimum eigenvalues of H, respectively. Then

$$\eta_m = \min_x R_H(x) = \min_{x\colon x^*x = 1} x^* H x \tag{2.5}$$

$$\eta_M = \max_x R_H(x) = \max_{x\colon x^*x = 1} x^* H x \tag{2.6}$$
This corollary will be useful in specifying the interval containing the eigen- values of an Hermitian matrix.
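A quick numerical sanity check of Corollary 2.1 (an illustration of ours, not part of the original): for a random real symmetric H, the Rayleigh quotient of every trial vector lies between η_m and η_M, and the extremes are attained at eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
H = B + B.T  # real symmetric, a special case of Hermitian

eigs = np.linalg.eigvalsh(H)       # real eigenvalues, ascending order
eta_m, eta_M = eigs[0], eigs[-1]

def rayleigh(H, x):
    return (x @ H @ x) / (x @ x)

# Random trial vectors: R_H(x) always lies in [eta_m, eta_M].
for _ in range(200):
    x = rng.standard_normal(5)
    assert eta_m - 1e-9 <= rayleigh(H, x) <= eta_M + 1e-9

# The extremes are attained at the corresponding eigenvectors.
w, V = np.linalg.eigh(H)
assert np.isclose(rayleigh(H, V[:, 0]), eta_m)
assert np.isclose(rayleigh(H, V[:, -1]), eta_M)
```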
The following lemma is useful when studying non-Hermitian matrices and products of Hermitian matrices.
Lemma 2.1 Let A be a matrix with eigenvalues α_k. Define the eigenvalues of the Hermitian matrix A*A to be λ_k. Then

$$\sum_{k=0}^{n-1} \lambda_k \ge \sum_{k=0}^{n-1} |\alpha_k|^2, \tag{2.7}$$

with equality iff (if and only if) A is normal.
Proof. The trace of a matrix is the sum of its diagonal elements. The trace is invariant to unitary operations, so it is also equal to the sum of the eigenvalues of a matrix, i.e.,

$$\operatorname{Tr}\{A^*A\} = \sum_{k=0}^{n-1} (A^*A)_{k,k} = \sum_{k=0}^{n-1} \lambda_k. \tag{2.8}$$
We have

$$\begin{aligned}
\operatorname{Tr}\{A^*A\} &= \operatorname{Tr}\{R^*R\} \\
&= \sum_{k=0}^{n-1} \sum_{j=0}^{n-1} |r_{j,k}|^2 \\
&= \sum_{k=0}^{n-1} |\alpha_k|^2 + \sum_{k \ne j} |r_{j,k}|^2 \\
&\ge \sum_{k=0}^{n-1} |\alpha_k|^2. \tag{2.9}
\end{aligned}$$

Equation (2.9) will hold with equality iff R is diagonal and hence iff A is normal. □
Lemma 2.1 is a direct consequence of Schur's theorem ([15], pp. 229-231) and is also proved in [13], p. 106.
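Lemma 2.1 is also easy to check numerically (our own illustration): a random real matrix is generically not normal, so the inequality (2.7) is strict, while a symmetric matrix gives equality.

```python
import numpy as np

rng = np.random.default_rng(2)

# Non-normal A: the sum of the eigenvalues of A*A exceeds sum |alpha_k|^2.
A = rng.standard_normal((4, 4))
alphas = np.linalg.eigvals(A)
lambdas = np.linalg.eigvalsh(A.T @ A)
assert lambdas.sum() > (np.abs(alphas) ** 2).sum()

# Normal (here symmetric) N: equality holds in (2.7).
N = A + A.T
alphas_n = np.linalg.eigvals(N)
lambdas_n = np.linalg.eigvalsh(N.T @ N)
assert np.isclose(lambdas_n.sum(), (np.abs(alphas_n) ** 2).sum())
```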
2.2 Matrix Norms
To study the asymptotic equivalence of matrices we require a metric, or equivalently a norm, of the appropriate kind. Two norms — the operator or strong norm and the Hilbert-Schmidt or weak norm (also called the Frobenius norm or Euclidean norm when the scaling term n is removed) — will be used here ([13], pp. 102-103).
Let A be a matrix with eigenvalues α_k and let λ_k be the eigenvalues of the Hermitian matrix A*A. The strong norm ‖A‖ is defined by

$$\| A \| = \max_x R_{A^*A}(x)^{1/2} = \max_{x\colon x^*x=1} \left[ x^* A^* A x \right]^{1/2}. \tag{2.10}$$

From Corollary 2.1,

$$\| A \|^2 = \max_k \lambda_k \stackrel{\Delta}{=} \lambda_M. \tag{2.11}$$

The strong norm of A can be bounded below by letting e_M be the eigenvector of A corresponding to α_M, the eigenvalue of A having largest absolute value:

$$\| A \|^2 = \max_{x\colon x^*x=1} x^* A^* A x \ge (e_M^* A^*)(A e_M) = |\alpha_M|^2. \tag{2.12}$$
If A is itself Hermitian, then its eigenvalues α_k are real and the eigenvalues λ_k of A*A are simply λ_k = α_k². This follows since if e^{(k)} is an eigenvector of A with eigenvalue α_k, then A*Ae^{(k)} = α_k A*e^{(k)} = α_k² e^{(k)}. Thus, in particular, if A is Hermitian then

$$\| A \| = \max_k |\alpha_k| = |\alpha_M|. \tag{2.13}$$
The weak norm (or Hilbert-Schmidt norm) of an n × n matrix A = {a_{k,j}} is defined by

$$|A| = \left( n^{-1} \sum_{k=0}^{n-1} \sum_{j=0}^{n-1} |a_{k,j}|^2 \right)^{1/2} = \left( n^{-1} \operatorname{Tr}[A^*A] \right)^{1/2} = \left( n^{-1} \sum_{k=0}^{n-1} \lambda_k \right)^{1/2}. \tag{2.14}$$

The quantity $\sqrt{n}\,|A|$ is sometimes called the Frobenius norm or Euclidean norm. From Lemma 2.1 we have

$$|A|^2 \ge n^{-1} \sum_{k=0}^{n-1} |\alpha_k|^2, \quad \text{with equality iff } A \text{ is normal.} \tag{2.15}$$

The Hilbert-Schmidt norm is the "weaker" of the two norms since

$$\| A \|^2 = \max_k \lambda_k \ge n^{-1} \sum_{k=0}^{n-1} \lambda_k = |A|^2. \tag{2.16}$$

A matrix is said to be bounded if it is bounded in both norms.
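In NumPy terms (an illustration of ours), the strong norm is the spectral norm and the weak norm is the Frobenius norm scaled by 1/√n; the sketch verifies (2.11), (2.14), and (2.16) on a random matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
A = rng.standard_normal((n, n))

strong = np.linalg.norm(A, 2)                 # operator (spectral) norm ||A||
weak = np.linalg.norm(A, "fro") / np.sqrt(n)  # Hilbert-Schmidt norm |A|

lambdas = np.linalg.eigvalsh(A.T @ A)         # eigenvalues of A*A

# (2.11): ||A||^2 is the largest eigenvalue of A*A.
assert np.isclose(strong ** 2, lambdas.max())

# (2.14): |A|^2 is the arithmetic mean of those eigenvalues.
assert np.isclose(weak ** 2, lambdas.mean())

# (2.16): the weak norm never exceeds the strong norm.
assert weak <= strong + 1e-12
```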
Note that both the strong and the weak norms are in fact norms in the linear space of matrices, i.e., both satisfy the following three axioms:

1. ‖A‖ ≥ 0, with equality iff A = 0, the all-zero matrix;
2. ‖A + B‖ ≤ ‖A‖ + ‖B‖;
3. ‖cA‖ = |c| · ‖A‖. (2.17)
The triangle inequality in (2.17) will be used often, as is the following direct consequence:

$$\| A - B \| \ge \Big| \| A \| - \| B \| \Big|. \tag{2.18}$$

The weak norm is usually the more useful and easier to handle of the two, but the strong norm is handy in providing a bound for the product of two matrices, as shown in the next lemma.
Lemma 2.2 Given two n × n matrices G = {g_{k,j}} and H = {h_{k,j}}, then

$$|GH| \le \| G \| \cdot |H|. \tag{2.19}$$
Proof. Expanding terms yields

$$\begin{aligned}
|GH|^2 &= n^{-1} \sum_i \sum_j \Big| \sum_k g_{i,k} h_{k,j} \Big|^2 \\
&= n^{-1} \sum_i \sum_j \sum_k \sum_m g_{i,k} \bar{g}_{i,m} h_{k,j} \bar{h}_{m,j} \\
&= n^{-1} \sum_j h_j^* G^* G h_j,
\end{aligned} \tag{2.20}$$

where h_j is the jth column of H. From (2.10), $(h_j^* G^* G h_j)/(h_j^* h_j) \le \| G \|^2$, and therefore

$$|GH|^2 \le n^{-1} \| G \|^2 \sum_j h_j^* h_j = \| G \|^2 \cdot |H|^2. \qquad \square$$

Lemma 2.2 is the matrix equivalent of 7.3a of ([13], p. 103). Note that the lemma does not require that G or H be Hermitian.
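The bound (2.19) is cheap to test numerically (our own check); note that neither factor is Hermitian here:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5

def weak_norm(M):
    """Hilbert-Schmidt norm |M| = Frobenius norm / sqrt(n), as in (2.14)."""
    return np.linalg.norm(M, "fro") / np.sqrt(M.shape[0])

# Check (2.19), |GH| <= ||G|| |H|, on random non-Hermitian pairs.
for _ in range(100):
    G = rng.standard_normal((n, n))
    H = rng.standard_normal((n, n))
    assert weak_norm(G @ H) <= np.linalg.norm(G, 2) * weak_norm(H) + 1e-12
```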
2.3 Asymptotically Equivalent Matrices
We will be considering n × n matrices that approximate each other when n is large. As might be expected, we will use the weak norm of the difference of two matrices as a measure of the "distance" between them. Two sequences of n × n matrices A_n and B_n are said to be asymptotically equivalent if

1. A_n and B_n are uniformly bounded in strong (and hence in weak) norm:

$$\| A_n \|, \| B_n \| \le M < \infty \tag{2.21}$$

and

2. A_n − B_n = D_n goes to zero in weak norm as n → ∞:

$$\lim_{n\to\infty} |A_n - B_n| = \lim_{n\to\infty} |D_n| = 0.$$

Asymptotic equivalence of A_n and B_n will be abbreviated A_n ∼ B_n. If one of the two matrices is Toeplitz, then the other is said to be asymptotically Toeplitz. We can immediately prove several properties of asymptotic equivalence, which are collected in the following theorem.
Theorem 2.1

1. If A_n ∼ B_n, then

$$\lim_{n\to\infty} |A_n| = \lim_{n\to\infty} |B_n|. \tag{2.22}$$

2. If A_n ∼ B_n and B_n ∼ C_n, then A_n ∼ C_n.

3. If A_n ∼ B_n and C_n ∼ D_n, then A_n C_n ∼ B_n D_n.

4. If A_n ∼ B_n and ‖A_n^{-1}‖, ‖B_n^{-1}‖ ≤ K < ∞, i.e., A_n^{-1} and B_n^{-1} exist and are uniformly bounded by some constant independent of n, then A_n^{-1} ∼ B_n^{-1}.

5. If A_n B_n ∼ C_n and ‖A_n^{-1}‖ ≤ K < ∞, then B_n ∼ A_n^{-1} C_n.

Proof.

1. Eq. (2.22) follows directly from (2.17).

2. $|A_n - C_n| = |A_n - B_n + B_n - C_n| \le |A_n - B_n| + |B_n - C_n| \xrightarrow{n\to\infty} 0.$

3. Applying Lemma 2.2 yields

$$|A_n C_n - B_n D_n| = |A_n C_n - A_n D_n + A_n D_n - B_n D_n| \le \| A_n \| \cdot |C_n - D_n| + \| D_n \| \cdot |A_n - B_n| \xrightarrow{n\to\infty} 0.$$

4.

$$|A_n^{-1} - B_n^{-1}| = |B_n^{-1} B_n A_n^{-1} - B_n^{-1} A_n A_n^{-1}| \le \| B_n^{-1} \| \cdot \| A_n^{-1} \| \cdot |B_n - A_n| \xrightarrow{n\to\infty} 0.$$

5.

$$|B_n - A_n^{-1} C_n| = |A_n^{-1} A_n B_n - A_n^{-1} C_n| \le \| A_n^{-1} \| \cdot |A_n B_n - C_n| \xrightarrow{n\to\infty} 0. \qquad \square$$

The above results will be useful in several of the later proofs.
Asymptotic equivalence of matrices will be shown to imply that eigenvalues, products, and inverses behave similarly. The following lemma provides a prelude of the type of result obtainable for eigenvalues and will itself serve as the essential part of the more general results to follow. It shows that if the weak norm of the difference of two matrices is small, then the sums of the eigenvalues of each must be close.
Lemma 2.3 Given two matrices A and B with eigenvalues α_k and β_k, respectively, then

$$\Big| n^{-1} \sum_{k=0}^{n-1} \alpha_k - n^{-1} \sum_{k=0}^{n-1} \beta_k \Big| = \Big| n^{-1} \sum_{k=0}^{n-1} (\alpha_k - \beta_k) \Big| \le |A - B|.$$

Proof: Define the difference matrix D = A − B = {d_{k,j}} so that

$$\sum_{k=0}^{n-1} \alpha_k - \sum_{k=0}^{n-1} \beta_k = \operatorname{Tr}(A) - \operatorname{Tr}(B) = \operatorname{Tr}(D).$$

Applying the Cauchy-Schwarz inequality (see, e.g., [19], p. 17) to Tr(D) yields

$$|\operatorname{Tr}(D)|^2 = \Big| \sum_{k=0}^{n-1} d_{k,k} \Big|^2 \le n \sum_{k=0}^{n-1} |d_{k,k}|^2 \le n \sum_{k=0}^{n-1} \sum_{j=0}^{n-1} |d_{k,j}|^2 = n^2 |D|^2. \tag{2.23}$$

Taking the square root and dividing by n proves the lemma. □

An immediate consequence of the lemma is the following corollary.
Corollary 2.2 Given two sequences of asymptotically equivalent matrices A_n and B_n with eigenvalues α_{n,k} and β_{n,k}, respectively, then

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \alpha_{n,k} = \lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \beta_{n,k} \tag{2.24}$$

or, equivalently,

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} (\alpha_{n,k} - \beta_{n,k}) = 0. \tag{2.25}$$

Proof. Let D_n = {d_{k,j}} = A_n − B_n. Eq. (2.24) is equivalent to

$$\lim_{n\to\infty} n^{-1} \operatorname{Tr}(D_n) = 0. \tag{2.26}$$

Dividing (2.23) by n² and taking the limit results in

$$0 \le |n^{-1} \operatorname{Tr}(D_n)|^2 \le |D_n|^2 \xrightarrow{n\to\infty} 0 \tag{2.27}$$

from the lemma, which implies (2.26) and hence (2.24). □

The previous corollary can be interpreted as saying that the sample or arithmetic means of the eigenvalues of two matrices are asymptotically equal if the matrices are asymptotically equivalent. It is easy to see that if the matrices are Hermitian, a similar result holds for the means of the squared eigenvalues.
From (2.18) and (2.15),

$$|D_n| \ge \big| |A_n| - |B_n| \big| = \Bigg| \sqrt{n^{-1} \sum_{k=0}^{n-1} \alpha_{n,k}^2} - \sqrt{n^{-1} \sum_{k=0}^{n-1} \beta_{n,k}^2} \Bigg| \xrightarrow{n\to\infty} 0$$

if $|D_n| \xrightarrow{n\to\infty} 0$, yielding the following corollary.
Corollary 2.3 Given two sequences of asymptotically equivalent Hermitian matrices A_n and B_n with eigenvalues α_{n,k} and β_{n,k}, respectively, then

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \alpha_{n,k}^2 = \lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \beta_{n,k}^2 \tag{2.28}$$

or, equivalently,

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} (\alpha_{n,k}^2 - \beta_{n,k}^2) = 0. \tag{2.29}$$
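The bounds behind Lemma 2.3 and Corollaries 2.2-2.3 can be verified directly (our own numerical illustration, using a small Hermitian perturbation):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 40
X = rng.standard_normal((n, n))
A = (X + X.T) / 2                       # Hermitian
P = rng.standard_normal((n, n))
B = A + 0.01 * (P + P.T) / 2            # nearby Hermitian matrix

weak = lambda M: np.linalg.norm(M, "fro") / np.sqrt(n)
a = np.linalg.eigvalsh(A)
b = np.linalg.eigvalsh(B)

# Lemma 2.3: the gap between eigenvalue means is at most |A - B|.
assert abs(a.mean() - b.mean()) <= weak(A - B) + 1e-12

# The squared-eigenvalue analogue: for Hermitian matrices,
# | sqrt(mean a^2) - sqrt(mean b^2) | = | |A| - |B| | <= |A - B|.
assert abs(np.sqrt((a**2).mean()) - np.sqrt((b**2).mean())) <= weak(A - B) + 1e-12
```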
Both corollaries relate limiting sample (arithmetic) averages of eigenvalues or moments of an eigenvalue distribution rather than individual eigenvalues. Equations (2.24) and (2.28) are special cases of the following fundamental theorem of asymptotic eigenvalue distribution.
Theorem 2.2 Let A_n and B_n be asymptotically equivalent sequences of matrices with eigenvalues α_{n,k} and β_{n,k}, respectively. Assume that the eigenvalue moments of either matrix converge, e.g., that

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \alpha_{n,k}^s$$

exists and is finite for any positive integer s. Then

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \alpha_{n,k}^s = \lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \beta_{n,k}^s. \tag{2.30}$$

Proof. Let A_n = B_n + D_n as in Corollary 2.2 and define $\Delta_n \stackrel{\Delta}{=} A_n^s - B_n^s$. Since the eigenvalues of A_n^s are α_{n,k}^s, (2.30) can be written in terms of Δ_n as

$$\lim_{n\to\infty} n^{-1} \operatorname{Tr} \Delta_n = 0. \tag{2.31}$$

The matrix Δ_n is a sum of several terms, each being a product of D_n's and B_n's but containing at least one D_n. Repeated application of Lemma 2.2 thus gives

$$|\Delta_n| \le K' |D_n| \xrightarrow{n\to\infty} 0, \tag{2.32}$$

where K' does not depend on n. Equation (2.32) allows us to apply Corollary 2.2 to the matrices A_n^s and B_n^s to obtain (2.31) and hence (2.30). □

Theorem 2.2 is the fundamental theorem concerning asymptotic eigenvalue behavior. Most of the succeeding results on eigenvalues will be applications or specializations of (2.30).
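Theorem 2.2 can be seen at work numerically (an illustrative sketch of ours; the Wigner-type scaling and the rank-one difference D_n are arbitrary choices that make |A_n − B_n| = 1/√n → 0 while keeping both sequences bounded):

```python
import numpy as np

rng = np.random.default_rng(6)

def weak(M):
    return np.linalg.norm(M, "fro") / np.sqrt(M.shape[0])

# A_n differs from B_n by a rank-one matrix, so |A_n - B_n| -> 0 while both
# stay bounded; the s-th eigenvalue moments draw together as in (2.30).
gaps = []
for n in [20, 80, 320]:
    C = rng.standard_normal((n, n))
    B = (C + C.T) / np.sqrt(n)            # Hermitian with bounded strong norm
    D = np.zeros((n, n)); D[0, 0] = 1.0   # |D_n| = 1/sqrt(n)
    A = B + D
    s = 3
    gaps.append(abs(np.mean(np.linalg.eigvalsh(A) ** s)
                    - np.mean(np.linalg.eigvalsh(B) ** s)))

assert gaps[-1] < gaps[0]   # the moment gap shrinks with n
```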
Since (2.30) holds for any positive integer s, we can add sums corresponding to different values of s to each side of (2.30). This observation immediately yields the following corollary.
Corollary 2.4 Let A_n and B_n be asymptotically equivalent sequences of matrices with eigenvalues α_{n,k} and β_{n,k}, respectively, and let f(x) be any polynomial. Then

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} f(\alpha_{n,k}) = \lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} f(\beta_{n,k}). \tag{2.33}$$

Whether or not A_n and B_n are Hermitian, Corollary 2.4 implies that (2.33) can hold for any analytic function f(x) since such functions can be expanded into complex Taylor series, i.e., into polynomials. If A_n and B_n are Hermitian, however, then a much stronger result is possible. In this case the eigenvalues of both matrices are real and we can invoke the Stone-Weierstrass approximation theorem ([19], p. 146) to immediately generalize Corollary 2.4. This theorem, our one real excursion into analysis, is stated below for reference.
Theorem 2.3 (Stone-Weierstrass) If F(x) is a continuous complex function on [a, b], there exists a sequence of polynomials p_n(x) such that

$$\lim_{n\to\infty} p_n(x) = F(x) \quad \text{uniformly on } [a, b].$$

Stated simply, any continuous function defined on a real interval can be approximated arbitrarily closely by a polynomial. Applying theorem 2.3 to Corollary 2.4 immediately yields the following theorem:
Theorem 2.4 Let A_n and B_n be asymptotically equivalent sequences of Hermitian matrices with eigenvalues α_{n,k} and β_{n,k}, respectively. Since A_n and B_n are bounded, there exist finite numbers m and M such that

$$m \le \alpha_{n,k}, \beta_{n,k} \le M, \qquad n = 1, 2, \ldots, \quad k = 0, 1, \ldots, n-1. \tag{2.34}$$

Let F(x) be an arbitrary function continuous on [m, M]. Then

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} F(\alpha_{n,k}) = \lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} F(\beta_{n,k}) \tag{2.35}$$

if either of the limits exists. Equivalently,

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \big( F(\alpha_{n,k}) - F(\beta_{n,k}) \big) = 0. \tag{2.36}$$

Theorem 2.4 is the matrix equivalent of theorem (7.4a) of [13]. When two real sequences {α_{n,k}; k = 0, 1, . . . , n−1} and {β_{n,k}; k = 0, 1, . . . , n−1} satisfy (2.34)-(2.35), they are said to be asymptotically equally distributed ([13], p. 62, where the definition is attributed to Weyl).
As an example of the use of theorem 2.4 we prove the following corollary on the determinants of asymptotically equivalent matrices.
Corollary 2.5 Let A_n and B_n be asymptotically equivalent Hermitian matrices with eigenvalues α_{n,k} and β_{n,k}, respectively, such that α_{n,k}, β_{n,k} ≥ m > 0. Then

$$\lim_{n\to\infty} (\det A_n)^{1/n} = \lim_{n\to\infty} (\det B_n)^{1/n}. \tag{2.37}$$

Proof. From theorem 2.4 we have for F(x) = ln x

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \ln \alpha_{n,k} = \lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \ln \beta_{n,k}$$

and hence

$$\lim_{n\to\infty} \exp\Bigg[ n^{-1} \ln \prod_{k=0}^{n-1} \alpha_{n,k} \Bigg] = \lim_{n\to\infty} \exp\Bigg[ n^{-1} \ln \prod_{k=0}^{n-1} \beta_{n,k} \Bigg]$$

or equivalently

$$\lim_{n\to\infty} \exp[ n^{-1} \ln \det A_n ] = \lim_{n\to\infty} \exp[ n^{-1} \ln \det B_n ],$$

from which (2.37) follows. □
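Since (det A_n)^{1/n} = exp(n^{-1} Σ_k ln α_{n,k}) is just the geometric mean of the eigenvalues, the quantity in (2.37) is easy to compute stably (our own illustration, using a log-determinant to avoid overflow):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50
C = rng.standard_normal((n, n))
A = C @ C.T / n + 2.0 * np.eye(n)   # Hermitian with eigenvalues >= m = 2 > 0

eigs = np.linalg.eigvalsh(A)
geo_mean = np.exp(np.mean(np.log(eigs)))   # exp(n^{-1} sum ln alpha_k)

# (det A)^{1/n} computed via the numerically stable log-determinant.
sign, logdet = np.linalg.slogdet(A)
assert sign > 0
assert np.isclose(geo_mean, np.exp(logdet / n))
```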
With suitable mathematical care the above corollary can be extended to cases where α_{n,k}, β_{n,k} > 0, but there is no m satisfying the hypothesis of the corollary, i.e., where the eigenvalues can get arbitrarily small but are still strictly positive. (In particular, see the discussion on p. 66 and in Section 3.1 of [13] for the required technical conditions.)
In the preceding sections the concept of asymptotic equivalence of matrices was defined and its implications studied. The main consequences have been the behavior of inverses and products (theorem 2.1) and eigenvalues (theorems 2.2 and 2.4). These theorems do not concern individual entries in the matrices or individual eigenvalues; rather they describe an "average" behavior. Thus saying A_n^{-1} ∼ B_n^{-1} means that |A_n^{-1} − B_n^{-1}| → 0 as n → ∞ and says nothing about convergence of individual entries in the matrix. In certain cases stronger results on a type of elementwise convergence are possible using the stronger norm of Baxter [1, 2]. Baxter's results are beyond the scope of this book.
2.4 Absolutely Equal Eigenvalue Distributions
It is possible to strengthen theorem 2.4 and some of the interim results used in its derivation using reasonably elementary methods. The key additional idea required is the Wielandt-Hoffman theorem [27], a result from matrix theory that is of independent interest. The theorem is stated and a proof following Wilkinson [28] is presented for completeness.
Lemma 2.4 (Wielandt-Hoffman theorem) Given two Hermitian matrices A and B with eigenvalues α_k and β_k in nonincreasing order, respectively, then

$$n^{-1} \sum_{k=0}^{n-1} |\alpha_k - \beta_k|^2 \le |A - B|^2.$$

Proof: Since A and B are Hermitian, we can write them as A = U diag(α_k) U*, B = W diag(β_k) W*, where U and W are unitary. Since the weak norm is not affected by multiplication by a unitary matrix,

$$\begin{aligned}
|A - B| &= |U \operatorname{diag}(\alpha_k) U^* - W \operatorname{diag}(\beta_k) W^*| \\
&= |\operatorname{diag}(\alpha_k) U^* - U^* W \operatorname{diag}(\beta_k) W^*| \\
&= |\operatorname{diag}(\alpha_k) U^* W - U^* W \operatorname{diag}(\beta_k)| \\
&= |\operatorname{diag}(\alpha_k) Q - Q \operatorname{diag}(\beta_k)|,
\end{aligned}$$

where Q = U*W = {q_{i,j}} is also unitary. The (i, j) entry in the matrix diag(α_k)Q − Q diag(β_k) is (α_i − β_j) q_{i,j}, and hence

$$|A - B|^2 = n^{-1} \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} |\alpha_i - \beta_j|^2 |q_{i,j}|^2 \stackrel{\Delta}{=} \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} |\alpha_i - \beta_j|^2 p_{i,j}, \tag{2.38}$$
where we have defined p_{i,j} = n^{-1} |q_{i,j}|^2. Since Q is unitary, we also have that

$$\sum_{i=0}^{n-1} |q_{i,j}|^2 = \sum_{j=0}^{n-1} |q_{i,j}|^2 = 1 \tag{2.39}$$

or

$$\sum_{i=0}^{n-1} p_{i,j} = \sum_{j=0}^{n-1} p_{i,j} = \frac{1}{n}. \tag{2.40}$$
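The Wielandt-Hoffman bound of Lemma 2.4 is easy to confirm numerically (our own check, pairing sorted eigenvalues as the lemma requires):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 6
X = rng.standard_normal((n, n)); A = (X + X.T) / 2   # Hermitian
Y = rng.standard_normal((n, n)); B = (Y + Y.T) / 2   # Hermitian

# Eigenvalues in nonincreasing order, as in Lemma 2.4.
a = np.sort(np.linalg.eigvalsh(A))[::-1]
b = np.sort(np.linalg.eigvalsh(B))[::-1]

lhs = np.mean(np.abs(a - b) ** 2)                    # n^{-1} sum |alpha_k - beta_k|^2
rhs = (np.linalg.norm(A - B, "fro") ** 2) / n        # |A - B|^2
assert lhs <= rhs + 1e-12
```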