
Combining (5.36) and (5.37) yields

$$\lim_{n\to\infty} |B_n^{-1} T_n - I_n| = 0. \tag{5.38}$$

Applying theorem 2.1 yields (5.33). □

Since the only real requirements for the proof were the existence of the Wiener-Hopf factorization and the limiting behavior of the determinant, this result could easily be extended to the more general case that ln f(λ) is integrable. The theorem can also be derived as a special case of more general results of Baxter [1] and is similar to a result of Rissanen and Barbosa [18].

5.4 Differential Entropy Rate of Gaussian Processes

As a final application of the Toeplitz eigenvalue distribution theorem, we consider a property of a random process that arises in Shannon information theory. Given a random process {X_n} for which a probability density function $f_{X^n}(x^n)$ is defined for the random vector $X^n = (X_0, X_1, \ldots, X_{n-1})$ for all positive integers n, the Shannon differential entropy $h(X^n)$ is defined by the integral

$$h(X^n) = -\int f_{X^n}(x^n) \log f_{X^n}(x^n)\, dx^n$$


and the differential entropy rate is defined by the limit

$$h(X) = \lim_{n\to\infty} \frac{1}{n} h(X^n)$$

if the limit exists. (See, for example, Cover and Thomas [5].) The logarithm is usually taken as base 2 and the units are bits. We will use the Toeplitz theorem to evaluate the differential entropy rate of a stationary Gaussian random process.
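As an illustrative special case (added here for concreteness, not part of the original argument), if {X_n} is i.i.d. Gaussian with variance σ², the joint density factors and the standard entropy of a N(0, σ²) variable gives

```latex
h(X^n) = \frac{n}{2}\log\left(2\pi e\,\sigma^2\right),
\qquad
h(X) = \lim_{n\to\infty}\frac{1}{n}\,h(X^n) = \frac{1}{2}\log\left(2\pi e\,\sigma^2\right).
```

For a general stationary Gaussian process the same form survives, with σ² replaced by the geometric mean of the power spectral density (the Kolmogorov-Szegő one-step prediction error).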

A stationary zero-mean Gaussian random process is completely described by its covariance function $R_X(k, m) = R_X(k - m) = E[X_k X_m]$ or, equivalently, by its power spectral density function

$$S(f) = \sum_{n=-\infty}^{\infty} R_X(n)\, e^{-2\pi i n f},$$

the Fourier transform of the covariance function. For a fixed positive integer n, the probability density function is

$$f_{X^n}(x^n) = \frac{1}{(2\pi)^{n/2}\det(R_n)^{1/2}}\, e^{-\frac{1}{2}(x^n - m^n)^t R_n^{-1}(x^n - m^n)},$$

where $R_n$ is the $n \times n$ covariance matrix with entries $R_X(k, m)$, $k, m = 0, 1, \ldots, n - 1$. A straightforward multidimensional integration using the properties of Gaussian random vectors yields the differential entropy

$$h(X^n) = \frac{1}{2}\log\left((2\pi e)^n \det R_n\right).$$
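This closed form can be checked numerically against a Monte Carlo estimate of $-E[\log f_{X^n}(X^n)]$. The sketch below works in nats rather than bits, and the AR(1)-style covariance it uses is a hypothetical example, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example covariance: R_n with entries R_X(k - m) = 0.5**|k - m|.
n = 4
R = np.array([[0.5 ** abs(k - m) for m in range(n)] for k in range(n)])

# Closed form (in nats): h(X^n) = (1/2) log((2 pi e)^n det R_n).
h_formula = 0.5 * np.log((2 * np.pi * np.e) ** n * np.linalg.det(R))

# Monte Carlo: h(X^n) = -E[log f(X^n)] with X^n ~ N(0, R_n).
x = rng.standard_normal((200_000, n)) @ np.linalg.cholesky(R).T
R_inv = np.linalg.inv(R)
quad = np.einsum("ij,jk,ik->i", x, R_inv, x)   # quadratic form x^t R_n^{-1} x per sample
log_f = -0.5 * quad - 0.5 * np.log((2 * np.pi) ** n * np.linalg.det(R))
h_mc = -log_f.mean()

print(h_formula, h_mc)  # agree to roughly two decimal places
```

The zero-mean case is used here, so the $m^n$ term in the density drops out.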

If we now identify the covariance matrix $R_n$ as the Toeplitz matrix generated by the power spectral density, $T_n(S)$, then from theorem 4.5 we have immediately that

$$h(X) = \frac{1}{2}\log\left(2\pi e\,\sigma^2\right) \tag{5.39}$$

where

$$\sigma^2 = e^{\int_0^1 \ln S(f)\, df}. \tag{5.40}$$

The Toeplitz distribution theorems have also found application in more complicated information theoretic evaluations, including the channel capacity of Gaussian channels [24, 23] and the rate-distortion functions of autoregressive sources [9, 14]. The discussion by Hashimoto and Arimoto [14], however, does contain an erroneous argument describing a "contradiction" in the results of [9] and implicitly in [24, 23], but the alleged contradiction arises only because they attempt to apply the equal distribution theorem to a situation where it does not apply.

Bibliography

[1] G. Baxter, "A Norm Inequality for a 'Finite-Section' Wiener-Hopf Equation," Illinois J. Math., 1962, pp. 97–103.

[2] G. Baxter, "An Asymptotic Result for the Finite Predictor," Math. Scand., 10, pp. 137–144, 1962.

[3] A. Böttcher and S.M. Grudsky, Toeplitz Matrices, Asymptotic Linear Algebra, and Functional Analysis, Birkhäuser, 2000.

[4] A. Böttcher and B. Silbermann, Introduction to Large Truncated Toeplitz Matrices, Springer, New York, 1999.

[5] T. A. Cover and J. A. Thomas, Elements of Information Theory, Wiley, New York, 1991.

[6] P. J. Davis, Circulant Matrices, Wiley-Interscience, NY, 1979.

[7] D. Fasino and P. Tilli, "Spectral clustering properties of block multilevel Hankel matrices," Linear Algebra and its Applications, Vol. 306, pp. 155-163, 2000.

[8] F.R. Gantmacher, The Theory of Matrices, Chelsea Publishing Co., NY 1960.

[9] R.M. Gray, “Information Rates of Autoregressive Processes,” IEEE Trans. on Info. Theory, IT-16, No. 4, July 1970, pp. 412–421.

[10] R. M. Gray, “On the asymptotic eigenvalue distribution of Toeplitz ma-trices,” IEEE Transactions on Information Theory, Vol. 18, November 1972, pp. 725–730.


[11] R.M. Gray, “On Unbounded Toeplitz Matrices and Nonstationary Time Series with an Application to Information Theory,” Information and Control, 24, pp. 181–196, 1974.

[12] U. Grenander and M. Rosenblatt, Statistical Analysis of Stationary Time Series, Wiley and Sons, NY, 1966, Chapter 1.

[13] U. Grenander and G. Szegő, Toeplitz Forms and Their Applications, University of Calif. Press, Berkeley and Los Angeles, 1958.

[14] T. Hashimoto and S. Arimoto, “On the rate-distortion function for the nonstationary Gaussian autoregressive process,” IEEE Transactions on Information Theory, Vol. IT-26, pp. 478-480, 1980.

[15] P. Lancaster, Theory of Matrices, Academic Press, NY, 1969.

[16] J. Pearl, “On Coding and Filtering Stationary Signals by Discrete Fourier Transform,” IEEE Trans. on Info. Theory, IT-19, pp. 229–232, 1973.

[17] C.L. Rino, “The Inversion of Covariance Matrices by Finite Fourier Transforms,” IEEE Trans. on Info. Theory, IT-16, No. 2, March 1970, pp. 230–232.

[18] J. Rissanen and L. Barbosa, “Properties of Infinite Covariance Matrices and Stability of Optimum Predictors,” Information Sciences, 1, 1969, pp. 221–236.

[19] W. Rudin, Principles of Mathematical Analysis, McGraw-Hill, NY, 1964.

[20] W. F. Trench, “Asymptotic distribution of the even and odd spectra of real symmetric Toeplitz matrices,” Linear Algebra Appl., Vol. 302-303, pp. 155-162, 1999.

[21] W. F. Trench, “Absolute equal distribution of the spectra of Hermitian matrices,” submitted for publication, 2001.

[22] W. F. Trench, “Absolute equal distribution of equally distributed fami-lies of finite sets,” preprint, 2002.

[23] B.S. Tsybakov, "Transmission capacity of memoryless Gaussian vector channels," (in Russian), Probl. Peredach. Inform., Vol. 1, pp. 26–40, 1965.

[24] B.S. Tsybakov, "On the transmission capacity of a discrete-time Gaussian channel with filter," (in Russian), Probl. Peredach. Inform., Vol. 6, pp. 78–82, 1970.

[25] E.E. Tyrtyshnikov, "A unifying approach to some old and new theorems on distribution and clustering," Linear Algebra and its Applications, Vol. 232, pp. 1-43, 1996.

[26] H. Widom, "Toeplitz Matrices," in Studies in Real and Complex Analysis, edited by I.I. Hirschmann, Jr., MAA Studies in Mathematics, Prentice-Hall, Englewood Cliffs, NJ, 1965.

[27] A.J. Hoffman and H. W. Wielandt, “The variation of the spectrum of a normal matrix,” Duke Math. J., Vol. 20, pp. 37-39, 1953.

[28] James H. Wilkinson, "Elementary proof of the Wielandt-Hoffman theorem and of its generalization," Stanford University, Department of Computer Science Report Number CS-TR-70-150, January 1970.

Index

absolutely summable, 25, 31, 40
asymptotic equivalence, 31
asymptotically absolutely equally distributed, 18
asymptotically equally distributed, 15, 46
asymptotically equivalent matrices, 9
asymptotically weakly stationary, 54
autocorrelation matrix, 54
autoregressive process, 57
bounded Toeplitz matrices, 25
Cauchy-Schwartz inequality, 11, 17
characteristic equation, 5
circulant matrix, 4, 21
conjugate transpose, 6, 60
continuous, 14, 18, 19, 27, 32, 34, 40, 41, 44, 46–48, 57, 59
convergence
    uniform, 26
Courant-Fischer theorem, 6
covariance matrix, 3, 53, 54
cyclic matrix, 4
cyclic shift, 21
determinant, 15, 25, 50, 60
DFT, 23
diagonal, 7
differential entropy, 62
differential entropy rate, 62
discrete time, 53
eigenvalue, 5, 25
eigenvalue distribution theorem, 33, 40
factorization, 60
filter, 3
    linear time invariant, 53
finite order, 25
finite order Toeplitz matrix, 29
Fourier series, 26
inverse, 24, 25, 43
Kronecker delta, 23
Kronecker delta response, 55
linear difference equation, 21
matrix
power spectral density, 53, 54, 63
probability mass function, 17
product, 24, 25
random process, 53
    discrete time, 53
Rayleigh quotient, 6
Riemann integrable, 40, 43, 51
Shannon information theory, 62
Shur's theorem, 7
Toeplitz matrix, 3, 25
trace, 6
transpose, 22
triangle inequality, 8
triangular, 53, 56, 58, 60, 62
two-sided, 53
Wielandt-Hoffman theorem, 16, 17
Wiener-Hopf factorization, 53
