A rate-distortion theorem for arbitrary discrete sources


1666 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 44, NO. 4, JULY 1998

A Rate-Distortion Theorem for Arbitrary Discrete Sources

Po-Ning Chen and Fady Alajaji

Abstract— A rate-distortion theorem for arbitrary (not necessarily stationary or ergodic) discrete-time finite-alphabet sources is given. This result, which provides the expression of the minimum $\varepsilon$-achievable fixed-length coding rate subject to a fidelity criterion, extends a recent data compression theorem by Steinberg and Verdú.

Index Terms— Arbitrary discrete sources, data compression, rate-distortion theory, Shannon theory.

I. INTRODUCTION

We consider the problem of source coding with a fidelity criterion for arbitrary (not necessarily stationary or ergodic) discrete-time finite-alphabet sources. We prove a general rate-distortion theorem by establishing the expression of the minimum $\varepsilon$-achievable block coding rate subject to a fidelity criterion.

In [3, Theorem 10, part a)], Steinberg and Verdú demonstrate a data compression theorem for arbitrary sources under the restriction that the probability of excessive distortion due to the achievable data compression codes is asymptotically equal to zero (cf. [3, Definitions 30 and 31]). In this work, we provide a variant of their result by relaxing the restriction on the probability of excessive distortion (cf. (3.1)).

II. PRELIMINARIES

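Consider a random process $\boldsymbol{X}$ defined by a sequence of finite-dimensional distributions [2]

$$\boldsymbol{X} \triangleq \left\{ X^n = \left( X_1^{(n)}, \ldots, X_n^{(n)} \right) \right\}_{n=1}^{\infty}.$$

Let

$$\boldsymbol{Y} \triangleq \left\{ Y^n = \left( Y_1^{(n)}, \ldots, Y_n^{(n)} \right) \right\}_{n=1}^{\infty}$$

be the corresponding output process induced by $\boldsymbol{X}$ via the channel

$$\boldsymbol{W} \triangleq \left\{ W_{Y^n|X^n} = P_{Y^n|X^n} : \mathcal{X}^n \to \mathcal{Y}^n \right\}_{n=1}^{\infty},$$

which is an arbitrary sequence of $n$-dimensional conditional distributions from $\mathcal{X}^n$ to $\mathcal{Y}^n$, where $\mathcal{X}$ and $\mathcal{Y}$ are the input and output alphabets, respectively. We assume that $\mathcal{X}$ and $\mathcal{Y}$ are finite.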

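Definition 2.1 ([2]): Given a joint distribution $P_{X^n Y^n} = W_{Y^n|X^n} P_{X^n}$ on $\mathcal{X}^n \times \mathcal{Y}^n$ with marginals $P_{X^n}$ and $P_{Y^n}$, the information density is defined by

$$i_{X^n Y^n}(a^n; b^n) \triangleq \log \frac{W_{Y^n|X^n}(b^n \mid a^n)}{P_{Y^n}(b^n)}.$$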
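As a reading aid, the following minimal sketch evaluates the information density of Definition 2.1 for blocklength $n = 1$. The binary symmetric channel with crossover probability 0.1, the uniform input, and the function names are assumptions chosen only for illustration; they do not come from the paper. Averaging the information density over the joint distribution gives the mutual information.

```python
import math


def information_density(p_x, w, a, b):
    """Return log(W(b|a) / P_Y(b)) in nats; requires W(b|a) > 0."""
    p_y_b = sum(p_x[x] * w[x][b] for x in p_x)  # output marginal P_Y(b)
    return math.log(w[a][b] / p_y_b)


def mutual_information(p_x, w):
    """Expectation of the information density under the joint law P_X W."""
    return sum(
        p_x[a] * w[a][b] * information_density(p_x, w, a, b)
        for a in p_x
        for b in w[a]
        if w[a][b] > 0
    )


# Hypothetical example: binary symmetric channel, crossover 0.1, uniform input.
p_x = {0: 0.5, 1: 0.5}
w = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}

i_match = information_density(p_x, w, 0, 0)     # log(0.9 / 0.5), positive
i_mismatch = information_density(p_x, w, 0, 1)  # log(0.1 / 0.5), negative
```

The density is a random variable when evaluated at $(X^n, Y^n)$; the sup- and inf-information rates defined next describe where its normalized version concentrates, rather than its mean.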

Manuscript received April 15, 1996; revised January 20, 1998. The work of P.-N. Chen was supported in part by National Chiao-Tung University. The work of F. Alajaji was supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) under Grant OGP0183645. The material in this correspondence was presented in part at the International Symposium on Information Theory and Its Applications, Victoria, BC, Canada, September 1996.

P.-N. Chen is with the Department of Communication Engineering, National Chiao-Tung University, HsinChu, Taiwan, R.O.C.

F. Alajaji is with the Department of Mathematics and Statistics, Queen’s University, Kingston, Ont. K7L 3N6, Canada.

Publisher Item Identifier S 0018-9448(98)03468-3.

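Definition 2.2 ([2], [3]): The sup-information rate $\bar{I}(\boldsymbol{X}; \boldsymbol{Y})$ of the joint process $\boldsymbol{X}\boldsymbol{Y}$ is defined as the limsup in probability$^1$ of the sequence of normalized information densities $\frac{1}{n} i_{X^n Y^n}(X^n; Y^n)$.

Analogously, the inf-information rate $\underline{I}(\boldsymbol{X}; \boldsymbol{Y})$ between $\boldsymbol{X}$ and $\boldsymbol{Y}$ is defined as the liminf in probability of the sequence of normalized information densities $\frac{1}{n} i_{X^n Y^n}(X^n; Y^n)$.

When $\boldsymbol{X}$ is equal to $\boldsymbol{Y}$, $\bar{I}(\boldsymbol{X}; \boldsymbol{X})$ (respectively, $\underline{I}(\boldsymbol{X}; \boldsymbol{X})$) is referred to as the sup (respectively, inf) entropy rate of $\boldsymbol{X}$ and is denoted by $\bar{H}(\boldsymbol{X})$ (respectively, $\underline{H}(\boldsymbol{X})$).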
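To illustrate why the sup and inf entropy rates can differ for a nonergodic source, the sketch below computes the exact distribution of the normalized entropy density $\frac{1}{n}\log\frac{1}{P_{X^n}(X^n)}$ for an equal-weight mixture of two i.i.d. binary sources. The mixture, its parameters 0.1 and 0.4, and the function names are assumptions made for this example only. For large $n$ about half of the probability mass sits near the binary entropy of 0.1 (about 0.325 nats) and half near that of 0.4 (about 0.673 nats), so the liminf and limsup in probability are different numbers.

```python
import math


def log_binom_pmf(n, k, p):
    """Log of the Binomial(n, p) probability of exactly k ones."""
    return (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1 - p)
    )


def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def entropy_density_cdf(n, p1, p2, threshold):
    """Pr[(1/n) log 1/P(X^n) <= threshold] for the equal-weight mixture."""
    total = 0.0
    for k in range(n + 1):
        # Log-probability of one particular sequence containing k ones.
        log_seq = logaddexp(
            math.log(0.5) + k * math.log(p1) + (n - k) * math.log(1 - p1),
            math.log(0.5) + k * math.log(p2) + (n - k) * math.log(1 - p2),
        )
        if -log_seq / n <= threshold:
            # Probability that the mixture emits a sequence with k ones.
            total += 0.5 * math.exp(log_binom_pmf(n, k, p1))
            total += 0.5 * math.exp(log_binom_pmf(n, k, p2))
    return total


low = entropy_density_cdf(1000, 0.1, 0.4, 0.25)   # below both entropies
mid = entropy_density_cdf(1000, 0.1, 0.4, 0.50)   # between the two entropies
high = entropy_density_cdf(1000, 0.1, 0.4, 0.70)  # above both entropies
```

In this example the plateau of the distribution function at height one half between the two entropy values is what separates the inf entropy rate from the sup entropy rate.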

Definition 2.3 ([2], [3]): Given a joint distribution $P_{X^n Y^n} = W_{Y^n|X^n} P_{X^n}$, the conditional entropy density is defined by

$$i_{Y^n|X^n}(b^n \mid a^n) \triangleq -\log W_{Y^n|X^n}(b^n \mid a^n).$$

The conditional sup-entropy rate $\bar{H}(\boldsymbol{Y} \mid \boldsymbol{X})$ of $\boldsymbol{Y}$ given $\boldsymbol{X}$ is defined as the limsup in probability of the sequence of normalized conditional entropy densities $\frac{1}{n} i_{Y^n|X^n}(Y^n \mid X^n)$.

Analogously, the conditional inf-entropy rate $\underline{H}(\boldsymbol{Y} \mid \boldsymbol{X})$ of $\boldsymbol{Y}$ given $\boldsymbol{X}$ is defined as the liminf in probability of $\frac{1}{n} i_{Y^n|X^n}(Y^n \mid X^n)$.

III. GENERAL DATA COMPRESSION THEOREM

Definition 3.1 (e.g., [1]): Given a finite source alphabet $\mathcal{X}$ and a finite reproduction alphabet $\mathcal{Y}$, a block code for data compression of blocklength $n$ and size $M$ is a mapping $f_n(\cdot) : \mathcal{X}^n \to \mathcal{Y}^n$ that results in $\|f_n\| = M$ codewords of length $n$, where each codeword is a sequence of $n$ reproducing letters.

Definition 3.2: A distortion measure $\rho_n(\cdot, \cdot)$ is a mapping

$$\rho_n : \mathcal{X}^n \times \mathcal{Y}^n \to \Re^+ \triangleq [0, \infty).$$

We can view the distortion measure as the cost of representing a source $n$-tuple $X^n$ by a reproduction $n$-tuple $f_n(X^n)$.
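A common concrete choice, used here purely as an illustration and not prescribed by the paper, is the additive Hamming distortion, which counts the positions where the source block and its reproduction differ. The sketch below implements it together with the normalized distortion $\frac{1}{n}\rho_n$ that appears throughout Section III; the function names are hypothetical.

```python
def hamming_distortion(x, y):
    """rho_n(x^n, y^n): number of positions where the two n-tuples differ."""
    if len(x) != len(y):
        raise ValueError("blocks must have the same length")
    return sum(a != b for a, b in zip(x, y))


def normalized_distortion(x, y):
    """(1/n) * rho_n(x^n, y^n) for the Hamming distortion above."""
    return hamming_distortion(x, y) / len(x)


# One mismatch out of four letters gives normalized distortion 0.25.
example = normalized_distortion((0, 1, 1, 0), (0, 1, 0, 0))
```

Nothing in the definitions requires the distortion measure to be additive over letters; the Hamming case is simply the easiest one to compute.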

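Definition 3.3: Let $\boldsymbol{X}$ and $\{\rho_n(\cdot, \cdot)\}_{n \ge 1}$ be given. Let

$$\boldsymbol{f}(\boldsymbol{X}) \triangleq \{ f_n(X^n) \}_{n=1}^{\infty}$$

denote a sequence of data compression codes for $\boldsymbol{X}$. The distortion spectrum $\lambda_{\boldsymbol{X},\boldsymbol{f}(\boldsymbol{X})}(\theta)$ for $\boldsymbol{f}(\cdot)$ is defined by

$$\lambda_{\boldsymbol{X},\boldsymbol{f}(\boldsymbol{X})}(\theta) \triangleq \liminf_{n \to \infty} \Pr\left\{ \frac{1}{n} \rho_n(X^n, f_n(X^n)) \le \theta \right\}.$$

Definition 3.4: Fix $D > 0$ and $1 > \varepsilon > 0$. $R$ is an $\varepsilon$-achievable data compression rate at distortion $D$ for a source $\boldsymbol{X}$ if there exists a sequence of data compression codes $f_n(\cdot)$ with

$$\limsup_{n \to \infty} \frac{1}{n} \log \|f_n\| = R \quad \text{and} \quad \sup\left[ \theta : \lambda_{\boldsymbol{X},\boldsymbol{f}(\boldsymbol{X})}(\theta) \le \varepsilon \right] \le D. \qquad (3.1)$$

$^1$ If $A_n$ is a sequence of random variables, then its liminf in probability is the largest extended real number $\alpha$ such that for all $\xi > 0$,

$$\lim_{n \to \infty} \Pr[A_n \le \alpha - \xi] = 0.$$

Similarly, its limsup in probability is the smallest extended real number $\beta$ such that for all $\xi > 0$ [2]

$$\lim_{n \to \infty} \Pr[A_n \ge \beta + \xi] = 0.$$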


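Fig. 1. $\lambda_{\boldsymbol{X},\boldsymbol{f}(\boldsymbol{X})}(D + \gamma) > \varepsilon \Rightarrow \sup\left[ \theta : \lambda_{\boldsymbol{X},\boldsymbol{f}(\boldsymbol{X})}(\theta) \le \varepsilon \right] \le D + \gamma$.

Note that (3.1) is equivalent to stating that the limsup of the probability of excessive distortion (i.e., distortion larger than $D$) is smaller than $1 - \varepsilon$.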

The infimum $\varepsilon$-achievable data compression rate at distortion $D$ for $\boldsymbol{X}$ is denoted by $T_\varepsilon(D; \boldsymbol{X})$.
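The distortion constraint in (3.1) can be made concrete with a toy case. The sketch below assumes, purely for illustration, that the normalized distortion of a code takes finitely many values with the same probabilities for every blocklength, so that the distortion spectrum is a step function of the threshold. It then evaluates the spectrum and the supremum of the thresholds at which the spectrum does not exceed the tolerance. The function names and the numbers are hypothetical.

```python
def spectrum(dist, theta):
    """Spectrum at theta: probability that the normalized distortion is <= theta.

    `dist` is a list of (distortion value, probability) pairs, assumed to be
    the same for every blocklength, so the liminf over n is trivial.
    """
    return sum(p for d, p in dist if d <= theta)


def sup_theta(dist, eps):
    """sup{theta : spectrum(theta) <= eps} for the step-function spectrum."""
    cumulative = 0.0
    for d, p in sorted(dist):
        cumulative += p
        if cumulative > eps:
            # The spectrum first exceeds eps at theta = d, so the set of
            # admissible thresholds is [0, d) and its supremum is d.
            return d
    return float("inf")  # only reached if eps >= total probability


# Hypothetical code: distortion 0.1 with probability 0.3, else 0.4.
dist = [(0.1, 0.3), (0.4, 0.7)]
tight = sup_theta(dist, 0.2)  # tolerance below 0.3
loose = sup_theta(dist, 0.5)  # tolerance at or above 0.3
```

In this toy case a tolerance below 0.3 gives the supremum 0.1, while a tolerance of 0.3 or more pushes it to 0.4, which shows how the choice of $\varepsilon$ in (3.1) selects the part of the distortion distribution that the constraint controls.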

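Theorem 3.1 (General Data Compression Theorem): Fix $D > 0$ and $1 > \varepsilon > 0$. Let $\boldsymbol{X}$ and $\{\rho_n(\cdot, \cdot)\}_{n \ge 1}$ be given. Then

$$T_\varepsilon(D; \boldsymbol{X}) = R_\varepsilon(D)$$

where

$$R_\varepsilon(D) \triangleq \inf_{\left\{ \boldsymbol{W} \,:\, \sup\left[ \theta : \lambda_{\boldsymbol{X},\boldsymbol{Y}}(\theta) \le \varepsilon \right] \le D \right\}} \bar{I}(\boldsymbol{X}; \boldsymbol{Y})$$

where the infimum is taken over all conditional distributions $\boldsymbol{W} = \{ P_{Y^n|X^n} \}_{n=1}^{\infty}$ for which the joint distribution $P_{\boldsymbol{X}\boldsymbol{Y}} = P_{\boldsymbol{X}} \boldsymbol{W}$ satisfies the distortion constraint.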

Proof:

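1) Forward part (achievability): Choose $\gamma > 0$. We will prove the existence of a sequence of data compression codes with

$$\limsup_{n \to \infty} \frac{1}{n} \log \|f_n\| \le R_\varepsilon(D) + 2\gamma \quad \text{and} \quad \sup\left[ \theta : \lambda_{\boldsymbol{X},\boldsymbol{f}(\boldsymbol{X})}(\theta) \le \varepsilon \right] \le D + \gamma.$$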

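Step 1: Let $\tilde{\boldsymbol{W}}$ be the channel distribution achieving $R_\varepsilon(D)$, and let $P_{\tilde{\boldsymbol{Y}}}$ be the $\boldsymbol{Y}$-marginal of $P_{\boldsymbol{X}} \tilde{\boldsymbol{W}}$.

Step 2: Let $R = R_\varepsilon(D) + 2\gamma$. Choose $M = e^{nR}$ $n$-blocks independently according to $P_{\tilde{Y}^n}$, and denote the resulting random set by $\mathcal{C}_n$.

Step 3: For a given $\mathcal{C}_n$, we denote by $A(\mathcal{C}_n)$ the set of sequences $x^n \in \mathcal{X}^n$ such that there exists $y^n \in \mathcal{C}_n$ with

$$\frac{1}{n} \rho_n(x^n, y^n) \le D + \gamma.$$

Step 4: Claim:

$$\limsup_{n \to \infty} E_{\tilde{\boldsymbol{Y}}}\left[ P_{X^n}(A^c(\mathcal{C}_n)) \right] < 1 - \varepsilon.$$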

The proof of this claim is provided in the Appendix. Therefore, there exists (a sequence of) $\mathcal{C}_n^*$ such that

$$\limsup_{n \to \infty} P_{X^n}(A^c(\mathcal{C}_n^*)) < 1 - \varepsilon.$$

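Step 5: Define a sequence of codes $\{f_n\}$ by

$$f_n(x^n) = \begin{cases} \arg\min_{y^n \in \mathcal{C}_n^*} \rho_n(x^n, y^n), & \text{if } x^n \in A(\mathcal{C}_n^*) \\ \boldsymbol{0}, & \text{otherwise} \end{cases}$$

where $\boldsymbol{0}$ is a fixed default $n$-tuple in $\mathcal{Y}^n$. Then

$$\left\{ x^n \in \mathcal{X}^n : \frac{1}{n} \rho_n(x^n, f_n(x^n)) \le D + \gamma \right\} \supseteq A(\mathcal{C}_n^*)$$

since $(\forall\, x^n \in A(\mathcal{C}_n^*))$ there exists $y^n \in \mathcal{C}_n^*$ such that $(1/n)\rho_n(x^n, y^n) \le D + \gamma$, which by definition of $f_n$ implies that $(1/n)\rho_n(x^n, f_n(x^n)) \le D + \gamma$.

Step 6: Consequently,

$$\begin{aligned} \lambda_{\boldsymbol{X},\boldsymbol{f}(\boldsymbol{X})}(D + \gamma) &= \liminf_{n \to \infty} P_{X^n}\left\{ x^n \in \mathcal{X}^n : \frac{1}{n} \rho_n(x^n, f_n(x^n)) \le D + \gamma \right\} \\ &\ge \liminf_{n \to \infty} P_{X^n}(A(\mathcal{C}_n^*)) \\ &= 1 - \limsup_{n \to \infty} P_{X^n}(A^c(\mathcal{C}_n^*)) \\ &> \varepsilon. \end{aligned}$$

Hence

$$\sup\left[ \theta : \lambda_{\boldsymbol{X},\boldsymbol{f}(\boldsymbol{X})}(\theta) \le \varepsilon \right] \le D + \gamma$$

where the last step is clearly depicted in Fig. 1. This proves the forward part.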
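The random-coding construction of Steps 2, 3, and 5 can be sketched numerically. The example below is only an illustration under strong assumptions that are not in the paper: a uniform i.i.d. binary source, codewords drawn uniformly at random instead of from the optimizing output distribution, Hamming distortion, and a codebook size rounded up to an integer. All function names are hypothetical.

```python
import math
import random


def hamming(x, y):
    """Number of positions where two equal-length blocks differ."""
    return sum(a != b for a, b in zip(x, y))


def draw_codebook(n, rate, rng):
    """Step 2: draw ceil(e^{n*rate}) binary n-blocks, uniform i.i.d. (assumption)."""
    size = math.ceil(math.exp(n * rate))
    return [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(size)]


def encode(x, codebook, d_max, default):
    """Step 5: nearest codeword if x is covered within d_max, else the default block."""
    best = min(codebook, key=lambda y: hamming(x, y))
    return best if hamming(x, best) / len(x) <= d_max else default


def within_distortion_fraction(n, codebook, d_max, trials, rng):
    """Monte Carlo estimate of Pr[(1/n) rho_n(X^n, f_n(X^n)) <= d_max]
    for a uniform binary source (assumption)."""
    default = (0,) * n
    hits = 0
    for _ in range(trials):
        x = tuple(rng.randint(0, 1) for _ in range(n))
        y = encode(x, codebook, d_max, default)
        hits += hamming(x, y) / n <= d_max
    return hits / trials


rng = random.Random(0)
codebook = draw_codebook(n=12, rate=0.4, rng=rng)
estimate = within_distortion_fraction(12, codebook, d_max=0.25, trials=2000, rng=rng)
```

The proof needs this probability to stay above $\varepsilon$ in the limit rather than to approach one, which is what distinguishes the present setting from the zero excess-distortion criterion of [3].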

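2) Converse part: We show that for any sequence of encoders $\{f_n(\cdot)\}_{n=1}^{\infty}$, if

$$\sup\left[ \theta : \lambda_{\boldsymbol{X},\boldsymbol{f}(\boldsymbol{X})}(\theta) \le \varepsilon \right] \le D$$

then

$$\limsup_{n \to \infty} \frac{1}{n} \log \|f_n\| \ge R_\varepsilon(D).$$

Let

$$\hat{W}^n(y^n \mid x^n) \triangleq \begin{cases} 1, & \text{if } y^n = f_n(x^n) \\ 0, & \text{otherwise.} \end{cases}$$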

Let $\hat{Y}^n$ denote the output corresponding to input $X^n$ and channel $\hat{W}^n$. Then evaluating the statistical properties of the random sequence $\{(1/n)\rho_n(X^n, f_n(X^n))\}_{n=1}^{\infty}$ under distribution $P_{X^n}$ is equivalent to evaluating those of the random sequence $\{(1/n)\rho_n(X^n, \hat{Y}^n)\}_{n=1}^{\infty}$ under distribution $P_{X^n} \hat{W}^n$. Therefore,

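$$\begin{aligned} R_\varepsilon(D) &\triangleq \inf_{\left\{ \boldsymbol{W} \,:\, \sup\left[ \theta : \lambda_{\boldsymbol{X},\boldsymbol{Y}}(\theta) \le \varepsilon \right] \le D \right\}} \bar{I}(\boldsymbol{X}; \boldsymbol{Y}) \\ &\le \bar{I}(\boldsymbol{X}; \hat{\boldsymbol{Y}}) \\ &\le \bar{H}(\hat{\boldsymbol{Y}}) - \underline{H}(\hat{\boldsymbol{Y}} \mid \boldsymbol{X}) \\ &\le \bar{H}(\hat{\boldsymbol{Y}}) \\ &\le \limsup_{n \to \infty} \frac{1}{n} \log \|f_n\|, \end{aligned}$$

where the second inequality follows from [4, Theorem 8, property (d)] and the third inequality follows from the fact that $\underline{H}(\hat{\boldsymbol{Y}} \mid \boldsymbol{X}) \ge 0$.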
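The last inequality of the chain, which bounds the sup-entropy rate of the code output by the asymptotic code rate, is stated without justification. One standard way to see it is a counting argument, sketched here with the shorthand $M_n = \|f_n\|$ and $P_{\hat{Y}^n}$ for the distribution of $\hat{Y}^n$ (both introduced only for this sketch). Since $\hat{Y}^n$ takes at most $M_n$ values, for every $\xi > 0$:

```latex
\Pr\left[ \frac{1}{n} \log \frac{1}{P_{\hat{Y}^n}(\hat{Y}^n)} \ge \frac{1}{n} \log M_n + \xi \right]
  = \sum_{y^n :\; 0 < P_{\hat{Y}^n}(y^n) \le e^{-n\xi} / M_n} P_{\hat{Y}^n}(y^n)
  \le M_n \cdot \frac{e^{-n\xi}}{M_n}
  = e^{-n\xi} \xrightarrow[n \to \infty]{} 0 .
```

Because $\frac{1}{n}\log M_n$ eventually stays below its limsup plus $\xi$, the normalized entropy density exceeds that limsup by $2\xi$ with vanishing probability, so its limsup in probability is at most $\limsup_{n \to \infty} \frac{1}{n}\log M_n$.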

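APPENDIX

Claim (cf. Proof of Theorem 3.1):

$$\limsup_{n \to \infty} E_{\tilde{\boldsymbol{Y}}}\left[ P_{X^n}(A^c(\mathcal{C}_n^*)) \right] < 1 - \varepsilon.$$

Proof:

Step 1: Let

$$D^{(\varepsilon)} \triangleq \sup\left\{ \theta : \lambda_{\boldsymbol{X},\tilde{\boldsymbol{Y}}}(\theta) \le \varepsilon \right\}.$$

Define

$$A^{(\varepsilon)}_{n,\gamma} \triangleq \left\{ (x^n, y^n) : \frac{1}{n} \rho_n(x^n, y^n) \le D^{(\varepsilon)} + \gamma \ \text{ and } \ \frac{1}{n} i_{X^n \tilde{Y}^n}(x^n; y^n) \le \bar{I}(\boldsymbol{X}; \tilde{\boldsymbol{Y}}) + \gamma \right\}.$$

Since

$$\liminf_{n \to \infty} \Pr\left( \mathcal{D} \triangleq \left\{ \frac{1}{n} \rho_n(X^n, \tilde{Y}^n) \le D^{(\varepsilon)} + \gamma \right\} \right) > \varepsilon$$

and

$$\liminf_{n \to \infty} \Pr\left( \mathcal{E} \triangleq \left\{ \frac{1}{n} i_{X^n \tilde{Y}^n}(X^n; \tilde{Y}^n) \le \bar{I}(\boldsymbol{X}; \tilde{\boldsymbol{Y}}) + \gamma \right\} \right) = 1$$

we have

$$\begin{aligned} \liminf_{n \to \infty} \Pr\left( A^{(\varepsilon)}_{n,\gamma} \right) &= \liminf_{n \to \infty} \Pr(\mathcal{D} \cap \mathcal{E}) \\ &\ge \liminf_{n \to \infty} \Pr(\mathcal{D}) + \liminf_{n \to \infty} \Pr(\mathcal{E}) - 1 \\ &> \varepsilon + 1 - 1 = \varepsilon. \end{aligned}$$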
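The lower bound on the probability of the intersection used in Step 1 is the elementary inequality below, combined with the superadditivity of the liminf. It is spelled out here as a reading aid, writing $\mathcal{D}_n$ and $\mathcal{E}_n$ for the two events at blocklength $n$ (the subscript is added only for this sketch).

```latex
\Pr(\mathcal{D}_n \cap \mathcal{E}_n)
  = \Pr(\mathcal{D}_n) + \Pr(\mathcal{E}_n) - \Pr(\mathcal{D}_n \cup \mathcal{E}_n)
  \ge \Pr(\mathcal{D}_n) + \Pr(\mathcal{E}_n) - 1,
\qquad \text{hence} \qquad
\liminf_{n \to \infty} \Pr(\mathcal{D}_n \cap \mathcal{E}_n)
  \ge \liminf_{n \to \infty} \Pr(\mathcal{D}_n)
    + \liminf_{n \to \infty} \Pr(\mathcal{E}_n) - 1 .
```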

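Step 2: Let $K(x^n, y^n)$ be the indicator function of $A^{(\varepsilon)}_{n,\gamma}$:

$$K(x^n, y^n) = \begin{cases} 1, & \text{if } (x^n, y^n) \in A^{(\varepsilon)}_{n,\gamma} \\ 0, & \text{otherwise.} \end{cases}$$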

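Step 3: By following a similar argument in [3, Eqs. (9)–(12)], we obtain

$$\begin{aligned} E_{\tilde{\boldsymbol{Y}}}\left[ P_{X^n}(A^c(\mathcal{C}_n^*)) \right] &= \sum_{\mathcal{C}_n^*} P_{\tilde{Y}^n}(\mathcal{C}_n^*) \sum_{x^n \notin A(\mathcal{C}_n^*)} P_{X^n}(x^n) \\ &= \sum_{x^n \in \mathcal{X}^n} P_{X^n}(x^n) \sum_{\mathcal{C}_n^* :\, x^n \notin A(\mathcal{C}_n^*)} P_{\tilde{Y}^n}(\mathcal{C}_n^*) \\ &= \sum_{x^n \in \mathcal{X}^n} P_{X^n}(x^n) \left[ 1 - \sum_{y^n \in \mathcal{Y}^n} P_{\tilde{Y}^n}(y^n) K(x^n, y^n) \right]^M \\ &\le \sum_{x^n \in \mathcal{X}^n} P_{X^n}(x^n) \left[ 1 - e^{-n(\bar{I}(\boldsymbol{X}; \tilde{\boldsymbol{Y}}) + \gamma)} \times \sum_{y^n \in \mathcal{Y}^n} P_{\tilde{Y}^n|X^n}(y^n \mid x^n) K(x^n, y^n) \right]^M \\ &\le 1 - \sum_{x^n \in \mathcal{X}^n} \sum_{y^n \in \mathcal{Y}^n} P_{X^n}(x^n) P_{\tilde{Y}^n|X^n}(y^n \mid x^n) K(x^n, y^n) + \exp\left\{ -e^{n(R - R_\varepsilon(D) - \gamma)} \right\}. \end{aligned}$$

The first inequality holds because $P_{\tilde{Y}^n}(y^n) \ge e^{-n(\bar{I}(\boldsymbol{X}; \tilde{\boldsymbol{Y}}) + \gamma)} P_{\tilde{Y}^n|X^n}(y^n \mid x^n)$ whenever $K(x^n, y^n) = 1$. The second uses $(1 - xy)^M \le 1 - y + e^{-Mx}$ for $0 \le x, y \le 1$, with $M = e^{nR}$ and $\bar{I}(\boldsymbol{X}; \tilde{\boldsymbol{Y}}) = R_\varepsilon(D)$. Since $R - R_\varepsilon(D) - \gamma = \gamma > 0$, the exponential term vanishes as $n \to \infty$. Therefore,

$$\limsup_{n \to \infty} E_{\tilde{\boldsymbol{Y}}}\left[ P_{X^n}(A^c(\mathcal{C}_n^*)) \right] \le 1 - \liminf_{n \to \infty} \Pr\left( A^{(\varepsilon)}_{n,\gamma} \right) < 1 - \varepsilon.$$

ACKNOWLEDGMENT

The authors would like to thank Prof. S. Verdú for his valuable advice and encouragement.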

REFERENCES

[1] R. E. Blahut, Principles and Practice of Information Theory. Reading, MA: Addison-Wesley, 1988.

[2] T. S. Han and S. Verdú, “Approximation theory of output statistics,” IEEE Trans. Inform. Theory, vol. 39, pp. 752–772, May 1993.

[3] Y. Steinberg and S. Verdú, “Simulation of random processes and rate-distortion theory,” IEEE Trans. Inform. Theory, vol. 42, pp. 63–86, Jan. 1996.

[4] S. Verdú and T. S. Han, “A general formula for channel capacity,” IEEE Trans. Inform. Theory, vol. 40, pp. 1147–1157, July 1994.

On One Useful Inequality in Testing of Hypotheses Marat V. Burnashev

Abstract—A simple proof of one probabilistic inequality is presented. Index Terms—Error probabilities, testing of hypotheses.

I. MAIN INEQUALITY

Let $P$ and $Q$ be two given probability measures on a measurable space $(\mathcal{X}, \mathcal{A})$. We consider testing of hypotheses $P$ and $Q$ using one observation. For an arbitrary decision rule, let $\alpha$ and $\beta$ denote the two kinds of error probabilities. If both error probabilities have equal costs (or we want to minimize the maximum of them), then it is natural to investigate the minimal possible sum $\inf\{\alpha + \beta\}$ for the best decision rule.

Manuscript received July 10, 1997; revised November 24, 1997. This work was supported by the Russian Foundation for Fundamental Research under Grants N 95-01-00136a and INTAS-94-469.

The author is with the Institute for Problems of Information Transmission, Russian Academy of Sciences, 101447 Moscow, Russia.

Publisher Item Identifier S 0018-9448(98)03470-1.