
1666 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 44, NO. 4, JULY 1998

A Rate-Distortion Theorem for Arbitrary Discrete Sources

Po-Ning Chen and Fady Alajaji

Abstract: A rate-distortion theorem for arbitrary (not necessarily stationary or ergodic) discrete-time finite-alphabet sources is given. This result, which provides the expression of the minimum $\varepsilon$-achievable fixed-length coding rate subject to a fidelity criterion, extends a recent data compression theorem by Steinberg and Verdú.

Index Terms: Arbitrary discrete sources, data compression, rate-distortion theory, Shannon theory.

I. INTRODUCTION

We consider the problem of source coding with a fidelity criterion for arbitrary (not necessarily stationary or ergodic) discrete-time finite-alphabet sources. We prove a general rate-distortion theorem by establishing the expression of the minimum $\varepsilon$-achievable block coding rate subject to a fidelity criterion.

In [3, Theorem 10, part a)], Steinberg and Verdú demonstrate a data compression theorem for arbitrary sources under the restriction that the probability of excessive distortion due to the achievable data compression codes is asymptotically equal to zero (cf. [3, Definitions 30 and 31]). In this work, we provide a variant of their result by relaxing the restriction on the probability of excessive distortion (cf. (3.1)).

II. PRELIMINARIES

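Consider a random process $\boldsymbol{X}$ defined by a sequence of finite-dimensional distributions [2]

$$\boldsymbol{X} \triangleq \left\{ X^n = \left(X_1^{(n)}, \ldots, X_n^{(n)}\right) \right\}_{n=1}^{\infty}.$$

Let

$$\boldsymbol{Y} \triangleq \left\{ Y^n = \left(Y_1^{(n)}, \ldots, Y_n^{(n)}\right) \right\}_{n=1}^{\infty}$$

be the corresponding output process induced by $\boldsymbol{X}$ via the channel

$$\boldsymbol{W} \triangleq \left\{ W_{Y^n|X^n} = P_{Y^n|X^n} : \mathcal{X}^n \to \mathcal{Y}^n \right\}_{n=1}^{\infty},$$

which is an arbitrary sequence of $n$-dimensional conditional distributions from $\mathcal{X}^n$ to $\mathcal{Y}^n$, where $\mathcal{X}$ and $\mathcal{Y}$ are the input and output alphabets, respectively. We assume that $\mathcal{X}$ and $\mathcal{Y}$ are finite.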

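Definition 2.1 ([2]): Given a joint distribution $P_{X^nY^n} = W_{Y^n|X^n} P_{X^n}$ on $\mathcal{X}^n \times \mathcal{Y}^n$ with marginals $P_{X^n}$ and $P_{Y^n}$, the information density is defined by

$$i_{X^nY^n}(a^n; b^n) \triangleq \log \frac{W_{Y^n|X^n}(b^n \mid a^n)}{P_{Y^n}(b^n)}.$$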
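As a concrete illustration of Definition 2.1, the following minimal sketch evaluates the information density for blocklength $n = 1$. The binary symmetric channel with crossover probability 0.1, the uniform input, and the function name `information_density` are assumptions chosen for the example; they do not come from the paper.

```python
import math

def information_density(p_x, channel, a, b):
    """Return log( W(b|a) / P_Y(b) ) in nats for a single-letter pair (a, b)."""
    # Output marginal: P_Y(b) = sum over inputs x of P_X(x) * W(b|x).
    p_y_b = sum(p_x[x] * channel[x][b] for x in p_x)
    return math.log(channel[a][b] / p_y_b)

# Hypothetical example: uniform binary input through a BSC with crossover 0.1.
p_x = {0: 0.5, 1: 0.5}
bsc = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}

i_match = information_density(p_x, bsc, 0, 0)  # log(0.9 / 0.5), positive
i_flip = information_density(p_x, bsc, 0, 1)   # log(0.1 / 0.5), negative

# Averaging the density over the joint distribution gives the mutual information.
mutual_information = sum(
    p_x[a] * bsc[a][b] * information_density(p_x, bsc, a, b)
    for a in p_x
    for b in (0, 1)
)
```

The density is a random variable that can be negative for unlikely pairs; only its average is guaranteed to be nonnegative.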

Manuscript received April 15, 1996; revised January 20, 1998. The work of P.-N. Chen was supported in part by National Chiao-Tung University. The work of F. Alajaji was supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) under Grant OGP0183645. The material in this correspondence was presented in part at the International Symposium on Information Theory and Its Applications, Victoria, BC, Canada, September 1996.

P.-N. Chen is with the Department of Communication Engineering, National Chiao-Tung University, HsinChu, Taiwan, R.O.C.

F. Alajaji is with the Department of Mathematics and Statistics, Queen’s University, Kingston, Ont. K7L 3N6, Canada.

Publisher Item Identifier S 0018-9448(98)03468-3.

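Definition 2.2 ([2], [3]): The sup-information rate $\bar{I}(\boldsymbol{X}; \boldsymbol{Y})$ of the joint process $\boldsymbol{X}\boldsymbol{Y}$ is defined as the limsup in probability$^1$ of the sequence of normalized information densities $\frac{1}{n} i_{X^nY^n}(X^n; Y^n)$.

Analogously, the inf-information rate $\underline{I}(\boldsymbol{X}; \boldsymbol{Y})$ between $\boldsymbol{X}$ and $\boldsymbol{Y}$ is defined as the liminf in probability of the sequence of normalized information densities $\frac{1}{n} i_{X^nY^n}(X^n; Y^n)$.

When $\boldsymbol{X}$ is equal to $\boldsymbol{Y}$, $\bar{I}(\boldsymbol{X}; \boldsymbol{X})$ (respectively, $\underline{I}(\boldsymbol{X}; \boldsymbol{X})$) is referred to as the sup (respectively, inf) entropy rate of $\boldsymbol{X}$ and is denoted by $\bar{H}(\boldsymbol{X})$ (respectively, $\underline{H}(\boldsymbol{X})$).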
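A small numerical sketch may help show why the sup and inf entropy rates can differ for a nonergodic source. It assumes a hypothetical source, not taken from the paper, that picks one of two Bernoulli parameters with probability 1/2 each and then emits i.i.d. bits; all function names are illustrative. For this source the normalized entropy density $\frac{1}{n}\log\frac{1}{P_{X^n}(X^n)}$ depends on the sequence only through its number of ones, so its distribution can be computed exactly.

```python
import math

def binary_entropy(p):
    """Binary entropy in nats."""
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def log_binom(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def entropy_density_spectrum(n, p1, p2):
    """Pairs (value, probability) of (1/n) log 1/P(X^n) for the equal-weight mixture."""
    pairs = []
    for k in range(n + 1):  # k = number of ones in the sequence
        log_p1 = k * math.log(p1) + (n - k) * math.log(1 - p1)
        log_p2 = k * math.log(p2) + (n - k) * math.log(1 - p2)
        log_seq = math.log(0.5) + logaddexp(log_p1, log_p2)  # log P(x^n)
        prob_k = math.exp(log_binom(n, k) + log_seq)         # Pr{k ones}
        pairs.append((-log_seq / n, prob_k))
    return pairs

n, p1, p2 = 2000, 0.1, 0.4
spectrum = entropy_density_spectrum(n, p1, p2)
h1, h2 = binary_entropy(p1), binary_entropy(p2)
mass_near_h1 = sum(pr for v, pr in spectrum if abs(v - h1) < 0.05)
mass_near_h2 = sum(pr for v, pr in spectrum if abs(v - h2) < 0.05)
```

About half of the probability mass sits near each of the two binary entropies. Under the stated assumptions, the limsup in probability would be the larger of the two values and the liminf in probability the smaller one.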

Definition 2.3 ([2], [3]): Given a joint distribution $P_{X^nY^n} = W_{Y^n|X^n} P_{X^n}$, the conditional entropy density is defined by

$$i_{Y^n|X^n}(b^n \mid a^n) \triangleq -\log W_{Y^n|X^n}(b^n \mid a^n).$$

The conditional sup-entropy rate $\bar{H}(\boldsymbol{Y} \mid \boldsymbol{X})$ of $\boldsymbol{Y}$ given $\boldsymbol{X}$ is defined as the limsup in probability of the sequence of normalized conditional entropy densities $\frac{1}{n} i_{Y^n|X^n}(Y^n \mid X^n)$.

Analogously, the conditional inf-entropy rate $\underline{H}(\boldsymbol{Y} \mid \boldsymbol{X})$ of $\boldsymbol{Y}$ given $\boldsymbol{X}$ is defined as the liminf in probability of $\frac{1}{n} i_{Y^n|X^n}(Y^n \mid X^n)$.

III. GENERAL DATA COMPRESSION THEOREM

Definition 3.1 (e.g., [1]): Given a finite source alphabet $\mathcal{X}$ and a finite reproduction alphabet $\mathcal{Y}$, a block code for data compression of blocklength $n$ and size $M$ is a mapping $f_n(\cdot) : \mathcal{X}^n \to \mathcal{Y}^n$ that results in $\|f_n\| = M$ codewords of length $n$, where each codeword is a sequence of $n$ reproducing letters.

Definition 3.2: A distortion measure $\rho_n(\cdot, \cdot)$ is a mapping

$$\rho_n : \mathcal{X}^n \times \mathcal{Y}^n \to \Re^+ \triangleq [0, \infty).$$

We can view the distortion measure as the cost of representing a source $n$-tuple $X^n$ by a reproduction $n$-tuple $f_n(X^n)$.

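Definition 3.3: Let $\boldsymbol{X}$ and $\{\rho_n(\cdot, \cdot)\}_{n \ge 1}$ be given. Let

$$\boldsymbol{f}(\boldsymbol{X}) \triangleq \{f_n(X^n)\}_{n=1}^{\infty}$$

denote a sequence of data compression codes for $\boldsymbol{X}$. The distortion spectrum $\lambda_{\boldsymbol{X},\boldsymbol{f}(\boldsymbol{X})}(\theta)$ for $\boldsymbol{f}(\cdot)$ is defined by

$$\lambda_{\boldsymbol{X},\boldsymbol{f}(\boldsymbol{X})}(\theta) \triangleq \liminf_{n \to \infty} \Pr\left\{ \frac{1}{n} \rho_n(X^n, f_n(X^n)) \le \theta \right\}.$$

Definition 3.4: Fix $D > 0$ and $1 > \varepsilon > 0$. $R$ is an $\varepsilon$-achievable data compression rate at distortion $D$ for a source $\boldsymbol{X}$ if there exists a sequence of data compression codes $f_n(\cdot)$ with

$$\limsup_{n \to \infty} \frac{1}{n} \log \|f_n\| = R \quad \text{and} \quad \sup\left[\theta : \lambda_{\boldsymbol{X},\boldsymbol{f}(\boldsymbol{X})}(\theta) \le \varepsilon\right] \le D. \qquad (3.1)$$

$^1$ If $A_n$ is a sequence of random variables, then its liminf in probability is the largest extended real number $\alpha$ such that for all $\xi > 0$,

$$\lim_{n \to \infty} \Pr[A_n \le \alpha - \xi] = 0.$$

Similarly, its limsup in probability is the smallest extended real number $\beta$ such that for all $\xi > 0$ [2]

$$\lim_{n \to \infty} \Pr[A_n \ge \beta + \xi] = 0.$$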
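The quantity $\sup[\theta : \lambda(\theta) \le \varepsilon]$ in (3.1) can be made concrete with a finite stand-in for the distortion spectrum. The sketch below assumes that the normalized distortion takes finitely many values with known probabilities, so that the spectrum is a step function; this replaces the liminf over $n$ by a single fixed distribution and is only an illustration. The names `distortion_cdf` and `sup_theta` and the two-atom example are hypothetical.

```python
def distortion_cdf(atoms, theta):
    """Pr{normalized distortion <= theta} for a finite list of (value, prob) atoms."""
    return sum(p for v, p in atoms if v <= theta)

def sup_theta(atoms, eps):
    """sup{theta : cdf(theta) <= eps}, i.e. the first atom where the CDF exceeds eps."""
    cumulative = 0.0
    for value, prob in sorted(atoms):
        cumulative += prob
        if cumulative > eps:
            return value
    return float("inf")  # the CDF never exceeds eps

# Hypothetical spectrum: distortion 0.1 with probability 0.3, else 0.5.
atoms = [(0.1, 0.3), (0.5, 0.7)]

def satisfies_constraint(atoms, eps, d):
    """Check the second condition of (3.1) for this finite stand-in."""
    return sup_theta(atoms, eps) <= d
```

With these atoms, a small $\varepsilon$ such as 0.2 is already exceeded at the low-distortion atom, so a target $D = 0.1$ is met; with $\varepsilon = 0.5$ the high-distortion atom determines the supremum and $D$ must be at least 0.5.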


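Fig. 1. $\lambda_{\boldsymbol{X},\boldsymbol{f}(\boldsymbol{X})}(D + \gamma) > \varepsilon \;\Rightarrow\; \sup\left[\theta : \lambda_{\boldsymbol{X},\boldsymbol{f}(\boldsymbol{X})}(\theta) \le \varepsilon\right] \le D + \gamma$.

Note that (3.1) is equivalent to stating that the limsup of the probability of excessive distortion (i.e., distortion larger than $D$) is smaller than $1 - \varepsilon$.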

The infimum $\varepsilon$-achievable data compression rate at distortion $D$ for $\boldsymbol{X}$ is denoted by $T_\varepsilon(D, \boldsymbol{X})$.

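Theorem 3.1 (General Data Compression Theorem): Fix $D > 0$ and $1 > \varepsilon > 0$. Let $\boldsymbol{X}$ and $\{\rho_n(\cdot, \cdot)\}_{n \ge 1}$ be given. Then

$$T_\varepsilon(D, \boldsymbol{X}) = R_\varepsilon(D)$$

where

$$R_\varepsilon(D) \triangleq \inf_{\left\{\boldsymbol{W} \,:\, \sup[\theta : \lambda_{\boldsymbol{X},\boldsymbol{Y}}(\theta) \le \varepsilon] \le D\right\}} \bar{I}(\boldsymbol{X}; \boldsymbol{Y})$$

where the infimum is taken over all conditional distributions $\boldsymbol{W} = \{P_{Y^n|X^n}\}_{n=1}^{\infty}$ for which the joint distribution $P_{\boldsymbol{X}\boldsymbol{Y}} = P_{\boldsymbol{X}} \boldsymbol{W}$ satisfies the distortion constraint.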

Proof:

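1) Forward part (achievability): Choose $\gamma > 0$. We will prove the existence of a sequence of data compression codes with

$$\limsup_{n \to \infty} \frac{1}{n} \log \|f_n\| \le R_\varepsilon(D) + 2\gamma \quad \text{and} \quad \sup\left[\theta : \lambda_{\boldsymbol{X},\boldsymbol{f}(\boldsymbol{X})}(\theta) \le \varepsilon\right] \le D + \gamma.$$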

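Step 1: Let $\tilde{\boldsymbol{W}}$ be the channel distribution achieving $R_\varepsilon(D)$, and let $P_{\tilde{\boldsymbol{Y}}}$ be the $\boldsymbol{Y}$-marginal of $P_{\boldsymbol{X}} \tilde{\boldsymbol{W}}$.

Step 2: Let $R = R_\varepsilon(D) + 2\gamma$. Choose $M = e^{nR}$ $n$-blocks independently according to $P_{\tilde{\boldsymbol{Y}}}$, and denote the resulting random set by $\mathcal{C}_n$.

Step 3: For a given $\mathcal{C}_n$, we denote by $A(\mathcal{C}_n)$ the set of sequences $x^n \in \mathcal{X}^n$ such that there exists $y^n \in \mathcal{C}_n$ with

$$\frac{1}{n} \rho_n(x^n, y^n) \le D + \gamma.$$

Step 4: Claim:

$$\limsup_{n \to \infty} E_{\tilde{\boldsymbol{Y}}}\left[P_{X^n}(A^c(\mathcal{C}_n))\right] < 1 - \varepsilon.$$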

The proof of this claim is provided in the Appendix. Therefore, there exists (a sequence of) $\mathcal{C}_n^*$ such that

$$\limsup_{n \to \infty} P_{X^n}(A^c(\mathcal{C}_n^*)) < 1 - \varepsilon.$$

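Step 5: Define a sequence of codes $\{f_n\}$ by

$$f_n(x^n) = \begin{cases} \arg\min_{y^n \in \mathcal{C}_n^*} \rho_n(x^n, y^n), & \text{if } x^n \in A(\mathcal{C}_n^*) \\ \boldsymbol{0}, & \text{otherwise} \end{cases}$$

where $\boldsymbol{0}$ is a fixed default $n$-tuple in $\mathcal{Y}^n$. Then

$$\left\{ x^n \in \mathcal{X}^n : \frac{1}{n} \rho_n(x^n, f_n(x^n)) \le D + \gamma \right\} \supseteq A(\mathcal{C}_n^*)$$

since $(\forall\, x^n \in A(\mathcal{C}_n^*))$ there exists $y^n \in \mathcal{C}_n^*$ such that $(1/n)\rho_n(x^n, y^n) \le D + \gamma$, which by definition of $f_n$ implies that $(1/n)\rho_n(x^n, f_n(x^n)) \le D + \gamma$.

Step 6: Consequently,

$$\begin{aligned} \lambda_{\boldsymbol{X},\boldsymbol{f}(\boldsymbol{X})}(D + \gamma) &= \liminf_{n \to \infty} P_{X^n}\left\{ x^n \in \mathcal{X}^n : \frac{1}{n} \rho_n(x^n, f_n(x^n)) \le D + \gamma \right\} \\ &\ge \liminf_{n \to \infty} P_{X^n}(A(\mathcal{C}_n^*)) \\ &= 1 - \limsup_{n \to \infty} P_{X^n}(A^c(\mathcal{C}_n^*)) \\ &> \varepsilon. \end{aligned}$$

Hence

$$\sup\left[\theta : \lambda_{\boldsymbol{X},\boldsymbol{f}(\boldsymbol{X})}(\theta) \le \varepsilon\right] \le D + \gamma$$

where the last step is clearly depicted in Fig. 1. This proves the forward part.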
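The construction in Steps 2 to 5 can be sketched in a few lines for a toy setting. The sketch assumes binary alphabets, Hamming distortion, and an i.i.d. Bernoulli codeword distribution standing in for $P_{\tilde{\boldsymbol{Y}}}$; none of these choices come from the paper, and the helper names are hypothetical.

```python
import random

def hamming(x, y):
    """Hamming distortion between two equal-length tuples."""
    return sum(a != b for a, b in zip(x, y))

def draw_codebook(n, size, p_one, rng):
    """Draw `size` codewords independently, each i.i.d. Bernoulli(p_one) of length n."""
    return [tuple(int(rng.random() < p_one) for _ in range(n)) for _ in range(size)]

def encode(x, codebook, max_distortion, default):
    """Nearest codeword if x is covered within max_distortion, else the default tuple."""
    best = min(codebook, key=lambda y: hamming(x, y))
    if hamming(x, best) / len(x) <= max_distortion:
        return best       # x lies in the covered set A(C_n)
    return default        # x is not covered: emit the fixed default n-tuple

rng = random.Random(0)
n = 8
codebook = draw_codebook(n, size=16, p_one=0.5, rng=rng)
default = (0,) * n
x = tuple(int(rng.random() < 0.5) for _ in range(n))
y = encode(x, codebook, max_distortion=0.25, default=default)
```

Here `max_distortion` plays the role of $D + \gamma$. The proof does not use one fixed random draw as above; it averages over all codebooks and then selects a good one, which the sketch does not attempt.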

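2) Converse part: We show that for any sequence of encoders $\{f_n(\cdot)\}_{n=1}^{\infty}$, if

$$\sup\left[\theta : \lambda_{\boldsymbol{X},\boldsymbol{f}(\boldsymbol{X})}(\theta) \le \varepsilon\right] \le D$$

then

$$\limsup_{n \to \infty} \frac{1}{n} \log \|f_n\| \ge R_\varepsilon(D).$$

Let

$$\hat{W}_n(y^n \mid x^n) \triangleq \begin{cases} 1, & \text{if } y^n = f_n(x^n) \\ 0, & \text{otherwise.} \end{cases}$$

Let $\hat{Y}^n$ denote the output corresponding to input $X^n$ and channel $\hat{W}_n$. Then to evaluate the statistical properties of the random sequence $\{(1/n)\rho_n(X^n, f_n(X^n))\}_{n=1}^{\infty}$ under distribution $P_{X^n}$ is equivalent to evaluating those of the random sequence $\{(1/n)\rho_n(X^n, \hat{Y}^n)\}_{n=1}^{\infty}$ under distribution $P_{X^n} \hat{W}_n$. Therefore,

$$\begin{aligned} R_\varepsilon(D) &\triangleq \inf_{\left\{\boldsymbol{W} \,:\, \sup[\theta : \lambda_{\boldsymbol{X},\boldsymbol{Y}}(\theta) \le \varepsilon] \le D\right\}} \bar{I}(\boldsymbol{X}; \boldsymbol{Y}) \\ &\le \bar{I}(\boldsymbol{X}; \hat{\boldsymbol{Y}}) \\ &\le \bar{H}(\hat{\boldsymbol{Y}}) - \underline{H}(\hat{\boldsymbol{Y}} \mid \boldsymbol{X}) \\ &\le \bar{H}(\hat{\boldsymbol{Y}}) \\ &\le \limsup_{n \to \infty} \frac{1}{n} \log \|f_n\|, \end{aligned}$$

where the second inequality follows from [4, Theorem 8, property (d)] and the third inequality follows from the fact that $\underline{H}(\hat{\boldsymbol{Y}} \mid \boldsymbol{X}) \ge 0$.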
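The last inequality of the chain has a familiar finite-blocklength counterpart: the output of a deterministic encoder with $\|f_n\|$ codewords has entropy at most $\log \|f_n\|$. The sketch below checks this ordinary (Shannon entropy) version on a toy example; the source distribution and the encoder are hypothetical, and the check is only an analogue of the sup-entropy-rate statement used in the proof.

```python
import math
from collections import Counter
from itertools import product

def output_entropy(p_x, encoder):
    """Entropy in nats of encoder(X) when X is distributed according to p_x."""
    p_y = Counter()
    for x, p in p_x.items():
        p_y[encoder(x)] += p  # deterministic channel: all mass of x goes to f(x)
    return -sum(p * math.log(p) for p in p_y.values() if p > 0)

# Hypothetical toy: X^3 i.i.d. Bernoulli(0.3); the encoder repeats the first bit.
n, q = 3, 0.3
p_x = {
    x: math.prod(q if bit else 1 - q for bit in x)
    for x in product((0, 1), repeat=n)
}

def encoder(x):
    """Map x to one of two codewords, 000 or 111, according to its first bit."""
    return (x[0],) * n

num_codewords = len({encoder(x) for x in p_x})
h_out = output_entropy(p_x, encoder)
```

In this example the output entropy equals the binary entropy of 0.3, which is below $\log 2$, the logarithm of the number of codewords.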

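APPENDIX

Claim (cf. Proof of Theorem 3.1):

$$\limsup_{n \to \infty} E_{\tilde{\boldsymbol{Y}}}\left[P_{X^n}(A^c(\mathcal{C}_n^*))\right] < 1 - \varepsilon.$$

Proof:

Step 1: Let

$$D^{(\varepsilon)} \triangleq \sup\left\{\theta : \lambda_{\boldsymbol{X},\tilde{\boldsymbol{Y}}}(\theta) \le \varepsilon\right\}.$$

Define

$$A^{(\varepsilon)}_{n,\gamma} \triangleq \left\{ (x^n, y^n) : \frac{1}{n} \rho_n(x^n, y^n) \le D^{(\varepsilon)} + \gamma \ \text{ and } \ \frac{1}{n} i_{X^n\tilde{Y}^n}(x^n; y^n) \le \bar{I}(\boldsymbol{X}; \tilde{\boldsymbol{Y}}) + \gamma \right\}.$$

Since

$$\liminf_{n \to \infty} \Pr\left( \mathcal{D} \triangleq \left\{ \frac{1}{n} \rho_n(X^n, \tilde{Y}^n) \le D^{(\varepsilon)} + \gamma \right\} \right) > \varepsilon$$

and

$$\liminf_{n \to \infty} \Pr\left( \mathcal{E} \triangleq \left\{ \frac{1}{n} i_{X^n\tilde{Y}^n}(X^n; \tilde{Y}^n) \le \bar{I}(\boldsymbol{X}; \tilde{\boldsymbol{Y}}) + \gamma \right\} \right) = 1$$

we have

$$\begin{aligned} \liminf_{n \to \infty} \Pr\left(A^{(\varepsilon)}_{n,\gamma}\right) &= \liminf_{n \to \infty} \Pr(\mathcal{D} \cap \mathcal{E}) \\ &\ge \liminf_{n \to \infty} \Pr(\mathcal{D}) + \liminf_{n \to \infty} \Pr(\mathcal{E}) - 1 \\ &> \varepsilon + 1 - 1 = \varepsilon. \end{aligned}$$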

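Step 2: Let $K(x^n, y^n)$ be the indicator function of $A^{(\varepsilon)}_{n,\gamma}$:

$$K(x^n, y^n) = \begin{cases} 1, & \text{if } (x^n, y^n) \in A^{(\varepsilon)}_{n,\gamma} \\ 0, & \text{otherwise.} \end{cases}$$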
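Random-coding bounds of the kind used in Step 3 typically rest on the elementary inequality $(1 - ab)^M \le 1 - b + e^{-aM}$ for $0 \le a, b \le 1$ and $M > 0$. That this is the exact tool behind the final bound of Step 3 is an assumption of this note, since the proof only points to [3, Eqs. (9)–(12)]. The sketch below checks the inequality numerically on a grid; the function names are illustrative.

```python
import math

def lhs(a, b, m):
    """Left-hand side (1 - a*b)^m."""
    return (1 - a * b) ** m

def rhs(a, b, m):
    """Right-hand side 1 - b + exp(-a*m)."""
    return 1 - b + math.exp(-a * m)

grid = [i / 10 for i in range(11)]  # a, b in {0.0, 0.1, ..., 1.0}
violations = [
    (a, b, m)
    for a in grid
    for b in grid
    for m in (1, 2, 10, 100, 1000)
    if lhs(a, b, m) > rhs(a, b, m) + 1e-12  # small tolerance for rounding
]
```

In the setting of the proof, $a$ would correspond to $e^{-n(\bar{I}(\boldsymbol{X}; \tilde{\boldsymbol{Y}}) + \gamma)}$, $b$ to the inner sum over $y^n$, and $M$ to $e^{nR}$, so that $aM$ grows like $e^{n\gamma}$ and the exponential term vanishes.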

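Step 3: By following a similar argument in [3, Eqs. (9)–(12)], we obtain

$$\begin{aligned} E_{\tilde{\boldsymbol{Y}}}\left[P_{X^n}(A^c(\mathcal{C}_n^*))\right] &= \sum_{\mathcal{C}_n^*} P_{\tilde{Y}^n}(\mathcal{C}_n^*) \sum_{x^n \notin A(\mathcal{C}_n^*)} P_{X^n}(x^n) \\ &= \sum_{x^n \in \mathcal{X}^n} P_{X^n}(x^n) \sum_{\mathcal{C}_n^* : x^n \notin A(\mathcal{C}_n^*)} P_{\tilde{Y}^n}(\mathcal{C}_n^*) \\ &= \sum_{x^n \in \mathcal{X}^n} P_{X^n}(x^n) \left[ 1 - \sum_{y^n \in \mathcal{Y}^n} P_{\tilde{Y}^n}(y^n) K(x^n, y^n) \right]^M \\ &\le \sum_{x^n \in \mathcal{X}^n} P_{X^n}(x^n) \left[ 1 - e^{-n(\bar{I}(\boldsymbol{X}; \tilde{\boldsymbol{Y}}) + \gamma)} \sum_{y^n \in \mathcal{Y}^n} P_{\tilde{Y}^n|X^n}(y^n \mid x^n) K(x^n, y^n) \right]^M \\ &\le 1 - \sum_{x^n \in \mathcal{X}^n} \sum_{y^n \in \mathcal{Y}^n} P_{X^n}(x^n) P_{\tilde{Y}^n|X^n}(y^n \mid x^n) K(x^n, y^n) + \exp\left\{ -e^{n(R - R_\varepsilon(D) - \gamma)} \right\}. \end{aligned}$$

Therefore,

$$\limsup_{n \to \infty} E_{\tilde{\boldsymbol{Y}}}\left[P_{X^n}(A^c(\mathcal{C}_n^*))\right] \le 1 - \liminf_{n \to \infty} \Pr\left(A^{(\varepsilon)}_{n,\gamma}\right) < 1 - \varepsilon.$$

ACKNOWLEDGMENT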

The authors would like to thank Prof. S. Verdú for his valuable advice and encouragement.

REFERENCES

[1] R. E. Blahut, Principles and Practice of Information Theory. Reading, MA: Addison-Wesley, 1988.

[2] T. S. Han and S. Verdú, “Approximation theory of output statistics,” IEEE Trans. Inform. Theory, vol. 39, pp. 752–772, May 1993.

[3] Y. Steinberg and S. Verdú, “Simulation of random processes and rate-distortion theory,” IEEE Trans. Inform. Theory, vol. 42, pp. 63–86, Jan. 1996.

[4] S. Verdú and T. S. Han, “A general formula for channel capacity,” IEEE Trans. Inform. Theory, vol. 40, pp. 1147–1157, July 1994.

