Subset Typicality Lemmas and Improved Achievable Regions in Multiterminal Source Coding


arXiv:1205.1173v1 [cs.IT] 6 May 2012


Kumar Viswanatha, Emrah Akyol and Kenneth Rose

ECE Department, University of California - Santa Barbara

{kumar,eakyol,rose}@ece.ucsb.edu

Abstract—Consider the following information-theoretic setup, wherein independent codebooks of N correlated random variables are generated according to their respective marginals. The problem of determining the conditions on the rates of the codebooks that ensure the existence of at least one codeword tuple that is jointly typical with respect to a given joint density (called the multivariate covering lemma) has been studied fairly well, and the associated rate regions have found applications in several source coding scenarios. However, several multiterminal source coding applications, such as the general multi-user Gray-Wyner network, require joint typicality only within subsets of the transmitted codewords. Motivated by such applications, we ask for the conditions on the rates that ensure the existence of at least one codeword tuple that is jointly typical within subsets according to given per-subset joint densities. This report focuses primarily on deriving a new achievable rate region for this problem which strictly improves upon the direct extension of the multivariate covering lemma, which has been widely used in earlier work. Towards proving this result, we derive two important results called ‘subset typicality lemmas’, which can potentially have broader applicability in more general scenarios beyond what is considered in this report. We finally apply these results to derive a new achievable region for the general multi-user Gray-Wyner network.

Index Terms—Typicality within subsets, Multivariate covering lemma, Multi-user Gray-Wyner network

I. INTRODUCTION

Consider a scenario where independent codebooks of N random variables (X₁, X₂, . . . , X_N) are generated according to some given marginal distributions at rates (R₁, R₂, . . . , R_N) respectively. Let S₁, S₂, . . . , S_M be M subsets of {1, 2, . . . , N} and let the joint distributions of (X₁, X₂, . . . , X_N) within each subset, consistent with each other and with the marginal distributions, be given. We ask for the conditions on the rates (R₁, R₂, . . . , R_N) (the achievable region) under which the probability of finding one codeword from each codebook, such that the codewords are all jointly typical within subsets S₁, S₂, . . . , S_M according to the given per-subset joint distributions, approaches 1. We denote the given probability distribution over subset S_i by P({X}_{S_i}). The conditions on the rates when S_i = {1, . . . , N}, i.e., when the joint distribution over all the random variables is given, can be derived using standard typicality arguments; this result is popularly called the multivariate covering lemma [1], [2]¹. It says that for any joint density over (X₁, X₂, . . . , X_N), if the codebooks are generated according to the respective marginals, the probability of not finding a jointly typical codeword tuple approaches 0 if ∀J ⊆ {1, 2, . . . , N}:

$$\sum_{i \in J} R_i \ge \sum_{i \in J} H(X_i) - H\big(P(\{X\}_J)\big) \tag{1}$$

where {X}_J denotes the set {X_i : i ∈ J} and H(P) denotes the entropy of any distribution P.
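For a concrete joint PMF, the right-hand side of (1) can be tabulated by brute force over all nonempty J. The following is a minimal sketch under stated assumptions: the helper names (`entropy`, `marginal`, `covering_bounds`) and the two-variable example source are illustrative and do not come from the paper, and variables are indexed from 0.

```python
from itertools import combinations
from math import log2

def entropy(pmf):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def marginal(joint, idx):
    """Marginalize a joint PMF (dict keyed by tuples) onto the coordinates in idx."""
    out = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out

def covering_bounds(joint, n_vars):
    """Right-hand side of (1) for every nonempty J: sum_i H(X_i) - H({X}_J)."""
    bounds = {}
    for size in range(1, n_vars + 1):
        for J in combinations(range(n_vars), size):
            sum_marginals = sum(entropy(marginal(joint, (i,))) for i in J)
            bounds[J] = sum_marginals - entropy(marginal(joint, J))
    return bounds

# Hypothetical example: two uniform bits that disagree with probability 0.1.
joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
bounds = covering_bounds(joint, 2)
for J, b in bounds.items():
    print(J, round(b, 4))
```

For N = 2 the only non-trivial constraint is the one for J = {1, 2}, which reduces to R₁ + R₂ ≥ I(X₁; X₂).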

A fairly direct extension of the multivariate covering lemma to the more general scenario of arbitrary subsets S₁, S₂, . . . , S_M, which has been widely used in several information-theoretic scenarios, such as [2], [4], [5], [6], [7], can be described as follows. Fix any joint density P̃(X₁, X₂, . . . , X_N) such that:

$$\tilde{P}(\{X\}_{S_j}) = P_{S_j}(\{X\}_{S_j}) \quad \forall j \tag{2}$$

i.e., it satisfies the given joint distributions within subsets S_j ∀j. Then all rate tuples satisfying the following conditions are achievable, ∀J ⊆ {1, 2, . . . , N}:

$$\sum_{i \in J} R_i \ge \sum_{i \in J} H(X_i) - H\big(\tilde{P}(\{X\}_J)\big) \tag{3}$$

The convex closure of all achievable rate tuples, over all such joint densities P̃ satisfying the given per-subset densities, is an achievable region for the problem. We denote this region by R_a. Our primary objective in this report is to show that the rate region in (3), with the individual functionals set to their respective maxima subject only to their own constraints, is in fact achievable. Specifically, we show that each of the terms H(P̃({X}_J)) can be replaced with the corresponding maximum entropy functional H*(P̃({X}_J)), subject only to the constraints pertinent to subsets of {X}_J. This allows us to achieve the simultaneous optimum of all the functionals, leading to a strictly larger achievable region than R_a. Towards proving this result, we establish two important lemmas, namely the ‘subset typicality lemmas’, which may prove to have much wider applicability in general scenarios beyond the scope of this report.

¹ We note that the underlying principles and proofs of the multivariate covering lemma appeared much earlier in the literature, for example in [3]. However, the nomenclature and the general applicability of the underlying ideas have been elucidated quite clearly in [1].

Scenarios such as the one described above, where typicality within subsets of codewords is sufficient for decoding, arise quite frequently in multiterminal source coding setups. One of the most typical examples is the multi-user generalization of the Gray-Wyner network [8], discussed in Section III, where the encoder observes K random variables and there are K sinks, each decoding one of the random variables up to a prescribed distortion constraint². The most general setting involves 2^K − 1 branches (encoding rates), each being sent to a unique subset of the decoders. Observe that it is sufficient if all the codewords sent to sink i are jointly typical with the i-th source sequence; enforcing joint typicality of all the codewords is an unnecessary restriction. Similar settings arise in the context of dispersive information routing of correlated sources [7], fusion coding and selective retrieval in a database [6], and in several other scenarios which can be considered as particular cross-sections of the general L-channel ‘multiple descriptions’ (MD) problem [2], [4]. We note that, in this report, we demonstrate the workings of the underlying principle in the context of the example described above. However, it is important to note that the results we derive have implications in a wide variety of problems involving optimization of multiple functionals, each depending on a subset of the random variables, subject to constraints on their joint distributions.

II. MAIN RESULTS

² We note that [9] considers a particular generalization of the Gray-Wyner network to multiple users, with applications in information-theoretic security, where a unique common branch is sent to all the decoders along with their respective individual rates. However, we assert that the most general extension of the 2-user Gray-Wyner network involves a combinatorial number of branches, each being sent to a unique subset of the decoders.

In this section, we first establish the subset typicality lemmas, which will finally lead to Theorems 1 and 2 showing strictly larger achievable rates compared to R_a. Throughout the report, we use the following notation.

n independent and identically distributed (iid) copies of a random variable and its realizations are denoted by X₀ⁿ and x₀ⁿ respectively. The length-n, ε-typical set of any random variable X with distribution P(X) is denoted³ by T_ε^n(P(X)). Throughout the report, for any set S, we use the shorthand {U}_S to denote the set {U_i : i ∈ S}. Note the difference between U₁₂₃, which is a single random variable, and {U}₁₂₃, which is the set of random variables {U₁, U₂, U₃}. In the following Lemmas, we use the notation P(A) ≐ 2^{−nR} to denote 2^{−n(R+δ(ε))} ≤ P(A) ≤ 2^{−n(R−δ(ε))} for some δ(ε) → 0 as ε → 0. To avoid resolvable but unnecessary complications, we further assume that there exists at least one joint distribution consistent with the prescribed per-subset distributions for S₁, S₂, . . . , S_M.

A. Subset Typicality Lemmas

Lemma 1. Subset Typicality Lemma: Let (X₁, X₂, . . . , X_N) be N random variables taking values in arbitrary finite alphabets (𝒳₁, 𝒳₂, . . . , 𝒳_N) respectively. Let their marginal distributions be P₁(X₁), P₂(X₂), . . . , P_N(X_N) respectively. Let S₁, S₂, . . . , S_M be M subsets of {1, 2, . . . , N} and, for all j ∈ {1, 2, . . . , M}, let P_{S_j}({X}_{S_j}) be any given joint distribution for {X}_{S_j}, consistent with each other and with the given marginal distributions. Generate sequences x₁ⁿ, x₂ⁿ, . . . , x_Nⁿ, each independent of the others, where x_iⁿ is drawn iid according to the marginal distribution P_i(X_i), i.e., x_i^n ∼ ∏_{l=1}^{n} P_i(x_{il}). Then,

$$P\Big(\{x\}^n_{S_j} \in T_\epsilon^n\big(P_{S_j}(\{X\}_{S_j})\big),\ \forall j \in \{1, \ldots, M\}\Big) \doteq 2^{-n\left(\sum_{i=1}^{N} H(X_i) - H(P^*)\right)} \tag{4}$$

where P* is a distribution over (X₁, X₂, . . . , X_N) which satisfies:

$$P^* = \arg\max_{\tilde{P}} H(\tilde{P}) \tag{5}$$

subject to P̃({X}_{S_j}) = P_{S_j}({X}_{S_j}) ∀j ∈ {1, . . . , M}.

This Lemma essentially says that the total number of sequence tuples (x₁ⁿ, x₂ⁿ, . . . , x_Nⁿ), generated according to their respective marginals, which are jointly ε-typical according to P_{S_j}({X}_{S_j}) ∀j within subsets S₁, S₂, . . . , S_M, is approximately 2^{nH(P*)}, where P* is the maximum entropy distribution subject to the constraint that the joint density within subset S_j is P_{S_j}({X}_{S_j}) ∀j.
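The distribution P* in (5) rarely has a closed form. One standard numerical route, which is not used in the paper, is iterative proportional fitting (IPF) started from the uniform distribution; for consistent constraints it is known to converge to the maximum entropy distribution with the prescribed subset marginals. The sketch below is a plain-Python illustration under that assumption. The function names, the 0-based indexing and the two pairwise PMFs `p12` and `p23` are hypothetical.

```python
from itertools import product
from math import log2

def marginal(joint, idx):
    """Marginalize a joint PMF (dict keyed by outcome tuples) onto coordinates idx."""
    out = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out

def entropy(pmf):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def max_entropy_ipf(alphabet_sizes, constraints, sweeps=200):
    """Approximate P* in (5) by IPF started from the uniform PMF.

    constraints is a list of (idx, target) pairs: idx is a tuple of 0-based
    variable indices (one subset S_j), target is the prescribed PMF on them.
    """
    outcomes = list(product(*(range(k) for k in alphabet_sizes)))
    p = {x: 1.0 / len(outcomes) for x in outcomes}
    for _ in range(sweeps):
        for idx, target in constraints:
            current = marginal(p, idx)
            for x in outcomes:
                key = tuple(x[i] for i in idx)
                if current[key] > 0:
                    # Rescale so that the marginal on idx matches the target.
                    p[x] *= target.get(key, 0.0) / current[key]
    return p

# Hypothetical constraints on three bits: S_1 = {X1, X2}, S_2 = {X2, X3}.
p12 = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
p23 = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.3}
p_star = max_entropy_ipf((2, 2, 2), [((0, 1), p12), ((1, 2), p23)])
h_star = entropy(p_star)

# Exponent of (4) for this toy case: sum of marginal entropies minus H(P*).
exponent = 3.0 - h_star
print(round(h_star, 4), round(exponent, 4))
```

In this toy case the maximizer is the Markov chain X₁ - X₂ - X₃, so H(P*) equals H(X₁, X₂) + H(X₂, X₃) − H(X₂), which gives a simple sanity check on the numerical output.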

³ The parentheses are dropped whenever the meaning is obvious.


Proof: To prove this Lemma, we resort to Sanov’s theorem ([10], Theorem 11.4.1) from the theory of large deviations. Sanov’s theorem states that for any distribution Q(X) and for any subset of probability distributions E ⊆ 𝒫, where 𝒫 denotes the universe of PMFs over the alphabet of X:

$$Q^n(E) \doteq 2^{-nD(P^* \| Q)} \tag{6}$$

for sufficiently large n, where P* is the distribution in E closest in relative entropy to Q, and Qⁿ(E) denotes the probability that an iid sequence generated according to Q(X) is ε-typical with respect to some distribution in E. We set Q(·) = ∏_{i=1}^{N} P_i(X_i) and take E to be the set of all distributions over (X₁, X₂, . . . , X_N) satisfying the given constraints. It then follows from Sanov’s theorem that the probability of (x₁ⁿ, . . . , x_Nⁿ) being ε-typical according to some distribution satisfying the given constraints is approximately 2^{−nD(P* || ∏_{i=1}^{N} P_i(X_i))}, where P* is the distribution having minimum relative entropy to ∏_{i=1}^{N} P_i(X_i) and satisfying the given constraints.

However, all such distributions have the same marginal distributions P₁(X₁), P₂(X₂), . . . , P_N(X_N). Hence minimizing the relative entropy is equivalent to maximizing the joint entropy, leading to P* as defined in (5). Therefore we have:

$$P\Big(\{x\}^n_{S_i} \in T_\epsilon^n(\{X\}_{S_i}),\ \forall i\Big) = Q^n(E) \doteq 2^{-nD(P^* \| Q)} = 2^{-n\left(\sum_{i=1}^{N} H(X_i) - H(P^*)\right)} \tag{7}$$

where the last equality follows because P* satisfies the given marginals.
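The last equality in (7) can be written out explicitly. This is a routine expansion added for convenience; it uses only the fact that the i-th marginal of P* equals P_i.

```latex
\begin{align*}
D\Big(P^* \,\Big\|\, \prod_{i=1}^{N} P_i\Big)
 &= \sum_{x_1,\dots,x_N} P^*(x_1,\dots,x_N)\,
    \log \frac{P^*(x_1,\dots,x_N)}{\prod_{i=1}^{N} P_i(x_i)} \\
 &= -H(P^*) - \sum_{i=1}^{N} \sum_{x_i} P_i(x_i) \log P_i(x_i) \\
 &= \sum_{i=1}^{N} H(X_i) - H(P^*).
\end{align*}
```

Since the sum of marginal entropies is the same for every distribution satisfying the constraints, minimizing the relative entropy over that set amounts to maximizing H, which is why the same P* appears in (5) and in Sanov’s exponent.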

We note that a particular instance of Lemma 1 was derived in [11]. However, as it turns out, for the setup they consider, this Lemma does not help in deriving an improved achievable region. In the following lemma, we establish the conditional version of Lemma 1. Note that Lemma 2 is not used in proving Theorems 1 or 2, but will play a crucial role in the application of these results to more general multi-terminal source coding scenarios (as we will see in section III).

Lemma 2. Conditional Subset Typicality Lemma: Let random variables (X₁, X₂, . . . , X_N), sets S₁, S₂, . . . , S_M and joint densities P_{S_j}({X}_{S_j}) be defined as in Lemma 1. Let the sequences (x₁ⁿ, . . . , x_Nⁿ) be generated such that each sequence x_iⁿ is generated conditioned on a subset of already generated sequences {x}ⁿ_{A_i} and independent of the rest, where (i, A_i) ∈ S_j for some j ∈ {1, . . . , M}. Then we have:

$$P\Big(\{x\}^n_{S_i} \in T_\epsilon^n(\{X\}_{S_i})\ \forall i \in \{1, \ldots, M\}\Big) \doteq 2^{-n\left(\sum_{i=1}^{N} H(X_i \mid \{X\}_{A_i}) - H(P^*)\right)} \tag{8}$$

where P* satisfies (5).

Proof: The proof follows along very similar lines to that of Lemma 1 by setting Q(·) = ∏_{i=1}^{N} P(X_i | {X}_{A_i}), as conditioning on {x}ⁿ_{A_i} only introduces further constraints, which are redundant, as the P_{S_j}({X}_{S_j}) are consistent with each other and (i, A_i) ∈ S_j for some j ∈ {1, . . . , M}.

B. Simultaneous Optimality of Functionals

In this section we will show that simultaneous optimality of all the functionals H(P̃({X}_J)) is in fact achievable, leading to a new achievable rate region for the problem stated in the introduction.

Theorem 1. Let random variables (X₁, X₂, . . . , X_N), sets S₁, S₂, . . . , S_M and joint densities P_{S_j}({X}_{S_j}) be defined as in Lemma 1. For each i ∈ {1, 2, . . . , N}, let x_iⁿ(m_i), m_i ∈ {1, . . . , 2^{nR_i}}, be independent sequences drawn iid according to the respective marginals, i.e., x_i^n(m_i) ∼ ∏_{l=1}^{n} P_i(x_{il}(m_i)) ∀m_i ∈ {1, . . . , 2^{nR_i}}. Then ∀ε > 0, ∃δ(ε) such that δ(ε) → 0 as ε → 0 and,

$$P\Big(\{x\}^n_{S_j}(\{m\}_{S_j}) \in T_\epsilon^n\big(P_{S_j}(\{X\}_{S_j})\big)\ \forall j,\ \text{for some } \{m_1, m_2, \ldots, m_N\}\Big) \ge 1 - \delta(\epsilon) \tag{9}$$

if (R₁, R₂, . . . , R_N) satisfy the following conditions ∀J ⊆ {1, 2, . . . , N}:

$$\sum_{i \in J} R_i \ge \sum_{i \in J} H(X_i) - H^*(\{X\}_J) + \epsilon \tag{10}$$

where,

$$H^*(\{X\}_J) = \max_{\tilde{P}(\{X\}_J)} H\big(\tilde{P}(\{X\}_J)\big) \tag{11}$$

where P̃({X}_J) satisfies:

$$\tilde{P}(\{X\}_{J \cap S_j}) = P(\{X\}_{J \cap S_j}) \quad \forall j \in \{1, \ldots, M\} \tag{12}$$

We denote the rate region in (10) by R^*_a.
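The constraint set in (12) depends on J only through the intersections J ∩ S_j. The short sketch below enumerates these intersections for a given J, and flags the case where J lies inside some S_j so that (12) already fixes the whole distribution of {X}_J. The function names and the choice of N = 4 with all pairs as subsets are illustrative assumptions.

```python
from itertools import combinations

def restricted_constraints(J, subsets):
    """Distinct nonempty index sets J ∩ S_j on which (12) fixes the distribution."""
    J = frozenset(J)
    found = {tuple(sorted(J & frozenset(S))) for S in subsets if J & frozenset(S)}
    return sorted(found)

def is_fully_specified(J, subsets):
    """True when J is contained in some S_j, so (12) determines P({X}_J) completely."""
    return any(frozenset(J) <= frozenset(S) for S in subsets)

# Hypothetical setting: N = 4 and the S_j are all pairs of {1, 2, 3, 4}.
pairs = list(combinations(range(1, 5), 2))

print(restricted_constraints({1, 2, 3}, pairs))
print(is_fully_specified({1, 2, 3}, pairs))
print(is_fully_specified({1, 2}, pairs))
```

For J = {1, 2, 3} only the pairwise and single-variable distributions inside J enter the maximization in (11); constraints that involve X₄ play no role for this J.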

Remark 1. Note that H*({X}_J) = H(P({X}_J)) if J ⊆ S_j for some S_j. Hence, for all J such that J ⊆ S_j for some j, the corresponding inequalities in Theorem 1 and in (3) are the same. However, this theorem asserts that for every other J, the functionals in (3) can be replaced with the ‘maximum joint entropy’ subject to the given subset distributions which involve only the random variables {X}_J. It is very important to note that the maximum entropy distributions for two different subsets {X}_{J₁} and {X}_{J₂}, J₁, J₂ ⊆ {1, 2, . . . , N}, may not even correspond to any valid joint distribution over (X₁, X₂, . . . , X_N). This is precisely what provides the additional leeway in achieving points which are strictly outside the region in (3), as illustrated in Theorem 2. A pictorial representation of the above theorem is shown in Fig. 1.

Proof: We are interested in finding conditions on the rates so that the probability in (9) approaches 1. Denote the event ℰ = {∄ {m₁, m₂, . . . , m_N} : {x}ⁿ_{S_j}({m}_{S_j}) ∈ T_ε^n(P_{S_j}({X}_{S_j})) ∀j}. We want to make P(ℰ) → 0. Let 𝒩 denote the set {1, 2, . . . , N} and let (m₁, m₂, . . . , m_N) = {m}_𝒩 be an index tuple, one from each codebook, such that m_i ∈ {1, . . . , 2^{nR_i}}. Let ℰ({m}_𝒩) denote the event that {x}ⁿ_{S_j}({m}_{S_j}) ∈ T_ε^n(P_{S_j}({X}_{S_j})) ∀j. Define random variables χ({m}_𝒩) such that:

$$\chi(\{m\}_{\mathcal{N}}) = \begin{cases} 1 & \text{if } \mathcal{E}(\{m\}_{\mathcal{N}}) \text{ occurs} \\ 0 & \text{else} \end{cases} \tag{13}$$

and the random variable χ = ∑_{{m}_𝒩} χ({m}_𝒩). Then we have P(ℰ) = P(χ = 0). From Chebyshev’s inequality, it follows that:

$$P(\mathcal{E}) = P(\chi = 0) \le P\big[\,|\chi - E(\chi)| \ge E(\chi)/2\,\big] \le \frac{4\,\mathrm{Var}(\chi)}{(E(\chi))^2} = \frac{4\big(E(\chi^2) - (E(\chi))^2\big)}{(E(\chi))^2} \tag{14}$$
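The two inequalities in (14) can be unpacked as follows. This is a standard second-moment step spelled out for convenience; it assumes E(χ) > 0, which holds whenever a single index tuple succeeds with positive probability.

```latex
\begin{align*}
\chi = 0 \;&\Longrightarrow\; |\chi - E(\chi)| = E(\chi) \ge \tfrac{1}{2} E(\chi), \\
P\big[\,|\chi - E(\chi)| \ge a\,\big] &\le \frac{\mathrm{Var}(\chi)}{a^2}
\quad \text{with } a = \tfrac{1}{2} E(\chi)
\;\Longrightarrow\;
P(\chi = 0) \le \frac{4\,\mathrm{Var}(\chi)}{(E(\chi))^2}.
\end{align*}
```

The first line gives the event inclusion behind the first inequality, and the second line is Chebyshev’s inequality evaluated at the threshold E(χ)/2.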

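We next bound E(χ) and E(χ²) using Lemma 1. First we write E(χ) as:

$$E(\chi) = 2^{n \sum_{i=1}^{N} R_i}\, P\big(\mathcal{E}(\{m\}_{\mathcal{N}})\big) \tag{15}$$

for any {m}_𝒩, because all the sequences are drawn independently of each other. Next, towards bounding E(χ²), note that:

$$E(\chi^2) = \sum_{\{m\}_{\mathcal{N}}} \sum_{\{l\}_{\mathcal{N}}} P\big(\mathcal{E}(\{m\}_{\mathcal{N}}),\, \mathcal{E}(\{l\}_{\mathcal{N}})\big) \tag{16}$$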

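Let {m}_Q = {l}_Q and {m}_{𝒩−Q} ≠ {l}_{𝒩−Q} for some Q ⊆ 𝒩, Q ≠ φ, where φ denotes the null set. Then,

$$P\big(\mathcal{E}(\{m\}_{\mathcal{N}}),\, \mathcal{E}(\{l\}_{\mathcal{N}})\big) = P\big(\mathcal{E}(\{m\}_Q)\big) \Big[ P\big(\mathcal{E}(\{m\}_{\mathcal{N}}) \,\big|\, \mathcal{E}(\{m\}_Q)\big) \Big]^2 \tag{17}$$

where ℰ({m}_Q) denotes the event that {x}ⁿ_{S_j∩Q}({m}_{S_j∩Q}) ∈ T_ε^n(P_{S_j∩Q}({X}_{S_j∩Q})) ∀j, as conditional on {x}ⁿ_Q({m}_Q), the sequences {x}ⁿ_{𝒩−Q}({m}_{𝒩−Q}) and {x}ⁿ_{𝒩−Q}({l}_{𝒩−Q}) are drawn independently from the same distribution. The above expression can be rewritten as:

$$P\big(\mathcal{E}(\{m\}_{\mathcal{N}}),\, \mathcal{E}(\{l\}_{\mathcal{N}})\big) = P\big(\mathcal{E}(\{m\}_Q)\big) \times \left[ \frac{P\big(\mathcal{E}(\{m\}_{\mathcal{N}})\big)}{P\big(\mathcal{E}(\{m\}_Q)\big)} \right]^2 \tag{18}$$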

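If Q = φ, we have P(ℰ({m}_𝒩), ℰ({l}_𝒩)) = (P(ℰ({m}_𝒩)))². Hence, we can write Var(χ) as:

$$\mathrm{Var}(\chi) = \sum_{Q \subseteq \mathcal{N},\, Q \neq \phi} 2^{\,n \sum_{i \in Q} R_i + 2n \sum_{i \in \mathcal{N} - Q} R_i} \times P\big(\mathcal{E}(\{m\}_Q)\big) \left[ \frac{P\big(\mathcal{E}(\{m\}_{\mathcal{N}})\big)}{P\big(\mathcal{E}(\{m\}_Q)\big)} \right]^2 \tag{19}$$

Note that the Q = φ term gets cancelled with the ‘(E(χ))²’ term in Var(χ) (see [2] for a similar argument).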

On substituting (15) and (19) in (14), and noting that for any Q ⊆ 𝒩, Q ≠ φ, we can write P(ℰ({m}_𝒩)) = P(ℰ({m}_Q)) · P(ℰ({m}_𝒩)) / P(ℰ({m}_Q)), we have:

$$P(\mathcal{E}) \le 4 \sum_{Q \subseteq \mathcal{N},\, Q \neq \phi} 2^{-n \sum_{i \in Q} R_i} \Big( P\big(\mathcal{E}(\{m\}_Q)\big) \Big)^{-1} \tag{20}$$

Next, invoking Lemma 1, we bound P(ℰ({m}_Q)) as:

$$P\big(\mathcal{E}(\{m\}_Q)\big) \ge 2^{-n\left(\sum_{i \in Q} H(X_i) - H^*(\{X\}_Q)\right) - n\delta(\epsilon)} \tag{21}$$

On substituting (21) in (20), it follows that P(ℰ) → 0 as n → ∞ if the R_i satisfy (10).
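The final substitution can be written out term by term. The sketch below only combines (20) and (21); the left-hand side is expressed as P(χ = 0), which equals the error probability being bounded.

```latex
\begin{align*}
P(\chi = 0)
 &\le 4 \sum_{Q \subseteq \mathcal{N},\, Q \neq \phi}
      2^{-n \sum_{i \in Q} R_i}\;
      2^{\,n\left(\sum_{i \in Q} H(X_i) - H^*(\{X\}_Q)\right) + n\delta(\epsilon)} \\
 &= 4 \sum_{Q \subseteq \mathcal{N},\, Q \neq \phi}
      2^{-n\left(\sum_{i \in Q} R_i - \sum_{i \in Q} H(X_i) + H^*(\{X\}_Q) - \delta(\epsilon)\right)} .
\end{align*}
```

The sum has 2^N − 1 terms, a number that does not grow with n, so the bound vanishes as n → ∞ once every exponent is negative, that is, once the sum of R_i over Q exceeds the sum of H(X_i) over Q minus H*({X}_Q) by more than δ(ε) for every nonempty Q. This is the role played by the slack term in (10).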

C. Strict Improvement

Theorem 2. (i) The region in Theorem 1 subsumes the region in (3), i.e.,

$$R_a \subseteq R^*_a \tag{22}$$

(ii) There exist scenarios for which the region in Theorem 1 is strictly larger than the region in (3), i.e.,

$$R^*_a \supset R_a \tag{23}$$

Figure 1. Pictorial representation of Theorem 1: The triangle denotes the simplex of all joint distributions over (X₁, X₂, . . . , X_N). The black star denotes the joint distribution representing the product of marginals (codebook generation). Each loop represents the set of all joint distributions satisfying the conditions imposed on {X}_J for some J. The intersection of all the loops (red region) represents the set of joint distributions satisfying all the conditions. The blue stars represent the joint distributions which maximize the functionals H(P({X}_J)) (equivalently, minimize the relative entropy with the product of marginals, as seen from Sanov’s theorem) subject to the conditions on {X}_J. Theorem 1 asserts that a separate joint distribution for each J can be chosen from the corresponding loop (blue stars), and hence all the functionals H(P({X}_J)) can be set to their respective maxima simultaneously.

Table I. Pairwise PMF of (X_i, X₄) ∀i ∈ {1, 2, 3}

(x_i, x₄)      (0, 0)   (0, 1)   (1, 0)   (1, 1)
P(x_i, x₄)     1/2      0        1/4      1/4

Proof: The first half of the Theorem follows directly because H*({X}_J) ≥ H({X}_J) ∀J for any joint distribution satisfying the given distributions within subsets.

To prove (ii), we provide an example for which R^*_a has points which are not part of R_a. Consider the following example of 4 binary random variables (X₁, X₂, X₃, X₄).

X₁, X₂ and X₃ are distributed bern(1/2) and X₄ is distributed bern(3/4), where bern(p) denotes a Bernoulli random variable with P(0) = p and P(1) = 1 − p. Let S₁, S₂, . . . , S₆ be all possible subsets of {1, 2, 3, 4} of cardinality 2. Let P_{S_j}({X}_{S_j}) be such that (X₁, X₂, X₃) are pairwise independent and the pairwise PMF of (X_i, X₄) ∀i ∈ {1, 2, 3} is given in Table I. Note that these pairwise densities are satisfied by at least one joint density, obtained by the following operations: X₃ = 1 ⊕ X₁ ⊕ X₂ and X₄ = X₁ • X₂, where X₁ and X₂ are independent bern(1/2) random variables and ‘⊕’ and ‘•’ denote the ‘bit-exor’ and ‘bit-and’ operations respectively. Taking the complement of X₁ ⊕ X₂ is what gives P(X₃ = 0, X₄ = 1) = 0, as Table I requires.
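The consistency claim can be checked by brute force with exact rational arithmetic. The sketch below builds the joint PMF with X₄ = X₁ • X₂ and compares two candidate rules for X₃, the exclusive-or and its complement, against the six pairwise requirements (pairwise independence of X₁, X₂, X₃, and Table I for each (X_i, X₄)). The function names are illustrative. Under this check the complement 1 ⊕ X₁ ⊕ X₂ satisfies all six pairwise PMFs, whereas the plain exclusive-or reproduces every pair except (X₃, X₄).

```python
from fractions import Fraction
from itertools import product

BITS = (0, 1)
TABLE_I = {(0, 0): Fraction(1, 2), (0, 1): Fraction(0),
           (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 4)}
UNIFORM_PAIR = {k: Fraction(1, 4) for k in product(BITS, repeat=2)}

def build_joint(x3_rule):
    """Joint PMF of (X1, X2, X3, X4): X1, X2 iid uniform bits, X4 = X1 AND X2."""
    joint = {}
    for x1, x2 in product(BITS, repeat=2):
        x = (x1, x2, x3_rule(x1, x2), x1 & x2)
        joint[x] = joint.get(x, Fraction(0)) + Fraction(1, 4)
    return joint

def pair_pmf(joint, i, j):
    """PMF of coordinates (i, j), with zero-probability pairs listed explicitly."""
    out = {k: Fraction(0) for k in product(BITS, repeat=2)}
    for x, p in joint.items():
        out[(x[i], x[j])] += p
    return out

def check(joint):
    """Map each pair (1-based indices) to whether its required PMF is met."""
    result = {}
    for i, j in ((0, 1), (0, 2), (1, 2)):
        result[(i + 1, j + 1)] = pair_pmf(joint, i, j) == UNIFORM_PAIR
    for i in range(3):
        result[(i + 1, 4)] = pair_pmf(joint, i, 3) == TABLE_I
    return result

xor_check = check(build_joint(lambda a, b: a ^ b))        # X3 = X1 xor X2
xnor_check = check(build_joint(lambda a, b: 1 ^ a ^ b))   # X3 = complement of xor

print(xor_check)
print(xnor_check)
```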

Observe that maximizing the entropy over (X₁, X₂, X₃) subject to their respective pairwise densities makes them mutually independent. However, there exists no joint distribution over (X₁, X₂, X₃, X₄) satisfying all the pairwise conditions which makes (X₁, X₂, X₃) mutually independent. This intuition is in fact sufficient to see that R^*_a ⊃ R_a. However, to be more rigorous, we first rewrite the achievable region R^*_a for this example as:

$$\begin{aligned}
R_i + R_4 &\ge H_b\left(\tfrac{1}{4}\right) - \tfrac{1}{2} H_b\left(\tfrac{1}{2}\right) \\
R_i + R_j + R_4 &\ge 2 + H_b\left(\tfrac{1}{4}\right) - H^*(X_i, X_j, X_4) \\
\sum_{i=1}^{4} R_i &\ge 3 + H_b\left(\tfrac{1}{4}\right) - H^*(\{X\}_{1,2,3,4})
\end{aligned} \tag{24}$$

∀i, j ∈ {1, 2, 3}, where H_b(·) denotes the binary entropy function and {X}_{1,2,3,4} = {X₁, X₂, X₃, X₄}.
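The first bound in (24) depends only on Table I and the marginals, so it can be evaluated directly as H(X_i) + H(X₄) − H(X_i, X₄). The snippet below is a small numerical check; the variable names are illustrative.

```python
from math import log2

def binary_entropy(p):
    """H_b(p) in bits, with H_b(0) = H_b(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def entropy(probs):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Table I: P(x_i, x_4) for (0,0), (0,1), (1,0), (1,1).
table_i = [0.5, 0.0, 0.25, 0.25]

h_xi = binary_entropy(0.5)    # H(X_i) = 1 bit
h_x4 = binary_entropy(0.25)   # H(X_4) = H_b(1/4)
pair_bound = h_xi + h_x4 - entropy(table_i)   # H(X_i) + H(X_4) - H(X_i, X_4)

print(round(pair_bound, 4))   # 0.3113
```

The value agrees with the closed form H_b(1/4) − (1/2) H_b(1/2) in the first line of (24), since H(X_i, X₄) = 3/2 bits for the PMF in Table I.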

We consider the following corner point of (24): A = (0, 0, 0, 3 + H_b(1/4) − H*({X}_{1,2,3,4})). It is sufficient for us to prove that A ∉ R_a. Note that, if R₁ = R₂ = R₃ = 0, (X₁, X₂, X₃) must be mutually independent (which in turn satisfies the pairwise independence conditions). To prove that A ∉ R_a, we will show that there cannot exist any joint PMF over (X₁, X₂, X₃, X₄) satisfying all the pairwise distributions and for which (X₁, X₂, X₃) are mutually independent. Let us suppose that such a joint PMF exists. Denote the conditional PMF P(X₄ = 0 | x₁, x₂, x₃) = α_{x₁x₂x₃}, x₁, x₂, x₃ ∈ {0, 1}. As (X₁, X₂, X₃) are assumed to be mutually independent, the joint distribution satisfies P_{X₁,X₂,X₃,X₄}(x₁, x₂, x₃, 1) = (1 − α_{x₁x₂x₃})/8. The pairwise distribution of (X_i, X₄) (from Table I) is such that P_{X_i,X₄}(0, 1) = 0 ∀i ∈ {1, 2, 3}. This leads to the conclusion that α_{x₁x₂x₃} = 1 if any one of x₁, x₂, x₃ is 0. We are only left with finding α₁₁₁. Further, we want P_{X₁,X₄}(1, 1) = 1/4, i.e.,

$$\sum_{x_2, x_3} P_{X_1,X_2,X_3,X_4}(1, x_2, x_3, 1) = \sum_{x_2, x_3} \frac{1 - \alpha_{1 x_2 x_3}}{8} = \frac{1}{4}.$$

On substituting, we have 1 − α₁₁₁ = 2, i.e., α₁₁₁ = −1. As the α_{x₁x₂x₃} are conditional probabilities, this leads to a contradiction and proves that there cannot exist a joint distribution with (X₁, X₂, X₃) being mutually independent. Therefore R^*_a ⊃ R_a, proving the second half of the Theorem.
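The arithmetic of the contradiction can be replayed exactly with rational numbers. The sketch below fixes α = 1 wherever some coordinate is 0 and then solves the remaining constraint P(X₁ = 1, X₄ = 1) = 1/4 for α₁₁₁; the variable names are illustrative. Solving gives 1 − α₁₁₁ = 2, that is α₁₁₁ = −1, which is not a valid probability.

```python
from fractions import Fraction
from itertools import product

# alpha[(x1, x2, x3)] = P(X4 = 0 | x1, x2, x3); None marks the unknown entry.
alpha = {}
for x in product((0, 1), repeat=3):
    # P(X_i = 0, X4 = 1) = 0 forces alpha = 1 whenever some coordinate is 0.
    alpha[x] = Fraction(1) if 0 in x else None

# Contribution of the known entries to
# P(X1 = 1, X4 = 1) = sum over (x2, x3) of (1 - alpha[1, x2, x3]) / 8.
known = sum((1 - alpha[(1, x2, x3)]) / 8
            for x2, x3 in product((0, 1), repeat=2)
            if alpha[(1, x2, x3)] is not None)

# Solve (1 - alpha_111) / 8 + known = 1/4 for alpha_111.
alpha_111 = 1 - 8 * (Fraction(1, 4) - known)
is_valid_probability = 0 <= alpha_111 <= 1

print(alpha_111, is_valid_probability)
```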

III. APPLICATION TO MULTI-USER GRAY-WYNER NETWORK

We finally apply the results in Theorem 1 to obtain a new achievable region for the multi-user Gray-Wyner network. To illustrate the applicability and to maintain simplicity in notation, we only consider the 3-user lossless Gray-Wyner network here. However, the approach can be extended directly to the general L-user setting and to incorporate distortions. Note that the formal definition of an achievable rate region closely resembles that in [8], with the obvious generalization to the 3-user setting as shown in Fig. 2. We omit the details here due to space constraints. We further note that the rate region is in general 7-dimensional, with the following rates: (R₁, R₂, R₃, R₁₂, R₁₃, R₂₃, R₁₂₃).

Figure 2. 3-user Gray-Wyner network: There is a unique branch from the encoder to every subset of the decoders.
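The seven rates correspond to the nonempty subsets of the three sinks, one branch per subset. A minimal enumeration sketch is given below; the function names are illustrative, and the naming convention R followed by the sink indices mirrors the rate labels used in the text.

```python
from itertools import combinations

def branches(num_sinks):
    """All nonempty subsets of sinks {1, ..., K}; one branch rate per subset."""
    sinks = range(1, num_sinks + 1)
    return [subset
            for size in range(1, num_sinks + 1)
            for subset in combinations(sinks, size)]

def rate_name(subset):
    """Label of the branch sent to the given subset of sinks, e.g. (1, 2) -> 'R12'."""
    return "R" + "".join(str(s) for s in subset)

def branches_reaching(sink, num_sinks):
    """Branches whose codewords are received by the given sink."""
    return [b for b in branches(num_sinks) if sink in b]

names = [rate_name(b) for b in branches(3)]
print(names)                     # ['R1', 'R2', 'R3', 'R12', 'R13', 'R23', 'R123']
print(branches_reaching(1, 3))   # [(1,), (1, 2), (1, 3), (1, 2, 3)]
```

For K sinks the list has 2^K − 1 entries, and each sink receives 2^(K−1) of them; only the codewords on those branches need to be jointly typical with that sink’s source sequence.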

Corollary 1. Let (X₁, X₂, X₃) be the random variables with joint distribution P(X₁, X₂, X₃) observed by the encoder. Let (U₁₂₃, U₁₂, U₁₃, U₂₃) be random variables jointly distributed with (X₁, X₂, X₃) with conditional distribution P(U₁₂₃, U₁₂, U₁₃, U₂₃ | X₁, X₂, X₃) and taking values over arbitrary finite alphabets. Define subsets S₁ = {U₁₂₃, U₁₂, U₁₃}, S₂ = {U₁₂₃, U₁₂, U₂₃}, S₃ = {U₁₂₃, U₁₃, U₂₃}. The rate region for the 3-user lossless Gray-Wyner network contains all the rates such that ∀(i, j, k) ∈ {1, 2, 3} and i < j, i < k,

$$\begin{aligned}
R_{123} &\ge H(U_{123}) - H^*(U_{123} \mid X) \\
R_{123} + R_{ij} &\ge H(U_{123}, U_{ij}) - H^*(U_{123}, U_{ij} \mid X) \\
R_{123} + R_{ij} + R_{ik} &\ge H(U_{123}) - H^*(\{U\}_{123,ij,ik} \mid X) + H(U_{ij} \mid U_{123}) + H(U_{ik} \mid U_{123}) \\
R_{123} + \sum_{i<j} R_{ij} &\ge H(U_{123}) + \sum_{i<j} H(U_{ij} \mid U_{123}) - H^*(U_{123}, U_{12}, U_{23}, U_{13} \mid X) \\
R_i &\ge H(X_i \mid \{U\}_{J : i \in J})
\end{aligned} \tag{25}$$

where X = {X₁, X₂, X₃} and H*({U}_J | X) is given by:

$$\max_{\tilde{P}(\{U\}_J, \{X\}_{1,2,3})} H\big(\tilde{P}(\{U\}_J) \,\big|\, X\big) \tag{26}$$

where P̃({U}_J | X) satisfies:

$$\tilde{P}(\{U\}_{J \cap S_j}, X_j) = P(\{U\}_{J \cap S_j}, X_j) \quad \forall j \tag{27}$$

The closure of the achievable rates over all conditional distributions P(U₁₂₃, U₁₂, U₁₃, U₂₃ | X₁, X₂, X₃) is an achievable region for the 3-user lossless Gray-Wyner network.

Proof: A codebook for U₁₂₃ consisting of 2^{nR₁₂₃} codewords is generated according to the marginal P(U₁₂₃). Conditioned on each codeword of U₁₂₃, independent codebooks are generated for U₁₂, U₁₃ and U₂₃ at rates R₁₂, R₁₃ and R₂₃ according to their respective conditional distributions P(U₁₂|U₁₂₃), P(U₁₃|U₁₂₃) and P(U₂₃|U₁₂₃). If the rates satisfy (25), then, with probability approaching 1, there exists a codeword tuple, one from each codebook, denoted by (uⁿ₁₂₃, uⁿ₁₂, uⁿ₁₃, uⁿ₂₃), such that the following subsets of sequences are jointly typical according to their respective subset joint densities: (xⁿ₁, uⁿ₁₂₃, uⁿ₁₂, uⁿ₁₃), (xⁿ₂, uⁿ₁₂₃, uⁿ₁₂, uⁿ₂₃) and (xⁿ₃, uⁿ₁₂₃, uⁿ₁₃, uⁿ₂₃). The proof follows rather directly from Lemmas 1 and 2 and Theorem 1, as U₁₂₃ is part of S₁, S₂ and S₃. The last constraint in (25) denotes the minimum rate of the bin indices required to achieve lossless reconstruction at each sink, given that all the codewords received at any sink are jointly typical.

IV. DISCUSSION

We note that the conditions in (25) ensure joint typicality of the source sequence X_iⁿ only with the codewords which reach sink i. However, an alternate achievable region (which is subsumed in the above region) can be derived using the results for the general L-channel MD problem in [2], which extends the principles underlying (3) to the multiple descriptions framework. Due to the inherent structure of the MD problem, joint typicality of all the transmitted codewords is necessary. However, imposing such a constraint limits the performance of systems that do not explicitly require such conditions.

Note that, although we have not proved formally that the new region for the multi-user Gray-Wyner network is strictly larger than that derivable from the results in [2], Theorem 2 suggests that for general sources there exist points which are strictly outside the latter. It is important to note that the results we derived may not always lead to a strictly larger achievable region. A classic example of this is the 2-user Gray-Wyner network [8], for which the complete rate-distortion region can be achieved even if joint typicality of all the codewords is imposed. This is because, in the 2-user scenario, there is no inherent conflict between the maximum entropy distributions of different subsets of random variables.

However, in the L-user setting (as seen in Theorem 2), such a conflict arises, and maintaining joint typicality only within subsets plays a paramount role in deriving improved achievable regions.


REFERENCES

[1] A. El-Gamal and Y. H. Kim, “Lecture notes on network information theory,” pp. 23-61 to 23-67, http://arxiv.org/abs/1001.3404, 2010.

[2] R. Venkataramani, G. Kramer, and V. K. Goyal, “Multiple description coding with many channels,” IEEE Trans. Information Theory, vol. 49, no. 9, pp. 2106-2114, Sept. 2003.

[3] A. El Gamal and T. M. Cover, “Achievable rates for multiple descriptions,” IEEE Trans. Information Theory, vol. IT-28, pp. 851-857, Nov. 1982.

[4] R. Puri, S. S. Pradhan, and K. Ramchandran, “n-channel symmetric multiple descriptions-part II: an achievable rate-distortion region,” IEEE Trans. Information Theory, vol. 51, pp. 1377-1392, Apr. 2005.

[5] K. Viswanatha, E. Akyol, and K. Rose, “Combinatorial message sharing for a refined multiple descriptions achievable region,” in Proc. IEEE Symp. Information Theory (ISIT), Aug. 2011.

[6] J. Nayak, S. Ramaswamy, and K. Rose, “Correlated source coding for fusion storage and selective retrieval,” in Proc. IEEE Symp. Information Theory (ISIT), Sept. 2005.

[7] K. Viswanatha, E. Akyol, and K. Rose, “An achievable rate region for distributed source coding and dispersive information routing,” in Proc. IEEE Symp. Information Theory (ISIT), Aug. 2011.

[8] R. Gray and A. Wyner, “Source coding for a simple network,” Bell Systems Technical Report, Dec. 1974.

[9] R. Tandon, L. Sankar, and H. V. Poor, “Multi-user privacy: The Gray-Wyner system and generalized common information,” in Proc. IEEE Symp. Information Theory (ISIT), Aug. 2011.

[10] T. Cover and J. Thomas, “Elements of Information Theory,” Wiley publications, second edition, 2006.

[11] E. Perron, S. Diggavi, and E. Telatar, “On the role of encoder side-information in source coding for multiple decoders,” in Proc. IEEE International Symposium on Information Theory (ISIT), pp. 331-335, 9-14 Jul. 2006.