We now generalize the above derivation to trinary noisy channels; the generalization to the d-nary channel then follows. The key steps are similar to those for the binary case. The first step is to use the same method to prove the following theorem:
Theorem II: The ratio I(X;Z)/I(X;Y) reaches its maximum only when all three conditional probabilities Pr(Y |X = i) with i = 0, 1, 2 are almost indistinguishable.
The strategy for proving this theorem is to treat the pair (Pr(Y = 0|X = i), Pr(Y = 1|X = i)) for each i as a point inside the unit square [0, 1] × [0, 1] (note that Pr(Y = 2|X = i) is not independent of this pair, since it is fixed by normalization). The three points Pr(Y |X = i) for i = 0, 1, 2 then form a triangle, and we can follow the proof of Lemma I in the previous subsection for the trinary case. First, we assume that the maximal value of r is attained at the three vertices of some triangle. We then perform an affine transformation that rescales this maximal value to 1 and makes f = g (more specifically, H(Y |X = i) = H(Z|X = i)) at the three vertices of this triangle. It then immediately follows that there exists some point inside the triangle at which f = g.
We can use this point, together with any two vertices of the original triangle, to construct a smaller triangle, and show that the ratio r for this new triangle is greater than that for the original, larger triangle. Repeating this procedure proves the above theorem. It is also clear that the theorem generalizes to multi-nary channels by replacing the triangle with a convex body in a higher-dimensional space.
Here, we should point out that one can always reduce the convex body to a line segment, and hence to the situation of the binary case. That is, we set all the conditional probabilities except one to be equal, and then study the closeness condition on the remaining two distinct conditional probabilities for the maximal ratio I(X;Z)/I(X;Y). In the following, we always restrict to this situation.
We then proceed to the second step, as for the binary channel: we use Theorem II to reduce the problem of maximizing I(X;Z)/I(X;Y) to that of maximizing a ratio of relative entropies. We rewrite the ratio of the two mutual informations as follows,
\[ \frac{I(X;Z)}{I(X;Y)} = \frac{\sum_{i=0}^{2}\Pr(X=i)\,D\big(\Pr(Z|X=i)\,\|\,\Pr(Z)\big)}{\sum_{i=0}^{2}\Pr(X=i)\,D\big(\Pr(Y|X=i)\,\|\,\Pr(Y)\big)}. \tag{A.9} \]
To simplify the expression for further manipulations, we denote the average probability of Y by $\vec{p} = \sum_{i=0}^{2}\Pr(X=i)\Pr(Y|X=i)$, and parameterize the probabilities as $\Pr(Y|X=0) = \vec{p} + \vec{\epsilon}_0$ and $\Pr(Y|X=1) = \vec{p} + \vec{\epsilon}_1$. Thus, the probability $\Pr(Y|X=2)$ is forced to be $\vec{p} - \frac{\Pr(X=0)}{\Pr(X=2)}\vec{\epsilon}_0 - \frac{\Pr(X=1)}{\Pr(X=2)}\vec{\epsilon}_1$. The parameter vectors $\vec{\epsilon}_0$ and $\vec{\epsilon}_1$ should be sufficiently small, as required by Theorem II, for the ratio I(X;Z)/I(X;Y) to be maximal. Furthermore, we reduce the triangle to the line-segment case by assuming $\vec{\epsilon}_0 = \vec{\epsilon}_1$, i.e., Pr(Y |X = 0) = Pr(Y |X = 1).
Note again the ratio now does not depend on Pr(X).
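The rewriting of mutual information as an average of relative entropies can be checked numerically. The following is a minimal sketch, assuming illustrative (hypothetical) values for Pr(X), Pr(Y|X) and the channel between Y and Z; none of these numbers come from the text. It computes each mutual information directly from the joint distribution and again as $\sum_i \Pr(X=i)\,D(\Pr(Y|X=i)\,\|\,\Pr(Y))$, and compares the two ratios.

```python
import math

def kl_bits(q, p):
    """Relative entropy D(q || p) in bits; terms with q_i = 0 contribute zero."""
    return sum(qi * math.log2(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def push_forward(vec, rows):
    """Row vector times matrix: result_j = sum_i vec_i * rows[i][j]."""
    return [sum(vec[i] * rows[i][j] for i in range(len(vec)))
            for j in range(len(rows[0]))]

def mutual_information(px, cond):
    """I(X;Y) in bits, computed directly from the joint distribution."""
    py = push_forward(px, cond)
    return sum(px[i] * cond[i][j] * math.log2(cond[i][j] / py[j])
               for i in range(len(px)) for j in range(len(py))
               if px[i] * cond[i][j] > 0)

def averaged_kl(px, cond):
    """sum_i Pr(X=i) * D(Pr(Y|X=i) || Pr(Y)) in bits."""
    py = push_forward(px, cond)
    return sum(pxi * kl_bits(row, py) for pxi, row in zip(px, cond))

# Hypothetical numbers chosen only for illustration.
px = [0.5, 0.3, 0.2]
cond_y = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7]]       # Pr(Y|X)
channel_a = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.25, 0.15, 0.6]]  # Pr(Z|Y)
cond_z = [push_forward(row, channel_a) for row in cond_y]          # Pr(Z|X)

ratio_direct = mutual_information(px, cond_z) / mutual_information(px, cond_y)
ratio_kl = averaged_kl(px, cond_z) / averaged_kl(px, cond_y)
print(ratio_direct, ratio_kl)
```

The two printed ratios should agree up to rounding, and both stay below 1, as the data-processing inequality requires for X → Y → Z.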
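Before expanding (A.9) in powers of $\vec{\epsilon}_0$, we need to specify $\vec{p} = (\Pr(Y=0), \Pr(Y=1), \Pr(Y=2))$ and $\vec{\epsilon}_0 = (v_0, v_1, v_2)$. Note that $v_0 + v_1 + v_2 = 0$. As for the binary channel, we expand the relative entropy in powers of $v_i/\Pr(Y=i)$. The leading term of the expansion for the denominator of (A.9) is found to be
\[ D(\vec{p} + \vec{\epsilon}_0 \,\|\, \vec{p}) = \frac{1}{2\ln 2}\left(\frac{v_0^2}{\Pr(Y=0)} + \frac{v_1^2}{\Pr(Y=1)} + \frac{v_2^2}{\Pr(Y=2)}\right). \]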
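To find the expansion of the numerator, we need to specify the channel A between Y and Z. The generic trinary channel is given by
\[ A = \Pr(Z|Y) = \begin{pmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{pmatrix}, \]
where the rows are labeled by Y, the columns by Z, and each row sums to one. The leading term of the expansion for the numerator is then found to be
\[ D\big((\vec{p} + \vec{\epsilon}_0)\cdot A \,\|\, \vec{p}\cdot A\big) = \frac{1}{2\ln 2}\left(\frac{(v_0 a_1 + v_1 b_1 + v_2 c_1)^2}{p(Z=0)} + \frac{(v_0 a_2 + v_1 b_2 + v_2 c_2)^2}{p(Z=1)} + \frac{(v_0 a_3 + v_1 b_3 + v_2 c_3)^2}{p(Z=2)}\right). \tag{A.12} \]
For simplicity, we only consider the symmetric trinary channel
\[ A = \Pr(Z|Y) = \begin{pmatrix} \xi & \frac{1-\xi}{2} & \frac{1-\xi}{2} \\ \frac{1-\xi}{2} & \xi & \frac{1-\xi}{2} \\ \frac{1-\xi}{2} & \frac{1-\xi}{2} & \xi \end{pmatrix}. \]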
For a symmetric channel, the maximal mutual information is achieved by uniform input probabilities. We therefore assume uniform Pr(Y ) and Pr(Z), so that (A.9) depends only on the variable ξ. We then obtain
\[ \frac{I(X;Z)}{I(X;Y)} \le \left(\frac{3\xi - 1}{2}\right)^2. \tag{A.15} \]
This is the generalization of (A.3) for binary channel to the trinary one.
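The leading-order expansions used in this derivation can be compared with the exact relative entropies. The sketch below assumes a hypothetical generic trinary channel, a hypothetical distribution for Y, and a small zero-sum perturbation (v0, v1, v2); all of these values are illustrative. It evaluates the quadratic form $\frac{1}{2\ln 2}\sum_i v_i^2/p_i$ before and after the channel and compares each with the exact value of D.

```python
import math

def kl_bits(q, p):
    """Exact relative entropy D(q || p) in bits."""
    return sum(qi * math.log2(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def apply_channel(vec, channel):
    """Row vector times channel matrix: (vec . A)_j = sum_i vec_i * A[i][j]."""
    return [sum(vec[i] * channel[i][j] for i in range(len(vec)))
            for j in range(len(channel[0]))]

def quadratic_term(v, p):
    """Leading term (1 / (2 ln 2)) * sum_i v_i^2 / p_i of D(p + v || p)."""
    return sum(vi * vi / pi for vi, pi in zip(v, p)) / (2.0 * math.log(2.0))

# Hypothetical inputs: Pr(Y), a zero-sum perturbation, and a generic channel A.
p = [0.5, 0.3, 0.2]
v = [1e-4, -3e-5, -7e-5]
channel_a = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.25, 0.15, 0.6]]

q = [pi + vi for pi, vi in zip(p, v)]

exact_y = kl_bits(q, p)                       # D(p + eps || p)
approx_y = quadratic_term(v, p)               # its leading term

p_z = apply_channel(p, channel_a)             # p . A
v_z = apply_channel(v, channel_a)             # eps . A, entries as in (A.12)
exact_z = kl_bits(apply_channel(q, channel_a), p_z)
approx_z = quadratic_term(v_z, p_z)

print(exact_y / approx_y, exact_z / approx_z)
```

Both printed quotients should be close to 1 for a perturbation this small; the agreement degrades as the perturbation grows, which is why the expansion is only used in the regime singled out by Theorem II.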
Similarly, we can generalize the above derivation to d-nary channels. If the channel between Y and Z is a d-nary symmetric channel specified by Pr(Z = i|Y = i) = ξ and Pr(Z = s ≠ i|Y = i) = (1 − ξ)/(d − 1) with i ∈ {0, 1, ..., d − 1}, then the bound on the ratio I(X;Z)/I(X;Y) is given by (A.4).
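For the d-nary symmetric channel, the small-perturbation ratio of relative entropies can be evaluated numerically for several d. This is a sketch under stated assumptions: the helper names, the uniform distribution for Y, and the particular zero-sum perturbation are hypothetical, and the comparison value $((d\xi-1)/(d-1))^2$ is what one gets by repeating the trinary computation with this channel matrix (it reduces to the factor in (A.15) for d = 3); whether it coincides verbatim with (A.4) is assumed, not checked here.

```python
import math

def kl_bits(q, p):
    """Relative entropy D(q || p) in bits."""
    return sum(qi * math.log2(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def symmetric_channel(d, xi):
    """Pr(Z=i|Y=i) = xi, Pr(Z=s|Y=i) = (1 - xi)/(d - 1) for s != i."""
    off = (1.0 - xi) / (d - 1)
    return [[xi if i == j else off for j in range(d)] for i in range(d)]

def apply_channel(vec, channel):
    """Row vector times channel matrix."""
    return [sum(vec[i] * channel[i][j] for i in range(len(vec)))
            for j in range(len(channel[0]))]

def small_perturbation_ratio(d, xi, scale=1e-4):
    """D((p+eps).A || p.A) / D(p+eps || p) for uniform p and a small zero-sum eps."""
    p = [1.0 / d] * d
    eps = [scale * (i - (d - 1) / 2.0) for i in range(d)]  # entries sum to zero
    q = [pi + ei for pi, ei in zip(p, eps)]
    a = symmetric_channel(d, xi)
    return kl_bits(apply_channel(q, a), apply_channel(p, a)) / kl_bits(q, p)

def contraction_factor(d, xi):
    """((d*xi - 1)/(d - 1))^2, the leading-order ratio for this channel."""
    return ((d * xi - 1.0) / (d - 1)) ** 2

for d in (2, 3, 5):
    print(d, small_perturbation_ratio(d, 0.8), contraction_factor(d, 0.8))
```

The check only probes the leading order in the perturbation around a uniform distribution; it does not test the ratio for distributions far from that regime.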
Appendix B
The concavity of mutual information
In this appendix, we prove that the mutual information I is not a concave function of the joint probabilities $\Pr(B_{\vec{y}} - A_{\vec{x}}|\vec{x}, \vec{y})$ and the input marginal probabilities $\Pr(a_i)$. Thus, we cannot formulate the problem of maximizing the mutual information I as a convex optimization problem.
First, we re-express the mutual information I in terms of $\Pr(B_{\vec{y}} - A_{\vec{x}}|\vec{x}, \vec{y})$ and $\Pr(a_i)$. If the mutual information were a concave function of these probabilities, its second-order partial derivative with respect to each probability would be non-positive. Here, we find a violation when calculating $\partial^2 I/\partial\big(\Pr(B_{\vec{y}} - A_{\vec{x}} = 0|\vec{x} = 0, \vec{y} = 0)\big)^2$. In the following paragraphs, we denote the joint probability $\Pr(B_{\vec{y}} - A_{\vec{x}} = 0|\vec{x} = 0, \vec{y} = 0)$ by V.
The mutual information can be rewritten as I =
We can express Pr(a0 = j, β = n|b = 0) as a combination of the joint probabilities $\Pr(B_{\vec{y}} - A_{\vec{x}}|\vec{x}, \vec{y})$ and the input marginal probabilities $\Pr(a_i)$ to obtain $\partial \Pr(a_0 = j, \beta = n|b = 0)/\partial V$.
Since the joint probabilities $\Pr(B_{\vec{y}} - A_{\vec{x}}|\vec{x}, \vec{y})$ are subject to the normalization conditions of total probability, if n − j ≠ (d − 1), where $\vec{x}$ in the above functions is given by the encoding of the RAC protocol, namely, $\vec{x} := (x_1, \cdots, x_{k-1})$ with $x_i = a_i - a_0$.
Now, we can calculate the derivatives. The partial derivative
∂ Pr(a0 = j, β = n|b = 0)
Substituting the above result into (B.3), for fixed j we find that $\sum_{n=0}^{d-1} \partial \Pr(a_0 = j, \beta = n|b = 0)/\partial V = 0$; thus the second term of (B.3) vanishes.
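We can then calculate the second-order derivative $\partial^2 I_{b=0}/\partial V^2$.
For higher d and k, once the input marginal probabilities $\Pr(a_i)$ are uniform, we obtain
\[ \frac{\partial^2 I}{\partial V^2} = \frac{\partial^2 I_{b=0}}{\partial V^2} = \frac{1}{\ln 2}\sum_{n=0}^{d-1}\frac{1}{d^{2k}}\left(\frac{1}{\Pr(a_0 = n, \beta = n|b = 0)} + \frac{1}{\Pr(a_0 = n, \beta = n - (d-1)|b = 0)}\right) > 0. \tag{B.10} \]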
Since this second derivative is strictly positive, the mutual information I is not a concave function of the joint probabilities $\Pr(B_{\vec{y}} - A_{\vec{x}}|\vec{x}, \vec{y})$ and the input marginal probabilities $\Pr(a_i)$.
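A much simpler setting shows the same sign of the second derivative. The sketch below is a toy illustration, not the RAC setting of this appendix: it assumes a uniform input bit sent through a binary symmetric channel with flip probability t (a hypothetical stand-in for the single parameter V), for which I(t) = 1 − h(t) and $\partial^2 I/\partial t^2 = \frac{1}{\ln 2}\left(\frac{1}{t} + \frac{1}{1-t}\right)$, a sum of reciprocal probabilities of the same shape as (B.10). A central finite difference is compared with this closed form.

```python
import math

def binary_entropy(t):
    """Binary entropy h(t) in bits for 0 < t < 1."""
    return -t * math.log2(t) - (1.0 - t) * math.log2(1.0 - t)

def mutual_information_bsc(t):
    """I(X;Y) in bits for a uniform input bit and flip probability t."""
    return 1.0 - binary_entropy(t)

def second_derivative(f, t, h=1e-4):
    """Central finite-difference estimate of f''(t)."""
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / (h * h)

t = 0.2  # hypothetical operating point
numeric = second_derivative(mutual_information_bsc, t)
analytic = (1.0 / t + 1.0 / (1.0 - t)) / math.log(2.0)
print(numeric, analytic)
```

Both values are positive, so in this toy case the mutual information is convex, not concave, in the channel parameter, which is the same obstruction to a convex-programming formulation that the appendix identifies.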
Appendix C
Semidefinite programming
In this appendix, we briefly introduce semidefinite programming (SDP) [61]. SDP is the problem of optimizing a linear function subject to constraints on a positive semidefinite matrix X, i.e., a matrix satisfying $v^{\dagger} X v \ge 0$ for all $v \in \mathbb{C}^n$, which is denoted by $X \succeq 0$. It can be formulated as the following standard primal problem. Given the n × n symmetric matrices C and $D_q$ with q = 1, · · · , m, we seek an n × n positive semidefinite matrix $X \succeq 0$ that achieves the following:
\[ \text{minimize} \quad \mathrm{Trace}(C^{T} X) \tag{C.1a} \]
\[ \text{subject to} \quad \mathrm{Trace}(D_q^{T} X) = b_q, \quad q = 1, \cdots, m. \tag{C.1b} \]
Corresponding to the above primal problem, we can obtain a dual problem via a Lagrange approach [64]. The Lagrange duality can be understood as follows. If the primal problem is
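\[ \text{minimize} \quad f_0(x) \tag{C.2a} \]
\[ \text{s.t.} \quad f_q(x) \le 0, \quad q \in \{1, \ldots, m\}, \tag{C.2b} \]
\[ \phantom{\text{s.t.}} \quad h_q(x) = 0, \quad q \in \{1, \ldots, p\}, \tag{C.2c} \]
the Lagrange function can be defined as
\[ L(x, \lambda, \nu) = f_0(x) + \sum_{q=1}^{m} \lambda_q f_q(x) + \sum_{q=1}^{p} \nu_q h_q(x), \tag{C.3} \]
where $\lambda_1, \ldots, \lambda_m$ and $\nu_1, \ldots, \nu_p$ are Lagrange multipliers. For any x satisfying the constraints (C.2b) and (C.2c) and any $\lambda_1, \ldots, \lambda_m \ge 0$, each term $\lambda_q f_q(x)$ is non-positive and each term $\nu_q h_q(x)$ vanishes, so that $L(x, \lambda, \nu) \le f_0(x)$. The minimum of $f_0$ over the feasible set is therefore bounded from below:
\[ \inf_{x} f_0(x) \ge \inf_{x} L(x, \lambda, \nu). \]
The Lagrange dual function is then defined as
\[ g(\lambda, \nu) = \inf_{x} L(x, \lambda, \nu). \]
It satisfies g(λ, ν) ≤ p, where p is the optimal value of $f_0(x)$ in the primal problem, for $\lambda_1, \ldots, \lambda_m \ge 0$ and arbitrary $\nu_1, \ldots, \nu_p$. The dual problem is defined as
\[ \text{maximize} \quad g(\lambda, \nu) \tag{C.4a} \]
\[ \text{s.t.} \quad \lambda_q \ge 0, \quad q \in \{1, \ldots, m\}. \tag{C.4b} \]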
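The bound g(λ, ν) ≤ p can be seen on a one-variable example. The following sketch uses a hypothetical toy problem, not one from the text: minimize $f_0(x) = x^2$ subject to $f_1(x) = 1 - x \le 0$, whose optimal value is p = 1 at x = 1. The Lagrange function $x^2 + \lambda(1 - x)$ is minimized over x at x = λ/2, which gives the dual function in closed form.

```python
def dual_function(lam):
    """g(lam) = inf_x [x^2 + lam * (1 - x)]; the infimum is attained at x = lam / 2."""
    x_min = lam / 2.0
    return x_min ** 2 + lam * (1.0 - x_min)

# Optimal value of the toy primal problem: minimize x^2 subject to x >= 1.
primal_optimum = 1.0

# Evaluate the dual function on a grid of non-negative multipliers.
multipliers = [0.1 * i for i in range(61)]
dual_values = [dual_function(lam) for lam in multipliers]
best_dual = max(dual_values)

print(best_dual, primal_optimum)
```

Every dual value stays below the primal optimum, and the largest one (at λ = 2) reaches it, so the duality gap vanishes for this convex toy problem.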
We can use the same method to define the dual problem for SDP. From the primal problem of SDP (C.1), we can write down the dual function by using the minimax inequality [65].
inf_{X ⪰ 0} Trace(C^T X) = inf_{X ⪰ 0} Trace(C^T X) + Σ^m
The optimal solution of the dual function is bounded under some vector y.
sup
If the feasible solutions of the primal problem and the dual problem attain their minimal and maximal values, denoted by $p_0$ and $d_0$ respectively, then $p_0 \ge d_0$, and the difference $p_0 - d_0$ is called the duality gap. This implies that the optimal value of the primal problem is bounded from below by that of the dual problem. It follows that both the primal and the dual problems attain their optimal solutions when the duality gap vanishes, i.e., $d_0 = p_0$.
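A 2 × 2 instance makes the relation between the primal and dual values concrete. The sketch below assumes a hypothetical instance of (C.1) with a single constraint Trace(X) = 1 (that is, $D_1$ is the identity and $b_1 = 1$) and an illustrative matrix C. For this instance the standard Lagrange dual is to maximize y subject to $C - yI \succeq 0$, whose optimum is the smallest eigenvalue of C; the code checks that randomly sampled feasible X never fall below that value and that a rank-one X attains it.

```python
import math
import random

def min_eigenpair_2x2(c):
    """Smallest eigenvalue and a unit eigenvector of a real symmetric 2x2
    matrix whose off-diagonal entry is nonzero."""
    a, b, d = c[0][0], c[0][1], c[1][1]
    lam = (a + d) / 2.0 - math.hypot((a - d) / 2.0, b)
    vx, vy = b, lam - a              # solves (a - lam) * vx + b * vy = 0
    norm = math.hypot(vx, vy)
    return lam, (vx / norm, vy / norm)

def projector(v):
    """Rank-one matrix v v^T."""
    return [[v[i] * v[j] for j in range(2)] for i in range(2)]

def trace_product(c, x):
    """Trace(C^T X) = sum_ij C_ij X_ij."""
    return sum(c[i][j] * x[i][j] for i in range(2) for j in range(2))

def random_feasible(rng):
    """Random real PSD matrix with unit trace: a mixture of two projectors."""
    w = rng.random()
    mats = []
    for _ in range(2):
        theta = rng.uniform(0.0, math.pi)
        mats.append(projector((math.cos(theta), math.sin(theta))))
    return [[w * mats[0][i][j] + (1.0 - w) * mats[1][i][j] for j in range(2)]
            for i in range(2)]

c_matrix = [[2.0, 1.0], [1.0, 3.0]]          # hypothetical cost matrix
dual_optimum, eigvec = min_eigenpair_2x2(c_matrix)
primal_optimum = trace_product(c_matrix, projector(eigvec))

rng = random.Random(0)
sampled = [trace_product(c_matrix, random_feasible(rng)) for _ in range(1000)]
print(dual_optimum, primal_optimum, min(sampled))
```

Here the primal and dual optima coincide, so the duality gap is zero; the sampled feasible points only illustrate the inequality and are not a solver.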
Appendix D
The Tsirelson-type inequality derived from the information causality
In this appendix, we give the details of deriving the Tsirelson-type inequality from IC. We first review the RAC protocol. Alice has a database of k bits a0, a1, . . . , ak−1, where each ai ∈ {0, 1} is a random variable for i ∈ {0, · · · , k − 1}. The distant Bob is given a random variable b ∈ {0, · · · , k − 1} and a bit α sent by Alice. Bob's task is to guess ab. Here we consider the RAC protocol with different settings. Case (a) is the one proposed in the main text. In case (b), Alice's and Bob's settings are modified. In the following, Alice's input is denoted by an N-bit string $\vec{x} = x_1 \ldots x_N$. Let x = 1+
iyi, 1 ≤ y ≤ k. In this case, the Tsirelson-type inequality derived from information causality following the procedure in chapter 2 is
| X
the Tsirelson-type inequality from information causality is
\[ \Big|\sum_{\{\vec{x}\},\{\vec{y}\}} (-1)^{\vec{x}\cdot\vec{y}}\, C_{\vec{x},\vec{y}}\Big| \le 2^{k}\sqrt{k}. \tag{D.2} \]
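As a point of reference for the RAC task reviewed at the start of this appendix, the purely classical version (no shared resource, one bit α of communication) can be solved by brute force for small k. This is a minimal sketch, assuming uniformly random database bits and a uniformly random b; the function name is hypothetical and the code does not model the box-assisted protocol or the correlators $C_{\vec{x},\vec{y}}$. It enumerates every deterministic encoding of the database into α and lets Bob use the best guess for each pair (b, α).

```python
from itertools import product

def best_classical_rac(k):
    """Best average success probability of guessing a_b from one bit alpha,
    maximized over deterministic encodings and decodings, for uniform inputs."""
    databases = list(product((0, 1), repeat=k))
    best = 0.0
    # An encoding assigns a message bit alpha to each of the 2^k databases.
    for encoding in product((0, 1), repeat=len(databases)):
        correct = 0
        for b in range(k):
            for alpha in (0, 1):
                # Count the values of a_b among databases that are sent as alpha.
                counts = [0, 0]
                for index, database in enumerate(databases):
                    if encoding[index] == alpha:
                        counts[database[b]] += 1
                # Bob's optimal guess is the more frequent value.
                correct += max(counts)
        best = max(best, correct / (len(databases) * k))
    return best

print(best_classical_rac(2))
```

For k = 2 the search returns 0.75, which is attained, for example, by Alice simply sending a0. The loop over encodings grows as 2^(2^k), so this enumeration is only practical for very small k.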