Quantum Information Theory
Scope:
1. Transmission of classical information over quantum channels.
2. The tradeoff between acquisition of quantum state information and disturbance of the state.
3. Quantifying quantum entanglement.
4. Transmission of quantum information over quantum channels.
Mainly accomplished by the interpretation and application of the Von Neumann entropy.
Classical Information Theory
A message is a string of letters chosen from an alphabet of $k$ letters
$\{a_1, a_2, \ldots, a_k\}$.
The letters are independent, and each letter $a_x$ occurs with probability $p(a_x)$, where $\sum_{x=1}^{k} p(a_x) = 1$.
A typical message of length $n$ will contain $np(a_x)$ occurrences of $a_x$ for each $x$. So the number of typical strings is
$$\frac{n!}{\prod_{x=1}^{k} (np(a_x))!}.$$
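By the Stirling approximation $\log n! = n \log n - n + O(\log n)$ we have
$$
\begin{aligned}
\log \frac{n!}{\prod_{x=1}^{k} (np(a_x))!}
&= \log n! - \sum_{x=1}^{k} \log (np(a_x))! \\
&\approx n \log n - n - \sum_{x=1}^{k} \bigl( np(a_x) \log np(a_x) - np(a_x) \bigr) \\
&= n \log n - \sum_{x=1}^{k} np(a_x) \bigl( \log n + \log p(a_x) \bigr) \\
&= -n \sum_{x=1}^{k} p(a_x) \log p(a_x) \\
&= nH(X),
\end{aligned}
$$
where
$$H(X) = -\sum_{x=1}^{k} p(a_x) \log p(a_x)$$
is the Shannon entropy of the ensemble $X = \{a_x, p(a_x)\}$.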
So there are approximately $2^{nH(X)}$ typical strings of length $n$ for the letter ensemble $X$. Hence, if we consider the typical strings as the only strings that can appear, then a string of length $n$ can be compressed to $nH(X)$ bits; that is, only $nH(X)$ bits are needed to store any string of length $n$.
The noiseless coding theorem states that the optimal code compresses each letter to $H(X)$ bits asymptotically. It is the highest compression rate given the requirement that messages must be decoded without errors as $n \to \infty$.
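As a rough numerical check of the counting argument above, the following sketch compares the exact logarithm of the number of typical strings with $nH(X)$. The three-letter source, the value of $n$, and the function names are illustrative assumptions, not part of the original notes; $n$ is chosen so that every $np(a_x)$ is an integer.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits; terms with p = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def log2_typical_count(n, probs):
    """log2 of n! / prod_x (n p_x)!, assuming each n * p_x is an integer."""
    counts = [round(n * p) for p in probs]
    assert sum(counts) == n, "n * p_x must be integers summing to n"
    # lgamma(m + 1) = ln(m!), which avoids building huge factorials
    log_count = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return log_count / math.log(2)

probs = [0.5, 0.25, 0.25]   # hypothetical three-letter source
n = 4000
H = shannon_entropy(probs)
bits_per_letter = log2_typical_count(n, probs) / n
print(H, bits_per_letter)
```

The two printed numbers should agree up to a correction of order $(\log n)/n$, which is the term dropped in the Stirling approximation.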
Another Perspective
For a particular message $x_1 x_2 \ldots x_n$ of length $n$, its prior probability is
$$P(x_1 x_2 \ldots x_n) = p(x_1) p(x_2) \cdots p(x_n),$$
and
$$\log P(x_1 x_2 \ldots x_n) = \sum_{i=1}^{n} \log p(x_i).$$
By the law of large numbers, when $n$ is large enough most messages have probability $P$ satisfying
$$-\frac{1}{n} \log P(x_1 x_2 \ldots x_n) = -\frac{1}{n} \sum_{i=1}^{n} \log p(x_i) \approx \langle -\log p(x) \rangle \equiv H(X),$$
where the random variable $x$ represents a letter chosen from $X$.
So the probability $P$ of a typical sequence satisfies
$$H(X) - \delta < -\frac{1}{n} \log P(x_1 x_2 \ldots x_n) < H(X) + \delta,$$
or
$$2^{-n(H(X)-\delta)} > P(x_1 x_2 \ldots x_n) > 2^{-n(H(X)+\delta)},$$
where $\delta > 0$ is small.
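The concentration of $-\frac{1}{n} \log P$ around $H(X)$ can be observed by sampling. The sketch below is only an illustration: the four-letter source, the message length, the seed, and the value of $\delta$ are all assumptions made for the example.

```python
import math
import random

def sample_rate(probs, n, rng):
    """Draw n independent letters and return -(1/n) log2 P(message)."""
    letters = rng.choices(range(len(probs)), weights=probs, k=n)
    return -sum(math.log2(probs[x]) for x in letters) / n

probs = [0.5, 0.25, 0.125, 0.125]            # hypothetical source
H = -sum(p * math.log2(p) for p in probs)    # Shannon entropy in bits
rng = random.Random(0)                       # fixed seed for reproducibility
rates = [sample_rate(probs, 10_000, rng) for _ in range(20)]
delta = 0.05
typical = sum(abs(r - H) < delta for r in rates)
print(H, min(rates), max(rates), typical)
```

For this source the per-letter fluctuation of $-\log p(x)$ is below one bit, so at $n = 10^4$ the sampled rates are expected to lie well inside the window $H(X) \pm \delta$.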
Interpretation of Shannon Entropy
The Shannon entropy of a specific source $X$ can be seen as the amount of our ignorance about the value of the next letter, or the amount of indeterminacy of the unknown letter. It can also be seen as the amount of information we gain on receiving one letter. In the usual case where the logarithm in $H$ is taken to base 2, the unit of $H$ is bits.
Binary Entropy
Suppose that the alphabet is bits, that is, X = {0, 1}, with probability p0 = p and p1 = 1 − p.
The entropy for this case is
H(X) = H(p) = −p log p − (1 − p) log(1 − p).
When $p_0 = \frac{1}{2}$, the bit is completely random, hence
$$H\left(\tfrac{1}{2}\right) = 1$$
is the maximum attainable value of the entropy; that is, we are maximally ignorant about the value of the next letter, or equivalently we gain the most information (one bit) by receiving one letter. When $p_0 = 1$ or $p_1 = 1$, the next bit is completely predictable, and the entropy in this case is
$$H(0) = H(1) = 0,$$
so we are not ignorant about the value of the next bit at all, which also means that we get no information by receiving one letter. All other cases have entropy between these two extremes.
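The two extremes and the values in between can be tabulated with a few lines of code. This is a minimal sketch; the function name and the sample values of $p$ are assumptions chosen for illustration.

```python
import math

def binary_entropy(p):
    """H(p) = -p log2 p - (1 - p) log2 (1 - p), with 0 log 0 taken as 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(p, binary_entropy(p))
```

The output is symmetric under $p \to 1 - p$, as expected from the formula.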
Generalizing the result of the previous section to a general source $X$, the entropy $H$ is zero whenever one of the letters occurs with certainty. That is,
$$H_{\min}(X) = -\log 1 = 0.$$
The maximum entropy is achieved when all letters occur with equal probability, that is, with a uniform probability distribution. For an $X$ with $d$ letters, the entropy of the uniform distribution is
$$H_{\max}(X) = -\sum_{i} \frac{1}{d} \log \frac{1}{d} = -\log \frac{1}{d} = \log d.$$
In the general case of a source $X$ with $d$ letters, its information per letter satisfies
$$0 \le H(X) \le \log d.$$
Relative Entropy
If $p(x)$ and $q(x)$ are two probability distributions over the same index set $x$ (or a given set of letters), then the relative entropy of $p(x)$ to $q(x)$ is defined as
$$H(p(x) \| q(x)) \equiv \sum_{x} p(x) \log \frac{p(x)}{q(x)} = -H(X) - \sum_{x} p(x) \log q(x),$$
where $H(X)$ is the Shannon entropy of the distribution $p(x)$.
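The relative entropy is a measure of the closeness of these two probability distributions. Since $\ln y \le y - 1$, we have
$$
\begin{aligned}
H(p(x) \| q(x)) &= -\sum_{x} p(x) \log \frac{q(x)}{p(x)} \\
&\ge \frac{1}{\ln 2} \sum_{x} p(x) \left( 1 - \frac{q(x)}{p(x)} \right) \\
&= \frac{1}{\ln 2} \sum_{x} \bigl( p(x) - q(x) \bigr) \\
&= 0,
\end{aligned}
$$
with equality when $p(x) = q(x)$ for all $x$.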
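The non-negativity of the relative entropy is easy to check numerically. The sketch below uses two hypothetical distributions on three letters; the function name is an assumption, and the code assumes $q(x) > 0$ wherever $p(x) > 0$.

```python
import math

def relative_entropy(p, q):
    """H(p||q) = sum_x p(x) log2(p(x) / q(x)), in bits.

    Assumes q(x) > 0 wherever p(x) > 0; terms with p(x) = 0 are skipped.
    """
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.25, 0.25]        # hypothetical distributions
q = [1 / 3, 1 / 3, 1 / 3]
d_pq = relative_entropy(p, q)
d_qp = relative_entropy(q, p)
d_pp = relative_entropy(p, p)
print(d_pq, d_qp, d_pp)
```

With $q$ uniform on $d$ letters, the definition reduces to $\log d - H(X)$, so `d_pq` is $\log_2 3 - 1.5$ here. The two orderings give different values, so the relative entropy measures closeness without being symmetric in its arguments.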
Mutual Information
Suppose a message composed from $X$ is transmitted through a noisy channel, and a message composed from $Y$ is received; that is, the channel distorts a letter $x \in X$ into $y \in Y$ with conditional probability $p(y|x)$. When the message is received, the probability distribution for $x$ can be updated to
$$p(x|y) = \frac{p(y|x) p(x)}{p(y)},$$
where $p(y|x)$ represents the properties of the channel, $p(x)$ the a priori probabilities of the ensemble $X$, and $p(y) = \sum_{x} p(y|x) p(x)$. So the message composed from $Y$ contains some information about the original message from $X$. Using the $p(x|y)$'s we can define the conditional entropy as
$$H(X|Y) = \langle -\log p(x|y) \rangle = -\sum_{x,y} p(x, y) \log p(x|y).$$
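Note that
$$
\begin{aligned}
H(X|Y) &= \langle -\log p(x, y) + \log p(y) \rangle \\
&= \langle -\log p(x, y) \rangle - \langle -\log p(y) \rangle \\
&= H(X, Y) - H(Y),
\end{aligned}
$$
where $H(X, Y) \equiv -\sum_{x,y} p(x, y) \log p(x, y)$. Similarly,
$$H(Y|X) = H(X, Y) - H(X).$$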
We need $H(X)$ bits per letter to specify a message from $X$. After receiving a message from $Y$ via the noisy channel, we need only $H(X|Y)$ more bits per letter to determine the original message. In other words,
$$
\begin{aligned}
I(X; Y) &= H(X) - H(X|Y) \\
&= H(X) + H(Y) - H(X, Y) \\
&= H(Y) - H(Y|X)
\end{aligned}
$$
bits of information per letter are gained by receiving the distorted message. $I(X; Y)$ is the mutual information, which is symmetric in $X$ and $Y$.
From the properties of the logarithm we have
$$H(X) \ge H(X|Y) \ge 0, \qquad H(Y) \ge H(Y|X) \ge 0,$$
so
$$I(X; Y) \ge 0, \qquad H(X) + H(Y) \ge H(X, Y).$$
That is, we will not lose any knowledge of a message from $X$ by receiving a message from $Y$.
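Equality occurs when $X$ and $Y$ are independent, since then
$$
\begin{aligned}
I(X; Y) &= H(X) - H(X|Y) \\
&= H(X) - \langle -\log p(x|y) \rangle \\
&= H(X) - \left\langle -\log \frac{p(x, y)}{p(y)} \right\rangle \\
&= H(X) - \left\langle -\log \frac{p(x) p(y)}{p(y)} \right\rangle \\
&= H(X) - \langle -\log p(x) \rangle \\
&= 0.
\end{aligned}
$$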
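The mutual information can be computed directly from a joint distribution using $I(X; Y) = H(X) + H(Y) - H(X, Y)$. The following sketch assumes a binary input with equal priors and two hypothetical channels: one that flips a bit with probability 0.1, and one whose output is independent of the input. All names and numbers are illustrative.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for joint[x][y] = p(x, y)."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    p_xy = [p for row in joint for p in row]
    return entropy(p_x) + entropy(p_y) - entropy(p_xy)

def joint_from_channel(p_x, channel):
    """p(x, y) = p(y|x) p(x), with channel[x][y] = p(y|x)."""
    return [[px * pyx for pyx in row] for px, row in zip(p_x, channel)]

flip = 0.1                                   # hypothetical bit-flip probability
noisy = [[1 - flip, flip], [flip, 1 - flip]]
useless = [[0.5, 0.5], [0.5, 0.5]]           # output independent of input
p_x = [0.5, 0.5]
i_noisy = mutual_information(joint_from_channel(p_x, noisy))
i_useless = mutual_information(joint_from_channel(p_x, useless))
print(i_noisy, i_useless)
```

The independent case gives zero, in line with the derivation above, while the noisy channel still conveys roughly half a bit per letter.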
The Noisy Coding Theorem
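With $X = \{x, p(x)\}$ for the input letters, we send a message of length $n$ through a memoryless noisy channel specified by the $p(y|x)$'s. The output ensemble $Y = \{y, p(y)\}$ can be found from knowledge of $X$ and the channel.
Intuitively, it seems we can send no more than $I(X; Y)$ bits per letter over the noisy channel, a value that depends on the $p(y|x)$'s (the channel) and the $p(x)$'s (the input ensemble). The noisy coding theorem makes this precise: the maximum of $I(X; Y)$ over input ensembles is the channel capacity, the highest rate at which information can be sent through the channel with vanishing probability of error.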
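Because $I(X; Y)$ depends on the input ensemble as well as the channel, one can look for the input distribution that maximizes it. The sketch below does this by a simple grid search for a binary symmetric channel, which is an assumed example not taken from the notes; the flip probability 0.1 and the grid resolution are arbitrary choices.

```python
import math

def h2(p):
    """Binary entropy in bits, with 0 log 0 taken as 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_mutual_information(a, flip):
    """I(X;Y) = H(Y) - H(Y|X) for input p(x=0) = a and flip probability `flip`."""
    p_y0 = a * (1 - flip) + (1 - a) * flip   # probability that the output is 0
    return h2(p_y0) - h2(flip)               # H(Y|X) = h2(flip) for every input

flip = 0.1
grid = [i / 1000 for i in range(1001)]       # candidate values of p(x=0)
best_a = max(grid, key=lambda a: bsc_mutual_information(a, flip))
best_rate = bsc_mutual_information(best_a, flip)
print(best_a, best_rate)
```

For this channel the maximum is reached at equal input probabilities, where the rate equals $1 - H(0.1)$, about 0.53 bits per letter.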
Coding and Transmission of Messages Using Quantum States
The quantum equivalent of the previous situation is to replace message letters with quantum states. Suppose that for a particular physical system we have the states $|\psi_x\rangle$, each occurring with probability $p(x)$, where $\sum_{x} p(x) = 1$. Then the density operator for each letter is
$$\rho = \sum_{x} p(x) |\psi_x\rangle\langle\psi_x|.$$
Since the states $|\psi_x\rangle$ may not be mutually orthogonal, different states are not completely distinguishable; that is, they overlap in the state space. Hence the entropy for this case is not
$$H(X) = -\sum_{x} p(x) \log p(x).$$
Two overlapping letters are not exactly two letters: they are effectively fewer than two letters, although always at least one letter.
Von Neumann Entropy
The Von Neumann entropy of the density operator defined previously is
$$S(\rho) \equiv -\mathrm{tr}(\rho \log \rho).$$
The logarithm of a matrix is defined as the inverse of the exponential of a matrix. For matrices $A$ and $B$, if
$$e^{A} = \sum_{n=0}^{\infty} \frac{A^n}{n!} = B,$$
then
$$\log B = A.$$
The logarithm of a matrix is normally very hard to calculate, but for a diagonal matrix $A$ with $A_{ij} = \delta_{ij} a_i$, its exponential is
$$(e^{A})_{ij} = \delta_{ij} e^{a_i} = B_{ij},$$
so $B$ is diagonal, and with $B_{ij} = \delta_{ij} b_i$ we have
$$(\log B)_{ij} = \delta_{ij} \log b_i.$$
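Since any density operator can be diagonalized, suppose the eigenvalues of $\rho$ are $\lambda_i$, that is,
$$\rho = \sum_{i} \lambda_i |\varphi_i\rangle\langle\varphi_i|,$$
where the $|\varphi_i\rangle$'s are mutually orthonormal. Then
$$S(\rho) = -\sum_{i} \lambda_i \log \lambda_i.$$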
This is the same as the entropy of an ensemble of letters with probabilities $\lambda_i$, since all density operators have unit trace. This equality is not surprising, since orthogonal states are completely distinguishable and hence can be treated as classical letters.
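To see how overlapping letters carry less than the classical entropy, one can compute $S(\rho)$ from the eigenvalues for a small example. The sketch below is restricted to real two-component state vectors so that it needs only the standard library; the choice of states ($|0\rangle$ and $|+\rangle$ versus $|0\rangle$ and $|1\rangle$), the equal probabilities, and the function names are assumptions made for illustration.

```python
import math

def density_matrix(states, probs):
    """rho = sum_x p(x) |psi_x><psi_x| for real 2-component state vectors."""
    rho = [[0.0, 0.0], [0.0, 0.0]]
    for v, p in zip(states, probs):
        for i in range(2):
            for j in range(2):
                rho[i][j] += p * v[i] * v[j]
    return rho

def von_neumann_entropy_2x2(rho):
    """S(rho) = -sum_i lambda_i log2 lambda_i for a 2x2 unit-trace matrix."""
    det = rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]
    # Eigenvalues of a unit-trace 2x2 matrix: (1 +/- sqrt(1 - 4 det)) / 2
    gap = math.sqrt(max(0.0, 1.0 - 4.0 * det))
    eigenvalues = [(1 + gap) / 2, (1 - gap) / 2]
    return -sum(lam * math.log2(lam) for lam in eigenvalues if lam > 0)

s = 1 / math.sqrt(2)
overlapping = density_matrix([(1.0, 0.0), (s, s)], [0.5, 0.5])      # |0>, |+>
orthogonal = density_matrix([(1.0, 0.0), (0.0, 1.0)], [0.5, 0.5])   # |0>, |1>
s_overlapping = von_neumann_entropy_2x2(overlapping)
s_orthogonal = von_neumann_entropy_2x2(orthogonal)
print(s_overlapping, s_orthogonal)
```

The orthogonal pair gives one full bit, matching the classical entropy of two equally likely letters, while the overlapping pair gives about 0.60 bits.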
Density operators can also be treated as letters. For the ensemble $X = \{\rho_x, p(x)\}$, the density operator for each letter is
$$\rho = \sum_{x} p(x) \rho_x.$$
This is the most general case in which even individual letters are in a mixed state, but how
The Von Neumann entropy represents three physical quantities:
1. The quantum information per letter.
2. The classical information per letter.
3. The amount of entanglement.
Yet the theories and methods developed by use of the Von Neumann entropy may be limited by their close correspondence with classical information theory. For example, letters are generally represented physically as mixed states rather than pure states, that is, without relative phase information. Might the Von Neumann entropy be a special case of a more general complex entropy?
Quantum (Von Neumann) Relative entropy
This is the quantum version of relative entropy.
For density matrices $\rho_1$ and $\rho_2$, the relative entropy of $\rho_1$ to $\rho_2$ is defined as
$$S(\rho_1 \| \rho_2) \equiv \mathrm{tr}(\rho_1 \log \rho_1) - \mathrm{tr}(\rho_1 \log \rho_2).$$
The relative entropy is likewise non-negative, and equals zero when $\rho_1 = \rho_2$.
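Diagonalize both $\rho_1$ and $\rho_2$:
$$\rho_1 = \sum_{i} p_i |\psi_i\rangle\langle\psi_i|, \qquad \rho_2 = \sum_{i} q_i |\varphi_i\rangle\langle\varphi_i|,$$
then
$$
\begin{aligned}
S(\rho_1 \| \rho_2)
&= -S(\rho_1) - \mathrm{tr}(\rho_1 \log \rho_2) \\
&= \sum_i p_i \log p_i - \sum_i \langle\psi_i| \rho_1 \log \rho_2 |\psi_i\rangle \\
&= \sum_i p_i \log p_i - \sum_i p_i \langle\psi_i| \log \rho_2 |\psi_i\rangle \\
&= \sum_i p_i \log p_i - \sum_i p_i \langle\psi_i| \Bigl( \sum_j (\log q_j) |\varphi_j\rangle\langle\varphi_j| \Bigr) |\psi_i\rangle \\
&= \sum_i p_i \log p_i - \sum_i p_i \sum_j |\langle\psi_i|\varphi_j\rangle|^2 \log q_j \\
&= \sum_i p_i \Bigl( \log p_i - \sum_j |\langle\psi_i|\varphi_j\rangle|^2 \log q_j \Bigr).
\end{aligned}
$$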
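Since the logarithm is strictly concave, we have
$$\sum_j |\langle\psi_i|\varphi_j\rangle|^2 \log q_j \le \log \Bigl( \sum_j |\langle\psi_i|\varphi_j\rangle|^2 q_j \Bigr),$$
with equality if and only if $\forall i\, \exists j:\ |\varphi_j\rangle = |\psi_i\rangle$, so
$$
\begin{aligned}
S(\rho_1 \| \rho_2) &= \sum_i p_i \Bigl( \log p_i - \sum_j |\langle\psi_i|\varphi_j\rangle|^2 \log q_j \Bigr) \\
&\ge \sum_i p_i \Bigl( \log p_i - \log \Bigl( \sum_j |\langle\psi_i|\varphi_j\rangle|^2 q_j \Bigr) \Bigr).
\end{aligned}
$$
The numbers $r_i = \sum_j |\langle\psi_i|\varphi_j\rangle|^2 q_j$ form a probability distribution, because the weights $|\langle\psi_i|\varphi_j\rangle|^2$ sum to one over $i$ as well as over $j$. The right-hand side is therefore the classical relative entropy of $p_i$ to $r_i$, which is non-negative. In the case of equality,
$$S(\rho_1 \| \rho_2) = \sum_i p_i (\log p_i - \log q_i) \ge 0,$$
since $S$ becomes a (classical) relative entropy.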
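The expression derived above, $S(\rho_1 \| \rho_2) = \sum_i p_i \bigl( \log p_i - \sum_j |\langle\psi_i|\varphi_j\rangle|^2 \log q_j \bigr)$, can be evaluated numerically once both matrices are diagonalized. The sketch below handles only real symmetric $2 \times 2$ density matrices, diagonalized by a rotation, and assumes $\rho_2$ has full rank; the example matrices and helper names are hypothetical.

```python
import math

def eigh_2x2(m):
    """Eigenvalues and orthonormal eigenvectors of a real symmetric 2x2 matrix."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    theta = 0.5 * math.atan2(2 * b, a - d)   # rotation angle that diagonalizes m
    c, s = math.cos(theta), math.sin(theta)
    l1 = a * c * c + 2 * b * c * s + d * s * s
    l2 = a * s * s - 2 * b * c * s + d * c * c
    return [l1, l2], [(c, s), (-s, c)]

def quantum_relative_entropy(rho1, rho2):
    """S(rho1||rho2) in bits, using the eigen-decompositions of both matrices.

    Assumes rho2 has full rank, so every eigenvalue q_j is positive.
    """
    p, psi = eigh_2x2(rho1)
    q, phi = eigh_2x2(rho2)
    total = 0.0
    for p_i, psi_i in zip(p, psi):
        if p_i <= 0:
            continue                          # 0 log 0 is taken as 0
        cross = 0.0
        for q_j, phi_j in zip(q, phi):
            overlap = psi_i[0] * phi_j[0] + psi_i[1] * phi_j[1]
            cross += overlap ** 2 * math.log2(q_j)
        total += p_i * (math.log2(p_i) - cross)
    return total

rho1 = [[0.75, 0.25], [0.25, 0.25]]   # hypothetical density matrices
rho2 = [[0.6, 0.0], [0.0, 0.4]]
s_12 = quantum_relative_entropy(rho1, rho2)
s_11 = quantum_relative_entropy(rho1, rho1)
print(s_12, s_11)
```

The first value is positive and the second vanishes up to rounding, consistent with the non-negativity argument and with $S(\rho_1 \| \rho_2) = 0$ when $\rho_1 = \rho_2$.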