Quantum Information Theory

(1)

Scope:

1. Transmission of classical information over quantum channels.

2. The tradeoff between acquisition of quantum state information and disturbance of the state.

3. Quantifying quantum entanglement.

4. Transmission of quantum information over quantum channels.

These are mainly accomplished through the interpretation and application of the Von Neumann entropy.

(2)

Classical Information Theory

A message is a string of letters chosen from an alphabet of k letters

{a₁, a₂, . . . , a_k}.

The letters are independent, and letter a_x occurs with probability p(a_x), where $\sum_{x=1}^{k} p(a_x) = 1$.

A typical message of length n will contain np(a_x) a_x’s for each x. So the number of typical strings is

$$\frac{n!}{\prod_{x=1}^{k} \bigl(n p(a_x)\bigr)!}.$$

By the Stirling approximation log n! = n log n − n + O(log n) we have

$$\log \frac{n!}{\prod_{x=1}^{k} \bigl(n p(a_x)\bigr)!} = \log n! - \sum_{x=1}^{k} \log \bigl(n p(a_x)\bigr)!$$

(3)

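$$
\begin{aligned}
&= n \log n - n - \sum_{x=1}^{k} \bigl( n p(a_x) \log n p(a_x) - n p(a_x) \bigr) \\
&= n \log n - \sum_{x=1}^{k} n p(a_x) \bigl( \log n + \log p(a_x) \bigr) \\
&= -n \sum_{x=1}^{k} p(a_x) \log p(a_x) \\
&= n H(X),
\end{aligned}
$$

where

$$H(X) = -\sum_{x=1}^{k} p(a_x) \log p(a_x)$$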

is the Shannon entropy of the ensemble X = {a_x, p(a_x)}.

(4)

So there are approximately 2^{nH(X)} typical strings of length n for the letter ensemble X. Hence, if we consider the typical strings as the only strings that can appear, then a string of length n can be compressed to nH(X) bits; that is, only nH(X) bits are needed to store any length n string.
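As a rough numerical check of the counting argument above, the following Python sketch compares the exact logarithm of the multinomial count with nH(X). The three-letter distribution, the value of n, and the helper names are illustrative assumptions and do not come from the original notes.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits; terms with p = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def log2_typical_count(n, probs):
    """log2 of n! / prod_x (n p_x)!, assuming every n * p_x is an integer."""
    counts = [round(n * p) for p in probs]
    assert sum(counts) == n, "n * p(a_x) must be an integer for every letter"
    # lgamma(m + 1) = ln(m!), so large factorials are never formed explicitly.
    log_count = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return log_count / math.log(2)

# Hypothetical three-letter source, chosen only for illustration.
probs = [0.5, 0.25, 0.25]
n = 4000
exact = log2_typical_count(n, probs)
estimate = n * shannon_entropy(probs)
print(exact, estimate)  # the two differ only by a correction of order log n
```

The gap between the two numbers is the O(log n) term dropped from the Stirling approximation, so it becomes negligible relative to nH(X) as n grows.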

The noiseless coding theorem states that the optimal code compresses each letter to H(X) bits asymptotically. This is the highest compression rate attainable given the requirement that messages be decoded without error as n → ∞.

(5)

Another Perspective

For a particular length n message x₁x₂ . . . x_n, its prior probability is

$$P(x_1 x_2 \ldots x_n) = p(x_1)\, p(x_2) \cdots p(x_n)$$

and

$$\log P(x_1 x_2 \ldots x_n) = \sum_{i=1}^{n} \log p(x_i).$$

By the central limit theorem, when n is large enough most messages have a probability P satisfying

$$-\frac{1}{n} \log P(x_1 x_2 \ldots x_n) = -\frac{1}{n} \sum_{i=1}^{n} \log p(x_i) \approx \langle -\log p(x) \rangle \equiv H(X),$$

where the random variable x represents a letter chosen from X.

(6)

So the probability P of a typical sequence satisfies

$$H(X) - \delta < -\frac{1}{n} \log P(x_1 x_2 \ldots x_n) < H(X) + \delta,$$

or

$$2^{-n(H(X)-\delta)} > P(x_1 x_2 \ldots x_n) > 2^{-n(H(X)+\delta)},$$

where δ > 0 is small.
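The concentration of −(1/n) log P around H(X) can be observed by sampling. The sketch below draws one long message from a hypothetical three-letter source (the letters, probabilities, seed, and function names are all assumptions made for illustration) and compares its per-letter log-probability with the entropy.

```python
import math
import random

def shannon_entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def empirical_rate(message, prob_of):
    """Return -(1/n) log2 P(x_1 ... x_n) for a message of independent letters."""
    return -sum(math.log2(prob_of[x]) for x in message) / len(message)

# Hypothetical source; letters and probabilities are illustrative only.
prob_of = {"a": 0.5, "b": 0.25, "c": 0.25}
rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10000
message = rng.choices(list(prob_of), weights=list(prob_of.values()), k=n)

rate = empirical_rate(message, prob_of)
entropy = shannon_entropy(prob_of.values())
print(rate, entropy)  # the rate should fall within a small delta of the entropy
```

For this source the per-letter quantity −log p(x) has standard deviation 0.5, so with n = 10000 the sampled rate is expected to deviate from H(X) by roughly 0.005.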

(7)

Interpretation of Shannon Entropy

The Shannon entropy for a specific source X can be seen as the amount of our ignorance about the value of the next letter, or the amount of indeterminacy of the unknown letter. It can also be seen as the amount of information we gain after receiving one letter. In the usual case where the logarithm in H is taken to base 2, the unit of H is bits.

(8)

Binary Entropy

Suppose that the alphabet is bits, that is, X = {0, 1}, with probability p₀ = p and p₁ = 1 − p.

The entropy for this case is

H(X) = H(p) = −p log p − (1 − p) log(1 − p).

When p₀ = 1/2, the bit is completely random, hence

$$H\!\left(\tfrac{1}{2}\right) = 1$$

is the maximum attainable value for the entropy; that is, we are maximally ignorant about the value of the next letter, or equivalently we gain the most information (one bit) by receiving one letter. When p₀ = 1 or p₁ = 1, the next bit is completely predictable, and the entropy in this case is

H(0) = H(1) = 0,

so we are not ignorant about the value of the next bit at all, which also means that we get no information by receiving one letter. All other cases have entropy between these two extremes.
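The two extremes and the values in between can be tabulated with a few lines of Python. This is a minimal sketch; the function name `binary_entropy` and the sample values of p are illustrative choices.

```python
import math

def binary_entropy(p):
    """H(p) = -p log2 p - (1 - p) log2(1 - p), with 0 log 0 taken as 0."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0)

for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(p, binary_entropy(p))
```

The printed values are symmetric under p → 1 − p, vanish at the end points, and peak at p = 1/2.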

(9)

Generalizing the result of the previous section to general sources X, the entropy H is zero whenever any one of the letters occurs with certainty. That is,

H_min(X) = − log 1 = 0.

The maximum entropy is achieved when all letters occur with equal probability, that is, with a uniform probability distribution. For an X with d letters, the entropy of the uniform distribution is

$$H_{\max}(X) = -\sum_{i} \frac{1}{d} \log \frac{1}{d} = -\log \frac{1}{d} = \log d.$$

In the general case of a source X with d letters, its information per letter satisfies

0 ≤ H(X) ≤ log d.
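As a concrete check of these bounds, consider a hypothetical four-letter source (d = 4) with probabilities 1/2, 1/4, 1/8, 1/8, an illustrative choice not taken from the text. With base-2 logarithms,

$$
H(X) = \tfrac{1}{2}\log 2 + \tfrac{1}{4}\log 4 + \tfrac{1}{8}\log 8 + \tfrac{1}{8}\log 8
     = \tfrac{1}{2} + \tfrac{1}{2} + \tfrac{3}{8} + \tfrac{3}{8} = 1.75 \text{ bits},
$$

which lies between 0 and log 4 = 2 bits, as it should.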

(10)

Relative Entropy

If p(x) and q(x) are two probability distributions over the same index set x (or a given set of letters), then the relative entropy of p(x) to q(x) is defined as

$$H(p(x)\|q(x)) \equiv \sum_x p(x) \log \frac{p(x)}{q(x)} = -H(X) - \sum_x p(x) \log q(x).$$

The relative entropy is a measure of the closeness of these two probability distributions. Since ln y ≤ y − 1, we have

$$
\begin{aligned}
H(p(x)\|q(x)) &= -\sum_x p(x) \log \frac{q(x)}{p(x)} \\
&\ge \frac{1}{\ln 2} \sum_x p(x) \left( 1 - \frac{q(x)}{p(x)} \right) \\
&= \frac{1}{\ln 2} \sum_x \bigl( p(x) - q(x) \bigr) \\
&= 0,
\end{aligned}
$$

with equality when p(x) = q(x) for all x.
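The non-negativity just derived, and the equality condition, can be checked numerically. The following sketch assumes both distributions are given as lists over the same letters; the function name and the example distributions are illustrative only.

```python
import math

def relative_entropy(p, q):
    """H(p||q) = sum_x p(x) log2(p(x) / q(x)), in bits.

    Assumes p and q are lists over the same letters and that q(x) > 0
    wherever p(x) > 0.
    """
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.25, 0.25]
q = [1 / 3, 1 / 3, 1 / 3]
print(relative_entropy(p, q))  # positive, since p differs from q
print(relative_entropy(q, p))  # also positive, but a different value
print(relative_entropy(p, p))  # 0.0, since the distributions coincide
```

Note that the two orderings give different values: the relative entropy is not symmetric in its arguments.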

(11)

Mutual Information

Suppose a message composed from X is transmitted through a noisy channel, and a message composed from Y is received; that is, the channel distorts a letter x ∈ X into y ∈ Y with conditional probability p(y|x). When the message is received, the probability distribution for x can be updated to

$$p(x|y) = \frac{p(y|x)\, p(x)}{p(y)},$$

where p(y|x) represents the properties of the channel, p(x) the a priori probabilities of ensemble X, and $p(y) = \sum_x p(y|x)\, p(x)$. So the message composed from Y contains some information about the original message from X. Using the p(x|y)’s we can define the conditional entropy as

$$H(X|Y) = \langle -\log p(x|y) \rangle = -\sum_{x,y} p(x, y) \log p(x|y).$$

(12)

Note that

$$
\begin{aligned}
H(X|Y) &= \langle -\log p(x, y) + \log p(y) \rangle \\
&= \langle -\log p(x, y) \rangle - \langle -\log p(y) \rangle \\
&= H(X, Y) - H(Y),
\end{aligned}
$$

where $H(X, Y) \equiv -\sum_{x,y} p(x, y) \log p(x, y)$. Similarly,

H(Y |X) = H(X, Y ) − H(X).

We need H(X) bits per letter to decode messages from X. After receiving a message from Y via the noisy channel, we need H(X|Y) more bits per letter to decode the message. In other words,

I(X; Y) = H(X) − H(X|Y)

= H(X) + H(Y) − H(X, Y)

= H(Y) − H(Y|X)

bits of information per letter are gained by receiving the distorted message. I(X; Y) is the mutual information, which is symmetric in X and Y.
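To make the three equivalent expressions for I(X; Y) concrete, the sketch below evaluates them for a binary symmetric channel. The channel, its flip probability of 0.1, the uniform input distribution, and the function names are assumptions chosen for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(p_x, channel):
    """Return the three expressions for I(X;Y) in bits.

    p_x[i] is p(x_i); channel[i][j] is p(y_j | x_i).
    """
    joint = [[p_x[i] * channel[i][j] for j in range(len(channel[i]))]
             for i in range(len(p_x))]
    p_y = [sum(row[j] for row in joint) for j in range(len(joint[0]))]
    h_x, h_y = entropy(p_x), entropy(p_y)
    h_xy = entropy([pxy for row in joint for pxy in row])
    h_x_given_y = h_xy - h_y  # H(X|Y) = H(X,Y) - H(Y)
    h_y_given_x = h_xy - h_x  # H(Y|X) = H(X,Y) - H(X)
    return h_x - h_x_given_y, h_x + h_y - h_xy, h_y - h_y_given_x

# Hypothetical binary symmetric channel that flips a bit with probability 0.1.
flip = 0.1
channel = [[1 - flip, flip], [flip, 1 - flip]]
forms = mutual_information([0.5, 0.5], channel)
print(forms)  # three numerically equal values
```

Replacing the channel by one whose rows are identical, for example `[[0.5, 0.5], [0.5, 0.5]]`, makes Y independent of X and drives all three values to zero.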

(13)

From the properties of the logarithm we have

H(X) ≥ H(X|Y) ≥ 0,

H(Y) ≥ H(Y|X) ≥ 0,

so

I(X; Y) ≥ 0,

H(X) + H(Y) ≥ H(X, Y).

That is, we will not lose any knowledge of a message from X by receiving a message from Y .

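Equality occurs when X and Y are independent, since then

$$
\begin{aligned}
I(X; Y) &= H(X) - H(X|Y) \\
&= H(X) - \langle -\log p(x|y) \rangle \\
&= H(X) - \left\langle -\log \frac{p(x, y)}{p(y)} \right\rangle \\
&= H(X) - \left\langle -\log \frac{p(x)\, p(y)}{p(y)} \right\rangle \\
&= H(X) - \langle -\log p(x) \rangle \\
&= H(X) - H(X) = 0.
\end{aligned}
$$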

(14)

The Noisy Coding Theorem

With X = {x, p(x)} for the input letters, we send a length n message through a memoryless noisy channel specified by the p(y|x)’s. The output ensemble Y = {y, p(y)} can be found from knowledge of X and the channel.

Intuitively it seems we can send no more than I(X; Y) bits per letter over the noisy channel, a value that depends on both the p(y|x)’s (the channel) and the p(x)’s (the input ensemble). This is the noisy coding theorem.

(15)

Coding and Transmission of Messages Using Quantum States

The quantum equivalent of the previous situation is to replace message letters with quantum states. Suppose for a particular physical system we have the states $|\psi_x\rangle$, each occurring with probability p(x), where $\sum_x p(x) = 1$. Then the density operator for a letter is

$$\rho = \sum_x p(x)\, |\psi_x\rangle\langle\psi_x|.$$

Since the states $|\psi_x\rangle$ may not be mutually orthogonal, different states are not completely distinguishable; that is, they overlap in the state space. Hence the entropy for this case is not

$$H(X) = -\sum_x p(x) \log p(x).$$

Two overlapping letters are not exactly two letters; they are effectively fewer than two letters, although never fewer than one.

(16)

Von Neumann Entropy

The Von Neumann entropy of the density operator introduced previously is defined as

S(ρ) ≡ −tr (ρ log ρ) .

The logarithm of a matrix is defined as the inverse of the exponential of a matrix. For matrices A and B, if

$$e^A = \sum_{n=0}^{\infty} \frac{A^n}{n!} = B,$$

then

$$\log B = A.$$

The logarithm of a matrix is normally very hard to calculate, but for a diagonal matrix A with $A_{ij} = \delta_{ij} a_i$, the exponential is

$$(e^A)_{ij} = \delta_{ij}\, e^{a_i} = B_{ij},$$

so B is diagonal, and with $B_{ij} = \delta_{ij} b_i$ we have

$$(\log B)_{ij} = \delta_{ij} \log b_i.$$
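The diagonal case can be verified directly from the power series. The sketch below sums a truncated series for e^A and then takes the element-wise logarithm of the diagonal; the 2×2 matrix, the truncation at 30 terms, and the helper names are assumptions made for illustration, and natural logarithms are used throughout.

```python
import math

def mat_mul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(a, terms=30):
    """Truncated power series sum_{n < terms} A^n / n! for a square matrix."""
    n = len(a)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # A^0 / 0!
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, a)]  # A^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

def diag_log(b):
    """Logarithm of a diagonal matrix with positive diagonal entries."""
    n = len(b)
    return [[math.log(b[i][i]) if i == j else 0.0 for j in range(n)]
            for i in range(n)]

A = [[0.5, 0.0], [0.0, -1.0]]  # illustrative diagonal matrix
B = mat_exp(A)
print(B)            # close to diag(e^0.5, e^-1)
print(diag_log(B))  # recovers A up to rounding
```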

(17)

Since any density operator can be diagonalized, suppose the eigenvalues of ρ are λ_i, that is,

$$\rho = \sum_i \lambda_i\, |\varphi_i\rangle\langle\varphi_i|,$$

where the $|\varphi_i\rangle$’s are mutually orthonormal. Then

$$S(\rho) = -\sum_i \lambda_i \log \lambda_i.$$

This is the same as the entropy of an ensemble of letters each with probabilities λ_i, since all density operators have unit trace. This equality is not surprising since orthogonal states are completely distinguishable, hence can be treated as classical letters.
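For a single qubit the eigenvalues of ρ follow from its determinant, so S(ρ) can be computed without a linear algebra library. The sketch below does this for two hypothetical two-letter ensembles, one with orthogonal states and one with overlapping states. The specific states, the restriction to real amplitudes, and the function names are illustrative assumptions.

```python
import math

def qubit_entropy(rho):
    """Von Neumann entropy (bits) of a real symmetric 2x2 density matrix.

    With unit trace, the eigenvalues are (1 +/- sqrt(1 - 4 det rho)) / 2.
    """
    det = rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]
    root = math.sqrt(max(0.0, 1.0 - 4.0 * det))
    eigenvalues = [(1.0 + root) / 2.0, (1.0 - root) / 2.0]
    return -sum(lam * math.log2(lam) for lam in eigenvalues if lam > 0)

def mix(states, probs):
    """rho = sum_x p(x) |psi_x><psi_x| for real two-component state vectors."""
    return [[sum(p * s[i] * s[j] for s, p in zip(states, probs))
             for j in range(2)] for i in range(2)]

zero = [1.0, 0.0]
one = [0.0, 1.0]
plus = [1 / math.sqrt(2), 1 / math.sqrt(2)]  # overlaps with both zero and one

s_orthogonal = qubit_entropy(mix([zero, one], [0.5, 0.5]))
s_overlapping = qubit_entropy(mix([zero, plus], [0.5, 0.5]))
print(s_orthogonal)   # equals the Shannon entropy of the probabilities
print(s_overlapping)  # strictly smaller, because the letters overlap
```

Both ensembles have H(X) = 1 bit, but only the orthogonal one attains it; the overlapping letters carry less than one bit each.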

Density operators can also be treated as letters. For the ensemble X = {ρ_x, p(x)}, the density operator for each letter is

$$\rho = \sum_x p(x)\, \rho_x.$$

This is the most general case in which even individual letters are in a mixed state, but how

(18)

The Von Neumann entropy represents three physical quantities:

1. The quantum information per letter.

2. The classical information per letter.

3. The amount of entanglement.

Yet the theories and methods developed using the Von Neumann entropy may be limited by their close correspondence with classical information theory. For example, letters are generally represented physically as mixed states rather than pure states, that is, without relative phase information. Might the Von Neumann entropy be a special case of a more general complex entropy?

(19)

Quantum (Von Neumann) Relative entropy

This is the quantum version of relative entropy.

For density matrices ρ₁ and ρ₂, the relative entropy of ρ₁ to ρ₂ is defined as

S(ρ₁||ρ₂) ≡ tr (ρ₁ log ρ₁) − tr (ρ₁ log ρ₂) .

The relative entropy is likewise non-negative, and equals zero when ρ₁ = ρ₂.

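Diagonalize both ρ₁ and ρ₂:

$$\rho_1 = \sum_i p_i\, |\psi_i\rangle\langle\psi_i|, \qquad \rho_2 = \sum_i q_i\, |\varphi_i\rangle\langle\varphi_i|,$$

then

$$
\begin{aligned}
S(\rho_1\|\rho_2) &= -S(\rho_1) - \mathrm{tr}\,(\rho_1 \log \rho_2) \\
&= \sum_i p_i \log p_i - \sum_i \langle\psi_i|\, \rho_1 \log \rho_2\, |\psi_i\rangle \\
&= \sum_i p_i \log p_i - \sum_i p_i\, \langle\psi_i|\, \log \rho_2\, |\psi_i\rangle
\end{aligned}
$$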

(20)

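$$
\begin{aligned}
&= \sum_i p_i \log p_i - \sum_i p_i\, \langle\psi_i| \Bigl( \sum_j (\log q_j)\, |\varphi_j\rangle\langle\varphi_j| \Bigr) |\psi_i\rangle \\
&= \sum_i p_i \log p_i - \sum_i p_i \sum_j |\langle\psi_i|\varphi_j\rangle|^2 \log q_j \\
&= \sum_i p_i \Bigl( \log p_i - \sum_j |\langle\psi_i|\varphi_j\rangle|^2 \log q_j \Bigr).
\end{aligned}
$$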

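Since the logarithm is strictly concave, we have

$$\sum_j |\langle\psi_i|\varphi_j\rangle|^2 \log q_j \le \log \Bigl( \sum_j |\langle\psi_i|\varphi_j\rangle|^2\, q_j \Bigr),$$

with equality if and only if for every i there exists a j such that $|\varphi_j\rangle = |\psi_i\rangle$, so

$$
\begin{aligned}
S(\rho_1\|\rho_2) &= \sum_i p_i \Bigl( \log p_i - \sum_j |\langle\psi_i|\varphi_j\rangle|^2 \log q_j \Bigr) \\
&\ge \sum_i p_i \Bigl( \log p_i - \log \Bigl( \sum_j |\langle\psi_i|\varphi_j\rangle|^2\, q_j \Bigr) \Bigr).
\end{aligned}
$$

The right-hand side is the classical relative entropy between the p_i and the distribution $r_i = \sum_j |\langle\psi_i|\varphi_j\rangle|^2 q_j$, which is normalized because both eigenbases are orthonormal, so it is non-negative. In the case of equality, with the eigenvectors labeled so that $|\varphi_i\rangle = |\psi_i\rangle$,

$$S(\rho_1\|\rho_2) = \sum_i p_i\, (\log p_i - \log q_i) \ge 0,$$

since S becomes a (classical) relative entropy.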
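The eigenbasis formula derived above can be evaluated directly for qubits. The sketch below is restricted to real symmetric 2×2 density matrices with ρ₂ of full rank, and obtains eigenvectors from a Jacobi rotation; these restrictions, the example matrices, and the function names are assumptions made to keep the example self-contained.

```python
import math

def eigen_2x2(m):
    """Eigenvalues and orthonormal eigenvectors of a real symmetric 2x2 matrix."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    theta = 0.5 * math.atan2(2.0 * b, a - d)  # Jacobi rotation angle
    c, s = math.cos(theta), math.sin(theta)
    vectors = [(c, s), (-s, c)]
    # Rayleigh quotient v^T M v gives the eigenvalue of each unit eigenvector.
    values = [a * v[0] ** 2 + 2.0 * b * v[0] * v[1] + d * v[1] ** 2
              for v in vectors]
    return values, vectors

def quantum_relative_entropy(rho1, rho2):
    """S(rho1||rho2) = sum_i p_i (log2 p_i - sum_j |<psi_i|phi_j>|^2 log2 q_j).

    Assumes real symmetric qubit density matrices and rho2 of full rank.
    """
    p, psi = eigen_2x2(rho1)
    q, phi = eigen_2x2(rho2)
    total = 0.0
    for p_i, psi_i in zip(p, psi):
        if p_i <= 1e-15:
            continue  # 0 log 0 is taken as 0
        overlap_term = sum(
            (psi_i[0] * phi_j[0] + psi_i[1] * phi_j[1]) ** 2 * math.log2(q_j)
            for q_j, phi_j in zip(q, phi))
        total += p_i * (math.log2(p_i) - overlap_term)
    return total

rho1 = [[0.75, 0.25], [0.25, 0.25]]  # illustrative mixed state
rho2 = [[0.5, 0.0], [0.0, 0.5]]      # maximally mixed state

print(quantum_relative_entropy(rho1, rho2))  # positive
print(quantum_relative_entropy(rho2, rho1))  # positive, but a different value
print(quantum_relative_entropy(rho1, rho1))  # zero up to rounding
```

When ρ₂ is the maximally mixed qubit state, the result reduces to 1 − S(ρ₁) in bits, which gives a quick sanity check on the first printed value.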