Rényi Mutual Information - Error Rate Analysis in Quantum Information Theory

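The mutual information of a c-q channel $W : \mathcal{X} \to \mathcal{S}(\mathcal{H})$ with a prior distribution $P \in \mathcal{P}(\mathcal{X})$ is defined by

$$ I(P, W) := D(P \circ W \,\|\, P \otimes PW) = D(W \,\|\, PW \,|\, P), \tag{3.53} $$

where $P \circ W := \sum_{x \in \mathcal{X}} P(x)\, |x\rangle\langle x| \otimes W_x$ and $PW := \sum_{x \in \mathcal{X}} P(x)\, W_x$. Hence, the information radius or information capacity² of $W : \mathcal{X} \to \mathcal{S}(\mathcal{H})$ is

$$ C_W := \sup_{P \in \mathcal{P}(\mathcal{X})} I(P, W). \tag{3.54} $$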
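As a numerical illustration of Eq. (3.53), the following sketch evaluates $I(P, W)$ for a two-state qubit channel. It assumes the standard convention $D(W \| PW | P) = \sum_x P(x) D(W_x \| PW)$, which is not restated in this section, and uses natural logarithms. The helper names `logm_psd`, `relative_entropy`, and `mutual_information` are illustrative choices, not notation from the text.

```python
import numpy as np

def logm_psd(A, tol=1e-12):
    """Logarithm of a positive semi-definite matrix, taken on its support."""
    w, V = np.linalg.eigh(A)
    lw = np.zeros_like(w)
    mask = w > tol
    lw[mask] = np.log(w[mask])
    return (V * lw) @ V.conj().T

def relative_entropy(rho, sigma):
    """D(rho||sigma) = Tr[rho (log rho - log sigma)], assuming supp(rho) lies in supp(sigma)."""
    return float(np.real(np.trace(rho @ (logm_psd(rho) - logm_psd(sigma)))))

def mutual_information(P, W):
    """I(P, W) = sum_x P(x) D(W_x || PW) (assumed conditional form of Eq. (3.53))."""
    PW = sum(p * Wx for p, Wx in zip(P, W))
    return sum(p * relative_entropy(Wx, PW) for p, Wx in zip(P, W) if p > 0)

# Example: the pure states |0> and |+> with a uniform prior.
ket0 = np.array([1.0, 0.0])
ketp = np.array([1.0, 1.0]) / np.sqrt(2)
W = [np.outer(ket0, ket0), np.outer(ketp, ketp)]
P = [0.5, 0.5]
I_example = mutual_information(P, W)
print(I_example)
```

For pure output states the result reduces to the von Neumann entropy of $PW$, which gives an independent check of the value.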

The conditional information variance and the unconditional information variance of $W : \mathcal{X} \to \mathcal{S}(\mathcal{H})$ with a prior distribution $P \in \mathcal{P}(\mathcal{X})$ are defined, respectively, by

$$ V(P, W) := V(W \,\|\, PW \,|\, P), \qquad U(P, W) := V(P \circ W \,\|\, P \otimes PW). \tag{3.55} $$

Note that $V(P^\star, W) = U(P^\star, W)$ for every capacity-achieving distribution $P^\star \in \mathcal{P}(\mathcal{X})$, i.e. every $P^\star$ with $I(P^\star, W) = C_W$; this can be verified by an argument similar to [12, Lemma 62]. We also define the unconditional information variance in terms of $\widetilde{V}(\rho \| \sigma)$:

$$ \widetilde{V}(P, W) := \widetilde{V}(W \,\|\, PW \,|\, P). \tag{3.56} $$

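The minimal peripheral information variance and its variant are defined by

$$ V_W := \inf_{P \in \mathcal{P}(\mathcal{X}) :\, I(P, W) = C_W} V(P, W), \tag{3.57} $$

$$ \widetilde{V}_W := \inf_{P \in \mathcal{P}(\mathcal{X}) :\, I(P, W) = C_W} \widetilde{V}(P, W). \tag{3.58} $$

Furthermore, one can easily verify that

$$ V_W > 0 \ \text{ implies } \ C_W > 0. \tag{3.59} $$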

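In the following, we define two related information quantities: for every $\alpha \in [0, 1]$,

$$ I_\alpha^{(1)}(P, W) := \inf_{\sigma \in \mathcal{S}(\mathcal{H})} D_\alpha(P \circ W \,\|\, P \otimes \sigma); \tag{3.60} $$

$$ I_\alpha^{(2)}(P, W) := \inf_{\sigma \in \mathcal{S}(\mathcal{H})} D_\alpha(W \,\|\, \sigma \,|\, P). \tag{3.61} $$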

The term $I_\alpha^{(1)}(P, W)$ is called the α-Rényi mutual information [104, 64, 58, 105] or the generalized Holevo quantity. The second term $I_\alpha^{(2)}(P, W)$ can be viewed as a variant of the α-Rényi mutual information, called the α-Augustin mutual information [106, 107]. It can be verified that these two functions are related by Jensen's inequality:

$$ I_\alpha^{(1)}(P, W) \le I_\alpha^{(2)}(P, W). \tag{3.62} $$

For the case of $\alpha = 1$, they both equal the conventional mutual information, i.e. $I_1^{(1)}(P, W) = I_1^{(2)}(P, W) = I(P, W)$. Mosonyi and Ogawa [58, Proposition IV.2] showed that for all $\alpha \in [0, 1]$,

$$ C_{\alpha, W} := \sup_{P \in \mathcal{P}(\mathcal{X})} I_\alpha^{(1)}(P, W) = \sup_{P \in \mathcal{P}(\mathcal{X})} I_\alpha^{(2)}(P, W), \tag{3.63} $$

and it is termed the Rényi radius or the Rényi capacity of order α. Moreover, Proposition 3.2 below and the compactness of $\mathcal{P}(\mathcal{X})$ show that the suprema in Eq. (3.63) can be replaced with maxima.

² We note that $C_W$ equals the capacity of classical communication over quantum channels [101, 102, 103]. It is usually termed the classical capacity [50], though it is a quantity in quantum information processing.

We note that Iα⁽¹⁾ admits a closed form for α ∈ (0, 1] due to the quantum Sibson's identity below.

The minimizer in Eq. (3.61) will be studied in Proposition 3.2.

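Lemma 3.3 (Quantum Sibson's Identity [108]). Fix an $\alpha \in (0, 1]$. Let $\rho_{AB} \in \mathcal{S}(AB)$ and let $\sigma_B^\star$ be the minimizer of $\min_{\sigma_B \in \mathcal{S}(B)} D_\alpha(\rho_{AB} \,\|\, \rho_A \otimes \sigma_B)$. Then, one has

$$ \sigma_B^\star = \frac{\left( \mathrm{Tr}_A\!\left[ \rho_A^{1-\alpha} \rho_{AB}^{\alpha} \right] \right)^{1/\alpha}}{\mathrm{Tr}\!\left[ \left( \mathrm{Tr}_A\!\left[ \rho_A^{1-\alpha} \rho_{AB}^{\alpha} \right] \right)^{1/\alpha} \right]}. \tag{3.64} $$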
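For the c-q state $P \circ W$, Lemma 3.3 specializes to $\sigma^\star \propto (\sum_x P(x) W_x^\alpha)^{1/\alpha}$, and substituting it into Eq. (3.60) gives the closed form $I_\alpha^{(1)}(P, W) = \frac{\alpha}{\alpha - 1} \log \mathrm{Tr}[(\sum_x P(x) W_x^\alpha)^{1/\alpha}]$. The sketch below checks this numerically for $\alpha \in (0, 1)$. It assumes that $D_\alpha$ is the Petz Rényi divergence $\frac{1}{\alpha - 1} \log \mathrm{Tr}[\rho^\alpha \sigma^{1-\alpha}]$; the function names and the example states are hypothetical.

```python
import numpy as np

def psd_power(A, p, tol=1e-12):
    """A**p for a positive semi-definite matrix, taken on its support."""
    w, V = np.linalg.eigh(A)
    wp = np.zeros_like(w)
    mask = w > tol
    wp[mask] = w[mask] ** p
    return (V * wp) @ V.conj().T

def renyi_div_cq(P, W, sigma, alpha):
    """D_alpha(P∘W || P⊗sigma) = 1/(alpha-1) log sum_x P(x) Tr[W_x^alpha sigma^(1-alpha)]."""
    s = psd_power(sigma, 1.0 - alpha)
    q = sum(p * np.real(np.trace(psd_power(Wx, alpha) @ s)) for p, Wx in zip(P, W))
    return float(np.log(q) / (alpha - 1.0))

def sibson_minimizer(P, W, alpha):
    """Candidate minimizer (sum_x P(x) W_x^alpha)^(1/alpha), normalized to unit trace."""
    X = sum(p * psd_power(Wx, alpha) for p, Wx in zip(P, W))
    Y = psd_power(X, 1.0 / alpha)
    return Y / np.real(np.trace(Y))

def renyi_mutual_information(P, W, alpha):
    """Closed form alpha/(alpha-1) log Tr[(sum_x P(x) W_x^alpha)^(1/alpha)]."""
    X = sum(p * psd_power(Wx, alpha) for p, Wx in zip(P, W))
    return float(alpha / (alpha - 1.0) * np.log(np.real(np.trace(psd_power(X, 1.0 / alpha)))))

# Hypothetical example: two full-rank qubit states.
P = [0.3, 0.7]
W = [np.array([[0.9, 0.1], [0.1, 0.1]]), np.array([[0.2, -0.1], [-0.1, 0.8]])]
alpha = 0.5
sigma_star = sibson_minimizer(P, W, alpha)
closed_form = renyi_mutual_information(P, W, alpha)
at_minimizer = renyi_div_cq(P, W, sigma_star, alpha)
print(closed_form, at_minimizer)
```

The two printed values should agree up to rounding, and any other density operator plugged into `renyi_div_cq` should give a value that is at least as large.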

The following proposition presents important properties of α-Rényi mutual information and radius.

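Proposition 3.2 (Properties of α-Mutual Information and Radius). Given any classical-quantum channel $W : \mathcal{X} \to \mathcal{S}(\mathcal{H})$, the following hold:

(a) For every $P \in \mathcal{P}(\mathcal{X})$, the map $\alpha \mapsto I_\alpha^{(2)}(P, W)$ is monotone increasing on $[0, 1]$, and $I_\alpha^{(2)}(P, W) \le \log \min\{|\mathcal{X}|, d\}$ for all $\alpha \in [0, 1]$.

(b) The map $(\alpha, P) \mapsto I_\alpha^{(2)}(P, W)$ is continuous on $[0, 1] \times \mathcal{P}(\mathcal{X})$.

(c) For every $\alpha \in (0, 1]$ and $P \in \mathcal{P}(\mathcal{X})$, there exists a unique minimizer $\sigma_{\alpha, P} \in \mathcal{S}(\mathcal{H})$ in Eq. (3.61), i.e.

$$ I_\alpha^{(2)}(P, W) = D_\alpha(W \,\|\, \sigma_{\alpha, P} \,|\, P), \tag{3.65} $$

and

$$ T_{\alpha, P}(\sigma) = \sigma \ \text{ and } \ \sigma \gg PW \quad \text{if and only if} \quad \sigma = \sigma_{\alpha, P}, \tag{3.66} $$

where the map $T_{\alpha, P} : \mathcal{S}_{P, W}(\mathcal{H}) \to \mathcal{S}(\mathcal{H})$ is defined as

$$ T_{\alpha, P}(\sigma) = \sum_{x \in \mathcal{X}} P(x)\, \frac{\sigma^{\frac{1-\alpha}{2}} W_x^{\alpha} \sigma^{\frac{1-\alpha}{2}}}{\mathrm{Tr}\!\left[ W_x^{\alpha} \sigma^{1-\alpha} \right]}. \tag{3.67} $$

(d) The map $(\alpha, P) \mapsto \sigma_{\alpha, P}$ is continuous on $(0, 1] \times \mathcal{P}(\mathcal{X})$.

(e) The map $\alpha \mapsto C_{\alpha, W}$ is continuous and monotone increasing on $[0, 1]$.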
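The map $T_{\alpha, P}$ of Eq. (3.67) is straightforward to evaluate numerically. The sketch below implements it and applies it repeatedly, starting from the maximally mixed state. Repeated application is only a heuristic here: Proposition 3.2 characterizes the minimizer as a fixed point but does not claim that the iteration converges. The code assumes the Petz form $D_\alpha(W_x \| \sigma) = \frac{1}{\alpha - 1} \log \mathrm{Tr}[W_x^\alpha \sigma^{1-\alpha}]$ and the convention $D_\alpha(W \| \sigma | P) = \sum_x P(x) D_\alpha(W_x \| \sigma)$; all function names and example states are hypothetical.

```python
import numpy as np

def psd_power(A, p, tol=1e-12):
    """A**p for a positive semi-definite matrix, taken on its support."""
    w, V = np.linalg.eigh(A)
    wp = np.zeros_like(w)
    mask = w > tol
    wp[mask] = w[mask] ** p
    return (V * wp) @ V.conj().T

def augustin_map(P, W, sigma, alpha):
    """T_{alpha,P}(sigma) as in Eq. (3.67)."""
    s_half = psd_power(sigma, (1.0 - alpha) / 2.0)
    s_full = psd_power(sigma, 1.0 - alpha)
    out = np.zeros(sigma.shape, dtype=complex)
    for p, Wx in zip(P, W):
        if p > 0:
            Wa = psd_power(Wx, alpha)
            out += p * (s_half @ Wa @ s_half) / np.real(np.trace(Wa @ s_full))
    return out

def augustin_objective(P, W, sigma, alpha):
    """D_alpha(W||sigma|P) = sum_x P(x) D_alpha(W_x||sigma), for alpha in (0, 1)."""
    s_full = psd_power(sigma, 1.0 - alpha)
    terms = [p * np.log(np.real(np.trace(psd_power(Wx, alpha) @ s_full)))
             for p, Wx in zip(P, W) if p > 0]
    return float(sum(terms) / (alpha - 1.0))

def renyi_mi_closed_form(P, W, alpha):
    """I_alpha^(1)(P, W) via the Sibson closed form, used as a lower bound."""
    X = sum(p * psd_power(Wx, alpha) for p, Wx in zip(P, W))
    return float(alpha / (alpha - 1.0) * np.log(np.real(np.trace(psd_power(X, 1.0 / alpha)))))

# Hypothetical example: two full-rank qubit states.
P = [0.3, 0.7]
W = [np.array([[0.9, 0.1], [0.1, 0.1]]), np.array([[0.2, -0.1], [-0.1, 0.8]])]
alpha = 0.5
sigma = np.eye(2) / 2
for _ in range(200):
    sigma = augustin_map(P, W, sigma, alpha)

residual = np.linalg.norm(augustin_map(P, W, sigma, alpha) - sigma)
I2_upper = augustin_objective(P, W, sigma, alpha)  # upper bound on I_alpha^(2)
I1 = renyi_mi_closed_form(P, W, alpha)             # lower bound by Eq. (3.62)
print(residual, I1, I2_upper)
```

Whatever the iterate, the value `I2_upper` bounds $I_\alpha^{(2)}(P, W)$ from above and `I1` bounds it from below, so the gap between them indicates how tight the numerical estimate is. A small `residual` suggests that the iterate is close to the fixed point $\sigma_{\alpha, P}$.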

Proof of Proposition 3.2.

(3.2)-(a) Recall the definition of $I_\alpha^{(2)}$ given in Eq. (3.61). The monotonicity immediately follows from Eq. (3.13) (see also [58, Lemma IV.5]) because the minimization over $\sigma \in \mathcal{S}(\mathcal{H})$ preserves monotonicity. Hence, we have $I_\alpha^{(2)}(P, W) \le I_1^{(2)}(P, W) = I(P, W) \le \log \min\{|\mathcal{X}|, d\}$, where the last inequality follows from the well-known upper bound for the Holevo quantity (see e.g. [5, Chapter 12]).

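(3.2)-(b) Fix an arbitrary sequence $(\alpha_k, P_k)_{k \in \mathbb{N}}$ such that $\alpha_k \in [0, 1]$, $P_k \in \mathcal{P}(\mathcal{X})$, and $\lim_{k \to +\infty} (\alpha_k, P_k) = (\alpha_0, P_0) \in [0, 1] \times \mathcal{P}(\mathcal{X})$. Let

$$ \sigma_k := \sigma_{\alpha_k, P_k} \in \operatorname*{arg\,min}_{\sigma \in \mathcal{S}(\mathcal{H})} D_{\alpha_k}(W \,\|\, \sigma \,|\, P_k), \quad \forall k \in \mathbb{N}. \tag{3.68} $$

We first choose a subsequence $\{k_l\}_{l \in \mathbb{N}}$ such that

$$ \liminf_{k \to +\infty} I_{\alpha_k}^{(2)}(P_k, W) = \lim_{l \to +\infty} I_{\alpha_{k_l}}^{(2)}(P_{k_l}, W). \tag{3.69} $$

Since $\mathcal{S}(\mathcal{H})$ is compact³, there exists a convergent subsubsequence $\{k_{l_m}\}_{m \in \mathbb{N}}$ such that $\lim_{m \to +\infty} \sigma_{k_{l_m}} = \sigma_0$ for some $\sigma_0 \in \mathcal{S}(\mathcal{H})$. Then, we have

\begin{align}
\liminf_{k \to +\infty} I_{\alpha_k}^{(2)}(P_k, W)
&= \lim_{m \to +\infty} D_{\alpha_{k_{l_m}}}\!\left( W \,\|\, \sigma_{k_{l_m}} \,|\, P_{k_{l_m}} \right) \tag{3.70} \\
&= \lim_{m \to +\infty} D_{\alpha_{k_{l_m}}}\!\left( W \,\|\, \sigma_{k_{l_m}} \,|\, P_0 \right) \tag{3.71} \\
&\quad + \lim_{m \to +\infty} \sum_{x \in \mathcal{X}} \left( P_{k_{l_m}}(x) - P_0(x) \right) D_{\alpha_{k_{l_m}}}\!\left( W_x \,\|\, \sigma_{k_{l_m}} \right) \tag{3.72} \\
&\ge \lim_{m \to +\infty} D_{\alpha_{k_{l_m}}}\!\left( W \,\|\, \sigma_{k_{l_m}} \,|\, P_0 \right) \tag{3.73} \\
&\ge D_{\alpha_0}(W \,\|\, \sigma_0 \,|\, P_0) \tag{3.74} \\
&\ge \min_{\sigma \in \mathcal{S}(\mathcal{H})} D_{\alpha_0}(W \,\|\, \sigma \,|\, P_0) \tag{3.75} \\
&= I_{\alpha_0}^{(2)}(P_0, W). \tag{3.76}
\end{align}

To see why inequality (3.73) holds, we observe that $\mathrm{supp}(P_0) \subseteq \mathrm{supp}(P_k)$ for all sufficiently large $k \in \mathbb{N}$. Further, the upper bound $I_\alpha^{(2)}(P, W) \le \log \min\{|\mathcal{X}|, d\}$ (item (a)) implies that

$$ D_{\alpha_k}(W_x \,\|\, \sigma_k) \le \frac{\log \min\{|\mathcal{X}|, d\}}{P_k(x)} $$

for all $x \in \mathrm{supp}(P_k)$. Hence, for $x \in \mathrm{supp}(P_0)$ and for all sufficiently large $m \in \mathbb{N}$, one has $P_{k_{l_m}}(x) \to P_0(x)$ and $D_{\alpha_{k_{l_m}}}(W_x \,\|\, \sigma_{k_{l_m}})$ is bounded away from $+\infty$. On the other hand, $P_{k_{l_m}}(x) - P_0(x) \ge 0$ for $x \notin \mathrm{supp}(P_0)$ and all sufficiently large $m \in \mathbb{N}$. In order to establish (3.74), we used the lower semi-continuity of the map $(\alpha, \sigma) \mapsto D_\alpha(W_x \,\|\, \sigma)$ for all $x \in \mathrm{supp}(P_0)$ in Eq. (3.17) in Lemma 3.2.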

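Next, we let

$$ \widetilde{\sigma}_k := (1 - \varepsilon_k)\, \sigma_{\alpha_0, P_0} + \varepsilon_k \frac{\mathbb{1}}{d}, \quad \forall k \in \mathbb{N}; \tag{3.77} $$

$$ \varepsilon_k := \frac{\| P_k - P_0 \|_1}{2}. \tag{3.78} $$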

³ Again, the compactness is with respect to the trace norm topology; we pass to the operator norm topology by the finite dimension of the Hilbert space.

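The definition of $I_\alpha^{(2)}$ yields

[Eqs. (3.79)-(3.81) are not recoverable from the source]

where equality (3.80) follows from the definition of $I_\alpha^{(2)}$. Inequality (3.81) is due to the subadditivity of superior limits. Then, the convexity of $\sigma \mapsto D_{\alpha_k}(W \,\|\, \sigma \,|\, P)$ implies that

[Eqs. (3.82)-(3.85) are not recoverable from the source]

where the last line holds because of the continuity of $\alpha \mapsto D_\alpha(\cdot \,\|\, \cdot)$ on $[0, 1]$ [58, Corollary III.13] and the finiteness of $D_{\alpha_k}(W \,\|\, \mathbb{1}/d \,|\, P_0)$ for all $k \in \mathbb{N}$.

It remains to show that the second term in Eq. (3.81) is actually zero. Direct calculation shows that

[Eqs. (3.86)-(3.89) are not recoverable from the source]

where Eq. (3.86) follows from the dominance of the α-Rényi divergence [8, Section 4]; equality (3.88) follows from the finiteness of $D_\alpha(W_x \,\|\, \mathbb{1}/d)$ for all $x \in \mathcal{X}$ and $\alpha \in [0, 1]$; and in the last equality (3.89) we use the convention $\lim_{\varepsilon_k \downarrow 0} \varepsilon_k \log \varepsilon_k = 0$. Hence, item (b) is proved.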

(3.2)-(c) For $\alpha = 1$, it is well known (see e.g. [101]) that $\sigma_{1, P} = PW$. Using the fact that $PW \gg W_x$ for all $x \in \mathrm{supp}(P)$, the statements are trivial.

We fix an arbitrary $(\alpha, P) \in (0, 1) \times \mathcal{P}(\mathcal{X})$ subsequently. Without loss of generality, we may further assume

$$ \bigcup_{x \in \mathrm{supp}(P)} \mathrm{supp}(W_x) = \mathbb{1}_{\mathcal{H}}, \tag{3.90} $$

and hence $PW$ has full support. We first show that the minimizer $\sigma_{\alpha, P}$ has full support too. Second, we prove the fixed-point property Eq. (3.66). Finally, we establish the uniqueness of $\sigma_{\alpha, P}$. We remark that the uniqueness has been proven by Dalai and Winter [39, Appendix C]. Here, we provide an alternative proof for completeness. Our approach closely follows Hayashi and Tomamichel [104, Appendix C].

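Define

$$ \mathcal{M}_\alpha(\mathcal{H}) := \operatorname*{arg\,min}_{\sigma \in \mathcal{S}(\mathcal{H})} D_\alpha(W \,\|\, \sigma \,|\, P) = \operatorname*{arg\,max}_{\sigma \in \mathcal{S}(\mathcal{H})} g_\alpha(\sigma) = \operatorname*{arg\,max}_{\sigma \in \mathcal{S}_{P, W}(\mathcal{H})} g_\alpha(\sigma), \tag{3.91} $$

where

$$ g_\alpha(\sigma) := \sum_{x \in \mathcal{X}} P(x) \log \mathrm{Tr}\!\left[ W_x^{\alpha} \sigma^{1-\alpha} \right]. \tag{3.92} $$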

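To show that the optimizer of $g_\alpha(\cdot)$ has full support, we observe that on the boundary of $\mathcal{S}(\mathcal{H})$, where at least one eigenvalue is zero, the directional derivative in a direction that increases the rank diverges to positive infinity. Namely, it suffices to show

$$ \lim_{t \to 0} \frac{g_\alpha\!\left( (1 - t)\sigma + t\sigma^\perp \right) - g_\alpha(\sigma)}{t} = +\infty, \tag{3.93} $$

where $\sigma \in \mathcal{S}_{P, W}(\mathcal{H})$ is some singular density operator, and $\sigma^\perp := \frac{\mathbb{1}_{\mathcal{H}} - \sigma^0}{\mathrm{Tr}[\mathbb{1}_{\mathcal{H}} - \sigma^0]}$, with $\sigma^0$ denoting the projection onto the support of $\sigma$. For $x \in \mathrm{supp}(P)$ with $W_x \ll \sigma$, we have $W_x \perp \sigma^\perp$. It is not hard to see that

\begin{align}
&\lim_{t \to 0} P(x)\, \frac{\log \mathrm{Tr}\!\left[ W_x^{\alpha} \left( (1 - t)\sigma + t\sigma^\perp \right)^{1-\alpha} \right] - \log \mathrm{Tr}\!\left[ W_x^{\alpha} \sigma^{1-\alpha} \right]}{t} \tag{3.94} \\
&= \lim_{t \to 0} P(x)\, \frac{\log \mathrm{Tr}\!\left[ W_x^{\alpha} \left( (1 - t)^{1-\alpha} \sigma^{1-\alpha} + t^{1-\alpha} (\sigma^\perp)^{1-\alpha} \right) \right] - \log \mathrm{Tr}\!\left[ W_x^{\alpha} \sigma^{1-\alpha} \right]}{t} \tag{3.95} \\
&= \lim_{t \to 0} P(x)\, \frac{(1 - \alpha) \log(1 - t)}{t} \tag{3.96} \\
&= \lim_{t \to 0} P(x)\, \frac{-(1 - \alpha)}{1 - t} \tag{3.97} \\
&= -P(x)(1 - \alpha) \tag{3.98} \\
&> -\infty, \tag{3.99}
\end{align}

where Eq. (3.95) holds because $\sigma \perp \sigma^\perp$; Eq. (3.96) is due to $W_x \perp \sigma^\perp$; and Eq. (3.97) is owing to L'Hôpital's rule.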

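On the other hand, since $\sigma$ is singular, there must be some $x \in \mathrm{supp}(P)$ such that $W_x \not\ll \sigma$. Hence, by denoting $c := \frac{\mathrm{Tr}[W_x^{\alpha} (\sigma^\perp)^{1-\alpha}]}{\mathrm{Tr}[W_x^{\alpha} \sigma^{1-\alpha}]} > 0$, Eq. (3.95) leads to

\begin{align}
&\lim_{t \to 0} P(x)\, \frac{\log\!\left( (1 - t)^{1-\alpha} + t^{1-\alpha} c \right)}{t} \tag{3.100} \\
&= \lim_{t \to 0} P(x)\, \frac{-(1 - \alpha)(1 - t)^{-\alpha} + (1 - \alpha)\, t^{-\alpha} c}{(1 - t)^{1-\alpha} + t^{1-\alpha} c} \tag{3.101} \\
&= +\infty, \tag{3.102}
\end{align}

where Eq. (3.101) is by L'Hôpital's rule again. Combining Eqs. (3.99) and (3.102) concludes Eq. (3.93).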

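Next, we show the fixed-point property: $\mathcal{M}_\alpha(\mathcal{H}) = \mathcal{F}_\alpha(\mathcal{H})$, where $\mathcal{F}_\alpha(\mathcal{H}) := \{ \sigma \in \mathcal{S}_{>0}(\mathcal{H}) : T_{\alpha, P}(\sigma) = \sigma \}$ denotes the set of fixed points of the map $T_{\alpha, P} : \mathcal{S}_{P, W}(\mathcal{H}) \to \mathcal{S}(\mathcal{H})$. A necessary and sufficient condition for $\sigma$ to be an optimizer is

$$ \partial_\omega g_\alpha(\sigma) := \mathrm{D} g_\alpha(\sigma)[\omega - \sigma] = 0, \tag{3.103} $$

for all $\omega \in \mathcal{S}(\mathcal{H})$, where $\mathrm{D} g_\alpha(\sigma)$ denotes the Fréchet derivative of the map $g_\alpha$ (see e.g. [104, Appendix C]). Using the chain rule of Fréchet derivatives, it follows that

\begin{align}
\partial_\omega g_\alpha(\sigma)
&= \mathrm{Tr}\!\left[ \sum_{x \in \mathcal{X}} P(x)\, \frac{W_x^{\alpha}}{\mathrm{Tr}\!\left[ W_x^{\alpha} \sigma^{1-\alpha} \right]}\, \partial_\omega \sigma^{1-\alpha} \right] \tag{3.104} \\
&= \mathrm{Tr}\!\left[ \sum_{x \in \mathcal{X}} P(x)\, \frac{\sigma^{-\frac{\alpha}{2}} W_x^{\alpha} \sigma^{-\frac{\alpha}{2}}}{\mathrm{Tr}\!\left[ W_x^{\alpha} \sigma^{1-\alpha} \right]}\, \sigma^{\frac{\alpha}{2}} \left( \partial_\omega \sigma^{1-\alpha} \right) \sigma^{\frac{\alpha}{2}} \right]. \tag{3.105}
\end{align}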

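We claim that the operators

$$ \left\{ \Delta_\omega = \sigma^{\frac{\alpha}{2}} \left( \partial_\omega \sigma^{1-\alpha} \right) \sigma^{\frac{\alpha}{2}} : \omega \in \mathcal{S}(\mathcal{H}) \right\} \tag{3.106} $$

span the space of traceless Hermitian operators on $\mathcal{H}$. Let $\sigma = \sum_i \lambda_i |i\rangle\langle i|$ with $\lambda_i > 0$ be the eigenvalue decomposition. One can verify [82, Theorem 3.25] that

$$ \langle i | \Delta_\omega | j \rangle =
\begin{cases}
(\lambda_i \lambda_j)^{\frac{\alpha}{2}}\, \dfrac{\lambda_i^{1-\alpha} - \lambda_j^{1-\alpha}}{\lambda_i - \lambda_j}\, \langle i | \omega - \sigma | j \rangle, & \text{if } \lambda_i \ne \lambda_j, \\[2ex]
(1 - \alpha)\, \langle i | \omega - \sigma | j \rangle, & \text{if } \lambda_i = \lambda_j.
\end{cases} \tag{3.107} $$

Therefore, $\Delta_\omega$ is Hermitian and $\mathrm{Tr}[\Delta_\omega] = 0$ for all $\omega \in \mathcal{S}(\mathcal{H})$. Moreover, a basis of the traceless Hermitian operators is given by the operators

$$ \left\{ \Gamma_{ij} = |i\rangle\langle j| + |j\rangle\langle i|, \quad \Gamma'_{ij} = \mathrm{i} \left( |i\rangle\langle j| - |j\rangle\langle i| \right), \quad \Gamma''_{ij} = |i\rangle\langle i| - |j\rangle\langle j| \right\}_{i \ne j}. \tag{3.108} $$

For every tuple $(i, j)$ with $i \ne j$ there exists an $\varepsilon > 0$ such that the state $\omega = \sigma + \varepsilon \Gamma_{ij}$ is still in $\mathcal{S}(\mathcal{H})$. For this state, we find that $\Delta_\omega = \eta \Gamma_{ij}$ for some real $\eta > 0$. A similar argument applies to $\Gamma'_{ij}$ and $\Gamma''_{ij}$. Hence, we have verified that the operators $\{\Delta_\omega\}_{\omega \in \mathcal{S}(\mathcal{H})}$ span the space of traceless Hermitian operators.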

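Armed with the above discussion, the condition that $\partial_\omega g_\alpha(\sigma) = 0$ for all $\omega \in \mathcal{S}(\mathcal{H})$ is equivalent to the condition that the operator

$$ \sum_{x \in \mathcal{X}} P(x)\, \frac{\sigma^{-\frac{\alpha}{2}} W_x^{\alpha} \sigma^{-\frac{\alpha}{2}}}{\mathrm{Tr}\!\left[ W_x^{\alpha} \sigma^{1-\alpha} \right]} \tag{3.109} $$

must be proportional to the identity. Thus, the optimum must be a fixed point of the map $T_{\alpha, P}(\cdot)$.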

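Lastly, to prove the uniqueness of the optimizer, it remains to show $\partial_\omega^2 g_\alpha(\sigma) := \mathrm{D}^2 g_\alpha(\sigma)[\omega - \sigma, \omega - \sigma] < 0$ for all $\omega \ne \sigma$ and $\sigma > 0$. Continuing from Eq. (3.104), we have

\begin{align}
\partial_\omega^2 g_\alpha(\sigma)
&= -\sum_{x \in \mathcal{X}} P(x)\, \frac{\left( \mathrm{Tr}\!\left[ W_x^{\alpha}\, \partial_\omega \sigma^{1-\alpha} \right] \right)^2}{\mathrm{Tr}^2\!\left[ W_x^{\alpha} \sigma^{1-\alpha} \right]}
+ \mathrm{Tr}\!\left[ \sum_{x \in \mathcal{X}} P(x)\, \frac{W_x^{\alpha}}{\mathrm{Tr}\!\left[ W_x^{\alpha} \sigma^{1-\alpha} \right]}\, \partial_\omega^2 \sigma^{1-\alpha} \right] \tag{3.110} \\
&< \mathrm{Tr}\!\left[ \sum_{x \in \mathcal{X}} P(x)\, \frac{W_x^{\alpha}}{\mathrm{Tr}\!\left[ W_x^{\alpha} \sigma^{1-\alpha} \right]}\, \partial_\omega^2 \sigma^{1-\alpha} \right], \tag{3.111}
\end{align}

where Eq. (3.111) holds by noting that $\partial_\omega \sigma^{1-\alpha} \ne 0$ for all $\omega \ne \sigma$. Further, $\partial_\omega^2 \sigma^{1-\alpha} \le 0$ since $u \mapsto u^{1-\alpha}$ is operator concave. Thus, $\partial_\omega^2 g_\alpha(\sigma) < 0$, and item (c) is proved.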

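(3.2)-(d) We follow the notation in the proof of item (b). However, we restrict $(\alpha_k, P_k)_{k \in \mathbb{N}}$ and $(\alpha_0, P_0)$ to be in the set $(0, 1] \times \mathcal{P}(\mathcal{X})$. The continuity of $(\alpha, P) \mapsto I_\alpha^{(2)}(P, W)$ in item (b) and Eq. (3.74) thus imply

$$ \lim_{k \to +\infty} I_{\alpha_k}^{(2)}(P_k, W) = D_{\alpha_0}(W \,\|\, \sigma_0 \,|\, P_0) = I_{\alpha_0}^{(2)}(P_0, W) = D_{\alpha_0}(W \,\|\, \sigma_{\alpha_0, P_0} \,|\, P_0). \tag{3.112} $$

Then, the uniqueness of the minimizer $\sigma_{\alpha, P}$ in item (c) guarantees that $\sigma_0 = \sigma_{\alpha_0, P_0}$. Hence,

$$ \lim_{k \to +\infty} \sigma_{\alpha_k, P_k} = \sigma_0 = \sigma_{\alpha_0, P_0}, \tag{3.113} $$

which proves item (d).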

(3.2)-(e) Berge's maximum theorem [109, Section IV.3], [110, Lemma 3.1] shows that the continuous map $(\alpha, P) \mapsto I_\alpha^{(2)}(P, W)$, maximized over the compact set $\mathcal{P}(\mathcal{X})$, is still continuous in $\alpha \in [0, 1]$.

Quantum Hypothesis Testing

The goal of this chapter is to provide an introduction to quantum hypothesis testing. In Parts II and III later, our finite blocklength bounds heavily rely on the results in this chapter. In Sections 4.1 and 4.2 below, we present the error exponent analysis, while the moderate deviation analysis is given in Section 4.3.

Binary quantum hypothesis testing consists of a null hypothesis and an alternative hypothesis. The null hypothesis and the alternative hypothesis are described by the quantum states ρ ∈ S(H) and σ ∈ S(H), respectively. Given any test 0 ≤ Q ≤ 1 that determines the outcome to be the null hypothesis ρ, the type-I error and type-II error of the hypothesis testing are defined as follows:

α (Q; ρ) := Tr [(1 − Q)ρ] , (4.1)

β (Q; σ) := Tr [Qσ] . (4.2)

Unless $\rho \perp \sigma$, one cannot make both the type-I and type-II errors arbitrarily small given the above definitions. Thus, we define the minimum type-I error when the type-II error is below $\mu \in (0, 1)$ as

$$ \widehat{\alpha}_\mu(\rho \| \sigma) := \min_{0 \le Q \le \mathbb{1}} \left\{ \alpha(Q; \rho) : \beta(Q; \sigma) \le \mu \right\}. \tag{4.3} $$

The following famous quantum Stein's lemma characterizes the trade-off relation between these two errors. That is, the quantum relative entropy $D(\rho \| \sigma)$ serves as a benchmark to determine the asymptotic behavior of the optimal type-I error.

Theorem 4.1 (Quantum Stein's Lemma [95], [57], [86]). Given binary hypotheses $H_0 : \rho$ and $H_1 : \sigma$, one has

$$ \lim_{n \to +\infty} \widehat{\alpha}_{\exp\{-nr\}}\!\left( \rho^{\otimes n} \,\|\, \sigma^{\otimes n} \right) =
\begin{cases}
0, & r < D(\rho \| \sigma), \\
1, & r > D(\rho \| \sigma).
\end{cases} \tag{4.4} $$

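For an n-shot independent extension of the binary hypothesis:

$$ H_0 : \rho^n = \rho_1 \otimes \rho_2 \otimes \cdots \otimes \rho_n, \tag{4.5} $$

$$ H_1 : \sigma^n = \sigma_1 \otimes \sigma_2 \otimes \cdots \otimes \sigma_n, \tag{4.6} $$

we define an error exponent function [86] by

$$ \phi_n(r \,|\, \rho^n \| \sigma^n) := \sup_{\alpha \in (0, 1]} \frac{\alpha - 1}{\alpha} \left( r - \frac{1}{n} D_\alpha(\rho^n \| \sigma^n) \right), \quad r \ge 0. \tag{4.7} $$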

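For the case $\rho^n \ll \sigma^n$, it is known that [86, Lemma 4]

$$ \phi_n(r \,|\, \rho^n \| \sigma^n) =
\begin{cases}
+\infty, & r \in \left[ 0, \, -\frac{1}{n} \log \mathrm{Tr}\!\left[ (\rho^n)^0 \sigma^n \right] \right), \\[1ex]
-\log \mathrm{Tr}\!\left[ \rho^n (\sigma^n)^0 \right], & r \ge \frac{1}{n} D(\rho^n \| \sigma^n).
\end{cases} \tag{4.8} $$

In the following Sections 4.1 and 4.2, we show that the exponent function $\phi_n$ determines how fast the optimal type-I error decays exponentially, i.e.

$$ \lim_{n \to +\infty} -\frac{1}{n} \log \widehat{\alpha}_{\exp\{-nr\}}\!\left( \rho^{\otimes n} \,\|\, \sigma^{\otimes n} \right) = \phi_1(r \,|\, \rho \| \sigma) = \sup_{0 < \alpha \le 1} \frac{1 - \alpha}{\alpha} \left( D_\alpha(\rho \| \sigma) - r \right). \tag{4.9} $$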
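The exponent in Eq. (4.9) can be approximated numerically by a grid search over α. The following is a minimal sketch, assuming that $D_\alpha$ is the Petz Rényi divergence $\frac{1}{\alpha - 1} \log \mathrm{Tr}[\rho^\alpha \sigma^{1-\alpha}]$ and that both states have full rank. The grid resolution, function names, and example states are arbitrary illustrative choices.

```python
import numpy as np

def psd_power(A, p, tol=1e-12):
    """A**p for a positive semi-definite matrix, taken on its support."""
    w, V = np.linalg.eigh(A)
    wp = np.zeros_like(w)
    mask = w > tol
    wp[mask] = w[mask] ** p
    return (V * wp) @ V.conj().T

def logm_psd(A, tol=1e-12):
    """Logarithm of a positive semi-definite matrix, taken on its support."""
    w, V = np.linalg.eigh(A)
    lw = np.zeros_like(w)
    mask = w > tol
    lw[mask] = np.log(w[mask])
    return (V * lw) @ V.conj().T

def relative_entropy(rho, sigma):
    """D(rho||sigma), assuming supp(rho) lies in supp(sigma)."""
    return float(np.real(np.trace(rho @ (logm_psd(rho) - logm_psd(sigma)))))

def petz_renyi(rho, sigma, alpha):
    """Petz Renyi divergence of order alpha in (0, 1)."""
    q = np.real(np.trace(psd_power(rho, alpha) @ psd_power(sigma, 1.0 - alpha)))
    return float(np.log(q) / (alpha - 1.0))

def error_exponent(r, rho, sigma, num=999):
    """Grid approximation of phi_1(r | rho || sigma) from Eq. (4.9)."""
    grid = np.linspace(0.001, 0.999, num)
    vals = [(1.0 - a) / a * (petz_renyi(rho, sigma, a) - r) for a in grid]
    # The alpha = 1 term of the supremum equals 0, so it is added by hand.
    return max(0.0, max(vals))

# Hypothetical example: two full-rank qubit states.
rho = np.array([[0.8, 0.1], [0.1, 0.2]])
sigma = np.array([[0.4, -0.1], [-0.1, 0.6]])
D = relative_entropy(rho, sigma)
for r in (0.1 * D, 0.5 * D, D + 0.1):
    print(r, error_exponent(r, rho, sigma))
```

In this sketch the exponent is positive for rates below $D(\rho \| \sigma)$ and drops to zero once $r \ge D(\rho \| \sigma)$, in line with Theorem 4.1. Because a grid is used, the returned value is a lower bound on the true supremum.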

4.1 Achievability

Quantum Stein's lemma, given in Theorem 4.1, states that if the exponential decay rate of the type-II error is smaller than the relative entropy, i.e. $r < D(\rho \| \sigma)$, then the optimal type-I error vanishes asymptotically. The quantum Hoeffding bound goes a step further and investigates the non-asymptotic question: how fast does the optimal type-I error decay? The achievability bound gives an exponential upper bound for it. This result was first proved by Hayashi [88], and the upper bound can be expressed in terms of Petz's Rényi divergence. Together with the converse bound, discussed in Section 4.2 later, the error exponent for the optimal type-I error in quantum hypothesis testing was solved; see Eq. (4.9).

For the convenience of readers, we provide the proof of the achievability in Theorem 4.2 below.

Theorem 4.2 (Achievability Hoeffding Bound [88], [86, Section 5.5]). Given binary hypotheses $H_0 : \rho$ and $H_1 : \sigma$, and a rate $r < D(\rho \| \sigma)$, one has

$$ -\frac{1}{n} \log \widehat{\alpha}_{\exp\{-nr\}}\!\left( \rho^{\otimes n} \,\|\, \sigma^{\otimes n} \right) \ge \phi_1(r \,|\, \rho \| \sigma), \tag{4.10} $$

where $\phi_n$ is defined in Eq. (4.7).

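Proof of Theorem 4.2. Fix an $n \in \mathbb{N}$ and $\alpha \in (0, 1)$, and let

$$ A = e^{-nx} \sigma^{\otimes n}, \tag{4.11} $$

$$ B = \rho^{\otimes n}, \tag{4.12} $$

where $x$ will be determined later. Consider a sequence of tests $\{(\mathbb{1} - Q_n, Q_n)\}$ with $Q_n := \{B - A \ge 0\}$. Then, Lemma 2.8 gives that

\begin{align}
\beta(Q_n; \sigma^{\otimes n}) &= \mathrm{Tr}\!\left[ Q_n \sigma^{\otimes n} \right] \tag{4.13} \\
&= e^{nx}\, \mathrm{Tr}[Q_n A] \tag{4.14} \\
&\le e^{nx\alpha}\, Q_\alpha\!\left( \rho^{\otimes n} \,\|\, \sigma^{\otimes n} \right) \tag{4.15} \\
&= e^{nx\alpha}\, Q_\alpha(\rho \| \sigma)^n, \tag{4.16}
\end{align}

\begin{align}
\alpha(Q_n; \rho^{\otimes n}) &= \mathrm{Tr}\!\left[ (\mathbb{1} - Q_n) \rho^{\otimes n} \right] \tag{4.17} \\
&= \mathrm{Tr}[(\mathbb{1} - Q_n) B] \tag{4.18} \\
&\le e^{-nx(1-\alpha)}\, Q_\alpha\!\left( \rho^{\otimes n} \,\|\, \sigma^{\otimes n} \right) \tag{4.19} \\
&= e^{-nx(1-\alpha)}\, Q_\alpha(\rho \| \sigma)^n. \tag{4.20}
\end{align}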

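Now, choose $x$ such that $x\alpha + \log Q_\alpha(\rho \| \sigma) = -r$, so that Eq. (4.16) gives

$$ \beta(Q_n; \sigma^{\otimes n}) \le \exp\{-nr\}. \tag{4.21} $$

Further, with this choice of $x$ and recalling that $D_\alpha(\rho \| \sigma) = \frac{1}{\alpha - 1} \log Q_\alpha(\rho \| \sigma)$, the exponent in Eq. (4.20) becomes $x(1 - \alpha) - \log Q_\alpha(\rho \| \sigma) = \frac{1 - \alpha}{\alpha} \left( D_\alpha(\rho \| \sigma) - r \right)$. Since $\alpha \in (0, 1)$ is arbitrary, optimizing over $\alpha$ shows that

$$ \alpha(Q_n; \rho^{\otimes n}) \le \exp\{-n \phi_1(r \,|\, \rho \| \sigma)\}. \tag{4.22} $$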
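The construction in the proof can be checked numerically for small $n$. The sketch below builds the test $Q_n = \{B - A \ge 0\}$ as the projector onto the non-negative eigenspace of $B - A$, with $x$ chosen as in the proof, and compares both errors with the bounds (4.16) and (4.20) for one fixed α. It assumes $Q_\alpha(\rho \| \sigma) = \mathrm{Tr}[\rho^\alpha \sigma^{1-\alpha}]$, which this section does not restate; the function names and the example states are hypothetical.

```python
import numpy as np
from functools import reduce

def psd_power(A, p, tol=1e-12):
    """A**p for a positive semi-definite matrix, taken on its support."""
    w, V = np.linalg.eigh(A)
    wp = np.zeros_like(w)
    mask = w > tol
    wp[mask] = w[mask] ** p
    return (V * wp) @ V.conj().T

def tensor_power(A, n):
    """n-fold tensor (Kronecker) power of A."""
    return reduce(np.kron, [A] * n)

def hoeffding_test_errors(rho, sigma, n, r, alpha):
    """Type-I and type-II errors of Q_n = {B - A >= 0}, plus the exponent bound."""
    Q_alpha = float(np.real(np.trace(psd_power(rho, alpha) @ psd_power(sigma, 1.0 - alpha))))
    x = (-r - np.log(Q_alpha)) / alpha          # solves x*alpha + log Q_alpha = -r
    sigma_n = tensor_power(sigma, n)
    A = np.exp(-n * x) * sigma_n
    B = tensor_power(rho, n)
    w, V = np.linalg.eigh(B - A)
    V_pos = V[:, w >= 0]                        # eigenvectors with non-negative eigenvalue
    Q_n = V_pos @ V_pos.conj().T
    type_I = float(np.real(np.trace((np.eye(len(B)) - Q_n) @ B)))
    type_II = float(np.real(np.trace(Q_n @ sigma_n)))
    D_alpha = np.log(Q_alpha) / (alpha - 1.0)
    exponent = (1.0 - alpha) / alpha * (D_alpha - r)
    return type_I, type_II, exponent

# Hypothetical example: two full-rank qubit states, rate r, fixed alpha.
rho = np.array([[0.8, 0.1], [0.1, 0.2]])
sigma = np.array([[0.4, -0.1], [-0.1, 0.6]])
r, alpha = 0.05, 0.5
for n in (1, 2, 3):
    type_I, type_II, exponent = hoeffding_test_errors(rho, sigma, n, r, alpha)
    print(n, type_I, np.exp(-n * exponent), type_II, np.exp(-n * r))
```

For each $n$ the type-II error should stay below $e^{-nr}$ and the type-I error below $\exp\{-n \frac{1-\alpha}{\alpha}(D_\alpha(\rho \| \sigma) - r)\}$. Scanning α over $(0, 1)$ and keeping the best exponent would recover the bound of Theorem 4.2.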
