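The mutual information of a c-q channel $W:\mathcal{X}\to\mathcal{S}(\mathcal{H})$ with a prior distribution $P\in\mathcal{P}(\mathcal{X})$ is defined by
\[
I(P,W) := D\left(P\circ W\,\|\,P\otimes PW\right) = D\left(W\,\|\,PW\,|\,P\right), \tag{3.53}
\]
where $P\circ W := \sum_{x\in\mathcal{X}} P(x)\,|x\rangle\langle x|\otimes W_x$ and $PW := \sum_{x\in\mathcal{X}} P(x)\,W_x$. The information radius, or information capacity$^{2}$, of $W:\mathcal{X}\to\mathcal{S}(\mathcal{H})$ is
\[
C_W := \sup_{P\in\mathcal{P}(\mathcal{X})} I(P,W). \tag{3.54}
\]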
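As a numerical illustration of Eq. (3.53), the following minimal sketch evaluates $I(P,W)$ through the standard identity $I(P,W) = S(PW) - \sum_x P(x)\,S(W_x)$, where $S$ is the von Neumann entropy. The helper names `entropy` and `mutual_information`, the use of natural logarithms, and the two example channels are assumptions made for illustration only.

```python
import numpy as np

def entropy(rho, tol=1e-12):
    """Von Neumann entropy in nats, with the convention 0 log 0 = 0."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > tol]
    return float(-np.sum(evals * np.log(evals)))

def mutual_information(P, W):
    """I(P, W) = S(PW) - sum_x P(x) S(W_x) for a c-q channel given as a list of states."""
    PW = sum(p * Wx for p, Wx in zip(P, W))          # average output state PW
    return entropy(PW) - sum(p * entropy(Wx) for p, Wx in zip(P, W))

# Hypothetical example 1: two orthogonal pure qubit states, uniform prior.
W_orth = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
I_orth = mutual_information([0.5, 0.5], W_orth)

# Hypothetical example 2: a constant channel carries no information.
W_const = [np.diag([0.7, 0.3]), np.diag([0.7, 0.3])]
I_const = mutual_information([0.5, 0.5], W_const)

print(I_orth, I_const)
```

In the first example the outputs are perfectly distinguishable, so the value equals $\log 2$; in the second the output does not depend on the input, so the value vanishes.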
The conditional information variance and the unconditional information variance of $W:\mathcal{X}\to\mathcal{S}(\mathcal{H})$ with a prior distribution $P\in\mathcal{P}(\mathcal{X})$ are defined, respectively, by
\[
V(P,W) := V\left(W\,\|\,PW\,|\,P\right), \qquad
U(P,W) := V\left(P\circ W\,\|\,P\otimes PW\right). \tag{3.55}
\]
Note that $V(P^\star,W) = U(P^\star,W)$ for every capacity-achieving distribution $P^\star\in\mathcal{P}(\mathcal{X})$, i.e. $I(P^\star,W) = C_W$; this can be easily verified by an argument similar to [12, Lemma 62]. We also define the unconditional information variance in terms of $\widetilde{V}(\rho\|\sigma)$:
\[
\widetilde{V}(P,W) := \widetilde{V}\left(W\,\|\,PW\,|\,P\right). \tag{3.56}
\]
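The minimal peripheral information variance and its variant are defined by
\[
V_W := \inf_{P\in\mathcal{P}(\mathcal{X}):\, I(P,W)=C_W} V(P,W), \tag{3.57}
\]
\[
\widetilde{V}_W := \inf_{P\in\mathcal{P}(\mathcal{X}):\, I(P,W)=C_W} \widetilde{V}(P,W). \tag{3.58}
\]
Furthermore, one can easily verify that
\[
V_W > 0 \ \text{ implies } \ C_W > 0. \tag{3.59}
\]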
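In the following, we define two related information quantities: for every $\alpha\in[0,1]$,
\[
I_\alpha^{(1)}(P,W) := \inf_{\sigma\in\mathcal{S}(\mathcal{H})} D_\alpha\left(P\circ W\,\|\,P\otimes\sigma\right); \tag{3.60}
\]
\[
I_\alpha^{(2)}(P,W) := \inf_{\sigma\in\mathcal{S}(\mathcal{H})} D_\alpha\left(W\,\|\,\sigma\,|\,P\right). \tag{3.61}
\]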
The term $I_\alpha^{(1)}(P,W)$ is called the $\alpha$-Rényi mutual information [104, 64, 58, 105] or the generalized Holevo quantity. The second term $I_\alpha^{(2)}(P,W)$ can be viewed as a variant of the $\alpha$-Rényi mutual information, called the $\alpha$-Augustin mutual information [106, 107]. It can be verified that these two functions are related by Jensen's inequality:
\[
I_\alpha^{(1)}(P,W) \le I_\alpha^{(2)}(P,W). \tag{3.62}
\]
For $\alpha = 1$, they both equal the conventional mutual information, i.e. $I_1^{(1)}(P,W) = I_1^{(2)}(P,W) = I(P,W)$. Mosonyi and Ogawa [58, Proposition IV.2] showed that for all $\alpha\in[0,1]$,
\[
C_{\alpha,W} := \sup_{P\in\mathcal{P}(\mathcal{X})} I_\alpha^{(1)}(P,W) = \sup_{P\in\mathcal{P}(\mathcal{X})} I_\alpha^{(2)}(P,W), \tag{3.63}
\]
and it is termed the Rényi radius or the Rényi capacity of order $\alpha$. Moreover, Proposition 3.2 below and the compactness of $\mathcal{P}(\mathcal{X})$ show that the suprema in Eq. (3.63) can be replaced with maxima.
We note that $I_\alpha^{(1)}$ admits a closed form for $\alpha\in(0,1]$ due to the quantum Sibson's identity below.
The minimizer in Eq. (3.61) will be studied in Proposition 3.2.

$^{2}$ We note that $C_W$ equals the capacity of classical communication over quantum channels [101, 102, 103]. It is usually termed the classical capacity [50], though it is a quantity in quantum information processing.
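Lemma 3.3 (Quantum Sibson's Identity [108]). Fix an $\alpha\in(0,1]$. Let $\rho_{AB}\in\mathcal{S}(AB)$ and let $\sigma_B^\star$ be the minimizer of $\min_{\sigma_B\in\mathcal{S}(B)} D_\alpha(\rho_{AB}\,\|\,\rho_A\otimes\sigma_B)$. Then, one has
\[
\sigma_B^\star = \frac{\left(\operatorname{Tr}_A\left[\rho_{AB}^{\alpha}\right]\right)^{1/\alpha}}{\operatorname{Tr}\left[\left(\operatorname{Tr}_A\left[\rho_{AB}^{\alpha}\right]\right)^{1/\alpha}\right]}. \tag{3.64}
\]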
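The following proposition presents important properties of the $\alpha$-Rényi mutual information and radius.
Proposition 3.2 (Properties of $\alpha$-Mutual Information and Radius). Given any classical-quantum channel $W:\mathcal{X}\to\mathcal{S}(\mathcal{H})$, the following holds:
(a) For every $P\in\mathcal{P}(\mathcal{X})$, $\alpha\mapsto I_\alpha^{(2)}(P,W)$ is monotone increasing on $[0,1]$, and $I_\alpha^{(2)}(P,W)\le\log\min\{|\mathcal{X}|, d\}$ for all $\alpha\in[0,1]$.
(b) The map $(\alpha,P)\mapsto I_\alpha^{(2)}(P,W)$ is continuous on $[0,1]\times\mathcal{P}(\mathcal{X})$.
(c) For every $(\alpha,P)\in(0,1]\times\mathcal{P}(\mathcal{X})$, there exists a unique $\sigma_{\alpha,P}\in\mathcal{S}(\mathcal{H})$ such that
\[
I_\alpha^{(2)}(P,W) = D_\alpha\left(W\,\|\,\sigma_{\alpha,P}\,|\,P\right), \tag{3.65}
\]
and
\[
T_{\alpha,P}(\sigma) = \sigma \ \text{ and } \ \sigma \ll PW \quad \text{if and only if} \quad \sigma = \sigma_{\alpha,P}, \tag{3.66}
\]
where the map $T_{\alpha,P}:\mathcal{S}_{P,W}(\mathcal{H})\to\mathcal{S}(\mathcal{H})$ is defined as
\[
T_{\alpha,P}(\sigma) = \sum_{x\in\mathcal{X}} P(x)\,\frac{\sigma^{\frac{1-\alpha}{2}}\, W_x^{\alpha}\, \sigma^{\frac{1-\alpha}{2}}}{\operatorname{Tr}\left[W_x^{\alpha}\sigma^{1-\alpha}\right]}. \tag{3.67}
\]
(d) The map $(\alpha,P)\mapsto\sigma_{\alpha,P}$ is continuous on $(0,1]\times\mathcal{P}(\mathcal{X})$.
(e) The map $\alpha\mapsto C_{\alpha,W}$ is continuous and monotone increasing on $[0,1]$.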
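To make the map $T_{\alpha,P}$ of Eq. (3.67) concrete, the sketch below evaluates it for a small qubit channel and checks two elementary consequences of the definition: the output always has unit trace, and a constant channel $W_x\equiv\rho$ has $\rho$ as a fixed point. The function names `mpow` and `T_map` and the example states are hypothetical; nothing here is taken from the cited references, and no claim is made about the convergence of iterating the map.

```python
import numpy as np

def mpow(A, p, tol=1e-12):
    """Power of a positive semidefinite matrix, taken on its support only."""
    evals, vecs = np.linalg.eigh(A)
    safe = np.where(evals > tol, evals, 1.0)          # avoid 0 ** negative
    powered = np.where(evals > tol, safe ** p, 0.0)
    return (vecs * powered) @ vecs.conj().T

def T_map(sigma, P, W, alpha):
    """Evaluate T_{alpha,P}(sigma) as in Eq. (3.67)."""
    s = mpow(sigma, (1 - alpha) / 2)
    out = np.zeros(sigma.shape, dtype=complex)
    for p, Wx in zip(P, W):
        num = s @ mpow(Wx, alpha) @ s                 # sigma^{(1-a)/2} W_x^a sigma^{(1-a)/2}
        out += p * num / np.trace(num).real           # Tr[num] = Tr[W_x^a sigma^{1-a}]
    return out

alpha = 0.5
P = [0.3, 0.7]

# Hypothetical channel with outputs |0><0| and |+><+|.
W = [np.array([[1.0, 0.0], [0.0, 0.0]]), 0.5 * np.array([[1.0, 1.0], [1.0, 1.0]])]
image = T_map(np.eye(2) / 2, P, W, alpha)

# Constant channel: every output equals rho_const, so rho_const is a fixed point.
rho_const = np.diag([0.7, 0.3])
fixed = T_map(rho_const, P, [rho_const, rho_const], alpha)

print(np.trace(image).real, np.allclose(fixed, rho_const))
```

The input state must satisfy $\operatorname{Tr}[W_x^{\alpha}\sigma^{1-\alpha}] > 0$ for every $x$ in the support of $P$, which is why the sketch starts from full-rank states.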
Proof of Proposition 3.2.
(3.2)-(a) Recall the definition of $I_\alpha^{(2)}$ given in Eq. (3.61). The monotonicity follows immediately from Eq. (3.13) (see also [58, Lemma IV.5]) because the minimization over $\sigma\in\mathcal{S}(\mathcal{H})$ preserves the monotonicity. Hence, we have $I_\alpha^{(2)}(P,W)\le I_1^{(2)}(P,W) = I(P,W)\le\log\min\{|\mathcal{X}|, d\}$, where the last inequality follows from the well-known upper bound for the Holevo quantity (see e.g. [5, Chapter 12]).
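(3.2)-(b) Fix an arbitrary sequence $(\alpha_k,P_k)_{k\in\mathbb{N}}$ such that $\alpha_k\in[0,1]$, $P_k\in\mathcal{P}(\mathcal{X})$, and $\lim_{k\to+\infty}(\alpha_k,P_k) = (\alpha_0,P_0)\in[0,1]\times\mathcal{P}(\mathcal{X})$. Let
\[
\sigma_k := \sigma_{\alpha_k,P_k} \in \operatorname*{arg\,min}_{\sigma\in\mathcal{S}(\mathcal{H})} D_{\alpha_k}\left(W\,\|\,\sigma\,|\,P_k\right), \quad \forall k\in\mathbb{N}. \tag{3.68}
\]
We first choose a subsequence $\{k_l\}_{l\in\mathbb{N}}$ such that
\[
\liminf_{k\to+\infty} I_{\alpha_k}^{(2)}(P_k,W) = \lim_{l\to+\infty} I_{\alpha_{k_l}}^{(2)}(P_{k_l},W). \tag{3.69}
\]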
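Since $\mathcal{S}(\mathcal{H})$ is compact$^{3}$, there exists a convergent sub-subsequence $\{k_{l_m}\}_{m\in\mathbb{N}}$ such that $\lim_{m\to+\infty}\sigma_{k_{l_m}} = \sigma_0$ for some $\sigma_0\in\mathcal{S}(\mathcal{H})$. Then, we have
\begin{align}
\liminf_{k\to+\infty} I_{\alpha_k}^{(2)}(P_k,W)
&= \lim_{m\to+\infty} D_{\alpha_{k_{l_m}}}\left(W\,\|\,\sigma_{k_{l_m}}\,|\,P_{k_{l_m}}\right) \tag{3.70}\\
&= \lim_{m\to+\infty} D_{\alpha_{k_{l_m}}}\left(W\,\|\,\sigma_{k_{l_m}}\,|\,P_0\right) \tag{3.71}\\
&\quad + \lim_{m\to+\infty} \sum_{x\in\mathcal{X}} \left(P_{k_{l_m}}(x) - P_0(x)\right) D_{\alpha_{k_{l_m}}}\left(W_x\,\|\,\sigma_{k_{l_m}}\right) \tag{3.72}\\
&\ge \lim_{m\to+\infty} D_{\alpha_{k_{l_m}}}\left(W\,\|\,\sigma_{k_{l_m}}\,|\,P_0\right) \tag{3.73}\\
&\ge D_{\alpha_0}\left(W\,\|\,\sigma_0\,|\,P_0\right) \tag{3.74}\\
&\ge \min_{\sigma\in\mathcal{S}(\mathcal{H})} D_{\alpha_0}\left(W\,\|\,\sigma\,|\,P_0\right) \tag{3.75}\\
&= I_{\alpha_0}^{(2)}(P_0,W). \tag{3.76}
\end{align}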
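To see why inequality (3.73) holds, we observe that $\operatorname{supp}(P_0)\subseteq\operatorname{supp}(P_k)$ for all sufficiently large $k\in\mathbb{N}$. Further, the upper bound $I_\alpha^{(2)}(P,W)\le\log\min\{|\mathcal{X}|,d\}$ (item (a)) implies that $D_{\alpha_k}(W_x\|\sigma_k)\le\frac{\log\min\{|\mathcal{X}|,d\}}{P_k(x)}$ for all $x\in\operatorname{supp}(P_k)$. Hence, for $x\in\operatorname{supp}(P_0)$ and for all sufficiently large $m\in\mathbb{N}$, one has $P_{k_{l_m}}(x)\to P_0(x)$ and $D_{\alpha_{k_{l_m}}}(W_x\|\sigma_{k_{l_m}})$ is bounded away from $+\infty$. On the other hand, $P_{k_{l_m}}(x) - P_0(x)\ge 0$ for $x\notin\operatorname{supp}(P_0)$ and all sufficiently large $m\in\mathbb{N}$. In order to establish (3.74), we used the lower semi-continuity of the map $(\alpha,\sigma)\mapsto D_\alpha(W_x\|\sigma)$ for all $x\in\operatorname{supp}(P_0)$, given in Eq. (3.17) of Lemma 3.2.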
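Next, we let
\[
\widetilde{\sigma}_k := (1-\varepsilon_k)\,\sigma_{\alpha_0,P_0} + \varepsilon_k\,\frac{\mathbb{1}}{d}, \quad \forall k\in\mathbb{N}; \tag{3.77}
\]
\[
\varepsilon_k := \frac{\|P_k - P_0\|_1}{2}. \tag{3.78}
\]

$^{3}$ Again, the compactness is with respect to the trace-norm topology; we pass to the operator-norm topology by the finite dimension of the Hilbert space.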
The definition of $I_\alpha^{(2)}$ yields

[Displayed equations (3.79)–(3.81)]

where equality (3.80) follows from the definition of $I_\alpha^{(2)}$. Inequality (3.81) is due to the subadditivity of superior limits. Then, the convexity of $\sigma\mapsto D_{\alpha_k}(W\|\sigma|P)$ implies that

[Displayed equations (3.82)–(3.85), beginning with a $\limsup$]

where the last line holds because of the continuity of $\alpha\mapsto D_\alpha(\cdot\|\cdot)$ on $[0,1]$ [58, Corollary III.13] and the finiteness of $D_{\alpha_k}(W\,\|\,\mathbb{1}/d\,|\,P_0)$ for all $k\in\mathbb{N}$.
It remains to show that the second term in Eq. (3.81) is actually zero. Direct calculation shows that

[Displayed equations (3.86)–(3.89), beginning with a $\limsup$]

where Eq. (3.86) follows from the dominance of the $\alpha$-Rényi divergence [8, Section 4]; equality (3.88) follows from the finiteness of $D_\alpha(W_x\|\mathbb{1}/d)$ for all $x\in\mathcal{X}$ and $\alpha\in[0,1]$; and in the last equality (3.89) we use $\lim_{\varepsilon_k\downarrow 0}\varepsilon_k\log\varepsilon_k = 0$. Hence, item (b) is proved.
(3.2)-(c) For $\alpha = 1$, it is well known (see e.g. [101]) that $\sigma_{1,P} = PW$. Using the fact that $PW\gg W_x$ for all $x\in\operatorname{supp}(P)$, the statements are trivial.
We fix an arbitrary $(\alpha,P)\in(0,1)\times\mathcal{P}(\mathcal{X})$ subsequently. Without loss of generality, we may further assume
\[
\bigcup_{x\in\operatorname{supp}(P)} \operatorname{supp}(W_x) = \mathbb{1}_{\mathcal{H}}, \tag{3.90}
\]
and hence $PW$ has full support. We first show that the minimizer $\sigma_{\alpha,P}$ has full support too. Second, we prove the fixed-point property in Eq. (3.66). Finally, we establish the uniqueness of $\sigma_{\alpha,P}$. We remark that the uniqueness has been proven by Dalai and Winter [39, Appendix C]. Here, we provide an alternative proof for completeness. Our approach closely follows Hayashi and Tomamichel [104, Appendix C].
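Define
\[
\mathcal{M}_\alpha(\mathcal{H}) := \operatorname*{arg\,min}_{\sigma\in\mathcal{S}(\mathcal{H})} D_\alpha\left(W\,\|\,\sigma\,|\,P\right) = \operatorname*{arg\,max}_{\sigma\in\mathcal{S}(\mathcal{H})} g_\alpha(\sigma) = \operatorname*{arg\,max}_{\sigma\in\mathcal{S}_{P,W}(\mathcal{H})} g_\alpha(\sigma), \tag{3.91}
\]
where
\[
g_\alpha(\sigma) := \sum_{x\in\mathcal{X}} P(x)\log\operatorname{Tr}\left[W_x^{\alpha}\sigma^{1-\alpha}\right]. \tag{3.92}
\]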
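To show that the optimizer of $g_\alpha(\cdot)$ has full support, we observe that at a boundary point of $\mathcal{S}(\mathcal{H})$, where at least one eigenvalue is zero, the directional derivative in a direction that increases the rank diverges to $+\infty$. Namely, it suffices to show
\[
\lim_{t\to 0}\frac{g_\alpha\left((1-t)\sigma + t\sigma^{\perp}\right) - g_\alpha(\sigma)}{t} = +\infty, \tag{3.93}
\]
where $\sigma\in\mathcal{S}_{P,W}(\mathcal{H})$ is some singular density operator, and $\sigma^{\perp} := \frac{\mathbb{1}_{\mathcal{H}}-\sigma}{\operatorname{Tr}\left[\mathbb{1}_{\mathcal{H}}-\sigma\right]}$. For $x\in\operatorname{supp}(P)$ with $W_x\ll\sigma$, we have $W_x\perp\sigma^{\perp}$. It is not hard to see that
\begin{align}
&\lim_{t\to 0} P(x)\,\frac{\log\operatorname{Tr}\left[W_x^{\alpha}\left((1-t)\sigma + t\sigma^{\perp}\right)^{1-\alpha}\right] - \log\operatorname{Tr}\left[W_x^{\alpha}\sigma^{1-\alpha}\right]}{t} \tag{3.94}\\
&= \lim_{t\to 0} P(x)\,\frac{\log\operatorname{Tr}\left[W_x^{\alpha}\left((1-t)^{1-\alpha}\sigma^{1-\alpha} + t^{1-\alpha}(\sigma^{\perp})^{1-\alpha}\right)\right] - \log\operatorname{Tr}\left[W_x^{\alpha}\sigma^{1-\alpha}\right]}{t} \tag{3.95}\\
&= \lim_{t\to 0} P(x)\,\frac{(1-\alpha)\log(1-t)}{t} \tag{3.96}\\
&= \lim_{t\to 0} P(x)\,\frac{-(1-\alpha)}{1-t} \tag{3.97}\\
&= -P(x)(1-\alpha) \tag{3.98}\\
&> -\infty, \tag{3.99}
\end{align}
where Eq. (3.95) holds because $\sigma\perp\sigma^{\perp}$; Eq. (3.96) is due to $W_x\perp\sigma^{\perp}$; and Eq. (3.97) is owing to L'Hôspital's rule.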
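On the other hand, since $\sigma$ is singular, there must be some $x\in\operatorname{supp}(P)$ such that $W_x\not\ll\sigma$. Hence, by denoting $c := \frac{\operatorname{Tr}\left[W_x^{\alpha}(\sigma^{\perp})^{1-\alpha}\right]}{\operatorname{Tr}\left[W_x^{\alpha}\sigma^{1-\alpha}\right]} > 0$, Eq. (3.95) leads to
\begin{align}
&\lim_{t\to 0} P(x)\,\frac{\log\left((1-t)^{1-\alpha} + t^{1-\alpha}c\right)}{t} \tag{3.100}\\
&= \lim_{t\to 0} P(x)\,\frac{-(1-\alpha)(1-t)^{-\alpha} + (1-\alpha)\,t^{-\alpha}c}{(1-t)^{1-\alpha} + t^{1-\alpha}c} \tag{3.101}\\
&= +\infty, \tag{3.102}
\end{align}
where Eq. (3.101) is by L'Hôspital's rule again. Combining Eqs. (3.99) and (3.102) concludes Eq. (3.93).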
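Next, we show the fixed-point property: $\mathcal{M}_\alpha(\mathcal{H}) = \mathcal{F}_\alpha(\mathcal{H})$, where $\mathcal{F}_\alpha(\mathcal{H}) := \{\sigma\in\mathcal{S}_{>0}(\mathcal{H}) : T_{\alpha,P}(\sigma) = \sigma\}$ denotes the set of fixed points of the map $T_{\alpha,P}:\mathcal{S}_{P,W}(\mathcal{H})\to\mathcal{S}(\mathcal{H})$. A necessary and sufficient condition for $\sigma$ to be an optimizer is
\[
\partial_\omega g_\alpha(\sigma) := \mathrm{D}g_\alpha(\sigma)[\omega-\sigma] = 0, \tag{3.103}
\]
for all $\omega\in\mathcal{S}(\mathcal{H})$, where $\mathrm{D}g_\alpha(\sigma)$ denotes the Fréchet derivative of the map $g_\alpha$ (see e.g. [104, Appendix C]). Using the chain rule for Fréchet derivatives, it follows that
\begin{align}
\partial_\omega g_\alpha(\sigma)
&= \operatorname{Tr}\left[\sum_{x\in\mathcal{X}} P(x)\,\frac{W_x^{\alpha}}{\operatorname{Tr}\left[W_x^{\alpha}\sigma^{1-\alpha}\right]}\,\partial_\omega\sigma^{1-\alpha}\right] \tag{3.104}\\
&= \operatorname{Tr}\left[\sum_{x\in\mathcal{X}} P(x)\,\frac{\sigma^{-\frac{\alpha}{2}}\,W_x^{\alpha}\,\sigma^{-\frac{\alpha}{2}}}{\operatorname{Tr}\left[W_x^{\alpha}\sigma^{1-\alpha}\right]}\;\sigma^{\frac{\alpha}{2}}\,\partial_\omega\sigma^{1-\alpha}\,\sigma^{\frac{\alpha}{2}}\right]. \tag{3.105}
\end{align}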
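We claim that the operators
\[
\left\{\Delta_\omega = \sigma^{\frac{\alpha}{2}}\,\partial_\omega\sigma^{1-\alpha}\,\sigma^{\frac{\alpha}{2}} : \omega\in\mathcal{S}(\mathcal{H})\right\} \tag{3.106}
\]
span the space of traceless Hermitian operators on $\mathcal{H}$. Let $\sigma = \sum_i\lambda_i|i\rangle\langle i|$ with $\lambda_i > 0$ be the eigenvalue decomposition. One can verify [82, Theorem 3.25] that
\[
\langle i|\Delta_\omega|j\rangle =
\begin{cases}
(\lambda_i\lambda_j)^{\frac{\alpha}{2}}\,\dfrac{\lambda_i^{1-\alpha}-\lambda_j^{1-\alpha}}{\lambda_i-\lambda_j}\,\langle i|\omega-\sigma|j\rangle, & \text{if } \lambda_i\neq\lambda_j,\\[2ex]
(1-\alpha)\,\langle i|\omega-\sigma|j\rangle, & \text{if } \lambda_i=\lambda_j.
\end{cases} \tag{3.107}
\]
Therefore, $\Delta_\omega$ is Hermitian and $\operatorname{Tr}[\Delta_\omega] = 0$ for all $\omega\in\mathcal{S}(\mathcal{H})$. Moreover, a basis of the traceless Hermitian operators is given by the operators
\[
\left\{\Gamma_{ij} = |i\rangle\langle j| + |j\rangle\langle i|,\quad \Gamma'_{ij} = \mathrm{i}\,|i\rangle\langle j| - \mathrm{i}\,|j\rangle\langle i|,\quad \Gamma''_{ij} = |i\rangle\langle i| - |j\rangle\langle j|\right\}_{i\neq j}. \tag{3.108}
\]
For every tuple $(i,j)$ with $i\neq j$ there exists an $\varepsilon > 0$ such that the state $\omega = \sigma + \varepsilon\Gamma_{ij}$ is still in $\mathcal{S}(\mathcal{H})$. For this state, we find that $\Delta_\omega = \eta\,\Gamma_{ij}$ for some real $\eta > 0$. A similar argument applies to $\Gamma'_{ij}$ and $\Gamma''_{ij}$. Hence, we have verified that the operators $\{\Delta_\omega\}_{\omega\in\mathcal{S}(\mathcal{H})}$ span the space of traceless Hermitian operators.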
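Armed with the above discussion, the condition that $\partial_\omega g_\alpha(\sigma) = 0$ for all $\omega\in\mathcal{S}(\mathcal{H})$ is equivalent to the condition that the operator
\[
\sum_{x\in\mathcal{X}} P(x)\,\frac{\sigma^{-\frac{\alpha}{2}}\,W_x^{\alpha}\,\sigma^{-\frac{\alpha}{2}}}{\operatorname{Tr}\left[W_x^{\alpha}\sigma^{1-\alpha}\right]} \tag{3.109}
\]
must be proportional to the identity. Thus, the optimum must be a fixed point of the map $T_{\alpha,P}(\cdot)$.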
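Lastly, to prove the uniqueness of the optimizer, it remains to show $\partial_\omega^2 g_\alpha(\sigma) := \mathrm{D}^2 g_\alpha(\sigma)[\omega-\sigma,\omega-\sigma] < 0$ for all $\omega\neq\sigma$ and $\sigma > 0$. Continuing from Eq. (3.104), we have
\begin{align}
\partial_\omega^2 g_\alpha(\sigma)
&= -\operatorname{Tr}\left[\sum_{x\in\mathcal{X}} P(x)\,\frac{W_x^{\alpha}}{\operatorname{Tr}^2\left[W_x^{\alpha}\sigma^{1-\alpha}\right]}\,\partial_\omega\sigma^{1-\alpha}\right]
+ \operatorname{Tr}\left[\sum_{x\in\mathcal{X}} P(x)\,\frac{W_x^{\alpha}}{\operatorname{Tr}\left[W_x^{\alpha}\sigma^{1-\alpha}\right]}\,\partial_\omega^2\sigma^{1-\alpha}\right] \tag{3.110}\\
&< \operatorname{Tr}\left[\sum_{x\in\mathcal{X}} P(x)\,\frac{W_x^{\alpha}}{\operatorname{Tr}\left[W_x^{\alpha}\sigma^{1-\alpha}\right]}\,\partial_\omega^2\sigma^{1-\alpha}\right], \tag{3.111}
\end{align}
where Eq. (3.111) holds by noting that $\partial_\omega\sigma^{1-\alpha}\neq 0$ for all $\omega\neq\sigma$. Further, $\partial_\omega^2\sigma^{1-\alpha}\le 0$ since $u\mapsto u^{1-\alpha}$ is operator concave. Thus, $\partial_\omega^2 g_\alpha(\sigma) < 0$, and item (c) is proved.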
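(3.2)-(d) We follow the notation in item (b). However, we restrict $(\alpha_k,P_k)_{k\in\mathbb{N}}$ and $(\alpha_0,P_0)$ to be in the set $(0,1]\times\mathcal{P}(\mathcal{X})$. The continuity of $(\alpha,P)\mapsto I_\alpha^{(2)}(P,W)$ in item (b) and Eq. (3.74) thus imply
\[
\lim_{k\to+\infty} I_{\alpha_k}^{(2)}(P_k,W) = D_{\alpha_0}\left(W\,\|\,\sigma_0\,|\,P_0\right) = I_{\alpha_0}^{(2)}(P_0,W) = D_{\alpha_0}\left(W\,\|\,\sigma_{\alpha_0,P_0}\,|\,P_0\right). \tag{3.112}
\]
Then, the uniqueness of the minimizer $\sigma_{\alpha,P}$ in item (c) guarantees that $\sigma_0 = \sigma_{\alpha_0,P_0}$. Hence,
\[
\lim_{k\to+\infty}\sigma_{\alpha_k,P_k} = \sigma_0 = \sigma_{\alpha_0,P_0}, \tag{3.113}
\]
which proves item (d).
(3.2)-(e) Berge's maximum theorem [109, Section IV.3], [110, Lemma 3.1] shows that the continuous map $(\alpha,P)\mapsto I_\alpha^{(2)}(P,W)$, maximized over the compact set $\mathcal{P}(\mathcal{X})$, is still continuous in $\alpha\in[0,1]$. The monotonicity of $\alpha\mapsto C_{\alpha,W}$ follows from item (a), since a pointwise supremum of monotone increasing functions is monotone increasing.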
Quantum Hypothesis Testing
The goal of this chapter is to provide an introduction to quantum hypothesis testing. Our finite-blocklength bounds in Parts II and III rely heavily on the results in this chapter. In Sections 4.1 and 4.2 below, we present the error exponent analysis, while the moderate deviation analysis is given in Section 4.3.
Binary quantum hypothesis testing consists of a null hypothesis and an alternative hypothesis, described by the quantum states $\rho\in\mathcal{S}(\mathcal{H})$ and $\sigma\in\mathcal{S}(\mathcal{H})$, respectively. Given any test $0\le Q\le\mathbb{1}$ whose outcome $Q$ corresponds to deciding on the null hypothesis $\rho$, the type-I error and type-II error of the hypothesis testing are defined as follows:
\[
\alpha(Q;\rho) := \operatorname{Tr}\left[(\mathbb{1}-Q)\rho\right], \tag{4.1}
\]
\[
\beta(Q;\sigma) := \operatorname{Tr}\left[Q\sigma\right]. \tag{4.2}
\]
Unless $\rho\perp\sigma$, one cannot make both the type-I and type-II errors arbitrarily small simultaneously. Thus, we define the minimum type-I error when the type-II error is at most $\mu\in(0,1)$ as
\[
\widehat{\alpha}_\mu(\rho\|\sigma) := \min_{0\le Q\le\mathbb{1}}\left\{\alpha(Q;\rho) : \beta(Q;\sigma)\le\mu\right\}. \tag{4.3}
\]
The following famous quantum Stein's lemma characterizes the trade-off between these two errors. That is, the quantum relative entropy $D(\rho\|\sigma)$ serves as a benchmark that determines the asymptotic behavior of the optimal type-I error.
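Theorem 4.1 (Quantum Stein's Lemma [95], [57], [86]). Given binary hypotheses $H_0:\rho$ and $H_1:\sigma$, one has
\[
\lim_{n\to+\infty}\widehat{\alpha}_{\exp\{-nr\}}\left(\rho^{\otimes n}\,\|\,\sigma^{\otimes n}\right) =
\begin{cases}
0, & r < D(\rho\|\sigma),\\
1, & r > D(\rho\|\sigma).
\end{cases} \tag{4.4}
\]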
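To illustrate the definitions in Eqs. (4.1) and (4.2), the sketch below computes both errors for one particular test, the projector $\{\rho-\sigma\ge 0\}$ onto the non-negative eigenspace of $\rho-\sigma$. This threshold test is only a convenient example and is not claimed to attain the minimum in Eq. (4.3); the function names and the two commuting qubit states are assumptions chosen so that the numbers can be checked by hand.

```python
import numpy as np

def projector_nonneg(H):
    """Projector {H >= 0} onto eigenvectors of Hermitian H with non-negative eigenvalues."""
    evals, vecs = np.linalg.eigh(H)
    keep = vecs[:, evals >= 0]
    return keep @ keep.conj().T

def type_I_error(Q, rho):
    """alpha(Q; rho) = Tr[(1 - Q) rho], Eq. (4.1)."""
    return float(np.trace((np.eye(rho.shape[0]) - Q) @ rho).real)

def type_II_error(Q, sigma):
    """beta(Q; sigma) = Tr[Q sigma], Eq. (4.2)."""
    return float(np.trace(Q @ sigma).real)

# Hypothetical commuting example: rho - sigma = diag(0.7, -0.7), so Q = diag(1, 0).
rho = np.diag([0.9, 0.1])
sigma = np.diag([0.2, 0.8])
Q = projector_nonneg(rho - sigma)

err_I = type_I_error(Q, rho)      # mass of rho outside the accepted subspace
err_II = type_II_error(Q, sigma)  # mass of sigma inside the accepted subspace
print(err_I, err_II)
```

Here the test accepts the first basis state only, so the type-I error is the weight $0.1$ that $\rho$ places on the second basis state and the type-II error is the weight $0.2$ that $\sigma$ places on the first.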
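For an $n$-shot independent extension of the binary hypotheses:
\[
H_0:\ \rho^n = \rho_1\otimes\rho_2\otimes\cdots\otimes\rho_n, \tag{4.5}
\]
\[
H_1:\ \sigma^n = \sigma_1\otimes\sigma_2\otimes\cdots\otimes\sigma_n, \tag{4.6}
\]
we define an error exponent function [86] by
\[
\phi_n(r\,|\,\rho^n\|\sigma^n) := \sup_{\alpha\in(0,1]}\frac{\alpha-1}{\alpha}\left(r - \frac{1}{n}D_\alpha(\rho^n\|\sigma^n)\right), \quad r\ge 0. \tag{4.7}
\]
For the case $\rho^n\ll\sigma^n$, it is known that [86, Lemma 4]
\[
\phi_n(r\,|\,\rho^n\|\sigma^n) =
\begin{cases}
+\infty, & r\in\left[0,\,-\frac{1}{n}\log\operatorname{Tr}\left[(\rho^n)^0\sigma^n\right]\right),\\[1ex]
-\log\operatorname{Tr}\left[\rho^n(\sigma^n)^0\right], & r\ge\frac{1}{n}D(\rho^n\|\sigma^n).
\end{cases} \tag{4.8}
\]
In the following Sections 4.1 and 4.2, we show that the exponent function $\phi_n$ determines how fast the optimal type-I error decays exponentially, i.e.
\[
\lim_{n\to+\infty} -\frac{1}{n}\log\widehat{\alpha}_{\exp\{-nr\}}\left(\rho^{\otimes n}\,\|\,\sigma^{\otimes n}\right) = \phi_1(r\,|\,\rho\|\sigma) = \sup_{0\le\alpha\le 1}\frac{1-\alpha}{\alpha}\left(D_\alpha(\rho\|\sigma) - r\right). \tag{4.9}
\]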
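The exponent in Eq. (4.9) can be approximated numerically by a grid search over $\alpha$. The sketch below assumes the Petz form $D_\alpha(\rho\|\sigma) = \frac{1}{\alpha-1}\log\operatorname{Tr}[\rho^{\alpha}\sigma^{1-\alpha}]$, natural logarithms, and full-rank states; the function names, the grid, and the example states are hypothetical choices. The endpoint $\alpha = 1$ contributes the value $0$ (since $D_1 = D$ is finite for full-rank states), which is accounted for by the outer `max`.

```python
import numpy as np

def mpow(A, p):
    """Matrix power of a positive definite matrix via its eigendecomposition."""
    evals, vecs = np.linalg.eigh(A)
    return (vecs * evals ** p) @ vecs.conj().T

def mlog(A):
    """Matrix logarithm of a positive definite matrix."""
    evals, vecs = np.linalg.eigh(A)
    return (vecs * np.log(evals)) @ vecs.conj().T

def relative_entropy(rho, sigma):
    """D(rho||sigma) = Tr[rho (log rho - log sigma)] for full-rank states."""
    return float(np.trace(rho @ (mlog(rho) - mlog(sigma))).real)

def petz_renyi(rho, sigma, alpha):
    """Petz divergence D_alpha = log Tr[rho^alpha sigma^(1-alpha)] / (alpha - 1), 0 < alpha < 1."""
    q = np.trace(mpow(rho, alpha) @ mpow(sigma, 1 - alpha)).real
    return float(np.log(q) / (alpha - 1))

def hoeffding_exponent(r, rho, sigma, num=2000):
    """Grid approximation of sup over alpha of (1 - alpha)/alpha * (D_alpha - r)."""
    alphas = np.linspace(1e-3, 1 - 1e-3, num)
    values = [(1 - a) / a * (petz_renyi(rho, sigma, a) - r) for a in alphas]
    return max(0.0, max(values))   # the 0.0 is the alpha = 1 endpoint

# Hypothetical non-commuting full-rank qubit states.
rho = np.array([[0.8, 0.2], [0.2, 0.2]])
sigma = np.diag([0.3, 0.7])
D = relative_entropy(rho, sigma)

phi_below = hoeffding_exponent(0.5 * D, rho, sigma)   # rate below D
phi_above = hoeffding_exponent(D + 0.1, rho, sigma)   # rate above D
print(D, phi_below, phi_above)
```

For a rate below $D(\rho\|\sigma)$ the approximated exponent is strictly positive, while for a rate above it the exponent is zero, in line with the dichotomy of Theorem 4.1.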
4.1 Achievability
Quantum Stein's lemma, given in Theorem 4.1, states that if the exponential decay rate of the type-II error is smaller than the relative entropy, i.e. $r < D(\rho\|\sigma)$, then the optimal type-I error vanishes asymptotically. The quantum Hoeffding bound goes a step further and investigates the non-asymptotic question: how fast does the optimal type-I error decay? The achievability bound gives an exponential upper bound for it. This result was first proved by Hayashi [88], and the upper bound can be expressed in terms of Petz's Rényi divergence. Together with the converse bound, discussed later in Section 4.2, this settles the error exponent for the optimal type-I error in quantum hypothesis testing; see Eq. (4.9).
For the convenience of readers, we provide the proof of the achievability in Theorem 4.2 below.
Theorem 4.2 (Achievability Hoeffding Bound [88], [86, Section 5.5]). Given binary hypotheses $H_0:\rho$ and $H_1:\sigma$, and a rate $r < D(\rho\|\sigma)$, one has
\[
-\frac{1}{n}\log\widehat{\alpha}_{\exp\{-nr\}}\left(\rho^{\otimes n}\,\|\,\sigma^{\otimes n}\right) \ge \phi_1(r\,|\,\rho\|\sigma), \tag{4.10}
\]
where $\phi_n$ is defined in Eq. (4.7).
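Proof of Theorem 4.2. Fix an $n\in\mathbb{N}$ and $\alpha\in(0,1)$, and let
\[
A = e^{-nx}\sigma^{\otimes n}, \tag{4.11}
\]
\[
B = \rho^{\otimes n}, \tag{4.12}
\]
where $x$ will be determined later. Consider a sequence of tests $\{(\mathbb{1}-Q_n, Q_n)\}$ with $Q_n := \{B - A\ge 0\}$.
Then, Lemma 2.8 gives that
\begin{align}
\beta(Q_n;\sigma^{\otimes n}) &= \operatorname{Tr}\left[Q_n\sigma^{\otimes n}\right] \tag{4.13}\\
&= e^{nx}\operatorname{Tr}\left[Q_n A\right] \tag{4.14}\\
&\le e^{nx\alpha}\,Q_\alpha\left(\rho^{\otimes n}\,\|\,\sigma^{\otimes n}\right) \tag{4.15}\\
&= e^{nx\alpha}\,Q_\alpha(\rho\|\sigma)^n, \tag{4.16}\\
\alpha(Q_n;\rho^{\otimes n}) &= \operatorname{Tr}\left[(\mathbb{1}-Q_n)\rho^{\otimes n}\right] \tag{4.17}\\
&= \operatorname{Tr}\left[(\mathbb{1}-Q_n)B\right] \tag{4.18}\\
&\le e^{-nx(1-\alpha)}\,Q_\alpha\left(\rho^{\otimes n}\,\|\,\sigma^{\otimes n}\right) \tag{4.19}\\
&= e^{-nx(1-\alpha)}\,Q_\alpha(\rho\|\sigma)^n. \tag{4.20}
\end{align}
Now, choose $x$ such that $x\alpha + \log Q_\alpha(\rho\|\sigma) = -r$ to have
\[
\beta(Q_n;\sigma^{\otimes n}) \le \exp\{-nr\}. \tag{4.21}
\]
Further, substituting this choice of $x$ into Eq. (4.20) and using $\log Q_\alpha(\rho\|\sigma) = (\alpha-1)D_\alpha(\rho\|\sigma)$ gives the exponent $\frac{1-\alpha}{\alpha}\left(D_\alpha(\rho\|\sigma) - r\right)$. Since $\alpha\in(0,1)$ was arbitrary, choosing the best $\alpha$ shows that
\[
\alpha(Q_n;\rho^{\otimes n}) \le \exp\{-n\phi_1(r\,|\,\rho\|\sigma)\}. \tag{4.22}
\]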
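The two bounds obtained in the proof can be checked numerically in the single-copy case $n = 1$. The sketch below assumes $Q_\alpha(\rho\|\sigma) = \operatorname{Tr}[\rho^{\alpha}\sigma^{1-\alpha}]$ and $D_\alpha = \frac{\log Q_\alpha}{\alpha-1}$, fixes one value of $\alpha$ and $r$, solves $x\alpha + \log Q_\alpha = -r$ for $x$, builds the test $Q = \{B - A\ge 0\}$, and compares both errors with their bounds. The states, the parameter values, and the helper names are hypothetical.

```python
import numpy as np

def mpow(A, p):
    """Matrix power of a positive definite matrix via its eigendecomposition."""
    evals, vecs = np.linalg.eigh(A)
    return (vecs * evals ** p) @ vecs.conj().T

def projector_nonneg(H):
    """Projector {H >= 0} onto eigenvectors of Hermitian H with non-negative eigenvalues."""
    evals, vecs = np.linalg.eigh(H)
    keep = vecs[:, evals >= 0]
    return keep @ keep.conj().T

# Hypothetical non-commuting full-rank qubit states and parameters (n = 1).
rho = np.array([[0.8, 0.2], [0.2, 0.2]])
sigma = np.diag([0.3, 0.7])
alpha, r = 0.5, 0.1

Q_alpha = np.trace(mpow(rho, alpha) @ mpow(sigma, 1 - alpha)).real
D_alpha = np.log(Q_alpha) / (alpha - 1)
x = -(r + np.log(Q_alpha)) / alpha        # solves x * alpha + log Q_alpha = -r

A = np.exp(-x) * sigma                    # Eq. (4.11) with n = 1
B = rho                                   # Eq. (4.12) with n = 1
Q = projector_nonneg(B - A)

type_II = np.trace(Q @ sigma).real
type_I = np.trace((np.eye(2) - Q) @ rho).real

bound_II = np.exp(-r)                                     # Eq. (4.21)
bound_I = np.exp(-(1 - alpha) / alpha * (D_alpha - r))    # fixed-alpha form of Eq. (4.22)
print(type_II, bound_II, type_I, bound_I)
```

The comparison holds for every $\alpha\in(0,1)$, so repeating it over a grid of $\alpha$ values and keeping the smallest type-I bound recovers the exponent $\phi_1$ up to the grid resolution.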