

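\[
\begin{aligned}
&\left\| (f(A) - f(B))_+ \oplus (f(B) - f(A))_+ \right\| && (2.44) \\
&\quad \le \left\| \left[ f((A - B)_+) - f(0)\mathbb{1} \right] \oplus \left[ f((B - A)_+) - f(0)\mathbb{1} \right] \right\| && (2.45) \\
&\quad = \left\| f(|A - B|) - f(0)\mathbb{1} \right\| && (2.46) \\
&\quad = f(\|A - B\|) - f(0). && (2.47)
\end{aligned}
\]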

Lemma 2.13. [90, Corollary 3.6] Let $A_i$ be an $m \times m$ positive semi-definite matrix and $Z_i$ be an $n \times m$ matrix for $i = 1, \ldots, k$. Then, for all unitarily invariant norms $\|\cdot\|$ and $\gamma > 0$, the map

\[
(p, t) \mapsto \left\| \left( \sum_{i=1}^{k} Z_i A_i^{t/p} Z_i^{\dagger} \right)^{\gamma p} \right\| \tag{2.48}
\]

is jointly log-convex on (0, +∞) × (−∞, +∞).

2.2 Large Deviation Theory

In this section, we will see that the Legendre-Fenchel transform is closely related to the error-exponent function of hypothesis testing and channel coding. Consider the following binary classical hypotheses:

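\[
\begin{aligned}
H_0 &: p^n := p_{x_1} \otimes p_{x_2} \otimes \cdots \otimes p_{x_n}, \\
H_1 &: q^n := q_{x_1} \otimes q_{x_2} \otimes \cdots \otimes q_{x_n},
\end{aligned}
\tag{2.49}
\]
where $p_{x_i}$ and $q_{x_i}$ are probability mass functions, each $x_i$ belongs to some finite alphabet $\mathcal{X}$, and $n \in \mathbb{N}$ is fixed. Given any $r \ge 0$, recall the definition of the error-exponent function in Eq. (4.7):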

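\[
\phi_n(r) = \phi_n(r \,|\, p^n \| q^n) = \sup_{\alpha \in (0,1]} \left\{ \frac{1-\alpha}{\alpha} \left( \frac{1}{n} D_\alpha(p^n \| q^n) - r \right) \right\}. \tag{2.50}
\]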

Without loss of generality, we assume that $p^n$ and $q^n$ have the same support, since elements of $q_{x_i}$ that do not lie in the support of $p_{x_i}$ do not contribute to $\phi_n(r)$.
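As a rough numerical illustration of Eq. (2.50), the following sketch approximates $\phi_n(r)$ by a grid search over $\alpha$. It assumes the classical Rényi divergence $D_\alpha(p\|q) = \frac{1}{\alpha-1}\log\sum_\omega p(\omega)^\alpha q(\omega)^{1-\alpha}$ and uses its additivity under tensor products; the function names and the grid size are illustrative choices, not part of the original text.

```python
import math

def renyi_divergence(p, q, alpha):
    """Classical order-alpha Renyi divergence for pmfs with a common support."""
    if abs(alpha - 1.0) < 1e-12:
        # The order-1 case is the Kullback-Leibler divergence.
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    s = sum(pi**alpha * qi**(1 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log(s) / (alpha - 1)

def error_exponent(r, ps, qs, grid=2000):
    """Grid approximation of phi_n(r) in Eq. (2.50).

    ps and qs are lists of pmfs (p_{x_1}, ..., p_{x_n}) and (q_{x_1}, ..., q_{x_n}).
    Additivity gives D_alpha(p^n || q^n) = sum_i D_alpha(p_{x_i} || q_{x_i}).
    """
    n = len(ps)
    best = 0.0  # alpha = 1 contributes (1 - 1)/1 * (...) = 0
    for k in range(1, grid):
        alpha = k / grid
        d = sum(renyi_divergence(p, q, alpha) for p, q in zip(ps, qs)) / n
        best = max(best, (1 - alpha) / alpha * (d - r))
    return best

p = [0.5, 0.5]
q = [0.9, 0.1]
phi_small_r = error_exponent(0.1, [p] * 3, [q] * 3)
phi_large_r = error_exponent(1.0, [p] * 3, [q] * 3)
```

For rates above the relative entropy the approximation returns zero, and it is non-increasing in $r$, as expected from the form of Eq. (2.50).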

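Let $Z$ be a random variable with probability measure $\mu$. Further, we assume $Z$ is finite on $\operatorname{supp}(\mu)$.

The cumulant generating function (c.g.f.) of $Z$ is defined as
\[
\Lambda(t) := \log \mathbb{E}_\mu\left[ e^{tZ} \right], \quad t \in \mathbb{R}. \tag{2.51}
\]
The Legendre-Fenchel transform of $\Lambda(t)$ is
\[
\Lambda^*(z) := \sup_{t \in \mathbb{R}} \left\{ zt - \Lambda(t) \right\}. \tag{2.52}
\]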

Such a transform plays a significant role in concentration inequalities, convex analysis, and large deviation theory [27].
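To make the definitions in Eqs. (2.51) and (2.52) concrete, here is a minimal sketch for a finitely supported random variable. The supremum is located by ternary search, which relies on $t \mapsto zt - \Lambda(t)$ being concave; the search interval and iteration count are arbitrary illustrative choices. For a Bernoulli variable the transform is known to equal a binary relative entropy, which serves as a sanity check.

```python
import math

def cgf(t, values, probs):
    """Cumulant generating function log E[exp(tZ)] of a finitely supported Z."""
    return math.log(sum(p * math.exp(t * v) for v, p in zip(values, probs)))

def legendre_fenchel(z, values, probs, lo=-50.0, hi=50.0, iters=200):
    """Approximate sup_t { z*t - cgf(t) } by ternary search on [lo, hi]."""
    f = lambda t: z * t - cgf(t, values, probs)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return f((lo + hi) / 2)

# Bernoulli variable with P(Z = 1) = 0.3.
values, probs = [0, 1], [0.7, 0.3]
rate_at_07 = legendre_fenchel(0.7, values, probs)
# Closed form: relative entropy between Bernoulli(0.7) and Bernoulli(0.3).
closed_form = 0.7 * math.log(0.7 / 0.3) + 0.3 * math.log(0.3 / 0.7)
```

At the mean $z = 0.3$ the transform vanishes, since the supremum is then attained at $t = 0$.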

Let $P_{x^n}$ be the empirical distribution of the sequence $x^n = x_1 x_2 \ldots x_n$. Let $Z_0 = \log \frac{q^n}{p^n}$ with

Rewrite the right-hand side of Eq. (2.50) with $\alpha = \frac{1}{1+s}$, and observe that

X

Then the error-exponent function in Eq. (2.50) can also be viewed as a Legendre-Fenchel transform of $E_0^{(2)}(s, P_{x^n})$:

The following lemma relates $\phi_n(r)$ to $\Lambda^*_{j,P_{x^n}}(z)$, the Legendre-Fenchel transform of Eq. (2.53):

Λj,Pxn(z) := sup

xn be the optimizer of $\phi_n(r)$ in Eq. (2.57). The optimizer $t^\star \in (0, 1)$ is unique, and satisfies $\Lambda'_{0,P_{x^n}}(t^\star) = \phi_n(r) - r$.

Before proving Lemma 2.14, we will need the following partial derivatives with respect to t:

where we denote the tilted distributions for every i ∈ [n] and t ∈ [0, 1] by

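\[
\hat{q}_{x_i,t}(\omega) := \frac{p_{x_i}(\omega)^{1-t}\, q_{x_i}(\omega)^{t}}{\sum_{\omega' \in \operatorname{supp}(p_{x_i})} p_{x_i}(\omega')^{1-t}\, q_{x_i}(\omega')^{t}}, \quad \omega \in \operatorname{supp}(p_{x_i}). \tag{2.61}
\]
It is also easy to verify that
\[
\Lambda_{0,x_i}(t) = \Lambda_{1,x_i}(1-t), \quad \Lambda'_{0,x_i}(t) = -\Lambda'_{1,x_i}(1-t), \quad \Lambda''_{0,x_i}(t) = \Lambda''_{1,x_i}(1-t). \tag{2.62}
\]
This lemma closely follows Ref. [91, Lemma 9]; however, the major difference is that we prove the claim using $\phi_n(r \,|\, \rho^n \| \sigma^n)$ in Eq. (4.7) instead of the discrimination function $\min\{ D(\tau \| \rho) : D(\tau \| \sigma) \le r \}$ in Eq. (9.20). This expression is crucial to obtaining the sphere-packing bound in Theorem 11.1 in the strong form, cf. Eq. (1.4), instead of the weak form, cf. Eq. (1.5).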
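The relations in Eq. (2.62) and the role of the tilted distribution in Eq. (2.61) can be checked numerically. The sketch below assumes the single-letter definitions $\Lambda_{0,x}(t) = \log\sum_\omega p_x(\omega)^{1-t} q_x(\omega)^t$ and $\Lambda_{1,x}(t) = \log\sum_\omega q_x(\omega)^{1-t} p_x(\omega)^t$, which are not reproduced in this excerpt and should be read as an assumption. Under that assumption, $\Lambda'_{0,x}(t)$ is the mean of $\log(q_x/p_x)$ under the tilted distribution, which is compared here with a central finite difference.

```python
import math

def cgf0(t, p, q):
    """Assumed form of Lambda_{0,x}(t): log sum_w p(w)^(1-t) q(w)^t."""
    return math.log(sum(pi**(1 - t) * qi**t for pi, qi in zip(p, q)))

def cgf1(t, p, q):
    """Assumed form of Lambda_{1,x}(t): log sum_w q(w)^(1-t) p(w)^t."""
    return math.log(sum(qi**(1 - t) * pi**t for pi, qi in zip(p, q)))

def tilted(t, p, q):
    """Tilted distribution of Eq. (2.61) for full-support pmfs p and q."""
    w = [pi**(1 - t) * qi**t for pi, qi in zip(p, q)]
    total = sum(w)
    return [wi / total for wi in w]

def mean_log_ratio(t, p, q):
    """Mean of log(q/p) under the tilted distribution."""
    return sum(qt * math.log(qi / pi) for qt, pi, qi in zip(tilted(t, p, q), p, q))

def derivative(f, t, h=1e-5):
    """Central finite-difference approximation of f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)

p = [0.2, 0.3, 0.5]
q = [0.6, 0.3, 0.1]
t = 0.3
d0 = derivative(lambda u: cgf0(u, p, q), t)
d1 = derivative(lambda u: cgf1(u, p, q), 1 - t)
```

Here `d0` should agree with `mean_log_ratio(t, p, q)` and with `-d1`, matching the first two relations in Eq. (2.62).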

Proof of Lemma 2.14.

(2.14-(a)) We will prove this statement by contradiction. Let $t \in [0, 1]$ and assume that $\Lambda''_{0,P_{x^n}}(t) = 0$, which implies $\Lambda''_{0,x}(t) = 0$ for all $x \in \operatorname{supp}(P_{x^n})$. Recall from Eq. (2.60)

which is equivalent to

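\[
p_x(\omega) = q_x(\omega) \cdot e^{-\Lambda'_{0,x}(t)}, \quad \forall \omega \in \operatorname{supp}(p_x). \tag{2.64}
\]
Summing both sides of Eq. (2.64) over $\omega \in \operatorname{supp}(p_x)$ gives
\[
1 = \operatorname{Tr}\left[ p_x^0 q_x \right] e^{-\Lambda'_{0,x}(t)}. \tag{2.65}
\]
Then, Eqs. (2.64) and (2.65) imply that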

φn(r) = sup

(2.14-(b)) Observe that $E_0^{(2)}(s, P_{x^n}) - sr$ in Eq. (2.57) is strictly concave in $s \in \mathbb{R}_{\ge 0}$ since

owing to Eqs. (2.56), (2.60), and item (a). Moreover, $s = 0$ cannot be an optimum in Eq. (2.57); otherwise, it would violate the assumption $\phi_n(r) \ge 0$. Thus a unique maximizer $s^\star \in \mathbb{R}_{>0}$ exists such that

where in the second equality we use Eq. (2.56) and

r = ∂E0(2)(s, Pxn)

Comparing Eq. (2.71) with (2.73) gives

Λ00,P

which is exactly the optimum solution to $\Lambda^*_{0,P_{x^n}}(z)$ in Eq. (2.58) with

t?= s?

where Eqs. (2.74) and (2.71) are used in the third and last equalities.

(2.14-(c)) This proof follows from arguments similar to those in item (b) and from Eq. (2.62). Eqs. (2.74) and (2.62) lead to

which satisfies the optimum solution to $\Lambda^*_{1,P_{x^n}}(z)$ in Eq. (2.58) with $t^\star = \frac{1}{1+s^\star} \in (0, 1)$ and

where the third equality is due to Eq. (2.81), and the last equality follows from Eqs. (2.62) and (2.73).

(2.14-(d)) The fact that a unique optimizer $t^\star \in (0, 1)$ exists such that $\Lambda'_{0,P_{x^n}}(t^\star) = \phi_n(r) - r$ follows directly from Eqs. (2.74), (2.75) and $\Lambda''_{0,P_{x^n}}(t) > 0$ for $t \in [0, 1]$.

Moreover, Eqs. (2.70), (2.72), and (2.69) yield

−∂φn(r) which completes the claim in item (d).

Let $(Z_i)_{i=1}^n$ be a sequence of independent, real-valued random variables with probability measures $(\mu_i)_{i=1}^n$. Let $\Lambda_i(t) := \log \mathbb{E}\left[ e^{tZ_i} \right]$ and define the Legendre-Fenchel transform of $\frac{1}{n}\sum_{i=1}^n \Lambda_i(\cdot)$ to be:

Define the probability measure $\tilde{\mu}_i$ via $\mathrm{d}\tilde{\mu}_i$

inequality for $\frac{1}{n}\sum_{i=1}^n Z_i$:

Theorem 2.1 (Bahadur-Ranga Rao's Concentration Inequality [91, Proposition 5], [48]). Provided that √

Chaganty and Sethuraman in Ref. [49, Theorem 3.3] considered a more general sequence of random variables {Zn}n∈N, which are not necessarily the sum of random variables.

Let $(X_i)_{i \in \mathbb{N}}$ be a sequence of independent, real-valued random variables with probability measures $(\mu_i)_{i=1}^n$. Let $Z_n := \sum_{i=1}^n X_i$ and let $\Lambda_n(t) := \log \mathbb{E}\left[ e^{tZ_n} \right]$. Define the Legendre-Fenchel transform of

With these definitions, we can now state the following sharp concentration inequality for $\frac{1}{n} Z_n$:

Theorem 2.2 (Chaganty-Sethuraman's Concentration Inequality [49, Theorem 3.3]). For any $\eta \in (0, 1)$, there exists an $N_0 \in \mathbb{N}$ such that, for all $n \ge N_0$,

Remark 2.1. Chaganty and Sethuraman proved Theorem 2.2 provided that the following condition is satisfied: there exists $\delta_0 > 0$ such that, for any $\delta$ and $\lambda$ with $0 < \delta < \delta_0 < \lambda$,
\[
\sup_{\delta < |t| \le \lambda t_n^\star} \left| \frac{\exp\{\Lambda_n(t_n^\star + it)\}}{\exp\{\Lambda_n(t_n^\star)\}} \right| = o\left( 1/\sqrt{n} \right),
\]
where the supremum is defined to be 0 if $\{t : \delta < |t| \le \lambda t_n^\star\}$ is empty. In the case of $Z_n$ being a sum of random variables, $\exp\{\Lambda_n(t_n^\star + it)\} / \exp\{\Lambda_n(t_n^\star)\}$ is the product of the characteristic functions of $\{X_i\}_{i=1}^n$. Since the supremum of a characteristic function on a compact interval not containing 0 is less than 1, this condition is satisfied.

We note that the lower bound in Theorem 2.2 for the general sequence of random variables $(X_i)_{i \in \mathbb{N}}$ suffices to establish the converse bound in the moderate deviation analysis for c-q channel coding, Theorem 12.2 in Chapter 12. We do not specifically consider the case of lattice-valued random variables (see e.g. [49, Theorem 3.5]).

3 Quantum Entropic Quantities and Notation

In this chapter, we introduce the necessary notation in quantum information theory. In Section 3.1, we present various quantum generalizations of the classical Rényi divergence [92] and their mathematical properties. As we will see in quantum hypothesis testing discussed in Chapter 4, some specific definitions of the quantum Rényi divergence naturally arise in the exponent function. In Sections 3.2 and 3.3, we define the conditional Rényi entropies and Rényi mutual information, which play significant roles in Parts II and III, respectively. We refer interested readers to the books [93,50,8] for more comprehensive discussions.

Notation. Throughout this thesis, we consider a finite-dimensional Hilbert space $\mathcal{H}$. The set of density operators (i.e. positive semi-definite operators with unit trace) and the set of full-rank density operators on $\mathcal{H}$ are denoted by $\mathcal{S}(\mathcal{H})$ and $\mathcal{S}(\mathcal{H})_{>0}$, respectively. For $\rho, \sigma \in \mathcal{S}(\mathcal{H})$, we write $\rho \ll \sigma$ if the support of $\rho$ is contained in the support of $\sigma$. The identity operator on $\mathcal{H}$ is denoted by $\mathbb{1}_{\mathcal{H}}$. If there is no possibility of confusion, we skip the subscript $\mathcal{H}$. We use $\operatorname{Tr}[\,\cdot\,]$ for the standard trace function. Let $\mathbb{N}$, $\mathbb{R}$, $\mathbb{R}_{\ge 0}$, and $\mathbb{R}_{>0}$ denote the sets of natural numbers, real numbers, non-negative real numbers, and positive real numbers, respectively. Define $[n] := \{1, 2, \ldots, n\}$ for $n \in \mathbb{N}$.

For a positive semi-definite operator $A$ whose spectral decomposition is $A = \sum_i a_i P_i$, where $(a_i)_i$ and $(P_i)_i$ are the eigenvalues and eigenprojections of $A$, its power is defined as $A^p := \sum_{i : a_i \neq 0} a_i^p P_i$. In particular, $A^0$ denotes the projection onto the support of $A$. We use $\operatorname{supp}(A)$ to denote the support of the operator $A$. Further, $A \perp B$ means $\operatorname{supp}(A) \cap \operatorname{supp}(B) = \emptyset$.

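Given a pair of density operators $\rho, \sigma \in \mathcal{S}(\mathcal{H})$, we define the quantum relative entropy [94,95] as
\[
D(\rho \| \sigma) := \operatorname{Tr}\left[ \rho \left( \log \rho - \log \sigma \right) \right]. \tag{3.1}
\]
We define two types of quantum relative entropy variance [14,15,16] by
\[
V(\rho \| \sigma) := \operatorname{Tr}\left[ \rho \left( \log \rho - \log \sigma \right)^2 \right] - D(\rho \| \sigma)^2, \tag{3.2}
\]
\[
\widetilde{V}(\rho \| \sigma) := \int_0^1 \mathrm{d}t \, \operatorname{Tr}\left[ \rho^{1-t} (\log \rho - \log \sigma) \rho^{t} (\log \rho - \log \sigma) \right] - D(\rho \| \sigma)^2. \tag{3.3}
\]
They are defined to be $+\infty$ when $\rho \not\ll \sigma$. We note that when $\rho$ and $\sigma$ commute, $D(\rho \| \sigma)$ reduces to the classical Kullback-Leibler divergence [96]. It is well known that these quantities are non-negative, and that $D(\rho \| \sigma) = 0$ if and only if $\rho = \sigma$. Since the variance vanishes when $\rho = \sigma$, this in turn shows that
\[
V(\rho \| \sigma) > 0 \ \text{ implies } \ D(\rho \| \sigma) > 0. \tag{3.4}
\]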
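In the commuting case, both variances in Eqs. (3.2) and (3.3) reduce to the classical variance of the log-likelihood ratio, so Eq. (3.4) can be illustrated with probability vectors. The following sketch is restricted to that classical special case; the helper names are illustrative and not part of the thesis.

```python
import math

def relative_entropy(p, q):
    """Classical D(p||q); +inf when supp(p) is not contained in supp(q)."""
    if any(pi > 0 and qi == 0 for pi, qi in zip(p, q)):
        return math.inf
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def relative_variance(p, q):
    """Classical V(p||q) = E_p[(log p/q)^2] - D(p||q)^2, with the same +inf convention."""
    d = relative_entropy(p, q)
    if math.isinf(d):
        return math.inf
    second_moment = sum(pi * math.log(pi / qi) ** 2 for pi, qi in zip(p, q) if pi > 0)
    return second_moment - d ** 2

# Generic pair: both quantities are strictly positive.
d_generic = relative_entropy([0.5, 0.5], [0.9, 0.1])
v_generic = relative_variance([0.5, 0.5], [0.9, 0.1])

# Deterministic p: the log-likelihood ratio is constant on supp(p),
# so V = 0 although D = log 2 > 0. The converse of Eq. (3.4) fails.
d_point = relative_entropy([1.0, 0.0], [0.5, 0.5])
v_point = relative_variance([1.0, 0.0], [0.5, 0.5])
```

The second pair shows that Eq. (3.4) is a one-way implication: a positive relative entropy does not force a positive variance.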

3.1 Quantum Rényi divergence

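For density operators $\rho, \sigma \in \mathcal{S}(\mathcal{H})_{>0}$ and every $\alpha \in [0, 1)$, we define the following two families of quantum Rényi divergences [59,57,58]:
\[
D_\alpha(\rho \| \sigma) := \frac{1}{\alpha - 1} \log Q_\alpha(\rho \| \sigma), \quad Q_\alpha(\rho \| \sigma) := \operatorname{Tr}\left[ \rho^{\alpha} \sigma^{1-\alpha} \right]; \tag{3.5}
\]
\[
D^{\flat}_\alpha(\rho \| \sigma) := \frac{1}{\alpha - 1} \log Q^{\flat}_\alpha(\rho \| \sigma), \quad Q^{\flat}_\alpha(\rho \| \sigma) := \operatorname{Tr}\left[ e^{\alpha \log \rho + (1-\alpha) \log \sigma} \right]. \tag{3.6}
\]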

We refer to the above quantities as the (Petz) α-Rényi divergence and the log-Euclidean α-Rényi divergence, respectively. The log-Euclidean Rényi divergence arises from the log-Euclidean operator mean (also called the chaotic mean): $A \,\Diamond_\alpha B := \exp\left( (1 - \alpha) \log A + \alpha \log B \right)$ for $0 \le \alpha \le 1$. For general density operators $\rho, \sigma \in \mathcal{S}(\mathcal{H})$, the above definitions can be extended as

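\[
Q_\alpha(\rho \| \sigma) := \lim_{\delta \searrow 0} Q_\alpha(\rho + \delta \mathbb{1} \| \sigma + \delta \mathbb{1}) \quad \text{and} \quad Q^{\flat}_\alpha(\rho \| \sigma) := \lim_{\delta \searrow 0} Q^{\flat}_\alpha(\rho + \delta \mathbb{1} \| \sigma + \delta \mathbb{1}). \tag{3.7}
\]
For $\alpha = 1$, we define (see e.g. [58, Lemma III.4]):
\[
Q_1(\rho \| \sigma) := \operatorname{Tr}\left[ \rho \sigma^0 \right] \quad \text{and} \quad Q^{\flat}_1(\rho \| \sigma) := \operatorname{Tr}\left[ \rho \sigma^0 \right]; \tag{3.8}
\]
\[
D_1(\rho \| \sigma) := \lim_{\alpha \to 1} D_\alpha(\rho \| \sigma) = D(\rho \| \sigma) \quad \text{and} \quad D^{\flat}_1(\rho \| \sigma) := \lim_{\alpha \to 1} D^{\flat}_\alpha(\rho \| \sigma) = D(\rho \| \sigma). \tag{3.9}
\]
In addition, these two quantities are related by the Golden-Thompson inequality given in Lemma 2.7:
\[
Q^{\flat}_\alpha(\rho \| \sigma) \le Q_\alpha(\rho \| \sigma), \quad \forall \alpha \in [0, 1]. \tag{3.10}
\]
The log-Euclidean Rényi divergence is closely related to the quantum version of the Hellinger arc in statistics [97,98], [58, Section III]. Lemma 3.1 will be useful for proving the variational representations in Sections 5.1 and 9.1 later.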

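Lemma 3.1 ([58, Theorem III.5]). Let $\rho, \tau \in \mathcal{S}(\mathcal{H})$ with $\rho \ll \tau$. For all $s > -1$, it follows that
\[
\min_{\sigma \in \mathcal{S}(\mathcal{H})} \left\{ D(\sigma \| \rho) + s D(\sigma \| \tau) \right\} = s D^{\flat}_{\frac{1}{1+s}}(\rho \| \tau). \tag{3.11}
\]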
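When $\rho$ and $\tau$ commute, the log-Euclidean and Petz quantities coincide, and Eq. (3.11) can be verified directly with probability vectors. The sketch below handles only this classical case with $s > 0$ and full-support inputs. It uses the fact that, in this case, the minimizer is proportional to $\rho^{\alpha} \tau^{1-\alpha}$ with $\alpha = \frac{1}{1+s}$; all function names are illustrative.

```python
import math
import random

def kl(a, b):
    """Classical relative entropy D(a||b) for pmfs with supp(a) inside supp(b)."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

def objective(sigma, rho, tau, s):
    """Left-hand side of Eq. (3.11) before minimization."""
    return kl(sigma, rho) + s * kl(sigma, tau)

def minimizer(rho, tau, s):
    """Normalized geometric mixture rho^alpha * tau^(1-alpha), alpha = 1/(1+s)."""
    alpha = 1 / (1 + s)
    w = [r**alpha * t**(1 - alpha) for r, t in zip(rho, tau)]
    z = sum(w)
    return [wi / z for wi in w]

def right_hand_side(rho, tau, s):
    """s * D_alpha(rho||tau) with alpha = 1/(1+s), classical case."""
    alpha = 1 / (1 + s)
    q = sum(r**alpha * t**(1 - alpha) for r, t in zip(rho, tau))
    return s * math.log(q) / (alpha - 1)

def random_pmf(k, rng):
    """A random full-support pmf on k points, used only for spot checks."""
    w = [rng.random() + 1e-3 for _ in range(k)]
    z = sum(w)
    return [wi / z for wi in w]

rho = [0.2, 0.3, 0.5]
tau = [0.6, 0.3, 0.1]
s = 0.7
attained = objective(minimizer(rho, tau, s), rho, tau, s)
target = right_hand_side(rho, tau, s)
```

The value `attained` matches `target` up to rounding, and randomly drawn candidates for $\sigma$ never fall below it.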

In the following, we provide useful mathematical properties. Most of them can be found in Refs. [99, 58,100].

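Lemma 3.2. Let $\rho, \sigma \in \mathcal{S}(\mathcal{H})$. Then,
\[
\alpha \mapsto \log Q_\alpha(\rho \| \sigma) \ \text{ and } \ \alpha \mapsto \log Q^{\flat}_\alpha(\rho \| \sigma) \ \text{ are convex on } (0, 1); \tag{3.12}
\]
\[
\alpha \mapsto D_\alpha(\rho \| \sigma) \ \text{ is continuous and monotone increasing on } [0, 1]; \tag{3.13}
\]
\[
\forall \alpha \in (0, 1), \ (\rho, \sigma) \mapsto Q^{\flat}_\alpha(\rho \| \sigma) \ \text{ is jointly concave on } \mathcal{S}(\mathcal{H}) \times \mathcal{S}(\mathcal{H}); \tag{3.14}
\]
\[
\forall \alpha \in [0, 1], \ \sigma \mapsto D_\alpha(\rho \| \sigma) \ \text{ is convex and lower semi-continuous on } \mathcal{S}(\mathcal{H}). \tag{3.15}
\]
For every $\rho \in \mathcal{S}(\mathcal{H})$ and $\gamma > 0$, the map
\[
(\alpha, \sigma) \mapsto Q_\alpha(\rho \| \sigma + \gamma \mathbb{1}) \ \text{ is continuous on } [0, 1] \times \mathcal{S}(\mathcal{H}). \tag{3.16}
\]
Moreover, for every $\rho \in \mathcal{S}(\mathcal{H})$, the map
\[
(\alpha, \sigma) \mapsto -Q_\alpha(\rho \| \sigma) \ \text{ is lower semi-continuous on } [0, 1] \times \mathcal{S}(\mathcal{H}), \tag{3.17}
\]
and the same argument holds for $D_\alpha$.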
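Properties (3.12) and (3.13), together with the limit in Eq. (3.9), are easy to observe numerically in the commuting case. The following sketch evaluates the classical Petz quantities on a grid of orders for two full-support probability vectors; it illustrates the statements for one example and is not a proof.

```python
import math

def log_q(p, q, alpha):
    """log Q_alpha(p||q) with Q_alpha = sum_w p(w)^alpha q(w)^(1-alpha)."""
    return math.log(sum(pi**alpha * qi**(1 - alpha) for pi, qi in zip(p, q)))

def renyi(p, q, alpha):
    """Classical D_alpha(p||q) for alpha different from 1."""
    return log_q(p, q, alpha) / (alpha - 1)

def kl(p, q):
    """Classical relative entropy, the alpha -> 1 limit of D_alpha."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

p = [0.2, 0.3, 0.5]
q = [0.6, 0.3, 0.1]
alphas = [k / 10 for k in range(1, 10)]
values = [renyi(p, q, a) for a in alphas]

# Monotonicity in alpha, as in Eq. (3.13).
is_increasing = all(values[i] <= values[i + 1] for i in range(len(values) - 1))
# Midpoint convexity of alpha -> log Q_alpha, as in Eq. (3.12).
is_midpoint_convex = log_q(p, q, 0.5) <= 0.5 * (log_q(p, q, 0.3) + log_q(p, q, 0.7)) + 1e-12
# The order approaches 1 from below, as in Eq. (3.9).
near_one = renyi(p, q, 1 - 1e-6)
```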

Proof of Lemma 3.2. We note that Eqs. (3.12), (3.13), (3.14), and (3.15) are proved in [99], [58, Lemma III.3, Lemma III.11, Theorem III.14, Corollary III.25], [100, Corollary 2.2]1. We only prove Eqs. (3.16) and (3.17).

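Fix arbitrary $\gamma > 0$, $\alpha_1 \in [0, 1]$, and $\rho, \sigma_1 \in \mathcal{S}(\mathcal{H})$, and let $\sigma_2$ and $\alpha_2$ satisfy $\|\sigma_1 - \sigma_2\| \le \varepsilon_1$ and $|\alpha_1 - \alpha_2| \le \varepsilon_2$. The triangle inequality implies that
\[
\begin{aligned}
\left| Q_{\alpha_1}(\rho \| \sigma_1 + \gamma \mathbb{1}) - Q_{\alpha_2}(\rho \| \sigma_2 + \gamma \mathbb{1}) \right| \le{} & \left| Q_{\alpha_1}(\rho \| \sigma_2 + \gamma \mathbb{1}) - Q_{\alpha_2}(\rho \| \sigma_2 + \gamma \mathbb{1}) \right| \\
& + \left| Q_{\alpha_1}(\rho \| \sigma_1 + \gamma \mathbb{1}) - Q_{\alpha_1}(\rho \| \sigma_2 + \gamma \mathbb{1}) \right|.
\end{aligned}
\tag{3.18}
\]
In the following, we upper bound each of the two terms on the right-hand side of Eq. (3.18).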

Without loss of generality, we assume α1 ≤ α2. Direct calculation shows that

|Qα1(ρkσ2+ γ1) − Qα2(ρkσ2+ γ1)| =

For sufficiently small $\varepsilon_2$, it follows that

where we denote by $\tilde{\lambda}_{\min}$ the smallest non-zero eigenvalue. Further, using [67, Problem III.6.14], we

Combining Eqs. (3.24) and (3.26) yields

Hence, Eqs. (3.23) and (3.29) give

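\[
\left| Q_{\alpha_1}(\rho \| \sigma_2 + \gamma \mathbb{1}) - Q_{\alpha_2}(\rho \| \sigma_2 + \gamma \mathbb{1}) \right| \le \varepsilon_2 \, d \, (1 + \gamma) \left[ \frac{\tilde{\lambda}_{\min}(\rho)}{1 + \gamma} \right] + o(\varepsilon_2). \tag{3.30}
\]
Next, we upper bound the second term in Eq. (3.18). Hölder's inequality given in Lemma 2.5 leads to

Then, we apply Lemma 2.12 in Section 2.1 to Eq. (3.32) to obtain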

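\[
\left| Q_{\alpha_1}(\rho \| \sigma_1 + \gamma \mathbb{1}) - Q_{\alpha_1}(\rho \| \sigma_2 + \gamma \mathbb{1}) \right| \le d \left[ (\varepsilon_1 + \gamma)^{1-\alpha_1} - \gamma^{1-\alpha_1} \right]. \tag{3.33}
\]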

Eqs. (3.18), (3.24) and (3.33) thus give

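$\left| Q_{\alpha_1}(\rho \| \sigma_1 + \gamma \mathbb{1}) - Q_{\alpha_2}(\rho \| \sigma_2 + \gamma \mathbb{1}) \right| \le \varepsilon_2$ concludes the continuity of $(\alpha, \sigma) \mapsto Q_\alpha(\rho \| \sigma + \gamma \mathbb{1})$. The assertion for $D_\alpha$ follows immediately.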

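Let $\mathcal{X} = \{1, 2, \ldots, |\mathcal{X}|\}$ be a finite alphabet, and let $\mathcal{P}(\mathcal{X})$ be the set of probability distributions on $\mathcal{X}$. Let $W : \mathcal{X} \to \mathcal{S}(\mathcal{H})$ be a c-q channel. We denote a c-q state by:
\[
P \circ W := \sum_{x \in \mathcal{X}} P(x) |x\rangle\langle x| \otimes W_x. \tag{3.35}
\]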

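Here, $P$ is regarded as a diagonal operator in the computational basis $(|x\rangle)_{x \in \mathcal{X}}$, i.e. $P = \sum_{x \in \mathcal{X}} P(x) |x\rangle\langle x|$.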

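We define the conditional quantum relative entropy of two sets of density operators $\bar{W}$, $W$ and $P \in \mathcal{P}(\mathcal{X})$ as
\[
D\left( \bar{W} \| W | P \right) := \sum_{x \in \mathcal{X}} P(x) D\left( \bar{W}_x \| W_x \right). \tag{3.36}
\]
Similarly, we define the following conditional entropic quantities for $\sigma \in \mathcal{S}(\mathcal{H})$ and $P \in \mathcal{P}(\mathcal{X})$:
\[
D(W \| \sigma | P) := \sum_{x \in \mathcal{X}} P(x) D(W_x \| \sigma), \tag{3.37}
\]
\[
D_\alpha(W \| \sigma | P) := \sum_{x \in \mathcal{X}} P(x) D_\alpha(W_x \| \sigma), \tag{3.38}
\]
\[
V(W \| \sigma | P) := \sum_{x \in \mathcal{X}} P(x) V(W_x \| \sigma), \tag{3.39}
\]
\[
\widetilde{V}(W \| \sigma | P) := \sum_{x \in \mathcal{X}} P(x) \widetilde{V}(W_x \| \sigma). \tag{3.40}
\]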
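The conditional quantities above are plain $P$-weighted averages, which the following sketch illustrates for Eq. (3.37) in the classical special case where every $W_x$ and $\sigma$ is a probability vector. The dictionary representation of the channel and the helper names are illustrative assumptions. When $\sigma$ is chosen as the output distribution $\sum_x P(x) W_x$, the average equals the classical mutual information, which gives a convenient check.

```python
import math

def kl(p, q):
    """Classical relative entropy D(p||q) for pmfs with supp(p) inside supp(q)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def conditional_divergence(W, sigma, P):
    """Classical analogue of Eq. (3.37): sum_x P(x) D(W_x || sigma)."""
    return sum(P[x] * kl(W[x], sigma) for x in P if P[x] > 0)

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# A binary-input, binary-output classical channel, stored as input -> output pmf.
W = {0: [0.9, 0.1], 1: [0.2, 0.8]}
P = {0: 0.5, 1: 0.5}

# Output distribution induced by P and W.
output = [sum(P[x] * W[x][y] for x in P) for y in range(2)]
mutual_information = entropy(output) - sum(P[x] * entropy(W[x]) for x in P)
at_output = conditional_divergence(W, output, P)
at_uniform = conditional_divergence(W, [0.5, 0.5], P)
```

Any other choice of $\sigma$ can only increase the average, since the difference is the relative entropy between the output distribution and $\sigma$.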