Large Deviation Theory - Error Rate Analysis in Quantum Information Theory

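\begin{align}
&\left\| (f(A) - f(B))_+ \oplus (f(B) - f(A))_+ \right\|_\infty \tag{2.44} \\
&\quad \le \left\| \left[ f((A - B)_+) - f(0)\mathbb{1} \right] \oplus \left[ f((B - A)_+) - f(0)\mathbb{1} \right] \right\|_\infty \tag{2.45} \\
&\quad = \left\| f(|A - B|) - f(0)\mathbb{1} \right\|_\infty \tag{2.46} \\
&\quad = f(\|A - B\|_\infty) - f(0). \tag{2.47}
\end{align}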

Lemma 2.13. [90, Corollary 3.6] Let $A_i$ be an $m \times m$ positive semi-definite matrix and $Z_i$ be an $n \times m$ matrix for $i = 1, \ldots, k$. Then, for all unitarily invariant norms $\|\cdot\|$ and $\gamma > 0$, the map

\[ (p, t) \mapsto \left\| \left( \sum_{i=1}^{k} Z_i^* A_i^{t/p} Z_i \right)^{\gamma p} \right\| \tag{2.48} \]

is jointly log-convex on $(0, +\infty) \times (-\infty, +\infty)$.

2.2 Large Deviation Theory

In this section, we will see that the Legendre-Fenchel transform is closely related to the error-exponent function of hypothesis testing and channel coding. Consider the following binary classical hypotheses:

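\[ \begin{aligned} H_0 &: p^n := p_{x_1} \otimes p_{x_2} \otimes \cdots \otimes p_{x_n}, \\ H_1 &: q^n := q_{x_1} \otimes q_{x_2} \otimes \cdots \otimes q_{x_n}, \end{aligned} \tag{2.49} \]

where $p_{x_i}$, $q_{x_i}$ are probability mass functions, each $x_i$ belongs to some finite alphabet $\mathcal{X}$, and $n \in \mathbb{N}$ is fixed. Given any $r \ge 0$, recall the definition of the error-exponent function in Eq. (4.7):

\[ \phi_n(r) = \phi_n(r | p^n \| q^n) = \sup_{\alpha \in (0,1]} \left\{ \frac{1-\alpha}{\alpha} \left( \frac{1}{n} D_\alpha(p^n \| q^n) - r \right) \right\}. \tag{2.50} \]

Without loss of generality, we assume that $p^n$ and $q^n$ have the same support, since elements of $q_{x_i}$ that do not lie in the support of $p_{x_i}$ do not contribute to $\phi_n(r)$.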
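To make the error-exponent function concrete, the following minimal sketch evaluates it numerically for a single pair of classical distributions ($n = 1$) on a common support. The function names, the example distributions, and the grid search over $\alpha$ are illustrative assumptions, not part of the original development; the grid only approximates the supremum.

```python
import math

def renyi_divergence(p, q, alpha):
    """Classical Renyi divergence D_alpha(p||q) in nats, for alpha in (0, 1]."""
    if abs(alpha - 1.0) < 1e-12:
        # alpha = 1 is the Kullback-Leibler divergence.
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    s = sum(pi**alpha * qi**(1 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log(s) / (alpha - 1)

def error_exponent(p, q, r, grid=2000):
    """Grid approximation of sup over alpha in (0, 1] of
    (1 - alpha) / alpha * (D_alpha(p||q) - r)."""
    best = 0.0  # alpha = 1 contributes exactly 0
    for k in range(1, grid):
        alpha = k / grid
        value = (1 - alpha) / alpha * (renyi_divergence(p, q, alpha) - r)
        best = max(best, value)
    return best

# Hypothetical example distributions with full common support.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
kl = renyi_divergence(p, q, 1.0)
for r in (0.05, 0.1, 0.2, kl + 0.1):
    print(r, error_exponent(p, q, r))
```

In this sketch the exponent is positive for rates below $D(p\|q)$ and vanishes above it, because $D_\alpha(p\|q) \le D(p\|q)$ for $\alpha \in (0,1]$.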

Let $Z$ be a random variable with probability measure $\mu$. Further, we assume $Z$ is finite on $\mathrm{supp}(\mu)$.

The cumulant generating function (c.g.f.) of $Z$ is defined as

\[ \Lambda(t) := \log \mathbb{E}_\mu\left[ e^{tZ} \right], \quad t \in \mathbb{R}. \tag{2.51} \]

The Legendre-Fenchel transform of $\Lambda(t)$ is

\[ \Lambda^*(z) := \sup_{t \in \mathbb{R}} \left\{ zt - \Lambda(t) \right\}. \tag{2.52} \]

Such a transform plays a significant role in concentration inequalities, convex analysis, and large deviation theory [27].
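As a sanity check of Eqs. (2.51) and (2.52), the sketch below computes the Legendre-Fenchel transform of the c.g.f. of a Bernoulli random variable numerically and compares it with the binary relative entropy, which is the well-known closed form in this special case. The Bernoulli example, the function names, and the ternary search with its bracketing interval are assumptions made for illustration.

```python
import math

def cgf_bernoulli(t, p):
    """Lambda(t) = log E[e^{tZ}] for Z ~ Bernoulli(p)."""
    return math.log(1 - p + p * math.exp(t))

def legendre_fenchel(z, cgf, lo=-50.0, hi=50.0, iters=200):
    """Approximate sup_t {z t - cgf(t)} by ternary search.

    The objective is concave in t because the c.g.f. is convex;
    the interval [lo, hi] is assumed to contain the maximizer.
    """
    g = lambda t: z * t - cgf(t)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if g(m1) < g(m2):
            lo = m1
        else:
            hi = m2
    return g((lo + hi) / 2)

def kl_bernoulli(z, p):
    """Binary relative entropy D(Bern(z) || Bern(p)) for z, p in (0, 1)."""
    return z * math.log(z / p) + (1 - z) * math.log((1 - z) / (1 - p))

p = 0.3
for z in (0.1, 0.3, 0.5, 0.8):
    numeric = legendre_fenchel(z, lambda t: cgf_bernoulli(t, p))
    print(z, numeric, kl_bernoulli(z, p))
```

The transform vanishes at the mean $z = p$ and grows as $z$ moves away from it, which is the behaviour exploited by the concentration inequalities later in this section.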

Let $P_{x^n}$ be the empirical distribution of the sequence $x^n = x_1 x_2 \ldots x_n$. Let $Z_0 = \log \frac{q^n}{p^n}$ with

Rewrite the right-hand side of Eq. (2.50) with $\alpha = \frac{1}{1+s}$, and observe that

Then the error-exponent function in Eq. (2.50) can also be viewed as a Legendre-Fenchel transform of $E_0^{(2)}(s, P_{x^n})$:

The following lemma relates $\phi_n(r)$ to $\Lambda^*_{j,P_{x^n}}(z)$, the Legendre-Fenchel transform of Eq. (2.53):

$\Lambda^*_{j,P_{x^n}}(z) := \sup$

xn be the optimizer of $\phi_n(r)$ in Eq. (2.57). The optimizer $t^\star \in (0, 1)$ is unique, and satisfies $\Lambda'_{0,P_{x^n}}(t^\star) = \phi_n(r) - r$.

Before proving Lemma 2.14, we will need the following partial derivatives with respect to t:

where we denote the tilted distributions for every $i \in [n]$ and $t \in [0, 1]$ by

\[ q_{x_i,t}(\omega) := \frac{p_{x_i}(\omega)^{1-t} q_{x_i}(\omega)^{t}}{\sum_{\omega \in \mathrm{supp}(p_{x_i})} p_{x_i}(\omega)^{1-t} q_{x_i}(\omega)^{t}}, \quad \omega \in \mathrm{supp}(p_{x_i}). \tag{2.61} \]

It is also easy to verify that

\[ \Lambda_{0,x_i}(t) = \Lambda_{1,x_i}(1 - t), \quad \Lambda'_{0,x_i}(t) = -\Lambda'_{1,x_i}(1 - t), \quad \Lambda''_{0,x_i}(t) = \Lambda''_{1,x_i}(1 - t). \tag{2.62} \]

This lemma closely follows Ref. [91, Lemma 9]; however, the major difference is that we prove the claim using $\phi_n(r|\rho^n\|\sigma^n)$ in Eq. (4.7) instead of the discrimination function $\min\{D(\tau\|\rho) : D(\tau\|\sigma) \le r\}$ in Eq. (9.20). This expression is crucial to obtaining the sphere-packing bound in Theorem 11.1 in the strong form, cf. Eq. (1.4), instead of the weak form, cf. Eq. (1.5).
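The tilted distributions of Eq. (2.61) and the symmetry relations of Eq. (2.62) can be checked numerically for a single letter $x$. The sketch below assumes the explicit forms $\Lambda_{0,x}(t) = \log \sum_\omega p_x(\omega)^{1-t} q_x(\omega)^t$ and $\Lambda_{1,x}(t) = \log \sum_\omega q_x(\omega)^{1-t} p_x(\omega)^t$, i.e. the c.g.f.s of the log-likelihood ratios under $p_x$ and $q_x$; these forms, the function names, and the example distributions are assumptions made for illustration only.

```python
import math

def cgf0(p, q, t):
    """Assumed form: Lambda_0(t) = log sum_w p(w)^(1-t) q(w)^t."""
    return math.log(sum(pw**(1 - t) * qw**t for pw, qw in zip(p, q)))

def cgf1(p, q, t):
    """Assumed form: Lambda_1(t) = log sum_w q(w)^(1-t) p(w)^t."""
    return cgf0(q, p, t)

def tilted(p, q, t):
    """Tilted distribution of Eq. (2.61) on a common support."""
    weights = [pw**(1 - t) * qw**t for pw, qw in zip(p, q)]
    total = sum(weights)
    return [w / total for w in weights]

def tilted_mean_and_variance(p, q, t):
    """Mean and variance of log(q/p) under the tilted distribution."""
    qt = tilted(p, q, t)
    llr = [math.log(qw / pw) for pw, qw in zip(p, q)]
    mean = sum(w * x for w, x in zip(qt, llr))
    var = sum(w * (x - mean)**2 for w, x in zip(qt, llr))
    return mean, var

# Hypothetical example distributions with full common support.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
t = 0.3
print(cgf0(p, q, t), cgf1(p, q, 1 - t))   # first relation in Eq. (2.62)
print(tilted_mean_and_variance(p, q, t))  # first and second derivative of Lambda_0
```

Under these assumed forms, the first derivative of $\Lambda_{0,x}$ is the mean of $\log(q_x/p_x)$ under the tilted distribution and the second derivative is its variance, which is strictly positive whenever $p_x \neq q_x$.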

Proof of Lemma 2.14.

(2.14-(a)) We will prove this statement by contradiction. Let $t \in [0, 1]$. The assumption $\Lambda''_{0,P_{x^n}}(t) = 0$ implies $\Lambda''_{0,x}(t) = 0$ for all $x \in \mathrm{supp}(P_{x^n})$. Recall from Eq. (2.60)

which is equivalent to

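\[ p_x(\omega) = q_x(\omega) \cdot e^{-\Lambda'_{0,x}(t)}, \quad \forall \omega \in \mathrm{supp}(p_x). \tag{2.64} \]

Summing both sides of Eq. (2.64) over $\omega \in \mathrm{supp}(p_x)$ gives

\[ 1 = \mathrm{Tr}\left[ p_x^0 q_x \right] e^{-\Lambda'_{0,x}(t)}. \tag{2.65} \]

Then, Eqs. (2.64) and (2.65) imply that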

φn(r) = sup

(2.14-(b)) Observe that E₀⁽²⁾(s, Pxⁿ) − sr in Eq. (2.57) is strictly concave in s ∈ R≥0 since

owing to Eqs. (2.56), (2.60), and Lemma 2.14-(a). Moreover, $s = 0$ cannot be an optimum in Eq. (2.57); otherwise, it would violate the assumption $\phi_n(r) \ge 0$. Thus a unique maximizer $s^\star \in \mathbb{R}_{>0}$ exists such that

where in the second equality we use Eq. (2.56) and

r = ∂E₀⁽²⁾(s, Pxⁿ)

Comparing Eq. (2.71) with (2.73) gives

Λ⁰_0,P

which is exactly the optimum solution to $\Lambda^*_{0,P_{x^n}}(z)$ in Eq. (2.58) with

t^?= s^?

where Eqs. (2.74) and (2.71) are used in the third and last equalities.

(2.14-(c)) This proof follows from similar arguments in item (b) and Eq. (2.62). Eqs. (2.74) and (2.62) lead to

which satisfies the optimum solution to $\Lambda^*_{1,P_{x^n}}(z)$ in Eq. (2.58) with $t^\star = \frac{1}{1+s^\star} \in (0, 1)$ and

where the third equality is due to Eq. (2.81), and the last equality follows from Eqs. (2.62) and (2.73).

(2.14-(d)) The fact that a unique optimizer $t^\star \in (0, 1)$ exists such that $\Lambda'_{0,P_{x^n}}(t^\star) = \phi_n(r) - r$ follows directly from Eqs. (2.74), (2.75) and $\Lambda''_{0,P_{x^n}}(t) > 0$ for $t \in [0, 1]$.

Moreover, Eqs. (2.70), (2.72), and (2.69) yield

−∂φn(r) which completes the claim in item (d).

Let $(Z_i)_{i=1}^n$ be a sequence of independent, real-valued random variables with probability measures $(\mu_i)_{i=1}^n$. Let $\Lambda_i(t) := \log \mathbb{E}\left[ e^{tZ_i} \right]$ and define the Legendre-Fenchel transform of $\frac{1}{n}\sum_{i=1}^n \Lambda_i(\cdot)$ to be:

Define the probability measure $\tilde{\mu}_i$ via $\mathrm{d}\tilde{\mu}_i$

inequality for $\frac{1}{n}\sum_{i=1}^n Z_i$:

Theorem 2.1 (Bahadur-Ranga Rao's Concentration Inequality [91, Proposition 5], [48]). Provided that √

Chaganty and Sethuraman in Ref. [49, Theorem 3.3] considered a more general sequence of random variables $\{Z_n\}_{n \in \mathbb{N}}$, which are not necessarily sums of random variables.

Let $(X_i)_{i \in \mathbb{N}}$ be a sequence of independent, real-valued random variables with probability measures $(\mu_i)_{i=1}^n$. Let $Z_n := \sum_{i=1}^n X_i$ and let $\Lambda_n(t) := \log \mathbb{E}\left[ e^{tZ_n} \right]$. Define the Legendre-Fenchel transform of

With these definitions, we can now state the following sharp concentration inequality for $\frac{1}{n} Z_n$:

Theorem 2.2 (Chaganty-Sethuraman's Concentration Inequality [49, Theorem 3.3]). For any $\eta \in (0, 1)$, there exists an $N_0 \in \mathbb{N}$ such that, for all $n \ge N_0$,

Remark 2.1. Chaganty and Sethuraman proved Theorem 2.2 provided that the following condition is satisfied: there exists $\delta_0 > 0$ such that for any $\delta$ and $\lambda$ with $0 < \delta < \delta_0 < \lambda$,

\[ \sup_{\delta < |t| \le \lambda t^\star_n} \left| \frac{\exp\{\Lambda_n(t^\star_n + it)\}}{\exp\{\Lambda_n(t^\star_n)\}} \right| = o\!\left( \frac{1}{\sqrt{n}} \right), \]

where the supremum is defined to be 0 if $\{t : \delta < |t| \le \lambda t^\star_n\}$ is empty. In the case of $Z_n$ being a sum of random variables, $\exp\{\Lambda_n(t^\star_n + it)\} / \exp\{\Lambda_n(t^\star_n)\}$ is the product of the characteristic functions of $\{X_i\}_{i=1}^n$. Since the supremum of a characteristic function on a compact interval not containing 0 is less than 1, this condition is thus satisfied.

We note that the lower bound in Theorem 2.2 for the general sequence of random variables $(X_i)_{i \in \mathbb{N}}$ suffices to establish the converse bound in moderate deviation analysis for c-q channel coding, Theorem 12.2 in Chapter 12 later. We do not particularly consider the case of lattice-valued random variables (see e.g. [49, Theorem 3.5]).

3 Quantum Entropic Quantities and Notation

In this chapter, we introduce the necessary notation in quantum information theory. In Section 3.1, we present various quantum generalizations of the classical Rényi divergence [92] and their mathematical properties. As we will see in quantum hypothesis testing discussed in Chapter 4, some specific definitions of the quantum Rényi divergence naturally arise in the exponent function. In Sections 3.2 and 3.3, we define the conditional Rényi entropies and Rényi mutual information, which play significant roles in Parts II and III, respectively. We refer interested readers to the books [93,50,8] for more comprehensive discussions.

Notation. Throughout this thesis, we consider a finite-dimensional Hilbert space $\mathcal{H}$. The set of density operators (i.e. positive semi-definite operators with unit trace) and the set of full-rank density operators on $\mathcal{H}$ are denoted by $\mathcal{S}(\mathcal{H})$ and $\mathcal{S}(\mathcal{H})_{>0}$, respectively. For $\rho, \sigma \in \mathcal{S}(\mathcal{H})$, we write $\rho \ll \sigma$ if the support of $\rho$ is contained in the support of $\sigma$. The identity operator on $\mathcal{H}$ is denoted by $\mathbb{1}_{\mathcal{H}}$. If there is no possibility of confusion, we will skip the subscript $\mathcal{H}$. We use $\mathrm{Tr}[\,\cdot\,]$ as the standard trace function. Let $\mathbb{N}$, $\mathbb{R}$, $\mathbb{R}_{\ge 0}$, and $\mathbb{R}_{>0}$ denote the sets of natural numbers, real numbers, non-negative real numbers, and positive real numbers, respectively. Define $[n] := \{1, 2, \ldots, n\}$ for $n \in \mathbb{N}$.

For a positive semi-definite operator $A$ whose spectral decomposition is $A = \sum_i a_i P_i$, where $(a_i)_i$ and $(P_i)_i$ are the eigenvalues and eigenprojections of $A$, its power is defined as $A^p := \sum_{i : a_i \neq 0} a_i^p P_i$. In particular, $A^0$ denotes the projection onto the support of $A$. We use $\mathrm{supp}(A)$ to denote the support of the operator $A$. Further, $A \perp B$ means $\mathrm{supp}(A) \cap \mathrm{supp}(B) = \emptyset$.

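Given a pair of positive semi-definite operators $\rho, \sigma \in \mathcal{S}(\mathcal{H})$, we define the quantum relative entropy [94,95] as

\[ D(\rho\|\sigma) := \mathrm{Tr}\left[ \rho \left( \log\rho - \log\sigma \right) \right]. \tag{3.1} \]

We define two types of quantum relative entropy variances [14,15,16] by

\[ V(\rho\|\sigma) := \mathrm{Tr}\left[ \rho \left( \log\rho - \log\sigma \right)^2 \right] - D(\rho\|\sigma)^2, \tag{3.2} \]

\[ \widetilde{V}(\rho\|\sigma) := \int_0^1 \mathrm{d}t \, \mathrm{Tr}\left[ \rho^{1-t} (\log\rho - \log\sigma) \rho^{t} (\log\rho - \log\sigma) \right] - D(\rho\|\sigma)^2. \tag{3.3} \]

They are defined to be $+\infty$ when $\rho \not\ll \sigma$. We note that when $\rho$ and $\sigma$ commute, $D(\rho\|\sigma)$ reduces to the classical Kullback-Leibler divergence [96]. It is well known that both quantities are non-negative, and $D(\rho\|\sigma) = 0$ if and only if $\rho = \sigma$, which in turn shows that

\[ V(\rho\|\sigma) > 0 \ \text{implies} \ D(\rho\|\sigma) > 0. \tag{3.4} \]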
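In the commuting case, $\rho$ and $\sigma$ can be replaced by probability vectors, and both variances in Eqs. (3.2) and (3.3) reduce to the variance of the log-likelihood ratio under $\rho$. The following minimal sketch illustrates Eq. (3.4) in that classical setting; the function names and example distributions are hypothetical, and the support of the first argument is assumed to lie inside that of the second.

```python
import math

def relative_entropy(p, q):
    """Classical relative entropy D(p||q) in nats, assuming supp(p) within supp(q)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def relative_entropy_variance(p, q):
    """Classical counterpart of Eq. (3.2): Var_p[log(p/q)]."""
    d = relative_entropy(p, q)
    second_moment = sum(pi * math.log(pi / qi)**2 for pi, qi in zip(p, q) if pi > 0)
    return second_moment - d**2

p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
print(relative_entropy(p, q), relative_entropy_variance(p, q))

# The converse of Eq. (3.4) fails: D > 0 while V = 0.
print(relative_entropy([1.0, 0.0], [0.5, 0.5]),
      relative_entropy_variance([1.0, 0.0], [0.5, 0.5]))
```

The second example shows that a strictly positive relative entropy does not force a positive variance, so the implication in Eq. (3.4) only runs in one direction.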

3.1 Quantum Rényi divergence

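For density operators $\rho, \sigma \in \mathcal{S}(\mathcal{H})_{>0}$ and every $\alpha \in [0, 1)$, we define the following two families of quantum Rényi divergences [59,57,58]:

\[ D_\alpha(\rho\|\sigma) := \frac{1}{\alpha - 1} \log Q_\alpha(\rho\|\sigma), \quad Q_\alpha(\rho\|\sigma) := \mathrm{Tr}\left[ \rho^{\alpha} \sigma^{1-\alpha} \right]; \tag{3.5} \]

\[ D^\flat_\alpha(\rho\|\sigma) := \frac{1}{\alpha - 1} \log Q^\flat_\alpha(\rho\|\sigma), \quad Q^\flat_\alpha(\rho\|\sigma) := \mathrm{Tr}\left[ e^{\alpha \log\rho + (1-\alpha) \log\sigma} \right]. \tag{3.6} \]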

We term the above quantities the (Petz) α-Rényi divergence and the log-Euclidean α-Rényi divergence, respectively. The log-Euclidean Rényi divergence arises from the log-Euclidean operator mean (also called the chaotic mean): $A \diamond_\alpha B := \exp\left( (1 - \alpha) \log A + \alpha \log B \right)$ for $0 \le \alpha \le 1$. For general density operators $\rho, \sigma \in \mathcal{S}(\mathcal{H})$, the above definitions can be extended as

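\[ Q_\alpha(\rho\|\sigma) := \lim_{\delta \searrow 0} Q_\alpha(\rho + \delta\mathbb{1} \| \sigma + \delta\mathbb{1}) \quad \text{and} \quad Q^\flat_\alpha(\rho\|\sigma) := \lim_{\delta \searrow 0} Q^\flat_\alpha(\rho + \delta\mathbb{1} \| \sigma + \delta\mathbb{1}). \tag{3.7} \]

For $\alpha = 1$, we define (see e.g. [58, Lemma III.4]):

\[ Q_1(\rho\|\sigma) := \mathrm{Tr}\left[ \rho \sigma^0 \right] \quad \text{and} \quad Q^\flat_1(\rho\|\sigma) := \mathrm{Tr}\left[ \rho \sigma^0 \right]; \tag{3.8} \]

\[ D_1(\rho\|\sigma) := \lim_{\alpha \to 1} D_\alpha(\rho\|\sigma) = D(\rho\|\sigma) \quad \text{and} \quad D^\flat_1(\rho\|\sigma) := \lim_{\alpha \to 1} D^\flat_\alpha(\rho\|\sigma) = D(\rho\|\sigma). \tag{3.9} \]

In addition, these two quantities are related by the Golden-Thompson inequality given in Lemma 2.7:

\[ Q^\flat_\alpha(\rho\|\sigma) \le Q_\alpha(\rho\|\sigma), \quad \forall \alpha \in [0, 1]. \tag{3.10} \]

The log-Euclidean Rényi divergence is closely related to the quantum version of the Hellinger arc in statistics [97,98], [58, Section III]. Lemma 3.1 will be useful to prove the variational representations in Sections 5.1 and 9.1 later.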
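The ordering in Eq. (3.10) can be observed numerically on a small example. The sketch below handles only real symmetric $2 \times 2$ full-rank density matrices, using a closed-form rotation to apply scalar functions to the eigenvalues; the helper names and the example states are hypothetical and chosen purely for illustration.

```python
import math

def apply_fn(mat, fn):
    """Apply fn to the eigenvalues of a real symmetric 2x2 matrix ((a, b), (b, c))."""
    (a, b), (_, c) = mat
    theta = 0.5 * math.atan2(2 * b, a - c)  # rotation angle that diagonalizes mat
    cs, sn = math.cos(theta), math.sin(theta)
    lam1 = a * cs * cs + 2 * b * cs * sn + c * sn * sn  # eigenvalue for (cs, sn)
    lam2 = a * sn * sn - 2 * b * cs * sn + c * cs * cs  # eigenvalue for (-sn, cs)
    f1, f2 = fn(lam1), fn(lam2)
    off = (f1 - f2) * cs * sn
    return ((f1 * cs * cs + f2 * sn * sn, off),
            (off, f1 * sn * sn + f2 * cs * cs))

def trace_product(x, y):
    """Tr[x y] for 2x2 matrices."""
    return sum(x[i][j] * y[j][i] for i in range(2) for j in range(2))

def petz_q(rho, sigma, alpha):
    """Q_alpha = Tr[rho^alpha sigma^(1 - alpha)], as in Eq. (3.5)."""
    return trace_product(apply_fn(rho, lambda v: v**alpha),
                         apply_fn(sigma, lambda v: v**(1 - alpha)))

def log_euclidean_q(rho, sigma, alpha):
    """Q^flat_alpha = Tr exp(alpha log rho + (1 - alpha) log sigma), as in Eq. (3.6)."""
    lr, ls = apply_fn(rho, math.log), apply_fn(sigma, math.log)
    mix = tuple(tuple(alpha * lr[i][j] + (1 - alpha) * ls[i][j] for j in range(2))
                for i in range(2))
    e = apply_fn(mix, math.exp)
    return e[0][0] + e[1][1]

# Hypothetical full-rank, non-commuting density matrices with unit trace.
rho = ((0.7, 0.2), (0.2, 0.3))
sigma = ((0.4, -0.1), (-0.1, 0.6))
for alpha in (0.25, 0.5, 0.75):
    print(alpha, log_euclidean_q(rho, sigma, alpha), petz_q(rho, sigma, alpha))
```

For commuting (diagonal) inputs the two quantities coincide, consistent with both divergences reducing to the classical Rényi divergence in that case.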

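Lemma 3.1 ([58, Theorem III.5]). Let $\rho, \tau \in \mathcal{S}(\mathcal{H})$ with $\rho \ll \tau$. For all $s > -1$, it follows that

\[ \min_{\sigma \in \mathcal{S}(\mathcal{H})} \left\{ D(\sigma\|\rho) + s D(\sigma\|\tau) \right\} = s D^\flat_{\frac{1}{1+s}}(\rho\|\tau). \tag{3.11} \]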

In the following, we provide useful mathematical properties. Most of them can be found in Refs. [99, 58,100].

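Lemma 3.2. Let $\rho, \sigma \in \mathcal{S}(\mathcal{H})$. Then,

\[ \alpha \mapsto \log Q_\alpha(\rho\|\sigma) \ \text{and} \ \alpha \mapsto \log Q^\flat_\alpha(\rho\|\sigma) \ \text{are convex on } (0, 1); \tag{3.12} \]

\[ \alpha \mapsto D_\alpha(\rho\|\sigma) \ \text{is continuous and monotone increasing on } [0, 1]; \tag{3.13} \]

\[ \forall \alpha \in (0, 1), \ (\rho, \sigma) \mapsto Q^\flat_\alpha(\rho\|\sigma) \ \text{is jointly concave on } \mathcal{S}(\mathcal{H}) \times \mathcal{S}(\mathcal{H}); \tag{3.14} \]

\[ \forall \alpha \in [0, 1], \ \sigma \mapsto D_\alpha(\rho\|\sigma) \ \text{is convex and lower semi-continuous on } \mathcal{S}(\mathcal{H}). \tag{3.15} \]

For every $\rho \in \mathcal{S}(\mathcal{H})$ and $\gamma > 0$, the map

\[ (\alpha, \sigma) \mapsto Q_\alpha(\rho\|\sigma + \gamma\mathbb{1}) \ \text{is continuous on } [0, 1] \times \mathcal{S}(\mathcal{H}). \tag{3.16} \]

Moreover, for every $\rho \in \mathcal{S}(\mathcal{H})$, the map

\[ (\alpha, \sigma) \mapsto -Q_\alpha(\rho\|\sigma) \ \text{is lower semi-continuous on } [0, 1] \times \mathcal{S}(\mathcal{H}), \tag{3.17} \]

and the same argument holds for $D_\alpha$.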
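For commuting states, the first two properties of Lemma 3.2 can be checked directly on probability vectors. The sketch below is a numerical illustration rather than a proof: it evaluates the classical $\log Q_\alpha$ and $D_\alpha$ on a grid of $\alpha$ values. The function names and example distributions are hypothetical, and full support is assumed.

```python
import math

def log_q(p, q, alpha):
    """log Q_alpha(p||q) = log sum_w p(w)^alpha q(w)^(1 - alpha), full support assumed."""
    return math.log(sum(pi**alpha * qi**(1 - alpha) for pi, qi in zip(p, q)))

def renyi(p, q, alpha):
    """Classical Renyi divergence D_alpha(p||q) for alpha in [0, 1)."""
    return log_q(p, q, alpha) / (alpha - 1)

def kl(p, q):
    """Kullback-Leibler divergence D(p||q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
alphas = [k / 10 for k in range(1, 10)]

# Eq. (3.13): D_alpha is monotone increasing in alpha.
print([round(renyi(p, q, a), 6) for a in alphas])

# Eq. (3.9): D_alpha approaches the relative entropy as alpha tends to 1.
print(renyi(p, q, 1 - 1e-6), kl(p, q))

# Eq. (3.12): midpoint convexity of alpha -> log Q_alpha.
print(log_q(p, q, 0.5), (log_q(p, q, 0.2) + log_q(p, q, 0.8)) / 2)
```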

Proof of Lemma 3.2. We note that Eqs. (3.12), (3.13), (3.14), and (3.15) are proved in [99], [58, Lemma III.3, Lemma III.11, Theorem III.14, Corollary III.25], [100, Corollary 2.2]¹. We only prove Eqs. (3.16) and (3.17).

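Fix arbitrary $\gamma > 0$, $\alpha_1 \in [0, 1]$, $\rho, \sigma_1 \in \mathcal{S}(\mathcal{H})$, $\|\sigma_1 - \sigma_2\|_\infty \le \varepsilon_1$, and $|\alpha_1 - \alpha_2| \le \varepsilon_2$. The triangle inequality implies that

\begin{align}
\left| Q_{\alpha_1}(\rho\|\sigma_1 + \gamma\mathbb{1}) - Q_{\alpha_2}(\rho\|\sigma_2 + \gamma\mathbb{1}) \right| &\le \left| Q_{\alpha_1}(\rho\|\sigma_2 + \gamma\mathbb{1}) - Q_{\alpha_2}(\rho\|\sigma_2 + \gamma\mathbb{1}) \right| \notag \\
&\quad + \left| Q_{\alpha_1}(\rho\|\sigma_1 + \gamma\mathbb{1}) - Q_{\alpha_1}(\rho\|\sigma_2 + \gamma\mathbb{1}) \right|. \tag{3.18}
\end{align}

In the following, we upper bound the two terms on the right-hand side of Eq. (3.18), respectively.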

Without loss of generality, we assume α1 ≤ α₂. Direct calculation shows that

|Q_α₁(ρkσ2+ γ1) − Qα2(ρkσ2+ γ1)| =

For sufficiently small $\varepsilon_2$, it follows that

where we denote by ˜λmin the smallest non-zero eigenvalue. Further, using [67, Problem III.6.14], we

Combining Eqs. (3.24) and (3.26) yields

Hence, Eqs. (3.23) and (3.29) give

|Q_α₁(ρkσ2+ γ1) − Qα2(ρkσ2+ γ1)| ≤ ε2d(1 + γ)

" ˜λ_min(ρ) 1 + γ

+ o(ε2). (3.30)

Next, we upper bound the second term in Eq. (3.18). Hölder's inequality given in Lemma 2.5 leads to

Then, we apply Lemma 2.12 in Section 2.1 to Eq. (3.32) to obtain

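\[ \left| Q_{\alpha_1}(\rho\|\sigma_1 + \gamma\mathbb{1}) - Q_{\alpha_1}(\rho\|\sigma_2 + \gamma\mathbb{1}) \right| \le d \left[ (\varepsilon_1 + \gamma)^{1-\alpha_1} - \gamma^{1-\alpha_1} \right]. \tag{3.33} \]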

Eqs. (3.18), (3.24) and (3.33) thus give

|Q_α₁(ρkσ₁+ γ1) − Qα2(ρkσ₂+ γ1)| ≤ ε2 concludes the continuity of $(\alpha, \sigma) \mapsto Q_\alpha(\rho\|\sigma + \gamma\mathbb{1})$. The assertion for $D_\alpha$ follows immediately.

Let $\mathcal{X} = \{1, 2, \ldots, |\mathcal{X}|\}$ be a finite alphabet, and let $\mathcal{P}(\mathcal{X})$ be the set of probability distributions on $\mathcal{X}$. Let $W : \mathcal{X} \to \mathcal{S}(\mathcal{H})$ be a c-q channel. We denote a c-q state by:

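\[ P \circ W := \sum_{x \in \mathcal{X}} P(x) |x\rangle\langle x| \otimes W_x. \tag{3.35} \]

Here, the distribution $P$ is identified with its representation in the computational basis $(|x\rangle)_{x \in \mathcal{X}}$, i.e. $P = \sum_{x \in \mathcal{X}} P(x) |x\rangle\langle x|$.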

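We define the conditional quantum relative entropy of two sets of density operators $\bar{W}$, $W$ and $P \in \mathcal{P}(\mathcal{X})$ as

\[ D\left( \bar{W} \| W | P \right) := \sum_{x \in \mathcal{X}} P(x) D\left( \bar{W}_x \| W_x \right). \tag{3.36} \]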

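Similarly, we define the following conditional entropic quantities for $\sigma \in \mathcal{S}(\mathcal{H})$ and $P \in \mathcal{P}(\mathcal{X})$:

\begin{align}
D(W\|\sigma|P) &:= \sum_{x \in \mathcal{X}} P(x) D(W_x\|\sigma), \tag{3.37} \\
D_\alpha(W\|\sigma|P) &:= \sum_{x \in \mathcal{X}} P(x) D_\alpha(W_x\|\sigma), \tag{3.38} \\
V(W\|\sigma|P) &:= \sum_{x \in \mathcal{X}} P(x) V(W_x\|\sigma), \tag{3.39} \\
\widetilde{V}(W\|\sigma|P) &:= \sum_{x \in \mathcal{X}} P(x) \widetilde{V}(W_x\|\sigma). \tag{3.40}
\end{align}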

From the document: Error Rate Analysis in Quantum Information Theory (pages 29-39)