
Hypercontractivity and the logarithmic Sobolev constant

Since Gross introduced the notions of the logarithmic Sobolev constant and of hypercontractivity, many techniques have been developed to compute the logarithmic Sobolev constant. Hypercontractivity has proved useful in bounding the convergence rate of Markov chains to stationarity. An informative account of the development of logarithmic Sobolev inequalities can be found in the survey paper [14].

In Section 2.1, we define the logarithmic Sobolev constant and use it to bound the entropy of a Markov chain. In Section 2.2, we show how hypercontractivity can be used to bound the ℓ^p-distance and the ℓ^p-mixing time. In Section 2.3, diverse techniques for the estimation of the logarithmic Sobolev constant are introduced. In Section 2.4, we determine the explicit value of the logarithmic Sobolev constant for some examples.

2.1 The logarithmic Sobolev constant

The definition of the logarithmic Sobolev constant is very similar to that of the spectral gap. To motivate this constant, let us start by looking at the relative entropy of a continuous-time Markov chain. Let (X, K, π) be an irreducible Markov chain, H_t be the associated continuous-time semigroup of K and E be the Dirichlet form. Recall that the entropy of a probability measure µ

with respect to π is defined by

$$\operatorname{Ent}_\pi(\mu) = \pi(h\log h),$$

where µ = hπ. Here we abuse the notation Ent by letting

$$\operatorname{Ent}_\pi(f) = \pi(f\log f),$$

if f is any nonnegative function, not necessarily a probability density. A simple computation shows that, for any probability measure µ = hπ with h ≢ 1,

$$\frac{d}{dt}\operatorname{Ent}_\pi(H_t h) \le -2\,\mathcal{E}\big(\sqrt{H_t h},\sqrt{H_t h}\big), \tag{2.1}$$

where the inequality is proved by Diaconis and Saloff-Coste in [11, Lemma 2.7] and has an improved coefficient 4 instead of 2 if K is assumed reversible. By (2.1), one can see that a bound on the ratio Ent_π(H_t h)/E(√(H_t h), √(H_t h)) suffices to give a bound on the rate of convergence. To define the logarithmic Sobolev constant, we need to replace the variance by the following entropy-like quantity:

$$L(f) = L_\pi(f) = \sum_{x\in\mathcal{X}} f(x)^2\,\log\frac{f(x)^2}{\|f\|_2^2}\,\pi(x). \tag{2.2}$$

Since u ↦ u log u is convex, Jensen's inequality implies that L(f) is nonnegative.

Furthermore, if π is positive everywhere, then L(f ) = 0 if and only if f is constant.

Note that if ‖f‖₂ = 1, that is, f² is the probability density of µ = f²π with respect to π, then

L(f) = Ent_π(µ).

Definition 2.1. Let (X, K, π) be an irreducible Markov chain and L be the functional defined in (2.2). The logarithmic Sobolev constant α = α(K) is defined by

$$\alpha = \inf\left\{\frac{\mathcal{E}(f,f)}{L(f)} : L(f) \neq 0\right\}.$$

By definition, it is clear that α(K) = α(K*), where K* is the adjoint of K on ℓ²(π). Obviously, one has L(f) = L(|f|) and E(|f|, |f|) ≤ E(f, f). By these facts, the logarithmic Sobolev constant can be obtained by taking the infimum of the ratio E(f, f)/L(f) over all nonnegative functions f.
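Because X is finite, Definition 2.1 is a finite-dimensional variational problem and can be explored numerically. The following Python sketch (an illustration added here, not part of the original text; the 3-state kernel, the restarts and the tolerances are arbitrary choices) estimates α by minimizing E(f, f)/L(f) over nonnegative functions, which loses nothing by the remark above.

```python
import numpy as np
from scipy.optimize import minimize

def log_sob_constant(K, pi, restarts=30, seed=0):
    """Crude numerical estimate of alpha = inf E(f,f)/L(f) (Definition 2.1),
    minimizing over nonnegative functions f."""
    n = len(pi)
    def ratio(f):
        E = pi @ ((f - K @ f) * f)           # E(f,f) = <(I-K)f, f>_pi for real f
        n2 = pi @ f**2
        L = pi @ (f**2 * np.log(f**2 / n2))  # L(f) as in (2.2)
        return E / L
    rng = np.random.default_rng(seed)
    return min(minimize(ratio, rng.uniform(0.1, 2.0, n),
                        bounds=[(1e-6, None)] * n).fun
               for _ in range(restarts))

# Example: a reversible 3-state birth-death chain (illustrative numbers).
K = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = np.array([0.25, 0.50, 0.25])
print(log_sob_constant(K, pi))
```

For this symmetric example the estimate should approach λ/2 = 0.25, which is always an upper bound (Proposition 2.2 below); Theorem 2.3 below gives a case where the infimum is strictly smaller than λ/2.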

Definition 2.2. Let (X, K, π) be an irreducible Markov chain and E be the Dirichlet form. A logarithmic Sobolev inequality is an inequality of the following type:

$$C\,L(f) \le \mathcal{E}(f,f), \quad\text{for all functions } f,$$

where C is a nonnegative constant.

By the above definition, if the logarithmic Sobolev inequality holds for some constant C ≥ 0, then α ≥ C. In other words, α is the largest constant C such that the logarithmic Sobolev inequality holds. One might worry about the existence of a function f such that the ratio E(f, f)/L(f) equals 0, in which case the logarithmic Sobolev inequality would hold only with C = 0. It has been proved that irreducibility eliminates this possibility; thus, one needs to consider only the case C > 0 in Definition 2.2. For a proof of the fact α > 0, see Proposition 2.3. By (2.1), the entropy of a continuous-time Markov chain is bounded from above as follows.

Proposition 2.1. Let (X, K, π) be an irreducible Markov chain, H_t be the associated semigroup and α be the logarithmic Sobolev constant. Let µ be a probability measure on X. Then one has

$$\operatorname{Ent}_\pi(\mu H_t) \le e^{-2\alpha t}\operatorname{Ent}_\pi(\mu),$$

and the coefficient 2 in the exponent can be improved to 4 if K is reversible. In particular, Ent_π(H_t(x, ·)) ≤ e^{−2αt} log(1/π(x)).

Proof. The first part follows by combining (2.1) with the definition of α; see the sketch below. The same argument, using the improved coefficient 4 in (2.1), works for the reversible case. The second part follows by letting µ = δ_x, where δ_x(y) = 1 if y = x and δ_x(y) = 0 otherwise, so that Ent_π(δ_x) = log(1/π(x)).
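The body of the proof is compressed in this copy. The following display is a sketch of the standard argument (a reconstruction from (2.1) and Definition 2.1, not the verbatim original): writing h_t = H_t h for the density of µH_t and noting that ‖√h_t‖₂ = 1,

$$\frac{d}{dt}\operatorname{Ent}_\pi(h_t) \le -2\,\mathcal{E}\big(\sqrt{h_t},\sqrt{h_t}\big) \le -2\alpha\,L\big(\sqrt{h_t}\big) = -2\alpha\,\operatorname{Ent}_\pi(h_t),$$

and Grönwall's lemma yields Ent_π(h_t) ≤ e^{−2αt} Ent_π(h₀).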

By applying Proposition 1.9 and Proposition 2.1, one may give an upper bound on the total variation distance.

Corollary 2.1. Let (X, K, π) be an irreducible Markov chain and α be the logarithmic Sobolev constant of K. Then, for t > 0,

$$d_{\pi,1}(H_t(x,\cdot),\pi) \le \sqrt{2\log(1/\pi(x))}\;e^{-\alpha t},$$

and, if K is reversible,

$$d_{\pi,1}(H_t(x,\cdot),\pi) \le \sqrt{2\log(1/\pi(x))}\;e^{-2\alpha t}.$$

2.2 Hypercontractivity

In the previous section, the entropy and the ℓ¹-distance of a continuous-time Markov chain were proved to converge exponentially with rate at least the logarithmic Sobolev constant. It is natural to consider using the logarithmic Sobolev constant to bound the ℓ^p-distance. The following theorem is the well-known hypercontractivity introduced in [14], which is sufficient to derive a bound on the ℓ^p-distance.

Theorem 2.1 (Theorem 3.5 in [11]). Let (X, K, π) be an irreducible Markov chain and α be the logarithmic Sobolev constant of K.

(1) Assume that there exists β > 0 such that ‖H_t‖_{2→q} ≤ 1 for all t > 0 and 2 ≤ q < ∞ satisfying e^{4βt} ≥ q − 1. Then βL(f) ≤ E(f, f) for all f, and thus α ≥ β.

(2) Assume that (K, π) is reversible. Then ‖H_t‖_{2→q} ≤ 1 for all t > 0 and 2 ≤ q < ∞ satisfying e^{4αt} ≥ q − 1.

(3) For non-reversible chains, we have ‖H_t‖_{2→q} ≤ 1 for all t > 0 and 2 ≤ q < ∞ satisfying e^{2αt} ≥ q − 1.

Proof. See the proof given in [11].

Remark 2.1. Note that if (K, π) is reversible, then the first two assertions in Theorem 2.1 characterize the logarithmic Sobolev constant as follows:

$$\alpha = \max\Big\{\beta : \|H_t\|_{2\to q} \le 1,\ \forall\, t \ge \tfrac{1}{4\beta}\log(q-1),\ 2 \le q < \infty\Big\}.$$

To point out a surprising observation arising from hypercontractivity, we recall the following fact from [23].

Lemma 2.1. Assume that K is a normal operator on ℓ²(π) and β₀ = 1, β₁, ..., β_{|X|−1} are the eigenvalues of K with corresponding eigenvectors φ₀ ≡ 1, φ₁, ..., φ_{|X|−1}. Then, for all x ∈ X, one has

$$\|H_t(x,\cdot)/\pi\|_2^2 = \sum_{i=0}^{|\mathcal{X}|-1} e^{-2t(1-\operatorname{Re}\beta_i)}\,|\phi_i(x)|^2.$$

It follows from the above lemma that ‖H_t‖_{2→∞} > 1 if K is normal. Since H_t is a contraction on ℓ²(π) and has eigenvalue 1 with corresponding eigenvector 1, we have ‖H_t‖_{2→2} = 1. A nontrivial observation from hypercontractivity, even in the discrete setting of a finite state space, is the existence of 0 < t_q < ∞, for any 2 < q < ∞, such that ‖H_t‖_{2→q} = 1 when t ≥ t_q. A numerical illustration of Lemma 2.1 is sketched below.
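The identity in Lemma 2.1 is easy to test numerically. The following sketch (added here for illustration; the 3-state kernel is an arbitrary reversible example) compares the direct computation of ‖H_t(x, ·)/π‖₂² with the spectral formula.

```python
import numpy as np
from scipy.linalg import expm, eigh

# A reversible 3-state chain (hypothetical example).
K = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = np.array([0.25, 0.50, 0.25])      # pi(x)K(x,y) = pi(y)K(y,x)

t, x = 0.8, 0
Ht = expm(-t * (np.eye(3) - K))        # H_t = e^{-t(I-K)}

# Direct computation of ||H_t(x,.)/pi||_2^2.
direct = np.sum(Ht[x] ** 2 / pi)

# Spectral side of Lemma 2.1: diagonalize the symmetrized kernel
# S = D^{1/2} K D^{-1/2}, D = diag(pi); then phi_i(x) = V[x, i]/sqrt(pi(x)).
d = np.sqrt(pi)
S = d[:, None] * K / d[None, :]
beta, V = eigh(S)                      # real eigenvalues, since K is reversible
spectral = np.sum(np.exp(-2 * t * (1 - beta)) * (V[x] / d[x]) ** 2)

print(direct, spectral)                # the two values agree
# The term with beta = 1 contributes exactly 1, so ||H_t||_{2->infty} > 1.
```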

By Theorem 2.1, we may bound the `p-distance from above by using the loga-rithmic Sobolev constant.

Theorem 2.2. Let (X, K, π) be an irreducible Markov chain and λ and α be the spectral gap and the logarithmic Sobolev constant of K. Then, for ε, θ, σ ≥ 0 and t = ε + θ + σ,

$$d_{\pi,2}(H_t(x,\cdot),\pi) \le
\begin{cases}
\|H_\varepsilon(x,\cdot)/\pi\|_2^{\,2/(1+e^{4\alpha\theta})}\, e^{-\lambda\sigma} & \text{if } K \text{ is reversible},\\[3pt]
\|H_\varepsilon(x,\cdot)/\pi\|_2^{\,2/(1+e^{2\alpha\theta})}\, e^{-\lambda\sigma} & \text{in general}.
\end{cases}$$

In particular, for c ≥ 0, one has

$$d_{\pi,2}(H_t(x,\cdot),\pi) \le e^{1-c}, \quad\text{for}\quad
t = \begin{cases}
(4\alpha)^{-1}\log_+\log(1/\pi(x)) + c\lambda^{-1} & \text{if } K \text{ is reversible},\\[3pt]
(2\alpha)^{-1}\log_+\log(1/\pi(x)) + c\lambda^{-1} & \text{in general},
\end{cases}$$

where log₊ t = max{0, log t}.

Proof. We consider only the reversible case, using Theorem 2.1(2); the general case is proved in the same way by applying Theorem 2.1(3) instead. Let θ > 0 and q(θ) = 1 + e^{4αθ}. By Theorem 2.1(2), it is clear that ‖H_θ‖_{2→q(θ)} ≤ 1, and by the duality given in Lemma A.1, it follows that ‖H_θ‖_{q'(θ)→2} ≤ 1, where q(θ)^{−1} + q'(θ)^{−1} = 1.

For convenience, let h_t^x denote the density of H_t(x, ·) with respect to π. Note that h_{t+s}^x = H_t h_s^x. This implies

$$\|h_t^x - 1\|_2 = \|(H_\sigma - \pi)(H_\theta h_\varepsilon^x)\|_2
\le \|H_\sigma - \pi\|_{2\to 2}\,\|H_\theta\|_{q'(\theta)\to 2}\,\|h_\varepsilon^x\|_{q'(\theta)}
\le e^{-\lambda\sigma}\,\|h_\varepsilon^x\|_2^{2/q(\theta)},$$

where the last inequality uses Remark 1.6 and the following Hölder inequality:

$$\|f\|_{q'} \le \|f\|_1^{1-2/q}\,\|f\|_2^{2/q},$$

for all 1 ≤ q' ≤ 2 with q^{−1} + (q')^{−1} = 1. This proves the first inequality.
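For completeness, the interpolation inequality used in the last step follows from Hölder's inequality (a standard derivation added here): with θ = 2/q, so that 1/q' = (1 − θ) + θ/2, applying Hölder with the conjugate exponents 1/(q'(1 − θ)) and 2/(q'θ) gives

$$\pi\big(|f|^{q'}\big) = \pi\big(|f|^{q'(1-\theta)}\,|f|^{q'\theta}\big) \le \pi\big(|f|\big)^{q'(1-\theta)}\,\pi\big(|f|^2\big)^{q'\theta/2},$$

and taking q'-th roots yields the stated inequality. Applied to h_ε^x, whose ℓ¹(π)-norm equals 1, it gives ‖h_ε^x‖_{q'(θ)} ≤ ‖h_ε^x‖₂^{2/q(θ)}.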

For the second part, note that ‖h_0^x‖₂ = π(x)^{−1/2} for x ∈ X. By letting ε = 0, we obtain

$$\|h_t^x - 1\|_2 \le \left(\frac{1}{\pi(x)}\right)^{1/(1+e^{4\alpha\theta})} e^{-\lambda\sigma}.$$

To get the desired upper bound for the ℓ²-distance, we let σ = cλ^{−1}, choose θ = 0 if π(x) ≥ e^{−1}, and put

$$\theta = \frac{1}{4\alpha}\log\log\frac{1}{\pi(x)}, \quad\text{if } \pi(x) < e^{-1}.$$

Using the Cauchy-Schwarz inequality, the ℓ^∞-distance can be bounded from above by the ℓ²-distance. In fact, for t > 0, one has

$$|h_t(x,y) - 1| = \Big|\sum_{z\in\mathcal{X}} \big(h_{t/2}(x,z) - 1\big)\big(h_{t/2}(y,z) - 1\big)\,\pi(z)\Big|
\le \|h_{t/2}(x,\cdot) - 1\|_2\, \|h_{t/2}(y,\cdot) - 1\|_2.$$

This implies the following corollary.

Corollary 2.2. Let (X, K, π) be an irreducible Markov chain and λ and α be the spectral gap and logarithmic Sobolev constant of K. Then, for c > 0, one has

$$|H_t(x,y)/\pi(y) - 1| \le e^{2-c}$$

for t = (2α)^{−1} log₊ log(1/π_*) + 2cλ^{−1} if K is reversible, and t = α^{−1} log₊ log(1/π_*) + 2cλ^{−1} in general, where π_* = min_x π(x). (This follows by applying Theorem 2.2 at time t/2 to both x and y.)

Combining Theorem 2.2 and Corollary 2.2, we may bound the ℓ^p-mixing time by using the logarithmic Sobolev constant.

Corollary 2.3. Let K be a reversible and irreducible Markov chain with stationary distribution π and α be the logarithmic Sobolev constant. For 1 ≤ p ≤ ∞, let T_p be the ℓ^p-mixing time of K. Then, for 1 < p ≤ 2,

Proof. The upper bounds are obtained immediately from Theorem 2.2 and Corollary 2.2. For the lower bound, Theorem 3.9 in [11] proves the case 2 < p ≤ ∞. For 1 < p ≤ 2, we use the fact

$$T_2 \le m_p T_p, \quad \forall\, 1 < p \le 2.$$

Remark 2.2. In the general (non-reversible) case, Theorem 2.2 and Corollary 2.2 yield an upper bound on the ℓ^p-mixing time which is twice that in Corollary 2.3.

Remark 2.3. Comparing Corollary 2.3 with Theorem 1.3, one finds that, in bounding the ℓ^p-mixing time of a reversible continuous-time Markov chain, the logarithmic Sobolev constant is more closely related to T_p than the spectral gap is.

2.3 Tools to compute the logarithmic Sobolev constant

It follows from Theorem 1.3 and Corollary 2.3 that the logarithmic Sobolev constant provides a tighter bound (in the sense of order) on the time to equilibrium T_p than the spectral gap does. Based on Corollary 2.3, to bound the ℓ^p-mixing time by using the logarithmic Sobolev constant, we need to determine its value. From this viewpoint, it is natural to ask: can one compute explicitly or estimate the constant α? In this section, we introduce several established tools that help determine the logarithmic Sobolev constant.

1. Bounding α from above by using the spectral gap λ. The following proposition establishes a relation between the spectral gap and the logarithmic Sobolev constant.

Proposition 2.2 (Lemma 2.2.2 in [23]). Let (X, K, π) be an irreducible Markov chain. Then the spectral gap λ and the logarithmic Sobolev constant α of K satisfy α ≤ λ/2. Furthermore, let φ be an eigenvector of the matrix (K + K*)/2 whose corresponding eigenvalue is 1 − λ. If π(φ³) ≠ 0, then α < λ/2.

Proof. We follow the proof in [23], whose original idea comes from [22]. Let g be a real function on X and set f = 1 + εg. Then, for small enough ε, we have

$$f^2\log f^2 = (1 + 2\varepsilon g + \varepsilon^2 g^2)\Big(2\varepsilon g - \varepsilon^2 g^2 + \tfrac{2}{3}\varepsilon^3 g^3 + O(\varepsilon^4)\Big)
= 2\varepsilon g + 3\varepsilon^2 g^2 + \tfrac{2}{3}\varepsilon^3 g^3 + O(\varepsilon^4),$$

and

$$\begin{aligned}
f^2\log\|f\|_2^2 &= (1 + 2\varepsilon g + \varepsilon^2 g^2)\Big(2\varepsilon\pi(g) + \varepsilon^2\big(\pi(g^2) - 2\pi(g)^2\big) + \varepsilon^3\big(\tfrac{8}{3}\pi(g)^3 - 2\pi(g)\pi(g^2)\big) + O(\varepsilon^4)\Big)\\
&= 2\varepsilon\pi(g) + \varepsilon^2\big(4g\pi(g) + \pi(g^2) - 2\pi(g)^2\big)\\
&\quad + \varepsilon^3\big[\tfrac{8}{3}\pi(g)^3 - 2\pi(g)\pi(g^2) + 2g\pi(g^2) - 4g\pi(g)^2 + 2g^2\pi(g)\big] + O(\varepsilon^4).
\end{aligned}$$

Thus,

$$\begin{aligned}
f^2\log\frac{f^2}{\|f\|_2^2} &= 2\varepsilon[g - \pi(g)] + \varepsilon^2\big[3g^2 - 4g\pi(g) - \pi(g^2) + 2\pi(g)^2\big]\\
&\quad + \varepsilon^3\big[\tfrac{2}{3}g^3 - \tfrac{8}{3}\pi(g)^3 + 2\pi(g)\pi(g^2) - 2g\pi(g^2) + 4g\pi(g)^2 - 2g^2\pi(g)\big] + O(\varepsilon^4),
\end{aligned}$$

and

$$L(f) = 2\varepsilon^2\operatorname{Var}_\pi(g) + \varepsilon^3\big[\tfrac{2}{3}\pi(g^3) + \tfrac{4}{3}\pi(g)^3 - 2\pi(g)\pi(g^2)\big] + O(\varepsilon^4),$$

where the constant in O(·) depends only on ‖g‖_∞.

To finish the proof, note that E(f, f) = ε²E(g, g). Let φ be an eigenfunction of (K + K*)/2 whose eigenvalue is 1 − λ. By definition, it is clear that E(φ, φ) = λVar_π(φ) and π(φ) = 0. Letting g = φ implies

$$\alpha \le \frac{\mathcal{E}(f,f)}{L(f)} = \frac{\lambda\operatorname{Var}_\pi(\phi)}{2\operatorname{Var}_\pi(\phi) + \tfrac{2}{3}\varepsilon\,\pi(\phi^3) + O(\varepsilon^2)}.$$

The first inequality is obtained by letting ε → 0. For the second part, since π(φ³) ≠ 0, we may choose ε ≠ 0 such that επ(φ³) > 0 and (2/3)επ(φ³) dominates the O(ε²) term. This proves the second inequality.
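The ε²- and ε³-coefficients above are easy to mis-copy, so the following symbolic check (an added sketch, not from the original; the two-point π and g are arbitrary concrete choices) verifies the expansion of L(1 + εg):

```python
import sympy as sp

# Check L(1 + eps*g) = 2 eps^2 Var(g)
#   + eps^3 [ (2/3)pi(g^3) + (4/3)pi(g)^3 - 2 pi(g) pi(g^2) ] + O(eps^4)
# on a two-point space with pi = (2/3, 1/3) and g = (1, -1).
eps = sp.symbols('eps')
pi = [sp.Rational(2, 3), sp.Rational(1, 3)]
g = [sp.Integer(1), sp.Integer(-1)]

f2 = [(1 + eps * gi) ** 2 for gi in g]
norm2 = sum(fi * w for fi, w in zip(f2, pi))
L = sum(fi * sp.log(fi / norm2) * w for fi, w in zip(f2, pi))

m1 = sum(gi * w for gi, w in zip(g, pi))
m2 = sum(gi**2 * w for gi, w in zip(g, pi))
m3 = sum(gi**3 * w for gi, w in zip(g, pi))
predicted = (2 * (m2 - m1**2) * eps**2
             + (sp.Rational(2, 3) * m3 + sp.Rational(4, 3) * m1**3
                - 2 * m1 * m2) * eps**3)

print(sp.series(L - predicted, eps, 0, 4))   # prints O(eps**4)
```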

2. One sufficient condition for α = λ/2. As a consequence of Proposition 2.2, the logarithmic Sobolev constant α is bounded from above by λ/2. Furthermore, a sufficient condition for the strict inequality 2α < λ is also given in that proposition. In the following, we give a necessary condition for the situation 2α < λ to happen.

Proposition 2.3 (Theorem 2.2.3 in [23]). Let (X, K, π) be an irreducible Markov chain and λ and α be the spectral gap and the logarithmic Sobolev constant of K. Then either α = λ/2 or there exists a positive non-constant function u which is a solution of

$$2u\log u - 2u\log\|u\|_2 - \frac{1}{\alpha}(I - K)u = 0, \tag{2.3}$$

where α = E(u, u)/L(u). In particular, α > 0.

Proof. We argue by considering a minimizer of the infimum in Definition 2.1. Note that we may restrict ourselves to nonnegative functions with mean 1 (under π). By definition, either α is attained by a nonnegative non-constant function, say u, or the infimum is attained at the constant function 1. In the latter case, one may choose a minimizing sequence (1 + ε_n g_n)_{n≥1} satisfying

$$\varepsilon_n \to 0, \qquad \pi(g_n) = 0, \qquad \|g_n\|_2 = 1, \qquad \forall n \ge 1.$$

This implies that the sequence {‖g_n‖_∞} is bounded from above and below by positive numbers. Then, by the computation in the proof of Proposition 2.2, we get

$$\alpha = \lim_{n\to\infty} \frac{\mathcal{E}(1+\varepsilon_n g_n,\, 1+\varepsilon_n g_n)}{L(1+\varepsilon_n g_n)}
= \lim_{n\to\infty} \frac{\mathcal{E}(g_n, g_n)}{2\operatorname{Var}_\pi(g_n) + O(\varepsilon_n)}
\ge \liminf_{n\to\infty} \frac{\lambda}{2 + O(\varepsilon_n)} = \frac{\lambda}{2}.$$

This proves α = λ/2.

If α is attained at a nonnegative non-constant function u, then, viewing E(f, f)/L(f) as a function on R^{|X|}, we have the Euler-Lagrange equation

$$\nabla\!\left(\frac{\mathcal{E}(f,f)}{L(f)}\right)\Bigg|_{f=u} = 0,$$

which is identical to (2.3). To show the positivity of u, observe that if u(x) = 0 for some x ∈ X, then (2.3) implies that Ku(x) = 0, or equivalently, u(y) = 0 whenever K(x, y) > 0. Thus, by the irreducibility of K, one has u ≡ 0, which contradicts the assumption that u is non-constant.

Remark 2.4. Note that a constant function is always a solution of (2.3).

Corollary 2.4. Let (X, K, π) be an irreducible Markov chain and λ and α be the spectral gap and logarithmic Sobolev constant of K. If a non-constant function u on X and a positive number β satisfy the equation

$$(I - K)u = 2\beta\big(u\log u - u\log\|u\|_2\big), \tag{2.4}$$

then β = E(u, u)/L(u). In particular, (2.4) has no non-constant solution for β ∈ (0, α). Moreover, if (2.4) has no non-constant solution for β ∈ (0, λ/2), then α = λ/2.

3. Comparison technique. In many cases, the model of interest is complicated but can be compared with a simpler one. The tradeoff of such a replacement can be a loss of accuracy in α (up to a constant factor), but the advantage is the simplicity of the new chain; in most cases, α is of the same order as the logarithmic Sobolev constant of the new Markov chain.

Proposition 2.4 (Lemma 2.2.12 in [23]). Let (X₁, K₁, π₁) and (X₂, K₂, π₂) be irreducible Markov chains and E₁ and E₂ be the respective Dirichlet forms. Assume that there exists a linear map

$$T : \ell^2(\pi_1) \to \ell^2(\pi_2)$$

and constants A > 0, B ≥ 0, a > 0 such that, for all f ∈ ℓ²(π₁),

$$\mathcal{E}_2(Tf, Tf) \le A\,\mathcal{E}_1(f, f), \qquad a\operatorname{Var}_{\pi_1}(f) \le \operatorname{Var}_{\pi_2}(Tf) + B\,\mathcal{E}_1(f, f).$$

Then the spectral gaps λ₁ = λ(K₁) and λ₂ = λ(K₂) satisfy

$$\frac{a\lambda_2}{A + B\lambda_2} \le \lambda_1.$$

Similarly, if

$$\mathcal{E}_2(Tf, Tf) \le A\,\mathcal{E}_1(f, f), \qquad a L_{\pi_1}(f) \le L_{\pi_2}(Tf) + B\,\mathcal{E}_1(f, f),$$

then the logarithmic Sobolev constants α₁ = α(K₁) and α₂ = α(K₂) satisfy

$$\frac{a\alpha_2}{A + B\alpha_2} \le \alpha_1.$$

In particular, if X₁ = X₂, E₂ ≤ AE₁ and aπ₁ ≤ π₂, then

$$\frac{a\lambda_2}{A} \le \lambda_1, \qquad \frac{a\alpha_2}{A} \le \alpha_1.$$

Proof. The proof follows from the variational definitions of the spectral gap and the logarithmic Sobolev constant. For the spectral gap, we have

$$a\operatorname{Var}_{\pi_1}(f) \le \operatorname{Var}_{\pi_2}(Tf) + B\,\mathcal{E}_1(f,f)
\le \frac{\mathcal{E}_2(Tf,Tf)}{\lambda_2} + B\,\mathcal{E}_1(f,f)
\le \left(\frac{A}{\lambda_2} + B\right)\mathcal{E}_1(f,f).$$

The proof for the logarithmic Sobolev constant goes in the same way.

To show the last part, consider the following characterizations of the variance and of L:

$$\operatorname{Var}_\pi(f) = \min_{c\in\mathbb{R}} \|f - c\|_2^2 = \min_{c\in\mathbb{R}} \sum_{x\in\mathcal{X}} [f(x) - c]^2\,\pi(x), \tag{2.5}$$

and

$$\begin{aligned}
L_\pi(f) &= \sum_{x\in\mathcal{X}} \big[f^2(x)\log f^2(x) - f^2(x)\log\|f\|_2^2 - f^2(x) + \|f\|_2^2\big]\,\pi(x)\\
&= \min_{c>0} \sum_{x\in\mathcal{X}} \big[f^2(x)\log f^2(x) - f^2(x)\log c - f^2(x) + c\big]\,\pi(x). \tag{2.6}
\end{aligned}$$

Letting T = I, these characterizations imply that

$$a\operatorname{Var}_{\pi_1}(f) \le \operatorname{Var}_{\pi_2}(f), \qquad a L_{\pi_1}(f) \le L_{\pi_2}(f),$$

where the second inequality uses the fact that t log t − t log s − t + s ≥ 0 for all t, s ≥ 0.
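To make the comparison technique concrete, the following numerical sketch (an illustration added here, not from the original; n = 6, the cycle walk and the uniform measure are arbitrary choices) applies the last part of Proposition 2.4 with X₁ = X₂ and π₁ = π₂ uniform, so a = 1 and B = 0. It compares the simple random walk K₁ on a cycle with the chain K₂(x, y) = π(y) treated in Theorem 2.4 below, computing the smallest admissible A as a generalized eigenvalue.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
n = 6
pi = np.full(n, 1 / n)                 # shared stationary measure => a = 1

K1 = np.zeros((n, n))                  # simple random walk on the n-cycle
for x in range(n):
    K1[x, (x - 1) % n] = K1[x, (x + 1) % n] = 0.5
K2 = np.tile(pi, (n, 1))               # K2(x, .) = pi, the chain of Theorem 2.4

def dirichlet_matrix(K, pi):
    # E(f, f) = f^T M f, with M the pi-weighted symmetrized generator
    P = np.diag(pi)
    return P - (P @ K + K.T @ P) / 2

M1, M2 = dirichlet_matrix(K1, pi), dirichlet_matrix(K2, pi)

# Orthonormal basis of the complement of the constant functions.
Q, _ = np.linalg.qr(np.column_stack([np.ones(n), rng.standard_normal((n, n - 1))]))
B = Q[:, 1:]

# Smallest A with E2 <= A * E1: largest generalized Rayleigh quotient.
A = eigh(B.T @ M2 @ B, B.T @ M1 @ B, eigvals_only=True)[-1]

lam2, alpha2 = 1.0, (1 - 2 / n) / np.log(n - 1)   # gap of K2; alpha2 by Theorem 2.4
lam1 = eigh(B.T @ M1 @ B, B.T @ np.diag(pi) @ B, eigvals_only=True)[0]
print("lambda1 >=", lam2 / A, "(true gap:", lam1, ")")
print("alpha1  >=", alpha2 / A)
```

Here the spectral gap bound is exact, since E₂(f, f) = Var_π(f); in practice one chooses the comparison chain so that A can be estimated by hand, for example by path arguments.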

The following is a simple but useful tool which involves collapsing a chain onto one with a smaller state space.

Corollary 2.5. Let (X₁, K₁, π₁) and (X₂, K₂, π₂) be irreducible Markov chains and E₁ and E₂ be the respective Dirichlet forms. Assume that there exists a surjective map p : X₂ → X₁ such that

$$\mathcal{E}_2(f\circ p,\, f\circ p) \le A\,\mathcal{E}_1(f, f), \quad \forall f \in \mathbb{R}^{|\mathcal{X}_1|},$$

and

$$a\,\pi_1(f) \le \pi_2(f\circ p), \quad \forall f \ge 0.$$

Then the spectral gaps λ₁ = λ(K₁), λ₂ = λ(K₂) and the logarithmic Sobolev constants α₁ = α(K₁), α₂ = α(K₂) satisfy

$$\frac{a\lambda_2}{A} \le \lambda_1, \qquad \frac{a\alpha_2}{A} \le \alpha_1.$$

In particular, if a = A = 1, α₂ = λ₂/2 and λ₁ = λ₂, then α₁ = λ₁/2.

Proof. Let T : ℓ²(π₁) → ℓ²(π₂) be the linear map defined by Tf = f ∘ p. In this setting, the assumptions become

$$\mathcal{E}_2(Tf, Tf) \le A\,\mathcal{E}_1(f, f), \quad \forall f \in \mathbb{R}^{|\mathcal{X}_1|},$$

and

$$a\,\pi_1(f) \le \pi_2(Tf), \quad \forall f \ge 0.$$

By (2.5) and (2.6), the second inequality implies

$$a\operatorname{Var}_{\pi_1}(f) \le \operatorname{Var}_{\pi_2}(Tf), \qquad a L_{\pi_1}(f) \le L_{\pi_2}(Tf), \quad \forall f \in \mathbb{R}^{|\mathcal{X}_1|}.$$

The desired inequalities then follow from Proposition 2.4.

Remark 2.5. Note that, in Corollary 2.5, if π₁ is the pushforward of π₂, that is,

$$\sum_{y\,:\,p(y) = x} \pi_2(y) = \pi_1(x), \quad \forall x \in \mathcal{X}_1,$$

then π₁(f) = π₂(f ∘ p) for all f ∈ R^{|X₁|}.

The following further corollary of Corollary 2.5 and the above remark gives a sufficient condition for a = A = 1 in Corollary 2.5.

Corollary 2.6. Let (X₁, K₁, π₁) and (X₂, K₂, π₂) be irreducible Markov chains and p : X₂ → X₁ be a surjective map. Assume that, for all x, y ∈ X₁,

$$\sum_{\substack{z\,:\,p(z)=x\\ w\,:\,p(w)=y}} \pi_2(z) K_2(z, w) = \pi_1(x) K_1(x, y). \tag{2.7}$$

Let λ₁, λ₂ and α₁, α₂ be respectively the spectral gaps and logarithmic Sobolev constants of K₁ and K₂. Then

$$\lambda_2 \le \lambda_1, \qquad \alpha_2 \le \alpha_1.$$

In particular, if λ₁ = λ₂ and α₂ = λ₂/2, then α₁ = λ₁/2.

Proof. It suffices to show that both constants a, A in Corollary 2.5 can be taken equal to 1. Let E₁ and E₂ be the Dirichlet forms of (X₁, K₁, π₁) and (X₂, K₂, π₂). By Lemma 1.1 and (2.7), a simple computation shows

$$\mathcal{E}_1(f, f) = \frac{1}{2}\sum_{x,y\in\mathcal{X}_1} |f(x) - f(y)|^2\,\pi_1(x)K_1(x, y)
= \frac{1}{2}\sum_{z,w\in\mathcal{X}_2} |f(p(z)) - f(p(w))|^2\,\pi_2(z)K_2(z, w)
= \mathcal{E}_2(f\circ p,\, f\circ p),$$

so that A = 1. By the definition of a stationary distribution in (1.2), summing each side of (2.7) over all x ∈ X₁ implies

$$\sum_{w\,:\,p(w)=y} \pi_2(w) = \pi_1(y), \quad \forall y \in \mathcal{X}_1,$$

that is, π₁ is the pushforward of π₂. By Remark 2.5, a = 1.

In some models, we may "collapse" the state space into a smaller one by partitioning the state space into several subsets and viewing each subset as a new state. In the induced state space, the stationary distribution of the new Markov chain is the lumped probability of the original one, in the sense of Remark 2.5. The following proposition provides a sufficient condition for collapsing Markov chains; a numerical illustration follows the proof.

Proposition 2.5. Let (X₂, K₂, π₂) be an irreducible Markov chain and p : X₂ → X₁ be a surjective map. Assume that

$$K_2(f\circ p)(x) = K_2(f\circ p)(y), \quad \forall\, p(x) = p(y),\ f \in \mathbb{R}^{|\mathcal{X}_1|}. \tag{2.8}$$

Set, for z, w ∈ X₁,

$$K_1(z, w) := \sum_{t\,:\,p(t)=w} K_2(s, t),$$

where s is any point with p(s) = z. Then K₁ is irreducible and its stationary distribution π₁ is given by

$$\pi_1(x) = \sum_{y\,:\,p(y)=x} \pi_2(y).$$

Furthermore, if λ₁, λ₂ and α₁, α₂ are respectively the spectral gaps and logarithmic Sobolev constants of K₁, K₂, then λ₂ ≤ λ₁ and α₂ ≤ α₁.

Remark 2.6. Note that (2.8) is equivalent to

$$\sum_{z\,:\,p(z)=w} K_2(x, z) = \sum_{z\,:\,p(z)=w} K_2(y, z)$$

for all x, y ∈ X₂ satisfying p(x) = p(y) and all w ∈ X₁.

Proof of Proposition 2.5. By choosing f = δ_w (the function taking value 1 at w and 0 otherwise) in (2.8), the quantity K₁(z, w) is seen to be well-defined for all z, w ∈ X₁. The irreducibility of K₁ follows immediately from that of K₂. By a simple computation, we have

$$\sum_{\substack{t\,:\,p(t)=w\\ s\,:\,p(s)=z}} \pi_2(s)K_2(s, t) = \sum_{s\,:\,p(s)=z} \pi_2(s)\,K_1(z, w) = \pi_1(z)K_1(z, w).$$

Summing both sides over all z ∈ X₁ implies that π₁ is the stationary distribution of K₁, and the remaining part follows from Corollary 2.6.
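As an illustration (an added sketch, not from the original), consider the simple random walk on the 3-cube collapsed through the Hamming weight p(x) = x₁ + x₂ + x₃; condition (2.8) holds by symmetry, and the conclusion λ₂ ≤ λ₁ can be checked numerically:

```python
import numpy as np
from itertools import product
from scipy.linalg import eigh

# Simple random walk on the 3-cube {0,1}^3 (reversible, uniform pi).
cube = list(product([0, 1], repeat=3))
n2 = len(cube)
K2 = np.zeros((n2, n2))
for i, x in enumerate(cube):
    for j, y in enumerate(cube):
        if sum(a != b for a, b in zip(x, y)) == 1:
            K2[i, j] = 1 / 3
pi2 = np.full(n2, 1 / n2)

# Collapse through the Hamming weight; K1(z, w) as in Proposition 2.5.
weight = [sum(x) for x in cube]
K1 = np.zeros((4, 4))
for z in range(4):
    s = weight.index(z)               # any representative with p(s) = z
    for w in range(4):
        K1[z, w] = sum(K2[s, j] for j in range(n2) if weight[j] == w)
pi1 = np.array([sum(pi2[i] for i in range(n2) if weight[i] == z)
                for z in range(4)])   # pushforward of pi2 (Remark 2.5)

def spectral_gap(K, pi):
    d = np.sqrt(pi)
    S = d[:, None] * K / d[None, :]   # both chains here are reversible
    return 1 - np.sort(eigh(S, eigvals_only=True))[-2]

# lambda2 <= lambda1; for this example the two gaps coincide (both 2/3).
print(spectral_gap(K2, pi2), spectral_gap(K1, pi1))
```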

4. The product chains. In the following, we consider the logarithmic Sobolev constant of a product chain. For 1 ≤ i ≤ n, let (X_i, K_i, π_i) be an irreducible Markov chain. Let µ be a probability measure on {0, 1, 2, ..., n} and K be a Markov kernel on the product space X = ∏_{i=1}^n X_i defined by

$$K_\mu(x, y) = K(x, y) = \mu(0)\,\delta(x, y) + \sum_{i=1}^n \mu(i)\,\delta_i(x, y)\,K_i(x_i, y_i), \tag{2.9}$$

where x = (x₁, ..., x_n), y = (y₁, ..., y_n) and

$$\delta_i(x, y) = \prod_{\substack{j=1\\ j\neq i}}^n \delta(x_j, y_j), \qquad
\delta(s, t) = \begin{cases} 1 & \text{if } s = t,\\ 0 & \text{otherwise}. \end{cases}$$

In the above setting, it is obvious that K is irreducible and the stationary distribution is π = ⊗₁ⁿ π_i, where

$$\pi(x) = \prod_{i=1}^n \pi_i(x_i), \quad \forall x = (x_1, ..., x_n) \in \mathcal{X}. \tag{2.10}$$

Proposition 2.6 (Lemma 2.2.11 in [23]). Let {(X_i, K_i, π_i)}₁ⁿ be a sequence of irreducible Markov chains and (λ_i)₁ⁿ and (α_i)₁ⁿ be their spectral gaps and logarithmic Sobolev constants. Let µ be a probability measure on the set {0, 1, ..., n} and (X, K, π) be the product chain, where X = ∏₁ⁿ X_i and K and π are defined in (2.9) and (2.10). Then λ = λ(K) and α = α(K) are given by

$$\lambda = \min_{1\le i\le n}\{\mu(i)\lambda_i\}, \qquad \alpha = \min_{1\le i\le n}\{\mu(i)\alpha_i\}.$$

Proof. See p. 339 in [23].
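The following sketch (an added illustration, not from the original) builds the product kernel (2.9) for two copies of the two-point chain (2.12) with µ(1) = µ(2) = 1/2 and checks λ = min_i µ(i)λ_i numerically; by Theorem 2.3 below, each factor has λ_i = 1.

```python
import numpy as np
from scipy.linalg import eigh

# Product chain (2.9) from two two-point factors (2.12), mu = (0, 1/2, 1/2).
def two_point(p):
    return np.array([[1 - p, p], [1 - p, p]]), np.array([1 - p, p])

(Ka, pia), (Kb, pib) = two_point(0.7), two_point(0.4)
mu = [0.0, 0.5, 0.5]

K = np.zeros((4, 4))
for x1, x2 in np.ndindex(2, 2):
    for y1, y2 in np.ndindex(2, 2):
        x, y = 2 * x1 + x2, 2 * y1 + y2
        K[x, y] = (mu[0] * (x == y)
                   + mu[1] * (x2 == y2) * Ka[x1, y1]
                   + mu[2] * (x1 == y1) * Kb[x2, y2])
pi = np.kron(pia, pib)                     # product measure (2.10)

d = np.sqrt(pi)
S = d[:, None] * K / d[None, :]            # symmetrization (K is reversible here)
gap = 1 - np.sort(eigh(S, eigvals_only=True))[-2]

# Proposition 2.6 predicts lambda = min(mu(1)*1, mu(2)*1) = 0.5;
# alpha = min_i mu(i) alpha_i holds similarly, with alpha_i from Theorem 2.3.
print(gap)                                 # ~0.5
```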

2.4 Some examples

Since the logarithmic Sobolev constant was introduced in 1975, many people dedi-cate to estimating its value. Their experiences show that it is not an easy job even though the computation of the logarithmic Sobolev constant is to find its correct order. As one can see in [11, Theorem A.2], the computation of the logarithmic Sobolev constant for asymmetric Markov kernels on a two point space is tough and complicated. Up to now, the explicit computation of the logarithmic Sobolev con-stant is still restricted to very simple examples and few of them are determined. By Proposition 2.6, the computation of the logarithmic Sobolev constants for Markov

chains with small state spaces is not futile. In this section, we introduce some examples whose exact logarithmic Sobolev constants are known.

1. Random walk on a two-point space. We first consider the simplest case, where the state space has only two points, say 0 and 1. Let X = {0, 1} and K be a Markov kernel on X defined by K(0, 0) = p₁, K(0, 1) = q₁, K(1, 0) = p₂ and K(1, 1) = q₂, where p₁ + q₁ = p₂ + q₂ = 1. Equivalently, the matrix form of K is given by

$$K = \begin{pmatrix} p_1 & q_1\\ p_2 & q_2 \end{pmatrix}. \tag{2.11}$$

The following theorem treats the case p1 = p2.

Theorem 2.3 ([11, Theorem A.2]). Fix p, q ∈ (0, 1) with p + q = 1. For the two-point space X = {0, 1} equipped with the chain

$$K(0, 0) = K(1, 0) = q, \quad K(0, 1) = K(1, 1) = p, \quad \pi(0) = q, \quad \pi(1) = p, \tag{2.12}$$

we have λ = 1 and α = 1/2 if p = q = 1/2, and

$$\alpha = \frac{p - q}{\log(p/q)} \quad \text{if } p \neq q.$$

Proof. The fact λ = 1 is an easy exercise. We prove the statement concerning α using Corollary 2.4. Setting φ(0) = b, φ(1) = a and normalizing so that qb² + pa² = 1, we look for triplets (β, a, b) of positive numbers that are solutions of (2.4), that is,

$$\begin{cases}
p(b - a) = 2\beta b\log b,\\
q(a - b) = 2\beta a\log a,\\
pa^2 + qb^2 = 1.
\end{cases}$$

Luckily, β can be eliminated by using the first two equations. This yields the system

$$\begin{cases}
pa\log a + qb\log b = 0,\\
p(a^2 - 1) + q(b^2 - 1) = 0.
\end{cases}$$

Setting aside the solution a = b = 1, we can assume a, b ∈ (0, 1) ∪ (1, +∞) and write this system as

$$\begin{cases}
pa\log a + qb\log b = 0,\\[2pt]
\dfrac{a - a^{-1}}{\log a} = \dfrac{b - b^{-1}}{\log b}.
\end{cases}$$

Calculus shows that the function f : x ↦ (x − x⁻¹)/log x is decreasing on (0, 1) and increasing on (1, ∞). As it obviously satisfies f(x) = f(1/x), it follows that the second equation can only be satisfied if b = 1/a. Substituting this into the first equation yields pa − q/a = 0, that is, a = √(q/p). It follows that the solutions of our original system are the triplets (β, 1, 1) (β arbitrary) and, when p ≠ q,

$$\left(\frac{p - q}{\log(p/q)},\ \sqrt{q/p},\ \sqrt{p/q}\right).$$

As (p − q)/log(p/q) < 1/2 when p ≠ q, we conclude from Corollary 2.4 that the logarithmic Sobolev constant of the asymmetric two-point chain in (2.12) is

$$\alpha = \frac{p - q}{\log(p/q)}, \quad p \neq q,$$

and that, in the symmetric case p = q = 1/2, we have 2α = λ = 1.
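The closed form can be cross-checked numerically; the following added sketch (not from the original; p = 0.8 and the starting points are arbitrary) minimizes E(f, f)/L(f) over positive f = (b, a) on {0, 1} and compares with (p − q)/log(p/q).

```python
import numpy as np
from scipy.optimize import minimize

p = 0.8
q = 1 - p

def ratio(f):
    b, a = f
    E = p * q * (a - b) ** 2                # Dirichlet form of the chain (2.12)
    n2 = q * b**2 + p * a**2
    L = q * b**2 * np.log(b**2 / n2) + p * a**2 * np.log(a**2 / n2)
    return E / L

starts = ([0.5, 1.5], [2.0, 0.3], [0.9, 1.2])
best = min(minimize(ratio, x0, bounds=[(1e-6, None)] * 2).fun for x0 in starts)
print("numerical:", best)                   # ~0.4328
print("closed form:", (p - q) / np.log(p / q))
# The minimizer is proportional to (b, a) = (sqrt(p/q), sqrt(q/p)), as in the proof.
```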

Remark 2.7. The proof of Theorem 2.3 given above is outlined, without details, in [4]. It is much simpler than the two different proofs given in [11, 23]. Here, we have been careful to treat both the symmetric and the asymmetric cases at once. In fact, the proof in [23] is incorrect (it can, however, be corrected with additional pain and without changing the main ideas). On the one hand, in the case p = q = 1/2, the proof above consists in showing that no non-constant minimizers exist, leading to the conclusion that α = λ/2. This is the main line of reasoning that will be used in this work to treat other examples. On the other hand, in the case p ≠ q, we were able to find a unique normalized non-constant solution of (2.3) with α < λ/2, leading to the explicit computation of α. To the best of our knowledge, this is the only case with α < λ/2 where α has been computed by solving (2.3). Our study of other small examples indicates that such a computation is typically extremely difficult, if not impossible.

Remark 2.8. As a consequence of Theorem 2.3 and Definition 2.1, for π = (q, p) we have

$$f(p) := \inf\left\{\frac{pq\,|x - y|^2}{L_\pi((x, y))} : L_\pi((x, y)) \neq 0\right\}
= \begin{cases}
\dfrac{p - q}{\log(p/q)} & \text{if } p \neq q,\\[4pt]
1/2 & \text{if } p = q = 1/2,
\end{cases}$$

since the Dirichlet form of the chain (2.12) is E(f, f) = pq|x − y|² for f = (x, y).

Let K be the Markov kernel in (2.11). A computation shows that the stationary distribution is equal to π = (p₂/(p₂ + q₁), q₁/(p₂ + q₁)) and, for any function f = (x, y),

$$\mathcal{E}(f, f) = \frac{p_2 q_1}{p_2 + q_1}\,|x - y|^2.$$

By the above identities and Remark 2.8, the logarithmic Sobolev constant of a general two-point Markov chain is then a corollary of Theorem 2.3.

Corollary 2.7. Let ({0, 1}, K, π) be an irreducible Markov chain where K is given by K(0, 0) = p₁, K(0, 1) = q₁, K(1, 0) = p₂ and K(1, 1) = q₂ with p₂q₁ ≠ 0. Then the spectral gap λ and the logarithmic Sobolev constant α are given by

$$\lambda = p_2 + q_1, \qquad
\alpha = \begin{cases}
\dfrac{q_1 - p_2}{\log(q_1/p_2)} & \text{if } q_1 \neq p_2,\\[4pt]
(p_2 + q_1)/2 & \text{if } q_1 = p_2.
\end{cases}$$

Remark 2.9. Let K be the Markov kernel given by (2.11). By Corollary 2.7, α = λ/2 if and only if K(0, 1) = K(1, 0), that is, if and only if K is symmetric.

2. Finite Markov chains with kernel K(x, ·) ≡ π. Let X be a finite set and π be a positive probability measure on X. Consider the Markov kernel K given by K(x, y) = π(y) for x, y ∈ X. Such a chain reaches stationarity in a single step. Clearly, the spectral gap λ is 1 and the stationary distribution of K is π. To determine the logarithmic Sobolev constant α, we need the following computation.

Assume that |X| > 2. Let π_* = min_x π(x) < 1/2 and x₀ ∈ X be such that π(x₀) = π_*. Consider the projection p : X → {0, 1} with p(x₀) = 0 and p(x) = 1 for x ∈ X \ {x₀}. Let K̃ be the Markov kernel on {0, 1} obtained by collapsing K through the map p, that is,

$$\widetilde{K} = \begin{pmatrix} \pi_* & 1 - \pi_*\\ \pi_* & 1 - \pi_* \end{pmatrix},$$

and let α̃ be the logarithmic Sobolev constant of K̃. Then, by Proposition 2.5 and Theorem 2.3, one has

$$\alpha \le \widetilde{\alpha} = \frac{1 - 2\pi_*}{\log[(1 - \pi_*)/\pi_*]} < \lambda/2. \tag{2.13}$$

Theorem 2.4 ([11, Theorem A.1]). Let X be a finite set with cardinality at least 3 and π be a positive probability measure on X. Let K be the Markov kernel given by K(x, y) = π(y) for x, y ∈ X. Then the spectral gap is 1 and the logarithmic Sobolev constant is

$$\alpha = \frac{1 - 2\pi_*}{\log[(1 - \pi_*)/\pi_*]}, \quad\text{where } \pi_* = \min_x \pi(x).$$

Proof. The upper bound on α is given by (2.13) and it remains to prove the lower bound. By (2.13) and Proposition 2.3, there exists a non-constant positive function u on X satisfying (2.3). Without loss of generality, we assume that ‖u‖₂ = 1. Then
