The Cutoff Phenomenon for Ehrenfest Processes


GUAN-YU CHEN$^1$, YANG-JEN FANG$^2$, AND YUAN-CHUNG SHEU$^3$

Abstract. We consider families of Ehrenfest chains and provide a simple criterion on the $L^p$-cutoff and the $L^p$-precutoff with specified initial states for $1\le p<\infty$. For the family with an $L^p$-cutoff, a cutoff time is described and a possible window is given. For the family without an $L^p$-precutoff, the exact order of the $L^p$-mixing time is determined. The result is consistent with the well-known conjecture on cutoffs of Markov chains proposed by Peres in 2004, which says that a cutoff exists if and only if the product of the spectral gap and the mixing time tends to infinity.

1. Introduction

Consider a time-homogeneous Markov chain on a finite set $\Omega$ with one-step transition matrix $K$. Let $K^t(x,\cdot)$ denote the probability distribution of the chain at time $t$ starting from state $x$. It is well known that if $K$ is ergodic (irreducible and aperiodic), then

$$\lim_{t\to\infty}K^t(x,y)=\pi(y),\qquad\forall\,x,y\in\Omega,$$

where $\pi$ is the unique invariant probability of $K$ on $\Omega$. Denote by $k_x^t$ the relative density of $K^t(x,\cdot)$ with respect to $\pi$, that is, $k_x^t(y)=K^t(x,y)/\pi(y)$. For $1\le p<\infty$, define the $L^p$-distance by

$$D_p(x,t)=\|k_x^t-1\|_{L^p(\pi)}=\Big(\sum_{y\in\Omega}|k_x^t(y)-1|^p\pi(y)\Big)^{1/p}.$$

For $p=\infty$, the $L^\infty$-distance is set to be $D_\infty(x,t)=\max_y|k_x^t(y)-1|$. In the case $p=1$, this is exactly twice the total variation distance between $K^t(x,\cdot)$ and $\pi$, which is defined by

$$D_{TV}(x,t)=\|K^t(x,\cdot)-\pi\|_{TV}=\max_{A\subset\Omega}\{K^t(x,A)-\pi(A)\}.$$

For $p=2$, it is the so-called chi-square distance. For any $\epsilon>0$ and $1\le p\le\infty$, define the $L^p$-mixing time by

$$T_p(x,\epsilon)=\min\{t\ge0:D_p(x,t)\le\epsilon\}.$$

The concept of cutoffs was introduced by Aldous and Diaconis in [1, 2, 3] to capture the fact that many ergodic Markov chains converge abruptly to their stationary distributions (in total variation and separation). We refer the reader to [6, 7, 13, 14, 15] for details and further discussions on variant examples. Briefly, when $1<p\le\infty$, a family of finite ergodic Markov chains $(\Omega_n,K_n,\pi_n)$ with specified initial states $x_n$ has an $L^p$-cutoff with cutoff time $t_n$ if

$$\lim_{n\to\infty}D_{n,p}(x_n,(1+a)t_n)=\begin{cases}0&\text{if }a>0,\\ \infty&\text{if }-1<a<0,\end{cases}$$

where $D_{n,p}$ denotes the $L^p$-distance of the $n$th Markov chain. The definition of cutoffs in total variation, separation and $L^1$-distance is the same as above, except that the limit $\infty$ is replaced with 1 in total variation and separation, and with 2 in the $L^1$-distance.

2000 Mathematics Subject Classification. 60J05, 60J25.

Key words and phrases. Cutoff phenomenon, Ehrenfest chains.

$^1$ Partially supported by NSC grant NSC98-2628-M-009-003 and CMMSC and NCTS, Taiwan.
$^2$ Partially supported by NSC grant NSC99-2115-M-009-008 and CMMSC and NCTS, Taiwan.
$^3$ Partially supported by NSC grant NSC99-2115-M-009-010 and CMMSC and NCTS, Taiwan.

In [6], the authors discussed a number of variants of cutoffs and produced, in the reversible case, a necessary and sufficient condition for the existence of a max-$L^p$-cutoff, which is a cutoff in the distance $\max_{x\in\Omega}D_p(x,\cdot)$ with $1<p\le\infty$. In [7], an equivalent condition on the $L^2$-cutoff is established for families of Markov processes with specified initial distributions, assuming that the associated semigroups are normal. Also, a formula for the $L^2$-cutoff time was introduced in [7], based on complete information about the spectral decomposition. This is in contrast with the techniques and results in [6], which rely little on spectral theory.

Consider the Ehrenfest chains. For $n\ge1$, let $\Omega_n=\{0,1,\dots,n\}$ and let $K_n$ be the Markov kernel of the Ehrenfest chain on $\Omega_n$ defined by

$$K_n(i,i+1)=1-\frac{i}{n},\qquad K_n(i+1,i)=\frac{i+1}{n},\qquad\forall\,0\le i\le n-1.\tag{1.1}$$

It is a simple exercise to check that the unbiased binomial distribution $\pi_n(i)=\binom{n}{i}2^{-n}$ is the invariant probability of $K_n$ and that the pair $(K_n,\pi_n)$ is reversible, i.e., $\pi_n(i)K_n(i,j)=\pi_n(j)K_n(j,i)$ for all $i,j\in\Omega_n$. By lifting the chain to a random walk on the hypercube, one may use the group representation of $(\mathbb{Z}_2)^n$ to identify the eigenvalues and eigenvectors of $K_n$ as follows.

Lemma 1.1. The matrix defined in (1.1) has eigenvalues

$$\beta_{n,i}=1-\frac{2i}{n},\qquad0\le i\le n,$$

with $L^2(\pi_n)$-normalized eigenvectors

$$\psi_{n,i}(x)=\binom{n}{i}^{-1/2}\sum_{k=0}^{i}(-1)^k\binom{x}{k}\binom{n-x}{i-k},\qquad0\le i,x\le n.\tag{1.2}$$
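As a quick sanity check of Lemma 1.1 and of the reversibility of $(K_n,\pi_n)$ stated above, the following Python sketch verifies these claims in exact integer arithmetic for small $n$. It is our illustration, not part of the original argument, and the helper names (`krawtchouk`, `is_eigenvector`, `is_orthonormal`, `is_reversible`) are hypothetical. It works with the unnormalized sum $v_i(x)$ in (1.2). After multiplying by $n$, the eigen-equation $K_n\psi_{n,i}=\beta_{n,i}\psi_{n,i}$ becomes $(n-x)v_i(x+1)+xv_i(x-1)=(n-2i)v_i(x)$. The $L^2(\pi_n)$-orthonormality becomes $\sum_x\binom{n}{x}v_i(x)v_j(x)=2^n\binom{n}{i}\delta_{ij}$.

```python
from math import comb


def krawtchouk(n: int, i: int, x: int) -> int:
    """Unnormalized sum in (1.2): sum_k (-1)^k C(x, k) C(n - x, i - k)."""
    return sum((-1) ** k * comb(x, k) * comb(n - x, i - k) for k in range(i + 1))


def is_eigenvector(n: int, i: int) -> bool:
    """Check n * (K_n v)(x) == (n - 2i) v(x) for v = krawtchouk(n, i, .)."""
    v = [krawtchouk(n, i, x) for x in range(n + 1)]
    for x in range(n + 1):
        up = (n - x) * v[x + 1] if x < n else 0   # n * K_n(x, x+1) = n - x
        down = x * v[x - 1] if x > 0 else 0       # n * K_n(x, x-1) = x
        if up + down != (n - 2 * i) * v[x]:
            return False
    return True


def is_orthonormal(n: int) -> bool:
    """Check sum_x C(n,x) v_i(x) v_j(x) == 2^n C(n,i) [i == j]."""
    for i in range(n + 1):
        for j in range(n + 1):
            s = sum(comb(n, x) * krawtchouk(n, i, x) * krawtchouk(n, j, x)
                    for x in range(n + 1))
            if s != (2 ** n * comb(n, i) if i == j else 0):
                return False
    return True


def is_reversible(n: int) -> bool:
    """Detailed balance pi_n(i) K_n(i,i+1) = pi_n(i+1) K_n(i+1,i), factor 2^-n / n cleared."""
    return all(comb(n, i) * (n - i) == comb(n, i + 1) * (i + 1) for i in range(n))
```

For example, `all(is_eigenvector(n, i) for n in range(1, 13) for i in range(n + 1))` should return `True`. One can also read off $\psi_{n,1}(x)=(n-2x)/\sqrt n$ and $\psi_{n,n}(x)=(-1)^x$ from `krawtchouk(n, 1, x)` and `krawtchouk(n, n, x)`.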

See, e.g., [8] for a proof. Based on the above result, Chen and Saloff-Coste obtained the following theorem.

Theorem 1.2 ([7, Theorem 6.5]). Let $K_n$ be defined in (1.1) and set $K_n'=(I+nK_n)/(n+1)$ and $\pi_n(i)=\binom{n}{i}2^{-n}$. Then the following are equivalent.

(1) The family $\{(\Omega_n,K_n',\pi_n):n=1,2,\dots\}$ with starting states $(x_n)_{n=1}^\infty$ has an $L^2$-cutoff.

(2) $|n-2x_n|/\sqrt{n}\to\infty$ as $n\to\infty$.

Moreover, if (2) holds, then

$$T_{n,2}(x_n,\epsilon)=\frac{n}{2}\log\frac{|n-2x_n|}{\sqrt{n}}+O_\epsilon(n),\qquad\forall\,\epsilon>0.$$

The notation $O_\epsilon(n)$ denotes a sequence in $n$ whose absolute values are bounded above by $C_\epsilon n$ for all $n\ge1$, with $0<C_\epsilon<\infty$.

The aim of this paper is to provide a necessary and sufficient condition for the $L^p$-cutoff of Ehrenfest chains with $1\le p<\infty$ and to describe the $L^p$-cutoff time, if any. For $1<p<\infty$, the eigenfunctions are useful in bounding the $L^p$-distance; however, they do not work very well in bounding the total variation distance of the associated semigroup from below. A path comparison with the simple random walk on $\mathbb{Z}$ is used to obtain a suitable lower bound, and this leads to the following result.

Theorem 1.3. As in the setting of Theorem 1.2, for $p\in[1,\infty)$, the following are equivalent.

(1) The family $\{(\Omega_n,K_n',\pi_n):n=1,2,\dots\}$ with starting states $(x_n)_{n=1}^\infty$ has an $L^p$-cutoff.

(2) The family has an $L^p$-precutoff.

(3) $|n-2x_n|/\sqrt{n}\to\infty$ as $n\to\infty$.

Moreover, if (3) holds, then

$$T_{n,p}(x_n,\epsilon)=\frac{n}{2}\log\frac{|n-2x_n|}{\sqrt{n}}+O_{\epsilon,p}(n),\qquad\forall\,\epsilon>0,\ p\in(1,\infty).$$

For $p=1$, the above identity remains true with $\epsilon\in(0,2)$.

This theorem is a special case of Theorems 4.1 and 5.1. The concept of precutoff will be introduced in the next section. In the case $p=1$, it has been proved in [7] that (3) is sufficient for (1). As the Ehrenfest chain is a birth-and-death chain, we refer the reader to [9, 10] for more results on cutoffs: the first article treats the cutoff in separation for chains starting from one end-point, and the second considers the max-total variation cutoff for lazy chains. Both introduce universal criteria for cutoffs, but the Ehrenfest chain falls outside their scope.
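To make the cutoff time in Theorem 1.3 concrete, here is a small Python sketch (our illustration, not from the paper; function names are hypothetical). It evaluates the continuous-time $L^2$-distance through the spectral identity $[D^c_{n,2}(x,t)]^2=\sum_{i=1}^n\psi_{n,i}(x)^2e^{-4it/n}$. This identity follows from reversibility and Lemma 1.1 and is used in Section 4; the continuous-time chain has the same cutoff time by Theorem 5.1. For the starting state $x=0$ one has $\psi_{n,i}(0)^2=\binom{n}{i}$, so the sum collapses to $(1+e^{-4t/n})^n-1$. At $t=t_n+an$ with $t_n=\frac n4\log n$, this tends to $\exp(e^{-4a})-1$. That limit exhibits the $(t_n,n)$ cutoff, and Bernoulli's inequality gives the lower bound $D^c_{n,2}\ge e^{-2a}$.

```python
from math import comb, exp, log, sqrt


def krawtchouk(n: int, i: int, x: int) -> int:
    """Unnormalized sum in (1.2); psi_{n,i}(x) = krawtchouk(n, i, x) / sqrt(C(n, i))."""
    return sum((-1) ** k * comb(x, k) * comb(n - x, i - k) for k in range(i + 1))


def l2_distance_ct(n: int, x: int, t: float) -> float:
    """D^c_{n,2}(x, t) for H_{n,t} = exp(-t(I - K_n)), via the spectral sum."""
    total = 0.0
    for i in range(1, n + 1):
        k = krawtchouk(n, i, x)
        # psi_{n,i}(x)^2 = k^2 / C(n, i): exact big-int division returning a float
        total += (k * k / comb(n, i)) * exp(-4.0 * i * t / n)
    return sqrt(total)


def cutoff_time(n: int, x: int) -> float:
    """t_n = (n/2) log(|n - 2x| / sqrt(n)) as in Theorem 1.3 (requires 2x != n)."""
    return 0.5 * n * log(abs(n - 2 * x) / sqrt(n))
```

For instance, with `n = 200` one may tabulate `l2_distance_ct(n, 0, cutoff_time(n, 0) + a * n)` for `a` in $\{-1,0,1,2\}$. The values fall from about $10^{10}$ to about $0.02$ across a window of a few multiples of $n$, while $t_n\approx 265$ itself grows like $n\log n$.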

The remainder of this article is organized as follows. In Section 2, we recall various notions of cutoffs and quote useful results from [6]. In Section 3, we recall some well-known results for simple random walks on $\mathbb{Z}$, which will be used later, and provide proofs. In Section 4, we deal with the total variation cutoff for the Ehrenfest chains in both the continuous time and discrete time cases. The ideas in this section are in fact applicable to more general models. In Section 5, we treat the $L^p$-cutoff and spell out the results along with the open problems.

2. Cutoffs

Throughout the following sections, $(\Omega,K,\pi,\mu)$ denotes a time-homogeneous irreducible Markov chain on $\Omega$ with one-step transition matrix $K$, invariant probability $\pi$ and initial distribution $\mu$. Write $(\Omega,H_t,\pi,\mu)$ for the continuous time Markov chain associated with $(\Omega,K,\pi,\mu)$, where $H_t=e^{-t(I-K)}$ is the semigroup associated with $K$. If the chain starts at state $x$, we write $(\Omega,K,\pi,x)$ and $(\Omega,H_t,\pi,x)$ instead. For any two sequences of positive numbers $t_n,s_n$, the notation $s_n=O(t_n)$ means that there are $N>0$ and $C>0$ such that $s_n\le Ct_n$ for all $n\ge N$. If both $s_n=O(t_n)$ and $t_n=O(s_n)$ hold, we simply write $t_n\asymp s_n$.


In this section, we recall various definitions of cutoffs and a series of related results from [6]. The notion of cutoff can be developed for any family of non-increasing functions taking values in $[0,\infty]$. The following definitions treat the $L^p$-cutoff for families of finite ergodic Markov chains with specified initial distributions in the discrete time case. We refer the reader to [6] for further details and examples.

Definition 2.1. Let $\mathcal{F}=\{(\Omega_n,K_n,\pi_n,\mu_n):n=1,2,\dots\}$ be a family of irreducible and aperiodic finite Markov chains. For $p\in(1,\infty]$, the family $\mathcal{F}$ is said to present:

(1) An $L^p$-precutoff if there are a sequence $t_n>0$ and constants $0<A<B$ such that

$$\lim_{n\to\infty}D_{n,p}(\mu_n,B_n)=0,\qquad\liminf_{n\to\infty}D_{n,p}(\mu_n,A_n)>0,$$

where $B_n=\inf\{j\ge0:j>Bt_n\}$ and $A_n=\sup\{j\ge0:j<At_n\}$.

(2) An $L^p$-cutoff if there is a sequence $t_n>0$ such that, for all $\epsilon\in(0,1)$,

$$\lim_{n\to\infty}D_{n,p}(\mu_n,\overline{k}_n(\epsilon))=0,\qquad\lim_{n\to\infty}D_{n,p}(\mu_n,\underline{k}_n(-\epsilon))=\infty,$$

where $\overline{k}_n(\epsilon)=\inf\{j\ge0:j>(1+\epsilon)t_n\}$ and $\underline{k}_n(\epsilon)=\sup\{j\ge0:j<(1+\epsilon)t_n\}$.

(3) A $(t_n,b_n)$ $L^p$-cutoff if $t_n>0$, $b_n>0$, $b_n=o(t_n)$ and

$$\lim_{c\to\infty}\overline{F}_p(c)=0,\qquad\lim_{c\to-\infty}\underline{F}_p(c)=\infty,$$

where

$$\overline{F}_p(c)=\limsup_{n\to\infty}D_{n,p}(\mu_n,\overline{k}(n,c)),\qquad\underline{F}_p(c)=\liminf_{n\to\infty}D_{n,p}(\mu_n,\underline{k}(n,c)),$$

and $\overline{k}(n,c)=\inf\{j\ge0:j>t_n+cb_n\}$ and $\underline{k}(n,c)=\sup\{j\ge0:j<t_n+cb_n\}$.

The definition for the case $p=1$ follows if $\infty$ is replaced by 2.

The definition agrees with that in [6]. In (2) and (3), $t_n$ is called an $L^p$-cutoff time, and in (3), $b_n$ is called a window with respect to $t_n$. In (3), the functions $\overline{F}_p$ and $\underline{F}_p$ give an idea of how the cutoff evolves and are sometimes called the shape of the $(t_n,b_n)$ cutoff.

Remark 2.1. Note that, for $t>0$, the mapping $t\mapsto D_{n,p}(\mu_n,t)$ is non-increasing. This implies that, if $t_n$ tends to infinity (or equivalently $T_{n,p}(\mu_n,\epsilon)\to\infty$ for some $\epsilon>0$) in Definition 2.1, it makes no difference to replace $A_n$ with $\lfloor At_n\rfloor$ or $\lceil At_n\rceil$, and similarly for $B_n$, $\overline{k}_n(\epsilon)$, $\underline{k}_n(\epsilon)$, $\overline{k}(n,c)$ and $\underline{k}(n,c)$.

Remark 2.2. In the continuous time case, the definition of cutoffs in Definition 2.1 is adapted in the natural way. That is, $A_n=At_n$, $B_n=Bt_n$, $\overline{k}_n(\epsilon)=\underline{k}_n(\epsilon)=(1+\epsilon)t_n$ and $\overline{k}(n,c)=\underline{k}(n,c)=t_n+cb_n$.

Remark 2.3. According to Definition 2.1, if a family has no $L^p$-precutoff (resp. $L^p$-cutoff), then the family obtained by merging it with any other family still has no $L^p$-precutoff (resp. $L^p$-cutoff). This implies that if a subfamily has no $L^p$-precutoff (resp. $L^p$-cutoff), then the original family has no $L^p$-precutoff (resp. $L^p$-cutoff). However, there might exist another subfamily that has an $L^p$-precutoff (resp. $L^p$-cutoff).

Definition 2.2. Let $(\Omega,K,\pi,\mu)$ be an irreducible finite Markov chain and $p\in[1,\infty]$. For $\epsilon>0$, the $\epsilon$-$L^p$-mixing time (or briefly the $L^p$-mixing time) is defined to be

$$T_p(\mu,\epsilon):=\inf\{t\ge0:D_p(\mu,t)\le\epsilon\},$$

where the right side is set to be infinity if the infimum is taken over an empty set. If $(\Omega,H_t,\pi,\mu)$ is the continuous time chain associated with $K$, write the $L^p$-mixing time as

$$T_p^c(\mu,\epsilon):=\inf\{t\ge0:D_p^c(\mu,t)\le\epsilon\},$$

where $D_p^c(\mu,t)$ is the $L^p$-distance between $\mu H_t$ and $\pi$.

The concept of cutoff can also be described using the notion of mixing time. For instance, assuming $T_{n,p}(\mu_n,\epsilon)\to\infty$ for some $\epsilon>0$, a family of irreducible and aperiodic Markov chains has an $L^p$-cutoff if and only if

$$\lim_{n\to\infty}\frac{T_{n,p}(\mu_n,\epsilon)}{T_{n,p}(\mu_n,\delta)}=1,\qquad\forall\,\epsilon,\delta\in(0,M_p),$$

where $M_p=\infty$ if $p>1$ and $M_1=2$. See [6, Propositions 2.3-2.4] for further details and relationships.
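The mixing-time characterization above can be illustrated numerically. The sketch below iterates the lazy Ehrenfest kernel $K_n'=(I+nK_n)/(n+1)$ of Theorem 1.2 from $x_n=0$, where $|n-2x_n|/\sqrt n\to\infty$, and records the total variation mixing times $T_{n,TV}(0,\epsilon)$. It is our illustration, with the starting state and thresholds as our own choices, and the function name `tv_mixing_times` is hypothetical. The ratio $T_{n,TV}(0,0.25)/T_{n,TV}(0,0.75)$ should drift toward 1 as $n$ grows, but only slowly. The reason is that the window $n$ is smaller than $t_n=\frac n4\log n$ only by a logarithmic factor.

```python
from math import comb


def tv_mixing_times(n: int, x: int, eps_list, t_max: int) -> dict:
    """Return {eps: T_{n,TV}(x, eps)} for K'_n = I/(n+1) + n K_n/(n+1), up to t_max steps."""
    pi = [comb(n, y) / 2 ** n for y in range(n + 1)]
    dist = [0.0] * (n + 1)
    dist[x] = 1.0
    pending = sorted(eps_list, reverse=True)   # TV is non-increasing, so large eps come first
    times = {}
    for t in range(t_max + 1):
        tv = 0.5 * sum(abs(d - p) for d, p in zip(dist, pi))
        while pending and tv <= pending[0]:
            times[pending.pop(0)] = t
        if not pending:
            break
        new = [d / (n + 1) for d in dist]      # holding probability 1/(n+1)
        for y, mass in enumerate(dist):
            if y < n:
                new[y + 1] += mass * (n - y) / (n + 1)
            if y > 0:
                new[y - 1] += mass * y / (n + 1)
        dist = new
    return times
```

A typical use is `T = tv_mixing_times(200, 0, [0.25, 0.75], 4000)` followed by `T[0.25] / T[0.75]`. Comparing this ratio for, say, $n=25$ and $n=200$ shows it decreasing.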

We end this section with the following lemmas and corollary, which are useful for proving or disproving cutoffs.

Lemma 2.1 ([7, Proposition 2.1]). Let $\mathcal{F}=\{(\Omega_n,K_n,\pi_n,\mu_n):n=1,2,\dots\}$ be a family of irreducible and aperiodic Markov chains. For any subsequence $\xi=(\xi_n)_{n=1}^\infty$ of positive integers, set $\mathcal{F}_\xi=\{(\Omega_{\xi_n},K_{\xi_n},\pi_{\xi_n},\mu_{\xi_n}):n=1,2,\dots\}$. Let $p\in[1,\infty]$ and assume $T_{n,p}(\mu_n,\epsilon)\to\infty$ for some $\epsilon>0$. Then the following are equivalent.

(1) $\mathcal{F}$ has an $L^p$-cutoff (resp. a $(t_n,b_n)$ $L^p$-cutoff).

(2) For any subsequence $\xi$, $\mathcal{F}_\xi$ has an $L^p$-cutoff (resp. a $(t_{\xi_n},b_{\xi_n})$ $L^p$-cutoff).

(3) For any subsequence $\xi$, there is a further subsequence $\xi'$ such that $\mathcal{F}_{\xi'}$ has an $L^p$-cutoff (resp. a $(t_{\xi'_n},b_{\xi'_n})$ $L^p$-cutoff).

Remark 2.4. In Lemma 2.1, (1)⇒(2)⇒(3) also holds for the $L^p$-precutoff.

Lemma 2.2. Let $\mathcal{F}=\{(\Omega_n,K_n,\pi_n,\mu_n):n=1,2,\dots\}$ be a family of irreducible and aperiodic Markov chains and $p\in[1,\infty]$. Suppose that there are $\epsilon>0$ and $a_n\to\infty$ such that $T_{n,p}(\mu_n,\epsilon)\asymp a_n$ and $T_{n,p}(\mu_n,\delta)=O(a_n)$ for all $0<\delta<\epsilon$. Then the following are equivalent.

(1) $\mathcal{F}$ has no $L^p$-precutoff.

(2) For all $c>0$, $\limsup_{n\to\infty}D_{n,p}(\mu_n,\lfloor ca_n\rfloor)>0$.

(3) As $\delta\to0$, $\limsup_{n\to\infty}T_{n,p}(\mu_n,\delta)/a_n\to\infty$.

Proof. (2)⇔(3) is obvious from the definition of the $L^p$-mixing time. By the monotonicity of the $L^p$-distance, the negations of (1) and (2) are exactly

(1)' $\mathcal{F}$ has an $L^p$-precutoff.

(2)' There is $C>0$ such that $\lim_{n\to\infty}D_{n,p}(\mu_n,\lfloor Ca_n\rfloor)=0$.

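We prove the equivalence of (1) and (2) by showing (1)'⇔(2)' instead. First, assume that $\mathcal{F}$ has an $L^p$-precutoff and, according to Remark 2.1, let $t_n>0$ and $0<A<B$ be constants such that

$$\liminf_{n\to\infty}D_{n,p}(\mu_n,\lfloor At_n\rfloor)=\epsilon_0>0,\qquad\lim_{n\to\infty}D_{n,p}(\mu_n,\lfloor Bt_n\rfloor)=0.$$

Let $\delta<\min\{\epsilon,\epsilon_0\}$ and choose $N>0$, $C_1>0$ such that

$$D_{n,p}(\mu_n,\lfloor At_n\rfloor)>\delta>D_{n,p}(\mu_n,\lfloor Bt_n\rfloor),\qquad T_{n,p}(\mu_n,\delta)\le C_1a_n,\qquad\forall\,n\ge N.$$

The former implies $At_n\le T_{n,p}(\mu_n,\delta)\le Bt_n$ and, then,

$$Bt_n\le\frac{B\,T_{n,p}(\mu_n,\delta)}{A}\le\frac{BC_1}{A}a_n.$$

This yields

$$\limsup_{n\to\infty}D_{n,p}(\mu_n,\lfloor BC_1a_n/A\rfloor)\le\limsup_{n\to\infty}D_{n,p}(\mu_n,\lfloor Bt_n\rfloor)=0.$$

Second, assume (2)' and choose $C_2>0$ such that $T_{n,p}(\mu_n,\epsilon)\ge C_2a_n$ and $a_n\ge2/C_2$. Then, for $n\ge1$,

$$D_{n,p}(\mu_n,\lfloor C_2a_n/2\rfloor)\ge D_{n,p}(\mu_n,\lfloor C_2a_n-1\rfloor)\ge D_{n,p}(\mu_n,T_{n,p}(\mu_n,\epsilon)-1)>\epsilon>0.$$

This proves the $L^p$-precutoff.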

The following is a simple corollary of Lemma 2.2, which characterizes the absence of an $L^p$-precutoff in every subfamily.

Corollary 2.3. As in the setting of Lemma 2.2, the following are equivalent.

(1) No subfamily of $\mathcal{F}$ has an $L^p$-precutoff.

(2) For all $c>0$, $\liminf_{n\to\infty}D_{n,p}(\mu_n,\lfloor ca_n\rfloor)>0$.

(3) As $\delta\to0$, $\liminf_{n\to\infty}T_{n,p}(\mu_n,\delta)/a_n\to\infty$.

Remark 2.5. It makes no difference to replace $\lfloor ca_n\rfloor$ with $\lceil ca_n\rceil$ in (2) of Lemma 2.2 and Corollary 2.3.

Remark 2.6. Lemmas 2.1-2.2 and Corollary 2.3 can be generalized to any family of non-increasing functions defined on $\{0,1,2,\dots\}$ or $[0,\infty)$. In particular, they hold for continuous time Markov chains without the assumptions $T_{n,p}(\mu_n,\epsilon)\to\infty$ and $a_n\to\infty$.

3. Simple random walks on Z

This section is devoted to establishing some frequently used inequalities related to the simple random walk on the integers. A simple random walk is a discrete time Markov chain $(X_n)_{n=0}^\infty$ whose transition matrix is given by

$$K(i,i+1)=K(i,i-1)=1/2,\qquad\forall\,i\in\mathbb{Z}.$$

For $m\ge1$, let $T_m$ be the first passage time to the set $\{\pm m\}$, i.e.,

$$T_m=\inf\{n\ge0:X_n=m\text{ or }X_n=-m\}.\tag{3.1}$$

For the continuous time case, let $N(t)$ be a Poisson process with parameter 1, independent of $(X_n)$, and set $Y_t=X_{N(t)}$. Clearly, $Y_t$ is a realization of the semigroup $H_t=e^{-t(I-K)}$ associated with $K$, and the first passage time to $\{\pm m\}$ is denoted by

$$\tilde{T}_m=\inf\{t\ge0:Y_t=m\text{ or }Y_t=-m\}.\tag{3.2}$$

Theorem 3.1. Let $T_m$ and $\tilde{T}_m$ be the random times defined in (3.1)-(3.2), and let $P_0$ be the conditional probability given that the initial state is 0. Then, for any $b>1$ and $m\ge5$,

$$\min\{P_0(T_m>bm^2),\,P_0(\tilde{T}_m>bm^2)\}\ge e^{-2b}.$$

Remark 3.1. This theorem says that, in both the discrete time and continuous time cases, the simple random walk starting from the origin does not reach $\pm m$ before time $bm^2$ with probability at least $e^{-2b}$, uniformly in $m\ge5$.
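Theorem 3.1 is easy to probe numerically. The sketch below is ours, for illustration only, and its function names are hypothetical. It computes $P_0(T_m>k)$ exactly, up to floating-point rounding, by propagating the sub-probability distribution of the walk killed on $\{\pm m\}$. It then obtains the continuous-time probability by Poissonization, $P_0(\tilde T_m>t)=\sum_k e^{-t}t^k/k!\,P_0(T_m>k)$, which holds because $Y_t=X_{N(t)}$ with $N$ independent of $X$. The Poisson sum is truncated, which can only lower its value, so comparing it with $e^{-2b}$ is conservative.

```python
from math import exp, floor, lgamma, log, sqrt


def survival_curve(m: int, k_max: int) -> list:
    """[P_0(T_m > k) for k = 0, ..., k_max] for the simple random walk on Z."""
    size = 2 * m - 1                 # states -(m-1), ..., m-1 stored at indices 0, ..., 2m-2
    p = [0.0] * size
    p[m - 1] = 1.0                   # the walk starts at the origin
    curve = [1.0]
    for _ in range(k_max):
        q = [0.0] * size
        for j, mass in enumerate(p):
            if j + 1 < size:
                q[j + 1] += 0.5 * mass
            if j >= 1:
                q[j - 1] += 0.5 * mass
        p = q                        # mass that stepped onto +m or -m has been dropped
        curve.append(sum(p))
    return curve


def discrete_survival(m: int, b: float) -> float:
    """P_0(T_m > b m^2); since T_m is an integer, this equals P_0(T_m > floor(b m^2))."""
    k = floor(b * m * m)
    return survival_curve(m, k)[k]


def continuous_survival_lower(m: int, b: float) -> float:
    """Lower bound for P_0(T~_m > b m^2): truncated Poisson(b m^2) mixture of survival_curve."""
    t = b * m * m
    k_max = int(t + 10 * sqrt(t)) + 10
    curve = survival_curve(m, k_max)
    # Poisson weights e^{-t} t^k / k! evaluated in log space to avoid underflow
    return sum(exp(-t + k * log(t) - lgamma(k + 1)) * curve[k] for k in range(k_max + 1))
```

For example, `discrete_survival(5, 1.01)` is roughly 0.35, comfortably above $e^{-2.02}\approx0.13$. The slack reflects the fact that the leading decay rate is about $\pi^2/8$ rather than 2.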

To prove this theorem, we introduce the following proposition.

Proposition 3.2. Let $K$ be the transition matrix of an irreducible birth-and-death chain on $\{0,1,\dots\}$. For $m\ge1$, let $\tau_m$ and $\tilde{\tau}_m$ be respectively the first passage times to state $m$ associated with the discrete time and continuous time chains. Let $\lambda_1,\dots,\lambda_m$ be the eigenvalues of the submatrix of $I-K$ indexed by $\{0,1,\dots,m-1\}$. Then $\lambda_i\in(0,2)$ for $1\le i\le m$, $\lambda_i\ne\lambda_j$ for $i\ne j$, and

$$P_0(\tau_m>k)=\sum_{i=1}^m\Big(\prod_{j:j\ne i}\frac{\lambda_j}{\lambda_j-\lambda_i}\Big)(1-\lambda_i)^k\tag{3.3}$$

and

$$P_0(\tilde{\tau}_m>t)=\sum_{i=1}^m\Big(\prod_{j:j\ne i}\frac{\lambda_j}{\lambda_j-\lambda_i}\Big)e^{-t\lambda_i}.\tag{3.4}$$

Remark 3.2. The right side of (3.4) is exactly $P(\tilde{T}>t)$, where $\tilde{T}$ is a sum of $m$ independent exponential random variables with parameters $\lambda_1,\dots,\lambda_m$. Assuming $\lambda_i\in(0,1)$ for all $1\le i\le m$, the right side of (3.3) is equal to $P(T>k)$, where $T$ is a sum of independent geometric random variables with success probabilities $\lambda_1,\dots,\lambda_m$.

Proof of Proposition 3.2. The proof for the continuous time case is available in [4], while the proof for the discrete time case follows in the same spirit.

Back to the setting of the simple random walk. Observe that

$$P_0(T_m>k)=P_0(|X_i|<m,\ \forall\,i\le k),\qquad P_0(\tilde{T}_m>t)=P_0(|Y_s|<m,\ \forall\,s\le t).$$

By the symmetry of the walk starting from 0, one may collapse the states $\pm i$ to obtain

$$P_0(T_m>k)=P_0'(\tau_m>k),\qquad P_0(\tilde{T}_m>t)=P_0'(\tilde{\tau}_m>t),$$

where $P_0'$ is the probability for the birth-and-death chain on $\{0,1,\dots\}$ with initial state 0 and transition matrix $K'$ given by

$$K'(0,1)=1,\qquad K'(i,i-1)=K'(i,i+1)=1/2,\quad\forall\,i\ge1.$$

Here, $\tau_m$ and $\tilde{\tau}_m$ are the first passage times to state $m$ associated with the discrete

[11, Section XIV.5], the eigenvalues and eigenvectors of the submatrix of $I-K'$ indexed by $0,1,\dots,m-1$ are

$$\lambda_i=1-\cos\frac{(2i-1)\pi}{2m},\qquad\phi_i(j)=\cos\frac{(2i-1)(j-1)\pi}{2m},\qquad\forall\,i,j\in\{1,\dots,m\}.$$

We first treat the continuous time case. Let $S_1,\dots,S_m$ be independent exponential random variables with parameters $\lambda_1,\dots,\lambda_m$. As a consequence of Proposition 3.2, replacing $t$ with $bm^2$ yields

$$P_0(\tilde{T}_m>bm^2)=P(S_1+\cdots+S_m>bm^2)\ge P(S_1>bm^2)=e^{-bm^2\lambda_1}\ge e^{-2b},$$

where the last inequality uses the fact that $1-\cos t\le t^2/2$. For the discrete time case, the periodicity of $K'$, which has period 2, implies $\lambda_i>1$ for some $i$. This prevents us from using the same reasoning as in the continuous time case. One way to remove the periodicity of $K'$ is to consider the lazy walk with transition matrix $\frac12(I+K')$, since the eigenvalues of the submatrix of $I-\frac12(I+K')$ indexed by $\{0,\dots,m-1\}$ are contained in $(0,1)$. To see the details, let $(X_n')_{n=0}^\infty$ be the birth-and-death chain with transition matrix $K'$ and define $Z_n=X_{2n}'/2$. Obviously,

This implies that, given $X_0'=0$, or equivalently $Z_0=0$, $(Z_n)_{n=0}^\infty$ is a Markov chain on $\{0,1,\dots\}$ with initial state 0 and transition matrix $\frac12(I+K')$. Furthermore, by the periodicity of $K'$, if $m$ is even and positive, then

$$P_0'(\tau_m>k)=P_0'(X_i'<m,\ \forall\,i\le k)=P_0'(Z_i<m/2,\ \forall\,i\le\lfloor k/2\rfloor).$$

If $m$ is odd and $m>1$, then

$$P_0'(\tau_m>k)=P_1'(X_i'<m,\ \forall\,i\le k-1)=P_0'(Z_i<(m-1)/2,\ \forall\,i\le\lfloor(k-1)/2\rfloor),$$

where the last equality uses the fact that, given $X_0'=1$, the process $((X_{2n}'-1)/2)_{n=1}^\infty$ has the same distribution as $(Z_n)_{n=1}^\infty$ with $Z_0=0$. Let $\tau_m'$ be the first passage time to $m$ of the chain $(Z_n)_{n=0}^\infty$. Putting all of the above together yields

$$P_0(T_m>k)=P_0'(\tau_m>k)\ge P_0'(\tau_{\lfloor m/2\rfloor}'>\lfloor k/2\rfloor).$$

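Note that the eigenvalues of the submatrix of $I-\frac12(I+K')$ indexed by $0,1,\dots,\lfloor m/2\rfloor-1$ are $\lambda_i/2\in(0,1)$, $1\le i\le\lfloor m/2\rfloor$, where $\lambda_i$ now denotes the eigenvalues above with $m$ replaced by $\lfloor m/2\rfloor$. By Proposition 3.2, if $S_1',\dots,S_{\lfloor m/2\rfloor}'$ are independent geometric random variables with success probabilities $\lambda_1/2,\dots,\lambda_{\lfloor m/2\rfloor}/2$, then, for any positive integer $k$,

$$P_0(T_m>k)\ge P(S_1'+\cdots+S_{\lfloor m/2\rfloor}'>\lfloor k/2\rfloor)\ge\Big(\frac{1+\cos(\pi/(2\lfloor m/2\rfloor))}{2}\Big)^{\lfloor k/2\rfloor}.$$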

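Replacing $k$ with $\lfloor bm^2\rfloor$, $b>1$ and $m>1$ gives

$$P_0(T_m>bm^2)\ge\Big(\frac{1+\cos(\pi/(2\lfloor m/2\rfloor))}{2}\Big)^{\lfloor k/2\rfloor}\ge\Big(\frac{1+\cos(\pi/(m-1))}{2}\Big)^{bm^2/2}=\Big(\cos\frac{\pi}{2(m-1)}\Big)^{bm^2}\ge\Big(1-\frac{\pi^2}{8(m-1)^2}\Big)^{bm^2}\ge e^{-2b},$$

where the last inequality uses the fact that $\log(1-t)\ge-12t/11$ for $0\le t<1/12$ and requires $m\ge5$.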
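Formula (3.3), with the eigenvalues quoted from [11, Section XIV.5], can be cross-checked against a direct computation. The following sketch is ours and its helper names are hypothetical. It evaluates the right side of (3.3) for the reflected walk $K'$ and compares it with $P_0'(\tau_m>k)$ obtained by iterating $K'$ killed at $m$. It also compares both with $P_0(T_m>k)$ for the simple random walk, which checks the collapsing step $P_0(T_m>k)=P_0'(\tau_m>k)$. Some $\lambda_i$ exceed 1 here, so the terms $(1-\lambda_i)^k$ alternate in sign. Identity (3.3) still holds, but it is no longer the geometric-sum representation of Remark 3.2.

```python
from math import cos, pi, prod


def reflected_eigenvalues(m: int) -> list:
    """lambda_i = 1 - cos((2i - 1) pi / (2m)), i = 1..m (submatrix of I - K' on {0..m-1})."""
    return [1 - cos((2 * i - 1) * pi / (2 * m)) for i in range(1, m + 1)]


def survival_by_formula(m: int, k: int) -> float:
    """Right side of (3.3)."""
    lam = reflected_eigenvalues(m)
    total = 0.0
    for i, li in enumerate(lam):
        coef = prod(lj / (lj - li) for j, lj in enumerate(lam) if j != i)
        total += coef * (1 - li) ** k
    return total


def survival_reflected(m: int, k: int) -> float:
    """P'_0(tau_m > k) for K'(0,1) = 1, K'(i, i +- 1) = 1/2 (i >= 1), killed on reaching m."""
    p = [0.0] * m
    p[0] = 1.0
    for _ in range(k):
        q = [0.0] * m
        if m > 1:
            q[1] += p[0]                  # K'(0, 1) = 1; for m = 1 this step hits m
        for i in range(1, m):
            q[i - 1] += 0.5 * p[i]
            if i + 1 < m:                 # a step onto m is absorbed
                q[i + 1] += 0.5 * p[i]
        p = q
    return sum(p)


def survival_srw(m: int, k: int) -> float:
    """P_0(T_m > k) for the simple random walk on Z killed on {+m, -m}."""
    p = {0: 1.0}
    for _ in range(k):
        q = {}
        for s, mass in p.items():
            for s2 in (s - 1, s + 1):
                if abs(s2) < m:
                    q[s2] = q.get(s2, 0.0) + 0.5 * mass
        p = q
    return sum(p.values())
```

The coefficients $\prod_{j\ne i}\lambda_j/(\lambda_j-\lambda_i)$ are Lagrange basis values at 0 on Chebyshev-type nodes, so they stay of moderate size and the floating-point evaluation of (3.3) is benign for small $m$.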

4. The total variation cutoff of Ehrenfest chains

This section is dedicated to the total variation cutoff of Ehrenfest chains. First, recall the setting in (1.1). For $n\ge1$, let $\Omega_n=\{0,1,\dots,n\}$ and let $K_n$ be the transition matrix of the Ehrenfest chain on $\Omega_n$ given by

$$K_n(i,i+1)=1-\frac{i}{n},\qquad K_n(i+1,i)=\frac{i+1}{n},\qquad\forall\,0\le i\le n-1.\tag{4.1}$$

It is easy to see that $K_n$ is irreducible with stationary distribution $\pi_n(i)=\binom{n}{i}2^{-n}$ for $0\le i\le n$, and that it has period 2. To deal with the periodicity of $K_n$ and with the semigroup associated with $K_n$, consider

$$K_n'=\frac{1}{n+1}I+\frac{n}{n+1}K_n,\qquad H_{n,t}=e^{-t(I-K_n)}=\sum_{i=0}^{\infty}\Big(\frac{e^{-t}t^i}{i!}\Big)K_n^i.\tag{4.2}$$

The total variation distances between $(K_n')^t$ (resp. $H_{n,t}$) and $\pi_n$ with initial state $x_n$ are defined by

$$D_{n,TV}(x_n,t):=\max_{A\subset\Omega_n}|(K_n')^t(x_n,A)-\pi_n(A)|\quad\text{and}\quad D^c_{n,TV}(x_n,t):=\max_{A\subset\Omega_n}|H_{n,t}(x_n,A)-\pi_n(A)|.$$

The total variation mixing times are set to be

$$T_{n,TV}(x_n,\epsilon):=\min\{t\ge0:D_{n,TV}(x_n,t)\le\epsilon\}$$

and

$$T^c_{n,TV}(x_n,\epsilon):=\min\{t\ge0:D^c_{n,TV}(x_n,t)\le\epsilon\}.$$

For $p\in[1,\infty]$, let $D_{n,p}$, $D^c_{n,p}$ and $T_{n,p}$, $T^c_{n,p}$ be the $L^p$-distances and the $L^p$-mixing times in the discrete and continuous time cases.

Remark 4.1. The coupling method, a classical probabilistic technique, was introduced by Aldous and Diaconis to control, and further to identify, the total variation distance. See [2] and the references therein for details.

According to the above setting, the total variation distance is exactly half of the $L^1$-distance and has maximum 1. In the same spirit, the total variation cutoff is consistent with the $L^1$-cutoff and, thus, its definition is the same as in Definition 2.1 except that $\infty$ is replaced by 1. The following theorem deals with the total variation cutoff of Ehrenfest chains.

Theorem 4.1. For $n\ge1$, let $x_n\in\Omega_n$. Consider the families $\mathcal{F}=\{(\Omega_n,K_n',\pi_n,x_n):n=1,2,\dots\}$ and $\mathcal{F}^c=\{(\Omega_n,H_{n,t},\pi_n,x_n):n=1,2,\dots\}$. Then the following are equivalent.

(1) $\mathcal{F}$ (resp. $\mathcal{F}^c$) has a total variation precutoff.

(2) $\mathcal{F}$ (resp. $\mathcal{F}^c$) has a total variation cutoff.

(3) $|n-2x_n|/\sqrt{n}\to\infty$.

Furthermore, if (3) holds, then both $\mathcal{F}$ and $\mathcal{F}^c$ have a $(t_n,n)$ total variation cutoff with

$$t_n=\frac{n}{2}\log\frac{|n-2x_n|}{\sqrt{n}}.$$

Remark 4.2. The window size $n$ is optimal in the sense that, if $\mathcal{F}$ or $\mathcal{F}^c$ has a $(t_n,b_n)$ total variation cutoff, then $n=O(b_n)$. See [6] for details on variants of window optimality.

Proof of Theorem 4.1. (3)⇒(2) and the $(t_n,n)$ total variation cutoff under (3) have been proved in [7]. (2)⇒(1) follows from the definition. For (1)⇒(3), we assume that (3) fails and prove that $\mathcal{F}$ and $\mathcal{F}^c$ have no total variation precutoff. By Remark 2.3, it suffices to show that, if $|x_n-n/2|/\sqrt{n}$ is bounded, then no subfamily of $\mathcal{F}$ or $\mathcal{F}^c$ has a total variation precutoff. The proof consists of three steps.

Step 1: Bounding the total variation from above. Note that the total variation distance is bounded above by half of the chi-square distance. That is,

$$2D_{n,TV}(x,t)\le D_{n,2}(x,t),\qquad2D^c_{n,TV}(x,t)\le D^c_{n,2}(x,t).$$

Using the reversibility of $K_n$ and Lemma 1.1, the $L^2$-distance can be expressed as

where $\psi_{n,i}$ is the function defined in (1.2) and the first inequality applies the identity $\psi_{n,n-i}(x)=(-1)^x\psi_{n,i}(x)$ for all $x,i\in\{0,1,\dots,n\}$. It is worth noting that the summation in the last line is also an upper bound in the continuous time case, since

$$[D^c_{n,2}(x,t)]^2=\sum_{i=1}^n|\psi_{n,i}(x)|^2e^{-4it/n}\le2\sum_{i=1}^{\lfloor n/2\rfloor}|\psi_{n,i}(x)|^2e^{-4it/(n+1)}+e^{-4nt/(n+1)}.$$

Observe that $\psi_{n,i}(x)=\binom{n}{i}^{1/2}P_i(x,1/2,n)$, where $P_i(x,p,n)$ is the Krawtchouk polynomial, i.e.,

$$P_i(x,p,n)={}_2F_1\Big(\begin{matrix}-i,\ -x\\-n\end{matrix};\frac1p\Big).$$

See [12] for the definition. Using the recurrence relation

$$(n-2x)P_i(x,1/2,n)=(n-i)P_{i+1}(x,1/2,n)+iP_{i-1}(x,1/2,n),$$

we may rewrite

$$\psi_{n,i+1}(x)=\frac{n-2x}{\sqrt{n}}A_{n,i}\psi_{n,i}(x)-B_{n,i}\psi_{n,i-1}(x),\tag{4.3}$$

where

$$A_{n,i}=\sqrt{\frac{n}{(i+1)(n-i)}},\qquad B_{n,i}=\sqrt{\frac{i(n-i+1)}{(i+1)(n-i)}}.$$

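A direct computation shows that, for $n\ge2$, $A_{n,i}\le1$ for $1\le i<n$ and $B_{n,i}\le1$ for $1\le i\le n/2$. By setting $r=1+\sup_n\{|n-2x_n|/\sqrt{n}\}<\infty$, we obtain

$$|\psi_{n,i+1}(x_n)|\le(r-1)|\psi_{n,i}(x_n)|+|\psi_{n,i-1}(x_n)|,\qquad\forall\,1\le i\le n/2.$$

Along with the boundary conditions

$$|\psi_{n,0}(x_n)|=1,\qquad|\psi_{n,1}(x_n)|=|n-2x_n|/\sqrt{n}\le r-1,$$

the above inequality yields $|\psi_{n,i}(x_n)|\le r^i$ for $0\le i\le\lfloor n/2\rfloor$. By the identity $\psi_{n,n-i}(x)=(-1)^x\psi_{n,i}(x)$ and $r\ge1$, this extends to

$$|\psi_{n,i}(x_n)|\le r^i,\qquad\forall\,0\le i\le n.$$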

Putting this back into the computation of the $L^2$-distance yields, for any positive integer $N\ge\frac14\log(2r^2)$,

$$\max\{D_{n,TV}(x_n,N(n+1)),D^c_{n,TV}(x_n,N(n+1))\}\le\frac12\Big(2\sum_{i=1}^{\lfloor n/2\rfloor}r^{2i}e^{-4iN}+e^{-4nN}\Big)^{1/2}\le\Big(\frac12\sum_{i=1}^\infty r^{2i}e^{-4iN}\Big)^{1/2}\le\Big(\frac{r^2e^{-4N}}{2(1-r^2e^{-4N})}\Big)^{1/2}\le re^{-2N},\tag{4.4}$$

where the last inequality uses $r^2e^{-4N}\le1/2$, which follows from the choice of $N$. Hence, for all $\epsilon\in(0,1)$ and $n\ge2$,

$$\max\{T_{n,TV}(x_n,\epsilon),T^c_{n,TV}(x_n,\epsilon)\}\le\Big\lceil\frac12\log\frac{2r}{\epsilon}\Big\rceil(n+1).$$

Step 2: Bounding the total variation from below: discrete time case. In this step, we treat the discrete time case. Note that $K_n'$ can be interpreted in the following way. First, flip a coin that lands on heads with probability $n/(n+1)$, and move the chain according to $K_n$ if heads appears. If tails shows up, then the chain stays at its current state. Since the coin strongly favors heads, the periodicity of $K_n$ still plays an important role in the evolution of $K_n'$. This suggests that the sets determined by the period are candidates for the test set in the total variation. In the case of Ehrenfest chains, such a set is either the even integers or the odd integers. From the viewpoint of spectral theory, the period of any irreducible reversible finite Markov chain is either 1 or 2. Assuming reversibility, a chain is periodic if and only if $-1$ is an eigenvalue of its transition matrix. Intuitively, the eigenvector associated with $-1$ should provide a good idea for the construction of a test set for the total variation. This is not clear for general chains, but it is quite obvious for the Ehrenfest chain. According to Lemma 1.1, $\psi_{n,n}(x)=(-1)^x$ is an eigenvector of $K_n$ associated with the eigenvalue $-1$, and the sets, {x ∈ Ωn :

odd numbers in $\Omega_n$. Due to the above discussion, we set $A_n=\{i\in\Omega_n: i\text{ is even}\}$ and let $\mathbf{1}_{A_n}$ be the indicator function of $A_n$. Clearly, $2\cdot\mathbf{1}_{A_n}-1=\psi_{n,n}$ and

for $n\ge3$, where the last inequality applies the fact $\log(1-t)\ge-2t$ for $t\in[0,1/2]$. This implies, for $0<\epsilon\le1/(2e^4)$,

$$T_{n,TV}(x_n,\epsilon)\ge\Big\lfloor\frac14\log\frac{1}{2\epsilon}\Big\rfloor(n+1),\qquad\forall\,n\ge3.$$

It is worth noting that the lower bound is independent of the initial state. Along with the upper bound in Step 1, we obtain $T_{n,TV}(x_n,1/(2e^4))\asymp n$ and $T_{n,TV}(x_n,\epsilon)=O_\epsilon(n)$ for all $\epsilon<1/(2e^4)$. Using the last inequality of (4.5), it is easy to see that, for any $c\ge1$ and $n\ge1$,

$$D_{n,TV}(x_n,\lfloor cn\rfloor)\ge D_{n,TV}(x_n,\lfloor2c\rfloor(n+1))\ge\frac12e^{-4\lfloor2c\rfloor}\ge e^{-9c}.$$

By Corollary 2.3, no subfamily of $\mathcal{F}$ has a total variation precutoff.
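The key identity behind Step 2 can be made explicit. Since $\mathbf 1_{A_n}=(1+\psi_{n,n})/2$, $\pi_n(A_n)=1/2$, and $\psi_{n,n}$ is an eigenvector of $K_n'$ with eigenvalue $(1-n)/(n+1)$, one gets $(K_n')^t(x,A_n)-\pi_n(A_n)=\frac12(-1)^x\big(\frac{1-n}{n+1}\big)^t$. Hence $D_{n,TV}(x,t)\ge\frac12\big(\frac{n-1}{n+1}\big)^t$. At $t=N(n+1)$ with $n\ge3$, this is at least $\frac12e^{-4N}$ by $\log(1-s)\ge-2s$ on $[0,1/2]$. The following exact-arithmetic sketch verifies the identity for small $n$, together with the resulting numerical bound. It is ours, for illustration, and its names are hypothetical.

```python
from fractions import Fraction


def lazy_step(n: int, dist: list) -> list:
    """One step of K'_n = I/(n+1) + n K_n/(n+1) applied to a row vector, in exact arithmetic."""
    new = [mass / (n + 1) for mass in dist]
    for y, mass in enumerate(dist):
        if y < n:
            new[y + 1] += mass * Fraction(n - y, n + 1)   # (n/(n+1)) * (n-y)/n
        if y > 0:
            new[y - 1] += mass * Fraction(y, n + 1)       # (n/(n+1)) * y/n
    return new


def even_set_gap(n: int, x: int, t: int) -> Fraction:
    """(K'_n)^t(x, A_n) - pi_n(A_n), where A_n is the set of even states and pi_n(A_n) = 1/2."""
    dist = [Fraction(0)] * (n + 1)
    dist[x] = Fraction(1)
    for _ in range(t):
        dist = lazy_step(n, dist)
    return sum(dist[0::2]) - Fraction(1, 2)
```

For instance, `even_set_gap(5, 0, 3)` returns `Fraction(-4, 27)`, which is $\frac12(-\frac46)^3$.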

Step 3: Bounding the total variation from below: continuous time case. Again, we suppose $|n-2x_n|/\sqrt{n}$ is bounded. It has been shown in Step 1 that $T^c_{n,TV}(x_n,\epsilon)=O_\epsilon(n)$ for all $\epsilon\in(0,1)$. By Corollary 2.3, it suffices to show that

$$\liminf_{n\to\infty}D^c_{n,TV}(x_n,cn)>0,\qquad\forall\,c>0.\tag{4.6}$$

The trick used in Step 2 does not work for the continuous time case since, by writing

$$\exp\{-t(I-K_n)\}=\exp\Big\{-2t\Big[I-\frac{I+K_n}{2}\Big]\Big\},$$

the continuous time Markov chain behaves like the lazy chain, a Markov chain whose transition matrix has diagonal entries at least 1/2. Compared with $K_n'$, $(I+K_n)/2$ evolves according to a fair coin and $K_n$. That is, if the coin lands on heads, then the chain moves according to $K_n$; if the coin lands on tails, then the chain stays at its current state. For lazy chains, the eigenvalues are nonnegative, and the smallest eigenvalue contributes less to the $L^2$-distance and the total variation. Our strategy for the continuous time case is as follows. First, we compare the original discrete time Ehrenfest chain $K_n$ with the simple random walk on $\mathbb{Z}$. Based on the symmetry of the Ehrenfest chain, the comparison generates a lower bound on the total variation distance related to the first passage time discussion in Section 3. This will lead to (4.6).

First, observe that, for any $A\subset\Omega_n$ and $t\ge0$,

$$D^c_{n,TV}(x_n,t)\ge H_{n,t}(x_n,A)-\pi_n(A)=\sum_{i=0}^\infty\Big(\frac{e^{-t}t^i}{i!}\Big)K_n^i(x_n,A)-\pi_n(A).\tag{4.7}$$

By the symmetry of $K_n$ and the boundedness of $|x_n-n/2|/\sqrt{n}$, we may assume without loss of generality that $n/4\le x_n\le n/2$ for all sufficiently large $n$. Moreover, by Remark 2.4, it suffices to deal with the following subcases:

$$\frac{n/2-x_n}{\sqrt{n}}\to a\in[0,\infty),\qquad\text{as }n\to\infty.\tag{4.8}$$

Proposition 4.2. Let $K_n$ be the transition matrix on $\Omega_n$ defined by (4.1). Suppose $\mu_n$ is a probability concentrated on $A=\{0,1,\dots,\lceil n/2\rceil\}$, i.e., $\mu_n(A)=1$. Then $\mu_nK_n^t(A)\ge1/2$ for all $t\ge0$.
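Proposition 4.2 can be checked by brute force for small $n$. The sketch below is ours and its names are hypothetical. It propagates point masses at every $x\in A=\{0,\dots,\lceil n/2\rceil\}$ under $K_n$ in exact rational arithmetic and records the smallest mass left in $A$. By linearity, point masses suffice for arbitrary $\mu_n$ supported on $A$. This is only a finite check over the tested ranges of $n$ and $t$, not a substitute for the proof in the appendix.

```python
from fractions import Fraction


def ehrenfest_step(n: int, dist: list) -> list:
    """One step of K_n from (4.1) applied to a row vector, in exact arithmetic."""
    new = [Fraction(0)] * (n + 1)
    for y, mass in enumerate(dist):
        if mass:
            if y < n:
                new[y + 1] += mass * Fraction(n - y, n)
            if y > 0:
                new[y - 1] += mass * Fraction(y, n)
    return new


def min_mass_in_left_half(n: int, t_max: int) -> Fraction:
    """Minimum over x in A and 0 <= t <= t_max of K_n^t(x, A), with A = {0, ..., ceil(n/2)}."""
    top = (n + 1) // 2            # ceil(n/2)
    worst = Fraction(1)
    for x in range(top + 1):
        dist = [Fraction(0)] * (n + 1)
        dist[x] = Fraction(1)
        for _ in range(t_max + 1):
            worst = min(worst, sum(dist[: top + 1]))
            dist = ehrenfest_step(n, dist)
    return worst
```

For even $n$ the bound $1/2$ is attained already at $t=1$ from $x=n/2$, so the constant in Proposition 4.2 cannot be improved.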

This proposition realizes the intuition that, by the symmetry of Ehrenfest chains, if the initial distribution is concentrated on the left half of $\Omega_n$, then so is the distribution of the chain at all times. See the appendix for a proof of this proposition. Now, let $A=\{0,1,\dots,\lceil n/2\rceil\}$. Clearly, $\pi_n(A)\le1/2+\pi_n(\lceil n/2\rceil)$ and, by Stirling's formula, $\pi_n(\lceil n/2\rceil)\sim(\pi n/2)^{-1/2}$. Let $T$ be the first passage time to state $\lfloor n/2\rfloor$, that is, the first time (including time 0) at which the Ehrenfest chain $K_n$ hits $\lfloor n/2\rfloor$. The irreducibility of $K_n$ implies $P_{x_n}(T<\infty)=1$, and the strong Markov property yields

$$K_n^i(x_n,A)=\sum_{j=0}^iK_n^{i-j}(\lfloor n/2\rfloor,A)P_{x_n}(T=j)+P_{x_n}(T>i)\ge\frac12+\frac12P_{x_n}(T>i).$$

Putting this back into (4.7), we obtain, for all $m\ge0$,

$$D^c_{n,TV}(x_n,t)\ge\frac12\Big(e^{-t}\sum_{i=0}^m\frac{t^i}{i!}\Big)P_{x_n}(T>m)-\pi_n(\lceil n/2\rceil).\tag{4.9}$$

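Next, we use Theorem 3.1 to bound $P_{x_n}(T>m)$ from below. Consider the simple random walk on $\mathbb{Z}$. For $m\ge1$, $k\ge1$ and $i\in\mathbb{Z}$, let $\mathcal{P}(m,k,i)$ be the set of paths of length $m$ starting from 0, ending at $i$ and staying in $\{0,\pm1,\pm2,\dots,\pm(k-1)\}$ up to time $m$. Regarding each such path $w$ as the path $x_n+w$ of the Ehrenfest chain, we clearly have

$$P_{x_n}(T>m)\ge\sum_{i=0}^{\lfloor n/2\rfloor-x_n-1}P_{x_n}\big(\mathcal{P}(m,\lfloor n/2\rfloor-x_n,i)\big).$$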

Let $P'$ be the probability under which the simple random walk on $\mathbb{Z}$ starts from the origin. For any path $w=(w_0,w_1,\dots,w_m)\in\mathcal{P}(m,k,i)$ with $|i|<k$, one may partition the edges $\{(w_j,w_{j+1}):0\le j<m\}$ into two subsets, say $B_1(w)$ and $B_2(w)$, where $B_1(w)=\{(j,j+1):0\le j<i\}$ for $i>0$, $B_1(w)=\{(j,j-1):0\ge j>i\}$ for $i<0$, and $B_2(w)$ is a disjoint union of pairs of the form $\{(j,j+1),(j+1,j)\}$ with $-k<j<k-1$. Note that, for $2x_n-n/2\le j\le n/2$,

$$1-\frac{j}{n}\ge\frac{j}{n}\ge\frac12\Big(\frac{4x_n}{n}-1\Big)=\frac12\Big(1-\frac{2(n-2x_n)}{n}\Big)$$

and

$$\Big(1-\frac{j}{n}\Big)\frac{j+1}{n}\ge\frac14\Big[1-\Big(\frac{n-2j}{n}\Big)^2\Big]\ge\frac14\Big[1-4\Big(\frac{n-2x_n}{n}\Big)^2\Big].$$

This leads to $P_{x_n}(w)\ge c_n(m)P'(w)$ for all $w\in\mathcal{P}(m,\lfloor n/2\rfloor-x_n,i)$ and $2x_n-n/2\le i\le n/2$, where

$$c_n(m)=\Big[1-4\Big(\frac{n-2x_n}{n}\Big)^2\Big]^m\Big(1-\frac{2(n-2x_n)}{n}\Big)^{n/2-x_n}.$$

Let $m=Nn$, where $N$ is any positive integer. Using the notation in (3.1) and applying Theorem 3.1, we obtain

$$P_{x_n}(T>Nn)\ge c_n(Nn)P_0'(T_{\lfloor n/2\rfloor-x_n}>Nn)\ge c_n(Nn)\exp\Big\{-\frac{2Nn}{(\lfloor n/2\rfloor-x_n)^2}\Big\},$$

provided $Nn\ge(\lfloor n/2\rfloor-x_n)^2$. Putting this back into (4.9), we obtain

$$D^c_{n,TV}(x_n,t)\ge\frac12\Big(e^{-t}\sum_{i=0}^{Nn}\frac{t^i}{i!}\Big)c_n(Nn)\exp\Big\{-\frac{2Nn}{(\lfloor n/2\rfloor-x_n)^2}\Big\}-\pi_n(\lfloor n/2\rfloor),$$

if $Nn\ge(\lfloor n/2\rfloor-x_n)^2$. As a consequence of Lemma A.3, if $a>0$ in the setting of (4.8), then

$$\liminf_{n\to\infty}D^c_{n,TV}(x_n,cn)\ge\frac12e^{-(20a^2+2/a^2)N}>0,\qquad\forall\,N>\max\{c,a^2,1\}.$$

By Corollary 2.3, this proves that if $a>0$, then no subfamily of $\mathcal{F}^c$ has a total variation precutoff.

Finally, we deal with the subcase $a=0$. In this case, the last inequality provides only a trivial lower bound on the total variation. To get a useful bound, we rewrite the transition density of $K_n^t$ as follows using Lemma 1.1:

$$\frac{K_n^t(x,y)}{\pi_n(y)}-1=\sum_{i=1}^n\psi_{n,i}(x)\psi_{n,i}(y)\beta_{n,i}^t.$$

See [14, Lemma 1.3.3] for a proof. Applying this identity to $(K_n')^t$ and $H_{n,t}$ gives

$$\frac{(K_n')^t(x,y)}{\pi_n(y)}-1=\sum_{i=1}^n\psi_{n,i}(x)\psi_{n,i}(y)\Big(\frac{1+n\beta_{n,i}}{n+1}\Big)^t\tag{4.10}$$

and

$$\frac{H_{n,t}(x,y)}{\pi_n(y)}-1=\sum_{i=1}^n\psi_{n,i}(x)\psi_{n,i}(y)e^{-t(1-\beta_{n,i})}.\tag{4.11}$$

For $n\ge1$, set

$$\frac{H_{n,t}(x_n,y)}{\pi_n(y)}-1=f_n(t,y)+g_n(t,y),$$

where

$$f_n(t,y)=\psi_{n,2}(x_n)e^{-t(1-\beta_{n,2})}\psi_{n,2}(y),\qquad g_n(t,y)=\sum_{i=1,\,i\ne2}^n\psi_{n,i}(x_n)e^{-t(1-\beta_{n,i})}\psi_{n,i}(y).$$

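As a consequence of the triangle inequality and Jensen's inequality, we obtain

$$2D^c_{n,TV}(x_n,t)=\|f_n(t,\cdot)+g_n(t,\cdot)\|_{L^1(\pi_n)}\ge\|f_n(t,\cdot)\|_{L^1(\pi_n)}-\|g_n(t,\cdot)\|_{L^2(\pi_n)}.$$

By the monotonicity of the total variation distance, it remains to prove that, for all sufficiently large $c>0$,

$$\liminf_{n\to\infty}\Big[\|f_n(cn,\cdot)\|_{L^1(\pi_n)}-\|g_n(cn,\cdot)\|_{L^2(\pi_n)}\Big]>0.$$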

First, observe that

$$\|g_n(t,\cdot)\|_{L^2(\pi_n)}=\Big(\frac{(n-2x_n)^2}{n}e^{-4t/n}+\sum_{i=3}^n|\psi_{n,i}(x_n)|^2e^{-4it/n}\Big)^{1/2}.$$

Recall the following fact developed in Step 1: if $r=1+\sup_n\{|n-2x_n|/\sqrt{n}\}<\infty$, then

$$|\psi_{n,i}(x_n)|\le r^i,\qquad\forall\,0\le i\le n.$$

Putting this back into the $L^2(\pi_n)$-norm of $g_n(t,\cdot)$ yields

$$\|g_n(cn,\cdot)\|_{L^2(\pi_n)}\le\Big(\frac{(n-2x_n)^2}{n}e^{-4c}+\frac{(r^2e^{-4c})^3}{1-r^2e^{-4c}}\Big)^{1/2},$$

provided $r^2<e^{4c}$. Also, it is an easy exercise to compute

$$\psi_{n,2}(x)=\sqrt{\frac{n}{2(n-1)}}\Big[\Big(\frac{n-2x}{\sqrt{n}}\Big)^2-1\Big].$$

This implies $|\psi_{n,2}(x_n)|\sim1/\sqrt2$ and

$$\|\psi_{n,2}\|_{L^1(\pi_n)}\ge\frac12\pi_n\big(\{x:|x-n/2|<\sqrt{n}/4\}\big)\sim\frac{1}{\sqrt{2\pi}}\int_0^{1/2}e^{-u^2/2}\,du\ge\frac{1}{12}.$$

According to the assumption $(n/2-x_n)/\sqrt{n}\to a=0$, if $r^2<e^{4c}$, then

$$\liminf_{n\to\infty}\Big[\|f_n(cn,\cdot)\|_{L^1(\pi_n)}-\|g_n(cn,\cdot)\|_{L^2(\pi_n)}\Big]\ge\frac{e^{-4c}}{12\sqrt2}-\frac{r^3e^{-6c}}{\sqrt{1-r^2e^{-4c}}}=e^{-4c}\Big(\frac{1}{12\sqrt2}-\frac{r^3e^{-2c}}{\sqrt{1-r^2e^{-4c}}}\Big)>0$$

for $c$ large enough. By the monotonicity of the total variation distance, we have

$$\liminf_{n\to\infty}D^c_{n,TV}(x_n,cn)>0,\qquad\forall\,c>0.$$

By Corollary 2.3, no subfamily of $\mathcal{F}^c$ has a total variation precutoff when $a=0$. This finishes the proof.

Remark 4.3. In the proof of Theorem 4.1, it has been shown that, if $|x_n-n/2|/\sqrt{n}$ is bounded, then no subfamily of $\mathcal{F}$ or $\mathcal{F}^c$ presents a total variation precutoff, and the total variation mixing time is of order $n$.

Remark 4.4. In Step 3, the method for a = 0 is also valid for a > 0 if one replaces $f_n(t,\cdot)$ with $\psi_{n,1}(x_n)e^{-t(1-\beta_{n,1})}\psi_{n,1}$ and $g_n(t,\cdot)$ with $H_{n,t}(x_n,\cdot)/\pi_n - 1 - f_n(t,\cdot)$.

The proof for a > 0 also works for the discrete time case.

5. The $L^p$-cutoff of Ehrenfest chains

This section is devoted to the $L^p$-cutoff of Ehrenfest chains with $p\in(1,\infty)$. To bound the $L^p$-distance, we have to select suitable test functions; operator theory and the spectral information, in particular the eigenfunctions, suggest good choices. The main theorem is as follows.

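Theorem 5.1. Let $\mathcal F$ and $\mathcal F_c$ be the families in Theorem 4.1. For $p\in(1,\infty)$, the following are equivalent.

(1) $\mathcal F$ (resp. $\mathcal F_c$) has an $L^p$-precutoff.

(2) $\mathcal F$ (resp. $\mathcal F_c$) has an $L^p$-cutoff.

(3) $|x_n-n/2|/\sqrt n\to\infty$.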

Moreover, if (3) holds, then both $\mathcal F$ and $\mathcal F_c$ have a $(t_n,n)$ $L^p$-cutoff with

$$t_n = \frac n2\log\frac{|n-2x_n|}{\sqrt n}.$$
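Theorem 5.1 can be illustrated numerically for the continuous-time family in the extreme case $x_n = 0$, where (3) holds because $|n-2x_n|/\sqrt n = \sqrt n$, so that $t_n = \frac n4\log n$. The sketch assumes the standard hypercube description of the Ehrenfest urn, which is not stated in this section: under $H_{n,t}$ each of the $n$ coordinates flips independently at rate $1/n$, so $H_{n,t}(0,\cdot)$ is Binomial$(n,s)$ with $s = (1-e^{-2t/n})/2$, while $\pi_n$ is Binomial$(n,1/2)$. The function names are illustrative. For $p = 2$ the result can be cross-checked against $(1+e^{-4t/n})^n - 1$, which is what (4.11) gives if $\psi_{n,i}(0)^2 = \binom ni$, as holds for normalized Krawtchouk polynomials.

```python
import math


def lp_distance_from_zero(n, t, p):
    """D^c_{n,p}(0, t) = ||H_{n,t}(0, .)/pi_n - 1||_{L^p(pi_n)} for the Ehrenfest chain.

    Assumes H_{n,t}(0, .) = Binomial(n, s) with s = (1 - e^{-2t/n}) / 2 and t > 0.
    """
    s = (1 - math.exp(-2 * t / n)) / 2
    log_up, log_down = math.log(2 * s), math.log(2 * (1 - s))
    total = 0.0
    for y in range(n + 1):
        # H_{n,t}(0, y) / pi_n(y) = (2s)^y (2(1-s))^(n-y); the binomial coefficients cancel.
        ratio = math.exp(y * log_up + (n - y) * log_down)
        total += abs(ratio - 1) ** p * math.comb(n, y) / 2 ** n
    return total ** (1 / p)


def cutoff_profile(n, p, shifts):
    """Values of D^c_{n,p}(0, t_n + a n) for a in `shifts`, with t_n = (n/4) log n."""
    t_n = n / 4 * math.log(n)
    return [lp_distance_from_zero(n, t_n + a * n, p) for a in shifts]


shifts = [-1, 0, 1, 2, 3]
for p in (1.5, 2, 3):
    print(p, ["%.3g" % d for d in cutoff_profile(400, p, shifts)])
```

With $n = 400$, each printed profile falls from very large values at $a = -1$ to small values at $a = 3$, the $(t_n,n)$ shape described by the theorem; a single $n$ illustrates the statement but of course does not prove it.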

Proof. In this proof, we write $\|f\|_p$ for the $L^p(\pi)$-norm of f. The implication (2)⇒(1) follows immediately from Definition 2.1 for all $1<p<\infty$. For (3)⇒(2) and the $(t_n,n)$ $L^p$-cutoff, we set

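$$\overline F_p(a) = \limsup_{n\to\infty}D_{n,p}(x_n,t_n+an), \quad \underline F_p(a) = \liminf_{n\to\infty}D_{n,p}(x_n,t_n+an)$$

and

$$\overline G_p(a) = \limsup_{n\to\infty}D^c_{n,p}(x_n,t_n+an), \quad \underline G_p(a) = \liminf_{n\to\infty}D^c_{n,p}(x_n,t_n+an).$$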

We consider two cases: $p\in(1,2]$ and $p\in(2,\infty)$.

Case 1 ($1<p\le2$). For p = 2, (2) and (3) have been proved equivalent in [7]. In detail, by Theorems 6.3-6.5 in [7] and their proofs, there are positive constants A, N such that, for $n\ge N$,

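$$\max\{D_{n,2}(x_n,t_n+an),\, D^c_{n,2}(x_n,t_n+an)\} \le Ae^{-2a} + o(1)$$

and

$$\min\{D_{n,2}(x_n,t_n+an),\, D^c_{n,2}(x_n,t_n+an)\} \ge e^{-2a} + o(1).$$

This implies

$$\max\{\overline F_2(a), \overline G_2(a)\} \le Ae^{-2a}, \quad \min\{\underline F_2(a), \underline G_2(a)\} \ge e^{-2a}, \quad \forall a\in\mathbb R. \tag{5.1}$$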

Note that the $L^r$-distance is bounded above by the $L^s$-distance for $1\le r<s\le\infty$. Using the first inequality of (5.1), we obtain, for $p\in(1,2)$,

$$\max\{\overline F_p(a), \overline G_p(a)\} \le Ae^{-2a} \to 0, \quad \text{as } a\to\infty.$$

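To get a lower bound, consider the test function $\psi_{n,1}$. Set $q = (1-1/p)^{-1}$, the conjugate exponent of p. A simple application of the central limit theorem yields

$$\|\psi_{n,1}\|_q = \left(\sum_{x=0}^{n}\left(\frac{|n-2x|}{\sqrt n}\right)^q\pi_n(x)\right)^{1/q} \to C_q := [\mathbb E(|X|^q)]^{1/q},$$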

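where X is a standard normal random variable and $\mathbb E$ denotes the expectation. It is a simple exercise to show that

$$C_q = \left(\sqrt{\frac{2^q}{\pi}}\,\Gamma\left(\frac{q+1}{2}\right)\right)^{1/q} < \infty, \quad \forall q\in(1,\infty),$$

where Γ is the Gamma function defined by $\Gamma(z) = \int_0^\infty e^{-t}t^{z-1}\,dt$. As a consequence of (4.10)-(4.11) and Hölder's inequality, we have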

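$$\underline F_p(a) \ge \liminf_{n\to\infty}\frac{|\langle (K_n')^{t_n+an}(x_n,\cdot)/\pi_n - 1, \psi_{n,1}\rangle_{\pi_n}|}{\|\psi_{n,1}\|_q} = \frac{e^{-2a}}{C_q}$$

and

$$\underline G_p(a) \ge \liminf_{n\to\infty}\frac{|\langle H_{n,t_n+an}(x_n,\cdot)/\pi_n - 1, \psi_{n,1}\rangle_{\pi_n}|}{\|\psi_{n,1}\|_q} = \frac{e^{-2a}}{C_q}.$$

Obviously, $\min\{\underline F_p(a), \underline G_p(a)\}\to\infty$ as $a\to-\infty$. This proves the desired $(t_n,n)$ $L^p$-cutoff.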

Case 2 ($2<p<\infty$). Using the second inequality of (5.1), it is easy to see that

$$\min\{\underline F_p(a), \underline G_p(a)\} \ge e^{-2a} \to \infty, \quad \text{as } a\to-\infty.$$

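To get an upper bound, we apply the fact $\psi_{n,n-i}(x) = (-1)^x\psi_{n,i}(x)$ to the right sides of (4.10)-(4.11) and get

$$D_{n,p}(x_n,t) \le 2\sum_{i=1}^{\lceil n/2\rceil}|\psi_{n,i}(x_n)|\,\|\psi_{n,i}\|_p\left(1-\frac{2i}{n+1}\right)^t + \left(1-\frac{2}{n+1}\right)^t \le 2d_p(n,t)$$

and

$$D^c_{n,p}(x_n,t) \le 2\sum_{i=1}^{\lceil n/2\rceil}|\psi_{n,i}(x_n)|\,\|\psi_{n,i}\|_pe^{-2it/n} + \left(1-\frac{2}{n+1}\right)^t \le 2d_p(n,t),$$

where

$$d_p(n,t) = \sum_{i=1}^{\lceil n/2\rceil}|\psi_{n,i}(x_n)|\,\|\psi_{n,i}\|_pe^{-2it/(n+1)} + e^{-2t/(n+1)}.$$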

To bound $d_p(n,t)$ from above, one has to compute the $L^p$-norm of $\psi_{n,i}$. This can be very complicated from its definition but, surprisingly, the identity in (4.3) is sufficient to give a reasonable upper bound. In detail, one may derive from (4.3) that, for $i\le n/2$,

$$|\psi_{n,i+1}(x)| \le \left(\sqrt{\frac{2}{i+1}}\times\frac{|n-2x|}{\sqrt n}\right)|\psi_{n,i}(x)| + |\psi_{n,i-1}(x)|.$$

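Along with the initial conditions $\psi_{n,0}\equiv1$ and $\psi_{n,1}(x) = (n-2x)/\sqrt n$, an inductive argument yields

$$|\psi_{n,i}(x)| \le \sqrt{\frac{2^i}{i!}}\prod_{j=1}^{i}\left(|\psi_{n,1}(x)| + \sqrt{\frac j2}\right), \quad \forall x\in\Omega_n,\ i\le n/2. \tag{5.2}$$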
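The one-step inequality and the bound (5.2) can be spot-checked numerically. The sketch below assumes, consistently with $\psi_{n,1}(x) = (n-2x)/\sqrt n$, that $\psi_{n,i}$ are Krawtchouk polynomials normalized in $L^2(\pi_n)$, computed from their three-term recurrence in exact integer arithmetic; the function names are hypothetical. It reports the largest ratio of $|\psi_{n,i}(x)|$ to the right side of (5.2) over $0\le i\le n/2$, $x\in\Omega_n$ (which should not exceed 1), and the largest scaled violation of the one-step inequality (which should not exceed 0 beyond round-off).

```python
import math


def krawtchouk_psi(n):
    """psi[i][x] = psi_{n,i}(x) for 0 <= i, x <= n, orthonormal in L^2(pi_n)."""
    # (i+1) K_{i+1}(x) = (n-2x) K_i(x) - (n-i+1) K_{i-1}(x), exact in integers;
    # generating function (1-z)^x (1+z)^(n-x), so that psi_{n,1}(x) = (n-2x)/sqrt(n).
    K = [[1] * (n + 1), [n - 2 * x for x in range(n + 1)]]
    for i in range(1, n):
        K.append([((n - 2 * x) * K[i][x] - (n - i + 1) * K[i - 1][x]) // (i + 1)
                  for x in range(n + 1)])
    return [[K[i][x] / math.sqrt(math.comb(n, i)) for x in range(n + 1)]
            for i in range(n + 1)]


def rhs_52(n, i, x):
    """Right side of (5.2): sqrt(2^i / i!) * prod_{j=1}^{i} (|psi_{n,1}(x)| + sqrt(j/2))."""
    a = abs(n - 2 * x) / math.sqrt(n)
    prod = 1.0
    for j in range(1, i + 1):
        prod *= a + math.sqrt(j / 2)
    return math.sqrt(2 ** i / math.factorial(i)) * prod


def check(n):
    """Return (max |psi_{n,i}(x)| / rhs_52 over i <= n/2, max scaled violation of the
    one-step inequality); expect the first <= 1 and the second <= 0 up to round-off."""
    psi = krawtchouk_psi(n)
    ratio = max(abs(psi[i][x]) / rhs_52(n, i, x)
                for i in range(n // 2 + 1) for x in range(n + 1))
    violation = -math.inf
    for i in range(1, n // 2 + 1):
        for x in range(n + 1):
            # One-step bound: |psi_{i+1}| <= sqrt(2/(i+1)) |psi_1| |psi_i| + |psi_{i-1}|.
            bound = (math.sqrt(2 / (i + 1)) * abs(psi[1][x]) * abs(psi[i][x])
                     + abs(psi[i - 1][x]))
            violation = max(violation, (abs(psi[i + 1][x]) - bound) / (1 + bound))
    return ratio, violation


for n in (10, 31, 60):
    print(n, check(n))
```

The ratio equals 1 at $i = 0$; the violation is divided by $1+\text{bound}$ because the one-step inequality can hold with equality at $i = n/2$, where rounding errors scale with the size of $\psi_{n,i}(x)$.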

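For convenience, write $i! = \alpha_ii^{i+1/2}e^{-i}$. By Stirling's formula, $\alpha_i\to\sqrt{2\pi}$ as $i\to\infty$. Thus, we may choose $\beta>1$ such that $\beta^{-1}\le\alpha_i\le\beta$ for all $i\ge1$. This implies

$$\frac{i^{i+1/2}e^{-i}}{\beta} \le i! \le \beta i^{i+1/2}e^{-i}, \quad \forall i\ge1. \tag{5.3}$$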
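The constant $\beta$ in (5.3) can be made explicit. The sketch below computes $\alpha_i$ for $1\le i\le1000$ and indicates that $\alpha_i$ decreases from $\alpha_1 = e$ towards $\sqrt{2\pi}$, in line with Robbins' refinement of Stirling's formula, $\sqrt{2\pi}e^{1/(12i+1)} < \alpha_i < \sqrt{2\pi}e^{1/(12i)}$. Hence any $\beta\ge e$ works; the value $\beta = 2.72$ below is an illustrative choice, not one made in the text, and a finite check is of course not a proof.

```python
import math


def alpha(i):
    """alpha_i = i! / (i^(i + 1/2) e^(-i)), computed in log space to avoid overflow."""
    return math.exp(math.lgamma(i + 1) - (i + 0.5) * math.log(i) + i)


alphas = [alpha(i) for i in range(1, 1001)]
beta = 2.72          # illustrative choice; any beta >= e = alpha_1 works
root = math.sqrt(2 * math.pi)

print(alphas[0], alphas[-1], root)
print(all(1 / beta <= a <= beta for a in alphas))           # (5.3) holds with this beta
print(all(a > b for a, b in zip(alphas, alphas[1:])))       # alpha_i strictly decreasing
# Robbins' bounds, checked for i <= 100 where float round-off is negligible:
print(all(root * math.exp(1 / (12 * i + 1)) < alphas[i - 1] < root * math.exp(1 / (12 * i))
          for i in range(1, 101)))
```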

In this setting, (5.2) gives

$$|\psi_{n,i}(x)| \le (2e)^{i/2}i^{-1/4}\beta^{1/2}\left(|\psi_{n,1}(x)|\,i^{-1/2}+1\right)^i \tag{5.4}$$

and, then, the $L^p$-norm of $\psi_{n,i}$ is bounded above as follows.

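$$\|\psi_{n,i}\|_p^p \le (2e)^{pi/2}i^{-p/4}\beta^{p/2}\,\pi_n\left[\left(|\psi_{n,1}|\,i^{-1/2}+1\right)^{pi}\right] \le (2e)^{pi/2}i^{-p/4}\beta^{p/2}2^{pi}\left[i^{-pi/2}\pi_n\left(|\psi_{n,1}|^{pi}\right)+1\right],$$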

where the last inequality uses the fact $(s+t)^r\le2^{r-1}(s^r+t^r)$ for any $s>0$, $t>0$ and $r\ge1$. It is worth noting that, for fixed i, the central limit theorem implies that $\pi_n(|\psi_{n,1}|^{pi})$ converges to the expectation of $|X|^{pi}$, where X is a standard normal random variable. To estimate such a convergence for all $1\le i\le n$, one may consider the convergence rate of the central limit theorem, but this can be very complicated. Instead, we give a direct computation in Lemma A.4, which says that there exists a constant C > 1 such that

$$\pi_n(|\psi_{n,1}|^{pi}) \le C4^{pi}\Gamma\left(\frac{pi+1}{2}\right).$$

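As a consequence of the identity $\Gamma(t+1) = t\Gamma(t)$,

$$\Gamma\left(\frac{pi+1}{2}\right) \le 2\prod_{j=1}^{\lfloor(pi-1)/2\rfloor}\frac{pi-2j+1}{2} \le pi\times\left(\left\lceil\frac{pi-3}{2}\right\rceil!\right) \le 5\beta pi\left[\frac{pi}{2e}\right]^{pi/2}.$$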

For $p\ge2$, the above inequalities give

$$\|\psi_{n,i}\|_p \le \left((2e)^{pi/2}i^{-p/4}\beta^{p/2}2^{pi}\left\{20\beta C4^{pi}(pi)\left[\frac{p}{2e}\right]^{pi/2}\right\}\right)^{1/p} \le 10\beta Ci^{1/4}(8p)^i.$$

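Plugging the last bound and (5.4) back into $d_p(n,t)$, we obtain

$$d_p(n,t) \le 10\beta^2C\sum_{i=1}^{\lceil n/2\rceil}(20p)^i\left(|\psi_{n,1}(x_n)|+1\right)^ie^{-2it/(n+1)} + e^{-2t/(n+1)}. \tag{5.5}$$

Recall that $t_n = \frac n2\log\frac{|n-2x_n|}{\sqrt n} = \frac n2\log|\psi_{n,1}(x_n)|$. Clearly, for a > 1,

$$t_n+an \ge \frac{n+1}{2}\log|\psi_{n,1}(x_n)| + (a-1)n \ge \frac{n+1}{2}\log|\psi_{n,1}(x_n)| + \frac{n+1}{2}(a-1).$$

This implies

$$d_p(n,t_n+an) \le 10\beta^2C\sum_{i=1}^{\lceil n/2\rceil}\left(\frac{20p}{e^{a-1}}\times\frac{|\psi_{n,1}(x_n)|+1}{|\psi_{n,1}(x_n)|}\right)^i + \frac{e^{1-a}}{|\psi_{n,1}(x_n)|}.$$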

Under assumption (3), that is, $|\psi_{n,1}(x_n)|\to\infty$, if $e^{a-1} > 20p$, then

$$\max\{\overline F_p(a), \overline G_p(a)\} \le 2\limsup_{n\to\infty}d_p(n,t_n+an) \le 20\beta^2C\sum_{i=1}^{\infty}(20pe^{1-a})^i = \frac{400\beta^2Cpe^{1-a}}{1-20pe^{1-a}}.$$

Obviously, the last term converges to 0 as a tends to infinity. This proves the $(t_n,n)$ $L^p$-cutoff of $\mathcal F$ and $\mathcal F_c$ with $2<p<\infty$.

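For (1)⇒(3), it suffices to show that if $|x_n-n/2|/\sqrt n$ is bounded (which holds along a subsequence whenever (3) fails), then no subfamily of $\mathcal F$ and $\mathcal F_c$ has an $L^p$-precutoff. Set $M = \sup_{n\ge1}\{|2x_n-n|/\sqrt n\} + 1$.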

By (5.5), we have, for p > 2 and $e^a > 20Mp$,

$$\max\{D_{n,p}(x_n,\lceil an\rceil),\, D^c_{n,p}(x_n,an)\} \le 2d_p(n,an) \le 20\beta^2C\sum_{i=1}^{\infty}(20Mpe^{-a})^i + 2e^{-a} = \frac{400M\beta^2Cpe^{-a}}{1-20Mpe^{-a}} + 2e^{-a}.$$

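Again, the right side converges to 0 as a tends to infinity. Since the $L^p$-distance is nondecreasing in p, this implies that, for all ϵ > 0 and p < ∞, the $L^p$-mixing times of $\mathcal F$ and $\mathcal F_c$ starting from $x_n$ are $O(n)$.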

Also, by Remark 4.3 and Corollary 2.3, we have

$$\liminf_{n\to\infty}\min\{D_{n,TV}(x_n,cn),\, D^c_{n,TV}(x_n,cn)\} > 0, \quad \forall c>0.$$

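Since the $L^p$-distance dominates the $L^1$-distance, which is twice the total variation distance, this yields, for p > 1,

$$\liminf_{n\to\infty}\min\{D_{n,p}(x_n,cn),\, D^c_{n,p}(x_n,cn)\} > 0, \quad \forall c>0.$$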

Consequently, for $1<p<\infty$, no subfamily of $\mathcal F$ and $\mathcal F_c$ has an $L^p$-precutoff. This finishes the proof.

Remark 5.1. It is worth noting that if $|n-2x_n|/\sqrt n$ is bounded, then the $L^p$-mixing time of the Ehrenfest chains in (4.2) with $p\in[1,\infty)$ is of order n.
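For the bounded case, the $L^2$ part of Remark 5.1 can be made concrete at $x_n = n/2$ with n even. Assuming $\psi_{n,i}$ are the normalized Krawtchouk polynomials (generating function $(1-z)^x(1+z)^{n-x}$), one has $\psi_{n,i}(n/2) = 0$ for odd i and $\psi_{n,2k}(n/2)^2 = \binom{n/2}{k}^2/\binom{n}{2k}$, so (4.11) gives

$$D^c_{n,2}(n/2,cn)^2 = \sum_{k=1}^{n/2}\frac{\binom{n/2}{k}^2}{\binom{n}{2k}}e^{-8kc} \longrightarrow \frac{1}{\sqrt{1-e^{-8c}}} - 1 \quad (n\to\infty).$$

The limit depends on c only, so the $L^2$-distance at time $cn$ has a nondegenerate profile and no abrupt drop occurs on any scale other than n. The sketch below (hypothetical helper names) evaluates the finite sum and compares it with this limit; it covers $p = 2$ only and is an illustration, not a proof.

```python
import math


def log_comb(a, b):
    """Logarithm of the binomial coefficient C(a, b)."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)


def d2_squared_center(n, c):
    """D^c_{n,2}(n/2, cn)^2 for even n, from the spectral expansion (4.11)."""
    m = n // 2
    return math.fsum(math.exp(2 * log_comb(m, k) - log_comb(n, 2 * k) - 8 * k * c)
                     for k in range(1, m + 1))


def limit_profile(c):
    """Large-n limit 1/sqrt(1 - e^{-8c}) - 1 of d2_squared_center(n, c)."""
    return 1 / math.sqrt(1 - math.exp(-8 * c)) - 1


for c in (0.25, 0.5, 1.0):
    print(c, [round(d2_squared_center(n, c), 6) for n in (100, 1000, 10000)],
          round(limit_profile(c), 6))
```

The limit uses $\sum_{k\ge0}\binom{2k}{k}(x/4)^k = (1-x)^{-1/2}$ with $x = e^{-8c}$; the finite-n values approach it from above.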

Remark 5.2. For p = ∞, the equivalence in Theorem 5.1 may fail. Suppose n is even and $x_n = n/2$, and consider the separation distance, which is closely related to the $L^\infty$-distance and is defined by

$$D_{n,\mathrm{sep}}(x,t) = \max_y\left\{1-\frac{(K_n')^t(x,y)}{\pi_n(y)}\right\}, \quad D^c_{n,\mathrm{sep}}(x,t) = \max_y\left\{1-\frac{H_{n,t}(x,y)}{\pi_n(y)}\right\}.$$

For $n\ge1$, let $L_n$ be a Markov kernel on $\{0,1,...,n/2\}$ given by

$$L_n(i,i) = 0,\ \forall\, 0\le i\le n/2; \qquad L_n(i,i+1) = 1-\frac in,\ \forall\, 0\le i<n/2;$$

$$L_n(i+1,i) = \frac{i+1}{n},\ \forall\, 0\le i<n/2-1; \qquad L_n(n/2,n/2-1) = 1.$$

Clearly, $L_n$ is obtained from $K_n$ by collapsing the states $\{i, n-i\}$, and it has

$$\tilde\pi_n(i) = 2^{1-n}\binom ni\ \text{ for } i<n/2, \qquad \tilde\pi_n(n/2) = 2^{-n}\binom{n}{n/2}$$

as its stationary distribution. Let $\tilde D_{n,\mathrm{sep}}(x,t)$ and $\tilde D^c_{n,\mathrm{sep}}(x,t)$ be, respectively, the separation distances of $(L_n')^t$ and $e^{-t(I-L_n)}$ from $\tilde\pi_n$, where $L_n' = (I+nL_n)/(n+1)$. Then

$$D_{n,\mathrm{sep}}(n/2,t) = \tilde D_{n,\mathrm{sep}}(n/2,t), \qquad D^c_{n,\mathrm{sep}}(n/2,t) = \tilde D^c_{n,\mathrm{sep}}(n/2,t).$$

In fact, the above identities also hold for the $L^p$-distance with $1\le p\le\infty$. In [9], the authors consider discrete time monotone birth-and-death chains (a condition not satisfied by $L_n'$) and continuous time birth-and-death chains without any constraint.

It is an easy exercise to check that $I-L_n$ has eigenvalues $4i/n$ with eigenvectors $\phi_{n,i}$ given by $\phi_{n,i}(x) = \psi_{n,2i}(x)$ for $0\le i\le n/2$. Clearly, the spectral gap of $L_n$ is $\lambda_n = 4/n$. Set

$$t_n = \sum_{i=1}^{n/2}\frac{n}{4i} = \frac{n\log n}{4} + O(n).$$

As a consequence of [9, Theorems 5.1-6.1], the family $\mathcal F_c$ in Theorem 4.1 has a $(\frac14n\log n, n)$ separation cutoff. However, according to Theorem 5.1 and Remark 5.1, $\mathcal F_c$ has no $L^p$-precutoff and the exact order of the $L^p$-mixing time is n.
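The claims about $L_n$ in Remark 5.2 can be verified exactly for small even n. The sketch below (hypothetical helper names) builds $L_n$ with rational arithmetic, checks that $\tilde\pi_n$ is stationary, and checks the eigenrelation $L_n\phi = (1-4i/n)\phi$ for $\phi(x) = K_{2i}(x)$, the unnormalized Krawtchouk polynomial with generating function $(1-z)^x(1+z)^{n-x}$; identifying $\psi_{n,2i}$ with its normalization is an assumption consistent with this section. It also evaluates $t_n$ and compares $(t_n-\frac n4\log n)/n$ with $(\gamma-\log2)/4$, the constant predicted by $\sum_{i\le m}1/i = \log m + \gamma + o(1)$, where γ is Euler's constant.

```python
import math
from fractions import Fraction


def lumped_kernel(n):
    """L_n on {0, ..., n/2} for even n, as defined in Remark 5.2."""
    h = n // 2
    L = [[Fraction(0)] * (h + 1) for _ in range(h + 1)]
    for i in range(h):
        L[i][i + 1] = 1 - Fraction(i, n)
    for i in range(h - 1):
        L[i + 1][i] = Fraction(i + 1, n)
    L[h][h - 1] = Fraction(1)
    return L


def krawtchouk_int(n):
    """Integer Krawtchouk polynomials K[i][x], generating function (1-z)^x (1+z)^(n-x)."""
    K = [[1] * (n + 1), [n - 2 * x for x in range(n + 1)]]
    for i in range(1, n):
        K.append([((n - 2 * x) * K[i][x] - (n - i + 1) * K[i - 1][x]) // (i + 1)
                  for x in range(n + 1)])
    return K


def check_lumped(n):
    """Exact checks: L_n is stochastic, tilde-pi_n is stationary, K_{2i} are eigenvectors."""
    h = n // 2
    L = lumped_kernel(n)
    pi = [Fraction(2 * math.comb(n, i), 2 ** n) for i in range(h)]
    pi.append(Fraction(math.comb(n, h), 2 ** n))
    stationary = all(sum(pi[x] * L[x][y] for x in range(h + 1)) == pi[y]
                     for y in range(h + 1))
    K = krawtchouk_int(n)
    eigen = all(sum(L[x][y] * K[2 * i][y] for y in range(h + 1))
                == (1 - Fraction(4 * i, n)) * K[2 * i][x]
                for i in range(h + 1) for x in range(h + 1))
    return stationary and eigen and sum(pi) == 1 and all(sum(row) == 1 for row in L)


def t_n(n):
    """t_n = sum_{i=1}^{n/2} n / (4i), the sum of the reciprocal eigenvalues of I - L_n."""
    return math.fsum(n / (4 * i) for i in range(1, n // 2 + 1))


EULER_GAMMA = 0.5772156649015329
print(all(check_lumped(n) for n in (2, 4, 10, 20)))
n = 10 ** 6
print((t_n(n) - n * math.log(n) / 4) / n, (EULER_GAMMA - math.log(2)) / 4)
```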

Remark 5.3. No universal criterion for the total variation cutoff or precutoff is known, except for specific chains such as lazy birth-and-death chains. Concerning the maximum total variation distance and the related mixing time, define

$$D_{TV}(t) = \max_{x\in\Omega}D_{TV}(x,t), \qquad T_{TV}(\epsilon) = \min\{t\ge0: D_{TV}(t)\le\epsilon\},$$

and call the cutoff in the above distance the maximum total variation cutoff. The authors of [10] prove that a family of lazy birth-and-death chains on $\Omega_n = \{0,1,...,n\}$ has a maximum total variation cutoff if and only if

$$\lim_{n\to\infty}\lambda_nT_{n,TV}(\epsilon) = \infty$$

for some $\epsilon\in(0,1)$, where $1-\lambda_n$ is the second largest eigenvalue of the transition matrix on $\Omega_n$. This criterion was proposed by Peres during the ARCC workshop held by AIM in Palo Alto in December 2004. Under the assumption of reversibility, it has been shown to be true in [6] for the max-$L^p$ distance with $1<p<\infty$, but disproved in [5] for p = 1 using an idea of Aldous. However, it is unclear whether any of the above results holds when the initial states or distributions of a family of ergodic Markov chains are specified. As a consequence of Theorem 4.1, Lemma 1.1 and Remark 4.3, the family in Theorem 4.1 has a total variation cutoff (and also a total variation precutoff) if and only if

$$\lim_{n\to\infty}\lambda_nT_{n,TV}(x_n,\epsilon) = \infty$$

for some $\epsilon\in(0,1)$. This provides an example that is consistent with Peres' conjecture.

Appendix A. Techniques and proofs

We consider Proposition 4.2 in a more general setting.

Lemma A.1. Let K be the transition matrix of a periodic birth-and-death chain on $\Omega = \{0,1,...,m\}$ with birth rate $p_i$ and death rate $q_i = 1-p_i$. That is,

$$K(i,i+1) = p_i, \quad K(i,i-1) = q_i = 1-p_i, \quad \forall\, 0\le i\le m,$$

with the convention $p_m = q_0 = 0$. Let $l = \lfloor m/2\rfloor$ and let µ be a probability on Ω.

Suppose that, for any $i\ge0$,

$$\mu(l-2i)\ge\mu(l+2i+2)\ge\mu(l-2i-2), \quad p_{l+2i}\ge q_{l-2i}\ge p_{l+2i+2}, \tag{A.1}$$

and

$$p_{l+2i}+q_{l+2i+2} \ge p_{l-2i-2}+q_{l-2i} \ge p_{l+2i+2}+q_{l+2i+4}. \tag{A.2}$$

Then, for all $i\ge0$,

$$\mu K(l+2i+1)\ge\mu K(l-2i-1)\ge\mu K(l+2i+3).$$

Proof. By the periodicity of K,

$$\mu K(j) = \mu(j-1)p_{j-1} + \mu(j+1)q_{j+1}, \quad \forall\, 0\le j\le m,$$

where

$$\mu(-1) = \mu(m+1) = p_{-1} = q_{m+1} = 0. \tag{A.3}$$

It is easy to check that both (A.1) and (A.2) hold under the extension in (A.3). If $i\le(l-1)/2$, then $l+2i+1\le2l\le m$ and

$$\begin{aligned}\mu K(l+2i+1)-\mu K(l-2i-1) &= [\mu(l+2i)p_{l+2i} + \mu(l+2i+2)q_{l+2i+2}] - [\mu(l-2i)q_{l-2i} + \mu(l-2i-2)p_{l-2i-2}]\\ &\ge \mu(l-2i)(p_{l+2i}-q_{l-2i}) + \mu(l+2i+2)(q_{l+2i+2}-p_{l-2i-2})\\ &\ge \mu(l+2i+2)(p_{l+2i}-q_{l-2i}+q_{l+2i+2}-p_{l-2i-2}) \ge 0.\end{aligned}$$

If $l+2i+3\le m$, then $l-2i-1\ge2l+2-m\ge1$ and

$$\begin{aligned}\mu K(l-2i-1)-\mu K(l+2i+3) &= [\mu(l-2i)q_{l-2i} + \mu(l-2i-2)p_{l-2i-2}] - [\mu(l+2i+2)p_{l+2i+2} + \mu(l+2i+4)q_{l+2i+4}]\\ &\ge \mu(l+2i+2)(q_{l-2i}-p_{l+2i+2}) + \mu(l-2i-2)(p_{l-2i-2}-q_{l+2i+4})\\ &\ge \mu(l-2i-2)(q_{l-2i}-p_{l+2i+2}+p_{l-2i-2}-q_{l+2i+4}) \ge 0.\end{aligned}$$

This finishes the proof.

Remark A.1. Lemma A.1 also holds for the case that m is even and l = m/2− 1.

The proof goes similarly and is omitted.

The following is a simple corollary of Lemma A.1.

Corollary A.2. Let K be the transition matrix on $\Omega = \{0,1,...,m\}$ given by

$$K(i,i+1) = p_i, \quad K(i,i-1) = q_i = 1-p_i, \quad \forall\, 0\le i\le m,$$

where $p_m = q_0 = 0$, and let µ be a probability on Ω. Suppose that

$$p_i = q_{m-i}, \quad p_i\ge p_{i+1}, \quad \forall i\ge0,$$

and

$$p_i+q_{i+2} \le p_{i+1}+q_{i+3}, \quad \forall\, 0\le i\le\lfloor m/2\rfloor-2.$$

(1) If m = 2l and $\mu(l+2i)\ge\mu(l-2i-2)\ge\mu(l+2i+2)$ for all $i\ge0$, then, for all $i\ge0$ and $t\in\{0,1,2,...\}$,

$$\mu K^{2t+1}(l-2i-1)\ge\mu K^{2t+1}(l+2i+1)\ge\mu K^{2t+1}(l-2i-3)$$

and

$$\mu K^{2t}(l+2i)\ge\mu K^{2t}(l-2i-2)\ge\mu K^{2t}(l+2i+2).$$

(2) If m = 2l and $\mu(l-2i-1)\ge\mu(l+2i+1)\ge\mu(l-2i-3)$ for all $i\ge0$, then, for all $i\ge0$ and $t\in\{0,1,2,...\}$,

$$\mu K^{2t+1}(l+2i)\ge\mu K^{2t+1}(l-2i-2)\ge\mu K^{2t+1}(l+2i+2)$$

and

$$\mu K^{2t}(l-2i-1)\ge\mu K^{2t}(l+2i+1)\ge\mu K^{2t}(l-2i-3).$$

(3) If m = 2l+1 and $\mu(l-2i)\ge\mu(l+2i+2)\ge\mu(l-2i-2)$ for all $i\ge0$, then, for all $i\ge0$ and $t\in\{0,1,2,...\}$,

$$\mu K^{2t+1}(l+2i+1)\ge\mu K^{2t+1}(l-2i-1)\ge\mu K^{2t+1}(l+2i+3)$$

and

$$\mu K^{2t}(l-2i)\ge\mu K^{2t}(l+2i+2)\ge\mu K^{2t}(l-2i-2).$$
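Part (1) of Corollary A.2 can be checked exactly on the Ehrenfest kernel $K_n$ with $n = 2l$ even and $\mu = \delta_l$, which satisfies its hypotheses ($p_i = 1-i/n = q_{n-i}$, $p_i$ decreasing, $p_i+q_{i+2} = 1+2/n$). The sketch below (illustrative names) propagates $\mu K_n^s$ in rational arithmetic and tests the relevant chain of inequalities at every step up to a fixed time, treating values outside Ω as 0 as in (A.3). It is a finite check, not a proof.

```python
from fractions import Fraction


def step(mu, n):
    """Return mu K_n for the Ehrenfest kernel K_n(x, x+1) = 1 - x/n, K_n(x, x-1) = x/n."""
    new = [Fraction(0)] * (n + 1)
    for x, mass in enumerate(mu):
        if mass:
            if x < n:
                new[x + 1] += mass * Fraction(n - x, n)
            if x > 0:
                new[x - 1] += mass * Fraction(x, n)
    return new


def corollary_a2_part1_holds(n, steps):
    """Check the inequalities of Corollary A.2(1) for mu = delta_l, n = 2l, up to time `steps`."""
    l = n // 2
    mu = [Fraction(0)] * (n + 1)
    mu[l] = Fraction(1)

    def at(j):
        # Values outside {0, ..., n} are 0, as in the extension (A.3).
        return mu[j] if 0 <= j <= n else Fraction(0)

    for s in range(steps + 1):
        for i in range(n + 1):
            if s % 2 == 1:  # odd times 2t + 1
                a, b, c = l - 2 * i - 1, l + 2 * i + 1, l - 2 * i - 3
            else:           # even times 2t
                a, b, c = l + 2 * i, l - 2 * i - 2, l + 2 * i + 2
            if not at(a) >= at(b) >= at(c):
                return False
        mu = step(mu, n)
    return True


print([corollary_a2_part1_holds(n, 30) for n in (4, 10, 16)])
```

Exact fractions are used because, by the symmetry $x\mapsto n-x$, several of the inequalities hold with equality, and floating-point noise could flip them.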

Proof of Proposition 4.2. For the birth-and-death chain in Proposition 4.2, it is obvious that $p_i = 1-i/n$ and $q_i = i/n$. This implies

$$p_i = q_{n-i}, \quad p_i > p_{i+1}, \quad p_i+q_{i+2} = 1+\frac2n, \quad \forall i\ge0.$$

Applying Corollary A.2 with $K = K_n$ and $\mu = \delta_{\lceil n/2\rceil}$, the Dirac mass at $\lceil n/2\rceil$, yields

$$K_n^t(\lceil n/2\rceil, A) \ge 1/2, \quad \forall t\ge0.$$

For the general case with $\mu_n(A)\ge1/2$, let $(X_t)_{t=0}^\infty$ be a Markov chain with transition matrix $K_n$ and let T be the first passage time to state $\lceil n/2\rceil$, i.e., $T = \min\{t\ge0: X_t = \lceil n/2\rceil\}$. By the irreducibility of $K_n$, $P_{\mu_n}(T<\infty) = 1$. Using the strong Markov property, we obtain, for $t\ge0$,

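$$\begin{aligned}\mu_nK_n^t(A) &= \sum_{i=0}^{t}P_{\mu_n}(X_t\in A, T=i) + P_{\mu_n}(X_t\in A, T>t)\\ &= \sum_{i=0}^{t}P(X_{t-i}\in A\mid X_0 = \lceil n/2\rceil)P_{\mu_n}(T=i) + P_{\mu_n}(T>t)\\ &\ge \frac12P_{\mu_n}(T\le t) + P_{\mu_n}(T>t) \ge \frac12.\end{aligned}$$

Lemma A.3 ([6, Lemma A.1]). For n > 0, let $a_n\in\mathbb R_+$, $b_n\in\mathbb Z_+$, $c_n = (b_n-a_n)/\sqrt{a_n}$ and $d_n = e^{-a_n}\sum_{i=0}^{b_n}a_n^i/i!$. Assume that $a_n+b_n\to\infty$. Then

$$\limsup_{n\to\infty}d_n = \Phi\left(\limsup_{n\to\infty}c_n\right), \quad \liminf_{n\to\infty}d_n = \Phi\left(\liminf_{n\to\infty}c_n\right), \tag{A.4}$$

where $\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}e^{-t^2/2}\,dt$. In particular, if $c_n$ converges (the limit can be $+\infty$ or $-\infty$), then $\lim_{n\to\infty}d_n = \Phi\left(\lim_{n\to\infty}c_n\right)$.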
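Lemma A.3 is a normal approximation for the Poisson distribution: $d_n$ is the probability that a Poisson random variable with mean $a_n$ is at most $b_n$. The sketch below (function names hypothetical) evaluates $d_n$ in log space and compares it with $\Phi(c_n)$ for one large $a_n$; it illustrates the statement and is not a substitute for the proof in [6].

```python
import math


def poisson_cdf(a, b):
    """d = e^{-a} * sum_{i=0}^{b} a^i / i!, i.e. P(Poisson(a) <= b), for a > 0."""
    return math.fsum(math.exp(-a + i * math.log(a) - math.lgamma(i + 1))
                     for i in range(b + 1))


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


for a in (100, 10_000):
    for c in (-1.0, 0.0, 1.0):
        b = int(a + c * math.sqrt(a))  # so that c_n = (b - a)/sqrt(a) is close to c
        print(a, c, round(poisson_cdf(a, b), 4), round(Phi((b - a) / math.sqrt(a)), 4))
```

The agreement improves as $a_n$ grows; the residual gap at $c = 0$ comes mainly from the lattice (continuity) correction, which is of order $1/\sqrt{a_n}$.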

Lemma A.4. For $n\ge1$, let $\xi_n$ be a binomial random variable with parameters $(n,1/2)$. Then there is a universal constant C > 0 such that

$$\mathbb E\left(\left|\frac{n-2\xi_n}{\sqrt n}\right|^\theta\right) \le C4^\theta\Gamma\left(\frac{\theta+1}{2}\right), \quad \forall\theta>0,\ n\ge1,$$

where Γ is the Gamma function.

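Proof. Set $\Omega_n = \{0,1,...,n\}$ and $\pi_n(x) = \binom nx2^{-n}$. By the definition of $\xi_n$, $P(\xi_n = x) = \pi_n(x)$ for $x\in\Omega_n$. For $0\le j<\sqrt n$, set

$$E_{n,j} = \{x\in\Omega_n: |n-2x|/\sqrt n\in(j,j+1]\}, \qquad y_{n,j} = \max\{x\in E_{n,j}: x\le n/2\}.$$

Clearly, $[n-(j+1)\sqrt n]/2\le y_{n,j}<(n-j\sqrt n)/2$ and

$$\mathbb E\left(\left|\frac{n-2\xi_n}{\sqrt n}\right|^\theta\right) \le \sum_{j=0}^{\lfloor\sqrt n\rfloor}(j+1)^\theta\pi_n(E_{n,j}). \tag{A.5}$$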

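Using (5.3), we obtain, for $y_{n,j}\ne0$,

$$\pi_n(E_{n,j}) = 2^{-n}\sum_{x\in E_{n,j}}\frac{n!}{x!(n-x)!} \le 2^{1-n}\lceil\sqrt n\rceil\frac{n!}{y_{n,j}!(n-y_{n,j})!} \le 2^{2-n}\sqrt n\beta^3\frac{n^{n+1/2}}{y_{n,j}^{y_{n,j}+1/2}(n-y_{n,j})^{n-y_{n,j}+1/2}} = \frac{8\beta^3}{z_{n,j}},$$

where

$$z_{n,j} = \left(\frac2n\right)^{n+1}y_{n,j}^{y_{n,j}+1/2}(n-y_{n,j})^{n-y_{n,j}+1/2} = \left[\frac{2y_{n,j}}{n}\left(2-\frac{2y_{n,j}}{n}\right)\right]^{(n+1)/2}\left(\frac{n-y_{n,j}}{y_{n,j}}\right)^{n/2-y_{n,j}} = \left[1-\left(1-\frac{2y_{n,j}}{n}\right)^2\right]^{(n+1)/2}\left[\frac{1+(1-2y_{n,j}/n)}{1-(1-2y_{n,j}/n)}\right]^{n/2-y_{n,j}}.$$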

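Note that the mapping $t\mapsto(1-t)^{1/t}$ is strictly decreasing on (0,1). This implies

$$\left[1-\left(1-\frac{2y_{n,j}}{n}\right)^2\right]^{n/2} \ge \left[1-\left(1-\frac{2y_{n,j}}{n}\right)\right]^{n/2-y_{n,j}}$$

and, hence,

$$z_{n,j} \ge \sqrt{1-\left(1-\frac{2y_{n,j}}{n}\right)^2}\left[1+\left(1-\frac{2y_{n,j}}{n}\right)\right]^{n/2-y_{n,j}} \ge \frac{2y_{n,j}}{n}\left[1+\left(1-\frac{2y_{n,j}}{n}\right)\right]^{n/2-y_{n,j}}.$$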

In the case $y_{n,j}\ge n/6$, one may use the inequality $\log(1+t)\ge t/2$ for $t\in[0,1]$ to get

$$z_{n,j} \ge \frac13\exp\left\{\frac n4\left(1-\frac{2y_{n,j}}{n}\right)^2\right\} \ge \frac13e^{j^2/4}.$$

In the case $1\le y_{n,j}\le n/6$, it is clear that

$$z_{n,j} \ge \frac2n\left(\frac53\right)^{n/3} \ge \frac2ne^{n/6} \ge \frac2ne^{n/24}e^{j^2/8},$$

where the last inequality uses the fact $j<\sqrt n$. Putting both cases together, we may choose a universal constant C > 1 such that

$$z_{n,j} \ge \frac{e^{j^2/8}}{C}, \quad \forall\, 0\le j\le\sqrt n,\ y_{n,j}\ne0,\ n\ge1.$$

Returning to the computation of $\pi_n(E_{n,j})$, this gives

$$\pi_n(E_{n,j}) \le 8C\beta^3e^{-j^2/8}, \quad \forall\, 0\le j\le\sqrt n,\ y_{n,j}\ne0,\ n\ge1.$$

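In fact, the above inequality also holds for $y_{n,j} = 0$ (which must imply $j = \lfloor\sqrt n\rfloor$) since, in such a case, $\pi_n(E_{n,j}) = 2^{1-n} \le 2e^{-(\log2)j^2} \le 2e^{-j^2/8}$. Returning to the computation in (A.5), we have

$$\mathbb E\left(\left|\frac{n-2\xi_n}{\sqrt n}\right|^\theta\right) \le 8C\beta^3\sum_{j=0}^{\lfloor\sqrt n\rfloor}(j+1)^\theta e^{-j^2/8} \le 16C\beta^3\sum_{j=0}^{\lfloor\sqrt n\rfloor}(j+1)^\theta e^{-(j+2)^2/16} \le 16C\beta^3\sum_{j=0}^{\lfloor\sqrt n\rfloor}\int_{j+1}^{j+2}t^\theta e^{-t^2/16}\,dt \le 64C\beta^34^\theta\int_0^\infty s^\theta e^{-s^2}\,ds = 32C\beta^34^\theta\Gamma\left(\frac{\theta+1}{2}\right).$$

References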

[1] David Aldous. Random walks on finite groups and rapidly mixing Markov chains. In Seminar on probability, XVII, volume 986 of Lecture Notes in Math., pages 243–297. Springer, Berlin, 1983.

[2] David Aldous and Persi Diaconis. Shuffling cards and stopping times. Amer. Math. Monthly, 93(5):333–348, 1986.

[3] David Aldous and Persi Diaconis. Strong uniform times and finite random walks. Adv. in Appl. Math., 8(1):69–97, 1987.

[4] M. Brown and Y.-S. Shao. Identifying coefficients in the spectral representation for first passage time distributions. Probab. Engrg. Inform. Sci., 1:69–74, 1987.

[5] Guan-Yu Chen. The cutoff phenomenon for finite Markov chains. PhD thesis, Cornell University, 2006.

[6] Guan-Yu Chen and Laurent Saloff-Coste. The cutoff phenomenon for ergodic Markov processes. Electron. J. Probab., 13:26–78, 2008.

[7] Guan-Yu Chen and Laurent Saloff-Coste. The $L^2$-cutoff for reversible Markov processes. J. Funct. Anal., 258(7):2246–2315, 2010.

[8] Persi Diaconis and Phil Hanlon. Eigen-analysis for some examples of the Metropolis algorithm. In Hypergeometric functions on domains of positivity, Jack polynomials, and applications (Tampa, FL, 1991), volume 138 of Contemp. Math., pages 99–117. Amer. Math. Soc., Providence, RI, 1992.

[9] Persi Diaconis and Laurent Saloff-Coste. Separation cut-offs for birth and death chains. Ann. Appl. Probab., 16(4):2098–2122, 2006.

[10] Jian Ding, Eyal Lubetzky, and Yuval Peres. Total variation cutoff in birth-and-death chains. Probab. Theory Related Fields, 146(1-2):61–85, 2010.

[11] William Feller. An introduction to probability theory and its applications. Vol. I. Third edition. John Wiley & Sons Inc., New York, 1968.

[12] R. Koekoek and R. Swarttouw. The Askey-scheme of hypergeometric orthogonal polynomials and its q-analog. http://math.nist.gov/opsf/projects/koekoek.html, 1998.

[13] David A. Levin, Yuval Peres, and Elizabeth L. Wilmer. Markov chains and mixing times. American Mathematical Society, Providence, RI, 2009. With a chapter by James G. Propp and David B. Wilson.

[14] Laurent Saloff-Coste. Lectures on finite Markov chains. In Lectures on probability theory and statistics (Saint-Flour, 1996), volume 1665 of Lecture Notes in Math., pages 301–413. Springer, Berlin, 1997.

[15] Laurent Saloff-Coste. Random walks on finite groups. In Probability on discrete structures, volume 110 of Encyclopaedia Math. Sci., pages 263–346. Springer, Berlin, 2004.

1

Department of Applied Mathematics, National Chiao Tung University, Hsinchu 300, Taiwan


2

E-mail address: [email protected] 3