
5.5 Convergence Concepts

This section treats the somewhat fanciful idea of allowing the sample size to approach infinity and investigates the behavior of certain sample quantities as this happens. We are mainly concerned with three types of convergence, and we treat them in varying amounts of detail.

In particular, we want to look at the behavior of X̄n, the mean of n observations, as n → ∞.

5.5.1 Convergence in Probability

Definition 5.5.1 A sequence of random variables, X1, X2, . . ., converges in probability to a random variable X if, for every ε > 0,

lim_{n→∞} P(|Xn − X| ≥ ε) = 0, or equivalently,

lim_{n→∞} P(|Xn − X| < ε) = 1.

The X1, X2, . . . in Definition 5.5.1 (and the other definitions in this section) are typically not independent and identically distributed random variables, as in a random sample. The distribution of Xn changes as the subscript changes, and the convergence concepts discussed in this section describe different ways in which the distribution of Xn converges to some limiting distribution as the subscript becomes large.

Theorem 5.5.2 (Weak law of large numbers)

Let X1, X2, . . . be iid random variables with EXi = µ and VarXi = σ² < ∞. Define X̄n = (1/n) Σ_{i=1}^n Xi. Then, for every ε > 0,

lim_{n→∞} P(|X̄n − µ| < ε) = 1;

that is, X̄n converges in probability to µ.

Proof: We have, for every ε > 0,

P(|X̄n − µ| ≥ ε) = P((X̄n − µ)² ≥ ε²) ≤ E(X̄n − µ)²/ε² = Var X̄n/ε² = σ²/(nε²),

where the inequality is Chebychev's. Hence, P(|X̄n − µ| < ε) = 1 − P(|X̄n − µ| ≥ ε) ≥ 1 − σ²/(nε²) → 1 as n → ∞. ∎

The weak law of large numbers (WLLN) quite elegantly states that under general conditions, the sample mean approaches the population mean as n → ∞.
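This convergence is easy to check empirically. The following is a minimal simulation sketch (not from the text; it assumes NumPy is available) that estimates P(|X̄n − µ| < ε) for growing n by repeated sampling; the estimated probability should climb toward 1.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, eps, reps = 1.0, 0.1, 2000  # exponential(1) draws have mean mu = 1

for n in [10, 100, 1000, 10000]:
    # reps independent samples of size n; one sample mean per row
    xbar = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)
    # empirical estimate of P(|Xbar_n - mu| < eps)
    print(n, np.mean(np.abs(xbar - mu) < eps))
```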

Example (Consistency of S²)

Suppose we have a sequence X1, X2, . . . of iid random variables with EXi = µ and VarXi = σ² < ∞. If we define

Sn² = (1/(n − 1)) Σ_{i=1}^n (Xi − X̄n)²,

then, using Chebychev's Inequality and the fact that ESn² = σ² (so that E(Sn² − σ²)² = Var Sn²), we have

P(|Sn² − σ²| ≥ ε) ≤ E(Sn² − σ²)²/ε² = Var Sn²/ε²,

and thus a sufficient condition for Sn² to converge in probability to σ² is that Var Sn² → 0 as n → ∞.
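As a quick empirical check (an illustrative sketch of my own, assuming NumPy), the spread of Sn² around σ² should shrink as n grows:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2 = 4.0  # true variance of the N(0, 4) draws below

for n in [10, 100, 1000, 10000]:
    x = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(2000, n))
    s2 = x.var(axis=1, ddof=1)  # S_n^2, with the 1/(n-1) divisor
    # Var S_n^2 shrinks with n, so S_n^2 concentrates around sigma^2
    print(n, round(s2.var(), 4), np.mean(np.abs(s2 - sigma2) < 0.5))
```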

Theorem 5.5.4

Suppose that X1, X2, . . . converges in probability to a random variable X and that h is a continuous function. Then h(X1), h(X2), . . . converges in probability to h(X).

Proof: If h is continuous, given ε > 0 there exists δ > 0 such that |h(xn) − h(x)| < ε whenever |xn − x| < δ, so the event {|Xn − X| < δ} is contained in {|h(Xn) − h(X)| < ε}. Since X1, X2, . . . converges in probability to the random variable X,

lim_{n→∞} P(|Xn − X| < δ) = 1.

Thus,

lim_{n→∞} P(|h(Xn) − h(X)| < ε) = 1. ∎

Example (Consistency of S)

If Sn² is a consistent estimator of σ², then by Theorem 5.5.4, the sample standard deviation Sn = √(Sn²) is a consistent estimator of σ.


5.5.2 Almost Sure Convergence

A type of convergence that is stronger than convergence in probability is almost sure convergence. A sequence of random variables X1, X2, . . . converges almost surely to a random variable X if, for every ε > 0,

P(lim_{n→∞} |Xn − X| < ε) = 1.

This type of convergence is similar to pointwise convergence of a sequence of functions, except that the convergence need not occur on a set with probability 0 (hence the "almost" sure).

Example (Almost sure convergence)

Let the sample space S be the closed interval [0, 1] with the uniform probability distribution.

Define random variables Xn(s) = s + s^n and X(s) = s. For every s ∈ [0, 1), s^n → 0 as n → ∞ and Xn(s) → s = X(s). However, Xn(1) = 2 for every n, so Xn(1) does not converge to 1 = X(1). But since the convergence occurs on the set [0, 1) and P([0, 1)) = 1, Xn converges to X almost surely.
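A quick numerical illustration (a sketch of my own, not from the text; NumPy assumed): evaluating Xn(s) = s + s^n on a grid shows every s < 1 settling at s while s = 1 stays at 2.

```python
import numpy as np

s = np.array([0.0, 0.5, 0.9, 0.99, 1.0])
for n in [1, 10, 100, 1000]:
    print(n, s + s**n)  # X_n(s); the s = 1 entry is 2 for every n
```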

Example (Convergence in probability, not almost surely)

Let the sample space be [0, 1] with the uniform probability distribution. Define the sequence X1, X2, . . . as follows:

X1(s) = s + I_[0,1](s),
X2(s) = s + I_[0,1/2](s), X3(s) = s + I_[1/2,1](s),
X4(s) = s + I_[0,1/3](s), X5(s) = s + I_[1/3,2/3](s), X6(s) = s + I_[2/3,1](s),
· · ·

Let X(s) = s. As n → ∞, P(|Xn − X| ≥ ε) is equal to the probability of an interval of s values whose length is going to 0. However, Xn does not converge to X almost surely. Indeed, there is no value of s ∈ S for which Xn(s) → s = X(s). For every s, the value Xn(s) alternates between the values s and s + 1 infinitely often. For example, if s = 3/8, then X1(s) = 11/8, X2(s) = 11/8, X3(s) = 3/8, X4(s) = 3/8, X5(s) = 11/8, X6(s) = 3/8, etc. No pointwise convergence occurs for this sequence.
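The sliding-interval structure can be made concrete in code. Below is an illustrative sketch (the helper `interval` is my own naming, not from the text) that maps the index n to its interval and confirms that Xn(3/8) equals 11/8 infinitely often, even though the interval lengths, and hence P(|Xn − X| ≥ ε), shrink to 0.

```python
def interval(n):
    # Map n = 1, 2, 3, ... to the sliding interval [(j-1)/k, j/k]:
    # block k = 1, 2, 3, ... contributes k intervals of length 1/k.
    k = 1
    while n > k:
        n -= k
        k += 1
    return (n - 1) / k, n / k

s = 3 / 8
hits = sum(lo <= s <= hi for lo, hi in map(interval, range(1, 10001)))
# Interval lengths 1/k -> 0, so P(|X_n - X| >= eps) -> 0 (in probability),
# yet X_n(3/8) = 11/8 keeps recurring: no pointwise (a.s.) convergence.
print(hits, "of the first 10000 terms have X_n(3/8) = 11/8")
```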

Theorem 5.5.9 (Strong law of large numbers)

Let X1, X2, . . . be iid random variables with EXi = µ and VarXi = σ² < ∞. Define X̄n = (1/n) Σ_{i=1}^n Xi. Then, for every ε > 0,

P(lim_{n→∞} |X̄n − µ| < ε) = 1;

that is, X̄n converges almost surely to µ.

For both the weak and strong laws of large numbers we had the assumption of a finite variance. In fact, both laws hold without this assumption; the only moment condition needed is that E|Xi| < ∞.

5.5.3 Convergence in Distribution

Definition 5.5.10

A sequence of random variables, X1, X2, . . ., converges in distribution to a random variable X if

lim_{n→∞} F_{Xn}(x) = F_X(x)

at all points x where F_X(x) is continuous.

Example (Maximum of uniforms)

If X1, X2, . . . are iid uniform(0, 1) and X(n) = max_{1≤i≤n} Xi, let us examine whether X(n) converges in distribution. As n → ∞, we have for any ε > 0,

P(|X(n) − 1| ≥ ε) = P(X(n) ≤ 1 − ε) = P(Xi ≤ 1 − ε, i = 1, . . . , n) = (1 − ε)^n,

which goes to 0, so X(n) converges in probability to 1. However, if we take ε = t/n, we then have

P(X(n) ≤ 1 − t/n) = (1 − t/n)^n → e^{−t},

which, upon rearranging, yields

P(n(1 − X(n)) ≤ t) → 1 − e^{−t};

that is, the random variable n(1 − X(n)) converges in distribution to an exponential(1) random variable.
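A simulation sketch (mine, assuming NumPy) comparing the empirical distribution of n(1 − X(n)) with the exponential(1) cdf:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 1000, 100000
x_max = rng.uniform(size=(reps, n)).max(axis=1)  # X_(n) for each replicate
for t in [0.5, 1.0, 2.0]:
    # empirical P(n(1 - X_(n)) <= t) versus the exponential(1) cdf 1 - e^{-t}
    print(t, np.mean(n * (1 - x_max) <= t), 1 - np.exp(-t))
```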

Note that although we talk of a sequence of random variables converging in distribution, it is really the cdfs that converge, not the random variables. In this very fundamental way convergence in distribution is quite different from convergence in probability or convergence almost surely.

Theorem 5.5.12

If the sequence of random variables, X1, X2, . . ., converges in probability to a random variable X, the sequence also converges in distribution to X.

Theorem 5.5.13

The sequence of random variables, X1, X2, . . ., converges in probability to a constant µ if and only if the sequence also converges in distribution to µ. That is, the statement

P(|Xn − µ| > ε) → 0 for every ε > 0

is equivalent to

P(Xn ≤ x) → 0 if x < µ, and P(Xn ≤ x) → 1 if x > µ.

Theorem 5.5.14 (Central limit theorem)

Let X1, X2, . . . be a sequence of iid random variables whose mgfs exist in a neighborhood of 0 (that is, M_{Xi}(t) exists for |t| < h, for some positive h). Let EXi = µ and VarXi = σ² > 0. (Both µ and σ² are finite since the mgf exists.) Define X̄n = (1/n) Σ_{i=1}^n Xi. Let Gn(x) denote the cdf of √n(X̄n − µ)/σ. Then, for any x, −∞ < x < ∞,

lim_{n→∞} Gn(x) = ∫_{−∞}^x (1/√(2π)) e^{−y²/2} dy;

that is, √n(X̄n − µ)/σ has a limiting standard normal distribution.
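The normal limit kicks in even for quite skewed summands. Below is a small simulation sketch (my own, assuming NumPy and SciPy) comparing the cdf of the standardized mean of exponential(1) draws with the standard normal cdf:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n, reps = 50, 100000
mu, sigma = 1.0, 1.0  # exponential(1) has mean 1 and standard deviation 1
xbar = rng.exponential(size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (xbar - mu) / sigma  # the standardized sample mean
for x in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    # empirical G_n(x) against the standard normal cdf
    print(x, np.mean(z <= x), round(norm.cdf(x), 4))
```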

Theorem 5.5.15 (Stronger form of the central limit theorem)

Let X1, X2, . . . be a sequence of iid random variables with EXi = µ and 0 < VarXi = σ² < ∞. Define X̄n = (1/n) Σ_{i=1}^n Xi. Let Gn(x) denote the cdf of √n(X̄n − µ)/σ. Then, for any x, −∞ < x < ∞,

lim_{n→∞} Gn(x) = ∫_{−∞}^x (1/√(2π)) e^{−y²/2} dy;

that is, √n(X̄n − µ)/σ has a limiting standard normal distribution.


The proof is almost identical to that of Theorem 5.5.14, except that characteristic functions are used instead of mgfs.

Example (Normal approximation to the negative binomial)

Suppose X1, . . . , Xn are a random sample from a negative binomial(r, p) distribution. Recall that

EX = r(1 − p)/p,  VarX = r(1 − p)/p²,

and the central limit theorem tells us that

√n(X̄ − r(1 − p)/p) / √(r(1 − p)/p²)

is approximately N(0, 1). The approximate probability calculations are much easier than the exact calculations. For example, if r = 10, p = 1/2, and n = 30, an exact calculation would be

P(X̄ ≤ 11) = P(Σ_{i=1}^{30} Xi ≤ 330) = Σ_{x=0}^{330} C(300 + x − 1, x) (1/2)^{300+x} = 0.8916,

since Σ Xi is negative binomial(nr, p). The CLT gives us the approximation

P(X̄ ≤ 11) = P(√30(X̄ − 10)/√20 ≤ √30(11 − 10)/√20) ≈ P(Z ≤ 1.2247) = .8888.
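Both numbers are easy to reproduce (a sketch of mine, assuming SciPy, whose nbinom counts failures before the r-th success, matching the pmf above):

```python
import numpy as np
from scipy.stats import nbinom, norm

# Sum of n = 30 iid negative binomial(r = 10, p = 1/2) variables
# (failure-count convention) is negative binomial(nr = 300, p = 1/2).
exact = nbinom.cdf(330, 300, 0.5)  # P(sum X_i <= 330), about 0.8916
approx = norm.cdf(np.sqrt(30) * (11 - 10) / np.sqrt(20))
# about 0.889; the text's .8888 reflects rounding 1.2247 in a normal table
print(exact, approx)
```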

Theorem 5.5.17 (Slutsky’s theorem)

If Xn → X in distribution and Yn → a, a constant, in probability, then

(a) YnXn → aX in distribution;

(b) Xn + Yn → X + a in distribution.

Example (Normal approximation with estimated variance)

Suppose that

√n(X̄n − µ)/σ → N(0, 1),

but the value σ is unknown. We know Sn → σ in probability. By Exercise 5.32, σ/Sn → 1 in probability. Hence, Slutsky's theorem tells us

√n(X̄n − µ)/Sn = (σ/Sn) · √n(X̄n − µ)/σ → N(0, 1).
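A closing simulation sketch (my own, assuming NumPy and SciPy): the studentized mean, with Sn in place of σ, is still approximately standard normal, just as Slutsky's theorem predicts.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
n, reps, mu = 100, 100000, 1.0
x = rng.exponential(scale=mu, size=(reps, n))
# studentized mean: S_n (ddof=1) replaces the unknown sigma
t = np.sqrt(n) * (x.mean(axis=1) - mu) / x.std(axis=1, ddof=1)
for q in [-1.96, 0.0, 1.96]:
    print(q, np.mean(t <= q), round(norm.cdf(q), 4))
```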
