
5.5.3 Convergence in Distribution

Definition 5.5.10

A sequence of random variables, $X_1, X_2, \ldots$, converges in distribution to a random variable $X$ if
$$\lim_{n\to\infty} F_{X_n}(x) = F_X(x)$$
at all points $x$ where $F_X(x)$ is continuous.

Example (Maximum of uniforms)

If $X_1, X_2, \ldots$ are iid uniform(0,1) and $X_{(n)} = \max_{1 \le i \le n} X_i$, let us examine whether $X_{(n)}$ converges in distribution.

As $n \to \infty$, we have for any $\varepsilon > 0$,
$$P(|X_{(n)} - 1| \ge \varepsilon) = P(X_{(n)} \le 1 - \varepsilon) = P(X_i \le 1 - \varepsilon,\ i = 1, \ldots, n) = (1 - \varepsilon)^n,$$
which goes to 0. However, if we take $\varepsilon = t/n$, we then have
$$P(X_{(n)} \le 1 - t/n) = (1 - t/n)^n \to e^{-t},$$
which, upon rearranging, yields
$$P\bigl(n(1 - X_{(n)}) \le t\bigr) \to 1 - e^{-t};$$
that is, the random variable $n(1 - X_{(n)})$ converges in distribution to an exponential(1) random variable.
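Numerically, this limit is easy to see. Below is a minimal simulation sketch (not from the text; the sample size, replication count, and seed are arbitrary choices) comparing the empirical distribution of $n(1 - X_{(n)})$ with the exponential(1) cdf.

```python
# Illustrative simulation: for iid uniform(0,1) samples, the rescaled maximum
# n(1 - X_(n)) should behave like an exponential(1) random variable.
import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 20_000

# Each row is one iid uniform(0,1) sample of size n; take the row-wise maximum.
x_max = rng.uniform(0.0, 1.0, size=(reps, n)).max(axis=1)
scaled = n * (1.0 - x_max)                    # the rescaled maximum n(1 - X_(n))

for t in (0.5, 1.0, 2.0):
    empirical = np.mean(scaled <= t)          # estimate of P(n(1 - X_(n)) <= t)
    print(f"t = {t}: empirical {empirical:.4f} vs 1 - e^(-t) = {1 - np.exp(-t):.4f}")
```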

Note that although we talk of a sequence of random variables converging in distribution, it is really the cdfs that converge, not the random variables. In this very fundamental way convergence in distribution is quite different from convergence in probability or convergence almost surely.

Theorem 5.5.12

If the sequence of random variables, X1, X2, . . ., converges in probability to a random variable X, the sequence also converges in distribution to X.


Theorem 5.5.13

The sequence of random variables, X1, X2, . . ., converges in probability to a constant µ if and only if the sequence also converges in distribution to µ. That is, the statement

$$P(|X_n - \mu| > \varepsilon) \to 0 \quad \text{for every } \varepsilon > 0$$
is equivalent to
$$P(X_n \le x) \to \begin{cases} 0 & \text{if } x < \mu, \\ 1 & \text{if } x > \mu. \end{cases}$$

Theorem 5.5.14 (Central limit theorem)

Let $X_1, X_2, \ldots$ be a sequence of iid random variables whose mgfs exist in a neighborhood of 0 (that is, $M_{X_i}(t)$ exists for $|t| < h$, for some positive $h$). Let $EX_i = \mu$ and $\operatorname{Var} X_i = \sigma^2 > 0$. (Both $\mu$ and $\sigma^2$ are finite since the mgf exists.) Define $\bar X_n = (1/n)\sum_{i=1}^n X_i$. Let $G_n(x)$ denote the cdf of $\sqrt{n}(\bar X_n - \mu)/\sigma$. Then, for any $x$, $-\infty < x < \infty$,
$$\lim_{n\to\infty} G_n(x) = \int_{-\infty}^x \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\, dy;$$
that is, $\sqrt{n}(\bar X_n - \mu)/\sigma$ has a limiting standard normal distribution.

Theorem 5.5.15 (Stronger form of the central limit theorem)

Let $X_1, X_2, \ldots$ be a sequence of iid random variables with $EX_i = \mu$ and $0 < \operatorname{Var} X_i = \sigma^2 < \infty$. Define $\bar X_n = (1/n)\sum_{i=1}^n X_i$. Let $G_n(x)$ denote the cdf of $\sqrt{n}(\bar X_n - \mu)/\sigma$. Then, for any $x$, $-\infty < x < \infty$,
$$\lim_{n\to\infty} G_n(x) = \int_{-\infty}^x \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\, dy;$$
that is, $\sqrt{n}(\bar X_n - \mu)/\sigma$ has a limiting standard normal distribution.

The proof is almost identical to that of Theorem 5.5.14, except that characteristic functions are used instead of mgfs.

Example (Normal approximation to the negative binomial)

Suppose $X_1, \ldots, X_n$ are a random sample from a negative binomial($r, p$) distribution. Recall that
$$EX = \frac{r(1-p)}{p}, \qquad \operatorname{Var} X = \frac{r(1-p)}{p^2},$$
and the central limit theorem tells us that
$$\frac{\sqrt{n}\,\bigl(\bar X - r(1-p)/p\bigr)}{\sqrt{r(1-p)/p^2}}$$
is approximately N(0, 1). The approximate probability calculations are much easier than the exact calculations. For example, if $r = 10$, $p = \tfrac12$, and $n = 30$, an exact calculation would be

$$P(\bar X \le 11) = P\Bigl(\sum_{i=1}^{30} X_i \le 330\Bigr) = \sum_{x=0}^{330} \binom{300 + x - 1}{x} \Bigl(\frac12\Bigr)^{300+x} = 0.8916,$$
where we use the fact that $\sum_i X_i$ is negative binomial($nr, p$). The CLT gives us the approximation
$$P(\bar X \le 11) = P\Bigl(\frac{\sqrt{30}(\bar X - 10)}{\sqrt{20}} \le \frac{\sqrt{30}(11 - 10)}{\sqrt{20}}\Bigr) \approx P(Z \le 1.2247) = .8888.$$
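The two numbers quoted above can be checked directly. The following sketch (an illustration; it assumes SciPy, whose `nbinom` counts failures and therefore matches the sum above) computes the exact negative binomial probability and the CLT approximation.

```python
# Exact negative binomial cdf versus CLT approximation for r = 10, p = 1/2, n = 30.
import numpy as np
from scipy import stats

r, p, n = 10, 0.5, 30
mean, var = r * (1 - p) / p, r * (1 - p) / p**2   # EX = 10, VarX = 20

# Exact: the sum of the X_i is negative binomial(nr, p), so P(Xbar <= 11) = P(sum <= 330).
exact = stats.nbinom.cdf(330, n * r, p)

# CLT: P(Xbar <= 11) is approximately P(Z <= sqrt(n)(11 - mean)/sd).
z = np.sqrt(n) * (11 - mean) / np.sqrt(var)
approx = stats.norm.cdf(z)

print(f"exact  P(Xbar <= 11) = {exact:.4f}")   # compare with 0.8916 above
print(f"z statistic          = {z:.4f}")       # about 1.2247
print(f"normal approximation = {approx:.4f}")  # compare with .8888 above
```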

Theorem 5.5.17 (Slutsky’s theorem)

If $X_n \to X$ in distribution and $Y_n \to a$, a constant, in probability, then

(a) $Y_n X_n \to aX$ in distribution;

(b) $X_n + Y_n \to X + a$ in distribution.

Example (Normal approximation with estimated variance)

Suppose that
$$\frac{\sqrt{n}(\bar X_n - \mu)}{\sigma} \to N(0, 1),$$
but the value of $\sigma$ is unknown. We know $S_n \to \sigma$ in probability. By Exercise 5.32, $\sigma/S_n \to 1$ in probability. Hence, Slutsky's theorem tells us
$$\frac{\sqrt{n}(\bar X_n - \mu)}{S_n} = \frac{\sigma}{S_n}\, \frac{\sqrt{n}(\bar X_n - \mu)}{\sigma} \to N(0, 1).$$
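A quick simulation makes this concrete. The sketch below (the exponential data, sample size, and seed are illustrative assumptions, not from the text) checks that the studentized mean $\sqrt{n}(\bar X_n - \mu)/S_n$ behaves approximately like a standard normal.

```python
# Studentized mean with estimated variance: approximately N(0,1), per Slutsky.
import numpy as np

rng = np.random.default_rng(1)
n, reps = 200, 20_000
mu = 1.0                                        # exponential(1): mean 1, sd 1

x = rng.exponential(mu, size=(reps, n))
xbar = x.mean(axis=1)
s_n = x.std(axis=1, ddof=1)                     # sample standard deviation S_n
t_stat = np.sqrt(n) * (xbar - mu) / s_n

print("mean (should be near 0):", round(t_stat.mean(), 3))
print("variance (should be near 1):", round(t_stat.var(), 3))
print("P(T <= 1.645) (should be near 0.95):", round(np.mean(t_stat <= 1.645), 3))
```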

5.5.4 The Delta Method

First, we look at a motivating example.

Example 5.5.19 (Estimating the odds)

Suppose we observe $X_1, X_2, \ldots, X_n$ independent Bernoulli($p$) random variables. The typical parameter of interest is $p$, but another popular parameter is the odds, $\frac{p}{1-p}$. As we would estimate $p$ by $\hat p = \sum_i X_i / n$, we might consider using $\frac{\hat p}{1 - \hat p}$ as an estimate of $\frac{p}{1-p}$. But what are the properties of this estimator? How might we estimate the variance of $\frac{\hat p}{1 - \hat p}$?

Definition

If a function $g(x)$ has derivatives of order $r$, that is, $g^{(r)}(x) = \frac{d^r}{dx^r} g(x)$ exists, then for any constant $a$, the Taylor polynomial of order $r$ about $a$ is
$$T_r(x) = \sum_{i=0}^r \frac{g^{(i)}(a)}{i!}\, (x - a)^i.$$

Theorem (Taylor)

If $g^{(r)}(a) = \frac{d^r}{dx^r} g(x)\big|_{x=a}$ exists, then
$$\lim_{x\to a} \frac{g(x) - T_r(x)}{(x - a)^r} = 0.$$

Since we are interested in approximations, we are just going to ignore the remainder. There are, however, many explicit forms, one useful one being

$$g(x) - T_r(x) = \int_a^x \frac{g^{(r+1)}(t)}{r!}\, (x - t)^r\, dt.$$

Now we consider the multivariate case of the Taylor series. Let $T_1, \ldots, T_k$ be random variables with means $\theta_1, \ldots, \theta_k$, and define $T = (T_1, \ldots, T_k)$ and $\theta = (\theta_1, \ldots, \theta_k)$. Suppose there is a differentiable function $g(T)$ (an estimator of some parameter) for which we want an approximate estimate of variance. Define
$$g_i'(\theta) = \frac{\partial}{\partial t_i}\, g(t)\Big|_{t_1=\theta_1, \ldots, t_k=\theta_k}.$$
The first-order Taylor series expansion of $g$ about $\theta$ is
$$g(t) = g(\theta) + \sum_{i=1}^k g_i'(\theta)(t_i - \theta_i) + \text{Remainder}.$$
For our statistical approximation we forget about the remainder and write
$$g(t) \approx g(\theta) + \sum_{i=1}^k g_i'(\theta)(t_i - \theta_i).$$


Now, take expectation on both sides to get

$$E_\theta\, g(T) \approx g(\theta) + \sum_{i=1}^k g_i'(\theta)\, E_\theta(T_i - \theta_i) = g(\theta).$$
We can now approximate the variance of $g(T)$ by
$$\operatorname{Var}_\theta g(T) \approx E_\theta\bigl([g(T) - g(\theta)]^2\bigr) \approx E_\theta\Bigl(\Bigl(\sum_{i=1}^k g_i'(\theta)(T_i - \theta_i)\Bigr)^2\Bigr) = \sum_{i=1}^k [g_i'(\theta)]^2 \operatorname{Var}_\theta T_i + 2\sum_{i>j} g_i'(\theta)\, g_j'(\theta) \operatorname{Cov}_\theta(T_i, T_j).$$

This approximation is very useful because it gives us a variance formula for a general function, using only simple variances and covariances.

Example (Continuation of Example 5.5.19)

In our above notation, take $g(p) = \frac{p}{1-p}$, so $g'(p) = \frac{1}{(1-p)^2}$ and
$$\operatorname{Var}\Bigl(\frac{\hat p}{1 - \hat p}\Bigr) \approx [g'(p)]^2 \operatorname{Var}(\hat p) = \Bigl[\frac{1}{(1-p)^2}\Bigr]^2 \frac{p(1-p)}{n} = \frac{p}{n(1-p)^3},$$
giving us an approximation for the variance of our estimator.
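The quality of this approximation can be checked by simulation. In the sketch below (the values of $p$ and $n$ and the seed are hypothetical choices), the Monte Carlo variance of $\hat p/(1-\hat p)$ is compared with $p/(n(1-p)^3)$.

```python
# Monte Carlo check of the delta-method variance approximation for the odds estimator.
import numpy as np

rng = np.random.default_rng(2)
p, n, reps = 0.3, 200, 100_000

phat = rng.binomial(n, p, size=reps) / n
odds_hat = phat / (1.0 - phat)                 # the estimator phat/(1 - phat)

mc_var = odds_hat.var()                        # simulated variance of the estimator
delta_var = p / (n * (1.0 - p) ** 3)           # first-order (delta-method) value

print(f"Monte Carlo variance : {mc_var:.6f}")
print(f"Approximation        : {delta_var:.6f}")
```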

Example (Approximate mean and variance)

Suppose $X$ is a random variable with $E_\mu X = \mu \ne 0$. If we want to estimate a function $g(\mu)$, a first-order approximation would give us
$$g(X) = g(\mu) + g'(\mu)(X - \mu).$$
If we use $g(X)$ as an estimator of $g(\mu)$, we can say that approximately
$$E_\mu g(X) \approx g(\mu)$$
and
$$\operatorname{Var}_\mu g(X) \approx [g'(\mu)]^2 \operatorname{Var}_\mu X.$$

Theorem 5.5.24 (Delta method)

Let $Y_n$ be a sequence of random variables that satisfies $\sqrt{n}(Y_n - \theta) \to N(0, \sigma^2)$ in distribution. For a given function $g$ and a specific value of $\theta$, suppose that $g'(\theta)$ exists and is not 0. Then
$$\sqrt{n}\,[g(Y_n) - g(\theta)] \to N(0, \sigma^2[g'(\theta)]^2)$$
in distribution.

Proof: The Taylor expansion of $g(Y_n)$ around $Y_n = \theta$ is
$$g(Y_n) = g(\theta) + g'(\theta)(Y_n - \theta) + \text{remainder},$$
where the remainder $\to 0$ as $Y_n \to \theta$. Since $Y_n \to \theta$ in probability, it follows that the remainder $\to 0$ in probability. By applying Slutsky's theorem (a),
$$g'(\theta)\,\sqrt{n}(Y_n - \theta) \to g'(\theta)X, \quad \text{where } X \sim N(0, \sigma^2).$$
Since $\sqrt{n}[g(Y_n) - g(\theta)]$ differs from $g'(\theta)\sqrt{n}(Y_n - \theta)$ only by a term that goes to 0 in probability, it follows that
$$\sqrt{n}\,[g(Y_n) - g(\theta)] \to N(0, \sigma^2[g'(\theta)]^2). \qquad \square$$

Example

Suppose now that we have the mean of a random sample, $\bar X$. For $\mu \ne 0$, we have
$$\sqrt{n}\Bigl(\frac{1}{\bar X} - \frac{1}{\mu}\Bigr) \to N\Bigl(0, \Bigl(\frac{1}{\mu}\Bigr)^4 \operatorname{Var}_\mu X_1\Bigr)$$
in distribution.
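A simulation along the same lines (the normal data, $\mu$, $n$, and seed below are illustrative assumptions) confirms that the variance of $\sqrt{n}(1/\bar X - 1/\mu)$ is close to $(1/\mu)^4 \operatorname{Var}_\mu X_1$.

```python
# Delta method for g(x) = 1/x: simulated variance versus (1/mu)^4 * Var X_1.
import numpy as np

rng = np.random.default_rng(3)
n, reps = 500, 10_000
mu, sigma2 = 2.0, 1.0                          # X_i ~ N(2, 1), so mu != 0

x = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
stat = np.sqrt(n) * (1.0 / x.mean(axis=1) - 1.0 / mu)

print("simulated variance :", round(stat.var(), 4))
print("(1/mu)^4 * Var X_1 :", (1.0 / mu) ** 4 * sigma2)   # 0.0625 here
```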

There are two extensions of the basic Delta method that we need to deal with to complete our treatment. The first concerns the possibility that $g'(\theta) = 0$.

(Second-order Delta Method)

Let $Y_n$ be a sequence of random variables that satisfies $\sqrt{n}(Y_n - \theta) \to N(0, \sigma^2)$ in distribution. For a given function $g$ and a specific value of $\theta$, suppose that $g'(\theta) = 0$ and $g''(\theta)$ exists and is not 0. Then
$$n[g(Y_n) - g(\theta)] \to \sigma^2\, \frac{g''(\theta)}{2}\, \chi^2_1$$
in distribution.
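As a concrete (hypothetical) instance, take $g(y) = y^2$ and $\theta = 0$, so that $g'(0) = 0$ and $g''(0) = 2$; then $n\,g(\bar Y_n)$ should behave like $\sigma^2 \chi^2_1$. The sketch below checks the limiting mean and variance.

```python
# Second-order delta method with g(y) = y^2 at theta = 0: n*g(Ybar) ~ sigma^2 * chi^2_1.
import numpy as np

rng = np.random.default_rng(4)
n, reps, sigma2 = 400, 20_000, 1.0

y = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
stat = n * y.mean(axis=1) ** 2                 # n[g(Ybar) - g(0)] with g(y) = y^2

# A chi-square(1) variable has mean 1 and variance 2.
print("mean (expect about sigma^2 = 1)      :", round(stat.mean(), 3))
print("variance (expect about 2*sigma^4 = 2):", round(stat.var(), 3))
```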

Next we consider the extension of the basic Delta method to the multivariate case.

Theorem 5.5.28

Let $X_1, \ldots, X_n$ be a random sample with $E(X_{ij}) = \mu_i$ and $\operatorname{Cov}(X_{ik}, X_{jk}) = \sigma_{ij}$. For a given function $g$ with continuous first partial derivatives and a specific value of $\mu = (\mu_1, \ldots, \mu_p)$ for which
$$\tau^2 = \sum_i \sum_j \sigma_{ij}\, \frac{\partial g(\mu)}{\partial \mu_i}\, \frac{\partial g(\mu)}{\partial \mu_j} > 0,$$
$$\sqrt{n}\,[g(\bar X_1, \ldots, \bar X_p) - g(\mu_1, \ldots, \mu_p)] \to N(0, \tau^2)$$
in distribution.
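For a concrete (hypothetical) illustration, take $g(m_1, m_2) = m_1/m_2$, the ratio of two means, with partial derivatives $1/\mu_2$ and $-\mu_1/\mu_2^2$ at $\mu$. The sketch below (the bivariate normal data, means, and covariance matrix are assumptions made for this example) compares the simulated variance of $\sqrt{n}[g(\bar X_1, \bar X_2) - g(\mu_1, \mu_2)]$ with $\tau^2$.

```python
# Multivariate delta method for the ratio of means g(m1, m2) = m1/m2.
import numpy as np

rng = np.random.default_rng(5)
n, reps = 400, 10_000
mu = np.array([1.0, 2.0])
cov = np.array([[1.0, 0.3],
                [0.3, 0.5]])                            # sigma_ij

grad = np.array([1.0 / mu[1], -mu[0] / mu[1] ** 2])     # partial derivatives of g at mu
tau2 = grad @ cov @ grad                                # sum_ij sigma_ij g_i g_j

x = rng.multivariate_normal(mu, cov, size=(reps, n))    # shape (reps, n, 2)
xbar = x.mean(axis=1)                                   # sample mean vectors, shape (reps, 2)
stat = np.sqrt(n) * (xbar[:, 0] / xbar[:, 1] - mu[0] / mu[1])

print("simulated variance :", round(stat.var(), 4))
print("tau^2              :", round(float(tau2), 4))
```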
