(1)

Statistical Inference: Confidence Intervals

Statistics (I), academic year 101. Instructor: 蔡碧紋

(2)

Population: the form of the distribution is assumed known, but the parameter(s) that determine the distribution are unknown.

Sample: draw a random sample from the population (i.i.d.).

Point estimation (MME, MLE)

Confidence intervals:

- Confidence intervals for a population mean

- Confidence intervals for the difference between two means

- Confidence intervals for variances

- Confidence intervals for proportions

- Sample size

(3)

Confidence intervals for the mean of a single population

CI for µ

1. A random sample (i.i.d.) from a normally distributed population.

(i) when the variance σ² is known.

(ii) when the variance σ² is unknown.

2. The sample is NOT from a normal distribution.

(a) When n is large: by the CLT, $(\bar{X} - \mu)/(\sigma/\sqrt{n}) \to N(0, 1)$.

(b) When n is less than 30 and the underlying distribution is far from normal: nonparametric methods.

(4)

CIs for µ when the variance σ² is known

Assume the population $X \sim N(\mu, \sigma^2)$, where σ² is known. We draw a random sample of size n and let $\bar{X}$ be the sample average. The probability that the random interval

\[ [L, U] = \left[ \bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\; \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \right] \]

contains the unknown mean µ is 1 − α, i.e.,

\[ P(L \le \mu \le U) = 1 - \alpha. \]

[L, U] is a random interval that contains µ with probability 1 − α.

If we replicate the sampling process 100 times with 1 − α = 0.95, we obtain 100 different confidence intervals, and about 95 of them should contain the population mean µ.
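The repeated-sampling interpretation can be checked with a small simulation. The sketch below is only an illustration: the values µ = 50, σ = 9, n = 10, the number of replications and the seed are all assumptions chosen for the demo, not values from the slides.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, conf=0.95, reps=2000, seed=1):
    """Fraction of simulated known-sigma intervals that contain mu."""
    rng = random.Random(seed)                      # fixed seed (assumed) for repeatability
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    half = z * sigma / n ** 0.5                    # half-width z * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half <= mu <= xbar + half:
            hits += 1
    return hits / reps

# Hypothetical population parameters, used only for this demo.
rate = coverage(mu=50, sigma=9, n=10)
print(f"empirical coverage: {rate:.3f}")
```

The printed proportion should be close to 0.95, but it is itself random and will differ slightly from run to run if the seed is changed.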

(5)

[Figure: Confidence intervals (horizontal axis 40 to 60; vertical axis 0 to 50)]

(6)

Once the sample is observed and the sample average is computed to be $\bar{x}$, we call the interval

\[ \left[ \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \right] \]

a 100(1 − α)% confidence interval for the unknown mean µ. We are 100(1 − α)% confident that $[\bar{x} \pm z_{\alpha/2}(\sigma/\sqrt{n})]$ contains µ.

Sample size: n

Confidence coefficient: 1 − α

(7)

Example

Assume the population $X \sim N(\mu, 81)$. We draw a random sample of size n = 10 and observe

60.50 66.18 48.10 41.21 53.66 36.49 54.80 56.04 43.48 42.41

Find a 95% confidence interval for µ.

With $\bar{x} = 50.287$ and $\sigma = 9$, $[\bar{x} \pm 1.96\,\sigma/\sqrt{n}] = [44.71, 55.87]$ is a 95% confidence interval for µ.

Once a particular sample is observed and $\bar{x}$ is computed, the interval either does or does not contain the mean µ. We cannot say that there is a 95% chance that µ falls between 44.71 and 55.87, because µ is a fixed constant. We can only say that we are 95% confident that the interval [44.71, 55.87] contains the population mean.

(The interval provides information about the uncertainty of the estimate.)
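The arithmetic of this example can be reproduced as follows. This is a minimal sketch using only the Python standard library; the function name `z_interval` is a hypothetical helper, and the exact normal quantile is used in place of the rounded table value 1.96.

```python
from math import sqrt
from statistics import NormalDist, fmean

# Data and known standard deviation from the example (sigma^2 = 81).
data = [60.50, 66.18, 48.10, 41.21, 53.66, 36.49, 54.80, 56.04, 43.48, 42.41]
sigma = 9.0

def z_interval(sample, sigma, conf=0.95):
    """Known-sigma confidence interval for the mean: xbar +/- z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    xbar = fmean(sample)
    half = z * sigma / sqrt(len(sample))
    return xbar - half, xbar + half

lo, hi = z_interval(data, sigma)
print(f"[{lo:.2f}, {hi:.2f}]")
```

Rounded to two decimals, the endpoints agree with the interval [44.71, 55.87] stated in the example.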


(9)

CI for µ when σ² is also unknown.

Recall: t-distribution. By the definition of a T random variable: if $Z \sim N(0, 1)$ and $V \sim \chi^2(r)$, with Z and V independent, then

\[ T = \frac{Z}{\sqrt{V/r}} \]

has a t-distribution with r degrees of freedom.

(10)

Recall: Normal and χ² distributions

Let $X_1, \dots, X_n$ be a random sample from a $N(\mu, \sigma^2)$ distribution, where µ and σ² are unknown parameters. Let $\bar{X}$ be the sample average and $S^2 = \sum_{i=1}^{n}(X_i - \bar{X})^2/(n-1)$ the sample variance. Define $W = (n-1)S^2/\sigma^2$ (i.e., the sum of squares divided by σ²). Then W has a chi-square distribution with r = n − 1 degrees of freedom:

\[ W = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1). \]

$E(W) = 2(r/2) = r$ and $\mathrm{Var}(W) = 4(r/2) = 2r$. Thus a random variable $W \sim \chi^2(v)$ has mean v and variance 2v, and the mgf of $W \sim \chi^2(r)$ is

\[ M_W(t) = \left(\frac{1}{1-2t}\right)^{r/2}, \quad t < 1/2. \]

(11)

CI for µ when σ² is also unknown.

We have

\[ T = \frac{\sqrt{n}(\bar{X}-\mu)/\sigma}{\sqrt{\dfrac{(n-1)S^2}{\sigma^2}\Big/(n-1)}} = \frac{\bar{X}-\mu}{S/\sqrt{n}}, \]

which has a t-distribution with r = n − 1 degrees of freedom (recall the properties of the t-distribution).

Random interval:

\[ \left[ \bar{X} - t_{\alpha/2}(n-1)\frac{S}{\sqrt{n}},\; \bar{X} + t_{\alpha/2}(n-1)\frac{S}{\sqrt{n}} \right] \]

Once a random sample is observed, we compute $\bar{x}$ and $s^2$, and $[\bar{x} \pm t_{\alpha/2}(n-1)\, s/\sqrt{n}]$ is a 100(1 − α)% confidence interval for µ.
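As an illustration of the t-based interval, the sketch below reuses the ten observations from the earlier known-variance example, but now treats σ as unknown (an assumption made only for this demo). The critical value 2.262 is the standard table value of $t_{0.025}(9)$, and `t_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import fmean, stdev

def t_interval(sample, t_crit):
    """CI for the mean with unknown sigma; t_crit is t_{alpha/2}(n-1) read from a table."""
    n = len(sample)
    xbar = fmean(sample)
    s = stdev(sample)                  # sample standard deviation (divisor n - 1)
    half = t_crit * s / sqrt(n)
    return xbar - half, xbar + half

data = [60.50, 66.18, 48.10, 41.21, 53.66, 36.49, 54.80, 56.04, 43.48, 42.41]
T_025_9 = 2.262                        # t_{0.025}(9), table value
lo, hi = t_interval(data, T_025_9)
print(f"[{lo:.2f}, {hi:.2f}]")
```

The resulting interval is wider than the known-σ interval for the same data, which reflects the extra uncertainty from estimating σ by s.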


(13)

CIs for difference of two means

Two independent normal distributions

1. When both variances are known.

2. If the variances are unknown and the sample sizes are large.

3. If the variances are unknown:

(a) assume a common (equal) unknown variance

(b) unequal variances

(i) sample sizes are large

(ii) sample sizes are small

Paired data (matched data, dependent data)

(14)

Both variances are known

Two independent random samples of sizes n and m from the two normal distributions:

$X_1, \dots, X_n \sim N(\mu_X, \sigma_X^2)$ and $Y_1, \dots, Y_m \sim N(\mu_Y, \sigma_Y^2)$.

Then $\bar{X} \sim N(\mu_X, \sigma_X^2/n)$ and $\bar{Y} \sim N(\mu_Y, \sigma_Y^2/m)$.

Let $W = \bar{X} - \bar{Y}$; then

\[ W \sim N\!\left(\mu_X - \mu_Y,\; \frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}\right). \]

(15)

Both variances are known

Once the samples are drawn,

\[ \bar{x} - \bar{y} \pm z_{\alpha/2}\sqrt{\frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}} \]

is a 100(1 − α)% CI for $\mu_X - \mu_Y$.

(16)

Sample sizes are large and variances are unknown

We replace the variances with the sample variances $s_X^2$ and $s_Y^2$, the observed values of the respective unbiased estimators of the variances.

That is,

\[ \bar{x} - \bar{y} \pm z_{\alpha/2}\sqrt{\frac{s_X^2}{n} + \frac{s_Y^2}{m}} \]

is an approximate 100(1 − α)% CI for $\mu_X - \mu_Y$.
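Both the known-variance interval and its large-sample version have the same form, so one small function covers them. The sketch below is illustrative only: the function name and the numerical inputs are hypothetical, and the caller passes either the known variances or the sample variances.

```python
from math import sqrt
from statistics import NormalDist

def diff_means_z_interval(xbar, ybar, var_x, var_y, n, m, conf=0.95):
    """z-based CI for mu_X - mu_Y.

    var_x, var_y are the known variances, or the sample variances
    when n and m are large (approximate interval).
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    half = z * sqrt(var_x / n + var_y / m)
    d = xbar - ybar
    return d - half, d + half

# Hypothetical summary statistics, chosen only to exercise the formula.
lo, hi = diff_means_z_interval(xbar=10.0, ybar=8.0, var_x=4.0, var_y=9.0, n=100, m=100)
print(f"[{lo:.3f}, {hi:.3f}]")
```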

(17)

Sample sizes are small and variances are unknown

a. Assumed common variance

Estimate for the common variance: assume equal variances, $\sigma_X^2 = \sigma_Y^2 = \sigma^2$.

Denote

\[ S_p^2 = \frac{(n-1)S_X^2 + (m-1)S_Y^2}{n+m-2}, \]

which is an unbiased estimator of the common variance σ².

(18)

Estimate for the common variance

Since the random samples are from two independent normal distributions with a common variance, we have

\[ \frac{(n-1)S_X^2}{\sigma^2} \sim \chi^2(n-1), \qquad \frac{(m-1)S_Y^2}{\sigma^2} \sim \chi^2(m-1), \]

and they are independent. Thus

\[ U = \frac{(n-1)S_X^2}{\sigma^2} + \frac{(m-1)S_Y^2}{\sigma^2} \sim \chi^2(n+m-2) \]

and $E(U) = n + m - 2$. Since $S_p^2 = \sigma^2 U/(n+m-2)$, we have $E(S_p^2) = \sigma^2$.

(19)

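(a) Common variance assumption

We have

\[ Z = \frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{\sqrt{\sigma^2\left(\frac{1}{n} + \frac{1}{m}\right)}}, \]

but we do not know σ², so we use

\[ T = \frac{Z}{\sqrt{U/r}} = \frac{[\bar{X} - \bar{Y} - (\mu_X - \mu_Y)]\Big/\sqrt{\sigma^2\left(\frac{1}{n} + \frac{1}{m}\right)}}{\sqrt{\left[\dfrac{(n-1)S_X^2}{\sigma^2} + \dfrac{(m-1)S_Y^2}{\sigma^2}\right]\Big/(n+m-2)}} = \frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{\sqrt{\dfrac{(n-1)S_X^2 + (m-1)S_Y^2}{n+m-2}\left(\dfrac{1}{n} + \dfrac{1}{m}\right)}}, \]

which has a t-distribution with r = n + m − 2 degrees of freedom.

A 100(1 − α)% CI for $\mu_X - \mu_Y$ is

\[ \bar{x} - \bar{y} \pm t_{\alpha/2}(n+m-2)\sqrt{s_p^2\left(\frac{1}{n} + \frac{1}{m}\right)}. \]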
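The pooled-variance interval can be sketched in a few lines. The two small samples below are made up for illustration, `pooled_t_interval` is a hypothetical helper name, and 2.306 is the standard table value of $t_{0.025}(8)$ for n + m − 2 = 8 degrees of freedom.

```python
from math import sqrt
from statistics import fmean, variance

def pooled_t_interval(x, y, t_crit):
    """Pooled-variance CI for mu_X - mu_Y; t_crit is t_{alpha/2}(n + m - 2) from a table."""
    n, m = len(x), len(y)
    # Pooled estimate of the common variance.
    sp2 = ((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2)
    half = t_crit * sqrt(sp2 * (1 / n + 1 / m))
    d = fmean(x) - fmean(y)
    return sp2, (d - half, d + half)

# Hypothetical data, chosen only to exercise the formula.
x = [10.0, 12.0, 11.0, 13.0, 14.0]
y = [8.0, 9.0, 10.0, 9.0, 9.0]
sp2, (lo, hi) = pooled_t_interval(x, y, t_crit=2.306)
print(f"s_p^2 = {sp2:.2f}, CI = [{lo:.3f}, {hi:.3f}]")
```

Passing the critical value in as an argument keeps the sketch free of third-party dependencies; in practice it would come from a t-table or a statistics package.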

(20)

(b) Unequal variances

\[ W = \frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{\sqrt{S_X^2/n + S_Y^2/m}} \]

1. If n and m are large enough and the underlying distributions are close to normal: use the normal distribution to construct a CI.

2.* If n and m are small: W is approximated by a Student's t distribution with r degrees of freedom (Welch t), where

\[ \frac{1}{r} = \frac{c^2}{n-1} + \frac{(1-c)^2}{m-1}, \qquad c = \frac{s_X^2/n}{s_X^2/n + s_Y^2/m}, \]

or equivalently

\[ r = \frac{(s_X^2/n + s_Y^2/m)^2}{\frac{1}{n-1}(s_X^2/n)^2 + \frac{1}{m-1}(s_Y^2/m)^2}. \]

If r is not an integer, we use the greatest integer in r, i.e., the floor $\lfloor r \rfloor$, as the number of degrees of freedom of the approximating Student's t distribution.

(21)

Then

\[ \bar{x} - \bar{y} \pm t_{\alpha/2}(r)\sqrt{\frac{s_X^2}{n} + \frac{s_Y^2}{m}} \]

is an approximate 100(1 − α)% CI for $\mu_X - \mu_Y$.
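The two expressions for the Welch degrees of freedom are algebraically the same, which a short check makes concrete. The sketch below uses hypothetical sample variances (2.5 and 0.5, each from 5 observations); the function names are illustrative.

```python
from math import floor

def welch_df(var_x, n, var_y, m):
    """Direct form: r = (a + b)^2 / (a^2/(n-1) + b^2/(m-1)), a = s_X^2/n, b = s_Y^2/m."""
    a, b = var_x / n, var_y / m
    return (a + b) ** 2 / (a ** 2 / (n - 1) + b ** 2 / (m - 1))

def welch_df_via_c(var_x, n, var_y, m):
    """Equivalent form: 1/r = c^2/(n-1) + (1-c)^2/(m-1)."""
    c = (var_x / n) / (var_x / n + var_y / m)
    return 1 / (c ** 2 / (n - 1) + (1 - c) ** 2 / (m - 1))

# Hypothetical sample variances and sizes.
r = welch_df(2.5, 5, 0.5, 5)
r_c = welch_df_via_c(2.5, 5, 0.5, 5)
df = floor(r)   # degrees of freedom used for the t critical value
print(f"r = {r:.4f}, floor(r) = {df}")
```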

(22)

Paired (matched) samples

$X_i$ and $Y_i$ are measurements taken from the same subject, so $X_i$ and $Y_i$ are dependent random variables.

Let $(X_1, Y_1), \dots, (X_n, Y_n)$ be n pairs of dependent measurements.

Let $D_i = X_i - Y_i$, i = 1, ..., n. Suppose $D_1, \dots, D_n$ can be regarded as a random sample from $N(\mu_D, \sigma_D^2)$, where $\mu_D$ and $\sigma_D$ are the mean and standard deviation of each difference.

(23)

To form a CI for $\mu_X - \mu_Y$, use

\[ T = \frac{\bar{D} - \mu_D}{S_D/\sqrt{n}}, \]

where $\bar{D}$ and $S_D$ are the sample mean and sample standard deviation of the n differences. T is a t statistic with n − 1 degrees of freedom.

Thus the CI for $\mu_D = \mu_X - \mu_Y$ is

\[ \bar{d} \pm t_{\alpha/2}(n-1)\frac{s_D}{\sqrt{n}}, \]

where $\bar{d}$ and $s_D$ are the observed mean and standard deviation of the sample of differences. (This is the same as the CI for a single mean.)
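Because the paired interval is a one-sample t interval on the differences, a sketch only needs to form the differences first. The before/after measurements below are hypothetical, `paired_t_interval` is an illustrative name, and 2.776 is the standard table value of $t_{0.025}(4)$.

```python
from math import sqrt
from statistics import fmean, stdev

def paired_t_interval(x, y, t_crit):
    """CI for mu_D = mu_X - mu_Y from paired data; t_crit is t_{alpha/2}(n-1) from a table."""
    d = [xi - yi for xi, yi in zip(x, y)]   # within-pair differences
    dbar, s_d = fmean(d), stdev(d)
    half = t_crit * s_d / sqrt(len(d))
    return dbar - half, dbar + half

# Hypothetical paired measurements on 5 subjects.
x = [12.0, 15.0, 11.0, 14.0, 13.0]
y = [10.0, 14.0, 10.0, 11.0, 12.0]
lo, hi = paired_t_interval(x, y, t_crit=2.776)
print(f"[{lo:.3f}, {hi:.3f}]")
```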

(24)

Confidence intervals for variances

Recall : Chi-Square distribution

Let $X_1, \dots, X_n$ be a random sample from a $N(\mu, \sigma^2)$ distribution, where µ and σ² are unknown parameters, and let $S^2 = \sum_{i=1}^{n}(X_i - \bar{X})^2/(n-1)$. Then

\[ W = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1). \]

(25)

chi-squared R.V.s

[Figure: Chisq distribution. Density dchisq(x, df) plotted for x from 0 to 10, with df = 3, df = 5 and df = 8.]

(26)

Example: chi-squared distribution

Let S² be the sample variance of a random sample of size 6 drawn from a N(µ, 12) distribution. Find

P(2.76 < S² < 22.2)

Let $W = (n-1)S^2/\sigma^2 = 5S^2/12$; then $W \sim \chi^2(5)$.

So

\[ P(2.76 < S^2 < 22.2) = P\left(\tfrac{5}{12}(2.76) < W < \tfrac{5}{12}(22.2)\right) = P(1.15 < W < 9.25), \]

and

P(2.76 < S² < 22.2) = P(W < 9.25) − P(W < 1.15) = pchisq(9.25, 5) − pchisq(1.15, 5) = 0.85.


(28)

CI for variance σ²

Let $X_1, \dots, X_n$ be a random sample from a $N(\mu, \sigma^2)$ distribution. Find the 100(1 − α)% CI for σ².

Select constants a and b from χ²(n − 1) such that

\[ P\left(a \le \frac{(n-1)S^2}{\sigma^2} \le b\right) = 1 - \alpha. \]

We select $a = \chi^2_{1-\alpha/2}(n-1)$ and $b = \chi^2_{\alpha/2}(n-1)$, where $\chi^2_p(r)$ denotes the point with upper-tail probability p, and we have

\[ 1 - \alpha = P\left(\frac{a}{(n-1)S^2} \le \frac{1}{\sigma^2} \le \frac{b}{(n-1)S^2}\right) = P\left(\frac{(n-1)S^2}{b} \le \sigma^2 \le \frac{(n-1)S^2}{a}\right). \]

The probability that the random interval $[(n-1)S^2/b,\ (n-1)S^2/a]$ contains the unknown σ² is 1 − α. Once the data are observed, $[(n-1)s^2/b,\ (n-1)s^2/a]$ is a 100(1 − α)% CI for σ².


(30)

Example: CI for variance

$X_1, \dots, X_{13} \sim N(\mu, \sigma^2)$. We have $\bar{x} = 18.97$ and $\sum_{i=1}^{13}(x_i - \bar{x})^2 = 128.41$. Find the 90% CI for σ². From the chi-squared table, $\chi^2_{0.95}(12) = 5.226$ and $\chi^2_{0.05}(12) = 21.03$ (the 5th and 95th percentiles of a chi-squared distribution with 12 degrees of freedom, respectively).

A 90% CI for σ² is

\[ \left[\frac{128.41}{21.03},\ \frac{128.41}{5.226}\right] = [6.11, 24.57]. \]
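The interval in this example is just the sum of squares divided by the two table values, which the following sketch makes explicit. The function name is hypothetical; the numbers are the ones given in the example.

```python
def variance_interval(sum_sq, chi2_lower, chi2_upper):
    """CI for sigma^2: [sum_sq / chi2_upper, sum_sq / chi2_lower].

    sum_sq      = (n - 1) * s^2, the sum of squared deviations
    chi2_lower  = lower-tail table point (a in the derivation)
    chi2_upper  = upper-tail table point (b in the derivation)
    """
    return sum_sq / chi2_upper, sum_sq / chi2_lower

# Values from the example: n = 13, 90% interval, 12 degrees of freedom.
lo, hi = variance_interval(128.41, chi2_lower=5.226, chi2_upper=21.03)
print(f"[{lo:.2f}, {hi:.2f}]")
```

Note that the larger table value produces the lower endpoint, because σ² appears in the denominator of the pivotal quantity.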

(31)

Example

Let $X_1, \dots, X_n$ be a random sample from an Exponential(λ) distribution (mean = 1/λ).

1. Let $W = 2\lambda \sum_{i=1}^{n} X_i$. Show that $W \sim \chi^2(2n)$ (hint: use the moment generating function).

2. Find a 90% CI for λ.
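One possible route through this exercise is sketched below. It assumes the rate parametrization of the exponential distribution, whose mgf is $\lambda/(\lambda - t)$ for $t < \lambda$, and it uses the upper-tail notation $\chi^2_p(r)$ from the variance example; it is a worked outline, not the only valid solution.

```latex
% Part 1: mgf of W = 2\lambda \sum X_i, using independence of the X_i
M_{X_i}(t) = \frac{\lambda}{\lambda - t}, \quad t < \lambda
\\[4pt]
M_W(t) = E\!\left[e^{\,t \cdot 2\lambda \sum_{i=1}^{n} X_i}\right]
       = \prod_{i=1}^{n} M_{X_i}(2\lambda t)
       = \left(\frac{\lambda}{\lambda - 2\lambda t}\right)^{n}
       = \left(\frac{1}{1 - 2t}\right)^{n}, \quad t < 1/2
\\[4pt]
% This is the chi-square mgf (1/(1-2t))^{r/2} with r/2 = n, so W \sim \chi^2(2n).

% Part 2: W is a pivotal quantity, so for a 90% interval
P\!\left(\chi^2_{0.95}(2n) \le 2\lambda \sum_{i=1}^{n} X_i \le \chi^2_{0.05}(2n)\right) = 0.90
\\[4pt]
\left[\ \frac{\chi^2_{0.95}(2n)}{2\sum_{i=1}^{n} x_i},\ \
        \frac{\chi^2_{0.05}(2n)}{2\sum_{i=1}^{n} x_i}\ \right]
\ \text{is a 90\% CI for } \lambda .
```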

(32)

CI for the ratio of variances

Recall: F distribution

Let $W_1 \sim \chi^2(v_1)$ and $W_2 \sim \chi^2(v_2)$ be independent random variables. Then a random variable F that can be expressed as

\[ F = \frac{W_1/v_1}{W_2/v_2} \]

is said to follow an F distribution with degrees of freedom $v_1$ and $v_2$, denoted by $F(v_1, v_2)$ or $F_{v_1, v_2}$.

(33)

[Figure: F(d1, d2) distribution. Density df(x, d1, d2) plotted for x from 0 to 10, with (d1, d2) = (3, 9), (4, 8) and (9, 3).]

(34)

F-distribution

Reciprocal of an F

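Let the r.v. $F \sim F(v_1, v_2)$ with pdf g, and let Y = 1/F. Then Y has pdf

\[
\begin{aligned}
f(y) &= g(F)\left|\frac{dF}{dy}\right|, \quad F = 1/y \\
&= \frac{v_1^{v_1/2}\, v_2^{v_2/2}\, y^{1-v_1/2}\, y^{(v_1+v_2)/2}}{B\!\left(\frac{v_1}{2}, \frac{v_2}{2}\right)(v_2 y + v_1)^{(v_1+v_2)/2}} \cdot \frac{1}{y^2} \\
&= \frac{v_1^{v_1/2}\, v_2^{v_2/2}\, y^{(v_2/2)-1}}{B\!\left(\frac{v_2}{2}, \frac{v_1}{2}\right)(v_1 + v_2 y)^{(v_1+v_2)/2}}, \quad y \in [0, \infty).
\end{aligned}
\]

That is, if $F \sim F(v_1, v_2)$ and Y = 1/F, then $Y \sim F(v_2, v_1)$.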

(35)

CI for $\sigma_X^2/\sigma_Y^2$ from two independent normal populations

Let $S_X^2$ and $S_Y^2$ be the unbiased estimators of $\sigma_X^2$ and $\sigma_Y^2$ computed from samples of sizes n and m, respectively, from two independent normal populations. Find a 100(1 − α)% CI for $\sigma_X^2/\sigma_Y^2$.

Since $(n-1)S_X^2/\sigma_X^2 \sim \chi^2(n-1)$ and $(m-1)S_Y^2/\sigma_Y^2 \sim \chi^2(m-1)$ are independent,

\[ \frac{\dfrac{(m-1)S_Y^2}{\sigma_Y^2}\Big/(m-1)}{\dfrac{(n-1)S_X^2}{\sigma_X^2}\Big/(n-1)} = \frac{S_Y^2/\sigma_Y^2}{S_X^2/\sigma_X^2} \]

follows an F distribution with degrees of freedom (m − 1) and (n − 1), i.e.,

\[ \frac{S_Y^2/\sigma_Y^2}{S_X^2/\sigma_X^2} \sim F(m-1, n-1). \]


(37)

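\[ \frac{S_Y^2/\sigma_Y^2}{S_X^2/\sigma_X^2} \sim F(m-1, n-1) \]

So we select $c = F_{1-\alpha/2}(m-1, n-1)$ and $d = F_{\alpha/2}(m-1, n-1)$, and

\[ P\left(c \le \frac{S_Y^2/\sigma_Y^2}{S_X^2/\sigma_X^2} \le d\right) = 1 - \alpha. \]

That is,

\[ P\left(c\,\frac{S_X^2}{S_Y^2} \le \frac{\sigma_X^2}{\sigma_Y^2} \le d\,\frac{S_X^2}{S_Y^2}\right) = 1 - \alpha. \]

Tables usually list only upper-tail points, so we use

$c = F_{1-\alpha/2}(m-1, n-1) = 1/F_{\alpha/2}(n-1, m-1)$ and

$d = F_{\alpha/2}(m-1, n-1)$. Let $s_X^2$ and $s_Y^2$ be the realizations of $S_X^2$ and $S_Y^2$; then a 100(1 − α)% CI for $\sigma_X^2/\sigma_Y^2$ is

\[ \left[\frac{1}{F_{\alpha/2}(n-1, m-1)}\,\frac{s_X^2}{s_Y^2},\ \ F_{\alpha/2}(m-1, n-1)\,\frac{s_X^2}{s_Y^2}\right]. \]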


(39)

Example

From two independent normal populations with unknown means and variances, we have $12\,s_X^2 = 128.4$ from a random sample of size 13 and $8\,s_Y^2 = 36.72$ from a random sample of size 9. Find a 98% CI for $\sigma_X^2/\sigma_Y^2$.

\[ \frac{S_Y^2/\sigma_Y^2}{S_X^2/\sigma_X^2} \sim F(8, 12) \]

From the F-table we have $F_{0.01}(12, 8) = 5.67$ and $F_{0.01}(8, 12) = 4.50$, so a 98% CI for $\sigma_X^2/\sigma_Y^2$ is

\[ \left[\frac{1}{5.67}\cdot\frac{128.4/12}{36.72/8},\ \ 4.50\cdot\frac{128.4/12}{36.72/8}\right] \approx [0.41, 10.49]. \]
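The endpoints in this example can be evaluated directly from the two F-table values. The sketch below takes those table values as inputs; the function and argument names are hypothetical.

```python
def variance_ratio_interval(s2_x, s2_y, f_nm, f_mn):
    """CI for sigma_X^2 / sigma_Y^2.

    f_nm = F_{alpha/2}(n-1, m-1) and f_mn = F_{alpha/2}(m-1, n-1),
    both upper-tail points read from an F-table.
    """
    ratio = s2_x / s2_y
    return ratio / f_nm, ratio * f_mn

# Values from the example: n = 13, m = 9, alpha = 0.02.
s2_x = 128.4 / 12
s2_y = 36.72 / 8
lo, hi = variance_ratio_interval(s2_x, s2_y, f_nm=5.67, f_mn=4.50)
print(f"[{lo:.2f}, {hi:.2f}]")
```

The interval is wide and contains 1, so with these sample sizes the data do not rule out equal variances.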


(41)

Confidence intervals for proportions (p)

Estimate proportions: construct a CI for p in the Bin(n, p) distribution.

Assume that sampling is from a binomial population, so the problem is to estimate p in the Bin(n, p) distribution, where p is unknown.

Recall:

Given that Y is distributed as Bin(n, p), an unbiased estimator of p is $\hat{p} = Y/n$:

\[ E(\hat{p}) = E\!\left(\frac{Y}{n}\right) = p \quad \text{and} \quad \mathrm{Var}(\hat{p}) = \frac{1}{n^2}\mathrm{Var}(Y) = \frac{1}{n^2}\,np(1-p) = \frac{p(1-p)}{n}. \]

(42)

Confidence intervals for proportions (p)

For large n,

\[ \frac{Y - np}{\sqrt{np(1-p)}} = \frac{(Y/n) - p}{\sqrt{p(1-p)/n}} \]

can be approximated by the standard normal N(0, 1).

Thus an approximate 100(1 − α)% CI for p is obtained by considering

\[ P\left(-z_{\alpha/2} < \frac{(Y/n) - p}{\sqrt{p(1-p)/n}} < z_{\alpha/2}\right) = 1 - \alpha. \]

Replacing the variance of $\hat{p} = Y/n$ by its estimate $\hat{p}(1-\hat{p})/n$ gives a simple expression for the CI for p:

\[ \left[\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right] = \left[\frac{Y}{n} \pm z_{\alpha/2}\sqrt{\frac{(Y/n)(1 - Y/n)}{n}}\right]. \]


(44)

Example

Assume Y ∼ Bin(n, p) with n = 36 and y/n = 0.222. Find an approximate 90% CI for p:

\[ \left[0.222 \pm 1.645\sqrt{\frac{(0.222)(1 - 0.222)}{36}}\right] \]

Example

Poll: n = 100 and y = 51 say yes. Find a 95% CI for p: [0.41, 0.61].

Poll: n = 351 and y = 185 say yes. Find a 95% CI for p.
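The three intervals in these examples can be evaluated with one helper. This is a minimal sketch using the normal approximation described above; `proportion_interval` is a hypothetical name, and exact normal quantiles are used instead of the rounded table values.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(p_hat, n, conf=0.95):
    """Approximate (normal-based) CI for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

ci_a = proportion_interval(0.222, 36, conf=0.90)   # first example
ci_b = proportion_interval(51 / 100, 100)          # poll with n = 100
ci_c = proportion_interval(185 / 351, 351)         # poll with n = 351

for name, (lo, hi) in [("n=36", ci_a), ("n=100", ci_b), ("n=351", ci_c)]:
    print(f"{name}: [{lo:.3f}, {hi:.3f}]")
```

For the second poll the sample proportion is about 0.527 and the interval is roughly half as wide as for n = 100, since the width shrinks like 1/√n.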


(47)

CI for difference of two proportions


(48)

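Interpreting opinion polls

"... the approval rating is 44%. This survey used the residential telephone directory of the Taiwan area as the sampling frame and sampled at random on the last four digits of the telephone numbers. A total of 1,056 residents of the Taiwan area aged 20 and over were successfully interviewed. At the 95% confidence level, the sampling error is plus or minus 3.0 percentage points."

1. What is the population of this poll? What is the sample size?

2. About how many of the respondents were satisfied with the administration?

3. Compute the confidence interval for this survey.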

(49)

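Interpreting opinion polls

1. In this survey, the population is residents of the Taiwan area aged 20 and over, and the sample is the 1,056 people successfully interviewed. "Approval rating of 44%" means that about 44% of the 1,056 respondents said they were satisfied (that is, about 465 people).

2. The interval [0.44 − 0.03, 0.44 + 0.03] = [0.41, 0.47] is called the confidence interval (confidence interval: [estimate − maximum error, estimate + maximum error]).

Let p be the true approval proportion in the population. This survey estimates that p lies in the range 0.41 to 0.47.

3. The 95% confidence level: p is unknown and every sample carries sampling error, so there is no guarantee that the true proportion p lies inside the interval we estimated. If we sampled many times and obtained a confidence interval each time, about 95% of those intervals would cover the true value of p.

4. We are therefore 95% confident that the true approval proportion lies in the interval we obtained.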

(50)

Sample Size for proportion

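A newspaper polled satisfaction with the Taipei City mayor's performance and reported: "The approval rating is 63%. This poll successfully interviewed n adult residents of Taipei City aged 20 and over. At the 95% confidence level, the sampling error is plus or minus 3.2 percentage points." Find n.

$z_{0.025} = 1.96$ and $1.96\sqrt{(0.63)(1-0.63)/n} = 0.032$, so $n = (0.63)(0.37)(1.96)^2/0.032^2 \approx 874.5$; rounding up gives n = 875.

The maximum error of the estimate at a 98% confidence coefficient is 0.01 for $\hat{p} = 0.08$. Find n.

$z_{0.01} = 2.326$ and $2.326\sqrt{(0.08)(0.92)/n} = 0.01$, so $n = (0.08)(0.92)(2.326)^2/0.01^2 \approx 3982$.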
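Solving the margin-of-error equation for n is a one-line computation. The sketch below uses the rounded table values of z quoted in the examples and rounds up to the next whole respondent; `sample_size_proportion` is a hypothetical helper name. Evaluating the first expression gives roughly 874.5, hence 875.

```python
from math import ceil

def sample_size_proportion(p, z, eps):
    """Smallest n with z * sqrt(p * (1 - p) / n) <= eps."""
    return ceil(p * (1 - p) * z ** 2 / eps ** 2)

n1 = sample_size_proportion(0.63, 1.96, 0.032)   # 95% level, error 3.2 points
n2 = sample_size_proportion(0.08, 2.326, 0.01)   # 98% level, error 1 point
print(n1, n2)
```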


(54)

Unknown $\hat{p}$

When no estimate of p is available, we use the bound $p^*(1 - p^*) \le 1/4$. Hence we need $n = z_{0.01}^2/(4 \times 0.01^2) \approx 13530$ (using $z_{0.01} \approx 2.3263$) for the maximum error of the estimate to be 0.01 at a 98% confidence coefficient.

For a 95% confidence coefficient with ε = 0.01, we have n = 9604; with ε = 0.03, we have n ≈ 1067.

(55)

Sample size and CIs for given $\hat{p}$

The 95% CIs for the proportion of people supporting A, when 51% of respondents support A in polls of 100, 400 or 10,000 people:

n = 100: [0.41, 0.61] = [0.51 ± 0.10]

n = 400: [0.46, 0.56] = [0.51 ± 0.05]

n = 10,000: [0.50, 0.52] = [0.51 ± 0.01]

(56)

Sample Size for mean

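A 100(1 − α)% CI for µ is $[\bar{x} \pm z_{\alpha/2}(\sigma/\sqrt{n})]$. Denote this interval by $\bar{x} \pm \varepsilon$. We sometimes call $\varepsilon = z_{\alpha/2}(\sigma/\sqrt{n})$ the maximum error of the estimate. Solving for n gives

\[ n = \frac{(z_{\alpha/2})^2 \sigma^2}{\varepsilon^2}, \]

where it is assumed that σ² is known.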

(57)

Example

We want the 95% CI for µ to be $\bar{x} \pm 1$ for a normal population with standard deviation σ = 15. Find the sample size.

$1.96 \cdot 15/\sqrt{n} = 1$ gives $n \approx 864.36$, so we take n = 865.

If the 80% CI for µ is to be $\bar{x} \pm 2$, then $1.282 \cdot 15/\sqrt{n} = 2$, where $z_{0.1} = 1.282$, and thus $n \approx 92.4$, so we take n = 93.
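The same rounding-up rule applies to the sample size for a mean. The sketch below plugs the example's values into $n = (z\sigma/\varepsilon)^2$; `sample_size_mean` is a hypothetical helper name and the z values are the rounded table values used in the example.

```python
from math import ceil

def sample_size_mean(sigma, z, eps):
    """Smallest n with z * sigma / sqrt(n) <= eps (sigma assumed known)."""
    return ceil((z * sigma / eps) ** 2)

n95 = sample_size_mean(sigma=15, z=1.96, eps=1)    # 95% CI, error 1
n80 = sample_size_mean(sigma=15, z=1.282, eps=2)   # 80% CI, error 2
print(n95, n80)
```

Halving the required error quadruples the sample size, since ε enters the formula squared.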


(60)

Pivotal quantity I

A very useful method for finding confidence intervals uses a pivotal quantity.

What is a pivotal quantity? A pivotal quantity is a function of data and the unknown parameter, say g (X, θ), and the distribution of g (X, θ) does not depend on the unknown parameter.

Example

Let $X_1, \dots, X_n$ be a random sample from a $N(\mu, \sigma^2)$ distribution.

(61)

Pivotal quantity II

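1. When σ is known, $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is a pivotal quantity: $Z \sim N(0, 1)$.

2. When σ is unknown, $T = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$ is a pivotal quantity, where S is the sample standard deviation: $T \sim t(n-1)$.

3. $W = (n-1)S^2/\sigma^2$ is a pivotal quantity: $W \sim \chi^2(n-1)$.

For $Y \sim \mathrm{Bin}(n, p)$ with n large, $\dfrac{(Y/n) - p}{\sqrt{p(1-p)/n}}$ is approximately N(0, 1), so it is an approximate pivotal quantity.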
