## Academic Year 101, Statistics (I), Instructor: 蔡碧紋

### Statistical Inference: Confidence Intervals

Population: the form of the distribution is assumed known, but the parameter(s) that determine the distribution are unknown.

Sample: draw a random sample from the population (i.i.d.).

Point estimation (MME, MLE).

Confidence intervals:

- Confidence intervals for a population mean
- Confidence intervals for the difference between two means
- Confidence intervals for variances
- Confidence intervals for proportions
- Sample size

## Confidence intervals for the mean of a single population

CI for $\mu$

1. A random sample (i.i.d.) from a normally distributed population,

   (i) when the variance $\sigma^2$ is known;

   (ii) when the variance $\sigma^2$ is unknown.

2. The sample is NOT from a normal distribution.

   (a) When $n$ is large, use the CLT: $\dfrac{\bar X - \mu}{\sigma/\sqrt{n}} \to N(0, 1)$.

   (b) When $n$ is less than 30 and the underlying distribution is far from normal, use nonparametric methods.

## CIs for $\mu$ when the variance $\sigma^2$ is known

Assume the population $X \sim N(\mu, \sigma^2)$, where $\sigma^2$ is known. We draw a random sample of size $n$ and let $\bar X$ be the sample average. The probability that the random interval

$$\left[\bar X - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar X + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$$

contains the unknown mean $\mu$ is $1 - \alpha$, i.e.,

$$P(L \le \mu \le U) = 1 - \alpha,$$

where $L$ and $U$ are the lower and upper endpoints of the interval. $[L, U]$ is a random interval that contains $\mu$ with probability $1 - \alpha$.

With $\alpha = 0.05$, if we replicate the sampling process 100 times, we obtain 100 different confidence intervals, and about 95% of them should contain the population mean $\mu$.
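The repeated-sampling interpretation can be checked by simulation. The following is a minimal sketch using only the Python standard library; the values `mu=50`, `sigma=9`, `n=10`, the number of replications and the seed are illustrative assumptions, and the function names are hypothetical.

```python
import random
from statistics import NormalDist, fmean

def z_interval(sample, sigma, conf=0.95):
    """Known-sigma CI for the mean: x_bar +/- z_{alpha/2} * sigma / sqrt(n)."""
    n = len(sample)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    half = z * sigma / n ** 0.5
    x_bar = fmean(sample)
    return x_bar - half, x_bar + half

def coverage(mu, sigma, n, reps, conf=0.95, seed=0):
    """Fraction of `reps` simulated intervals that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lo, hi = z_interval(sample, sigma, conf)
        hits += lo <= mu <= hi
    return hits / reps

# Assumed illustrative settings, not taken from the slides.
rate = coverage(mu=50, sigma=9, n=10, reps=2000)
print(f"share of intervals containing mu: {rate:.3f}")
```

The printed share should be close to, but generally not exactly, 0.95: each individual interval either covers $\mu$ or does not, and only the long-run proportion is fixed by the construction.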

[Figure: Confidence intervals. Axis ranges: 40 to 60 and 0 to 50.]

Once the sample is observed and the sample average is computed to be $\bar x$, we call the interval

$$\left[\bar x \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$$

a $100(1 - \alpha)\%$ confidence interval for the unknown mean $\mu$. We are $100(1 - \alpha)\%$ confident that $[\bar x \pm z_{\alpha/2}(\sigma/\sqrt{n})]$ contains $\mu$.

Here $n$ is the sample size and $1 - \alpha$ is the confidence coefficient.

Example

Assume the population $X \sim N(\mu, 81)$. We draw a random sample of size $n = 10$ and observe

60.50 66.18 48.10 41.21 53.66 36.49 54.80 56.04 43.48 42.41

Find a 95% confidence interval for $\mu$.

Here $\bar x = 50.287$ and $\sigma = 9$, so $[\bar x \pm 1.96\,\sigma/\sqrt{n}] = [44.71, 55.87]$ is a 95% confidence interval for $\mu$.

Once a particular sample is drawn and $\bar x$ is computed, the interval either does or does not contain the mean $\mu$. We cannot say there is a 95% chance that $\mu$ falls between 44.71 and 55.87; we can only say that we are 95% confident that the interval $[44.71, 55.87]$ contains the population mean.

(The interval provides information about the uncertainty of the estimate.)
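The arithmetic of this example can be reproduced with a short script. This is a sketch with the Python standard library; the variable names are illustrative, and the critical value comes from `NormalDist().inv_cdf(0.975)` instead of the rounded table value 1.96.

```python
from statistics import NormalDist, fmean

data = [60.50, 66.18, 48.10, 41.21, 53.66, 36.49, 54.80, 56.04, 43.48, 42.41]
sigma = 81 ** 0.5  # the population variance 81 is assumed known

z = NormalDist().inv_cdf(0.975)           # z_{0.025}, about 1.96
x_bar = fmean(data)                       # sample average
half_width = z * sigma / len(data) ** 0.5
lower, upper = x_bar - half_width, x_bar + half_width
print(f"x_bar = {x_bar:.3f}, CI = [{lower:.2f}, {upper:.2f}]")
```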


## CI for $\mu$ when $\sigma^2$ is also unknown

Recall the $t$-distribution. By the definition of a $T$ random variable: if $Z \sim N(0, 1)$ and $V \sim \chi^2(r)$, and $Z$ and $V$ are independent, then

$$T = \frac{Z}{\sqrt{V/r}}$$

has a $t$-distribution with $r$ degrees of freedom.

Recall: normal and $\chi^2$ distributions

Let $X_1, \dots, X_n$ be a random sample from a $N(\mu, \sigma^2)$ distribution, where $\mu$ and $\sigma^2$ are unknown parameters. Let $\bar X$ be the sample average and $S^2 = \sum_{i=1}^{n}(X_i - \bar X)^2/(n - 1)$ the sample variance. Define $W = (n - 1)S^2/\sigma^2$ (i.e., the sum of squares divided by $\sigma^2$); then $W$ has a chi-square distribution with $r = n - 1$ degrees of freedom. That is,

$$W = \frac{(n - 1)S^2}{\sigma^2} \sim \chi^2(n - 1).$$

$E(W) = 2(r/2) = r$ and $\mathrm{Var}(W) = 4(r/2) = 2r$. Thus a random variable $W \sim \chi^2(r)$ has mean $r$ and variance $2r$, and the mgf of $W$ is

$$M_W(t) = \left(\frac{1}{1 - 2t}\right)^{r/2}, \quad t < 1/2.$$

## CI for $\mu$ when $\sigma^2$ is also unknown

We have

$$T = \frac{\sqrt{n}(\bar X - \mu)/\sigma}{\sqrt{\dfrac{(n-1)S^2}{\sigma^2}\Big/(n - 1)}} = \frac{\bar X - \mu}{S/\sqrt{n}},$$

which has a $t$-distribution with $r = n - 1$ degrees of freedom (recall the properties of the $t$-distribution).

Random interval:

$$\left[\bar X - t_{\alpha/2}(n - 1)\frac{S}{\sqrt{n}},\ \bar X + t_{\alpha/2}(n - 1)\frac{S}{\sqrt{n}}\right]$$

Once a random sample is observed, we compute $\bar x$ and $s^2$, and

$$\left[\bar x \pm t_{\alpha/2}(n - 1)\frac{s}{\sqrt{n}}\right]$$

is a $100(1 - \alpha)\%$ confidence interval for $\mu$.
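As a sketch of how the $t$ interval is computed in practice, the snippet below reuses the ten observations from the earlier known-variance example while pretending, purely for illustration, that $\sigma^2$ is unknown. The Python standard library has no $t$ quantile function, so the critical value $t_{0.025}(9) = 2.262$ is assumed to be read from a $t$ table; `t_interval` is a hypothetical helper name.

```python
from statistics import fmean, stdev

def t_interval(sample, t_crit):
    """CI for the mean with sigma unknown: x_bar +/- t_crit * s / sqrt(n).

    t_crit is t_{alpha/2}(n - 1), supplied by the caller from a t table.
    """
    n = len(sample)
    x_bar = fmean(sample)
    s = stdev(sample)  # sample standard deviation (divisor n - 1)
    half = t_crit * s / n ** 0.5
    return x_bar - half, x_bar + half

# Data reused from the known-variance example; treating sigma as unknown
# here is an assumption made only to illustrate the formula.
data = [60.50, 66.18, 48.10, 41.21, 53.66, 36.49, 54.80, 56.04, 43.48, 42.41]
lower, upper = t_interval(data, t_crit=2.262)  # t_{0.025}(9) from a t table
print(f"95% t interval: [{lower:.2f}, {upper:.2f}]")
```

Passing the critical value in explicitly keeps the sketch dependency-free; with a statistics package available, the quantile would normally be computed instead of looked up.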


CIs for the difference of two means

Two independent normal distributions:

1. Both variances are known.
2. The variances are unknown and the sample sizes are large.
3. The variances are unknown:

   (a) assume a common (equal) unknown variance;

   (b) unequal variances: (i) sample sizes are large, (ii) sample sizes are small.

Paired (matched, dependent) data

## Both variances are known

Take two independent random samples of sizes $n$ and $m$ from the two normal distributions:

$$X_1, \dots, X_n \sim N(\mu_X, \sigma_X^2) \quad\text{and}\quad Y_1, \dots, Y_m \sim N(\mu_Y, \sigma_Y^2).$$

Then $\bar X \sim N(\mu_X, \sigma_X^2/n)$ and $\bar Y \sim N(\mu_Y, \sigma_Y^2/m)$.

Let $W = \bar X - \bar Y$; then

$$W \sim N\!\left(\mu_X - \mu_Y,\ \frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}\right).$$

## Both variances are known

Once the samples are drawn,

$$\bar x - \bar y \pm z_{\alpha/2}\sqrt{\frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}}$$

is a $100(1-\alpha)\%$ CI for $\mu_X - \mu_Y$.

## Sample sizes are large and variances are unknown

We replace the variances with the sample variances $s_X^2$ and $s_Y^2$, the observed values of the respective unbiased estimators of the variances.

That is,

$$\bar x - \bar y \pm z_{\alpha/2}\sqrt{\frac{s_X^2}{n} + \frac{s_Y^2}{m}}$$

is an approximate $100(1-\alpha)\%$ CI for $\mu_X - \mu_Y$.

## Sample sizes are small and variances are unknown

a. Assumed common variance

Assume equal variances: $\sigma_X^2 = \sigma_Y^2 = \sigma^2$.

Denote

$$S_p^2 = \frac{(n - 1)S_X^2 + (m - 1)S_Y^2}{n + m - 2},$$

which is an unbiased estimator of the common variance $\sigma^2$.

Estimate for the common variance

Since the random samples are from two independent normal distributions with a common variance, we have

$$\frac{(n - 1)S_X^2}{\sigma^2} \sim \chi^2(n - 1), \qquad \frac{(m - 1)S_Y^2}{\sigma^2} \sim \chi^2(m - 1),$$

and they are independent. Thus

$$U = \frac{(n - 1)S_X^2}{\sigma^2} + \frac{(m - 1)S_Y^2}{\sigma^2} \sim \chi^2(n + m - 2)$$

and $E(U) = n + m - 2$. Since $S_p^2 = \sigma^2 U/(n + m - 2)$, we have $E(S_p^2) = \sigma^2$.

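## (a) Common variance assumption

We have

$$Z = \frac{\bar X - \bar Y - (\mu_X - \mu_Y)}{\sqrt{\sigma^2\left(\frac{1}{n} + \frac{1}{m}\right)}} \sim N(0, 1),$$

but we do not know $\sigma^2$, so we use

$$T = \frac{Z}{\sqrt{U/r}} = \frac{[\bar X - \bar Y - (\mu_X - \mu_Y)]\Big/\sqrt{\sigma^2(1/n + 1/m)}}{\sqrt{\left[\dfrac{(n-1)S_X^2}{\sigma^2} + \dfrac{(m-1)S_Y^2}{\sigma^2}\right]\Big/(n + m - 2)}} = \frac{\bar X - \bar Y - (\mu_X - \mu_Y)}{\sqrt{\left[\dfrac{(n-1)S_X^2 + (m-1)S_Y^2}{n + m - 2}\right]\left(\dfrac{1}{n} + \dfrac{1}{m}\right)}},$$

which has a $t$-distribution with $r = n + m - 2$ degrees of freedom.

A $100(1 - \alpha)\%$ CI for $\mu_X - \mu_Y$ is

$$\bar x - \bar y \pm t_{\alpha/2}(n + m - 2)\sqrt{s_p^2\left(\frac{1}{n} + \frac{1}{m}\right)}.$$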
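The pooled-variance interval can be sketched as a small function. All summary statistics below are hypothetical values chosen for illustration, not data from the course; the critical value $t_{0.025}(20) = 2.086$ is assumed to be read from a $t$ table, and `pooled_interval` is a made-up helper name.

```python
def pooled_interval(x_bar, s2_x, n, y_bar, s2_y, m, t_crit):
    """Pooled-variance CI for mu_X - mu_Y under the common-variance assumption.

    t_crit is t_{alpha/2}(n + m - 2), supplied by the caller from a t table.
    Returns the pooled variance and the (lower, upper) interval.
    """
    s2_p = ((n - 1) * s2_x + (m - 1) * s2_y) / (n + m - 2)
    half = t_crit * (s2_p * (1 / n + 1 / m)) ** 0.5
    diff = x_bar - y_bar
    return s2_p, (diff - half, diff + half)

# Hypothetical summary statistics (illustrative only).
s2_p, (lower, upper) = pooled_interval(
    x_bar=52.0, s2_x=16.0, n=10,
    y_bar=48.5, s2_y=25.0, m=12,
    t_crit=2.086,  # t_{0.025}(20) from a t table
)
print(f"pooled variance = {s2_p:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```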

## (b) Unequal variances

$$W = \frac{\bar X - \bar Y - (\mu_X - \mu_Y)}{\sqrt{S_X^2/n + S_Y^2/m}}$$

1. If $n$ and $m$ are large enough and the underlying distributions are close to normal, use the normal distribution to construct a CI.

2. If $n$ and $m$ are small, use an approximating Student's $t$ distribution with $r$ degrees of freedom (Welch's $t$), where

$$\frac{1}{r} = \frac{c^2}{n - 1} + \frac{(1 - c)^2}{m - 1}, \qquad c = \frac{s_X^2/n}{s_X^2/n + s_Y^2/m},$$

or equivalently

$$r = \frac{(s_X^2/n + s_Y^2/m)^2}{\dfrac{1}{n-1}(s_X^2/n)^2 + \dfrac{1}{m-1}(s_Y^2/m)^2}.$$

If $r$ is not an integer, we use the greatest integer in $r$, i.e., the floor $\lfloor r \rfloor$, as the number of degrees of freedom of the approximating Student's $t$ distribution.

We have that

$$\bar x - \bar y \pm t_{\alpha/2}(r)\sqrt{\frac{s_X^2}{n} + \frac{s_Y^2}{m}}$$

is an approximate $100(1-\alpha)\%$ CI for $\mu_X - \mu_Y$.
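The two expressions for the Welch degrees of freedom $r$ are algebraically the same, which a short numerical check makes concrete. The sample variances and sizes below are hypothetical, and `welch_df` is an illustrative helper name.

```python
import math

def welch_df(s2_x, n, s2_y, m):
    """Welch degrees of freedom, computed two equivalent ways.

    Returns (r via c, r via the direct formula); both should agree.
    """
    a, b = s2_x / n, s2_y / m
    c = a / (a + b)
    r_from_c = 1 / (c ** 2 / (n - 1) + (1 - c) ** 2 / (m - 1))
    r_direct = (a + b) ** 2 / (a ** 2 / (n - 1) + b ** 2 / (m - 1))
    return r_from_c, r_direct

# Hypothetical summary statistics (illustrative only).
r_from_c, r_direct = welch_df(s2_x=16.0, n=10, s2_y=25.0, m=12)
df = math.floor(r_direct)  # use the floor when r is not an integer
print(f"r = {r_direct:.3f}, degrees of freedom used = {df}")
```

The value of $r$ always lies between $\min(n, m) - 1$ and $n + m - 2$, so the Welch interval is never narrower in degrees of freedom than the pooled one.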

## Paired (matched) samples

$X_i$ and $Y_i$ are measurements taken from the same subject, so $X_i$ and $Y_i$ are dependent random variables.

Let $(X_1, Y_1), \dots, (X_n, Y_n)$ be $n$ pairs of dependent measurements.

Let $D_i = X_i - Y_i$, $i = 1, \dots, n$. Suppose $D_1, \dots, D_n$ can be thought of as a random sample from $N(\mu_D, \sigma_D^2)$, where $\mu_D$ and $\sigma_D^2$ are the mean and variance of each difference.

To form a CI for $\mu_X - \mu_Y$, use

$$T = \frac{\bar D - \mu_D}{S_D/\sqrt{n}},$$

where $\bar D$ and $S_D$ are the sample mean and sample standard deviation of the $n$ differences. $T$ has a $t$-distribution with $n - 1$ degrees of freedom.

Thus the CI for $\mu_D = \mu_X - \mu_Y$ is

$$\bar d \pm t_{\alpha/2}(n - 1)\frac{s_D}{\sqrt{n}},$$

where $\bar d$ and $s_D$ are the observed mean and standard deviation of the differences. (This is the same as the CI for a single mean.)
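A paired interval reduces to a one-sample $t$ interval on the differences, as the following sketch shows. The before/after measurements are hypothetical, the critical value $t_{0.025}(7) = 2.365$ is assumed to come from a $t$ table, and `paired_interval` is an illustrative name.

```python
from statistics import fmean, stdev

def paired_interval(x, y, t_crit):
    """CI for mu_D = mu_X - mu_Y from paired data.

    t_crit is t_{alpha/2}(n - 1), supplied by the caller from a t table.
    """
    d = [xi - yi for xi, yi in zip(x, y)]  # within-pair differences
    n = len(d)
    d_bar = fmean(d)
    s_d = stdev(d)  # sample standard deviation of the differences
    half = t_crit * s_d / n ** 0.5
    return d_bar - half, d_bar + half

# Hypothetical paired measurements on 8 subjects (illustrative only).
before = [72, 68, 75, 80, 66, 71, 77, 69]
after = [70, 65, 74, 76, 66, 68, 75, 66]
lower, upper = paired_interval(before, after, t_crit=2.365)  # t_{0.025}(7)
print(f"95% CI for mu_D: [{lower:.2f}, {upper:.2f}]")
```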

## Confidence intervals for variances

Recall: chi-square distribution

Let $X_1, \dots, X_n$ be a random sample from a $N(\mu, \sigma^2)$ distribution, where $\mu$ and $\sigma^2$ are unknown parameters, and let $S^2 = \sum_{i=1}^{n}(X_i - \bar X)^2/(n - 1)$. Then

$$W = \frac{(n - 1)S^2}{\sigma^2} \sim \chi^2(n - 1).$$

[Figure: Chi-squared distribution. Densities `dchisq(x, df)` for df = 3, 5, 8, with x from 0 to 10.]

Example: chi-squared distribution

Let $S^2$ be the sample variance of a random sample of size 6 drawn from a $N(\mu, 12)$ distribution. Find

$$P(2.76 < S^2 < 22.2).$$

Let $W = (n-1)S^2/\sigma^2$; then $W \sim \chi^2(5)$. So

$$P(2.76 < S^2 < 22.2) = P\!\left(\tfrac{5}{12}(2.76) < \frac{(n-1)S^2}{\sigma^2} < \tfrac{5}{12}(22.2)\right) = P(1.15 < W < 9.25),$$

and

$$P(1.15 < W < 9.25) = P(W < 9.25) - P(W < 1.15) = 0.85,$$

computed in R as `pchisq(9.25, 5) - pchisq(1.15, 5)`.
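The same probability can be checked without R. The sketch below evaluates the chi-square CDF through the standard power series for the regularized lower incomplete gamma function, using only the Python standard library; `chi2_cdf` is a hypothetical helper written for this illustration, not a library function, and the series is only intended for moderate arguments like the ones here.

```python
import math

def chi2_cdf(x, df):
    """P(W <= x) for W ~ chi-square(df), via the series
    P(a, z) = z^a e^{-z} / Gamma(a + 1) * sum_k z^k / ((a+1)...(a+k)),
    with a = df / 2 and z = x / 2.
    """
    if x <= 0:
        return 0.0
    a, z = df / 2, x / 2
    term, total = 1.0, 1.0
    for k in range(1, 1000):
        term *= z / (a + k)
        total += term
        if term < 1e-15 * total:  # series has converged
            break
    return total * math.exp(a * math.log(z) - z - math.lgamma(a + 1))

n, sigma2 = 6, 12
lo = (n - 1) * 2.76 / sigma2   # 1.15
hi = (n - 1) * 22.2 / sigma2   # 9.25
prob = chi2_cdf(hi, n - 1) - chi2_cdf(lo, n - 1)
print(f"P(2.76 < S^2 < 22.2) is about {prob:.2f}")
```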


## CI for variance $\sigma^2$

Let $X_1, \dots, X_n$ be a random sample from a $N(\mu, \sigma^2)$ distribution. Find the $100(1 - \alpha)\%$ CI for $\sigma^2$.

Select constants $a$ and $b$ from $\chi^2(n - 1)$ such that

$$P\!\left(a \le \frac{(n - 1)S^2}{\sigma^2} \le b\right) = 1 - \alpha.$$

We select $a = \chi^2_{1-\alpha/2}(n - 1)$ and $b = \chi^2_{\alpha/2}(n - 1)$, where $\chi^2_{p}(n - 1)$ denotes the point with upper-tail probability $p$, and we have

$$1 - \alpha = P\!\left(\frac{a}{(n - 1)S^2} \le \frac{1}{\sigma^2} \le \frac{b}{(n - 1)S^2}\right) = P\!\left(\frac{(n - 1)S^2}{b} \le \sigma^2 \le \frac{(n - 1)S^2}{a}\right).$$

The probability that the random interval $[(n - 1)S^2/b,\ (n - 1)S^2/a]$ contains the unknown $\sigma^2$ is $1 - \alpha$. Once the data are observed, $[(n - 1)s^2/b,\ (n - 1)s^2/a]$ is a $100(1-\alpha)\%$ CI for $\sigma^2$.


## Example: CI for variance

$X_1, \dots, X_{13} \sim N(\mu, \sigma^2)$. We have $\bar x = 18.97$ and $\sum_{i=1}^{13}(x_i - \bar x)^2 = 128.41$. Find the 90% CI for $\sigma^2$.

From the chi-squared table we have $\chi^2_{0.95}(12) = 5.226$ and $\chi^2_{0.05}(12) = 21.03$ (the 5th and 95th percentiles, respectively, of a chi-squared distribution with 12 degrees of freedom).

A 90% CI for $\sigma^2$ is

$$\left[\frac{128.41}{21.03},\ \frac{128.41}{5.226}\right] = [6.11, 24.57].$$
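The computation in this example is a pair of divisions, sketched below with the table values quoted above. The function and parameter names are illustrative.

```python
def variance_interval(sum_sq, chi2_low, chi2_high):
    """CI for sigma^2: [sum_sq / chi2_high, sum_sq / chi2_low].

    sum_sq is (n - 1) * s^2; chi2_low and chi2_high are the lower and upper
    chi-square percentiles with n - 1 degrees of freedom, read from a table.
    """
    return sum_sq / chi2_high, sum_sq / chi2_low

# Values from the example: sum of squares 128.41, 12 degrees of freedom.
lower, upper = variance_interval(128.41, chi2_low=5.226, chi2_high=21.03)
print(f"90% CI for sigma^2: [{lower:.2f}, {upper:.2f}]")
```

Note that the larger percentile produces the lower endpoint, because the pivot has $\sigma^2$ in its denominator.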

## Example

Let $X_1, \dots, X_n$ be a random sample from an Exponential($\lambda$) distribution (mean $= 1/\lambda$).

1. Let $W = 2\lambda \sum_{i=1}^{n} X_i$. Show that $W \sim \chi^2(2n)$ (hint: use the moment generating function).

2. Find a 90% CI for $\lambda$.

## CI for the ratio of variances

Recall: F distribution

Let $W_1 \sim \chi^2(v_1)$ and $W_2 \sim \chi^2(v_2)$ be independent random variables. Then a random variable $F$ that can be expressed as

$$F = \frac{W_1/v_1}{W_2/v_2}$$

is said to follow an F distribution with degrees of freedom $v_1$ and $v_2$, denoted by $F(v_1, v_2)$ or $F_{v_1, v_2}$.

[Figure: F(d1, d2) distribution. Densities `df(x, d1, d2)` for (d1, d2) = (3, 9), (4, 8), (9, 3), with x from 0 to 10.]

## F-distribution

Reciprocal of an F

Let the r.v. $F \sim F(v_1, v_2)$ with pdf $g$, and let $Y = 1/F$. Then $Y$ has pdf

$$f(y) = g(F)\left|\frac{dF}{dy}\right| = \frac{v_1^{v_1/2}\, y^{1-(v_1/2)}\, v_2^{v_2/2}\, y^{(v_1+v_2)/2}}{B\!\left(\frac{v_1}{2}, \frac{v_2}{2}\right)(v_2 y + v_1)^{(v_1+v_2)/2}} \cdot \frac{1}{y^2} = \frac{v_1^{v_1/2}\, v_2^{v_2/2}\, y^{(v_2/2)-1}}{B\!\left(\frac{v_2}{2}, \frac{v_1}{2}\right)(v_1 + v_2 y)^{(v_1+v_2)/2}}, \quad y \in [0, \infty),$$

where $g$ is evaluated at $F = 1/y$.

That is, if $F \sim F(v_1, v_2)$ and $Y = 1/F$, then $Y \sim F(v_2, v_1)$.

## CI for $\sigma_X^2/\sigma_Y^2$ from two independent normal populations

Let $S_X^2$ and $S_Y^2$ be unbiased estimators of $\sigma_X^2$ and $\sigma_Y^2$ derived from samples of sizes $n$ and $m$, respectively, from two independent normal populations. Find a $100(1 - \alpha)\%$ CI for $\sigma_X^2/\sigma_Y^2$.

$$\frac{(n - 1)S_X^2}{\sigma_X^2} \sim \chi^2(n - 1), \qquad \frac{(m - 1)S_Y^2}{\sigma_Y^2} \sim \chi^2(m - 1)$$

The ratio

$$\frac{\dfrac{(m-1)S_Y^2}{\sigma_Y^2}\Big/(m - 1)}{\dfrac{(n-1)S_X^2}{\sigma_X^2}\Big/(n - 1)} = \frac{S_Y^2/\sigma_Y^2}{S_X^2/\sigma_X^2}$$

follows an F distribution with degrees of freedom $(m - 1)$ and $(n - 1)$, i.e.,

$$\frac{S_Y^2/\sigma_Y^2}{S_X^2/\sigma_X^2} \sim F(m - 1, n - 1).$$


Since

$$\frac{S_Y^2/\sigma_Y^2}{S_X^2/\sigma_X^2} \sim F(m - 1, n - 1),$$

we select $c = F_{1-\alpha/2}(m - 1, n - 1)$ and $d = F_{\alpha/2}(m - 1, n - 1)$, so that

$$P\!\left(c \le \frac{S_Y^2/\sigma_Y^2}{S_X^2/\sigma_X^2} \le d\right) = 1 - \alpha.$$

That is,

$$P\!\left(c\,\frac{S_X^2}{S_Y^2} \le \frac{\sigma_X^2}{\sigma_Y^2} \le d\,\frac{S_X^2}{S_Y^2}\right) = 1 - \alpha.$$

From the table we usually obtain $c = F_{1-\alpha/2}(m - 1, n - 1) = 1/F_{\alpha/2}(n - 1, m - 1)$ and $d = F_{\alpha/2}(m - 1, n - 1)$. Let $s_X^2$ and $s_Y^2$ be the realizations of $S_X^2$ and $S_Y^2$; then a $100(1 - \alpha)\%$ CI for $\sigma_X^2/\sigma_Y^2$ is

$$\left[\frac{1}{F_{\alpha/2}(n - 1, m - 1)}\,\frac{s_X^2}{s_Y^2},\ F_{\alpha/2}(m - 1, n - 1)\,\frac{s_X^2}{s_Y^2}\right].$$


## Example

From two independent normal populations with unknown means and variances, we have $12\,s_X^2 = 128.4$ from a random sample of size 13 and $8\,s_Y^2 = 36.72$ from a random sample of size 9. Find a 98% CI for $\sigma_X^2/\sigma_Y^2$.

$$\frac{S_Y^2/\sigma_Y^2}{S_X^2/\sigma_X^2} \sim F(8, 12)$$

From the F table we have $F_{0.01}(12, 8) = 5.67$ and $F_{0.01}(8, 12) = 4.50$, so a 98% CI for $\sigma_X^2/\sigma_Y^2$ is

$$\left[\frac{1}{5.67}\cdot\frac{128.4/12}{36.72/8},\ 4.50\cdot\frac{128.4/12}{36.72/8}\right] \approx [0.41, 10.49].$$
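The endpoints of this interval can be evaluated with a few lines of Python. The sketch uses the F-table values quoted in the example; the function and parameter names are illustrative.

```python
def variance_ratio_interval(s2_x, s2_y, f_upper_nm, f_upper_mn):
    """CI for sigma_X^2 / sigma_Y^2.

    f_upper_nm is F_{alpha/2}(n - 1, m - 1) and f_upper_mn is
    F_{alpha/2}(m - 1, n - 1), both read from an F table.
    """
    ratio = s2_x / s2_y
    return ratio / f_upper_nm, ratio * f_upper_mn

s2_x = 128.4 / 12   # sample variance of X (n = 13)
s2_y = 36.72 / 8    # sample variance of Y (m = 9)
lower, upper = variance_ratio_interval(
    s2_x, s2_y,
    f_upper_nm=5.67,  # F_{0.01}(12, 8)
    f_upper_mn=4.50,  # F_{0.01}(8, 12)
)
print(f"98% CI for the variance ratio: [{lower:.3f}, {upper:.2f}]")
```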


## Confidence intervals for proportions ($p$)

Estimating proportions: construct a CI for $p$ in the $\mathrm{Bin}(n, p)$ distribution.

Assume that sampling is from a binomial population, so the problem is to estimate $p$ in the $\mathrm{Bin}(n, p)$ distribution, where $p$ is unknown.

Recall: if $Y \sim \mathrm{Bin}(n, p)$, an unbiased estimator of $p$ is $\hat p = Y/n$, since

$$E(\hat p) = E\!\left(\frac{Y}{n}\right) = p$$

and

$$\mathrm{Var}(\hat p) = \frac{1}{n^2}\mathrm{Var}(Y) = \frac{1}{n^2}\,np(1 - p) = \frac{p(1 - p)}{n}.$$

## Confidence intervals for proportions ($p$)

For large $n$, the distribution of

$$\frac{Y - np}{\sqrt{np(1 - p)}} = \frac{(Y/n) - p}{\sqrt{p(1 - p)/n}}$$

can be approximated by the standard normal $N(0, 1)$.

Thus an approximate $100(1 - \alpha)\%$ CI for $p$ is obtained by considering

$$P\!\left(-z_{\alpha/2} < \frac{(Y/n) - p}{\sqrt{p(1 - p)/n}} < z_{\alpha/2}\right) \approx 1 - \alpha.$$

Replacing the variance of $\hat p = Y/n$ by its estimate $\hat p(1 - \hat p)/n$ gives a simple expression for the CI for $p$:

$$\left[\hat p \pm z_{\alpha/2}\sqrt{\frac{\hat p(1 - \hat p)}{n}}\right] = \left[\frac{Y}{n} \pm z_{\alpha/2}\sqrt{\frac{(Y/n)(1 - Y/n)}{n}}\right].$$


## Example

Assume $Y \sim \mathrm{Bin}(n, p)$ with $n = 36$ and $y/n = 0.222$. Find an approximate 90% CI for $p$.

$$\left[0.222 \pm 1.645\sqrt{\frac{(0.222)(1 - 0.222)}{36}}\right]$$

### Example

A poll with $n = 100$ has $y = 51$ saying yes. Find a 95% CI for $p$: $[0.41, 0.61]$.

A poll with $n = 351$ has $y = 185$ saying yes. Find a 95% CI for $p$.
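The large-sample interval for a proportion can be sketched as follows, applied to the two polls above. This uses only the Python standard library; `proportion_interval` is an illustrative name, and the critical value is computed with `NormalDist` instead of being rounded to 1.96.

```python
from statistics import NormalDist

def proportion_interval(y, n, conf=0.95):
    """Approximate CI for p: p_hat +/- z_{alpha/2} * sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = y / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    half = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half, p_hat + half

lo100, hi100 = proportion_interval(51, 100)    # poll with n = 100
lo351, hi351 = proportion_interval(185, 351)   # poll with n = 351
print(f"n = 100: [{lo100:.2f}, {hi100:.2f}]")
print(f"n = 351: [{lo351:.3f}, {hi351:.3f}]")
```

The approximation relies on the normal limit of the binomial, so it is only a reasonable sketch when $n$ is large and $\hat p$ is not too close to 0 or 1.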


## CI for difference of two proportions


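## Interpreting opinion polls

"... approval of the administration stands at 44%. The survey used the residential telephone directory of the Taiwan area as its sampling frame and drew numbers by randomizing the last four digits. A total of 1,056 residents of the Taiwan area aged 20 and over were successfully interviewed. At the 95% confidence level, the sampling error is plus or minus 3.0 percentage points."

1. What is the population for this poll? What is the sample size?

2. About how many respondents were satisfied with the administration?

3. Compute the confidence interval for this survey.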

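## Interpreting opinion polls

1. In this survey, the population is residents of the Taiwan area aged 20 and over, and the sample is the 1,056 people who were successfully interviewed. "Approval of 44%" means that about 44% of the 1,056 respondents said they were satisfied (i.e., about 465 people answered "satisfied").

2. The interval $[0.44 - 0.03, 0.44 + 0.03] = [0.41, 0.47]$ is called the confidence interval (confidence interval: [estimate − maximum error, estimate + maximum error]).

   Let $p$ be the true proportion of the population that is satisfied. This survey estimates that $p$ may lie between 0.41 and 0.47.

3. The 95% confidence level: $p$ is unknown and every sample carries sampling error, so there is no guarantee that the true proportion $p$ lies in the interval we estimate. If we sampled many times, each sample would yield a confidence interval, and about 95% of all those intervals would cover the true value of $p$.

4. We are therefore 95% confident that the true approval rate lies within the interval we obtained.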

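## Sample Size for proportion

A newspaper polled satisfaction with the Taipei City mayor's performance and reported: "The approval rating is 63%. This poll successfully interviewed $n$ adult Taipei City residents aged 20 and over; at the 95% confidence level, the sampling error is plus or minus 3.2 percentage points." Find $n$.

$z_{0.025} = 1.96$ and

$$1.96\sqrt{\frac{(0.63)(1 - 0.63)}{n}} = 0.032,$$

so $n = (0.63)(0.37)(1.96)^2/0.032^2 \approx 874.5$, i.e., $n = 875$.

The maximum error of the estimate at a 98% confidence coefficient is 0.01 for $\hat p = 0.08$. Find $n$.

$z_{0.01} = 2.326$ and

$$2.326\sqrt{\frac{(0.08)(0.92)}{n}} = 0.01,$$

so $n = (0.08)(0.92)(2.326)^2/0.01^2 \approx 3982$.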
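Solving the margin-of-error equation for $n$ and rounding up gives a one-line sample-size rule. The sketch below takes the critical value as an argument so that the rounded table values used in the examples (1.96 and 2.326) can be passed in directly; `sample_size_proportion` is an illustrative name.

```python
import math

def sample_size_proportion(p, max_error, z):
    """Smallest n with z * sqrt(p (1 - p) / n) <= max_error.

    z is the critical value z_{alpha/2}, supplied from a normal table.
    """
    return math.ceil(p * (1 - p) * z ** 2 / max_error ** 2)

n_poll = sample_size_proportion(0.63, 0.032, z=1.96)   # 95%, error 0.032
n_rare = sample_size_proportion(0.08, 0.01, z=2.326)   # 98%, error 0.01
print(n_poll, n_rare)
```

Rounding up instead of to the nearest integer guarantees that the stated maximum error is not exceeded.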


## Unknown $\hat p$

When estimating $p$ without a prior estimate $p^*$, we use the bound $p^*(1 - p^*) \le 1/4$. Hence we need

$$n = \frac{z_{0.01}^2}{4 \times 0.01^2} \approx 13530$$

for the maximum error of the estimate to be 0.01 at a 98% confidence coefficient (taking $z_{0.01} \approx 2.3263$).

At a 95% confidence coefficient, a maximum error $\varepsilon = 0.01$ gives $n = 9604$, and $\varepsilon = 0.03$ gives $n \approx 1067$.

## Sample Size and CIs for given $\hat p$

The 95% CIs for the proportion of people supporting A, when 51% support A in polls of 100, 400, or 10,000 people:

$[0.41, 0.61]$, $[0.46, 0.56]$, $[0.50, 0.52]$,

that is, $[0.51 \pm 0.10]$, $[0.51 \pm 0.05]$, $[0.51 \pm 0.01]$.
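How the interval narrows with the sample size can be tabulated directly. The following sketch recomputes the three intervals above with the Python standard library; `half_width` is an illustrative name.

```python
from statistics import NormalDist

def half_width(p_hat, n, conf=0.95):
    """Half-width z_{alpha/2} * sqrt(p_hat (1 - p_hat) / n) of the CI for p."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return z * (p_hat * (1 - p_hat) / n) ** 0.5

p_hat = 0.51
intervals = {}
for n in (100, 400, 10_000):
    h = half_width(p_hat, n)
    intervals[n] = (p_hat - h, p_hat + h)
    print(f"n = {n:>6}: [{p_hat - h:.2f}, {p_hat + h:.2f}]  (+/- {h:.3f})")
```

Because the half-width is proportional to $1/\sqrt{n}$, quadrupling the sample from 100 to 400 only halves the margin of error.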

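## Sample Size for mean

The $100(1 - \alpha)\%$ CI for $\mu$ is $[\bar x \pm z_{\alpha/2}(\sigma/\sqrt{n})]$. Denote this interval by $\bar x \pm \varepsilon$; we sometimes call $\varepsilon = z_{\alpha/2}(\sigma/\sqrt{n})$ the maximum error of the estimate. Solving for $n$,

$$n = \frac{(z_{\alpha/2})^2\sigma^2}{\varepsilon^2},$$

where it is assumed that $\sigma^2$ is known.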

Example

We want the 95% CI for $\mu$ to be $\bar x \pm 1$ for a normal population with standard deviation $\sigma = 15$. Find the sample size.

$$1.96\,\frac{15}{\sqrt{n}} = 1$$

gives $n \approx 864.36$, so we take $n = 865$.

If the 80% CI for $\mu$ is to be $\bar x \pm 2$, then

$$1.282\,\frac{15}{\sqrt{n}} = 2,$$

where $z_{0.1} = 1.282$, and thus $n = 93$.
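Both sample sizes follow from the same formula, sketched here with the table values used in the example. The critical value is passed in explicitly, and `sample_size_mean` is an illustrative name.

```python
import math

def sample_size_mean(sigma, max_error, z):
    """Smallest n with z * sigma / sqrt(n) <= max_error (sigma assumed known)."""
    return math.ceil((z * sigma / max_error) ** 2)

n_95 = sample_size_mean(sigma=15, max_error=1, z=1.96)    # 95% CI, x_bar +/- 1
n_80 = sample_size_mean(sigma=15, max_error=2, z=1.282)   # 80% CI, x_bar +/- 2
print(n_95, n_80)
```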


## Pivotal quantity I

A very useful method for finding confidence intervals uses a pivotal quantity.

What is a pivotal quantity? A pivotal quantity is a function of the data and the unknown parameter, say $g(X, \theta)$, whose distribution does not depend on the unknown parameter.

### Example

Let $X_1, \dots, X_n$ be a random sample from a $N(\mu, \sigma^2)$ distribution.

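## Pivotal quantity II

1. When $\sigma$ is known, $Z = \dfrac{\bar X - \mu}{\sigma/\sqrt{n}}$ is a pivotal quantity: $Z \sim N(0, 1)$.

2. When $\sigma$ is unknown, $T = \dfrac{\bar X - \mu}{S/\sqrt{n}}$ is a pivotal quantity, where $S$ is the sample standard deviation: $T \sim t(n - 1)$.

3. $W = (n - 1)S^2/\sigma^2$ is a pivotal quantity: $W \sim \chi^2(n - 1)$.

For $Y \sim \mathrm{Bin}(n, p)$ with large $n$, $\dfrac{(Y/n) - p}{\sqrt{p(1 - p)/n}}$ is approximately $N(0, 1)$.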