Academic Year 101, Statistics (I). Instructor: 蔡碧紋
Statistical Inference: Confidence Intervals
Population: the form of the distribution is assumed known, but the parameter(s) that determine the distribution are unknown.
Sample: draw a random sample from the population (i.i.d.).
Point estimation (MME, MLE)
Confidence intervals:
- Confidence intervals for a population mean
- Confidence intervals for the difference between two means
- Confidence intervals for variances
- Confidence intervals for proportions
- Sample size
Confidence intervals for the mean of a single population
CI for $\mu$
1. A random sample (i.i.d.) from a normally distributed population.
(i) when the variance $\sigma^2$ is known.
(ii) when the variance $\sigma^2$ is unknown.
2. The sample is NOT from a normal distribution.
(a) When n is large (by the CLT, $\frac{\bar X - \mu}{\sigma/\sqrt{n}} \to N(0, 1)$ in distribution).
(b) When n is less than 30 and the underlying distribution is far from normal: nonparametric methods.
CIs for $\mu$ when the variance $\sigma^2$ is known
Assume the population $X \sim N(\mu, \sigma^2)$, where $\sigma^2$ is known. We draw a random sample of size n and let $\bar X$ be the sample average. Since $(\bar X - \mu)/(\sigma/\sqrt{n}) \sim N(0, 1)$, the probability that the random interval
$$\left[\bar X - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar X + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$$
contains the unknown mean $\mu$ is $1 - \alpha$, i.e.,
$$P(L \le \mu \le U) = 1 - \alpha.$$
$[L, U]$ is a random interval that contains $\mu$ with probability $1 - \alpha$.
If we replicate the sampling process 100 times with $1 - \alpha = 0.95$, we obtain 100 different confidence intervals, and about 95 of them should contain the population mean $\mu$.
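The repeated-sampling interpretation can be checked with a small simulation. The sketch below is only an illustration: the values mu = 50, sigma = 9, n = 10 and the function name `coverage_rate` are assumptions chosen for the demonstration, not part of the lecture.

```python
import random
from statistics import NormalDist, fmean


def coverage_rate(mu, sigma, n, alpha, replications, seed=0):
    """Fraction of z-intervals (sigma known) that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 point z_{alpha/2}
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(replications):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        # The interval is random; mu is fixed.
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / replications


# Hypothetical population parameters, chosen only for illustration.
rate = coverage_rate(mu=50, sigma=9, n=10, alpha=0.05, replications=10_000)
print(f"empirical coverage: {rate:.3f}")
```

With many replications the empirical coverage should settle near 0.95, which is the sense in which the interval has confidence coefficient $1 - \alpha$.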
[Figure: Confidence intervals. Intervals from repeated samples; horizontal axis from 40 to 60, vertical axis from 0 to 50.]
Once the sample is observed and the sample average is computed to be $\bar x$, we call the interval
$$\left[\bar x \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$$
a $100(1 - \alpha)\%$ confidence interval for the unknown mean $\mu$. We are $100(1 - \alpha)\%$ confident that $[\bar x \pm z_{\alpha/2}\,\sigma/\sqrt{n}]$ contains $\mu$.
Sample size: n
Confidence coefficient: $1 - \alpha$
Example
Assume the population $X \sim N(\mu, 81)$. We draw a random sample of size n = 10 and observe
60.50 66.18 48.10 41.21 53.66 36.49 54.80 56.04 43.48 42.41
Find a 95% confidence interval for $\mu$.
Here $\bar x = 50.287$ and $\sigma = 9$, so $[\bar x \pm 1.96\,\sigma/\sqrt{n}] = [44.71, 55.87]$ is a 95% confidence interval for $\mu$.
For a particular sample, once $\bar x$ has been computed, the interval either does or does not contain the mean $\mu$. We cannot say there is a 95% chance that $\mu$ falls between 44.71 and 55.87. We can only say that we are 95% confident that the population mean lies in [44.71, 55.87].
(The interval provides information about the uncertainty of the estimate.)
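The arithmetic of this example can be reproduced in a few lines. This is a minimal sketch using Python's standard library; the variable names are illustrative.

```python
from statistics import NormalDist, fmean

# Data and known variance (sigma^2 = 81) from the example.
data = [60.50, 66.18, 48.10, 41.21, 53.66, 36.49, 54.80, 56.04, 43.48, 42.41]
sigma = 81 ** 0.5
alpha = 0.05

z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
x_bar = fmean(data)
half_width = z * sigma / len(data) ** 0.5

lower, upper = x_bar - half_width, x_bar + half_width
print(f"x_bar = {x_bar:.3f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```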
CI for $\mu$ when $\sigma^2$ is also unknown
Recall: t-distribution. By the definition of a T random variable: if $Z \sim N(0, 1)$ and $V \sim \chi^2(r)$, with Z and V independent, then
$$T = \frac{Z}{\sqrt{V/r}}$$
has a t-distribution with r degrees of freedom.
Recall: Normal and $\chi^2$ distributions
Given that $X_1, \dots, X_n$ is a random sample from a $N(\mu, \sigma^2)$ distribution, where $\mu$ and $\sigma^2$ are unknown parameters, let $\bar X$ be the sample average and $S^2 = \sum_{i=1}^{n}(X_i - \bar X)^2/(n - 1)$ the sample variance. Define $W = (n - 1)S^2/\sigma^2$ (i.e., the sum of squares divided by $\sigma^2$). Then W has a chi-square distribution with $r = n - 1$ degrees of freedom. That is,
$$W = \frac{(n - 1)S^2}{\sigma^2} \sim \chi^2(n - 1).$$
$E(W) = 2(r/2) = r$ and $\mathrm{Var}(W) = 4(r/2) = 2r$. Thus a random variable $W \sim \chi^2(r)$ has mean r and variance 2r, and the mgf of W is
$$M_W(t) = \left(\frac{1}{1 - 2t}\right)^{r/2}, \quad t < 1/2.$$
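CI for $\mu$ when $\sigma^2$ is also unknown
We have
$$T = \frac{\sqrt{n}(\bar X - \mu)/\sigma}{\sqrt{\dfrac{(n-1)S^2}{\sigma^2}\Big/(n - 1)}} = \frac{\bar X - \mu}{S/\sqrt{n}},$$
which has a t-distribution with $r = n - 1$ degrees of freedom (recall the properties of the t-distribution).
Random interval:
$$\left[\bar X - t_{\alpha/2}(n - 1)\frac{S}{\sqrt{n}},\ \bar X + t_{\alpha/2}(n - 1)\frac{S}{\sqrt{n}}\right]$$
Once a random sample is observed, we compute $\bar x$ and $s^2$, and $[\bar x \pm t_{\alpha/2}(n - 1)\, s/\sqrt{n}]$ is a $100(1 - \alpha)\%$ confidence interval for $\mu$.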
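As a sketch of the t-interval, the ten observations from the earlier known-variance example are reused below, this time pretending $\sigma^2$ is unknown (that reuse is an assumption made only for illustration). The critical value $t_{0.025}(9) = 2.262$ is hard-coded from a standard t table because the Python standard library has no t quantile function.

```python
from statistics import fmean, stdev

data = [60.50, 66.18, 48.10, 41.21, 53.66, 36.49, 54.80, 56.04, 43.48, 42.41]
n = len(data)

t_crit = 2.262  # t_{0.025}(n - 1) with n - 1 = 9, taken from a t table

x_bar = fmean(data)
s = stdev(data)  # sample standard deviation, divisor n - 1
half_width = t_crit * s / n ** 0.5

lower, upper = x_bar - half_width, x_bar + half_width
print(f"s^2 = {s**2:.2f}, 95% t-interval = [{lower:.2f}, {upper:.2f}]")
```

Because s replaces $\sigma$ and $t_{0.025}(9) > z_{0.025}$, this interval is typically wider than the z-interval built from the same data.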
CIs for the difference of two means
Two independent normal distributions
1. When both variances are known.
2. If the variances are unknown and the sample sizes are large.
3. If the variances are unknown:
(a) assume a common (equal) unknown variance
(b) unequal variances
(i) sample sizes are large
(ii) sample sizes are small
Paired data (matched data, dependent data)
Both variances are known
Two independent random samples of sizes n and m are drawn from the two normal distributions:
$X_1, \dots, X_n \sim N(\mu_X, \sigma_X^2)$ and $Y_1, \dots, Y_m \sim N(\mu_Y, \sigma_Y^2)$.
Then we have $\bar X \sim N(\mu_X, \sigma_X^2/n)$ and $\bar Y \sim N(\mu_Y, \sigma_Y^2/m)$.
Let $W = \bar X - \bar Y$. Then
$$W \sim N\left(\mu_X - \mu_Y,\ \frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}\right).$$
Once the samples are drawn,
$$\bar x - \bar y \pm z_{\alpha/2}\sqrt{\frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}}$$
is a $100(1 - \alpha)\%$ CI for $\mu_X - \mu_Y$.
Sample sizes are large and variances are unknown
We replace the variances with the sample variances $s_X^2$ and $s_Y^2$, the observed values of the respective unbiased estimators of the variances.
That is,
$$\bar x - \bar y \pm z_{\alpha/2}\sqrt{\frac{s_X^2}{n} + \frac{s_Y^2}{m}}$$
is an approximate $100(1 - \alpha)\%$ CI for $\mu_X - \mu_Y$.
Sample sizes are small and variances are unknown
(a) Assumed common variance
Estimate of the common variance: assume equal variances, $\sigma_X^2 = \sigma_Y^2 = \sigma^2$.
Denote
$$S_p^2 = \frac{(n - 1)S_X^2 + (m - 1)S_Y^2}{n + m - 2},$$
which is an unbiased estimator of the common variance $\sigma^2$.
Estimate of the common variance
Since the random samples are from two independent normal distributions with a common variance, we have
$$\frac{(n - 1)S_X^2}{\sigma^2} \sim \chi^2(n - 1), \qquad \frac{(m - 1)S_Y^2}{\sigma^2} \sim \chi^2(m - 1),$$
and they are independent. Thus
$$U = \frac{(n - 1)S_X^2}{\sigma^2} + \frac{(m - 1)S_Y^2}{\sigma^2} \sim \chi^2(n + m - 2)$$
and $E(U) = n + m - 2$. Since $S_p^2 = \sigma^2 U/(n + m - 2)$, we have $E(S_p^2) = \sigma^2$.
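(a) Common variance assumption
We have
$$Z = \frac{\bar X - \bar Y - (\mu_X - \mu_Y)}{\sqrt{\sigma^2\left(\frac{1}{n} + \frac{1}{m}\right)}} \sim N(0, 1),$$
but we do not know $\sigma^2$, so we use
$$T = \frac{Z}{\sqrt{U/r}} = \frac{[\bar X - \bar Y - (\mu_X - \mu_Y)]\Big/\sqrt{\sigma^2(1/n + 1/m)}}{\sqrt{\left[\dfrac{(n-1)S_X^2}{\sigma^2} + \dfrac{(m-1)S_Y^2}{\sigma^2}\right]\Big/(n + m - 2)}} = \frac{\bar X - \bar Y - (\mu_X - \mu_Y)}{\sqrt{\left[\dfrac{(n-1)S_X^2 + (m-1)S_Y^2}{n + m - 2}\right]\left(\dfrac{1}{n} + \dfrac{1}{m}\right)}},$$
which has a t-distribution with $r = n + m - 2$ degrees of freedom.
A $100(1 - \alpha)\%$ CI for $\mu_X - \mu_Y$ is
$$\bar x - \bar y \pm t_{\alpha/2}(n + m - 2)\sqrt{s_p^2\left(\frac{1}{n} + \frac{1}{m}\right)}.$$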
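The pooled-variance interval can be sketched as a small function. The two samples below are made-up numbers used only to exercise the formula, and the function name `pooled_t_interval` is hypothetical. The critical value $t_{0.025}(5) = 2.571$ is hard-coded from a t table.

```python
from statistics import fmean, variance


def pooled_t_interval(x, y, t_crit):
    """CI for mu_X - mu_Y assuming a common unknown variance.

    t_crit must be t_{alpha/2}(n + m - 2), looked up by the caller.
    """
    n, m = len(x), len(y)
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2)
    diff = fmean(x) - fmean(y)
    half_width = t_crit * (sp2 * (1 / n + 1 / m)) ** 0.5
    return diff - half_width, diff + half_width


# Hypothetical data (not from the lecture): n = 4, m = 3, so n + m - 2 = 5.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0]
lower, upper = pooled_t_interval(x, y, t_crit=2.571)
print(f"95% CI for mu_X - mu_Y: [{lower:.3f}, {upper:.3f}]")
```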
(b) Unequal variances
$$W = \frac{\bar X - \bar Y - (\mu_X - \mu_Y)}{\sqrt{S_X^2/n + S_Y^2/m}}$$
1. If n and m are large enough and the underlying distributions are close to normal, use the normal distribution to construct a CI.
2. If n and m are small, W is approximated by a Student's t distribution with r degrees of freedom (Welch t), where
$$\frac{1}{r} = \frac{c^2}{n - 1} + \frac{(1 - c)^2}{m - 1}, \qquad c = \frac{s_X^2/n}{s_X^2/n + s_Y^2/m},$$
or equivalently
$$r = \frac{(s_X^2/n + s_Y^2/m)^2}{\frac{1}{n-1}(s_X^2/n)^2 + \frac{1}{m-1}(s_Y^2/m)^2}.$$
If r is not an integer, we use the greatest integer not exceeding r, i.e., the floor $\lfloor r \rfloor$, as the number of degrees of freedom of the approximating Student's t distribution.
We have
$$\bar x - \bar y \pm t_{\alpha/2}(r)\sqrt{\frac{s_X^2}{n} + \frac{s_Y^2}{m}}$$
is an approximate $100(1 - \alpha)\%$ CI for $\mu_X - \mu_Y$.
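The Welch degrees of freedom are easy to get wrong by hand, so a short sketch may help. It computes r both directly and through c, so the two forms of the formula can be compared. The data and the function name `welch_df` are illustrative assumptions.

```python
import math
from statistics import variance


def welch_df(x, y):
    """Return (r, floor(r)) for the Welch approximation."""
    n, m = len(x), len(y)
    vx, vy = variance(x) / n, variance(y) / m  # s_X^2/n and s_Y^2/m
    r = (vx + vy) ** 2 / (vx ** 2 / (n - 1) + vy ** 2 / (m - 1))
    return r, math.floor(r)


def welch_df_via_c(x, y):
    """Same quantity through 1/r = c^2/(n-1) + (1-c)^2/(m-1)."""
    n, m = len(x), len(y)
    vx, vy = variance(x) / n, variance(y) / m
    c = vx / (vx + vy)
    return 1 / (c ** 2 / (n - 1) + (1 - c) ** 2 / (m - 1))


# Hypothetical samples, used only to exercise the formulas.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0]
r, r_floor = welch_df(x, y)
print(f"r = {r:.4f}, degrees of freedom used = {r_floor}")
```

The value of r always lies between min(n − 1, m − 1) and n + m − 2, which gives a quick sanity check on a hand calculation.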
Paired (matched) samples
$X_i$ and $Y_i$ are measurements taken from the same subject, so $X_i$ and $Y_i$ are dependent random variables.
Let $(X_1, Y_1), \dots, (X_n, Y_n)$ be n pairs of dependent measurements.
Let $D_i = X_i - Y_i$, $i = 1, \dots, n$. Suppose $D_1, \dots, D_n$ can be thought of as a random sample from $N(\mu_D, \sigma_D^2)$, where $\mu_D$ and $\sigma_D$ are the mean and standard deviation of each difference.
To form a CI for $\mu_X - \mu_Y$, use
$$T = \frac{\bar D - \mu_D}{S_D/\sqrt{n}},$$
where $\bar D$ and $S_D$ are the sample mean and sample standard deviation of the n differences. T has a t-distribution with $n - 1$ degrees of freedom.
Thus the CI for $\mu_D = \mu_X - \mu_Y$ is
$$\bar d \pm t_{\alpha/2}(n - 1)\frac{s_D}{\sqrt{n}},$$
where $\bar d$ and $s_D$ are the observed mean and standard deviation of the sample differences. (This is the same as the CI for a single mean.)
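A minimal sketch of the paired interval: reduce the pairs to differences, then apply the single-mean t-interval. The before/after measurements are invented for illustration, the function name is hypothetical, and $t_{0.025}(4) = 2.776$ is hard-coded from a t table.

```python
from statistics import fmean, stdev


def paired_t_interval(x, y, t_crit):
    """CI for mu_D = mu_X - mu_Y from paired observations.

    t_crit must be t_{alpha/2}(n - 1) for n pairs.
    """
    d = [xi - yi for xi, yi in zip(x, y)]  # within-subject differences
    n = len(d)
    d_bar, s_d = fmean(d), stdev(d)
    half_width = t_crit * s_d / n ** 0.5
    return d_bar - half_width, d_bar + half_width


# Hypothetical paired measurements on five subjects.
before = [12.0, 15.0, 11.0, 14.0, 13.0]
after = [10.0, 14.0, 10.0, 11.0, 12.0]
lower, upper = paired_t_interval(before, after, t_crit=2.776)
print(f"95% CI for mu_D: [{lower:.4f}, {upper:.4f}]")
```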
Confidence intervals for variances
Recall: chi-square distribution
Given that $X_1, \dots, X_n$ is a random sample from a $N(\mu, \sigma^2)$ distribution, where $\mu$ and $\sigma^2$ are unknown parameters, let $S^2 = \sum_{i=1}^{n}(X_i - \bar X)^2/(n - 1)$. Then
$$W = \frac{(n - 1)S^2}{\sigma^2} \sim \chi^2(n - 1).$$
Chi-squared random variables
[Figure: Chisq distribution. Densities dchisq(x, df) for df = 3, 5, 8, plotted for x from 0 to 10.]
Example: chi-squared distribution
Let $S^2$ be the sample variance of a random sample of size 6 drawn from a $N(\mu, 12)$ distribution. Find
$$P(2.76 < S^2 < 22.2).$$
Let $W = (n - 1)S^2/\sigma^2 = 5S^2/12$; then $W \sim \chi^2(5)$.
So
$$P(2.76 < S^2 < 22.2) = P\left(\tfrac{5}{12}(2.76) < \tfrac{(n-1)S^2}{\sigma^2} < \tfrac{5}{12}(22.2)\right) = P(1.15 < W < 9.25)$$
and
P(2.76 < S^2 < 22.2) = P(W < 9.25) − P(W < 1.15) = pchisq(9.25, 5) − pchisq(1.15, 5) = 0.85.
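The value 0.85 can be checked without statistical software. The sketch below uses the fact that for odd degrees of freedom the chi-squared CDF has a closed form built from the error function. The helper `chi2_cdf_odd_df` is a hypothetical stand-in for R's `pchisq`, written here only for this check.

```python
import math


def chi2_cdf_odd_df(x, df):
    """P(W <= x) for W ~ chi-squared(df), valid for odd df and x > 0.

    Starts from the df = 1 case, erf(sqrt(x/2)), and steps up two
    degrees of freedom at a time.
    """
    if df % 2 != 1 or x <= 0:
        raise ValueError("requires odd df and x > 0")
    cdf = math.erf(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    k = 1
    while k < df:
        cdf -= term    # moving from k to k + 2 degrees of freedom
        k += 2
        term *= x / k  # next correction term
    return cdf


# W = 5 S^2 / 12, so the bounds on S^2 become bounds on W.
lo_w = 5 / 12 * 2.76
hi_w = 5 / 12 * 22.2
prob = chi2_cdf_odd_df(hi_w, 5) - chi2_cdf_odd_df(lo_w, 5)
print(f"P(2.76 < S^2 < 22.2) = {prob:.3f}")
```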
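CI for the variance $\sigma^2$
Let $X_1, \dots, X_n$ be a random sample from a $N(\mu, \sigma^2)$ distribution. Find the $100(1 - \alpha)\%$ CI for $\sigma^2$.
Select constants a and b from $\chi^2(n - 1)$ such that
$$P\left(a \le \frac{(n - 1)S^2}{\sigma^2} \le b\right) = 1 - \alpha.$$
We select $a = \chi^2_{1-\alpha/2}(n - 1)$ and $b = \chi^2_{\alpha/2}(n - 1)$, where $\chi^2_{p}(r)$ denotes the point with upper-tail probability p, and we have
$$1 - \alpha = P\left(\frac{a}{(n - 1)S^2} \le \frac{1}{\sigma^2} \le \frac{b}{(n - 1)S^2}\right) = P\left(\frac{(n - 1)S^2}{b} \le \sigma^2 \le \frac{(n - 1)S^2}{a}\right).$$
The probability that the random interval $[(n - 1)S^2/b,\ (n - 1)S^2/a]$ contains the unknown $\sigma^2$ is $1 - \alpha$. Once the data are observed, $[(n - 1)s^2/b,\ (n - 1)s^2/a]$ is a $100(1 - \alpha)\%$ CI for $\sigma^2$.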
Example: CI for variance
$X_1, \dots, X_{13} \sim N(\mu, \sigma^2)$. We have $\bar x = 18.97$ and $\sum_{i=1}^{13}(x_i - \bar x)^2 = 128.41$. Find the 90% CI for $\sigma^2$. From the chi-squared table we have $\chi^2_{0.95}(12) = 5.226$ and $\chi^2_{0.05}(12) = 21.03$ (the 5th and 95th percentiles, respectively, of a chi-squared distribution with 12 degrees of freedom).
A 90% CI for $\sigma^2$ is
$$\left[\frac{128.41}{21.03},\ \frac{128.41}{5.226}\right] = [6.11, 24.57].$$
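The interval is a direct division of the sum of squared deviations by the two table values. A small sketch, with the hypothetical helper name `variance_interval` and the table values passed in by the caller:

```python
def variance_interval(sum_sq_dev, chi2_low_pct, chi2_high_pct):
    """CI for sigma^2 given (n - 1) s^2 and two chi-squared percentiles.

    chi2_low_pct is the lower percentile (a), chi2_high_pct the upper (b).
    Dividing by the larger value gives the lower endpoint.
    """
    return sum_sq_dev / chi2_high_pct, sum_sq_dev / chi2_low_pct


# Values from the example: 12 degrees of freedom, 90% confidence.
lower, upper = variance_interval(128.41, chi2_low_pct=5.226, chi2_high_pct=21.03)
print(f"90% CI for sigma^2: [{lower:.2f}, {upper:.2f}]")
```

Note that the interval is not symmetric about $s^2 = 128.41/12 \approx 10.70$, because the chi-squared distribution is skewed.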
Example
Given that $X_1, \dots, X_n$ is a random sample from an Exponential($\lambda$) distribution (mean $= 1/\lambda$):
1. Let $W = 2\lambda\sum_{i=1}^{n} X_i$. Show that $W \sim \chi^2(2n)$ (hint: use the moment generating function).
2. Find a 90% CI for $\lambda$.
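One possible route through this exercise is sketched below. It assumes the standard mgf of the exponential distribution, $M_X(t) = \lambda/(\lambda - t)$ for $t < \lambda$, and uses the upper-tail notation $\chi^2_p(r)$ from the variance example.

```latex
% Part 1: mgf of W = 2\lambda \sum X_i, using independence of the X_i
M_W(t) = \prod_{i=1}^{n} M_{X_i}(2\lambda t)
       = \left(\frac{\lambda}{\lambda - 2\lambda t}\right)^{n}
       = \left(\frac{1}{1 - 2t}\right)^{2n/2}, \qquad t < \tfrac{1}{2},
% which is the mgf of \chi^2(r) with r = 2n, so W \sim \chi^2(2n).

% Part 2: W is a pivotal quantity, so
P\left(\chi^2_{0.95}(2n) \le 2\lambda \sum_{i=1}^{n} X_i \le \chi^2_{0.05}(2n)\right) = 0.90,
% and solving for \lambda gives the 90% CI
\left[\frac{\chi^2_{0.95}(2n)}{2\sum_{i=1}^{n} x_i},\;
      \frac{\chi^2_{0.05}(2n)}{2\sum_{i=1}^{n} x_i}\right].
```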
CI for the ratio of variances
Recall: F distribution
Let $W_1 \sim \chi^2(v_1)$ and $W_2 \sim \chi^2(v_2)$ be independent random variables. Then a random variable F that can be expressed as
$$F = \frac{W_1/v_1}{W_2/v_2}$$
is said to have an F distribution with degrees of freedom $v_1$ and $v_2$, denoted by $F(v_1, v_2)$ or $F_{v_1, v_2}$.
[Figure: F(d1, d2) distribution. Densities df(x, d1, d2) for (d1, d2) = (3, 9), (4, 8), (9, 3), plotted for x from 0 to 10.]
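F-distribution
Reciprocal of an F
Let the r.v. $F \sim F(v_1, v_2)$ with pdf g, and let $Y = 1/F$. Then Y has pdf (with g evaluated at $F = 1/y$)
$$f^*(y) = g(F)\left|\frac{dF}{dy}\right| = \frac{v_1^{v_1/2}\, y^{1 - (v_1/2)}\, v_2^{v_2/2}\, y^{(v_1 + v_2)/2}}{B\left(\frac{v_1}{2}, \frac{v_2}{2}\right)(v_2 y + v_1)^{(v_1 + v_2)/2}} \cdot \frac{1}{y^2} = \frac{v_1^{v_1/2}\, v_2^{v_2/2}\, y^{(v_2/2) - 1}}{B\left(\frac{v_2}{2}, \frac{v_1}{2}\right)(v_1 + v_2 y)^{(v_1 + v_2)/2}}, \quad y \in [0, \infty).$$
That is, if $F \sim F(v_1, v_2)$ and $Y = 1/F$, then $Y \sim F(v_2, v_1)$.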
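CI for $\sigma_X^2/\sigma_Y^2$ from two independent normal populations
Given that $S_X^2$ and $S_Y^2$ are unbiased estimators of $\sigma_X^2$ and $\sigma_Y^2$ derived from samples of sizes n and m, respectively, from two independent normal populations, find a $100(1 - \alpha)\%$ CI for $\sigma_X^2/\sigma_Y^2$.
$$\frac{(n - 1)S_X^2}{\sigma_X^2} \sim \chi^2(n - 1), \qquad \frac{(m - 1)S_Y^2}{\sigma_Y^2} \sim \chi^2(m - 1)$$
$$\frac{\dfrac{(m-1)S_Y^2}{\sigma_Y^2}\Big/(m - 1)}{\dfrac{(n-1)S_X^2}{\sigma_X^2}\Big/(n - 1)} = \frac{S_Y^2/\sigma_Y^2}{S_X^2/\sigma_X^2}$$
follows an F distribution with degrees of freedom $(m - 1)$ and $(n - 1)$, i.e.,
$$\frac{S_Y^2/\sigma_Y^2}{S_X^2/\sigma_X^2} \sim F(m - 1, n - 1).$$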
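$$\frac{S_Y^2/\sigma_Y^2}{S_X^2/\sigma_X^2} \sim F(m - 1, n - 1)$$
So we select $c = F_{1-\alpha/2}(m - 1, n - 1)$ and $d = F_{\alpha/2}(m - 1, n - 1)$, and
$$P\left(c \le \frac{S_Y^2/\sigma_Y^2}{S_X^2/\sigma_X^2} \le d\right) = 1 - \alpha.$$
That is,
$$P\left(c\,\frac{S_X^2}{S_Y^2} \le \frac{\sigma_X^2}{\sigma_Y^2} \le d\,\frac{S_X^2}{S_Y^2}\right) = 1 - \alpha.$$
F tables usually list only upper-tail points, so by the reciprocal property we use
$c = F_{1-\alpha/2}(m - 1, n - 1) = 1/F_{\alpha/2}(n - 1, m - 1)$ and $d = F_{\alpha/2}(m - 1, n - 1)$. Let $s_X^2$ and $s_Y^2$ be the realizations of $S_X^2$ and $S_Y^2$. Then a $100(1 - \alpha)\%$ CI for $\sigma_X^2/\sigma_Y^2$ is
$$\left[\frac{1}{F_{\alpha/2}(n - 1, m - 1)}\,\frac{s_X^2}{s_Y^2},\ F_{\alpha/2}(m - 1, n - 1)\,\frac{s_X^2}{s_Y^2}\right].$$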
Example
From two independent normal populations with unknown means and variances, we have $12\,s_X^2 = 128.4$ from a random sample of size 13 and $8\,s_Y^2 = 36.72$ from a random sample of size 9. Find a 98% CI for $\sigma_X^2/\sigma_Y^2$.
$$\frac{S_Y^2/\sigma_Y^2}{S_X^2/\sigma_X^2} \sim F(8, 12)$$
From the F table we have $F_{0.01}(12, 8) = 5.67$ and $F_{0.01}(8, 12) = 4.50$, so a 98% CI for $\sigma_X^2/\sigma_Y^2$ is
$$\left[\frac{1}{5.67}\cdot\frac{128.4/12}{36.72/8},\ 4.50\cdot\frac{128.4/12}{36.72/8}\right] \approx [0.41, 10.49].$$
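The endpoints of this interval can be evaluated directly. A minimal sketch using the example's numbers and the two table values quoted above (the variable names are illustrative):

```python
# Sample variances from the example: (n - 1) s_X^2 = 128.4 with n = 13,
# and (m - 1) s_Y^2 = 36.72 with m = 9.
s2_x = 128.4 / 12
s2_y = 36.72 / 8
ratio = s2_x / s2_y  # point estimate of sigma_X^2 / sigma_Y^2

f_upper_n_m = 5.67  # F_{0.01}(12, 8) from the F table
f_upper_m_n = 4.50  # F_{0.01}(8, 12) from the F table

lower = ratio / f_upper_n_m
upper = f_upper_m_n * ratio
print(f"ratio = {ratio:.3f}, 98% CI = [{lower:.3f}, {upper:.2f}]")
```

The interval contains 1, so at this confidence level the data do not rule out equal variances.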
Confidence intervals for proportions (p)
Estimating proportions: construct a CI for p in the Bin(n, p) distribution.
Assume that sampling is from a binomial population, so the problem is to estimate p in the Bin(n, p) distribution, where p is unknown.
Recall:
Given that Y is distributed as Bin(n, p), an unbiased estimator of p is $\hat p = Y/n$:
$$E(\hat p) = E\left(\frac{Y}{n}\right) = p$$
and
$$\mathrm{Var}(\hat p) = \frac{1}{n^2}\mathrm{Var}(Y) = \frac{1}{n^2}\,np(1 - p) = \frac{p(1 - p)}{n}.$$
For large n,
$$\frac{Y - np}{\sqrt{np(1 - p)}} = \frac{(Y/n) - p}{\sqrt{p(1 - p)/n}}$$
can be approximated by the standard normal N(0, 1).
Thus an approximate $100(1 - \alpha)\%$ CI for p is obtained by considering
$$P\left(-z_{\alpha/2} < \frac{(Y/n) - p}{\sqrt{p(1 - p)/n}} < z_{\alpha/2}\right) \approx 1 - \alpha.$$
Replacing the variance of $\hat p = Y/n$ by its estimate $\hat p(1 - \hat p)/n$ gives a simple expression for the CI for p:
$$\left[\hat p \pm z_{\alpha/2}\sqrt{\frac{\hat p(1 - \hat p)}{n}}\right] = \left[\frac{Y}{n} \pm z_{\alpha/2}\sqrt{\frac{(Y/n)(1 - Y/n)}{n}}\right].$$
Example
Assume $Y \sim \text{Bin}(n, p)$ with n = 36 and y/n = 0.222. Find an approximate 90% CI for p.
$$\left[0.222 \pm 1.645\sqrt{\frac{(0.222)(1 - 0.222)}{36}}\right] \approx [0.108, 0.336]$$
Example
Poll: n = 100 and y = 51 say yes. Find the 95% CI for p. Answer: [0.41, 0.61].
Poll: n = 351 and y = 185 say yes. Find the 95% CI for p.
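The two poll questions can be worked with a short helper. This is a sketch of the large-sample interval only; the function name `proportion_interval` is hypothetical, and the z value comes from the standard library's normal distribution.

```python
from statistics import NormalDist


def proportion_interval(y, n, conf):
    """Approximate large-sample CI for p from y successes in n trials."""
    p_hat = y / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    half_width = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half_width, p_hat + half_width


lo1, hi1 = proportion_interval(51, 100, 0.95)
lo2, hi2 = proportion_interval(185, 351, 0.95)
print(f"n = 100: [{lo1:.2f}, {hi1:.2f}]")
print(f"n = 351: [{lo2:.3f}, {hi2:.3f}]")
```

In the first poll the interval contains 0.5, so the sample does not settle whether a majority says yes.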
CI for difference of two proportions
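Interpreting opinion polls
".... the approval rating for the administration is 44%. This survey used the residential telephone directory of the Taiwan area as its sampling frame and drew a random sample using the last four digits of the telephone numbers. A total of 1,056 residents of the Taiwan area aged 20 and over were successfully interviewed. At the 95% confidence level, the sampling error is plus or minus 3.0 percentage points."
1. What is the population of this poll? What is the sample size?
2. About how many respondents were satisfied with the administration?
3. Compute the confidence interval for this survey.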
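Interpreting opinion polls
1. In this survey the population is the residents of the Taiwan area aged 20 and over, and the sample is the 1,056 people successfully interviewed. "An approval rating of 44%" means that about 44% of the 1,056 respondents said they were satisfied (that is, about 465 people answered "satisfied").
2. The interval [0.44 − 0.03, 0.44 + 0.03] = [0.41, 0.47] is called the confidence interval (confidence interval: [estimate − maximum error, estimate + maximum error]).
Suppose the true proportion of satisfied people in the population is p. This survey estimates that p lies in the range 0.41 to 0.47.
3. The 95% confidence level: p is unknown, and every sample carries sampling error, so there is no guarantee that the true proportion p lies inside our estimated interval. If we sampled many times and obtained a confidence interval each time, about 95% of those intervals would cover the true value of p.
4. We are 95% confident that the true approval rating lies in the interval we obtained.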
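Sample size for a proportion
A newspaper polled satisfaction with the Taipei City mayor's administration and reported: "The approval rating is 63%. This poll successfully interviewed n adult residents of Taipei City aged 20 and over. At the 95% confidence level, the sampling error is plus or minus 3.2 percentage points." Find n.
$z_{0.025} = 1.96$ and
$$1.96\sqrt{\frac{(0.63)(1 - 0.63)}{n}} = 0.032,$$
so $n = (0.63)(0.37)(1.96)^2/0.032^2 \approx 874.5$, i.e., n is about 875.
The maximum error of the estimate for a 98% confidence coefficient is 0.01 with $\hat p = 0.08$. Find n.
$z_{0.01} = 2.326$ and
$$2.326\sqrt{\frac{(0.08)(0.92)}{n}} = 0.01,$$
so $n = (0.08)(0.92)(2.326)^2/0.01^2 \approx 3982$.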
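Solving the margin-of-error equation for n is a one-line calculation. The sketch below rounds up, on the assumption that the sample size should be the smallest integer that meets the stated maximum error; the function name is hypothetical.

```python
import math


def sample_size_for_proportion(z, p, max_error):
    """Smallest n with z * sqrt(p (1 - p) / n) <= max_error."""
    return math.ceil(z ** 2 * p * (1 - p) / max_error ** 2)


n_poll = sample_size_for_proportion(1.96, 0.63, 0.032)
n_rare = sample_size_for_proportion(2.326, 0.08, 0.01)
print(n_poll, n_rare)
```

Because the reported margin of 3.2 points is itself rounded, the poll's actual n can only be recovered approximately from this calculation.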
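Unknown $\hat p$
When no estimate of p is available, we use the bound $p^*(1 - p^*) \le 1/4$. Hence we need
$$n = \frac{z_{0.01}^2}{4(0.01)^2} \approx 13530 \quad (z_{0.01} \approx 2.3263)$$
for the maximum error of the estimate to be at most $\varepsilon = 0.01$ with a 98% confidence coefficient.
For a 95% confidence coefficient and $\varepsilon = 0.01$, we have n = 9604.
For a 95% confidence coefficient and $\varepsilon = 0.03$, we have n ≈ 1067.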
Sample size and CIs for a given $\hat p$
The 95% CIs for the proportion of people supporting A, when 51% support A in polls of 100, 400, or 10,000 people:
[0.41, 0.61], [0.46, 0.56], [0.50, 0.52]
[0.51 ± 0.10], [0.51 ± 0.05], [0.51 ± 0.01]
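Sample size for a mean
The $100(1 - \alpha)\%$ CI for $\mu$ is $[\bar x \pm z_{\alpha/2}(\sigma/\sqrt{n})]$. Denote this interval by $\bar x \pm \varepsilon$. We sometimes call $\varepsilon = z_{\alpha/2}(\sigma/\sqrt{n})$ the maximum error of the estimate. Solving for n gives
$$n = \frac{(z_{\alpha/2})^2\sigma^2}{\varepsilon^2},$$
where it is assumed that $\sigma^2$ is known.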
Example
We want the 95% CI for $\mu$ to be $\bar x \pm 1$ for a normal population with standard deviation $\sigma = 15$. Find the sample size.
$$1.96\,\frac{15}{\sqrt{n}} = 1$$
gives $n \approx 864.36$, so we take n = 865.
If the 80% CI for $\mu$ is to be $\bar x \pm 2$, then
$$1.282\,\frac{15}{\sqrt{n}} = 2,$$
where $z_{0.1} = 1.282$, and thus n = 93.
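Both sample sizes follow from the same formula, rounded up to the next integer. A minimal sketch with a hypothetical helper name:

```python
import math


def sample_size_for_mean(z, sigma, max_error):
    """Smallest n with z * sigma / sqrt(n) <= max_error (sigma known)."""
    return math.ceil((z * sigma / max_error) ** 2)


n_95 = sample_size_for_mean(1.96, 15, 1)   # 95% CI, max error 1
n_80 = sample_size_for_mean(1.282, 15, 2)  # 80% CI, max error 2
print(n_95, n_80)
```

Halving the maximum error multiplies the required n by four, since n grows with $1/\varepsilon^2$.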
Pivotal quantity I
A very useful method for finding confidence intervals uses a pivotal quantity.
What is a pivotal quantity? A pivotal quantity is a function of the data and the unknown parameter, say $g(X, \theta)$, whose distribution does not depend on the unknown parameter.
Example
Given that $X_1, \dots, X_n$ is a random sample from a $N(\mu, \sigma^2)$ distribution:
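Pivotal quantity II
1. When $\sigma$ is known, $Z = \dfrac{\bar X - \mu}{\sigma/\sqrt{n}}$ is a pivotal quantity. $Z \sim N(0, 1)$.
2. When $\sigma$ is unknown, $T = \dfrac{\bar X - \mu}{S/\sqrt{n}}$ is a pivotal quantity, where S is the sample standard deviation. $T \sim t(n - 1)$.
3. $W = (n - 1)S^2/\sigma^2$ is a pivotal quantity. $W \sim \chi^2(n - 1)$.
For $Y \sim \text{Bin}(n, p)$ with n large, $\dfrac{(Y/n) - p}{\sqrt{p(1 - p)/n}}$ is approximately $N(0, 1)$.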