The monotone boundary property and the full coverage property of confidence intervals for a binomial proportion

Academic year: 2021

Contents lists available at ScienceDirect

Journal of Statistical Planning and Inference

journal homepage: www.elsevier.com/locate/jspi

The monotone boundary property and the full coverage property of confidence intervals for a binomial proportion

Hsiuying Wang

National Chiao Tung University, Hsinchu, Taiwan

ARTICLE INFO

Article history: Received 1 March 2006; Accepted 26 July 2009; Available online 27 August 2009

Keywords: Confidence coefficient; Binomial distribution; Confidence interval; Coverage probability; The monotone boundary property; The full coverage property

ABSTRACT

The methodology for deriving the exact confidence coefficient of some confidence intervals for a binomial proportion is proposed in Wang [2007. Exact confidence coefficients of confidence intervals for a binomial proportion. Statist. Sinica 17, 361–368]. The methodology requires two conditions of confidence intervals: the monotone boundary property and the full coverage property. In this paper, we show that for some confidence intervals of a binomial proportion, the two properties hold for any sample size. Based on results presented in this paper, the procedure in Wang [2007. Exact confidence coefficients of confidence intervals for a binomial proportion. Statist. Sinica 17, 361–368] can be directly used to calculate the exact confidence coefficients of these confidence intervals for any fixed sample size.

© 2009 Elsevier B.V. All rights reserved.

1. Introduction

The binomial distribution is widely used in many application areas. The asymptotic behavior of several confidence intervals for a binomial proportion has been investigated by Brown et al. (2001, 2002). For the small sample behavior of these confidence intervals, Wang (2007, 2009a) proposes calculation algorithms to derive their exact minimum coverage probabilities and average coverage probabilities. Some of these confidence intervals have been successfully adopted in the quality control area (Wang, 2009b) and can potentially be developed to estimate the nucleotide substitution rate in an important biological evolutionary model (Wang et al., 2008).

One of the algorithms proposed in Wang (2007, 2009a) calculates the minimum coverage probability, also known as the confidence coefficient, of a confidence interval $(L(X), U(X))$ for a binomial proportion $p$, where $X$ follows a binomial distribution $B(n, p)$. The coverage probability of a confidence interval of $p$ is defined as the probability that the random interval covers the true parameter $p$. For the binomial distribution, the coverage probability varies with $p$. For a $1-\alpha$ confidence interval of $p$ that is constructed from the large sample approximation, the exact confidence coefficient may be far from $1-\alpha$. One example is the $1-\alpha$ Wald interval $\left(\hat p - z_{\alpha/2}\sqrt{\hat p(1-\hat p)/n},\ \hat p + z_{\alpha/2}\sqrt{\hat p(1-\hat p)/n}\right)$, where $\hat p = X/n$ and $z_{\alpha/2}$ is the upper $\alpha/2$ cutoff point of the standard normal distribution. It is well known that the confidence coefficient of the Wald interval is zero (see Lehmann, 1986; Blyth and Still, 1983).
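The behavior of the Wald interval can be illustrated numerically. The following is a minimal sketch, not taken from the paper: the function names `wald_interval` and `coverage`, the sample size 20 and the cutoff 1.959964 (a 95% level) are illustrative assumptions. It evaluates the exact coverage probability at a few values of $p$ by summing binomial probabilities over the outcomes whose open interval contains $p$.

```python
from math import comb, sqrt

def wald_interval(x, n, z=1.959964):
    """Wald interval for x successes in n trials; z is the upper alpha/2 normal cutoff (assumed 95% level)."""
    p_hat = x / n
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

def coverage(interval, n, p):
    """Exact coverage probability at p: total binomial mass of the x whose open interval contains p."""
    total = 0.0
    for x in range(n + 1):
        lo, hi = interval(x, n)
        if lo < p < hi:
            total += comb(n, x) * p**x * (1 - p) ** (n - x)
    return total

# Coverage drops towards zero as p approaches the boundary of the parameter space.
for p in (0.5, 0.1, 0.01, 0.001):
    print(p, round(coverage(wald_interval, 20, p), 4))
```

Because the Wald interval at $X = 0$ degenerates to a single point, values of $p$ close to zero are covered only by a few outcomes that are themselves very unlikely, which is why the infimum of the coverage probability is zero.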

Usually, the exact confidence coefficient is unknown because we do not know at which point in the parameter space the infimum coverage probability occurs. Adopting Wang's (2007) procedure, the infimum coverage probability and the maximum coverage probability of a confidence interval for a binomial proportion can be derived if two specific conditions for the confidence interval are satisfied, which are the monotone boundary property and the full coverage property. When applying the procedure for a given sample, we need to check if these two properties hold for the sample size of this sample. Note that the minimum coverage probability of a confidence interval only depends on the sample size and the form of the confidence interval. For a confidence interval, if the two properties can be shown to be satisfied for all sample sizes, the procedure can be directly utilized to calculate the confidence coefficient without the necessity of checking the two conditions for each sample size. In this paper, we show that for some confidence intervals, the two properties are satisfied for any sample size.

E-mail address: wang@stat.nctu.edu.tw

0378-3758/$ - see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.jspi.2009.07.031

Besides being used in the coverage probability inference for confidence intervals, the monotone boundary property is also an essential condition in deriving the exact minimum coverage probability of tolerance intervals or simultaneous confidence intervals for discrete distributions (Cai and Wang, 2009; Wang and Tsung, 2009; Wang, 2008). Therefore, the monotone boundary property is an important condition in interval estimation for discrete distributions.

The paper is organized as follows. Section 2 describes the conditions and procedure proposed in Wang (2007). The main results, that the two conditions hold for the Wilson, Agresti–Coull and likelihood ratio intervals with any sample size or when the sample size is greater than 1, are given in Section 3. Finally, a conclusion is given in Section 4.

2. Procedure

We briefly describe the procedure for computing exact confidence coefficients proposed in Wang (2007), as well as the conditions it imposes on the confidence intervals. The conditions required of a confidence interval $(L(X), U(X))$ for a binomial proportion are given in Assumption 1.

Assumption 1. A confidence interval $(L(X), U(X))$ of a binomial proportion $p$ satisfies:

(i) $L(X_1) < L(X_2)$ if $X_1 < X_2$;

(ii) $U(X_1) < U(X_2)$ if $X_1 < X_2$;

(iii) for any fixed $p \in (0, 1)$, there exists an $x_0$ such that $p \in (L(x_0), U(x_0))$.

Remark 1. The conditions in Assumption 1 can be extended to calculate confidence coefficients of confidence intervals for other discrete distributions (Wang, 2009a).

In this paper, condition (i) is called the monotone lower boundary property, condition (ii) is called the monotone upper boundary property, and condition (iii) is called the full coverage property. If a confidence interval has the monotone lower boundary property and the monotone upper boundary property, we say that the confidence interval has the monotone boundary property. By Wang (2007), if a confidence interval satisfies Assumption 1, the exact confidence coefficient of the confidence interval can be derived by applying the following procedure.

Procedure for computing exact confidence coefficient.

Step 1: Check if the union of the $(n+1)$ intervals $(L(X), U(X))$, $X = 0, \ldots, n$, covers all $p \in (0, 1)$ and if (i) and (ii) in Assumption 1 are satisfied. If the union does not cover all $p \in (0, 1)$, the confidence coefficient is zero and there is no need to go to step 2.

Step 2: If Assumption 1 is satisfied, list the endpoints of the intervals that are greater than zero and smaller than 1.

Step 3: Calculate the coverage probability of each endpoint in step 2. The minimum value of these coverage probabilities is the exact confidence coefficient.

Note that all endpoints of the confidence interval based on $x = 0, \ldots, n$ are $(L(0), L(1), \ldots, L(n), U(0), U(1), \ldots, U(n))$.
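The three steps can be sketched in code. This is an illustrative sketch under stated assumptions rather than the implementation of Wang (2007): the function names, the tolerance `eps` used to decide whether an endpoint equals 0 or 1 in floating point, and the choice to raise an error when (i) or (ii) fails are all assumptions. The Wilson interval, whose closed form is given in Section 3, is used as the example, and intervals are treated as open, so an endpoint is not covered by its own interval.

```python
from math import comb, sqrt

def wilson(x, n, k):
    """Wilson interval endpoints, following the closed form in Section 3."""
    center = (x + k * k / 2) / (n + k * k)
    half = k * sqrt(n) / (n + k * k) * sqrt(x * (n - x) / n**2 + k * k / (4 * n))
    return center - half, center + half

def confidence_coefficient(interval, n, k, eps=1e-12):
    """Steps 1-3 of the procedure; eps is an assumed floating-point tolerance."""
    bounds = [interval(x, n, k) for x in range(n + 1)]
    lows = [b[0] for b in bounds]
    ups = [b[1] for b in bounds]

    # Step 1: monotone boundaries (i), (ii) and full coverage (iii).
    monotone = all(lows[x] < lows[x + 1] and ups[x] < ups[x + 1] for x in range(n))
    if not monotone:
        raise ValueError("Assumption 1 (i)/(ii) fails; the procedure does not apply")
    covers = (lows[0] <= eps and ups[n] >= 1 - eps
              and all(lows[x + 1] < ups[x] for x in range(n)))
    if not covers:
        return 0.0

    # Step 2: endpoints strictly inside (0, 1).
    endpoints = [e for e in lows + ups if eps < e < 1 - eps]

    # Step 3: exact coverage probability at each endpoint, then the minimum.
    def coverage(p):
        return sum(comb(n, x) * p**x * (1 - p) ** (n - x)
                   for x in range(n + 1) if lows[x] < p < ups[x])

    return min(coverage(e) for e in endpoints)

print(confidence_coefficient(wilson, 10, 1.96))
```

Only $2(n+1)$ candidate points are evaluated, so the cost is quadratic in $n$ and no grid search over $p$ is needed.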

3. The main results

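In this section, the monotone boundary property and the full coverage property are examined for any fixed sample size for the three confidence intervals which are discussed in Brown et al. (2002) and Wang (2007).

1. The $1-\alpha$ Wilson interval. Denote $\tilde X = X + k^2/2$ and $\tilde n = n + k^2$. Let $\tilde p = \tilde X/\tilde n$, $\tilde q = 1 - \tilde p$, $\hat p = X/n$, $\hat q = 1 - \hat p$, and let $k$ be the upper $\alpha/2$ cutoff point of the standard normal distribution. The $1-\alpha$ Wilson interval has the form

$$CI_W(X) = \left(\tilde p - \frac{k n^{1/2}}{n + k^2}\left(\hat p\hat q + \frac{k^2}{4n}\right)^{1/2},\ \tilde p + \frac{k n^{1/2}}{n + k^2}\left(\hat p\hat q + \frac{k^2}{4n}\right)^{1/2}\right).$$

2. The Agresti–Coull interval. The $1-\alpha$ Agresti–Coull interval is

$$CI_{AC}(X) = \left(\tilde p - k(\tilde p\tilde q)^{1/2}\tilde n^{-1/2},\ \tilde p + k(\tilde p\tilde q)^{1/2}\tilde n^{-1/2}\right).$$

3. The likelihood ratio interval. The $1-\alpha$ likelihood ratio interval is

$$CI_{\Lambda_n}(X) = \left\{p : \frac{p^X(1-p)^{n-X}}{(X/n)^X(1-X/n)^{n-X}} > e^{-k^2/2}\right\}.$$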
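The three definitions can be written out as a short sketch. The function names are illustrative assumptions, and since the likelihood ratio interval has no closed form, its endpoints are found here by bisection on the log likelihood ratio, which is one possible numerical approach and not one prescribed by the paper.

```python
from math import log, sqrt

def wilson(x, n, k):
    center = (x + k * k / 2) / (n + k * k)
    half = k * sqrt(n) / (n + k * k) * sqrt(x * (n - x) / n**2 + k * k / (4 * n))
    return center - half, center + half

def agresti_coull(x, n, k):
    n_t = n + k * k                      # n tilde
    p_t = (x + k * k / 2) / n_t          # p tilde
    half = k * sqrt(p_t * (1 - p_t) / n_t)
    return p_t - half, p_t + half

def log_lr(x, n, p):
    """Log of p^x (1-p)^(n-x) / ((x/n)^x (1-x/n)^(n-x)) for 0 < p < 1, reading 0*log(0) as 0."""
    def xlogy(a, b):
        return 0.0 if a == 0 else a * log(b)
    p_hat = x / n
    return (xlogy(x, p) + xlogy(n - x, 1 - p)
            - xlogy(x, p_hat) - xlogy(n - x, 1 - p_hat))

def likelihood_ratio(x, n, k, iters=100):
    """Endpoints of {p : log_lr > -k^2/2}, located by bisection (an assumed numerical method)."""
    target = -k * k / 2
    p_hat = x / n

    def solve(inside, outside):
        # 'inside' stays in the interval, 'outside' stays out of it.
        for _ in range(iters):
            mid = (inside + outside) / 2
            if log_lr(x, n, mid) > target:
                inside = mid
            else:
                outside = mid
        return (inside + outside) / 2

    lower = 0.0 if x == 0 else solve(p_hat, 0.0)
    upper = 1.0 if x == n else solve(p_hat, 1.0)
    return lower, upper

for name, ci in (("Wilson", wilson), ("Agresti-Coull", agresti_coull), ("LR", likelihood_ratio)):
    print(name, ci(3, 10, 1.96))
```

The Wilson and Agresti–Coull intervals share the center $\tilde p$, so they differ only in their half-widths.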

For the Wilson interval, the monotone boundary property and full coverage property are shown to hold for any sample size in Propositions 1 and 2.

Proposition 1. The Wilson interval $CI_W(X)$ has the monotone boundary property.

Proof. Let $L_W(x)$ and $U_W(x)$ denote the lower bound and the upper bound of $CI_W(X)$ corresponding to $X = x$. To prove the monotone boundary property, it is necessary to show that the two functions $L_W(x+1) - L_W(x)$ and $U_W(x+1) - U_W(x)$ are greater than zero for $x = 0, \ldots, n-1$. We have

$$L_W(x+1) - L_W(x) = \frac{1}{2(k^2+n)}\left(2 + k\sqrt{n}\left(\sqrt{\frac{k^2 n + 4(n-x)x}{n^2}} - \sqrt{\frac{-4(1+x)^2 + n(4 + k^2 + 4x)}{n^2}}\right)\right). \tag{1}$$

Note that (1) $> 0$ is equivalent to

$$\left(2\sqrt{n} + k\sqrt{k^2 n + 4(n-x)x}\right)^2 - \left(k\sqrt{-4(1+x)^2 + n(4 + k^2 + 4x)}\right)^2 > 0. \tag{2}$$

The left hand side of (2) is equal to

$$4\left(n + k^2(1 - n + 2x) + k\sqrt{n}\sqrt{k^2 n + 4(n-x)x}\right) \ge 4\left(n + k^2(1 - n + 2x) + k^2 n\right) > 0,$$

where the first inequality uses $4(n-x)x \ge 0$. Therefore, $L_W(x)$ is an increasing function of $x$.

We have

$$U_W(x+1) - U_W(x) = \frac{1}{2(k^2+n)}\left(2 + k\sqrt{n}\left(-\sqrt{\frac{k^2 n + 4(n-x)x}{n^2}} + \sqrt{\frac{-4(1+x)^2 + n(4 + k^2 + 4x)}{n^2}}\right)\right). \tag{3}$$

Note that (3) $> 0$ is equivalent to

$$\left(2\sqrt{n} + k\sqrt{-4(1+x)^2 + n(4 + k^2 + 4x)}\right)^2 - \left(k\sqrt{k^2 n + 4(n-x)x}\right)^2 > 0. \tag{4}$$

The left hand side of (4) is equal to

$$4\left(n + k^2(-1 + n - 2x) + k\sqrt{n}\sqrt{4(1+x)(n-1-x) + nk^2}\right) \ge 4\left(n + k^2(-1 + n - 2x) + k^2 n\right) = 4\left(n + k^2(2n - 2x - 1)\right) > 0, \tag{5}$$

because $x \le n-1$. Therefore, $U_W(x)$ is an increasing function of $x$.
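Proposition 1 can be checked numerically on a grid. The sketch below is only an illustration: the helper name `has_monotone_boundary` and the grid of $n$ and $k$ values are arbitrary assumptions, and a finite check does not replace the proof above.

```python
from math import sqrt

def wilson(x, n, k):
    center = (x + k * k / 2) / (n + k * k)
    half = k * sqrt(n) / (n + k * k) * sqrt(x * (n - x) / n**2 + k * k / (4 * n))
    return center - half, center + half

def has_monotone_boundary(interval, n, k):
    """True if both endpoints strictly increase in x = 0, ..., n."""
    bounds = [interval(x, n, k) for x in range(n + 1)]
    return all(bounds[x][0] < bounds[x + 1][0] and bounds[x][1] < bounds[x + 1][1]
               for x in range(n))

# Arbitrary illustrative grid of sample sizes and cutoffs.
grid = [(n, k) for n in range(1, 51) for k in (0.5, 1.0, 1.645, 1.96, 2.576, 5.0)]
print(all(has_monotone_boundary(wilson, n, k) for n, k in grid))
```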



Proposition 2. The Wilson interval $CI_W(X)$ has the full coverage property for all $n$ and $k > 1$.

Proof. By Proposition 1, $CI_W(x)$ has the monotone boundary property. Given this result, to show the full coverage property it is only necessary to show that $L_W(x+1)$ is less than $U_W(x)$ for $x = 0, \ldots, n-1$, that $U_W(n)$ is not less than 1 and that $L_W(0)$ is not larger than 0. We have

$$L_W(x+1) - U_W(x) = \frac{x + 1 + k^2/2}{n + k^2} - \frac{k n^{1/2}}{n + k^2}\left(\frac{(x+1)(n-x-1)}{n^2} + \frac{k^2}{4n}\right)^{1/2} - \frac{x + k^2/2}{n + k^2} - \frac{k n^{1/2}}{n + k^2}\left(\frac{x(n-x)}{n^2} + \frac{k^2}{4n}\right)^{1/2}$$

$$\le \frac{1}{n + k^2} - \left\{\frac{k n^{1/2}}{n + k^2}\left[\left(\frac{k^2}{4n}\right)^{1/2} + \left(\frac{k^2}{4n}\right)^{1/2}\right]\right\} = \frac{1}{n + k^2}(1 - k^2) < 0,$$

which shows that $L_W(x+1)$ is less than $U_W(x)$ for $k > 1$. By straightforward calculation, we have $U_W(n) = 1$ and $L_W(0) = 0$. Therefore, the proof is complete.
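The role of the condition $k > 1$ can be seen numerically. The following sketch is an illustration under assumptions: the helper name `has_full_coverage`, the tolerance `eps` for comparing floating-point endpoints with 0 and 1, and the example values of $n$ and $k$ are not from the paper. It relies on the monotone boundary property, so that full coverage reduces to consecutive intervals overlapping and the extreme endpoints reaching 0 and 1.

```python
from math import sqrt

def wilson(x, n, k):
    center = (x + k * k / 2) / (n + k * k)
    half = k * sqrt(n) / (n + k * k) * sqrt(x * (n - x) / n**2 + k * k / (4 * n))
    return center - half, center + half

def has_full_coverage(interval, n, k, eps=1e-12):
    """Assuming monotone boundaries: L(0) <= 0, U(n) >= 1 and L(x+1) < U(x) for all x."""
    bounds = [interval(x, n, k) for x in range(n + 1)]
    ends_ok = bounds[0][0] <= eps and bounds[n][1] >= 1 - eps
    overlap = all(bounds[x + 1][0] < bounds[x][1] for x in range(n))
    return ends_ok and overlap

print(has_full_coverage(wilson, 10, 1.96))  # a cutoff above 1
print(has_full_coverage(wilson, 10, 0.1))   # a very small cutoff leaves gaps between intervals
```

With a very small cutoff the intervals are too short to overlap, so some values of $p$ are covered by no outcome and the confidence coefficient is zero.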



For the Agresti–Coull interval, the monotone boundary property and full coverage property are shown to hold in Propositions 3 and 4.

Proposition 3. The Agresti–Coull interval $CI_{AC}(X)$ has the monotone boundary property.

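Proof. Let $L_{AC}(x)$ and $U_{AC}(x)$ denote the lower bound and the upper bound of $CI_{AC}(X)$ corresponding to $X = x$. Then

$$L_{AC}(x+1) - L_{AC}(x) = \frac{1}{\tilde n} + \frac{k}{2\tilde n^{3/2}}\left(\sqrt{(k^2 + 2n - 2x)(k^2 + 2x)} - \sqrt{(-2 + k^2 + 2n - 2x)(2 + k^2 + 2x)}\right). \tag{6}$$

Note that (6) larger than zero is equivalent to

$$\left(2\sqrt{k^2 + n} + k\sqrt{(k^2 + 2n - 2x)(k^2 + 2x)}\right)^2 - \left(k\sqrt{(-2 + k^2 + 2n - 2x)(2 + k^2 + 2x)}\right)^2 > 0. \tag{7}$$

By straightforward calculation, the left hand side of (7) is equal to

$$4\left(n + k\sqrt{k^2 + n}\sqrt{k^4 + 2k^2 n + 4nx - 4x^2} + k^2(2 - n + 2x)\right)$$

$$\ge 4\left(n + k\sqrt{k^2 + n}\sqrt{k^4 + 2k^2 n} + k^2(2 - n + 2x)\right)$$

$$= 4\left(n + k\sqrt{(k^2 + n)k^2(k^2 + 2n)} + k^2(2 - n + 2x)\right)$$

$$\ge 4\left(n + k^2(k^2 + n) + k^2(2 - n + 2x)\right) > 0, \tag{8}$$

which implies that $L_{AC}(x)$ is an increasing function of $x$.

Similarly,

$$U_{AC}(x+1) - U_{AC}(x) = \frac{1}{\tilde n} + \frac{k}{2\tilde n^{3/2}}\left(-\sqrt{(k^2 + 2n - 2x)(k^2 + 2x)} + \sqrt{(-2 + k^2 + 2n - 2x)(2 + k^2 + 2x)}\right). \tag{9}$$

Note that (9) larger than zero is equivalent to

$$\left(2\sqrt{k^2 + n} + k\sqrt{(-2 + k^2 + 2n - 2x)(2 + k^2 + 2x)}\right)^2 - \left(k\sqrt{(k^2 + 2n - 2x)(k^2 + 2x)}\right)^2 > 0. \tag{10}$$

By straightforward calculation, the left hand side of (10) is equal to

$$4\left(n + k\sqrt{k^2 + n}\sqrt{k^4 + 2k^2 n + 4nx + 4n - 4(x+1)^2} + k^2 n - 2k^2 x\right)$$

$$= 4\left(n + k\sqrt{k^2 + n}\sqrt{k^4 + 2k^2 n + 4(x+1)(n - x - 1)} + k^2 n - 2k^2 x\right)$$

$$\ge 4\left(n + k\sqrt{n}\sqrt{k^2 n} + k^2 n - 2k^2 x\right)$$

$$= 4\left(n + 2k^2 n - 2k^2 x\right) > 0, \tag{11}$$

which implies that $U_{AC}(x)$ is an increasing function of $x$.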



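Proposition 4. The Agresti–Coull interval $CI_{AC}(X)$ has the full coverage property for all $n \ge 2$ and $k > 1$.

Proof. By an argument similar to that in Proposition 2, we need to show that $L_{AC}(x+1)$ is less than $U_{AC}(x)$, that $U_{AC}(n)$ is not less than 1 and that $L_{AC}(0)$ is not larger than 0. We have

$$L_{AC}(x+1) - U_{AC}(x) = \frac{x + 1 + k^2/2}{n + k^2} - k h_1 (n + k^2)^{-1/2} - \frac{x + k^2/2}{n + k^2} - k h_2 (n + k^2)^{-1/2}, \tag{12}$$

where $h_1 = \left((x + 1 + k^2/2)(n + k^2/2 - x - 1)/(n + k^2)^2\right)^{1/2}$ and $h_2 = \left((x + k^2/2)(n + k^2/2 - x)/(n + k^2)^2\right)^{1/2}$. Thus,

$$(12) = \frac{1}{n + k^2} - k(n + k^2)^{-1/2}(h_1 + h_2). \tag{13}$$

Since $h_1$ and $h_2$ are greater than $\left((x + k^2/2)(n + k^2/2 - x - 1)/(n + k^2)^2\right)^{1/2}$, (13) is less than

$$\frac{1}{n + k^2} - 2k(n + k^2)^{-1/2}(n + k^2)^{-1}\left[\left(x + \frac{k^2}{2}\right)\left(n + \frac{k^2}{2} - x - 1\right)\right]^{1/2}. \tag{14}$$

Note that the minimum value of the term $\left(x + \frac{k^2}{2}\right)\left(n + \frac{k^2}{2} - x - 1\right)$ in (14) for $x = 0, \ldots, n-1$ is $\frac{k^2}{2}\left(\frac{2n + k^2 - 2}{2}\right)$. Thus (14) is not larger than

$$\frac{1}{n + k^2}\left(1 - 2k(n + k^2)^{-1/2}\left[\frac{k^2}{2}\left(\frac{2n + k^2 - 2}{2}\right)\right]^{1/2}\right) \le \frac{1}{n + k^2}(1 - k^2) < 0.$$

The second to last inequality holds because $n \ge 2$ and the last inequality holds because $k > 1$. Thus, $L_{AC}(x+1)$ is less than $U_{AC}(x)$.

Moreover, $U_{AC}(n)$ is

$$\frac{n + k^2/2}{n + k^2} + \frac{(k/2)\sqrt{k^4 + 2k^2 n}}{(n + k^2)^{3/2}} = \frac{n + k^2/2}{n + k^2} + \frac{(k^2/2)\sqrt{k^2 + 2n}}{(n + k^2)^{3/2}} > \frac{n + k^2/2}{n + k^2} + \frac{k^2/2}{n + k^2} = 1.$$

Note that $L_{AC}(0)$ is

$$\frac{k^2}{2(n + k^2)}\left(1 - \left(\frac{2n + k^2}{n + k^2}\right)^{1/2}\right) < 0.$$

Thus, the proof is complete.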
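Propositions 3 and 4 can be spot-checked in the same way as the Wilson results. The sketch below is illustrative only: the helper names and the grid of $n \ge 2$ and $k > 1$ values are assumptions, and the finite check is no substitute for the proofs.

```python
from math import sqrt

def agresti_coull(x, n, k):
    n_t = n + k * k
    p_t = (x + k * k / 2) / n_t
    half = k * sqrt(p_t * (1 - p_t) / n_t)
    return p_t - half, p_t + half

def satisfies_assumption_1(interval, n, k):
    """Checks (i), (ii) and, through overlap and the extreme endpoints, (iii)."""
    b = [interval(x, n, k) for x in range(n + 1)]
    monotone = all(b[x][0] < b[x + 1][0] and b[x][1] < b[x + 1][1] for x in range(n))
    overlap = all(b[x + 1][0] < b[x][1] for x in range(n))
    # For Agresti-Coull the proof gives strict inequalities at the two extremes.
    ends_ok = b[0][0] < 0 and b[n][1] > 1
    return monotone and overlap and ends_ok

# Arbitrary illustrative grid with n >= 2 and k > 1.
grid = [(n, k) for n in range(2, 51) for k in (1.01, 1.645, 1.96, 2.576)]
print(all(satisfies_assumption_1(agresti_coull, n, k) for n, k in grid))
```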



The Wilson interval and the Agresti–Coull interval have closed forms. However, the likelihood ratio interval does not have a closed form. It is more difficult to show the two properties for the likelihood ratio interval. Before giving the results for the likelihood ratio interval, we need the following lemma.

Lemma 1. The two functions $(t/(t-1))^{t-1}$ and $(t/(t+1))^{t+1}$ are increasing functions of $t$, and

(i) $(t/(t-1))^{t-1} < e$ for $t \ge 1$;

(ii) $(t/(t+1))^{t+1} < 1/e$ for $t \ge 0$.

Proof. First, we show that $\log (t/(t-1))^{t-1}$ is an increasing function of $t$. We have

$$\frac{\partial}{\partial t}\log\left(\frac{t}{t-1}\right)^{t-1} = \log\frac{t}{t-1} - \log e^{1/t}. \tag{15}$$

To establish that (15) is greater than zero, we need to show

$$\frac{t}{t-1} > e^{1/t}. \tag{16}$$

The expansion of the left hand side of (16) is $1 + 1/t + 1/t^2 + \cdots$. The expansion of the right hand side of (16) is $1 + 1/t + 1/(2!\,t^2) + \cdots$. Therefore, (16) holds, which implies that $\log (t/(t-1))^{t-1}$ is an increasing function and $(t/(t-1))^{t-1}$ is also an increasing function. Since $\lim_{t\to\infty}(t/(t-1))^{t-1} = e$, we have $(t/(t-1))^{t-1} < e$ for $t \ge 1$. The proof of (i) is complete.

For the second part, we need to show that $(t/(t+1))^{t+1}$ is an increasing function of $t$. We have

$$\frac{\partial}{\partial t}\log\left(\frac{t}{t+1}\right)^{t+1} = \log\frac{t}{t+1} + \log e^{1/t} = \log\frac{e^{1/t}}{(t+1)/t} = \log\frac{1 + 1/t + 1/(2!\,t^2) + \cdots}{1 + 1/t}, \tag{17}$$

which is greater than zero. Therefore, $(t/(t+1))^{t+1}$ is an increasing function of $t$. Since $\lim_{t\to\infty}(t/(t+1))^{t+1} = e^{-1}$, we have $(t/(t+1))^{t+1} < e^{-1}$ for $t > 0$.
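A quick numerical illustration of Lemma 1 is given below. The function names `f` and `g` and the grid of $t$ values are assumptions made for the sketch; the grid starts away from the boundary values $t = 1$ and $t = 0$ to avoid division by zero.

```python
from math import e

def f(t):
    """(t / (t - 1))^(t - 1), evaluated here for t > 1."""
    return (t / (t - 1)) ** (t - 1)

def g(t):
    """(t / (t + 1))^(t + 1), evaluated here for t > 0."""
    return (t / (t + 1)) ** (t + 1)

ts_f = [1 + 0.5 * i for i in range(1, 200)]   # 1.5, 2.0, ..., 100.5
ts_g = [0.5 * i for i in range(1, 200)]       # 0.5, 1.0, ..., 99.5

f_vals = [f(t) for t in ts_f]
g_vals = [g(t) for t in ts_g]

# Both sequences increase and stay below their limits e and 1/e.
print(all(a < b for a, b in zip(f_vals, f_vals[1:])), max(f_vals) < e)
print(all(a < b for a, b in zip(g_vals, g_vals[1:])), max(g_vals) < 1 / e)
```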



Proposition 5. $CI_{\Lambda_n}(X)$ has the monotone boundary property.

Proof. Let $L_\Lambda(x)$ and $U_\Lambda(x)$ denote the lower bound and the upper bound of $CI_{\Lambda_n}(X)$, respectively. First we show that $CI_{\Lambda_n}$ has the monotone lower boundary property. Let

$$L(x, p) = \frac{p^x(1-p)^{n-x}}{(x/n)^x(1 - x/n)^{n-x}}.$$

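[Figure 1: curves $L(x, p)$ and $L(x+1, p)$ plotted against $p$, with the level $e^{-k^2/2}$ and the points $L_\Lambda(x)$, $x/n$ and $(x+1)/n$ marked.]

Fig. 1. The plot of $L(x, p)$ and $L(x+1, p)$. The lower endpoint of $CI_{\Lambda_n}(x)$ is $L_\Lambda(x)$. $L_\Lambda(x+1)$ is greater than $L_\Lambda(x)$ if $L(x+1, L_\Lambda(x)) < e^{-k^2/2}$.

[Figure 2: curves $L(x, p)$ and $L(n-x, p)$ plotted against $p$, with the level $e^{-k^2/2}$ and the points $l_1$, $l_2$, $1 - l_1$, $1 - l_2$, $x/n$ and $1 - x/n$ marked.]

Fig. 2. The plot of $L(x, p)$ and $L(n-x, p)$.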

Note that $L(x, p)$ is a unimodal function of $p$. For a fixed $x$, we have

$$\frac{L_\Lambda(x)^x(1 - L_\Lambda(x))^{n-x}}{(x/n)^x(1 - x/n)^{n-x}} = e^{-k^2/2}.$$

Note that $L_\Lambda(x) < x/n$ because $L(x, x/n) = 1 > e^{-k^2/2}$. If we can demonstrate

$$\frac{L_\Lambda(x)^{x+1}(1 - L_\Lambda(x))^{n-x-1}}{((x+1)/n)^{x+1}(1 - (x+1)/n)^{n-x-1}} < e^{-k^2/2}, \tag{18}$$

then $L_\Lambda(x+1)$ is greater than $L_\Lambda(x)$, see Fig. 1. The left hand side of (18) can be rewritten as

$$L(x, L_\Lambda(x))\,\frac{L_\Lambda(x)}{1 - L_\Lambda(x)}\,\frac{(x/n)^x(1 - x/n)^{n-x}}{((x+1)/n)^{x+1}(1 - (x+1)/n)^{n-x-1}}.$$

By the facts $L(x, L_\Lambda(x)) = e^{-k^2/2}$ and $L_\Lambda(x)/(1 - L_\Lambda(x)) < (x/n)/(1 - x/n)$, to prove (18) we only need to show

$$\frac{(x/n)^{x+1}(1 - x/n)^{n-x-1}}{((x+1)/n)^{x+1}(1 - (x+1)/n)^{n-x-1}} < 1, \tag{19}$$

which is equivalent to

$$\left(\frac{x}{x+1}\right)^{x+1}\left(\frac{n-x}{n-x-1}\right)^{n-x-1} < 1. \tag{20}$$

By Lemma 1, we have $((n-x)/(n-x-1))^{n-x-1} < e$ and $(x/(x+1))^{x+1} < 1/e$ for all $x = 0, \ldots, n-1$. Therefore (20) holds, which implies that $CI_{\Lambda_n}(x)$ has the monotone lower boundary property.

Note that $L(x, p) = L(n-x, 1-p)$. We have $U_\Lambda(n-x) = 1 - L_\Lambda(x)$, see Fig. 2. For $x_2 > x_1$,

$$U_\Lambda(x_2) - U_\Lambda(x_1) = L_\Lambda(n - x_1) - L_\Lambda(n - x_2) > 0,$$

because $n - x_1 > n - x_2$ and the lower boundary is increasing. Therefore, $CI_{\Lambda_n}(X)$ also has the monotone upper boundary property.

Proposition 6. The likelihood ratio interval $CI_{\Lambda_n}(X)$ has the full coverage property for all $n$ and $k > \sqrt{-2\log\lambda}$, where $\lambda$ is

$$\min_{x = 0, \ldots, n-1}\ \frac{d_x^{\,x}(1 - d_x)^{n-x}}{(x/n)^x(1 - x/n)^{n-x}}$$

and

$$d_x = \frac{\left(\frac{x+1}{n}\right)^{x+1}\left(1 - \frac{x+1}{n}\right)^{n-x}}{\left(\frac{x+1}{n}\right)^{x+1}\left(1 - \frac{x+1}{n}\right)^{n-x} + \left(\frac{x}{n}\right)^{x}\left(1 - \frac{x}{n}\right)^{n-x}\left(1 - \frac{x+1}{n}\right)}$$

for $0 \le x \le n-1$.

Proof. To show the full coverage property, we need to show that $L_\Lambda(x+1)$ is less than $U_\Lambda(x)$, that $U_\Lambda(n)$ is not less than 1 and that $L_\Lambda(0)$ is not larger than 0. Note that the function $L(x, p)$ has a maximum value 1 at $p = x/n$. By straightforward calculation, the equation $L(x, p) = L(x+1, p)$ for a fixed $x$ has only one root at $p = d_x$. If $k$ satisfies

$$e^{-k^2/2} < \min_{x = 0, \ldots, n-1}\ \frac{d_x^{\,x}(1 - d_x)^{n-x}}{(x/n)^x(1 - x/n)^{n-x}}, \tag{21}$$

then $L_\Lambda(x+1)$ is less than $U_\Lambda(x)$ for $x = 0, \ldots, n-1$. The condition on $k$ in (21) is equivalent to $k > \sqrt{-2\log\lambda}$. $U_\Lambda(n)$ is not less than 1 because $L(n, 1) = 1$, and $L_\Lambda(0)$ is not larger than 0 because $L(0, 0) = 1$. Therefore, the proof is complete.

Remark 2. Numerical calculations of $\lambda$ in Proposition 6 indicate that the minimum always occurs at $x = 0$ and that $\lambda$ is an increasing function of $n$. When $n$ is 2, $\lambda$ is 0.64. By the numerical calculations, the condition on $k$ in Proposition 6 is $k > 0.945$ for all $n \ge 2$. The lower bound of $k$ can be smaller if $n$ increases.
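The numerical values quoted in the remark can be reproduced with a few lines of code. This is a sketch under assumptions: the function names are illustrative, `lam` stands for the minimum defined in Proposition 6, and `d` uses the algebraically simplified form $a_{x+1}/(a_x + a_{x+1})$ with $a_x = (x/n)^x(1 - x/n)^{n-x}$, obtained by solving $L(x, p) = L(x+1, p)$ for $p$, so that the case $x = n-1$ does not produce $0/0$.

```python
from math import log, sqrt

def a(x, n):
    """(x/n)^x (1 - x/n)^(n-x); Python evaluates 0.0 ** 0 as 1.0, matching the usual convention."""
    return (x / n) ** x * (1 - x / n) ** (n - x)

def d(x, n):
    """Root of L(x, p) = L(x+1, p), in the simplified form a(x+1) / (a(x) + a(x+1))."""
    return a(x + 1, n) / (a(x, n) + a(x + 1, n))

def lr(x, n, p):
    """Likelihood ratio L(x, p) from the proof of Proposition 5."""
    return p**x * (1 - p) ** (n - x) / a(x, n)

def lam(n):
    """Minimum over x = 0, ..., n-1 of L(x, d_x)."""
    return min(lr(x, n, d(x, n)) for x in range(n))

def k_lower_bound(n):
    """Threshold sqrt(-2 log(lam)) that k must exceed."""
    return sqrt(-2 * log(lam(n)))

for n in (2, 3, 5, 10, 50):
    print(n, round(lam(n), 4), round(k_lower_bound(n), 4))
```

For $n = 2$ the root is $d_0 = 0.2$, which gives $0.8^2 = 0.64$ and a threshold of about 0.945, in agreement with the remark.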

4. Conclusion

In this paper, the monotone boundary property and the full coverage property for the Wilson, Agresti–Coull and likelihood ratio confidence intervals of a binomial proportion are shown to hold for any sample size or for any sample size greater than 1. Although the algorithm proposed in Wang (2007) has been used in Wang (2007) and Wang (2009a) to calculate the minimum coverage probability for the three intervals for some sample sizes, there was no explicit demonstration before this study that the algorithm can be used for most sample sizes. With the results in this paper, the procedure can be directly used to calculate the minimum coverage probabilities of the three important confidence intervals without the necessity of checking the conditions in Assumption 1.

References

Blyth, C.R., Still, H.A., 1983. Binomial confidence intervals. J. Amer. Statist. Assoc. 78, 108–116.

Brown, L.D., Cai, T., DasGupta, A., 2001. Interval estimation for a binomial proportion (with discussion). Statist. Sci. 16, 101–133.

Brown, L.D., Cai, T., DasGupta, A., 2002. Confidence intervals for a binomial and asymptotic expansions. Ann. Statist. 30, 160–201.

Cai, T., Wang, H., 2009. Tolerance intervals for discrete distributions in exponential families. Statist. Sinica 19, 905–923.

Lehmann, E.L., 1986. Testing Statistical Hypotheses, second ed. Wiley, New York.

Wang, H., 2007. Exact confidence coefficients of confidence intervals for a binomial proportion. Statist. Sinica 17, 361–368.

Wang, H., 2008. Exact confidence coefficients of simultaneous confidence intervals for multinomial proportions. Journal of Multivariate Analysis 99, 896–911.

Wang, H., Tzeng, Y.H., Li, W.H., 2008. Improved variance estimators for one- and two-parameter models of nucleotide substitution. J. Theoret. Biol. 254, 164–167.

Wang, H., 2009a. Exact average coverage probabilities and confidence coefficients of confidence intervals for discrete distributions. Statist. Comput. 19, 139–148.

Wang, H., 2009b. Comparison of p control charts for low defective rate. Computational Statistics and Data Analysis 53, 4210–4220.
