
4. Performance Comparisons of Four Bootstrap Methods

4.2. Error Probability Analysis

The error probability is the proportion of times that we wrongly reject the null hypothesis $H_0: Y_{q1} \ge Y_{q2}$ while $H_0: Y_{q1} \ge Y_{q2}$ is actually true. For the test, we calculate the proportion of times that the LCB of $Y_{q2} - Y_{q1}$ is positive and the proportion of times that the LCB of $Y_{q2}/Y_{q1}$ is larger than 1. A sample of size $n = 100$ was drawn with $B = 3,000$ bootstrap resamples, and the single simulation was then replicated $N = 3,000$ times.

Figures 2 and 3 show the error probability of the four bootstrap methods for the difference and ratio statistics, respectively, over the 25 combinations (called the 25 cases in the following) tabulated in Table 3. Usually, a reasonable probability of erroneous selection should not exceed a maximum value $\alpha^*$ (the $\alpha^*$-condition). The frequency of erroneous selection is a binomial random variable with $N = 3,000$ and $\alpha^* = 0.05$. A 99% confidence interval for the error probability is then

$$\alpha^* \pm Z_{0.005}\sqrt{\alpha^*(1-\alpha^*)/N} = 0.05 \pm 2.576\sqrt{(0.05 \times 0.95)/3000} = 0.05 \pm 0.0103.$$

That is, if we set $\alpha^* = 0.05$, the reasonable interval is the range from 0.0397 to 0.0603.
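The interval arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name `normal_approx_interval` is hypothetical, and the critical value 2.576 is taken from the text rather than computed.

```python
from math import sqrt

def normal_approx_interval(alpha_star, n_rep, z=2.576):
    """Normal-approximation interval for a binomial proportion.

    alpha_star: nominal error probability (0.05 in the text)
    n_rep:      number of simulation replications N (3,000 in the text)
    z:          critical value for a 99% two-sided interval (2.576 in the text)
    """
    half_width = z * sqrt(alpha_star * (1.0 - alpha_star) / n_rep)
    return alpha_star - half_width, alpha_star + half_width

lower, upper = normal_approx_interval(0.05, 3000)
print(f"half-width = {(upper - lower) / 2:.4f}")
print(f"interval   = ({lower:.4f}, {upper:.4f})")
```

The half-width is inversely proportional to the square root of N, so quadrupling the number of replications halves the band.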

Before selecting the parameters of the error test, we tried many different combinations of $N$, $B$ and $n$. Too low a value of $N$ makes the random error significant, so the tendency between different cases is not obvious. On the other hand, too high a value of $N$ makes the 99% confidence bound too narrow; cases then fall outside the interval easily, and it is difficult to judge which bootstrap method is better. As a result, we tested many combinations of the parameters and finally selected $N = 3,000$, $B = 3,000$, and $n = 100$ to perform the error test. Because different values of $Y_q$ may also affect the layout of the error curves of the four bootstrap methods, different values of $Y_q$ were also simulated under the same combination ($N = 3,000$, $B = 3,000$, $n = 100$). We found that the tendency of the curves became more pronounced as $Y_q$ increased, while the relative location of the curves of the four methods did not change. Based on these tests, we finally selected the case $Y_q = 0.8$ with $N = 3,000$, $B = 3,000$, and $n = 100$ to perform the error test for the four bootstrap methods.
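The structure of the replicated error test can be sketched as follows. This is a minimal illustration under loud assumptions, not the simulation used in this study. The sample mean stands in for the yield index estimator, the data are standard normal, only a percentile-type lower bound is shown, and the names `pb_lower_bound` and `error_probability` are hypothetical.

```python
import random
from statistics import fmean

def pb_lower_bound(x1, x2, n_boot, alpha, rng):
    """Percentile-bootstrap lower confidence bound for stat(x2) - stat(x1).

    ASSUMPTION: the statistic is the sample mean, a stand-in for the
    yield index estimator, which is not defined in this section.
    """
    diffs = []
    for _ in range(n_boot):
        r1 = rng.choices(x1, k=len(x1))  # resample with replacement
        r2 = rng.choices(x2, k=len(x2))
        diffs.append(fmean(r2) - fmean(r1))
    diffs.sort()
    return diffs[int(alpha * n_boot)]    # approximate alpha-quantile

def error_probability(n_rep, n_boot, n, alpha=0.05, seed=0):
    """Fraction of replications whose lower bound is positive although
    both samples come from the same population (the equal-yield case)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_rep):
        x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if pb_lower_bound(x1, x2, n_boot, alpha, rng) > 0:
            rejections += 1
    return rejections / n_rep

# Small settings keep the run short; the text uses N = B = 3,000, n = 100.
print(error_probability(n_rep=100, n_boot=100, n=20, seed=1))
```

With only 100 replications the estimate is noisy, which is the effect described above for a low value of N.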

Figure 2. Error probability of four bootstrap methods under $Y_{q2} - Y_{q1} = 0$ ($Y_{q1} = Y_{q2} = 0.8$). [Plot "difference error": error probability (0.01 to 0.09) versus case number (0 to 25) for the SB, PB, BCPB and BT methods, with reference lines at 0.0603, 0.05 and 0.0397.]

Figure 3. Error probability of four bootstrap methods under $Y_{q2}/Y_{q1} = 1$ ($Y_{q1} = Y_{q2} = 0.8$). [Plot "ratio error": error probability (0.01 to 0.09) versus case number (0 to 25) for the SB, PB, BCPB and BT methods, with reference lines at 0.0603, 0.05 and 0.0397.]

For the difference statistic, three of the 25 cases fell outside the interval (0.0397, 0.0603) for the SB method, and three fell outside these limits for the PB method. Only two cases were outside the interval for the BCPB method, and three were beyond these limits for the BT method. For the ratio test, 3 of the 25 cases were outside the interval (0.0397, 0.0603) for each of the SB, PB and BCPB methods. The BT method had the most cases outside the limits (7 occurrences). The PB, SB and BCPB methods thus had a similar number of cases outside the interval, and the BT method had the fewest cases above the upper bound in the ratio test.

From Tables 4-5, we can further examine the mean and standard deviation of the error probability for the four methods. In the difference test, the mean error probability of the BCPB method was the farthest from the target, while its standard deviation was the lowest of the four methods. In the ratio test, the mean for the BT method was the farthest from the target of 0.05, and its standard deviation was the highest of the four bootstrap methods; the BCPB method again had the lowest standard deviation. The SB method had the mean closest to the target, but its standard deviation was higher than those of the PB and BCPB methods in Table 5. Across the two tests, the BCPB method had a relatively high mean error probability but kept the lowest standard deviation of the four methods. We also performed the error test with $Y_{q1} = Y_{q2} = 0.83$ (see Tables 10-11 and Figures 11-12 in Appendix A) and found that as $Y_q$ became larger, the standard deviation of the error probability of the four methods became larger. In this condition, the BCPB method still had the smallest variation among the four bootstrap methods. For applications in high-quality measurement, the BCPB method therefore keeps the steadiest error probability.
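The per-method summaries discussed here (mean, standard deviation, and number of cases outside the limits) can be computed as in the sketch below. The function name `summarize_cases` is hypothetical, the five input values are made up for illustration and are not taken from the study, and the use of the sample standard deviation is an assumption, since the text does not say which form was used.

```python
from statistics import fmean, stdev

def summarize_cases(error_probs, lower=0.0397, upper=0.0603):
    """Summarize per-case error probabilities for one bootstrap method.

    Cases are numbered from 1. The limits default to the 99% interval
    quoted in the text. ASSUMPTION: sample (n - 1) standard deviation.
    """
    cases_out = [i for i, p in enumerate(error_probs, start=1)
                 if not (lower <= p <= upper)]
    return {
        "mean": fmean(error_probs),
        "std": stdev(error_probs),
        "n_out": len(cases_out),
        "cases_out": cases_out,
    }

# Made-up error probabilities for five cases, for illustration only.
example = [0.052, 0.055, 0.061, 0.049, 0.038]
summary = summarize_cases(example)
print(summary)
```

Running the same function on each method's 25 error probabilities would give one row of such a summary per method.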

Table 4. Error statistics of the four bootstrap methods for the difference test ($Y_{q1} = Y_{q2} = 0.8$).

| Method | Mean error over the 25 cases | Standard deviation of error over the 25 cases | Number of cases out of limits | Cases out of limits |
|---|---|---|---|---|
| SB | 0.0552528 | 0.004475521 | 3 | 21, 22, 23 |
| PB | 0.0556936 | 0.003744748 | 3 | 21, 22, 23 |
| BCPB | 0.0567064 | 0.002709163 | 2 | 3, 22 |
| BT | 0.0550536 | 0.00543679 | 4 | 21, 22, 23, 24 |

Table 5. Error statistics of the four bootstrap methods for the ratio test ($Y_{q1} = Y_{q2} = 0.8$).

| Method | Mean error over the 25 cases | Standard deviation of error over the 25 cases | Number of cases out of limits | Cases out of limits |
|---|---|---|---|---|
| SB | 0.0499332 | 0.005729382 | 3 | 21, 22, 23 |
| PB | 0.0556936 | 0.003744748 | 3 | 21, 22, 23 |
| BCPB | 0.0569744 | 0.00272262 | 3 | 3, 8, 22 |


In addition, we calculated the average lower bound and the standard deviation of the lower bound based on the $N = 3000$, $B = 3000$, $n = 100$ difference trials. Table 6 displays the average lower confidence bound (LCB) and the standard deviation of the LCB for each of the four bootstrap confidence intervals, and the values for all 25 cases and four bootstrap methods are tabulated in Table 12 in Appendix A. Figures 2-3 and Table 6 show the influence of the different cases on the error probability: the average and standard deviation of the LCB differed significantly between cases. By setting different cases and comparing the performance of the four methods, a suitable bootstrap method can be selected. In Tables 4-5, the BT method performed worst; it could not keep a steady error probability across cases. The SB, PB, and BCPB methods performed similarly in the two tests, but the BCPB method had the smallest variation and the fewest out-of-limit cases.
