
14.4 Bootstrap Confidence Intervals

To this point, we have met just one type of inference procedure based on resampling, the bootstrap t confidence interval. We can calculate a bootstrap t confidence interval for any parameter by bootstrapping the corresponding statistic. We don’t need conditions on the population or special knowledge about the sampling distribution of the statistic. The flexible and almost automatic nature of bootstrap t intervals is appealing, but there is a catch. These intervals work well only when the bootstrap distribution tells us that the sampling distribution is approximately normal and has small bias. How well must these conditions be met? What can we do if we don’t trust the bootstrap t interval? In this section we will see how to quickly check t confidence intervals for accuracy and learn alternative bootstrap confidence intervals that can be used more generally than the bootstrap t.

Bootstrap percentile confidence intervals

Confidence intervals are based on the sampling distribution of a statistic. If a statistic has no bias as an estimator of a parameter, its sampling distribution is centered at the true value of the parameter. We can then get a 95% confidence interval by marking off the central 95% of the sampling distribution.

The t critical values in a t confidence interval are a shortcut to marking off the central 95%. The shortcut doesn’t work under all conditions: it depends both on lack of bias and on normality. One way to check whether t intervals (using either bootstrap or formula-based standard errors) are reasonable is to compare them with the central 95% of the bootstrap distribution. The 2.5% and 97.5% percentiles mark off the central 95%. The interval between the 2.5% and 97.5% percentiles of the bootstrap distribution is often used as a confidence interval in its own right. It is known as a bootstrap percentile confidence interval.

BOOTSTRAP PERCENTILE CONFIDENCE INTERVALS

The interval between the 2.5% and 97.5% percentiles of the bootstrap distribution of a statistic is a 95% bootstrap percentile confidence interval for the corresponding parameter. Use this method when the bootstrap estimate of bias is small.
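A minimal sketch of the percentile method in Python, using only the standard library. The function names `bootstrap_distribution` and `percentile_interval_95` are hypothetical, the sample is made up for illustration, and the percentiles come from `statistics.quantiles` with linear interpolation, which is one of several reasonable conventions (S-PLUS may interpolate slightly differently).

```python
import random
import statistics


def bootstrap_distribution(data, statistic, n_resamples=1000, seed=0):
    """Apply `statistic` to n_resamples resamples drawn with replacement from data."""
    rng = random.Random(seed)
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(n_resamples)]


def percentile_interval_95(boot):
    """95% bootstrap percentile interval: the 2.5% and 97.5% percentiles of boot.

    quantiles(n=40) returns the cut points 1/40, 2/40, ..., 39/40,
    so the first and last are the 2.5% and 97.5% percentiles.
    """
    cuts = statistics.quantiles(boot, n=40, method="inclusive")
    return cuts[0], cuts[-1]


# Hypothetical right-skewed sample (not data from the text).
sample = [142, 175, 197, 233, 244, 258, 271, 300, 325, 1100]
boot = bootstrap_distribution(sample, statistics.fmean, n_resamples=2000, seed=1)
low, high = percentile_interval_95(boot)
print(f"observed mean = {statistics.fmean(sample):.1f}")
print(f"95% percentile interval = ({low:.1f}, {high:.1f})")
```

No normal approximation or standard error is involved: the interval is read straight off the sorted bootstrap values.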

The conditions for safe use of bootstrap t and bootstrap percentile intervals are a bit vague. We recommend that you check whether these intervals are reasonable by comparing them with each other. If the bias of the bootstrap distribution is small and the distribution is close to normal, the bootstrap t and percentile confidence intervals will agree closely. Percentile intervals, unlike t intervals, do not ignore skewness. Percentile intervals are therefore usually more accurate, as long as the bias is small. Because we will soon meet much more accurate bootstrap intervals, our recommendation is that when bootstrap t and bootstrap percentile intervals do not agree closely, neither type of interval should be used.
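The recommended check can be sketched by computing both intervals from the same resamples and seeing how far apart their endpoints are. This is an illustration, not the S-PLUS procedure: `compare_t_and_percentile` is a hypothetical name, the data are simulated, and the caller supplies t* (2.009 for 49 degrees of freedom, the value used in Example 14.9). Because the text leaves "agree closely" to judgment, the sketch reports the endpoint gaps in units of SE_boot rather than imposing a cutoff.

```python
import random
import statistics


def compare_t_and_percentile(data, statistic, t_star, n_resamples=1000, seed=0):
    """Bootstrap t and percentile 95% intervals computed from the same resamples.

    t_star is supplied by the caller (e.g. 2.009 for df = 49); it is not computed here.
    """
    rng = random.Random(seed)
    n = len(data)
    boot = [statistic(rng.choices(data, k=n)) for _ in range(n_resamples)]
    observed = statistic(data)
    se_boot = statistics.stdev(boot)            # bootstrap standard error
    bias = statistics.fmean(boot) - observed    # bootstrap estimate of bias
    t_interval = (observed - t_star * se_boot, observed + t_star * se_boot)
    cuts = statistics.quantiles(boot, n=40, method="inclusive")
    pct_interval = (cuts[0], cuts[-1])          # 2.5% and 97.5% percentiles
    # Endpoint differences measured in units of SE_boot.
    gaps = tuple(abs(a - b) / se_boot for a, b in zip(t_interval, pct_interval))
    return {"observed": observed, "bias": bias, "se_boot": se_boot,
            "t": t_interval, "percentile": pct_interval, "gaps_in_se": gaps}


# Hypothetical sample of size 50, so df = 49 and t* is about 2.009.
gen = random.Random(42)
data = [gen.gauss(100, 15) for _ in range(50)]
result = compare_t_and_percentile(data, statistics.fmean, t_star=2.009)
for key, value in result.items():
    print(key, value)
```

Small gaps together with a small bias suggest both intervals are usable; large gaps are the signal, in the text's recommendation, to use neither.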

EXAMPLE 14.8 In Example 14.5 (page 14-18) we found that a 95% bootstrap t confidence interval for the 25% trimmed mean of Seattle real estate sales prices is 210.2 to 277.8. The bootstrap distribution in Figure 14.7 shows a small bias and, though roughly normal, is a bit skewed. Is the bootstrap t confidence interval accurate for these data?

The S-PLUS bootstrap output includes the 2.5% and 97.5% percentiles of the bootstrap distribution. They are 213.1 and 279.4. These are the endpoints of the 95% bootstrap percentile confidence interval. This interval is quite close to the bootstrap t interval. We conclude that both intervals are reasonably accurate.

The bootstrap t interval for the trimmed mean of real estate sales in Example 14.8 is

$$\bar{x}_{25\%} \pm t^*\,\mathrm{SE}_{\mathrm{boot}} = 244 \pm 33.81$$

We can learn something by also writing the percentile interval starting at the statistic $\bar{x}_{25\%} = 244$. In this form, it is

$$(244.0 - 30.9,\ 244.0 + 35.4)$$

Unlike the t interval, the percentile interval is not symmetric—its endpoints are different distances from the statistic. The slightly greater distance to the 97.5% percentile reflects the slight right skewness of the bootstrap distribution.
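Rewriting an interval as distances below and above the statistic is simple arithmetic. The sketch below applies it to the Example 14.8 numbers from the text; `distances_from_estimate` is a hypothetical helper name.

```python
def distances_from_estimate(estimate, interval):
    """Rewrite (low, high) as distances below and above the estimate."""
    low, high = interval
    return estimate - low, high - estimate


estimate = 244.0                       # 25% trimmed mean, Example 14.8
t_interval = (244.0 - 33.81, 244.0 + 33.81)
percentile_interval = (213.1, 279.4)   # 2.5% and 97.5% bootstrap percentiles

below_t, above_t = distances_from_estimate(estimate, t_interval)
below_p, above_p = distances_from_estimate(estimate, percentile_interval)
print(round(below_t, 2), round(above_t, 2))  # prints: 33.81 33.81
print(round(below_p, 1), round(above_p, 1))  # prints: 30.9 35.4
print("longer upper tail" if above_p > below_p else "longer lower tail")
```

The t interval is symmetric by construction; only the percentile interval can register the skewness.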

Confidence intervals for the correlation

The bootstrap allows us to find standard errors and confidence intervals for a wide variety of statistics. We have done this for the mean and the trimmed mean. We also learned how to find the bootstrap distribution for a difference of means, but that distribution for the Verizon data (Example 14.6, page 14-19) is so far from normal that we are reluctant to use the bootstrap t or percentile confidence intervals. Now we will bootstrap the correlation coefficient. This is our first use of the bootstrap for a statistic that depends on two related variables. As with the difference of means, we must pay attention to how we should resample.

EXAMPLE 14.9 Major League Baseball (MLB) owners claim they need limitations on player salaries to maintain competitiveness among richer and poorer teams. This argument assumes that higher salaries attract better players. Is there a relationship between an MLB player’s salary and his performance?

Table 14.2 contains the names, 2002 salaries, and career batting averages of 50 randomly selected MLB players (excluding pitchers).⁹ The scatterplot in Figure 14.15 suggests that the relationship between salary and batting average is weak. The sample correlation is r = 0.107. Is this small correlation significantly different from 0? To find out, we can calculate a 95% confidence interval for the population correlation and see whether or not it covers 0. If the confidence interval does not cover 0, the observed correlation is significant at the 5% level.

TABLE 14.2 Major League Baseball salaries and batting averages

Name              Salary      Average   Name               Salary      Average
Matt Williams     $9,500,000  0.269     Greg Colbrunn      $1,800,000  0.307
Jim Thome         $8,000,000  0.282     Dave Martinez      $1,500,000  0.276
Jim Edmonds       $7,333,333  0.327     Einar Diaz         $1,087,500  0.216
Fred McGriff      $7,250,000  0.259     Brian L. Hunter    $1,000,000  0.289
Jermaine Dye      $7,166,667  0.240     David Ortiz        $950,000    0.237
Edgar Martinez    $7,086,668  0.270     Luis Alicea        $800,000    0.202
Jeff Cirillo      $6,375,000  0.253     Ron Coomer         $750,000    0.344
Rey Ordonez       $6,250,000  0.238     Enrique Wilson     $720,000    0.185
Edgardo Alfonzo   $6,200,000  0.300     Dave Hansen        $675,000    0.234
Moises Alou       $6,000,000  0.247     Alfonso Soriano    $630,000    0.324
Travis Fryman     $5,825,000  0.213     Keith Lockhart     $600,000    0.200
Kevin Young       $5,625,000  0.238     Mike Mordecai      $500,000    0.214
M. Grudzielanek   $5,000,000  0.245     Julio Lugo         $325,000    0.262
Tony Batista      $4,900,000  0.276     Mark L. Johnson    $320,000    0.207
Fernando Tatis    $4,500,000  0.268     Jason LaRue        $305,000    0.233
Doug Glanville    $4,000,000  0.221     Doug Mientkiewicz  $285,000    0.259
Miguel Tejada     $3,625,000  0.301     Jay Gibbons        $232,500    0.250
Bill Mueller      $3,450,000  0.242     Corey Patterson    $227,500    0.278
Mark McLemore     $3,150,000  0.273     Felipe Lopez       $221,000    0.237
Vinny Castilla    $3,000,000  0.250     Nick Johnson       $220,650    0.235
Brook Fordyce     $2,500,000  0.208     Thomas Wilson      $220,000    0.243
Torii Hunter      $2,400,000  0.306     Dave Roberts       $217,500    0.297
Michael Tucker    $2,250,000  0.235     Pablo Ozuna        $202,000    0.333
Eric Chavez       $2,125,000  0.277     Alexis Sanchez     $202,000    0.301
Aaron Boone       $2,100,000  0.227     Abraham Nunez      $200,000    0.224

FIGURE 14.15 Career batting average and 2002 salary for a random sample of 50 Major League Baseball players. [Figure: scatterplot of salary (in millions of dollars) against batting average.]

How shall we resample from Table 14.2? Because each observation consists of the batting average and salary for one player, we resample players (that is, observations). Resampling batting averages and salaries separately would lose the tie between a player’s batting average and his salary. Software such as S-PLUS automates proper resampling. Once we have produced a bootstrap distribution by resampling, we can examine the distribution and form a confidence interval in the usual way. We need no special formulas or procedures to handle the correlation.
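A sketch of resampling whole observations for the correlation, assuming the pairs below are a faithful transcription of Table 14.2 (salary in dollars, career batting average). `pearson_r` and `bootstrap_correlation` are hypothetical helpers, not S-PLUS functions. If the transcription is correct, the observed r should match the 0.107 reported in the text; the bootstrap standard error depends on the random resamples, so it will differ somewhat from 0.125.

```python
import random
import statistics

# (salary in dollars, career batting average) for the 50 players in Table 14.2.
PLAYERS = [
    (9_500_000, .269), (8_000_000, .282), (7_333_333, .327), (7_250_000, .259),
    (7_166_667, .240), (7_086_668, .270), (6_375_000, .253), (6_250_000, .238),
    (6_200_000, .300), (6_000_000, .247), (5_825_000, .213), (5_625_000, .238),
    (5_000_000, .245), (4_900_000, .276), (4_500_000, .268), (4_000_000, .221),
    (3_625_000, .301), (3_450_000, .242), (3_150_000, .273), (3_000_000, .250),
    (2_500_000, .208), (2_400_000, .306), (2_250_000, .235), (2_125_000, .277),
    (2_100_000, .227), (1_800_000, .307), (1_500_000, .276), (1_087_500, .216),
    (1_000_000, .289), (950_000, .237), (800_000, .202), (750_000, .344),
    (720_000, .185), (675_000, .234), (630_000, .324), (600_000, .200),
    (500_000, .214), (325_000, .262), (320_000, .207), (305_000, .233),
    (285_000, .259), (232_500, .250), (227_500, .278), (221_000, .237),
    (220_650, .235), (220_000, .243), (217_500, .297), (202_000, .333),
    (202_000, .301), (200_000, .224),
]


def pearson_r(pairs):
    """Sample correlation r of (x, y) pairs; None if x or y has no spread."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / (sxx * syy) ** 0.5


def bootstrap_correlation(pairs, n_resamples=1000, seed=0):
    """Resample whole players (x, y pairs) so each salary stays with its own average."""
    rng = random.Random(seed)
    n = len(pairs)
    boot = []
    while len(boot) < n_resamples:
        r = pearson_r(rng.choices(pairs, k=n))
        if r is not None:        # skip the (very rare) resample with no spread
            boot.append(r)
    return boot


r_obs = pearson_r(PLAYERS)
boot = bootstrap_correlation(PLAYERS, n_resamples=1000, seed=2002)
print(f"r = {r_obs:.3f}, SE_boot = {statistics.stdev(boot):.3f}")
```

The key line is `rng.choices(pairs, k=n)`: it draws players, never salaries and averages independently.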

Figure 14.16 shows the bootstrap distribution and normal quantile plot for the sample correlation for 1000 resamples from the 50 players in our sample.

The bootstrap distribution is close to normal and has small bias, so a 95% bootstrap t confidence interval appears reasonable.

The bootstrap standard error is $\mathrm{SE}_{\mathrm{boot}} = 0.125$. The t interval using the bootstrap standard error is

$$r \pm t^*\,\mathrm{SE}_{\mathrm{boot}} = 0.107 \pm (2.009)(0.125) = 0.107 \pm 0.251 = (-0.144,\ 0.358)$$

The 95% bootstrap percentile interval is

$$(2.5\%\text{ percentile},\ 97.5\%\text{ percentile}) = (-0.128,\ 0.356) = (0.107 - 0.235,\ 0.107 + 0.249)$$

The two confidence intervals are in reasonable agreement.
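The arithmetic of both intervals, and the check of whether each covers 0, can be reproduced from the values reported in Example 14.9. `covers_zero` is a hypothetical helper name.

```python
r = 0.107          # sample correlation, Example 14.9
se_boot = 0.125    # bootstrap standard error
t_star = 2.009     # t critical value with 50 - 1 = 49 degrees of freedom

margin = t_star * se_boot
t_interval = (r - margin, r + margin)
pct_interval = (-0.128, 0.356)   # 2.5% and 97.5% bootstrap percentiles


def covers_zero(interval):
    """True when 0 lies inside the interval, i.e. r is not significant at the 5% level."""
    low, high = interval
    return low <= 0 <= high


print(f"margin = {margin:.3f}")                                    # prints: margin = 0.251
print(f"t interval = ({t_interval[0]:.3f}, {t_interval[1]:.3f})")  # prints: t interval = (-0.144, 0.358)
print(f"{r - pct_interval[0]:.3f} below, {pct_interval[1] - r:.3f} above")  # prints: 0.235 below, 0.249 above
print(covers_zero(t_interval), covers_zero(pct_interval))          # prints: True True
```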

The confidence intervals give a wide range for the population correlation, and both include 0. These data do not provide significant evidence that there is a relationship between salary and batting average. A larger sample might result in a significant relationship, but the evidence from this sample suggests that any relationship is quite weak. Of course, batting average is only one facet of a player’s performance. It is possible that there may be a significant salary-performance relationship if we include several measures of performance.

FIGURE 14.16 The bootstrap distribution and normal quantile plot for the correlation r for 1000 resamples from the baseball player data in Table 14.2. The solid double-ended arrow below the distribution is the t interval, and the dashed arrow is the percentile interval. [Figure: (a) histogram of the bootstrap correlation coefficients, with the observed value and the bootstrap mean marked; (b) normal quantile plot of the correlation coefficients against z-score.]

More accurate bootstrap confidence intervals: BCa and tilting

Any method for obtaining confidence intervals requires some conditions in order to produce exactly the intended confidence level. These conditions (for example, normality) are never exactly met in practice. So a 95% confidence interval in practice will not capture the true parameter value exactly 95% of the time. In addition to “hitting” the parameter 95% of the time, a good confidence interval should divide its 5% of “misses” equally between high misses and low misses. We will say that a method for obtaining 95% confidence intervals is accurate in a particular setting if 95% of the time it produces intervals that capture the parameter and if the 5% misses are equally shared between high and low misses. Perfect accuracy isn’t available in practice, but some methods are more accurate than others.
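The definition of accuracy can be made concrete with a small simulation, sketched here under assumptions of our own choosing: an exponential population with mean 1 (strongly right-skewed), samples of size 20, and 95% percentile intervals for the mean. An accurate method would give hits near 0.95, with low misses and high misses each near 0.025. The function names are hypothetical, and the trial and resample counts are kept small so the sketch runs quickly, which makes the estimated rates noisy.

```python
import random
import statistics


def percentile_ci_for_mean(sample, rng, n_resamples):
    """95% bootstrap percentile interval for the mean."""
    n = len(sample)
    boot = [statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)]
    cuts = statistics.quantiles(boot, n=40, method="inclusive")
    return cuts[0], cuts[-1]


def simulate_accuracy(n=20, trials=200, n_resamples=400, seed=0):
    """Hit rate and low/high miss rates for repeated samples from an
    exponential population whose true mean is 1."""
    rng = random.Random(seed)
    true_mean = 1.0
    hits = low_misses = high_misses = 0
    for _ in range(trials):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        low, high = percentile_ci_for_mean(sample, rng, n_resamples)
        if high < true_mean:
            low_misses += 1      # whole interval lies below the parameter
        elif low > true_mean:
            high_misses += 1     # whole interval lies above the parameter
        else:
            hits += 1
    return hits / trials, low_misses / trials, high_misses / trials


hit, low_miss, high_miss = simulate_accuracy()
print(f"hits {hit:.3f}, low misses {low_miss:.3f}, high misses {high_miss:.3f}")
```

Raising `trials` and `n_resamples` gives steadier estimates at the cost of run time.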

One advantage of the bootstrap is that we can to some extent check the accuracy of the bootstrap t and percentile confidence intervals by examining the bootstrap distribution for bias and skewness and by comparing the two intervals with each other. The intervals in Examples 14.8 and 14.9 reveal some right skewness, but not enough to invalidate inference. The bootstrap distribution in Figure 14.9 (page 14-21) for comparing two means, on the other hand, is so skewed that we hesitate to use the t or percentile intervals. In general, the t and percentile intervals may not be sufficiently accurate when

• the statistic is strongly biased, as indicated by the bootstrap estimate of bias;

• the sampling distribution of the statistic is clearly skewed, as indicated by the bootstrap distribution and by comparing the t and percentile intervals;

• we require high accuracy because the stakes are high (large sums of money or public welfare).
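The first two warning signs can be computed from the bootstrap distribution itself. The hypothetical helper below reports the bootstrap estimate of bias, that bias as a fraction of SE_boot, and the moment skewness of the bootstrap distribution; the text gives no numeric cutoffs, so neither does the sketch. The example contrasts the mean with `statistics.pvariance`, which divides by n and is therefore biased downward, on simulated data.

```python
import random
import statistics


def bootstrap_diagnostics(data, statistic, n_resamples=2000, seed=0):
    """Bias estimate, bias / SE_boot, and skewness of the bootstrap distribution."""
    rng = random.Random(seed)
    n = len(data)
    boot = [statistic(rng.choices(data, k=n)) for _ in range(n_resamples)]
    observed = statistic(data)
    mean_boot = statistics.fmean(boot)
    se_boot = statistics.stdev(boot)
    bias = mean_boot - observed
    # Moment skewness: average cubed deviation divided by the cubed (population) SD.
    sd = statistics.pstdev(boot)
    skew = statistics.fmean([(b - mean_boot) ** 3 for b in boot]) / sd ** 3
    return {"bias": bias, "bias_over_se": bias / se_boot, "skewness": skew}


# Hypothetical normal sample of size 20.
gen = random.Random(3)
data = [gen.gauss(0, 1) for _ in range(20)]
print("pvariance:", bootstrap_diagnostics(data, statistics.pvariance))
print("mean:     ", bootstrap_diagnostics(data, statistics.fmean))
```

For the divide-by-n variance the bias estimate should come out clearly negative, while for the mean it should be near zero relative to SE_boot.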

Most confidence interval procedures are more accurate for larger sample sizes. The t and percentile procedures improve only slowly: they require 100 times more data to improve accuracy by a factor of 10. (Recall the $\sqrt{n}$ in the formula for the usual one-sample t interval.) These intervals may not be very accurate except for quite large sample sizes. There are more elaborate bootstrap procedures that improve faster, requiring only 10 times more data to improve accuracy by a factor of 10. These procedures are quite accurate unless the sample size is very small.
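The scaling argument can be written out, assuming (as the paragraph implies) that the inaccuracy of t and percentile intervals shrinks in proportion to $1/\sqrt{n}$ and that of the more elaborate procedures (such as the BCa and tilting intervals introduced next) in proportion to $1/n$:

```latex
\begin{aligned}
\text{t and percentile:}\quad & \text{error} \propto \frac{1}{\sqrt{n}}, &
\frac{1/\sqrt{100n}}{1/\sqrt{n}} &= \frac{1}{\sqrt{100}} = \frac{1}{10},\\
\text{BCa and tilting:}\quad & \text{error} \propto \frac{1}{n}, &
\frac{1/(10n)}{1/n} &= \frac{1}{10}.
\end{aligned}
```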

BCa AND TILTING CONFIDENCE INTERVALS

The bootstrap bias-corrected accelerated (BCa) interval is a modification of the percentile method that adjusts the percentiles to correct for bias and skewness.

The bootstrap tilting interval adjusts the process of randomly forming resamples (though a clever implementation allows use of the same resamples as other bootstrap methods).
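The BCa adjustment can be sketched with the standard formulas from the bootstrap literature: a bias correction z0 from the fraction of bootstrap values below the observed statistic, and an acceleration a computed from jackknife (leave-one-out) values. This is a minimal standard-library sketch, not the S-PLUS implementation; `bca_interval` and `percentile` are hypothetical names, the data are simulated, and tilting is not shown. It uses 5000 resamples, in line with the advice below for high accuracy.

```python
import random
import statistics
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def percentile(sorted_values, p):
    """Linear-interpolation percentile of already-sorted values, for 0 <= p <= 1."""
    pos = p * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def bca_interval(data, statistic, level=0.95, n_resamples=5000, seed=0):
    """Bias-corrected accelerated (BCa) bootstrap interval (a sketch)."""
    rng = random.Random(seed)
    n = len(data)
    observed = statistic(data)
    boot = sorted(statistic(rng.choices(data, k=n)) for _ in range(n_resamples))

    # Bias correction z0: normal score of the fraction of resamples below the observed value.
    frac_below = sum(b < observed for b in boot) / n_resamples
    frac_below = min(max(frac_below, 1 / n_resamples), 1 - 1 / n_resamples)  # keep inv_cdf finite
    z0 = STD_NORMAL.inv_cdf(frac_below)

    # Acceleration a: a skewness measure built from the jackknife values of the statistic.
    jack = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    jack_mean = statistics.fmean(jack)
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    a = num / den if den > 0 else 0.0

    # Shift the nominal tail probabilities (0.025 and 0.975 for level=0.95).
    tail = (1 - level) / 2
    ends = []
    for z in (STD_NORMAL.inv_cdf(tail), STD_NORMAL.inv_cdf(1 - tail)):
        adjusted_p = STD_NORMAL.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))
        ends.append(percentile(boot, adjusted_p))
    return tuple(ends)


# Hypothetical right-skewed "prices" (not the Seattle data).
gen = random.Random(7)
prices = [gen.expovariate(1 / 300) for _ in range(50)]
low, high = bca_interval(prices, statistics.fmean)
print(f"mean = {statistics.fmean(prices):.1f}, 95% BCa = ({low:.1f}, {high:.1f})")
```

When z0 and a are both 0, the adjusted probabilities are exactly 0.025 and 0.975, and the sketch reduces to the ordinary percentile interval.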

These two methods are accurate in a wide variety of settings, have reasonable computation requirements (by modern standards), and do not produce excessively wide intervals. The BCa intervals are more widely used. Both are based on the big ideas of resampling and the bootstrap distribution. Now that you understand the big ideas, you should always use one of these more accurate methods if your software offers them. We did not meet them earlier because the details of producing the confidence intervals are quite technical.¹⁰ The BCa method requires more than 1000 resamples for high accuracy. Use 5000 or more resamples if the accuracy of inference is very important. Tilting is more efficient, so that 1000 resamples are generally enough. Don’t forget that even BCa and tilting confidence intervals should be used cautiously when sample sizes are small, because there are not enough data to accurately determine the necessary corrections for bias and skewness.

EXAMPLE 14.10 The 2002 Seattle real estate sales data are strongly skewed (Figure 14.6). Figure 14.17 shows the bootstrap distribution of the sample mean $\bar{x}$. We see that the skewness persists in the bootstrap distribution and therefore in the sampling distribution. Inference based on a normal sampling distribution is not appropriate.

We generally prefer resistant measures of center such as the median or trimmed mean for skewed data. Accordingly, in Example 14.5 (page 14-18) we bootstrapped the 25% trimmed mean. However, the mean is easily understood by the public and is needed for some purposes, such as projecting taxes based on total sales value.

The bootstrap t and percentile intervals aren’t reliable when the sampling distribution of the statistic is skewed. Figure 14.18 shows software output that includes all four of the confidence intervals we have mentioned, along with the traditional one-sample t interval. The BCa interval is

(329.3 − 62.2, 329.3 + 127.0) = (267.1, 456.3)

FIGURE 14.17 The bootstrap distribution of the sample means of 5000 resamples from the data in Table 14.1, for Example 14.10. The bootstrap distribution is right-skewed, so we conclude that the sampling distribution of $\bar{x}$ is right-skewed as well. [Figure: histogram of resample means, in thousands of dollars, with the observed mean and the bootstrap mean marked.]

One-sample t-Test

data:  Price in Seattle2002
t = 7.3484, df = 49, p-value = 0
alternative hypothesis: mean is not equal to 0
95 percent confidence interval:
 239.2150 419.2992
sample estimates:
 mean of x
  329.2571

Number of Replications: 5000

Summary Statistics:
     Observed  Mean     Bias     SE
mean    329.3 328.4  -0.8679  43.68

Percentiles:
         2.5%       5%      95%    97.5%
mean 253.5264 263.1985 406.4151  425.513

BCa Confidence Intervals:
         2.5%       5%      95%    97.5%
mean 267.0683 275.5542 433.4044 456.2938

Tilting Confidence Intervals:
         2.5%       5%      95%    97.5%
mean 263.1428 272.4917 430.7042 455.2483

T Confidence Intervals using Bootstrap Standard Errors:
         2.5%       5%      95%    97.5%
mean 241.4652 256.0183  402.496 417.0491

FIGURE 14.18 S-PLUS output for bootstrapping the mean of the Seattle real estate selling price data, for Example 14.10. The output includes four types of confidence intervals for the population mean.

and the tilting interval is

(329.3 − 66.2, 329.3 + 125.9) = (263.1, 455.2)

These intervals agree closely. Both are strongly asymmetrical: the upper endpoint is about twice as far from the sample mean as the lower endpoint. This reflects the strong right skewness of the bootstrap distribution.

The output in Figure 14.18 also shows that both endpoints of the less-accurate intervals (one-sample t, bootstrap t, and percentile) are too low. These intervals miss the population mean on the low side too often (more than 2.5%) and miss on the high side too seldom. They give a biased picture of where the true mean is likely to be.
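Both the asymmetry of the BCa and tilting intervals and the claim that the less-accurate intervals sit too low can be checked directly from the numbers printed in Figure 14.18. This is plain arithmetic on the reported output, not a rerun of the bootstrap.

```python
# 95% intervals for the mean Seattle selling price ($1000s), copied from Figure 14.18.
observed = 329.2571
intervals = {
    "one-sample t": (239.2150, 419.2992),
    "bootstrap t": (241.4652, 417.0491),
    "percentile": (253.5264, 425.513),
    "BCa": (267.0683, 456.2938),
    "tilting": (263.1428, 455.2483),
}

# Distance of each endpoint from the sample mean, and the upper/lower ratio.
for name, (low, high) in intervals.items():
    below, above = observed - low, high - observed
    print(f"{name:13s} below {below:6.1f}  above {above:6.1f}  ratio {above / below:.2f}")

# Compare the less-accurate intervals with the smaller of the BCa and tilting endpoints.
accurate_low = min(intervals["BCa"][0], intervals["tilting"][0])
accurate_high = min(intervals["BCa"][1], intervals["tilting"][1])
for name in ("one-sample t", "bootstrap t", "percentile"):
    low, high = intervals[name]
    both_lower = low < accurate_low and high < accurate_high
    print(f"{name}: both endpoints lower than BCa and tilting? {both_lower}")
```

With these numbers the upper-to-lower ratio is about 2.0 for BCa and about 1.9 for tilting, matching the "about twice as far" description, and all three less-accurate intervals report True.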

While the BCa and tilting calculations are radically different, the results tend to be about the same, except for random variation in the BCa if the number of resamples is less than about 5000. Both procedures are accurate, so we expect them to produce similar results unless a small sample size makes any inference dubious.
