Bootstrap and Jackknife Calculations in R


Version 6 April 2004

These notes work through a simple example to show how one can program R to do both jackknife and bootstrap sampling. We start with bootstrapping.

Bootstrap Calculations

R has a number of nice features for easy calculation of bootstrap estimates and confidence intervals. To see how to use these features, consider the following 25 observations:

8.26 6.33 10.4 5.27 5.35 5.61 6.12 6.19 5.2 7.01 8.74 7.78 7.02 6 6.5 5.8 5.12 7.41 6.52 6.21 12.28 5.6 5.38 6.6 8.74

Suppose we wish to estimate the coefficient of variation, $\mathrm{CV} = \sqrt{\mathrm{Var}(x)}\,/\,\bar{x}$. Let's do this with a bootstrap estimator.

First, let's put the data into a vector, which we will call x,

> x <-c(8.26, 6.33, 10.4, 5.27, 5.35, 5.61, 6.12, 6.19, 5.2, 7.01, 8.74, 7.78, 7.02, 6, 6.5, 5.8, 5.12, 7.41, 6.52, 6.21, 12.28, 5.6, 5.38, 6.6, 8.74)

Now let's define a function in R, which we will call CV, to compute the coefficient of variation,

> CV <- function(x) sqrt(var(x))/mean(x)

So, let's compute the CV:

> CV(x)
[1] 0.2524712
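As an aside (not part of the original notes), R's built-in sd() returns the sample standard deviation sqrt(var(x)), so an equivalent definition would be

> CV <- function(x) sd(x)/mean(x)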

To generate a single bootstrap sample from this data vector, we use the command

> sample(x,replace=T)

which generates a bootstrap sample of the data vector x by sampling with replacement.

Hence, to compute the CV using a single bootstrap sample,

> CV(sample(x,replace=T))
[1] 0.2242572

The particular value that R returns for you will be different, as the sample is random.
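As an aside (not in the original notes), if you want your bootstrap results to be reproducible, you can fix R's random number seed before sampling; the seed value used here is arbitrary:

> set.seed(1)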

Some other useful commands:

> sum(x) returns the sum of the elements in x

> mean(x) returns the mean of the elements in x

> var(x) returns the sample variance, i.e., $\sum_i (x_i - \bar{x})^2/(n-1)$

> length(x) returns the number of items in x (i.e., the sample size n)


Note that the sum command is fairly general; for example,

> sum((x-mean(x))^2) computes $\sum_i (x_i - \bar{x})^2$
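As a quick check (an addition to the notes), dividing this sum by n - 1 recovers var(x):

> all.equal(var(x), sum((x - mean(x))^2)/(length(x) - 1))
[1] TRUE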

So, let's now generate 1000 bootstrap samples. We first need to specify a vector of real values of length 1000, which we will call boot,

> boot <- numeric(1000)

We now generate 1000 samples, and assign the CV for bootstrap sample i as the ith element in the vector boot, using a for loop

for (i in 1:1000) boot[i] <- CV(sample(x,replace=T))
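Equivalently (an aside, not from the original notes), base R's replicate() builds the same vector of 1000 bootstrap CVs without an explicit loop:

boot <- replicate(1000, CV(sample(x,replace=T)))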

The mean and variance of this collection of bootstrap samples are easily obtained using the mean and var commands (again, your values may differ),

> mean(boot)
[1] 0.2404653

> var(boot)
[1] 0.00193073

A histogram of these values can be plotted using hist(boot).
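For example, a labelled version of the same call (the axis label and title are our additions, not part of the original notes):

> hist(boot, xlab="bootstrap CV", main="1000 bootstrap estimates of the CV")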

Likewise, the value corresponding to the (say) upper 97.5% follows from

> quantile(boot,0.975)
[1] 0.3176385

while the value corresponding to the lower 2.5% follows from

> quantile(boot,0.025)
[1] 0.153469

Recall from the notes that the estimate of the bias is given by the difference between the mean of the bootstrap values and the initial estimate,


> bias <- mean(boot) - CV(x)

and a bootstrap-corrected estimate of the CV is just the original estimate minus the bias,

> CV(x) - bias
[1] 0.2644771

Assuming normality, the approximate 95% confidence interval is given by $\widehat{\mathrm{CV}} \pm 1.96\sqrt{\mathrm{Var}(\mathrm{bootstrap})}$, or, adjusting for the bias, lower and upper values of

> CV(x) - bias - 1.96*sqrt(var(boot))
[1] 0.1783546

> CV(x) - bias + 1.96*sqrt(var(boot))
[1] 0.3505997

Efron's confidence limits (Equation 11 in the resampling notes) have upper and lower values of

> quantile(boot,0.975)
[1] 0.3176385

and

> quantile(boot,0.025)
[1] 0.153469

while Hall's confidence limits (Equation 12) have upper and lower values of

> 2*CV(x) - quantile(boot,0.025)
[1] 0.3514734

and

> 2*CV(x) - quantile(boot,0.975)
[1] 0.1873039
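The bootstrap steps above can be collected into a single helper function. The following is a sketch only; the function name boot.summary, its arguments stat and B, and the list element names are our own choices, not part of the original notes:

boot.summary <- function(x, stat, B=1000) {
    theta <- stat(x)                        # estimate from the full data
    boot <- numeric(B)
    for (i in 1:B) boot[i] <- stat(sample(x, replace=T))
    bias <- mean(boot) - theta
    corrected <- theta - bias               # bias-corrected estimate
    se <- sqrt(var(boot))
    lo <- quantile(boot, 0.025)
    hi <- quantile(boot, 0.975)
    list(estimate = theta,
         corrected = corrected,
         normal = c(corrected - 1.96*se, corrected + 1.96*se),  # normality interval
         efron = c(lo, hi),                                     # Efron's limits
         hall = c(2*theta - hi, 2*theta - lo))                  # Hall's limits
}

For example, boot.summary(x, CV) reproduces (up to resampling noise) the values computed above.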

Jackknife Calculations

We now turn to jackknifing the sample. Recall from the randomization notes that this involves two steps. First, we generate a jackknife sample which has the value $x_i$ removed, and then compute the ith partial estimate of the test statistic using this sample,

$$\widehat{\theta}_{-i} = \theta(x_1, \cdots, x_{i-1}, x_{i+1}, \cdots, x_n)$$

We then turn this ith partial estimate into the ith pseudovalue $\widehat{\theta}_i$ using (Equation 5c in the randomization notes)

$$\widehat{\theta}_i = n\,\widehat{\theta} - (n-1)\,\widehat{\theta}_{-i}$$

where $\widehat{\theta}$ is the estimate using the full data.

Let's see how to code this in R using the previous vector x of data, with our test statistic again being the coefficient of variation (and hence our function CV previously defined).

We first focus on generating the ith partial estimate and ith pseudovalue. We need to take the original data vector x and turn it into a vector (which we denote jack) of length n - 1 as follows. First, we need to specify to R that we are creating the jackknife sample vector of the n - 1 sampled points

jack <- numeric(length(x)-1)

As before, we will use the command length(x) in place of n. We also need to specify to R that we will be generating a vector pseudo of the n pseudovalues

pseudo <- numeric(length(x))

Next, we need to fill in the elements of the jack sample vector as follows. For j < i, the jth element of jack is the same as the jth element of x; for j = i we exclude the value $x_i$; while for j > i, the (j-1)th element of jack is the jth element of x. We can state all this using a logical if .. else statement within a for loop,

for (j in 1:length(x)) if(j < i) jack[j] <- x[j]

else if(j > i) jack[j-1] <- x[j]

We can then compute the ith pseudovalue (for the CV) as follows:

pseudo[i] <- length(x)*CV(x) - (length(x)-1)*CV(jack)

Finally, we top this all off by looping through the n possible i values, giving the final code as

jack <- numeric(length(x)-1)
pseudo <- numeric(length(x))
for (i in 1:length(x)) {
  for (j in 1:length(x))
    {if (j < i) jack[j] <- x[j] else if (j > i) jack[j-1] <- x[j]}
  pseudo[i] <- length(x)*CV(x) - (length(x)-1)*CV(jack)
}

Note the use of the braces ({, }) to delimit the appropriate elements in each loop. The mean and variance of the pseudovalues are easily found using

> mean(pseudo)
[1] 0.2617376

> var(pseudo)
[1] 0.07262871

Likewise, a histogram of the pseudovalues is generated using hist(pseudo)

Recall that the mean of the pseudovalues is the jackknife estimator, while var(pseudo)/n is the variance of this estimator,

> var(pseudo)/length(x)
[1] 0.002905148


An approximate 95% confidence interval is given by $\mathrm{mean(pseudo)} \pm t_{0.975,\,n-1}\sqrt{\mathrm{var(pseudo)}/n}$. Using R, the upper and lower limits become

> mean(pseudo) + qt(0.975,length(x)-1)*sqrt(var(pseudo)/length(x))
[1] 0.3729806

> mean(pseudo) - qt(0.975,length(x)-1)*sqrt(var(pseudo)/length(x))
[1] 0.1504947

This gives the approximate 95% jackknife confidence interval as 0.150 to 0.373.
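As an aside not in the original notes, the double loop above can be written more compactly using R's negative indexing, where x[-i] drops the ith observation; the object name pseudo2 is ours:

n <- length(x)
pseudo2 <- sapply(1:n, function(i) n*CV(x) - (n-1)*CV(x[-i]))   # ith pseudovalue

mean(pseudo2) and var(pseudo2) match the values obtained above.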

Here's a summary of the various estimated values, variances, and confidence intervals:

Method                  Estimated CV   Variance   95% interval
Original Estimate       0.252
Jackknife               0.262          0.0029     0.150 - 0.373
Bootstrap               0.264          0.0019
Bootstrap (normality)                             0.178 - 0.351
Bootstrap (Efron)                                 0.153 - 0.318
Bootstrap (Hall)                                  0.187 - 0.351
