### Financial Time Series I

### Topic 3: Resampling Methods

### Hung Chen

### Department of Mathematics, National Taiwan University

### 11/11/2002

OUTLINE

1. Motivated Example
– Double-Blind Randomized Experiment
– Bootstrap with R

2. Odds Ratio
– Definition
– Random Design
– Prospective Study
– Retrospective Study

3. Bootstrap Method
– Parametric Bootstrap
– Nonparametric Bootstrap
– Failure of the bootstrap method

The Practice of Statistics

• Statistics is the science of learning from experience, especially experience that arrives a little bit at a time.

• Most people are not natural-born statisticians.

– We are not very good at picking out patterns from a sea of noisy data.

– To put it another way, we are all too good at picking out non-existent patterns that happen to suit our purposes.

– Statistical theory attacks the problem from both ends. It provides optimal methods for finding a real signal in a noisy background, and also provides strict checks against the overinterpretation of random patterns.

• Statistical theory attempts to answer three basic questions:

1. Data Collection: How should I collect my data?

2. Summary: How should I analyze and summarize the data that I’ve collected?

3. Statistical Inference: How accurate are my data summaries?

• The bootstrap is a recently developed technique for making certain kinds of statistical inferences.

It is only recently developed because it requires modern computer power to simplify the often intricate calculations of traditional statistical theory.

• The idea of the bootstrap method is close to that of simulation; the main difference is that the bootstrap plugs in an estimate of the underlying unknown random mechanism F.

Motivated Example

• We illustrate the three basic statistical concepts just mentioned using front-page news from the New York Times of January 27, 1987.

• A study was done to see if small aspirin doses would prevent heart attacks in healthy middle-aged men.

• The data for the aspirin study were collected in a particularly efficient way: by a controlled, randomized, double-blind study.

– One half of the subjects received aspirin and the other half received a control substance, or placebo, with no active ingredients.

– The subjects were randomly assigned to the aspirin or placebo groups.

– Both the subjects and the supervising physicians were blind to the assignments, with the statisticians keeping a secret code of who received which substance.

– Scientists, like everyone else, want the subject they are working on to succeed.

– The elaborate precautions of a controlled, randomized, blinded experiment guard against seeing benefits that don't exist, while maximizing the chance of detecting a genuine positive effect.

• The summary statistics in the study are very simple:

                   heart attacks
                   (fatal plus non-fatal)   subjects
  aspirin group:          104                11,037
  placebo group:          189                11,034

• What strikes the eye here is the lower rate of heart attacks in the aspirin group.

• The ratio of the two rates is

θ̂ = (104/11037) / (189/11034) = 0.55.

It suggests that the aspirin-takers have only 55% as many heart attacks as placebo-takers.

• We are not really interested in θ̂ itself. What we would like to know is θ, the true ratio, that is, the ratio we would see if we could treat all subjects, and not just a sample of them.

• The tough question is: how do we know that θ̂ might not come out much less favorably if the experiment were run again? This is where statistical inference comes in.

• Statistical theory allows us to make the following inference: the true value of θ lies in the interval 0.43 < θ < 0.70 with 95% confidence.

Note that

θ = θ̂ + (θ − θ̂) = 0.55 + [θ − θ̂(ω_0)],

where θ and θ̂(ω_0) (= 0.55) are two numbers.

• Since ω_0 cannot be observed, we use θ − θ̂(ω) to describe θ − θ̂(ω_0) in statistics.

• What is the fluctuation of θ − θ̂(ω) among all ω?

• If, for most ω, θ − θ̂(ω) is around zero, we can conclude statistically that θ is close to 0.55 (= θ̂(ω_0)).

• (Recall the definition of consistency.) If

P(ω : |θ − θ̂(ω)| < 0.1) = 0.95,

we claim with 95% confidence that |θ − 0.55| is no more than 0.1.

• The aspirin study also tracked strokes. The results are as follows:

                 strokes   subjects
  aspirin group:   119      11,037
  placebo group:    98      11,034

• For strokes, the ratio of the two rates is

θ̂ = (119/11037) / (98/11034) = 1.21.

It now looks as if taking aspirin is actually harmful.

• However, the interval for the true stroke ratio θ turns out to be 0.93 < θ < 1.59 with 95% confidence. This includes the neutral value θ = 1, at which aspirin would be no better or worse than placebo.

• In the language of statistical hypothesis testing, aspirin was found to be significantly beneficial for preventing heart attacks, but not significantly harmful with respect to strokes.

As in an introductory statistics course, we could put the above problem into the framework of a two-sample problem with binomial distributions. Asymptotic analysis would then be used to approximate the distribution of θ̂.

In this note, we demonstrate an alternative.

Apply the bootstrap method to the stroke example.

1. Create two pseudo-populations based on the collected data:

– Pseudo-population 1: it consists of 119 ones and 11037 − 119 = 10918 zeros.

– Pseudo-population 2: it consists of 98 ones and 11034 − 98 = 10936 zeros.

2. (Monte Carlo Resampling) Draw with replacement a sample of 11037 items from the first pseudo-population, and a sample of 11034 items from the second pseudo-population. Each of these is called a bootstrap sample.

3. Derive the bootstrap replicate of θ̂:

θ̂* = (prop. of ones in bootstrap sample #1) / (prop. of ones in bootstrap sample #2).

4. Repeat steps 2-3 a large number of times, say 1000 times, to obtain 1000 bootstrap replicates θ̂*.
The above procedure can be implemented easily using the following R code.

• n1 <- 11037; s1 <- 119; p1 <- s1/n1

• n2 <- 11034; s2 <- 98; p2 <- s2/n2

• Write a function named stroke.

• stroke <- function(n1, p1, n2, p2) {
    aspirin <- rbinom(1, n1, p1)   # resampled stroke count in the aspirin group
    placebo <- rbinom(1, n2, p2)   # resampled stroke count in the placebo group
    theta <- (aspirin/n1)/(placebo/n2)
    return(theta)
  }

• Suppose that we would like to do 1000 replications.

• result <- rep(0, 1000), which is used to store the 1000 bootstrap replicates of θ̂.

• for (i in 1:1000) result[i] <- stroke(n1, p1, n2, p2)

My simulation gives the following:

• The standard deviation turned out to be 0.17 in a batch of 1000 replicates that we generated.

• A rough 95% confidence interval is (0.93, 1.60), derived by taking the 25th and 975th largest of the 1000 replicates.

• The above method is called the percentile method.
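The percentile method amounts to sorting the replicates and reading off order statistics. A minimal Python sketch (the notes use R; the function name `percentile_ci` and the index convention, the 25th and 975th order statistics for B = 1000, are illustrative assumptions, and conventions differ by one unit across texts):

```python
def percentile_ci(replicates, alpha=0.05):
    """Percentile-method confidence interval from bootstrap replicates:
    the empirical alpha/2 and 1 - alpha/2 quantiles."""
    b = sorted(replicates)
    lo = b[int(len(b) * alpha / 2) - 1]        # 25th smallest when B = 1000
    hi = b[int(len(b) * (1 - alpha / 2)) - 1]  # 975th smallest when B = 1000
    return lo, hi

# With replicates 1, 2, ..., 1000 this picks out 25 and 975.
print(percentile_ci(list(range(1, 1001))))
```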

Project 1.1 Do a computer experiment to repeat the above process one hundred times and answer the following questions.

• (a) Give a summary of those 100 confidence intervals. Do they always contain 1? If not, is that statistically correct?

• (b) How would you describe the distribution of the bootstrap replicates of θ̂? Is it close to normal?

Odds Ratio

• If an event A has probability P(A) of occurring, the odds of A occurring are defined to be

odds(A) = P(A) / (1 − P(A)).

• Let X denote the event that an individual is exposed to a potentially harmful agent and D denote the event that the individual becomes diseased.

Denote the complementary events by X̄ and D̄.

• The odds of an individual contracting the disease given that he is exposed are

odds(D|X) = P(D|X) / (1 − P(D|X)),

and the odds of contracting the disease given that he is not exposed are

odds(D|X̄) = P(D|X̄) / (1 − P(D|X̄)).

• The odds ratio

∆ = odds(D|X) / odds(D|X̄)

is a measure of the influence of exposure on subsequent disease.

We will consider how the odds and odds ratio could be estimated by sampling from a population with joint and marginal probabilities defined as in the following table:

           D̄       D
  X̄      π_00    π_01    π_0.
  X       π_10    π_11    π_1.
          π_.0    π_.1      1

With this notation,

P(D|X) = π_11 / (π_10 + π_11),   P(D|X̄) = π_01 / (π_00 + π_01),

so that

odds(D|X) = π_11/π_10,   odds(D|X̄) = π_01/π_00,

and the odds ratio is

∆ = (π_11 π_00) / (π_01 π_10),

the product of the diagonal probabilities in the preceding table divided by the product of the off-diagonal probabilities.
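The cross-product identity ∆ = π_11 π_00 / (π_01 π_10) can be checked numerically against the definition ∆ = odds(D|X)/odds(D|X̄). A small Python check with hypothetical cell probabilities (the numbers below are made up; any valid table works):

```python
# Hypothetical joint probabilities; rows are X-bar and X, columns D-bar and D.
p00, p01 = 0.50, 0.10   # X-bar row: (no disease, disease)
p10, p11 = 0.25, 0.15   # X row:     (no disease, disease)

# Odds ratio from the conditional-probability definition.
pd_x    = p11 / (p10 + p11)          # P(D | X)
pd_notx = p01 / (p00 + p01)          # P(D | X-bar)
delta = (pd_x / (1 - pd_x)) / (pd_notx / (1 - pd_notx))

# Cross-product form: diagonal product over off-diagonal product.
assert abs(delta - (p11 * p00) / (p01 * p10)) < 1e-12
print(delta)
```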

Now we will consider three possible ways to sample this population to study the relationship of disease and exposure.

• Random sample:

– From such a sample, we could estimate all the probabilities directly.

– If the disease is rare, the total sample size would have to be quite large to guarantee that a substantial number of diseased individuals was included.

• Prospective study:

– A fixed number of exposed and nonexposed individuals are sampled and then followed through time.

– The incidences of disease in those two groups are compared.

– In this case the data allow us to estimate and compare P (D|X) and P (D| ¯X) and, hence, the odds ratio.

– The aspirin study described in the previous section can be viewed as this type of study.

• Retrospective study:

– A fixed number of diseased and undiseased individuals are sampled and the incidences of exposure in the two groups are compared.

– From such data we can directly estimate P(X|D) and P(X|D̄).

– Since the marginal counts of diseased and nondiseased are fixed, we cannot estimate the joint probabilities or the important conditional probabilities P(D|X) and P(D|X̄).

– Observe that

P(X|D) = π_11 / (π_01 + π_11),
1 − P(X|D) = π_01 / (π_01 + π_11),
odds(X|D) = π_11/π_01,
odds(X|D̄) = π_10/π_00.

The odds ratio can also be expressed as odds(X|D)/odds(X|D̄).

Now we describe the study of Vianna, Greenwald, and Davies (1971) to illustrate the retrospective study.

• In this study they collected data comparing the percentages of tonsillectomies for a group of patients suffering from Hodgkin's disease and a comparable control group:

               Tonsillectomy   No Tonsillectomy
  Hodgkin's         67               34
  Control           43               64

• Recall that the odds ratio can be expressed as odds(X|D)/odds(X|D̄), and an estimate of it is n_00 n_11/(n_01 n_10), the product of the diagonal counts divided by the product of the off-diagonal counts.

• The data of Vianna, Greenwald, and Davies give an estimate of the odds ratio of

(67 × 64) / (43 × 34) = 2.93.

• According to this study, the odds of contracting Hodgkin's disease are increased by about a factor of three by undergoing a tonsillectomy.

• As well as having the point estimate 2.93, it would be useful to attach an approximate standard error to the estimate to indicate its uncertainty.

• We will use simulation (the parametric bootstrap) to approximate the distribution of ∆̂.

– We need to generate random numbers according to a statistical model for the counts in the table of Vianna, Greenwald, and Davies.

– The model is that the count in the first row and first column, N_11, is binomially distributed with n = 101 and probability π_11.

– The count in the second row and second column, N_22, is binomially distributed with n = 107 and probability π_22.

– The distribution of the random variable

∆̂ = N_11 N_22 / [(101 − N_11)(107 − N_22)]

is thus determined by the two binomial distributions, and we could approximate it arbitrarily well by drawing a large number of samples from them.

– Since the probabilities π_11 and π_22 are unknown, they are estimated from the observed counts by π̂_11 = 67/101 = 0.663 and π̂_22 = 64/107 = 0.598.

– One thousand realizations generated on a computer gave a standard deviation of 0.89.
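The parametric bootstrap just described can be sketched in a few lines; a Python version (the notes' own code is in R), with a naive pure-Python binomial sampler and a guard against empty cells added for self-containment:

```python
import random

random.seed(1)

def rbinom(n, p):
    # Naive binomial draw: count of n Bernoulli(p) successes.
    return sum(random.random() < p for _ in range(n))

B = 1000
pi11_hat, pi22_hat = 67 / 101, 64 / 107     # estimated cell probabilities

reps = []
for _ in range(B):
    n11 = rbinom(101, pi11_hat)             # Hodgkin's row total is 101
    n22 = rbinom(107, pi22_hat)             # control row total is 107
    if 0 < n11 < 101 and 0 < n22 < 107:     # avoid division by zero
        reps.append(n11 * n22 / ((101 - n11) * (107 - n22)))

mean = sum(reps) / len(reps)
sd = (sum((r - mean) ** 2 for r in reps) / (len(reps) - 1)) ** 0.5
print(round(mean, 2), round(sd, 2))
```

The standard deviation of the replicates should come out in the neighborhood of the 0.89 reported in the notes; the exact value depends on the seed.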

Project 1.2 Do a computer experiment to run the above process in the setting of a retrospective study. Give a 95% confidence interval for ∆ and describe the distribution of the bootstrap replicates of ∆̂.

Bootstrap Method

• The bootstrap method introduced in Efron (1979) is a very general resampling procedure for estimating the distributions of statistics based on independent observations.

– The bootstrap method has been shown to be successful in many situations, and it is now accepted as an alternative to asymptotic methods.

– It is better than some other asymptotic methods, such as the traditional normal approximation and the Edgeworth expansion.

– There are, however, counterexamples showing that the bootstrap can produce wrong solutions, i.e., it can provide inconsistent estimators.

Consider the problem of estimating the variability of location estimates by the bootstrap method.

• If we view the observations x_1, x_2, . . . , x_n as realizations of independent random variables with common distribution function F, it is appropriate to investigate the variability and sampling distribution of a location estimate calculated from a sample of size n.

• Denote the location estimate by θ̂.

– Note that θ̂ is a function of the random variables X_1, X_2, . . . , X_n and hence has a probability distribution, its sampling distribution, which is determined by n and F.

– How do we derive this sampling distribution?

• We are faced with two problems:

1. F is unknown.

2. F is known, but θ̂ may be such a complicated function of X_1, X_2, . . . , X_n that finding its distribution would exceed our analytic abilities.

• To address the second problem, suppose F is known.

– How could we find the probability distribution of θ̂ without going through incredibly complicated analytic calculations?

– The computer comes to our rescue: we can do it by simulation.

– We generate many, many samples, say B in number, of size n from F; from each sample we calculate the value of θ̂.

– The empirical distribution of the resulting values θ̂*_1, θ̂*_2, . . . , θ̂*_B is an approximation to the distribution function of θ̂, which is good if B is very large.

– If we wish to know the standard deviation of θ̂, we can find a good approximation to it by calculating the standard deviation of the collection of values θ̂*_1, θ̂*_2, . . . , θ̂*_B.

– We can make these approximations arbitrarily accurate by taking B to be arbitrarily large.

Simulation. Let G be a distribution and let Y_1, . . . , Y_B be iid values drawn from G.

• By the law of large numbers, B^{-1} Σ_{j=1}^B Y_j converges in probability to E(Y).

• We can use B^{-1} Σ_{j=1}^B Y_j as an estimate of E(Y).

• In a simulation we can make B as large as we like, in which case the difference between B^{-1} Σ_{j=1}^B Y_j and E(Y) is negligible.

All this would be well and good if we knew F, but we don't. So what do we do? We will consider two different cases.

• In the first case, F is known up to an unknown parameter η, i.e. F(x|η).

– Without knowing η, the above approximation cannot be used.

– The idea of the parametric bootstrap is to simulate data from F(x|η̂), where η̂ should be a good estimate of η.

– It utilizes the structure of F.

• In the second case, F is completely unknown.

• The idea of the nonparametric bootstrap is to simulate data from the empirical cdf F_n.

• Here F_n is the discrete probability distribution that gives probability 1/n to each observed value x_1, · · · , x_n.

• A sample of size n from F_n is thus a sample of size n drawn with replacement from the collection x_1, · · · , x_n. The standard deviation of θ̂ is then estimated by

s_θ̂ = √[ (1/B) Σ_{i=1}^B (θ̂*_i − θ̄*)² ],

where θ̂*_1, . . . , θ̂*_B are produced from B samples of size n drawn from the collection x_1, · · · , x_n.

Now we use a simple example to illustrate this idea.

• Suppose n = 2 and we observe X_(1) = c < X_(2) = d.

• X*_1, X*_2 are independently distributed with P(X*_i = c) = P(X*_i = d) = 1/2, i = 1, 2.

• The pair (X*_1, X*_2) therefore takes on the four possible pairs of values

(c, c), (c, d), (d, c), (d, d),

each with probability 1/4.

• θ* = (X*_1 + X*_2)/2 takes on the values c, (c + d)/2, d with probabilities 1/4, 1/2, 1/4, respectively, so that θ* − (c + d)/2 takes on the values (c − d)/2, 0, (d − c)/2 with probabilities 1/4, 1/2, 1/4, respectively.
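This enumeration can be verified with exact rational arithmetic; a Python sketch with hypothetical observed values c = 1 and d = 4 (the notes use R):

```python
from fractions import Fraction
from itertools import product

c, d = 1, 4   # hypothetical observed values with c < d

# Enumerate all 2^2 equally likely bootstrap samples and tabulate the mean.
dist = {}
for x1, x2 in product((c, d), repeat=2):
    theta_star = Fraction(x1 + x2, 2)
    dist[theta_star] = dist.get(theta_star, Fraction(0)) + Fraction(1, 4)

assert dist[Fraction(c)] == Fraction(1, 4)
assert dist[Fraction(c + d, 2)] == Fraction(1, 2)
assert dist[Fraction(d)] == Fraction(1, 4)
print(dist)
```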

For the above example, we can easily calculate the bootstrap distribution exactly.

• When n is large, the above computation quickly becomes too complicated to carry out directly.

• We then use simple random sampling to approximate the bootstrap distribution.

• In the bootstrap literature, a variety of alternatives to simple random sampling have been suggested.

Project 1.3 Use the parametric bootstrap and the nonparametric bootstrap to approximate the distribution of the median based on a data set of size 20 from a standard normal distribution.

The following is a sample R code.

• n <- 20

• x <- rnorm(n) # Create some data.

• theta.hat <- median(x)

• B <- 1000; theta.boot <- rep(0, B)

• for (i in 1:B) {
    xstar <- sample(x, size = n, replace = TRUE) # draw a bootstrap sample
    theta.boot[i] <- median(xstar) # compute the statistic
  }

• var.boot <- var(theta.boot)

• se <- sqrt(var.boot); print(se)

We now introduce notation to describe the bootstrap method.

• Assume the data X_1, · · · , X_n are independent and identically distributed (iid) samples from a k-dimensional population distribution F.

• We wish to estimate the distribution

H_n(x) = P{R_n ≤ x},

where R_n = R_n(T_n, F) is a real-valued functional of F and T_n = T_n(X_1, · · · , X_n) is a statistic of interest.

• Let X*_1, · · · , X*_n be a "bootstrap" sample, iid from F_n, the empirical distribution based on X_1, · · · , X_n; let T*_n = T_n(X*_1, · · · , X*_n) and R*_n = R_n(T*_n, F_n). F_n is constructed by placing a mass of 1/n at each observation X_i. Thus F_n may be represented as

F_n(x) = (1/n) Σ_{i=1}^n I(X_i ≤ x),   −∞ < x < ∞.

• A bootstrap estimator of H_n is

Ĥ_n(x) = P_*{R*_n ≤ x},

where, for given X_1, · · · , X_n, P_* is the conditional probability with respect to the random generator of bootstrap samples.

• Since the bootstrap samples are generated from F_n, this method is called the nonparametric bootstrap.

– Note that Ĥ_n(x) depends on F_n and hence is itself a random variable.

– To be specific, Ĥ_n(x) changes as the data {x_1, · · · , x_n} change.

– Recall that a bootstrap analysis is run to assess the accuracy of some primary statistical results.

– This produces bootstrap statistics, like standard errors or confidence intervals, which are assessments of error for the primary results.

• As a further remark, the empirical distribution F_n is the nonparametric maximum likelihood estimate (MLE) of F.

As illustrations, we consider the following three examples.

Example 1. Suppose that X_1, · · · , X_n ∼ N(µ, 1) and R_n = √n(X̄_n − µ). Consider the estimation of

P(a) = P{R_n > a | N(µ, 1)}.

The nonparametric bootstrap method estimates P(a) by

P_NB(a) = P{√n(X̄*_n − X̄_n) > a | F_n}.

• Observe data x_1, · · · , x_n with mean x̄_n.

• Let Y_1, . . . , Y_n denote a bootstrap sample of n observations drawn independently from F_n.

• Let Ȳ_n = n^{-1} Σ_{i=1}^n Y_i.

• P(a) is estimated by

P_NB(a) = P{√n(Ȳ_n − x̄_n) > a | F_n}.

• In principle, P_NB(a) can be found by considering all n^n possible bootstrap samples.

– If all the X_i's are distinct, then the number of different possible resamples equals the number of distinct ways of placing n indistinguishable objects into n numbered boxes, the boxes being allowed to contain any number of objects. It is known that this equals C(2n − 1, n) ≈ (nπ)^{-1/2} 2^{2n−1}.

– When n = 10 (20, respectively), C(2n − 1, n) ≈ 92378 (6.9 × 10^{10}, respectively).

– For small values of n, it is often feasible to calculate a bootstrap estimate exactly.

– For large samples, say n ≥ 10, this becomes infeasible even with today's computing technology.
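The resample counts and the stated approximation can be checked with exact integer arithmetic; a short Python check (the notes use R):

```python
from math import comb, pi

# Number of distinct resamples of size n from n distinct points is
# C(2n-1, n); the approximation is (n*pi)^(-1/2) * 2^(2n-1).
for n in (10, 20):
    exact = comb(2 * n - 1, n)
    approx = (n * pi) ** -0.5 * 2 ** (2 * n - 1)
    print(n, exact, f"{approx:.3g}")
```

For n = 10 the exact count is 92378 and the approximation is within about 1%.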

• Natural questions to ask are as follows:

– What are computationally efficient ways to bootstrap?

– Can we get bootstrap-like answers with- out Monte Carlo?

• Address the question of "evaluating" the performance of the bootstrap method.

– For the above particular problem, we need to estimate P_NB(a) − P(a) or sup_a |P_NB(a) − P(a)|.

– As a remark, P_NB(a) is a random variable since F_n is random.

– Efron (1992) proposed using the jackknife to give error estimates for bootstrap quantities.

• Suppose that additional information on F is available. Then it is reasonable to utilize this information in the bootstrap method.

• In this example, F is known to be normally distributed with unknown mean µ and variance 1.

– It is natural to use x̄_n to estimate µ and then estimate P(a) = P{R_n > a | N(µ, 1)} by

P_PB(a) = P{√n(Ȳ_n − x̄_n) > a | N(x̄_n, 1)}.

– Since the bootstrap samples are generated from N(x̄_n, 1), which utilizes the information from a parametric form of F, this method is called the parametric bootstrap.

– In this case, it can be shown that P_PB(a) = P(a) for all realizations of X̄_n.

– If F is known to be normally distributed with unknown mean µ and unknown variance σ², P_PB(a) is no longer equal to P(a).

Project 1.4. (a) Show that P_PB(a) = Φ(a/s_n), where s²_n = (n − 1)^{-1} Σ_{i=1}^n (x_i − x̄_n)².

(b) Prove that P_PB(a) is a consistent estimate of P(a) for fixed a.

(c) Prove that sup_a |P_PB(a) − P(a)| →_P 0.

For the question of finding P_NB(a), we can in principle write down the characteristic function and then apply the inversion formula. However, this is a nontrivial job. Therefore, Efron (1979) suggested approximating P_NB(a) by Monte Carlo resampling. (That is, sample-size resamples may be drawn repeatedly from the original sample, the value of the statistic computed for each individual resample, and the bootstrap statistic approximated by taking an average of an appropriate function of these numbers.)

Now we state Levy's Inversion Formula, taken from Chapter 6.2 of Chung (1974).

Theorem. If x_1 < x_2 and x_1 and x_2 are points of continuity of F, then we have

F(x_2) − F(x_1) = lim_{T→∞} (1/2π) ∫_{−T}^{T} [(e^{−itx_1} − e^{−itx_2}) / (it)] f(t) dt,

where f(t) is the characteristic function.

Example 2. Estimating the probability of success.

• Consider a probability distribution F putting all of its mass at zero or one.

• Let θ(F) = P(X = 1) = p.

• Consider R(X, F) = X̄ − θ(F) = p̂ − p.

• Having observed X = x, the bootstrap sample satisfies

X*_1, · · · , X*_n ∼ Bin(1, θ(F_n)) = Bin(1, x̄_n).

Note that

R(X*, F_n) = X̄*_n − x̄_n,
E_*(X̄*_n − x̄_n) = 0,
Var_*(X̄*_n − x̄_n) = x̄_n(1 − x̄_n)/n.

Recall that nX̄*_n ∼ Bin(n, x̄_n) and nX̄_n ∼ Bin(n, p).

• It is known that if min{nx̄_n, n(1 − x̄_n)} ≥ 5,

(nX̄*_n − nx̄_n)/√[nx̄_n(1 − x̄_n)] = √n(X̄*_n − x̄_n)/√[x̄_n(1 − x̄_n)] ≈ N(0, 1);

and if min{np, n(1 − p)} ≥ 5,

(nX̄_n − np)/√[np(1 − p)] = √n(X̄_n − p)/√[p(1 − p)] ≈ N(0, 1).

• Based on the above approximation results, we conclude that the bootstrap method works if min{nx̄_n, n(1 − x̄_n)} ≥ 5.

• The question remaining to be studied is whether P{min(nX̄_n, n(1 − X̄_n)) ≥ 5} → 1.
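The bootstrap moment calculations E_*(X̄*_n − x̄_n) = 0 and Var_*(X̄*_n − x̄_n) = x̄_n(1 − x̄_n)/n can be confirmed by direct resampling; a Python sketch with hypothetical values n = 50 and x̄_n = 0.4 (the notes use R):

```python
import random

random.seed(2)

n = 50
xbar = 0.4          # hypothetical observed proportion of ones
B = 20_000

# Each bootstrap replicate draws n values from Bin(1, xbar) and records
# the centered mean Xbar* - xbar.
reps = []
for _ in range(B):
    ones = sum(random.random() < xbar for _ in range(n))
    reps.append(ones / n - xbar)

mean = sum(reps) / B
var = sum(r ** 2 for r in reps) / B
print(round(mean, 4), round(var, 5), round(xbar * (1 - xbar) / n, 5))
```

The simulated mean is near 0 and the simulated variance near x̄(1 − x̄)/n = 0.0048, as the formulas predict.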

Example 3. Estimating the median.

• Suppose we are interested in finding the distribution of n^{1/2}{F_n^{-1}(1/2) − F^{-1}(1/2)}, where F_n^{-1}(1/2) and F^{-1}(1/2) are the sample and population medians, respectively.

• Set θ(F) = F^{-1}(1/2).

• Find a bootstrap approximation of the above distribution.

• Consider n = 2m − 1. Then the sample median is F_n^{-1}(1/2) = X_(m), where X_(1) ≤ X_(2) ≤ · · · ≤ X_(n).

• Let N*_i denote the number of times x_i is selected in the bootstrap sampling procedure. Set N* = (N*_1, · · · , N*_n). It follows easily that N* follows a multinomial distribution with n trials and selection probabilities (n^{-1}, · · · , n^{-1}).

• Denote the order statistics of x_1, . . . , x_n by x_(1) ≤ · · · ≤ x_(n).

• Set N*_[i] to be the number of times x_(i) is chosen. Then for 1 ≤ ℓ < n, we have

Prob_*(X*_(m) > x_(ℓ)) = Prob_*{N*_[1] + · · · + N*_[ℓ] ≤ m − 1}
                      = Prob{Bin(n, ℓ/n) ≤ m − 1}
                      = Σ_{j=0}^{m−1} C(n, j) (ℓ/n)^j (1 − ℓ/n)^{n−j}.

Or, writing T* = X*_(m) − x_(m),

Prob_*(T* = x_(ℓ) − x_(m)) = Prob{Bin(n, (ℓ − 1)/n) ≤ m − 1} − Prob{Bin(n, ℓ/n) ≤ m − 1}.

• When n = 13, we have

ℓ            2 or 12   3 or 11   4 or 10   5 or 9   6 or 8      7
probability   0.0015    0.0142    0.0550   0.1242   0.1936    0.2230

Quite often we use the mean square error to measure the performance of an estimator t(X) of θ(F): E_F T² = E_F(t(X) − θ(F))². Using the bootstrap to estimate E_F T², the bootstrap estimate is

E_*(T*)² = Σ_{ℓ=1}^{13} [x_(ℓ) − x_(7)]² Prob_*{T* = x_(ℓ) − x_(7)}.

It is known that nE_F T² → [4f²(θ)]^{-1} as n tends to infinity when F has a bounded continuous density. A natural question to ask is whether E_*(T*)² is close to E_F T².
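The exact bootstrap probabilities for the n = 13 sample median follow from the binomial expression above and can be computed with the standard library alone; a Python sketch (the notes use R):

```python
from math import comb

n, m = 13, 7   # n = 2m - 1 observations

def binom_cdf(k, n, p):
    # P{Bin(n, p) <= k}, computed term by term.
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

# P*(median* = x_(l)) = P{Bin(n, (l-1)/n) <= m-1} - P{Bin(n, l/n) <= m-1}.
p = {l: binom_cdf(m - 1, n, (l - 1) / n) - binom_cdf(m - 1, n, l / n)
     for l in range(1, n + 1)}

assert abs(sum(p.values()) - 1) < 1e-9                 # a valid distribution
assert all(abs(p[l] - p[n + 1 - l]) < 1e-9 for l in p)  # symmetric in l
for l in range(1, m + 1):
    print(l, round(p[l], 4))
```

The same dictionary of probabilities can then be plugged into the sum defining E_*(T*)² for any observed order statistics.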

Validity of the Bootstrap Method

We now give a brief discussion of the validity of the bootstrap method. First, we state central limit theorems and the associated approximation error bound.

Perhaps the most widely known version of the CLT is the following.

Theorem (Lindeberg-Levy Central Limit Theorem). Let {X_i} be iid with mean µ and finite variance σ². Then

√n [ (1/n) Σ_{i=1}^n X_i − µ ] →_d N(0, σ²).

The above theorem can be generalized to independent random variables that are not necessarily identically distributed.

Theorem (Lindeberg-Feller CLT). Let {X_i} be independent with means {µ_i}, finite variances {σ_i²}, and distribution functions {F_i}.

• Suppose that B_n² = Σ_{i=1}^n σ_i² satisfies

σ_n²/B_n² → 0 and B_n → ∞ as n → ∞.

• Then n^{-1} Σ_{i=1}^n X_i is AN(n^{-1} Σ_{i=1}^n µ_i, n^{-2} B_n²) if and only if the following Lindeberg condition is satisfied:

B_n^{-2} Σ_{i=1}^n ∫_{|t − µ_i| > εB_n} (t − µ_i)² dF_i(t) → 0 as n → ∞, for each ε > 0.

In the theorems just described, asymptotic normality was asserted for a sequence of sums Σ_{i=1}^n X_i generated by a single sequence X_1, X_2, . . . of random variables. For the validity of the bootstrap, we may consider a double array of random variables:

X_11, X_12, · · · , X_1K_1;
X_21, X_22, · · · , X_2K_2;
. . .
X_n1, X_n2, · · · , X_nK_n;
. . .

For each n ≥ 1, there are K_n random variables {X_nj, 1 ≤ j ≤ K_n}. It is assumed that K_n → ∞. The case K_n = n is called a "triangular" array.

Denote by F_nj the distribution function of X_nj. Also, put

µ_nj = E X_nj,
A_n = E Σ_{j=1}^{K_n} X_nj = Σ_{j=1}^{K_n} µ_nj,
B_n² = Var( Σ_{j=1}^{K_n} X_nj ).

We then have the following theorem.

Theorem (Lindeberg-Feller). Let {X_nj : 1 ≤ j ≤ K_n; n = 1, 2, . . .} be a double array with independent random variables within rows. Then the "uniform asymptotic negligibility" condition

max_{1≤j≤K_n} P(|X_nj − µ_nj| > τB_n) → 0 as n → ∞, for each τ > 0,

and the asymptotic normality condition, Σ_{j=1}^{K_n} X_nj is AN(A_n, B_n²), together hold if and only if the Lindeberg condition

B_n^{-2} Σ_{j=1}^{K_n} ∫_{|t − µ_nj| > εB_n} (t − µ_nj)² dF_nj(t) → 0 as n → ∞, for each ε > 0,

is satisfied. As a note, independence is assumed only within rows; the rows themselves may be arbitrarily dependent.

It is of both theoretical and practical interest to characterize the error of approximation in the CLT. For the i.i.d. case, an exact bound on the error of approximation is provided by the following theorem due to Berry (1941) and Esseen (1945).

Theorem. If X_1, . . . , X_n are i.i.d. with distribution F and if S_n = X_1 + · · · + X_n, then there exists a constant c (independent of F) such that for all x,

sup_x | P{ (S_n − ES_n)/√Var(S_n) ≤ x } − Φ(x) | ≤ (c/√n) · E|X_1 − EX_1|³ / [Var(X_1)]^{3/2}

for all F with finite third moment.

• Note that c in the above theorem is a universal constant. Various authors have sought the best constant c.

• Originally, c was set to be 33/4, but it has been sharpened to 0.7975.

• For x sufficiently large, while n remains fixed, the quantities P[(S_n − ES_n)/√Var(S_n) ≤ x] and Φ(x) each become so close to 1 that the bound given above is too crude.

• The problem in this case may be characterized as one of approximating large deviation probabilities, with the object of attention becoming the relative error in approximating 1 − P[(S_n − ES_n)/√Var(S_n) ≤ x] by 1 − Φ(x) as x → ∞.

Inconsistent Bootstrap Estimator

Bickel and Freedman (1981) and Loh (1984) showed that the bootstrap estimators of the distributions of extreme order statistics are inconsistent.

• Let X_(n) be the maximum of i.i.d. random variables X_1, . . . , X_n from F with F(θ) = 1 for some θ, and let X*_(n) be the maximum of X*_1, . . . , X*_n, which are i.i.d. from the empirical distribution F_n.

• Although X_(n) → θ, it never equals θ. But

P_*{X*_(n) = X_(n)} = 1 − (1 − n^{-1})^n → 1 − e^{-1},

which leads to the inconsistency of the bootstrap estimator.

• The reason for the inconsistency of the bootstrap is that the bootstrap samples are drawn from F_n, which is not exactly F. Therefore, the bootstrap may fail due to the lack of "continuity."
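The limit P_*{X*_(n) = X_(n)} → 1 − e^{-1} ≈ 0.632 is easy to check numerically; a Python sketch (uniform data on [0, 1] as in Project 1.5, with n = 200 and 2000 resamples chosen arbitrarily; the notes use R):

```python
import random
from math import exp

random.seed(3)

n = 200
# Analytic probability that a bootstrap resample repeats the sample maximum.
analytic = 1 - (1 - 1 / n) ** n
print(round(analytic, 4), round(1 - exp(-1), 4))

# Simulation check: fraction of resamples whose maximum equals X_(n).
x = [random.random() for _ in range(n)]
hits = sum(max(random.choices(x, k=n)) == max(x) for _ in range(2000))
print(hits / 2000)
```

The mass at the observed maximum does not vanish as n grows, which is exactly the source of the inconsistency.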

Consider the following problem.

• Let X_1, . . . , X_n be independent, with a common N(µ, σ²) distribution.

• Let s² be the sample variance.

• Consider the pivot s²/σ². This is distributed, of course, as χ²_{n−1}/(n − 1).

• Consider the bootstrap approximation, namely the distribution of s*²/s², where s*² is the variance of the resampled data.

• For n = 20, the bootstrap approximation is not good.

• One source of this problem is in the tails of the normal: about 30% of the variance is contributed by 5% of the distribution and will be missed by typical samples of size 20.

• In other words, s² is mean-unbiased for σ², but quite skewed for moderate n: in most samples, s² is somewhat too small, counterbalanced by the few samples where s² is huge. The variance of s*²/s² is largely controlled by the sample fourth moment, which is even more skewed.

• For large n, these problems go away: after all, s² and the bootstrap are consistent.
Project 1.5 Use simulation to illustrate the points made in the above two problems. For the problem of estimating the end point, you may assume that X is a uniform random variable over [0, 1] (i.e., θ = 1).

Bias Reduction via the Bootstrap Principle

• The bootstrap can also be used to estimate bias and to do bias reduction.

• Consider θ_0 = θ(F_0) = µ³, where µ = ∫ x dF_0(x). Set θ̂ = θ(F_n) = X̄³.

• Elementary calculations show that

E{θ(F_n) | F_0} = E{µ + n^{-1} Σ_{i=1}^n (X_i − µ)}³ = µ³ + n^{-1} 3µσ² + n^{-2} γ,

where γ = E(X_1 − µ)³ denotes the population skewness.

• Using the nonparametric bootstrap, we obtain the following:

E{θ(F*_n) | F_n} = X̄³ + n^{-1} 3X̄σ̂² + n^{-2} γ̂,

where σ̂² = n^{-1} Σ(X_i − X̄)² and γ̂ = n^{-1} Σ(X_i − X̄)³ denote the sample variance and skewness, respectively.

• Following the bootstrap principle, E{θ(F*_n) | F_n} − θ(F_n) is used to estimate θ(F_n) − θ(F_0).

• Note that θ_0 = θ(F_n) − (θ(F_n) − θ_0). So θ_0 can be estimated by θ(F_n) − [E{θ(F*_n) | F_n} − θ(F_n)], i.e., by 2θ(F_n) − E{θ(F*_n) | F_n}.

• The bootstrap bias-reduced estimate is 2X̄³ − (X̄³ + n^{-1} 3X̄σ̂² + n^{-2} γ̂). That is, θ̂_NB = X̄³ − n^{-1} 3X̄σ̂² − n^{-2} γ̂.
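The bias calculation can be checked by simulation: with X ∼ N(1, 1) (a hypothetical choice, so θ_0 = µ³ = 1 and γ = 0), the leading bias of X̄³ is 3µσ²/n, while the bias-reduced estimate θ̂_NB nearly removes it. A Python sketch (the notes use R; n = 30 and 20000 repetitions are arbitrary):

```python
import random

random.seed(5)

n, reps = 30, 20_000
mu, sigma = 1.0, 1.0          # X ~ N(1, 1), so theta_0 = mu^3 = 1

plain, reduced = [], []
for _ in range(reps):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(x) / n
    s2 = sum((t - xbar) ** 2 for t in x) / n   # sigma-hat^2, divisor n
    g = sum((t - xbar) ** 3 for t in x) / n    # gamma-hat, divisor n
    plain.append(xbar ** 3)                              # theta-hat
    reduced.append(xbar ** 3 - 3 * xbar * s2 / n - g / n ** 2)  # theta-hat_NB

bias_plain = sum(plain) / reps - mu ** 3
bias_reduced = sum(reduced) / reps - mu ** 3
print(round(bias_plain, 4), round(bias_reduced, 4))
```

The simulated bias of X̄³ sits near 3µσ²/n = 0.1, while the bias of θ̂_NB is an order of magnitude smaller.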