Financial Time Series I Topic 3: Resampling Methods Hung Chen Department of Mathematics National Taiwan University 11/11/2002


OUTLINE

1. Motivated Example
– Double Blind Randomized Experiment
– Bootstrap with R

2. Odds Ratio
– Definition
– Random Design
– Prospective Study
– Retrospective Study

3. Bootstrap Method
– Parametric Bootstrap
– Nonparametric Bootstrap
– Failure of the Bootstrap Method


The Practice of Statistics

• Statistics is the science of learning from experience, especially experience that arrives a little bit at a time.

• Most people are not natural-born statisticians.

– We are not very good at picking out patterns from a sea of noisy data.

– To put it another way, we are all too good at picking out non-existent patterns that happen to suit our purposes.

– Statistical theory attacks the problem from both ends. It provides optimal methods for finding a real signal in a noisy background, and also provides strict checks against the overinterpretation of random patterns.

• Statistical theory attempts to answer three basic questions:

1. Data Collection: How should I collect my data?

2. Summary: How should I analyze and summarize the data that I’ve collected?


3. Statistical Inference: How accurate are my data summaries?

• The bootstrap is a recently developed technique for making certain kinds of statistical inferences.

It was developed only recently because it relies on modern computing power to replace the often intricate calculations of traditional statistical theory.

• The idea of the bootstrap method is close to that of simulation. The main difference is that it plugs in an estimate of the unknown underlying random mechanism F.

Motivated Example

• We illustrate the three basic statistical concepts just mentioned using a front-page news story from the New York Times of January 27, 1987.

• A study was done to see if small aspirin doses would prevent heart attacks in healthy middle-aged men.

• The data for the aspirin study were collected in a particularly efficient way: by a controlled, randomized, double-blind study.

– One half of the subjects received aspirin and the other half received a control substance, or placebo, with no active ingredients.

– The subjects were randomly assigned to the aspirin or placebo groups.

– Both the subjects and the supervising physicians were blind to the assignments, with the statisticians keeping a secret code of who received which substance.

– Scientists, like everyone else, want the subject they are working on to succeed.

– The elaborate precautions of a controlled, randomized, blinded experiment guard against seeing benefits that don’t exist, while maximizing the chance of detecting a genuine positive effect.

• The summary statistics in the study are very simple:

                 heart attacks (fatal plus non-fatal)    subjects
aspirin group:   104                                     11,037
placebo group:   189                                     11,034

• What strikes the eye here is the lower rate of heart attacks in the aspirin group.

• The ratio of the two rates is

$$\hat\theta = \frac{104/11037}{189/11034} = 0.55.$$

It suggests that aspirin-takers have only 55% as many heart attacks as placebo-takers.

• We are not interested in θ̂ for its own sake. What we would like to know is θ, the true ratio, that is, the ratio we would see if we could treat all subjects, and not just a sample of them.

• The tough question is: how do we know that θ̂ might not come out much less favorably if the experiment were run again? This is where statistical inference comes in.

• Statistical theory allows us to make the following inference: the true value of θ lies in the interval 0.43 < θ < 0.70 with 95% confidence.

Note that

$$\theta = \hat\theta + (\theta - \hat\theta) = 0.55 + [\theta - \hat\theta(\omega_0)],$$

where ω₀ is the outcome actually realized, so that θ and θ̂(ω₀) (= 0.55) are two fixed numbers.

• Since ω₀ cannot be observed, we use the random quantity θ − θ̂(ω) to describe θ − θ̂(ω₀).

• What is the fluctuation of θ − θ̂(ω) over all ω?

• If, for most ω, θ − θ̂(ω) is around zero, we can conclude statistically that θ is close to 0.55 (= θ̂(ω₀)).

• (Recall the definition of consistency.) If P(ω : |θ − θ̂(ω)| < 0.1) = 0.95, we claim with 95% confidence that |θ − 0.55| is no more than 0.1.

• The aspirin study also tracked strokes. The results are as follows:

                 strokes    subjects
aspirin group:   119        11,037
placebo group:    98        11,034

• For strokes, the ratio of the two rates is

$$\hat\theta = \frac{119/11037}{98/11034} = 1.21.$$

It now looks as if taking aspirin is actually harmful.

• However, the interval for the true stroke ratio θ turns out to be 0.93 < θ < 1.59 with 95% confidence. This includes the neutral value θ = 1, at which aspirin would be no better or worse than placebo.

• In the language of statistical hypothesis testing, aspirin was found to be significantly beneficial for preventing heart attacks, but not significantly harmful for causing strokes.
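Both point estimates are simple arithmetic on the two summary tables. The minimal R sketch below reproduces them. The helper name `rate.ratio` is our own and was chosen only for illustration.

```r
# Ratio of two event rates: (events1 / n1) / (events2 / n2).
# rate.ratio is an illustrative helper, not part of the original notes.
rate.ratio <- function(events1, n1, events2, n2) {
  (events1 / n1) / (events2 / n2)
}

theta.heart  <- rate.ratio(104, 11037, 189, 11034)  # heart attacks: aspirin / placebo
theta.stroke <- rate.ratio(119, 11037,  98, 11034)  # strokes: aspirin / placebo
round(c(heart = theta.heart, stroke = theta.stroke), 2)  # 0.55 and 1.21
```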

Following an introductory statistics course, we could cast the above problem as a two-sample problem with binomial distributions and then use asymptotic analysis to approximate the distribution of θ̂.

In this note, we demonstrate an alternative: we apply the bootstrap method to the stroke example.

1. Create two pseudo populations based on the collected data:

– Pseudo population 1 consists of 119 ones and 11037 − 119 = 10918 zeros.

– Pseudo population 2 consists of 98 ones and 11034 − 98 = 10936 zeros.

2. (Monte Carlo resampling) Draw with replacement a sample of 11037 items from the first pseudo population, and a sample of 11034 items from the second pseudo population. Each of these is called a bootstrap sample.

3. Derive the bootstrap replicate of θ̂:

$$\hat\theta^* = \frac{\text{proportion of ones in bootstrap sample \#1}}{\text{proportion of ones in bootstrap sample \#2}}.$$

4. Repeat steps 2–3 a large number of times, say 1000 times, and obtain 1000 bootstrap replicates θ̂^∗.

The above procedure can be implemented easily using the following R code.

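n1 <- 11037; s1 <- 119; p1 <- s1/n1   # aspirin group: subjects, strokes, stroke rate
n2 <- 11034; s2 <- 98;  p2 <- s2/n2   # placebo group: subjects, strokes, stroke rate

Write a function named stroke. Counting the ones in a sample of n1 items drawn with replacement from pseudo population 1 amounts to a single Binomial(n1, p1) draw, so rbinom() carries out steps 2 and 3 directly:

stroke <- function(n1, p1, n2, p2) {
  aspirin <- rbinom(1, n1, p1)        # number of ones in bootstrap sample #1
  placebo <- rbinom(1, n2, p2)        # number of ones in bootstrap sample #2
  theta <- (aspirin/n1)/(placebo/n2)  # bootstrap replicate of theta-hat
  return(theta)
}

Suppose that we would like to do 1000 replications. The vector result stores the 1000 bootstrap replicates of θ̂:

result <- rep(1, 1000)
for (i in 1:1000) result[i] <- stroke(n1, p1, n2, p2)

My simulation gives the following.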

• The standard deviation turned out to be 0.17 in a batch of 1000 replicates that we generated.

• A rough 95% confidence interval is (0.93, 1.60). It is obtained by taking the 25th and 975th values of the 1000 replicates sorted in increasing order.

• The above method is called the percentile method.
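The replication loop above can also be written in vectorized form. The sketch below regenerates 1000 replicates and extracts the two summaries just quoted. The seed and variable names are our own choices. Because it uses a different random stream, its numbers will differ slightly from 0.17 and (0.93, 1.60).

```r
# Vectorized stroke bootstrap. The number of ones in a resample of size n
# drawn from a 0/1 pseudo population with proportion p is Binomial(n, p),
# so rbinom() stands in for explicit resampling with sample().
set.seed(1)                      # arbitrary seed, for reproducibility only
n1 <- 11037; p1 <- 119 / n1      # aspirin group
n2 <- 11034; p2 <-  98 / n2      # placebo group
B  <- 1000

theta.star <- (rbinom(B, n1, p1) / n1) / (rbinom(B, n2, p2) / n2)

boot.sd <- sd(theta.star)        # bootstrap standard deviation of theta-hat
sorted  <- sort(theta.star)
ci      <- c(lower = sorted[25], upper = sorted[975])  # percentile interval
boot.sd
ci
```

Calling `quantile(theta.star, c(0.025, 0.975))` gives essentially the same interval. It interpolates between neighbouring order statistics.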

Project 1.1 Do a computer experiment that repeats the above process one hundred times, and answer the following questions.

• (a) Give a summary of those 100 confidence intervals. Do they always contain 1? If not, is that consistent with statistical theory?

• (b) How would you describe the distribution of the bootstrap replicates of θ̂? Is it close to normal?

Odds Ratio

• If an event A has probability P(A) of occurring, the odds of A occurring are defined to be

$$\text{odds}(A) = \frac{P(A)}{1 - P(A)}.$$

• Let X denote the event that an individual is exposed to a potentially harmful agent and D denote the event that the individual becomes diseased. Denote the complementary events as X̄ and D̄.

• The odds of an individual contracting the disease given that he is exposed are

$$\text{odds}(D \mid X) = \frac{P(D \mid X)}{1 - P(D \mid X)}$$

and the odds of contracting the disease given that he is not exposed are

$$\text{odds}(D \mid \bar X) = \frac{P(D \mid \bar X)}{1 - P(D \mid \bar X)}.$$

• The odds ratio

$$\Delta = \frac{\text{odds}(D \mid X)}{\text{odds}(D \mid \bar X)}$$

is a measure of the influence of exposure on subsequent disease.

We will consider how the odds and odds ratio could be estimated by sampling from a population with joint and marginal probabilities defined as in the following table:

        D̄      D
X̄      π₀₀    π₀₁    π₀.
X       π₁₀    π₁₁    π₁.
        π.₀    π.₁    1

With this notation,

$$P(D \mid X) = \frac{\pi_{11}}{\pi_{10} + \pi_{11}}, \qquad P(D \mid \bar X) = \frac{\pi_{01}}{\pi_{00} + \pi_{01}},$$

so that

$$\text{odds}(D \mid X) = \frac{\pi_{11}}{\pi_{10}}, \qquad \text{odds}(D \mid \bar X) = \frac{\pi_{01}}{\pi_{00}},$$

and the odds ratio is

$$\Delta = \frac{\pi_{11}\pi_{00}}{\pi_{01}\pi_{10}},$$

the product of the diagonal probabilities in the preceding table divided by the product of the off-diagonal probabilities.

Now we will consider three possible ways to sample this population to study the relationship of disease and exposure.

• Random sample:

– From such a sample, we could estimate all the probabilities directly.

– If the disease is rare, the total sample size would have to be quite large to guarantee that a substantial number of diseased individuals was included.

• Prospective study:

– A fixed number of exposed and nonexposed individuals are sampled and then followed through time.

– The incidences of disease in those two groups are compared.

– In this case the data allow us to estimate and compare P(D|X) and P(D|X̄) and, hence, the odds ratio.

– The aspirin study described in the previous section can be viewed as this type of study.

• Retrospective study:

– A fixed number of diseased and undiseased individuals are sampled and the incidences of exposure in the two groups are compared.

– From such data we can directly estimate P(X|D) and P(X|D̄).

– Since the marginal counts of diseased and nondiseased are fixed, we cannot estimate the joint probabilities or the important conditional probabilities P(D|X) and P(D|X̄).

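– Observe that

$$P(X \mid D) = \frac{\pi_{11}}{\pi_{01} + \pi_{11}}, \qquad 1 - P(X \mid D) = \frac{\pi_{01}}{\pi_{01} + \pi_{11}},$$

$$\text{odds}(X \mid D) = \frac{\pi_{11}}{\pi_{01}}, \qquad \text{odds}(X \mid \bar D) = \frac{\pi_{10}}{\pi_{00}}.$$

Hence the odds ratio can also be expressed as odds(X|D)/odds(X|D̄) = π₁₁π₀₀/(π₀₁π₁₀). This is a quantity that a retrospective study can estimate.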
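The following is a quick numerical check that the prospective and retrospective forms of the odds ratio agree with the cross-product form. The probability table is made up for illustration. Any table with positive entries summing to one would do.

```r
# Hypothetical 2x2 table of joint probabilities (made-up numbers):
# rows: X-bar (not exposed), X (exposed); columns: D-bar (no disease), D (disease).
p <- matrix(c(0.50, 0.20,    # row X-bar: pi00, pi01
              0.10, 0.20),   # row X:     pi10, pi11
            nrow = 2, byrow = TRUE)

odds <- function(prob) prob / (1 - prob)

# Prospective direction: condition on exposure status (rows)
odds.D.given.X    <- odds(p[2, 2] / sum(p[2, ]))   # pi11 / pi10
odds.D.given.notX <- odds(p[1, 2] / sum(p[1, ]))   # pi01 / pi00
# Retrospective direction: condition on disease status (columns)
odds.X.given.D    <- odds(p[2, 2] / sum(p[, 2]))   # pi11 / pi01
odds.X.given.notD <- odds(p[2, 1] / sum(p[, 1]))   # pi10 / pi00

delta.prospective   <- odds.D.given.X / odds.D.given.notX
delta.retrospective <- odds.X.given.D / odds.X.given.notD
delta.cross         <- (p[1, 1] * p[2, 2]) / (p[1, 2] * p[2, 1])
c(delta.prospective, delta.retrospective, delta.cross)   # each equals 5 up to rounding
```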

Now we describe the study of Vianna, Greenwald, and Davies (1971) to illustrate the retrospective study.

• In this study they collected data comparing the percentages of tonsillectomies for a group of patients suffering from Hodgkin’s disease and a comparable control group:

             Tonsillectomy    No Tonsillectomy
Hodgkin’s    67               34
Control      43               64

• Recall that the odds ratio can be expressed as odds(X|D)/odds(X|D̄). An estimate of it is n₀₀n₁₁/(n₀₁n₁₀), the product of the diagonal counts divided by the product of the off-diagonal counts.

• The data of Vianna, Greenwald, and Davies give the estimate

$$\hat\Delta = \frac{67 \times 64}{43 \times 34} = 2.93.$$

• According to this study, the odds of contracting Hodgkin’s disease are increased by about a factor of three by undergoing a tonsillectomy.

• As well as having a point estimate 2.93, it would be useful to attach an approximate standard error to the estimate to indicate its uncertainty.

• We will use simulation (a parametric bootstrap) to approximate the distribution of Δ̂.

– We need to generate random numbers according to a statistical model for the counts in the table of Vianna, Greenwald, and Davies.

– The model is that the count in the first row and first column, N₁₁, is binomially distributed with n = 101 and probability π₁₁.

– The count in the second row and second column, N₂₂, is binomially distributed with n = 107 and probability π₂₂.

– The distribution of the random variable

$$\hat\Delta = \frac{N_{11}N_{22}}{(101 - N_{11})(107 - N_{22})}$$

is thus determined by the two binomial distributions, and we could approximate it arbitrarily well by drawing a large number of samples from them.

– Since the probabilities π₁₁ and π₂₂ are unknown, they are estimated from the observed counts by π̂₁₁ = 67/101 = 0.663 and π̂₂₂ = 64/107 = 0.598.

– One thousand realizations generated on a computer give a standard deviation of 0.89.
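Below is a minimal R sketch of the parametric bootstrap just described. It assumes the two-binomial model with row totals 101 and 107 held fixed. The seed is arbitrary, so the standard deviation will come out close to 0.89 but not exactly equal to it.

```r
# Parametric bootstrap for the Vianna-Greenwald-Davies odds ratio under the
# two-binomial model (row totals 101 and 107 fixed).
set.seed(2)                          # arbitrary seed
n.hodgkin <- 101; n.control <- 107
pi11.hat  <- 67 / n.hodgkin          # tonsillectomy rate among Hodgkin's patients
pi22.hat  <- 64 / n.control          # no-tonsillectomy rate among controls

delta.hat <- (67 * 64) / (43 * 34)   # point estimate, about 2.93

B   <- 1000
N11 <- rbinom(B, n.hodgkin, pi11.hat)
N22 <- rbinom(B, n.control, pi22.hat)
delta.star <- (N11 * N22) / ((n.hodgkin - N11) * (n.control - N22))

sd(delta.star)                       # bootstrap standard error of delta.hat
```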

Project 1.2 Do a computer experiment that runs the above process in the setting of a retrospective study. Give a 95% confidence interval for Δ and describe the distribution of the bootstrap replicates of Δ̂.

Bootstrap Method

• The bootstrap method introduced in Efron (1979) is a very general resampling procedure for estimating the distributions of statistics based on independent observations.

– The bootstrap method has been shown to be successful in many situations and is increasingly accepted as an alternative to asymptotic methods.

– In some problems it is more accurate than asymptotic methods such as the traditional normal approximation and the Edgeworth expansion.

– There are also counterexamples in which the bootstrap produces wrong answers, i.e., inconsistent estimators.

Consider the problem of estimating the variability of location estimates by the bootstrap method.

• If we view the observations x₁, x₂, . . . , x_n as realizations of independent random variables with common distribution function F, it is appropriate to investigate the variability and sampling distribution of a location estimate calculated from a sample of size n.

• Denote the location estimate as θ̂.

– Note that θ̂ is a function of the random variables X₁, X₂, . . . , X_n and hence has a probability distribution, its sampling distribution, which is determined by n and F.

– How do we derive this sampling distribution?

• We are faced with two problems:

1. F is unknown.

2. F is known, but θ̂ may be such a complicated function of X₁, X₂, . . . , X_n that finding its distribution would exceed our analytic abilities.

• Consider first the second problem, in which F is known.

– How could we find the probability distribution of θ̂ without going through incredibly complicated analytic calculations?

– The computer comes to our rescue: we can do it by simulation.

– We generate many samples, say B of them, each of size n from F; from each sample we calculate the value of θ̂.

– The empirical distribution of the resulting values θ̂₁^∗, θ̂₂^∗, . . . , θ̂_B^∗ is an approximation to the distribution function of θ̂, which is good if B is very large.

– If we wish to know the standard deviation of θ̂, we can find a good approximation to it by calculating the standard deviation of the collection of values θ̂₁^∗, θ̂₂^∗, . . . , θ̂_B^∗.

– We can make these approximations arbitrarily accurate by taking B to be arbitrarily large.

Simulation. Let G be a distribution and let Y₁, . . . , Y_B be iid values drawn from G.

• By the law of large numbers, $B^{-1}\sum_{j=1}^{B} Y_j$ converges in probability to E(Y).

• We can use $B^{-1}\sum_{j=1}^{B} Y_j$ as an estimate of E(Y).

• In a simulation we can make B as large as we like, in which case the difference between $B^{-1}\sum_{j=1}^{B} Y_j$ and E(Y) is negligible.
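With F known, the simulation recipe above takes only a few lines of R. Purely for illustration, we assume F = N(0, 1), n = 20, and θ̂ equal to the sample median. The mean of the simulated values also illustrates the law of large numbers.

```r
# Simulation when F is known (assumed here: F = N(0, 1)), for the sample
# median of n = 20 observations. The seed is arbitrary.
set.seed(3)
n <- 20
B <- 10000
theta.sim <- replicate(B, median(rnorm(n)))   # B draws of theta-hat from F

mean(theta.sim)   # close to E(theta-hat) = 0, by symmetry of N(0, 1)
sd(theta.sim)     # simulation estimate of the standard deviation of theta-hat
```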

All this would be well and good if we knew F , but we don’t. So what do we do? We will consider two different cases.

• In the first case, F is known up to an unknown parameter η, i.e., F = F(x|η).

– Without knowing η, the above approximation cannot be used.

– The idea of the parametric bootstrap is to simulate data from F(x|η̂), where η̂ should be a good estimate of η.

– It utilizes the structure of F .

• In the second case, F is completely unknown.

• The idea of the nonparametric bootstrap is to simulate data from the empirical cdf F_n.

• Here F_n is a discrete probability distribution that gives probability 1/n to each observed value x₁, · · · , x_n.

• A sample of size n from F_n is thus a sample of size n drawn with replacement from the collection x₁, · · · , x_n. The standard deviation of θ̂ is then estimated by

$$s_{\hat\theta} = \sqrt{\frac{1}{B}\sum_{i=1}^{B}\left(\theta_i^* - \bar\theta^*\right)^2}, \qquad \bar\theta^* = \frac{1}{B}\sum_{i=1}^{B}\theta_i^*,$$

where θ₁^∗, . . . , θ_B^∗ are computed from B samples of size n drawn with replacement from the collection x₁, · · · , x_n. Now we use a simple example to illustrate this idea.

• Suppose n = 2 and we observe X₍₁₎ = c < X₍₂₎ = d.

• X₁^∗, X₂^∗ are independently distributed with P(X_i^∗ = c) = P(X_i^∗ = d) = 1/2, i = 1, 2.

• The pair (X₁^∗, X₂^∗) therefore takes on the four possible pairs of values (c, c), (c, d), (d, c), (d, d), each with probability 1/4.

• θ^∗ = (X₁^∗ + X₂^∗)/2 takes on the values c, (c + d)/2, d with probabilities 1/4, 1/2, 1/4, respectively, so that θ^∗ − (c + d)/2 takes on the values (c − d)/2, 0, (d − c)/2 with probabilities 1/4, 1/2, 1/4, respectively.
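For n = 2 the bootstrap distribution can be obtained by brute-force enumeration of the nⁿ = 4 equally likely ordered resamples. The sketch takes c = 1 and d = 3, which is an arbitrary choice.

```r
# Exact bootstrap distribution of the sample mean for n = 2 by enumeration.
x <- c(1, 3)                                      # observed c < d (arbitrary values)
resamples  <- expand.grid(first = x, second = x)  # the 4 ordered resamples
theta.star <- rowMeans(resamples)

# Each ordered resample has probability 1/4; tabulate theta* - (c + d)/2
dev <- theta.star - mean(x)
table(dev) / nrow(resamples)
# dev = -1, 0, 1, i.e. (c - d)/2, 0, (d - c)/2, with probabilities 0.25, 0.5, 0.25
```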

In this example, the bootstrap distribution is easy to calculate exactly.

• When n is large, the exact computation quickly becomes too complicated to carry out directly.

• Instead, we use simple random sampling to approximate the bootstrap distribution.

• In the bootstrap literature, a variety of alternatives to simple random sampling have been suggested.

Project 1.3 Use the parametric bootstrap and the nonparametric bootstrap to approximate the distribution of the sample median, based on a data set of size 20 from a standard normal distribution.

The following is sample R code for the nonparametric bootstrap.

n <- 20
x <- rnorm(n)                        # create some data
theta.hat <- median(x)               # statistic computed on the original sample
B <- 1000; theta.boot <- rep(0, B)   # storage for the bootstrap replicates
for (i in 1:B) {
  xstar <- sample(x, size = n, replace = TRUE)   # draw a bootstrap sample
  theta.boot[i] <- median(xstar)                 # compute the statistic
}
var.boot <- var(theta.boot)
se <- sqrt(var.boot); print(se)      # bootstrap standard error of the median
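The project also asks for a parametric bootstrap. One possible sketch assumes a normal model for F, with its mean and standard deviation estimated by mean(x) and sd(x). Resampling then comes from the fitted normal instead of from x itself. The seed and variable names are our own.

```r
# Parametric vs nonparametric bootstrap of the sample median (sketch).
set.seed(4)
n <- 20
x <- rnorm(n)                              # data as in the project
mu.hat <- mean(x); sigma.hat <- sd(x)      # fitted normal model (assumption)

B <- 1000
theta.pboot <- replicate(B, median(rnorm(n, mean = mu.hat, sd = sigma.hat)))
theta.nboot <- replicate(B, median(sample(x, size = n, replace = TRUE)))

# Compare the two bootstrap standard errors of the sample median
c(parametric = sd(theta.pboot), nonparametric = sd(theta.nboot))
```

Histograms of `theta.pboot` and `theta.nboot` show the difference in shape. The nonparametric replicates take only a few distinct values, because each resampled median must be one of the observed points (or an average of two).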

We now introduce notation to describe the bootstrap method.

• Assume that the data X₁, · · · , X_n are independent and identically distributed (iid) samples from a k-dimensional population distribution F.

• We wish to estimate the distribution

$$H_n(x) = P\{R_n \le x\},$$

where R_n = R_n(T_n, F) is a real-valued function of T_n and F, and T_n = T_n(X₁, · · · , X_n) is a statistic of interest.

• Let X₁^∗, · · · , X_n^∗ be a “bootstrap” sample drawn iid from F_n, the empirical distribution based on X₁, · · · , X_n. Let T_n^∗ = T_n(X₁^∗, · · · , X_n^∗) and R^∗_n = R_n(T_n^∗, F_n). F_n is constructed by placing a mass 1/n at each observation X_i. Thus F_n may be represented as

$$F_n(x) = \frac{1}{n}\sum_{i=1}^{n} I(X_i \le x), \qquad -\infty < x < \infty.$$

• A bootstrap estimator of H_n is

$$\hat H_n(x) = P_*\{R^*_n \le x\},$$

where, for given X₁, · · · , X_n, P_∗ is the conditional probability with respect to the random generator of bootstrap samples.

• Since the bootstrap samples are generated from F_n, this method is called the nonparametric bootstrap.

– Note that Ĥ_n(x) depends on F_n and hence is itself a random variable.

– To be specific, Ĥ_n(x) changes as the data {x₁, · · · , x_n} change.

– Recall that a bootstrap analysis is run to assess the accuracy of some primary statistical results.

– This produces bootstrap statistics, like standard errors or confidence intervals, which are assessments of error for the primary results.

• As a further remark, the empirical distribution F_n is called the nonparametric maximum likelihood estimate (MLE) of F.

As illustrations, we consider the following three examples.

Example 1. Suppose that X₁, · · · , X_n ∼ N(µ, 1) and $R_n = \sqrt{n}(\bar X_n - \mu)$. Consider the estimation of

$$P(a) = P\{R_n > a \mid N(\mu, 1)\}.$$

The nonparametric bootstrap method will estimate P(a) by

$$P_{NB}(a) = P\{\sqrt{n}(\bar X_n^* - \bar X_n) > a \mid F_n\}.$$

• Observe data x₁, · · · , x_n with mean x̄_n.

• Let Y₁, . . . , Y_n denote a bootstrap sample of n observations drawn independently from F_n.

• Let $\bar Y_n = n^{-1}\sum_{i=1}^{n} Y_i$.

• P(a) is estimated by $P_{NB}(a) = P\{\sqrt{n}(\bar Y_n - \bar x_n) > a \mid F_n\}$.

• In principle, P_{NB}(a) can be found by considering all nⁿ possible bootstrap samples.

– If all the X_i’s are distinct, then the number of different possible resamples equals the number of distinct ways of placing n indistinguishable objects into n numbered boxes, the boxes being allowed to contain any number of objects. It is known that this number equals $\binom{2n-1}{n} \approx (n\pi)^{-1/2}\,2^{2n-1}$.

– When n = 10 (respectively 20), $\binom{2n-1}{n}$ = 92378 (respectively about 6.9 × 10¹⁰).

– For small values of n, it is often feasible to calculate a bootstrap estimate exactly.

– For larger samples this quickly becomes infeasible, even with today’s computer technology.

• Natural questions to ask are as follows:

– What are computationally efficient ways to bootstrap?

– Can we get bootstrap-like answers without Monte Carlo?

• We also need to address the question of “evaluating” the performance of the bootstrap method.

– For the above particular problem, we need to estimate P_{NB}(a) − P(a) or sup_a |P_{NB}(a) − P(a)|.

– As a remark, P_{NB}(a) is a random variable since F_n is random.

– Efron (1992) proposed using the jackknife to give error estimates for bootstrap quantities.

• Suppose that additional information on F is available. Then it is reasonable to utilize this information in the bootstrap method.

• In this example, F is known to be normally distributed with unknown mean µ and variance 1.

– It is natural to use x̄_n to estimate µ and then estimate P(a) = P{R_n > a | N(µ, 1)} by

$$P_{PB}(a) = P\{\sqrt{n}(\bar Y_n - \bar x_n) > a \mid N(\bar x_n, 1)\}.$$

– Since the bootstrap samples are generated from N(x̄_n, 1), which utilizes the information from a parametric form of F, this method is called the parametric bootstrap.

– In this case, P_{PB}(a) = P(a) = 1 − Φ(a) for every realization of X̄_n. The reason is that √n(Ȳ_n − x̄_n) under N(x̄_n, 1) and √n(X̄_n − µ) under N(µ, 1) both have exactly the N(0, 1) distribution.

– If F is known to be normal with unknown mean µ and unknown variance σ², P_{PB}(a) is no longer equal to P(a).
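The quantities in Example 1 can be explored numerically. The sketch first compares the exact count of distinct resamples with the approximation (nπ)^(−1/2) 2^(2n−1). It then approximates P_NB(a) and P_PB(a) by Monte Carlo for one simulated data set. The sample size, µ, a and the seed are arbitrary choices. In the parametric part, the mean of n draws from N(x̄_n, 1) is generated directly as a N(x̄_n, 1/n) variate, which has the same distribution.

```r
# Part 1: exact number of distinct resamples, C(2n - 1, n), versus the
# approximation (n * pi)^(-1/2) * 2^(2n - 1).
sizes        <- c(10, 20)
exact.count  <- choose(2 * sizes - 1, sizes)
approx.count <- 2^(2 * sizes - 1) / sqrt(sizes * pi)
rbind(exact.count, approx.count)     # exact counts: 92378 and 68923264410

# Part 2: Monte Carlo approximations of P_NB(a) and P_PB(a) for one data set,
# compared with the true P(a) = 1 - pnorm(a). n, mu, a, seed are arbitrary.
set.seed(5)
n  <- 25; mu <- 2; a <- 1
x  <- rnorm(n, mean = mu, sd = 1)
xbar <- mean(x)
B  <- 20000

ybar.nb <- replicate(B, mean(sample(x, n, replace = TRUE)))  # resampling from F_n
ybar.pb <- rnorm(B, mean = xbar, sd = 1 / sqrt(n))          # mean of n draws from N(xbar, 1)

probs <- c(P.NB = mean(sqrt(n) * (ybar.nb - xbar) > a),
           P.PB = mean(sqrt(n) * (ybar.pb - xbar) > a),
           P    = 1 - pnorm(a))
probs
```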

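Project 1.4. In the setting with unknown mean µ and unknown variance σ², let the bootstrap samples be drawn from N(x̄_n, s²_n).

(a) Show that P_{PB}(a) = 1 − Φ(a/s_n), where $s_n^2 = (n-1)^{-1}\sum_{i=1}^{n}(x_i - \bar x_n)^2$.

(b) Prove that P_{PB}(a) is a consistent estimate of P(a) for fixed a.

(c) Prove that $\sup_a |P_{PB}(a) - P(a)| \xrightarrow{P} 0$.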

To find P_{NB}(a), we could in principle write down the characteristic function of √n(Ȳ_n − x̄_n) and then apply the inversion formula. However, this is a nontrivial job. Therefore, Efron (1979) suggested approximating P_{NB}(a) by Monte Carlo resampling. Resamples of the same size as the original sample are drawn repeatedly from it, and the value of the statistic is computed for each resample. The bootstrap quantity is then approximated by averaging an appropriate function of these values.

Now we state Lévy’s inversion formula, which is taken from Chapter 6.2 of Chung (1974).

Theorem. If x₁ < x₂ and x₁ and x₂ are points of continuity of F, then

$$F(x_2) - F(x_1) = \lim_{T\to\infty}\frac{1}{2\pi}\int_{-T}^{T}\frac{e^{-itx_1} - e^{-itx_2}}{it}\,f(t)\,dt,$$

where f(t) is the characteristic function of F.
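As a sanity check, the theorem can be applied to a distribution whose answer is known. The sketch assumes F = N(0, 1), whose characteristic function is exp(−t²/2), and picks arbitrary points x₁ = −0.5 and x₂ = 1.2. The limit T → ∞ is replaced by numerical integration over (0, ∞) with R's integrate().

```r
# Numerical check of Levy's inversion formula for F = N(0, 1), whose
# characteristic function is f(t) = exp(-t^2 / 2). Since f is real and even,
# the imaginary part of the integrand is odd and integrates to zero, while the
# real part of (exp(-i t x1) - exp(-i t x2)) / (i t) is (sin(t x2) - sin(t x1)) / t.
x1 <- -0.5; x2 <- 1.2                    # arbitrary continuity points
integrand <- function(t) {
  ratio <- ifelse(t == 0, x2 - x1,       # limiting value at t = 0
                  (sin(t * x2) - sin(t * x1)) / t)
  ratio * exp(-t^2 / 2)
}
# The integrand is even, so (1 / (2 pi)) * integral over (-Inf, Inf)
# equals (1 / pi) * integral over (0, Inf).
inverted <- integrate(integrand, lower = 0, upper = Inf)$value / pi
c(inversion = inverted, direct = pnorm(x2) - pnorm(x1))
```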


Example 2. Estimating the probability of success

• Consider a probability distribution F putting all of its mass at zero or one.

• Let θ(F ) = P (X = 1) = p.

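• Consider R(X, F) = X̄ − θ(F) = p̂ − p.

• Having observed X = x, the bootstrap sample satisfies

$$X_1^*, \cdots, X_n^* \overset{iid}{\sim} \text{Bin}(1, \theta(F_n)) = \text{Bin}(1, \bar x_n).$$

Note that

$$R(X^*, F_n) = \bar X_n^* - \bar x_n, \qquad E_*(\bar X_n^* - \bar x_n) = 0, \qquad \mathrm{Var}_*(\bar X_n^* - \bar x_n) = \frac{\bar x_n(1 - \bar x_n)}{n}.$$

Recall that $n\bar X_n^* \sim \text{Bin}(n, \bar x_n)$ and $n\bar X_n \sim \text{Bin}(n, p)$.

• It is known that if min{n x̄_n, n(1 − x̄_n)} ≥ 5, then approximately

$$\frac{n\bar X_n^* - n\bar x_n}{\sqrt{n\bar x_n(1 - \bar x_n)}} = \frac{\sqrt{n}(\bar X_n^* - \bar x_n)}{\sqrt{\bar x_n(1 - \bar x_n)}} \sim N(0, 1);$$

and if min{np, n(1 − p)} ≥ 5, then approximately

$$\frac{n\bar X_n - np}{\sqrt{np(1 - p)}} = \frac{\sqrt{n}(\bar X_n - p)}{\sqrt{p(1 - p)}} \sim N(0, 1).$$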

• Based on the above approximation results, we conclude that the bootstrap method works if min{n x̄_n, n(1 − x̄_n)} ≥ 5.

• The question that remains to be studied is whether P{min(nX̄_n, n(1 − X̄_n)) ≥ 5} → 1 as n → ∞.
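The identity Var_∗(X̄_n^∗ − x̄_n) = x̄_n(1 − x̄_n)/n can be checked by simulation. The data below (30 ones among n = 100) and the seed are made up for illustration.

```r
# Example 2: bootstrap variance of the resampled proportion versus the
# formula xbar * (1 - xbar) / n. Data and seed are illustrative choices.
set.seed(6)
n <- 100
x <- c(rep(1, 30), rep(0, 70))
xbar <- mean(x)

B <- 20000
# n * Xbar* ~ Bin(n, xbar); the literal alternative would be
# replicate(B, mean(sample(x, n, replace = TRUE))).
xbar.star <- rbinom(B, n, xbar) / n

c(bootstrap = var(xbar.star), formula = xbar * (1 - xbar) / n)
min(n * xbar, n * (1 - xbar)) >= 5   # normal-approximation condition: TRUE here
```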

Example 3. Estimating the median

• Suppose we are interested in finding the distribution of $n^{1/2}\{F_n^{-1}(1/2) - F^{-1}(1/2)\}$, where F_n⁻¹(1/2) and F⁻¹(1/2) are the sample and population medians, respectively.

• Set θ(F) = F⁻¹(1/2).

• Find a bootstrap approximation of the above distribution.

• Consider n = 2m − 1. Then the sample median is F_n⁻¹(1/2) = X_(m), where X₍₁₎ ≤ X₍₂₎ ≤ · · · ≤ X_(n).

• Let N_i^∗ denote the number of times x_i is selected in the bootstrap sampling procedure. Set N^∗ = (N₁^∗, · · · , N_n^∗). It follows easily that N^∗ has a multinomial distribution with n trials and selection probabilities (n⁻¹, · · · , n⁻¹).

• Denote the order statistics of x₁, . . . , x_n by x₍₁₎ ≤ · · · ≤ x_(n).

• Set N_[i]^∗ to be the number of times x_(i) is chosen. Then for 1 ≤ ℓ < n, we have

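$$\text{Prob}_*(X^*_{(m)} > x_{(\ell)}) = \text{Prob}_*\{N^*_{[1]} + \cdots + N^*_{[\ell]} \le m - 1\} = \text{Prob}\left\{\text{Bin}\left(n, \frac{\ell}{n}\right) \le m - 1\right\} = \sum_{j=0}^{m-1}\binom{n}{j}\left(\frac{\ell}{n}\right)^{j}\left(1 - \frac{\ell}{n}\right)^{n-j}.$$

Or, writing T^∗ = X^∗_(m) − x_(m),

$$\text{Prob}_*(T^* = x_{(\ell)} - x_{(m)}) = \text{Prob}\left\{\text{Bin}\left(n, \frac{\ell - 1}{n}\right) \le m - 1\right\} - \text{Prob}\left\{\text{Bin}\left(n, \frac{\ell}{n}\right) \le m - 1\right\}.$$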

• When n = 13 (so m = 7), we have

ℓ              2 or 12   3 or 11   4 or 10   5 or 9   6 or 8   7
probability    0.0015    0.0142    0.0550    0.1242   0.1936   0.2230

Quite often we use the mean square error to measure the performance of an estimator t(X) of θ(F), i.e., E_F T² = E_F(t(X) − θ(F))². If we use the bootstrap to estimate E_F T², the bootstrap estimate is

$$E_*(T^*)^2 = \sum_{\ell=1}^{13}\left[x_{(\ell)} - x_{(7)}\right]^2\,\text{Prob}_*\{T^* = x_{(\ell)} - x_{(7)}\}.$$

It is known that n E_F T² → [4f²(θ)]⁻¹ as n tends to infinity when F has a bounded continuous density f. A natural question to ask is whether E_∗(T^∗)² is close to E_F T².
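The binomial formula makes the bootstrap distribution of the median computable without any resampling. This R sketch reproduces the n = 13 table. It then evaluates the bootstrap mean-square-error estimate for one data set, which is simulated here only to have numbers to plug in.

```r
# Example 3: exact bootstrap distribution of T* = X*_(m) - x_(m) for the
# sample median with n = 2m - 1 = 13, from the binomial formula above.
n <- 13; m <- (n + 1) / 2
ell <- 1:n
# Prob_*(T* = x_(ell) - x_(m)) = P{Bin(n, (ell-1)/n) <= m-1} - P{Bin(n, ell/n) <= m-1}
prob <- pbinom(m - 1, n, (ell - 1) / n) - pbinom(m - 1, n, ell / n)
round(prob, 4)

# Bootstrap estimate E_*(T*)^2 of the mean square error for one data set;
# the data are simulated only for illustration.
set.seed(7)
x.sorted <- sort(rnorm(n))
mse.boot <- sum((x.sorted - x.sorted[m])^2 * prob)
mse.boot
```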

Validity of the Bootstrap Method

We now give a brief discussion of the validity of the bootstrap method. First, we state some central limit theorems and a bound on their approximation error.

Perhaps the most widely known version of the CLT is the following.

Theorem (Lindeberg–Lévy Central Limit Theorem) Let {X_i} be iid with mean µ and finite variance σ². Then

$$\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^{n} X_i - \mu\right) \xrightarrow{d} N(0, \sigma^2).$$

The above theorem can be generalized to independent random variables which are not necessarily identically distributed.

Theorem (Lindeberg–Feller CLT) Let {X_i} be independent with means {µ_i}, finite variances {σ_i²}, and distribution functions {F_i}.

• Suppose that $B_n^2 = \sum_{i=1}^{n}\sigma_i^2$ satisfies σ_n²/B_n² → 0 and B_n → ∞ as n → ∞.

• Then $n^{-1}\sum_{i=1}^{n} X_i$ is asymptotically normal, $AN\left(n^{-1}\sum_{i=1}^{n}\mu_i,\; n^{-2}B_n^2\right)$, if and only if the following Lindeberg condition is satisfied:

$$B_n^{-2}\sum_{i=1}^{n}\int_{|t - \mu_i| > \varepsilon B_n}(t - \mu_i)^2\,dF_i(t) \to 0, \quad n \to \infty, \text{ for each } \varepsilon > 0.$$

In the theorems just described, asymptotic normality was asserted for a sequence of sums $\sum_{i=1}^{n} X_i$ generated by a single sequence X₁, X₂, . . . of random variables. For the validity of the bootstrap, we may instead consider a double array of random variables. A bootstrap sample X^∗_{n1}, . . . , X^∗_{nn} is drawn from F_n, which changes with n, so bootstrap observations naturally form such an array:

X₁₁, X₁₂, · · · , X_1K₁;
X₂₁, X₂₂, · · · , X_2K₂;
. . .
X_n1, X_n2, · · · , X_nK_n;
. . .

For each n ≥ 1, there are K_n random variables {X_nj, 1 ≤ j ≤ K_n}. It is assumed that K_n → ∞. The case K_n = n is called a “triangular” array.

Denote by F_nj the distribution function of X_nj. Also, put

$$\mu_{nj} = EX_{nj}, \qquad A_n = E\sum_{j=1}^{K_n} X_{nj} = \sum_{j=1}^{K_n}\mu_{nj}, \qquad B_n^2 = \mathrm{Var}\left(\sum_{j=1}^{K_n} X_{nj}\right).$$

We then have the following theorem.

Theorem (Lindeberg–Feller) Let {X_nj : 1 ≤ j ≤ K_n; n = 1, 2, . . .} be a double array with independent random variables within rows. Then the “uniform asymptotic negligibility” condition

$$\max_{1 \le j \le K_n} P(|X_{nj} - \mu_{nj}| > \tau B_n) \to 0, \quad n \to \infty, \text{ for each } \tau > 0,$$

and the asymptotic normality condition “$\sum_{j=1}^{K_n} X_{nj}$ is AN(A_n, B_n²)” together hold if and only if the Lindeberg condition

$$B_n^{-2}\sum_{j=1}^{K_n}\int_{|t - \mu_{nj}| > \varepsilon B_n}(t - \mu_{nj})^2\,dF_{nj}(t) \to 0, \quad n \to \infty, \text{ for each } \varepsilon > 0$$

is satisfied. Note that independence is assumed only within rows; the rows themselves may be arbitrarily dependent.

It is of both theoretical and practical interest to characterize the error of approximation in the CLT.

For the i.i.d. case, an explicit bound on the error of approximation is provided by the following theorem due to Berry (1941) and Esseen (1945).

Theorem If X₁, . . . , X_n are i.i.d. with distribution F and S_n = X₁ + · · · + X_n, then there exists a constant c (independent of F) such that

$$\sup_x\left|P\left(\frac{S_n - ES_n}{\sqrt{\mathrm{Var}(S_n)}} \le x\right) - \Phi(x)\right| \le \frac{c}{\sqrt{n}}\,\frac{E|X_1 - EX_1|^3}{[\mathrm{Var}(X_1)]^{3/2}}$$

for all F with finite third moment.

• Note that c in the above theorem is a universal constant. Various authors have sought the best constant c.

• Originally c was set to 33/4, but it has since been sharpened to 0.7975.

• For sufficiently large x, with n fixed, the quantities

$$P\left[(S_n - ES_n)/\sqrt{\mathrm{Var}(S_n)} \le x\right]$$

and Φ(x) both become so close to 1 that the bound above is too crude.

• The problem in this case may be characterized as one of approximating large deviation probabilities. Attention then shifts to the relative error in approximating

$$1 - P\left[(S_n - ES_n)/\sqrt{\mathrm{Var}(S_n)} \le x\right]$$

by 1 − Φ(x) as x → ∞.
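To see how the Berry–Esseen bound compares with the actual error, the sketch below uses Bernoulli(p) summands. It computes the exact Kolmogorov distance between the standardized binomial distribution and N(0, 1), alongside the bound with c = 0.7975. The choices p = 0.2 and n ∈ {10, 100, 1000} are arbitrary, and the function name be.check is our own.

```r
# Berry-Esseen check for Bernoulli(p) summands: exact Kolmogorov distance
# between the standardized Bin(n, p) law and N(0, 1), next to the bound
# const * E|X - p|^3 / (Var(X)^(3/2) * sqrt(n)) with const = 0.7975.
be.check <- function(n, p, const = 0.7975) {
  k <- 0:n
  z <- (k - n * p) / sqrt(n * p * (1 - p))   # jump points of the standardized cdf
  # The sup over x is attained at a jump point, either at the point itself
  # (cdf value pbinom(k)) or approached from the left (pbinom(k - 1)).
  dist <- max(abs(pbinom(k, n, p) - pnorm(z)),
              abs(pbinom(k - 1, n, p) - pnorm(z)))
  rho   <- p * (1 - p) * (p^2 + (1 - p)^2)    # E|X - p|^3 for Bernoulli(p)
  bound <- const * rho / ((p * (1 - p))^(3/2) * sqrt(n))
  c(n = n, distance = dist, bound = bound)
}
res <- t(sapply(c(10, 100, 1000), be.check, p = 0.2))
res
```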


Inconsistent Bootstrap Estimator

Bickel and Freedman (1981) and Loh (1984) showed that the bootstrap estimators of the distributions of the extreme-order statistics are inconsistent.

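• Let X_(n) be the maximum of i.i.d. random variables X₁, . . . , X_n from F with F(θ) = 1 for some θ, and let X^∗_(m) be the maximum of X₁^∗, . . . , X_m^∗, which are i.i.d. from the empirical distribution F_n.

• If F is continuous, then although X_(n) → θ, it never equals θ (with probability one). But, taking m = n, P_∗{X^∗_(n) = X_(n)} = 1 − (1 − n⁻¹)ⁿ → 1 − e⁻¹ ≈ 0.632. The bootstrap distribution of n(X_(n) − X^∗_(n)) therefore keeps an atom of mass about 0.632 at zero. In contrast, when F is uniform on [0, θ], for example, n(θ − X_(n)) has a continuous (exponential) limiting distribution. This leads to the inconsistency of the bootstrap estimator.

• The reason for the inconsistency of the bootstrap is that the bootstrap samples are drawn from F_n, which is not exactly F. Therefore, the bootstrap may fail due to a lack of “continuity.”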
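The following simulation is along the lines of Project 1.5. It assumes X ~ Uniform(0, 1), so θ = 1; n, B and the seed are arbitrary. It estimates how often the bootstrap maximum equals the observed maximum, and compares that frequency with 1 − (1 − 1/n)ⁿ and its limit 1 − e⁻¹.

```r
# Bootstrap of the sample maximum for X ~ Uniform(0, 1), theta = 1.
set.seed(8)
n <- 100; B <- 5000
x <- runif(n)
max.star <- replicate(B, max(sample(x, n, replace = TRUE)))

atom <- mean(max.star == max(x))   # how often X*_(n) equals X_(n)
c(simulated = atom, exact = 1 - (1 - 1/n)^n, limit = 1 - exp(-1))
```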

Consider the following problem.

• Let X₁, . . . , X_n be independent, with a common N (µ, σ²) distribution.

• Let s² be the sample variance.

• Consider the pivot s²/σ². This is distributed, of course, as χ²_{n−1}/(n − 1).

• Consider the bootstrap approximation, namely the distribution of s^{∗2}/s², where s^{∗2} is the variance of the resampled data.

• For n = 20, the bootstrap approximation is not good.

• One source of this problem is in the tails of the normal: about 30% of the variance is contributed by 5% of the distribution and will be missed by typical samples of size 20.

• In other words, s² is mean-unbiased for σ², but quite skewed for moderate n. In most samples s² is somewhat too small, and this is counterbalanced by the few samples where s² is huge. The variance of s^{∗2}/s² is largely controlled by the sample fourth moment, which is even more skewed.

• For large n, these problems go away: after all, s² and the bootstrap are consistent.
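For the variance example, the sketch below compares a few quantiles of the bootstrap distribution of s^{∗2}/s² with those of χ²_{n−1}/(n − 1). It uses one normal sample of size 20, and the seed is arbitrary. Rerunning it with other data sets shows how much the bootstrap column can move from sample to sample.

```r
# Bootstrap approximation to the law of s^2 / sigma^2 for one normal sample
# of size n = 20, versus the exact chi^2_{n-1} / (n - 1) distribution.
set.seed(9)
n <- 20; B <- 5000
x  <- rnorm(n)
s2 <- var(x)
ratio.star <- replicate(B, var(sample(x, n, replace = TRUE))) / s2

probs <- c(0.05, 0.5, 0.95)
rbind(bootstrap = quantile(ratio.star, probs),
      exact     = qchisq(probs, df = n - 1) / (n - 1))
```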

Project 1.5 Use simulation to illustrate the points made in the above two problems. For the problem of estimating the end point, you can assume that X is a uniform random variable over [0, 1] (i.e., θ = 1).

Bias Reduction via the Bootstrap Principle

• The bootstrap can also be used to estimate bias and do bias reduction.

• Consider θ₀ = θ(F₀) = µ³, where µ = ∫ x dF₀(x). Set θ̂ = θ(F_n) = X̄³.

• Elementary calculations show that

$$E\{\theta(F_n) \mid F_0\} = E\left\{\mu + n^{-1}\sum_{i=1}^{n}(X_i - \mu)\right\}^3 = \mu^3 + 3n^{-1}\mu\sigma^2 + n^{-2}\gamma,$$

where γ = E(X₁ − µ)³ denotes the population third central moment (a measure of skewness).

• Using the nonparametric bootstrap, we obtain

$$E\{\theta(F_n^*) \mid F_n\} = \bar X^3 + 3n^{-1}\bar X\hat\sigma^2 + n^{-2}\hat\gamma,$$

where $\hat\sigma^2 = n^{-1}\sum(X_i - \bar X)^2$ and $\hat\gamma = n^{-1}\sum(X_i - \bar X)^3$ denote the sample variance and the sample third central moment, respectively.

• By the bootstrap principle, E{θ(F_n^∗)|F_n} − θ(F_n) is used to estimate θ(F_n) − θ(F₀).

• Note that θ₀ = θ(F_n) − (θ(F_n) − θ₀). Hence θ₀ can be estimated by θ(F_n) − [E{θ(F_n^∗)|F_n} − θ(F_n)], that is, by 2θ(F_n) − E{θ(F_n^∗)|F_n}.

• The bootstrap bias-reduced estimate is therefore

$$\hat\theta_{NB} = 2\bar X^3 - \left(\bar X^3 + 3n^{-1}\bar X\hat\sigma^2 + n^{-2}\hat\gamma\right) = \bar X^3 - 3n^{-1}\bar X\hat\sigma^2 - n^{-2}\hat\gamma.$$
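The closed-form bias-reduced estimate can be compared with a direct Monte Carlo version of 2θ(F_n) − E{θ(F_n^∗)|F_n}. In the sketch the data are simulated from an exponential distribution with mean 2 (so µ³ = 8), an arbitrary choice. As in the text, σ̂² and γ̂ use divisor n.

```r
# Bias reduction for theta0 = mu^3: closed form versus Monte Carlo.
set.seed(10)
n <- 15
x <- rexp(n, rate = 1/2)                 # illustrative data: mean 2, mu^3 = 8
xbar <- mean(x)
sigma2.hat <- mean((x - xbar)^2)         # sample variance, divisor n
gamma.hat  <- mean((x - xbar)^3)         # sample third central moment

theta.hat <- xbar^3
boot.mean <- xbar^3 + 3 * xbar * sigma2.hat / n + gamma.hat / n^2  # E{theta(F_n*) | F_n}
theta.nb  <- 2 * theta.hat - boot.mean   # = xbar^3 - 3 xbar sigma2.hat / n - gamma.hat / n^2

# Monte Carlo version: each row of the matrix is one resample of size n
B <- 50000
theta.star  <- rowMeans(matrix(sample(x, n * B, replace = TRUE), nrow = B))^3
theta.nb.mc <- 2 * theta.hat - mean(theta.star)

c(theta.hat = theta.hat, theta.nb = theta.nb, theta.nb.mc = theta.nb.mc)
```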