# Financial Time Series I Topic 3: Resampling Methods Hung Chen Department of Mathematics National Taiwan University 11/11/2002


OUTLINE

1. Motivated Example

– Double Blind Randomized Experiment

– Bootstrap with R

2. Odds Ratio

– Definition

– Random Design

– Prospective Study

– Retrospective Study

3. Bootstrap Method

– Parametric Bootstrap

– Nonparametric Bootstrap

– Failure of bootstrap method


The Practice of Statistics

• Statistics is the science of learning from experience, especially experience that arrives a little bit at a time.

• Most people are not natural-born statisticians.

– We are not very good at picking out patterns from a sea of noisy data.

– To put it another way, we all are too good at picking out non-existent patterns that happen to suit our purposes.

– Statistical theory attacks the problem from both ends. It provides optimal methods for finding a real signal in a noisy background, and also provides strict checks against the overinterpretation of random patterns.

• Statistical theory attempts to answer three basic questions:

1. Data Collection: How should I collect my data?

2. Summary: How should I analyze and summarize the data that I’ve collected?


3. Statistical Inference: How accurate are my data summaries?

• The bootstrap is a recently developed technique for making certain kinds of statistical inferences.

It is only recently developed because it requires modern computer power to simplify the often intricate calculations of traditional statistical theory.

• The idea of the bootstrap method is close to that of simulation. The main difference is that an estimate is plugged in for the underlying unknown random mechanism F.


Motivated Example

• We illustrate the three basic statistical concepts just mentioned using a front-page news story from the New York Times of January 27, 1987.

• A study was done to see if small aspirin doses would prevent heart attacks in healthy middle-aged men.

• The data for the aspirin study were collected in a particularly efficient way: by a controlled, randomized, double-blind study.

– One half of the subjects received aspirin and the other half received a control substance, or placebo, with no active ingredients.

– The subjects were randomly assigned to the aspirin or placebo groups.

– Both the subjects and the supervising physicians were blind to the assignments, with the statisticians keeping a secret code of who received which substance.

– Scientists, like everyone else, want the subject they are working on to succeed.

– The elaborate precautions of a controlled, randomized, blinded experiment guard against seeing benefits that don’t exist, while maximizing the chance of detecting a genuine positive effect.

• The summary statistics in the study are very simple:

| | heart attacks (fatal plus non-fatal) | subjects |
|---|---|---|
| aspirin group | 104 | 11,037 |
| placebo group | 189 | 11,034 |

• What strikes the eye here is the lower rate of heart attacks in the aspirin group.

• The ratio of the two rates is

$$\hat\theta = \frac{104/11037}{189/11034} = 0.55.$$

It suggests that the aspirin-takers have only 55% as many heart attacks as the placebo-takers.

• We are not really interested in $\hat\theta$ itself.

What we would like to know is $\theta$, the true ratio, that is, the ratio we would see if we could treat all subjects, and not just a sample of them.

• The tough question is: how do we know that $\hat\theta$ might not come out much less favorably if the experiment were run again? This is where statistical inference comes in.

• Statistical theory allows us to make the following inference: the true value of $\theta$ lies in the interval $0.43 < \theta < 0.70$ with 95% confidence.

Note that

$$\theta = \hat\theta + (\theta - \hat\theta) = 0.55 + [\theta - \hat\theta(\omega_0)],$$

where $\theta$ and $\hat\theta(\omega_0)$ $(= 0.55)$ are two numbers.

• Since ω0 cannot be observed, we use θ− ˆθ(ω) to describe θ − ˆθ(ω0) in statistics.

• What is the fluctuation of θ − ˆθ(ω) among all ω?

• If, for most ω, θ − ˆθ(ω) is around zero, we can conclude statistically that θ is close to 0.55 (= ˆθ(ω0)).


• (Recall the definition of consistency.) If $P(\omega : |\theta - \hat\theta(\omega)| < 0.1) = 0.95$, we claim with 95% confidence that $|\theta - 0.55|$ is no more than 0.1.

• The aspirin study also tracked strokes.

The results are as follows:

| | strokes | subjects |
|---|---|---|
| aspirin group | 119 | 11,037 |
| placebo group | 98 | 11,034 |

• For strokes, the ratio of the two rates is

$$\hat\theta = \frac{119/11037}{98/11034} = 1.21.$$

It now looks like taking aspirin is actually harmful.

• However, the interval for the true stroke ratio $\theta$ turns out to be $0.93 < \theta < 1.59$ with 95% confidence. This includes the neutral value $\theta = 1$, at which aspirin would be no better or worse than placebo.

• In the language of statistical hypothesis testing, aspirin was found to be significantly beneficial for preventing heart attacks, but not significantly harmful for causing strokes.

According to an introductory statistics course, we can put the above problem into the framework of a two-sample problem with binomial distributions. Asymptotic analysis is then used to approximate the distribution of $\hat\theta$.
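As a rough illustration of the asymptotic route, the sketch below computes one standard large-sample interval for a ratio of two binomial proportions, using a normal approximation on the log scale. The helper name `ratio_ci` and the choice of the log-scale interval are assumptions made here; the notes do not say how the quoted intervals were obtained.

```r
# Large-sample 95% interval for a ratio of two binomial proportions.
# ratio_ci is a hypothetical helper name, not taken from the notes.
ratio_ci <- function(s1, n1, s2, n2, level = 0.95) {
  theta_hat <- (s1 / n1) / (s2 / n2)
  # Delta-method standard error of log(theta_hat)
  se_log <- sqrt(1 / s1 - 1 / n1 + 1 / s2 - 1 / n2)
  z <- qnorm(1 - (1 - level) / 2)
  c(estimate = theta_hat,
    lower = exp(log(theta_hat) - z * se_log),
    upper = exp(log(theta_hat) + z * se_log))
}

heart_ci  <- ratio_ci(104, 11037, 189, 11034)  # heart attacks
stroke_ci <- ratio_ci(119, 11037, 98, 11034)   # strokes
print(round(heart_ci, 2))
print(round(stroke_ci, 2))
```

For the heart attack data this interval agrees with the quoted 0.43 < θ < 0.70 to two decimals, and for the stroke data it contains the neutral value 1, close to the interval quoted above.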

In this note, we demonstrate an alternative.

Apply the bootstrap method in the stroke example.

1. Create two pseudo-populations based on the collected data:

– Pseudo-population 1: it consists of 119 ones and 11037 − 119 = 10918 zeros.

– Pseudo-population 2: it consists of 98 ones and 11034 − 98 = 10936 zeros.

2. (Monte Carlo resampling) Draw with replacement a sample of 11037 items from the first pseudo-population, and a sample of 11034 items from the second pseudo-population. Each of these is called a bootstrap sample.

3. Derive the bootstrap replicate of $\hat\theta$:

$$\hat\theta^* = \frac{\text{proportion of ones in bootstrap sample \#1}}{\text{proportion of ones in bootstrap sample \#2}}.$$

4. Repeat this process (steps 2-3) a large number of times, say 1000 times, and obtain 1000 bootstrap replicates $\hat\theta^*$.

The above procedure can be implemented easily using the following R code.

• n1 <- 11037; s1 <- 119; p1 <- s1/n1

• n2 <- 11034; s2 <- 98; p2 <- s2/n2

• Write a function named stroke. Drawing n items with replacement from a pseudo-population and counting the ones is the same as drawing a single Binomial(n, p) count, which is why rbinom is used.

• stroke <- function(n1, p1, n2, p2) {

aspirin <- rbinom(1, n1, p1)   # number of ones in bootstrap sample #1

placebo <- rbinom(1, n2, p2)   # number of ones in bootstrap sample #2

theta <- (aspirin/n1)/(placebo/n2)

return(theta)

}

• Suppose that we would like to do 1000 replications.

• result <- rep(1, 1000), which is used to store the 1000 bootstrap replicates of $\hat\theta$.

• for (i in 1:1000) result[i] <- stroke(n1, p1, n2, p2)

My simulation gives

• The standard deviation turned out to be 0.17 in a batch of 1000 replicates that we generated.

• A rough 95% confidence interval is (0.93, 1.60) which is derived by taking the 25th and 975th largest of the 1000 replicates.

• The above method is called the percentile method.
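The whole procedure can also be written as one short, vectorized script. This is a minimal sketch under assumptions made here: the seed is arbitrary, and the variable names `theta_star`, `boot_sd` and `ci` are illustrative. The numbers it produces will differ slightly from the ones quoted above because they depend on the random draws.

```r
set.seed(2002)                          # arbitrary seed, only for reproducibility
n1 <- 11037; s1 <- 119; p1 <- s1 / n1   # aspirin group
n2 <- 11034; s2 <- 98;  p2 <- s2 / n2   # placebo group
B <- 1000                               # number of bootstrap replicates

# Each binomial draw is the number of ones in one bootstrap sample
ones1 <- rbinom(B, n1, p1)
ones2 <- rbinom(B, n2, p2)
theta_star <- (ones1 / n1) / (ones2 / n2)

boot_sd <- sd(theta_star)               # bootstrap standard deviation
ci <- sort(theta_star)[c(25, 975)]      # percentile method: 25th and 975th ordered values
print(boot_sd)
print(ci)
```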

Project 1.1 Do a computer experiment to repeat the above process one hundred times and answer the following questions.

• (a) Give a summary of those 100 confidence intervals. Do they always contain 1? If not, is it statistically correct?

• (b) How do you describe the distribution of the bootstrap replicates of $\hat\theta$? Is it close to normal?


Odds Ratio

• If an event has probability $P(A)$ of occurring, the odds of $A$ occurring are defined to be

$$\mathrm{odds}(A) = \frac{P(A)}{1 - P(A)}.$$

• Let $X$ denote the event that an individual is exposed to a potentially harmful agent and $D$ denote the event that the individual becomes diseased.

Denote the complementary events as $\bar X$ and $\bar D$.

• The odds of an individual contracting the disease given that he is exposed are

$$\mathrm{odds}(D \mid X) = \frac{P(D \mid X)}{1 - P(D \mid X)}$$

and the odds of contracting the disease given that he is not exposed are

$$\mathrm{odds}(D \mid \bar X) = \frac{P(D \mid \bar X)}{1 - P(D \mid \bar X)}.$$

• The odds ratio

$$\Delta = \frac{\mathrm{odds}(D \mid X)}{\mathrm{odds}(D \mid \bar X)}$$

is a measure of the influence of exposure on subsequent disease.


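We will consider how the odds and odds ratio could be estimated by sampling from a population with joint and marginal probabilities defined as in the following table:

| | $\bar D$ | $D$ | |
|---|---|---|---|
| $\bar X$ | $\pi_{00}$ | $\pi_{01}$ | $\pi_{0\cdot}$ |
| $X$ | $\pi_{10}$ | $\pi_{11}$ | $\pi_{1\cdot}$ |
| | $\pi_{\cdot 0}$ | $\pi_{\cdot 1}$ | 1 |

With this notation,

$$P(D \mid X) = \frac{\pi_{11}}{\pi_{10} + \pi_{11}}, \qquad P(D \mid \bar X) = \frac{\pi_{01}}{\pi_{00} + \pi_{01}},$$

so that

$$\mathrm{odds}(D \mid X) = \frac{\pi_{11}}{\pi_{10}}, \qquad \mathrm{odds}(D \mid \bar X) = \frac{\pi_{01}}{\pi_{00}},$$

and the odds ratio is

$$\Delta = \frac{\pi_{11}\pi_{00}}{\pi_{01}\pi_{10}},$$

the product of the diagonal probabilities in the preceding table divided by the product of the off-diagonal probabilities.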
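A small numerical check may help fix the notation. The joint probabilities below are made up purely for illustration (they are not from the notes), and the object names are assumptions. The sketch computes the odds ratio once from the conditional odds and once from the cross-product formula.

```r
# Hypothetical joint probabilities (illustration only).
# Rows: not exposed / exposed; columns: not diseased / diseased.
joint <- matrix(c(0.60, 0.10,
                  0.20, 0.10),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("Xbar", "X"), c("Dbar", "D")))

odds <- function(p) p / (1 - p)

# Conditional probabilities of disease given exposure status
p_D_given_X    <- joint["X", "D"] / sum(joint["X", ])
p_D_given_Xbar <- joint["Xbar", "D"] / sum(joint["Xbar", ])

# Odds ratio from the definition and from the cross-product formula
delta_from_odds <- odds(p_D_given_X) / odds(p_D_given_Xbar)
delta_cross <- (joint["X", "D"] * joint["Xbar", "Dbar"]) /
               (joint["Xbar", "D"] * joint["X", "Dbar"])
print(c(delta_from_odds, delta_cross))
```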

Now we will consider three possible ways to sample this population to study the relationship of disease and exposure.


• Random sample:

– From such a sample, we could estimate all the probabilities directly.

– If the disease is rare, the total sample size would have to be quite large to guarantee that a substantial number of diseased individuals was included.

• Prospective study:

– A fixed number of exposed and nonexposed individuals are sampled and then followed through time.

– The incidences of disease in those two groups are compared.

– In this case the data allow us to estimate and compare $P(D \mid X)$ and $P(D \mid \bar X)$ and, hence, the odds ratio.

– The aspirin study described in the previous section can be viewed as this type of study.

• Retrospective study:

– A fixed number of diseased and undiseased individuals are sampled and the incidences of exposure in the two groups are compared.

– From such data we can directly estimate $P(X \mid D)$ and $P(X \mid \bar D)$.

– Since the marginal counts of diseased and nondiseased are fixed, we cannot estimate the joint probabilities or the important conditional probabilities $P(D \mid X)$ and $P(D \mid \bar X)$.

– Observe that

$$P(X \mid D) = \frac{\pi_{11}}{\pi_{01} + \pi_{11}}, \qquad 1 - P(X \mid D) = \frac{\pi_{01}}{\pi_{01} + \pi_{11}},$$

$$\mathrm{odds}(X \mid D) = \frac{\pi_{11}}{\pi_{01}}, \qquad \mathrm{odds}(X \mid \bar D) = \frac{\pi_{10}}{\pi_{00}}.$$

The odds ratio can also be expressed as $\mathrm{odds}(X \mid D)/\mathrm{odds}(X \mid \bar D)$.
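The last claim follows in one line from the two odds just computed, which is why a retrospective study can still estimate $\Delta$ even though it cannot estimate $P(D \mid X)$:

$$\frac{\mathrm{odds}(X \mid D)}{\mathrm{odds}(X \mid \bar D)} = \frac{\pi_{11}/\pi_{01}}{\pi_{10}/\pi_{00}} = \frac{\pi_{11}\pi_{00}}{\pi_{01}\pi_{10}} = \Delta.$$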

Now we describe the study of Vianna, Greenwald, and Davies (1971) to illustrate the retrospective study.

• In this study they collected data comparing the percentages of tonsillectomies for a group of patients suffering from Hodgkin’s disease and a comparable control group:

| | Tonsillectomy | No Tonsillectomy |
|---|---|---|
| Hodgkin’s | 67 | 34 |
| Control | 43 | 64 |

• Recall that the odds ratio can be expressed as $\mathrm{odds}(X \mid D)/\mathrm{odds}(X \mid \bar D)$, and an estimate of it is $n_{00}n_{11}/(n_{01}n_{10})$, the product of the diagonal counts divided by the product of the off-diagonal counts.

• For the data of Vianna, Greenwald, and Davies, the estimate of the odds ratio is

$$\frac{67 \times 64}{43 \times 34} = 2.93.$$

• According to this study, the odds of contracting Hodgkin’s disease are increased by about a factor of three by undergoing a tonsillectomy.

• As well as having a point estimate 2.93, it would be useful to attach an approximate standard error to the estimate to indicate its uncertainty.


• We will use simulation (parametric bootstrap) to approximate the distribution of $\hat\Delta$.

– We need to generate random numbers according to a statistical model for the counts in the table of Vianna, Greenwald, and Davies. (Here the rows and columns of the data table are numbered 1 and 2, and $\pi_{11}$, $\pi_{22}$ denote probabilities within a row.)

– The model is that the count in the first row and first column, $N_{11}$, is binomially distributed with $n = 101$ and probability $\pi_{11}$.

– The count in the second row and second column, $N_{22}$, is binomially distributed with $n = 107$ and probability $\pi_{22}$.

– The distribution of the random variable

$$\hat\Delta = \frac{N_{11}N_{22}}{(101 - N_{11})(107 - N_{22})}$$

is thus determined by the two binomial distributions, and we could approximate it arbitrarily well by drawing a large number of samples from them.

– Since the probabilities $\pi_{11}$ and $\pi_{22}$ are unknown, they are estimated from the observed counts by $\hat\pi_{11} = 67/101 = 0.663$ and $\hat\pi_{22} = 64/107 = 0.598$.

– One thousand realizations generated on a computer give a standard deviation of 0.89.
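The parametric bootstrap just described can be sketched in a few lines of R. The seed and the variable names (`delta_star`, `sd_delta`) are assumptions made for this illustration, and the standard deviation obtained will vary a little from run to run around the value quoted above.

```r
set.seed(1971)                 # arbitrary seed, only for reproducibility
B <- 1000                      # number of simulated tables
pi11_hat <- 67 / 101           # estimated probability, Hodgkin's row
pi22_hat <- 64 / 107           # estimated probability, control row

# Simulate the two binomial counts and form the odds ratio for each replicate
N11 <- rbinom(B, 101, pi11_hat)
N22 <- rbinom(B, 107, pi22_hat)
delta_star <- (N11 * N22) / ((101 - N11) * (107 - N22))

sd_delta <- sd(delta_star)     # approximate standard error of the odds ratio estimate
print(sd_delta)
```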

Project 1.2 Do a computer experiment to run the above process in the setting of a retrospective study. Give a 95% confidence interval for $\Delta$ and describe the distribution of the bootstrap replicates of $\hat\Delta$.


Bootstrap Method

• The bootstrap method introduced in Efron (1979) is a very general resampling procedure for estimating the distributions of statistics based on independent observations.

– The bootstrap method has been shown to be successful in many situations and is being accepted as an alternative to the asymptotic methods.

– In some situations it is better than other asymptotic methods, such as the traditional normal approximation and the Edgeworth expansion.

– There are also counterexamples showing that the bootstrap can produce wrong solutions, i.e., it provides some inconsistent estimators.

Consider the problem of estimating the variability of location estimates by the bootstrap method.

• If we view the observations $x_1, x_2, \dots, x_n$ as realizations of independent random variables with common distribution function $F$, it is appropriate to investigate the variability and sampling distribution of a location estimate calculated from a sample of size $n$.

• Denote the location estimate as ˆθ.

– Note that $\hat\theta$ is a function of the random variables $X_1, X_2, \dots, X_n$ and hence has a probability distribution, its sampling distribution, which is determined by $n$ and $F$.

– How do we derive this sampling distribution?

• We are faced with two problems:

1. $F$ is unknown.

2. Even if $F$ is known, $\hat\theta$ may be such a complicated function of $X_1, X_2, \dots, X_n$ that finding its distribution would exceed our analytic abilities.

• We first address the second problem, assuming that $F$ is known.

– How could we find the probability distribution of $\hat\theta$ without going through incredibly complicated analytic calculations?

– The computer comes to our rescue: we can do it by simulation.

– We generate many, many samples, say $B$ in number, of size $n$ from $F$; from each sample we calculate the value of $\hat\theta$.

– The empirical distribution of the resulting values $\hat\theta_1, \hat\theta_2, \dots, \hat\theta_B$ is an approximation to the distribution function of $\hat\theta$, which is good if $B$ is very large.

– If we wish to know the standard deviation of $\hat\theta$, we can find a good approximation to it by calculating the standard deviation of the collection of values $\hat\theta_1, \hat\theta_2, \dots, \hat\theta_B$.

– We can make these approximations arbitrarily accurate by taking $B$ to be arbitrarily large.
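As a sketch of this simulation idea when $F$ is known, the code below takes $F$ to be the standard normal and $\hat\theta$ to be the sample median with $n = 20$. These choices, the seed, and the variable names are assumptions made for illustration only.

```r
set.seed(1)                    # arbitrary seed
n <- 20                        # sample size
B <- 5000                      # number of simulated samples from the known F

# Draw B samples of size n from F = N(0, 1) and compute the median of each
theta_sim <- replicate(B, median(rnorm(n)))

sd_sim <- sd(theta_sim)        # approximates the standard deviation of the sample median
print(sd_sim)
# hist(theta_sim) would display the simulated sampling distribution
```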

Simulation. Let $G$ be a distribution and let $Y_1, \dots, Y_B$ be iid values drawn from $G$.

• By the law of large numbers, $B^{-1}\sum_{j=1}^{B} Y_j$ converges in probability to $E(Y)$.

• We can use $B^{-1}\sum_{j=1}^{B} Y_j$ as an estimate of $E(Y)$.

• In a simulation, we can make $B$ as large as we like, in which case the difference between $B^{-1}\sum_{j=1}^{B} Y_j$ and $E(Y)$ is negligible.

All this would be well and good if we knew F , but we don’t. So what do we do? We will consider two different cases.

• In the first case, $F$ is known up to an unknown parameter $\eta$, i.e. $F(x \mid \eta)$.

– Without knowing $\eta$, the above approximation cannot be used.

– The idea of the parametric bootstrap is to simulate data from $F(x \mid \hat\eta)$, where $\hat\eta$ should be a good estimate of $\eta$.

– It utilizes the structure of $F$.

• In the second case, $F$ is completely unknown.

– The idea of the nonparametric bootstrap is to simulate data from the empirical cdf $F_n$.

– Here $F_n$ is a discrete probability distribution that gives probability $1/n$ to each observed value $x_1, \dots, x_n$.

• A sample of size $n$ from $F_n$ is thus a sample of size $n$ drawn with replacement from the collection $x_1, \dots, x_n$. The standard deviation of $\hat\theta$ is then estimated by

$$s_{\hat\theta} = \sqrt{\frac{1}{B}\sum_{i=1}^{B}(\theta_i^* - \bar\theta^*)^2},$$

where $\theta_1^*, \dots, \theta_B^*$ are produced from $B$ samples of size $n$ from the collection $x_1, \dots, x_n$, and $\bar\theta^*$ is their average.

Now we use a simple example to illustrate this idea.

• Suppose $n = 2$ and we observe $X_{(1)} = c < X_{(2)} = d$.

• $X_1^*, X_2^*$ are independently distributed with $P(X_i^* = c) = P(X_i^* = d) = 1/2$, $i = 1, 2$.

• The pair $(X_1^*, X_2^*)$ therefore takes on the four possible pairs of values $(c, c), (c, d), (d, c), (d, d)$, each with probability 1/4.

• $\hat\theta^* = (X_1^* + X_2^*)/2$ takes on the values $c$, $(c + d)/2$, $d$ with probabilities 1/4, 1/2, 1/4, respectively, so that $\hat\theta^* - (c + d)/2$ takes on the values $(c - d)/2$, $0$, $(d - c)/2$ with probabilities 1/4, 1/2, 1/4, respectively.
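The exact bootstrap distribution in this two-point example can be reproduced by enumerating all $n^n$ equally likely resamples. The values $c = 1$ and $d = 3$ below are hypothetical, chosen only to make the output concrete.

```r
x <- c(1, 3)                   # hypothetical observed values c = 1, d = 3
n <- length(x)

# All n^n equally likely resamples, stored as tuples of indices into x
idx <- expand.grid(rep(list(seq_len(n)), n))

# Bootstrap replicate of the mean for every possible resample
theta_star <- apply(idx, 1, function(i) mean(x[i]))

# Exact bootstrap distribution: values c, (c + d)/2, d with probabilities 1/4, 1/2, 1/4
boot_dist <- table(theta_star) / nrow(idx)
print(boot_dist)
```

The same enumeration works for any small $n$, but the number of rows of `idx` grows as $n^n$, which is why Monte Carlo resampling is used in practice.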

For the above example, we can easily calculate its bootstrap distribution.

• When $n$ is large, we can easily imagine that the above computation becomes too complicated to carry out directly.

• Use simple random sampling to approximate the bootstrap distribution.

• In the bootstrap literature, a variety of alternatives to simple random sampling have been suggested.

Project 1.3 Use the parametric bootstrap and the nonparametric bootstrap to approximate the distribution of the median based on data with sample size 20 from a standard normal distribution.

The following is sample R code.

• n <- 20

• x <- rnorm(n) # create some data

• theta.hat <- median(x)

• B <- 1000; theta.boot <- rep(0, B)

• for (i in 1:B) {

xstar <- sample(x, size = n, replace = TRUE) # draw a bootstrap sample

theta.boot[i] <- median(xstar) # compute the statistic

}

• var.boot <- var(theta.boot)

• se <- sqrt(var.boot); print(se)
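For comparison, a parametric counterpart can be sketched by resampling from a fitted normal distribution instead of from the data. This is one possible reading of the parametric bootstrap for this project, not the author's code: fitting $N(\bar x, s^2)$ is an assumption made here, as are the seed and the variable names.

```r
set.seed(3)                    # arbitrary seed
n <- 20
x <- rnorm(n)                  # create some data
B <- 1000

# Parametric bootstrap: simulate from the fitted normal N(mean(x), sd(x)^2)
mu_hat <- mean(x)
sigma_hat <- sd(x)
theta_par <- replicate(B, median(rnorm(n, mean = mu_hat, sd = sigma_hat)))

# Nonparametric bootstrap: resample the data with replacement
theta_np <- replicate(B, median(sample(x, size = n, replace = TRUE)))

se_par <- sd(theta_par)
se_np <- sd(theta_np)
print(c(parametric = se_par, nonparametric = se_np))
```

The nonparametric replicates can only take values that occur in the data (or midpoints of two data values), so their distribution is much more discrete than the parametric one.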


We now introduce notation to describe the bootstrap method.

• Assume the data $X_1, \dots, X_n$ are independent and identically distributed (iid) samples from a $k$-dimensional population distribution $F$.

• We wish to estimate the distribution

$$H_n(x) = P\{R_n \le x\},$$

where $R_n = R_n(T_n, F)$ is a real-valued functional of $F$ and $T_n = T_n(X_1, \dots, X_n)$ is a statistic of interest.

• Let $X_1^*, \dots, X_n^*$ be a “bootstrap” sample, iid from $F_n$, the empirical distribution based on $X_1, \dots, X_n$, and let $T_n^* = T_n(X_1^*, \dots, X_n^*)$ and $R_n^* = R_n(T_n^*, F_n)$. $F_n$ is constructed by placing at each observation $X_i$ a mass $1/n$.

Thus $F_n$ may be represented as

$$F_n(x) = \frac{1}{n}\sum_{i=1}^{n} I(X_i \le x), \qquad -\infty < x < \infty.$$

• A bootstrap estimator of $H_n$ is

$$\hat H_n(x) = P^*\{R_n^* \le x\},$$

where, for given $X_1, \dots, X_n$, $P^*$ is the conditional probability with respect to the random generator of bootstrap samples.

• Since the bootstrap samples are generated from $F_n$, this method is called the nonparametric bootstrap.

– Note that $\hat H_n(x)$ will depend on $F_n$ and hence itself is a random variable.

– To be specific, $\hat H_n(x)$ will change as the data $\{x_1, \dots, x_n\}$ changes.

– Recall that a bootstrap analysis is run to assess the accuracy of some primary statistical results.

– This produces bootstrap statistics, like standard errors or confidence intervals, which are assessments of error for the primary results.

• As a further remark, the empirical distribution $F_n$ is called the nonparametric maximum likelihood estimate (MLE) of $F$.
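To make the objects above concrete, here is a minimal sketch of the empirical distribution $F_n$ and of one bootstrap sample in R. The data are simulated and the names `Fn`, `Fn_builtin` and `t0` are assumptions made for this illustration; `ecdf` is the built-in R function for the empirical cdf.

```r
set.seed(4)                    # arbitrary seed
x <- rnorm(10)                 # hypothetical data

# Empirical cdf: proportion of observations less than or equal to t
Fn <- function(t) mean(x <= t)
Fn_builtin <- ecdf(x)          # built-in version, for comparison

t0 <- 0.3
print(c(Fn(t0), Fn_builtin(t0)))

# A bootstrap sample is an iid sample of size n from Fn,
# i.e. a draw with replacement from the observed values
xstar <- sample(x, size = length(x), replace = TRUE)
```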

As illustration, we consider the following three examples.


Example 1. Suppose that $X_1, \dots, X_n \sim N(\mu, 1)$ and $R_n = \sqrt{n}(\bar X_n - \mu)$. Consider the estimation of

$$P(a) = P\{R_n > a \mid N(\mu, 1)\}.$$

The nonparametric bootstrap method will estimate $P(a)$ by

$$P_{NB}(a) = P\{\sqrt{n}(\bar X_n^* - \bar X_n) > a \mid F_n\}.$$

• Observe data $x_1, \dots, x_n$ with mean $\bar x_n$.

• Let $Y_1, \dots, Y_n$ denote a bootstrap sample of $n$ observations drawn independently from $F_n$.

• Let $\bar Y_n = n^{-1}\sum_{i=1}^{n} Y_i$.

• $P(a)$ is estimated by

$$P_{NB}(a) = P\{\sqrt{n}(\bar Y_n - \bar x_n) > a \mid F_n\}.$$

• In principle, $P_{NB}(a)$ can be found by considering all $n^n$ possible bootstrap samples.

– If all $X_i$’s are distinct, then the number of different possible resamples equals the number of distinct ways of placing $n$ indistinguishable objects into $n$ numbered boxes, the boxes being allowed to contain any number of objects. It is known that this number equals $C(2n-1, n) \approx (n\pi)^{-1/2}2^{2n-1}$.

– When $n = 10$ (20, respectively), $C(2n-1, n) = 92378$ ($\approx 6.9 \times 10^{10}$, respectively).

– For small values of $n$, it is often feasible to calculate a bootstrap estimate exactly.

– For large samples, say $n \ge 10$, this becomes infeasible even with today’s computer technology.
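The counts quoted above can be checked directly with R's built-in `choose`. The variable names are illustrative; the approximation is the one stated in the text.

```r
n <- c(10, 20)

# Exact number of distinct resamples, C(2n - 1, n)
exact <- choose(2 * n - 1, n)

# Approximation (n * pi)^(-1/2) * 2^(2n - 1)
stirling <- (n * pi)^(-1/2) * 2^(2 * n - 1)

print(data.frame(n = n, exact = exact, approximation = stirling))
```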

• Natural questions to ask are as follows:

– What are computationally efficient ways to bootstrap?

– Can we get bootstrap-like answers without Monte Carlo?

• Address the question of “evaluating” the performance of the bootstrap method.

– For the above particular problem, we need to estimate $P_{NB}(a) - P(a)$ or $\sup_a |P_{NB}(a) - P(a)|$.

– As a remark, $P_{NB}(a)$ is a random variable since $F_n$ is random.


– Efron (1992) proposed to use jackknife to give the error estimates for bootstrap quantities.

• Suppose that additional information on $F$ is available. Then it is reasonable to utilize this information in the bootstrap method.

• In this example, $F$ is known to be normally distributed with unknown mean $\mu$ and variance 1.

– It is natural to use $\bar x_n$ to estimate $\mu$ and then estimate $P(a) = P\{R_n > a \mid N(\mu, 1)\}$ by

$$P_{PB}(a) = P\{\sqrt{n}(\bar Y_n - \bar x_n) > a \mid N(\bar x_n, 1)\}.$$

– Since the bootstrap samples are generated from $N(\bar x_n, 1)$, which utilizes the information from a parametric form of $F$, this method is called the parametric bootstrap.

– In this case, it can be shown that $P_{PB}(a) = P(a)$ for all realizations of $\bar X_n$.

– If $F$ is known to be normally distributed with unknown mean $\mu$ and unknown variance $\sigma^2$, then $P_{PB}(a)$ is no longer equal to $P(a)$.
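The following sketch approximates $P_{NB}(a)$ by Monte Carlo and compares it with $P(a) = 1 - \Phi(a)$, which is also the value of $P_{PB}(a)$ when the variance is known to be 1. The data-generating mean, the sample size, the value of $a$, the seed, and the variable names are all assumptions chosen for illustration.

```r
set.seed(5)                             # arbitrary seed
n <- 50; a <- 1; B <- 2000
x <- rnorm(n, mean = 2, sd = 1)         # hypothetical data with mu = 2
xbar <- mean(x)

# Bootstrap replicates of R_n: sqrt(n) * (Ybar_n - xbar_n), with Y drawn from F_n
r_star <- replicate(B, sqrt(n) * (mean(sample(x, n, replace = TRUE)) - xbar))

p_nb <- mean(r_star > a)                # Monte Carlo approximation to P_NB(a)
p_true <- 1 - pnorm(a)                  # P(a), since R_n ~ N(0, 1) exactly
print(c(P_NB = p_nb, P = p_true))
```

The two values differ both because $F_n$ is random and because of Monte Carlo error in approximating $P_{NB}(a)$.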

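Project 1.4. (a) Show that, when the variance is also unknown and is estimated by $s_n^2$, $P_{PB}(a) = 1 - \Phi(a/s_n)$, where $s_n^2 = (n-1)^{-1}\sum_{i=1}^{n}(x_i - \bar x_n)^2$.

(b) Prove that $P_{PB}(a)$ is a consistent estimate of $P(a)$ for fixed $a$.

(c) Prove that $\sup_a |P_{PB}(a) - P(a)| \xrightarrow{P} 0$.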


For the question of finding $P_{NB}(a)$, we can in principle write down the characteristic function and then apply the inversion formula. However, it is a nontrivial job. Therefore, Efron (1979) suggested approximating $P_{NB}(a)$ by Monte Carlo resampling. That is, resamples of the same size as the original sample are drawn repeatedly from it, the value of the statistic is computed for each resample, and the bootstrap statistic is approximated by taking an average of an appropriate function of these values.

Now we state Lévy’s inversion formula, which is taken from Chapter 6.2 of Chung (1974).

Theorem If $x_1 < x_2$ and $x_1$ and $x_2$ are points of continuity of $F$, then we have

$$F(x_2) - F(x_1) = \lim_{T \to \infty} \frac{1}{2\pi}\int_{-T}^{T} \frac{e^{-itx_1} - e^{-itx_2}}{it}\, f(t)\,dt,$$

where $f(t)$ is the characteristic function of $F$.


Example 2. Estimating the probability of success

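• Consider a probability distribution $F$ putting all of its mass at zero or one.

• Let $\theta(F) = P(X = 1) = p$.

• Consider $R(X, F) = \bar X - \theta(F) = \hat p - p$.

• Having observed $X = x$, the bootstrap sample is

$$X_1^*, \dots, X_n^* \sim \mathrm{Bin}(1, \theta(F_n)) = \mathrm{Bin}(1, \bar x_n).$$

Note that

$$R(X^*, F_n) = \bar X_n^* - \bar x_n, \qquad E^*(\bar X_n^* - \bar x_n) = 0, \qquad \mathrm{Var}^*(\bar X_n^* - \bar x_n) = \frac{\bar x_n(1 - \bar x_n)}{n}.$$

Recall that $n\bar X_n^* \sim \mathrm{Bin}(n, \bar x_n)$ and $n\bar X_n \sim \mathrm{Bin}(n, p)$.

• It is known that if $\min\{n\bar x_n, n(1 - \bar x_n)\} \ge 5$, then approximately

$$\frac{n\bar X_n^* - n\bar x_n}{\sqrt{n\bar x_n(1 - \bar x_n)}} = \frac{\sqrt{n}(\bar X_n^* - \bar x_n)}{\sqrt{\bar x_n(1 - \bar x_n)}} \sim N(0, 1);$$

and if $\min\{np, n(1 - p)\} \ge 5$, then approximately

$$\frac{n\bar X_n - np}{\sqrt{np(1 - p)}} = \frac{\sqrt{n}(\bar X_n - p)}{\sqrt{p(1 - p)}} \sim N(0, 1).$$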


• Based on the above approximation results, we conclude that the bootstrap method works if $\min\{n\bar x_n, n(1 - \bar x_n)\} \ge 5$.

• The question that remains to be studied is whether $P\{\min(n\bar X_n, n(1 - \bar X_n)) \ge 5\} \to 1$.


Example 3. Estimating the median

• Suppose we are interested in finding the distribution of $n^{1/2}\{F_n^{-1}(1/2) - F^{-1}(1/2)\}$, where $F_n^{-1}(1/2)$ and $F^{-1}(1/2)$ are the sample and population medians respectively.

• Set $\theta(F) = F^{-1}(1/2)$.

• Find a bootstrap approximation of the above distribution.

• Consider $n = 2m - 1$. Then the sample median is $F_n^{-1}(1/2) = X_{(m)}$, where $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$.

• Let $N_i^*$ denote the number of times $x_i$ is selected in the bootstrap sampling procedure.

Set $N^* = (N_1^*, \dots, N_n^*)$.

It follows easily that $N^*$ has a multinomial distribution with $n$ trials and selection probabilities $(n^{-1}, \dots, n^{-1})$.

• Denote the order statistics of $x_1, \dots, x_n$ by $x_{(1)} \le \cdots \le x_{(n)}$.

• Set $N_{[i]}^*$ to be the number of times $x_{(i)}$ is chosen. Then for $1 \le \ell < n$, we have

$$P^*(X_{(m)}^* > x_{(\ell)}) = P^*\{N_{[1]}^* + \cdots + N_{[\ell]}^* \le m - 1\} = P\left\{\mathrm{Bin}\left(n, \frac{\ell}{n}\right) \le m - 1\right\} = \sum_{j=0}^{m-1} C(n, j)\left(\frac{\ell}{n}\right)^{j}\left(1 - \frac{\ell}{n}\right)^{n-j}.$$

Or, writing $T^* = X_{(m)}^* - x_{(m)}$,

$$P^*(T^* = x_{(\ell)} - x_{(m)}) = P\left\{\mathrm{Bin}\left(n, \frac{\ell - 1}{n}\right) \le m - 1\right\} - P\left\{\mathrm{Bin}\left(n, \frac{\ell}{n}\right) \le m - 1\right\}.$$

• When $n = 13$, we have

| $\ell$ | 2 or 12 | 3 or 11 | 4 or 10 | 5 or 9 | 6 or 8 | 7 |
|---|---|---|---|---|---|---|
| probability | 0.0015 | 0.0142 | 0.0550 | 0.1242 | 0.1936 | 0.2230 |

Quite often we use the mean square error to measure the performance of an estimator, $t(X)$, of $\theta(F)$, that is, $E_F T^2 = E_F(t(X) - \theta(F))^2$. We can use the bootstrap to estimate $E_F T^2$. The bootstrap estimate of $E_F T^2$ is

$$E^*(T^*)^2 = \sum_{\ell=1}^{13} [x_{(\ell)} - x_{(7)}]^2\, P^*\{T^* = x_{(\ell)} - x_{(7)}\}.$$

It is known that $E_F T^2$ behaves like $[4nf^2(\theta)]^{-1}$ as $n$ tends to infinity when $F$ has a bounded continuous density $f$. A natural question to ask is whether $E^*(T^*)^2$ is close to $E_F T^2$.
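The bootstrap probabilities for the median when $n = 13$ can be evaluated directly from the binomial formula above with R's `pbinom`. The variable names are illustrative; each ordered value $x_{(\ell)}$ gets the probability shown, and by symmetry $\ell$ and $14 - \ell$ receive the same probability.

```r
n <- 13
m <- (n + 1) / 2               # index of the sample median, here 7
l <- 1:n

# P*(T* = x_(l) - x_(m)) = P{Bin(n, (l-1)/n) <= m-1} - P{Bin(n, l/n) <= m-1}
prob <- pbinom(m - 1, n, (l - 1) / n) - pbinom(m - 1, n, l / n)
names(prob) <- l
print(round(prob, 4))
```

The probabilities sum to one because the sum telescopes, which gives a quick consistency check on any table of these values.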


Validity of the Bootstrap Method

We now give a brief discussion of the validity of the bootstrap method. First, we state central limit theorems and a bound on their approximation error.

Perhaps the most widely known version of the CLT is the following.

Theorem (Lindeberg-Lévy Central Limit Theorem) Let $\{X_i\}$ be iid with mean $\mu$ and finite variance $\sigma^2$. Then

$$\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^{n} X_i - \mu\right) \xrightarrow{d} N(0, \sigma^2).$$

The above theorem can be generalized to independent random variables which are not necessarily identically distributed.

Theorem (Lindeberg-Feller CLT) Let $\{X_i\}$ be independent with means $\{\mu_i\}$, finite variances $\{\sigma_i^2\}$, and distribution functions $\{F_i\}$.

• Suppose that $B_n^2 = \sum_{i=1}^{n} \sigma_i^2$ satisfies

$$\frac{\sigma_n^2}{B_n^2} \to 0, \qquad B_n \to \infty \quad \text{as } n \to \infty.$$

• Then $n^{-1}\sum_{i=1}^{n} X_i$ is asymptotically normal, $AN(n^{-1}\sum_{i=1}^{n} \mu_i,\ n^{-2}B_n^2)$, if and only if the following Lindeberg condition is satisfied:

$$B_n^{-2}\sum_{i=1}^{n}\int_{|t - \mu_i| > \epsilon B_n} (t - \mu_i)^2\, dF_i(t) \to 0, \quad n \to \infty, \quad \text{each } \epsilon > 0.$$

In the theorems just described, asymptotic normality was asserted for a sequence of sums $\sum_{1}^{n} X_i$ generated by a single sequence $X_1, X_2, \dots$ of random variables. For the validity of the bootstrap, we may consider a double array of random variables

$$\begin{array}{l} X_{11}, X_{12}, \cdots, X_{1K_1}; \\ X_{21}, X_{22}, \cdots, X_{2K_2}; \\ \quad\vdots \\ X_{n1}, X_{n2}, \cdots, X_{nK_n}; \\ \quad\vdots \end{array}$$

For each $n \ge 1$, there are $K_n$ random variables $\{X_{nj}, 1 \le j \le K_n\}$. It is assumed that $K_n \to \infty$. The case $K_n = n$ is called a “triangular” array.

Denote by $F_{nj}$ the distribution function of $X_{nj}$. Also, put

$$\mu_{nj} = EX_{nj}, \qquad A_n = E\sum_{j=1}^{K_n} X_{nj} = \sum_{j=1}^{K_n} \mu_{nj}, \qquad B_n^2 = \mathrm{Var}\left(\sum_{j=1}^{K_n} X_{nj}\right).$$

We then have the following theorem.

Theorem (Lindeberg-Feller) Let $\{X_{nj} : 1 \le j \le K_n;\ n = 1, 2, \dots\}$ be a double array with independent random variables within rows. Then the “uniform asymptotic negligibility” condition

$$\max_{1 \le j \le K_n} P(|X_{nj} - \mu_{nj}| > \tau B_n) \to 0, \quad n \to \infty, \quad \text{each } \tau > 0,$$

and the asymptotic normality condition that $\sum_{j=1}^{K_n} X_{nj}$ is $AN(A_n, B_n^2)$ together hold if and only if the Lindeberg condition

$$B_n^{-2}\sum_{j=1}^{K_n}\int_{|t - \mu_{nj}| > \epsilon B_n} (t - \mu_{nj})^2\, dF_{nj}(t) \to 0, \quad n \to \infty, \quad \text{each } \epsilon > 0,$$

is satisfied. As a note, independence is assumed only within rows, which themselves may be arbitrarily dependent.

It is of both theoretical and practical interest to characterize the error of approximation in the CLT.

For the i.i.d. case, an exact bound on the error of approximation is provided by the following theorem due to Berry (1941) and Esseen (1945).


Theorem If $X_1, \dots, X_n$ are i.i.d. with distribution $F$ and if $S_n = X_1 + \cdots + X_n$, then there exists a constant $c$ (independent of $F$) such that

$$\sup_x \left| P\left(\frac{S_n - ES_n}{\sqrt{\mathrm{Var}(S_n)}} \le x\right) - \Phi(x) \right| \le \frac{c}{\sqrt{n}}\, \frac{E|X_1 - EX_1|^3}{[\mathrm{Var}(X_1)]^{3/2}}$$

for all $F$ with finite third moment.

• Note that $c$ in the above theorem is a universal constant. Various authors have sought to find the best constant $c$.

• Originally, $c$ was set to be 33/4, but it has been sharpened to 0.7975.

• For $x$ sufficiently large, while $n$ remains fixed, the quantities

$$P\left[(S_n - ES_n)/\sqrt{\mathrm{Var}(S_n)} \le x\right]$$

and $\Phi(x)$ each become so close to 1 that the bound given above is too crude.

• The problem in this case may be characterized as one of approximation of large deviation probabilities, with the object of attention becoming the relative error in approximation of

$$1 - P\left[(S_n - ES_n)/\sqrt{\mathrm{Var}(S_n)} \le x\right]$$

by $1 - \Phi(x)$ when $x \to \infty$.
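As a numerical illustration of the Berry-Esseen bound, the sketch below takes $X_i$ to be Bernoulli($p$), so that $S_n$ is binomial and the supremum can be computed exactly at the jump points of its distribution function. The choice $n = 20$, $p = 0.5$ and the variable names are assumptions; the constant 0.7975 is the one quoted above.

```r
n <- 20; p <- 0.5
sigma <- sqrt(p * (1 - p))                   # standard deviation of one Bernoulli(p) variable
rho <- p * (1 - p) * ((1 - p)^2 + p^2)       # E|X1 - p|^3 for a Bernoulli(p) variable

k <- 0:n
z <- (k - n * p) / (sigma * sqrt(n))         # standardized jump points of S_n
cdf_right <- pbinom(k, n, p)                 # P(S_n <= k)
cdf_left  <- pbinom(k - 1, n, p)             # P(S_n <  k)

# The supremum over x is attained at a jump point, from the left or from the right
sup_dist <- max(abs(cdf_right - pnorm(z)), abs(cdf_left - pnorm(z)))
bound <- 0.7975 * rho / (sigma^3 * sqrt(n))
print(c(sup_distance = sup_dist, berry_esseen_bound = bound))
```

In this example the actual distance is well below the bound, which is typical: the bound is uniform over all $F$ with finite third moment.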


Inconsistent Bootstrap Estimator

Bickel and Freedman (1981) and Loh (1984) showed that the bootstrap estimators of the distributions of the extreme-order statistics are inconsistent.

• Let $X_{(n)}$ be the maximum of i.i.d. random variables $X_1, \dots, X_n$ from $F$ with $F(\theta) = 1$ for some $\theta$, and let $X_{(n)}^*$ be the maximum of $X_1^*, \dots, X_n^*$, which are i.i.d. from the empirical distribution $F_n$.

• Although $X_{(n)} \to \theta$, it never equals $\theta$. But $P^*\{X_{(n)}^* = X_{(n)}\} = 1 - (1 - n^{-1})^n \to 1 - e^{-1}$, which leads to the inconsistency of the bootstrap estimator.

• The reason for the inconsistency of the bootstrap is that the bootstrap samples are drawn from $F_n$, which is not exactly $F$. Therefore, the bootstrap may fail due to the lack of “continuity.”
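The point mass at the sample maximum is easy to see numerically. The sketch below uses Uniform[0, 1] data (so $\theta = 1$); the sample size, number of resamples, seed, and variable names are assumptions made for illustration.

```r
set.seed(6)                                  # arbitrary seed
n <- 50; B <- 5000
x <- runif(n)                                # data from Uniform[0, 1], so theta = 1

# Fraction of bootstrap samples whose maximum equals the observed maximum
hits <- replicate(B, max(sample(x, n, replace = TRUE)) == max(x))
p_mc <- mean(hits)

p_exact <- 1 - (1 - 1 / n)^n                 # exact bootstrap probability
print(c(monte_carlo = p_mc, exact = p_exact, limit = 1 - exp(-1)))
```

Under $F$ itself, the maximum of a new sample equals any fixed value with probability zero, so the bootstrap distribution of the maximum has a large atom that the true distribution lacks.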

Consider the following problem.

• Let $X_1, \dots, X_n$ be independent, with a common $N(\mu, \sigma^2)$ distribution.

• Let $s^2$ be the sample variance.

• Consider the pivot $s^2/\sigma^2$.

This is distributed, of course, as $\chi^2_{n-1}/(n - 1)$.

• Consider the bootstrap approximation, namely the distribution of $s^{*2}/s^2$, where $s^{*2}$ is the variance of the resampled data.

• For $n = 20$, the bootstrap approximation is not good.

• One source of this problem is in the tails of the normal: about 30% of the variance is contributed by 5% of the distribution and will be missed by typical samples of size 20.

• In other words, $s^2$ is mean unbiased for $\sigma^2$, but quite skewed for moderate $n$: in most samples, $s^2$ is somewhat too small, counterbalanced by the few samples where $s^2$ is huge. The variance of $s^{*2}/s^2$ is largely controlled by the sample fourth moment, which is even more skewed.

• For large $n$, these problems go away: after all, $s^2$ and the bootstrap are consistent.


Project 1.5 Use simulation to illustrate the points made in the above two questions. For the problem of estimating the end point, you can assume that X is a uniform random variable over [0, 1] (i.e., θ = 1).


Bias Reduction via the Bootstrap Principle

• The bootstrap can also be used to estimate bias and do bias reduction.

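• Consider $\theta_0 = \theta(F_0) = \mu^3$, where $\mu = \int x\,dF_0(x)$. Set $\hat\theta = \theta(F_n) = \bar X^3$.

• Elementary calculations show that

$$E\{\theta(F_n) \mid F_0\} = E\left\{\mu + n^{-1}\sum_{i=1}^{n}(X_i - \mu)\right\}^3 = \mu^3 + n^{-1}3\mu\sigma^2 + n^{-2}\gamma,$$

where $\gamma = E(X_1 - \mu)^3$ denotes population skewness.

• Using the nonparametric bootstrap, we obtain the following:

$$E\{\theta(F_n^*) \mid F_n\} = \bar X^3 + n^{-1}3\bar X\hat\sigma^2 + n^{-2}\hat\gamma,$$

where $\hat\sigma^2 = n^{-1}\sum(X_i - \bar X)^2$ and $\hat\gamma = n^{-1}\sum(X_i - \bar X)^3$ denote sample variance and skewness respectively.

• Using the bootstrap principle, $E\{\theta(F_n^*) \mid F_n\} - \theta(F_n)$ is used to estimate $\theta(F_n) - \theta(F_0)$.

• Note that $\theta_0 = \theta(F_n) - (\theta(F_n) - \theta_0)$. Or, $\theta_0$ can be estimated by $\theta(F_n) - [E\{\theta(F_n^*) \mid F_n\} - \theta(F_n)]$, that is, $2\theta(F_n) - E\{\theta(F_n^*) \mid F_n\}$.

• The bootstrap bias-reduced estimate is $2\bar X^3 - (\bar X^3 + n^{-1}3\bar X\hat\sigma^2 + n^{-2}\hat\gamma)$. Or,

$$\hat\theta_{NB} = \bar X^3 - n^{-1}3\bar X\hat\sigma^2 - n^{-2}\hat\gamma.$$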
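The closed form for the bootstrap expectation can be checked on a tiny data set by enumerating every possible resample. The data vector below is hypothetical and the variable names are assumptions; the sketch compares the formula with the exact bootstrap expectation and then forms the bias-reduced estimate.

```r
x <- c(1, 2, 3)                              # hypothetical data
n <- length(x)
xbar <- mean(x)
sigma2_hat <- mean((x - xbar)^2)             # sample variance with divisor n
gamma_hat  <- mean((x - xbar)^3)             # sample third central moment

# Closed form for the bootstrap expectation of the cubed resample mean
e_boot_formula <- xbar^3 + 3 * xbar * sigma2_hat / n + gamma_hat / n^2

# Exact bootstrap expectation by enumerating all n^n equally likely resamples
idx <- expand.grid(rep(list(seq_len(n)), n))
e_boot_exact <- mean(apply(idx, 1, function(i) mean(x[i])^3))

# Bias-reduced estimate: 2 * theta(F_n) - E{theta(F_n*) | F_n}
theta_nb <- 2 * xbar^3 - e_boot_formula
print(c(formula = e_boot_formula, exact = e_boot_exact, bias_reduced = theta_nb))
```

For larger $n$ the enumeration is infeasible, and the closed form (or Monte Carlo resampling) would be used instead.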
