Methods for Statistical Prediction

Financial Time Series I

Topic 1: Review on Hypothesis Testing

Hung Chen

Department of Mathematics, National Taiwan University

9/26/2002

OUTLINE

1. Fundamental Concepts

2. Neyman-Pearson Paradigm

3. Examples

4. Optimal Test

5. Observational Studies

6. Likelihood Ratio Test

7. One-sample and Two-sample Tests

Motivating Example on Hypothesis Testing

ESP experiment: guess the color of cards drawn with replacement from a deck of 52.

• Experiment: Generate data to test the hypotheses.

• T: number of correct guesses in 10 trials

• H₀ : T ∼ Bin(10, 0.5) versus H₁ : T ∼ Bin(10, p) with p > 1/2

• Consider the test statistic T and the rejection region R = {8, 9, 10}.

• Compute the probability of committing a type I error:

α = P(T ∈ R | p = 0.5) = P(T > 7) = 0.0439 + 0.0098 + 0.0010 = 0.0547.

• When the rejection region is R = {7, 8, 9, 10},

α = P(T > 6) = 0.1172 + P(T > 7) = 0.1719.

• Calculation of power when R = {8, 9, 10}: we compute the power under various values of p.

p = 0.6: P(T > 7 | p = 0.6) = 0.1673

p = 0.7: P(T > 7 | p = 0.7) = 0.3828
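The error probabilities and powers quoted for the ESP experiment can be reproduced with a few lines of Python. This is only a sketch: the helper names `binom_pmf` and `rejection_prob` are illustrative and do not come from the slides.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(T = k) for T ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def rejection_prob(region, n, p):
    """Probability that T lands in the rejection region when the success probability is p."""
    return sum(binom_pmf(k, n, p) for k in region)

n = 10
R = [8, 9, 10]

alpha = rejection_prob(R, n, 0.5)                   # type I error for R = {8, 9, 10}
alpha_wide = rejection_prob([7, 8, 9, 10], n, 0.5)  # type I error for the larger region
power_06 = rejection_prob(R, n, 0.6)                # power at p = 0.6
power_07 = rejection_prob(R, n, 0.7)                # power at p = 0.7

print(round(alpha, 4), round(alpha_wide, 4), round(power_06, 4), round(power_07, 4))
```

Enlarging the rejection region raises α, and the same function evaluated at p > 1/2 shows the power rising with it.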


• Idea: A statistical test of a hypothesis is a rule which assigns each possible observation to one of two mutually exclusive categories: consistent with the hypothesis under consideration, or not consistent with it.

• Will we make mistakes?

Two Types of Error

                        Reality
Test says               H₀ true          H₀ false
reject H₀               Type I Error     Good
cannot reject H₀        Good             Type II Error

• Usually, P(Type I Error) is denoted by α and P(Type II Error) is denoted by β.

• In the ESP experiment, α increases when R moves from {8, 9, 10} to {7, 8, 9, 10}, but β decreases.

• Statistical hypothesis testing is a formal means of choosing between two distributions on the basis of a particular statistic or random variable generated from one of them.

– How do we accommodate the uncertainty in the observed data?

– How do we evaluate a method?

• Neyman-Pearson Paradigm

– Null hypothesis H₀

– Alternative hypothesis H_A or H₁

– The objective is to select one of the two based on the available data.

– A crucial feature of hypothesis testing is that the two competing hypotheses are not treated in the same way: one is given the benefit of the doubt, the other has the burden of proof.

The one that gets the benefit of the doubt is called the null hypothesis. The other is called the alternative hypothesis.

– By definition, the default is H₀. When we carry out a test, we are asking whether the available data is significant evidence in favor of H₁. We are not testing whether H₁ is true; rather, we are testing whether the evidence supporting H₁ is statistically significant.

– The conclusion of a hypothesis test is that we either reject the null hypothesis (and accept the alternative) or we fail to reject the null hypothesis.

Failing to reject H₀ does not quite mean that the evidence supports H₀; rather, it means that the evidence does not strongly favor H₁.

Again, H₀ gets the benefit of the doubt.

– Examples:

∗ Suppose we want to determine if stocks picked by experts generally perform better than stocks picked by darts. We might conduct a hypothesis test to determine if the available data should persuade us that the experts do better. In this case, we would have

H₀: experts not better than darts

H₁: experts better than darts

∗ Suppose we are skeptical about the effectiveness of a new product in promoting dense hair growth. We might conduct a test to determine if the data shows that the new product stimulates hair growth. This suggests

H₀: New product does not promote hair growth

H₁: New product does promote hair growth

Choosing the hypotheses this way puts the onus on the new product; unless there is strong evidence in favor of H₁, we stick with H₀.

∗ Suppose we are considering changing the packaging of a product in the hope of boosting sales. Switching to a new package is costly, so we will only undertake the switch if there is significant evidence that sales will increase.

We might test-market the change in one or two cities and then evaluate the results using a hypothesis test. Since the burden of proof is on the new package, we should set the hypotheses as follows:

H₀: New package does not increase sales

H₁: New package does increase sales

– There are two types of hypotheses: simple ones, where the hypothesis completely specifies the distribution, and composite ones, where it does not.

– Simple hypotheses test one value of the parameter against another, the form of the distribution remaining fixed.

– Here is an example in which both hypotheses are composite:

H₀: X_i is Poisson with unknown parameter

H₁: X_i is not Poisson

Steps for setting up test:

1. Define the null hypothesis H₀ (devil's advocate).

Put the hypothesis that you don't believe as H₀.

2. Define the alternative H_A (one-sided or two-sided).

3. Find the test statistic.

Use heuristic or systematic methods.

4. Decide on the type I error α that you are willing to take.

5. Compute the p-value: the probability, given the null hypothesis, of observing data at least as extreme as what was actually observed.

6. Compare the p-value to α; if it is smaller, reject H₀.
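The six steps can be traced on the ESP experiment with a short sketch. The observed count of 8 correct guesses and the level 0.05 are assumptions chosen for illustration; the function name `upper_tail_p_value` is hypothetical.

```python
from math import comb

def upper_tail_p_value(t_obs, n, p0):
    """One-sided p-value P(T >= t_obs) for T ~ Bin(n, p0) under H0."""
    return sum(comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(t_obs, n + 1))

# Steps 1-3: H0: p = 0.5, H1: p > 0.5, test statistic T = number of correct guesses.
n, p0 = 10, 0.5
t_obs = 8            # assumed observation, for illustration only

# Step 4: choose the type I error we are willing to accept (assumed value).
alpha_level = 0.05

# Steps 5-6: compute the p-value and compare it with alpha.
p_value = upper_tail_p_value(t_obs, n, p0)
reject = p_value < alpha_level
print(p_value, reject)
```

With these assumed inputs the p-value equals the α of the region {8, 9, 10}, so the decision depends on whether the chosen level is above or below that value.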


Example 1: Sex bias in graduate admission

• The graduate division of the University of California at Berkeley attempted to study the possibility that sex bias operated in graduate admissions in 1973 by examining admission data.

• In this case, what does the hypothesis of no sex bias correspond to? It is natural to translate this into

P[Admit | Male] = P[Admit | Female].

• Data

– There were 8,442 men who applied for admission to graduate school that quarter, and 4,321 women.

– About 44% of the men and 35% of the women were admitted.

– How do we perform this two-sample test?

• What is the conclusion?

– The two-sample test statistic is

$$z = \frac{0.44 - 0.35}{\sqrt{\dfrac{0.44 \times 0.56}{8442} + \dfrac{0.35 \times 0.65}{4321}}} = 9.948715.$$

– The p-value is approximately 1.28 × 10⁻²³ when H₁ is the one-sided alternative P[Admit | Male] > P[Admit | Female].

– The p-value is approximately 2.55 × 10⁻²³ when H₁ is the two-sided alternative P[Admit | Male] ≠ P[Admit | Female].
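A minimal sketch of the two-sample calculation, using the unpooled standard error that appears in the statistic for this example. The normal tail is evaluated with `math.erfc`, which stays accurate for very small probabilities; the function names are illustrative.

```python
from math import sqrt, erfc

def two_proportion_z(p1, n1, p2, n2):
    """z statistic for the difference of two sample proportions (unpooled standard error)."""
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se

def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * erfc(z / sqrt(2))

z = two_proportion_z(0.44, 8442, 0.35, 4321)
p_one_sided = normal_upper_tail(z)
p_two_sided = 2 * normal_upper_tail(abs(z))
print(z, p_one_sided, p_two_sided)
```

Either way the p-value is far below any conventional α, so the pooled campus-wide data reject equal admission probabilities.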


Example 2: Effectiveness of Therapy

• Suppose that a new drug is being considered with a view to curing a certain disease.

• How do we evaluate its effectiveness?

• The drug is given to n patients suffering from the disease and the number x of cures is noted.

• We wish to test the hypothesis that there is at least a 50-50 chance of a cure by this drug based on the following data: x cures among n patients.

• Put the problem in the following framework of a statistical test:

– The sample space X is simple: it is the set {0, 1, 2, . . . , n} (i.e., X can take on the values 0, 1, 2, . . . , n).

– The family {P_θ} of possible distributions on X is (assuming independent patients) the family of binomial distributions, parametrized by the real parameter θ taking values in [0, 1].

– θ is interpreted as the probability of cure.

– X ∼ Bin(n, θ)

– The stated hypothesis defines the subset Θ₀ = [1/2, 1] of the parameter space:

H₀ : θ ≥ 1/2

– In this situation, only a small class of tests seems worth considering on a purely intuitive basis.

We will only consider those for which the set of x taken to be consistent with Θ₀ has the form {x : x ≥ k}.

– Question: Does it make sense to consider that x cures out of n patients is consistent with Θ₀, while x + 1 is not?

– What is a reasonable test?

A recipe: Optimal tests for simple hypotheses

• Null hypothesis H₀ : f = f₀

• Alternative hypothesis H_A : f = f₁

• We want to find a rejection region R such that the errors of both types are as small as possible, where

$$\alpha = \int_R f_0(x)\,dx \quad\text{and}\quad 1 - \beta = \int_R f_1(x)\,dx.$$

• Neyman-Pearson Lemma:

For testing f₀(x) against f₁(x), a critical region of the form

$$\Lambda(x) = \frac{f_1(x)}{f_0(x)} \ge K,$$

where K is a constant, has the greatest power (smallest β) in the class of tests with the same α.

– Let R denote the rejection region determined by Λ(x) and S the rejection region of any other testing procedure.

– $\alpha_R = \int_R f_0(x)\,dx$, $\alpha_S = \int_S f_0(x)\,dx$, with α_R, α_S ≤ α.

– The difference in power is

$$(1-\beta_R) - (1-\beta_S) = \left(\int_R - \int_S\right) f_1\,dx = \int_{R\cap S^c} f_1\,dx - \int_{S\cap R^c} f_1\,dx.$$

Since f₁ ≥ K f₀ in R and −f₁ ≥ −K f₀ in R^c, we have

$$(1-\beta_R) - (1-\beta_S) \ge K\left[\int_{R\cap S^c} f_0\,dx - \int_{S\cap R^c} f_0\,dx\right] = K\left[\int_R f_0\,dx - \int_S f_0\,dx\right] = K(\alpha_R - \alpha_S).$$

– When α_R = α_S = α, the right-hand side is 0, so β_R ≤ β_S.

Why is the Neyman-Pearson framework accepted?

• A test whose error probabilities are as small as possible is clearly desirable.

However, we cannot choose the critical region in such a way that α(θ) and β(θ) are simultaneously uniformly minimized.

By taking the critical region as the empty set, we can make α(θ) = 0, and by taking the critical region as the sample space, we can make β(θ) = 0. Hence a test which uniformly minimizes both error-probability functions would need to have zero error probabilities, and usually no such test exists.

• The modification suggested by Neyman and Pearson is based on the fact that in most circumstances our attitudes to the hypotheses Θ₀ and Θ − Θ₀ are different: we are often asking if there is sufficient evidence to reject the hypothesis Θ₀.

In terms of the two possible errors, this may be translated into the statement that often the Type I error is more serious than the Type II error.

• We should control the probability of the Type I error at some pre-assigned small value α, and then, subject to this control, look for a test which uniformly minimizes the function describing the probabilities of Type II error.

• Is this asymmetry on (H₀, H₁) reasonable?

Can you come up with an example from a business application?

– Suppose we use this testing technique in searching for regions of the genome that resemble other regions that are known to have significant biological activity.

– One way of doing this is to align the known and unknown regions and compute statistics based on the number of matches.

– To determine significant values of these statistics, a (more complicated) version of the following is done.

Thresholds (critical values) are set so that if the matches occur at random and the probability of a match is 1/2, then the probability of exceeding the threshold (type I error) is smaller than α.

– No one really believes that H₀ is true and possible types of alternatives are vaguely known at best, but computation under H₀ is easy.

Now we use the following example to motivate the Neyman-Pearson lemma. We start from the simplest possible situation, where Θ has only two elements, θ₀ and θ₁ say, and where Θ₀ = {θ₀}, Θ − Θ₀ = {θ₁}. Note that a hypothesis which specifies a set in the parameter space containing only one element is called a simple hypothesis. Thus we are now considering testing a simple null hypothesis against a simple alternative. In this case, the power function of any test reduces to a single number, and we examine the question of the existence of a most powerful test of given significance level α.

Revisit the example of x cures out of n patients with n = 5. We wish to test

H₀ : p = 0.5 versus H₁ : p = 0.3.

• The probability distribution of X is

X = x            0      1      2      3      4      5
f₀(x), p = 0.5   0.031  0.156  0.313  0.313  0.156  0.031
f₁(x), p = 0.3   0.168  0.360  0.309  0.132  0.028  0.003
f₁(x)/f₀(x)      5.419  2.308  0.987  0.422  0.179  0.097

• Think of the meaning of the likelihood ratio f₁(x)/f₀(x).

• We consider all possible nonrandomized tests of significance level 0.2.

critical region   α       1 − β
{0}               0.031   0.168
{1}               0.156   0.360
{4}               0.156   0.028
{5}               0.031   0.003
{0, 5}            0.062   0.171
{0, 1}            0.187   0.528
{0, 4}            0.187   0.196
{1, 5}            0.187   0.363
{4, 5}            0.187   0.031

• The best test is the one with critical region {0, 1}. Can you give a reason for that? Or, can you find a rule?

Try to think in terms of the likelihood ratio by noting

$$f_1(x) = \frac{f_1(x)}{f_0(x)} \cdot f_0(x).$$

As a hint, compare the two tests {0, 1} and {0, 4}, which have the same α. Observe that their powers are

1 − β_{0,1} = P_{p=0.3}(X = 0) + P_{p=0.3}(X = 1)

1 − β_{0,4} = P_{p=0.3}(X = 0) + P_{p=0.3}(X = 4).

Compare P_{p=0.3}(X = 4) to P_{p=0.3}(X = 1).
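The search over critical regions can be automated. The sketch below enumerates every subset of {0, ..., 5} whose size under p = 0.5 is at most 0.2, picks the most powerful one under p = 0.3, and compares it with the ordering of sample points by likelihood ratio. Variable names are illustrative; exact binomial probabilities are used rather than the rounded table entries.

```python
from itertools import combinations
from math import comb

n = 5
f0 = [comb(n, x) * 0.5**n for x in range(n + 1)]               # pmf under H0: p = 0.5
f1 = [comb(n, x) * 0.3**x * 0.7**(n - x) for x in range(n + 1)]  # pmf under H1: p = 0.3

def size(region):
    """Type I error of a nonrandomized test with this critical region."""
    return sum(f0[x] for x in region)

def power(region):
    """Power of the same test under H1."""
    return sum(f1[x] for x in region)

level = 0.2
candidates = [r for k in range(n + 2)
              for r in combinations(range(n + 1), k)
              if size(r) <= level]

best = max(candidates, key=power)

# Order the sample points by decreasing likelihood ratio f1/f0.
by_ratio = sorted(range(n + 1), key=lambda x: f1[x] / f0[x], reverse=True)

print(best, round(size(best), 4), round(power(best), 4), by_ratio)
```

The most powerful region consists of the points with the largest likelihood ratios, which is the rule the hint points toward.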

• Conclusion: The critical region determined by {x : f₁(x)/f₀(x) ≥ c} is quite intuitive.

Suppose that we set out to order points in the sample space according to the amount of evidence they provide for P₁ rather than P₀. We should naturally order them according to the value of the ratio f₁(x)/f₀(x); any x for which this ratio is large provides evidence that P₁ rather than P₀ is the true underlying probability distribution. The Neyman-Pearson analysis gives us a basis for choosing c so that

$$P_0\left\{x : \frac{f_1(x)}{f_0(x)} \ge c\right\} = \alpha.$$

Now we use the Neyman-Pearson lemma to derive the UMP test in the following two examples.

Example 3. Suppose that X is a sample of size 1. We wish to test whether it comes from N(0, 1) or from the double exponential distribution DE(0, 2) with the pdf 4⁻¹ exp(−|x|/2).

• Can you guess the testing procedure?

• Since P(f₁(x) = c f₀(x)) = 0, there is a unique nonrandomized UMP test.

• The UMP test has T_∗(x) = 1 if and only if

$$\frac{\pi}{8} \exp(x^2 - |x|) > c^2$$

for some c > 0, which is equivalent to |x| > t or |x| < 1 − t for some t > 1/2.

• Suppose that α < 1/4. We use

α = E₀[T_∗(X)] = P₀(|X| > t) + P₀(|X| < 1 − t).

If t ≤ 1, then P₀(|X| > t) ≥ P₀(|X| > 1) = 0.3174 > α. Hence t should be greater than 1 and α = Φ(−t) + 1 − Φ(t).

Thus, t = Φ⁻¹(1 − α/2) and T_∗(X) = I_(t,∞)(|X|).

• Why does the UMP test reject H₀ when |X| is large?

• The power of T_∗ under H₁ is

$$E_1[T_*(X)] = P_1(|X| > t) = 1 - \frac{1}{4}\int_{-t}^{t} e^{-|x|/2}\,dx = e^{-t/2}.$$
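As a numerical illustration of Example 3, the cutoff t = Φ⁻¹(1 − α/2) and the power e^(−t/2) can be evaluated with the standard library. The choice α = 0.05 is an assumption made for the illustration, and the function name is hypothetical.

```python
from math import exp
from statistics import NormalDist

def ump_cutoff_and_power(alpha):
    """Cutoff t and power of the test 'reject when |X| > t' for N(0,1) versus DE(0,2).

    Valid for alpha < 1/4, the case treated in the example (then t > 1).
    """
    if not 0 < alpha < 0.25:
        raise ValueError("this sketch covers 0 < alpha < 1/4 only")
    t = NormalDist().inv_cdf(1 - alpha / 2)   # t = Phi^{-1}(1 - alpha/2)
    power = exp(-t / 2)                       # P_1(|X| > t) = e^{-t/2}
    return t, power

t, power = ump_cutoff_and_power(0.05)   # alpha = 0.05 is an assumed example value
print(t, power)
```

The power is modest because a single observation carries little information for separating the two distributions.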


Example 4. Let X₁, . . . , X_n be iid binary random variables with p = P(X₁ = 1). Suppose that we wish to test H₀ : p = p₀ versus H₁ : p = p₁, where 0 < p₀ < p₁ < 1.

• Since P(f₁(x) = c f₀(x)) ≠ 0, we may need to consider a randomized UMP test.

• A UMP test of size α is

$$T_*(Y) = \begin{cases} 1 & \lambda(Y) > c \\ \gamma & \lambda(Y) = c \\ 0 & \lambda(Y) < c, \end{cases}$$

where $Y = \sum_{i=1}^n X_i$ and

$$\lambda(Y) = \left(\frac{p_1}{p_0}\right)^{Y} \left(\frac{1 - p_1}{1 - p_0}\right)^{n - Y}.$$

• Since λ(Y) is increasing in Y, there is an integer m > 0 such that

$$T_*(Y) = \begin{cases} 1 & Y > m \\ \gamma & Y = m \\ 0 & Y < m, \end{cases}$$

where m and γ satisfy

α = E₀[T_∗(Y)] = P₀(Y > m) + γ P₀(Y = m).

• Since Y has the binomial distribution Bin(n, p), we can determine m and γ from

$$\alpha = \sum_{j=m+1}^{n} \binom{n}{j} p_0^{j} (1 - p_0)^{n-j} + \gamma \binom{n}{m} p_0^{m} (1 - p_0)^{n-m}.$$

• Unless

$$\alpha = \sum_{j=m+1}^{n} \binom{n}{j} p_0^{j} (1 - p_0)^{n-j}$$

for some integer m, the UMP test is a randomized test.
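The pair (m, γ) can be found by accumulating the upper tail of Bin(n, p₀) from the top until adding the next point mass would exceed α. The sketch below does this; the function name and the example values n = 10, p₀ = 0.5, α = 0.05 are assumptions for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(Y = k) for Y ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def ump_randomized_test(n, p0, alpha):
    """Return (m, gamma) with P0(Y > m) + gamma * P0(Y = m) = alpha."""
    tail = 0.0                                # running value of P0(Y > m)
    for m in range(n, -1, -1):
        point = binom_pmf(m, n, p0)
        if tail + point > alpha:              # including Y = m fully would overshoot alpha
            return m, (alpha - tail) / point  # randomize on the boundary point
        tail += point
    raise ValueError("alpha must be smaller than 1")

n, p0, alpha = 10, 0.5, 0.05                  # assumed example values
m, gamma = ump_randomized_test(n, p0, alpha)
achieved = sum(binom_pmf(j, n, p0) for j in range(m + 1, n + 1)) + gamma * binom_pmf(m, n, p0)
print(m, gamma, achieved)
```

Note that p₁ never enters the computation: only n, p₀, and α determine m and γ.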

• Do you notice that the UMP test T_∗ does not depend on p₁?

– The Neyman-Pearson lemma tells us that we should put points x into the rejection region in decreasing order of their likelihood ratio until the level of the test reaches α.

– Think of two hypothesis testing problems:

The first one is H₀ : p = p₀ versus H₁ : p = p₁ and the second one is H₀ : p = p₀ versus H₁ : p = p₂, where p₁ > p₀ and p₂ > p₀.

– For both testing problems, the likelihood ratio increases as y increases.

– T_∗ is in fact a UMP test for testing H₀ : p = p₀ versus H₁ : p > p₀.

• Suppose that there is a test T_∗ of size α such that for every P₁ ∈ P, T_∗ is UMP for testing H₀ versus the hypothesis P = P₁.

Then T_∗ is UMP for testing H₀ versus H₁.

Example: Suppose we have reason to believe that the true average monthly return on stocks selected by darts is 1.5%. We want to choose between H₀ : µ = 1.5 versus H₁ : µ ≠ 1.5, where µ is the true mean monthly return.

• We need to select a significance level α. Let’s pick α = 0.05. This means that there is at most a 5% chance that we will mistakenly reject H₀ if in fact H₀ is true (Type I error).

It says nothing about the chances that we will mistakenly stick with H₀ if in fact H₁ is true (Type II error).

• Large sample hypothesis test. Let's suppose we have samples X₁, · · · , X_n with n > 30.

– The first step in choosing between our hypotheses is computing the following test statistic:

$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$

– If the null hypothesis is true, then the test statistic Z has approximately a standard normal distribution by the central limit theorem.

– The test: If Z < −z_{α/2} or Z > z_{α/2}, we reject the null hypothesis; otherwise, we stick with the null hypothesis. (Recall that z_{α/2} is defined by the requirement that the area to the right of z_{α/2} under N(0, 1) is α/2.)

• T-test for normal population.

Suppose now that we don't necessarily have a large sample but we do have a normal population. Consider the same hypotheses as before.

– Now our test statistic becomes

$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$$

– Under H₀, the test statistic t has a t-distribution with n − 1 degrees of freedom.

– Consider the mean return on darts. Suppose we have n = 20 observations (the 1-month contests) with a sample mean of −1.0 and a sample standard deviation of 7.2.

Our test statistic is t = (−1.0 − 1.5)/(7.2/√20) = −1.55. The threshold for rejection is t_{19,0.025} = 2.093. Since |t| < 2.093, we cannot reject H₀.
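The darts calculation can be checked with a short sketch. The critical value 2.093 is taken from the text rather than computed, since the standard library has no t-distribution quantile function; the function name is illustrative.

```python
from math import sqrt

def one_sample_t(sample_mean, mu0, sample_sd, n):
    """t statistic for H0: mu = mu0 based on a sample mean and sample standard deviation."""
    return (sample_mean - mu0) / (sample_sd / sqrt(n))

t_stat = one_sample_t(sample_mean=-1.0, mu0=1.5, sample_sd=7.2, n=20)
t_crit = 2.093                     # t_{19, 0.025}, quoted from the text
reject = abs(t_stat) > t_crit      # two-sided rejection rule
print(round(t_stat, 2), reject)
```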

Example: Consider the effect of a packaging change on sales of a product. Let µ be the (unknown) mean increase in sales due to the change. We have data available from a test-marketing study. We will not undertake the change unless there is strong evidence in favor of increased sales. We should therefore set up the test like this: H₀ : µ ≤ 0 versus H₁ : µ > 0.

• Note that this is a one-sided test.

• This formulation implies that a large X̄ (i.e., large increases in sales in a test market) will support H₁ (i.e., cause us to switch to the new package) but negative values of X̄ (reflecting decreased sales) support H₀.

• The packaging example: Suppose that based on test-marketing in 36 stores we observe a sample mean increase in sales of 13.6 units per week with a sample standard deviation of 42.

Is the observed increase significant at level α = 0.05? To answer this, we compute the test statistic Z = 13.6/(42/√36) ≈ 1.94.

Our cutoff is z_α = 1.645. Since Z > z_α, the increase is significant.
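A sketch of the one-sided large-sample test for the packaging data, computing the statistic directly from the stated sample mean, standard deviation, and sample size. The function name is hypothetical, and the sample standard deviation stands in for σ as is usual for large n.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_z_test(sample_mean, mu0, sample_sd, n, alpha):
    """Upper-tailed z test of H0: mu <= mu0 against H1: mu > mu0."""
    z = (sample_mean - mu0) / (sample_sd / sqrt(n))
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # area alpha lies to the right of z_alpha
    return z, z_alpha, z > z_alpha

z, z_alpha, reject = one_sided_z_test(13.6, 0.0, 42.0, 36, 0.05)
print(z, z_alpha, reject)
```

Only large positive values of the statistic count as evidence against H₀, so the whole of α sits in the upper tail.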


Observational Studies

• An observational study on sex bias in admissions to the Graduate Division at the University of California, Berkeley, was carried out in the fall quarter of 1973. Bickel, P., O'Connell, J.W., and Hammel, E. (1975) Is there a sex bias in graduate admissions? Science 187, 398-404.

– There were 8,442 men who applied for admission to graduate school that quarter, and 4,321 women.

– About 44% of the men and 35% of the women were admitted.

– Assuming that the men and women were on the whole equally well qualified (and there is no evidence to the contrary), the difference in admission rates looks like a very strong piece of evidence to show that men and women are treated differently in the admission procedure.

• Admissions to graduate work are made separately for each major. By looking at each major separately, it should have been possible to identify the ones which discriminated against the women.

– At Berkeley, there are over a hundred majors.

– The six largest majors had over five hundred applicants each. (Together they accounted for over one third of the total number of applicants to the campus.)

– In each major, the percentage of female applicants who were admitted is roughly equal to the percentage of male applicants.

– The only exception is major A, which appears to discriminate against men: it admitted 82% of the women, and only 62% of the men.

– When all six majors are taken together, they admitted 44% of the male applicants, and only 30% of the females; the difference is 14%.

• Admissions data in the six largest majors

                 Men                       Women
Major   Number of    Percent      Number of    Percent
        applicants   admitted     applicants   admitted
A       825          62           108          82
B       560          63           25           68
C       325          37           593          34
D       417          33           375          35
E       191          28           393          24
F       373          6            341          7

• What is going on? An explanation:

– The first two majors were easy to get into. Over 50% of the men applied to these two.

– The other four majors were much harder to get into. Over 90% of the women applied to these four.

– There was an effect due to the choice of major, confounded with the effect due to sex. When the choice of major is controlled for, as in the above table, there is little difference in the admission rates for men and women.

• An experiment is controlled when the investigators determine which subjects will be the controls and which will get the treatment, for instance by tossing a coin.

• Statisticians distinguish carefully between controlled experiments and observational studies.

– Studies of the effects of smoking are necessarily observational: nobody is going to smoke for ten years just to please a statistician.

– Many problems can be studied only observationally, and all observational studies have to deal with the problem of confounding.

– For the admission example, it is wrong to compare campus-wide rates while ignoring the choice of major. We have to make comparisons for homogeneous subgroups.

– This was not a controlled, randomized experiment, however; sex was not randomly assigned to the applicants.

• An alternative analysis: Compare the weighted average admission rates for men and women, weighting each major by its total number of applicants.

For men, consider

$$\frac{933}{4526} \times 62\% + \frac{585}{4526} \times 63\% + \frac{918}{4526} \times 37\% + \frac{792}{4526} \times 33\% + \frac{584}{4526} \times 28\% + \frac{714}{4526} \times 6\%,$$

and similarly for women, which leads to 39% for men versus 43% for women.
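The contrast between the pooled rates and the weighted averages can be reproduced from the six-major table. In this sketch the number of women applying to major C is taken as 593, an assumption inferred from the weight 918 = 325 + 593 used in the weighted average; variable names are illustrative.

```python
# (men applicants, men % admitted, women applicants, women % admitted) per major
majors = {
    "A": (825, 62, 108, 82),
    "B": (560, 63, 25, 68),
    "C": (325, 37, 593, 34),   # 593 is assumed, inferred from the weight 918 in the text
    "D": (417, 33, 375, 35),
    "E": (191, 28, 393, 24),
    "F": (373, 6, 341, 7),
}

# Pooled rates: each sex weighted by its own applicant counts.
men_total = sum(m for m, _, _, _ in majors.values())
women_total = sum(w for _, _, w, _ in majors.values())
pooled_men = sum(m * pm for m, pm, _, _ in majors.values()) / men_total
pooled_women = sum(w * pw for _, _, w, pw in majors.values()) / women_total

# Weighted averages: both sexes weighted by the total applicants to each major.
all_total = sum(m + w for m, _, w, _ in majors.values())
weighted_men = sum((m + w) * pm for m, pm, w, _ in majors.values()) / all_total
weighted_women = sum((m + w) * pw for m, _, w, pw in majors.values()) / all_total

print(pooled_men, pooled_women, weighted_men, weighted_women)
```

Using a common set of weights removes the effect of men and women applying to different majors, which is why the comparison reverses.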

Hypothesis Testing By Likelihood Methods

Example. Let X₁, . . . , X_n be iid with X₁ ∼ N(µ, 1).

• Test H₀ : µ = 0 versus H₁ : µ = µ₀ > 0.

• Construct a test with α = 0.05 and β = 0.2005.

• Reject H₀ if $\sqrt{n}\,\bar{X}_n > 1.645$.

• Note that

$$\beta = P(\sqrt{n}\,\bar{X}_n \le 1.645 \mid \mu = \mu_0) = \Phi(1.645 - \sqrt{n}\,\mu_0).$$

• If n → ∞ and µ₀ is a fixed positive constant, β → 0.

• To ensure β = 0.2005, it requires that $1.645 - \sqrt{n}\,\mu_0 = -0.84$, or $\mu_0 = 2.485\,n^{-1/2}$.

• Do you notice that µ₀ will change with n, so that it is no longer a fixed alternative?
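The relation between n, µ₀, and β in this example can be explored numerically. The sketch below evaluates β = Φ(1.645 − √n µ₀) and the alternative µ₀ = 2.485/√n; the function names and the sample sizes tried are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

Z_ALPHA = 1.645   # cutoff for alpha = 0.05, as in the text

def type_ii_error(n, mu0):
    """beta = P(sqrt(n) * Xbar <= 1.645 | mu = mu0) = Phi(1.645 - sqrt(n) * mu0)."""
    return NormalDist().cdf(Z_ALPHA - sqrt(n) * mu0)

def alternative_for_beta(n, z_beta=0.84):
    """mu0 solving 1.645 - sqrt(n) * mu0 = -z_beta, i.e. mu0 = (1.645 + z_beta) / sqrt(n)."""
    return (Z_ALPHA + z_beta) / sqrt(n)

for n in (25, 100, 400):                       # assumed sample sizes
    mu0 = alternative_for_beta(n)
    print(n, mu0, type_ii_error(n, mu0))       # beta stays near 0.2005 while mu0 shrinks

print(type_ii_error(10_000, 0.5))              # fixed alternative: beta goes to 0
```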

Test Statistics for a Simple Null Hypothesis

Consider testing H₀ : θ = θ⁰ ∈ R^s versus H₁ : θ ≠ θ⁰.

Likelihood Ratio Test

• A likelihood ratio statistic,

$$\Lambda_n = \frac{L(\theta^0; x)}{\sup_{\theta \in \Theta} L(\theta; x)},$$

was introduced by Neyman and Pearson (1928).

• Λ_n takes values in the interval [0, 1] and H₀ is to be rejected for sufficiently small values of Λ_n.

• The rationale behind LR tests is that when H₀ is true, Λ_n tends to be close to 1, whereas when H₁ is true, Λ_n tends to be close to 0.

• The test may be carried out in terms of the statistic

λ_n = −2 log Λ_n.

• For finite n, the null distribution of λ_n will generally depend on n and on the form of the pdf of X.

• LR tests are closely related to MLEs.

• Denote the MLE by θ̂. For asymptotic analysis, expanding λ_n at θ̂ in a Taylor series, we get

$$\lambda_n = -2\left[-\sum_{i=1}^{n} \log f(X_i; \hat\theta) + \sum_{i=1}^{n} \log f(X_i; \theta^0)\right] = 2\left\{\frac{1}{2}(\theta^0 - \hat\theta)^T \left[-\sum_{i=1}^{n} \frac{\partial^2}{\partial\theta_j \partial\theta_k} \log f(X_i; \theta)\Big|_{\theta = \theta^*}\right] (\theta^0 - \hat\theta)\right\},$$

where θ^∗ lies between θ̂ and θ⁰.

• Since θ^∗ is consistent,

$$\lambda_n = n(\hat\theta - \theta^0)^T \left[-\frac{1}{n}\sum_{i=1}^{n} \frac{\partial^2}{\partial\theta_j \partial\theta_k} \log f(X_i; \theta)\Big|_{\theta = \theta^0}\right] (\hat\theta - \theta^0) + o_P(1).$$

By the asymptotic normality of θ̂ and

$$-\frac{1}{n}\sum_{i=1}^{n} \frac{\partial^2}{\partial\theta_j \partial\theta_k} \log f(X_i; \theta)\Big|_{\theta = \theta^0} \xrightarrow{P} I(\theta^0),$$

λ_n has, under H₀, a limiting chi-squared distribution with s degrees of freedom.

Example. Consider the testing problem H₀ : θ = θ₀ versus H₁ : θ ≠ θ₀ based on iid X₁, . . . , X_n from the uniform distribution U(0, θ).

• $L(\theta_0; x) = \theta_0^{-n}\, 1_{\{x_{(n)} < \theta_0\}}$

• θ̂ = x_(n) (MLE) and $\sup_{\theta \in \Theta} L(\theta; x) = x_{(n)}^{-n}$.

• We have

$$\Lambda_n = \begin{cases} (X_{(n)}/\theta_0)^n & X_{(n)} \le \theta_0 \\ 0 & X_{(n)} > \theta_0. \end{cases}$$

• Reject H₀ if X_(n) > θ₀ or X_(n)/θ₀ < c^{1/n}.

• What is the asymptotic distribution of λ_n?

• What is P(n log(X_(n)/θ₀) ≤ c) where c < 0? It is not a χ² distribution. (Why?)

Example. Consider the testing problem H₀ : σ² = σ₀² versus H₁ : σ² ≠ σ₀² based on iid X₁, . . . , X_n from the normal distribution N(µ₀, σ²).

• $L(\theta^0; x) = (2\pi\sigma_0^2)^{-n/2} \exp\left(-\sum_i (x_i - \mu_0)^2 / (2\sigma_0^2)\right)$

• $\hat\sigma^2 = n^{-1}\sum_i (x_i - \mu_0)^2$ (MLE) and $\sup_{\theta \in \Theta} L(\theta; x) = (2\pi\hat\sigma^2)^{-n/2} \exp(-n/2)$.

• We have

$$\Lambda_n = \left(\frac{\hat\sigma^2}{\sigma_0^2}\right)^{n/2} \exp\left(\frac{n}{2} - \frac{\sum_i (x_i - \mu_0)^2}{2\sigma_0^2}\right),$$

or, under H₀,

$$\lambda_n = -n\left[\ln\left(\frac{1}{n}\sum_{i=1}^{n} Z_i^2\right) + \left(1 - \frac{1}{n}\sum_{i=1}^{n} Z_i^2\right)\right],$$

where Z₁, . . . , Z_n are iid N(0, 1).

• Fact: Using the CLT, we have

$$\frac{n^{-1}\sum_{i=1}^{n} Z_i^2 - 1}{\sqrt{2/n}} \xrightarrow{d} N(0, 1) \quad\text{or}\quad \frac{n}{2}\left(\frac{1}{n}\sum_{i=1}^{n} Z_i^2 - 1\right)^2 \xrightarrow{d} \chi^2_1.$$

• Note that ln u ≈ −(1 − u) − (1 − u)²/2 when u is near 1, and $n^{-1}\sum_{i=1}^{n} Z_i^2 \to 1$ in probability by the LLN.

• A common question in Taylor series approximation is how many terms we should consider. In this example, it refers to the use of the approximation ln u ≈ −(1 − u) as opposed to the second-order approximation we use. If we use the first-order approximation, we end up with the difficulty of finding lim_n a_n b_n when lim_n a_n = ∞ and lim_n b_n = 0.

• We conclude that λ_n has a limiting chi-squared distribution with 1 degree of freedom.
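For the normal-variance example, the statistic λ_n and its second-order approximation (n/2)(u − 1)², with u = σ̂²/σ₀², can be compared on a small data set. This is a sketch under the example's assumption of known mean µ₀; the function names and the toy data are hypothetical.

```python
from math import log

def variance_ratio(xs, mu0, sigma0_sq):
    """u = sigma_hat^2 / sigma_0^2, with sigma_hat^2 the MLE for known mean mu0."""
    return sum((x - mu0) ** 2 for x in xs) / (len(xs) * sigma0_sq)

def lrt_statistic(xs, mu0, sigma0_sq):
    """lambda_n = -2 log Lambda_n = -n [ln u + (1 - u)]."""
    u = variance_ratio(xs, mu0, sigma0_sq)
    return -len(xs) * (log(u) + 1 - u)

def quadratic_approx(xs, mu0, sigma0_sq):
    """Second-order Taylor approximation (n / 2) (u - 1)^2."""
    u = variance_ratio(xs, mu0, sigma0_sq)
    return len(xs) / 2 * (u - 1) ** 2

data = [1.05] * 50 + [-1.05] * 50          # toy data, chosen so that u is close to 1
lam = lrt_statistic(data, mu0=0.0, sigma0_sq=1.0)
approx = quadratic_approx(data, mu0=0.0, sigma0_sq=1.0)
print(lam, approx)
```

When u is near 1 the two values are close, which is the content of the second-order expansion of ln u.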


1. We begin with the simplest case of a test. Suppose we are inclined to believe that some (unknown) population mean µ has the value µ₀, where µ₀ is some (known) number. We have samples X₁, . . . , X_n from the underlying population and we want to test our hypothesis that µ = µ₀. Thus, we have

H₀ : µ = µ₀

H₁ : µ ≠ µ₀

What sort of evidence would lead us to reject H₀ in favor of H₁? Naturally, a sample mean far from µ₀ would support H₁ while one close to µ₀ would not. Hypothesis testing makes this intuition precise.

2. This is a two-sided or two-tailed test because sample means that are very large or very small count as evidence against H₀. In a one-sided test, only values in one direction are evidence against the null hypothesis. We treat that case later.

3. Example: Suppose we have reason to believe that the true average monthly return on stocks selected by darts is 1.5%. (See Dart Investment Fund in the casebook for background and data.) We want to choose between

H₀ : µ = 1.5

H₁ : µ ≠ 1.5,

where µ is the true mean monthly return.

4. We need to select a significance level α. Let's pick α = .05. This means that there is at most a 5% chance that we will mistakenly reject H₀ if in fact H₀ is true (Type I error). It says nothing about the chances that we will mistakenly stick with H₀ if in fact H₁ is true (Type II error).

5. Large sample hypothesis test. Let's suppose we have samples X₁, . . . , X_n with n > 30. The first step in choosing between our hypotheses is computing the following test statistic:

$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}.$$

I am temporarily assuming that we know σ.

6. Remember that we know µ₀ (it's part of the null hypothesis we've formulated), even though we don't know µ.

7. If the null hypothesis is true, then the test statistic Z has approximately a standard normal distribution.

8. Now we carry out the test: If Z < −z_{α/2} or Z > z_{α/2} we reject the null hypothesis; otherwise, we stick with the null hypothesis. (Recall that z_{α/2} is defined by the requirement that the area to the right of z_{α/2} under N(0, 1) is α/2. Thus, with α = .05, the cutoff is 1.96.)

9. In other words, we reject H₀ when the test statistic Z lands in the set of points having absolute value greater than z_{α/2}. This set is called the rejection region for the test.

10. Every hypothesis test has this general form: we compute a test statistic from data, then check if the test statistic lands inside or outside the rejection region. The rejection region depends on α but not on the data.

11. Notice that saying

$$-z_{\alpha/2} < \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} < z_{\alpha/2}$$

is equivalent to saying

$$\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu_0 < \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}.$$

So, here is another way to think of the test we just did. We found a confidence interval for the mean and checked to see if µ₀ lands in that interval. If µ₀ lands inside, we don't reject H₀; if µ₀ lands outside, we do reject H₀.

12. This supports our intuition that we should reject H₀ if X̄ is far from µ₀.

13. As usual, if we don't know σ we replace it with the sample standard deviation s.

14. T-test for normal population. Suppose now that we don't necessarily have a large sample but we do have a normal population. Consider the same hypotheses as before. Now our test statistic becomes

$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}.$$

15. Under the null hypothesis, the test statistic t has a t-distribution with n − 1 degrees of freedom.

16. Now we carry out the test. Reject if

t < −t_{n−1,α/2} or t > t_{n−1,α/2};

otherwise, do not reject.

17. As before, rejecting based on this rule is equivalent to rejecting whenever µ₀ falls outside the confidence interval for µ.

18. Example: Let's continue with the hypothesis test for the mean return on darts. As above, µ₀ = 1.5 and α = .05. Suppose we have n = 20 observations (the 1-month contests) with a sample mean of −1.0 and a sample standard deviation of 7.2. Our test statistic is therefore

$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{-1.0 - 1.5}{7.2/\sqrt{20}} = -1.55.$$

The threshold for rejection is t_{19,.025} = 2.093. Since our test statistic t has an absolute value smaller than the cutoff, we cannot reject the null hypothesis. In other words, based on a significance level of .05 the evidence does not significantly support the view that µ ≠ 1.5.

19. … H₀, we stick with the null hypothesis. The smaller we make α, the harder it is to reject H₀.

20. If a test leads us to reject H₀, we say that the results are significant at level α.

P-Values

1. There is something rather arbitrary about the choice of α. Why should we use α = .05 rather than .01, .10 or some other value? What if we would have rejected H₀ at α = .10 but fail to reject it because we chose α = .05? Should we change our choice of α?

2. Changing α after a hypothesis test is "cheating" in a precise sense. Recall that, by the definition of α, the probability of a Type I error is at most α. Thus, fixing α gives us a guarantee on the effectiveness of the test. If we change α, we lose this guarantee.

3. Nevertheless, there is an acceptable way to report what would have happened had we chosen a different significance level. This is based on something called the p-value of a test.

4. The p-value is the smallest significance level (i.e., the smallest α) at which H₀ would be rejected, for a given test statistic. It is therefore a measure of how significant the evidence in favor of H₁ is: the smaller the p-value, the more compelling the evidence.

5. Example: Consider the test of mean returns on stocks picked by darts, as above. To simplify the present discussion, let's suppose we have 30 data points, rather than 20. (See Dart Investment Fund in the casebook for background and data.) As before the hypotheses are

H₀ : µ = 1.5

H₁ : µ ≠ 1.5

Let's suppose that our sample mean X̄ (based on 30 observations) is −0.8 and the sample standard deviation is 6.1. Since we are assuming a large sample, our test statistic is

$$Z = \frac{\bar{X} - 1.5}{s/\sqrt{n}} = \frac{-0.8 - 1.5}{6.1/\sqrt{30}} = -2.06.$$

With a significance level of α = .05, we get z_{α/2} = 1.96, and our rejection region would be

Z < −1.96 or Z > 1.96.

So, in this case, Z = −2.06 would be significant: it is sufficiently far from zero to cause us to reject H₀.
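The test statistic and rejection decision in this example can be checked numerically, together with the two-sided tail probability 2·P(Z > |z|), which is the quantity item 4 calls the p-value for a two-sided test. This is a sketch using only the numbers stated in the example; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()

# Values stated in the example
sample_mean, mu0, sample_sd, n = -0.8, 1.5, 6.1, 30
alpha = 0.05

z = (sample_mean - mu0) / (sample_sd / sqrt(n))   # large-sample test statistic
z_crit = std_normal.inv_cdf(1 - alpha / 2)        # z_{alpha/2}
reject = abs(z) > z_crit                          # two-sided rejection rule

# Two-sided tail probability: smallest alpha at which this z would be rejected
p_value = 2 * (1 - std_normal.cdf(abs(z)))
print(z, z_crit, reject, p_value)
```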

We now ask, what is the smallest α at which we would reject H₀, based on Z = −2.06. We are asking for the smallest α such that −2.06 < −z_{α/2}; i.e., the smallest α such that z_{α/2} < 2.06. To find this value, we look up 2.06 in the normal table. This gives us