Lecture 8: Hypothesis testing

Academic year: 2022

(1)

Lecture 8: Hypothesis testing

2nd of December 2015

(2)

Single sample test for a population mean

Consider the following hypothetical situation:

From previous experience we know that the birth weights of babies in England are Normally distributed with a mean of 3000g and a standard deviation of 500g.

We think that maybe babies in Australia have a mean birth weight greater than 3000g and we would like to test this hypothesis.

(3)

Null and Alternative Hypothesis

The main hypothesis that we are most interested in is the research hypothesis, denoted H1, that the mean birth weight of Australian babies is greater than 3000g.

The other hypothesis is the null hypothesis, denoted H0, that the mean birth weight is equal to 3000g.

In summary:

H0 : µ = 3000g H1 : µ > 3000g

The research hypothesis is often called the alternative hypothesis.

(4)

Null and Alternative Hypothesis

We start with the research hypothesis and "set up" the null hypothesis to be directly counter to what we hope to show.

We then try to show that, in light of our collected data, the null hypothesis is false.

We do this by calculating the probability of observing data at least as extreme as ours if the null hypothesis is true.

If this probability is very small it suggests that the null hypothesis is false.

If this probability is large it suggests that there is not enough evidence to reject the null hypothesis.

(5)

Collecting a dataset

Once we have set up our null and alternative hypothesis we can collect a sample of data.

Imagine that we take a sample of 44 babies from Australia, measure their birth weights and observe that the sample mean of these 44 weights is $\bar{X} = 3275.955$g.

We now want to calculate the probability of obtaining a sample with mean as large as 3275.955 by chance under the assumption of the null hypothesis H0.

(6)

We know from previous lectures that:

If $X_1, X_2, \ldots, X_n$ are $n$ independent and identically distributed random variables from a $N(\mu, \sigma^2)$ distribution, then

\[ \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right). \]

Thus, under the assumption of the null hypothesis, the sample mean of 44 values from a $N(3000, 500^2)$ distribution is

\[ \bar{X} \sim N\left(3000, \frac{500^2}{44}\right) = N(3000, 5681.818). \]
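The sampling distribution of the sample mean can be checked empirically. The following is a minimal simulation sketch, not part of the original lecture: the seed and the number of replications are arbitrary choices made for illustration, and the simulated variance should only be expected to land near the theoretical value 500²/44.

```python
import random
from statistics import fmean, variance

random.seed(0)  # fixed seed so the sketch is reproducible (arbitrary choice)

mu, sigma, n = 3000.0, 500.0, 44   # values assumed under the null hypothesis
replications = 5000                # arbitrary number of simulated samples

# Draw many samples of size n from N(mu, sigma^2) and record each sample mean.
sample_means = [
    fmean([random.gauss(mu, sigma) for _ in range(n)])
    for _ in range(replications)
]

theoretical_variance = sigma ** 2 / n  # sigma^2 / n = 5681.818...

print(f"mean of sample means:     {fmean(sample_means):.1f}")
print(f"variance of sample means: {variance(sample_means):.1f}")
print(f"theoretical variance:     {theoretical_variance:.3f}")
```

The empirical mean and variance of the simulated sample means should be close to 3000 and 5681.818 respectively, up to simulation noise.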

Now we can calculate the probability of obtaining a sample with a mean as large as 3275.955 using standardization.

(7)

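[Figure: density of $\bar{X} \sim N(3000, 5681.818)$ with the area $P(\bar{X} > 3275.955)$ shaded, and the corresponding density of $Z = (\bar{X} - 3000)/75.378 \sim N(0, 1)$ with the area $P(Z > 3.66)$ shaded.]

\[ P(\bar{X} > 3275.955) = P\left(\frac{\bar{X} - 3000}{75.378} > \frac{3275.955 - 3000}{75.378}\right) = P(Z > 3.66) = 0.00015 \]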
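The standardization step can be reproduced numerically. This is a small sketch using Python's standard library, with variable names chosen here for illustration. The tail probability is evaluated at the unrounded z, so it may differ slightly in the last digits from a value read off a normal table, but it should be of the order of 10⁻⁴.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 3000.0     # mean birth weight under H0 (g)
sigma = 500.0    # known population standard deviation (g)
n = 44           # sample size
xbar = 3275.955  # observed sample mean (g)

se = sigma / sqrt(n)                 # standard deviation of the sample mean
z = (xbar - mu0) / se                # standardized test statistic
p_value = 1.0 - NormalDist().cdf(z)  # upper-tail probability P(Z > z)

print(f"SE = {se:.3f}, z = {z:.2f}, p-value = {p_value:.5f}")
```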

(8)

p-value

This probability is called the p-value of the test.

In this case the p-value is very low: the probability of the data is very low if we assume the null hypothesis is true.

But how low does this probability have to be before we can conclude that the null hypothesis is false?

Convention: choose a level of significance before the experiment that dictates how low the p-value should be before we reject the null hypothesis.

It is common to choose a significance level of 0.01 or 0.05.

We conclude that there is significant evidence against the null hypothesis if the p-value is less than or equal to 0.01 (or 0.05).

(9)

In our baby weight example, the p-value is 0.00015 which is lower than 0.01.

In this case, we conclude that

"there is significant evidence against the null hypothesis at the 0.01 level."

Another way of saying this is that

"we reject the null hypothesis at the 0.01 level."

(10)

If the p-value for the test were much larger, say 0.23, then we could conclude that

"the evidence against the null hypothesis is not significant at the 0.01 level."

Another way of saying this is that

"we cannot reject the null hypothesis at the 0.01 level."

(11)

Test statistics and critical regions

We can also compare the test statistic under the null hypothesis,

\[ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = 3.66, \]

to a critical region of values defined such that

if the test statistic lies in this region then we will reject H0,

if the test statistic lies outside this region then we will not reject H0.

[Figure: N(0, 1) density; the critical region is the upper tail to the right of 1.645, which has area 0.05.]
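The boundary of the critical region is a quantile of the standard normal distribution. A minimal sketch, assuming a one-sided (upper-tail) test at the 0.05 level and using the rounded statistic 3.66 from the baby weight example:

```python
from statistics import NormalDist

alpha = 0.05                                # significance level of the test
z_crit = NormalDist().inv_cdf(1.0 - alpha)  # upper-tail critical value, about 1.645
z_obs = 3.66                                # observed test statistic

in_critical_region = z_obs > z_crit         # True means "reject H0"

print(f"critical value = {z_crit:.3f}, reject H0: {in_critical_region}")
```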

(12)

Overview of Hypothesis Testing

Version 1:

1. Begin with a research (alternative) hypothesis and decide upon a level of significance for the test.

2. Set up the null hypothesis.

3. Collect a sample of data.

4. Calculate a test statistic from the sample of data.

5. Compare the test statistic to its sampling distribution under the null hypothesis by calculating the p-value.

6. Reject the null hypothesis if the p-value is less than the level of significance. Otherwise, retain the null hypothesis.

(13)

Overview of Hypothesis Testing

Version 2:

1. Begin with a research (alternative) hypothesis and decide upon a level of significance for the test.

2. Set up the null hypothesis.

3. Collect a sample of data.

4. Calculate a test statistic from the sample of data.

5. Compare the test statistic to its sampling distribution under the null hypothesis by calculating the critical region for the test.

6. Reject the null hypothesis if the test statistic lies in the critical region. Otherwise, retain the null hypothesis.
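The two versions of the procedure lead to the same decision. The sketch below illustrates this for an upper-tailed test on a normal mean with known standard deviation. The function name `one_sided_z_test` and its return layout are hypothetical, introduced only for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_z_test(xbar, mu0, sigma, n, alpha):
    """Upper-tailed z test of H0: mu = mu0 against H1: mu > mu0.

    Returns (z, p_value, z_crit, reject_by_p_value, reject_by_region).
    """
    std_normal = NormalDist()
    z = (xbar - mu0) / (sigma / sqrt(n))       # step 4: test statistic
    p_value = 1.0 - std_normal.cdf(z)          # step 5, version 1: p-value
    z_crit = std_normal.inv_cdf(1.0 - alpha)   # step 5, version 2: critical value
    reject_by_p_value = p_value < alpha        # step 6, version 1
    reject_by_region = z > z_crit              # step 6, version 2
    return z, p_value, z_crit, reject_by_p_value, reject_by_region


# Baby weight example at the 0.01 level.
result = one_sided_z_test(xbar=3275.955, mu0=3000.0, sigma=500.0, n=44, alpha=0.01)
print(result)
```

Both decision flags agree because, for an upper-tailed test, the p-value is below the significance level exactly when the statistic exceeds the critical value.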

(14)

One sample test for a proportion

Suppose that a university claims to admit equal numbers of state and public school students.

We have a research hypothesis that the university tends to admit more public school students so we interview 500 first year students and discover that 267 came from public school.

We want to test our hypothesis at the 0.05 level.

We first write down our null and alternative hypotheses regarding the population proportion p of public school students:

H0 : p = 0.5 H1 : p > 0.5

Using our sample of data we obtain an estimate of p as $\hat{p} = 267/500 = 0.534$.

(15)

We know that, among n = 500 interviewed students,

\[ \text{number of students from public school} \sim B(n, p) \approx N(np, np(1 - p)). \]

Here we are interested in the proportion (and not the number) of students that came from public school. Approximately,

\[ \text{proportion of students from public school} \sim N\left(p, \frac{p(1 - p)}{n}\right). \]

In this situation, the test statistic used is

\[ Z = \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}} \sim N(0, 1) \]

where p is the proportion dictated by the null hypothesis H0 and n is the size of our sample.
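The step from the count to the proportion follows from dividing by n. As a short worked derivation, writing $X$ for the number of students from public school (a symbol introduced here for illustration) and $\hat{p} = X/n$:

```latex
\begin{align*}
E[\hat{p}] &= E\!\left[\frac{X}{n}\right] = \frac{np}{n} = p, \\
\operatorname{Var}(\hat{p}) &= \operatorname{Var}\!\left(\frac{X}{n}\right)
  = \frac{np(1 - p)}{n^2} = \frac{p(1 - p)}{n}.
\end{align*}
```

Subtracting the mean and dividing by the standard deviation $\sqrt{p(1 - p)/n}$ then gives the approximately standard normal test statistic.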

(16)

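In our example, the value of the test statistic is

\[ \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}} = \frac{0.534 - 0.5}{\sqrt{0.5 \times 0.5 / 500}} = 1.52 \]

[Figure: N(0, 1) density; the critical region is the upper tail to the right of 1.645, which has area 0.05.]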

The test statistic does not lie in the critical region so we conclude that the evidence against the null hypothesis is not significant at the 0.05 level.

Similarly, we can compute the probability of obtaining an estimated proportion $\hat{p}$ as large as 0.534 under the hypothesis H0:

p-value = $P(\hat{p} > 0.534) \approx P(Z > 1.52) = 0.064$, where $Z \sim N(0, 1)$, which is larger than the significance level 0.05.
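The proportion test can be reproduced in a few lines. This is a sketch under the normal approximation used in the lecture, with variable names chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

n = 500          # number of students interviewed
successes = 267  # students who came from public school
p0 = 0.5         # proportion under H0
alpha = 0.05     # significance level

p_hat = successes / n                # estimated proportion, 0.534
se0 = sqrt(p0 * (1.0 - p0) / n)      # standard deviation of p_hat under H0
z = (p_hat - p0) / se0               # test statistic
p_value = 1.0 - NormalDist().cdf(z)  # upper-tail probability P(Z > z)
reject = p_value < alpha             # decision at the chosen level

print(f"z = {z:.2f}, p-value = {p_value:.3f}, reject H0: {reject}")
```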

(17)

One and two-sided test

In the first example, we wanted to test the research hypothesis that mean birth weight of Australian babies was higher than 3000g. We had

H0 : µ = 3000g H1 : µ > 3000g

Instead, our research hypothesis could be that the mean birth weight of Australian babies is different from 3000g. In this case we have

H0 : µ = 3000g H1 : µ ≠ 3000g

This is an example of a two-sided test as opposed to the previous examples which were one-sided tests.

(18)

As before we would calculate our test statistic as $\frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = 3.66$.

The p-value is different, though.

We are no longer looking only at the probability that Z is higher than 3.66 (in the positive direction), but at the probability that Z is at least 3.66 away from 0 in either direction:

\[ P(|Z| > 3.66) = P(Z > 3.66) + P(Z < -3.66) = 2P(Z > 3.66). \]

In general:

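for a test statistic with a symmetric null distribution such as N(0, 1), the two-sided p-value is twice as big as the one-sided p-value (when the observed statistic lies in the direction of the one-sided alternative).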
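The relationship between the one-sided and two-sided p-values can be verified numerically. A minimal sketch for the baby weight statistic, using the rounded value 3.66 and summing both tails explicitly:

```python
from statistics import NormalDist

std_normal = NormalDist()
z = 3.66  # observed test statistic

one_sided = 1.0 - std_normal.cdf(z)          # P(Z > 3.66)
upper_tail = 1.0 - std_normal.cdf(abs(z))    # P(Z > |z|)
lower_tail = std_normal.cdf(-abs(z))         # P(Z < -|z|)
two_sided = upper_tail + lower_tail          # P(|Z| > 3.66)

print(f"one-sided = {one_sided:.6f}, two-sided = {two_sided:.6f}")
```

Even after doubling, the two-sided p-value stays well below 0.01 in this example, so the conclusion of the test does not change.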

(19)

Hypothesis testing and confidence interval

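In the previous lecture, we constructed confidence intervals for the mean µ of a normal distribution $N(\mu, \sigma^2)$ based on n observations as

\[ \left(\bar{X} - z\frac{\sigma}{\sqrt{n}},\; \bar{X} + z\frac{\sigma}{\sqrt{n}}\right) \]

where z is such that

\[ P\left(\bar{X} - z\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z\frac{\sigma}{\sqrt{n}}\right) = \alpha \]

for a confidence level α.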

You may notice that we do a lot of the same things to carry out a statistical test as we do to compute a confidence interval:

we compute a standard error

and look up a value on a standard normal table.

(20)

For instance, in the baby weight example we might have expressed our uncertainty about the average Australian birthweight in the form of a confidence interval. The calculation would have been just slightly different:

we start with the mean observed birthweight in the Australian sample: $\bar{X} = 3275.955$g

the standard error is $\sigma/\sqrt{44}$, where σ is the (unknown) SD of Australian birth weights. Since we don't know σ, we substitute the SD of the sample, which is 528g. So we use

\[ SE = \frac{528}{\sqrt{44}} \approx 80\text{g}. \]

(21)

Then a 95% confidence interval for the mean birthweight of Australian babies is

\[ (3275.955 - 1.96 \times 80,\; 3275.955 + 1.96 \times 80) \approx (3120, 3432)\text{g} \]

(endpoints computed with the unrounded standard error of 79.6g). This interval does not contain 3000g, which is consistent with the observation that the mean Australian birth weight is significantly greater than the UK average of 3000g at the 0.05 significance level.
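The interval can be recomputed and checked against the null value. The following is a sketch assuming the sample SD of 528g quoted in the lecture; it uses the unrounded standard error and the exact normal quantile rather than the rounded values 80 and 1.96.

```python
from math import sqrt
from statistics import NormalDist

xbar = 3275.955  # observed sample mean (g)
s = 528.0        # sample standard deviation (g), substituted for sigma
n = 44           # sample size
level = 0.95     # confidence level
mu0 = 3000.0     # mean under the null hypothesis (g)

z = NormalDist().inv_cdf(0.5 + level / 2.0)  # two-sided quantile, about 1.96
se = s / sqrt(n)                             # standard error, about 79.6
lower, upper = xbar - z * se, xbar + z * se  # interval endpoints

contains_null = lower <= mu0 <= upper        # False means 3000g is excluded

print(f"95% CI = ({lower:.0f}, {upper:.0f}) g, contains 3000g: {contains_null}")
```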

(22)

Summary

Framework of hypothesis testing

Two ways to operate: computing a p-value or through a critical region

Examples for normal distribution or proportions

One-sided and two-sided tests

Link between hypothesis testing and confidence interval

Reminder: the lecture notes contain more details and more examples; they are available on my website.

(23)

Happy Break
