
Introduction to Bayesian Statistics Lecture 2: Single Parameter (I)


Academic year: 2022


(1)

Introduction to Bayesian Statistics

Lecture 2: Single Parameter (I)

Rung-Ching Tsai

Department of Mathematics National Taiwan Normal University

March 4, 2015

(2)

Recap of Bayesian Approach: Frequentist vs. Bayesian

Frequentist (classical) statistics

In Frequentist statistics, parameters are fixed, and we think of properties of estimation methods in repeated sampling, that is, when we imagine taking many random samples from the population that generated our observed data.

It is not meaningful to talk about the probability that the parameter falls within a range, such as Prob(θ > 0), or Prob(θ ∈ [a, b]) = 0.95.

Bayesian statistics

Probability measures degree of uncertainty. Inference is conditional on the observed data.

There is not much distinction between parameters and random variables. Thus, it is perfectly legitimate to talk about the probability that the parameter falls within a range, such as Prob(θ > 0), or Prob(θ ∈ [a, b]) = 0.95.

(3)

Recap of Bayesian Approach: From Prior to Posterior

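Goal: estimate the parameter of interest $\theta$ or predict a future observation $\tilde{y}$

First, three steps:

specify the prior: $p(\theta)$

consider the likelihood of $\theta$: $l(\theta \mid y) = p(y \mid \theta)$

find the posterior distribution of $\theta$:

$$p(\theta \mid y) = \frac{p(\theta, y)}{p(y)} = \frac{p(\theta)\, p(y \mid \theta)}{p(y)} \propto p(\theta)\, p(y \mid \theta)$$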

Point estimation for θ

Credible interval estimation for θ

Predictive interval for ˜y
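The three steps above can be sketched numerically on a grid. This is a minimal illustration, not part of the original slides: it assumes a binomial likelihood, and the function name `grid_posterior`, the grid size, and the example data (y = 3 successes in n = 5 trials, flat prior) are hypothetical choices.

```python
from math import comb


def grid_posterior(y, n, prior, num_points=1001):
    """Approximate p(theta | y) on an evenly spaced grid over [0, 1].

    Assumes y ~ binomial(n, theta); `prior` is any function returning the
    (possibly unnormalized) prior density at theta.
    """
    grid = [i / (num_points - 1) for i in range(num_points)]
    # Step 1: prior density on the grid.
    prior_vals = [prior(t) for t in grid]
    # Step 2: likelihood p(y | theta) on the grid.
    like_vals = [comb(n, y) * t ** y * (1 - t) ** (n - y) for t in grid]
    # Step 3: posterior is proportional to prior times likelihood.
    unnorm = [p * l for p, l in zip(prior_vals, like_vals)]
    # The trapezoidal rule approximates p(y), the normalizing constant.
    width = grid[1] - grid[0]
    evidence = width * (sum(unnorm) - 0.5 * (unnorm[0] + unnorm[-1]))
    density = [u / evidence for u in unnorm]
    return grid, density


# Hypothetical data: 3 successes in 5 trials, flat prior p(theta) = 1.
grid, density = grid_posterior(y=3, n=5, prior=lambda t: 1.0)
print(f"posterior density at theta = 0.5: {density[500]:.3f}")
```

A grid is only practical for one or two parameters, but it makes the role of p(y) as a pure normalizing constant explicit.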

(4)

Point estimation for θ

Point estimation in Bayesian statistics is based solely on the posterior distribution $p(\theta \mid y)$.

Define a loss function $l(\hat{\theta}, \theta)$ that measures the "loss" incurred by the estimate $\hat{\theta}$ when the true parameter value is $\theta$.

The expected loss of an estimate, averaged over the posterior distribution, is

$$E\big(l(\hat{\theta}, \theta) \mid y\big) = \int_{\Theta} l(\hat{\theta}, \theta)\, p(\theta \mid y)\, d\theta.$$

A Bayes estimator $\hat{\theta}$ minimizes this expected loss for a specific loss function.

(5)

Point estimation for θ: expected a posteriori (EAP)

The posterior expectation estimator is given by

$$\hat{\theta}_{EAP} = E(\theta \mid y) = \int \theta\, p(\theta \mid y)\, d\theta.$$

It minimizes the expectation of the quadratic loss function $l_2(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^2$.
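Why the posterior mean minimizes the expected quadratic loss can be seen by expanding around $E(\theta \mid y)$. The short derivation below is a standard argument added as a sketch; it is not taken from the slides.

```latex
\begin{aligned}
E\big[(\hat{\theta} - \theta)^2 \mid y\big]
  &= E\big[\big(\hat{\theta} - E(\theta \mid y) + E(\theta \mid y) - \theta\big)^2 \mid y\big] \\
  &= \big(\hat{\theta} - E(\theta \mid y)\big)^2
     + 2\big(\hat{\theta} - E(\theta \mid y)\big)\,
       \underbrace{E\big[E(\theta \mid y) - \theta \mid y\big]}_{=\,0}
     + \operatorname{Var}(\theta \mid y) \\
  &= \big(\hat{\theta} - E(\theta \mid y)\big)^2 + \operatorname{Var}(\theta \mid y).
\end{aligned}
```

The second term does not depend on $\hat{\theta}$, so the expected loss is smallest when $\hat{\theta} = E(\theta \mid y)$, and the minimum value is the posterior variance.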

(6)

Point estimation for θ: posterior median estimator

The median represents the point with 50% of the probability mass of the posterior distribution below it and 50% above it, i.e., the estimator $\hat{\theta} = \mathrm{Med}(\theta \mid y)$ satisfies

$$\int_{-\infty}^{\hat{\theta}} p(\theta \mid y)\, d\theta = 0.5.$$

It minimizes the expectation of the linear loss function $l_1(\hat{\theta}, \theta) = |\hat{\theta} - \theta|$.

(7)

Point estimation for θ: maximum a posteriori (MAP)

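The posterior mode estimator is defined as the argument at which the posterior probability density function attains its maximum:

$$\hat{\theta}_{MAP} = \text{mode of } p(\theta \mid y) = \arg\max_{\theta}\, p(\theta \mid y).$$

It minimizes the expectation of the zero-one loss function

$$l_3(\hat{\theta}, \theta) = \begin{cases} 0, & |\hat{\theta} - \theta| \le \epsilon \\ 1, & |\hat{\theta} - \theta| > \epsilon \end{cases}$$

in the limit $\epsilon \to 0$.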
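The three point estimators (posterior mean, median, and mode) can be compared numerically. The sketch below assumes, purely for illustration, that the posterior is a Beta(a, b) density with a, b ≥ 1 and approximates it by a discrete distribution on a grid; the function name `point_estimates` and the example Beta(4, 3) are hypothetical.

```python
def point_estimates(a, b, num_points=10001):
    """Return (EAP, median, MAP) for a Beta(a, b) posterior, a, b >= 1.

    The density is evaluated up to its normalizing constant on a grid and
    then normalized to a discrete probability distribution.
    """
    grid = [i / (num_points - 1) for i in range(num_points)]
    weights = [t ** (a - 1) * (1 - t) ** (b - 1) for t in grid]
    total = sum(weights)
    probs = [w / total for w in weights]

    # EAP: posterior mean.
    eap = sum(t * p for t, p in zip(grid, probs))

    # Median: first grid point where the cumulative probability reaches 0.5.
    median = grid[-1]
    cumulative = 0.0
    for t, p in zip(grid, probs):
        cumulative += p
        if cumulative >= 0.5:
            median = t
            break

    # MAP: grid point with the largest posterior probability.
    map_estimate = max(zip(probs, grid))[1]
    return eap, median, map_estimate


eap, median, map_estimate = point_estimates(4, 3)
print(f"EAP = {eap:.4f}, median = {median:.4f}, MAP = {map_estimate:.4f}")
```

For a skewed posterior the three estimates differ, which is why the choice of loss function matters; for a symmetric unimodal posterior they coincide.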

(8)

Credible Interval estimation for θ

A credible interval $[a, b]$ at level $(1 - \alpha)$ is defined by

$$\int_{a}^{b} p(\theta \mid y)\, d\theta = 1 - \alpha,$$

where $a, b \in \mathbb{R}$ and $p(\theta \mid y)$ is the posterior distribution of $\theta$.

Given the observed data, the random variable $\theta$ lies in the interval $[a, b]$ with probability $(1 - \alpha)$. Note the semantic difference from confidence intervals in the Frequentist interpretation.

Such a credible interval is not unique.

Typically the $\alpha/2$ and $1 - \alpha/2$ quantiles of $p(\theta \mid y)$ are used as endpoints to construct quantile-based intervals. For example, the 2.5% and 97.5% quantiles give a 95% credible interval for θ.
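A quantile-based interval can be read off a cumulative sum over a gridded posterior. The following is a minimal sketch under stated assumptions: the posterior is taken to be Beta(4, 3) only as an example, and the helper names `beta_kernel_grid` and `equal_tailed_interval` are hypothetical.

```python
def beta_kernel_grid(a, b, num_points=2001):
    """Grid over [0, 1] with unnormalized Beta(a, b) density values (a, b >= 1)."""
    grid = [i / (num_points - 1) for i in range(num_points)]
    density = [t ** (a - 1) * (1 - t) ** (b - 1) for t in grid]
    return grid, density


def equal_tailed_interval(grid, density, alpha=0.05):
    """Return the grid points at the alpha/2 and 1 - alpha/2 quantiles."""
    total = sum(density)
    lower = upper = None
    cumulative = 0.0
    for t, d in zip(grid, density):
        cumulative += d / total
        if lower is None and cumulative >= alpha / 2:
            lower = t
        if cumulative >= 1 - alpha / 2:
            upper = t
            break
    return lower, upper


grid, density = beta_kernel_grid(4, 3)
lower, upper = equal_tailed_interval(grid, density, alpha=0.05)
print(f"95% quantile-based credible interval: [{lower:.3f}, {upper:.3f}]")
```

The endpoints are accurate only up to the grid spacing, which is sufficient for illustration.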

(9)

Highest posterior density (HPD) credible interval for θ

An HPD credible interval satisfies the following two conditions:

$$\int_{a}^{b} p(\theta \mid y)\, d\theta = 1 - \alpha,$$

$$p(\theta \mid y) \ge p(\tilde{\theta} \mid y), \quad \forall\, \theta \in I \text{ and } \forall\, \tilde{\theta} \notin I,$$

where $I = [a, b] \subset \Theta$ is an HPD credible interval at level $1 - \alpha$.

That is, the minimum density of any point within that region is equal to or larger than the density of any point outside that region.
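One way to approximate an HPD interval is to add grid points in order of decreasing density until the required mass is reached. The sketch below assumes a unimodal posterior (so the selected points form a single interval) and uses Beta(4, 3) only as an example; the names `beta_kernel_grid` and `hpd_interval` are hypothetical.

```python
def beta_kernel_grid(a, b, num_points=2001):
    """Grid over [0, 1] with unnormalized Beta(a, b) density values (a, b >= 1)."""
    grid = [i / (num_points - 1) for i in range(num_points)]
    density = [t ** (a - 1) * (1 - t) ** (b - 1) for t in grid]
    return grid, density


def hpd_interval(grid, density, alpha=0.05):
    """Approximate HPD interval for a unimodal density given on a grid."""
    total = sum(density)
    # Visit grid points from highest to lowest density.
    order = sorted(range(len(grid)), key=lambda i: density[i], reverse=True)
    kept = []
    mass = 0.0
    for i in order:
        kept.append(i)
        mass += density[i] / total
        if mass >= 1 - alpha:
            break
    # For a unimodal density the kept points are contiguous.
    return grid[min(kept)], grid[max(kept)]


grid, density = beta_kernel_grid(4, 3)
hpd_lower, hpd_upper = hpd_interval(grid, density, alpha=0.05)
print(f"95% HPD interval: [{hpd_lower:.3f}, {hpd_upper:.3f}]")
```

For a multimodal posterior the highest-density region may be a union of intervals, and returning only the outermost endpoints, as this sketch does, would overstate it.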

(10)

Predictive interval for ỹ

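After obtaining the posterior distribution of $\theta$, $p(\theta \mid y)$, we can compute the posterior predictive distribution of a future observation $\tilde{y}$ as

$$p(\tilde{y} \mid y) = \int_{\Theta} p(\tilde{y}, \theta \mid y)\, d\theta = \int_{\Theta} p(\tilde{y} \mid \theta, y)\, p(\theta \mid y)\, d\theta = \int_{\Theta} p(\tilde{y} \mid \theta)\, p(\theta \mid y)\, d\theta,$$

where the last equality holds because $\tilde{y}$ is conditionally independent of $y$ given $\theta$.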

A $100(1 - \alpha)\%$ posterior predictive interval $[c, d]$ for $\tilde{y}$ is similarly defined by

$$\int_{c}^{d} p(\tilde{y} \mid y)\, d\tilde{y} = 1 - \alpha, \quad \text{where } c, d \in \mathbb{R}.$$

Note that the prior predictive distribution of $\tilde{y}$ is

$$p(\tilde{y}) = \int_{\Theta} p(\tilde{y}, \theta)\, d\theta = \int_{\Theta} p(\tilde{y} \mid \theta)\, p(\theta)\, d\theta.$$
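As a concrete instance of the posterior predictive integral, suppose the posterior is Beta(a, b) and the future observation is the number of successes in k new Bernoulli trials. The integral then has the closed form $\binom{k}{s} B(a + s,\, b + k - s) / B(a, b)$. The sketch below assumes this Beta and binomial setting; the function names and the example values are hypothetical.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def posterior_predictive_pmf(k, a, b):
    """P(s successes in k future trials | y) when theta | y ~ Beta(a, b).

    Each entry integrates binomial(k, theta) against the Beta(a, b) posterior.
    """
    return [
        comb(k, s) * exp(log_beta(a + s, b + k - s) - log_beta(a, b))
        for s in range(k + 1)
    ]


# Hypothetical posterior Beta(4, 3), predicting 5 future trials.
pmf = posterior_predictive_pmf(5, 4, 3)
for s, p in enumerate(pmf):
    print(f"P(s = {s} | y) = {p:.4f}")
```

With k = 1 the probability of a success reduces to the posterior mean a / (a + b), which is a quick sanity check on the formula.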

(11)

Single Parameter θ: Discrete y

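$y \sim \text{binomial}(n, \theta)$; with one data point $y$, use the Bayesian approach to estimate $\theta$.

choose a prior for $\theta$, but how? With no prior knowledge, use the non-informative (flat) prior for $\theta$:

$$\theta \sim \text{uniform}(0, 1), \quad \text{i.e., } p(\theta) = 1 \text{ for } \theta \in (0, 1)$$

likelihood of $\theta$: $p(y \mid \theta) = \binom{n}{y} \theta^{y} (1 - \theta)^{n - y}$

find the posterior distribution of $\theta$:

$$p(\theta \mid y) \propto p(\theta)\, p(y \mid \theta) = 1 \times \binom{n}{y} \theta^{y} (1 - \theta)^{n - y} = \binom{n}{y} \theta^{y} (1 - \theta)^{n - y} \propto \theta^{y} (1 - \theta)^{n - y}$$

That is, $\theta \mid y \sim \text{Beta}(y + 1,\, n - y + 1)$.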
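The Beta(y + 1, n − y + 1) result can be checked numerically. Under the flat prior the normalizing constant is p(y) = 1/(n + 1), so prior times likelihood divided by p(y) should match the Beta density pointwise. The sketch below is a sanity check only; the function names and the test values (y = 3, n = 5) are hypothetical.

```python
from math import comb, exp, lgamma, log


def beta_pdf(theta, a, b):
    """Beta(a, b) density at theta, for theta strictly between 0 and 1."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1) * log(theta) + (b - 1) * log(1 - theta))


def posterior_pdf_flat_prior(theta, y, n):
    """p(theta) p(y | theta) / p(y) with p(theta) = 1 and p(y) = 1 / (n + 1)."""
    likelihood = comb(n, y) * theta ** y * (1 - theta) ** (n - y)
    return (n + 1) * likelihood


y, n = 3, 5  # hypothetical data
for theta in (0.1, 0.5, 0.9):
    direct = posterior_pdf_flat_prior(theta, y, n)
    closed_form = beta_pdf(theta, y + 1, n - y + 1)
    print(f"theta = {theta}: {direct:.6f} vs {closed_form:.6f}")
```

The agreement reflects the identity $1 / B(y + 1,\, n - y + 1) = (n + 1)\binom{n}{y}$.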

(12)

Single Parameter θ: Discrete y = (y₁, …, yₘ)

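$y_1, \dots, y_m \overset{iid}{\sim} \text{binomial}(n, \theta)$; use the Bayesian approach to estimate $\theta$.

choose a prior for $\theta$; if we use the non-informative prior for $\theta$:

$$\theta \sim \text{uniform}(0, 1), \quad \text{i.e., } p(\theta) = 1 \text{ for } \theta \in (0, 1)$$

likelihood of $\theta$:

$$p(y_1, \dots, y_m \mid \theta) = \prod_{i=1}^{m} \binom{n}{y_i} \theta^{y_i} (1 - \theta)^{n - y_i}$$

find the posterior distribution of $\theta$:

$$p(\theta \mid y) \propto p(\theta)\, p(y \mid \theta) = 1 \times \prod_{i=1}^{m} \binom{n}{y_i} \theta^{y_i} (1 - \theta)^{n - y_i} \propto \theta^{\sum_{i=1}^{m} y_i} (1 - \theta)^{\sum_{i=1}^{m} (n - y_i)}$$

That is, $\theta \mid y \sim \text{Beta}\left(\sum_{i=1}^{m} y_i + 1,\; nm - \sum_{i=1}^{m} y_i + 1\right)$.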

(13)

Single Parameter θ: Discrete y = (y₁, …, yₘ)

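$y_1, \dots, y_m \overset{iid}{\sim} \text{binomial}(n, \theta)$; use the Bayesian approach to estimate $\theta$.

choose a prior for $\theta$; if we use the $\text{Beta}(\alpha, \beta)$ prior for $\theta$:

$$\theta \sim \text{Beta}(\alpha, \beta), \quad \text{i.e., } p(\theta) = \frac{1}{B(\alpha, \beta)} \theta^{\alpha - 1} (1 - \theta)^{\beta - 1} \text{ for } \theta \in (0, 1)$$

likelihood of $\theta$: $p(y_1, \dots, y_m \mid \theta) = \prod_{i=1}^{m} \binom{n}{y_i} \theta^{y_i} (1 - \theta)^{n - y_i}$

find the posterior distribution of $\theta$:

$$p(\theta \mid y) \propto p(\theta)\, p(y \mid \theta) = \frac{1}{B(\alpha, \beta)} \theta^{\alpha - 1} (1 - \theta)^{\beta - 1} \prod_{i=1}^{m} \binom{n}{y_i} \theta^{y_i} (1 - \theta)^{n - y_i} \propto \theta^{\sum_{i=1}^{m} y_i + \alpha - 1} (1 - \theta)^{\sum_{i=1}^{m} (n - y_i) + \beta - 1}$$

That is, $\theta \mid y \sim \text{Beta}\left(\sum_{i=1}^{m} y_i + \alpha,\; nm - \sum_{i=1}^{m} y_i + \beta\right)$.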
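Because the posterior kernel has the same form as the prior, updating a Beta prior with binomial data amounts to adding the total number of successes to α and the total number of failures to β. The sketch below illustrates this; the function names and the example data are hypothetical, and the flat prior corresponds to α = β = 1.

```python
def beta_binomial_update(ys, n, alpha=1.0, beta=1.0):
    """Posterior Beta parameters for y_1..y_m iid binomial(n, theta).

    The prior is Beta(alpha, beta); the defaults give the uniform(0, 1) prior.
    """
    if any(y < 0 or y > n for y in ys):
        raise ValueError("each observation must lie between 0 and n")
    successes = sum(ys)
    failures = n * len(ys) - successes
    return alpha + successes, beta + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical data: three observations, each the number of successes in n = 5 trials.
a_post, b_post = beta_binomial_update([3, 2, 4], n=5, alpha=2.0, beta=2.0)
print(f"posterior: Beta({a_post}, {b_post}), mean = {beta_mean(a_post, b_post):.4f}")
```

Read this way, α − 1 and β − 1 act like prior counts of successes and failures, which helps when choosing an informative prior.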

(14)

Exercise

Question: Suppose we all take turns to flip one particular coin 5 times, and use the data to estimate the probability of getting a head θ for this coin.

Use both the Frequentist and Bayesian approaches to obtain point and interval estimates of θ.

Use the Bayesian approach to answer the following questions:

Based on our data, if you are to flip the coin again, what is the probability that the outcome will be a head?

Based on our data, if you are to flip the coin 5 times, what is the probability mass function of the number of heads you will obtain?

(15)

Homework I

$y_1, \dots, y_n \overset{iid}{\sim} \text{Poisson}(\lambda)$, that is,

$$p(y_i \mid \lambda) = \frac{e^{-\lambda} \lambda^{y_i}}{y_i!}, \quad y_i = 0, 1, \dots$$

Use the Bayesian approach to estimate λ. Obtain the posterior distribution of λ and its point and interval estimators.
