Introduction to Bayesian Statistics Lecture 2: Single Parameter (I)

(1)

Rung-Ching Tsai

Department of Mathematics, National Taiwan Normal University

March 4, 2015

(2)

Recap of Bayesian Approach: Frequentist vs. Bayesian

• Frequentist (classical) statistics

◦ In Frequentist statistics, parameters are fixed, and we think of properties of estimation methods in repeated sampling, that is, when we imagine taking many random samples from the population that generated our observed data.

◦ It is not meaningful to talk about the probability that the parameter falls within a range, such as Prob(θ > 0), or Prob(θ ∈ [a, b]) = 0.95.

• Bayesian statistics

◦ Probability measures degree of uncertainty. Inference is conditional on the observed data.

◦ There is not much distinction between parameters and random variables. Thus, it is perfectly legitimate to talk about the probability that the parameter falls within a range, such as Prob(θ > 0), or Prob(θ ∈ [a, b]) = 0.95.

(3)

Recap of Bayesian Approach: From Prior to Posterior

• Goal: Estimate the parameter of interest θ or predict a future observation ỹ

• First, three steps:

◦ specify the prior: p(θ)

◦ consider the likelihood of θ: l(θ|y) = p(y|θ)

◦ find the posterior distribution of θ:

$$p(\theta \mid y) = \frac{p(\theta, y)}{p(y)} = \frac{p(\theta)\, p(y \mid \theta)}{p(y)} \propto p(\theta)\, p(y \mid \theta)$$

• Point estimation for θ

• Credible interval estimation for θ

• Predictive interval for ỹ
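The three steps (prior, likelihood, posterior) can be sketched numerically with a grid approximation. This is only an illustration under stated assumptions: it assumes a binomial likelihood with a flat prior, and the function name `grid_posterior` and the data values y = 3, n = 5 are hypothetical, not taken from the lecture.

```python
from math import comb


def grid_posterior(y, n, grid_size=1001):
    """Approximate p(theta | y) on an evenly spaced grid over [0, 1].

    Assumes y ~ binomial(n, theta) and a flat prior p(theta) = 1.
    Returns the grid and the posterior density evaluated on it.
    """
    step = 1.0 / (grid_size - 1)
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0 for _ in grid]                        # step 1: p(theta)
    like = [comb(n, y) * t**y * (1 - t)**(n - y)       # step 2: p(y | theta)
            for t in grid]
    unnorm = [p * l for p, l in zip(prior, like)]      # p(theta) p(y | theta)
    evidence = sum(unnorm) * step                      # Riemann sum for p(y)
    density = [u / evidence for u in unnorm]           # step 3: p(theta | y)
    return grid, density


# Hypothetical data: 3 successes in 5 trials.
grid, density = grid_posterior(y=3, n=5)
step = grid[1] - grid[0]
post_mean = sum(t * d for t, d in zip(grid, density)) * step
print(round(post_mean, 3))
```

Dividing by the Riemann sum plays the role of dividing by p(y); the proportionality sign in the last step says exactly that this constant does not depend on θ.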

(4)

Point estimation for θ

• Point estimation in Bayesian statistics is based solely on the posterior distribution p(θ|y).

• Define a loss function l(θ̂, θ) that measures the “loss” incurred by an estimate θ̂ when the true value is θ.

• The expected loss, taken over the posterior, quantifies the loss of a specific estimate:

$$E\big(l(\hat\theta, \theta) \mid y\big) = \int_{\Theta} l(\hat\theta, \theta)\, p(\theta \mid y)\, d\theta.$$

• A Bayes estimator θ̂ minimizes the expected loss for a specific loss function.

(5)

Point estimation for θ: expected a posteriori (EAP)

• The posterior expectation estimator is given by

$$\hat\theta_{EAP} = E(\theta \mid y) = \int \theta\, p(\theta \mid y)\, d\theta.$$

• It minimizes the expectation of the quadratic loss function l₂(θ̂, θ) = (θ̂ − θ)².
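A short derivation, added here as a sketch, shows why the posterior mean minimizes the expected quadratic loss. It assumes the posterior variance is finite and that differentiation under the integral sign is allowed:

```latex
\begin{aligned}
\frac{\partial}{\partial \hat\theta}\, E\big((\hat\theta - \theta)^2 \mid y\big)
  &= \frac{\partial}{\partial \hat\theta} \int (\hat\theta - \theta)^2\, p(\theta \mid y)\, d\theta
   = 2 \int (\hat\theta - \theta)\, p(\theta \mid y)\, d\theta \\
  &= 2\big(\hat\theta - E(\theta \mid y)\big) = 0
  \iff \hat\theta = E(\theta \mid y).
\end{aligned}
```

The second derivative equals 2 > 0, so this stationary point is a minimum.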

(6)

Point estimation for θ: posterior median estimator

• The median is the point with 50% of the posterior probability mass below it and 50% above it, i.e., the estimator θ̂ = Med(θ|y) satisfies

$$\int_{-\infty}^{\hat\theta} p(\theta \mid y)\, d\theta = 0.5.$$

• It minimizes the expectation of the linear loss function l₁(θ̂, θ) = |θ̂ − θ|.

(7)

Point estimation for θ: maximum a posteriori (MAP)

• The posterior mode estimator is defined as the argument at which the posterior probability density function attains its maximum:

$$\hat\theta_{MAP} = \text{Mode of } p(\theta \mid y) = \arg\max_{\theta}\, p(\theta \mid y).$$

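• It minimizes the expectation of the zero-one loss function

$$l_3(\hat\theta, \theta) = \begin{cases} 0, & |\hat\theta - \theta| \le \varepsilon \\ 1, & |\hat\theta - \theta| > \varepsilon \end{cases}$$

where ε > 0 is a small tolerance; the posterior mode is recovered in the limit ε → 0.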
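The three point estimators (posterior mean, median, and mode) can be compared on a single posterior. The sketch below is illustrative only: the helper `point_estimates` is a hypothetical name, and the posterior kernel θ³(1 − θ)², which is the shape of a Beta(4, 3) density, is an assumed example rather than data from the lecture.

```python
def point_estimates(grid, density):
    """Return (EAP, median, MAP) from a density tabulated on an even grid."""
    step = grid[1] - grid[0]
    total = sum(density) * step
    probs = [d * step / total for d in density]        # normalised cell masses
    eap = sum(t * p for t, p in zip(grid, probs))      # posterior mean
    cum = 0.0
    median = grid[-1]
    for t, p in zip(grid, probs):                      # first point with CDF >= 0.5
        cum += p
        if cum >= 0.5:
            median = t
            break
    map_est = max(zip(density, grid))[1]               # grid point of highest density
    return eap, median, map_est


# Assumed posterior kernel: theta^3 (1 - theta)^2, i.e. the shape of Beta(4, 3).
grid = [i / 2000 for i in range(2001)]
density = [t**3 * (1 - t)**2 for t in grid]
eap, median, map_est = point_estimates(grid, density)
print(eap, median, map_est)
```

For a skewed posterior such as this one the three estimates differ, which is why the choice of loss function matters.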

(8)

Credible Interval estimation for θ

A credible interval [a, b] at level (1 − α) is defined by

$$\int_{a}^{b} p(\theta \mid y)\, d\theta = 1 - \alpha,$$

where a, b ∈ R and p(θ|y) is the posterior distribution of θ.

• Given the data, the random variable θ is contained in the interval [a, b] with probability (1 − α). Note the semantic difference from confidence intervals in the Frequentist interpretation.

• Such a credible interval is not unique.

• Typically, quantiles of p(θ|y) are used as endpoints to construct quantile-based intervals. For example, the 2.5% and 97.5% quantiles of p(θ|y) form a 95% credible interval for θ.
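A quantile-based interval can be read off a tabulated posterior by accumulating probability mass. The following is a minimal sketch, assuming an evenly spaced grid; the function name `quantile_interval` and the Beta(4, 3)-shaped kernel are hypothetical choices for illustration.

```python
def quantile_interval(grid, density, alpha=0.05):
    """Equal-tailed (1 - alpha) credible interval from a tabulated density."""
    total = sum(density)
    lower, upper = grid[0], grid[-1]
    cum = 0.0
    found_lower = False
    for t, d in zip(grid, density):
        cum += d / total                               # running posterior CDF
        if not found_lower and cum >= alpha / 2:
            lower, found_lower = t, True               # alpha/2 quantile
        if cum >= 1 - alpha / 2:
            upper = t                                  # 1 - alpha/2 quantile
            break
    return lower, upper


# Assumed posterior kernel: theta^3 (1 - theta)^2, i.e. the shape of Beta(4, 3).
grid = [i / 2000 for i in range(2001)]
density = [t**3 * (1 - t)**2 for t in grid]
lower, upper = quantile_interval(grid, density)
print(lower, upper)
```

Each tail outside the returned interval holds roughly α/2 of the posterior mass, up to the grid resolution.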

(9)

Highest posterior density (HPD) credible interval for θ

An HPD credible interval satisfies the following two conditions:

• $$\int_{a}^{b} p(\theta \mid y)\, d\theta = 1 - \alpha,$$

• $$p(\theta \mid y) \ge p(\tilde\theta \mid y), \quad \forall\, \theta \in I \text{ and } \forall\, \tilde\theta \notin I,$$

where I = [a, b] ⊂ Θ is an HPD credible interval at level 1 − α.

That is, the minimum density of any point within that region is equal to or larger than the density of any point outside that region.
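One way to approximate an HPD interval on a grid is to add grid points in order of decreasing density until the accumulated mass reaches 1 − α. This sketch assumes a unimodal posterior (so the retained points form a single interval); the name `hpd_interval` and the Beta(4, 3)-shaped kernel are hypothetical.

```python
def hpd_interval(grid, density, alpha=0.05):
    """Approximate HPD interval for a unimodal density on an even grid."""
    total = sum(density)
    order = sorted(range(len(grid)), key=lambda i: density[i], reverse=True)
    mass = 0.0
    kept = []
    for i in order:                    # add points from highest density downward
        kept.append(i)
        mass += density[i] / total
        if mass >= 1 - alpha:
            break
    return grid[min(kept)], grid[max(kept)]


# Assumed posterior kernel: theta^3 (1 - theta)^2, i.e. the shape of Beta(4, 3).
grid = [i / 2000 for i in range(2001)]
density = [t**3 * (1 - t)**2 for t in grid]
a, b = hpd_interval(grid, density)
print(a, b)
```

Because every retained point has density at least as high as every discarded point, the second HPD condition holds by construction, and the two endpoints end up with nearly equal density.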

(10)

Predictive interval for ỹ

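• After obtaining the posterior distribution of θ, p(θ|y), we can compute the posterior predictive distribution of a future observation ỹ as

$$p(\tilde y \mid y) = \int_{\Theta} p(\tilde y, \theta \mid y)\, d\theta = \int_{\Theta} p(\tilde y \mid \theta, y)\, p(\theta \mid y)\, d\theta = \int_{\Theta} p(\tilde y \mid \theta)\, p(\theta \mid y)\, d\theta,$$

where the last step uses the conditional independence of ỹ and y given θ.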

• A 100(1 − α)% posterior predictive interval [c, d] for ỹ is similarly defined by

$$\int_{c}^{d} p(\tilde y \mid y)\, d\tilde y = 1 - \alpha, \quad \text{where } c, d \in \mathbb{R}.$$

• Note that the prior predictive distribution of ỹ is

$$p(\tilde y) = \int_{\Theta} p(\tilde y, \theta)\, d\theta = \int_{\Theta} p(\tilde y \mid \theta)\, p(\theta)\, d\theta.$$
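The posterior predictive integral can be approximated by simulation: draw θ from the posterior, then draw ỹ given that θ. The sketch below is only an illustration and makes assumptions that go beyond this slide: it takes the posterior to be Beta(a, b) and the sampling model to be binomial, and the function name and parameter values are hypothetical.

```python
import random


def sample_posterior_predictive(a, b, n_new, draws=20000, seed=0):
    """Draw from p(y_new | y), assuming theta | y ~ Beta(a, b) and
    y_new | theta ~ binomial(n_new, theta)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        theta = rng.betavariate(a, b)                            # theta ~ p(theta | y)
        y_new = sum(rng.random() < theta for _ in range(n_new))  # y_new ~ p(y_new | theta)
        samples.append(y_new)
    return samples


# Hypothetical posterior Beta(4, 3) and 5 future trials.
samples = sample_posterior_predictive(a=4, b=3, n_new=5)
pmf = [samples.count(k) / len(samples) for k in range(6)]
pred_mean = sum(samples) / len(samples)
print(pmf, pred_mean)
```

Replacing the posterior draw with a draw from the prior p(θ) would give samples from the prior predictive distribution instead.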

(11)

Single Parameter θ: Discrete y

• y ∼ binomial(n, θ). With one data point y, use the Bayesian approach to estimate θ.

◦ choose a prior for θ, but how? No idea! Use the non-informative or flat prior for θ such that

θ ∼ uniform(0, 1), i.e., p(θ) = 1 for θ ∈ (0, 1)

◦ likelihood of θ:

$$p(y \mid \theta) = \binom{n}{y} \theta^{y} (1 - \theta)^{n-y}$$

$$p(\theta \mid y) \propto p(\theta)\, p(y \mid \theta) = 1 \times \binom{n}{y} \theta^{y} (1 - \theta)^{n-y} = \binom{n}{y} \theta^{y} (1 - \theta)^{n-y}$$

That is, θ|y ∼ Beta(y + 1, n − y + 1)
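As a worked illustration with hypothetical numbers (y = 3 successes in n = 5 trials, not data from the lecture), the standard Beta mean a/(a + b) and mode (a − 1)/(a + b − 2) give:

```latex
y = 3,\ n = 5: \qquad
\theta \mid y \sim \mathrm{Beta}(4, 3), \qquad
E(\theta \mid y) = \frac{y + 1}{n + 2} = \frac{4}{7}, \qquad
\arg\max_{\theta}\, p(\theta \mid y) = \frac{y}{n} = \frac{3}{5}.
```

Under the flat prior the posterior mode coincides with the sample proportion y/n, while the posterior mean is pulled slightly toward 1/2.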

(12)

Single Parameter θ: Discrete y = (y₁, · · · , y_m)

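• y₁, · · · , y_m are iid binomial(n, θ); use the Bayesian approach to estimate θ.

◦ choose a prior for θ; if we use the non-informative prior for θ,

θ ∼ uniform(0, 1), i.e., p(θ) = 1 for θ ∈ (0, 1)

◦ likelihood of θ:

$$p(y_1, \dots, y_m \mid \theta) = \prod_{i=1}^{m} \binom{n}{y_i} \theta^{y_i} (1 - \theta)^{n - y_i}$$

$$p(\theta \mid y) \propto p(\theta)\, p(y \mid \theta) = 1 \times \prod_{i=1}^{m} \binom{n}{y_i} \theta^{y_i} (1 - \theta)^{n - y_i} \propto \theta^{\sum_{i=1}^{m} y_i} (1 - \theta)^{\sum_{i=1}^{m} (n - y_i)}$$

That is,

$$\theta \mid y \sim \mathrm{Beta}\Big(\sum_{i=1}^{m} y_i + 1,\; nm - \sum_{i=1}^{m} y_i + 1\Big)$$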
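The closed-form result can be checked numerically by normalising the product of binomial likelihoods on a grid and comparing it with the Beta density. This is a sketch under stated assumptions: the data values are hypothetical, and `beta_pdf` is a small helper written for this example, not a library function.

```python
from math import comb, exp, lgamma, log


def beta_pdf(t, a, b):
    """Density of Beta(a, b) at t in (0, 1), via log-gamma for stability."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1) * log(t) + (b - 1) * log(1 - t))


ys, n = [2, 3, 4], 5                               # hypothetical data, m = 3
m, s = len(ys), sum(ys)
grid = [(i + 0.5) / 1000 for i in range(1000)]     # midpoints, avoids 0 and 1

# Flat prior times the product of binomial likelihoods, then normalise.
unnorm = []
for t in grid:
    like = 1.0
    for y in ys:
        like *= comb(n, y) * t**y * (1 - t)**(n - y)
    unnorm.append(like)
evidence = sum(unnorm) / len(grid)                 # midpoint rule for p(y)
numeric = [u / evidence for u in unnorm]

closed = [beta_pdf(t, s + 1, n * m - s + 1) for t in grid]
max_gap = max(abs(p - q) for p, q in zip(numeric, closed))
print(max_gap)
```

The binomial coefficients cancel in the normalisation, which is why they can be dropped behind the proportionality sign.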

(13)

Single Parameter θ: Discrete y = (y₁, · · · , y_m)

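• y₁, · · · , y_m are iid binomial(n, θ); use the Bayesian approach to estimate θ.

◦ choose a prior for θ; if we use the Beta(α, β) prior for θ,

$$\theta \sim \mathrm{Beta}(\alpha, \beta), \quad \text{i.e., } p(\theta) = \frac{1}{B(\alpha, \beta)}\, \theta^{\alpha - 1} (1 - \theta)^{\beta - 1} \text{ for } \theta \in (0, 1)$$

◦ likelihood of θ:

$$p(y_1, \dots, y_m \mid \theta) = \prod_{i=1}^{m} \binom{n}{y_i} \theta^{y_i} (1 - \theta)^{n - y_i}$$

$$p(\theta \mid y) \propto p(\theta)\, p(y \mid \theta) = \frac{1}{B(\alpha, \beta)}\, \theta^{\alpha - 1} (1 - \theta)^{\beta - 1} \prod_{i=1}^{m} \binom{n}{y_i} \theta^{y_i} (1 - \theta)^{n - y_i} \propto \theta^{\sum_{i=1}^{m} y_i + \alpha - 1} (1 - \theta)^{\sum_{i=1}^{m} (n - y_i) + \beta - 1}$$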

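That is,

$$\theta \mid y \sim \mathrm{Beta}\Big(\sum_{i=1}^{m} y_i + \alpha,\; nm - \sum_{i=1}^{m} y_i + \beta\Big)$$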
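Reading the exponents of the posterior kernel, the update amounts to adding the total number of successes to α and the total number of failures to β. A minimal sketch follows; the function name `update_beta`, the prior Beta(2, 2), and the data are all hypothetical.

```python
def update_beta(alpha, beta, ys, n):
    """Posterior Beta parameters for iid binomial(n, theta) data
    under a Beta(alpha, beta) prior."""
    successes = sum(ys)
    failures = n * len(ys) - successes
    return alpha + successes, beta + failures


ys, n = [2, 3, 4], 5                                # hypothetical data, m = 3
post_a, post_b = update_beta(2.0, 2.0, ys, n)       # assumed Beta(2, 2) prior
post_mean = post_a / (post_a + post_b)

# The uniform(0, 1) prior is the special case Beta(1, 1).
flat_a, flat_b = update_beta(1.0, 1.0, ys, n)
print(post_a, post_b, post_mean, flat_a, flat_b)
```

Because the posterior is again a Beta distribution, the same function can be applied repeatedly as new batches of data arrive.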

(14)

Exercise

Question: Suppose we all take turns to flip one particular coin 5 times, and use the data to estimate the probability of getting a head θ for this coin.

• Use both the Frequentist and Bayesian approaches to obtain the point and interval estimation.

• Use the Bayesian approach to answer the following questions:

◦ Based on our data, if you are to flip the coin again, what is the probability that the outcome will be a head?

◦ Based on our data, if you are to flip the coin 5 times, what is the probability mass function of the number of heads you will obtain?
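For the two Bayesian questions, one possible route is the beta-binomial distribution, obtained by integrating the binomial pmf against a Beta posterior. The sketch below assumes a flat prior and uses made-up class data (12 heads in 20 flips); the function names are hypothetical and the numbers are not the actual classroom results.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k, n_new, a, b):
    """P(k heads in n_new future flips), assuming theta | y ~ Beta(a, b)."""
    return comb(n_new, k) * exp(log_beta(a + k, b + n_new - k) - log_beta(a, b))


# Made-up class data: 12 heads in 20 flips, flat prior -> Beta(13, 9).
a, b = 12 + 1, 20 - 12 + 1
p_head_next = beta_binomial_pmf(1, 1, a, b)         # equals a / (a + b)
pmf_5 = [beta_binomial_pmf(k, 5, a, b) for k in range(6)]
print(p_head_next, pmf_5)
```

The probability of a head on the next flip equals the posterior mean of θ, and the pmf for five flips is wider than a binomial(5, θ̂) pmf because it also carries the remaining uncertainty about θ.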

(15)

Homework I

y₁, · · · , y_n are iid Poisson(λ), that is,

$$p(y_i \mid \lambda) = \frac{e^{-\lambda} \lambda^{y_i}}{y_i!}, \quad y_i = 0, 1, \dots$$

Use the Bayesian approach to estimate λ. Please obtain the posterior distribution of λ and its point and interval estimators.