Introduction to Bayesian Statistics Lecture 10: Model Checking

Academic year: 2022

Rung-Ching Tsai

Department of Mathematics National Taiwan Normal University

The role of model checking

Three common steps in Bayesian modeling

specify the prior based on historical data or substantive knowledge

construct a reasonable probability model

compute the posterior distribution of model parameters or posterior predictive distribution of future observations

Yet another crucial step: Model checking

assessing the adequacy of the fit of the model to the data and to our substantive knowledge.

investigating what aspects of reality are not captured by the model.

checking the plausibility of the model for the purposes for which it will be used.

What is to be checked?

Idea: check “ALL” aspects of the “model”

Why? Prior-to-posterior inference involves the whole structure of the Bayesian model (including any hierarchies) and can produce spurious results if the model is poor.

sensitivity analysis to the prior distribution

the appropriateness of the likelihood model (i.e., the sampling distribution)

any hierarchical structure

other issues such as which explanatory variables or covariates should have been included in a model

any particular feature of the data one wishes to capture

Sensitivity Analysis

Basic idea: In a scientific problem, more than one reasonable probability model can typically provide an adequate fit to the data. Sensitivity analysis examines how much the posterior inferences change when other reasonable probability models are used in place of the present model.

Do the inferences from the model make sense?

Is the model consistent with the data? Posterior predictive checking

How can we compare or rank different plausible models (including their priors and likelihoods) in order of preference with respect to a given data set?
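A prior-sensitivity check of the kind described above can be sketched in a few lines. This is only an illustration under stated assumptions: the data summary (7 successes in 20 Bernoulli trials), the list of alternative Beta priors, and the helper name `beta_posterior` are all hypothetical choices, not part of the lecture. The sketch relies on the standard conjugate result that a Beta(a, b) prior with s successes in n trials gives a Beta(a + s, b + n - s) posterior.

```python
import math

def beta_posterior(a, b, s, n):
    """Posterior Beta(a + s, b + n - s) for s successes in n Bernoulli trials
    under a Beta(a, b) prior; returns (posterior mean, posterior sd)."""
    a_post, b_post = a + s, b + n - s
    total = a_post + b_post
    mean = a_post / total
    sd = math.sqrt(a_post * b_post / (total ** 2 * (total + 1)))
    return mean, sd

# Hypothetical data summary (an assumption for illustration only).
s, n = 7, 20

# Alternative priors one might regard as reasonable (also assumptions).
priors = {
    "uniform Beta(1, 1)": (1.0, 1.0),
    "Jeffreys Beta(0.5, 0.5)": (0.5, 0.5),
    "weakly informative Beta(2, 2)": (2.0, 2.0),
    "informative Beta(10, 10)": (10.0, 10.0),
}

summaries = {name: beta_posterior(a, b, s, n) for name, (a, b) in priors.items()}
for name, (mean, sd) in summaries.items():
    print(f"{name:32s} posterior mean = {mean:.3f}, sd = {sd:.3f}")

# Spread of posterior means across priors: a crude sensitivity summary.
means = [m for m, _ in summaries.values()]
mean_range = max(means) - min(means)
print(f"range of posterior means: {mean_range:.3f}")
```

If the range of posterior means is small relative to the posterior standard deviations, the inference is not very sensitive to the choice among these priors; a large range would suggest the prior deserves closer scrutiny.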

Do the inferences from the model make sense?

In any applied problem, there will be knowledge that is not included formally in either the prior distribution or the likelihood, for reasons of convenience or objectivity. If the additional information suggests that posterior inferences of interest are false, then this suggests a potential for creating a more accurate probability model for the parameters and data collection process.

Is the model consistent with the data? Posterior predictive checking

If the model fits, then replicated data generated under the model should look similar to observed data.

The observed data should look plausible under the posterior predictive distribution. Therefore, one needs to check whether an observed discrepancy is due to model misfit or merely to chance.

The basic technique is to draw simulated values from the posterior predictive distribution of replicated data and compare these samples to the observed data.

Any systematic differences between the simulations and the data indicate potential failings of the model.

How can we compare (or rank) different plausible models?

Model expansion

Model comparison

Posterior predictive checks (I)

Let $y = (y_1, y_2, \ldots, y_n)'$ be the observed data and $\theta$ be the set of all parameters (including all hyperparameters) for a model

$$p(\theta \mid y) \propto p(\theta)\, p(y \mid \theta).$$

Let $y^{\mathrm{rep}} = (y^{\mathrm{rep}}_1, y^{\mathrm{rep}}_2, \ldots, y^{\mathrm{rep}}_n)'$ be the replicated data that we would see if the experiment were replicated with the same model and the same value of $\theta$ that produced the observed data $y$.

Replicated data $y^{\mathrm{rep}}$, like predictions $\tilde{y}$, have two components of uncertainty:

$$p(y^{\mathrm{rep}} \mid y) = \int p(y^{\mathrm{rep}} \mid \theta)\, p(\theta \mid y)\, d\theta$$

the fundamental variability of the model, represented by the posited variability in the data

the posterior uncertainty in the estimation of θ
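The two components of uncertainty listed above can be seen in a small simulation. The following is a minimal sketch, not taken from the lecture: it assumes a Beta(8, 14) posterior for a success probability and replicated data sets of 20 Bernoulli trials, and the names `simulate_count`, `A_POST`, `B_POST`, `N`, and `M` are hypothetical. It compares the variance of the replicated success count when θ is held fixed (sampling variability only) with the variance when θ is drawn from its posterior first (both components).

```python
import random
import statistics

random.seed(0)

# Assumptions for illustration: Beta(8, 14) posterior, n = 20 trials per replicate.
A_POST, B_POST, N = 8, 14, 20
M = 5000  # number of simulation draws

def simulate_count(theta, n):
    """Number of successes in n independent Bernoulli(theta) trials."""
    return sum(random.random() < theta for _ in range(n))

# (a) Sampling variability only: theta held fixed at its posterior mean.
theta_fixed = A_POST / (A_POST + B_POST)
counts_fixed = [simulate_count(theta_fixed, N) for _ in range(M)]

# (b) Posterior predictive: draw theta from p(theta | y),
#     then draw y_rep from p(y_rep | theta).
counts_pp = [simulate_count(random.betavariate(A_POST, B_POST), N) for _ in range(M)]

var_fixed = statistics.variance(counts_fixed)
var_pp = statistics.variance(counts_pp)
print(f"variance with theta fixed:     {var_fixed:.2f}")
print(f"posterior predictive variance: {var_pp:.2f}")
```

The second variance should come out larger, since it adds the posterior uncertainty about θ on top of the sampling variability of the data.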

Posterior predictive checks (II)

Test quantities $T(y, \theta)$

measure the discrepancy between model and data in the aspects of the data one wishes to check

Test quantities play the role in Bayesian model checking that test statistics, $T(y)$, play in classical testing. Unlike a test quantity, a classical test statistic $T(y)$ does not depend upon model parameters.

Tail-area probability

Lack of fit of the data with respect to the posterior predictive distribution can be measured by the tail-area probability, or p-value, of the test quantity.

It is commonly computed using posterior simulations of $(\theta, y^{\mathrm{rep}})$.

Posterior predictive checks (III)

Classical p-values (for test statistic $T(y)$)

$$p_C = \Pr\big(T(y^{\mathrm{rep}}) \ge T(y) \mid \theta\big),$$

the probability is taken over the distribution of $y^{\mathrm{rep}}$ with $\theta$ fixed.

the test statistic $T(y)$ does not depend upon model parameters.

Posterior predictive p-values (for test quantity $T(y, \theta)$)

$$p_B = \Pr\big(T(y^{\mathrm{rep}}, \theta) \ge T(y, \theta) \mid y\big) = \iint I_{T(y^{\mathrm{rep}}, \theta) \ge T(y, \theta)}\; p(y^{\mathrm{rep}} \mid \theta)\, p(\theta \mid y)\, dy^{\mathrm{rep}}\, d\theta$$

test quantities can be a function of the parameters and the data because the test quantity is evaluated over draws from the posterior distribution of the unknowns $\theta$ and $y^{\mathrm{rep}}$.

Example. Checking the assumption of independence in binomial trials (I)

Consider a sequence of binary outcomes, y1, · · · , yn, modeled as iid Bernoulli trials

uniform prior distribution on the probability of success, θ.

the posterior density under the model is

$$p(\theta \mid y) \propto \theta^{s}(1 - \theta)^{n-s}, \quad \text{with } s = \sum_{i=1}^{n} y_i.$$

Data: 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0.
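As a worked step, assuming the 20 outcomes listed above are the complete data set, the counts can be read off directly and the posterior written out explicitly. With a uniform prior, the posterior kernel is that of a Beta distribution:

```latex
n = 20, \qquad s = \sum_{i=1}^{20} y_i = 7, \qquad
p(\theta \mid y) \propto \theta^{7}(1 - \theta)^{13}
\;\Longrightarrow\; \theta \mid y \sim \mathrm{Beta}(8, 14).
```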

The observed autocorrelation seems to be evidence that the model is flawed.

Example. Checking the assumption of independence in binomial trials (II)

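Test statistic: $T(y)$ = number of switches between 0 and 1 in the sequence; for the observed data, $T(y) = 3$.

draw $\theta$ from its posterior distribution, Beta(8, 14).

draw $y^{\mathrm{rep}} = (y^{\mathrm{rep}}_1, y^{\mathrm{rep}}_2, \ldots, y^{\mathrm{rep}}_n)'$ as independent Bernoulli with probability $\theta$.

p-value $= \Pr\big(T(y^{\mathrm{rep}}, \theta) \ge T(y, \theta) \mid y\big) \approx 0.972$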
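The simulation steps of this example can be sketched as follows. This is a minimal illustration rather than the lecturer's code: it assumes the test statistic is the number of switches between 0 and 1 in the sequence (which equals 3 for the observed data), and the names `count_switches`, `M`, and the random seed are hypothetical choices. Because the estimate is based on random draws, it will vary slightly from run to run and from the value reported above.

```python
import random

random.seed(1)

# Observed binary sequence from the example.
y = [1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
n, s = len(y), sum(y)

def count_switches(seq):
    """Test statistic T: number of switches between 0 and 1 in the sequence."""
    return sum(a != b for a, b in zip(seq, seq[1:]))

t_obs = count_switches(y)

M = 10000  # number of posterior predictive simulations (an assumption)
exceed = 0
for _ in range(M):
    # Step 1: draw theta from its posterior, Beta(s + 1, n - s + 1) = Beta(8, 14).
    theta = random.betavariate(s + 1, n - s + 1)
    # Step 2: draw a replicated sequence of n independent Bernoulli(theta) trials.
    y_rep = [1 if random.random() < theta else 0 for _ in range(n)]
    # Step 3: compare the replicated test statistic with the observed one.
    if count_switches(y_rep) >= t_obs:
        exceed += 1

p_value = exceed / M
print(f"T(y) = {t_obs}, estimated posterior predictive p-value = {p_value:.3f}")
```

A p-value this close to 1 means that almost all replicated sequences have at least as many switches as the observed one, that is, the observed data switch far less often than independent trials would.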

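χ²-type discrepancy measure

Choose a discrepancy measure (test quantity)

$$T(y, \theta) = \sum_{i=1}^{n} \frac{\big(y_i - E[y_i \mid \theta]\big)^2}{\mathrm{var}(y_i \mid \theta)}$$

For each posterior draw $\theta_j$, $j = 1, \ldots, M$, with its replicated data set $y^{\mathrm{rep},j}$, compute $T(y, \theta_j)$ and $T(y^{\mathrm{rep},j}, \theta_j)$, and obtain the posterior predictive p-value:

$$p_B = \Pr\big(T(y^{\mathrm{rep}}, \theta) > T(y, \theta) \mid y\big) \approx \frac{1}{M} \sum_{j=1}^{M} 1\big[T(y^{\mathrm{rep},j}, \theta_j) > T(y, \theta_j)\big]$$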
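The Monte Carlo estimate above can be sketched for a concrete model. The following is an illustration under assumptions, not the lecture's own computation: it reuses the binary sequence from the binomial-trials example with its Beta(8, 14) posterior, so that E[y_i | θ] = θ and var(y_i | θ) = θ(1 − θ); the names `chi2_discrepancy`, `M`, and the seed are hypothetical.

```python
import math
import random

random.seed(2)

# Assumed example: binary data modeled as iid Bernoulli(theta), uniform prior.
y = [1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
n, s = len(y), sum(y)

def chi2_discrepancy(data, theta):
    """T(y, theta) = sum_i (y_i - E[y_i|theta])^2 / var(y_i|theta) for
    Bernoulli(theta) data: E[y_i|theta] = theta, var = theta * (1 - theta)."""
    var = theta * (1.0 - theta)
    # math.fsum gives an order-independent sum, so data sets with the same
    # number of successes yield exactly the same discrepancy value.
    return math.fsum((yi - theta) ** 2 / var for yi in data)

M = 10000  # number of posterior draws (an assumption)
exceed = 0
for _ in range(M):
    theta_j = random.betavariate(s + 1, n - s + 1)       # draw from p(theta | y)
    y_rep_j = [1 if random.random() < theta_j else 0     # draw from p(y_rep | theta_j)
               for _ in range(n)]
    if chi2_discrepancy(y_rep_j, theta_j) > chi2_discrepancy(y, theta_j):
        exceed += 1

p_b = exceed / M
print(f"estimated posterior predictive p-value: {p_b:.3f}")
```

For Bernoulli data this discrepancy depends on the data only through the number of successes, so it is not expected to be sensitive to the ordering of the outcomes; a test quantity should be chosen to target the aspect of the data one wishes to check.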

Interpreting posterior predictive p-values

If the observed discrepancy measure has a tail-area probability close to 0 or 1, it implies that the observed pattern would be unlikely to be seen in replications of the data if the model were true.

An extreme p-value implies that the model cannot be expected to capture this aspect of the data. If a p-value is close to 0 or 1, it is not so important exactly how extreme it is. A p-value of 0.00001 is virtually no stronger, in practice, than 0.001; in either case, the aspect of the data measured by the test quantity is inconsistent with the model.

Major failures of the model, typically corresponding to extreme tail-area probabilities (less than 0.01 or more than 0.99), can be addressed by expanding the model in an appropriate way. Lesser failures might also suggest model improvements, or might be ignored in the short term if the failure appears not to affect the main inferences.

The p-value measures statistical significance, not practical significance.

Limitations of posterior tests

Finding an extreme p-value and thus rejecting a model is never the end of an analysis; the departures of the test quantity in question from its posterior predictive distribution will often suggest improvements of the model or places to check the data.

Conversely, even when the current model seems appropriate for drawing inferences, the next scientific step will often be a more rigorous experiment incorporating additional factors, thereby providing better data.
