• 沒有找到結果。

Generalized linear models

N/A
N/A
Protected

Academic year: 2022

Share "Generalized linear models"

Copied!
6
0
0

加載中.... (立即查看全文)

全文

(1)

Generalized linear models using JAGS and R

Andrew O. Finley January 25, 2013

Generalized linear models

We make use of several libraries in the following example session, including:

ˆ library(coda)

ˆ library(rjags)

ˆ library(fields)

ˆ library(MBA)

ˆ library(geoR)

Let’s consider the logistic model where the outcome of the i-th sample unit, yiis either 1 or 0. In addition to the outcome, we often have a set of covariates observed at the sample units. Given these data, we can write the model as

π(yi| ηi) ∼ Ber(p(ηi)), for i = 1, . . . , n (1) where p(ηi) = exp(ηi)/(1 + exp(ηi)) is the probability of yi= 1 and

ηi= x0iβ. (2)

As in the ordinary linear regression model, xi = (xi,0, xi,1, . . . , xi,p)0 is a p × 1 vector (with xi0 set to one, i.e., the intercept) and β = (β0, β1, . . . , βp)0 is a p × 1 vector of regression coefficients.

1 Simulated data analysis

Let’s begin by simulating some data from model (1).

> set.seed(1)

> n <- 100

> X <- cbind(1, rnorm(n))

> beta <- c(0.1, -0.5)

> eta <- X %*% beta

> p <- 1/(1 + exp(-eta))

> y <- rbinom(n, size = 1, prob = p)

Note, exp(η)/(1 + exp(η)) and 1/(1 + exp(−η)) are mathematically equivalent forms of the inverse logit function.

We would like to generate samples from the posterior distributions of β0 and β1. Again, it is useful to provide JAGS with some reasonable starting values for these parameters. Note, the call to glm includes a -1 that indicates the intercept is already included in x.

> glm.m <- glm(y ~ X - 1, family = "binomial")

> summary(glm.m)

(2)

Call:

glm(formula = y ~ X - 1, family = "binomial") Deviance Residuals:

Min 1Q Median 3Q Max

-2.1663 -1.0622 0.5172 1.0552 1.7453 Coefficients:

Estimate Std. Error z value Pr(>|z|) X1 0.1521 0.2179 0.698 0.485090 X2 -0.9454 0.2779 -3.401 0.000671 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for binomial family taken to be 1)

Null deviance: 138.63 on 100 degrees of freedom Residual deviance: 124.33 on 98 degrees of freedom AIC: 128.33

Number of Fisher Scoring iterations: 4 Now we formulate the JAGS model file ex-3a.jag.

model{

f o r ( i in 1 : n ) {

e t a [ i ] ← b e t a . 0 + b e t a . 1*x [ i ] p i [ i ] ← 1/(1+ exp(− e t a [ i ] ) ) y [ i ] ∼ dbern ( p i [ i ] )

}

b e t a . 0 ∼ dnorm ( 0 , 0 . 0 0 0 0 0 1 ) b e t a . 1 ∼ dnorm ( 0 , 0 . 0 0 0 0 0 1 ) }

As in the previous exercises, we define the data objects needed in the JAGS model along with parameter starting values. Our call to jags.model requests three MCMC chains of 5000 iterations.

> x <- X[, 2]

> n <- length(y)

> data <- list(y = y, x = x, n = n)

> inits <- list(beta.0 = coefficients(glm.m)[1], beta.1 = coefficients(glm.m)[2])

> jags.m <- jags.model(file = "ex-3a.jag", data = data, + inits = inits, n.chains = 3, n.adapt = 1000) Compiling model graph

Resolving undeclared variables Allocating nodes

Graph Size: 806 Initializing model

(3)

Now we collect the posterior samples.

> params <- c("beta.0", "beta.1")

> samps <- coda.samples(jags.m, params, n.iter = 5000)

> plot(samps, density = FALSE)

1000 2000 3000 4000 5000 6000

−0.50.00.51.0

Iterations Trace of beta.0

1000 2000 3000 4000 5000 6000

−2.0−1.5−1.0−0.50.0

Iterations Trace of beta.1

Figure 1: Posterior sample chain and density plots for β.

We will also check the Gelman-Rubin Diagnostic to assess convergence, Figure 2. Given the potential scale reduction factors are all less then ∼1.1 and visual inspection of the chains suggests they are mixing well, we can concluded the chains have converged and we can turn our attention to summarizing the posterior samples.

> gelman.diag(samps)

Potential scale reduction factors:

Point est. Upper C.I.

beta.0 1 1

beta.1 1 1

Multivariate psrf 1

> gelman.plot(samps)

> burn.in <- 2000

> round(summary(window(samps, start = burn.in))$quantiles[, + c(3, 1, 5)], 2)

50% 2.5% 97.5%

beta.0 0.16 -0.27 0.60 beta.1 -0.97 -1.57 -0.45

(4)

1000 2000 3000 4000 5000 6000

1.001.051.101.151.201.25

last iteration in chain

shrink factor

median 97.5%

beta.0

1000 2000 3000 4000 5000 6000

1.001.011.021.031.041.051.06

last iteration in chain

shrink factor

median 97.5%

beta.1

Figure 2: Gelman-Rubin diagnostic plots for β.

Considering the addition of a random intercept

Say our data fall within three groups and we want to explore y differs with regard to group membership.

We could set this model up using dummy variables or use a multilevel model. When the number of groups is large or we expect the model to become more complicated, then the multilevel approach is desirable. For the multilevel model specification, we modify model (1) by adding an additional term to η. Specifically,

ηij = x0iβ + αj, for i = 1 . . . n and j = 1 . . . 3, (3) where αj is the group specific intercept offset. We assume αj∼ N (0, σα2).

Let’s simulate some data to try this model out.

> set.seed(1)

> n <- 200

> X <- cbind(1, rnorm(n))

> beta <- c(0.1, -0.5)

> alpha.groups <- sample(1:3, size = n, replace = TRUE)

> Z.alpha <- matrix(0, n, 3)

> Z.alpha[alpha.groups == 1, 1] <- 1

> Z.alpha[alpha.groups == 2, 2] <- 1

> Z.alpha[alpha.groups == 3, 3] <- 1

> head(Z.alpha) [,1] [,2] [,3]

[1,] 0 1 0

[2,] 1 0 0

[3,] 0 0 1

[4,] 0 0 1

[5,] 0 0 1

[6,] 0 0 1

(5)

> sigma.sq.alpha <- 1

> alpha <- rnorm(3, 0, sqrt(sigma.sq.alpha))

> p <- 1/(1 + exp(-(X %*% beta + Z.alpha %*% alpha)))

> y <- rbinom(n, size = 1, prob = p)

Now we formulate the JAGS model file ex-3a.jag.

model{

f o r ( i in 1 : n ) {

e t a [ i ] ← b e t a . 0 + b e t a . 1*x [ i ] + alpha [ alpha.indx [ i ] ] p i [ i ] ← 1/(1+ exp(− e t a [ i ] ) )

y [ i ] ∼ dbern ( p i [ i ] ) }

b e t a . 0 ∼ dnorm ( 0 , 0 . 0 0 0 0 0 1 ) b e t a . 1 ∼ dnorm ( 0 , 0 . 0 0 0 0 0 1 ) f o r ( j in 1 : q ) {

a l p h a [ j ] ∼ dnorm ( 0 , t a u . s q . a l p h a ) }

t a u . s q . a l p h a ∼ dgamma( 0 . 0 1 , 0 . 0 1 ) s i g m a . s q . a l p h a ← 1/ t a u . s q . a l p h a }

As in the previous exercises, we define the data objects needed in the JAGS model along with parameter starting values. Our call to jags.model requests three MCMC chains of 10000 iterations.

> x <- X[, 2]

> n <- length(y)

> data <- list(y = y, x = x, n = n, q = 3, alpha.indx = alpha.groups)

> inits <- list(beta.0 = coefficients(glm.m)[1], beta.1 = coefficients(glm.m)[2])

> jags.m <- jags.model(file = "ex-3b.jag", data = data, + n.chains = 3, n.adapt = 1000)

Compiling model graph

Resolving undeclared variables Allocating nodes

Graph Size: 1813 Initializing model

> params <- c("beta.0", "beta.1", "alpha", "sigma.sq.alpha")

> samps <- coda.samples(jags.m, params, n.iter = 10000)

> plot(samps, density = FALSE)

> burn.in <- 2000

> round(summary(window(samps, start = burn.in))$quantiles[, + c(3, 1, 5)], 2)

50% 2.5% 97.5%

alpha[1] 0.19 -3.43 3.80

alpha[2] -1.69 -5.34 1.88

(6)

alpha[3] 1.40 -2.13 5.16

beta.0 0.67 -2.94 4.26

beta.1 -0.37 -0.77 0.02

sigma.sq.alpha 3.50 0.60 82.77

So, are these parameter values correct? How is chain convergence?

Try your hand at writing some more complex models by, e.g., 1) adding another random offset to the intercept, and 2) allowing the covariate to vary by group, i.e., an intercept and slope varying model.

參考文獻

相關文件

Find the area of the resulting

A factorization method for reconstructing an impenetrable obstacle in a homogeneous medium (Helmholtz equation) using the spectral data of the far-field operator was developed

A factorization method for reconstructing an impenetrable obstacle in a homogeneous medium (Helmholtz equation) using the spectral data of the far-eld operator was developed

We can use the point (2 2) to determine the scale of the axes, and we estimate the length of the projection to be approximately 3.0 units.. All

[r]

To get the inteval of convergence , we need to check the end points, 1

We can use should / ought to / have to / must + the base form of a verb to give advice and talk about necessity.. Should and ought to are less strong and less formal than have to

C) protein chains maintained by interactions of peptide backbones D) amino acid sequence maintained by peptide bonds. E) protein structure maintained through multiple hydrogen