DOI: 10.1007/S11336-008-9075-Y

A MULTIVARIATE MULTILEVEL APPROACH TO THE MODELING OF ACCURACY AND SPEED OF TEST TAKERS

R.H. KLEIN ENTINK, J.-P. FOX, AND W.J. VAN DER LINDEN

UNIVERSITY OF TWENTE

Response times on test items are easily collected in modern computerized testing. When collecting both (binary) responses and (continuous) response times on test items, it is possible to measure the accuracy and speed of test takers. To study the relationships between these two constructs, the model is extended with a multivariate multilevel regression structure which allows the incorporation of covariates to explain the variance in speed and accuracy between individuals and groups of test takers. A Bayesian approach with Markov chain Monte Carlo (MCMC) computation enables straightforward estimation of all model parameters. Model-specific implementations of a Bayes factor (BF) and deviance information criterion (DIC) for model selection are proposed which are easily calculated as byproducts of the MCMC computation. Both results from simulation studies and real-data examples are given to illustrate several novel analyses possible with this modeling framework.

Key words: speed, accuracy, IRT, response times.

1. Introduction

Response times (RTs) on test items can be a valuable source of information on test takers and test items, for example, when analyzing the speededness of the test, calibrating test items, detecting cheating, and designing a test (e.g., Bridgeman & Cline, 2004; Wise & Kong, 2005; van der Linden & Guo, 2008; van der Linden, Breithaupt, Chuah, & Zang, 2007; van der Linden, 2007). With the introduction of computerized testing, their collection has become straightforward.

It is important to make a distinction between the RTs on the test items and the speed at which a test taker operates throughout the test, especially when each person takes a different selection of items, as in adaptive testing. For two different test takers, it is possible to operate at the same speed, but produce entirely different RTs because the problems formulated in their items require different amounts of information to be processed, different problem-solving strategies, etc. Models for RTs should therefore have separate parameters for the test takers’ speed and the time intensities of the items.

Another potential confounding relationship is that between speed and accuracy. It is well known that on complex tasks, these two are different constructs (see, for instance, Kennedy, 1930; Schnipke & Scrams, 2002). Tate (1948) was one of the first to examine the relationship between speed and accuracy on different tests. He concluded that for a controlled level of accuracy, each test taker worked at a constant speed. Furthermore, test takers working at a certain speed do not necessarily demonstrate the same accuracy.

Some of these findings can be explained by the well-known speed-accuracy trade-off (e.g., Luce, 1986). The trade-off reflects the fact that speed and accuracy are main determinants of each other. Also, they are negatively related. When a person chooses to increase his speed, then

The authors thank Steven Wise, James Madison University, and Pere Joan Ferrando, Universitat Rovira i Virgili, for generously making available their data sets for the empirical examples in this paper.

Requests for reprints should be sent to R.H. Klein Entink, Department of Research Methodology, Measurement and Data Analysis, University of Twente, P.O. Box 217, 7500 AE Enschede, The Netherlands. E-mail: r.h.kleinentink@gw.utwente.nl

© 2008 The Author(s). This article is published with open access at Springerlink.com

his accuracy decreases. But once his speed is fixed, his accuracy remains constant. Observe that this trade-off involves a within-person constraint only; it does not enable us to predict the speed or accuracy of one person from another taking the same test. In order to model the relationship between speed and accuracy adequately, we therefore need a model with different levels. This multilevel perspective has not yet been dominant in the psychometric literature on RT modeling.

Instead, attempts have been made to integrate speed parameters or RTs into traditional single-level response models (Verhelst, Verstralen, & Jansen, 1997) or, conversely, response parameters into RT models (Thissen, 1983). However, a hierarchical framework for modeling responses and RTs was introduced in van der Linden (2008). The framework has separate first-level models for the responses and RTs. For the response model, a traditional item-response theory (IRT) model was chosen. For the RTs, a lognormal model with separate person and item parameters was adopted, which has nice statistical properties and fitted actual response time data very well (van der Linden, 2006). At the second level, the joint distributions of the person and item parameters in the two first-level models were modeled separately.

Observe that because the framework in this paper does not model a speed-accuracy tradeoff, it can be used just as well to analyze responses and RTs to instruments for noncognitive domains, such as attitude scales or personality questionnaires.

Because the first-level parameters capture all systematic variation in the RTs, they can be assumed to be conditionally independent given the speed parameter. Likewise, the responses and RTs are assumed to be conditionally independent given the ability and speed parameter. Such assumptions of conditional independence are quite common in hierarchical modeling but may seem counterintuitive in the current context, where the speed-accuracy trade-off is often taken to suggest that the frequency of the correct responses increases if the RTs go up. However, this confusion arises when the earlier distinction between speed and RT is overlooked. The trade-off controls the choice of the levels of speed and accuracy by the individual test taker whereas the conditional independence assumptions address what happens with his response and RT distributions after the levels of speed and accuracy have been fixed.

Besides being a nice implementation of the assumptions of local independence for RTs and responses, this framework allows for the incorporation of explanatory variables to identify factors that explain variation in speed and accuracy between individuals who may be nested within groups. The current paper addresses this possibility; its goal is to extend the framework with a third level with regression and group effects and to make this result statistically tractable.

The result is a multivariate multilevel model for mixed response variables (binary responses and continuous RTs). At the person level, just as in the original framework, it allows us to measure both accuracy and speed. Test takers can therefore be compared to each other with respect to these measures. But at the higher levels, the extended framework also allows us to identify covariates and group memberships that explain the measures as well as their relationships. Also, the item parameters are allowed to correlate.

Analysis of the extended model is performed in a fully Bayesian way. The motivation for the Bayesian treatment is its capability of handling complex models with many parameters that take all possible sources of variation into account. A new Gibbs sampling procedure (Geman & Geman, 1984; Gelfand & Smith, 1990) was developed which applies not only to the current framework but to the entire class of nonlinear multivariate multilevel models for mixed responses with balanced and unbalanced designs. All parameters can be estimated simultaneously without the need to fine-tune any parameters to guarantee convergence, for instance, as in a Metropolis–Hastings (MH) algorithm. Proper prior distributions can be specified that can be used both to incorporate a set of identifying restrictions for the model and to reflect the researcher's ideas about the parameter values and uncertainties. The estimation method can also handle incomplete designs with data missing at random.

A model-specific implementation of the Bayes factor (Kass & Raftery, 1995) and the deviance information criterion (DIC) (Spiegelhalter, Best, Carlin, & van der Linde, 2002) is given, which can be used (i) to test specific assumptions about the distribution of speed and accuracy in a population of test takers and (ii) to iteratively build a structural multivariate multilevel component for the latent person parameters with fixed and random effects. Both statistics can be computed as by-products of the proposed Gibbs sampler. The DIC requires an analytic expression of the deviance associated with the likelihood of interest. Such an expression is offered for the multivariate multilevel model given the complete data, which includes augmented continuous data given the binary responses (Albert, 1992), integrating out both random person parameters and other random regression effects at the level of groups of respondents. The posterior expectation of this complete DIC is taken over the augmented data using the output from the MCMC algorithm. Properties of the DIC, as well as the Bayes factor, were analyzed in a study with simulated data.

In the next sections, we describe the entire model, specify the prior distributions, discuss the Gibbs sampler, and show how to apply the Bayes factor and the DIC to the current model. Then in a simulation study, the performance of the Gibbs sampler is addressed, whereby our interest is particularly in estimating the parameters in the structural component of the model. In a second simulation study, the relationships between the person parameters and the tests of multivariate hypotheses using the Bayes factor and the DIC are explored. Finally, the results from real-data examples are given and a few suggestions for extensions of the model are presented.

2. A Multivariate Multilevel Model

Various sources contribute to the variation between responses and RTs on test items. The total variation can be partitioned into variation due to (i) the sampling of persons and items, (ii) the nesting of responses within persons and items, and (iii) the nesting of persons within groups.

Two measurement models describe the distributions of the binary responses and continuous RTs at level 1 of the framework. At level 2, two correlation structures are posited to allow for the dependencies between the level-1 model parameters. First, the person parameters for ability and speed, denoted as $\theta$ and $\zeta$, respectively, are modeled to have a multivariate normal regression on covariates $\mathbf{x}$, while group differences between these parameters are explained as a function of group-level covariates $\mathbf{w}$ at a third level. By specifying a higher-level regression structure for these random person parameters, it becomes possible to partition their total variance into within-group and between-group components. As a result, we are able to draw inferences about the person parameters for different groups simultaneously. Second, a correlation structure for the item parameters in the two measurement models is specified.

The model can be used for various analyses. First, the analysis might focus on the item parameters; more specifically, the relationships between the characteristics of the items in the domain covered by the test. For example, we may want to know the correlation between the time intensity and difficulty parameters of the items. Second, the analysis could be about the structural relationships between explanatory information at the individual and/or group levels and the test takers’ ability and speed. For example, the variance components of the structural model help us to explore the partitioning of the variance of the speed parameters across the different levels of analysis. Third, the interest might be in the random effects in the model, e.g., to identify atypical individuals or groups with respect to their ability or speed.

2.1. Level-1 Measurement Models for the Responses and RTs

The probability of person $i = 1, \ldots, n_j$ in group $j = 1, \ldots, J$ answering item $k = 1, \ldots, K$ correctly ($y_{ijk} = 1$) is assumed to follow the three-parameter normal ogive model:

$$P\left(y_{ijk} = 1 \mid \theta_{ij}, a_k, b_k, c_k\right) = c_k + (1 - c_k)\Phi\left(a_k\theta_{ij} - b_k\right), \qquad (1)$$

where $\Phi(\cdot)$ denotes the normal distribution function, $\theta_{ij}$ the ability parameter of test taker $ij$, and $a_k$, $b_k$, and $c_k$ the discrimination, difficulty, and guessing parameters of item $k$, respectively.
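As a numerical illustration (not part of the original model specification), the success probability in (1) can be computed directly; in the Python sketch below, the function name and parameter values are ours, and the guessing parameter acts as a lower asymptote:

```python
import numpy as np
from scipy.stats import norm

def p_correct(theta, a, b, c):
    """Three-parameter normal ogive model, Eq. (1):
    P(y = 1) = c + (1 - c) * Phi(a*theta - b)."""
    return c + (1.0 - c) * norm.cdf(a * theta - b)

# At a*theta - b = 0 the knowledge probability is 0.5, so with a
# guessing parameter of 0.2 the success probability is 0.2 + 0.8*0.5.
print(p_correct(theta=0.0, a=1.0, b=0.0, c=0.2))  # approximately 0.6
```

For very low abilities the probability approaches the guessing floor $c_k$, as (1) requires.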

Typically, as the result of a natural lower bound at zero, RT distributions are skewed to the right. A family that describes this characteristic well is the log-normal distribution (van der Linden, 2006; Schnipke & Scrams, 1997). Let $t_{ijk}$ denote the log-response time of person $i$ in group $j$ on item $k$. We apply a normal model for $t_{ijk}$, with a mean depending on the speed at which the person works, denoted as $\zeta_{ij}$, and the time intensity of the item, $\lambda_k$. A higher $\lambda_k$ represents an item that is expected to consume more time. On the other hand, a higher $\zeta_{ij}$ means that the person works faster and a lower RT is expected. A parameter $\phi_k$ is introduced, which can be interpreted as a time-discrimination parameter.

The response-time model at level 1 is given by:

$$t_{ijk} = -\phi_k\zeta_{ij} + \lambda_k + \varepsilon_{\zeta_{ijk}}, \qquad (2)$$

where $\varepsilon_{\zeta_{ijk}} \sim N(0, \tau_k^2)$. Notice that the interpretation of the model parameters in (2) results in a different location of the minus sign compared to the IRT model. Also, there is a correspondence of the RT model with IRT models for continuous responses; for the latter, see, for instance, Mellenbergh (1994) and Shi & Lee (1998).
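The level-1 RT model (2) is easy to simulate. The sketch below (all parameter values hypothetical) illustrates that a higher speed $\zeta$ lowers the expected log-RT on the same item:

```python
import numpy as np

rng = np.random.default_rng(1)

def sim_logtimes(zeta, phi, lam, tau, n, rng):
    """Level-1 RT model, Eq. (2): t = -phi*zeta + lam + eps,
    eps ~ N(0, tau^2). Draws n log-RTs for one person-item pair."""
    return -phi * zeta + lam + rng.normal(0.0, tau, size=n)

# Hypothetical item: time intensity lam = 4.0, tau = 0.5, phi = 1.0.
# A faster test taker (zeta = 1) has expected log-RT 3.0; a slower
# one (zeta = -1) has expected log-RT 5.0.
fast = sim_logtimes(1.0, 1.0, 4.0, 0.5, 20000, rng).mean()
slow = sim_logtimes(-1.0, 1.0, 4.0, 0.5, 20000, rng).mean()
print(fast < slow)  # True
```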

2.2. Multivariate Two-Level Model for the Person Parameters

The interest is in the relationships between the person parameters and the effects of poten- tial explanatory variables. For convenience, we use the same set of explanatory variables for both types of person parameters; the generalization to the case of different variables is straightforward.

Let $\mathbf{x}_j$ denote a known $n_j \times Q$ covariate matrix (with ones in the first column for the intercept) and $\boldsymbol{\beta}_j = (\boldsymbol{\beta}_{1j}, \boldsymbol{\beta}_{2j})$ a $Q \times 2$ matrix of regression coefficients for group $j = 1, \ldots, J$. The coefficients are treated as random but they can be restricted to be common to all groups, leading to the case of one fixed effect.

The regression of the two sets of person parameters at the individual level is defined by:

$$\theta_{ij} = \mathbf{x}_{ij}^t\boldsymbol{\beta}_{1j} + e_{\theta_{ij}}, \qquad (3)$$

$$\zeta_{ij} = \mathbf{x}_{ij}^t\boldsymbol{\beta}_{2j} + e_{\zeta_{ij}}. \qquad (4)$$

The two sets of regression equations are allowed to have correlated error terms; $(e_{\theta_{ij}}, e_{\zeta_{ij}})$ is taken to be bivariate normal with zero means and covariance matrix $\boldsymbol{\Sigma}_P$:

$$\boldsymbol{\Sigma}_P = \begin{pmatrix} \sigma_\theta^2 & \rho \\ \rho & \sigma_\zeta^2 \end{pmatrix}. \qquad (5)$$
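Equations (3)–(5) describe a bivariate normal regression for the person parameters. A short simulation sketch (covariate values and coefficients hypothetical) recovers the correlation between $\theta$ and $\zeta$ implied by $\boldsymbol{\Sigma}_P$:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: intercept plus one covariate, one group.
x = np.array([1.0, 0.5])                  # covariate vector x_ij
beta1 = np.array([0.0, 0.3])              # coefficients for theta
beta2 = np.array([0.0, -0.2])             # coefficients for zeta
Sigma_P = np.array([[1.0, 0.4],           # Eq. (5): var(theta) = 1,
                    [0.4, 0.8]])          # cov = rho, var(zeta) = 0.8

# Eqs. (3)-(4): person parameters are the regression means plus
# correlated bivariate normal errors.
mean = np.array([x @ beta1, x @ beta2])
draws = rng.multivariate_normal(mean, Sigma_P, size=50000)
print(np.corrcoef(draws.T)[0, 1])  # near rho/sigma_zeta = 0.4/sqrt(0.8)
```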

It is straightforward to extend the random-effects model to explain variance in the $\boldsymbol{\beta}$'s by group-level covariates (Snijders & Bosker, 1999). For instance, test takers can be grouped according to their socioeconomic background or because they are nested within different schools. Although different covariates can be included for the $Q$ intercept and slope parameters, for convenience, it will be assumed that the same covariate matrix is used for $\boldsymbol{\beta}_1$ and $\boldsymbol{\beta}_2$. The covariates for the $Q$ parameters of group $j$ are contained in a matrix $\mathbf{w}_j$ of dimension $Q \times S$. That is, in total there are $S$ covariates for each group, including those for the intercepts. The random effects $\boldsymbol{\beta}_{1j}$ and $\boldsymbol{\beta}_{2j}$ are then modeled as:

$$\boldsymbol{\beta}_{1j} = \mathbf{w}_j\boldsymbol{\gamma}_1 + \mathbf{u}_{1j}, \qquad (6)$$

$$\boldsymbol{\beta}_{2j} = \mathbf{w}_j\boldsymbol{\gamma}_2 + \mathbf{u}_{2j}, \qquad (7)$$

where $\boldsymbol{\gamma}_1$ and $\boldsymbol{\gamma}_2$ are the vectors of regression coefficients of length $S$. The group-level error terms $(\mathbf{u}_{1j}, \mathbf{u}_{2j})$ are assumed to be multivariate normally distributed with means zero and covariance matrix $\mathbf{V}$. More stable parameter estimates can be obtained by restricting this covariance matrix to be block-diagonal with diagonal matrices $\mathbf{V}_1$ and $\mathbf{V}_2$, each of dimension $Q \times Q$. In this case, the random effects in the regression of $\theta$ on $\mathbf{x}$ are allowed to correlate but they are independent of those in the regression of $\zeta$ on $\mathbf{x}$. This choice will be made throughout this paper.

Note that when $(\mathbf{x}\boldsymbol{\beta}_1, \mathbf{x}\boldsymbol{\beta}_2) = (\mu_\theta, \mu_\zeta) = \boldsymbol{\mu}_P$, the model as proposed by van der Linden (2008) is obtained as a special case.

Let $\boldsymbol{\theta}_j$ and $\boldsymbol{\zeta}_j$ denote the vectors of length $n_j$ of the person parameters of group $j$. The entire structural multivariate multilevel model can now be presented as:

$$\mathrm{vec}(\boldsymbol{\theta}_j, \boldsymbol{\zeta}_j) = \left(\mathbf{I}_2 \otimes \mathbf{x}_j\right)\mathrm{vec}(\boldsymbol{\beta}_j) + \mathrm{vec}\left(\mathbf{e}_{\theta_j}, \mathbf{e}_{\zeta_j}\right), \qquad (8)$$

$$\mathrm{vec}(\boldsymbol{\beta}_j) = \left(\mathbf{I}_2 \otimes \mathbf{w}_j\right)\mathrm{vec}(\boldsymbol{\gamma}_1, \boldsymbol{\gamma}_2) + \mathrm{vec}\left(\mathbf{u}_{1j}, \mathbf{u}_{2j}\right), \qquad (9)$$

where vec denotes the operation of vectorizing a matrix. We refer to these two models as the level-2 and level-3 models, respectively. Marginalizing over the random regression effects in (8) and (9), the distribution of $\mathrm{vec}(\boldsymbol{\theta}_j, \boldsymbol{\zeta}_j)$ becomes

$$\mathrm{vec}(\boldsymbol{\theta}_j, \boldsymbol{\zeta}_j) \sim N\left(\left(\mathbf{I}_2 \otimes \mathbf{x}_j\mathbf{w}_j\right)\boldsymbol{\gamma},\; \left(\mathbf{I}_2 \otimes \mathbf{x}_j\right)\mathbf{V}\left(\mathbf{I}_2 \otimes \mathbf{x}_j\right)^t + \boldsymbol{\Sigma}_P \otimes \mathbf{I}_{n_j}\right). \qquad (10)$$
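The marginal covariance in (10) can be assembled directly with Kronecker products. A small numerical sketch (group size, covariates, and variance components all hypothetical):

```python
import numpy as np

n_j, Q = 3, 2                           # small hypothetical group
x_j = np.hstack([np.ones((n_j, 1)),     # intercept column
                 np.array([[0.1], [0.5], [0.9]])])
V = 0.2 * np.eye(2 * Q)                 # block-diagonal level-3 covariance
Sigma_P = np.array([[1.0, 0.3],
                    [0.3, 0.7]])

# Marginal covariance of vec(theta_j, zeta_j), Eq. (10):
# (I_2 (x) x_j) V (I_2 (x) x_j)' + Sigma_P (x) I_{n_j}.
I2 = np.eye(2)
cov = (np.kron(I2, x_j) @ V @ np.kron(I2, x_j).T
       + np.kron(Sigma_P, np.eye(n_j)))
print(cov.shape)                 # (6, 6): 2*n_j measurements per group
print(np.allclose(cov, cov.T))   # True: a valid symmetric covariance
```

The off-diagonal $\boldsymbol{\Sigma}_P \otimes \mathbf{I}_{n_j}$ term is what couples each person's $\theta_{ij}$ and $\zeta_{ij}$, while the Kronecker regression term induces the within-group dependence.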

The structural component of the model allows a simultaneous regression analysis of all person parameters on explanatory variables at the individual and group levels while taking into account the dependencies between the individuals within each group. As a result, among other things, conclusions can be drawn as to the size of the effects of the explanatory variables on the test takers' ability and speed as well as the correlation between these person parameters. Note that hypotheses on these effects can be tested simultaneously.

2.3. Multivariate Model for the Item Parameters

An empirical distribution for the item parameters is specified such that for each item the vector $\boldsymbol{\xi}_k = (a_k, b_k, \phi_k, \lambda_k)$ is assumed to follow a multivariate normal distribution with mean vector $\boldsymbol{\mu}_I = (\mu_a, \mu_b, \mu_\phi, \mu_\lambda)$:

$$\boldsymbol{\xi}_k = \boldsymbol{\mu}_I + \mathbf{e}_I, \qquad \mathbf{e}_I \sim N(\mathbf{0}, \boldsymbol{\Sigma}_I), \qquad (11)$$

where $\boldsymbol{\Sigma}_I$ specifies the covariance structure.

The assumption introduces a correlation structure between the item parameters. For example, it may be expected that easy items require less time to be solved than more difficult items. If so, the time intensity parameter correlates positively with the item difficulty parameter. The guessing parameter of the response model has no analogous parameter in the RT measurement model (since there is no guessing aspect for the RTs). Therefore, it does not serve a purpose to include it in this multivariate model and an independent prior for this parameter is specified below.
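The item-population model (11) is straightforward to simulate. The sketch below uses hypothetical population values in which a positive covariance between difficulty and time intensity encodes "harder items take longer":

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical values for Eq. (11): (mu_a, mu_b, mu_phi, mu_lambda).
mu_I = np.array([1.0, 0.0, 1.0, 4.0])
Sigma_I = np.diag([0.05, 0.5, 0.05, 0.3])
Sigma_I[1, 3] = Sigma_I[3, 1] = 0.25   # cov(difficulty, time intensity)

xi = rng.multivariate_normal(mu_I, Sigma_I, size=100000)
b, lam = xi[:, 1], xi[:, 3]
# Difficult items tend to be time intensive under this Sigma_I.
print(np.corrcoef(b, lam)[0, 1] > 0)   # True
```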

3. Exploring the Multivariate Normal Structure

The observed response data are augmented using a procedure that facilitates the statistical inferences. Moreover, as will be shown in the next section, these augmentation steps allow the model to be estimated with a pure Gibbs sampling approach.

First, an augmentation step is introduced according to Beguin & Glas (2001). A variable $s_{ijk} = 1$ when person $ij$ knows the correct answer to item $k$ and $s_{ijk} = 0$ otherwise. Its conditional probabilities are given by:

$$P\left(s_{ijk} = 1 \mid y_{ijk} = 1, \theta_{ij}, a_k, b_k, c_k\right) = \frac{\Phi(a_k\theta_{ij} - b_k)}{\Phi(a_k\theta_{ij} - b_k) + c_k\left(1 - \Phi(a_k\theta_{ij} - b_k)\right)}, \qquad (12)$$

$$P\left(s_{ijk} = 0 \mid y_{ijk} = 1, \theta_{ij}, a_k, b_k, c_k\right) = \frac{c_k\left(1 - \Phi(a_k\theta_{ij} - b_k)\right)}{\Phi(a_k\theta_{ij} - b_k) + c_k\left(1 - \Phi(a_k\theta_{ij} - b_k)\right)}, \qquad (13)$$

$$P\left(s_{ijk} = 1 \mid y_{ijk} = 0, \theta_{ij}, a_k, b_k, c_k\right) = 0, \qquad (14)$$

$$P\left(s_{ijk} = 0 \mid y_{ijk} = 0, \theta_{ij}, a_k, b_k, c_k\right) = 1. \qquad (15)$$
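The augmentation probabilities (12)–(15) translate into a simple sampling step for $s$. A Python sketch (function name and parameter values ours): a wrong response forces $s = 0$, while a correct response is attributed to either knowledge or a lucky guess:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)

def sample_s(y, theta, a, b, c, rng):
    """Draw the 'knows the answer' indicator s, Eqs. (12)-(15)."""
    if y == 0:
        return 0                                   # Eqs. (14)-(15)
    p_know = norm.cdf(a * theta - b)
    p1 = p_know / (p_know + c * (1.0 - p_know))    # Eq. (12)
    return int(rng.random() < p1)

# With a vanishing guessing parameter, a correct answer must
# reflect knowledge; a wrong answer always gives s = 0.
print(sample_s(1, 0.0, 1.0, 0.0, 1e-12, rng))  # 1
print(sample_s(0, 0.0, 1.0, 0.0, 0.25, rng))   # 0
```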

Second, following Albert (1992), continuous latent responses $z_{ijk}$ are defined:

$$z_{ijk} = a_k\theta_{ij} - b_k + \varepsilon_{\theta_{ijk}}, \qquad (16)$$

where the error terms are standard normally distributed and $\mathbf{s}$ is taken to be a matrix of indicator variables for the events of the components of $\mathbf{z}$ being positive. When the guessing parameters are restricted to be zero, it follows immediately that $s_{ijk} = y_{ijk}$ with probability one and the two-parameter IRT model is obtained.
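In a sampler, the corresponding update draws $z_{ijk}$ from a normal distribution truncated by the sign constraint implied by $s_{ijk}$. A sketch using scipy's `truncnorm` (parameter values hypothetical):

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(5)

def sample_z(s, theta, a, b, rng):
    """Albert (1992) augmentation, Eq. (16): z ~ N(a*theta - b, 1),
    truncated to be positive when s = 1 and negative when s = 0."""
    mu = a * theta - b
    lo, hi = (0.0, np.inf) if s == 1 else (-np.inf, 0.0)
    # truncnorm takes its bounds in standardized units (scale = 1 here).
    return truncnorm.rvs(lo - mu, hi - mu, loc=mu, scale=1.0,
                         random_state=rng)

z1 = sample_z(1, 0.5, 1.2, -0.3, rng)
z0 = sample_z(0, 0.5, 1.2, -0.3, rng)
print(z1 > 0 and z0 < 0)  # True: signs match the indicator s
```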

Statistical inferences can be made from the complete data due to the following factorization:

$$p\left(\mathbf{y}, \mathbf{z}, \mathbf{s}, \mathbf{t} \mid \mathbf{a}, \mathbf{b}, \mathbf{c}, \boldsymbol{\phi}, \boldsymbol{\lambda}, \boldsymbol{\gamma}, \boldsymbol{\Sigma}_P, \mathbf{V}\right) = p(\mathbf{y} \mid \mathbf{z}, \mathbf{s})\,p(\mathbf{s} \mid \mathbf{c})\,p\left(\mathbf{z}, \mathbf{t} \mid \mathbf{a}, \mathbf{b}, \boldsymbol{\phi}, \boldsymbol{\lambda}, \boldsymbol{\gamma}, \boldsymbol{\Sigma}_P, \mathbf{V}\right). \qquad (17)$$

Our interest is in exploring the structural relationships between ability and speed. Therefore, the term on the far right-hand side of (17) will now be explored in more detail. This likelihood can be taken to be that of a normal multivariate multilevel model,

$$p\left(\mathbf{z}, \mathbf{t} \mid \mathbf{a}, \mathbf{b}, \boldsymbol{\phi}, \boldsymbol{\lambda}, \boldsymbol{\gamma}, \boldsymbol{\Sigma}_P, \mathbf{V}\right) = \iiint p(\mathbf{z} \mid \boldsymbol{\theta}, \mathbf{a}, \mathbf{b})\,p(\mathbf{t} \mid \boldsymbol{\zeta}, \boldsymbol{\phi}, \boldsymbol{\lambda})\,p(\boldsymbol{\zeta}, \boldsymbol{\theta} \mid \boldsymbol{\beta}, \boldsymbol{\Sigma}_P)\,p(\boldsymbol{\beta} \mid \boldsymbol{\gamma}, \mathbf{V})\,d\boldsymbol{\theta}\,d\boldsymbol{\zeta}\,d\boldsymbol{\beta}. \qquad (18)$$

All factors in this decomposition are multivariate normal densities. The first two factors occur because of the independence of the responses and response times given the latent person parameters. The last two factors represent levels 2 and 3 of the model.

Inference from this multivariate hierarchical model simplifies when taking advantage of some of the properties of the multivariate normal distribution. For example, let us assume for a moment that the item parameters are fixed and known, and define $(\tilde{\mathbf{z}}_{ij}, \tilde{\mathbf{t}}_{ij}) = (\mathbf{z}_{ij} + \mathbf{b}, \mathbf{t}_{ij} - \boldsymbol{\lambda})$. Levels 1 and 2 of the model can then be represented by the following multivariate hierarchical structure:

$$\begin{pmatrix} \theta_{ij} \\ \zeta_{ij} \\ \tilde{z}_{ij1} \\ \vdots \\ \tilde{z}_{ijK} \\ \tilde{t}_{ij1} \\ \vdots \\ \tilde{t}_{ijK} \end{pmatrix} \sim N\left(\begin{pmatrix} \mathbf{x}_{ij}^t\boldsymbol{\beta}_{1j} \\ \mathbf{x}_{ij}^t\boldsymbol{\beta}_{2j} \\ a_1\theta_{ij} \\ \vdots \\ a_K\theta_{ij} \\ -\phi_1\zeta_{ij} \\ \vdots \\ -\phi_K\zeta_{ij} \end{pmatrix},\; \begin{pmatrix} \sigma_\theta^2 & \rho & \sigma_\theta^2\mathbf{a}^t & -\rho\boldsymbol{\phi}^t \\ \rho & \sigma_\zeta^2 & \rho\mathbf{a}^t & -\sigma_\zeta^2\boldsymbol{\phi}^t \\ \mathbf{a}\sigma_\theta^2 & \mathbf{a}\rho & \mathbf{a}\sigma_\theta^2\mathbf{a}^t + \mathbf{I}_K & -\mathbf{a}\rho\boldsymbol{\phi}^t \\ -\boldsymbol{\phi}\rho & -\boldsymbol{\phi}\sigma_\zeta^2 & -\boldsymbol{\phi}\rho\mathbf{a}^t & \boldsymbol{\phi}\sigma_\zeta^2\boldsymbol{\phi}^t + \boldsymbol{\tau}^2 \end{pmatrix}\right). \qquad (19)$$

This representation provides insight into the complex correlational structure hidden in the data and entails several possible inferences. It also helps us to derive some of the conditional posterior distributions for the Gibbs sampling algorithm (e.g., the conditional posterior distributions of the latent person parameters given the augmented data). For a general treatment of the derivation of conditionals from multivariate normal distributions, see, for instance, Searle, Casella, and McCulloch (1992).

Parameter $\rho$, which controls the covariance between the $\theta$'s and $\zeta$'s, plays an important role in the model. It can be considered the bridge between the separate measurement models for ability and speed. Therefore, its role within the hierarchical structure will be explored in more detail.

The conditional covariance between the latent response variables and RTs on items $k = 1, \ldots, K$ is equal to $\mathrm{cov}(a_k\theta_{ij} - b_k + \varepsilon_{\theta_{ijk}},\, -\phi_k\zeta_{ij} + \lambda_k + \varepsilon_{\zeta_{ijk}}) = -a_k\rho\phi_k$, due to independence between the residuals as well as between the residuals and the person parameters. Since $a_k$ and $\phi_k$ are positive, the latent response variables and RTs, and hence the responses and RTs, correlate negatively when $\rho$ is positive. So, in spite of conditional independence between the responses and RTs given the person parameters, their correlation is negative.
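This marginal covariance $-a_k\rho\phi_k$ can be checked by simulation (all parameter values hypothetical; $b_k$ and $\lambda_k$ are omitted since constants do not affect covariances):

```python
import numpy as np

rng = np.random.default_rng(6)

# With rho > 0, cov(z_ijk, t_ijk) = -a_k * rho * phi_k < 0: able test
# takers tend to be fast, so responses and RTs correlate negatively
# even though they are conditionally independent given (theta, zeta).
a_k, phi_k, rho = 1.2, 0.9, 0.5
Sigma_P = np.array([[1.0, rho], [rho, 0.8]])
theta, zeta = rng.multivariate_normal([0, 0], Sigma_P, size=200000).T

z = a_k * theta + rng.normal(size=theta.size)    # latent responses
t = -phi_k * zeta + rng.normal(size=zeta.size)   # log-RTs (unit tau)
print(np.cov(z, t)[0, 1])   # near -a_k * rho * phi_k = -0.54
```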

The conditional distribution of $\theta_{ij}$ given $\zeta_{ij}$ is normal:

$$\theta_{ij} \mid \zeta_{ij}, \boldsymbol{\beta}_j, \sigma_\theta^2, \sigma_\zeta^2, \rho \sim N\left(\mathbf{x}_{ij}^t\boldsymbol{\beta}_{1j} + \rho\sigma_\zeta^{-2}\left(\zeta_{ij} - \mathbf{x}_{ij}^t\boldsymbol{\beta}_{2j}\right),\; \sigma_\theta^2 - \rho^2\sigma_\zeta^{-2}\right). \qquad (20)$$

A greater covariance $\rho$ between the person parameters gives a greater reduction of the conditional variance of $\theta_{ij}$ given $\zeta_{ij}$. The expression also shows that the amount of information about $\theta_{ij}$ in $\zeta_{ij}$ depends both on the precision of measuring the speed parameter and on its correlation with the ability parameter.

From (19), it also follows that the conditional expected value of $\theta_{ij}$ given the complete data is equal to

$$\begin{aligned} E\left(\theta_{ij} \mid \boldsymbol{\beta}_j, \zeta_{ij}, \tilde{\mathbf{z}}_{ij}, \boldsymbol{\Sigma}_P, \mathbf{a}, \mathbf{b}\right) &= \mathbf{x}_{ij}^t\boldsymbol{\beta}_{1j} + \rho\sigma_\zeta^{-2}\left(\zeta_{ij} - \mathbf{x}_{ij}^t\boldsymbol{\beta}_{2j}\right) + \sigma_\theta^2\mathbf{a}^t\left(\mathbf{a}\sigma_\theta^2\mathbf{a}^t + \mathbf{I}_K\right)^{-1}\left(\tilde{\mathbf{z}}_{ij} - \mathbf{a}\mathbf{x}_{ij}^t\boldsymbol{\beta}_{1j}\right) \\ &= \left(\mathbf{a}^t\mathbf{a} + \sigma_\theta^{-2}\right)^{-1}\left(\mathbf{a}^t\tilde{\mathbf{z}}_{ij} + \sigma_\theta^{-2}\left(\mathbf{x}_{ij}^t\boldsymbol{\beta}_{1j} + \rho\sigma_\zeta^{-2}\left(\zeta_{ij} - \mathbf{x}_{ij}^t\boldsymbol{\beta}_{2j}\right)\right)\right). \end{aligned} \qquad (21)$$

The conditional expected value of $\theta_{ij}$ consists of two parts: one part representing the information about $\theta_{ij}$ in the (augmented) response data and another the information through the multivariate regression on $\mathbf{x}_{ij}$. For $\rho = 0$, (21) reduces to

$$E\left(\theta_{ij} \mid \boldsymbol{\beta}_{1j}, \tilde{\mathbf{z}}_{ij}, \sigma_\theta^2, \mathbf{a}, \mathbf{b}\right) = \left(\mathbf{a}^t\mathbf{a} + \sigma_\theta^{-2}\right)^{-1}\left(\mathbf{a}^t\tilde{\mathbf{z}}_{ij} + \sigma_\theta^{-2}\mathbf{x}_{ij}^t\boldsymbol{\beta}_{1j}\right). \qquad (22)$$

This expression can be recognized as the precision-weighted mean of the predictions of $\theta_{ij}$ from the (augmented) response data and from the linear regression of $\theta$ on $\mathbf{x}$ (see, for instance, Fox & Glas, 2001). Comparing (22) with (21), it can be seen that when $\rho > 0$, the expected value of $\theta_{ij}$ increases for test takers who work at a greater than average speed; that is, a test taker's ability is predicted to be higher when the same response pattern is obtained at a higher speed (i.e., in a shorter expected time on the same set of items).
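The precision-weighted form (22) reduces to simple vector arithmetic. A numerical sketch with hypothetical values for a three-item test:

```python
import numpy as np

# Precision-weighted mean of Eq. (22): the posterior expectation of
# theta combines the augmented-data estimate a'z with the regression
# prediction x'beta1, weighted by their precisions (rho = 0 case).
a = np.array([1.0, 1.5, 0.8])        # discriminations (hypothetical)
z_tilde = np.array([0.9, 1.4, 0.3])  # augmented responses, z + b
sigma_theta2 = 1.0
prior_mean = 0.2                     # x'beta1 (hypothetical)

post_mean = ((a @ z_tilde + prior_mean / sigma_theta2)
             / (a @ a + 1.0 / sigma_theta2))
print(round(post_mean, 4))  # 0.7035
```

As $\sigma_\theta^2 \to \infty$ the regression prediction receives zero weight and the estimate is driven by the response data alone.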

In (19), in addition to the responses and RTs, the random test takers were the only extra source of heterogeneity. But another level of heterogeneity was added in (9), where the test takers were assumed to be nested within groups and the regression effects were allowed to vary randomly across them. Also, the item parameters correlate in (11). Because of these random effects and correlations, the marginal covariances between the measurements change.

We conclude this discussion with the following comments:

• In (19), a special structure (compound symmetry) for the covariance matrix of the residuals at the level of individuals was shown to exist. This structure may lead to more efficient inference. For a general discussion of possible parameterizations and estimation methods for multivariate random-effects structures, see, for instance, Harville (1977), Rabe-Hesketh and Skrondal (2001), and Reinsel (1983).

• Linear multivariate three-level structures for continuous responses are discussed, among others, in Goldstein (2003) and Snijders and Bosker (1999). As already indicated, the covariance structure of the level-3 random regression effects is assumed to be block diagonal. This means that the parameters in the regression of $\theta$ on $\mathbf{x}$ are conditionally independent of those in the regression of $\zeta$ on $\mathbf{x}$. It is possible to allow these parameters to correlate, but this option is unattractive when the dimension of the covariance matrix becomes large. Typically, the covariance matrix is then poorly estimated (Laird & Ware, 1982).

• For the same reason, the covariance matrix of the fixed effects in (9) is assumed to be block diagonal. The Bayesian approach in the next sections allows us to specify different levels of prior information about this matrix.

4. Bayesian Estimation Using Gibbs Sampling

In Bayesian statistics, inferences are made from the posterior distribution of the model parameters. Markov chain Monte Carlo (MCMC) methods enable us to simulate random draws from this distribution. Summary statistics can then be used to estimate the parameters or functionals of interest. A useful feature of MCMC methods is that they remain straightforward and easy to implement when the complexity of the model increases. Also, they allow for the simultaneous estimation of all model parameters. Since the current model is quite complex and has many parameters, we need these advantages to estimate the model. For a general introduction to Gibbs sampling, see Gelman, Carlin, Stern, and Rubin (2004) and Gelfand & Smith (1990). MCMC methods for IRT models are discussed by Albert (1992) and Patz & Junker (1999).

A new Gibbs sampling scheme was developed to deal with the extension of the model. Further, the scheme differs from that in van der Linden (2008) by its increased efficiency; it samples both types of person parameters in one step, taking into account the identifying restrictions, and avoids an MH step in the sampling of the item parameters because it capitalizes better on the regression structure of the model. The full conditional distributions of all model parameters for the scheme are given in the Appendix.

The remainder of this section discusses the priors and identifying restrictions we use.

4.1. Prior Distributions

The parameter $c_k$ is the success probability in the binomial distribution for the number of correct guesses on item $k$. A Beta prior with parameters $B(b_1, b_2)$ is chosen, which is conjugate for the binomial likelihood and thus leads to a Beta posterior.

For the residual variance $\tau_k^2$, a conjugate inverse-gamma prior is assumed with parameters $g_1$ and $g_2$.

A normal inverse-Wishart prior is chosen for the mean vector $\boldsymbol{\mu}_I$ and covariance matrix $\boldsymbol{\Sigma}_I$ of the item parameters. This family of priors is conjugate for the multivariate normal distribution (Gelman et al., 2004). Thus,

$$\boldsymbol{\Sigma}_I \sim \text{Inverse-Wishart}\left(\boldsymbol{\Sigma}_{I_0}^{-1}, \nu_{I_0}\right), \qquad (23)$$

$$\boldsymbol{\mu}_I \mid \boldsymbol{\Sigma}_I \sim N\left(\boldsymbol{\mu}_{I_0}, \boldsymbol{\Sigma}_I/\kappa_{I_0}\right). \qquad (24)$$

A vague proper prior follows if $\nu_{I_0}$ is set equal to the minimum value for the degrees-of-freedom parameter and a diagonal variance matrix with large values is chosen.
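Draws from such a normal inverse-Wishart prior can be generated with scipy's `invwishart`; the sketch below uses hypothetical hyperparameter values for a vague proper prior over the four item parameters:

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(8)

# Hypothetical hyperparameters for Eqs. (23)-(24): near-minimal
# degrees of freedom, a diffuse diagonal scale, and small kappa_I0.
dim = 4                       # (a_k, b_k, phi_k, lambda_k)
nu0, kappa0 = dim + 2, 0.1
S0 = 100.0 * np.eye(dim)
mu0 = np.zeros(dim)

Sigma_I = invwishart.rvs(df=nu0, scale=S0, random_state=rng)
mu_I = rng.multivariate_normal(mu0, Sigma_I / kappa0)
print(Sigma_I.shape, mu_I.shape)  # (4, 4) (4,)
```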

Likewise, a normal inverse-Wishart prior is chosen for the fixed parameters $\boldsymbol{\gamma}$ of the multivariate random-effects structure of the person parameters in (9):

$$\boldsymbol{\Sigma}_\gamma \sim \text{Inverse-Wishart}\left(\boldsymbol{\Sigma}_{\gamma_0}^{-1}, \nu_{\gamma_0}\right), \qquad (25)$$

$$\boldsymbol{\gamma} \mid \boldsymbol{\Sigma}_\gamma \sim N\left(\boldsymbol{\gamma}_0, \boldsymbol{\Sigma}_\gamma/\kappa_{\gamma_0}\right). \qquad (26)$$

The covariance matrix $\mathbf{V}$ of the level-3 random group effects $(\mathbf{u}_{1j}, \mathbf{u}_{2j})$ is also assumed to have an inverse-Wishart prior, with scale matrix $\mathbf{V}_0$ and degrees of freedom $\nu_{V_0}$.

The prior for the covariance matrix of the person parameters, $\boldsymbol{\Sigma}_P$, is given special treatment because the model is not yet identified.

4.2. Prior for $\boldsymbol{\Sigma}_P$ with Identifying Restrictions

The model can be identified by fixing the scales of the two latent person parameters. A straightforward way of fixing the scale of the ability parameter is to set its mean equal to zero and its variance to one. To avoid a tradeoff between $\phi$ and $\zeta$, the time-discrimination parameters are restricted by $\prod_{k=1}^{K}\phi_k = 1$. When they are all restricted to $\phi_k = 1$, the lognormal RT model as proposed by van der Linden (2006) is obtained. Then, for the speed parameter, since RTs have a natural unit, we only have to fix the origin of its scale, which is set equal to its population mean. Note that a multivariate probit model is identified by fixing the diagonal elements of the covariance matrix (Chib & Greenberg, 1998), but because of the special nature of the RTs, in the current case only one element of $\boldsymbol{\Sigma}_P$ has to be fixed.

Generally, two issues arise when restricting a covariance structure. First, defining proper priors for a restricted covariance matrix is rather difficult. For example, for the conjugate inverse-Wishart prior, there is no choice of parameter values that reflects a restriction on the variance of the ability parameter such as that above. For the multinomial probit model, McCulloch and Rossi (1994) tackled this problem by specifying proper diffuse priors for the unidentified parameters and reporting the marginal posterior distributions of the identified parameters. However, it is hard to specify prior beliefs about unidentified parameters. Second, for a Gibbs sampler, sampling from a restricted covariance matrix requires extra attention. Chib and Greenberg (1998) defined individual priors on the free covariance parameters, but as a result, the augmented data had to be sampled from a special truncated region and the values of the free covariance parameters could only be sampled using an MH step. However, such steps involve the specification of an effective proposal density with tuning parameters that can only be fixed through a cumbersome process.

A general approach for sampling from a restricted covariance matrix can be found in Browne (2006) but this is also based on an MH algorithm.

Here, a different approach is taken that allows us to specify proper informative priors and facilitates the implementation of the Gibbs sampler. A prior is chosen such that σ_θ² = 1 with probability one. Hence, the covariance matrix Σ_P always equals

$$\boldsymbol{\Sigma}_P = \begin{pmatrix} 1 & \rho \\ \rho & \sigma_\zeta^2 \end{pmatrix}. \qquad(27)$$

Using (8) and (27), the conditional distribution of ζ_ij given θ_ij has density

$$\zeta_{ij} \mid \theta_{ij}, \boldsymbol{\beta}_j, \rho, \sigma_\zeta^2 \sim N\bigl(\mathbf{x}^t_{ij}\boldsymbol{\beta}_{2j} + \rho\bigl(\theta_{ij} - \mathbf{x}^t_{ij}\boldsymbol{\beta}_{1j}\bigr),\; \tilde{\sigma}_\zeta^2\bigr),$$

where $\tilde{\sigma}_\zeta^2 = \sigma_\zeta^2 - \rho^2$. Parameter ρ can be viewed as the slope parameter in a normal regression problem of ζ_ij on θ_ij with variance $\tilde{\sigma}_\zeta^2$. Specifying a normal and an inverse gamma as conjugate

priors for these parameters,

$$\rho \sim N\bigl(\rho_0, \sigma_\rho^2\bigr), \qquad(28)$$

$$\tilde{\sigma}_\zeta^{-2} \sim \text{Gamma}(g_1, g_2), \qquad(29)$$

their full conditional posterior distributions become

$$\rho \mid \boldsymbol{\theta}, \boldsymbol{\zeta}, \boldsymbol{\beta}, \rho_0, \sigma_\rho^2 \sim N\Bigl(\Delta\bigl(\rho_0\sigma_\rho^{-2} + (\boldsymbol{\theta} - \mathbf{x}\boldsymbol{\beta}_1)^t(\boldsymbol{\zeta} - \mathbf{x}\boldsymbol{\beta}_2)\bigr),\; \Delta\Bigr), \qquad(30)$$

$$\tilde{\sigma}_\zeta^{-2} \mid \boldsymbol{\theta}, \boldsymbol{\zeta}, \boldsymbol{\beta}, \rho \sim \text{Gamma}\bigl(g_1 + N/2,\; g_2 + \boldsymbol{\Xi}^t\boldsymbol{\Xi}/2\bigr), \qquad(31)$$

where

$$\Delta = \bigl((\boldsymbol{\theta} - \mathbf{x}\boldsymbol{\beta}_1)^t(\boldsymbol{\theta} - \mathbf{x}\boldsymbol{\beta}_1) + \sigma_\rho^{-2}\bigr)^{-1}$$

and $\boldsymbol{\Xi} = (\boldsymbol{\zeta} - \mathbf{x}\boldsymbol{\beta}_2) - \rho(\boldsymbol{\theta} - \mathbf{x}\boldsymbol{\beta}_1)$.

Since |Σ_P| = σ_ζ² − ρ² = σ̃_ζ² and σ̃_ζ² > 0, it follows that the determinant |Σ_P| > 0. Together with the fixed positive diagonal element, the latter is sufficient to guarantee that Σ_P is positive definite.
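As an illustration, the two sampling steps in (30)–(31) are straightforward to implement. The sketch below draws (ρ, σ̃_ζ²) given the residuals of the two regressions and rebuilds Σ_P; the function name and the default hyperparameter values are ours, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def draw_sigma_p(theta_res, zeta_res, rho0=0.0, sigma_rho2=10.0, g1=1.0, g2=1.0):
    """One Gibbs update of (rho, sigma_zeta^2) for the restricted covariance
    matrix Sigma_P = [[1, rho], [rho, sigma_zeta^2]], following (28)-(31).

    theta_res, zeta_res : residuals theta - x beta_1 and zeta - x beta_2.
    """
    N = theta_res.size
    # (30): rho is the slope of the regression of zeta-residuals on theta-residuals
    delta = 1.0 / (theta_res @ theta_res + 1.0 / sigma_rho2)
    rho = rng.normal(delta * (rho0 / sigma_rho2 + theta_res @ zeta_res),
                     np.sqrt(delta))
    # (31): conditional residual precision ~ Gamma(g1 + N/2, g2 + Xi'Xi/2)
    xi = zeta_res - rho * theta_res
    prec = rng.gamma(g1 + N / 2, 1.0 / (g2 + xi @ xi / 2))
    sigma_zeta2 = 1.0 / prec + rho ** 2   # since sigma_zeta^2 = sigma~_zeta^2 + rho^2
    return np.array([[1.0, rho], [rho, sigma_zeta2]])
```

Because σ_ζ² is reconstructed as σ̃_ζ² + ρ², every draw automatically satisfies |Σ_P| = σ̃_ζ² > 0, which is the positive-definiteness guarantee noted above.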

When implementing a Gibbs sampler, the random draws of the elements of the covariance matrix Σ_P in (27) can be constructed from the samples drawn from (30)–(31). These draws will show greater autocorrelation due to this new parametrization. This implies that more MCMC iterations are needed to cover the support of the posterior distribution adequately, a measure that only involves a (linear) increase in the running time of the sampler. On the other hand, convergence of the algorithm is easily established without having to specify any tuning parameters. Finally, this procedure also enables straightforward implementation of the data augmentation procedure since the z's can be drawn from a normal distribution truncated at zero, where s indicates whether z is positive.
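The truncated draws themselves require no special machinery. A minimal sketch of such a step, using simple rejection sampling (a generic probit data-augmentation device; the function name and signature are ours):

```python
import numpy as np

rng = np.random.default_rng(7)

def draw_z(mu, y, max_tries=1000):
    """Draw z ~ N(mu, 1) truncated to z > 0 when y = 1 and z < 0 when y = 0,
    by simple rejection (adequate for a sketch; an inverse-CDF draw is more
    efficient in the tails)."""
    mu = np.asarray(mu, float)
    y = np.asarray(y)
    z = rng.normal(mu)
    want_pos = (y == 1)
    for _ in range(max_tries):
        bad = (z > 0) != want_pos      # entries whose sign disagrees with y
        if not bad.any():
            break
        z[bad] = rng.normal(mu[bad])   # redraw only the disagreeing entries
    return z
```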

The key element of the present approach is the specification of a proper prior distribution for the covariance matrix with one fixed diagonal element and the construction of random draws from the matrix from the corresponding conditional posterior distribution. For the multinomial probit model, the approach was also followed by McCulloch, Polson, and Rossi (2000). For completeness, we also mention an alternative approach. Barnard, McCulloch, and Meng (2000) formulated a prior directly for the identified parameters. In order to do so, they factored the covariance matrix into a diagonal matrix with standard deviations and a correlation matrix, and specified an informative prior for the latter. This prior was then incorporated into a Griddy–Gibbs sampler. However, such algorithms can be slow and require the choice of a grid size and boundaries. Boscardin and Zhang (2004) followed a comparable approach but used a parameter-extended MH algorithm for sampling values from the conditional distribution of the correlation matrix.

5. Model Selection Methods

A model comparison method is often based on a measure of fit and some penalty function based on the number of free parameters for the complexity of the model. A bias-variance trade-off exists between these two quantities since a more complex model often leads to less bias but a less complex model involves more accurate estimation. Two well-known criteria of model selection based on a deviance fit measure are the Bayesian information criterion (BIC) (Schwarz, 1978) and Akaike's information criterion (AIC) (Akaike, 1973). These criteria depend on the effective number of parameters in the model as a measure of model complexity. A drawback of these measures is that they are often difficult to calculate for hierarchical models: Although the nominal number of parameters follows directly from the likelihood, the prior distribution imposes additional restrictions on the parameter space and reduces its effective dimension. In a random-effects model, the effective number of parameters depends strongly on the higher-level variance parameters. When the variance of the random effects approaches zero, all random effects are equal and the model reduces to a simple linear model with one mean parameter. But when the variance goes to infinity, the number of free parameters approaches the number of random effects.

Spiegelhalter et al. (2002) proposed the deviance information criterion (DIC) for model comparison when the number of parameters is not clearly defined. The DIC is defined as the sum of a deviance measure and a penalty term for the effective number of parameters based on a measure of model complexity described below.

An alternative method for model selection that can handle complex hierarchical models is the Bayes factor; for a review, see Kass and Raftery (1995). The Bayes factor is based on a comparison of marginal likelihoods but its implementation is hampered by its critical dependence on the prior densities assigned to the model parameters. It is known that the Bayes factor tends to favor models with reasonably vague proper priors; see, for instance, Berger and Delampady (1987) and Sinharay and Stern (2002). An advantage of the Bayes factor is its clear interpretation as the change in the odds in favor of the model when moving from the prior to the posterior distribution (Lavine & Schervish,1999).

In one of the empirical examples below, the focus is on the structural multivariate model for the person parameters. It will be shown that a DIC can be formulated for choosing between models that differ in the fixed and/or random part of the structural model. In addition, a Bayes factor for selecting between the IRT measurement model for binary responses and the model extended with the hierarchical structure for responses and RTs is presented.

*5.1. Model Selection Using the DIC*

The DIC requires a closed-form likelihood. Our interest is focused on the likelihood of the structural parameters in the model; accordingly, all random effect parameters can be integrated out. In addition, the variances, covariances, and item parameters are considered as nuisance parameters, and their values are assumed to be known. So, a DIC will be derived for the complete-data likelihood with the random effects integrated out. Subsequently, the posterior expectation of the DIC over the augmented data will be taken. The same procedure was proposed for mixture models by De Iorio and Robert (2002).

Let $\mathbf{z}^*_{ij} = \text{vec}(\mathbf{z}_{ij} + \mathbf{b},\, \mathbf{t}_{ij} - \boldsymbol{\lambda})$ and $\mathbf{H}_P = (\mathbf{a} \oplus -\boldsymbol{\varphi})$. From (19), conditional on s, the measurement models for ability and speed can be summarized as

$$\mathbf{z}^*_{ij} = \mathbf{H}_P \boldsymbol{\xi}_{ij} + \mathbf{e}_{ij}, \qquad(32)$$

where $\mathbf{e}_{ij} \sim N(\mathbf{0}, \mathbf{C})$, with $\mathbf{C} = (\mathbf{I}_K \oplus \mathbf{I}_K\tau^2)$ a diagonal matrix with ones in the upper-left block and τ² in the lower-right block on its diagonal, and $\boldsymbol{\xi}_{ij} = \text{vec}(\theta_{ij}, \zeta_{ij})$. The focus is on the structure of ξ. Using the factorization in (17), the standardized deviance is

$$D(\boldsymbol{\xi}) = \sum_{ij}\bigl(\mathbf{z}^*_{ij} - \mathbf{H}_P\boldsymbol{\xi}_{ij}\bigr)^t\mathbf{C}^{-1}\bigl(\mathbf{z}^*_{ij} - \mathbf{H}_P\boldsymbol{\xi}_{ij}\bigr). \qquad(33)$$

The DIC is defined as

$$\text{DIC} = \int \bigl(\text{DIC} \mid \mathbf{z}\bigr)\, p(\mathbf{z} \mid \mathbf{y})\, d\mathbf{z} \qquad(34)$$

$$= \int \bigl(D(\bar{\boldsymbol{\xi}}) + 2p_D\bigr)\, p(\mathbf{z} \mid \mathbf{y})\, d\mathbf{z} \qquad(35)$$

$$= E_{\mathbf{z}}\bigl[D(\bar{\boldsymbol{\xi}}) + 2p_D \mid \mathbf{y}\bigr], \qquad(36)$$

where $\bar{\boldsymbol{\xi}}$ equals the posterior mean and $p_D$ is the effective number of parameters given the augmented data. The latter can be shown to be equal to the mean deviance minus the deviance of the mean. Hence,

$$
\begin{aligned}
p_D &= \overline{D(\boldsymbol{\xi})} - D(\bar{\boldsymbol{\xi}}) \qquad(37)\\
&= E_{\boldsymbol{\xi}}\bigl[D(\boldsymbol{\xi}) \mid \mathbf{z}^*\bigr] - D\bigl(E[\boldsymbol{\xi} \mid \mathbf{z}^*]\bigr)\\
&= E_{\boldsymbol{\xi}}\Bigl[\sum_{ij}\bigl(\mathbf{z}^*_{ij} - \mathbf{H}_P\boldsymbol{\xi}_{ij}\bigr)^t\mathbf{C}^{-1}\bigl(\mathbf{z}^*_{ij} - \mathbf{H}_P\boldsymbol{\xi}_{ij}\bigr)\Bigr] - D\bigl(E[\boldsymbol{\xi} \mid \mathbf{z}^*]\bigr)\\
&= \operatorname{tr}\Bigl(\sum_{ij} E_{\boldsymbol{\xi}}\bigl[\bigl(\mathbf{z}^*_{ij} - \mathbf{H}_P\boldsymbol{\xi}_{ij}\bigr)\bigl(\mathbf{z}^*_{ij} - \mathbf{H}_P\boldsymbol{\xi}_{ij}\bigr)^t\bigr]\mathbf{C}^{-1}\Bigr)
- \operatorname{tr}\Bigl(\sum_{ij}\bigl(\mathbf{z}^*_{ij} - \mathbf{H}_P E[\boldsymbol{\xi}\mid\mathbf{z}^*]\bigr)\bigl(\mathbf{z}^*_{ij} - \mathbf{H}_P E[\boldsymbol{\xi}\mid\mathbf{z}^*]\bigr)^t\mathbf{C}^{-1}\Bigr)\\
&= \sum_{ij}\operatorname{tr}\Bigl(E_{\boldsymbol{\xi}}\bigl[\bigl(\mathbf{z}^*_{ij} - \mathbf{H}_P\boldsymbol{\xi}_{ij}\bigr)\bigl(\mathbf{z}^*_{ij} - \mathbf{H}_P\boldsymbol{\xi}_{ij}\bigr)^t\bigr]\mathbf{C}^{-1}
- \bigl(\mathbf{z}^*_{ij} - \mathbf{H}_P E[\boldsymbol{\xi}\mid\mathbf{z}^*]\bigr)\bigl(\mathbf{z}^*_{ij} - \mathbf{H}_P E[\boldsymbol{\xi}\mid\mathbf{z}^*]\bigr)^t\mathbf{C}^{-1}\Bigr)\\
&= \sum_{ij}\operatorname{tr}\bigl(\mathbf{C}^{-1}\operatorname{var}(\mathbf{e}_{ij}\mid\mathbf{z}^*_{ij})\bigr) \qquad(38)\\
&= \sum_{ij}\operatorname{tr}\Bigl(\mathbf{C}^{-1}\bigl[\operatorname{var}(\mathbf{e}_{ij}) - \operatorname{cov}\bigl(\mathbf{e}_{ij},\mathbf{z}^*_{ij}\bigr)\operatorname{var}\bigl(\mathbf{z}^*_{ij}\bigr)^{-1}\operatorname{cov}\bigl(\mathbf{e}_{ij},\mathbf{z}^*_{ij}\bigr)\bigr]\Bigr) \qquad(39)\\
&= \sum_{ij}\operatorname{tr}\bigl(\operatorname{var}\bigl(\mathbf{z}^*_{ij}\bigr)^{-1}\operatorname{var}\bigl(\mathbf{H}_P\boldsymbol{\xi}_{ij}\bigr)\bigr) \qquad(40)\\
&= \sum_{ij}\operatorname{tr}\Bigl(\bigl(\mathbf{H}_P\mathbf{x}_{ij}\mathbf{w}_j\boldsymbol{\Sigma}_\gamma\mathbf{w}^t_j\mathbf{x}^t_{ij}\mathbf{H}^t_P + \mathbf{H}_P\mathbf{x}_{ij}\mathbf{V}\mathbf{x}^t_{ij}\mathbf{H}^t_P + \mathbf{H}_P\boldsymbol{\Sigma}_P\mathbf{H}^t_P + \mathbf{C}\bigr)^{-1}\\
&\qquad\qquad\times\bigl(\mathbf{H}_P\mathbf{x}_{ij}\mathbf{w}_j\boldsymbol{\Sigma}_\gamma\mathbf{w}^t_j\mathbf{x}^t_{ij}\mathbf{H}^t_P + \mathbf{H}_P\mathbf{x}_{ij}\mathbf{V}\mathbf{x}^t_{ij}\mathbf{H}^t_P + \mathbf{H}_P\boldsymbol{\Sigma}_P\mathbf{H}^t_P\bigr)\Bigr), \qquad(41)
\end{aligned}
$$

where tr(·) denotes the trace function, i.e., the sum of the diagonal elements. The expectation is taken with respect to the posterior distribution of ξ. The terms in (38) can be recognized as the posterior variances of the residuals, whereas those in (40) follow from the fact that, because of independence, the variance of z*_ij equals the sum of the variance of H_P ξ_ij and that of e_ij.

DICs of nested models are computed by restricting one or more variance parameters in (41) to zero. Also, (41) can be estimated as a by-product of the MCMC algorithm; that is, the output of the algorithm can be used to estimate the posterior means of the model parameters in the second term of (37) and to integrate the DIC over the item parameters to obtain the first term. (In the current application, the item parameters are the nuisance parameters.)
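Given posterior draws of the person parameters, the first line of (37) can indeed be estimated directly from the MCMC output as the mean of the deviances minus the deviance at the posterior mean. A minimal sketch; all array shapes and names are ours:

```python
import numpy as np

def effective_parameters(z_star, H, C_inv, xi_draws):
    """Estimate p_D = mean(D(xi)) - D(mean(xi)) from MCMC draws, cf. (37).

    z_star   : (n, d) augmented data points z*_ij
    H        : (d, p) loading matrix H_P
    C_inv    : (d, d) inverse residual covariance
    xi_draws : (m, n, p) posterior draws of the person parameters
    """
    def deviance(xi):
        r = z_star - xi @ H.T                      # residuals z* - H xi
        return np.einsum('nd,de,ne->', r, C_inv, r)  # sum_n r_n' C^{-1} r_n
    mean_dev = np.mean([deviance(x) for x in xi_draws])
    dev_mean = deviance(xi_draws.mean(axis=0))
    return mean_dev - dev_mean
```

Since the deviance is convex in ξ, the estimate is nonnegative and shrinks to zero as the posterior of ξ concentrates, matching the interpretation of p_D as an effective number of parameters.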

Usually the variance parameters are unknown. Then the DIC has to be integrated over their marginal distribution, too. In fact, the correct Bayesian approach would be to integrate the joint posterior over the nuisance parameters to obtain the marginal posterior of interest. However, this approach is not possible since no closed-form expression of the DIC can be obtained for this marginal posterior. Thus, our proposal does not account for the unknown variances. Equation (41) reflects the effective number of parameters of the proposed model without the additional variability in the posterior because of the unknown covariance parameters. The more general case with unknown covariance parameters is complex, and no simple correction seems available. But Vaida and Blanchard (2005) showed that for a mixed-effects model, the correction for unknown covariance parameters is negligible asymptotically. So, it seems safe to assume that their effect on the estimate of (37) becomes only apparent when the covariance parameters are estimated less precisely.

*5.2. Model Selection Using the Bayes Factor*

The question we address is whether the use of the RTs in the hierarchical model proves to be beneficial for making inferences about the ability parameter. As no benefits can be obtained when the correlation ρ(θ, ζ) = 0 (i.e., independence between θ and ζ), a Bayes factor is defined to test whether the data support fitting a model M_1 with dependence between θ and ζ or the null model M_0 ⊂ M_1 with independence. For an introduction to Bayes factors, see Berger and Delampady (1987) and Kass and Raftery (1995).

Both models are given equal prior weight. Therefore, the Bayes factor can be presented as

$$\text{BF} = \frac{p(\mathbf{y}, \mathbf{t} \mid M_0)}{p(\mathbf{y}, \mathbf{t} \mid M_1)} \qquad(42)$$

$$= \frac{\int p(\mathbf{y} \mid \mathbf{z})\, p(\mathbf{z}, \mathbf{t} \mid M_0)\, d\mathbf{z}}{\int p(\mathbf{y} \mid \mathbf{z})\, p(\mathbf{z}, \mathbf{t} \mid M_1)\, d\mathbf{z}} \qquad(43)$$

$$= \frac{\int p(\mathbf{y} \mid \mathbf{z})\, p(\mathbf{z}, \mathbf{t} \mid \rho = 0)\, d\mathbf{z}}{\int\!\!\int p(\mathbf{y} \mid \mathbf{z})\, p(\mathbf{z}, \mathbf{t} \mid \rho)\, \pi(\rho)\, d\rho\, d\mathbf{z}}. \qquad(44)$$

A popular family of conjugate priors for the correlation coefficient has the form (1 − ρ²)^ν on its support, 0 ≤ ρ ≤ 1 (Lee, 2004). For ν = 0, a uniform distribution is obtained. For ν = 5, a half-normal distribution is approximated. For ν → ∞, the prior assigns probability 1 to ρ = 0, which yields model M_0. To assess the sensitivity of the Bayes factor to the specification of the prior density, a variety of members from the family can be chosen.
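Conditional on the augmented data, the ratio in (44) can be approximated by quadrature over this prior family. A sketch in which `loglik` is a placeholder for the log of p(z, t | ρ); the small trapezoid helper avoids NumPy version differences:

```python
import numpy as np

def _trapz(y, x):
    # simple trapezoidal rule
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def bayes_factor(loglik, nu=0.0, grid=2001):
    """BF_01 = p(data | rho = 0) / integral of p(data | rho) pi(rho) d rho,
    with pi(rho) proportional to (1 - rho**2)**nu on [0, 1]; cf. (44)."""
    rho = np.linspace(0.0, 1.0, grid)
    prior = (1.0 - rho ** 2) ** nu
    prior /= _trapz(prior, rho)                  # normalize the prior density
    ll = np.array([loglik(r) for r in rho])
    m = ll.max()                                 # subtract max for stability
    marginal = _trapz(np.exp(ll - m) * prior, rho)
    return float(np.exp(loglik(0.0) - m) / marginal)
```

Re-running the function for several values of ν is one way to carry out the sensitivity check described above.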

6. Simulation Study

In the first study, different data sets were simulated and the parameters were re-estimated to check the performance of the Gibbs sampler. In the second study, the properties of the proposed Bayes factor in (44) were investigated for datasets generated for different values of ρ and different choices of prior distributions. We also checked the rejection region for the null hypothesis. In the third study, the characteristics of the proposed DIC test were analyzed.

*6.1. Parameter Recovery*

Datasets were simulated for the following structural component of the model:

$$\begin{pmatrix}\theta_{ij}\\ \zeta_{ij}\end{pmatrix} = \begin{pmatrix}\gamma_{00} + u^{(\theta)}_{0j}\\ \gamma_{10} + u^{(\zeta)}_{0j}\end{pmatrix} + \begin{pmatrix}\bigl(\gamma_{01} w_j + u^{(\theta)}_{1j}\bigr) x_{ij}\\ \bigl(\gamma_{11} w_j + u^{(\zeta)}_{1j}\bigr) x_{ij}\end{pmatrix} + \begin{pmatrix}e_{1ij}\\ e_{2ij}\end{pmatrix},$$

TABLE 1.
Simulated and estimated values of the structural parameters.

|                     | Parameter | True value | EAP  | SD   |
|---------------------|-----------|------------|------|------|
| Fixed parameters    | γ_00      | 0.00       | 0.00 | –    |
|                     | γ_01      | 4.00       | 3.77 | 0.23 |
|                     | γ_10      | 0.00       | 0.00 | –    |
|                     | γ_11      | 3.00       | 2.99 | 0.12 |
| Variance components |           |            |      |      |
| Σ_P                 | Σ_11      | 1.00       | 1.00 | –    |
|                     | Σ_12      | 0.50       | 0.55 | 0.04 |
|                     | Σ_22      | 1.00       | 1.07 | 0.06 |
| V_1                 | V_11      | 1.00       | 1.00 | 0.25 |
|                     | V_12      | 0.50       | 0.48 | 0.22 |
|                     | V_22      | 1.00       | 1.13 | 0.35 |
| V_2                 | V_11      | 1.00       | 1.07 | 0.23 |
|                     | V_12      | 0.50       | 0.47 | 0.17 |
|                     | V_22      | 1.00       | 0.86 | 0.19 |

where $\mathbf{e}_{ij} \sim N(\mathbf{0}, \boldsymbol{\Sigma}_P)$, $\mathbf{u}^{(\theta)} \sim N(\mathbf{0}, \mathbf{V}_1)$, and $\mathbf{u}^{(\zeta)} \sim N(\mathbf{0}, \mathbf{V}_2)$. The model had the same set of explanatory variables in the regression of each latent parameter and had random intercepts and slopes. The intercepts and slopes were taken to be independent of the residuals and across the person parameters. The true values of the structural parameters used in the study are given in Table 1. The values of the explanatory variables x and w were drawn from a standard normal distribution. For the responses, the 2PL model was assumed and the item parameters were drawn from a multivariate normal distribution with mean μ_I = (1, 0, 1, 0) and a diagonal covariance matrix Σ_I with all variances equal to 0.5. Negative values of φ and a were simply ignored. Responses and RTs were simulated for N = 1,000 persons nested in 50 groups, each taking 20 items.
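The data-generating scheme for the person parameters can be sketched as follows. For brevity, a single covariance matrix V is reused for both sets of random effects (the study uses separate V_1 and V_2); all names and default values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_persons(n_groups=50, n_per_group=20, gamma=(0.0, 4.0, 0.0, 3.0),
                     Sigma_P=((1.0, 0.5), (0.5, 1.0)),
                     V=((1.0, 0.5), (0.5, 1.0))):
    """Simulate (theta, zeta) from the random intercept-and-slope
    structural model of Section 6.1."""
    g00, g01, g10, g11 = gamma
    theta, zeta, group = [], [], []
    for j in range(n_groups):
        w = rng.standard_normal()                   # group-level covariate
        u_th = rng.multivariate_normal([0, 0], V)   # intercept/slope effects (theta)
        u_ze = rng.multivariate_normal([0, 0], V)   # intercept/slope effects (zeta)
        for _ in range(n_per_group):
            x = rng.standard_normal()               # person-level covariate
            e = rng.multivariate_normal([0, 0], Sigma_P)
            theta.append(g00 + u_th[0] + (g01 * w + u_th[1]) * x + e[0])
            zeta.append(g10 + u_ze[0] + (g11 * w + u_ze[1]) * x + e[1])
            group.append(j)
    return np.array(theta), np.array(zeta), np.array(group)
```

With the defaults this yields the N = 1,000 persons in 50 groups of the study; responses and RTs would then be generated from (theta, zeta) via the 2PL and lognormal measurement models.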

*In the estimation procedure, the following hyperparameters were used: Scale matrices **I0*

*and **γ*0*were chosen to be diagonal with elements 0.01 to indicate vague proper prior informa-*
**tion, and we set μ**_{I0}* = (1, 0, 1, 0) and γ*0

**= 0. Besides, a vague normal prior with parameters**

*μ*

*ρ*

*= 0 and σ*

*ρ*

^{2}

*= 10 was specified for ρ.*

The MCMC procedure was iterated 50,000 times and the first 5,000 iterations were discarded when the means and posterior standard deviations of the parameters were estimated.

The accuracy of the parameter estimates was investigated by comparing them to their true
values. The results for the parameters in the structural model are given in Table 1. Both the
estimates of the fixed parameters and the variance components are in close agreement with the
*true values. (Note that γ*00 *and γ*10 are zero due to the identifying restrictions.) Although not
shown here, the same close agreement was observed for the item parameter estimates.

*6.2. Sensitivity of the Bayes Factor*

Usually, we will have little prior information about the correlation of the person parameters. Therefore, it is important to know how the Bayes factor behaves for a relatively vague prior distribution of the correlation $\rho(\theta, \zeta) = \rho/\sqrt{\sigma_\theta^2\sigma_\zeta^2}$. In total, 500 data sets were simulated for different values of ρ ∈ [0, 1] and an empty structural model for the person parameters. All other