Nonparametric Item Response Function Estimation for Assessing Parametric Model Fit


Jeffrey Douglas and Allan Cohen
University of Wisconsin, Madison

Methods are developed that investigate the fit of parametric item response models by comparing them to models fitted under nonparametric assumptions. The approach is primarily graphical, but is made inferential through resampling from an estimated parametric model. The identifiability and estimation consistency of item response theory models are discussed and shown to be vital to the interpretation of differences between two fitted item response theory models. Simulation studies and real-data examples illustrate these techniques.

Index terms: goodness of fit, item response function, item response theory, kernel smoothing, nonparametric item response theory, nonparametric regression.

Unidimensional item response modeling requires accurate estimates of item response functions (IRFs). A standard approach is to use a computer program, such as BILOG (Mislevy & Bock, 1982) or XCALIBRE (Assessment Systems Corporation, 1996), to obtain parameter estimates from the two- or three-parameter logistic families of IRFs. [For an in-depth treatment of estimation techniques for parametric item response theory (IRT) models, see Baker (1992).] The two- and three-parameter logistic models (2PLM and 3PLM, respectively) are desirable because, when they fit the data, they provide substantial and convenient data reduction: only two or three parameters are required to reproduce the IRFs and item information functions.

However, when a parametric model is fitted to data, goodness of fit is always an issue. Several goodness-of-fit techniques have been developed for parametric IRT models (see Fischer & Molenaar, 1995, for methods to inspect goodness of fit for binary and polytomous Rasch models). Many of these methods are used to obtain a global significance test over all items and trait levels. Others test at the single-item level, but offer little information about the functional shape of departures from the parametric family of IRFs.

Comparing observed and expected proportions of correct responses at several quadrature points is one method that provides information about structural departures. BILOG supplies the necessary information. The distribution of a χ² statistic for the null hypothesis of model fit has been studied (Stone, 2000) using Monte Carlo simulation. The computer program MSP5 (Molenaar & Sijtsma, 2000) uses an approach similar to that presented here. MSP5 is used to fit nonparametric IRFs and to perform statistical testing of local deviations from monotonicity.

Purpose

Nonparametric IRF estimation is used here to detect items for which the constraints of parametric IRFs prove insufficiently flexible to adapt to the shape of the underlying IRFs. The methods presented are based on the empirical regression-based diagnostic methods of Lord (1970) and Kingston & Dorans (1985) and make extensive use of resampling methods (Azzalini, Bowman, & Härdle, 1989) to obtain approximate significance levels for the departure of nonparametric estimates from the parameterized family of functions. The shape of the IRF is considered only after determining that the assumptions of unidimensionality and local independence approximately hold. These assumptions should be investigated beforehand with such tools as factor analysis and DIMTEST (Stout, 1987).

Applied Psychological Measurement, Vol. 25 No. 3, September 2001, 234–243

Nonparametric IRF Estimation

IRFs cannot always be modeled well with parametric models such as the 3PLM and normal ogive models. This has led to research in the theory and applications of nonparametric IRT (Douglas, 1997; Junker & Ellis, 1997; Mokken & Lewis, 1982; Rosenbaum, 1984; Sijtsma & Junker, 1996; Sijtsma & Molenaar, 1987; Stout, 1987, 1990) and methods for estimating IRFs without restricting the functions to a parameterized functional form (Ramsay, 1988, 1991; Ramsay & Abrahamowicz, 1989; Ramsay & Winsberg, 1991; Samejima, 1979, 1981, 1984, 1988).

Many popular methods of nonparametric regression estimation are based on local averaging. These apply naturally to IRF estimation because P(θ) = E(Y | Θ = θ). Thus, a reasonable estimate of P(θ) can be obtained by taking a weighted average of the response variable Y over those examinees with trait levels close to θ. The weight assignment is determined by the scatterplot smoothing method used and by the smoothing parameter, which is selected either by the user or from the data so as to balance estimation bias against the variance of the estimate.

In practice, user-selected smoothing parameters are determined by viewing the fitted functions and deciding whether they are sufficiently smooth to be interpretable, yet not so smooth that subtle features are lost. Data-selected smoothing parameters are obtained by methods based on cross-validation (Rice, 1984). Hastie & Tibshirani (1990) provided a detailed survey and comparison of some popular nonparametric regression and scatterplot smoothing techniques. Ramsay (1991) provided a brief survey of selected nonparametric and semiparametric IRF estimation procedures, as well as a discussion of smoothing parameter selection.

In IRF estimation, the kth examinee’s trait level, $\theta_k$, cannot be observed. $\theta_k$ is needed to determine how much weight to assign examinee k’s response to item i, $Y_{ik}$, when forming the estimate $\hat{P}_{\mathrm{non},i}$ of the regression $P_i$ at θ. However, effective substitutes for $\theta_k$ can be obtained by transforming each examinee’s empirical percentile of the total score distribution, obtained after deleting the studied item, to the corresponding value on the scale determined by the latent trait distribution F (Ramsay, 1991). For example, if F is the standard normal distribution and an examinee has a total score at the 95th percentile, $\theta_k$ would be set to 1.645, the 95th percentile of the standard normal distribution.
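The percentile transformation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: F is taken to be standard normal, the function name `theta_substitutes` is hypothetical, ties in the rest score are broken by input order, and percentiles are computed as rank/(J + 1) so that they stay strictly between 0 and 1. None of these choices is specified in the article.

```python
from statistics import NormalDist


def theta_substitutes(responses, item):
    """Percentile-based stand-ins for theta, leaving out the studied item.

    responses: list of examinee rows, each a list of 0/1 item scores.
    item: index of the studied item, which is excluded from the total score.
    """
    J = len(responses)
    # Rest score: total score after deleting the studied item.
    rest = [sum(row) - row[item] for row in responses]
    # Rank examinees by rest score; ties keep input order (an assumption).
    order = sorted(range(J), key=lambda k: rest[k])
    std_normal = NormalDist()  # assumes F is the standard normal distribution
    theta = [0.0] * J
    for rank, k in enumerate(order, start=1):
        # rank / (J + 1) keeps every percentile strictly inside (0, 1).
        theta[k] = std_normal.inv_cdf(rank / (J + 1))
    return theta


# Four examinees, three items; item 0 is the studied item.
example = [[1, 1, 1], [0, 0, 0], [1, 0, 1], [0, 1, 0]]
theta_hat = theta_substitutes(example, item=0)
```

With a real dataset the many tied rest scores would need a deliberate tie rule (for example, midranks); the sketch leaves that open.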

Although any $\hat{P}_{\mathrm{non},i}$ obtained in this manner is an empirical regression estimate of $Y_i$ on a total score transformation, it can consistently estimate the regression $P_i$ of $Y_i$ on θ (Douglas, 1997).

Using the kernel-smoothing method (Ramsay, 1991), $\hat{P}_{\mathrm{non},i}$ takes the form

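$$\hat{P}_{\mathrm{non},i}(\theta) = \frac{\sum_{k=1}^{J} K\!\left(\frac{\theta - \hat{\theta}_k}{h}\right) Y_{ik}}{\sum_{k=1}^{J} K\!\left(\frac{\theta - \hat{\theta}_k}{h}\right)}, \tag{1}$$

where J is the number of examinees; $\hat{\theta}_k$ is the percentile-based substitute for examinee k’s trait level; K is a nonnegative symmetric kernel function, nonincreasing as its argument becomes further from zero; and h is the bandwidth parameter, selected by the user to control the amount of smoothing.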


Thus, $\hat{P}_{\mathrm{non},i}$ is a smoothly weighted average with the weights determined by K and h. Typical choices for K are Gaussian [K(u) = exp(−u²)] and uniform [K(u) = I(−1 < u < 1)]. The critical properties of a kernel function are that it must assign its highest values to points near 0.0 and decrease with distance from 0.0.
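As a rough illustration of this weighted average, the following Python sketch evaluates a kernel-smoothed estimate at a single point. The function and argument names are hypothetical, the Gaussian kernel is written exactly as quoted above, and `theta_hat` stands for whatever trait-level substitutes are being used.

```python
import math


def gaussian_kernel(u):
    # Gaussian kernel in the form quoted in the text: exp(-u^2).
    return math.exp(-u * u)


def uniform_kernel(u):
    # Uniform kernel: indicator of -1 < u < 1.
    return 1.0 if -1.0 < u < 1.0 else 0.0


def kernel_irf(theta, theta_hat, y, h, kernel=gaussian_kernel):
    """Kernel-weighted average of the 0/1 responses y at the point theta."""
    weights = [kernel((theta - t) / h) for t in theta_hat]
    total = sum(weights)
    if total == 0.0:
        # Can happen with the uniform kernel when no examinee is within h.
        raise ValueError("no examinees fall inside the kernel window")
    return sum(w * yk for w, yk in zip(weights, y)) / total


# Three examinees with substitutes -1, 0, 1 and responses 0, 1, 1.
estimate = kernel_irf(0.0, [-1.0, 0.0, 1.0], [0, 1, 1], h=0.5)
```

Evaluating `kernel_irf` over a grid of θ values gives the full curve; a smaller `h` lets nearby examinees dominate each estimate, which is the bias and variance trade-off the bandwidth controls.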

The bandwidth h is selected to obtain a desirable trade-off between estimation bias and variance (Härdle, 1990). As h decreases, the bias will decrease and the variance of the estimated function at each point will increase. If h increases, the reverse is true. Generally, h is selected to minimize mean squared error. However, it is common for researchers to select an h that results in an estimated function that achieves some minimal level of smoothness. This is determined by visually inspecting the estimated function. h might be smaller when large samples are available, because then many data points are available in a small window around each point of evaluation.

The computer program TESTGRAF (Ramsay, 2000) sets h to the default value $1.1n^{-.2}$. In the examples below, h was selected to be as small as possible without producing enough local variation (jaggedness) to cause difficulty in estimating the functions. In addition to TESTGRAF, kernel-smoothed IRF estimation also can be programmed with S-PLUS (Becker, Chambers, & Wilks, 1988) and other statistical software.
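To give a feel for the size of this default, the short Python sketch below evaluates the quoted formula for a few sample sizes. It takes n to be the number of examinees, which is an assumption about the notation, and the function name is hypothetical.

```python
def default_bandwidth(n_examinees):
    # Default quoted in the text: h = 1.1 * n^(-0.2).
    # n is assumed here to be the number of examinees.
    return 1.1 * n_examinees ** -0.2


for n in (250, 1000, 3000):
    print(n, round(default_bandwidth(n), 3))
```

The default shrinks slowly as the sample grows, consistent with the remark that smaller bandwidths become usable when more data fall near each evaluation point.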

For kernel-smoothed IRF estimates, Douglas (1997) showed that the greatest estimation error over all items within any open interval contained in the support of F converges to 0 with a probability of 1 as the test length and sample size increase. For reasonably long tests and large sample sizes, the nonparametric estimates should be close to the true IRFs $P_1, P_2, \ldots, P_n$. Large discrepancies between the nonparametric estimates and the parametric family of functions should indicate systematic bias in the parametric model assumptions. However, this does not rule out the possibility that there is some other distinct set of IRFs, $P_1^*, P_2^*, \ldots, P_n^*$, that results in the same joint distribution of the vector $(Y_1, Y_2, \ldots, Y_n)$ (i.e., the manifest distribution). In other words, the IRT model for the manifest distribution is not identifiable. IRT models are trivially unidentifiable, because when F is transformed a corresponding transformation of the IRF occurs, resulting in two different models but with the same manifest distribution.

Douglas (1999) showed that, for long tests, corresponding IRFs of two different IRT models with the same manifest distribution must be nearly identical. This means that it can be assumed that there is only one correct IRT model for a given choice of the θ distribution, and nonparametric methods can consistently estimate its IRFs. Thus, if nonparametric estimates are unlike parametric model estimates, a parametric model on the particular scale determined by F is an incorrect model for the data.

IRT Model Diagnosis

To examine the fit of the 3PLM, Lord (1970) plotted its estimates and nonparametric estimates. His method used over 100,000 examinees to obtain nonparametric estimates. Kingston & Dorans (1985) used a method based on an approach analogous to unweighted nearest neighbor smoothing with approximately 3,000 examinees to investigate the fit of the 3PLM locally. They observed where the parametric function fell within approximate asymptotic pointwise confidence bands around the nonparametric estimates. With recent advances in smoothing techniques, Lord’s model diagnosis approach can be applied to datasets of 500 examinees or fewer. Additionally, the development of resampling techniques makes computer-intensive procedures possible (Azzalini et al., 1989). These procedures can be used in place of asymptotic theory to construct hypothesis tests of departure from the parametric model with a user-selected significance level.


Graphical Inspection

Methods that provide an item-by-item assessment of the capability of a selected parametric family of functions to represent underlying IRFs begin with a graphical inspection of how and where a model does not fit an item. After this inspection, the user might be satisfied that the nonparametric functions are close enough to the parametric functions. However, if items are detected that (1) have humps, (2) are not monotone increasing, or (3) fail to asymptote properly, using this model could have severe consequences (see the examples below).

For example, maximum likelihood θ estimates derived from parametric IRF estimates cannot be correct when the probability values in the likelihood equation are misestimated. Also, when IRF smoothness cannot be described adequately by a few parameters, item information functions obtained from the derivatives of these functions are incorrect and misleading. When parametric and nonparametric estimates of the same IRFs do not agree, it is natural to ask whether discrepancies arose merely from chance variation in the data, or whether the mathematical forms of the IRFs were not adequately approximated by the parameterized functions. Resampling techniques, which show how much of the apparent misfit chance variation can account for, can help answer this question.

To diagnose the parametric model, plot each nonparametrically estimated IRF, $\hat{P}_{\mathrm{non},i}$, alongside a parametrically estimated IRF. For this comparison to be valid, it is critical that the same latent trait distribution F be used for both the nonparametric and parametric IRFs. For example, if marginal maximum likelihood (which typically assumes a normal distribution of θ) is selected to fit a parametric model (Baker, 1992), nonparametric estimates represented on this same distribution can be compared to them. If F is estimated as part of the parametric procedure instead of being fixed, the nonparametric IRFs also must use the parametrically estimated F.

Theoretically, $\hat{P}_{\mathrm{non},i}$ can resemble a parametric function and also look different from IRFs estimated by maximum likelihood or other techniques. For this reason, the distance of $\hat{P}_{\mathrm{non},i}$ from parametric IRFs is measured by finding its nearest approximation within the parametric family. For instance, if the presumed parametric family were the 2PLM, a function might be chosen with a logit that comprises the best-fitting line to the logit of $\hat{P}_{\mathrm{non},i}$.
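The line-fitting idea for the 2PLM can be sketched as an ordinary least-squares fit to the logit of the nonparametric estimate on a grid of θ values. The sketch below is illustrative only: the function name is hypothetical, the fit is unweighted, and estimates are clipped away from 0 and 1 (by a hypothetical `eps`) so that the logit stays finite.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def fit_2plm_to_logit(grid, p_non, eps=1e-3):
    """Least-squares line a + b*theta through the logit of p_non on grid."""
    # Clip estimates away from 0 and 1 so the logit is finite (assumption).
    z = [logit(min(max(p, eps), 1.0 - eps)) for p in p_non]
    n = len(grid)
    mean_t = sum(grid) / n
    mean_z = sum(z) / n
    sxx = sum((t - mean_t) ** 2 for t in grid)
    sxz = sum((t - mean_t) * (zi - mean_z) for t, zi in zip(grid, z))
    b = sxz / sxx             # slope of the fitted logit
    a = mean_z - b * mean_t   # intercept of the fitted logit
    return a, b


# Sanity check on a curve that already is a 2PLM with a = 0.5, b = 1.2.
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
p_non = [1.0 / (1.0 + math.exp(-(0.5 + 1.2 * t))) for t in grid]
a_hat, b_hat = fit_2plm_to_logit(grid, p_non)
```

A fit on the logit scale is convenient because it is linear in the parameters, but it does not by itself minimize a distance measured on the probability scale.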

Determining the Distance Between IRFs

Let $d(P, P^*)$ be a measure of the distance between two functions, and let $\hat{\beta}$ be a parameter vector that minimizes $d(P_\beta, \hat{P}_{\mathrm{non}})$ over all values of β in the parameter space. The resulting approximation to $\hat{P}_{\mathrm{non},i}$ is $P_{\hat{\beta},i}$. For some i, if $\hat{P}_{\mathrm{non},i}$ displays a large difference from $P_{\hat{\beta},i}$, it will display large differences from all parametric IRFs. Many choices for $d(P, P^*)$ are possible, but the root integrated squared error (RISE) is used here. RISE is simple to compute and has an interpretation much like the mean squared error of a parameter estimate. RISE is

$$\mathrm{RISE} = d(P, P^*) = \left\{ \int \left[ P(\theta) - P^*(\theta) \right]^2 f(\theta)\, d\theta \right\}^{1/2}. \tag{2}$$

RISE can be minimized by discretizing the IRFs over some finite grid of θ values and representing its square as a sum over those points. The Newton-Raphson algorithm (Press, Teukolsky, Vetterling, & Flannery, 1996) can be applied to minimize this over parameters.
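A discretized version of this minimization can be sketched as follows. The sketch makes several assumptions that are not in the article: the grid is evenly spaced, f is the standard normal density, the parametric family is the 2PLM, and Gauss-Newton steps are used as a simple stand-in for the Newton-Raphson iteration mentioned above. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def irf_2pl(theta, a, b):
    # 2PLM in slope-intercept form: logit P = a + b * theta.
    return 1.0 / (1.0 + math.exp(-(a + b * theta)))


def rise(p, p_star, grid, density):
    """Discretized RISE: sqrt of the sum of [P - P*]^2 f(theta) * step."""
    step = grid[1] - grid[0]  # assumes an evenly spaced grid
    total = sum((u - v) ** 2 * f for u, v, f in zip(p, p_star, density))
    return math.sqrt(total * step)


def nearest_2pl(p_non, grid, density, a=0.0, b=1.0, iters=50):
    """Gauss-Newton steps on the density-weighted squared error."""
    for _ in range(iters):
        # Build the 2x2 normal equations for the update (da, db).
        s_aa = s_ab = s_bb = g_a = g_b = 0.0
        for t, target, f in zip(grid, p_non, density):
            p = irf_2pl(t, a, b)
            d = p * (1.0 - p)   # dP/da; dP/db equals d * t
            r = target - p      # residual at this grid point
            s_aa += f * d * d
            s_ab += f * d * d * t
            s_bb += f * d * d * t * t
            g_a += f * d * r
            g_b += f * d * r * t
        det = s_aa * s_bb - s_ab * s_ab
        if det == 0.0:
            break
        da = (s_bb * g_a - s_ab * g_b) / det
        db = (s_aa * g_b - s_ab * g_a) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b


grid = [-3.0 + 0.1 * k for k in range(61)]
density = [NormalDist().pdf(t) for t in grid]
target = [irf_2pl(t, 0.3, 1.2) for t in grid]  # stands in for a nonparametric fit
a_hat, b_hat = nearest_2pl(target, grid, density)
distance = rise(target, [irf_2pl(t, a_hat, b_hat) for t in grid], grid, density)
```

Undamped Gauss-Newton steps can overshoot when the starting values are poor; starting from a line fitted to the logit of the nonparametric estimate is one way to reduce that risk.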

Bias and Variance of the Differences

Azzalini et al. (1989) used a form of bootstrapping to estimate the combined influence of bias and variance of statistics. Resampling can be used to separate deviation from the parametric model due to chance error from IRFs that have a different functional form.


Assume that the parametric model with parameters $\beta = (\beta_1, \beta_2, \ldots, \beta_n)$ is the true underlying latent variable model for the manifest distribution. Let $d_i$ be the distance $d(\hat{P}_{\mathrm{non},i}, P_{\hat{\beta},i})$, and let $G_\beta$ be the joint distribution function of $d_i$ (i = 1, 2, . . . , n) under the null hypothesis that the parametric model holds. If the true β were known, it would be a trivial matter to generate a large number of datasets with J examinees randomly drawn from F and their item responses drawn from Bernoulli distributions with probabilities $P_{\beta,i}(\theta_k)$ (i = 1, 2, . . . , n; k = 1, 2, . . . , J). Then, $G_\beta$ could be empirically determined by repeatedly calculating $d_i$ for each dataset. However, the smoothness of most IRT models and of $d_i$ guarantees that $G_\beta$ will not change much if β is slightly perturbed.

An effective substitute for an unknown β is the estimate $\hat{\beta} = (\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_n)$. As $\hat{\beta}$ converges to β, $G_{\hat{\beta}}$ converges to $G_\beta$. Thus, it is reasonable to approximate the null distribution of $d_i$ by generating samples from $G_{\hat{\beta}}$. Of course, the accuracy of $G_{\hat{\beta}}$ in approximating $G_\beta$ is governed by how well $\hat{\beta}$ estimates β. The accuracy in approximating $G_{\hat{\beta}}$ depends entirely on how much resampling is done.

Determining Significance Values

Approximate significance values for item-by-item tests of goodness of fit for the parametric model can be obtained with a three-step procedure:

1. From the original dataset, compute $d_1, d_2, \ldots, d_n$. Under the null hypothesis that the parametric model holds, $d_1, d_2, \ldots, d_n$ are drawn randomly from $G_\beta$.

2. Generate M datasets by selecting J values of θ from F and generating item responses from the parametric items $P_{\hat{\beta},i}$ that best approximate the nonparametric items. For the mth dataset (m = 1, 2, . . . , M), compute the distance measures $d_i^m$ (i = 1, 2, . . . , n) in the same manner as Step 1. Note that the $d_i^m$ are computed for each dataset and are randomly drawn from $G_{\hat{\beta}}$.

3. For each i, compute the rank ($r_i$) of $d_i$ among the $d_i^m$. If $r_i$ is quite large for some i, the deviance of $\hat{P}_{\mathrm{non},i}$ from $P_{\hat{\beta},i}$ computed with the data will appear to be much larger than usual for that difference computed under the null model. In this case, the model does not apply to item i.

The difference $1 - [r_i/(M + 1)]$ can be used as an approximate significance value to test this goodness-of-fit hypothesis.
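The final step, turning a rank into an approximate significance value, can be illustrated with a few lines of Python. The function name is hypothetical, and the rank convention (counting the resampled distances that fall strictly below the observed one) is an assumption, since the text does not say how the rank is indexed.

```python
def resampling_p_value(d_obs, d_null):
    """Approximate significance value 1 - r / (M + 1) for a single item.

    d_obs: distance computed from the original data.
    d_null: the M distances computed from the resampled datasets.
    """
    M = len(d_null)
    # r counts resampled distances strictly below the observed distance
    # (an assumed rank convention; ties are unlikely with continuous distances).
    r = sum(1 for d in d_null if d < d_obs)
    return 1.0 - r / (M + 1)


# Toy illustration with M = 4 resampled distances.
p_small_misfit = resampling_p_value(0.045, [0.02, 0.03, 0.05, 0.04])
p_large_misfit = resampling_p_value(0.10, [0.02, 0.03, 0.05, 0.04])
```

Under this convention the smallest attainable value is 1/(M + 1), so M bounds the resolution of the test; with M = 500 the floor is about .002.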

This method can be implemented using S-PLUS. The function KSMOOTH can be used to obtain nonparametric estimates, $\hat{P}_{\mathrm{non}}$. Functions such as NLMIN and LSFIT can be used for finding the best parametric approximating functions when the distance functions are nonlinear or linear in the parameters, respectively. The resampling approach merely requires performing these operations repeatedly, randomly generating new datasets from the parametric model in each iteration. Using conditional independence and knowledge of the IRFs, datasets can be generated by first selecting values of θ using the function RNORM (if F is assumed to be normal) and then determining whether randomly drawn uniformly distributed variables obtained from the function RUNIF are less than the values $P_{\hat{\beta},i}$, to decide if the ith item response $Y_i$ is 1 or 0. (S-PLUS software to implement this procedure is available from the authors.)
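The data-generation step translates directly to other languages. The Python sketch below is not the authors' S-PLUS code; it assumes a standard normal F and 2PLM items in slope-intercept form, and the function name and parameter values are hypothetical.

```python
import math
import random


def simulate_responses(item_params, J, rng):
    """Draw J examinees from N(0, 1) and score every item as 0 or 1.

    item_params: list of (a, b) pairs with logit P = a + b * theta.
    rng: a random.Random instance, passed in so runs can be reproduced.
    """
    data = []
    for _ in range(J):
        theta = rng.gauss(0.0, 1.0)  # plays the role of RNORM
        row = []
        for a, b in item_params:
            p = 1.0 / (1.0 + math.exp(-(a + b * theta)))
            # The response is 1 when a uniform draw falls below P(theta),
            # mirroring the RUNIF comparison; items are drawn independently
            # given theta (conditional independence).
            row.append(1 if rng.random() < p else 0)
        data.append(row)
    return data


rng = random.Random(0)
dataset = simulate_responses([(0.0, 1.0), (-0.5, 2.0)], J=1000, rng=rng)
```

Calling `simulate_responses` M times with the fitted item parameters, and recomputing the distances on each dataset, gives the resampled values used for the rank comparison.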

Simulated Data Examples

Study 1

Method. Data were simulated for 1,000 examinees with θs drawn from a standard normal distribution responding to 16 items. The first twelve items ($P_1, P_2, \ldots, P_{12}$) were from the 2PLM:

$$\log\left\{ \frac{P_i(\theta)}{1 - P_i(\theta)} \right\} = a_i + b_i\theta, \quad \beta_i = (a_i, b_i), \quad i = 1, \ldots, 12. \tag{3}$$


The remaining items, $P_{13}, \ldots, P_{16}$, were selected to have a mathematical form capable of creating humps and multiple inflection points, although they were monotone increasing. For $P_{13}$–$P_{16}$, a cubic form for the logit was selected,

$$\log\left\{ \frac{P_i(\theta)}{1 - P_i(\theta)} \right\} = a_i + b_i\theta + c_i\theta^3, \quad i = 13, \ldots, 16, \tag{4}$$

where $b_i$ and $c_i$ were positive values.

Intercept parameters were randomly drawn from a uniform distribution on the interval (−1, 1). Slope parameters were randomly drawn from a uniform distribution on the interval (1, 2.5), and the coefficient for the cubic term of $P_{13}$–$P_{16}$ was set to .75.
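The two item forms used in this study can be written down directly. The Python sketch below is illustrative only: the function names and the particular values of a and b are hypothetical, while the cubic coefficient of .75 follows the text. It also checks numerically that a cubic logit with positive slope and cubic coefficients is monotone increasing, since its derivative, b + 3cθ², is then positive everywhere.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def irf_2pl(theta, a, b):
    # Linear logit: a + b * theta.
    return sigmoid(a + b * theta)


def irf_cubic(theta, a, b, c=0.75):
    # Cubic logit: a + b * theta + c * theta^3.
    return sigmoid(a + b * theta + c * theta ** 3)


grid = [-3.0 + 0.25 * k for k in range(25)]
cubic = [irf_cubic(t, a=0.2, b=1.5) for t in grid]
# Monotone increasing on the grid because b > 0 and c > 0.
is_monotone = all(x <= y for x, y in zip(cubic, cubic[1:]))
```

The two forms agree at θ = 0 and separate as |θ| grows, where the cubic term steepens the curve.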

The null hypothesis was that the 2PLM held, and an analysis was conducted to attempt to distinguish those items that had a 2PLM form from those that did not. Each nonparametric IRF estimate and its nearest 2PLM approximation was obtained, along with corresponding RISE values [$d(\hat{P}_{\mathrm{non},i}, P_{\hat{\beta},i})$]. For nonparametric estimation, a bandwidth of .20 was used, along with a Gaussian kernel function. Significance values were computed using M = 500 resampled datasets generated from $\hat{\beta}$ and a standard normal θ distribution.

Results. For Items 1–12, RISE values ranged from .017 to .076; the smallest p was .108 (Item 11). The items with the largest RISE were those that did not follow the 2PLM. RISE for these items ranged from .078 to .118. The p value for each of these four items was less than .03.

Study 2

A second simulation illustrated a limitation of this approach.

Method. Items 1–12 again were from the 2PLM and Items 13–16 had a cubic term in the logit. Item parameters were drawn as described for Study 1. However, the sample size was only 250, with θ values again generated from the standard normal distribution. Accordingly, a larger bandwidth of .50 was selected.

Results. Three of the four cubic IRFs had p < .10, and only one of the twelve 2PLM IRFs had p < .10. With this smaller sample size, it became more difficult to visually interpret how the items differed in functional form. The estimates were incapable of representing all the curves and bends in the cubic IRFs. The bias of the estimates could be reduced by selecting a smaller bandwidth, but this would create considerable variance in the estimates at each point, which would further complicate the interpretation.

Real-Data Examples

Example 1

Method. This example used data from a University of Wisconsin mathematics placement test consisting of 32 items designed to assess skills in arithmetic and elementary algebra. The dataset consisted of item responses from 1,019 students.

Nonparametric estimates were fitted using a bandwidth of .20 and a Gaussian kernel. RISE values were computed based on the best parametric approximation to the nonparametric fit. Significance values were computed using M = 500 resampled datasets. The θ distribution was assumed to be standard normal.

Results. The fit of the 2PLM generally was good. All RISE values were less than .060. Only four items had p < .10, and three of these had p < .05. Figure 1 shows, for the item with the smallest p (.014), that item’s $\hat{P}_{\mathrm{non}}$ and its nearest approximation, $P_{\hat{\beta}}$. The results show that the item might not have an asymptote at 0, as is required of the 2PLM; it might have a sharper asymptote at 1 at the high end of the θ distribution. However, in the absence of a very large sample, the high local variability of the nonparametric estimates limits the possibility of making definitive conclusions about such structural departures.

Figure 1. $\hat{P}_{\mathrm{non}}(\theta)$ and $P_{\hat{\beta}}(\theta)$ for the Item With p = .014

Example 2

Method. Data were obtained from 379 students in an introductory psychology course at McGill University (Ramsay, 2000). The test consisted of 100 multiple-choice items. Because of the small sample size, a bandwidth of .45 was used. The θ distribution was assumed to be standard normal.

Results. Six items had RISE values greater than .50 and p < .05 when compared to IRFs from the 2PLM. These items are shown in Figure 2. Item 46 (Figure 2c) displayed some nonmonotonicity, and Item 96 (Figure 2f) displayed severe nonmonotonicity. For Items 38 (Figure 2a) and 74 (Figure 2d), a guessing parameter might improve the fit. Items 40 (Figure 2b) and 78 (Figure 2e) had sharp initial rises, flattened in the middle of the θ distribution, and then sharply rose. The 2PLM is incapable of modeling this behavior.

Conclusions

The total-score-based estimates of θ used as the covariate in kernel smoothing are more reliable for long tests than for short tests. For long tests, they nearly eliminate the estimation bias and variance that arise from estimating covariates with error. As a result, the nonparametric IRF estimates and p values obtained from resampling are more precise for tests with 25 or more items. The primary advantage of this method is that it allows visual inspection of the nature of item misfit. As shown in the real-data examples, this might be due to the omission of a guessing parameter, nonmonotonicity, or more subtle changes in concavity that require several parameters to fit.

The methods described here are general and can be applied to virtually any presumed parametric family of IRFs. They also can easily be extended to polytomous IRT models: kernel smoothing can be used to fit each option response function. The remaining steps, finding the best approximations in the parametric family and resampling, can be done as described.

Figure 2. $\hat{P}_{\mathrm{non}}(\theta)$ and $P_{\hat{\beta}}(\theta)$ for Six Items: (a) Item 38, (b) Item 40, (c) Item 46, (d) Item 74, (e) Item 78, (f) Item 96

References

Assessment Systems Corporation (1996). XCALIBRE marginal maximum likelihood estimation program, Version 1.10 [Computer program]. St. Paul MN: Author.

Azzalini, A., Bowman, A. W., & Härdle, W. H. (1989). On the use of nonparametric regression for model checking. Biometrika, 76, 1–11.

Baker, F. B. (1992). Item response theory parameter estimation techniques. New York: Marcel Dekker.

Becker, R. A., Chambers, J. M., & Wilks, A. R. (1988). The new S language: A programming environment for data analysis and graphics. Belmont CA: Wadsworth.

Douglas, J. (1997). Joint consistency of nonparametric item characteristic curve and ability estimation. Psychometrika, 62, 7–28.

Douglas, J. (1999). Asymptotic identifiability of nonparametric item response models (Technical Report No. 142). University of Wisconsin, Department of Biostatistics and Medical Informatics.

Fischer, G. H., & Molenaar, I. W. (Eds.). (1995). Rasch models: Foundations, recent developments, and applications. New York: Springer-Verlag.

Härdle, W. (1990). Applied nonparametric regression. London: Chapman & Hall.

Hastie, T. J., & Tibshirani, R. J. (1990). Generalized additive models. London: Chapman & Hall.

Junker, B., & Ellis, J. (1997). A characterization of monotone unidimensional latent variable models. Annals of Statistics, 25, 1327–1343.

Kingston, N. M., & Dorans, N. J. (1985). The analysis of item-ability regressions: An exploratory IRT model fit tool. Applied Psychological Measurement, 9, 281–288.

Lord, F. M. (1970). Item characteristic curves estimated without knowledge of their mathematical form: A confrontation of Birnbaum’s logistic model. Psychometrika, 35, 43–50.

Mislevy, R. J., & Bock, R. D. (1982). BILOG: Item analysis and test scoring with binary logistic models [Computer program]. Mooresville IN: Scientific Software.

Mokken, R., & Lewis, C. (1982). A nonparametric approach to the analysis of dichotomous item responses. Applied Psychological Measurement, 6, 417–430.

Molenaar, I. W., & Sijtsma, K. (2000). MSP5 for Windows, a program for Mokken scale analysis of polytomous items. Groningen, The Netherlands: ProGAMMA.

Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1996). Numerical recipes in FORTRAN 90: The art of parallel scientific computing. New York: Cambridge University Press.

Ramsay, J. O. (1988). Monotone regression splines in action. Statistical Science, 3, 425–441.

Ramsay, J. O. (1991). Kernel smoothing approaches to nonparametric item characteristic curve estimation. Psychometrika, 56, 611–630.

Ramsay, J. O. (2000). TESTGRAF: A program for the graphical analysis of multiple choice test and questionnaire data [Computer program]. Available from www.psych.mcgill.ca/faculty/ramsay.html.

Ramsay, J. O., & Abrahamowicz, M. (1989). Binomial regression with monotone splines: A psychometric application. Journal of the American Statistical Association, 84, 906–915.

Ramsay, J. O., & Winsberg, S. (1991). Maximum marginal likelihood estimation for semiparametric item analysis. Psychometrika, 56, 365–379.

Rice, J. (1984). Bandwidth choice for nonparametric regression. Annals of Statistics, 12, 1215–1230.

Rosenbaum, P. R. (1984). Testing the conditional independence and monotonicity assumptions of item response theory. Psychometrika, 49, 425–435.

Samejima, F. (1979). A new family of models for the multiple-choice item (Research Report No. 79-4). Knoxville TN: University of Tennessee, Department of Psychology.

Samejima, F. (1981). Efficient methods of estimating the operating characteristic of item response categories and a challenge to a new model for the multiple-choice item. Knoxville TN: University of Tennessee.

Samejima, F. (1984). Plausibility functions of the Iowa vocabulary test items estimated by the simple sum procedure of the conditional P.D.F. approach (Research Report 84-1). Knoxville TN: University of Tennessee, Department of Psychology.

Samejima, F. (1988). Advancement of latent trait theory (Research Report No. 79-4). Knoxville TN: University of Tennessee, Department of Psychology.

Sijtsma, K., & Junker, B. (1996). A survey of theory and methods of invariant item ordering. British Journal of Mathematical and Statistical Psychology, 49, 79–105.

Sijtsma, K., & Molenaar, I. (1987). Reliability of test scores in nonparametric item response theory. Psychometrika, 52, 79–97.

Stone, C. A. (2000). Monte Carlo based null distribution for an alternative goodness-of-fit test statistic in IRT models. Journal of Educational Measurement, 37, 58–75.

Stout, W. F. (1987). A nonparametric approach for assessing latent trait dimensionality. Psychometrika, 52, 589–617.

Stout, W. F. (1990). A new item response theory modeling approach with applications to unidimensionality assessment and ability estimation. Psychometrika, 55, 293–326.

Acknowledgments

This research was supported in part by the National Institutes of Health, Grant No. R01 CA81068-01.

Author’s Address

Send requests for reprints or further information to Jeffrey Douglas, Department of Biostatistics and Medical Informatics, K6/436 CSC, 600 Highland Avenue, Madison WI 53792, U.S.A. Email: [email protected].