The number of replications is limited to 100 in our study, given that obtaining parameter estimates takes 1 to 5 minutes per replication. By contrast, it takes one to two weeks for a Markov chain Monte Carlo (MCMC) chain to reach the posterior distribution of the parameters under Bayesian estimation. For some replications, convergence problems occur under the MRM fitting but not under the MRM-RT fitting. We suspect that, in the current simulation condition, classifying examinees relies heavily on the response time information. Without that information, the MRM cannot accurately classify examinees into the RG class because of its small class size, which in turn causes the convergence problems. The percentages of replications that converge under the MRM fitting for the different sample sizes are reported in Table 3. The percentage increases as the sample size becomes larger, which is reasonable because, with a large sample, the latent classes are large enough to be distinguished. Only those replications that converge are used to summarize the simulation results. More specifically, to screen out replications in which estimation fails to converge, we retain only those replications whose item difficulty estimates all range from -5 to 5. We summarize the estimation results from a total of 100 replications that satisfy this criterion for the MRM fitting.
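The screening rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the difficulty estimates of all replications are stored in an array of shape (replications, items), the function name `screen_replications` and the toy values are hypothetical, and the bounds are treated as exclusive, which the text does not state explicitly.

```python
import numpy as np

def screen_replications(difficulty_estimates, lower=-5.0, upper=5.0):
    """Return a boolean mask marking replications whose item difficulty
    estimates all lie strictly inside (lower, upper)."""
    est = np.asarray(difficulty_estimates, dtype=float)
    # NaN estimates (e.g., from a failed fit) fail both comparisons,
    # so such replications are screened out as well.
    return np.all((est > lower) & (est < upper), axis=1)

# Toy example (made-up numbers): 4 replications, 3 items.
estimates = np.array([
    [-2.4, 0.1, 1.9],    # kept
    [-2.6, 0.3, 7.8],    # dropped: one estimate above 5
    [-9.1, 0.0, 2.1],    # dropped: one estimate below -5
    [-2.3, -0.2, 2.0],   # kept
])
keep = screen_replications(estimates)
retained = estimates[keep]          # replications used in the summaries
convergence_rate = keep.mean()      # share of replications passing the screen
```

The share of replications passing the screen corresponds to the kind of percentage reported in Table 3.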
The estimation performance for the proportions of the three classes, the ability mean of the HARF class, and the ability variances of the SB and HARF classes is summarized in Table 4. Comparing the mean of the estimates with the true parameter values, the empirical bias decreases as the sample size increases. The SD and MSE exhibit the same pattern: estimation becomes more precise as the sample size increases, as expected. Based on Table 4, the MRM does not seem to recover the proportions of the three classes, πRG, πSB, and πHARF, well, especially when the sample size is only 250. Moreover, under the MRM the estimated ability mean of the HARF class does not seem to approach its true value even when the sample size increases to 2000. Comparing the results of the MRM-RT with those of the MRM, the empirical bias and the SD are generally smaller under the former. That is, the response time information helps capture the test-taking behavior of the examinees in the RG class and consequently improves the estimation with respect to bias and relative efficiency in finite samples.
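The summary statistics discussed above can be computed as in the following sketch. It is an illustration under stated assumptions rather than the authors' procedure: the function name and the toy estimates are hypothetical, and the SD uses a divisor of R (the number of replications) so that MSE = bias² + SD² holds exactly; the text does not say which divisor was used.

```python
import numpy as np

def summarize_estimates(estimates, true_value):
    """Summarize Monte Carlo estimates of one parameter across replications."""
    est = np.asarray(estimates, dtype=float)
    mean = est.mean()
    bias = mean - true_value                  # empirical bias
    sd = est.std(ddof=0)                      # divisor R, see lead-in
    mse = np.mean((est - true_value) ** 2)    # mean squared error
    return {"mean": mean, "bias": bias, "sd": sd, "mse": mse}

# Toy estimates of a class proportion whose true value is 0.55 (made-up numbers)
summary = summarize_estimates([0.49, 0.52, 0.60, 0.47, 0.57], true_value=0.55)
```

The entries of Table 4 appear consistent with this decomposition. For πSB under the MRM-RT with a sample size of 250, the mean is 0.493 against a true value of 0.55 and the SD is 0.114, so bias² + SD² is roughly 0.0032 + 0.0130, or about 0.016, which agrees with the reported MSE.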
The estimation performance for the item difficulty parameters, for each sample size and for the SB and HARF classes, is summarized in Tables 5 to 12. The empirical bias, SD, and MSE all decrease as the sample size increases. In addition, the estimates from the MRM-RT are better than those from the MRM with respect to both SD and MSE. For example, Table 7 shows that the SD and MSE of each item are smaller under the MRM-RT than under the MRM. However, the empirical bias from the MRM-RT is not always smaller than that from the MRM, especially in small samples. More specifically, Table 5 shows that the empirical bias from the MRM is smaller than that from the MRM-RT for items 4, 10, 12, 13, 14, 16, 17, 22, and 24 for samples of size 250. As the sample size increases to 1000, the empirical bias for almost all items is smaller under the MRM-RT than under the MRM.
For example, Table 9 shows that only item 1 has a smaller empirical bias under the MRM than under the MRM-RT, and Table 12 shows that the empirical bias for all items is smaller under the MRM-RT than under the MRM. As a further illustration, the difficulty parameter of item 25 is 1.9, and its mean estimates for the HARF class under the MRM-RT are 2.29, 2.09, 1.94, and 1.95, respectively, as the sample size increases from 250 to 2000. Its mean estimates under the MRM, however, are 2.54, 2.32, 2.32, and 2.19, respectively. In other words, the empirical bias under the MRM-RT is clearly much smaller than that under the MRM as the sample size increases.
Table 3: Percentages of Replications that Converge under MRM Fittings
Sample size 250 500 1000 2000
Percentage passing the (-5, 5) criterion 7.45% 14% 17% 27%
In addition, the SD and MSE of the 10 DIF items for the HARF class are clearly larger than those of the other items under the MRM-RT when the sample size is small. As the sample size increases, the SD and MSE of these 10 DIF items no longer differ from those of the other items. Table 6, with a sample size of 250, shows that the SD and MSE of item 19 are 0.55 and 0.52, respectively, and those of item 18 are 0.45 and 0.25. Table 10, with a sample size of 1000, shows that the SD and MSE of item 19 are 0.29 and 0.14, respectively, and those of item 18 are 0.27 and 0.08. That is, the stability of the mean item difficulty estimates under MRM increases as the sample size increases.
Under the MRM, the item difficulty parameters of the classes with larger proportions can be estimated when the sample size is 1000, and the empirical bias, SD, and MSE of the estimates clearly decrease. In addition, the empirical bias, SD, and MSE of the difficulty parameter estimates of the DIF items under the MRM-RT clearly decrease when the sample size is 1000. Overall, these results show that the MRM-RT recovers the parameters very well, and better than the MRM, using MLR estimation under the mixture SEM. Especially with a small sample size, the MRM-RT has a clear advantage over the MRM, because the response time helps to disregard (or account for) the responses of the examinees in the RG class and thereby yields better estimates of the item difficulty parameters.
Table 4: Mean, SD, and MSE of the Estimates for Three Class Sizes, Mean Ability of HARF Class, and Ability Variances of SB and HARF Classes, under MRM-RT and MRM Fittings
MRM-RT MRM
N statistic πRG πSB πHARF σ²θSB µθHARF σ²θHARF πRG πSB πHARF σ²θSB µθHARF σ²θHARF
Parameter 0.15 0.55 0.3 1 0.5 0.65 0.15 0.55 0.3 1 0.5 0.65
250 mean 0.144 0.493 0.364 0.899 0.603 0.769 0.142 0.458 0.4 0.968 0.52 1.209
250 SD 0.023 0.114 0.115 0.195 0.447 0.235 0.027 0.219 0.22 0.531 0.735 0.907
250 MSE 0.00058 0.016 0.017 0.048 0.211 0.069 0.00079 0.056 0.056 0.283 0.541 1.136
500 mean 0.15 0.512 0.339 0.93 0.519 0.706 0.15 0.425 0.424 0.887 0.463 0.972
500 SD 0.015 0.097 0.095 0.167 0.28 0.169 0.019 0.192 0.193 0.392 0.647 0.407
500 MSE 0.00024 0.011 0.011 0.033 0.079 0.032 0.00035 0.045 0.045 0.166 0.42 0.269
1000 mean 0.149 0.538 0.313 0.991 0.494 0.662 0.149 0.462 0.389 0.885 0.502 0.936
1000 SD 0.011 0.057 0.058 0.117 0.272 0.114 0.012 0.164 0.163 0.295 0.619 0.359
1000 MSE 0.00011 0.003 0.004 0.014 0.074 0.013 0.00016 0.043 0.042 0.100 0.384 0.211
2000 mean 0.149 0.543 0.308 0.989 0.511 0.662 0.149 0.464 0.387 0.895 0.554 0.787
2000 SD 0.008 0.045 0.044 0.08 0.219 0.086 0.01 0.164 0.163 0.336 0.623 0.37
2000 MSE 0.00007 0.002 0.002 0.007 0.048 0.008 0.00010 0.035 0.037 0.124 0.392 0.156
RG = rapid-guessing; SB = solution behavior class; HARF = high ability and/or respond with familiarity; SD = standard deviation; MSE = mean squared error.
Table 5: Mean, SD, and MSE of Difficulty Parameter Estimates for the SB Class with Sample Size of 250, under MRM-RT and MRM Fittings
MRM-RT MRM
item Parameter mean SD MSE mean SD MSE
1 -2.4 -2.43 0.30 0.09 -2.47 0.36 0.13
SD = standard deviation; MSE = mean squared error.
Table 6: Mean, SD, and MSE of Difficulty Parameter Estimates for the HARF Class with Sample Size of 250, under MRM-RT and MRM Fittings
MRM-RT MRM
item Parameter mean SD MSE mean SD MSE
1 -2.4 -2.43 0.30 0.09 -2.47 0.36 0.13
SD = standard deviation; MSE = mean squared error.
Table 7: Mean, SD, and MSE of Difficulty Parameter Estimates for the SB Class with Sample Size of 500, under MRM-RT and MRM Fittings
MRM-RT MRM
item Parameter mean SD MSE mean SD MSE
1 -2.4 -2.40 0.20 0.04 -2.41 0.35 0.12
SD = standard deviation; MSE = mean squared error.
Table 8: Mean, SD, and MSE of Difficulty Parameter Estimates for the HARF Class with Sample Size of 500, under MRM-RT and MRM Fittings
MRM-RT MRM
item Parameter mean SD MSE mean SD MSE
1 -2.4 -2.40 0.20 0.04 -2.41 0.35 0.12
SD = standard deviation; MSE = mean squared error.
Table 9: Mean, SD, and MSE of Difficulty Parameter Estimates for the SB Class with Sample Size of 1000, under MRM-RT and MRM Fittings
MRM-RT MRM
item Parameter mean SD MSE mean SD MSE
1 -2.4 -2.42 0.14 0.02 -2.40 0.29 0.09
SD = standard deviation; MSE = mean squared error.
Table 10: Mean, SD, and MSE of Difficulty Parameter Estimates for the HARF Class with Sample Size of 1000, under MRM-RT and MRM Fittings
MRM-RT MRM
item Parameter mean SD MSE mean SD MSE
1 -2.4 -2.42 0.14 0.02 -2.40 0.29 0.09
SD = standard deviation; MSE = mean squared error.
Table 11: Mean, SD, and MSE of Difficulty Parameter Estimates for the SB Class with Sample Size of 2000, under MRM-RT and MRM Fittings
MRM-RT MRM
item Parameter mean SD MSE mean SD MSE
1 -2.4 -2.41 0.11 0.01 -2.39 0.29 0.08
SD = standard deviation; MSE = mean squared error.
Table 12: Mean, SD, and MSE of Difficulty Parameter Estimates for the HARF Class with Sample Size of 2000, under MRM-RT and MRM Fittings
MRM-RT MRM
item Parameter mean SD MSE mean SD MSE
1 -2.4 -2.41 0.11 0.01 -2.39 0.29 0.08
SD = standard deviation; MSE = mean squared error.
4 Applied Data Analysis
In this section, the real data of Meyer (2010) are reanalyzed with mixture SEM.
Meyer (2010) compared the one- and two-class models and showed that the two-class model fitted better. However, he suggested the possibility of adding one more class to account for some members of the rapid-guessing class who seem to spend more time responding than the others in the same class, yet less time than those in the solution behavior class. Under the current mixture SEM framework, the model can be easily extended to a three-class model and readily estimated with MLR. In this study, the two- and three-class models are compared to examine the necessity of including this additional class, and the characteristics of each class of examinees are investigated to better characterize the test-taking behavior.
4.1 Data
The original data were taken from a Spring 2004 administration of the Information Literacy Test (ILT). Participants included 524 college sophomores who completed the computerized testing. The number of items on the test is 60. Cronbach’s alpha was 0.88 for this test (Meyer, 2010). The response data are dichotomous, and the unit of response time is in seconds.
Meyer (2010) considered two types of test-taking behavior in the data, namely rapid-guessing and solution behavior. However, his results showed that the two-class model could not account for the much longer response times of some members of the rapid-guessing class. Thus, we first consider a three-class model with the rapid-guessing class and two solution behavior classes, denoted SB 1 and SB 2. Second, the two-, three-, and four-class models are compared using the Akaike information criterion (AIC; Akaike, 1987), the Bayesian information criterion (BIC; Schwarz, 1978), and the sample-size adjusted BIC (Sclove, 1987), such that
AIC = −2 log(L(u, t∗)) + 2k, (17)
BIC = −2 log(L(u, t∗)) + k log(N), (18)
Adjusted BIC = −2 log(L(u, t∗)) + k log((N + 2)/24), (19)
where N is the number of examinees in the data, and k is the number of parameters, computed by adding the number of means and variances of the item response times, the number of item difficulty parameters, and the number of means and variances of the ability distributions for all the classes, plus the number of mixing proportions minus one. The model with the smallest AIC, BIC, and Adjusted BIC is preferred as offering the best balance between fit and parsimony. Finally, we explore the characteristics of each class of examinees.
Table 13: Model Fit Indices of the Two-Class, Three-Class, and Four-Class Models
two-class three-class four-class
AIC 78772.401 73850.126 75974.571
BIC 80059.371 75912.688 78812.725
Adjusted BIC 79100.750 74376.354 76698.678
Table 14: Estimates of Class Sizes of the Three-Class Model
class proportion SE
RG 0.095 0.013
SB 1 0.453 0.035
SB 2 0.452 0.036
RG = rapid-guessing class;
SB 1 = solution behavior class 1;
SB 2 = solution behavior class 2.
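The three criteria in Equations (17) to (19) are straightforward to compute once the maximized log-likelihood, k, and N are known. The sketch below is illustrative only: the function name is hypothetical, and the log-likelihood and k passed in the example are made-up inputs rather than values from the study (only N = 524 is taken from the data description).

```python
import math

def information_criteria(log_likelihood, k, n):
    """AIC, BIC, and sample-size adjusted BIC as in Equations (17)-(19)."""
    deviance = -2.0 * log_likelihood
    return {
        "AIC": deviance + 2 * k,
        "BIC": deviance + k * math.log(n),
        "Adjusted BIC": deviance + k * math.log((n + 2) / 24),
    }

# Hypothetical log-likelihood and parameter count; N = 524 as in the ILT data
ic = information_criteria(log_likelihood=-1000.0, k=10, n=524)
```

Because BIC − AIC = k(log N − 2), the values in Table 13 with N = 524 imply roughly k = 302 for the two-class model and k = 484 for the three-class model. This is a back-calculation from the reported indices, not a figure stated in the text.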
4.2 Results
As shown in Table 13, all fit indices indicate that the three-class model fits better than the two-class and four-class models. Table 14 shows the estimated proportion of each class in the three-class model. In the three-class model, the estimated size of the rapid-guessing class (0.095) is smaller than the estimate of 15% in Meyer (2010).
Table 15: Estimates of Mean Response Time for SB 1 and SB 2
class SB 1 SB 2 SB 1 SB 2
item estimate SE estimate SE item estimate SE estimate SE
1 22.10 0.64 16.19 0.52 31 13.66 0.49 10.24 0.26
2 19.80 0.59 15.28 0.46 32 10.03 0.27 8.14 0.25
3 43.88 1.23 28.52 0.97 33 61.34 1.84 49.56 1.04
4 16.77 0.52 12.25 0.34 34 55.45 1.83 41.04 1.48
5 15.45 0.63 10.09 0.29 35 50.13 1.70 31.19 1.37
6 19.01 0.86 12.41 0.35 36 23.42 0.75 13.47 0.63
7 10.43 0.37 7.55 0.20 37 58.35 1.93 34.86 1.71
8 13.14 0.50 9.90 0.28 38 63.18 2.91 34.10 1.81
9 25.12 0.80 16.99 0.53 39 26.38 0.95 17.08 0.68
10 23.07 0.85 15.09 0.36 40 45.78 1.97 28.19 1.18
11 14.02 0.43 10.11 0.29 41 32.20 1.32 20.78 0.83
12 21.20 0.89 14.66 0.37 42 18.53 0.67 12.13 0.39
13 34.34 0.86 23.16 0.79 43 29.80 0.86 18.99 0.70
14 14.61 0.47 10.06 0.30 44 43.66 1.44 26.73 1.15
15 40.92 1.31 25.99 0.86 45 34.65 1.77 19.50 0.96
16 45.00 1.67 25.72 1.26 46 22.91 0.83 16.91 0.80
17 48.84 1.51 34.89 0.73 47 17.83 0.86 10.61 0.51
18 13.18 0.36 10.25 0.29 48 48.86 2.25 26.36 1.61
19 8.68 0.31 6.24 0.14 49 26.17 0.89 17.18 0.64
20 24.92 0.77 17.52 0.58 50 41.66 1.42 25.13 1.58
21 19.43 0.68 13.95 0.36 51 13.59 0.42 10.32 0.30
22 31.97 1.34 20.58 0.51 52 17.30 0.55 12.24 0.42
23 47.03 1.83 27.27 1.17 53 21.72 0.56 14.73 0.60
24 48.70 1.75 24.93 1.55 54 9.14 0.27 7.02 0.20
25 30.56 1.10 18.70 0.69 55 17.77 0.48 12.40 0.47
26 21.11 0.89 13.57 0.56 56 29.59 0.80 19.57 0.82
27 23.88 1.10 14.67 0.62 57 16.85 0.52 12.18 0.48
28 10.97 0.44 7.63 0.26 58 19.84 0.60 13.21 0.52
29 12.83 0.41 9.46 0.31 59 14.56 0.44 10.12 0.36
30 12.27 0.37 9.09 0.23 60 16.87 0.49 11.00 0.43
SB 1 = solution behavior class 1; SB 2 = solution behavior class 2.
As shown in Table 15, the mean response times of all items are smaller for SB 2 than for SB 1 in the three-class model. The estimates of the item difficulty parameters for SB 1 and SB 2 are reported in Table 16. Considering the magnitudes of the standard errors of those difficulty estimates, the differences in the difficulty parameter estimates between the two classes might not be statistically significant. For example, the estimates of the difficulty parameter of item 5 are -0.75 and -1.269 for SB 1 and SB 2, respectively, but the corresponding standard errors are 0.149 and 0.506. In other words, the item difficulty parameters in the two solution behavior classes might not differ statistically. Thus, we next consider a more restricted model that constrains the item difficulty parameters of the two solution behavior classes to be equal, and test whether some items indeed exhibit DIF between the two latent classes.
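As a rough check of the item 5 example, a Wald-type z statistic can be formed from the two estimates and their standard errors. The sketch below assumes the two class-specific estimates are independent, which the text does not state, and the function name is hypothetical; it is not the formal DIF test carried out through the constrained model.

```python
import math

def wald_z(est1, se1, est2, se2):
    """z statistic for the difference of two estimates, assuming independence."""
    return (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)

# Item 5 difficulty estimates and standard errors for SB 1 and SB 2 (Table 16)
z = wald_z(-0.75, 0.149, -1.269, 0.506)
# Two-sided p-value under a standard normal reference distribution
p_value = math.erfc(abs(z) / math.sqrt(2))
```

Under this independence assumption, z is about 0.98, well below the conventional 1.96 cutoff, which is in line with the statement that the difference might not be statistically significant.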
The fit indices reported in Table 17 indicate that the three-class model with the equality constraint on the item parameters for both solution behavior classes fits better than the unconstrained one. To assess the absolute rather than relative fit of the three-class model, we compare the observed and expected probabilities of a correct answer on each item for each of the three classes. The examinees are first classified into their most likely class based on their posterior probabilities of belonging to each of the three classes. For each class of examinees, the observed and expected mean probabilities of a correct answer on each item are computed and plotted in Figure 3. Figure 3 shows that the observed and expected mean probabilities of a correct answer for SB 1 and SB 2 are very close under the three-class model. For RG, however, the observed mean probabilities of a correct answer appear quite different from the expected probability of 0.25 and seem to vary across items. Therefore, we further relax the assumption that the probability of a correct answer for RG is 0.25 for all items, and allow the item difficulty parameters to be freely estimated.
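The absolute fit check described above, modal classification followed by a per-class comparison of observed and expected proportions correct, can be sketched as follows. This is an assumption-laden illustration: the function name, the input layout, and the toy numbers are hypothetical, and the model-implied probabilities are taken as given because the text does not detail how they were computed. In the toy data, class 0 plays the role of RG with an expected probability of 0.25 on every item.

```python
import numpy as np

def observed_vs_expected(responses, posteriors, expected_prob):
    """Compare observed and expected proportions correct within each modal class.

    responses:     (N, J) array of 0/1 item scores
    posteriors:    (N, C) array of posterior class probabilities
    expected_prob: (N, J) array of model-implied probabilities of a correct
                   answer, evaluated for each examinee's modal class
    """
    responses = np.asarray(responses, dtype=float)
    expected_prob = np.asarray(expected_prob, dtype=float)
    modal_class = np.argmax(posteriors, axis=1)   # most likely class per examinee
    result = {}
    for c in np.unique(modal_class):
        members = modal_class == c
        result[int(c)] = {
            "observed": responses[members].mean(axis=0),
            "expected": expected_prob[members].mean(axis=0),
        }
    return modal_class, result

# Toy data (made-up): 4 examinees, 2 items, 2 classes
responses = [[1, 0], [1, 1], [0, 0], [1, 0]]
posteriors = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.4, 0.6]]
expected_prob = [[0.25, 0.25], [0.8, 0.6], [0.25, 0.25], [0.7, 0.5]]
modal_class, fit = observed_vs_expected(responses, posteriors, expected_prob)
```

Plotting the "observed" and "expected" vectors against the item index for each class would give a display of the kind shown in Figure 3.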
Table 18 shows that the three-class model without equality constraint on the item difficulty parameters for RG fits better than the restricted model. Moreover, as shown in Table 19, a three-class model is again preferred to a two-class model when the item difficulty parameters are allowed to vary for the RG class. The observed and expected mean probabilities of getting a correct answer for each of the three classes of examinees are plotted in Figure 4. According to Figure 4, the observed mean probabilities of getting a correct answer are very close to the expected ones on all items for SB 1, SB 2, as well as for RG. Therefore, the resultant three-class model provides a reasonable fit to the data.
Table 16: Estimates of Difficulty Parameters for SB 1 and SB 2
class SB 1 SB 2 SB 1 SB 2
item estimate SE estimate SE item estimate SE estimate SE
1 -3.366 0.355 -3.366 0.355 31 -2.01 0.204 -1.704 0.502
2 -2.798 0.279 -2.85 0.541 32 -3.653 0.416 -3.454 0.559
3 -0.474 0.15 -0.129 0.484 33 2.098 0.209 2.367 0.547
4 -1.484 0.175 -1.519 0.504 34 -2.381 0.238 -2.106 0.511
5 -0.75 0.149 -1.269 0.506 35 -0.408 0.146 -0.74 0.502
6 -3.291 0.355 -3.431 0.546 36 -4.452 0.581 -3.73 0.602
7 -3.36 0.35 -4.409 0.673 37 0.408 0.145 0.062 0.498
8 -2.666 0.264 -2.62 0.502 38 -1.102 0.159 -0.695 0.489
9 -1.014 0.155 -1.188 0.486 39 -2.001 0.204 -2.5 0.503
10 -2.722 0.266 -2.855 0.546 40 -3.123 0.318 -2.111 0.51
11 -2.795 0.276 -3.131 0.56 41 -1.732 0.192 -1.387 0.49
12 -0.471 0.149 -0.752 0.485 42 -3.47 0.37 -3.72 0.597
13 0.892 0.151 0.831 0.494 43 -2.286 0.225 -1.379 0.484
14 -2.077 0.214 -1.99 0.527 44 -3.236 0.339 -2.07 0.508
15 3.445 0.372 3.033 0.601 45 -3.347 0.354 -2.786 0.526
16 1.451 0.176 1.401 0.507 46 -2.096 0.215 -1.878 0.495
17 -1.082 0.165 -0.578 0.497 47 -2.058 0.213 -2.147 0.505
18 -1.939 0.2 -1.672 0.51 48 -0.439 0.146 -0.497 0.492
19 -3.135 0.318 -3.484 0.579 49 -0.795 0.153 -0.858 0.5
20 -1.024 0.16 -1.026 0.503 50 -0.627 0.156 -0.461 0.485
21 -1.907 0.21 -2.146 0.504 51 -2.551 0.247 -1.964 0.503
22 0.599 0.146 0.776 0.488 52 -0.304 0.144 -0.694 0.507
23 -0.29 0.144 0.056 0.487 53 -2.613 0.265 -1.867 0.499
24 0.123 0.14 0.131 0.486 54 -0.833 0.154 -0.861 0.501
25 -1.463 0.174 -1.789 0.518 55 -3.054 0.309 -2.838 0.544
26 -0.315 0.146 -0.721 0.493 56 -1.207 0.162 -0.783 0.484
27 0.797 0.15 0.843 0.507 57 -1.087 0.161 -1.074 0.498
28 -2.858 0.346 -2.394 0.531 58 -0.519 0.147 -0.941 0.496
29 -0.426 0.148 -0.681 0.488 59 -1.767 0.196 -2.041 0.525
30 -3.949 0.462 -2.912 0.535 60 -0.681 0.148 -0.674 0.489
SB 1 = solution behavior class 1; SB 2 = solution behavior class 2.
Table 17: Fit Indices for Three-Class Models with and without Equality Constraints on Difficulty Parameters for SB 1 and SB 2
without with
AIC 73850.126 73848.904
BIC 75912.688 75660.038
Adjusted BIC 74376.354 74310.984
Figure 3: Observed and expected mean response probability of getting a correct answer for each latent class of examinees classified using the greatest posterior probability
Table 18: Fit Indices for Three-Class Models with and without Equality Constraint on
Table 19: Fit Indices of Two-Class and Three-Class Models without Equality Constraint on Difficulty Parameters for RG
two-class three-class
AIC 77718.044 73067.466
BIC 79264.966 75134.290
Adjusted BIC 78112.715 73594.781
Because the three-class model without the equality constraints on the difficulty parameters of all items appears to be preferred by the fit indices, we conclude that the examinees in RG do not simply respond randomly to, or guess the answers for, all the items. They may try to solve some of the items but guess on others. In addition, with the equality constraint on the item difficulty parameters between the two latent classes of examinees, the items are said to be DIF free for the SB 1 and SB 2 classes. The estimates of the item difficulty parameters with equality constraints for both SB 1 and SB 2, and those for RG, are reported in Table 20.
According to Table 21, the estimated class sizes for RG, SB 1, and SB 2 are 0.121, 0.421, and 0.458, respectively. Next, we investigate the characteristics of the RG, SB 1, and SB 2 classes. Table 22 shows that the mean ability of the SB 1 class is slightly higher than that of the SB 2 class, where the mean ability is set to 0 for the SB 2 class for identification purposes. In addition, the mean ability of the RG class is clearly lower than those of the two SB classes. However, the low mean ability of the RG class is of less interest: if guessing behavior is assumed for examinees in the RG class on some items, their responses do not depend solely on ability, and therefore their ability might not be appropriately estimated.
Figure 4: Observed and expected mean response probability of getting a correct answer for each latent class without the equality constraint on difficulty parameters for RG
Table 20: Estimates of Difficulty Parameters for RG and for Both Solution Behavior Classes, SB 1 and SB 2
class RG SB 1& 2 RG SB 1& 2
item estimate SE estimate SE item estimate SE estimate SE
1 -2.780 0.187 -2.780 0.187 31 -0.460 0.288 -1.693 0.138
2 -0.912 0.298 -2.649 0.190 32 -0.756 0.292 -3.327 0.253
3 1.413 0.332 -0.146 0.107 33 1.892 0.382 2.430 0.162
4 -0.105 0.278 -1.314 0.123 34 -0.388 0.283 -2.034 0.153
5 0.250 0.283 -0.817 0.113 35 1.311 0.331 -0.434 0.109
6 -0.756 0.293 -3.210 0.240 36 -0.175 0.283 -4.108 0.360
7 -1.075 0.303 -3.608 0.286 37 0.466 0.291 0.416 0.105
8 -0.532 0.290 -2.458 0.178 38 0.107 0.288 -0.720 0.113
9 0.621 0.295 -0.940 0.113 39 -0.317 0.283 -2.123 0.157
10 -0.388 0.281 -2.615 0.189 40 0.322 0.289 -2.373 0.175
11 -1.250 0.313 -2.756 0.197 41 0.699 0.303 -1.427 0.127
12 0.104 0.279 -0.454 0.110 42 -0.178 0.275 -3.688 0.302
13 2.204 0.428 1.014 0.113 43 0.542 0.284 -1.626 0.138
14 -0.246 0.281 -1.873 0.145 44 -0.175 0.280 -2.400 0.175
15 2.198 0.422 3.373 0.238 45 0.621 0.293 -2.961 0.216
16 1.892 0.386 1.591 0.128 46 0.178 0.282 -1.817 0.143
17 1.413 0.335 -0.677 0.112 47 0.104 0.275 -1.951 0.150
18 0.395 0.292 -1.693 0.137 48 1.213 0.322 -0.298 0.109
19 -1.161 0.314 -3.210 0.240 49 0.699 0.296 -0.657 0.110
20 0.542 0.289 -0.883 0.114 50 2.839 0.536 -0.444 0.109
21 -0.105 0.281 -1.892 0.145 51 0.249 0.283 -2.056 0.152
22 1.029 0.308 0.853 0.113 52 0.178 0.284 -0.318 0.108
23 1.314 0.331 0.050 0.107 53 0.322 0.286 -2.078 0.156
24 1.759 0.377 0.255 0.104 54 -0.535 0.290 -0.667 0.109
25 0.469 0.290 -1.471 0.129 55 -1.161 0.310 -2.756 0.201
26 0.699 0.295 -0.337 0.110 56 1.119 0.313 -0.861 0.116
27 1.413 0.337 0.982 0.113 57 0.942 0.307 -0.940 0.116
28 -0.034 0.285 -2.429 0.178 58 -0.107 0.285 -0.554 0.109
29 0.696 0.297 -0.385 0.110 59 -0.317 0.286 -1.780 0.140
30 -0.532 0.289 -3.210 0.241 60 0.696 0.300 -0.554 0.109
RG = rapid-guessing class; SB 1 = solution behavior class 1; SB 2 = solution behavior class 2.
Table 21: Estimates of Class Sizes of the Three-Class Model with the Equality Constraint on Difficulty Parameters between SB 1 and SB 2 and without the Equality Constraint on Difficulty Parameters for RG
SB 1 = solution behavior class 1;
SB 2 = solution behavior class 2.
Table 22: Estimates of Mean Ability for RG, SB 1, and SB 2
class µˆθ SE
RG -1.705 0.154
SB 1 0.138 0.067
SB 2 0 (fixed)
RG = rapid-guessing class;
SB 1 = solution behavior class 1;
SB 2 = solution behavior class 2.
Figure 5 shows the estimates of the item difficulty and mean response log-time for RG, SB 1, and SB 2. Figure 5 indicates that the mean response time of the SB 2 class is shorter than that of the SB 1 class. For example, the mean response times of item 12 are 21.20 and 14.66 for the SB 1 and SB 2 classes, respectively. Therefore, the results suggest that examinees who have slightly higher ability and spend more time on the items are more likely to belong to the SB 1 class. The estimated class size of 0.121 for the rapid-guessing class is again smaller than the estimate of 15% in Meyer (2010). This is possibly because shorter response time is the main indication of membership in the rapid-guessing class in Meyer (2010), whereas some members of the original rapid-guessing class are now classified as the faster respondents of SB 2 in the three-class model. In summary, we can label the three types of examinees in the ILT data as the rapid-guessing, the solution behavior, and the faster respondents classes. The estimates of the item difficulty parameters of the solution behavior class are overall very similar to those of Meyer (2010). Furthermore, the mean response log-time for the solution behavior class in Meyer (2010) falls between the estimates of the mean response log-time of SB 1 and SB 2 in our present analysis.