Results - Adding Response Time to a Mixture SEM Model: Using Test-Taking Behavior to Improve Model Analysis

The number of replications is limited to 100 in our study, given that it takes 1 to 5 minutes to obtain parameter estimates for each replication. By contrast, Bayesian estimation takes one to two weeks for the Markov chain Monte Carlo (MCMC) chain to reach the posterior distribution of the parameters. For some replications, convergence problems occur under the MRM fitting but not under the MRM-RT fitting. We suspect that, in the current simulation condition, classifying examinees relies heavily on the information in the response times. Without response time information, MRM cannot accurately classify examinees into the RG class because of its small class size, which in turn causes convergence problems. The percentages of replications that converge under the MRM fitting for the different sample sizes are reported in Table 3. The percentage of convergence increases as the sample size becomes larger. This is reasonable because the latent classes become large enough to be distinguished in a large sample. Only those replications that converge are used to summarize the simulation results. More specifically, to screen out replications in which estimation failed to converge, we retain only those replications whose item difficulty estimates range from -5 to 5. For the MRM fitting, we summarize the estimation results from a total of 100 replications that satisfy this criterion.
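The screening rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the toy estimates, and the strict treatment of the bounds (following the "(-5, 5)" notation of Table 3) are assumptions. Stopping once 100 usable replications are collected is one reading of the text.

```python
def passes_criterion(difficulty_estimates, lower=-5.0, upper=5.0):
    """True if every item difficulty estimate lies strictly inside (lower, upper).

    Treating the bounds as exclusive is an assumption; the text only says
    the estimates should range from -5 to 5.
    """
    return all(lower < b < upper for b in difficulty_estimates)


def collect_usable(replications, needed=100):
    """Keep replications that pass the screen, stopping once `needed` are kept."""
    kept = []
    for estimates in replications:
        if passes_criterion(estimates):
            kept.append(estimates)
            if len(kept) == needed:
                break
    return kept


# Hypothetical difficulty estimates for three replications (not study data).
toy_replications = [
    [-2.43, 0.10, 1.95],   # all inside (-5, 5): kept
    [-2.10, 7.80, 2.30],   # 7.80 is outside the bounds: dropped
    [-2.60, -0.20, 2.05],  # kept
]
usable = collect_usable(toy_replications, needed=100)
pass_rate = 100 * len(usable) / len(toy_replications)
print(f"{len(usable)} of {len(toy_replications)} replications pass ({pass_rate:.1f}%)")
```

Because any comparison with NaN is false in Python, a replication whose estimates include NaN is also dropped by this check.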

The estimation performance for the proportions of the three classes, the ability mean of the HARF class, and the ability variances of the SB and HARF classes is summarized in Table 4. Comparing the means of the estimates with the true parameter values, the empirical bias decreases as the sample size increases. The SD and MSE show the same pattern: the estimation becomes more precise as the sample size increases, as expected. Based on Table 4, MRM does not seem to recover the proportions of the three classes, π_RG, π_SB, and π_HARF, well, especially when the sample size is only 250. Moreover, under MRM the ability mean of the HARF class does not seem to approach its true value even when the sample size increases to 2000. Comparing the results of MRM-RT with those of MRM, both the empirical bias and the SD are generally smaller under MRM-RT. That is, the response time information helps capture the test-taking behavior of the examinees in the RG class and thereby improves the estimation with respect to bias and relative efficiency in finite samples.
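As a rough sketch of how such summaries can be computed for one parameter across replications, the snippet below uses the usual Monte Carlo definitions. These definitions are an assumption, since this section does not state them: empirical bias is the mean estimate minus the true value, SD is the standard deviation of the estimates (divisor n here), and MSE is the mean squared deviation from the true value. The function name and the toy estimates are hypothetical.

```python
from statistics import fmean, pstdev


def summarize(estimates, true_value):
    """Summarize the estimates of one parameter over the usable replications."""
    mean_est = fmean(estimates)
    bias = mean_est - true_value
    sd = pstdev(estimates)  # population SD (divisor n); an assumption
    mse = fmean([(e - true_value) ** 2 for e in estimates])
    return {"mean": mean_est, "bias": bias, "sd": sd, "mse": mse}


# Hypothetical estimates of a parameter whose true value is 0.5 (not study data).
demo = summarize([0.52, 0.47, 0.55, 0.49, 0.51], true_value=0.5)
print(demo)
```

With these definitions, MSE equals bias² + SD² exactly. The values reported in Table 4 are consistent with this decomposition up to rounding; for example, for π_SB under MRM-RT with N = 250, (0.493 - 0.55)² + 0.114² ≈ 0.016.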

The estimation performance for the item difficulty parameters of the SB and HARF classes at each sample size is summarized in Tables 5 to 12. The empirical bias, SD, and MSE all decrease as the sample size increases. In addition, the estimates from MRM-RT are better than those from MRM with respect to both SD and MSE. For example, Table 7 shows that the SD and MSE of each item are smaller under MRM-RT than under MRM. However, the empirical bias under MRM-RT is not always smaller than that under MRM, especially in small samples. More specifically, Table 5 shows that, for samples of size 250, the empirical bias under MRM is smaller than that under MRM-RT for items 4, 10, 12, 13, 14, 16, 17, 22, and 24. When the sample size increases to 1000, the empirical bias is smaller under MRM-RT than under MRM for almost all items.

Table 3: Percentages of Replications that Converge under MRM Fittings

| Sample size | 250 | 500 | 1000 | 2000 |
|---|---|---|---|---|
| Percentage passing the (-5, 5) criterion | 7.45% | 14% | 17% | 27% |

For example, Table 9 shows that only item 1 has a smaller empirical bias under MRM than under MRM-RT, and Table 12 shows that the empirical bias is smaller under MRM-RT than under MRM for all items. As a further illustration, the difficulty parameter of item 25 is 1.9, and its mean estimates for the HARF class under MRM-RT are 2.29, 2.09, 1.94, and 1.95, respectively, as the sample size increases from 250 to 2000. Its mean estimates under MRM, however, are 2.54, 2.32, 2.32, and 2.19, respectively. In other words, the empirical bias under MRM-RT is clearly much smaller than that under MRM as the sample size increases.
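The item 25 comparison can be worked through directly from the figures quoted above. The short sketch below assumes that empirical bias means the mean estimate minus the true difficulty of 1.9. The variable and function names are illustrative only.

```python
TRUE_DIFFICULTY = 1.9  # item 25, as stated in the text
SAMPLE_SIZES = [250, 500, 1000, 2000]

# Mean estimates for the HARF class, copied from the text.
MEAN_ESTIMATES = {
    "MRM-RT": [2.29, 2.09, 1.94, 1.95],
    "MRM": [2.54, 2.32, 2.32, 2.19],
}


def empirical_bias(mean_estimates, true_value):
    """Mean estimate minus true value, rounded to the two decimals reported."""
    return [round(m - true_value, 2) for m in mean_estimates]


bias = {
    model: empirical_bias(means, TRUE_DIFFICULTY)
    for model, means in MEAN_ESTIMATES.items()
}

for n, rt, mrm in zip(SAMPLE_SIZES, bias["MRM-RT"], bias["MRM"]):
    print(f"N={n:>4}: bias MRM-RT = {rt:.2f}, bias MRM = {mrm:.2f}")
```

Under this definition the bias under MRM-RT falls from 0.39 to about 0.05, while the bias under MRM falls only from 0.64 to 0.29.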

In addition, the SD and MSE of the 10 DIF items for the HARF class are clearly larger than those of the other items under MRM-RT when the sample size is small. As the sample size increases, the SD and MSE of these 10 DIF items no longer differ from those of the other items. For a sample size of 250, Table 6 shows that the SD and MSE are 0.55 and 0.52, respectively, for item 19 and 0.45 and 0.25 for item 18. For a sample size of 1000, Table 10 shows that the SD and MSE are 0.29 and 0.14, respectively, for item 19 and 0.27 and 0.08 for item 18. That is, the stability of the mean item difficulty estimates under MRM increases as the sample size increases.

The item difficulty parameters of the classes with larger proportions can be estimated under MRM when the sample size is 1000, and the empirical bias, SD, and MSE of their estimates clearly decrease. In addition, the empirical bias, SD, and MSE of the difficulty parameter estimates for the DIF items under MRM-RT clearly decrease when the sample size is 1000. Overall, these results show that, using MLR estimation under the mixture SEM, MRM-RT recovers the parameters very well and better than MRM. The advantage of MRM-RT over MRM is especially large with a small sample size, because the response time helps disregard (or account for) the responses of the examinees in the RG class and thus yields better estimates of the item difficulty parameters.

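Table 4: Mean, SD, and MSE of the Estimates for Three Class Sizes, Mean Ability of HARF Class, and Ability Variances of SB and HARF Classes, under MRM-RT and MRM Fittings

| N | Statistic | MRM-RT π_RG | MRM-RT π_SB | MRM-RT π_HARF | MRM-RT σ²_θ,SB | MRM-RT μ_θ,HARF | MRM-RT σ²_θ,HARF | MRM π_RG | MRM π_SB | MRM π_HARF | MRM σ²_θ,SB | MRM μ_θ,HARF | MRM σ²_θ,HARF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Parameter | 0.15 | 0.55 | 0.3 | 1 | 0.5 | 0.65 | 0.15 | 0.55 | 0.3 | 1 | 0.5 | 0.65 |
| 250 | mean | 0.144 | 0.493 | 0.364 | 0.899 | 0.603 | 0.769 | 0.142 | 0.458 | 0.4 | 0.968 | 0.52 | 1.209 |
| 250 | SD | 0.023 | 0.114 | 0.115 | 0.195 | 0.447 | 0.235 | 0.027 | 0.219 | 0.22 | 0.531 | 0.735 | 0.907 |
| 250 | MSE | 0.00058 | 0.016 | 0.017 | 0.048 | 0.211 | 0.069 | 0.00079 | 0.056 | 0.056 | 0.283 | 0.541 | 1.136 |
| 500 | mean | 0.15 | 0.512 | 0.339 | 0.93 | 0.519 | 0.706 | 0.15 | 0.425 | 0.424 | 0.887 | 0.463 | 0.972 |
| 500 | SD | 0.015 | 0.097 | 0.095 | 0.167 | 0.28 | 0.169 | 0.019 | 0.192 | 0.193 | 0.392 | 0.647 | 0.407 |
| 500 | MSE | 0.00024 | 0.011 | 0.011 | 0.033 | 0.079 | 0.032 | 0.00035 | 0.045 | 0.045 | 0.166 | 0.42 | 0.269 |
| 1000 | mean | 0.149 | 0.538 | 0.313 | 0.991 | 0.494 | 0.662 | 0.149 | 0.462 | 0.389 | 0.885 | 0.502 | 0.936 |
| 1000 | SD | 0.011 | 0.057 | 0.058 | 0.117 | 0.272 | 0.114 | 0.012 | 0.164 | 0.163 | 0.295 | 0.619 | 0.359 |
| 1000 | MSE | 0.00011 | 0.003 | 0.004 | 0.014 | 0.074 | 0.013 | 0.00016 | 0.043 | 0.042 | 0.100 | 0.384 | 0.211 |
| 2000 | mean | 0.149 | 0.543 | 0.308 | 0.989 | 0.511 | 0.662 | 0.149 | 0.464 | 0.387 | 0.895 | 0.554 | 0.787 |
| 2000 | SD | 0.008 | 0.045 | 0.044 | 0.08 | 0.219 | 0.086 | 0.01 | 0.164 | 0.163 | 0.336 | 0.623 | 0.37 |
| 2000 | MSE | 0.00007 | 0.002 | 0.002 | 0.007 | 0.048 | 0.008 | 0.00010 | 0.035 | 0.037 | 0.124 | 0.392 | 0.156 |

RG = rapid-guessing; SB = solution behavior class; HARF = high ability and/or respond with familiarity; SD = standard deviation; MSE = mean squared error.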

Table 5: Mean, SD, and MSE of Difficulty Parameter Estimates for the SB Class with Sample Size of 250, under MRM-RT and MRM Fittings

MRM-RT MRM

item Parameter mean SD MSE mean SD MSE

1 -2.4 -2.43 0.30 0.09 -2.47 0.36 0.13