CHAPTER 5 Results
5.2 Experiment two: Model parameter recovery with discrete background variables
To evaluate how well the multilevel higher-order item response model recovers ability parameters with and without background variables, this experiment applies the competing models to a parameter recovery study with discrete background variables. The results are reported in two parts: the overall ability estimates and the domain ability estimates.
5.2.1 Overall ability estimates
The results for the overall ability estimates are reported for two kinds of parameters: group statistics and subgroup statistics. Table 5-9 reports the RMSE of the group mean in overall ability according to four factors: (a) the number of items administered in each domain (three levels: n = 5, 10, and 20); (b) the fitting model (four models: the multilevel higher-order item response model (MHO-IRT), the higher-order item response model (HO-IRT), the multilevel unidimensional item response model (MU-IRT), and the unidimensional item response model (U-IRT)); (c) the group defined by the background variables (four levels: A, school A; B, school B; H, high parental socioeconomic status (SES); and L, low parental SES); and (d) the number of examinees (two levels: N = 1000 and 4000). In the simulation, the mean difference between schools A and B was 0.000, whereas the mean difference between the high- and low-SES groups was 1.414.
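This section reports RMSEs without restating how they are computed. As a minimal sketch, assuming the RMSE of a group mean is taken across simulation replications between the estimated group mean and its generating (true) value, the computation looks like this. The function name `rmse` and the example estimates are hypothetical; they are not the study's code or data.

```python
import math
from typing import Sequence


def rmse(estimates: Sequence[float], true_value: float) -> float:
    """RMSE of replicated estimates around a known generating value.

    Assumption: one estimate per simulation replication.
    """
    if len(estimates) == 0:
        raise ValueError("need at least one replication")
    squared_errors = [(est - true_value) ** 2 for est in estimates]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical estimates of one group mean from four replications
replicated_means = [0.52, 0.48, 0.51, 0.49]
print(f"{rmse(replicated_means, true_value=0.50):.4f}")
```

Because the errors are squared before averaging, a few replications with large deviations raise the RMSE more than many small deviations do.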
In nearly every condition, the MHO-IRT model yields the lowest RMSE of the four models; the one exception is group A with N = 4000 and n = 10, where all four models are within 0.0004 of one another and U-IRT is lowest (0.0349). The advantage of the multilevel models (MHO-IRT and MU-IRT) over the single-level models (HO-IRT and U-IRT) is larger for the SES groups, whose means differ by 1.414, than for the school groups, whose means do not differ. For example, with N = 1000 and n = 20, the MHO-IRT RMSEs are 0.0312 for group A, 0.0313 for group B, 0.0235 for group L, and 0.0211 for group H, whereas the HO-IRT RMSEs are 0.0317, 0.0325, 0.0303, and 0.0309, respectively. The differences between HO-IRT and MHO-IRT are therefore 0.0005 for group A, 0.0012 for group B, 0.0068 for group L, and 0.0098 for group H. These differences indicate that the advantage of the MHO-IRT model grows as the mean difference between the background groups increases.
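The group-wise differences above are paired subtractions of the HO-IRT and MHO-IRT columns. The sketch below recomputes them from the N = 1000, n = 20 values copied from Table 5-9; the dictionary names are illustrative only.

```python
# RMSE of group mean in overall ability, N = 1000, n = 20 (Table 5-9)
mho_irt = {"A": 0.0312, "B": 0.0313, "L": 0.0235, "H": 0.0211}
ho_irt = {"A": 0.0317, "B": 0.0325, "L": 0.0303, "H": 0.0309}

# Positive values mean HO-IRT has the larger error, i.e. MHO-IRT is better
advantage = {group: round(ho_irt[group] - mho_irt[group], 4) for group in mho_irt}

for group, diff in advantage.items():
    print(f"{group}: {diff:.4f}")

# Compare the SES groups (L, H) with the school groups (A, B)
ses_min = min(advantage["L"], advantage["H"])
school_max = max(advantage["A"], advantage["B"])
print("SES advantage exceeds school advantage:", ses_min > school_max)
```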
Table 5-9 also shows that the estimates become more accurate as test length and sample size increase. For example, with N = 1000, the MHO-IRT RMSE for group H is 0.0284 when n = 5, decreases to 0.0263 when the number of items administered in each domain increases to 10, and decreases further to 0.0211 when n = 20. Increasing the sample size has a smaller effect: with n = 5, the RMSE for group H decreases from 0.0284 with N = 1000 to 0.0273 with N = 4000, consistent with a larger sample being more representative of the population.
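To compare the two design factors on a common scale, each change can be expressed as a percentage reduction in RMSE. The sketch below does this for the MHO-IRT group-H values copied from Table 5-9; the helper `pct_reduction` is a hypothetical name introduced for illustration.

```python
# MHO-IRT RMSE of the group-H mean in overall ability (Table 5-9),
# keyed by (number of examinees N, items per domain n)
rmse_h = {
    (1000, 5): 0.0284, (1000, 10): 0.0263, (1000, 20): 0.0211,
    (4000, 5): 0.0273, (4000, 10): 0.0243, (4000, 20): 0.0207,
}


def pct_reduction(before: float, after: float) -> float:
    """Percentage drop in RMSE when moving from `before` to `after`."""
    return 100.0 * (before - after) / before


# Longer test at fixed N = 1000 versus larger sample at fixed n = 5
longer_test = pct_reduction(rmse_h[(1000, 5)], rmse_h[(1000, 20)])
larger_sample = pct_reduction(rmse_h[(1000, 5)], rmse_h[(4000, 5)])
print(f"n = 5 -> 20: {longer_test:.1f}% lower RMSE")
print(f"N = 1000 -> 4000: {larger_sample:.1f}% lower RMSE")
```

Here quadrupling the test length from 5 to 20 items cuts the RMSE by roughly a quarter, while quadrupling the sample size cuts it by only a few percent.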
Table 5-9
RMSE of group mean in overall ability
Examinees  Test length  Category  MHO-IRT  HO-IRT  MU-IRT  U-IRT
1000       5            A         0.0382   0.0396  0.0391  0.0396
1000       5            B         0.0384   0.0397  0.0387  0.0393
1000       5            L         0.0296   0.0380  0.0322  0.0390
1000       5            H         0.0284   0.0384  0.0327  0.0391
1000       10           A         0.0355   0.0364  0.0368  0.0365
1000       10           B         0.0357   0.0370  0.0367  0.0369
1000       10           L         0.0276   0.0348  0.0293  0.0369
1000       10           H         0.0263   0.0358  0.0299  0.0363
1000       20           A         0.0312   0.0317  0.0322  0.0324
1000       20           B         0.0313   0.0325  0.0318  0.0317
1000       20           L         0.0235   0.0303  0.0244  0.0319
1000       20           H         0.0211   0.0309  0.0249  0.0320
4000       5            A         0.0375   0.0395  0.0382  0.0392
4000       5            B         0.0366   0.0386  0.0382  0.0377
4000       5            L         0.0278   0.0373  0.0311  0.0389
4000       5            H         0.0273   0.0365  0.0321  0.0382
4000       10           A         0.0350   0.0353  0.0350  0.0349
4000       10           B         0.0339   0.0368  0.0360  0.0354
4000       10           L         0.0269   0.0335  0.0285  0.0362
4000       10           H         0.0243   0.0348  0.0297  0.0352
4000       20           A         0.0298   0.0309  0.0307  0.0305
4000       20           B         0.0302   0.0307  0.0309  0.0305
4000       20           L         0.0218   0.0302  0.0235  0.0302
4000       20           H         0.0207   0.0291  0.0239  0.0314
Table 5-10
RMSE of group standard deviation in overall ability
Examinees  Test length  Category  MHO-IRT  HO-IRT  MU-IRT  U-IRT
1000       5            A         0.0424   0.0634  0.0436  0.0686
1000       5            B         0.0432   0.0629  0.0463  0.0705
1000       5            L         0.0209   0.0630  0.0239  0.0643
1000       5            H         0.0232   0.0632  0.0246  0.0656
1000       10           A         0.0398   0.0604  0.0414  0.0658
1000       10           B         0.0411   0.0599  0.0436  0.0677
1000       10           L         0.0183   0.0605  0.0212  0.0613
1000       10           H         0.0207   0.0606  0.0215  0.0625
1000       20           A         0.0368   0.0575  0.0393  0.0636
1000       20           B         0.0379   0.0567  0.0409  0.0647
1000       20           L         0.0159   0.0584  0.0183  0.0589
1000       20           H         0.0182   0.0583  0.0184  0.0603
4000       5            A         0.0411   0.0624  0.0417  0.0667
4000       5            B         0.0420   0.0614  0.0457  0.0698
4000       5            L         0.0200   0.0619  0.0234  0.0642
4000       5            H         0.0222   0.0623  0.0243  0.0646
4000       10           A         0.0391   0.0598  0.0399  0.0643
4000       10           B         0.0393   0.0589  0.0427  0.0672
4000       10           L         0.0171   0.0596  0.0203  0.0598
4000       10           H         0.0198   0.0600  0.0211  0.0618
4000       20           A         0.0355   0.0564  0.0390  0.0633
4000       20           B         0.0379   0.0554  0.0403  0.0647
4000       20           L         0.0146   0.0575  0.0172  0.0581
4000       20           H         0.0163   0.0571  0.0182  0.0589

Table 5-10 reports the RMSE of the group standard deviation of the overall ability for the same four factors. The MHO-IRT model yields the lowest RMSE of the four models in every condition. The advantage of the multilevel models (MHO-IRT and MU-IRT) over the single-level models (HO-IRT and U-IRT) is again larger for the SES groups than for the school groups, so the MHO-IRT model gains the most when the mean differences between the background groups are larger. As with the group means, the estimates become more accurate with longer tests and larger samples.
The RMSE differences between the multilevel and single-level models are larger for the group standard deviations than for the group means, indicating that the multilevel models are particularly beneficial for estimating group standard deviations.
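One way to make the mean-versus-SD comparison concrete is to average the HO-IRT minus MHO-IRT gap over the four groups for each statistic. The sketch below does this for the N = 1000, n = 20 rows of Tables 5-9 and 5-10; the helper `avg_gap` is hypothetical.

```python
from statistics import mean

# N = 1000, n = 20; groups in the order A, B, L, H
# (values copied from Table 5-9 for means and Table 5-10 for SDs)
mean_rmse = {"MHO-IRT": [0.0312, 0.0313, 0.0235, 0.0211],
             "HO-IRT": [0.0317, 0.0325, 0.0303, 0.0309]}
sd_rmse = {"MHO-IRT": [0.0368, 0.0379, 0.0159, 0.0182],
           "HO-IRT": [0.0575, 0.0567, 0.0584, 0.0583]}


def avg_gap(table):
    """Average amount by which HO-IRT's RMSE exceeds MHO-IRT's across groups."""
    return mean(h - m for h, m in zip(table["HO-IRT"], table["MHO-IRT"]))


mean_gap = avg_gap(mean_rmse)
sd_gap = avg_gap(sd_rmse)
print(f"mean gap {mean_gap:.4f}, SD gap {sd_gap:.4f}")
```

In this condition the average gap for the standard deviations is several times the gap for the means.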
Table 5-11
RMSE of subgroup mean in overall ability
Examinees  Test length  Category  MHO-IRT  HO-IRT  MU-IRT  U-IRT
1000       5            00        0.0288   0.0303  0.0299  0.0306
1000       5            01        0.0232   0.0317  0.0240  0.0326
1000       5            10        0.0220   0.0301  0.0232  0.0305
1000       5            11        0.0265   0.0307  0.0271  0.0313
1000       10           00        0.0279   0.0302  0.0281  0.0313
1000       10           01        0.0226   0.0309  0.0232  0.0316
1000       10           10        0.0219   0.0284  0.0226  0.0291
1000       10           11        0.0264   0.0305  0.0268  0.0315
1000       20           00        0.0206   0.0298  0.0208  0.0303
1000       20           01        0.0150   0.0232  0.0151  0.0245
1000       20           10        0.0217   0.0298  0.0229  0.0298
1000       20           11        0.0207   0.0244  0.0216  0.0252
4000       5            00        0.0286   0.0300  0.0298  0.0305
4000       5            01        0.0220   0.0306  0.0238  0.0318
4000       5            10        0.0212   0.0300  0.0225  0.0304
4000       5            11        0.0257   0.0297  0.0260  0.0303
4000       10           00        0.0275   0.0297  0.0276  0.0300
4000       10           01        0.0216   0.0298  0.0229  0.0315
4000       10           10        0.0208   0.0276  0.0216  0.0291
4000       10           11        0.0262   0.0305  0.0265  0.0313
4000       20           00        0.0196   0.0295  0.0203  0.0294
4000       20           01        0.0144   0.0228  0.0150  0.0235
4000       20           10        0.0208   0.0287  0.0226  0.0294
4000       20           11        0.0198   0.0239  0.0205  0.0247

Table 5-11 shows the RMSE of the marginal means of the four subgroups (categories 00, 01, 10, and 11) obtained from the simulated data. All four models estimate the subgroup means fairly well, but the models that include the background variables (MHO-IRT and MU-IRT) have lower RMSEs than the HO-IRT and U-IRT models, indicating that they are relatively more efficient. The RMSE of the subgroup means generally decreases as test length and sample size increase, and increasing the test length improves the estimates more than increasing the sample size.
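The claim that test length matters more than sample size can be checked by averaging the RMSE reduction over the four subgroups for each factor. The sketch below uses the MHO-IRT column of Table 5-11; the helper `mean_drop` is hypothetical.

```python
from statistics import mean

# MHO-IRT RMSE of subgroup means in overall ability (Table 5-11),
# keyed by (N, n); tuples follow the categories 00, 01, 10, 11
subgroup_rmse = {
    (1000, 5): (0.0288, 0.0232, 0.0220, 0.0265),
    (1000, 20): (0.0206, 0.0150, 0.0217, 0.0207),
    (4000, 5): (0.0286, 0.0220, 0.0212, 0.0257),
}


def mean_drop(start, end):
    """Average reduction in RMSE across the four subgroup categories."""
    return mean(a - b for a, b in zip(subgroup_rmse[start], subgroup_rmse[end]))


test_length_effect = mean_drop((1000, 5), (1000, 20))  # n: 5 -> 20
sample_size_effect = mean_drop((1000, 5), (4000, 5))   # N: 1000 -> 4000
print(f"test length: {test_length_effect:.5f}, sample size: {sample_size_effect:.5f}")
```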
Table 5-12
RMSE of subgroup standard deviation in overall ability
Examinees  Test length  Category  MHO-IRT  HO-IRT  MU-IRT  U-IRT
1000       5            00        0.0352   0.0542  0.0360  0.0547
1000       5            01        0.0356   0.0508  0.0368  0.0518
1000       5            10        0.0316   0.0527  0.0329  0.0533
1000       5            11        0.0317   0.0542  0.0322  0.0554
1000       10           00        0.0354   0.0544  0.0364  0.0550
1000       10           01        0.0372   0.0525  0.0378  0.0527
1000       10           10        0.0325   0.0535  0.0329  0.0541
1000       10           11        0.0319   0.0549  0.0320  0.0560
1000       20           00        0.0307   0.0499  0.0308  0.0507
1000       20           01        0.0317   0.0480  0.0324  0.0488
1000       20           10        0.0324   0.0506  0.0328  0.0519
1000       20           11        0.0264   0.0532  0.0266  0.0537
4000       5            00        0.0349   0.0541  0.0352  0.0538
4000       5            01        0.0351   0.0506  0.0357  0.0512
4000       5            10        0.0305   0.0521  0.0317  0.0526
4000       5            11        0.0316   0.0535  0.0311  0.0547
4000       10           00        0.0344   0.0536  0.0360  0.0542
4000       10           01        0.0359   0.0517  0.0366  0.0520
4000       10           10        0.0313   0.0530  0.0320  0.0532
4000       10           11        0.0314   0.0544  0.0307  0.0553
4000       20           00        0.0299   0.0488  0.0307  0.0494
4000       20           01        0.0317   0.0477  0.0311  0.0482
4000       20           10        0.0315   0.0499  0.0315  0.0506
4000       20           11        0.0253   0.0522  0.0253  0.0526

Table 5-12 shows the RMSE of the marginal standard deviations of the subgroups obtained from the simulated data. The RMSEs of the standard deviations are larger than those of the subgroup means in Table 5-11, so these estimates are less accurate. The MHO-IRT and MU-IRT models perform similarly and both outperform the HO-IRT and U-IRT models, again indicating that the models that include the background variables are relatively more efficient. The RMSE generally decreases as the sample size increases and is lowest at n = 20 in most subgroups, but it does not decrease monotonically with test length. For example, with N = 1000 the MHO-IRT RMSE for category 00 is 0.0352 when n = 5 and 0.0354 when n = 10, essentially unchanged, before decreasing to 0.0307 when n = 20.
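Whether the RMSE falls at every step of test length can be checked directly. The sketch below applies a strict-decrease test to the MHO-IRT, N = 1000 values of Table 5-12; the helper `strictly_decreasing` is hypothetical.

```python
# MHO-IRT RMSE of subgroup SDs in overall ability, N = 1000 (Table 5-12),
# listed in order of test length n = 5, 10, 20
by_test_length = {
    "00": [0.0352, 0.0354, 0.0307],
    "01": [0.0356, 0.0372, 0.0317],
    "10": [0.0316, 0.0325, 0.0324],
    "11": [0.0317, 0.0319, 0.0264],
}


def strictly_decreasing(values):
    """True if every step to a longer test lowers the RMSE."""
    return all(earlier > later for earlier, later in zip(values, values[1:]))


for category, values in by_test_length.items():
    lowest_at_20 = min(values) == values[-1]
    print(category, "monotone:", strictly_decreasing(values),
          "lowest at n = 20:", lowest_at_20)
```

No category decreases at both steps, and category 10 is the one case where n = 20 does not give the lowest RMSE.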
5.2.2 Domain ability estimates
Table 5-13
RMSE of group mean in domain ability
Examinees  Test length  Category  MHO-IRT  HO-IRT  MM-IRT  M-IRT
1000       5            A         0.0380   0.0386  0.0386  0.0392
1000       5            B         0.0379   0.0395  0.0387  0.0384
1000       5            L         0.0291   0.0372  0.0320  0.0389
1000       5            H         0.0277   0.0379  0.0323  0.0389
1000       10           A         0.0352   0.0371  0.0357  0.0366
1000       10           B         0.0352   0.0362  0.0358  0.0355
1000       10           L         0.0258   0.0345  0.0296  0.0365
1000       10           H         0.0249   0.0352  0.0295  0.0359
1000       20           A         0.0308   0.0330  0.0307  0.0315
1000       20           B         0.0299   0.0325  0.0317  0.0318
1000       20           L         0.0204   0.0294  0.0245  0.0310
1000       20           H         0.0200   0.0310  0.0253  0.0306
4000       5            A         0.0371   0.0384  0.0370  0.0370
4000       5            B         0.0367   0.0390  0.0374  0.0380
4000       5            L         0.0272   0.0367  0.0313  0.0369
4000       5            H         0.0279   0.0370  0.0312  0.0384
4000       10           A         0.0329   0.0354  0.0345  0.0358
4000       10           B         0.0344   0.0352  0.0350  0.0343
4000       10           L         0.0244   0.0343  0.0294  0.0348
4000       10           H         0.0245   0.0355  0.0290  0.0356
4000       20           A         0.0284   0.0320  0.0296  0.0318
4000       20           B         0.0283   0.0318  0.0311  0.0305
4000       20           L         0.0204   0.0288  0.0229  0.0292
4000       20           H         0.0188   0.0312  0.0240  0.0303
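Table 5-13 shows that the domain ability estimates also become more accurate with longer tests and larger samples. For example, with N = 1000, the MHO-IRT RMSE for group H is 0.0277 when n = 5, decreases to 0.0249 when the number of items administered in each domain increases to 10, and decreases further to 0.0200 when n = 20; with n = 20, increasing the sample size to N = 4000 reduces it to 0.0188, consistent with a larger sample being more representative of the population. The MHO-IRT model yields the lowest RMSE for the SES groups (L and H) in every condition. For the school groups (A and B), whose means do not differ, the four models perform similarly, and in a few conditions another model is marginally lower (e.g., MM-IRT, 0.0307 versus 0.0308, for group A with N = 1000 and n = 20).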
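Identifying the best model in each cell is a row-wise minimum. The sketch below applies it to a few cells copied from Table 5-13; the helper `lowest_rmse_model` is hypothetical, and ties go to the model listed first.

```python
MODELS = ("MHO-IRT", "HO-IRT", "MM-IRT", "M-IRT")

# Selected cells of Table 5-13 (RMSE of group mean in domain ability),
# keyed by (N, n, group); values follow the order of MODELS
cells = {
    (1000, 20, "A"): (0.0308, 0.0330, 0.0307, 0.0315),
    (1000, 20, "H"): (0.0200, 0.0310, 0.0253, 0.0306),
    (4000, 10, "B"): (0.0344, 0.0352, 0.0350, 0.0343),
    (4000, 10, "L"): (0.0244, 0.0343, 0.0294, 0.0348),
}


def lowest_rmse_model(values):
    """Name of the model with the smallest RMSE (first one wins ties)."""
    best_index = min(range(len(values)), key=values.__getitem__)
    return MODELS[best_index]


best = {cell: lowest_rmse_model(values) for cell, values in cells.items()}
for cell, model in best.items():
    print(cell, model)
```

The SES cells select MHO-IRT, while the school cells can select another model by a margin of 0.0001.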
Table 5-14
RMSE of group standard deviation in domain ability
Examinees  Test length  Category  MHO-IRT  HO-IRT  MM-IRT  M-IRT
1000       5            A         0.0418   0.0626  0.0428  0.0683
1000       5            B         0.0426   0.0629  0.0458  0.0696
1000       5            L         0.0201   0.0621  0.0236  0.0638
1000       5            H         0.0222   0.0627  0.0239  0.0653
1000       10           A         0.0403   0.0606  0.0399  0.0656
1000       10           B         0.0404   0.0599  0.0432  0.0675
1000       10           L         0.0172   0.0603  0.0211  0.0614
1000       10           H         0.0201   0.0604  0.0213  0.0632
1000       20           A         0.0361   0.0577  0.0374  0.0631
1000       20           B         0.0383   0.0564  0.0396  0.0655
1000       20           L         0.0149   0.0576  0.0190  0.0587
1000       20           H         0.0171   0.0582  0.0194  0.0597
4000       5            A         0.0397   0.0610  0.0419  0.0661
4000       5            B         0.0414   0.0601  0.0447  0.0695
4000       5            L         0.0198   0.0613  0.0230  0.0622
4000       5            H         0.0219   0.0603  0.0232  0.0639
4000       10           A         0.0377   0.0593  0.0395  0.0648
4000       10           B         0.0399   0.0592  0.0417  0.0674
4000       10           L         0.0171   0.0576  0.0202  0.0605
4000       10           H         0.0195   0.0587  0.0212  0.0632
4000       20           A         0.0357   0.0563  0.0362  0.0619
4000       20           B         0.0369   0.0543  0.0395  0.0649
4000       20           L         0.0127   0.0559  0.0181  0.0575
4000       20           H         0.0160   0.0571  0.0194  0.0583
Table 5-14 reports the RMSE of the group standard deviation of domain ability for the same four factors. The estimates become more accurate with longer tests and larger samples. The MHO-IRT model yields the lowest RMSE in every condition except group A with N = 1000 and n = 10, where MM-IRT is marginally lower (0.0399 versus 0.0403), and its advantage is larger when the mean difference between the background groups is larger. The RMSE differences between the multilevel and single-level models are larger for the group standard deviations than for the group means, indicating that the multilevel models provide better estimates of the group standard deviations.
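The link between the size of the group difference and the size of the MHO-IRT advantage can be shown by averaging the HO-IRT minus MHO-IRT gap separately for the school contrast and the SES contrast. The sketch below uses the N = 1000, n = 20 row of Table 5-14; the grouping labels are illustrative.

```python
from statistics import mean

# RMSE of group SD in domain ability, N = 1000, n = 20 (Table 5-14)
mho_irt = {"A": 0.0361, "B": 0.0383, "L": 0.0149, "H": 0.0171}
ho_irt = {"A": 0.0577, "B": 0.0564, "L": 0.0576, "H": 0.0582}

# Groups split by the background variable that defines them; the simulated
# mean difference is 0.000 for school and 1.414 for parental SES
contrasts = {"school (A, B)": ("A", "B"), "SES (L, H)": ("L", "H")}

advantage = {
    label: mean(ho_irt[g] - mho_irt[g] for g in groups)
    for label, groups in contrasts.items()
}
for label, value in advantage.items():
    print(f"{label}: HO-IRT minus MHO-IRT = {value:.4f}")
```

In this condition the average advantage for the SES groups is roughly twice that for the school groups.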
Table 5-15
RMSE of subgroup mean in domain ability
Examinees  Test length  Category  MHO-IRT  HO-IRT  MM-IRT  M-IRT
1000       5            00        0.0264   0.0310  0.0272  0.0320
1000       5            01        0.0242   0.0310  0.0245  0.0316
1000       5            10        0.0278   0.0312  0.0288  0.0315
1000       5            11        0.0239   0.0304  0.0243  0.0313
1000       10           00        0.0254   0.0297  0.0261  0.0306
1000       10           01        0.0232   0.0303  0.0237  0.0309
1000       10           10        0.0266   0.0300  0.0271  0.0305
1000       10           11        0.0225   0.0297  0.0230  0.0302
1000       20           00        0.0233   0.0258  0.0235  0.0261
1000       20           01        0.0211   0.0249  0.0215  0.0252
1000       20           10        0.0240   0.0237  0.0251  0.0239
1000       20           11        0.0230   0.0252  0.0232  0.0253
4000       5            00        0.0255   0.0305  0.0266  0.0312
4000       5            01        0.0233   0.0298  0.0244  0.0309
4000       5            10        0.0269   0.0306  0.0279  0.0310
4000       5            11        0.0230   0.0293  0.0234  0.0307
4000       10           00        0.0250   0.0293  0.0250  0.0299
4000       10           01        0.0228   0.0295  0.0228  0.0300
4000       10           10        0.0264   0.0293  0.0265  0.0297
4000       10           11        0.0217   0.0289  0.0221  0.0296
4000       20           00        0.0225   0.0253  0.0229  0.0259
4000       20           01        0.0203   0.0239  0.0211  0.0243
4000       20           10        0.0238   0.0231  0.0244  0.0232
4000       20           11        0.0223   0.0243  0.0226  0.0250

Table 5-15 shows the RMSE of the marginal means of the subgroups obtained from the simulated data. All four models estimate the subgroup means fairly well. The models that include the background variables (MHO-IRT and MM-IRT) generally have lower RMSEs than the HO-IRT and M-IRT models, indicating that they are relatively more efficient; the exception is category 10 at n = 20, where the HO-IRT and M-IRT models are slightly lower. The RMSE of the subgroup means decreases as test length and sample size increase, and increasing the test length improves the estimates more than increasing the sample size.
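The exception noted above can be located mechanically by comparing, in each cell, the best model without background variables against the best model with them. The sketch below does this for the n = 20 rows of Table 5-15; the variable names are illustrative.

```python
# RMSE of subgroup means in domain ability at n = 20 (Table 5-15),
# keyed by (N, category); tuples are (MHO-IRT, HO-IRT, MM-IRT, M-IRT)
n20 = {
    (1000, "00"): (0.0233, 0.0258, 0.0235, 0.0261),
    (1000, "01"): (0.0211, 0.0249, 0.0215, 0.0252),
    (1000, "10"): (0.0240, 0.0237, 0.0251, 0.0239),
    (1000, "11"): (0.0230, 0.0252, 0.0232, 0.0253),
    (4000, "00"): (0.0225, 0.0253, 0.0229, 0.0259),
    (4000, "01"): (0.0203, 0.0239, 0.0211, 0.0243),
    (4000, "10"): (0.0238, 0.0231, 0.0244, 0.0232),
    (4000, "11"): (0.0223, 0.0243, 0.0226, 0.0250),
}

# Flag cells where a model without background variables (HO-IRT or M-IRT)
# beats the best model with them (MHO-IRT or MM-IRT)
exceptions = [
    cell
    for cell, (mho, ho, mm, m) in n20.items()
    if min(ho, m) < min(mho, mm)
]
print(exceptions)
```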
Table 5-16
RMSE of subgroup standard deviation in domain ability
Examinees  Test length  Category  MHO-IRT  HO-IRT  MM-IRT  M-IRT
1000       5            00        0.0385   0.0523  0.0388  0.0529
1000       5            01        0.0330   0.0550  0.0334  0.0556
1000       5            10        0.0312   0.0523  0.0313  0.0533
1000       5            11        0.0352   0.0548  0.0359  0.0550
1000       10           00        0.0389   0.0536  0.0397  0.0543
1000       10           01        0.0341   0.0555  0.0347  0.0562
1000       10           10        0.0320   0.0532  0.0326  0.0543
1000       10           11        0.0365   0.0553  0.0367  0.0561
1000       20           00        0.0332   0.0485  0.0338  0.0493
1000       20           01        0.0309   0.0488  0.0311  0.0491
1000       20           10        0.0265   0.0478  0.0271  0.0481
1000       20           11        0.0342   0.0486  0.0348  0.0489
4000       5            00        0.0382   0.0520  0.0383  0.0522
4000       5            01        0.0326   0.0544  0.0329  0.0551
4000       5            10        0.0306   0.0514  0.0305  0.0520
4000       5            11        0.0347   0.0538  0.0354  0.0549
4000       10           00        0.0382   0.0528  0.0396  0.0536
4000       10           01        0.0330   0.0549  0.0336  0.0555
4000       10           10        0.0314   0.0523  0.0321  0.0540
4000       10           11        0.0357   0.0547  0.0361  0.0559
4000       20           00        0.0329   0.0478  0.0330  0.0486
4000       20           01        0.0300   0.0483  0.0304  0.0482
4000       20           10        0.0262   0.0468  0.0267  0.0476
4000       20           11        0.0334   0.0478  0.0338  0.0483

Table 5-16 shows the RMSE of the marginal standard deviations of the subgroups obtained from the simulated data. The RMSEs of the standard deviations are larger than those of the subgroup means in Table 5-15, so these estimates are less accurate. The MHO-IRT and MM-IRT models perform similarly and both outperform the HO-IRT and M-IRT models, again indicating that the models that include the background variables are relatively more efficient. The RMSE decreases slightly as the sample size increases and is lowest at n = 20 in every subgroup, although it is no lower at n = 10 than at n = 5. For example, with N = 1000 the MHO-IRT RMSE for category 00 is 0.0385 when n = 5 and 0.0389 when n = 10, before decreasing to 0.0332 when n = 20.