Experiment two: Model parameter recovery with discrete background

CHAPTER 5 Results

5.2 Experiment two: Model parameter recovery with discrete background

To evaluate how well the multilevel higher-order item response model recovers parameters from data with and without background variables, this experiment applies the competing models in a parameter recovery study with discrete background variables. The results are presented in two parts: the overall ability estimates and the domain ability estimates.

5.2.1 Overall ability estimates

The results for the overall ability estimates are reported for two sets of statistics: the group statistics and the subgroup statistics. Table 5-9 gives the RMSE of the group mean in overall ability according to the following factors: (a) the number of items administered in each domain (three levels: n = 5, 10, and 20); (b) the fitted model (four models: the multilevel higher-order item response model (MHO-IRT), the higher-order item response model (HO-IRT), the multilevel unidimensional item response model (MU-IRT), and the unidimensional item response model (U-IRT)); (c) the group defined by the background variables (four levels: A: school A, B: school B, H: high parental socioeconomic status, and L: low parental socioeconomic status); and (d) the number of examinees (two levels: N = 1000 and 4000). In the simulation, the average difference between school types A and B was set to 0.000, while the average difference based on parental SES was 1.414 in magnitude.
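The RMSE values reported in this section summarize how far the estimated group statistics fall from their generating values across simulated replications. The following is a minimal sketch of that computation, assuming the usual definition (the square root of the mean squared deviation over replications), since this section does not restate the formula. The function name `rmse` and the replication values are hypothetical illustrations, not output from the study.

```python
import math

def rmse(estimates, true_value):
    """Root mean squared error of replicated estimates around a known true value."""
    if not estimates:
        raise ValueError("need at least one replication")
    squared_errors = [(est - true_value) ** 2 for est in estimates]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical estimated means of one group over five replications;
# the generating (true) group mean is assumed to be 0.0 for illustration.
replicated_group_means = [0.03, -0.02, 0.01, -0.04, 0.02]
print(round(rmse(replicated_group_means, 0.0), 4))
```

The same function would apply to group standard deviations by passing the replicated standard deviation estimates and their generating value.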

The MHO-IRT model provides the lowest RMSE among the four models in nearly every condition. The gap between the multilevel models (i.e., MHO-IRT and MU-IRT) and the non-multilevel IRT models (i.e., HO-IRT and U-IRT) is larger for the groups defined by parental SES, where the average group difference is larger. When N = 1000 and n = 20, the RMSEs under the MHO-IRT model are 0.0312 for group A, 0.0313 for group B, 0.0235 for group L, and 0.0211 for group H, whereas under the HO-IRT model they increase to 0.0317 for group A, 0.0325 for group B, 0.0303 for group L, and 0.0309 for group H. The differences between MHO-IRT and HO-IRT are 0.0005 for group A, 0.0012 for group B, 0.0068 for group L, and 0.0098 for group H. These differences indicate that the MHO-IRT model provides better estimates when the average differences in the background variables are higher.
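The comparison between MHO-IRT and HO-IRT can be recomputed directly from the Table 5-9 entries for N = 1000 and n = 20. The sketch below does only that arithmetic; the variable names are illustrative.

```python
# RMSE of the group mean from Table 5-9 (N = 1000, n = 20).
mho_irt = {"A": 0.0312, "B": 0.0313, "L": 0.0235, "H": 0.0211}
ho_irt = {"A": 0.0317, "B": 0.0325, "L": 0.0303, "H": 0.0309}

# Gap = HO-IRT RMSE minus MHO-IRT RMSE; rounding removes floating-point noise.
gaps = {group: round(ho_irt[group] - mho_irt[group], 4) for group in mho_irt}

school_gap = max(gaps["A"], gaps["B"])  # groups with no true mean difference
ses_gap = min(gaps["L"], gaps["H"])     # groups separated by parental SES
print(gaps)
print(ses_gap > school_gap)
```

Even the smaller of the two SES gaps exceeds the larger of the two school gaps, which is the pattern the paragraph describes.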

Table 5-9 shows that more accurate estimates are achieved with longer tests and larger sample sizes. For example, under the MHO-IRT model with N = 1000, the RMSE for group H is 0.0284 when n = 5, decreases to 0.0263 when the number of items administered in each domain increases to 10, and decreases further to 0.0211 when n = 20. Increasing the sample size from N = 1000 to N = 4000 at n = 20 likewise lowers this RMSE from 0.0211 to 0.0207, consistent with a larger sample being more representative of the population.

Table 5-9

RMSE of group mean in overall ability

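| Examinees (N) | Test length (n) | Category | MHO-IRT | HO-IRT | MU-IRT | U-IRT |
|---|---|---|---|---|---|---|
| 1000 | 5 | A | 0.0382 | 0.0396 | 0.0391 | 0.0396 |
| 1000 | 5 | B | 0.0384 | 0.0397 | 0.0387 | 0.0393 |
| 1000 | 5 | L | 0.0296 | 0.0380 | 0.0322 | 0.0390 |
| 1000 | 5 | H | 0.0284 | 0.0384 | 0.0327 | 0.0391 |
| 1000 | 10 | A | 0.0355 | 0.0364 | 0.0368 | 0.0365 |
| 1000 | 10 | B | 0.0357 | 0.0370 | 0.0367 | 0.0369 |
| 1000 | 10 | L | 0.0276 | 0.0348 | 0.0293 | 0.0369 |
| 1000 | 10 | H | 0.0263 | 0.0358 | 0.0299 | 0.0363 |
| 1000 | 20 | A | 0.0312 | 0.0317 | 0.0322 | 0.0324 |
| 1000 | 20 | B | 0.0313 | 0.0325 | 0.0318 | 0.0317 |
| 1000 | 20 | L | 0.0235 | 0.0303 | 0.0244 | 0.0319 |
| 1000 | 20 | H | 0.0211 | 0.0309 | 0.0249 | 0.0320 |
| 4000 | 5 | A | 0.0375 | 0.0395 | 0.0382 | 0.0392 |
| 4000 | 5 | B | 0.0366 | 0.0386 | 0.0382 | 0.0377 |
| 4000 | 5 | L | 0.0278 | 0.0373 | 0.0311 | 0.0389 |
| 4000 | 5 | H | 0.0273 | 0.0365 | 0.0321 | 0.0382 |
| 4000 | 10 | A | 0.0350 | 0.0353 | 0.0350 | 0.0349 |
| 4000 | 10 | B | 0.0339 | 0.0368 | 0.0360 | 0.0354 |
| 4000 | 10 | L | 0.0269 | 0.0335 | 0.0285 | 0.0362 |
| 4000 | 10 | H | 0.0243 | 0.0348 | 0.0297 | 0.0352 |
| 4000 | 20 | A | 0.0298 | 0.0309 | 0.0307 | 0.0305 |
| 4000 | 20 | B | 0.0302 | 0.0307 | 0.0309 | 0.0305 |
| 4000 | 20 | L | 0.0218 | 0.0302 | 0.0235 | 0.0302 |
| 4000 | 20 | H | 0.0207 | 0.0291 | 0.0239 | 0.0314 |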

Table 5-10

RMSE of group standard deviation in overall ability

| Examinees (N) | Test length (n) | Category | MHO-IRT | HO-IRT | MU-IRT | U-IRT |
|---|---|---|---|---|---|---|
| 1000 | 5 | A | 0.0424 | 0.0634 | 0.0436 | 0.0686 |
| 1000 | 5 | B | 0.0432 | 0.0629 | 0.0463 | 0.0705 |
| 1000 | 5 | L | 0.0209 | 0.0630 | 0.0239 | 0.0643 |
| 1000 | 5 | H | 0.0232 | 0.0632 | 0.0246 | 0.0656 |
| 1000 | 10 | A | 0.0398 | 0.0604 | 0.0414 | 0.0658 |
| 1000 | 10 | B | 0.0411 | 0.0599 | 0.0436 | 0.0677 |
| 1000 | 10 | L | 0.0183 | 0.0605 | 0.0212 | 0.0613 |
| 1000 | 10 | H | 0.0207 | 0.0606 | 0.0215 | 0.0625 |
| 1000 | 20 | A | 0.0368 | 0.0575 | 0.0393 | 0.0636 |
| 1000 | 20 | B | 0.0379 | 0.0567 | 0.0409 | 0.0647 |
| 1000 | 20 | L | 0.0159 | 0.0584 | 0.0183 | 0.0589 |
| 1000 | 20 | H | 0.0182 | 0.0583 | 0.0184 | 0.0603 |
| 4000 | 5 | A | 0.0411 | 0.0624 | 0.0417 | 0.0667 |
| 4000 | 5 | B | 0.0420 | 0.0614 | 0.0457 | 0.0698 |
| 4000 | 5 | L | 0.0200 | 0.0619 | 0.0234 | 0.0642 |
| 4000 | 5 | H | 0.0222 | 0.0623 | 0.0243 | 0.0646 |
| 4000 | 10 | A | 0.0391 | 0.0598 | 0.0399 | 0.0643 |
| 4000 | 10 | B | 0.0393 | 0.0589 | 0.0427 | 0.0672 |
| 4000 | 10 | L | 0.0171 | 0.0596 | 0.0203 | 0.0598 |
| 4000 | 10 | H | 0.0198 | 0.0600 | 0.0211 | 0.0618 |
| 4000 | 20 | A | 0.0355 | 0.0564 | 0.0390 | 0.0633 |
| 4000 | 20 | B | 0.0379 | 0.0554 | 0.0403 | 0.0647 |
| 4000 | 20 | L | 0.0146 | 0.0575 | 0.0172 | 0.0581 |
| 4000 | 20 | H | 0.0163 | 0.0571 | 0.0182 | 0.0589 |

Table 5-10 gives the RMSE of the group standard deviation of the overall ability according to the four factors described above. The MHO-IRT model consistently provides the lowest RMSE among the four models. The gap between the multilevel models (i.e., MHO-IRT and MU-IRT) and the non-multilevel IRT models (i.e., HO-IRT and U-IRT) is larger for the groups defined by parental SES, so the MHO-IRT model provides better estimates when the average differences in the background variables are higher. Table 5-10 also shows that more accurate estimates are achieved with longer tests and larger sample sizes.

The RMSE differences between the multilevel and the non-multilevel IRT models are larger for the group standard deviations than for the group means. The result shows that the multilevel models provide better estimates of the group standard deviations.
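To make the mean versus standard deviation comparison concrete, the sketch below takes one condition (group L, N = 1000, n = 20) from Tables 5-9 and 5-10 and computes the HO-IRT minus MHO-IRT gap for each statistic. The choice of this single cell is only an illustration; the variable names are not from the source.

```python
# MHO-IRT and HO-IRT RMSEs for group L at N = 1000, n = 20.
mean_rmse = {"MHO-IRT": 0.0235, "HO-IRT": 0.0303}  # Table 5-9
sd_rmse = {"MHO-IRT": 0.0159, "HO-IRT": 0.0584}    # Table 5-10

# Gap between the non-multilevel and the multilevel higher-order model.
mean_gap = round(mean_rmse["HO-IRT"] - mean_rmse["MHO-IRT"], 4)
sd_gap = round(sd_rmse["HO-IRT"] - sd_rmse["MHO-IRT"], 4)
print(mean_gap, sd_gap)
```

In this cell the gap for the standard deviation is several times the gap for the mean.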

Table 5-11

RMSE of subgroup mean in overall ability

| Examinees (N) | Test length (n) | Category | MHO-IRT | HO-IRT | MU-IRT | U-IRT |
|---|---|---|---|---|---|---|
| 1000 | 5 | 00 | 0.0288 | 0.0303 | 0.0299 | 0.0306 |
| 1000 | 5 | 01 | 0.0232 | 0.0317 | 0.0240 | 0.0326 |
| 1000 | 5 | 10 | 0.0220 | 0.0301 | 0.0232 | 0.0305 |
| 1000 | 5 | 11 | 0.0265 | 0.0307 | 0.0271 | 0.0313 |
| 1000 | 10 | 00 | 0.0279 | 0.0302 | 0.0281 | 0.0313 |
| 1000 | 10 | 01 | 0.0226 | 0.0309 | 0.0232 | 0.0316 |
| 1000 | 10 | 10 | 0.0219 | 0.0284 | 0.0226 | 0.0291 |
| 1000 | 10 | 11 | 0.0264 | 0.0305 | 0.0268 | 0.0315 |
| 1000 | 20 | 00 | 0.0206 | 0.0298 | 0.0208 | 0.0303 |
| 1000 | 20 | 01 | 0.0150 | 0.0232 | 0.0151 | 0.0245 |
| 1000 | 20 | 10 | 0.0217 | 0.0298 | 0.0229 | 0.0298 |
| 1000 | 20 | 11 | 0.0207 | 0.0244 | 0.0216 | 0.0252 |
| 4000 | 5 | 00 | 0.0286 | 0.0300 | 0.0298 | 0.0305 |
| 4000 | 5 | 01 | 0.0220 | 0.0306 | 0.0238 | 0.0318 |
| 4000 | 5 | 10 | 0.0212 | 0.0300 | 0.0225 | 0.0304 |
| 4000 | 5 | 11 | 0.0257 | 0.0297 | 0.0260 | 0.0303 |
| 4000 | 10 | 00 | 0.0275 | 0.0297 | 0.0276 | 0.0300 |
| 4000 | 10 | 01 | 0.0216 | 0.0298 | 0.0229 | 0.0315 |
| 4000 | 10 | 10 | 0.0208 | 0.0276 | 0.0216 | 0.0291 |
| 4000 | 10 | 11 | 0.0262 | 0.0305 | 0.0265 | 0.0313 |
| 4000 | 20 | 00 | 0.0196 | 0.0295 | 0.0203 | 0.0294 |
| 4000 | 20 | 01 | 0.0144 | 0.0228 | 0.0150 | 0.0235 |
| 4000 | 20 | 10 | 0.0208 | 0.0287 | 0.0226 | 0.0294 |
| 4000 | 20 | 11 | 0.0198 | 0.0239 | 0.0205 | 0.0247 |

Table 5-11 shows the RMSE of the marginal means for each of the subgroups as obtained from the simulated data. The means are estimated fairly well by all the models. Compared with the U-IRT and HO-IRT models, the models that take the background variables into account (MHO-IRT and MU-IRT) are relatively efficient. The RMSE of the subgroup mean decreases as the test length and the sample size increase, and the results indicate that lengthening the test improves the estimates more than enlarging the sample.
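The relative effect of test length and sample size can be checked against the MHO-IRT column of Table 5-11. The sketch below compares the average RMSE reduction from quadrupling the test length (n = 5 to n = 20 at N = 1000) with the reduction from quadrupling the sample (N = 1000 to N = 4000 at n = 5). The helper `mean_reduction` is a hypothetical name introduced for illustration.

```python
# MHO-IRT RMSE of the subgroup mean from Table 5-11, categories 00, 01, 10, 11.
baseline = [0.0288, 0.0232, 0.0220, 0.0265]        # N = 1000, n = 5
longer_test = [0.0206, 0.0150, 0.0217, 0.0207]     # N = 1000, n = 20
larger_sample = [0.0286, 0.0220, 0.0212, 0.0257]   # N = 4000, n = 5

def mean_reduction(before, after):
    """Average drop in RMSE across categories when moving from `before` to `after`."""
    return sum(b - a for b, a in zip(before, after)) / len(before)

length_gain = mean_reduction(baseline, longer_test)
sample_gain = mean_reduction(baseline, larger_sample)
print(round(length_gain, 5), round(sample_gain, 5))
```

Both changes are fourfold increases, so the two average reductions are comparable on the same footing, and the reduction from the longer test is the larger of the two.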

Table 5-12

RMSE of subgroup standard deviation in overall ability

| Examinees (N) | Test length (n) | Category | MHO-IRT | HO-IRT | MU-IRT | U-IRT |
|---|---|---|---|---|---|---|
| 1000 | 5 | 00 | 0.0352 | 0.0542 | 0.0360 | 0.0547 |
| 1000 | 5 | 01 | 0.0356 | 0.0508 | 0.0368 | 0.0518 |
| 1000 | 5 | 10 | 0.0316 | 0.0527 | 0.0329 | 0.0533 |
| 1000 | 5 | 11 | 0.0317 | 0.0542 | 0.0322 | 0.0554 |
| 1000 | 10 | 00 | 0.0354 | 0.0544 | 0.0364 | 0.0550 |
| 1000 | 10 | 01 | 0.0372 | 0.0525 | 0.0378 | 0.0527 |
| 1000 | 10 | 10 | 0.0325 | 0.0535 | 0.0329 | 0.0541 |
| 1000 | 10 | 11 | 0.0319 | 0.0549 | 0.0320 | 0.0560 |
| 1000 | 20 | 00 | 0.0307 | 0.0499 | 0.0308 | 0.0507 |
| 1000 | 20 | 01 | 0.0317 | 0.0480 | 0.0324 | 0.0488 |
| 1000 | 20 | 10 | 0.0324 | 0.0506 | 0.0328 | 0.0519 |
| 1000 | 20 | 11 | 0.0264 | 0.0532 | 0.0266 | 0.0537 |
| 4000 | 5 | 00 | 0.0349 | 0.0541 | 0.0352 | 0.0538 |
| 4000 | 5 | 01 | 0.0351 | 0.0506 | 0.0357 | 0.0512 |
| 4000 | 5 | 10 | 0.0305 | 0.0521 | 0.0317 | 0.0526 |
| 4000 | 5 | 11 | 0.0316 | 0.0535 | 0.0311 | 0.0547 |
| 4000 | 10 | 00 | 0.0344 | 0.0536 | 0.0360 | 0.0542 |
| 4000 | 10 | 01 | 0.0359 | 0.0517 | 0.0366 | 0.0520 |
| 4000 | 10 | 10 | 0.0313 | 0.0530 | 0.0320 | 0.0532 |
| 4000 | 10 | 11 | 0.0314 | 0.0544 | 0.0307 | 0.0553 |
| 4000 | 20 | 00 | 0.0299 | 0.0488 | 0.0307 | 0.0494 |
| 4000 | 20 | 01 | 0.0317 | 0.0477 | 0.0311 | 0.0482 |
| 4000 | 20 | 10 | 0.0315 | 0.0499 | 0.0315 | 0.0506 |
| 4000 | 20 | 11 | 0.0253 | 0.0522 | 0.0253 | 0.0526 |

Table 5-12 shows the RMSE of the marginal standard deviations for each of the subgroups as obtained from the simulated data. The RMSEs of the marginal standard deviations are larger than those of the mean estimates, indicating some deterioration. The MHO-IRT and MU-IRT models perform similarly and outperform the HO-IRT and U-IRT models, which indicates that the models that take the background variables into account are relatively efficient. The RMSEs of the subgroup standard deviations generally decrease as the test length and the sample size increase, although not monotonically. For the MHO-IRT model in category 00 with N = 1000, the RMSE is 0.0352 when n = 5, is essentially unchanged at 0.0354 when the number of items administered in each domain increases to 10, and decreases to 0.0307 when n = 20.
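Whether the RMSE falls at every step of test length can be checked mechanically. The sketch below applies a small helper (a hypothetical name, `strictly_decreasing`) to the MHO-IRT entries of Table 5-12 at N = 1000, and also reports whether the RMSE at n = 20 is below the RMSE at n = 5.

```python
def strictly_decreasing(values):
    """True if each value is smaller than the one before it."""
    return all(later < earlier for earlier, later in zip(values, values[1:]))

# MHO-IRT RMSE of the subgroup standard deviation, Table 5-12, N = 1000,
# listed for n = 5, 10, 20.
by_category = {
    "00": [0.0352, 0.0354, 0.0307],
    "01": [0.0356, 0.0372, 0.0317],
    "10": [0.0316, 0.0325, 0.0324],
    "11": [0.0317, 0.0319, 0.0264],
}
for category, rmses in by_category.items():
    # Columns: category, decreasing at every step?, lower at n = 20 than at n = 5?
    print(category, strictly_decreasing(rmses), rmses[-1] < rmses[0])
```

A check of this kind separates the overall trend across test lengths from step-by-step changes between adjacent test lengths.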

5.2.2 Domain ability estimates

Table 5-13

RMSE of group mean in domain ability

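| Examinees (N) | Test length (n) | Category | MHO-IRT | HO-IRT | MM-IRT | M-IRT |
|---|---|---|---|---|---|---|
| 1000 | 5 | A | 0.0380 | 0.0386 | 0.0386 | 0.0392 |
| 1000 | 5 | B | 0.0379 | 0.0395 | 0.0387 | 0.0384 |
| 1000 | 5 | L | 0.0291 | 0.0372 | 0.0320 | 0.0389 |
| 1000 | 5 | H | 0.0277 | 0.0379 | 0.0323 | 0.0389 |
| 1000 | 10 | A | 0.0352 | 0.0371 | 0.0357 | 0.0366 |
| 1000 | 10 | B | 0.0352 | 0.0362 | 0.0358 | 0.0355 |
| 1000 | 10 | L | 0.0258 | 0.0345 | 0.0296 | 0.0365 |
| 1000 | 10 | H | 0.0249 | 0.0352 | 0.0295 | 0.0359 |
| 1000 | 20 | A | 0.0308 | 0.0330 | 0.0307 | 0.0315 |
| 1000 | 20 | B | 0.0299 | 0.0325 | 0.0317 | 0.0318 |
| 1000 | 20 | L | 0.0204 | 0.0294 | 0.0245 | 0.0310 |
| 1000 | 20 | H | 0.0200 | 0.0310 | 0.0253 | 0.0306 |
| 4000 | 5 | A | 0.0371 | 0.0384 | 0.0370 | 0.0370 |
| 4000 | 5 | B | 0.0367 | 0.0390 | 0.0374 | 0.0380 |
| 4000 | 5 | L | 0.0272 | 0.0367 | 0.0313 | 0.0369 |
| 4000 | 5 | H | 0.0279 | 0.0370 | 0.0312 | 0.0384 |
| 4000 | 10 | A | 0.0329 | 0.0354 | 0.0345 | 0.0358 |
| 4000 | 10 | B | 0.0344 | 0.0352 | 0.0350 | 0.0343 |
| 4000 | 10 | L | 0.0244 | 0.0343 | 0.0294 | 0.0348 |
| 4000 | 10 | H | 0.0245 | 0.0355 | 0.0290 | 0.0356 |
| 4000 | 20 | A | 0.0284 | 0.0320 | 0.0296 | 0.0318 |
| 4000 | 20 | B | 0.0283 | 0.0318 | 0.0311 | 0.0305 |
| 4000 | 20 | L | 0.0204 | 0.0288 | 0.0229 | 0.0292 |
| 4000 | 20 | H | 0.0188 | 0.0312 | 0.0240 | 0.0303 |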

Table 5-13 shows that more accurate estimates are achieved with longer tests and larger sample sizes. For example, under the MHO-IRT model with N = 1000, the RMSE for group H is 0.0277 when n = 5, decreases to 0.0249 when the number of items administered in each domain increases to 10, and decreases further to 0.0200 when n = 20. Increasing the sample size to N = 4000 at n = 20 lowers this RMSE to 0.0188, consistent with a larger sample being more representative of the population. The MHO-IRT model provides the lowest RMSE among the four models in nearly every condition.

Table 5-14

RMSE of group standard deviation in domain ability

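| Examinees (N) | Test length (n) | Category | MHO-IRT | HO-IRT | MM-IRT | M-IRT |
|---|---|---|---|---|---|---|
| 1000 | 5 | A | 0.0418 | 0.0626 | 0.0428 | 0.0683 |
| 1000 | 5 | B | 0.0426 | 0.0629 | 0.0458 | 0.0696 |
| 1000 | 5 | L | 0.0201 | 0.0621 | 0.0236 | 0.0638 |
| 1000 | 5 | H | 0.0222 | 0.0627 | 0.0239 | 0.0653 |
| 1000 | 10 | A | 0.0403 | 0.0606 | 0.0399 | 0.0656 |
| 1000 | 10 | B | 0.0404 | 0.0599 | 0.0432 | 0.0675 |
| 1000 | 10 | L | 0.0172 | 0.0603 | 0.0211 | 0.0614 |
| 1000 | 10 | H | 0.0201 | 0.0604 | 0.0213 | 0.0632 |
| 1000 | 20 | A | 0.0361 | 0.0577 | 0.0374 | 0.0631 |
| 1000 | 20 | B | 0.0383 | 0.0564 | 0.0396 | 0.0655 |
| 1000 | 20 | L | 0.0149 | 0.0576 | 0.0190 | 0.0587 |
| 1000 | 20 | H | 0.0171 | 0.0582 | 0.0194 | 0.0597 |
| 4000 | 5 | A | 0.0397 | 0.0610 | 0.0419 | 0.0661 |
| 4000 | 5 | B | 0.0414 | 0.0601 | 0.0447 | 0.0695 |
| 4000 | 5 | L | 0.0198 | 0.0613 | 0.0230 | 0.0622 |
| 4000 | 5 | H | 0.0219 | 0.0603 | 0.0232 | 0.0639 |
| 4000 | 10 | A | 0.0377 | 0.0593 | 0.0395 | 0.0648 |
| 4000 | 10 | B | 0.0399 | 0.0592 | 0.0417 | 0.0674 |
| 4000 | 10 | L | 0.0171 | 0.0576 | 0.0202 | 0.0605 |
| 4000 | 10 | H | 0.0195 | 0.0587 | 0.0212 | 0.0632 |
| 4000 | 20 | A | 0.0357 | 0.0563 | 0.0362 | 0.0619 |
| 4000 | 20 | B | 0.0369 | 0.0543 | 0.0395 | 0.0649 |
| 4000 | 20 | L | 0.0127 | 0.0559 | 0.0181 | 0.0575 |
| 4000 | 20 | H | 0.0160 | 0.0571 | 0.0194 | 0.0583 |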

Table 5-14 gives the RMSE of the group standard deviation of domain ability according to the four factors described above. It shows that more accurate estimates are achieved with longer tests and larger sample sizes. The MHO-IRT model provides the lowest RMSE among the four models in nearly every condition and provides better estimates when the average difference in the background variables is higher. The RMSE differences between the multilevel and the non-multilevel IRT models are larger for the group standard deviations than for the group means. The results show that the multilevel models provide better estimates of the group standard deviation.

Table 5-15

RMSE of subgroup mean in domain ability

| Examinees (N) | Test length (n) | Category | MHO-IRT | HO-IRT | MM-IRT | M-IRT |
|---|---|---|---|---|---|---|
| 1000 | 5 | 00 | 0.0264 | 0.0310 | 0.0272 | 0.0320 |
| 1000 | 5 | 01 | 0.0242 | 0.0310 | 0.0245 | 0.0316 |
| 1000 | 5 | 10 | 0.0278 | 0.0312 | 0.0288 | 0.0315 |
| 1000 | 5 | 11 | 0.0239 | 0.0304 | 0.0243 | 0.0313 |
| 1000 | 10 | 00 | 0.0254 | 0.0297 | 0.0261 | 0.0306 |
| 1000 | 10 | 01 | 0.0232 | 0.0303 | 0.0237 | 0.0309 |
| 1000 | 10 | 10 | 0.0266 | 0.0300 | 0.0271 | 0.0305 |
| 1000 | 10 | 11 | 0.0225 | 0.0297 | 0.0230 | 0.0302 |
| 1000 | 20 | 00 | 0.0233 | 0.0258 | 0.0235 | 0.0261 |
| 1000 | 20 | 01 | 0.0211 | 0.0249 | 0.0215 | 0.0252 |
| 1000 | 20 | 10 | 0.0240 | 0.0237 | 0.0251 | 0.0239 |
| 1000 | 20 | 11 | 0.0230 | 0.0252 | 0.0232 | 0.0253 |
| 4000 | 5 | 00 | 0.0255 | 0.0305 | 0.0266 | 0.0312 |
| 4000 | 5 | 01 | 0.0233 | 0.0298 | 0.0244 | 0.0309 |
| 4000 | 5 | 10 | 0.0269 | 0.0306 | 0.0279 | 0.0310 |
| 4000 | 5 | 11 | 0.0230 | 0.0293 | 0.0234 | 0.0307 |
| 4000 | 10 | 00 | 0.0250 | 0.0293 | 0.0250 | 0.0299 |
| 4000 | 10 | 01 | 0.0228 | 0.0295 | 0.0228 | 0.0300 |
| 4000 | 10 | 10 | 0.0264 | 0.0293 | 0.0265 | 0.0297 |
| 4000 | 10 | 11 | 0.0217 | 0.0289 | 0.0221 | 0.0296 |
| 4000 | 20 | 00 | 0.0225 | 0.0253 | 0.0229 | 0.0259 |
| 4000 | 20 | 01 | 0.0203 | 0.0239 | 0.0211 | 0.0243 |
| 4000 | 20 | 10 | 0.0238 | 0.0231 | 0.0244 | 0.0232 |
| 4000 | 20 | 11 | 0.0223 | 0.0243 | 0.0226 | 0.0250 |

Table 5-15 shows the RMSE of the marginal means for each of the subgroups as obtained from the simulated data. The means are estimated fairly well by all the models. Compared with the M-IRT and HO-IRT models, the models that include the background variables (MHO-IRT and MM-IRT) are relatively efficient. The RMSE of the subgroup mean generally decreases as the test length and the sample size increase, and the results indicate that lengthening the test improves the estimates more than enlarging the sample.

Table 5-16

RMSE of subgroup standard deviation in domain ability

| Examinees (N) | Test length (n) | Category | MHO-IRT | HO-IRT | MM-IRT | M-IRT |
|---|---|---|---|---|---|---|
| 1000 | 5 | 00 | 0.0385 | 0.0523 | 0.0388 | 0.0529 |
| 1000 | 5 | 01 | 0.0330 | 0.0550 | 0.0334 | 0.0556 |
| 1000 | 5 | 10 | 0.0312 | 0.0523 | 0.0313 | 0.0533 |
| 1000 | 5 | 11 | 0.0352 | 0.0548 | 0.0359 | 0.0550 |
| 1000 | 10 | 00 | 0.0389 | 0.0536 | 0.0397 | 0.0543 |
| 1000 | 10 | 01 | 0.0341 | 0.0555 | 0.0347 | 0.0562 |
| 1000 | 10 | 10 | 0.0320 | 0.0532 | 0.0326 | 0.0543 |
| 1000 | 10 | 11 | 0.0365 | 0.0553 | 0.0367 | 0.0561 |
| 1000 | 20 | 00 | 0.0332 | 0.0485 | 0.0338 | 0.0493 |
| 1000 | 20 | 01 | 0.0309 | 0.0488 | 0.0311 | 0.0491 |
| 1000 | 20 | 10 | 0.0265 | 0.0478 | 0.0271 | 0.0481 |
| 1000 | 20 | 11 | 0.0342 | 0.0486 | 0.0348 | 0.0489 |
| 4000 | 5 | 00 | 0.0382 | 0.0520 | 0.0383 | 0.0522 |
| 4000 | 5 | 01 | 0.0326 | 0.0544 | 0.0329 | 0.0551 |
| 4000 | 5 | 10 | 0.0306 | 0.0514 | 0.0305 | 0.0520 |
| 4000 | 5 | 11 | 0.0347 | 0.0538 | 0.0354 | 0.0549 |
| 4000 | 10 | 00 | 0.0382 | 0.0528 | 0.0396 | 0.0536 |
| 4000 | 10 | 01 | 0.0330 | 0.0549 | 0.0336 | 0.0555 |
| 4000 | 10 | 10 | 0.0314 | 0.0523 | 0.0321 | 0.0540 |
| 4000 | 10 | 11 | 0.0357 | 0.0547 | 0.0361 | 0.0559 |
| 4000 | 20 | 00 | 0.0329 | 0.0478 | 0.0330 | 0.0486 |
| 4000 | 20 | 01 | 0.0300 | 0.0483 | 0.0304 | 0.0482 |
| 4000 | 20 | 10 | 0.0262 | 0.0468 | 0.0267 | 0.0476 |
| 4000 | 20 | 11 | 0.0334 | 0.0478 | 0.0338 | 0.0483 |

Table 5-16 shows the RMSE of the marginal standard deviations for each of the subgroups as obtained from the simulated data. The RMSEs of the marginal standard deviations are larger than those of the mean estimates, indicating some deterioration. The MHO-IRT and MM-IRT models perform similarly and outperform the HO-IRT and M-IRT models, which indicates that the models that take the background variables into account are relatively efficient. The RMSEs of the subgroup standard deviations generally decrease as the test length and the sample size increase, although not monotonically. For the MHO-IRT model in category 00 with N = 1000, the RMSE is 0.0385 when n = 5, is essentially unchanged at 0.0389 when the number of items administered in each domain increases to 10, and decreases to 0.0332 when n = 20.

Source document: Markov Chain Monte Carlo Estimation for Multilevel Higher-Order Item Response Theory (pages 69-79)