Chapter 5 Conclusions and Recommendations
Section 2 Recommendations for Future Research
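This study examined how well three item selection methods identify DIF-free items for DIF detection with the likelihood ratio test. Items judged least likely to exhibit DIF were used as anchor items in the anchor-then-detect strategy, which kept the Type I error rate within a reasonable range even when the percentage of DIF items was high; the strategy was therefore well supported in this study. However, when the percentage of DIF items was high, the iterative constant-item method selected DIF-free items somewhat less accurately than the scale purification method and the rank-based selection method. This finding is inconsistent with previous research, and it also differs from what 孫國瑋 (2010) found when using the likelihood ratio test to detect DIF in dichotomous items. Because the conditions examined in this study were limited, they appear insufficient to clarify the cause. Future research could refine the selection steps of the iterative constant-item method to improve its selection accuracy and, in turn, its detection performance.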
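This study used three recently proposed item selection methods to identify DIF-free items. How to identify DIF-free items more conveniently and more reliably remains an open issue, and future research could continue to refine the selection methods. These methods could also be examined under a wider range of conditions to better understand how the anchor-then-detect strategy performs in each of them. Future researchers could also combine the anchor-then-detect strategy with other DIF detection methods, such as SIBTEST. It is further recommended that the strategy be applied to data from other polytomous scoring models, giving future researchers more detection strategies to choose from.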
References
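孫國瑋 (2010). 先定錨後檢核策略運用在概似比檢定法之差異試題功能檢核效果 [Effectiveness of the anchor-then-detect strategy applied to the likelihood ratio test for detecting differential item functioning]. Unpublished master's thesis, Graduate Institute of Educational Measurement and Statistics, National Taichung University of Education, Taichung, Taiwan.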
Ankenmann, R. D., Witt, E. A., & Dunbar, S. B. (1999). An investigation of the power of the likelihood ratio goodness-of-fit statistic in detecting differential item functioning. Journal of Educational Measurement, 36, 277-300.
Bolt, D. M. (2002). A Monte Carlo comparison of parametric and nonparametric polytomous DIF detection methods. Applied Measurement in Education, 15, 113-141.
Candell, G. L., & Drasgow, F. (1988). An iterative procedure for linking metrics and assessing item bias in item response theory. Applied Psychological Measurement, 12, 253-260.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
Cohen, A. S., Kim, S., & Baker, F. B. (1993). Detection of differential item functioning in the graded response model. Applied Psychological Measurement, 17, 335-350.
Embretson, S. E., & Reise, S. (2000). Item response theory for psychologists. Mahwah, NJ: Erlbaum Publishers.
Finch, H. (2005). The MIMIC model as a method for detecting DIF: Comparison with Mantel-Haenszel, SIBTEST, and the IRT likelihood ratio. Applied Psychological Measurement, 29, 278-295.
French, B. F., & Maller, S. J. (2007). Iterative purification and effect size use with logistic regression for differential item functioning detection. Educational and Psychological Measurement, 67, 373-393.
Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129-145). Hillsdale, NJ: Lawrence Erlbaum.
Kim, S.-H., & Cohen, A. S. (1998). Detection of differential item functioning under the graded response model with the likelihood ratio test. Applied Psychological Measurement, 22, 345-355.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum.
Mantel, N. (1963). Chi-square tests with one degree of freedom: Extensions of the Mantel-Haenszel procedure. Journal of the American Statistical Association, 58, 690-700.
Mantel, N., & Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute, 22, 719-748.
Mellenbergh, G. J. (1982). Contingency table models for assessing item bias. Journal of Educational Statistics, 7, 105-118.
Miller, T. R., & Spray, J. A. (1993). Logistic discriminant function analysis for DIF identification of polytomously scored items. Journal of Educational Measurement, 30(2), 107-122.
Park, D. G., & Lautenschlager, G. J. (1990). Improving IRT item bias detection with iterative linking and ability scale purification. Applied Psychological Measurement, 14, 163-173.
Raju, N. S. (1988). The area between two item characteristic curves. Psychometrika, 53, 495-502.
Raju, N. S., van der Linden, W., & Fleer, P. (1995). An IRT-based internal measure of test bias with applications for differential item functioning. Applied Psychological Measurement, 19, 353-368.
Samejima, F. (1969). Estimation of a latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement, 17, 1-100.
Shealy, R., & Stout, W. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58, 159-194.
Shih, C.-L., & Wang, W.-C. (2009). Differential item functioning detection using the multiple indicators, multiple causes method with a pure short anchor. Applied Psychological Measurement, 33, 184-199.
Stark, S., Chernyshenko, O. S., & Drasgow, F. (2006). Detecting differential item functioning with confirmatory factor analysis and item response theory: Toward a unified strategy. Journal of Applied Psychology, 91, 1292-1306.
Thissen, D. (1991). MULTILOG user’s guide (Version 6) [Computer program]. Mooresville, IN: Scientific Software.
Thissen, D. (2001). IRTLRDIF v.2.0b: Software for the computation of the statistics involved in item response theory likelihood-ratio tests for differential item functioning. University of North Carolina at Chapel Hill.
Thissen, D., Steinberg, L., & Gerrard, M. (1986). Beyond group mean differences: The concept of item bias. Psychological Bulletin, 99, 118-128.
Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group differences in trace lines. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 147-169). Hillsdale, NJ: Lawrence Erlbaum.
Thissen, D., Steinberg, L., & Wainer, H. (1993). Detection of differential item functioning using the parameters of item response models. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 67-113). Hillsdale, NJ: Erlbaum.
Wang, W.-C. (2001, September). Effects of anchor item methods on the detection of differential item functioning within the family of Rasch models. Paper presented at the annual meeting of the Chinese Psychological Association, Chia-Yi, Taiwan. Manuscript submitted for publication.
Wang, W.-C. (2004). Effect of anchor item methods on the detection of differential item functioning within the family of Rasch models. Journal of Experimental Education, 72, 221-261.
Wang, W.-C. (2008). Assessment of differential item functioning. Journal of Applied Measurement, 9, 387-408.
Wang, W.-C., Shih, C.-L., & Yang, C.-C. (2009). The MIMIC method with scale purification for detecting differential item functioning. Educational and Psychological Measurement, 69, 713-731.
Wang, W.-C., & Su, Y.-H. (2004). Effects of average signed area between two item characteristic curves and test purification procedures on the DIF detection via the Mantel-Haenszel method. Applied Measurement in Education, 17, 113-144.
Wang, W.-C., & Yeh, Y.-L. (2003). Effects of anchor item methods on differential item functioning detection with the likelihood ratio test. Applied Psychological Measurement, 27, 479-498.
Woods, C. M. (2009). Empirical selection of anchors for tests of differential item functioning. Applied Psychological Measurement, 33, 42-57.