
Chapter 5 Conclusions and Recommendations

Section 2 Future Research and Recommendations

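This study examined the effectiveness of applying the DIF-free-then-DIF (DFTD) strategy to DIF detection under the PHGLM model. The results indicate that when the two groups differ in mean ability, DIF detection with the DFTD strategy keeps the Type I error rate under control; when 40% of the items contain DIF, however, the Type I error rate becomes inflated. In the DFTD strategy, the DIF-free item is selected as follows: each item in the test is used in turn as the reference (anchor) item to test the remaining items, which yields a DIF coefficient for every other item. The average of these DIF coefficients is that item's anchor effect size, and the item with the smallest anchor effect size is selected as the DIF-free item for the subsequent DIF detection.

If a DIF item is used as the anchor, the DIF coefficients of the other DIF items shrink, while those of the DIF-free items grow. Consequently, when the proportion of DIF items in a test is high, a DIF item can easily end up with a smaller anchor effect size than a DIF-free item, so the selection step is prone to picking an item that contains DIF, which in turn distorts the detection results. Previous research shows that DIF detection with a correctly identified DIF-free item does control the Type I error rate. Although this study used only one item as the anchor, Wang (2004) and Wang and Yeh (2003) showed that as long as the anchor in the constant-item method is confirmed to be DIF-free, even a single anchor item keeps the Type I error rate well controlled. Improving the accuracy with which DIF-free items are identified is therefore an area on which future research could focus.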
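To make the selection step concrete, here is a minimal Python sketch. It assumes the DIF estimates are already available as a square matrix whose entry [i, j] is the DIF coefficient of item j when item i serves as the single anchor; fitting the PHGLM itself is not shown. The function names are hypothetical, and averaging absolute values (so that positive and negative coefficients cannot cancel) is an assumption, since the text says only that the coefficients are summed and averaged. The `relative_dif_matrix` helper is an idealized, noise-free illustration of the mechanism described above: with one anchor, each item's DIF is identified only relative to the anchor's own DIF, d_j - d_i.

```python
import numpy as np


def anchor_effect_sizes(dif_matrix):
    """Anchor effect size for every candidate anchor item.

    dif_matrix[i, j] is the estimated DIF coefficient of item j when item i
    is used as the single anchor; the diagonal is ignored.
    ASSUMPTION: absolute values are averaged so that DIF of opposite signs
    cannot cancel out (the text only says "sum and average").
    """
    m = np.asarray(dif_matrix, dtype=float)
    n = m.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    return np.array([np.abs(m[i, off_diag[i]]).mean() for i in range(n)])


def select_dif_free_item(dif_matrix):
    """Index of the item with the smallest anchor effect size."""
    return int(np.argmin(anchor_effect_sizes(dif_matrix)))


def relative_dif_matrix(true_dif):
    """Hypothetical, noise-free illustration: with item i as the only anchor,
    item j's DIF is identified only relative to the anchor, i.e. d_j - d_i."""
    d = np.asarray(true_dif, dtype=float)
    return d[None, :] - d[:, None]


if __name__ == "__main__":
    # 10 items, 40% DIF: items 0-3 carry a DIF of 0.5 (constant pattern).
    d40 = [0.5] * 4 + [0.0] * 6
    print(anchor_effect_sizes(relative_dif_matrix(d40)).round(3))
    print(select_dif_free_item(relative_dif_matrix(d40)))  # 4, a DIF-free item

    # 10 items, 60% DIF: the DIF items now dominate the average.
    d60 = [0.5] * 6 + [0.0] * 4
    print(select_dif_free_item(relative_dif_matrix(d60)))  # 0, a DIF item
```

With 4 DIF items out of 10, a DIF anchor gets an anchor effect size of 3/9 against 2/9 for a DIF-free anchor, so a DIF-free item is chosen; with 6 DIF items the inequality reverses and a DIF item is chosen. In this noise-free setting the selection fails only once DIF items are the majority; estimation error in real data narrows the gap between the two averages, so wrong selections can also occur at lower proportions.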

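Future research could also add other strategies, such as the scale purification procedure and DIF detection with a pure anchor, and compare their DIF detection results. Because the anchor items in the pure-anchor method are items confirmed to be free of DIF, this method is expected to give the best Type I error control; its results could therefore serve as a benchmark for examining how effectively the DFTD strategy controls Type I error.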
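Any such comparison, like the tables in the appendices, rests on two rates computed over simulation replications: the Type I error rate (the proportion of DIF-free items wrongly flagged) and power (the proportion of DIF items correctly flagged). The Python sketch below shows one way to compute both per replication and summarize them as a mean and SD. The function names, the boolean-array layout, the optional `tested` mask (for example, to leave out the anchor item, which is not itself tested) and the use of the sample SD are illustrative assumptions, not the study's code.

```python
import numpy as np


def error_rates(flagged, is_dif, tested=None):
    """Per-replication Type I error and power.

    flagged, is_dif, tested: boolean arrays of shape (n_replications, n_items).
    flagged -- item was flagged as DIF; is_dif -- item truly contains DIF;
    tested -- item was actually tested (e.g. False for the anchor item).
    Type I error = share of tested DIF-free items that were flagged.
    Power = share of tested DIF items that were flagged (NaN if there are none).
    """
    flagged = np.asarray(flagged, dtype=bool)
    is_dif = np.asarray(is_dif, dtype=bool)
    tested = np.ones_like(flagged) if tested is None else np.asarray(tested, dtype=bool)
    clean = tested & ~is_dif
    dif = tested & is_dif
    # 0/0 (e.g. no DIF items in a 0% DIF condition) yields NaN without warnings.
    with np.errstate(invalid="ignore", divide="ignore"):
        type1 = (flagged & clean).sum(axis=1) / clean.sum(axis=1)
        power = (flagged & dif).sum(axis=1) / dif.sum(axis=1)
    return type1, power


def summarize(rates):
    """Mean and SD across replications, skipping NaN.
    ASSUMPTION: the SD uses the sample formula (ddof=1)."""
    r = np.asarray(rates, dtype=float)
    r = r[~np.isnan(r)]
    if r.size == 0:
        return float("nan"), float("nan")
    sd = float(r.std(ddof=1)) if r.size > 1 else 0.0
    return float(r.mean()), sd


if __name__ == "__main__":
    # Two replications, four items; only item 3 truly contains DIF.
    flagged = [[True, False, False, True],
               [False, False, True, True]]
    is_dif = [[False, False, False, True]] * 2
    t1, pw = error_rates(flagged, is_dif)
    print(summarize(t1), summarize(pw))
```

Returning per-replication rates and summarizing them separately keeps both the mean and the between-replication SD available, which is the form the appendix tables report.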

References

Chinese-Language References

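孫國瑋 (2010). 先定錨後檢核策略運用在概似比檢定法之差異試題功能檢核效果 [The effectiveness of applying the DIF-free-then-DIF strategy to DIF detection with the likelihood ratio test] (Unpublished master's thesis). Graduate Institute of Educational Measurement and Statistics, National Taichung University of Education, Taichung.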

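黃瓅瑩 (2008). HGLM分析DIF之比較與應用 [Comparison and application of HGLM for DIF analysis] (Unpublished master's thesis). Graduate Institute of Measurement and Statistics, National University of Tainan, Tainan.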

English-Language References

Adams, R. J., Wilson, M., & Wu, M. (1997). Multilevel item response models: An approach to errors in variables regression. Journal of Educational and Behavioral Statistics, 22, 47-76.

Bryk, A. S., & Raudenbush, S. W. (1992). Hierarchical linear models. Newbury Park, CA: Sage.

Camilli, G., & Smith, J. K. (1990). Comparison of the Mantel-Haenszel test with a randomized and a jackknife test for detecting biased items. Journal of Educational Statistics, 15, 53-67.

Chang, H.-H., Mazzeo, J., & Roussos, L. (1996). Detecting DIF for polytomously scored items: An adaptation of the SIBTEST procedure. Journal of Educational Measurement, 33, 333-353.

Chen, J.-H., Chen, C.-T., & Shih, C.-L. (2010). Applying DIF-free-then-DIF strategy on Hierarchical Generalized Linear Models to Assess Differential Item Functioning. The 75th Annual Meeting of the Psychometric Society, July 6-9, 2010, Georgia, USA.

the matching criterion on the identification of DIF using the Mantel-Haenszel procedure. Applied Measurement in Education, 6, 269-279.

Dorans, N. J., & Holland, P. W. (1993). DIF detection and description: Mantel-Haenszel and standardization. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 35-66). Hillsdale, NJ: Lawrence Erlbaum.

Fidalgo, A. M., Ferreres, D., & Muñiz, J. (2004). Utility of the Mantel-Haenszel procedure for detecting differential item functioning with small samples. Educational and Psychological Measurement, 64, 925-936.

Fidalgo, A. M., & Madeira, J. M. (2008). Generalized Mantel-Haenszel methods for DIF detection. Educational and Psychological Measurement, 68, 940-958.

Finch, H. (2005). The MIMIC model as a method for detecting DIF: Comparison with Mantel-Haenszel, SIBTEST, and the IRT likelihood ratio. Applied Psychological Measurement, 29, 278-295.

Finch, W. H., & French, B. F. (2007). Detection of crossing differential item functioning: A comparison of four methods. Educational and Psychological Measurement, 67, 565-582.

Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129-145). Hillsdale, NJ: Lawrence Erlbaum Associates.

Kamata, A. (1998). One-Parameter Hierarchical Generalized Linear Logistic Model: An Application of HGLM to IRT. College of Education, Michigan State University.

Kamata, A. (2001). Item Analysis by the Hierarchical Generalized Linear Model. Journal of Educational Measurement, 38, 79-93.

Kamata, A., Chaimongkol, S., Genc, E., & Bilir, K. (2005). Random-Effect Differential Item Functioning Across Group Units by the Hierarchical Generalized Linear Model. Paper presented at the annual meeting of the American Educational Research Association, April, Montreal, Canada.

Li, H., & Stout, W. (1996). A new procedure for detecting crossing DIF. Psychometrika, 61(4), 647-677.

Mantel, N., & Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute, 22, 719-748.

Mantel, N. (1963). Chi-square tests with one degree of freedom: Extensions of the Mantel-Haenszel procedure. Journal of the American Statistical Association, 58, 690-700.

Mazor, K. M., Clauser, B. E., & Hambleton, R. K. (1992). The effect of sample size on the functioning of the Mantel-Haenszel statistic. Educational and Psychological Measurement, 52, 443-452.

Mellenbergh, G. J. (1982). Contingency table models for assessing item bias. Journal of Educational Statistics, 7, 105-108.

Miller, T. R., & Spray, J. A. (1993). Logistic discrimination function analysis for DIF identification of polytomously scored items. Journal of Educational Measurement, 30(2), 107-122.

Muraki, E. (1990). Fitting a polytomous item response model to Likert-type data. Applied Psychological Measurement, 14, 59-71.

Narayanan, P., & Swaminathan, H. (1996). Identification of items that show nonuniform DIF. Applied Psychological Measurement, 20, 257-274.

Parshall, C. G., & Miller, T. R. (1995). Exact versus asymptotic Mantel-Haenszel DIF statistics: A comparison of performance under small-sample conditions. Journal of Educational Measurement, 32, 302-316.

Raudenbush, S. W. (1995). Hierarchical linear models: The case of school effects on literacy. In M. Binkley, K. Rust, & M. Winglee (Eds.), Methodological Issues in Comparative International Studies: The Case of Reading Literacy (Chapter 8, pp. 231-241). Washington, DC: National Center for Education Statistics.

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Newbury Park, CA: Sage.

Raudenbush, S. W., Bryk, A. S., Cheong, Y. F., & Congdon, R. (2004). HLM6: Hierarchical linear and nonlinear modeling [Computer program]. Chicago: Scientific Software International.

Rogers, H. J., & Swaminathan, H. (1993). A comparison of logistic regression and Mantel-Haenszel procedures for detecting differential item functioning. Applied Psychological Measurement, 17, 105-116.


Shealy, R., & Stout, W. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as bias/DIF. Psychometrika, 58(2), 159-194.

Shih, C.-L., & Wang, W.-C. (2009). Differential Item Functioning Detection Using the Multiple Indicators, Multiple Causes Method with a Pure Short Anchor. Applied Psychological Measurement, 33, 184-199.

Somes, G. W. (1986). The generalized Mantel-Haenszel statistic. The American Statistician, 40, 106-108.

Stiratelli, R., Laird, N., & Ware, J. H. (1984). Random effects models for serial observations with binary responses. Biometrics, 40, 961-971.

Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27, 361-370.

Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group differences in trace lines. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 147-169). Hillsdale, NJ: Erlbaum.

Uttaro, T., & Millsap, R. E. (1994). Factors influencing the Mantel-Haenszel procedure in the detection of differential item functioning. Applied Psychological Measurement, 18, 15-25.

Wang, W.-C. (2004). Effects of anchor item methods on differential item functioning detection within the family of Rasch models. Journal of Experimental Education, 72, 221-261.

Wang, W.-C. (2008). Assessment of differential item functioning. Journal of Applied Measurement, 9, 387-408.

Wang, W.-C., & Shih, C.-L. (2010). MIMIC methods for assessing differential item functioning in polytomous items. Applied Psychological Measurement, 34(3), 166-180.

Wang, W.-C., & Yeh, Y.-L. (2003). Effects of anchor item methods on differential item functioning detection with the likelihood ratio test. Applied Psychological Measurement, 27, 479-498.

Williams, N. J., & Beretvas, S. N. (2006). DIF identification using HGLM for

Wong, G. Y., & Mason, W. M. (1985). The hierarchical logistic regression model for multilevel analysis. Journal of the American Statistical Association, 80, 513-524.

Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (or ordinal) item scores. Ottawa, Canada: Directorate of Human Resources Research and Evaluation, Department of National Defense. Retrieved from http://www.edu.ubc.ca/faculty/zumbo/DIF/index.html

Zwick, R. (1990). When do item response function and Mantel-Haenszel definitions of differential item functioning coincide? Journal of Educational Statistics, 15, 185-197.

Appendices

Appendix 1 Equal Mean Ability in the Two Groups

                              Type I error                  Power
                              Standard       DFTD           Standard       DFTD
Pattern   Sample size  DIF%   Mean   SD      Mean   SD      Mean   SD      Mean   SD
constant  R250/F250    0      0.079  0.026   0.035  0.014   -      -       -      -
constant  R250/F250    20     0.087  0.034   0.052  0.021   0.870  0.041   0.748  0.050
constant  R250/F250    40     0.096  0.040   0.140  0.037   0.900  0.037   0.574  0.048
constant  R500/F250    0      0.078  0.031   0.028  0.015   -      -       -      -
constant  R500/F250    20     0.085  0.053   0.043  0.021   0.955  0.037   0.865  0.044
constant  R500/F250    40     0.095  0.029   0.117  0.043   0.961  0.019   0.763  0.038
constant  R500/F500    0      0.077  0.027   0.022  0.016   -      -       -      -
constant  R500/F500    20     0.089  0.027   0.043  0.021   0.990  0.012   0.950  0.024
constant  R500/F500    40     0.094  0.024   0.115  0.023   0.995  0.005   0.908  0.025
balanced  R250/F250    0      0.079  0.026   0.035  0.014   -      -       -      -
balanced  R250/F250    20     0.086  0.031   0.036  0.019   0.883  0.022   0.830  0.016
balanced  R250/F250    40     0.088  0.026   0.045  0.016   0.914  0.015   0.845  0.040
balanced  R500/F250    0      0.078  0.031   0.028  0.015   -      -       -      -
balanced  R500/F250    20     0.082  0.039   0.034  0.013   0.965  0.017   0.918  0.046
balanced  R500/F250    40     0.086  0.027   0.043  0.026   0.968  0.012   0.925  0.025
balanced  R500/F500    0      0.076  0.027   0.022  0.016   -      -       -      -
balanced  R500/F500    20     0.081  0.035   0.034  0.019   0.995  0.005   0.980  0.008
balanced  R500/F500    40     0.082  0.037   0.040  0.013   0.998  0.006   0.988  0.010

Appendix 2 Mean Ability of the Two Groups Differs by 0.5 Standard Deviation

                              Type I error                  Power
                              Standard       DFTD           Standard       DFTD
Pattern   Sample size  DIF%   Mean   SD      Mean   SD      Mean   SD      Mean   SD
constant  R250/F250    0      0.793  0.048   0.025  0.012   -      -       -      -
constant  R250/F250    20     0.808  0.030   0.036  0.022   1.000  0.000   0.753  0.056
constant  R250/F250    40     0.859  0.054   0.095  0.022   1.000  0.000   0.545  0.037
constant  R500/F250    0      0.877  0.028   0.029  0.021   -      -       -      -
constant  R500/F250    20     0.873  0.030   0.041  0.024   1.000  0.000   0.873  0.010
constant  R500/F250    40     0.904  0.032   0.128  0.028   1.000  0.000   0.699  0.055
constant  R500/F500    0      0.960  0.017   0.034  0.021   -      -       -      -
constant  R500/F500    20     0.967  0.018   0.042  0.021   1.000  0.000   0.975  0.019
constant  R500/F500    40     0.964  0.017   0.133  0.044   1.000  0.000   0.860  0.024
balanced  R250/F250    0      0.793  0.048   0.025  0.012   -      -       -      -
balanced  R250/F250    20     0.773  0.032   0.031  0.015   0.575  0.491   0.835  0.024
balanced  R250/F250    40     0.803  0.035   0.038  0.011   0.571  0.459   0.833  0.048
balanced  R500/F250    0      0.897  0.028   0.029  0.021   -      -       -      -
balanced  R500/F250    20     0.870  0.036   0.033  0.019   0.563  0.505   0.918  0.046
balanced  R500/F250    40     0.892  0.020   0.040  0.027   0.581  0.449   0.911  0.039
balanced  R500/F500    0      0.960  0.017   0.034  0.021   -      -       -      -
balanced  R500/F500    20     0.969  0.019   0.035  0.018   0.502  0.565   0.985  0.013
balanced  R500/F500    40     0.971  0.017   0.040  0.022   0.590  0.439   0.990  0.008

Appendix 3 Mean Ability of the Two Groups Differs by 1 Standard Deviation

                              Type I error                  Power
                              Standard       DFTD           Standard       DFTD
Pattern   Sample size  DIF%   Mean   SD      Mean   SD      Mean   SD      Mean   SD
constant  R250/F250    0      0.999  0.003   0.021  0.016   -      -       -      -
constant  R250/F250    20     1.000  0.000   0.035  0.018   1.000  0.000   0.705  0.064
constant  R250/F250    40     0.999  0.003   0.111  0.014   1.000  0.000   0.533  0.032
constant  R500/F250    0      1.000  0.000   0.026  0.020   -      -       -      -
constant  R500/F250    20     1.000  0.000   0.038  0.021   1.000  0.000   0.843  0.026
constant  R500/F250    40     1.000  0.000   0.117  0.024   1.000  0.000   0.626  0.054
constant  R500/F500    0      1.000  0.000   0.031  0.014   -      -       -      -
constant  R500/F500    20     1.000  0.000   0.041  0.022   1.000  0.000   0.953  0.005
constant  R500/F500    40     1.000  0.000   0.142  0.028   1.000  0.000   0.874  0.041
balanced  R250/F250    0      0.999  0.003   0.041  0.016   -      -       -      -
balanced  R250/F250    20     1.000  0.000   0.032  0.013   0.785  0.248   0.823  0.050
balanced  R250/F250    40     0.999  0.003   0.030  0.015   0.819  0.383   0.799  0.029
balanced  R500/F250    0      1.000  0.000   0.032  0.020   -      -       -      -
balanced  R500/F250    20     1.000  0.000   0.028  0.017   0.863  0.161   0.883  0.040
balanced  R500/F250    40     1.000  0.000   0.026  0.015   0.843  0.169   0.899  0.049
balanced  R500/F500    0      1.000  0.000   0.028  0.014   -      -       -      -
balanced  R500/F500    20     1.000  0.000   0.026  0.023   0.913  0.103   0.985  0.013
balanced  R500/F500    40     1.000  0.000   0.024  0.014   0.921  0.089   0.978  0.018
