Future Research and Recommendations

Chapter 5: Conclusions and Recommendations

Section 2: Future Research and Recommendations

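This study examined how well the DIF-free-then-DIF (DFTD) strategy performs for DIF detection under the PHGLM model. The results show that when the two groups differ in mean ability, DIF detection with the DFTD strategy keeps the Type I error rate under control. When the percentage of DIF items reaches 40%, however, the Type I error rate becomes inflated.

In the DFTD strategy, the DIF-free item is selected as follows. Each item in the test is used in turn as the reference item while the remaining items are tested, which yields DIF coefficients for those remaining items. The DIF coefficients obtained for each item are summed and averaged, and this mean is the item's anchor effect size. The item with the smallest anchor effect size is selected as the DIF-free item and used as the anchor for DIF detection. If an item that has DIF serves as the anchor, the DIF coefficients of the other DIF items shrink, whereas the DIF coefficients of the items without DIF grow. When the proportion of DIF items in a test is high, items with DIF can therefore end up with smaller anchor effect sizes than items without DIF, so a DIF item is easily chosen as the supposed DIF-free item, which distorts the detection results. Earlier research shows that DIF detection with a correctly identified DIF-free item does control the Type I error rate. Although this study used only one item as the anchor, Wang (2004) and Wang and Yeh (2003) found that, as long as the anchor in the constant-item method is truly DIF-free, a single anchor item is enough to control the Type I error rate well. Improving the accuracy with which DIF-free items are identified is therefore a worthwhile focus for future research.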
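The selection step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the study's implementation. The names `dif_matrix`, `anchor_effect_sizes`, and `select_dif_free_item` are hypothetical, and taking absolute values before averaging is an assumption (the text only says the coefficients are summed and averaged). The toy input also assumes that anchoring on item r shifts every estimate by that item's true DIF, so the estimate for item i is roughly d_i - d_r.

```python
import numpy as np


def anchor_effect_sizes(dif_matrix):
    """Mean absolute DIF coefficient of each item across all anchor runs.

    dif_matrix[r, i] is assumed to hold the DIF coefficient estimated for
    item i when item r served as the reference (anchor) item. Diagonal
    entries are ignored because an item is not tested against itself.
    """
    d = np.abs(np.asarray(dif_matrix, dtype=float))
    n = d.shape[0]
    if d.ndim != 2 or d.shape != (n, n) or n < 2:
        raise ValueError("dif_matrix must be square with at least 2 items")
    off_diagonal = ~np.eye(n, dtype=bool)
    # Zero out the diagonal, then average each column over the n - 1 runs
    # in which that item was actually tested.
    return np.where(off_diagonal, d, 0.0).sum(axis=0) / (n - 1)


def select_dif_free_item(dif_matrix):
    """Index of the item with the smallest anchor effect size."""
    return int(np.argmin(anchor_effect_sizes(dif_matrix)))


# Toy example (hypothetical values): 5 items, the last 2 carry DIF of 0.5,
# i.e. 40% DIF items, all favouring the same group.
true_dif = np.array([0.0, 0.0, 0.0, 0.5, 0.5])
# Assumed behaviour: estimate for item i with anchor r is d_i - d_r.
toy_matrix = true_dif[None, :] - true_dif[:, None]

sizes = anchor_effect_sizes(toy_matrix)
chosen = select_dif_free_item(toy_matrix)
print(sizes, chosen)
```

In this noise-free toy case the clean items still have the smallest effect size, but the gap to the DIF items is only 0.125. The gap narrows as the share of DIF items grows, which is consistent with the selection errors described above once sampling error is added.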

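Future studies could also bring in other strategies, such as the scale purification procedure or DIF detection with a pure anchor, and compare their DIF detection performance. Because the anchor items in the pure anchor method are items confirmed to be free of DIF, that method is expected to give the best control of the Type I error rate. Its results can therefore serve as a benchmark for judging how well the DFTD strategy controls Type I error.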
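For readers unfamiliar with scale purification, the general idea can be sketched as an iterative loop: items flagged as DIF are dropped from the anchor (matching) set and detection is repeated until the flagged set stops changing. The sketch below is an assumption-laden outline, not a procedure taken from this study. `detect_dif` is a hypothetical callable standing in for any DIF test, and the stopping rule and `max_iter` limit are illustrative choices.

```python
from typing import Callable, Set


def purify_anchor_set(
    n_items: int,
    detect_dif: Callable[[Set[int]], Set[int]],
    max_iter: int = 20,
) -> Set[int]:
    """Iteratively remove flagged items from the anchor set.

    detect_dif(anchors) is assumed to return the set of item indices
    flagged as DIF when the items in `anchors` define the matching scale.
    """
    all_items = set(range(n_items))
    anchors = set(all_items)  # start with every item as an anchor
    for _ in range(max_iter):
        flagged = detect_dif(anchors)
        new_anchors = all_items - flagged
        if not new_anchors:
            raise ValueError("every item was flagged; no anchor remains")
        if new_anchors == anchors:
            break  # anchor set is stable, purification has converged
        anchors = new_anchors
    return anchors


def toy_detect(anchors: Set[int]) -> Set[int]:
    """Toy detector (hypothetical): item 4 is only caught once item 3
    has been removed from the anchor set."""
    return {3} if 3 in anchors else {3, 4}


purified = purify_anchor_set(5, toy_detect)
print(sorted(purified))
```

The toy detector mimics the situation where a contaminated anchor set masks some DIF items in the first pass, which is the case purification is meant to address.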

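References

Chinese-language references

孫國瑋 (2010). Effectiveness of the DIF-free-then-DIF strategy applied to the likelihood ratio test for detecting differential item functioning. Unpublished master's thesis, Graduate Institute of Educational Measurement and Statistics, National Taichung University of Education, Taichung, Taiwan.

黃瓅瑩 (2008). Comparison and application of HGLM for DIF analysis. Unpublished master's thesis, Graduate Institute of Measurement and Statistics, National University of Tainan, Tainan, Taiwan.

English-language references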

Adams, R. J., Wilson, M., & Wu, M. (1997). Multilevel item response models: An approach to errors in variables regression. Journal of Educational and Behavioral Statistics, 22, 47-76.

Bryk, A. S., & Raudenbush, S. W. (1992). Hierarchical linear models. Newbury Park, CA: Sage.

Camilli, G., & Smith, J. K. (1990). Comparison of the Mantel-Haenszel test with a randomized and a jackknife test for detecting biased items. Journal of Educational Statistics, 15, 53-67.

Chang, H. H., Mazzeo, J., & Roussos, J. (1996). Detecting DIF for polytomously scored items: An adaptation of the SIBTEST procedure. Journal of Educational Measurement, 33, 333-353.

Chen, J.-H., Chen, C.-T., & Shih, C.-L. (2010). Applying DIF-free-then-DIF strategy on Hierarchical Generalized Linear Models to Assess Differential Item Functioning. The 75th Annual Meeting of the Psychometric Society, July 6-9, 2010, Georgia, USA.

the matching criterion on the identification of DIF using the Mantel-Haenszel procedure. Applied Measurement in Education, 6, 269-279.

Fidalgo, A. M., & Madeira, J. M. (2008). Generalized Mantel-Haenszel methods for DIF detection. Educational and Psychological Measurement, 68, 940-958.

Fidalgo, A. M., Ferreres, D., & Muñiz, J. (2004). Utility of the Mantel-Haenszel procedure for detecting differential item functioning with small samples. Educational and Psychological Measurement, 64, 925-936.

Finch, H. (2005). The MIMIC model as a method for detecting DIF: Comparison with Mantel-Haenszel, SIBTEST, and the IRT likelihood ratio. Applied Psychological Measurement, 29, 278-295.

Finch, W. H., & French, B. F. (2007). Detection of crossing differential item functioning: A comparison of four methods. Educational and Psychological Measurement, 67, 565-582.

Holland, P. W., & Wainer, H. (1993). DIF detection and description: Mantel-Haenszel and Standardization. In N. J. Dorans & P. W. Holland (Eds.), Differential item functioning (pp. 35-66). Hillsdale, NJ: Lawrence Erlbaum.

Holland, P. W., & Thayer, D. T. (1988). Differential item performance and Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test Validity (pp. 129-145). Hillsdale, NJ: Lawrence Erlbaum Associates.

Kamata, A. (1998). One-Parameter Hierarchical Generalized Linear Logistic Model: An Application of HGLM to IRT. College of Education Michigan State University.

Kamata, A. (2001). Item Analysis by the Hierarchical Generalized Linear Model. Journal of Educational Measurement, 38, 79-93.

Kamata, A., Chaimongkol, S., Genc, E., & Bilir, K. (2005). Random-Effect Differential Item Functioning Across Group Units by the Hierarchical Generalized Linear Model. Paper presented at the annual meeting of the American Educational Research Association, April, Montreal, Canada.

Li, H., & Stout, W. (1996). A new procedure for detecting crossing DIF. Psychometrika, 61(4), 647-677.

Mantel, N., & Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute, 22, 719-748.

Mantel, N. (1963). Chi-square tests with one degree of freedom: Extensions of the Mantel-Haenszel procedure. Journal of the American Statistical Association, 58, 690-700.

Mazor, K. M., Clauser, B. E., & Hambleton, R. K. (1992). The effect of sample size on the functioning of the Mantel-Haenszel statistic. Educational and Psychological Measurement, 52, 443-452.

Mellenberg, G. J. (1982). Contingency table models for assessing item bias. Journal of Educational Statistics, 7, 105-108.

Miller, T. R., & Spray, J. A. (1993). Logistic discrimination function analysis for DIF identification of polytomous scored items. Journal of Educational Measurement, 30(2), 107-122.

Muraki, E. (1990). Fitting a polytomous item response model to Likert-type data. Applied Psychological Measurement, 14, 59-71.

Narayanan, P., & Swaminathan, H. (1996). Identification of items that show nonuniform DIF. Applied Psychological Measurement, 20, 257-274.

Parshall, C. G., & Miller, T. R. (1995). Exact versus asymptotic Mantel-Haenszel DIF statistics: A comparison of performance under small-sample conditions. Journal of Educational Measurement, 32, 302-316.

Raudenbush, S. W. (1995). Hierarchical linear models: The case of school effects on literacy. In M. Binkley, K. Rust, & M. Winglee (Eds.), Methodological Issues in Comparative International Studies: The Case of Reading Literacy, Chapter 8 (pp. 231-241). Washington, DC: National Center for Educational Statistics.

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Newbury Park, CA: Sage.

Raudenbush, S. W., Bryk, A. S., Cheong, Y. F., & Congdon, R. (2004). HLM6: Hierarchical linear and nonlinear modeling [Computer Program]. Chicago: Scientific Software International.

Rogers, H. J., & Swaminathan, H. (1993). A comparison of logistic regression and Mantel-Haenszel procedures for detecting differential item functioning. Applied Psychological Measurement, 17, 105-116.


Shealy, R., & Stout, W. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as bias/DIF. Psychometrika, 58(2), 159-194.

Shih, C.-L., & Wang, W.-C. (2009). Differential Item Functioning Detection Using the Multiple Indicators, Multiple Causes Method with a Pure Short Anchor. Applied Psychological Measurement, 33, 184-199.

Somes, G. W. (1986). The generalized Mantel-Haenszel statistic. The American Statistician, 40, 106-108.

Stiratelli, R., Laird, N., & Ware, J. H. (1984). Random effects models for serial observations with binary responses. Biometrics, 40, 961-971.

Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27, 361-370.

Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group differences in trace lines. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 147-169). Hillsdale, NJ: Erlbaum.

Uttaro, T., & Millsap, R. E. (1994). Factors influencing the Mantel-Haenszel procedure in the detection of differential item functioning. Applied Psychological Measurement, 18, 15-25.

Wang, W.-C., & Yeh, Y.-L. (2003). Effects of anchor item methods on differential item functioning detection with the likelihood ratio test. Applied Psychological Measurement, 27, 479-498.

Wang, W.-C. (2004). Effects of anchor item methods on differential item functioning detection within the family of Rasch models. Journal of Experimental Education, 72, 221-261.

Wang, W.-C. (2008). Assessment of differential item functioning. Journal of Applied Measurement, 9, 387-408.

Wang, W.-C., & Shih, C.-L. (2010). MIMIC methods for assessing differential item functioning in polytomous items. Applied Psychological Measurement, 34(3), 166-180.

Wong, G. Y., & Mason, W. M. (1985). The hierarchical logistic regression model for multilevel analysis. Journal of American Statistical Association, 80, 513-524.

Williams, N. J., & Beretvas, S. N. (2006). DIF identification using HGLM for

Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (or ordinal) item scores. Ottawa, Canada: Directorate of Human Resources Research and Evaluation, Department of National Defense. Retrieved from http://www.edu.ubc.ca/faculty/zumbo/DIF/index.html.

Zwick, R. (1990). When do item response function and Mantel-Haenszel definitions of differential item functioning coincide? Journal of Educational Statistics, 15, 185-197.

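Appendix

Appendix 1: Equal mean ability in the two groups

| Pattern | Sample size | DIF% | Type I error, Standard: Mean | Type I error, Standard: SD | Type I error, DFTD: Mean | Type I error, DFTD: SD | Power, Standard: Mean | Power, Standard: SD | Power, DFTD: Mean | Power, DFTD: SD |
|---|---|---|---|---|---|---|---|---|---|---|
| constant | R250/F250 | 0 | 0.079 | 0.026 | 0.035 | 0.014 | - | - | - | - |
| constant | R250/F250 | 20 | 0.087 | 0.034 | 0.052 | 0.021 | 0.870 | 0.041 | 0.748 | 0.050 |
| constant | R250/F250 | 40 | 0.096 | 0.040 | 0.140 | 0.037 | 0.900 | 0.037 | 0.574 | 0.048 |
| constant | R500/F250 | 0 | 0.078 | 0.031 | 0.028 | 0.015 | - | - | - | - |
| constant | R500/F250 | 20 | 0.085 | 0.053 | 0.043 | 0.021 | 0.955 | 0.037 | 0.865 | 0.044 |
| constant | R500/F250 | 40 | 0.095 | 0.029 | 0.117 | 0.043 | 0.961 | 0.019 | 0.763 | 0.038 |
| constant | R500/F500 | 0 | 0.077 | 0.027 | 0.022 | 0.016 | - | - | - | - |
| constant | R500/F500 | 20 | 0.089 | 0.027 | 0.043 | 0.021 | 0.990 | 0.012 | 0.950 | 0.024 |
| constant | R500/F500 | 40 | 0.094 | 0.024 | 0.115 | 0.023 | 0.995 | 0.005 | 0.908 | 0.025 |
| balanced | R250/F250 | 0 | 0.079 | 0.026 | 0.035 | 0.014 | - | - | - | - |
| balanced | R250/F250 | 20 | 0.086 | 0.031 | 0.036 | 0.019 | 0.883 | 0.022 | 0.830 | 0.016 |
| balanced | R250/F250 | 40 | 0.088 | 0.026 | 0.045 | 0.016 | 0.914 | 0.015 | 0.845 | 0.040 |
| balanced | R500/F250 | 0 | 0.078 | 0.031 | 0.028 | 0.015 | - | - | - | - |
| balanced | R500/F250 | 20 | 0.082 | 0.039 | 0.034 | 0.013 | 0.965 | 0.017 | 0.918 | 0.046 |
| balanced | R500/F250 | 40 | 0.086 | 0.027 | 0.043 | 0.026 | 0.968 | 0.012 | 0.925 | 0.025 |
| balanced | R500/F500 | 0 | 0.076 | 0.027 | 0.022 | 0.016 | - | - | - | - |
| balanced | R500/F500 | 20 | 0.081 | 0.035 | 0.034 | 0.019 | 0.995 | 0.005 | 0.980 | 0.008 |
| balanced | R500/F500 | 40 | 0.082 | 0.037 | 0.040 | 0.013 | 0.998 | 0.006 | 0.988 | 0.010 |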

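Appendix 2: Mean ability of the two groups differs by 0.5 standard deviations

| Pattern | Sample size | DIF% | Type I error, Standard: Mean | Type I error, Standard: SD | Type I error, DFTD: Mean | Type I error, DFTD: SD | Power, Standard: Mean | Power, Standard: SD | Power, DFTD: Mean | Power, DFTD: SD |
|---|---|---|---|---|---|---|---|---|---|---|
| constant | R250/F250 | 0 | 0.793 | 0.048 | 0.025 | 0.012 | - | - | - | - |
| constant | R250/F250 | 20 | 0.808 | 0.030 | 0.036 | 0.022 | 1.000 | 0.000 | 0.753 | 0.056 |
| constant | R250/F250 | 40 | 0.859 | 0.054 | 0.095 | 0.022 | 1.000 | 0.000 | 0.545 | 0.037 |
| constant | R500/F250 | 0 | 0.877 | 0.028 | 0.029 | 0.021 | - | - | - | - |
| constant | R500/F250 | 20 | 0.873 | 0.030 | 0.041 | 0.024 | 1.000 | 0.000 | 0.873 | 0.010 |
| constant | R500/F250 | 40 | 0.904 | 0.032 | 0.128 | 0.028 | 1.000 | 0.000 | 0.699 | 0.055 |
| constant | R500/F500 | 0 | 0.960 | 0.017 | 0.034 | 0.021 | - | - | - | - |
| constant | R500/F500 | 20 | 0.967 | 0.018 | 0.042 | 0.021 | 1.000 | 0.000 | 0.975 | 0.019 |
| constant | R500/F500 | 40 | 0.964 | 0.017 | 0.133 | 0.044 | 1.000 | 0.000 | 0.860 | 0.024 |
| balanced | R250/F250 | 0 | 0.793 | 0.048 | 0.025 | 0.012 | - | - | - | - |
| balanced | R250/F250 | 20 | 0.773 | 0.032 | 0.031 | 0.015 | 0.575 | 0.491 | 0.835 | 0.024 |
| balanced | R250/F250 | 40 | 0.803 | 0.035 | 0.038 | 0.011 | 0.571 | 0.459 | 0.833 | 0.048 |
| balanced | R500/F250 | 0 | 0.897 | 0.028 | 0.029 | 0.021 | - | - | - | - |
| balanced | R500/F250 | 20 | 0.870 | 0.036 | 0.033 | 0.019 | 0.563 | 0.505 | 0.918 | 0.046 |
| balanced | R500/F250 | 40 | 0.892 | 0.020 | 0.040 | 0.027 | 0.581 | 0.449 | 0.911 | 0.039 |
| balanced | R500/F500 | 0 | 0.960 | 0.017 | 0.034 | 0.021 | - | - | - | - |
| balanced | R500/F500 | 20 | 0.969 | 0.019 | 0.035 | 0.018 | 0.502 | 0.565 | 0.985 | 0.013 |
| balanced | R500/F500 | 40 | 0.971 | 0.017 | 0.040 | 0.022 | 0.590 | 0.439 | 0.990 | 0.008 |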

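Appendix 3: Mean ability of the two groups differs by 1 standard deviation

| Pattern | Sample size | DIF% | Type I error, Standard: Mean | Type I error, Standard: SD | Type I error, DFTD: Mean | Type I error, DFTD: SD | Power, Standard: Mean | Power, Standard: SD | Power, DFTD: Mean | Power, DFTD: SD |
|---|---|---|---|---|---|---|---|---|---|---|
| constant | R250/F250 | 0 | 0.999 | 0.003 | 0.021 | 0.016 | - | - | - | - |
| constant | R250/F250 | 20 | 1.000 | 0.000 | 0.035 | 0.018 | 1.000 | 0.000 | 0.705 | 0.064 |
| constant | R250/F250 | 40 | 0.999 | 0.003 | 0.111 | 0.014 | 1.000 | 0.000 | 0.533 | 0.032 |
| constant | R500/F250 | 0 | 1.000 | 0.000 | 0.026 | 0.020 | - | - | - | - |
| constant | R500/F250 | 20 | 1.000 | 0.000 | 0.038 | 0.021 | 1.000 | 0.000 | 0.843 | 0.026 |
| constant | R500/F250 | 40 | 1.000 | 0.000 | 0.117 | 0.024 | 1.000 | 0.000 | 0.626 | 0.054 |
| constant | R500/F500 | 0 | 1.000 | 0.000 | 0.031 | 0.014 | - | - | - | - |
| constant | R500/F500 | 20 | 1.000 | 0.000 | 0.041 | 0.022 | 1.000 | 0.000 | 0.953 | 0.005 |
| constant | R500/F500 | 40 | 1.000 | 0.000 | 0.142 | 0.028 | 1.000 | 0.000 | 0.874 | 0.041 |
| balanced | R250/F250 | 0 | 0.999 | 0.003 | 0.041 | 0.016 | - | - | - | - |
| balanced | R250/F250 | 20 | 1.000 | 0.000 | 0.032 | 0.013 | 0.785 | 0.248 | 0.823 | 0.050 |
| balanced | R250/F250 | 40 | 0.999 | 0.003 | 0.030 | 0.015 | 0.819 | 0.383 | 0.799 | 0.029 |
| balanced | R500/F250 | 0 | 1.000 | 0.000 | 0.032 | 0.020 | - | - | - | - |
| balanced | R500/F250 | 20 | 1.000 | 0.000 | 0.028 | 0.017 | 0.863 | 0.161 | 0.883 | 0.040 |
| balanced | R500/F250 | 40 | 1.000 | 0.000 | 0.026 | 0.015 | 0.843 | 0.169 | 0.899 | 0.049 |
| balanced | R500/F500 | 0 | 1.000 | 0.000 | 0.028 | 0.014 | - | - | - | - |
| balanced | R500/F500 | 20 | 1.000 | 0.000 | 0.026 | 0.023 | 0.913 | 0.103 | 0.985 | 0.013 |
| balanced | R500/F500 | 40 | 1.000 | 0.000 | 0.024 | 0.014 | 0.921 | 0.089 | 0.978 | 0.018 |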

Source document: Effectiveness of the DIF-Free-Then-DIF Strategy Applied to PHGLM for Detecting Differential Item Functioning (pp. 43-53)
