
Chapter 5 Conclusions and Suggestions

Section 2 Suggestions for Future Research

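Before DIF can be assessed, a common metric must be established. In practice the common metric is the matching variable: once examinees are placed on it, their item responses can be compared across groups. If a test contains DIF items and all of its items are nonetheless used as the matching variable, the DIF items contaminate the common metric and make it unfair to the groups being compared. The scale as a whole then loses reliability and validity, which biases ability estimates and yields erroneous DIF detection results (Lord, 1980). Scale purification procedures were developed for this reason and have been recommended for use with DIF detection methods (Candell & Drasgow, 1988; French & Maller, 2007; Holland & Thayer, 1988; Lord, 1980; Park & Lautenschlager, 1990). In a purification procedure, items flagged as showing DIF are removed from the matching variable and the remaining items are re-tested against the purified score, typically iterating until the set of flagged items stabilizes. When DIF items make up more than 20% of a test, the Type I error rates of most DIF detection methods become inflated and go out of control; scale purification is intended to prevent this inflation and to reduce the influence of DIF items on parameter estimation.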
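The purification loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure used in this study: it swaps in the Mantel-Haenszel chi-square (Holland & Thayer, 1988) on simulated dichotomous Rasch data instead of the MACS model for polytomous items, uses α = .05 (critical value 3.841, 1 df), and the function names `mh_chi_square`, `purify`, and `simulate`, as well as all simulation settings, are hypothetical.

```python
import math
import random

CHI2_CRIT = 3.841  # assumed: chi-square critical value, 1 df, alpha = .05


def mh_chi_square(responses, groups, item, anchor):
    """Mantel-Haenszel chi-square for one dichotomous item.

    Examinees are stratified by their summed score on `anchor` plus the studied
    item. groups[i] is 0 for the reference group and 1 for the focal group.
    """
    strata = {}
    for resp, g in zip(responses, groups):
        score = sum(resp[j] for j in anchor) + resp[item]
        # cell layout: [ref right, ref wrong, focal right, focal wrong]
        cell = strata.setdefault(score, [0, 0, 0, 0])
        cell[2 * g + (1 - resp[item])] += 1
    sum_a = sum_e = sum_v = 0.0
    for a, b, c, d in strata.values():
        n_ref, n_foc = a + b, c + d
        right, wrong = a + c, b + d
        n = n_ref + n_foc
        if n < 2:
            continue  # a single examinee carries no information
        sum_a += a
        sum_e += n_ref * right / n
        sum_v += n_ref * n_foc * right * wrong / (n * n * (n - 1))
    if sum_v == 0:
        return 0.0
    # continuity-corrected MH statistic
    return max(0.0, abs(sum_a - sum_e) - 0.5) ** 2 / sum_v


def purify(responses, groups, n_items, max_iter=10):
    """Iterative scale purification: items flagged in one round are dropped
    from the matching score in the next, until the flagged set stops changing."""
    flagged = set()
    for _ in range(max_iter):
        new_flagged = set()
        for item in range(n_items):
            anchor = [j for j in range(n_items) if j != item and j not in flagged]
            if mh_chi_square(responses, groups, item, anchor) > CHI2_CRIT:
                new_flagged.add(item)
        if new_flagged == flagged:
            break
        flagged = new_flagged
    return flagged


def simulate(n_per_group, difficulties, dif_items, shift, seed=1):
    """Hypothetical Rasch data; dif_items are `shift` logits harder for the focal group."""
    rng = random.Random(seed)
    responses, groups = [], []
    for g in (0, 1):
        for _ in range(n_per_group):
            theta = rng.gauss(0.0, 1.0)
            row = []
            for j, b in enumerate(difficulties):
                b_eff = b + shift if (g == 1 and j in dif_items) else b
                p = 1.0 / (1.0 + math.exp(-(theta - b_eff)))
                row.append(1 if rng.random() < p else 0)
            responses.append(row)
            groups.append(g)
    return responses, groups


if __name__ == "__main__":
    b = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, -0.25, 0.25, 0.75]
    data, grp = simulate(1000, b, dif_items={0, 1}, shift=1.0)
    print(sorted(purify(data, grp, len(b))))
```

The first pass matches on all items (no purification); later passes drop every flagged item from the matching score while keeping the studied item in it, and the loop stops once the flagged set repeats or `max_iter` is reached.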

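In real testing situations a test seldom contains only one DIF item. Previous studies have shown that scale purification keeps the Type I error rates of DIF detection methods reasonably well controlled as long as DIF items make up no more than 30% of the test (20% for some methods). When a test contains too many DIF items, however, purification brings only limited improvement.

To address this problem, Wang (2008) proposed the "DIF-Free-then-DIF" strategy: a small number of items that are least likely to exhibit DIF are first selected as the matching variable, and the remaining items are then tested for DIF against it. Simulation studies showed that even when a test contained a high proportion of DIF items, this strategy kept the Type I error rate well controlled while still reliably detecting the items that did exhibit DIF. Because this study was limited by time, future research should apply both scale purification and the "DIF-Free-then-DIF" strategy to DIF detection under the mean and covariance structure (MACS) model and evaluate their effectiveness.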
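The two-step logic of the "DIF-Free-then-DIF" strategy can be sketched as follows. This is a hedged illustration, not Wang's (2008) exact algorithm: it assumes the "DIF-free" anchor is chosen as the `n_anchor` items with the smallest DIF statistics when each item is first tested against all other items, and it leaves the DIF statistic as a pluggable function so that, for example, a MACS-based test could be substituted. The names `dif_free_then_dif` and `dif_stat` and the default critical value 3.841 are assumptions.

```python
from typing import Callable, List, Sequence, Set

# Hypothetical interface: dif_stat(item, anchor) returns the DIF test statistic
# for `item` when examinees are matched on the items in `anchor`;
# larger values mean stronger evidence of DIF.
DifStat = Callable[[int, Sequence[int]], float]


def dif_free_then_dif(n_items: int, dif_stat: DifStat, n_anchor: int,
                      critical: float = 3.841) -> Set[int]:
    """Two-step sketch.

    Step 1: test each item against all other items and keep the n_anchor items
    with the smallest statistics as a short, presumably DIF-free anchor.
    Step 2: test every remaining item against that anchor only.
    """
    everyone = range(n_items)
    screening = {i: dif_stat(i, [j for j in everyone if j != i]) for i in everyone}
    anchor: List[int] = sorted(everyone, key=lambda i: screening[i])[:n_anchor]
    return {i for i in everyone
            if i not in anchor and dif_stat(i, anchor) > critical}


if __name__ == "__main__":
    # Hypothetical statistics that ignore the anchor, just to exercise the logic.
    toy = {0: 9.0, 1: 7.5, 2: 0.1, 3: 0.3, 4: 0.2, 5: 2.0}
    flagged = dif_free_then_dif(6, lambda i, anchor: toy[i], n_anchor=3)
    print(sorted(flagged))  # [0, 1]
```

The toy statistics only exercise the control flow; with real data, `dif_stat` would re-estimate the statistic using only the anchor items as the matching variable, which is what protects the second step from contamination by DIF items.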

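References

柯明錦 (2010). Evaluation of mean and covariance structure analysis model in detecting differential item functioning of polytomous items: In comparison with GMH and Poly-SIBTEST. Unpublished master's thesis, Master's Program in Mathematics Teaching, Department of Mathematics, National Taiwan Normal University, Taipei.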

Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43, 561-573.

Beauducel, A., & Herzberg, P. Y. (2006). On the performance of maximum likelihood versus means and variance adjusted weighted least squares estimation in CFA. Structural Equation Modeling, 13, 186-203.

Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397-479). Reading, MA: Addison-Wesley.

Bock, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37, 29-51.

Bollen, K. A. (1989). Structural equations with latent variables. New York: John Wiley & Sons.

Byrne, B. M. (2001). Structural equation modeling with AMOS: Basic concepts, applications, and programming. Mahwah, NJ: Lawrence Erlbaum Associates.

Byrne, B. M., Shavelson, R. J., & Muthén, B. (1989). Testing for the equivalence of factor covariance and mean structures: The issue of partial measurement invariance. Psychological Bulletin, 105, 456-466.

Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items. Thousand Oaks, CA: Sage.

Chan, D. (2000). Detection of Differential Item Functioning on the Kirton Adaptation-Innovation Inventory using multiple-group mean and covariance structure analyses. Multivariate Behavioral Research, 35, 169-199.

Chen, Y.-F., Shih, C.-L., & Wang, W.-C. (2009). A scale purification procedure for the mean and covariance structure analysis model for assessment of nonuniform differential item functioning. 16th International Meeting of the Psychometric Society.

Cole, N. S., & Zieky, M. J. (2001). The new faces of fairness. Journal of Educational Measurement, 38, 369-382.

Dorans, N. J., & Kulick, E. (1986). Demonstrating the utility of the standardization approach to assessing unexpected differential item performance on the Scholastic Aptitude Test. Journal of Educational Measurement, 23, 355-368.

Drasgow, F. (1984). Scrutinizing psychological tests: Measurement equivalence and equivalent relations with external variables are the central issues. Psychological Bulletin, 95, 134-135.

Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum.

Ferrando, P. J. (1996). Calibration of invariant item parameters in a continuous item response model using the extended LISREL measurement submodel. Multivariate Behavioral Research, 31, 419-439.

Finch, H. (2005). The MIMIC model as a method for detecting DIF: Comparison with Mantel-Haenszel, SIBTEST, and the IRT likelihood ratio. Applied Psychological Measurement, 29, 278-295.

French, B. F., & Maller, S. J. (2007). Iterative purification and effect size use with logistic regression for differential item functioning detection. Educational and Psychological Measurement, 67, 373-393.

González-Romá, V., Hernández, A., & Gómez-Benito, J. (2006). Power and Type I error of the mean and covariance structure analysis model for detecting differential item functioning in graded response items. Multivariate Behavioral Research, 41, 29-53.

Hernández, A., & González-Romá, V. (2003). Evaluating the multiple-group mean and covariance analysis model for the detection of differential item functioning in polytomous ordered items. Psicothema, 15, 322-327.

Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129-145). Hillsdale, NJ: Lawrence Erlbaum.

Holland, P. W., & Wainer, H. (Eds.). (1993). Differential item functioning. Hillsdale, NJ: Lawrence Erlbaum.

Jöreskog, K. G., & Goldberger, A. S. (1975). Estimation of a model with multiple indicators and multiple causes of a single latent variable. Journal of the American Statistical Association, 70, 631-639.

Kim, S.-H., & Cohen, A. S. (1998). Detection of differential item functioning under the graded response model with the likelihood ratio test. Applied Psychological Measurement, 22, 345-355.

Lee, J. (2009). Type I error and power of the mean and covariance structure confirmatory factor analysis for differential item functioning detection: Methodological issues and resolutions. Unpublished doctoral dissertation, University of Kansas.

Little, T. D. (1997). Mean and covariance structures (MACS) analysis of cross-cultural data: practical and theoretical issues. Multivariate Behavioral Research, 32, 56-76.

Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum.

Mapuranga, R., Dorans, N. J., & Middleton, K. (2008, March). A review of recent developments in differential item functioning. Paper presented at the annual meeting of the National Council on Measurement in Education (NCME), New York.

Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149-174.

Meade, A. W., & Lautenschlager, G. J. (2004). A Monte-Carlo study of confirmatory factor analytic tests of measurement equivalence/invariance. Structural Equation Modeling, 11, 60-72.

Mellenbergh, G. J. (1982). Contingency table models for assessing item bias. Journal of Educational Statistics, 7, 105-108.

Mellenbergh, G. J. (1994). A unidimensional latent trait model for continuous item responses. Multivariate Behavioral Research, 29, 223-236.

Muraki, E., & Bock, R. D. (1996). PARSCALE: IRT based test scoring and item analysis for graded open-ended exercise and performance tasks. Chicago: Scientific Software International.

Muthén, B. (2006). Robust chi-square difference testing with mean and variance adjusted test statistics. Mplus Web Notes.

Muthén, B., & Lehman, J. (1985). Multiple-group IRT modeling: Applications to item bias analysis. Journal of Educational Statistics, 10, 133-142.

Muthén, B. O., du Toit, S. H. C., & Spisic, D. (1997). Robust inference using weighted least squares and quadratic estimating equations in latent variable modeling with categorical and continuous outcomes. Unpublished manuscript, University of California, Los Angeles.

Muthén, L. K., & Muthén, B. O. (2007). Mplus (Version 4.21) [Computer software]. Los Angeles, CA: Muthén & Muthén.

Navas-Ara, M. J., & Gómez-Benito, J. (2002). Effects of ability scale purification on identification of DIF. European Journal of Psychological Assessment, 18, 9-15.

Oort, F. J. (1998). Simulation study of item bias detection with restricted factor analysis. Structural Equation Modeling, 5, 107-124.

Park, D. G., & Lautenschlager, G. J. (1990). Improving IRT item bias detection with iterative linking and ability scale purification. Applied Psychological Measurement, 14, 163-173.

Penfield, R. D., & Lam, T. C. M. (2000). Assessing differential item functioning in performance assessment: Review and recommendations. Educational Measurement: Issues and Practice, 19, 5-15.

Raju, N. S. (1988). The area between two item characteristic curves. Psychometrika, 53, 495-502.

Raju, N. S., Laffitte, L. J., & Byrne, B. M. (2002). Measurement equivalence: A comparison of methods based on confirmatory factor analysis and item response theory. Journal of Applied Psychology, 87, 517-529.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Institute of Educational Research. (Expanded edition, 1980. Chicago: The University of Chicago Press)

Reise, S. P., Widaman, K. F., & Pugh, R. H. (1993). Confirmatory factor analysis and item response theory: Two approaches for exploring measurement invariance. Psychological Bulletin, 114, 552-566.

Roussos, L. A., & Stout, W. F. (1996). Simulation studies of the effects of small sample size and studied item parameters on SIBTEST and Mantel-Haenszel type I error performance. Journal of Educational Measurement, 33, 215-230.

Samejima, F. (1969). Estimation of a latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement, 17, 1-100.

Satorra, A., & Bentler, P. M. (2001). A scaled difference chi-square test statistic for moment structure analysis. Psychometrika, 66, 507-514.

Shih, C.-L., & Wang, W.-C. (2009). Differential item functioning detection using the MIMIC method with a pure short anchor. Applied Psychological Measurement, 33, 184-199.

Sörbom, D. (1974). A general method for studying differences in factor means and factor structures between groups. British Journal of Mathematical and Statistical Psychology, 27, 229-239.

Stark, S., Chernyshenko, O. S., & Drasgow, F. (2006). Detecting differential item functioning with confirmatory factor analysis and item response theory: Toward a unified strategy. Journal of Applied Psychology, 91, 1292-1306.

Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27, 361-370.

Kang, T., Cohen, A. S., & Sung, H.-J. (2009). Model selection indices for polytomous items. Applied Psychological Measurement, 33, 499-518.

Takane, Y., & de Leeuw, J. (1987). On the relationship between item response theory and factor analysis of discretized variables. Psychometrika, 52, 393-408.

Teresi, J. A. (2006). Overview of quantitative measurement methods: Equivalence, invariance, and differential item functioning in health applications. Medical Care, 44, 39-49.

Teresi, J., & Fleishman, J. (2007). Differential item functioning and health assessment. Quality of Life Research, 16, 33-42.

Thissen, D. (1991). MULTILOG user's guide (Version 6) [Computer program]. Mooresville, IN: Scientific Software.

Uttaro, T., & Millsap, R. E. (1994). Factors influencing the Mantel-Haenszel procedure in the detection of differential item functioning. Applied Psychological Measurement, 18, 15-25.

Vandenberg, R. J. (2002). Toward a further understanding of and improvement in measurement invariance methods and procedures. Organizational Research Methods, 5, 139-158.

Vandenberg, R. J., & Lance, C. E. (2000). A review and synthesis of the measurement invariance literature: Suggestions, practices, and recommendations for organizational research. Organizational Research Methods, 3, 4-69.

Wang, W.-C. (2008). Assessment of differential item functioning. Journal of Applied Measurement, 9, 387-408.

Wang, W.-C., & Su, Y.-H. (2004). Effects of average signed area between two item characteristic curves and test purification procedures on the DIF detection via the Mantel-Haenszel method. Applied Measurement in Education, 17, 113-144.

Wang, W.-C., & Yeh, Y.-L. (2003). Effects of anchor item methods on differential item functioning detection with the likelihood ratio test. Applied Psychological Measurement, 27, 479-498.

Wanichtanom, R. (2001). Methods of detecting differential item functioning: A comparison of item response theory and confirmatory factor analysis. Unpublished doctoral dissertation.

Woods, C. M. (2009). Evaluation of MIMIC-model methods for DIF testing with comparison to two-group analysis. Multivariate Behavioral Research, 44, 1-27.

Reisinger, Y., & Turner, L. (1999). Structural equation modeling with Lisrel: Application in tourism. Tourism Management, 20, 71-88.

The DIF Detection Effectiveness of the Mean and Covariance Structure Method on Polytomous Items (pp. 45-54)
