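Chapter 5 Conclusions and Recommendations
Section 2 Research Recommendations
Based on the research process and conclusions, this section offers the following recommendations: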
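1. The conclusions of this study show that, when items violate local independence, HLM6.03 estimates parameters more precisely than BILOG-MG. HLM can therefore be used in place of BILOG-MG for parameter estimation.
2. Under some combinations of simulation factors, the estimation-performance indices of the two models did not follow the same trend. This may be related to the 50 replications used in this study. Future researchers are therefore advised to increase the number of replications so that the results become more stable.
3. This study set only three simulation factors. Future studies could add further factors, such as the number of testlets, to make the results more complete.
4. In some simulation conditions, the results showed trends in estimation performance that could not be explained. This may be because the simulated data were generated in SAS 50 datasets at a time, which may have introduced anomalies into the simulated data. Future simulation studies are advised to generate one dataset at a time.
5. In this study, REML estimation was selected in HLM6.03, whereas ML estimation was selected in BILOG-MG, so the observed estimation performance reflects the model and the estimation method jointly. Future researchers are advised to use the same estimation method in both programs, so that differences in estimation performance can be attributed more clearly to the model.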
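The recommendations on replication count and on generating one dataset at a time can be illustrated with a minimal Python sketch. It is not the SAS procedure used in this study: the function names, sample sizes, item difficulties, and the testlet standard deviation are all hypothetical choices made for illustration. Each dataset is drawn from its own seeded generator, and the Monte Carlo standard error of a summary statistic is computed so that the effect of raising the number of replications can be inspected.

```python
import math
import random
import statistics


def simulate_dataset(seed, n_persons=200, n_items=10, n_testlet_items=5,
                     testlet_sd=0.5):
    """Generate one 0/1 response matrix from a Rasch model in which the
    first n_testlet_items items share a person-specific testlet effect.

    All default settings are illustrative assumptions, not the study's design.
    """
    rng = random.Random(seed)  # a fresh, independently seeded generator per dataset
    difficulties = [-1.0 + 2.0 * i / (n_items - 1) for i in range(n_items)]
    data = []
    for _ in range(n_persons):
        theta = rng.gauss(0.0, 1.0)         # person ability
        gamma = rng.gauss(0.0, testlet_sd)  # testlet effect (source of local dependence)
        row = []
        for j, b in enumerate(difficulties):
            effect = gamma if j < n_testlet_items else 0.0
            p = 1.0 / (1.0 + math.exp(-(theta + effect - b)))
            row.append(1 if rng.random() < p else 0)
        data.append(row)
    return data


def item_proportion(data, item=0):
    """Proportion of correct responses on one item (a simple summary statistic)."""
    return sum(row[item] for row in data) / len(data)


def monte_carlo_se(values):
    """Monte Carlo standard error of the mean across replications."""
    return statistics.stdev(values) / math.sqrt(len(values))


if __name__ == "__main__":
    # One dataset per replication, each with its own seed.
    for n_reps in (50, 200):
        stats = [item_proportion(simulate_dataset(seed)) for seed in range(n_reps)]
        print(n_reps, monte_carlo_se(stats))
```

Because the Monte Carlo standard error shrinks in proportion to the square root of the number of replications, moving from 50 to 200 replications would be expected to roughly halve it. Seeding each dataset separately also makes any single replication reproducible on its own, which helps when checking a dataset suspected of being anomalous.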