A Study of Polytomous Scoring Models in Higher-Order Item Response Theory

Chapter 5 Conclusions and Suggestions

Section 2 Suggestions

This section offers several suggestions that address the limitations of this study, for the reference of future researchers.

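1. This study ran its experiments on simulated data only; future researchers could use empirical data for comparison.

2. The number of domain scales was fixed in this study's design; future researchers could examine how this variable affects estimation under the HO-IRT model.

3. This study assumed that examinee abilities and item parameters are normally distributed; future researchers could compare results under other distributions.

4. In this study, adding a second overall scale to a domain scale lowered the specified correlation between that domain scale and the original overall scale, so adding an overall scale did not improve the estimation precision of the ability scales. Future researchers could examine how this variable affects estimation under the two-factor HO-IRT model, as a reference for assessment designers.

5. For MCMC estimation under the HO-IRT model, this study considered only two settings of sample size and number of items. Future researchers could investigate at what sample sizes and test lengths the parameter estimates become stable, as a reference for assessment designers and test administrators.

6. The simulated data in this study were generated with a one-parameter model; future researchers could adopt a two-parameter or three-parameter model to examine how discrimination and guessing affect parameter estimation under the HO-IRT model.

7. The structure of a within-item multidimensional test can take many forms, depending on the number of domain scales and the number of items assigned to each. This study ran simulation experiments on only one such structure; future researchers could explore the alternatives more fully.

8. The simulated data in this study represent a single examinee group taking a single test; future researchers could develop multiple test booklets, administer them to multiple groups, and examine how the equating design affects parameter estimation under the HO-IRT model.

9. WinBUGS, the estimation software used in this study, requires considerable time for parameter estimation; future researchers could write their own estimation program to improve on this.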
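As a rough illustration of the sixth suggestion, the sketch below shows how discrimination and guessing parameters enter a data-generating model. It is not the simulation code used in this study: it uses the standard dichotomous three-parameter logistic form rather than the polytomous HO-IRT models studied here, and the function names, sample size, test length, and the values of `a` and `c` are all hypothetical choices made for the example.

```python
import math
import random


def irt_prob(theta, b, a=1.0, c=0.0):
    """P(correct) under the 3PL model; a=1 and c=0 reduce it to the 1PL form."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def simulate_responses(thetas, items, rng):
    """Draw a 0/1 response matrix; items is a list of (b, a, c) tuples."""
    return [
        [1 if rng.random() < irt_prob(theta, b, a, c) else 0 for (b, a, c) in items]
        for theta in thetas
    ]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical design: 500 examinees, 20 items, normal abilities and difficulties.
thetas = [rng.gauss(0.0, 1.0) for _ in range(500)]
difficulties = [rng.gauss(0.0, 1.0) for _ in range(20)]

# Same difficulties under two generating models (illustrative a and c values).
items_1pl = [(b, 1.0, 0.0) for b in difficulties]
items_3pl = [(b, 1.5, 0.2) for b in difficulties]

data_1pl = simulate_responses(thetas, items_1pl, rng)
data_3pl = simulate_responses(thetas, items_3pl, rng)
```

Fitting the same estimation model to `data_1pl` and `data_3pl` would be one way to see how far ignoring discrimination and guessing distorts the recovered parameters. Replacing `rng.gauss` with another sampler would likewise give a starting point for the non-normal distributions mentioned in the third suggestion.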

