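Research Recommendations

Chapter 5: Conclusions and Recommendations

Section 2: Research Recommendations

Based on the research process and its conclusions, this section offers the following recommendations: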

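1. The conclusions of this study indicate that, when items violate local independence, HLM 6.03 estimates parameters more accurately than BILOG-MG. HLM can therefore be used in place of BILOG-MG for parameter estimation in such cases.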

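2. The results show that, under some combinations of simulation factors, the estimation-performance indices of the two models do not follow the same trend. This may be related to the 50 replications used in this study, so future researchers are advised to increase the number of replications to obtain more stable results.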

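3. This study manipulated only three simulation factors. Future studies could add further factors, such as the number of testlets, to make the results more complete.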

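4. In some simulation conditions the results show trends in estimation performance that cannot be explained. This may be because the simulated data were generated in SAS as 50 datasets in a single run, which may have introduced anomalies into the data. Future simulation studies are advised to generate one dataset at a time.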

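5. This study used REML estimation in HLM 6.03 and ML estimation in BILOG-MG, so the observed differences in parameter estimation reflect the model and the estimation method together. Future researchers are advised to use the same estimation method in both programs, so that differences in estimation performance can be attributed to the model alone.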
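The reasoning behind recommendations 2 and 4 can be illustrated with a small sketch. It is not the simulation design of this study: it uses Python rather than SAS, and a simple sample proportion stands in for an item parameter estimate. All names (`simulate_rmse`, `spread_of_rmse`, `true_p`, `n_examinees`) are hypothetical. Each replication draws its own dataset from its own seeded generator, and the sketch compares how much an RMSE index varies from one whole study to the next when it is based on 50 versus 500 replications.

```python
import random
import statistics


def simulate_rmse(n_replications, n_examinees=200, true_p=0.6, seed=0):
    """RMSE of a sample-proportion estimate across replications.

    Toy stand-in for an item parameter estimate (an assumption of this
    sketch): each replication generates one dataset from its own seeded
    generator, one dataset at a time.
    """
    squared_errors = []
    for r in range(n_replications):
        # A separate seeded stream for every dataset.
        rng = random.Random(seed * 100_000 + r)
        correct = sum(rng.random() < true_p for _ in range(n_examinees))
        estimate = correct / n_examinees
        squared_errors.append((estimate - true_p) ** 2)
    return (sum(squared_errors) / n_replications) ** 0.5


def spread_of_rmse(n_replications, n_studies=30):
    """Standard deviation of the RMSE over repeated whole simulation studies."""
    rmses = [simulate_rmse(n_replications, seed=s + 1) for s in range(n_studies)]
    return statistics.stdev(rmses)


spread_50 = spread_of_rmse(50)
spread_500 = spread_of_rmse(500)
print(f"SD of RMSE with 50 replications:  {spread_50:.5f}")
print(f"SD of RMSE with 500 replications: {spread_500:.5f}")
```

In this toy setting the RMSE index fluctuates noticeably less across studies when it is averaged over 500 replications than over 50, which is the kind of stabilisation recommendation 2 has in mind. Seeding each dataset separately also makes any single replication reproducible on its own, which may help when tracing anomalies of the kind described in recommendation 4.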


From the document: Parameter Estimation When Items Violate Local Independence: A Comparison of BILOG-MG and HLM Software (pp. 78-85)
