Chapter 5 Conclusions and Suggestions for Future Research

Section 2 Suggestions for Future Research

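The following suggestions address aspects that this study did not fully cover and are offered as a reference for later researchers.

1. In its equating designs, this study considered only the case in which examinee ability is normally distributed. Future studies could compare results when the ability distribution is skewed or bimodal.

2. In its equating designs, this study considered only between-item multidimensionality. Future studies could compare results under within-item multidimensionality.

3. In its equating designs, this study considered only three booklet subscale proportions. Future studies could compare results under other booklet subscale proportions.

4. Because this study used ACER ConQuest for parameter estimation, it considered only the one-parameter model. Future studies could examine two-parameter and three-parameter models, estimated with software such as NOHARM or TESTFACT.

5. This study carried over the three subscale scoring methods used by 謝佳穎 (2009), comparing the multidimensional MIRT method with the unidimensional BOCK and W-BOCK methods, and so did not cover every subscale score estimation method. Future studies could compare other subscale score estimation methods.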
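Suggestions 1 and 4 can be made concrete with a small simulation sketch. The Python below is not this study's procedure. It assumes illustrative generators (a chi-square distribution with 4 degrees of freedom for the skewed case, an equal mixture of two normal distributions for the bimodal case), and the function names `sample_abilities`, `prob_correct`, and `skewness` are hypothetical. Every sample is standardized so that only the shape of the distribution differs between conditions. The response function is the usual three-parameter logistic form, written here without the D = 1.7 scaling constant; setting c = 0 gives the two-parameter model, and setting a = 1 as well gives the one-parameter form.

```python
import math
import random
import statistics


def standardize(values):
    """Rescale a sample to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def sample_abilities(shape, n, rng):
    """Draw n standardized abilities with a normal, skewed, or bimodal shape.

    The skewed and bimodal generators are illustrative assumptions only.
    """
    if shape == "normal":
        raw = [rng.gauss(0.0, 1.0) for _ in range(n)]
    elif shape == "skewed":
        # Chi-square with 4 degrees of freedom: a right-skewed distribution.
        raw = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(4)) for _ in range(n)]
    elif shape == "bimodal":
        # Equal mixture of two normals centered at -1.5 and 1.5, each with SD 0.5.
        raw = [rng.gauss(rng.choice((-1.5, 1.5)), 0.5) for _ in range(n)]
    else:
        raise ValueError(f"unknown shape: {shape}")
    return standardize(raw)


def prob_correct(theta, b, a=1.0, c=0.0):
    """Three-parameter logistic probability of a correct response.

    theta: ability, b: difficulty, a: discrimination, c: lower asymptote.
    With a=1 and c=0 this is the one-parameter form.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def skewness(values):
    """Sample skewness (third standardized moment)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])


if __name__ == "__main__":
    rng = random.Random(2010)
    for shape in ("normal", "skewed", "bimodal"):
        thetas = sample_abilities(shape, 2000, rng)
        # Simulate one item of difficulty 0 under the one-parameter form.
        responses = [int(rng.random() < prob_correct(t, b=0.0)) for t in thetas]
        print(shape, round(skewness(thetas), 2), sum(responses) / len(responses))
```

Because all three samples share the same mean and standard deviation, any difference in downstream equating error between conditions could be attributed to distributional shape rather than to location or scale.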

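References

Chinese-language sources

Research Center for Psychological and Educational Testing, Professional Testing Workshop (2006, January 2). Applications of IRT in scale (test) construction: Handout (Part 2). Retrieved August 10, 2009, from http://www.rcpet.ntnu.edu.tw/download.htm

王文中 (2004). Rasch measurement theory and its applications in education and psychology. 教育與心理研究 (National Chengchi University), 27(4), 637-694.

王暄博 (2006). A comparison of horizontal and vertical equating results under BIB and NEAT designs. Unpublished master's thesis, Graduate Institute of Educational Measurement and Statistics, National Taichung University of Education, Taichung.

郭伯臣, 王暄博, 吳慧珉, & 張宛婷 (2010). A simulation study of subscale score estimation methods applied to large-scale educational testing. 中國測驗學刊, 57(2) (June 30, 2010).

謝佳穎 (2009). A simulation study of multidimensional item response theory for subscale score estimation. Unpublished master's thesis, Graduate Institute of Educational Measurement and Statistics, National Taichung University of Education, Taichung.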

English-language sources

Ackerman, T. A. (1996). Graphical representation of multidimensional item response theory analyses. Applied Psychological Measurement, 20, 311-329.

Adams, R. J., Wilson, M. R., & Wang, W. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21, 1-23.

Allen, N. L., Donoghue, J. R., & Schoeps, T. L. (2001). The NAEP 1998 technical report. Washington, DC: National Center for Education Statistics.

Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores. Reading, MA: Addison-Wesley.

Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443-459.

Bock, R. D., Thissen, D., & Zimowski, M. F. (1997). IRT estimation of domain scores. Journal of Educational Measurement, 34, 197-211.

Fraser, C. (1988). NOHARM II: A FORTRAN program for fitting unidimensional and multidimensional normal ogive models of latent trait theory. Armidale, Australia: The University of New England, Center for Behavioral Studies.

Gessaroli, M. E. (2004). Using hierarchical multidimensional item response theory to estimate augmented subscores. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego, CA.

Green, B. F., Bock, R. D., Humphreys, L. G., Linn, R. L., & Reckase, M. D. (1984).

Gummerman, K. (1972). A response-contingent measure of proportion correct. The Journal of the Acoustical Society of America, 52, 1645-1647.

Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Boston, MA: Kluwer-Nijhoff.

Hattie, J. (1981). Decision criteria for determining unidimensional and multidimensional normal ogive models of latent trait theory. Armidale, Australia: The University of New England, Center for Behavioral Studies.

de la Torre, J., & Patz, R. J. (2002). A multidimensional item response theory approach to simultaneous ability estimation. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, LA.

Johnson, R. A., & Wichern, D. W. (2007). Applied multivariate statistical analysis. New Jersey: Pearson Education.

Kahraman, N., & Kamata, A. (2004). Increasing the precision of subscale scores by using out-of-scale information. Applied Psychological Measurement, 28(6), 407-426.

Kelley, T. L. (1927). The interpretation of educational measurements. New York: World Book.

Kelley, T. L. (1947). Fundamentals of statistics. Cambridge, MA: Harvard University Press.

Kolen, M. J., & Brennan, R. L. (1995). Test equating: Methods and practices. New York: Springer-Verlag.

Kolen, M. J., & Brennan, R. L. (2004). Test equating, scaling, and linking: Methods and practices (2nd ed.). New York: Springer-Verlag.

Lord, F. M. (1980). Applications of Item Response Theory to Practical Testing Problems. Hillsdale, NJ: Lawrence Erlbaum.

Lord, F. M. (1983). Unbiased estimators of ability parameters, of their variance, and of their parallel-forms reliability. Psychometrika, 48, 233-245.

Martin, M. O., Mullis, I. V. S., & Chrostowski, S. J. (Eds.) (2004). TIMSS 2003 Technical Report. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Boston College.

McDonald, R. P. (1967). Nonlinear factor analysis. Psychometric Monograph, 15, 1-167.

McKinley, R. L., & Reckase, M. D. (1983). MAXLOG: A computer program for the estimation of the parameters of a multidimensional logistic model. Behavior Research Methods and Instrumentation, 15, 389-390.

Meyers, J. L., Shin, D., & Nichols, P. D. (2008, March). Perspective: An integrated assessment and instructional resources system. Pearson Research Report. Iowa City, IA: Pearson.

Moran, R., Rampey, B. D., Dion, G., & Donahue, P. (2008). National Indian Education Study 2007 Part I: Performance of American Indian and Alaska Native Students at Grades 4 and 8 on NAEP 2007 Reading and Mathematics Assessments (NCES 2008–457). National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education, Washington, D.C.

Nance, L. A., John, R. D., & Terry, L. S. (2001). The NAEP 1998 technical report. National Center for Education Statistics, Educational Testing Service.

Novick, M. R., & Jackson, P. H. (1974). Statistical methods for educational and psychological research. New York, NY: McGraw-Hill.

Olson, J. F., Martin, M. O., & Mullis, I. V. S. (Eds.). (2008). TIMSS 2007 Technical Report. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Boston College.

Petersen, N. S., Kolen, M. J., & Hoover, H. D. (1993). Scaling, norming, and equating. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 221-262). New York: Macmillan.

PISA (2006). PISA 2006 Science Competencies for Tomorrow's World. Retrieved January 2, 2010, from http://www.pisa.oecd.org/document/2/0,3343,en_32252351_32236191_39718850_1_1_1_1,00.html

Pommerich, M., Nicewander, W. A., & Hanson, B. (1999). Estimating average domain scores. Journal of Educational Measurement, 36, 199-216.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Chicago: University of Chicago Press.

Reckase, M. D., & McKinley, R. L. (1991). The discriminating power of items that measure more than one dimension. Applied Psychological Measurement, 15, 361-373.

Peng, L. (2008). IRT vs. Factor Analysis Approaches in Analyzing Multigroup Multidimensional Binary Data: The Effect of Structural Orthogonality, and the Equivalence in Test Structure, Item Difficulty, & Examinee Groups. Dissertation submitted to the Faculty of the Graduate School of the University of Maryland.

Shin, D. (2007). A comparison of methods of estimating subscale scores for mixed-format tests. Pearson Educational Measurement.

Shin, C. D. (2006). A comparison of methods of estimating subscale scores for mixed-format tests. Paper presented at the annual meeting of the National Council on Measurement in Education, San Francisco.

Shin, C. D., Ansley, T., Tsai, T., & Mao, X. (2005). A comparison of methods of estimating objective scores. Paper presented at the annual meeting of the National Council on Measurement in Education, Montreal, Quebec, Canada.

Sympson, J. B. (1978). A model for testing with multidimensional items. In D. J. Weiss (Ed.), Proceedings of the 1977 Computerized Adaptive Testing Conference (pp. 82-98). Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program.

Tate, R. L. (2004). Implications of multidimensionality for total score and subscale performance. Applied Measurement in Education, 17(2), 89-112.

van der Linden, W. J., Veldkamp, B. P., & Carlson, J. E. (2004). Optimizing balanced incomplete block designs for educational assessments. Applied Psychological Measurement, 28, 317-331.

von Davier, A. A., Holland, P. W., & Thayer, D. T. (2004). The kernel method of test equating. New York: Springer.

Wainer, H., Vevea, J. L., Camacho, F., Reeve III, B. B., Rosa, K., Nelson, L., Swygert, K. A., & Thissen, D. (2000). Test scoring. Hillsdale, NJ: Erlbaum Associates.

Wang, W.-C., Wilson, M. R., & Adams, R. J. (1997). Rasch models for multidimensionality between items and within items. In M. Wilson, G. Engelhard, & K. Draney (Eds.), Objective measurement: Theory into practice (Vol. 4, pp. 139-155). Norwood, NJ: Ablex.

Academic Publishers, Massachusetts, USA.

Whitely, S. E. (1980). Measuring aptitude processes with multicomponent latent trait models (Technical Report NIE-80-5). Lawrence: University of Kansas.

Woods, C. M., & Lin, N. (2009). Item Response Theory With Estimation of the Latent Density Using Davidian Curves. Applied Psychological Measurement, 33, 102-117.

Wu, M. L., Adams, R. J., & Wilson, M. R. (1998). ACER ConQuest: Generalised item response modelling software. Melbourne: ACER Press.

Yao, L., & Mao, X. (2004). Unidimensional and multidimensional estimation of vertical scaled tests with complex structure. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego, CA.

Yao, L., & Boughton, K. A. (2007). A multidimensional item response modeling approach for improving subscale proficiency estimation and classification. Applied Psychological Measurement, 31, 83-105.

Yen, W. M., Sykes, R. C., Ito, K., & Julian, M. (1997). A Bayesian/IRT index of objective performance for tests with mixed-item types. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago.

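Appendix 1. RMSE of the Horizontal Equating Design

Appendix 2. RMSE of the Vertical Equating Design

Appendix Table 2-2. RMSE of L_0.5_1 under different testing conditions (L: lower grade)

Appendix Table 2-3. RMSE of H_0.5_1 under different testing conditions (H: higher grade)

Appendix Table 2-4. RMSE of T_0.5_2 under different testing conditions

Appendix Table 2-5. RMSE of L_0.5_2 under different testing conditions

Appendix Table 2-6. RMSE of H_0.5_2 under different testing conditions

Appendix Table 2-7. RMSE of T_0.5_3 under different testing conditions

Appendix Table 2-8. RMSE of L_0.5_3 under different testing conditions

Appendix Table 2-9. RMSE of H_0.5_3 under different testing conditions

Appendix Table 2-10. RMSE of T_1_1 under different testing conditions

Appendix Table 2-11. RMSE of L_1_1 under different testing conditions

Appendix Table 2-12. RMSE of H_1_1 under different testing conditions

Appendix Table 2-13. RMSE of T_1_2 under different testing conditions

Appendix Table 2-14. RMSE of L_1_2 under different testing conditions

Appendix Table 2-15. RMSE of H_1_2 under different testing conditions

Appendix Table 2-16. RMSE of T_1_3 under different testing conditions

Appendix Table 2-17. RMSE of L_1_3 under different testing conditions

Appendix Table 2-18. RMSE of H_1_3 under different testing conditions

Appendix Table 2-19. RMSE of T_2_1 under different testing conditions

Appendix Table 2-20. RMSE of L_2_1 under different testing conditions

Appendix Table 2-21. RMSE of H_2_1 under different testing conditions

Appendix Table 2-22. RMSE of T_2_2 under different testing conditions

Appendix Table 2-23. RMSE of L_2_2 under different testing conditions

Appendix Table 2-24. RMSE of H_2_2 under different testing conditions

Appendix Table 2-25. RMSE of T_2_3 under different testing conditions

Appendix Table 2-26. RMSE of L_2_3 under different testing conditions

Appendix Table 2-27. RMSE of H_2_3 under different testing conditions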

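Appendix 3. RMSE by Booklet Allocation Proportion in the Vertical Equating Design

Appendix Table 3-2. RMSE comparison of booklet allocation proportions at the same anchor-item proportion (20%)

[Table values not recovered. Surviving labels: row N(0.5,1) with RMSE columns 7140_0.5_2 and 15120_0.5_2; row N(1,1) with RMSE columns 7140_1_2 and 15120_1_2; row N(2,1) with RMSE columns 7140_2_2 and 15120_2_2.]

Appendix Table 3-3. RMSE comparison of booklet allocation proportions at the same anchor-item proportion (30%)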

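Appendix 4. RMSE by Degree of Subscale Correlation in the Vertical Equating Design

[Table values not recovered. Surviving labels: row N(1,1) with RMSE columns 7140_1_1 and 15120_1_1.]

Appendix Table 4-2. RMSE comparison of subscale correlation levels at the same anchor-item proportion (20%)

[Table values not recovered. Surviving labels: row N(1,1) with RMSE columns 7140_1_2 and 15120_1_2; row N(2,1) with RMSE columns 7140_2_2 and 15120_2_2.]

Appendix Table 4-3. RMSE comparison of subscale correlation levels at the same anchor-item proportion (30%)

[Table values not recovered. Surviving labels: row N(0.5,1) with RMSE columns 7140_0.5_3 and 15120_0.5_3; row N(1,1) with RMSE columns 7140_1_3 and 15120_1_3; row N(2,1) with RMSE columns 7140_2_3 and 15120_2_3.]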
