A Simulation Study of Subscale Score Estimation Methods Applied to Large-Scale Educational Testing

Chapter 5: Conclusions and Recommendations

Section 2: Recommendations

To address the limitations of this study, the following recommendations are offered for future researchers.

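1. For IRT equating, this study used only concurrent calibration. Future studies could consider separate calibration with linking, for example the mean/mean method, the mean/sigma method, and characteristic curve methods.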

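2. In the equated-test design, this study considered only one subscale configuration (for example, each booklet contains two subscales, each subscale has 12 items, and there are 6 anchor items). Future studies could consider other subscale configurations.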

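3. This study considered only one examinee ability distribution. Future studies could compare the effects of different examinee ability distributions.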

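4. This study compared six subscale scoring methods (Bock, OPI, W-Bock, REG, REGP, and PC), which do not cover all available methods. Future studies could evaluate the score estimates produced by other subscale scoring methods.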

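5. The results show that, when the correlations among subscales are high, the Bock, W-Bock, REG, and REGP methods all estimate subscale scores well and differ little from one another; when the correlations are low, REG and REGP perform better. For subscale scoring in large-scale assessments, researchers can therefore use any of Bock, W-Bock, REG, or REGP when subscale correlations are high, and are advised to use REG or REGP when the correlations span a wide range.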
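Recommendation 1 names the mean/mean and mean/sigma methods as separate-calibration alternatives to concurrent calibration. The following is a minimal sketch of how those two linking transformations are commonly computed from anchor-item parameter estimates. It is not taken from this study: the function names, the 2PL-style parameters (`a` for discrimination, `b` for difficulty), and the six anchor-item values are all hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev

def mean_sigma(b_from, b_to):
    """Mean/sigma linking: slope A and intercept B from anchor difficulties."""
    A = pstdev(b_to) / pstdev(b_from)
    B = mean(b_to) - A * mean(b_from)
    return A, B

def mean_mean(a_from, a_to, b_from, b_to):
    """Mean/mean linking: slope from mean discriminations, intercept from mean difficulties."""
    A = mean(a_from) / mean(a_to)
    B = mean(b_to) - A * mean(b_from)
    return A, B

def rescale_items(a, b, A, B):
    """Place item parameters on the target scale: a* = a / A, b* = A * b + B."""
    return [ai / A for ai in a], [A * bi + B for bi in b]

# Hypothetical estimates for 6 anchor items on the "from" scale.
b_from = [-1.5, -0.8, -0.2, 0.3, 0.9, 1.6]
a_from = [0.8, 1.0, 1.2, 0.9, 1.1, 1.3]

# Hypothetical "to" scale built from a known transformation (A = 1.2, B = 0.5),
# so both methods should recover the same constants here.
b_to = [1.2 * b + 0.5 for b in b_from]
a_to = [a / 1.2 for a in a_from]

A_ms, B_ms = mean_sigma(b_from, b_to)
A_mm, B_mm = mean_mean(a_from, a_to, b_from, b_to)
print("mean/sigma:", A_ms, B_ms)
print("mean/mean: ", A_mm, B_mm)
```

With real calibrations the two methods generally give different constants because anchor-item estimates contain error; characteristic curve methods, also named in Recommendation 1, instead choose A and B by minimizing differences between item or test characteristic curves and are not shown here.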

Appendix 1: RMSE for the Single-Test Design

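Appendix Table 1-1. RMSE with an item-format mixing proportion of 0% and 3000 examinees (continued)

Cell entries are RMSE / STD.

| Design | IRT | OPI | REG | PC | REGP | WIRT |
|---|---|---|---|---|---|---|
| 0.2_4_24 | 0.1676 / 0.0004 | 0.1527 / 0.0020 | 0.1226 / 0.0007 | 0.1577 / 0.0010 | 0.1212 / 0.0011 | 0.1355 / 0.0009 |
| 0.5_4_24 | 0.1450 / 0.0005 | 0.1341 / 0.0011 | 0.1208 / 0.0008 | 0.1679 / 0.0011 | 0.1189 / 0.0011 | 0.1262 / 0.0007 |
| 0.8_4_24 | 0.1008 / 0.0007 | 0.1106 / 0.0011 | 0.0995 / 0.0008 | 0.1588 / 0.0010 | 0.1004 / 0.0012 | 0.0958 / 0.0008 |
| 1_4_24 | 0.0753 / 0.0009 | 0.1082 / 0.0012 | 0.0800 / 0.0009 | 0.1578 / 0.0010 | 0.0887 / 0.0014 | 0.0768 / 0.0009 |
| 0.2_2_24 | 0.1418 / 0.0009 | 0.1083 / 0.0018 | 0.0972 / 0.0008 | 0.1115 / 0.0009 | 0.0939 / 0.0010 | 0.1221 / 0.0011 |
| 0.5_2_24 | 0.1254 / 0.0006 | 0.1079 / 0.0011 | 0.1000 / 0.0010 | 0.1190 / 0.0011 | 0.0956 / 0.0012 | 0.1125 / 0.0007 |
| 0.8_2_24 | 0.0899 / 0.0006 | 0.0942 / 0.0010 | 0.0870 / 0.0008 | 0.1127 / 0.0010 | 0.0835 / 0.0010 | 0.0856 / 0.0007 |
| 1_2_24 | 0.0727 / 0.0009 | 0.0955 / 0.0011 | 0.0763 / 0.0010 | 0.1117 / 0.0011 | 0.0742 / 0.0011 | 0.0733 / 0.0009 |

Design codes take the form N_R_m, where N is the correlation among subscales, R is the number of subscales, and m is the number of items in the test booklet.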
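The appendix tables report an RMSE and an STD for each design code. This excerpt does not give the formulas, so the sketch below rests on an assumed reading: the RMSE is computed between estimated and true subscale scores within each simulated replication, the reported RMSE is its mean over replications, and STD is the standard deviation of the per-replication RMSE. The helper names and the toy data are hypothetical.

```python
import math
from statistics import mean, stdev

def parse_design(code):
    """Split a design code 'N_R_m' into (subscale correlation, subscales, booklet items)."""
    n, r, m = code.split("_")
    return float(n), int(r), int(m)

def rmse(estimates, truths):
    """Root mean squared error between estimated and true scores."""
    return math.sqrt(mean((e - t) ** 2 for e, t in zip(estimates, truths)))

def summarize(replications):
    """Mean and sample standard deviation of the per-replication RMSE.

    `replications` is a list of (estimates, truths) pairs, one per replication.
    This aggregation is an assumption, not the study's documented procedure.
    """
    values = [rmse(est, tru) for est, tru in replications]
    return mean(values), stdev(values)

# Toy data: two replications with two examinees each (values are made up).
reps = [
    ([0.6, 0.4], [0.5, 0.5]),
    ([0.8, 0.2], [0.5, 0.5]),
]
avg_rmse, std_rmse = summarize(reps)
print(parse_design("0.2_4_24"), avg_rmse, std_rmse)
```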

Appendix Table 1-2. RMSE with an item-format mixing proportion of 0% and 1000 examinees (continued)

Cell entries are RMSE / STD.

| Design | IRT | OPI | REG | PC | REGP | WIRT |
|---|---|---|---|---|---|---|
| 0.2_4_24 | 0.1675 / 0.0007 | 0.1493 / 0.0029 | 0.1226 / 0.0011 | 0.1572 / 0.0020 | 0.1224 / 0.0014 | 0.1362 / 0.0016 |
| 0.5_4_24 | 0.1466 / 0.0009 | 0.1348 / 0.0020 | 0.1216 / 0.0014 | 0.1678 / 0.0018 | 0.1208 / 0.0015 | 0.1278 / 0.0011 |
| 0.8_4_24 | 0.1010 / 0.0010 | 0.1106 / 0.0017 | 0.0996 / 0.0012 | 0.1587 / 0.0017 | 0.1010 / 0.0015 | 0.0960 / 0.0012 |
| 1_4_24 | 0.0751 / 0.0017 | 0.1093 / 0.0018 | 0.0800 / 0.0018 | 0.1568 / 0.0014 | 0.0880 / 0.0030 | 0.0764 / 0.0017 |
| 0.2_2_24 | 0.1408 / 0.0015 | 0.1081 / 0.0028 | 0.0975 / 0.0015 | 0.1118 / 0.0018 | 0.0947 / 0.0016 | 0.1213 / 0.0019 |
| 0.5_2_24 | 0.1272 / 0.0010 | 0.1081 / 0.0019 | 0.1003 / 0.0017 | 0.1189 / 0.0021 | 0.0965 / 0.0018 | 0.1140 / 0.0013 |
| 0.8_2_24 | 0.0900 / 0.0013 | 0.0939 / 0.0019 | 0.0869 / 0.0015 | 0.1124 / 0.0019 | 0.0835 / 0.0014 | 0.0856 / 0.0014 |
| 1_2_24 | 0.0725 / 0.0018 | 0.0957 / 0.0017 | 0.0758 / 0.0019 | 0.1110 / 0.0017 | 0.0744 / 0.0019 | 0.0730 / 0.0018 |

Design codes take the form N_R_m, where N is the correlation among subscales, R is the number of subscales, and m is the number of items in the test booklet.

Appendix Table 1-3. RMSE with an item-format mixing proportion of 0% and 500 examinees (continued)

Cell entries are RMSE / STD.

| Design | IRT | OPI | REG | PC | REGP | WIRT |
|---|---|---|---|---|---|---|
| 0.2_4_24 | 0.1677 / 0.0020 | 0.1482 / 0.0035 | 0.1228 / 0.0022 | 0.1572 / 0.0025 | 0.1244 / 0.0027 | 0.1373 / 0.0026 |
| 0.5_4_24 | 0.1483 / 0.0011 | 0.1361 / 0.0022 | 0.1225 / 0.0017 | 0.1679 / 0.0022 | 0.1236 / 0.0023 | 0.1300 / 0.0016 |
| 0.8_4_24 | 0.1013 / 0.0016 | 0.1105 / 0.0028 | 0.0996 / 0.0017 | 0.1585 / 0.0022 | 0.1028 / 0.0029 | 0.0963 / 0.0016 |
| 1_4_24 | 0.0765 / 0.0022 | 0.1097 / 0.0028 | 0.0815 / 0.0024 | 0.1562 / 0.0025 | 0.0905 / 0.0040 | 0.0775 / 0.0021 |
| 0.2_2_24 | 0.1398 / 0.0019 | 0.1086 / 0.0037 | 0.0973 / 0.0019 | 0.1118 / 0.0024 | 0.0956 / 0.0020 | 0.1209 / 0.0022 |
| 0.5_2_24 | 0.1280 / 0.0015 | 0.1086 / 0.0027 | 0.1004 / 0.0023 | 0.1187 / 0.0027 | 0.0977 / 0.0023 | 0.1155 / 0.0017 |
| 0.8_2_24 | 0.0902 / 0.0019 | 0.0946 / 0.0027 | 0.0866 / 0.0023 | 0.1124 / 0.0026 | 0.0845 / 0.0025 | 0.0860 / 0.0021 |
| 1_2_24 | 0.0731 / 0.0023 | 0.0959 / 0.0026 | 0.0764 / 0.0023 | 0.1110 / 0.0025 | 0.0750 / 0.0024 | 0.0735 / 0.0023 |

Design codes take the form N_R_m, where N is the correlation among subscales, R is the number of subscales, and m is the number of items in the test booklet.

Appendix Table 1-4. RMSE with an item-format mixing proportion of 20% and 3000 examinees (continued)

Cell entries are RMSE / STD.

| Design | IRT | OPI | REG | PC | REGP | WIRT |
|---|---|---|---|---|---|---|
| 0.2_4_24 | 0.1614 / 0.0005 | 0.1358 / 0.0010 | 0.1141 / 0.0008 | 0.1428 / 0.0010 | 0.1122 / 0.0008 | 0.1284 / 0.0007 |
| 0.5_4_24 | 0.1448 / 0.0004 | 0.1257 / 0.0008 | 0.1147 / 0.0006 | 0.1483 / 0.0009 | 0.1118 / 0.0007 | 0.1265 / 0.0005 |
| 0.8_4_24 | 0.1006 / 0.0006 | 0.1064 / 0.0010 | 0.0998 / 0.0008 | 0.1462 / 0.0010 | 0.0968 / 0.0010 | 0.0947 / 0.0006 |
| 1_4_24 | 0.0707 / 0.0007 | 0.1007 / 0.0008 | 0.0792 / 0.0009 | 0.1429 / 0.0008 | 0.0841 / 0.0014 | 0.0707 / 0.0007 |
| 0.2_2_24 | 0.1307 / 0.0011 | 0.1001 / 0.0022 | 0.0898 / 0.0009 | 0.1012 / 0.0011 | 0.0871 / 0.0010 | 0.1146 / 0.0012 |
| 0.5_2_24 | 0.1208 / 0.0007 | 0.0987 / 0.0009 | 0.0919 / 0.0009 | 0.1050 / 0.0009 | 0.0878 / 0.0009 | 0.1102 / 0.0007 |
| 0.8_2_24 | 0.0897 / 0.0007 | 0.0906 / 0.0010 | 0.0852 / 0.0009 | 0.1036 / 0.0009 | 0.0804 / 0.0008 | 0.0851 / 0.0007 |
| 1_2_24 | 0.0687 / 0.0008 | 0.0878 / 0.0010 | 0.0735 / 0.0009 | 0.1013 / 0.0010 | 0.0723 / 0.0009 | 0.0687 / 0.0008 |

Design codes take the form N_R_m, where N is the correlation among subscales, R is the number of subscales, and m is the number of items in the test booklet.

Appendix Table 1-5. RMSE with an item-format mixing proportion of 20% and 1000 examinees (continued)

Cell entries are RMSE / STD.

| Design | IRT | OPI | REG | PC | REGP | WIRT |
|---|---|---|---|---|---|---|
| 0.2_4_24 | 0.1607 / 0.0007 | 0.1351 / 0.0017 | 0.1148 / 0.0013 | 0.1430 / 0.0016 | 0.1137 / 0.0015 | 0.1297 / 0.0014 |
| 0.5_4_24 | 0.1468 / 0.0007 | 0.1261 / 0.0015 | 0.1153 / 0.0013 | 0.1483 / 0.0015 | 0.1130 / 0.0014 | 0.1277 / 0.0010 |
| 0.8_4_24 | 0.1012 / 0.0009 | 0.1068 / 0.0017 | 0.0999 / 0.0013 | 0.1462 / 0.0015 | 0.0973 / 0.0013 | 0.0952 / 0.0010 |
| 1_4_24 | 0.0708 / 0.0016 | 0.1019 / 0.0017 | 0.0798 / 0.0017 | 0.1421 / 0.0015 | 0.0840 / 0.0026 | 0.0708 / 0.0016 |
| 0.2_2_24 | 0.1301 / 0.0015 | 0.0991 / 0.0029 | 0.0894 / 0.0015 | 0.1009 / 0.0018 | 0.0874 / 0.0016 | 0.1142 / 0.0017 |
| 0.5_2_24 | 0.1227 / 0.0009 | 0.0991 / 0.0018 | 0.0921 / 0.0015 | 0.1049 / 0.0015 | 0.0885 / 0.0014 | 0.1120 / 0.0010 |
| 0.8_2_24 | 0.0901 / 0.0011 | 0.0905 / 0.0015 | 0.0853 / 0.0016 | 0.1035 / 0.0015 | 0.0807 / 0.0014 | 0.0855 / 0.0012 |
| 1_2_24 | 0.0680 / 0.0017 | 0.0879 / 0.0018 | 0.0732 / 0.0017 | 0.1008 / 0.0017 | 0.0720 / 0.0026 | 0.0680 / 0.0017 |

Design codes take the form N_R_m, where N is the correlation among subscales, R is the number of subscales, and m is the number of items in the test booklet.

Appendix Table 1-6. RMSE with an item-format mixing proportion of 20% and 500 examinees (continued)

Cell entries are RMSE / STD.

| Design | IRT | OPI | REG | PC | REGP | WIRT |
|---|---|---|---|---|---|---|
| 0.2_4_24 | 0.1611 / 0.0022 | 0.1353 / 0.0028 | 0.1148 / 0.0018 | 0.1430 / 0.0022 | 0.1150 / 0.0022 | 0.1315 / 0.0021 |
| 0.5_4_24 | 0.1482 / 0.0011 | 0.1269 / 0.0022 | 0.1156 / 0.0019 | 0.1480 / 0.0025 | 0.1147 / 0.0011 | 0.1295 / 0.0013 |
| 0.8_4_24 | 0.1017 / 0.0015 | 0.1068 / 0.0025 | 0.1006 / 0.0021 | 0.1467 / 0.0023 | 0.0990 / 0.0025 | 0.0958 / 0.0016 |
| 1_4_24 | 0.0720 / 0.0020 | 0.1024 / 0.0023 | 0.0811 / 0.0022 | 0.1420 / 0.0019 | 0.0857 / 0.0031 | 0.0720 / 0.0019 |
| 0.2_2_24 | 0.1291 / 0.0020 | 0.1005 / 0.0037 | 0.0891 / 0.0020 | 0.1006 / 0.0024 | 0.0875 / 0.0019 | 0.1139 / 0.0019 |
| 0.5_2_24 | 0.1253 / 0.0015 | 0.0986 / 0.0025 | 0.0922 / 0.0021 | 0.1044 / 0.0024 | 0.0890 / 0.0022 | 0.1142 / 0.0014 |
| 0.8_2_24 | 0.0890 / 0.0014 | 0.0901 / 0.0021 | 0.0846 / 0.0021 | 0.1030 / 0.0022 | 0.0807 / 0.0018 | 0.0846 / 0.0016 |
| 1_2_24 | 0.0685 / 0.0019 | 0.0879 / 0.0025 | 0.0736 / 0.0020 | 0.1005 / 0.0025 | 0.0727 / 0.0019 | 0.0684 / 0.0019 |

Design codes take the form N_R_m, where N is the correlation among subscales, R is the number of subscales, and m is the number of items in the test booklet.

Appendix Table 1-7. RMSE with an item-format mixing proportion of 50% and 3000 examinees (continued)

Cell entries are RMSE / STD.

| Design | IRT | OPI | REG | PC | REGP | WIRT |
|---|---|---|---|---|---|---|
| 0.2_4_24 | 0.1630 / 0.0005 | 0.1306 / 0.0013 | 0.1090 / 0.0007 | 0.1317 / 0.0008 | 0.1068 / 0.0007 | 0.1306 / 0.0006 |
| 0.5_4_24 | 0.1397 / 0.0004 | 0.1194 / 0.0010 | 0.1086 / 0.0007 | 0.1381 / 0.0010 | 0.1068 / 0.0007 | 0.1230 / 0.0005 |
| 0.8_4_24 | 0.1026 / 0.0004 | 0.1036 / 0.0009 | 0.0967 / 0.0008 | 0.1340 / 0.0009 | 0.0919 / 0.0007 | 0.0963 / 0.0005 |
| 1_4_24 | 0.0666 / 0.0007 | 0.0945 / 0.0008 | 0.0771 / 0.0009 | 0.1317 / 0.0008 | 0.0798 / 0.0012 | 0.0668 / 0.0007 |
| 0.2_2_24 | 0.1341 / 0.0006 | 0.0980 / 0.0017 | 0.0842 / 0.0007 | 0.0930 / 0.0009 | 0.0814 / 0.0007 | 0.1189 / 0.0005 |
| 0.5_2_24 | 0.1196 / 0.0005 | 0.0921 / 0.0008 | 0.0866 / 0.0007 | 0.0980 / 0.0009 | 0.0831 / 0.0007 | 0.1085 / 0.0005 |
| 0.8_2_24 | 0.0897 / 0.0006 | 0.0858 / 0.0009 | 0.0808 / 0.0008 | 0.0949 / 0.0009 | 0.0759 / 0.0009 | 0.0853 / 0.0006 |
| 1_2_24 | 0.0631 / 0.0008 | 0.0810 / 0.0010 | 0.0662 / 0.0009 | 0.0931 / 0.0009 | 0.0638 / 0.0009 | 0.0631 / 0.0008 |

Design codes take the form N_R_m, where N is the correlation among subscales, R is the number of subscales, and m is the number of items in the test booklet.

Appendix Table 1-8. RMSE with an item-format mixing proportion of 50% and 1000 examinees (continued)

Cell entries are RMSE / STD.

| Design | IRT | OPI | REG | PC | REGP | WIRT |
|---|---|---|---|---|---|---|
| 0.2_4_24 | 0.1626 / 0.0006 | 0.1283 / 0.0016 | 0.1090 / 0.0013 | 0.1315 / 0.0014 | 0.1074 / 0.0013 | 0.1317 / 0.0010 |
| 0.5_4_24 | 0.1419 / 0.0006 | 0.1198 / 0.0016 | 0.1092 / 0.0013 | 0.1380 / 0.0016 | 0.1077 / 0.0014 | 0.1247 / 0.0010 |
| 0.8_4_24 | 0.1032 / 0.0007 | 0.1037 / 0.0016 | 0.0969 / 0.0013 | 0.1343 / 0.0016 | 0.0928 / 0.0013 | 0.0968 / 0.0008 |
| 1_4_24 | 0.0663 / 0.0013 | 0.0955 / 0.0016 | 0.0775 / 0.0015 | 0.1309 / 0.0015 | 0.0794 / 0.0020 | 0.0666 / 0.0013 |
| 0.2_2_24 | 0.1337 / 0.0011 | 0.0967 / 0.0027 | 0.0840 / 0.0014 | 0.0926 / 0.0015 | 0.0815 / 0.0014 | 0.1187 / 0.0010 |
| 0.5_2_24 | 0.1215 / 0.0008 | 0.0924 / 0.0013 | 0.0869 / 0.0013 | 0.0979 / 0.0014 | 0.0837 / 0.0013 | 0.1104 / 0.0009 |
| 0.8_2_24 | 0.0901 / 0.0008 | 0.0858 / 0.0013 | 0.0810 / 0.0013 | 0.0949 / 0.0014 | 0.0763 / 0.0014 | 0.0857 / 0.0009 |
| 1_2_24 | 0.0624 / 0.0012 | 0.0813 / 0.0014 | 0.0657 / 0.0014 | 0.0928 / 0.0015 | 0.0633 / 0.0013 | 0.0625 / 0.0012 |

Design codes take the form N_R_m, where N is the correlation among subscales, R is the number of subscales, and m is the number of items in the test booklet.

Appendix Table 1-9. RMSE with an item-format mixing proportion of 50% and 500 examinees (continued)

Cell entries are RMSE / STD.

| Design | IRT | OPI | REG | PC | REGP | WIRT |
|---|---|---|---|---|---|---|
| 0.2_4_24 | 0.1630 / 0.0017 | 0.1284 / 0.0020 | 0.1093 / 0.0016 | 0.1316 / 0.0020 | 0.1088 / 0.0029 | 0.1332 / 0.0017 |
| 0.5_4_24 | 0.1440 / 0.0010 | 0.1211 / 0.0021 | 0.1103 / 0.0015 | 0.1382 / 0.0022 | 0.1094 / 0.0017 | 0.1268 / 0.0012 |
| 0.8_4_24 | 0.1030 / 0.0016 | 0.1038 / 0.0022 | 0.0969 / 0.0022 | 0.1347 / 0.0021 | 0.0935 / 0.0019 | 0.0967 / 0.0016 |
| 1_4_24 | 0.0676 / 0.0017 | 0.0960 / 0.0024 | 0.0791 / 0.0024 | 0.1309 / 0.0023 | 0.0812 / 0.0033 | 0.0677 / 0.0017 |
| 0.2_2_24 | 0.1322 / 0.0013 | 0.0980 / 0.0039 | 0.0839 / 0.0019 | 0.0929 / 0.0021 | 0.0818 / 0.0019 | 0.1179 / 0.0014 |
| 0.5_2_24 | 0.1219 / 0.0013 | 0.0924 / 0.0024 | 0.0869 / 0.0020 | 0.0976 / 0.0022 | 0.0842 / 0.0020 | 0.1115 / 0.0014 |
| 0.8_2_24 | 0.0893 / 0.0014 | 0.0859 / 0.0023 | 0.0810 / 0.0021 | 0.0949 / 0.0023 | 0.0763 / 0.0015 | 0.0851 / 0.0016 |
| 1_2_24 | 0.0628 / 0.0019 | 0.0814 / 0.0022 | 0.0658 / 0.0020 | 0.0926 / 0.0022 | 0.0638 / 0.0019 | 0.0628 / 0.0019 |

Design codes take the form N_R_m, where N is the correlation among subscales, R is the number of subscales, and m is the number of items in the test booklet.

Appendix 2: RMSE for the Equated-Test Design

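Appendix Table 2-1. RMSE when anchor items come from different subscales

Cell entries are RMSE / STD.

| Examinees | Subscale correlation | IRT | OPI | WIRT | REGP |
|---|---|---|---|---|---|
| 500 | 0.2 | 0.1582 / 0.0013 | 0.1146 / 0.0020 | 0.1383 / 0.0064 | 0.1096 / 0.0017 |
| 500 | 0.5 | 0.1289 / 0.0010 | 0.1049 / 0.0015 | 0.1156 / 0.0015 | 0.1078 / 0.0024 |
| 500 | 0.8 | 0.1043 / 0.0011 | 0.1024 / 0.0014 | 0.0992 / 0.0012 | 0.1049 / 0.0018 |
| 500 | 1 | 0.0742 / 0.0016 | 0.0965 / 0.0016 | 0.0749 / 0.0016 | 0.0773 / 0.0023 |
| 1000 | 0.2 | 0.1504 / 0.0008 | 0.1110 / 0.0019 | 0.1284 / 0.0034 | 0.1099 / 0.0015 |
| 1000 | 0.5 | 0.1342 / 0.0008 | 0.1080 / 0.0012 | 0.1173 / 0.0013 | 0.1020 / 0.0012 |
| 1000 | 0.8 | 0.0933 / 0.0009 | 0.0952 / 0.0013 | 0.0886 / 0.0011 | 0.0976 / 0.0011 |
| 1000 | 1 | 0.0729 / 0.0010 | 0.0934 / 0.0013 | 0.0734 / 0.0011 | 0.0761 / 0.0012 |
| 3000 | 0.2 | 0.1419 / 0.0008 | 0.1108 / 0.0011 | 0.1189 / 0.0024 | 0.1051 / 0.0008 |
| 3000 | 0.5 | 0.1270 / 0.0005 | 0.1063 / 0.0007 | 0.1140 / 0.0013 | 0.1027 / 0.0132 |
| 3000 | 0.8 | 0.1049 / 0.0005 | 0.1027 / 0.0008 | 0.0993 / 0.0005 | 0.0970 / 0.0007 |
| 3000 | 1 | 0.0756 / 0.0007 | 0.0994 / 0.0008 | 0.0761 / 0.0007 | 0.0788 / 0.0008 |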

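Appendix Table 2-2. RMSE when anchor items come from the same subscale

Cell entries are RMSE / STD.

| Examinees | Subscale correlation | IRT | OPI | WIRT | REGP |
|---|---|---|---|---|---|
| 500 | 0.2 | 0.1569 / 0.0017 | 0.1148 / 0.0023 | 0.1417 / 0.0100 | 0.1027 / 0.0020 |
| 500 | 0.5 | 0.1295 / 0.0010 | 0.1073 / 0.0018 | 0.1158 / 0.0019 | 0.1017 / 0.0022 |
| 500 | 0.8 | 0.1042 / 0.0013 | 0.1029 / 0.0018 | 0.0994 / 0.0014 | 0.0924 / 0.0020 |
| 500 | 1 | 0.0764 / 0.0016 | 0.0976 / 0.0019 | 0.0771 / 0.0016 | 0.0822 / 0.0027 |
| 1000 | 0.2 | 0.1485 / 0.0008 | 0.1129 / 0.0031 | 0.1304 / 0.0039 | 0.1108 / 0.0019 |
| 1000 | 0.5 | 0.1336 / 0.0009 | 0.1065 / 0.0013 | 0.1187 / 0.0018 | 0.1038 / 0.0015 |
| 1000 | 0.8 | 0.1038 / 0.0009 | 0.1030 / 0.0014 | 0.0986 / 0.0010 | 0.0966 / 0.0013 |
| 1000 | 1 | 0.0735 / 0.0010 | 0.0966 / 0.0013 | 0.0745 / 0.0011 | 0.0838 / 0.0023 |
| 3000 | 0.2 | 0.1384 / 0.0005 | 0.1111 / 0.0010 | 0.1176 / 0.0013 | 0.1069 / 0.0010 |
| 3000 | 0.5 | 0.1301 / 0.0006 | 0.1084 / 0.0007 | 0.1163 / 0.0009 | 0.1032 / 0.0008 |
| 3000 | 0.8 | 0.1047 / 0.0005 | 0.1032 / 0.0008 | 0.0991 / 0.0006 | 0.0970 / 0.0009 |
| 3000 | 1 | 0.0758 / 0.0007 | 0.0997 / 0.0007 | 0.0763 / 0.0007 | 0.0826 / 0.0010 |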

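Appendix Table 2-3. RMSE under the REG method when anchor items come from different subscales

Cell entries are RMSE / STD.

| Examinees | Subscale correlation | Mean equating | Linear equating | Equipercentile equating |
|---|---|---|---|---|
| 500 | 0.2 | 0.1694 / 0.0022 | 0.1706 / 0.0025 | 0.1209 / 0.0026 |
| 500 | 0.5 | 0.1603 / 0.0023 | 0.1664 / 0.0047 | 0.1139 / 0.0026 |
| 500 | 0.8 | 0.1287 / 0.0024 | 0.1258 / 0.0024 | 0.0986 / 0.0015 |
| 500 | 1 | 0.1021 / 0.0026 | 0.1092 / 0.0038 | 0.0861 / 0.0024 |
| 1000 | 0.2 | 0.1857 / 0.0018 | 0.1968 / 0.0035 | 0.1305 / 0.0024 |
| 1000 | 0.5 | 0.1556 / 0.0016 | 0.1561 / 0.0021 | 0.1045 / 0.0016 |
| 1000 | 0.8 | 0.1356 / 0.0019 | 0.1353 / 0.0022 | 0.0987 / 0.0013 |
| 1000 | 1 | 0.0970 / 0.0023 | 0.1041 / 0.0036 | 0.1058 / 0.0016 |
| 3000 | 0.2 | 0.1803 / 0.0009 | 0.1781 / 0.0009 | 0.1245 / 0.0009 |
| 3000 | 0.5 | 0.1720 / 0.0009 | 0.1741 / 0.0011 | 0.1226 / 0.0010 |
| 3000 | 0.8 | 0.1362 / 0.0011 | 0.1369 / 0.0011 | 0.1115 / 0.0008 |
| 3000 | 1 | 0.0983 / 0.0012 | 0.0978 / 0.0012 | 0.0976 / 0.0009 |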
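The equating tables in this appendix compare mean, linear, and equipercentile equating. As a rough illustration of the first two, the sketch below converts a score on form X to the form Y scale. It assumes a simplified random-groups setting with two hypothetical score lists, not the anchor-item design used in the study, and it leaves out equipercentile equating, which needs the full score distributions. All function names and data are made up for illustration.

```python
from statistics import mean, pstdev

def mean_equate(x, scores_x, scores_y):
    """Mean equating: shift x by the difference between the two form means."""
    return x - mean(scores_x) + mean(scores_y)

def linear_equate(x, scores_x, scores_y):
    """Linear equating: match both the means and the standard deviations."""
    slope = pstdev(scores_y) / pstdev(scores_x)
    return mean(scores_y) + slope * (x - mean(scores_x))

# Hypothetical observed scores on two forms (assumed equivalent groups).
scores_x = [4, 6, 8, 10, 12]
scores_y = [7, 8, 9, 10, 11]

x = 10
print("mean equating:  ", mean_equate(x, scores_x, scores_y))
print("linear equating:", linear_equate(x, scores_x, scores_y))
```

The two methods agree at the mean of form X and diverge as the score moves away from it whenever the two forms differ in spread, which is one reason their RMSE values can differ.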

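Appendix Table 2-4. RMSE under the REG method when anchor items come from the same subscale

Cell entries are RMSE / STD.

| Examinees | Subscale correlation | Mean equating | Linear equating | Equipercentile equating |
|---|---|---|---|---|
| 500 | 0.2 | 0.1808 / 0.0023 | 0.1898 / 0.0043 | 0.1244 / 0.0031 |
| 500 | 0.5 | 0.1434 / 0.0019 | 0.1541 / 0.0033 | 0.1171 / 0.0025 |
| 500 | 0.8 | 0.1171 / 0.0026 | 0.1209 / 0.0042 | 0.0944 / 0.0019 |
| 500 | 1 | 0.0943 / 0.0029 | 0.0992 / 0.0041 | 0.0941 / 0.0020 |
| 1000 | 0.2 | 0.1849 / 0.0014 | 0.1921 / 0.0029 | 0.1274 / 0.0021 |
| 1000 | 0.5 | 0.1569 / 0.0015 | 0.1600 / 0.0026 | 0.1161 / 0.0016 |
| 1000 | 0.8 | 0.1200 / 0.0017 | 0.1257 / 0.0035 | 0.1022 / 0.0014 |
| 1000 | 1 | 0.0928 / 0.0020 | 0.1128 / 0.0043 | 0.0903 / 0.0013 |
| 3000 | 0.2 | 0.1732 / 0.0007 | 0.1814 / 0.0021 | 0.1279 / 0.0013 |
| 3000 | 0.5 | 0.1589 / 0.0009 | 0.1688 / 0.0017 | 0.1256 / 0.0013 |
| 3000 | 0.8 | 0.1343 / 0.0011 | 0.1340 / 0.0011 | 0.1179 / 0.0010 |
| 3000 | 1 | 0.0985 / 0.0012 | 0.0978 / 0.0015 | 0.1050 / 0.0008 |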

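Appendix Table 2-5. RMSE under the PC method when anchor items come from different subscales

Cell entries are RMSE / STD.

| Examinees | Subscale correlation | Mean equating | Linear equating | Equipercentile equating |
|---|---|---|---|---|
| 500 | 0.2 | 0.1924 / 0.0024 | 0.1932 / 0.0024 | 0.1144 / 0.0018 |
| 500 | 0.5 | 0.1804 / 0.0024 | 0.1858 / 0.0042 | 0.1148 / 0.0017 |
| 500 | 0.8 | 0.1520 / 0.0025 | 0.1494 / 0.0025 | 0.1167 / 0.0018 |
| 500 | 1 | 0.1302 / 0.0026 | 0.1380 / 0.0037 | 0.1088 / 0.0018 |
| 1000 | 0.2 | 0.2068 / 0.0019 | 0.2163 / 0.0032 | 0.1196 / 0.0014 |
| 1000 | 0.5 | 0.1746 / 0.0017 | 0.1750 / 0.0019 | 0.1055 / 0.0014 |
| 1000 | 0.8 | 0.1568 / 0.0019 | 0.1566 / 0.0021 | 0.1126 / 0.0014 |
| 1000 | 1 | 0.1261 / 0.0020 | 0.1307 / 0.0027 | 0.1316 / 0.0015 |
| 3000 | 0.2 | 0.1990 / 0.0009 | 0.1982 / 0.0009 | 0.1228 / 0.0009 |
| 3000 | 0.5 | 0.1904 / 0.0009 | 0.1918 / 0.0010 | 0.1203 / 0.0008 |
| 3000 | 0.8 | 0.1576 / 0.0011 | 0.1580 / 0.0011 | 0.1209 / 0.0008 |
| 3000 | 1 | 0.1274 / 0.0011 | 0.1271 / 0.0011 | 0.1221 / 0.0008 |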

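Appendix Table 2-6. RMSE under the PC method when anchor items come from the same subscale

Cell entries are RMSE / STD.

| Examinees | Subscale correlation | Mean equating | Linear equating | Equipercentile equating |
|---|---|---|---|---|
| 500 | 0.2 | 0.2017 / 0.0025 | 0.2085 / 0.0036 | 0.1128 / 0.0018 |
| 500 | 0.5 | 0.1693 / 0.0023 | 0.1782 / 0.0030 | 0.1120 / 0.0017 |
| 500 | 0.8 | 0.1466 / 0.0029 | 0.1499 / 0.0039 | 0.1107 / 0.0021 |
| 500 | 1 | 0.1217 / 0.0029 | 0.1260 / 0.0039 | 0.1150 / 0.0018 |
| 1000 | 0.2 | 0.2057 / 0.0015 | 0.2118 / 0.0026 | 0.1187 / 0.0012 |
| 1000 | 0.5 | 0.1780 / 0.0016 | 0.1806 / 0.0023 | 0.1191 / 0.0012 |
| 1000 | 0.8 | 0.1486 / 0.0018 | 0.1537 / 0.0033 | 0.1175 / 0.0012 |
| 1000 | 1 | 0.1173 / 0.0019 | 0.1381 / 0.0041 | 0.1132 / 0.0015 |
| 3000 | 0.2 | 0.1967 / 0.0008 | 0.2033 / 0.0018 | 0.1183 / 0.0006 |
| 3000 | 0.5 | 0.1805 / 0.0010 | 0.1893 / 0.0016 | 0.1255 / 0.0007 |
| 3000 | 0.8 | 0.1562 / 0.0011 | 0.1560 / 0.0012 | 0.1277 / 0.0010 |
| 3000 | 1 | 0.1271 / 0.0011 | 0.1267 / 0.0013 | 0.1291 / 0.0008 |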
