Chapter 5: Conclusions and Recommendations

Section 2: Suggestions for Future Research

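This study used simulated data to examine how three experimental variables affect the estimation of individual abilities and group parameters under a multidimensional IRT vertical equating design: test length (5 or 10 items per dimension within each item block), number of dimensions (2 or 4), and estimation method. This section offers several suggestions that address the limitations of this study, as a reference for future research.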

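1. The experimental variables in this study were limited to test length, number of dimensions, and estimation method. Future research could also examine the effect of examinee sample size.

2. This study adopted BIB vertical equating, but many other equating methods are used in current large-scale assessments. Future research could examine the effects and performance of different equating methods in multidimensional IRT vertical equating.

3. All equating in this study was estimated by concurrent calibration, but many other test linking methods exist. Future research could investigate other linking methods, such as linear transformation.

4. This study used only one software package, ConQuest 2.0, for estimation. Future research could compare estimation performance across different software packages.

5. The proportion of anchor items for vertical equating was fixed at 20% in this study. Future research could examine how different anchor proportions affect estimation in multidimensional IRT vertical equating.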


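Appendix 1

RMSE of individual ability estimates, two-dimension conditions

| Number of dimensions | Items per dimension per item block | Group | Dimension | PV | EAP | WLE |
|---|---|---|---|---|---|---|
| 2 | 5 | ALL | dim1 | 0.5466 | 0.4954 | 0.7702 |
| 2 | 5 | ALL | dim2 | 0.5472 | 0.4956 | 0.7723 |
| 2 | 5 | G6 | dim1 | 0.5420 | 0.4877 | 0.7805 |
| 2 | 5 | G6 | dim2 | 0.5442 | 0.4892 | 0.7992 |
| 2 | 5 | G5 | dim1 | 0.5441 | 0.4960 | 0.7510 |
| 2 | 5 | G5 | dim2 | 0.5437 | 0.4960 | 0.7479 |
| 2 | 5 | G4 | dim1 | 0.5536 | 0.5023 | 0.7787 |
| 2 | 5 | G4 | dim2 | 0.5537 | 0.5016 | 0.7689 |
| 2 | 10 | ALL | dim1 | 0.4412 | 0.4013 | 0.5678 |
| 2 | 10 | ALL | dim2 | 0.4409 | 0.4011 | 0.5688 |
| 2 | 10 | G6 | dim1 | 0.4427 | 0.4009 | 0.5814 |
| 2 | 10 | G6 | dim2 | 0.4445 | 0.4025 | 0.5900 |
| 2 | 10 | G5 | dim1 | 0.4316 | 0.3944 | 0.5373 |
| 2 | 10 | G5 | dim2 | 0.4314 | 0.3948 | 0.5381 |
| 2 | 10 | G4 | dim1 | 0.4491 | 0.4085 | 0.5834 |
| 2 | 10 | G4 | dim2 | 0.4465 | 0.4060 | 0.5769 |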
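The RMSE formula used in the thesis is defined outside this section. As a minimal sketch, assuming the conventional definition (the square root of the mean squared difference between estimated and true abilities), a single RMSE value for one dimension and one estimation method could be computed as follows. The function name `rmse` and the example values are hypothetical and are not taken from the study's simulated data; the thesis may additionally average over replications.

```python
import math


def rmse(estimates, true_values):
    """Root mean squared error between estimated and true values.

    Assumes the conventional definition; not necessarily the exact
    formula used in the thesis.
    """
    if len(estimates) != len(true_values) or not estimates:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(e - t) ** 2 for e, t in zip(estimates, true_values)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical true abilities and ability estimates for four simulated examinees
true_theta = [-1.0, 0.0, 0.5, 1.5]
est_theta = [-0.5, 0.0, 0.5, 1.0]

print(round(rmse(est_theta, true_theta), 4))  # prints 0.3536
```

Under this definition, smaller values indicate estimates that lie closer to the generating abilities, which is how the PV, EAP, and WLE columns in the table can be compared.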

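Appendix 3

RMSE of group ability means, two-dimension conditions

| Number of dimensions | Items per dimension per item block | Group | Dimension | PV | EAP | WLE | PVW |
|---|---|---|---|---|---|---|---|
| 2 | 5 | ALL | dim1 | 0.0079 | 0.0071 | 0.0052 | 0.0079 |
| 2 | 5 | ALL | dim2 | 0.0054 | 0.0053 | 0.0049 | 0.0054 |
| 2 | 5 | G6 | dim1 | 0.0088 | 0.0092 | 0.0142 | 0.0088 |
| 2 | 5 | G6 | dim2 | 0.0093 | 0.0093 | 0.0151 | 0.0093 |
| 2 | 5 | G5 | dim1 | 0.0075 | 0.0080 | 0.0082 | 0.0075 |
| 2 | 5 | G5 | dim2 | 0.0067 | 0.0064 | 0.0074 | 0.0067 |
| 2 | 5 | G4 | dim1 | 0.0083 | 0.0070 | 0.0143 | 0.0083 |
| 2 | 5 | G4 | dim2 | 0.0067 | 0.0067 | 0.0115 | 0.0067 |
| 2 | 10 | ALL | dim1 | 0.0059 | 0.0056 | 0.0054 | 0.0059 |
| 2 | 10 | ALL | dim2 | 0.0058 | 0.0054 | 0.0046 | 0.0058 |
| 2 | 10 | G6 | dim1 | 0.0064 | 0.0060 | 0.0041 | 0.0064 |
| 2 | 10 | G6 | dim2 | 0.0061 | 0.0058 | 0.0026 | 0.0061 |
| 2 | 10 | G5 | dim1 | 0.0069 | 0.0071 | 0.0078 | 0.0069 |
| 2 | 10 | G5 | dim2 | 0.0064 | 0.0055 | 0.0057 | 0.0064 |
| 2 | 10 | G4 | dim1 | 0.0058 | 0.0065 | 0.0066 | 0.0058 |
| 2 | 10 | G4 | dim2 | 0.0046 | 0.0040 | 0.0084 | 0.0046 |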

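Appendix 5

RMSE of group ability standard deviations, two-dimension conditions

| Number of dimensions | Items per dimension per item block | Group | Dimension | PV | EAP | WLE | PVW |
|---|---|---|---|---|---|---|---|
| 2 | 5 | ALL | dim1 | 0.0240 | 0.1056 | 0.2107 | 0.0788 |
| 2 | 5 | ALL | dim2 | 0.0270 | 0.1028 | 0.2111 | 0.0761 |
| 2 | 5 | G6 | dim1 | 0.0389 | 0.0974 | 0.2253 | 0.0687 |
| 2 | 5 | G6 | dim2 | 0.0417 | 0.0955 | 0.2326 | 0.0668 |
| 2 | 5 | G5 | dim1 | 0.0190 | 0.1546 | 0.2382 | 0.1251 |
| 2 | 5 | G5 | dim2 | 0.0183 | 0.1531 | 0.2370 | 0.1246 |
| 2 | 5 | G4 | dim1 | 0.0192 | 0.1257 | 0.2280 | 0.0960 |
| 2 | 5 | G4 | dim2 | 0.0224 | 0.1229 | 0.2181 | 0.0934 |
| 2 | 10 | ALL | dim1 | 0.0105 | 0.0713 | 0.1318 | 0.0549 |
| 2 | 10 | ALL | dim2 | 0.0112 | 0.0708 | 0.1319 | 0.0541 |
| 2 | 10 | G6 | dim1 | 0.0197 | 0.0655 | 0.1373 | 0.0487 |
| 2 | 10 | G6 | dim2 | 0.0207 | 0.0651 | 0.1424 | 0.0475 |
| 2 | 10 | G5 | dim1 | 0.0213 | 0.1045 | 0.1327 | 0.0864 |
| 2 | 10 | G5 | dim2 | 0.0208 | 0.1028 | 0.1346 | 0.0855 |
| 2 | 10 | G4 | dim1 | 0.0093 | 0.0847 | 0.1489 | 0.0660 |
| 2 | 10 | G4 | dim2 | 0.0103 | 0.0829 | 0.1427 | 0.0641 |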

Source: 以可能值方法為基礎之多向度垂直等化之探究 (A Study of Multidimensional Vertical Equating Based on the Plausible Values Method), pp. 59-71.
