Chapter 5: Conclusions and Recommendations

Section 2: Suggestions for Improvement

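In equating tests across different years, this simulation study varied four factors: three examinee sample sizes (5,460, 7,540, and 10,062); three proportions of anchor items (10%, 20%, and 30%); two test booklet lengths (30 and 60 items); and three levels of difference between the examinee ability distributions of the two years (no difference, small difference, and large difference). The different scaling procedures were compared under these conditions. The following suggestions address the limitations of this study and are offered for the reference of future researchers.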
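The fully crossed design described above can be enumerated with a minimal sketch. The `N_A_T` code format follows the note under the appendix tables; the constant names, the function name `enumerate_conditions`, and the ability-gap labels are illustrative assumptions, not part of the original study materials.

```python
from itertools import product

# Factor levels taken from the text above.
SAMPLE_SIZES = (5460, 7540, 10062)         # number of examinees (N)
ANCHOR_PERCENTS = (10, 20, 30)             # percentage of anchor items (A)
TEST_LENGTHS = (30, 60)                    # items per booklet (T)
ABILITY_GAPS = ("none", "small", "large")  # hypothetical labels for the three gap levels


def enumerate_conditions():
    """Return one dict per simulation condition, with its N_A_T design code."""
    conditions = []
    for gap, anchor, length, n in product(
        ABILITY_GAPS, ANCHOR_PERCENTS, TEST_LENGTHS, SAMPLE_SIZES
    ):
        conditions.append({"code": f"{n}_{anchor}_{length}", "ability_gap": gap})
    return conditions


conditions = enumerate_conditions()
print(len(conditions))  # 3 * 3 * 2 * 3 = 54 conditions
```

Each ability-gap level contains the same 18 design codes, which matches the 18 rows in each appendix table.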

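1. This study considered only normally distributed examinee ability and a single distribution of item parameters. Future studies could compare equating performance under different parameter distributions.

2. This study simulated only dichotomously scored response patterns. Future studies could examine the equating performance of the different scaling procedures under polytomous scoring.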
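To make the two limitations concrete, the following is a minimal sketch of how dichotomous responses might be simulated. It assumes a three-parameter logistic (3PL) model with scaling constant 1.7, which this section does not itself specify; the item parameter distributions, the function name `simulate_dichotomous`, and the `draw_ability` hook are hypothetical choices for illustration only.

```python
import math
import random


def simulate_dichotomous(n_examinees, n_items, draw_ability=None, seed=0):
    """Simulate a 0/1 response matrix under an assumed 3PL model."""
    rng = random.Random(seed)
    if draw_ability is None:
        # Default: standard normal ability, the only case the study considered.
        draw_ability = lambda r: r.gauss(0.0, 1.0)

    # Hypothetical item parameter distributions (not taken from the study).
    items = [
        (
            rng.lognormvariate(0.0, 0.3),  # discrimination a
            rng.gauss(0.0, 1.0),           # difficulty b
            rng.uniform(0.10, 0.25),       # guessing c
        )
        for _ in range(n_items)
    ]

    responses = []
    for _ in range(n_examinees):
        theta = draw_ability(rng)
        row = []
        for a, b, c in items:
            # 3PL probability of a correct response.
            p = c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))
            row.append(1 if rng.random() < p else 0)
        responses.append(row)
    return responses


responses = simulate_dichotomous(n_examinees=20, n_items=5)
```

Passing a different `draw_ability` function (for example, one that samples from a skewed distribution) would address the first suggestion; extending the scoring to more than two categories would require a polytomous model, which this sketch does not cover.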


Appendix 1: Estimation Error When Examinee Ability Does Not Differ Between the Two Years

RMSE by scaling method

Design        PISA scaling  Fixed item parameters  N&T scaling  Concurrent estimation
5460_10_30    0.4602        0.4356                 0.4214       0.4191
7540_10_30    0.4557        0.4344                 0.4209       0.4185
10062_10_30   0.4551        0.4315                 0.4207       0.4175
5460_10_60    0.4201        0.3898                 0.3337       0.3383
7540_10_60    0.4230        0.3958                 0.3312       0.3358
10062_10_60   0.4100        0.3894                 0.3278       0.3289
5460_20_30    0.4250        0.4088                 0.3988       0.3980
7540_20_30    0.4193        0.4045                 0.3965       0.3969
10062_20_30   0.4159        0.4042                 0.3933       0.3950
5460_20_60    0.3417        0.3541                 0.3074       0.3141
7540_20_60    0.3394        0.3532                 0.3065       0.3119
10062_20_60   0.3338        0.3484                 0.3064       0.3096
5460_30_30    0.4014        0.3859                 0.3769       0.3775
7540_30_30    0.3972        0.3861                 0.3765       0.3760
10062_30_30   0.3916        0.3835                 0.3779       0.3751
5460_30_60    0.3139        0.3315                 0.2859       0.2935
7540_30_60    0.3125        0.3294                 0.2848       0.2911
10062_30_60   0.3084        0.3265                 0.2848       0.2887

Note: Design codes take the form N_A_T, where N is the number of examinees, A is the percentage of anchor items, and T is the number of items per booklet.
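For readers unfamiliar with the error index reported in these appendices, the following is a minimal sketch of the standard root mean square error (RMSE) computation. It assumes the index compares true (generating) values with their estimates across simulated cases; this excerpt does not state which quantity was evaluated, and the function name `rmse` is illustrative.

```python
import math


def rmse(true_values, estimates):
    """Root mean square error between true values and their estimates."""
    if len(true_values) != len(estimates) or not true_values:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(est - true) ** 2 for true, est in zip(true_values, estimates)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy values for illustration only (not data from the study).
example = rmse([0.0, 0.0], [3.0, 4.0])
```

Smaller RMSE values indicate that the estimates lie closer to the generating values on the common scale.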

Appendix 2: Estimation Error When Examinee Ability Differs Slightly Between the Two Years

RMSE by scaling method

Design        PISA scaling  Fixed item parameters  N&T scaling  Concurrent estimation
5460_10_30    0.4520        0.4290                 0.4262       0.4183
7540_10_30    0.4452        0.4268                 0.4263       0.4146
10062_10_30   0.4453        0.4260                 0.4249       0.4121
5460_10_60    0.4072        0.3833                 0.3333       0.3347
7540_10_60    0.3818        0.3810                 0.3311       0.3273
10062_10_60   0.3857        0.3792                 0.3320       0.3252
5460_20_30    0.4131        0.4023                 0.4035       0.3944
7540_20_30    0.4039        0.3997                 0.4041       0.3940
10062_20_30   0.4034        0.3964                 0.4014       0.3894
5460_20_60    0.3296        0.3472                 0.3133       0.3119
7540_20_60    0.3270        0.3443                 0.3134       0.3100
10062_20_60   0.3267        0.3434                 0.3150       0.3079
5460_30_30    0.3924        0.3837                 0.3836       0.3751
7540_30_30    0.3821        0.3782                 0.3818       0.3699
10062_30_30   0.3843        0.3786                 0.3835       0.3713
5460_30_60    0.3083        0.3287                 0.2956       0.2942
7540_30_60    0.2978        0.3245                 0.2958       0.2879
10062_30_60   0.2935        0.3207                 0.2964       0.2890

Note: Design codes take the form N_A_T, where N is the number of examinees, A is the percentage of anchor items, and T is the number of items per booklet.

Appendix 3: Estimation Error When Examinee Ability Differs Greatly Between the Two Years

RMSE by scaling method

Design        PISA scaling  Fixed item parameters  N&T scaling  Concurrent estimation
5460_10_30    0.4210        0.4068                 0.6259       0.4215
7540_10_30    0.4158        0.4046                 0.6411       0.4327
10062_10_30   0.4171        0.4084                 0.6461       0.4306
5460_10_60    0.3534        0.3436                 0.5891       0.3645
7540_10_60    0.3355        0.3417                 0.5928       0.3795
10062_10_60   0.3277        0.3364                 0.5954       0.3807
5460_20_30    0.3912        0.3843                 0.6362       0.4221
7540_20_30    0.3834        0.3818                 0.6382       0.4135
10062_20_30   0.3813        0.3809                 0.6410       0.4279
5460_20_60    0.3257        0.3422                 0.6072       0.3734
7540_20_60    0.3066        0.3385                 0.6000       0.3629
10062_20_60   0.3088        0.3413                 0.6054       0.3792
5460_30_30    0.3988        0.3899                 0.6407       0.4231
7540_30_30    0.3913        0.3879                 0.6413       0.4240
10062_30_30   0.3796        0.3861                 0.6410       0.4195
5460_30_60    0.3309        0.3501                 0.6036       0.3682
7540_30_60    0.3172        0.3499                 0.6011       0.3571
10062_30_60   0.3111        0.3488                 0.6043       0.3604

Note: Design codes take the form N_A_T, where N is the number of examinees, A is the percentage of anchor items, and T is the number of items per booklet.
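As an illustration of how the appendix tables can be read, the following sketch picks the scaling method with the smallest RMSE for a few conditions. The three rows are copied from the large-difference table (Appendix 3); the English method labels, the dictionary layout, and the function name `best_method` are assumptions made for this example.

```python
# Column order follows the appendix tables.
METHODS = ("PISA scaling", "Fixed item parameters", "N&T scaling", "Concurrent estimation")

# Three rows copied from the large-difference table (Appendix 3).
LARGE_GAP_RMSE = {
    "5460_10_30": (0.4210, 0.4068, 0.6259, 0.4215),
    "5460_10_60": (0.3534, 0.3436, 0.5891, 0.3645),
    "5460_30_60": (0.3309, 0.3501, 0.6036, 0.3682),
}


def best_method(rmse_row):
    """Return the (method, rmse) pair with the smallest RMSE in one table row."""
    return min(zip(METHODS, rmse_row), key=lambda pair: pair[1])


for code, row in LARGE_GAP_RMSE.items():
    method, value = best_method(row)
    print(f"{code}: {method} ({value:.4f})")
```

The same lookup can be applied to any row of the three tables; it only ranks the reported values and says nothing about whether the differences are statistically meaningful.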

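Source document: 大型測驗不同量尺化程序之等化效果探究 (An Investigation of the Equating Performance of Different Scaling Procedures in Large-Scale Tests), pp. 57-64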
