Section 2 Limitations and Suggestions

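The following suggestions address aspects that this study left incomplete and are offered for the reference of future researchers.

1. When difficulty parameters are estimated with MMLE/EM-MIX, the relationship between estimation error and both sample size and test length is inconsistent in some conditions. Applying MMLE/EM-MIX to large-scale tests may therefore be ineffective, and future studies could investigate the cause and possible improvements.

2. When difficulty parameters are estimated with MMLE/EM-MIX, the estimation error depends on λ. Future studies could investigate the optimal value of λ under different ability distributions.

3. This study did not examine how BILOG-MG sets initial parameter values or how its acceleration mechanism works. Future studies could examine their effect on estimation error and use the findings to improve MMLE/EM-MIX.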

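References

Chinese-language sources

王暄博 (2006). A comparison of horizontal and vertical equating effects under BIB and NEAT designs. Unpublished master's thesis, Graduate Institute of Educational Measurement and Statistics, National Taichung University of Education.

王雅苓 (1999). Application and analysis of kernel smoothing in IRT true-score equating. Master's thesis, Graduate Institute of Mathematics, National Changhua University of Education.

吳慧泯 (2001). A study of option characteristic curves: An estimation approach based on kernel smoothing. Master's thesis, Graduate Institute of Educational Measurement and Statistics, National Taichung Teachers College.

陳煥文 (2004). A study of linking properties in vertical equating: A comparison of four linking methods. National Science Council research project.

黃志傑 (2004). The effect of anchor item distribution on test equating. Unpublished master's thesis, Graduate Institute of Educational Measurement and Statistics, National Taichung Teachers College.

黃美芳 (2006). An investigation of equating effects under the three-parameter item response theory model. Unpublished master's thesis, Graduate Institute of Educational Measurement and Statistics, National Taichung University of Education.

楊孟麗、譚康榮、黃敏雄 (2003). Psychometric report: TEPS 2001 analytical ability test. 台灣長期追蹤資料庫.

趙素珍 (1998). A comparison of the estimation precision of IRT software. Unpublished master's thesis, Graduate Institute of Elementary Education, National Taichung Teachers College.

劉湘川 (2001a). A correlation-weighted kernel-smoothing nonparametric estimation method for item option characteristic curves and its IORS integrated model. 第五屆華人社會心理與教育測驗學術研討會, C5.1, p. 110. Taipei: Chinese Association of Psychological Testing and National Taiwan Normal University.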

English-language sources

Ban, J. C., Hanson, B. A., Yi, Q., & Harris, D. J. (2001). Data Sparseness and Online Pretest Item Calibration/Scaling Methods in CAT. Annual Meeting of the American Educational Research Association.

Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's ability. Statistical theories of mental test scores. London: Wesley Publishing Company.

Bock, R. D., & Lieberman, M. (1970). Fitting a response model for n dichotomously scored items. Psychometrika, 35, 179-197.

Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443-459.

Baker, F. B. (2004). Item Response Theory: Parameter estimation techniques. New York: Marcel Dekker.

Bowman, A. W., & Azzalini, A. (1997). Applied Smoothing Techniques for Data Analysis. Oxford University Press.

Donoghue, J. R., & Isham, S. P. (1996). Comparing the Effectiveness of Procedures to Detect Item Parameter Drift. Educational Testing Service.

DeMars, C. E. (2005). "Guessing" Parameter Estimates for Multidimensional IRT Models. American Educational Research Association.

Gao, F., & Lisue, C. (2005). Bayesian or Non-Bayesian: A Comparison Study of Item Parameter Estimation in the Three-Parameter Logistic Model. Applied Measurement in Education, 18(4), 351-380.

Glas, C. A. W., & Hendrawan, I. (2005). Testing Linear Models for Ability

Glas, C. A. W., & van der Linden, W. J. (2006). Modeling Variability in Item Parameters in Educational Measurement. Law School Admission Council Computerized Testing Report 01-07.

Gasser, T., & Muller, H. G. (1979). Kernel estimation of regression functions. In Smoothing Techniques for Curve Estimation. Springer-Verlag.

Haebara, T. (1980). Equating logistic ability scales by a weighted least squares method. Japanese Psychological Research, 22, 144-149.

Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Boston: Kluwer-Nijhoff.

Hanson, B. A., & Béguin, A. A. (2002). Obtaining a Common Scale for Item Response Theory Item Parameters Using Separate Versus Concurrent Estimation in the Common-Item Equating Design. Applied Psychological Measurement, 26, 3-24.

Jones, D. H., & Nediak, M. (2000). Item Parameter Calibration of LSAT Items Using MCMC Approximation of Bayes Posterior Distributions. Law School Admission Council Computerized Testing Report 00-05.

Kolen, M. J., & Brennan, R. L. (1995). Test equating: Methods and practices. New York: Springer-Verlag.

Lindley, D. V. (1971). Bayesian statistics: A review. Philadelphia: Society for Industrial and Applied Mathematics.

Lord, F. M. (1980). Applications of item response theory to practical testing

Mislevy, R. J., & Bock, R. D. (1989). BILOG 3: Item analysis and test scoring with binary logistic models. Mooresville, IN: Scientific Software.

Mislevy, R. J., & Stocking, M. L. (1989). A consumer's guide to LOGIST and BILOG. Applied Psychological Measurement, 13, 57-75.

Mislevy, R. J., & Bock, R. D. (1982). Implementation of the EM algorithm in the estimation of item parameters: The BILOG computer program. Item Response Theory and Computerized Adaptive Testing Conference Proceedings.

Muraki, E., & Bock, R. D. (1996). PARSCALE: IRT-based test scoring and item analysis for graded open-ended exercises and performance tasks. Chicago: Scientific Software.

Nadaraya, E. A. (1964). On Estimating Regression. Theory Probability Application, 10, 186-90.

Priestley, M. B., & Chao, M. T. (1972). Nonparametric function fitting. J. Roy. Statist. Soc. Ser. B, 34, 385-392.

Ramsay, J. O. (1991). Kernel smoothing approaches to nonparametric item characteristic curve estimation. Psychometrika, 56, 611-630.

Stocking, M. L., & Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7(2), 201-211.

Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. London: Chapman & Hall.

Swaminathan, H., & Gifford, J. A. (1982). Bayesian estimation in the Rasch model. Journal of Educational Statistics, 7, 175-191.

Tang, K. L., & Eignor, D. R. (2001). A Study of the Use of Collateral Statistical Information in Attempting to Reduce TOEFL IRT Item Parameter Estimation Sample Sizes. TOEFL Technical Report 17.

Vale, C. D. (1986). Linking item parameters onto a common scale. Applied Psychological Measurement, 10(4), 333-344.

Watson, G. S. (1964). Smooth Regression Analysis. Sankhya: The Indian Journal of Statistics, Series A, 26, 359-372.

Wood, R. L., Wingersky, M. S., & Lord, F. M. (1976). LOGIST: A computer program for estimating examinee ability and item characteristic curve parameters. Princeton, NJ: Educational Testing Service.

Wingersky, M. S., & Lord, F. M. (1984). An investigation of methods for reducing sampling error in certain IRT procedures. Applied Psychological Measurement, 8, 347-364.

Wolfgang, H., & Marlene, M. (2004). Nonparametric and Semiparametric Models. Heidelberg, New York.

Yamamoto, K. (1995). Estimating the Effects of Test Length and Test Time on Parameter Estimation Using the HYBRID Model. TOEFL Technical Report 10.

Yao, L., Patz, R. J., & Hanson, B. A. (2002). More Efficient Markov Chain Monte Carlo Estimation in IRT Using Marginal Posteriors. National Council on Measurement in Education.

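Master's Thesis, Graduate Institute of Educational Measurement and Statistics, National Taichung University of Education

Advisor: Dr. 郭伯臣

Application of an MMLE Method Incorporating Kernel Smoothing to IRT Parameter Estimation

Graduate student: 張雅媛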

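Abstract (Chinese)

When BILOG-MG applies marginal maximum likelihood estimation with the EM algorithm (MMLE/EM) to estimate item parameters, it has to estimate the probability density function of ability numerically, and it does so with a histogram. This study instead estimates the ability density nonparametrically by kernel smoothing, with the aim of overcoming the problems of histogram estimation and improving estimation precision.

To that end, this study developed a program implementing a marginal maximum likelihood estimation method based on kernel smoothing (abbreviated MMLE/EM-MIX) and compared its estimation precision with that of BILOG-MG under different ability distributions.

In Experiment 1, for ability parameters and across ability distributions, MMLE/EM-MIX (λ = 0) generally gave the smaller estimation error when the test had 60 items, whereas BILOG-MG generally gave the smallest estimation error when the test had 30 items. For item parameters, MMLE/EM-MIX (λ = 0) gave the smallest estimation error with smaller samples, and BILOG-MG gave the smallest estimation error with larger samples.

In Experiment 2, regardless of the examinees' ability distribution, the estimation errors of MMLE/EM-MIX for both ability and item parameters were generally smaller than those of BILOG-MG, although the effect varied across parameters and conditions depending on the value of λ.

Keywords: marginal maximum likelihood estimation, Bayesian expected a posteriori (EAP) estimation, kernel smoothing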

Abstract

This paper proposes a modified version of MMLE/EM (Bock & Aitkin, 1981). A simulation study shows that BILOG-MG (MMLE with EAP) performs poorly when the incidental parameters are not normally distributed. The proposed algorithm makes two modifications. First, kernel density estimation is applied in the E-step to estimate the distribution of the incidental parameters. Second, kernel density estimation is applied in the M-step to estimate the structural parameters and, with EAP, the incidental parameters. The ability and item parameters are then estimated iteratively.

A simulation experiment based on the three-parameter logistic model compares the performance of BILOG-MG and the proposed algorithm. Three types of distribution for the incidental parameters (normal, bimodal, and skewed) are considered, and three values of λ, the weight given to the kernel method, are tried. Root mean square error (RMSE) is used to evaluate both methods. The results show that, under most conditions, the RMSEs of both the ability and item parameters are smaller for the proposed algorithm than for BILOG-MG.

Keywords: MMLE/EM, EAP, kernel smoothing

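Chapter 1 Introduction
  Section 1 Research Motivation
  Section 2 Research Purpose
  Section 3 Research Questions
Chapter 2 Literature Review
  Section 1 Joint Maximum Likelihood Estimation
  Section 2 Marginal Maximum Likelihood Estimation
  Section 3 Bayesian Estimation
  Section 4 Parameter Estimation in BILOG-MG
Chapter 3 Research Method
  Section 1 Shortcomings of MMLE/EM Estimation
  Section 2 Kernel Smoothing
  Section 3 Bayesian Estimation Based on Kernel Smoothing
  Section 4 Research Design
Chapter 4 Results
  Section 1 Results of Experiment 1
  Section 2 Results of Experiment 2
  Section 3 Comparison of Experimental Results
Chapter 5 Conclusions and Suggestions
  Section 1 Conclusions
  Section 2 Limitations and Suggestions
References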

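Chapter 1 Introduction

Item response theory (IRT) is the main framework of modern test theory. It is widely applied by measurement researchers in Taiwan and abroad and is the mainstream approach in the field. Because the true values of examinees' ability parameters and of item parameters cannot be known in practice, the accuracy of estimated parameters cannot be judged directly. Estimation methods are therefore compared by computer simulation: true parameter values are generated, response data are simulated from them, parameters are estimated from the simulated responses, and the error between the estimates and the true values serves as the index for judging the methods.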
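The simulation loop described above can be sketched in a few lines. The following is a minimal illustration, not the thesis program: the function names `simulate_3pl` and `rmse`, the sample sizes, and the parameter ranges are all assumptions chosen for the example, and the logistic function is written without the scaling constant D = 1.7 that some software applies.

```python
import numpy as np

def simulate_3pl(theta, a, b, c, rng):
    """Draw 0/1 responses under the three-parameter logistic model."""
    # P(correct) = c + (1 - c) / (1 + exp(-a * (theta - b)))
    z = a[None, :] * (theta[:, None] - b[None, :])
    p = c[None, :] + (1.0 - c[None, :]) / (1.0 + np.exp(-z))
    # One uniform draw per person-item pair decides each response.
    return (rng.random(p.shape) < p).astype(int)

def rmse(estimate, truth):
    """Root mean square error between estimates and generating values."""
    estimate = np.asarray(estimate, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean((estimate - truth) ** 2)))

rng = np.random.default_rng(0)
n_persons, n_items = 500, 30                 # illustrative sizes only
theta_true = rng.standard_normal(n_persons)  # true abilities
a_true = rng.uniform(0.5, 2.0, n_items)      # discrimination
b_true = rng.uniform(-2.0, 2.0, n_items)     # difficulty
c_true = rng.uniform(0.10, 0.25, n_items)    # guessing
responses = simulate_3pl(theta_true, a_true, b_true, c_true, rng)

# Stand-in for the output of an estimation program (hypothetical values).
b_hat = b_true + rng.normal(0.0, 0.1, n_items)
print("RMSE of difficulty estimates:", rmse(b_hat, b_true))
```

In an actual study, `b_hat` would come from the estimation method under evaluation, and the RMSE would be averaged over replications of the simulated data.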

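Several IRT software packages are available, for example BILOG-MG, ICCNP, MULTILOG (Thissen, 1991), and PARSCALE (Muraki & Bock, 1996). They differ in the models they support and in their parameter estimation methods. 趙素珍 (1998) used both real and simulated data, generating dichotomously scored data under the three-parameter logistic model, to test the practical use and estimation precision of these four packages, and reported that BILOG-MG gave the most accurate and stable estimates of both item and ability parameters.

However, when BILOG-MG applies marginal maximum likelihood estimation with the EM algorithm (MMLE/EM) to estimate item parameters (Mislevy & Bock, 1989), it estimates the probability density function of ability with a histogram, and in practice the bin width and the origin of a histogram are difficult to choose. A more precise density estimator is therefore expected to improve the overall estimation precision.

The Canadian psychometrician Ramsay (1991) combined high-low item discrimination indices with nonparametric kernel smoothing to develop a kernel-smoothing nonparametric method for estimating item characteristic curves that can analyze both the correct option and the distractors (kernel smoothing approaches to nonparametric…

This chapter has three sections: Section 1 presents the research motivation, Section 2 the research purpose, and Section 3 the research questions.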

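Section 1 Research Motivation

When BILOG-MG applies marginal maximum likelihood estimation with the EM algorithm (MMLE/EM) to estimate item parameters (Mislevy & Bock, 1989), it estimates the probability density function of ability with a histogram. The histogram is convenient and can also estimate the density correctly when ability is not normally distributed, but in practice its bin width and origin are difficult to choose.

This study therefore uses nonparametric kernel smoothing to estimate the probability density function of ability, with the aim of overcoming the problems of histogram estimation and improving estimation precision.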
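The contrast between the two density estimators can be illustrated with a small sketch. This is not the thesis code: the bimodal sample, the function names `histogram_density` and `gaussian_kde`, the Gaussian kernel, and the normal-reference bandwidth rule (see Silverman, 1986) are assumptions made for the example. It shows that a histogram estimate changes when only the bin origin is shifted, whereas the kernel estimate has no origin to choose.

```python
import numpy as np

def histogram_density(sample, grid, bin_width, origin):
    """Histogram density estimate evaluated at each grid point."""
    sample = np.asarray(sample, dtype=float)
    # Bin index of every grid point and of every observation.
    grid_bins = np.floor((grid - origin) / bin_width)
    sample_bins = np.floor((sample - origin) / bin_width)
    counts = np.array([(sample_bins == k).sum() for k in grid_bins])
    return counts / (sample.size * bin_width)

def gaussian_kde(sample, grid, bandwidth):
    """Gaussian kernel density estimate evaluated at each grid point."""
    sample = np.asarray(sample, dtype=float)
    u = (grid[:, None] - sample[None, :]) / bandwidth
    kernel = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.mean(axis=1) / bandwidth

rng = np.random.default_rng(1)
# Bimodal ability sample (an illustrative mixture, not the thesis design).
theta = np.concatenate([rng.normal(-1.5, 0.5, 300),
                        rng.normal(1.5, 0.5, 300)])
grid = np.linspace(-4.0, 4.0, 81)

# Normal-reference bandwidth rule: h = 1.06 * sd * n^(-1/5).
h = 1.06 * theta.std(ddof=1) * theta.size ** (-1 / 5)
kde = gaussian_kde(theta, grid, h)

# Same data and bin width, two different origins.
hist_a = histogram_density(theta, grid, bin_width=0.5, origin=-4.0)
hist_b = histogram_density(theta, grid, bin_width=0.5, origin=-4.25)
print("largest change from shifting the origin:",
      np.abs(hist_a - hist_b).max())
```

The kernel estimate still requires a bandwidth, so the choice problem does not disappear entirely; it is reduced from two tuning quantities (bin width and origin) to one.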

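To examine whether the proposed marginal maximum likelihood estimation method based on kernel smoothing (abbreviated MMLE/EM-MIX) is feasible, this study compares the precision of its estimates with that of BILOG-MG.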

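Section 2 Research Purpose

This study compares the estimation precision of MMLE/EM-MIX and BILOG-MG under different sample sizes, test lengths, and ability distributions.

The purposes of the study are as follows:

1. To understand how well BILOG-MG estimates parameters under different conditions.

2. To develop a new parameter estimation method, MMLE/EM-MIX.

3. To understand how well MMLE/EM-MIX estimates parameters under different conditions.

4. To compare the parameter estimation performance of MMLE/EM-MIX and BILOG-MG under different conditions.

Section 3 Research Questions

This study addresses three questions:

1. How does each factor affect parameter estimation by BILOG-MG?

2. How does each factor affect parameter estimation by MMLE/EM-MIX?

3. Which of MMLE/EM-MIX and BILOG-MG estimates parameters better under each condition?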

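Chapter 2 Literature Review

Section 1 of this chapter introduces joint maximum likelihood estimation, Section 2 marginal maximum likelihood estimation, Section 3 Bayesian estimation, and Section 4 the parameter estimation methods of BILOG-MG.

For convenience, the symbols used in this chapter are defined before the review:

N: number of examinees (subjects)
n: test length
ξ: item parameters
τ: hyperparameters governing the distribution of the ability parameters
η: hyperparameters governing the distribution of the item parameters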

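Birnbaum (1968) proposed joint maximum likelihood estimation (JMLE). Its main feature is that ability parameters and item parameters are estimated jointly by iteration in two stages: the first stage estimates the item parameters, and the second stage estimates the ability parameters.

In the Fisher scoring method, the second derivatives are replaced by their expected values.

The procedure gives rise to an identification problem: the scale drifts during estimation, so the ability values must be standardized after each estimation cycle.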
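The identification problem can be made concrete with a short sketch. The code below is an illustration under assumptions, not the thesis procedure: the function names `p_3pl` and `standardize_scale` are hypothetical, and the matching transformation of the item parameters is the usual linear rescaling rather than something stated in the text. It shows that standardizing the abilities and rescaling the item parameters accordingly leaves every response probability unchanged, which is why the data alone cannot fix the scale.

```python
import numpy as np

def p_3pl(theta, a, b, c):
    """Response probabilities under the three-parameter logistic model."""
    z = a[None, :] * (theta[:, None] - b[None, :])
    return c[None, :] + (1.0 - c[None, :]) / (1.0 + np.exp(-z))

def standardize_scale(theta, a, b):
    """Put abilities on a mean-0, sd-1 scale and rescale items to match."""
    m, s = theta.mean(), theta.std()
    theta_std = (theta - m) / s   # standardized abilities
    a_std = a * s                 # discrimination absorbs the scale factor
    b_std = (b - m) / s           # difficulty moves with the ability scale
    return theta_std, a_std, b_std

rng = np.random.default_rng(2)
theta = rng.normal(0.7, 1.3, 200)   # abilities on a drifted scale
a = rng.uniform(0.5, 2.0, 10)
b = rng.uniform(-2.0, 2.0, 10)
c = np.full(10, 0.2)

theta_s, a_s, b_s = standardize_scale(theta, a, b)
p_before = p_3pl(theta, a, b, c)
p_after = p_3pl(theta_s, a_s, b_s, c)
print("probabilities unchanged:", np.allclose(p_before, p_after))
```

The invariance follows from a·s·((θ − m)/s − (b − m)/s) = a(θ − b); the guessing parameter c is unaffected by the rescaling.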

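Section 2 Marginal Maximum Likelihood Estimation

JMLE gives rise to the Neyman-Scott problem: as the sample size grows, the number of ability parameters to be estimated grows with it, so estimation precision cannot be improved by increasing the sample size, and as a result the parameter…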
