An Analysis of Gender-Related Differential Item Functioning (DIF) in Sixth-Grade Mathematics Items and of Competence Indicator Attainment Rates

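Chapter 5 Conclusions and Recommendations

Section 2 Recommendations

I. Recommendations for Teachers

1. Emphasize Students' Foundational Concepts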

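The indicator with the lowest attainment rate in both groups of examinees (boys and girls) was N_3_10, "Recognize large units of measurement used in daily life, such as the thousand kilograms (metric ton, 公噸), the thousand liters (kiloliter, 公秉), the hundred square meters (are, 公畝), and the ten thousand square meters (hectare, 公頃)." If qualitative and quantitative item analyses confirm that the items measure this indicator effectively, teachers should consider whether students hold misconceptions in this area. In everyday life people seldom encounter large units of measurement, and students tend to rely on electronic calculators, so they neglect to learn what the units mean. In addition, students get too little practice converting between units, and teachers tend to rush through the topic, leaving students' foundational concepts weak. Frontline educators should reflect on this and reexamine how the topic is taught.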
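To make concrete what indicator N_3_10 asks students to relate, the sketch below (a hypothetical teaching aid, not part of the study) stores each large unit as a multiple of an everyday base unit and converts between units that share a base. It encodes 1 metric ton = 1,000 kg, 1 kiloliter = 1,000 L, 1 are = 100 m², and 1 hectare = 10,000 m²; the dictionary and function names are illustrative assumptions.

```python
# Hypothetical teaching aid (not from the study): the large units named in
# indicator N_3_10, each expressed as a multiple of an everyday base unit.
LARGE_UNITS = {
    "metric_ton": ("kg", 1_000),   # 公噸 = 1,000 kg
    "kiloliter": ("L", 1_000),     # 公秉 = 1,000 L
    "are": ("m2", 100),            # 公畝 = 100 m²
    "hectare": ("m2", 10_000),     # 公頃 = 10,000 m²
}


def to_base(amount: float, unit: str) -> tuple[float, str]:
    """Express an amount given in a large unit in its everyday base unit."""
    base, factor = LARGE_UNITS[unit]
    return amount * factor, base


def convert(amount: float, src: str, dst: str) -> float:
    """Convert between two large units that share the same base unit."""
    src_base, src_factor = LARGE_UNITS[src]
    dst_base, dst_factor = LARGE_UNITS[dst]
    if src_base != dst_base:
        # e.g. kiloliters (volume) cannot become ares (area)
        raise ValueError(f"cannot convert {src} ({src_base}) to {dst} ({dst_base})")
    return amount * src_factor / dst_factor


if __name__ == "__main__":
    print(to_base(2.5, "metric_ton"))    # (2500.0, 'kg')
    print(convert(3, "hectare", "are"))  # 300.0
```

Because a hectare equals 100 ares, `convert(3, "hectare", "are")` returns 300.0, the kind of multi-step unit relationship the paragraph suggests students rarely practice.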

2. Attend to Boys' and Girls' Performance on Different Competence Indicators

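Across the five content areas of the test (Numbers and Computation, Quantity and Measurement, Geometry, Statistics and Probability, and Algebra), girls' mean attainment rate was lower than boys' on 2 competence indicators, and boys' was lower than girls' on 1 indicator. Neither gender held a clear advantage; each had its own strengths.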

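Teachers should focus on the areas where students are weaker, strengthen foundational concepts and computational skills, correct students' errors promptly as they occur, and review test results afterward to improve their teaching methods. These steps are likely to bring substantial improvement for upper-grade students who dread mathematics.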

3. Attend to the Sequencing of Teaching Materials and the Construction of Test Items

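The gender-related DIF detected in the 2006 TASA mathematics items was, on the whole, within an acceptable range. However, this study's DIF detection relied solely on quantitative analysis. A qualitative analysis of item content is still needed to determine whether a flagged item reflects problems in curriculum sequencing or instruction, or bias in the item itself.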

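4. Add Items to Reduce Concerns about Inadequate Representativeness

This study found that 22 competence indicators were each measured by only a single item, which may make judgments of student performance on those indicators less objective. Adding a moderate number of items for these indicators would reduce concerns that the items are unrepresentative and allow a more complete examination of differences between boys' and girls' mathematics performance.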

II. Recommendations for Future Research

1. Test Items

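The allocation of content and item types in the sixth-grade mathematics test analyzed here followed mainly the relative weights of topics in the curriculum guidelines and current textbooks. Numbers and Computation carried the greatest weight, at 40% of all items, whereas Algebra accounted for only 8%. So few items may not represent students' abilities adequately. Future studies could distribute items more evenly across content areas, classified according to the sixth-grade competence indicators, in order to investigate the possible causes of gender DIF in mathematics.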

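2. Apply Additional DIF Methods

This study used SIBTEST, EZDIF, and SPSS for Windows 12.0 to detect DIF, but these represent only a few of the many available DIF methods. Future studies could apply other DIF methods and compare their results, so that biased items can be identified and then revised or removed to ensure test fairness.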
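One widely used alternative that such a comparison could include is the Mantel-Haenszel (MH) procedure. The minimal sketch below assumes dichotomously scored items, examinees matched on total score, and hypothetical counts (boys as the reference group and girls as the focal group, purely for illustration). It computes the MH common odds ratio and its ETS delta-scale transform, MH D-DIF = -2.35 ln(α_MH); it omits the MH chi-square significance test that a full analysis would also report.

```python
import math
from dataclasses import dataclass


@dataclass
class Stratum:
    """2x2 table of right/wrong counts at one matching-score level k."""
    ref_right: int  # A_k: reference group, answered correctly
    ref_wrong: int  # B_k: reference group, answered incorrectly
    foc_right: int  # C_k: focal group, answered correctly
    foc_wrong: int  # D_k: focal group, answered incorrectly


def mantel_haenszel(strata: list[Stratum]) -> tuple[float, float]:
    """Return (alpha_MH, MH D-DIF) for one dichotomously scored item.

    alpha_MH = sum(A_k * D_k / N_k) / sum(B_k * C_k / N_k)
    MH D-DIF = -2.35 * ln(alpha_MH)   (ETS delta scale)
    """
    num = den = 0.0
    for s in strata:
        n = s.ref_right + s.ref_wrong + s.foc_right + s.foc_wrong
        if n == 0:
            continue  # an empty score level contributes nothing
        num += s.ref_right * s.foc_wrong / n
        den += s.ref_wrong * s.foc_right / n
    if num == 0.0 or den == 0.0:
        raise ValueError("common odds ratio is undefined for these tables")
    alpha = num / den
    return alpha, -2.35 * math.log(alpha)


if __name__ == "__main__":
    # Hypothetical counts at two total-score levels for a single item.
    item_tables = [Stratum(30, 20, 25, 25), Stratum(40, 10, 35, 15)]
    alpha, d_dif = mantel_haenszel(item_tables)
    # alpha > 1 and a negative D-DIF mean the item favors the reference group.
    print(f"alpha_MH = {alpha:.3f}, MH D-DIF = {d_dif:.2f}")
```

With these made-up counts the script prints `alpha_MH = 1.588, MH D-DIF = -1.09`. Running the same tables through SIBTEST and comparing the flags is the kind of cross-method check the recommendation describes.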

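3. Conduct Longitudinal Observation

TASA samples students in grade 4, grade 6, grade 8, and the second year of senior high or vocational high school (grade 11). These samples could be used to observe boys' and girls' learning longitudinally, or to track how the two groups change over the course of their learning, as a reference for curriculum development and remedial teaching. In the future, the results could also be compared with similar databases internationally.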

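4. Build an Item Bank from High-Quality Items

After each administration, TASA could conduct further item analysis and revise or remove unsuitable items to help ensure item validity. High-quality items could then be added to an item bank, from which, using item response theory (IRT), different standardized tests can be designed for different testing purposes.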
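To illustrate how an IRT-calibrated item bank can yield different tests for different purposes, here is a minimal sketch. It assumes a two-parameter logistic (2PL) model without the 1.7 scaling constant, and the item IDs and parameters are hypothetical; TASA's actual model and assembly procedure are not described in this section. The sketch ranks items by Fisher information at a target ability level and keeps the most informative ones, so a test aimed at low achievers and one aimed at high achievers draw different items from the same bank.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str  # hypothetical item label
    a: float      # 2PL discrimination
    b: float      # 2PL difficulty


def p_correct(item: Item, theta: float) -> float:
    """2PL probability of a correct response (assumed model, no 1.7 constant)."""
    return 1.0 / (1.0 + math.exp(-item.a * (theta - item.b)))


def information(item: Item, theta: float) -> float:
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = p_correct(item, theta)
    return item.a ** 2 * p * (1.0 - p)


def assemble(bank: list[Item], theta_target: float, length: int) -> list[Item]:
    """Greedy selection: the `length` items most informative at theta_target."""
    ranked = sorted(bank, key=lambda it: information(it, theta_target), reverse=True)
    return ranked[:length]


if __name__ == "__main__":
    bank = [
        Item("M01", 1.2, -1.0),
        Item("M02", 0.8, 0.0),
        Item("M03", 1.5, 0.5),
        Item("M04", 1.0, 1.5),
        Item("M05", 1.3, -0.2),
    ]
    # A short test targeting low achievers versus one targeting high achievers.
    print([it.item_id for it in assemble(bank, theta_target=-1.0, length=2)])
    print([it.item_id for it in assemble(bank, theta_target=1.0, length=2)])
```

With this toy bank the low-ability test selects M01 and M05, while the high-ability test selects M03 and M05. Operational test assembly usually adds constraints such as content balance across the five content areas, often solved with integer programming; the greedy ranking here ignores them.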

