
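This study constructed a Chinese communicative competence test that is grounded in the language knowledge model and covers a higher-order ability level. Its communicative competence indicators draw on the Common European Framework of Reference for Languages (CEFR). A total of 1,235 valid samples were collected from ethnic Chinese students: 660 male and 575 female. Of these students, 52% came from overseas Chinese communities or countries where they have relatively frequent exposure to Chinese.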

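The Chinese communicative competence test in this study was built on the H3PLM test framework. It measures each participating ethnic Chinese student's ability in three separate domains: linguistic, sociolinguistic, and pragmatic. The results also let teachers diagnose right away which domain an individual student is weak in, so they can provide supplementary materials or remedial instruction for that domain. In addition, the overall Chinese communicative competence estimated through IRT places every student on a common scale. Students can therefore be compared directly, and the overall score can serve as one reference when ethnic Chinese students later enter universities in Taiwan and apply to departments.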
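For reference, the higher-order structure behind simultaneous domain and overall scores can be written as follows. This is a sketch that assumes the HO-IRT form of de la Torre and Song (2009) with a three-parameter logistic item response function, where each item depends on a single domain. The exact parameterization of the H3PLM framework used in this study may differ.

```latex
\begin{align*}
\theta_v^{(d)} &= \lambda^{(d)}\theta_v + \varepsilon_v^{(d)}, \qquad
  \theta_v \sim N(0,1), \qquad
  \varepsilon_v^{(d)} \sim N\!\left(0,\ 1-\big(\lambda^{(d)}\big)^2\right) \\
P\big(X_{vj}=1 \mid \theta_v^{(d)}\big) &= c_j + \frac{1-c_j}{1+\exp\!\big[-D\,a_j\big(\theta_v^{(d)}-b_j\big)\big]} \\
\operatorname{Var}\big(\theta_v^{(d)}\big) &= 1, \qquad
  \operatorname{Cov}\big(\theta_v^{(d)},\theta_v^{(d')}\big) = \lambda^{(d)}\lambda^{(d')} \quad (d \neq d')
\end{align*}
```

In this notation, $\theta_v$ is examinee $v$'s overall ability and $\theta_v^{(d)}$ is the ability in domain $d$ (linguistic, sociolinguistic, or pragmatic). $\lambda^{(d)}$ is the loading of domain $d$ on the overall factor. $a_j$, $b_j$, and $c_j$ are the discrimination, difficulty, and guessing parameters of item $j$. $D$ is a scaling constant, often 1.7. When $\theta_v$ and every $\theta_v^{(d)}$ are estimated jointly, the model produces both diagnostic domain scores and a comparable overall score. Under a within-item structure, such as the framework this study adopts, an item's response function would combine several domain abilities instead of one.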

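This study makes six main contributions to building a computerized adaptive testing (CAT) system for Chinese communicative competence.

First, the HO-IRT CAT system is the first computerized adaptive testing system that assesses Chinese communicative competence.

Second, it is the first testing system that combines all of the following:
- it takes Chinese cultural knowledge and everyday customs into account;
- it adopts the language knowledge model;
- it includes both independent-task and integrated-task items;
- it measures test takers' multifaceted Chinese communicative competence.

Third, the HO-IRT CAT system uses MAP estimation and estimates test takers' abilities accurately and efficiently.

Fourth, the one-factor, within-item HO-IRT test framework estimates test takers' domain abilities and overall ability at the same time. This makes the HO-IRT CAT system a practical and versatile assessment tool. The estimated domain abilities can serve as formative assessment, showing teachers which domain a learner needs to strengthen. The estimated overall ability can serve as summative assessment: the Division of Preparatory Programs for Overseas Chinese Students can use it as a reference when placing ethnic Chinese students into universities.

Fifth, the HO-IRT CAT system applies the MIRT CAT approach with MAP estimation. It yields the greatest benefit in three respects: the accuracy of ability estimation, the efficiency of the item selection strategy, and the appropriateness of the selected items.

Finally, the HO-IRT CAT system completes the Chinese communicative competence test with one-third of the items that the traditional paper-and-pencil test requires. This shortens testing time and lowers testing costs.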
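The sketch below shows how MAP estimation inside a multidimensional CAT with a higher-order prior can produce domain and overall ability estimates at the same time. It is an illustration under stated assumptions, not the HO-IRT CAT system built in this study. Its assumptions are:
- every item follows a 2PL model on the logistic metric (no guessing parameter, no 1.7 constant);
- each item loads on exactly one domain, a between-item simplification of the within-item framework used here;
- the loadings `lambdas` are known;
- items are chosen by a Segall-style D-optimal rule (maximizing the determinant of the posterior information);
- the test has a fixed length;
- responses are simulated from a hypothetical true ability.

All function names and the three-domain, 45-item bank are hypothetical.

```python
import math
import random


def p2pl(a, b, theta):
    """2PL probability of a correct response (logistic metric, no 1.7 constant)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs[i])] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def unit(n, k):
    return [1.0 if i == k else 0.0 for i in range(n)]


def higher_order_prior(lambdas):
    """Covariance of domain abilities implied by theta_d = lambda_d * theta + e_d."""
    n = len(lambdas)
    sigma = [[lambdas[i] * lambdas[j] + (1.0 - lambdas[i] ** 2 if i == j else 0.0)
              for j in range(n)] for i in range(n)]
    cols = [solve(sigma, unit(n, k)) for k in range(n)]  # columns of Sigma^-1
    precision = [[cols[k][i] for k in range(n)] for i in range(n)]
    return sigma, precision


def map_estimate(items, responses, precision, iters=50, tol=1e-8):
    """Newton-Raphson MAP of domain abilities; items are (a, b, domain) tuples."""
    n = len(precision)
    theta = [0.0] * n
    for _ in range(iters):
        # Gradient and negative Hessian of the log posterior at the current theta.
        grad = [-sum(precision[d][k] * theta[k] for k in range(n)) for d in range(n)]
        info = [row[:] for row in precision]
        for (a, b, d), x in zip(items, responses):
            p = p2pl(a, b, theta[d])
            grad[d] += a * (x - p)
            info[d][d] += a * a * p * (1.0 - p)
        step = solve(info, grad)
        theta = [t + s for t, s in zip(theta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return theta, info


def select_item(bank, used, theta, info):
    """D-optimal pick: det(W + I_j e_d e_d^T) = det(W) * (1 + I_j * (W^-1)_dd)."""
    n = len(info)
    post_var = [solve(info, unit(n, d))[d] for d in range(n)]
    best, best_gain = None, -1.0
    for j, (a, b, d) in enumerate(bank):
        if j in used:
            continue
        p = p2pl(a, b, theta[d])
        gain = a * a * p * (1.0 - p) * post_var[d]
        if gain > best_gain:
            best, best_gain = j, gain
    return best


def run_cat(bank, lambdas, true_theta, test_length, seed=0):
    """Simulate a fixed-length CAT; return domain MAPs, overall score, items used."""
    rng = random.Random(seed)
    sigma, precision = higher_order_prior(lambdas)
    used, items, responses = [], [], []
    theta, info = [0.0] * len(lambdas), [row[:] for row in precision]
    for _ in range(test_length):
        j = select_item(bank, set(used), theta, info)
        a, b, d = bank[j]
        responses.append(1 if rng.random() < p2pl(a, b, true_theta[d]) else 0)
        used.append(j)
        items.append(bank[j])
        theta, info = map_estimate(items, responses, precision)
    # Mode of the overall ability given the domain MAPs: lambda^T Sigma^-1 theta.
    w = solve(sigma, lambdas)
    overall = sum(wi * ti for wi, ti in zip(w, theta))
    return theta, overall, used


if __name__ == "__main__":
    bank = [(0.8 + 0.05 * k, -2.0 + 4.0 * k / 14, d) for d in range(3) for k in range(15)]
    theta, overall, used = run_cat(bank, [0.8, 0.7, 0.75], [0.5, -0.3, 0.2], 15)
    print([round(t, 2) for t in theta], round(overall, 2), used)
```

The prior links the domains through one higher-order factor. As a result, a response in one domain also shifts the estimates of the correlated domains. This borrowing of information is one commonly cited reason a multidimensional CAT can reach a given precision with fewer items than a fixed form.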

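Two directions remain for future research. First, the items in this study were developed only for reading, so the comparison covers that single skill. Future studies could compare the language knowledge model with the language skills model across all four skills: listening, reading, speaking, and writing. Second, low sociolinguistic ability among test takers could have produced statistically non-significant results. To avoid this, the pilot test was given only to ethnic Chinese students. Future studies could add learners of Chinese as a foreign language to the sample and compare their Chinese communicative competence scores with those of ethnic Chinese students.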


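曾妙芬. (2007). 推動專業化的AP中文教學：大學二年級中文教學成功模式之探討與應用 [Promoting professionalized AP Chinese teaching: Exploring and applying a successful model of second-year university Chinese instruction]. Beijing: Beijing Language and Culture University Press.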

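葉育婷. (2009). AP中文對美國中文學校的影響：以南加州PV學區為主 [The influence of AP Chinese on Chinese schools in the United States: The PV school district in Southern California]. Unpublished master's thesis, National Taiwan Normal University.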

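僑生先修部 [Division of Preparatory Programs for Overseas Chinese Students]. (2014a). 國立臺灣師範大學僑生先修部校史簡介 [A brief history of the Division of Preparatory Programs for Overseas Chinese Students, National Taiwan Normal University] Retrieved Dec. 19, 2014, from http://www.ntnu.edu.tw/divrec/A/A1/001Introduce.htm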

僑生先修部 [Division of Preparatory Programs for Overseas Chinese Students]. (2014b). 開課班別 [Classes offered] Retrieved Dec. 19, 2014, from http://www.ntnu.edu.tw/divrec/A/A1/003Class.htm

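劉珣 (Ed.). (2006). 漢語作為第二語言教學簡論 [An introduction to teaching Chinese as a second language]. Beijing: Beijing Language and Culture University Press.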

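蔡雅薰. (2009). 華語文教材分級研制原理之建構 [Constructing principles for developing graded Chinese language teaching materials]. Taipei: Cheng Chung Book Co.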

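蔡慶皇. (2010). 以HO-IRT為基礎之數學領域電腦適性測驗系統建置 [Building an HO-IRT-based computerized adaptive testing system for mathematics]. Unpublished master's thesis, National Taichung University of Education.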

蔣宇紅. (2006). 外語交際能力的跨文化因素研究 [A study of cross-cultural factors in foreign language communicative competence]. 國際關係學報, 2, 60-64.

聯合國 [United Nations]. (2014). 聯合國官方語文 [Official languages of the United Nations] Retrieved Dec. 19, 2014, from http://www.un.org/zh/aboutun/languages.shtml


English References

Ackerman, T. A. (1989). Unidimensional IRT calibration of compensatory and noncompensatory items. Applied Psychological Measurement, 13, 113-127.

Ackerman, T. A., Gierl, M. J., & Walker, C. (2003). An NCME instructional module on using multidimensional item response theory to evaluate educational and psychological tests. Educational Measurement: Issues and Practice, 22, 37-53.

Adams, R. J., Wilson, M. R., & Wang, W.-C. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21(1), 1-23. doi: 10.1177/0146621697211001

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716-723.

Alderson, D. (2000). Language Testing and Evaluation. Beijing: Foreign Language Teaching and Research Press.

Andersen, E. B. (1973). A goodness of fit test for the Rasch model. Psychometrika, 38, 123-140.

Bachman, L. F. (1990). Fundamental Considerations in Language Testing. Oxford: Oxford University Press.

Bachman, L. F., & Palmer, A. S. (1982). The construct validation of some components of communicative language proficiency. TESOL Quarterly, 16(4), 449-465.

Bachman, L. F., & Palmer, A. S. (1984). Some comments on the terminology of language testing. In C. Rivera (Ed.), Communicative competence approaches to language proficiency assessment: Research and application (pp. 34-43). Clevedon, Avon, UK: Multilingual Matters.

Bachman, L. F., & Palmer, A. S. (1997). Language Testing in Practice. London: Oxford University Press.

Baker, F. B., & Kim, S.-H. (2004). Item response theory: Parameter estimation techniques (2nd ed.). New York: Marcel Dekker.

Birnbaum, A. (Ed.). (1968). Some latent trait models and their use in inferring an examinee’s ability. Reading, MA: Addison-Wesley.

Blum-Kulka, S., House, J., & Kasper, G. (1989). Cross-cultural pragmatics: Requests and apologies. Norwood, NJ: Ablex.

Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of the EM algorithm. Psychometrika, 46, 443-459.

Bock, R. D., Gibbons, R., & Muraki, E. J. (1988). Full information item factor analysis. Applied Psychological Measurement, 12, 261-280.

Brown, H. D. (2006). Principles of Language Learning and Teaching (5th ed.). Englewood Cliffs, NJ: Prentice Hall Regents.


Campbell, R. N., & Rosenthal, J. W. (Eds.). (2000). Heritage Language. Mahwah, NJ: Lawrence Erlbaum Associates.

Canale, M. (1983). From communicative competence to communicative language pedagogy. In J. C. Richards & R. Schmidt (Eds.), Language and Communication (pp. 2-27). London, UK: Longman.

Canale, M., & Swain, M. (1980). Theoretical bases of communicative approaches to second language teaching and testing. Applied Linguistics, I(1), 1-47. doi: 10.1093/applin/I.1.1

Carroll, J. B. (1945). The effect of difficulty and chance success on correlations between items or between tests. Psychometrika, 10, 1-19.

Carroll, J. B. (1968). The psychology of language testing. In A. Davies (Ed.), Language Testing Symposium: A Psycholinguistic Approach (pp. 46-69). London: Oxford University Press.

Carroll, J. B. (1972). Fundamental considerations in testing English proficiency of foreign students. In H. B. Allen & R. N. Campbell (Eds.), Teaching English as a second language: A book of readings (2nd ed.). New York: McGraw-Hill Book Company.

Chalhoub-Deville, M. (1997). Theoretical models, assessment frameworks and test construction. Language Testing, 14(1), 3-22. doi: 10.1177/026553229701400102

Chalhoub-Deville, M., & Deville, C. (2005). A look back at and forward to what language testers measure. In E. Hinkel (Ed.), Handbook of research in second language teaching and learning (pp. 815-832). Mahwah, NJ: Lawrence Erlbaum Associates.

CollegeBoard. (2011). Chinese language and culture Retrieved Dec. 15, 2011, from http://www.collegeboard.com/student/testing/ap/sub_chineselang.html

CollegeBoard. (2012a). AP Chinese Language and Culture Course Retrieved Feb. 7, 2012, from http://apcentral.collegeboard.com/apc/public/courses/teachers_corner/37221.html

CollegeBoard. (2012b). AP Chinese Language and Culture Exam Overview. College Board.

Council of Europe (Ed.). (2001). The common European framework of reference for languages. Strasbourg: Cambridge University Press.

de la Torre, J. (2008). Multidimensional scoring of abilities: The ordered polytomous response case. Applied Psychological Measurement, 32, 355-370.

de la Torre, J., & Patz, R. J. (2005). Making the most of what we have: A practical application of MCMC in test scoring. Journal of Educational and Behavioral Statistics, 30, 295-311.

de la Torre, J., & Song, H. (2009). Simultaneous estimation of overall and domain abilities: A higher-order IRT model approach. Applied Psychological Measurement, 33(8), 620-639.

de la Torre, J., Song, H., & Hong, Y. (2011). A comparison of four methods of IRT subscoring. Applied Psychological Measurement. doi: 10.1177/0146621610378653

Douglas, D. (2000). Assessing Language for Specific Purposes. Cambridge: Cambridge University Press.

Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, N.J.: L. Erlbaum Associates.

Ethnologue. (2014). languages of the world Retrieved Dec. 19, 2014, from http://www.ethnologue.com/statistics/size

ETS. (2012). Internet-Based Test of English as a Foreign Language Retrieved Sep. 21, 2012, from http://www.ets.org/toefl/ibt/about

Gallagher, M. W. (1996). Optimizing unique opportunities for learning. In X. Wang (Ed.), A view from within: A case study of Chinese heritage community language schools in the United States (pp. 69-76). Washington, DC: National Foreign Language Center.

Glas, C. A. W., Wainer, H., & Bradlow, E. T. (2000). MML and EAP estimation in testlet-based adaptive testing. In W. J. van der Linden & C. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 271-287). Netherlands: Kluwer Academic Publishers.

Green, S. B. (1983). Identifiability of spurious factors with linear factor analysis with binary items. Applied Psychological Measurement, 7, 3-13.

Gu, L. (2013). At the interface between language testing and second language acquisition: Language ability and context of learning. Language Testing. doi: 10.1177/0265532212469177

Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Hingham, MA: Kluwer-Nijhoff.

Harley, B., Allen, P., Cummins, J., & Swain, M. (1990). The Development of Second Language Proficiency. NY: Cambridge Applied Linguistics.

Harris, D. P., & Palmer, L. A. (1970). CELT Listening Form L-A, Structure Form S-A, Vocabulary Form V-A. New York: McGraw-Hill Book Company.

Hattie, J. (1981). Decision Criteria for Determining Unidimensional and Multidimensional Normal Ogive Models of Latent Trait Theory. Armidale, Australia: The University of New England, Center for Behavioral Studies.

He, A. W. (2006). Toward an identity theory of development of Chinese as a heritage language. Heritage Language Journal, 4(1).

HSK. (n.d.). The First Mock Test Paper for HSK (Elementary-Intermediate). Retrieved from http://www.hsk.org.cn/Intro_sample.aspx

HSK. (2011). Hanyu Shuiping Kaoshi Retrieved May 21, 2011, from http://www.hsk.org.cn/index.aspx

Huang, H.-Y., Chen, P.-H., & Wang, W.-C. (2012). Computerized adaptive testing using a class of high-order item response theory models. Applied Psychological Measurement, 36(8), 689-706. doi: 10.1177/0146621612459552

Huang, H.-Y. (2009). The hierarchical structure item response model and its application to computerized adaptive testing. Unpublished doctoral dissertation, National Taiwan Normal University.

Jamieson, J., Jones, S., Kirsch, I., Mosenthal, P., & Taylor, C. (2000). TOEFL 2000 framework: A working paper. Princeton, NJ: ETS.

Kang, T., & Cohen, A. S. (2007). IRT model selection methods for dichotomous items. Applied Psychological Measurement, 31(4), 331-358. doi: 10.1177/0146621606292213

Kasper, G., & Dahl, M. (1991). Research methods in interlanguage pragmatics. Honolulu: University of Hawai’i Press.

Kelderman, H. (1996). Multidimensional Rasch models for partial-credit scoring. Applied Psychological Measurement, 20, 153-168.

Kondo, K., & Brown, J. D. (Eds.). (2007). Teaching Chinese, Japanese, and Korean language students: Curriculum needs, materials, and assessment. Mahwah, NJ: Lawrence Erlbaum Associates.

Kramsch, C. (1986). From Language Proficiency to Interactional Competence. The Modern Language Journal, 70(4), 366-372.

Kramsch, C. (1998). Language and Culture. Oxford: Oxford University Press.

Kunnan, A. J. (1998). Approaches to validation in language assessment. In A. J. Kunnan (Ed.), Validation in language assessment (pp. 1-16). Mahwah, N.J.: LEA.

Lado, R. (1961). Language Testing. New York: McGraw-Hill.

Lee, J., Grigg, W. S., & Dion, G. S. (2007). The nation’s report card: Mathematics 2007. Washington, DC: U.S. Department of Education.

Li, Y. H., & Schafer, W. D. (2005). Trait parameter recovery using multidimensional computerized adaptive testing in reading and mathematics. Applied Psychological Measurement, 29(1), 3-25. doi: 10.1177/0146621604270667

Lord, F. M. (1952). A theory of test scores. Psychometric Monograph, 7.

Lord, F. M. (1971). The self-scoring flexilevel test. Journal of Educational Measurement, 8, 147-151.

Lord, F. M. (Ed.). (1980). Applications of item response theory to practical testing problems. New Jersey: Lawrence Erlbaum Associates.

Lynch, A. (2003). The relationship between second and heritage language acquisition: Notes on research and theory building. Heritage Language Journal, 1(1).

McKinley, R. L., & Reckase, M. D. (1983). MAXLOG: A computer program for the estimation of the parameters of a multidimensional logistic model. Behavior Research Methods & Instrumentation, 15, 389-390.

McKinley, R. L., & Way, W. D. (1992). The feasibility of modeling secondary TOEFL ability dimensions using multidimensional IRT models. Princeton, NJ: ETS.

McNamara, T. (2003). Book Review: Fundamental considerations in language testing. Oxford: Oxford University Press, Language testing in practice: designing and developing useful language tests. Language Testing, 20(4), 466-473. doi: 10.1191/0265532203lt268xx

Mulder, J., & van der Linden, W. J. (2009). Multidimensional adaptive testing with optimal design criterion for item selection. Psychometrika, 74, 273-296.

Mullis, I. V. S., Martin, M. O., & Foy, P. (2007). TIMSS 2007 International Mathematics Report.

Mullis, I. V. S., Martin, M. O., Ruddock, G. J., O'Sullivan, C. Y., & Preuschoff, C. (2009). TIMSS 2011 Assessment Frameworks. Boston College: Lynch School of Education.

Murphy, D. L., Dodd, B. G., & Vaughn, B. K. (2010). A comparison of item selection techniques for testlets. Applied Psychological Measurement, 34, 424-437.

O'Sullivan, B. (Ed.). (2011). Language Testing: Theories and Practices. New York: Palgrave Macmillan.

O'Sullivan, B., & Weir, C. J. (2011). Test development and validation. In B. O'Sullivan (Ed.), Language testing: Theories and practices (pp. 13-32). UK: Palgrave Macmillan.

OECD. (2005). PISA 2003 Technical Report. Paris: OECD.

Oller, J. W. (1979). Language Tests at School. London: Longman Group Ltd.

Oller, J. W. (Ed.). (1983). Issues in Language Testing Research. Rowley, MA: Newbury House Publishers, Inc.

Oller, J. W., & Jonz, J. (1994). Cloze and Coherence. London: Associated University Presses.

Ostini, R., & Nering, M. L. (2005). Polytomous item response theory models. Thousand Oaks, CA: Sage.

Paolillo, J. C. (2006). Evaluating Language Statistics: The Ethnologue and Beyond (pp. 57). UNESCO.


Phinney, J. S., & Nakayama, S. (1991). Parental Influences on Ethnic Formation in Adolescents: ERIC Document Reproduction Service.

Purpura, J. E. (2010). Assessing communicative language ability: models and their components. In E. Shohamy & N. H. Hornberger (Eds.), Encyclopedia of Language and Education (2nd ed., Vol. Language Testing and Assessment, pp. 53-68). NY: Springer.

Raîche, G., Blais, J. G., & Magis, D. (2007, June 7-8). Adaptive estimators of trait level in adaptive testing: Some proposals. Paper presented at the 2007 GMAC Conference on Computerized Adaptive Testing, Minneapolis.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danmarks Paedagogiske Institut.

Reckase, M. D. (1985). The difficulty of test items that measure more than one ability. Applied Psychological Measurement, 9, 401-412.

Reckase, M. D. (1997). The past and future of multidimensional item response theory. Applied Psychological Measurement, 21, 25-36.

Reckase, M. D. (2009). Multidimensional Item Response Theory. New York: Springer.

Rymes, B. R. (2008). Language Socialization and the Linguistic Anthropology of Education. In P. Duff & N. Hornberger (Eds.), Encyclopedia of Language and Education (2nd ed., pp. 29-42). New York: Springer.

Sasaki, M. (1999). Second Language Proficiency, Foreign Language Aptitude, and Intelligence. NY: Peter Lang.

Sawaki, Y., Stricker, L. J., & Oranje, A. (2008). Factor Structure of the TOEFL Internet-Based Test (iBT): Exploration in a Field Trial Sample (pp. 67). Educational Testing Service.

Sawaki, Y., Stricker, L. J., & Oranje, A. (2009). Factor structure of the TOEFL Internet-based test. Language Testing, 26(1), 5-30.

SC-TOP. (2011a). New version of the Test of Chinese as a Foreign Language launched in 2011. Taipei: Steering Committee for the Test of Proficiency-Huayu. Retrieved from http://www.sc-top.org.tw/download/News%20release%20of%20the%20new%20version.pdf

SC-TOP. (2011b). Steering Committee for the Test of Proficiency-Huayu Retrieved May 17, 2011, from http://www.sc-top.org.tw/

Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461-464.

Sclove, S. L. (1987). Application of model-selection criteria to some problems in multivariate analysis. Psychometrika, 52, 333-343.

Segall, D. O. (1996). Multidimensional adaptive testing. Psychometrika, 61, 331-345.


Segall, D. O. (2000). Principles of Multidimensional Adaptive Testing. In W. J. van der Linden & C. W. Glas (Eds.), Computerized Adaptive Testing: Theory and Practice (pp. 53–73). Netherlands: Kluwer Academic Publishers.

Segall, D. O., & Moreno, K. E. (1999). Development of the computerized adaptive testing version of the Armed Services Vocational Aptitude Battery. In F. Drasgow & J. B. Olson-Buchanan (Eds.), Innovations in Computerized Assessment. Hillsdale, NJ: Lawrence Erlbaum Associates.


Sheng, Y., & Wikle, C. K. (2008). Bayesian multidimensional IRT models with a hierarchical structure. Educational and Psychological Measurement, 68, 413-440.

Skehan, P. (1988). State-of-the-art article: Language testing, Part 1. Language Teaching, 21(4), 211-221.

Song, H. (2007). A higher-order item response model: Development and application. Unpublished doctoral dissertation, Rutgers, The State University of New Jersey.

Spiegelhalter, D. J., Best, N. G., & Carlin, B. P. (1998). Bayesian deviance, the effective number of parameters, and the comparison of arbitrarily complex models (Technical report). Cambridge, UK: MRC Biostatistics Unit.

Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & van der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society Series B - Statistical Methodology, 64(4), 583-639.

Stone, C. A., & Yeh, C.-C. (2006). Assessing the Dimensionality and Factor Structure of Multiple-Choice Exams: An Empirical Comparison of Methods Using the Multistate Bar Examination. Educational and Psychological Measurement, 66(2), 193-214.

Swygert, K. A., McLeod, L. D., & Thissen, D. (Eds.). (2001). Factor analysis for items or testlets scored in more than two categories. Chapel Hill: University of North Carolina, L. L. Thurstone Psychometric Laboratory.

Valdés, G. (Ed.). (2001). Heritage Language Students: Profiles and Possibilities. IL: Delta Publishing Company.

van der Linden, W. J. (1999). Multidimensional adaptive testing with a minimum error-variance criterion. Journal of Educational and Behavioral Statistics, 24, 398-412.

van der Linden, W. J., & Hambleton, R. K. (1996). Handbook of Modern Item Response Theory. New York: Springer-Verlag.

van der Linden, W. J., & Pashley, P. J. (2000). Item Selection and Ability Estimation in Adaptive Testing. In W. J. van der Linden & C. W. Glas (Eds.), Computerized Adaptive Testing: Theory and Practice (pp. 1-25). Netherlands: Kluwer Academic Publishers.

Vollmer, H. J., & Sang, F. (1980). Competing hypotheses about second language ability: A plea for caution. Berlin: Osnabrück.

Walton, A. R. (1995). Chinese language schools as a national resource: The large context. Journal of the Association of Chinese Schools, 21, 3-15.

Wang, H.-P., Kuo, B.-C., Tsai, Y.-H., & Liao, C.-H. (2012). A CEFR-based computerized adaptive testing system for Chinese proficiency. The Turkish Online Journal of Educational Technology, 11(4).

Wang, W.-C., Chen, P. H., & Cheng, Y. Y. (2004). Improving measurement precision of test batteries using multidimensional item response models. Psychological Methods, 9, 116-136.

Wang, W.-C., Wilson, M. R., & Adams, R. J. (1997). Rasch models for multidimensionality between items and within items. In M. Wilson, G. Engelhard & K. Draney (Eds.), Objective measurement: Theory into practice