
Test Evaluation and Structural Analysis of Learning Concepts in English Teaching Application (測驗評量與學習概念在英語教學之應用)


Academic year: 2021


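(1) Doctoral Dissertation, Graduate Institute of Educational Measurement and Statistics, National Taichung University of Education. Advisors: Dr. Tian-Wei Sheu (許天維) and Dr. Masatake Nagai (永井正武). Test Evaluation and Structural Analysis of Learning Concepts in English Teaching Application (測驗評量與學習概念在英語教學之應用). Graduate student: 王柏婷. October, Year 101 of the Republic of China (2012).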

(2) Acknowledgements I would like to express my gratitude to all those who made it possible for me to complete this thesis. I want to thank my advisors, Prof. Tian-Wei Sheu and Prof. Masatake Nagai, who encouraged me to do the research work and provided stimulating support throughout the writing of this thesis. My sincere thanks go to all the other committee members: Dr. Chin-Tsai Lin, Dr. Jiang-Long Lin, Dr. Kun-Li Wen, Dr. Chaang-Yung Kung, and Dr. Bor-Chen Kuo. I appreciate your encouragement, your interest in this research, and your valuable input to this thesis. I am deeply indebted to the Graduate Institute of Educational Measurement and Statistics, National Taichung University of Education, and to Feng-Chia University for giving me the opportunity to pursue my studies. My heartfelt thanks go to all the professors who have taught and guided me throughout my academic study during these years. In particular, I would like to give special thanks to my husband and my parents, whose patient love enabled me to complete this thesis. Finally, I would like to thank my Ph.D. classmates, Jung-Chin Liang and Jian-Wei Tzeng, who accompanied and supported me during my studies.

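(3) Chinese Abstract (translated) This study takes university students as its subjects and investigates English language educational assessment and the structural analysis of learning concepts. Using statistical analysis methods, namely Grey Relational Analysis (GRA), the Grey Student-Problem chart (GSP chart), Grey Structural Modeling (GSM), Interpretive Structural Modeling (ISM), Ordering Theory/Item Relational Structure (OT/IRS) and the Analytic Hierarchy Process (AHP), a scientific system analysis model is constructed, and a Matlab GUI toolbox is used to present the clustering of the data concretely in charts. The empirical results are as follows:
1. Providing a new evaluation method for educational decision making. Combining AHP with GRA to analyze educational assessments with small samples is economical in both sampling and analysis, and the experiments verify its reliability and validity. Through the C.I. test, objective data can be obtained as a reference for educational decision-making evaluation.
2. Applying knowledge structure analysis methods. GRA and GSM are applied to students' responses to analyze the knowledge structure of their English grammar and reading comprehension, together with their strengths and weaknesses. The results give teachers a reference for setting the scoring weights of English tests and a basis for remedial instruction, and they have practical value for analyzing English educational decisions.
3. Presenting assessment results systematically. With the GSP chart, GSM diagrams and the Matlab GUI toolbox, charts are drawn to distinguish differences in students' levels. These are cross-validated against the item relational structure to provide learning diagnosis, on which remedial instruction and individualized teaching can then be based.
Keywords: English language educational assessment, structural analysis, grey relational analysis, GSP chart, grey structural modeling, interpretive structural modeling, ordering theory/item relational structure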

(4) Abstract The goal of this paper is to investigate the application of statistical analysis methods to English language assessment and to the structural analysis of English learning concepts. Using statistical analysis methods such as Grey Relational Analysis (GRA), the Grey Student-Problem Chart (GSP chart), Grey Structural Modeling (GSM), Interpretive Structural Modeling (ISM), Ordering Theory/Item Relational Structure (OT/IRS) and the Analytic Hierarchy Process (AHP), a scientific system analysis model is constructed. In addition, Matlab is used to present the clustered data in structural graphs. The empirical results are as follows: 1. Providing a new educational decision-making evaluation method. Combining AHP with GRA and using small data sets for educational evaluation analysis, the researcher finds that sampling and analysis are economical. The empirical results also confirm the reliability and validity of the method. Through C.I. verification, objective data can be obtained and applied to educational decision making. 2. Applying knowledge structure analysis methods. By applying GRA and GSM, the researcher analyzes the knowledge structure, strengths and weaknesses of students' English grammar and reading performances. The results can help teachers decide English test weightings and provide remedial instruction, and they also have practical value for educational decision making. 3. Systematizing assessment results. Through the GSP chart, the GSM model and the Matlab toolbox, it is possible to identify

(5) differences in students' abilities. Also, the results are verified with OT/IRS to provide learning diagnosis and remedial instructions. Keywords: English language assessment, structural analysis, GRA, GSP, GSM, ISM, OT/IRS.

(6) Table of Contents
Acknowledgements ... I
Chinese Abstract ... II
English Abstract ... III
Table of Contents ... V
List of Tables ... VIII
List of Figures ... X
Notation ... XII
Chapter 1 Introduction ... 1
  1.1 Research Background and Motivation ... 1
  1.2 Research Purpose ... 13
  1.3 Research Flowchart ... 14
  1.4 Research Questions ... 15
  1.5 Definition of Terms ... 16
  1.6 Summary of Research Contribution ... 19
  1.7 Limitations of This Research ... 20
  1.8 Overview of This Paper ... 21
Chapter 2 Literature Review ... 22
  2.1 Development in Language Testing ... 22
  2.2 Educational Measurement Methods ... 27
    2.2.1 Ordering Theory and Item Relational Structure (OT/IRS) ... 27
    2.2.2 Interpretive Structural Modeling ... 29
  2.3 The Grey System Theory Application in the Educational Measurement Field ... 32

(7)
    2.3.1 Grey Relational Analysis and Grey Student-Problem Chart ... 32
    2.3.2 Grey Structural Modeling ... 40
    2.3.3 Analytic Hierarchy Process Combined with GRA ... 42
Chapter 3 Experimental Methods ... 45
  3.1 Research Design ... 45
  3.2 Significance of the Research ... 47
  3.3 Research Instrument ... 49
  3.4 Research Method ... 51
Chapter 4 Research Results and Discussion ... 53
  4.1 Results and Discussion of Experimental Test 1: Concept Diagnosis of English Grammar ... 53
  4.2 Results and Discussion of Experimental Test 2: Evaluate English Test Item Difficulties ... 63
  4.3 Results and Discussion of Experimental Test 3: Choosing an English Coursebook by Using GRA-AHP Method ... 67
Chapter 5 Total Discussion ... 75
  5.1 Concept Diagnosis of English Grammar ... 75
  5.2 Evaluate English Test Item Difficulties ... 79
  5.3 Choosing an English Coursebook by Using GRA and AHP ... 81
Chapter 6 Conclusions and Recommendations ... 84
  6.1 Conclusions ... 84
  6.2 Recommendations ... 88

(8)
References ... 90
Appendix 1 Freshman English Grammar Exam ... 113
Appendix 2 English proficiency test ... 114
Appendix 3 The decision matrix of interviewees (professional) ... 115
VITA ... 121

(9) List of Tables
Table 1.1 English language learning categories ... 1
Table 1.2 English language testing history ... 3
Table 1.3 Types of assessments used with English language learners ... 4
Table 1.4 Comparison of system modeling approaches ... 12
Table 2.1 Relevant literature of language testing ... 25
Table 2.2 The joint marginal probabilities of item concepts j and k ... 28
Table 2.3 Example of original data S-P chart ... 35
Table 2.4 Sorted S-P chart of Table 2.3 ... 35
Table 2.5 Example of students' matrix of Table 2.3 ... 38
Table 2.6 Example of problem's matrix of Table 2.3 ... 39
Table 2.7 Pair-wise comparison scale for AHP preferences ... 44
Table 3.1 Research instrument and the relevant experimental tests ... 49
Table 3.2 The verb concepts in the grammar test ... 49
Table 3.3 The experimental tests of the paper ... 52
Table 4.1 The concept attributes of basic English verb test ... 54
Table 4.2 The concept-attribute matrix of basic English verb test ... 54
Table 4.3 The item-attribute matrix of the test ... 56
Table 4.4 Item analysis comparison between CTT and GRA ... 64
Table 4.5 The coding information of professionals ... 69
Table 4.6 Coursebook selection criteria items ... 70
Table 4.7 The C.I. value of professionals ... 71
Table 4.8 LGRA-P ordinal ... 72

(10)
Table 4.9 LGRA-I ordinal ... 72
Table 4.10 The criteria order of choosing English coursebook ... 73
Table 5.1 The comparison of IRS and GSM ... 77
Table 5.2 Main comparison of CTT and GRA ... 80
Table 5.3 Extract from a weighted rating scale for the comparative evaluation of coursebooks ... 82
Table 6.1 The comparison of S-P chart and GSP chart ... 85

(11) List of Figures
Fig. 1.1 English self-study center in a private university ... 6
Fig. 1.2 English learning environment in a private university ... 6
Fig. 1.3 Suggested learning path process ... 7
Fig. 1.4 Six test qualities ... 8
Fig. 1.5 The relationship between English curriculum, teaching, English assessment and test evaluation ... 9
Fig. 1.6 Research flowchart ... 14
Fig. 2.1 An example of ISM hierarchy ... 31
Fig. 2.2 Grey S-P chart framework ... 37
Fig. 3.1 Framework of the thesis ... 45
Fig. 4.1 The concept structure of English grammar verb test ... 55
Fig. 4.2 The GSM structure of test items ... 56
Fig. 4.3 IRS of examinees generated by Takeya's threshold 0.5 ... 57
Fig. 4.4 Concept hierarchy of daytime and extension students by using GSM ... 58
Fig. 4.5 GSM of grammar (daytime) performances ... 59
Fig. 4.6 GSM of grammar (extension) performances ... 59
Fig. 4.7 GSP of daytime students ... 60
Fig. 4.8 GSP of extension students ... 60
Fig. 4.9 γ distribution of day-time students ... 61
Fig. 4.10 γ distribution of extension students ... 62
Fig. 4.11 Students' performances clustering in Matlab interface ... 66
Fig. 4.12 GSP of English coursebook chosen ... 73
Fig. 4.13 GSM of English coursebook choosing path ... 74

(12)
Fig. 6.1 Suggested Diagnosis Sheet ... 89

(13) Notation
$T$ : Reachable matrix
$X \times Y$ : Space
$A_{ij}$ : Students' responses toward test items; binomial relationship of elements $A_i$ and $A_j$, where $i = 1, 2, 3, \ldots, n$; $j = 1, 2, 3, \ldots, n$
$C_i$ : Hierarchical class set, where $i = 1, 2, 3, \ldots, n$
$S_i$ : Student ID numbers, where $i = 1, 2, 3, \ldots, n$
$P_j$ : Test items, where $j = 1, 2, 3, \ldots, n$
$r^{*}_{jk}$ : Ordering relationship between items $j$ and $k$, where $j = 1, 2, 3, \ldots, n$; $k = 1, 2, 3, \ldots, n$
$s_i R s_j$ : A binomial relationship of $s_i$ and $s_j$, where $R$ is the relationship, taking the value 1 or 0
[The symbols for the remaining entries are illegible in the extracted text; the surviving fragments are the subscripts 0i, max, ij and *jk. Their definitions, in the order given, are:]
Grey relational grade
Degree between $x_0$ and $x_j$, where $x_0$ is the reference sequence and $x_j$ is the sequence
The maximum eigenvalue
Path of elements
Threshold value between 0.02 and 0.04
Class coefficient
Path coefficient

(14) Chapter 1 Introduction
This chapter introduces the paper as a whole. It first sets out the research motivation through a discussion of the history of English testing and the difficulties that English teachers face in Taiwan. The proposed statistical analysis methods, which are related to grey system theory, are then introduced. The research purpose is discussed next, followed by the research questions. The key terms are then explained, and the significance of the paper is pointed out. Finally, an overview of the paper is presented.

1.1 Research Background and Motivation
Traditionally, researchers have classified English language learning under two categories, shown in Table 1.1 (DelliCarpini, 2008; Karbalaei, 2010; Laufer and Paribakht, 1998; Nayar, 1997; Rasmussen, 2010):

Table 1.1 English language learning categories
English as a Foreign Language (EFL)
  Definition: EFL refers to English instruction and learning in regions where English is not the language of everyday communication.
  Example: Taiwan or Japan
English as a Second Language (ESL)
  Definition: ESL refers to English instruction and learning in regions where it is a first language or is commonly used in education or the workplace.
  Example: England or the United States

(15) The global economy is blurring these traditional classifications, and English proficiency is now of interest to countries worldwide because of its role in creating a competitive workforce (Pang, Zhou and Fu, 2002). Following the trend of globalization, educational institutions in Taiwan are working hard to enhance the English ability of college students, and English ability is considered a vital element of international competitiveness (Black, 2001; Chen and Klenowski, 2009; Gilleard and Gilleard, 2000; Liang, J. S., 2010; Rea-Dickins and Scott, 2007; Wang, 2007). How to evaluate students' English performance has therefore become more and more important. Table 1.2 gives a brief history of English language testing and a general idea of how tests have evolved from the 1980s to the current focus (Bachman, 1990, 2000; Bachman and Palmer, 1996; Canale, 1983, 1984; Canale and Swain, 1980; McNamara, 1996; Morrow, 1979; Savignon, 1972, 1983). In the 1980s, researchers were interested in real-life testing and began to design tests that included real-life communication (Canale, 1983, 1984; Canale and Swain 1980; Morrow, 1979; Savignon, 1972, 1983). In the 1990s, researchers started to investigate the relationship between test methods and test performances, which also prompted the development of English proficiency tests (Bachman, 1990; Bachman and Palmer, 1996; McNamara, 1996). Since 2000, language testing has continued to focus on the nature of language ability, on why language assessment is important, and on how to interpret test scores (Bachman, 2000; Bachman and Palmer, 2010).

(16) Table 1.2 English language testing history

Year: 1980
  Researchers: Savignon (1972, 1983), Morrow (1979), Canale & Swain (1980), and Canale (1983, 1984).
  Language testing development: Language was reckoned to be a set of real life encounters and experiences and tasks, a view which took real life testing so seriously that it lost both objectivity and generality.

Year: 1990
  Researchers: Bachman (1990), McNamara (1996) and Bachman & Palmer (1996)
  Language testing development: Bachman proposed an interactional model of language test performance in which language ability (language knowledge and metacognitive strategies) is seen as interacting with test method (characteristics of the environment, rubric, input, expected response, and the relationship between input and expected response) to produce a performance that can be described and reported. It has provided a principled, systematic basis for the development of language tests, such as the English proficiency test.

Year: Current focus
  Researchers: Bachman (2000)
  Language testing development: Focus on:
    - Methodology
    - Practical advances
    - Performance-affecting factors
    - Performance assessment
    - Ethical issues

(17) Table 1.2 shows that trends in language testing tend to follow trends in language teaching, since language teaching methodologists have developed and used language testing models (Giri, 2003; Hu, 2004; Jia and Yang, 2005; Klenowski, 2006). Table 1.3 summarizes some of the types of assessments used with English language learners; these assessments can be either formal or informal (Rasmussen, 2010). According to Rasmussen (2010), informal assessments are often used to provide formative information, while formal assessments check program success or students' progress.

Table 1.3 Types of assessments used with English language learners

Informal
  Alternative, Authentic, and Classroom-based Assessments:
    • Checklists
    • Games
    • Extensive reading
    • Observations
    • Parent information
    • Portfolios
    • Rating scales
    • Student Oral Language
    • Student self-assessment
Formal
  Standardized, Commercial Assessments:
    • Assessments related to commercial instructional programs
  Large Scale, Standards-based Assessments:
    • College Entrance Exam
  Also listed under Formal (column not recoverable from the extracted layout):
    • English Proficiency tests
    • English Achievement tests

(18) Recently, passing English proficiency tests at school has become a way to certify individual English ability, and the outcomes of such tests influence students' learning and teachers' instruction (Abedi, 2008; Leung and Lewkowicz, 2006; Rasmussen, 2010). Systematic testing provides data that assist educators in making decisions about the initial placement of students in instructional programs or their advancement to the next level, in identifying their needs, and in ensuring that they meet educational goals (Alberta Education, 2006; Rasmussen, 2010). Teachers can also understand students' learning outcomes and learning difficulties through the tests (Rowe, 2006; Sireci, Han and Wells, 2008). However, while educational institutions work hard to design English proficiency tests, questions remain. Do these tests fit the students' ability? How should test scores be interpreted (Leung and Lewkowicz, 2006)? Do teachers believe what the test tells them about the students? In addition, educational institutions in Taiwan work hard to create an English learning environment on campus. The main purpose is to encourage students to use facilities such as English self-study software (e.g. Rosetta Stone, Longman English Interactive, Focus on Grammar), the English corner, and the English lab (see Fig. 1.1 and Fig. 1.2), and thereby gradually increase students' motivation to learn English. But how can these institutions understand students' progress after the students use the school-assigned software? The relationship between the English software and the students' test performances should be measured in an objective and quantitative way.

(19) Fig. 1.1 English self-study center in a private university.
Fig. 1.2 English learning environment in a private university.

(20) However, Taiwanese students often do not know how to use the English facilities effectively, and most of them simply work through the "suggested software" listed in the student manual rather than the English software that really meets their needs. If the educational institutions could provide a "suggested learning path" for each individual student, it would be more helpful for students' English learning. The suggested process is summarized in Fig. 1.3.

[Fig. 1.3: English Teacher: evaluates students' English performances. Advisor: provides the suggested learning path. Student: does the English software provided by the learning path.]
Fig. 1.3 Suggested learning path process (Adapted from Education and Manpower Bureau, 2007).

Fig. 1.3 shows that it is important for a teacher to understand the exam results and provide remedial instruction to the students. To understand the test results, we first need to look at the test design process. Bachman and Palmer (1996) identified six test qualities in the test design process, shown in Fig. 1.4.

(21) [Fig. 1.4: Reliability; Practicality; Construct Validity; Impact; Authenticity; Interactiveness]
Fig. 1.4 Six test qualities (Adapted from Bachman and Palmer, 1996)

If the test is so easy that most students get high scores, or so difficult that the class average is low, the test designer has to check the reliability and content of the test (Sheu, Chen, Tzeng, Tsai and Nagai, 2012; Tzeng, Sheu, Liang, Wang and Nagai, 2012a~2012b; Wang, Sheu, Liang, Tzeng and Nagai, 2011a~2011b, 2012a~2012c). Otherwise, taking the test is a waste of time because it fails to test the students' abilities, and as a result students receive inappropriate instruction (Abedi, 2008; Leung and Lewkowicz, 2006). The author therefore proposes statistical analysis methods, such as the grey S-P chart, to cluster students' performances and test item difficulties. Teachers can then provide remedial instruction or assignments based on students' clustered groups, and review the test content as well (Wang, Wang, Wen, Nagai and Liang, 2011; Wang, Sheu, Liang, Tzeng and Nagai, 2011a~2011b). In this paper, statistical analysis methods related to grey system theory are used.
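As a point of reference for the clustering idea above, the classical S-P (student-problem) chart that the grey S-P chart builds on can be sketched in a few lines. This is only an illustration under stated assumptions: it reorders a 0/1 response matrix so that higher-scoring students come first and more frequently answered items come first. It does not implement the grey (GRA-based) extension used in this paper, and the function name `sort_sp_chart` and the sample data are hypothetical.

```python
def sort_sp_chart(responses):
    """Reorder a 0/1 response matrix into classical S-P chart order.

    responses: list of rows, one per student; each row holds 0/1 item scores.
    Returns (student_order, item_order, sorted_matrix).
    """
    n_items = len(responses[0])
    student_totals = [sum(row) for row in responses]
    item_totals = [sum(row[j] for row in responses) for j in range(n_items)]
    # Students: highest total first. Items: most often answered correctly first.
    # sorted() is stable, so ties keep their original relative order.
    student_order = sorted(range(len(responses)), key=lambda i: -student_totals[i])
    item_order = sorted(range(n_items), key=lambda j: -item_totals[j])
    sorted_matrix = [[responses[i][j] for j in item_order] for i in student_order]
    return student_order, item_order, sorted_matrix


# Hypothetical data: 4 students (rows) x 3 items (columns), 1 = correct.
responses = [
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 1],
    [1, 0, 1],
]
student_order, item_order, sorted_matrix = sort_sp_chart(responses)
for student, row in zip(student_order, sorted_matrix):
    print("student", student, row)
```

In the sorted matrix, correct answers tend to gather in the upper-left corner; responses that break this pattern are the ones a teacher would look at first.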

(22) Wen, You, Nagai, Chang and Liang (2010) and Wang et al (2011a) point out that the grey statistical analysis method not only quantifies the relationship between elements but also presents the results in a concrete, quantitative way. Given the characteristics of the grey system statistical analysis method, the evaluation must be as objective as possible, which increases the reliability and validity of the test results. In addition, to enhance teaching effectiveness in the classroom, it is important for teachers to find students' misconceptions and provide remedial instruction effectively and quickly (Sheu, Tzeng, Tsai and Chen, 2012; Thompson and Logue, 2006; Tzeng et al, 2012a~2012b). Wang et al (2011b) also use the grey system statistical analysis methods GSP and GSM to provide adaptive English teaching instruction. The detailed process linking English curriculum, teaching, English assessment and test evaluation is shown in Fig. 1.5.

[Fig. 1.5: English Curriculum (purposes: criteria, goal, task); Teaching (methods: teaching, lecture, train, instruct); English Assessment (methods: formal test, informal test; see Table 1.3); Test Evaluation (statistical analysis methods: GSP, GSM). Arrow labels: decide, decide, achieve, analyze, modify, modify, check item difficulties.]
Fig. 1.5 The relationship between English curriculum, teaching, English assessment and test evaluation (Adapted from Leung and Lewkowicz, 2006)

(23) Since developing the perfect test is the ultimate goal for test designers, looking at item difficulties may help them decide what is wrong with the test items (Suen and McClellan, 2003). In the past, researchers used Classical Test Theory (CTT) to check item difficulty (Magno, 2009; Morales, 2009). That is, when over 90% of the students answer an item correctly, the item may be too easy, and revising it is suggested. On the other hand, if fewer than 30% of the students answer the item correctly, the item may be too difficult, and it also needs to be revised (Kelley, 1939; Wang et al, 2012a). Later, the Item Response Theory (IRT) model became more popular for examining item-level information (Morales, 2009). Linacre (2002) pointed out that the IRT model is based on two assumptions: uni-dimensionality and local independence. In addition, under IRT the item difficulty influences a person's responses, while the estimate of item difficulty is influenced by individual ability (Galdin and Laurencelle, 2010; Linacre, 2002). Chao, Kuo, Tsai, Lin and Nagai (2010) indicate that IRT requires a large sample size for item difficulty estimation. They therefore use Grey Relational Analysis (GRA) to analyze small data sets, and their results are in line with IRT (Chao, Kuo and Tsai, 2010). A number of published papers summarize the current situation in English teaching and testing (Alderson, 1991, 2004, 2005, 2007, 2010; Alderson and Banerjee, 2001, 2002; Alderson and Huhta, 2005; Bachman, 2007; Bishop, 2004; Boyd and Davies, 2002; Broadfoot, 2005; Chalhoub-Deville, 2003; Cohen, 2007; Leung, 2004; Lumley, 2002; McNamara, 2003; Mislevy, Steinberg and Almond, 2002; Reath, 2004;

(24) Solano-Flores and Trumbull, 2003; Song and Cheng, 2006). Either traditional analysis methods, such as factor analysis, or the proposed grey system statistical analysis method can be used (Wang et al, 2011a). However, the traditional analysis methods deal only with a one-dimensional space, which presents only the order of latent variables. Moreover, factor analysis describes many variables with a small number of factors; that is, it groups related variables into categories and treats each category as a factor (Wang et al, 2011a). In this paper, the author proposes to use the grey system statistical analysis method, GRA, to carry out the structural analysis of learning concepts in English and to find an objective method for choosing an English coursebook. By using GRA with GSP and GSM, it is possible to draw two-dimensional figures that show not only the relationships between elements but also the hierarchy of each element (Wang et al, 2011a~2011b). According to Nagai, Yamaguchi and Li (2005), Yamaguchi, Li and Nagai (2005) and Yamaguchi, Li, Mizutani, Akabane, Nagai and Kitaoka (2007), the system modeling approaches can be compared as summarized in Table 1.4.
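To make the GRA idea concrete, here is a minimal sketch of a grey relational grade computation. It rests on assumptions not stated in this section: it uses Deng's classical formulation with a distinguishing coefficient ζ = 0.5, whereas the thesis may use a different (for example, localized) GRA variant. The function name `grey_relational_grades` and the sample answer patterns are hypothetical.

```python
def grey_relational_grades(reference, sequences, zeta=0.5):
    """Grey relational grade of each sequence against a reference sequence.

    Assumes Deng's classical GRA:
      coefficient(k) = (d_min + zeta * d_max) / (delta(k) + zeta * d_max)
    and the grade is the mean coefficient over all positions k.
    """
    # Absolute differences between the reference and every compared sequence.
    deltas = [[abs(r - x) for r, x in zip(reference, seq)] for seq in sequences]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        # Every sequence is identical to the reference.
        return [1.0 for _ in sequences]
    grades = []
    for row in deltas:
        coeffs = [(d_min + zeta * d_max) / (d + zeta * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades


# Hypothetical example: the reference is an all-correct answer pattern,
# and each compared sequence is one student's 0/1 answers on four items.
reference = [1, 1, 1, 1]
students = [
    [1, 1, 1, 1],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
]
grades = grey_relational_grades(reference, students)
for answers, grade in zip(students, grades):
    print(answers, round(grade, 4))
```

A higher grade means a sequence lies closer to the reference, so sorting students (or items) by grade gives the kind of ordering that the GSP chart and GSM are built on.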

(25) Table 1.4 Comparison of system modeling approaches (Adapted from Nagai et al, 2005, p128)

Traditional Graphical Modeling
  Theoretical background: OT/IRS causality
  Input element: Directly (0, 1)
  Causality type: Correlation
  Graph type: Digraph
  Adapt hierarchy: No
ISM
  Theoretical background: Boolean algebra
  Input element: Directly (0, 1)
  Causality type: Binary relation
  Graph type: Digraph
  Adapt hierarchy: No
GSM
  Theoretical background: Grey system theory
  Input element: Directly (0, 1) / Indirectly (observed value)
  Causality type: Grey relational analysis
  Graph type: Digraph
  Adapt hierarchy: Yes ([two symbols illegible])

According to Table 1.4, the proposed GSM method is superior to the other two methods because it can analyze both direct and indirect values; moreover, its hierarchy is adaptive, whereas traditional graphical modeling and ISM cannot adapt the hierarchy.
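Table 1.4 lists Boolean algebra and direct (0, 1) input as the basis of ISM. A minimal sketch of that core step is given below: it turns a 0/1 adjacency matrix of direct relations into a reachable matrix by Boolean transitive closure. The function name `reachability_matrix` and the three-concept example are hypothetical, and Warshall's algorithm is used here as one standard way to compute the closure, not necessarily the procedure followed in the thesis.

```python
def reachability_matrix(adjacency):
    """Boolean transitive closure of (adjacency + identity).

    adjacency[i][j] == 1 means element i directly relates to element j.
    The result has 1 at [i][j] when j can be reached from i in zero or
    more steps (Warshall's algorithm).
    """
    n = len(adjacency)
    # Start from A + I: every element reaches itself.
    reach = [[1 if i == j or adjacency[i][j] else 0 for j in range(n)]
             for i in range(n)]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = 1
    return reach


# Hypothetical chain of three concepts: 0 -> 1 -> 2.
adjacency = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
reach = reachability_matrix(adjacency)
for row in reach:
    print(row)
```

In this example concept 0 reaches concept 2 only indirectly, through concept 1, which is the information ISM uses when it partitions elements into hierarchy levels.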

(26) 1.2 Research Purpose
The purposes of this paper are as follows:
1. To use the proposed GRA method to discuss the validity issues involved in measuring the English skills of college students, especially in small data analysis.
2. To use the proposed statistical and qualitative methods, GRA and GSP, to evaluate validity and to classify students' test performances from their test scores, with results that are consistent with those generated by traditional analysis methods. In addition, the Matlab GUI toolbox can cluster students' performances and item difficulties in a user-friendly interface.
3. To use the proposed GSM figure to construct a structural analysis of both experts' and students' concepts in English teaching, which helps teachers provide learning diagnosis and remedial instruction.
4. To combine GRA and AHP as an aid to finding solutions in educational decision making.
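Purpose 4 relies on AHP, whose pairwise comparison matrices are usually checked with a consistency index, C.I. = (λ_max − n) / (n − 1), where λ_max is the maximum eigenvalue of an n × n matrix. The sketch below shows that calculation under stated assumptions: λ_max is estimated by simple power iteration rather than an exact eigenvalue solver, and the function name `consistency_index` and both matrices are hypothetical examples, not data from this study.

```python
def consistency_index(matrix, iterations=100):
    """Estimate lambda_max by power iteration and return (lambda_max, C.I.).

    matrix: square AHP pairwise comparison matrix (positive, reciprocal),
    with n >= 2. C.I. = (lambda_max - n) / (n - 1).
    """
    n = len(matrix)
    vec = [1.0 / n] * n  # starting priority vector, entries sum to 1
    lam = float(n)
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        # Because vec sums to 1, the sum of (matrix x vec) estimates lambda_max.
        lam = sum(nxt)
        vec = [x / lam for x in nxt]
    return lam, (lam - n) / (n - 1)


# Hypothetical, perfectly consistent judgments (a_ik = a_ij * a_jk).
consistent = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
# Hypothetical, slightly inconsistent judgments.
inconsistent = [
    [1.0, 3.0, 5.0],
    [1.0 / 3.0, 1.0, 3.0],
    [1.0 / 5.0, 1.0 / 3.0, 1.0],
]
lam_c, ci_c = consistency_index(consistent)
lam_i, ci_i = consistency_index(inconsistent)
print(round(lam_c, 4), round(ci_c, 4))
print(round(lam_i, 4), round(ci_i, 4))
```

A C.I. of zero indicates perfectly consistent judgments; larger values indicate that an interviewee's pairwise comparisons contradict one another more strongly.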

(27) 1.3 Research Flowchart
The following flowchart shows the steps carried out in this paper.

[Fig. 1.6: Research Motivation; Literature Review; Define Research Goals; Decide Research Method; Data Collection; Data Analysis; Discussion; Conclusion: Pedagogic and Curriculum Concerns of English Language Teaching. The chart includes a decision point with YES and NO branches.]
Fig. 1.6 Research flowchart

(28) 1.4 Research Questions
The major research questions to be answered in this paper are as follows.
1. Can teachers evaluate students' test performances objectively by using GRA, and still achieve reliability consistent with that of traditional mathematical analysis methods (i.e. CTT), in small data experiments?
2. How can students' performances be clustered by using the novel GRA and GSP methods and a Matlab program?
3. How can students' concept structures and learning needs be presented in a clear, scientific way by using GSM? Are the results obtained from GSM in line with OT/IRS?
4. How can an objective method of choosing a coursebook be found by using GRA and AHP? Is it more objective and accurate than the traditional analysis method?
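For reference, the CTT baseline named in question 1 can be sketched using the item-difficulty cutoffs quoted in Section 1.1: an item answered correctly by over 90% of students is flagged as too easy, and one answered correctly by fewer than 30% as too difficult. The function name `ctt_item_flags` and the sample responses are hypothetical, and the sketch covers only this difficulty screen, not a full CTT item analysis.

```python
def ctt_item_flags(responses, easy_cutoff=0.90, hard_cutoff=0.30):
    """Flag items by the proportion of students answering correctly.

    responses: list of rows, one per student; each row holds 0/1 item scores.
    Returns a list of (item_index, proportion_correct, flag).
    """
    n_students = len(responses)
    n_items = len(responses[0])
    report = []
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_students
        if p > easy_cutoff:
            flag = "too easy"
        elif p < hard_cutoff:
            flag = "too difficult"
        else:
            flag = "acceptable"
        report.append((j, p, flag))
    return report


# Hypothetical data: 10 students x 3 items.
# Item 0: all correct; item 1: five correct; item 2: two correct.
responses = [[1, 1 if i < 5 else 0, 1 if i < 2 else 0] for i in range(10)]
report = ctt_item_flags(responses)
for item, p, flag in report:
    print("item", item, p, flag)
```

With only ten students each proportion moves in steps of 0.1, which illustrates why small samples make this kind of screen coarse and why the paper looks to GRA for small data analysis.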

(29) 1.5 Definition of Terms
To guide the reading of this paper, the terms GRA, GSP, GSM and ISM are defined before the body of the paper. In this paper, GRA is used to calculate the gamma values of the test results. Based on the gamma values, students’ test performances and test item difficulties are clustered in the GSP figure. Finally, GSM is used to present the concept hierarchy of students, and ISM is used to present the professionals’ concepts for comparison with the GSM results.

GRA
Grey Relational Analysis (GRA) can treat uncertain, multiple, discrete and incomplete information effectively (Huang, 2008; Kuo, Yang and Huang, 2008; Liang, 2011; Liang, Lee and Chen, 2009; Liang, Lee, and Liu, 2009; Liang, Lee and Weng, 2010; Sheu, Liang, You and Wen, 2010; Sheu, Wang, Liang, Tzeng and Nagai, 2010; Sheu, Liang, Wang, Tzeng and Nagai, 2010; Sheu, Tzeng, Liang, Wang and Nagai, 2010; Wen, Chang, Yeh, Wang, and Lin, 2006; Yamaguchi, Li and Nagai, 2007). GRA can not only count and quantify discrete data, but also put them in order for analysis. It uses a specific concept of information which defines situations with no information as black and those with perfect information as white (Chan and Tong, 2007; Huang, 2008; Kung and Wen, 2007; Kuo et al, 2008; Yamaguchi, Li and Nagai, 2007). However, neither of these idealized situations ever occurs in real-world problems. In fact, situations between these extremes are described as being grey, hazy or fuzzy. Therefore, a grey system is a system in which part of the information is known and part of the information is unknown (Hsia, Chen and Chang, 2004; You, Wang and Yeh, 2006). With this definition, information

(30) quantity and quality form a continuum from a total lack of information to complete information, from black through grey to white. Since uncertainty always exists, one is always somewhere in the middle, between the extremes, in the grey area, and the same holds for students’ grades and test performances (Wang et al, 2011a~2011b; Wang, Sheu, Liang, Tzeng and Nagai, 2012a~2012c).

GSP
The Grey S-P chart (GSP) combines GRA with the S-P chart. It makes the analysis more concrete and accurate and can be applied to different fields; it was first introduced by Professor Nagai in 2010 (Liang, Lee and Nagai, 2011; Sheu, Tzeng, Liang, Wang and Nagai, 2011; Sheu, Wang, Liang, Tzeng and Nagai, 2010; Wang et al, 2011a~2011b). It can be used in diverse ways to analyze uncertain factors, for example in evaluating students’ English listening performances (Liang, 2011; Liang, Lee and Nagai, 2011).

GSM
Grey Structural Modeling (GSM) originates from grey relational analysis (GRA) and is built in two steps: estimating a hierarchy and estimating paths among the elements (Liang, Sheu, Wang, Tzeng and Nagai, 2011a~2011c; Nagai, Yamaguchi and Li, 2005; Yamaguchi, Li, Mizutani, Akabane, Nagai and Kitaoka, 2007; Wang et al, 2011a~2011b; Wang et al, 2012a~2012c).

ISM
ISM is a method proposed in 1968 by John Nelson Warfield (Wang et al, 2011a~2011b). The main idea is to sort out complex systems with a structuring method, and the relationship between elements can be reached after

(31) repeating the calculation process. When ISM is applied to analyze data, it uses the hierarchical digraph of graph theory to describe the relationships between elements (Wang et al, 2011a~2011b). Finally, the hierarchy between elements is obtained and shown in a clear figure.

(32) 1.6 Summary of Research Contribution
The contributions of the paper can be summarized as follows:
1. The paper applies grey statistical analysis methods, such as GRA, GSP, and GSM, to the field of English teaching and testing. This is prospective research on cross-domain integration, and it has both originality and feasibility.
2. The students’ test performances are clustered based on the gamma values obtained from the GRA calculation, and the results are shown clearly on a scale from 0 to 1 in the GSP figure through the Matlab GUI toolbox. Also, the GSM figures present the learning concepts at different levels, which is an innovative way of presenting them. In addition, the paper draws two-dimensional figures which not only show the relationships between elements, but also present the hierarchy of each element (Wang et al, 2011a~2011b; Wang et al, 2012a~2012c).
3. The paper combines GRA and AHP to provide an innovative approach in the educational decision-making field. This method helps teachers choose an English coursebook in an objective way.
4. In short, the paper applies grey statistical analysis methods to English evaluation, which is pioneering research in the English teaching field. Moreover, the results and proposed methods in this paper could serve as a basis for future research.

(33) 1.7 Limitations of This Research
The limitations of this paper are summarized as follows:
1. The Matlab GUI function is limited by graphics resolution and cannot present the analysis process. In future research, it is suggested that it be compared with other toolboxes based on relevant theory to achieve better results.
2. Due to time limitations, there is only one experiment related to AHP theory. Besides, this is a small-scale study in which only twelve professional English teachers were involved. Thus, future research is advised to use a larger pool of participants. The AHP method could also be used in more English decision-making fields, such as recruiting new teachers, weighting English speaking assessments, and so on.
3. For test item difficulty identification, the paper only compares the results obtained from CTT and GRA. It is suggested that more educational measurement methodologies be compared with the proposed grey system theory method.
4. For knowledge structure analysis, the paper only compares the results obtained from IRS and GSM. In the future, it could be compared with other knowledge structure analysis methods, such as student’s concept structure (Liu, 2012).

(34) 1.8 Overview of This Paper
The paper comprises six chapters. Chapter 1 introduces the background and focus of the paper. Chapter 2 introduces the development of language testing, educational measurement methods, and the proposed application of grey system theory to educational measurement. Chapter 3 gives an account of the methodology of the present paper, describing the subjects, sampling method, data-collecting instrument, and methods of data analysis. Chapter 4 details the results and findings of the present paper, including students’ English grammar performances, the identification of item difficulty in an English proficiency test, and the weighting used in choosing an English coursebook. Chapter 5 discusses the research results. Chapter 6 discusses the main themes based on the research questions, followed by suggestions for future research.

(35) Chapter 2 Literature Review
In the field of educational measurement, test evaluation and test performances have been recognized as the responses among students, teachers and curriculum (Black and Wiliam, 1998; Chalhoub-Deville, 2003; Clarke and Gipps, 2000; McNamara, 1997, 2001; Shepard, 2000; Yin, 2010; Yung, 2002). However, how to measure and what to measure remain major problems. Besides, how to provide effective remedial instruction after the test is also important (Horn, McCoy, Campbell, and Brock, 2009; Leake and Lesik, 2007). In this chapter, the development of language testing is discussed first, followed by the educational measurement methods and knowledge structure analysis methods, such as OT/IRS and ISM. Then the proposed applications of grey system theory to educational measurement (e.g., GRA, GSM and GSP) are introduced. Finally, the proposed educational decision-making method of AHP is discussed.

2.1 Development in Language Testing
In the 1980s, language testing focused on aspects of discourse, sociolinguistics and context (Morrow, 1979; Widdowson, 1983). The “communicative approach” was introduced and influenced the design of language tests, that is, the aim became to make tests communicative or authentic (Alderson, 1981; Canale, 1983, 1984; Spolsky, 1985). At the same time, second language acquisition researchers investigated the effect of background knowledge (Alderson, 1981; Alderson and Urquhart, 1985) on test performance and the process of test-taking (Grotjahn, 1986). At the end of the 1980s, researchers considered

(36) learners’ language ability when designing language tests; moreover, how to interpret test scores came to seem more important, and language testing became mainstream in the field of applied linguistics (Bachman, 1990; Pinemann, Johnson and Brindley, 1988).

In the 1990s, state-of-the-art methodologies, such as IRT and structural equation modeling, replaced traditional statistical methods (e.g., factor analysis) and became the trend in language testing (Bachman and Eignor, 1997). Also, the four skills (listening, speaking, reading and writing) remained important while researchers brought in the idea of “cross-cultural pragmatics” (Bachman, 2000). Moreover, more task types were included in language assessment, for example multiple-choice items, structured oral interviews and self-assessments (Bachman, 2000; Bachman and Palmer, 1996). At this time, tests of languages for specific purposes (Wier, 1983) and new kinds of vocabulary tests were developed (Nation, 1990; Read, 1993, 1997). With the development of computer technology and IRT, it became possible to build computer-based and adaptive tests (Gruba and Corbel, 1997). In addition, Bachman (2000) points out that computer-based tests are more authentic and interactive than traditional paper-and-pencil tests. Also, the testing procedure was taken into consideration, for instance item difficulty and test item characteristics (Bachman and Palmer, 1996; Freedle and Kostin, 1999). As for performance assessment, there are two related fields: educational measurement and language teaching (McNamara, 1997; Bachman, 2000). Standardized multiple-choice tests (e.g., English proficiency tests) and standards-based assessment were also developed at this time (Bachman, 2000).

Next, according to Bachman (2000), there are two areas in today’s language

(37) testing: professionalization of the field and validation research. It is suggested that test designers need to be trained or to attend courses (Bailey and Brown, 1996; Boyd and Davies, 2002; Carless, 2005; Clarke and Hollingsworth, 2002; Lynch, 2003) to enhance their professional knowledge. For validation research, it is important for language testing researchers to attend to the interpretations and the consequences of test use (Kunnan, 1998, 2005).

In the future, more new approaches will be applied to educational measurement, and these approaches might be combined with computer-administered assessment (Bachman, 2000; Leung, 2007). In addition, language testing researchers will apply more sophisticated quantitative and qualitative methodologies in the development of language testing, for example IRT or structural equation modeling (Bachman, 2000, 2005; Bonk and Ockey, 2003; Brookhart, 2003; Kunnan, 1998; Lantolf and Poehner, 2004; McNamara, 1997; O’Loughlin, 2002). Moreover, researchers may focus on interpreting test scores (Reath, 2004), the responsibilities of test developers and users (Bishop, 2004; Boyd and Davies, 2002), and the impact and consequences of large-scale assessment (Bachman, 2005; McNamara and Roever, 2006; Reath, 2004).

According to Bachman and Palmer (2010), there has been a lack of a principled basis for linking researchers’ concerns with validity and consequences when combining qualitative and quantitative methods. Hence, this paper hopes to bring the innovative grey system theory to language testing applications. Since researchers in the language testing field want to use both quantitative and qualitative methods, the author compiles the relevant literature

(38) and shows it in Table 2.1, which makes clear that quantitative methods have been used more often in the language testing field recently.

Table 2.1 Relevant literature of language testing (Adapted from Bachman, 2007)

| Research Methods | Topic | Relevant literature |
|---|---|---|
| Quantitative approaches | Criterion-referenced measurement | Brown, 1989; Brown & Hudson, 2002; Hudson, 1991; Lynch & Davidson, 1994 |
| | Generalizability theory | Bachman, Lynch, & Mason, 1995; Bolis, Hinofotis, & Bailey, 1982; Kunnan, 1992; Schoonen, 2005; Stansfield & Kenyon, 1992 |
| | Item response theory | Bonk & Ockey, 2003; Choi & Bachman, 1992; Henning, 1984, 1992; McNamara, 1990; O'Loughlin, 2002; Weigle, 1994 |
| | Structural equation modeling | Bachman & Palmer, 1981; Choi, Kim, & Boo, 2003; Kunnan, 1998; Shin, 2005; Xi, 2005 |

(table continues)

(39) Table 2.1 (continued)

| Research Methods | Topic | Relevant literature |
|---|---|---|
| Qualitative approaches | Conversation/discourse analysis | Brown, 2003; Huhta, Kalaja, & Pitkanen-Huhta, 2006; Lazaraton, 1996, 2002; Swain, 2001 |
| | Verbal protocol analysis | Buck, 1991; Cohen, 1984; Lumley, 2002; Uiterwijk & Vallen, 2005 |
| Mixed methods | Speaking/composition | Brown, 2003; Clapham, 1996; North, 2000; O'Loughlin, 2001; Sasaki, 1996; Uiterwijk & Vallen, 2005; Weigle, 1994 |

(40) 2.2 Educational Measurement Methods
The paper considers two issues: how to interpret test scores and how to make the educational decision-making procedure efficient. To interpret assessment results (i.e., scores) and provide effective remedial instruction, the paper focuses on clustering students’ test performances and understanding their knowledge structure. Hence, the methods of OT/IRS and ISM are discussed in this section.

2.2.1 Ordering Theory and Item Relational Structure (OT/IRS)
In order to define students’ knowledge structure after they have taken the test, the ordering theory (Airasian and Bart, 1973; Bart and Krus, 1973) and the item relational structure (Takeya, 1991) are introduced as follows (Chen, Lin, Yih and Yu, 2012; Lin and Chen, 2006; Lin, Yih and Ko, 2012; Liu, Wu and Chen, 2011; Wu, Kuo and Yang, 2012; Yang, Chen and Sheu, 2006).
Step 1: Let $X = (X_1, X_2, \ldots, X_n)$ denote a vector of $n$ binary item score variables.
Step 2: Every examinee takes an $n$-item test, which yields a vector $x = (x_1, x_2, \ldots, x_n)$ of answers scored 0 (incorrect) or 1 (correct).
Next, the joint and marginal probabilities of item concepts $j$ and $k$ are displayed in Table 2.2.

(41) Table 2.2 The joint and marginal probabilities of item concepts $j$ and $k$

| | Item $k$: $X_k = 1$ | Item $k$: $X_k = 0$ | Total |
|---|---|---|---|
| Item $j$: $X_j = 1$ | $P(X_j = 1, X_k = 1)$ | $P(X_j = 1, X_k = 0)$ | $P(X_j = 1)$ |
| Item $j$: $X_j = 0$ | $P(X_j = 0, X_k = 1)$ | $P(X_j = 0, X_k = 0)$ | $P(X_j = 0)$ |
| Total | $P(X_k = 1)$ | $P(X_k = 0)$ | 1 |

Based on Table 2.2, the ordering theory (OT) is defined as follows:

$$\varepsilon^{*}_{jk} = P(X_j = 0, X_k = 1) < \varepsilon$$

where $\varepsilon$ is a threshold between 0.02 and 0.04 (Airasian and Bart, 1973). If the inequality holds, the probability of answering item $j$ incorrectly while answering item $k$ correctly is below the threshold, and the relationship between the two items is written $X_j \rightarrow X_k$.

On the other hand, the item relational structure (IRS), proposed by Takeya (1991), uses another index $r^{*}_{jk}$ to define the ordering relationship between item $j$ and item $k$. Here $r$ is used as the threshold, and the index is defined as

$$r^{*}_{jk} = 1 - \frac{P(X_j = 0, X_k = 1)}{P(X_j = 0)\,P(X_k = 1)} > r$$

If $r^{*}_{jk} > r$, concept $j$ can be linked forward to concept $k$, and the relationship is written $X_j \rightarrow X_k$.
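As an illustration of the two indices, the following minimal Python sketch estimates $\varepsilon^{*}_{jk}$ and $r^{*}_{jk}$ from a 0/1 response matrix by relative frequencies. The data set `responses`, the function names, and the threshold values (0.03 for OT, taken from the 0.02 to 0.04 range above, and 0.5 for IRS) are assumptions made for this example and are not taken from the thesis.

```python
from itertools import permutations

# Hypothetical 0/1 responses: rows are examinees, columns are items 0, 1, 2.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 0],
]


def ordering_indices(rows, j, k):
    """Return (eps_jk, r_jk) for items j and k, using relative frequencies."""
    n = len(rows)
    p_j0_k1 = sum(1 for row in rows if row[j] == 0 and row[k] == 1) / n
    p_j0 = sum(1 for row in rows if row[j] == 0) / n
    p_k1 = sum(1 for row in rows if row[k] == 1) / n
    eps_jk = p_j0_k1
    # r*_jk is undefined when a marginal probability is zero; signal that with None.
    r_jk = None if p_j0 == 0 or p_k1 == 0 else 1 - p_j0_k1 / (p_j0 * p_k1)
    return eps_jk, r_jk


def ordered_pairs(rows, eps_threshold=0.03, r_threshold=0.5):
    """List the item pairs (j, k) with X_j -> X_k under OT and under IRS."""
    n_items = len(rows[0])
    ot, irs = [], []
    for j, k in permutations(range(n_items), 2):
        eps_jk, r_jk = ordering_indices(rows, j, k)
        if eps_jk < eps_threshold:
            ot.append((j, k))
        if r_jk is not None and r_jk > r_threshold:
            irs.append((j, k))
    return ot, irs


ot_pairs, irs_pairs = ordered_pairs(responses)
print("OT  orderings:", ot_pairs)
print("IRS orderings:", irs_pairs)
```

In this toy data set no examinee answers a later item correctly after missing an earlier one, so both criteria order the items as 0, then 1, then 2. With real test data the two criteria need not agree, since IRS also takes the marginal probabilities into account.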

(42) 2.2.2 Interpretive Structural Modeling
In order to define experts’ knowledge structure, the method of Interpretive Structural Modeling (ISM) is discussed in this section. In mathematics, $S = \{s_1, s_2, \ldots, s_n\}$ denotes a set in which the relation between $s_i$ and $s_j$ is written $s_i R s_j$. This is known as a binary relation, and $R(s_i, s_j)$ is used to express the magnitude of $s_i R s_j$ (Janes, 1988; Kung, Hsieh and Yan, 2011; Lee, 2012; Lee and Chen, 2010; Nagai, Chung and Tsai, 2002; Nagai et al, 2005). There are three basic properties for binary relations (Kung et al, 2011; Wen et al, 2010; Zhong, Zhang and Lin, 2007).
1. Reflexivity: $s_i R s_i, \ \forall s_i \in S$
2. Symmetry: $s_i R s_j \Rightarrow s_j R s_i, \ \forall s_i, s_j \in S$
3. Transitivity: $s_i R s_k, \ s_k R s_j \Rightarrow s_i R s_j, \ \forall s_i, s_j, s_k \in S$

ISM is a method proposed in 1968 by John Nelson Warfield (Wang, Wang, Wen, Nagai and Liang, 2011). The main idea is to sort out complex systems with a structuring method. In practice, $n$ elements form a set $S = \{s_1, s_2, \ldots, s_n\}$. Then the direct product (cross product) of $S$ with itself is $S \times S = \{(s_i, s_j) \mid s_i, s_j \in S\}$ (Yamaguchi, Li, Mizutani, Akabane, Nagai and Kitaoka, 2007).

According to Sheu and Lin (1994), if there are $k$ elements in the system and the binary relation between elements $A_i$ and $A_j$ is known, then the relationship is defined as $A = (a_{ij})_{k \times k}$. When $a_{ij} = 1$, it means that $A_i$ is a

(43) prerequisite to $A_j$. If $a_{ij} = 0$, it means that $A_i$ is not a prerequisite to $A_j$. For example, assume that a matrix $A$ exists and is constructed as follows.

Step 1: Obtain the relation matrix.

$$A = [a_{ij}] = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}, \quad \text{where } a_{ij} = \begin{cases} 1 & \text{if } (s_i, s_j) \in R \\ 0 & \text{if } (s_i, s_j) \notin R \end{cases}$$

Step 2: Apply the calculation rules of ISM.
1. Find the value of $(A + I)$, where $I$ is the identity matrix.
2. Calculate the powers of $(A + I)$, using Boolean arithmetic and repeating the calculation until the matrix stops changing (Wang, Wang, Wen, Nagai and Liang, 2011). The reachable matrix is then obtained as follows:

$$(A + I)^{n-2} \neq (A + I)^{n-1} = (A + I)^{n} = T \tag{2-1}$$

where $T$ is the reachable matrix. In our example the matrix stops changing from the second power onward:

$$(A + I) = \begin{pmatrix} 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$

$$(A + I)^{2} = (A + I)^{3} = \begin{pmatrix} 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$

so that $(A + I) \neq (A + I)^{2} = (A + I)^{3}$ and $T = (A + I)^{2}$.

(44) Step 3: After the calculation, the ISM figure can be established, as shown in Fig. 2.1.
[Figure: hierarchical digraph of elements 5, 3, 4, 1 and 2.]
Fig. 2.1 An example of ISM hierarchy.
Besides, Wang et al (2011a) have used the ISM method to provide the concept structures of English grammar test items and of students’ learning concept performances.
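The reachable-matrix calculation of Step 2 can be sketched in a few lines of Python. This is a minimal illustration rather than the toolbox used in the thesis: the function names are hypothetical, and the adjacency matrix `A` is read off the five-element example above by removing the diagonal of $(A + I)$.

```python
def boolean_matmul(a, b):
    """Boolean matrix product: entry (i, j) is 1 if some t has a[i][t] and b[t][j]."""
    n = len(a)
    return [
        [int(any(a[i][t] and b[t][j] for t in range(n))) for j in range(n)]
        for i in range(n)
    ]


def reachability(adjacency):
    """Return (T, power), where T = (A + I)^power is the first power that repeats."""
    n = len(adjacency)
    base = [[1 if i == j else adjacency[i][j] for j in range(n)] for i in range(n)]
    current, power = base, 1
    while True:
        following = boolean_matmul(current, base)
        if following == current:  # the matrix has stopped changing
            return current, power
        current, power = following, power + 1


# Adjacency matrix of the example: a[i][j] = 1 means element i+1 is a
# prerequisite to element j+1 (1 -> 3, 2 -> 3, 3 -> 5, 4 -> 5).
A = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
]

T, power = reachability(A)
for row in T:
    print(row)
print("stable from power", power)
```

The loop stops as soon as one more Boolean multiplication leaves the matrix unchanged, which is the stopping rule stated in Eq. (2-1). A production implementation might prefer Warshall's algorithm for large systems, but repeated multiplication mirrors the steps described in the text.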

(45) 2.3 The Grey System Theory Application in the Educational Measurement Field
In this section, the proposed grey system theory methods (i.e., GRA, GSP and GSM) are introduced in the context of education. Next, the AHP method combined with the GRA method (called GRA-AHP) is introduced for educational decision making.

2.3.1 Grey Relational Analysis and Grey Student-Problem Chart
The grey system theory was proposed by Deng in 1982. It addresses systems whose internal information is insufficient or incomplete, and it can be used for relational analysis (Liang et al, 2011a~2011c; Lin and Wen, 2009; Wang et al, 2011a~2011b; Wang, Sheu and Nagai, 2011; Wang et al., 2012a~2012c; Yamaguchi, Li and Nagai, 2005). Grey system theory takes discrete, irregular data, applies accumulated generation to obtain a new data series with more regular behaviour, and then establishes a differential-equation model to fit the data (Lee, Liang and Nagai, 2012; Liang et al, 2011a~2011c; Nagai et al, 2005; Wang et al, 2011a~2011b; Wang et al, 2012a~2012c; Wen, You, Nagai, Chang and Liang, 2010).

GRA is an important approach within grey system theory because it not only clusters data which have the same features, but also measures their relationships (Hwu, Liang, Chiang, Chu and Nagai, 2012; Lai and Chu, 2011; Li, Masuda and Nagai, 2011; Lin and Wu, 2010). The procedure of GRA generation is summarized as follows (Chen, Chen and Cho, 2011; Hsu, Ken and Lein, 2008; Hwu et al, 2012; Lai, Chen, Chen, Yeh and Cheng, 2009;

(46) Nagai et al, 2005; Wang et al, 2011a~2011b; Wang et al, 2012a~2012c; Wen et al, 2010):

Step 1: Establish raw data. In the GRA space $\{P(X); \Gamma\}$, there is a vector:

$$x_i = (x_i(1), x_i(2), x_i(3), \ldots, x_i(k)) \tag{2-2}$$

where $i = 0, 1, 2, 3, \ldots, n$ and $k = 1, 2, 3, \ldots, m$ (Wen et al, 2010). The details are shown as follows:

$$x_0 = (x_0(1), x_0(2), x_0(3), \ldots, x_0(k), \ldots, x_0(m)), \quad k = 1, 2, 3, \ldots, m \tag{2-3}$$

$$\begin{aligned} x_1 &= (x_1(1), x_1(2), x_1(3), \ldots, x_1(k), \ldots, x_1(m)) \\ x_2 &= (x_2(1), x_2(2), x_2(3), \ldots, x_2(k), \ldots, x_2(m)) \\ &\ \ \vdots \\ x_i &= (x_i(1), x_i(2), x_i(3), \ldots, x_i(k), \ldots, x_i(m)) \\ &\ \ \vdots \\ x_n &= (x_n(1), x_n(2), x_n(3), \ldots, x_n(k), \ldots, x_n(m)) \end{aligned} \qquad i = 1, 2, 3, \ldots, n \tag{2-4}$$

If only $x_0$ is used as the reference sequence and the rest are inspected sequences, the analysis is called “Localization GRA.” If any sequence $x_i$ can serve as the reference sequence, it is called “Globalization GRA.”

Step 2: Grey relational calculation.

$$\Gamma_{0i} = \Gamma(x_0(k), x_i(k)) = \frac{\bar{\Delta}_{\max} - \bar{\Delta}_{0i}}{\bar{\Delta}_{\max} - \bar{\Delta}_{\min}} \tag{2-5}$$

where $\bar{\Delta}_{0i} = \|x_{0i}\|_{\rho} = \left( \sum_{k=1}^{m} [\Delta_{0i}(k)]^{\rho} \right)^{1/\rho}$, $\Delta_{0i}(k) = |x_0(k) - x_i(k)|$ is the absolute difference between the reference and inspected sequences at point $k$, and $\bar{\Delta}_{\max}$ and $\bar{\Delta}_{\min}$ represent the maximum and minimum values of $\bar{\Delta}_{0i}$, respectively. When $\rho = 1, 2, 3, \ldots, m$, it is called Minkowski GRA (Nagai et al, 2005; Wen et al, 2010).

In this paper, GRA calculation is used to reach the gamma value, which is

(47) between 0 and 1. Moreover, it is used to cluster students’ performances and to find objective solutions in educational decision making. To sum up, several publications have indicated that GRA calculation is reliable for educational assessment (please see Wang et al, 2011a~2011b; Wang et al, 2012a~2012c).

Next, the Student-Problem chart analysis (S-P chart analysis) was invented by Takahiro Sato, who was concerned with the differences in response data obtained from student answers; the students’ responses are displayed graphically (Sato, 1975, 1980, 1985). Abnormal performances of students or problems can be diagnosed through the S-P chart, and teachers can use the results to diagnose the learning outcomes of learners (Harnisch, 1984; Harnisch and Linn, 1981). Four indices, namely the disparity index, the student caution index (CS), the problem caution index (CP), and the homogeneity index, can be found in the S-P chart, and these indices help teachers diagnose student learning conditions, instructional achievement, and problem quality (Sato, 1980; Wu, 1999; Yih and Lin, 2010). Moreover, a performance profile curve of an individual student can be drawn using the analyzed S-P chart data (Yu and Yu, 2006). In the end, teachers can provide remedial instruction and clear guidance for students based on this information.

Table 2.3 shows the matrix of the original data for 10 students and 10 problems. In Table 2.3, a cell is marked 1 when the student answers the problem correctly and 0 when the student answers it incorrectly.

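(48) Table 2.3 Example of original data S-P chart

| Students | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | correct |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 7 |
| 2 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 5 |
| 3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 10 |
| 4 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 7 |
| 5 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 6 |
| 6 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 7 |
| 7 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 7 |
| 8 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 7 |
| 9 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 8 |
| 10 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 8 |
| number | 4 | 6 | 9 | 10 | 3 | 10 | 9 | 6 | 7 | 8 | |

After all the test information is recorded, the students and problems are sorted from high to low (0 to 1), and the sorted results are shown in Table 2.4.

Table 2.4 Sorted S-P chart of Table 2.3

| Students | P4 | P6 | P3 | P7 | P10 | P9 | P2 | P8 | P1 | P5 | correct | % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 10 | 100 |
| 9 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 8 | 80 |
| 10 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 8 | 80 |
| 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 7 | 70 |
| 4 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 7 | 70 |
| 6 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 7 | 70 |
| 7 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 7 | 70 |
| 8 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 7 | 70 |
| 5 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 6 | 60 |
| 2 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 5 | 50 |
| number | 10 | 10 | 9 | 9 | 8 | 7 | 6 | 6 | 4 | 3 | | |
| % | 100 | 100 | 90 | 90 | 80 | 70 | 60 | 60 | 40 | 30 | | |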
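The sorting step that turns Table 2.3 into Table 2.4 can be sketched as follows. This is a minimal Python illustration, not the procedure used in the thesis: the function name is hypothetical, and breaking ties by the lower ID number is an assumption, although it happens to reproduce the row and column order of Table 2.4.

```python
# Responses from Table 2.3: rows are students 1..10, columns are problems 1..10.
RESPONSES = [
    [0, 1, 1, 1, 0, 1, 0, 1, 1, 1],
    [0, 0, 1, 1, 0, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 1, 1, 1, 1, 0, 1, 1],
    [0, 1, 1, 1, 0, 1, 1, 0, 0, 1],
    [0, 1, 1, 1, 0, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 1, 1, 1, 0, 1],
    [0, 1, 1, 1, 0, 1, 1, 0, 1, 1],
    [1, 1, 0, 1, 0, 1, 1, 1, 1, 1],
    [1, 0, 1, 1, 1, 1, 1, 1, 1, 0],
]


def sort_sp_chart(matrix):
    """Sort students (rows) and problems (columns) by total correct, high to low."""
    n_students, n_problems = len(matrix), len(matrix[0])
    student_totals = [sum(row) for row in matrix]
    problem_totals = [sum(row[j] for row in matrix) for j in range(n_problems)]
    # Ties are broken by the lower index (an assumption made for this sketch).
    student_order = sorted(range(n_students), key=lambda i: (-student_totals[i], i))
    problem_order = sorted(range(n_problems), key=lambda j: (-problem_totals[j], j))
    sorted_matrix = [[matrix[i][j] for j in problem_order] for i in student_order]
    return student_order, problem_order, sorted_matrix


student_order, problem_order, sorted_matrix = sort_sp_chart(RESPONSES)
student_ids = [i + 1 for i in student_order]   # back to 1-based student numbers
problem_ids = [j + 1 for j in problem_order]   # back to 1-based problem numbers
print("students:", student_ids)
print("problems:", problem_ids)
for sid, row in zip(student_ids, sorted_matrix):
    print(sid, row, sum(row))
```

After sorting, correct answers concentrate in the upper-left corner of the matrix, which is what makes the S-curve and P-curve readable on the chart.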

(49) According to Table 2.4, students with higher scores are in the upper part of the S-P chart, and the upper-left part shows the problems answered correctly by the students. Besides, the S-curve (solid line) and the P-curve (dotted line) can be drawn. The function of the S-P chart is to evaluate learning progress and provide a remedial plan to improve the curriculum and teaching, based on Bloom’s learning evaluation (Chacko, 1998; Chen, Lai and Liu, 2005): diagnostic evaluation, formative evaluation and summative evaluation. The S-P chart can be applied not only to diagnostic evaluation during the learning process, but also to formative evaluation.

Nagai first introduced the Grey Student-Problem chart in 2010. It combines GRA with the S-P chart, which not only makes the analysis more concrete and accurate, but also allows uncertain factors to be analyzed (Sheu et al, 2011; Wang et al, 2011a~2011b; Wang, Sheu and Nagai, 2011; Wang et al, 2012c). Using the equations, the GSP produces a readable chart effectively and finds the weighting or ordinal numbers among the discrete data. The GSP has been applied to the educational assessment of English listening performances, product design and product professional courses, defining curriculum assessment results on the basis of grey relational analysis (Sheu, Liang, Wang, Tzeng and Nagai, 2010; Sheu, Tzeng, Liang, Wang and Nagai, 2010; Sheu, Wang, Liang, Tzeng and Nagai, 2010). To sum up, it is an effective way to deal with complicated factors and cause-effect analysis (Sheu et al, 2011; Wang et al, 2011a~2011b; Wang et al, 2012c).

The algorithm of GSP chart formation is as follows:
Step 1: Construct the decision matrix and the grey relational construction.

(50) Step 2: Normalize the data of the decision matrix and follow three principles of establishing the sequences:
1. Non-dimension: the factor of the sequence does not have units;
2. Scaling: the factor of the sequence values should be less than 100;
3. Polarization: the factor of the sequence description should be in the same direction.

According to the above processes, the GSP chart in this paper is shown in Fig. 2.2.
[Figure: GSP chart framework. Rows: student IDs $GS_i$, $i = 1 \sim m$, ordered from high to low. Columns: test items $GP_j$, $j = 1 \sim n$, ordered from most to least (recognized number). Cells: responses $Y = [y_{ij}]$. Margins: overall score and gamma value for students (LGRA-S) and gamma value for test items (LGRA-P).]
Fig. 2.2 Grey S-P chart framework.

In Fig. 2.2, $S_i$ refers to student ID numbers, $P_j$ refers to test items, and $Y = [y_{ij}]$ refers to students’ responses to test items. When $y_{ij} = 0$, the student answers incorrectly; $y_{ij} = 1$ means the student answers correctly. By using the proposed GSP chart, it is possible to make up for some of the

(51) shortcomings of the S-P chart, which are summarized as follows (Wang et al, 2011a~2011b; Wang et al, 2012a~2012c):
1. Through the GRA calculation, the data lie in the interval between 0 and 1 in the GSP chart.
2. There is no restriction on the number of students and problems in the GSP chart.
3. The traditional S-P chart cannot distinguish between students at the same level; the proposed GSP gamma distribution figure makes up for this defect.

The GSP method has also been shown to be a reliable and innovative method in some educational publications (please see Wang et al, 2011a~2011b; Wang et al, 2012a~2012c). For example, based on Table 2.3, the students’ GSP matrix can be established as in Table 2.5.

Table 2.5 Example of students’ matrix of Table 2.3

| Students | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | gamma value |
|---|---|---|---|---|---|---|---|---|---|---|---|
| larger-the-better | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | |
| S1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0.225 |
| S2 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0.000 |
| S3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1.000 |
| S4 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0.225 |
| S5 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0.106 |
| S6 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0.225 |
| S7 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0.225 |
| S8 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0.225 |
| S9 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0.368 |
| S10 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0.368 |

(52) After transforming the data in Table 2.5, the matrix of problems can be established in Table 2.6.

Table 2.6 Example of problem’s matrix of Table 2.3

| Problems | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 | S10 | gamma value |
|---|---|---|---|---|---|---|---|---|---|---|---|
| larger-the-better | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | |
| P1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0.074 |
| P2 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0.244 |
| P3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0.622 |
| P4 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1.000 |
| P5 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0.000 |
| P6 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1.000 |
| P7 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0.622 |
| P8 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0.244 |
| P9 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0.345 |
| P10 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0.466 |

Finally, the GSP chart can be established based on the information in Table 2.5 and Table 2.6, and the example of GSP figure is presented in Fig. 2.3.
[Figure: GSP chart plotting the gamma values (vertical axis, 0.0 to 1.0) of the students and problems listed in Tables 2.5 and 2.6.]
Fig. 2.3 The example of GSP
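The gamma columns of Tables 2.5 and 2.6 can be checked with a short calculation. The sketch below is a minimal Python illustration under stated assumptions: it uses localized GRA with the all-ones "larger-the-better" row as the reference sequence, takes the distance to be the Minkowski norm of the absolute differences with $\rho = 2$, and uses hypothetical function and variable names. Under these assumptions the computed values agree with the tabulated gamma values to within about 0.001.

```python
# Responses from Table 2.3: rows are students S1..S10, columns are problems P1..P10.
RESPONSES = [
    [0, 1, 1, 1, 0, 1, 0, 1, 1, 1],  # S1
    [0, 0, 1, 1, 0, 1, 1, 1, 0, 0],  # S2
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],  # S3
    [0, 0, 1, 1, 1, 1, 1, 0, 1, 1],  # S4
    [0, 1, 1, 1, 0, 1, 1, 0, 0, 1],  # S5
    [0, 1, 1, 1, 0, 1, 1, 0, 1, 1],  # S6
    [1, 0, 1, 1, 0, 1, 1, 1, 0, 1],  # S7
    [0, 1, 1, 1, 0, 1, 1, 0, 1, 1],  # S8
    [1, 1, 0, 1, 0, 1, 1, 1, 1, 1],  # S9
    [1, 0, 1, 1, 1, 1, 1, 1, 1, 0],  # S10
]


def localized_gra(reference, sequences, rho=2):
    """Grey relational grades of each sequence against one reference sequence."""
    distances = [
        sum(abs(r - x) ** rho for r, x in zip(reference, seq)) ** (1 / rho)
        for seq in sequences
    ]
    d_max, d_min = max(distances), min(distances)
    if d_max == d_min:  # all sequences are equally far from the reference
        return [1.0 for _ in distances]
    return [(d_max - d) / (d_max - d_min) for d in distances]


reference = [1] * 10  # the "larger-the-better" row

# Students: each row of the response matrix is one inspected sequence.
student_gamma = localized_gra(reference, RESPONSES)

# Problems: transpose the matrix so that each problem becomes a sequence.
problem_rows = [list(column) for column in zip(*RESPONSES)]
problem_gamma = localized_gra(reference, problem_rows)

for i, g in enumerate(student_gamma, start=1):
    print(f"S{i}: {g:.3f}")
for j, g in enumerate(problem_gamma, start=1):
    print(f"P{j}: {g:.3f}")
```

Because every response is 0 or 1, the distance of a student from the reference depends only on the number of wrong answers, so students with equal raw scores receive equal gamma values in this example.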

(53) 2.3.2 Grey Structural Modeling
Grey Structural Modeling (GSM) originates from grey relational analysis (GRA) and is built in two steps: estimating a hierarchy and estimating paths among the elements (Hwu et al, 2012; Liang et al, 2011a~2011c; Nagai et al, 2005; Wang et al, 2011a~2011b; Wen, 2004; Yamaguchi, Kobayashi, Mizutani and Nagai, 2004; Yamaguchi, Li, Mizutani, Akabane, Nagai and Kitaoka, 2007). The generation of GSM is described below (Nagai et al, 2005; Wang et al, 2011a~2011b):

Step 1: $S$ is a set and $s_i, s_j$ are given elements in $S$. The matrix of $S$ is as follows:

$$S = \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1m} \\ s_{21} & s_{22} & \cdots & s_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ s_{m1} & s_{m2} & \cdots & s_{mm} \end{pmatrix} \tag{2-6}$$

where $i, j = 1, 2, 3, \ldots, m$ and $0 \le s_{ij} \le 1$ (Nagai et al, 2005).

Step 2: In order to find the path of elements, the grey relational analysis is applied as follows:

$$\Gamma_{ij} = 1 - \frac{\|x_i - x_j\|_{\rho}}{\max_i \max_j \|x_i - x_j\|_{\rho}} \tag{2-7}$$

where $\rho = 2$.

Step 3: Set the hierarchical class information. Let $C_i$ be the hierarchical class set; each $C_i$ is given as follows (Nagai et al, 2005):

$$C_i = \{ s_j \mid e_{ij} \le \theta \} \tag{2-8}$$

where $\theta$ denotes the class threshold.

(54) For $i = j$, $e_{ij} = 0$. Then the matrix is shown as follows:

$$E = \begin{pmatrix} e_{11} & e_{12} & \cdots & e_{1m} \\ e_{21} & e_{22} & \cdots & e_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ e_{m1} & e_{m2} & \cdots & e_{mm} \end{pmatrix} \tag{2-9}$$

Step 4: Path information is given as follows (Nagai et al, 2005):

$$P = \{ (s_i, s_j) \mid \Gamma_{ij} \ge \psi, \ \Gamma_{0i} \le \Gamma_{0j} \} \tag{2-10}$$

where $\psi$ is the path coefficient $(0 \le \psi \le 1)$ and $\Gamma$ is the grey relational matrix, defined as follows:

$$\Gamma = \begin{pmatrix} \Gamma_{11} & \Gamma_{12} & \cdots & \Gamma_{1m} \\ \Gamma_{21} & \Gamma_{22} & \cdots & \Gamma_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \Gamma_{m1} & \Gamma_{m2} & \cdots & \Gamma_{mm} \end{pmatrix} \tag{2-11}$$

Moreover, the GSM method has been shown to be an innovative and reliable method for presenting the structure among elements in some publications (please see Liang, 2011; Liang, Lee and Nagai, 2011; Wang et al, 2011b).
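The pairwise grey relational matrix of Step 2 and the thresholding by the path coefficient in Step 4 can be illustrated with a small Python sketch. It covers only those two pieces: the hierarchy of Step 3 and the direction of each path are left out, the three data vectors and the value $\psi = 0.4$ are hypothetical, and the function names are assumptions made for this example.

```python
from itertools import combinations


def grey_relational_matrix(vectors, rho=2):
    """Pairwise grades: 1 - ||x_i - x_j|| / max ||x_i - x_j|| (Minkowski norm, order rho)."""
    n = len(vectors)
    dist = [
        [
            sum(abs(a - b) ** rho for a, b in zip(vectors[i], vectors[j])) ** (1 / rho)
            for j in range(n)
        ]
        for i in range(n)
    ]
    d_max = max(max(row) for row in dist)
    if d_max == 0:  # all vectors identical: every pair is fully related
        return [[1.0] * n for _ in range(n)]
    return [[1 - dist[i][j] / d_max for j in range(n)] for i in range(n)]


def candidate_links(gamma, psi):
    """Unordered pairs whose grade reaches the path coefficient psi."""
    return [(i, j) for i, j in combinations(range(len(gamma)), 2) if gamma[i][j] >= psi]


# Hypothetical element vectors (for instance, response patterns of three concepts).
vectors = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]

gamma = grey_relational_matrix(vectors)
links = candidate_links(gamma, psi=0.4)
for row in gamma:
    print([round(g, 3) for g in row])
print("candidate links:", links)
```

Raising the path coefficient keeps only the most strongly related pairs, so it controls how dense the resulting structural figure becomes.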

(55) 2.3.3 Analytic Hierarchy Process Combined with GRA
The Analytic Hierarchy Process (AHP) was proposed by the American operations researcher Thomas L. Saaty in the 1970s. It is a multi-objective decision analysis method which combines the quantitative and the qualitative and transforms the decision-maker’s experience and judgment into numerical values (Saaty, 1986, 1990, 2008). It is a practical method which uses a hierarchy to determine weight coefficients. For example, in multi-objective problems each objective differs in importance. The comparison and quantification of the objectives’ importance by the decision-maker is called “value trade-off,” which is ultimately reflected in the “weighting coefficients” of each objective or in the “price” the decision-maker is “willing to pay” to achieve a certain goal (Saaty, 2008). AHP can therefore be used to solve many such problems (Saaty, 2008).

Let $w_1, w_2, \ldots, w_n$ be the weights of the elements $A_1, A_2, \ldots, A_n$ of one level with respect to the previous level, and let $a_{ij}$ denote the degree of importance of $A_i$ relative to $A_j$. The comparison matrix of the elements $A_1, A_2, \ldots, A_n$ is $A = [a_{ij}]$. When $w_1, w_2, \ldots, w_n$ are known, the matrix $A$ of the AHP table, with rows and columns indexed by $A_1, A_2, \ldots, A_n$, is as follows:

$$A = [a_{ij}] = \begin{pmatrix} w_1/w_1 & w_1/w_2 & \cdots & w_1/w_n \\ w_2/w_1 & w_2/w_2 & \cdots & w_2/w_n \\ \vdots & \vdots & \ddots & \vdots \\ w_n/w_1 & w_n/w_2 & \cdots & w_n/w_n \end{pmatrix} = \begin{pmatrix} 1 & a_{12} & \cdots & a_{1n} \\ 1/a_{12} & 1 & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 1/a_{1n} & 1/a_{2n} & \cdots & 1 \end{pmatrix}$$

where $a_{ij} = \dfrac{w_i}{w_j}$, $a_{ji} = \dfrac{1}{a_{ij}}$, $W = [w_1, w_2, \ldots, w_n]^{T}$, $i, j = 1, 2, 3, \ldots, n$, and $a_{ij} \cdot a_{jk} = a_{ik}$ $(i, j, k = 1, 2, 3, \ldots, n)$.

Saaty (1986, 1990, 2008) developed the following steps for applying the AHP:

Step 1: Define the problem and determine its goal, then create the level model of the system by structuring the hierarchy from the top (the objectives from a decision-maker's viewpoint) through the intermediate levels.

Step 2: Construct a set of pair-wise comparison matrices (size $n \times n$) for each of the lower levels, with one matrix for each element in the level immediately above, using the relative scale measurement shown in Table 2.7. The pair-wise comparisons are made in terms of which element dominates the other. $\dfrac{n(n-1)}{2}$ judgments are required to develop the set of matrices in this step. Reciprocals are automatically assigned in each pair-wise comparison.

Step 3: Hierarchical synthesis is now used to weight the eigenvectors by the weights of the criteria, and the sum is taken over all weighted eigenvector entries corresponding to those in the next lower level of the hierarchy.

Step 4: Having made all the pair-wise comparisons, the consistency is determined by using the eigenvalue. The maximum eigenvalue ($\lambda_{\max}$) is used to calculate the consistency index, C.I., as follows:

$$C.I. = \frac{\lambda_{\max} - n}{n - 1}$$

where $n$ is the matrix size. Judgment consistency can be checked by taking the consistency ratio (C.R.) of C.I. with the appropriate value in Table 2.7 ($C.I. \le 0.1$).

Table 2.7 Pair-wise comparison scale for AHP preferences

Numerical rating    Verbal judgments of preferences
9                   Extremely preferred
8                   Very strongly to extremely
7                   Very strongly preferred
6                   Strongly to very strongly
5                   Strongly preferred
4                   Moderately to strongly
3                   Moderately preferred
2                   Equally to moderately
1                   Equally preferred

Combining the AHP method with the GRA method has been reported in several publications to be reliable for the decision-making process (see Liang et al., 2011b–2011c; Wang et al., 2011b).
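The weight and consistency calculations in Steps 3 and 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the procedure implemented in the thesis. Weights are taken from the principal eigenvector of the pair-wise comparison matrix and normalized to sum to 1, and the consistency index is computed as (λ_max − n) / (n − 1). The function names `matrix_from_weights` and `ahp_weights` and the sample judgments are hypothetical.

```python
import numpy as np


def matrix_from_weights(w):
    """Build the perfectly consistent comparison matrix a_ij = w_i / w_j."""
    w = np.asarray(w, dtype=float)
    return w[:, None] / w[None, :]


def ahp_weights(A):
    """Return (weights, lambda_max, consistency index) for a reciprocal matrix A."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    k = int(np.argmax(eigvals.real))      # index of the principal eigenvalue
    lam_max = float(eigvals[k].real)
    w = np.abs(eigvecs[:, k].real)        # principal eigenvector, made non-negative
    w = w / w.sum()                       # normalize so the weights sum to 1
    ci = (lam_max - n) / (n - 1)          # C.I. = (lambda_max - n) / (n - 1)
    return w, lam_max, ci


# Case 1: a perfectly consistent matrix built from hypothetical weights.
true_w = np.array([0.6, 0.3, 0.1])
w1, lam1, ci1 = ahp_weights(matrix_from_weights(true_w))

# Case 2: hypothetical judgments on the 1-9 scale (3 criteria, so
# n(n-1)/2 = 3 judgments; reciprocals fill the lower triangle).
judgments = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 3.0],
    [1 / 5, 1 / 3, 1.0],
]
w2, lam2, ci2 = ahp_weights(judgments)
print(w1, ci1)
print(w2, lam2, ci2)
```

In the first case the recovered weights match the ones used to build the matrix and C.I. is zero up to rounding, because $a_{ij} \cdot a_{jk} = a_{ik}$ holds exactly. In the second case the judgments are slightly inconsistent, so $\lambda_{\max}$ exceeds $n$ and C.I. is positive.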

Chapter 3 Experimental Methods

Based on the methodologies described in Chapter 2, this chapter explains the framework of the research, followed by the research instrument and the research method.

3.1 Research Design

In accordance with the research motivation, research purpose and methodologies, the research framework of this study is shown in Fig. 3.1.

[Fig. 3.1 Framework of the thesis. Flowchart with the node labels: English Test; English coursebook chosen; CTT; GRA/GSP; ISM; OT/IRS; GRA; GSP; GSM; AHP; Test item difficulties / Test design review; Experts' Concept Structure; Students' Item Response Structure; Optimal remedial instruction path; Educational decision-making evaluation.]

Fig. 3.1 shows the framework of the study, which starts with the English tests, comprising grammar tests and reading tests. The gamma values of the test items are then calculated with the Matlab toolbox (Wen et al., 2006; Wen et al., 2010). Based on the gamma values, the GSM structure of the test items can be established, which shows the hierarchy of the test items and the relationships between them. Finally, test designers can review the test items and decide whether to revise the test or provide remedial instruction. Moreover, the study also applies GRA to the field of educational decision-making; that is, it seeks an objective method for choosing an English coursebook. By combining GRA and AHP, the experts' opinions are analyzed in an objective, scientific way.

3.2 Significance of the Research

The originality of this research can be summarized as follows:

1. The students' test performances are clustered based on the gamma values obtained from the GRA calculation, and the results are shown clearly in the GSP figure through the Matlab GUI toolbox. In addition, the gamma values lie between 0 and 1, and GSP can be applied to both large and small data sets, which is an advantage over the traditional S-P chart.

2. The GSM figures present the learning concepts at different levels, which is an innovative approach. In addition, the paper draws two-dimensional figures that show not only the relationships between elements but also the hierarchy of each element (Wang et al., 2011a–2011d; Wang et al., 2012a–2012c), which supports the view that this is prospective cross-domain research with originality and feasibility.

3. Engineering software is seldom used to assist data analysis in the current English teaching field. Past software development has produced some achievements, but the use of a Matlab GUI toolbox to cluster students' test performances has not been discussed. The toolbox proposed in this paper aims to assist analysis and validation. It is not only user-friendly but also has powerful analytical functions.

4. The Matlab GUI toolbox shows both the students' test performances and the test item difficulties in a user-friendly interface. This novel approach is efficient, and it is suggested as an information science tool for the educational field.

5. The paper uses the Grey-AHP method (GRA and AHP) to evaluate the English coursebook selection process in a more objective way. It presents the coordinates' distance from the origin. Compared with traditional mathematical methods, the gamma values obtained by GRA can be positioned and sorted correctly. The results also clearly present the weighting of each professional and of each English coursebook selection criterion. This method is also suggested for other decision-making fields, such as business or medicine.

3.3 Research Instrument

The research instrument and the relevant experimental tests are shown in Table 3.1.

Table 3.1 Research instrument and the relevant experimental tests

Instrument
• English Proficiency Test (one-hour test), including 30 listening questions, 10 grammar questions and 30 reading questions (see Appendix 1 & Appendix 2)
• Matlab GUI toolbox
• English coursebook selection criteria

Relevant experimental tests
1. Concept diagnosis of English grammar
2. Evaluate English test item difficulties
3. Evaluate the English coursebook selection process (weighting)

In the first experimental test, four verb concepts are included in the grammar test; they are introduced in Table 3.2.

Table 3.2 The verb concepts in the grammar test (adapted from Harmer, 2007)

Verb forms                              Examples
Present tense                           I go to work every day.
                                        He always goes to school by bus.
Simple past tense                       They went to Taipei last weekend.
                                        She visited the museum last year.
Be verbs                                I am from the United States.
                                        They are my grandparents.
Present continuous and present tense    Look! They are crossing the road.
                                        She can't answer the phone because she is taking a shower.

Next, the purpose of this English proficiency test is to measure whether freshman students' English ability reaches the CEF B1 level (Common European Framework, B1 level), approximately equivalent to a TOEIC score of 600. To distinguish the proficient learners from the less proficient ones, a second instrument, the Matlab GUI toolbox, is used to calculate the students' gamma values and cluster them. For experiment 3, the author uses the English coursebook selection criteria to summarize five important factors (Cunningsworth, 1995; Ellis, 1997; Richards, 2001). The soft-computing methods of AHP and GRA are then used to rank the factors by importance in an objective and scientific way.
