
國立臺灣師範大學 英語學系 碩士論文
Master Thesis
Graduate Institute of English
National Taiwan Normal University

運用寫作評量軟體批改高中生英文作文之研究
Utilizing Automated Writing Evaluation Software in Correcting High School Students' Compositions

指導教授:陳浩然 博士
Advisor: Dr. Hao-Jan Howard Chen
研究生:鄭惠文 Huei-Wen Cheng

中華民國一百零一年八月
August, 2012

中文摘要

在英文學習領域中,隨著網際網路的發達與全球化的影響,英文寫作扮演日益重要的角色。學生期盼能有更多的練習,以便因應將來各種需要以英文寫作的場合。但在台灣的高中裡,因班級人數眾多,批改作文及給予適當回饋建議對英文老師是一大負擔。近年來有些寫作評量軟體可以提供學生立即的批改與回饋,可以減輕老師的負擔。但隨著越來越多人使用這些軟體,有必要針對系統優缺點進行進一步研究。本研究檢驗一個新的寫作評量軟體 Correct English 所提供的寫作錯誤回饋並發現此軟體的優點與缺點。研究對象為 90 位來自兩所公立高中的高三學生。本研究收集此軟體針對 146 篇高中學生作文所提供的各種寫作錯誤回饋並加以分析;並進一步比較軟體提供的回饋與老師提供的回饋之間的差異。結果顯示此軟體提供了 40 種不同的回饋訊息,但大約有三分之一的回饋是錯誤訊息。此外,與老師提供的回饋相比,此系統仍然無法偵測許多學生的常見錯誤,如介系詞、時態、詞性與冠詞等;也無法提供句子改寫的功能。本研究建議使用此類寫作評量軟體的老師須注意其在教學上的使用。融合老師的引導與詳細的指示,英文老師便能善用現有的軟體功能來幫助學生寫作。

關鍵字:錯誤回饋、自動寫作評量、英文作文

ABSTRACT

In English learning, the ability to write well has become increasingly important with the advances of the Internet and the trend of globalization. Learners expect to have more practice to prepare themselves for the various occasions in which they have to write in English. However, in Taiwanese high schools, where there are often more than 40 students in a class, grading and giving feedback on students' writing has become a heavy burden for English teachers. In recent years, some automated writing evaluation (AWE) systems have been developed to provide learners with computerized feedback. These systems seem to offer an alternative way to help teachers in the process of correcting essays, since AWE systems promise to provide immediate scores and feedback. Focusing on a newly developed AWE system, this study aims to investigate the use of the AWE system in correcting high school students' compositions and to find out whether the system can provide usable feedback for its users to revise their writing. A total of 90 12th-grade students from two senior high schools in Taipei were recruited for the study. Each student was asked to write two compositions on assigned topics. An automated writing evaluation program called Correct English was used to generate the computerized feedback messages. At the same time, two human raters corrected and commented on the compositions. Afterwards, the computer-generated feedback messages on writing errors were compared with those of the human raters. The results showed that Correct English provided 40 types of error feedback. Among the error feedback messages provided by the AWE system, about one third were false alarms, which could confuse learners. In addition, compared with the errors identified by the human raters, many common errors were still left untreated by the AWE system, such as errors in prepositions, verb tense, word form, and articles. Moreover, the human raters rewrote sentences or provided suggestions when there were unclear or ungrammatical expressions, whereas the AWE system was not able to offer feedback at the sentence level. Based on the major findings of this study, it is suggested that language teachers pay careful attention to how AWE systems are used in class. Teachers' guidance, specific instructions, and follow-up activities should be incorporated so that instructors can make the best use of the available functions and help learners become better writers.

Keywords: error feedback, automated writing evaluation, English composition

ACKNOWLEDGEMENT

This master's thesis would never have been completed without the assistance and support of many gracious people. First and foremost, special recognition and sincerest gratitude must be given to my supervisor, Dr. Hao-Jan Chen, whose expertise, excellent research ability, patient guidance and invaluable advice gave me plenty of insights on my thesis. In addition, his continual encouragement inspired me to get through the process of thesis writing. As a mentor, Dr. Chen guided me to explore the field of teaching English writing and enlightened me on the possibility of incorporating computerized feedback into ESL writing instruction. My sincere appreciation also goes to the committee members, Dr. Zhao-Ming Gao and Dr. Chun-Chieg Tseng, for their thoughtful suggestions and comments during my oral defense. It was their generous sharing of opinions and advice that improved and refined the study. Special thanks should be extended to all the participants from Taipei First Girls' High School and Taipei Chun-Ho Senior High School. Their help in completing the two compositions contributed to the data collection and analyses in this thesis. Besides, heartfelt thanks should also be given to the other anonymous rater, who patiently and carefully helped correct the 146 writing samples. Finally, this thesis is especially dedicated to my family. I owe a great deal of gratitude to my parents, who have always accompanied me and taken good care of my two little children. Wholehearted thankfulness also goes to my husband, who has always been considerate and supportive during the years I studied for the master's degree. It is because of their love and support that I was able to overcome the difficulties and frustrations in the process of thesis writing.

TABLE OF CONTENTS

List of Tables ........ vi
List of Figures ........ viii
Chapter One Introduction ........ 1
  1.1 Background ........ 1
  1.2 Purpose of the study ........ 4
  1.3 Research Questions ........ 4
  1.4 Significance of the Study ........ 5
  1.5 Organization of the Thesis ........ 5
Chapter Two Literature Review ........ 7
  2.1 Issues in error correction ........ 7
    2.1.1 The case against grammar correction ........ 7
    2.1.2 The case for grammar correction ........ 8
    2.1.3 To correct or not to correct? ........ 9
  2.2 Brief Introduction of the Development of Automated Writing Evaluation ........ 10
  2.3 Previous studies of using AWE systems in EFL contexts ........ 12
    2.3.1 Previous studies on students' perceptions ........ 12
    2.3.2 Previous studies concerning AWE grammar feedback accuracy ........ 18
    2.3.3 Previous studies about the comparison between AWE and peer feedback ........ 21
  2.4 Overall Strengths and Weaknesses of AWE Systems ........ 24
  2.5 Summary of Chapter Two ........ 26
Chapter Three Methodology ........ 27
  3.1 Subjects ........ 27
  3.2 Instrument ........ 28
  3.3 Procedure ........ 32
Chapter Four ........ 34
  4.1 Types and numbers of corrective feedback identified by Correct English and human raters ........ 34
  4.2 The analysis of the error feedback types provided by Correct English ........ 36
    4.2.1 The top twenty error types identified by Correct English and their accuracy rates ........ 37
    4.2.2 Discussion of the performance of Correct English ........ 48
    4.2.3 Other findings in the error feedback provided by Correct English ........ 51
  4.3 The analysis of the error feedback types identified by human raters ........ 53
    4.3.1 The analysis of the top twenty error types identified by human raters and the comparison with feedback provided by Correct English ........ 54
    4.3.2 Comparison of the top twenty error feedback types provided by human raters and Correct English ........ 60
    4.3.3 Discussion of the error feedback messages provided by human raters and Correct English ........ 61
  4.4 Summary of Chapter Four ........ 63
Chapter Five ........ 65
  5.1 Summary ........ 65
  5.2 Pedagogical Implications ........ 66
  5.3 Limitations of the Present Study and Suggestions for Further Research ........ 68
REFERENCES ........ 70
Appendix: Teacher's Rewrite ........ 75

List of Tables

Table 1. The Number and Error Types Provided by Correct English and Human Raters ........ 35
Table 2. The Types and Numbers of Error Messages Provided by Correct English ........ 36
Table 3. Examples of Correct Detection and False Alarms for Spelling Errors by Correct English ........ 37
Table 4. Examples of Correct Detection and False Alarms for Clause Errors by Correct English ........ 38
Table 5. Examples of Correct Detection and False Alarms for Subject-Verb Agreement Errors by Correct English ........ 39
Table 6. Examples of Correct Detection and False Alarms for Word Form Errors by Correct English ........ 39
Table 7. Examples of Correct Detection and False Alarms for Punctuation Errors by Correct English ........ 40
Table 8. Examples of Correct Detection and False Alarms for Noun Phrase Consistency Errors by Correct English ........ 40
Table 9. Examples of Correct Detection and False Alarms for Infinitive or –ing Forms by Correct English ........ 41
Table 10. Examples of Correct Detection and False Alarms for Verb Group Consistency Errors by Correct English ........ 42
Table 11. Examples of Correct Detection and False Alarms for Weak / Non-standard Modifiers by Correct English ........ 42
Table 12. Examples of Correct Detection and False Alarms for Adverb Placement Errors by Correct English ........ 43
Table 13. Examples of Correct Detection and False Alarms for Capitalization Errors by Correct English ........ 43
Table 14. Examples of Correct Detection and False Alarms for Missing/Unnecessary/Incorrect Articles by Correct English ........ 44
Table 15. Examples of Correct Detection and False Alarms for Wordy Expressions by Correct English ........ 44
Table 16. Examples of Correct Detection and False Alarms for Redundant Expressions by Correct English ........ 45
Table 17. Examples of Correct Detection and False Alarms for A vs. An Errors by Correct English ........ 46
Table 18. Examples of False Alarms for Vague Quantifiers by Correct English ........ 46
Table 19. Examples of Correct Detection and False Alarms for Preposition Errors by Correct English ........ 47

Table 20. Examples of Correct Detection and False Alarms for Nouns: Mass or Count Errors by Correct English ........ 47
Table 21. Examples of Correct Detection for Open vs. Closed Spelling Errors by Correct English ........ 48
Table 22. Examples of Correct Detection and False Alarms for Word Confusion by Correct English ........ 48
Table 23. Summary of the Accuracy Rates of the Top Twenty Error Feedback Messages by Correct English ........ 49
Table 24. Examples of Correct Detection and False Alarms for Homonyms by Correct English and Human Raters ........ 51
Table 25. Examples of False Alarms for Passive Voice Usages by Correct English ........ 52
Table 26. Examples of False Alarms for Clichés by Correct English ........ 52
Table 27. The Types and Numbers of Error Messages Identified by Human Raters ........ 53
Table 28. Examples of Teacher's Rewrite by Human Raters and the Comparison with Feedback by Correct English ........ 55
Table 29. Examples of Word Confusion Errors Identified by Human Raters ........ 57
Table 30. Examples of Misused Words Detected by Human Raters ........ 57
Table 31. Examples of Detection of Idiomatic Expressions Errors by Correct English and Human Raters ........ 58
Table 32. Examples of Detection for Relative Pronoun Errors by Human Raters ........ 59
Table 33. Examples of Detection for Word Order by Human Raters and Correct English ........ 59
Table 34. Examples of Detection for Run-on Sentences by Human Raters ........ 59
Table 35. Examples of Detection for Unclear Meaning by Human Raters ........ 60
Table 36. The Top Twenty Types and Numbers of Error Messages Identified by Correct English and Human Raters ........ 60
Table 37. The Number and Error Types Shared by Correct English and Human Raters ........ 62

List of Figures

Figure 1. Basic Check Marked by Correct English ........ 29
Figure 2. Grammar and Usage Feedback Provided by Correct English ........ 30
Figure 3. Style Choice Suggested by Correct English ........ 30
Figure 4. The Function of Writing Help in Correct English ........ 31
Figure 5. The Function of Reference in Correct English ........ 32
Figure 6. Occurrence of the Error Types Provided by Correct English ........ 37
Figure 7. The Distribution of Accuracy Rate for the Top Twenty Error Types Detected by Correct English ........ 50
Figure 8. Occurrence of the Error Types Provided by Human Raters ........ 54

(10) Chapter One Introduction 1.1 Background In recent years, with the advances of the Internet and the growing tendency of English as a global language, there is a growing need to write well in English in order to communicate with people in different countries and in different settings. English learners are eager to master the skills of writing so that they can be better prepared for the numerous occasions in which they need to write in English. For high school students in Taiwan, English composition in particular has a more dominant role in the process of English learning, for students have to take the college entrance exams in which a more than 120-word English composition is required. In addition to memorizing a large amount of vocabulary, numerous idioms and phrases, or having translation practices, high school students expect to have more writing practices in their English class. To respond to students’ requests, most teachers assign students as many writing topics as possible in the belief that more practices will help polish students’ writing skills and make students better writers. In a typical composition class in high school, students write for an assigned topic within a limited amount of time, and the teacher collects their works for grading. A few days or weeks later, students receive their compositions with a grade, numerous red-ink marks indicating their errors and a few words or lines of feedback from the teacher. Since there is only limited time in class, students rarely have the chance to revise, and they will be asked to write for a new topic in the next class. In most cases, teachers serve as the only source for correction and feedback. While providing students with feedback on error correction remains an issue under debate (Truscott, 1996, 1999; Ferris, 1999, 2004), it is generally believed that feedback enables students to discover the weaknesses in their works and thus help them improve the quality of their writing. However, with 1.

more than 40 students in a class, marking students' work and giving proper feedback becomes a tiring, time-consuming burden for most language teachers. They may in turn reduce the frequency of giving students assignments, which contradicts the idea of having more practice in writing. Revisions and multiple drafts seem nearly impossible. Therefore, pursuing a more efficient way to grade students' work and give feedback has become a goal for many language teachers. With the rise of CALL technology in the classroom, there seems to be an alternative that can relieve teachers of their heavy workload – automated writing evaluation (AWE) programs. Originally developed by universities and some corporations in the U.S. for large-scale, high-stakes assessments, some of these programs are now available for schools to aid writing instruction, including in ESL and EFL contexts. These programs utilize techniques such as Natural Language Processing (NLP) and Artificial Intelligence (AI) to simulate how teachers and human raters grade students' work (Burstein et al., 2003; Elliot, 2003). They provide instant scores, and some even provide detailed feedback for students to improve their writing. Boasting immediacy and high correlations with the scores of human raters (Burstein et al., 2003; Elliot, 2003), automated essay scoring seems to be a promising solution to language teachers' heavy workload. Though a relatively new research field, with numerous attempts made to perfect the techniques, AWE has been adopted in some large-scale assessments, such as the Test of English as a Foreign Language (TOEFL) and the Graduate Management Admission Test (GMAT), in which a student's score is determined by a human rater and e-rater together. Online programs incorporating the computerized scoring mechanism and feedback functions are also available, two of the better-known being My Access!, developed by Vantage Learning, and Criterion, an automated evaluation product developed by Educational Testing Service (ETS). These AWE programs not

only provide students with a score but also instant feedback on grammar, style, and word usage; in addition, some advice on content and organization is also given (Burstein et al., 2003; Vantage Learning, 2003). Learners can then revise their work based on the suggestions they receive from the software. Through intense advertising and marketing strategies, these systems have been implemented in K-12 and college settings in the U.S., and now some universities outside the U.S. are trying to incorporate AWE into their writing classes. There has been some research regarding the validity of these AWE programs conducted by the companies. They claim a high correlation between the computer and human readers, and propose that AWE should be viewed as a supplement to facilitate writing instruction rather than a replacement for teachers (Vantage, 2003; Burstein, 2003). Research concerning students' perceptions of the software has also been carried out. The results show that students are impressed with the immediacy but feel unsatisfied with the "too general" feedback (Yang, 2004; Chen & Cheng, 2006). Most of the previous studies concerning AWE focus on the validity of the systems or students' perceptions of the software, but research comparing and analyzing the quality of computer-generated feedback is relatively scarce. In the present study, the researcher will use a new AWE program as the instrument to grade students' compositions, and then compare the results with those of human raters to see whether there are strengths or limitations in the computerized feedback mechanism. With advancing technology, the researcher believes that it may be a plus for language teachers to make use of the available tools and resources to facilitate students' learning and to make the grading process a less painful one.
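The comparison just described, lining up the software's error messages against the human raters' corrections and sorting each machine message into a correct detection or a false alarm, can be pictured with a small tally script. The sketch below is only an illustration of that bookkeeping under assumed conditions, not the procedure actually used in this thesis: the tuple format, the function name tally_feedback, and the rule that a machine flag counts as a correct detection only when a human rater marked the same error type at the same position are all assumptions made for the example.

```python
from collections import defaultdict

def tally_feedback(awe_flags, human_flags):
    """Sort AWE error messages into correct detections and false alarms.

    awe_flags / human_flags: lists of (essay_id, position, error_type) tuples.
    A machine flag is counted as a correct detection if a human rater marked
    an error of the same type at the same position; otherwise it is treated
    as a false alarm. Flags found only by the human raters are errors the
    software missed.
    """
    human_set = set(human_flags)
    awe_set = set(awe_flags)
    per_type = defaultdict(lambda: {"correct": 0, "false_alarm": 0})

    for flag in awe_flags:
        error_type = flag[2]
        if flag in human_set:
            per_type[error_type]["correct"] += 1
        else:
            per_type[error_type]["false_alarm"] += 1

    missed = [f for f in human_flags if f not in awe_set]
    return per_type, missed


# Tiny illustrative run with made-up flags (not real study data).
awe = [(1, 12, "spelling"), (1, 30, "subject-verb agreement"), (2, 5, "article")]
human = [(1, 12, "spelling"), (2, 7, "preposition")]

counts, missed = tally_feedback(awe, human)
for error_type, c in counts.items():
    total = c["correct"] + c["false_alarm"]
    print(f"{error_type}: {c['correct']}/{total} correct detections")
print("Errors flagged only by the human raters:", missed)
```

In the study itself the judgment of whether a message was a correct detection or a false alarm was presumably made by human review of each message rather than by exact position matching; the code only shows the kind of per-type counts reported later in Chapter Four.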

1.2 Purpose of the study

This study aims to explore the employment of an automated essay evaluation system in correcting high school students' compositions. A newly developed automated writing evaluation program called Correct English, created by Vantage Learning, is used in the present study. The computer-generated feedback will be compared with that of the human raters. The researcher would like to find out whether the new automated writing evaluation system can provide comprehensive, usable feedback for its users so that it can help teachers provide feedback and then be further used in real classroom settings. Since writing errors are the most obvious and most frequently corrected part of students' compositions, the present study will focus on the comparison of writing error feedback provided by the AWE system and by human raters. The subjects are ninety 12th-grade students from two social science classes in two different senior high schools in Taipei. One class is of intermediate level; the other is of high-intermediate level. Each of the students will be asked to write two compositions on the assigned topics. Two human raters will grade and comment on the students' writing; at the same time, the AWE system will also be used to correct the compositions. A detailed analysis and comparison of the grammar feedback from the AWE system and from the human raters will then be conducted.

1.3 Research Questions

To better understand the strengths and weaknesses of employing an automated writing evaluation system in grading and commenting on students' compositions, there are three questions worth exploring:

(1) What kinds of writing errors are identified by the AWE system? What are their accuracy rates?

(2) What kinds of writing errors are identified by human raters?

(3) What are the major differences between the writing error feedback provided by the AWE system and by human raters?

1.4 Significance of the Study

This study tries to examine the accuracy of the error feedback provided by the AWE system. Although many researchers and teachers are interested in the CALL area and have been trying to incorporate various kinds of technology into English learning, automatic essay evaluation is still a relatively new arena. Most of the available research mainly deals with score accuracy or students' perceptions of the AWE systems. By comparing the feedback generated by the AWE system with that of human raters, the study aims to find out the strengths and weaknesses of AWE systems and to shed light on the utilization of AWE in high school composition classes. It is hoped that the present study will offer an alternative way for EFL teachers and learners and provide some empirical research evidence in the AWE field.

1.5 Organization of the Thesis

This study explores the employment of an automated writing evaluation system in commenting on high school students' writing. Chapter Two presents the issues of error correction and the development of automated essay evaluation, along with reviews of previous studies concerning students' perceptions of AWE, the grammar feedback accuracy of AWE systems, and the use of AWE in different classroom settings. Chapter Three introduces the design of the research, including the subjects, the instrument and the procedure. Chapter Four discusses the results and responds to the research questions. In the end, major findings will be summarized in Chapter Five

with the implications, limitations and suggestions for further research provided.
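One term in Research Question 1 above, the "accuracy rate" of an error feedback type, is worth stating explicitly before moving on. The text does not define it formally at this point, so the formula below is an inferred formalization based on how correct detections and false alarms are tabulated for each error type later in the thesis (e.g., Table 23), rather than a definition quoted from the text.

```latex
% Assumed per-type accuracy rate, inferred from the correct-detection /
% false-alarm tabulation reported in Chapter Four.
\[
\text{Accuracy rate of error type } t
  = \frac{\text{correct detections of } t}
         {\text{correct detections of } t + \text{false alarms of } t}
\]
% Illustration: if the software flags 30 spelling errors and the human
% raters confirm 24 of them, the accuracy rate for spelling is 24/30 = 80\%.
```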

Chapter Two Literature Review

This chapter is composed of five parts. Section 2.1 presents related issues in error correction. Section 2.2 gives an overview of the development of automated writing evaluation. Section 2.3 reviews previous studies of using AWE in classroom settings. Section 2.4 discusses the overall strengths and weaknesses of AWE systems. Finally, Section 2.5 summarizes the main points of this chapter.

2.1 Issues in error correction

In the process of language learning, most learners, as well as their teachers, believe that correcting mistakes will help to reduce the possibility of making the same errors next time. Thus, language teachers try their best to provide corrective feedback in the hope of seeing improvements in their students' work. Researchers have also conducted surveys to find out the most effective ways to correct and provide feedback on students' performance. However, in 1996, John Truscott published a controversial review arguing that grammar correction is "ineffective and even harmful to them (students) because the fear of making mistakes cause students to lose confidence and cannot focus on other important parts of writing," which contradicts the generally agreed idea that correction helps students become better writers. A series of debates and studies followed.

2.1.1 The case against grammar correction

Truscott asserted that many teachers and researchers intuitively regard grammar correction as a necessary part of the writing process without clear evidence of its effect. He thus reviewed previous research concerning error correction and concluded that most studies have shown no significant improvement in students' errors. In some studies Truscott reviewed, the group receiving only feedback on content and the group

(17) receiving no feedback even performed better than other groups. Thus, he put forward the idea that the fear of making mistakes leads students to be less productive. Truscott further pointed out that learning is a gradual process and that teachers should be aware of the developmental sequence, which is approved by many researchers. Besides, students might not understand meaning of correction, so they might ignore it, or they will forget the correction. On the other hand, teachers, with so many assignments to deal with, fail to be consistent and systematic, which even confuses the students more. Since grammar is only the surface structure, Truscott claimed that grammar correction only distracts teachers and students from the more important elements in writing: development and organization. In the end, Truscott concluded that teachers should not give students correction just because students want it. Instead, teachers can improve students’ accuracy through reading or other aspects in writing. Also, further research on developmental sequences and learner variables needs to be done.. 2.1.2 The case for grammar correction Truscott’s review essay “The case against grammar correction in L2 writing classes” (1996), led to a lot of discussion while other researchers hold the opposite points of view. In 1999, Ferris first responded to Truscott’s article and argued that Truscott’s viewpoints are “premature and overly strong.” Ferris then examined the claims put forward by Truscott and pointed out two important “weaknesses”: the definition of the term “error correction” and the reviewed studies. Ferris (1999, 2002) points out that correction can be effective when being consistent and accurate. Also, selective error correction suits students better because it helps them to focus on certain types of errors. In this way, learners will not be confused and will be able to learn from the rules. 8.

(18) Ferris further argued that in the previous studies Truscott cited, the subjects are not equal, the teaching strategies vary and most of all, Truscott “overstates the negative evidence” (pp.4, 1996) Ferris also stated that it is vital to realize that not all students make the same mistakes. In ESL settings, students’ proficiency level also makes a difference and thus needs to be taken into consideration. Besides, absence of error correction may frustrate students since in written proficiency assessment examinations, students might have multiple errors, which leads to unsatisfactory scores. Therefore, for further research, Ferris concluded that the following factors: selective and systematic feedback, student differences and appropriate research design need to be closely examined and well considered.. 2.1.3 To correct or not to correct? Following the debate in error correction, there is a good deal of research concerning this topic. While most teachers and researchers are positive about the effects of correction, research seems to show mixed results. Teachers are uncertain about whether or not to provide correction (Otoshi, 2005) As Truscott himself said, “I do not deny the value of grammatical accuracy; … No do I generally reject feedback as a teaching method; …” (pp. 329, 1996). The point is the focus of feedback and error correction. It has to be consistent, accurate and productive. However, with so many students in one class and tons of assignments to deal with, it is a difficult task for most language teachers. Ferris suggests that, while making efforts to correct students’ numerous mistakes, teachers need to find ways to “conserve energy and avoid burnout,” (pp. 73, 2002). In addition to various strategies such as peer feedback, mini-conference, there has been a growing interest in using technology to help relieve teachers’ burden, that is, the rise of automatic essay evaluation. 9.

2.2 Brief Introduction of the Development of Automated Writing Evaluation

It has been almost half a century since the first automatic scoring system was devised. Through the years, researchers have tried various approaches and techniques in the development of automated essay evaluation to simulate the features and process of human grading. The history of AWE can be traced back to the 1960s, when Page and a network of universities in the US, known as the College Board, with the hope of making the whole grading process more efficient and effective, started to develop the first automated writing evaluation system – Project Essay Grade (PEG). Page experimented with some features that could be extracted from essays (proxy variables) and used correlations to predict the scores given by human judges. PEG was first trained on a certain number of sample essays, and then it computed the variables of a new essay to produce a score. Although the predicted scores were comparable to those of human raters, PEG could only focus on surface features, such as average word length, essay length, and the number of uncommon words. Indirect measures were used because it was difficult for computers to process more direct measures at that time (Kukich, 2000). Due to the instability of computer technology and the restricted access to computers, PEG could not be operated on a large scale. It was not until the spread of personal computers and more user-friendly computer languages in the mid-1980s that the potential of automatic essay scoring was re-examined (Page, 2003). In the 1990s, with the utilization of natural language processing (NLP), information retrieval (IR), and artificial intelligence (AI), researchers started to look for new ways to examine the features related to writing quality. Two of the major software engines were developed – e-rater by Educational Testing Service (ETS) and

IntelliMetric™ by Vantage Learning. Both claim to have high correlations with human raters. These two scoring engines were later upgraded to include functions for providing feedback on students' writing. The instructional applications can be used in classroom settings to assist the teaching of writing. The two most noted automated essay evaluation systems used by many schools or studied by researchers are Criterion, developed by ETS, and My Access, developed by Vantage Learning. The following are brief introductions of their functions. Criterion is a web-based system that provides users with automated scoring and evaluation of students' essays. It is composed of two applications: one is e-rater 2.0, the automated scoring engine, which assigns holistic scores by extracting essential linguistic features and then statistically processing the features to see how they are related to overall writing quality. The other application is Critique, the writing analysis tool that detects students' errors in grammar, usage, and mechanics, identifies the essay's discourse structure, and recognizes undesirable styles (Burstein et al., 2004). Students can revise their essays using the instant diagnostic feedback provided by Criterion. My Access is an online portfolio-based instruction program developed by Vantage Learning. It provides students with a six-point holistic score and diagnostic feedback in the following five domains: focus and meaning, content and development, organization, language use and style, and mechanics and conventions. Students can thus see their strengths and weaknesses and make further improvements. Given the rapid development of computer technology, there is bound to be continuous improvement in the domain of automated writing evaluation systems. Still, more research is needed to provide teachers and students with guidance in the use of technology in the classroom.
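As a toy illustration of the proxy-variable idea behind PEG and the later feature-based engines reviewed above, the sketch below extracts a few surface features (essay length, average word length, and the share of uncommon words) and fits a least-squares model to scores given by human judges. The feature set, the tiny made-up training data, and the use of numpy's least-squares solver are all assumptions for illustration; this is not the implementation of PEG, e-rater, or IntelliMetric.

```python
import numpy as np

COMMON_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}

def surface_features(essay):
    """Extract PEG-style proxy variables from an essay's surface form."""
    words = essay.lower().split()
    n = len(words) or 1
    avg_word_len = sum(len(w) for w in words) / n
    uncommon_ratio = sum(1 for w in words if w not in COMMON_WORDS) / n
    return [float(n), avg_word_len, uncommon_ratio]

# Tiny made-up training set: (essay text, holistic score from human judges).
training = [
    ("The cat sat on the mat and it was happy .", 2.0),
    ("Students in the modern classroom increasingly rely on elaborate "
     "digital resources to refine their compositions .", 4.5),
    ("I like school . It is fun .", 2.5),
    ("Globalization has profoundly transformed how learners approach "
     "academic writing in secondary education .", 5.0),
]

# Design matrix: one row of surface features (plus a bias term) per essay.
X = np.array([surface_features(text) + [1.0] for text, _ in training])
y = np.array([score for _, score in training])

# Least-squares fit: weights that best predict the human scores.
weights, *_ = np.linalg.lstsq(X, y, rcond=None)

new_essay = "Writing well requires sustained practice and careful feedback ."
predicted = np.array(surface_features(new_essay) + [1.0]) @ weights
print(f"Predicted holistic score: {predicted:.2f}")
```

A real engine would train on far more essays and far richer features, but the sketch captures the limitation noted above: such indirect surface measures merely correlate with human judgments and do not measure writing quality directly.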

2.3 Previous studies of using AWE systems in EFL contexts

As Warschauer & Ware (2006) note, a large portion of the research on automated writing evaluation programs is funded by the companies that invented the programs, because the new products need to be refined and, at the same time, advertised. In recent years, some research conducted by university researchers has been presented in academic journals and at conferences, which indicates that AWE has gradually been noticed and utilized in classroom settings. Research on the use of AWE systems in EFL classrooms is especially valuable for the improvement of the teaching of writing in Taiwan. In the following section, seven empirical studies will be reviewed.

2.3.1 Previous studies on students' perceptions

The following three studies investigated students' perceptions of the use of My Access in writing classes by using questionnaires, and all three studies showed that most of the students were not satisfied with the fixed and repeated feedback provided by the system. Chen & Cheng further pointed out that the instructors' attitudes and familiarity with the system might also affect the effectiveness of the AWE system and students' attitudes toward its use in class.

2.3.1.1 Yang (2004)

Yang looked into the use of the AWE system called My Access in Taiwanese college classroom settings. There were approximately 300 subjects from Freshman English classes, English Composition classes and a group of students from a self-study program. At the beginning, all the subjects received a workshop on how to use the program, but some classes had instruction on the use of My Access from the teachers while others did not. Therefore, Yang further divided the subjects into five groups: WI and WN (composition classes with or without My Access instruction), EI and EN (English classes with or without My Access instruction), and S (self-study

(22) students without instructions). After using My Access for one or two semesters, questionnaires were administered to both the students and the teachers to further explore their attitudes and perceptions toward using My Access in the classroom. Most subjects had no difficulty in using the program and more than 60% of the students considered it user-friendly. The results also show that 91% of the subjects who used My Access for a few times or more per month found the program helpful to their English writing. The more often students used the program, the more positive attitudes they had toward it. Most students liked My Access for the revision function (89%), immediate scores and feedback (86%), the writing portfolio (83%), the instructions for improvement (77%), and the grammar suggestions (71%). They also felt their writing better improved in the Organization domain (61%) and the focus/ meaning domain (52%). However, there were different opinions about the feedback provided by My Access. About half of the subjects said the comments were easy to comprehend and they would incorporate the feedback in their writing. Nevertheless, only a small percentage of students (13%) considered the scores they had from My Access appropriate, while more than half were uncertain about the scores. The reason why students did not trust the scores are: (1) the computer feedback were too general, (2) there were not clear information for further improvement. Students also pointed out the need of the instructors’ guidance in the writing process in addition to the computer program. The teachers held positive attitudes toward the system, but they were not sure about its effectiveness in improving students’ writing and stimulating their motivation. To integrate AWE programs in the EFL writing class, Yang proposed that teacher’s guidance is indispensable and that the idea of “autonomous learning” should be 13.

(23) introduced to students. Moreover, instructors should have sharing activities to encourage cooperative learning among students. Yang also pointed out possible improvement for the system: detailed guidance or a writing sample before students write essays, self-study supporting mechanisms, more detailed instruction in the writing feedback, more options for creative writing, and more scoring scales (not just the 6 point scale). In brief, Yang concluded that no present system can replace the role of human teachers and that every program has its advantages and disadvantages. It is the instructors’ job to find out the best ways to help students learn in the classroom.. 2.3.1.2 Chen & Cheng (2006) Chen and Cheng explored the use of My Access in college EFL writing classes. There were 68 students in total from three different classes. The two researchers used questionnaires to investigate students’ attitudes toward the program and its effectiveness in essay grading and providing feedback. They also collected writing samples and group interviews to triangulate the questionnaire results. Chen & Cheng first discussed students’ reactions toward My Access as an essay grader. Surprisingly, except for the immediacy of feedback, most of them had negative reactions toward the grades provided by the system. The finding is similar to those in Yang’s (2004) study. None of the students considered the scores adequate and almost half of the students found the feedback given by the system not helpful at all. The researchers further looked into students’ self-report to find out the reasons of their dissatisfaction. Firstly, students doubted the fairness of the scores. Some of them found they could trick the program by writing longer passages or using more transitional phrases. The researchers also provided a writing sample without coherence to show there are indeed some design flaws in My Access. Secondly, 14.

(24) students considered the feedback given by My Access too general and similar each time. They then expected help from their instructors to give more individual and detailed feedback. In addition, once students received “off-topic” comment, the program did not give them further explanations for improvement, which caused confusion and frustration. Then Chen & Cheng looked into students’ reactions toward the writing tool functions provided by My Access: My Editor, Thesaurus, Word Bank and Online Portfolio. They found that students did not think the functions help their writing process in general. Some of them seldom used these functions while some found the functions quite limited. Seeing that students from three classes held slightly different attitudes toward the program, Chen & Cheng also looked into the ways the instructors used My Access in their classes. Besides, the instructors’ technological skills and familiarity with the system also influenced students’ reactions toward the program. Chen & Cheng concluded that these programs cannot replace the role of human teachers and that no single computer-based program is without flaws. Thus, it is important for researchers and teachers to evaluate the programs, find out their strengths and weakness, and make best use of them.. 2.3.1.3 Chen & Cheng (2008) The researchers adopted a classroom-based approach to investigate students’ perceptions of AWE, and whether the instructors’ attitudes toward it influence the effectiveness of AWE. The subjects were 68 English major students from three different classes, taught by three different instructors. An AWE system My Access! was used in this research. In addition to questionnaires designed to survey students’ responses, the researchers also interviewed the three instructors to find out the ways 15.

(25) they integrated AWE into writing class. How the instructors uses the AWE scores and feedback in improving students’ work were especially worthy of note. Both Instructors A and B had a two-stage design when implementing AWE system; they required students to work with My Access! and then used the scores and feedback to improve their drafts. After students got a score of 4 out of 6, they submitted their essays, and Instructor A gave students written feedback and there was also a peer review. For Instructor B, she allowed students to use the system as much as they want. After students submitted their essays, she conducted individual teacher-student conference to help students improve their work. Both of the instructors appeared to have less confidence in AWE system and put more emphasis on human feedback. Instructor C, however, seemed to have more trust in My Access! She didn’t give students guidelines or feedback during the writing process; instead, she let the system do the scoring and asked students to do online peer review by themselves. As for the period of using My Access! in class, both Classes A and C used it for 16 weeks, while Class B used it for only six weeks. Instructor B had little trust in automated scores and feedback since the vagueness of the feedback only increased her workload, for she needed to more specific details. Besides, Instructor B and her class encountered technical problems, which made them frustrated. Instructors A and C did not have problems with My Access!, but Instructor A pointed out that the program gave some constraints, which would limit students’ creativity and idea development; therefore, she gave them more freedom when writing the essays. Instructor C, however, used the program as a tool to measure students’ writing performance and the final exam grades. Nevertheless, students complained that they had doubts in the fairness of automated scores and feedback. Instructor C then allowed students to submit revision of the essays for her re-assessment, which showed she had less confidence in AWE program due to students’ complaints. 16.

(26) Chen & Cheng summed up that the following four factors influenced the use of AWE program in class: the teachers’ attitude toward AWE scores and feedback, the views on human feedback, their familiarities with the AWE program, and their own ideas of teaching writing. Another finding in this research is students’ perceptions of AWE, and the results were similar to the previous study published in 2006, about half of the students found the program moderately or slightly helpful, whereas the rest considered it not helpful. The different ratings among the three classes were noteworthy: compared with the other two classes, 86% of Class A found the program was more or less helpful in improving their writing. The researchers attributed the results to the comparatively successful implementation of the program in Class A. As for students’ responses toward the AWE scores and feedback, 83% students in Class B showed disagreement. This might result from the instructor’s negative attitude toward it. 57% in Class A and 42% in Class C also reported their distrust in the program for the following reasons: its preference of longer essays, strong emphasis on using transition words, ignoring coherence and content development, and discouraging unconventional ways of writing. Based on the findings, the researchers concluded that how AWE program was implemented in class would have a strong influence on students’ perceptions and its effectiveness. The way of using automated scores, need for human feedback, students’ language proficiency, and the purpose for learning writing were the issues needed to be taken into consideration. The researchers further emphasized the importance of giving human feedback when implementing AWE program in learning since automated feedback is unable to solve students’ individual problems, nor can it attend to coherence and idea development. Besides, lack of meaning negotiation might frustrate students since they need a real audience to improve their writing. AWE 17.

program might be competent when providing feedback on forms, but advanced learners would expect meaning-focused responses. The researchers also addressed the need to strike a balance between form and meaning in second language writing instruction. Therefore, the role of the teacher is essential in AWE writing environments, so that learners and teachers can make full use of AWE programs.

2.3.2 Previous studies concerning AWE grammar feedback accuracy

The following three studies examined the grammar feedback accuracy of two major AWE systems, Criterion and My Access, through human teachers' double-checking of the computer-generated feedback. It was found that although AWE systems could identify many errors in students' writing, there were still quite a lot of false alarms that might confuse EFL learners, and some of the errors were left untreated.

2.3.2.1 Otoshi (2005)

Otoshi used 28 students' TOEFL writing essays to examine the error feedback provided by Criterion and compared the results with those of human instructors. There were five main error categories explored in this study: verbs, nouns, articles, wrong word (word choice) and sentence structures. In the verb error category, Criterion detected none of the tense errors because these errors had to be determined from the context. In the noun error category, Criterion detected no errors, while 24 wrong uses of singular and plural forms were found by human raters. In the article error category, Criterion detected only two wrong uses whereas the instructors found 54 errors. In the wrong word category, Criterion detected only 2 spelling errors and 2 word choice problems but did not find other word choice problems or surface errors such as prepositions and pronouns. However, the instructors detected various kinds of errors, especially in the word choice part. They also rewrote the words and provided students with suggestions for improvement. For the sentence structure errors, human

instructors detected 176 errors in total while Criterion found only 9 errors. The instructors would also help rewrite some parts based on the context. In sum, the results showed that Criterion detected far fewer errors than the human instructors in the five error categories. In addition, Criterion had difficulty detecting errors related to the context. Otoshi concluded that an AWE tool such as Criterion could not be used as a single source of feedback in teaching writing. It could relieve the teachers' heavy workload, but researchers and teachers should carefully examine AWE tools when using them in teaching.

2.3.2.2 Chen (2006)

In order to assess the effectiveness of the AWE program My Access in both the scoring and feedback aspects, Chen collected 124 essays from college freshmen, English majors, and master's and doctoral students. Chen then compared the scores obtained from My Access with the scores assigned by two human raters. The results showed that the system assigned higher scores to students. On a six-point scale, a lower-level student would get 3 points from My Access while the human raters gave him just 1-2 points. Chen also observed that the scores correlated with essay length, as reported in Chen & Cheng's study (2006). To analyze the feedback quality of My Access, Chen listed the 30 most common error types and closely examined the top fourteen types of error messages by asking human raters to review them. The 14 types include punctuation, spelling, similar words, clause (fragment), subject-verb agreement, missing articles, pronouns, misused words, ESL punctuation errors, noun phrase consistency, nouns, and unnecessary prepositions. According to the review of the human raters, many error messages identified by My Access were unnecessary; in other words, there were too many false alarms. It would be difficult for EFL learners to tell whether the feedback

they received indicated real errors that needed correction or was just a false alarm. Besides, Chen randomly chose fifteen papers and asked two English teachers to give detailed feedback; he then compared the feedback from My Access with that of the two human teachers. The results showed that about 134 errors found by the human raters were not identified by My Access. It was also noticed that the error correction was not consistent: some errors were detected while others of the same type in the same essay were ignored. Chen discussed the reasons why My Access failed to recognize many ESL or EFL errors: (1) many ESL or EFL errors are influenced by learners' mother tongues, and (2) My Access was originally designed to grade native English-speaking students' writing. Given the above limitations, Chen reminded teachers and learners that they should be careful when using the system; it could not replace the role of teachers' feedback. In the end, Chen suggested that the system make some modifications so that it can be utilized by more ESL students around the world.

2.3.2.3 Chen, Chiu & Liao (2009)

In order to find out the contributions and limitations of using AWE software in correcting students' essays, Chen et al. examined the grammar feedback generated by two major AWE systems: My Access by Vantage and Criterion by ETS. A total of 119 essays graded by My Access and 150 essays graded by Criterion were randomly selected; the researchers then calculated and analyzed the grammar feedback messages. There were 30 error types identified by My Access, while Criterion provided 27 error types. To examine the feedback accuracy, two EFL teachers were asked to review the grammar feedback offered by the two systems. The researchers analyzed the top ten feedback messages provided by the two systems. The top ten error types detected by My Access were spelling errors, similar words, clause errors, subject-verb agreement, missing articles, pronoun errors, misused words,

(30) punctuation errors, possible word confusions and prepositions errors. Compared with the review of two teachers, the researchers found that only three error types reached 20 % accuracy. Most of the error feedback messages offered by My Access were false alarms. It could be confusing for students who are learning to write accurately. Chen et al. then discussed the top ten error types provided by Criterion: missing or extra articles, spelling errors, fragment, run-on sentences, subject-verb agreement, confused words, ill-formed verbs, Proof-read This!, wrong article, compound words. Most of the categories reached 70% accuracy and there were fewer false alarms. It is clear that Criterion performed better than My Access in the detection of errors. The researchers also noticed that some error types were left untreated or ill-treated by the two systems, such as word order, modal auxiliaries, verb tenses, collocations, conjunct errors, word choice and pronouns. The researchers pointed out that the AWE systems need to be better improved to detect those common errors made by EFL students in their essays. Chen et al. concluded that since the AWE systems available are still not mature, instructors should not expect that these AWE systems will replace the role of teachers. There are still false alarms in the feedback messages generated by the systems and it is difficult for these systems to provide detailed or to-the-point feedback on content, organization or collocations. Learners should use the feedback provided carefully, while teachers should give students necessary guidance and assistance.. 2.3.3 Previous studies about the comparison between AWE and peer feedback Lai (2009) Lai investigated the use of an AWE program – My Access and peer evaluation (PE) in the EFL writing class. There were 22 EFL college students and they were required to write two compositions for PE and two for AWE. Students first received a 21.

(31) brief introduction of AWE and PE; then in the following 16 weeks, students started writing essays and revised their essays with the help of AWE or PE feedback. In the AWE session, students submitted their essays to My Access. After reading My Access feedback, they then revised their compositions and handed in the final draft to the system. In the PE sessions, students formed pairs and gave one another suggestions based on the guide of “Reader Response Sheet.” At the end, Lai used 5-point questionnaire and interview to explore students’ attitudes toward PE and AWE respectively. There were three parts of the questionnaire: Part I-writers’ reflections on the writing process, Part II-writers’ improvements (the product), and Part III- writer’s general perceptions of AWE feedback and PE feedback. Lai first looked into the frequency and types of revision in AWE and PE. It was found that peer evaluation had higher frequency than AWE, which indicated that students revised more often by using peer feedback. Then the researcher investigated the improvement students made in writing with the help of these two evaluation forms. In the feedback from My Access, students found feedback on organization most helpful and feedback on mechanics and convention least helpful. Contrary to the above findings, students regarded mechanics and convention in peer feedback most helpful in revising their essays. The mean scores of the five types of evaluation were above the average, which showed that students were positive towards AWE and PE. However, peer feedback had significantly higher mean scores than My Access in four types of revision, which indicated students made greater improvement by using PE. In the next part, Lai discussed students’ perceptions towards AWE and PE. From the questionnaire results, it was found that students regarded peers more as real audience and thus adopted peer feedback more often in their revision. In the follow-up 22.

In the follow-up interview, students also provided suggestions for both forms of evaluation. The most frequently mentioned problem with My Access was the need "to avoid vagueness in feedback," which echoed the findings of the earlier studies by Yang (2004) and Chen & Cheng (2006). Students found the fixed or repeated feedback of My Access not very helpful in improving their writing. They also suggested simplifying the functions of My Access to make it easier to use. As for peer feedback, most students held positive attitudes toward it. Some suggested that scoring should be included in peer review, and that exchanging peer partners regularly or creating a network platform should be considered. In general, this study confirmed the effectiveness of using AWE and PE in the EFL writing class. However, students' preference for PE over AWE showed that social interaction plays a role in the process of learning. Students' preference for direct feedback might be another reason why they preferred peer feedback to the feedback provided by the AWE system. Lai also pointed out that "computer anxiety" and "L1 culture" might be other factors influencing students' perceptions of AWE and PE. Nevertheless, automated feedback still has its strengths: scoring immediacy and diagnostic feedback. It is advised that teachers give students various types of feedback to assist them in the process of becoming better writers.

To sum up, the studies above indicate that AWE systems need to be improved for use in EFL writing classes and that human instructors' guidance is still indispensable. To date, AWE systems have often been viewed as a supplement to regular writing instruction. However, only a few of the previous studies were conducted in real instructional settings or examined the similarities and differences between the feedback generated by AWE systems and teachers' feedback.

Whether AWE feedback has a specific focus, and to what extent it is comparable to teachers' feedback, are questions worth exploring. Besides, most of the studies focused on college students; how AWE systems work for high school students is rarely discussed in the literature. It is thus hoped that the present study, which compares the two types of feedback, will help answer these questions.

2.4 Overall Strengths and Weaknesses of AWE Systems
Automated writing evaluation software has the following advantages: it gives students immediate responses and feedback, and its cost is relatively low. In addition, a wide range of functions and features, such as model essays, instant error diagnosis and editing tools, are available to facilitate the process of writing, which encourages multiple drafts and self-learning. Rather than trying to replace the role of teachers, these programs assist teachers who have to correct large numbers of writing assignments. AWE programs therefore appear to have a promising future and considerable capability. Nevertheless, some researchers criticize that computers do not appreciate the essence of essays, give students inflated scores, provide vague feedback, and can even be fooled. Teachers' familiarity with these programs may also influence their effectiveness (Chen & Cheng, 2006). All of these are crucial factors that need to be taken into consideration. First, to predict the scores of human raters, most systems use natural language processing (NLP), artificial intelligence (AI) or information retrieval (IR) techniques to generate scores and feedback on the target essays. However, as Ben-Simon & Bennett (2007) noted, the features used by automated essay scoring to grade writing are not necessarily related to good writing. Warschauer & Ware (2006) also pointed out that an automated system does not understand the real essence of an essay, which means that no real interaction and communication happens during the process of writing.

The lack of a "real audience" and the fact that the results are derived from statistical methods make meaningful evaluation unlikely. Besides, the feedback provided by the systems mostly focuses on surface structures and is often insensitive to context, which might encourage students to "write to the test" and concentrate on features that are most easily detected by the software. In this way, meaning and content are very likely to be ignored (Cheville, 2004). Second, most of the programs were originally developed to suit the needs of learners in the US. Although the developers are adding functions that cater to users from various cultural backgrounds, the question of whether the programs meet the needs of EFL and ESL students still has to be addressed. Third, some previous studies found that AWE software can be tricked by writers who deliberately use detectable features in their writing, and that the software may assign high scores to poor-quality essays. Powers et al. (2002) and Chen & Cheng (2006) observed a tendency for longer essays to obtain higher scores. Cheville (2004) also pointed out that the systems, which rely on counts of certain features, sometimes fail to differentiate illogical writing from inventive writing. Conversely, Shermis & Burstein (2003) argue that while a bad essay can get a good score, it takes a good writer to produce such a bad essay deliberately. Credibility thus remains an issue in the use of AWE systems. In addition, the error diagnosis and individualized feedback on content in AWE may not suit the needs of writers and therefore might not lead to meaningful revision and learning. In some studies, students complained that the AWE system only provided general comments on writing styles and that these comments were not helpful for their revisions (Yang, 2004; Cheville, 2004). To this day, AWE is still not widely used in secondary schools and the body of research on it is limited (Warschauer & Grimes, 2008).
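To make the concern about surface features concrete: if an automated score is driven largely by easily counted properties such as essay length, a longer but not necessarily better essay can receive a higher score. The sketch below is purely illustrative; it does not reproduce the scoring model of Correct English, My Access, Criterion or any other system discussed here, and the weights are invented for demonstration only.

    # Illustrative only: a toy "scorer" driven by surface features.
    # Real AWE systems use far richer models; this sketch merely shows
    # why length-sensitive features can end up rewarding verbosity.
    def toy_score(essay: str) -> float:
        words = essay.split()
        n_words = len(words)
        n_sentences = max(essay.count(".") + essay.count("!") + essay.count("?"), 1)
        avg_sentence_length = n_words / n_sentences
        # Invented weights: longer essays and longer sentences raise the score.
        return 0.02 * n_words + 0.1 * avg_sentence_length

    short_clear = "I brought my camera. It helps me remember my trip."
    long_padded = ("It is very true that it can be said that, in many cases, "
                   "a camera, which is a device that takes photos, is a thing "
                   "that a person might possibly want to bring on a long trip "
                   "because photos are things that help people remember trips.")

    print(toy_score(short_clear))   # lower score
    print(toy_score(long_padded))   # higher score, despite weaker writing

Whether a particular commercial product actually rewards such padding is an empirical question, which is precisely what the credibility concerns cited above are about.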

Despite these worries and possible drawbacks, many researchers and teachers still look forward to its future development. Still, to best meet students' needs, more research from different perspectives is necessary.

2.5 Summary of Chapter Two
In this chapter, we reviewed the issues of error correction, the development of AWE software, and previous studies on the use of AWE in classroom settings. It is hoped that a close examination and detailed comparison of AWE error feedback and human raters' feedback will help English teachers and learners when they use AWE software to enhance English writing.

Chapter Three
Methodology
The research was designed to explore the quality of feedback provided by AWE software by comparing it to human raters' feedback. Ninety 12th grade high school students and two experienced English teachers serving as human raters were involved in this study. The AWE program used in this study is Correct English, developed by Vantage Learning. The researcher used Correct English to check the students' compositions, and its corrective feedback was compared and analyzed along with the human raters' feedback.

3.1 Subjects
The subjects were from two intact 12th grade classes in two different high schools in Taipei. The total number of students was 90. Class A was from a girls' high school and was taught by the researcher. There were 43 students in Class A, and their English proficiency was approximately at the upper-intermediate level; the class's average English score on the SAT1 was 13 (on a 15-point scale). Class B was from a coeducational high school and was taught by another English teacher. There were 47 students in Class B, 11 male and 36 female. Their English proficiency was approximately at the intermediate level, and the class's average SAT score was 10 (on a 15-point scale). All of the students had formally studied English for eight years: four years in elementary school, three years in junior high school and two years in senior high school. The students had five to six English classes a week, and the main focus of their English course was vocabulary and reading.

1 The SAT (General Scholastic Ability Test) is one of the college entrance exams designed for high school students in Taiwan. There are five subjects: Chinese, English, math, social science and science.

Students in the 12th grade had a one-hour elective course in English composition in which English writing was formally instructed. In the composition class, students were asked to write on an assigned topic within 50 minutes; the teacher then collected the compositions and graded them. Students received a grade with some feedback from the teacher a few weeks later. However, most of them were still novice writers, since they had had few chances to practice writing compositions before they entered senior high school. Their experience of writing in English was therefore quite limited.

3.2 Instrument
In the present study, in addition to two human raters, a web-based AWE program, Correct English, developed by Vantage Learning, was used to evaluate the subjects' compositions. Correct English aims to enhance writing by providing comprehensive feedback on grammar, style and content, along with readability scores on five dimensions: organization, focus, content, style and overall performance. The grammar engine targets more than 80 types of writing errors, including common grammatical mistakes and easily confused words. Cross-lingual grammar help is also available in seven languages, including Spanish, French, German, Japanese, Korean, and traditional and simplified Chinese, which makes it easier for ESL and EFL learners to improve their writing skills. Besides its grammar and style checker, Correct English also offers annotated models, instant writing evaluation, and revision checklists tailored to different fields and tasks to guide learners. In addition to basic editing tools like those in Microsoft Word, Correct English has four main functions: Grammar, Writing Help, References and Review. The most frequently used function is Grammar Feedback, which is also the focus of this research.

When a user finishes typing an essay, he or she simply clicks the "check" icon, and errors in spelling, grammar, mechanics and style are automatically detected by the system and underlined in different colors. A column of suggestions for correction or improvement is shown on the right-hand side of the screen for the user to refer to. In addition, the system counts the number of words and sentences in the essay and provides a readability score and an overall level (poor / good / great) for the checked essay. A screenshot of the error feedback is shown in Figure 1.

Figure 1. Basic Check Marked by Correct English

The error feedback column is divided into three types: Basic Check, Grammar and Usage, and Style Choice. The Basic Check column points out spelling errors and missing words such as prepositions, as shown in Figure 1. Under Grammar and Usage, the system detects ungrammatical usages such as easily confused infinitive and -ing forms, misused words, and open vs. closed spelling, as shown in Figure 2. Under Style Choice (Figure 3), Correct English identifies problems such as wordy expressions and adverb placement.

Users are then able to revise their essays using the feedback on grammar provided by Correct English.

Figure 2. Grammar and Usage Feedback Provided by Correct English

Figure 3. Style Choice Suggested by Correct English
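To give a concrete (and deliberately simplified) picture of what this kind of checking involves, the Python sketch below groups a few hand-written rules into the three feedback categories described above. It is a hypothetical illustration only: the rules, example messages and matching patterns are invented here, and the actual engine of Correct English is proprietary and far more extensive.

    import re

    # Hypothetical, hand-written rules; each produces a feedback message
    # tagged with one of the three categories shown in the interface.
    RULES = [
        ("Basic Check", r"\brecieve\b",
         "Possible spelling error: 'recieve' should be 'receive'."),
        ("Grammar and Usage", r"\benjoy to \w+",
         "After 'enjoy', use the -ing form, not the infinitive."),
        ("Style Choice", r"\bin order to\b",
         "Wordy expression: consider replacing 'in order to' with 'to'."),
    ]

    def check(essay: str):
        """Return (category, matched text, message) for every rule that fires."""
        findings = []
        for category, pattern, message in RULES:
            for match in re.finditer(pattern, essay, flags=re.IGNORECASE):
                findings.append((category, match.group(0), message))
        return findings

    sample = "I enjoy to travel in order to recieve new experiences."
    for category, text, message in check(sample):
        print(f"[{category}] '{text}': {message}")

A commercial engine would rely on large rule sets and statistical language models rather than a handful of regular expressions, but the shape of the output (a category, the offending span, and a suggestion) is essentially what the user sees in the right-hand feedback column.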

The second function is Writing Help. The system provides guides for different fields: Humanities, Academic Forms, Citation Guide, English Basics, Sciences and Social Sciences. Take English Basics for example: users can choose a suitable writing genre (Essay: Getting Started, Essay: Structure, American Business Communication, Resume Chronological / Functional), and the system will provide guided questions and a model essay for the user's reference (see Figure 4).

Figure 4. The Function of Writing Help in Correct English

Still another function is Reference. Users can type the information they want to look up in the Lexipedia and Writing Websites columns, and the system will link directly to online reference sites such as Wikipedia or web dictionaries, as shown in Figure 5.

Figure 5. The Function of Reference in Correct English

With its instant feedback and its many functions catering to the needs of English learners, Correct English allows its users to revise their writing more easily.

3.3 Procedure
The students in Class A and Class B were asked to write compositions on the following two topics: "If you were to go abroad for a year, what would you bring along with you?" and "The Most Impressive Advertisement / Commercial I Have Ever Seen." For each topic, they had 50 minutes to finish their compositions. The two teachers collected the compositions from their own classes and gave the students grades and appropriate feedback. Six students in Class A and eleven students in Class B failed to hand in their composition assignments, so there were 146 samples in total: 74 from Class A and 72 from Class B. The researcher then used Correct English to check the collected compositions, and the results of the grammar feedback were downloaded.

The number and type of writing errors extracted from Correct English were then listed, counted, and categorized into several major types of error feedback. At the same time, two human raters, one being the researcher and the other an English teacher with five years of teaching experience, corrected the 146 sample compositions. Their results were likewise counted and put into categories. The researcher then compared the feedback provided by Correct English with the feedback of the two human raters in order to find out the similarities and differences. In addition, whether there were major differences between the results of Class A and Class B was also examined. In the following chapter, the analysis of the error feedback messages provided by Correct English is presented.
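The counting and categorization step described above amounts to building a frequency table of error types for each feedback source and then placing the two tables side by side. A minimal Python sketch of such a tally is given below; the category labels and records are hypothetical examples, not the actual coding scheme or data of this study.

    from collections import Counter

    # Hypothetical records: one entry per feedback message, labelled with
    # the error category it was assigned to during coding.
    awe_feedback = ["Spelling errors", "Verb tense", "Spelling errors", "A vs. An"]
    rater_feedback = ["Spelling errors", "Verb tense", "Verb tense", "Prepositions"]

    awe_counts = Counter(awe_feedback)
    rater_counts = Counter(rater_feedback)

    # Side-by-side comparison over the union of categories,
    # mirroring the layout of Table 1 in the next chapter.
    for category in sorted(set(awe_counts) | set(rater_counts)):
        print(f"{category:20s}  Correct English: {awe_counts[category]:3d}"
              f"  Human raters: {rater_counts[category]:3d}")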

Chapter Four
Results and Data Analysis
This chapter presents the results and findings for the three research questions raised in Chapter One. Section 4.1 summarizes all the types and numbers of corrective feedback provided by Correct English and by the human raters. Section 4.2 analyzes the top 20 error types detected by Correct English and their accuracy rates, and addresses other findings worth noting. Section 4.3 focuses on the top 20 error types identified by the human raters and compares them with the results obtained from Correct English. Section 4.4 is a brief summary of Chapter Four.

4.1 Types and numbers of corrective feedback identified by Correct English and human raters
When the students' compositions were graded by Correct English and the human raters, many different types of writing errors were identified. Correct English detected 1275 errors in total, while the human raters identified 1503. There were 40 error types in the feedback from Correct English and 34 in the human raters' feedback, giving 47 distinct error types overall. Table 1 summarizes the numbers and types of error feedback detected by Correct English and the human raters. It should be noted that the list of error types from Correct English does not represent all the error types the system can provide; it includes only the errors detected in the 146 compositions in this study.

Table 1. The Numbers and Types of Error Feedback Provided by Correct English and Human Raters

Error Type | No. of Errors by Correct English | No. of Errors by Human Raters
1. Spelling errors | 256 | 168
2. Clause errors | 152 | 53
3. Contractions | 128 | 0
4. Subject-verb agreement | 104 | 68
5. Word form | 62 | 63
6. Punctuation (ESL) | 46 | 44
7. Noun phrase consistency | 40 | 46
8. Infinitive or -ing form | 35 | 13
9. Verb group consistency | 35 | 87
10. Weak / non-standard modifiers | 34 | 0
11. Adverb placement | 33 | 0
12. Capitalization | 31 | 6
13. Missing / unnecessary / incorrect articles | 30 | 86
14. Wordy expressions | 26 | 0
15. Redundant expressions | 25 | 39
16. A vs. An | 24 | 22
17. Vague quantifiers | 20 | 0
18. Missing / unnecessary / incorrect prepositions (ESL) | 19 | 129
19. Nouns: mass or count | 17 | 72
20. Open vs. closed spelling | 16 | 0
21. Word confusion | 14 | 67
22. Homonyms | 14 | 3
23. Idiomatic expressions | 14 | 32
24. Word order | 12 | 27
25. Misused words | 12 | 63
26. Verb tense | 12 | 119
27. Ungrammatical (informal) usages | 10 | 6
28. Word choice | 9 | 21
29. Pronoun errors (ESL) | 8 | 3
30. Compounding errors | 6 | 6
31. Passive voice usages | 6 | 0
32. Run-on sentences | 5 | 26
33. Comparative / superlative | 4 | 11
34. Clichés | 4 | 0
35. Double negative | 4 | 1
36. Stock phrase | 2 | 0
37. Style settings (long sequence of prep. phrases) | 2 | 0
38. Reflexive pronouns | 2 | 0
39. It vs. There | 1 | 0
40. Verb construction | 1 | 0
41. Teacher's rewrite | 0 | 127
42. Relative pronouns | 0 | 32
43. Unclear meaning | 0 | 26
44. Causative verbs | 0 | 16
45. Pronoun reference | 0 | 8
46. Possessive errors | 0 | 7
47. Very vs. very much | 0 | 6
Total | 1275 | 1503
