The Effects of Corpus Tools on Assisting EFL Learners to Correct Errors


(1) Master's Thesis, Department of English, National Taiwan Normal University. The Effects of Corpus Tools on Assisting EFL Learners to Correct Errors. Advisor: Dr. Hao-Jan Chen (陳浩然). Student: Christine Susan Chang (張文嘉). July 2016.

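(2) Chinese Abstract. With advances in technology, a growing number of corpus tools use big-data analysis to rank corpus data by frequency, helping users learn common English word combinations. Previous studies have confirmed that learners can search corpus tools effectively to correct grammatical errors in their English writing, including errors of different types. However, earlier research has also pointed out that each corpus tool has its own strengths and limitations because of its design purpose. More scholars have therefore called for using and studying more than one corpus tool and for comparing the effects of different tools on error correction, particularly tools that combine rich corpus data with a simple, easy-to-operate interface, so that English learners can use corpora to correct different types of grammatical error. The present study compares two corpus tools that meet both conditions and examines their effects on correcting ten types of English grammatical error. The four participants were university students with intermediate English proficiency. Two of them first corrected errors with the corpus tool Netspeak and then with the other corpus tool, Linggle; the other two corrected errors in the reverse order. The instruments included an error correction test with answers, pre- and post-test attitude questionnaires, and records of the learners' search processes. The study analyzed the proportion of correct answers for the ten error types with each tool, the learners' search processes, and the learners' attitudes toward the corpus tools. The results indicate that both corpus tools effectively helped learners correct the ten types of English grammatical error. Some error types showed higher correction rates, whereas some could not be corrected with either tool. In addition, the search processes show that learners used the search operators effectively and developed search strategies. The study also found that learners held positive attitudes toward both corpus tools and were willing to continue using them for language learning. Keywords: corpus tools, English grammatical errors, error types, error correction, search process, university students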

(3) Abstract. Studies have shown that learners can use corpus tools to correct different types of error; however, each corpus tool may have its own strengths and weaknesses depending on its design purpose. Thus, an increasing number of scholars have advocated the use of more than one corpus tool to support learners in data-driven learning (DDL). Among the various corpus tools that have been analyzed, two features have been found helpful for learners correcting errors: gigantic corpora and user-friendly interfaces similar to search engines. Netspeak and Linggle are two powerful corpus tools that encompass copious corpora and offer such interfaces, which makes them suitable for DDL. The present study aims to analyze the effects of Netspeak and Linggle on assisting learners to correct ten types of error. Four intermediate college learners were recruited. Two of them used Netspeak and then Linggle to correct errors; the other two used the two corpus tools in the reverse order. The instruments included the error correction tasks, learners' searching logs, and pre- and post-questionnaires. Three aspects were examined: the types of error that learners could correct by using the two corpus tools, the error correction processes, and learners' attitudes toward the two corpus tools. The results showed that both corpus tools were useful for learners in correcting the ten types of error. The tools were more useful for some error types, while other error types might not be corrected with either tool. Moreover, learners used operators with various strategies while correcting errors. Finally, learners held positive attitudes toward the two corpus tools and were willing to use them in the future. Keywords: corpus tools, grammatical error, error correction, error type, DDL, EFL

(4) ACKNOWLEDGEMENT. I would like to express my sincere gratitude to the many people who provided assistance and support to make this thesis possible. First and foremost, I would like to show deep gratitude to my advisor, Prof. Hao-Jan Chen, for his continuous support of this study and his great patience with me. His insightful guidance and valuable comments helped me to conduct research and write this thesis. He always motivates and inspires me with his immense knowledge and philosophy of life. I am truly grateful to have him as my advisor. Besides my advisor, I would like to give special thanks to my committee members, Prof. Chih-Cheng Lin and Prof. Jason S. Chang, for their valuable comments and suggestions, which made this thesis more complete. My sincere thanks also go to the many professors in the English and Education Departments who provided me with timely assistance. I would also like to thank my classmates at National Taiwan Normal University, especially Igance and Lisa, for all the inspiring discussions that we had and for their valuable feedback, which helped me improve my thesis. Moreover, I would like to thank my dear family and friends for their continuous support and constant encouragement throughout the writing of this thesis.

(5) TABLE OF CONTENTS
Chinese Abstract
ABSTRACT
ACKNOWLEDGEMENT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER 1 INTRODUCTION
  1.1 Background
  1.2 Motivation
  1.3 Purpose
  1.4 Research Questions
  1.5 Significance
CHAPTER 2 LITERATURE REVIEW
  2.1 Notions of Data-driven Learning (DDL)
  2.2 Studies of DDL
    2.2.1 General Learning Abilities
    2.2.2 Writing Accuracy
    2.2.3 Different Types of Grammatical Error
CHAPTER 3 METHODOLOGY
  3.1 Participants
  3.2 Instrument
    3.2.1 Two Corpus Tools
      3.2.1.1 Netspeak
      3.2.1.2 Linggle
    3.2.2 Inputlog
    3.2.3 The Error Correction Task
    3.2.4 Questionnaires and Interviews
  3.3 Procedures
  3.4 Data Collection and Analysis

(6) CHAPTER 4 RESULTS AND DISCUSSION
  4.1 The Type of Error Corrected by Learners
    4.1.1 Each Learner’s Correct Answers
    4.1.2 Correct Answers of Each Error Type
    4.1.3 Three Types of Changes of Learners' Answers
  4.2 Learners’ Searching Processes
    4.2.1 Operators and Average Queries of Each Item
    4.2.2 Strategies for DDL
  4.3 Learners’ Attitudes
    4.3.1 Learners’ Prior Experiences
    4.3.2 Learners’ Attitudes toward Netspeak and Linggle
    4.3.3 Learners' Attitude Changes after Using the Corpus Tools
  4.4 Discussion
    4.4.1 Types of Error that Learners Corrected by Using Corpus Tools
      4.4.1.1 Each Learner’s Correct Answers
      4.4.1.2 Correct Answers of Each Error Type
      4.4.1.3 Three Kinds of Changes of Learners' Correct Answers
    4.4.2 Learners’ Searching Processes with Netspeak and Linggle
    4.4.3 EFL Learners’ Attitudes toward the Two Corpus Tools
      4.4.3.1 Target Users and Interfaces of the Two Corpus Tools
      4.4.3.2 The Operators of the Two Corpus Tools
      4.4.3.3 Using the Two Corpus Tools for English Learning
      4.4.3.4 Attitude Changes after Using the Two Corpus Tools
CHAPTER 5 CONCLUSION
  5.1 Summary
  5.2 Pedagogical Implications
  5.3 Limitations and Suggestions for Future Research
REFERENCES
APPENDIXES

(7) LIST OF TABLES
Table 2.1 Summary of the Results of Previous Error Types
Table 3.1 Operators of Netspeak
Table 3.2 Operators of Linggle
Table 3.3 Ten Types of Error for the Present Study
Table 3.4 Same Items on Pre- and Post-questionnaires
Table 4.1 The Paired T-test Results of Both Groups of Learners
Table 4.2 Correct Answers by using Netspeak
Table 4.3 Correct Answers by using Linggle
Table 4.4 Correct Answers of Ten Types of Error by Using Netspeak and Linggle
Table 4.5 Rankings of Correct Answers of Ten Error Types
Table 4.6 Types of Changes of Learner A and B's Answers
Table 4.7 Types of Changes of Learner C and D's Answers
Table 4.8 Searching logs of Netspeak’s operators
Table 4.9 Searching logs of Linggle’s operators
Table 4.10 Searching Logs of Linggle’s “POS” Operators
Table 4.11 Searching Logs of Netspeak and Linggle’s Operators
Table 4.12 Average Query for Each Item Submitted by Learners
Table 4.13 Results of the pre-questionnaires
Table 4.14 Results of the post-questionnaires
Table 4.15 Learners’ preferences of Netspeak’s operators
Table 4.16 Learners’ preferences of Linggle’s operators
Table 4.17 Learners’ Attitudes toward Three Items on both Questionnaires

(8) LIST OF FIGURES
Figure 3.1 Interface of Netspeak
Figure 3.2 Interface of Linggle
Figure 3.3 Interface of Inputlog
Figure 3.4 Overall Procedures of the Present Study

(9) CHAPTER 1 INTRODUCTION

1.1 Background

The importance of improving writing accuracy for EFL learners has been emphasized in the field of language learning (Chen, 2009; Milton, 2006). One common way for EFL learners to improve writing accuracy is to receive error feedback (Ferris, 2006; Milton, 2006). In EFL settings, English teachers are often expected to give feedback when learners make grammatical errors (Milton, 2006). However, teachers are not always available to give feedback because of time constraints or the pressure of dealing with a large group of learners (Milton, 2006; Yeh, Liou & Yu, 2007). Even when teachers are able to provide feedback, it may be delayed rather than immediate (Milton, 2006; Yeh, Liou & Yu, 2007). Learners who receive delayed feedback cannot find out how to correct their errors right away; instead, they have to wait for a period of time. According to Milton (2006), this waiting can be time-consuming and discouraging for learners.

To cope with this situation, scholars have advocated empowering learners to search independently for language patterns in large databases that display copious language patterns. Doing so could reduce teachers' workload and help learners spend less time waiting for teachers' responses (Conroy, 2010; Gaskell & Cobb, 2004; Hafner & Candlin, 2007; Johns, 1991; Milton, 2006; Shei, 2008a, 2008b; Sun, 2007). This approach is known as data-driven learning, or DDL, in which teachers provide resources for learners to discover knowledge independently. The roles of teachers and learners

(10) are different. Teachers are informants who introduce different learning tools and strategies to facilitate learners' independent language learning. Learners, on the other hand, use the suitable tools and strategies provided by teachers to work as independent researchers and find answers tailored to their needs (Johns, 1991).

Various corpus tools were used in previous DDL studies as useful scaffolding for language learning (Conroy, 2010; Gaskell & Cobb, 2004; Hafner & Candlin, 2007; Sun, 2007). Corpus tools encompass rich corpora, which are linguistic data in context extracted from a large body of authentic texts or webpages. Corpus tools enable learners to search for specific linguistic features, including lexical, phrasal, or contextual information, within a few seconds (Conroy, 2010; Gaskell & Cobb, 2004; Hafner & Candlin, 2007; McEnery & Xiao, 2011; Shei, 2008b; Sun, 2007).

A large amount of research has been carried out on the effects of data-driven learning (DDL) on learners' writing accuracy or error correction (Boulton, 2010; Conroy, 2010; Gaskell & Cobb, 2004; Hafner & Candlin, 2007; Milton, 2006; Shei, 2008a, 2008b; Sun, 2007; Yeh, Liou & Yu, 2007; Yoon & Hirvela, 2004). Both differences and similarities were found in these studies. In terms of differences, different corpus tools were chosen to meet different research goals. In addition, learners were required to use different numbers of corpus tools. More DDL studies analyzed one corpus tool (Boulton, 2010; Gaskell & Cobb, 2004; Milton, 2006; Shei, 2008a, 2008b; Yeh, Liou & Yu, 2007; Yoon & Hirvela, 2004), while fewer compared the effects of more than one corpus tool (Conroy, 2010; Hafner & Candlin, 2007; Sun, 2007).

Some similarities could also be found in previous studies. First, DDL was found to help learners improve writing accuracy or correct errors (Boulton, 2010; Conroy, 2010; Gaskell & Cobb, 2004;

(11) Hafner & Candlin, 2007; Milton, 2006; Shei, 2008a, 2008b; Sun, 2007; Yeh, Liou & Yu, 2007; Yoon & Hirvela, 2004). Learners were capable of correcting grammatical errors by using the designated corpus tools in their learning tasks (Conroy, 2010; Gaskell & Cobb, 2004; Yeh, Liou & Yu, 2007). In addition, learners believed DDL could help them to correct errors or improve writing accuracy; thus, they were willing to use corpus tools for DDL in the future (Conroy, 2010; Gaskell & Cobb, 2004; Hafner & Candlin, 2007; Milton, 2006; Sun, 2007; Yeh, Liou & Yu, 2007; Yoon & Hirvela, 2004). Second, learners tended to correct some error types more successfully than others after DDL (Boulton, 2007; Boulton, 2010; Conroy, 2010; Gaskell & Cobb, 2004; Hafner & Candlin, 2007; Yeh, Liou & Yu, 2007). Finally, the participants were mostly college learners who had mastered basic language rules. They were said to be more capable of inducing rules from corpora than learners at other education levels (Conroy, 2010; Hafner & Candlin, 2007; Gaskell & Cobb, 2004; Shei, 2008b; Sun, 2007; Yeh, Liou & Yu, 2007).

1.2 Motivation

Previous studies suggested that corpus tools containing larger corpora could provide learners with rich examples for generating linguistic rules (Conroy, 2010; Gaskell & Cobb, 2004). For example, learners stated that they could not find sufficient examples in the Brown Corpus, which contains one million words (Conroy, 2010). Thus, using corpus tools with corpora larger than one million words was suggested. In addition, Shei (2008a) mentioned that some corpus tools, such as the BNC (British National Corpus), remained closed systems because of their interfaces. On the other hand, tools with easily accessible interfaces, such as the search engine Google, could be powerful learning scaffolding. Therefore, corpus tools with a large

(12) data base and a user-friendly interface could be suitable for DDL. Netspeak and Linggle were designed with the two important features suggested by the previous studies mentioned above: large corpora (Conroy, 2010; Gaskell & Cobb, 2004) and user-friendly interfaces similar to Google (Shei, 2008a). Netspeak and Linggle are two corpus tools that can provide more than one billion corpora because they both extract corpora from Google. In addition, they have interfaces similar to the search engine Google, as suggested by Shei (2008a). Therefore, Netspeak and Linggle could be seen as two highly powerful corpus tools that are suitable for DDL.

However, few studies have analyzed the effects of using Netspeak and Linggle for DDL, especially for improving writing accuracy. One study showed that learners improved their writing performance after using Netspeak. Boisson et al. (2013) showed that Linggle is an extremely powerful corpus tool for language learning. In fact, Netspeak and Linggle have been compared in previous studies. Potthast et al. (2014) stated that the target users of Netspeak and Linggle are different: Netspeak is for average learners, and Linggle is for professional linguistic research. However, Boisson et al. (2013) suggested that Linggle is by far the most functionally comprehensive corpus tool aimed at ESL/EFL learners. In addition, Linggle provides some functions that cannot be found in Netspeak, such as searching by part of speech. Nonetheless, no empirical evidence was provided on the effects of using Linggle to help learners improve writing accuracy.

Based on the previous discussion, different corpus tools were designed for different target users and with various functions; therefore, they could have different strengths and weaknesses. Some corpus tools could be useful for assisting

(13) learners to correct certain error types, while other corpus tools may be more useful for correcting other types of error. It was also demonstrated that learners would use different corpus tools to improve writing accuracy, showing that learners were also aware of the strengths and weaknesses of different corpus tools (Conroy, 2010; Hafner & Candlin, 2007; Sun, 2007). Thus, it was argued that using different corpus tools to correct different types of error may compensate for the weaknesses of each tool and facilitate language learning (Conroy, 2010; Hafner & Candlin, 2007; Sun, 2007). Therefore, comparing different corpus tools is crucial for revealing how well each tool helps learners correct different types of error. The present study thus compares Netspeak and Linggle, two highly powerful corpus tools that were said to be suitable for DDL, and analyzes their effects on assisting EFL college learners to correct different types of error. Additional evidence is provided in the hope of giving a thorough understanding of the extent to which Netspeak and Linggle help learners to correct different types of error.

1.3 Purpose

The purpose of the present study is to analyze the effects of Netspeak and Linggle on assisting learners to correct ten common ESL/EFL error types. These two powerful corpus tools are reported to contain gigantic corpora and to have user-friendly interfaces (Boisson et al., 2013; Potthast et al., 2014). In fact, both corpus tools contain more than 1 billion corpora extracted from Google, which gives easy access to corpora that are even larger than those of many other well-established corpus tools. Thus, they could provide copious contextualized language patterns to help learners generate language rules, which would be extremely suitable for DDL (Conroy, 2010; Gaskell

(14) & Cobb, 2004). In addition, it was demonstrated that learners were capable of using Netspeak for DDL after only five minutes of training because of its simple and user-friendly interface (Potthast et al., 2014).

Other common features shared by the two corpus tools make them comparable (Boisson et al., 2013; Potthast et al., 2014). Both Netspeak and Linggle enable learners to search for one to five contiguous words. When learners search for a word or a phrase, the corpora are ordered by frequency, so that common usages are shown ahead of less frequent ones. Learners can understand words or phrases in context and, based on the frequencies provided by the corpus tools, select the language patterns tailored to their needs.

On the other hand, Netspeak and Linggle provide some different functions (Boisson et al., 2013; Potthast et al., 2014), which make them unique and worth analyzing. For example, Netspeak and Linggle have some different operators, which are symbols that cause corpora to be displayed in certain ways. Learners can therefore check uncertain or possible language patterns in various ways by using different operators to search for particular corpora.

Based on the above discussion, Netspeak and Linggle are two powerful corpus tools that are similar in nature but provide some different functions, which could be helpful to different extents for learners correcting errors through DDL. Thus, the present study aims to analyze and compare the extent to which Netspeak and Linggle can help EFL college learners correct ten types of error. The ten error types include collocations, conjunctions, gerunds and infinitives, parts of speech, prepositions, pronouns, noun plurals, participles, modals,

(15) and word order. These types of error were chosen for analysis because previous studies showed that learners could correct most of them more successfully than other types of error (Boulton, 2007; Boulton, 2010; Conroy, 2010; Hafner & Candlin, 2007; Gaskell & Cobb, 2004). After the number of correct answers provided by learners is aggregated for each error type, the types of error that could be corrected by using Netspeak or Linggle are shown. The strengths and weaknesses of Netspeak and Linggle for assisting EFL learners to correct errors and improve writing accuracy can thus be shown in the present study. Another purpose is to examine the learners' searching processes during error correction by analyzing the average number of queries, the operators, and the strategies that learners used to complete the error correction tasks. In addition, learners' attitudes toward using Netspeak and Linggle for error correction are analyzed.

1.4 Research Questions

Three research questions will be addressed in the present study:
1. Which types of error could EFL learners correct by using Netspeak and Linggle?
2. What operators and strategies did learners use to correct errors? What was the average number of queries for each item?
3. What were EFL learners’ attitudes toward using Netspeak and Linggle to correct errors?

1.5 Significance

The present study attempts to explore Netspeak and Linggle for assisting EFL learners to correct different types of error. These two corpus tools have been reported to be powerful and to have positive impacts on learners' writing accuracy

(16) (Boisson et al., 2013; Potthast et al., 2014). Netspeak and Linggle are worth analyzing because they have gigantic corpora and user-friendly interfaces, which are two important factors for DDL suggested in previous studies (Conroy, 2010; Gaskell & Cobb, 2004; Shei, 2008a). In addition to their similar designs, Netspeak and Linggle have different functions that make them unique. Their similarities and differences may result in different strengths and weaknesses for learners correcting different types of error. A thorough understanding of these strengths and weaknesses could shed light on whether the two corpus tools can help learners improve writing accuracy, especially in correcting different types of error.

The ten types of error in the present study are also worth analyzing. Previous studies showed that learners could correct and reduce most of these error types (Boulton, 2007; Boulton, 2010; Conroy, 2010; Hafner & Candlin, 2007; Gaskell & Cobb, 2004). On the other hand, one type of error that was found to increase significantly after DDL treatment (Gaskell & Cobb, 2004) and one major verb-form error type that is important for English learners (Granger, 1997; Yeh, Liou & Yu, 2007) are also included for further analysis. In addition, the ten error types were found to be common ESL/EFL types of error (Chen, 2009; Ferris, 2006). Thus, it is vital to investigate these ten types of error.

The present study attempts to offer some insights into the error types that EFL learners can correct with Netspeak and Linggle. First, it is hoped that an understanding of the types of error that learners could or could not correct by using Netspeak and Linggle will reveal the strengths and weaknesses of the two tools. English teachers and language researchers could then use the results as teaching and learning materials or as a research reference. Learners may also benefit from the results by understanding the possibilities of using different corpus tools to

(17) improve writing accuracy even without a teacher's presence. In addition to the error types, it is hoped that the analysis of learners' error correction processes, covering the average number of queries, frequently used operators, and strategies for using the two corpus tools for error correction, will provide a thorough and in-depth understanding of DDL. The results may help researchers develop corpus tools and make them more suitable for their target learners. Furthermore, by referring to the results, EFL teachers and learners can understand ways to correct errors with the two corpus tools, which may facilitate independent learning or teaching implementation in the future. The last insight could be inferred from the analysis of learners' attitudes. Learners' preferences in using the two corpus tools for error correction may reveal valuable learner feedback on the two corpus tools and on DDL. This feedback could serve as an important reminder for research on or teaching implementation of data-driven learning. In addition, researchers could consult the results when developing different corpus tools.
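The frequency-ranked lookup described in Section 1.3, in which a query of one to five contiguous words returns matching patterns with the most frequent shown first, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy n-gram counts are invented, the function name `search` is hypothetical, and the "?" placeholder for a single unknown word is a stand-in that is not claimed to be the operator syntax of Netspeak or Linggle.

```python
from typing import Dict, List, Tuple

# Hypothetical toy n-gram counts; the real tools draw on web-scale data.
NGRAM_COUNTS: Dict[str, int] = {
    "depend on the": 900,
    "depend upon the": 120,
    "depend in the": 3,
    "rely on the": 700,
}


def search(query: str, counts: Dict[str, int] = NGRAM_COUNTS) -> List[Tuple[str, int]]:
    """Return the n-grams that match the query, most frequent first.

    A "?" in the query stands for exactly one arbitrary word. This
    placeholder is an assumption of the sketch, not a documented operator.
    """
    pattern = query.lower().split()
    # Mirror the one-to-five contiguous word limit described in the text.
    if not 1 <= len(pattern) <= 5:
        raise ValueError("query must contain one to five words")
    hits = []
    for ngram, freq in counts.items():
        words = ngram.split()
        same_length = len(words) == len(pattern)
        if same_length and all(p == "?" or p == w for p, w in zip(pattern, words)):
            hits.append((ngram, freq))
    # Rank matches by frequency so common usages appear ahead of rare ones.
    return sorted(hits, key=lambda item: item[1], reverse=True)


# A learner unsure which preposition follows "depend" could query:
results = search("depend ? the")
for ngram, freq in results:
    print(f"{freq:>5}  {ngram}")
```

Under these toy counts, the learner would see "depend on the" ranked first and could treat the rare "depend in the" as a likely error, which is the kind of frequency-based judgment the text describes.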

(18) CHAPTER 2 LITERATURE REVIEW

The literature review of the present study contains two main sections. Section 2.1 presents the notions of DDL, and Section 2.2 reviews previous DDL studies. Within Section 2.2, three main aspects are discussed. Section 2.2.1 focuses on the analysis of DDL for general purposes. Section 2.2.2 covers DDL studies that analyzed writing accuracy. Finally, Section 2.2.3 reviews the types of error that have been analyzed in previous DDL studies.

2.1 Notions of Data-driven Learning (DDL)

According to Johns (1991), technology tools are helpful informants for language learning. When learners use technology tools as language learning references, they can search for the corpora that they need. After receiving the copious corpora provided by technology tools, learners themselves need to choose the corpora that fit their needs. Although other language learning references, including printed dictionaries and grammar books, are said to be useful, they may contain limited corpora compared to those provided by technology tools (Boisson et al., 2013; Johns, 1991). This is because technology tools can store, extract, and provide copious linguistic data without being limited by the volume of a book (Boisson et al., 2013; Johns, 1991). An approach was developed based on the above concepts, which has been known as data-driven learning, or DDL (Johns, 1991). In data-driven learning (DDL),

(19) technology tools are informants that provide rich corpora for learners to choose the suitable corpora that they need. After choosing the corpora, learners need to generalize hidden language rules based on the chosen corpora. Three stages of DDL implementation were proposed (Johns, 1991; McCarthy, 1995). Learners need to observe the examples with certain linguistic features, classify the examples either independently or collaboratively with peers, and generalize the hidden linguistic rules based on a particular linguistic feature. These three stages showed that learners need to discover knowledge as independent researchers. Thus, DDL is a learner-centered approach. In addition, this approach is discovery learning and inductive learning in nature (Johns, 1991; McEnery & Xiao, 2011). Learners will not be given the specified language rules to their grammatical problems; instead, they need to screen through the examples that contain certain linguistic features and generalize language rules in DDL. McEnery and Xiao (2011) also held the same belief and stated that DDL is like “teaching-to-exploit”, meaning that learners would learn how to use different ways to discover knowledge or complete their learning tasks in DDL. Learners need to induce language rules and develop learning strategies in the self-discovery learning journey (Johns, 1991; McEnery & Xiao, 2011). Teachers, on the other hand, are facilitators who provide contexts and strategies that are beneficial for learners to induce language rules. It is important that though learners’ active roles are highly emphasized in DDL, proper mediation and guidance from teachers or researchers are essential for a successful pedagogical implementation (Johns, 1991; Hafner & Candlin, 2007; McEnery & Xiao, 2011; Shei, 2008a; Sun, 2007; Yoon & Hirvela, 2004). 
Various learner factors should be taken into consideration for sufficient mediation, including proficiency level, age, motivation, prior experience, and prior knowledge (McEnery & Xiao, 2011). Sufficient training in using technology tools for DDL is also important in order to understand learners' actual performances (Conroy, 2010; Shei, 2008).

Corpus tools are the technology tools that have been used for the self-discovery learning journey in DDL (Johns, 1991; McEnery & Xiao, 2011). Corpus tools are developed by researchers. They are databases that contain copious corpora obtained by extracting typical lexical and phrasal sentences; moreover, the corpora are sequenced based on their frequencies (McEnery & Xiao, 2011). The design of each corpus tool may differ in order to meet different needs. For instance, different corpus tools may extract corpora from different sources. COCA (Corpus of Contemporary American English) is one of the largest free corpus tools for contemporary American English, whereas the BNC (British National Corpus) is another well-developed corpus tool that extracts corpora of British English. Some corpus tools even provide additional sorting functions which enable learners to focus on corpora in specific contexts. For example, in COCA the corpora can be further sorted by genre, year, or part of speech. Learners can find a list of examples that contain the features they need within a few seconds.

Corpus tools have been considered beneficial for language learning (Johns, 1991; McEnery & Xiao, 2011). They can help learners find corpora extracted from a collection of texts in a short amount of time. In addition, corpus tools provide authentic and typical language patterns with contextualized information (McEnery & Xiao, 2011). Other advantages have also been mentioned (Johns, 1991; McEnery & Xiao, 2011). According to McEnery and Xiao (2011), corpus tools may help language teaching and learning in many respects, including reference publishing, syllabus design and materials development, language testing, references for language teaching and learning, teacher development, and language analysis.

2.2 Studies of DDL

In this section, studies related to data-driven learning (DDL) are reviewed. This section is divided into three sub-sections. Section 2.2.1 addresses DDL studies that analyzed general learning abilities. Section 2.2.2 reviews DDL studies that focus on writing accuracy. Section 2.2.3 discusses the types of error that have been analyzed in previous DDL studies.

2.2.1 General Learning Abilities

Three main research directions were identified in a paper that reviewed 39 DDL studies (Boulton, 2007). The first is learners’ attitudes or perceptions toward using corpus tools for DDL. In this type of study, a self-rated questionnaire was frequently used (Conroy, 2010; Hafner & Candlin, 2007; Sun, 2007; Yoon & Hirvela, 2004). Some studies also conducted an interview after learners completed their questionnaire in order to obtain thorough learner feedback (Conroy, 2010; Sun, 2007; Yoon & Hirvela, 2004). The second concerns learners’ behaviors during DDL (Hafner & Candlin, 2007). For example, Hafner and Candlin (2007) analyzed how learners consulted a corpus tool by analyzing learners' searching logs. The third is the effects of DDL (Boulton, 2010; Conroy, 2010; Johns, 1991; Shei, 2008a; Shei, 2008b), that is, whether learners were able to improve their language performances through DDL.

Studies have shown that DDL provides some benefits for general learning abilities (Boulton, 2007; Johns, 1991). To begin with, scholars stated that learners developed independent learning skills and learning strategies after DDL (Boulton, 2007; Johns, 1991; O’Sullivan, 2007; Shei, 2008b). When learners have the skills and strategies for self-learning, they can become more competent in problem solving and independent learning even without a teacher's presence. For example, O’Sullivan (2007) mentioned that learners could develop various cognitive skills by using corpus tools, including predicting, observing, noticing, thinking, reasoning, analyzing, interpreting, reflecting, exploring, making inferences, focusing, guessing, comparing, differentiating, theorizing, hypothesizing, and verifying. In addition to these cognitive skills, scholars reported that DDL enables learners to gain procedural knowledge and metacognitive strategies because learners need to understand the learning tasks and develop appropriate procedures to complete them (Boulton, 2007; Johns, 1991). Learners may also be more aware of their learning processes in DDL; thus, they could become more autonomous and develop better learning skills in the long term because of the learner-centered approach (Boulton, 2007).

Some issues were addressed in previous studies (Boulton, 2007; Boulton, 2010). DDL might fail to attract a large group of users because it costs a lot of time and effort to implement. Knowledge discovery is a crucial element in DDL; nonetheless, it requires learners to spend more time and effort than teaching methods that are not inductive in nature. However, Boulton (2010) specified that these difficulties resulted from the implementation of DDL rather than from its nature. In addition, they do not necessarily mean that DDL is inferior to other teaching methods in terms of learning outcomes. For instance, deductive teaching is a popular teaching method in EFL countries.
It seems to save time and effort because learners mainly absorb the knowledge transmitted by teachers rather than actively discovering knowledge by themselves as in DDL. When the learning outcomes of deductive teaching and DDL, an inductive approach, were compared, the results showed similar short-term effects but different long-term effects (Cobb, 1999; Boulton, 2010). For example, Cobb (1999) demonstrated that the short-term benefits of deductive teaching and DDL were similar. Nonetheless, their learning outcomes differed significantly in the long term. Learners who received DDL treatment retained the target knowledge longer than those who received deductive teaching (Cobb, 1999). Furthermore, although learners noticed that DDL could be time-consuming and might demand a great deal of effort, they were well aware of its benefits and believed it could be helpful for language learning (Hafner & Candlin, 2007).

2.2.2 Writing Accuracy

Previous DDL studies showed some similarities in terms of their participants (Boisson et al., 2013; Boulton, 2007; Boulton, 2010; Gaskell & Cobb, 2004; Milton, 2006; Potthast et al., 2014; Shei, 2008a, 2008b; Yeh, Liou & Yu, 2007). Most studies recruited learners who had mastered basic language rules (Boulton, 2007; Gaskell & Cobb, 2004; Milton, 2006; O’Sullivan, 2006; Shei, 2008a, 2008b; Yeh, Liou & Yu, 2007). Scholars explained that such learners were more capable of brainstorming possible language patterns and understanding the examples provided by corpus tools (Gaskell & Cobb, 2004; Milton, 2006; Shei, 2008a, 2008b; Yeh, Liou & Yu, 2007). Among these studies, a large number targeted non-native English learners (Boulton, 2007). It was stated that DDL could be even more beneficial for EFL and ESL learners, who might not receive as much language input in their daily lives as native speakers (Milton, 2006; Potthast et al., 2014; Shei, 2008a, 2008b; Yeh, Liou & Yu, 2007). Receiving linguistic input from corpus tools could be an alternative for EFL learners. They can learn common usages and various language formulations through DDL with different corpus tools. Studies that targeted EFL and ESL learners reported that DDL could be beneficial for developing various language skills, such as reading (Conroy, 2010), vocabulary learning (Boulton, 2010; Conroy, 2010; Yeh, Liou & Yu, 2007), and writing (Boisson et al., 2013; Boulton, 2007; Boulton, 2010; Gaskell & Cobb, 2004; Milton, 2006; O’Sullivan, 2006; Shei, 2008a, 2008b; Yeh, Liou & Yu, 2007).

Numerous studies were conducted to understand the effects of DDL on improving learners' writing abilities (Boisson et al., 2013; Boulton, 2007; Boulton, 2010; Gaskell & Cobb, 2004; Milton, 2006; Shei, 2008a, 2008b; Yeh, Liou & Yu, 2007). Various corpus tools were reported to be suitable writing scaffolding for DDL for the following two reasons (Boulton, 2007; Boulton, 2010; Gaskell & Cobb, 2004; Milton, 2006; Shei, 2008a, 2008b; Yeh, Liou & Yu, 2007). To begin with, learners indicated that corpus tools provided concrete examples which helped them understand the contexts of a grammar point (Boulton, 2010; Yoon & Hirvela, 2004). Rather than learning discrete grammar rules without ample examples or contexts, learners are able to visualize the context of a grammar point as contextualized grammar (Potthast et al., 2014). Therefore, learners could understand how to use grammar in context. In addition, learners stated that they learned more vocabulary or phrases from the contexts of a grammar point that were provided by corpus tools (Yoon & Hirvela, 2004). This is because the examples or contexts might contain new or unfamiliar words, whose meanings learners need to look up in order to understand the examples completely. Another study supported this concept and specified that learners could experience incidental and informal learning during DDL (Hafner & Candlin, 2007). Learners may thus gain language knowledge beyond what was expected.

Secondly, learners also showed positive attitudes toward the corpus tools that they used in DDL (Boulton, 2010; Gaskell & Cobb, 2004; Yeh, Liou & Yu, 2007; Yoon & Hirvela, 2004). It was stated that corpus tools were useful, convenient, and helpful for assisting learners to complete their language learning tasks (Boulton, 2010; Gaskell & Cobb, 2004; Yeh, Liou & Yu, 2007; Yoon & Hirvela, 2004). In addition, learners stated that they were willing to use the corpus tools in the future or for other courses (Gaskell & Cobb, 2004; Yoon & Hirvela, 2004).

Various corpus tools were reported to be useful (Boulton, 2007). The corpus tools contain different amounts of corpora, ranging from billions of corpora to only 2000 corpora. Some corpus tools that have been analyzed were the BNC, the Brown Corpus, WordSmith Tools, and MICASE. Moreover, some studies used corpus tools that extract corpora from webpages. An increasing number of studies have analyzed such corpus tools for DDL in recent years (Boisson et al., 2013; Gaskell & Cobb, 2004; Milton, 2006; Potthast et al., 2014; Shei, 2008a, 2008b; Yeh, Liou & Yu, 2007). Cobb (1997) was one of the first researchers to conduct an empirical experiment investigating the effects on language learning of a corpus tool that extracts corpora from webpages. The results showed that learners gained more vocabulary by using the corpus tool with glossary systems.
Afterward, a great amount of research was carried out and showed that these corpus tools were extremely beneficial for DDL (Boisson et al., 2013; Gaskell & Cobb, 2004; Milton, 2006; Potthast et al., 2014; Shei, 2008a, 2008b; Yeh, Liou & Yu, 2007). The interfaces and the amount of corpora are the two main advantages of corpus tools that extract corpora from webpages. Their interfaces tend to be concise and similar to search engines that learners are already familiar with, such as Google. Thus, these corpus tools are praised for their familiarity, accessibility, and user-friendly interfaces (Boisson et al., 2013; Gaskell & Cobb, 2004; Milton, 2006; Potthast et al., 2014; Shei, 2008a, 2008b; Yeh, Liou & Yu, 2007). The other advantage is the amount of corpora they provide. Copious corpora are extracted from a wide variety of webpages. When webpages are created or edited, the corpora are automatically updated (Shei, 2008a, 2008b). As a result, these corpus tools can be frequently updated and can even provide more than one billion phrases, which could outperform many other well-established corpus tools in terms of the size and richness of their corpora.

Research also showed that these corpus tools could provide resourceful feedback for improving learners' writing abilities in terms of learning common English usages and being aware of the variation in English (Boisson et al., 2013; Shei, 2008a, 2008b). The corpora are sequenced based on frequencies. Words or phrases with higher frequencies are shown at the beginning of the search results. In general, a higher frequency indicates that a word or phrase is more commonly used than one with a lower frequency. Thus, learners could compare frequencies and learn common language patterns.

Other than common English usages, the variation in English could also be shown in these corpus tools (Shei, 2008a, 2008b). The corpus tools extract corpora from webpages in various contexts, so they contain language patterns in rich contexts. This echoes Shei (2008a, 2008b), who claimed that the nature of language for communication is closer to a spectrum than to discrete right or wrong answers. More than one possible language pattern may carry the same meaning. Some subtle differences could only be found in the examples provided by these corpus tools and may not be thoroughly covered in a dictionary because of the volume limitation (Boisson et al., 2013).

Previous DDL studies also revealed that corpus tools that extract corpora from webpages could be extremely helpful for learners to improve writing accuracy (Boisson et al., 2013; Gaskell & Cobb, 2004; Milton, 2006; Potthast et al., 2014; Shei, 2008a, 2008b; Yeh, Liou & Yu, 2007). Most studies showed that learners improved their writing abilities and made fewer errors in their essays after using these corpus tools for DDL (Boisson et al., 2013; Potthast et al., 2014; Milton, 2006; Shei, 2008a, 2008b). Several examples are provided as follows. Milton (2006) showed that learners were more aware of inappropriate usages and made fewer new errors in their essays. Shei (2008a, 2008b) also mentioned that learners could learn possible formulations of the phraseological units made by native speakers, such as extended collocation. Other studies analyzed Netspeak or Linggle and showed that they were helpful for improving learners' writing performances (Boisson et al., 2013; Potthast et al., 2014). Netspeak and Linggle were reported to have features that could benefit DDL: both have more than one billion corpora, allow searching for one to five words, and show the frequencies of words and phrases (Boisson et al., 2013; Potthast et al., 2014).
In fact, it was reported that learners improved their cloze test performances after using Netspeak (Potthast et al., 2014).

Although positive results were found, some challenges were mentioned for implementing DDL with corpus tools that extract corpora from webpages. One major challenge is that learners might find problematic examples in their search results (Conroy, 2010). These corpus tools extract corpora from webpages created by people with various levels of language proficiency. In addition, the words or phrases used on the webpages may not have undergone rigorous scrutiny for their appropriateness. Two solutions were proposed to reduce the chance that learners generalize rules from problematic examples (Conroy, 2010). One is to refer to the frequency of a language pattern, and the other is to analyze the contexts of a language pattern (Conroy, 2010).

First of all, the frequency of a language pattern can be a valuable reference. Although some problematic examples could still be found, they would have much lower frequencies than the correct usages (Conroy, 2010; Kilgarriff & Grefenstette, 2003). Learners should be aware that a low frequency may indicate either a problematic example or a usage restricted to particular contexts. Thus, learners should be extremely careful when they find language patterns with low frequencies (Conroy, 2010; Kilgarriff & Grefenstette, 2003; Sun, 2007). In contrast, learners are advised to look for language patterns with high frequencies because they tend to be common language patterns.

The second solution for dealing with problematic examples is to understand the contexts of a language pattern. Learners are advised to read the examples carefully in order to understand whether the context fits their needs (Conroy, 2010).
For instance, if learners would like to correct "about leaving" into the intended usage "about to leave", they might search for "about leaving" and find that this usage exists. However, when they look at the examples, they would find collocations such as "quotes about leaving", which differ from the intended usage. Therefore, it is crucial to train learners to distinguish among possible usages by consulting examples that provide contexts of certain language patterns (Conroy, 2010; Kilgarriff & Grefenstette, 2003; Sun, 2007).

The previous discussion shows that different corpus tools could have different limitations because they are designed for various purposes. Previous studies also showed that even corpus tools reported to be useful have different strengths and weaknesses (Conroy, 2010; Hafner & Candlin, 2007; Sun, 2007). Some are more helpful for correcting certain types of error, while others are less useful. To cope with this issue, scholars suggested that learners may use more than one corpus tool for DDL (Conroy, 2010; Hafner & Candlin, 2007; Sun, 2007). As a result, an increasing number of scholars have evaluated the extent to which different corpus tools could assist learners to improve writing performances. They compared the effects of different corpus tools through writing or error correction tasks and argued for adopting more than one corpus tool to facilitate DDL (Conroy, 2010; Hafner & Candlin, 2007; Sun, 2007). Some results are presented below.

Learners held positive attitudes toward using different corpus tools for DDL. In some studies, learners agreed that using different corpus tools for DDL was useful, convenient, and rewarding for their writing performances (Conroy, 2010; Hafner & Candlin, 2007; Sun, 2007). Other studies showed that learners were more willing to use different corpus tools to complete writing tasks after training (Conroy, 2010; Hafner & Candlin, 2007). In fact, the percentage of learners who used two corpus tools for DDL increased drastically after the training, suggesting that learners were motivated to try different kinds of corpus tools because they were more aware of the strengths and weaknesses of different corpus tools (Conroy, 2010). Finally, learners mentioned that they were willing to use different corpus tools as writing scaffolding in the future (Conroy, 2010; Hafner & Candlin, 2007; Sun, 2007).

In addition to learners' positive attitudes toward using different corpus tools, some learners tended to choose corpus tools with three features: a large amount of corpora with rich contexts, interfaces that learners were familiar with, and examples that are less problematic (Conroy, 2010; Hafner & Candlin, 2007; Sun, 2007). The first factor was the amount of corpora provided by the corpus tools. A large number of learners tended to choose corpus tools with more corpora over those with fewer corpora in order to find rich examples (Conroy, 2010; Hafner & Candlin, 2007; Sun, 2007). For example, learners indicated that the one million words in the Brown Corpus were not enough for them to find sufficient examples. Thus, it is suggested that corpus tools that contain more corpora are more suitable for DDL (Conroy, 2010). This suggestion was also made in other studies that call for implementing DDL with corpus tools with larger corpora (Gaskell & Cobb, 2004). Other than the amount of corpora, it was reported that learners tended to choose corpus tools with easy, user-friendly interfaces, especially interfaces similar to search engines such as Google (Conroy, 2010; Hafner & Candlin, 2007). This is because most learners were accustomed to searching for information with this kind of interface. The last factor was the use of corpus tools with less problematic examples. A small number of learners indicated that they preferred corpus tools whose examples undergo rigorous scrutiny.
They were concerned that they might not be able to distinguish the correct usages from the incorrect ones, and thus that they might be misled by problematic examples and generalize incorrect language rules (Sun, 2007).

While some learners tended to choose corpus tools with certain features, others preferred to use different corpus tools for different writing goals (Conroy, 2010; Hafner & Candlin, 2007; Sun, 2007). Learners used concordancers for solving lexical-grammatical problems, such as determining word category, checking prepositions, and learning passive and active voice (Conroy, 2010). In contrast, learners used Google for several purposes, including checking intuition (Conroy, 2010; Hafner & Candlin, 2007; Sun, 2007), ensuring that their language patterns suited academic style (Conroy, 2010), and learning new words (Conroy, 2010; Hafner & Candlin, 2007).

Previous studies shed light on the extent to which different corpus tools could assist learners to complete writing tasks (Conroy, 2010; Hafner & Candlin, 2007; Sun, 2007). However, some issues remain unsolved and need further investigation. One major issue was that learners were asked to use two or more corpus tools, but they may not have spent the same amount of time on each. Thus, it is more difficult to conclude which corpus tools contributed to the positive results. For example, Sun (2007) analyzed the effects of a corpus tool on assisting learners to improve writing performances. The results showed that learners believed the corpus tool assigned by the researcher was useful for improving writing performances. However, learners revealed that they also consulted other corpus tools such as Google in order to find sufficient examples. Since the amounts of time spent on the assigned corpus tool and on Google were not clearly stated, it is difficult to draw conclusions about the contribution of each tool.

2.2.3 Different Types of Grammatical Error

Previous DDL studies showed that learners were capable of using various corpus tools to improve writing accuracy, especially for error correction (Boisson et al., 2013; Boulton, 2007; Boulton, 2010; Gaskell & Cobb, 2004; Potthast et al., 2014; Yeh, Liou & Yu, 2007). Some of these researchers analyzed the types of error that could be corrected by learners (Boulton, 2007; Boulton, 2010; Conroy, 2010; Gaskell & Cobb, 2004; Hafner & Candlin, 2007; Potthast et al., 2014; Yeh, Liou & Yu, 2007). A summary of previous DDL studies on the types of error that learners could correct is presented in Table 2.1.

Table 2.1 Summary of the Results of Previous Error Types

| Study | Corpus Tool | Error Type | Errors Changed Significantly |
| Boulton (2007) | BoE, BNC, Brown, ICE, MICASE, VIEW, WordSmith Tools, MicroConcord, Sara, web | 2 errors: 1. Lexical errors; 2. Collocational errors | 1. Lexical errors; 2. Collocational errors |
| Boulton (2010) | BNC | 15 errors: 1. grammar errors; 2. word usage errors | Similar words but used in different contexts |
| Conroy (2010) | Virtual Language Centre, Compleat Lexical Tutor, Google | Not specified | 1. Concordancers: correct lexical and grammatical errors: (1) parts of speech, (2) prepositions, (3) verbs; 2. Web: ensure essays were (1) more native-like, (2) suitable for academic style |
| Gaskell and Cobb (2004) | Lextutor | 10 grammatical errors: 1. conjunctions; 2. gerunds and infinitives; 3. noun plurals; 4. prepositions; 5. capitals and punctuation; 6. word order; 7. pronouns; 8. modals; 9. subject/verb agreement | 1. Reduction: pronouns; 2. Increase: conjunctions |
| Hafner & Candlin (2007) | LAWS | Types of error not specified | 1. vocabulary; 2. phrases; 3. general writing skills |
| Potthast et al. (2014) | Netspeak | Three levels of difficulty: 1. Easy; 2. Medium; 3. Hard | Learners could improve test scores at all three levels of difficulty |
| Yeh, Liou & Yu (2007) | TOTALrecall | 3 types of error: 1. verbs; 2. nouns; 3. adjectives | Not specified which types of error could be corrected successfully |

Eighteen types of error were analyzed in previous studies: adjectives, articles, conjunctions, capitals, collocation, gerunds and infinitives, informal usage, idioms, misspelling, modals, noun plurals, parts of speech, prepositions, pronouns, punctuation, subject-verb agreement, tenses, and word order (Boulton, 2007; Boulton, 2010; Conroy, 2010; Gaskell & Cobb, 2004; Hafner & Candlin, 2007; Potthast et al., 2014; Yeh, Liou & Yu, 2007). Most of the errors analyzed were lexical and collocation errors (Boulton, 2010; Gaskell & Cobb, 2004; O’Sullivan, 2006; Yeh, Liou & Yu, 2007), a tendency also found in Boulton's (2007) review of 39 previous DDL studies. Lexical errors refer to the wrong choice of a word in a certain context, and collocation errors are errors that violate typical word combinations. Other studies investigated errors other than lexical or collocation errors. For instance, Gaskell and Cobb (2004) analyzed word order and subject/verb agreement, which are at the sentential level.

Some errors showed significant changes after DDL, as shown in Table 2.1. Learners were more capable of correcting three of the eighteen types of error: collocation, prepositions, and pronouns (Boulton, 2007; Boulton, 2010; Conroy, 2010; Gaskell & Cobb, 2004; Hafner & Candlin, 2007; Potthast et al., 2014; Yeh, Liou & Yu, 2007). Although other types of error did not show significant reduction, they still showed some reduction after DDL treatments. In fact, six types of error were said to show some reduction, and some of them almost changed significantly (Gaskell & Cobb, 2004). These errors were capitals, gerunds and infinitives, modals, noun plurals, punctuation, and word order. Three of them, namely capitals, punctuation, and word order, almost achieved significant reduction. In contrast, learners made significantly more conjunction errors, suggesting that learners may not be able to correct this type of error. In another study (Conroy, 2010), learners reported that they used corpus tools to correct parts of speech errors and reduce their errors.

The types of error that changed significantly in previous DDL studies were similar to common errors made by EFL and ESL learners (Chen, 2009; Ferris, 2006). According to Chen (2009), several common errors made by EFL learners were identified through a large EFL Chinese corpus, including missing or extra article, spelling, fragment or missing comma, run-on sentence, subject-verb agreement, confused words, ill-formed verbs, wrong article, and compound word. Another study showed similar types of error that were frequently made by ESL learners. Ferris (2006) analyzed different types of error that ESL learners could correct. The typical error types were categorized by experienced instructors of ESL composition courses. They included word choice, verb tense, verb form, word form, article, singular-plural form, pronouns, run-on sentences, fragment, punctuation, spelling, sentence structure, informal usage, idiom, and subject-verb agreement. The common types of error made by ESL/EFL learners (Chen, 2009; Ferris, 2006) are thus similar to the eighteen types of error that have been analyzed in previous DDL research (Boulton, 2007; Boulton, 2010; Conroy, 2010; Gaskell & Cobb, 2004; Hafner & Candlin, 2007; Potthast et al., 2014; Yeh, Liou & Yu, 2007).
This suggests that the error types investigated in previous studies are worth analyzing because they are common errors for ESL/EFL learners.

CHAPTER 3. METHODOLOGY

3.1 Participants

The participants of the present study were four freshman students with different majors enrolled at a university in the northern part of Taiwan. Three of them were female and one was male. All of them were taking freshman English classes at the high-intermediate level and could be considered intermediate EFL learners. In addition, all of them had passed the GEPT Intermediate Level. All four participants used both Netspeak and Linggle to correct errors. To prevent the order of using the two corpus tools from influencing learners' performances, a counterbalanced design was implemented: learners were divided into two groups with opposite orders. One group used Netspeak first and then Linggle to correct errors. The other group used Linggle first and then Netspeak.

3.2 Instrument

This section first introduces Netspeak and Linggle, the two corpus tools used in the present study. Afterward, it introduces Inputlog, the tracking system for learners' searching processes. Next, the error correction tasks are addressed. The pre- and post-questionnaires and an interview are presented at the end of this section.
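The counterbalanced order described in Section 3.1 can be illustrated with a minimal sketch. The participant labels P1 to P4 and the alternating assignment rule are assumptions made for illustration only; the study does not state how participants were allocated to the two orders.

```python
from itertools import cycle

def counterbalance(participants, tools=("Netspeak", "Linggle")):
    """Assign each participant one of the two tool orders, alternating
    so that both orders are used by the same number of participants."""
    orders = [tuple(tools), tuple(reversed(tools))]
    return {p: order for p, order in zip(participants, cycle(orders))}

# Hypothetical participant labels; not taken from the study.
plan = counterbalance(["P1", "P2", "P3", "P4"])
for participant, order in plan.items():
    print(participant, "->", " then ".join(order))
```

With four participants, this yields two learners per order, which matches the two-group arrangement reported above.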

3.2.1 Two Corpus Tools

This section introduces Netspeak and Linggle, the two corpus tools used in the present study. These two corpus tools have similar designs in general, but they also have some different functions (Boisson et al., 2013; Potthast et al., 2014).

3.2.1.1 Netspeak

Netspeak (see Figure 3.1) has been developed at Bauhaus-Universität Weimar since 2008. It contains 2 billion words and phrases extracted from Google. Netspeak allows users to search for one to five continuous words, so that users focus on short language patterns instead of complete sentences. Some other corpus tools, such as WebCorp, enable users to type in more words. However, WebCorp does not provide suggested or related search results as Netspeak does. When suggested or related search results are shown, various possible language patterns appear. Therefore, learners could receive much more linguistic input beyond the target linguistic features. This allows users to find target information faster and to experience incidental learning (Hafner & Candlin, 2007).

Learners can learn common usages and the variation in English in Netspeak. The corpora are shown in order of frequency. Items that are more commonly used have higher frequencies and are shown at the beginning of the search results. This helps learners become aware of frequent or common usages. In addition to common usages, learners could understand the variation in English based on language patterns extracted from rich contexts (Potthast et al., 2014).
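The frequency ordering described above can be sketched as follows. This is only an illustration of ranking candidate phrases by frequency: the phrases and counts are invented for the example and are not Netspeak data, and the function name is hypothetical.

```python
def rank_by_frequency(counts):
    """Return (phrase, count, share in %) tuples, most frequent first."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(phrase, count, round(100 * count / total, 1))
            for phrase, count in ranked]

# Invented counts for three candidate phrases (not real corpus data).
candidates = {"interested on": 900, "interested in": 9000, "interested at": 100}
for phrase, count, share in rank_by_frequency(candidates):
    print(f"{phrase:15} {count:6} {share:5}%")
```

A learner comparing such a list would treat the top-ranked phrase as the most common usage, while still checking the examples in context, since a low-frequency pattern may be either an error or a usage limited to particular contexts.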

(38) Figure 3.1 Interface of Netspeak

Netspeak is a helpful corpus tool for improving writing accuracy and correcting errors for four reasons, which concern its target users, interface, instructions, and operators (Potthast et al., 2014). First, Netspeak is designed for average learners who have mastered basic language rules and want to check uncertain formulations (Potthast et al., 2014); English learners can use it as digital writing scaffolding to improve their writing performance. Second, it has a user-friendly interface similar to Google's. Third, Netspeak provides clear instructions that show learners how to use the operators to search for language patterns. In fact, the interface of Netspeak is so simple that five minutes of training was reported to be sufficient for learners to improve their cloze test scores after using Netspeak (Potthast et al., 2014). The last reason is that Netspeak provides five operators for searching for various language patterns. Operators are the symbols that lead to certain ways to sort the

(39) corpora, such as “?”. Operators give learners useful alternative ways to find possible language patterns. Although some corpus tools provide more than five operators, the functions of Netspeak's five operators were reported to be sufficient for average learners to improve their cloze test performance (Potthast et al., 2014). The five operators of Netspeak are listed in Table 3.1.

Table 3.1 Operators of Netspeak

| Operator | Purpose | Example |
|---|---|---|
| ? | Find one word. | how to ? this |
| ... | Find many words. | see ... works |
| [ ] | Compare options. | it's [great well] |
| # | Find similar words. | and knows #much |
| { } | Check the order. | {more show me} |

3.2.1.2 Linggle

Linggle is another powerful corpus tool, designed to help ESL/EFL learners improve their writing performance. It was developed by National Tsing Hua University in Taiwan (see Figure 3.2) and released in 2013 (Boisson et al., 2013). Linggle is a well-established corpus tool with more than one billion words and phrases extracted from Google, and it allows users to search for one to five words; it shares both features with Netspeak. Like Netspeak, Linggle shows learners common usages and the variation in English. It presents common usages together with the frequencies of words and phrases, and it also provides examples of various language patterns. Therefore, learners can learn the frequencies of various words and phrases.

(40) Meanwhile, learners are able to see language patterns in context and thus notice subtle differences among similar usages.

Figure 3.2 Interface of Linggle

Linggle has many functions that are useful for DDL. Four reasons are discussed here, concerning Linggle's target users, interface, examples, and operators. First, the target users of Linggle are EFL and ESL learners who would like to improve their writing performance. Second, Linggle's interface is also quite similar to Google's; it is user-friendly and provides clear instructions for all the operators, with examples showing how to use them. The third reason is that Linggle provides examples extracted from the New York Times Corpus instead of random webpages (Boisson et al., 2013). The New York Times is

(41) a well-established newspaper whose published words and phrases are likely to undergo rigorous scrutiny. Thus, fewer problematic examples are likely to be found than in corpus tools that extract their corpora from webpages. However, the examples provided by Linggle are more limited to the field of journalism.

The last reason is the operators. Linggle has many more operators than Netspeak. Among them, the operators for parts of speech are not available on Netspeak. These operators can be highly beneficial and time-saving for DDL because they yield more precise search results; for example, learners can use them to search specifically for verbs. Unlike Netspeak, however, Linggle does not provide an operator for checking word order. In addition to the operators for parts of speech and word order, the synonym operators were reported to be designed differently: Boisson et al. (2013) mentioned that Netspeak could provide only limited synonyms, whereas Linggle could provide more synonyms, including conceptually related words. Linggle's operators are listed in Table 3.2.

Table 3.2 Operators of Linggle

| Operator | Purpose | Example |
|---|---|---|
| _ | search for any word | listen_music |
| ~ | search for similar words | ~role |
| ? | search for the word optionally | listen ?to music |
| * | match zero or more words | good * role |
| / | see which word is more suitable | go in/to school |
| v. (verb) | search for parts of speech | v. research |
| n. (noun) | search for parts of speech | do n. |
| adj. (adjective) | search for parts of speech | adj. girl |
| adv. (adverb) | search for parts of speech | play adv. |
| prep. (preposition) | search for parts of speech | prep. campus |
| det. (determiner) | search for parts of speech | det. food |
| conj. (conjunction) | search for parts of speech | I was late conj I arrived |
| pron. (pronoun) | search for parts of speech | I like pron. |
| interj. (interjection) | search for parts of speech | interj. where is my key |

(42) 3.2.2 Inputlog

In order to analyze learners’ searching processes, Inputlog (see Figure 3.3) was used to track learners’ performance in the present study. Inputlog ran while learners completed the error correction tasks so that the operators, the strategies, and the average query times involved in using the two corpus tools could be examined. Inputlog is a program designed to record and analyze writing processes. It is a powerful tool because it logs keyboard and mouse input. It provides recordings of the writing process and analyses of revision times and content, the words that were searched, and the duration of the search for each item, among many other functions. When learners clicked the button to start recording, a Word file automatically popped up. Learners wrote their corrected answers in the Word file provided by Inputlog during the two error correction tasks. When they finished the tasks, they clicked the button to stop recording, and the Word files and the records of the searching processes were saved automatically.

Learners' Word files and the records of the searching processes could then be examined further. First, the Word files showed whether learners provided correct answers. Second, the analysis of the writing processes was presented in charts showing learners' keyboard and mouse input; thus, no further transcription by the researcher was needed. To understand

(43) learners’ searching processes, the operators, the strategies, and the average query times involved in using the two corpus tools to correct errors were analyzed. Furthermore, the researcher could replay the recordings to learn more about the writing processes.

Figure 3.3 Interface of Inputlog

3.2.3 The Error Correction Task

This section first addresses the rationale for choosing the ten types of error and then presents the design of the test sheet for the error correction task covering these ten error types. The ten types of error were chosen for the following reasons. First, collocation and pronouns were chosen because these two error types were found to be reduced significantly after learners used corpus tools to correct them, according to

(44) Table 2.1 (Boulton, 2007; Gaskell & Cobb, 2004). Second, conjunctions were included because learners were found to make significantly more errors of this type after using the corpus tool (Gaskell & Cobb, 2004), which suggests that learners may not be able to correct this error type and that it deserves further examination. Third, parts of speech and prepositions were included because learners reported that they used corpus tools to correct these two error types (Conory, 2010).

Error types that showed some reduction without reaching significance were also considered, in order to determine whether learners can correct them with the two corpus tools used in the present study. According to Table 2.1, there are six such error types: capitals, gerunds and infinitives, modals, noun plurals, punctuation, and word order (Gaskell & Cobb, 2004). However, capitals and punctuation were excluded because learners are capable of correcting them once they are underlined. Thus, the remaining four types were chosen for further investigation: gerunds and infinitives, modals, noun plurals, and word order.

Finally, participles are another important type of error worth analyzing, because previous DDL studies have analyzed verb forms (Conory, 2010; Yeh, Liou & Yu, 2007) and participles are one of the major verb forms in academic writing (Granger, 1997). Mastering participles is therefore important for learners who need to complete academic writing. However, Granger (1997) found that EFL learners used far fewer participles than native speakers did. One possible explanation is that EFL learners are unsure about the proper ways to use participles. Participles were therefore also included to determine whether intermediate EFL learners could master them and correct this type of error.
The ten error types examined in the present study are listed in Table 3.3.