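A Corpus-Based Study on Verb-Noun Collocations and Adjective-Noun Collocations in Published Authors’ and Taiwanese EFL Learners’ Academic Writing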


Master’s Thesis, Department of English, National Taiwan Normal University

A Corpus-Based Study on Verb-Noun Collocations and Adjective-Noun Collocations in Published Authors’ and Taiwanese EFL Learners’ Academic Writing

Advisor: Dr. Hao-Jan Chen (陳浩然)
Student: Ting-Yu Yang (楊定瑜)
June 2015

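CHINESE ABSTRACT

Previous research has shown that the use of collocations differs across academic disciplines, and that learners of academic English use collocations differently from native speakers. However, few studies have analyzed native speakers’ and learners’ use of English collocations within a single academic discipline. This thesis therefore investigates how journal authors and Taiwanese learners of English in the field of applied linguistics use verb-noun (V-N) and adjective-noun (A-N) collocations. The study aims to 1) list the V-N and A-N collocations frequently used by journal authors as a reference for learners of academic English; 2) identify the V-N and A-N collocations frequently used by Taiwanese learners of English and compare them with those frequently used by journal authors; and 3) point out the collocations that Taiwanese learners of English underuse or overuse.

To achieve these purposes, two corpora of nearly 12 million words each were built: a corpus of journal articles and a corpus of learners’ master’s theses. The journal article corpus consists of 1,500 articles selected from 15 international journals related to applied linguistics; the learner corpus consists of 494 master’s theses from applied linguistics or English teaching degree programs at 10 national universities. Twenty-nine high-frequency core nouns were identified in the two corpora, and their common verb/adjective collocates in each corpus were then extracted and compared.

In the journal article corpus, 181 types of frequent V-N collocations and 248 types of frequent A-N collocations were found; in the learner corpus, 203 types of frequent V-N collocations and 231 types of frequent A-N collocations were found. A comparison of the two groups’ use of academic collocations showed that 1) journal authors and Taiwanese learners shared 136 types of frequent V-N collocations and 159 types of frequent A-N collocations; and 2) the journal authors’ repertoire of collocations was larger than the learners’, and the sizes of the journal authors’ V-N and A-N repertoires were closer to each other. These results indicate that the journal authors’ use of academic V-N and A-N collocations was more balanced, whereas the Taiwanese learners’ V-N and A-N collocations were less diverse and less balanced. The study also found that the learners’ overuse of collocations was more pronounced than their underuse; in addition, collocations with higher t-scores were more likely to be overused or underused by learners. Further analysis of nine pairs of near-synonymous collocations showed that collocations that are frequent and highly associated, that have higher t-scores, and that are semantically clear and direct were more likely to be overused by learners.

The results show considerable differences between journal authors and Taiwanese learners of English in the use of academic V-N and A-N collocations. The researcher suggests that teachers of academic English and materials writers introduce the journal authors’ frequent collocations listed in this thesis into their courses and remind students to avoid using the same academic collocations too frequently, in the hope that these suggestions will make a modest contribution to improving Taiwanese learners’ academic writing ability.

Keywords: academic English collocations, corpus analysis, academic writing, academic English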

ABSTRACT

Research has revealed that usage patterns of collocations vary from one academic discipline to another, and that advanced English for Academic Purposes (EAP) learners in general use academic collocations quite differently from native speakers. However, discipline-specific studies on the usage patterns of collocations by either published authors or EAP learners are scant. The current research thus set out to investigate the frequent verb-noun (V-N) and adjective-noun (A-N) collocations employed by published authors and Taiwanese EFL learners in the field of applied linguistics. The purposes of this study are: 1) to generate lists of frequent V-N and A-N collocations in the published authors’ writing in order to provide EAP learners with a useful reference, 2) to extract the frequent V-N and A-N collocations in the Taiwanese EFL learners’ writing and compare the learners’ usage patterns with the published authors’, and 3) to identify V-N and A-N collocations that are underused or overused by the Taiwanese EFL learners.

To achieve these purposes, two corpora were built. The reference corpus, the Research Article Corpus (RAC), is composed of 1,500 research articles from 15 international journals in the field of applied linguistics. The learner corpus, the Master’s Thesis Corpus (MTC), is composed of 494 master’s theses from 10 national universities in Taiwan. Each corpus contains approximately 12 million words. The researcher identified 29 core nouns and investigated their frequent verb/adjective collocates in each corpus. The Taiwanese EFL learners’ usage patterns of the frequent V-/A-N collocations were then compared with the published authors’ to uncover the similarities and differences in their collocation use.

In the published authors’ articles, 181 types of V-N collocations and 248 types of A-N collocations were frequently employed. In the Taiwanese EFL learners’ theses, 203 types of V-N collocations and 231 types of A-N collocations were frequently used. Comparisons between the two groups’ usage patterns yielded the following findings. First, 136 types of V-N collocations and 159 types of A-N collocations were shared by the two corpora. Second, the published authors’ lexical repertoire of both V-N and A-N collocations was wider than the Taiwanese EFL learners’, and the published authors’ lexical repertoire of V-N collocations was proportionally similar to their lexical repertoire of A-N collocations. These findings suggest that the published authors demonstrated a more balanced productive knowledge of both V-N and A-N collocations in their writing, whereas the Taiwanese EFL learners’ production of V-N and A-N collocations was less diverse and less balanced.

This study also found that the Taiwanese EFL learners’ overuse was more salient than their underuse. Association analysis (i.e. t-score and MI-value) of the underused and overused collocations showed that both tended to be collocations with high t-scores. Further association analysis of several synonymous pairs of underused/overused collocations demonstrated that robust (i.e. frequent and highly associated) collocations and collocations with higher t-scores were more likely to be overused in the Taiwanese EFL learners’ writing. In addition, many verb/adjective collocates that recurrently formed different types of underused/overused collocations were observed in the learners’ writing. Some pedagogical implications for EAP writing instruction in the field of applied linguistics and possible directions for future research are proposed.

Key words: academic collocations, corpus analysis, academic writing, English for academic purposes

ACKNOWLEDGEMENT

This thesis would not have been possible without the assistance and support of many people, whom I would like to recognize and thank. First and foremost, I would like to express my utmost gratitude to my advisor, Prof. Hao-Jan Chen, for his expert guidance on my graduate study and continuous support of my thesis. His generous offer of every invaluable resource and suggestion during my thesis writing process greatly encouraged me to carry on. I am honored to have known him for more than six years and to have learned from him about research and life. Without his guidance and support, I would never have been able to finish this thesis and complete my graduate degree.

I also want to express my sincere gratitude to the committee members of my thesis, Dr. Chih-Cheng Lin and Dr. Jason S. Chang. Dr. Lin and Dr. Chang carefully proofread my thesis and provided many insightful comments as well as constructive suggestions on enriching its content. Their invaluable advice and suggestions helped me to improve the quality of the thesis, for which I am truly grateful.

My sincere thanks also go to my school brother and sisters in the TESOL program: Fiona, Ignace, and Lisa. They spent a great deal of time helping me with the data collection and filtering process. Without their generous help, this thesis would not have been possible.

Last but not least, I would like to thank my beloved parents and friends, for they were very supportive throughout the process of thesis writing. Their encouragement and support gave me the confidence to overcome every difficulty during my graduate study years and the conduct of this research, for which I am heartily thankful.

TABLE OF CONTENTS

Chinese Abstract
ABSTRACT
ACKNOWLEDGEMENT
LIST OF TABLES
LIST OF FIGURES

CHAPTER ONE: INTRODUCTION
1.1 Background of the Study
1.2 Motivation of the Study
1.3 Purpose of the Study
1.4 Definition of Key Terms
    1.4.1. Lexical Collocation
    1.4.2. Published Authors
    1.4.3. Overuse/Underuse

CHAPTER TWO: LITERATURE REVIEW
2.1 Previous Attempts to Generate Academic Collocation Lists
    2.1.1. Durrant’s Academic Collocation List
    2.1.2. Ackermann and Chen’s Academic Collocation List
    2.1.3. The Proposed List: A Discipline-specific Verb-Noun and Adjective-Noun Academic Collocation List
2.2 Features of ESL/EFL Learners’ Production of Collocations in Writing
    2.2.1. Advanced Learners’ Use of Lexical Collocations in Argumentative English
    2.2.2. Advanced Learners’ Use of Lexical Collocations in Academic English
    2.2.3. A Re-examination of Advanced Learners’ Overuse/Underuse Behaviors Based on Large-sized Academic Corpora

CHAPTER THREE: METHOD
3.1 Corpora
    3.1.1. The Reference Corpus
    3.1.2. The Learner Corpus
3.2 Instrument
    3.2.1. Sketch Engine
    3.2.2. Log-likelihood Calculator
    3.2.3. Collocation Calculator (for Calculating Association)
3.3 Data Analysis Procedures

CHAPTER FOUR: RESULTS AND DISCUSSION
4.1 Frequent V-/A-N Collocations in the Field of Applied Linguistics
    4.1.1. Frequent V-/A-N Collocations in RAC
    4.1.2. Frequent V-/A-N Collocations in MTC
    4.1.3. Comparisons between Frequent V-/A-N Collocations in the Two Corpora
    4.1.4. Summary of the Findings on Published Authors’ and Taiwanese EFL Learners’ Use of V-N and A-N Collocations
4.2 Underused and Overused V-N and A-N Collocations in Taiwanese EFL Learners’ Writing
    4.2.1. Overview of Taiwanese EFL Learners’ Underuse and Overuse Behaviors
    4.2.2. T-score and MI-value Analysis of the Underused/Overused Collocations
    4.2.3. Recurrent Verb and Adjective Collocates in the Lists of Underused and Overused Collocations
4.3 Discussion on the Use of V-N and A-N Collocations in Academic English
    4.3.1. Discussion on Published Authors’ Use of V-N and A-N Collocations in Academic English
    4.3.2. Discussion on Taiwanese EFL Learners’ Use of V-N and A-N Collocations in Academic English

CHAPTER FIVE: CONCLUSION
5.1 Summary of Major Findings
5.2 Pedagogical Implications
5.3 Limitations of the Current Research
5.4 Suggestions for Further Research

REFERENCES

APPENDICES
Appendix A. Listing of Meaning-Clustered V-N Collocations in RAC
Appendix B. Listing of Meaning-Clustered A-N Collocations in RAC
Appendix C. V-N and A-N Collocations in RAC and MTC
Appendix D. List of Under-/Overused V-N Collocations
Appendix E. List of Under-/Overused A-N Collocations
Appendix F. List of Additional Core Nouns

LIST OF TABLES

Table 2.1. Normed Frequency per Million Words of Example Collocations Selected from Ackermann and Chen (2013)
Table 3.1. Composition of RAC
Table 3.2. Composition of MTC
Table 3.3. Core Nouns Identified in the Present Study (in Alphabetical Order)
Table 4.1. Overview of V-/A-N Collocations in RAC
Table 4.2. Frequent V-N and A-N Collocations Shared in ACL
Table 4.3. Overview of V-/A-N Collocations in MTC
Table 4.4. Comparisons between Average Normalized Frequency and Token-Type Ratio of V-/A-N Collocations in RAC and MTC
Table 4.5. Comparisons between Average Normalized Frequency and Token-Type Ratio of V-/A-N Collocations Occurring Less than 100 Times PMWs in RAC and MTC
Table 4.6. Overview of the Frequency of Overlapped V-/A-N Collocations in RAC and MTC
Table 4.7. Overview of the Frequency of Overlapped V-/A-N Collocations Occurring Less than 100 Times PMWs in RAC and MTC
Table 4.8. Log-likelihood Test on the Frequency of Overlapped V-/A-N Collocations in RAC and MTC
Table 4.9. Overview of the Frequency of Under-/Overused V-/A-N Collocations
Table 4.10. Overview of Underused V-N and A-N Collocations Found at Different Levels of t-score
Table 4.11. Overview of Underused V-N and A-N Collocations Found at Different Levels of MI-value
Table 4.12. Overview of Overused V-N and A-N Collocations Found at Different Levels of t-score
Table 4.13. Overview of Overused V-N and A-N Collocations Found at Different Levels of MI-value
Table 4.14. t-scores and MI-values of ‘Important Difference’ and ‘Significant Difference’
Table 4.15. t-scores and MI-values of the Synonymous A-N Collocation Pair: ‘Early Study’ and ‘Previous Study’
Table 4.16. t-scores and MI-values of the Synonymous V-N Collocation Pair: ‘Confirm/Corroborate Findings’ and ‘Support Findings’
Table 4.17. t-scores and MI-values of the Synonymous V-N Collocation Pair: ‘Undertake Study’ and ‘Conduct Study’
Table 4.18. t-scores and MI-values of the Synonymous A-N Collocation Pair: ‘Further Research’ and ‘Future Research’
Table 4.19. Recurrent Verb Collocates in the Underused/Overused Collocation List
Table 4.20. Recurrent Adjective Collocates in the Underused/Overused Collocation List
Table 4.21. Frequency Data of the Recurrent Verb Collocates and the Constantly Underused V-N Collocations
Table 4.21. Frequency Data of the Recurrent Adjective Collocates and the Constantly Underused V-N Collocations
Table 4.22. Underused Collocations Overlapped in Academic Collocation List

LIST OF FIGURES

Figure 2.1. Overview of Final Academic Collocations in Part-of-Speech Combinations (Source: Ackermann & Chen, 2013)
Figure 3.1. Interface of Word List Search
Figure 3.2. Search Results of Word List in the Output Type of Keyword
Figure 3.3. Interface of Sketch-Diff by Subcorpus
Figure 3.4. The Search Outcome of Sketch-Diff of Difference between Two Corpora
Figure 3.5. The Interface of the Log-likelihood Ratio Calculator
Figure 3.6. The Interface and the Outputs of the Online Collocation Calculator
Figure 3.7. Flow Chart of the Overall Research Procedure
Figure 4.1. Venn Diagram of V-N and A-N Collocations Distribution
Figure 4.2. Regrouped Percentage of Underused V-N and A-N Collocations Found at Different Levels of t-score
Figure 4.3. Regrouped Percentage of Underused V-N and A-N Collocations Found at Different Levels of MI-value
Figure 4.4. Regrouped Percentage of Overused V-N and A-N Collocations Found at Different Levels of t-score
Figure 4.5. Regrouped Percentage of Overused V-N and A-N Collocations Found at Different Levels of MI-value

CHAPTER ONE
INTRODUCTION

1.1 Background of the Study

Written academic discourse is characterized by formal vocabulary and associated grammar that are not frequently seen in other registers (Biber & Conrad, 1999, 2004; Biber, Conrad, & Cortes, 2004; Biber, Conrad, Reppen, Byrd, & Helt, 2002; Coxhead, 2000; Schleppegrell, 2004; Schleppegrell, Achugar, & Orteiza, 2004; Schleppegrell & Colombi, 2002). To achieve fluent communication in this discourse, language learners are often expected to employ this formal vocabulary in their writing (Jones & Haywood, 2004, p. 273). Knowledge of a word, however, involves not only the user’s understanding of its meanings but also many other aspects (Nation, 2001, 2005), such as its forms, grammatical behavior, associations, frequency, and collocations.

The native-like use of collocations has been generally acknowledged by researchers as an indicator of proficient language use (e.g. Barfield & Gyllstad, 2009; Schmitt, 2004; Sinclair, 1991; Wood, 2010; Wray, 2002). It is also suggested that mastery of collocations is an essential requirement for successful English for Academic Purposes (EAP) learners, who are often required to compose information-rich yet structurally complex essays. Howarth (1998) has pointed out that ‘Conforming to the native stylistic norms for a particular register entails not only making appropriate grammatical and lexical choices but also selecting conventional collocations to an appropriate extent’ (p. 186). Ellis, Simpson-Vlach, and Maynard (1998) also argue that writers in the academic discourse community should have a full understanding of collocations because these multi-word expressions are so significant in this community.

Past SLA and TESOL studies, however, have revealed that second/foreign language learners have great difficulty in mastering collocations (e.g. Arnaud & Savignon, 1997; Nesselhauf, 2005; Revier & Henriksen, 2006), and that learners’ productive knowledge of collocations is more deficient than their receptive knowledge (e.g. Koya, 2005; Laufer & Waldman, 2011). Because collocations function as both pragmatic referents and semantic conveyers in a given discourse, failure to use appropriate collocations might lead to misunderstandings and signal a lack of expertise (Henriksen, 2013). In fact, recent research has reported that malformed L2 collocations impose a greater processing burden on native speakers and impede fluent reading (Millar, 2011). Some studies also show that L2 learners underuse some collocations and overuse other favored combinations as compared to native speakers (e.g. Jiang, 2009; Lorenz, 1999). These findings indicate that learners suffer from insufficient knowledge of collocation use, which carries the risk of misconveying messages to their readers.

To better enhance L2 learners’ productive knowledge of collocations, Laufer (2005) suggests that the complex process of learning to employ these items should be pedagogically planned. Coxhead (2008) also argues that the existing wordlists should be extended to include high-frequency collocations for more effective teaching and learning of EAP. With collocation lists, L2 learners can know which items they should acquire first to achieve successful communication in academic writing.

1.2 Motivation of the Study

The urgent need for collocation lists has encouraged some researchers to compile lists of frequent collocations for teaching/learning purposes. Adopting a corpus-based approach, researchers have generated lists of high-frequency collocations for general purposes (e.g. Shin & Nation, 2008) and for academic purposes (Ackermann & Chen, 2013; Durrant, 2009). These listings have offered insights into how collocations are actually used in a particular discourse. There are, however, some limitations of the existing lists that motivate the present study.

The first limitation of the existing academic collocation lists is that their compilers did not take disciplinary preferences in collocation use into consideration, in spite of the fact that research has often emphasized the topic- and genre-specificity of collocations in academic discourse. For example, Ackermann and Chen’s (2013) academic collocation list presents the 2,468 most frequent lexical collocations across five disciplines. While Ackermann and Chen stated that their list is useful for EAP learners in different research fields, they also acknowledged that there are some disciplinary differences in the use of some identified items. For instance, some collocations (e.g. ‘academic writing’ and ‘make explicit’) frequently employed by writers in Humanities are less preferred by writers in other fields. The discipline-specificity of collocation use is also reported in Ward’s (2007) study on the collocations of common nouns in Chemical Engineering textbooks. In this study, Ward reported that the use of “gas +” was the most important feature distinguishing Chemical Engineering from four other engineering disciplines. What these findings imply is that the use of academic collocations is so discipline-specific that a list generated from diverse disciplines might not cater to different EAP learners’ needs.
To better facilitate EAP learners’ collocation use in various subject fields, ‘materials developers, curriculum designers, and teachers need to be aware of the difference between language specific to a particular sub-area of academic study’ (Coxhead & Byrd, 2007, p. 131), and studies should further ‘highlight specific collocations that are exclusively frequent in individual fields of studies’ (Ackermann & Chen, 2013, p. 246). A disappointing reality regarding the existing lists is that there has not been a specialized academic collocation list for a single subject area.

Another reason for conducting the present research is that there is currently no study specifically investigating second/foreign language learners’ overuse/underuse behaviors in terms of collocation use. As mentioned in the previous section, researchers have discovered that English learners’ use of collocations deviates from that of native speakers or published writers. Some of the researchers further pointed out that learners tend to use high-frequency collocations more often than their counterparts. Nevertheless, these studies often reported only general findings regarding the learners’ overuse/underuse behaviors, and none presented the overused/underused collocations in detail. Building upon Laufer’s (2005) suggestion, it is argued here that a ‘planned’ list of commonly overused and underused collocations, resembling the content of existing learner dictionaries, might be more useful for improving EAP learners’ collocational knowledge.

In sum, the findings and suggestions reported in previous research imply that an academic collocation list should be discipline-specific to better match the needs of learners in different academic fields, and that, using the discipline-specific list as the norm, a list of overused/underused collocations employed by learners in that specific field should also be compiled to deepen our understanding of learners’ collocation use.

1.3 Purpose of the Study

The purpose of this study is two-fold. The first aim is to identify high-frequency collocations of common nouns in one academic area, which addresses the lack of a discipline-specific academic collocation list in previous studies. In this study, the academic field of applied linguistics is chosen for investigation. Because EAP learners in this field are often required to produce high-quality English texts in pursuing their academic achievements, the researcher decided to generate a list of common lexical collocations employed in this field to facilitate these learners’ productive knowledge of collocations. As for the target collocation types, only verb and adjective collocates were identified in the present study. Past research has revealed that verb-noun and adjective-noun collocations are the two major lexical collocation types in the genre of academic writing (cf. Ackermann & Chen, 2013). Furthermore, some TESOL and SLA studies have also discovered that second/foreign language learners’ use of these collocations deviates somewhat from that of native/published writers (e.g. Howarth, 1998; Laufer & Waldman, 2011; Li & Schmitt, 2010; Nesselhauf, 2003; Siyanova & Schmitt, 2008). Because these collocations are omnipresent in academic texts and difficult for learners to master, the researcher decided to focus on verb-noun and adjective-noun collocations in the present study. It was expected that this discipline-specific lexical collocation list could serve as a useful reference for learners.

The second purpose of the present study was to carry out a detailed analysis of EAP learners’ collocation use in academic writing. Since researchers have found that learners’ use of verb-noun and adjective-noun collocations differs from the native norms, studies on the same issue should further explore the types of underuse/overuse in learner language. Learners’ overuse/underuse behaviors were analyzed through comparison with the native norms, which might better improve learners’ academic proficiency (Ackermann & Chen, 2013).
As Granger (2011) suggested, a detailed comparison between learners’ and native/experienced writers’ use of EAP words ‘enables us to uncover many differences in terms of frequency of use, meaning, lexico-grammatical patterning, collocational preferences and syntactic positioning’ (p. 141), and instances of overuse/underuse yielded in these corpus-based comparative studies often offer valuable information from a teaching perspective. The present study thus also aims to compare the verb-noun and adjective-noun collocations employed by learners with those identified in the previously proposed academic collocation list. It was expected that this comparison would reveal the collocations frequently overused/underused by learners and provide useful information for both EAP teachers and learners in the field of applied linguistics.

1.4 Definition of Key Terms

1.4.1. Lexical Collocation. Within the phraseological tradition¹ of conceptualizing collocations, Benson, Benson, and Ilson (1986, 1997) developed a framework that categorizes collocations into grammatical collocations and lexical collocations. On the one hand, a lexical collocation consists of two open-class components (i.e. nouns, adjectives, verbs, and adverbs) without any function words. Grammatical collocations, on the other hand, are composed of one open-class word and one closed-class word. In the present study, only lexical collocations structured as noun + verb/adjective were investigated.

1.4.2. Published Authors. In this study, published authors are defined as writers who have published at least one research article in any SSCI-indexed international journal. It should be noted that, in the current study, the native language of these published writers is not necessarily English.

¹ There are two major conceptual underpinnings of collocations. One is the frequency-based tradition, and the other is the phraseological tradition. In the frequency-based tradition, collocations are defined as word pairs consisting of co-occurring words within a certain distance of each other. Researchers in this tradition often analyze collocations based on frequency and statistics (e.g. t-score, MI-value, log-likelihood, etc.). In contrast, the phraseological view of collocations focuses on the ‘fixedness’ of a word combination, and relevant work on collocations is often carried out through syntactic and semantic analysis.
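The two association measures named in the footnote, t-score and MI-value, can be illustrated with a minimal Python sketch. It assumes the commonly used definitions in which the expected co-occurrence frequency is f(node) × f(collocate) / N; the function names and the counts in the usage lines are hypothetical, and the collocation calculator used in this study may apply a span-size adjustment that is omitted here.

```python
import math


def expected_cooccurrence(freq_node, freq_collocate, corpus_size):
    """Co-occurrence count expected if the two words were independent."""
    return freq_node * freq_collocate / corpus_size


def t_score(observed, freq_node, freq_collocate, corpus_size):
    """t-score: (O - E) / sqrt(O). Tends to favor frequent pairs."""
    expected = expected_cooccurrence(freq_node, freq_collocate, corpus_size)
    return (observed - expected) / math.sqrt(observed)


def mi_value(observed, freq_node, freq_collocate, corpus_size):
    """Mutual information: log2(O / E). Tends to favor exclusive pairs."""
    expected = expected_cooccurrence(freq_node, freq_collocate, corpus_size)
    return math.log2(observed / expected)


# Hypothetical counts: a noun and a verb each occur 1,000 times in a
# 1,000,000-word corpus and co-occur 64 times, so E = 1.
print(t_score(64, 1000, 1000, 1_000_000))   # (64 - 1) / 8 = 7.875
print(mi_value(64, 1000, 1000, 1_000_000))  # log2(64) = 6.0
```

Because the t-score grows with raw co-occurrence frequency while the MI-value rewards exclusivity, the two measures can rank the same collocation quite differently.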

1.4.3. Overuse/Underuse. In the present study, the term ‘overuse’ refers to a situation in which a collocation is employed significantly more often by learners than by published writers, whereas the term ‘underuse’ refers to a situation in which a collocation is employed significantly less often by learners than by published writers.
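One way to operationalize ‘significantly more often’ and ‘significantly less often’ is a two-corpus log-likelihood comparison. The sketch below is illustrative only: it assumes the widely used log-likelihood formula for comparing an item’s frequency in two corpora and a critical value of 3.84 (p < 0.05, one degree of freedom). The function names, the threshold, and the frequencies in the usage lines are assumptions, not figures reported in this study.

```python
import math


def log_likelihood(freq_learner, size_learner, freq_reference, size_reference):
    """Log-likelihood (G2) for one item's frequency in two corpora."""
    total_freq = freq_learner + freq_reference
    total_size = size_learner + size_reference
    expected_learner = size_learner * total_freq / total_size
    expected_reference = size_reference * total_freq / total_size
    ll = 0.0
    for observed, expected in (
        (freq_learner, expected_learner),
        (freq_reference, expected_reference),
    ):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


def classify_use(freq_learner, size_learner, freq_reference, size_reference,
                 critical_value=3.84):
    """Label a collocation as overused, underused, or not significantly different."""
    ll = log_likelihood(freq_learner, size_learner, freq_reference, size_reference)
    if ll < critical_value:
        return "no significant difference"
    learner_rate = freq_learner / size_learner
    reference_rate = freq_reference / size_reference
    return "overuse" if learner_rate > reference_rate else "underuse"


# Hypothetical frequencies in two hypothetical 1,000,000-word corpora.
print(classify_use(200, 1_000_000, 100, 1_000_000))  # overuse
print(classify_use(100, 1_000_000, 200, 1_000_000))  # underuse
print(classify_use(10, 1_000_000, 11, 1_000_000))    # no significant difference
```

Comparing relative rather than raw frequencies in the final step keeps the label meaningful when the two corpora differ in size.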

CHAPTER TWO
LITERATURE REVIEW

This chapter reviews past literature on collocations that is highly relevant to the research focus of the present study. The review begins by presenting researchers’ attempts to compile lists of academic collocations, and then argues for the need to generate a discipline-specific collocation list for EAP teaching/learning purposes. The second section of this chapter summarizes the findings of previous research on advanced ESL/EFL learners’ collocation use in writing. The focus then moves on to more recent studies on ESL/EFL student writers’ collocation use in the genre of academic writing, highlighting the importance of further exploring student writers’ overuse/underuse behaviors.

2.1 Previous Attempts to Generate Academic Collocation Lists

The emerging importance of multi-word expressions has aroused researchers’ interest in investigating various types of recurrent word co-occurrence in academic prose based on corpus data (e.g. Biber, Conrad, & Cortes, 2004; Chen & Baker, 2010; Hyland, 2008). Nevertheless, the exploration of academic collocations began comparatively late and has been less robust than the investigation of other types of multi-word units (e.g. lexical bundles). Efforts to compile an academic collocation list for EAP teaching/learning are even scanter. To the researcher’s best knowledge, the systematic compilation of academic collocation lists has been undertaken only recently, by Durrant (2009) and by Ackermann and Chen (2013), both of whom drew on corpus data across different academic disciplines to develop an academic collocation list.

(21) 2.1.1. Durrant’s Academic Collocation List.
Durrant (2009) compiled an academic collocation list based on a 25-million-word corpus, which includes 3,251 research articles from 31 academic schools across five disciplines (i.e. Arts and Humanities, Life Sciences, Science and Engineering, Social-Administrative, and Social-Psychological). To retrieve his list of academic collocations, Durrant first extracted keywords from the different schools, and then identified high-frequency collocations based on these keywords. In his list of high-frequency collocations, over 76% (n=763) of the top 1,000 collocations were grammatical collocations, such as this study and associated with. He claims that frequent grammatical collocations like these are so common in academic English that they should be introduced to English learners. While Durrant argued strongly that many of his grammatical collocations possess great value for English learners, he also admitted that this finding “may be a disappointment to some” and that “such items [grammatical collocations] are not what many teachers have in mind when they think of collocations” (p. 163). Indeed, to many researchers and language teachers, lexical collocations (e.g. ‘conduct a study’ and ‘significant difference’) are usually the more valuable phraseological units worthy of further investigation (e.g. Granger, 1998; Laufer & Waldman, 2011; Nesselhauf, 2003). Combinations of two open-class components are also the main targets included in most collocation dictionaries, such as the Oxford Collocations Dictionary for Students of English and the Macmillan Collocations Dictionary. Ackermann and Chen (2013) also comment that Durrant’s listing “does not provide readily usable materials for EAP teaching and learning” (p. 236), even though his listing might have revealed some patterns (i.e. grammatical collocations) overlooked by researchers.
Although collocations identified in Durrant’s study are considered less usable, the

(22) results of his research still yielded some interesting findings regarding disciplinary differences in writers’ collocation use. In the retrieval of keywords, Durrant already discovered a wide gap between the disciplines in the Arts and Humanities and those in other groups. Among the 26 schools outside the Arts and Humanities, the mean percentage of overlapping keywords was 26%; however, the percentage dropped to 20% when the schools of Arts and Humanities were included. This disciplinary gap was also observed in the total frequency of key collocations for the Arts and Humanities groups. For all the other academic schools, the frequency of key collocations fell within a relatively narrow band of 30,000-35,000 per million words (PMW), whereas the rate for the Arts and Humanities groupings decreased to 17,677 PMW. Based on these findings, Durrant concluded that the vocabulary needs of students in the Arts and Humanities are substantially different from those in other disciplines and “should be treated separately” (p. 165).
2.1.2. Ackermann and Chen’s Academic Collocation List.
The divergence of collocation use across disciplines exists not only in Durrant’s listing of grammatical collocations, but also in Ackermann and Chen’s (2013) listing of academic lexical collocations. Adopting a mixed approach of automated extraction and expert judgment, Ackermann and Chen identified 2,468 lexical collocations of four major types in the 25.6-million-word written curricular component of the Pearson International Corpus of Academic English (PICAE), which consists of journal articles and textbook chapters in the fields of Applied Sciences and Professions, Humanities, Social Science, and Natural/Formal Sciences. Although the researchers’ original aim was to generate an academic collocation list for all EAP students, some example items presented in their study still suggested disciplinary differences in the use of collocations across fields.
For instance, the

(23) researchers observed that, in the field of Humanities, some combinations (e.g. ‘academic writing’ and ‘make explicit’) appeared more frequently than in the other three fields, whereas others (e.g. ‘economic growth’ and ‘adversely affect’) occurred less often in the field of Humanities (see Table 2.1 for more examples). The uneven frequency of these items again demonstrated that the use of academic collocations in one academic field might differ drastically from that in others.

Table 2.1. Normed Frequency per Million Words of Example Collocations Selected from Ackermann and Chen (2013).
(AS = Applied Sciences and Professions; HM = Humanities; SS = Social Science; NS = Natural/Formal Sciences.)
Collocation | Normed PMWs | Normed AS | Normed HM | Normed SS | Normed NS
Frequent Collocations in HM
academic writing | 16.74 | 5.28 | 68.12 | 1.81 | 0.20
crucial role | 4.89 | 3.43 | 7.55 | 5.42 | 3.82
native speaker | 5.56 | 3.00 | 18.65 | 2.17 | 0.40
social status | 8.39 | 3.14 | 17.82 | 13.19 | 0.60
source material | 3.63 | 0.71 | 11.32 | 2.17 | 2.01
make explicit | 11.71 | 7.85 | 21.17 | 14.99 | 4.42
radically different | 4.67 | 1.71 | 9.01 | 6.14 | 3.02
Infrequent Collocations in HM
economic growth | 15.21 | 27.70 | 2.93 | 11.92 | 13.07
environmental factors | 8.26 | 10.14 | 0.84 | 12.10 | 8.45
key factor | 7.58 | 10.56 | 1.47 | 9.21 | 7.44
adversely affect | 4.35 | 7.00 | 0.63 | 3.97 | 4.63

In addition to disciplinary differences, Ackermann and Chen’s (2013) listing also revealed the preferred use of noun-related combinations in the written academic register. As illustrated in Figure 2.1, noun combinations (i.e. adjective-noun and noun-noun) formed the largest group of lexical collocations, accounting for 74.3% (n=1,835) of the 2,468 entries. The second largest group was verb combinations with nouns or adjectives as complements (13.8%, n=340). The remaining two combination types were

(24) verb-adverb combinations (6.9%, n=170) and adverb-adjective combinations (5.0%, n=124). The researchers suggested that the dominance of nominal combinations in their list reflects the feature of nominalization in academic texts, and proposed that the strong tendency of nouns to collocate with other word classes requires further investigation.

Figure 2.1. Overview of Final Academic Collocations in Part-of-Speech Combinations (Source: Ackermann & Chen, 2013).

2.1.3. The Proposed List: A Discipline-specific Verb-Noun and Adjective-Noun Academic Collocation List.
Both Durrant’s (2009) and Ackermann and Chen’s (2013) studies listed some useful collocations for EAP teaching and learning; however, it is questionable whether the two lists are indeed suitable for all students. As mentioned in the previous section, the majority of Durrant’s listing consisted of grammatical collocations, whose teaching and learning value is often considered lower than that of lexical collocations. For better EAP teaching and learning, the compilation of lexical collocations should take priority over that of grammatical collocations. In addition, both lists demonstrated that

(25) different academic fields prefer different combinations, and that this disciplinary difference is consistently observable in the field of Arts and Humanities. This discovery implies that, at least for students in the Humanities, a collocation list generated from texts produced by writers in this field will benefit them more than a list consisting of items retrieved across different disciplines. The present study thus proposed to generate a discipline-specific verb-noun and adjective-noun collocation list for students in the field of applied linguistics. The reason for choosing applied linguistics as the target discipline is that, for many students in this field, producing academic articles in English is a common practice in their academic life. An academic collocation list is thus necessary to enhance their English writing ability. In addition, (applied) linguistics belongs to the sub-category of Humanities in both Durrant (2009) and Ackermann and Chen (2013), a category that was found to be highly divergent from other disciplines in terms of collocation use. The field of applied linguistics is thus a promising area for further investigation. Regarding the types of lexical combination investigated, the researcher targeted verb-noun and adjective-noun combinations in the present study. In Ackermann and Chen’s (2013) study, the noun combination group, including adjective-noun and noun-noun combinations, formed a high proportion of all combinations, whereas the verb combination group was the second largest. However, a closer examination of the proportion of each individual type revealed that, while adjective-noun combinations still made up the largest proportion of all types (71.8%, n=1,773), noun-noun combinations formed only a small proportion (2.5%, n=62). Instead, verb-noun combinations accounted for 12.6% (n=310), ranking as the second largest type of all.
Together, adjective-noun and verb-noun combinations accounted for a notably high proportion (84.4%) of all entries. It seems that verb-noun and adjective-noun

(26) combinations serve as the core collocation types in academic writing. In addition, past research on ESL/EFL learners’ collocation use also suggests that these two collocation types are difficult for learners to master (see Section 2.2 for detailed discussion). The researcher thus decided to investigate only adjective-noun and verb-noun combinations in the present study, examining how adjective-noun and verb-noun collocations are employed in the field of applied linguistics.
2.2 Features of ESL/EFL Learners’ Production of Collocations in Writing
The importance of collocation in successful language learning has aroused many researchers’ interest in investigating second/foreign language learners’ productive knowledge of these two-word combinations. The results of these studies all show that both second and foreign language learners’ productive use of collocations is different from native speakers’. For example, compared to the native norm, learners were found to employ fewer collocations in their writing (Laufer & Waldman, 2011). In addition, learners’ use of collocations is somewhat restricted to a limited range of combinations (Fan, 2009; Siyanova & Schmitt, 2008). These ‘favored’, ‘overused’ phrases are often found to be frequent items or items congruent with L1 forms (De Cock et al., 1998; Durrant & Schmitt, 2009; Granger, 1998; Laufer & Waldman, 2011; Li & Schmitt, 2010; Lorenz, 1999; Nesselhauf, 2003, 2005). Conversely, items that are dissimilar to L1 forms or that have lower frequency tend to be underused by learners. Even for advanced learners with many years of English instruction, these features can still be observed in their writing. Since the present study sets out to investigate published authors’ and student writers’ use of lexical collocations in the genre of academic writing, the following subsections will review studies on advanced (i.e. at the university/postgraduate level)

(27) ESL/EFL learners’ employment of lexical collocations in more formal types of text, namely, argumentative/examination essays and research papers.
2.2.1. Advanced Learners’ Use of Lexical Collocations in Argumentative English.
Several investigations have been conducted to examine advanced ESL/EFL learners’ employment of adverb-adjective collocations (e.g. Granger, 1998; Lorenz, 1999), adjective-noun collocations (e.g. Siyanova & Schmitt, 2008), and verb-noun collocations (e.g. Nesselhauf, 2003; Laufer & Waldman, 2011) in argumentative essays, and the findings of these studies reveal learners’ difficulties in using these lexical collocations appropriately. In Granger’s (1998) study, for instance, advanced French-speaking learners were reported to exhibit both underuse and overuse of English -ly intensifier + adjective collocations. Extracting -ly intensifier + adjective collocations from the 251,000-word French component of ICLE, Granger discovered that learners used ‘booster’ intensifiers (e.g. deeply, strongly, and highly) less frequently than native English students did. In contrast, learners were found to overuse completely and totally compared with the native norm. The researcher concluded that the advanced learners tended to underuse native-like expressions but overuse some L1-congruent combinations which were unconventional in the target language. In addition to Granger, Lorenz (1999) also investigated lexical collocations of intensifier + adjective in advanced German-speaking learners’ argumentative essays. Adopting a frequency-based statistical approach, Lorenz calculated association measures of collocations (i.e. t-score and MI-value²) and type-token ratios (TTRs) to compare

2. Both MI-value and t-score are frequency-based methods to determine the collocational strength of word combinations.
On the one hand, collocations with high t-scores are combinations frequently found in the given texts, and a ranking based on t-score is very similar to a ranking based on raw frequency. Collocations with high MI-values, on the other hand, might be less common in terms of their raw

(28) learners’ use of these collocations with native British students’. The TTR analysis suggested that the learners’ repertoires of collocations were similar to the natives’, yet they still overused some high-frequency collocations (i.e. combinations with high t-scores and/or MI-values) in their writing. Also adopting a statistics-based approach, Siyanova and Schmitt (2008) compared advanced Russian-speaking learners’ employment of adjective-noun collocations with native university students’. The researchers first extracted all adjective-noun collocations in the 24,000-word Russian subcorpus of CLEC and the 25,000-word LOCNESS, and consulted the frequency and MI-values of these collocations in the BNC. They reported that only 45 percent of the learners’ collocation use was appropriate; in other words, more than half of the collocations were unconventional compared with the native norm. The results of their study thus indicated the learners’ lack of native-like collocational knowledge. Targeting verb-noun collocations (e.g. take a break), Nesselhauf (2003) examined advanced German-speaking learners’ use of these collocations from a phraseological approach. She classified word pairs into groups of free combination (e.g. want a car), collocation (e.g. take a picture), and idiom (e.g. sweeten the pill), and investigated which group was more problematic for the learners. Her findings revealed that, among the three groups of word combination, the error rate for the collocation group was the highest (79%), while the rates for the free combination group and the idiom group were distinctly lower (23% for each). The researcher further suggested that the learners’ L1 was the most influential factor contributing to their erroneous production of verb-noun collocations. Advanced learners’ difficulty in using verb-noun collocations was also found in

frequency, but the components of these collocations are not often found apart (Stubbs, 1995).

(29) Laufer and Waldman (2011). The researchers compared collocations employed by learners of English who were native speakers of Hebrew with those employed by native speakers of English. Their research showed that learners at all three proficiency levels (i.e. basic, intermediate, and advanced) used significantly fewer collocations than natives did. While the advanced learners increased the number of collocations in their writing, they still produced a number of miscollocations, particularly interlingual ones, to a similar extent as learners at the lower levels did. Although the reviewed studies diverge in the collocation types investigated and the analytic measures adopted, they all indicate that employing lexical collocations appropriately is a difficult task for advanced ESL/EFL learners, regardless of their L1s. Some studies reveal learners’ misuse of collocations, and some suggest learners’ tendency to overuse certain collocations but underuse others. In sum, these studies present the discrepancy between native English speakers’ and learners’ use of lexical collocations in argumentative essays.
2.2.2. Advanced Learners’ Use of Lexical Collocations in Academic English.
Section 2.2.1 reviewed studies on advanced ESL/EFL learners’ use of adverb-adjective, adjective-noun, and verb-noun collocations in argumentative essays. The results of these studies have demonstrated that learners’ use of these collocations deviates from native speakers’ in terms of the number and types of collocations employed and in terms of error-free use. It should be noted that the discrepancy between ESL/EFL learners’ and native speakers’ collocation use is not limited to the genre of argumentation. There are, in fact, some studies illustrating a similar divergence between learners’ and native speakers’ collocation use in academic writing. Adopting a phraseological approach, Howarth (1998) conducted a comparative study on native speakers’ and non-native speakers’ use of restricted verb-noun

(30) collocations (e.g. ‘draw a comparison’ and ‘perform a task’) in academic writing. Comparing collocations identified in a 25,000-word learner corpus of non-native postgraduate essays with the norms established in a 240,000-word corpus of British academic writers, Howarth reported that postgraduates produced fewer (around 50%) restricted verb-noun collocations than natives did. In addition to this quantitative discrepancy, a qualitative difference was also revealed by the fact that around six percent of learners’ collocation use was unconventional (e.g. *perform a project, *pay effort/care, *reach findings, *draw a correlation). Based on these findings, Howarth concluded that the lower density of restricted collocations and the instances of awkward collocational expression might suggest learners’ limited knowledge of verb-noun collocations as well as low awareness of appropriate verb-noun collocation use. While Howarth’s (1998) study revealed learners’ difficulty in using verb-noun collocations in academic English, Durrant and Schmitt’s (2009) and Li and Schmitt’s (2010) studies revealed that adjective-noun collocations were also problematic for advanced non-native learners. Using MI-values and t-scores for analysis, Durrant and Schmitt (2009) undertook a comparative study on the use of adjective-noun and noun-noun collocations in native and non-native speakers’ academic writing. In their study, Durrant and Schmitt collected texts of different lengths (i.e. long and short) produced by native speakers and ESL/EFL learners. The native corpus, containing approximately 94,000 words, was composed of long texts from Prospect and academic journals as well as short texts from opinion articles in newspapers and LOCNESS. The 87,000-word non-native corpus contained long texts from a British EAP project and a Turkish EAP project as well as short texts from a British short essay pool and the Bulgarian subcorpus of ICLE.
The researchers first extracted all adjective-noun and noun-noun combinations

(31) from the two corpora, and excluded combinations containing unwarranted components (i.e. proper nouns, acronyms, pronouns, possessives, semi-determiners, and numbers/ordinals) or occurring fewer than five times in the BNC. Collocations passing the two filtering stages were then measured with MI-value and t-score, and those failing to reach the cutoff points (i.e. MI-value of at least 3 and t-score of at least 2) were excluded from further comparison and discussion. Comparing adjective-noun and noun-noun collocations identified in the non-native corpus with those in the native corpus, Durrant and Schmitt discovered that non-native speakers’ use of low-frequency collocations (i.e. appearing fewer than five times in the BNC) was distinguishably different from native speakers’ in long texts. The native speakers’ rate of using low-frequency collocations was 48%, whereas the rate for non-native speakers dropped to 38%. In addition to low-frequency collocations, discrepancies between natives and non-natives were also revealed in both the t-score analysis and the MI-value analysis. In the t-score analysis, non-native speakers were reported to employ collocations with very high t-scores (t-scores > 10) significantly more often than native speakers. However, the reverse pattern was found in the MI-value analysis. Non-native speakers were reported to rely on strong collocations to a lesser extent than natives did, and to underuse collocations with MI-values greater than seven. The results of Durrant and Schmitt’s research suggest that, compared to the native norm, non-native speakers tend to overuse high-frequency items but underuse high-MI ones. A similar pattern of learners’ overuse/underuse behaviors was also reported in Li and Schmitt (2010). Adopting the same types of frequency-based association measures, Li and Schmitt investigated four Chinese postgraduates’ development of collocation use over a period of one academic year (i.e. three terms).
They collected eight essays and one dissertation from each participant to build a learner corpus. To examine these

(32) learners’ collocation use over time, the researchers first extracted all adjective-noun combinations from the 150,000-word learner corpus. Extracted combinations were then removed from analysis if they contained components such as hyphenated adjectives, pronouns, or possessives. The remaining collocations were further divided into frequent and infrequent combinations, and infrequent collocations (i.e. appearing fewer than four times in the BAWC) were excluded from further statistical measures (i.e. MI-value and t-score). Li and Schmitt’s study identified 494 tokens of adjective-noun collocation employed by the four Chinese postgraduates, which comprised 299 types. Of these 299 types, 41.1% were considered frequent and strongly-associated collocations (MI-value > 3, t-score > 2), and 58.9% were considered infrequent combinations. In other words, the learners’ use of robust (i.e. frequent and strongly-associated) types was proportionally similar to that of rarely-occurring types. In terms of tokens, however, the learners showed a tendency to employ robust collocations more often than infrequent ones, as demonstrated by the sharp contrast between the low TTR for robust collocations (0.43) and the high TTR for infrequent ones (0.90). In addition, the researchers reported the learners’ tendency to employ several types of collocations repetitively and their modest increase in using collocations with high t-scores during the academic year. Regarding collocations with high MI-values, however, no increase was found in the learners’ use of these items. In terms of learners’ overuse and underuse behaviors, the findings of Li and Schmitt echo those of Durrant and Schmitt (2009); that is, less proficient non-native writers are inclined to use collocations with high t-scores but relatively low MI-values, whereas proficient/native writers demonstrate the reverse preference.
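The association measures and cutoffs discussed in this section can be made concrete with a short sketch. It uses the standard textbook formulas, in which the expected frequency of a pair is E = f1 × f2 / N, MI = log2(O / E), and t = (O − E) / √O. Particular tools may adjust E for the collocation window, so this is an assumption rather than the exact computation of any reviewed study. The function names and example counts are hypothetical; the 494 tokens and 299 types are the figures reported for Li and Schmitt (2010).

```python
import math


def expected_frequency(freq_word1, freq_word2, corpus_size):
    """Expected co-occurrence count if the two words were independent."""
    return freq_word1 * freq_word2 / corpus_size


def mi_value(observed, freq_word1, freq_word2, corpus_size):
    """Mutual information: log2 of observed over expected co-occurrence."""
    expected = expected_frequency(freq_word1, freq_word2, corpus_size)
    return math.log2(observed / expected)


def t_score(observed, freq_word1, freq_word2, corpus_size):
    """t-score: (observed - expected) scaled by the square root of observed."""
    expected = expected_frequency(freq_word1, freq_word2, corpus_size)
    return (observed - expected) / math.sqrt(observed)


def is_robust(observed, freq_word1, freq_word2, corpus_size,
              mi_cutoff=3.0, t_cutoff=2.0):
    """Apply the MI >= 3 and t >= 2 cutoffs described for Durrant and Schmitt (2009)."""
    return (
        mi_value(observed, freq_word1, freq_word2, corpus_size) >= mi_cutoff
        and t_score(observed, freq_word1, freq_word2, corpus_size) >= t_cutoff
    )


def type_token_ratio(types, tokens):
    """Number of distinct collocations divided by total collocation occurrences."""
    return types / tokens


# Hypothetical pair: 50 co-occurrences, word frequencies 1,000 and 2,000,
# in a 10-million-word corpus.
mi = mi_value(50, 1_000, 2_000, 10_000_000)
t = t_score(50, 1_000, 2_000, 10_000_000)

# Overall TTR for the 299 types and 494 tokens reported by Li and Schmitt.
overall_ttr = type_token_ratio(299, 494)
```

Because t grows with the observed frequency while MI rewards pairs whose components seldom occur apart, a writer who relies on very frequent pairs will score high on t but not necessarily on MI, which is the contrast the reviewed studies draw.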

(33) 2.2.3. A Re-examination of Advanced Learners’ Overuse/Underuse Behaviors Based on Large-sized Academic Corpora.
The studies reviewed in Section 2.2.1 and Section 2.2.2 all point to the fact that advanced learners have great difficulty in using lexical collocations appropriately in their writing. Compared to the native/published writers’ norm, learners are reported to produce fewer collocations, to produce some unconventional combinations, or to overuse high-frequency collocations. Regarding the issue of overuse and underuse, learners are found to exhibit a tendency to overuse high t-score collocations but underuse those with high MI-values, whereas native/published writers demonstrate the reverse preference. This phenomenon is consistently observed in the production of academic collocations in learners’ and native/published writers’ writing. While the results of these studies seemingly lead to a general conclusion regarding advanced learners’ collocation use, the studies have some limitations. One of the major limitations is the small size of the corpora investigated. As can be seen from the previous two sections, the largest learner corpus investigated in previous research contained only approximately 250,000 words in total, and many other investigated corpora were far smaller. It is doubtful whether findings generated from these small corpora can really represent the features of advanced learners’ use of lexical collocations. In addition to the small corpus size, none of these studies presents a list of the frequently overused/underused collocations. While readers are informed about the learners’ preference for highly frequent collocations and their avoidance of strongly associated ones, they still do not know ‘which’ collocations are frequently overused/underused by learners.
The researcher thus argues for the need to generate a list of commonly overused/underused academic collocations in order to offer valuable information for EAP teaching and learning.

(34) To better facilitate learners’ knowledge of academic collocations, the present study aimed to generate two lists of frequent collocations. The first list, as discussed in Section 2.1.3, was a discipline-specific verb-noun and adjective-noun academic collocation list, which can serve as a reference for EAP teachers and learners. Using the first list as the norm, the second list contained verb-noun and adjective-noun collocations overused/underused by Taiwanese EFL learners. It is argued that the compilation of a list of frequently overused/underused collocations can improve our understanding of these learners’ use of these lexical collocations. Four research questions were thus raised in the present study for generating the two lists:

Research Questions
1. What are the frequent verb-noun (V-N) and adjective-noun (A-N) collocations in published authors’ writing in the field of applied linguistics?
2. What are the frequent verb-noun (V-N) and adjective-noun (A-N) collocations in Taiwanese EFL learners’ writing in the field of applied linguistics?
3. To what extent does Taiwanese student writers’ use of these frequent V-N and A-N collocations differ from that of the published authors?
4. What are the V-N and A-N collocations overused/underused by Taiwanese student writers in the field of applied linguistics?

(35) CHAPTER THREE
METHOD
The current study is a corpus-based investigation of published authors’ and Taiwanese EFL learners’ usage patterns of collocations in academic writing. The first section of this chapter introduces the self-compiled reference corpus and learner corpus investigated in the present study. The instruments employed to examine the use of V-/A-N collocations in the two corpora are introduced in the second section. Finally, the procedure for extracting frequent collocations and identifying over-/underused items is explained in the last section.
3.1 Corpora
3.1.1. The Reference Corpus.
A reference corpus of research articles was compiled to investigate high-frequency collocations of nouns in the field of applied linguistics. Research articles were chosen because of their importance in academic writing. This text type has long been considered the preferred genre through which academic communities communicate (Williams, 1998); in other words, its language legitimates the way people in these communities should talk (Peacock, 2012). In addition, research articles often serve as a model of good writing for students to emulate (Hyland, 2008). Since research articles function as the key medium in academia and as the writing model for students, collocations identified in this text type should offer high teaching and learning value to L2 learners. Because the present study aimed to compare a learner corpus of Taiwanese EFL learners’ theses with the research article corpus, and because the majority of the theses were empirical studies in the Introduction-Method-Results-Discussion (IMRD) format, review articles were excluded from the reference corpus. Research articles in the IMRD

(36) format were selected from 15 international journals of applied linguistics. These journals were all SSCI-indexed and peer-reviewed, and accepted only articles written in (well-written and proofread) English. From each journal, 100 articles were randomly chosen, and the total number of texts was 1,500. The size of this research article corpus was close to 12 million running words, which should be adequate to generate a list of frequent V-/A-N collocations in the field of applied linguistics. Table 3.1 below reports the composition of the Research Article Corpus (abbreviated as RAC throughout this thesis).

Table 3.1. Composition of RAC.
Journals | No. of Texts | No. of Running Words | Average Words per Text
Applied Linguistics | 100 | 771,486 | 7,714.86
Computer Assisted Language Learning | 100 | 688,845 | 6,888.45
English for Specific Purposes | 100 | 750,373 | 7,503.73
Journal of English for Academic Purposes | 100 | 717,603 | 7,176.03
Journal of Second Language Writing | 100 | 797,827 | 7,978.27
Language Learning | 100 | 1,005,547 | 10,055.47
Language Learning & Technology | 100 | 796,604 | 7,966.04
Language Teaching Research | 100 | 769,499 | 7,694.99
Language Testing | 100 | 695,279 | 6,952.79
ReCALL | 100 | 692,535 | 6,925.35
Second Language Research | 100 | 899,428 | 8,994.28
Studies in Second Language Acquisition | 100 | 1,010,644 | 10,106.44
System | 100 | 618,088 | 6,180.88
TESOL Quarterly | 100 | 727,089 | 7,270.89
The Modern Language Journal | 100 | 917,130 | 9,171.30
Total | 1,500 | 11,857,977 | 7,905.32
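As a quick consistency check on Table 3.1, the per-journal word counts can be summed and the average recomputed as total running words divided by total texts. The sketch below copies the figures from the table; the variable names are hypothetical.

```python
# (number of texts, number of running words) per journal, copied from Table 3.1.
rac_composition = {
    "Applied Linguistics": (100, 771_486),
    "Computer Assisted Language Learning": (100, 688_845),
    "English for Specific Purposes": (100, 750_373),
    "Journal of English for Academic Purposes": (100, 717_603),
    "Journal of Second Language Writing": (100, 797_827),
    "Language Learning": (100, 1_005_547),
    "Language Learning & Technology": (100, 796_604),
    "Language Teaching Research": (100, 769_499),
    "Language Testing": (100, 695_279),
    "ReCALL": (100, 692_535),
    "Second Language Research": (100, 899_428),
    "Studies in Second Language Acquisition": (100, 1_010_644),
    "System": (100, 618_088),
    "TESOL Quarterly": (100, 727_089),
    "The Modern Language Journal": (100, 917_130),
}

total_texts = sum(texts for texts, _ in rac_composition.values())
total_words = sum(words for _, words in rac_composition.values())
average_words_per_text = total_words / total_texts

print(total_texts, total_words, round(average_words_per_text, 2))
```

The recomputed totals agree with the Total row of the table (1,500 texts, 11,857,977 running words, 7,905.32 words per text on average).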

(37) 3.1.2. The Learner Corpus.
A learner corpus of Taiwanese graduate students’ theses on applied linguistics/English language teaching was compiled to reveal these EFL learners’ use of V-/A-N collocations in academic English. The master’s theses included in this learner corpus were selected from 10 graduate programs in applied linguistics/English language teaching at different national universities. These programs were targeted because of the reasonable quality of their students’ English writing. An examination of the courses offered by these graduate programs revealed that their students are all required to receive formal training in advanced/academic English writing, which suggests that theses produced by these students might be more polished and worthy of further investigation. Different programs were chosen in order to collect texts that could represent the overall picture of thesis writing by Taiwanese graduate students in the field of applied linguistics. From each university, theses produced during 2003-2012 were included in the learner corpus, but theses in the form of meta-analysis or written by non-Taiwanese students were removed to ensure the homogeneity of the learner corpus in terms of thesis format (i.e. IMRD) and learners’ first language (i.e. Chinese). The size of this learner corpus was also approximately 12 million words, so that the learner corpus was comparable to RAC in terms of corpus size. Table 3.2 shows the composition of the Master’s Thesis Corpus (i.e. the universities that the 10 programs belong to, the number of texts, the number of running words, and the average words per text).

(38) Table 3.2. Composition of MTC.
Universities | No. of Texts | No. of Running Words | Average Words per Text
National Changhua University of Education | 84 | 2,077,235 | 24,728.99
National Chiao Tung University | 28 | 671,611 | 23,986.11
National Chiayi University | 12 | 290,496 | 24,208.00
National Chung Chen University | 68 | 1,709,833 | 25,144.60
National Chung Chi University | 14 | 359,937 | 25,709.79
National Kaohsiung Normal University | 128 | 2,911,090 | 22,742.89
National Pingtung University of Education | 28 | 631,985 | 22,570.89
National Taiwan Normal University | 60 | 1,474,036 | 24,567.27
National Taiwan University of Science and Technology | 26 | 577,760 | 22,221.54
National Tsing Hua University | 46 | 1,019,019 | 22,152.59
Total | 494 | 11,723,002 | 23,730.77

All of the texts in the two corpora were downloaded online. The research articles were obtained either from online electronic databases or from the journal websites, while the Taiwanese master’s theses were retrieved from the Electronic Theses and Dissertations System of the National Central Library or from the online libraries of the universities. The tables, figures, excerpts, transcripts, quotations of more than 40 words, acknowledgments, foot-/endnotes, and reference lists were all manually removed from the downloaded texts to ensure that any confounding factors were excluded from the data.
3.2 Instrument
3.2.1. Sketch Engine.
In the present study, the commercial online platform Sketch Engine (SKE) was utilized to extract collocations from the two corpora. The SKE was developed by Kilgarriff and his colleagues to automatically present a one-page summary of a word’s

(39) grammatical and collocational features based on corpus data (Kilgarriff et al, 2004). The advantage of applying the SKE in generating lists of common collocations in the two corpora is its clear demonstration of categorizing collocations into different lexical/grammatical collocation types. The traditional manner of collocation extraction usually grumbles all the types of lexical collocation in one list, and it often consumes researchers much time to identify specific type(s) of lexical collocation in such a long list. The SKE, however, can systematically demonstrate a set of up to 27 grammatical relations connected to a headword (Kilgarriff et al, 2004), which allows the researcher to quickly identify V-/A-N collocations from the two 12-million-word corpora. Another benefit of the SKE is its multi-functionality. In addition to the function of concordancing, the SKE also includes corpus creating, word list, word sketch references, thesaurus search, sketch differences, and many other practical uses. This one-for-all tool allowed the researcher to identify both a list of key nouns and lists of the common collocations in published authors’ and student writers’ writing in the same platform. Because the SKE offers multiple functions for the researcher to efficiently explore the collocation patterns in the two corpora, this online platform was chosen as one of the main instruments in the present study. Among the various functions of the SKE, the Corpus Creating, Word List, and Sketch-Diff functions were utilized in the present study. The Corpus Creating function allowed the researcher to upload the two self-compiled 12-million-word corpora onto the platform for further alternative analyses by applying the tools on the SKE website. The Word List function, in addition to generating a wordlist of a corpus, can also identify a list of keywords (i.e. 
words whose frequency of occurrences are salient in one corpus but not in the other) by comparing two wordlists of two different corpora (see Figure 3.1, the interface of Word List search). 27.
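To illustrate how keywords can be identified by comparing two wordlists, the sketch below ranks words by a simple keyness score: the ratio of smoothed per-million frequencies in a focus corpus and a reference corpus. The scoring formula, the smoothing constant, the function names, and all counts are illustrative assumptions; the text does not state which keyness statistic the Word List function applied in this study.

```python
def per_million(count, corpus_size):
    """Normalise a raw count to a frequency per million words."""
    return count * 1_000_000 / corpus_size


def keyness(focus_count, focus_size, ref_count, ref_size, smoothing=1.0):
    """Ratio of smoothed per-million frequencies (focus / reference).

    The smoothing constant avoids division by zero for words that are
    absent from the reference corpus. Its value here is an assumption.
    """
    focus_pm = per_million(focus_count, focus_size)
    ref_pm = per_million(ref_count, ref_size)
    return (focus_pm + smoothing) / (ref_pm + smoothing)


def rank_keywords(focus_counts, focus_size, ref_counts, ref_size, top_n=3):
    """Return the top_n words of the focus corpus, highest keyness first."""
    scored = {
        word: keyness(count, focus_size, ref_counts.get(word, 0), ref_size)
        for word, count in focus_counts.items()
    }
    return sorted(scored, key=scored.get, reverse=True)[:top_n]


# Hypothetical toy counts, not taken from the study's corpora.
focus_counts = {"learner": 24_000, "proficiency": 9_600, "the": 720_000}
focus_size = 12_000_000    # roughly the size of each self-compiled corpus
ref_counts = {"learner": 1_500, "proficiency": 500, "the": 6_000_000}
ref_size = 100_000_000     # roughly the size of a general reference corpus

print(rank_keywords(focus_counts, focus_size, ref_counts, ref_size))
```

With these toy counts, a function word such as "the" has the same relative frequency in both corpora and therefore scores about 1, whereas field-specific words score far higher and rise to the top of the list.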

Figure 3.1. Interface of Word List Search.

The default reference corpus for retrieving keywords in the present study was the British National Corpus (BNC). The BNC was chosen as the reference for comparison because of its large size (100 million words) and its representativeness of general English, which make it a good baseline for identifying core nouns in the two corpora. By comparing the frequency of occurrence of all the words in a corpus with those in the BNC, a keyword list could thus be generated.

Keywords identified in a corpus, however, contained not only nouns but also other content words and function words. To solve this problem, the search attribute was set to ‘lempos’. As demonstrated in Figure 3.2 below, the outcome of the lempos search presented the keywords in the form of ‘lemma-pos’ (i.e. a lemma with its part of speech specified), which allowed the researcher to further retrieve key nouns (in the form of ‘lemma-n’, such as ‘proficiency-n’ and ‘learner-n’) from the two corpora. With this function, the researcher could easily identify key nouns in the two self-compiled corpora.

Figure 3.2. Search Results of Word List in the Output Type of Keyword.

As for the Sketch-Diff function, it was originally developed to display the collocational differences between two near-synonyms (i.e. which collocates tend to co-occur with one synonym but not the other). In the present study, however, this function was adopted to retrieve the verb/adjective collocates of the identified key nouns from RAC and MTC. Figure 3.3 below presents the interface of Sketch-Diff by subcorpus.
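The retrieval of key nouns from lempos output, as described for the Word List function, can be sketched as a simple filter. The ‘lemma-n’ format follows the examples given in the text; the function name, the non-noun tags, and the sample keyword list are hypothetical and serve only as illustration.

```python
def key_nouns(lempos_keywords):
    """Keep only items tagged as nouns ('lemma-n') and return their lemmas."""
    nouns = []
    for item in lempos_keywords:
        # Split on the LAST hyphen so hyphenated lemmas stay intact.
        lemma, separator, pos = item.rpartition("-")
        if separator and pos == "n":
            nouns.append(lemma)
    return nouns


# Hypothetical keyword list; tags other than '-n' are illustrative only.
keywords = ["proficiency-n", "learner-n", "investigate-v", "significant-j"]
print(key_nouns(keywords))
```

Splitting on the last hyphen rather than the first is a deliberate choice here, so that a hyphenated lemma is not mistaken for a lemma plus tag.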

Figure 3.3. Interface of Sketch-Diff by Subcorpus.

A summary chart of the collocates of ‘difference’ in different part-of-speech positions was displayed in separate blocks, as shown in Figure 3.4 below.

Figure 3.4. The Search Outcome of Sketch-Diff of Difference between Two Corpora.

In each block, collocates were highlighted in three distinct colors, namely green, white, and red. Green highlighted collocates that tended to co-occur with the headword in RAC; red, on the contrary, marked collocates that were more likely to co-occur with it in MTC. Items in the white area were those that appeared about equally often in the two corpora. For instance, in the block object_of, it can be seen that the published authors often collocated the verb ‘observe’ (in the green area) with ‘difference’ to form the V-N collocation ‘observe a difference’, whereas the Taiwanese EFL learners never constructed this V-N collocation in their writing. In contrast, the verb ‘investigate’ (in the red area) co-occurred with ‘difference’ 90 times in the Taiwanese EFL learners’ writing, while this combination did not occur in the published authors’ writing. In other words, items in the green area were potentially underused collocates, while those in the red area were more likely overused.

It should be noted that mis-tagging of words could sometimes occur, which might distort the comparison results through miscounted frequencies. Manual examination of the concordance lines of each potential collocate was thus required after the semi-automated extraction of collocations in the two corpora. Since the present study targeted V-/A-N collocations, only the blocks titled object_of (i.e., the headword as the object of a verb) and modifier (i.e., collocates as the modifiers of the headword) were examined in the current study.

3.2.2. Log-likelihood Calculator

One of the main purposes of the current study was to explore the difference(s) in V-/A-N collocation use between published authors and Taiwanese EFL learners. To achieve this purpose, the log-likelihood test was adopted to determine to what extent the two writer groups’ collocation use differed.
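As a rough illustration of this kind of comparison, the sketch below computes the two-corpus log-likelihood statistic commonly used in corpus linguistics, together with a significance level and a direction symbol. It is a minimal sketch and not the implementation of any particular calculator: the function names, the RAC size of 12,000,000 words, and the two collocation frequencies are assumptions made for the example, while the MTC size is taken from Table 3.2.

```python
import math


def log_likelihood(freq_a, freq_b, size_a, size_b):
    """Two-corpus log-likelihood for one item.

    freq_a, freq_b: raw frequencies of the item in corpus A and corpus B.
    size_a, size_b: total running words in corpus A and corpus B.
    """
    total = freq_a + freq_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    ll = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


def p_value(ll):
    """Significance level from the chi-square distribution with 1 df."""
    return math.erfc(math.sqrt(max(ll, 0.0) / 2))


def direction(freq_a, freq_b, size_a, size_b):
    """'+' if the item is relatively more frequent in A than in B, else '-'."""
    return "+" if freq_a / size_a > freq_b / size_b else "-"


# Corpus A = RAC (size assumed), corpus B = MTC (size from Table 3.2).
rac_size = 12_000_000
mtc_size = 11_723_002
# Hypothetical raw frequencies of one collocation in each corpus.
rac_freq, mtc_freq = 60, 38

ll = log_likelihood(rac_freq, mtc_freq, rac_size, mtc_size)
print(round(ll, 2), round(p_value(ll), 3),
      direction(rac_freq, mtc_freq, rac_size, mtc_size))
```

A plus sign with a significance level below 0.05 would correspond to a collocation that is used significantly more often in the first corpus, that is, one underused in the second.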
A log-likelihood ratio calculator developed by Jiajin Xu (2009) was chosen as the instrument to compare the raw frequencies of frequent V-N and A-N collocations in the published authors’ writing with those in the Taiwanese EFL learners’ writing. As illustrated in Figure 3.5, the calculator yields a log-likelihood value (LL-value) for the compared frequency sets and indicates the level of significance (Sig.) of the observed frequency difference. The plus (+) and minus (-) symbols next to the significance level show whether the investigated collocation was overused or underused in Corpus A in relation to Corpus B. For instance, in Figure 3.5, the significance level of the collocation ‘indicate difference’ was 0.031 and was marked with an asterisk (*) and a plus symbol. This indicated that ‘indicate difference’ was used significantly more often in RAC than in MTC, and thus that this collocation was underused by the Taiwanese EFL learners.

Figure 3.5. The Interface of the Log-likelihood Ratio Calculator.

3.2.3. Collocation Calculator (for Calculating Association)

Another tool employed in the present study was an online collocation calculator (http://corpora.lancs.ac.uk/stats/toolbox.php#tabs8), which was used to obtain different association measures for the identified collocations (see Figure 3.6 for the interface of the online