CHAPTER I INTRODUCTION
1.4 Definition of Key Terms
In the definition of lexical collocation developed by Benson, Benson, and Ilson (1986), a lexical collocation consists of nouns, adjectives, verbs, and adverbs. One type of lexical collocation is formed by a lexical verb and a noun as its object; in the present paper, this type is referred to as a verb-noun collocation.
1.4.2 Miscollocation
Based on previous research (Laufer & Waldman, 2011; Nesselhauf, 2003, 2005; Wang & Shaw, 2008), a collocation was judged unacceptable if it could not be found in dictionaries (Oxford Advanced Learner’s Dictionary, the Collins COBUILD English Dictionary, The BBI Dictionary of English Word Combinations, and Oxford Dictionary of English Idioms), had fewer than 10 hits in the BNC, and was not accepted by native speakers. The present paper followed a similar line of judgement and referred to a collocation as a miscollocation if it did not appear in the BNC/COCA academic subcorpora, was not found in the BNC, and was not accepted by one native English speaker.
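The judgement procedure described above can be read as a conjunction of three checks. The following Python sketch is illustrative only: the function name, the parameter names, and the example judgements are assumptions introduced here and were not part of the procedure followed in the study.

```python
def is_miscollocation(in_academic_subcorpora: bool,
                      in_bnc: bool,
                      accepted_by_native_speaker: bool) -> bool:
    """Return True only when all three checks fail (hypothetical helper)."""
    return not (in_academic_subcorpora or in_bnc or accepted_by_native_speaker)


# Invented example judgements, not data from the study.
examples = {
    "attested everywhere": (True, True, True),
    "attested only in the BNC": (False, True, False),
    "unattested and rejected": (False, False, False),
}
labels = {name: is_miscollocation(*checks) for name, checks in examples.items()}
```

Under this reading, a single positive check (for instance, one attestation in the BNC) is enough for a combination not to be labelled a miscollocation.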
Chapter Two: Literature Review
This chapter reviewed previous studies on ESL/EFL learners’ collocations. The review started by discussing three different notions of collocation that were adopted in previous research. The second part of the chapter summarized some difficulties EFL learners might have when producing collocations, especially in academic writing. The third part focused on the types and causes of miscollocations found in learners’ writing. Lastly, the chapter ended with the aim of the present study and the research questions it addressed.
2.1. Notion of Collocation
Collocations, also known as formulaic sequences, prefabs, or chunks, are one of the major components of second-language learners’ lexical competence. No clear-cut definition of collocation has been established because of its complexity (Fan, 2009). Three broad definitions, namely the phraseological tradition, the frequency-based view, and grammatical categorization, can be found in previous research (Barfield & Gyllstad, 2009). The three notions are discussed in detail below.
2.1.1 Phraseological View of Collocation
The core of the phraseological view is the paradigmatic and associative relationship between the lexical items in a collocation. Unlike the frequency-based view, which emphasizes frequency of co-occurrence, the phraseological view is more concerned with the fixedness and substitutability of the lexical items.
Cowie (1981) first approached the phraseological notion of collocation and defined a collocation as “a composite unit which permits the substitutability of items for at least one of its constituent elements” (P. 224). He further raised two parallel explanations that governed the notion of collocation. Firstly, in a collocation, one item was used in a figurative sense. For instance, the word combination explode a bomb was defined as a free combination since all of its lexical items were used in their literal meaning. However, explode a myth was categorized as a collocation because the word explode was used in the sense of “to prove something is wrong”, which differs from its original meaning, “to burst”. The figurative sense of explode restricted the object nouns that could follow it: only a small handful of words, such as theory, idea, and notion, could be attached to it (Cowie, 1981, P. 226). Secondly, the relationship could be viewed the other way around: the context might trigger the figurative meaning of one element. Cowie claimed that “a single item in a figurative sense is determined by a limited set of items used in a literal sense” (P. 226). Taking explode a myth as an example again, explode was interpreted figuratively only in the literal context of myth, notion, idea, or theory; other contexts might not trigger the figurative sense of explode.
Howarth (1996) followed Cowie’s work and proposed that the meaning of a collocation was not completely figurative as a whole. There was still evidence that each component contributed individually to the meaning of the collocation: only one component was used in a “specialized” sense, which gave the word combination its figurative sense. Compare, for instance, foot the bill (collocation) and fill the bill (idiom). Even though foot was not used in its literal sense (it meant “pay”), its semantic contribution could still be extracted from the collocation, whereas fill the bill functioned as one inseparable semantic unit. To make the definition of collocation more precise, three conditions were identified under which a lexical item was considered specialized (Aisenstadt, 1979):
1. Having a narrow, specific meaning, e.g., shrug one’s shoulders (Howarth, 1996, P. 39)
2. Having a figurative, secondary meaning, e.g., file an application
3. Having a delexicalized meaning, e.g., catch fire
On a scale of semantic transparency, the narrower, more specific, or more delexicalized the sense of a lexical item in a word combination is, the more restricted the combination is likely to be.
Nesselhauf (2005) expanded Cowie’s viewpoint and defined collocation as “words that habitually occur together with restriction of substitutability and relative transparency of meaning” (Nesselhauf, 2005, P. 2). She introduced the term “restricted sense” to explain the semantic restriction in a verb-object-noun collocation. Based on the degree of restrictedness of the lexical items in a collocation, a continuum of idiomaticity was presented, with freedom from restriction at one end (free combination) and full restriction at the other (idiom). Two criteria were also developed to determine the degree of restrictedness:
1. The sense of the verb is so specific that it takes only a small range of nouns as objects.
2. Not all semantically and syntactically acceptable nouns can be used with a specific sense of the verb.
On the basis of restrictedness, three categories of word combination were developed: free combination, collocation, and idiom. In a free combination, all items were used in an unrestricted sense (e.g., want a car), whereas in a collocation, the verbal item was restricted and took only a small set of nouns as collocates (e.g., dial a number). In idioms, both the verb and the noun were used in a restricted sense, allowing no substitution of any kind (e.g., sweeten the pill). In short, as long as either or both of the criteria were observed in the verbal component, the combination was classified as a collocation. Similar to Cowie’s (1981) definition of collocation, Nesselhauf’s (2005) account rested on the underlying assumption that nouns behaved the same in free combinations and collocations, whereas verbs behaved differently. Nesselhauf’s analysis of word combinations clarified the hazy definition of collocation by using the verb as the core criterion for classification.
2.1.2 Frequency-based View of Collocation
A frequency-based definition treats a collocation as a set of words that frequently appear within a certain distance of one another (Barfield & Gyllstad, 2009; Fan, 2009). Pioneering work in this tradition can be traced back to Firth (1957). He proposed the notion of “collocability”, using the example “dark night” to demonstrate that the meaning of “night” is complete when it is combined with the word “dark”. In other words, the meaning of a word is more complete in the presence of other words with which it has collocability. Furthermore, the meaning of a collocation cannot simply be predicted from the lexical meaning of each component. The meaning of collocation, as Firth argued, should be “an abstraction at the syntagmatic level” (P. 196).
Halliday (1966) followed Firth’s concept of collocation and expanded it by introducing the following terms: node, collocate, and span. The node is the primary word in the collocation, the collocate is the word that frequently co-occurs with it, and the span is the environment in which the collocate appears. In Halliday’s view of collocation, lexical items can be predicted, since some items co-occur with a probability “greater than chance” (P. 156).
Discussing collocation from a statistical approach, Sinclair (1991) combined ideas from both Firth and Halliday and defined collocation as “the occurrence of two or more words within a short space of each other in the text” (P. 170). Based on this definition, Sinclair distinguished two types of collocation: casual collocation and significant collocation. Significant collocations could be identified by the higher frequency of the co-occurring word combination relative to the respective frequencies of each word in a given length of text (Jones & Sinclair, 1974). For instance, the word combination the club might be a casual collocation rather than a significant collocation because the occurs far more often than the combination the club. In contrast, join club could possibly be a significant collocation. In the present study, Sinclair’s definition was adopted, and the frequency of a word combination was the key to deciding whether the targeted collocation was acceptable.

Table 2.1. Definition of Collocation Types by Jones and Sinclair (1974)
Collocation Type    Definition                                                    Example
Casual              Frequency of word combination < frequencies of each word      The club
Significant         Frequency of word combination > frequencies of each word      Join club
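One possible way to operationalize the distinction in Table 2.1 is to compare the observed frequency of a word combination with the frequency expected if the two words were distributed independently in the corpus. The Python sketch below follows this reading; the independence baseline, the function name, and all counts are assumptions made for illustration and are not taken from Jones and Sinclair (1974) or from the present study.

```python
def classify_combination(pair_freq: int, freq_a: int, freq_b: int,
                         corpus_size: int) -> str:
    """Label a two-word combination as 'significant' or 'casual'.

    Assumption: a combination is 'significant' when it occurs more often
    than expected under independence, i.e. freq_a * freq_b / corpus_size.
    """
    if corpus_size <= 0 or min(pair_freq, freq_a, freq_b) < 0:
        raise ValueError("counts must be non-negative and corpus_size positive")
    expected = freq_a * freq_b / corpus_size
    return "significant" if pair_freq > expected else "casual"


# Invented counts for a hypothetical corpus of one million words.
frequent_word_pair = classify_combination(3, 60_000, 100, 1_000_000)  # expected 6.0
content_word_pair = classify_combination(50, 1_000, 2_000, 1_000_000)  # expected 2.0
```

With these invented counts, the pair containing a very frequent word falls below its expected frequency and is labelled casual, whereas the pair of less frequent words exceeds it and is labelled significant.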
Later in the same work, Sinclair (1991) observed that the lexical items that appeared most frequently in general had less clear and more opaque meanings. In other words, the more frequent a lexical item was, the more “delexicalized” and the more difficult to analyze or explain it became. Based on this observation, Sinclair presented two types of collocation: upward collocation and downward collocation. Upward collocations included words that tended to collocate with lexical items of higher frequency than themselves, whereas downward collocations consisted of words that collocated with lexical items of lower frequency. To distinguish upward and downward collocates statistically, Sinclair proposed that the frequency of an upward collocate should be at least 15% above, and that of a downward collocate more than 15% below, the frequency of the node.¹ Using the word back as an example, upward collocates were down, from, into, and at, and downward collocates were arrive, bring, behind, and again. The former type of collocation was often used as a grammatical frame, whereas the latter type carried more semantic content (Sinclair, 1991, P. 116). Sinclair’s statistical definition of collocation provided insights into the “lexical realization of the situational context” (Moon, 1987, P. 22).

Table 2.2. Definition of Upward/Downward Collocation by Sinclair (1991)
Collocation    Definition                                                                  Example
Upward         Words collocating with lexical items of higher frequency than themselves    Back from
Downward       Words collocating with lexical items of lower frequency than themselves     Bring back
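The 15% threshold can be expressed as a small classification rule. The Python sketch below follows the footnoted example (a node occurring 100 times, with upward collocates at 115 or more and downward collocates below 85). The function name, the integer percentage parameter, and the label "neutral" for collocates inside the 15% band are assumptions introduced here for illustration.

```python
def classify_collocate(node_freq: int, collocate_freq: int,
                       margin_pct: int = 15) -> str:
    """Classify a collocate relative to its node by corpus frequency.

    Integer arithmetic is used so that boundary cases (e.g. exactly 115)
    are not affected by floating-point rounding.
    """
    if node_freq <= 0 or collocate_freq < 0:
        raise ValueError("node_freq must be positive, collocate_freq non-negative")
    if collocate_freq * 100 >= node_freq * (100 + margin_pct):
        return "upward"    # collocate at least 15% more frequent than the node
    if collocate_freq * 100 < node_freq * (100 - margin_pct):
        return "downward"  # collocate more than 15% less frequent than the node
    return "neutral"       # assumed label for the band in between


# Frequencies mirroring the footnoted example of a node occurring 100 times.
results = [classify_collocate(100, f) for f in (115, 100, 85, 84)]
```

For the four frequencies listed, the rule returns upward, neutral, neutral, and downward, in that order.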
2.1.3 Grammatical Categorization of Collocation
Unlike the frequency-based tradition and the phraseological view, the grammatical categorization of collocation did not attempt to define collocation from a semantic perspective. The BBI Dictionary of English Word Combinations (Benson, Benson, & Ilson, 1987) presented two categories of collocation: grammatical collocation and lexical collocation. A grammatical collocation was described as a word combination in which a “dominant word” (usually carrying the core meaning of the collocation) was followed by a preposition or a clause (Benson et al., 1987, P. XIX). A typical grammatical collocation might be decide on. A lexical collocation, in contrast, contained only nouns, verbs, adjectives, and adverbs; unlike grammatical collocations, it contained no grammatical words. Benson et al. (1987) further divided lexical collocations into seven sub-categories (L1 to L7): verb + noun/pronoun (prepositional phrase), verb + noun, adjective + noun, noun + verb, noun1 + of + noun2, adverb + adjective, and verb + adverb (see Table 2.3 for examples).

¹ For instance, if the node occurs 100 times in English, an upward collocate should occur at least 115 times, and a downward collocate fewer than 85 times.
Table 2.3. Sub-categorizations of Lexical Collocations (Benson et al., 1987)
Collocation Type    Grammatical Structure                         Example
L1                  verb + noun/pronoun (prepositional phrase)    set a record
L2                  verb + noun                                   break a code
L3                  adjective + noun                              strong tea
L4                  noun + verb                                   blood circulates
L5                  noun1 + of + noun2                            a school of whales
L6                  adverb + adjective                            closely acquainted
L7                  verb + adverb                                 apologize humbly
One difference between L1 and L2 was that verbs in L1 collocations denoted creation and/or activation, as in make an impression, whereas L2 collocations consisted of verbs denoting eradication and/or nullification, as in demolish a house. Based on this classification, the present study looked into the L1 and L2 types of collocation. Grammatical collocations were not included in the research.
2.2 Advanced ESL/EFL Learners’ Difficulties with Collocation
Due to the importance of collocation in language learning, a large number of studies have been carried out to investigate the features and patterns of learners’ collocations (Bahns & Eldaw, 1993; Barfield & Gyllstad, 2009; Biskup, 1992; Durrant, 2009; Farghal & Obiedat, 1995; Granger, 1998; Gyllstad, 2005; Laufer & Waldman, 2011; Nesselhauf, 2003, 2005; Panahifar, 2013; Peacock, 2012; Wang & Shaw, 2008). It was discovered that ESL/EFL learners in general did not possess sufficient collocational knowledge (Bahns & Eldaw, 1993; Biskup, 1992; Farghal & Obiedat, 1995; Granger, 1998; Gyllstad, 2005; Howarth, 1998; Nesselhauf, 2003). Bahns and Eldaw (1993) used translation and fill-in-the-blank tasks to examine the relationship between general vocabulary knowledge and collocation use. Using the noun as node and the verb as collocate, they selected 15 V-N collocations to investigate the collocational knowledge of 58 advanced German learners. The results suggested that learners were more likely to make mistakes in producing collocations than in producing general vocabulary. Bahns and Eldaw concluded that learners’ problems with collocation were far more complicated than general vocabulary issues and that a possible cause was learners’ lack of awareness of the potential difficulty of collocations.
Delving into productive knowledge of collocation, Farghal and Obiedat (1995) used fill-in-the-blank tests to investigate learners’ coping strategies when they encountered unfamiliar collocations. The overall analysis showed that both junior and senior college students were deficient in collocational knowledge; they therefore relied on four lexical simplification strategies, namely avoidance, transfer, synonymy, and paraphrasing. Among the four strategies, synonymy was used most frequently, and Farghal and Obiedat (1995) thus claimed that learners were unaware of “collocational restrictions of lexical items” (P. 321). Learners also failed to recognize collocations as prefabricated patterns in which random substitution of words was not allowed.
Discussing collocational development and difficulties at different proficiency levels, Gitsaki (1997) investigated how 275 Greek junior high school students used collocations in composition, blank-filling, and translation tasks. Students were divided into three groups: post-beginning, intermediate, and post-intermediate. The results suggested that the accuracy of both lexical and grammatical collocations was low among the beginning-level learners. Gitsaki (1997) explained that lexical collocations were relatively shorter than grammatical collocations and that beginning-level students therefore tended to memorize lexical collocations as unanalyzed chunks. Learners were also found to have more problems with grammatical collocations containing prepositions because of L1 interference.
Aside from studies using traditional elicitation techniques, some studies have resorted to corpus technology and analyzed learner collocations in large amounts of text. With the help of computer software, comparisons between native and non-native corpora provide more insight into the collocation problems learners might encounter. In a corpus-based study, Granger (1998) used the International Corpus of Learner English (ICLE) to investigate how French learners used -ly adverb + adjective collocations (e.g., perfectly natural). ICLE is a computerized corpus containing EFL writing from all over the world. Comparing the French component of ICLE (251,318 words) with a native corpus (234,514 words), Granger used text retrieval software to extract all words ending in -ly and manually deleted the words that did not function as intensifiers. The results demonstrated that learners underused booster amplifiers (e.g., highly) compared with native English speakers and, furthermore, overused two particular amplifiers, completely and totally, which Granger referred to as “safe bets”. With an underdeveloped sense of collocational restriction, learners tended to underuse native-like collocations and overuse collocations that were not typical in English.
Laufer and Waldman (2011), on the other hand, examined the collocations used in 759 argumentative essays by Israeli students at basic, intermediate, and advanced levels. A total of 220 nouns appearing at least 20 times were extracted from the native corpus. V-N collocations were then selected in both corpora and checked against the BNC, dictionaries, and native speakers for their validity. Laufer and Waldman found that learners at all proficiency levels produced significantly fewer collocations than native speakers. In terms of collocation errors, advanced learners made more mistakes than learners at other levels, even though they also produced the largest number of collocations. The authors claimed that the reason for such a high ratio of collocation errors (one third of all collocations produced) was that learners processed the individual words in a word combination rather than prefabricated patterns.
2.2.1 Advanced EFL Learners’ Difficulties with Collocation in Academic Settings
Turning to learners’ collocational use in academic writing, several studies (Durrant & Schmitt, 2009; Howarth, 1998; Li & Schmitt, 2010; Yang, 2015) investigated differences in collocation use between native and non-native speakers. Their shared findings suggested that non-native speakers’ use of collocations in academic settings deviated from native norms.
Howarth (1998) conducted research on V-N collocations in academic writing by non-native speakers. The native corpus contained texts extracted from the LOB corpus and published texts in sociology, education, and law (240,000 running words), whereas 10 sets of graduate school assignments by non-native speakers in applied linguistics were collected to compile a non-native corpus (25,000 running words). A total of 5,300 V-N collocations from the native corpus and 1,200 collocations from the non-native corpus were extracted by using a list of high-frequency verbs. All collocations were checked against the BBI Combinatory Dictionary of English and the Oxford Dictionary of Current Idiomatic English, and three levels of restrictedness of word combination (free combination, collocation, and idiom) were identified. The results showed that conventional collocations and idioms found in the non-native corpus made up 25% of the entire corpus, lower than the native norm (38%), indicating that learners had a lower level of collocational knowledge. Other findings showed that less restricted collocations were found in the non-native corpus and that non-native postgraduate learners produced more unacceptable collocations. Howarth suggested that learners lacked awareness of how to deploy lexical items appropriately in collocations.
Using frequency-based methodologies, Durrant and Schmitt (2009) set out to find how non-native learners made use of word combinations in academic writing. They studied adjective-noun and noun-noun collocations produced by ESL/EFL learners from different L1 backgrounds and collected texts of two lengths (short and long) to form four corpora: NS long, NNS long, NS short, and NNS short. The native speaker corpora consisted of long texts from Prospect magazine and academic essays by students in Applied Linguistics, and short texts extracted from the LOCNESS corpus. The NNS corpora, on the other hand, contained long texts from British and Turkish academic essays and short texts from the Bulgarian subcorpus of ICLE. A total of 10,839 collocations were identified in the 96 texts, of which only 1,355 (12%) were produced by non-native speakers. Two main findings were reported. First, non-native writers tended to use fewer low-frequency collocations (those found fewer than five times in the BNC) than native writers. The pattern was consistent in both short and long texts, although the trend seemed more prominent in longer texts. Secondly, taking t-score and MI value into account, non-native learners showed a greater reliance on strong collocations (high t-score) but underused collocations with low MI value in comparison with native speakers. The results of this study indicated that the production of non-native speakers showed a degree of conservatism: they relied on collocations that were common in English.
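The two association measures mentioned above are commonly computed from corpus counts. The Python sketch below uses one widely used formulation, in which the expected co-occurrence frequency is E = f(node) × f(collocate) / N, the MI value is log2(O / E), and the t-score is (O − E) / √O, where O is the observed co-occurrence frequency and N the corpus size. These formulas, the function names, and the counts are assumptions made for illustration; the exact computation used by Durrant and Schmitt (2009), including any adjustment for span size, is not reported in this review.

```python
import math


def expected_frequency(freq_node: int, freq_collocate: int, corpus_size: int) -> float:
    """Co-occurrence frequency expected if the two words were independent."""
    return freq_node * freq_collocate / corpus_size


def mi_score(observed: int, freq_node: int, freq_collocate: int, corpus_size: int) -> float:
    """Mutual information: log2 of the observed-to-expected ratio."""
    if observed <= 0:
        raise ValueError("observed co-occurrence frequency must be positive")
    return math.log2(observed / expected_frequency(freq_node, freq_collocate, corpus_size))


def t_score(observed: int, freq_node: int, freq_collocate: int, corpus_size: int) -> float:
    """t-score: (observed - expected) divided by the square root of observed."""
    if observed <= 0:
        raise ValueError("observed co-occurrence frequency must be positive")
    expected = expected_frequency(freq_node, freq_collocate, corpus_size)
    return (observed - expected) / math.sqrt(observed)


# Invented counts for a hypothetical corpus of one million words.
N = 1_000_000
frequent_pair = (100, 10_000, 2_000)  # two common words that co-occur often
rare_pair = (8, 20, 40)               # two rare words that mostly occur together

frequent_t, frequent_mi = t_score(*frequent_pair, N), mi_score(*frequent_pair, N)
rare_t, rare_mi = t_score(*rare_pair, N), mi_score(*rare_pair, N)
```

With these invented counts, the frequent pair obtains the higher t-score and the rare pair the higher MI value, which illustrates why the t-score tends to favor high-frequency combinations whereas MI favors less frequent but more exclusive ones.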
Li and Schmitt (2010) conducted a longitudinal study of learners’ development in collocation learning. Following a similar line of thought to Durrant and Schmitt (2009), Li and Schmitt also adopted the frequency-based tradition to investigate the collocational behavior of four Chinese postgraduate students studying at the University of Nottingham. To trace the improvement of collocation learning overseas, a total of 32 essays and 4 dissertations were collected from the four participants during one academic year. All adjective-noun collocations were then extracted from the learner corpus (150,000 words). The results showed that 299 types of adjective-noun collocation were identified, of which only 41% were classified as frequent and strongly associated. Li and Schmitt suggested that learners were more inclined to use collocations with high t-scores and low MI scores and that they also used certain collocations repeatedly. As for the development of collocational use, little improvement was found in the number of robust collocations in their academic writing.
Employing a semi-automatic method, Yang (2015) investigated adjective-noun and verb-noun collocations used by Taiwanese postgraduate students in Applied Linguistics. Using large academic corpora, she compared the collocational behavior in theses by Taiwanese students with that in published research papers. The sizes of the postgraduate corpus and the published author corpus were 11.7 million and 11.8 million words, respectively. Yang used the 29 most frequent core nouns as a baseline and extracted verbal and adjectival collocates from both corpora via the Sketch Diff function built into Sketch Engine. All extracted adjective-noun and verb-noun collocations were then examined manually to exclude collocations containing verbs that collocate widely with many words (e.g., consider and regard). A log-likelihood test was later conducted to identify underuse and overuse in learner collocations. The findings showed that published authors’ repertoire of adjective-noun and verb-noun collocations was richer than that of the Taiwanese students. Low type-token ratio of V-N/A-N collocations in Taiwanese postgraduate