Definition of Key Terms - A Corpus-Based Semi-Automated Comparison of Verb-Noun Collocations of English Native Speakers and Learners

CHAPTER I INTRODUCTION

1.5 Definition of Key Terms

1. Verb-Noun Collocation: Based on past studies (Givon 1993; Fromkin et al. 2003; Lin 2010), Verb-Noun collocations refer to combinations of lexical verbs and nouns together with their preceding modifiers. In this study, the online query system of The Sketch Engine (SKE) scans the corpora and extracts Verb-Noun collocations semi-automatically.

2. Learner Corpus: A corpus in which L2 learners' linguistic production, either spoken or written, is compiled and systematically tagged. This type of corpus is distinguished from general corpora like the BNC (British National Corpus), which consist of the linguistic production of L1 speakers, i.e., natives of English-speaking regions.

3. Semi-Automated: In this study, this term means that the online system of The Sketch Engine takes over the manual labor from corpus data extraction to the preliminary inspection of pragmatic and semantic appropriateness.

4. The Sketch Engine: A web-based program (https://the.sketchengine.co.uk/) that accepts uploaded linguistic data on subscription terms and offers its analysis functions in return. It contains two main functions: a concordancer and the Word Sketch program (Kilgarriff et al. 2004). Word Sketch, as its name suggests, organizes a keyword's collocates of all parts of speech and provides a one-page summary of them, with the frequency of each collocate shown in color-coded columns. Figure 1.1 below is an example of the word fun displayed with Word Sketch on the SKE website.

Figure 1.1 Fun in the BNC demonstrated by the Word Sketch function

CHAPTER II

LITERATURE REVIEW

2.1 Collocations

This section is organized in two parts. The first discusses the general notions and studies of collocations. The second focuses on Verb-Noun collocations and their related topics, which are the emphasis of this study.

2.1.1 Collocations and Verb-Noun Types

Since Firth (1957) first coined the term "collocations," the definition of this pivotal concept has spurred much discussion and a variety of explanations. For instance, Murphy (1983) referred to collocations as "word associations," Alexander (1984) as "fixed expressions," and Granger (1998) as "prefabricated patterns." As Firth aptly put it, "you shall know a word by the company it keeps" (p. 197); the relations among words appear much more complex than their individual components.

Nesselhauf (2005) offered a practical description of "non-substitutability" as a feature of collocations: the meaning of a collocation is muddled once one of its components is replaced with a semantic counterpart. For example, "powerful" can be semantically similar to "strong," but in the collocation "a powerful computer," powerful cannot be exchanged with strong; otherwise, the resulting phrase sounds unnatural.

In this study, the definition the author opts for is the one proposed by Laufer and Waldman (2011). They suggested "restricted co-occurrence (Sinclair 1991)" and "semantic transparency (Cowie 1994)" as their core definition of collocations. A brief section of their examples serves as a straightforward and clear categorization:

"We consider throw a disk and pay money to be free combinations, we consider throw a party and pay attention to be collocations, and we consider throw someone’s weight around and pay lip service to be idioms (p. 649)."

With respect to the importance of collocations, Howarth (1996) found that over a third of the word combinations observed in a 240,000-word corpus were collocations.

Moreover, Nesselhauf (2005) laid out a precise argument for why collocations matter. Past observations about the crucial role of collocations can be summarized in the following four facets.

First, collocations are the foundation of creative language in both L1 and L2 (Peters 1983; Wray 1999). Second, a set of ready-to-use chunks is psycholinguistically the decisive factor in achieving fluency in language production (Pawley and Syder 1983; Aitchison 1987). Third, since collocations are a shared platform between interlocutors, their use enhances pragmatic comprehension (Hunston and Francis 2000). Finally, like the register of a certain region, shared collocations serve to meet “the desire to sound [and write] like others” (Wray 2002).

There are various interpretations of so-called Verb-Noun collocations. For a clear inspection of V-N collocations in this study, two camps of rationale are outlined: one from Cowie (1991; 1992) and the other from Howarth (1996; 1998), Siepmann (2005), and Lin (2010).

First, Cowie (1991; 1992) suggested that verbs could contain three major features: figurative, delexical, and technical.

1. Figurative: in "deliver a speech," the keyword deliver goes beyond its basic denotation of sending an object to a more figurative or abstract aspect of getting certain ideas across to the listeners.

2. Delexical: in "make recommendations," the verb make does not really convey a specific meaning but is rather grammaticalized and vague.

3. Technical: in "try a case," the verb try is constrained to a narrow, specific sense that it takes on only within the collocation.

Second, based on Howarth (1996; 1998), Siepmann (2005), and Lin (2010), Verb-Noun collocations can be divided into five levels of fixedness, from the most fixed to the least.

1. Complete restriction on individual components: they can be (a) pure, i.e., opaque in literal meaning (let the cat out of the bag, spill the beans), or (b) figurative idioms (call the shots).

2. Constraint on one element, while others could be substituted: such as give the appearance/impression, or take/pay heed.

3. Partial regulation on certain distributional places: like make/give a speech/presentation.

4. Freedom of replacement on one component, while others are partially restrained: weak collocations in a sense (accept/agree to/adopt a plan/proposal/suggestion/recommendation/convention, etc.).

5. Open to any substitution on any component: free or non-restricted compositional sequence (paint the wall; suggest an idea).

2.1.2 Difficulties ESL Learners Face

The previous part briefly summarized the major discussion of collocations in general and of Verb-Noun features. The following two sections review past research on learners' difficulties with collocations and on the basic miscollocation types.

Studies have been carried out with a variety of elicitation techniques, such as translation, cloze, fill-in, and multiple-choice tests, or questionnaires. Biskup (1990) reported that, in translation tests, Polish ESL learners answered correctly when translating L2 collocations into L1, yet not the other way around.

Bahns and Eldaw (1993) designed translation as well as fill-in questions like "He was too proud to _____ his defeat." Their study showed that verbs which were part of a collocation caused many more difficulties than others, regardless of the testees' level. Shei (1999) revealed another interesting case: Chinese ESL learners encountered more problems with cloze collocation tests than learners with a European L1 background. Furthermore, Wang (2001), through a 50-question fill-in test for her subjects, pointed out that English collocations which were more idiomatic, had no similar Chinese counterparts, or contained more interchangeable synonyms caused students much trouble in learning.

Chen (2008) designed a multiple-choice test with 50 questions for 440 non-English major college students. Questionnaires were also distributed to understand their ESL learning background. The results of Chen's study suggested that Verb-Noun miscollocations were the most marked errors. Other than that, students' knowledge of collocations was found to have much to do with their performance in English on their Joint College Entrance Examinations. After analyzing her subjects' collocation errors, Chen concluded that negative transfer from L1, overgeneralization, and confusing usages of synonyms mainly resulted in students' deficiency of English collocations.

Gitsaki (1997) proceeded further to combine essay writing, translation, and fill-in questions in her study. She divided her subjects into three respective groups according to their proficiency levels, i.e., post-beginning, intermediate, and post-intermediate.

Her results showed that ESL learners were clearly in need of collocation instruction due to several adverse factors: the discrepancy between learners' L1 and L2, the intrinsic complexity of collocations, and the insufficient amount of L2 input received.

Regarding learners' self-perceptions, Li (2005) recruited 38 college undergraduates for her study in which assignments and questionnaires were given to examine their ideas about L2 collocation difficulty. The results demonstrated that what students deemed to be hard collocations were not the actual mistakes they made in their assignments. Moreover, ignorance of collocational constraint turned out to be the major cause of errors in students' production.

The above studies, though different in design, all pointed in one direction: collocations, Verb-Noun combinations in particular, caused much difficulty for ESL learners in both acquisition and production, and corresponding mechanisms deserve special consideration.

2.1.3 Error Types of ESL Learners' Collocations

Drawing on the discussion of Nesselhauf (2003) and James (1998), Chang and Yang (2009) proposed twelve types of V-N miscollocations and eleven potential causes, according to their findings from the CLEC corpus.

1. Erroneous verb choice: such as *learn knowledge, which should be acquire knowledge instead.

2. Misuse of delexical verbs: as in the categorization of Howarth (1996; 1998), delexical verbs do not convey much meaning unless accompanied by a complement, as with do in *do recommendations (should be make recommendations).

3. Erroneous use of idioms: if certain collocations function like phrases, then they should be seen as a whole without much replacement, like *get touch with them (should be keep in touch with them).

4. Erroneous noun choice: such as *tell a speech, which should be tell a story instead.

5. Erroneous preposition after verb: for instance, in *reply letters, the V-Prep is misused and should be reply to letters.

6. Erroneous preposition after noun: similarly, in the collocation *give sympathy to animals, the Noun-Prep is wrong and the right one is give sympathy for animals.

7. Erroneous use of determiner: the determiner is missing in *play piano, and the correct form is play the piano.

8. Erroneous syntactic structure: a distributional misuse, such as *rang the phone, which should be the phone rang.

9. Erroneous choice for intended meaning: a common Chinese phrase is rendered literally as *break my armed self, but the correct corresponding English expression should be undermine my self-esteem.

10. Redundant repetition: a semantically similar word is repeated partially due to L1 interference, like *work one job (do one's job or just work would suffice).

11. Erroneous combination of two collocations: for example, in *enjoy yourself a good time, either enjoy yourself or have a good time would be sufficient.

12. Miscellaneous: those types of miscollocations which cannot be categorized.

In addition, Liu (1999) collected 94 copies of general writings and 127 final exam papers written by college students. He found 63 miscollocations and categorized them as follows.

1. Overgeneralization: if a word has more than one usage, it is likely to be overused. As the word worry can function as a noun or a verb, mistakes like *I am worry about you occurred (the correct form being I am worried about you).

2. False analogy: an example like *ask you a favor can be an erroneous extension from structures like Verb+Noun+O.C.

3. Erroneous assumption: commonly found with delexical verbs, as in *do plans (make plans being the correct one), where ESL learners make false guesses about these vague verbs.

4. Erroneous use of synonyms: described by Farghal and Obiedat (1995) as a "straightforward application of the open choice principle," as when ESL learners produce *broaden your eyesight instead of broaden your horizons.

5. Negative transfer: stemming from the influence of Chinese, *eat medicine is translated directly from the Chinese expression, while take medicine is the correct English expression.

6. Erroneous coinage: out of vocabulary deficiency or a lack of collocational awareness, students could make *see sun-up, combining sun and up together, instead of see a sunrise.

7. Approximation: when two words share similar meanings or forms, a muddled sense of them could result in *release my pressure, whereas relieve my pressure ought to be used by ESL learners (release and relieve might appear alike in learners' memory).

2.2 Methods of Analyzing ESL Learners' Miscollocations

A large number of studies have been conducted in order to understand the hurdles learners face concerning their collocation learning and proficiency. Based on the generalization of Laufer and Waldman (2011) and Kilgarriff et al. (2004), the methods of these studies are categorized into three crucial stages: traditional manual data collection of language samples, concordances obtained from learner corpora with KWIC (keyword in context), and the use of MI (mutual information) measures for a more systematic analysis of a word's collocates.

2.2.1 Manual Extraction and Examination

Before the prevalence of online linguistic data sorting and storage systems, human-labor data collection as well as error inspection was the most typical way. Liu (1999) collected 94 copies of general writings and 127 exam papers of Taiwanese ESL students for analysis. After being manually checked by the researcher, 63 errors were detected in the writing samples, which mostly were Verb-Noun collocation errors.

Chen (2002), in addition, looked into the miscollocations of high school students in Taiwan through 90 English examination papers. 272 miscollocations were found according to the category system of Benson (1986), with Adjective-Noun and Verb-Noun miscollocations being the two most significant sorts of all. Chen also pointed out that negative transfer from L1 seemed to be the main cause of ESL learners' miscollocations, especially when an L1 equivalent was available for the intended message to be phrased in English.

2.2.2 Learner Corpus and KWIC Concordances

Later, thanks to the development of modern technology, large-scale corpora and corresponding KWIC (keyword in context) programs have been widely available to researchers.
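As an illustration of what a KWIC display does, the following is a minimal sketch in Python. It is not the concordancer used in any of the studies reviewed here; the function name kwic, the whitespace tokenization, and the sample sentence are all assumptions made for this example.

```python
def kwic(tokens, keyword, window=4):
    """Return (left context, node, right context) for each hit of keyword."""
    hits = []
    target = keyword.lower()
    for i, token in enumerate(tokens):
        if token.lower() == target:
            # Take up to `window` tokens on each side of the node word.
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


# Invented sample sentence, tokenized naively on whitespace.
sample = "She tried to make a speech but could not make any recommendations".split()

for left, node, right in kwic(sample, "make", window=3):
    # Right-align the left context so that the node words line up in a column.
    print(f"{left:>20}  {node}  {right}")
```

Each hit is shown with a fixed window of context on both sides, which is what allows a researcher to scan the noun collocates of a verb line by line.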

Shih (2000), after manually inspecting the verbs most frequently used by ESL students in the TLCE (Taiwan Learner Corpus of English), a 415,700-word corpus, found several key verbs in the most problematic Verb-Noun combinations. These key verbs were achieve, understand, disturb, ask, and avoid. The possible reasons for learners' misuse of these verbs in collocations were considered to be their high frequency as well as their dominant tendency to be memorized with other noun collocates. That is, the more commonly a verb collocates with other words, the more frequently it is misused in learners' collocation production.

Liu (2002) adopted the ETLC (English Taiwan Learner Corpus), a corpus of over one million words whose data were collected from IWill, an online reading project platform, for her study of Verb-Noun miscollocations made by ESL learners. With the help of error tags already marked online by English teachers, such as word choice, wrong verb/noun, or problematic usage, the researcher combed the tagged words for potentially erroneous Verb-Noun collocations. She found that 233 of the 265 lexical miscollocations in total were Verb-Noun miscollocations. This proportion was substantial, and Liu further pointed out that verb-based mistakes outnumbered noun-based ones, which meant that students had more problems learning how to use the verbs in the collocations they intended to produce.

Apart from that, by consulting WordNet (http://wordnet.princeton.edu/), an online lexical database, Liu claimed that over half of the students' miscollocations stemmed from their puzzled semantic concepts of inter-related verbs like run and move, operate and drive, while the rest of the errors were mostly triggered by direct translations from learners' L1.

As for European ESL learners, Nesselhauf (2003) retrieved 32 essays composed by German college undergraduates from ICLE (International Corpus of Learner English). She perused the writings, tagged keywords one by one, and extracted errors mostly on her own. The results pointed out that 56 out of 1072 Verb-Noun combinations were confirmed to be miscollocations, the most salient type of all.

On an even larger scale, Nesselhauf (2005) went on to choose GeLEE, a German ESL learner subcorpus in ICLE consisting of 318 argumentative essays and 154,191 words, for her doctoral dissertation research. She manually searched for miscollocations, with the BNC, dictionaries, and native speakers as reference. Her focus was on Verb-Noun collocations, and the final outcome showed that 744 out of 2078 Verb-Noun collocations were miscollocations.

2.2.3 Concordancers and MI Measures

With the promotion of lexical statistics (Church and Hanks 1989) and better-programmed concordancers, other measures like MI (mutual information) have allowed researchers to broaden the scope of inspecting a word's collocates up to five words (Kilgarriff et al. 2004). Instead of reading concordance after concordance, a list of salient collocates for a keyword could be conveniently summarized.
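For reference, the MI score in the sense of Church and Hanks (1989) compares how often two words co-occur with how often they would be expected to co-occur if they were independent: MI(x, y) = log2( P(x, y) / (P(x) P(y)) ). The sketch below computes it from raw frequencies; the function name mi_score and the counts in the usage line are invented for illustration, and a given concordancer may apply additional adjustments to this basic formula.

```python
import math


def mi_score(pair_freq, node_freq, collocate_freq, corpus_size):
    """Pointwise mutual information computed from raw corpus frequencies.

    With P(x, y) = pair_freq / N, P(x) = node_freq / N, and
    P(y) = collocate_freq / N, the score simplifies to
    log2(pair_freq * N / (node_freq * collocate_freq)).
    """
    if min(pair_freq, node_freq, collocate_freq, corpus_size) <= 0:
        raise ValueError("all counts must be positive")
    return math.log2((pair_freq * corpus_size) / (node_freq * collocate_freq))


# Hypothetical counts: the pair occurs 32 times, the node word 64 times,
# and the collocate 128 times in a 1024-token corpus.
print(mi_score(32, 64, 128, 1024))
```

A score of zero means the pair co-occurs exactly as often as chance would predict, while higher scores indicate a stronger association. Ranking a keyword's collocates by such a score is what replaces reading the concordance lines one by one.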

Lin (2010) conducted an informative study comparing the Verb-Noun miscollocations of Chinese and Taiwanese ESL learners by adopting CLEC (Chinese Learner English Corpus), approximately 3.4 million words, and a Taiwanese ESL learners’ corpus, around 1.8 million words. First, she extracted all the Verb-Noun combinations from the two corpora with the software AntConc and MonoConc Pro. Then, comparing these combinations with the BNC by means of Perl, Lin identified those combinations that did not appear in the BNC. Finally, the researcher manually checked all the potentially erroneous Verb-Noun collocations by consulting dictionaries and websites. Her results showed that 210 types of miscollocations were detected in the Taiwanese ESL learner corpus and 268 in CLEC, and about 10% of the miscollocations overlapped between the two corpora.
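The filtering step described above, which keeps only the learner Verb-Noun combinations that are absent from the reference corpus, can be sketched as a simple set comparison. This is an illustration in Python rather than a reconstruction of Lin's Perl procedure; the function name and the sample pairs are assumptions, and the output is only a list of candidates that still requires manual checking.

```python
def candidate_miscollocations(learner_pairs, reference_pairs):
    """Return learner (verb, noun) pairs that never occur in the reference list.

    Pairs are compared case-insensitively; duplicates are dropped and the
    order of first occurrence in the learner data is preserved.
    """
    reference = {(verb.lower(), noun.lower()) for verb, noun in reference_pairs}
    seen = set()
    candidates = []
    for verb, noun in learner_pairs:
        pair = (verb.lower(), noun.lower())
        if pair not in reference and pair not in seen:
            seen.add(pair)
            candidates.append(pair)
    return candidates


# Invented sample data: pairs extracted from a learner corpus and from a
# native-speaker reference corpus.
learner = [("eat", "medicine"), ("take", "medicine"),
           ("learn", "knowledge"), ("Eat", "medicine")]
reference = [("take", "medicine"), ("acquire", "knowledge")]

print(candidate_miscollocations(learner, reference))
```

A pair that is absent from the reference corpus is not necessarily an error, since it may simply be rare, which is why a manual check follows the automatic filter.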

The studies above from 2.2.1 to 2.2.3, though all offering insightful results and discussions, seemed to leave room for improvement in the following realms.

As for the data extraction procedures, except for Lin (2010), the past studies all required so much labor during data extraction that they might not be feasible to replicate in future research. Even in Lin's (2010) study, where a semi-automated method was adopted (Perl was applied to filter out the V-N collocations found in both the learner corpora and the BNC, leaving the potentially erroneous ones from the learner corpora), most of the procedure was still manual.

Also, as noted above, the potential miscollocations in the previous research were double-checked manually against many kinds of resources such as The BBI Dictionary of English Word Combinations, the Oxford Collocations Dictionary, the Oxford Advanced Learner’s Dictionary, etc. Nevertheless, the native examples from a well-organized corpus like the BNC were not consulted further. This could result from labor and time constraints: manually extracting the V-N collocations from the BNC for comparison might take too much time and too many human resources.

Third, due to practical constraints, most of the analyses and decisions in the past research were made by the researchers alone, with reference to hard-copy as well as online dictionaries. Although the results were rigorously examined later on, this still raises concerns about human judgment and fatigue.

Finally, with the assistance of technology, the sizes of corpora around the world are increasing day by day. Once new sources of data are incorporated into current corpora, new results or different analyses can be expected.

CHAPTER III

METHOD

This section presents the tool, data sources, and planned extraction and analysis procedures for this study. First, the online platform, The Sketch Engine, is introduced along with its two major functions for the target research. Then, the adopted corpora, both general corpora and ESL learner ones, are discussed. Finally, the semi-automated method and final judgment process are explained.

3.1 Instruments: The Sketch Engine

First utilized in the compilation of the Macmillan English Dictionary (Rundell 2002), and debuted at Euralex 2002 (Kilgarriff and Rundell 2002), word sketches are a one-page summary of a word's features, both grammatically and collocationally, drawing on corpus-based data in an automatic manner (Kilgarriff et al. 2004: p. 1).

The Sketch Engine (also known as the Word Sketch Engine; referred to as SKE henceforth) is an innovative corpus query system that provides word sketches, grammatical relations, and a distributional thesaurus (Huang and Hong 2006). With its clear and constantly updated online platform, SKE has been gaining more and more attention.

In response to rapid technological advancement, SKE was designed to cope with the ensuing challenges and offers distinctive functions.

First of all, having witnessed the introduction of Gigaword (a 1000M-word corpus) by the Linguistic Data Consortium (http://www.ldc.upenn.edu/), researchers around the world sensed that the traditional interface of concordancers could no longer handle such an enormous amount of data (Kilgarriff and Grefenstette 2003). Instead of just
