CLEC, SWECCL, JCEE, The Taiwanese Learner Corpus

CHAPTER III METHOD

3.2 Corpora

3.2.2 CLEC, SWECCL, JCEE, The Taiwanese Learner Corpus

The Chinese ESL learner corpora, on the other hand, are composed of four major parts: CLEC (Chinese Learner English Corpus, 1.0), SWECCL (Spoken and Written English Corpus of Chinese Learners, 1.0 and 2.0), the JCEE (Joint College Entrance Examinations) Testees Corpus, and The Taiwanese Learner Corpus. Together they total 7.3 million words.

The first two corpora are based on the input of Chinese ESL learners in Mainland China. CLEC (Chinese Learner English Corpus) is a large-scale ESL learner corpus compiled by Professors Gui and Yang. Comprising about 1 million words produced by high school and college students, it is frequently adopted for research purposes because of its balanced tagging of 61 types of errors, including as many as 1288 tokens of Verb-Noun miscollocations ready for analysis (Zhou 2005; Li 2005). SWECCL (Spoken and Written English Corpus of Chinese Learners) is a project led by Wen et al. and is so far the largest Chinese ESL learner corpus in Mainland China. With a size of 3.5 million words, the SWECCL corpus contains both written and spoken data. Since the BNC, the author's native reference corpus for this study, consists mainly of written input, only the written data from the SWECCL, around 2.4 million words in total, are adopted for further semi-automated extraction and comparison.

The other two learner corpora are made up of the English production of Taiwanese ESL learners. The JCEE (Joint College Entrance Examinations) Testees Corpus, with an approximate total of 2 million words, consists of English written by Taiwanese high school graduates in their college entrance exams. Compiled by the College Entrance Examination Center in Taiwan, the JCEE corpus is currently available for research purposes only. The Taiwanese Learner Corpus consists of about 1.8 million words, contributed by students from National Taiwan Normal University, National Tsing Hua University, National Taiwan Ocean University, National Taiwan University, National Taichung University, and Soochow University. The students wrote online on various topics such as technology, politics, education, and school life, with each essay running three hundred to five hundred words.

After the native and non-native corpora were obtained respectively, they were uploaded onto the SKE. The four Chinese ESL learner corpora were merged into one big corpus first, and then the BNC and the combined Chinese ESL learner corpus were analyzed with the functions on the SKE.

3.3 Data Extraction

One innovation of this study is an alternative use of the Sketch Diff function on the SKE to make both data extraction and analysis semi-automated. In past research, deciding which Verb-Noun structures to include and which to filter out often took much of researchers' time and effort. Using the tagging system and the sorting tools on the SKE, the author started from a list of frequent nouns generated from the Chinese ESL learner corpus, extracted the target verb collocates from both the BNC and the Chinese ESL learner corpus, and compared them with the Sketch Diff function.

First, taking knowledge as an example for comparison between the native and non-native corpora, the Sketch Diff provides a summary chart of the collocates of knowledge in distinct part-of-speech positions (Figure 3.4). Then, since our focus is on Verb-Noun miscollocations, the left column with the heading object_of (that is, knowledge used as an object) is examined. The red area marks the verb collocates that Chinese ESL learners tend to use with knowledge while native speakers never do, such as enrich (99 times), study (94 times), and master (58 times). The green area marks the verb collocates that native speakers habitually use with knowledge but non-native learners do not. The extreme examples that native speakers never produce are our target for further inspection.

Figure 3.4 Knowledge Compared between Native & Non-Native Corpora

Next, to probe into what the concordances are and how they are misused, the entry of enrich is chosen, and a list of concordance lines is shown (Figure 3.5). This way, the author could examine the context quickly to decide whether the evidence given by the Sketch Diff between natives and non-natives reflects actual miscollocations or not. The references adopted for this study are introduced in section 3.4, Data Analysis.

Figure 3.5 Concordances of enrich_knowledge in the Chinese ESL Learner Corpus

As for the keywords that were tested with the Sketch Diff interface, they were based on a list of the most frequently used nouns from the Chinese ESL learner corpus, generated online by the SKE. According to Liu (2002), nouns tend to be the most crucial indicators of learners' English Verb-Noun miscollocations. It is more efficient to capture V-N misuse by inspecting the verb collocates of a noun than by inspecting the noun collocates of a verb. A similar idea is also proposed by Manning and Schütze (1999), whose term "focal word" indicates the crucial role of nouns in V-N collocations.

In this study, the frequency threshold was set at 300. That is, only nouns with a frequency of at least 300 tokens were incorporated in this study. This requirement eventually narrowed the number down to 690 key nouns to be compared (cf. Appendix A). In addition, only those V-N miscollocations found at least three times in the Chinese ESL learner corpus were considered significant enough by the author for further comparison and discussion with the native speaker corpora.
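The noun threshold described above can be illustrated with a minimal sketch. In the study the noun list was generated online by the SKE, so the code below is only an illustration: the function name `select_key_nouns`, the `(lemma, pos)` input format, the `"NOUN"` tag label, and the toy counts are all assumptions made for this example.

```python
from collections import Counter

NOUN_MIN_FREQ = 300  # frequency threshold stated in the text


def select_key_nouns(tagged_tokens, min_freq=NOUN_MIN_FREQ):
    """Return (noun, frequency) pairs with frequency >= min_freq, most frequent first.

    tagged_tokens is an iterable of (lemma, pos) pairs; the "NOUN" label is a
    hypothetical tag name, not the tagset actually used on the SKE.
    """
    counts = Counter(lemma for lemma, pos in tagged_tokens if pos == "NOUN")
    return [(lemma, n) for lemma, n in counts.most_common() if n >= min_freq]


# Toy data with invented counts, for illustration only.
toy_tokens = (
    [("knowledge", "NOUN")] * 320
    + [("stress", "NOUN")] * 300
    + [("wind", "NOUN")] * 12
    + [("enrich", "VERB")] * 400
)

key_nouns = select_key_nouns(toy_tokens)
print(key_nouns)
```

In this toy run only the two nouns that reach the threshold are kept; the frequent verb is ignored because the filter applies to nouns only.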

In a semi-automated manner, the most frequently used nouns in the Chinese ESL learner corpus and their respective common verb collocates in the ESL learner corpus and the BNC were checked one by one with the Sketch-Diff function. As the two colored areas demonstrate (cf. Figure 3.4), certain verbs are used significantly more often by either natives or non-natives. This, ultimately, is the function on the SKE platform that the author applies in this study, i.e., using the Sketch Diff interface to examine common Verb-Noun collocations in native and non-native corpora in a semi-automated manner.

3.4 Data Analysis

The analysis of the results provided by the Sketch Diff described in section 3.3 proceeded in the following steps.

First, the suspicious V-N collocations detected by the Sketch Diff function which native speakers never used (0 tokens found in the BNC) were targeted. Then, only those suspicious V-N collocations found at least three times in the Chinese ESL learner corpus were considered significant enough by the author for further comparison and discussion with the native speaker corpora.

Second, during the examination, for the items in the red area (Figure 3.4), which indicated V-N combinations found at least three times in the Chinese ESL learner corpus but never in the BNC, the author double-checked the suspicious examples in the Corpus of Contemporary American English (COCA), another powerful online corpus, for further confirmation. Since the BNC is basically composed of British English, a parallel check of the suspicious V-N collocations in the COCA, which consists mostly of American English, could avoid possible oversights. Once the suspicious V-N collocations had been double-checked in the COCA and no entry was found, the author regarded them as definite V-N miscollocations.

Third, because the Sketch-Diff function treats Verb-Noun combinations and Prep-Noun combinations as two separate categories on the SKE platform, the author did not additionally extract possible Verb-Prep-Noun collocations from the Prep-Noun category for this study. All of the results displayed in Chapter IV were originally classified in the Verb-Noun category by the SKE. Some Verb-Prep-Noun cases are nevertheless discussed, because the Sketch-Diff function highlighted them as suspicious V-N collocations (not found in the BNC). After the author looked into them, it turned out that the verbs in these examples were acceptable but that the prepositions after the verbs were deviant. The author, therefore, still considered them part of the results, for general consistency and because the SKE online system originally categorized them as Verb-Nouns.

Fourth, in terms of error classification, the possible types would be partially based on Chang and Yang (2009). As reviewed in section 2.1.3, Error Types of ESL Learners' Collocations, there are generally 12 kinds of Verb-Noun miscollocations (cf. Table 3.1).

Table 3.1 Verb-Noun Types of Chang and Yang (2009)

No.  Error Types                                 Examples
1    Erroneous verb choice                       *learn knowledge
2    Misuse of delexical verbs                   *do recommendations
3    Erroneous use of idioms                     *get touch with them
4    Erroneous noun choice                       *tell a speech
5    Erroneous preposition after verb            *reply letters
6    Erroneous preposition after noun            *give sympathy to animals
7    Erroneous use of determiner                 *play piano
8    Erroneous syntactic structure               *rang the phone
9    Erroneous choice for intended meaning       *break my armed self
10   Redundant repetition                        *work one job
11   Erroneous combination of two collocations   *enjoy yourself a good time
12   Miscellaneous miscollocations which cannot be categorized

Fifth, if the author could not be sure to which category a V-N error should belong, a native speaker of English as well as other resources would be consulted, such as Just the Word (http://www.just-the-word.com/) and dictionaries like The BBI Dictionary of English Word Combinations, Oxford Collocations Dictionary, Oxford Advanced Learner's Dictionary, and the Collins COBUILD English Dictionary.

Finally, after a basic error categorization is compiled, the author would look into the possible causes of these V-N miscollocations. Here, the study of Liu (1999) would be the basis of discussion (cf. Table 3.2).

Table 3.2 Possible Causes of Verb-Noun Miscollocations by Liu (1999)

No.  Possible Causes               Examples
1    Overgeneralization            *I am worry about you
2    False analogy                 *ask you a favor
3    Erroneous assumption          *do plans
4    Erroneous use of synonyms     *broaden your eyesight
5    Negative L1 transfer          *eat medicine
6    Erroneous coinage             *see sun-up
7    Approximation                 *release my pressure

Table 3.3 summarizes the general data analysis procedures for this study.

Table 3.3 Data Analysis Procedures for this Study

Types of collocates provided by the Sketch Diff

Discard those collocation types found one or more times in the BNC, the native corpus

Inspect the concordances of suspicious V-N miscollocations

Discard those found fewer than 3 times in the Chinese ESL learner corpus
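The filtering criteria described in section 3.4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the study applied these criteria through the Sketch Diff interface and manual checks, not through code. The `Candidate` class, the function names, the placeholder verb and noun labels, and all frequencies below are hypothetical.

```python
from dataclasses import dataclass

MIN_LEARNER_FREQ = 3  # at least three occurrences in the learner corpus
MAX_NATIVE_FREQ = 0   # no occurrences allowed in the BNC or the COCA


@dataclass(frozen=True)
class Candidate:
    """A Verb-Noun pair with its frequency in each corpus (hypothetical structure)."""
    verb: str
    noun: str
    learner_freq: int
    bnc_freq: int
    coca_freq: int


def is_suspicious(c: Candidate) -> bool:
    """Step 1: never used in the BNC, used at least three times by learners."""
    return c.bnc_freq <= MAX_NATIVE_FREQ and c.learner_freq >= MIN_LEARNER_FREQ


def is_confirmed(c: Candidate) -> bool:
    """Step 2: a suspicious pair that is also absent from the COCA."""
    return is_suspicious(c) and c.coca_freq <= MAX_NATIVE_FREQ


# Placeholder pairs with invented frequencies, for illustration only.
candidates = [
    Candidate("verb_a", "noun_x", learner_freq=12, bnc_freq=0, coca_freq=0),
    Candidate("verb_b", "noun_x", learner_freq=2, bnc_freq=0, coca_freq=0),
    Candidate("verb_c", "noun_y", learner_freq=40, bnc_freq=5, coca_freq=9),
    Candidate("verb_d", "noun_y", learner_freq=7, bnc_freq=0, coca_freq=3),
]

suspicious = [c for c in candidates if is_suspicious(c)]
confirmed = [c for c in candidates if is_confirmed(c)]

print([(c.verb, c.noun) for c in suspicious])
print([(c.verb, c.noun) for c in confirmed])
```

Pairs that pass both steps would still need the concordance inspection and the consultation of dictionaries and a native speaker described above, which this sketch does not model.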

CHAPTER IV RESULTS AND DISCUSSION

This chapter presents the general findings, the types of Chinese ESL learners' Verb-Noun miscollocations, the salient miscollocates among these mistakes, and the possible causes leading to the V-N misuse. An overall discussion is provided at the end of each analysis.

Using the Corpus Creating function of the Sketch Engine website, the author uploaded several Chinese ESL learner corpora (cf. Chapter III) onto the platform and combined them into one large corpus. With a size of 7,376,712 tokens, it is one of the largest Chinese ESL learner corpora for linguistic research so far.

The author then adopted the Sketch-Diff function on the SKE for the target data extraction. Originally, the Sketch-Diff was designed for comparing two synonymous words in the same corpus or between two separate corpora of different genres (e.g. academic and oral). This function is given an alternative use in the author's study. That is, instead of examining only the corpora of native speakers, the Sketch-Diff function was used to compare the uses of the same word in an English native corpus (the BNC) and an ESL learner corpus (the Chinese ESL learner corpus) (cf. Figure 4.1 and Figure 4.2).

Figure 4.1 Interface of the Sketch-Diff function

Figure 4.2 Stress as an example of Sketch-Diff Result between the BNC and Chinese ESL Learners

4.1 Statistical Data and Miscollocation Types

In this study, the number of suspicious types (occurring at least three times in the Chinese ESL learner corpus) is 1284, with 23385 collocations in total. After validation against the COCA (Corpus of Contemporary American English) and many other corpus-based resources, and examination by native speakers, 134 types of Verb-Noun miscollocations were eventually identified, with 2841 tokens overall (cf. Table 4.1).

Table 4.1 Overall Types and Tokens of Verb-Noun Miscollocations

Chinese ESL

Among these V-N miscollocations, following a classification adapted from Chang and Yang (2009) and Nesselhauf (2005), there are basically three aspects of misuse: simple verb usages, prepositional and phrasal verb usages, and noun usages (cf. Table 4.2). Prepositional and phrasal verb misuses were grouped together because of the comparatively small number of phrasal verb errors in the results, which account for only two distinct types. The major criterion for distinguishing verb misuse from noun misuse is that in deviant-verb-based V-N collocations, the nouns were semantically and grammatically correct for the intended meanings, but their verb collocates were not. Deviant-noun-based V-N collocations, on the other hand, were composed of semantically as well as grammatically acceptable verbs, but their accompanying nouns were not acceptable. Both deviant-verb-based and deviant-noun-based V-N collocations were confirmed with the aforementioned corpus-based resources.

Table 4.2 Aspects and Tokens of Verb-Noun Miscollocation Types

Aspects of Misuse

Generally, the largest share of miscollocation tokens fell in the prepositional and phrasal verb category, with 1502 tokens (53%). Deviant simple verb use, on the other hand, showed the greatest variety, with 63 different types (47%) in total. Erroneous noun use was the least common in both variety and quantity, with 28 types of Verb-Noun miscollocations (21%) and 507 tokens (18%) overall.
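The percentages above can be re-derived from the reported counts and the overall totals given in section 4.1 (134 types, 2841 tokens). The short check below is a sketch for the reader's convenience; the variable names are illustrative, and the combined verb figures are obtained here by subtraction rather than taken from a table.

```python
TOTAL_TYPES = 134    # overall miscollocation types (section 4.1)
TOTAL_TOKENS = 2841  # overall miscollocation tokens (section 4.1)


def pct(part, whole):
    """Share of part in whole, rounded to the nearest whole percent."""
    return round(100 * part / whole)


# Counts reported in the paragraph above.
prep_phrasal_tokens = 1502
simple_verb_types = 63
noun_types = 28
noun_tokens = 507

# Derived by subtraction: everything that is not noun misuse is verb misuse.
verb_types = TOTAL_TYPES - noun_types
verb_tokens = TOTAL_TOKENS - noun_tokens

shares = {
    "prep/phrasal verb tokens": pct(prep_phrasal_tokens, TOTAL_TOKENS),
    "simple verb types": pct(simple_verb_types, TOTAL_TYPES),
    "noun types": pct(noun_types, TOTAL_TYPES),
    "noun tokens": pct(noun_tokens, TOTAL_TOKENS),
    "all verb types": pct(verb_types, TOTAL_TYPES),
    "all verb tokens": pct(verb_tokens, TOTAL_TOKENS),
}

for label, share in shares.items():
    print(f"{label}: {share}%")
```

The rounded shares agree with the figures reported in the text, and the subtraction gives 106 verb-based types and 2334 verb-based tokens.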

The incorrect verb usages among the Verb-Noun miscollocations, that is, simple verbs combined with prepositional and phrasal verbs, account for 106 discrete types (79%) and 2334 tokens (82%) of all the results found with the Sketch-Diff interface on the Sketch Engine platform. In the following presentations, the column of "suggested verbs" or "suggested nouns" is based on the frequently adopted verb/noun collocates on the Sketch-Diff interface and the COCA platform, and on the suggestions of a native speaker consultant, which are also part of the results and discussion in this study.

4.2 Misuse of Simple Verbs

Three major types of Simple-V-N miscollocations can be classified from the results. They are synonymous verb pairs (22 types with 346 tokens), verbs in common expressions (10 types with 109 tokens), and other verb pairs (31 types with 377 tokens) (cf. Table 4.3).

Table 4.3 Types of Simple-Verb-Based V-N Miscollocations

Aspects of Misuse Verb-Noun Miscollocation Types

Synonymous verb pairs are cases in which the deviant verb collocates that Chinese ESL learners used are semantically related to their accurate verb counterparts. Verbs in common expressions, on the other hand, are cases in which the correct verbs with their noun collocates are in fact common or almost fixed expressions. The original verbs chosen by ESL learners for their intended collocations, therefore, turned out to be grammatically but not pragmatically acceptable. Other verb pairs refer to those that cannot be directly categorized, with various background factors, and are further examined in the discussion section.

4.2.1 Synonymous Verb Pairs

Twenty-two different kinds of Simple-V-N miscollocations fell into the synonymous verb pair category, with 346 tokens. The erroneous verbs in these ESL-learner-produced collocations might appear semantically similar to their suggested correct verbs at first sight. Yet evidence from the BNC, COCA, and other large-corpus-based resources all showed that they simply do not go with certain nouns (0 tokens found in either the BNC or the COCA), as shown in Table 4.4.

Table 4.4 Simple-Verb-Noun Miscollocation Types: Synonymous Verb Pairs

No.  Incorrect Verb  Suggested Verb(s)                 V-N Miscollocations                     Frequency
1    accept          receive/ enter/ have              accept higher education                 130
2    keep            maintain                          sports are good ways to keep health     60
3    catch           grab/ seize                       catch that chance immediately           25
4    train           enhance/ increase/ develop        train our ability through practice      24
5    take            get/ have/ earn                   take good grades on exams               18
6    enlarge         broaden/ expand/ widen            enlarge our horizons                    11
7    relax           relieve/ reduce                   relax our stress                        10
8    increase        enhance/ cultivate/ foster        increase our friendship                 7
9    look            watch/ see                        looked this TV advertisement            7
10   talk            tell/ crack                       talked many jokes                       7
11   finish          fulfill/ satisfy/ meet/ achieve   finish their wish                       6
12   invent          make/ develop/ work on            invent an invention                     6
13   devote          donate                            devote two million dollars              5
14   forget          ignore                            forget the stress                       5
15   say             speak/ talk in                    say good English                        4
16   appreciate      enjoy                             appreciate the comfortable wind         3
17   content         fulfill/ satisfy                  content my desire                       3
18   gain            earn/ win                         gained five thousand dollars            3
19   promise         grant                             parents promised their wish             3
20   realize         understand/ comprehend            how much they realize the lessons       3
21   realize         understand                        realize the custom of many countries    3
22   talk            tell/ reveal                      talk their secrets                      3
Total                                                                                          346

The suggested verbs listed alongside the incorrect verbs are the frequently adopted verb collocates of the key nouns in the BNC or COCA data. Drawing on the Sketch-Diff function on the SKE, the author first extracted all the verb collocates of a key noun from the BNC and the Chinese ESL learner corpus. Then, examining the verb collocates used by native speakers, the author endeavored to find the appropriate verb collocates that would plausibly reflect the intended meanings of ESL learners' V-N miscollocations. The online platform of the COCA can generate all the frequent verb collocates of a noun as well. The author, therefore, also checked the suggestions on the COCA for reference so that a reasonable set of advised verbs could be arranged for this study.

In Table 4.4, *accept higher education is clearly the most frequently misused synonymous-verb-pair collocation type, with 130 tokens overall. The verb collocates adopted by native speakers for the noun education are instead receive, enter, and have. According to the Oxford Advanced Learner's Dictionary (OALD, 8th Edition), one definition of accept is "to take willingly something that is offered; to say ‘yes’ to an offer, invitation, etc." One possible explanation for this misuse could be that education usually entails compulsory duties or decisions already made. As a result, one either decides to receive education or decides not to. There is no need for one to express his or her will to "accept education," which sounds like an ideological stance on a certain issue, not the real classes to be taken at school.

*Keep health, with 60 entries, is the second most frequent V-N miscollocation in the synonymous-verb-pair category. In OALD, keep basically means "to stay in a particular condition or position; to make somebody/something do this." Though this denotation may seem to ESL learners to fit the word health, the actual patterns given for keep in OALD are "to keep somebody/something + adjective," as in "She kept the children amused for hours" (OALD), or "to keep somebody/something (+ adverb/preposition)," as in "He kept his coat on" (OALD). On the other hand, maintain, which is the native suggestion in the BNC, means "to make something continue at the same level, standard, etc." (OALD), and its example is "She maintained a dignified silence." Maintain, clearly, already indicates the continuation of a certain state, with no additional words needed to complete its meaning. For the word health, maintain is therefore a better choice to signify one's continued effort to "keep his or her health at an ideal level."

The third most misused synonymous-verb-pair collocation is *catch that chance, with grab and seize as the main verbs suggested by native-based corpus data. One definition of grab is "to take advantage of an opportunity to do or have something," and one of seize is "to be quick to make use of a chance, an
