The Present Study - LITERATURE REVIEW - A Corpus-Based Analysis of Taiwanese EFL Learners' Erroneous Academic Verb-Noun Collocations

CHAPTER II LITERATURE REVIEW

2.4. The Present Study

The studies reviewed in 2.2 and 2.3 all point out that using collocations appropriately poses great difficulty for advanced ESL/EFL learners. Because of their limited knowledge of both lexical usage and collocational restrictions, learners generally produce fewer and less varied collocations than native speakers do. In academic writing, learners tend to underuse native-like collocations or overuse less restricted collocations, which indicates that collocational problems are far more complex than having a small vocabulary or failing to notice collocational restrictions.

In terms of miscollocation, learners’ greatest difficulty lies in choosing the correct verbal component of a collocation. Other problems, such as L1 interference, also hinder learners from producing proper collocations.

Although the studies above provide abundant insights into advanced ESL/EFL learners’ collocational deficiency, there is still room for improvement. First, to the best of the researcher’s knowledge, no study dedicated to EFL learners’ miscollocations specifically in academic English has yet been conducted. Since precision is key to successful academic writing, it would be beneficial for both learners and teachers to have a list of the collocations that are difficult for learners, along with the causes behind them.

Second, except for Yang (2015), most of the collocational studies reviewed above extracted collocations manually. However, manual extraction is feasible only when the corpora are small; with larger corpora, it may not be the ideal method for extracting miscollocations. To set the present study apart from previous collocational research, an online query system, Sketch Engine, was used to extract miscollocations automatically. As claimed by Liu (2014), extracting miscollocations via Sketch Engine minimized the laborious work of filtering out targeted collocations, and the system also “displayed a precise comparison between the language use of native speakers and that of EFL learners in an efficient manner” (Liu, 2014, p. 89).

To better understand learners’ problems with V-N collocations in academic English, as well as the causes behind them, the present study investigates V-N miscollocations in Taiwanese master’s theses in Applied Linguistics. A list of collocation errors is also provided in the hope of helping instructors locate the problems that most learners encounter in academic writing. The following research questions were addressed:

1. What types of miscollocation are found in Taiwanese master’s theses in Applied Linguistics?

2. Among the types of miscollocation found in Taiwanese master’s theses in Applied Linguistics, which type do learners produce most often?

3. To what causes might these miscollocations be attributed?

Chapter Three: Methodology

This chapter contains three sections. The first section explains the composition of the two corpora used in this study, MTC and COCABNCacademic. The second section introduces the tool used to extract V-N collocations, Sketch Engine. The final section describes the procedure for extracting core nouns and identifying verb-noun miscollocations.

3.1 Corpora used in this study

3.1.1 Master Theses corpus

In order to investigate the miscollocations produced by Taiwanese graduate students, a large learner corpus compiled by Yang (2015) was used in this present study.

Yang (2015) compiled a learner corpus of about 11.7 million words by selecting a total of 494 master’s theses from 10 Applied Linguistics/English Teaching programs in Taiwan, published between 2003 and 2012. Since advanced academic writing courses were required for students in the Applied English and English Teaching programs, the quality of the writing should be high enough for investigation. Given the large corpus size and the fairly homogeneous quality of the theses, the collocational errors found in the results can reasonably be regarded as common errors that most Taiwanese postgraduate students in AL/English Teaching programs might produce. To control for other factors that might affect the results, such as the writers’ first language and the format of the theses, only theses written by native speakers of Chinese were selected. Details of the Master Thesis corpus are listed in Table 3.1 (adapted from Yang, 2015, p. 26).

Table 3.1. Details of Master Thesis Corpus (Yang, 2015)

Universities                                          No. of Texts  No. of Running Words
National Changhua University of Education                       84             2,077,235
National Chiao Tung University                                  28               671,611
National Chiayi University                                      12               290,496
National Chung Cheng University                                 68             1,709,833
National Chung Chi University                                   14               359,937
National Kaohsiung Normal University                           128             2,911,090
National Pingtung University of Education                       28               631,985
National Taiwan Normal University                               60             1,474,036
National Taiwan University of Science and Technology            26               577,760
National Tsing Hua University                                   46             1,019,019
Total                                                          494            11,723,002

3.1.2 Reference Corpus: COCABNCacademic

To extract possible miscollocations, two large reference corpora, COCA and BNC, were used in this study. COCA is a large, balanced corpus of American English consisting of 520 million words from five genres (spoken, fiction, magazine, newspaper, and academic journals). BNC, on the other hand, is a 100-million-word corpus of British English containing a wide variety of texts, ranging from university essays and fiction to spoken records of business or government meetings. Since the present study focused on collocations in academic texts, the academic subcomponent of COCA and the BNC written subcorpus were extracted and compiled into the COCABNCacademic corpus (about 125 million words; see Table 3.2). The BNC written subcorpus was chosen to balance the amounts of American and British English text. It is also important to point out that COCABNCacademic was roughly 10 times larger than the master theses corpus. Using large and representative corpora such as BNC and COCA not only saved the manual labor of removing mis-tagged items but also ensured the credibility of the collocation search results.

Table 3.2. Comparison of Master Thesis corpus and COCABNCacademic corpus

Corpus             No. of Running Words
MTC                          11,723,002
COCABNCacademic             125,430,426
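The corpus sizes reported in Tables 3.1 and 3.2 can be cross-checked with a few lines of arithmetic. The following is only a minimal sketch: the figures are copied from the two tables, and the variable names are illustrative rather than part of the study.

```python
# (number of texts, number of running words) per university, copied from Table 3.1.
mtc_rows = [
    (84, 2_077_235),
    (28, 671_611),
    (12, 290_496),
    (68, 1_709_833),
    (14, 359_937),
    (128, 2_911_090),
    (28, 631_985),
    (60, 1_474_036),
    (26, 577_760),
    (46, 1_019_019),
]

total_texts = sum(texts for texts, _ in mtc_rows)
total_words = sum(words for _, words in mtc_rows)

# Size of the reference corpus, copied from Table 3.2.
reference_words = 125_430_426

# How many times larger the reference corpus is than the learner corpus.
size_ratio = reference_words / total_words

print(total_texts, total_words, round(size_ratio, 1))
```

The sums reproduce the totals given in Table 3.1, and the ratio is consistent with the statement that the reference corpus is roughly 10 times the size of the learner corpus.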

3.2 Instrument

Sketch Engine, developed by Kilgarriff and his colleagues (2004), is a commercial corpus query system featuring multiple ready-to-use corpora and functions for analyzing the lexical and grammatical behavior of words. The system is known for its selection of up to 300 preloaded, tagged corpora in different languages, which support multiple types of linguistic research (e.g., language teaching and diachronic analysis). Sketch Engine also includes a thesaurus search, corpus similarity comparison, and a terminology extraction function.

Compared with other corpus query software, Sketch Engine is better suited to this study because it allows part-of-speech (POS) selection for a word. Users can set the POS of a searched lemma, and the system then generates a one-page summary of its collocates, arranged by frequency and grammatical relation. This function is advantageous for collocation research since it saves the manual work of separating verb-noun (verb-object) collocations from collocations in other grammatical forms. Aside from POS-specific search, the other advantage of this platform is its multi-functionality (Yang, 2015). Users are able to view a word sketch (a word’s collocates in different grammatical relations), create word lists, and build a customized corpus (Kilgarriff, Rychly, Smrz, & Tugwell, 2004). Other functions, such as concordance, are built into the collocation search results, which allows researchers to examine concordance lines directly. Because of these functions, Sketch Engine was chosen as the instrument for the present study.

In this study, two functions, Word list and Sketch Diff, were used to retrieve possible collocation errors produced by Taiwanese EFL learners. With the Word list function, a list of nouns can be created based on frequency, which is vital for selecting nouns for further investigation of collocations. Figure 3.1 below illustrates the interface of the Word list function.

Figure 3.1. Word list function

The Word list function was an efficient tool for extracting target nouns because it automatically generated the part of speech of every word on the frequency list. The traditional way of sorting out nouns was to examine the concordance lines of each word on the list one by one and select target nouns manually. This method was relatively time-consuming and might not be suitable for the present study. Sketch Engine solved this issue by adding a “lempos” search to the platform (see Figure 3.1), in which the POS is tagged at the end of each lemma (in the form lemma-POS, e.g., “test-n” or “read-v”), making noun retrieval an easier task. The MTC word list result is shown in Figure 3.2 below.

Figure 3.2. Word list of Master Thesis Corpus
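The lemma-POS format described above lends itself to simple automatic filtering. The sketch below assumes the word list has been exported as plain strings such as “test-n”; the function names and the toy input are hypothetical and are not part of Sketch Engine itself.

```python
from collections import Counter


def split_lempos(lempos):
    """Split a lemma-POS string such as 'test-n' into ('test', 'n')."""
    # rpartition splits on the LAST hyphen, so hyphenated lemmas stay intact.
    lemma, _, pos = lempos.rpartition("-")
    return lemma, pos


def noun_frequencies(lempos_tokens):
    """Count how often each noun lemma occurs in a sequence of lempos strings."""
    counts = Counter()
    for token in lempos_tokens:
        lemma, pos = split_lempos(token)
        if pos == "n":
            counts[lemma] += 1
    return counts


# Hypothetical toy input; a real list would come from the corpus export.
sample = ["test-n", "read-v", "test-n", "feedback-n", "self-study-n", "make-v"]
nouns = noun_frequencies(sample)
print(nouns.most_common())
```

Splitting on the last hyphen rather than the first is a deliberate choice here, since a lemma may itself contain a hyphen.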

The other function used in this study is Sketch Diff. Sketch Diff was originally designed for investigating how two near-synonyms (e.g., big and large) behave differently in one set of data. In this study, Sketch Diff was used to detect how a word was used differently in two sets of data, namely the MTC and the COCABNCacademic corpus.

In previous studies on collocation errors by L2 learners (Fan, 2009; Laufer & Waldman, 2011; Nesselhauf, 2005), V-N collocations were extracted manually and cross-checked against dictionaries and large English corpora (e.g., BNC, COCA) to detect errors. This method might not be applicable to research using large corpora, since extracting all the verb-noun collocations manually would be labor-intensive and time-consuming. By automatically comparing the collocates of a noun in the MTC and the COCABNCacademic corpus with the Sketch Diff function, possible collocation errors can instead be found within seconds. Figure 3.3 illustrates the use of the Sketch Diff function to search for collocates of the lemma “feedback” in the COCABNCacademic corpus and the MTC.

Figure 3.3. Interface of Sketch Diff function

The search result showed a summary of the collocates of the searched lemma, categorized by grammatical relation. The summary was composed of 47 blocks, each representing a specific grammatical relation between the lemma and its collocates. Since the present study investigated verb-noun collocations, the “object_of” block is used as an example below. Each block was highlighted in five colors: green, bright green, white, bright red, and red. Collocates highlighted in green were found only in COCABNCacademic, which indicates that such word combinations were used only by native speakers (e.g., solicit feedback). A bright green highlight indicated that the collocates were found in both corpora but were used more often in the COCABNCacademic corpus. White highlighting meant that the collocates were found in both the COCABNCacademic corpus and the MTC with fairly equal frequency (e.g., deliver feedback). Collocates highlighted in bright red were more frequent in the MTC than in the COCABNCacademic corpus. Collocates highlighted in red were found only in the MTC (e.g., make feedback) and were the focus of the present study. The hypothesis was that collocations found only in Taiwanese master’s theses tended to be misused and thus required further validation of their accuracy.

Figure 3.4. Sketch Diff result of “feedback” in two corpora
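The five-way color scheme can be approximated in code to make its logic explicit. This is only an illustrative sketch and not Sketch Engine's actual algorithm: the per-million normalization, the `ratio_threshold` parameter, and all hit counts in the usage lines are assumptions introduced here.

```python
REF_SIZE = 125_430_426  # running words in COCABNCacademic (Table 3.2)
MTC_SIZE = 11_723_002   # running words in the MTC (Table 3.2)


def highlight_category(ref_hits, mtc_hits, ratio_threshold=2.0):
    """Map a collocate's hits in the two corpora to one of the five colors.

    ratio_threshold is a hypothetical cut-off for "clearly more frequent".
    """
    if ref_hits == 0 and mtc_hits == 0:
        raise ValueError("collocate is attested in neither corpus")
    if mtc_hits == 0:
        return "green"        # only in the reference corpus
    if ref_hits == 0:
        return "red"          # only in the learner corpus: a candidate error
    # Normalize to hits per million words so the corpus sizes are comparable.
    ref_pm = ref_hits / REF_SIZE * 1_000_000
    mtc_pm = mtc_hits / MTC_SIZE * 1_000_000
    if ref_pm >= ratio_threshold * mtc_pm:
        return "bright green"  # in both, but favored by the reference corpus
    if mtc_pm >= ratio_threshold * ref_pm:
        return "bright red"    # in both, but favored by the learner corpus
    return "white"             # roughly equal relative frequency


# Hypothetical hit counts, for illustration only.
print(highlight_category(ref_hits=30, mtc_hits=0))
print(highlight_category(ref_hits=0, mtc_hits=12))
print(highlight_category(ref_hits=107, mtc_hits=10))
```

Only the "red" category feeds into the extraction procedure of this study; the other four are included to show how the comparison partitions the collocates.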

3.3 Data Extraction and Analysis

To examine verb-noun miscollocations by Taiwanese learners, a list of nouns in the MTC was first extracted as the basis for searching for miscollocations. Second, each target noun was searched with the Sketch Diff function to extract any possible miscollocations. Lastly, dictionaries and the judgements of one native speaker were used to verify each possible miscollocation. The following sections describe the procedures for target noun selection, miscollocation extraction, and miscollocation analysis in detail.

3.3.1 Noun Selection

Since this study focused on learners’ use of collocations, a list of nouns was generated from the Master Thesis Corpus with the Word list function. Lempos search was selected to accelerate the process of selecting nouns. Proper nouns and acronyms were excluded from the noun list manually. The target noun selection in this study differed from that of Yang (2015) and Laufer and Waldman (2011), who selected nouns appearing in both the learner and the reference corpus. There were two main reasons for selecting nouns only from the MTC. First, although nouns found in both the learner and the reference corpus are guaranteed to be key and highly frequent in English as a whole, this does not necessarily mean that they are prone to being used incorrectly by L2 learners. Using such a noun list to locate collocation errors in a learner corpus might yield fewer results. Second, if nouns were selected from both the MTC and the COCABNCacademic corpus, nouns that were less frequent in the reference corpus would not appear on the noun list, yet these less frequent nouns might be the very ones that are problematic for learners. To extract V-N collocation errors thoroughly, all nouns should be examined, whether or not they are highly frequent in the reference corpus. For these reasons, the list of nouns was retrieved solely from the MTC.

As for the cut-off point, nouns that appeared more than 500 times were retrieved for further analysis. Whereas other studies suggested a higher frequency cut-off in normalized data (Biber, Conrad, & Cortes, 2004; Yang, 2015), a lower cut-off ensured that most of the problematic verb-noun collocations were captured. The other reason for adopting 500 as the cut-off was that nouns appearing fewer than 500 times did not yield enough erroneous verbal collocates for analysis. The noun list served as the baseline for retrieving verbal collocates.
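The noun selection step can be summarized as a frequency filter combined with a manual exclusion list. The following is a minimal sketch under stated assumptions: the function name, the exclusion set, and every frequency in the example are hypothetical, and only the 500-occurrence cut-off comes from the text.

```python
MIN_FREQUENCY = 500  # cut-off from the text: nouns appearing over 500 times


def select_target_nouns(noun_counts, excluded=()):
    """Return nouns above the cut-off, most frequent first.

    `excluded` stands in for the proper nouns and acronyms removed by hand.
    """
    excluded = set(excluded)
    kept = [
        noun
        for noun, freq in noun_counts.items()
        if freq > MIN_FREQUENCY and noun not in excluded
    ]
    # Sort by descending frequency, then alphabetically for stable output.
    return sorted(kept, key=lambda noun: (-noun_counts[noun], noun))


# Hypothetical frequencies, for illustration only.
counts = {"progress": 812, "feedback": 2450, "word": 9731, "TOEFL": 640, "glimpse": 37}
targets = select_target_nouns(counts, excluded={"TOEFL"})
print(targets)
```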

3.3.2 Miscollocation Extraction

After the target nouns were selected, each noun was searched with the Sketch Diff function in Sketch Engine (SKE) to draw out its verbal collocates in the MTC. A collocation was treated as potentially unacceptable if it could be found only in the MTC. Therefore, verbal collocates marked in red in the “object_of” block were extracted. To illustrate the extraction procedure with a concrete example, the target noun “progress” is used below.

After the target noun “progress” was selected, it was entered into the Sketch Diff search box to extract its verbal collocates. Verbal collocates in the red section of the “object_of” block were then examined to collect the erroneous verb-noun combinations. As shown in Figure 3.5, the two columns of numbers to the right of the verbs represent the number of hits found in the COCABNCacademic corpus and the MTC. The numbers in the right-hand column were the hits found in the MTC and were therefore inspected. Next, all concordance lines of the verbal collocates appearing in the red section of the “object_of” block were examined to ensure that mis-tagged items were not included in the results. Clicking on a number opened the concordance lines of the target noun and the selected verbal collocate (see Figure 3.6).

Figure 3.5. Sketch Diff result of target noun “progress” in two corpora

Figure 3.6. Concordance lines of miscollocation “reach progress”

Aside from the miscollocation “reach progress”, the other verbal collocates in the red block were also examined in their concordance lines one by one. Other collocations judged as wrong by the native speaker were improve progress (3 hits), obtain progress (5 hits), and gain progress (20 hits). Another noun, “word”, also posed difficulty for learners (see Figure 3.7). Using the same method, several possible miscollocations were found: “consult word”, “meet word”, “reinforce word”, and “absorb word”. These examples show that even advanced English learners had difficulty producing acceptable collocations in academic writing.

Figure 3.7. Suspicious verbal collocates of the targeted noun “word” in MTC corpus

Although Sketch Engine extracted verb-noun collocations automatically, some mis-tagging inevitably occurred, for instance in “listening comprehension” (see Figure 3.8). Here the -ing form of “listen” did not function as a verb; hence, such combinations were judged as mis-tagged and removed from the results. In addition, miscollocations with fewer than three occurrences in the MTC were excluded, since they might be writing idiosyncrasies and could not be regarded as common mistakes.

Figure 3.8. Concordance lines and frequency counts of “listening comprehension”

In sum, the following guidelines were adopted to exclude unqualified miscollocations:

1. Mis-tagging: verbs functioning as adjectives that modify nouns.

2. Miscollocations appearing fewer than three times.

3. Compound nouns as the object of a verb (e.g., achieve “reading comprehension”).

4. Terminology (e.g., input hypothesis).
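The four exclusion guidelines above can be expressed as a single filter over candidate miscollocations. This is a sketch only: the `Candidate` class and its flag names are hypothetical, and the three boolean flags stand in for judgements that were made manually from concordance lines. The hit counts for the three “progress” collocations come from the text; the remaining counts are invented for illustration.

```python
from dataclasses import dataclass

MIN_HITS = 3  # guideline 2: drop candidates appearing fewer than three times


@dataclass
class Candidate:
    verb: str
    noun: str
    mtc_hits: int
    verb_is_modifier: bool = False  # guideline 1: mis-tagged, verb acts as adjective
    noun_is_compound: bool = False  # guideline 3: object is a compound noun
    is_terminology: bool = False    # guideline 4: fixed technical term


def is_qualified(candidate):
    """Return True if none of the four exclusion guidelines applies."""
    return not (
        candidate.verb_is_modifier
        or candidate.mtc_hits < MIN_HITS
        or candidate.noun_is_compound
        or candidate.is_terminology
    )


candidates = [
    Candidate("improve", "progress", 3),
    Candidate("obtain", "progress", 5),
    Candidate("gain", "progress", 20),
    Candidate("listen", "comprehension", 40, verb_is_modifier=True),   # hypothetical count
    Candidate("achieve", "comprehension", 6, noun_is_compound=True),   # hypothetical count
    Candidate("input", "hypothesis", 15, is_terminology=True),         # hypothetical count
    Candidate("touch", "progress", 2),                                 # hypothetical example
]

qualified = [c for c in candidates if is_qualified(c)]
print([(c.verb, c.noun) for c in qualified])
```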

All possible verb-noun miscollocations extracted from Sketch Engine were listed with one example sentence each and were further searched in a larger reference corpus, BNC. If a possible miscollocation was not found in BNC, it was marked and checked by one native speaker of English. If the native speaker judged the collocation to be erroneous, it was regarded as a miscollocation and analyzed further. A correction for each miscollocation was provided with the help of the native English speaker. Two dictionaries, the Oxford Collocations Dictionary and The BBI Combinatory Dictionary of English, were also used to help correct miscollocations without altering the learners’ intended meaning.
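The two-stage verification described above amounts to a short decision procedure. The sketch below is illustrative only: the two callables stand in for the BNC lookup and the native speaker's judgement, the sample sets are hypothetical stand-ins, and the treatment of candidates that are attested in BNC (set aside without further review) is an assumption, since the text describes only the case where a candidate is not found.

```python
def verify_candidate(candidate, found_in_bnc, judged_erroneous):
    """Classify a (verb, noun) candidate using the two verification stages.

    found_in_bnc and judged_erroneous are callables supplied by the caller;
    they are placeholders for the corpus search and the human judgement.
    """
    if found_in_bnc(candidate):
        return "attested in BNC"  # assumed: not passed on for further review
    if judged_erroneous(candidate):
        return "miscollocation"   # analyzed further for error type and cause
    return "accepted by native speaker"


# Hypothetical stand-ins for the corpus lookup and the native speaker.
bnc_attested = {("make", "progress")}
flagged_by_speaker = {("gain", "progress"), ("reach", "progress")}

for pair in [("make", "progress"), ("gain", "progress"), ("chart", "progress")]:
    status = verify_candidate(
        pair,
        found_in_bnc=lambda c: c in bnc_attested,
        judged_erroneous=lambda c: c in flagged_by_speaker,
    )
    print(pair, status)
```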

3.3.3 Error types and causes of Miscollocations

To answer research question one, a list of error types (cf. Table 3.3) was generated based on the lists by Nesselhauf (2003) and Wang and Shaw (2008) reviewed in section 2.3. It is important to point out that the present study used a large amount of academic writing, whereas the data used by Nesselhauf (2003) and Wang and Shaw (2008) did not belong to a specific genre. Hence, the error types found in the present study might differ slightly from theirs. The following list was used mainly as a reference for classifying the miscollocations found in the MTC.

Table 3.3. Types of Error (Nesselhauf, 2003; Wang & Shaw, 2008)

Type of Error
Wrong verb choice
Wrong noun choice
Wrong syntactic structure
Wrong noun plurality
Preposition of a prepositional verb missing, present though unacceptable
Preposition of a noun missing, present though unacceptable
Combination does not exist
Combination exists but is used incorrectly

As for the causes of errors, an adapted version of Liu’s (1999) list of causes of errors was used to facilitate the analysis of possible causes. Unlike Liu’s research, the present study investigated L1 influence in more detail by subcategorizing “negative transfer” in Liu’s list into “split category” and “direct transfer”, as shown in Table 3.4 below.

Table 3.4. An adapted version of Liu’s (1999) list of miscollocation causes

Cause

Figure 3.9. Flow chart of data extraction

[Flow chart: Master Thesis Corpus -> Noun Selection -> Target Nouns -> Sketch Diff (Master Thesis Corpus vs. COCABNCacademic Corpus) -> Possible Miscollocation -> Dictionaries, Native Speaker -> Miscollocation -> Analysis of Error Types and Causes of Miscollocation]

Chapter Four: Result and Discussion

This chapter presents and discusses the results of this study. The first section presents the overall statistical data, followed by a discussion of the error types of miscollocation found in the research. The second section analyzes the verbs most frequently misused by Taiwanese learners. The third section analyzes several possible causes of learner miscollocations. Finally, the chapter ends with some insights from this study.

4.1 Overall Result and Error Types

Using Sketch Engine, 897 nouns with a frequency above 500 were extracted. Each noun was entered into the Sketch Diff function to check whether any erroneous verbal collocates existed in the MTC. Altogether, 127 target nouns were found to have at least one erroneous verbal collocate. Each suspicious miscollocation was then checked against BNC and by one native English-speaking consultant. In total, 171 patterns of verb-noun miscollocation were found, with a total frequency of 1,171 instances.

Table 4.1. Overall statistical data of miscollocation

No. of target nouns   No. of verb-noun miscollocation patterns   No. of verb-noun miscollocation instances
127                   171                                        1,171
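A few derived ratios help put the counts in Table 4.1 in proportion. The snippet below is a small sketch that uses only figures reported in this section; the variable names are illustrative.

```python
nouns_examined = 897       # nouns with a frequency above 500
nouns_with_errors = 127    # target nouns with at least one erroneous verbal collocate
patterns = 171             # distinct verb-noun miscollocation patterns
instances = 1171           # total occurrences of those patterns

# Share of examined nouns that attracted at least one miscollocation.
share_of_nouns = nouns_with_errors / nouns_examined

# Average number of miscollocation patterns per affected noun.
patterns_per_noun = patterns / nouns_with_errors

# Average number of occurrences per miscollocation pattern.
instances_per_pattern = instances / patterns

print(f"{share_of_nouns:.1%} of nouns, "
      f"{patterns_per_noun:.2f} patterns per noun, "
      f"{instances_per_pattern:.2f} instances per pattern")
```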

To answer research question one, three main types of error were observed in the results: misused verb, misused noun, and combination non-existent. The miscollocation patterns above were determined based on the collocations listed in the Oxford Collocations Dictionary and the Cambridge Advanced Learner's Dictionary (4th Edition), and on consultation with one native English speaker. Since the meaning of a collocation is highly context-dependent (i.e., the meaning changes across contexts), the context in which a collocation appeared was the main criterion for identifying miscollocations. After the suggested corrections for each miscollocation were listed, the collocational errors and their corrections were compared to determine the error type to which each miscollocation belonged.

The error type misused verb consisted of verb-noun miscollocations containing a semantically or grammatically unacceptable verb, judged on the basis of the writers’ intended meaning. The misused noun error type, on the other hand, was composed of miscollocations containing a semantically unacceptable noun. The last error type, combination non-existent, contained miscollocations that could not be categorized under either of the error types above. One recognizable feature of the miscollocations in this category was that they remained inappropriate even if the verb or noun was replaced. Such miscollocations signaled that there might be more complex problems than a misused verb or noun. To preserve the intended meaning, the correction for a miscollocation of this type might simply be a completely different collocation denoting the same meaning.

Table 4.2 shows the patterns and instances of miscollocations of each error type.
