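Data Analysis Procedures - A Corpus-Based Analysis of Journal Authors' and Taiwanese English Learners' Use of Academic English Verb-Noun and Adjective-Noun Collocations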

Before V-N and A-N collocations could be retrieved from the two investigated corpora, a list of core nouns shared by the two corpora had to be extracted. To generate this core noun list, the researcher identified key nouns in RAC and in MTC separately by means of the Word List function of SKE, which produced one list of key nouns for each corpus. Human vetting was then undertaken to ensure that the two key noun lists contained only frequent, common, and useful key nouns. The criteria for identifying frequent, common, and useful key nouns were as follows.

To retrieve frequent key nouns, the researcher set the frequency cut-off for a key noun at 80 occurrences per million words (PMWs) or more. This cut-off is higher than the commonly adopted 40 PMWs argued for in Biber, Conrad and Cortes (2004); it was chosen because it could not only identify highly frequent nouns in the two corpora but also yield a manageable number of results for further analysis. In addition to frequency, the dispersion of nouns (i.e. the spread of nouns among the texts in a given corpus) was taken into account to retrieve common nouns in the two corpora. In this study, the minimum dispersion rate (DP) of a core noun was 50%, which meant that a core noun had to appear in at least half of the texts in RAC and half of the texts in MTC. Together, the frequency cut-off and the dispersion rate could thus identify frequent and common nouns in the two investigated corpora. Furthermore, to ensure that the nouns included in the core noun list were useful, proper nouns, acronyms, pronouns, and nouns with fewer than four letters were removed from the list. Based on these criteria, the researcher identified 29 core nouns in the two corpora. The 29 items in this core noun list then served as the seeds for retrieving V-/A-N combinations in both RAC and MTC. Table 3.3 presents the frequency data and dispersion rates of the 29 core nouns.
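The frequency and dispersion criteria above can be illustrated with a minimal sketch. It is not the procedure run in SKE: the `Corpus` container, the `qualifies` and `core_nouns` helpers, and the assumption that both thresholds must be met in each corpus separately are hypothetical choices made for illustration. The removal of proper nouns, acronyms, and pronouns was done by human vetting in the study and is not reproduced here; only the letter-count rule is checked.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Corpus:
    """Hypothetical container for the counts needed by the filter."""
    tokens: int                        # corpus size in running words
    texts: int                         # number of texts in the corpus
    nouns: Dict[str, Tuple[int, int]]  # lemma -> (raw frequency, texts containing it)


def qualifies(lemma: str, corpus: Corpus, min_pmw: float = 80.0,
              min_dispersion: float = 0.5, min_letters: int = 4) -> bool:
    """Check one noun against the frequency, dispersion, and length criteria."""
    if lemma not in corpus.nouns:
        return False
    raw_freq, texts_with = corpus.nouns[lemma]
    pmw = raw_freq / corpus.tokens * 1_000_000   # normalized frequency per million words
    dispersion = texts_with / corpus.texts       # share of texts containing the noun
    return (pmw >= min_pmw
            and dispersion >= min_dispersion
            and len(lemma) >= min_letters
            and lemma.isalpha())


def core_nouns(rac: Corpus, mtc: Corpus) -> List[str]:
    """Return nouns meeting the criteria in both corpora (assumed reading), alphabetically."""
    shared = set(rac.nouns) & set(mtc.nouns)
    return sorted(n for n in shared if qualifies(n, rac) and qualifies(n, mtc))
```

Any counts passed to `core_nouns` would come from the word lists exported for each corpus; the function only intersects the two filtered sets and sorts them, mirroring the alphabetical order of Table 3.3.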

Table 3.3. Core Nouns Identified in the Present Study (in Alphabetical Order).

RAC MTC

After identifying the core nouns in the two corpora, the researcher extracted the potential verb/adjective collocates of these core nouns by using the Sketch-Diff function of SKE. The Sketch-Diff search yielded 2,256 items in the object_of block and 1,686 items in the modifier block of these core nouns in the two corpora. However, some of the extracted items could not be considered (good) verb/adjective collocates of the core nouns and had to be eliminated before further analysis. For instance, some of the items in the object_of block could collocate with a wide range of nouns (e.g., ‘regard’, ‘consider’, ‘involve’, etc.) and hence were not good verb collocates of these core nouns. Some of the items in the modifier block were not even adjectives. Human inspection was thus conducted to exclude the following types of unwarranted collocates:

1. Verbs that can widely collocate with many nouns (e.g. ‘consider’ and ‘regard’);

2. Nouns (including common nouns, proper nouns, pronouns, and acronyms);

3. Possessives;

4. Semi-determiners—as listed in Biber et al. (1999), i.e., same, other, former, latter, last, next, certain, such, etc.;

5. Numbers/ordinals;

6. Hyphenated adjectives (e.g. corpus-based approach);

7. Adjectives to signify nationalities (e.g. Chinese, English);

8. Terminology (e.g. Lexical Approach, Universal Grammar).
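In the study these exclusions were made by human inspection. As a hedged illustration only, a few of the categories (semi-determiners, numbers/ordinals, and hyphenated adjectives) are mechanical enough that a rule-based pre-filter could flag candidates for a human to confirm. The function name `flag_collocate`, the small sample of number words, and the category labels below are hypothetical; the remaining categories (widely collocating verbs, nouns, possessives, nationality adjectives, terminology) require tagging or judgment and are left to manual review.

```python
import re
from typing import Optional

# Semi-determiners named in the list above; the source list ends with "etc.",
# so this set is not exhaustive.
SEMI_DETERMINERS = {"same", "other", "former", "latter",
                    "last", "next", "certain", "such"}

# Illustrative sample of spelled-out numbers and ordinals (an assumption).
NUMBER_WORDS = {"one", "two", "three", "first", "second", "third"}

# Digits with an optional ordinal suffix, e.g. "3" or "2nd".
DIGIT_PATTERN = re.compile(r"^\d+(st|nd|rd|th)?$")


def flag_collocate(item: str) -> Optional[str]:
    """Return a tentative exclusion category, or None if no rule fires."""
    word = item.strip().lower()
    if word in SEMI_DETERMINERS:
        return "semi-determiner"
    if DIGIT_PATTERN.match(word) or word in NUMBER_WORDS:
        return "number/ordinal"
    if "-" in word:
        return "hyphenated"
    return None  # not flagged; still subject to manual inspection
```

A flag here is only a suggestion: an item returning `None` may still belong to one of the manually judged categories.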

In addition to the manual inspection of the items in the two blocks, the concordance lines (i.e. hits) of each item were also examined. The purpose was to avoid miscounting the raw frequencies of these collocations because of mis-tagging, which might greatly influence the outcomes of the frequency comparison. After the inspection, the researcher generated different lists of collocations to answer the proposed research questions.

For the first and second research questions, the researcher extracted lists of V-/A-N collocations appearing in the research article corpus and the learner corpus respectively. However, because the first two research questions aimed to identify frequent V-/A-N collocations in the two corpora, combinations occurring fewer than 5 times PMWs and with a DP below 1% in the two corpora were excluded from the frequent collocation lists. The two refined collocation lists were then used to compare the published authors’ and the Taiwanese EFL learners’ usage patterns of V-N and A-N collocations.

To answer the third research question, the general frequency data of the identified V-N and A-N collocations in RAC and in MTC were subjected to the log-likelihood test to determine whether the published authors’ use of V-N and A-N collocations differed from the Taiwanese EFL learners’ in terms of frequency. In addition, type-token ratios (TTRs) of the identified V-N and A-N collocations in RAC and in MTC were calculated to examine how similar or different the published authors’ collocational diversity was from the Taiwanese EFL learners’.
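The two measures named above can be sketched as follows. The text does not state which log-likelihood formulation was used, so this sketch assumes the two-corpus log-likelihood statistic commonly applied in corpus frequency comparison (expected frequencies proportional to corpus size). The function names and the 3.84 threshold (the chi-square critical value for p < 0.05 with one degree of freedom) are assumptions for illustration, not values reported in the study.

```python
import math
from typing import Sequence

# Conventional critical value for p < 0.05 at 1 degree of freedom (an assumption;
# the significance level is not specified in the text).
CRITICAL_VALUE_P05 = 3.84


def log_likelihood(freq_a: int, freq_b: int, size_a: int, size_b: int) -> float:
    """Two-corpus log-likelihood for one item.

    freq_a, freq_b: raw frequencies of the item in corpus A and corpus B.
    size_a, size_b: total sizes (in words) of corpus A and corpus B.
    """
    total = freq_a + freq_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    ll = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero observation contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


def type_token_ratio(collocations: Sequence[str]) -> float:
    """Distinct collocation types divided by total collocation tokens."""
    if not collocations:
        return 0.0
    return len(set(collocations)) / len(collocations)
```

Under these assumptions, a log-likelihood value above `CRITICAL_VALUE_P05` would indicate a frequency difference between the two corpora that is unlikely to be due to chance, and a higher TTR would indicate more varied collocation use.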

Finally, for the fourth research question, a log-likelihood test was run on each frequent V-N and A-N collocation identified in the two corpora to identify all the collocations underused or overused by the Taiwanese EFL learners. These underused/overused collocations were then entered into the collocation calculator to obtain the t-score and MI-value of each collocation.
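For reference, the two association measures can be computed from four counts, as in the sketch below. It uses the standard textbook definitions (expected co-occurrence frequency = node frequency × collocate frequency ÷ corpus size, with no adjustment for window span). The collocation calculator used in the study is not described here, so its exact formulas may differ; the function names and parameters are hypothetical.

```python
import math


def expected_frequency(freq_node: int, freq_collocate: int, corpus_size: int) -> float:
    """Co-occurrence frequency expected if the two words were independent."""
    return freq_node * freq_collocate / corpus_size


def mi_value(freq_pair: int, freq_node: int, freq_collocate: int, corpus_size: int) -> float:
    """Mutual information: log2 of observed over expected co-occurrence."""
    expected = expected_frequency(freq_node, freq_collocate, corpus_size)
    return math.log2(freq_pair / expected)


def t_score(freq_pair: int, freq_node: int, freq_collocate: int, corpus_size: int) -> float:
    """T-score: (observed - expected) divided by the square root of observed."""
    expected = expected_frequency(freq_node, freq_collocate, corpus_size)
    return (freq_pair - expected) / math.sqrt(freq_pair)
```

Both functions assume `freq_pair` is greater than zero, which holds for any collocation that was actually attested in a corpus. MI tends to reward exclusive but possibly rare pairings, whereas the t-score tends to reward frequent ones, which is why the two are usually reported together.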

The overall data analysis procedure of the present study is summarized in Figure 3.7 below.

Figure 3.7. Flow Chart of the Overall Research Procedure.

CHAPTER FOUR

RESULTS AND DISCUSSION

This chapter reports the findings regarding the use of V-N and A-N collocations in published authors’ and Taiwanese EFL learners’ writing in the field of applied linguistics. In the first section, frequency data for the V-N and A-N collocations employed by published authors and Taiwanese EFL learners are presented, followed by comparisons between the two writer groups’ general use of V-N and A-N collocations. The second section further examines the differences between the two writer groups’ V-N and A-N collocation use by identifying combinations overused and/or underused by the learners. An association analysis (i.e. t-scores and MI-values) of all the overused/underused collocations is then conducted to examine the relationship between association measures and the learners’ collocation usage patterns. A more focused t-score and MI-value analysis of some synonymous pairs of overused and underused items follows, together with some recurrent verb/adjective collocates identified in the underused/overused lists. The final section presents a general discussion of the published authors’ and the Taiwanese EFL learners’ usage patterns of academic V-N and A-N collocations.
