3.2 Instrument

3.2.1 Sketch Engine

In the present study, the commercial online platform Sketch Engine (SKE) was used to extract collocations from the two corpora. The SKE was developed by Kilgarriff and his colleagues to automatically present a one-page summary of a word’s grammatical and collocational features based on corpus data (Kilgarriff et al., 2004).

The advantage of using the SKE to generate lists of common collocations in the two corpora is that it clearly sorts collocations into different lexical/grammatical collocation types. Traditional collocation extraction usually lumps all types of lexical collocation into one list, and researchers often spend considerable time identifying specific type(s) of lexical collocation in such a long list. The SKE, however, can systematically display a set of up to 27 grammatical relations connected to a headword (Kilgarriff et al., 2004), which allowed the researcher to quickly identify V-/A-N collocations in the two 12-million-word corpora. Another benefit of the SKE is its multi-functionality. In addition to concordancing, the SKE also offers corpus creation, word lists, word sketches, thesaurus search, sketch differences, and many other practical functions.

This all-in-one tool allowed the researcher to identify both a list of key nouns and lists of the common collocations in published authors’ and student writers’ writing on the same platform. Because the SKE offers multiple functions for exploring the collocation patterns in the two corpora efficiently, this online platform was chosen as one of the main instruments in the present study.

Among the various functions of the SKE, the Corpus Creating, Word List, and Sketch-Diff functions were utilized in the present study. The Corpus Creating function allowed the researcher to upload the two self-compiled 12-million-word corpora onto the platform for further analyses with the tools available on the SKE website.

The Word List function, in addition to generating a word list for a corpus, can also identify a list of keywords (i.e., words whose frequency of occurrence is salient in one corpus but not in the other) by comparing the word lists of two different corpora (see Figure 3.1, the interface of Word List search).

Figure 3.1. Interface of Word List Search.

The default reference corpus for retrieving keywords in the present study was the British National Corpus (BNC). The BNC was chosen as the reference for comparison because of its large size (100 million words) and its representativeness of general English, which make it a good baseline for identifying core nouns in the two corpora. By comparing the frequency of occurrence of every word in a corpus with its frequency in the BNC, a keyword list could thus be generated.
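The frequency comparison described above can be illustrated with a minimal sketch. It assumes a keyness score defined as a smoothed ratio of per-million frequencies; the formula, the smoothing constant, and the counts are illustrative assumptions and are not taken from the present study or from the SKE output. Only the two corpus sizes come from the text.

```python
def per_million(count, corpus_size):
    """Normalize a raw count to occurrences per million tokens."""
    return count * 1_000_000 / corpus_size


def keyness(focus_count, focus_size, ref_count, ref_size, smoothing=1.0):
    """Smoothed ratio of normalized frequencies (focus corpus vs. reference).

    The smoothing constant is an assumption; it keeps the ratio defined
    when a word is absent from the reference corpus.
    """
    focus_fpm = per_million(focus_count, focus_size)
    ref_fpm = per_million(ref_count, ref_size)
    return (focus_fpm + smoothing) / (ref_fpm + smoothing)


FOCUS_SIZE = 12_000_000    # one self-compiled corpus (size from the text)
REF_SIZE = 100_000_000     # the BNC (size from the text)

# Toy counts, invented for illustration only: (focus count, reference count).
toy_counts = {
    "learner-n": (24_000, 5_000),
    "time-n": (18_000, 150_000),
}

scores = {
    word: keyness(focus, FOCUS_SIZE, ref, REF_SIZE)
    for word, (focus, ref) in toy_counts.items()
}

# Words with the highest scores are the keyword candidates.
ranked = sorted(scores, key=scores.get, reverse=True)
```

A word that is equally frequent per million words in both corpora scores 1.0 under this definition, so only words well above 1.0 would be treated as keyword candidates.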

Keywords identified in a corpus, however, contained not only nouns but also other content words and function words. To solve this problem, the search attribute was set to ‘lempos’. As demonstrated in Figure 3.2 below, the outcome of the lempos search presented the keywords in the form of ‘lemma-pos’ (i.e., a lemma with its part of speech specified), which allowed the researcher to further retrieve key nouns (in the form of ‘lemma-n’, such as ‘proficiency-n’ and ‘learner-n’) from the two corpora. With this function, the researcher could easily identify key nouns in the two self-compiled corpora.
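The retrieval of key nouns from a lempos keyword list can be sketched as a simple filter on the part-of-speech suffix. The function names and the non-noun items in the sample list are hypothetical; only ‘proficiency-n’ and ‘learner-n’ come from the text, and the sketch assumes the list has already been exported as plain strings.

```python
def split_lempos(item):
    """Split a 'lemma-pos' string at its last hyphen.

    Splitting at the last hyphen keeps hyphenated lemmas intact,
    e.g. 'co-occurrence-n' yields ('co-occurrence', 'n').
    """
    lemma, _, pos = item.rpartition("-")
    return lemma, pos


def key_nouns(keywords):
    """Return the lemmas of all keywords tagged as nouns ('-n')."""
    return [lemma for lemma, pos in map(split_lempos, keywords) if pos == "n"]


# Sample keyword list; the last three items are invented for illustration.
keywords = [
    "proficiency-n",
    "learner-n",
    "investigate-v",
    "significant-j",
    "co-occurrence-n",
]

nouns = key_nouns(keywords)
```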

Figure 3.2. Search Results of Word List in the Output Type of Keyword.

The Sketch-Diff function was originally developed to display the collocational differences between two synonyms (i.e., which collocates tend to co-occur with one synonym but not the other). In the present study, however, this function was adopted to retrieve the verb/noun collocates of the identified key nouns from RAC and MTC.

Figure 3.3 below presents the interface of Sketch-Diff by subcorpus.

Figure 3.3. Interface of Sketch-Diff by Subcorpus.

A summary chart of the collocates of ‘difference’, grouped by grammatical relation, was displayed in separate blocks, as shown in Figure 3.4 below.

Figure 3.4. The Search Outcome of Sketch-Diff of Difference between Two Corpora.

In each block, collocates were highlighted in three distinct colors, namely, green, white, and red. The green color highlighted collocates that tended to co-occur with the headword in RAC; the red color, in contrast, marked collocates that were more likely to co-occur with it in MTC. Items in the white area were those that appeared about equally often in the two corpora.

For instance, in the block object_of, it can be seen that the published authors often collocated the verb ‘observe’ (in the green area) with ‘difference’ to form the V-N collocation ‘observe a difference’, whereas the Taiwanese EFL learners never constructed this V-N collocation in their writing. In contrast, the verb ‘investigate’ (in the red area) co-occurred with ‘difference’ 90 times in the Taiwanese EFL learners’ writing, while this combination did not occur in the published authors’ writing. In other words, items in the green area were potentially underused collocates, while those in the red area were more likely to be overused. It should be noted that words could sometimes be mis-tagged, which might distort the comparison results through miscounted frequencies. Further manual examination of the concordance lines of each potential collocate was thus required after the semi-automated extraction of collocations from the two corpora. Since the present study targeted V-/A-N collocations, only the blocks titled object_of (i.e., analysis as objects) and modifier (i.e., collocates as the modifiers of analysis) were examined.
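The green/white/red reading of a Sketch-Diff block can be approximated with a minimal sketch that labels each collocate by the corpus in which it is clearly more frequent. The ratio threshold and the function name are assumptions made for illustration; the SKE's own coloring rests on its internal scores, which this sketch does not reproduce. Raw counts are compared directly because the two corpora are of equal size (12 million words each). Apart from the 0 and 90 occurrences of ‘investigate’ reported above, the counts are invented.

```python
def lean(rac_count, mtc_count, ratio=2.0):
    """Label a collocate by the corpus in which it is clearly more frequent.

    'green' leans towards RAC, 'red' leans towards MTC, and 'white' means
    roughly equal use. The ratio threshold is an illustrative assumption.
    """
    if rac_count > mtc_count and rac_count >= ratio * mtc_count:
        return "green"
    if mtc_count > rac_count and mtc_count >= ratio * rac_count:
        return "red"
    return "white"


# Collocates of 'difference' in the object_of block: (RAC count, MTC count).
# Only the 'investigate' counts come from the text; the rest are invented.
object_of = {
    "observe": (35, 0),
    "investigate": (0, 90),
    "find": (40, 45),
}

labels = {verb: lean(rac, mtc) for verb, (rac, mtc) in object_of.items()}
```

Under this reading, ‘green’ items are candidates for underuse by the student writers and ‘red’ items are candidates for overuse, subject to the manual check of concordance lines described above.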