National Chengchi University, Taiwan

ABSTRACT

This paper investigates twenty-two prepositions in two different lexical bundles – [PREPOSITION the NOUN of] (at the point of, from the perspective of, etc.) and [VERB PREPOSITION the NOUN of] (shouted above the noise of, suffering from the effects of, etc.), the only difference being that the former does not include the head verb that is present in the latter. Strings of constructions were extracted from the British National Corpus and the types of possible verbs, prepositions, and nouns in each possible combination were analyzed. The paper also details an experiment in which the types of nouns under each of the twenty-two prepositions were coded by human subjects in terms of their semantic features. Finally, a computer program was also utilized to calculate the shared meaning of the different VERBs and NOUNs. The results showed that the nouns in [(VERB) PREPOSITION the NOUN of], though they might form clusters of meanings, may not behave in the same way with and without the presence of the verbs.

Keywords: prepositions, lexical bundles, nouns, semantic features, corpus, constructions

I. INTRODUCTION

According to Biber et al. (2004) and Levy (2008), who investigated ‘lexical bundles’ in spoken versus written registers, lexical bundles, or multi-word sequences, are “the most frequent recurring lexical sequences in a register”, including, but not limited to, four-word sequences such as do you want to, take a look at, to come up with, I don’t know what, one of the things, those of you who, and so forth (p. 376). Their instances of bundles may or may not contain a head verb.

Most previous studies on lexical bundles focus on register-specific materials. For instance, Biber (2009) compared the most common multi-word patterns in conversation and academic writing and found that the multi-word patterns occurring in the two registers are different. Patterns in conversation tend to be fixed sequences including both function words and content words; patterns in academic writing, however, tend to be formulaic frames consisting of invariable function words with an intervening variable slot that is filled by content words.

Focusing on academic prose, Biber proposed that there are numerous fillers that may occur in the frame the * of the. It was found that four different prepositions tend to precede the * of to form the four-word lexical bundles: at the * of, on the * of, in the * of, and to the * of, all of which are patterns of interest in the present paper. Among these, the most distinctive frame is at the * of, which co-occurs frequently with the fillers end, time, beginning, level, expense, start, center/centre, top, and base. On the other hand, in the * of takes several high frequency fillers that are distinctively used in this frame, namely case, absence, form, context, course, and process. Using a similar ‘frame’, this paper investigates the distributions of different variables (in capitals) in the pattern [(VERB) PREPOSITION the NOUN of]. The present work focuses not on any specific genre, but on material contained in the British National Corpus (BNC), a general corpus.

We propose that similar clusters of nouns (and verbs) can also be found in a general corpus. Our study further hypothesizes that the VERBs and NOUNs can be measured in terms of their semantic relatedness. To test this hypothesis, two types of methodologies were employed: the first is an experiment-based analysis of semantic features, while the second involves the automatic extraction of semantically related hypernyms. The details of both are given in the sections that follow.

In a different study, also following a genre approach, Luzón Marco (2000) investigated collocational frameworks in medical research papers. The results showed that two of the most common frameworks in the corpus are [the NOUN of] (e.g. the start of) and [a NOUN of] (e.g. a variety of). [The NOUN of] tends to be used to form nominalizations (e.g. the cloning of), whereas [a NOUN of] is frequently used for quantifying and categorizing. Another important finding is that these two frameworks tend to precede or follow collocates belonging to specific semantic classes. For example, the risk of is always preceded by verbs with causative meanings (related/associated with/to the risk of). It was concluded that the selection of specific collocates for these frameworks is conditioned by the linguistic conventions of the genre.

In another study, aiming to improve the understanding of the function of lexical bundles in academic prose, Biber et al. (2004) compared the use of such bundles by published authors in history and biology. The most frequent four-word lexical bundles in these genres were classified according to their structural groups. The findings revealed that lexical bundles in history mainly belong to two structural groups – noun phrases and prepositional phrases – while lexical bundles in biology cover a wider range of structural groups, including noun phrases, prepositional phrases, [it + Vbe + adjective], [Vbe + complement], and [noun phrase + V + complement] clause fragments. In general, in both history and biology, the majority of the bundles could be categorized into the groups containing a noun phrase with an of phrase fragment (e.g. a measure of the, the beginning of the) and prepositional phrases with an embedded of phrase (e.g. as a function of, at the beginning of, at the university of).

These findings show that most studies of lexical bundles have had to deal with noun phrases and prepositional phrases in one way or another. For instance, Biber and Conrad (1999) found that, in academic prose, 60% of the bundles are phrasal, that is, parts of noun phrases or prepositional phrases, such as in the case of, as a result of, on the basis of, and on the other hand. Noun phrase and prepositional phrase fragments were also found to be the most frequent patterns in academic prose (see also Biber et al. 2004 and Hyland 2008a, 2008b). Similarly, scientific discourse is characterized by very frequent occurrences of nouns, long words, prepositions, conjuncts, agentless passives and by-passives, as well as past participial adverbial clauses (Biber 1988).

In a book-length study, Silvestre (2009) investigated the particle meanings of in and on. In his methodology, “multi-word lexicalized expression” was recognized as one of the criteria in extracting verb-particle constructions (VPC). Multi-word expressions were included in his VPC analysis because some uses of in and on, such as in “to decide in favor of sb”, are “motivated by” the noun (favor in this example) “rather than being directly bounded to the verbal element” (p. 159).

Given the above studies, we postulate that it might be useful to investigate lexical bundles by examining the nouns (and the verbs) in a given construction. This paper inspects both the nouns and the verbs in the construction [(VERB) PREPOSITION the NOUN of] across twenty-two different prepositions.¹

Rather than looking at one particular preposition, this paper investigates a group of prepositions in terms of distributional patterns. As Silvestre (2009) discovered, some of the particles were more closely related to the nearby nouns than to the verbs, and this is the kind of phraseological phenomenon we inspect in this study. The foci of this study are: (a) to compare the distributions of NOUNs and VERBs in the construction [(VERB) PREPOSITION the NOUN of] when twenty-two different prepositions are involved; and (b) to display similarities of meanings among NOUNs and VERBs in this construction. The ultimate goal is to propose a systematic way to analyze semantic features of nouns and verbs given a preposition-containing construction. Two types of methodologies were employed, namely experimental analysis of semantic features, and computational calculation of semantic meanings by measuring the common hypernym, if any, found between any two nouns or verbs. The two methodologies complement each other, and their results were cross-referenced.
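The idea of measuring the common hypernym between two nouns can be illustrated with a minimal sketch. It assumes a toy child-to-parent taxonomy invented purely for illustration; the links below are not taken from this paper or from any real lexical database, and the program actually used in the study may work quite differently. The function names are likewise hypothetical.

```python
# Toy hypernym taxonomy (child -> parent). These links are invented for
# illustration only; they are NOT the resource used in the study.
HYPERNYM_OF = {
    "beginning": "time_point",
    "end": "time_point",
    "start": "time_point",
    "time_point": "time",
    "time": "abstraction",
    "top": "location",
    "base": "location",
    "location": "entity",
    "abstraction": "entity",
}


def hypernym_path(word):
    """Return [word, parent, grandparent, ...] up to the root of the toy taxonomy."""
    path = [word]
    while path[-1] in HYPERNYM_OF:
        path.append(HYPERNYM_OF[path[-1]])
    return path


def lowest_common_hypernym(a, b):
    """Return (node, distance) for the nearest node shared by both paths, or None.

    `distance` is the number of links from `a` up to the node plus the number
    of links from `b` up to it. A word counts as being on its own path.
    """
    steps_from_b = {node: i for i, node in enumerate(hypernym_path(b))}
    for steps_from_a, node in enumerate(hypernym_path(a)):
        if node in steps_from_b:
            return node, steps_from_a + steps_from_b[node]
    return None
```

Under these assumptions, two fillers of at the * of such as beginning and end meet at a nearby shared node, while beginning and top only meet at the root, which gives one rough way to say that the first pair is more closely related than the second.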

II. DATA FROM THE CORPUS

All data discussed in this paper were taken from the written portion of the BNC, retrieved through BNCWeb, a platform which allows access to the BNC through a search engine of its own (Hoffmann et al. 2008). Twenty-two prepositions (about, above, across, after, against, among, around, as, at, beside, by, down, for, from, in, into, like, of, off, on, onto, and with) were investigated. It was hypothesized that the groups of words that appear with the same preposition would share some similarities in semantic features. In the following sections, the distributional patterns will first be discussed, followed by a semantic analysis by human subjects. Finally, in section III, a computer program will be discussed.

II.1. Distributional patterns

In the written portion of the BNC, 373,258 instances of [PREPOSITION the NOUN of] and 86,877 instances of [VERB PREPOSITION the NOUN of] were found. These instances were analyzed according to the different types of verbs and nouns used in them.
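As a rough illustration of how instances of [PREPOSITION the NOUN of] might be pulled from raw text, the following sketch uses a plain regular expression over an invented sample sentence. This is an assumption-laden simplification: the study itself queried the BNC through BNCWeb, and without part-of-speech tags the pattern below accepts any word in the NOUN slot and cannot identify the VERB slot at all. The names `FRAME` and `extract_frames` are hypothetical.

```python
import re
from collections import Counter

# The twenty-two prepositions listed in the paper.
PREPOSITIONS = [
    "about", "above", "across", "after", "against", "among", "around", "as",
    "at", "beside", "by", "down", "for", "from", "in", "into", "like", "of",
    "off", "on", "onto", "with",
]

# PREPOSITION + "the" + one word, followed by "of". The trailing "of" sits in
# a lookahead so that it can also start the next frame ("of the X of").
# No POS tagging is done here, so the captured word is only a candidate noun.
FRAME = re.compile(
    r"\b(?P<prep>" + "|".join(sorted(PREPOSITIONS, key=len, reverse=True)) + r")"
    r"\s+the\s+(?P<noun>[a-z]+)(?=\s+of\b)",
    re.IGNORECASE,
)


def extract_frames(text):
    """Return a list of (preposition, noun-slot word) pairs, lowercased."""
    return [(m.group("prep").lower(), m.group("noun").lower())
            for m in FRAME.finditer(text)]


# Invented sample text, not taken from the BNC.
sample = ("At the end of the day, she shouted above the noise of the crowd. "
          "From the perspective of the reader, the case rests on the basis of trust.")
frames = extract_frames(sample)
counts = Counter(frames)
```

On the sample above, `frames` holds the four pairs (at, end), (above, noise), (from, perspective), and (on, basis). A tagged corpus query would be needed to restrict the slot to nouns and to recover the head verb.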

Table 1, below, displays the most frequent patterns for each preposition, along with their frequencies and percentages. For example, about the nature of has a frequency of 225, and nature accounts for 4.5% of the instances of [about the NOUN of]. Where several patterns tied for the highest frequency, all of them were listed (as for among and onto).
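The percentage reported for each pattern is the share of one noun among all instances of that preposition's frame. The sketch below shows the arithmetic, including the listing of ties. Only the figures 225 and 4.5% come from the text; the total of 5,000 is back-calculated from them (225 / 0.045) and may differ from the true total because of rounding, and the other counts are invented placeholders. The function name `top_fillers` is hypothetical.

```python
def top_fillers(noun_counts, total=None):
    """Return (nouns, frequency, percentage) for the most frequent filler(s).

    `noun_counts` maps nouns to their frequency in one frame. `total` is the
    number of all instances of the frame; if omitted, the sum of the given
    counts is used. Nouns tied for the top frequency are all returned.
    """
    if total is None:
        total = sum(noun_counts.values())
    best = max(noun_counts.values())
    tied = sorted(noun for noun, count in noun_counts.items() if count == best)
    return tied, best, round(100 * best / total, 1)


# Only "nature": 225 is reported in the paper; "future" and "role" are
# invented, and ASSUMED_TOTAL is inferred from 225 being 4.5% of the frame.
about_counts = {"nature": 225, "future": 150, "role": 150}
ASSUMED_TOTAL = 5000

about_top = top_fillers(about_counts, total=ASSUMED_TOTAL)

# Invented counts showing how a tie is reported.
tie_top = top_fillers({"ranks": 3, "members": 3, "ruins": 1})
```

With these inputs, `about_top` gives nature with a frequency of 225 and a share of 4.5%, matching the reported example, and `tie_top` lists both tied nouns together.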

Table 1. Frequencies of [(VERB) PREPOSITION the NOUN of] in the BNC.
