
CHAPTER 1 INTRODUCTION

1.4 Organization of the Thesis

This thesis is organized as follows. Chapter 2 reviews lexical relations and determines which aspect of them to focus on. It also introduces lexical patterns and related research, reviews the basic components of a vector space model together with two state-of-the-art applications, and presents the linguistic analytical framework. Chapter 3 introduces the corpus used in this thesis, including the data extraction method and research procedures. Chapter 4 focuses on the analysis and discussion of the results. Chapter 5 offers the conclusion and some suggestions for further study.

Chapter 2

Theoretical Background

One of the central goals of this thesis is to provide a theoretical explanation for the applicability of lexical patterns as representations of word relations, indicating what aspects are conveyed through the automatically generated patterns. Since most research applying lexical patterns requires calculating similarity via computational models (these works are reviewed in Section 2.1.3), the findings of this study are also designed to be implemented in similarity measure algorithms. To begin with, a set of lexical patterns will be generated semi-automatically, and a computational model that reflects the hypothesized aspect will be constructed to calculate similarity between the lexical patterns. To investigate how much the computational model reveals about human understanding of these lexical patterns, the results will be discussed within a linguistic analytical framework.

As is clear from the procedure outlined above, this thesis covers topics in both computer science and semantics. Therefore a brief but comprehensive review of related work in both fields has to be given before the discussion continues. This chapter is divided into four parts. In the first part I will give an introduction to word relations and lexical patterns. To explain how lexical patterns are applied to facilitate computational research on relational similarity measures, the second part will focus on computational models and their applications. In the third part, with the necessary theoretical background in place, a hypothesis about the applicability of lexical patterns will be developed. Lastly, a linguistic analytical framework will be presented for the discussion of the research results in this thesis.

2.1 Word Relations and Lexical Patterns

In light of the progress made in modeling human language ability, increasing attention is being paid to the mental lexicon. Among the interested parties, there is a growing consensus that the paradigmatic semantic relations among words (e.g., antonymy, synonymy, hyponymy and the like) are somehow relevant to the structure of lexical or conceptual information (Murphy 2003). The importance of these relations manifests itself in research across an extensive range of fields, including philosophy, cognitive psychology, linguistics, computer science and cognitive neuroscience; in short, just about any area that is concerned with word meaning or the mind. On the one hand, everyone has basic knowledge about how these relations connect word senses. To test this universal relational competence in speakers of different languages, Raybeck and Herrmann (1990) presented pairs of words with predefined relations to speakers of 7 different languages, who were then asked to classify the pairs according to the similarity among the relations. The subjects consistently sorted similar pairs of words (e.g., male/female and remember/forget) into one group while assigning pairs like car/tire to another, which suggests that knowledge about relations is shared by speakers of different languages.

On the other hand, however, no one seems to be able to pin down what exactly these relations are and where they belong in our mental representation. It is therefore extremely difficult, if not downright impossible, to develop a unifying definition of word relations that fits well within the various theoretical frameworks.

Having said that, since different approaches follow different traditions, it is of great importance to specify the perspective taken with regard to word relations in this thesis before I delve deeper. Therefore I will start this section by making clear what treatment of word relations is assumed. Are these relations among words or among the things they represent? A brief history of research on lexical patterns and their relationship with word relations will then be given.

2.1.1 What Kind of Word Relations?

Most lexical semanticists claim that semantic relations do not connect words per se; rather, it is the senses that words carry that are brought together in the relations. In some of the literature these relations are studied under the name sense relations (Lyons 1977) or meaning relations (Allan 1986) instead of lexical relations. For instance, for the sense contrast high temperature/low temperature, several word pairs can be said to encode this relational information: hot/cold, boiling/freezing, and steamy/lukewarm.

However, this view of relations cannot fully account for the canonicity issue, as illustrated in the work of several researchers (Gross, Fischer & Miller 1989; Charles, Reed & Derryberry 1994). They point out that in antonymic relations, some pairs of words (e.g., good/bad and hot/cold) are considered more canonical than others (e.g., hot/cool and happy/miserable). Additionally, as will become more evident in the following discussion, senses are not the only factor that comes into play when a relation is formed between two words. Pragmatic factors also need to be considered. Accordingly, I prefer the term lexical relations to semantic relations when discussing word relations, since senses are not the only concern in my work.

A caveat is in order, however, since lexical relation is itself an ambiguous name for word relations if no further explanation is given. According to Murphy (2003), there are at least two treatments of lexical relations: the intralexical treatment and the metalexical treatment. In structuralist and generativist theories, a lexicon is a collection of linguistic information that cannot be derived from other information. It then stands to reason that the information contained in a lexicon is arbitrary or idiosyncratic. In Murphy’s definition, therefore, an intralexical treatment of relations is one that asserts that knowledge about word relations is (a) self-contained within the lexicon and (b) specifically linguistic. In other words, this way of treating word relations holds that they cannot be derived, and that non-linguistic knowledge such as social or cultural context is not involved in one’s knowledge about word relations. As will become clear in the later discussion, neither claim can fully account for actual language use in some situations.

According to Murphy (2003), the metalexical treatment contrasts with the intralexical one in that it holds that word relations are not contained in a lexicon and therefore consist of human conceptual knowledge about the words, rather than being part of the words themselves. The conceptual nature of word relations can be understood in three aspects: (a) they are productive, (b) they are context-dependent, and (c) they display the prototypicality effect. If word relations were arbitrary or idiosyncratic, then they could not be accounted for by rules. However, there is in fact an array of productive mechanisms by which new instances of synonyms or antonyms are generated. For example, based on the morphological rule that generates oppositional pairs of words, a new lexical item defriend is now widely used among young people who are actively involved in social networking applications like Facebook and MySpace. The new lexical item stands in contrast to add-someone-to-the-friend-list.

This example shows that word relations are not idiosyncratic linguistic information contained in a lexicon. Another piece of evidence in support of a metalexical approach is the fact that word relations are understood, to a large extent, based on context. For instance, luggage and baggage are synonyms only when they are used to describe containers filled with personal items during trips. An empty piece of luggage would not be treated as synonymous with baggage, as illustrated by the anomalous sentence: *I bought a new set of baggage for my trip. Another example of the dependence of relations on context is the color pair blue/green. Although the pair is definitely not perceived as a pair of opposites when one is describing the physical features of objects, it is antonymous for most Taiwanese, because the two colors represent parties that stand at the two extremes of the political spectrum in Taiwan. They stand in such stark contrast that supporters of the two parties may actually be affected when asked to express a preference for neutral objects colored blue or green. This is because the color pair has been conceptually incorporated into their concepts of politics in this culture.

One may argue that these senses of colors can be represented within the lexicon as different meanings, with the relations derived accordingly. However, as the context can vary according to non-linguistic factors (e.g., social or cultural conventions), the number of senses is potentially limitless, which makes it impossible to represent word relations intralexically. The last piece of evidence to justify the metalexical treatment is the prototypicality effect. People can naturally make judgments about better examples of a certain association, judging some candidate word pairs to be more canonical than others (e.g., big/little over gigantic/tiny). These decisions are based more on the subjects’ memory and experience than on their linguistic ability. As in the case of context-dependency, it is evident that people produce and perceive lexical relations according to their knowledge about the world they live in, the social conventions they follow and the activities they participate in.

To sum up, we understand lexical relations with the aid of our metalinguistic knowledge, and this knowledge differs from our linguistic knowledge in its concept-like nature. It therefore suffices to say that lexical relations can be seen as realizations of our concepts about this world. This metalexical treatment of word relations and their conceptual nature is the stance taken in this thesis, and it parallels Murphy’s claim that “lexical should only indicate any properties ‘involving words’ rather than ‘contained in the mental lexicon’” (Murphy 2003, p.9).

2.1.2 Lexical Patterns

In the last section I made clear that the metalexical treatment is the perspective taken when examining lexical relations in this thesis. This approach takes a conceptual view of how we produce and perceive lexical relations based on our commonly shared experience and social background. This may seem like a trivial statement, but the fact is that few studies in NLP have attempted to state specifically which relations they are looking at before implementing them.

Having pointed out which aspect of lexical relations to look at, I will now turn to their linguistic realizations in the computational models that help accomplish tasks in natural language processing, that is, the lexical patterns. Across the NLP tasks that involve lexical relations, a ubiquitous assumption is that lexical patterns, or lexico-syntactic patterns, represent the lexical relations between word pairs. In most such tasks, lexical patterns are defined as a set of items that co-occur so frequently with a word pair that they may be used to indicate the relation between the pair. The items in the set can be lemmas, punctuation marks, or words with specific part-of-speech tags. For example, the lexical pattern NP0 is the opposite of NP1 may indicate an antonymic relation between the word pair NP0 and NP1.

To the best of my knowledge, Hearst’s (1992) work on hyponyms is the earliest to adopt the pattern-based method. Using an encyclopedic corpus of 8.6 million words, Hearst extracted hyponymous information between named entities. The task was done with the help of a set of 5 pre-defined lexical patterns that are supposed to capture hyponymous relations. One example of the patterns is such X as Y, which extracts facts about Shakespeare from phrases like such authors as Shakespeare. Among the 226 words that constitute the 153 candidate hyponymous pairs Hearst managed to identify, 106 words are included in WordNet, which suggests that Hearst’s approach can be seen as a useful complement to the existing thesaurus.
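As a rough illustration of how such a pre-defined pattern can be applied to text, the following minimal sketch matches the such X as Y pattern with a regular expression. It is not Hearst’s implementation: the function name extract_hyponym_pairs is hypothetical, and X and Y are simplified to single words, whereas the original patterns operate on noun phrases.

```python
import re

# Simplified stand-in for the "such X as Y" pattern. Assumption: X and Y are
# single words here; the original pattern matches full noun phrases.
SUCH_AS = re.compile(r"\bsuch (\w+) as (\w+)")

def extract_hyponym_pairs(text):
    """Return (hyponym, hypernym) candidates matched by the pattern."""
    return [(m.group(2), m.group(1)) for m in SUCH_AS.finditer(text)]

pairs = extract_hyponym_pairs("He admired such authors as Shakespeare.")
print(pairs)
```

A realistic version would run over parsed or chunked text so that multi-word noun phrases and coordinated lists can be captured.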

Following Hearst’s work, Berland and Charniak (1999) took a slightly different approach to the study of meronymous words. Working with natural text in a larger corpus (100 million words), they started by creating seed pairs (i.e., a set of hand-coded meronyms), which were then used to extract substrings from sentences containing these seed pairs. After lexical patterns were manually identified from the collection of substrings, new pairs of candidate meronyms were picked out if they co-occurred with the lexical patterns in sentences. The results were evaluated by the majority vote of 5 human judges. For the top 50 meronyms derived from their algorithm, they reported an accuracy of 55%.
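The seed-pair procedure described above can be sketched in a few lines. This is a toy illustration under simplifying assumptions rather than Berland and Charniak’s algorithm: the sentences and seed pair are invented, tokenization is by whitespace, the function names are hypothetical, and the manual step of selecting patterns is represented simply by passing a chosen pattern to the second function.

```python
from collections import Counter

def connecting_strings(sentences, seed_pairs):
    """Count the word strings that occur between the two words of a seed pair."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().rstrip(".").split()
        for part, whole in seed_pairs:
            if part in tokens and whole in tokens:
                i, j = tokens.index(part), tokens.index(whole)
                if i < j:
                    counts["X " + " ".join(tokens[i + 1:j]) + " Y"] += 1
    return counts

def new_candidates(sentences, pattern):
    """Collect (X, Y) pairs that co-occur with a chosen pattern such as 'X of the Y'."""
    middle = pattern.split()[1:-1]
    n = len(middle)
    found = []
    for sentence in sentences:
        tokens = sentence.lower().rstrip(".").split()
        for k in range(1, len(tokens) - n):
            if tokens[k:k + n] == middle:
                found.append((tokens[k - 1], tokens[k + n]))
    return found

# Invented toy data: one hand-coded meronym seed and two sentences.
sentences = ["The wheel of the car was flat.", "The door of the house was open."]
patterns = connecting_strings(sentences, [("wheel", "car")])
candidates = new_candidates(sentences, "X of the Y")
print(patterns, candidates)
```

In this toy run the seed wheel/car yields the pattern X of the Y, which in turn picks out door/house as a new candidate pair.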

Targeting a range of relations including both meronyms and hyponyms, Pantel and Pennacchiotti (2006) identified a set of generic lexical patterns automatically. Generic patterns are those with broad coverage and low precision. They also began with a set of seed pairs, obtaining generic patterns to increase the recall rate. All generic patterns were then rated according to a reliability measure, based on pointwise mutual information, which estimates how reliable a pattern is. After discarding the less reliable patterns, the algorithm went on to collect new pairs from the Web. Using a corpus of 6 million words, they obtained precision scores between 73% and 85% when a random sample of 50 extracted pairs was assessed by 2 human judges.
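To make the role of pointwise mutual information more concrete, the sketch below scores a pattern by its average PMI with a set of seed pairs. It is a simplified stand-in and not the reliability measure defined by Pantel and Pennacchiotti: the counts are invented, the function names are hypothetical, and treating unseen pair-pattern combinations as contributing zero is an assumption made here for simplicity.

```python
import math

def pmi(joint, pair_total, pattern_total, grand_total):
    """Pointwise mutual information (base 2) of a pair and a pattern from raw counts."""
    p_joint = joint / grand_total
    p_pair = pair_total / grand_total
    p_pattern = pattern_total / grand_total
    return math.log2(p_joint / (p_pair * p_pattern))

def pattern_score(pattern, seed_pairs, counts):
    """Average PMI between one pattern and the seed pairs.

    Assumption: a seed pair never seen with the pattern contributes 0.
    """
    grand_total = sum(counts.values())
    pattern_total = sum(c for (_, p), c in counts.items() if p == pattern)
    scores = []
    for pair in seed_pairs:
        joint = counts.get((pair, pattern), 0)
        pair_total = sum(c for (q, _), c in counts.items() if q == pair)
        if joint == 0:
            scores.append(0.0)
        else:
            scores.append(pmi(joint, pair_total, pattern_total, grand_total))
    return sum(scores) / len(scores)

# Invented co-occurrence counts, keyed by (word pair, pattern).
counts = {
    (("wheel", "car"), "X of the Y"): 4,
    (("wheel", "car"), "X and Y"): 4,
    (("dog", "cat"), "X and Y"): 8,
}
seeds = [("wheel", "car")]
specific = pattern_score("X of the Y", seeds, counts)
generic = pattern_score("X and Y", seeds, counts)
print(specific, generic)
```

With these toy counts the pattern X of the Y, which occurs only with the seed pair, scores higher than X and Y, which also occurs with an unrelated pair.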

So far, I have presented an introduction to lexical patterns and some instances of them. It should now be clear that these patterns are widely applied in studying lexical relations. As I mentioned in Chapter 1, such studies often involve similarity measures of lexical relations. Therefore, in order to offer a more detailed description of what lexical patterns can help to achieve, in the following section I will go through some important tasks that involve measuring relational similarity.

2.1.3 Algorithms Measuring Relational Similarity

For a better understanding of what lexical relations can help achieve in NLP, I will now introduce some major tasks that involve measuring similarity between lexical relations.

2.1.3.1 Recognizing Verbal Analogy

A verbal analogy has the form A:B::C:D, meaning “A is to B as C is to D.” An example given by Turney and Littman (2005) is “mason is to stone as carpenter is to wood.” The task of recognizing verbal analogy is, given a stem word pair and a set of choice word pairs, selecting the choice that is most analogous to the stem.

Turney et al. (2003) attempted this problem by combining different independent modules to answer SAT (Scholastic Aptitude Test) questions. An example of an SAT question is shown in Table 1. As shown in their discussion, of all the modules, the one based on the VSM (Vector Space Model) performed the best, achieving a score of 47% on a set of 374 SAT questions.

Table 1. A sample SAT question

Stem:      mason:stone
Choices:   (a) teacher:chalk
           (b) carpenter:wood
           (c) soldier:gun
           (d) photograph:camera
           (e) book:word
Solution:  (b) carpenter:wood
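The selection step in this task can be sketched with a small vector space example: each word pair is represented by a vector of pattern frequencies, and the choice whose vector has the highest cosine similarity to the stem is selected. The pattern counts below are invented for illustration and the function names are hypothetical; the sketch does not reproduce the module of Turney et al. (2003).

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def best_choice(stem, choices, vectors):
    """Return the choice pair whose pattern vector is most similar to the stem's."""
    return max(choices, key=lambda pair: cosine(vectors[stem], vectors[pair]))

# Invented pattern frequencies for a few of the pairs in Table 1.
vectors = {
    "mason:stone": {"X cuts Y": 5, "X works with Y": 3},
    "teacher:chalk": {"X writes with Y": 4, "X works with Y": 1},
    "carpenter:wood": {"X cuts Y": 4, "X works with Y": 4},
    "soldier:gun": {"X fires Y": 6},
}
choices = ["teacher:chalk", "carpenter:wood", "soldier:gun"]
answer = best_choice("mason:stone", choices, vectors)
print(answer)
```

Because mason:stone and carpenter:wood share most of their patterns in this toy data, the latter is chosen, in line with the solution in Table 1.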

Veale (2004) dealt with the same set of 374 SAT questions using a lexicon-based approach. He applied WordNet in his research, measuring the path from node A to node B for each word pair A:B. Each candidate choice pair was then evaluated by comparing its path with that of the stem pair. Veale’s approach attained a score of 43%.

Unlike Veale’s lexicon-based approach, Turney (2005) applied a corpus-based one to solve the SAT questions, attaining a score of 56%. In this work, Turney introduced an enhanced version of the VSM approach called Latent Relational Analysis (LRA). LRA essentially calculates the similarity between pairs of words according to their corresponding lexical patterns. A more detailed review of LRA will be presented in Section 2.2.3.1.

As argued by Turney (2006), in addition to answering SAT analogy questions, our daily use of metaphorical language can also be expressed as SAT-style verbal analogies. For example, the sentence you need to budget your time can be expressed in the format money:budget::time:schedule. This treatment of metaphor is supported by the claim made by Gentner et al. (2001) that novel metaphors are understood using analogy while conventional ones are simply recalled from memory. In this case, even someone who encounters the example sentence you need to budget your time for the first time will be able to use analogical knowledge to interpret the novel metaphor.

2.1.3.2 Classifying Noun-Modifier Relations

The task of classifying noun-modifier relations is to identify the possible semantic relations between a noun and its modifier. There has been much scholarly attention paid to noun-modifier pairs because of their high frequency in English (Turney 2006).

Lauer (1995) approached the noun-modifier problem with a corpus-based method. He used the British National Corpus (BNC) as his database, interpreting the pairs by inserting prepositions such as of, for, in, at, on, and with between the noun and its modifier. For instance, the pair reptile haven was paraphrased as haven for reptiles.
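The idea of paraphrasing a noun-modifier pair with a preposition can be sketched as choosing the preposition whose paraphrase is most frequent in a corpus. The counts below are invented and the function name is hypothetical; Lauer’s actual model is not reproduced here, and plain frequency comparison is an assumption made for illustration.

```python
# Prepositions listed in the text above.
PREPOSITIONS = ["of", "for", "in", "at", "on", "with"]

def paraphrase(modifier, head, counts):
    """Pick the preposition whose paraphrase 'head PREP modifier' is most frequent."""
    best = max(PREPOSITIONS, key=lambda p: counts.get((head, p, modifier), 0))
    return f"{head} {best} {modifier}"

# Invented corpus counts for (head, preposition, modifier) triples.
counts = {
    ("haven", "for", "reptiles"): 7,
    ("haven", "of", "reptiles"): 2,
}
result = paraphrase("reptiles", "haven", counts)
print(result)
```

The chosen preposition then serves as a label for the relation between the noun and its modifier.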

Specifically in the medical domain, Rosario and Hearst (2001) classified noun-modifier relations using Medical Subject Headings and Unified Medical Language System as their lexical resources. In the final result, they successfully distinguished 12 classes of semantic relations based on a supervised neural network.

2.1.3.3 Information Extraction

In general, Information Extraction (IE) in machine learning refers to automatic methods for creating a structured representation of selected information drawn from documents. A frequently applied method is to set up relations in natural language. More specifically, the task involves looking for pairs of entities that satisfy a given relation in a given document. For example, drug names or adverse interactions between medical treatments can be automatically identified from multiple unstructured medical documents. The relation extraction task was first introduced as part of the Template Element Task in Message Understanding Conferences (MUC) in 1988. Many different approaches have been proposed since then. Zelenko, Aone, and Richardella (2003) introduced a kernel method between two parse trees for extracting relations such as person-affiliation and organization-location. They achieved success on two simple relation extraction tasks.

Recently the Web mining-based approach to IE has gained much attention (Pantel & Pennacchiotti 2006; Banko et al. 2007; Bollegala et al. 2007; Davidov & Rappoport 2008). The basic procedure starts by searching the Web as a corpus to gather co-occurrences of word pairs together with the strings of words between them, and then generates textual patterns according to their frequencies. This is similar to the task of classifying semantic relations, except that the focus here is on the relations between a specific pair of entities.

2.1.3.4 Automatic Thesaurus Generation

Algorithms for automatic thesaurus generation were at first developed with regard to certain specific semantic relations such as meronymy and hyponymy. Hearst (1992) introduced a corpus-based algorithm for extracting hyponyms (type of), and Berland and Charniak (1999) approached meronyms (part of) by using a corpus. With a view to covering a wider variety of relations in a more comprehensive thesaurus, WordNet (Fellbaum 1998) was constructed to include more than a dozen semantic relations between words.

2.2 Vector Space Models

So far we have specified in the previous sections that it is the conceptual nature of
