Evaluation - Information Extraction: Capabilities and Challenges

Evaluating NER requires first of all a clear statement of the task to be performed. This may seem obvious, but the difficulty of preparing clear guidelines is often overlooked. Even with three categories (people, organizations, locations) difficult cases arise. Should characters in books and movies be labeled as people? Should planets be labeled as locations? Is “New York Criminal Court” one name or two?

One general problem is that of systematic name polysemy. Names of organizations also represent the locations they occupy (“I’ll meet you in front of MacDonald’s”); names of locations also represent their governments (“Germany signed a treaty with the Soviet Union.”) and sometimes the population or industry (“Japan eats more tuna than the U.S.”).

A basic decision must be made whether to tag a name based on its primary meaning or its meaning in context. The ACE evaluations, for example, tagged based on the primary meaning of a name. In recognition of the particular problem with locations, ACE introduced the category GPE, Geo-Political Entity, for locations with governments. “Japan” would be tagged as a GPE whether used in a context representing the land, the government, or the population. The category location was reserved for natural landmarks (“the Atlantic Ocean”, “Mt. Everest”).

In preparing an evaluation, it is also important to characterize the test corpus. As we have already noted, the performance of an NER system can be very sensitive to the characteristics of the corpus.

In particular, a supervised model may do very well if the test corpus is similar to the training corpus – possibly because it has simply memorized lots of names from the training corpus – and may rapidly degrade for more varied corpora. We should be aware of the source of the texts, the epoch of the texts, and whether they were limited to particular topics.

In general, if annotators are familiar with the topics discussed in a text, the quality of name annotation (for a small set of name types) can be high. However, if annotators are not familiar with the topic, they may have difficulty assigning name types correctly and, in languages where names are not distinguished by case, distinguishing names from non-names.

Comparing the name tags produced by an NER against an annotated reference corpus, we mark tags as

• correct: the tags agree in extent and type

• spurious: a tag in the system output with no matching tag in the reference corpus

• missing: a tag in the reference corpus with no matching tag in the system output

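From these we compute

precision = correct / (correct + spurious)

recall = correct / (correct + missing)

F = 2 × precision × recall / (precision + recall)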

The MUC evaluation gave part credit for a system name tag which matched the reference in extent but did not match in type. More recent papers generally follow the CoNLL scoring, which gives no credit for such partial matches (so that an error in type is considered a spurious and a missing tag). As a result, the early MUC scores cannot be directly compared with more recent results.
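The exact-match scoring described above can be illustrated with a minimal sketch. It assumes the usual definitions (precision = correct / (correct + spurious), recall = correct / (correct + missing), F as their harmonic mean) and represents each tag as a token span plus a type. The `Tag` class, the `score` function, and the example spans are hypothetical names chosen for illustration, not part of any official scorer.

```python
from __future__ import annotations
from typing import NamedTuple


class Tag(NamedTuple):
    start: int   # index of the first token of the name
    end: int     # index one past the last token
    label: str   # name type, e.g. "PER", "ORG", "LOC"


def score(system: set[Tag], reference: set[Tag]) -> dict[str, float]:
    """Exact-match scoring: a tag is correct only if extent AND type agree."""
    correct = len(system & reference)
    spurious = len(system - reference)   # in system output, not in reference
    missing = len(reference - system)    # in reference, not in system output
    precision = correct / (correct + spurious) if system else 0.0
    recall = correct / (correct + missing) if reference else 0.0
    denom = precision + recall
    f = 2 * precision * recall / denom if denom else 0.0
    return {"correct": correct, "spurious": spurious, "missing": missing,
            "precision": precision, "recall": recall, "f": f}


# Illustrative data: the system gets the extent of the second name right
# but assigns the wrong type, so it counts as one spurious and one missing tag.
reference = {Tag(0, 2, "PER"), Tag(5, 6, "ORG"), Tag(9, 10, "LOC")}
system = {Tag(0, 2, "PER"), Tag(5, 6, "LOC"), Tag(9, 10, "LOC")}
result = score(system, reference)
print(result)
```

Under these assumptions the type error costs the system twice (once in precision, once in recall), which is the behavior that makes such scores stricter than the MUC scheme with partial credit.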

For the CoNLL multi-site evaluations in 2002 and 2003 [TKS03, TKSDM03], the best named entity F scores were:

English    88
Spanish    81
Dutch      77
German     72

These tests were run with uniform training and test corpora (in the case of English, the Reuters newswire). Performance degrades sharply if there are differences between training and test, even if both are news corpora. Ciaramita and Altun reported that a system trained on the CoNLL 2003 Reuters dataset achieved an F-measure of 91 when it was tested on a similar Reuters corpus but only 64 on a Wall Street Journal dataset [CA05]. They were able to mitigate the domain-specificity to some extent (to F=69), but more work in this area is required to better adapt name taggers to new domains.

Chapter 4

Names, Mentions, and Entities

Information extraction gathers information about discrete entities, such as people, organizations, vehicles, books, cats, etc. The texts contain mentions of these entities. These mentions may take the form of

• names (“Sarkozy”)

• noun phrases headed by common nouns (“the president”)

• pronouns (“he”)

In the previous chapter we considered methods for identifying names. Noun phrases and pronouns can be identified through syntactic analysis – either chunking or full parsing.

Information extraction ultimately seeks to build data bases of relations. Filling these data base entries with nouns or pronouns is not very helpful. We can’t do much with an entry like

murders
agent    victim    date
he       her       last week

At the very least, we want to fill in the names of the entities. But even names may be ambiguous; there are lots of people named “Michael Collins” or “Robert Miller”. So we may want to create a data base of entities with unique IDs, and express relations and events in terms of these IDs.

The first step in either case is in-document coreference – linking all mentions in a document which refer to the same entity. If one of these mentions is a name, this allows us to use the name in the extracted relations. Coreference has been extensively studied independently of IE, typically by constructing statistical models of the likelihood that a pair of mentions are coreferential. We will not review these models here.

The performance of these models has been gradually improving over the last decade; accuracies are quite good for pronouns (70% to 80%) but not so good for common nouns.

Because a large fraction of extracted relations involve nouns and pronouns, which can be mapped to names only if coreference is correctly analyzed, coreference continues to be a major limiting factor in relation and event extraction.
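To make the dependence on coreference concrete, here is a minimal sketch of how in-document coreference output might be used to replace pronoun arguments with names. It assumes coreference chains are already available (the statistical models that produce them are not shown). The `Mention` class, the helper functions, and the two person names are invented for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    pos: int    # position in the document, so identical strings stay distinct
    text: str
    kind: str   # "name", "nominal", or "pronoun"


def representative_name(chain):
    """Return the first name mention in a coreference chain, or None."""
    for mention in chain:
        if mention.kind == "name":
            return mention.text
    return None


def fill_record(record, chain_of):
    """Replace each Mention argument by a name from its chain when one exists."""
    filled = {}
    for role, value in record.items():
        if isinstance(value, Mention):
            name = representative_name(chain_of.get(value, [value]))
            filled[role] = name if name is not None else value.text
        else:
            filled[role] = value   # non-mention slots (e.g. dates) pass through
    return filled


# Hypothetical document: two invented people, each later referred to by a pronoun.
bob, he = Mention(0, "Bob Smith", "name"), Mention(7, "he", "pronoun")
alice, her = Mention(3, "Alice Jones", "name"), Mention(9, "her", "pronoun")
chain_of = {he: [bob, he], her: [alice, her]}

record = {"agent": he, "victim": her, "date": "last week"}
filled = fill_record(record, chain_of)
print(filled)
# {'agent': 'Bob Smith', 'victim': 'Alice Jones', 'date': 'last week'}
```

If a chain is wrong or contains no name, the slot keeps the uninformative pronoun, which is why coreference errors propagate directly into the extracted relations.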

If we are aggregating extracted information across multiple documents, we will want to perform cross-document coreference to link together the entities mentioned by individual documents. This is generally limited to entities which have been mentioned by name in each document. This process will produce a data base of entities with IDs. As additional documents are processed, the entities mentioned will either be linked to existing entries in the data base or lead to the creation of new entries.

The study of cross-document coreference is relatively recent, and has been conducted mostly as part of IE evaluations: ACE 2008, WePS¹, and KBP. It involves modeling

• possible spelling and name variation, due for example to variant transliterations (Osama bin Laden / Usama bin Laden) or to nicknames (Bill Clinton vs. William Jefferson Clinton)

• likelihood of coreference, based on shared or conflicting attributes (extracted by IE from the individual documents) or on co-occurring terms in the documents
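The "link to an existing entry or create a new one" step can be sketched as follows. This is a deliberately simplified illustration: it handles name variation through a hand-supplied alias table and ignores the attribute-based and context-based likelihood modeling mentioned above, so it would wrongly merge two different people who share a name. The `EntityDB` class and its methods are hypothetical.

```python
import itertools


class EntityDB:
    """Toy cross-document entity data base keyed on normalized names."""

    def __init__(self, aliases=None):
        # variant name -> canonical name; supplied by hand in this sketch
        self._aliases = aliases or {}
        self._ids = {}                       # canonical name -> entity ID
        self._counter = itertools.count(1)

    def _key(self, name):
        normalized = " ".join(name.lower().split())
        return self._aliases.get(normalized, normalized)

    def link(self, name):
        """Return the ID of a matching entry, creating a new entry if needed."""
        key = self._key(name)
        if key not in self._ids:
            self._ids[key] = f"E{next(self._counter)}"
        return self._ids[key]


aliases = {
    "usama bin laden": "osama bin laden",          # variant transliteration
    "bill clinton": "william jefferson clinton",   # nickname
}
db = EntityDB(aliases)
print(db.link("Osama bin Laden"))            # E1 (new entry)
print(db.link("Usama bin Laden"))            # E1 (linked via alias)
print(db.link("Bill Clinton"))               # E2 (new entry)
print(db.link("William Jefferson Clinton"))  # E2 (same canonical name)
```

A more realistic linker would replace the exact-key lookup with a scored comparison that also weighs shared or conflicting attributes before deciding between linking and creating a new entry.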

¹ Web People Search, http://nlp.uned.es/weps/

Chapter 5

Extracting Relations

5.1 Introduction

A relation is a predication about a pair of entities:

• Rodrigo works for UNED.

• Alfonso lives in Tarragona.

• Otto’s father is Ferdinand.

Typically relations represent information which is permanent or of extended duration. However, the crucial characteristic is that they involve a pair of entities; this makes training somewhat simpler than for more general, n-ary predications such as the events discussed in the next chapter.

Relation detection and classification was introduced as a separate IE task in MUC-7 (1998). The MUC-7 relations task covered three relations involving organizations: location of, employee of, and product of. Relations were extensively studied as part of the ACE evaluations, starting in 2002. The current KBP evaluations incorporate a form of relation extraction.

The ACE relation task (“Relation Detection and Characterization”) was introduced in 2002 and repeatedly revised in order to create a set of relations which can be more consistently annotated. Most research has been done using the task definitions from 2003, 2004, and to a lesser extent, 2005. Each task defined a set of relation types and subtypes: for 2003: 5 types, 24 subtypes; for 2004: 7 types, 23 subtypes; and for 2005: 7 types, 19 subtypes. For example, the 2004 task had the following types and subtypes:

relation type                       subtypes
physical                            located, near, part-whole
personal-social                     business, family, other
employment/membership/subsidiary    employ-executive, employ-staff, employ-undetermined, member-of-group, partner, subsidiary, other
agent-artifact                      user-or-owner, inventor-or-manufacturer, other
person-org affiliation              ethnic, ideology, other
GPE affiliation                     citizen-or-resident, based-in, other
discourse                           -

One crucial property of the ACE relation tasks was that mentions of both arguments had to appear explicitly in the sentence which expressed the relation. Thus a system would not be expected to recognize the father-son or mother-son relation in

Ferdinand and Isabella were married in 1469. A son was born the next year.

or an employ-executive relation in

There was a complete turnover in the Citibank executive suite. Fred Smith was named president.

As we shall see, most of the procedures for recognizing relations rely on this constraint.
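One way to see how this constraint shapes a relation extractor is to look at candidate generation: if both arguments must occur in the same sentence, only pairs of entity mentions within a sentence need to be considered. The sketch below illustrates that idea. The `EntityMention` type, the `candidate_pairs` function, and the hand-picked mention list are assumptions made for illustration.

```python
from itertools import combinations
from typing import NamedTuple


class EntityMention(NamedTuple):
    sentence: int   # index of the sentence containing the mention
    text: str


def candidate_pairs(mentions):
    """Return all pairs of mentions that occur in the same sentence."""
    by_sentence = {}
    for mention in mentions:
        by_sentence.setdefault(mention.sentence, []).append(mention)
    pairs = []
    for index in sorted(by_sentence):
        pairs.extend(combinations(by_sentence[index], 2))
    return pairs


# "Ferdinand and Isabella were married in 1469. A son was born the next year."
mentions = [
    EntityMention(0, "Ferdinand"),
    EntityMention(0, "Isabella"),
    EntityMention(1, "A son"),
]
pairs = candidate_pairs(mentions)
print(pairs)
# Only (Ferdinand, Isabella) is generated; the cross-sentence pairs
# (Ferdinand, A son) and (Isabella, A son) are never proposed.
```

Each candidate pair would then be passed to a classifier that decides whether a relation holds and, if so, which type; that step is outside the scope of this sketch.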

In keeping with the distinction between entities and [entity] mentions, ACE makes a distinction between relations and relation mentions. A relation mention is a segment of text expressing a connection R between two entity mentions m₁ and m₂. If m₁ is a mention of entity e₁ and m₂ is a mention of entity e₂, then this is a mention of a relation R between e₁ and e₂. The hard part is finding the relation mentions in the text; given a relation mention and the coreference information (linking entity mentions to entities), it is simple to determine the corresponding relation.

If there are several mentions of an entity in a sentence, there is some ambiguity about specifying the relation mention. Consider

Fred died yesterday. His son, Eric, said the eulogy.

Here there are two entities, Fred (with mentions “Fred” and “his”) and Eric (with mentions “His son” and “Eric”). In ACE, the relation mention will be recorded between the two closest mentions in the same sentence ... in this case, between “his” and “his son”. If we want to report the corresponding relation in a table, we would find the entities, look for mentions of these entities which are names, and report those names (“Fred” and “Eric”).
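The step from a relation mention to a reportable relation can be sketched with a small data structure. The example below encodes the Fred/Eric text by hand, assumes coreference has already grouped the mentions into entities, and uses "family" as the relation label purely for illustration. The `Entity` class and both functions are hypothetical, and keying mentions by their text is a simplification that only works for this tiny example.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    id: str
    mentions: list = field(default_factory=list)   # (text, kind) pairs

    def name(self):
        """Return the first mention that is a name, or None."""
        for text, kind in self.mentions:
            if kind == "name":
                return text
        return None


fred = Entity("E1", [("Fred", "name"), ("His", "pronoun")])
eric = Entity("E2", [("His son", "nominal"), ("Eric", "name")])

# Coreference information: mention -> entity (only the mentions we need).
entity_of = {"His": fred, "His son": eric}

# Relation mention recorded between the two closest mentions in the sentence.
relation_mention = ("family", "His", "His son")


def to_relation(rel_mention, entity_of):
    """Lift a relation between mentions to a relation between entities."""
    rel_type, m1, m2 = rel_mention
    return (rel_type, entity_of[m1], entity_of[m2])


def report(rel_mention, entity_of):
    """Report the relation using name mentions of the two entities."""
    rel_type, e1, e2 = to_relation(rel_mention, entity_of)
    return (rel_type, e1.name(), e2.name())


print(report(relation_mention, entity_of))   # ('family', 'Fred', 'Eric')
```

The lookup itself is trivial once the relation mention and the coreference links are given, which matches the observation that the hard part is finding the relation mentions in the first place.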

In the KBP evaluations, systems have to extract a set of attributes for a named person or organization. Many of these attributes have values which are themselves entities (employee of, member of, country of birth) so the task has strong parallels to earlier relation extraction tasks. Unlike ACE, however, there is no requirement that both entities be explicitly mentioned in the same sentence. Some cross-sentence analysis is required for roughly 15% of the attributes [JG11].
