
Chapter 2 Selection of Answer Candidates in Question Answering Using Information Fusion

4. Automatic Answer Candidate Selection

4.1. What-Question Type

Among 5W1H questions, the targets of who-, where-, and when-questions are clearer than those of the other three. The answer to a how-question is not an entity, so how-questions are not a major focus of this paper. The following considers only what-questions and which-questions.

There are four cases of what-questions:

1. “What X VP?” or “N V what X?”

E.g. Q427: “What culture developed the idea of potlatch?”

E.g. Q934: “Material called linen is made from what plant?”

Answer candidates are those which are X’s, such as cultures or plants in these examples.

2. “What be the X-NP?”

E.g. Q586: “What is the chemical symbol for nitrogen?”

Answer candidates are those which are X’s, such as chemical symbols in this example.

3. “What” alone as a subject or an object, where the main verb is not a be-verb

E.g. Q552: “What caused the Lynmouth floods?”

The answer type does not appear directly in the question.

4. DEFINITION questions

E.g. Q600: “What is typhoid fever?”

Answers to such questions are definitions or descriptions.

In this paper, we experimented on only the first and second cases. For the fourth case, i.e., DEFINITION questions, no answer candidates are needed to answer a question; instead, gloss information or definition patterns are more helpful. For the third case, one possible way to find answer candidates is to gather all the terms that occur as subjects (or objects) of the main verb. This remains future work and is not discussed in this paper.

For the first and second cases, answer candidates are those which can be X’s. If Y is the answer to a question Q of the form “What X does something?”, the facts “Y is an X” and “Y does something” may not appear in the same passage, or even in the same document. Information fusion is needed to bring these pieces of information together in order to answer such questions.

Our idea of answer candidate selection by information fusion is as follows: find instances of X in a knowledge base; take each instance in turn as Y and check whether “Y does something” holds. If so, this instance is reported as an answer. The instance-finding procedure is described in Section 5.
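The idea can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy knowledge base `INSTANCES`, the toy passages `PASSAGES`, and the function name `select_answers` are hypothetical, and the check for “Y does something” is reduced to simple keyword co-occurrence within one passage.

```python
# Toy stand-ins (assumptions): a knowledge base mapping an entity set X to
# its known instances, and a few passages to check each instance against.
INSTANCES = {
    "plant": ["flax", "cotton", "bamboo"],
}
PASSAGES = [
    "Linen is a textile made from the fibres of flax.",
    "Cotton is spun into yarn to make a soft fabric.",
]


def select_answers(focus, predicate_terms, instances=INSTANCES, passages=PASSAGES):
    """Return instances Y of `focus` for which some passage mentions Y
    together with all predicate terms (a crude test of "Y does something")."""
    answers = []
    for candidate in instances.get(focus, []):
        for passage in passages:
            text = passage.lower()
            mentions_candidate = candidate.lower() in text
            mentions_predicate = all(t.lower() in text for t in predicate_terms)
            if mentions_candidate and mentions_predicate:
                answers.append(candidate)
                break  # one supporting passage is enough
    return answers


# Q934: "Material called linen is made from what plant?"
print(select_answers("plant", ["linen", "made"]))  # -> ['flax']
```

The point of the sketch is the separation of the two facts: “Y is an X” comes from the instance list, while “Y does something” is checked independently against the text.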

4.2. Question Focus

For a which-question, or a what-question of the first or second case, our system first identifies its X part, which is referred to as the “question focus” by Harabagiu et al. (2000). We use this term, but with a slightly different meaning.

After syntactic parsing, if the word “what” or “which” alone forms an NP, the question belongs to the second case and our system extracts the noun phrase after the be-verb as its question focus. If “what” or “which” is in a noun phrase with other words, the question belongs to the first case and our system takes as its question focus the noun phrase containing “what” or “which”, excluding the word “what” or “which” itself.
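The rule can be sketched in Python as follows. The sketch assumes that a parser has already produced a flat list of `(label, text)` chunks; this input format and the names `question_focus`, `BE_VERBS`, and `WH_WORDS` are hypothetical and are not taken from the system described here.

```python
BE_VERBS = {"is", "are", "was", "were", "be"}
WH_WORDS = {"what", "which"}


def question_focus(chunks):
    """Return the question focus, or None if the rule does not apply.

    `chunks` is a list of (label, text) pairs from a shallow parse, which is
    assumed to be given, e.g. [("NP", "What culture"), ("VP", "developed"), ...].
    """
    for i, (label, text) in enumerate(chunks):
        words = text.split()
        if label != "NP" or not words or words[0].lower() not in WH_WORDS:
            continue
        if len(words) > 1:
            # First case: the wh-word shares an NP with other words.
            return " ".join(words[1:])
        # The wh-word alone is an NP: second case only if a be-verb follows.
        following = chunks[i + 1:]
        if following and following[0][0] == "VP" and following[0][1].lower() in BE_VERBS:
            for next_label, next_text in following[1:]:
                if next_label == "NP":
                    return next_text
        return None  # third case: no focus appears in the question
    return None


q427 = [("NP", "What culture"), ("VP", "developed"), ("NP", "the idea of potlatch")]
q586 = [("NP", "What"), ("VP", "is"), ("NP", "the chemical symbol for nitrogen")]
q552 = [("NP", "What"), ("VP", "caused"), ("NP", "the Lynmouth floods")]
print(question_focus(q427))  # -> culture
print(question_focus(q586))  # -> the chemical symbol for nitrogen
print(question_focus(q552))  # -> None
```

This sketch does not separate DEFINITION questions from the second case; a question such as Q600 would need an additional check.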

Because there is no guarantee that at least one instance of this question focus can be found in the knowledge base, we have to relax the focus if necessary. Other possible foci are the head noun phrase of the question focus, and the phrase that remains after removing a leading article, an attached prepositional phrase, or any modifier. If the question focus is of the form “kind of NP”, “type of NP”, “name of NP”, etc., a possible focus is the noun phrase after “of”.

In the following example, a question and its possible foci are demonstrated in sequence:

Q254: What is California's state bird?

Foci: California's state bird

state bird

bird
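The relaxation of the question focus can be approximated with a short Python sketch. It works on whitespace tokens instead of a parse tree, so the word lists `ARTICLES`, `PREPOSITIONS`, and `GENERIC_HEADS` and the function `relax_focus` are simplifying assumptions, not the procedure actually used by the system.

```python
ARTICLES = {"a", "an", "the"}
PREPOSITIONS = {"of", "for", "in", "on", "at", "from", "with", "by"}
GENERIC_HEADS = {"kind", "type", "name"}


def relax_focus(focus):
    """Return candidate foci, ordered from most to least specific."""
    tokens = focus.split()
    foci = [" ".join(tokens)]

    def add(toks):
        phrase = " ".join(toks)
        if phrase and phrase not in foci:
            foci.append(phrase)

    # Strip a leading article, or a "kind of" / "type of" / "name of" prefix.
    changed = True
    while changed:
        changed = False
        if len(tokens) > 1 and tokens[0].lower() in ARTICLES:
            tokens, changed = tokens[1:], True
        elif (len(tokens) > 2 and tokens[0].lower() in GENERIC_HEADS
              and tokens[1].lower() == "of"):
            tokens, changed = tokens[2:], True
        if changed:
            add(tokens)

    # Remove an attached prepositional phrase (cut at the first preposition).
    for i, tok in enumerate(tokens):
        if i > 0 and tok.lower() in PREPOSITIONS:
            tokens = tokens[:i]
            add(tokens)
            break

    # Drop leading modifiers one at a time until only the head word is left.
    while len(tokens) > 1:
        tokens = tokens[1:]
        add(tokens)
    return foci


print(relax_focus("California's state bird"))
# -> ["California's state bird", 'state bird', 'bird']
```

Each focus in the returned list would be tried in order until the knowledge base yields at least one instance.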

4.3. Corpus Candidates

DEFINITION Instances

In order to find instances of an entity set, we adopted DEFINITION patterns from Ravichandran and Hovy (2002), and from Soubbotin (2001). DEFINITION questions are a special group in question answering. Such a question asks for a definition of a term, or a description of a specific person or entity.

Ravichandran and Hovy experimented on six question types, one of which is DEFINITION. They collected pairs of questions and their corresponding answers as examples, and automatically learned their co-occurrence patterns in the knowledge base. Some example DEFINITION patterns are listed below:

<NAME> -LRB- <ANSWER> -RRB-

<NAME> and related <ANSWER>s

in which <NAME> denotes a question term, and <ANSWER> the corresponding answer part.

Soubbotin also used DEFINITION patterns, but constructed them manually. Some examples are:

-COMMA-<NAME> is called <ANSWER>

The reason we use definitions to find instances is that, for the instances of an entity set, the name of the entity set acts like a definition of the instances. Unlike the use of these patterns to find answers to DEFINITION questions, here the <ANSWER> part (the DEFINITION part) of a pattern is known (the entity set), and we would like to extract the <NAME> part as instances.

Syntactic information is integrated into these patterns. Since answers are mostly entities, we forced the extracted <NAME> parts to be noun phrases (NP) or quantitative phrases (QP). We extracted the minimal noun phrase if there was no other text to the left or right of the <NAME> tag.
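Using the patterns in this reversed direction can be sketched in Python with regular expressions. The sketch fixes <ANSWER> to the known entity-set name and captures <NAME>. It is only an approximation: the NP/QP constraint needs a parser, so <NAME> is assumed here to be a run of capitalised words, and the names `TEMPLATES`, `NAME_RE`, `compile_template`, and `extract_instances` are hypothetical.

```python
import re

# Pattern templates written with the tags used in the text. -LRB- and -RRB-
# are the tokens for the left and right round brackets in tokenised text.
TEMPLATES = [
    "<NAME> -LRB- <ANSWER> -RRB-",
    "<NAME> and related <ANSWER>s",
]

# Assumption: <NAME> is approximated as a run of capitalised words, a crude
# stand-in for the NP/QP restriction described in the text.
NAME_RE = r"\b(?P<name>[A-Z][a-z]+(?: [A-Z][a-z]+)*)"


def compile_template(template, entity_set):
    """Build a regex in which <ANSWER> is the known entity-set name and
    <NAME> is the part to be captured."""
    pattern = re.escape(template)
    pattern = pattern.replace(re.escape("<NAME>"), NAME_RE)
    pattern = pattern.replace(re.escape("<ANSWER>"), re.escape(entity_set))
    return re.compile(pattern)


def extract_instances(entity_set, sentences, templates=TEMPLATES):
    """Collect the <NAME> parts of all pattern matches, without duplicates."""
    instances = []
    for template in templates:
        regex = compile_template(template, entity_set)
        for sentence in sentences:
            for match in regex.finditer(sentence):
                name = match.group("name")
                if name not in instances:
                    instances.append(name)
    return instances


sentences = [
    "The tusks of Loxodonta Africana -LRB- African elephants -RRB- are long .",
    "Aspirin and related painkillers are sold widely .",
]
print(extract_instances("African elephants", sentences))  # -> ['Loxodonta Africana']
print(extract_instances("painkiller", sentences))         # -> ['Aspirin']
```

The two sentences are invented test data. Matching a shorter focus such as "elephant" against "African elephants" would need additional head or plural matching that this sketch leaves out.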

Equivalent Instances

In some cases, the name of the entity set is not the best definition of its instance, and it may not even be an appropriate definition. For example, “oak tree” can be an instance of “habitat”, but the definition of “oak tree” is “a deciduous tree that has acorns and lobed leaves”.

To capture such instances, we further extracted equivalent entities in the knowledge base. That is, if any form of “A is B” appeared in the corpus, then we assumed that A could be an instance of B or, vice versa, that B could be an instance of A. Again, during extraction, A and B were restricted to NPs or QPs.
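A minimal Python sketch of this equivalence extraction is given below. It assumes very simple copular sentences and splits them at the first be-verb; the NP/QP restriction is not reproduced, and the names `equivalent_pairs` and `build_instance_index` are hypothetical.

```python
COPULAS = {"is", "are", "was", "were"}
ARTICLES = {"a", "an", "the"}


def strip_article(tokens):
    """Drop one leading article, if present."""
    if tokens and tokens[0].lower() in ARTICLES:
        return tokens[1:]
    return tokens


def equivalent_pairs(sentence):
    """For a sentence of the form "A is B", return (instance, entity_set)
    pairs in both directions; return [] if no such split is found."""
    tokens = sentence.rstrip(" .").split()
    for i, tok in enumerate(tokens):
        if tok.lower() in COPULAS and 0 < i < len(tokens) - 1:
            a = " ".join(strip_article(tokens[:i]))
            b = " ".join(strip_article(tokens[i + 1:]))
            if a and b:
                return [(a, b), (b, a)]  # A as instance of B, and vice versa
            return []
    return []


def build_instance_index(sentences):
    """Map each entity-set name to the instances collected for it."""
    index = {}
    for sentence in sentences:
        for instance, entity_set in equivalent_pairs(sentence):
            index.setdefault(entity_set, []).append(instance)
    return index


index = build_instance_index(["The oak tree is a habitat."])
print(index["habitat"])   # -> ['oak tree']
print(index["oak tree"])  # -> ['habitat']
```

Recording both directions mirrors the assumption stated above, at the cost of admitting many spurious instances that a later check would have to filter out.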

4.4. Answer Candidate Selection Models

We experimented on three models to find answer candidates automatically. They are:

(1) Model A: Extracting Self-Evident NPs

If an NP’s head is the same as the question focus, it is regarded as an answer candidate.

E.g. QFocus: artery

AnsCand: pulmonary artery
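A minimal Python sketch of Model A follows. It assumes that the noun phrases have already been extracted and that the head of a simple English NP is its last word; the names `head_of` and `model_a` are hypothetical.

```python
def head_of(noun_phrase):
    """Approximate the head of a simple English NP by its last word."""
    words = noun_phrase.split()
    return words[-1].lower() if words else ""


def model_a(focus, noun_phrases):
    """Keep the NPs whose head equals the question focus."""
    target = focus.lower()
    return [np for np in noun_phrases if head_of(np) == target]


print(model_a("artery", ["pulmonary artery", "left lung", "coronary artery"]))
# -> ['pulmonary artery', 'coronary artery']
```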

(2) Model B: Looking for WN Descendants

If a term is a descendant of the question focus in WordNet, it is considered as an answer candidate.

E.g. QFocus: color

AnsCand: red

WN: red, redness

=> chromatic color, spectral color…

=> color, colour, …
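The descendant test of Model B can be sketched in Python as a walk up the hypernym links. To keep the example self-contained, a small hand-written map `HYPERNYMS` stands in for WordNet; the map and the names `is_descendant` and `model_b` are assumptions made for illustration.

```python
# Toy hypernym links (assumption): each term maps to its direct parents,
# mirroring the chain in the example: red => chromatic color => color.
HYPERNYMS = {
    "red": ["chromatic color"],
    "crimson": ["red"],
    "chromatic color": ["color"],
    "color": ["visual property"],
    "oak": ["tree"],
}


def is_descendant(term, focus, hypernyms=HYPERNYMS):
    """True if `focus` can be reached from `term` by following hypernym links."""
    seen = set()
    stack = list(hypernyms.get(term, []))
    while stack:
        node = stack.pop()
        if node == focus:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(hypernyms.get(node, []))
    return False


def model_b(focus, terms, hypernyms=HYPERNYMS):
    """Keep the terms that are descendants of the question focus."""
    return [t for t in terms if is_descendant(t, focus, hypernyms)]


print(model_b("color", ["red", "crimson", "oak", "color"]))
# -> ['red', 'crimson']
```

The focus itself is not returned, because a term is not counted as its own descendant in this sketch.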

(3) Model C: Extracting Corpus Candidates

If a term in the corpus matches one of the DEFINITION patterns, or an equivalent relationship (A is B) is found, it is considered as an answer candidate.

E.g. QFocus: elephant

AnsCand: Loxodonta Africana

Pat: Loxodonta Africana (African elephants)

5. Experiments

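Source document: Research and Development of Question Answering System Technology (3/3): A Study of Question Answering Systems for Heterogeneous Information Sources (pp. 25-30)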