Chapter 3. Anaphora Resolution
3.8. Salience Grading
3.8.4. Feature Selection
Figure 3: Flowchart of a general genetic algorithm.
Ng and Cardie [2002a] showed that hand-selected features improve the F-score. In this thesis, feature selection for salience grading is implemented with a genetic algorithm (GA), which searches for the best features by choosing the best parents to produce offspring and escapes local maxima through mutation. Sequential Floating Forward Selection (SFFS) is the best among the sequential search algorithms, but no clear-cut case can be made for whether SFFS or GA is the better of the two [Oh et al., 04].
In the initial state, we chose features to form 10 chromosomes, and chose the crossover features randomly to produce offspring. We calculated mutations for each feature in each chromosome, and found that about two features were mutated in each generation.
[Figure 3 flowchart nodes: Initial population, Sentences, Chromosomes, Selection, Crossover, Mutation, Evaluation, Terminal (Y/N).]
The maximal F-score is used to evaluate each chromosome, and the top 10 chromosomes are chosen for the next generation. The algorithm terminates if two consecutive generations do not increase the F-score. The time complexity is O(MN), where M is the number of candidate antecedents and N is the number of anaphors.
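The loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this thesis: the bit-mask encoding, the one-point crossover, the uniform choice of parents from the surviving population, the default `mutation_rate`, and the names `ga_select` and `toy_fitness` are all hypothetical. In the real system the fitness would be the F-score obtained by running the resolver with the selected features.

```python
import random

def ga_select(num_features, fitness, pop_size=10, mutation_rate=0.25,
              max_generations=100, seed=0):
    """Search for a feature subset with a simple genetic algorithm.

    A chromosome is a tuple of 0/1 flags, one per feature.
    `fitness` maps a chromosome to a score (here standing in for the F-score).
    Returns (best_chromosome, best_score).
    """
    rng = random.Random(seed)
    # Initial population: random feature subsets.
    population = [tuple(rng.randint(0, 1) for _ in range(num_features))
                  for _ in range(pop_size)]
    population.sort(key=fitness, reverse=True)
    best_score = fitness(population[0])
    stalled = 0  # consecutive generations without improvement

    for _ in range(max_generations):
        offspring = []
        for _ in range(pop_size):
            mother, father = rng.sample(population, 2)
            cut = rng.randrange(1, num_features)  # one-point crossover (assumed)
            child = mother[:cut] + father[cut:]
            # Flip each flag independently with probability `mutation_rate`.
            child = tuple(bit ^ 1 if rng.random() < mutation_rate else bit
                          for bit in child)
            offspring.append(child)

        # Keep the top `pop_size` chromosomes for the next generation.
        population = sorted(population + offspring, key=fitness,
                            reverse=True)[:pop_size]
        top_score = fitness(population[0])
        if top_score > best_score:
            best_score, stalled = top_score, 0
        else:
            stalled += 1
            if stalled == 2:  # two consecutive generations without gain
                break
    return population[0], best_score


# Toy fitness (hypothetical): reward agreement with an arbitrary target subset.
TARGET = (0, 0, 0, 0, 1, 1, 1)

def toy_fitness(chromosome):
    return sum(a == b for a, b in zip(chromosome, TARGET)) / len(TARGET)

best, score = ga_select(len(TARGET), toy_fitness)
print(best, score)
```

Because parents survive alongside their offspring, the best score never decreases from one generation to the next, which is what makes the two-generation stopping rule meaningful.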
3.9. Experimental Results and Analysis
The test corpus, Medstract, was adopted from http://www.medstract.org/ and contains 32 MEDLINE abstracts and 83 biomedical anaphora pairs (40 pronominal (14 which) and 43 sortal pairs). To build a corpus containing as many anaphor types as possible, we collected 43-GENIA and 57-Medlines from different sources, and combined them as 100 MEDLINE abstracts (100-Medlines). The 43 abstracts (479 sentences) are from the GENIA corpus and contain pronominal anaphors; the 57 abstracts (656 sentences) are from PubMed query results for the queries “these proteins” and “these receptors”, containing 177 pronominal anaphora and 186 sortal anaphora pairs. Table 12 shows the statistics of pronominal and sortal anaphors for each corpus.
Table 12 gives the distribution of each anaphor in each corpus. In total we found 13 pleonastic-it instances, all of which could be resolved. There are 314 ‘the NP’ sortal anaphor candidates in Medstract, 611 ‘the NP’ instances in 43-GENIA, and 607 ‘the NP’ anaphor candidates in 57-Medlines.
Table 11: Statistics of anaphor and antecedent pairs.
             Abstracts  Sentences  Pronominal instances  Sortal instances  Total
57-Medlines  57         565        69                    118               187
43-GENIA     43         479        98                    63                161
Table 12: Occurrences of each anaphor.
Pronominal  it  its  itself  they  their  them  themselves  Total
Medstract   6   9    0       2     7      2     0           26
Figure 4 and Figure 5 present the distribution of NPs between antecedents and anaphors. From them we can conclude that there are more NPs between anaphors and antecedents in sortal anaphora than in pronominal anaphora. Sortal anaphors carry more information than pronominal anaphors, so they remain more readable than pronominal anaphors at long distances.
Figure 4: NPs between pronominal anaphor and antecedent.
Figure 5: NPs between sortal anaphor and antecedent.
Figure 4 and Figure 5 show the percentage of NPs between anaphor and antecedent for pronominal and sortal anaphors. In Figure 4, the number of pronominal instances tends to fall as the distance increases, while in Figure 5 the highest percentage is not at the nearest NPs.
Figure 6: Sentences between pronominal anaphors and antecedents.
Figure 7: Sentences between sortal anaphors and antecedents.
Figure 6 and Figure 7 show the distance in sentences between anaphor and antecedent. The value 0 denotes intra-sentential anaphora, and the values 1, 2, and 3 indicate inter-sentential anaphora whose antecedent is 1, 2, or 3 sentences before the anaphor.
These results give us confidence in using two sentences as the search space.
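A check of this kind can be sketched in a few lines. The sketch below is illustrative only: the function names, the pair representation (sentence indices of anaphor and antecedent), and the toy pairs are hypothetical and are not taken from the corpora. Whether "two sentences" corresponds to `window=1` (current plus previous sentence) or `window=2` is also an assumption left to the caller.

```python
from collections import Counter

def sentence_distance_histogram(pairs):
    """Count anaphor-antecedent pairs by sentence distance.

    `pairs` holds (anaphor_sentence_index, antecedent_sentence_index) tuples;
    distance 0 means the antecedent is in the same sentence as the anaphor.
    """
    return Counter(anaphor - antecedent for anaphor, antecedent in pairs)

def coverage(histogram, window):
    """Fraction of pairs whose antecedent lies at most `window` sentences back."""
    total = sum(histogram.values())
    covered = sum(n for dist, n in histogram.items() if 0 <= dist <= window)
    return covered / total if total else 0.0

# Made-up pairs, for illustration only.
toy_pairs = [(3, 3), (3, 2), (5, 2), (4, 4)]
hist = sentence_distance_histogram(toy_pairs)
for window in (0, 1, 2, 3):
    print(window, coverage(hist, window))
```

A high coverage at a small window means that restricting the candidate search to nearby sentences loses few true antecedents.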
The experimental results in Table 13 show that the best-fit strategy performed better than the nearest-first strategy. In addition, the features selected by the genetic algorithm indicate that syntactic features affect pronominal anaphora, while semantic features affect both sortal and pronominal anaphora.
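The difference between the two strategies can be illustrated with a small sketch. It assumes that each candidate antecedent already carries a salience score and that candidates are listed from nearest to farthest; the acceptance `threshold` in the nearest-first variant, the function names, and the example scores are hypothetical and are not taken from this thesis.

```python
def best_fit(candidates):
    """Return the candidate with the highest salience score.

    `candidates` is a list of (name, score) ordered nearest-first;
    ties go to the nearer candidate because max() keeps the first maximum.
    """
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[1])[0]

def nearest_first(candidates, threshold):
    """Return the nearest candidate whose score reaches `threshold`."""
    for name, score in candidates:
        if score >= threshold:
            return name
    return None

# Made-up candidates, ordered from nearest to farthest.
toy = [("the receptor", 3.0), ("IL-2", 5.0), ("these cells", 5.0)]
print(best_fit(toy))
print(nearest_first(toy, threshold=2.5))
```

Nearest-first stops at the first acceptable candidate, so a better-scoring antecedent farther away is never considered; best-fit compares all candidates in the search space.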
Table 13: Results with best-first and nearest-first algorithms for Medstract.
                  Best Fit              Nearest Fit             [Castano et al., 2002]
                  Sortal   Pronominal   Sortal   Pronominal     Sortal   Pronominal
Total Features    64.08%   88.46%       50.49%   73.47%
Genetic Features  F5~F7    All-{F5}     F5~F7    All-{F2,F5}    F4~F6    F4, F6, F7
                  78.26%   92.31%       61.18%   79.17%         74.4%    75.23%
F1: Recency, F2: Subject and Object preference, F3: Grammatical role Agreement, F4: Number Agreement, F5: Longest common subsequence, F6: Semantic type Agreement, F7: Biomedical Antecedent
Table 14: F-Score of Medstract and 100-Medlines
                  Medstract             100-Medlines
                  Sortal   Pronominal   Sortal   Pronominal
Total Features    64.08%   88.46%       71.33%   86.65%
Genetic Features  F5~F7    All-{F5}     F5~F7    All-{F5}
                  78.26%   92.31%       80.62%   87.25%
The impact of each feature was also examined and verified on different corpora. The results are shown in Table 15 and Table 16. Syntactic features (F1~F4) play an insignificant role in sortal anaphora resolution, but they are useful for pronominal anaphora resolution. Sortal anaphora resolution is sensitive to semantic features (F5~F7), and semantic type agreement plays an important role in it. In addition to UMLS, headwords and PubMed search results were used to determine semantic type agreement between anaphor and antecedents. Table 16 shows that F3 increases the F-score for pronominal anaphora but lowers it for sortal anaphora.
The Medstract and 100-Medlines results show that semantic type match is important in both sortal and pronominal anaphora. Table 17 shows the F-score when the headword and PubMed query results are removed. Headword features improve the F-score because the semantic types of new words can be determined more precisely. PubMed query results improve the F-score only slightly, possibly because only co-occurrence information was used.
Table 16 shows that the corpus from which the SA/AO patterns are collected affects the F-score on 43-GENIA and 57-Medlines. We collected SA/AO patterns from the GENIA corpus, so semantic types can be identified more correctly there than in 57-Medlines.
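The "All - Fk" rows in the feature-impact tables follow a leave-one-out pattern, which can be sketched as below. This is only an illustration: `ablation` and `toy_evaluate` are hypothetical names, and `toy_evaluate` is a made-up stand-in for running the resolver with a given feature set and measuring its F-score.

```python
FEATURES = ("F1", "F2", "F3", "F4", "F5", "F6", "F7")

def ablation(evaluate, features=FEATURES):
    """Score the full feature set and every leave-one-out subset.

    `evaluate` takes a frozenset of active feature names and returns a score.
    Returns a dict keyed by "All" and "All - Fk".
    """
    full = frozenset(features)
    results = {"All": evaluate(full)}
    for name in features:
        results[f"All - {name}"] = evaluate(full - {name})
    return results

# Hypothetical evaluator: pretends that only F5~F7 contribute to the score.
def toy_evaluate(active):
    return len(active & {"F5", "F6", "F7"}) / 3

for label, score in ablation(toy_evaluate).items():
    print(f"{label:10s} {score:.2f}")
```

A feature whose removal lowers the score is helping; a feature whose removal raises the score is hurting on that corpus and anaphor type.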
Table 15: Impact of each feature in Medstract and 100-Medlines.
      Medstract             100-Medlines
      Sortal   Pronominal   Sortal   Pronominal
All   64.08%   88.46%       71.33%   86.65%
Table 16: Impact of each feature in 43-GENIA and 57-Medlines.
           43-GENIA              57-Medlines
           Sortal   Pronominal   Sortal   Pronominal
All        67.69%   93.58%       73.28%   76.81%
All - F1   60.14%   83.87%       75.44%   75.36%
Table 17: Impact of headword and PubMed in Medstract.
                 With Headword              Without Headword
                 Medstract   100-Medlines   Medstract   100-Medlines
With PubMed      78%         80.62%         59%         72.16%
Without PubMed   76%         80.13%         58%         71.33%
The success rate is calculated with the following equation:
[Equation not recovered; surviving term: anaphors]
The success rate shows the accuracy of identifying an anaphor and its antecedent. In Table 18, the success rates of sortal anaphora are higher than their F-scores, while the success rates of pronominal anaphora are lower than their F-scores. The results show that in 100-Medlines, sortal anaphora have more plural anaphora errors and pronominal anaphora have more singular anaphora errors.
Table 18: Success rates of the 100-Medlines.
                                           100-Medlines
                                           Sortal   Pronominal
All                                        77.30%   82.64%
All - Recency (F1)                         77.30%   77.78%
All - Subject or Object preference (F2)    80.85%   78.47%
All - Grammatical Role Match (F3)          76.60%   75.00%
All - Number Agreement (F4)                75.89%   79.86%
All - LCS (F5)                             59.57%   82.64%
All - Semantic Type Match (F6)             60.74%   79.17%
All - Biomedical Antecedent (F7)           61.70%   58.33%
Table 19 shows the features used in Medstract. According to the table, pronominal anaphora use the syntactic and semantic features except F5. For sortal anaphora, the syntactic features are used in selecting antecedents, but Table 15 and Table 16 show that using these features lowers the F-score in antecedent selection.
Table 19: Features used in Medstract.
     Sortal   Pronominal
F1   30       25
F2   22       15
F3   3        11
F4   21       25
F5   17       0
F6   41       5
F7   37       22