Experimental Results and Analysis - Abbreviation Resolution

Chapter 2. Abbreviation Resolution

2.4. Experimental Results and Analysis

We tested our abbreviation recognition system on two corpora: the Medstract Gold Standard Evaluation Corpus (Medstract), which has 133 abstracts containing 168 <abbreviation form, long form> pairs, and 100-Medlines, which contains 162 <abbreviation form, long form> pairs. Information about the two corpora is shown in Table 1. Medstract yields 298 'NP(NP)' and 143 'NP, NP' candidates, and 100-Medlines yields 380 'NP(NP)' and 162 'NP, NP' candidates. Among these candidates, 163 'LF(Ab)', 2 'Ab(LF)', and 3 'NP, NP' pairs in Medstract are true abbreviation and long-form pairs; in 100-Medlines, 162 'LF(Ab)', 0 'Ab(LF)', and 0 'NP, NP' pairs are true pairs.

Table 1: Corpora information for abbreviation resolution.

Corpus            Medstract    100-Medlines
Abstracts         133          100
Abbreviations     168          162
LF(Ab)/NP(NP)     163/298      162/380
Ab(LF)/NP(NP)     2/298        0/380
NP, NP            3/143        0/162
(NP)NP            0/0          0/0

In Table 2, characters other than letters and digits are used to separate words into tokens.

Table 2: Number of tokens in long forms and NP chunks collected from Medstract.

On average, an extended NP is 3.7 tokens longer than the corresponding long form, whereas a window of min(n*2, n+5) tokens is 4.8 tokens longer. Extended NPs are therefore closer to the correct long-form chunk when used as long-form candidates, so fewer words need to be deleted.
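The comparison above reduces to two simple quantities. The following minimal Python sketch computes the min(n*2, n+5) candidate window and the average number of surplus tokens a candidate carries beyond its long form. It assumes that n is the number of characters in the abbreviation (spaces removed), which is the usual reading of this rule (e.g., in Schwartz and Hearst [03]) but is not stated in this section; the function names and the token lengths in the usage lines are illustrative only, not data from Table 2.

```python
from statistics import mean
from typing import Sequence


def window_size(abbreviation: str) -> int:
    """Maximum long-form length in tokens under the min(n*2, n+5) rule.

    Assumption: n is the number of characters in the abbreviation,
    with spaces removed.
    """
    n = len(abbreviation.replace(" ", ""))
    return min(n * 2, n + 5)


def mean_extra_tokens(candidate_lengths: Sequence[int],
                      long_form_lengths: Sequence[int]) -> float:
    """Average number of surplus tokens a candidate has beyond the true long form."""
    return mean(c - l for c, l in zip(candidate_lengths, long_form_lengths))


print(window_size("RNA"))     # 3 characters -> min(6, 8) = 6
print(window_size("Pol II"))  # 5 characters -> min(10, 10) = 10
print(mean_extra_tokens([5, 7, 4], [3, 3, 3]))  # toy lengths: (2 + 4 + 1) / 3
```

A smaller average surplus means fewer words must be trimmed from the candidate, which is the sense in which the text prefers extended NPs.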

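We use precision, recall, and F-score to evaluate our results. They are defined as follows:

$$\text{Precision} = \frac{\text{number of correctly extracted pairs}}{\text{number of extracted pairs}}, \qquad \text{Recall} = \frac{\text{number of correctly extracted pairs}}{\text{number of gold-standard pairs}}, \qquad \text{F-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$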
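The definitions can be checked numerically. Below is a minimal Python sketch (the function names are ours) that computes the three measures from counts; the usage line recomputes one F-score from Table 3 to show that the reported F-scores are the balanced harmonic mean of precision and recall.

```python
from typing import Tuple


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def precision_recall_f(correct: int, extracted: int, gold: int) -> Tuple[float, float, float]:
    """Precision, recall and balanced F-score from raw counts."""
    precision = correct / extracted if extracted else 0.0
    recall = correct / gold if gold else 0.0
    return precision, recall, f_score(precision, recall)


# Check against Table 3 (All / Extended-NP, 100-Medlines): P = 99.29%, R = 85.80%.
print(round(f_score(0.9929, 0.8580), 4))  # 0.9205, matching the reported 92.05%
```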

We use the chunker output as our baseline. The baseline model uses the base-NP definition, under which a pattern such as 'NP1 of NP2' is treated as two NPs, NP1 and NP2. However, Table 3 shows that such short base NPs cannot cover most long forms, so the long … as one NP.

On the Medstract corpus, our method achieves 94% precision at 86% recall. For comparison, the algorithm described in Schwartz and Hearst [03] achieved 96% precision at 82% recall, and that of Pustejovsky et al. [01] achieved 98% precision at 72% recall.

Table 3: Experimental results on 100-Medlines and Medstract with respect to rules.

                                 100-Medlines                    Medstract
Rule set          NP type        Recall   Precision  F-Score     Recall   Precision  F-Score
Chunker result    Base-NP        66.05%   72.30%     69.03%      53.57%   63.38%     58.06%
All               Extended-NP    85.80%   99.29%     92.05%      85.12%   94.08%     89.38%
All-R5            Extended-NP    88.89%   97.96%     93.20%      86.90%   95.42%     90.97%
All-R5-R1-R3      Extended-NP     7.41%   48.00%     12.83%       4.17%   77.78%      7.91%
All-R5-R2         Extended-NP    88.89%   97.96%     93.20%      86.90%   95.42%     90.97%
All-R5-R3         Extended-NP    56.79%   94.85%     71.04%      48.21%   90.00%     62.79%
All-R5-R4         Extended-NP    82.72%   91.16%     86.73%      63.10%   92.98%     75.18%
All-R5-R6         Extended-NP    88.89%   97.30%     92.90%      82.14%   94.52%     87.90%
All-R5-R7         Extended-NP    85.19%   97.87%     91.09%      82.14%   94.52%     87.90%

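Table 3 shows the impact of each rule. Removing rules 1 and 3 together causes the largest drop (the F-score falls to 12.83% on 100-Medlines and 7.91% on Medstract), and removing rule 3 alone also lowers it substantially, so rules 1 and 3 carry the most information for long-form identification.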

In the 100-Medlines results, ten of the extracted abbreviation and long-form pairs are wrong. Six are wrong because the first character of the long form differs from the first character of the abbreviation, one because the characters of the abbreviation and the long form appear in different orders, and three because the pair is not actually an abbreviation and its long form. Among the 18 unresolved abbreviation and long-form pairs, three are caused by NP chunking errors, 13 by syllable errors, and one is <Heliothis receptor 14-16, HR14 HR16>. Compared with the baseline model, we found that the chunker correctly chunks instances that contain the semantic type of the abbreviation pair, such as <RNA polymerase II, Pol II>, but mis-chunks the long form of instances such as <differentiation inhibitory factor, I factor>, because the NP includes ‘differentiation’ whereas the long form is only ‘inhibitory factor’.

Table 4: Experiments on 100-Medlines for various thresholds.

Abbreviation and chunk match threshold   Recall   Precision   F-Score
100%                                     85.80%   99.29%      92.05%
90%                                      85.80%   99.29%      92.05%
80%                                      85.80%   99.29%      92.05%
70%                                      88.27%   98.62%      93.16%
60%                                      88.89%   97.96%      93.20%
50%                                      90.12%   90.68%      90.40%

Table 5: Experiments on the Medstract corpus for various thresholds.

Abbreviation and chunk match threshold   Recall   Precision   F-Score
100%                                     82.14%   94.52%      87.90%
90%                                      82.14%   94.52%      87.90%
80%                                      82.74%   94.56%      88.25%
70%                                      86.31%   94.16%      90.06%
60%                                      85.71%   91.72%      88.62%
50%                                      85.12%   89.38%      87.20%

Tables 4 and 5 show the results for different thresholds. The best F-scores are obtained at thresholds of 60% (100-Medlines) and 70% (Medstract), close to the 66% threshold used by Pustejovsky et al. [01]. This suggests that a match ratio of about 66% between an abbreviation and its long form is a suitable cutoff.
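This section does not spell out how the match ratio is computed, so the Python sketch below uses a hypothetical definition: the fraction of the abbreviation's letters and digits that can be matched, left to right and case-insensitively, against the candidate long form. It only illustrates how a single threshold (66% here) accepts or rejects a candidate pair; it is not the thesis implementation.

```python
def match_ratio(abbreviation: str, long_form: str) -> float:
    """Fraction of the abbreviation's letters and digits found, in order, in the long form.

    Hypothetical definition for illustration: characters are matched greedily,
    left to right and case-insensitively; an unmatched character is skipped.
    """
    chars = [ch for ch in abbreviation.lower() if ch.isalnum()]
    text = long_form.lower()
    pos, matched = 0, 0
    for ch in chars:
        found = text.find(ch, pos)
        if found != -1:
            matched += 1
            pos = found + 1
    return matched / len(chars) if chars else 0.0


def accept_pair(abbreviation: str, long_form: str, threshold: float = 0.66) -> bool:
    """Keep a candidate pair only if its match ratio reaches the threshold."""
    return match_ratio(abbreviation, long_form) >= threshold


print(match_ratio("Pol II", "RNA polymerase II"))  # 1.0
print(match_ratio("XYZ", "xylose"))                # 2 of 3 characters match
print(accept_pair("XYZ", "xylose"))                # True at the 66% threshold
```

Lowering the threshold admits more partial matches (higher recall, lower precision), which is the trend visible in Tables 4 and 5.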

Table 6 shows how many times each rule fires on Medstract; it shows the most …

Table 6: Rules used in Medstract.

Rule         R1    R2    R3    R4    R5    R6    R7
Medstract    168   67    56    36    4     1     22

Chapter 3. Anaphora Resolution

Figure 2: Architecture overview.

Figure 2 presents an overview of the architecture, which consists of background processing (… collection) and foreground processing (indicated with solid lines), including the preprocessor, grammatical pattern extractor, anaphor recognizer, and antecedent finder.

3.1. Headword Collector

For unknown words, we need to predict their semantic types. In [Pustejovsky et al., 02a], the authors use the right-hand head rule (the head of a morphologically complex word is its right-hand member) to extract headwords as subtypes of the UMLS semantic types (135 semantic types).

We collected all UMLS concepts and their corresponding synonyms (1,860,682 records) and then selected headwords for each semantic type (super-concept). For example, the concept ‘interleukin-2’ has the synonyms ‘Costimulator’, ‘Co-Simulator’, ‘IL 2’, and ‘interleukine 2’, so we collected ‘interleukin’, ‘costimulator’, ‘simulator’, ‘IL’, and ‘interleukine’ as headwords for ‘interleukin-2’. The semantic types of ‘interleukin-2’ are ‘Amino Acid, Peptide, or Protein’ and ‘Immunologic Factor’, so we assigned the synonym headwords of ‘interleukin-2’ to both semantic types. Eq. 2 scores each headword for each semantic type; the scoring function smooths over differences in semantic type size. We set the threshold to 0.03, or to 0.003 if the semantic type contains more than 10,000 words.
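The collection step can be sketched in Python as follows. This is a minimal reconstruction under stated assumptions: the right-hand head rule is applied to space- or hyphen-separated tokens, purely numeric tokens are skipped (an assumption that reproduces the ‘interleukin-2’ example), and headwords are lowercased for lookup. The scoring of Eq. 2 and its thresholds are not reproduced, because the formula is not given in this text.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple


def headword(synonym: str) -> Optional[str]:
    """Right-hand head rule: the last token, skipping purely numeric tokens."""
    tokens = [t for t in re.split(r"[\s\-]+", synonym) if t]
    for token in reversed(tokens):
        if not token.isdigit():
            return token.lower()
    return None


def collect_headwords(concepts: Iterable[Tuple[List[str], List[str]]]) -> Dict[str, Set[str]]:
    """Map each semantic type to the headwords of all synonyms of its concepts."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for synonyms, semantic_types in concepts:
        heads = {h for h in map(headword, synonyms) if h}
        for semantic_type in semantic_types:
            index[semantic_type] |= heads
    return index


il2 = (["interleukin-2", "Costimulator", "Co-Simulator", "IL 2", "interleukine 2"],
       ["Amino Acid, Peptide, or Protein", "Immunologic Factor"])
print(sorted(collect_headwords([il2])["Immunologic Factor"]))
```

Because both semantic types of ‘interleukin-2’ receive the same headword set, a later scoring step (Eq. 2) is what distinguishes informative headwords from incidental ones.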

Headword scoring function:

Table 7: Top-scoring headwords for the ‘Amino Acid, Peptide, or Protein’ semantic type.

Headword        Score      Count
Protein         0.020833   36807
Product         0.007223   12761
Cerevisiae      0.007082   3128
endonuclease    0.005832   1288
Kinase          0.00575    2963
Antigen         0.004536   4842
Receptor        0.004478   4450
Synthase        0.004426   1629
Reductase       0.004279   1575
Arabidopsis     0.004246   1094
dehydrogenase   0.004005   2064
Antibody        0.003867   3416

3.2. SA/AO Patterns Finder

In this thesis we use co-occurring SA/AO (subject-action/action-object) patterns obtained from the GENIA corpus for pronominal anaphora resolution. We use the English Part-of-Speech Tagger (http://tamas.nlm.nih.gov/tagger.html) developed by Tamas Doszkocs, Ph.D., to tag parts of speech and NPs, and then use the grammatical function extractor to extract subjects and objects.

Next, we tag subjects and objects with UMLS semantic types: we scan each noun phrase from the right to find the longest word sequence that can be found in UMLS, and if none is found, we use the headwords to assign semantic types. Each SA/AO pattern is scored by the scoring function (Eq. 1). An antecedent candidate is considered only if its score is greater than a given threshold (0.01).


The following is a pattern extraction example:

Example 7:

Sentence: <NFATp> <binds> to two sites within the kappa 3 element
UMLS semantic type of NFATp: Amino Acid, Peptide, or Protein
Extracted pattern: <Amino Acid, Peptide, or Protein> <bind>

Table 8 lists the scores of patterns associating the verb 'bind' with possible semantic types of its subject.

Table 8: Statistics of patterns (subject, verb).

Score(Pharmacologic Substance, bind) = 0.142857
Score(Organic Chemical, bind) = 0.114286
Score(Amino Acid, Peptide, or Protein, bind) = 0.114286
Score(Cell, bind) = 0.085714
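The formula of Eq. 1 is not recoverable from this text, so the Python sketch below assumes, for illustration only, a relative frequency: Score(type, verb) = count(type, verb) / count(verb). This reading is consistent with Table 8, whose values equal 5/35, 4/35, and 3/35 (as if 35 'bind' patterns had been counted), but the text does not confirm it. The pattern list in the usage lines is toy data, not GENIA counts; the 0.01 threshold comes from the text.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pattern = Tuple[str, str]  # (UMLS semantic type of the subject/object, verb lemma)


def pattern_scores(patterns: List[Pattern]) -> Dict[Pattern, float]:
    """Hypothetical Eq. 1: Score(type, verb) = count(type, verb) / count(verb)."""
    pair_counts = Counter(patterns)
    verb_counts = Counter(verb for _, verb in patterns)
    return {(t, v): c / verb_counts[v] for (t, v), c in pair_counts.items()}


def compatible_types(scores: Dict[Pattern, float], verb: str, threshold: float = 0.01) -> List[str]:
    """Semantic types whose score with this verb exceeds the threshold (0.01 in the text)."""
    return sorted(t for (t, v), s in scores.items() if v == verb and s > threshold)


# Toy patterns for illustration only (not GENIA counts).
toy = [("Amino Acid, Peptide, or Protein", "bind")] * 2 + [("Cell", "bind"), ("Cell", "activate")]
scores = pattern_scores(toy)
print(scores[("Amino Acid, Peptide, or Protein", "bind")])  # 2 of 3 'bind' patterns
print(compatible_types(scores, "bind"))
```

At resolution time, a pronoun that is the subject of 'bind' would then be tagged with the compatible types, which feeds the semantic type agreement check.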

3.3. Preprocessor

For the anaphora resolution step, we use the tagger, which also chunks base NPs.

3.4. Grammatical Function Extraction

Grammatical function is defined as the systematic link between the syntactic relations of arguments and their encoding in lexical structure. For anaphora resolution, grammatical function is an important feature in salience grading. We extended the rules of Siddharthan [03] with rules 5 and 6.

Rule 1: Prep NP (Oblique)

Rule 2: Verb NP (Direct object)

Rule 3: Verb [NP]⁺ NP (Indirect object)

Rule 4: NP (Subject) [“,[^Verb], ”|Prep NP]* Verb

Rule 5: NP1 Conjunction NP2 (Role is same as NP1) [Conjunction]

Rule 6: [Conjunction] NP1 ( Role is same as NP2 ) Conjunction NP2

Rules 5 and 6 handle anaphors that have plural antecedents: we use syntactic agreement with the first antecedent to find the other antecedents. Without rules 5 and 6, ‘anti-CD4 mAb’ in Example 8 would not be found when resolving the antecedents of ‘they’.

Example 8:

“Whereas different anti-CD4 mAb or HIV-1 gp120 could all trigger activation of the ..., they differed…”
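The six rules can be sketched over a chunked sentence as follows. This minimal Python version assumes the chunker has already produced (tag, text) pairs with the hypothetical tags NP, VERB, PREP, CONJ, and COMMA; it reads Rule 3 as "every NP before the last one after a verb is an indirect object", and applies Rules 5 and 6 by copying a role across a conjunction until nothing changes. On Example 8 it shows how ‘anti-CD4 mAb’ inherits the subject role of ‘HIV-1 gp120’.

```python
from typing import Dict, List, Tuple

Chunk = Tuple[str, str]  # (tag, text); tags assumed: NP, VERB, PREP, CONJ, COMMA


def assign_roles(chunks: List[Chunk]) -> Dict[int, str]:
    """Assign grammatical roles to NP chunks with Rules 1-6 (simplified)."""
    tags = [tag for tag, _ in chunks]
    n = len(tags)
    roles: Dict[int, str] = {}

    for i, tag in enumerate(tags):
        if tag == "PREP" and i + 1 < n and tags[i + 1] == "NP":
            roles[i + 1] = "oblique"                     # Rule 1: Prep NP
        if tag == "VERB":
            j = i + 1
            while j < n and tags[j] == "NP":
                j += 1
            objects = list(range(i + 1, j))
            for k in objects[:-1]:
                roles[k] = "indirect object"             # Rule 3: Verb [NP]+ NP
            if objects:
                roles[objects[-1]] = "direct object"     # Rules 2 and 3

    for i, tag in enumerate(tags):                       # Rule 4: subject
        if tag != "NP" or i in roles:
            continue
        j = i + 1
        while j < n:
            if tags[j] == "VERB":
                roles[i] = "subject"
                break
            if tags[j] == "PREP" and j + 1 < n and tags[j + 1] == "NP":
                j += 2                                   # skip "Prep NP"
            elif tags[j] == "COMMA":
                k = j + 1                                # skip ", non-verb material ,"
                while k < n and tags[k] not in ("COMMA", "VERB"):
                    k += 1
                if k < n and tags[k] == "COMMA":
                    j = k + 1
                else:
                    break
            else:
                break

    changed = True                                       # Rules 5 and 6
    while changed:
        changed = False
        for j in range(1, n - 1):
            if tags[j] == "CONJ" and tags[j - 1] == "NP" and tags[j + 1] == "NP":
                left, right = j - 1, j + 1
                if left in roles and right not in roles:
                    roles[right] = roles[left]           # Rule 5: NP2 takes NP1's role
                    changed = True
                elif right in roles and left not in roles:
                    roles[left] = roles[right]           # Rule 6: NP1 takes NP2's role
                    changed = True
    return roles


example_8 = [("NP", "different anti-CD4 mAb"), ("CONJ", "or"), ("NP", "HIV-1 gp120"),
             ("VERB", "could all trigger"), ("NP", "activation"), ("PREP", "of"), ("NP", "the ...")]
print(assign_roles(example_8))
```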

3.5. Anaphora Resolution

Anaphor recognition and antecedent recognition are the two main parts of the anaphora resolution system. Anaphor recognition identifies target anaphors using filtering strategies.

Antecedent recognition determines the appropriate antecedents for a target anaphor. In this thesis, we deal with pronominal and sortal anaphora; the current version does not resolve zero or event anaphora.

3.6. Anaphora Recognition

Noun phrases or prepositional phrases containing ‘it’, ‘its’, ‘itself’, ‘they’, ‘them’, ‘themselves’, or ‘their’ are considered pronominal anaphors. ‘It’, ‘its’, and ‘itself’ are treated as anaphors with a singular antecedent; the others are treated as anaphors with plural antecedents. The relative pronouns ‘which’ and ‘that’ are also pronominal anaphors, but they can be resolved by the following rules:

Rule 1: The nearest noun phrase or prepositional phrase is assigned as the antecedent of the anaphor.

Rule 2: If the anaphor is 'that' and paired with pleonastic-it, the relative clause next to the anaphor is its antecedent.

Noun phrases or prepositional phrases with ‘either’, ‘this’, ‘both’, ‘these’, ‘the’, or ‘each’ are considered sortal anaphor candidates. Those with ‘this’ or ‘the + singular noun’ are treated as anaphors with a singular antecedent. Anaphors with plural antecedents are listed in Table 9.

Table 9: Number of Antecedents.

Anaphor                          Number of antecedents
Either                           2
Both                             2
Each                             Many
They, Their, Them, Themselves    Many
The + Number + noun              Number
Those + Number + noun            Number
These + Number + noun            Number
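Table 9, together with the singular cases given in the text, can be turned into a small lookup. The Python sketch below is hypothetical: the number-word lexicon, the naive "ends with s" plural test, and returning "many" for ‘these/those + plural noun’ without an explicit number are our assumptions, not rules stated in the thesis.

```python
from typing import Optional, Union

NUMBER_WORDS = {"two": 2, "three": 3, "four": 4, "five": 5}   # assumed small lexicon
SINGULAR_PRONOUNS = {"it", "its", "itself"}
PLURAL_PRONOUNS = {"they", "their", "them", "themselves"}


def expected_antecedents(anaphor: str) -> Optional[Union[int, str]]:
    """Number of antecedents suggested by Table 9: an int, or 'many' if unknown."""
    tokens = anaphor.lower().split()
    if not tokens:
        return None
    first = tokens[0]
    if first in SINGULAR_PRONOUNS or first == "this":
        return 1
    if first in ("either", "both"):
        return 2
    if first == "each" or first in PLURAL_PRONOUNS:
        return "many"
    if first in ("the", "those", "these") and len(tokens) > 2:
        count = tokens[1]                      # "<det> + Number + noun"
        if count.isdigit():
            return int(count)
        if count in NUMBER_WORDS:
            return NUMBER_WORDS[count]
    if first == "the":
        # 'the + singular noun' -> 1; the plural test on the last word is naive
        return "many" if tokens[-1].endswith("s") else 1
    if first in ("these", "those"):
        return "many"                          # plural, count not stated
    return None


for phrase in ["both receptor types", "these two proteins", "they", "the kinase"]:
    print(phrase, "->", expected_antecedents(phrase))
```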

3.6.1. Pronominal Anaphora Recognition

Pronominal anaphora recognition is done by filtering out pleonastic-it. Following Tyne and Wu [04], we use the following rules to recognize pleonastic-it instances.

Rule 1: It be [Adj|Adv|verb]* that

Example 9: “It is shown that antibody 19 reacts with this polypeptide either bound to the ribosome or free in solution.”

Rule 2: It be Adj [for NP] to VP

Example 10: “However, it is possible for antidepressants to exert their effects on the fetus at other times during pregnancy as well as to infants during lactation.”

Rule 3: It [seems|appears|means|follows] [that]*

Example 11: “It seems that the presence of HNF1 sites in liver-specific genes was favoured, but that no counter-selection occurred within the rest of the genome.”

Rule 4: NP [makes|finds|takes] it [Adj]* [for NP]* [to VP|Ving]

Example 12: “Furthermore, the same experimental model makes it possible to image lymphoid progenitors in fetal and adult hematopoietic tissues.”
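The four rules are patterns over part-of-speech tags. The Python sketch below approximates them with regular expressions over lowercased words, which is an assumption made so that the example runs without a tagger: the [Adj|Adv|verb]* slot becomes up to three arbitrary words, an NP becomes one to three words, and only a fixed list of "be" forms is accepted. Because it ignores tags, it can over-fire, for instance on "it is the protein that binds", where ‘that’ is a relative pronoun.

```python
import re

# Forms of "be" accepted by Rules 1 and 2 (an assumed, non-exhaustive list).
BE = r"(?:is|was|are|were|has been|had been|will be|may be|might be|would be|could be|be)"

PLEONASTIC_RULES = [
    # Rule 1: It be [Adj|Adv|verb]* that   (up to three filler words)
    re.compile(rf"\bit\s+{BE}\s+(?:\w+\s+){{0,3}}?that\b"),
    # Rule 2: It be Adj [for NP] to VP
    re.compile(rf"\bit\s+{BE}\s+\w+\s+(?:for\s+(?:[\w-]+\s+){{1,3}}?)?to\s+\w+"),
    # Rule 3: It [seems|appears|means|follows] [that]*
    re.compile(r"\bit\s+(?:seems|appears|means|follows)\b"),
    # Rule 4: NP [makes|finds|takes] it [Adj]* [for NP]* [to VP|Ving]
    re.compile(r"\b(?:makes?|finds?|takes?)\s+it\s+(?:\w+\s+){0,2}?"
               r"(?:for\s+(?:[\w-]+\s+){1,3}?)?(?:to\s+\w+|\w+ing\b)"),
]


def is_pleonastic_it(sentence: str) -> bool:
    """True if any rule matches; words stand in for the POS tags of the rules."""
    text = sentence.lower()
    return any(rule.search(text) for rule in PLEONASTIC_RULES)


print(is_pleonastic_it("It is shown that antibody 19 reacts with this polypeptide."))
print(is_pleonastic_it("It binds to two sites within the kappa 3 element."))
```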

3.6.2. Sortal Anaphora Recognition

Sortal anaphora recognition filters out sortal anaphors that have no referent antecedent, or whose antecedents fall outside the defined biomedical semantic types. The following two rules are used to filter out these non-target anaphors.

Rule 1: Filter out noun phrases or prepositional phrases that are not tagged with one of the following UMLS classes:

Amino Acid, Protein, Peptide, Embryonic Structure, Cell, Biomedical Active Substance, Organism, Functional Chemical, Bacterium, Molecular Sequence, Chemical, Nucleoside, Cell Component, Enzyme, Gene or Genome, Structural Chemical, Nucleotide Sequence, Substance, Organic Chemical, Pharmacologic Substance, Organism Attribute, Nucleic Acid, Nucleotide.

Rule 2: Filter out proper nouns with capitals and numerical features.
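A minimal Python sketch of the two filters follows. Several details are assumptions: the class set is only an excerpt of the Rule 1 list (with the UMLS name ‘Amino Acid, Peptide, or Protein’ standing in for ‘Amino Acid, Protein, Peptide’), the determiner list adds ‘those’ from Table 9, and Rule 2's "capitals and numerical features" is approximated by "any word after the determiner contains a capital letter or a digit". The semantic types are assumed to come from an earlier UMLS tagging step.

```python
import re
from typing import Set

# Excerpt of the Rule 1 class list; "Amino Acid, Peptide, or Protein" is the
# UMLS name used elsewhere in this chapter for "Amino Acid, Protein, Peptide".
TARGET_TYPES = frozenset({
    "Amino Acid, Peptide, or Protein", "Cell", "Enzyme", "Gene or Genome",
    "Nucleic Acid", "Organic Chemical", "Pharmacologic Substance", "Bacterium",
})
SORTAL_DETERMINERS = {"either", "this", "both", "these", "the", "each", "those"}


def looks_like_proper_noun(token: str) -> bool:
    """Assumed test for Rule 2: a capital letter or a digit, as in 'IL2'."""
    return re.search(r"[A-Z0-9]", token) is not None


def keep_sortal_anaphor(phrase: str, semantic_types: Set[str]) -> bool:
    """True if the phrase survives both filters and stays a target sortal anaphor."""
    tokens = phrase.split()
    if not tokens or tokens[0].lower() not in SORTAL_DETERMINERS:
        return False                      # not a sortal anaphor candidate
    if not set(semantic_types) & TARGET_TYPES:
        return False                      # Rule 1: outside the target UMLS classes
    if any(looks_like_proper_noun(t) for t in tokens[1:]):
        return False                      # Rule 2: proper noun with capitals or digits
    return True


print(keep_sortal_anaphor("these proteins", {"Amino Acid, Peptide, or Protein"}))
print(keep_sortal_anaphor("the patients", {"Patient or Disabled Group"}))
```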

3.7. Number Agreement Checking

Number is the property that distinguishes singular (one entity) from plural (several entities). It simplifies candidate selection, since an anaphor and its antecedent must agree in number. All noun phrases and pronouns are annotated with number (singular or plural), and for a given pronoun we discard noun phrases whose number differs from that of the pronoun. For an anaphor with a singular antecedent, plural noun phrases are not considered as candidates.

3.8. Salience Grading

Each candidate antecedent starts with a salience score of zero and is graded according to Table 10.

Recency measures the distance between an anaphor and a candidate antecedent: the closer they are, the more likely the anaphor refers to that candidate. For grammatical role agreement, if the same entity appears in the second sentence in the same role, readers can easily identify the antecedent, so an author may use an anaphor instead of the entity's full name. In addition to role agreement, subjects and objects play important roles in a sentence; they may be mentioned many times, and a writer may use an anaphor to replace a previously mentioned item.

Singular anaphors may only point to one antecedent, while plural anaphors usually point to plural antecedents. For semantic type agreement, when an entity is mentioned a second time, it is common to refer to it by its hypernym concept. Therefore this feature receives a high weight in salience grading.

Table 10: Salience grading for candidate antecedents.

Feature                              Score
Recency                              0 to 2: 2 if in the same sentence as the anaphor, 1 if one sentence away, 0 if two sentences away
Subject and object preference        1
Grammatical function agreement       1
Number agreement                     1
Longest common subsequence           0 to 3
Semantic type agreement              -1 to +2
Biomedical antecedent preference     +2 if biomedical, -2 if not
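Salience grading as described by Table 10 is an additive score. The Python sketch below sums the Table 10 values for the enabled features, using the feature labels F1 to F7 from the legend of Table 13. It assumes the LCS score (0 to 3) and the semantic type score (-1 to +2) have already been computed by the steps in Sections 3.8.2 and 3.8.1; the feature-subset parameter mirrors the genetic-algorithm feature selection of Section 3.8.4. The candidate values in the usage lines are illustrative.

```python
from dataclasses import dataclass
from typing import FrozenSet

ALL_FEATURES: FrozenSet[str] = frozenset({"F1", "F2", "F3", "F4", "F5", "F6", "F7"})


@dataclass
class Candidate:
    text: str
    sentence_distance: int      # 0 = same sentence as the anaphor
    subject_or_object: bool     # F2
    role_agrees: bool           # F3
    number_agrees: bool         # F4
    lcs_score: int              # F5, already in 0..3
    semantic_score: int         # F6, already in -1..+2
    biomedical: bool            # F7


def salience(c: Candidate, features: FrozenSet[str] = ALL_FEATURES) -> int:
    """Sum the Table 10 scores of the enabled features, starting from zero."""
    score = 0
    if "F1" in features:
        score += {0: 2, 1: 1}.get(c.sentence_distance, 0)   # recency
    if "F2" in features and c.subject_or_object:
        score += 1
    if "F3" in features and c.role_agrees:
        score += 1
    if "F4" in features and c.number_agrees:
        score += 1
    if "F5" in features:
        score += c.lcs_score
    if "F6" in features:
        score += c.semantic_score
    if "F7" in features:
        score += 2 if c.biomedical else -2
    return score


mab = Candidate("anti-CD4 mAb", 0, True, True, True, 0, 2, True)
print(salience(mab))                                  # 2 + 1 + 1 + 1 + 0 + 2 + 2 = 9
print(salience(mab, frozenset({"F5", "F6", "F7"})))   # 0 + 2 + 2 = 4
```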

3.8.1. Antecedent and Anaphor Semantic Type Agreement

For pronominal anaphora, we collected the coerced semantic types between verbs and headwords from the GENIA SA/AO patterns, generalizing subjects and objects to UMLS semantic types. Each pronoun is then tagged with the coerced semantic types given by its SA/AO pattern.

Sortal anaphors are handled by checking semantic type agreement between the anaphor and the antecedent, so all noun phrases and prepositional phrases are tagged in advance using the following steps.

(1) UMLS type check: we scan the noun phrase from the right to find the longest word sequence that can be found in UMLS.

(2) Headword check: the antecedent contains a headword of the anaphor’s semantic type.

(3) If no headword is found in the antecedent, the {anaphor, antecedent} pair is checked using PubMed … antecedent for semantic type agreement.
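The three steps form a cascade, sketched below in Python under stated assumptions. "From the right" is read as "longest suffix of the noun phrase first"; the UMLS lexicon and headword table are toy dictionaries, not real UMLS data; and the PubMed step is left as a hypothetical callable, because its formula is not recoverable from this text.

```python
from typing import Callable, Dict, Optional, Set


def umls_types(phrase: str, umls: Dict[str, Set[str]]) -> Set[str]:
    """Step (1): semantic types of the longest suffix of the phrase found in UMLS."""
    tokens = phrase.lower().split()
    for start in range(len(tokens)):            # longest suffix first
        key = " ".join(tokens[start:])
        if key in umls:
            return umls[key]
    return set()


def semantic_agreement(anaphor: str, anaphor_type: str, antecedent: str,
                       umls: Dict[str, Set[str]], headwords: Dict[str, Set[str]],
                       pubmed_check: Optional[Callable[[str, str], bool]] = None) -> bool:
    """Run steps (1) to (3) until one of them confirms agreement."""
    if anaphor_type in umls_types(antecedent, umls):
        return True                              # step (1): UMLS type matches
    if set(antecedent.lower().split()) & headwords.get(anaphor_type, set()):
        return True                              # step (2): headword of the type found
    if pubmed_check is not None:
        return pubmed_check(anaphor, antecedent) # step (3): hypothetical PubMed test
    return False


PROTEIN = "Amino Acid, Peptide, or Protein"
umls = {"polymerase ii": {PROTEIN}}              # toy lexicon, not real UMLS data
headwords = {PROTEIN: {"receptor", "kinase"}}    # toy headword list
print(semantic_agreement("this protein", PROTEIN, "RNA polymerase II", umls, headwords))
```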

3.8.2. Longest Common Subsequence (LCS)

The use of the LCS exploits the fact that an anaphor and its antecedent are often morphological variants of each other (e.g., the anaphor “the grafts” and the antecedent “xenografts”) [Castaño, 02]. We score each anaphor and candidate antecedent as follows:

If total match between an anaphor and its candidate antecedents then salience score = salience score + 3

Else if partial match between an anaphor and its candidate antecedents then salience score = salience score + 2

Else if the antecedent matches a hyponym of the anaphor in WordNet 2.0, then salience score = salience score + 1

Example 13: total match:

<Anaphor: each inhibitor, Antecedent: PAH alkyne metabolism-based inhibitors>

Example 14: partial match:

<Anaphor: both receptor types, Antecedent: the ETB receptor antagonist BQ788>

Example 15: using WordNet 2.0:

<Anaphor: this protein (has hyponym: growth factor), Antecedent: Cleavage and polyadenylation specificity factor (CPSF)>
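The scoring rule above can be sketched in Python with a standard dynamic-programming LCS. The text does not define total and partial match precisely, so this sketch assumes a word-level reading: an anaphor content word (determiners removed) "matches" an antecedent word when their LCS covers at least 80% of the anaphor word (a hypothetical threshold); all words matching gives +3, some gives +2. The WordNet hyponym test is left as an optional callable, which could be backed by any WordNet interface. The usage lines reproduce Examples 13 to 15.

```python
from typing import Callable, List, Optional

DETERMINERS = {"the", "this", "these", "those", "each", "both", "either"}


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (dynamic programming, O(len(a)*len(b)))."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0]
        for j, ch_b in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ch_a == ch_b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def word_matches(word: str, candidates: List[str], min_ratio: float) -> bool:
    # An anaphor word matches if most of its letters appear in order in some antecedent word.
    return any(lcs_length(word, w) / len(word) >= min_ratio for w in candidates)


def lcs_salience(anaphor: str, antecedent: str, min_ratio: float = 0.8,
                 is_hyponym: Optional[Callable[[str, str], bool]] = None) -> int:
    """+3 total match, +2 partial match, +1 WordNet hyponym match, else 0."""
    content = [w for w in anaphor.lower().split() if w not in DETERMINERS]
    words = antecedent.lower().split()
    hits = [word_matches(w, words, min_ratio) for w in content]
    if hits and all(hits):
        return 3
    if any(hits):
        return 2
    if is_hyponym is not None and is_hyponym(anaphor, antecedent):
        return 1
    return 0


print(lcs_salience("each inhibitor", "PAH alkyne metabolism-based inhibitors"))  # Example 13
print(lcs_salience("both receptor types", "the ETB receptor antagonist BQ788"))  # Example 14
print(lcs_salience("this protein", "Cleavage and polyadenylation specificity factor (CPSF)",
                   is_hyponym=lambda a, b: True))                                   # Example 15, stubbed
```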


3.8.3. Antecedent Selection

We search for noun phrases and prepositional phrases within the two sentences preceding the anaphor and compute the salience score of each one. Antecedents are then selected using either a best-fit or a nearest-fit strategy.

(1) Best Fit: select the antecedents with the highest salience score above the threshold.

(2) Nearest Fit: searching from the anaphor back to two sentences before it, select the nearest antecedents whose salience score is above a given threshold.

The number of antecedents of each anaphor has already been identified. If an anaphor is identified as having plural antecedents, we use the following steps to choose them.

(1) If the number of antecedents is known, assign that many of the highest-scoring noun phrases or prepositional phrases to the anaphor.

(2) If the number of antecedents is unknown, select the noun phrases and prepositional phrases whose scores exceed a given threshold and that have the same pattern as the top-scoring noun phrase or prepositional phrase.
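The two strategies and the plural-antecedent steps can be sketched in Python as follows, assuming candidates are listed nearest-first with their salience scores already computed. The thesis does not define the "pattern" used in step (2) in this section, so the sketch carries a hypothetical pattern label (for example a grammatical role) on each candidate; the threshold value and the scores in the usage lines are also illustrative.

```python
from typing import List, NamedTuple, Optional


class Cand(NamedTuple):
    text: str
    score: float
    pattern: str   # hypothetical "pattern" label, e.g. grammatical role or semantic type


def best_fit(cands: List[Cand], threshold: float) -> Optional[Cand]:
    """Highest salience above the threshold; ties go to the nearer candidate."""
    above = [c for c in cands if c.score > threshold]
    return max(above, key=lambda c: c.score) if above else None


def nearest_fit(cands: List[Cand], threshold: float) -> Optional[Cand]:
    """First candidate above the threshold, scanning nearest-first."""
    return next((c for c in cands if c.score > threshold), None)


def plural_antecedents(cands: List[Cand], threshold: float, n: Optional[int] = None) -> List[Cand]:
    """Steps (1) and (2) for anaphors with plural antecedents."""
    ranked = sorted(cands, key=lambda c: c.score, reverse=True)   # stable: ties stay nearest-first
    if n is not None:
        return ranked[:n]                                          # step (1): number known
    above = [c for c in ranked if c.score > threshold]
    if not above:
        return []
    return [c for c in above if c.pattern == above[0].pattern]    # step (2): same pattern as top


# Candidates within two sentences, listed nearest-first (toy scores).
cands = [Cand("HIV-1 gp120", 6, "subject"), Cand("anti-CD4 mAb", 6, "subject"),
         Cand("activation", 3, "object"), Cand("the T cells", 8, "object")]
print(best_fit(cands, 5).text, "|", nearest_fit(cands, 5).text)
```

The two strategies differ only in whether score or proximity wins once the threshold is passed, which is why they can pick different antecedents from the same candidate list.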

3.8.4. Feature Selection

Figure 3: A general genetic algorithm flowchart.

Ng and Cardie [2002a] showed an improvement in F-score with hand-selected features. In this thesis, feature selection for salience grading is implemented with a genetic algorithm, which searches for the best features by choosing the best parents to produce offspring and escapes local maxima through mutation. Sequential Floating Forward Selection (SFFS) is the best of the sequential search algorithms, but no clear-cut case can be made for either SFFS or GA being the better of the two [Oh et al., 04].

Figure 3 stages: initial population (sentences, chromosomes), selection, crossover, mutation, evaluation, and termination test (Y/N).

In the initial state, we chose feature subsets as 10 chromosomes and randomly chose crossover features to produce offspring. We computed mutations for each feature in each chromosome, and about two features were mutated in each generation.

The F-score is used to evaluate each chromosome, and the top 10 chromosomes are chosen for the next generation. The algorithm terminates when two consecutive generations do not increase the F-score. The time complexity is O(MN), where M is the number of candidate antecedents and N is the number of anaphors.
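The procedure above can be sketched as a small genetic algorithm over 7-bit chromosomes (one bit per feature F1 to F7). This Python version is a sketch under assumptions: the fitness function is a callable standing in for the resolver's F-score, parents are sampled from the current elite population, crossover is one-point, and the per-bit mutation rate is chosen so that about two bits flip per generation. The toy fitness in the usage lines rewards a hypothetical target subset and does not reflect real results.

```python
import random
from typing import Callable, List, Tuple

FEATURES = ("F1", "F2", "F3", "F4", "F5", "F6", "F7")
Chromosome = Tuple[int, ...]   # bit i = 1 means FEATURES[i] is used


def crossover(a: Chromosome, b: Chromosome, rng: random.Random) -> Chromosome:
    cut = rng.randint(1, len(a) - 1)           # one-point crossover
    return a[:cut] + b[cut:]


def mutate(c: Chromosome, rate: float, rng: random.Random) -> Chromosome:
    return tuple(bit ^ 1 if rng.random() < rate else bit for bit in c)


def select_features(fitness: Callable[[Chromosome], float], pop_size: int = 10,
                    flips_per_generation: float = 2.0, patience: int = 2,
                    max_generations: int = 200, seed: int = 0) -> Tuple[List[str], float]:
    rng = random.Random(seed)
    n = len(FEATURES)
    rate = flips_per_generation / (pop_size * n)   # about two flipped bits per generation
    population = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(pop_size)]
    best = max(population, key=fitness)
    stale = 0
    for _ in range(max_generations):
        offspring = []
        for _ in range(pop_size):
            a, b = rng.sample(population, 2)
            offspring.append(mutate(crossover(a, b, rng), rate, rng))
        # Keep the top pop_size chromosomes among parents and offspring.
        population = sorted(population + offspring, key=fitness, reverse=True)[:pop_size]
        if fitness(population[0]) > fitness(best):
            best, stale = population[0], 0
        else:
            stale += 1
            if stale >= patience:              # stop after two generations without gain
                break
    return [f for f, bit in zip(FEATURES, best) if bit], fitness(best)


# Toy fitness standing in for the resolver's F-score (hypothetical target subset).
TARGET = (0, 0, 0, 0, 1, 1, 1)


def toy_fitness(c: Chromosome) -> float:
    return sum(x == t for x, t in zip(c, TARGET)) / len(TARGET)


print(select_features(toy_fitness))
```

Each fitness call would, in the real system, run the resolver once over all anaphors, which is where the O(MN) cost per evaluation comes from.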

3.9. Experimental Results and Analysis

The test corpus, Medstract, was adopted from http://www.medstract.org/ and contains 32 MEDLINE abstracts and 83 biomedical anaphora pairs: 40 pronominal pairs (14 of them with ‘which’) and 43 sortal pairs. To build a corpus containing as many anaphor types as possible, we also collected two sets from different sources, 43-GENIA and 57-Medlines, and combined them into 100 MEDLINE abstracts (100-Medlines). The 43 abstracts (479 sentences) from the GENIA corpus contain pronominal anaphors, and the 57 abstracts (656 sentences) are PubMed results for the queries (… “these proteins” and “these receptors”), containing 177 pronominal anaphora pairs and 186 sortal anaphora pairs. Table 12 shows the statistics of pronominal and sortal anaphors for each corpus.

Table 12 gives the distribution of each anaphor in each corpus. We found 13 pleonastic-it instances in total, all of which can be resolved. There are 314 ‘the NP’ sortal anaphor candidates in Medstract, 611 in 43-GENIA, and 607 in 57-Medlines.

Table 11: Statistics of anaphor and antecedent pairs.

Corpus         Abstracts   Sentences   Pronominal instances   Sortal instances   Total
57-Medlines    57          565         69                     118                187
43-GENIA       43          479         98                     63                 161

Table 12: Occurrences of each anaphor.

Corpus (pronominal)   it   its   itself   they   their   them   themselves   Total
Medstract             6    9     0        2      7       2      0            26

Figures 4 and 5 present the distribution of the number of NPs between antecedents and anaphors. From Figures 4 and 5 we can conclude that more NPs lie between anaphors and their antecedents in sortal anaphora than in pronominal anaphora. Sortal anaphors carry more information than pronominal anaphors, so they remain readable at a greater distance.

Figure 4: NPs between pronominal anaphor and antecedent.

Figure 5: NPs between sortal anaphor and antecedent.

Figures 4 and 5 show the percentage of instances by the number of NPs between anaphor and antecedent, for pronominal and sortal anaphors respectively. Figure 4 shows that, for pronominal anaphora, the number of instances decreases as the distance increases, whereas in Figure 5 the highest percentage does not occur at the nearest NPs.

Figure 6: Sentences between pronominal anaphors and antecedents.

Figure 7: Sentences between sortal anaphors and antecedents.

Figures 6 and 7 show the distance in sentences between anaphor and antecedent. The value 0 denotes intra-sentential anaphora, and the values 1, 2, and 3 denote inter-sentential anaphora whose antecedent is 1, 2, or 3 sentences before the anaphor.

These results support using the two preceding sentences as the search space.

The results in Table 13 show that the best-fit strategy performs better than the nearest-fit strategy. In addition, the features selected by the genetic algorithm indicate that syntactic features affect pronominal anaphora, while semantic features affect both sortal and pronominal anaphora.

Table 13: Results with the best-fit and nearest-fit strategies for Medstract.

                      Best Fit                 Nearest Fit              [Castano et al., 2002]
                      Sortal     Pronominal    Sortal     Pronominal    Sortal     Pronominal
Total features        64.08%     88.46%        50.49%     73.47%
Selected features     F5~F7      All-{F5}      F5~F7      All-{F2,F5}   F4~F6      F4, F6, F7
Genetic features      78.26%     92.31%        61.18%     79.17%        74.4%      75.23%

F1: Recency, F2: Subject and Object preference, F3: Grammatical role Agreement, F4: Number Agreement, F5: Longest common subsequence, F6: Semantic type Agreement, F7: Biomedical Antecedent

Table 14: F-Score of Medstract and 100-Medlines

                      Medstract                100-Medlines
                      Sortal     Pronominal    Sortal     Pronominal
Total features        64.08%     88.46%        71.33%     86.65%
Selected features     F5~F7      All-{F5}      F5~F7      All-{F5}
Genetic features      78.26%     92.31%        80.62%     87.25%

The impact of each feature was also examined and verified on different corpora; the results are shown in Table 15 and Table 16. Syntactic features (F1~F4) play an insignificant role in sortal resolution but are useful for pronominal anaphora resolution. Sortal anaphora resolution is sensitive to semantic features (F5~F7), and semantic type agreement plays an important role in it. In addition to UMLS, headwords and PubMed search results were used to determine semantic type agreement between anaphors and antecedents. Table 16 shows that F3 increases the F-score for pronominal anaphora but lowers it for sortal anaphora.

The Medstract and 100-Medlines results show that semantic type matching is important in both …

A Study of Coreference Resolution in Biomedical Literature (在文檔中生物文獻中同指涉問題處理之研究), pp. 18-0