
5.3 Experimental Settings

In our experiments, we evaluated models with different settings to answer the following questions:

Q1: Can the proposed hybrid relevance analysis achieve better performance when it is employed to estimate the relevance of a sentence to the query?

Q2: Are surface features beneficial to improve the summarization performance?

Q3: Does the modified MMR outperform the original MMR?

Table 19 lists these models. B1 and B2 are baselines: both exploit only the similarity metric proposed in [1], and B2 additionally applies the original MMR [6]. M10 is the system that integrates all of the methods proposed in this work. The other models are listed as reference points to show more clearly the impact of each factor. The parameters (listed in Table 20) were set manually in these experiments.

Table 19: Settings of different models

Model  Relevance  Features  MMR
B1     sim1       N         None
B2     sim1       N         Original
M1     sim1       Y         None
M2     sim1       Y         Modified
M3     sim2       N         None
M4     sim2       N         Original
M5     sim2       Y         None
M6     sim2       Y         Modified
M7     sim3       N         None
M8     sim3       N         Original
M9     sim3       Y         None
M10    sim3       Y         Modified

Table 20: Parameter settings

Equation  Parameter settings
Eq. (32)  k = 10
Eq. (33)  α = 0.5
Eq. (40)  wsig = 0.5; wsim = 0.7; wf1 = 0.8; wf2 = 0.3; wf3 = 0.3; wf4 = 0.5; wf5 = 1.0; wNE = 0.3
Eq. (42)  δ = 0.5; λ = 0.7

5.4 Results

The recall results are given in Table 21, sorted by ROUGE-2 and ROUGE-SU4 recall respectively. The table also lists the recall values of the best systems at DUC 2005 (System 15 [24] and System 17 [15]).

First, when only sentence similarity to the query is considered (see B1, M3, and M7), M7, which employs the hybrid relevance analysis, obtained the highest score, as we expected. As mentioned before, this benefit comes from the noise reduction achieved by averaging different similarity metrics. M3 outperformed B1, showing that latent semantic analysis can derive the topic structure of the corpus by grouping words according to their co-occurrences, which leads to higher recall than the vector space model. Moreover, every model using the proposed hybrid relevance analysis outperformed its counterparts (e.g., M7>M3>B1; M8>M4>B2; M9>M5>M1; M10>M6>M2). These results suggest that a hybrid relevance analysis combining similarities computed from the vector space model and latent semantic analysis is a successful way to estimate the relevance of a sentence to the query.
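The hybrid relevance analysis described above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the helper names, the toy matrix, and the equal-weight combination (w = 0.5) are assumptions, and a query is folded into the latent space simply by projecting it with U_k^T, which is one common LSA convention.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return float(u @ v / (nu * nv)) if nu and nv else 0.0

def hybrid_relevance(term_sent, query, sent_idx, k=2, w=0.5):
    """Linearly combine VSM and LSA similarities of a sentence to the query.

    term_sent: terms x sentences matrix; query: term-space query vector.
    k is the number of latent dimensions kept (small here only because
    the toy matrix is tiny).
    """
    # sim1: cosine in the raw term space (vector space model)
    sim_vsm = cosine(term_sent[:, sent_idx], query)
    # sim2: cosine in the k-dimensional latent space; sentence and query
    # are both projected with U_k^T, so they live in the same space
    U, s, Vt = np.linalg.svd(term_sent, full_matrices=False)
    Uk = U[:, :k]
    sim_lsa = cosine(Uk.T @ term_sent[:, sent_idx], Uk.T @ query)
    # sim3: linear combination; w = 0.5 simply averages the two metrics,
    # which is the noise-reduction effect discussed in the text
    return w * sim_vsm + (1.0 - w) * sim_lsa
```

Averaging the two metrics means a sentence scores highly only when both the surface-term view and the latent-topic view agree that it is relevant, which damps noise specific to either representation.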

Second, for the models that take surface features into account but do not apply MMR (see M1, M5, and M9), a scoring mechanism enhanced with low-level text features clearly improves performance (e.g., M1>B1; M5>M3; M9>M7). We also found in our experiments that a slightly smaller wsig obtains better results. This is because, for query-relevant summarization, relevant sentences are far more important than sentences with high shallow-feature salience, which might be interpreted as theme-relevant rather than query-relevant.

Finally, the models that apply the modified MMR (see M2, M6, and M10) outperformed those using the original MMR (see B2, M4, and M8). A sentence that has a high feature score and is highly relevant to the query, but has low similarity to the sentences already in the summary, is ranked in the topmost position. This demonstrates that the modified MMR is a suitable module for query-focused multidocument summarization.
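The modified MMR selection can be sketched as a greedy loop. Since Eq. (42) is not reproduced in this section, the way δ blends relevance with representative power below is an assumption; λ plays the usual MMR role of trading informativeness against redundancy with the sentences already selected.

```python
def modified_mmr(candidates, relevance, feature, sim, k, lam=0.7, delta=0.5):
    """Greedy modified-MMR sentence selection (illustrative sketch).

    relevance[i]: relevance of sentence i to the query
    feature[i]:   surface-feature (representative-power) score of sentence i
    sim[i][j]:    similarity between sentences i and j
    lam, delta:   λ and δ, set as in Table 20 (Eq. (42))
    """
    selected, pool = [], list(candidates)
    while pool and len(selected) < k:
        def score(i):
            # Modification: representative power (feature score) is blended
            # into the informativeness term alongside query relevance
            gain = delta * relevance[i] + (1.0 - delta) * feature[i]
            # Redundancy with the summary built so far, as in original MMR
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return lam * gain - (1.0 - lam) * redundancy
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    return selected
```

A near-duplicate of an already-selected sentence is penalized through the redundancy term, so a less relevant but novel sentence can overtake it, which is the behavior the results attribute to the modified MMR.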

Table 21: Recalls of ROUGE-2 and ROUGE-SU4

Rank  Model      R-2       Model      R-SU4
1     M10        0.075690  System 15  0.131633
2     M6         0.073880  M10        0.129950
3     M9         0.073780  System 17  0.129725
4     System 15  0.072510  M6         0.127110
5     M2         0.072280  M9         0.126870
6     System 17  0.071741  M5         0.124430
7     M5         0.071340  M2         0.124330
8     M8         0.070110  M8         0.124270
9     M4         0.070000  M4         0.123930
10    M1         0.069720  M7         0.121750
11    M7         0.068730  M1         0.121350
12    B2         0.067690  B2         0.120200
13    M3         0.067190  M3         0.119950
14    B1         0.064830  B1         0.117550

To sum up, we obtained the best recalls of 0.075690 (ROUGE-2) and 0.129950 (ROUGE-SU4) with M10. These results are comparable to those of System 15 and System 17, the best-performing systems at DUC 2005.

6 Conclusion

In this report, we propose a sentence retrieval approach to query-focused multidocument summarization. The proposed method measures the relevance of a sentence to the query using a novel hybrid relevance analysis that linearly combines relevance measures from the vector space model and latent semantic analysis. The output summary is generated by including sentences with high salience, evaluated in terms of sentence relevance and low-level feature significance. In addition, a modified redundancy-reduction module based on MMR is proposed, which combines sentence representative power (i.e., surface-feature salience) with the original MMR. The proposed method was evaluated on the official DUC 2005 corpus and achieved competitive results.

The contributions of this work are three-fold. First, a hybrid relevance analysis is proposed to estimate sentence relevance to the query. Second, shallow features are employed for scoring sentence importance and are shown to be useful. Finally, a modified MMR was proposed and shown to be a suitable component for query-focused summarization when sentence representative power is considered.

In the future, we intend to apply sentence compression techniques in order to include more useful information in the summary. We also plan to resolve anaphoric references to produce summaries with better readability. Sentence ordering is another issue that needs to be investigated to create more fluent and coherent summaries.

References

[1] J. Allan, C. Wade, and A. Bolivar, Retrieval and Novelty Detection at the Sentence Level, Proc. of SIGIR’03, Toronto, Canada, 2003.

[2] E. Amigo, J. Gonzalo, V. Peinado, A. Penas, and F. Verdejo, An Empirical Study of Information Synthesis Tasks. Proc. of ACL 2004, Barcelona, Spain, 2004.

[3] A. Berger and V. O. Mittal, Query-Relevant Summarization using FAQs, Proc. of ACL 2000, Hong Kong, China, 2000.

[4] S. Blair-Goldensohn, From Definitions to Complex Topics: Columbia University at DUC 2005, Proc. of DUC 2005, Vancouver, Canada, 2005.

[5] W. Bosma, Query-Based Summarization Using Rhetorical Structure Theory, Proc. of CLIN 2003, Belgium, 2003.

[6] J. Carbonell and J. Goldstein, The Use of MMR, Diversity-Based Reranking for Reordering Documents and Producing Summaries, Proc. of SIGIR’98, Melbourne, Australia, 1998.

[7] E. D’Avanzo and B. Magnini, A Keyphrase-based Approach to Summarization: the LAKE System at DUC-2005, Proc. of DUC 2005, Vancouver, Canada, 2005.

[8] H. Daumé III and D. Marcu, Bayesian Query-Focused Summarization, Proc. of COLING/ACL 2006, Sydney, Australia, 2006.

[9] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer and R. Harshman, Indexing by Latent Semantic Analysis, Journal of the American Society for Information Science, Vol. 41, No. 6, 1990, pp. 391-407.

[10] Document Understanding Conference. Available at http://duc.nist.gov/.

[11] B. Hachey, G. Murray and D. Reitter, Query-oriented Multi-document Summarization with a Very Large Latent Semantic Space, Proc. of DUC 2005, Vancouver, Canada, 2005.

[12] T. Hirao, K. Takeuchi, H. Isozaki, Y. Sasaki and E. Maeda, NTT/NAIST’s Text Summarization Systems for TSC-2, Proc. of NTCIR 2003, Japan, 2003.

[13] E. Hovy, C.-Y. Lin, and L. Zhou, A BE-based Multidocument Summarizer with Query Interpretation, Proc. of DUC 2005, Vancouver, Canada, 2005.

[14] J. Jagadeesh, P. Pingali and V. Varma, A Relevance-Based Language Modeling Approach to DUC 2005, Proc. of DUC 2005, Vancouver, Canada, 2005.

[15] W. Li, W. Li, B. Li, Q. Chen and M. Wu, The Hong Kong Polytechnic University at DUC2005, Proc. of DUC 2005, Vancouver, Canada, 2005.

[16] C. Y. Lin, Training a Selection Function for Extraction, Proc. of CIKM’99, Kansas City, MO, 1999.

[17] I. Mani, and E. Bloedorn, Summarizing Similarities and Differences among Related Documents, Information Retrieval, Vol. 1, 1999, pp. 35-67.

[18] I. Mani, and M. T. Maybury (Eds), Advances in Automated Text Summarization, Cambridge, MA: The MIT Press, 1999.

[19] W. C. Mann and S. A. Thompson, Rhetorical Structure Theory: Toward a Functional Theory of Text Organization, Text, Vol. 8, No. 3, 1988, pp. 243-281.

[20] MEAD, http://www.summarization.com/mead/.

[21] ROUGE, http://www.isi.edu/~cyl/ROUGE/.

[22] F. Schilder, A. McCulloh, B. T. McInnes, and A. Zhou, TLR at DUC: Tree Similarity, Proc. of DUC 2005, Vancouver, Canada, 2005.

[23] Y. Seki, K. Eguchi, N. Kando, and M. Aono, Multi-Document Summarization with Subjectivity Analysis at DUC 2005, Proc. of DUC 2005, Vancouver, Canada, 2005.

[24] S. Ye, L. Qiu, T.-S. Chua and M.-Y. Kan, NUS at DUC 2005: Understanding Document via Concept Links, Proc. of DUC 2005, Vancouver, Canada, 2005.

[25] J. Y. Yeh, H. R. Ke, W. P. Yang, and I. H. Meng, Text Summarization Using a Trainable Summarizer and Latent Semantic Analysis, Information Processing & Management, Vol. 41, No. 1, 2005, pp. 75-95.

[26] Q. Zhou, L. Sun, and J. Y. Nie, IS_SUM: A Multi-Document Summarizer based on Document Index Graphic and Lexical Chains, Proc. of DUC 2005, Vancouver, Canada, 2005.
