
Multi-Layer Mutually Reinforced Random Walk with Hidden Parameters for Improved Multi-Party Meeting Summarization

Yun-Nung Chen and Florian Metze

School of Computer Science, Carnegie Mellon University 5000 Forbes Ave., Pittsburgh, PA 15213, USA

{yvchen, fmetze}@cs.cmu.edu

Abstract

This paper proposes an improved approach to summarization for spoken multi-party interaction, in which a multi-layer graph with hidden parameters is constructed. The graph includes utterance-to-utterance relations, utterance-to-parameter weights, and speaker-to-parameter weights. Each utterance and each speaker is represented as a node in the utterance-layer and speaker-layer of the graph respectively. We use terms/topics as hidden parameters for estimating utterance-to-parameter and speaker-to-parameter weights, and compute topical similarity between utterances as the utterance-to-utterance relation. By within- and between-layer propagation in the graph, the scores from different layers can be mutually reinforced, so that utterances automatically share scores with utterances from speakers who focus on similar terms/topics. For both ASR output and manual transcripts, experiments confirmed the efficacy of including hidden parameters and involving speaker information in the multi-layer graph for summarization. We find that choosing latent topics as hidden parameters significantly reduces computational complexity and does not hurt the performance.

Index Terms: summarization, multi-party meeting, mutual reinforcement, random walk

1. Introduction

Speech summarization is important for spoken or even multimedia documents, which are more difficult to browse than text, and has therefore been investigated in the past [1]. Recent work has been increasingly directed towards conversational speech such as telephone conversations and multi-party meetings [2, 3, 4, 5, 6]. In this work, we perform extractive summarization on the output of automatic speech recognition (ASR) and the corresponding manual transcripts of multi-party academic meeting recordings [7].

A general approach has been found to be very successful [8], in which each utterance in the document is represented as a sequence of terms, and the importance score of the utterance integrates, for each term in the utterance, a score from the grammatical structure of the utterance, statistical measures (such as TF-IDF), linguistic measures (e.g., POS tags), a confidence score, and an n-gram score. For each document, the utterances to be used in the summary are then selected based on this score.

Many approaches to text summarization focus on graph-based methods for computing the lexical centrality of utterances in order to extract summaries [9, 10]. Speech summarization carries intrinsic difficulties due to the presence of recognition errors, spontaneous speech effects, and lack of segmentation. In recent work, we proposed a graphical structure to rescore the importance scores of utterances, which can model the topical coherence between utterances using a random walk process [4, 11, 12, 13]. Unlike lectures and news, meeting recordings contain spoken multi-party interactions, so "speaker importance" scores can be added to the estimation of the importance of individual utterances [14]. However, the utterance-to-speaker relation is not easy to model [12], so this paper additionally includes a middle layer to provide common parameters between utterances and speakers. The proposed multi-layer mutually reinforced random walk can then compute the importance of hidden parameters and increase the scores of utterances that are similar to other utterances under this hidden-parameter modeling. It models intra- and inter-speaker topics together in the graph by automatically propagating scores from the utterance- and speaker-layers to the hidden-parameter-layer, improving meeting summarization [10, 12, 15].

Section 2 describes the construction of the multi-layer graph and the algorithm for computing the importance of utterances, integrating within- and between-layer propagation through hidden parameters. Section 3 shows the experimental results of applying the proposed approaches, evaluates the effectiveness of hidden parameters, and discusses the influence of parameter types and relation types for both ASR and manual transcripts. Section 4 concludes.

2. Proposed Approach

We first preprocess the utterances in all meetings by applying word stemming, stop word removal, and noise utterance filtering [16]. For extractive summarization, we set a cut-off ratio to retain only the most important utterances and form the summary of each document based on the "importance" of utterances. Thus, we formulate the utterance selection problem as computing the importance of each utterance. We then construct a multi-layer graph to compute the importance of utterances, speakers, and hidden parameters in the utterance-layer, speaker-layer, and hidden-parameter-layer respectively. In the multi-layer directed graph, each utterance is represented by a node in the utterance-layer, and the edges between these nodes are weighted by the topical similarity described in Section 2.4. Each speaker in the meeting is a node in the speaker-layer. The hidden parameters represent terms or latent topics, and the edges between different layers are weighted by the relation between the two nodes as described in Section 2.3.

The basic idea is that an utterance similar to more important utterances should itself be more important [11], so the importance of each utterance considers the scores propagated from other utterances, weighted by the similarity between them. In this approach, the propagated scores additionally consider speaker information, which is automatically modeled via hidden parameters in the graph. Figure 1 shows a simplified example of such a multi-layer graph with hidden parameters, in which there are a speaker-layer, an utterance-layer, and a hidden-parameter-layer.

Figure 1: A simplified example of the multi-layer graph with hidden parameters, where a speaker S_i is represented as a speaker node, an utterance U_j is represented as an utterance node, and a hidden parameter H_l is represented as a parameter node of the graph. There are three different types of edges, corresponding to the different relations (utterance-to-utterance, utterance-to-parameter, and speaker-to-parameter).

2.1. Parameters from Topic Model

Topic models such as probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA) have been widely used to analyze the semantics of documents based on a set of latent topics [17, 18]. Given a set of documents {d_j, j = 1, 2, ..., J} and all terms {t_i, i = 1, 2, ..., M} they include, the topic model uses a set of latent topic variables {T_k, k = 1, 2, ..., K} to characterize the "term-document" co-occurrence relationships. It can be optimized using the EM algorithm, by maximizing a likelihood function [17]. We utilize two parameters, latent topic significance (LTS) and latent topic entropy (LTE), in this paper [19].

Latent topic significance (LTS) for a given term t_i with respect to a topic T_k can be defined as

LTS_{t_i}(T_k) = \frac{\sum_{d_j \in D} n(t_i, d_j) P(T_k \mid d_j)}{\sum_{d_j \in D} n(t_i, d_j) \left[1 - P(T_k \mid d_j)\right]},   (1)

where n(t_i, d_j) is the occurrence count of term t_i in document d_j. Thus, a higher LTS_{t_i}(T_k) indicates that the term t_i is more significant for the latent topic T_k.

Latent topic entropy (LTE) for a given term t_i can be calculated from the topic distribution P(T_k | t_i),

LTE(t_i) = -\sum_{k=1}^{K} P(T_k \mid t_i) \log P(T_k \mid t_i),   (2)

where the topic distribution P(T_k | t_i) can be estimated from the topic model. LTE(t_i) measures how focused the term t_i is on a few topics, so a lower latent topic entropy implies the term carries more topical information.
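As a concrete illustration, both quantities can be computed directly from term counts n(t_i, d_j) and document-level topic posteriors P(T_k | d_j). The tiny corpus and distributions below are toy values invented for this sketch (a real system would take them from the trained topic model, and the estimation of P(T_k | t) here is one simple assumed choice):

```python
import math

# Toy data (hypothetical): term counts n(t, d) and topic posteriors P(T | d)
n = {("t1", "d1"): 3, ("t1", "d2"): 1, ("t2", "d1"): 0, ("t2", "d2"): 4}
docs = ["d1", "d2"]
p_topic_given_doc = {"d1": [0.9, 0.1], "d2": [0.2, 0.8]}  # K = 2 topics

def lts(term, k):
    """Latent topic significance of `term` for topic k, as in Eq. (1)."""
    num = sum(n[(term, d)] * p_topic_given_doc[d][k] for d in docs)
    den = sum(n[(term, d)] * (1 - p_topic_given_doc[d][k]) for d in docs)
    return num / den

def p_topic_given_term(term):
    """One assumed estimate of P(T_k | t): normalize the term's topic mass."""
    mass = [sum(n[(term, d)] * p_topic_given_doc[d][k] for d in docs)
            for k in range(2)]
    z = sum(mass)
    return [m / z for m in mass]

def lte(term):
    """Latent topic entropy of `term`, as in Eq. (2)."""
    return -sum(p * math.log(p) for p in p_topic_given_term(term) if p > 0)

# "t1" occurs mostly in d1 (topic 0), so it is more significant for topic 0,
# and its entropy stays below the 2-topic maximum log(2).
print(lts("t1", 0) > lts("t1", 1))  # True
print(lte("t1") < math.log(2))      # True
```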

2.2. Statistical Measures of a Term

The statistical measure of a term t_i, s(t_i, d), measures the importance of t_i, as TF-IDF does. In this work it is defined based on LTE(t_i) as s(t_i, d) = γ · n(t_i, d) / LTE(t_i), where γ is a scaling factor such that s(t_i, d) lies within the interval [0, 1]; the score s(t_i, d) is thus inversely proportional to the latent topic entropy LTE(t_i). This measure outperformed the very successful "significance score" [19, 8] in speech summarization, so we use the LTE-based statistical measure as our baseline.
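A minimal sketch of this measure, using made-up counts and entropies (the choice of γ as the reciprocal of the maximum raw score is one simple way to satisfy the [0, 1] constraint; the paper does not specify how γ is set):

```python
# Sketch of s(t, d) = gamma * n(t, d) / LTE(t), with toy illustration values.
counts = {"walk": 4, "graph": 2, "the": 6}          # n(t, d) in one document
entropy = {"walk": 0.4, "graph": 0.6, "the": 2.5}   # LTE(t): lower = more topical

raw = {t: counts[t] / entropy[t] for t in counts}
gamma = 1.0 / max(raw.values())  # scale so every s(t, d) lies in [0, 1]
s = {t: gamma * raw[t] for t in raw}

print(s["walk"])              # 1.0 — frequent and topically focused
print(s["the"] < s["graph"])  # True: high-entropy function words score low
```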

2.3. Between-Layer Relation via Hidden Parameters

Given a set of utterances {U_i, i = 1, 2, ..., |U|} and a set of speakers {S_j, j = 1, 2, ..., |S|}, where a speaker node in the graph is represented by the combination of all utterances from that speaker, we choose two different types of hidden parameters {H_l, l = 1, 2, ..., |H|}, terms in the vocabulary and latent topics from the topic model, as the middle layer of the graph, because they are common parameters shared with the other two layers. We derive the weights of the between-layer relations as the significance of the hidden parameters given the utterance or the speaker.

2.3.1. Term Layer

We use terms from the vocabulary to represent the hidden parameters, and the weight is computed as follows:

Lex(U_i, H_l) = \frac{1}{|U_i|} \sum_{t \in U_i} idf_t,   (3)

where |H| is set to M, the vocabulary size, and idf_t is the inverse document frequency (IDF) of term t. Hence Lex(U_i, H_l) is the average TF-IDF of the terms in the utterance U_i. Lex(S_j, H_l) can be derived in a similar way.

2.3.2. Topic Layer

We use the trained topic model to represent each latent topic as a node in the layer of hidden parameters. Based on latent topics, which are the common parameters shared by utterances and speakers, we can weight the between-layer edges as follows:

Topic(U_i, H_l) = \frac{1}{|U_i|} \sum_{t \in U_i} LTS_t(T_l),   (4)

where |H| is set to K, the number of latent topics. Topic(S_j, H_l) can be computed in a similar way.

2.4. Within-Layer Relation via Similarity

Considering that word overlap between utterances may be sparse due to recognition errors, topical similarity via topic models can be more informative than lexical similarity. Hence, we compute the similarity between utterances based on the topical distribution.

Within a document d, we can first compute the probability that the topic T_k is addressed by an utterance U_i,

P(T_k \mid U_i) = \frac{\sum_{t \in U_i} n(t, U_i) P(T_k \mid t)}{\sum_{t \in U_i} n(t, U_i)}.   (5)

Then an asymmetric topical similarity Sim(U_i, U_j) for utterances U_i to U_j (with direction U_i → U_j) can be defined by


accumulating LTS_t(T_k) in (1), weighted by P(T_k | U_i), for all terms t in U_j over all latent topics,

Sim(U_i, U_j) = \sum_{t \in U_j} \sum_{k=1}^{K} LTS_t(T_k) P(T_k \mid U_i),   (6)

where the idea is similar to the generative probability in information retrieval. We call this the generative significance of U_i given U_j.

2.5. Multi-Layer Mutually Reinforced Random Walk

For each document d, we construct a directed multi-layer graph G containing an utterance set, a speaker set, and a hidden parameter set to compute the importance of each utterance: G = \langle V_U, V_S, V_H, E_{UU}, E_{UH}, E_{SH} \rangle, where V_U = {U_i ∈ d}, V_S = {S_i ∈ d}, V_H = {H_i}, E_{UU} = {e_{ij} | U_i, U_j ∈ V_U}, E_{UH} = {e_{ij} | U_i ∈ V_U, H_j ∈ V_H}, and E_{SH} = {e_{ij} | S_i ∈ V_S, H_j ∈ V_H}. E_{UU}, E_{UH}, and E_{SH} correspond to the utterance-to-utterance, utterance-to-parameter, and speaker-to-parameter relations respectively [10].
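The asymmetric similarity in (6) can be sketched directly, assuming the LTS values from (1) and the topic posteriors from (5) are already available. All numbers below are toy values invented for the example:

```python
# Toy values (hypothetical): K = 2 topics.
LTS = {"graph": [2.0, 0.1], "walk": [1.5, 0.2], "lunch": [0.1, 1.8]}
# P(T_k | U_i) for two utterances, as in Eq. (5)
p_topic = {"U1": [0.9, 0.1], "U2": [0.2, 0.8]}
utterance_terms = {"U1": ["graph", "walk"], "U2": ["lunch"]}

def sim(ui, uj):
    """Generative significance of U_i given U_j, Eq. (6); note the asymmetry:
    the terms come from U_j but the topic posterior comes from U_i."""
    return sum(LTS[t][k] * p_topic[ui][k]
               for t in utterance_terms[uj]
               for k in range(2))

print(sim("U1", "U2"))                     # U1's topics poorly "generate" U2
print(sim("U2", "U1") > sim("U1", "U2"))   # asymmetric: True
```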

We compute W_{UU} = [w_{U_i,U_j}]_{|V_U|×|V_U|}, where w_{U_i,U_j} is from Sim(U_i, U_j); W_{UH} = [w_{U_i,H_j}]_{|V_U|×|V_H|}, where w_{U_i,H_j} is either from Topic(U_i, H_j) or Lex(U_i, H_j); and similarly W_{SH} = [w_{S_i,H_j}]_{|V_S|×|V_H|}, where w_{S_i,H_j} is either from Topic(S_i, H_j) or Lex(S_i, H_j). Row-normalization and column-normalization are applied to obtain the normalized affinity matrices L_{UU}, L_{UH}, and L_{SH} [20].
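Row-normalization, for instance, can be sketched as follows (column-normalization is analogous, applied to the transpose); the matrix values are arbitrary illustration weights:

```python
def row_normalize(W):
    """Divide each row by its sum so outgoing weights form a distribution;
    an all-zero row (e.g., an utterance with no neighbors) is left as zeros."""
    out = []
    for row in W:
        s = sum(row)
        out.append([w / s for w in row] if s else [0.0] * len(row))
    return out

W_UU = [[0.0, 2.0, 2.0], [1.0, 0.0, 3.0], [0.0, 0.0, 0.0]]  # toy affinities
L_UU = row_normalize(W_UU)
print(L_UU[0])  # [0.0, 0.5, 0.5]
```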

A traditional random walk integrates the original scores and the scores propagated from other nodes [11, 14, 21]. Here the proposed approach additionally considers the speaker information and integrates importance propagated from speaker nodes via hidden parameters to automatically model the intra- and inter-speaker relations. We use a mutually reinforced random walk to propagate the scores based on external mutual reinforcement between different layers and internal importance propagation within a layer. The algorithm is detailed as follows.





F_H^{(t+1)} = (1 - 2\alpha) F_H^{(0)} + \alpha \cdot L_{UH}^T F_U^{(t)} + \alpha \cdot L_{SH}^T F_S^{(t)}
F_U^{(t+1)} = (1 - 2\alpha) F_U^{(0)} + 2\alpha \cdot L_{UU}^T L_{UH} F_H^{(t)}   (7)
F_S^{(t+1)} = (1 - 2\alpha) F_S^{(0)} + 2\alpha \cdot L_{SH} F_H^{(t)}

where F_H^{(t)}, F_U^{(t)}, and F_S^{(t)} denote the importance scores of the hidden parameter set V_H, the utterance set V_U, and the speaker set V_S in the t-th iteration respectively.

In the algorithm, these are interpolations of the initial importance and the scores propagated from another layer, where F_H^{(t)} integrates the scores propagated from both the utterance-layer and the speaker-layer to measure the importance of each hidden parameter. For the utterance set, L_{UU}^T L_{UH} F_H^{(t)} is the score propagated from the hidden-parameter-layer according to the utterance-to-parameter relation and then weighted by the utterance-to-utterance similarity. Similarly, nodes in the speaker-layer also include the scores propagated from the hidden-parameter-layer, but without within-layer propagation, because the speaker-to-speaker relation cannot be estimated accurately, and setting a uniform distribution for that relation would change the results little.
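The update in (7) can be sketched on a tiny toy graph with 3 utterances, 2 hidden parameters, and 2 speakers. The affinity values below are invented and assumed already normalized; this is an illustration of the propagation scheme, not the paper's implementation:

```python
ALPHA = 0.45  # the paper's setting, so (1 - 2*ALPHA) = 0.1 is the damping term

L_UU = [[0.0, 0.5, 0.5], [1.0, 0.0, 0.0], [0.5, 0.5, 0.0]]  # utterance-to-utterance
L_UH = [[0.7, 0.3], [0.4, 0.6], [0.2, 0.8]]                  # utterance-to-parameter
L_SH = [[0.6, 0.4], [0.1, 0.9]]                              # speaker-to-parameter

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

# Uniform initial scores, as in the paper's initialization
F_U0 = [1 / 3] * 3; F_H0 = [1 / 2] * 2; F_S0 = [1 / 2] * 2
F_U, F_H, F_S = F_U0[:], F_H0[:], F_S0[:]

for _ in range(100):  # iterate the three coupled updates of Eq. (7)
    new_H = [(1 - 2 * ALPHA) * h0 + ALPHA * (u + s)
             for h0, u, s in zip(F_H0,
                                 matvec(transpose(L_UH), F_U),
                                 matvec(transpose(L_SH), F_S))]
    new_U = [(1 - 2 * ALPHA) * u0 + 2 * ALPHA * p
             for u0, p in zip(F_U0, matvec(transpose(L_UU), matvec(L_UH, F_H)))]
    new_S = [(1 - 2 * ALPHA) * s0 + 2 * ALPHA * p
             for s0, p in zip(F_S0, matvec(L_SH, F_H))]
    F_H, F_U, F_S = new_H, new_U, new_S

print([round(f, 3) for f in F_U])  # converged utterance importance scores
```

The converged F_U would then be used to rank utterances for extraction.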

Then F_H^{(t+1)}, F_U^{(t+1)}, and F_S^{(t+1)} can be mutually updated by the latter parts of (7) iteratively. The algorithm converges, and (8) is then satisfied [10].





F_H = (1 - 2\alpha) F_H^{(0)} + \alpha \cdot L_{UH}^T F_U + \alpha \cdot L_{SH}^T F_S
F_U = (1 - 2\alpha) F_U^{(0)} + 2\alpha \cdot L_{UU}^T L_{UH} F_H   (8)
F_S = (1 - 2\alpha) F_S^{(0)} + 2\alpha \cdot L_{SH} F_H

We can solve for F_H as below:

F_H = (1 - 2\alpha) F_H^{(0)}
    + \alpha \cdot L_{UH}^T \left[ (1 - 2\alpha) F_U^{(0)} + 2\alpha \cdot L_{UU}^T L_{UH} F_H \right]
    + \alpha \cdot L_{SH}^T \left[ (1 - 2\alpha) F_S^{(0)} + 2\alpha \cdot L_{SH} F_H \right]
  = (1 - 2\alpha) F_H^{(0)}
    + \alpha (1 - 2\alpha) \cdot L_{UH}^T F_U^{(0)} + \alpha (1 - 2\alpha) \cdot L_{SH}^T F_S^{(0)}
    + 2\alpha^2 \cdot L_{UH}^T L_{UU}^T L_{UH} F_H + 2\alpha^2 \cdot L_{SH}^T L_{SH} F_H
  = \left[ (1 - 2\alpha) F_H^{(0)} e^T
    + \alpha (1 - 2\alpha) \cdot (L_{UH}^T F_U^{(0)} e^T + L_{SH}^T F_S^{(0)} e^T)
    + 2\alpha^2 \cdot (L_{UH}^T L_{UU}^T L_{UH} + L_{SH}^T L_{SH}) \right] F_H
  = M F_H,   (9)

where e = [1, 1, ..., 1]^T; the third equality uses the fact that e^T F_H = 1, since F_H is normalized to sum to one. It has been shown that the closed-form solution F_H of (9) is the dominant eigenvector of M [22], i.e., the eigenvector corresponding to the largest absolute eigenvalue of M. Then we can compute the solution of F_U using (8), which gives the updated importance scores for all utterances. Similar to PageRank [23], the solution can also be obtained by iteratively updating F_H^{(t)}, F_U^{(t)}, and F_S^{(t)}.

We set F_U^{(0)} to the baseline scores after normalization such that they sum to 1, F_H^{(0)} = e/|V_H|, and F_S^{(0)} = e/|V_S|, which means we assume all hidden parameters and speakers in the document have equal importance at the beginning.

3. Experiments

3.1. Corpus

The corpus used here is a sequence of academic meetings, which feature largely overlapping participant sets and topics of discussion. For each meeting, SmartNotes was used to record both the audio from each participant and the notes [3]. The meetings were transcribed both manually and using a speech recognizer; the word error rate is around 44%. In this paper we use 10 meetings held from April to June of 2006. On average, each meeting had about 28 minutes of speech. Across these 10 meetings, there were 6 unique participants; each meeting featured between 2 and 4 of these participants (average: 3.7). The total number of utterances is 9837 across the 10 meetings. We empirically set α = 0.45 for the unsupervised experiments because (1 − 2 × 0.45) is a proper damping factor [23, 21]. Note that in previous approaches, α is set to 0.9 such that the damping factor is (1 − 0.9). We use PLSA as our topic model and set the number of topics to 32.

The reference summaries are given by the set of "noteworthy utterances": two annotators manually labelled the degree (three levels) of "noteworthiness" for each utterance, and we extract the utterances with the highest level of "noteworthiness" to form the summary of each meeting. Note that this experiment does not consider redundancy of information but focuses on the importance of utterances; after performing the algorithm, iteratively selecting utterances based on redundancy can produce the final summarized results. In the following experiments, for each meeting, we extract about 10% and 20% of the number of terms as the shorter summaries, considering reasonable ratios for meeting data; this differs from previous experiments [12], where the ratio is set to 30%.
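The cut-off-ratio selection can be sketched as follows: rank utterances by importance score and keep the top ones until roughly the target fraction of the document's terms is covered. The utterance IDs, scores, and term counts below are hypothetical, and the greedy stopping rule is one simple assumed interpretation of the cut-off:

```python
# (id, importance score, number of terms) — toy values
utterances = [("U1", 0.9, 12), ("U2", 0.4, 8), ("U3", 0.7, 5), ("U4", 0.2, 15)]
ratio = 0.10
budget = ratio * sum(n_terms for _, _, n_terms in utterances)  # 10% of 40 terms

summary, used = [], 0
for uid, score, n_terms in sorted(utterances, key=lambda u: -u[1]):
    if used >= budget:
        break
    summary.append(uid)
    used += n_terms

print(summary)  # ['U1'] — the single top-scored utterance already fills 10%
```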


Table 1: The results of all proposed approaches and the maximum relative improvement with respect to the baseline (F-measure, %).

                                         |       10% Summary       |       20% Summary
                                         |    ASR    |   Manual    |    ASR    |   Manual
                                         |  R-1   R-L|  R-1   R-L  |  R-1   R-L|  R-1   R-L
(a) Baseline: LTE                        | 44.27 43.32 43.10 41.99 | 44.73 44.11 42.30 41.68
(b) Two-Layer MRRW-WBP (LexSim)          | 45.82 44.82 44.89 44.00 | 45.64 44.78 43.92 43.26
(c) Two-Layer MRRW-WBP (TopicSim)        | 46.53 45.77 44.46 43.57 | 45.18 44.34 43.95 43.20
(d) Multi-Layer MRRW-Term                | 50.36 49.62 49.36 48.42 | 48.02 47.35 45.69 44.95
(e) Multi-Layer MRRW-Topic               | 50.00 49.16 48.68 47.82 | 48.35 47.78 46.69 45.99
Max Relative Improvement                 |+13.76 +14.54 +14.52 +15.31 | +8.09 +8.32 +10.38 +10.34

3.2. Evaluation Metrics

Our automated evaluation utilizes the standard DUC (Document Understanding Conference) evaluation metric, ROUGE [24], which computes n-gram statistics of a system-generated summary against a set of human-generated summaries. We report F-measures for ROUGE-1 (unigram; R-1) and ROUGE-L (longest common subsequence; R-L), evaluated in exactly the same way for all approaches.
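For intuition, a much-simplified single-reference ROUGE-1 F-measure can be computed from clipped unigram overlap (the official ROUGE toolkit [24] additionally handles stemming, stop words, and multiple references):

```python
from collections import Counter

def rouge1_f(candidate, reference):
    """Simplified ROUGE-1 F-measure: clipped unigram overlap between summaries."""
    c, r = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((c & r).values())  # & keeps the minimum count per unigram
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

print(rouge1_f("the meeting discussed the budget",
               "the budget meeting"))  # 0.75
```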

3.3. Results

Table 1 shows the performance achieved by all proposed approaches. Row (a) is the baseline, which uses the LTE-based statistical measure to compute the importance of utterances. Row (b) is the result after applying the two-layer mutually reinforced random walk (MRRW) with within- and between-layer propagation (WBP), which uses lexical similarity to measure the utterance-to-utterance relation [12]. Row (c) is the same as row (b) except that it uses topical similarity for the utterance-to-utterance relation. Rows (d) and (e) are the results of the proposed multi-layer MRRW with terms (Term) and latent topics (Topic) as hidden parameters respectively.

3.3.1. Effectiveness of Hidden Parameters

We can see that all multi-layer graphs with hidden parameters (rows (d)-(e)) significantly outperform the two-layer approaches (rows (b)-(c)) for both ASR and manual transcripts. The largest improvement comes from multi-layer MRRW-Term (row (d)) for the 10% summary and from multi-layer MRRW-Topic (row (e)) for the 20% summary. The results confirm that hidden parameters carry more information and help summarization.

3.3.2. Comparing Types of Hidden Parameters

For the shorter summary, there is no obvious difference between using lexical and topical similarity in the two-layer approaches (rows (b)-(c)). Therefore, we analyze the difference between choosing terms (similar to lexical information) and latent topics (similar to topical information) as hidden parameters.

When using terms as hidden parameters, we have about 2000 nodes in the middle layer. The selected terms correspond to words in the vocabulary, which are collected from the corpus after word stemming and stop word removal. After performing the algorithm, each term in the hidden-parameter-layer has a score indicating the term's significance in terms of utterances and speakers. Then the importance of each utterance integrates the scores propagated from other utterances and from the terms the utterance contains, where the latter carry speaker information. Using terms as parameters allows scores to be computed more accurately, and more accurate scores are better for extracting the shorter summary, so that MRRW-Term (row (d)) performs better than MRRW-Topic (row (e)) for the 10% summary.

In the case of latent topics as hidden parameters, the size of the middle layer is equal to the number of topics, which is 32. The experiments were also performed with different numbers of topics (16, 32, 64, and 128), but the different settings do not influence the results much, which means that topic models can still capture the most important topics in all settings. With the proposed algorithm, the score of each latent topic considers the utterance-to-parameter and speaker-to-parameter relations to indicate the topic's significance in the multi-layer graph. With latent topics as parameters, scores can be computed more generally: topic models capture not lexically but conceptually similar information, so using latent topics as parameters is more suitable for extracting a 20% summary. Hence the results show that MRRW-Topic (row (e)) performs better than MRRW-Term (row (d)).

3.3.3. Computational Complexity Reduction

MRRW-Term includes a larger hidden-parameter-layer, and the proposed algorithm requires computing the dominant eigenvector of a 2000 × 2000 matrix M in (9), which is computationally expensive. MRRW-Topic can use a smaller hidden-parameter-layer, whose size is the number of topics (a constant), by modeling similar terms in the same latent topic, to reduce the computational complexity. In our experiments, we were able to significantly reduce the computational complexity without hurting the performance much for the 10% summary, while even achieving an improvement for the 20% summary. This shows that the proposed algorithm is effective and can be applied in a practical way.

4. Conclusions and Future Work

This paper presents the results of extensive experiments, which show that a multi-layer mutually reinforced random walk with hidden parameters can model the importance of utterances and speakers through hidden parameters in the multi-layer graph. The speaker information can be automatically included in the utterance importance by between-layer propagation, achieving about 13% and 8% relative improvement compared to the LTE baseline for the shorter summary of ASR and manual transcripts respectively. Using latent topics as hidden parameters does not hurt the performance but reduces the computational complexity, showing the practicality and effectiveness of the proposed algorithm. In the future, we plan to model additional parameters such as prosodic features and to integrate different types of features in a single multi-layer graph.


5. References

[1] L.-S. Lee and B. Chen, “Spoken document understanding and organization,” IEEE Signal Processing Magazine, vol. 22, no. 5, pp. 42–60, 2005.

[2] D. Harwath and T. J. Hazen, “Topic identification based extrinsic evaluation of summarization techniques applied to conversational speech,” in Proceedings of IEEE ICASSP, 2012.

[3] S. Banerjee and A. I. Rudnicky, "An extractive-summarization baseline for the automatic detection of noteworthy utterances in multi-party human-human dialog," in Proceedings of IEEE SLT, 2008.

[4] Y.-N. Chen and F. Metze, "Intra-speaker topic modeling for improved multi-party meeting summarization with integrated random walk," in Proceedings of NAACL-HLT, 2012, pp. 382–385.

[5] F. Liu and Y. Liu, “Using spoken utterance compression for meet- ing summarization: A pilot study,” in Proceedings of IEEE SLT, 2010.

[6] P.-Y. Hsueh and J. D. Moore, “Improving meeting summariza- tion by focusing on user needs: a task-oriented evaluation,” in Proceedings of Intelligent User Interfaces, 2009.

[7] Y. Liu, S. Xie, and F. Liu, “Using N-best recognition output for extractive summarization and keyword extraction in meeting speech,” in Proceedings of IEEE ICASSP, 2010.

[8] S. Furui, T. Kikuchi, Y. Shinnaka, and C. Hori, "Speech-to-text and speech-to-speech summarization of spontaneous speech," IEEE Transactions on Speech and Audio Processing, vol. 12, no. 4, pp. 401–408, 2004.

[9] G. Erkan and D. R. Radev, "LexRank: Graph-based lexical centrality as salience in text summarization," Journal of Artificial Intelligence Research, vol. 22, pp. 457–479, 2004.

[10] X. Cai and W. Li, "Mutually reinforced manifold-ranking based relevance propagation model for query-focused multi-document summarization," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, pp. 1597–1607, 2012.

[11] Y.-N. Chen, Y. Huang, H.-Y. Lee, and L.-S. Lee, “Spoken lecture summarization by random walk over a graph constructed with au- tomatically extracted key terms,” in Proceedings of InterSpeech, 2011.

[12] Y.-N. Chen and F. Metze, “Two-layer mutually reinforced ran- dom walk for improved multi-party meeting summarization,” in Proceedings of IEEE SLT, 2012.

[13] F. Lin, Scalable Methods for Graph-Based Unsupervised and Semi-Supervised Learning, Ph.D. thesis, Carnegie Mellon Uni- versity, 2012.

[14] Y.-N. Chen and F. Metze, “Integrating intra-speaker topic mod- eling and temporal-based inter-speaker topic modeling in random walk for improved multi-party meeting summarization,” in Pro- ceedings of InterSpeech, 2012.

[15] N. Garg, B. Favre, K. Reidhammer, and D. Hakkani-Tür, "ClusterRank: A graph based method for meeting summarization," in Proceedings of InterSpeech, 2009.

[16] M. F. Porter et al., "An algorithm for suffix stripping," Program, 1980, http://tartarus.org/~martin/PorterStemmer/.

[17] T. Hofmann, “Probabilistic latent semantic indexing,” in Proceed- ings of SIGIR, 1999.

[18] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," Journal of Machine Learning Research, vol. 3, pp. 993–1022, 2003.

[19] S.-Y. Kong and L.-S. Lee, “Semantic analysis and organization of spoken documents based on parameters derived from latent top- ics,” IEEE Transactions on Audio, Speech and Language Pro- cessing, vol. 19, no. 7, pp. 1875–1889, 2011.

[20] J. Shi and J. Malik, "Normalized cuts and image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888–905, 2000.

[21] W. Hsu and L. Kennedy, “Video search reranking through random walk over document-level context graph,” in Proceedings of MM, 2007.

[22] A. Langville and C. Meyer, “A survey of eigenvector methods for web information retrieval,” SIAM Review, 2005.

[23] S. Brin and L. Page, “The anatomy of a large-scale hypertextual web search engine,” in Proceedings of WWW, 1998.

[24] C. Lin, "ROUGE: A package for automatic evaluation of summaries," in Workshop on Text Summarization Branches Out, 2004.
