
Prosody-Based Unsupervised Speech Summarization with Two-Layer Mutually Reinforced Random Walk

(1)

Sujay Kumar Jauhar

Yun-Nung (Vivian) Chen

Florian Metze

The 6th International Joint Conference on Natural Language Processing, Oct. 14-18, 2013

{sjauhar, yvchen, fmetze}@cs.cmu.edu

Language Technologies Institute, School of Computer Science, Carnegie Mellon University

(2)

Outline

• Introduction
• Approach
• Experiments
• Conclusion

(3)

Outline

• Introduction
  ▫ Motivation
  ▫ Extractive Summarization
• Approach
• Experiments
• Conclusion


(5)

Motivation

• Speech Summarization
  ▫ Spoken documents are more difficult to browse than texts
  ▫ A summary is easy to browse, saves time, and makes it easy to get the key points
• Prosodic Features
  ▫ Speakers may use prosody to implicitly convey the importance of the speech


(7)

Extractive Summarization (1/2)

• Extractive Speech Summarization
  ▫ Select the indicative utterances in a spoken document
  ▫ Cascade the utterances to form a summary

[Figure: utterances 1 to n of a spoken document; the selected utterances are cascaded into the extractive summary]

(8)

Extractive Summarization (2/2)

• Selection of Indicative Utterances
  ▫ Each utterance U in a spoken document d is given an importance score I(U, d)
  ▫ Select the indicative utterances based on I(U, d)
  ▫ The number of utterances selected as the summary is decided by a predefined ratio

I(U, d) = Σ_{i=1}^{n} s(t_i, d), where U = t_1 t_2 ... t_n

  ▫ t_i: a term in the utterance
  ▫ s(t_i, d): a term statistical measure (e.g. TF-IDF)
  ▫ I(U, d): the importance score
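The scoring and selection above can be sketched in Python. This is a minimal sketch: `tf_idf_scores` and `summarize` are hypothetical helper names, and the TF-IDF variant shown is one common choice, not necessarily the exact measure the authors used.

```python
import math
from collections import Counter

def tf_idf_scores(docs):
    """Compute a TF-IDF score s(t, d) for each term t in each tokenized document d."""
    n_docs = len(docs)
    df = Counter()
    for d in docs:
        df.update(set(d))  # document frequency of each term
    return [{t: tf * math.log(n_docs / df[t]) for t, tf in Counter(d).items()}
            for d in docs]

def summarize(utterances, term_scores, ratio=0.3):
    """Score each utterance as the sum of its term scores I(U, d),
    then keep the top `ratio` of utterances, cascaded in original order."""
    importance = [sum(term_scores.get(t, 0.0) for t in u) for u in utterances]
    k = max(1, round(ratio * len(utterances)))
    top = sorted(range(len(utterances)), key=lambda i: importance[i], reverse=True)[:k]
    return sorted(top)  # original order, so the summary reads as a cascade
```

Returning indices rather than text keeps the sketch agnostic to how utterances are stored; the caller can join the selected utterances to produce the summary.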

(9)

Outline

• Introduction
• Approach
  ▫ Prosodic Feature Extraction
  ▫ Graph Construction
  ▫ Two-Layer Mutually Reinforced Random Walk
• Experiments
• Conclusion


(11)

Prosodic Feature Extraction

• For each pre-segmented audio file, we extract:
  ▫ number of syllables
  ▫ number of pauses
  ▫ duration time: speaking time including pauses
  ▫ phonation time: speaking time excluding pauses
  ▫ speaking rate: #syllables / duration time
  ▫ articulation rate: #syllables / phonation time
  ▫ fundamental frequency (Hz): avg, max, min
  ▫ energy (Pa²/sec)
  ▫ intensity (dB)
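As a small illustration of how the derived rate features relate to the basic counts, here is a sketch (function and argument names are mine; pitch, energy, and intensity would normally come from an acoustic analysis tool such as Praat, which is my assumption, not something the slides state):

```python
def prosodic_rates(n_syllables, n_pauses, duration_s, pause_time_s):
    """Derive the rate features from raw counts; times are in seconds."""
    phonation_s = duration_s - pause_time_s  # speaking time excluding pauses
    return {
        "n_syllables": n_syllables,
        "n_pauses": n_pauses,
        "duration": duration_s,                          # including pauses
        "phonation": phonation_s,                        # excluding pauses
        "speaking_rate": n_syllables / duration_s,       # #syllables / duration time
        "articulation_rate": n_syllables / phonation_s,  # #syllables / phonation time
    }
```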


(13)

Graph Construction (1/3)

• Utterance-Layer
  ▫ Each node is an utterance in the meeting document

[Figure: Utterance-Layer with nodes U1-U7]

(14)

Graph Construction (2/3)

• Utterance-Layer
  ▫ Each node is an utterance in the meeting document
• Prosody-Layer
  ▫ Each node is a prosodic feature

[Figure: Utterance-Layer (U1-U7) and Prosody-Layer (P1-P6)]

(15)

Graph Construction (3/3)

• Utterance-Layer
  ▫ Each node is an utterance in the meeting document
• Prosody-Layer
  ▫ Each node is a prosodic feature
• Between-Layer Relation
  ▫ The weight of the edge is the normalized value of the prosodic feature extracted from the utterance

[Figure: between-layer edges connecting U1-U7 to P1-P6]
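A minimal sketch of the between-layer weight matrix, assuming min-max normalization per feature (the slides say "normalized value" without specifying the scheme, so that choice is mine):

```python
import numpy as np

def between_layer_weights(feature_matrix):
    """Rows are utterances, columns are prosodic features.

    Scale each feature column to [0, 1] so edge weights are comparable
    across features with different units (Hz, dB, seconds, ...).
    """
    X = np.asarray(feature_matrix, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero for constant features
    return (X - lo) / span
```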


(17)

Two-Layer Mutually Reinforced Random Walk (1/2)

• Mathematical Formulation

F_U^{(t+1)} = (1 − α) F_U^{(0)} + α L_{UP} F_P^{(t)}
F_P^{(t+1)} = (1 − α) F_P^{(0)} + α L_{PU} F_U^{(t)}

  ▫ F_U^{(t+1)}: utterance scores at the (t+1)-th iteration
  ▫ F_U^{(0)}: original importance of utterances
  ▫ L_{UP} F_P^{(t)}: scores propagated from prosody nodes, weighted by prosodic values
  ▫ F_P^{(t+1)}: prosody scores at the (t+1)-th iteration
  ▫ F_P^{(0)}: original importance of prosodic features
  ▫ L_{PU} F_U^{(t)}: scores propagated from utterances, weighted by prosodic values

• Original importance
  ▫ Utterance: equal weight
  ▫ Prosody: equal weight

[Figure: two-layer graph with utterance nodes U1-U7 (Utterance-Layer) and prosody nodes P1-P6 (Prosody-Layer)]
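The mutually reinforced updates can be sketched as a fixed-point iteration over the between-layer weight matrix. The row-normalization of the propagation matrices and the fixed iteration count are my assumptions about details the slides leave implicit:

```python
import numpy as np

def two_layer_mrrw(W, alpha=0.9, n_iter=100):
    """Two-layer mutually reinforced random walk over a between-layer matrix W.

    W[i][j] is the normalized value of prosodic feature j for utterance i.
    Returns (utterance_scores, prosody_scores) after n_iter interpolated updates.
    """
    W = np.asarray(W, dtype=float)
    n_u, n_p = W.shape
    L_up = W / W.sum(axis=1, keepdims=True)      # propagate prosody scores to utterances
    L_pu = W.T / W.T.sum(axis=1, keepdims=True)  # propagate utterance scores to prosody
    F_u0 = np.full(n_u, 1.0 / n_u)  # original importance: equal weight
    F_p0 = np.full(n_p, 1.0 / n_p)
    F_u, F_p = F_u0.copy(), F_p0.copy()
    for _ in range(n_iter):
        # simultaneous update: both right-hand sides use the scores from iteration t
        F_u, F_p = ((1 - alpha) * F_u0 + alpha * (L_up @ F_p),
                    (1 - alpha) * F_p0 + alpha * (L_pu @ F_u))
    return F_u, F_p
```

With α = 0.9 (the setting used in the experiments), 90% of each score comes from the other layer and 10% from the uniform prior, so the two layers dominate each other's final ranking.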


(24)

Two-Layer Mutually Reinforced Random Walk (2/2)

• Mathematical Formulation
  ▫ An utterance node U gets a higher score when more important prosodic features have higher weights corresponding to U
  ▫ A prosody node P gets a higher score when more important utterances have higher weights corresponding to P
  → The model learns important utterances and prosodic features in an unsupervised fashion

(25)

Outline

• Introduction
• Approach
• Experiments
  ▫ Experimental Setup
  ▫ Evaluation Metrics
  ▫ Results
  ▫ Analysis
• Conclusion


(27)

Experimental Setup

• CMU Speech Meeting Corpus
  ▫ 10 meetings from 2006/04 to 2006/06
  ▫ #Speakers: 6 in total, 2-4 per meeting
  ▫ WER = 44%
• Reference Summaries
  ▫ Manually labeled by two annotators with three "noteworthiness" levels (1-3)
  ▫ Utterances with level 3 are extracted as reference summaries
• Parameter Setting
  ▫ α = 0.9
  ▫ Extractive summary ratio = 10%, 20%, 30%


(29)

Evaluation Metrics

• ROUGE
  ▫ ROUGE-1: F-measure of matched unigrams between the extracted summary and the reference summary
  ▫ ROUGE-L: F-measure of the matched longest common subsequence (LCS) between the extracted summary and the reference summary
• Average Relevance Score
  ▫ Average noteworthiness score of the extracted utterances
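The two ROUGE variants can be sketched as follows; this is a simplified version of what the ROUGE toolkit computes, and `rouge1_f` and `lcs_len` are my names:

```python
from collections import Counter

def rouge1_f(candidate, reference):
    """ROUGE-1: F-measure over matched unigrams of two token lists."""
    c, r = Counter(candidate), Counter(reference)
    overlap = sum((c & r).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

def lcs_len(x, y):
    """Length of the longest common subsequence, the basis of ROUGE-L."""
    dp = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i, xi in enumerate(x, 1):
        for j, yj in enumerate(y, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if xi == yj else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(x)][len(y)]
```

ROUGE-L then plugs the LCS length into the same precision/recall/F-measure scheme, using candidate and reference lengths as the denominators.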


(31)

Baselines

• Longest: the longest utterances based on #tokens
• Begin: the utterances that appear at the beginning
• Latent Topic Entropy (LTE)
  ▫ Estimates the "focus" of an utterance
  ▫ Lower topic entropy indicates a more topically informative utterance
• TFIDF: the average TF-IDF score of all words in the utterance
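The LTE baseline reduces to an entropy over an utterance's latent-topic distribution; the sketch below assumes that distribution (e.g. from a topic model such as PLSA) is already computed and normalized, which the slides do not detail:

```python
import math

def latent_topic_entropy(topic_dist):
    """Entropy of a topic distribution; a lower value means the utterance
    is focused on fewer topics, i.e. more topically informative."""
    return -sum(p * math.log(p) for p in topic_dist if p > 0)
```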

(32)

Results

[Figure: Avg. Relevance, ROUGE-1, and ROUGE-L for Longest, Begin, LTE, TFIDF, and Proposed at the 10% summary ratio]

For 10% summaries, Begin performs best and the proposed approach performs comparably.

(33)

Results

[Figure: Avg. Relevance, ROUGE-1, and ROUGE-L for Longest, Begin, LTE, TFIDF, and Proposed at the 20% summary ratio]

For 20% summaries, the proposed approach outperforms all of the baselines.

(34)

Results

[Figure: Avg. Relevance, ROUGE-1, and ROUGE-L for Longest, Begin, LTE, TFIDF, and Proposed at the 30% summary ratio]

For 30% summaries, the proposed approach outperforms all of the baselines.


(36)

Analysis

• Based on the converged scores for prosodic features
  ▫ Predictive features: number of pauses, min pitch, avg pitch, intensity
  ▫ Least predictive features: duration time, number of syllables, energy

(37)

Conclusion

• The two-layer mutually reinforced random walk integrates prosodic knowledge into an unsupervised model for speech summarization
• This is the first attempt at performing unsupervised speech summarization without using lexical information
• The proposed approach outperforms the lexically derived baselines in all but one scenario

▫ Not only the sentences with high importance score based on statistical measure should be considered as indicative sentence... Proposed