
Automatic Key Term Extraction from Spoken Course Lectures


(1)

Using Branching Entropy and Prosodic/Semantic Features

Yun-Nung (Vivian) Chen, Yu Huang, Sheng-Yi Kong, Lin-Shan Lee

National Taiwan University, Taiwan

(2)

Introduction

(3)

Definition

Key Term
• Higher term frequency
• Core content

Two types
• Keyword
• Key phrase

Advantage
• Indexing and retrieval
• The relations between key terms and segments of documents

(4)

Introduction

(5)

Introduction

[Diagram: example key terms — "acoustic model", "language model", "hmm", "n-gram", "phone", "hidden Markov model" — indexing segments of lectures]

(6)

Introduction

[Diagram: example key terms — "hmm", "acoustic model", "language model", "n-gram", "phone", "hidden Markov model", "bigram" — indexing segments of lectures]

Target: extract key terms from course lectures

(7)

Proposed Approach

(8)

Automatic Key Term Extraction

[System flowchart: speech signal from an archive of spoken documents → ASR → ASR transcriptions → Branching Entropy → Feature Extraction → Learning Methods: 1) K-means Exemplar, 2) AdaBoost, 3) Neural Network → key terms]

(9)

Automatic Key Term Extraction

[System flowchart repeated from slide (8)]

(10)

Automatic Key Term Extraction

[System flowchart repeated from slide (8)]

(11)

Phrase Identification

Automatic Key Term Extraction

[System flowchart: the Branching Entropy module is highlighted]

First, branching entropy is used to identify phrases in the ASR transcriptions.

(12)

Phrase Identification

Key Term Extraction

Automatic Key Term Extraction

[System flowchart: the Feature Extraction and Learning Methods modules are highlighted; example output key terms: "entropy", "acoustic model", …]

Then, learning methods extract key terms based on a set of features.

(13)

Phrase Identification

Key Term Extraction

Automatic Key Term Extraction

[System flowchart repeated from slide (12)]

(14)

Branching Entropy

hidden Markov model

How to decide the boundary of a phrase?

[Diagram: the phrase "hidden Markov model" in running text, surrounded by many different context words such as "represent", "is", "can", "of", "in"]

"hidden" is almost always followed by the same word.

(15)

Branching Entropy

hidden Markov model

How to decide the boundary of a phrase?

[Diagram repeated from slide (14)]

"hidden" is almost always followed by the same word.
"hidden Markov" is almost always followed by the same word.

(16)

Branching Entropy

hidden Markov model

How to decide the boundary of a phrase?

[Diagram repeated from slide (14), with the boundary after "hidden Markov model" marked]

"hidden" is almost always followed by the same word.
"hidden Markov" is almost always followed by the same word.
"hidden Markov model" is followed by many different words.

→ Define the branching entropy to detect such possible boundaries.

(17)

Branching Entropy

hidden Markov model

• Definition of Right Branching Entropy

How to decide the boundary of a phrase?

For a word sequence X, let its children x_i be the words observed immediately after X.

Probability of child x_i given X: P(x_i | X) = f(X x_i) / f(X), where f(·) is the corpus frequency.

Right branching entropy of X: H_r(X) = −Σ_i P(x_i | X) log P(x_i | X)
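For illustration, a minimal Python sketch of the right branching entropy, assuming n-gram frequency counts collected from the corpus (the dictionary layout and toy counts are illustrative, not the authors' implementation):

```python
import math

def right_branching_entropy(prefix, ngram_counts):
    """H_r(X) = -sum_i P(x_i|X) log P(x_i|X), where x_i ranges over
    the words observed immediately after the word sequence X."""
    children = ngram_counts.get(prefix, {})
    total = sum(children.values())
    if total == 0:
        return 0.0
    h = 0.0
    for count in children.values():
        p = count / total
        h -= p * math.log(p)
    return h

# Toy counts mirroring the slides: "hidden Markov" is almost always
# followed by the same word, but "hidden Markov model" is followed by
# many different words.
counts = {
    ("hidden",): {"Markov": 99, "state": 1},
    ("hidden", "Markov"): {"model": 97, "chain": 3},
    ("hidden", "Markov", "model"): {"is": 30, "can": 25, "of": 25, "in": 20},
}
print(right_branching_entropy(("hidden", "Markov"), counts))           # low
print(right_branching_entropy(("hidden", "Markov", "model"), counts))  # high
```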

(18)

Branching Entropy

hidden Markov model

• Decision of Right Boundary

How to decide the boundary of a phrase?

Find the right boundary located between X and x_i where the right branching entropy rises as the sequence grows, e.g. H_r("hidden Markov model") > H_r("hidden Markov"): the entropy keeps dropping inside a phrase, and an increase signals a possible boundary.
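A hedged sketch of this decision rule, reusing right_branching_entropy from the sketch above; the exact scan order and stopping criterion in the paper may differ:

```python
# Grow X one word at a time and propose a boundary where H_r rises
# (the entropy drops inside a phrase, so an increase marks a boundary).
def right_boundaries(sentence, ngram_counts, max_len=6):
    boundaries = set()
    for start in range(len(sentence)):
        prev_h = None
        limit = min(len(sentence), start + max_len)
        for end in range(start + 1, limit + 1):
            h = right_branching_entropy(tuple(sentence[start:end]), ngram_counts)
            if prev_h is not None and h > prev_h:
                boundaries.add(end)  # split point after sentence[end-1]
                break
            prev_h = h
    return sorted(boundaries)
```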

(19)

Branching Entropy

[Slides (19)–(21): animation builds of the boundary-detection diagram for "hidden Markov model", repeated from slide (18)]

(22)

Branching Entropy

hidden Markov model

• Decision of Left Boundary

How to decide the boundary of a phrase?

Find the left boundary located between x_i and X in the same way. The left branching entropy is computed on the reversed word sequence (X: "model Markov hidden"), so the identical procedure applies.

[Diagram: deciding the left boundary of "hidden Markov model"]

A PAT tree is used to implement the computation.

(23)

Branching Entropy

• Implementation in the PAT tree

How to decide the boundary of a phrase?

[Diagram: a PAT tree storing word sequences such as "hidden Markov model", "hidden Markov chain", "state", "distribution", "variable", with counts at the numbered nodes]

Example: X = "hidden Markov" has children x1 = "hidden Markov model" and x2 = "hidden Markov chain". The probability of each child P(x_i | X) and the right branching entropy H_r(X) are computed directly while traversing the subtree of X.
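A simplified word-level trie can stand in for the PAT tree here (a real PAT tree additionally compresses single-child paths); every node stores a count, so f(X), the children x_i, and H_r(X) are all available from one lookup:

```python
class TrieNode:
    def __init__(self):
        self.count = 0      # f(X) for the sequence ending at this node
        self.children = {}  # next word -> TrieNode

def build_trie(corpus_sentences, max_len=6):
    """Insert every word subsequence up to max_len, counting occurrences."""
    root = TrieNode()
    for sent in corpus_sentences:
        for start in range(len(sent)):
            node = root
            for word in sent[start:start + max_len]:
                node = node.children.setdefault(word, TrieNode())
                node.count += 1
    return root

root = build_trie([["hidden", "Markov", "model"],
                   ["hidden", "Markov", "chain"]])
print(root.children["hidden"].children["Markov"].count)  # f("hidden Markov") = 2
```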

(24)

Phrase Identification

Key Term Extraction

Automatic Key Term Extraction

[System flowchart: the Feature Extraction module is highlighted; example output key terms: "entropy", "acoustic model", …]

Prosodic, lexical, and semantic features are extracted for each candidate term.

(25)

Feature Extraction

• Prosodic features

For the first occurrence of each candidate term:

Feature Name      Feature Description
Duration (I–IV)   normalized duration (max, min, mean, range)

Speakers tend to use longer durations to emphasize key terms.

Four values are used for the duration of each term; the duration of a phone (e.g. phone "a") is normalized by the average duration of that phone.

(26)

Feature Extraction

• Prosodic features

For the first occurrence of each candidate term: higher pitch may carry significant information.

Feature Name      Feature Description
Duration (I–IV)   normalized duration (max, min, mean, range)

(27)

Feature Extraction

• Prosodic features

For the first occurrence of each candidate term: higher pitch may carry significant information.

Feature Name      Feature Description
Duration (I–IV)   normalized duration (max, min, mean, range)
Pitch (I–IV)      F0 (max, min, mean, range)

(28)

Feature Extraction

• Prosodic features

For the first occurrence of each candidate term: higher energy emphasizes important information.

Feature Name      Feature Description
Duration (I–IV)   normalized duration (max, min, mean, range)
Pitch (I–IV)      F0 (max, min, mean, range)

(29)

Feature Extraction

• Prosodic features

For the first occurrence of each candidate term: higher energy emphasizes important information.

Feature Name      Feature Description
Duration (I–IV)   normalized duration (max, min, mean, range)
Pitch (I–IV)      F0 (max, min, mean, range)
Energy (I–IV)     energy (max, min, mean, range)
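A minimal sketch of how the four statistics could be computed per prosodic stream; the phone durations, F0, and energy values are assumed to come from forced alignment and standard trackers, and all names are illustrative:

```python
def four_stats(values):
    """The four statistics (I-IV) used for each prosodic stream."""
    return {"max": max(values), "min": min(values),
            "mean": sum(values) / len(values),
            "range": max(values) - min(values)}

def duration_features(phone_durations, avg_phone_duration):
    # Normalize each phone's duration by the corpus-wide average
    # duration of that phone, then take the four statistics.
    normalized = [dur / avg_phone_duration[ph] for ph, dur in phone_durations]
    return four_stats(normalized)

# Toy example for the first occurrence of a candidate term:
print(duration_features([("a", 0.12), ("k", 0.08)], {"a": 0.10, "k": 0.07}))
print(four_stats([180.0, 195.5, 210.2]))  # e.g. F0 values for Pitch I-IV
```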

(30)

Feature Extraction

• Lexical features

Some well-known lexical features are used for each candidate term:

Feature Name   Feature Description
TF             term frequency
IDF            inverse document frequency
TFIDF          tf * idf
PoS            the PoS tag
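For concreteness, a minimal tf-idf computation under the textbook definition (the paper may use a variant):

```python
import math

def lexical_features(term, doc_tokens, all_docs):
    """tf(t, d), idf(t) = log(N / df(t)), and their product."""
    tf = doc_tokens.count(term)
    df = sum(1 for d in all_docs if term in d)
    idf = math.log(len(all_docs) / df) if df else 0.0
    return {"TF": tf, "IDF": idf, "TFIDF": tf * idf}

docs = [["hidden", "Markov", "model"], ["language", "model"], ["entropy"]]
print(lexical_features("model", docs[0], docs))
```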

(31)

Feature Extraction

• Semantic features

Probabilistic Latent Semantic Analysis (PLSA): Latent Topic Probability

Key terms tend to focus on limited topics.

[Diagram: PLSA graphical model connecting documents D_1 … D_N to terms t_1 … t_n through latent topics T_1 … T_K, with parameters P(T_k | D_i) and P(t_j | T_k); D_i: documents, T_k: latent topics, t_j: terms]

(32)

Feature Extraction

• Semantic features

Probabilistic Latent Semantic Analysis (PLSA): Latent Topic Probability

Key terms tend to focus on limited topics.

[Plot: topic distributions of a key term (peaked on a few topics) vs. a non-key term (spread over many topics)]

The latent topic probabilities describe a distribution over topics. How can this distribution be used as features?

Feature Name   Feature Description
LTP (I–III)    Latent Topic Probability (mean, variance, standard deviation)
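A hedged sketch of the LTP features, assuming PLSA training yields a normalized topic distribution for each term (how that distribution is derived is not shown on the slide):

```python
import statistics

def ltp_features(p_topic_given_term):
    # LTP I-III: mean, variance, and standard deviation of the term's
    # topic distribution over the K latent topics.
    mean = statistics.mean(p_topic_given_term)
    var = statistics.pvariance(p_topic_given_term)
    return {"LTP_mean": mean, "LTP_var": var, "LTP_std": var ** 0.5}

# A key term concentrates on few topics; a non-key term spreads out:
print(ltp_features([0.85, 0.05, 0.05, 0.05]))  # high variance
print(ltp_features([0.25, 0.25, 0.25, 0.25]))  # zero variance
```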

(33)

Feature Extraction

• Semantic features

Probabilistic Latent Semantic Analysis (PLSA): Latent Topic Significance

Latent Topic Significance (LTS) is the within-topic to out-of-topic frequency ratio:

LTS(t_j, T_k) = within-topic freq. / out-of-topic freq.

Key terms tend to focus on limited topics.

[Plot repeated from slide (32)]

Feature Name   Feature Description
LTP (I–III)    Latent Topic Probability (mean, variance, standard deviation)

(34)

Feature Extraction

• Semantic features

Probabilistic Latent Semantic Analysis (PLSA): Latent Topic Significance

Latent Topic Significance (LTS) is the within-topic to out-of-topic frequency ratio:

LTS(t_j, T_k) = within-topic freq. / out-of-topic freq.

Key terms tend to focus on limited topics.

Feature Name   Feature Description
LTP (I–III)    Latent Topic Probability (mean, variance, standard deviation)
LTS (I–III)    Latent Topic Significance (mean, variance, standard deviation)
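A hedged sketch of an LTS computation. The slide gives only the ratio, so the document-level weighting below, splitting each document's term count n(t, D_i) by P(T_k | D_i), is an assumption consistent with this group's related work:

```python
def lts(term_freq_per_doc, p_topic_given_doc):
    # Assumed form: weight the term's count n(t, D_i) in each document by
    # how strongly the document belongs to topic T_k.
    #   within-topic freq. = sum_i n(t, D_i) * P(T_k | D_i)
    #   out-of-topic freq. = sum_i n(t, D_i) * (1 - P(T_k | D_i))
    within = sum(n * p for n, p in zip(term_freq_per_doc, p_topic_given_doc))
    out = sum(n * (1 - p) for n, p in zip(term_freq_per_doc, p_topic_given_doc))
    return within / out if out > 0 else float("inf")

# LTS I-III are then the mean, variance, and standard deviation of
# LTS(t, T_k) over all K latent topics.
print(lts([3, 0, 1], [0.9, 0.1, 0.2]))
```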

(35)

Feature Extraction

• Semantic features

Probabilistic Latent Semantic Analysis (PLSA): Latent Topic Entropy

Key terms tend to focus on limited topics.

Feature Name   Feature Description
LTP (I–III)    Latent Topic Probability (mean, variance, standard deviation)
LTS (I–III)    Latent Topic Significance (mean, variance, standard deviation)

(36)

Feature Extraction

• Semantic features

Probabilistic Latent Semantic Analysis (PLSA): Latent Topic Entropy

Key terms tend to focus on limited topics.

Feature Name   Feature Description
LTP (I–III)    Latent Topic Probability (mean, variance, standard deviation)
LTS (I–III)    Latent Topic Significance (mean, variance, standard deviation)
LTE            term entropy over the latent topics

[Plot: a non-key term spreads over many topics (higher LTE); a key term concentrates on a few (lower LTE)]
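A minimal sketch of the latent topic entropy, LTE(t) = −Σ_k P(T_k | t) log P(T_k | t):

```python
import math

def lte(p_topic_given_term):
    # Key terms concentrate on few topics, so their LTE is lower.
    return -sum(p * math.log(p) for p in p_topic_given_term if p > 0)

print(lte([0.85, 0.05, 0.05, 0.05]))  # key-term-like: lower entropy
print(lte([0.25, 0.25, 0.25, 0.25]))  # non-key-term-like: maximal entropy
```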

(37)

Phrase Identification

Key Term Extraction

Automatic Key Term Extraction

[System flowchart: the Learning Methods module is highlighted; example output key terms: "entropy", "acoustic model", …]

Unsupervised and supervised approaches are used to extract key terms.

(38)

Learning Methods

• Unsupervised learning: K-means Exemplar

1) Transform each candidate term into a vector in LTS (Latent Topic Significance) space.
2) Run K-means.
3) Take the term closest to the centroid of each cluster as a key term (see the sketch below).

The terms in the same cluster focus on a single topic, so the candidate terms in a group are related to its key term, and the key term can represent that topic.
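A hedged scikit-learn sketch of the K-means Exemplar step; the cluster count and toy vectors are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_exemplars(terms, lts_vectors, n_key_terms):
    # Cluster the candidate terms in LTS space, then return the term
    # closest to each centroid as that cluster's key term.
    X = np.asarray(lts_vectors, dtype=float)
    km = KMeans(n_clusters=n_key_terms, n_init=10, random_state=0).fit(X)
    exemplars = []
    for c in range(n_key_terms):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(X[members] - km.cluster_centers_[c], axis=1)
        exemplars.append(terms[members[np.argmin(dists)]])
    return exemplars

terms = ["entropy", "acoustic model", "viterbi", "n-gram"]
vecs = [[0.9, 0.1], [0.2, 0.8], [0.85, 0.2], [0.1, 0.9]]
print(kmeans_exemplars(terms, vecs, 2))
```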

(39)

Learning Methods

• Supervised learning

  Adaptive Boosting (AdaBoost)
  Neural Network

Both automatically adjust the weights of the features to produce a classifier.
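A hedged scikit-learn sketch of the two classifiers over the combined prosodic + lexical + semantic feature vectors; the data and hyperparameters below are stand-ins:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))           # stand-in feature vectors
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # stand-in key-term labels

ada = AdaBoostClassifier(n_estimators=100).fit(X, y)
nn = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X, y)
print(ada.score(X, y), nn.score(X, y))
```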

(40)

Experiments & Evaluation

(41)

Experiments

Corpus

• NTU lecture corpus
• Mandarin Chinese with embedded English words
• Single speaker
• 45.2 hours

Example: 我們的 solution 是 viterbi algorithm (Our solution is the Viterbi algorithm)

(42)

Experiments

• ASR Accuracy

Language       Mandarin   English   Overall
Char Acc (%)   78.15      53.44     76.26

Bilingual AM and model adaptation:

[Diagram: a bilingual (CH + EN) acoustic model adapted from a speaker-independent (SI) model with some data from the target speaker; the language model interpolates a background trigram trained on out-of-domain corpora with an adaptive in-domain model]

(43)

Experiments

• Reference Key Terms

Annotations were collected from 61 students who had taken the course.

If the k-th annotator labeled N_k key terms, each of them received a score of 1/N_k, and all other terms a score of 0.

Terms are ranked by the sum of the scores from all annotators.

The top N terms of the list are chosen (N is the average N_k).

N = 154 key terms: 59 key phrases and 95 keywords.
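A minimal sketch of this scoring scheme, assuming the per-term score is 1/N_k as described above:

```python
from collections import defaultdict

def reference_key_terms(annotations):
    """annotations: one list of labeled key terms per annotator."""
    scores = defaultdict(float)
    for labeled in annotations:
        for term in labeled:
            scores[term] += 1.0 / len(labeled)  # score 1/N_k per labeled term
    n = round(sum(len(a) for a in annotations) / len(annotations))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n]  # top N terms, N = average N_k

print(reference_key_terms([["hmm", "entropy"], ["hmm", "n-gram", "viterbi"]]))
```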

(44)

Experiments

Evaluation

• Unsupervised learning: the number of extracted key terms is set to N.
• Supervised learning: 3-fold cross validation.
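For the supervised side, a hedged sketch of 3-fold cross validation scored by F-measure (stand-in data as before):

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))           # stand-in feature vectors
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # stand-in key-term labels

scores = cross_val_score(AdaBoostClassifier(), X, y, cv=3, scoring="f1")
print(scores.mean())
```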

(45)

Experiments

• Feature Effectiveness

Neural network for keywords from ASR transcriptions (F-measure, %):

Features   Pr      Lx      Sm      Pr+Lx   Pr+Lx+Sm
F1 (%)     20.78   42.86   35.63   48.15   56.55

Pr: Prosodic, Lx: Lexical, Sm: Semantic

Each feature set alone gives an F1 from 20% to 42%; prosodic and lexical features are additive; all three feature sets are useful.

(46)

Experiments

• Overall Performance (F-measure, %)

Method   Baseline   U: TFIDF   U: K-means   S: AB   S: NN
manual   23.38      51.95      55.84        62.39   67.31

Baseline: conventional TFIDF scores without branching entropy, with stop word removal and PoS filtering.
U: unsupervised, S: supervised; AB: AdaBoost, NN: Neural Network.

Branching entropy performs well; the K-means Exemplar outperforms TFIDF; supervised approaches are better than unsupervised ones.

(47)

Experiments

• Overall Performance (F-measure, %)

Method   Baseline   U: TFIDF   U: K-means   S: AB   S: NN
manual   23.38      51.95      55.84        62.39   67.31
ASR      20.78      43.51      52.60        57.68   62.70

AB: AdaBoost, NN: Neural Network

Performance on ASR transcriptions is slightly worse than on manual transcriptions but remains reasonable; supervised learning with the neural network gives the best results.

(48)

Conclusion

(49)

Conclusion

• We proposed a new approach to extract key terms from course lectures.

• The performance is improved by
  identifying phrases with branching entropy, and
  combining prosodic, lexical, and semantic features.

• The results are encouraging.

(50)

Thanks for your attention! Q & A

We thank the reviewers for their valuable comments.

NTU Virtual Instructor: http://speech.ee.ntu.edu.tw/~RA/lecture
