Automatic Key Term Extraction from Spoken Course Lectures Using Branching Entropy and Prosodic/Semantic Features
Yun-Nung (Vivian) Chen, Yu Huang, Sheng-Yi Kong, Lin-Shan Lee
National Taiwan University, Taiwan
Introduction
• Definition of Key Term
  • Higher term frequency
  • Core content
• Two types
  • Keyword
  • Key phrase
• Advantages
  • Indexing and retrieval
  • Modeling the relations between key terms and segments of documents
Introduction
• Example: in a speech processing lecture, key phrases such as "acoustic model", "language model", and "hidden Markov model", and keywords such as "HMM", "n-gram", "phone", and "bigram" capture the core content
• Target: extract key terms from course lectures
Proposed Approach
Automatic Key Term Extraction
• System flow: archive of spoken documents (speech signal) → ASR → ASR transcriptions → Branching Entropy (phrase identification) → Feature Extraction → Learning Methods: 1) K-means Exemplar, 2) AdaBoost, 3) Neural Network → key terms (e.g., "entropy", "acoustic model", …)
• First, branching entropy is used to identify phrases
• Then, learning methods extract key terms based on a set of features
Branching Entropy
• How to decide the boundary of a phrase?
  • "hidden" is almost always followed by the same word
  • "hidden Markov" is almost always followed by the same word
  • "hidden Markov model" is followed by many different words (e.g., "represents", "is", "can", "of", "in", …)
  • Define branching entropy to detect this possible boundary
• Definition of Right Branching Entropy
  • Probability of a child x_i of pattern X, estimated from frequency counts: P(x_i|X) = freq(x_i) / freq(X)
  • Right branching entropy for X: H_r(X) = -Σ_i P(x_i|X) log P(x_i|X)
• Decision of Right Boundary
  • Find the right boundary located between X and x_i where the entropy rises: H_r(x_i) > H_r(X)
• Decision of Left Boundary
  • Find the left boundary the same way using the left branching entropy H_l, computed on the reversed word order (X: "model Markov hidden"): H_l(x_i) > H_l(X)
Branching Entropy
• Implementation with a PAT tree
  • All patterns and their frequencies are stored in a PAT tree, so the children x_i of any pattern X and their counts can be looked up efficiently
  • Example: X: "hidden Markov"; x_1: "hidden Markov model", x_2: "hidden Markov chain"; deeper nodes include "… model state", "… model distribution", "… model variable"
  • The probability of each child x_i of X and the right branching entropy H_r(X) are computed directly from these node counts (a code sketch follows below)
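As a concrete illustration, here is a minimal sketch of boundary detection via right branching entropy. It uses a plain dictionary of child counts instead of an actual PAT tree, and the toy corpus is an assumption for illustration, not the authors' implementation.

  import math
  from collections import defaultdict

  def right_branching_entropy(child_counts, pattern):
      # H_r(X) = -sum_i P(x_i|X) log P(x_i|X), where the children x_i are
      # the observed one-word extensions of pattern X
      children = child_counts.get(pattern, {})
      total = sum(children.values())
      if total == 0:
          return 0.0
      entropy = 0.0
      for count in children.values():
          p = count / total
          entropy -= p * math.log(p)
      return entropy

  def build_child_counts(sentences, max_len=4):
      # For every pattern X of up to max_len words, count the word following it
      counts = defaultdict(lambda: defaultdict(int))
      for words in sentences:
          for i in range(len(words)):
              for n in range(1, max_len + 1):
                  if i + n < len(words):
                      counts[tuple(words[i:i + n])][words[i + n]] += 1
      return counts

  # Toy corpus (hypothetical)
  sentences = [
      "the hidden Markov model is a statistical model".split(),
      "a hidden Markov model can represent sequences".split(),
      "the hidden Markov model of speech".split(),
  ]
  counts = build_child_counts(sentences)
  X = ("hidden", "Markov")
  for child_word in list(counts[X]):
      xi = X + (child_word,)
      # Hypothesize a right boundary after x_i when the entropy rises
      if right_branching_entropy(counts, xi) > right_branching_entropy(counts, X):
          print("right boundary after:", " ".join(xi))

On this toy corpus, H_r("hidden Markov") is 0 (always followed by "model") while H_r("hidden Markov model") is high (followed by "is", "can", "of"), so the sketch reports a boundary after "hidden Markov model", matching the example above.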
Key Term Extraction
• After phrase identification, extract prosodic, lexical, and semantic features for each candidate term
Feature Extraction
• Prosodic features: for each candidate term appearing for the first time
  • Speakers tend to use longer duration to emphasize key terms
  • Higher pitch may signal significant information
  • Higher energy emphasizes important information

  Feature Name     Feature Description
  Duration (I–IV)  normalized duration (max, min, mean, range); the duration of each phone (e.g., phone "a") is normalized by the average duration of that phone, and the four statistics are taken over the term
  Pitch (I–IV)     F0 (max, min, mean, range)
  Energy (I–IV)    energy (max, min, mean, range)

  A sketch of the duration features follows below.
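A minimal sketch of the Duration (I–IV) computation, assuming phone-level durations from forced alignment and precomputed per-phone averages (both assumptions; the slides do not specify the implementation):

  import statistics

  def duration_features(phone_durations, avg_phone_duration):
      # Duration (I-IV): each phone's duration is normalized by the average
      # duration of that phone, then max/min/mean/range are taken over the term
      normalized = [d / avg_phone_duration[p] for p, d in phone_durations]
      return {
          "dur_max": max(normalized),
          "dur_min": min(normalized),
          "dur_mean": statistics.mean(normalized),
          "dur_range": max(normalized) - min(normalized),
      }

  # Hypothetical phone durations (seconds) for one candidate term
  term_phones = [("hh", 0.09), ("ih", 0.07), ("d", 0.05), ("ah", 0.08), ("n", 0.06)]
  avg_duration = {"hh": 0.06, "ih": 0.05, "d": 0.04, "ah": 0.06, "n": 0.05}
  print(duration_features(term_phones, avg_duration))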
Feature Extraction
• Lexical features: some well-known lexical features for each candidate term (a sketch follows below)

  Feature Name  Feature Description
  TF            term frequency
  IDF           inverse document frequency
  TFIDF         tf * idf
  PoS           the PoS tag
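A minimal TF-IDF sketch using standard definitions; the exact tf/idf variants are not specified in the slides, so raw counts and log(N / document frequency) are assumed:

  import math
  from collections import Counter

  def tfidf(docs):
      # TF: raw count in the document; IDF: log(N / document frequency)
      n_docs = len(docs)
      df = Counter()
      for doc in docs:
          df.update(set(doc))
      per_doc = []
      for doc in docs:
          tf = Counter(doc)
          per_doc.append({t: {"TF": tf[t],
                              "IDF": math.log(n_docs / df[t]),
                              "TFIDF": tf[t] * math.log(n_docs / df[t])}
                          for t in tf})
      return per_doc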
Feature Extraction
• Semantic features: Probabilistic Latent Semantic Analysis (PLSA)
  • Key terms tend to focus on limited topics
  • PLSA relates documents D_i and terms t_j through latent topics T_k, providing P(T_k|D_i) and P(t_j|T_k)
  • Latent Topic Probability describes a probability distribution over topics for each term; key terms concentrate on a few topics while non-key terms spread across many. How to use it? Summarize the distribution with statistics.
  • Latent Topic Significance: the within-topic to out-of-topic frequency ratio for each topic
  • Latent Topic Entropy: the entropy of the term's topic distribution; key terms have lower LTE, non-key terms have higher LTE (a sketch follows below)

  Feature Name  Feature Description
  LTP (I–III)   Latent Topic Probability (mean, variance, standard deviation)
  LTS (I–III)   Latent Topic Significance (mean, variance, standard deviation)
  LTE           term entropy over latent topics
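A minimal sketch of the LTE computation, assuming the topic posteriors for a term come from a trained PLSA model (training not shown):

  import math

  def latent_topic_entropy(topic_dist):
      # LTE(t) = -sum_k P(T_k|t) log P(T_k|t); lower values indicate a term
      # concentrated on few topics, a cue that it is a key term
      return -sum(p * math.log(p) for p in topic_dist if p > 0)

  # A focused (key) term vs. a diffuse (non-key) term over four topics
  print(latent_topic_entropy([0.85, 0.05, 0.05, 0.05]))  # low LTE
  print(latent_topic_entropy([0.25, 0.25, 0.25, 0.25]))  # high LTE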
Key Term Extraction
• After phrase identification and feature extraction, use unsupervised and supervised approaches to extract key terms
Learning Methods
• Unsupervised learning: K-means Exemplar
  • Transform each candidate term into a vector in LTS (Latent Topic Significance) space
  • Run K-means
  • Take the term at the centroid of each cluster as a key term
  • The candidate terms in the same cluster focus on a single topic and are related to the key term, so the key term can represent that topic (a sketch follows below)
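A minimal sketch of the K-means Exemplar step, assuming scikit-learn and an LTS term-by-topic matrix built elsewhere; taking the member nearest the centroid as the exemplar is one interpretation of "find the centroid of each cluster to be the key term":

  import numpy as np
  from sklearn.cluster import KMeans

  def kmeans_exemplars(lts_vectors, terms, n_clusters):
      # Cluster candidate terms in LTS space; for each cluster, return the
      # term whose vector is closest to the centroid as the key term
      km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
      labels = km.fit_predict(lts_vectors)
      exemplars = []
      for c in range(n_clusters):
          members = np.where(labels == c)[0]
          dists = np.linalg.norm(lts_vectors[members] - km.cluster_centers_[c], axis=1)
          exemplars.append(terms[members[np.argmin(dists)]])
      return exemplars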
Learning Methods
• Supervised learning
  • Adaptive Boosting (AdaBoost)
  • Neural Network
  • Both automatically adjust the weights of the features to produce a classifier
Experiments & Evaluation
Experiments
• Corpus
  • NTU lecture corpus: Mandarin Chinese with embedded English words
  • Single speaker
  • 45.2 hours
  • Example: "我們的 solution 是 viterbi algorithm" (Our solution is the Viterbi algorithm)
Experiments
• ASR Accuracy: bilingual AM and model adaptation

  Language      Mandarin  English  Overall
  Char Acc (%)  78.15     53.44    76.26

  • AM: Chinese/English speaker-independent models adapted with some data from the target speaker
  • LM: background trigram trained on out-of-domain corpora, interpolated with an adaptive language model from the in-domain corpus
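The interpolated trigram can be written as below; the interpolation weight λ is illustrative, as the slides do not give its value:

  P(w_t | w_{t-2}, w_{t-1}) = λ · P_adaptive(w_t | w_{t-2}, w_{t-1}) + (1 − λ) · P_background(w_t | w_{t-2}, w_{t-1})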
Experiments
• Reference Key Terms
  • Annotations from 61 students who had taken the course
  • If the k-th annotator labeled N_k key terms, each of them received a score of 1/N_k from that annotator, and all other terms received 0
  • Rank the terms by the sum of the scores given by all annotators
  • Choose the top N terms from the list (N is the average of N_k)
  • N = 154 key terms: 59 key phrases and 95 keywords (a scoring sketch follows below)
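A minimal sketch of the reference-scoring aggregation, assuming the 1/N_k scoring described above:

  from collections import defaultdict

  def reference_key_terms(annotations):
      # Each annotator who labeled N_k terms gives 1/N_k to each of them;
      # rank terms by the total score and keep the top N, where N is the
      # average number of terms labeled per annotator
      scores = defaultdict(float)
      for labeled in annotations:
          for term in labeled:
              scores[term] += 1.0 / len(labeled)
      n = round(sum(len(a) for a in annotations) / len(annotations))
      ranked = sorted(scores, key=scores.get, reverse=True)
      return ranked[:n]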
Experiments
• Evaluation
  • Unsupervised learning: set the number of extracted key terms to N
  • Supervised learning: 3-fold cross validation
Experiments
• Feature Effectiveness (F-measure; neural network for keywords from ASR transcriptions)

  Pr     Lx     Sm     Pr+Lx  Pr+Lx+Sm
  20.78  42.86  35.63  48.15  56.55

  (Pr: Prosodic, Lx: Lexical, Sm: Semantic)
  • Each feature set alone gives an F1 between 20% and 42%
  • Prosodic and lexical features are additive
  • All three feature sets are useful
Experiments
• Overall Performance (F-measure; U: unsupervised, S: supervised, AB: AdaBoost, NN: Neural Network)

  Method      manual  ASR
  Baseline    23.38   20.78
  U: TFIDF    51.95   43.51
  U: K-means  55.84   52.60
  S: AB       62.39   57.68
  S: NN       67.31   62.70

  • Baseline: conventional TFIDF scores without branching entropy, with stop word removal and PoS filtering
  • Branching entropy performs well
  • K-means Exemplar outperforms TFIDF
  • Supervised approaches are better than unsupervised approaches
  • Performance on ASR transcriptions is slightly worse than on manual transcriptions but reasonable
  • Supervised learning with a neural network gives the best results
Conclusion
• We proposed a new approach to extract key terms
• The performance can be improved by
  • identifying phrases by branching entropy
  • using prosodic, lexical, and semantic features together
• The results are encouraging
Thanks for your attention! Q & A
We thank the reviewers for their valuable comments
NTU Virtual Instructor: http://speech.ee.ntu.edu.tw/~RA/lecture