Automatic Key Term Extraction from Spoken Course Lectures Using Branching Entropy and Prosodic/Semantic Features
Speakers: 黃宥, 陳縕儂
Outline
- Introduction
- Proposed Approach
  - Branching Entropy
  - Feature Extraction
  - Learning Methods
- Experiments & Evaluation
- Conclusion
Introduction
- Definition of a key term
  - Higher term frequency
  - Core content
  - Two types: keyword and key phrase
- Advantages
  - Indexing and retrieval
  - Capturing the relations between key terms and segments of documents
Introduction
(Figure: a lecture transcript with key terms highlighted, e.g., "acoustic model", "language model", "HMM", "n-gram", "phone", "hidden Markov model", "bigram")
Target: extract key terms from course lectures
Proposed Approach
Automatic Key Term Extraction
(System diagram: original spoken documents → archive of spoken documents → speech signal → ASR → ASR transcriptions → Branching Entropy → Feature Extraction → Learning Methods: 1) K-means Exemplar, 2) AdaBoost, 3) Neural Network → key terms, e.g., "entropy", "acoustic model", ...)
- Phrase identification: first use branching entropy to identify phrases
- Key term extraction: learn to extract key terms from a set of features
Branching Entropy
How to decide the boundary of a phrase?
(Figure: corpus contexts of "hidden Markov model", preceded by words such as "represent", "can", "of", "in" and followed by words such as "is")
- "hidden" is almost always followed by the same word
- "hidden Markov" is almost always followed by the same word
- "hidden Markov model" is followed by many different words
→ Define branching entropy to decide the possible boundary
Branching Entropy
- Definition of right branching entropy
  - Probability of a child $x_i$ of $X$: $P(x_i \mid X) = C(X x_i) / C(X)$, where $C(\cdot)$ is the corpus count
  - Right branching entropy of $X$: $H_r(X) = -\sum_i P(x_i \mid X) \log P(x_i \mid X)$
Branching Entropy
- Decision of the right boundary
  - Find the right boundary located between $X$ and $x_i$ where $H_r(X x_i) > H_r(X)$, i.e., where the right branching entropy starts to increase
Branching Entropy
- Decision of the left boundary
  - Treat the sequence in reverse order (e.g., "model Markov hidden") and apply the same criterion: find the left boundary located between $x_i$ and $X$ where $H_l(x_i X) > H_l(X)$
Using a PAT Tree for Implementation
- Implementation in the PAT tree
  - Each node stores a word sequence and its count, so the child probabilities $P(x_i \mid X)$ and the right branching entropy of $X$ can be read directly off a node's children
(Figure: PAT tree whose numbered nodes spell out "hidden", "Markov", "model", "chain", "state", "distribution", "variable"; X: "hidden Markov", x1: "hidden Markov model", x2: "hidden Markov chain")
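Below is a minimal runnable sketch of the idea, not the PAT-tree implementation itself: a plain dict of n-gram counts stands in for the tree, the toy corpus and whitespace tokenization are assumptions, and natural log is used.

```python
from collections import defaultdict
from math import log

def build_counts(sentences, max_n=4):
    """Count n-grams and record each n-gram's observed right-hand children."""
    counts, children = defaultdict(int), defaultdict(set)
    for sent in sentences:
        for n in range(1, max_n + 1):
            for i in range(len(sent) - n + 1):
                gram = tuple(sent[i:i + n])
                counts[gram] += 1
                if n < max_n and i + n < len(sent):
                    children[gram].add(sent[i + n])
    return counts, children

def right_entropy(X, counts, children):
    """H_r(X) = -sum_i P(x_i|X) log P(x_i|X), with P(x_i|X) = C(X x_i)/C(X)."""
    h, total = 0.0, counts[X]
    for w in children.get(X, ()):
        p = counts[X + (w,)] / total
        h -= p * log(p)
    return h

# Toy corpus: "hidden markov model" is followed by several different words,
# while "hidden" and "hidden markov" are nearly deterministic.
corpus = [
    "we represent the hidden markov model is".split(),
    "a hidden markov model is trained".split(),
    "the hidden markov model can capture".split(),
    "of hidden markov model in practice".split(),
]
counts, children = build_counts(corpus)

# Entropy jumps after the full phrase, so a right boundary is placed there;
# for left boundaries, reverse every sentence and reuse the same functions.
phrase = "hidden markov model".split()
for k in range(1, len(phrase) + 1):
    X = tuple(phrase[:k])
    print(" ".join(X), "->", round(right_entropy(X, counts, children), 3))
```

On this toy corpus the entropy stays at 0 through "hidden markov" and jumps at "hidden markov model", matching the boundary decision rule above.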
Key Term Extraction
(Framework recap: phrases identified by branching entropy are passed to feature extraction)
Extract some features for each candidate term
Feature Extraction
- Prosodic features (for the first occurrence of each candidate term)
  - Speakers tend to use longer duration to emphasize key terms; four values are used for the duration of a term, with each phone's duration normalized by the average duration of that phone (e.g., the duration of phone "a" normalized by the average duration of phone "a"; see the sketch after this table)
  - Higher pitch may represent significant information
  - Higher energy emphasizes important information

  Feature Name      Feature Description
  Duration (I-IV)   normalized duration (max, min, mean, range)
  Pitch (I-IV)      F0 (max, min, mean, range)
  Energy (I-IV)     energy (max, min, mean, range)
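A small sketch of the Duration I-IV features under stated assumptions: forced alignment has produced (phone, duration) pairs for the term, and phone_avg holds corpus-wide mean durations; all names and numbers are illustrative. The pitch and energy features follow the same max/min/mean/range pattern over frame-level F0 and energy values.

```python
from statistics import mean

def duration_features(term_phones, phone_avg):
    """term_phones: list of (phone, duration); phone_avg: mean duration per phone."""
    # Normalize each phone's duration by the average duration of that phone,
    # e.g., the duration of phone "a" divided by the average duration of "a".
    norm = [dur / phone_avg[ph] for ph, dur in term_phones]
    return {"Duration-I": max(norm), "Duration-II": min(norm),
            "Duration-III": mean(norm), "Duration-IV": max(norm) - min(norm)}

phone_avg = {"a": 0.08, "k": 0.05, "u": 0.07}  # toy averages (seconds)
print(duration_features([("a", 0.12), ("k", 0.05), ("u", 0.09)], phone_avg))
```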
Feature Extraction
- Lexical features: some well-known lexical features for each candidate term

  Feature Name   Feature Description
  TF             term frequency
  IDF            inverse document frequency
  TFIDF          tf * idf
  PoS            the PoS tag
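A minimal TF-IDF sketch using the definitions in the table above; the slide does not specify the exact weighting or smoothing, so the raw-count tf and plain log(N/df) idf below are assumed variants.

```python
from math import log

def tfidf(term, doc, docs):
    """tf * idf with raw term counts and a plain log(N/df) idf (assumed variant)."""
    tf = doc.count(term)
    df = sum(term in d for d in docs)
    return tf * log(len(docs) / df) if df else 0.0

docs = [["hidden", "markov", "model", "model"],
        ["language", "model"],
        ["entropy"]]
print(tfidf("model", docs[0], docs))  # 2 * log(3/2)
```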
Feature Extraction
- Semantic features: Probabilistic Latent Semantic Analysis (PLSA)
  - Latent topic probability
  - Key terms tend to focus on limited topics
(Figure: PLSA graphical model linking documents to latent topics via $P(T_k \mid D_i)$ and topics to terms via $P(t_j \mid T_k)$; $D_i$: documents, $T_k$: latent topics, $t_j$: terms)
Feature Extraction
- Semantic features: PLSA latent topic probability
  - The latent topic probability describes a probability distribution over topics; a key term concentrates on a few topics, while a non-key term spreads across many
  (Figure: topic distribution of a key term (peaked) vs. a non-key term (flat))
  - How to use it? Summarize the distribution with its statistics:

  Feature Name   Feature Description
  LTP (I-III)    Latent Topic Probability (mean, variance, standard deviation)
Feature Extraction
- Semantic features: PLSA latent topic significance
  - Latent Topic Significance (LTS): within-topic to out-of-topic frequency ratio
    $LTS_{t_j}(T_k) = \dfrac{\sum_i n(t_j, D_i) \, P(T_k \mid D_i)}{\sum_i n(t_j, D_i) \, [1 - P(T_k \mid D_i)]}$, where $n(t_j, D_i)$ is the frequency of $t_j$ in $D_i$
  - Key terms tend to focus on limited topics

  Feature Name   Feature Description
  LTP (I-III)    Latent Topic Probability (mean, variance, standard deviation)
  LTS (I-III)    Latent Topic Significance (mean, variance, standard deviation)
Feature Extraction
- Semantic features: PLSA latent topic entropy
  - Latent Topic Entropy: $LTE(t_j) = -\sum_k P(T_k \mid t_j) \log P(T_k \mid t_j)$
  - Key terms tend to focus on limited topics: a key term has lower LTE, a non-key term has higher LTE

  Feature Name   Feature Description
  LTP (I-III)    Latent Topic Probability (mean, variance, standard deviation)
  LTS (I-III)    Latent Topic Significance (mean, variance, standard deviation)
  LTE            term entropy over latent topics
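The sketch below computes toy versions of the three PLSA-based features. It assumes PLSA has already produced $P(T_k \mid D_i)$ per document; deriving $P(T_k \mid t_j)$ by count-weighted averaging over documents is one plausible choice, not necessarily the original recipe.

```python
from math import log
from statistics import mean, pvariance, pstdev

def topic_given_term(n_td, p_topic_doc):
    """P(T_k|t_j) ~ sum_i n(t_j,D_i) * P(T_k|D_i), normalized over topics (assumption)."""
    raw = [sum(n * p[k] for n, p in zip(n_td, p_topic_doc))
           for k in range(len(p_topic_doc[0]))]
    z = sum(raw)
    return [r / z for r in raw]

def lts(n_td, p_topic_doc, k):
    """Latent Topic Significance: within-topic to out-of-topic frequency ratio."""
    within = sum(n * p[k] for n, p in zip(n_td, p_topic_doc))
    out = sum(n * (1 - p[k]) for n, p in zip(n_td, p_topic_doc))
    return within / out

def lte(p_topic_term):
    """LTE(t_j) = -sum_k P(T_k|t_j) log P(T_k|t_j); key terms have lower LTE."""
    return -sum(p * log(p) for p in p_topic_term if p > 0)

# Toy data: 3 documents, 2 topics; n_td = counts of the term in each document.
p_topic_doc = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
n_td = [5, 3, 0]  # the term concentrates in topic-0 documents
p_tt = topic_given_term(n_td, p_topic_doc)
print("LTP mean/var/std:", mean(p_tt), pvariance(p_tt), pstdev(p_tt))
print("LTS(topic 0):", round(lts(n_td, p_topic_doc, 0), 3))
print("LTE:", round(lte(p_tt), 3))
```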
Key Term Extraction
(Framework recap: extracted features are passed to the learning methods)
Using learning approaches to extract key terms
Learning Methods
- Unsupervised learning: K-means Exemplar
  - Transform each term into a vector in LTS (Latent Topic Significance) space
  - Run K-means
  - Take the term closest to the centroid of each cluster to be a key term
    - The terms in the same cluster focus on a single topic and are related to the key term
    - The key term can represent this topic
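A brief sketch of the K-means Exemplar step, with toy two-topic LTS vectors and scikit-learn standing in for whatever clustering code the original system used; the term names and vectors are invented for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

terms = ["hmm", "viterbi", "entropy", "n-gram", "bigram", "perplexity"]
X = np.array([[0.9, 0.1], [0.8, 0.2],       # terms significant for topic 0
              [0.2, 0.9], [0.1, 0.8],       # terms significant for topic 1
              [0.85, 0.15], [0.15, 0.85]])  # toy LTS vectors

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
for k, center in enumerate(km.cluster_centers_):
    members = np.where(km.labels_ == k)[0]
    # The exemplar (term nearest the centroid) is taken as the key term.
    exemplar = members[np.argmin(np.linalg.norm(X[members] - center, axis=1))]
    print("cluster", k, "-> key term:", terms[exemplar])
```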
Learning Methods
- Supervised learning
  - Adaptive Boosting (AdaBoost)
  - Neural Network
  - Both automatically adjust the weights of features to produce a classifier
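A sketch of the supervised setting: each candidate term becomes a feature vector (prosodic + lexical + semantic) with a binary key-term label, and AdaBoost or a small neural network is trained as the classifier. The data below is random stand-in data, not the lecture corpus.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 20))                  # 200 candidate terms, 20 features
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)  # stand-in key-term labels

for clf in (AdaBoostClassifier(n_estimators=50),
            MLPClassifier(hidden_layer_sizes=(16,), max_iter=500)):
    clf.fit(X[:150], y[:150])              # train on 150 terms, test on 50
    print(type(clf).__name__, "accuracy:", clf.score(X[150:], y[150:]))
```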
Experiments & Evaluation
Experiments
- Corpus: NTU lecture corpus
  - Mandarin Chinese with embedded English words
  - Single speaker
  - 45.2 hours
  - Example: "我們的 solution 是 viterbi algorithm" (Our solution is the Viterbi algorithm)
- ASR accuracy

  Language       Mandarin   English   Overall
  Char Acc (%)   78.15      53.44     76.26

(Figure: bilingual AM and model adaptation — the acoustic model is adapted from CH/EN speaker-independent models using some data from the target speaker; the adaptive trigram language model interpolates a background model from out-of-domain corpora with the in-domain corpus)
Experiments
- Reference key terms
  - Annotations from 61 students who have taken the course
  - If the k-th annotator labeled N_k key terms, each of them received a score of 1/N_k, and all other terms received 0
  - Terms are ranked by the sum of the scores given by all annotators
  - The top N terms of the list are chosen (N is the average of N_k)
  - N = 154 key terms: 59 key phrases and 95 keywords
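A short sketch of the reference-list construction described above, with invented annotations: annotator k spreads a total weight of 1 uniformly (1/N_k each) over the N_k terms they marked, scores are summed, and the top N terms are kept.

```python
from collections import defaultdict

annotations = [["hmm", "viterbi", "entropy"],            # annotator 1, N_1 = 3
               ["hmm", "entropy"],                       # annotator 2, N_2 = 2
               ["hmm", "n-gram", "viterbi", "bigram"]]   # annotator 3, N_3 = 4

scores = defaultdict(float)
for terms in annotations:
    for t in terms:
        scores[t] += 1 / len(terms)        # each marked term gets 1/N_k

N = round(sum(len(a) for a in annotations) / len(annotations))  # average N_k
reference = sorted(scores, key=scores.get, reverse=True)[:N]
print(reference)  # ['hmm', 'entropy', 'viterbi']
```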
Experiments
- Evaluation
  - Unsupervised learning: set the number of extracted key terms to N
  - Supervised learning: 3-fold cross-validation
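The metric reported in the following slides is the F-measure between the extracted and reference key-term sets; a minimal version is sketched below. When exactly N terms are extracted (the unsupervised setting), precision equals recall.

```python
def f1(extracted, reference):
    """F1 between an extracted key-term set and the reference set."""
    tp = len(set(extracted) & set(reference))
    p = tp / len(extracted) if extracted else 0.0
    r = tp / len(reference) if reference else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

print(f1(["hmm", "viterbi", "bigram"], ["hmm", "viterbi", "entropy"]))  # ~0.667
```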
Experiments
- Feature effectiveness (neural network for keywords from ASR transcriptions)

  Feature set   Pr      Lx      Sm      Pr+Lx   Pr+Lx+Sm
  F-measure     20.78   42.86   35.63   48.15   56.55
  (Pr: prosodic, Lx: lexical, Sm: semantic)

  - Each feature set alone gives an F1 from 20% to 42%
  - Prosodic features and lexical features are additive
  - All three feature sets are useful
Experiments
- Overall performance (F-measure)

               Baseline   U: TFIDF   U: K-means   S: AB   S: NN
  manual       23.38      51.95      55.84        62.39   67.31
  ASR          20.78      43.51      52.60        57.68   62.70
  (Baseline: conventional TFIDF scores without branching entropy, with stop word removal and PoS filtering; U: unsupervised, S: supervised, AB: AdaBoost, NN: Neural Network)

  - Branching entropy performs well
  - K-means Exemplar outperforms TFIDF
  - Supervised approaches are better than unsupervised approaches
  - Performance on ASR transcriptions is slightly worse than on manual transcriptions, but reasonable
  - Supervised learning with a neural network gives the best results
Conclusion
Conclusion
- We propose a new approach to extract key terms
- The performance can be improved by
  - Identifying phrases by branching entropy
  - Using prosodic, lexical, and semantic features together
- The results are encouraging
Thanks for your attention! Q & A
NTU Virtual Instructor: http://speech.ee.ntu.edu.tw/~RA/lecture