Automatic Key Term Extraction from Spoken Course Lectures Using Branching Entropy and Prosodic/Semantic Features
Speakers: 黃宥, 陳縕儂
Outline
- Introduction
- Proposed Approach
  - Branching Entropy
  - Feature Extraction
  - Learning Methods
- Experiments & Evaluation
- Conclusion
Introduction
- Definition of a key term
  - Higher term frequency
  - Core content
  - Two types: keyword and key phrase
- Advantages
  - Indexing and retrieval
  - Capturing the relations between key terms and segments of documents
Introduction
(Figure: a lecture transcript with key terms highlighted, e.g., "acoustic model", "language model", "HMM", "n-gram", "phone", "hidden Markov model", "bigram")
Target: extract key terms from course lectures
Proposed Approach
Automatic Key Term Extraction
(System diagram: original spoken documents → archive of spoken documents → speech signal → ASR → ASR transcriptions → Branching Entropy → Feature Extraction → Learning Methods: 1) K-means Exemplar, 2) AdaBoost, 3) Neural Network → key terms, e.g., "entropy", "acoustic model", ...)
- Phrase identification: first use branching entropy to identify phrases
- Key term extraction: learn to extract key terms from a set of features
Branching Entropy
How to decide the boundary of a phrase?
(Figure: corpus contexts of "hidden Markov model", preceded by words such as "represent", "can", "of", "in" and followed by words such as "is")
- "hidden" is almost always followed by the same word
- "hidden Markov" is almost always followed by the same word
- "hidden Markov model" is followed by many different words
→ Define branching entropy to decide the possible boundary
Branching Entropy
- Definition of right branching entropy
  - Probability of a child $x_i$ of $X$: $P(x_i \mid X) = C(X x_i) / C(X)$, where $C(\cdot)$ is the corpus count
  - Right branching entropy of $X$: $H_r(X) = -\sum_i P(x_i \mid X) \log P(x_i \mid X)$
Branching Entropy
- Decision of the right boundary
  - Find the right boundary located between $X$ and $x_i$ where $H_r(X x_i) > H_r(X)$, i.e., where the right branching entropy starts to increase
Branching Entropy
- Decision of the left boundary
  - Treat the sequence in reverse order (e.g., "model Markov hidden") and apply the same criterion: find the left boundary located between $x_i$ and $X$ where $H_l(x_i X) > H_l(X)$
Using a PAT Tree for Implementation
- Implementation in the PAT tree
  - Each node stores a word sequence and its count, so the child probabilities $P(x_i \mid X)$ and the right branching entropy of $X$ can be read directly off a node's children
(Figure: PAT tree whose numbered nodes spell out "hidden", "Markov", "model", "chain", "state", "distribution", "variable"; X: "hidden Markov", x1: "hidden Markov model", x2: "hidden Markov chain")
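Below is a minimal runnable sketch of the idea, not the PAT-tree implementation itself: a plain dict of n-gram counts stands in for the tree, the toy corpus and whitespace tokenization are assumptions, and natural log is used.

```python
from collections import defaultdict
from math import log

def build_counts(sentences, max_n=4):
    """Count n-grams and record each n-gram's observed right-hand children."""
    counts, children = defaultdict(int), defaultdict(set)
    for sent in sentences:
        for n in range(1, max_n + 1):
            for i in range(len(sent) - n + 1):
                gram = tuple(sent[i:i + n])
                counts[gram] += 1
                if n < max_n and i + n < len(sent):
                    children[gram].add(sent[i + n])
    return counts, children

def right_entropy(X, counts, children):
    """H_r(X) = -sum_i P(x_i|X) log P(x_i|X), with P(x_i|X) = C(X x_i)/C(X)."""
    h, total = 0.0, counts[X]
    for w in children.get(X, ()):
        p = counts[X + (w,)] / total
        h -= p * log(p)
    return h

# Toy corpus: "hidden markov model" is followed by several different words,
# while "hidden" and "hidden markov" are nearly deterministic.
corpus = [
    "we represent the hidden markov model is".split(),
    "a hidden markov model is trained".split(),
    "the hidden markov model can capture".split(),
    "of hidden markov model in practice".split(),
]
counts, children = build_counts(corpus)

# Entropy jumps after the full phrase, so a right boundary is placed there;
# for left boundaries, reverse every sentence and reuse the same functions.
phrase = "hidden markov model".split()
for k in range(1, len(phrase) + 1):
    X = tuple(phrase[:k])
    print(" ".join(X), "->", round(right_entropy(X, counts, children), 3))
```

On this toy corpus the entropy stays at 0 through "hidden markov" and jumps at "hidden markov model", matching the boundary decision rule above.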
Key Term Extraction
(Framework recap: phrases identified by branching entropy are passed to feature extraction)
Extract some features for each candidate term
Feature Extraction
- Prosodic features (for the first occurrence of each candidate term)
  - Speakers tend to use longer duration to emphasize key terms; four values are used for the duration of a term, with each phone's duration normalized by the average duration of that phone (e.g., the duration of phone "a" normalized by the average duration of phone "a"; see the sketch after this table)
  - Higher pitch may represent significant information
  - Higher energy emphasizes important information

  Feature Name      Feature Description
  Duration (I-IV)   normalized duration (max, min, mean, range)
  Pitch (I-IV)      F0 (max, min, mean, range)
  Energy (I-IV)     energy (max, min, mean, range)
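A small sketch of the Duration I-IV features under stated assumptions: forced alignment has produced (phone, duration) pairs for the term, and phone_avg holds corpus-wide mean durations; all names and numbers are illustrative. The pitch and energy features follow the same max/min/mean/range pattern over frame-level F0 and energy values.

```python
from statistics import mean

def duration_features(term_phones, phone_avg):
    """term_phones: list of (phone, duration); phone_avg: mean duration per phone."""
    # Normalize each phone's duration by the average duration of that phone,
    # e.g., the duration of phone "a" divided by the average duration of "a".
    norm = [dur / phone_avg[ph] for ph, dur in term_phones]
    return {"Duration-I": max(norm), "Duration-II": min(norm),
            "Duration-III": mean(norm), "Duration-IV": max(norm) - min(norm)}

phone_avg = {"a": 0.08, "k": 0.05, "u": 0.07}  # toy averages (seconds)
print(duration_features([("a", 0.12), ("k", 0.05), ("u", 0.09)], phone_avg))
```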
Feature Extraction
- Lexical features: some well-known lexical features for each candidate term

  Feature Name   Feature Description
  TF             term frequency
  IDF            inverse document frequency
  TFIDF          tf * idf
  PoS            the PoS tag
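A minimal TF-IDF sketch using the definitions in the table above; the slide does not specify the exact weighting or smoothing, so the raw-count tf and plain log(N/df) idf below are assumed variants.

```python
from math import log

def tfidf(term, doc, docs):
    """tf * idf with raw term counts and a plain log(N/df) idf (assumed variant)."""
    tf = doc.count(term)
    df = sum(term in d for d in docs)
    return tf * log(len(docs) / df) if df else 0.0

docs = [["hidden", "markov", "model", "model"],
        ["language", "model"],
        ["entropy"]]
print(tfidf("model", docs[0], docs))  # 2 * log(3/2)
```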
Feature Extraction
- Semantic features: Probabilistic Latent Semantic Analysis (PLSA)
  - Latent topic probability
  - Key terms tend to focus on limited topics
(Figure: PLSA graphical model linking documents to latent topics via $P(T_k \mid D_i)$ and topics to terms via $P(t_j \mid T_k)$; $D_i$: documents, $T_k$: latent topics, $t_j$: terms)
Feature Extraction
- Semantic features: PLSA latent topic probability
  - The latent topic probability describes a probability distribution over topics; a key term concentrates on a few topics, while a non-key term spreads across many
  (Figure: topic distribution of a key term (peaked) vs. a non-key term (flat))
  - How to use it? Summarize the distribution with its statistics:

  Feature Name   Feature Description
  LTP (I-III)    Latent Topic Probability (mean, variance, standard deviation)
Feature Extraction
- Semantic features: PLSA latent topic significance
  - Latent Topic Significance (LTS): within-topic to out-of-topic frequency ratio
    $LTS_{t_j}(T_k) = \dfrac{\sum_i n(t_j, D_i) \, P(T_k \mid D_i)}{\sum_i n(t_j, D_i) \, [1 - P(T_k \mid D_i)]}$, where $n(t_j, D_i)$ is the frequency of $t_j$ in $D_i$
  - Key terms tend to focus on limited topics

  Feature Name   Feature Description
  LTP (I-III)    Latent Topic Probability (mean, variance, standard deviation)
  LTS (I-III)    Latent Topic Significance (mean, variance, standard deviation)
Feature Extraction
- Semantic features: PLSA latent topic entropy
  - Latent Topic Entropy: $LTE(t_j) = -\sum_k P(T_k \mid t_j) \log P(T_k \mid t_j)$
  - Key terms tend to focus on limited topics: a key term has lower LTE, a non-key term has higher LTE

  Feature Name   Feature Description
  LTP (I-III)    Latent Topic Probability (mean, variance, standard deviation)
  LTS (I-III)    Latent Topic Significance (mean, variance, standard deviation)
  LTE            term entropy over latent topics
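The sketch below computes toy versions of the three PLSA-based features. It assumes PLSA has already produced $P(T_k \mid D_i)$ per document; deriving $P(T_k \mid t_j)$ by count-weighted averaging over documents is one plausible choice, not necessarily the original recipe.

```python
from math import log
from statistics import mean, pvariance, pstdev

def topic_given_term(n_td, p_topic_doc):
    """P(T_k|t_j) ~ sum_i n(t_j,D_i) * P(T_k|D_i), normalized over topics (assumption)."""
    raw = [sum(n * p[k] for n, p in zip(n_td, p_topic_doc))
           for k in range(len(p_topic_doc[0]))]
    z = sum(raw)
    return [r / z for r in raw]

def lts(n_td, p_topic_doc, k):
    """Latent Topic Significance: within-topic to out-of-topic frequency ratio."""
    within = sum(n * p[k] for n, p in zip(n_td, p_topic_doc))
    out = sum(n * (1 - p[k]) for n, p in zip(n_td, p_topic_doc))
    return within / out

def lte(p_topic_term):
    """LTE(t_j) = -sum_k P(T_k|t_j) log P(T_k|t_j); key terms have lower LTE."""
    return -sum(p * log(p) for p in p_topic_term if p > 0)

# Toy data: 3 documents, 2 topics; n_td = counts of the term in each document.
p_topic_doc = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
n_td = [5, 3, 0]  # the term concentrates in topic-0 documents
p_tt = topic_given_term(n_td, p_topic_doc)
print("LTP mean/var/std:", mean(p_tt), pvariance(p_tt), pstdev(p_tt))
print("LTS(topic 0):", round(lts(n_td, p_topic_doc, 0), 3))
print("LTE:", round(lte(p_tt), 3))
```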
Key Term Extraction
(Framework recap: extracted features are passed to the learning methods)
Using learning approaches to extract key terms
Learning Methods
- Unsupervised learning: K-means Exemplar
  - Transform each term into a vector in LTS (Latent Topic Significance) space
  - Run K-means
  - Take the term closest to the centroid of each cluster to be a key term
    - The terms in the same cluster focus on a single topic and are related to the key term
    - The key term can represent this topic
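A brief sketch of the K-means Exemplar step, with toy two-topic LTS vectors and scikit-learn standing in for whatever clustering code the original system used; the term names and vectors are invented for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

terms = ["hmm", "viterbi", "entropy", "n-gram", "bigram", "perplexity"]
X = np.array([[0.9, 0.1], [0.8, 0.2],       # terms significant for topic 0
              [0.2, 0.9], [0.1, 0.8],       # terms significant for topic 1
              [0.85, 0.15], [0.15, 0.85]])  # toy LTS vectors

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
for k, center in enumerate(km.cluster_centers_):
    members = np.where(km.labels_ == k)[0]
    # The exemplar (term nearest the centroid) is taken as the key term.
    exemplar = members[np.argmin(np.linalg.norm(X[members] - center, axis=1))]
    print("cluster", k, "-> key term:", terms[exemplar])
```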
Learning Methods
- Supervised learning
  - Adaptive Boosting (AdaBoost)
  - Neural Network
  - Both automatically adjust the weights of features to produce a classifier
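A sketch of the supervised setting: each candidate term becomes a feature vector (prosodic + lexical + semantic) with a binary key-term label, and AdaBoost or a small neural network is trained as the classifier. The data below is random stand-in data, not the lecture corpus.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 20))                  # 200 candidate terms, 20 features
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)  # stand-in key-term labels

for clf in (AdaBoostClassifier(n_estimators=50),
            MLPClassifier(hidden_layer_sizes=(16,), max_iter=500)):
    clf.fit(X[:150], y[:150])              # train on 150 terms, test on 50
    print(type(clf).__name__, "accuracy:", clf.score(X[150:], y[150:]))
```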
Experiments & Evaluation
Experiments
- Corpus: NTU lecture corpus
  - Mandarin Chinese with embedded English words
  - Single speaker
  - 45.2 hours
  - Example: "我們的 solution 是 viterbi algorithm" (Our solution is the Viterbi algorithm)
- ASR accuracy

  Language       Mandarin   English   Overall
  Char Acc (%)   78.15      53.44     76.26

(Figure: bilingual AM and model adaptation — the acoustic model is adapted from CH/EN speaker-independent models using some data from the target speaker; the adaptive trigram language model interpolates a background model from out-of-domain corpora with the in-domain corpus)
Experiments
- Reference key terms
  - Annotations from 61 students who have taken the course
  - If the k-th annotator labeled N_k key terms, each of them received a score of 1/N_k, and all other terms received 0
  - Terms are ranked by the sum of the scores given by all annotators
  - The top N terms of the list are chosen (N is the average of N_k)
  - N = 154 key terms: 59 key phrases and 95 keywords
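A short sketch of the reference-list construction described above, with invented annotations: annotator k spreads a total weight of 1 uniformly (1/N_k each) over the N_k terms they marked, scores are summed, and the top N terms are kept.

```python
from collections import defaultdict

annotations = [["hmm", "viterbi", "entropy"],            # annotator 1, N_1 = 3
               ["hmm", "entropy"],                       # annotator 2, N_2 = 2
               ["hmm", "n-gram", "viterbi", "bigram"]]   # annotator 3, N_3 = 4

scores = defaultdict(float)
for terms in annotations:
    for t in terms:
        scores[t] += 1 / len(terms)        # each marked term gets 1/N_k

N = round(sum(len(a) for a in annotations) / len(annotations))  # average N_k
reference = sorted(scores, key=scores.get, reverse=True)[:N]
print(reference)  # ['hmm', 'entropy', 'viterbi']
```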
Experiments
- Evaluation
  - Unsupervised learning: set the number of extracted key terms to N
  - Supervised learning: 3-fold cross-validation
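The metric reported in the following slides is the F-measure between the extracted and reference key-term sets; a minimal version is sketched below. When exactly N terms are extracted (the unsupervised setting), precision equals recall.

```python
def f1(extracted, reference):
    """F1 between an extracted key-term set and the reference set."""
    tp = len(set(extracted) & set(reference))
    p = tp / len(extracted) if extracted else 0.0
    r = tp / len(reference) if reference else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

print(f1(["hmm", "viterbi", "bigram"], ["hmm", "viterbi", "entropy"]))  # ~0.667
```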
Experiments
- Feature effectiveness (neural network for keywords from ASR transcriptions)

  Feature set   Pr      Lx      Sm      Pr+Lx   Pr+Lx+Sm
  F-measure     20.78   42.86   35.63   48.15   56.55
  (Pr: prosodic, Lx: lexical, Sm: semantic)

  - Each feature set alone gives an F1 from 20% to 42%
  - Prosodic features and lexical features are additive
  - All three feature sets are useful
Experiments
- Overall performance (F-measure)

               Baseline   U: TFIDF   U: K-means   S: AB   S: NN
  manual       23.38      51.95      55.84        62.39   67.31
  ASR          20.78      43.51      52.60        57.68   62.70
  (Baseline: conventional TFIDF scores without branching entropy, with stop word removal and PoS filtering; U: unsupervised, S: supervised, AB: AdaBoost, NN: Neural Network)

  - Branching entropy performs well
  - K-means Exemplar outperforms TFIDF
  - Supervised approaches are better than unsupervised approaches
  - Performance on ASR transcriptions is slightly worse than on manual transcriptions, but reasonable
  - Supervised learning with a neural network gives the best results
Conclusion
Conclusion
- We propose a new approach to extract key terms
- The performance can be improved by
  - Identifying phrases by branching entropy
  - Using prosodic, lexical, and semantic features together
- The results are encouraging
Thanks for your attention! Q & A
NTU Virtual Instructor: http://speech.ee.ntu.edu.tw/~RA/lecture