Automatic Key Term Extraction from Spoken Course Lectures Using Branching Entropy and Prosodic/Semantic Features
Yun-Nung (Vivian) Chen, Yu Huang, Sheng-Yi Kong, Lin-Shan Lee
National Taiwan University, Taiwan
Introduction
• Definition of Key Term
  • Higher term frequency
  • Core content
• Two types
  • Keyword
  • Key phrase
• Advantages
  • Indexing and retrieval
  • Modeling the relations between key terms and segments of documents
Introduction
• Example: in a speech processing lecture, key phrases such as "acoustic model", "language model", and "hidden Markov model", and keywords such as "HMM", "n-gram", "phone", and "bigram" capture the core content
• Target: extract key terms from course lectures
Proposed Approach
Automatic Key Term Extraction
• System flow: archive of spoken documents (speech signal) → ASR → ASR transcriptions → Branching Entropy (phrase identification) → Feature Extraction → Learning Methods: 1) K-means Exemplar, 2) AdaBoost, 3) Neural Network → key terms (e.g., "entropy", "acoustic model", …)
• First, branching entropy is used to identify phrases
• Then, learning methods extract key terms based on a set of features
Branching Entropy
• How to decide the boundary of a phrase?
  • "hidden" is almost always followed by the same word
  • "hidden Markov" is almost always followed by the same word
  • "hidden Markov model" is followed by many different words (e.g., "represents", "is", "can", "of", "in", …)
  • Define branching entropy to detect this possible boundary
• Definition of Right Branching Entropy
  • Probability of a child x_i of pattern X, estimated from frequency counts: P(x_i|X) = freq(x_i) / freq(X)
  • Right branching entropy for X: H_r(X) = -Σ_i P(x_i|X) log P(x_i|X)
• Decision of Right Boundary
  • Find the right boundary located between X and x_i where the entropy rises: H_r(x_i) > H_r(X)
• Decision of Left Boundary
  • Find the left boundary the same way using the left branching entropy H_l, computed on the reversed word order (X: "model Markov hidden"): H_l(x_i) > H_l(X)
Branching Entropy
• Implementation with a PAT tree
  • All patterns and their frequencies are stored in a PAT tree, so the children x_i of any pattern X and their counts can be looked up efficiently
  • Example: X: "hidden Markov"; x_1: "hidden Markov model", x_2: "hidden Markov chain"; deeper nodes include "… model state", "… model distribution", "… model variable"
  • The probability of each child x_i of X and the right branching entropy H_r(X) are computed directly from these node counts (a code sketch follows below)
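As a concrete illustration, here is a minimal sketch of boundary detection via right branching entropy. It uses a plain dictionary of child counts instead of an actual PAT tree, and the toy corpus is an assumption for illustration, not the authors' implementation.

  import math
  from collections import defaultdict

  def right_branching_entropy(child_counts, pattern):
      # H_r(X) = -sum_i P(x_i|X) log P(x_i|X), where the children x_i are
      # the observed one-word extensions of pattern X
      children = child_counts.get(pattern, {})
      total = sum(children.values())
      if total == 0:
          return 0.0
      entropy = 0.0
      for count in children.values():
          p = count / total
          entropy -= p * math.log(p)
      return entropy

  def build_child_counts(sentences, max_len=4):
      # For every pattern X of up to max_len words, count the word following it
      counts = defaultdict(lambda: defaultdict(int))
      for words in sentences:
          for i in range(len(words)):
              for n in range(1, max_len + 1):
                  if i + n < len(words):
                      counts[tuple(words[i:i + n])][words[i + n]] += 1
      return counts

  # Toy corpus (hypothetical)
  sentences = [
      "the hidden Markov model is a statistical model".split(),
      "a hidden Markov model can represent sequences".split(),
      "the hidden Markov model of speech".split(),
  ]
  counts = build_child_counts(sentences)
  X = ("hidden", "Markov")
  for child_word in list(counts[X]):
      xi = X + (child_word,)
      # Hypothesize a right boundary after x_i when the entropy rises
      if right_branching_entropy(counts, xi) > right_branching_entropy(counts, X):
          print("right boundary after:", " ".join(xi))

On this toy corpus, H_r("hidden Markov") is 0 (always followed by "model") while H_r("hidden Markov model") is high (followed by "is", "can", "of"), so the sketch reports a boundary after "hidden Markov model", matching the example above.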
Key Term Extraction
• After phrase identification, extract prosodic, lexical, and semantic features for each candidate term
Feature Extraction
• Prosodic features: for each candidate term appearing for the first time
  • Speakers tend to use longer duration to emphasize key terms
  • Higher pitch may signal significant information
  • Higher energy emphasizes important information

  Feature Name     Feature Description
  Duration (I–IV)  normalized duration (max, min, mean, range); the duration of each phone (e.g., phone "a") is normalized by the average duration of that phone, and the four statistics are taken over the term
  Pitch (I–IV)     F0 (max, min, mean, range)
  Energy (I–IV)    energy (max, min, mean, range)

  A sketch of the duration features follows below.
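A minimal sketch of the Duration (I–IV) computation, assuming phone-level durations from forced alignment and precomputed per-phone averages (both assumptions; the slides do not specify the implementation):

  import statistics

  def duration_features(phone_durations, avg_phone_duration):
      # Duration (I-IV): each phone's duration is normalized by the average
      # duration of that phone, then max/min/mean/range are taken over the term
      normalized = [d / avg_phone_duration[p] for p, d in phone_durations]
      return {
          "dur_max": max(normalized),
          "dur_min": min(normalized),
          "dur_mean": statistics.mean(normalized),
          "dur_range": max(normalized) - min(normalized),
      }

  # Hypothetical phone durations (seconds) for one candidate term
  term_phones = [("hh", 0.09), ("ih", 0.07), ("d", 0.05), ("ah", 0.08), ("n", 0.06)]
  avg_duration = {"hh": 0.06, "ih": 0.05, "d": 0.04, "ah": 0.06, "n": 0.05}
  print(duration_features(term_phones, avg_duration))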
Feature Extraction
• Lexical features: some well-known lexical features for each candidate term (a sketch follows below)

  Feature Name  Feature Description
  TF            term frequency
  IDF           inverse document frequency
  TFIDF         tf * idf
  PoS           the PoS tag
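A minimal TF-IDF sketch using standard definitions; the exact tf/idf variants are not specified in the slides, so raw counts and log(N / document frequency) are assumed:

  import math
  from collections import Counter

  def tfidf(docs):
      # TF: raw count in the document; IDF: log(N / document frequency)
      n_docs = len(docs)
      df = Counter()
      for doc in docs:
          df.update(set(doc))
      per_doc = []
      for doc in docs:
          tf = Counter(doc)
          per_doc.append({t: {"TF": tf[t],
                              "IDF": math.log(n_docs / df[t]),
                              "TFIDF": tf[t] * math.log(n_docs / df[t])}
                          for t in tf})
      return per_doc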
Feature Extraction
• Semantic features: Probabilistic Latent Semantic Analysis (PLSA)
  • Key terms tend to focus on limited topics
  • PLSA relates documents D_i and terms t_j through latent topics T_k, providing P(T_k|D_i) and P(t_j|T_k)
  • Latent Topic Probability describes a probability distribution over topics for each term; key terms concentrate on a few topics while non-key terms spread across many. How to use it? Summarize the distribution with statistics.
  • Latent Topic Significance: the within-topic to out-of-topic frequency ratio for each topic
  • Latent Topic Entropy: the entropy of the term's topic distribution; key terms have lower LTE, non-key terms have higher LTE (a sketch follows below)

  Feature Name  Feature Description
  LTP (I–III)   Latent Topic Probability (mean, variance, standard deviation)
  LTS (I–III)   Latent Topic Significance (mean, variance, standard deviation)
  LTE           term entropy over latent topics
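A minimal sketch of the LTE computation, assuming the topic posteriors for a term come from a trained PLSA model (training not shown):

  import math

  def latent_topic_entropy(topic_dist):
      # LTE(t) = -sum_k P(T_k|t) log P(T_k|t); lower values indicate a term
      # concentrated on few topics, a cue that it is a key term
      return -sum(p * math.log(p) for p in topic_dist if p > 0)

  # A focused (key) term vs. a diffuse (non-key) term over four topics
  print(latent_topic_entropy([0.85, 0.05, 0.05, 0.05]))  # low LTE
  print(latent_topic_entropy([0.25, 0.25, 0.25, 0.25]))  # high LTE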
Key Term Extraction
• After phrase identification and feature extraction, use unsupervised and supervised approaches to extract key terms
Learning Methods
• Unsupervised learning: K-means Exemplar
  • Transform each candidate term into a vector in LTS (Latent Topic Significance) space
  • Run K-means
  • Take the term at the centroid of each cluster as a key term
  • The candidate terms in the same cluster focus on a single topic and are related to the key term, so the key term can represent that topic (a sketch follows below)
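A minimal sketch of the K-means Exemplar step, assuming scikit-learn and an LTS term-by-topic matrix built elsewhere; taking the member nearest the centroid as the exemplar is one interpretation of "find the centroid of each cluster to be the key term":

  import numpy as np
  from sklearn.cluster import KMeans

  def kmeans_exemplars(lts_vectors, terms, n_clusters):
      # Cluster candidate terms in LTS space; for each cluster, return the
      # term whose vector is closest to the centroid as the key term
      km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
      labels = km.fit_predict(lts_vectors)
      exemplars = []
      for c in range(n_clusters):
          members = np.where(labels == c)[0]
          dists = np.linalg.norm(lts_vectors[members] - km.cluster_centers_[c], axis=1)
          exemplars.append(terms[members[np.argmin(dists)]])
      return exemplars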
Learning Methods
• Supervised learning
  • Adaptive Boosting (AdaBoost)
  • Neural Network
  • Both automatically adjust the weights of the features to produce a classifier
Experiments & Evaluation
Experiments
• Corpus
  • NTU lecture corpus: Mandarin Chinese with embedded English words
  • Single speaker
  • 45.2 hours
  • Example: "我們的 solution 是 viterbi algorithm" (Our solution is the Viterbi algorithm)
Experiments
• ASR Accuracy: bilingual AM and model adaptation

  Language      Mandarin  English  Overall
  Char Acc (%)  78.15     53.44    76.26

  • AM: Chinese/English speaker-independent models adapted with some data from the target speaker
  • LM: background trigram trained on out-of-domain corpora, interpolated with an adaptive language model from the in-domain corpus
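The interpolated trigram can be written as below; the interpolation weight λ is illustrative, as the slides do not give its value:

  P(w_t | w_{t-2}, w_{t-1}) = λ · P_adaptive(w_t | w_{t-2}, w_{t-1}) + (1 − λ) · P_background(w_t | w_{t-2}, w_{t-1})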
Experiments
• Reference Key Terms
  • Annotations from 61 students who had taken the course
  • If the k-th annotator labeled N_k key terms, each of them received a score of 1/N_k from that annotator, and all other terms received 0
  • Rank the terms by the sum of the scores given by all annotators
  • Choose the top N terms from the list (N is the average of N_k)
  • N = 154 key terms: 59 key phrases and 95 keywords (a scoring sketch follows below)
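A minimal sketch of the reference-scoring aggregation, assuming the 1/N_k scoring described above:

  from collections import defaultdict

  def reference_key_terms(annotations):
      # Each annotator who labeled N_k terms gives 1/N_k to each of them;
      # rank terms by the total score and keep the top N, where N is the
      # average number of terms labeled per annotator
      scores = defaultdict(float)
      for labeled in annotations:
          for term in labeled:
              scores[term] += 1.0 / len(labeled)
      n = round(sum(len(a) for a in annotations) / len(annotations))
      ranked = sorted(scores, key=scores.get, reverse=True)
      return ranked[:n]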
Experiments
• Evaluation
  • Unsupervised learning: set the number of extracted key terms to N
  • Supervised learning: 3-fold cross validation
Experiments
• Feature Effectiveness (F-measure; neural network for keywords from ASR transcriptions)

  Pr     Lx     Sm     Pr+Lx  Pr+Lx+Sm
  20.78  42.86  35.63  48.15  56.55

  (Pr: Prosodic, Lx: Lexical, Sm: Semantic)
  • Each feature set alone gives an F1 between 20% and 42%
  • Prosodic and lexical features are additive
  • All three feature sets are useful
Experiments
• Overall Performance (F-measure; U: unsupervised, S: supervised, AB: AdaBoost, NN: Neural Network)

  Method      manual  ASR
  Baseline    23.38   20.78
  U: TFIDF    51.95   43.51
  U: K-means  55.84   52.60
  S: AB       62.39   57.68
  S: NN       67.31   62.70

  • Baseline: conventional TFIDF scores without branching entropy, with stop word removal and PoS filtering
  • Branching entropy performs well
  • K-means Exemplar outperforms TFIDF
  • Supervised approaches are better than unsupervised approaches
  • Performance on ASR transcriptions is slightly worse than on manual transcriptions but reasonable
  • Supervised learning with a neural network gives the best results
Conclusion
• We proposed a new approach to extract key terms
• The performance can be improved by
  • identifying phrases by branching entropy
  • using prosodic, lexical, and semantic features together
• The results are encouraging
Thanks for your attention! Q & A
We thank the reviewers for their valuable comments
NTU Virtual Instructor: http://speech.ee.ntu.edu.tw/~RA/lecture