Automatic Key Term Extraction from Spoken Course Lectures


(1)

Automatic Key Term Extraction from Spoken Course Lectures
Using Branching Entropy and Prosodic/Semantic Features

Speaker: 黃宥、陳縕儂

(2)

Outline

O Introduction
O Proposed Approach
  O Branching Entropy
  O Feature Extraction
  O Learning Methods
O Experiments & Evaluation
O Conclusion

(3)

Introduction

(4)

Definition

O Key Term
  O Higher term frequency
  O Core content
O Two types
  O Keyword
  O Key phrase
O Advantages
  O Indexing and retrieval
  O The relations between key terms and segments of documents

(5)

Introduction

(6)

Introduction

[Example key terms from a lecture: acoustic model, language model, hmm, n-gram, phone, hidden Markov model]

(7)

Introduction

[Example key terms: hmm, acoustic model, language model, n-gram, hidden Markov model, phone, bigram]

Target: extract key terms from course lectures

(8)

Proposed Approach

(9)

Automatic Key Term Extraction

[System diagram: original spoken documents → archive of spoken documents → ASR (speech signal → ASR transcriptions) → Branching Entropy → Feature Extraction → Learning Methods: 1) K-means Exemplar, 2) AdaBoost, 3) Neural Network]

(10)–(11)

Automatic Key Term Extraction

[Same system diagram, repeated across animation builds]

(12)

Automatic Key Term Extraction

[System diagram, with the Branching Entropy stage labeled Phrase Identification]

First, branching entropy is used to identify phrases

(13)

Automatic Key Term Extraction

[System diagram: Phrase Identification (Branching Entropy) → Feature Extraction → Learning Methods (Key Term Extraction) → key terms such as entropy, acoustic model, ...]

Next, the system learns to extract key terms from a set of features

(14)

Key Term Extraction

[Same system diagram, repeated]

(15)

Branching Entropy

How to decide the boundary of a phrase?

[Example: "hidden Markov model" can be followed by many different words, e.g. is, can, of, in, represent, ...]

O "hidden" is almost always followed by the same word

(16)

Branching Entropy

How to decide the boundary of a phrase?

O "hidden" is almost always followed by the same word
O "hidden Markov" is almost always followed by the same word

(17)

Branching Entropy

How to decide the boundary of a phrase?

O "hidden" is almost always followed by the same word
O "hidden Markov" is almost always followed by the same word
O "hidden Markov model" is followed by many different words → a likely phrase boundary

Define branching entropy to decide possible boundaries

(18)

Branching Entropy

How to decide the boundary of a phrase?

O Definition of Right Branching Entropy
  O Probability of a child x_i of X: P(x_i | X) = C(X x_i) / C(X), where C(·) counts occurrences in the corpus
  O Right branching entropy of X: H_r(X) = -Σ_i P(x_i | X) log P(x_i | X)

(19)

Branching Entropy

How to decide the boundary of a phrase?

O Decision of Right Boundary
  O Find the right boundary located between X and x_i where the right branching entropy rises, i.e. X is followed by many different words and is likely a complete phrase (a code sketch follows below)
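A minimal sketch of right branching entropy computed from n-gram counts; this is not taken from the slides, and the pre-computed Counter of n-gram frequencies and the toy counts are illustrative assumptions.

```python
from collections import Counter
from math import log

def right_branching_entropy(prefix_tokens, ngram_counts):
    """Right branching entropy H_r(X) of a token sequence X.

    ngram_counts: Counter mapping token tuples to corpus frequencies
    (assumed to be built beforehand, e.g. from the ASR transcriptions).
    """
    X = tuple(prefix_tokens)
    # Collect counts of every observed continuation X + x_i
    children = Counter()
    for gram, count in ngram_counts.items():
        if len(gram) == len(X) + 1 and gram[:-1] == X:
            children[gram[-1]] += count
    total = sum(children.values())
    if total == 0:
        return 0.0
    # H_r(X) = - sum_i P(x_i | X) log P(x_i | X)
    return -sum((c / total) * log(c / total) for c in children.values())

# Toy counts: "hidden Markov" is almost always followed by the same words,
# while "hidden Markov model" branches into many different words.
counts = Counter({
    ("hidden", "Markov", "model"): 8,
    ("hidden", "Markov", "chain"): 2,
    ("hidden", "Markov", "model", "is"): 3,
    ("hidden", "Markov", "model", "can"): 2,
    ("hidden", "Markov", "model", "of"): 2,
    ("hidden", "Markov", "model", "represents"): 1,
})
print(right_branching_entropy(("hidden", "Markov"), counts))         # low
print(right_branching_entropy(("hidden", "Markov", "model"), counts))  # higher -> likely boundary
```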

(20)–(22)

Branching Entropy

[The boundary-detection example ("hidden Markov model") repeated across animation builds]

(23)

Branching Entropy

How to decide the boundary of a phrase?

O Decision of Left Boundary
  O Symmetric to the right boundary: reverse the word order (X: model Markov hidden) and find the left boundary located between X and x_i where the left branching entropy rises

Using a PAT tree to implement this

(24)

Branching Entropy

How to decide the boundary of a phrase?

O Implementation in the PAT tree
  O The probability of the children x_i of X and the right branching entropy of X are read directly from the node counts (a trie-based sketch follows below)

[PAT-tree example: X = "hidden Markov"; children x_1 = "hidden Markov model", x_2 = "hidden Markov chain"; other branches include state, distribution, variable]
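The slides implement the counts with a PAT tree; as a rough stand-in, the sketch below uses a plain word trie, which exposes the same child counts needed for branching entropy. A real PAT (Patricia) tree would compress single-child chains; this simplification is an assumption for illustration, and left branching entropy would use the same trie built on reversed word order.

```python
from math import log

class TrieNode:
    """A plain word trie standing in for the slide's PAT tree: each node stores
    how often its word sequence occurs, and its children are the observed
    continuations x_i of that sequence X."""
    def __init__(self):
        self.count = 0
        self.children = {}

    def insert(self, tokens):
        node = self
        for tok in tokens:
            node = node.children.setdefault(tok, TrieNode())
            node.count += 1

    def lookup(self, tokens):
        node = self
        for tok in tokens:
            if tok not in node.children:
                return None
            node = node.children[tok]
        return node

    def right_entropy(self, tokens):
        node = self.lookup(tokens)
        if node is None or not node.children:
            return 0.0
        total = sum(c.count for c in node.children.values())
        return -sum((c.count / total) * log(c.count / total)
                    for c in node.children.values())

# Build from a toy transcription: inserting every suffix counts all n-grams.
root = TrieNode()
sentence = "the hidden Markov model is a hidden Markov chain with hidden states".split()
for i in range(len(sentence)):
    root.insert(sentence[i:])

print(root.right_entropy(["hidden", "Markov"]))
print(root.right_entropy(["hidden", "Markov", "model"]))
```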

(25)

Key Term Extraction

[System diagram, with the Feature Extraction stage highlighted]

Extract some features for each candidate term

(26)

Feature Extraction

O Prosodic features
  O For each candidate term, computed at its first appearance

Feature Name | Feature Description
Duration (I–IV) | normalized duration (max, min, mean, range)

Speakers tend to use longer duration to emphasize key terms. Four values are used: the duration of each phone (e.g. phone "a") is normalized by the average duration of that phone, and the max, min, mean, and range are taken over the term.

(27)–(30)

Feature Extraction

O Prosodic features
  O For each candidate term, computed at its first appearance
  O Higher pitch may represent significant information
  O Higher energy emphasizes important information

Feature Name | Feature Description
Duration (I–IV) | normalized duration (max, min, mean, range)
Pitch (I–IV) | F0 (max, min, mean, range)
Energy (I–IV) | energy (max, min, mean, range)

(a feature-assembly sketch follows below)
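A sketch of how the Duration/Pitch/Energy (I–IV) statistics could be assembled, assuming per-phone durations from a forced alignment and frame-level F0/energy values for the term are already available; those inputs and the helper names are assumptions, not defined in the slides.

```python
import numpy as np

def stats(values):
    """The four statistics (I-IV) used on each prosodic stream."""
    v = np.asarray(values, dtype=float)
    return {"max": v.max(), "min": v.min(), "mean": v.mean(), "range": v.max() - v.min()}

def prosodic_features(phone_durations, avg_phone_duration, f0_values, energy_values):
    """phone_durations: list of (phone, duration) for the term's first occurrence.
    avg_phone_duration: average duration of each phone over the corpus.
    f0_values, energy_values: frame-level F0 / energy over the term (assumed
    to be extracted upstream, e.g. by a pitch tracker)."""
    # Duration (I-IV): each phone duration normalized by that phone's average duration
    normalized = [d / avg_phone_duration[p] for p, d in phone_durations]
    feats = {}
    feats.update({f"duration_{k}": x for k, x in stats(normalized).items()})
    feats.update({f"pitch_{k}": x for k, x in stats(f0_values).items()})
    feats.update({f"energy_{k}": x for k, x in stats(energy_values).items()})
    return feats

# Toy usage with made-up alignment and frame values
print(prosodic_features(
    phone_durations=[("a", 0.12), ("k", 0.05)],
    avg_phone_duration={"a": 0.10, "k": 0.06},
    f0_values=[210.0, 220.0, 215.0],
    energy_values=[0.6, 0.8, 0.7],
))
```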

(31)

Feature Extraction

O Lexical features
  O Using some well-known lexical features for each candidate term (sketch below)

Feature Name | Feature Description
TF | term frequency
IDF | inverse document frequency
TFIDF | tf * idf
PoS | the PoS tag
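A sketch of the TF/IDF/TFIDF features for a single-token candidate term over the lecture transcriptions; multi-word phrases would need n-gram matching, and the PoS tag would need an external tagger, both of which are omitted here.

```python
import math

def lexical_features(term, documents):
    """documents: list of token lists (e.g. ASR transcriptions of the lectures)."""
    tf = sum(doc.count(term) for doc in documents)            # term frequency
    df = sum(1 for doc in documents if term in doc)           # document frequency
    idf = math.log(len(documents) / df) if df else 0.0        # inverse document frequency
    return {"TF": tf, "IDF": idf, "TFIDF": tf * idf}

docs = [["hidden", "Markov", "model", "is", "a", "model"],
        ["the", "acoustic", "model"],
        ["language", "model", "and", "n-gram"]]
print(lexical_features("model", docs))
```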

(32)

Feature Extraction

O Semantic features
  O Probabilistic Latent Semantic Analysis (PLSA): Latent Topic Probability
  O Key terms tend to focus on limited topics

[PLSA diagram: documents D_i, latent topics T_k, terms t_j, with probabilities P(T_k | D_i) and P(t_j | T_k)]

(33)

Feature Extraction

O Semantic features
  O Probabilistic Latent Semantic Analysis (PLSA): Latent Topic Probability
  O Key terms tend to focus on limited topics

[Plot: latent topic distributions — a key term is concentrated on a few topics, a non-key term is spread over many]

How to use it? Describe the probability distribution with summary statistics (sketch below):

Feature Name | Feature Description
LTP (I–III) | Latent Topic Probability (mean, variance, standard deviation)
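A sketch of LTP (I–III) taken literally from the slide's description, assuming a PLSA-derived topic distribution P(T_k | t) for the term is already available; the exact way that distribution is obtained may differ in the original system.

```python
import numpy as np

def ltp_features(topic_posterior):
    """LTP (I-III): mean, variance, and standard deviation of the latent topic
    probability distribution of a candidate term.
    topic_posterior: length-K array summing to 1 (assumed to come from PLSA)."""
    p = np.asarray(topic_posterior, dtype=float)
    return {"LTP_mean": p.mean(), "LTP_var": p.var(), "LTP_std": p.std()}

# A key-term-like distribution (peaky) vs. a non-key-term-like one (flat)
print(ltp_features([0.85, 0.05, 0.05, 0.05]))   # high variance / std
print(ltp_features([0.25, 0.25, 0.25, 0.25]))   # low variance / std
```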

(34)–(35)

Feature Extraction

O Semantic features
  O Probabilistic Latent Semantic Analysis (PLSA): Latent Topic Significance
  O Within-topic to out-of-topic frequency ratio
  O Key terms tend to focus on limited topics

Feature Name | Feature Description
LTP (I–III) | Latent Topic Probability (mean, variance, standard deviation)
LTS (I–III) | Latent Topic Significance (mean, variance, standard deviation)

(36)–(37)

Feature Extraction

O Semantic features
  O Probabilistic Latent Semantic Analysis (PLSA): Latent Topic Entropy
  O Key terms tend to focus on limited topics, so a key term has lower LTE and a non-key term has higher LTE (sketch below)

Feature Name | Feature Description
LTP (I–III) | Latent Topic Probability (mean, variance, standard deviation)
LTS (I–III) | Latent Topic Significance (mean, variance, standard deviation)
LTE | term entropy over latent topics
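A sketch of LTS and LTE under one plausible reading of the slides: within-topic and out-of-topic frequencies are taken as PLSA-weighted expected counts, and LTE is the entropy of the term's latent topic distribution. The specific weighting is an assumption; only the general definitions come from the slides.

```python
import numpy as np

def latent_topic_significance(term_doc_counts, doc_topic_probs):
    """LTS_k(t): within-topic vs. out-of-topic frequency of term t for each topic k.
    term_doc_counts: n(t, d_i) for each document i (length N).
    doc_topic_probs: P(T_k | d_i), shape (N, K), assumed to come from PLSA."""
    n = np.asarray(term_doc_counts, dtype=float)   # (N,)
    p = np.asarray(doc_topic_probs, dtype=float)   # (N, K)
    within = n @ p                                 # expected count inside each topic
    outside = n @ (1.0 - p)                        # expected count outside each topic
    return within / np.maximum(outside, 1e-12)     # one ratio per topic

def latent_topic_entropy(topic_posterior):
    """LTE(t) = - sum_k P(T_k | t) log P(T_k | t); lower for key terms."""
    p = np.asarray(topic_posterior, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

lts = latent_topic_significance([3, 0, 1], [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
print(lts, lts.mean(), lts.var(), lts.std())            # LTS (I-III) statistics
print(latent_topic_entropy([0.85, 0.05, 0.05, 0.05]))   # key-term-like: low LTE
print(latent_topic_entropy([0.25, 0.25, 0.25, 0.25]))   # non-key-term-like: high LTE
```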

(38)

Key Term Extraction

[System diagram, with the Learning Methods stage highlighted: K-means Exemplar, AdaBoost, Neural Network → key terms such as entropy, acoustic model, ...]

Use learning approaches to extract key terms

(39)

Learning Methods

O Unsupervised learning
  O K-means Exemplar
    O Transform each term into a vector in LTS (Latent Topic Significance) space
    O Run K-means
    O Find the centroid of each cluster; the term closest to it is taken as the key term (sketch below)
    O The terms in the same cluster focus on a single topic and are related to the key term, so the key term can represent this topic
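A sketch of the K-means Exemplar step using scikit-learn, assuming each candidate term is already represented by its LTS vector; taking the member term nearest each centroid as the exemplar is one reading of "find the centroid of each cluster to be the key term".

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_exemplar_key_terms(terms, lts_vectors, n_key_terms):
    """terms: list of candidate terms; lts_vectors: (n_terms, K) LTS vectors.
    Returns one exemplar term per cluster (n_key_terms clusters)."""
    X = np.asarray(lts_vectors, dtype=float)
    km = KMeans(n_clusters=n_key_terms, n_init=10, random_state=0).fit(X)
    key_terms = []
    for c, centroid in enumerate(km.cluster_centers_):
        members = np.where(km.labels_ == c)[0]
        # exemplar: the member term closest to the cluster centroid
        best = members[np.argmin(np.linalg.norm(X[members] - centroid, axis=1))]
        key_terms.append(terms[best])
    return key_terms

terms = ["hmm", "viterbi", "acoustic model", "language model", "n-gram", "entropy"]
vecs = np.random.RandomState(0).rand(len(terms), 4)   # toy LTS vectors
print(kmeans_exemplar_key_terms(terms, vecs, n_key_terms=2))
```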

(40)

Learning Methods

O Supervised learning
  O Adaptive Boosting (AdaBoost)
  O Neural Network
  O Both automatically adjust the weights of the features to produce a classifier (sketch below)
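A sketch of the supervised setting with scikit-learn's AdaBoostClassifier and MLPClassifier standing in for the AdaBoost and neural-network classifiers; the feature matrix and binary key-term labels below are toy placeholders, not the corpus described later.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.neural_network import MLPClassifier

# X: one row of prosodic + lexical + semantic features per candidate term
# y: 1 if the candidate term is a reference key term, else 0 (toy data here)
rng = np.random.RandomState(0)
X = rng.rand(200, 20)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

adaboost = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X, y)
neural_net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0).fit(X, y)

candidate_features = rng.rand(5, 20)
print(adaboost.predict(candidate_features))     # 1 = extracted as key term
print(neural_net.predict(candidate_features))
```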

(41)

Experiments & Evaluation

(42)

Experiments

O Corpus
  O NTU lecture corpus
  O Mandarin Chinese with embedded English words, e.g. 我們的 solution 是 viterbi algorithm (Our solution is the Viterbi algorithm)
  O Single speaker
  O 45.2 hours

(43)

Experiments

O ASR Accuracy

Language | Mandarin | English | Overall
Char Acc (%) | 78.15 | 53.44 | 76.26

[Diagram: bilingual acoustic models (CH + EN) adapted from speaker-independent models with some data from the target speaker; the language model interpolates a background LM from out-of-domain corpora with an adaptive trigram LM from the in-domain corpus]

Bilingual AM and model adaptation

(44)

Experiments

O Reference Key Terms
  O Annotations from 61 students who had taken the course
    O If the k-th annotator labeled N_k key terms, each of them received a score of 1/N_k and all other terms a score of 0
    O Terms are ranked by the sum of the scores given by all annotators
    O The top N terms are chosen from the list (N is the average N_k); a scoring sketch follows below
  O N = 154 key terms: 59 key phrases and 95 keywords
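A sketch of aggregating the student annotations into the reference list, assuming the per-annotator 1/N_k weighting described above.

```python
from collections import defaultdict

def build_reference_key_terms(annotations):
    """annotations: list of per-annotator key-term lists (one list per student).
    Each annotator's labeled terms get a score of 1/N_k; the top-N terms are kept,
    where N is the average number of labeled terms."""
    scores = defaultdict(float)
    for labeled in annotations:
        if not labeled:
            continue
        weight = 1.0 / len(labeled)
        for term in labeled:
            scores[term] += weight
    n = round(sum(len(a) for a in annotations) / len(annotations))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n]

annotations = [
    ["hmm", "viterbi", "language model"],
    ["hmm", "acoustic model"],
    ["viterbi", "hmm", "n-gram", "language model"],
]
print(build_reference_key_terms(annotations))
```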

(45)

Experiments

O Evaluation
  O Unsupervised learning: the number of extracted key terms is set to N
  O Supervised learning: 3-fold cross validation
  O Performance is measured by F-measure against the reference key terms (sketch below)
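A sketch of how the F-measure against the reference key terms could be computed, e.g. for the unsupervised setting where the number of extracted terms is fixed at N.

```python
def f_measure(extracted, reference):
    """F1 of an extracted key-term list against the reference key terms."""
    extracted, reference = set(extracted), set(reference)
    if not extracted or not reference:
        return 0.0
    hits = len(extracted & reference)
    if hits == 0:
        return 0.0
    precision = hits / len(extracted)
    recall = hits / len(reference)
    return 2 * precision * recall / (precision + recall)

reference = ["hmm", "viterbi", "language model", "acoustic model"]
extracted = ["hmm", "viterbi", "entropy", "n-gram"]
print(f_measure(extracted, reference))   # 0.5
```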

(46)

Experiments

O Feature Effectiveness
  O Neural network for keywords from ASR transcriptions

Feature set | F-measure (%)
Pr (Prosodic) | 20.78
Lx (Lexical) | 42.86
Sm (Semantic) | 35.63
Pr + Lx | 48.15
Pr + Lx + Sm | 56.55

  O Each set of features alone gives an F1 between 20% and 42%
  O Prosodic features and lexical features are additive
  O All three sets of features are useful

(47)

Experiments

O Overall Performance (manual transcriptions)

Method | F-measure (%)
Baseline: conventional TFIDF scores w/o branching entropy, with stop word removal and PoS filtering | 23.38
U: TFIDF | 51.95
U: K-means Exemplar | 55.84
S: AB (AdaBoost) | 62.39
S: NN (Neural Network) | 67.31

  O Branching entropy performs well
  O K-means Exemplar outperforms TFIDF
  O Supervised approaches are better than unsupervised approaches

(48)

Experiments

O Overall Performance (F-measure, %)

Method | manual | ASR
Baseline | 23.38 | 20.78
U: TFIDF | 51.95 | 43.51
U: K-means Exemplar | 55.84 | 52.60
S: AB (AdaBoost) | 62.39 | 57.68
S: NN (Neural Network) | 67.31 | 62.70

  O Performance on ASR transcriptions is slightly worse than on manual transcriptions, but reasonable
  O Supervised learning with a neural network gives the best results

(49)

Conclusion

(50)

Conclusion

O We propose a new approach to extract key terms from spoken course lectures
O The performance is improved by
  O Identifying phrases by branching entropy
  O Combining prosodic, lexical, and semantic features
O The results are encouraging

(51)

Thanks for your attention! Q & A

NTU Virtual Instructor: http://speech.ee.ntu.edu.tw/~RA/lecture
