Unsupervised Spoken Language
Understanding in Dialogue Systems
YUN-NUNG (VIVIAN) CHEN 陳 縕儂 CARNEGIE MELLON UNIVERSITY
2015/01/16 @NCTU http://vivianchen.idv.tw
1 UNSUPERVISED SPOKEN LANGUAGE UNDERSTANDING IN DIALOGUE SYSTEMS
Outline
Introduction
Unsupervised Slot Induction [Chen et al., ASRU’13 & Chen et al., SLT’14]
Unsupervised Domain Exploration [Chen and Rudnicky, SLT’14]
Unsupervised Relation Detection [Chen et al., SLT’14]
Conclusions & Future Work
Spoken Language Understanding (SLU)
SLU in dialogue systems
◦ SLU maps natural language inputs to semantic forms
◦ Example: “I would like to go to NCTU on Friday.” → location: NCTU, date: Friday
◦ Semantic frames, slots, and values are often manually defined by domain experts or developers.
What are the problems?
Problems with Predefined Information
Generalization: predefined slots may not generalize to real-world users.
Bias propagation: predefined slots can bias subsequent data collection and annotation.
Maintenance: when new data comes in, developers need to start a new round of annotation to analyze the data and update the grammar.
Efficiency: manual annotation is time-consuming and costly.
Can we automatically induce semantic information without annotations?
Unsupervised Slot Induction
Motivation
◦ Spoken dialogue systems (SDS) require predefined semantic slots to parse users’ input into semantic representations
◦ Frame semantics theory provides generic semantics
◦ Distributional semantics captures contextual latent semantics
Probabilistic Frame-Semantic Parsing
FrameNet [Baker et al., 1998]
◦ a linguistically principled semantic resource based on frame-semantics theory
◦ Example: in “low fat milk”, “milk” evokes the “food” frame and “low fat” fills the descriptor frame element
◦ Frame (food): contains words referring to items of food
◦ Frame element (descriptor): indicates a characteristic of the food
SEMAFOR [Das et al., 2010; 2013]
◦ a state-of-the-art frame-semantic parser, trained on manually annotated FrameNet sentences
Step 1: Frame-Semantic Parsing for ASR outputs
can i have a cheap restaurant
Frame: capability (FT LU: can, FE LU: i) → Bad!
Frame: expensiveness (FT LU: cheap) → Good!
Frame: locale by use (FT/FE LU: restaurant) → Good!
Task: adapting generic frames to task-specific settings for SDSs
Step 2: Slot Ranking Model
Main Idea
◦ Ranking domain-specific concepts higher than generic semantic concepts
◦ Each frame in the parse of “can i have a cheap restaurant” (capability, expensiveness, locale by use) is a slot candidate; the lexical unit that evokes it (e.g. “cheap”) is its slot filler
Step 2: Slot Ranking Model
Rank the slot candidates by integrating two scores
◦ the frequency of the slot candidate in the SEMAFOR-parsed corpus: slots with higher frequency may be more important
◦ the coherence of slot fillers: domain-specific concepts should focus on fewer topics, so their fillers should be similar to each other
Example
◦ slot: quantity, with fillers “a”, “one”, “all”, “three”: lower coherence in topic space
◦ slot: expensiveness, with fillers “cheap”, “expensive”, “inexpensive”: higher coherence in topic space
Step 2: Slot Ranking Model
Measure coherence by the pairwise similarity of slot fillers
◦ For each slot candidate s_i, compute a coherence score h(s_i) over its slot fillers
◦ Example: slot candidate “expensiveness”, with corresponding slot fillers “cheap”, “not expensive”
A slot with higher h(s_i) usually focuses on fewer, more specific topics, which is preferable for SDS slots.
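The coherence idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the exact formulation from the talk: coherence is taken as the average cosine similarity over all unordered pairs of filler vectors, and the combined score (`slot_score`, with a hypothetical weight `alpha`) mixes log-frequency with coherence. The 2-d filler vectors are toy values.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def coherence(filler_vectors):
    """Average pairwise cosine similarity of a slot's filler vectors (assumed form of h)."""
    pairs = list(combinations(filler_vectors, 2))
    if not pairs:
        return 0.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

def slot_score(frequency, filler_vectors, alpha=0.5):
    """Hypothetical combination of the two scores: log-frequency and coherence."""
    return (1 - alpha) * math.log(frequency) + alpha * coherence(filler_vectors)

# Toy data: slot -> (frequency in the parsed corpus, filler vectors).
slots = {
    "expensiveness": (50, [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15]]),
    "quantity": (50, [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]),
}
ranked = sorted(slots, key=lambda s: slot_score(*slots[s]), reverse=True)
```

With equal frequencies, the slot whose fillers point in similar directions ("expensiveness") ranks above the one whose fillers are scattered ("quantity").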
Step 2: Slot Ranking Model
How to define the vector for each slot filler?
◦ Run clustering and then build vectors based on the clustering results
◦ K-means, spectral clustering, etc.
◦ Use distributional semantics to map words to vectors
◦ LSA, PLSA, neural word embeddings (word2vec)
Experiments for Slot Induction
Dataset
◦ Cambridge University SLU corpus [Henderson, 2012]
◦ Restaurant recommendation in an in-car setting in Cambridge
◦ WER = 37%
◦ vocabulary size = 1868
◦ 2,166 dialogues
◦ 15,453 utterances
◦ dialogue slots: addr, area, food, name, phone, postcode, price range, task, type
[Figure: the mapping table between induced and reference slots]
Experiments for Slot Induction
◦ Slot Induction Evaluation: MAP of the slot ranking model to measure the quality of induced slots via the mapping table
◦ Slot Filling Evaluation: MAP-F-H/S: weight the MAP score with F-measure of two slot filler lists
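As a reminder of how the MAP metric works, here is a minimal sketch of average precision over a ranked slot list. It assumes the standard definition (precision at each relevant rank, normalized by the number of relevant items); the slot names and the relevance set are illustrative, standing in for induced slots that the mapping table links to reference slots.

```python
def average_precision(ranked_slots, relevant):
    """Standard average precision of one ranked list against a set of relevant items."""
    hits = 0
    precision_sum = 0.0
    for rank, slot in enumerate(ranked_slots, start=1):
        if slot in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """Mean of average precision over several (ranking, relevant set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Toy ranking: two of the four induced slots map to reference slots.
ranking = ["expensiveness", "capability", "locale_by_use", "quantity"]
relevant = {"expensiveness", "locale_by_use"}
ap = average_precision(ranking, relevant)
```

Here the relevant slots sit at ranks 1 and 3, so the average precision is (1/1 + 2/3) / 2.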
Approach                        ASR: MAP   ASR: MAP-F-H   ASR: MAP-F-S
Frame Sem
  (a) Frequency                 67.61      26.96          27.29
  (b) K-Means                   67.38      27.38          27.99
  (c) Spectral Clustering       68.06      30.52          28.40
Frame Sem + Dist Sem
  (d) Google News, RepSim       72.71      31.14          31.44
  (e) Google News, NeiSim       73.35      31.44          31.81
  (f) Freebase, RepSim          71.48      29.81          30.37
  (g) Freebase, NeiSim          73.02      30.89          30.72
  (h) (d) + (e) + (f) + (g)     76.22      30.17          30.53
Adding distributional information outperforms our baselines
Combining the two datasets integrates the coverage of Google News with the precision of Freebase, ranks correct slots higher, and achieves the best MAP score.
Question?
Unsupervised Domain Exploration
Target: given a conversational interaction with an SDS, predict which application the user wants to launch
Approach:
◦ Step 1: enriching the semantics using word embeddings
◦ Step 2: using the descriptions of applications as a retrieval cue to find relevant applications
Proposed Framework
[Figure: pipeline for the query utterance “play lady gaga’s bad romance”]
1. Semantic Seed Generation: frame-semantic parsing and entity linking (Wikipedia, Freebase structured knowledge) produce the semantic seeds (slot types), e.g. singer, songwriter, song, music
2. Semantics Enrichment: an enrichment process based on word embeddings
3. Retrieval Process: a ranking model over the application data returns ranked applications, e.g. Pandora
Semantic Seed Generation
Semantic parsing performs well on a generic domain, but cannot recognize domain-specific named entities.
• Main idea: slot types help imply the semantic meaning of the utterance for expanding domain knowledge.
• Frame Type of Semantic Parsing
Q: compose an email to alex
Frame: text creation (FT LU: compose, FE LU: an email)
Frame: contacting (FT LU: email)
S_frm(Q): frame-based semantic seeds
Semantic Seed Generation
• Entity Type from Linked Structured Knowledge
◦ Wikipedia Page Linking
Q: play lady gaga’s bad romance
“lady gaga”: … is an American singer, songwriter, and actress.
“bad romance”: … is a song by American singer …
S_wk(Q): Wikipedia-based semantic seeds
◦ Freebase List Linking
Q: play lady gaga’s bad romance
Linked types: celebrity, composition, …; composition, canonical version, musical recording, …
S_fb(Q): Freebase-based semantic seeds
Semantic Enrichment
• Main idea: utilizing distributed word embeddings to obtain semantically related knowledge for each word.
1) Training word embeddings on the application vendor descriptions.
2) Extracting the most related words for each word using the trained embeddings (e.g. “text” → “message”, “msg”).
Words with higher similarity often occur in common contexts in the embedding training data.
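The second enrichment step amounts to a nearest-neighbor lookup in the embedding space. The sketch below assumes cosine similarity as the relatedness measure and uses hand-made 3-d toy vectors in place of embeddings trained on application descriptions; `most_similar` and `top_n` are illustrative names, not part of the original system.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def most_similar(word, embeddings, top_n=2):
    """Return the top_n words whose vectors are closest to the given word's vector."""
    target = embeddings[word]
    scored = [(other, cosine(target, vec))
              for other, vec in embeddings.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:top_n]]

# Toy embeddings (assumed values, for illustration only).
toy_embeddings = {
    "text": [0.9, 0.1, 0.0],
    "message": [0.8, 0.2, 0.1],
    "msg": [0.85, 0.15, 0.0],
    "song": [0.0, 0.1, 0.9],
    "music": [0.1, 0.0, 0.95],
}
neighbors = most_similar("text", toy_embeddings)
```

On these toy vectors, the neighbors of "text" are "message" and "msg", mirroring the example on the slide.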
Retrieval Process
• Main idea: retrieving the applications that are more likely to support users’ requests.
◦ P(Q | A): probability that the user speaks Q to make the request for launching the application A
• Query Reformulation (Q’)
◦ Embedding-Enriched Query: integrates words similar to each word in Q
◦ Type-Embedding-Enriched Query: additionally adds words similar to the semantic seeds S(Q)
• Ranking Model
◦ P(x | A): probability that word x occurs in the application A
The application with higher P(Q | A) is more likely to support the user’s desired functions.
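One simple way to realize such a ranking model is a unigram query-likelihood scorer. The sketch below is an assumption about the form of the model, not the formula from the talk: it scores each application by the summed log-probability of the (enriched) query words under the application's description, with add-one smoothing. The application descriptions and the "Email" application are toy data.

```python
import math
from collections import Counter

def word_prob(word, description_tokens, vocab_size, smoothing=1.0):
    """Smoothed estimate of P(x | A) from the application's description."""
    counts = Counter(description_tokens)
    return (counts[word] + smoothing) / (len(description_tokens) + smoothing * vocab_size)

def log_query_likelihood(query_tokens, description_tokens, vocab_size):
    """log P(Q | A) under a unigram model: sum of log word probabilities."""
    return sum(math.log(word_prob(w, description_tokens, vocab_size))
               for w in query_tokens)

def rank_applications(query_tokens, app_descriptions):
    """Rank applications by descending log P(Q | A)."""
    vocab = {w for toks in app_descriptions.values() for w in toks} | set(query_tokens)
    scores = {app: log_query_likelihood(query_tokens, toks, len(vocab))
              for app, toks in app_descriptions.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy application descriptions (hypothetical text).
apps = {
    "Pandora": "listen to music and songs from your favorite singer".split(),
    "Email": "compose and send email messages to your contacts".split(),
}
# Enriched query: original word plus words added from semantic seeds.
enriched_query = ["play", "song", "music", "singer"]
ranking = rank_applications(enriched_query, apps)
```

The original query word "play" matches neither description; the enrichment words "music" and "singer" are what pull the music application to the top.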
Results
[Figure: two plots, MAP (0.22 to 0.38) and P@5 (0.25 to 0.49), against the number of words per query (0 to 200). Curves: Baseline; Embedding-Enriched (T); Type-Embedding-Enriched (T) with Frame, Wikipedia, Freebase, and Hand-crafted types]
The thresholds are tuned on the development set.
Overall Results
Approach                                  ASR: MAP   ASR: P@5
Original Query                            25.50      34.97
Embedding-Enriched                        30.42      40.72
Type-Embedding-Enriched: Frame            30.11      39.59
Type-Embedding-Enriched: Wikipedia        30.74      40.82
Type-Embedding-Enriched: Freebase         32.02      41.23
Type-Embedding-Enriched: Hand-Crafted     34.91      45.03
◦ Enriching semantics improves performance by involving domain-specific knowledge.
◦ Freebase results are better than those of the embedding-enriched method, showing that slot types from Freebase can effectively and efficiently expand domain-specific knowledge.
◦ The hand-crafted mapping shows that correct slot types offer better understanding, and indicates the room for improvement.
Question?
Unsupervised Relation Detection
Spoken Language Understanding (SLU): convert ASR outputs into a predefined semantic output format
◦ “when was james cameron’s avatar released”
◦ Intent: FIND_RELEASE_DATE
◦ Slot-Val: MOVIE_NAME=“avatar”, DIRECTOR_NAME=“james cameron”
Relation: semantic interpretation of input utterances
◦ movie.release_date, movie.name, movie.directed_by, director.name
Unsupervised SLU: utilize external knowledge to help relation detection without labelled data
Semantic Knowledge Graph
Priors for SLU
What are knowledge graphs?
◦ Graphs with
◦ strongly typed and uniquely identified entities (nodes)
◦ facts/literals connected by relations (edge)
Examples:
◦ Satori, Google KG, Facebook Open Graph, Freebase
How large?
◦ > 500M entities, >1.5B relations, > 5B facts
How broad?
◦ Wikipedia-breadth: from “American Football” to “Zoos”
• Slides of Larry Heck, Dilek Hakkani-Tur, and Gokhan Tur, Leveraging Knowledge Graphs for Web-Scale Unsupervised Semantic Parsing, in Proceedings of Interspeech, 2013.
Semantic Interpretation via Relations
Two examples, differentiated by including the originating node types in the relation

User Utterance: find movies produced by james cameron
SPARQL Query (simplified):
SELECT ?movie {?movie. ?movie.produced_by?producer. ?producer.name"James Cameron".}
Logical Form:
λx. ∃y. movie.produced_by(x, y) ∧ person.name(y, z) ∧ z=“James Cameron”
Relation: movie.produced_by, producer.name

User Utterance: who produced avatar
SPARQL Query (simplified):
SELECT ?producer {?movie.name"Avatar". ?movie.produced_by?producer.}
Logical Form:
λy. ∃x. movie.produced_by(x, y) ∧ movie.name(x, z) ∧ z=“Avatar”
Relation: movie.name, movie.produced_by

[Figure: knowledge graph fragment for each example, with a MOVIE node linked to a PERSON node by produced_by, and name properties on the nodes]
Proposed Framework
[Figure: framework overview for the input utterance “find me some films directed by james cameron”]
◦ Background knowledge: knowledge graph, Bing query snippets
◦ Relation Inference from Gazetteers: the entity dictionary built from knowledge graph entities gives P_E(r | w)
◦ Relational Surface Form Derivation: entity embeddings give entity surface forms, P_F(r | w), and entity syntactic contexts, P_C(r | w), which form the local relational surface forms
◦ Probabilistic Enrichment: combines the probabilities into R_u(r)
◦ Bootstrapping: relabels the utterances to produce the final results
Relation Inference from Gazetteers
Gazetteers (entity lists)
[Figure: “james cameron” is found in the director and producer gazetteers, among others; its director type is weighted by the number of movies James Cameron directed and yields the relations movie.directed_by and director.name]
• Dilek Hakkani-Tur, Asli Celikyilmaz, Larry Heck, and Gokhan Tur, Probabilistic enrichment of knowledge graph entities for relation detection in conversational understanding, in Proceedings of Interspeech, 2014.
Relational Surface Form Derivation
Web Resource Mining
◦ Bing query snippets including entity pairs connected with specific relations in the KG
◦ Example for directed_by: “Avatar is a 2009 American epic science fiction film directed by James Cameron.”
Dependency Parsing
[Figure: dependency parse of the example sentence with arcs nsub, num, cop, det, nn, vmod, and prep_by (collapsed from prep and pobj); the entities “Avatar” and “James Cameron” are replaced by the tags $movie and $director]
Relational Surface Form Derivation
Dependency-Based Entity Embeddings
1) Word & Context Extraction
Word                                  Contexts
$movie                                film/nsub-1
is                                    film/cop-1
a                                     film/det-1
2009                                  film/num-1
american, epic, science, fiction      film/nn-1
film                                  $movie/nsub, is/cop, a/det, 2009/num, american/nn, epic/nn, science/nn, fiction/nn, directed/vmod
directed                              $director/prep_by
$director                             directed/prep_by-1
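The word and context extraction can be sketched as a pass over dependency arcs. This sketch assumes the parse is already available as (head, relation, modifier) triples with entities replaced by tags; `extract_contexts` is an illustrative helper, and the relation names follow the slide's spelling.

```python
from collections import defaultdict

def extract_contexts(arcs):
    """Map each word to its dependency contexts.

    For an arc (head, rel, modifier), the head receives the context
    "modifier/rel" and the modifier receives the inverse context "head/rel-1".
    """
    contexts = defaultdict(list)
    for head, rel, modifier in arcs:
        contexts[head].append(f"{modifier}/{rel}")
        contexts[modifier].append(f"{head}/{rel}-1")
    return dict(contexts)

# Arcs for "$movie is a 2009 american epic science fiction film directed by $director".
arcs = [
    ("film", "nsub", "$movie"),
    ("film", "cop", "is"),
    ("film", "det", "a"),
    ("film", "num", "2009"),
    ("film", "nn", "american"),
    ("film", "nn", "epic"),
    ("film", "nn", "science"),
    ("film", "nn", "fiction"),
    ("film", "vmod", "directed"),
    ("directed", "prep_by", "$director"),
]
contexts = extract_contexts(arcs)
```

Because this sketch emits both directions for every arc, "directed" also receives the inverse context film/vmod-1, which the table on the slide does not list.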
Relational Surface Form Derivation
Dependency-Based Entity Embeddings
2) Training Process
◦ Each word w is associated with a vector v_w and each context c is represented as a vector v_c
◦ Learn vector representations for both words and contexts such that the dot product v_w · v_c associated with good word-context pairs belonging to the training data D is maximized
◦ Objective function:
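The formula itself did not survive in this text. A plausible form, assuming the slide follows the usual skip-gram style objective for dependency-based embeddings and considers only the observed (good) word-context pairs as the bullet above describes, is:

```latex
\arg\max_{v_w,\, v_c} \sum_{(w, c) \in D} \log \frac{1}{1 + \exp(-v_c \cdot v_w)}
```

Each term is the log-sigmoid of the dot product v_w · v_c, so maximizing the sum pushes the vectors of co-occurring words and contexts toward each other.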
Relational Surface Form Derivation
Surface Form Derivation
Entity Surface Forms
◦ learn the surface forms corresponding to entities ($char, $director, etc.)
◦ based on the word vector v_w: words with similar contexts
◦ $char: “character”, “role”, “who”
◦ $director: “director”, “filmmaker”
◦ $genre: “action”, “fiction”
Entity Syntactic Contexts
◦ learn the important contexts of entities
◦ based on the context vector v_c: words frequently occurring together with the entity
◦ $char: “played”
◦ $director: “directed”
Probabilistic Enrichment
Integrate relations from
◦ Prior knowledge
◦ Entity surface forms
◦ Entity syntactic contexts
Integrated Relations for Words by
◦ Unweighted: combine all relations with binary values
◦ Weighted: combine all relations and keep the highest weight of each relation
◦ Highest Weighted: keep only the most probable relation of each word
Integrated Relations for Utterances by
• Dilek Hakkani-Tur, Asli Celikyilmaz, Larry Heck, and Gokhan Tur, Probabilistic enrichment of knowledge graph entities for relation detection in conversational understanding, in Proceedings of Interspeech, 2014.
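The three word-level integration schemes can be sketched as follows. This is an illustration under assumptions: each knowledge source (for example P_E, P_F, P_C) is a dictionary from relation to probability for one word, and the utterance-level score is taken as the maximum over the words, which is an assumed aggregation since the formula is not preserved here. Function and mode names are illustrative.

```python
def integrate_word(sources, mode):
    """Combine per-word relation probabilities from several sources.

    sources: list of dicts mapping relation -> probability for a single word.
    mode: "unweighted", "weighted", or "highest".
    """
    relations = {r for src in sources for r in src}
    best = {r: max(src.get(r, 0.0) for src in sources) for r in relations}
    best = {r: p for r, p in best.items() if p > 0}
    if mode == "unweighted":
        return {r: 1.0 for r in best}          # binary values
    if mode == "weighted":
        return best                            # highest weight per relation
    if mode == "highest":
        if not best:
            return {}
        top = max(best, key=best.get)          # single most probable relation
        return {top: best[top]}
    raise ValueError(f"unknown mode: {mode}")

def integrate_utterance(word_relations):
    """Assumed aggregation: an utterance scores each relation by its best word."""
    scores = {}
    for relations in word_relations:
        for r, p in relations.items():
            scores[r] = max(scores.get(r, 0.0), p)
    return scores

# Toy probabilities for two words of "find me some films directed by james cameron".
cameron = [{"director.name": 0.7, "producer.name": 0.3}]       # from gazetteers
directed = [{"movie.directed_by": 0.6}, {"movie.directed_by": 0.8}]  # forms, contexts
utterance = integrate_utterance([
    integrate_word(cameron, "highest"),
    integrate_word(directed, "weighted"),
])
```

The highest-weighted scheme drops the less likely producer.name reading of "cameron", while the weighted scheme would keep it with weight 0.3.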
Bootstrapping
Unsupervised Self-Training
◦ Start from utterances with relation weights: R_u1(r), R_u2(r), R_u3(r), …
◦ Create pseudo labels for training by a threshold: u1: L_u1(r), u2: L_u2(r), u3: L_u3(r), …
◦ Train a multi-label multi-class classifier estimating relations given an utterance
◦ AdaBoost: an ensemble of M weak classifiers
◦ The classifier outputs a probability distribution over relations
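The pseudo-labeling step can be sketched as a simple threshold over the relation weights. The threshold value of 0.5 and the helper name `pseudo_labels` are assumptions for illustration; the classifier training that consumes these labels is not shown.

```python
def pseudo_labels(utterance_weights, threshold=0.5):
    """Turn relation weights R_u(r) into label sets L_u(r) by thresholding."""
    return {
        utterance: sorted(r for r, weight in weights.items() if weight >= threshold)
        for utterance, weights in utterance_weights.items()
    }

# Toy relation weights for two utterances.
weights = {
    "who produced avatar": {
        "movie.name": 0.9, "movie.produced_by": 0.7, "movie.directed_by": 0.2,
    },
    "find movies produced by james cameron": {
        "movie.produced_by": 0.8, "producer.name": 0.6, "director.name": 0.4,
    },
}
labels = pseudo_labels(weights)
```

In a self-training loop, a classifier trained on these labels would re-score the utterances, and the thresholding would be repeated on its output distribution.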
Experiments of Relation Detection
Dataset
Knowledge Base: Freebase
◦ 670K entities
◦ 78 entity types (movie names, actors, etc.)
Relation Detection Data
◦ Crowd-sourced utterances
◦ Manually annotated with SPARQL queries and relations
Query Statistics                          Dev      Test
% entity only                             8.9%     10.7%
% rel only w/ specified movie names       27.1%    27.5%
% rel only w/ specified other names       39.8%    39.6%
% more complicated relations              15.4%    14.7%
% not covered                             8.8%     7.6%
#utterances                               3338     1084
Example: the user utterance “who produced avatar” has the relations movie.name and movie.produced_by
Experiments of Relation Detection
Overall performance
Evaluation Metric: micro F-measure (%)
Approach                                     Unweighted          Weighted            Highest Weighted
                                             Ori     Bootstrap   Ori     Bootstrap   Ori     Bootstrap
Baseline
  Gazetteer                                  35.21   36.91       37.93   40.10       36.08   38.89
  Gazetteer + Weakly Supervised              25.07   37.39       39.04   39.07       39.40   39.98
  Gazetteer + Entity Surface Form (Reg)      34.23   34.91       36.57   38.13       34.69   37.16
Proposed
  Gazetteer + Entity Surface Form (Dep)      37.44   38.37       41.01   41.10       39.19   42.74
  Gazetteer + Entity Context                 35.31   37.23       38.04   38.88       37.25   38.04
  Gazetteer + Entity Surface Form + Context  37.66   38.64       40.29   41.98       40.07   43.34
◦ Words derived by dependency embeddings can successfully capture the surface forms of entity tags, while words derived by regular embeddings cannot.
◦ Words derived from entity contexts slightly improve performance.
◦ Combining all approaches performs best, while the major improvement comes from the derived entity surface forms.
◦ With the same information, learning surface forms from dependency-based embeddings performs better, because there is a mismatch between written and spoken language.
◦ Weighted methods perform better with fewer features, and highest weighted methods perform better with more features.
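For reference, the micro F-measure used above pools true positives, false positives, and false negatives over all utterances before computing precision and recall. The sketch below assumes each utterance has a set of predicted relations and a set of gold relations; the example data is illustrative.

```python
def micro_f1(predicted, gold):
    """Micro-averaged F-measure over multi-label predictions.

    predicted, gold: parallel lists of relation collections, one per utterance.
    """
    tp = fp = fn = 0
    for pred, ref in zip(predicted, gold):
        pred, ref = set(pred), set(ref)
        tp += len(pred & ref)   # relations predicted and correct
        fp += len(pred - ref)   # relations predicted but wrong
        fn += len(ref - pred)   # gold relations that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

predicted = [{"movie.name", "movie.produced_by"}, {"movie.directed_by"}]
gold = [{"movie.name", "movie.produced_by"}, {"movie.directed_by", "director.name"}]
score = micro_f1(predicted, gold)
```

In this example three relations are correct and one gold relation is missed, giving precision 1.0 and recall 0.75.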
Experiments of Relation Detection
Entity Surface Forms Derived from Dependency Embeddings
The functional similarity carried by dependency-based entity embeddings effectively benefits the relation detection task.
Entity Tag     Derived Words
$character     character, role, who, girl, she, he, officier
$director      director, dir, filmmaker
$genre         comedy, drama, fantasy, cartoon, horror, sci
$language      language, spanish, english, german
$producer      producer, filmmaker, screenwriter
Experiments of Relation Detection
Effectiveness of Boosting
◦ The best result comes from the combination of all approaches, because probabilities from different resources can complement each other.
◦ Adding only entity surface forms performs similarly, showing that the major improvement comes from relational entity surface forms.
◦ Boosting significantly improves performance for most approaches.
[Figure: F-measure (0.33 to 0.44) over boosting iterations 1 to 10 for Gaz., Gaz. + Weakly Supervised, Gaz. + Entity Surface Form (BOW), Gaz. + Entity Surface Form (Dep), Gaz. + Entity Context, and Gaz. + Entity Surface Form + Context]
Question?
Conclusions & Future Work
Conclusions
◦ Unsupervised SLU is becoming more and more popular.
◦ Using external knowledge helps SLU in different ways.
◦ Word embeddings are very useful.
Future Work
◦ Fusion of various knowledge resources
◦ Different resources help SLU in different ways
◦ Relation between slots
◦ Understanding inter-slot relations can help develop better SDS
◦ Active learning
◦ In terms of practicality and efficiency, manually labeling a small set of samples can boost performance.
Q & A
THANKS FOR YOUR ATTENTION!