10/02/2014 Sphinx Lunch
DERIVING LOCAL RELATIONAL SURFACE FORMS FROM DEPENDENCY-BASED ENTITY EMBEDDINGS FOR UNSUPERVISED SPOKEN LANGUAGE UNDERSTANDING
YUN-NUNG (VIVIAN) CHEN
DILEK HAKKANI-TÜR & GOKHAN TUR
Outline
Introduction
◦ Main Idea
◦ Semantic Knowledge Graph
◦ Semantic Interpretation via Relation
Proposed Approach
◦ Relation Inference from Gazetteers
◦ Relational Surface Form Derivation
◦ Probabilistic Enrichment
◦ Bootstrapping
Experiments
Conclusions
Main Idea
Relation Detection for Unsupervised SLU
Spoken Language Understanding (SLU): convert automatic speech recognition (ASR) outputs into a pre-defined semantic output format
Relation: semantic interpretation of input utterances
◦ movie.release_date, movie.name, movie.directed_by, director.name
Unsupervised SLU: utilize external knowledge to help relation detection without labelled data
“when was james cameron’s avatar released”
Intent: FIND_RELEASE_DATE
Slot-Val: MOVIE_NAME=“avatar”, DIRECTOR_NAME=“james cameron”
Semantic Knowledge Graph
Priors for SLU
What are knowledge graphs?
◦ Graphs with
◦ strongly typed and uniquely identified entities (nodes)
◦ facts/literals connected by relations (edges)
Examples:
◦ Satori, Google KG, Facebook Open Graph, Freebase
How large?
◦ > 500M entities, >1.5B relations, > 5B facts
How broad?
◦ Wikipedia-breadth: from “American Football” to “Zoos”
• Slides from Larry Heck, Dilek Hakkani-Tür, and Gokhan Tur, “Leveraging Knowledge Graphs for Web-Scale Unsupervised Semantic Parsing,” in Proceedings of Interspeech, 2013.
Semantic Interpretation via Relations
Two Examples
◦ the two examples are differentiated by including the originating node type in the relation

User Utterance: find movies produced by james cameron
SPARQL Query (simplified): SELECT ?movie { ?movie movie.produced_by ?producer . ?producer person.name "James Cameron" . }
Logical Form: λx. ∃y. movie.produced_by(x, y) ∧ person.name(y, z) ∧ z=“James Cameron”
Relations: movie.produced_by, producer.name

User Utterance: who produced avatar
SPARQL Query (simplified): SELECT ?producer { ?movie movie.name "Avatar" . ?movie movie.produced_by ?producer . }
Logical Form: λy. ∃x. movie.produced_by(x, y) ∧ movie.name(x, z) ∧ z=“Avatar”
Relations: movie.name, movie.produced_by

[Graph fragment for both examples: MOVIE --produced_by--> PERSON, each node carrying a name attribute]
Proposed Framework
Input Utterance: “find me some films directed by james cameron”
Background Knowledge: Bing query snippets, knowledge graph
◦ Relation Inference from Gazetteers: match words against the knowledge graph entity dictionary → P_E(r | w)
◦ Relational Surface Form Derivation: dependency-based entity embeddings trained on query snippets → local relational surface forms
◦ Entity Surface Forms → P_F(r | w)
◦ Entity Syntactic Contexts → P_C(r | w)
◦ Probabilistic Enrichment: integrate the scores into relation weights R_u(r) for each utterance
◦ Bootstrapping: relabel utterances and retrain → Final Results
Relation Inference from Gazetteers
Gazetteers (entity lists)
◦ “james cameron” matches entries in both the director and the producer gazetteers
◦ each candidate relation is weighted by prior counts from the knowledge graph (e.g., #movies James Cameron directed)
◦ inferred relations for the example: movie.directed_by, director.name
• Dilek Hakkani-Tür, Asli Celikyilmaz, Larry Heck, and Gokhan Tur, “Probabilistic Enrichment of Knowledge Graph Entities for Relation Detection in Conversational Understanding,” in Proceedings of Interspeech, 2014.
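A minimal sketch of such a prior, assuming hypothetical gazetteer counts (the entries and numbers below are illustrative, not from the paper):

```python
# Hypothetical knowledge graph counts: how many facts of each relation
# involve the surface form (e.g., #movies James Cameron directed).
gazetteer_counts = {
    "james cameron": {"movie.directed_by": 50, "movie.produced_by": 20},
}

def relation_prior(word):
    """P_E(r | w): normalize the knowledge graph counts into a relation prior."""
    counts = gazetteer_counts.get(word, {})
    total = sum(counts.values())
    return {r: c / total for r, c in counts.items()} if total else {}

print(relation_prior("james cameron"))
# {'movie.directed_by': 0.714..., 'movie.produced_by': 0.285...}
```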
Relational Surface Form Derivation
Web Resource Mining
◦ Bing query snippets containing entity pairs connected by specific relations in the knowledge graph (e.g., directed_by)
Dependency Parsing
◦ “Avatar is a 2009 American epic science fiction film directed by James Cameron.”
◦ entities are replaced by their knowledge graph types: Avatar → $movie, James Cameron → $director
[Dependency parse: film is the head word, with nsubj → $movie, cop → is, det → a, num → 2009, nn → american/epic/science/fiction, and vmod → directed; directed → $director via the collapsed prep_by relation]
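A minimal parsing sketch using spaCy (an assumption; the slide’s figure uses Stanford collapsed dependencies, so spaCy’s label set differs slightly and prepositions are not collapsed into prep_by):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
snippet = "Avatar is a 2009 American epic science fiction film directed by James Cameron."

# Print (dependent, relation, head) for every token; the entity mentions
# would then be replaced by their knowledge graph types ($movie, $director).
for tok in nlp(snippet):
    print(f"{tok.text:<10} --{tok.dep_}--> {tok.head.text}")
```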
Relational Surface Form Derivation
Dependency-Based Entity Embeddings
1) Word & Context Extraction
Word → Contexts
◦ $movie → film/nsubj-1
◦ is → film/cop-1
◦ a → film/det-1
◦ 2009 → film/num-1
◦ american, epic, science, fiction → film/nn-1
◦ film → $movie/nsubj, is/cop, a/det, 2009/num, american/nn, epic/nn, science/nn, fiction/nn, directed/vmod
◦ directed → $director/prep_by
◦ $director → directed/prep_by-1
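A minimal sketch of extracting such (word, context) pairs from a spaCy parse in the Levy-and-Goldberg style (an assumption: spaCy labels, no preposition collapsing, and entity substitution is only noted in a comment):

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def word_context_pairs(sentence):
    """Yield (word, context) pairs: every dependency arc head --label--> dep
    produces (head, dep/label) and (dep, head/label-1)."""
    for tok in nlp(sentence):
        if tok.dep_ == "ROOT":
            continue
        head, dep = tok.head.text.lower(), tok.text.lower()
        yield head, f"{dep}/{tok.dep_}"
        yield dep, f"{head}/{tok.dep_}-1"

# Entity mentions would be replaced by their KG types ($movie, $director)
# before training; raw tokens are kept here for brevity.
snippet = "Avatar is a 2009 American epic science fiction film directed by James Cameron."
for w, c in word_context_pairs(snippet):
    print(w, c)
```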
Relational Surface Form Derivation
Dependency-Based Entity Embeddings
2) Training Process
◦ Each word w is associated with a vector v_w and each context c is represented as a vector v_c
◦ Learn vector representations for both words and contexts such that the dot product v_w · v_c associated with good word-context pairs (w, c) belonging to the training data D is maximized
◦ Objective function: maximize Σ_{(w,c)∈D} log p(c | w), where p(c | w) = exp(v_c · v_w) / Σ_{c'} exp(v_{c'} · v_w)
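A minimal numpy sketch of this objective, trained with negative sampling over the extracted (word, context) pairs (a simplified stand-in for the full word2vec toolkit; all hyperparameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def train_embeddings(pairs, dim=50, epochs=5, lr=0.025, neg=5):
    """Skip-gram-style training over explicit (word, context) pairs:
    push v_w . v_c up for observed pairs and down for sampled negatives."""
    words = sorted({w for w, _ in pairs})
    ctxs = sorted({c for _, c in pairs})
    wi = {w: i for i, w in enumerate(words)}
    ci = {c: i for i, c in enumerate(ctxs)}
    W = (rng.random((len(words), dim)) - 0.5) / dim   # word vectors v_w
    C = np.zeros((len(ctxs), dim))                    # context vectors v_c
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    for _ in range(epochs):
        for w, c in pairs:
            targets = [(ci[c], 1.0)] + [(int(rng.integers(len(ctxs))), 0.0) for _ in range(neg)]
            grad_w = np.zeros(dim)
            for j, label in targets:
                g = lr * (label - sigmoid(W[wi[w]] @ C[j]))
                grad_w += g * C[j]
                C[j] += g * W[wi[w]]
            W[wi[w]] += grad_w
    return wi, W

# `pairs` would come from word_context_pairs() over all mined snippets;
# W[wi["$director"]] is then the embedding of the $director entity tag.
```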
Relational Surface Form Derivation
Surface Form Derivation
Entity Surface Forms
◦ learn the surface forms corresponding to entity tags ($char, $director, etc.) based on the word vectors v_w, i.e., words with similar contexts
◦ e.g., $char: “character”, “role”, “who”; $director: “director”, “filmmaker”; $genre: “action”, “fiction”
Entity Syntactic Contexts
◦ learn the important contexts of entity tags based on the context vectors v_c, i.e., words frequently occurring together with the entities
◦ e.g., $char: “played”; $director: “directed”
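A minimal sketch of the surface-form step, reusing the wi index and W matrix from the training sketch above (the entity tag is illustrative):

```python
import numpy as np

def nearest_words(tag, wi, W, k=5):
    """Return the k words whose vectors are most cosine-similar to the
    entity tag's vector; these are its derived surface forms."""
    unit = W / np.linalg.norm(W, axis=1, keepdims=True)
    sims = unit @ unit[wi[tag]]
    vocab = sorted(wi, key=wi.get)            # words in row order of W
    ranked = sorted(zip(vocab, sims), key=lambda x: -x[1])
    return [w for w, _ in ranked if w != tag][:k]

# e.g. nearest_words("$director", wi, W) might surface "director", "filmmaker", ...
```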
Probabilistic Enrichment
Integrate relations from
◦ Prior knowledge (gazetteers)
◦ Entity surface forms
◦ Entity syntactic contexts
Integrated Relations for Words by
◦ Unweighted: combine all relations with binary values
◦ Weighted: combine all relations and keep the highest weight of each relation
◦ Highest Weighted: keep only the most probable relation of each word from each source
Integrated Relations for Utterances by
◦ aggregating the word-level relation weights over the words of the utterance into R_u(r)
• Dilek Hakkani-Tür, Asli Celikyilmaz, Larry Heck, and Gokhan Tur, “Probabilistic Enrichment of Knowledge Graph Entities for Relation Detection in Conversational Understanding,” in Proceedings of Interspeech, 2014.
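A minimal sketch of the three word-level combination schemes, assuming per-source relation distributions like the P_E/P_F/P_C scores above (all numbers are illustrative):

```python
def combine(sources, mode="weighted"):
    """Combine relation scores for one word from several sources
    (e.g., gazetteer prior P_E, surface forms P_F, contexts P_C)."""
    if mode == "highest":
        # keep only each source's single most probable relation
        sources = [{max(s, key=s.get): max(s.values())} for s in sources if s]
    merged = {}
    for s in sources:
        for r, p in s.items():
            if mode == "unweighted":
                merged[r] = 1.0                          # binary presence
            else:
                merged[r] = max(merged.get(r, 0.0), p)   # keep the highest weight
    return merged

# Hypothetical per-source scores for one word.
p_e = {"movie.directed_by": 0.7, "movie.produced_by": 0.3}
p_f = {"movie.directed_by": 0.6, "director.name": 0.4}
print(combine([p_e, p_f], mode="weighted"))
```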
Bootstrapping
Unsupervised Self-Training
◦ Train a multi-label, multi-class classifier that estimates relations given an utterance
◦ Utterances with relation weights R_u(r) are converted into pseudo labels L_u(r) for training by thresholding
◦ Classifier: AdaBoost, an ensemble of M weak classifiers, which outputs a probability distribution over relations used to relabel the utterances
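A minimal self-training sketch with scikit-learn, assuming bag-of-words features and an AdaBoost base classifier in a one-vs-rest wrapper (the feature choice and the 0.5 threshold are assumptions, not from the paper):

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.multiclass import OneVsRestClassifier

def bootstrap(utterances, relation_weights, relations, iterations=3, threshold=0.5):
    """Self-training: threshold the utterance relation weights R_u(r) into
    pseudo labels L_u(r), train a multi-label classifier, then relabel the
    utterances with its predicted relation distributions."""
    X = CountVectorizer().fit_transform(utterances)
    weights = np.array([[relation_weights[u].get(r, 0.0) for r in relations]
                        for u in utterances])
    clf = None
    for _ in range(iterations):
        Y = (weights >= threshold).astype(int)          # pseudo labels L_u(r)
        clf = OneVsRestClassifier(AdaBoostClassifier(n_estimators=50))
        clf.fit(X, Y)
        weights = clf.predict_proba(X)                  # relabel for the next round
    return clf, weights
```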
Experiments of Relation Detection
Dataset
Knowledge Base: Freebase
◦ 670K entities
◦ 78 entity types (movie names, actors, etc.)
Relation Detection Data
◦ Crowd-sourced utterances
◦ Manually annotated with SPARQL queries, from which relations are derived
Query Statistics Dev Test
% entity only 8.9% 10.7%
% rel only w/ specified movie names 27.1% 27.5%
% rel only w/ specified other names 39.8% 39.6%
% more complicated relations 15.4% 14.7%
% not covered 8.8% 7.6%
#utterances 3338 1084
Example
User Utterance: who produced avatar
Relations: movie.name, movie.produced_by
Experiments of Relation Detection
All performance
Evaluation Metric: micro F-measure (%)

Approach                                      Unweighted        Weighted          Highest Weighted
                                              Ori    Bootstrap  Ori    Bootstrap  Ori    Bootstrap
Baseline
  Gazetteer                                   35.21  36.91      37.93  40.10      36.08  38.89
  Gazetteer + Weakly Supervised               25.07  37.39      39.04  39.07      39.40  39.98
  Gazetteer + Entity Surface Form (Reg)       34.23  34.91      36.57  38.13      34.69  37.16
Proposed
  Gazetteer + Entity Surface Form (Dep)       37.44  38.37      41.01  41.10      39.19  42.74
  Gazetteer + Entity Context                  35.31  37.23      38.04  38.88      37.25  38.04
  Gazetteer + Entity Surface Form + Context   37.66  38.64      40.29  41.98      40.07  43.34

Observations
◦ Words derived from dependency-based embeddings successfully capture the surface forms of entity tags, while words derived from regular embeddings do not.
◦ Words derived from entity contexts slightly improve performance.
◦ Combining all approaches performs best, and the major improvement comes from the derived entity surface forms.
◦ With the same information, learning surface forms from dependency-based embeddings performs better because of the mismatch between written and spoken language.
◦ Weighted methods perform better with fewer features, while highest-weighted methods perform better with more features.
Experiments of Relation Detection
Entity Surface Forms Derived from Dependency Embeddings
◦ The functional similarity carried by dependency-based entity embeddings effectively benefits the relation detection task.

Entity Tag    Derived Words
$character    character, role, who, girl, she, he, officer
$director     director, dir, filmmaker
$genre        comedy, drama, fantasy, cartoon, horror, sci
$language     language, spanish, english, german
$producer     producer, filmmaker, screenwriter
Experiments of Relation Detection
Effectiveness of Bootstrapping
◦ The best result is the combination of all approaches, because probabilities coming from different resources complement each other.
◦ Adding only entity surface forms performs similarly, showing that the major improvement comes from relational entity surface forms.
◦ Bootstrapping significantly improves performance in most settings.
[Figure: F-measure (0.33-0.44) over 10 bootstrapping iterations for Gaz., Gaz. + Weakly Supervised, Gaz. + Entity Surface Form (BOW), Gaz. + Entity Surface Form (Dep), Gaz. + Entity Context, and Gaz. + Entity Surface Form + Context]