Learning Semantic Hierarchy with Distributional Representations for Unsupervised Spoken Language Understanding
Yun-Nung (Vivian) Chen, William Yang Wang, and Alexander I. Rudnicky
Summary
[Figure: System framework. An unlabeled collection of utterances (e.g., “i am looking for an expensive restaurant in north”) goes through frame-semantic parsing to produce slot candidates. Slot induction combines low-level and high-level slot importance estimation through multi-level slot ranking. The induced slots and semantic representations are then used for semantic decoder training of the SLU model.]
Can a dialogue system automatically learn open-domain knowledge?
Motivation
o Dialogue systems require a predefined semantic ontology; can it be learned from data?
o A hierarchical ontology containing cross-slot information is crucial to SLU.
o Word embeddings carry robust semantics.
Approach
1) Hierarchical agglomerative clustering (HAC) learns a hierarchical ontology from FrameNet-parsed slot candidates and word embeddings.
2) Slot importance estimated at different levels of the hierarchy is integrated to induce the ontology.
3) The induced slots are used to train an SLU model.
Result
o With high-level information, the SLU model achieves a 13% relative improvement in F1.
o The learned hierarchy aligns well with the hand-crafted mapping.
Multi-Level Slot Ranking
Approach                 Slot Induction AUC (%)   SLU F1 (%)
Baseline   Low-Level     79.50                    60.27
Proposed   High-Level    81.28                    67.94
Proposed   Multi-Level   82.00                    68.13
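The 13% figure in the summary is a relative gain over the low-level baseline. A minimal check using only the reported numbers (the helper name relative_gain is hypothetical):

```python
# Relative improvements over the low-level baseline, from the reported results.
results = {
    "Low-Level": {"auc": 79.50, "f1": 60.27},
    "High-Level": {"auc": 81.28, "f1": 67.94},
    "Multi-Level": {"auc": 82.00, "f1": 68.13},
}

def relative_gain(new: float, base: float) -> float:
    """Return (new - base) / base as a percentage."""
    return 100.0 * (new - base) / base

baseline = results["Low-Level"]
for name in ("High-Level", "Multi-Level"):
    r = results[name]
    print(f"{name}: AUC +{relative_gain(r['auc'], baseline['auc']):.1f}%, "
          f"F1 +{relative_gain(r['f1'], baseline['f1']):.1f}%")
```

This prints an F1 gain of 12.7% for High-Level and 13.0% for Multi-Level, while the AUC gains are smaller (2.2% and 3.1%).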
Example parse of “can i have a cheap restaurant”:
  Frame: capability (FT LU: can; FE filler: i) → generic semantic concept (useless for SDS)
  Frame: expensiveness (FT LU: cheap) → domain-specific concept
  Frame: locale by use (FT/FE LU: restaurant) → domain-specific concept
• Frame-semantic parsing generates slot candidates (Chen et al., 2013; 2014).
o Frequency: the slot frequency in the parsed corpus; slots with higher frequency are more important.
o Coherence: the coherence of slot fillers; domain-specific concepts focus on fewer topics, so coherence can help measure slot prominence.
• Idea: rank domain-specific concepts higher than generic semantic concepts
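The poster's low-level formula did not survive extraction, so the sketch below only illustrates the idea under stated assumptions. It assumes coherence is the mean pairwise cosine similarity of filler embeddings and that it is mixed with log-frequency through a hypothetical weight alpha. The embeddings and counts are made up.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def coherence(fillers, emb):
    """Assumed coherence: mean pairwise cosine similarity of filler embeddings."""
    vecs = [emb[w] for w in fillers if w in emb]
    pairs = list(combinations(vecs, 2))
    if not pairs:
        return 0.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

def low_level_importance(freq, fillers, emb, alpha=0.5):
    """Hypothetical score: interpolate log-frequency with filler coherence.

    alpha is an assumed weight, not a value taken from the poster.
    """
    return (1 - alpha) * math.log(1 + freq) + alpha * coherence(fillers, emb)

# Toy 2-D embeddings (hypothetical, for illustration only).
emb = {
    "cheap": (1.0, 0.0), "expensive": (0.9, 0.1), "moderate": (0.95, 0.05),
    "i": (0.0, 1.0), "we": (1.0, 0.2), "you": (-0.5, 0.5),
}
# slot -> (frequency in the parsed corpus, observed fillers); counts are made up.
slots = {
    "capability": (40, ["i", "we", "you"]),
    "expensiveness": (30, ["cheap", "expensive", "moderate"]),
}
scores = {s: low_level_importance(f, fill, emb) for s, (f, fill) in slots.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['expensiveness', 'capability']
```

The generic slot "capability" is more frequent here, but its fillers are scattered in embedding space. The domain-specific slot "expensiveness" therefore ranks higher, which matches the stated idea.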
• Hierarchical Agglomerative Clustering (HAC) is a bottom-up approach that successively merges the most similar clusters.
o The distance between two clusters A and B is defined as
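[Figure: Word-level clustering of slot fillers by their embeddings (e.g., restaurant, restaurants, bar, pub, place, garden, house, southern, north, location), and slot vectors built from the word-level clusters:]
locale by use: [2, 0, 1, …]
buildings: [2, 0, 1, …]
locale: [0, 2, 0, …]
direction: [0, 2, 0, …]
part orientational: [0, 2, 0, …]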
• Idea: rank slots by considering all levels of the hierarchy
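The HAC step can be sketched as follows. The poster's cluster-distance formula is missing, so this sketch assumes average linkage over cosine distances. The toy embeddings are hypothetical.

```python
import math
from itertools import combinations

def cosine_distance(u, v):
    """1 - cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def average_linkage(a, b, emb):
    """Assumed cluster distance: mean pairwise distance between members of a and b."""
    total = sum(cosine_distance(emb[x], emb[y]) for x in a for y in b)
    return total / (len(a) * len(b))

def hac(words, emb, num_clusters):
    """Bottom-up HAC: start from singletons and repeatedly merge the closest pair."""
    clusters = [[w] for w in words]
    merges = []  # merge history, i.e. the learned hierarchy
    while len(clusters) > num_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], emb),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, merges

# Toy 2-D word embeddings (hypothetical).
emb = {
    "restaurant": (1.0, 0.1), "restaurants": (0.95, 0.12),
    "bar": (0.9, 0.2), "pub": (0.88, 0.25),
    "north": (0.1, 1.0), "southern": (0.15, 0.95),
}
clusters, merges = hac(list(emb), emb, num_clusters=2)
for c in clusters:
    print(sorted(c))
```

Stopping at two clusters separates venue words from direction words. The `merges` list records the full bottom-up hierarchy, so the same run also yields the intermediate levels.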
Experiments
• Domain: restaurant recommendation in an in-car setting in Cambridge (word error rate = 37%)
o Dialogue slots: addr, area, food, phone, postcode, price range, task, and type
Learned Semantic Hierarchy
[Figure: Learned semantic hierarchy over induced slots, including part orientational, direction, locale, range, diversity, increment, commerce scenario, expensiveness, locale by use, and building.]
[Figure: Mappings of Induced Slots (within the blocks) and Expert-Defined Semantic Slots (right sides of arrows). Expert-defined slots: addr, area, food, phone, postcode, pricerange, task, type. Induced slots: speak on topic, part orientational, direction, locale, part inner outer, origin, contacting, sending, commerce scenario, expensiveness, range, seeking, desiring, locating, locale by use, building.]
• Word-level clustering groups words with close embeddings, since such words occur in similar contexts.
• Slot-level clustering groups slots whose vectors, built from the word-level clustering results, are close.
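The slot vectors in the word-level clustering figure can be read as counts of each slot's fillers per word cluster. The sketch below assumes that reading. The cluster assignments and filler lists are hypothetical and chosen only to reproduce the vectors shown in the figure.

```python
import math

def slot_vector(fillers, word_cluster, num_clusters):
    """Count how many of a slot's fillers fall into each word-level cluster."""
    vec = [0] * num_clusters
    for w in fillers:
        if w in word_cluster:
            vec[word_cluster[w]] += 1
    return vec

def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical output of word-level clustering: word -> cluster index.
word_cluster = {
    "restaurant": 0, "house": 0, "garden": 0,
    "north": 1, "southern": 1, "location": 1,
    "pub": 2, "bar": 2,
}
# Hypothetical fillers observed for each frame-semantic slot candidate.
slot_fillers = {
    "locale by use": ["restaurant", "house", "pub"],
    "buildings": ["house", "garden", "bar"],
    "locale": ["north", "location"],
    "direction": ["north", "southern"],
    "part orientational": ["southern", "location"],
}
vectors = {s: slot_vector(f, word_cluster, 3) for s, f in slot_fillers.items()}
print(vectors["locale by use"], vectors["locale"])  # [2, 0, 1] [0, 2, 0]
print(round(cosine(vectors["locale by use"], vectors["buildings"]), 3))  # 1.0
print(round(cosine(vectors["locale by use"], vectors["locale"]), 3))  # 0.0
```

"locale by use" and "buildings" share no fillers here but get identical vectors, so slot-level clustering can merge them. This is the effect the poster attributes to similar clustering distributions.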
Conclusion
• We propose an unsupervised approach that unifies semantics from a hierarchical structure to improve slot induction and SLU modeling.
• Our automatically induced semantic slots align well with reference slots.
• We show the feasibility of training an SLU model on automatically induced slots, with promising performance for practical use.
Word embeddings help merge semantically similar words.
Different slot candidates generated by the frame-semantic parser can be merged because they share similar clustering distributions.
• Bottom-up slot importance estimation computes the high-level slot importance by aggregating the low-level importance:
The final slot importance contains hierarchical information.
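The aggregation formula is missing from the extracted text, so the following is only a sketch of bottom-up estimation under two labeled assumptions. First, a slot-level cluster's importance is the sum of its members' low-level scores. Second, the final score interpolates low-level and high-level scores with a hypothetical weight beta. All scores are made up.

```python
def high_level_importance(low, clusters):
    """Assumed aggregation: each slot receives the summed low-level score of its cluster."""
    high = {}
    for cluster in clusters:
        total = sum(low[s] for s in cluster)
        for s in cluster:
            high[s] = total
    return high

def multi_level_importance(low, clusters, beta=0.5):
    """Hypothetical interpolation of low-level and high-level scores (beta is assumed)."""
    high = high_level_importance(low, clusters)
    return {s: (1 - beta) * low[s] + beta * high[s] for s in low}

# Made-up low-level scores and slot-level clusters.
low = {"capability": 0.8, "locale by use": 0.7, "buildings": 0.6,
       "locale": 0.4, "direction": 0.3}
clusters = [["capability"], ["locale by use", "buildings"], ["locale", "direction"]]

final = multi_level_importance(low, clusters)
low_ranking = sorted(low, key=low.get, reverse=True)
ranking = sorted(final, key=final.get, reverse=True)
print(low_ranking)  # capability ranked first on low-level scores alone
print(ranking)      # locale by use and buildings overtake capability
```

Summing rewards slots that belong to large, consistent clusters. A mean would remove that size bias, and which choice fits better is an empirical question this sketch does not settle.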
High-level information improves both slot induction and SLU performance, and the learned hierarchy aligns well with the manual mapping.