Learning Semantic Hierarchy with Distributional Representations for Unsupervised Spoken Language Understanding


Yun-Nung (Vivian) Chen, William Yang Wang, and Alexander I. Rudnicky

Summary

[Figure: framework overview. Components: Unlabeled Collection, Frame-Semantic Parsing, Slot Candidates, Slot Induction (Multi-Level Slot Ranking with Low-Level and High-Level Slot Importance Estimation), Induced Slots, Semantic Decoder Training, SLU Model, Semantic Representations. Example input: “i am looking for an expensive restaurant in north”]

Can a dialogue system automatically learn open domain knowledge?


• Motivation

o Dialogue systems require a predefined semantic ontology; can it be learned from data?

o A hierarchical ontology containing cross-slot information is crucial to SLU.

o Word embeddings carry robust semantics.

• Approach

1) Hierarchical agglomerative clustering (HAC) learns a hierarchical ontology from FrameNet-parsed slot candidates and word embeddings.

2) The slot importance estimated at the different levels of the hierarchy is combined to induce the ontology.

3) The induced slots are used for training an SLU model.

• Result

o With high-level information, the SLU model achieves a 13% relative improvement in F1.

o The learned hierarchy aligns well with the hand-crafted mapping.

Multi-Level Slot Ranking

Approach                 Slot Induction AUC (%)   SLU F1 (%)
Baseline: Low-Level      79.50                    60.27
Proposed: High-Level     81.28                    67.94
Proposed: Multi-Level    82.00                    68.13
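The relative gains behind these numbers can be checked with a few lines of arithmetic. The sketch below uses only the AUC and F1 values reported above; the helper name `relative_gain` is ours and is introduced purely for illustration.

```python
def relative_gain(baseline: float, proposed: float) -> float:
    """Relative improvement of `proposed` over `baseline`, in percent."""
    return (proposed - baseline) / baseline * 100.0


# (Slot Induction AUC %, SLU F1 %) as reported in the results above.
results = {
    "Low-Level": (79.50, 60.27),
    "High-Level": (81.28, 67.94),
    "Multi-Level": (82.00, 68.13),
}

base_auc, base_f1 = results["Low-Level"]
for name in ("High-Level", "Multi-Level"):
    auc, f1 = results[name]
    print(
        f"{name}: AUC {relative_gain(base_auc, auc):+.1f}%, "
        f"F1 {relative_gain(base_f1, f1):+.1f}%"
    )
```

The multi-level F1 gain works out to roughly 13% over the low-level baseline, which matches the relative improvement quoted in the summary.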

Example parse: “can i have a cheap restaurant”

o Frame: capability; FT LU: can; FE Filler: i

o Frame: expensiveness; FT LU: cheap

o Frame: locale by use; FT/FE LU: restaurant

o Figure labels: generic semantic concept (useless for SDS) vs. domain-specific concept

• Frame-semantic parsing generates slot candidates (Chen et al., 2013; 2014)

o Slot frequency in the parsed corpus: slots with higher frequency → more important

o Coherence of slot-fillers: domain-specific concepts focus on fewer topics → coherence can help measure slot prominence

• Idea: rank domain-specific concepts higher than generic semantic concepts
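A minimal sketch of this ranking idea follows. It is not the scoring function used in this work: the linear combination with weight `alpha`, the use of average pairwise cosine similarity as the coherence measure, and all embeddings, frequencies, and fillers are assumptions made up for illustration.

```python
import math
from itertools import combinations

# Hypothetical 2-d embeddings; real ones would come from a trained model.
EMB = {
    "cheap": (0.9, 0.1), "expensive": (0.8, 0.2), "moderate": (0.85, 0.15),
    "i": (0.1, 0.9), "system": (0.9, -0.1), "that": (0.5, 0.5),
}

# Hypothetical slot candidates: corpus frequency and observed fillers.
CANDIDATES = {
    "expensiveness": {"freq": 30, "fillers": ["cheap", "expensive", "moderate"]},
    "capability": {"freq": 40, "fillers": ["i", "system", "that"]},
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def coherence(fillers, emb):
    """Average pairwise cosine similarity of a slot's filler embeddings."""
    pairs = list(combinations(fillers, 2))
    if not pairs:
        return 0.0  # a single filler gives no evidence of coherence
    return sum(cosine(emb[a], emb[b]) for a, b in pairs) / len(pairs)


def rank_slots(candidates, emb, alpha=0.5):
    """Rank slots by an assumed mix of normalized frequency and coherence."""
    max_freq = max(c["freq"] for c in candidates.values())
    scores = {
        slot: (1 - alpha) * c["freq"] / max_freq
        + alpha * coherence(c["fillers"], emb)
        for slot, c in candidates.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranked = rank_slots(CANDIDATES, EMB)
for slot, score in ranked:
    print(f"{slot}: {score:.3f}")
```

With these toy values, the more frequent but less coherent capability candidate ends up below expensiveness, which is the intended effect of ranking domain-specific concepts above generic ones.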

• Hierarchical Agglomerative Clustering (HAC) clusters bottom-up by successively merging the most similar clusters.

o The distance between two clusters A and B is defined as

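[Figure: word-level clustering of embeddings. Words shown: restaurant, restaurants, bar, pub, place, garden, house, southern, north, location.]

Slot vectors built from the word-level clusters:

o locale by use: [2, 0, 1, …]

o buildings: [2, 0, 1, …]

o locale: [0, 2, 0, …]

o direction: [0, 2, 0, …]

o part orientational: [0, 2, 0, …]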

• Idea: rank slots considering all different levels of the hierarchy

Experiments

• Domain: restaurant recommendation in an in-car setting in Cambridge (Word Error Rate = 37%)

o Dialogue slots: addr, area, food, phone, postcode, price range, task, and type

Mappings of Induced Slots (left) and Expert-Defined Semantic Slots (right sides of arrows):

o speak on topic → addr

o part orientational, direction, locale, part inner outer → area

o food, origin → food

o contacting → phone

o sending → postcode

o commerce scenario, expensiveness, range → price range

o seeking, desiring, locating → task

o locale by use, building → type

[Figure: Learned Semantic Hierarchy. Slots shown: part orientational, direction, locale, range, diversity, increment, commerce scenario, expensiveness, locale by use, building.]

• Word-level clustering groups the words with closer embeddings since they have similar contexts

• Slot-level clustering groups the slots with closer vectors built by the word-level clustering results
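The two clustering levels can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the implementation used in this work: the cluster distance is taken to be average linkage over Euclidean distances, slot vectors are taken to be counts of each slot's fillers per word cluster, and the embeddings, fillers, and function names are hypothetical.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(A, B, vecs):
    """Mean pairwise distance between members of clusters A and B (assumed)."""
    total = sum(euclidean(vecs[a], vecs[b]) for a in A for b in B)
    return total / (len(A) * len(B))


def hac(vecs, num_clusters):
    """Start from singletons and merge the closest pair until num_clusters remain."""
    clusters = [[name] for name in vecs]
    while len(clusters) > num_clusters:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]], vecs),
        )
        clusters[i] = clusters[i] + clusters[j]  # j > i, so index i stays valid
        del clusters[j]
    return clusters


def slot_vector(fillers, word_clusters):
    """Count how many of a slot's fillers fall into each word cluster."""
    return tuple(sum(1 for w in fillers if w in c) for c in word_clusters)


# Hypothetical 2-d word embeddings.
word_emb = {
    "restaurant": (1.0, 0.0), "bar": (0.9, 0.1), "pub": (0.95, 0.05),
    "north": (0.0, 1.0), "southern": (0.1, 0.9),
}
# Hypothetical fillers observed for each slot candidate.
slot_fillers = {
    "locale by use": ["restaurant", "pub"],
    "building": ["bar", "pub"],
    "direction": ["north", "southern"],
    "part orientational": ["north", "southern"],
}

word_clusters = hac(word_emb, num_clusters=2)                 # word level
slot_vecs = {s: slot_vector(f, word_clusters) for s, f in slot_fillers.items()}
slot_clusters = hac(slot_vecs, num_clusters=2)                # slot level
print(word_clusters)
print(slot_vecs)
print(slot_clusters)
```

In this toy setup the venue words and the direction words form separate word clusters, so slots whose fillers fall in the same word cluster receive identical count vectors and are merged at the slot level.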

Conclusion

• We propose an unsupervised approach unifying semantics from a hierarchical structure to improve slot induction and SLU modeling.

• Our automatically induced semantic slots align well with reference slots.

• We show the feasibility of training an SLU model based on automatically induced slots and its promising performance for practical usage.

Word embeddings help merge semantically similar words together.

Different slot candidates generated by the frame-semantic parser can be merged because they share similar clustering distributions.

• Bottom-up slot importance estimation estimates the high-level slot importance by aggregating the low-level importance.

The final slot importance contains hierarchical information.
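One way to picture the bottom-up aggregation is sketched below. The aggregation and combination rules are assumptions, since the exact formula is not reproduced here: the high-level importance of a cluster is taken as the mean of its member slots' low-level scores, and the final score mixes both levels with a hypothetical weight `beta`. All scores and cluster names are made up.

```python
def high_level_importance(clusters, low):
    """Aggregate low-level scores bottom-up (assumed: mean over member slots)."""
    return {
        cid: sum(low[s] for s in members) / len(members)
        for cid, members in clusters.items()
    }


def final_importance(clusters, low, beta=0.5):
    """Mix each slot's own score with its cluster's score (assumed linear mix)."""
    high = high_level_importance(clusters, low)
    return {
        slot: (1 - beta) * low[slot] + beta * high[cid]
        for cid, members in clusters.items()
        for slot in members
    }


# Hypothetical low-level importance scores and slot clusters.
low = {"locale by use": 0.9, "building": 0.5, "capability": 0.55}
clusters = {
    "venue": ["locale by use", "building"],
    "generic": ["capability"],
}

final = final_importance(clusters, low)
for slot, score in sorted(final.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{slot}: {score:.2f}")
```

In this toy example, building has weaker low-level evidence than capability but is lifted above it by the strong score of its cluster, which illustrates how hierarchical information can change the ranking.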


High-level information helps both slot induction and SLU performance, where the learned hierarchy aligns well with the manual mapping.