Unsupervised Learning and Modeling of
Knowledge and Intent for Spoken Dialogue Systems
YUN-NUNG (VIVIAN) CHEN
HTTP://VIVIANCHEN.IDV.TW
APRIL 16TH, 2015 @ NEW YORK UNIVERSITY
Outline
Introduction
Ontology Induction [ASRU’13, SLT’14a]
Structure Learning [NAACL-HLT’15]
Semantic Decoding (submitted)
Conclusions
A Popular Robot - Baymax
Big Hero 6 -- Video content owned and licensed by Disney Entertainment, Marvel Entertainment, LLC, etc
A Popular Robot - Baymax
Baymax is capable of maintaining good spoken dialogues with people and learning new knowledge for better understanding and interaction.
The goal is to automate learning and understanding
procedures in system development.
Spoken Dialogue System (SDS)
Spoken dialogue systems are intelligent agents that help users finish tasks more efficiently via speech interaction.
Spoken dialogue systems are being incorporated into various devices (smartphones, smart TVs, in-car navigation systems, etc.).
Apple’s Siri
Microsoft’s Cortana
Amazon’s Echo
Samsung’s SMART TV
Google Now
https://www.apple.com/ios/siri/
http://www.windowsphone.com/en-us/how-to/wp8/cortana/meet-cortana http://www.amazon.com/oc/echo/
http://www.samsung.com/us/experience/smart-tv/
Large Smart Device Population
The number of global smartphone users will surpass 2 billion in 2016.
As of 2012, there were 1.1 billion automobiles on Earth.
The input of these devices is evolving towards speech, which is more natural and convenient.
Knowledge Representation/Ontology
Traditional SDSs require manual annotations for specific domains to represent domain knowledge.
(Figure) Restaurant Domain: restaurant with slots type, price, location (relation: located_in)
(Figure) Movie Domain: movie with slots year, genre, director (relations: directed_by, released_in)
Node: semantic concept/slot; Edge: relation between concepts
Utterance Semantic Representation
A spoken language understanding (SLU) component requires the domain ontology to decode utterances into semantic forms, which contain the core content of the utterance (a set of slots and slot-fillers).
find a cheap taiwanese restaurant in oakland
show me action movies directed by james cameron
target=“restaurant”, price=“cheap”, type=“taiwanese”, location=“oakland”
target=“movie”, genre=“action”, director=“james cameron”
(Figure: the restaurant-domain and movie-domain ontologies shown earlier)
Challenges for SDS
An SDS in a new domain requires
1) A hand-crafted domain ontology
2) Utterances labelled with semantic representations
3) An SLU component for mapping utterances into semantic representations
As spoken interactions increase, building domain ontologies and annotating utterances cost a lot, so the approach does not scale.
The goal is to enable an SDS to automatically learn this
knowledge so that open domain requests can be handled.
Questions to Address
1) Given unlabelled raw audio recordings, how can a system automatically induce and organize domain-specific concepts?
2) With the automatically acquired knowledge, how can a system
understand individual utterances?
Interaction Example
find a cheap restaurant for asian food
User
Intelligent Agent
Q: How does a dialogue system process this request?
Cheap Asian restaurants include Kelly & Ping, Saigon Shack, etc.
What do you want to choose?
SDS Process – Available Domain Ontology
find a cheap restaurant for asian food
User
(Organized domain knowledge graph: seeking, target, price, food; relations: NN, AMOD, PREP_FOR)
Intelligent Agent
Ontology Induction (semantic slot)
Structure Learning (inter-slot relation)
SDS Process – Spoken Language Understanding (SLU)
find a cheap restaurant for asian food
User
(Organized domain knowledge graph: seeking, target, price, food; relations: NN, AMOD, PREP_FOR)
Intelligent Agent
seeking=“find”
target=“restaurant”
price=“cheap”
food=“asian food”
Semantic Decoding
SDS Process – Dialogue Management (DM)
find a cheap restaurant for asian food
User
(Organized domain knowledge graph: seeking, target, price, food; relations: NN, AMOD, PREP_FOR)
SELECT restaurant {
  restaurant.price=“cheap”
  restaurant.food=“asian food”
}
Intelligent Agent
SDS Process – Dialogue Management (DM)
find a cheap restaurant for asian food
User
SELECT restaurant {
  restaurant.price=“cheap”
  restaurant.food=“asian food”
}
Results: Kelly & Ping, Saigon Shack, …
Intelligent Agent
SDS Process – Natural Language Generation (NLG)
find a cheap restaurant for asian food
User
Intelligent Agent
Cheap Asian restaurants include Kelly & Ping, Saigon Shack, etc.
What do you want to choose?
Goals
find a cheap eating place for asian food
User
Required Domain-Specific Information:
(Domain ontology graph: seeking, target, price, food; relations: NN, AMOD, PREP_FOR)
SELECT restaurant {
  restaurant.price=“cheap”
  restaurant.food=“asian food”
}
Goals
find a cheap restaurant for asian food
User
(Domain ontology graph: seeking, target, price, food; relations: NN, AMOD, PREP_FOR)
SELECT restaurant {
  restaurant.price=“cheap”
  restaurant.food=“asian food”
}
Ontology Induction (semantic slot)
Structure Learning (inter-slot relation)
Semantic Decoding
Goals
find a cheap restaurant for asian food
User
Ontology Induction
Structure Learning
Semantic Decoding
Knowledge Acquisition
1) Given unlabelled raw audio recordings, how can a system automatically induce and organize domain-specific concepts?
Restaurant Asking Conversations (unlabelled collection)
(Organized domain knowledge graph: seeking, target, price, food, quantity; relations: NN, AMOD, PREP_FOR)
Knowledge Acquisition = Ontology Induction + Structure Learning
SLU Modeling
2) With the automatically acquired knowledge, how can a system understand individual utterances?
“can i have a cheap restaurant” → SLU Component (with the organized domain knowledge) → price=“cheap”, target=“restaurant”
SLU Modeling = Semantic Decoding
Outline
Introduction
Ontology Induction [ASRU’13, SLT’14a]
Structure Learning [NAACL-HLT’15]
Semantic Decoding (submitted)
Conclusions
Ontology Induction [ASRU’13, SLT’14a]
Input: unlabelled user utterances
Output: slots that are useful for a domain-specific SDS
Y.-N. Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
Restaurant Asking Conversations (unlabelled collection) → domain-specific slots (seeking, target, price, food, quantity)
Step 1: Frame-semantic parsing on all utterances for creating slot candidates
Step 2: Slot ranking model for differentiating domain-specific concepts from generic concepts
Step 3: Slot selection
Probabilistic Frame-Semantic Parsing
FrameNet [Baker et al., 1998]
◦ a linguistic semantic resource based on the frame-semantics theory
◦ e.g., in “low fat milk”, “milk” evokes the “food” frame and “low fat” fills the descriptor frame element
SEMAFOR [Das et al., 2014]
◦ a state-of-the-art frame-semantics parser, trained on manually annotated FrameNet sentences
Baker et al., " The berkeley framenet project," in Proc. of International Conference on Computational linguistics, 1998.
Step 1: Frame-Semantic Parsing for Utterances
can i have a cheap restaurant
Frame: capability (FT LU: can; FE LU: i) → ?
Frame: expensiveness (FT LU: cheap) → good!
Frame: locale_by_use (FT/FE LU: restaurant) → good!
Task: adapting generic frames to domain-specific settings for SDSs
FT: Frame Target; FE: Frame Element; LU: Lexical Unit
Step 2: Slot Ranking Model
Main Idea: rank domain-specific concepts higher than generic semantic concepts
can i have a cheap restaurant
Frame: capability (FT LU: can; FE LU: i)
Frame: expensiveness (FT LU: cheap)
Frame: locale_by_use (FT/FE LU: restaurant)
The frame name is the slot candidate; the corresponding lexical unit is the slot filler.
Step 2: Slot Ranking Model
Rank a slot candidate s by integrating two scores:
◦ slot frequency in the domain-specific conversations (slots with higher frequency are more important)
◦ semantic coherence of the slot fillers (domain-specific concepts cover fewer topics, hence more coherent fillers)
Step 2: Slot Ranking Model
h(s): semantic coherence of the slot-filler set corresponding to s, measured by the cosine similarity between the fillers' word embeddings
Example: the slot “quantity” (fillers: a, one, all, three) has lower coherence in the topic space; the slot “expensiveness” (fillers: cheap, expensive, inexpensive) has higher coherence
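A minimal sketch of this ranking idea in Python, assuming h(s) is the average pairwise cosine similarity of the filler embeddings and that frequency and coherence are combined by simple linear interpolation (the exact weighting used in the thesis may differ):

```python
import numpy as np

def coherence(filler_vecs):
    """h(s): average pairwise cosine similarity among the slot-filler embeddings."""
    vecs = [v / np.linalg.norm(v) for v in filler_vecs]
    sims = [float(np.dot(a, b)) for i, a in enumerate(vecs) for b in vecs[i + 1:]]
    return float(np.mean(sims)) if sims else 0.0

def slot_score(freq, filler_vecs, total_count, alpha=0.5):
    """Integrate normalized slot frequency with semantic coherence (assumed interpolation)."""
    f = freq / total_count          # how often the slot candidate appears in the domain data
    h = coherence(filler_vecs)      # how semantically coherent its fillers are
    return (1 - alpha) * f + alpha * h
```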
Step 3: Slot Selection
Rank all slot candidates by their importance scores
Output slot candidates with higher scores based on a threshold
Example scores (frequency + semantic coherence): locale_by_use 0.89, capability 0.68, food 0.64, expensiveness 0.49, quantity 0.30, seeking 0.22, …
Selected slots: locale_by_use, food, expensiveness, capability, quantity
Experiments of Ontology Induction
Dataset
◦ Cambridge University SLU corpus [Henderson, 2012]
◦ Restaurant recommendation in an in-car setting in Cambridge
◦ WER = 37%
◦ vocabulary size = 1868
◦ 2,166 dialogues
◦ 15,453 utterances
◦ dialogue slots: addr, area, food, name, phone, postcode, price range, task, type
The mapping table between induced and reference slots
Henderson et al., "Discriminative spoken language understanding using word confusion networks," in Proc. of SLT, 2012.
Experiments of Ontology Induction
◦ Slot Induction Evaluation: Average Precision (AP) and Area Under the Precision-Recall Curve (AUC) of the slot ranking model, measuring the quality of induced slots via the mapping table

Approach                 | ASR AP (%)    | ASR AUC (%)   | Manual AP (%) | Manual AUC (%)
Baseline: MLE            | 56.7          | 54.7          | 53.0          | 50.8
MLE + Semantic Coherence | 71.7 (+26.5%) | 70.4 (+28.7%) | 74.4 (+40.4%) | 73.6 (+44.9%)

Induced slots reach about 70% AP and align well with the human-annotated slots for the SDS.
Semantic relations help decide domain-specific knowledge.
Outline
Introduction
Ontology Induction [ASRU’13, SLT’14a]
Structure Learning [NAACL-HLT’15]
Semantic Decoding (submitted)
Conclusions
Structure Learning [NAACL-HLT’15]
Input: unlabelled user utterances
Output: slots with relations
Step 1: Construct a graph to represent slots, words, and relations
Step 2: Compute scores for edges (relations) and nodes (slots)
Step 3: Identify important relations connecting important slot pairs
Restaurant Asking Conversations (unlabelled collection) → domain-specific ontology
(Ontology graph: locale_by_use, food, expensiveness, seeking, relational_quantity, desiring; relations: NN, AMOD, DOBJ, PREP_FOR)
Step 1: Knowledge Graph Construction
Syntactic dependency parsing on utterances
Example: can i have a cheap restaurant (dependencies: nsubj, ccomp, det, amod, dobj; induced slots: capability, expensiveness, locale_by_use)
Word-based lexical knowledge graph: word nodes w (can, i, have, a, cheap, restaurant)
Slot-based semantic knowledge graph: slot nodes s (capability, expensiveness, locale_by_use)
Step 1: Knowledge Graph Construction
The edge between a node pair is weighted by relation importance.
(Word-based lexical knowledge graph over the word nodes; slot-based semantic knowledge graph over the slot nodes)
How are the weights that represent relation importance decided?
→ dependency-based word embeddings and dependency-based slot embeddings
Step 2: Weight Measurement
Slot/Word Embeddings Training
Word embeddings: can = [0.8 … 0.24], have = [0.3 … 0.21], …
Slot embeddings: expensiveness = [0.12 … 0.7], capability = [0.3 … 0.6], …
Both are trained on the dependency-parsed utterances, at the word level and at the slot level (with frame-evoking words replaced by their slots), as illustrated by the parsed example can i have a cheap restaurant (nsubj, ccomp, det, amod, dobj).
Step 2: Weight Measurement
Compute edge weights to represent relation importance:
◦ Slot-to-slot relation L_ss: similarity between slot embeddings
◦ Word-to-slot relation L_ws or L_sw: frequency of the slot-word pair
◦ Word-to-word relation L_ww: similarity between word embeddings
(Figure: word-layer nodes w1–w7 and slot-layer nodes s1–s3, connected by L_ww, L_ws/L_sw, and L_ss edges)
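A small sketch of how the three weight matrices could be assembled, assuming cosine similarity for the embedding-based edges and raw co-occurrence counts for the word-slot edges (the exact normalization in the thesis may differ):

```python
import numpy as np

def cosine_matrix(E):
    """Pairwise cosine similarities between the row-vector embeddings in E."""
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    return E @ E.T

def build_edge_weights(word_emb, slot_emb, word_slot_counts):
    L_ww = cosine_matrix(word_emb)   # word-to-word: similarity between word embeddings
    L_ss = cosine_matrix(slot_emb)   # slot-to-slot: similarity between slot embeddings
    L_ws = word_slot_counts          # word-to-slot: frequency of the slot-word pair
    return L_ww, L_ws, L_ss
```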
Step 2: Slot Importance by Random Walk
Scores are propagated from the word layer and then propagated within the slot layer.
Assumption: slots with more dependencies to more important slots should themselves be more important.
The random walk algorithm computes an importance score for each slot, starting from its original frequency score.
(Figure: the same two-layer graph; scores propagate over the L_ww, L_ws/L_sw, and L_ss edges)
Converged scores can be obtained by a closed-form solution.
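As a sketch, the converged importance vector can also be computed by power iteration once a propagation matrix M has been assembled from the L matrices above; the slide does not show how M is built, so its construction is left abstract here:

```python
import numpy as np

def dominant_eigenvector(M, iters=200, tol=1e-9):
    """Power iteration: returns the converged (dominant-eigenvector) score vector of M."""
    r = np.ones(M.shape[0]) / M.shape[0]   # start from uniform slot scores
    for _ in range(iters):
        r_new = M @ r                      # one propagation step over the graph
        r_new /= np.abs(r_new).sum()       # keep the scores normalized
        if np.abs(r_new - r).sum() < tol:  # stop once the scores have converged
            return r_new
        r = r_new
    return r
```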
The converged slot importance indicates whether a slot is important.
Rank slot pairs by summing their converged slot importance, and select the slot pairs whose scores exceed a threshold.
Step 3: Identify Domain Slots w/ Relations
Slot pairs are scored by summing converged importance, e.g., (s1, s2): r_s(1) + r_s(2); (s3, s4): r_s(3) + r_s(4); …
(Resulting ontology graph: locale_by_use, food, expensiveness, seeking, relational_quantity, desiring; relations: NN, AMOD, DOBJ, PREP_FOR)
Experiment 1 evaluates the slot importance (nodes); Experiment 2 evaluates the discovered relations (edges).
Experiments for Structure Learning
Experiment 1: Quality of Slot Importance
Dataset: Cambridge University SLU Corpus
Approach                               | ASR AP (%)    | ASR AUC (%)   | Manual AP (%) | Manual AUC (%)
Baseline: MLE                          | 56.7          | 54.7          | 53.0          | 50.8
Random Walk: MLE + Dependent Relations | 69.0 (+21.8%) | 68.5 (+24.8%) | 75.2 (+41.8%) | 74.5 (+46.7%)
Dependent relations help decide domain-specific knowledge.
Experiments for Structure Learning
Experiment 2: Relation Discovery Evaluation
Discover inter-slot relations connecting important slot pairs, and compare them with the reference ontology annotated with the most frequent syntactic dependencies.
(Learned ontology: locale_by_use, food, expensiveness, seeking, relational_quantity, desiring; relations: NN, AMOD, DOBJ, PREP_FOR)
(Reference ontology: type, food, pricerange, task, area; relations: AMOD, DOBJ, PREP_IN)
The automatically learned domain ontology aligns well with the reference one.
Outline
Introduction
Ontology Induction [ASRU’13, SLT’14a]
Structure Learning [NAACL-HLT’15]
Semantic Decoding (submitted)
Conclusions
Knowledge Acquisition
Restaurant Asking Conversations (unlabelled collection) → Organized Domain Knowledge
Outline
Introduction
Ontology Induction [ASRU’13, SLT’14a]
Structure Learning [NAACL-HLT’15]
Semantic Decoding (submitted)
Conclusions
SLU Modeling by Matrix Factorization
(Architecture figure: the unlabeled collection goes through frame-semantic parsing for ontology induction and through structure learning to build the lexical and semantic knowledge graphs; a feature model (Fw, Fs) and a knowledge graph propagation model (Rw, Rs, combining a word relation model and a slot relation model) feed the SLU model, which maps “can I have a cheap restaurant” to its semantic representation.)
Semantic Decoding
Input: user utterances, automatically learned knowledge
Output: the semantic concepts included in each individual utterance
(Ontology induction and structure learning feed the SLU model with the features Fw and Fs.)
Matrix Factorization (MF)
Feature Model
(Matrix figure: rows are utterances, columns are word observations (e.g., cheap, restaurant) and slot candidates (e.g., expensiveness, food, locale_by_use). Training utterances such as “i would like a cheap restaurant” (Utterance 1) and “find a restaurant with chinese food” (Utterance 2) contribute observed 1s; for the test utterance “show me a list of cheap restaurants”, the missing cells, including the hidden semantics from slot induction, are filled with estimated probabilities such as .90, .97, .85, .95.)
Matrix Factorization (MF)
Knowledge Graph Propagation Model
Reasoning with Matrix Factorization
Word Relation Model and Slot Relation Model: the word relation matrix and the slot relation matrix propagate relational knowledge through the feature matrix.
(Matrix figure: the same word-observation / slot-candidate matrix as above, with observed 1s for the training utterances and estimated probabilities for the test utterance.)
The MF method completes a partially-missing matrix based on the latent semantics by decomposing it into the product of two matrices.
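A minimal, illustrative sketch of matrix completion by factorization; it uses a squared loss on the observed cells for brevity, whereas the model in this work optimizes the BPR objective described next:

```python
import numpy as np

def complete_matrix(M, mask, rank=8, lr=0.05, reg=0.01, epochs=200, seed=0):
    """Approximate M with U @ V.T using only the observed cells (mask == 1)."""
    rng = np.random.default_rng(seed)
    n, m = M.shape
    U = 0.1 * rng.standard_normal((n, rank))   # latent vectors for utterances
    V = 0.1 * rng.standard_normal((m, rank))   # latent vectors for words/slots
    for _ in range(epochs):
        E = mask * (M - U @ V.T)               # reconstruction error on observed cells only
        U += lr * (E @ V - reg * U)
        V += lr * (E.T @ U - reg * V)
    return U @ V.T                             # completed matrix: scores for every cell
```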
Bayesian Personalized Ranking for MF
Model implicit feedback
◦ do not treat unobserved facts as negative samples (their truth value is unknown)
◦ give observed facts higher scores than unobserved facts
Objective: for each utterance u, rank every observed fact f+ above every unobserved fact f-, i.e., maximize Σ_u Σ_{f+} Σ_{f-} ln σ( x_{u,f+} - x_{u,f-} )
The objective is to learn a set of well-ranked semantic slots per utterance.
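A sketch of a single stochastic BPR update for the factorized model; the learning rate, regularization, and sampling scheme here are assumptions for illustration, not the settings used in the thesis:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bpr_step(U, V, u, f_pos, f_neg, lr=0.05, reg=0.01):
    """Push the observed fact f_pos above the unobserved fact f_neg for utterance u."""
    u_vec = U[u].copy()
    diff = V[f_pos] - V[f_neg]
    g = sigmoid(-(u_vec @ diff))               # gradient weight of ln sigma(x_u,f+ - x_u,f-)
    U[u]     += lr * (g * diff  - reg * U[u])
    V[f_pos] += lr * (g * u_vec - reg * V[f_pos])
    V[f_neg] += lr * (-g * u_vec - reg * V[f_neg])
```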
Experiments of Semantic Decoding
Experiment 1: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: Mean Average Precision (MAP) of all estimated slot probabilities for each utterance
Approach                                        | ASR           | Manual
Baseline: Logistic Regression                   | 34.0          | 38.8
Random                                          | 22.5          | 25.1
Majority Class                                  | 32.9          | 38.4
MF: Feature Model                               | 37.6          | 45.3
MF: Feature Model + Knowledge Graph Propagation | 43.5 (+27.9%) | 53.4 (+37.6%)
The MF approach effectively models hidden semantics to improve SLU.
Adding a knowledge graph propagation model further improves the results.
Experiments of Semantic Decoding
Experiment 2: Effectiveness of Relations
Dataset: Cambridge University SLU Corpus
Metric: Mean Average Precision (MAP) of all estimated slot probabilities for each utterance
Both semantic and dependent relations are useful to infer hidden semantics.
Approach                                     | ASR           | Manual
Feature Model                                | 37.6          | 45.3
Feature + KG Propagation: Semantic Relation  | 41.4 (+10.1%) | 51.6 (+13.9%)
Feature + KG Propagation: Dependent Relation | 41.6 (+10.6%) | 49.0 (+8.2%)
Feature + KG Propagation: Both               | 43.5 (+15.7%) | 53.4 (+17.9%)
Combining both types of relations further improves the performance.
Outline
Introduction
Ontology Induction [ASRU’13, SLT’14a]
Structure Learning [NAACL-HLT’15]
Semantic Decoding (submitted)
Conclusions
SLU Modeling
“can i have a cheap restaurant” → SLU Model (with the organized domain knowledge) → price=“cheap”, target=“restaurant”
Outline
Introduction
Ontology Induction [ASRU’13, SLT’14a]
Structure Learning [NAACL-HLT’15]
Semantic Decoding (submitted)
Conclusions
Summary
Knowledge Acquisition
◦ Ontology Induction: semantic relations are useful
◦ Structure Learning: dependent relations are useful
SLU Modeling
◦ Semantic Decoding: the MF approach builds an SLU model to decode semantics
Conclusions
The knowledge acquisition procedure enables systems to automatically learn open domain knowledge and produce domain-specific ontologies.
The MF technique for SLU modeling provides a principled model that unifies the automatically acquired knowledge and allows systems to consider implicit semantics for better understanding.
The work shows the feasibility and the potential of improving
generalization, maintenance, efficiency, and scalability of SDSs.
Q & A
THANKS FOR YOUR ATTENTION!!
Word Embeddings
Training Process
◦ Each word w is associated with a vector
◦ The contexts within the window size c are considered as the training data D
◦ Objective function:
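The formula itself is not reproduced on the slide; for the CBOW model of Mikolov et al. (2013) shown below, the training objective is to maximize the average log-probability of each word given its context window:

```latex
\frac{1}{T}\sum_{t=1}^{T} \log p\big(w_t \mid w_{t-c},\ldots,w_{t-1},w_{t+1},\ldots,w_{t+c}\big)
```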
[back]
(Figure: CBOW model; the projection layer sums the context words w_{t-2}, w_{t-1}, w_{t+1}, w_{t+2} to predict the output word w_t)
Mikolov et al., " Efficient Estimation of Word Representations in Vector Space," in Proc. of ICLR, 2013.
Mikolov et al., " Distributed Representations of Words and Phrases and their Compositionality," in Proc. of NIPS, 2013.
Dependency-Based Embeddings
Word & Context Extraction
Example: can i have a cheap restaurant (dependencies: nsubj, ccomp, det, amod, dobj)

Word       | Contexts
can        | have/ccomp
i          | have/nsubj^-1
have       | can/ccomp^-1, i/nsubj, restaurant/dobj
a          | restaurant/det^-1
cheap      | restaurant/amod^-1
restaurant | have/dobj^-1, a/det, cheap/amod
Dependency-Based Embeddings
Training Process
◦ Each word w is associated with a vector v_w and each context c is represented as a vector v_c
◦ Learn vector representations for both words and contexts such that the dot product v_w · v_c associated with good word-context pairs belonging to the training data D is maximized
◦ Objective function:
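The slide leaves the formula out; assuming the standard negative-sampling objective used for dependency-based embeddings (Levy and Goldberg, 2014), training maximizes

```latex
\sum_{(w,c)\in D} \Big( \log \sigma(v_w \cdot v_c) \;+\; \sum_{i=1}^{k} \mathbb{E}_{c_N \sim P_D}\big[ \log \sigma(-\,v_w \cdot v_{c_N}) \big] \Big)
```

where k negative contexts c_N are sampled from the context distribution P_D.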
[back]
Evaluation Metrics
◦ Slot Induction Evaluation: Average Precision (AP) and Area Under the Precision-Recall Curve (AUC) of the slot ranking model, measuring the quality of induced slots via the mapping table
[back]
Example ranking: 1. locale_by_use 0.89, 2. capability 0.68, 3. food 0.64, 4. expensiveness 0.49, 5. quantity 0.30
Precision at each rank: 1, 0, 2/3, 3/4, 0
AP = 80.56%
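Here only locale_by_use, food, and expensiveness map to reference slots (the zeros mark capability and quantity), so AP averages the precision at the three relevant ranks:

```latex
\mathrm{AP} = \frac{1}{3}\left(\frac{1}{1} + \frac{2}{3} + \frac{3}{4}\right) \approx 80.56\%
```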
Slot Induction on ASR & Manual Results
The slot importance by frequency:
ASR:    1. locale_by_use 0.88  2. expensiveness 0.78  3. food 0.64  4. capability 0.59  5. quantity 0.44
Manual: 1. locale_by_use 0.89  2. capability 0.68  3. food 0.64  4. expensiveness 0.49  5. quantity 0.30
Users tend to speak important information more clearly, so misrecognition of less important slots may slightly benefit the slot induction performance.
[back]
Slot Mapping Table
(Table: induced slots, e.g., “origin” and “food”, with their fillers (asian, japan, beer, noodle, …) across utterances u_1 … u_n, mapped to the reference slot “food”)
Create the mapping if the slot fillers of an induced slot are included by the reference slot.
induced slots → reference slot
[back]
Random Walk Algorithm
The converged algorithm satisfies
The derived closed form solution is the dominant eigenvector of M
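The equation is omitted on the slide; a formulation consistent with the statement (an assumption here) is that the converged score vector r* is a fixed point of the propagation matrix, which is exactly the dominant-eigenvector condition:

```latex
r^{*} = M r^{*}
```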
[back]
SEMAFOR Performance
The SEMAFOR evaluation
[back]
Matrix Factorization
The decomposed matrices represent latent semantics for utterances and words/slots respectively
The product of two matrices fills the probability of hidden semantics
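In symbols (the notation here is assumed, not taken from the slide): with utterance latent vectors as rows of U and word/slot latent vectors as rows of V, the matrix is completed as

```latex
M \approx U V^{\top}, \qquad \hat{p}(y \mid x) = \sigma\big(u_x \cdot v_y\big)
```

where x indexes an utterance and y a word or slot.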
[back]
(Matrix figure: the word-observation / slot-candidate matrix, with observed 1s for the training utterances and estimated probabilities such as .90, .97, .85, .95 for the test utterance)