Course Logistics
Outline
Introduction and Background
Modular Dialogue System
Spoken/Natural Language Understanding (SLU/NLU)
Dialogue State Tracking (DST)
Dialogue Policy
Natural Language Generation (NLG)
End-to-End Learning for Dialogue Systems
Conclusion
Part I
Introduction & Background
Language Empowering Intelligent Assistants
Apple Siri (2011), Google Now (2012), Microsoft Cortana (2014)
Dialogue System
Task-Oriented
Personal assistant, achieve a certain task
Combination of rules and statistical components
POMDP for spoken dialog systems (Williams and Young, 2007)
Learning End-to-End Goal-Oriented Dialog (Bordes and Weston, 2016)
An End-to-End Trainable Task-oriented Dialogue System (Wen et al., 2016)
Chit-Chat
No specific goal, focus on conversation flow
Work using variants of seq2seq model
A Neural Conversation Model (Vinyals and Le, 2015)
Deep Reinforcement Learning for Dialogue Generation (Li et al., 2016)
Conversational Contextual Cues: The Case of Personalization & History for Response Ranking (Al-Rfou et al., 2016)
Pipelined Task-Oriented Dialogue System
Speech Recognition
Language Understanding (LU)
• Domain Identification
• User Intent Detection
• Slot Filling
Dialogue Management (DM)
• Dialogue State Tracking (DST)
• Dialogue Policy
Natural Language Generation (NLG)
Speech Signal or Text Input: Are there any action movies to see this weekend?
Hypothesis: are there any action movies to see this weekend
Semantic Frame: request_movie (genre=action, date=this weekend)
Text response
Part II
Modular Dialogue System
Semantic Frame Representation
Requires a domain ontology
Contains core content (intent, a set of slots with fillers)
Restaurant Domain
  Utterance: find a cheap taiwanese restaurant in oakland
  Frame: find_restaurant (price=“cheap”, type=“taiwanese”, location=“oakland”)
  Ontology: restaurant: price, type, location
Movie Domain
  Utterance: show me action movies directed by james cameron
  Frame: find_movie (genre=“action”, director=“james cameron”)
  Ontology: movie: year, genre, director
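A semantic frame and its ontology check can be sketched in a few lines of Python. This is only an illustration: the `SemanticFrame` class, the `ONTOLOGY` dictionary, and keying the ontology by intent rather than by domain are assumptions made here, not part of the slides.

```python
from dataclasses import dataclass, field

# Hypothetical ontology built from the two domains on the slide: intent -> allowed slots.
ONTOLOGY = {
    "find_restaurant": {"price", "type", "location"},
    "find_movie": {"year", "genre", "director"},
}

@dataclass
class SemanticFrame:
    intent: str
    slots: dict = field(default_factory=dict)

    def validate(self, ontology=ONTOLOGY):
        """Return True if the intent is known and every slot is allowed for it."""
        allowed = ontology.get(self.intent)
        return allowed is not None and set(self.slots) <= allowed

    def __str__(self):
        args = ", ".join(f'{k}="{v}"' for k, v in self.slots.items())
        return f"{self.intent}({args})"

restaurant_frame = SemanticFrame(
    "find_restaurant",
    {"price": "cheap", "type": "taiwanese", "location": "oakland"},
)
movie_frame = SemanticFrame(
    "find_movie", {"genre": "action", "director": "james cameron"}
)
# A frame whose slot is not in the ontology for its intent.
bad_frame = SemanticFrame("find_movie", {"price": "cheap"})
```

Here `restaurant_frame.validate()` succeeds, while `bad_frame.validate()` fails because `price` is not a slot of the movie domain.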
Language Understanding (LU)
Pipelined
1. Domain Classification
2. Intent Classification
3. Slot Filling
LU – Domain/Intent Classification
• Given a collection of utterances $u_i$ with labels $c_i$, $D = \{(u_1, c_1), \dots, (u_n, c_n)\}$ where $c_i \in C$, train a model to estimate labels for new utterances $u_k$.
Mainly viewed as an utterance classification task
Example utterance: find me a cheap taiwanese restaurant in oakland
Domains: Movies, Restaurants, Sports, Weather, Music, …
Intents: Find_movie, Buy_tickets, Find_restaurant, Book_table, Find_lyrics, …
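To make the utterance classification view concrete, here is a minimal sketch that trains a bag-of-words naive Bayes classifier on labelled utterances and predicts an intent for a new one. Naive Bayes is chosen here only for brevity (the slides do not prescribe a model), and the toy training set `D` is invented for illustration, with intent names taken from the slide.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """examples: list of (utterance, label). Returns the counts needed for prediction."""
    word_counts = defaultdict(Counter)   # label -> word -> count
    label_counts = Counter()             # label -> number of utterances
    vocab = set()
    for utterance, label in examples:
        tokens = utterance.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def predict(model, utterance):
    """Return the label c maximizing log P(c) + sum_w log P(w | c), with add-one smoothing."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in utterance.lower().split():
            score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training set D = {(u_i, c_i)}.
D = [
    ("find a cheap taiwanese restaurant in oakland", "Find_restaurant"),
    ("show me a good italian restaurant nearby", "Find_restaurant"),
    ("show me action movies directed by james cameron", "Find_movie"),
    ("are there any action movies to see this weekend", "Find_movie"),
    ("book a table for two tonight", "Book_table"),
    ("buy two tickets for the late show", "Buy_tickets"),
]
model = train_naive_bayes(D)
predicted = predict(model, "find me a cheap taiwanese restaurant in oakland")
```

Domain classification has the same shape: only the label set C changes.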
Language Understanding - Slot Filling
Is  there  um  a  cheap    place  in  the  centre  of      town    please?
O   O      O   O  B-price  O      O   O    B-area  I-area  I-area  O
As a sequence tagging task
• CRF for tagging each utterance
As a classification task
• SVM for each slot value pair
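Once a tagger has produced IOB labels, turning them into slot-value pairs is a simple scan over the sequence. The sketch below assumes the tags are already predicted (no CRF or SVM is trained here); the function name `iob_to_slots` is hypothetical.

```python
def iob_to_slots(tokens, tags):
    """Collect (slot, value) pairs from IOB tags: B-x starts a span, I-x continues it."""
    slots, current_slot, current_words = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current_slot is not None:
                slots.append((current_slot, " ".join(current_words)))
            current_slot, current_words = tag[2:], [token]
        elif tag.startswith("I-") and current_slot == tag[2:]:
            current_words.append(token)
        else:  # "O", or an I- tag that does not continue the open span
            if current_slot is not None:
                slots.append((current_slot, " ".join(current_words)))
            current_slot, current_words = None, []
    if current_slot is not None:
        slots.append((current_slot, " ".join(current_words)))
    return slots

tokens = "Is there um a cheap place in the centre of town please".split()
tags = ["O", "O", "O", "O", "B-price", "O", "O", "O",
        "B-area", "I-area", "I-area", "O"]
slots = iob_to_slots(tokens, tags)
```

For the example utterance this yields the pairs price = "cheap" and area = "centre of town".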
Language Understanding
Intent Classification (Ravuri and Stolcke, 2015)
IOB Sequence Labeling for Slot Filling (Hakkani-Tur et al., 2016)
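[Figure: recurrent taggers (a) LSTM, (b) LSTM-LA, (c) bLSTM-LA. Each maps input words $w_0, \dots, w_n$ through hidden states $h_0, \dots, h_n$ to slot tags $y_0, \dots, y_n$; the bidirectional variant keeps forward states $h^f_t$ and backward states $h^b_t$. For intent classification, the words (e.g. "taiwanese food please EOS") pass through hidden states $h_{t-1}, h_t, h_{t+1}$ with weights U and W, and the intent is predicted from the final state $h_{T+1}$.]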
Joint Semantic Frame Parsing
Sequence-based (Hakkani-Tur et al., 2016)
• Slot filling and intent prediction in the same output sequence
Parallel (Liu and Lane, 2016)
• Intent prediction and slot filling are performed in two branches
Dialogue State Tracking (DST)
Slide credited by Sungjin Lee
Incorrect for both!
Maintain a probabilistic distribution instead of a 1-best prediction for better robustness to SLU errors or ambiguous input
System: How can I help you?
User: Book a table at Sumiko for 5
Slot      Value
# people  5 (0.5)
time      5 (0.5)
System: How many people?
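The difference between a 1-best prediction and a tracked distribution can be sketched as follows. The update is a generic Bayes-rule step, not the tracker of any cited system, and the `likelihood` values for the later turn are invented for illustration.

```python
def normalize(belief):
    """Rescale scores so they sum to 1."""
    total = sum(belief.values())
    return {hyp: score / total for hyp, score in belief.items()}

def update(belief, likelihood):
    """Bayes-rule update: posterior(h) is proportional to prior(h) * likelihood(h)."""
    return normalize({hyp: p * likelihood.get(hyp, 0.0) for hyp, p in belief.items()})

# After "Book a table at Sumiko for 5": two equally likely readings of "5".
belief = {("# people", "5"): 0.5, ("time", "5"): 0.5}

# A 1-best tracker would commit to a single reading at this point.
one_best = max(belief, key=belief.get)

# Hypothetical evidence from a later turn that favours the party-size reading.
likelihood = {("# people", "5"): 0.9, ("time", "5"): 0.1}
posterior = update(belief, likelihood)
```

Because both readings stay in the belief, later evidence can still shift probability mass between them, which a committed 1-best state cannot do.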
Dialogue Policy Optimization
Dialogue management in an RL framework
Environment: User, Natural Language Generation, Language Understanding
Agent: Dialogue Manager
Signals: Observation O and Reward R (from the environment), Action A (from the agent)
Reward for RL ≅ Evaluation for SDS
Dialogue is a special RL task
Humans are involved in both the interaction and the rating (evaluation) of a dialogue
Fully human-in-the-loop framework
Rating: correctness, appropriateness, and adequacy
- Expert rating: high quality, high cost
- User rating: unreliable quality, medium cost
- Objective rating: checks desired aspects, low cost
Dialogue Reinforcement Signal
Typical Reward Function
per turn penalty -1
Large reward at completion if successful
Typically requires domain knowledge
✔ Simulated user
✔ Paid users (Amazon Mechanical Turk)
✖ Real users
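The typical reward function above can be written down directly. In this sketch the per-turn penalty of -1 comes from the slide, while `success_reward=20` is an assumed value, since the slide only says the completion reward is large.

```python
def dialogue_return(num_turns, success, turn_penalty=-1, success_reward=20):
    """Total reward of one dialogue: a penalty per turn plus a bonus on success.

    success_reward=20 is an assumed value; the slide only says the reward is large.
    """
    total = num_turns * turn_penalty
    if success:
        total += success_reward
    return total

short_success = dialogue_return(5, True)
long_success = dialogue_return(10, True)
short_failure = dialogue_return(5, False)
```

Under this scheme a shorter successful dialogue always scores higher than a longer one, which pushes the policy toward efficient task completion.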
User Simulation
Goal: generate natural and reasonable conversations to enable reinforcement learning for exploring the policy space
Approach
Rule-based crafted by experts (Li et al., 2016)
Learning-based (Schatzmann et al., 2006)
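As a rough illustration of the rule-based approach, the sketch below answers system requests from a fixed user goal. It is a toy stand-in and not the simulator of Li et al. (2016); the class name, the action string format, and the example goal are all assumptions.

```python
class RuleBasedUser:
    """Answers system requests from a fixed goal; a toy stand-in for an expert-crafted simulator."""

    def __init__(self, goal):
        self.goal = dict(goal)
        self.informed = set()

    def respond(self, system_action):
        """system_action is a string such as 'request(date)'; anything else ends the dialogue."""
        if system_action.startswith("request(") and system_action.endswith(")"):
            slot = system_action[len("request("):-1]
            if slot in self.goal:
                self.informed.add(slot)
                return f"inform({slot}={self.goal[slot]})"
            return f"inform({slot}=dont_care)"
        return "bye()"

    def goal_conveyed(self):
        """True once every slot of the goal has been given to the system."""
        return self.informed == set(self.goal)

# Hypothetical user goal.
user = RuleBasedUser({"date": "tomorrow", "theater": "regal meridian 16"})
first_reply = user.respond("request(date)")
```

A policy learner can call `respond` in a loop and use `goal_conveyed` as one ingredient of the success signal.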
[Figure: a Simulated User, built from a Dialogue Corpus, or a Real User interacts with Dialogue Management (DM), which comprises Dialogue State Tracking (DST) and Dialogue Policy.]
Natural Language Generation (NLG)
Mapping a semantic frame into natural language
inform(name=Seven_Days, foodtype=Chinese)
Seven Days is a nice Chinese restaurant
Template-Based NLG
Define a set of rules to map frames to NL
Pros: simple, error-free, easy to control
Semantic Frame            Natural Language
confirm()                 “Please tell me more about the product you are looking for.”
confirm(area=$V)          “Do you want somewhere in the $V?”
confirm(food=$V)          “Do you want a $V restaurant?”
confirm(food=$V,area=$W)  “Do you want a $V restaurant in the $W?”
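A template table like this maps almost directly onto Python's `string.Template`, whose `$V` placeholder syntax happens to match the slide. The lookup key (dialogue act plus sorted slot names) and the `generate` helper are choices made for this sketch.

```python
from string import Template

# Templates keyed by (dialogue act, sorted slot names); each maps $V/$W to a slot name.
TEMPLATES = {
    ("confirm", ()): (
        Template("Please tell me more about the product you are looking for."), {}),
    ("confirm", ("area",)): (
        Template("Do you want somewhere in the $V?"), {"V": "area"}),
    ("confirm", ("food",)): (
        Template("Do you want a $V restaurant?"), {"V": "food"}),
    ("confirm", ("area", "food")): (
        Template("Do you want a $V restaurant in the $W?"), {"V": "food", "W": "area"}),
}

def generate(act, **slots):
    """Look up the template for this frame and fill in the slot values."""
    template, variables = TEMPLATES[(act, tuple(sorted(slots)))]
    return template.substitute({var: slots[slot] for var, slot in variables.items()})

sentence = generate("confirm", food="Chinese", area="centre")
```

A frame with no matching template raises `KeyError`, which reflects the main limitation of this approach: every frame shape has to be anticipated by hand.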
Trainable Generator (Walker et al., 2002)
Divide the problem into a pipeline: Sentence Plan Generator → Sentence Plan Reranker → Surface Realiser
Statistical sentence plan generator (Stent et al., 2009)
Statistical surface realization (Dethlefs et al., 2013; Cuayáhuitl et al., 2014; …)
Example: Inform(name=Z_House, price=cheap) → Z House is a cheap restaurant.
Pros: can model complex linguistic structures
Cons: heavily engineered, requires domain knowledge
RNN Language Generator (Wen et al., 2015)
Dialogue act: Inform(name=EAT, food=British)
Dialogue act 1-hot representation: 0, 0, 1, 0, 0, …, 1, 0, 0, …, 1, 0, 0, 0, 0, 0, …
Delexicalized input: <BOS> SLOT_NAME serves SLOT_FOOD .
Delexicalized output: SLOT_NAME serves SLOT_FOOD . <EOS>
Lexicalized sentence: <BOS> EAT serves British .
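The slot placeholders above come from delexicalization: slot values are swapped for slot tokens before the sentence reaches the generator, and swapped back afterwards. The sketch below shows only this pre- and post-processing step, using naive substring replacement; it does not model the RNN itself, and the helper names are hypothetical.

```python
def delexicalize(sentence, slots):
    """Replace each slot value in the sentence with a SLOT_<NAME> placeholder.

    Naive substring replacement: adequate for this example, not for overlapping values.
    """
    for name, value in slots.items():
        sentence = sentence.replace(value, f"SLOT_{name.upper()}")
    return sentence

def lexicalize(template, slots):
    """Inverse step: fill SLOT_<NAME> placeholders with values from the dialogue act."""
    for name, value in slots.items():
        template = template.replace(f"SLOT_{name.upper()}", value)
    return template

slots = {"name": "EAT", "food": "British"}  # from Inform(name=EAT, food=British)
template = delexicalize("EAT serves British .", slots)
restored = lexicalize(template, slots)
```

Training on delexicalized text lets one learned pattern cover every restaurant name and food type in the ontology.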
E2E Joint NLU and DM (Yang et al., 2017)
Errors from DM can be propagated to NLU for better robustness
E2E Supervised Dialogue System (Wen et al., 2016)
Wen, et al., “A Network-based End-to-End Trainable Task-Oriented Dialogue System,” arXiv:1604.04562v2.
User input: Can I have korean
Intent Network: Can I have <v.food>
Belief Tracker: Korean 0.7, British 0.2, French 0.1, …
Database Operator: MySQL query “Select * where food=Korean”
Database: Seven days, Curry Prince, Nirala, Royal Standard, Little Seoul
DB pointer: 0 0 0 … 0 1
Policy Network (vectors $x_t$, $z_t$, $p_t$, $q_t$)
Generation Network: <v.name> serves great <v.food> .
Copy field…
Info Access Bot (Info-Bot) (Dhingra et al., 2016)
User goal: Movie=?; Actor=Bill Murray; Release Year=1993
User: Find me the Bill Murray’s movie.
KB-InfoBot: When was it released?
User: I think it came out in 1993.
KB-InfoBot: Groundhog Day is a Bill Murray movie which came out in 1993.
Knowledge Base (head, relation, tail)
(Groundhog Day, actor, Bill Murray)
(Groundhog Day, release year, 1993)
(Australia, actor, Nicole Kidman)
(Mad Max: Fury Road, release year,
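The (head, relation, tail) knowledge base can be queried with a few lines of Python. The sketch below performs an exact-match lookup over the three complete triples shown on the slide (the truncated fourth triple is left out); it is a simplification for illustration and does not attempt to reproduce the model of Dhingra et al. The function name `find_heads` is hypothetical.

```python
# The complete triples from the slide, as (head, relation, tail).
KB = [
    ("Groundhog Day", "actor", "Bill Murray"),
    ("Groundhog Day", "release year", "1993"),
    ("Australia", "actor", "Nicole Kidman"),
]

def find_heads(kb, constraints):
    """Return the heads whose triples satisfy every (relation, tail) constraint."""
    facts = set(kb)
    heads = {head for head, _, _ in kb}
    return sorted(
        head for head in heads
        if all((head, rel, tail) in facts for rel, tail in constraints.items())
    )

# User goal from the slide: Movie=?; Actor=Bill Murray; Release Year=1993.
matches = find_heads(KB, {"actor": "Bill Murray", "release year": "1993"})
```

Asking "When was it released?" corresponds to requesting one more constraint so that the candidate set narrows to a single head.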
E2E Task-Completion Bot (TC-Bot) (Li et al., 2017)
[Figure: an End-to-End Neural Dialogue System interacting with a User Simulator (User Goal, User Agenda Modeling, Natural Language Generation). At each turn (Time t-2, Time t-1, Time t), Language Understanding (LU) tags the words $w_i, w_{i+1}, w_{i+2}$, EOS with slot tags (e.g. B-type, O, O) and an <intent> label; Dialogue Management (DM) maps the resulting semantic frame to a system action.]
Text Input: Are there any action movies to see this weekend?
Semantic Frame: request_movie (genre=action, date=this weekend)
System Action / Policy: request_location
User Dialogue Action: Inform(location=San Francisco)
Idea: supervised learning for each component and reinforcement learning for end-to-end training of the neural dialogue system
E2E Task-Completion Bot (TC-Bot) (Li et al., 2017)
User goal: Two tickets for “the witch” tomorrow 9:30 PM at regal meridian 16, Seattle.
RULE BASED SYSTEM
User: Can I get 2 tickets for the witch?
Agent: Which theater do you want?
User: Regal meridian 16
Agent: What time would you like to see it?
User: 9:30 pm
Agent: What date would you like to see it?
User: Tomorrow!
Agent: How many people are you?
User: Just two adults.
Agent: What is your location?
User: Seattle!
Agent: What kind of video format do you like?
User: Oh, any kind is okay.
Agent: Do you have any price constraint?
User: No, I don’t care.
REINFORCEMENT LEARNING SYSTEM
User: Can I get 2 tickets for the witch?
Agent: What date would you like to see it?
User: Tomorrow
Agent: Which theater do you prefer?
User: Regal meridian 16
Agent: What is your location?
User: Seattle
Agent: What time would you like to see it?
User: 9:30 pm.
Agent: Great - I was able to purchase 2 tickets for you to see the witch tomorrow at regal meridian 16 theater in Seattle at 9:30 pm.
User: Thanks.
Agent: Thanks!