Deep Learning for Dialogue Systems
deepdialogue.miulab.tw
Outline
Introduction & Background
Neural Networks
Reinforcement Learning
Modular Dialogue System
Spoken/Natural Language Understanding (SLU/NLU)
Dialogue Management (DM)
◼ Dialogue State Tracking (DST)
◼ Dialogue Policy Optimization
Natural Language Generation (NLG)
End-to-End Neural Dialogue Systems
Evaluation
Recent Trends on Learning Dialogues
Neural Networks
Reinforcement Learning
Brief History of Dialogue Systems
- Early 1990s: keyword spotting (e.g., AT&T). System: "Please say collect, calling card, person, third..."
- Early 2000s: task-specific argument extraction (e.g., Nuance, SpeechWorks). User: "I want to fly from Boston to New York next week." Intent determination (Nuance's Emily™, AT&T HMIHY). User: "Uh... we want to move... we want to change our phone line from this house to another house"
- Multi-modal systems (e.g., Microsoft MiPad, Pocket PC); TV voice search (e.g., Bing on Xbox)
- 2017: virtual personal assistants, rooted in the DARPA CALO Project: Apple Siri (2011), Google Now (2012), Microsoft Cortana (2014), Amazon Alexa/Echo (2014), Facebook M & Bot (2015), Google Home (2016), Google Assistant (2016)
Language Empowering Intelligent Assistant
Apple Siri (2011), Google Now (2012), Microsoft Cortana (2014), Amazon Alexa/Echo (2014), Facebook M & Bot (2015), Google Home (2016), Google Assistant (2016), Apple HomePod (2017)
Conversational Agents
Chit-Chat
Task-Oriented
Challenges
Variability in natural language
Robustness
Recall/Precision Trade-off
Meaning Representation
Common Sense, World Knowledge
Ability to learn
Transparency
Task-Oriented Dialogue System (Young, 2000)
- Speech Recognition: speech signal → text hypothesis, e.g., "are there any action movies to see this weekend"
- Language Understanding (LU): domain identification, user intent detection, slot filling → semantic frame, e.g., request_movie(genre=action, date=this weekend)
- Dialogue Management (DM): dialogue state tracking (DST) + dialogue policy, with backend action / knowledge providers → system action/policy, e.g., request_location
- Natural Language Generation (NLG): system action → text response, e.g., "Where are you located?"
http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
Outline
Introduction & Background
Neural Networks
Reinforcement Learning
Modular Dialogue System
Spoken/Natural Language Understanding (SLU/NLU)
Dialogue Management (DM)
◼ Dialogue State Tracking (DST)
◼ Dialogue Policy Optimization
Natural Language Generation (NLG)
End-to-End Neural Dialogue Systems
System Evaluation
Recent Trends on Learning Dialogues
A Single Neuron
z = w_1 x_1 + w_2 x_2 + ... + w_N x_N + b (b: bias)
y = σ(z), with the sigmoid activation function σ(z) = 1 / (1 + e^(-z))
w, b are the parameters of this neuron
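A minimal NumPy sketch of this neuron (illustrative weights, not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    z = np.dot(w, x) + b       # weighted sum plus bias
    return sigmoid(z)          # activation squashes z into (0, 1)

# Example: N = 3 inputs with illustrative weights.
x = np.array([1.0, -2.0, 0.5])
w = np.array([0.4, 0.1, -0.3])
print(neuron(x, w, b=0.2))     # a value in (0, 1)
```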
A Single Neuron as a Binary Classifier
With y = σ(w · x + b):
- y ≥ 0.5 → the input is "2"
- y < 0.5 → the input is not "2"
A single neuron can only handle binary classification.
A Layer of Neurons
Handwriting digit classification: f : R^N → R^M
A layer of neurons can handle multiple possible outputs (e.g., 10 neurons for 10 classes, each scoring "1" or not, "2" or not, "3" or not, ...); the result is the class whose neuron output is the max.
Deep Neural Networks (DNN)
Fully connected feedforward network, f : R^N → R^M
Input vector x = (x_1, ..., x_N) → Layer 1 → Layer 2 → ... → Layer L → output vector y = (y_1, ..., y_M)
Deep NN: multiple hidden layers
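A NumPy sketch of the forward pass (assumed sizes, random weights); the softmax-plus-argmax output also makes concrete the "pick the max" classification from the previous slide:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def dnn_forward(x, layers):
    """Fully connected feedforward pass: layers is a list of (W, b)."""
    h = x
    for W, b in layers[:-1]:
        h = relu(W @ h + b)            # hidden layers
    W, b = layers[-1]
    return softmax(W @ h + b)          # output distribution over M classes

# Example: N=4 inputs, one hidden layer of 8 units, M=3 outputs.
rng = np.random.default_rng(0)
layers = [(rng.normal(size=(8, 4)), np.zeros(8)),
          (rng.normal(size=(3, 8)), np.zeros(3))]
y = dnn_forward(rng.normal(size=4), layers)
print(y, y.argmax())                   # predicted class = max output
```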
Recurrent Neural Network (RNN)
h_t = σ(W h_{t-1} + U x_t), where the activation σ is typically tanh or ReLU, applied at each time step
RNN can learn accumulated sequential information (time series)
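A sketch of one recurrence step (assumed dimensions), showing how the hidden state accumulates the sequence:

```python
import numpy as np

def rnn_step(h_prev, x_t, W_h, W_x, b):
    """One recurrence: the new state mixes the previous state and the input."""
    return np.tanh(W_h @ h_prev + W_x @ x_t + b)   # tanh (or ReLU) activation

# Unrolling over a sequence accumulates information across time steps.
rng = np.random.default_rng(1)
W_h, W_x, b = rng.normal(size=(5, 5)), rng.normal(size=(5, 3)), np.zeros(5)
h = np.zeros(5)
for x_t in rng.normal(size=(7, 3)):    # a length-7 sequence of 3-dim inputs
    h = rnn_step(h, x_t, W_h, W_x, b)
print(h)                               # final state summarizes the sequence
```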
Outline
Introduction & Background
Neural Networks
Reinforcement Learning
Modular Dialogue System
Spoken/Natural Language Understanding (SLU/NLU)
Dialogue Management (DM)
◼ Dialogue State Tracking (DST)
◼ Dialogue Policy Optimization
Natural Language Generation (NLG)
End-to-End Neural Dialogue Systems
System Evaluation
Recent Trends on Learning Dialogues
Reinforcement Learning
RL is a general-purpose framework for decision making:
- RL is for an agent with the capacity to act
- Each action influences the agent's future state
- Success is measured by a scalar reward signal
- Goal: select actions to maximize future reward
Agent-environment loop: observation → action → reward
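The loop can be sketched as follows; Env and Agent are toy stand-ins (hypothetical interfaces, not from the slides), only the observation → action → reward cycle matters here:

```python
import random

class Env:
    def reset(self):
        self.t = 0
        return "start"
    def step(self, action):
        self.t += 1
        reward = 1.0 if action == "good" else -1.0   # scalar reward signal
        return f"state-{self.t}", reward, self.t >= 5

class Agent:
    def act(self, obs):
        return random.choice(["good", "bad"])        # placeholder policy
    def observe(self, reward):
        pass                                         # where learning would go

env, agent = Env(), Agent()
obs, total, done = env.reset(), 0.0, False
while not done:
    action = agent.act(obs)                 # observation -> action
    obs, reward, done = env.step(action)    # action influences future state
    agent.observe(reward)
    total += reward
print(total)                                # goal: maximize future reward
```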
Supervised vs. Reinforcement
- Supervised: learning from a teacher, with the correct output given for each input (e.g., "Hello" → say "Hi"; "Bye bye" → say "Good bye")
- Reinforcement: learning from critics, with only a delayed scalar judgment of the interaction (e.g., a whole dialogue rated "Bad" at the end)
Deep Reinforcement Learning
A DNN is the function mapping observation (input) to action (output); the reward from the environment is used to pick the best function.
Goal: select actions that maximize the expected total reward
Modular Dialogue System
Outline
Introduction & Background
Neural Networks
Reinforcement Learning
Modular Dialogue System
Spoken/Natural Language Understanding (SLU/NLU)
Dialogue Management (DM)
◼ Dialogue State Tracking (DST)
◼ Dialogue Policy Optimization
Natural Language Generation (NLG)
End-to-End Neural Dialogue Systems
System Evaluation
Recent Trends on Learning Dialogues
Task-Oriented Dialogue System (Young, 2000)
Pipeline recap: speech signal → speech recognition → text hypothesis → LU (domain identification, user intent detection, slot filling) → semantic frame → DM (DST + dialogue policy, with backend action / knowledge providers) → system action/policy → NLG → text response.
Semantic Frame Representation
- Requires a domain ontology: early connection to backend
- Contains core content: intent, a set of slots with fillers
Restaurant domain: "find me a cheap taiwanese restaurant in oakland" → find_restaurant(price="cheap", type="taiwanese", location="oakland"); ontology: restaurant {price, type, location}
Movie domain: "show me action movies directed by james cameron" → find_movie(genre="action", director="james cameron"); ontology: movie {year, genre, director}
Language Understanding (LU)
Pipelined: 1. domain classification → 2. intent classification → 3. slot filling
LU – Domain/Intent Classification
As an utterance classification task: given a collection of utterances u_i with labels c_i, D = {(u_1, c_1), ..., (u_n, c_n)} where c_i ∈ C, train a model to estimate labels for new utterances u_k.
Example: "find me a cheap taiwanese restaurant in oakland"
- Domain: Movies, Restaurants, Music, Sports, ...
- Intent: find_movie, buy_tickets; find_restaurant, find_price, book_table; find_lyrics, find_singer; ...
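As a concrete toy instance of this classification task, a bag-of-words baseline with scikit-learn (illustrative data; the neural models on the following slides replace this):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

utterances = ["find me a cheap taiwanese restaurant in oakland",
              "book a table for two tonight",
              "show me action movies directed by james cameron",
              "buy tickets for the 7 pm show"]
intents = ["find_restaurant", "book_table", "find_movie", "buy_tickets"]

clf = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(utterances, intents)                       # train on labeled utterances
print(clf.predict(["find a thai restaurant"]))     # estimate label for new u_k
```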
Domain/Intent Classification (Sarikaya+, 2011)
Deep belief nets (DBN)
Unsupervised training of weights
Fine-tuning by back-propagation
Compared to MaxEnt, SVM, and boosting
http://ieeexplore.ieee.org/abstract/document/5947649/
Domain/Intent Classification (Tur+, 2012)
Deep convex networks (DCN)
Simple classifiers are stacked to learn complex functions
Feature selection of salient n-grams
Extension to kernel-DCN
http://ieeexplore.ieee.org/abstract/document/6289054/; http://ieeexplore.ieee.org/abstract/document/6424224/
Domain/Intent Classification (Ravuri & Stolcke, 2015)
RNN and LSTMs for utterance classification
Intent decision after reading all words performs better
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/RNNLM_addressee.pdf
Dialogue Act Classification (Lee & Dernoncourt, 2016)
RNN and CNNs for dialogue act classification
LU – Slot Filling
As a sequence tagging task: given a collection of tagged word sequences S = {((w_{1,1}, w_{1,2}, ..., w_{1,n1}), (t_{1,1}, t_{1,2}, ..., t_{1,n1})), ((w_{2,1}, w_{2,2}, ..., w_{2,n2}), (t_{2,1}, t_{2,2}, ..., t_{2,n2})), ...} where t_{i,j} ∈ M, the goal is to estimate tags for a new word sequence.
Example: "flights from Boston to New York today"
- Entity tags: O O B-city O B-city I-city O
- Slot tags: O O B-dept O B-arrival I-arrival B-date
Slot Tagging (Yao+, 2013; Mesnil+, 2015)
Variations:
a. RNNs with LSTM cells
b. Input: sliding window of n-grams
c. Bi-directional LSTMs (forward states h_t^f and backward states h_t^b jointly predict each tag y_t)
http://131.107.65.14/en-us/um/people/gzweig/Pubs/Interspeech2013RNNLU.pdf; http://dl.acm.org/citation.cfm?id=2876380
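A minimal PyTorch sketch of variation (c), a bi-directional LSTM tagger with hypothetical sizes and untrained weights, purely to show the shapes involved:

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size, num_tags, emb_dim=32, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, num_tags)   # forward + backward states

    def forward(self, word_ids):                     # (batch, seq_len)
        h, _ = self.lstm(self.emb(word_ids))         # (batch, seq_len, 2*hidden)
        return self.out(h)                           # per-word tag scores

tagger = BiLSTMTagger(vocab_size=1000, num_tags=10)  # e.g., 10 IOB slot tags
scores = tagger(torch.randint(0, 1000, (1, 7)))      # "flights from boston ..."
print(scores.argmax(-1))                             # one tag per word
```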
Slot Tagging (Kurata+, 2016; Simonnet+, 2015)
- Encoder-decoder networks: encode the whole sentence (w_n ... w_0), then decode the tag sequence y_0 ... y_n, leveraging sentence-level information
- Attention-based encoder-decoder: use attention (as in MT) in the encoder-decoder network; the attention over encoder states is estimated by a feed-forward network with inputs h_t and s_t at time t, yielding a context vector c_i
http://www.aclweb.org/anthology/D16-1223
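A NumPy sketch of such additive attention (assumed shapes; W_h, W_s, v stand in for the feed-forward scorer's weights):

```python
import numpy as np

def attention(H, s, W_h, W_s, v):
    """H: (T, d) encoder states; s: (d,) decoder state at the current step."""
    scores = np.array([v @ np.tanh(W_h @ h_t + W_s @ s) for h_t in H])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()            # softmax over time steps
    return weights @ H                  # context vector c_i = sum_t a_t * h_t

rng = np.random.default_rng(2)
H, s = rng.normal(size=(7, 4)), rng.normal(size=4)
W_h, W_s, v = rng.normal(size=(4, 4)), rng.normal(size=(4, 4)), rng.normal(size=4)
print(attention(H, s, W_h, W_s, v))     # context fed to the decoder
```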
Joint Segmentation & Slot Tagging (Zhai+, 2017)
Encoder that segments
Decoder that tags the segments
https://arxiv.org/pdf/1701.04027.pdf
Multi-Task Learning for Slot Tagging (Jaech+, 2016; Tafforeau+, 2016)
Multi-task learning
Goal: exploit data from domains/tasks with a lot of data to improve ones with less data
Lower layers are shared across domains/tasks
Output layer is specific to task
https://arxiv.org/abs/1604.00117; http://www.sensei-conversation.eu/wp-content/uploads/2016/11/favre_is2016b.pdf
Semi-Supervised Slot Tagging (Lan+, 2018)
Idea: language understanding objective can enhance other tasks
Slot tagging model: the BLM (bidirectional language model) exploits unsupervised knowledge; the shared-private framework and adversarial training make the slot tagging model generalize better.
https://speechlab.sjtu.edu.cn/papers/oyl11-lan-icassp18.pdf
LU Evaluation
Metrics
Sub-sentence-level: intent accuracy, intent F1, slot F1
Sentence-level: whole frame accuracy
Joint Semantic Frame Parsing
- Sequence-based (Hakkani-Tür+, 2016): slot filling and intent prediction in the same output sequence, e.g., "taiwanese food please" → B-type O O, with the intent FIND_REST emitted at EOS
- Parallel (Liu & Lane, 2016): intent prediction and slot filling are performed in two branches
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/06/IS16_MultiJoint.pdf; https://arxiv.org/abs/1609.01454
Slot-Gated Joint SLU (Goo+, 2018)
A BLSTM reads the word sequence x_1 ... x_4; slot attention produces per-word contexts c_i^S and intent attention produces an intent context c^I (intent prediction y^I). A slot gate fuses them:
- Slot gate: g = Σ v · tanh(c_i^S + W · c^I)
- Slot prediction: y_i^S = softmax(W^S (h_i + g · c_i^S) + b^S)
g will be larger if the slot and the intent are better related.
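A NumPy sketch of the slot gate and slot prediction above (assumed dimensions, random weights):

```python
import numpy as np

def slot_gate(c_slot, c_intent, v, W):
    return np.sum(v * np.tanh(c_slot + W @ c_intent))   # g = sum v . tanh(...)

def slot_prediction(h_i, c_slot, c_intent, v, W, W_S, b_S):
    g = slot_gate(c_slot, c_intent, v, W)
    logits = W_S @ (h_i + g * c_slot) + b_S
    e = np.exp(logits - logits.max())
    return e / e.sum()                                   # softmax over slot tags

rng = np.random.default_rng(3)
d, tags = 8, 5
y = slot_prediction(rng.normal(size=d), rng.normal(size=d), rng.normal(size=d),
                    rng.normal(size=d), rng.normal(size=(d, d)),
                    rng.normal(size=(tags, d)), np.zeros(tags))
print(y)   # a larger g lets more slot context through when intent agrees
```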
Contextual LU
Domain identification → intent prediction → slot filling
U1: "just sent email to bob about fishing this weekend"
- Slot tags: O O O O B-contact_name O B-subject I-subject I-subject
- Domain: communication; intent: send_email
→ send_email(contact_name="bob", subject="fishing this weekend")
In context, the same frame can be filled across turns:
U: "are we going to fish this weekend" (B-message I-message ...) → send_email(message="are we going to fish this weekend")
U: "send email to bob" (B-contact_name on "bob") → send_email(contact_name="bob")
Contextual LU
User utterances are highly ambiguous in isolation:
U: "Book a table for 10 people tonight." (restaurant booking)
S: "Which restaurant would you like to book a table for and for what time?"
U: "Cascal, for 6." → is "6" the #people or the time?
Contextual LU (Bhargava+, 2013; Hori+, 2015)
Leveraging contexts
Used for individual tasks
Seq2Seq model
Words are input one at a time, tags are output at the end of each utterance
Extension: LSTM with speaker role dependent layers
https://www.merl.com/publications/docs/TR2015-134.pdf
End-to-End Memory Networks (Sukhbaatar+, 2015)
U: “i d like to purchase tickets to see deepwater horizon”
S: “for which theatre”
U: “angelika”
S: “you want them for angelika theatre?”
U: “yes angelika”
S: “how many tickets would you like ?”
U: “3 tickets for saturday”
S: “What time would you like ?”
U: “Any time on saturday is fine”
S: “okay , there is 4:10 pm , 5:40 pm and 9:20 pm”
U: “Let’s do 5:40”
Memory network: each history utterance is stored as a memory vector m_0 ... m_{n-1}; the current utterance is encoded as u and matched against the memories.
E2E MemNN for Contextual LU (Chen+, 2016)
Idea: additionally incorporate contextual knowledge during slot tagging → track dialogue states in a latent way
1. Sentence encoding: RNN_mem encodes the history utterances {x_i} into memory representations m_i; a contextual sentence encoder (RNN_in) encodes the current utterance c into u
2. Knowledge attention: the inner product of u with each m_i gives the knowledge attention distribution p_i
3. Knowledge encoding: the weighted sum h = Σ_i p_i m_i is mapped (W_kg) into the knowledge encoding representation o, which conditions the RNN tagger producing the slot tagging sequence y
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/06/IS16_ContextualSLU.pdf
Analysis of Attention
U: “i d like to purchase tickets to see deepwater horizon”
S: “for which theatre”
U: “angelika”
S: “you want them for angelika theatre?”
U: “yes angelika”
S: “how many tickets would you like ?”
U: “3 tickets for saturday”
S: “What time would you like ?”
U: “Any time on saturday is fine”
S: “okay , there is 4:10 pm , 5:40 pm and 9:20 pm”
U: “Let’s do 5:40”
(Figure: the knowledge attention distribution concentrates on the most relevant history turns, with weights such as 0.69, 0.16, and 0.13.)
Dialogue Encoder Network (Bapna+, 2017)
Past and current turn encodings are input to a feed-forward network
http://aclweb.org/anthology/W17-5514
Role-Based Time-Decay Attention (Su+, 2018)
Sentence-level time-decay attention weights each history utterance u_i by α_{u_i}; role-level time-decay attention aggregates the two speaker roles (tourist and guide) separately with weights α_{r1} and α_{r2} (e.g., u_2, u_4, u_5 vs. u_1, u_3, u_6, with u_7 the current utterance). The history summary Σ α · u is combined with the current utterance w_t ... w_T through dense layers for spoken language understanding.
Time-decay attention function (α_u and α_r): convex, linear, or concave decay α_d.
http://aclweb.org/anthology/N18-1194
Context-Sensitive Time-Decay (Su+, 2018)
Same architecture, but the fixed convex/linear/concave time-decay functions are replaced by learned attention models, so the decay shape is context-sensitive.
Time-decay attention significantly improves the understanding results.
Structural LU (Chen+, 2016)
Prior knowledge as a teacher: a knowledge encoding module stores knowledge-guided substructures {x_i} of the input sentence (e.g., "show me the flights from seattle to san francisco", rooted at ROOT) as memories; a knowledge attention distribution p_i (inner product with the sentence encoding) yields a weighted sum, the knowledge-guided representation, which guides the RNN tagger producing the slot tagging sequence.
http://arxiv.org/abs/1609.03286
Structural LU (Chen+, 2016)
Sentence structural knowledge stored as memory, for the sentence "show me the flights from seattle to san francisco":
- Syntax (dependency tree): substructures rooted at ROOT, e.g., show → me, the flights, from seattle, to (san) francisco
- Semantics (AMR graph): show(you, flight(city: Seattle, city: San Francisco), I)
http://arxiv.org/abs/1609.03286
Structural LU (Chen+, 2016)
Even with less training data, K-SAN pays similar attention to the salient substructures, i.e., the attention is robust to small training sets.
http://arxiv.org/abs/1609.03286
Semantic Frame Representation (recap)
Requires a domain ontology (early connection to backend); contains core content: intent and a set of slots with fillers (find_restaurant(...), find_movie(...) as above).
LU – Learning Semantic Ontology (Chen+, 2013)
Learning key domain concepts from goal-oriented human-human conversations
- Clustering with mutual information and KL divergence (Chotimongkol & Rudnicky, 2002)
- Spectral-clustering-based slot ranking model (Chen et al., 2013):
◼ Use a state-of-the-art frame-semantic parser trained for FrameNet
◼ Adapt the generic output of the parser to the target semantic space
http://www.cs.cmu.edu/~ananlada/ConceptIdentificationICSLP02.pdf; http://ieeexplore.ieee.org/abstract/document/6707716/
LU – Intent Expansion (Chen+, 2016)
Transfer dialogue acts across domains:
- Dialogue acts are similar across multiple domains
- Learn new intents using information from other domains
A CDSSM embeds the training intents with example utterances (e.g., <change_note> "adjust my note", <change_setting> "volume turn down") and generates embeddings for new intents (K+1, K+2, ..., e.g., <change_calendar>); an utterance U is scored against each act representation A_i by cosine similarity, giving P(A_i | U). The dialogue act representations thus generalize to unseen intents, e.g., "postpone my meeting to five pm" → <change_calendar>.
http://ieeexplore.ieee.org/abstract/document/7472838/
LU – Language Extension (Upadhyay+, 2018)
Source language: English (full annotations)
Target language: Hindi (limited annotations)
RT: round trip, FC: from city, TC: to city, DDN: departure day name
http://shyamupa.com/papers/UFTHH18.pdf
LU – Language Extension (Upadhyay+, 2018)
- Train on Target (Lefevre+, 2010): English training data is machine-translated into Hindi; a Hindi tagger is trained and applied to the Hindi test set
- Test on Source (Jabaian+, 2011): the Hindi test set is machine-translated into English and tagged with an English tagger
- Joint Training: a bilingual tagger is trained on a large English training set plus a small Hindi training set and applied directly to the Hindi test set
With joint training, an MT system is not required and both languages can be processed by a single model.
http://shyamupa.com/papers/UFTHH18.pdf
Outline
Introduction & Background
Neural Networks
Reinforcement Learning
Modular Dialogue System
Spoken/Natural Language Understanding (SLU/NLU)
Dialogue Management (DM)
◼ Dialogue State Tracking (DST)
◼ Dialogue Policy Optimization
Natural Language Generation (NLG)
End-to-End Neural Dialogue Systems
System Evaluation
Recent Trends on Learning Dialogues
Task-Oriented Dialogue System (Young, 2000)
Pipeline recap (speech recognition → LU → DM → NLG, as above), now focusing on dialogue management: dialogue state tracking (DST) and dialogue policy, with backend action / knowledge providers.
Elements of Dialogue Management
(Figure from Gašić)
Dialogue state tracking
Dialogue State Tracking (DST)
Dialogue state: a representation of the system's belief of the user's goal(s) at any time during the dialogue
Inputs
Current user utterance
Preceding system response
Results from previous turns
Used for:
Looking up knowledge or making API call(s)
Generating the next system action/response
Dialogue State Tracking (DST)
Maintain a probabilistic distribution over slot values instead of a 1-best prediction, for better robustness to SLU errors or ambiguous input.
S: How can I help you?
U: Book a table at Sumiko for 5
→ belief: # people = 5 (0.5), time = 5 (0.5) ("for 5" is ambiguous)
S: How many people?
U: 3
→ belief: # people = 3 (0.8), time = 5 (0.8)
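A toy sketch of such a distributional update (the blending rule is illustrative, not a published algorithm):

```python
belief = {"#people": {"5": 0.5}, "time": {"5": 0.5}}   # "for 5" is ambiguous

def update(belief, slot, value, confidence):
    dist = belief.setdefault(slot, {})
    # Blend the new SLU evidence with the old distribution.
    for v in dist:
        dist[v] *= (1.0 - confidence)
    dist[value] = dist.get(value, 0.0) + confidence
    return belief

update(belief, "#people", "3", 0.8)   # user answers "3" to "How many people?"
print(belief)   # "#people": 3 is now most likely; the "time" hypothesis is kept
```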
Multi-Domain Dialogue State Tracking
A full representation of the system's belief of the user's goal at any point during the dialogue; used for making API calls.
U: I wanna buy two tickets for tonight at the Shoreline theater.
S: Which movie are you interested in?
U: Inferno.
S: Inferno showtimes at Century 16 Shoreline are 6:30pm, 7:30pm, 8:45pm and 9:45pm. What time do you prefer?
U: We'd like to eat dinner before the movie at Cascal, can you check what time I can get a table?
S: Cascal has a table for 2 at 6pm and 7:30pm.
U: OK, let me get the table at 6 and tickets for the 7:30 showing.
Tracked states (with per-value likelihoods, less likely ↔ more likely):
- Movies: movie = Inferno, theater = Century 16 Shoreline, #people = 2, date = 11/15/17, time ∈ {6:30 pm, 7:30 pm, 8:45 pm, 9:45 pm}
- Restaurants: restaurant = Cascal, #people = 2, date = 11/15/17, time ∈ {6:00 pm, 6:30 pm, 7:00 pm}
RNN-CNN DST (Mrkšić+, 2015)
(Figure from Wen et al, 2016)
https://arxiv.org/abs/1506.07190
Neural Belief Tracker (Mrkšić+, 2016)
Candidate slot-value pairs are scored against the system output, the user utterance, and the previous belief state b_{t-1}.
https://arxiv.org/abs/1606.03777
Global-Locally Self-Attentive DST (Zhong+, 2018)
More advanced encoder
Global modules share parameters for all slots
Local modules learn slot-specific feature representations
http://www.aclweb.org/anthology/P18-1135
Dialog State Tracking Challenge (DSTC)
(Williams+, 2013; Henderson+, 2014; Henderson+, 2014; Kim+, 2016; Kim+, 2016)

Challenge | Type | Domain | Data Provider | Main Theme
DSTC1 | Human-Machine | Bus Route | CMU | Evaluation Metrics
DSTC2 | Human-Machine | Restaurant | U. Cambridge | User Goal Changes
DSTC3 | Human-Machine | Tourist Information | U. Cambridge | Domain Adaptation
DSTC4 | Human-Human | Tourist Information | I2R | Human Conversation
DSTC5 | Human-Human | Tourist Information | I2R | Language Adaptation
DST Evaluation
Metric
Tracked state accuracy with respect to user goal
Recall/precision/F-measure for individual slots
DST – Language Extension (Shi+, 2016)
Training a multichannel CNN for each slot
Chinese character CNN
Chinese word CNN
English word CNN
https://arxiv.org/abs/1701.06247
DST – Task Lineages (Lee & Stent, 2016)
Slot values shared across tasks
Utterances with complex constraints on user goals
Interleaved multiple task discussions
Example: "Connection to Manhattan and find me a Thai restaurant, not Italian" → separate task frames, each with start_time/end_time and (confidence, dialogue act item) entries; task state: Thai restaurant, not Italian
https://www.aclweb.org/anthology/W/W16/W16-36.pdf#page=29
DST – Scalability (Rastogi+, 2017)
Focus only on the relevant slots
Better generalization to ASR lattices, visual context, etc.
S> How about 6 pm?
U> I am busy then, book it for 7 pm instead.
https://arxiv.org/pdf/1712.10224.pdf
DST – Handling Unknown Values (Xu & Hu, 2018)
Issue: DST with a fixed value set cannot handle unknown slot values.
Example: <sys> "would you like some Thai food" / <usr> "I prefer Italian one"; an attention distribution over the utterance plus the special values {none, dontcare, other} lets the <food> slot point to "Italian" even when it is not in a predefined list.
http://aclweb.org/anthology/P18-1134
Joint NLU and DST (Gupta+, 2018)
Per turn t, a system act encoder (e.g., greeting; request(movie), request(date)) and an utterance encoder (e.g., "<SOS> Tickets for Avatar tonight <EOS>") feed a two-level dialogue encoder (state d_t, conditioned on d_{t-1}, d_{t-2}). On top sit:
- a user intent classifier (BUY_MOVIE_TICKETS)
- a dialogue act classifier (INFORM)
- a slot tagger (O O O B-movie B-date O)
- a candidate scorer that updates the dialogue state D_t from D_{t-1} (movie: Avatar, date: tonight)
Outline
Introduction & Background
Neural Networks
Reinforcement Learning
Modular Dialogue System
Spoken/Natural Language Understanding (SLU/NLU)
Dialogue Management (DM)
◼ Dialogue State Tracking (DST)
◼ Dialogue Policy Optimization
Natural Language Generation (NLG)
End-to-End Neural Dialogue Systems
System Evaluation
Recent Trends on Learning Dialogues
Elements of Dialogue Management
Dialogue policy optimization
Dialogue Policy Optimization
Dialogue management in an RL framework: the user is the environment, and the dialogue manager is the agent, with language understanding on the observation (O) side and natural language generation on the action (A) side; the agent receives reward R.
Goal: select the best action that maximizes the future reward
Reward for RL ≅ Evaluation for System
◼ Dialogue is a special RL task: humans participate both in the interaction and in rating (evaluating) the dialogue
◼ Fully human-in-the-loop framework
◼ Rating: correctness, appropriateness, and adequacy
- Expert rating: high quality, high cost
- User rating: unreliable quality, medium cost
- Objective rating: checks desired aspects, low cost
RL for Dialogue Policy Optimization
79
Language understanding
Language (response) generation
Dialogue Policy 𝑎 = 𝜋(𝑠)
Collect rewards (𝑠, 𝑎, 𝑟, 𝑠’)
Optimize 𝑄(𝑠, 𝑎) User input (o)
Response
𝑠
𝑎
Type of Bots State Action Reward
Social ChatBots
Chat history System Response # of turns maximized;Intrinsically motivated reward
InfoBots (interactive Q/A)
User current question + ContextAnswers to current question
Relevance of answer;
# of turns minimized
Task-Completion Bots
User current input + ContextSystem dialogue act w/
slot value (or API calls)
Task success rate;
# of turns minimized
Goal: develop a generic deep RL algorithm to learn dialogue policy for all bot categories
80
Dialogue Reinforcement Learning Signal
Typical reward function:
◼ Large reward at completion if successful
◼ -1 per-turn penalty
Typically requires domain knowledge:
✔ Simulated user
✔ Paid users (Amazon Mechanical Turk)
✖ Real users
A user simulator is usually required to train the dialogue system before deployment.
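The typical reward shape in code (the success bonus of 20 is an illustrative choice, not from the slides):

```python
def dialogue_reward(turns, success, success_reward=20, turn_penalty=-1):
    """-1 per turn encourages efficiency; a terminal bonus rewards task success."""
    return turns * turn_penalty + (success_reward if success else 0)

print(dialogue_reward(turns=8, success=True))    # 12
print(dialogue_reward(turns=8, success=False))   # -8
```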
Neural Dialogue Manager (Li+, 2017)
Deep RL for training DM:
- Input: current semantic frame observation, database returned results
- Output: system action
Example: semantic frame request_movie(genre=action, date=this weekend) + backend DB results → DQN-based DM, trained against simulated/paid/real users → system action request_location
http://www.aclweb.org/anthology/I17-1074
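A compact DQN-style sketch for such a DM (PyTorch; the state featurization and action inventory are hypothetical, not the paper's exact configuration):

```python
import random
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, state_dim, num_actions, hidden=80):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, num_actions))

    def forward(self, state):
        return self.net(state)          # Q(s, a) for every system action

def select_action(qnet, state, epsilon=0.1):
    """Epsilon-greedy over system actions (e.g., request_location, inform, ...)."""
    if random.random() < epsilon:
        return random.randrange(qnet.net[-1].out_features)   # explore
    with torch.no_grad():
        return int(qnet(state).argmax())                     # exploit

qnet = QNetwork(state_dim=30, num_actions=8)
print(select_action(qnet, torch.zeros(30)))
```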
E2E Task-Completion Bot (TC-Bot) (Li+, 2017)
Idea: SL for each component and RL for end-to-end training.
- User model: a user simulator with a user goal, generating utterances w_0 w_1 w_2 ... EOS via NLG
- Neural dialogue system: LU (tagging each w_i with <slot>/O and predicting <intent> at EOS) → DST over states s_1 ... s_n → dialogue policy learning over actions a_1 ... a_k, backed by a database
Example: "Are there any action movies to see this weekend?" → request_location
http://www.aclweb.org/anthology/I17-1074
SL + RL for Sample Efficiency (Su+, 2017)
Issues with RL for DM:
- Slow learning speed
- Cold start
Solutions
Sample-efficient actor-critic
◼ Off-policy learning with experience replay
◼ Better gradient update
Utilizing supervised data
◼ Pretrain the model with SL and then fine-tune with RL
◼ Mix SL and RL data during RL learning
◼ Combine both
https://arxiv.org/pdf/1707.00130.pdf
http://aclweb.org/anthology/W17-5518
Learning to Negotiate (Lewis+, 2017)
Task: multi-issue bargaining
Each agent has its own value function
https://arxiv.org/pdf/1706.05125.pdf
Learning to Negotiate (Lewis+, 2017)
Dialogue rollouts to simulate a future conversation
SL + RL
SL aims to imitate human users’ actions
RL tries to make agents focus on the goal
https://arxiv.org/pdf/1706.05125.pdf
Online Training (Su+, 2015; Su+, 2016)
Policy learning from real users
Infer reward directly from dialogues (Su+, 2015)
User rating (Su+, 2016)
Reward modeling on user binary success rating
Reward model: an embedding function maps the dialogue representation to a success/fail prediction, providing the reinforcement signal; the user is queried for a rating to supervise the model.
http://www.anthology.aclweb.org/W/W15/W15-46.pdf#page=437; https://www.aclweb.org/anthology/P/P16/P16-1230.pdf
Interactive RL for DM (Shah+, 2016)
Use a third agent to provide immediate, interactive feedback to the DM, either explicit or implicit.
https://research.google.com/pubs/pub45734.html
Multi-Domain – Hierarchical RL (Peng+, 2017)
Travel planning: a set of tasks that need to be fulfilled collectively!
- Build a DM for cross-subtask constraints (slot constraints)
- Temporally constructed goals, e.g.:
  • hotel_check_in_time > departure_flight_time
  • # flight_tickets = # people checking in the hotel
  • hotel_check_out_time < return_flight_time
https://arxiv.org/abs/1704.03084
Multi-Domain – Hierarchical RL (Peng+, 2017)
The model makes decisions over two levels: meta-controller and controller; the agent learns both policies simultaneously, which mitigates reward sparsity:
- Meta-controller: the policy π_g(g_t, s_t; θ_1) over the optimal sequence of goals to follow
- Controller: the policy π_{a,g}(a_t, g_t, s_t; θ_2) for each sub-goal g_t
https://arxiv.org/abs/1704.03084
Planning – Deep Dyna-Q (Peng+, 2018)
Issues: RL is sample-inefficient, and there is a discrepancy between the simulator and real users.
Idea: learn with real users, with planning.
- Direct reinforcement learning: the policy model learns from real experience of acting with the human user
- World model learning: real experience also trains a world model, initialized by imitation/supervised learning on human conversational data
- Planning: the policy model additionally learns from experience simulated by the world model
https://arxiv.org/abs/1801.06176
Deep Dyna-Q (Su+, 2018)
Idea: add a discriminator to filter out bad experiences (controlled planning): the full pipeline (NLU → state representation → policy learning → NLG) gathers real experience from users, the world model generates simulated experience, and a discriminator trained on real vs. simulated experience keeps only the high-quality simulated experience for planning.
(to appear) EMNLP 2018
Deep Dyna-Q (Su+, 2018)
The world model (a shared layer with task-specific heads) maps the dialogue state s and system action a to the user response o, reward r, and termination signal t. An LSTM discriminator reads the dialogue contexts (o_1, o_2, ..., o_{t-1}) and scores each tuple (s, a, r, s') as high-quality (1) or low-quality (0), separating simulated experience D_s from real experience D_u.
S.-Y. Su, X. Li, J. Gao, J. Liu, and Y.-N. Chen, "Discriminative Deep Dyna-Q: Robust Planning for Dialogue Policy Learning," (to appear) in Proc. of EMNLP, 2018.
Deep Dyna-Q (Su+, 2018)
The policy learning is more robust and shows improvements in human evaluation.
Dialogue Management Evaluation
Metrics
Turn-level evaluation: system action accuracy
Dialogue-level evaluation: task success rate, reward
RL-Based DM Challenge
SLT 2018 Microsoft Dialogue Challenge:
End-to-End Task-Completion Dialogue Systems
Domain 1: Movie-ticket booking
Domain 2: Restaurant reservation
Domain 3: Taxi ordering
Outline
Introduction & Background
Neural Networks
Reinforcement Learning
Modular Dialogue System
Spoken/Natural Language Understanding (SLU/NLU)
Dialogue Management (DM)
◼ Dialogue State Tracking (DST)
◼ Dialogue Policy Optimization
Natural Language Generation (NLG)
End-to-End Neural Dialogue Systems
System Evaluation
Recent Trends on Learning Dialogues
Task-Oriented Dialogue System (Young, 2000)
Pipeline recap (speech recognition → LU → DM → NLG, as above), now focusing on natural language generation: mapping the system action/policy (e.g., request_location) to the text response (e.g., "Where are you located?").
Natural Language Generation (NLG)
Mapping dialogue acts into natural language
inform(name=Seven_Days, foodtype=Chinese) → "Seven Days is a nice Chinese restaurant"
Template-Based NLG
Define a set of rules to map frames to NL:

Semantic Frame | Natural Language
confirm() | "Please tell me more about the product you are looking for."
confirm(area=$V) | "Do you want somewhere in the $V?"
confirm(food=$V) | "Do you want a $V restaurant?"
confirm(food=$V, area=$W) | "Do you want a $V restaurant in the $W?"

Pros: simple, error-free, easy to control
Cons: time-consuming, unnatural, poor scalability
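A minimal realizer for the templates above (the frames and strings mirror the table; the lookup scheme itself is only an illustration):

```python
TEMPLATES = {
    "confirm()": "Please tell me more about the product you are looking for.",
    "confirm(area)": "Do you want somewhere in the {area}?",
    "confirm(food)": "Do you want a {food} restaurant?",
    "confirm(area,food)": "Do you want a {food} restaurant in the {area}?",
}

def realize(act, **slots):
    # Build a key from the act and the sorted slot names, then fill the template.
    key = f"{act}({','.join(sorted(slots))})" if slots else f"{act}()"
    return TEMPLATES[key].format(**slots)

print(realize("confirm", food="Thai", area="centre"))
# -> "Do you want a Thai restaurant in the centre?"
```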