Evaluate the generated sentences using BLEU-like measures against the reference utterances collected from humans (with the same goal)
Seq2Seq User Simulation (El Asri et al., 2016)
Seq2Seq trained from dialogue data
Input: c_i encodes contextual features, such as the previous system action and consistency between the user goal and machine-provided values
Output: a dialogue act sequence from the user
Extrinsic evaluation for policy
https://arxiv.org/abs/1607.00070
Material: http://deepdialogue.miulab.tw
Seq2Seq User Simulation (Crook and Marin, 2017)
Seq2Seq trained from dialogue data
No labeled data
Trained only on human-to-machine conversations
User Simulator for Dialogue Evaluation Measures
Understanding Ability
• whether constrained values specified by users can be understood by the system
• agreement percentage of system/user understandings over the entire dialogue (averaged over all turns)
Efficiency
• number of dialogue turns
• ratio between the minimum and actual number of dialogue turns (larger is better)
Action Appropriateness
• an explicit confirmation for an uncertain user utterance is an appropriate system action
• providing information based on misunderstood user requirements is not
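The three measure families above can be computed from a logged dialogue roughly as sketched below; the turn record format (`user_slots`, `sys_slots`, `appropriate`) is a hypothetical illustration, not taken from the cited work.

```python
# Sketch: computing the three evaluation-measure families from a logged
# dialogue. Turn record fields are hypothetical, for illustration only.

def understanding_ability(turns):
    """Per-turn agreement between user-specified and system-understood
    slot values, averaged over the dialogue."""
    scores = []
    for t in turns:
        user, system = t["user_slots"], t["sys_slots"]
        if not user:
            continue
        agreed = sum(1 for k, v in user.items() if system.get(k) == v)
        scores.append(agreed / len(user))
    return sum(scores) / len(scores) if scores else 0.0

def efficiency(turns, min_turns):
    """Ratio of the minimum turns needed for the goal to the turns actually
    used (larger is better, capped at 1.0)."""
    return min(1.0, min_turns / len(turns))

def action_appropriateness(turns):
    """Fraction of system actions judged appropriate (e.g., explicitly
    confirming an uncertain user utterance)."""
    return sum(t["appropriate"] for t in turns) / len(turns)

dialogue = [
    {"user_slots": {"food": "korean"}, "sys_slots": {"food": "korean"}, "appropriate": True},
    {"user_slots": {"area": "north"},  "sys_slots": {"area": "south"},  "appropriate": False},
]
print(understanding_ability(dialogue))        # 0.5
print(efficiency(dialogue, min_turns=2))      # 1.0
print(action_appropriateness(dialogue))       # 0.5
```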
How NOT To Evaluate Your Dialogue System (Liu et al., 2016)
How to evaluate the quality of the generated response?
Specifically investigated for chatbots
Crucial for task-oriented dialogues as well
Metrics:
Word overlap metrics, e.g., BLEU, METEOR, ROUGE, etc.
Embedding-based metrics, e.g., comparing the contextual/meaning representations of target and candidate
https://arxiv.org/pdf/1603.08023.pdf
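To make the word-overlap idea concrete, here is a minimal sentence-level BLEU sketch (modified n-gram precision with a brevity penalty, no smoothing); real evaluations should use a standard implementation such as sacreBLEU or NLTK.

```python
# Minimal sentence-level BLEU sketch: modified n-gram precision up to
# max_n, combined by geometric mean, times a brevity penalty.
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(reference, candidate, max_n=2):
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        total = sum(cand.values())
        if total == 0 or overlap == 0:
            return 0.0  # simplification: no smoothing for zero counts
        precisions.append(overlap / total)
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "yeah the film about turing looks great".split()
cand = "the film about turing is great".split()
print(round(sentence_bleu(ref, cand), 3))  # partial overlap -> mid-range score
```

A perfect match scores 1.0 and a fully disjoint candidate scores 0.0; as the cited paper argues, such overlap scores correlate poorly with human judgements for open-ended responses.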
Dialogue Response Evaluation (Lowe et al., 2017)
Towards an Automatic Turing Test
Problems of existing automatic evaluation:
• can be biased
• correlates poorly with human judgements of response quality
• word overlap may be misleading
Solution:
• collect a dataset of accurate human scores for a variety of dialogue responses (e.g., coherent/incoherent, relevant/irrelevant, etc.)
• use this dataset to train an automatic dialogue evaluation model – learn to compare the reference to candidate responses
• use an RNN to predict scores by comparing against human scores
Context of Conversation
Speaker A: Hey, what do you want to do tonight?
Speaker B: Why don’t we go see a movie?
Model Response
Nah, let’s do something active.
Reference Response
Yeah, the film about Turing looks great!
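The learned scorer can be sketched as a pair of bilinear terms comparing the model response against both the conversation context and the reference response, in the spirit of the paper's scoring function; the tiny random vectors and matrices below are placeholders, not trained parameters.

```python
# Sketch of a learned dialogue-response scorer: the model response r_hat is
# compared against the context c and the reference r via learned bilinear
# terms, score = (c^T M r_hat + r^T N r_hat - alpha) / beta. In training,
# M and N would be fit to minimize squared error against human scores.
import random

random.seed(0)
DIM = 4

def randvec():
    return [random.uniform(-1, 1) for _ in range(DIM)]

def randmat():
    return [randvec() for _ in range(DIM)]

def bilinear(x, M, y):
    # x^T M y
    return sum(x[i] * M[i][j] * y[j] for i in range(DIM) for j in range(DIM))

def learned_score(context, reference, response, M, N, alpha=0.0, beta=1.0):
    return (bilinear(context, M, response)
            + bilinear(reference, N, response) - alpha) / beta

c, r, r_hat = randvec(), randvec(), randvec()
M, N = randmat(), randmat()
print(learned_score(c, r, r_hat, M, N))
```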
End-to-End Learning for Dialogues
Multimodality
Dialogue Breadth
Dialogue Depth
Outline
Introduction
Background Knowledge
Neural Network Basics
Reinforcement Learning
Modular Dialogue System
Spoken/Natural Language Understanding (SLU/NLU)
Dialogue Management
Dialogue State Tracking (DST)
Dialogue Policy Optimization
Natural Language Generation (NLG)
Evaluation
Recent Trends and Challenges
End-to-End Neural Dialogue System
Multimodality
Dialogue Breadth
Dialogue Depth
ChitChat Hierarchical Seq2Seq (Serban et al., 2016)
Learns to generate dialogues from offline dialogue data
No state, action, intent, slot, etc.
http://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/view/11957
ChitChat Hierarchical Seq2Seq (Serban et al., 2017)
A hierarchical seq2seq model with a Gaussian latent variable for generating dialogues, capturing utterance-level properties (like topic or sentiment)
https://arxiv.org/abs/1605.06069
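The Gaussian latent variable at the heart of such models can be sketched as follows: a context vector is mapped to a mean and log-variance, and a latent z is drawn with the reparameterization trick so gradients can flow through the sampling step. The encoder/decoder networks are stubbed out; only the latent mechanics are shown.

```python
# Sketch of the Gaussian latent variable: z = mu + sigma * eps
# (reparameterization trick), plus the KL regularizer used in the
# variational training objective.
import math, random

random.seed(1)

def latent_sample(mu, log_var):
    """Draw z ~ N(mu, exp(log_var)) via reparameterization."""
    return [m + math.exp(0.5 * lv) * random.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)) per dimension, summed."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))

mu, log_var = [0.2, -0.1], [0.0, 0.0]
z = latent_sample(mu, log_var)  # conditions the decoder at each utterance
print(z, kl_to_standard_normal(mu, log_var))  # KL here is 0.025
```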
https://arxiv.org/abs/1702.01932
E2E Joint NLU and DM (Yang et al., 2017)
Errors from DM can be propagated to NLU for regularization + robustness
Results (frame accuracy, %):
Model                DM    NLU
Baseline (CRF+SVMs)  7.7   33.1
Pipeline-BLSTM       12.0  36.4
JointModel           22.8  37.4
Both DM and NLU performance is improved.
https://arxiv.org/abs/1612.00913
E2E Supervised Dialogue System (Wen et al., 2016)
Components: Intent Network, Belief Tracker, Database Operator, Policy Network, Generation Network
[Figure: for the input "Can I have korean", the belief tracker outputs a distribution over food values (Korean 0.7, British 0.2, French 0.1); the database operator issues the MySQL query "select * where food=Korean" and sets a DB pointer over matching restaurants; the generation network emits a delexicalized response such as "<v.name> serves great <v.food> ."]
https://arxiv.org/abs/1604.04562
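A toy version of that flow can be sketched as follows: the belief tracker's most probable value forms the database query, and a matching entity fills the delexicalized `<v.*>` placeholders. The in-memory "database" rows are illustrative stand-ins (the food assignments are assumed, not from the paper).

```python
# Sketch: belief-tracker distribution -> database operator -> delexicalized
# template filling. Database contents are hypothetical examples.

belief_food = {"korean": 0.7, "british": 0.2, "french": 0.1}
database = [
    {"name": "Little Seoul", "food": "korean"},
    {"name": "Curry Prince", "food": "indian"},
]

top_food = max(belief_food, key=belief_food.get)  # most probable slot value
matches = [row for row in database if row["food"] == top_food]

template = "<v.name> serves great <v.food> ."
if matches:
    reply = (template.replace("<v.name>", matches[0]["name"])
                     .replace("<v.food>", top_food))
else:
    reply = "Sorry, I found nothing matching."
print(reply)  # Little Seoul serves great korean .
```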
E2E MemNN for Dialogues (Bordes et al., 2016)
Split dialogue system actions into subtasks
API issuing
API updating
Option displaying
Information informing
https://arxiv.org/abs/1605.07683
E2E RL-Based KB-InfoBot (Dhingra et al., 2017)
Movie=?; Actor=Bill Murray; Release Year=1993
Find me Bill Murray's movie.
I think it came out in 1993.
When was it released?
Groundhog Day is a Bill Murray movie which came out in 1993.
KB-InfoBot User
Entity-Centric Knowledge Base
Idea: differentiable database for propagating the gradients
http://www.aclweb.org/anthology/P/P17/P17-1045.pdf
Movie               Actor          Release Year
Groundhog Day       Bill Murray    1993
Australia           Nicole Kidman  X
Mad Max: Fury Road  X              2015
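The "soft" lookup idea can be sketched as follows: instead of issuing a hard query, the agent keeps a distribution over each slot's value and scores every KB row by the product of slot probabilities, backing off to a small uniform probability for unknown cells (the X entries). The numbers are illustrative, and this scalar version omits the paper's differentiable training details.

```python
# Sketch of a soft KB lookup: belief distributions over slots induce a
# posterior over KB rows; None stands for an unknown (X) cell.

KB = [
    {"movie": "Groundhog Day",      "actor": "bill murray",   "year": "1993"},
    {"movie": "Australia",          "actor": "nicole kidman", "year": None},
    {"movie": "Mad Max: Fury Road", "actor": None,            "year": "2015"},
]
UNKNOWN_PROB = 1.0 / len(KB)  # simple back-off for missing cells

def row_posterior(slot_beliefs):
    """Product of per-slot probabilities, normalized over KB rows."""
    scores = []
    for row in KB:
        p = 1.0
        for slot, belief in slot_beliefs.items():
            cell = row[slot]
            p *= belief.get(cell, 0.0) if cell is not None else UNKNOWN_PROB
        scores.append(p)
    total = sum(scores)
    return [s / total for s in scores] if total else scores

beliefs = {"actor": {"bill murray": 0.8, "nicole kidman": 0.2},
           "year":  {"1993": 0.9, "2015": 0.1}}
posterior = row_posterior(beliefs)
print(posterior)  # mass concentrates on the Groundhog Day row
```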
E2E RL-Based System (Zhao and Eskenazi, 2016)
Joint learning of NLU, DST, and dialogue policy
Deep RL for training: deep Q-network with a deep recurrent network
[Figure: learning curves comparing Baseline RL and Hybrid-RL]
http://www.aclweb.org/anthology/W/W16/W16-36.pdf
E2E LSTM-Based Dialogue Control (Williams and Zweig, 2016)
Idea: an LSTM maps from raw dialogue history directly to a distribution over system actions
Developers can provide software including business rules & programmatic APIs
LSTM can take actions in the real world on behalf of the user
The LSTM can be optimized using supervised learning (SL) or reinforcement learning (RL)
https://arxiv.org/abs/1606.01269
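The action-selection step can be sketched as follows: a recurrent model would produce scores over system actions from the raw dialogue history, and the developer's business rules then mask out disallowed actions before the distribution is renormalized. The network is stubbed with fixed scores, and the action names and rule are hypothetical.

```python
# Sketch: business-rule action masking over (stubbed) LSTM action scores.
import math

ACTIONS = ["ask_phone_number", "place_call", "offer_help", "end_call"]

def masked_action_distribution(scores, allowed):
    """Softmax over scores with disallowed actions zeroed out."""
    exp = [math.exp(s) if a in allowed else 0.0
           for a, s in zip(ACTIONS, scores)]
    total = sum(exp)
    return {a: e / total for a, e in zip(ACTIONS, exp)}

# Hypothetical rule: cannot place a call before a phone number is known.
scores = [1.0, 2.0, 0.5, 0.1]  # stand-in for LSTM outputs
allowed = {"ask_phone_number", "offer_help", "end_call"}
dist = masked_action_distribution(scores, allowed)
print(max(dist, key=dist.get))  # place_call is masked despite its high score
```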
E2E Task-Completion Bot (TC-Bot) (Li et al., 2017)
Idea: supervised learning for each component and reinforcement learning for end-to-end training of the neural dialogue system
Components: Language Understanding (LU), Dialogue Management (DM), Natural Language Generation (NLG), and a user simulator with user agenda modeling
Text input: Are there any action movies to see this weekend?
Semantic frame: request_movie(genre=action, date=this weekend)
System action/policy: request_location
User dialogue action: inform(location=San Francisco)
[Figure: a BiLSTM LU tagger labels each word (e.g., B-type, O) and predicts the intent at EOS, across turns t-2, t-1, and t]
https://arxiv.org/abs/1703.01008
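The component flow above can be mimicked with a toy pipeline: LU parses the utterance into a semantic frame, the dialogue policy picks the next system action, and NLG realizes it. Each component is a hand-written stub standing in for the trained neural model; the slot names and templates follow the example on the slide.

```python
# Toy LU -> DM -> NLG pipeline with hand-written stubs.

def understand(utterance):
    # Stub LU: keyword spotting instead of a BiLSTM tagger.
    frame = {"intent": "request_movie", "slots": {}}
    if "action" in utterance:
        frame["slots"]["genre"] = "action"
    if "this weekend" in utterance:
        frame["slots"]["date"] = "this weekend"
    return frame

def policy(frame):
    # Stub DM: request the first slot still missing.
    for slot in ("genre", "date", "location"):
        if slot not in frame["slots"]:
            return "request_" + slot
    return "inform_movies"

def generate(action):
    templates = {"request_location": "Which city are you in?",
                 "inform_movies": "Here is what I found."}
    return templates.get(action, "Could you tell me more?")

frame = understand("Are there any action movies to see this weekend?")
action = policy(frame)
print(frame, action, generate(action))  # policy asks for the missing location
```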
E2E Task-Completion Bot (TC-Bot) (Li et al., 2017)
User goal: Two tickets for “the witch” tomorrow 9:30 PM at regal meridian 16, Seattle.
RULE BASED SYSTEM
User: Can I get 2 tickets for the witch?
Agent: Which theater do you want?
User: Regal meridian 16
Agent: What time would you like to see it?
User: 9:30 pm
Agent: What date would you like to see it?
User: Tomorrow!
Agent: How many people are you?
User: Just two adults.
Agent: What is your location?
User: Seattle!
Agent: What kind of video format do you like?
User: Oh, any kind is okay.
Agent: Do you have any price constraint?
User: No, I don’t care.
Agent: Great - I was able to purchase 2 tickets for you to see the witch tomorrow at regal meridian 16 theater in Seattle at 9:30 pm.
User: Thank you.
Agent: Thank you!
The system learns how to interact with users efficiently for task completion:
REINFORCEMENT LEARNING SYSTEM
User: Can I get 2 tickets for the witch?
Agent: What date would you like to see it?
User: Tomorrow
Agent: Which theater do you prefer?
User: Regal meridian 16
Agent: What is your location?
User: Seattle
Agent: What time would you like to see it?
User: 9:30 pm.
Agent: Great - I was able to purchase 2 tickets for you to see the witch tomorrow at regal meridian 16 theater in Seattle at 9:30 pm.
User: Thanks.
Agent: Thanks!
https://arxiv.org/abs/1703.01008
Hierarchical RL for Composite Tasks (Peng et al., 2017)
Travel Planning
Actions
• A set of subtasks that needs to be fulfilled collectively
• Build a dialogue manager that satisfies cross-subtask constraints (slot constraints)
• Temporally constructed goals, e.g.,
• hotel_check_in_time > departure_flight_time
• # flight_tickets = # people checking in the hotel
• hotel_check_out_time < return_flight_time
https://arxiv.org/abs/1704.03084 (Peng et al., EMNLP 2017)
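The cross-subtask constraints listed above can be checked over a candidate composite plan as sketched below; the field names mirror the slide, while the concrete values are made up for illustration.

```python
# Sketch: validating a travel plan against the slide's slot constraints.

def satisfies_constraints(plan):
    return (plan["hotel_check_in_time"] > plan["departure_flight_time"]
            and plan["n_flight_tickets"] == plan["n_hotel_guests"]
            and plan["hotel_check_out_time"] < plan["return_flight_time"])

plan = {
    "departure_flight_time": 9,   # hours of day, same-day trip for simplicity
    "hotel_check_in_time": 15,
    "hotel_check_out_time": 11,
    "return_flight_time": 18,
    "n_flight_tickets": 2,
    "n_hotel_guests": 2,
}
print(satisfies_constraints(plan))  # True

plan["hotel_check_out_time"] = 20   # checkout after the return flight
print(satisfies_constraints(plan))  # False
```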
Hierarchical RL for Composite Tasks (Peng et al., 2017)
The dialogue model makes decisions over two levels: a meta-controller and a controller
The agent learns both policies simultaneously (mitigating reward sparsity issues)
Meta-controller: the policy π_g(g_t, s_t; θ_1) over the optimal sequence of subgoals to follow
Controller: the policy π_{a,g}(a_t, g_t, s_t; θ_2) for each subgoal g_t
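The two-level decision loop can be sketched as follows: the top-level policy picks the next subgoal, and the low-level policy picks primitive actions toward it. Both policies are stubbed as lookup tables with hypothetical subgoal/action names; in the paper they are learned jointly with deep RL.

```python
# Sketch of the hierarchical decision loop with stubbed (tabular) policies.

pi_g = {  # stub meta-controller: state -> subgoal
    "start": "book_flight",
    "flight_done": "book_hotel",
}
pi_ag = {  # stub controller: (subgoal, state) -> primitive action
    ("book_flight", "start"): "request_departure_date",
    ("book_hotel", "flight_done"): "request_hotel_area",
}

def step(state):
    goal = pi_g[state]             # meta-controller chooses a subgoal
    action = pi_ag[(goal, state)]  # controller acts toward that subgoal
    return goal, action

print(step("start"))        # ('book_flight', 'request_departure_date')
print(step("flight_done"))  # ('book_hotel', 'request_hotel_area')
```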
https://arxiv.org/abs/1704.03084 (Peng et al., EMNLP 2017)
Outline
Introduction
Background Knowledge
Neural Network Basics
Reinforcement Learning
Modular Dialogue System
Spoken/Natural Language Understanding (SLU/NLU)
Dialogue Management
Dialogue State Tracking (DST)
Dialogue Policy Optimization
Natural Language Generation (NLG)
Recent Trends and Challenges
End-to-End Neural Dialogue System
Multimodality
Dialogue Breadth
Dialogue Depth
Brain Signal for Understanding
Misunderstanding detection from brain signals
Green: listening to the correct answer
Red: listening to the wrong answer
http://dl.acm.org/citation.cfm?id=2388695
Detecting misunderstanding via brain signals in order to correct the understanding results
Video for Intent Understanding