Task-Oriented Dialogue System
(Young, 2000)
Pipeline: Speech Recognition → Language Understanding (LU) → Dialogue Management (DM) → Natural Language Generation (NLG), supported by backend action / knowledge providers
• Language Understanding (LU): domain identification, user intent detection, slot filling
• Dialogue Management (DM): dialogue state tracking (DST), dialogue policy
Example flow:
Speech signal / text input: "Are there any action movies to see this weekend?"
Hypothesis: are there any action movies to see this weekend
Semantic frame: request_movie, genre=action, date=this weekend
System action/policy: request_location
Text response: "Where are you located?"
Outline
PART I. Introduction & Background Knowledge
PART II. Task-Oriented Dialogue Systems
Spoken/Natural Language Understanding (SLU/NLU)
Dialogue Management – Dialogue State Tracking (DST)
Dialogue Management – Dialogue Policy Optimization
Natural Language Generation (NLG)
End-to-End Task-Oriented Dialogue Systems
PART III. Social Chat Bots
PART IV. Evaluation
PART V. Recent Trends and Challenges
Language Understanding (LU)
Pipelined:
1. Domain Classification
2. Intent Classification
3. Slot Filling
LU – Domain/Intent Classification
• Mainly viewed as an utterance classification task: given a collection of utterances u_i with labels c_i, D = {(u_1, c_1), …, (u_n, c_n)} where c_i ∈ C, train a model to estimate labels for new utterances u_k.
Example: "find me a cheap taiwanese restaurant in oakland"
Domain candidates: Movies, Restaurants, Sports, Weather, Music, …
Intent candidates: Find_movie, Buy_tickets, Find_restaurant, Book_table, Find_lyrics, …
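To make the classification formulation concrete, here is a minimal sketch (PyTorch; all layer sizes and names are illustrative, not taken from the cited papers) of an LSTM classifier that reads the whole utterance and then predicts a domain or intent label:

```python
import torch
import torch.nn as nn

class UtteranceClassifier(nn.Module):
    """Minimal LSTM utterance classifier for domain/intent detection (sketch)."""
    def __init__(self, vocab_size, n_classes, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_classes)

    def forward(self, word_ids):                 # word_ids: (batch, seq_len)
        _, (h_n, _) = self.lstm(self.embed(word_ids))
        return self.out(h_n[-1])                 # logits over domains or intents
```

Predicting only from the final hidden state matches the observation on the next slide that making the intent decision after reading all words performs better.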
DNN for Domain/Intent Classification
(Ravuri & Stolcke, 2015)
RNNs and LSTMs for utterance classification; making the intent decision after reading all words performs better.
DNN for Dialogue Act Classification
(Lee & Dernoncourt, 2016)
RNNs and CNNs for dialogue act classification.
LU – Slot Filling
Treated as a sequence tagging task: given a collection of tagged word sequences S = {((w_{1,1}, w_{1,2}, …, w_{1,n1}), (t_{1,1}, t_{1,2}, …, t_{1,n1})), ((w_{2,1}, …, w_{2,n2}), (t_{2,1}, …, t_{2,n2})), …}, where t_i ∈ M, the goal is to estimate tags for a new word sequence.

Example (IOB tagging):
Word sequence: flights from Boston to New       York      today
Entity tags:   O       O    B-city O  B-city    I-city    O
Slot tags:     O       O    B-dept O  B-arrival I-arrival B-date
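As a concrete sketch of this tagging formulation (a plain bidirectional LSTM tagger; hyperparameters are assumptions, and the cited models on the following slides differ in their details):

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Minimal bidirectional LSTM slot tagger over IOB labels (sketch)."""
    def __init__(self, vocab_size, tagset_size, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, tagset_size)

    def forward(self, word_ids):          # word_ids: (batch, seq_len)
        h, _ = self.lstm(self.embed(word_ids))
        return self.out(h)                # per-token logits over the tag set M
```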
RNN for Slot Tagging – I
(Yao et al., 2013; Mesnil et al., 2015)
Variations:
a. RNNs with LSTM cells
b. Input: sliding window of n-grams (look-around)
c. Bi-directional LSTMs
[Figure: (a) LSTM, (b) LSTM-LA, (c) bLSTM taggers; inputs w_0 … w_n, hidden states h_0 … h_n (forward and backward for the bLSTM), outputs y_0 … y_n]
RNN for Slot Tagging – II
(Kurata et al., 2016; Simonnet et al., 2015)
Encoder-decoder networks
• Leverage sentence-level information
Attention-based encoder-decoder
• Use attention (as in MT) in the encoder-decoder network
• Attention is estimated using a feed-forward network with inputs h_t and s_t at time t
[Figure: an encoder-decoder tagger reading the input sequence reversed, and an attention-based variant where a context vector c_i is computed from encoder states h_0 … h_n and decoder states s_0 … s_n]
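A minimal sketch of the feed-forward (additive) attention described above; the parameter shapes are my assumptions rather than the exact parameterization of the cited papers:

```python
import torch
import torch.nn.functional as F

def additive_attention(h, s_t, W_h, W_s, v):
    """Score each encoder state h_i against decoder state s_t (sketch).

    h:   (seq_len, d)  encoder states h_0..h_n
    s_t: (d,)          decoder state at time t
    W_h: (d, a), W_s: (d, a), v: (a,)  learned parameters
    """
    scores = torch.tanh(h @ W_h + s_t @ W_s) @ v   # feed-forward scoring, (seq_len,)
    alpha = F.softmax(scores, dim=0)               # attention distribution
    return alpha, alpha @ h                        # weights and context vector c_t
```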
RNN for Slot Tagging – III
(Jaech et al., 2016; Tafforeau et al., 2016)
Multi-task learning
• Goal: exploit data from domains/tasks with abundant data to improve those with less data
• Lower layers are shared across domains/tasks
• Output layer is task-specific
Joint Segmentation and Slot Tagging
(Zhai et al., 2017)
• Encoder that segments
• Decoder that tags the segments
Joint Semantic Frame Parsing
• Sequence-based (Hakkani-Tur et al., 2016): slot filling and intent prediction in the same output sequence
  [Figure: an RNN tags "taiwanese food please" as B-type O O and outputs the intent FIND_REST at EOS]
• Parallel-based (Liu and Lane, 2016): intent prediction and slot filling are performed in two branches
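A sketch in the spirit of the parallel-based variant: a shared encoder with separate slot and intent branches (the mean-pooling in the intent branch is my simplification, not a detail from the cited work):

```python
import torch
import torch.nn as nn

class JointNLU(nn.Module):
    """Joint slot filling + intent prediction with two output branches (sketch)."""
    def __init__(self, vocab_size, n_slots, n_intents,
                 embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        self.slot_head = nn.Linear(2 * hidden_dim, n_slots)      # per-token tags
        self.intent_head = nn.Linear(2 * hidden_dim, n_intents)  # utterance label

    def forward(self, word_ids):
        h, _ = self.encoder(self.embed(word_ids))     # (batch, seq, 2*hidden)
        return self.slot_head(h), self.intent_head(h.mean(dim=1))
```

Training typically sums the two losses, so errors in either branch regularize the shared encoder.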
Contextual LU
Example (domain: communication; intent: send_email):
Single-turn utterance: "just sent email to bob about fishing this weekend"
  Slot tags: O O O O B-contact_name O B-subject I-subject I-subject
  → send_email(contact_name="bob", subject="fishing this weekend")
Multi-turn:
  U1: "are we going to fish this weekend"
      Tags: B-message I-message I-message I-message I-message I-message I-message
  → send_email(message="are we going to fish this weekend")
  U2: "send email to bob"
  → send_email(contact_name="bob")
LU subtasks: domain identification, intent prediction, slot filling
Contextual LU
User utterances are highly ambiguous in isolation.
Example (restaurant booking):
U: Book a table for 10 people tonight.
S: Which restaurant would you like to book a table for?
U: Cascal, for 6. (does "6" fill # people or time?)
Contextual LU
(Bhargava et al., 2013; Hori et al., 2015)
Leveraging contexts:
• Used for individual tasks
• Seq2Seq model: words are input one at a time; tags are output at the end of each utterance
• Extension: LSTM with speaker-role-dependent layers
End-to-End Memory Networks
(Sukhbaatar et al., 2015)
U: "i d like to purchase tickets to see deepwater horizon"
S: “for which theatre”
U: “angelika”
S: “you want them for angelika theatre?”
U: “yes angelika”
S: “how many tickets would you like ?”
U: “3 tickets for saturday”
S: “What time would you like ?”
U: “Any time on saturday is fine”
S: “okay , there is 4:10 pm , 5:40 pm and 9:20 pm”
U: “Let’s do 5:40”
[Figure: the history utterances are stored as memory vectors m_0 … m_{n-1}, queried by the current utterance encoding u]
E2E MemNN for Contextual LU
(Chen et al., 2016)
Idea: additionally incorporate contextual knowledge during slot tagging; track dialogue states in a latent way.
1. Sentence encoding: RNN_in encodes the current utterance c; a contextual sentence encoder RNN_mem encodes the history utterances {x_i} into memory representations {m_i}
2. Knowledge attention: inner products between the current-utterance encoding u and each m_i give the attention distribution p_i
3. Knowledge encoding: the attention-weighted sum h is combined (via W_kg) into a knowledge-encoded representation o that conditions the RNN tagger producing the slot tagging sequence y
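Steps 2 and 3 reduce to a softmax over inner products followed by a weighted sum; a minimal sketch in my notation:

```python
import torch
import torch.nn.functional as F

def memory_attention(u, memory):
    """Knowledge attention over dialogue history (sketch of steps 2-3 above).

    u:      (d,)   encoding of the current utterance
    memory: (n, d) encodings m_0..m_{n-1} of the history utterances
    """
    p = F.softmax(memory @ u, dim=0)   # inner products -> attention distribution p_i
    h = p @ memory                     # attention-weighted sum of memory vectors
    return p, h                        # h is then combined with u (via W_kg) into o
```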
Analysis of Attention
The same ticket-booking dialogue as on the previous slide, with the learned attention distribution overlaid: the largest weights (e.g. 0.69, 0.16, 0.13) concentrate on the history turns most relevant to tagging the current utterance.
Sequential Dialogue Encoder Network
(Bapna et al., SIGDIAL 2017)
Past and current turn encodings are fed to a feed-forward network.
Structural LU
(Chen et al., 2016) K-SAN: prior knowledge as a teacher
[Figure: K-SAN architecture; substructures {x_i} of the knowledge-guided structure for the input sentence "show me the flights from seattle to san francisco" (rooted at ROOT) are encoded by a knowledge encoding module into memory vectors m_i; inner products with the sentence encoding give the knowledge attention distribution p_i; the attention-weighted sum of the encoded knowledge representations forms a knowledge-guided representation that conditions the RNN tagger producing the slot tagging sequence]
Structural LU
(Chen et al., 2016) Sentence structural knowledge stored as memory
[Figure: for the sentence "show me the flights from seattle to san francisco", structural knowledge can come from syntax (a dependency tree rooted at "show") or semantics (an AMR graph relating show, you, I, and flight, with city nodes Seattle and San Francisco); numbered substructures serve as memory entries]
Structural LU
(Chen et al., 2016) Sentence structural knowledge stored as memory
Even when trained on less data, K-SAN pays similar attention to the salient substructures that are important for tagging.
LU Importance
(Li et al., 2017)
• Compares the downstream effect of different types of LU errors
• Slot filling is more important than intent detection in language understanding
[Figure: sensitivity to intent errors vs. sensitivity to slot errors]
LU Evaluation
Metrics
• Sub-sentence level: intent accuracy, slot F1
• Sentence level: whole-frame accuracy
Elements of Dialogue Management
(figure from Gašić)
Focus: Dialogue State Tracking
Dialogue State Tracking (DST)
Maintain a probabilistic distribution instead of a 1-best prediction for better robustness to SLU errors or ambiguous input
Example:
S: How can I help you?
U: Book a table at Sumiko for 5.
   Belief: # people = 5 (0.5), time = 5 (0.5)  ("5" is ambiguous)
S: How many people?
U: 3
   Belief: # people = 3 (0.8), time = 5 (0.8)
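An illustrative, non-neural sketch of maintaining such a distribution across turns (the 50/50 mixing weight is an arbitrary assumption; the trackers on the following slides learn this update instead):

```python
def update_belief(belief, slu_hypotheses, mix=0.5):
    """Mix the prior belief with the current turn's SLU evidence (sketch).

    belief, slu_hypotheses: {slot: {value: prob}}
    """
    updated = {}
    for slot in set(belief) | set(slu_hypotheses):
        prior = belief.get(slot, {})
        obs = slu_hypotheses.get(slot, {})
        scores = {v: (1 - mix) * prior.get(v, 0.0) + mix * obs.get(v, 0.0)
                  for v in set(prior) | set(obs)}
        total = sum(scores.values()) or 1.0
        updated[slot] = {v: s / total for v, s in scores.items()}  # renormalize
    return updated
```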
Multi-Domain Dialogue State Tracking (DST)
A full representation of the system's belief of the user's goal at any point during the dialogue
Used for making API calls
Example dialogue:
A: Do you wanna take Angela to go see a movie tonight?
B: Sure, I will be home by 6.
A: Let's grab dinner before the movie. How about some Mexican?
B: Let's go to Vive Sol and see Inferno after that.
A: Angela wants to watch the Trolls movie.
B: Ok. Let's catch the 8 pm show.
[Figure: the tracked multi-domain state, with a Restaurants frame (Vive Sol, Mexican cuisine, date 11/15/16, candidate times around 6:30-7:30 pm) and a Movies frame (Inferno / Trolls, Century 16, candidate times 6-9 pm, 2-3 people)]
Dialog State Tracking Challenge (DSTC)
(Williams et al., 2013; Henderson et al., 2014a; Henderson et al., 2014b; Kim et al., 2016a; Kim et al., 2016b)

Challenge  Type           Domain               Data Provider  Main Theme
DSTC1      Human-Machine  Bus Route            CMU            Evaluation Metrics
DSTC2      Human-Machine  Restaurant           U. Cambridge   User Goal Changes
DSTC3      Human-Machine  Tourist Information  U. Cambridge   Domain Adaptation
DSTC4      Human-Human    Tourist Information  I2R            Human Conversation
DSTC5      Human-Human    Tourist Information  I2R            Language Adaptation
NN-Based DST
(Henderson et al., 2013; Mrkšić et al., 2015; Mrkšić et al., 2016); figure from Wen et al., 2016
Neural Belief Tracker
(Mrkšić et al., 2016)
DST Evaluation
Dialogue State Tracking Challenges:
• DSTC2-3: human-machine
• DSTC4-5: human-human
Metrics:
• Tracked-state accuracy with respect to the user goal
• Recall/precision/F-measure of individual slots
Elements of Dialogue Management
(figure from Gašić)
Focus: Dialogue Policy Optimization
Dialogue Policy Optimization
Dialogue management in an RL framework
[Figure: RL loop in which the dialogue manager is the agent and the user, reached through language understanding and natural language generation, is the environment; they exchange observation O, action A, and reward R] (slides credit: Pei-Hao Su)
The optimized dialogue policy selects the action that maximizes the future reward.
Correct reward signals are a crucial factor in dialogue policy training.
Reward for RL ≅ Evaluation for System
Dialogue is a special RL task: humans are involved in both the interaction and the rating (evaluation) of the dialogue, i.e. a fully human-in-the-loop framework.
Rating criteria: correctness, appropriateness, and adequacy
• Expert rating: high quality, high cost
• User rating: unreliable quality, medium cost
• Objective rating: checks desired aspects, low cost
Reinforcement Learning for Dialogue Policy Optimization
[Figure: the RL cycle; user input o passes through language understanding to produce state s, the dialogue policy a = π(s) selects action a, realized through language (response) generation; the agent collects rewards (s, a, r, s') and optimizes Q(s, a)]
Type of bots and their RL formulation:
• Social chatbots: State = chat history; Action = system response; Reward = # of turns maximized, intrinsically motivated reward
• InfoBots (interactive Q/A): State = current user question + context; Action = answers to the current question; Reward = relevance of answer, # of turns minimized
• Task-completion bots: State = current user input + context; Action = system dialogue act w/ slot values (or API calls); Reward = task success rate, # of turns minimized
Goal: develop a generic deep RL algorithm to learn dialogue policy for all bot categories
Dialogue Reinforcement Learning Signal
Typical reward function:
• -1 per-turn penalty
• Large reward at completion if successful
Typically requires domain knowledge:
• ✔ Simulated user
• ✔ Paid users (Amazon Mechanical Turk)
• ✖ Real users
A user simulator is usually required for dialogue system training before deployment.
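A sketch of such a reward function (the exact magnitudes are assumptions; only the shape, a small per-turn penalty plus a large terminal reward, comes from the slide):

```python
def turn_reward(done, success, success_reward=20, failure_penalty=-10):
    """Typical task-completion reward signal (sketch)."""
    if not done:
        return -1                       # per-turn penalty favors short dialogues
    return success_reward if success else failure_penalty
```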
Neural Dialogue Manager
(Li et al., 2017)
Deep Q-network for training the DM policy:
• Input: current semantic frame observation, database returned results
• Output: system action
[Figure: the DQN-based dialogue management (DM) receives the semantic frame request_movie, genre=action, date=this weekend, plus backend DB results, and outputs the system action/policy request_location; training interacts with a simulated user]
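A minimal sketch of such a DQN policy (the sizes and the epsilon-greedy exploration are generic RL choices, not details from the cited paper):

```python
import torch
import torch.nn as nn

class DQNPolicy(nn.Module):
    """State features (semantic frame + DB results) -> Q-value per action (sketch)."""
    def __init__(self, state_dim, n_actions, hidden_dim=80):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, n_actions),
        )

    def act(self, state, epsilon=0.1):
        # epsilon-greedy exploration over system actions (e.g. request_location)
        if torch.rand(1).item() < epsilon:
            return torch.randint(self.net[-1].out_features, (1,)).item()
        with torch.no_grad():
            return self.net(state).argmax().item()
```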
SL + RL for Sample Efficiency
(Su et al., 2017)
Issues with RL for DM:
• Slow learning speed
• Cold start
Solutions:
• Sample-efficient actor-critic
  - Off-policy learning with experience replay
  - Better gradient updates
• Utilizing supervised data
  - Pretrain the model with SL, then fine-tune with RL
  - Mix SL and RL data during RL training
• Combine both
Online Training
(Su et al., 2015; Su et al., 2016)
Policy learning from real users:
• Infer the reward directly from dialogues (Su et al., 2015)
• User rating (Su et al., 2016): reward modeling on users' binary success ratings
[Figure: dialogue representation → embedding function → reward model predicting success/fail; the model supplies the reinforcement signal and queries users for ratings]
Interactive RL for DM
(Shah et al., 2016)
• Use a third agent to provide immediate, interactive feedback to the DM
Dialogue Management Evaluation
Metrics
• Turn-level evaluation: system action accuracy
• Dialogue-level evaluation: task success rate, reward
Natural Language Generation (NLG)
Mapping a semantic frame into natural language
Example: inform(name=Seven_Days, foodtype=Chinese) → "Seven Days is a nice Chinese restaurant"
Template-Based NLG
Define a set of rules to map frames to NL
Pros: simple, error-free, easy to control
Cons: time-consuming, poor scalability

Semantic Frame             Natural Language
confirm()                  "Please tell me more about the product you are looking for."
confirm(area=$V)           "Do you want somewhere in the $V?"
confirm(food=$V)           "Do you want a $V restaurant?"
confirm(food=$V, area=$W)  "Do you want a $V restaurant in the $W?"
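The whole approach fits in a lookup plus string formatting; a sketch using a hypothetical template table that mirrors the confirm() rows above:

```python
# Hypothetical template table keyed by dialogue act + sorted slot signature.
TEMPLATES = {
    "confirm()": "Please tell me more about the product you are looking for.",
    "confirm(area)": "Do you want somewhere in the {area}?",
    "confirm(food)": "Do you want a {food} restaurant?",
    "confirm(area,food)": "Do you want a {food} restaurant in the {area}?",
}

def template_nlg(act, **slots):
    """Template-based NLG (sketch): select a rule, then fill in slot values."""
    key = f"{act}({','.join(sorted(slots))})"
    return TEMPLATES[key].format(**slots)

# template_nlg("confirm", food="Chinese") -> "Do you want a Chinese restaurant?"
```

Every new act/slot combination needs its own hand-written entry, which is exactly the scalability problem noted above.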
Plan-Based NLG
(Walker et al., 2002)
Divide the problem into a pipeline: Sentence Plan Generator → Sentence Plan Reranker → Surface Realizer (operating over a syntactic tree)
• Statistical sentence plan generator (Stent et al., 2009)
• Statistical surface realizer (Dethlefs et al., 2013; Cuayáhuitl et al., 2014; …)
Example: Inform(name=Z_House, price=cheap) → "Z House is a cheap restaurant."
Pros: can model complex linguistic structures
Cons: heavily engineered, requires domain knowledge
Class-Based LM NLG
(Oh and Rudnicky, 2000)
• Class-based language modeling; NLG by decoding
• Classes: inform_area, inform_address, …, request_area, request_postcode
Pros: easy to implement/understand, simple rules
Cons: computationally inefficient
RNN-Based LM NLG
(Wen et al., 2015)
• Delexicalisation: "Din Tai Fung serves Taiwanese ." → "SLOT_NAME serves SLOT_FOOD ."
• The dialogue act is encoded as a 1-hot representation, e.g. Inform(name=Din Tai Fung, food=Taiwanese) → 0, 0, 1, 0, 0, …, 1, 0, 0, …, 1, 0, 0, …
• Input: <BOS> SLOT_NAME serves SLOT_FOOD . (generation conditioned on the dialogue act)
• Output: SLOT_NAME serves SLOT_FOOD . <EOS>
• Slot weight tying
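Delexicalisation itself is simple string surgery; a sketch (real systems match slot values against an ontology rather than raw substrings):

```python
def delexicalise(utterance, slot_values):
    """Replace slot values with placeholder tokens (sketch).

    slot_values: e.g. {"name": "Din Tai Fung", "food": "Taiwanese"}
    """
    for slot, value in slot_values.items():
        utterance = utterance.replace(value, f"SLOT_{slot.upper()}")
    return utterance

def relexicalise(template, slot_values):
    """After generation, fill the placeholders back in."""
    for slot, value in slot_values.items():
        template = template.replace(f"SLOT_{slot.upper()}", value)
    return template

# delexicalise("Din Tai Fung serves Taiwanese .",
#              {"name": "Din Tai Fung", "food": "Taiwanese"})
# -> "SLOT_NAME serves SLOT_FOOD ."
```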
Handling Semantic Repetition
Issue: semantic repetition
• "Din Tai Fung is a great Taiwanese restaurant that serves Taiwanese."
• "Din Tai Fung is a child friendly restaurant, and also allows kids."
Deficiency in either the model or the decoding (or both)
Mitigation:
• Post-processing rules (Oh & Rudnicky, 2000)
• Gating mechanism (Wen et al., 2015)
• Attention (Mei et al., 2016; Wen et al., 2015)
Semantic Conditioned LSTM
(Wen et al., 2015)
Idea: use a gating mechanism to control the generated semantics (dialogue act/slots)
• Original LSTM cell: gates i_t, f_t, o_t operate on x_t and h_{t-1} to update c_t and h_t
• Dialogue act (DA) cell: a reading gate r_t updates the DA vector, d_t = r_t ⊙ d_{t-1}, which modifies c_t
• The dialogue act, e.g. Inform(name=Seven_Days, food=Chinese), enters as the 1-hot vector d_0
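Written out from the components above (following Wen et al., 2015; treat the exact parameterization as a reconstruction rather than a verbatim copy):

```latex
\begin{aligned}
i_t &= \sigma(W_{wi} x_t + W_{hi} h_{t-1}) \\
f_t &= \sigma(W_{wf} x_t + W_{hf} h_{t-1}) \\
o_t &= \sigma(W_{wo} x_t + W_{ho} h_{t-1}) \\
r_t &= \sigma(W_{wr} x_t + \alpha W_{hr} h_{t-1}) \\
\hat{c}_t &= \tanh(W_{wc} x_t + W_{hc} h_{t-1}) \\
d_t &= r_t \odot d_{t-1} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \hat{c}_t + \tanh(W_{dc}\, d_t) \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The reading gate r_t gradually drains the DA vector d_t as slots are realized, so the generator stops re-emitting semantics it has already expressed.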
Structural NLG
(Dušek and Jurčíček, 2016)
Goal: NLG based on the syntax tree
• Encode trees as sequences
• Seq2Seq model for generation
Contextual NLG
(Dušek and Jurčíček, 2016)
Goal: adapt to users' way of speaking, providing context-aware responses
• Context encoder
• Seq2Seq model
Controlled Text Generation
(Hu et al., 2017)
Idea: NLG based on a generative adversarial network (GAN) framework, where c encodes the targeted sentence attributes
NLG Evaluation
Metrics
• Subjective: human judgement (Stent et al., 2005)
  - Adequacy: correct meaning
  - Fluency: linguistic fluency
  - Readability: fluency in the dialogue context
  - Variation: multiple realizations for the same concept
• Objective: automatic metrics
  - Word overlap: BLEU (Papineni et al., 2002), METEOR, ROUGE
  - Word-embedding based: vector extrema, greedy matching, embedding average
There is a gap between human perception and automatic metrics.
E2E Joint NLU and DM
(Yang et al., 2017)
Errors from the DM can be propagated to the NLU for regularization and robustness.

Model                DM    NLU
Baseline (CRF+SVMs)  7.7   33.1
Pipeline-BLSTM       12.0  36.4
JointModel           22.8  37.4

Both DM and NLU performance (frame accuracy) improves.
E2E Supervised Dialogue System
(Wen et al., 2017)
[Figure: modular end-to-end architecture for the input "Can I have korean":
• Intent Network: encodes the delexicalised input "Can I have <v.food>" into a representation z_t
• Belief Tracker: maintains a distribution p_t over slot values, e.g. food: Korean 0.7, British 0.2, French 0.1
• Database Operator: forms the MySQL query "Select * where food=Korean" over the DB (Seven Days, Curry Prince, Nirala, Royal Standard, Little Seoul) and returns a DB pointer x_t, with fields copied into the response
• Policy Network: combines z_t, p_t, and x_t
• Generation Network: outputs "<v.name> serves great <v.food> ."]
E2E MemNN for Dialogues
(Bordes et al., 2017)
Split dialogue system actions into subtasks:
• API issuing
• API updating
• Option displaying
• Information informing
E2E RL-Based KB-InfoBot
(Dhingra et al., 2017)
Idea: a differentiable database lets gradients propagate through KB lookups.
Example (user goal: Movie=?; Actor=Bill Murray; Release Year=1993):
U: Find me the Bill Murray's movie.
S: When was it released?
U: I think it came out in 1993.
S: Groundhog Day is a Bill Murray movie which came out in 1993.

Entity-centric knowledge base:
Movie               Actor          Release Year
Groundhog Day       Bill Murray    1993
Australia           Nicole Kidman  X
Mad Max: Fury Road  X              2015
E2E RL-Based System
(Zhao and Eskenazi, 2016)
• Joint learning of NLU, DST, and dialogue policy
• Deep RL for training: deep Q-network with a deep recurrent network
[Figure: learning curves comparing Baseline RL and Hybrid-RL]
E2E LSTM-Based Dialogue Control
(Williams and Zweig, 2016)
Idea: an LSTM maps from the raw dialogue history directly to a distribution over system actions
• Developers can provide software including business rules & programmatic APIs
• The LSTM can take actions in the real world on behalf of the user
• The LSTM can be optimized using SL or RL
E2E Task-Completion Bot (TC-Bot)
(Li et al., 2017)
Idea: supervised learning for each component, plus reinforcement learning for end-to-end training of the neural dialogue system.
[Figure: end-to-end neural dialogue system; the Language Understanding (LU) module tags each user turn (per-word slot tags such as B-type/<slot>/O plus an <intent> prediction at EOS, over times t-2, t-1, t); the text input "Are there any action movies to see this weekend?" yields the semantic frame request_movie, genre=action, date=this weekend; Dialogue Management (DM) selects the system action/policy request_location; Natural Language Generation (NLG) realizes the response; a user simulator with agenda modeling and a user goal replies with the user dialogue action Inform(location=San Francisco)]
E2E Task-Completion Bot (TC-Bot)
(Li et al., 2017)
User goal: two tickets for "the witch" tomorrow 9:30 PM at Regal Meridian 16, Seattle.

RULE-BASED SYSTEM
User: Can I get 2 tickets for the witch?
Agent: Which theater do you want?
User: Regal meridian 16
Agent: What time would you like to see it?
User: 9:30 pm
Agent: What date would you like to see it?
User: Tomorrow!
Agent: How many people are you?
User: Just two adults.
Agent: What is your location?
User: Seattle!
Agent: What kind of video format do you like?
User: Oh, any kind is okay.
Agent: Do you have any price constraint?
User: No, I don't care.
Agent: Great - I was able to purchase 2 tickets for you to see the witch tomorrow at regal meridian 16 theater in Seattle at 9:30 pm.
User: Thank you.
Agent: Thank you!

REINFORCEMENT LEARNING SYSTEM
User: Can I get 2 tickets for the witch?
Agent: What date would you like to see it?
User: Tomorrow
Agent: Which theater do you prefer?
User: Regal meridian 16
Agent: What is your location?
User: Seattle
Agent: What time would you like to see it?
User: 9:30 pm.
Agent: Great - I was able to purchase 2 tickets for you to see the witch tomorrow at regal meridian 16 theater in Seattle at 9:30 pm.
User: Thanks.
Agent: Thanks!

The system can learn to interact with users efficiently for task completion.
Hierarchical RL for Composite Tasks
(Peng et al., 2017)
Example: travel planning, a set of subtasks that need to be fulfilled collectively.
• Build a dialogue manager that satisfies cross-subtask constraints (slot constraints)
• Temporally constructed goals, e.g.:
  - hotel_check_in_time > departure_flight_time
  - # flight_tickets = # people checking in to the hotel
  - hotel_check_out_time < return_flight_time
Hierarchical RL for Composite Tasks
(Peng et al., 2017)
The dialogue model makes decisions over two levels: a meta-controller and a controller. The agent learns both policies simultaneously, which mitigates reward sparsity:
• Meta-controller: the policy π_g(g_t, s_t; θ_1) over the optimal sequence of goals to follow
• Controller: the policy π_{a,g}(a_t, g_t, s_t; θ_2) over actions for each sub-goal g_t
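A sketch of the two-level decision loop (all names here are hypothetical, not taken from the paper):

```python
def hierarchical_step(meta_policy, controller, state, goal):
    """One step of hierarchical control (sketch).

    The meta-controller picks a new sub-goal g_t whenever none is active or
    the current one is achieved; the controller picks the action for that
    sub-goal.
    """
    if goal is None or goal.achieved(state):   # hypothetical goal interface
        goal = meta_policy(state)              # pi_g(g_t, s_t; theta_1)
    action = controller(state, goal)           # pi_{a,g}(a_t, g_t, s_t; theta_2)
    return action, goal
```

In such hierarchies, intrinsic rewards for completing a sub-goal typically give the controller dense feedback, which is one way the two-level setup mitigates reward sparsity.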
Social Chat Bots
The success of XiaoIce (小冰)
Problem setting and evaluation:
• Maximize user engagement by automatically generating enjoyable and useful conversations
Learning a neural conversation engine:
• A data-driven engine trained on social chitchat data (Sordoni+ 15; Li+ 16)
• Persona-based models and speaker-role-based models (Li+ 16; Luan+ 17)
• Image-grounded models (Mostafazadeh+ 17)
• Knowledge-grounded models (Ghazvininejad+ 17)