Deep Learning for Dialogue Systems
deepdialogue.miulab.tw
Outline
Introduction
Background Knowledge
Neural Network Basics
Reinforcement Learning
Modular Dialogue System
Spoken/Natural Language Understanding (SLU/NLU)
Dialogue Management
Dialogue State Tracking (DST)
Dialogue Policy Optimization
Natural Language Generation (NLG)
Evaluation
Recent Trends and Challenges
End-to-End Neural Dialogue System
Multimodality
Dialogue Breadth
Dialogue Depth
Material: http://deepdialogue.miulab.tw
Break
Introduction
Brief History of Dialogue Systems
Timeline: Early 1990s → Early 2000s → 2017
Keyword Spotting (e.g., AT&T)
System: “Please say collect, calling card, person, third number, or operator”
Intent Determination (Nuance’s Emily™, AT&T HMIHY)
User: “Uh…we want to move…we want to change our phone line from this house to another house”
Task-specific argument extraction (e.g., Nuance, SpeechWorks)
User: “I want to fly from Boston to New York next week.”
Multi-modal systems, e.g., Microsoft MiPad, Pocket PC
TV Voice Search, e.g., Bing on Xbox
DARPA CALO Project
Virtual Personal Assistants: Apple Siri (2011), Google Now (2012), Microsoft Cortana (2014), Amazon Alexa/Echo (2014), Facebook M & Bot (2015), Google Home (2016), Google Assistant (2016)
Language Empowering Intelligent Assistant
Apple Siri (2011)
Google Now (2012)
Microsoft Cortana (2014)
Amazon Alexa/Echo (2014)
Facebook M & Bot (2015)
Google Home (2016)
Google Assistant (2016)
Apple HomePod (2017)
Why Do We Need Them?
Get things done
E.g. set up alarm/reminder, take note
Easy access to structured data, services and apps
E.g. find docs/photos/restaurants
Assist your daily schedule and routine
E.g. commute alerts to/from work
Be more productive in managing your work and personal life
Why Natural Language?
Global Digital Statistics (January 2015)
Global Population: 7.21B
Active Internet Users: 3.01B
Active Social Media Accounts: 2.08B
Active Unique Mobile Users: 3.65B
Device input is evolving towards speech, the more natural and convenient modality.
Spoken Dialogue System (SDS)
Spoken dialogue systems are intelligent agents that are able to help users finish tasks more efficiently via spoken interactions.
Spoken dialogue systems are being incorporated into various devices (smartphones, smart TVs, in-car navigation systems, etc.).
JARVIS – Iron Man’s Personal Assistant Baymax – Personal Healthcare Companion
Good dialogue systems assist users to access information conveniently and finish tasks efficiently.
App Bot
A bot is responsible for a “single” domain, similar to an app
Users can initiate dialogues instead of following the GUI design
GUI vs. CUI (Conversational UI)
https://github.com/enginebai/Movie-lol-android
Aspect | Website/App GUI | Messaging CUI
Situation | Navigation, no specific goal | Searching, with a specific goal
Information Quantity | More | Less
Information Precision | Low | High
Display | Structured | Non-structured
Interface | Graphics | Language
Manipulation | Click | Mainly text or speech as input
Learning | Needs time to learn and adapt | No need to learn
Entrance | App download | Incorporated in any message-based interface
Flexibility | Low, like operating a machine | High, like conversing with a human
Challenges
Variability in Natural Language
Robustness
Recall/Precision Trade-off
Meaning Representation
Common Sense, World Knowledge
Ability to Learn
Transparency
Two Branches of Bots
Task-Oriented Bot
Personal assistant, helps users achieve a certain task
Combination of rules and statistical components
POMDP for spoken dialog systems (Williams and Young, 2007)
End-to-end trainable task-oriented dialogue system (Wen et al., 2016)
End-to-end reinforcement learning dialogue system (Li et al., 2017; Zhao and Eskenazi, 2016)
Chit-Chat Bot
No specific goal, focus on natural responses
Using variants of seq2seq model
A neural conversation model (Vinyals and Le, 2015)
Reinforcement learning for dialogue generation (Li et al., 2016)
Conversational contextual cues for response ranking (Al-Rfou et al., 2016)
Task-Oriented Dialogue System (Young, 2000)
Speech Signal → Speech Recognition → Hypothesis: are there any action movies to see this weekend
Text Input: Are there any action movies to see this weekend?
Language Understanding (LU)
• Domain Identification
• User Intent Detection
• Slot Filling
→ Semantic Frame: request_movie (genre=action, date=this weekend)
Dialogue Management (DM)
• Dialogue State Tracking (DST)
• Dialogue Policy
↔ Backend Action / Knowledge Providers
→ System Action/Policy: request_location
Natural Language Generation (NLG) → Text response: Where are you located?
http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
Interaction Example
User: find a good eating place for taiwanese food
Intelligent Agent: Good Taiwanese eating places include Din Tai Fung, Boiling Point, etc. What do you want to choose? I can help you go there.
Q: How does a dialogue system process this request?
1. Domain Identification
Requires Predefined Domain Ontology
User: find a good eating place for taiwanese food
Intelligent Agent: Organized Domain Knowledge (Database)
Restaurant DB, Taxi DB, Movie DB
Classification!
2. Intent Detection
Requires Predefined Schema
User: find a good eating place for taiwanese food
Intelligent Agent: Restaurant DB
Candidate intents: FIND_RESTAURANT, FIND_PRICE, FIND_TYPE, …
Classification!
3. Slot Filling
Requires Predefined Schema
User: find a good eating place for taiwanese food
Restaurant DB:
Restaurant | Rating | Type
Rest 1 | good | Taiwanese
Rest 2 | bad | Thai
: | : | :
Sequence Labeling:
find a good eating place for taiwanese food
O O B-rating O O O B-type O
Semantic Frame: FIND_RESTAURANT rating=“good” type=“taiwanese”
SELECT restaurant { rest.rating=“good” rest.type=“taiwanese” }
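The step from a tag sequence to a semantic frame can be sketched as follows. This is a minimal illustration, not the tutorial's implementation: the function name `bio_to_slots` and the dictionary layout of the frame are assumptions made for this example, and the tags are taken as given rather than predicted by a model.

```python
def bio_to_slots(words, tags):
    """Collect B-/I- tagged word spans into a {slot: value} dict."""
    slots, name, span = {}, None, []
    for word, tag in zip(words, tags):
        if tag.startswith("B-"):
            if name:                      # close the previous span
                slots[name] = " ".join(span)
            name, span = tag[2:], [word]  # open a new span
        elif tag.startswith("I-") and name == tag[2:]:
            span.append(word)             # continue the current span
        else:                             # "O" or a stray I- tag closes the span
            if name:
                slots[name] = " ".join(span)
            name, span = None, []
    if name:
        slots[name] = " ".join(span)
    return slots


words = "find a good eating place for taiwanese food".split()
tags = ["O", "O", "B-rating", "O", "O", "O", "B-type", "O"]
# The intent label is assumed to come from a separate intent classifier.
frame = {"intent": "FIND_RESTAURANT", "slots": bio_to_slots(words, tags)}
print(frame)
```

The resulting slot dictionary is what a backend query such as the SELECT above would be built from.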
State Tracking
Requires Hand-Crafted States
User: find a good eating place for taiwanese food
User: i want it near to my office
Hand-crafted states: NULL; location; rating; type; loc, rating; rating, type; loc, type; all
Handling Errors and Confidence
User: find a good eating place for taixxxx food
Candidate semantic frames:
FIND_RESTAURANT rating=“good” type=“taiwanese”
FIND_RESTAURANT rating=“good” type=“thai”
FIND_RESTAURANT rating=“good”
Hand-crafted states: NULL; location; rating; type; loc, rating; rating, type; loc, type; all
Candidate states (uncertain):
rating=“good”, type=“thai” ?
rating=“good”, type=“taiwanese” ?
Dialogue Policy for Agent Action
Inform(location=“Taipei 101”)
“The nearest one is at Taipei 101”
Request(location)
“Where is your home?”
Confirm(type=“taiwanese”)
“Did you want Taiwanese food?”
Output / Natural Language Generation
Goal: generate natural language or GUI given the selected dialogue action for interactions
Inform(location=“Taipei 101”)
“The nearest one is at Taipei 101” vs. a GUI output (figure)
Request(location)
“Where is your home?” vs. a GUI output (figure)
Confirm(type=“taiwanese”)
“Did you want Taiwanese food?” vs. a GUI output (figure)
Neural Network Basics
Reinforcement Learning
Machine Learning ≈ Looking for a Function
Speech Recognition: f(speech signal) = “你好 (Hello)”
Image Recognition: f(image) = cat
Go Playing: f(board position) = 5-5 (next move)
Chat Bot: f(“Where is Westin?”) = “The address is…”
Given a large amount of data, the machine learns what the function f should be.
Machine Learning
Unsupervised Learning
Supervised Learning
Reinforcement Learning
Deep learning is a family of machine learning approaches based on “neural networks”.
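A Single Neuron
Inputs $x_1, x_2, \dots, x_N$ with weights $w_1, w_2, \dots, w_N$ and bias $b$:
$z = w_1 x_1 + w_2 x_2 + \dots + w_N x_N + b$
Activation function (sigmoid function): $y = \sigma(z) = \dfrac{1}{1 + e^{-z}}$
$w$, $b$ are the parameters of this neuron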
A Single Neuron
Inputs $x_1, x_2, \dots, x_N$, weights $w_1, w_2, \dots, w_N$, bias $b$, output $y$
$y \ge 0.5$: is “2”
$y < 0.5$: not “2”
A single neuron can only handle binary classification
$f : \mathbb{R}^N \to \mathbb{R}^M$
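A single sigmoid neuron with a 0.5 decision threshold can be sketched in a few lines. The weights, bias, and input below are hypothetical values chosen only to make the arithmetic visible; they do not come from the slides.

```python
import math


def sigmoid(z):
    """Sigmoid activation: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x, w, b):
    """Weighted sum of the inputs plus bias, passed through the sigmoid."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)


# Hypothetical parameters and input (N = 3).
w, b = [0.5, -1.0, 2.0], 0.1
x = [1.0, 2.0, 0.5]

y = neuron(x, w, b)   # z = 0.5 - 2.0 + 1.0 + 0.1 = -0.4
is_two = y >= 0.5     # binary decision: is “2” or not
print(round(y, 4), is_two)
```

Because the output is a single number compared against one threshold, the neuron can only separate two classes, which is the limitation the slide points out.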
A Layer of Neurons
Handwriting digit classification: $f : \mathbb{R}^N \to \mathbb{R}^M$
A layer of neurons can handle multiple possible outputs, and the result depends on the max one
Inputs $x_1, x_2, \dots, x_N$ feed 10 neurons for 10 classes:
$y_1$: “1” or not
$y_2$: “2” or not
$y_3$: “3” or not
…
Which one is max?
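The "which one is max?" rule can be sketched as a layer of independent sigmoid neurons followed by an argmax. The weight matrix here is a toy with three classes and two inputs, assumed purely for illustration; a digit classifier would have ten rows.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer(x, W, b):
    """One output per neuron: each row of W holds one neuron's weights."""
    return [
        sigmoid(sum(wi * xi for wi, xi in zip(row, x)) + bj)
        for row, bj in zip(W, b)
    ]


def predict(x, W, b):
    """Index of the neuron with the largest output."""
    y = layer(x, W, b)
    return max(range(len(y)), key=y.__getitem__)


# Toy parameters (hypothetical): 3 neurons, 2 inputs.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]
x = [0.2, 0.9]

print(layer(x, W, b))    # three "class k or not" scores
print(predict(x, W, b))  # index of the max score
```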
Deep Neural Networks (DNN)
Fully connected feedforward network
Input vector $x = (x_1, x_2, \dots, x_N)$ → Layer 1 → Layer 2 → … → Layer L → output vector $y = (y_1, y_2, \dots, y_M)$
Deep NN: multiple hidden layers
$f : \mathbb{R}^N \to \mathbb{R}^M$
Recurrent Neural Network (RNN)
http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
Activation function: tanh, ReLU (the figure unrolls the network over time)
RNN can learn accumulated sequential information (time-series)
Reinforcement Learning
RL is a general purpose framework for decision making
RL is for an agent with the capacity to act
Each action influences the agent’s future state
Success is measured by a scalar reward signal
Goal: select actions to maximize future reward
Scenario of Reinforcement Learning
Agent learns to take actions to maximize expected reward.
Agent ↔ Environment: observation $o_t$, action $a_t$ (next move), reward $r_t$
If win, reward = 1; if loss, reward = -1; otherwise, reward = 0
Supervised vs. Reinforcement
Supervised (learning from teacher):
“Hello” → Say “Hi”
“Bye bye” → Say “Good bye”
Reinforcement (learning from critics):
Agent: “Hello ☺” … (dialogue continues) … feedback: Bad
Sequential Decision Making
Goal: select actions to maximize total future reward
Actions may have long-term consequences
Reward may be delayed
It may be better to sacrifice immediate reward to gain more long-term reward
Deep Reinforcement Learning
DNN as the function: input = observation from the environment, output = action
Reward is used to pick the best function
Reinforcement Learning
Start from state $s_0$
Choose action $a_0$
Transit to $s_1 \sim P(s_0, a_0)$
Continue…
Total reward:
Goal: select actions that maximize the expected total reward
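One common way to compute a total reward over an episode is a discounted sum of the per-step rewards. The sketch below assumes that form; the discount factor `gamma` is an assumption of this example and is not given in the source, and `gamma = 1.0` reduces it to a plain sum. The reward sequence reuses the win/loss scheme from the game scenario above.

```python
def total_reward(rewards, gamma=1.0):
    """Sum of gamma**t * r_t over the episode (gamma is an assumed discount)."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


# A four-step episode that ends in a win: reward 0 until the final step.
rewards = [0, 0, 0, 1]

print(total_reward(rewards))             # undiscounted sum
print(total_reward(rewards, gamma=0.9))  # the delayed win counts for less
```

With a discount below 1, a reward that arrives later contributes less, which is one way to express the trade-off between immediate and long-term reward.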
Reinforcement Learning Approach
Policy-based RL
Search directly for optimal policy $\pi^*$
$\pi^*$ is the policy achieving maximum future reward
Value-based RL
Estimate the optimal value function $Q^*(s, a)$
$Q^*(s, a)$ is maximum value achievable under any policy
Model-based RL
Build a model of the environment
Plan (e.g. by lookahead) using model
Modular Dialogue System
Language Understanding (LU)
Pipelined: 1. Domain Classification → 2. Intent Classification → 3. Slot Filling
LU – Domain/Intent Classification
Mainly viewed as an utterance classification task
• Given a collection of utterances $u_i$ with labels $c_i$, $D = \{(u_1, c_1), \dots, (u_n, c_n)\}$ where $c_i \in C$, train a model to estimate labels for new utterances $u_k$.
Example: find me a cheap taiwanese restaurant in oakland
Domains: Movies, Restaurants, Sports, Weather, Music, …
Intents: Find_movie, Buy_tickets, Find_restaurant, Book_table, Find_lyrics, …
DNN for Domain/Intent Classification – I (Sarikaya et al., 2011)
Deep belief nets (DBN)
Unsupervised training of weights
Fine-tuning by back-propagation
Compared to MaxEnt, SVM, and boosting
http://ieeexplore.ieee.org/abstract/document/5947649/
DNN for Domain/Intent Classification – II (Tur et al., 2012; Deng et al., 2012)
Deep convex networks (DCN)
Simple classifiers are stacked to learn complex functions
Feature selection of salient n-grams
Extension to kernel-DCN
http://ieeexplore.ieee.org/abstract/document/6289054/; http://ieeexplore.ieee.org/abstract/document/6424224/
DNN for Domain/Intent Classification – III (Ravuri & Stolcke, 2015)
RNN and LSTMs for utterance classification
Intent decision after reading all words performs better
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/RNNLM_addressee.pdf
DNN for Dialogue Act Classification – IV (Lee & Dernoncourt, 2016)
RNN and CNNs for dialogue act classification
LU – Slot Filling
As a sequence tagging task
• Given a collection of tagged word sequences, $S = \{((w_{1,1}, w_{1,2}, \dots, w_{1,n_1}), (t_{1,1}, t_{1,2}, \dots, t_{1,n_1})), ((w_{2,1}, w_{2,2}, \dots, w_{2,n_2}), (t_{2,1}, t_{2,2}, \dots, t_{2,n_2})), \dots\}$ where $t_i \in M$, the goal is to estimate tags for a new word sequence.
Words:      flights  from  Boston  to  New        York       today
Entity Tag: O        O     B-city  O   B-city     I-city     O
Slot Tag:   O        O     B-dept  O   B-arrival  I-arrival  B-date
Recurrent Neural Nets for Slot Tagging – I (Yao et al., 2013; Mesnil et al., 2015)
Variations:
a. RNNs with LSTM cells
b. Input, sliding window of n-grams
c. Bi-directional LSTMs
Figure panels: (a) LSTM, (b) LSTM-LA, (c) bLSTM. Each maps words $w_0, \dots, w_n$ through hidden states $h_0, \dots, h_n$ (forward $h^f$ and backward $h^b$ in the bLSTM) to tags $y_0, \dots, y_n$.
http://131.107.65.14/en-us/um/people/gzweig/Pubs/Interspeech2013RNNLU.pdf; http://dl.acm.org/citation.cfm?id=2876380
Recurrent Neural Nets for Slot Tagging – II (…; Simonnet et al., 2015)
Encoder-decoder networks
Leverages sentence level information
Attention-based encoder-decoder
Use of attention (as in MT) in the encoder-decoder network
Attention is estimated using a feed-forward network with input: $h_t$ and $s_t$ at time $t$
Figure: encoder states $h_0, \dots, h_n$ over words $w_0, \dots, w_n$, decoder states $s_0, \dots, s_n$ producing tags $y_0, \dots, y_n$, and context vector $c_i$ computed from $h_0, \dots, h_n$
http://www.aclweb.org/anthology/D16-1223
Recurrent Neural Nets for Slot Tagging – III (Jaech et al., 2016; Tafforeau et al., 2016)
Multi-task learning
Goal: exploit data from domains/tasks with a lot of data to improve ones with less data
Lower layers are shared across domains/tasks
Output layer is specific to task
https://arxiv.org/abs/1604.00117; http://www.sensei-conversation.eu/wp-content/uploads/2016/11/favre_is2016b.pdf
Joint Segmentation and Slot Tagging (Zhai et al., 2017)
Encoder that segments
Decoder that tags the segments
https://arxiv.org/pdf/1701.04027.pdf
Joint Semantic Frame Parsing
Sequence-based (Hakkani-Tur et al., 2016)
• Slot filling and intent prediction in the same output sequence
Parallel (Liu and Lane, 2016)
• Intent prediction and slot filling are performed in two branches
Figure (sequence-based): an RNN reads “taiwanese food please EOS” and outputs “B-type O O FIND_REST”; the word positions give the slot filling output and the EOS position gives the intent prediction
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/06/IS16_MultiJoint.pdf; https://arxiv.org/abs/1609.01454
Contextual LU
Single-turn example
U: just sent email to bob about fishing this weekend
S: O O O O B-contact_name O B-subject I-subject I-subject
D: communication
I: send_email
→ send_email(contact_name=“bob”, subject=“fishing this weekend”)
Multi-turn example
U1: send email to bob
S1: O O O B-contact_name
→ send_email(contact_name=“bob”)
U2: are we going to fish this weekend
S2: B-message I-message I-message I-message I-message I-message I-message
→ send_email(message=“are we going to fish this weekend”)
Tasks: Domain Identification, Intent Prediction, Slot Filling
Contextual LU
User utterances are highly ambiguous in isolation
Restaurant Booking
U: Book a table for 10 people tonight.
S: Which restaurant would you like to book a table for?
U: Cascal, for 6.
Is “6” the #people or the time?
Contextual LU (Bhargava et al., 2013; Hori et al, 2015 )
Leveraging contexts
Used for individual tasks
Seq2Seq model
Words are input one at a time, tags are output at the end of each utterance
Extension: LSTM with speaker role dependent layers
https://www.merl.com/publications/docs/TR2015-134.pdf
End-to-End Memory Networks (Sukhbaatar et al, 2015)
U: “i d like to purchase tickets to see deepwater horizon”
S: “for which theatre”
U: “angelika”
S: “you want them for angelika theatre?”
U: “yes angelika”
S: “how many tickets would you like ?”
U: “3 tickets for saturday”
S: “What time would you like ?”
U: “Any time on saturday is fine”
S: “okay , there is 4:10 pm , 5:40 pm and 9:20 pm”
U: “Let’s do 5:40”
Figure labels: memory slots $m_0, \dots, m_i, \dots, m_{n-1}$ and $u$
E2E MemNN for Contextual LU (Chen et al., 2016)
Idea: additionally incorporating contextual knowledge during slot tagging
track dialogue states in a latent way
1. Sentence Encoding: the history utterances {x_i} are encoded by the contextual sentence encoder RNN_mem into memory representations m_i; the current utterance c is encoded by the sentence encoder RNN_in into u
2. Knowledge Attention: the inner product of u with each m_i gives the knowledge attention distribution p_i
3. Knowledge Encoding: the weighted sum h (∑ over the memory) is passed through W_kg to give the knowledge encoding representation o
RNN Tagger: o is fed (via M) into the RNN tagger (weights U, W, V), which maps words w_t to the slot tagging sequence y
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/06/IS16_ContextualSLU.pdf
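The knowledge attention step (inner product, attention distribution, weighted sum) can be sketched with plain lists. This is an illustration under stated assumptions: the scores are normalized with a softmax, the vectors are toy values instead of RNN encodings, and the final projection through W_kg is omitted.

```python
import math


def softmax(scores):
    """Normalize scores into a probability distribution (assumed normalizer)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(u, memory):
    """Return attention weights p_i and the weighted sum h of the memory vectors."""
    p = softmax([dot(u, m) for m in memory])
    h = [sum(pi * m[k] for pi, m in zip(p, memory)) for k in range(len(u))]
    return p, h


# Toy encodings (hypothetical): one current utterance, three history utterances.
u = [1.0, 0.0]
memory = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]

p, h = attend(u, memory)
print([round(x, 3) for x in p])  # the history turn most similar to u gets the most weight
print([round(x, 3) for x in h])
```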
Analysis of Attention
U: “i d like to purchase tickets to see deepwater horizon”
S: “for which theatre”
U: “angelika”
S: “you want them for angelika theatre?”
U: “yes angelika”
S: “how many tickets would you like ?”
U: “3 tickets for saturday”
S: “What time would you like ?”
U: “Any time on saturday is fine”
S: “okay , there is 4:10 pm , 5:40 pm and 9:20 pm”
U: “Let’s do 5:40”
Attention weights shown in the figure: 0.69, 0.13, 0.16
Sequential Dialogue Encoder Network (Bapna et al., 2017)
Past and current turn encodings input to a feed forward network
Bapna et al., SIGDIAL 2017
Structural LU (Chen et al., 2016)
K-SAN: prior knowledge as a teacher
Input sentence: show me the flights from seattle to san francisco, with knowledge-guided structure {x_i} rooted at ROOT
Knowledge Encoding Module: sentence encoding and knowledge encoding ($m_i$) → inner product → knowledge attention distribution $p_i$ → weighted sum (∑) of the encoded knowledge representation → knowledge-guided representation
RNN Tagger: takes the knowledge-guided representation (via M) and maps words $w_{t-1}, w_t, w_{t+1}$ to the slot tagging sequence $y_{t-1}, y_t, y_{t+1}$
http://arxiv.org/abs/1609.03286
Structural LU (Chen et al., 2016)
Sentence structural knowledge stored as memory
Sentence s: show me the flights from seattle to san francisco
Syntax (Dependency Tree): figure rooted at ROOT over the words of s, with substructures numbered 1 to 4
Semantics (AMR Graph): figure with nodes show, you, flight, I, city (Seattle), city (San Francisco), with substructures numbered 1 to 4
http://arxiv.org/abs/1609.03286
Structural LU (Chen et al., 2016)
Sentence structural knowledge stored as memory
http://arxiv.org/abs/1609.03286
Even with less training data, K-SAN lets the model pay similar attention to the salient substructures that are important for tagging.
LU Importance (Li et al., 2017)
Compare different types of LU errors
http://arxiv.org/abs/1703.07055
Slot filling is more important than intent detection in language understanding
Sensitivity to Intent Error Sensitivity to Slot Error
LU Evaluation
Metrics
Sub-sentence-level: intent accuracy, slot F1
Sentence-level: whole frame accuracy
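The two levels of metric can be sketched as follows, assuming BIO-tagged slots, span-level exact matching for slot F1, and a frame that counts as correct only when the intent and every tag match. The function names and the intent labels in the example are hypothetical.

```python
def spans(tags):
    """Return the set of (slot, start, end) chunks in a BIO tag sequence."""
    found, start, name = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a final span
        continues = tag.startswith("I-") and name == tag[2:]
        if name is not None and not continues:
            found.add((name, start, i))
            name = None
        if tag.startswith("B-"):
            name, start = tag[2:], i
    return found


def slot_f1(gold_seqs, pred_seqs):
    """Micro-averaged F1 over exactly matching slot spans."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = spans(gold), spans(pred)
        tp, n_gold, n_pred = tp + len(g & p), n_gold + len(g), n_pred + len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def frame_accuracy(gold, pred):
    """gold/pred: lists of (intent, tags); a frame is correct only if both match."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


gold = [("find_flight", ["O", "O", "B-dept", "O", "B-arrival", "I-arrival", "B-date"]),
        ("find_restaurant", ["O", "B-type", "O"])]
pred = [("find_flight", ["O", "O", "B-dept", "O", "B-arrival", "O", "B-date"]),
        ("find_restaurant", ["O", "B-type", "O"])]

print(slot_f1([t for _, t in gold], [t for _, t in pred]))  # 3 of 4 spans match
print(frame_accuracy(gold, pred))                           # 1 of 2 frames fully correct
```

The example shows why the two levels differ: a single truncated span lowers slot F1 only slightly but makes the whole frame count as wrong.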
Elements of Dialogue Management
(Figure from Gašić)
Dialogue State Tracking
Dialogue State Tracking (DST)
Maintain a probabilistic distribution instead of a 1-best prediction for better robustness
Incorrect for both!
Dialogue State Tracking (DST)
Maintain a probabilistic distribution instead of a 1-best prediction for better robustness to SLU errors or ambiguous input
S: How can I help you?
U: Book a table at Sumiko for 5
Slot | Value
# people | 5 (0.5)
time | 5 (0.5)
S: How many people?
U: 3
Slot | Value
# people | 3 (0.8)
time | 5 (0.8)
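Keeping a distribution per slot can be sketched with a simple hand-written update rule. The rule below is an assumption made for illustration (old beliefs are scaled by the probability mass the new SLU hypotheses leave over, then the new confidences are added); it is not the tracker behind the slide and does not reproduce the slide's numbers exactly.

```python
def update_belief(belief, slu_hyps):
    """Update one slot's belief.

    belief, slu_hyps: {value: probability}. slu_hyps may sum to less than 1;
    the remainder is treated as "nothing new was said about this slot".
    """
    observed = sum(slu_hyps.values())
    updated = {v: p * (1.0 - observed) for v, p in belief.items()}
    for value, conf in slu_hyps.items():
        updated[value] = updated.get(value, 0.0) + conf
    return updated


# After "Book a table at Sumiko for 5": "5" is ambiguous between the two slots.
state = {"# people": {"5": 0.5}, "time": {"5": 0.5}}

# After the system asks "How many people?" and the user answers "3",
# a hypothetical SLU assigns 0.8 confidence to # people = 3.
state["# people"] = update_belief(state["# people"], {"3": 0.8})

print(state["# people"])  # "3" now dominates, "5" keeps a small probability
print(state["time"])      # unchanged by this turn
```

Because the old hypothesis is kept with reduced probability instead of being discarded, a later turn can still recover from an SLU error.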
Multi-Domain Dialogue State Tracking (DST)
A full representation of the system's belief of the user's goal at any point during the dialogue
Used for making API calls
Dialogue:
Do you wanna take Angela to go see a movie tonight?
Sure, I will be home by 6.
Let's grab dinner before the movie.
How about some Mexican?
Let's go to Vive Sol and see Inferno after that.
Angela wants to watch the Trolls movie.
Ok. Lets catch the 8 pm show.
Figure: tracked state with Movies and Restaurants panels (Date and Time fields); values shown include Inferno, Trolls, Century 16, Vive Sol Restaurant, Mexican Cuisine, 11/15/16, 2, 3, and times 6 pm, 6:30 pm, 7 pm, 7:30 pm, 8 pm, 9 pm
Dialog State Tracking Challenge (DSTC)
(Williams et al. 2013, Henderson et al. 2014, Henderson et al. 2014, Kim et al. 2016, Kim et al. 2016)
Challenge | Type | Domain | Data Provider | Main Theme
DSTC1 | Human-Machine | Bus Route | CMU | Evaluation Metrics
DSTC2 | Human-Machine | Restaurant | U. Cambridge | User Goal Changes
DSTC3 | Human-Machine | Tourist Information | U. Cambridge | Domain Adaptation
DSTC4 | Human-Human | Tourist Information | I2R | Human Conversation
DSTC5 | Human-Human | Tourist Information | I2R | Language Adaptation
Mrkšić et al., 2016)
(Figure from Wen et al., 2016)
http://www.anthology.aclweb.org/W/W13/W13-4073.pdf; https://arxiv.org/abs/1506.07190; https://arxiv.org/abs/1606.03777
Neural Belief Tracker (Mrkšić et al., 2016)
https://arxiv.org/abs/1606.03777
Multichannel Tracker (Shi et al., 2016)
Training a multichannel CNN for each slot
Chinese character CNN
Chinese word CNN
English word CNN
https://arxiv.org/abs/1701.06247
DST Evaluation
Dialogue State Tracking Challenges
DSTC2-3, human-machine
DSTC4-5, human-human
Metric
Tracked state accuracy with respect to user goal
Recall/Precision/F-measure for individual slots
Elements of Dialogue Management
(Figure from Gašić)
Dialogue Policy Optimization
Dialogue Policy Optimization
Dialogue management in a RL framework
Figure components: User, Language Understanding, Dialogue Manager, Natural Language Generation; the Agent receives Observation O and Reward R from the Environment and returns Action A
Slides credited to Pei-Hao Su
The optimized dialogue policy selects the action that maximizes the future reward.
Correct rewards are a crucial factor in dialogue policy training
Reward for RL ≅ Evaluation for System
Dialogue is a special RL task
Humans are involved in the interaction and in the rating (evaluation) of a dialogue
Fully human-in-the-loop framework
Rating: correctness, appropriateness, and adequacy
- Expert rating: high quality, high cost
- User rating: unreliable quality, medium cost
- Objective rating: check desired aspects, low cost
Reinforcement Learning for Dialogue Policy Optimization
Loop: User input (o) → Language understanding → state 𝑠 → Dialogue Policy 𝑎 = 𝜋(𝑠) → action 𝑎 → Language (response) generation → Response
Collect rewards (𝑠, 𝑎, 𝑟, 𝑠’)
Optimize 𝑄(𝑠, 𝑎)
Type of Bots | State | Action | Reward
Social ChatBots | Chat history | System response | # of turns maximized; intrinsically motivated reward
InfoBots (interactive Q/A) | User current question + context | Answers to current question | Relevance of answer; # of turns minimized
Task-Completion Bots | User current input + context | System dialogue act w/ slot value (or API calls) | Task success rate; # of turns minimized
Goal: develop a generic deep RL algorithm to learn dialogue policy for all bot categories
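Optimizing Q(s, a) from collected (s, a, r, s’) tuples can be sketched with tabular Q-learning. This is a simplified stand-in for the deep RL methods the tutorial discusses: the two-state toy dialogue, the learning rate `alpha`, the discount `gamma`, and the reward values are all assumptions of this example.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One Q-learning step on a transition (s, a, r, s'); s' = None is terminal."""
    best_next = max(Q[(s_next, a2)] for a2 in actions) if s_next is not None else 0.0
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def greedy_policy(Q, s, actions):
    """a = pi(s): pick the action with the highest estimated value."""
    return max(actions, key=lambda a: Q[(s, a)])


actions = ["request_location", "inform"]
Q = defaultdict(float)

# Hypothetical experience: -1 per turn, +20 when the task completes.
experience = [
    ("no_location", "request_location", -1, "has_location"),
    ("has_location", "inform", 20, None),
    ("no_location", "inform", -1, "no_location"),  # informing too early gets nowhere
]

for _ in range(200):  # replay the stored transitions several times
    for s, a, r, s_next in experience:
        q_update(Q, s, a, r, s_next, actions)

print(greedy_policy(Q, "no_location", actions))
print(greedy_policy(Q, "has_location", actions))
```

In a neural dialogue manager the table would be replaced by a network that estimates Q from the state representation, but the update target has the same shape.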
Dialogue Reinforcement Learning Signal
Typical reward function
-1 for per turn penalty
Large reward at completion if successful
Typically requires domain knowledge
✔ Simulated user
✔ Paid users (Amazon Mechanical Turk)
✖ Real users
The user simulator is usually required for dialogue system training before deployment
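The typical reward function (a per-turn penalty plus a large reward on successful completion) can be sketched as below. The per-turn penalty of -1 comes from the slide; the success reward of 20 is a hypothetical value, since the source only says "large".

```python
def turn_reward(done, success, success_reward=20.0, turn_penalty=-1.0):
    """Reward for one turn: always the penalty, plus a bonus on a successful end."""
    reward = turn_penalty
    if done and success:
        reward += success_reward
    return reward


def dialogue_return(num_turns, success, **kwargs):
    """Undiscounted sum of turn rewards over a whole dialogue."""
    return sum(
        turn_reward(done=(t == num_turns - 1), success=success, **kwargs)
        for t in range(num_turns)
    )


print(dialogue_return(5, success=True))    # short successful dialogue
print(dialogue_return(12, success=True))   # longer dialogue, lower return
print(dialogue_return(5, success=False))   # failure: only penalties
```

Deciding whether `success` is true is where domain knowledge or a user simulator comes in.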
Neural Dialogue Manager (Li et al., 2017)
Deep Q-network for training DM policy
Input: current semantic frame observation, database returned results
Output: system action
Figure: Simulated User → Semantic Frame: request_movie (genre=action, date=this weekend) → DQN-based Dialogue Management (DM) ↔ Backend DB → System Action/Policy: request_location
https://arxiv.org/abs/1703.01008
SL + RL for Sample Efficiency (Su et al., 2017)
Issues with RL for DM
slow learning speed
cold start
Solutions
Sample-efficient actor-critic
Off-policy learning with experience replay
Better gradient update
Utilizing supervised data
Pretrain the model with SL and then fine-tune with RL
Mix SL and RL data during RL learning
Combine both
https://arxiv.org/pdf/1707.00130.pdf (Su et al., SIGDIAL 2017)
Online Training (Su et al., 2015; Su et al., 2016)
Policy learning from real users
Infer reward directly from dialogues (Su et al., 2015)
User rating (Su et al., 2016)
Reward modeling on user binary success rating
Figure: Dialogue Representation → Embedding Function → Reward Model → Success/Fail → Reinforcement Signal; Query rating
http://www.anthology.aclweb.org/W/W15/W15-46.pdf; https://www.aclweb.org/anthology/P/P16/P16-1230.pdf
Interactive RL for DM (Shah et al., 2016)
Immediate Feedback
https://research.google.com/pubs/pub45734.html
Use a third agent for providing interactive feedback to the DM
Interpreting Interactive Feedback (Shah et al., 2016)
https://research.google.com/pubs/pub45734.html
Dialogue Management Evaluation
Metrics
Turn-level evaluation: system action accuracy
Dialogue-level evaluation: task success rate, reward
Natural Language Generation (NLG)
Mapping semantic frame into natural language
inform(name=Seven_Days, foodtype=Chinese) → Seven Days is a nice Chinese restaurant
Template-Based NLG
Define a set of rules to map frames to NL
Semantic Frame | Natural Language
confirm() | “Please tell me more about the product you are looking for.”
confirm(area=$V) | “Do you want somewhere in the $V?”
confirm(food=$V) | “Do you want a $V restaurant?”
confirm(food=$V,area=$W) | “Do you want a $V restaurant in the $W?”
Pros: simple, error-free, easy to control
Cons: time-consuming, poor scalability
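The rule table can be sketched as a dictionary lookup keyed on the dialogue act and its slot names. The `TEMPLATES` layout and the `generate` function are assumptions of this example; the template strings follow the slide, with `$V`/`$W` replaced by named placeholders.

```python
# Key: (act, sorted slot names). Value: template with named placeholders.
TEMPLATES = {
    ("confirm", ()): "Please tell me more about the product you are looking for.",
    ("confirm", ("area",)): "Do you want somewhere in the {area}?",
    ("confirm", ("food",)): "Do you want a {food} restaurant?",
    ("confirm", ("area", "food")): "Do you want a {food} restaurant in the {area}?",
}


def generate(act, **slots):
    """Look up the template for this act/slot combination and fill it in."""
    key = (act, tuple(sorted(slots)))
    template = TEMPLATES.get(key)
    if template is None:
        raise KeyError(f"no template for {key}")
    return template.format(**slots)


print(generate("confirm"))
print(generate("confirm", food="Thai"))
print(generate("confirm", food="Thai", area="centre"))
```

Every new act or slot combination needs its own entry, which illustrates the scalability cost listed under the cons.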
Plan-Based NLG (Walker et al., 2002)
Divide the problem into pipeline
Inform(name=Z_House, price=cheap) → Sentence Plan Generator → Sentence Plan Reranker → syntactic tree → Surface Realizer → Z House is a cheap restaurant.
Statistical sentence plan generator (Stent et al., 2009)
Statistical surface realizer (Dethlefs et al., 2013; Cuayáhuitl et al., 2014; …)
Pros: can model complex linguistic structures
Cons: heavily engineered, requires domain knowledge
Class-Based LM NLG (Oh and Rudnicky, 2000)
Class-based language modeling
NLG by decoding
Classes: inform_area, inform_address, …, request_area, request_postcode
Pros: easy to implement/understand, simple rules
Cons: computationally inefficient
http://dl.acm.org/citation.cfm?id=1117568
Phrase-Based NLG (Mairesse et al., 2010)
Inform(name=Charlie Chan, food=Chinese, type=restaurant, near=Cineworld, area=centre)
→ Semantic DBN → Phrase DBN →
Charlie Chan is a Chinese Restaurant near Cineworld in the centre
Figure rows: realization phrase, semantic stack
Pros: efficient, good performance
Cons: requires semantic alignments
http://dl.acm.org/citation.cfm?id=1858838
RNN-Based LM NLG (Wen et al., 2015)
Inform(name=Din Tai Fung, food=Taiwanese)
dialogue act 1-hot representation: 0, 0, 1, 0, 0, …, 1, 0, 0, …, 1, 0, 0, 0, 0, 0…
delexicalisation: <BOS> Din Tai Fung serves Taiwanese . → <BOS> SLOT_NAME serves SLOT_FOOD .
Input: <BOS> SLOT_NAME serves SLOT_FOOD .
Output: SLOT_NAME serves SLOT_FOOD . <EOS>
conditioned on the dialogue act
Slot weight tying
http://www.anthology.aclweb.org/W/W15/W15-46.pdf#page=295
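Delexicalisation and its inverse can be sketched with plain string replacement. This is a deliberately naive illustration, assuming exact string matches and the `SLOT_<NAME>` placeholder convention seen above; the language model that generates the delexicalised sentence is not part of the sketch.

```python
def delexicalise(sentence, slots):
    """Replace each slot value with its SLOT_<NAME> placeholder."""
    for name, value in slots.items():
        sentence = sentence.replace(value, f"SLOT_{name.upper()}")
    return sentence


def relexicalise(template, slots):
    """Fill the placeholders back in with the slot values from the dialogue act."""
    for name, value in slots.items():
        template = template.replace(f"SLOT_{name.upper()}", value)
    return template


slots = {"name": "Din Tai Fung", "food": "Taiwanese"}

template = delexicalise("Din Tai Fung serves Taiwanese .", slots)
print(template)
print(relexicalise(template, slots))
```

Training on delexicalised text lets one learned pattern cover every restaurant name and food type, and relexicalisation restores the actual values at generation time.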
Handling Semantic Repetition
Issue: semantic repetition
Din Tai Fung is a great Taiwanese restaurant that serves Taiwanese.
Din Tai Fung is a child friendly restaurant, and also allows kids.
Deficiency in either model or decoding (or both)
Mitigation
Post-processing rules
(Oh & Rudnicky, 2000)
Gating mechanism (Wen et al., 2015)
Attention (Mei et al., 2016; Wen et al., 2015)