Yun-Nung (Vivian) Chen 陳縕儂
http://vivianchen.idv.tw
NTUMIULAB
• Introduction
• Background Knowledge
• Modular Dialogue System
• System Evaluation
• Recent Trends of Learning Dialogues
Outline
Brief History of Dialogue Systems
Timeline: Early 1990s → Early 2000s → 2017
• Keyword Spotting (e.g., AT&T)
• System: “Please say collect, calling card, person, third number, or operator”
• Intent Determination (Nuance’s Emily™, AT&T HMIHY)
• User: “Uh…we want to move…we want to change our phone line from this house to another house”
• Task-specific argument extraction (e.g., Nuance, SpeechWorks)
• User: “I want to fly from Boston to New York next week.”
• Multi-modal systems, e.g., Microsoft MiPad, Pocket PC
• TV Voice Search, e.g., Bing on Xbox
• DARPA CALO Project
• Virtual Personal Assistants: Apple Siri (2011), Google Now (2012), Microsoft Cortana (2014), Amazon Alexa/Echo (2014), Facebook M & Bot (2015), Google Home (2016), Google Assistant (2016)
Language Empowering Intelligent Assistant
• Apple Siri (2011)
• Google Now (2012)
• Microsoft Cortana (2014)
• Amazon Alexa/Echo (2014)
• Facebook M & Bot (2015)
• Google Home (2016)
• Google Assistant (2016)
• Apple HomePod (2017)
Why Do We Need Intelligent Assistants?
• Get things done
• e.g., set up an alarm/reminder, take notes
• Easy access to structured data, services, and apps
• e.g., find docs/photos/restaurants
• Assist your daily schedule and routine
• e.g., commute alerts to/from work
• Be more productive in managing your work and personal life
Why Natural Language?
Global Digital Statistics (January 2018):
• Total Population: 7.59B
• Unique Mobile Users: 5.14B (+4% year on year)
• Internet Users: 4.02B (+7%)
• Active Social Media Users: 3.20B (+13%)
• Active Mobile Social Users: 2.96B (+14%)
Device input is evolving toward more natural and convenient modalities, such as speech.
Spoken Dialogue System (SDS)
• Spoken dialogue systems are intelligent agents that are able to help users finish tasks more efficiently via spoken interactions.
• Spoken dialogue systems are being incorporated into various devices (smartphones, smart TVs, in-car navigation systems, etc.).
JARVIS – Iron Man’s Personal Assistant
Baymax – Personal Healthcare Companion
Good dialogue systems help users access information conveniently and finish tasks efficiently.
App → Bot
• A bot is responsible for a “single” domain, similar to an app
• Users can initiate dialogues instead of following the GUI design
GUI vs. CUI (Conversational UI)
https://github.com/enginebai/Movie-lol-android
Aspect | Website/App GUI | Messaging CUI
Situation | Navigation, no specific goal | Searching, with a specific goal
Information quantity | More | Less
Information precision | Low | High
Display | Structured | Non-structured
Interface | Graphics | Language
Manipulation | Click | Mainly text or speech input
Learning | Needs time to learn and adapt | No need to learn
Entrance | App download | Incorporated in any messaging-based interface
Flexibility | Low, like operating a machine | High, like conversing with a human
Conversational Agents
• Chit-Chat
• Task-Oriented
Challenges
• Variability in Natural Language
• Robustness
• Recall/Precision Trade-off
• Meaning Representation
• Common Sense, World Knowledge
• Ability to Learn
• Transparency
Task-Oriented Dialogue System (Young, 2000)
• Speech Recognition: Speech Signal → Hypothesis: “are there any action movies to see this weekend”
• Text Input (e.g., “Are there any action movies to see this weekend?”) skips speech recognition
• Language Understanding (LU): Domain Identification, User Intent Detection, Slot Filling
• → Semantic Frame: request_movie(genre=action, date=this weekend)
• Dialogue Management (DM): Dialogue State Tracking (DST), Dialogue Policy; consults Backend Action / Knowledge Providers
• → System Action/Policy: request_location
• Natural Language Generation (NLG)
• → Text response: “Where are you located?”
http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
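To make the module boundaries concrete, here is a minimal Python sketch of the pipeline above, assuming text input (no speech recognition) and hand-written keyword rules standing in for each trained component. The functions `understand`, `track_state`, `choose_action`, and `generate`, the slot list, and the templates are all hypothetical illustrations, not the API of any real toolkit.

```python
"""Toy sketch of the modular pipeline: LU -> DST -> policy -> NLG.
All rules, intents, slot names, and templates are hypothetical."""
from dataclasses import dataclass, field


@dataclass
class SemanticFrame:
    intent: str
    slots: dict = field(default_factory=dict)


def understand(text: str) -> SemanticFrame:
    """Keyword-based LU stand-in: intent detection plus slot filling."""
    words = text.lower().rstrip("?").split()
    slots = {}
    if "action" in words:
        slots["genre"] = "action"
    if "weekend" in words:
        slots["date"] = "this weekend"
    intent = "request_movie" if "movies" in words else "unknown"
    return SemanticFrame(intent, slots)


def track_state(state: dict, frame: SemanticFrame) -> dict:
    """DST stand-in: merge newly filled slots into the running state."""
    return {**state, **frame.slots}


def choose_action(state: dict) -> str:
    """Policy stand-in: request the first required slot that is still missing."""
    for slot in ("genre", "date", "location"):
        if slot not in state:
            return f"request_{slot}"
    return "inform_movies"


TEMPLATES = {
    "request_genre": "What kind of movie would you like?",
    "request_date": "When would you like to go?",
    "request_location": "Where are you located?",
    "inform_movies": "Here are some movies for you.",
}


def generate(action: str) -> str:
    """Template-based NLG stand-in."""
    return TEMPLATES[action]


frame = understand("Are there any action movies to see this weekend?")
state = track_state({}, frame)
action = choose_action(state)
print(frame.intent, state, action, generate(action))
```

Swapping any single function for a learned model (for example, a neural slot tagger in place of `understand`) leaves the rest of the pipeline unchanged, which is the main appeal of the modular design.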
Interaction Example
User: “find a good eating place for taiwanese food”
Intelligent Agent: “Good Taiwanese eating places include Din Tai Fung, Boiling Point, etc. Which one would you like? I can help you get there.”
Q: How does a dialogue system process this request?
Requires Predefined Domain Ontology
User: “find a good eating place for taiwanese food”
Intelligent Agent → Organized Domain Knowledge (Database): Restaurant DB, Taxi DB, Movie DB
Selecting the right domain is a classification problem!
Requires Predefined Schema
User: “find a good eating place for taiwanese food”
Intelligent Agent → Restaurant DB, with predefined intents: FIND_RESTAURANT, FIND_PRICE, FIND_TYPE, …
Selecting the intent is also a classification problem!
Requires Predefined Schema
User: “find a good eating place for taiwanese food”
Restaurant DB:
Restaurant | Rating | Type
Rest 1 | good | Taiwanese
Rest 2 | bad | Thai
: | : | :
Semantic frame: FIND_RESTAURANT rating=“good” type=“taiwanese”
Database query: SELECT restaurant { rest.rating=“good” rest.type=“taiwanese” }
Slot filling as sequence labeling:
find a good eating place for taiwanese food
O O B-rating O O O B-type O
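The step from tags to a database query can be sketched as follows. This is a minimal illustration assuming single-turn input, a BIO tag sequence already produced by a tagger, and an in-memory list of rows standing in for the Restaurant DB; `tags_to_slots` and `find_restaurant` are hypothetical helper names, and the intent is assumed to come from a separate classifier.

```python
"""Sketch: sequence-labeling output -> semantic frame -> toy DB lookup."""

# Rows mirror the slide's table; a real backend would be an actual database.
RESTAURANT_DB = [
    {"restaurant": "Rest 1", "rating": "good", "type": "taiwanese"},
    {"restaurant": "Rest 2", "rating": "bad", "type": "thai"},
]


def tags_to_slots(words, tags):
    """Collect B-/I- tagged words into slot values (BIO scheme)."""
    slots, current = {}, None
    for word, tag in zip(words, tags):
        if tag.startswith("B-"):
            current = tag[2:]
            slots[current] = word
        elif tag.startswith("I-") and current == tag[2:]:
            slots[current] += " " + word
        else:
            current = None  # "O" (or a stray I- tag) ends the current slot
    return slots


def find_restaurant(db, **constraints):
    """Equivalent of SELECT restaurant WHERE every constraint matches."""
    return [row["restaurant"] for row in db
            if all(row.get(k) == v for k, v in constraints.items())]


words = "find a good eating place for taiwanese food".split()
tags = ["O", "O", "B-rating", "O", "O", "O", "B-type", "O"]
frame = {"intent": "FIND_RESTAURANT", "slots": tags_to_slots(words, tags)}
print(frame)
print(find_restaurant(RESTAURANT_DB, **frame["slots"]))
```

Because the slot names must match the database columns, this is exactly where the predefined schema becomes a hard requirement.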
Requires Hand-Crafted States
User: “find a good eating place for taiwanese food”
User: “i want it near to my office”
Hand-crafted states, one per combination of filled slots:
NULL → location | rating | type → loc, rating | rating, type | loc, type → all
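A quick way to see why hand-crafted states become a burden: if each state is identified with the set of slots filled so far, the states are all subsets of the slot inventory. The sketch below assumes exactly that encoding (a simplification; real systems often track more than which slots are filled), using the three slots from the slide; `transition` is a hypothetical helper.

```python
"""Sketch of hand-crafted dialogue states as sets of filled slots."""
from itertools import combinations

SLOTS = ("location", "rating", "type")

# Enumerate every hand-crafted state: all subsets of the slot inventory.
STATES = [frozenset(c) for r in range(len(SLOTS) + 1)
          for c in combinations(SLOTS, r)]


def transition(state, newly_filled):
    """Move to the state that also includes the newly filled slots."""
    return frozenset(state) | frozenset(newly_filled)


s0 = frozenset()                          # NULL
s1 = transition(s0, {"rating", "type"})   # "find a good eating place for taiwanese food"
s2 = transition(s1, {"location"})         # "i want it near to my office"
print(len(STATES), sorted(s1), s2 == frozenset(SLOTS))
```

With n slots there are 2^n such states (8 here), so designing states and transitions by hand quickly becomes impractical as the schema grows.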
Handling Errors and Confidence
User: “find a good eating place for taixxxx food” (“taixxxx”: the food type is not clearly recognized)
Competing semantic frames:
• FIND_RESTAURANT rating=“good” type=“taiwanese”
• FIND_RESTAURANT rating=“good” type=“thai”
• FIND_RESTAURANT rating=“good”
Which hand-crafted state (NULL; location | rating | type; loc, rating | rating, type | loc, type; all) should the agent move to: rating=“good”, type=“thai” or rating=“good”, type=“taiwanese”?
Dialogue Policy for Agent Action
• Inform(location=“Taipei 101”)
• “The nearest one is at Taipei 101”
• Request(location)
• “Where is your home?”
• Confirm(type=“taiwanese”)
• “Did you want Taiwanese food?”
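A hand-written policy over these three dialogue acts might look like the sketch below. It assumes the tracker hands the policy a best value and a confidence per slot, and uses a hypothetical threshold of 0.7 to decide when to confirm; the Inform branch returns a fixed location instead of querying a real backend.

```python
"""Sketch of a rule-based policy choosing Inform, Request, or Confirm."""

CONFIRM_THRESHOLD = 0.7  # assumed value, not from the slides


def select_action(belief):
    """belief maps slot -> (best value, confidence); returns a dialogue act string."""
    # Confirm the first slot whose top hypothesis is uncertain.
    for slot, (value, conf) in belief.items():
        if conf < CONFIRM_THRESHOLD:
            return f'Confirm({slot}="{value}")'
    # Ask for the location if the user has not given one yet.
    if "location" not in belief:
        return "Request(location)"
    # Otherwise answer; a real system would look up the nearest match here.
    return 'Inform(location="Taipei 101")'


print(select_action({"type": ("taiwanese", 0.55)}))
print(select_action({"type": ("taiwanese", 0.9)}))
print(select_action({"type": ("taiwanese", 0.9), "location": ("my office", 0.95)}))
```

Rules like these are easy to read but must be rewritten whenever slots or confidence behavior change, which motivates learning the policy instead.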
Output / Natural Language Generation
• Goal: generate natural language or GUI output for the selected dialogue action
• Inform(location=“Taipei 101”)
• “The nearest one is at Taipei 101” vs. GUI output
• Request(location)
• “Where is your home?” vs. GUI output
• Confirm(type=“taiwanese”)
• “Did you want Taiwanese food?” vs. GUI output
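Template-based generation is the simplest way to realize these acts as text. The sketch assumes dialogue acts are written as strings like `Inform(location="Taipei 101")` and that one template per (act, slot) pair is enough; the template table is hypothetical apart from reproducing the slide's sentences.

```python
"""Template-based NLG sketch: dialogue act -> natural language sentence."""
import re

TEMPLATES = {
    "inform": {"location": "The nearest one is at {location}"},
    "request": {"location": "Where is your home?"},
    "confirm": {"type": "Did you want {type} food?"},
}


def parse_act(act):
    """Parse strings like 'Inform(location="Taipei 101")' or 'Request(location)'."""
    m = re.fullmatch(r'(\w+)\((\w+)(?:="([^"]*)")?\)', act)
    if m is None:
        raise ValueError(f"unrecognized act: {act}")
    act_type, slot, value = m.groups()
    return act_type.lower(), slot, value


def generate(act):
    act_type, slot, value = parse_act(act)
    template = TEMPLATES[act_type][slot]
    # Title-case slot values for display (e.g., "taiwanese" -> "Taiwanese").
    return template.format(**{slot: value.title() if value else ""})


for act in ('Inform(location="Taipei 101")', "Request(location)", 'Confirm(type="taiwanese")'):
    print(generate(act))
```

Templates guarantee fluent, correct output for known acts, at the cost of repetitive phrasing and manual effort for every new act or slot.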
• Introduction
• Background Knowledge
• Neural Network Basics
• Reinforcement Learning
• Modular Dialogue System
• System Evaluation
• Recent Trends of Learning Dialogues
Outline
Machine Learning ≈ Looking for a Function
• Speech Recognition: f(speech signal) = “你好 (Hello)”
• Image Recognition: f(image) = “cat”
• Go Playing: f(board position) = “5-5” (next move)
• Chat Bot: f(“Where is KAIST?”) = “The address is…”
Given a large amount of data, the machine learns what the function f should be.
Machine Learning
• Unsupervised Learning
• Supervised Learning
• Reinforcement Learning
Deep learning is a family of machine learning approaches based on neural networks.
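A Single Neuron
Inputs x_1, …, x_N with weights w_1, …, w_N and a bias b:
$$z = w_1 x_1 + w_2 x_2 + \cdots + w_N x_N + b$$
$$y = \sigma(z), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}$$
σ is the activation function (here, the sigmoid function).
w, b are the parameters of this neuron.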
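A Single Neuron
The same neuron (inputs x_1, …, x_N, weights w_1, …, w_N, bias b, output y) used as a classifier:
• y ≥ 0.5 → the input is “2”
• y < 0.5 → the input is not “2”
A single neuron can only handle binary classification.
$f: \mathbb{R}^N \rightarrow \mathbb{R}^M$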
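The neuron above translates directly into code. This sketch uses plain Python; the weights, bias, and inputs are made-up numbers chosen only to land on either side of the 0.5 threshold.

```python
"""A single sigmoid neuron used as a binary classifier ("2" or not)."""
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x, w, b):
    """z = w1*x1 + ... + wN*xN + b, then y = sigmoid(z)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)


def is_two(x, w, b):
    """Binary decision: y >= 0.5 means "2", otherwise not "2"."""
    return neuron(x, w, b) >= 0.5


# Hypothetical parameters and inputs.
w = [1.0, -2.0, 0.5]
b = 0.1
print(neuron([2.0, 0.5, 1.0], w, b), is_two([2.0, 0.5, 1.0], w, b))  # z = 1.6
print(is_two([0.0, 1.0, 0.0], w, b))                                # z = -1.9
```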
A Layer of Neurons
• Handwriting digit classification
$f: \mathbb{R}^N \rightarrow \mathbb{R}^M$
• Inputs x_1, …, x_N (plus a bias input 1); 10 neurons for 10 classes, with outputs y_1 (“1” or not), y_2 (“2” or not), y_3 (“3” or not), …
• A layer of neurons can handle multiple possible outputs, and the result is the class whose output is the maximum (which one is max?)
Deep Neural Networks (DNN)
• Fully connected feedforward network
• Input vector x = (x_1, …, x_N) → Layer 1 → Layer 2 → … → Layer L → output vector y = (y_1, …, y_M)
• Deep NN: multiple hidden layers
$f: \mathbb{R}^N \rightarrow \mathbb{R}^M$
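A fully connected network is the single neuron repeated across a layer and stacked over layers. The sketch below assumes sigmoid activations in every layer and random, untrained weights (so the predicted class is arbitrary); it only shows the shapes involved and the final “which one is max?” step. Layer sizes are hypothetical.

```python
"""Sketch of a fully connected feedforward network f: R^N -> R^M."""
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(x, weights, biases):
    """One layer: each neuron computes sigmoid(w . x + b)."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Random placeholder weights; training would set these from data."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, layers):
    """Feed x through Layer 1, Layer 2, ..., Layer L."""
    for weights, biases in layers:
        x = dense(x, weights, biases)
    return x


rng = random.Random(0)
sizes = [4, 5, 5, 10]  # N=4 inputs, two hidden layers, M=10 classes
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
y = forward([0.2, 0.8, 0.1, 0.5], layers)
predicted_class = max(range(len(y)), key=lambda i: y[i])  # index of the max output
print(len(y), predicted_class)
```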
Recurrent Neural Network (RNN)
http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
• σ: activation function, e.g., tanh or ReLU
• The network is unrolled over time
• An RNN can learn accumulated sequential information (time series)
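The slide's figure is not recoverable here, so the sketch below assumes the common Elman-style formulation used in the linked WildML tutorial: hidden state $s_t = \tanh(U x_t + W s_{t-1})$ and output $o_t = \mathrm{softmax}(V s_t)$. The weight values are arbitrary; the point is that the state carried from step to step makes each output depend on the whole history.

```python
"""Sketch of a simple (Elman-style) RNN run over a sequence."""
import math


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def rnn(xs, U, W, V, hidden_size):
    """Run s_t = tanh(U x_t + W s_{t-1}) and o_t = softmax(V s_t) over xs."""
    s = [0.0] * hidden_size  # initial hidden state
    outputs = []
    for x in xs:
        s = [math.tanh(a + c) for a, c in zip(matvec(U, x), matvec(W, s))]
        outputs.append(softmax(matvec(V, s)))
    return s, outputs


# Hypothetical weights: 2-dim inputs, 2 hidden units, 3 output classes.
U = [[0.5, -0.3], [0.8, 0.2]]
W = [[0.1, 0.4], [-0.2, 0.3]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
final_state, outputs = rnn(xs, U, W, V, hidden_size=2)
print(final_state, [round(sum(o), 6) for o in outputs])
```

Running the last input alone yields a different state than running it after the first two, which is the “accumulated sequential information” the slide refers to.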
• Introduction
• Background Knowledge
• Neural Network Basics
• Reinforcement Learning
• Modular Dialogue System
• System Evaluation
• Recent Trends of Learning Dialogues
Outline
Reinforcement Learning
• RL is a general purpose framework for decision making
• RL is for an agent with the capacity to act
• Each action influences the agent’s future state
• Success is measured by a scalar reward signal
• Goal: select actions to maximize future reward
Agent-environment loop: Observation → Action → Reward
Scenario of Reinforcement Learning
Agent learns to take actions to maximize expected reward.
• The agent receives observation o_t from the environment and takes action a_t (e.g., the next move)
• The environment returns reward r_t: if win, reward = 1; if lose, reward = -1; otherwise, reward = 0
Supervised vs. Reinforcement Learning
• Supervised: learning from a teacher
• “Hello” → say “Hi”; “Bye bye” → say “Good bye”
• Reinforcement: learning from critics
• The agent holds a whole conversation (“Hello ☺ ……”) and only afterwards receives feedback, such as “Bad”
Sequential Decision Making
• Goal: select actions to maximize total future reward
• Actions may have long-term consequences
• Reward may be delayed
• It may be better to sacrifice immediate reward to gain more long-term reward
Deep Reinforcement Learning
• The agent's function is a DNN: its input is the observation from the environment, and its output is the action
• The reward is used to pick the best function
Reinforcement Learning
• Start from state s0
• Choose action a0
• Transit to s1 ~ P(s0, a0)
• Continue…
• Total reward: $R(s_0, a_0) + \gamma R(s_1, a_1) + \gamma^2 R(s_2, a_2) + \cdots$
Goal: select actions that maximize the expected total reward
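Assuming rewards are discounted by a factor γ per step (γ = 0.9 below, an illustrative choice), the total reward of one trajectory can be computed as follows. The reward numbers are hypothetical: a small penalty per turn and a reward for finishing the task.

```python
"""Discounted total reward r0 + gamma*r1 + gamma^2*r2 + ... for one trajectory."""


def discounted_return(rewards, gamma=0.9):
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


# Hypothetical dialogue trajectories: -1 per turn, +20 on task success.
short_dialogue = [-1, -1, 20]
long_dialogue = [-1, -1, -1, -1, 20]
print(discounted_return(short_dialogue), discounted_return(long_dialogue))
```

With these numbers the shorter trajectory earns more (14.3 vs. about 9.68), one reason per-turn penalties push an agent toward finishing tasks in fewer steps.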
Reinforcement Learning Approach
• Policy-based RL
• Search directly for the optimal policy π*
• Value-based RL
• Estimate the optimal value function Q*(s, a)
• Model-based RL
• Build a model of the environment
• Plan (e.g., by lookahead) using the model
π* is the policy achieving maximum future reward
Q*(s, a) is the maximum value achievable under any policy
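As a concrete instance of value-based RL, the sketch below runs tabular Q-learning, one standard value-based method (not specific to dialogue), on a hypothetical five-state chain where only reaching the right end gives reward 1. The learned Q-table approximates $Q^*(s, a)$, and acting greedily with respect to it recovers $\pi^*(s) = \arg\max_a Q^*(s, a)$. Hyperparameters (α = 0.5, γ = 0.9, ε = 0.1, 200 episodes) are illustrative.

```python
"""Tabular Q-learning on a toy 5-state chain (value-based RL sketch)."""
import random

N_STATES, GOAL = 5, 4
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic toy environment: reward 1 only when reaching the goal."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL


def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy action selection; ties are broken randomly.
            if rng.random() < epsilon or Q[s][0] == Q[s][1]:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: Q[s][act])
            s2, r, done = step(s, a)
            # Bootstrapped target: r + gamma * max_a' Q(s', a') (no bootstrap at the goal).
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q


Q = q_learning()
policy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(GOAL)]
print(policy)  # greedy policy derived from the learned Q-table
```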
• Introduction
• Background Knowledge
• Modular Dialogue System
• Spoken/Natural Language Understanding (SLU/NLU)
• Dialogue Management
• Dialogue State Tracking (DST)
• Dialogue Policy Optimization
• Natural Language Generation (NLG)
• System Evaluation
• Recent Trends of Learning Dialogues
Outline
Language Understanding (LU)
• Pipelined
1. Domain Classification
2. Intent Classification
3. Slot Filling
LU – Domain/Intent Classification
• As an utterance classification task
• Given a collection of utterances u_i with labels c_i, D = {(u_1, c_1), …, (u_n, c_n)} where c_i ∊ C, train a model to estimate labels for new utterances u_k.
Example: find me a cheap taiwanese restaurant in oakland
Domain → Intent:
• Movies → find_movie, buy_tickets
• Restaurants → find_restaurant, find_price, book_table
• Music → find_lyrics, find_singer
• Sports → …
• …
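The slides that follow use neural classifiers; as a much simpler stand-in that shows the task setup, the sketch below trains a multinomial Naive Bayes intent classifier on a tiny, hypothetical dataset D of (utterance, intent) pairs. Naive Bayes is swapped in here for brevity and is not one of the models discussed in these slides; the class name is hypothetical.

```python
"""Utterance classification sketch: multinomial Naive Bayes with add-one smoothing."""
import math
from collections import Counter, defaultdict


class NaiveBayesIntentClassifier:
    def fit(self, data):
        self.class_counts = Counter(label for _, label in data)
        self.word_counts = defaultdict(Counter)
        for utterance, label in data:
            self.word_counts[label].update(utterance.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, utterance):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            n_words = sum(self.word_counts[label].values())
            score = math.log(count / total)  # log prior
            for w in utterance.lower().split():
                # Smoothed log-likelihood of each word under this intent.
                score += math.log((self.word_counts[label][w] + 1) / (n_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training pairs (u_i, c_i).
D = [
    ("find a cheap restaurant nearby", "find_restaurant"),
    ("any good taiwanese restaurant in oakland", "find_restaurant"),
    ("book a table for two tonight", "book_table"),
    ("reserve a table at seven", "book_table"),
    ("buy two tickets for the movie", "buy_tickets"),
    ("get me movie tickets tonight", "buy_tickets"),
]
clf = NaiveBayesIntentClassifier().fit(D)
print(clf.predict("find me a cheap taiwanese restaurant in oakland"))
```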
Domain/Intent Classification (Sarikaya et al., 2011)
• Deep belief nets (DBN)
• Unsupervised training of weights
• Fine-tuning by back-propagation
• Compared to MaxEnt, SVM, and boosting
http://ieeexplore.ieee.org/abstract/document/5947649/
Domain/Intent Classification (Tur et al., 2012; Deng et al., 2012)
• Deep convex networks (DCN)
• Simple classifiers are stacked to learn complex functions
• Feature selection of salient n-grams
• Extension to kernel-DCN
Domain/Intent Classification (Ravuri & Stolcke, 2015)
• RNN and LSTMs for utterance classification
• Making the intent decision after reading all words performs better
Dialogue Act Classification (Lee & Dernoncourt, 2016)
• RNN and CNNs for dialogue act classification
LU – Slot Filling
• As a sequence tagging task
• Given a collection of tagged word sequences, S = {((w_{1,1}, w_{1,2}, …, w_{1,n_1}), (t_{1,1}, t_{1,2}, …, t_{1,n_1})), ((w_{2,1}, w_{2,2}, …, w_{2,n_2}), (t_{2,1}, t_{2,2}, …, t_{2,n_2})), …} where t_i ∊ M, the goal is to estimate tags for a new word sequence.
Example:
flights from Boston to New York today
Entity tags: O O B-city O B-city I-city O
Slot tags: O O B-dept O B-arrival I-arrival B-date
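Turning a tag sequence back into slot values is a small but necessary step after tagging. The helper below, `bio_to_spans` (a hypothetical name), decodes a BIO sequence into labeled spans; applying it to the entity tags and the slot tags of the example shows how the same words yield different semantic roles.

```python
"""Decode BIO tag sequences into (label, text) spans."""


def bio_to_spans(words, tags):
    spans, label, chunk = [], None, []
    for word, tag in zip(words, tags):
        if tag.startswith("I-") and label == tag[2:]:
            chunk.append(word)  # continue the current span
            continue
        if label is not None:
            spans.append((label, " ".join(chunk)))  # close the previous span
        # Start a new span on B-; "O" or a stray I- tag starts nothing.
        label, chunk = (tag[2:], [word]) if tag.startswith("B-") else (None, [])
    if label is not None:
        spans.append((label, " ".join(chunk)))
    return spans


words = "flights from Boston to New York today".split()
entity_tags = "O O B-city O B-city I-city O".split()
slot_tags = "O O B-dept O B-arrival I-arrival B-date".split()
print(bio_to_spans(words, entity_tags))
print(bio_to_spans(words, slot_tags))
```

Entity tags only say that both spans are cities; slot tags distinguish the departure city from the arrival city, which is what a flight search backend needs.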
Slot Tagging (Yao et al., 2013; Mesnil et al., 2015)
• Variations:
a. RNNs with LSTM cells
b. Input: sliding window of n-grams (LSTM-LA)
c. Bi-directional LSTMs (bLSTM)
Figure: (a) LSTM, (b) LSTM-LA, and (c) bLSTM taggers, each reading words w_0 … w_n through hidden states (h_0 … h_n; forward h^f and backward h^b in the bLSTM) to predict tags y_0 … y_n
http://131.107.65.14/en-us/um/people/gzweig/Pubs/Interspeech2013RNNLU.pdf; http://dl.acm.org/citation.cfm?id=2876380
Slot Tagging (Kurata et al., 2016; Simonnet et al., 2015)
• Encoder-decoder networks
• Leverages sentence-level information
• Attention-based encoder-decoder
• Use of attention (as in MT) in the encoder-decoder network
• Attention is estimated using a feed-forward network with inputs h_t and s_t at time t
Figure: encoder-decoder tagger (the encoder reads w_n … w_0 in reverse; the decoder emits y_0 … y_n) and attention-based tagger (encoder states h_0 … h_n, decoder states s_0 … s_n, context c_i)
http://www.aclweb.org/anthology/D16-1223
Joint Segmentation & Slot Tagging (Zhai+, 2017)
• Encoder that segments
• Decoder that tags the segments
https://arxiv.org/pdf/1701.04027.pdf
Multi-Task Slot Tagging (Jaech et al., 2016; Tafforeau et al., 2016)
• Multi-task learning
• Goal: exploit data from domains/tasks with a lot of data to improve ones with less data
• Lower layers are shared across domains/tasks
• Output layer is specific to each task
https://arxiv.org/abs/1604.00117; http://www.sensei-conversation.eu/wp-content/uploads/2016/11/favre_is2016b.pdf
Semi-Supervised Slot Tagging (Lan+, 2018)
O. Lan, S. Zhu, and K. Yu, “Semi-supervised Training using Adversarial Multi-task Learning for Spoken Language Understanding,” in Proceedings of ICASSP, 2018.
• Idea: a language model objective can enhance other tasks
• BLM exploits the unsupervised knowledge, while the shared-private framework and adversarial training make the slot tagging model generalize better
https://speechlab.sjtu.edu.cn/papers/oyl11-lan-icassp18.pdf
LU Evaluation
• Metrics
• Sub-sentence-level: intent accuracy, slot F1
• Sentence-level: whole frame accuracy
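The three metrics can be computed as in the sketch below. It makes a simplifying assumption: slots are compared as (slot, value) pairs per utterance, whereas slot F1 is commonly computed over tagged spans (e.g., with a CoNLL-style scorer); the gold and predicted frames are made up.

```python
"""LU evaluation sketch: intent accuracy, slot F1, and whole-frame accuracy."""


def intent_accuracy(gold, pred):
    return sum(g["intent"] == p["intent"] for g, p in zip(gold, pred)) / len(gold)


def slot_f1(gold, pred):
    """Micro-averaged F1 over (slot, value) pairs across all utterances."""
    tp = n_pred = n_gold = 0
    for g, p in zip(gold, pred):
        g_slots, p_slots = set(g["slots"].items()), set(p["slots"].items())
        tp += len(g_slots & p_slots)
        n_pred += len(p_slots)
        n_gold += len(g_slots)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


def frame_accuracy(gold, pred):
    """Sentence level: the intent and every slot must be exactly right."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


gold = [
    {"intent": "find_restaurant", "slots": {"price": "cheap", "type": "taiwanese"}},
    {"intent": "find_movie", "slots": {"genre": "action", "director": "james cameron"}},
]
pred = [
    {"intent": "find_restaurant", "slots": {"price": "cheap", "type": "thai"}},
    {"intent": "find_movie", "slots": {"genre": "action", "director": "james cameron"}},
]
print(intent_accuracy(gold, pred), slot_f1(gold, pred), frame_accuracy(gold, pred))
```

One wrong slot leaves intent accuracy at 1.0 and slot F1 at 0.75, but halves whole-frame accuracy, which is why the sentence-level metric is the strictest of the three.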
Joint Semantic Frame Parsing
• Sequence-based (Hakkani-Tur et al., 2016)
• Slot filling and intent prediction in the same output sequence
• Example: taiwanese food please EOS → B-type O O FIND_REST (the intent is emitted at the end-of-sentence token)
• Parallel (Liu and Lane, 2016)
• Intent prediction and slot filling are performed in two branches
Figure: RNN with hidden states h_{t-1}, h_t, h_{t+1}, …, h_{T+1} and weight matrices U, V, W
Slot-Gated Joint SLU (Goo+, 2018)
• A BLSTM encodes the word sequence x_1, x_2, x_3, x_4; slot attention yields slot context vectors c_i^S for the slot sequence y_1^S … y_4^S, and intent attention yields the intent context vector c^I for the intent y^I
• Slot gate:
$$g = \sum v \cdot \tanh\left(c_i^S + W \cdot c^I\right)$$
• Slot prediction:
$$y_i^S = \mathrm{softmax}\left(W^S \left(h_i + g \cdot c_i^S\right) + b^S\right)$$
• g will be larger if the slot and the intent are better related
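The slot-gate equations can be traced numerically with the sketch below. It assumes the context vectors c_i^S and c^I and the hidden state h_i have already been computed by the BLSTM and the attention layers, and it uses tiny hand-picked vectors and weights (hypothetical) only to show how g scales the slot context before the softmax.

```python
"""Numeric sketch of the slot gate and gated slot prediction for one position i."""
import math


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def slot_gate(c_slot, c_intent, W, v):
    """g = sum(v * tanh(c_i^S + W c^I)): one scalar per word position."""
    inner = [math.tanh(a + b) for a, b in zip(c_slot, matvec(W, c_intent))]
    return sum(vj * t for vj, t in zip(v, inner))


def slot_distribution(h, c_slot, g, W_S, b_S):
    """y_i^S = softmax(W^S (h_i + g * c_i^S) + b^S)."""
    gated = [hj + g * cj for hj, cj in zip(h, c_slot)]
    return softmax([z + b for z, b in zip(matvec(W_S, gated), b_S)])


# Hypothetical 2-d vectors and 3 slot labels.
c_slot, c_intent = [0.3, -0.1], [0.5, 0.2]
W = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, 1.0]
h = [0.2, 0.4]
W_S = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b_S = [0.0, 0.0, 0.0]
g = slot_gate(c_slot, c_intent, W, v)
y = slot_distribution(h, c_slot, g, W_S, b_S)
print(round(g, 4), [round(p, 4) for p in y])
```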
Contextual LU
Single-turn utterance (U: utterance, S: slot tags, D: domain, I: intent):
U: just sent email to bob about fishing this weekend
S: O O O O B-contact_name O B-subject I-subject I-subject
D: communication; I: send_email
→ send_email(contact_name=“bob”, subject=“fishing this weekend”)
Multi-turn:
U1: send email to bob
S1: O O O B-contact_name → send_email(contact_name=“bob”)
U2: are we going to fish this weekend
S2: B-message I-message I-message I-message I-message I-message I-message → send_email(message=“are we going to fish this weekend”)
Domain Identification → Intent Prediction → Slot Filling
Contextual LU
• User utterances are highly ambiguous in isolation
Restaurant Booking example:
U: Book a table for 10 people tonight.
S: Which restaurant would you like to book a table for?
U: Cascal, for 6.
→ Does “6” refer to #people or to the time?
Contextual LU (Bhargava et al., 2013; Hori et al., 2015)
• Leveraging contexts
• Used for individual tasks
• Seq2Seq model
• Words are input one at a time; tags are output at the end of each utterance
• Extension: LSTM with speaker-role-dependent layers
https://www.merl.com/publications/docs/TR2015-134.pdf
End-to-End Memory Networks (Sukhbaatar et al., 2015)
U: “i d like to purchase tickets to see deepwater horizon”
S: “for which theatre”
U: “angelika”
S: “you want them for angelika theatre?”
U: “yes angelika”
S: “how many tickets would you like ?”
U: “3 tickets for saturday”
S: “What time would you like ?”
U: “Any time on saturday is fine”
S: “okay , there is 4:10 pm , 5:40 pm and 9:20 pm”
U: “Let’s do 5:40”
Each history utterance is stored as a memory vector m_0, …, m_i, …, m_{n-1}; the current utterance is encoded as u.
E2E MemNN for Contextual LU (Chen+, 2016)
Idea: additionally incorporate contextual knowledge during slot tagging → track dialogue states in a latent way
1. Sentence Encoding: the history utterances {x_i} are encoded by a contextual sentence encoder (RNN_mem) into memory representations m_i; the current utterance c is encoded by RNN_in into u
2. Knowledge Attention: the inner product of u with each m_i gives the knowledge attention distribution p_i
3. Knowledge Encoding: the weighted sum h = ∑ p_i m_i is encoded via W_kg into the knowledge encoding representation o, which is fed into an RNN tagger (weights U, W, V, M) to produce the slot tagging sequence y
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/06/IS16_ContextualSLU.pdf
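The attention step of this model can be sketched as follows, assuming the history utterances and the current utterance have already been encoded by RNN_mem and RNN_in into vectors. The combination o = W_kg(h + u) follows the paper as we understand it and should be checked against it; the vectors and the helper name `knowledge_attention` are hypothetical.

```python
"""Sketch of memory attention over dialogue history for contextual LU."""
import math


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def knowledge_attention(u, memories, W_kg):
    """Attend over encoded history utterances m_i using the current utterance u."""
    scores = [sum(a * b for a, b in zip(u, m)) for m in memories]  # inner products u . m_i
    p = softmax(scores)                                             # attention distribution p_i
    h = [sum(p_i * m[j] for p_i, m in zip(p, memories)) for j in range(len(u))]  # weighted sum
    o = [sum(w * (hj + uj) for w, hj, uj in zip(row, h, u)) for row in W_kg]     # assumed o = W_kg(h + u)
    return p, o


u = [1.0, 0.0]                                  # encoded current utterance
memories = [[2.0, 0.0], [0.0, 1.0], [0.5, 0.5]] # encoded history utterances
W_kg = [[1.0, 0.0], [0.0, 1.0]]
p, o = knowledge_attention(u, memories, W_kg)
print([round(x, 3) for x in p], [round(x, 3) for x in o])
```

The history utterance most aligned with the current one receives the largest weight, so the tagger sees a context summary dominated by the relevant turn.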
Analysis of Attention
On the ticket-booking dialogue (“i d like to purchase tickets to see deepwater horizon” … “Let’s do 5:40”), the learned attention weights over the history utterances include 0.69, 0.13, and 0.16.
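Role-Based & Time-Aware Attention (Su+, 2018)
• Dialogue history between two roles (Tourist, Guide): utterances u_1, …, u_6, followed by the current utterance u_7
• Sentence-level time-decay attention assigns a weight α_{u_i} to each history utterance u_i
• Role-level time-decay attention assigns weights α_{r_1}, α_{r_2} to the two speaker roles
• The weighted utterances (α_{u_1}·u_1, …, α_{u_6}·u_6) form a history summary, which passes through dense layers together with the current utterance's words w_t, …, w_T for spoken language understanding
• Time-decay attention functions (α_u and α_r) of the distance d: convex, linear, or concave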
Learnable Time-Decay Attention (Su+, 2019)
• Same role-based architecture: sentence-level and role-level time-decay attention over the history utterances u_1, …, u_6 of the Tourist and the Guide produce a history summary, combined with the current utterance (w_t, …, w_T) through dense layers for spoken language understanding
• The time-decay attention functions (α_u and α_r) are computed by learnable attention models, covering convex, linear, and concave decay shapes
S.-Y. Su, P.-C. Yuan, and Y.-N. Chen, “Learning Context-Sensitive Time-Decay Attention for Role-Based Dialogue Modeling,” in submission.
Structural LU (Chen et al., 2016)
• K-SAN: prior knowledge as a teacher
Input sentence: show me the flights from seattle to san francisco, with its knowledge-guided structure {x_i} (substructures rooted at ROOT)
• Knowledge Encoding Module: each substructure is encoded into a memory vector m_i; its inner product with the sentence encoding gives the knowledge attention distribution p_i; the weighted sum ∑ p_i m_i is the encoded knowledge representation
• The knowledge-guided representation is fed into an RNN tagger (weights W, U, V, M) that outputs the slot tagging sequence y_{t-1}, y_t, y_{t+1}, …
http://arxiv.org/abs/1609.03286
Structural LU (Chen et al., 2016)
• Sentence structural knowledge stored as memory
Sentence s: show me the flights from seattle to san francisco
• Syntax (Dependency Tree): the parse rooted at ROOT is split into numbered substructures (1–4) over show, me, the flights, from seattle, to san francisco
• Semantics (AMR Graph): show, with arguments you, flight, and I, and city nodes Seattle and San Francisco, split into numbered substructures (1–4)
http://arxiv.org/abs/1609.03286
Structural LU (Chen et al., 2016)
• Sentence structural knowledge stored as memory
http://arxiv.org/abs/1609.03286
With less training data, K-SAN still allows the model to pay similar attention to the salient substructures that are important for tagging.
Semantic Frame Representation
• Requires a domain ontology: early connection to backend
• Contains core content (intent, a set of slots with fillers)
Restaurant Domain (ontology: restaurant → price, type, location):
find me a cheap taiwanese restaurant in oakland
→ find_restaurant(price=“cheap”, type=“taiwanese”, location=“oakland”)
Movie Domain (ontology: movie → year, genre, director):
show me action movies directed by james cameron
→ find_movie(genre=“action”, director=“james cameron”)
LU – Learning Semantic Ontology (Chen+, 2013)
• Learning key domain concepts from goal-oriented human-human conversations
• Clustering with mutual information and KL divergence (Chotimongkol & Rudnicky, 2002)
• Spectral clustering based slot ranking model (Chen et al., 2013)
• Use a state-of-the-art frame-semantic parser trained for FrameNet
• Adapt the generic output of the parser to the target semantic space
LU – Intent Expansion (Chen+, 2016)
• Transfer dialogue acts across domains
• Dialogue acts are similar across multiple domains
• Learn new intents using information from other domains
• A CDSSM maps the utterance U and each intent/action A_1, A_2, …, A_n into 300-dimensional embeddings; cosine similarity gives P(A_1 | U), P(A_2 | U), …, P(A_n | U)
• Intent representations 1, 2, …, K are learned from training data (e.g., <change_note> “adjust my note”, <change_setting> “volume turn down”); embeddings for new intents K+1, K+2, … (e.g., <change_calender>) are generated
• Example utterance: postpone my meeting to five pm
The dialogue act representations can be automatically learned for other domains
http://ieeexplore.ieee.org/abstract/document/7472838/
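Scoring utterances against intent embeddings, rather than against a fixed output layer, is what lets a new intent be added by supplying its representation. The sketch below assumes 3-dimensional embeddings standing in for the 300-dimensional CDSSM outputs and a softmax over γ-scaled cosine similarities (γ = 10 is an arbitrary choice); all vectors are hypothetical, and the encoders themselves are omitted.

```python
"""Sketch: intent posteriors from cosine similarity between embeddings."""
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def intent_posteriors(utt_vec, intent_vecs, gamma=10.0):
    """P(A_k | U) = softmax_k(gamma * CosSim(U, A_k))."""
    names = list(intent_vecs)
    scores = [gamma * cosine(utt_vec, intent_vecs[n]) for n in names]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return {n: e / total for n, e in zip(names, exps)}


intents = {
    "<change_note>": [0.9, 0.1, 0.0],
    "<change_setting>": [0.0, 0.9, 0.1],
}
# New intent K+1: added by supplying an embedding, with no new output layer.
intents["<change_calender>"] = [0.0, 0.2, 0.95]
utterance = [0.1, 0.1, 0.9]  # stands in for "postpone my meeting to five pm"
posteriors = intent_posteriors(utterance, intents)
print(max(posteriors, key=posteriors.get))
```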
LU – Language Extension (Upadhyay+, 2018)
• Source language: English (full annotations)
• Target language: Hindi (limited annotations)
Slot labels: RT: round trip, FC: from city, TC: to city, DDN: departure day name
http://shyamupa.com/papers/UFTHH18.pdf
LU – Language Extension (Upadhyay+, 2018)
• Train on Target (Lefevre et al., 2010): machine-translate the English training data into Hindi, train a Hindi tagger, and evaluate SLU on the Hindi test set
• Test on Source (Jabaian et al., 2011): train an English tagger, machine-translate the Hindi test set into English, and evaluate SLU on the translation
• Joint Training: train a single bilingual tagger on Hindi (small) and English (large) training data, and evaluate SLU on the Hindi test set
With joint training, an MT system is not required and both languages can be processed by a single model
http://shyamupa.com/papers/UFTHH18.pdf
• Introduction
• Background Knowledge
• Modular Dialogue System
• Spoken/Natural Language Understanding (SLU/NLU)
• Dialogue Management
• Dialogue State Tracking (DST)
• Dialogue Policy Optimization
• Natural Language Generation (NLG)
• System Evaluation
• Recent Trends of Learning Dialogues
Outline
Elements of Dialogue Management
(Figure from Gašić)
• Dialogue State Tracking
Dialogue State Tracking (DST)
• Maintain a probabilistic distribution instead of a 1-best prediction for better robustness
Incorrect for both!
Dialogue State Tracking (DST)
• Maintain a probabilistic distribution instead of a 1-best prediction for better robustness to SLU errors or ambiguous input
S: How can I help you?
U: Book a table at Sumiko for 5
Slot | Value
# people | 5 (0.5)
time | 5 (0.5)
S: How many people?
U: 3
Slot | Value
# people | 3 (0.8)
time | 5 (0.8)
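One simple rule-based way to maintain such a distribution is sketched below: for each slot, probability mass reported by the current turn's SLU is added, and the remaining mass keeps the previous belief. This update rule, the helper name `update_belief`, and the SLU confidence of 0.8 are assumptions for illustration; the rule ignores the system's question, so its numbers differ from the slide's (for example, it leaves time at 0.5 after the user says “3”).

```python
"""Sketch of a rule-based per-slot belief update for dialogue state tracking."""


def update_belief(prev, slu):
    """b_t(v) = s_t(v) + (1 - sum_v' s_t(v')) * b_{t-1}(v), computed per slot.
    prev and slu map slot -> {value: probability}."""
    belief = {}
    for slot in set(prev) | set(slu):
        old = prev.get(slot, {})
        new = slu.get(slot, {})
        keep = 1.0 - sum(new.values())  # mass not claimed by this turn's SLU
        values = set(old) | set(new)
        belief[slot] = {v: new.get(v, 0.0) + keep * old.get(v, 0.0) for v in values}
    return belief


# Turn 1: "Book a table at Sumiko for 5" ("5" is ambiguous between two slots).
b1 = update_belief({}, {"people": {"5": 0.5}, "time": {"5": 0.5}})
# Turn 2: "3" (hypothetical SLU confidence 0.8 for #people).
b2 = update_belief(b1, {"people": {"3": 0.8}})
print(b1)
print(b2)
```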
Multi-Domain Dialogue State Tracking
• A full representation of the system's belief of the user's goal at any point during the dialogue
• Used for making API calls
U: I wanna buy two tickets for tonight at the Shoreline theater.
S: Which movie are you interested in?
U: Inferno.
Movies belief (candidate values range from less likely to more likely):
• Date: 11/15/17
• Time: 6 pm, 7 pm, 8 pm, 9 pm
• #People: 2
• Theater: Century 16 Shoreline
• Movie: Inferno
Multi-Domain Dialogue State Tracking
• A full representation of the system's belief of the user's goal at any point during the dialogue
• Used for making API calls
U: I wanna buy two tickets for tonight at the Shoreline theater.
S: Which movie are you interested in?
U: Inferno.
S: Inferno showtimes at Century 16 Shoreline are 6:30pm, 7:30pm, 8:45pm and 9:45pm. What time do you prefer?
U: We'd like to eat dinner before the movie at Cascal, can you check what time i can get a table?
S: Cascal has a table for 2 at 6pm and 7:30pm.
U: OK, let me get the table at 6 and tickets for the 7:30 showing.
Movies belief: Date: 11/15/17; Time: 6:30 pm, 7:30 pm, 8:45 pm, 9:45 pm; #People: 2; Theater: Century 16 Shoreline; Movie: Inferno
Restaurants belief: Date: 11/15/17; Time: 6:00 pm, 6:30 pm, 7:00 pm; #People: 2; Restaurant: Cascal
(Candidate values range from less likely to more likely.)
RNN-CNN DST (Mrkšić+, 2015)
(Figure from Wen et al., 2016)
http://www.anthology.aclweb.org/W/W13/W13-4073.pdf; https://arxiv.org/abs/1506.07190; https://arxiv.org/abs/1606.03777
Neural Belief Tracker (Mrkšić+, 2016)
• Candidate pairs are considered
• Previous belief state [b_{t-1}] → belief state updates [b_t]
https://arxiv.org/abs/1606.03777
Global-Locally Self-Attentive DST (Zhong+, 2018)
• More advanced encoder
• Global modules share parameters for all slots
• Local modules learn slot-specific feature representations
http://www.aclweb.org/anthology/P18-1135
Dialog State Tracking Challenge (DSTC)
(Williams et al. 2013, Henderson et al. 2014, Henderson et al. 2014, Kim et al. 2016, Kim et al. 2016)
Challenge | Type | Domain | Data Provider | Main Theme
DSTC1 | Human-Machine | Bus Route | CMU | Evaluation Metrics
DSTC2 | Human-Machine | Restaurant | U. Cambridge | User Goal Changes
DSTC3 | Human-Machine | Tourist Information | U. Cambridge | Domain Adaptation
DSTC4 | Human-Human | Tourist Information | I2R | Human Conversation
DSTC5 | Human-Human | Tourist Information | I2R | Language Adaptation