Towards Conversational AI
Applied Deep Learning
May 31st, 2021 http://adl.miulab.tw
What can machines achieve now or in the future?
Iron Man (2008)
2
Language Empowering Intelligent Assistants
Apple Siri (2011)
Google Now (2012)
Microsoft Cortana (2014)
Amazon Alexa/Echo (2014)
Google Home (2016)
Google Assistant (2016)
Apple HomePod (2017)
Facebook Portal (2019)
3
Why Natural Language?
◉
Global Digital Statistics (January 2021)
Total Population: 7.83B
Unique Mobile Users: 5.22B (66.6%)
Internet Users: 4.66B (59.5%)
Active Mobile Social Users: 4.20B (53.6%)
Device input is evolving towards speech, which is more natural and convenient.
4
Why and When We Need?
Social Chit-Chat
“I want to chat” ⇒ Turing Test (talk like a human)
Task-Oriented Dialogues
“I have a question” ⇒ Information consumption
• What is today’s agenda?
• What does NLP stand for?
“I need to get this done” ⇒ Task completion
• Book me the train ticket from Kaohsiung to Taipei
• Reserve a table at Din Tai Fung for 5 people, 7PM tonight
• Schedule a meeting with Vivian at 10:00 tomorrow
“What should I do?” ⇒ Decision support
• Is this course good to take?
5
Intelligent Assistants
Task-Oriented
6
App → Bot
◉
A bot is responsible for a “single” domain, similar to an app
Users can initiate dialogues instead of following the GUI design
7
Two Branches of Conversational AI
Chit-Chat
Task-Oriented
8
Task-Oriented Dialogue Systems
9
Task-Oriented Dialogue Systems
(Young, 2000)
LU: Language Understanding
DST: Dialogue State Tracking
DP: Dialogue Policy Learning
NLG: Natural Language Generation
ASR
TTS
User: Can you help me book a 5-star hotel on Sunday?
System: For how many people?
10
Modular Task-Oriented Dialogue Systems
Language Understanding
11
Language Understanding (LU)
◉
LU is a turn-level task that maps utterances to semantic frames.
○
Input: raw user utterance
○
Output: semantic frame (e.g. speech-act, intent, slots)
Example: “For two people, thanks!” ⇒ LU ⇒ people_num=2 (passed on to DST, DP and NLG)
LU: Language Understanding
12
Language Understanding (LU)
◉
Pipelined
1. Domain Classification
2. Intent Classification
3. Slot Filling
13
1. Domain Identification
Requires Predefined Domain Ontology
find a good eating place for taiwanese food
User
Organized Domain Knowledge (Database)
Intelligent Agent
Restaurant DB Taxi DB Movie DB
Classification!
14
2. Intent Detection
Requires Predefined Schema
find a good eating place for taiwanese food
User
Intelligent Agent
Restaurant DB
FIND_RESTAURANT FIND_PRICE
FIND_TYPE :
Classification!
15
3. Slot Filling
Requires Predefined Schema
find a good eating place for taiwanese food
User
Intelligent Agent
Restaurant DB
Restaurant   Rating   Type
Rest 1       good     Taiwanese
Rest 2       bad      Thai
:            :        :
Semantic Frame: FIND_RESTAURANT rating=“good” type=“taiwanese”
SELECT restaurant { rest.rating=“good” rest.type=“taiwanese” }
Sequence Labeling:
find  a  good      eating  place  for  taiwanese  food
O     O  B-rating  O       O      O    B-type     O
16
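The step from the tag sequence to the semantic frame on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the helper names `decode_bio` and `build_frame` are hypothetical, and the intent is passed in by hand rather than predicted.

```python
def decode_bio(tokens, tags):
    """Collect slot values from BIO tags (B-x opens a slot, I-x continues it)."""
    slots, name, words = {}, None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if name is not None:          # close the slot that was open
                slots[name] = " ".join(words)
            name, words = tag[2:], [token]
        elif tag.startswith("I-") and name == tag[2:]:
            words.append(token)
        else:                             # "O" or an inconsistent I- tag
            if name is not None:
                slots[name] = " ".join(words)
            name, words = None, []
    if name is not None:                  # slot that runs to the end
        slots[name] = " ".join(words)
    return slots


def build_frame(intent, tokens, tags):
    """Combine an (assumed given) intent with the decoded slots."""
    return {"intent": intent, "slots": decode_bio(tokens, tags)}


tokens = "find a good eating place for taiwanese food".split()
tags = ["O", "O", "B-rating", "O", "O", "O", "B-type", "O"]
frame = build_frame("FIND_RESTAURANT", tokens, tags)
print(frame)
```

The resulting dictionary plays the role of the semantic frame that would be turned into the database query shown above.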
Slot Tagging
(Yao et al, 2013; Mesnil et al, 2015)
◉ Variations:
a. RNNs with LSTM cells
b. Input, sliding window of n-grams
c. Bi-directional LSTMs
[Figure: (a) LSTM, (b) LSTM-LA, (c) bLSTM. Each maps the words w_0 … w_n through hidden states h_0 … h_n to the tags y_0 … y_n; the bLSTM uses forward states h^f and backward states h^b]
http://131.107.65.14/en-us/um/people/gzweig/Pubs/Interspeech2013RNNLU.pdf; http://dl.acm.org/citation.cfm?id=2876380
17
Slot Tagging
(Kurata et al., 2016; Simonnet et al., 2015)
◉ Encoder-decoder networks
○ Leverages sentence level information
◉ Attention-based encoder-decoder
○ Use of attention (as in MT) in the encoder-decoder network
○ Attention is estimated using a feed-forward network with input: h_t and s_t at time t
[Figure: encoder-decoder tagger reading the words in reverse order (w_n … w_0) and emitting y_0 … y_n; attention-based variant with encoder states h_0 … h_n, decoder states s_0 … s_n and context vector c_i]
http://www.aclweb.org/anthology/D16-1223
18
Joint Semantic Frame Parsing
◉ Sequence-based (Hakkani-Tur+, 2016)
◉ Parallel-based (Liu and Lane, 2016)
[Figure: an RNN tagger (weights U, W, V; hidden states h_{t-1}, h_t, h_{t+1}, h_{T+1}) reads “taiwanese food please EOS”; slot filling outputs B-type O O for the words and intent prediction outputs FIND_REST at EOS]
Model                                 Attention Mechanism   Intent-Slot Relationship
Sequence-based (Hakkani-Tur+, ‘16)    X                     Δ (Implicit)
Parallel-based (Liu & Lane, ‘16)      √                     Δ (Implicit)
Slot-Gated Joint Model                √                     √ (Explicit)
19
Slot-Gated Joint SLU
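(Goo+, 2018)
[Figure: a BLSTM encodes the word sequence x_1 … x_4; slot attention gives the slot context vectors c_i^S, intent attention gives the intent context vector c^I and the intent y^I; the slot gate combines c_i^S and c^I (through W, tanh and v) into g, which feeds the slot sequence y_1^S … y_4^S]
Slot Gate
$g = \sum v \cdot \tanh\left(c_i^S + W \cdot c^I\right)$
Slot Prediction
$y_i^S = \mathrm{softmax}\left(W^S \left(h_i + g \cdot c_i^S\right) + b^S\right)$
$g$ will be larger if slot and intent are better related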
20
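The two slot-gate formulas can be checked numerically with a small NumPy sketch. It assumes random toy vectors of dimension 4 and three slot labels; the function names, the shapes and the random initialisation are illustrative assumptions, not the authors' code.

```python
import numpy as np


def softmax(z):
    z = z - z.max()                      # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def slot_gate(c_slot, c_intent, v, W):
    """g = sum(v * tanh(c_i^S + W c^I)), one scalar per time step."""
    return float(np.sum(v * np.tanh(c_slot + W @ c_intent)))


def slot_prediction(h_i, c_slot, c_intent, v, W, W_S, b_S):
    """y_i^S = softmax(W^S (h_i + g * c_i^S) + b^S)."""
    g = slot_gate(c_slot, c_intent, v, W)
    return softmax(W_S @ (h_i + g * c_slot) + b_S)


rng = np.random.default_rng(0)           # toy values, for illustration only
d, n_slots = 4, 3
h_i, c_slot, c_intent, v = (rng.normal(size=d) for _ in range(4))
W = rng.normal(size=(d, d))
W_S = rng.normal(size=(n_slots, d))
b_S = np.zeros(n_slots)

y = slot_prediction(h_i, c_slot, c_intent, v, W, W_S, b_S)
print(y, y.sum())
```

Because g scales the slot context vector before the softmax, a larger g lets the attended slot context contribute more to the prediction.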
Contextual Language Understanding
◉
User utterances are highly ambiguous in isolation
Restaurant Booking
U: Book a table for 10 people tonight.
S: Which restaurant would you like to book a table for?
U: Cascal, for 6. (Is “6” the #people or the time?)
21
End-to-End Memory Networks
(Sukhbaatar et al, 2015)
[Figure: memory slots m_0 … m_i … m_{n-1} hold the dialogue history; u is the input vector]
U: “i d like to purchase tickets to see deepwater horizon”
S: “for which theatre”
U: “angelika”
S: “you want them for angelika theatre?”
U: “yes angelika”
S: “how many tickets would you like ?”
U: “3 tickets for saturday”
S: “What time would you like ?”
U: “Any time on saturday is fine”
S: “okay , there is 4:10 pm , 5:40 pm and 9:20 pm”
U: “Let’s do 5:40”
22
E2E MemNN for Contextual LU
(Chen+, 2016)
Idea: additionally incorporating contextual knowledge during slot tagging
1. Sentence Encoding: the history utterances {x_i} are encoded into memory representations m_i by the contextual sentence encoder RNN_mem, and the current utterance c is encoded into u by the sentence encoder RNN_in
2. Knowledge Attention: the inner product of u with each m_i gives the knowledge attention distribution p_i
3. Knowledge Encoding: the weighted sum h of the memories is passed through W_kg to give the knowledge encoding representation o
RNN Tagger: an RNN tagger (weights U, V, W, M; words w_{t-1}, w_t; hidden states h_{t-1}, h_t) outputs the slot tagging sequence y (y_{t-1}, y_t)
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/06/IS16_ContextualSLU.pdf
23
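The knowledge attention and knowledge encoding steps can be sketched with NumPy as below. The vectors are toy values, and combining the weighted sum with the current utterance as `W_kg @ (h + u)` is an assumption about how the two are merged; the slide itself only shows the weighted sum passing through W_kg.

```python
import numpy as np


def knowledge_attention(u, memories, W_kg):
    """Attend over history memories m_i with the current utterance vector u."""
    scores = memories @ u                 # inner product u . m_i per history utterance
    scores = scores - scores.max()        # numerical stability
    p = np.exp(scores) / np.exp(scores).sum()   # attention distribution p_i
    h = p @ memories                      # weighted sum of the memories
    o = W_kg @ (h + u)                    # knowledge encoding (assumed combination)
    return p, o


# Toy example: three history utterances encoded as 2-d vectors (made-up values).
memories = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
u = np.array([2.0, 0.0])
W_kg = np.eye(2)

p, o = knowledge_attention(u, memories, W_kg)
print(p, o)
```

History utterances whose encoding aligns with the current utterance receive more attention, which is the behaviour visualised by the attention weights on the following slide.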
E2E MemNN for Contextual LU
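(Chen+, 2016)
Current utterance: U: “Let’s do 5:40”
[Figure: knowledge attention weights over the history utterances below; the weights shown are 0.69, 0.13 and 0.16]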
U: “i d like to purchase tickets to see deepwater horizon”
S: “for which theatre”
U: “angelika”
S: “you want them for angelika theatre?”
U: “yes angelika”
S: “how many tickets would you like ?”
U: “3 tickets for saturday”
S: “What time would you like ?”
U: “Any time on saturday is fine”
S: “okay , there is 4:10 pm , 5:40 pm and 9:20 pm”
24
Recent Advances in NLP
◉
Contextual Embeddings (ELMo & BERT)
○ Boost performance on many understanding tasks with pre-trained language models
25
26
27
Robustness – Adapting to ASR
(Huang & Chen, 2019)
Stage 1: Pre-Training on Sequential Texts
[Figure: an LSTM language model followed by a linear layer reads “What a day” and predicts “a day <EOS>”]
Stage 2: Pre-Training on Lattices
[Figure: a LatticeLSTM followed by a linear layer is pre-trained on word lattices whose arcs carry probabilities (e.g. “the, 1.0”)]
Fine-Tuning
[Figure: LatticeLSTM ⇒ max pooling ⇒ classification]
28
Robustness – Adapting to ASR
(Huang & Chen, 2019)
◉ Idea: lattices may include correct words
◉ Goal: feed lattices into Transformer
[Figure: a word lattice for “<s> cheapest airfare to Milwaukee </s>” with competing arcs “airfare”, “fair” and “affair air” (arc probabilities 0.4, 0.3, 0.3; all other arcs 1) is fed as tokens <S> w_1 w_2 … w_{m-1} w_m <E> to a Transformer encoder, followed by a linear layer that predicts y]
SLU performance is improved by leveraging the lattices without increasing training/inference time
Chao-Wei Huang and Yun-Nung Chen, “Adapting Pretrained Transformer to Lattices for Spoken Language Understanding,” in Proceedings of 2019 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2019.
29
Robustness – Adapting to ASR
(Huang & Chen, 2019)
Chao-Wei Huang and Yun-Nung Chen, “Learning ASR-Robust Contextualized Embeddings for Spoken Language Understanding,” in ICASSP, 2019.
The contextual embeddings of the recognized texts should be similar to those of the ground-truth transcripts.
◉
Confusion-Aware Fine-Tuning
○
Supervised
○
Unsupervised
30
Scalability – Multilingual LU
(Upadhyay+, 2018)
◉ Source language: English (full annotations)
◉ Target language: Hindi (limited annotations)
RT: round trip, FC: from city, TC: to city, DDN: departure day name
http://shyamupa.com/papers/UFTHH18.pdf
31
Scalability – Multilingual LU
(Upadhyay+, 2018)
Train on Target (Lefevre et al, 2010): English Train ⇒ MT ⇒ Hindi Train ⇒ Hindi Tagger; Hindi Test ⇒ Hindi Tagger ⇒ SLU Results
Test on Source (Jabaian et al, 2011): Hindi Test ⇒ MT ⇒ English Test ⇒ English Tagger ⇒ SLU Results
Joint Training: English Train (Large) + Hindi Train (Small) ⇒ Bilingual Tagger; Hindi Test ⇒ Bilingual Tagger ⇒ SLU Results
With joint training, no MT system is required and both languages can be processed by a single model
http://shyamupa.com/papers/UFTHH18.pdf
32
LU Evaluation
◉
Metrics
○
Sub-sentence-level: domain/intent accuracy, slot F1
○
Sentence-level: whole frame accuracy
Utterance: For 2 people thanks
Slot: O B-people O O ⇒ Slot-F1
Domain: Hotel ⇒ Acc
Intent: Hotel_Book ⇒ Acc
Whole frame (domain, intent and slots) ⇒ Frame Accuracy
33
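The three LU metrics can be computed as in the sketch below. It assumes chunk-level slot F1 over BIO tags and treats a frame as correct only when the intent and all tags match; the function names are hypothetical, and the second utterance (with its wrong prediction) is made up for illustration.

```python
def spans(tags):
    """Return the set of (start, end, slot) chunks encoded by a BIO sequence."""
    found, start, name = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):    # sentinel closes the last chunk
        inside = tag.startswith("I-") and tag[2:] == name
        if name is not None and not inside:
            found.add((start, i, name))
            start, name = None, None
        if tag.startswith("B-"):
            start, name = i, tag[2:]
    return found


def evaluate(gold, pred):
    """gold / pred: lists of dicts with keys 'intent' and 'tags'."""
    n = len(gold)
    pairs = list(zip(gold, pred))
    intent_acc = sum(g["intent"] == p["intent"] for g, p in pairs) / n
    tp = n_gold = n_pred = 0
    for g, p in pairs:
        gs, ps = spans(g["tags"]), spans(p["tags"])
        tp, n_gold, n_pred = tp + len(gs & ps), n_gold + len(gs), n_pred + len(ps)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    frame_acc = sum(g["intent"] == p["intent"] and g["tags"] == p["tags"]
                    for g, p in pairs) / n
    return {"intent_acc": intent_acc, "slot_f1": f1, "frame_acc": frame_acc}


gold = [{"intent": "Hotel_Book", "tags": ["O", "B-people", "O", "O"]},
        {"intent": "Hotel_Book", "tags": ["B-star", "O"]}]      # second item is made up
pred = [{"intent": "Hotel_Book", "tags": ["O", "B-people", "O", "O"]},
        {"intent": "Hotel_Book", "tags": ["O", "O"]}]           # misses the slot
scores = evaluate(gold, pred)
print(scores)
```

In this toy case the intent accuracy stays perfect while one missed slot lowers both the slot F1 and the frame accuracy, which shows why sentence-level frame accuracy is the stricter metric.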
Modular Task-Oriented Dialogue Systems
Dialogue State Tracking
34
Dialogue State Tracking
◉
DST is a dialogue-level task that maps partial dialogues into dialogue states.
○
Input: a dialogue / a turn with its previous state
○
Output: dialogue state (e.g. slot-value pairs)
User: Can you help me book a 5-star hotel on Sunday?
State: Hotel_Book ( star=5, day=sunday )
User: For two people, thanks!
NLU: people_num=2
State: Hotel_Book ( star=5, day=sunday, people_num=2 )
[Pipeline: NLU ⇒ DST ⇒ DP ⇒ NLG]
DST: Dialogue State Tracking
35
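The state update in this example can be written as a simple rule-based merge. This is only a sketch of the input/output contract (previous state plus turn-level frame in, new state out); it is not a learned tracker, and `update_state` is a hypothetical helper.

```python
def update_state(prev_state, turn_frame):
    """Return a new dialogue state: this turn's slots overwrite earlier values."""
    state = {
        "intent": turn_frame.get("intent", prev_state.get("intent")),
        "slots": dict(prev_state.get("slots", {})),   # copy, do not mutate the input
    }
    state["slots"].update(turn_frame.get("slots", {}))
    return state


state = {}
# Turn 1: "Can you help me book a 5-star hotel on Sunday?"
state = update_state(state, {"intent": "Hotel_Book",
                             "slots": {"star": "5", "day": "sunday"}})
# Turn 2: "For two people, thanks!"
state = update_state(state, {"slots": {"people_num": "2"}})
print(state)
```

The second turn carries no intent of its own, so the tracker keeps the intent from the first turn and only adds the new slot.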
Dialogue State Tracking
request (restaurant; foodtype=Thai)
inform (area=centre)
request (address)
bye ()
36
Dialogue State Tracking
Requires Hand-Crafted States
User: find a good eating place for taiwanese food
User: i want it near to my office
Intelligent Agent
Hand-crafted states: NULL; location; rating; type; loc, rating; rating, type; loc, type; all
37
Dialogue State Tracking
Handling Errors and Confidence
User: find a good eating place for taixxxx food
Intelligent Agent, candidate semantic frames:
FIND_RESTAURANT rating=“good” type=“taiwanese”
FIND_RESTAURANT rating=“good” type=“thai”
FIND_RESTAURANT rating=“good”
Hand-crafted states: NULL; location; rating; type; loc, rating; rating, type; loc, type; all
Which state? rating=“good”, type=“thai” or rating=“good”, type=“taiwanese”
39
DST Problem Formulation
◉
The DST dataset consists of
○
Goal: for each informable slot
■ e.g. price=cheap
○
Requested: slots by the user
■ e.g. moviename
○
Method: search method for entities
■ e.g. by constraints, by name
◉
The dialogue state is
○
the distribution over possible slot-value pairs for goals
○
the distribution over possible requested slots
○
the distribution over possible methods
40
Dialogue State Tracking (DST)
◉
Maintain a probabilistic distribution instead of a 1-best prediction for better robustness to SLU errors or ambiguous input
S: How can I help you?
U: Book a table at Sumiko for 5
Slot       Value
# people   5 (0.5)
time       5 (0.5)
S: How many people?
U: 3
Slot       Value
# people   3 (0.8)
time       5 (0.8)
41
Multi-Domain Dialogue State Tracking
◉
A full representation of the system's belief of the user's goal at any point during the dialogue
◉
Used for making API calls
U: I wanna buy two tickets for tonight at the Shoreline theater.
S: Which movie are you interested in?
U: Inferno.
[Figure: belief state for the Movies domain, with candidate values shaded from less likely to more likely]
Movies
Date: 11/15/17
Time: 6 pm, 7 pm, 8 pm, 9 pm
#People: 2
Theater: Century 16 Shoreline
Movie: Inferno
42
Multi-Domain Dialogue State Tracking
◉
A full representation of the system's belief of the user's goal at any point during the dialogue
◉
Used for making API calls
U: I wanna buy two tickets for tonight at the Shoreline theater.
S: Which movie are you interested in?
U: Inferno.
S: Inferno showtimes at Century 16 Shoreline are 6:30pm, 7:30pm, 8:45pm and 9:45pm. What time do you prefer?
U: We'd like to eat dinner before the movie at Cascal, can you check what time I can get a table?
[Figure: belief states for the Movies and Restaurants domains, with candidate values shaded from less likely to more likely]
Movies
Date: 11/15/17
Time: 6:30 pm, 7:30 pm, 8:45 pm, 9:45 pm
#People: 2
Theater: Century 16 Shoreline
Movie: Inferno
Restaurants
Date: 11/15/17
Time: 6:00 pm, 6:30 pm, 7:00 pm
Restaurant: Cascal
#People: 2
43
Multi-Domain Dialogue State Tracking
◉
A full representation of the system's belief of the user's goal at any point during the dialogue
◉
Used for making API calls
U: Inferno.
S: Inferno showtimes at Century 16 Shoreline are 6:30pm, 7:30pm, 8:45pm and 9:45pm. What time do you prefer?
U: We'd like to eat dinner before the movie at Cascal, can you check what time I can get a table?
S: Cascal has a table for 2 at 6pm and 7:30pm.
U: OK, let me get the table at 6 and tickets for the 7:30 showing.
[Figure: belief states for the Movies and Restaurants domains, with candidate values shaded from less likely to more likely]
Movies
Date: 11/15/17
Time: 6:30 pm, 7:30 pm, 8:45 pm, 9:45 pm
#People: 2
Theater: Century 16 Shoreline
Movie: Inferno
Restaurants
Date: 11/15/17
Time: 6:00 pm, 6:30 pm, 7:00 pm
Restaurant: Cascal
#People: 2
44
Discriminative DST – Single Turn
Data
• Observations labeled w/ dialogue state
Model
• Neural networks
• Ranking models
Prediction
• Distribution over dialogue states – Dialogue State Tracking
45
DNN for DST
[Figure: multi-turn conversation ⇒ feature extraction ⇒ DNN ⇒ a slot value distribution for each slot (state of this turn)]
46
Discriminative DST – Multiple Turns
Data
• Sequence of observations labeled w/ dialogue states
Model
• Sequential models
– Recurrent neural networks (RNN)
Prediction
• Distribution over dialogue states – Dialogue State Tracking
47
Recurrent Neural Network (RNN)
◉
Elman-type
◉
Jordan-type
48
RNN-Based DST
◉
Idea: internal memory for representing dialogue context
○ Input
■ most recent dialogue turn
■ last machine dialogue act
■ dialogue state
■ memory layer
○ Output
■ update its internal memory
■ distribution over slot values
49
RNN-CNN DST
(Mrkšić+, 2015)
(Figure from Wen et al, 2016)
http://www.anthology.aclweb.org/W/W13/W13-4073.pdf; https://arxiv.org/abs/1506.07190; https://arxiv.org/abs/1606.03777
50
Global-Locally Self-Attentive DST
(Zhong+, 2018)
◉ More advanced encoder
○ Global modules share parameters for all slots
○ Local modules learn slot-specific feature representations
http://www.aclweb.org/anthology/P18-1135
51
Generative DST
●
Generating the state as a sequence (Lei+, 2018) or dialogue state updates (Lin+, 2020)
(Dialogue history) ⇒ (slot1=val,slot2=val …)
●
Given a dialogue and a slot, generate the value of the slot (Wu+, 2019; Gao+, 2019; Ren+, 2019; Zhou & Small, 2019; Kim+, 2019; Le+, 2020) ⇒ requires multiple forward passes, one per slot
(Dialogue history, slot1) ⇒ val
52
Handling Unknown Slot Values
(Xu & Hu, 2018)
◉ Issue: fixed value sets in DST
Pointer networks for generating unknown values
[Figure: for the slot <food>, the tracker chooses among “other”, “dontcare” and “none”; for “other”, an attention distribution over the dialogue “<sys> would you like some Thai food <usr> I prefer Italian one” points to the value “Italian”]
http://aclweb.org/anthology/P18-1134
53
TRADE: Transferable DST
(Wu+, 2019)
[Figure: an utterance encoder reads the dialogue (… Bot: Which area are you looking for the hotel? User: There is one at east town called Ashley Hotel.); for each domain (Hotel, Train, Attraction, Restaurant, Taxi) and slot (Price, Area, Day, Departure, name, LeaveAt, food, etc.), e.g. hotel and name, a slot gate over the context vector predicts PTR, NONE or DONTCARE, and the state generator generates the value (Ashley)]
54
TripPy: Handling OOV & Rare Values
(Heck+, 2020)
55
DST Evaluation
◉
Dialogue State Tracking Challenges
○ DSTC2-3, human-machine
○ DSTC4-5, human-human
○ DSTC8, human-machine
◉
Metric
○ Tracked state accuracy with respect to user goal
○ Recall/Precision/F-measure of individual slots
Input Dialogue:
USER: Can you help me book a 5-star hotel on Sunday?
SYSTEM: For how many people?
USER: For two people, thanks!
Output Dialogue State:
Turn 1: Hotel_Book (star=5, day=sunday)
Turn 2: Hotel_Book (star=5, day=sunday, people_num=2)
⇒ Slot Acc / Joint Acc
56
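Slot accuracy and joint accuracy for the example above can be computed as in this sketch. It assumes that states are plain slot-value dictionaries and that an absent slot counts as correct when it is absent in both gold and prediction; the wrong prediction in the second turn is made up for illustration.

```python
def dst_accuracy(gold_states, pred_states, slot_names):
    """Return (slot accuracy, joint accuracy) over a list of turns."""
    slot_hits = joint_hits = 0
    for gold, pred in zip(gold_states, pred_states):
        hits = sum(gold.get(s) == pred.get(s) for s in slot_names)
        slot_hits += hits
        joint_hits += hits == len(slot_names)   # every slot must be right
    turns = len(gold_states)
    return slot_hits / (turns * len(slot_names)), joint_hits / turns


slot_names = ["star", "day", "people_num"]
gold = [{"star": "5", "day": "sunday"},
        {"star": "5", "day": "sunday", "people_num": "2"}]
pred = [{"star": "5", "day": "sunday"},
        {"star": "5", "day": "sunday", "people_num": "3"}]   # hypothetical error
slot_acc, joint_acc = dst_accuracy(gold, pred, slot_names)
print(slot_acc, joint_acc)
```

A single wrong slot value costs one sixth of the slot accuracy here but half of the joint accuracy, since the whole second turn counts as wrong.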
Dialog State Tracking Challenge (DSTC)
(Williams et al. 2013, Henderson et al. 2014, Henderson et al. 2014, Kim et al. 2016, Kim et al. 2016)
Challenge  Type           Domain               Data Provider  Main Theme
DSTC1      Human-Machine  Bus Route            CMU            Evaluation Metrics
DSTC2      Human-Machine  Restaurant           U. Cambridge   User Goal Changes
DSTC3      Human-Machine  Tourist Information  U. Cambridge   Domain Adaptation
DSTC4      Human-Human    Tourist Information  I2R            Human Conversation
DSTC5      Human-Human    Tourist Information  I2R            Language Adaptation
57
DSTC4-5
◉ Type: Human-Human
◉ Domain: Tourist Information
Tourist: Can you give me some uh- tell me some cheap rate hotels, because I'm planning just to leave my bags there and go somewhere take some pictures.
Guide: Okay. I'm going to recommend firstly you want to have a backpack type of hotel, right?
Tourist: Yes. I'm just gonna bring my backpack and my buddy with me. So I'm kinda looking for a hotel that is not that expensive. Just gonna leave our things there and, you know, stay out the whole day.
Guide: Okay. Let me get you hm hm. So you don't mind if it's a bit uh not so roomy like hotel because you just back to sleep.
Tourist: Yes. Yes. As we just gonna put our things there and then go out to take some pictures.
Guide: Okay, um-
Tourist: Hm.
Guide: Let's try this one, okay?
Tourist: Okay.
Guide: It's InnCrowd Backpackers Hostel in Singapore.
If you take a dorm bed per person only twenty dollars. If you take a room, it's two single beds at fifty nine dollars.
Tourist: Um. Wow, that's good.
Guide: Yah, the prices are based on per person per bed or dorm. But this one is room. So it should be fifty nine for the two room. So you're actually paying about ten dollars more per person only.
Tourist: Oh okay. That's- the price is reasonable actually. It's good.
{Topic: Accommodation; Type: Hostel; Pricerange: Cheap; GuideAct: ACK; TouristAct: REQ}
{Topic: Accommodation; NAME: InnCrowd Backpackers Hostel; GuideAct: REC; TouristAct: ACK}
58
Multi-Domain DST Data
◉
MultiWoZ 2.0 ⇒ 2.1 ⇒ 2.2 ⇒ 2.3 ⇒ ……
◉
SGD: natural language described schema for better scalability
59
MultiWOZ 2.1 Leaderboard
60
Modular Task-Oriented Dialogue Systems
Dialogue Policy Learning
61
Dialogue Policy Learning
◉
DP decides the system action for interacting with users based on dialogue states.
○
Input: dialogue state + KB results
○
Output: system action (speech-act + slot-value pairs)
62
[Pipeline: NLU ⇒ DST ⇒ DP ⇒ NLG]
User: Can you help me book a 5-star hotel on Sunday?
User: For two people, thanks!
Dialogue state: Hotel_Book ( star=5, day=sunday, people_num=2 ), plus KB results
DP: Dialogue Policy Learning
System action: Inform ( hotel_name=B&B )
Dialogue Policy Learning
request (restaurant; foodtype=Thai)
inform (area=centre)
request (address)
bye ()
greeting ()
request (area)
inform (restaurant=Bangkok city, area=centre of town, foodtype=Thai)
inform (address=24 Green street)
63
Supervised vs. Reinforcement
◉ Supervised: learning from teacher
“Hello” ⇒ Say “Hi”
“Bye bye” ⇒ Say “Good bye”
◉ Reinforcement: learning from critics
[Figure: the agent holds a whole conversation (“Hello ☺” … “OXX???” … “!”) and only afterwards receives the feedback: Bad]
64
Dialogue Policy Optimization
◉
Dialogue management in a RL framework
[Figure: the user is the environment; the dialogue manager is the agent. The agent receives observation O through language understanding and reward R, and sends action A through natural language generation]
Select the best action that maximizes the future reward
65
Reward for RL ≅ Evaluation for System
◉
Dialogue is a special RL task
● Humans are involved in both the interaction and the rating (evaluation) of a dialogue
● Fully human-in-the-loop framework
◉
Rating: correctness, appropriateness, and adequacy
- Expert rating: high quality, high cost
- User rating: unreliable quality, medium cost
- Objective rating: checks desired aspects, low cost
66
Dialogue Reinforcement Learning Signal
◉
Typical reward function
○ Per-turn penalty of -1
○ Large reward at completion if successful
◉
Typically requires domain knowledge
○ ✔ Simulated user
○ ✔ Paid users (Amazon Mechanical Turk)
○ ✖ Real users
The user simulator is usually required for dialogue system training before deployment
67
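The typical reward function described on this slide can be sketched as follows. The per-turn penalty of -1 comes from the slide; the success reward of 40 and the failure reward of 0 are hypothetical values chosen only for illustration.

```python
def turn_reward(done, success, turn_penalty=-1, success_reward=40, failure_reward=0):
    """Reward for one turn: a small penalty, plus a final bonus when the dialogue ends."""
    if not done:
        return turn_penalty
    return turn_penalty + (success_reward if success else failure_reward)


def episode_return(num_turns, success, **kwargs):
    """Undiscounted sum of the turn rewards over a dialogue of num_turns turns."""
    return sum(turn_reward(t == num_turns - 1, success, **kwargs)
               for t in range(num_turns))


print(episode_return(5, True))    # short successful dialogue
print(episode_return(10, True))   # longer successful dialogue
print(episode_return(5, False))   # failed dialogue
```

With this shape of reward, a policy is pushed both to complete the task and to do so in as few turns as possible.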
Neural Dialogue Manager
(Li et al., 2017)
◉ Deep Q-network for training dialogue policy
○
Input: current semantic frame observation, database returned results
○
Output: system action
Example: Semantic Frame request_movie (genre=action, date=this weekend) ⇒ System Action/Policy request_location
[Figure: the DQN-based Dialogue Management (DM) module interacts with a simulated user and a backend DB]
https://arxiv.org/abs/1703.01008
68
E2E Task-Completion Bot (TC-Bot)
(Li et al., 2017)
Idea: SL for each component and RL for end-to-end training of the neural dialogue system
[Figure: a user simulator (user goal, user model, NLG producing w_0 w_1 w_2 EOS) talks to the neural dialogue system. LU tags each word w_i with a <slot> label and predicts the <intent> at EOS, DST tracks the states s_1, s_2, …, s_n (s_t), and dialogue policy learning selects among the actions a_1, a_2, …, a_k using the knowledge database. Example: “Are there any action movies to see this weekend?” ⇒ request_location]
User goal – Two tickets for “Deadpool” tomorrow 9PM at AMC Pacific Place 11 theater, Seattle.
RULE BASED SYSTEM
User: Do you have any ticket for movie ‘Deadpool’ at 9:00 pm?
Agent: Which theater do you want?
User: AMC pacific place 11 theater, please!
Agent: How many tickets do you want?
User: Two!
Agent: What date?
User: Tomorrow!
Agent: Which city do you want?
User: Seattle!
Agent: Do you have any specification for zip code?
User: No, I do not care.
Agent: Do you have any distance constraint?
User: No, I do not care.
Agent: How about video format? 2d? Imax 3d?
User: Any kind is okay.
Agent: Do you have any constraint to price?
User: No!
Agent: Okay, I will issue 2 tickets for you, tomorrow 9:00 pm at AMC pacific place 11 theater, Seattle, movie
‘Deadpool’.
User: Okay, thanks.
Agent: Thanks!
REINFORCEMENT LEARNING SYSTEM
User: I want two tickets for movie ‘Deadpool’.
Agent: What is the starttime you prefer?
User: 9:00 PM
Agent: Which day?
User: Tomorrow
Agent: Which city?
User: Seattle
Agent: Which theater do you prefer?
User: AMC Pacific Place 11 theater.
Agent: Okay, I will issue 2 tickets for you, tomorrow 9:00 pm at AMC pacific place 11 theater, Seattle, movie ‘Deadpool’.
User: Okay, thanks.
Agent: Thanks!
Reinforcement Learning Agent
(Li et al., 2017)
Rule-based system issue: no notion about what requests can be skipped
Reinforcement learning system: skip the requests the user may not care about to improve efficiency
70
Online Training
(Su+, 2015; Su+, 2016)
◉ Policy learning from real users
○ Infer reward directly from dialogues (Su et al., 2015)
○ User rating (Su et al., 2016)
◉
Reward modeling on user binary success rating
[Figure: an embedding function maps the dialogue to a dialogue representation; the reward model predicts success/fail, queries the user rating when needed, and provides the reinforcement signal]
71
Interactive RL for DP
(Shah+, 2016)
Immediate Feedback
https://research.google.com/pubs/pub45734.html
Use a third agent for providing interactive feedback to the policy
72
Planning – Deep Dyna-Q
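(Peng+, 2018)
◉ Idea: learning with real users with planning
[Figure: the policy model is initialised from human conversational data by imitation learning; acting with the user produces real experience, which is used for direct reinforcement learning of the policy and for world model learning; the world model is initialised by supervised learning and generates simulated experience for planning]
Policy learning suffers from the poor quality of fake experiences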
73
Robust Planning – D3Q (Su+, 2018)
◉ Idea: add a discriminator to filter out the bad experiences
[Figure: as in Deep Dyna-Q, the policy model (imitation learning from human conversational data) acts with the user to collect real experience for direct reinforcement learning and world model learning; the world model (supervised learning) generates simulated experience, and a discriminator obtained by discriminative training keeps only high-quality simulated experience for controlled planning]
[Figure: system view with NLU, DST (semantic frame ⇒ state representation), policy learning (system action) and NLG; the discriminator sits between the world model's simulated experience and the user's real experience]
S.-Y. Su, X. Li, J. Gao, J. Liu, and Y.-N. Chen, “Discriminative Deep Dyna-Q: Robust Planning for Dialogue Policy Learning," (to appear) in Proc. of EMNLP, 2018.
74
Robust Planning – D3Q
(Su+, 2018)
Policy learning is more robust and shows improvement in human evaluation
75
Multi-Domain – Hierarchical RL
(Peng+, 2017)
Travel Planning
[Figure: travel planning actions]
• Set of tasks that need to be fulfilled collectively!
• Build a DM for cross-subtask constraints (slot constraints)
• Temporally constructed goals
  • hotel_check_in_time > departure_flight_time
  • #flight_tickets = #people checking in the hotel
  • hotel_check_out_time < return_flight_time
https://arxiv.org/abs/1704.03084
76
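The three cross-subtask constraints listed for travel planning can be checked programmatically, as in the sketch below. The dictionary keys (`departure`, `return`, `tickets`, `check_in`, `check_out`, `people`) and the dates are hypothetical; only the three constraints themselves come from the slide.

```python
from datetime import datetime


def violated_constraints(flight, hotel):
    """Return the names of the slot constraints that the two bookings break."""
    checks = {
        "hotel_check_in_time > departure_flight_time":
            hotel["check_in"] > flight["departure"],
        "#flight_tickets = #people checking in the hotel":
            flight["tickets"] == hotel["people"],
        "hotel_check_out_time < return_flight_time":
            hotel["check_out"] < flight["return"],
    }
    return [name for name, ok in checks.items() if not ok]


flight = {"departure": datetime(2021, 6, 1, 9, 0),
          "return": datetime(2021, 6, 5, 18, 0),
          "tickets": 2}
hotel = {"check_in": datetime(2021, 6, 1, 15, 0),
         "check_out": datetime(2021, 6, 5, 11, 0),
         "people": 2}

print(violated_constraints(flight, hotel))                   # consistent plan
print(violated_constraints(flight, {**hotel, "people": 3}))  # ticket mismatch
```

A dialogue manager that handles the flight and hotel subtasks separately has no way to enforce such checks, which is the motivation for coordinating the subtasks in one hierarchical policy.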
Multi-Domain – Hierarchical RL
(Peng+, 2017)
◉ Model makes decisions over two levels: meta-controller & controller
◉ The agent learns these policies simultaneously
○ Meta-Controller: policy of optimal sequence of goals to follow, $\pi_g(g_t, s_t; \theta_1)$
○ Controller: policy $\pi_{a,g}(a_t, g_t, s_t; \theta_2)$ for each sub-goal $g_t$
(mitigate reward sparsity issues)
Multiple policies need to collaborate with each other for better multi-domain interactions
77
Dialogue Policy Evaluation
◉
Metrics
○
Turn-level evaluation: system action accuracy
○
Dialogue-level evaluation: task success rate, reward
78
Dialogue State: Hotel_Book ( star=5, day=sunday, people_num=2 )
KB State: rest1=B&B
System Action: inform ( hotel_name=B&B )
Modular Task-Oriented Dialogue Systems
Natural Language Generation
79
Natural Language Generation
◉
NLG maps system actions to natural language responses.
○
Input: system speech-act + slot-value (optional)
○
Output: natural language response
80
[Pipeline: NLU ⇒ DST ⇒ DP ⇒ NLG]
User: Can you help me book a 5-star hotel on Sunday?
User: For two people, thanks!
System action: Inform ( hotel_name=B&B )
NLG: Natural Language Generation
System response: I have booked hotel B&B for you.
Template-Based NLG
◉
Define a set of rules to map frames to natural language
Pros: simple, error-free, easy to control
Cons: time-consuming, rigid, poor scalability
Semantic Frame              Natural Language
confirm()                   “Please tell me more about the product you are looking for.”
confirm(area=$V)            “Do you want somewhere in the $V?”
confirm(food=$V)            “Do you want a $V restaurant?”
confirm(food=$V,area=$W)    “Do you want a $V restaurant in the $W?”
81
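A template-based generator for the four rules above fits in a few lines of Python. This sketch uses named placeholders (`$food`, `$area`) instead of the slide's `$V` and `$W` so that `string.Template` can fill them; the `TEMPLATES` table and the `realize` helper are illustrative assumptions.

```python
from string import Template

# One template per (speech act, sorted slot names) pair, copied from the rules above.
TEMPLATES = {
    ("confirm", ()): "Please tell me more about the product you are looking for.",
    ("confirm", ("area",)): "Do you want somewhere in the $area?",
    ("confirm", ("food",)): "Do you want a $food restaurant?",
    ("confirm", ("area", "food")): "Do you want a $food restaurant in the $area?",
}


def realize(act, **slots):
    """Look up the template for this frame and substitute the slot values."""
    key = (act, tuple(sorted(slots)))
    if key not in TEMPLATES:
        raise KeyError(f"no template for {key}")
    return Template(TEMPLATES[key]).substitute(slots)


print(realize("confirm"))
print(realize("confirm", food="Thai"))
print(realize("confirm", food="Thai", area="centre"))
```

Every new combination of act and slots needs its own entry in the table, which is the scalability cost named in the cons above.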
RNN-Based LM NLG
(Wen et al., 2015)
Dialogue act: Inform(name=Din Tai Fung, food=Taiwanese)
Dialogue act 1-hot representation: 0, 0, 1, 0, 0, …, 1, 0, 0, …, 1, 0, 0, 0, 0, 0…
Sentence: <BOS> Din Tai Fung serves Taiwanese .
After delexicalisation: <BOS> SLOT_NAME serves SLOT_FOOD .
Input: <BOS> SLOT_NAME serves SLOT_FOOD .
Output: SLOT_NAME serves SLOT_FOOD . <EOS>
The generation is conditioned on the dialogue act
82
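Delexicalisation and its inverse can be sketched with plain string replacement. The helper names are hypothetical, and exact string matching is an assumption made for brevity; it would need more care when one slot value is a substring of another.

```python
def delexicalise(sentence, slots):
    """Replace each slot value in the sentence by its SLOT_<NAME> placeholder."""
    for name, value in slots.items():
        sentence = sentence.replace(value, f"SLOT_{name.upper()}")
    return sentence


def relexicalise(template, slots):
    """Fill the SLOT_<NAME> placeholders of a generated template with real values."""
    for name, value in slots.items():
        template = template.replace(f"SLOT_{name.upper()}", value)
    return template


slots = {"name": "Din Tai Fung", "food": "Taiwanese"}
template = delexicalise("Din Tai Fung serves Taiwanese .", slots)
print(template)
print(relexicalise(template, slots))
```

Training the language model on delexicalised sentences lets it share one pattern across all restaurant names and food types, and the real values are put back after generation.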
Semantic Conditioned LSTM
(Wen et al., 2015)
◉ Issue: semantic repetition
○ Din Tai Fung is a great Taiwanese restaurant that serves Taiwanese.
○ Din Tai Fung is a child friendly restaurant, and also allows kids.
Idea: using gate mechanism to control the generated semantics (dialogue act/slots)
[Figure: an LSTM cell (inputs x_t and h_{t-1}; gates i_t, f_t, o_t; cell C_t; output h_t) is extended with a DA cell whose gate r_t, computed from x_t and h_{t-1}, updates the dialogue act vector d_{t-1} to d_t; d_0 is the dialog act 1-hot representation, e.g. Inform(name=Seven_Days, food=Chinese) ⇒ 0, 0, 1, 0, 0, …, 1, 0, 0, …, 1, 0, 0, …]
83
Structural NLG
(Sharma+, 2017; Nayak+, 2017)
◉ Delexicalized slots do not consider the word level information
◉
Slot value-informed sequence to sequence models
84
Contextual NLG
(Dušek and Jurčíček, 2016)
◉ Goal: adapt to the user's way of speaking and provide context-aware responses
○
Context encoder
○
Seq2Seq model
https://www.aclweb.org/anthology/W/W16/W16-36.pdf#page=203
85
Controlled Text Generation
(Hu et al., 2017)
◉ Idea: NLG based on generative adversarial network (GAN) framework
○
c: targeted sentence attributes
86
Issues in NLG
◉
Issue
○
NLG tends to generate shorter sentences
○
NLG may generate grammatically-incorrect sentences
◉
Solution
○
Generate word patterns in an order
○
Consider linguistic patterns
87
Hierarchical NLG w/ Linguistic Patterns
(Su et al., 2018)
Input Semantics: name[Midsummer House], food[Italian], priceRange[moderate], near[All Bar One]
ENCODER: the semantic 1-hot representation [ … 1, 0, 0, 1, 0, …] is encoded by a bidirectional GRU encoder into $h_{enc}$
Hierarchical Decoder (GRU decoders)
DECODING LAYER1 (1. NOUN + PROPN + PRON): All Bar One place it Midsummer House
DECODING LAYER2 (2. VERB): All Bar One is priced place it is called Midsummer House
DECODING LAYER3 (3. ADJ + ADV): All Bar One is moderately priced Italian place it is called Midsummer House
DECODING LAYER4 (4. Others): Near All Bar One is a moderately priced Italian place it is called Midsummer House
Decoder inputs: output from last layer $y_t^{i-1}$, last output $y_{t-1}^{i}$
1. Repeat-input
2. Inner-Layer Teacher Forcing
3. Inter-Layer Teacher Forcing
4. Curriculum Learning
88
Fine-Tuning Pre-Trained GPT-2
◉
Fine-tuning for conditional generation
89
Pre-trained models have better capability of generating fluent sentences
NLG Evaluation
◉
Automatic metrics
◉
Human evaluation
90
System Action: inform(name=B&B)
System Response: I have booked hotel B&B for you.
Automatic Evaluation
◉
Perplexity ⇒ how likely the model is to generate the gold response
◉
N-gram overlapping ⇒ BLEU etc.
◉
Slot error rate ⇒ whether the given slots are mentioned
◉
Distinct N-grams ⇒ response diversity
91
Model Response: Do you have any other plans this weekend?
Gold Response: What do you do in the coming days?
Scorer ⇒ Score
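Two of the automatic metrics listed above, distinct n-grams and slot error rate, are simple enough to sketch directly. Whitespace tokenisation and exact string matching of slot values are simplifying assumptions, this version of the slot error rate counts only missing slots, and the example responses are made up for illustration.

```python
def distinct_n(responses, n):
    """Ratio of unique n-grams to all n-grams over a set of responses."""
    ngrams = []
    for response in responses:
        tokens = response.lower().split()
        ngrams += [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


def slot_error_rate(response, slot_values):
    """Fraction of the given slot values that the response fails to mention."""
    missing = sum(value.lower() not in response.lower() for value in slot_values)
    return missing / len(slot_values) if slot_values else 0.0


responses = ["i do not know", "i do not care"]
print(distinct_n(responses, 1), distinct_n(responses, 2))
print(slot_error_rate("I have booked hotel B&B for you.", ["B&B"]))
print(slot_error_rate("I have booked a hotel for you.", ["B&B"]))
```

Both metrics need no gold response, so they complement perplexity and BLEU, which compare the output against a reference.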
Human Evaluation Likert
◉
Judges are asked to give ratings 0-5 according to “Humanness, Fluency and Coherence”
92
Dialogue History: I could teach a few classes this weekend and I don’t know what to do
Model Response: Do you have any other plans this weekend?
Human Evaluator ⇒ Likert: Humanness, Fluency, Coherency
Human Evaluation Dynamic Likert
◉
A human judge interacts with the model and gives ratings 0-5 according to “Humanness, Fluency and Coherence”
93
[Figure: the human evaluator converses with the model and, after the conversation, gives Likert ratings for Humanness, Fluency and Coherency]
ACUTE-EVAL (Li et al., 2019)
Human Evaluation A/B
◉
Judges are asked to choose the best one according to “Humanness, Fluency and Coherence”
94
Dialogue History (Human)
Model A Response: Do you have any other plans this weekend?
Model B Response: I don’t know
Human Evaluator ⇒ A / B Testing: Humanness, Fluency, Coherency