Slido: #ADL2021
End-to-End Conversational AI
Applied Deep Learning
June 7th, 2021 http://adl.miulab.tw
Slides credited from NeurIPS 2020 Tutorial
Why and When Do We Need Conversational AI?
“I want to chat” → Turing Test (talk like a human)
“I have a question” → Information consumption
“I need to get this done” → Task completion
“What should I do?” → Decision support
Social Chit-Chat vs. Task-Oriented Dialogues
• Is this course good to take?
• Book me the train ticket from Kaohsiung to Taipei
• Reserve a table at Din Tai Fung for 5 people, 7PM tonight
• Schedule a meeting with Vivian at 10:00 tomorrow
• What is today’s agenda?
• What does NLP stand for?
Two Branches of Conversational AI
Chit-Chat
Task-Oriented
Vanilla Seq2Seq ConvAI: How
A simple 4-step recipe:
1. Choose the data: Human-to-human conversations
2. Choose the model: Large pre-trained language models are preferable
3. Train the model with the data: Supervised learning
4. Evaluate your model: Automatic or human evaluation
Human1: Ok, I’ll try that.
Human2: Is there anything else bothering you?
Human1: Just one more thing. A school called me this morning to see if I could teach a few classes this weekend and I don’t know what to do.
Human2: Do you have any other plan this weekend?
Human1: I’m supposed to work on a paper that’s due on Monday.
Human-to-Human Conversations:
● Daily Dialog
● Ubuntu Dialogue Corpus
● Twitter Conversations
● Reddit Conversational Data
● OpenSubtitles
These datasets are pre-processed to
have only 2 speakers ⇒ usually no more than 2 turns
Vanilla Seq2Seq ConvAI: Datasets
Vanilla Seq2Seq conversational model (Vinyals and Le, 2015; Shang et al., 2015)
Causal Decoder (Wolf et al., 2019; Radford et al., 2018)
Vanilla Seq2Seq ConvAI: Models
Vanilla Seq2Seq ConvAI: Supervised Learning
Maximum Likelihood Estimation (MLE)
⇒ maximizing the conditional probability of the response given the history
⇒ The model output is a probability distribution over the vocab
[Figure: at every step the decoder applies a softmax, and the resulting probabilities are compared with the target sequence]
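The MLE objective can be illustrated with a small sketch, assuming the decoder has already produced one vector of logits per target position (teacher forcing). The function names and the toy logits are illustrative only and are not taken from the slides.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one vector of vocabulary logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mle_loss(step_logits, target_ids):
    # Average negative log-likelihood of the target tokens.
    # Minimising this maximises p(response | history).
    nll = 0.0
    for logits, target in zip(step_logits, target_ids):
        probs = softmax(logits)
        nll -= math.log(probs[target])
    return nll / len(target_ids)

# Toy example (assumed values): vocabulary of 3 tokens, response of 2 tokens.
step_logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 2.0]]
target_ids = [0, 2]
loss = mle_loss(step_logits, target_ids)
print(round(loss, 4))
```

In practice a deep learning framework computes the same quantity as a cross-entropy loss over all decoder positions.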
[Figure: at every step the decoder applies a softmax and the argmax token is selected]
Vanilla Seq2Seq ConvAI: Greedy Decoding
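A minimal sketch of greedy decoding, assuming a hypothetical `next_token_probs` callable that returns a probability distribution over the vocabulary given the tokens generated so far. The toy lookup table stands in for a trained decoder.

```python
def greedy_decode(next_token_probs, bos_id, eos_id, max_len=20):
    # Repeatedly append the most probable next token until EOS or max_len.
    tokens = [bos_id]
    for _ in range(max_len):
        probs = next_token_probs(tokens)
        next_id = max(range(len(probs)), key=probs.__getitem__)  # argmax
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens[1:]  # drop the BOS token

# Hypothetical toy "model": the distribution depends only on the last token.
TOY_TABLE = {
    0: [0.0, 0.1, 0.7, 0.2],    # after BOS (id 0)
    2: [0.0, 0.2, 0.1, 0.7],    # after token 2
    3: [0.0, 0.9, 0.05, 0.05],  # after token 3; id 1 is EOS
}

def toy_model(prefix):
    return TOY_TABLE[prefix[-1]]

decoded = greedy_decode(toy_model, bos_id=0, eos_id=1)
print(decoded)
```

Because the argmax is deterministic, the same history always yields the same response, which is one reason greedy output tends to be repetitive.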
Sci-Fi Short Film - SUNSPRING
https://www.youtube.com/watch?v=LY7x2Ihqj
Limitation: Lack of Diversity
Human: What are you doing today?
System: I don’t know
Human: What is your name?
System: I don’t know
Human: What is your favourite food?
System: I don’t know
Limitation: Lack of Consistency
Human: Where were you born?
System: I was born in Canada.
Human: Where are you from?
System: England, you?
Human: Where did you grow up?
System: I grew up in Texas.
Limitation: Lack of Knowledge
Human: What is the weather like today?
System: I don’t know, maybe sunny?
Human: I am going out to Taylor Swift's concert today, I hope the weather is good.
System: Who is Taylor Swift?
Human: Okay, never mind.
Limitation: Lack of Empathy
Human: I messed up my project today.
System: What is your project?
Human: I think I might get fired...
System: Then you need to find another job!
Human: 😩
Limitation: Lack of Controllability
Human: Hi, how are you?
Model1: I am good thanks
Model2: I had really a bad day
Model3: I am okay, how was your day?
Model4: I am okay, I just finished my training session in the swimming pool
→ no control over style, topics, engagement, or toxic and inappropriate responses
Limitation: Lack of Versatility
NLU: Natural Language Understanding
DST: Dialogue State Tracking
DP: Dialogue Policy Learning
NLG: Natural Language Generation
Human: Can you help me book a 5-star hotel on Sunday?
System: For how many people?
→ cannot handle task-oriented conversations, because they require API calls
Limitation: Lack of Global Optimization
◉ Turn-level optimization
Source (encoder): conversation history, e.g. “… because of your game?”
Target (decoder): response, e.g. “Yeah I’m on my way EOS”
Limitations of Vanilla Seq2Seq: Summary
1. Lack of diversity
2. Lack of consistency
3. Lack of knowledge
4. Lack of empathy
5. Lack of controllability
6. Lack of versatility
7. Lack of global optimization
◉ These limitations of vanilla seq2seq make human-machine conversations boring and shallow. How can we overcome these limitations and move towards deeper conversational AI?
Limitation 1: Lack of Diversity
‘tis a fine brew on a day like this! Strong though, how many is sensible?
I'm not sure yet, I'll let you know !
Milan apparently selling Zlatan to balance the books... Where next, Madrid?
I don’t know.
Wow sour starbursts really do make your mouth water... mm drool.
Can I have one?
Of course!
Well he was on in Bromley a while ago... still touring.
I don't even know what he's talking about.
32% of responses are generic and meaningless
“I don’t know”
“I don’t know what you are talking about”
“I don’t think that is a good idea”
“Oh my god”
Solution: Diversify Responses
1. Training and Decoding strategy ⇒ Maximum Mutual Information (MMI)
2. Model architecture ⇒ Conditional Variational Autoencoder (CVAE)
3. More data & Larger models ⇒ Large scale pre-training
4. Decoding strategy ⇒ Top-k sampling, Nucleus Sampling
MMI for Response Diversity
(Li et al., 2016)
‘tis a fine brew on a day like this! Strong though, how many is sensible?
Depends on how much you drink!
Milan apparently selling Zlatan to balance the books... Where next, Madrid?
I think he'd be a good signing.
Wow sour starbursts really do make your mouth water... mm drool.
Can I have one?
Of course you can! They’re delicious!
Well he was on in Bromley a while ago... still touring.
I’ve never seen him live.
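One way to read the MMI idea is as a reranking rule: score each candidate response T for a source S by log p(T|S) − λ·log p(T), so that responses that are likely regardless of the input are penalised. The sketch below assumes the two log-probabilities are already available for each candidate; the numbers and the value of λ are made up for illustration.

```python
def mmi_rerank(candidates, lam=0.5):
    # candidates: list of (text, log p(T|S), log p(T)).
    # Score = log p(T|S) - lam * log p(T); higher is better.
    scored = [(lp_cond - lam * lp_prior, text)
              for text, lp_cond, lp_prior in candidates]
    scored.sort(reverse=True)
    return [text for _, text in scored]

# Assumed toy scores: the generic reply is likely under the language model
# alone, the specific reply is not.
candidates = [
    ("I don't know.", -2.0, -1.0),
    ("Depends on how much you drink!", -3.0, -9.0),
]
by_likelihood = max(candidates, key=lambda c: c[1])[0]
by_mmi = mmi_rerank(candidates)[0]
print(by_likelihood, "|", by_mmi)
```

Plain likelihood prefers the generic reply, while the MMI score prefers the input-specific one.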
Diversify by Large-Scale Pretraining
Text pre-trained: BART, T5
Dialogue pre-trained: Meena, BlenderBot, BST
Initialize an encoder-decoder: dialogue history → response
Diversify by Large-Scale Pretraining
Text pre-trained: GPT-1/2/3
Dialogue pre-trained: DialoGPT
Initialize a causal decoder: dialogue history → response
● Compared to beam search output, humans are more likely to produce “low-probability” tokens.
● Nucleus sampling tries to recover the human sampling process by sampling only from the smallest set of most likely tokens whose cumulative probability exceeds a threshold p (top-p).
Ref: The Curious Case of Neural Text Degeneration
Diversify by Nucleus Sampling
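A minimal sketch of nucleus (top-p) sampling over a single next-token distribution. It assumes the probabilities are given as a plain list indexed by token id; the threshold and toy distribution are illustrative.

```python
import random

def nucleus_filter(probs, p=0.9):
    # Keep the smallest set of most likely tokens whose cumulative
    # probability reaches p, then renormalise. Returns {token_id: prob}.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return {i: probs[i] / cumulative for i in kept}

def nucleus_sample(probs, p=0.9, rng=random):
    # Sample one token id from the renormalised nucleus.
    nucleus = nucleus_filter(probs, p)
    ids = list(nucleus)
    return rng.choices(ids, weights=[nucleus[i] for i in ids], k=1)[0]

probs = [0.5, 0.3, 0.15, 0.05]
nucleus = nucleus_filter(probs, p=0.7)
token = nucleus_sample(probs, p=0.7, rng=random.Random(0))
print(sorted(nucleus), token)
```

Unlike top-k sampling, the number of candidate tokens varies per step: a peaked distribution keeps few tokens and a flat one keeps many.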
Diversify by Nucleus Sampling
Figure from: https://huggingface.co/blog/how-to-generate
Time step 1 Time step 2
Limitation 2: Lack of Consistency
1. Learning speaker embedding:
■ Speaker Model
2. Conditioning on persona descriptions:
■ PersonaChat Dataset
■ TransferTransfo Model
Solution: Personalization
Personalization via Speaker Model
[Figure: the decoder answers “where do you live” with “in england . EOS”; the speaker embedding for “Rob” is fed to the decoder at every step]
Word embeddings (50k): england, london, u.s., great, good, stay, live, okay, monday, tuesday
Speaker embeddings (70k): Rob_712, skinnyoflynny2, Tomcoatez, Kush_322, D_Gomes25, Dreamswalls, kierongillen5, TheCharlieZ, The_Football_Bar, This_Is_Artful, DigitalDan285, Jinnmeow3, Bob_Kelly2
Persona Model for Consistency
(Li et al., 2016)
Baseline model → inconsistency
Persona model using speaker embedding → consistency
Personalization Datasets
Persona Info Human2:
- I like to ski.
- I am 25 years old
Human1: Hi, what do you do in your free time?
Human2: I enjoy going to the mountain and skiing
Human1: That’s cool, you should be young and strong for this activity!
Human2: oh yeah, I am 25 🤗
Human-to-Human Conversations + Persona Features
● Persona-Chat
● Twitter Personalized
● Learning Personalized End-to-End Goal-Oriented Dialog
Personalization via TransferTransfo Model
- Fine-tune GPT on conversational data (Persona-Chat)
- Formulate persona, history, and reply as a single sequence
Decoder-only: persona description + dialogue history → response
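The single-sequence formulation can be sketched as follows. The special-token strings and the whitespace tokenisation are assumptions made for illustration; an actual implementation would use the model's own subword tokenizer and its own special tokens.

```python
# Assumed special tokens marking sequence boundaries and speaker turns.
BOS, EOS, SPEAKER1, SPEAKER2 = "<bos>", "<eos>", "<speaker1>", "<speaker2>"

def build_input(persona, history, reply):
    # persona: list of persona sentences for the system (speaker2)
    # history: list of utterances, oldest first; the last one is from the user
    # reply:   the system response to be generated
    tokens = [BOS] + " ".join(persona).split()
    for i, utterance in enumerate(history):
        # Alternate speakers so that the final history turn is speaker1.
        speaker = SPEAKER1 if (len(history) - i) % 2 == 1 else SPEAKER2
        tokens += [speaker] + utterance.split()
    tokens += [SPEAKER2] + reply.split() + [EOS]
    return tokens

tokens = build_input(
    persona=["I like to ski.", "I am 25 years old."],
    history=["Hi, what do you do in your free time?"],
    reply="I enjoy going to the mountain and skiing",
)
print(" ".join(tokens))
```

Concatenating everything into one sequence lets a decoder-only model condition on the persona and the history through ordinary left-to-right attention.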
Limitation 3: Lack of Knowledge
The weather is so depressing these days.
I know, I dislike rain too.
What about a day trip to eastern Washington?
Any recommendation?
Try Dry Falls, it’s spectacular!
Social Chat
Engaging, Human-Like Interaction (Ungrounded)
Task-Oriented
Task Completion, Decision Support (Grounded)
Conversation and Non-Conversation Data
You know any good Japanese restaurant in Seattle?
Try Kisaku, one of the best sushi restaurants in the city.
You know any good A restaurant in B?
Try C, one of the best D in the city.
Conversation Data
Knowledge Resource
Solution: Knowledge
1. Textual Knowledge
⇒ Retrieving knowledge from Wikipedia, news, etc.
2. Graph Knowledge
⇒ Retrieving subgraph from knowledge graphs
3. Tabular Knowledge
⇒ Incorporate tabular information
4. Service API Interaction
⇒ Generate an API query and incorporate the API results into the response
Textual Knowledge
Human: My favorite color is blue.
Wizard: Same! Blue is one of the three primary colours.
Human: I am trying to recall, where does blue fall on the spectrum of visible light?
Textual Knowledge:
Blue is one of the three primary colours in the RGB colour model. It lies between violet and green on the spectrum of visible light.
Wizard: It is right between violet and green.
Human-to-Human Conversations + Textual Knowledge
● Wizard of Wikipedia
● CoQA
● Topical-Chat
● CMUDoG
● Holl-E
● ConversingByReading
Models with Textual Knowledge
Dialogue history + retrieved textual knowledge → encoder-decoder → response
Retrieval methods:
- IR systems: TF-IDF, BM25
- Neural retriever: DPR
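As an illustration of the IR-style retrieval step, here is a small TF-IDF retriever written with the standard library only. The tokeniser, the smoothed IDF formula, and the toy documents are assumptions for this sketch; a production system would typically use an IR library or a neural retriever instead.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase whitespace tokenisation with basic punctuation stripping.
    return [w.strip(".,!?").lower() for w in text.split() if w.strip(".,!?")]

def tfidf_retrieve(query, documents, top_k=1):
    doc_tokens = [tokenize(d) for d in documents]
    n_docs = len(documents)
    df = Counter(t for tokens in doc_tokens for t in set(tokens))
    # Smoothed inverse document frequency (one common variant).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}

    def vectorize(tokens):
        tf = Counter(tokens)
        return {t: tf[t] * idf[t] for t in tf if t in idf}

    def cosine(a, b):
        dot = sum(a[t] * b.get(t, 0.0) for t in a)
        norm = (math.sqrt(sum(v * v for v in a.values()))
                * math.sqrt(sum(v * v for v in b.values())))
        return dot / norm if norm else 0.0

    q = vectorize(tokenize(query))
    scored = sorted(((cosine(q, vectorize(t)), i)
                     for i, t in enumerate(doc_tokens)), reverse=True)
    return [documents[i] for _, i in scored[:top_k]]

documents = [
    "Blue is one of the three primary colours in the RGB colour model.",
    "Kisaku is a sushi restaurant in Seattle.",
    "Dry Falls is in eastern Washington.",
]
best = tfidf_retrieve(
    "where does blue fall on the spectrum of visible light?", documents)[0]
print(best)
```

The retrieved passage is then passed to the encoder together with the dialogue history.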
1. Use TF-IDF to retrieve documents related to the dialogue context
2. Encode the retrieved documents independently
3. Use the dialogue history as a query to assign different weights to the documents
4. The decoder generates the response
Generative Transformer Memory Network
Knowledge: IR Systems + Model
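The weighting of retrieved documents by the dialogue history can be sketched as dot-product attention. This is a simplified illustration with made-up two-dimensional vectors, not the exact architecture of the Generative Transformer Memory Network.

```python
import math

def attend(query_vec, doc_vecs):
    # Dot-product scores between the dialogue-history encoding (query)
    # and each independently encoded document.
    scores = [sum(q * d for q, d in zip(query_vec, doc)) for doc in doc_vecs]
    # Softmax turns the scores into attention weights.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of document encodings, passed on to the decoder.
    dim = len(query_vec)
    context = [sum(w * doc[j] for w, doc in zip(weights, doc_vecs))
               for j in range(dim)]
    return weights, context

# Assumed toy encodings: the first document aligns with the query.
query_vec = [1.0, 0.0]
doc_vecs = [[2.0, 0.0], [0.0, 2.0]]
weights, context = attend(query_vec, doc_vecs)
print([round(w, 3) for w in weights])
```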
Human-to-Human Conversations + Graph KG
● OpenDialKG
● DyKgChat
● KdConv
● Commonsense Knowledge Aware Conversation Generation with Graph Attention
● Enhancing Dialog Coherence with Event Graph Grounded Content Planning
Graph Knowledge
Models with Graph Knowledge
Dialogue history + retrieved subgraph → encoder-decoder → response
Subgraph retrieval:
● All knowledge triples mentioned in a dialogue (1-hop reasoning)
● Neural retriever (multi-hop reasoning)
Knowledge graph in triple format: (entity1, relation, entity2)
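A minimal sketch of 1-hop subgraph retrieval over triples. Entity mentions are found by naive substring matching, which is an assumption made for brevity; the triples and relation names are invented for the example.

```python
def find_mentions(utterance, triples):
    # Naive entity linking: an entity counts as mentioned if its name
    # appears as a substring of the utterance (case-insensitive).
    entities = {e for h, _, t in triples for e in (h, t)}
    text = utterance.lower()
    return {e for e in entities if e.lower() in text}

def one_hop_subgraph(triples, mentioned_entities):
    # Keep every triple that touches at least one mentioned entity.
    mentioned = set(mentioned_entities)
    return [(h, r, t) for h, r, t in triples if h in mentioned or t in mentioned]

# Hypothetical knowledge graph in (entity1, relation, entity2) format.
triples = [
    ("Kisaku", "located_in", "Seattle"),
    ("Kisaku", "serves", "sushi"),
    ("Dry Falls", "located_in", "Washington"),
]
mentions = find_mentions(
    "You know any good Japanese restaurant in Seattle?", triples)
subgraph = one_hop_subgraph(triples, mentions)
print(subgraph)
```

Multi-hop retrieval would repeat the expansion from the newly reached entities, usually with a learned model deciding which edges to follow.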
DyKgChat: Quick Adaptive Model (Qadpt)
[Figure: Qadpt architecture with three components]
1. Seq2Seq model: the encoder reads the dialogue context Ct and the decoder produces states d1, d2, ..., dt; the output projection gives the generic-word distribution and extracts seed entities
2. Controller: chooses between the generic-word distribution and the graph-entity distribution
3. Reasoning model: N-hop reasoning over the knowledge graph (entities B, C, D, E, F; relations such as friend, lover, enemy) using the adjacency matrix and a transition matrix Tt
Example context:
D talked to C: Afterwards, don't wait for him at the door. It is cool in autumn. You may get a cold.
C: I have promised to wait for E.
- Take all entities mentioned in the dialogue as starting nodes
- Learn the reasoning path over the graph with supervision, via graph attention
OpenDialKG Walker : Subgraph Retrieval
Tabular Knowledge
Human-to-Human Conversations + Table Knowledge
● SMD
● CamRest
● bAbI-Dialogues
Models with Tabular Knowledge
Dialogue history → encoder-decoder → response
Examples: Mem2Seq, Neural Assistant, KVR
External Service API Interaction
Human-to-Human Conversations + Table Knowledge
● bAbI
● CamRest
● MultiWOZ
● CrossWOZ
● Schema-Guided Dialogue
● Taskmaster 1-2-3
● STAR
Models with Service API
Dialogue history → language model → API query (dialogue state) → service API → KB results → response

H: I want to book a cheap restaurant
API query: Query(restaurant_price : cheap)
KB results: 1050 matches, {name: pizza hut, price : cheap}, …...
System: There are 1050 cheap restaurants, which location do you prefer?

H: I want to book a pizza hut for 3 people
API query: Book(restaurant_name : pizza hut, restaurant_people: 3)
KB results: Book: success, Reference Number: 32bhj32n
System: Your booking is successful, the reference number is 32bhj32n
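The query-then-respond loop can be sketched with a toy in-memory knowledge base. In the models on this slide the dialogue state and the response are generated by a language model; here both are hand-written, and the KB entries and slot names are hypothetical.

```python
# Hypothetical toy knowledge base standing in for the service API.
KB = [
    {"name": "pizza hut", "price": "cheap"},
    {"name": "nandos", "price": "cheap"},
    {"name": "fine bistro", "price": "expensive"},
]

def query_api(state):
    # Return the KB entries that match every slot in the dialogue state.
    return [row for row in KB
            if all(row.get(slot) == value for slot, value in state.items())]

def respond(state):
    # Template-based stand-in for the response generator, conditioned on
    # the API results.
    results = query_api(state)
    if not results:
        return "Sorry, no restaurant matches your request."
    if len(results) == 1:
        return f"I found {results[0]['name']}."
    return (f"There are {len(results)} {state.get('price', '')} restaurants, "
            "which location do you prefer?")

state = {"price": "cheap"}  # dialogue state for "I want to book a cheap restaurant"
reply = respond(state)
print(reply)
```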
Models with Service API
Causal decoder: dialogue history → API query (dialogue state) → service API → KB results → response
Examples: End-to-End GPT2 Neural Pipeline, SimpleToD, SOLOIST
Models with Service API
Encoder-decoder: dialogue history → API query (dialogue state) → service API → KB results → response
Examples: Sequicity, DAMD, MinTL
Limitation 4: Lack of Empathy
Human: I messed up my project today.
System: What is your project?
Human: I think I might get fired...
System: Then you need to find another job!
Human: 😩
Solution: Empathic Generation
1. Emotional response generation:
■ MojiTalk
■ Emotional Chatting Machine
2. Understand the user’s emotion and respond accordingly:
■ Empathetic Dialogues
■ MoEL
■ CaireBot
Empathy Dataset
Empathy: understanding the feelings of the conversation partner and replying accordingly.
Dataset: Empathetic Dialogues
Models with Empathy
Dialogue history → encoder-decoder with emotion recognition → response
Examples: MoEL, EmoPrepend-1, CaireBot
https://demo.caire.ust.hk/chatbot
Limitation 5: Lack of Controllability
Existing large pre-trained models offer no control over:
- Response style
- Topics
- Repetition and specificity
- Response-relatedness
- Engagement by proactively asking questions
Dialogue model: dialogue history → response
Dialogue pre-trained: Meena, BlenderBot, BST, DialoGPT
1. Controlling low-level attributes
2. Controlling by fine-tuning
3. Controlling by perturbation
4. Controlling by conditioned generation
Solution: Controllability
Conditional Training + Weighted Decoding
What makes a good conversation? How controllable attributes affect human judgments (See et al., 2019)
Controlling Low-Level Attribute
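Weighted decoding can be sketched as adding hand-crafted feature bonuses to the model's token scores at each step: score(w) = log p(w) + Σ weight_i · f_i(w). The feature, the weights, and the toy distribution below are assumptions for illustration and are not taken from the cited paper.

```python
import math

def weighted_decode_step(token_log_probs, features, weights):
    # Pick the next token that maximises log p(w) + sum_i weight_i * f_i(w).
    def score(token):
        bonus = sum(weights[name] * f(token) for name, f in features.items())
        return token_log_probs[token] + bonus
    return max(token_log_probs, key=score)

# Assumed toy next-token distribution and a single "is a question" feature.
token_log_probs = {"ok": math.log(0.6), "you?": math.log(0.4)}
features = {"is_question": lambda tok: 1.0 if tok.endswith("?") else 0.0}

plain = weighted_decode_step(token_log_probs, features, {"is_question": 0.0})
controlled = weighted_decode_step(token_log_probs, features, {"is_question": 2.0})
print(plain, controlled)
```

Raising a feature weight pushes generation towards that attribute without retraining the model, whereas conditional training learns the control signal from labelled data.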
Multitask training on conversation data and style data
⇒ No control codes
STYLEDGPT: Stylized Response Generation with Pre-trained Language Models (Yang et al., 2020)
DialoGPT: dialogue history → response
Losses: conversational data, word-level style loss, sentence-level style loss
Controlling by Fine-Tuning
● Control the generated style with Plug-and-Play LM (PPLM) (Dathathri et al., 2020)
● Distill the generated responses from PPLM into residual adapters (Houlsby et al., 2019)
⇒ Plug-and-play control for 3 styles and 3 topics
Plug-and-Play Conversational Models (Madotto et al., 2020)
DialoGPT: dialogue history → response
Controlling by Perturbation
Controlling by Conditioned Generation
Controllable generation architectures in open-domain dialogues:
▪ retrieval + style-controlled generation (Weston et al., 2018)
▪ PPLM (Dathathri et al., 2020)
▪ CTRL (Keskar et al., 2019)
200 style labels (in ConvAI2, EmpatheticDialogues, Wizard of Wikipedia, and BlendedSkillTalk) generated by a classifier trained on Image-Chat
Controlling Style in Generated Dialogue (Smith & Gonzalez-Rico et al., 2020)
Limitation 6: Lack of Versatility
[Figure: a separate dialogue model for each capability]
- Dialogue history → dialogue model → response
- Dialogue history + textual knowledge → dialogue model → response
- Dialogue history → API query (dialogue state) → service API → KB results → response
- Dialogue history + emotion recognition → dialogue model → response
[Figure: a single dialogue model maps the dialogue history to a response while also handling textual knowledge, emotion recognition, and API queries]
Solution: ToDs + Chit-Chat
ToDs + Chit-Chat Datasets
1. Mixing multiple dialogue datasets
2. Multiple dialogue skills
⇒ Collecting datasets that mix skills
3. Mixing Chit-Chat and ToDs
⇒ Collecting data that mixes the two
Attention over Parameters
[Figure: an encoder reads the dialogue history together with the knowledge base and persona; a composer attends over domain/skill-specific decoders (Chit-Chat, Restaurant, Hotel, SQL, BOOK, …) to produce the system response or an API call]
Adapter-Bot: All-In-One Controllable Model
● Uses a fixed backbone (DialoGPT)
● Encodes each dialogue skill with an independently trained adapter
● Able to process multiple knowledge types and styles (8 goal-oriented skills + personalized and empathetic responses)
● A skill manager (BERT) is trained to select the adapter
Attention over Parameters for Dialogue Systems (Madotto et al., 2019)
BlenderBot: Recipes for building an open-domain chatbot (Roller et al., 2020)
The Adapter-Bot: All-In-One Controllable Conversational Model (Lin & Madotto et al., 2020)
Putting It All Together
Limitation 7: Lack of Global Optimization
Application | State | Action | Reward
Task Completion Bots (Movies, Restaurants, …) | User input + Context | Dialog act + slot-value | Task success rate, # of turns
Info Bots (Q&A bot over KB, Web etc.) | Question + Context | Clarification questions, Answers | Relevance of answer, # of turns
Social Bot (XiaoIce) | Conversation history | Response | Engagement(?)
[Figure: user input (o) → language understanding → state 𝑠 → dialogue manager 𝑎 = 𝜋(𝑠) → action 𝑎 → language (response) generation → response]
Collect rewards (𝑠, 𝑎, 𝑟, 𝑠’)
Optimize 𝑄(𝑠, 𝑎)
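The "collect rewards (s, a, r, s’), optimize Q(s, a)" loop can be illustrated with tabular Q-learning on a toy booking dialogue. The states, actions, rewards, and hyperparameters are invented for this sketch; deep RL dialogue systems replace the table with a neural network and may use other algorithms such as policy gradients.

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    # One Q-learning step on a collected transition (s, a, r, s').
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def policy(Q, s, actions):
    # Greedy policy a = pi(s): choose the action with the highest Q-value.
    return max(actions, key=lambda a: Q[(s, a)])

actions = ["ask_slot", "book"]
Q = defaultdict(float)  # unseen (state, action) pairs start at 0

# Hypothetical experience: asking for the missing slot earns nothing
# immediately, and booking once the state is complete earns reward 1.
q_update(Q, s="need_people", a="ask_slot", r=0.0, s_next="ready", actions=actions)
q_update(Q, s="ready", a="book", r=1.0, s_next="done", actions=actions)
q_update(Q, s="need_people", a="ask_slot", r=0.0, s_next="ready", actions=actions)
print(dict(Q))
```

The third update shows the point of global optimization: the value of asking for the slot rises because it leads to a state where the task can succeed, even though the turn itself earns no reward.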
Solution: Deep RL for Optimization
(Li et al., 2016)
[Table: example responses to the same input message from the supervised learning agent and the reinforcement learning agent]
◉ The RL agent generates more interactive responses
◉ The RL agent tends to end a sentence with a question and hand the conversation over to the user
Concluding Remarks
◉ Limitations of vanilla seq2seq models
1. Lack of diversity
2. Lack of consistency
3. Lack of knowledge
4. Lack of empathy
5. Lack of controllability
6. Lack of versatility
7. Lack of global optimization
◉ Recent trends for addressing the above limitations
Her (2013)
What can machines achieve now or in the future?