Slide credits: Shawn
Review
Task-Oriented Dialogue System (Young, 2000)

Speech Recognition: speech signal → hypothesis
  "are there any action movies to see this weekend"
Language Understanding (LU)
• Domain Identification
• User Intent Detection
• Slot Filling
  → semantic frame: request_movie(genre=action, date=this weekend)
Dialogue Management (DM)
• Dialogue State Tracking (DST)
• Dialogue Policy
  → system action/policy: request_location (consulting the backend database / knowledge providers)
Natural Language Generation (NLG)
  → text response: "Where are you located?"

Text input (e.g., "Are there any action movies to see this weekend?") can enter the pipeline directly, bypassing speech recognition.

http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
Language Modeling
Goal: estimate the probability of a word sequence
Example task: determine whether a sequence is grammatical or makes more sense

Speech recognition example: between the acoustically similar hypotheses "recognize speech" and "wreck a nice beach", output "recognize speech" if P(recognize speech) > P(wreck a nice beach)
N-Gram Language Modeling
Goal: estimate the probability of a word sequence
N-gram language model
Probability is conditioned on a window of (n-1) previous words
Estimate the probability based on the training data
P(beach | nice) = C(nice beach) / C(nice)
(C(nice beach): count of "nice beach" in the training data; C(nice): count of "nice" in the training data)
Issue: some sequences may not appear in the training data
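A minimal sketch of this count-based estimation for bigrams (the toy corpus and function names are illustrative):

```python
from collections import Counter

def train_bigram_lm(corpus):
    """Count unigrams and bigrams in a corpus given as a list of sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, word):
    """P(word | prev) = C(prev word) / C(prev); zero if the pair is unseen."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

unigrams, bigrams = train_bigram_lm(["it is a nice beach", "nice beach day"])
print(bigram_prob(unigrams, bigrams, "nice", "beach"))  # C(nice beach)/C(nice) = 1.0
```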
N-Gram Language Modeling

Training data:
  The dog ran ……
  The cat jumped ……

With pure counting: P(jumped | dog) = 0 and P(ran | cat) = 0, so the probabilities are not accurate. This happens because we cannot collect all possible text in the world as training data.

Smoothing: give unseen n-grams some small probability, e.g., P(jumped | dog) ≈ 0.0001 and P(ran | cat) ≈ 0.0001.
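A sketch of add-k (Laplace-style) smoothing on top of the bigram counts above; the value of k is an illustrative choice, not the only option:

```python
def smoothed_bigram_prob(unigrams, bigrams, prev, word, k=0.1):
    """Add-k smoothing: every bigram gets pseudo-count k, so unseen
    pairs such as ("dog", "jumped") receive a small nonzero probability."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + k) / (unigrams[prev] + k * vocab_size)
```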
Neural Language Modeling
Idea: estimate the probability not from counts but from neural network predictions

At each step, a neural network reads the current word's vector and outputs a distribution over the next word:
NN(vector of "START") → P(next word is "wreck")
NN(vector of "wreck") → P(next word is "a")
NN(vector of "a") → P(next word is "nice")
NN(vector of "nice") → P(next word is "beach")

P("wreck a nice beach") = P(wreck | START) · P(a | wreck) · P(nice | a) · P(beach | nice)
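In general, the language model factorizes the sequence probability via the chain rule, with one next-word distribution per position:

$$P(w_1, \dots, w_T) = \prod_{t=1}^{T} P(w_t \mid w_1, \dots, w_{t-1})$$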
Neural Language Modeling

Bengio et al., "A Neural Probabilistic Language Model," JMLR, 2003.

[Figure: feed-forward NNLM — the context word vectors feed a hidden layer, and the output layer gives the probability distribution of the next word.]

Issue: fixed context window for conditioning the input
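A minimal PyTorch sketch of this fixed-window architecture (the two-word window and layer sizes are illustrative, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class FixedWindowNNLM(nn.Module):
    """Feed-forward LM in the style of Bengio et al. (2003):
    embed a fixed window of context words, concatenate them into a
    context vector, and predict a distribution over the next word."""
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128, window=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.hidden = nn.Linear(window * embed_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, context):              # context: (batch, window) word ids
        e = self.embed(context)              # (batch, window, embed_dim)
        h = torch.tanh(self.hidden(e.flatten(1)))
        return torch.log_softmax(self.out(h), dim=-1)  # log P(next word)

model = FixedWindowNNLM(vocab_size=10000)
log_probs = model(torch.tensor([[42, 7]]))   # P(next | previous two words)
```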
Neural Language Modeling

The input-layer (or hidden-layer) vectors of related words are close to each other. If P(jump | dog) is large, P(jump | cat) increases accordingly, even if "… cat jump …" never appears in the data.

[Figure: "dog", "cat", and "rabbit" lie close together in the hidden space (h1, h2).]

Smoothing is done automatically
RNNLM
Idea: condition the neural network on all previous words and tie the weights at each time step
Assumption: temporal information matters
[Figure: RNN LM unrolled over "START wreck a nice beach" — at each step the current word vector and the previous hidden layer's context vector produce the next-word distribution.]

Idea: pass information from the previous hidden layer forward so that every prediction leverages the full context
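A minimal PyTorch sketch of an RNN LM; weight tying across time steps is inherent in the recurrent cell (sizes are illustrative):

```python
import torch
import torch.nn as nn

class RNNLM(nn.Module):
    """RNN language model: the same cell is applied at every time
    step, so the hidden state can carry all previous context."""
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):               # tokens: (batch, seq_len)
        h, _ = self.rnn(self.embed(tokens))  # (batch, seq_len, hidden_dim)
        return torch.log_softmax(self.out(h), dim=-1)  # per-step next-word log-probs

model = RNNLM(vocab_size=10000)
log_probs = model(torch.tensor([[1, 42, 7, 99]]))  # one distribution per position
```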
Natural Language Generation
Traditional Approaches
Natural Language Generation (NLG)
Mapping dialogue acts into natural language
inform(name=Seven_Days, foodtype=Chinese)
→ "Seven Days is a nice Chinese restaurant"
Template-Based NLG
Define a set of rules to map frames to NL
Pros: simple, error-free, easy to control
Cons: time-consuming, rigid, poor scalability

Semantic Frame → Natural Language
confirm() → "Please tell me more about the product you are looking for."
confirm(area=$V) → "Do you want somewhere in the $V?"
confirm(food=$V) → "Do you want a $V restaurant?"
confirm(food=$V, area=$W) → "Do you want a $V restaurant in the $W?"
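A minimal sketch of this rule-based mapping in Python (the frame encoding and template table are illustrative):

```python
# Hypothetical template table keyed by (act, sorted slot names).
TEMPLATES = {
    ("confirm", ()): "Please tell me more about the product you are looking for.",
    ("confirm", ("area",)): "Do you want somewhere in the {area}?",
    ("confirm", ("food",)): "Do you want a {food} restaurant?",
    ("confirm", ("area", "food")): "Do you want a {food} restaurant in the {area}?",
}

def template_nlg(act, **slots):
    """Look up the rule for this act/slot combination and fill in the values."""
    template = TEMPLATES[(act, tuple(sorted(slots)))]
    return template.format(**slots)

print(template_nlg("confirm", food="Chinese", area="centre"))
# -> Do you want a Chinese restaurant in the centre?
```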
Class-Based LM NLG
(Oh and Rudnicky, 2000)
Class-based language modeling
NLG by decoding
Pros: easy to implement/understand, simple rules
Cons: computationally inefficient

Classes: inform_area, inform_address, …, request_area, request_postcode
http://dl.acm.org/citation.cfm?id=1117568
Phrase-Based NLG
(Mairesse et al., 2010)

[Figure: semantic DBN and phrase DBN — semantic stacks aligned with realization phrases.]

Inform(name=Charlie Chan, food=Chinese, type=restaurant, near=Cineworld, area=centre)
→ "Charlie Chan is a Chinese restaurant near Cineworld in the centre"

Pros: efficient, good performance
Cons: requires semantic alignments
http://dl.acm.org/citation.cfm?id=1858838
Natural Language Generation
Deep Learning Approaches
RNN-Based LM NLG
(Wen et al., 2015)

Dialogue act: Inform(name=Din Tai Fung, food=Taiwanese)
→ 1-hot representation: 0, 0, 1, 0, 0, …, 1, 0, 0, …, 1, 0, 0, 0, 0, 0…

Delexicalisation: "<BOS> Din Tai Fung serves Taiwanese ." → "<BOS> SLOT_NAME serves SLOT_FOOD ."

Input: <BOS> SLOT_NAME serves SLOT_FOOD .
Output: SLOT_NAME serves SLOT_FOOD . <EOS>

The generator is conditioned on the dialogue act, with slot weight tying.
http://www.anthology.aclweb.org/W/W15/W15-46.pdf#page=295
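A minimal sketch of the delexicalisation/relexicalisation steps (the helper functions and slot-token naming convention are illustrative):

```python
def delexicalise(sentence, slots):
    """Replace slot values with placeholder tokens, e.g.
    'Din Tai Fung serves Taiwanese .' -> 'SLOT_NAME serves SLOT_FOOD .'"""
    for slot, value in slots.items():
        sentence = sentence.replace(value, f"SLOT_{slot.upper()}")
    return sentence

def relexicalise(sentence, slots):
    """Inverse step after generation: fill placeholders with actual values."""
    for slot, value in slots.items():
        sentence = sentence.replace(f"SLOT_{slot.upper()}", value)
    return sentence

slots = {"name": "Din Tai Fung", "food": "Taiwanese"}
template = delexicalise("Din Tai Fung serves Taiwanese .", slots)
print(template)                        # SLOT_NAME serves SLOT_FOOD .
print(relexicalise(template, slots))  # Din Tai Fung serves Taiwanese .
```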
Handling Semantic Repetition
Issue: semantic repetition
Din Tai Fung is a great Taiwanese restaurant that serves Taiwanese.
Din Tai Fung is a child friendly restaurant, and also allows kids.
Deficiency in either model or decoding (or both)
Mitigation
Post-processing rules (Oh & Rudnicky, 2000)
Gating mechanism (Wen et al., 2015)
Attention (Mei et al., 2016; Wen et al., 2015)
Visualization
Semantic Conditioned LSTM (Wen et al., 2015)

[Figure: the original LSTM cell (gates i_t, f_t, o_t; states C_t, h_t; inputs x_t, h_t-1) is augmented with a dialogue-act (DA) cell: a reading gate r_t updates the DA vector d_t from d_t-1, and d_t modifies the cell state C_t.]

The DA vector is initialised from the dialogue act, e.g. Inform(name=Seven_Days, food=Chinese) → 1-hot representation 0, 0, 1, 0, 0, …, 1, 0, 0, …, 1, 0, 0, … = d_0

Idea: use a gating mechanism to control the generated semantics (dialogue act/slots)
http://www.aclweb.org/anthology/D/D15/D15-1199.pdf
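A sketch of the DA-cell updates following Wen et al. (2015), where $w_t$ is the input word vector, $\hat{c}_t$ the candidate cell value, $\odot$ the element-wise product, and $\alpha$ a scaling constant; the standard LSTM gates $i_t, f_t, o_t$ are unchanged:

$$r_t = \sigma(W_{wr} w_t + \alpha W_{hr} h_{t-1})$$
$$d_t = r_t \odot d_{t-1}$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \hat{c}_t + \tanh(W_{dc} d_t)$$
$$h_t = o_t \odot \tanh(c_t)$$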
Attentive Encoder-Decoder for NLG
Slot & value embedding
Attentive meaning representation
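One common way to realize the attentive meaning representation, assuming additive attention over slot-value embeddings $z_i$ given the decoder state $h_{t-1}$ (this parameterisation is an assumption, not necessarily the slide's exact model):

$$\alpha_{t,i} = \operatorname{softmax}_i\!\left(v^\top \tanh(W z_i + U h_{t-1})\right), \qquad m_t = \sum_i \alpha_{t,i}\, z_i$$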
Attention Heat Map
Model Comparison

[Figure: BLEU (left) and ERR (right) versus the percentage of training data (1%, 10%, 100%) for the hlstm, sclstm, and encdec models.]
Structural NLG
(Dušek and Jurčíček, 2016)
Goal: NLG based on the syntax tree
Encode trees as sequences
Seq2Seq model for generation
https://www.aclweb.org/anthology/P/P16/P16-2.pdf#page=79
Contextual NLG
(Dušek and Jurčíček, 2016)
Goal: adapt to the user's way of speaking, providing context-aware responses
Context encoder
Seq2Seq model
https://www.aclweb.org/anthology/W/W16/W16-36.pdf#page=203
Decoder Sampling Strategy
Decoding procedure
Greedy search
Beam search
Random search
Example: Inform(name=Din Tai Fung, food=Taiwanese)
→ 1-hot dialogue act: 0, 0, 1, 0, 0, …, 1, 0, 0, …, 1, 0, 0, 0, 0, 0…
→ decoded output: SLOT_NAME serves SLOT_FOOD . <EOS>
Greedy Search
Select the next word with the highest probability
Beam Search
Select the k best next words and keep a beam of width k for subsequent decoding
Random Search
Randomly sample the next word
Higher diversity
Can follow the model's probability distribution
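A runnable sketch of all three strategies over a toy next-word table; the table stands in for a real decoder's per-step distribution, and the probabilities are illustrative:

```python
import math
import random

# Toy per-step distribution {prev_word: {next_word: log-prob}};
# in practice this would come from the RNN/seq2seq decoder.
NEXT = {
    "<BOS>": {"SLOT_NAME": math.log(0.8), "it": math.log(0.2)},
    "SLOT_NAME": {"serves": math.log(0.9), "is": math.log(0.1)},
    "it": {"serves": math.log(1.0)},
    "is": {"<EOS>": math.log(1.0)},
    "serves": {"SLOT_FOOD": math.log(1.0)},
    "SLOT_FOOD": {".": math.log(1.0)},
    ".": {"<EOS>": math.log(1.0)},
}

def greedy_decode(max_len=10):
    """Greedy search: pick the single most probable word at each step."""
    seq = ["<BOS>"]
    while seq[-1] != "<EOS>" and len(seq) < max_len:
        probs = NEXT[seq[-1]]
        seq.append(max(probs, key=probs.get))
    return seq

def beam_decode(k=2, max_len=10):
    """Beam search: keep the k highest-scoring partial sequences per step."""
    beams = [(0.0, ["<BOS>"])]               # (cumulative log-prob, sequence)
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq[-1] == "<EOS>":
                candidates.append((score, seq))
            else:
                for word, lp in NEXT[seq[-1]].items():
                    candidates.append((score + lp, seq + [word]))
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
    return beams[0][1]

def sample_decode(max_len=10):
    """Random search: sample the next word following the distribution."""
    seq = ["<BOS>"]
    while seq[-1] != "<EOS>" and len(seq) < max_len:
        words = list(NEXT[seq[-1]])
        weights = [math.exp(NEXT[seq[-1]][w]) for w in words]
        seq.append(random.choices(words, weights=weights)[0])
    return seq

print(greedy_decode())  # ['<BOS>', 'SLOT_NAME', 'serves', 'SLOT_FOOD', '.', '<EOS>']
```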
Chit-Chat Generation
Chit-Chat Bot
Neural conversational model
Non-task-oriented
Many-to-Many
Both input and output are sequences → sequence-to-sequence learning
E.g., machine translation ("machine learning" → 機器學習)

[Figure: seq2seq model — the encoder reads "machine learning"; the decoder emits 機 器 學 習.]

[Ilya Sutskever, NIPS'14] [Dzmitry Bahdanau, arXiv'15]
A Neural Conversational Model
Seq2Seq
[Vinyals and Le, 2015]
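A minimal PyTorch sketch of a seq2seq response generator in this spirit (greedy decoding; layer sizes and interface are illustrative, not Vinyals and Le's exact setup):

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Encoder reads the input utterance; decoder generates the reply
    conditioned on the encoder's final hidden state."""
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, src, bos_id, eos_id, max_len=20):
        _, h = self.encoder(self.embed(src))        # h: (1, batch, hidden_dim)
        token = torch.full((src.size(0), 1), bos_id, dtype=torch.long)
        reply = []
        for _ in range(max_len):
            dec_out, h = self.decoder(self.embed(token), h)
            token = self.out(dec_out).argmax(-1)    # greedy next token
            reply.append(token)
            if (token == eos_id).all():
                break
        return torch.cat(reply, dim=1)

model = Seq2Seq(vocab_size=10000)
reply_ids = model(torch.tensor([[5, 17, 42]]), bos_id=1, eos_id=2)
```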
Chit-Chat Bot
Trained on TV series (~40,000 sentences) and U.S. presidential election debates
Sci-Fi Short Film - SUNSPRING
https://www.youtube.com/watch?v=LY7x2Ihqj37
Concluding Remarks
The three pillars of deep learning for NLG
Distributed representation – generalization
Recurrent connection – long-term dependency
Conditional RNN – flexibility/creativity
Useful techniques in deep learning for NLG
Learnable gates
Attention mechanism
Generating longer/complex sentences
Phrasing dialogue as a conditional generation problem:
Conditioning on the raw input sentence → chit-chat bot
Conditioning on both structured and unstructured sources → task-completing dialogue system