Improving Sequence Generation by GAN
Hung-yi Lee
http://www.voidcn.com/article/p-nbtytose-tz.html
How to do NLP-related research
Outline
Improving Supervised Seq-to-seq Model
• RL (human feedback)
• GAN (discriminator feedback)
Unsupervised Seq-to-seq Model
• Summarization
• Translation
Text Style Transfer
Review: Chat-bot
• Sequence-to-sequence learning
[Figure: the encoder reads the input sentence together with the history information; the generator produces the output sentence.]
Training data: dialogues such as
A: OOO  B: XXX  A: ∆∆∆ ……
Each response (e.g., B: XXX) is generated from the preceding turns (e.g., A: OOO).
Review: Encoder
[Figure: hierarchical encoder. A word-level RNN reads each sentence, e.g., 你好嗎 ("How are you?") and 我很好 ("I'm fine"), and a sentence-level RNN reads the resulting sentence representations; the final output is passed to the generator.]
Review: Generator
[Figure: RNN generator. Starting from <BOS>, each step outputs a distribution over tokens (A, B, …), a token is sampled, and it is fed back as the next input. Each step is conditioned on the encoder output; with an attention mechanism the condition can be different at every step.]
Review: Training Generator
Reference: the correct response, e.g., "A B B".
[Figure: teacher forcing. At each step the generator's output distribution is compared with the corresponding reference token, and the reference token is fed in as the next input; each step is conditioned on the encoder output.]
Training minimizes $C = \sum_t C_t$, the sum of the cross-entropies $C_1, C_2, C_3, \ldots$ between each output distribution and the reference token.
Review: Maximum Likelihood
$C_t = -\log P_\theta(\hat{x}_t \mid \hat{x}_{1:t-1}, h)$

$C = \sum_t C_t = -\sum_t \log P_\theta(\hat{x}_t \mid \hat{x}_{1:t-1}, h)$
$= -\log\left[P_\theta(\hat{x}_1 \mid h)\, P_\theta(\hat{x}_2 \mid \hat{x}_1, h) \cdots P_\theta(\hat{x}_T \mid \hat{x}_{1:T-1}, h)\right]$
$= -\log P_\theta(\hat{x} \mid h)$

Minimizing $C$ maximizes the likelihood of generating $\hat{x}$ given $h$.

Training data: pairs $(h, \hat{x})$, where $h$ is the input sentence and history/context, $\hat{x}$ is the correct response (a word sequence), $\hat{x}_t$ is its $t$-th word, and $\hat{x}_{1:t}$ are its first $t$ words. In code this is ordinary teacher-forcing cross-entropy, sketched below.
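A minimal PyTorch sketch of this loss. The toy GRU decoder and all names here are illustrative stand-ins, not the actual model on the slides:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative decoder: a GRU whose initial state is the encoder output h.
class ToyGenerator(nn.Module):
    def __init__(self, vocab_size=10, hidden=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, h, prev_tokens):
        # h: (B, hidden) condition from the encoder
        # prev_tokens: (B, T) teacher-forced inputs <BOS>, x̂_1, ..., x̂_{T-1}
        y, _ = self.rnn(self.embed(prev_tokens), h.unsqueeze(0))
        return self.out(y)  # (B, T, vocab): P_θ(· | x̂_{1:t-1}, h) for each t

def mle_loss(gen, h, prev_tokens, x_hat):
    # C = Σ_t C_t = -Σ_t log P_θ(x̂_t | x̂_{1:t-1}, h)
    logits = gen(h, prev_tokens)
    return F.cross_entropy(logits.transpose(1, 2), x_hat, reduction="sum")
```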
Outline
Improving Supervised Seq-to-seq Model
• RL (human feedback)
• GAN (discriminator feedback)
Unsupervised Seq-to-seq Model
• Summarization
• Translation
Text Style Transfer
Jiwei Li, Will Monroe, Alan Ritter, Michel Galley, Jianfeng Gao, Dan Jurafsky, "Deep Reinforcement Learning for Dialogue Generation", EMNLP 2016
Introduction
• Machine obtains feedback from user
• Chat-bot learns to maximize the expected reward
[Figure: two example exchanges. "How are you?" answered with "Bye bye" receives reward -10; "Hello" answered with "Hi" receives reward 3.]
Maximizing Expected Reward
$\theta^* = \arg\max_\theta \bar{R}_\theta$

$\bar{R}_\theta = \sum_h P(h) \sum_x R(h,x)\, P_\theta(x|h)$

[Figure: the encoder/generator with parameters $\theta$ maps $h$ to $x$; a human returns $R(h,x)$, which is used to update the model.]

$P(h)$: probability that the input/history is $h$. $P_\theta(x|h)$: the randomness in the generator. Update $\theta$ to maximize the expected reward.
Maximizing Expected Reward
$\bar{R}_\theta = \sum_h P(h) \sum_x R(h,x)\, P_\theta(x|h) = E_{h \sim P(h)} E_{x \sim P_\theta(x|h)}\left[R(h,x)\right]$

Approximate the expectation by sampling $(h^1,x^1), (h^2,x^2), \cdots, (h^N,x^N)$:

$\bar{R}_\theta \approx \frac{1}{N} \sum_{i=1}^{N} R(h^i, x^i)$

$\theta^* = \arg\max_\theta \bar{R}_\theta$. But where is $\theta$? After sampling, the approximation no longer contains $\theta$ explicitly, so it cannot be differentiated with respect to $\theta$ directly.
Policy Gradient
$\nabla\bar{R}_\theta = \sum_h P(h) \sum_x R(h,x)\, \nabla P_\theta(x|h)$

$= \sum_h P(h) \sum_x R(h,x)\, P_\theta(x|h)\, \frac{\nabla P_\theta(x|h)}{P_\theta(x|h)}$

$= \sum_h P(h) \sum_x R(h,x)\, P_\theta(x|h)\, \nabla\log P_\theta(x|h)$
(using $\frac{d\log f(x)}{dx} = \frac{1}{f(x)}\frac{df(x)}{dx}$)

$= E_{h \sim P(h),\, x \sim P_\theta(x|h)}\left[R(h,x)\, \nabla\log P_\theta(x|h)\right]$

$\approx \frac{1}{N}\sum_{i=1}^{N} R(h^i,x^i)\, \nabla\log P_\theta(x^i|h^i)$ (sampling)

Compare with $\bar{R}_\theta \approx \frac{1}{N}\sum_{i=1}^{N} R(h^i,x^i)$: only $R(h,x)$ itself appears, never its gradient, so the reward does not need to be differentiable.
Policy Gradient
• Gradient ascent:
$\theta^{new} \leftarrow \theta^{old} + \eta\, \nabla\bar{R}_{\theta^{old}}$, where $\nabla\bar{R}_\theta \approx \frac{1}{N}\sum_{i=1}^{N} R(h^i,x^i)\, \nabla\log P_\theta(x^i|h^i)$
• If $R(h^i,x^i)$ is positive: after updating $\theta$, $P_\theta(x^i|h^i)$ will increase.
• If $R(h^i,x^i)$ is negative: after updating $\theta$, $P_\theta(x^i|h^i)$ will decrease.
A minimal code sketch of this update follows.
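One way to implement the update is to minimize the negative of the objective, so that a standard optimizer performs gradient ascent on $\bar{R}_\theta$. A sketch, where `gen.log_prob(x, h)` is an assumed interface returning $\log P_\theta(x|h)$ as a differentiable scalar:

```python
import torch

# One policy-gradient (REINFORCE) step over a batch of sampled
# (h, x, R) triples, with x drawn from P_θ(·|h) and R = R(h, x).
def policy_gradient_step(gen, optimizer, batch):
    loss = 0.0
    for h, x, R in batch:
        # maximizing (1/N) Σ R(h^i, x^i) log P_θ(x^i|h^i)
        # equals minimizing its negative
        loss = loss - R * gen.log_prob(x, h)
    (loss / len(batch)).backward()
    optimizer.step()        # θ_new ← θ_old + η ∇R̄_θ (via the sign flip)
    optimizer.zero_grad()
```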
Implementation
Comparing the two training schemes:

Maximum Likelihood
• Training data: $(h^1,\hat{x}^1), \ldots, (h^N,\hat{x}^N)$
• Objective function: $\frac{1}{N}\sum_{i=1}^{N} \log P_\theta(\hat{x}^i|h^i)$
• Gradient: $\frac{1}{N}\sum_{i=1}^{N} \nabla\log P_\theta(\hat{x}^i|h^i)$

Reinforcement Learning
• Training data: $(h^1,x^1), \ldots, (h^N,x^N)$, obtained by sampling, with a human scoring each pair as $R(h^i,x^i)$
• Objective function: $\frac{1}{N}\sum_{i=1}^{N} R(h^i,x^i)\, \log P_\theta(x^i|h^i)$
• Gradient: $\frac{1}{N}\sum_{i=1}^{N} R(h^i,x^i)\, \nabla\log P_\theta(x^i|h^i)$

RL is thus maximum likelihood on the sampled pairs, with each example weighted by $R(h^i,x^i)$; plain maximum likelihood amounts to setting $R(h^i,\hat{x}^i) = 1$ for every reference pair.
Implementation
At iteration $t$, with current parameters $\theta^t$:
• Sample $(h^1,x^1), (h^2,x^2), \ldots, (h^N,x^N)$ and obtain rewards $R(h^1,x^1), R(h^2,x^2), \ldots, R(h^N,x^N)$.
• New objective: $\frac{1}{N}\sum_{i=1}^{N} R(h^i,x^i)\, \log P_\theta(x^i|h^i)$
• Compute the gradient $\nabla\bar{R}_{\theta^t} \approx \frac{1}{N}\sum_{i=1}^{N} R(h^i,x^i)\, \nabla\log P_{\theta^t}(x^i|h^i)$ and update $\theta^{t+1} \leftarrow \theta^t + \eta\, \nabla\bar{R}_{\theta^t}$.
$\theta^0$ can be well pre-trained from $(h^1,\hat{x}^1), \ldots, (h^N,\hat{x}^N)$ by maximum likelihood.
Add a Baseline
Ideal case: the probability of each response $(h,x^1), (h,x^2), (h,x^3)$ moves in proportion to its reward. Due to sampling, however, some responses may never be sampled. Because $P_\theta(x|h)$ is a probability distribution, if $R(h^i,x^i)$ is always positive, then

$\frac{1}{N}\sum_{i=1}^{N} R(h^i,x^i)\, \nabla\log P_\theta(x^i|h^i)$

raises the probability of every sampled response, and the unsampled ones, good responses included, are pushed down by normalization. Subtract a baseline $b$:

$\frac{1}{N}\sum_{i=1}^{N} \left(R(h^i,x^i) - b\right) \nabla\log P_\theta(x^i|h^i)$
Add a Baseline
With the baseline, responses whose reward exceeds $b$ have their probability increased, while responses with $R(h^i,x^i) < b$ have theirs decreased, so being sampled no longer guarantees a probability boost even when all raw rewards are positive. There are several ways to obtain the baseline $b$; one simple choice is sketched below.
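Since the slides leave $b$ open, here is one simple, assumed choice: a running mean of observed rewards, so that each sample is weighted by $R(h^i,x^i) - b$ and below-average responses get negative weight:

```python
# Illustrative baseline: an exponential running mean of the rewards.
class RunningBaseline:
    def __init__(self, momentum=0.9):
        self.b = 0.0
        self.momentum = momentum

    def __call__(self, R):
        self.b = self.momentum * self.b + (1 - self.momentum) * R
        return R - self.b  # can be negative for below-average rewards
```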
AlphaGo-style training!
• Let two agents talk to each other
Example dialogue 1 (degenerate): "How old are you?" / "See you." / "See you." / "See you."
Example dialogue 2 (better): "How old are you?" / "I am 16." / "I thought you were 12." / "What makes you think so?"
Using a pre-defined evaluation function to compute R(h,x)
Example Reward
• The final reward $R(h,x)$ is the weighted sum of three terms $r_1(h,x)$, $r_2(h,x)$, and $r_3(h,x)$:
$R(h,x) = \lambda_1 r_1(h,x) + \lambda_2 r_2(h,x) + \lambda_3 r_3(h,x)$
• Ease of answering (don't be a conversation killer)
• Information flow (say something new)
• Semantic coherence (don't contradict what was said before)
Example Results
Outline
Improving Supervised Seq-to-seq Model
• RL (human feedback)
• GAN (discriminator feedback)
Unsupervised Seq-to-seq Model
• Summarization
• Translation
Text Style Transfer
Basic Idea – Chat-bot
[Figure: the chatbot (encoder + decoder) takes the input sentence/history $h$ and produces the response sentence $x$; the discriminator takes both $h$ and $x$ and outputs "real or fake". The discriminator is trained on human dialogues. This is a conditional GAN.]
Algorithm – Chat-bot
• Initialize generator Gen and discriminator Dis
• In each iteration:
• Sample real history $h$ and correct sentence $x$ from the database of training dialogues
• Sample real history $h'$ from the database, and generate a sentence $\tilde{x} = Gen(h')$
• Update Dis to increase $Dis(h, x)$ and decrease $Dis(h', \tilde{x})$
• Update Gen such that $Dis(h', Gen(h'))$ increases
[Figure: the chatbot's output, together with the history, is fed to the discriminator, whose scalar output drives the generator update.]
Can we do backpropagation?
[Figure: the generator samples output words step by step from <BOS>, and the sampled sentence is fed to the discriminator, whose scalar output is used to update the generator.]
The discrete sampling step is the problem: tuning the generator's parameters a little bit will not change the sampled output, so the discriminator's feedback cannot be backpropagated through the words.
Alternative: feed the generator's output distributions directly to the discriminator, ignoring the sampling process, and train with improved WGAN.
Alternatives
• Gumbel-softmax (see the sketch after this list)
• Matt J. Kusner, José Miguel Hernández-Lobato, “GANS for Sequences of Discrete Elements with the Gumbel-softmax Distribution”, arXiv 2016
• MaliGAN
• Tong Che, Yanran Li, Ruixiang Zhang, R Devon Hjelm, Wenjie Li, Yangqiu Song, Yoshua Bengio, “Maximum-Likelihood Augmented Discrete
Generative Adversarial Networks”, arXiv 2017
• SeqGAN
• Lantao Yu, Weinan Zhang, Jun Wang, Yong Yu, “SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient”, AAAI 2017
• Jiwei Li, Will Monroe, Tianlin Shi, Sébastien Jean, Alan Ritter, Dan Jurafsky,
“Adversarial Learning for Neural Dialogue Generation”, arXiv 2017
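To make the sampling problem concrete, the Gumbel-softmax trick can be sketched with PyTorch's built-in `gumbel_softmax`. This is a generic illustration of the idea, not the exact setup of any of the cited papers:

```python
import torch
import torch.nn.functional as F

# Replace the non-differentiable word-sampling step with a
# differentiable (straight-through) sample, so the discriminator's
# gradient can reach the generator's logits.
logits = torch.randn(1, 10, requires_grad=True)  # (batch, vocab), illustrative

# hard=True: one-hot sample in the forward pass, soft gradient in the
# backward pass (straight-through estimator)
word = F.gumbel_softmax(logits, tau=1.0, hard=True)

score = word.sum()   # stand-in for a discriminator score
score.backward()     # gradients now flow back into `logits`
```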
Reinforcement Learning?
• Consider the output of the discriminator as the reward
• Update the generator to increase the discriminator's output, i.e., to get the maximum reward
• Different from typical RL: the discriminator itself keeps updating during training, so the reward function changes over time

$\nabla\bar{R}_\theta \approx \frac{1}{N}\sum_{i=1}^{N}\left(D(h^i,x^i) - b\right)\nabla\log P_\theta(x^i|h^i)$

[Figure: the chatbot's output goes to the discriminator; its scalar output is the reward used to update the chatbot.]
Discriminator Score
At iteration $t$, with current parameters $\theta^t$:
• g-step: sample $(h^1,x^1), \ldots, (h^N,x^N)$ and obtain discriminator scores $D(h^1,x^1), \ldots, D(h^N,x^N)$.
• New objective: $\frac{1}{N}\sum_{i=1}^{N} D(h^i,x^i)\, \log P_\theta(x^i|h^i)$, updated by $\theta^{t+1} \leftarrow \theta^t + \eta\, \frac{1}{N}\sum_{i=1}^{N} D(h^i,x^i)\, \nabla\log P_{\theta^t}(x^i|h^i)$
• d-step: update the discriminator to label the real pairs $(h, \hat{x})$ as real and the generated pairs $(h, x)$ as fake.
Example Results
(Thanks to classmate 段逸林 for providing the experimental results.)
Human evaluation:
• MLE: 52.6%
• SeqGAN: 56.9%
• ESGAN: 60.9%
Tips: Reward for Every Generation Step
$\nabla\bar{R}_\theta \approx \frac{1}{N}\sum_{i=1}^{N}\left(D(h^i,x^i) - b\right)\nabla\log P_\theta(x^i|h^i)$

$h^i$ = "What is your name?", $x^i$ = "I don't know": $D(h^i,x^i) - b$ is negative, so $\theta$ is updated to decrease $\log P_\theta(x^i|h^i) = \log P(x_1^i|h^i) + \log P(x_2^i|h^i,x_1^i) + \log P(x_3^i|h^i,x_{1:2}^i)$, which includes $P(\text{"I"}|h^i)$.

$h^i$ = "What is your name?", $x^i$ = "I am John": $D(h^i,x^i) - b$ is positive, so $\theta$ is updated to increase $\log P_\theta(x^i|h^i)$, which also includes $P(\text{"I"}|h^i)$.

The same first word thus receives opposite updates depending on the rest of the sentence; with a limited number of samples this is noisy, which motivates assigning a reward to every generation step.
Tips: Reward for Every Generation Step
$\log P_\theta(x^i|h^i) = \log P(x_1^i|h^i) + \log P(x_2^i|h^i,x_1^i) + \log P(x_3^i|h^i,x_{1:2}^i)$

E.g., $h^i$ = "What is your name?", $x^i$ = "I don't know": the three terms are $P(\text{"I"}|h^i)$, $P(\text{"don't"}|h^i,\text{"I"})$, and $P(\text{"know"}|h^i,\text{"I don't"})$.

Replace the sequence-level update
$\nabla\bar{R}_\theta \approx \frac{1}{N}\sum_{i=1}^{N}\left(D(h^i,x^i)-b\right)\nabla\log P_\theta(x^i|h^i)$
with a reward for every generation step:
$\nabla\bar{R}_\theta \approx \frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T}\left(Q(h^i,x_{1:t}^i)-b\right)\nabla\log P_\theta(x_t^i|h^i,x_{1:t-1}^i)$

• Method 1. Monte Carlo (MC) search
• Method 2. Discriminator for partially decoded sequences
Tips: Monte Carlo Search
• How to estimate $Q(h^i, x_{1:t}^i)$? E.g., $Q(\text{"What is your name?"}, \text{"I"})$: keep the prefix $x_1^i$ = "I" and use a roll-out generator to complete the sentence several times, then average the discriminator scores:
$x^A$ = "I am John", $D(h^i,x^A) = 1.0$
$x^B$ = "I am happy", $D(h^i,x^B) = 0.1$
$x^C$ = "I don't know", $D(h^i,x^C) = 0.1$
$x^D$ = "I am superman", $D(h^i,x^D) = 0.8$
Average: $Q(h^i, \text{"I"}) = 0.5$. A roll-out generator for sampling is needed; a sketch of the estimate follows.
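A sketch of the Monte Carlo estimate, assuming a roll-out generator `rollout` and a discriminator `D` with the obvious interfaces (neither is defined on the slides):

```python
# Q(h^i, x^i_{1:t}): complete the decoded prefix several times with the
# roll-out generator and average the discriminator's scores.
def estimate_Q(D, rollout, h, prefix, n_rollouts=4):
    scores = [D(h, rollout.sample(h, prefix)) for _ in range(n_rollouts)]
    return sum(scores) / len(scores)

# e.g. prefix "I" completed to "I am John" (1.0), "I am happy" (0.1),
# "I don't know" (0.1), "I am superman" (0.8) gives Q(h, "I") = 0.5
```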
Tips: Rewarding Partially Decoded Sequences
• Train a discriminator that is able to assign rewards to both fully and partially decoded sequences
• Break generated sequences into partial sequences, e.g., for h = "What is your name?":
x = "I", x = "I don't", x = "I don't know"
x = "I", x = "I am", x = "I am John"
The discriminator then directly outputs $Q(h, x_{1:t})$ for any prefix $x_{1:t}$, so no roll-out is needed. A sketch of building such training data follows.
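An illustrative way to build the partial-sequence training set (the slides only show the resulting pairs; the function and names are assumptions):

```python
# Turn one (h, x) pair into discriminator training examples,
# one per prefix of the response.
def prefix_examples(h, x_words, label):
    # "I am John" -> (h, "I"), (h, "I am"), (h, "I am John")
    return [(h, " ".join(x_words[:t]), label)
            for t in range(1, len(x_words) + 1)]

# usage: real replies get label 1, generated replies label 0
examples = prefix_examples("What is your name?",
                           ["I", "don't", "know"], label=0)
```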
Tips: Adding Good Examples
• The training of the generative model is unstable
• The reward only promotes or discourages the generator's own generated sequences: usually the generator learns that its generated results are bad, but it is never shown what good results look like
• Fix: add real data to the objective
Training data for SeqGAN: pairs $(h^1,x^1), \ldots, (h^N,x^N)$ obtained by sampling, weighted by $D(h^i,x^i)$.
Adding more data: also include the real pairs $(h^1,\hat{x}^1), \ldots, (h^N,\hat{x}^N)$, treated as having $D(h^i,\hat{x}^i) = 1$.
Tips: RankGAN
Kevin Lin, Dianqi Li, Xiaodong He, Zhengyou Zhang, Ming-Ting Sun, "Adversarial Ranking for Language Generation", NIPS 2017
[Figure: image caption generation examples from the paper.]
More Applications
• Supervised machine translation
• Lijun Wu, Yingce Xia, Li Zhao, Fei Tian, Tao Qin, Jianhuang Lai, Tie-Yan Liu, “Adversarial Neural Machine Translation”, arXiv 2017
• Zhen Yang, Wei Chen, Feng Wang, Bo Xu, "Improving Neural Machine Translation with Conditional Sequence Generative Adversarial Nets", arXiv 2017
• Supervised abstractive summarization
• Linqing Liu, Yao Lu, Min Yang, Qiang Qu, Jia Zhu, Hongyan Li, “Generative Adversarial Network for Abstractive Text Summarization”, AAAI 2018
• Image/video caption generation
• Rakshith Shetty, Marcus Rohrbach, Lisa Anne Hendricks, Mario Fritz, Bernt Schiele, “Speaking the Same Language: Matching Machine to Human
Captions by Adversarial Training”, ICCV 2017
• Xiaodan Liang, Zhiting Hu, Hao Zhang, Chuang Gan, Eric P. Xing, “Recurrent Topic-Transition GAN for Visual Paragraph Generation”, arXiv 2017
Outline
Improving Supervised Seq-to-seq Model
• RL (human feedback)
• GAN (discriminator feedback)
Unsupervised Seq-to-seq Model
• Summarization
• Translation
Text Style Transfer
Summarization
[Figure: from the audio file to be summarized, the most informative segments (e.g., "…… deep learning is powerful ……") are selected to form a compact version: "This is the summary."]
Extractive summaries: [Lee, et al., Interspeech 12] [Lee, et al., ICASSP 13] [Shiang, et al., Interspeech 13]
The machine does not write summaries in its own words.
Abstractive Summarization
• Now the machine can do abstractive summarization (write summaries in its own words)
[Figure: documents paired with their titles (Title 1, Title 2, Title 3) are the training data; the machine then generates titles in its own words, without hand-crafted rules.]
Abstractive Summarization
• Input: transcriptions of audio from automatic speech recognition (ASR); output: summary
[Figure: an RNN encoder reads through the input words $w_1, w_2, w_3, w_4$, producing states $h_1, h_2, h_3, h_4$; an RNN generator with states $z_1, z_2, \ldots$ then emits the summary words $w_A, w_B, \ldots$]
We need lots of labelled training data (supervised).
Unsupervised Abstractive Summarization
• Document: Australia today signed bilateral anti-doping agreements with 13 countries, aiming to strengthen drug testing outside of sports competition and to share research results ……
• Summary:
• Human: Australia signs anti-doping agreements with 13 countries
• Unsupervised: Australia strengthens drug testing outside of sports competition
• Document: The Republic of China Olympic Committee today received an invitation to the 1992 Winter Olympics; since chairman 張豐緒 is currently on a friendly visit to Central and South America, it has not yet been decided whether to send a team ……
• Summary:
• Human: We are invited by letter to the 1992 Winter Olympics
• Unsupervised: The Olympic Committee receives an invitation to the Winter Olympics
Unsupervised Abstractive Summarization
• Document: Local media reported on the 27th that two provinces on Indonesia's Sumatra island have had days of continuous heavy rain; flooding has caused landslides, and as of the 26th at least 60 people had died and more than 100 were missing ……
• Summary:
• Human: Indonesian floods kill 60 people
• Unsupervised: 印尼門洪水泛濫導致塌雨 (garbled; roughly "Indonesia-door flooding causes collapse-rain")
• Document: Hefei, Anhui Province recently set new rules for officials visiting the grassroots: always travel light with a small entourage; no welcome-and-send-off receptions and no layers of accompanying officials ……
• Summary:
• Human: Hefei requires officials to keep grassroots visits simple
• Unsupervised: 合肥領導幹部下基層做搞迎來送往規定:一律簡 (ungrammatical; roughly "Hefei officials grassroots-visit do welcome-send-off rules: all simple")
More Applications
• Unsupervised video summarization
Behrooz Mahasseni, Michael Lam and Sinisa Todorovic, “Unsupervised Video Summarization with Adversarial LSTM Networks”, CVPR, 2017
Outline of Part II
Improving Supervised Seq-to-seq Model
• RL (human feedback)
• GAN (discriminator feedback)
Unsupervised Seq-to-seq Model
• Summarization
• Translation
Text Style Transfer
Unsupervised Translation
• Alexis Conneau, Guillaume Lample, Marc'Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou, "Word Translation Without Parallel Data", submitted to ICLR 2018
• Guillaume Lample, Ludovic Denoyer, Marc'Aurelio Ranzato, "Unsupervised Machine Translation Using Monolingual Corpora Only", submitted to ICLR 2018
Approaches
Experimental Results
Outline of Part II
Improving Supervised Seq-to-seq Model
• RL (human feedback)
• GAN (discriminator feedback)
Unsupervised Seq-to-seq Model
• Summarization
• Translation
Text Style Transfer
Example: Personalized Chat-bot
• General chat-bots generate plain responses
• Humans talk in different styles and sentiments to different people in different situations
• We want the chat-bot's responses to be controllable, so that chat-bots can be personalized in the future
• Below we focus only on generating positive responses
Input: How was your day today?
Optimistic chat-bot: "It is wonderful today." (rather than "It is terrible today.")
Assumption: We have a sentiment classifier. Given a sentence x, we can evaluate how positive it is, SC(x).
Approaches
Two types of approaches:
• Type 1. System Modification: the chatbot's parameters are modified, so the encoder + decoder directly maps the input sentence to a positive response.
• Type 2. Output Transformation: the chatbot does not have to change; a transformation module converts its response sentence into a positive response.
Approaches
• 1. Persona-Based Model
Training: a sentiment classifier scores how positive each training response is, and the score is given to the model as an extra input condition. E.g., for the input "How is today", the response "Today is awesome" is scored 0.9 by the sentiment classifier, while "Today is bad" is scored 0.1.
Approaches
• 1. Persona-Based Model
Testing: the desired positivity is supplied as the condition. For the input "I love you", conditioning on a score of 1.0 yields the positive response "I love you, too.", while conditioning on 0.0 yields "I am not ready to start a relationship."
Approaches
• 2. Reinforcement Learning
The sentiment classifier's score on the generated response is used as the reward (positive reward for a positive response), and the network parameters are updated accordingly. E.g., for the input "How is today", the response "Today is bad" is scored only 0.1 by the sentiment classifier and is therefore discouraged.
Approaches
• 3. Plug & Play (output transformation)
The chatbot's response sentence is encoded into a code by a VRAE encoder. The code is then adjusted into a new code such that the sentiment classifier's score on the decoded sentence is as large as possible, while the new code stays as close as possible to the original one. The VRAE decoder finally turns the new code into the positive response. A sketch of this adjustment follows.
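A sketch of the plug-and-play adjustment under stated assumptions: `enc`/`dec` stand for a pre-trained VRAE encoder/decoder, and `SC` for a sentiment classifier that is differentiable with respect to the decoder's (soft) output. All names are hypothetical:

```python
import torch

# Gradient ascent on the latent code: make SC(dec(z)) as large as
# possible while keeping z as close as possible to the original code.
def positivize(enc, dec, SC, response, steps=50, lr=0.1, lam=1.0):
    z0 = enc(response).detach()            # code of the original response
    z = z0.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        loss = -SC(dec(z)) + lam * (z - z0).pow(2).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return dec(z)                          # decode the adjusted code
```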
Approaches
• 4. Cycle GAN (output transformation)
Treat it like unpaired image-to-image translation (e.g., male ↔ female faces): one domain contains positive sentences ("It is good.", "It's a good day.", "I love you."), the other negative sentences ("It is bad.", "It's a bad day.", "I don't love you."), and a Cycle GAN learns the transformation between the two domains from unpaired data.
Cycle GAN
• Negative sentence to positive sentence:
it's a crappy day → it's a great day
i wish you could be here → you could be here
it's not a good idea → it's good idea
i miss you → i love you
i don't love you → i love you
i can't do that → i can do that
i feel so sad → i happy
it's a bad day → it's a good day
it's a dummy day → it's a great day
sorry for doing such a horrible thing → thanks for doing a great thing
my doggy is sick → my doggy is my doggy
my little doggy is sick → my little doggy is my little doggy
Experimental Results
• The chat-bot itself is modified (these two methods can make complicated changes to the output sentences).
• The RL model performs better than the persona-based model on two of the three metrics, but not on sentiment score.
• For the persona-based model, sentiment is high but coherence and grammar are low.
Experimental Results
• These three models modify the responses after they are generated.
• Among the three models, Cycle GAN is the best in terms of coherence, sentiment, and grammar.
Examples: goo.gl/X1PZLM.
Example 1
• Input: yeah, i’m just less interested in the how-to and more in the what and why of the things we build
• Seq2seq: it’s not a good thing .
• Persona: I’m interested in this .
• RL: it’s a good one
• Plug and play: it’s not a wrong thing .
• Cycle GAN: it’s a good thing .
Example 2
• Input: always learning and self improving is a huge turn on for me so believe me i understand
• Seq2seq: i'm not sure if i can see you in a relationship
• Persona: and you're gonna have a word with your mother .
• RL: i love you so much
• Plug and play: i'm not sure if i can see you a a special
• Cycle GAN: i'm sure if i can see you in a relationship.