Improving Sequence Generation by GAN
Hung-yi Lee
http://www.voidcn.com/article/p-nbtytose-tz.html
How to do NLP-related research
Outline
Improving Supervised Seq-to-seq Model
• RL (human feedback)
• GAN (discriminator feedback)
Unsupervised Seq-to-seq Model
• Summarization
• Translation
Text Style Transfer
Review: Chat-bot
• Sequence-to-sequence learning
[Figure: the encoder reads the input sentence together with the history information; the generator produces the output sentence.]
Training data: dialogues such as
A: OOO  B: XXX  A: ∆∆∆ ……
Each response (e.g., B: XXX) is generated from the preceding turns (e.g., A: OOO).
Review: Encoder
[Figure: hierarchical encoder. A word-level RNN reads each sentence, e.g., 你好嗎 ("How are you?") and 我很好 ("I'm fine"), and a sentence-level RNN reads the resulting sentence representations; the final output is passed to the generator.]
Review: Generator
[Figure: RNN generator. Starting from <BOS>, each step outputs a distribution over tokens (A, B, …), a token is sampled, and it is fed back as the next input. Each step is conditioned on the encoder output; with an attention mechanism the condition can be different at every step.]
Review: Training Generator
Reference: the correct response, e.g., "A B B".
[Figure: teacher forcing. At each step the generator's output distribution is compared with the corresponding reference token, and the reference token is fed in as the next input; each step is conditioned on the encoder output.]
Training minimizes $C = \sum_t C_t$, the sum of the cross-entropies $C_1, C_2, C_3, \ldots$ between each output distribution and the reference token.
Review: Maximum Likelihood
$C_t = -\log P_\theta(\hat{x}_t \mid \hat{x}_{1:t-1}, h)$

$C = \sum_t C_t = -\sum_t \log P_\theta(\hat{x}_t \mid \hat{x}_{1:t-1}, h)$
$= -\log\left[P_\theta(\hat{x}_1 \mid h)\, P_\theta(\hat{x}_2 \mid \hat{x}_1, h) \cdots P_\theta(\hat{x}_T \mid \hat{x}_{1:T-1}, h)\right]$
$= -\log P_\theta(\hat{x} \mid h)$

Minimizing $C$ maximizes the likelihood of generating $\hat{x}$ given $h$.

Training data: pairs $(h, \hat{x})$, where $h$ is the input sentence and history/context, $\hat{x}$ is the correct response (a word sequence), $\hat{x}_t$ is its $t$-th word, and $\hat{x}_{1:t}$ are its first $t$ words. In code this is ordinary teacher-forcing cross-entropy, sketched below.
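A minimal PyTorch sketch of this loss. The toy GRU decoder and all names here are illustrative stand-ins, not the actual model on the slides:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative decoder: a GRU whose initial state is the encoder output h.
class ToyGenerator(nn.Module):
    def __init__(self, vocab_size=10, hidden=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, h, prev_tokens):
        # h: (B, hidden) condition from the encoder
        # prev_tokens: (B, T) teacher-forced inputs <BOS>, x̂_1, ..., x̂_{T-1}
        y, _ = self.rnn(self.embed(prev_tokens), h.unsqueeze(0))
        return self.out(y)  # (B, T, vocab): P_θ(· | x̂_{1:t-1}, h) for each t

def mle_loss(gen, h, prev_tokens, x_hat):
    # C = Σ_t C_t = -Σ_t log P_θ(x̂_t | x̂_{1:t-1}, h)
    logits = gen(h, prev_tokens)
    return F.cross_entropy(logits.transpose(1, 2), x_hat, reduction="sum")
```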
Outline
Improving Supervised Seq-to-seq Model
• RL (human feedback)
• GAN (discriminator feedback)
Unsupervised Seq-to-seq Model
• Summarization
• Translation
Text Style Transfer
Jiwei Li, Will Monroe, Alan Ritter, Michel Galley, Jianfeng Gao, Dan Jurafsky, "Deep Reinforcement Learning for Dialogue Generation", EMNLP 2016
Introduction
• Machine obtains feedback from user
• Chat-bot learns to maximize the expected reward
[Figure: two example exchanges. "How are you?" answered with "Bye bye" receives reward -10; "Hello" answered with "Hi" receives reward 3.]
Maximizing Expected Reward
$\theta^* = \arg\max_\theta \bar{R}_\theta$

$\bar{R}_\theta = \sum_h P(h) \sum_x R(h,x)\, P_\theta(x|h)$

[Figure: the encoder/generator with parameters $\theta$ maps $h$ to $x$; a human returns $R(h,x)$, which is used to update the model.]

$P(h)$: probability that the input/history is $h$. $P_\theta(x|h)$: the randomness in the generator. Update $\theta$ to maximize the expected reward.
Maximizing Expected Reward
$\bar{R}_\theta = \sum_h P(h) \sum_x R(h,x)\, P_\theta(x|h) = E_{h \sim P(h)} E_{x \sim P_\theta(x|h)}\left[R(h,x)\right]$

Approximate the expectation by sampling $(h^1,x^1), (h^2,x^2), \cdots, (h^N,x^N)$:

$\bar{R}_\theta \approx \frac{1}{N} \sum_{i=1}^{N} R(h^i, x^i)$

$\theta^* = \arg\max_\theta \bar{R}_\theta$. But where is $\theta$? After sampling, the approximation no longer contains $\theta$ explicitly, so it cannot be differentiated with respect to $\theta$ directly.
Policy Gradient
$\nabla\bar{R}_\theta = \sum_h P(h) \sum_x R(h,x)\, \nabla P_\theta(x|h)$

$= \sum_h P(h) \sum_x R(h,x)\, P_\theta(x|h)\, \frac{\nabla P_\theta(x|h)}{P_\theta(x|h)}$

$= \sum_h P(h) \sum_x R(h,x)\, P_\theta(x|h)\, \nabla\log P_\theta(x|h)$
(using $\frac{d\log f(x)}{dx} = \frac{1}{f(x)}\frac{df(x)}{dx}$)

$= E_{h \sim P(h),\, x \sim P_\theta(x|h)}\left[R(h,x)\, \nabla\log P_\theta(x|h)\right]$

$\approx \frac{1}{N}\sum_{i=1}^{N} R(h^i,x^i)\, \nabla\log P_\theta(x^i|h^i)$ (sampling)

Compare with $\bar{R}_\theta \approx \frac{1}{N}\sum_{i=1}^{N} R(h^i,x^i)$: only $R(h,x)$ itself appears, never its gradient, so the reward does not need to be differentiable.
Policy Gradient
• Gradient ascent:
$\theta^{new} \leftarrow \theta^{old} + \eta\, \nabla\bar{R}_{\theta^{old}}$, where $\nabla\bar{R}_\theta \approx \frac{1}{N}\sum_{i=1}^{N} R(h^i,x^i)\, \nabla\log P_\theta(x^i|h^i)$
• If $R(h^i,x^i)$ is positive: after updating $\theta$, $P_\theta(x^i|h^i)$ will increase.
• If $R(h^i,x^i)$ is negative: after updating $\theta$, $P_\theta(x^i|h^i)$ will decrease.
A minimal code sketch of this update follows.
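One way to implement the update is to minimize the negative of the objective, so that a standard optimizer performs gradient ascent on $\bar{R}_\theta$. A sketch, where `gen.log_prob(x, h)` is an assumed interface returning $\log P_\theta(x|h)$ as a differentiable scalar:

```python
import torch

# One policy-gradient (REINFORCE) step over a batch of sampled
# (h, x, R) triples, with x drawn from P_θ(·|h) and R = R(h, x).
def policy_gradient_step(gen, optimizer, batch):
    loss = 0.0
    for h, x, R in batch:
        # maximizing (1/N) Σ R(h^i, x^i) log P_θ(x^i|h^i)
        # equals minimizing its negative
        loss = loss - R * gen.log_prob(x, h)
    (loss / len(batch)).backward()
    optimizer.step()        # θ_new ← θ_old + η ∇R̄_θ (via the sign flip)
    optimizer.zero_grad()
```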
Implementation
Comparing the two training schemes:

Maximum Likelihood
• Training data: $(h^1,\hat{x}^1), \ldots, (h^N,\hat{x}^N)$
• Objective function: $\frac{1}{N}\sum_{i=1}^{N} \log P_\theta(\hat{x}^i|h^i)$
• Gradient: $\frac{1}{N}\sum_{i=1}^{N} \nabla\log P_\theta(\hat{x}^i|h^i)$

Reinforcement Learning
• Training data: $(h^1,x^1), \ldots, (h^N,x^N)$, obtained by sampling, with a human scoring each pair as $R(h^i,x^i)$
• Objective function: $\frac{1}{N}\sum_{i=1}^{N} R(h^i,x^i)\, \log P_\theta(x^i|h^i)$
• Gradient: $\frac{1}{N}\sum_{i=1}^{N} R(h^i,x^i)\, \nabla\log P_\theta(x^i|h^i)$

RL is thus maximum likelihood on the sampled pairs, with each example weighted by $R(h^i,x^i)$; plain maximum likelihood amounts to setting $R(h^i,\hat{x}^i) = 1$ for every reference pair.
Implementation
At iteration $t$, with current parameters $\theta^t$:
• Sample $(h^1,x^1), (h^2,x^2), \ldots, (h^N,x^N)$ and obtain rewards $R(h^1,x^1), R(h^2,x^2), \ldots, R(h^N,x^N)$.
• New objective: $\frac{1}{N}\sum_{i=1}^{N} R(h^i,x^i)\, \log P_\theta(x^i|h^i)$
• Compute the gradient $\nabla\bar{R}_{\theta^t} \approx \frac{1}{N}\sum_{i=1}^{N} R(h^i,x^i)\, \nabla\log P_{\theta^t}(x^i|h^i)$ and update $\theta^{t+1} \leftarrow \theta^t + \eta\, \nabla\bar{R}_{\theta^t}$.
$\theta^0$ can be well pre-trained from $(h^1,\hat{x}^1), \ldots, (h^N,\hat{x}^N)$ by maximum likelihood.
Add a Baseline
Ideal case: the probability of each response $(h,x^1), (h,x^2), (h,x^3)$ moves in proportion to its reward. Due to sampling, however, some responses may never be sampled. Because $P_\theta(x|h)$ is a probability distribution, if $R(h^i,x^i)$ is always positive, then

$\frac{1}{N}\sum_{i=1}^{N} R(h^i,x^i)\, \nabla\log P_\theta(x^i|h^i)$

raises the probability of every sampled response, and the unsampled ones, good responses included, are pushed down by normalization. Subtract a baseline $b$:

$\frac{1}{N}\sum_{i=1}^{N} \left(R(h^i,x^i) - b\right) \nabla\log P_\theta(x^i|h^i)$
Add a Baseline
With the baseline, responses whose reward exceeds $b$ have their probability increased, while responses with $R(h^i,x^i) < b$ have theirs decreased, so being sampled no longer guarantees a probability boost even when all raw rewards are positive. There are several ways to obtain the baseline $b$; one simple choice is sketched below.
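Since the slides leave $b$ open, here is one simple, assumed choice: a running mean of observed rewards, so that each sample is weighted by $R(h^i,x^i) - b$ and below-average responses get negative weight:

```python
# Illustrative baseline: an exponential running mean of the rewards.
class RunningBaseline:
    def __init__(self, momentum=0.9):
        self.b = 0.0
        self.momentum = momentum

    def __call__(self, R):
        self.b = self.momentum * self.b + (1 - self.momentum) * R
        return R - self.b  # can be negative for below-average rewards
```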
AlphaGo-style training!
• Let two agents talk to each other
Example dialogue 1 (degenerate): "How old are you?" / "See you." / "See you." / "See you."
Example dialogue 2 (better): "How old are you?" / "I am 16." / "I thought you were 12." / "What makes you think so?"
Using a pre-defined evaluation function to compute R(h,x)
Example Reward
• The final reward $R(h,x)$ is the weighted sum of three terms $r_1(h,x)$, $r_2(h,x)$, and $r_3(h,x)$:
$R(h,x) = \lambda_1 r_1(h,x) + \lambda_2 r_2(h,x) + \lambda_3 r_3(h,x)$
• Ease of answering (don't be a conversation killer)
• Information flow (say something new)
• Semantic coherence (don't contradict what was said before)
Example Results
Outline
Improving Supervised Seq-to-seq Model
• RL (human feedback)
• GAN (discriminator feedback)
Unsupervised Seq-to-seq Model
• Summarization
• Translation
Text Style Transfer
Basic Idea – Chat-bot
[Figure: the chatbot (encoder + decoder) takes the input sentence/history $h$ and produces the response sentence $x$; the discriminator takes both $h$ and $x$ and outputs "real or fake". The discriminator is trained on human dialogues. This is a conditional GAN.]
Algorithm – Chat-bot
• Initialize generator Gen and discriminator Dis
• In each iteration:
• Sample real history $h$ and correct sentence $x$ from the database of training dialogues
• Sample real history $h'$ from the database, and generate a sentence $\tilde{x} = Gen(h')$
• Update Dis to increase $Dis(h, x)$ and decrease $Dis(h', \tilde{x})$
• Update Gen such that $Dis(h', Gen(h'))$ increases
[Figure: the chatbot's output, together with the history, is fed to the discriminator, whose scalar output drives the generator update.]
Can we do backpropagation?
[Figure: the generator samples output words step by step from <BOS>, and the sampled sentence is fed to the discriminator, whose scalar output is used to update the generator.]
The discrete sampling step is the problem: tuning the generator's parameters a little bit will not change the sampled output, so the discriminator's feedback cannot be backpropagated through the words.
Alternative: feed the generator's output distributions directly to the discriminator, ignoring the sampling process, and train with improved WGAN.
Alternatives
• Gumbel-softmax (see the sketch after this list)
• Matt J. Kusner, José Miguel Hernández-Lobato, “GANS for Sequences of Discrete Elements with the Gumbel-softmax Distribution”, arXiv 2016
• MaliGAN
• Tong Che, Yanran Li, Ruixiang Zhang, R Devon Hjelm, Wenjie Li, Yangqiu Song, Yoshua Bengio, “Maximum-Likelihood Augmented Discrete
Generative Adversarial Networks”, arXiv 2017
• SeqGAN
• Lantao Yu, Weinan Zhang, Jun Wang, Yong Yu, “SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient”, AAAI 2017
• Jiwei Li, Will Monroe, Tianlin Shi, Sébastien Jean, Alan Ritter, Dan Jurafsky,
“Adversarial Learning for Neural Dialogue Generation”, arXiv 2017
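To make the sampling problem concrete, the Gumbel-softmax trick can be sketched with PyTorch's built-in `gumbel_softmax`. This is a generic illustration of the idea, not the exact setup of any of the cited papers:

```python
import torch
import torch.nn.functional as F

# Replace the non-differentiable word-sampling step with a
# differentiable (straight-through) sample, so the discriminator's
# gradient can reach the generator's logits.
logits = torch.randn(1, 10, requires_grad=True)  # (batch, vocab), illustrative

# hard=True: one-hot sample in the forward pass, soft gradient in the
# backward pass (straight-through estimator)
word = F.gumbel_softmax(logits, tau=1.0, hard=True)

score = word.sum()   # stand-in for a discriminator score
score.backward()     # gradients now flow back into `logits`
```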
Reinforcement Learning?
• Consider the output of the discriminator as the reward
• Update the generator to increase the discriminator's output, i.e., to get the maximum reward
• Different from typical RL: the discriminator itself keeps updating during training, so the reward function changes over time

$\nabla\bar{R}_\theta \approx \frac{1}{N}\sum_{i=1}^{N}\left(D(h^i,x^i) - b\right)\nabla\log P_\theta(x^i|h^i)$

[Figure: the chatbot's output goes to the discriminator; its scalar output is the reward used to update the chatbot.]
Discriminator Score
At iteration $t$, with current parameters $\theta^t$:
• g-step: sample $(h^1,x^1), \ldots, (h^N,x^N)$ and obtain discriminator scores $D(h^1,x^1), \ldots, D(h^N,x^N)$.
• New objective: $\frac{1}{N}\sum_{i=1}^{N} D(h^i,x^i)\, \log P_\theta(x^i|h^i)$, updated by $\theta^{t+1} \leftarrow \theta^t + \eta\, \frac{1}{N}\sum_{i=1}^{N} D(h^i,x^i)\, \nabla\log P_{\theta^t}(x^i|h^i)$
• d-step: update the discriminator to label the real pairs $(h, \hat{x})$ as real and the generated pairs $(h, x)$ as fake.
Example Results
(Thanks to classmate 段逸林 for providing the experimental results.)
Human evaluation:
• MLE: 52.6%
• SeqGAN: 56.9%
• ESGAN: 60.9%
Tips: Reward for Every Generation Step
$\nabla\bar{R}_\theta \approx \frac{1}{N}\sum_{i=1}^{N}\left(D(h^i,x^i) - b\right)\nabla\log P_\theta(x^i|h^i)$

$h^i$ = "What is your name?", $x^i$ = "I don't know": $D(h^i,x^i) - b$ is negative, so $\theta$ is updated to decrease $\log P_\theta(x^i|h^i) = \log P(x_1^i|h^i) + \log P(x_2^i|h^i,x_1^i) + \log P(x_3^i|h^i,x_{1:2}^i)$, which includes $P(\text{"I"}|h^i)$.

$h^i$ = "What is your name?", $x^i$ = "I am John": $D(h^i,x^i) - b$ is positive, so $\theta$ is updated to increase $\log P_\theta(x^i|h^i)$, which also includes $P(\text{"I"}|h^i)$.

The same first word thus receives opposite updates depending on the rest of the sentence; with a limited number of samples this is noisy, which motivates assigning a reward to every generation step.
Tips: Reward for Every Generation Step
$\log P_\theta(x^i|h^i) = \log P(x_1^i|h^i) + \log P(x_2^i|h^i,x_1^i) + \log P(x_3^i|h^i,x_{1:2}^i)$

E.g., $h^i$ = "What is your name?", $x^i$ = "I don't know": the three terms are $P(\text{"I"}|h^i)$, $P(\text{"don't"}|h^i,\text{"I"})$, and $P(\text{"know"}|h^i,\text{"I don't"})$.

Replace the sequence-level update
$\nabla\bar{R}_\theta \approx \frac{1}{N}\sum_{i=1}^{N}\left(D(h^i,x^i)-b\right)\nabla\log P_\theta(x^i|h^i)$
with a reward for every generation step:
$\nabla\bar{R}_\theta \approx \frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T}\left(Q(h^i,x_{1:t}^i)-b\right)\nabla\log P_\theta(x_t^i|h^i,x_{1:t-1}^i)$

• Method 1. Monte Carlo (MC) search
• Method 2. Discriminator for partially decoded sequences
Tips: Monte Carlo Search
• How to estimate $Q(h^i, x_{1:t}^i)$? E.g., $Q(\text{"What is your name?"}, \text{"I"})$: keep the prefix $x_1^i$ = "I" and use a roll-out generator to complete the sentence several times, then average the discriminator scores:
$x^A$ = "I am John", $D(h^i,x^A) = 1.0$
$x^B$ = "I am happy", $D(h^i,x^B) = 0.1$
$x^C$ = "I don't know", $D(h^i,x^C) = 0.1$
$x^D$ = "I am superman", $D(h^i,x^D) = 0.8$
Average: $Q(h^i, \text{"I"}) = 0.5$. A roll-out generator for sampling is needed; a sketch of the estimate follows.
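A sketch of the Monte Carlo estimate, assuming a roll-out generator `rollout` and a discriminator `D` with the obvious interfaces (neither is defined on the slides):

```python
# Q(h^i, x^i_{1:t}): complete the decoded prefix several times with the
# roll-out generator and average the discriminator's scores.
def estimate_Q(D, rollout, h, prefix, n_rollouts=4):
    scores = [D(h, rollout.sample(h, prefix)) for _ in range(n_rollouts)]
    return sum(scores) / len(scores)

# e.g. prefix "I" completed to "I am John" (1.0), "I am happy" (0.1),
# "I don't know" (0.1), "I am superman" (0.8) gives Q(h, "I") = 0.5
```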
Tips: Rewarding Partially Decoded Sequences
• Train a discriminator that is able to assign rewards to both fully and partially decoded sequences
• Break generated sequences into partial sequences, e.g., for h = "What is your name?":
x = "I", x = "I don't", x = "I don't know"
x = "I", x = "I am", x = "I am John"
The discriminator then directly outputs $Q(h, x_{1:t})$ for any prefix $x_{1:t}$, so no roll-out is needed. A sketch of building such training data follows.
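An illustrative way to build the partial-sequence training set (the slides only show the resulting pairs; the function and names are assumptions):

```python
# Turn one (h, x) pair into discriminator training examples,
# one per prefix of the response.
def prefix_examples(h, x_words, label):
    # "I am John" -> (h, "I"), (h, "I am"), (h, "I am John")
    return [(h, " ".join(x_words[:t]), label)
            for t in range(1, len(x_words) + 1)]

# usage: real replies get label 1, generated replies label 0
examples = prefix_examples("What is your name?",
                           ["I", "don't", "know"], label=0)
```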
Tips: Adding Good Examples
• The training of the generative model is unstable
• The reward only promotes or discourages the generator's own generated sequences: usually the generator learns that its generated results are bad, but it is never shown what good results look like
• Fix: add real data to the objective
Training data for SeqGAN: pairs $(h^1,x^1), \ldots, (h^N,x^N)$ obtained by sampling, weighted by $D(h^i,x^i)$.
Adding more data: also include the real pairs $(h^1,\hat{x}^1), \ldots, (h^N,\hat{x}^N)$, treated as having $D(h^i,\hat{x}^i) = 1$.
Tips: RankGAN
Kevin Lin, Dianqi Li, Xiaodong He, Zhengyou Zhang, Ming-Ting Sun, "Adversarial Ranking for Language Generation", NIPS 2017
[Figure: image caption generation examples from the paper.]
More Applications
• Supervised machine translation
• Lijun Wu, Yingce Xia, Li Zhao, Fei Tian, Tao Qin, Jianhuang Lai, Tie-Yan Liu, “Adversarial Neural Machine Translation”, arXiv 2017
• Zhen Yang, Wei Chen, Feng Wang, Bo Xu, "Improving Neural Machine Translation with Conditional Sequence Generative Adversarial Nets", arXiv 2017
• Supervised abstractive summarization
• Linqing Liu, Yao Lu, Min Yang, Qiang Qu, Jia Zhu, Hongyan Li, “Generative Adversarial Network for Abstractive Text Summarization”, AAAI 2018
• Image/video caption generation
• Rakshith Shetty, Marcus Rohrbach, Lisa Anne Hendricks, Mario Fritz, Bernt Schiele, “Speaking the Same Language: Matching Machine to Human
Captions by Adversarial Training”, ICCV 2017
• Xiaodan Liang, Zhiting Hu, Hao Zhang, Chuang Gan, Eric P. Xing, “Recurrent Topic-Transition GAN for Visual Paragraph Generation”, arXiv 2017
Outline
Improving Supervised Seq-to-seq Model
• RL (human feedback)
• GAN (discriminator feedback)
Unsupervised Seq-to-seq Model
• Summarization
• Translation
Text Style Transfer
Summarization
[Figure: from the audio file to be summarized, the most informative segments (e.g., "…… deep learning is powerful ……") are selected to form a compact version: "This is the summary."]
Extractive summaries: [Lee, et al., Interspeech 12] [Lee, et al., ICASSP 13] [Shiang, et al., Interspeech 13]
The machine does not write summaries in its own words.
Abstractive Summarization
• Now the machine can do abstractive summarization (write summaries in its own words)
[Figure: documents paired with their titles (Title 1, Title 2, Title 3) are the training data; the machine then generates titles in its own words, without hand-crafted rules.]
Abstractive Summarization
• Input: transcriptions of audio from automatic speech recognition (ASR); output: summary
[Figure: an RNN encoder reads through the input words $w_1, w_2, w_3, w_4$, producing states $h_1, h_2, h_3, h_4$; an RNN generator with states $z_1, z_2, \ldots$ then emits the summary words $w_A, w_B, \ldots$]
We need lots of labelled training data (supervised).
Unsupervised Abstractive Summarization
• Document: Australia today signed bilateral anti-doping agreements with 13 countries, aiming to strengthen drug testing outside of sports competition and to share research results ……
• Summary:
• Human: Australia signs anti-doping agreements with 13 countries
• Unsupervised: Australia strengthens drug testing outside of sports competition
• Document: The Republic of China Olympic Committee today received an invitation to the 1992 Winter Olympics; since chairman 張豐緒 is currently on a friendly visit to Central and South America, it has not yet been decided whether to send a team ……
• Summary:
• Human: We are invited by letter to the 1992 Winter Olympics
• Unsupervised: The Olympic Committee receives an invitation to the Winter Olympics
Unsupervised Abstractive Summarization
• Document: Local media reported on the 27th that two provinces on Indonesia's Sumatra island have had days of continuous heavy rain; flooding has caused landslides, and as of the 26th at least 60 people had died and more than 100 were missing ……
• Summary:
• Human: Indonesian floods kill 60 people
• Unsupervised: 印尼門洪水泛濫導致塌雨 (garbled; roughly "Indonesia-door flooding causes collapse-rain")
• Document: Hefei, Anhui Province recently set new rules for officials visiting the grassroots: always travel light with a small entourage; no welcome-and-send-off receptions and no layers of accompanying officials ……
• Summary:
• Human: Hefei requires officials to keep grassroots visits simple
• Unsupervised: 合肥領導幹部下基層做搞迎來送往規定:一律簡 (ungrammatical; roughly "Hefei officials grassroots-visit do welcome-send-off rules: all simple")
More Applications
• Unsupervised video summarization
Behrooz Mahasseni, Michael Lam and Sinisa Todorovic, “Unsupervised Video Summarization with Adversarial LSTM Networks”, CVPR, 2017
Outline of Part II
Improving Supervised Seq-to-seq Model
• RL (human feedback)
• GAN (discriminator feedback)
Unsupervised Seq-to-seq Model
• Summarization
• Translation
Text Style Transfer
Unsupervised Translation
• Alexis Conneau, Guillaume Lample, Marc'Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou, "Word Translation Without Parallel Data", submitted to ICLR 2018
• Guillaume Lample, Ludovic Denoyer, Marc'Aurelio Ranzato, "Unsupervised Machine Translation Using Monolingual Corpora Only", submitted to ICLR 2018
Approaches
Experimental Results
Outline of Part II
Improving Supervised Seq-to-seq Model
• RL (human feedback)
• GAN (discriminator feedback)
Unsupervised Seq-to-seq Model
• Summarization
• Translation
Text Style Transfer
Example: Personalized Chat-bot
• General chat-bots generate plain responses
• Humans talk in different styles and sentiments to different people in different situations
• We want the chat-bot's responses to be controllable, so that chat-bots can be personalized in the future
• Below we focus only on generating positive responses
Input: How was your day today?
Optimistic chat-bot: "It is wonderful today." (rather than "It is terrible today.")
Assumption: We have a sentiment classifier. Given a sentence x, we can evaluate how positive it is, SC(x).
Approaches
Two types of approaches:
• Type 1. System Modification: the chatbot's parameters are modified, so the encoder + decoder directly maps the input sentence to a positive response.
• Type 2. Output Transformation: the chatbot does not have to change; a transformation module converts its response sentence into a positive response.
Approaches
• 1. Persona-Based Model
Training: a sentiment classifier scores how positive each training response is, and the score is given to the model as an extra input condition. E.g., for the input "How is today", the response "Today is awesome" is scored 0.9 by the sentiment classifier, while "Today is bad" is scored 0.1.
Approaches
• 1. Persona-Based Model
Testing: the desired positivity is supplied as the condition. For the input "I love you", conditioning on a score of 1.0 yields the positive response "I love you, too.", while conditioning on 0.0 yields "I am not ready to start a relationship."
Approaches
• 2. Reinforcement Learning
The sentiment classifier's score on the generated response is used as the reward (positive reward for a positive response), and the network parameters are updated accordingly. E.g., for the input "How is today", the response "Today is bad" is scored only 0.1 by the sentiment classifier and is therefore discouraged.
Approaches
• 3. Plug & Play (output transformation)
The chatbot's response sentence is encoded into a code by a VRAE encoder. The code is then adjusted into a new code such that the sentiment classifier's score on the decoded sentence is as large as possible, while the new code stays as close as possible to the original one. The VRAE decoder finally turns the new code into the positive response. A sketch of this adjustment follows.
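A sketch of the plug-and-play adjustment under stated assumptions: `enc`/`dec` stand for a pre-trained VRAE encoder/decoder, and `SC` for a sentiment classifier that is differentiable with respect to the decoder's (soft) output. All names are hypothetical:

```python
import torch

# Gradient ascent on the latent code: make SC(dec(z)) as large as
# possible while keeping z as close as possible to the original code.
def positivize(enc, dec, SC, response, steps=50, lr=0.1, lam=1.0):
    z0 = enc(response).detach()            # code of the original response
    z = z0.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        loss = -SC(dec(z)) + lam * (z - z0).pow(2).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return dec(z)                          # decode the adjusted code
```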
Approaches
• 4. Cycle GAN (output transformation)
Treat it like unpaired image-to-image translation (e.g., male ↔ female faces): one domain contains positive sentences ("It is good.", "It's a good day.", "I love you."), the other negative sentences ("It is bad.", "It's a bad day.", "I don't love you."), and a Cycle GAN learns the transformation between the two domains from unpaired data.
Cycle GAN
• Negative sentence to positive sentence:
it's a crappy day → it's a great day
i wish you could be here → you could be here
it's not a good idea → it's good idea
i miss you → i love you
i don't love you → i love you
i can't do that → i can do that
i feel so sad → i happy
it's a bad day → it's a good day
it's a dummy day → it's a great day
sorry for doing such a horrible thing → thanks for doing a great thing
my doggy is sick → my doggy is my doggy
my little doggy is sick → my little doggy is my little doggy
Experimental Results
• The chat-bot itself is modified (these two methods can make complicated changes to the output sentences).
• The RL model performs better than the persona-based model on two of the three metrics, but not on sentiment score.
• For the persona-based model, sentiment is high but coherence and grammar are low.
Experimental Results
• These three models modify the responses after they are generated.
• Among the three models, Cycle GAN is the best in terms of coherence, sentiment, and grammar.
Examples: goo.gl/X1PZLM.
Example 1
• Input: yeah, i’m just less interested in the how-to and more in the what and why of the things we build
• Seq2seq: it’s not a good thing .
• Persona: I’m interested in this .
• RL: it’s a good one
• Plug and play: it’s not a wrong thing .
• Cycle GAN: it’s a good thing .
Example 2
• Input: always learning and self improving is a huge turn on for me so believe me i understand
• Seq2seq: i'm not sure if i can see you in a relationship
• Persona: and you're gonna have a word with your mother .
• RL: i love you so much
• Plug and play: i'm not sure if i can see you a a special
• Cycle GAN: i'm sure if i can see you in a relationship.