(1)

Improving Sequence Generation by GAN

Hung-yi Lee

(2)

http://www.voidcn.com/article/p-nbtytose-tz.html

How to do NLP-related research

(3)

Outline

Improving Supervised Seq-to-seq Model

• RL (human feedback)

• GAN (discriminator feedback)

Unsupervised Seq-to-seq Model

• Summarization

• Translation

Text Style Transfer

(4)

Review: Chat-bot

• Sequence-to-sequence learning

(Figure: the Encoder reads the input sentence and the history information; the Generator produces the output sentence. Training data: human dialogues, e.g. A: OOO / B: XXX / A: ∆ ∆ ∆ / ……)

(5)

Review: Encoder

(Figure: a hierarchical encoder reads the dialogue history sentence by sentence, e.g. "你 好" and "我 很 …", and passes its output to the generator.)

(6)

Review: Generator

(Figure: the generator decodes word by word starting from <BOS>, with each generated token (A or B) fed back as the next input. The condition vector, shown as a colored box, can be different at each step with an attention mechanism.)

(7)

Review: Training Generator

Reference: A B

$C = \sum_t C_t$ — minimizing the cross-entropy of each component $C_t$

(Figure: the generator decodes from <BOS>; at each step its output distribution is compared against the reference token, giving the per-step costs $C_1, C_2, C_3$. The colored box is the condition vector.)

(8)

Review: Maximum Likelihood

$C_t = -\log P_\theta(\hat{x}_t \mid \hat{x}_{1:t-1}, h)$

$C = \sum_t C_t = -\sum_t \log P_\theta(\hat{x}_t \mid \hat{x}_{1:t-1}, h)$

$\quad = -\log\big[ P_\theta(\hat{x}_1 \mid h)\, P_\theta(\hat{x}_2 \mid \hat{x}_1, h) \cdots P_\theta(\hat{x}_T \mid \hat{x}_{1:T-1}, h) \big] = -\log P_\theta(\hat{x} \mid h)$

Minimizing $C$ is maximizing the likelihood of generating $\hat{x}$ given $h$.

Training data: $(h, \hat{x})$, where $h$ is the input sentence and history/context and $\hat{x}$ is the correct response (a word sequence); $\hat{x}_t$: the t-th word, $\hat{x}_{1:t}$: the first t words of $\hat{x}$.

(Figure: generator outputs $\hat{x}_{t-1}, \hat{x}_t, \hat{x}_{t+1}, \ldots$ with the per-step costs $C_{t-1}, C_t, C_{t+1}, \ldots$)
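As a concrete illustration of this objective, here is a minimal sketch (not the lecture's own code) of computing $C = -\sum_t \log P_\theta(\hat{x}_t \mid \hat{x}_{1:t-1}, h)$ with teacher forcing in PyTorch; the `model` interface and tensor shapes are hypothetical stand-ins for the encoder/generator above.

```python
import torch.nn.functional as F

def mle_loss(model, h, x_hat):
    """Teacher-forcing cross-entropy: C = -sum_t log P_theta(x_hat_t | x_hat_{1:t-1}, h).

    model : hypothetical seq2seq chatbot; given the history and the right-shifted
            reference response (starting with <BOS>), it returns per-step logits
            of shape (batch, T, vocab_size).
    h     : input sentence / history token ids, shape (batch, src_len)
    x_hat : reference response token ids including <BOS>, shape (batch, T+1)
    """
    logits = model(h, x_hat[:, :-1])                     # predict x_hat_t from x_hat_{1:t-1} and h
    log_probs = F.log_softmax(logits, dim=-1)            # log P_theta over the vocabulary
    target = x_hat[:, 1:]                                # reference tokens x_hat_1 ... x_hat_T
    picked = log_probs.gather(-1, target.unsqueeze(-1)).squeeze(-1)
    return -picked.sum(dim=1).mean()                     # C, averaged over the batch
```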

(9)

Outline

Improving Supervised Seq-to-seq Model

• RL (human feedback)

• GAN (discriminator feedback)

Unsupervised Seq-to-seq Model

• Summarization

• Translation

Text Style Transfer

Jiwei Li, Will Monroe, Alan Ritter, Michel Galley, Jianfeng Gao, Dan Jurafsky, “Deep Reinforcement Learning for Dialogue

Generation“, EMNLP 2016

(10)

Introduction

• Machine obtains feedback from user

• Chat-bot learns to maximize the expected reward

https://image.freepik.com/free-vector/variety- of-human-avatars_23-2147506285.jpg

Example: to "How are you?" the machine responds "Bye bye" and receives reward −10; to "Hello" it responds "Hi" and receives reward +3.

http://www.freepik.com/free-vector/variety- of-human-avatars_766615.htm

(11)

Maximizing Expected Reward

$\theta^* = \arg\max_\theta \bar{R}_\theta$

$\bar{R}_\theta = \sum_h P(h) \sum_x R(h, x)\, P_\theta(x \mid h)$

$P(h)$: probability that the input/history is $h$. $P_\theta(x \mid h)$: randomness in the generator. Maximizing the expected reward.

(Figure: Encoder + Generator with parameters $\theta$ map $h$ to $x$; a human returns the reward $R(h, x)$, which is used to update $\theta$.)

(12)

Maximizing Expected Reward

$\theta^* = \arg\max_\theta \bar{R}_\theta$, where

$\bar{R}_\theta = \sum_h P(h) \sum_x R(h, x)\, P_\theta(x \mid h) = E_{h \sim P(h)}\big[ E_{x \sim P_\theta(x|h)}[ R(h, x) ] \big] = E_{h \sim P(h),\, x \sim P_\theta(x|h)}[ R(h, x) ]$

Sample $(h^1, x^1), (h^2, x^2), \cdots, (h^N, x^N)$:

$\bar{R}_\theta \approx \frac{1}{N} \sum_{i=1}^{N} R(h^i, x^i)$

Where is $\theta$? After sampling, the estimate no longer contains $\theta$ explicitly, so it cannot be differentiated directly; the policy gradient on the next slide resolves this.

(Figure: Encoder + Generator with parameters $\theta$ map $h$ to $x$; a human returns $R(h, x)$, which is used to update $\theta$.)

(13)

Policy Gradient

$\bar{R}_\theta = \sum_h P(h) \sum_x R(h, x)\, P_\theta(x \mid h) \approx \frac{1}{N} \sum_{i=1}^{N} R(h^i, x^i)$

$\nabla \bar{R}_\theta = \sum_h P(h) \sum_x R(h, x)\, \nabla P_\theta(x \mid h)$

$= \sum_h P(h) \sum_x R(h, x)\, P_\theta(x \mid h)\, \dfrac{\nabla P_\theta(x \mid h)}{P_\theta(x \mid h)}$   (using $\frac{d \log f(x)}{dx} = \frac{1}{f(x)} \frac{d f(x)}{dx}$)

$= \sum_h P(h) \sum_x R(h, x)\, P_\theta(x \mid h)\, \nabla \log P_\theta(x \mid h)$

$= E_{h \sim P(h),\, x \sim P_\theta(x|h)}\big[ R(h, x)\, \nabla \log P_\theta(x \mid h) \big]$

$\approx \frac{1}{N} \sum_{i=1}^{N} R(h^i, x^i)\, \nabla \log P_\theta(x^i \mid h^i)$   (sampling)

(14)

Policy Gradient

• Gradient Ascent

$\theta^{new} \leftarrow \theta^{old} + \eta\, \nabla \bar{R}_{\theta^{old}}$

$\nabla \bar{R}_\theta \approx \frac{1}{N} \sum_{i=1}^{N} R(h^i, x^i)\, \nabla \log P_\theta(x^i \mid h^i)$

If $R(h^i, x^i)$ is positive: after updating $\theta$, $P_\theta(x^i \mid h^i)$ will increase.

If $R(h^i, x^i)$ is negative: after updating $\theta$, $P_\theta(x^i \mid h^i)$ will decrease.
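A minimal sketch of this gradient-ascent update in PyTorch, assuming a hypothetical `chatbot` whose `sample(h)` returns a sampled response together with its log-probability $\log P_\theta(x \mid h)$, and a `reward_fn(h, x)` standing in for the human's $R(h, x)$; it is an illustration of the REINFORCE form above, not code from the lecture.

```python
import torch

def policy_gradient_step(chatbot, optimizer, histories, reward_fn):
    """One update: theta <- theta + eta * (1/N) sum_i R(h^i, x^i) * grad log P_theta(x^i | h^i)."""
    losses = []
    for h in histories:                      # N sampled dialogue histories h^1 ... h^N
        x, log_prob = chatbot.sample(h)      # x ~ P_theta(x | h), log_prob = log P_theta(x | h)
        R = reward_fn(h, x)                  # scalar reward, e.g. human feedback R(h, x)
        losses.append(-R * log_prob)         # minimizing -R*log P == gradient ascent on R*log P
    loss = torch.stack(losses).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```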

(15)

Implementation

Maximum Likelihood
• Training data: $\{(h^1, \hat{x}^1), \ldots, (h^N, \hat{x}^N)\}$
• Objective function: $\frac{1}{N} \sum_{i=1}^{N} \log P_\theta(\hat{x}^i \mid h^i)$
• Gradient: $\frac{1}{N} \sum_{i=1}^{N} \nabla \log P_\theta(\hat{x}^i \mid h^i)$

Reinforcement Learning
• Training data: $\{(h^1, x^1), \ldots, (h^N, x^N)\}$, obtained by sampling
• Objective function: $\frac{1}{N} \sum_{i=1}^{N} R(h^i, x^i)\, \log P_\theta(x^i \mid h^i)$
• Gradient: $\frac{1}{N} \sum_{i=1}^{N} R(h^i, x^i)\, \nabla \log P_\theta(x^i \mid h^i)$

Maximum likelihood is the special case with $R(h^i, \hat{x}^i) = 1$; reinforcement learning uses the sampled responses as training data weighted by $R(h^i, x^i)$.

(Figure: Encoder + Generator produce $x^i$; a human returns $R(h^i, x^i)$.)

(16)

Implementation

Policy gradient iteration:
• Given the current parameters $\theta^t$, sample $(h^1, x^1), (h^2, x^2), \ldots, (h^N, x^N)$.
• Obtain the rewards $R(h^1, x^1), R(h^2, x^2), \ldots, R(h^N, x^N)$.
• Update: $\theta^{t+1} \leftarrow \theta^t + \eta\, \nabla \bar{R}_{\theta^t}$, where $\nabla \bar{R}_{\theta^t} \approx \frac{1}{N} \sum_{i=1}^{N} R(h^i, x^i)\, \nabla \log P_{\theta^t}(x^i \mid h^i)$

New Objective: $\frac{1}{N} \sum_{i=1}^{N} R(h^i, x^i)\, \log P_\theta(x^i \mid h^i)$

$\theta^0$ can be well pre-trained from $\{(h^1, \hat{x}^1), \ldots, (h^N, \hat{x}^N)\}$.

(17)

Add a Baseline

$\frac{1}{N} \sum_{i=1}^{N} R(h^i, x^i)\, \nabla \log P_\theta(x^i \mid h^i)$

If $R(h^i, x^i)$ is always positive, every sampled response has its probability pushed up. In the ideal case this is harmless, but due to sampling some responses (say $(h, x^1)$) are never sampled; because $P_\theta(x \mid h)$ is a probability distribution, their probability is then pushed down relative to the sampled ones, even if they are good.

(Figure: bars of $P_\theta(x \mid h)$ over $(h, x^1), (h, x^2), (h, x^3)$, comparing the ideal case with the case where $(h, x^1)$ is not sampled.)

(18)

Add a Baseline

$\frac{1}{N} \sum_{i=1}^{N} \left( R(h^i, x^i) - b \right) \nabla \log P_\theta(x^i \mid h^i)$

Subtracting a baseline $b$ makes the weight negative for responses with below-baseline reward, so their probability is pushed down even when all raw rewards are positive, and responses that happen not to be sampled are no longer unfairly suppressed. There are several ways to obtain the baseline $b$.

(Figure: the same bars of $P_\theta(x \mid h)$ over $(h, x^1), (h, x^2), (h, x^3)$ after adding the baseline.)
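One simple way to obtain $b$ (one of the "several ways" mentioned above, chosen here only for illustration) is a running average of the observed rewards; the sketch below is an assumption, not the method used in the paper.

```python
class RunningBaseline:
    """Keeps b as an exponential moving average of the rewards seen so far."""

    def __init__(self, momentum=0.9):
        self.momentum = momentum
        self.b = 0.0
        self.initialized = False

    def update(self, reward):
        if not self.initialized:
            self.b, self.initialized = reward, True
        else:
            self.b = self.momentum * self.b + (1.0 - self.momentum) * reward
        return self.b

# usage inside the policy-gradient loop:
#   advantage = R - baseline.update(R)       # R(h^i, x^i) - b
#   losses.append(-advantage * log_prob)
```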

(19)

AlphaGo-style training!

• Let two agents talk to each other

How old are you?

See you.

See you.

See you.

How old are you?

I am 16.

I though you were 12.

What make you think so?

Using a pre-defined evaluation function to compute R(h,x)

(20)

Example Reward

• The final reward $R(h, x)$ is the weighted sum of three terms $r_1(h, x)$, $r_2(h, x)$ and $r_3(h, x)$:

$R(h, x) = \lambda_1 r_1(h, x) + \lambda_2 r_2(h, x) + \lambda_3 r_3(h, x)$

• Ease of answering (don't be a conversation killer)
• Information flow (say something new)
• Semantic coherence (don't contradict what was just said)
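A minimal sketch of how such terms combine into the final reward; `r1`, `r2`, `r3` are placeholder functions and the lambda values are arbitrary, not the paper's.

```python
def total_reward(h, x, r1, r2, r3, lambdas=(0.25, 0.25, 0.5)):
    """R(h, x) = lambda1*r1(h, x) + lambda2*r2(h, x) + lambda3*r3(h, x).

    r1: ease of answering, r2: information flow, r3: semantic coherence.
    The weights here are illustrative placeholders.
    """
    l1, l2, l3 = lambdas
    return l1 * r1(h, x) + l2 * r2(h, x) + l3 * r3(h, x)
```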

(21)

Example Results

(22)

Outline

Improving Supervised Seq-to-seq Model

• RL (human feedback)

• GAN (discriminator feedback)

Unsupervised Seq-to-seq Model

• Summarization

• Translation

Text Style Transfer

(23)

Basic Idea – Chat-bot

Conditional GAN: the chatbot (encoder–decoder) generates a response sentence x given the input sentence/history h. The discriminator takes both h and x and outputs real or fake; real (h, x) pairs come from human dialogues.

http://www.nipic.com/show/3/83/3936650kd7476069.html

(24)

Algorithm – Chat-bot

• Initialize generator Gen and discriminator Dis
• In each iteration:
• Sample real history $h$ and sentence $x$ from the database
• Sample real history $h'$ from the database, and generate a sentence $\tilde{x}$ by Gen($h'$)
• Update Dis to increase $Dis(h, x)$ and decrease $Dis(h', \tilde{x})$
• Update Gen such that the discriminator output $Dis(h', Gen(h'))$ increases

(Figure: Chatbot (En–De) → Discriminator → scalar; the scalar is used to update the chatbot. Training data: human dialogues A: OOO / B: XXX / A: ∆ ∆ ∆ providing the $(h, x)$ pairs.)
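A sketch of one iteration of this algorithm. `chatbot`, `discriminator`, `sample_pairs` and `sample_histories` are hypothetical helpers; the discriminator is trained with a standard binary cross-entropy loss, and the generator update uses the policy-gradient form discussed on the following slides rather than backpropagation through the discrete output.

```python
import torch
import torch.nn.functional as F

def gan_iteration(chatbot, discriminator, d_optim, g_optim, sample_pairs, sample_histories):
    # --- d-step: increase Dis(h, x) on real pairs, decrease Dis(h', x_tilde) on generated pairs
    h, x = sample_pairs()                    # real history/response pairs from the database
    h2 = sample_histories()                  # histories h' used for generation
    with torch.no_grad():
        x_tilde, _ = chatbot.sample(h2)      # x_tilde = Gen(h')
    d_real = discriminator(h, x)             # assumed to output probabilities in (0, 1)
    d_fake = discriminator(h2, x_tilde)
    d_loss = (F.binary_cross_entropy(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)))
    d_optim.zero_grad()
    d_loss.backward()
    d_optim.step()

    # --- g-step: update Gen so that Dis(h', Gen(h')) increases (policy-gradient form)
    x_tilde, log_prob = chatbot.sample(h2)   # log_prob = log P_theta(x_tilde | h')
    reward = discriminator(h2, x_tilde).detach()     # discriminator score as reward
    g_loss = -(reward * log_prob).mean()
    g_optim.zero_grad()
    g_loss.backward()
    g_optim.step()
    return d_loss.item(), g_loss.item()
```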

(25)

Can we do backpropagation? No: the chatbot's output is a discrete word sequence obtained by sampling, so tuning the generator a little bit will not change the output, and the discriminator's gradient cannot flow back through it.

Alternative: improved WGAN, ignoring the sampling process (the discriminator is fed the generator's output distributions directly).

(Figure: Chatbot (En–De) → Discriminator → scalar; the scalar is used to update the chatbot.)

(26)

Alternatives

• Gumbel-softmax
• Matt J. Kusner, José Miguel Hernández-Lobato, "GANS for Sequences of Discrete Elements with the Gumbel-softmax Distribution", arXiv 2016
• MaliGAN
• Tong Che, Yanran Li, Ruixiang Zhang, R Devon Hjelm, Wenjie Li, Yangqiu Song, Yoshua Bengio, "Maximum-Likelihood Augmented Discrete Generative Adversarial Networks", arXiv 2017
• SeqGAN
• Lantao Yu, Weinan Zhang, Jun Wang, Yong Yu, "SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient", AAAI 2017
• Jiwei Li, Will Monroe, Tianlin Shi, Sébastien Jean, Alan Ritter, Dan Jurafsky, "Adversarial Learning for Neural Dialogue Generation", arXiv 2017

(27)

Reinforcement Learning?

• Consider the output of the discriminator as the reward
• Update the generator to increase the discriminator output, i.e., to get the maximum reward
• Different from typical RL: the discriminator is also updated during training

$\nabla \bar{R}_\theta \approx \frac{1}{N} \sum_{i=1}^{N} \left( R(h^i, x^i) - b \right) \nabla \log P_\theta(x^i \mid h^i)$, with the reward given by the discriminator score: $R(h^i, x^i) = D(h^i, x^i)$

(Figure: Chatbot (En–De) → Discriminator → scalar; the scalar is the reward used to update the chatbot.)

(28)

g-step:
• Given the current parameters $\theta^t$, sample $(h^1, x^1), (h^2, x^2), \ldots, (h^N, x^N)$.
• Obtain the discriminator scores $D(h^1, x^1), D(h^2, x^2), \ldots, D(h^N, x^N)$.
• Update: $\theta^{t+1} \leftarrow \theta^t + \eta\, \nabla \bar{R}_{\theta^t}$, where $\nabla \bar{R}_{\theta^t} \approx \frac{1}{N} \sum_{i=1}^{N} D(h^i, x^i)\, \nabla \log P_{\theta^t}(x^i \mid h^i)$

New Objective: $\frac{1}{N} \sum_{i=1}^{N} D(h^i, x^i)\, \log P_\theta(x^i \mid h^i)$

d-step:
• Update the discriminator with $h$ and $x$: pairs from the human dialogues are labelled real, pairs generated by the chatbot are labelled fake.

(29)

Example Results

Thanks to classmate 段逸林 for providing the experimental results.

Human Evaluation:
• MLE: 52.6%
• SeqGAN: 56.9%
• ESGAN: 60.9%

(30)

Tips: Reward for Every Generation Step

$\nabla \bar{R}_\theta \approx \frac{1}{N} \sum_{i=1}^{N} \left( D(h^i, x^i) - b \right) \nabla \log P_\theta(x^i \mid h^i)$

$\log P_\theta(x^i \mid h^i) = \log P(x_1^i \mid h^i) + \log P(x_2^i \mid h^i, x_1^i) + \log P(x_3^i \mid h^i, x_{1:2}^i)$

$h^i$ = "What is your name?", $x^i$ = "I don't know": $D(h^i, x^i) - b$ is negative, so $\theta$ is updated to decrease $\log P_\theta(x^i \mid h^i)$, which also decreases $P(\text{"I"} \mid h^i)$.

$h^i$ = "What is your name?", $x^i$ = "I am John": $D(h^i, x^i) - b$ is positive, so $\theta$ is updated to increase $\log P_\theta(x^i \mid h^i)$, which also increases $P(\text{"I"} \mid h^i)$.

The same first word "I" is thus penalized in one case and encouraged in the other: a single reward for the whole sequence does not tell us which generation step deserves the credit or the blame.

(31)

Tips: Reward for Every Generation Step

Whole-sequence reward (from before):

$\nabla \bar{R}_\theta \approx \frac{1}{N} \sum_{i=1}^{N} \left( D(h^i, x^i) - b \right) \nabla \log P_\theta(x^i \mid h^i)$

where, e.g., for $h^i$ = "What is your name?" and $x^i$ = "I don't know": $\log P_\theta(x^i \mid h^i) = \log P(x_1^i \mid h^i) + \log P(x_2^i \mid h^i, x_1^i) + \log P(x_3^i \mid h^i, x_{1:2}^i)$, i.e., $P(\text{"I"} \mid h^i)$, $P(\text{"don't"} \mid h^i, \text{"I"})$, $P(\text{"know"} \mid h^i, \text{"I don't"})$.

Per-step reward instead:

$\nabla \bar{R}_\theta \approx \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T} \left( Q(h^i, x_{1:t}^i) - b \right) \nabla \log P_\theta(x_t^i \mid h^i, x_{1:t-1}^i)$

Method 1. Monte Carlo (MC) Search

Method 2. Discriminator for Partially Decoded Sequences

(32)

Tips: Monte Carlo Search

• How to estimate $Q(h^i, x_{1:t}^i)$? Example: $Q(\text{"What is your name?"}, \text{"I"})$, i.e., $x_1^i$ = "I".

Roll out the rest of the sentence several times starting from "I" and score each completed sentence with the discriminator:

• $x^A$ = "I am John", $D(h^i, x^A) = 1.0$
• $x^B$ = "I am happy", $D(h^i, x^B) = 0.1$
• $x^C$ = "I don't know", $D(h^i, x^C) = 0.1$
• $x^D$ = "I am superman", $D(h^i, x^D) = 0.8$

Average: $Q(h^i, \text{"I"}) = 0.5$. A roll-out generator for sampling is needed.
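A sketch of this roll-out estimate of $Q(h, x_{1:t})$; `rollout_generator.complete` and `discriminator` are hypothetical interfaces standing in for the roll-out generator and the discriminator described above.

```python
def mc_estimate_Q(h, prefix, rollout_generator, discriminator, n_rollouts=4):
    """Estimate Q(h, x_{1:t}) by completing the prefix n_rollouts times and
    averaging the discriminator scores of the completed sentences, e.g.
    Q("What is your name?", "I") = avg of D(h, "I am John"), D(h, "I am happy"), ..."""
    scores = []
    for _ in range(n_rollouts):
        full_sentence = rollout_generator.complete(h, prefix)  # sample the rest of x
        scores.append(discriminator(h, full_sentence))         # D(h, x), a score in [0, 1]
    return sum(scores) / len(scores)
```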

(33)

Tips: Rewarding Partially Decoded Sequences

• Train a discriminator that is able to assign rewards to both fully and partially decoded sequences
• Break generated sequences into partial sequences, e.g.:
• h = "What is your name?", x = "I am john" also gives h = "What is your name?", x = "I" and h = "What is your name?", x = "I am"
• h = "What is your name?", x = "I don't know" also gives h = "What is your name?", x = "I" and h = "What is your name?", x = "I don't"

(Figure: Dis takes h and $x_{1:t}$ and outputs the scalar $Q(h, x_{1:t})$.)
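A sketch of how the partial sequences above could be produced as discriminator training examples; the whitespace tokenization and the (h, prefix, label) format are assumptions made only for illustration.

```python
def partial_examples(h, x, label):
    """Break a (history, response) pair into one example per prefix x_{1:t},
    so the discriminator also learns to score partially decoded sequences.

    h     : history string, e.g. "What is your name?"
    x     : response string, e.g. "I am john"
    label : 1 for human responses, 0 for generated ones
    """
    words = x.split()
    return [(h, " ".join(words[:t]), label) for t in range(1, len(words) + 1)]

# partial_examples("What is your name?", "I am john", 0)
#   -> [(h, "I", 0), (h, "I am", 0), (h, "I am john", 0)]
```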

(34)

Tips: Adding Good Examples

• The training of the generative model is unstable
• This reward only promotes or discourages the generator's own generated sequences: usually it knows that the generated results are bad, but does not know what results are good.

Training data for SeqGAN: $\{(h^1, x^1), \ldots, (h^N, x^N)\}$, obtained by sampling and weighted by $D(h^i, x^i)$

Adding more data: also add the real pairs $\{(h^1, \hat{x}^1), \ldots, (h^N, \hat{x}^N)\}$, treated as if $D(h^i, \hat{x}^i) = 1$
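A sketch of the weighted training set this corresponds to: sampled responses weighted by the discriminator score, plus the real responses added with weight 1. The data structures here are placeholders, not the paper's implementation.

```python
def build_weighted_batch(sampled_pairs, real_pairs, discriminator):
    """Return (h, x, weight) triples: generated responses are weighted by D(h, x),
    real responses are added with weight 1 (as if D(h, x_hat) = 1)."""
    batch = [(h, x, discriminator(h, x)) for h, x in sampled_pairs]
    batch += [(h, x_hat, 1.0) for h, x_hat in real_pairs]
    return batch
```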

(35)

Tips: RankGAN

Kevin Lin, Dianqi Li, Xiaodong He, Zhengyou Zhang, Ming-Ting Sun, "Adversarial Ranking for Language Generation", NIPS 2017

Image caption generation:

(36)

More Applications

• Supervised machine translation
• Lijun Wu, Yingce Xia, Li Zhao, Fei Tian, Tao Qin, Jianhuang Lai, Tie-Yan Liu, "Adversarial Neural Machine Translation", arXiv 2017
• Zhen Yang, Wei Chen, Feng Wang, Bo Xu, "Improving Neural Machine Translation with Conditional Sequence Generative Adversarial Nets", arXiv 2017
• Supervised abstractive summarization
• Linqing Liu, Yao Lu, Min Yang, Qiang Qu, Jia Zhu, Hongyan Li, "Generative Adversarial Network for Abstractive Text Summarization", AAAI 2018
• Image/video caption generation
• Rakshith Shetty, Marcus Rohrbach, Lisa Anne Hendricks, Mario Fritz, Bernt Schiele, "Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training", ICCV 2017
• Xiaodan Liang, Zhiting Hu, Hao Zhang, Chuang Gan, Eric P. Xing, "Recurrent Topic-Transition GAN for Visual Paragraph Generation", arXiv 2017

(37)

Outline

Improving Supervised Seq-to-seq Model

• RL (human feedback)

• GAN (discriminator feedback)

Unsupervised Seq-to-seq Model

• Summarization

• Translation

Text Style Transfer

(38)

Summarization

Audio file to be summarized → "This is the summary."

• Extractive summaries: select the most informative segments to form a compact version [Lee, et al., Interspeech 12] [Lee, et al., ICASSP 13] [Shiang, et al., Interspeech 13]
• The machine does not write summaries in its own words

(Figure: a transcript "…… deep learning is powerful ……" with the most informative segments selected as the summary.)

(39)

Abstractive Summarization

• Now machine can do abstractive summary (write summaries in its own words)

(Figure: training data consists of documents paired with human-written titles (Title 1, Title 2, Title 3, …); the machine then generates a title without hand-crafted rules, in its own words.)

(40)

Abstractive Summarization

• Input: transcriptions of audio, output: summary

(Figure: an RNN encoder reads through the input $w_1 w_2 w_3 w_4$, the transcription of the audio from automatic speech recognition (ASR); an RNN generator then produces the summary words $w_A, w_B, \ldots$ from the encoded representations $z_1, z_2, \ldots$)

We need lots of labelled training data (supervised).

(41)

Unsupervised Abstractive Summarization

• Document:澳大利亞今天與13個國家簽署了反興奮劑雙 邊協議,旨在加強體育競賽之外的藥品檢查並共享研究成 果 ……

• Summary:

• Human:澳大利亞與13國簽署反興奮劑協議

• Unsupervised:澳大利亞加強體育競賽之外的藥品檢查

• Document:中華民國奧林匹克委員會今天接到一九九二年 冬季奧運會邀請函,由於主席張豐緒目前正在中南美洲進 行友好訪問,因此尚未決定是否派隊赴賽 ……

• Summary:

• Human:一九九二年冬季奧運會函邀我參加

• Unsupervised:奧委會接獲冬季奧運會邀請函

(42)

Unsupervised Abstractive Summarization

• Document:據此間媒體27日報道,印度尼西亞蘇門答臘島 的兩個省近日來連降暴雨,洪水泛濫導致塌方,到26日為止 至少已有60人喪生,100多人失蹤 ……

• Summary:

• Human:印尼水災造成60人死亡

• Unsupervised:印尼門洪水泛濫導致塌雨

• Document:安徽省合肥市最近為領導幹部下基層做了新規 定:一律輕車簡從,不準搞迎來送往、不準搞層層陪同 ……

• Summary:

• Human:合肥規定領導幹部下基層活動從簡

• Unsupervised:合肥領導幹部下基層做搞迎來送往規定:

一律簡

(43)

More Applications

• Unsupervised video summarization

Behrooz Mahasseni, Michael Lam and Sinisa Todorovic, “Unsupervised Video Summarization with Adversarial LSTM Networks”, CVPR, 2017

(44)

Outline of Part II

Improving Supervised Seq-to-seq Model

• RL (human feedback)

• GAN (discriminator feedback)

Unsupervised Seq-to-seq Model

• Summarization

• Translation

Text Style Transfer

(45)

Unsupervised Translation

• Alexis Conneau, Guillaume Lample, Marc'Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou, "Word Translation Without Parallel Data", submitted to ICLR 2018
• Guillaume Lample, Ludovic Denoyer, Marc'Aurelio Ranzato, "Unsupervised Machine Translation Using Monolingual Corpora Only", submitted to ICLR 2018

(46)

Approaches

(47)

Experimental Results

(48)

Outline of Part II

Improving Supervised Seq-to-seq Model

• RL (human feedback)

• GAN (discriminator feedback)

Unsupervised Seq-to-seq Model

• Summarization

• Translation

Text Style Transfer

(49)

Example: Personalized Chat-bot

• General chat-bots generate plain responses
• Humans talk in different styles and with different sentiments to different people under different conditions
• We want the chat-bot's responses to be controllable, so that chat-bots can be personalized in the future
• Below we focus only on generating positive responses

Input: How was your day today?
Optimistic chat-bot: "It is wonderful today." (rather than "It is terrible today.")

Assumption: we have a sentiment classifier. Given a sentence x, it evaluates how positive the sentence is, SC(x).

(50)

Approaches

Type 1. System Modification: the chatbot's (En–De) parameters are modified so that, given the input sentence, it directly generates a positive response.

Type 2. Output Transformation: the chatbot does not have to change; a transformation module converts its response sentence into a positive response.

(51)

Approaches

• 1. Persona-Based Model (training)

Training example: input "How is today", reference response "Today is awesome". The sentiment classifier scores how positive its input sentence is, giving "Today is awesome" a score of 0.9, and this score is provided to the model as a condition during training.

(52)

Approaches

• 1. Persona-Based Model (training)

Another training example: input "How is today", reference response "Today is bad". The sentiment classifier gives "Today is bad" a score of 0.1, which is again provided to the model during training.

(53)

Approaches

• 1. Persona-Based Model (testing)

Testing: input "I love you". The sentiment condition is now set by hand (e.g., to 0.0 or 1.0) to control how positive the generated response should be; depending on the value, the model responds "I love you, too." or "I am not ready to start a relationship."

(54)

Approaches

2. Reinforcement Learning

For the input "How is today" the chat-bot generates "Today is bad"; the sentiment classifier scores this response (0.1), and the score is used as the reward, so a positive reward is given for a positive response. The network parameters are updated accordingly.

(55)

Approaches

3. Plug & Play (output transformation)

The chat-bot's response sentence is encoded into a code by a VRAE encoder. The code is then modified into a new code such that the sentiment classifier's score on the decoded sentence is as large as possible, while the new code stays as close as possible to the original one. The VRAE decoder finally decodes the new code into the positive response.
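A minimal sketch of this plug-and-play idea, under the stated assumption that we have a VRAE decoder and a sentiment classifier SC that is differentiable with respect to the code; the update rule (gradient ascent on SC minus a closeness penalty) illustrates the idea, not the exact procedure used in the experiments.

```python
import torch

def plug_and_play(response_code, decoder, sc, steps=50, lr=0.1, closeness=1.0):
    """Modify the VRAE code of a response so that the decoded sentence becomes
    more positive (large SC) while the new code stays close to the original."""
    code = response_code.clone().detach().requires_grad_(True)
    for _ in range(steps):
        # sc(decoder(code)) is assumed to be a differentiable positivity score
        objective = sc(decoder(code)) - closeness * (code - response_code).pow(2).sum()
        grad, = torch.autograd.grad(objective, code)
        with torch.no_grad():
            code += lr * grad            # gradient ascent on the objective
    return decoder(code)                 # decode the new code into a positive response
```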

(56)

Approaches

4. Cycle GAN (output transformation)

The response sentence is transformed into a positive response by a Cycle-GAN-style model that learns a mapping between two domains without paired data (analogous to translating between male and female face domains in images). Here Domain X is negative sentences (e.g., "It is bad.", "It's a bad day.", "I don't love you.") and Domain Y is positive sentences (e.g., "It is good.", "It's a good day.", "I love you.").
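A sketch of the two cycle-consistency terms such a model would use, with hypothetical translators G (negative → positive) and F (positive → negative) and a `reconstruction_loss` placeholder; the adversarial discriminator losses on each domain are omitted for brevity.

```python
def cycle_consistency_loss(G, F, negative_batch, positive_batch, reconstruction_loss):
    """Cycle GAN for style transfer: translating X -> Y -> X (and Y -> X -> Y)
    should reconstruct the original sentence, so the content is preserved while
    only the sentiment changes.

    G : maps negative sentences (domain X) to positive ones (domain Y)
    F : maps positive sentences (domain Y) to negative ones (domain X)
    """
    loss = 0.0
    for x in negative_batch:                      # x in domain X (negative)
        loss += reconstruction_loss(F(G(x)), x)   # X -> Y -> X
    for y in positive_batch:                      # y in domain Y (positive)
        loss += reconstruction_loss(G(F(y)), y)   # Y -> X -> Y
    return loss / (len(negative_batch) + len(positive_batch))
```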

(57)

Cycle GAN

• Negative sentence to positive sentence:

it's a crappy day → it's a great day
i wish you could be here → you could be here
it's not a good idea → it's good idea
i miss you → i love you
i don't love you → i love you
i can't do that → i can do that
i feel so sad → i happy
it's a bad day → it's a good day
it's a dummy day → it's a great day
sorry for doing such a horrible thing → thanks for doing a great thing
my doggy is sick → my doggy is my doggy
my little doggy is sick → my little doggy is my little doggy

(58)

Experimental Results

• The chat-bot itself was changed (these two methods were able to make complicated changes to the output sentences).
• The RL model performs better than the persona-based model in two of the three metrics; the exception is the sentiment score.
• For the persona-based model, sentiment is high but coherence and grammar are low.

(59)

Experimental Results

• These three models modify the responses after they are generated.

• Among the three models, Cycle GAN is the best in terms of coherence, sentiment, and grammar.

Examples: goo.gl/X1PZLM.

(60)

Example 1

• Input: yeah, i’m just less interested in the how-to and more in the what and why of the things we build

• Seq2seq: it’s not a good thing .

• Persona: I’m interested in this .

• RL: it’s a good one

• Plug and play: it’s not a wrong thing .

• Cycle GAN: it’s a good thing .

(61)

Example 2

• Input: always learning and self improving is a huge turn on for me so believe me i understand

• Seq2seq: i'm not sure if i can see you in a relationship

• Persona: and you're gonna have a word with your mother .

• RL: i love you so much

• Plug and play: i'm not sure if i can see you a a special

• Cycle GAN: i'm sure if i can see you in a relationship.

(62)

Concluding Remarks

Improving Supervised Seq-to-seq Model

• RL (human feedback)

• GAN (discriminator feedback)

Unsupervised Seq-to-seq Model

• Summarization

• Translation

Text Style Transfer
