(1)

Introduction of Generative Adversarial Network (GAN)

李宏毅

Hung-yi Lee

(2)

Yann LeCun’s comment

https://www.quora.com/What-are-some-recent-and-potentially-upcoming-breakthroughs-in-unsupervised-learning

(3)

Yann LeCun’s comment

https://www.quora.com/What-are-some-recent-and-potentially-upcoming-breakthroughs-in-deep-learning

……

(4)

Generative Adversarial Network (GAN)

• How to pronounce “GAN”?

Ask “Miss Google” (Google 小姐, the Google Translate text-to-speech voice)

(5)

Outline

Basic Idea of GAN

When do we need GAN?

GAN as structured learning algorithm

Conditional Generation by GAN

• Modifying input code

• Paired data

• Unpaired data

• Application: Intelligent Photoshop

(6)

Basic Idea of GAN

Generator

It is a neural network (NN), or a function.

vector → Generator → image (a matrix)

[0.1, −3, …, 2.4, 0.9] → Generator → image

[3, −3, …, 2.4, 0.9] → Generator → image with longer hair

[0.1, 2.1, …, 2.4, 0.9] → Generator → image with blue hair

[0.1, −3, …, 2.4, 3.5] → Generator → image with open mouth

Each dimension of the input vector represents some characteristic.

Powered by: http://mattya.github.io/chainer-DCGAN/

(7)

Basic Idea of GAN

Discriminator: it is a neural network (NN), or a function.

image → Discriminator → scalar

Larger value means real, smaller value means fake.

Realistic images → Discriminator → 1.0, 1.0

Less realistic images → Discriminator → 0.1, ?
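As a minimal sketch (not the network behind the anime results that follow), both components can be written as plain functions: the generator maps a code vector to an image matrix, and the discriminator maps an image to a scalar. The 100-dimensional code, the 28 × 28 image size, the single hidden layer, and the random untrained weights are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
CODE_DIM, H, W = 100, 28, 28   # assumed sizes, for illustration only

# Hypothetical, untrained weights (one hidden layer each)
G_W1 = rng.normal(0, 0.1, (CODE_DIM, 256))
G_W2 = rng.normal(0, 0.1, (256, H * W))
D_W1 = rng.normal(0, 0.1, (H * W, 256))
D_w2 = rng.normal(0, 0.1, 256)

def generator(z):
    """Vector in, image (a matrix) out."""
    hidden = np.tanh(z @ G_W1)
    pixels = np.tanh(hidden @ G_W2)        # pixel values in [-1, 1]
    return pixels.reshape(H, W)

def discriminator(x):
    """Image in, scalar out: a larger value means 'more real'."""
    hidden = np.tanh(x.reshape(-1) @ D_W1)
    return 1.0 / (1.0 + np.exp(-(hidden @ D_w2)))

z = rng.normal(size=CODE_DIM)
image = generator(z)
score = discriminator(image)
print(image.shape, float(score))
```

Changing a single entry of `z` changes the generated image; in a trained generator, such changes can line up with attributes like hair length or hair color.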

(8)

Basic Idea of GAN

Brown veins

Butterflies are not brown

Butterflies do not have veins ……

Generator

Discriminator

(9)

Basic Idea of GAN

NN Generator v1 → Discriminator v1 ← Real images

NN Generator v2 → Discriminator v2

NN Generator v3 → Discriminator v3

This is where the term “adversarial” comes from.

You can explain the process in different ways ……

(10)

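Basic Idea of GAN (a peaceful analogy)

Generator (student), Discriminator (teacher)

Generator v1 → Discriminator v1: “No eyes”

Generator v2 → Discriminator v2: “No mouth”

Generator v3

Why doesn't the generator (student) learn by itself? Why doesn't the discriminator (teacher) do it itself?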

(11)

Anime Face Generation

100 updates

(12)

Anime Face Generation

1000 updates

(13)

Anime Face Generation

2000 updates

(14)

Anime Face Generation

5000 updates

(15)

Anime Face Generation

10,000 updates

(16)

Anime Face Generation

20,000 updates

(17)

Anime Face Generation

50,000 updates

(18)

Thanks to 陳柏文 for providing the experimental results.

Input codes fed to G: [0.0, 0.0], [0.1, 0.1], [0.2, 0.2], [0.3, 0.3], [0.4, 0.4], [0.5, 0.5], [0.6, 0.6], [0.7, 0.7], [0.8, 0.8], [0.9, 0.9]

(19)

Outline

Basic Idea of GAN

When do we need GAN?

GAN as structured learning algorithm

Conditional Generation by GAN

• Paired data

• Unpaired data

(20)

Structured Learning

Machine learning is to find a function f:

$f: X \rightarrow Y$

Regression: output a scalar

Classification: output a “class” (one-hot vector): Class 1 = [1 0 0], Class 2 = [0 1 0], Class 3 = [0 0 1]

Structured Learning/Prediction: output a sequence, a matrix, a graph, a tree ……

Output is composed of components with dependency

(21)

Regression, Classification

Structured Learning

(22)

Output Sequence

$f: X \rightarrow Y$

Machine Translation

X: “機器學習及其深層與結構化” (sentence of language 1)

Y: “Machine learning and having it deep and structured” (sentence of language 2)

Speech Recognition

X: (speech)

Y: “感謝大家來上課” (“Thank you all for coming to class”) (transcription)

Chat-bot

X: “How are you?” (what a user says)

Y: “I’m fine.” (response of machine)

(23)

Output Matrix, $f: X \rightarrow Y$

Text to Image

X: “this white and yellow flower have thin white petals and a round yellow stamen”

Y: (image)

ref: https://arxiv.org/pdf/1605.05396.pdf

Image to Image (Colorization)

X: (grayscale image)

Y: (color image)

Ref: https://arxiv.org/pdf/1611.07004v1.pdf

(24)

Decision Making and Control

Action:

“right”

Action:

“fire”

Action:

“left”

A sequence of decisions

(25)

Why Is Structured Learning Interesting?

• One-shot/Zero-shot Learning:
  • In classification, each class has some examples.
  • In structured learning,
    • If you consider each possible output as a “class” ……
    • Since the output space is huge, most “classes” do not have any training data.
    • Machine has to create new stuff during testing.
    • Need more intelligence

(26)

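Why Is Structured Learning Interesting?

• Machine has to learn to plan
  • Machine can generate objects component-by-component, but it should have a big picture in its mind.
  • Because the output components have dependency, they should be considered globally.

Image Generation

Sentence Generation

這個婆娘不是人 (“This woman is not human”)

九天玄女下凡塵 (“She is the Lady of the Nine Heavens descended to earth”)

(The first line alone reads as an insult; only the second line turns the sentence into praise.)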

(27)

Structured Learning Approach

Bottom Up: Generator, learn to generate the object at the component level

Top Down: Discriminator, evaluate the whole object and find the best one

Generative Adversarial Network (GAN): Generator + Discriminator

(28)

Outline

Basic Idea of GAN

When do we need GAN?

GAN as structured learning algorithm

Conditional Generation by GAN

• Paired data

• Unpaired data

(29)

Generation

Image Generation: code vectors (e.g., [0.1, −0.1, …, 0.7], [−0.3, 0.1, …, 0.9], [0.1, −0.1, …, 0.2], [−0.3, 0.1, …, 0.5]) → NN Generator → images

Sentence Generation: code vectors (e.g., [0.3, −0.1, …, −0.7]) → NN Generator → “Good morning.”, “How are you?”, “Good afternoon.”

The code vectors are sampled in a specific range.

We will control what to generate later. → Conditional Generation

(30)

Basic Idea of GAN (a peaceful analogy)

Generator (student)

Generator v1 → Discriminator v1: “No eyes”

Generator v2 → Discriminator v2: “No mouth”

Generator v3

(31)

Generator

Training data: images paired with codes, e.g., [0.1, 0.9], [0.1, −0.5], [0.2, −0.1], [0.3, 0.2] (where do the codes come from?)

code vectors → NN Generator → image, as close as possible to the target image

c.f. image → NN Classifier → [𝑦₁, 𝑦₂, …], as close as possible to [1, 0, …]
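A minimal sketch of this supervised view, assuming we already have a code for every training image (exactly what the slide questions). The linear generator, the random 4-pixel "images", and the learning rate are hypothetical; the loss is the mean squared error that makes the output "as close as possible" to the target.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training pairs: the 2-D codes from the slide, each with a tiny 4-pixel "image"
codes = np.array([[0.1, 0.9], [0.1, -0.5], [0.2, -0.1], [0.3, 0.2]])
images = rng.uniform(0.0, 1.0, size=(4, 4))

W = np.zeros((2, 4))   # linear generator: image ≈ code @ W + b
b = np.zeros(4)
lr = 0.5

def mse(W, b):
    """Mean squared difference between generated and target images."""
    return np.mean((codes @ W + b - images) ** 2)

initial_loss = mse(W, b)
for _ in range(5000):
    err = codes @ W + b - images                 # per-pixel error
    W -= lr * 2 * codes.T @ err / err.size       # gradient of the mean squared error
    b -= lr * 2 * err.sum(axis=0) / err.size
final_loss = mse(W, b)
print(initial_loss, final_loss)
```

This is the same recipe as training a classifier, with an image instead of a one-hot vector as the target.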

(32)

Generator

Encoder in auto-encoder provides the code ☺

image → NN Encoder → 𝑐

code vectors → NN Generator → image

Images and their codes, e.g., [0.1, 0.9], [0.1, −0.5], [0.2, −0.1], [0.3, 0.2] (where do the codes come from?)

(33)

Auto-encoder

input (28 × 28 = 784) → NN Encoder → code 𝑐 (low dimension) → NN Decoder → output

Code: compact representation of the input object

NN Decoder: can reconstruct the original object from the code

NN Encoder and NN Decoder are trainable and learned together.

(34)

Auto-encoder

input → NN Encoder → code → NN Decoder → output, as close as possible to the input

Randomly generate a vector as code → NN Decoder → image?

NN Decoder = Generator
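A minimal linear auto-encoder sketch, assuming toy 8-dimensional inputs (a stand-in for the 784 pixels) that lie on a 2-D subspace, so a 2-D code suffices. The encoder and decoder are learned together by minimizing reconstruction error; afterwards the decoder alone is applied to a randomly drawn code, i.e. used as a generator. The linear layers, sizes, and learning rate are simplifying assumptions, not the deep networks on the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, C = 500, 8, 2    # samples, input size (stand-in for 784), code size

# Toy data lying on a 2-D subspace, so a 2-D code is enough to reconstruct it
basis = np.linalg.qr(rng.normal(size=(D, C)))[0]
X = rng.normal(size=(N, C)) @ basis.T

enc = rng.normal(0, 0.1, (D, C))   # NN Encoder (linear here): x -> code
dec = rng.normal(0, 0.1, (C, D))   # NN Decoder (linear here): code -> x
lr = 0.05

def recon_error():
    return np.mean((X @ enc @ dec - X) ** 2)

start = recon_error()
for _ in range(3000):
    code = X @ enc
    err = code @ dec - X                      # output minus input
    grad_dec = 2 * code.T @ err / N           # encoder and decoder are learned together
    grad_enc = 2 * X.T @ (err @ dec.T) / N
    dec -= lr * grad_dec
    enc -= lr * grad_enc
end = recon_error()

# Use the decoder alone as a generator: randomly generate a vector as code
random_code = rng.normal(size=C)
generated = random_code @ dec
print(start, end, generated.shape)
```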

(35)

Auto-encoder

2D code → NN Decoder → image

code [−1.5, 0] → NN Decoder → image

code [1.5, 0] → NN Decoder → image

(Codes range from −1.5 to 1.5; real examples)

(36)

Auto-encoder

[Figure: images decoded from 2D codes spanning −1.5 to 1.5] (real examples)

(37)

Auto-encoder

code a → NN Generator → image

code b → NN Generator → image

0.5 × a + 0.5 × b → NN Generator → ?

(Training images and their codes, e.g., [0.1, 0.9], [0.1, −0.5], [0.2, −0.1], [0.3, 0.2]; where do the codes come from?)
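A small sketch of the interpolation question, assuming any generator function (here a hypothetical fixed random map with a `tanh`): codes between a and b are formed as (1 − t)·a + t·b, which at t = 0.5 is exactly 0.5a + 0.5b. Nothing forces a generator trained only on (code, image) pairs to produce a sensible image at such in-between codes, which is the question the slide raises.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 16))          # hypothetical, fixed generator weights

def generator(z):
    """Stand-in generator: maps a 2-D code to a 4x4 'image'."""
    return np.tanh(z @ W).reshape(4, 4)

a = np.array([0.1, 0.9])
b = np.array([0.3, 0.2])

ts = np.linspace(0.0, 1.0, 5)         # t = 0, 0.25, 0.5, 0.75, 1
codes = [(1 - t) * a + t * b for t in ts]
images = [generator(code) for code in codes]

midpoint = codes[2]                   # t = 0.5, i.e. 0.5a + 0.5b
print(midpoint)
```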

(38)

What do we miss?

G

as close as possible

Target Generated Image

It will be fine if the generator can truly copy the target image.

What if the generator makes some mistakes …….

Some mistakes are serious, while some are fine.

(39)

What do we miss?

Target

Generated: 1 pixel error, 1 pixel error → “I don't think these are OK.”

Generated: 6 pixel errors, 6 pixel errors → “I think these are actually OK.”

(40)

What do we miss?

“I don't think this is OK” vs. “I think this is actually OK”

The relation between the components is critical.

The last layer generates each component independently.

Need deep structure to catch the relation between components.

Layer L-1 …… → Layer L ……

Each neuron in the output layer corresponds to a pixel (ink or empty).

One neuron: “Hey neighbor, let's produce the same color!”

The other: “Who cares about you?”

(41)

(Variational) Auto-encoder

Thanks to 黃淞楓 for providing the results.

(𝑧₁, 𝑧₂) → G → (𝑥₁, 𝑥₂)

(42)

Basic Idea of GAN (a peaceful analogy)

Generator (student)

Generator v1 → Discriminator v1: “No eyes”

Generator v2 → Discriminator v2: “No mouth”

Generator v3

(43)

Discriminator

• Discriminator is a function D (a network, can be deep)

$D: X \rightarrow \mathbb{R}$

• Input x: an object x (e.g., an image)

• Output D(x): a scalar which represents how “good” an object x is

realistic image → D → 1.0

unrealistic image → D → 0.1

Can we use the discriminator to generate objects? Yes.

Also known as: evaluation function, potential function, …

(44)

Discriminator

• It is easier to catch the relation between the components by top-down evaluation.

“I don't think this is OK” vs. “I think this is actually OK”

This CNN filter is good enough to tell them apart.

(45)

Discriminator

• Suppose we already have a good discriminator D(x) …

How to learn the discriminator?

Enumerate all possible x and pick the one with the largest D(x)!!!

Is it feasible???
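To see why enumeration only works for tiny problems, here is a sketch over binary 3 × 3 images (2⁹ = 512 candidates) with a hypothetical hand-written D that rewards agreement with a diagonal stroke. For 28 × 28 binary images there would already be 2⁷⁸⁴ candidates.

```python
import itertools
import numpy as np

# Hypothetical "good discriminator": higher score for images closer to a diagonal stroke
target = np.eye(3, dtype=int)

def D(x):
    return -np.sum(x != target)        # negative number of mismatched pixels

# Enumerate every possible 3x3 binary image and keep the best-scoring one
best_x, best_score = None, -np.inf
for bits in itertools.product([0, 1], repeat=9):
    x = np.array(bits).reshape(3, 3)
    score = D(x)
    if score > best_score:
        best_x, best_score = x, score

print(best_x)
print(2 ** 9, "candidates here; 2**784 for 28x28 binary images")
```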

(46)

Discriminator - Training

• I have some real images

real image → D → scalar 1 (real), for every training image

Discriminator only learns to output “1” (real).

Discriminator training needs some negative examples.

(47)

Discriminator - Training

object x → Discriminator D → scalar D(x)

[Figure: D(x) over the object space, high at the real examples]

In practice, you cannot decrease D(x) for all the x other than the real examples.

(48)

Discriminator - Training

• Negative examples are critical.

real image → D → scalar 1 (real)

fake image → D → scalar 0 (fake)

pretty real image → D → 0.9

How to generate realistic negative examples?

(49)

Discriminator - Training

• General Algorithm
  • Given a set of positive examples, randomly generate a set of negative examples.
  • In each iteration
    • Learn a discriminator D that can discriminate positive and negative examples (positive v.s. negative).
    • Generate negative examples by discriminator D: $\tilde{x} = \arg\max_{x \in X} D(x)$
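A toy sketch of this loop, under loudly simplifying assumptions: the object space X is just the integers 0–19, D is a lookup table of logits (one per object) trained by logistic regression, and "generate negative examples by D" takes the k highest-scoring objects, a top-k version of x̃ = arg max D(x). The positives and all sizes are hypothetical.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def train_discriminator(logits, pos, neg, lr=0.5, steps=50):
    """Logistic-regression updates on a lookup-table D: raise D(x) on positives, lower it on negatives."""
    logits = logits.copy()
    for _ in range(steps):
        d = sigmoid(logits)
        grad = np.zeros_like(logits)
        np.add.at(grad, pos, 1.0 - d[pos])   # gradient of log D(x) on positive examples
        np.add.at(grad, neg, -d[neg])        # gradient of log(1 - D(x)) on negative examples
        logits += lr * grad
    return logits

def run(num_objects=20, pos=(3, 7, 11), iterations=5, seed=0):
    rng = np.random.default_rng(seed)
    pos = np.array(pos)
    k = len(pos)
    neg = rng.choice(num_objects, size=k, replace=False)   # randomly generated negatives
    logits = np.zeros(num_objects)                         # D(x) = sigmoid(logits[x])
    for _ in range(iterations):
        logits = train_discriminator(logits, pos, neg)     # learn D: positive v.s. negative
        neg = np.argsort(-logits)[:k]                      # top-k version of argmax_x D(x)
    return logits, neg

logits, neg = run()
print(neg)
```

Once the top-scoring objects coincide with the positives, the two updates cancel at D(x) = 0.5 on them, which matches the “in the end” picture on the next slide.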

(50)

Discriminator - Training

[Figure: curves of D(x) over successive iterations, with real and generated examples marked]

In the end …… (real vs. generated)

(51)

Graphical Model

Bayesian Network (Directed Graph)

Markov Random Field (Undirected Graph)

Markov Logic Network

Boltzmann Machine

Restricted Boltzmann Machine

Structured Learning

➢ Structured Perceptron

➢ Structured SVM

➢ Gibbs sampling

➢ Hidden information

➢ Application: sequence labelling, summarization

Conditional Random Field

Segmental CRF

(Only list some of the approaches)

Energy-based Model: http://www.cs.nyu.edu/~yann/research/ebm/

(52)

Generator v.s. Discriminator

• Generator
  • Pros:
    • Easy to generate even with deep model
  • Cons:
    • Imitate the appearance
    • Hard to learn the correlation between components
• Discriminator
  • Pros:
    • Considering the big picture
  • Cons:
    • Generation is not always feasible
      • Especially when your model is deep
    • How to do negative sampling?

(53)

Generator + Discriminator

• General Algorithm
  • Given a set of positive examples, randomly generate a set of negative examples.
  • In each iteration
    • Learn a discriminator D that can discriminate positive and negative examples (positive v.s. negative).
    • Generate negative examples by discriminator D, $\tilde{x} = \arg\max_{x \in X} D(x)$, or instead by a generator G that outputs $\tilde{x}$ directly.

(54)

Generating Negative Examples

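Train the generator G so that its output $\tilde{x}$ approximates $\tilde{x} = \arg\max_{x \in X} D(x)$:

code → NN Generator → image → Discriminator → 0.13

Viewing G and D as one large network, the image is a hidden layer.

Update the generator and fix the discriminator, using gradient ascent on the discriminator's output so the score rises (e.g., to 0.9).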
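A minimal sketch of "update the generator, fix the discriminator": with D frozen, the generator's parameters are moved by gradient ascent on D(G(z)). The linear generator, the logistic discriminator with fixed random weights, and the step size are hypothetical; the gradient is derived by hand with the chain rule.

```python
import numpy as np

rng = np.random.default_rng(0)
CODE_DIM, IMG_DIM = 4, 16

A = rng.normal(0, 0.1, (CODE_DIM, IMG_DIM))   # generator parameters (updated)
w = rng.normal(size=IMG_DIM)                  # discriminator weights (fixed)
c = -1.0                                      # discriminator bias (fixed)

def G(z, A):
    return z @ A                              # image = code @ A

def D(x):
    return 1.0 / (1.0 + np.exp(-(x @ w + c)))

z = rng.normal(size=CODE_DIM)
before = D(G(z, A))
lr = 0.1
for _ in range(100):
    d = D(G(z, A))
    # dD/dx = D(1 - D) w and dx/dA[i, j] = z[i], so dD/dA = outer(z, D(1 - D) w)
    grad_A = np.outer(z, d * (1.0 - d) * w)
    A += lr * grad_A                          # gradient ascent: make D(G(z)) larger
after = D(G(z, A))
print(float(before), float(after))
```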

(56)

Algorithm

• Initialize generator G and discriminator D
• In each training iteration:
  • Learning D: sample some real objects (label 1); generate some fake objects by feeding codes to G (label 0); update D, fix G.
  • Learning G: codes → G → images → D, and train G so that D outputs 1; update G, fix D.
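A compact sketch of this algorithm on 1-D data, with every model choice an assumption made for brevity: real objects are samples from N(4, 0.5²), the generator is G(z) = a·z + b with z ~ N(0, 1), and the discriminator is logistic, D(x) = σ(w·x + c). Learning D raises D on real samples and lowers it on generated ones (G fixed); learning G pushes D(G(z)) toward 1 (D fixed). Gradients are written out by hand. A linear discriminator can only tell whether samples sit too far left or right, so this toy shows the alternating updates rather than a faithful distribution match.

```python
import numpy as np

def sigmoid(s):
    return 0.5 * (1.0 + np.tanh(0.5 * s))     # numerically stable logistic

def train_gan(steps=2000, batch=64, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    a, b = 1.0, 0.0          # generator G(z) = a*z + b
    w, c = 0.0, 0.0          # discriminator D(x) = sigmoid(w*x + c)
    history = []             # generator offset b after each iteration
    for _ in range(steps):
        # Learning D: real objects (label 1) vs. fake objects (label 0); G is fixed
        x_real = rng.normal(4.0, 0.5, batch)
        x_fake = a * rng.normal(size=batch) + b
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        # gradient ascent on mean[log D(real)] + mean[log(1 - D(fake))]
        w += lr * (np.mean((1 - d_real) * x_real) - np.mean(d_fake * x_fake))
        c += lr * (np.mean(1 - d_real) - np.mean(d_fake))

        # Learning G: push D(G(z)) toward 1; D is fixed
        z = rng.normal(size=batch)
        d_gen = sigmoid(w * (a * z + b) + c)
        # gradient ascent on mean[log D(G(z))]
        a += lr * np.mean((1 - d_gen) * w * z)
        b += lr * np.mean((1 - d_gen) * w)
        history.append(b)
    return (a, b, w, c), history

params, history = train_gan()
print(params)
```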

(57)

Benefit of GAN

• From Discriminator’s point of view
  • Using generator to generate negative samples: take $\tilde{x}$ from G instead of solving $\tilde{x} = \arg\max_{x \in X} D(x)$ (efficient)
• From Generator’s point of view
  • Still generate the object component-by-component
  • But it is learned from the discriminator with global view.

(58)

GAN

Thanks to 段逸林 for providing the results.

https://arxiv.org/abs/1512.09300

VAE GAN

(59)

Outline

Basic Idea of GAN

When do we need GAN?

GAN as structured learning algorithm

Conditional Generation by GAN

• Paired data

• Unpaired data

(60)

Conditional Generation

Generation: code vectors (e.g., [0.1, −0.1, …, 0.7], [−0.3, 0.1, …, 0.9], [0.3, −0.1, …, −0.7]) → NN Generator → images (codes in a specific range)

Conditional Generation: “Girl with red hair and red eyes”, “Girl with yellow ribbon” → NN Generator → images

(61)

Outline

Basic Idea of GAN

When do we need GAN?

GAN as structured learning algorithm

Conditional Generation by GAN

• Paired data

• Unpaired data

(62)

Modifying Input Code

[0.1, −3, …, 2.4, 0.9] → Generator → image

[3, −3, …, 2.4, 0.9] → Generator → image with longer hair

[0.1, 2.1, …, 2.4, 0.9] → Generator → image with blue hair

[0.1, −3, …, 2.4, 3.5] → Generator → image with open mouth

Each dimension of the input vector represents some characteristic.

➢ The input code determines the generator output.

➢ Understand the meaning of each dimension to control the output.

(63)

Connecting Code and Attribute

CelebA

(64)

GAN+Autoencoder

• We have a generator (input z, output x)

• However, given x, how can we find z?

• Learn an encoder (input x, output z)

x → Encoder → z → Generator (Decoder, fixed) → x: together they form an autoencoder.

The Encoder can be initialized from the Discriminator (different structures?).
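A minimal sketch of learning an encoder while the generator stays fixed, assuming (for illustration only) a linear generator G(z) = A·z whose columns are orthonormal. The encoder E(x) = B·x is trained so that G(E(x)) reconstructs x; in this linear case it recovers z exactly. Real generators are nonlinear, so this only shows the training signal.

```python
import numpy as np

rng = np.random.default_rng(0)
Z_DIM, X_DIM, N = 2, 8, 500

A = np.linalg.qr(rng.normal(size=(X_DIM, Z_DIM)))[0]   # fixed generator: x = A @ z
Z = rng.normal(size=(Z_DIM, N))                          # codes (unknown to the encoder)
X = A @ Z                                                # generated images

B = np.zeros((Z_DIM, X_DIM))                             # encoder: z_hat = B @ x
lr = 0.1
for _ in range(300):
    recon = A @ (B @ X)                                  # G(E(x)); A is never updated
    grad_B = 2 * A.T @ (recon - X) @ X.T / N             # d/dB of mean ||G(E(x)) - x||^2
    B -= lr * grad_B

Z_hat = B @ X
print(np.max(np.abs(Z_hat - Z)))
```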

(65)

x → Encoder → z → Generator (Decoder) → x

(66)

Attribute Representation

$$z_{long} = \frac{1}{N_1}\sum_{x \in long} En(x) - \frac{1}{N_2}\sum_{x' \notin long} En(x')$$

Short Hair → Long Hair: $En(x) + z_{long} = z'$, then $Gen(z')$ is the long-hair version of $x$.
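The attribute arithmetic above can be sketched directly, assuming an encoder `En` and a generator `Gen` already exist (here hypothetical linear stand-ins that invert each other) and a labeled set of long-hair and short-hair images (here synthetic, with "long hair" placed in code dimension 0 by construction).

```python
import numpy as np

rng = np.random.default_rng(0)
Z_DIM, X_DIM = 3, 6
M = np.linalg.qr(rng.normal(size=(X_DIM, Z_DIM)))[0]   # hypothetical generator matrix

def Gen(z):
    return M @ z            # stand-in generator

def En(x):
    return M.T @ x          # stand-in encoder (inverts Gen on its outputs)

# Hypothetical data: long-hair images have a larger value in code dimension 0
long_hair = [Gen(rng.normal(size=Z_DIM) + np.array([2.0, 0.0, 0.0])) for _ in range(50)]
short_hair = [Gen(rng.normal(size=Z_DIM)) for _ in range(50)]

# z_long = mean code of long-hair images minus mean code of the other images
z_long = (np.mean([En(x) for x in long_hair], axis=0)
          - np.mean([En(x) for x in short_hair], axis=0))

x = short_hair[0]
z_prime = En(x) + z_long    # move the code toward "long hair"
x_edited = Gen(z_prime)
print(np.round(z_long, 2))
```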

(67)

https://www.youtube.com/watch?v=kPEIJJsQr7U

Photo Editing

(68)

Outline

Basic Idea of GAN

When do we need GAN?

GAN as structured learning algorithm

Conditional Generation by GAN

• Paired data

• Unpaired data

(69)

Conditional GAN

• Text to image by traditional supervised learning

Text c → NN → Image, trained to be close to the target 𝑥̂:

c¹: a dog is running → 𝑥̂¹

c²: a bird is flying → 𝑥̂²

Text: “train” → NN → Image, where the target of the NN output is several different train images

Output: a blurry image!
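Why the supervised output is blurry can be checked numerically: if the same text “train” is paired with several different target images, the single output that minimizes the summed squared error is their pixel-wise average. The two 1 × 6 "images" below (a train on the left vs. on the right) are hypothetical.

```python
import numpy as np

# Two hypothetical targets for the same input text "train"
train_left = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0])
train_right = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0])
targets = np.stack([train_left, train_right])

def total_squared_error(output):
    return np.sum((targets - output) ** 2)

average = targets.mean(axis=0)   # a faint mix of both trains
for candidate in (train_left, train_right, average):
    print(candidate, total_squared_error(candidate))
```

Each sharp image scores 4.0, while the blurred average scores 2.0, so a network trained this way is pushed toward the blur.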

(70)

Conditional GAN

c: train, 𝑧 sampled from a prior distribution → G → Image x = G(c, z)

Text: “train” → the target of the NN output is a distribution, not a single image. Supervised learning outputs a blurry image; G instead approximates this distribution.

(71)

Conditional GAN

c: train, 𝑧 sampled from a prior distribution → G → Image x = G(c, z)

D (type 1): 𝑥 → D → scalar: x is realistic or not

D (type 2): (𝑐, 𝑥) → D → scalar: x is realistic or not + c and x are matched or not

Type 2 examples:

Positive example: (train, a real train image)

Negative examples: (train, a generated image), (cat, a real train image)
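A sketch of how the type-2 discriminator's training pairs could be assembled, following the positive and negative examples above. `generate` is a hypothetical stand-in for G(c, z), and images are represented by string labels only to keep the example short.

```python
import random

def build_pairs(dataset, generate, rng):
    """dataset: list of (text, real_image). Returns ((c, x), label) pairs for D(c, x)."""
    texts = sorted({c for c, _ in dataset})
    pairs = []
    for c, x_real in dataset:
        pairs.append(((c, x_real), 1))                 # realistic and matched -> positive
        pairs.append(((c, generate(c, rng)), 0))       # generated image -> negative
        other = rng.choice([t for t in texts if t != c])
        pairs.append(((other, x_real), 0))             # realistic but mismatched -> negative
    return pairs

def generate(c, rng):
    # hypothetical generator G(c, z): z is drawn from a prior inside
    return f"generated_{c}_{rng.random():.3f}"

dataset = [("train", "train_photo_1"), ("cat", "cat_photo_1")]
pairs = build_pairs(dataset, generate, random.Random(0))
for p in pairs:
    print(p)
```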

(72)

Text to Image - Results

"red flower with black center"

(73)

Text to Image - Results

(74)

Image-to-image

https://arxiv.org/pdf/1611.07004

𝑐 (input image) and 𝑧 → G → x = G(c, z)

(75)

Image-to-image

• Traditional supervised approach

input image → NN → Image, as close as possible to the target

Testing: input | close (output of the supervised approach)

It is blurry because it is the average of several images.

(76)

Image-to-image

• Experimental results

Testing: input | close | GAN | GAN + close

input and 𝑧 → G → Image → D → scalar
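The “GAN + close” combination can be written as a single generator objective: an adversarial term that rewards fooling D plus a term that keeps the output close to the paired target. This sketch assumes an L1 distance for “close” and a weight `lam`; both choices, and the numbers used, are illustrative assumptions.

```python
import numpy as np

def generator_loss(d_fake, output, target, lam=100.0):
    """d_fake: D's score in (0, 1) for G(c, z); output and target: images as arrays."""
    adversarial = -np.log(d_fake)                  # small when D thinks the output is real
    close = np.mean(np.abs(output - target))       # small when the output is close to the target
    return adversarial + lam * close

target = np.zeros((2, 2))
blurry = np.full((2, 2), 0.1)
print(generator_loss(0.5, blurry, target, lam=0.0))   # GAN term only
print(generator_loss(0.5, blurry, target))            # GAN + close
```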

(77)

Image super resolution

• Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, Wenzhe Shi, “Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network”, CVPR, 2017

(78)

Video Generation

Generator → Discriminator: is the last frame real or generated?

Generator: make the discriminator think it is real, and minimize the distance to the target.

(79)

https://github.com/dyelax/Adversarial_Video_Generation

(80)

Speech Enhancement

• Typical deep learning approach

Noisy → G (using CNN) → Output, trained to be close to Clean

(81)

Speech Enhancement

• Conditional GAN

noisy → G → output

(noisy, output) → D → scalar (fake pair or not)

Training data: (noisy, clean) pairs

(82)

Outline

Basic Idea of GAN

When do we need GAN?

GAN as structured learning algorithm

Conditional Generation by GAN

• Paired data

• Unpaired data

(83)

Cycle GAN, Disco GAN

Transform an object from one domain to another without paired data

Domain X (photo) → Domain Y (van Gogh)

(no paired data)

(84)

Cycle GAN

Domain X → $G_{X \rightarrow Y}$ → Domain Y: the output becomes similar to domain Y

$D_Y$ → scalar: input image belongs to domain Y or not

Not what we want: $G_{X \rightarrow Y}$ may ignore its input and still fool $D_Y$.

https://arxiv.org/abs/1703.10593 https://junyanz.github.io/CycleGAN/

(85)

Cycle GAN

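Domain X → $G_{X \rightarrow Y}$ → Domain Y → $G_{Y \rightarrow X}$ → Domain X (as close as possible to the input)

$D_Y$ → scalar: input image belongs to domain Y or not

If $G_{X \rightarrow Y}$ ignores its input, $G_{Y \rightarrow X}$ lacks the information for reconstruction.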

(86)

Cycle GAN

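Domain X → $G_{X \rightarrow Y}$ → $G_{Y \rightarrow X}$ → Domain X, with $D_Y$ → scalar: belongs to domain Y or not

Domain Y → $G_{Y \rightarrow X}$ → $G_{X \rightarrow Y}$ → Domain Y, with $D_X$ → scalar: belongs to domain X or not

c.f. Dual Learning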
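The cycle constraint can be sketched as a reconstruction loss in both directions. This assumes an L1 distance (the slides only require that the round trip reconstruct the input) and uses toy linear maps as stand-ins for the two generators: an invertible pair gives near-zero loss, while a generator that ignores its input cannot be undone.

```python
import numpy as np

def cycle_loss(G_xy, G_yx, xs, ys):
    """Mean L1 reconstruction error of the X->Y->X and Y->X->Y round trips."""
    loss_x = np.mean([np.abs(G_yx(G_xy(x)) - x).mean() for x in xs])
    loss_y = np.mean([np.abs(G_xy(G_yx(y)) - y).mean() for y in ys])
    return loss_x + loss_y

rng = np.random.default_rng(0)
M = rng.normal(size=(3, 3)) + 3 * np.eye(3)          # hypothetical invertible "style" map
M_inv = np.linalg.inv(M)
xs = [rng.normal(size=3) for _ in range(10)]
ys = [M @ x for x in xs]

good = cycle_loss(lambda x: M @ x, lambda y: M_inv @ y, xs, ys)
lazy = cycle_loss(lambda x: ys[0], lambda y: M_inv @ y, xs, ys)   # G_X->Y ignores its input
print(good, lazy)
```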

(87)

An Anime-Style World

• Using the code: https://github.com/Hi-king/kawaii_creator

• It is not Cycle GAN or Disco GAN

input → output domain

(88)

Outline

Basic Idea of GAN

When do we need GAN?

GAN as structured learning algorithm

Conditional Generation by GAN

• Paired data

• Unpaired data

(89)

https://www.youtube.com/watch?v=9c4z6YsBGQ0

Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman and Alexei A. Efros. "Generative Visual Manipulation on the Natural Image Manifold", ECCV, 2016.

(90)

Andrew Brock, Theodore Lim, J.M. Ritchie, Nick Weston, Neural Photo Editing with Introspective Adversarial Networks, arXiv preprint, 2017

(91)

Basic Idea

space of code z

Why move on the code space?

Fulfill the constraint

(92)

Back to z

• Method 1: $z^* = \arg\min_z L(G(z), x^T)$, found by gradient descent
  Difference between G(z) and $x^T$:
  ➢ Pixel-wise
  ➢ By another network

• Method 2: x → Encoder → z → Generator (Decoder) → x

• Method 3: Using the results from method 2 as the initialization of method 1
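A sketch of Method 1 and Method 3, assuming a differentiable generator (here a hypothetical fixed linear map followed by `tanh`) and a pixel-wise squared difference as L: z is found by gradient descent on L(G(z), xᵀ), starting either from zero or from an encoder's guess. `encoder_guess` is a hypothetical stand-in for the output of Method 2.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(0, 0.5, (8, 2))                 # hypothetical fixed generator weights

def G(z):
    return np.tanh(A @ z)

def loss(z, x_target):
    return np.sum((G(z) - x_target) ** 2)      # pixel-wise difference

def invert(x_target, z_init, lr=0.1, steps=2000):
    """Method 1: z* = argmin_z L(G(z), x_target) by gradient descent."""
    z = z_init.copy()
    for _ in range(steps):
        g = G(z)
        grad = A.T @ (2 * (g - x_target) * (1 - g ** 2))   # chain rule through tanh
        z -= lr * grad
    return z

z_true = np.array([0.8, -0.4])
x_target = G(z_true)                            # an image the generator can produce

z_from_zero = invert(x_target, np.zeros(2))                   # Method 1
encoder_guess = z_true + np.array([0.1, -0.1])                # hypothetical Method 2 output
z_from_encoder = invert(x_target, encoder_guess, steps=200)   # Method 3
print(loss(z_from_zero, x_target), loss(z_from_encoder, x_target))
```

Starting near the answer (Method 3) lets the same descent finish in far fewer steps.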

(93)

Editing Photos

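• $z_0$ is the code of the input image

$$z^* = \arg\min_z \; U(G(z)) + \lambda_1 \lVert z - z_0 \rVert^2 - \lambda_2 D(G(z))$$

• $U(G(z))$: does the image fulfill the constraint of editing?

• $\lambda_1 \lVert z - z_0 \rVert^2$: not too far away from the original image

• $-\lambda_2 D(G(z))$: using the discriminator to check whether the image is realistic or not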
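The editing objective can be minimized with the same gradient-descent machinery. In this sketch everything concrete is an assumption: G is a fixed linear map, the user constraint U asks pixel 0 to become bright (value 1), D is a fixed logistic function, and λ₁, λ₂ are picked by hand.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(0, 0.5, (6, 3))          # hypothetical generator: G(z) = A @ z
w = rng.normal(size=6)                  # hypothetical fixed discriminator weights

def G(z):
    return A @ z

def D(x):
    return 1.0 / (1.0 + np.exp(-(w @ x)))

def U(x):
    return (x[0] - 1.0) ** 2            # editing constraint: make pixel 0 equal to 1

def objective(z, z0, lam1, lam2):
    return U(G(z)) + lam1 * np.sum((z - z0) ** 2) - lam2 * D(G(z))

def edit(z0, lam1=0.1, lam2=0.1, lr=0.05, steps=500):
    z = z0.copy()
    for _ in range(steps):
        x = G(z)
        d = D(x)
        grad_U = 2 * (x[0] - 1.0) * A[0]                  # dU/dz, since x[0] = A[0] @ z
        grad_close = 2 * lam1 * (z - z0)                  # not too far from the original
        grad_real = lam2 * d * (1 - d) * (A.T @ w)        # lam2 * dD(G(z))/dz
        z -= lr * (grad_U + grad_close - grad_real)
    return z

z0 = rng.normal(size=3)                 # code of the input image
z_star = edit(z0)
print(objective(z0, z0, 0.1, 0.1), objective(z_star, z0, 0.1, 0.1))
```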

(94)

Concluding Remarks

Basic Idea of GAN

When do we need GAN?

GAN as structured learning algorithm

Conditional Generation by GAN

• Paired data

• Unpaired data