Introduction of Generative Adversarial Network (GAN)
李宏毅
Hung-yi Lee
Yann LeCun’s comment
https://www.quora.com/What-are-some-recent-and-potentially-upcoming-breakthroughs-in-unsupervised-learning
Yann LeCun’s comment
https://www.quora.com/What-are-some-recent-and-potentially-upcoming-breakthroughs-in-deep-learning
……
Generative Adversarial Network (GAN)
• How to pronounce “GAN”?
Ask "Miss Google" (the Google text-to-speech voice)
Outline
Basic Idea of GAN
When do we need GAN?
GAN as structured learning algorithm
Conditional Generation by GAN
• Modifying input code
• Paired data
• Unpaired data
• Application: Intelligent Photoshop
Basic Idea of GAN
Generator
It is a neural network (NN), or a function: vector → Generator → image (a matrix).
[Figure: the same generator applied to four input vectors: (0.1, −3, …, 2.4, 0.9), (3, −3, …, 2.4, 0.9), (0.1, 2.1, …, 2.4, 0.9) and (0.1, −3, …, 2.4, 3.5). Changing one dimension changes the output image: longer hair, blue hair, open mouth.]
Each dimension of the input vector represents some characteristic.
Powered by: http://mattya.github.io/chainer-DCGAN/
Basic Idea of GAN
Discriminator
It is a neural network (NN), or a function: image → Discriminator → scalar.
A larger value means real; a smaller value means fake.
[Figure: four example images passed through the discriminator, with outputs 1.0, 1.0, 0.1 and "?".]
Basic Idea of GAN
[Figure: an evolution analogy with butterflies.]
Generator: becomes brown, then grows veins.
Discriminator: "Butterflies are not brown", then "Butterflies do not have veins" ……..
Basic Idea of GAN
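[Figure: NN Generator v1 v.s. Discriminator v1, NN Generator v2 v.s. Discriminator v2, NN Generator v3 v.s. Discriminator v3. Each discriminator compares generated images with real images.]
This is where the term "adversarial" comes from.
You can explain the process in different ways…….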
You can explain the process in different ways…….
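Basic Idea of GAN (a peaceful analogy)
Generator (student), Discriminator (teacher)
[Figure: Generator v1 → Discriminator v1: "No eyes"; Generator v2 → Discriminator v2: "No mouth"; Generator v3.]
Why doesn't the generator learn by itself? Why doesn't the discriminator generate by itself?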
Anime Face Generation
[Figure series: generated samples after 100, 1000, 2000, 5000, 10,000, 20,000 and 50,000 updates.]
Thanks to 陳柏文 for providing the experimental results.
[Figure: outputs of G for the input codes (0.0, 0.0), (0.1, 0.1), (0.2, 0.2), …, (0.9, 0.9).]
Outline
Basic Idea of GAN
When do we need GAN?
GAN as structured learning algorithm
Conditional Generation by GAN
• Modifying input code
• Paired data
• Unpaired data
• Application: Intelligent Photoshop
Structured Learning
Machine learning is to find a function f : X → Y
Regression: output a scalar
Classification: output a "class" (one-hot vector)
Class 1: (1, 0, 0); Class 2: (0, 1, 0); Class 3: (0, 0, 1)
Structured Learning/Prediction: output a sequence, a matrix, a graph, a tree ……
Output is composed of components with dependency
[Figure labels: Regression, Classification; Structured Learning]
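As a small illustration of the classification output described above, a one-hot vector can be built as in the following sketch. The function name `one_hot` and the zero-based class index are assumptions made for this example; they do not come from the slides.

```python
def one_hot(class_index, num_classes):
    """Return a list with 1 at class_index and 0 elsewhere (zero-based index)."""
    if not 0 <= class_index < num_classes:
        raise ValueError("class_index out of range")
    return [1 if i == class_index else 0 for i in range(num_classes)]


# Class 1, Class 2 and Class 3 from the slide map to indices 0, 1 and 2 here.
for idx in range(3):
    print(f"Class {idx + 1}:", one_hot(idx, 3))
```

In structured learning the output is not one of a few such vectors but a whole sequence or matrix, which is why the output space becomes so large.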
Output Sequence
f : X → Y
Machine Translation: X (sentence of language 1): "機器學習及其深層與結構化" → Y (sentence of language 2): "Machine learning and having it deep and structured"
Speech Recognition: X (speech) → Y (transcription): "感謝大家來上課" ("Thank you all for coming to class")
Chat-bot: X (what a user says): "How are you?" → Y (response of machine): "I'm fine."
Output Matrix
f : X → Y
Image to Image: X (image) → Y (image), e.g. Colorization. Ref: https://arxiv.org/pdf/1611.07004v1.pdf
Text to Image: X: "this white and yellow flower have thin white petals and a round yellow stamen" → Y (image). Ref: https://arxiv.org/pdf/1605.05396.pdf
Decision Making and Control
Action:
“right”
Action:
“fire”
Action:
“left”
A sequence of decisions
Why Is Structured Learning Interesting?
• One-shot/Zero-shot Learning:
• In classification, each class has some examples.
• In structured learning, if you consider each possible output as a "class", the output space is so large that most "classes" do not have any training data.
• The machine has to create new things at test time.
• This needs more intelligence.
Why Is Structured Learning Interesting?
• The machine has to learn to plan
• The machine can generate objects component-by-component, but it should have the big picture in mind.
• Because the output components depend on each other, they should be considered globally.
Image Generation
Sentence Generation
"這個婆娘不是人" ("This woman is no human being")
"九天玄女下凡塵" ("she is the Goddess of the Nine Heavens come down to earth")
Structured Learning Approach
Bottom Up (Generator): learn to generate the object at the component level
Top Down (Discriminator): evaluate the whole object, and find the best one
Bottom Up + Top Down: Generative Adversarial Network (GAN)
Outline
Basic Idea of GAN
When do we need GAN?
GAN as structured learning algorithm
Conditional Generation by GAN
• Modifying input code
• Paired data
• Unpaired data
• Application: Intelligent Photoshop
Generation
Image Generation: vectors such as (0.1, −0.1, …, 0.7), (−0.3, 0.1, …, 0.9) and (0.3, −0.1, …, −0.7) → NN Generator → images
Sentence Generation: vectors such as (0.1, −0.1, …, 0.2), (−0.3, 0.1, …, 0.5) and (0.3, −0.1, …, −0.7) → NN Generator → sentences such as "Good morning.", "How are you?", "Good afternoon."
The input vectors are in a specific range.
We will control what to generate later. → Conditional Generation
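Basic Idea of GAN (a peaceful analogy)
Generator (student), Discriminator (teacher)
[Figure: Generator v1 → Discriminator v1: "No eyes"; Generator v2 → Discriminator v2: "No mouth"; Generator v3.]
Why doesn't the generator learn by itself? Why doesn't the discriminator generate by itself?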
Generator
Each training image is paired with a code vector, e.g. (0.1, 0.9), (0.1, −0.5), (0.2, −0.1), (0.3, 0.2). (Where do the codes come from?)
code → NN Generator → image, as close as possible to the paired training image
c.f. NN Classifier: image → (y1, y2, …), as close as possible to the one-hot target (1, 0, …)
Generator
code → NN Generator → image
Image / code pairs, e.g. codes (0.1, 0.9), (0.1, −0.5), (0.2, −0.1), (0.3, 0.2): where do they come from?
The encoder in an auto-encoder provides the code ☺: image → NN Encoder → code c
Auto-encoder
input object (e.g. 28 × 28 = 784) → NN Encoder → code c (low dimension): a compact representation of the input object
code → NN Decoder → output: can reconstruct the original object
Both networks are trainable and are learned together.
Auto-encoder
input → NN Encoder → code → NN Decoder → output, as close as possible to the input
NN Decoder = Generator
Randomly generate a vector as code → NN Decoder → Image?
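The auto-encoder idea above (encoder and decoder learned together so that the output is as close as possible to the input, then the decoder reused as a generator) can be sketched with a deliberately tiny model. Everything here is an assumption for illustration: 2-D inputs instead of images, a 1-D code, linear encoder and decoder, and plain gradient descent written by hand in standard-library Python.

```python
def encode(w, x):
    """Encoder: 2-D input -> 1-D code."""
    return w[0] * x[0] + w[1] * x[1]


def decode(v, c):
    """Decoder (later used as the generator): 1-D code -> 2-D output."""
    return (v[0] * c, v[1] * c)


def reconstruction_loss(w, v, data):
    """Mean squared distance between each input and its reconstruction."""
    total = 0.0
    for x in data:
        x_hat = decode(v, encode(w, x))
        total += (x_hat[0] - x[0]) ** 2 + (x_hat[1] - x[1]) ** 2
    return total / len(data)


def train_autoencoder(data, steps=2000, lr=0.02):
    """Learn encoder weights w and decoder weights v together."""
    w, v = [0.1, 0.1], [0.1, 0.1]
    for _ in range(steps):
        gw, gv = [0.0, 0.0], [0.0, 0.0]
        for x in data:
            c = encode(w, x)
            x_hat = decode(v, c)
            err = (x_hat[0] - x[0], x_hat[1] - x[1])
            back = err[0] * v[0] + err[1] * v[1]  # error propagated back to the code
            for j in range(2):
                gv[j] += 2.0 * err[j] * c / len(data)
                gw[j] += 2.0 * back * x[j] / len(data)
        w = [w[j] - lr * gw[j] for j in range(2)]
        v = [v[j] - lr * gv[j] for j in range(2)]
    return w, v


# Toy "objects": points on a line, so a 1-D code is enough to reconstruct them.
data = [(t, 2.0 * t) for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]
before = reconstruction_loss([0.1, 0.1], [0.1, 0.1], data)
w, v = train_autoencoder(data)
after = reconstruction_loss(w, v, data)
print("loss before:", before, "after:", after)
print("decoder used as generator on code 0.3:", decode(v, 0.3))
```

The last line feeds an arbitrary code to the trained decoder, which is the "NN Decoder = Generator" step from the slide.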
Auto-encoder
[Figure: a decoder with a 2D code. Codes such as (−1.5, 0) and (1.5, 0) are fed to the NN Decoder to produce images; a grid of decoded images over code values from −1.5 to 1.5 is shown next to real examples.]
Auto-encoder
[Figure: code a → NN Generator → image; code b → NN Generator → image; 0.5 × a + 0.5 × b → NN Generator → ?]
Image / code pairs, e.g. codes (0.1, 0.9), (0.1, −0.5), (0.2, −0.1), (0.3, 0.2): where do they come from?
What do we miss?
G
as close as possible
Target Generated Image
It will be fine if the generator can truly copy the target image.
What if the generator makes some mistakes …….
Some mistakes are serious, while some are fine.
What do we miss?
[Figure: a target image and four generated versions. The two with 1 pixel error: "I don't think that's OK." The two with 6 pixel errors: "I think that's actually OK."]
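What do we miss?
[Figure: two generated images, judged "I don't think that's OK" and "I think that's actually OK".]
The relation between the components is critical.
The last layer generates each component independently.
A deep structure is needed to catch the relation between components.
[Figure: Layer L-1 → Layer L. Each neuron in the output layer corresponds to a pixel (ink or empty). One neuron says: "Hey, neighbor, let's produce the same color." The other replies: "Who cares about you?"]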
(Variational) Auto-encoder
Thanks to 黃淞楓 for providing the results.
[Figure: G maps a code (z1, z2) to a point (x1, x2); axes x1 and x2.]
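Basic Idea of GAN (a peaceful analogy)
Generator (student), Discriminator (teacher)
[Figure: Generator v1 → Discriminator v1: "No eyes"; Generator v2 → Discriminator v2: "No mouth"; Generator v3.]
Why doesn't the generator learn by itself? Why doesn't the discriminator generate by itself?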
Discriminator
• The discriminator is a function D (a network, which can be deep): D : X → R
• Input x: an object (e.g. an image)
• Output D(x): a scalar that represents how "good" the object x is
[Figure: two example images scored by D: 1.0 and 0.1.]
Can we use the discriminator to generate objects?
Yes.
Also called: Evaluation Function, Potential Function, …
Discriminator
• It is easier to catch the relation between the components by top-down evaluation.
[Figure: two images, judged "I don't think that's OK" and "I think that's actually OK".]
This CNN filter is good enough.
Discriminator
• Suppose we already have a good discriminator D(x) …
• Generating an object then means solving $\tilde{x} = \arg\max_{x \in X} D(x)$.
How to learn the discriminator?
Enumerate all possible x !!!
Is it feasible ???
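To make the "enumerate all possible x" question concrete, the sketch below generates an object by brute-force arg max over every binary 3 × 3 image. The scoring function `toy_discriminator` and the pattern `TARGET` are hypothetical stand-ins for a trained D; only the arg max idea comes from the slides.

```python
from itertools import product

# Hypothetical "good" 3x3 pattern that the toy discriminator prefers.
TARGET = (0, 1, 0,
          1, 1, 1,
          0, 1, 0)


def toy_discriminator(x):
    """Toy D(x): fraction of pixels that agree with TARGET (higher is better)."""
    return sum(a == b for a, b in zip(x, TARGET)) / len(TARGET)


def generate_by_enumeration(d, num_pixels):
    """Return arg max over x of D(x), found by scoring every binary image."""
    return max(product((0, 1), repeat=num_pixels), key=d)


best = generate_by_enumeration(toy_discriminator, 9)
print("best 3x3 image:", best)
print("candidates for 3x3:", 2 ** 9)
print("digits in the candidate count for binary 28x28:", len(str(2 ** (28 * 28))))
```

Enumeration works for 9 binary pixels, but the candidate count grows exponentially with the number of pixels, which is why the arg max has to be solved some other way for real images.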
Discriminator - Training
• I have some real images
[Figure: every real image → D → scalar 1 (real).]
Discriminator only learns to output "1" (real).
Discriminator training needs some negative examples.
Discriminator - Training
Discriminator D: object x → scalar D(x)
[Figure: D(x) plotted over x, with the real examples marked.]
In practice, you cannot decrease D(x) for all x other than the real examples.
Discriminator - Training
• Negative examples are critical.
[Figure: D outputs scalar 1 (real) for real images and scalar 0 (fake) for negative examples; one image receives 0.9 ("pretty real").]
How to generate realistic negative examples?
Discriminator - Training
• General Algorithm
• Given a set of positive examples, randomly generate a set of negative examples.
• In each iteration
• Learn a discriminator D that can discriminate positive and negative examples.
• Generate negative examples by discriminator D: $\tilde{x} = \arg\max_{x \in X} D(x)$
[Figure: positive examples v.s. negative examples → D]
Discriminator - Training
[Figure: four plots of D(x) over successive iterations, with real and generated examples marked.]
In the end ……
Graphical Model
Bayesian Network (Directed Graph)
Markov Random Field (Undirected Graph)
Markov Logic Network
Boltzmann Machine
Restricted Boltzmann Machine
Structured Learning
➢ Structured Perceptron
➢ Structured SVM
➢ Gibbs sampling
➢ Hidden information
➢ Application: sequence labelling, summarization
Conditional Random Field
Segmental CRF
(Only list some of the approaches)
Energy-based Model: http://www.cs.nyu.edu/~yann/research/ebm/
Generator v.s. Discriminator
• Generator
• Pros:
• Easy to generate even with deep model
• Cons:
• Imitate the appearance
• Hard to learn the correlation between components
• Discriminator
• Pros:
• Considering the big picture
• Cons:
• Generation is not always feasible
• Especially when your model is deep
• How to do negative sampling?
Generator + Discriminator
• General Algorithm
• Given a set of positive examples, randomly generate a set of negative examples.
• In each iteration
• Learn a discriminator D that can discriminate positive and negative examples.
• Generate negative examples by discriminator D: $\tilde{x} = \arg\max_{x \in X} D(x)$
v.s. generating them with a generator: G → $\tilde{x}$
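Generating Negative Examples
G → $\tilde{x}$ = $\arg\max_{x \in X} D(x)$
code → NN Generator → image → Discriminator → scalar (in the figure: 0.13, pushed toward 0.9)
Generator and discriminator together form one large network, with the image as a hidden layer.
Fix the discriminator and update the generator by Gradient Ascent.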
Algorithm
• Initialize generator G and discriminator D
• In each training iteration:
• Sample some real objects (target 1).
• Generate some fake objects: code → G → image (target 0).
• Fix G and update D so that it outputs 1 for real objects and 0 for generated ones.
Algorithm
• Initialize generator G and discriminator D
• In each training iteration:
• Learning D: sample some real objects (target 1) and generate some fake objects, code → G → image (target 0); fix G and update D.
• Learning G: pass codes through G and then D (code → G → image → D); fix D and update G so that D outputs 1.
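The alternating procedure above can be sketched in a few lines of standard-library Python. This is a toy under stated assumptions, not the lecture's implementation: the "objects" are 1-D numbers drawn around 4.0, the generator is linear, G(z) = a·z + b, the discriminator is logistic, D(x) = sigmoid(w·x + c), gradients are written by hand, and the generator step maximizes log D(G(z)). All function names and hyperparameters are made up for the example.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def generate(gen, codes):
    """G(z) = a * z + b for each code z."""
    a, b = gen
    return [a * z + b for z in codes]


def discriminate(disc, x):
    """D(x) = sigmoid(w * x + c); larger means more 'real'."""
    w, c = disc
    return sigmoid(w * x + c)


def update_discriminator(disc, real, fake, lr):
    """Fix G; one gradient-ascent step on log D(real) + log(1 - D(fake))."""
    w, c = disc
    gw = gc = 0.0
    for x in real:                      # target 1
        p = discriminate(disc, x)
        gw += (1.0 - p) * x
        gc += 1.0 - p
    for x in fake:                      # target 0
        p = discriminate(disc, x)
        gw -= p * x
        gc -= p
    n = len(real) + len(fake)
    return (w + lr * gw / n, c + lr * gc / n)


def update_generator(gen, disc, codes, lr):
    """Fix D; one gradient-ascent step on log D(G(z)) so that D outputs 1."""
    a, b = gen
    w, _ = disc
    ga = gb = 0.0
    for z in codes:
        p = discriminate(disc, a * z + b)
        ga += (1.0 - p) * w * z
        gb += (1.0 - p) * w
    n = len(codes)
    return (a + lr * ga / n, b + lr * gb / n)


def train(iterations=200, batch=64, lr=0.05, seed=0):
    rng = random.Random(seed)
    gen, disc = (1.0, 0.0), (0.0, 0.0)  # initialize G and D
    for _ in range(iterations):
        real = [rng.gauss(4.0, 0.5) for _ in range(batch)]
        codes = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        disc = update_discriminator(disc, real, generate(gen, codes), lr)
        codes = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        gen = update_generator(gen, disc, codes, lr)
    return gen, disc


gen, disc = train()
print("generator (a, b):", gen)
print("discriminator (w, c):", disc)
```

A GAN like this is not guaranteed to converge; the point of the sketch is only the order of operations: update D with G fixed, then update G with D fixed.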
Benefit of GAN
• From Discriminator's point of view
• Using generator to generate negative samples (efficient): G → $\tilde{x}$ = $\arg\max_{x \in X} D(x)$
• From Generator's point of view
• Still generate the object component-by-component
• But it is learned from the discriminator with global view.
GAN
Thanks to 段逸林 for providing the results.
https://arxiv.org/abs/1512.09300
[Figure panels: VAE, GAN]
Outline
Basic Idea of GAN
When do we need GAN?
GAN as structured learning algorithm
Conditional Generation by GAN
• Modifying input code
• Paired data
• Unpaired data
• Application: Intelligent Photoshop
Conditional Generation
Generation: vectors in a specific range, e.g. (0.1, −0.1, …, 0.7), (−0.3, 0.1, …, 0.9), (0.3, −0.1, …, −0.7) → NN Generator → images
Conditional Generation: "Girl with red hair and red eyes", "Girl with yellow ribbon" → NN Generator → images
Outline
Basic Idea of GAN
When do we need GAN?
GAN as structured learning algorithm
Conditional Generation by GAN
• Modifying input code
• Paired data
• Unpaired data
• Application: Intelligent Photoshop
Modifying Input Code
[Figure: the same generator applied to four input vectors: (0.1, −3, …, 2.4, 0.9), (3, −3, …, 2.4, 0.9), (0.1, 2.1, …, 2.4, 0.9) and (0.1, −3, …, 2.4, 3.5). Changing one dimension changes the output image: longer hair, blue hair, open mouth.]
Each dimension of the input vector represents some characteristic.
➢ The input code determines the generator output.
➢ Understand the meaning of each dimension to control the output.
Connecting Code and Attribute
CelebA
GAN+Autoencoder
• We have a generator (input z, output x)
• However, given x, how can we find z?
• Learn an encoder (input x, output z)
x → Encoder → z → Generator (Decoder) → x, with the output as close as possible to the input
The generator is fixed; the encoder is initialized from the discriminator (different structures?).
Encoder + Generator (Decoder) = Autoencoder
Attribute Representation
$z_{long} = \frac{1}{N_1} \sum_{x \in long} En(x) - \frac{1}{N_2} \sum_{x' \notin long} En(x')$
Short Hair → Long Hair: x → $En(x) + z_{long} = z'$ → $Gen(z')$
Photo Editing
https://www.youtube.com/watch?v=kPEIJJsQr7U
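The attribute-vector computation (mean code of images with long hair minus mean code of images without, then added to the code of a new image) can be sketched as follows. The 2-D code values are invented stand-ins for the outputs En(x) of a trained encoder, and the function names are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def attribute_vector(codes_with, codes_without):
    """z_long: mean code with the attribute minus mean code without it."""
    m_with = mean_vector(codes_with)
    m_without = mean_vector(codes_without)
    return [a - b for a, b in zip(m_with, m_without)]


def edit_code(z, z_attr):
    """z' = En(x) + z_long; decoding z' with the generator gives the edited image."""
    return [a + b for a, b in zip(z, z_attr)]


# Hypothetical encoder outputs En(x) for four training images.
long_hair_codes = [[1.0, 0.0], [3.0, 2.0]]
short_hair_codes = [[0.0, 1.0], [0.0, 3.0]]

z_long = attribute_vector(long_hair_codes, short_hair_codes)
z_edited = edit_code([0.5, 2.0], z_long)   # code of a short-hair image, shifted
print("z_long:", z_long)
print("z':", z_edited)
```

Passing `z_edited` to the generator is the last step on the slide; it is omitted here because no trained generator is available in this sketch.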
Outline
Basic Idea of GAN
When do we need GAN?
GAN as structured learning algorithm
Conditional Generation by GAN
• Modifying input code
• Paired data
• Unpaired data
• Application: Intelligent Photoshop
Conditional GAN
• Text to image by traditional supervised learning
Text → NN → Image, trained so that the output is as close as possible to the target
c1: a dog is running → $\hat{x}^1$
c2: a bird is flying → $\hat{x}^2$
Text: "train" → Target of NN output → A blurry image!
Conditional GAN
c: train, z (from a prior distribution) → G → Image x = G(c, z)
Target of NN output for the text "train": a supervised NN gives a blurry image!
The output of G is a distribution; it should approximate the distribution of the target images.
Conditional GAN
G: c: train, z (from a prior distribution) → Image x = G(c, z)
D (type 1): x → scalar: x is realistic or not
Positive example: [real image]. Negative example: [generated image].
D (type 2): (c, x) → scalar: x is realistic or not + c and x are matched or not
Positive example: (train, [real train image])
Negative example: (train, [generated image]), (cat, [real train image])
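The positive and negative examples for the type 2 discriminator can be assembled as in the sketch below. It is only an illustration: strings stand in for images, `fake_generator` is a hypothetical placeholder for G(c, z), and the mismatched condition is simply taken from the next pair in the list.

```python
def build_discriminator_examples(real_pairs, generator):
    """Return (condition, image, label) triples for a type 2 discriminator.

    label 1: a real image with its own condition.
    label 0: a generated image, or a real image with a mismatched condition.
    """
    examples = []
    conditions = [c for c, _ in real_pairs]
    for i, (c, x) in enumerate(real_pairs):
        examples.append((c, x, 1))                 # matched and real
        examples.append((c, generator(c), 0))      # matched text, generated image
        wrong = conditions[(i + 1) % len(conditions)]
        if wrong != c:
            examples.append((wrong, x, 0))         # real image, wrong text
    return examples


def fake_generator(c):
    """Hypothetical placeholder for x = G(c, z)."""
    return f"generated_{c}.png"


real_pairs = [("train", "real_train.png"), ("cat", "real_cat.png")]
examples = build_discriminator_examples(real_pairs, fake_generator)
for example in examples:
    print(example)
```

A type 1 discriminator would see only the image and its label, so it could not penalize a realistic image that ignores the condition.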
Text to Image - Results
"red flower with black center"
Text to Image - Results
Image-to-image
https://arxiv.org/pdf/1611.07004
c, z → G → x = G(c, z)
Image-to-image
• Traditional supervised approach
input → NN → Image, as close as possible to the target
Testing: input → "close" result
It is blurry because it is the average of several images.
Image-to-image
• Experimental results
c, z → G → Image → D → scalar
Testing: input → results of "close", "GAN", and "GAN + close"
Image super resolution
• Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, Wenzhe Shi, "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network", CVPR, 2017
Video Generation
Generator → generated last frame: minimize the distance to the target, and make the discriminator think it is real
Discriminator: last frame is real or generated
https://github.com/dyelax/Adversarial_Video_Generation
Speech Enhancement
• Typical deep learning approach
Noisy → G (using CNN) → Output, with Clean as the target
Speech Enhancement
• Conditional GAN
noisy → G → output
training data: (noisy, clean) pairs
D: (noisy, output) or (noisy, clean) → scalar (fake pair or not)
Outline
Basic Idea of GAN
When do we need GAN?
GAN as structured learning algorithm
Conditional Generation by GAN
• Modifying input code
• Paired data
• Unpaired data
• Application: Intelligent Photoshop
Cycle GAN, Disco GAN
Transform an object from one domain to another without paired data
photo van Gogh
Domain X Domain Y
paired data
Cycle GAN
Domain X → $G_{X \to Y}$ → output that should become similar to domain Y
$D_Y$ → scalar: input image belongs to domain Y or not
Not what we want: $G_{X \to Y}$ can ignore its input
https://arxiv.org/abs/1703.10593
https://junyanz.github.io/CycleGAN/
Cycle GAN
Domain X → $G_{X \to Y}$ → Domain Y → $G_{Y \to X}$ → reconstruction, as close as possible to the input
$D_Y$ → scalar: input image belongs to domain Y or not
If $G_{X \to Y}$ ignores its input: lack of information for reconstruction
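Cycle GAN
Domain X → $G_{X \to Y}$ → Domain Y → $G_{Y \to X}$ → Domain X, as close as possible to the input
Domain Y → $G_{Y \to X}$ → Domain X → $G_{X \to Y}$ → Domain Y, as close as possible to the input
$D_Y$ → scalar: belongs to domain Y or not
$D_X$ → scalar: belongs to domain X or not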
c.f. Dual Learning
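The "as close as possible" requirement in both directions is a cycle-consistency term. The sketch below computes it with an L1 distance on toy "images" (short lists of pixel values). The generators are hand-written stand-ins rather than trained networks, and the choice of L1 is an assumption of this example.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equal-length pixel lists."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_consistency_loss(g_xy, g_yx, batch_x, batch_y):
    """Mean L1 distance for the x -> Y -> x and y -> X -> y reconstructions."""
    forward = sum(l1_distance(g_yx(g_xy(x)), x) for x in batch_x) / len(batch_x)
    backward = sum(l1_distance(g_xy(g_yx(y)), y) for y in batch_y) / len(batch_y)
    return forward + backward


def brighten(img):
    """Plays the role of G_{X->Y} in this toy example."""
    return [p + 0.5 for p in img]


def darken(img):
    """Exact inverse of brighten; plays the role of G_{Y->X}."""
    return [p - 0.5 for p in img]


def ignore_input(img):
    """A generator that ignores its input and always outputs the same image."""
    return [1.0 for _ in img]


xs = [[0.0, 0.25], [0.5, 0.75]]
ys = [[0.5, 0.75], [1.0, 1.25]]
consistent = cycle_consistency_loss(brighten, darken, xs, ys)
collapsed = cycle_consistency_loss(ignore_input, darken, xs, ys)
print("consistent pair:", consistent)
print("generator that ignores its input:", collapsed)
```

A generator that ignores its input cannot be inverted, so its cycle term stays above zero. In a full Cycle GAN this term would be added to the two discriminator-based objectives.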
An Anime-Style World
• Using the code: https://github.com/Hi-king/kawaii_creator
• It is not Cycle GAN or Disco GAN
[Figure labels: input, output, domain]
Outline
Basic Idea of GAN
When do we need GAN?
GAN as structured learning algorithm
Conditional Generation by GAN
• Modifying input code
• Paired data
• Unpaired data
• Application: Intelligent Photoshop
https://www.youtube.com/watch?v=9c4z6YsBGQ0
Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman and Alexei A. Efros. "Generative Visual Manipulation on the Natural Image Manifold", ECCV, 2016.
Andrew Brock, Theodore Lim, J.M. Ritchie, Nick Weston, Neural Photo Editing with Introspective Adversarial Networks, arXiv preprint, 2017
Basic Idea
space of code z
Why move on the code space?
Fulfill the constraint
Back to z
• Method 1: $z^* = \arg\min_z L(G(z), x_T)$, solved by Gradient Descent
L is the difference between G(z) and $x_T$:
➢ Pixel-wise
➢ By another network
• Method 2: learn an encoder: x → Encoder → z → Generator (Decoder) → x, with the output as close as possible to the input
• Method 3: using the results from method 2 as the initialization of method 1
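Method 1 can be sketched with a toy generator and a pixel-wise loss. Everything concrete here is an assumption for illustration: the scalar code, the two-pixel "image", the linear `toy_generator`, and the central-difference gradient estimate used in place of backpropagation.

```python
def toy_generator(z):
    """Hypothetical stand-in for G: maps a scalar code to a 2-pixel 'image'."""
    return (2.0 * z + 1.0, -z)


def pixel_loss(x, target):
    """Pixel-wise squared difference L(G(z), x_T)."""
    return sum((a - b) ** 2 for a, b in zip(x, target))


def invert(generator, target, z_init=0.0, lr=0.05, steps=200, eps=1e-5):
    """Gradient descent on z; the gradient is estimated by central differences."""
    z = z_init
    for _ in range(steps):
        up = pixel_loss(generator(z + eps), target)
        down = pixel_loss(generator(z - eps), target)
        z -= lr * (up - down) / (2.0 * eps)
    return z


x_target = toy_generator(0.7)          # an image whose true code is known
z_star = invert(toy_generator, x_target)
print("recovered code:", z_star)
```

The `z_init` argument is where method 3 would plug in: an encoder's output could be passed as the starting point instead of 0.0.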
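Editing Photos
• $z_0$ is the code of the input image
$z^* = \arg\min_z \; U(G(z)) + \lambda_1 \lVert z - z_0 \rVert^2 - \lambda_2 D(G(z))$
• $U(G(z))$: does it fulfill the constraint of editing?
• $\lambda_1 \lVert z - z_0 \rVert^2$: not too far away from the original image
• $\lambda_2 D(G(z))$: using the discriminator to check whether the image is realistic or not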
Concluding Remarks
Basic Idea of GAN
When do we need GAN?
GAN as structured learning algorithm
Conditional Generation by GAN
• Modifying input code
• Paired data
• Unpaired data
• Application: Intelligent Photoshop