Outline
• CNN (Convolutional Neural Networks) Introduction
• Evolution of CNN
• Visualizing the Features
• CNN as Artist
• More Applications
Image Recognition
http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf
Why CNN for Images
• Some patterns are much smaller than the whole image
◦ A neuron does not have to see the whole image to discover a pattern (e.g., a "beak" detector)
◦ Connecting to a small region requires fewer parameters
Why CNN for Images
• The same patterns appear in different regions
◦ An "upper-left beak" detector and a "middle beak" detector do almost the same thing, so they can use the same set of parameters
Why CNN for Images
• Subsampling the pixels will not change the object
◦ A subsampled bird is still a bird
◦ Subsampling makes the image smaller, so the network needs fewer parameters to process it
Image Recognition
Local Connectivity
• Neurons connect to a small region
Parameter Sharing
• The same feature in different positions: neurons share the same weights
Parameter Sharing
• Different features in the same position: neurons have different weights
Convolutional Layers
[Figure: a convolutional layer as a 3-D volume with width, height, and depth; one set of shared weights slides across all spatial positions]
Convolutional Layers
[Figure: a depth-1 input (a1–a3) connects to a depth-2 output (b1, b2 and c1, c2); each output channel has its own set of shared weights]
Convolutional Layers
[Figure: a depth-2 input (a1–a3, b1–b3) connects to a depth-2 output (c1, c2 and d1, d2); each filter spans the full depth of its input]
Convolutional Layers
[Figure: outputs A, B, C produced as the filter slides across input positions A, B, C, D]
Hyper-parameters of CNN
• Stride: how far the filter moves at each step (e.g., stride = 1 vs. stride = 2)
• Padding: zeros added around the border (e.g., padding = 0 vs. padding = 1)
Example
• Input volume: 7x7x3 (the 7x7 grid as drawn includes the one-pixel padding border, so the unpadded input is 5x5x3)
• Filter: 3x3x3, Stride = 2, Padding = 1
• Output volume: 3x3x2 (depth 2 = the number of filters)
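A minimal Python check of the output size, using the standard formula (W - F + 2P)/S + 1 from the CS231n notes linked below; it assumes the unpadded input width is 5, since the 7x7 grid as drawn includes the padding border:

def conv_output_size(w_in, f, p, s):
    # (input width - filter width + 2 * padding) / stride + 1
    return (w_in - f + 2 * p) // s + 1

print(conv_output_size(5, 3, 1, 2))  # 3, matching the 3x3x2 output (depth 2 = two filters)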
Convolutional Layers
http://cs231n.github.io/convolutional-networks/
Relationship with Convolution
Nonlinearity
• Rectified Linear Unit (ReLU): f(n) = max(0, n)
Why ReLU?
• Easy to train
• Avoids the vanishing-gradient problem
◦ A sigmoid saturates, so its gradient ≈ 0; ReLU does not saturate for positive inputs
Why ReLU?
• Biological reason
◦ A real neuron stays nearly silent under weak stimulation and fires under strong stimulation; ReLU mimics this, outputting zero below the threshold and growing with the input above it
Pooling Layer
Input (4x4, depth = 1):
1 3 2 4
5 7 6 8
0 0 3 3
5 5 0 0
Maximum pooling (non-overlapping 2x2 windows, no weights):
7 8
5 3
e.g., Max(1,3,5,7) = 7, Max(0,0,5,5) = 5
Average pooling:
4 5
2.5 1.5
e.g., Avg(1,3,5,7) = 4
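A minimal NumPy sketch of both pooling operations on the 4x4 input above, using non-overlapping 2x2 windows as in the figure:

import numpy as np

x = np.array([[1, 3, 2, 4],
              [5, 7, 6, 8],
              [0, 0, 3, 3],
              [5, 5, 0, 0]])

# group the 4x4 input into four non-overlapping 2x2 blocks
blocks = x.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3).reshape(2, 2, 4)
print(blocks.max(axis=-1))   # [[7 8] [5 3]]      maximum pooling
print(blocks.mean(axis=-1))  # [[4. 5.] [2.5 1.5]] average pooling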
Why “Deep” Learning?
• Visual perception of a computer: each convolutional and pooling layer enlarges the receptive field, so deeper layers respond to larger regions of the input image
Fully-Connected Layer
• Fully-connected layers: global feature extraction
• Softmax layer: classifier
[Pipeline: Input Image → Convolutional Layer → Pooling Layer → Convolutional Layer → Pooling Layer → Fully-Connected Layer → Softmax Layer → Class Label (e.g., "5" or "7")]
Training
• Forward propagation: compute the activation n2 from the previous layer's output n1
Training
• Update weights: differentiate the cost function with respect to each weight and take a gradient-descent step
Training
• Propagate to the previous layer: pass the gradient of the cost function back from n2 to n1
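The equations on these slides did not survive extraction; the following one-neuron Python sketch of the three steps (forward, weight update, propagate) is a reconstruction under an assumed squared-error cost, with eta as the learning rate:

def relu(z):      return max(0.0, z)
def relu_grad(z): return 1.0 if z > 0 else 0.0

n1, w, b, eta, target = 0.5, 0.8, 0.1, 0.01, 1.0

z = w * n1 + b                         # forward propagation
n2 = relu(z)
dC_dn2 = 2 * (n2 - target)             # cost C = (n2 - target)^2
dC_dw = dC_dn2 * relu_grad(z) * n1     # gradient for the weight
dC_dn1 = dC_dn2 * relu_grad(z) * w     # error propagated to the previous layer
w = w - eta * dC_dw                    # gradient-descent update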
Training Convolutional Layers
• Example: a convolutional layer maps inputs a1, a2, a3 to outputs b1, b2 with one shared filter
• Forward propagation: each output is the filter applied to its own window of inputs
• Update weights: because the weights are shared, the cost gradient for each filter weight sums the contributions from every output position that used it
• Propagate to the previous layer: each input accumulates gradient from every output it fed into (see the sketch below)
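A minimal NumPy sketch of these steps for a 1D convolutional layer with inputs a1–a3, shared weights w1, w2, and outputs b1, b2; the squared-error cost and the numeric values are assumptions:

import numpy as np

a = np.array([0.2, 0.5, 0.1])            # inputs a1, a2, a3
w = np.array([0.3, 0.7])                 # shared filter weights w1, w2
b = np.array([w @ a[0:2], w @ a[1:3]])   # forward: b1, b2 (stride 1, no bias)

dC_db = b - np.array([1.0, 0.0])         # gradient of an assumed squared-error cost
# shared weights: each weight gradient sums over every position that used it
dC_dw = np.array([dC_db[0] * a[0] + dC_db[1] * a[1],
                  dC_db[0] * a[1] + dC_db[1] * a[2]])
# propagate to the previous layer: each input collects from the outputs it fed
dC_da = np.array([dC_db[0] * w[0],
                  dC_db[0] * w[1] + dC_db[1] * w[0],
                  dC_db[1] * w[1]])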
Max-Pooling Layers during Training
• Pooling layers have no weights, so there are no weights to update
• Propagate to the previous layer: the gradient of the cost function is routed entirely to the input that produced the maximum; the other inputs receive zero
• What if a1 = a2?
◦ Choose the node with the smaller index
Avg-Pooling Layers during Training
• Pooling layers have no weights, so there are no weights to update
• Propagate to the previous layer: the gradient of the cost function is split evenly among all inputs of the window (see the sketch below)
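A minimal NumPy sketch of the backward pass through both pooling types; np.argmax breaks ties by taking the smaller index, matching the rule above:

import numpy as np

def max_pool_backward(a, dC_db):
    # route the whole gradient to the input that produced the maximum
    dC_da = np.zeros_like(a)
    dC_da[np.argmax(a)] = dC_db
    return dC_da

def avg_pool_backward(a, dC_db):
    # every input contributed equally, so split the gradient evenly
    return np.full_like(a, dC_db / len(a))

print(max_pool_backward(np.array([0.4, 0.4, 0.1]), 1.0))  # [1. 0. 0.]
print(avg_pool_backward(np.array([0.4, 0.4, 0.1]), 1.0))  # [0.333... 0.333... 0.333...]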
ReLU during Training
• The gradient passes through unchanged where the input n is positive and is blocked (zero) where n ≤ 0
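In code, the ReLU backward pass is a one-line gate (a sketch, not the slide's original formula):

def relu_backward(n, dC_dout):
    # pass the gradient through where the input was positive, block it otherwise
    return dC_dout if n > 0 else 0.0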
Training CNN
Outline
• CNN (Convolutional Neural Networks) Introduction
• Evolution of CNN
• Visualizing the Features
• CNN as Artist
• More Applications
LeNet (1998)
Yann LeCun
ImageNet Challenge (2010-2017)
• ImageNet Large Scale Visual Recognition Challenge
◦ 1000 categories
◦ Training: 1,200,000 images
◦ Validation: 50,000 images
◦ Testing: 100,000 images
ImageNet Challenge (2010-2017)
AlexNet (2012)
• The resurgence of Deep Learning
◦ ReLU, dropout, image augmentation, max pooling
Geoffrey Hinton Alex Krizhevsky
VGGNet (2014)
• Configurations D (VGG16) and E (VGG19)
• All filters are 3x3
VGGNet
• More layers with smaller (3x3) filters work better
• More non-linearity, fewer parameters
◦ Two stacked 3x3 filters cover the same 5x5 receptive field as one 5x5 filter
◦ One 5x5 filter: 5x5 = 25 parameters, 1 non-linearity
◦ Two 3x3 filters: 3x3x2 = 18 parameters, 2 non-linearities
VGG19
• conv1_1, conv1_2 (3x3 conv, depth = 64) → maxpool
• conv2_1, conv2_2 (3x3 conv, depth = 128) → maxpool
• conv3_1–conv3_4 (3x3 conv, depth = 256) → maxpool
• conv4_1–conv4_4 (3x3 conv, depth = 512) → maxpool
• conv5_1–conv5_4 (3x3 conv, depth = 512) → maxpool
• FC1, FC2 (size = 4096)
• softmax (size = 1000)
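A compact Keras sketch of the stack above; the 224x224x3 input is the standard ImageNet setting, and this reconstruction is illustrative rather than the released weights:

import tensorflow as tf
from tensorflow.keras import layers

x = inputs = tf.keras.Input(shape=(224, 224, 3))
for depth, n_convs in [(64, 2), (128, 2), (256, 4), (512, 4), (512, 4)]:
    for _ in range(n_convs):                      # conv blocks 1-5
        x = layers.Conv2D(depth, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(2)(x)                 # maxpool after each block
x = layers.Flatten()(x)
x = layers.Dense(4096, activation="relu")(x)      # FC1
x = layers.Dense(4096, activation="relu")(x)      # FC2
outputs = layers.Dense(1000, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)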
GoogLeNet (2014)
• Paper:
http://www.cs.unc.edu/~wliu/papers/GoogLeNet.pdf
• A 22-layer deep network built from Inception modules
Inception Module
• What is the best filter size: 3x3? 5x5?
• Use them all, and combine the results
Inception Module
[Previous layer → four parallel branches: 1x1 convolution, 3x3 convolution, 5x5 convolution, and 3x3 max-pooling → filter concatenation]
Inception Module with Dimension Reduction
• Use 1x1 filters to reduce the depth dimension
◦ Example: an input of depth 256 passes through a 1x1 convolution with 128 filters (weights 1x1x256x128), giving an output of depth 128
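A hedged Keras sketch of the module as drawn, with 1x1 convolutions reducing the depth before the 3x3 and 5x5 branches; the branch widths are illustrative, not GoogLeNet's published numbers:

import tensorflow as tf
from tensorflow.keras import layers

def inception_module(x):
    b1 = layers.Conv2D(64, 1, padding="same", activation="relu")(x)
    b2 = layers.Conv2D(96, 1, padding="same", activation="relu")(x)    # reduce depth
    b2 = layers.Conv2D(128, 3, padding="same", activation="relu")(b2)
    b3 = layers.Conv2D(16, 1, padding="same", activation="relu")(x)    # reduce depth
    b3 = layers.Conv2D(32, 5, padding="same", activation="relu")(b3)
    b4 = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    b4 = layers.Conv2D(32, 1, padding="same", activation="relu")(b4)
    return layers.Concatenate()([b1, b2, b3, b4])   # filter concatenation

x = tf.keras.Input(shape=(28, 28, 256))
y = inception_module(x)   # output depth = 64 + 128 + 32 + 32 = 256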
ResNet (2015)
• Residual Networks with 152 layers
ResNet
• Residual learning: a building block learns a residual function F(x) and adds the input back through a shortcut, so the block outputs F(x) + x
Residual Learning with Dimension Reduction
• Use 1x1 filters to reduce the depth before the 3x3 convolution and restore it afterwards (see the sketch below)
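A minimal Keras sketch of a bottleneck residual block (1x1 reduce, 3x3, 1x1 restore, add the shortcut); the real ResNet blocks also use batch normalization, omitted here for brevity:

from tensorflow.keras import layers

def residual_block(x, depth):
    f = layers.Conv2D(depth // 4, 1, activation="relu")(x)                 # 1x1 reduce
    f = layers.Conv2D(depth // 4, 3, padding="same", activation="relu")(f)
    f = layers.Conv2D(depth, 1)(f)                                         # restore: F(x)
    return layers.Activation("relu")(layers.Add()([x, f]))                 # F(x) + x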
Open Images Extended - Crowdsourced
Pretrained Model Download
• http://www.vlfeat.org/matconvnet/pretrained/
◦ Alexnet:
◦ http://www.vlfeat.org/matconvnet/models/imagenet-matconvnet-alex.mat
◦ VGG19:
◦ http://www.vlfeat.org/matconvnet/models/imagenet-vgg-verydeep-19.mat
◦ GoogLeNet:
◦ http://www.vlfeat.org/matconvnet/models/imagenet-googlenet-dag.mat
◦ ResNet
◦ http://www.vlfeat.org/matconvnet/models/imagenet-resnet-152-dag.mat
Using Pretrained Model
• Lower layers: edges, blobs, textures (more general)
• Higher layers: object parts (more specific)
Transfer Learning
• The pretrained model is trained on ImageNet
• If your data is similar to the ImageNet data:
◦ Fix all CNN layers
◦ Train only the FC layer on your labeled data
Transfer Learning
• The pretrained model is trained on ImageNet
• If your data is far different from the ImageNet data:
◦ Fix only the lower CNN layers
◦ Train the higher CNN layers and the FC layers on your labeled data
Transfer Learning Example
• daisy: 634 photos
• dandelion: 899 photos
• roses: 642 photos
• tulips: 800 photos
• sunflowers: 700 photos
• http://download.tensorflow.org/example_images/flower_photos.tgz
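A minimal Keras sketch of the first recipe (data similar to ImageNet) applied to this 5-class flower set: freeze all conv layers and train only a new FC head. The head sizes are assumptions; for data far from ImageNet, you would also unfreeze the higher conv blocks:

import tensorflow as tf
from tensorflow.keras import layers

base = tf.keras.applications.VGG19(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False                              # fix all CNN layers

x = layers.Flatten()(base.output)
x = layers.Dense(256, activation="relu")(x)         # new FC layer
outputs = layers.Dense(5, activation="softmax")(x)  # 5 flower classes
model = tf.keras.Model(base.input, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])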
Outline
• CNN (Convolutional Neural Networks) Introduction
• Evolution of CNN
• Visualizing the Features
• CNN as Artist
• More Applications
Visualizing CNN
• Feed an image (e.g., a flower) into the CNN and observe the filter responses
• To see what a filter has learned, start from random noise and modify the input until it maximizes that filter's response
Gradient Ascent
• Magnify the filter response
◦ Score: the filter response produced by the current input (initially random noise)
◦ Gradient: differentiate the score with respect to the input pixels
Gradient Ascent
• Update: move the input along the gradient, scaled by a learning rate, so the score climbs from lower to higher (see the sketch below)
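A hedged TensorFlow sketch of this loop, maximizing the mean response of one filter in a pretrained VGG19; the layer name, filter index, step count, and learning rate are all arbitrary choices:

import tensorflow as tf

vgg = tf.keras.applications.VGG19(weights="imagenet", include_top=False)
feat = tf.keras.Model(vgg.input, vgg.get_layer("block3_conv1").output)

x = tf.Variable(tf.random.uniform((1, 128, 128, 3)))   # start from random noise
eta = 1.0                                              # learning rate
for _ in range(100):
    with tf.GradientTape() as tape:
        score = tf.reduce_mean(feat(x)[..., 7])        # response of filter 7
    x.assign_add(eta * tape.gradient(score, x))        # gradient *ascent* update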
Gradient Ascent
Different Layers of Visualization
[Figure: patterns obtained by maximizing filter responses at different layers of the CNN]
Multiscale Image Generation
• Visualize at a small scale, resize the image up, visualize again, and repeat: visualize → resize → visualize → resize → visualize
Deep Dream
• Given a photo, the machine adds what it sees into the image…
Outline
• CNN (Convolutional Neural Networks) Introduction
• Evolution of CNN
• Visualizing the Features
• CNN as Artist
• More Applications
The Mechanism of Painting
• An artist's brain turns a scene and a style into an artwork
• Here the computer plays the artist, with neural networks as the brain
Content Generation
• The artist's brain receives neural stimulation from the content and draws on the canvas
• Goal: minimize the difference between the content and what appears on the canvas
Content Generation
• Pass the content image and the canvas through VGG19 and take the filter responses (a (width*height) x depth tensor per layer)
• Minimize the difference between the two sets of filter responses by updating the colors of the canvas pixels
Content Generation
Style Generation
• Pass the artwork through VGG19 and turn its filter responses into a Gram matrix G
◦ Filter responses: (width*height) x depth; Gram matrix: depth x depth, the correlations between pairs of filters
◦ Summing over all positions makes the Gram matrix position-independent: it captures style rather than content
Style Generation
• Compute Gram matrices from the VGG19 filter responses of both the style image and the canvas
• Minimize the difference between the two Gram matrices by updating the colors of the canvas pixels
Style Generation
Artwork Generation
• Combine both objectives: match the content image's VGG19 filter responses and the style image's Gram matrices at the same time (see the sketch below)
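A minimal sketch of the two objectives; F denotes a layer's filter responses flattened to a (width*height) x depth matrix as in the slides, and the weighting alpha is an assumption:

import tensorflow as tf

def gram_matrix(F):
    # (positions, depth) -> (depth, depth); summing over positions
    # discards all positional information
    return tf.matmul(F, F, transpose_a=True)

def content_loss(F_canvas, F_content):
    return tf.reduce_sum(tf.square(F_canvas - F_content))

def style_loss(F_canvas, F_style):
    return tf.reduce_sum(tf.square(gram_matrix(F_canvas) - gram_matrix(F_style)))

def total_loss(F_canvas, F_content, F_style, alpha=0.05):
    # minimized with respect to the canvas pixels; the ratios on the next
    # slide (0.15 / 0.05 / 0.02) vary this content-to-style weighting
    return alpha * content_loss(F_canvas, F_content) + style_loss(F_canvas, F_style)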
Content v.s. Style
[Results with different content-to-style weight ratios: 0.15, 0.05, 0.02]
Neural Doodle
• Image analogy
Neural Doodle
• Image analogy
• Scary link, click with caution!
• https://raw.githubusercontent.com/awentzonline/image-analogies/master/examples/images/trump-image-analogy.jpg
Real-time Texture Synthesis
Outline
• CNN (Convolutional Neural Networks) Introduction
• Evolution of CNN
• Visualizing the Features
• CNN as Artist
• More Applications
More Application: Playing Go
• A CNN takes the record of previous plays (the current board) as input and predicts the next move
• Targets are one-hot over board positions, e.g. target "tengen" (the center point) = 1, else = 0; or target "5-5" = 1, else = 0
• Training data comes from game records: Black: 5-5, White: tengen, Black: 5-5, …
Why CNN for Go?
• Some patterns are much smaller than the whole image
• The same patterns appear in different regions.
◦ AlphaGo uses 5x5 filters for its first layer
Why CNN for Go?
• "Subsampling the pixels will not change the object" does not hold for Go
◦ Removing board points changes the position, so the property that justifies max pooling for images fails here; accordingly, AlphaGo does not use max pooling
More Applications: Sentence Encoding
Ambiguity in Natural Language
http://3rd.mafengwo.cn/travels/info_weibo.php?id=2861280
http://www.appledaily.com.tw/realtimenews/article/new/20151006/705309/
Element-wise 1D Operations on Word Vectors
• 1D convolution or 1D pooling
◦ Each word ("This", "is", "a") is represented by a word vector
◦ The operation acts element-wise across each dimension of two adjacent word vectors (see the sketch below)
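A minimal NumPy sketch, assuming 4-dimensional word vectors and made-up convolution weights; both operations act element-wise across each dimension of two adjacent word vectors:

import numpy as np

this, is_, a = (np.random.randn(4) for _ in range(3))   # word vectors

w1, w2 = 0.6, 0.4                       # shared 1D-convolution weights
conv1 = w1 * this + w2 * is_            # "This is"
conv2 = w1 * is_ + w2 * a               # "is a"
pooled = np.maximum(conv1, conv2)       # element-wise 1D max-pooling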
CNN Model
• Stack convolutional layers over the sentence "This is a dog": conv1 combines adjacent words, conv2 combines conv1 outputs, and conv3 sits on top
• Different conv layers use different sets of filters
CNN with Max-Pooling Layers
• Alternating convolution and max-pooling layers (conv1 → pool1 → conv2) over "This is a dog" build a structure similar to a syntax tree
• But no human-labeled syntax tree is needed
• Max pooling selects which of the adjacent combinations survives to the next layer
Sentiment Analysis by CNN
• Put a softmax layer on top of the CNN (conv1 → pool1 → conv2) to classify the sentiment
◦ "This movie is awesome" → positive
◦ "This movie is awful" → negative
Sentiment Analysis by CNN
• Build the "correct syntax tree" by training
◦ If the network classifies "This movie is awesome" as negative, the error is backward-propagated through the softmax, convolution, and pooling layers
Sentiment Analysis by CNN
• Build the "correct syntax tree" by training
◦ Updating the weights changes how the words are combined, until "This movie is awesome" is classified as positive
Multiple Filters
• Each layer can apply several filters (filter11, filter12, filter13 in the first layer; filter21, filter22, filter23 in the second), giving richer features than an RNN
Resizing Sentence
• An image can be easily resized; a sentence cannot
◦ "The tallest building in all of Taiwan is in Taipei" can only be lengthened to "The tallest building in all of Taiwan is in Taipei City" or shortened to "The tallest building in Taiwan is in Taipei" by changing its words
Various Input Size
• Convolutional layers and pooling layers
◦ can handle input of various sizes
◦ e.g., conv1 and pool1 apply equally well to "This is a dog" (four words) and "the dog run" (three words)
Various Input Size
• Fully-connected layer and softmax layer
◦ need fixed-size input
◦ e.g., "The dog run" (three words) and "This is a dog" (four words) would feed inputs of different sizes into the fc and softmax layers
k-max Pooling
• Choose the k largest values
• Preserve the order of the input values
• Variable-size input, fixed-size output
◦ 3-max pooling: 13 4 1 7 8 → 13 7 8
◦ 3-max pooling: 12 5 21 15 7 4 9 → 12 21 15
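A small NumPy sketch of k-max pooling that keeps the k largest values in their original order:

import numpy as np

def k_max_pooling(x, k=3):
    idx = np.sort(np.argsort(x)[-k:])   # positions of the k largest values
    return x[idx]                       # re-sorted, so input order is preserved

print(k_max_pooling(np.array([13, 4, 1, 7, 8])))          # [13  7  8]
print(k_max_pooling(np.array([12, 5, 21, 15, 7, 4, 9])))  # [12 21 15]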
Wide Convolution
• Wide convolution pads the input so that every weight reaches the entire sentence; narrow convolution only applies the filter where it fully fits
Dynamic k-max Pooling
• Wide convolution & dynamic k-max pooling (k chosen dynamically based on the input length)
CNN for Sentence Classification
• Word vectors are pretrained by word2vec
• Static & non-static channels
◦ Static: fix the values during training
◦ Non-Static: update the values during training
Concluding Remarks
[Pipeline: Input Image → Convolutional Layer → Pooling Layer → Convolutional Layer → Pooling Layer → Fully-Connected Layer → Softmax Layer → Class Label (e.g., "5" or "7")]