Outline
• CNN (Convolutional Neural Networks) Introduction
• Evolution of CNN
• Visualizing the Features
• CNN as Artist
• More Applications
Image Recognition
http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf
Why CNN for Images
• Some patterns are much smaller than the whole image
◦ A neuron does not have to see the whole image to discover a pattern (e.g., a "beak" detector)
◦ Connecting to a small region requires fewer parameters
Why CNN for Images
• The same patterns appear in different regions
◦ An "upper-left beak" detector and a "middle beak" detector do almost the same thing, so they can use the same set of parameters
Why CNN for Images
• Subsampling the pixels will not change the object
◦ A subsampled bird is still a bird
◦ Subsampling makes the image smaller, so the network needs fewer parameters to process it
Image Recognition
Local Connectivity
• Neurons connect to a small region
Parameter Sharing
• The same feature in different positions: neurons share the same weights
Parameter Sharing
• Different features in the same position: neurons have different weights
Convolutional Layers
[Figure: a convolutional layer as a 3-D volume with width, height, and depth; one set of shared weights slides across all spatial positions]
Convolutional Layers
[Figure: a depth-1 input (a1–a3) connects to a depth-2 output (b1, b2 and c1, c2); each output channel has its own set of shared weights]
Convolutional Layers
[Figure: a depth-2 input (a1–a3, b1–b3) connects to a depth-2 output (c1, c2 and d1, d2); each filter spans the full depth of its input]
Convolutional Layers
[Figure: outputs A, B, C produced as the filter slides across input positions A, B, C, D]
Hyper-parameters of CNN
• Stride: how far the filter moves at each step (e.g., stride = 1 vs. stride = 2)
• Padding: zeros added around the border (e.g., padding = 0 vs. padding = 1)
Example
• Input volume: 7x7x3 (the 7x7 grid as drawn includes the one-pixel padding border, so the unpadded input is 5x5x3)
• Filter: 3x3x3, Stride = 2, Padding = 1
• Output volume: 3x3x2 (depth 2 = the number of filters)
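A minimal Python check of the output size, using the standard formula (W - F + 2P)/S + 1 from the CS231n notes linked below; it assumes the unpadded input width is 5, since the 7x7 grid as drawn includes the padding border:

def conv_output_size(w_in, f, p, s):
    # (input width - filter width + 2 * padding) / stride + 1
    return (w_in - f + 2 * p) // s + 1

print(conv_output_size(5, 3, 1, 2))  # 3, matching the 3x3x2 output (depth 2 = two filters)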
Convolutional Layers
http://cs231n.github.io/convolutional-networks/
Relationship with Convolution
Nonlinearity
• Rectified Linear Unit (ReLU): f(n) = max(0, n)
Why ReLU?
• Easy to train
• Avoids the vanishing-gradient problem
◦ A sigmoid saturates, so its gradient ≈ 0; ReLU does not saturate for positive inputs
Why ReLU?
• Biological reason
◦ A real neuron stays nearly silent under weak stimulation and fires under strong stimulation; ReLU mimics this, outputting zero below the threshold and growing with the input above it
Pooling Layer
Input (4x4, depth = 1):
1 3 2 4
5 7 6 8
0 0 3 3
5 5 0 0
Maximum pooling (non-overlapping 2x2 windows, no weights):
7 8
5 3
e.g., Max(1,3,5,7) = 7, Max(0,0,5,5) = 5
Average pooling:
4 5
2.5 1.5
e.g., Avg(1,3,5,7) = 4
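A minimal NumPy sketch of both pooling operations on the 4x4 input above, using non-overlapping 2x2 windows as in the figure:

import numpy as np

x = np.array([[1, 3, 2, 4],
              [5, 7, 6, 8],
              [0, 0, 3, 3],
              [5, 5, 0, 0]])

# group the 4x4 input into four non-overlapping 2x2 blocks
blocks = x.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3).reshape(2, 2, 4)
print(blocks.max(axis=-1))   # [[7 8] [5 3]]      maximum pooling
print(blocks.mean(axis=-1))  # [[4. 5.] [2.5 1.5]] average pooling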
Why “Deep” Learning?
• Visual perception of a computer: each convolutional and pooling layer enlarges the receptive field, so deeper layers respond to larger regions of the input image
Fully-Connected Layer
• Fully-connected layers: global feature extraction
• Softmax layer: classifier
[Pipeline: Input Image → Convolutional Layer → Pooling Layer → Convolutional Layer → Pooling Layer → Fully-Connected Layer → Softmax Layer → Class Label (e.g., "5" or "7")]
Training
• Forward propagation: compute the activation n2 from the previous layer's output n1
Training
• Update weights: differentiate the cost function with respect to each weight and take a gradient-descent step
Training
• Propagate to the previous layer: pass the gradient of the cost function back from n2 to n1
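The equations on these slides did not survive extraction; the following one-neuron Python sketch of the three steps (forward, weight update, propagate) is a reconstruction under an assumed squared-error cost, with eta as the learning rate:

def relu(z):      return max(0.0, z)
def relu_grad(z): return 1.0 if z > 0 else 0.0

n1, w, b, eta, target = 0.5, 0.8, 0.1, 0.01, 1.0

z = w * n1 + b                         # forward propagation
n2 = relu(z)
dC_dn2 = 2 * (n2 - target)             # cost C = (n2 - target)^2
dC_dw = dC_dn2 * relu_grad(z) * n1     # gradient for the weight
dC_dn1 = dC_dn2 * relu_grad(z) * w     # error propagated to the previous layer
w = w - eta * dC_dw                    # gradient-descent update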
Training Convolutional Layers
• Example: a convolutional layer maps inputs a1, a2, a3 to outputs b1, b2 with one shared filter
• Forward propagation: each output is the filter applied to its own window of inputs
• Update weights: because the weights are shared, the cost gradient for each filter weight sums the contributions from every output position that used it
• Propagate to the previous layer: each input accumulates gradient from every output it fed into (see the sketch below)
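A minimal NumPy sketch of these steps for a 1D convolutional layer with inputs a1–a3, shared weights w1, w2, and outputs b1, b2; the squared-error cost and the numeric values are assumptions:

import numpy as np

a = np.array([0.2, 0.5, 0.1])            # inputs a1, a2, a3
w = np.array([0.3, 0.7])                 # shared filter weights w1, w2
b = np.array([w @ a[0:2], w @ a[1:3]])   # forward: b1, b2 (stride 1, no bias)

dC_db = b - np.array([1.0, 0.0])         # gradient of an assumed squared-error cost
# shared weights: each weight gradient sums over every position that used it
dC_dw = np.array([dC_db[0] * a[0] + dC_db[1] * a[1],
                  dC_db[0] * a[1] + dC_db[1] * a[2]])
# propagate to the previous layer: each input collects from the outputs it fed
dC_da = np.array([dC_db[0] * w[0],
                  dC_db[0] * w[1] + dC_db[1] * w[0],
                  dC_db[1] * w[1]])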
Max-Pooling Layers during Training
• Pooling layers have no weights, so there are no weights to update
• Propagate to the previous layer: the gradient of the cost function is routed entirely to the input that produced the maximum; the other inputs receive zero
• What if a1 = a2?
◦ Choose the node with the smaller index
Avg-Pooling Layers during Training
• Pooling layers have no weights, so there are no weights to update
• Propagate to the previous layer: the gradient of the cost function is split evenly among all inputs of the window (see the sketch below)
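A minimal NumPy sketch of the backward pass through both pooling types; np.argmax breaks ties by taking the smaller index, matching the rule above:

import numpy as np

def max_pool_backward(a, dC_db):
    # route the whole gradient to the input that produced the maximum
    dC_da = np.zeros_like(a)
    dC_da[np.argmax(a)] = dC_db
    return dC_da

def avg_pool_backward(a, dC_db):
    # every input contributed equally, so split the gradient evenly
    return np.full_like(a, dC_db / len(a))

print(max_pool_backward(np.array([0.4, 0.4, 0.1]), 1.0))  # [1. 0. 0.]
print(avg_pool_backward(np.array([0.4, 0.4, 0.1]), 1.0))  # [0.333... 0.333... 0.333...]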
ReLU during Training
• The gradient passes through unchanged where the input n is positive and is blocked (zero) where n ≤ 0
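In code, the ReLU backward pass is a one-line gate (a sketch, not the slide's original formula):

def relu_backward(n, dC_dout):
    # pass the gradient through where the input was positive, block it otherwise
    return dC_dout if n > 0 else 0.0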
Training CNN
Outline
• CNN (Convolutional Neural Networks) Introduction
• Evolution of CNN
• Visualizing the Features
• CNN as Artist
• More Applications
LeNet (1998)
Yann LeCun
ImageNet Challenge (2010-2017)
• ImageNet Large Scale Visual Recognition Challenge
◦ 1000 categories
◦ Training: 1,200,000 images
◦ Validation: 50,000 images
◦ Testing: 100,000 images
ImageNet Challenge (2010-2017)
AlexNet (2012)
• The resurgence of Deep Learning
◦ ReLU, dropout, image augmentation, max pooling
Geoffrey Hinton Alex Krizhevsky
VGGNet (2014)
• Configurations D (VGG16) and E (VGG19)
• All filters are 3x3
VGGNet
• More layers with smaller (3x3) filters work better
• More non-linearity, fewer parameters
◦ Two stacked 3x3 filters cover the same 5x5 receptive field as one 5x5 filter
◦ One 5x5 filter: 5x5 = 25 parameters, 1 non-linearity
◦ Two 3x3 filters: 3x3x2 = 18 parameters, 2 non-linearities
VGG19
• conv1_1, conv1_2 (3x3 conv, depth = 64) → maxpool
• conv2_1, conv2_2 (3x3 conv, depth = 128) → maxpool
• conv3_1–conv3_4 (3x3 conv, depth = 256) → maxpool
• conv4_1–conv4_4 (3x3 conv, depth = 512) → maxpool
• conv5_1–conv5_4 (3x3 conv, depth = 512) → maxpool
• FC1, FC2 (size = 4096)
• softmax (size = 1000)
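A compact Keras sketch of the stack above; the 224x224x3 input is the standard ImageNet setting, and this reconstruction is illustrative rather than the released weights:

import tensorflow as tf
from tensorflow.keras import layers

x = inputs = tf.keras.Input(shape=(224, 224, 3))
for depth, n_convs in [(64, 2), (128, 2), (256, 4), (512, 4), (512, 4)]:
    for _ in range(n_convs):                      # conv blocks 1-5
        x = layers.Conv2D(depth, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(2)(x)                 # maxpool after each block
x = layers.Flatten()(x)
x = layers.Dense(4096, activation="relu")(x)      # FC1
x = layers.Dense(4096, activation="relu")(x)      # FC2
outputs = layers.Dense(1000, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)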
GoogLeNet (2014)
• Paper:
http://www.cs.unc.edu/~wliu/papers/GoogLeNet.pdf
• A 22-layer deep network built from Inception modules
Inception Module
• What is the best filter size: 3x3? 5x5?
• Use them all, and combine the results
Inception Module
[Previous layer → four parallel branches: 1x1 convolution, 3x3 convolution, 5x5 convolution, and 3x3 max-pooling → filter concatenation]
Inception Module with Dimension Reduction
• Use 1x1 filters to reduce the depth dimension
◦ Example: an input of depth 256 passes through a 1x1 convolution with 128 filters (weights 1x1x256x128), giving an output of depth 128
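A hedged Keras sketch of the module as drawn, with 1x1 convolutions reducing the depth before the 3x3 and 5x5 branches; the branch widths are illustrative, not GoogLeNet's published numbers:

import tensorflow as tf
from tensorflow.keras import layers

def inception_module(x):
    b1 = layers.Conv2D(64, 1, padding="same", activation="relu")(x)
    b2 = layers.Conv2D(96, 1, padding="same", activation="relu")(x)    # reduce depth
    b2 = layers.Conv2D(128, 3, padding="same", activation="relu")(b2)
    b3 = layers.Conv2D(16, 1, padding="same", activation="relu")(x)    # reduce depth
    b3 = layers.Conv2D(32, 5, padding="same", activation="relu")(b3)
    b4 = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    b4 = layers.Conv2D(32, 1, padding="same", activation="relu")(b4)
    return layers.Concatenate()([b1, b2, b3, b4])   # filter concatenation

x = tf.keras.Input(shape=(28, 28, 256))
y = inception_module(x)   # output depth = 64 + 128 + 32 + 32 = 256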
ResNet (2015)
• Residual Networks with 152 layers
ResNet
• Residual learning: a building block learns a residual function F(x) and adds the input back through a shortcut, so the block outputs F(x) + x
Residual Learning with Dimension Reduction
• Use 1x1 filters to reduce the depth before the 3x3 convolution and restore it afterwards (see the sketch below)
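A minimal Keras sketch of a bottleneck residual block (1x1 reduce, 3x3, 1x1 restore, add the shortcut); the real ResNet blocks also use batch normalization, omitted here for brevity:

from tensorflow.keras import layers

def residual_block(x, depth):
    f = layers.Conv2D(depth // 4, 1, activation="relu")(x)                 # 1x1 reduce
    f = layers.Conv2D(depth // 4, 3, padding="same", activation="relu")(f)
    f = layers.Conv2D(depth, 1)(f)                                         # restore: F(x)
    return layers.Activation("relu")(layers.Add()([x, f]))                 # F(x) + x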
Open Images Extended - Crowdsourced
Pretrained Model Download
• http://www.vlfeat.org/matconvnet/pretrained/
◦ Alexnet:
◦ http://www.vlfeat.org/matconvnet/models/imagenet-matconvnet-alex.mat
◦ VGG19:
◦ http://www.vlfeat.org/matconvnet/models/imagenet-vgg-verydeep-19.mat
◦ GoogLeNet:
◦ http://www.vlfeat.org/matconvnet/models/imagenet-googlenet-dag.mat
◦ ResNet
◦ http://www.vlfeat.org/matconvnet/models/imagenet-resnet-152-dag.mat
Using Pretrained Model
• Lower layers: edges, blobs, textures (more general)
• Higher layers: object parts (more specific)
Transfer Learning
• The pretrained model is trained on ImageNet
• If your data is similar to the ImageNet data:
◦ Fix all CNN layers
◦ Train only the FC layer on your labeled data
Transfer Learning
• The pretrained model is trained on ImageNet
• If your data is far different from the ImageNet data:
◦ Fix only the lower CNN layers
◦ Train the higher CNN layers and the FC layers on your labeled data
Transfer Learning Example
• daisy: 634 photos
• dandelion: 899 photos
• roses: 642 photos
• tulips: 800 photos
• sunflowers: 700 photos
• http://download.tensorflow.org/example_images/flower_photos.tgz
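A minimal Keras sketch of the first recipe (data similar to ImageNet) applied to this 5-class flower set: freeze all conv layers and train only a new FC head. The head sizes are assumptions; for data far from ImageNet, you would also unfreeze the higher conv blocks:

import tensorflow as tf
from tensorflow.keras import layers

base = tf.keras.applications.VGG19(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False                              # fix all CNN layers

x = layers.Flatten()(base.output)
x = layers.Dense(256, activation="relu")(x)         # new FC layer
outputs = layers.Dense(5, activation="softmax")(x)  # 5 flower classes
model = tf.keras.Model(base.input, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])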
Outline
• CNN (Convolutional Neural Networks) Introduction
• Evolution of CNN
• Visualizing the Features
• CNN as Artist
• More Applications
Visualizing CNN
• Feed an image (e.g., a flower) into the CNN and observe the filter responses
• To see what a filter has learned, start from random noise and modify the input until it maximizes that filter's response
Gradient Ascent
• Magnify the filter response
◦ Score: the filter response produced by the current input (initially random noise)
◦ Gradient: differentiate the score with respect to the input pixels
Gradient Ascent
• Update: move the input along the gradient, scaled by a learning rate, so the score climbs from lower to higher (see the sketch below)
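A hedged TensorFlow sketch of this loop, maximizing the mean response of one filter in a pretrained VGG19; the layer name, filter index, step count, and learning rate are all arbitrary choices:

import tensorflow as tf

vgg = tf.keras.applications.VGG19(weights="imagenet", include_top=False)
feat = tf.keras.Model(vgg.input, vgg.get_layer("block3_conv1").output)

x = tf.Variable(tf.random.uniform((1, 128, 128, 3)))   # start from random noise
eta = 1.0                                              # learning rate
for _ in range(100):
    with tf.GradientTape() as tape:
        score = tf.reduce_mean(feat(x)[..., 7])        # response of filter 7
    x.assign_add(eta * tape.gradient(score, x))        # gradient *ascent* update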
Gradient Ascent
Different Layers of Visualization
[Figure: patterns obtained by maximizing filter responses at different layers of the CNN]
Multiscale Image Generation
• Visualize at a small scale, resize the image up, visualize again, and repeat: visualize → resize → visualize → resize → visualize
Deep Dream
• Given a photo, the machine adds what it sees into the image…
Outline
• CNN (Convolutional Neural Networks) Introduction
• Evolution of CNN
• Visualizing the Features
• CNN as Artist
• More Applications
The Mechanism of Painting
• An artist's brain turns a scene and a style into an artwork
• Here the computer plays the artist, with neural networks as the brain
Content Generation
• The artist's brain receives neural stimulation from the content and draws on the canvas
• Goal: minimize the difference between the content and what appears on the canvas
Content Generation
• Pass the content image and the canvas through VGG19 and take the filter responses (a (width*height) x depth tensor per layer)
• Minimize the difference between the two sets of filter responses by updating the colors of the canvas pixels
Content Generation
Style Generation
• Pass the artwork through VGG19 and turn its filter responses into a Gram matrix G
◦ Filter responses: (width*height) x depth; Gram matrix: depth x depth, the correlations between pairs of filters
◦ Summing over all positions makes the Gram matrix position-independent: it captures style rather than content
Style Generation
• Compute Gram matrices from the VGG19 filter responses of both the style image and the canvas
• Minimize the difference between the two Gram matrices by updating the colors of the canvas pixels
Style Generation
Artwork Generation
• Combine both objectives: match the content image's VGG19 filter responses and the style image's Gram matrices at the same time (see the sketch below)
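A minimal sketch of the two objectives; F denotes a layer's filter responses flattened to a (width*height) x depth matrix as in the slides, and the weighting alpha is an assumption:

import tensorflow as tf

def gram_matrix(F):
    # (positions, depth) -> (depth, depth); summing over positions
    # discards all positional information
    return tf.matmul(F, F, transpose_a=True)

def content_loss(F_canvas, F_content):
    return tf.reduce_sum(tf.square(F_canvas - F_content))

def style_loss(F_canvas, F_style):
    return tf.reduce_sum(tf.square(gram_matrix(F_canvas) - gram_matrix(F_style)))

def total_loss(F_canvas, F_content, F_style, alpha=0.05):
    # minimized with respect to the canvas pixels; the ratios on the next
    # slide (0.15 / 0.05 / 0.02) vary this content-to-style weighting
    return alpha * content_loss(F_canvas, F_content) + style_loss(F_canvas, F_style)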
Content v.s. Style
[Results with different content-to-style weight ratios: 0.15, 0.05, 0.02]
Neural Doodle
• Image analogy
Neural Doodle
• Image analogy
• Scary link, click with caution!
• https://raw.githubusercontent.com/awentzonline/image-analogies/master/examples/images/trump-image-analogy.jpg
Real-time Texture Synthesis
Outline
• CNN (Convolutional Neural Networks) Introduction
• Evolution of CNN
• Visualizing the Features
• CNN as Artist
• More Applications
More Application: Playing Go
• A CNN takes the record of previous plays (the current board) as input and predicts the next move
• Targets are one-hot over board positions, e.g. target "tengen" (the center point) = 1, else = 0; or target "5-5" = 1, else = 0
• Training data comes from game records: Black: 5-5, White: tengen, Black: 5-5, …
Why CNN for Go?
• Some patterns are much smaller than the whole image
• The same patterns appear in different regions.
◦ AlphaGo uses 5x5 filters for its first layer
Why CNN for Go?
• "Subsampling the pixels will not change the object" does not hold for Go
◦ Removing board points changes the position, so the property that justifies max pooling for images fails here; accordingly, AlphaGo does not use max pooling
More Applications: Sentence Encoding
Ambiguity in Natural Language
http://3rd.mafengwo.cn/travels/info_weibo.php?id=2861280
http://www.appledaily.com.tw/realtimenews/article/new/20151006/705309/
Element-wise 1D Operations on Word Vectors
• 1D convolution or 1D pooling
◦ Each word ("This", "is", "a") is represented by a word vector
◦ The operation acts element-wise across each dimension of two adjacent word vectors (see the sketch below)
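A minimal NumPy sketch, assuming 4-dimensional word vectors and made-up convolution weights; both operations act element-wise across each dimension of two adjacent word vectors:

import numpy as np

this, is_, a = (np.random.randn(4) for _ in range(3))   # word vectors

w1, w2 = 0.6, 0.4                       # shared 1D-convolution weights
conv1 = w1 * this + w2 * is_            # "This is"
conv2 = w1 * is_ + w2 * a               # "is a"
pooled = np.maximum(conv1, conv2)       # element-wise 1D max-pooling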
CNN Model
• Stack convolutional layers over the sentence "This is a dog": conv1 combines adjacent words, conv2 combines conv1 outputs, and conv3 sits on top
• Different conv layers use different sets of filters
CNN with Max-Pooling Layers
• Alternating convolution and max-pooling layers (conv1 → pool1 → conv2) over "This is a dog" build a structure similar to a syntax tree
• But no human-labeled syntax tree is needed
• Max pooling selects which of the adjacent combinations survives to the next layer
Sentiment Analysis by CNN
• Put a softmax layer on top of the CNN (conv1 → pool1 → conv2) to classify the sentiment
◦ "This movie is awesome" → positive
◦ "This movie is awful" → negative
Sentiment Analysis by CNN
• Build the "correct syntax tree" by training
◦ If the network classifies "This movie is awesome" as negative, the error is backward-propagated through the softmax, convolution, and pooling layers
Sentiment Analysis by CNN
• Build the "correct syntax tree" by training
◦ Updating the weights changes how the words are combined, until "This movie is awesome" is classified as positive
Multiple Filters
• Each layer can apply several filters (filter11, filter12, filter13 in the first layer; filter21, filter22, filter23 in the second), giving richer features than an RNN
Resizing Sentence
• An image can be easily resized; a sentence cannot
◦ "The tallest building in all of Taiwan is in Taipei" can only be lengthened to "The tallest building in all of Taiwan is in Taipei City" or shortened to "The tallest building in Taiwan is in Taipei" by changing its words
Various Input Size
• Convolutional layers and pooling layers
◦ can handle input of various sizes
◦ e.g., conv1 and pool1 apply equally well to "This is a dog" (four words) and "the dog run" (three words)
Various Input Size
• Fully-connected layer and softmax layer
◦ need fixed-size input
◦ e.g., "The dog run" (three words) and "This is a dog" (four words) would feed inputs of different sizes into the fc and softmax layers
k-max Pooling
• Choose the k largest values
• Preserve the order of the input values
• Variable-size input, fixed-size output
◦ 3-max pooling: 13 4 1 7 8 → 13 7 8
◦ 3-max pooling: 12 5 21 15 7 4 9 → 12 21 15
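A small NumPy sketch of k-max pooling that keeps the k largest values in their original order:

import numpy as np

def k_max_pooling(x, k=3):
    idx = np.sort(np.argsort(x)[-k:])   # positions of the k largest values
    return x[idx]                       # re-sorted, so input order is preserved

print(k_max_pooling(np.array([13, 4, 1, 7, 8])))          # [13  7  8]
print(k_max_pooling(np.array([12, 5, 21, 15, 7, 4, 9])))  # [12 21 15]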
Wide Convolution
• Wide convolution pads the input so that every weight reaches the entire sentence; narrow convolution only applies the filter where it fully fits
Dynamic k-max Pooling
• Wide convolution & dynamic k-max pooling (k chosen dynamically based on the input length)
CNN for Sentence Classification
• Word vectors are pretrained by word2vec
• Static & non-static channels
◦ Static: fix the values during training
◦ Non-Static: update the values during training
Concluding Remarks
[Pipeline: Input Image → Convolutional Layer → Pooling Layer → Convolutional Layer → Pooling Layer → Fully-Connected Layer → Softmax Layer → Class Label (e.g., "5" or "7")]