
Slides credited from Mark Chang & Hung-Yi Lee


Academic year: 2022

Share "Slides credited from Mark Chang & Hung-Yi Lee"

Copied!
117
0
0

加載中.... (立即查看全文)

全文

(1)
(2)

Outline

• CNN (Convolutional Neural Networks) Introduction

• Evolution of CNN

• Visualizing the Features

• CNN as Artist

• More Applications

(3)

Outline

• CNN (Convolutional Neural Networks) Introduction

• Evolution of CNN

• Visualizing the Features

• CNN as Artist

• More Applications

(4)

Image Recognition

http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf

(5)

Why CNN for Image

• Some patterns are much smaller than the whole image

A neuron does not have to see the whole image to discover the pattern.

“beak” detector

Connecting to a small region requires fewer parameters

(6)

Why CNN for Image

• The same patterns appear in different regions.

“upper-left beak” detector

“middle beak” detector

They can use the same set of parameters.

Do almost the same thing

(7)

Why CNN for Image

• Subsampling the pixels will not change the object

[Figure: a bird image, after subsampling, is still a bird]

We can subsample the pixels to make the image smaller.

Fewer parameters for the network to process the image.

(8)

Image Recognition

(9)

Local Connectivity

Neurons connect to a small region

(10)

Parameter Sharing

• The same feature in different positions

Neurons share the same weights

(11)

Parameter Sharing

• Different features in the same position

Neurons have different weights

(12)

Convolutional Layers

[Figure: a convolutional layer as a 3-D volume with width, height, and depth; neurons at different positions use shared weights]

(13)

Convolutional Layers

[Figure: inputs a1–a3 (depth = 1) connect to outputs b1, b2 and c1, c2 produced by two filters (depth = 2)]

(14)

Convolutional Layers

[Figure: convolutional layer with input depth = 2 and output depth = 2; neurons a1–a3, b1–b3, c1, c2, d1, d2]

(15)

Convolutional Layers

[Figure: convolutional layer with input depth = 2 and output depth = 2 (continued)]

(16)

Convolutional Layers

[Figure: neurons/filters labeled A, B, C and A, B, C, D]

(17)

Hyper-parameters of CNN

• Stride

• Padding

[Figure: sliding the filter with Stride = 1 vs. Stride = 2; zero-padding the input with Padding = 0 vs. Padding = 1]

(18)

Example

Input Volume: 7x7x3

Filter: 3x3x3

Stride = 2, Padding = 1

Output Volume: 3x3x2
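A quick way to check these numbers is the standard output-size formula, out = (W − F + 2P)/S + 1. Below is a minimal Python sketch, assuming the 7x7x3 volume shown above is the 5x5x3 input after Padding = 1 has already been applied (an assumption, not stated on the slide):

```python
def conv_output_size(w, f, s, p):
    """Spatial output size of a convolution: (W - F + 2P) / S + 1."""
    assert (w - f + 2 * p) % s == 0, "the filter must tile the padded input evenly"
    return (w - f + 2 * p) // s + 1

# Assumed numbers: unpadded input W = 5, filter F = 3, stride S = 2, padding P = 1.
print(conv_output_size(w=5, f=3, s=2, p=1))  # -> 3, matching the 3x3x2 output volume
```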

(19)

Convolutional Layers

http://cs231n.github.io/convolutional-networks/

(20)

Convolutional Layers

http://cs231n.github.io/convolutional-networks/

(21)

Convolutional Layers

http://cs231n.github.io/convolutional-networks/

(22)

Relationship with Convolution

(23)

Relationship with Convolution

(24)

Relationship with Convolution

(25)

Relationship with Convolution

(26)

Nonlinearity

• Rectified Linear (ReLU)

[Figure: the ReLU activation as a function of the input n]
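A minimal NumPy sketch of the ReLU nonlinearity and its derivative (the derivative is what matters when training, as discussed later):

```python
import numpy as np

def relu(n):
    """ReLU: keep positive inputs, zero out negative ones."""
    return np.maximum(0.0, n)

def relu_grad(n):
    """Derivative of ReLU: 1 where n > 0, else 0."""
    return (n > 0).astype(float)
```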

(27)

Why ReLU?

• Easy to train

• Avoids the vanishing gradient problem

Sigmoid: saturates, gradient ≈ 0

ReLU: does not saturate

(28)

Why ReLU?

• Biological reason

[Figure: a neuron's response v over time t under strong vs. weak stimulation, compared with ReLU]

(29)

Pooling Layer

Input (4x4):

1 3 2 4
5 7 6 8
0 0 3 3
5 5 0 0

Maximum pooling (2x2, no overlap, no weights, depth = 1):
Max(1,3,5,7) = 7, Max(0,0,5,5) = 5 → output: 7 8 / 5 3

Average pooling (2x2):
Avg(1,3,5,7) = 4 → output: 4 5 / 2.5 1.5
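A small NumPy sketch of the 2x2, non-overlapping pooling above (the input array is the 4x4 map from the slide):

```python
import numpy as np

x = np.array([[1, 3, 2, 4],
              [5, 7, 6, 8],
              [0, 0, 3, 3],
              [5, 5, 0, 0]], dtype=float)

# Group the 4x4 map into non-overlapping 2x2 blocks and pool within each block.
blocks = x.reshape(2, 2, 2, 2)        # (row block, row in block, col block, col in block)
max_pool = blocks.max(axis=(1, 3))    # [[7, 8], [5, 3]]
avg_pool = blocks.mean(axis=(1, 3))   # [[4, 5], [2.5, 1.5]]
```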

(30)

Why “Deep” Learning?

(31)

Visual Perception of Computer

[Figure: Input Layer → Convolutional Layer → Pooling Layer → Convolutional Layer → Pooling Layer; the receptive fields grow as depth increases]

(32)

Fully-Connected Layer

• Fully-Connected Layer: global feature extraction

• Softmax Layer: classifier

[Figure: Input Image → Input Layer → Convolutional Layer → Pooling Layer → Convolutional Layer → Pooling Layer → Fully-Connected Layer → Softmax Layer → Class Label (e.g. “5” or “7”)]
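A minimal PyTorch sketch of this pipeline, assuming 28x28 grayscale digit images; all layer sizes here are illustrative assumptions, not taken from the slide:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),   # convolutional layer
    nn.ReLU(),
    nn.MaxPool2d(2),                             # pooling layer -> 14x14
    nn.Conv2d(8, 16, kernel_size=3, padding=1),  # convolutional layer
    nn.ReLU(),
    nn.MaxPool2d(2),                             # pooling layer -> 7x7
    nn.Flatten(),
    nn.Linear(16 * 7 * 7, 10),                   # fully-connected layer
    nn.Softmax(dim=1),                           # softmax layer -> class probabilities
)

probs = model(torch.randn(1, 1, 28, 28))         # one fake image -> 10 class probabilities
```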

(33)

Training

• Forward Propagation

n2 n1

(34)

Training

• Update weights

n2 n1

Cost function:

(35)

Training

• Update weights

n2 n1

Cost function:

(36)

Training

• Propagate to the previous layer

n2 n1

Cost function:

(37)

Training Convolutional Layers

• example:

a3 a2 a1

b2 b1

input → Convolutional Layer → output

(38)

Training Convolutional Layers

• Forward propagation

a3 a2 a1

b2 b1

input → Convolutional Layer

(39)

Training Convolutional Layers

• Update weights

Cost function:

a3 a2 a1

b2 b1

(40)

Training Convolutional Layers

• Update weights

a3 a2 a1

b2 b1

Cost function:

(41)

Training Convolutional Layers

• Update weights

a3 a2 a1

b2 b1

Cost function:

(42)

Training Convolutional Layers

• Update weights

a3 a2 a1

b2 b1

Cost function:

(43)

Training Convolutional Layers

• Propagate to the previous layer

Cost function:

a3 a2 a1

b2 b1

(44)

Training Convolutional Layers

• Propagate to the previous layer

Cost function:

a3 a2 a1

b2 b1

(45)

Max-Pooling Layers during Training

• Pooling layers have no weights

• No need to update weights

a3 a2 a1

b2 b1

Max-pooling

(46)

Max-Pooling Layers during Training

• Propagate to the previous layer

a3 a2 a1

b2 b1

Cost function:

(47)

Max-Pooling Layers during Training

• What if a1 = a2? Choose the node with the smaller index.

[Figure: max-pooling with inputs a1–a3, outputs b1, b2; cost function]
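A small sketch (not from the slides) of how the gradient is routed through max-pooling: only the input that produced the maximum receives the output's gradient, and NumPy's argmax naturally picks the smaller index on a tie:

```python
import numpy as np

def max_pool_backward(a, grad_b):
    """Route one pooling output's gradient grad_b back to its inputs a."""
    grad_a = np.zeros_like(a, dtype=float)
    winner = int(np.argmax(a))   # on a tie (e.g. a1 = a2), the smaller index wins
    grad_a[winner] = grad_b      # only the winning input receives the gradient
    return grad_a

print(max_pool_backward(np.array([3.0, 3.0, 1.0]), grad_b=0.5))  # -> [0.5, 0. , 0. ]
```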

(48)

Avg-Pooling Layers during Training

• Pooling layers have no weights

• No need to update weights

a3 a2 a1

b2 b1

Avg-pooling

(49)

Avg-Pooling Layers during Training

• Propagate to the previous layer

Cost function:

a3 a2 a1

b2 b1

(50)

ReLU during Training

n

(51)

Training CNN

(52)

Outline

• CNN (Convolutional Neural Networks) Introduction

• Evolution of CNN

• Visualizing the Features

• CNN as Artist

• More Applications

(53)

LeNet (1998)

Yann LeCun

(54)

ImageNet Challenge (2010-2017)

• ImageNet Large Scale Visual Recognition Challenge

1000 categories

Training: 1,200,000

Validation: 50,000

Testing: 100,000

(55)

ImageNet Challenge (2010-2017)

(56)

AlexNet (2012)

• The resurgence of Deep Learning

ReLU, dropout, image augmentation, max pooling

Geoffrey Hinton Alex Krizhevsky

(57)

VGGNet (2014)

D: VGG16 E: VGG19

All filters are 3x3

(58)

VGGNet

• More layers with smaller filters (3x3) works better

• More non-linearity, fewer parameters

Two stacked 3x3 filters cover the same 5x5 receptive field as one 5x5 filter.

One 5x5 filter: parameters = 5x5 = 25, non-linearities = 1

Two 3x3 filters: parameters = 3x3x2 = 18, non-linearities = 2

(59)

VGG 19

depth=64, 3x3 conv: conv1_1, conv1_2
maxpool
depth=128, 3x3 conv: conv2_1, conv2_2
maxpool
depth=256, 3x3 conv: conv3_1, conv3_2, conv3_3, conv3_4
maxpool
depth=512, 3x3 conv: conv4_1, conv4_2, conv4_3, conv4_4
maxpool
depth=512, 3x3 conv: conv5_1, conv5_2, conv5_3, conv5_4
maxpool
size=4096: FC1, FC2
size=1000: softmax

(60)

GoogLeNet (2014)

• Paper:

http://www.cs.unc.edu/~wliu/papers/GoogLeNet.pdf

22 layers deep network

Inception Module

(61)

Inception Module

• Best size?

3x3? 5x5?

• Use them all, and combine

(62)

Inception Module

[Figure: Previous layer → 1x1 convolution, 3x3 convolution, 5x5 convolution, and 3x3 max-pooling in parallel → filter concatenation]

(63)

Inception Module with Dimension Reduction

• Use 1x1 filters to reduce dimension

(64)

Inception Module with Dimension Reduction

[Figure: Previous layer → 1x1 convolution (1x1x256x128) → reduced dimension; input size 1x1x256 (256 channels), output size 1x1x128 (128 channels)]
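A rough weight count showing why the 1x1 reduction helps, assuming 256 input channels and a following 5x5 convolution back to 256 channels (the 5x5 output channel count is an illustrative assumption; only the 256 → 128 reduction comes from the slide):

```python
# Direct 5x5 convolution: 256 -> 256 channels
direct = 5 * 5 * 256 * 256                       # 1,638,400 weights

# 1x1 reduction to 128 channels first, then 5x5 convolution: 256 -> 128 -> 256
reduced = 1 * 1 * 256 * 128 + 5 * 5 * 128 * 256  # 851,968 weights, roughly half
print(direct, reduced)
```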

(65)

ResNet (2015)

• Residual Networks with 152 layers

(66)

ResNet

• Residual learning: a building block

[Figure: a residual building block; the stacked layers learn the residual function F(x) and a shortcut adds the input back, giving F(x) + x]
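A minimal PyTorch sketch of such a building block; the channel count and layer sizes are assumptions, not from the slide:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic building block: output = ReLU(F(x) + x)."""
    def __init__(self, channels):
        super().__init__()
        self.f = nn.Sequential(                          # residual function F
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.f(x) + x)                  # shortcut (identity) connection

y = ResidualBlock(64)(torch.randn(1, 64, 32, 32))        # same shape in, same shape out
```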

(67)

Residual Learning with Dimension Reduction

• using 1x1 filters

(68)

Open Images Extended - Crowdsourced

(69)

Pretrained Model Download

• http://www.vlfeat.org/matconvnet/pretrained/

Alexnet:

http://www.vlfeat.org/matconvnet/models/imagenet-matconvnet-alex.mat

VGG19:

http://www.vlfeat.org/matconvnet/models/imagenet-vgg-verydeep-19.mat

GoogLeNet:

http://www.vlfeat.org/matconvnet/models/imagenet-googlenet-dag.mat

ResNet:

http://www.vlfeat.org/matconvnet/models/imagenet-resnet-152-dag.mat

(70)

Using Pretrained Model

• Lower layers: edge, blob, texture (more general)

• Higher layers: object part (more specific)

(71)

Transfer Learning

• The pretrained model is trained on ImageNet

• If your data is similar to the ImageNet data

Fix all CNN layers; train only the FC layer

[Figure: conv layers (pretrained on labeled ImageNet data, kept fixed) → FC layer (retrained on your data)]

(72)

Transfer Learning

• The pretrained model is trained on ImageNet

• If your data is far different from the ImageNet data

Fix the lower CNN layers; train the higher CNN layers and the FC layer

[Figure: lower conv layers (pretrained on labeled ImageNet data, kept fixed) → higher conv layers and FC layer (retrained on your data)]
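A hedged PyTorch sketch of the two recipes above, using torchvision's pretrained VGG-19 as a stand-in for "the pretrained model"; the 5-class output matches the flower example on the next slide:

```python
import torch.nn as nn
from torchvision import models

model = models.vgg19(weights="IMAGENET1K_V1")   # torchvision >= 0.13; older versions use pretrained=True

# Data similar to ImageNet: fix all CNN layers, train only the FC classifier.
for p in model.features.parameters():
    p.requires_grad = False                      # conv + pooling layers stay fixed
model.classifier[-1] = nn.Linear(4096, 5)        # new FC layer, e.g. 5 flower classes

# Data far from ImageNet: also leave the higher conv blocks trainable and
# freeze only the lower ones (e.g. keep only the first few blocks of model.features fixed).
```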

(73)

Transfer Learning Example

daisy: 634 photos
dandelion: 899 photos
roses: 642 photos
tulips: 800 photos
sunflowers: 700 photos

http://download.tensorflow.org/example_images/flower_photos.tgz

(74)

Outline

• CNN (Convolutional Neural Networks) Introduction

• Evolution of CNN

• Visualizing the Features

• CNN as Artist

• More Applications

(75)

Visualizing CNN

(76)

Visualizing CNN

[Figure: feed a flower image and random noise into the CNN and observe the filter responses]

(77)

Gradient Ascent

• Magnify the filter response

[Figure: start from random noise; the score is the filter response; compute the gradient of the score with respect to the input (lower score → higher score)]

(78)

Gradient Ascent

• Magnify the filter response

[Figure: update the input along the gradient, scaled by a learning rate, moving it from a lower score to a higher score]
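A minimal PyTorch sketch of this gradient-ascent loop, assuming `features` is the pretrained CNN truncated at the layer whose filter we want to magnify (the function name, input size, and step count are illustrative assumptions):

```python
import torch

def visualize_filter(features, channel, steps=100, lr=0.1):
    """Gradient ascent on the input image to magnify one filter's response."""
    x = torch.randn(1, 3, 224, 224, requires_grad=True)   # start from random noise
    for _ in range(steps):
        score = features(x)[0, channel].mean()            # filter response score
        score.backward()                                   # gradient of the score w.r.t. the input
        with torch.no_grad():
            x += lr * x.grad                               # move toward a higher score
            x.grad.zero_()
    return x.detach()
```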

(79)

Gradient Ascent

(80)

Different Layers of Visualization

CNN

(81)

Multiscale Image Generation

visualize → resize → visualize → resize → visualize

(82)

Deep Dream

• Given a photo, the machine adds what it sees ……

(83)

Outline

• CNN (Convolutional Neural Networks) Introduction

• Evolution of CNN

• Visualizing the Features

• CNN as Artist

• More Applications

(84)

The Mechanism of Painting

[Figure: analogy between an artist (brain: scene + style → artwork) and a computer (neural networks)]

(85)

Content Generation

[Figure: the artist's brain turns neural stimulation from the content into a drawing on the canvas, minimizing the difference]

(86)

Content Generation

[Figure: pass the content image and the canvas through VGG19 and compare their filter responses (Width*Height × Depth); update the color of the canvas pixels to minimize the difference; the result reproduces the content]

(87)

Content Generation

(88)

Style Generation

[Figure: pass the artwork through VGG19; from the filter responses (Width*Height × Depth), compute the Gram matrix G (Depth × Depth), which is position-independent]
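A small PyTorch sketch of the Gram matrix above: the input is one layer's filter responses of shape (Depth, Height, Width), and the (Depth × Depth) result discards position information, keeping only which filters co-activate (the 1/(H*W) normalization is a common choice, not stated on the slide):

```python
import torch

def gram_matrix(features):
    """Gram matrix of one layer's filter responses: (Depth, H, W) -> (Depth, Depth)."""
    d, h, w = features.shape
    f = features.reshape(d, h * w)     # flatten positions: Depth x (Width*Height)
    return f @ f.t() / (h * w)         # correlations between filters, position-independent

g = gram_matrix(torch.randn(64, 32, 32))   # e.g. 64 filters -> 64x64 Gram matrix
```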

(89)

Style Generation

[Figure: pass the style image and the canvas through VGG19, compute the Gram matrices G of their filter responses, and update the color of the canvas pixels to minimize the difference; the result reproduces the style]

(90)

Style Generation

(91)

Artwork Generation

Filter Responses VGG19

Gram Matrix

(92)

Artwork Generation

(93)

Content vs. Style

[Figure: generated results with different content/style weightings: 0.15, 0.05, 0.02]

(94)

Neural Doodle

• Image analogy

(95)

Neural Doodle

• Image analogy

Scary link, click with caution!

https://raw.githubusercontent.com/awentzonline/image-analogies/master/examples/images/trump-image-analogy.jpg

(96)

Real-time Texture Synthesis

(97)

Outline

• CNN (Convolutional Neural Networks) Introduction

• Evolution of CNN

• Visualizing the Features

• CNN as Artist

• More Applications

(98)

More Application: Playing Go

[Figure: a CNN takes the board position from the record of previous plays as input and predicts the next move]

Target: Tengen (天元) = 1, else = 0

Target: 5-5 (五之5) = 1, else = 0

Training data — Black: 5-5 (5之五), White: Tengen (天元), Black: 5-5 (五之5) …

(99)

Why CNN for Go?

• Some patterns are much smaller than the whole image

• The same patterns appear in different regions.

AlphaGo uses 5 x 5 filters for its first layer

(100)

Why CNN for Go?

• Subsampling the pixels will not change the object

AlphaGo does not use max pooling ……

How can this be explained???

(101)

More Applications: Sentence Encoding

(102)

Ambiguity in Natural Language

http://3rd.mafengwo.cn/travels/info_weibo.php?id=2861280

http://www.appledaily.com.tw/realtimenews/article/new/20151006/705309/

(103)

Element-wise 1D Operations on Word Vectors

• 1D Convolution or 1D Pooling

[Figure: each word of “This is a …” is represented by a word vector; a 1D convolution or 1D pooling operation combines neighboring vectors element-wise]
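A toy NumPy sketch of such element-wise 1D operations, assuming 5-dimensional word vectors and a window of two neighboring words (all names and sizes are illustrative, not from the slide):

```python
import numpy as np

rng = np.random.default_rng(0)
sentence = rng.standard_normal((4, 5))       # 4 words ("This is a dog"), 5-dim word vectors

w1, w2 = rng.standard_normal(5), rng.standard_normal(5)    # element-wise filter weights
conv = np.stack([w1 * sentence[i] + w2 * sentence[i + 1]   # 1D convolution over word pairs
                 for i in range(len(sentence) - 1)])        # -> (3, 5)
pool = conv.max(axis=0)                                     # 1D max pooling over positions
```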

(104)

CNN Model

[Figure: the sentence “This is a dog” passes through stacked convolution layers conv1, conv2, conv3 (different conv layers)]

(105)

CNN with Max-Pooling Layers

• Similar to syntax tree

• But human-labeled syntax tree is not needed

[Figure: “This is a dog” processed by alternating conv1 and pool1 (max pooling) layers, then conv2; the resulting structure resembles a syntax tree]

(106)

Sentiment Analysis by CNN

• Use a softmax layer to classify the sentiment

[Figure: “This movie is awesome” → conv1 / pool1 / conv2 → softmax → positive; “This movie is awful” → conv1 / pool1 / conv2 → softmax → negative]

(107)

Sentiment Analysis by CNN

• Build the “correct syntax tree” by training

[Figure: “This movie is awesome” is currently classified as negative; backward propagate the error]

(108)

Sentiment Analysis by CNN

• Build the “correct syntax tree” by training

[Figure: update the weights so that “This movie is awesome” is classified as positive instead of negative]

(109)

Multiple Filters

• Richer features than RNN

[Figure: the words “This is a …” are each processed by multiple filters (filter11, filter12, filter13; filter21, filter22, filter23)]

(110)

Resizing Sentence

• An image can easily be resized • A sentence cannot easily be resized

[Figure: “resizing” the sentence 全台灣最高樓在台北 (“the tallest building in all of Taiwan is in Taipei”) by adding or removing characters, e.g. 全台灣最高的高樓在台北市, 全台灣最高樓在台北市, 台灣最高樓在台北]

(111)

Various Input Size

• Convolutional layers and pooling layers can handle inputs of various sizes

[Figure: “This is a dog” (4 words) and “the dog run” (3 words) both pass through the same conv1 / pool1 layers]

(112)

Various Input Size

• Fully-connected and softmax layers need fixed-size input

[Figure: both “The dog run” and “This is a dog” must be reduced to a fixed-size vector before the fc and softmax layers]

(113)

k-max Pooling

• Choose the k largest values

• Preserve the order of the input values

• Variable-size input, fixed-size output

3-max pooling: 13 4 1 7 8 → 13 7 8

3-max pooling: 12 5 21 15 7 4 9 → 12 21 15
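A minimal NumPy sketch of k-max pooling, reproducing the two examples above:

```python
import numpy as np

def k_max_pooling(x, k):
    """Keep the k largest values of a variable-length sequence, in their original order."""
    idx = np.sort(np.argsort(x)[-k:])   # positions of the k largest values, re-sorted by position
    return np.asarray(x)[idx]

print(k_max_pooling([13, 4, 1, 7, 8], 3))          # [13  7  8]
print(k_max_pooling([12, 5, 21, 15, 7, 4, 9], 3))  # [12 21 15]
```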

(114)

Wide Convolution

• Ensures that all weights reach the entire sentence

[Figure: wide convolution vs. narrow convolution applied across a sentence]

(115)

Dynamic k-max Pooling

Wide convolution & dynamic k-max pooling

(116)

CNN for Sentence Classification

• Pretrained by word2vec

• Static & non-static channels

Static: fix the values during training

Non-Static: update the values during training

(117)

Concluding Remarks

[Figure: Input Image → Input Layer → Convolutional Layer → Pooling Layer → Convolutional Layer → Pooling Layer → Fully-Connected Layer → Softmax Layer → Class Label (e.g. “5” or “7”)]
