Slide credit from Mark Chang

(1)
(2)

Convolutional Neural Networks

• We need a course to talk about this topic

http://cs231n.stanford.edu/syllabus.html

• However, we only have a lecture

(3)

Outline

• CNN (Convolutional Neural Networks) Introduction

• Evolution of CNN

• Visualizing the Features

• CNN as Artist

• Sentiment Analysis by CNN

(4)

Outline

• CNN (Convolutional Neural Networks) Introduction

• Evolution of CNN

• Visualizing the Features

• CNN as Artist

• Sentiment Analysis by CNN

(5)

Image Recognition

http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf

(6)

Image Recognition

(7)

Local Connectivity

Neurons connect to a small

region

(8)

Parameter Sharing

• The same feature in different positions

Neurons

share the same weights

(9)

Parameter Sharing

• Different features in the same position

Neurons

have different weights

(10)

Convolutional Layers

Each neuron in a convolutional layer connects to a small width × height × depth region of the input volume, and the same weights are shared across positions.

(11)

Convolutional Layers

Input: a1, a2, a3 (depth = 1). Output: b1, b2 and c1, c2 (depth = 2).
The weights wb1, wb2 are shared by b1 and b2; the weights wc1, wc2 are shared by c1 and c2:

b1 = wb1·a1 + wb2·a2
b2 = wb1·a2 + wb2·a3
c1 = wc1·a1 + wc2·a2
c2 = wc1·a2 + wc2·a3
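A minimal NumPy sketch of the computation above (illustrative only, not part of the original slides; the weight values are hypothetical): the same weight pair slides across the input to produce each output channel.

import numpy as np

a = np.array([1.0, 2.0, 3.0])        # inputs a1, a2, a3 (depth = 1)
wb = np.array([0.5, -1.0])           # shared weights wb1, wb2 (hypothetical values)
wc = np.array([2.0, 0.3])            # shared weights wc1, wc2 (hypothetical values)

# Slide the shared weights across the input:
b = np.array([wb @ a[0:2], wb @ a[1:3]])   # b1 = wb1*a1 + wb2*a2, b2 = wb1*a2 + wb2*a3
c = np.array([wc @ a[0:2], wc @ a[1:3]])   # c1 = wc1*a1 + wc2*a2, c2 = wc1*a2 + wc2*a3
print(b, c)                                # output has depth = 2 (channels b and c)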

(12)

Convolutional Layers

Input: a1, a2, a3 and b1, b2, b3 (depth = 2). Output: c1, c2 and d1, d2 (depth = 2).
Each output neuron now has one weight per input channel:

c1 = a1·wc1 + b1·wc2 + a2·wc3 + b2·wc4
c2 = a2·wc1 + b2·wc2 + a3·wc3 + b3·wc4

(13)

Convolutional Layers

c1 = a1·wc1 + b1·wc2 + a2·wc3 + b2·wc4
c2 = a2·wc1 + b2·wc2 + a3·wc3 + b3·wc4
d1 = a1·wd1 + b1·wd2 + a2·wd3 + b2·wd4
d2 = a2·wd1 + b2·wd2 + a3·wd3 + b3·wd4

Input depth = 2, output depth = 2.

(14)

Convolutional Layers


(15)

Hyper-parameters of CNN

• Stride: the step with which the filter slides across the input (e.g. stride = 1, stride = 2)
• Padding: the number of zeros added around the border of the input (e.g. padding = 0, padding = 1)

(16)

Example

Input volume: 7×7×3
Filter: 3×3×3
Stride = 2, Padding = 1
Output volume: 3×3×2

http://cs231n.github.io/convolutional-networks/
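As a sanity check (not on the original slide), the output spatial size follows the standard formula (W − F + 2P)/S + 1, assuming the 7×7 size shown already includes the padding border (i.e. the unpadded input is 5×5):

# Hypothetical helper illustrating the standard output-size formula.
def conv_output_size(w, f, p, s):
    # w: input width, f: filter width, p: padding, s: stride
    return (w - f + 2 * p) // s + 1

print(conv_output_size(w=5, f=3, p=1, s=2))  # 3 -> matches the 3x3x2 output volume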

(17)

Convolutional Layers

http://cs231n.github.io/convolutional-networks/

(18)

Convolutional Layers

http://cs231n.github.io/convolutional-networks/

(19)

Convolutional Layers

http://cs231n.github.io/convolutional-networks/

(20)

Relationship with Convolution

y[n] = Σ_k x[k]·w[n−k]

The signal x[n] is convolved with the kernel w[n]: flip the kernel to get w[0−k] and sum the products, e.g.

y[0] = x[−2]·w[2] + x[−1]·w[1] + x[0]·w[0]

(21)

Relationship with Convolution

y[n] = Σ_k x[k]·w[n−k]

Shift the flipped kernel to w[1−k]:

y[1] = x[−1]·w[2] + x[0]·w[1] + x[1]·w[0]

(22)

Relationship with Convolution

y[n] = Σ_k x[k]·w[n−k]

Shift the flipped kernel to w[2−k]:

y[2] = x[0]·w[2] + x[1]·w[1] + x[2]·w[0]

(23)

Relationship with Convolution

y[n] = Σ_k x[k]·w[n−k]

Shift the flipped kernel to w[4−k]:

y[4] = x[2]·w[2] + x[3]·w[1] + x[4]·w[0]
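A minimal sketch (not from the slides) that computes y[n] = Σ_k x[k]·w[n−k] directly, which is what the four examples above evaluate; the signal and kernel values are hypothetical and indices start at 0 here:

import numpy as np

def conv_1d(x, w):
    # Full discrete convolution: y[n] = sum_k x[k] * w[n - k]
    y = np.zeros(len(x) + len(w) - 1)
    for n in range(len(y)):
        for k in range(len(x)):
            if 0 <= n - k < len(w):
                y[n] += x[k] * w[n - k]
    return y

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical signal values
w = np.array([0.5, 1.0, -1.0])            # hypothetical kernel values
print(conv_1d(x, w))
print(np.convolve(x, w))                  # same result from NumPy's built-in convolution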

(24)

Nonlinearity

• Rectified Linear Unit (ReLU)

n_out = n_in  if n_in > 0
n_out = 0     otherwise

Applied element-wise, e.g.

[  1   4 ]            [ 1  4 ]
[ −3   1 ]   —ReLU→   [ 0  1 ]
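A one-line NumPy version of the element-wise ReLU above (illustrative sketch; the matrix is the reconstructed example):

import numpy as np

def relu(x):
    # n_out = n_in if n_in > 0 else 0, applied element-wise
    return np.maximum(x, 0)

m = np.array([[1.0, 4.0], [-3.0, 1.0]])   # example matrix as reconstructed above
print(relu(m))                            # [[1. 4.] [0. 1.]]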

(25)

Why ReLU?

• Easy to train
• Avoids the vanishing gradient problem: the sigmoid saturates (gradient ≈ 0), while ReLU does not saturate.

(26)

Why ReLU?

• Biological reason: a neuron fires little under weak stimulation and increasingly under strong stimulation, which resembles the ReLU response.

(27)

Pooling Layer

Input (4×4, depth = 1):

1 3 2 4
5 7 6 8
0 0 3 3
5 5 0 0

Pooling over 2×2 regions, no overlap, no weights.

Maximum pooling:        Average pooling:
7 8                     4   5
5 3                     2.5 1.5

e.g. Max(1,3,5,7) = 7, Avg(1,3,5,7) = 4, Max(0,0,5,5) = 5
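A small sketch of 2×2 non-overlapping max and average pooling on the 4×4 input above (assuming the reconstructed matrix):

import numpy as np

x = np.array([[1, 3, 2, 4],
              [5, 7, 6, 8],
              [0, 0, 3, 3],
              [5, 5, 0, 0]], dtype=float)

# Split into non-overlapping 2x2 blocks: shape (2, 2, 2, 2)
blocks = x.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3)
print(blocks.max(axis=(2, 3)))    # [[7. 8.] [5. 3.]]
print(blocks.mean(axis=(2, 3)))   # [[4. 5.] [2.5 1.5]]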

(28)

Why “Deep” Learning?

(29)

Visual Perception of Human

http://www.nature.com/neuro/journal/v8/n8/images/nn0805-975-F1.jpg

(30)

Visual Perception of Computer

Input layer → convolutional layer (receptive fields) → pooling layer → convolutional layer (receptive fields) → pooling layer → ...

(31)

Visual Perception of Computer

Input image → convolutional layer with receptive fields → filter responses → max-pooling layer (width = 3, height = 3) → pooled filter responses

(32)

Fully-Connected Layer

• Fully-connected layers: global feature extraction
• Softmax layer: classifier

Input image → convolutional layer → pooling layer → convolutional layer → pooling layer → fully-connected layer → softmax layer → class label

(33)

Visual Perception of Computer

• Alexnet

http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf

http://vision03.csail.mit.edu/cnn_art/data/single_layer.png

(34)

Training

• Forward Propagation

Neuron n1 feeds neuron n2 through the weight w21:

n2(in) = w21·n1(out)
n2(out) = g(n2(in)), where g is the activation function

(35)

Training

• Update weights

Cost function: J

∂J/∂w21 = ∂J/∂n2(out) · ∂n2(out)/∂n2(in) · ∂n2(in)/∂w21

w21 ← w21 − η·∂J/∂w21
⇒ w21 ← w21 − η · ∂J/∂n2(out) · ∂n2(out)/∂n2(in) · ∂n2(in)/∂w21

(36)

Training

• Update weights

Cost function: J

Since n2(out) = g(n2(in)) and n2(in) = w21·n1(out):

∂n2(out)/∂n2(in) = g′(n2(in)),  ∂n2(in)/∂w21 = n1(out)

⇒ w21 ← w21 − η · ∂J/∂n2(out) · g′(n2(in)) · n1(out)
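A minimal sketch of this single-weight update, assuming a sigmoid activation and a squared-error cost (neither is specified on the slide; the numbers are hypothetical):

import numpy as np

g  = lambda z: 1.0 / (1.0 + np.exp(-z))   # assumed activation g
dg = lambda z: g(z) * (1.0 - g(z))        # g'(z)

n1_out, w21, target, eta = 0.8, 0.5, 1.0, 0.1   # hypothetical values

n2_in  = w21 * n1_out
n2_out = g(n2_in)

dJ_dn2_out = n2_out - target              # from the assumed J = 0.5 * (n2_out - target)^2
# Chain rule: dJ/dw21 = dJ/dn2_out * g'(n2_in) * n1_out
dJ_dw21 = dJ_dn2_out * dg(n2_in) * n1_out
w21 = w21 - eta * dJ_dw21                 # gradient descent step
print(w21)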

(37)

Training

• Propagate to the previous layer

Cost function: J

∂J/∂n1(in) = ∂J/∂n2(out) · ∂n2(out)/∂n2(in) · ∂n2(in)/∂n1(out) · ∂n1(out)/∂n1(in)

(38)

Training Convolutional Layers

• Example: a convolutional layer with inputs a1, a2, a3, outputs b1, b2, and shared weights wb1, wb2.

To simplify the notation, in the following slides b1 means b1(in), a1 means a1(out), and so on.

(39)

Training Convolutional Layers

• Forward propagation

b1 = wb1·a1 + wb2·a2
b2 = wb1·a2 + wb2·a3

(40)

Training Convolutional Layers

• Update weights

Cost function: J

wb1 ← wb1 − η·(∂J/∂b1 · ∂b1/∂wb1 + ∂J/∂b2 · ∂b2/∂wb1)

(41)

Training Convolutional Layers

• Update weights

Cost function: J

Since b1 = wb1·a1 + wb2·a2 and b2 = wb1·a2 + wb2·a3:

∂b1/∂wb1 = a1,  ∂b2/∂wb1 = a2

⇒ wb1 ← wb1 − η·(∂J/∂b1 · a1 + ∂J/∂b2 · a2)

(42)

Training Convolutional Layers

• Update weights

Cost function: J

wb2 ← wb2 − η·(∂J/∂b1 · ∂b1/∂wb2 + ∂J/∂b2 · ∂b2/∂wb2)

(43)

Training Convolutional Layers

• Update weights

Cost function: J

Since b1 = wb1·a1 + wb2·a2 and b2 = wb1·a2 + wb2·a3:

∂b1/∂wb2 = a2,  ∂b2/∂wb2 = a3

⇒ wb2 ← wb2 − η·(∂J/∂b1 · a2 + ∂J/∂b2 · a3)
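A sketch of the shared-weight updates derived above. The key point is that each shared weight accumulates gradients from every position where it is used; the input values, upstream gradients, and learning rate are hypothetical:

import numpy as np

a = np.array([0.2, -0.5, 0.9])       # a1, a2, a3 (hypothetical)
wb1, wb2, eta = 0.3, -0.1, 0.05      # shared weights and learning rate (hypothetical)

b1 = wb1 * a[0] + wb2 * a[1]         # forward pass, as on slide (39)
b2 = wb1 * a[1] + wb2 * a[2]

# Suppose the gradients dJ/db1 and dJ/db2 have been propagated back from above:
dJ_db1, dJ_db2 = 0.4, -0.7           # hypothetical values

# Each shared weight sums its gradient over both positions where it is applied:
wb1 = wb1 - eta * (dJ_db1 * a[0] + dJ_db2 * a[1])
wb2 = wb2 - eta * (dJ_db1 * a[1] + dJ_db2 * a[2])
print(b1, b2, wb1, wb2)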

(44)

Training Convolutional Layers

• Propagate to the previous layer

Cost function: J

∂J/∂a1 = ∂J/∂b1 · ∂b1/∂a1
∂J/∂a2 = ∂J/∂b1 · ∂b1/∂a2 + ∂J/∂b2 · ∂b2/∂a2
∂J/∂a3 = ∂J/∂b2 · ∂b2/∂a3

(45)

Training Convolutional Layers

• Propagate to the previous layer

Cost function: J

Since b1 = wb1·a1 + wb2·a2 and b2 = wb1·a2 + wb2·a3:

∂b1/∂a1 = wb1,  ∂b1/∂a2 = wb2,  ∂b2/∂a2 = wb1,  ∂b2/∂a3 = wb2

⇒ ∂J/∂a1 = ∂J/∂b1 · wb1
  ∂J/∂a2 = ∂J/∂b1 · wb2 + ∂J/∂b2 · wb1
  ∂J/∂a3 = ∂J/∂b2 · wb2

(46)

Max-Pooling Layers during Training

• Pooling layers have no weights

• No need to update weights

b1 = max(a1, a2),  b2 = max(a2, a3)

For example, b2 = max(a2, a3) = a2 if a2 ≥ a3, a3 otherwise, so

∂b2/∂a2 = 1 if a2 ≥ a3, 0 otherwise

(47)

Max-Pooling Layers during Training

• Propagate to the previous layer

Cost function: J

Assuming a1 > a2 and a2 > a3:

∂b1/∂a1 = 1,  ∂b1/∂a2 = 0
∂b2/∂a2 = 1,  ∂b2/∂a3 = 0

⇒ ∂J/∂a1 = ∂J/∂b1,  ∂J/∂a2 = ∂J/∂b2,  ∂J/∂a3 = 0
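A sketch of this routing rule: during backpropagation a max-pooling unit passes the gradient only to the input that won the max (ties broken toward the smaller index, as on the next slide); the values are hypothetical and chosen so that a1 > a2 > a3:

import numpy as np

a = np.array([0.9, 0.7, 0.2])            # a1, a2, a3 (hypothetical, a1 > a2 > a3)
dJ_db = np.array([0.3, -0.6])            # dJ/db1, dJ/db2 from above (hypothetical)

dJ_da = np.zeros(3)
# b1 = max(a1, a2): route dJ/db1 to the winner (smaller index wins ties)
dJ_da[0 if a[0] >= a[1] else 1] += dJ_db[0]
# b2 = max(a2, a3): route dJ/db2 to the winner
dJ_da[1 if a[1] >= a[2] else 2] += dJ_db[1]
print(dJ_da)                             # [0.3, -0.6, 0.0]: only winning inputs get gradient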

(48)

Max-Pooling Layers during Training

• What if a1 = a2? Choose the node with the smaller index.

For example, when a1 = a2 = a3, the gradient ∂J/∂b1 is routed to a1 and ∂J/∂b2 is routed to a2.

(49)

Avg-Pooling Layers during Training

• Pooling layers have no weights

• No need to update weights

b1 = (a1 + a2)/2,  b2 = (a2 + a3)/2

∂b2/∂a2 = 1/2,  ∂b2/∂a3 = 1/2

(50)

Avg-Pooling Layers during Training

• Propagate to the previous layer

Cost function: J

∂b1/∂a1 = ∂b1/∂a2 = 1/2,  ∂b2/∂a2 = ∂b2/∂a3 = 1/2

⇒ ∂J/∂a1 = (1/2)·∂J/∂b1
  ∂J/∂a2 = (1/2)·(∂J/∂b1 + ∂J/∂b2)
  ∂J/∂a3 = (1/2)·∂J/∂b2

(51)

ReLU during Training

n_out = n_in  if n_in > 0
n_out = 0     otherwise

∂n_out/∂n_in = 1 if n_in > 0, 0 otherwise

(52)

Training CNN

(53)

Outline

• CNN (Convolutional Neural Networks) Introduction

• Evolution of CNN

• Visualizing the Features

• CNN as Artist

• Sentiment Analysis by CNN

(54)

LeNet

Paper:

http://vision.stanford.edu/cs598_spring07/papers/Lecun98.pdf

Yann LeCun http://yann.lecun.com/exdb/lenet/

(55)

ImageNet Challenge

• ImageNet Large Scale Visual Recognition Challenge

http://image-net.org/challenges/LSVRC/

• Dataset :

1000 categories

Training: 1,200,000

Validation: 50,000

Testing: 100,000

http://vision.stanford.edu/Datasets/collage_s.png

(56)

ImageNet Challenge

http://www.qingpingshan.com/uploads/allimg/160818/1J22QI5-0.png

(57)

AlexNet (2012)

• Paper:

http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf

• The resurgence of Deep Learning

Geoffrey Hinton Alex Krizhevsky

(58)

VGGNet (2014)

• Paper: https://arxiv.org/abs/1409.1556

D: VGG16 E: VGG19

All filters are 3x3

(59)

VGGNet

• More layers with smaller (3×3) filters works better
• More non-linearity, fewer parameters

One 5×5 filter:
• Parameters: 5×5 = 25
• Non-linearities: 1

Two stacked 3×3 filters:
• Parameters: 3×3×2 = 18
• Non-linearities: 2
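The counts above assume a single input and output channel; a quick check (illustrative only):

# One 5x5 filter vs. two stacked 3x3 filters (single channel, bias ignored)
params_5x5 = 5 * 5              # 25 weights, 1 non-linearity
params_3x3_twice = 3 * 3 * 2    # 18 weights, 2 non-linearities
print(params_5x5, params_3x3_twice)   # 25 18

# Both cover the same receptive field: stacking two 3x3 convs sees a 5x5 input region.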

(60)

VGG 19

conv1_1, conv1_2 (3×3 conv, depth = 64) → maxpool
conv2_1, conv2_2 (3×3 conv, depth = 128) → maxpool
conv3_1, conv3_2, conv3_3, conv3_4 (3×3 conv, depth = 256) → maxpool
conv4_1, conv4_2, conv4_3, conv4_4 (3×3 conv, depth = 512) → maxpool
conv5_1, conv5_2, conv5_3, conv5_4 (3×3 conv, depth = 512) → maxpool
FC1, FC2 (size = 4096) → size = 1000 → softmax

(61)

GoogLeNet (2014)

• Paper:

http://www.cs.unc.edu/~wliu/papers/GoogLeNet.pdf

A 22-layer deep network built from Inception modules.

(62)

Inception Module

• Best size?

3x3? 5x5?

• Use them all, and combine

(63)

Inception Module

Previous layer → 1×1 convolution, 3×3 convolution, 5×5 convolution, and 3×3 max-pooling in parallel → filter concatenation

(64)

Inception Module with Dimension Reduction

• Use 1x1 filters to reduce dimension

(65)

Inception Module with Dimension Reduction

A 1×1 convolution (weights 1×1×256×128) reduces the dimension at each position: input size 1×1×256 (256 channels) → output size 1×1×128 (128 channels).
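A quick illustration (not from the slide) of why the 1×1 reduction saves parameters inside an Inception module; the 64-channel bottleneck size is a hypothetical choice:

# 5x5 convolution applied directly on a 256-channel input, producing 128 channels:
direct = 5 * 5 * 256 * 128                     # 819,200 weights

# Same output, but first reduce 256 -> 64 channels with a 1x1 convolution:
reduced = 1 * 1 * 256 * 64 + 5 * 5 * 64 * 128  # 16,384 + 204,800 = 221,184 weights
print(direct, reduced)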

(66)

ResNet (2015)

• Paper: https://arxiv.org/abs/1512.03385

• Residual Networks

• 152 layers

(67)

ResNet

• Residual learning: a building block adds its input to the output of a small stack of layers (the residual function), so the block learns F(x) + x.

(68)

Residual Learning with Dimension Reduction

• using 1x1 filters

(69)

Pretrained Model Download

• http://www.vlfeat.org/matconvnet/pretrained/

AlexNet: http://www.vlfeat.org/matconvnet/models/imagenet-matconvnet-alex.mat
VGG19: http://www.vlfeat.org/matconvnet/models/imagenet-vgg-verydeep-19.mat
GoogLeNet: http://www.vlfeat.org/matconvnet/models/imagenet-googlenet-dag.mat
ResNet: http://www.vlfeat.org/matconvnet/models/imagenet-resnet-152-dag.mat

(70)

Using Pretrained Model

• Lower layers: edge, blob, texture (more general)
• Higher layers: object parts (more specific)

http://vision03.csail.mit.edu/cnn_art/data/single_layer.png

(71)

Transfer Learning

• The pretrained model is trained on the ImageNet dataset
• If your data is similar to the ImageNet data:
Fix all CNN layers and train only the FC layer on your labeled data.

(72)

Transfer Learning

• The pretrained model is trained on the ImageNet dataset
• If your data is very different from the ImageNet data:
Fix the lower CNN layers and train the higher CNN layers and the FC layer on your labeled data.

(73)

Tensorflow Transfer Learning Example

• https://www.tensorflow.org/versions/r0.11/how_tos/style_guide.html

daisy: 634 photos
dandelion: 899 photos
roses: 642 photos
tulips: 800 photos
sunflowers: 700 photos

http://download.tensorflow.org/example_images/flower_photos.tgz

(74)

Tensorflow Transfer Learning Example

Fix the pretrained layers; train only the final layer.
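A minimal tf.keras sketch of the recipe above, using VGG19 as the fixed backbone (any ImageNet-pretrained network would do). The 5-class flower setup and layer sizes are assumptions for illustration, not the exact code of the linked example:

import tensorflow as tf

# Pretrained CNN layers, without the original ImageNet classifier on top
base = tf.keras.applications.VGG19(weights="imagenet", include_top=False, pooling="avg")
base.trainable = False          # "fix all CNN layers"

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(256, activation="relu"),   # new FC layer to train (assumed size)
    tf.keras.layers.Dense(5, activation="softmax"),  # 5 flower classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(your_flower_dataset, epochs=...)   # only the new layers get updated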

(75)

Outline

• CNN (Convolutional Neural Networks) Introduction

• Evolution of CNN

• Visualizing the Features

• CNN as Artist

• Sentiment Analysis by CNN

(76)

Visualizing CNN

http://vision03.csail.mit.edu/cnn_art/data/single_layer.png

(77)

Visualizing CNN

Feed a flower image and random noise through the CNN and compare the resulting filter responses.

(78)

Gradient Ascent

• Magnify the filter response

Start from random noise x and compute the filter response f.

Score: F = Σ_{i,j} f_{i,j}
Gradient: ∂F/∂x (pointing from a lower score toward a higher score)

(79)

Gradient Ascent

• Magnify the filter response

Update x with learning rate η:

x ← x + η·∂F/∂x
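A minimal sketch of the update x ← x + η·∂F/∂x with TensorFlow's automatic differentiation; `feature_extractor` and the choice of layer/filter are placeholders for whatever network is being visualized:

import tensorflow as tf

def gradient_ascent_step(x, feature_extractor, eta=1.0):
    # x: image tensor (1, H, W, 3); feature_extractor: model returning one filter's response
    with tf.GradientTape() as tape:
        tape.watch(x)
        f = feature_extractor(x)          # filter response f
        score = tf.reduce_sum(f)          # F = sum over i, j of f[i, j]
    grad = tape.gradient(score, x)        # dF/dx
    return x + eta * grad                 # move x toward a higher score

# x = tf.random.uniform((1, 224, 224, 3))     # start from random noise
# for _ in range(100): x = gradient_ascent_step(x, feature_extractor)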

(80)

Gradient Ascent

(81)

Different Layers of Visualization

CNN

(82)

Multiscale Image Generation

Alternate between visualization and resizing: visualize → resize → visualize → resize → visualize.

(83)

Multiscale Image Generation

(84)

Deep Dream

• https://research.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html

• Source code:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/deepdream/deepdream.ipynb

http://download.tensorflow.org/example_images/flower_photos.tgz

(85)

Deep Dream

(86)

Deep Dream

(87)

Outline

• CNN (Convolutional Neural Networks) Introduction

• Evolution of CNN

• Visualizing the Features

• CNN as Artist

• Sentiment Analysis by CNN

(88)

Neural Art

• Paper: https://arxiv.org/abs/1508.06576

• Source code : https://github.com/ckmarkoh/neuralart_tensorflow

content + style → artwork

http://www.taipei-101.com.tw/upload/news/201502/2015021711505431705145.JPG
https://github.com/andersbll/neural_artistic_style/blob/master/images/starry_night.jpg?raw=true

(89)

The Mechanism of Painting

A human artist's brain turns a scene and a style into an artwork; here, a computer with neural networks plays the same roles.

(90)

Misconception

(91)

Content Generation

The artist looks at the content, draws on the canvas, and minimizes the difference between the neural stimulation produced by the content and by the canvas.

(92)

Content Generation

Feed both the content image and the canvas through VGG19, minimize the difference between their filter responses (a Width×Height by Depth array per layer), and update the colors of the canvas pixels to obtain the result.

(93)

Content Generation

Layer l's filter responses are computed for both the input photo and the input canvas; each response is indexed by depth (i) and position (width×height, j).

(94)

Content Generation

• Backward propagation: propagate the difference between the canvas's and the photo's layer-l filter responses back through VGG19, and update the canvas pixels with a learning rate.
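A sketch of the content objective described above: the squared difference between the canvas's and the photo's filter responses at layer l, whose gradient drives the pixel updates. The arrays are NumPy placeholders, not the original implementation:

import numpy as np

# Filter responses at layer l, shape (depth, width*height); hypothetical values
P = np.random.randn(64, 196)   # responses of the content photo
F = np.random.randn(64, 196)   # responses of the current canvas

content_loss = 0.5 * np.sum((F - P) ** 2)
dLoss_dF = F - P               # gradient w.r.t. the canvas responses,
                               # then backpropagated through VGG19 to the canvas pixels
print(content_loss, dLoss_dF.shape)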

(95)

Content Generation

(96)

Content Generation

VGG19

conv1_2 conv2_2 conv3_4 conv4_4 conv5_1 conv5_2

(97)

Style Generation

Feed the artwork through VGG19 and convert each layer's filter responses (Width×Height by Depth, position-dependent) into a Gram matrix G (Depth by Depth, position-independent).

(98)

Style Generation

The Gram matrix G of layer l's filter responses (Depth by Width×Height) is Depth by Depth: entry G(k1, k2) is the sum over all positions of the product of filter response k1 and filter response k2.
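A sketch of the Gram matrix computation: with the filter responses stored as a (depth, width*height) matrix F, the position-independent statistics are G = F·Fᵀ (illustrative shapes only):

import numpy as np

F = np.random.randn(64, 196)   # layer l's filter responses: depth x (width*height)
G = F @ F.T                    # Gram matrix: depth x depth
# G[k1, k2] = sum over positions j of F[k1, j] * F[k2, j]
print(G.shape)                 # (64, 64)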

(99)

Style Generation

Compute layer l's Gram matrix for both the input artwork and the input canvas from their filter responses.

(100)

Style Generation

Feed both the style image and the canvas through VGG19, compute the Gram matrices of their filter responses, minimize the difference between the two Gram matrices, and update the colors of the canvas pixels to obtain the result.

(101)

Style Generation

(102)

Style Generation

VGG19

Style reconstructions using increasing sets of layers: {conv1_1}, {conv1_1, conv2_1}, {conv1_1, conv2_1, conv3_1}, {conv1_1, conv2_1, conv3_1, conv4_1}, {conv1_1, conv2_1, conv3_1, conv4_1, conv5_1}.

(103)

Artwork Generation

Match both objectives at once: the content image's filter responses and the style image's Gram matrices, all computed with VGG19.

(104)

Artwork Generation

Style layers: conv1_1, conv2_1, conv3_1, conv4_1, conv5_1; content layer: conv4_2.

(105)

Artwork Generation

(106)

Content vs. Style

Results for different content-to-style weightings: 0.15, 0.05, 0.02, 0.007.

(107)

Neural Doodle

• Paper: https://arxiv.org/abs/1603.01768

• Source code: https://github.com/alexjc/neural-doodle

Inputs: a style image, a content image, and semantic maps; output: the result.

(108)

Neural Doodle

• Image analogy

(109)

Neural Doodle

• Image analogy

Scary link, open with caution!

https://raw.githubusercontent.com/awentzonline/image-analogies/master/examples/images/trump-image-analogy.jpg

(110)

Real-time Texture Synthesis

• Paper: https://arxiv.org/pdf/1604.04382v1.pdf

GAN: https://arxiv.org/pdf/1406.2661v1.pdf

VAE: https://arxiv.org/pdf/1312.6114v10.pdf

• Source Code : https://github.com/chuanli11/MGANs

(111)

Outline

• CNN (Convolutional Neural Networks) Introduction

• Evolution of CNN

• Visualizing the Features

• CNN as Artist

• Sentiment Analysis by CNN

(112)

A Convolutional Neural Network for Modelling Sentences

• Paper: https://arxiv.org/abs/1404.2188

• Source code:

https://github.com/FredericGodin/DynamicCNN

(113)

Drawbacks of Recursive Neural Networks (RvNN)

• A human-labeled syntax tree is needed during training.

The words of "This is a dog" are converted to word vectors, and the RvNN combines them pairwise following the given syntax tree.

(114)

Drawbacks of Recursive Neural Networks (RvNN)

• Ambiguity in natural language

http://3rd.mafengwo.cn/travels/info_weibo.php?id=2861280
http://www.appledaily.com.tw/realtimenews/article/new/20151006/705309/

(115)

Element-wise 1D operations on word vectors

• A 1D convolution or 1D pooling operation is applied element-wise across the word vectors of "This is a"; in the following figures, each such operation is represented by a single node.

(116)

From RvNN to CNN

• RvNN: the same RvNN unit is applied at every node of the syntax tree of "This is a dog".
• CNN: different convolutional layers (conv1, conv2, conv3) are stacked over the word sequence.

(117)

CNN with Max-Pooling Layers

• Similar to a syntax tree
• But a human-labeled syntax tree is not needed

Stacking conv1 → max-pooling (pool1) → conv2 over "This is a dog" builds a tree-like structure automatically.

(118)

Sentiment Analysis by CNN

• Use a softmax layer to classify the sentiment.

"This movie is awesome" → conv1 → pool1 → conv2 → softmax → positive
"This movie is awful" → conv1 → pool1 → conv2 → softmax → negative

(119)

Sentiment Analysis by CNN

• Build the “correct syntax tree” by training.

If "This movie is awesome" is classified as negative, the error is backward propagated from the softmax layer through the convolutional and pooling layers.

(120)

Sentiment Analysis by CNN

• Build the “correct syntax tree” by training.

The weights are updated so that "This movie is awesome" is classified as positive instead of negative.

(121)

Multiple Filters

• Richer features than RNN: each layer applies multiple filters over "This is a" (filter11, filter12, filter13 at the first layer; filter21, filter22, filter23 at the second).

(122)

Sentences can’t be easily resized

• An image can be easily resized; a sentence cannot.

Example (Chinese): 全台灣最高樓在台北 ("The tallest building in all of Taiwan is in Taipei"). "Resizing" it changes the wording, e.g. 全台灣最高的高樓在台北市, 全台灣最高樓在台北市, or 台灣最高樓在台北.

(123)

Various Input Size

• Convolutional layers and pooling layers can handle inputs of various sizes.

"This is a dog" (four words) and "the dog run" (three words) both pass through conv1 and pool1.

(124)

Various Input Size

• The fully-connected layer and the softmax layer need fixed-size input.

"The dog run" and "This is a dog" must be reduced to the same size before the fc and softmax layers.

(125)

k-max Pooling

• Choose the k largest values
• Preserve the order of the input values
• Variable-size input, fixed-size output

3-max pooling: 13, 4, 1, 7, 8 → 13, 7, 8
3-max pooling: 12, 5, 21, 15, 7, 4, 9 → 12, 21, 15
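A small sketch of k-max pooling as described above: keep the k largest values, but in their original order (reproduces the two examples):

import numpy as np

def k_max_pooling(x, k):
    # Indices of the k largest values, re-sorted so the original order is preserved
    idx = np.sort(np.argpartition(x, -k)[-k:])
    return x[idx]

print(k_max_pooling(np.array([13, 4, 1, 7, 8]), 3))          # [13  7  8]
print(k_max_pooling(np.array([12, 5, 21, 15, 7, 4, 9]), 3))  # [12 21 15]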

(126)

Wide Convolution

• Ensures that all weights reach the entire sentence

Wide convolution (padded, so every weight reaches every word of the sentence) vs. narrow convolution.

(127)

Dynamic k-max Pooling

Wide convolution & k-max pooling at every layer, with k chosen dynamically:

kl = max(ktop, ⌈((L − l) / L)·s⌉)

l: index of the current layer
kl: k of the current layer
ktop: k of the top layer (a constant)
L: total number of layers (a constant)
s: length of the input sentence
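A direct translation of the formula above (sketch; it matches the worked examples on the next slides):

import math

def dynamic_k(l, L, k_top, s):
    # k_l = max(k_top, ceil((L - l) / L * s))
    return max(k_top, math.ceil((L - l) / L * s))

print(dynamic_k(l=1, L=2, k_top=3, s=10))  # 5
print(dynamic_k(l=1, L=2, k_top=3, s=14))  # 7
print(dynamic_k(l=1, L=2, k_top=3, s=8))   # 4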

(128)

Dynamic k-max Pooling

s = 10, L = 2, ktop = 3

k1 = max(3, ⌈((2 − 1)/2)·10⌉) = 5

(129)

Dynamic k-max Pooling

s = 14, L = 2, ktop = 3

k1 = max(3, ⌈((2 − 1)/2)·14⌉) = 7

(130)

Dynamic k-max Pooling

s = 8, L = 2, ktop = 3

k1 = max(3, ⌈((2 − 1)/2)·8⌉) = 4

(131)

Dynamic k-max Pooling

Wide convolution &

Dynamic k-max pooling

(132)

Convolutional Neural Networks for Sentence Classification

• Paper: http://www.aclweb.org/anthology/D14-1181

• Source code:

https://github.com/yoonkim/CNN_sentence

(133)

Static & Non-Static Channel

• Pretrained by word2vec

• Static: fix the values during training

• Non-Static: update the values during training

(134)

About the Lecturer

Mark Chang

• Email: ckmarkoh at gmail dot com

• Blog: http://cpmarkchang.logdown.com

• Github: https://github.com/ckmarkoh

• Slideshare: http://www.slideshare.net/ckmarkohchang

• Youtube: https://www.youtube.com/channel/UCckNPGDL21aznRhl3EijRQw

HTC Research & Healthcare, Deep Learning Algorithms Research Engineer
