Slide credit from Mark Chang

(1)
(2)

Convolutional Neural Networks

• We need a course to talk about this topic

http://cs231n.stanford.edu/syllabus.html

• However, we only have a lecture

(3)

Outline

• CNN (Convolutional Neural Networks) Introduction

• Evolution of CNN

• Visualizing the Features

• CNN as Artist

• Sentiment Analysis by CNN

(4)

Outline

• CNN (Convolutional Neural Networks) Introduction

• Evolution of CNN

• Visualizing the Features

• CNN as Artist

• Sentiment Analysis by CNN

(5)

Image Recognition

http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf

(6)

Image Recognition

(7)

Local Connectivity

Neurons connect to a small

region

(8)

Parameter Sharing

• The same feature in different positions

Neurons

share the same weights

(9)

Parameter Sharing

• Different features in the same position

Neurons

have different weights

(10)

Convolutional Layers

Each neuron in a convolutional layer connects to a small width × height × depth region of the input volume, and the same weights are shared across positions.

(11)

Convolutional Layers

Input: a1, a2, a3 (depth = 1). Output: b1, b2 and c1, c2 (depth = 2).
The weights wb1, wb2 are shared by b1 and b2; the weights wc1, wc2 are shared by c1 and c2:

b1 = wb1·a1 + wb2·a2
b2 = wb1·a2 + wb2·a3
c1 = wc1·a1 + wc2·a2
c2 = wc1·a2 + wc2·a3
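A minimal NumPy sketch of the computation above (illustrative only, not part of the original slides; the weight values are hypothetical): the same weight pair slides across the input to produce each output channel.

import numpy as np

a = np.array([1.0, 2.0, 3.0])        # inputs a1, a2, a3 (depth = 1)
wb = np.array([0.5, -1.0])           # shared weights wb1, wb2 (hypothetical values)
wc = np.array([2.0, 0.3])            # shared weights wc1, wc2 (hypothetical values)

# Slide the shared weights across the input:
b = np.array([wb @ a[0:2], wb @ a[1:3]])   # b1 = wb1*a1 + wb2*a2, b2 = wb1*a2 + wb2*a3
c = np.array([wc @ a[0:2], wc @ a[1:3]])   # c1 = wc1*a1 + wc2*a2, c2 = wc1*a2 + wc2*a3
print(b, c)                                # output has depth = 2 (channels b and c)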

(12)

Convolutional Layers

Input: a1, a2, a3 and b1, b2, b3 (depth = 2). Output: c1, c2 and d1, d2 (depth = 2).
Each output neuron now has one weight per input channel:

c1 = a1·wc1 + b1·wc2 + a2·wc3 + b2·wc4
c2 = a2·wc1 + b2·wc2 + a3·wc3 + b3·wc4

(13)

Convolutional Layers

c1 = a1·wc1 + b1·wc2 + a2·wc3 + b2·wc4
c2 = a2·wc1 + b2·wc2 + a3·wc3 + b3·wc4
d1 = a1·wd1 + b1·wd2 + a2·wd3 + b2·wd4
d2 = a2·wd1 + b2·wd2 + a3·wd3 + b3·wd4

Input depth = 2, output depth = 2.

(14)

Convolutional Layers


(15)

Hyper-parameters of CNN

• Stride: the step with which the filter slides across the input (e.g. stride = 1, stride = 2)
• Padding: the number of zeros added around the border of the input (e.g. padding = 0, padding = 1)

(16)

Example

Input volume: 7×7×3
Filter: 3×3×3
Stride = 2, Padding = 1
Output volume: 3×3×2

http://cs231n.github.io/convolutional-networks/
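As a sanity check (not on the original slide), the output spatial size follows the standard formula (W − F + 2P)/S + 1, assuming the 7×7 size shown already includes the padding border (i.e. the unpadded input is 5×5):

# Hypothetical helper illustrating the standard output-size formula.
def conv_output_size(w, f, p, s):
    # w: input width, f: filter width, p: padding, s: stride
    return (w - f + 2 * p) // s + 1

print(conv_output_size(w=5, f=3, p=1, s=2))  # 3 -> matches the 3x3x2 output volume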

(17)

Convolutional Layers

http://cs231n.github.io/convolutional-networks/

(18)

Convolutional Layers

http://cs231n.github.io/convolutional-networks/

(19)

Convolutional Layers

http://cs231n.github.io/convolutional-networks/

(20)

Relationship with Convolution

y[n] = Σ_k x[k]·w[n−k]

The signal x[n] is convolved with the kernel w[n]: flip the kernel to get w[0−k] and sum the products, e.g.

y[0] = x[−2]·w[2] + x[−1]·w[1] + x[0]·w[0]

(21)

Relationship with Convolution

y[n] = Σ_k x[k]·w[n−k]

Shift the flipped kernel to w[1−k]:

y[1] = x[−1]·w[2] + x[0]·w[1] + x[1]·w[0]

(22)

Relationship with Convolution

y[n] = Σ_k x[k]·w[n−k]

Shift the flipped kernel to w[2−k]:

y[2] = x[0]·w[2] + x[1]·w[1] + x[2]·w[0]

(23)

Relationship with Convolution

y[n] = Σ_k x[k]·w[n−k]

Shift the flipped kernel to w[4−k]:

y[4] = x[2]·w[2] + x[3]·w[1] + x[4]·w[0]
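A minimal sketch (not from the slides) that computes y[n] = Σ_k x[k]·w[n−k] directly, which is what the four examples above evaluate; the signal and kernel values are hypothetical and indices start at 0 here:

import numpy as np

def conv_1d(x, w):
    # Full discrete convolution: y[n] = sum_k x[k] * w[n - k]
    y = np.zeros(len(x) + len(w) - 1)
    for n in range(len(y)):
        for k in range(len(x)):
            if 0 <= n - k < len(w):
                y[n] += x[k] * w[n - k]
    return y

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical signal values
w = np.array([0.5, 1.0, -1.0])            # hypothetical kernel values
print(conv_1d(x, w))
print(np.convolve(x, w))                  # same result from NumPy's built-in convolution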

(24)

Nonlinearity

• Rectified Linear Unit (ReLU)

n_out = n_in  if n_in > 0
n_out = 0     otherwise

Applied element-wise, e.g.

[  1   4 ]            [ 1  4 ]
[ −3   1 ]   —ReLU→   [ 0  1 ]
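A one-line NumPy version of the element-wise ReLU above (illustrative sketch; the matrix is the reconstructed example):

import numpy as np

def relu(x):
    # n_out = n_in if n_in > 0 else 0, applied element-wise
    return np.maximum(x, 0)

m = np.array([[1.0, 4.0], [-3.0, 1.0]])   # example matrix as reconstructed above
print(relu(m))                            # [[1. 4.] [0. 1.]]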

(25)

Why ReLU?

• Easy to train
• Avoids the vanishing gradient problem: the sigmoid saturates (gradient ≈ 0), while ReLU does not saturate.

(26)

Why ReLU?

• Biological reason: a neuron fires little under weak stimulation and increasingly under strong stimulation, which resembles the ReLU response.

(27)

Pooling Layer

Input (4×4, depth = 1):

1 3 2 4
5 7 6 8
0 0 3 3
5 5 0 0

Pooling over 2×2 regions, no overlap, no weights.

Maximum pooling:        Average pooling:
7 8                     4   5
5 3                     2.5 1.5

e.g. Max(1,3,5,7) = 7, Avg(1,3,5,7) = 4, Max(0,0,5,5) = 5
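A small sketch of 2×2 non-overlapping max and average pooling on the 4×4 input above (assuming the reconstructed matrix):

import numpy as np

x = np.array([[1, 3, 2, 4],
              [5, 7, 6, 8],
              [0, 0, 3, 3],
              [5, 5, 0, 0]], dtype=float)

# Split into non-overlapping 2x2 blocks: shape (2, 2, 2, 2)
blocks = x.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3)
print(blocks.max(axis=(2, 3)))    # [[7. 8.] [5. 3.]]
print(blocks.mean(axis=(2, 3)))   # [[4. 5.] [2.5 1.5]]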

(28)

Why “Deep” Learning?

(29)

Visual Perception of Human

http://www.nature.com/neuro/journal/v8/n8/images/nn0805-975-F1.jpg

(30)

Visual Perception of Computer

Input layer → convolutional layer (receptive fields) → pooling layer → convolutional layer (receptive fields) → pooling layer → ...

(31)

Visual Perception of Computer

Input image → convolutional layer with receptive fields → filter responses → max-pooling layer (width = 3, height = 3) → pooled filter responses

(32)

Fully-Connected Layer

• Fully-connected layers: global feature extraction
• Softmax layer: classifier

Input image → convolutional layer → pooling layer → convolutional layer → pooling layer → fully-connected layer → softmax layer → class label

(33)

Visual Perception of Computer

• Alexnet

http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf

http://vision03.csail.mit.edu/cnn_art/data/single_layer.png

(34)

Training

• Forward Propagation

Neuron n1 feeds neuron n2 through the weight w21:

n2(in) = w21·n1(out)
n2(out) = g(n2(in)), where g is the activation function

(35)

Training

• Update weights

Cost function: J

∂J/∂w21 = ∂J/∂n2(out) · ∂n2(out)/∂n2(in) · ∂n2(in)/∂w21

w21 ← w21 − η·∂J/∂w21
⇒ w21 ← w21 − η · ∂J/∂n2(out) · ∂n2(out)/∂n2(in) · ∂n2(in)/∂w21

(36)

Training

• Update weights

Cost function: J

Since n2(out) = g(n2(in)) and n2(in) = w21·n1(out):

∂n2(out)/∂n2(in) = g′(n2(in)),  ∂n2(in)/∂w21 = n1(out)

⇒ w21 ← w21 − η · ∂J/∂n2(out) · g′(n2(in)) · n1(out)
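A minimal sketch of this single-weight update, assuming a sigmoid activation and a squared-error cost (neither is specified on the slide; the numbers are hypothetical):

import numpy as np

g  = lambda z: 1.0 / (1.0 + np.exp(-z))   # assumed activation g
dg = lambda z: g(z) * (1.0 - g(z))        # g'(z)

n1_out, w21, target, eta = 0.8, 0.5, 1.0, 0.1   # hypothetical values

n2_in  = w21 * n1_out
n2_out = g(n2_in)

dJ_dn2_out = n2_out - target              # from the assumed J = 0.5 * (n2_out - target)^2
# Chain rule: dJ/dw21 = dJ/dn2_out * g'(n2_in) * n1_out
dJ_dw21 = dJ_dn2_out * dg(n2_in) * n1_out
w21 = w21 - eta * dJ_dw21                 # gradient descent step
print(w21)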

(37)

Training

• Propagate to the previous layer

Cost function: J

∂J/∂n1(in) = ∂J/∂n2(out) · ∂n2(out)/∂n2(in) · ∂n2(in)/∂n1(out) · ∂n1(out)/∂n1(in)

(38)

Training Convolutional Layers

• Example: a convolutional layer with inputs a1, a2, a3, outputs b1, b2, and shared weights wb1, wb2.

To simplify the notation, in the following slides b1 means b1(in), a1 means a1(out), and so on.

(39)

Training Convolutional Layers

• Forward propagation

b1 = wb1·a1 + wb2·a2
b2 = wb1·a2 + wb2·a3

(40)

Training Convolutional Layers

• Update weights

Cost function: J

wb1 ← wb1 − η·(∂J/∂b1 · ∂b1/∂wb1 + ∂J/∂b2 · ∂b2/∂wb1)

(41)

Training Convolutional Layers

• Update weights

Cost function: J

Since b1 = wb1·a1 + wb2·a2 and b2 = wb1·a2 + wb2·a3:

∂b1/∂wb1 = a1,  ∂b2/∂wb1 = a2

⇒ wb1 ← wb1 − η·(∂J/∂b1 · a1 + ∂J/∂b2 · a2)

(42)

Training Convolutional Layers

• Update weights

Cost function: J

wb2 ← wb2 − η·(∂J/∂b1 · ∂b1/∂wb2 + ∂J/∂b2 · ∂b2/∂wb2)

(43)

Training Convolutional Layers

• Update weights

Cost function: J

Since b1 = wb1·a1 + wb2·a2 and b2 = wb1·a2 + wb2·a3:

∂b1/∂wb2 = a2,  ∂b2/∂wb2 = a3

⇒ wb2 ← wb2 − η·(∂J/∂b1 · a2 + ∂J/∂b2 · a3)
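A sketch of the shared-weight updates derived above. The key point is that each shared weight accumulates gradients from every position where it is used; the input values, upstream gradients, and learning rate are hypothetical:

import numpy as np

a = np.array([0.2, -0.5, 0.9])       # a1, a2, a3 (hypothetical)
wb1, wb2, eta = 0.3, -0.1, 0.05      # shared weights and learning rate (hypothetical)

b1 = wb1 * a[0] + wb2 * a[1]         # forward pass, as on slide (39)
b2 = wb1 * a[1] + wb2 * a[2]

# Suppose the gradients dJ/db1 and dJ/db2 have been propagated back from above:
dJ_db1, dJ_db2 = 0.4, -0.7           # hypothetical values

# Each shared weight sums its gradient over both positions where it is applied:
wb1 = wb1 - eta * (dJ_db1 * a[0] + dJ_db2 * a[1])
wb2 = wb2 - eta * (dJ_db1 * a[1] + dJ_db2 * a[2])
print(b1, b2, wb1, wb2)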

(44)

Training Convolutional Layers

• Propagate to the previous layer

Cost function: J

∂J/∂a1 = ∂J/∂b1 · ∂b1/∂a1
∂J/∂a2 = ∂J/∂b1 · ∂b1/∂a2 + ∂J/∂b2 · ∂b2/∂a2
∂J/∂a3 = ∂J/∂b2 · ∂b2/∂a3

(45)

Training Convolutional Layers

• Propagate to the previous layer

Cost function: J

Since b1 = wb1·a1 + wb2·a2 and b2 = wb1·a2 + wb2·a3:

∂b1/∂a1 = wb1,  ∂b1/∂a2 = wb2,  ∂b2/∂a2 = wb1,  ∂b2/∂a3 = wb2

⇒ ∂J/∂a1 = ∂J/∂b1 · wb1
  ∂J/∂a2 = ∂J/∂b1 · wb2 + ∂J/∂b2 · wb1
  ∂J/∂a3 = ∂J/∂b2 · wb2

(46)

Max-Pooling Layers during Training

• Pooling layers have no weights

• No need to update weights

b1 = max(a1, a2),  b2 = max(a2, a3)

For example, b2 = max(a2, a3) = a2 if a2 ≥ a3, a3 otherwise, so

∂b2/∂a2 = 1 if a2 ≥ a3, 0 otherwise

(47)

Max-Pooling Layers during Training

• Propagate to the previous layer

Cost function: J

Assuming a1 > a2 and a2 > a3:

∂b1/∂a1 = 1,  ∂b1/∂a2 = 0
∂b2/∂a2 = 1,  ∂b2/∂a3 = 0

⇒ ∂J/∂a1 = ∂J/∂b1,  ∂J/∂a2 = ∂J/∂b2,  ∂J/∂a3 = 0
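A sketch of this routing rule: during backpropagation a max-pooling unit passes the gradient only to the input that won the max (ties broken toward the smaller index, as on the next slide); the values are hypothetical and chosen so that a1 > a2 > a3:

import numpy as np

a = np.array([0.9, 0.7, 0.2])            # a1, a2, a3 (hypothetical, a1 > a2 > a3)
dJ_db = np.array([0.3, -0.6])            # dJ/db1, dJ/db2 from above (hypothetical)

dJ_da = np.zeros(3)
# b1 = max(a1, a2): route dJ/db1 to the winner (smaller index wins ties)
dJ_da[0 if a[0] >= a[1] else 1] += dJ_db[0]
# b2 = max(a2, a3): route dJ/db2 to the winner
dJ_da[1 if a[1] >= a[2] else 2] += dJ_db[1]
print(dJ_da)                             # [0.3, -0.6, 0.0]: only winning inputs get gradient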

(48)

Max-Pooling Layers during Training

• What if a1 = a2? Choose the node with the smaller index.

For example, when a1 = a2 = a3, the gradient ∂J/∂b1 is routed to a1 and ∂J/∂b2 is routed to a2.

(49)

Avg-Pooling Layers during Training

• Pooling layers have no weights

• No need to update weights

b1 = (a1 + a2)/2,  b2 = (a2 + a3)/2

∂b2/∂a2 = 1/2,  ∂b2/∂a3 = 1/2

(50)

Avg-Pooling Layers during Training

• Propagate to the previous layer

Cost function: J

∂b1/∂a1 = ∂b1/∂a2 = 1/2,  ∂b2/∂a2 = ∂b2/∂a3 = 1/2

⇒ ∂J/∂a1 = (1/2)·∂J/∂b1
  ∂J/∂a2 = (1/2)·(∂J/∂b1 + ∂J/∂b2)
  ∂J/∂a3 = (1/2)·∂J/∂b2

(51)

ReLU during Training

n_out = n_in  if n_in > 0
n_out = 0     otherwise

∂n_out/∂n_in = 1 if n_in > 0, 0 otherwise

(52)

Training CNN

(53)

Outline

• CNN (Convolutional Neural Networks) Introduction

• Evolution of CNN

• Visualizing the Features

• CNN as Artist

• Sentiment Analysis by CNN

(54)

LeNet

Paper:

http://vision.stanford.edu/cs598_spring07/papers/Lecun98.pdf

Yann LeCun http://yann.lecun.com/exdb/lenet/

(55)

ImageNet Challenge

• ImageNet Large Scale Visual Recognition Challenge

http://image-net.org/challenges/LSVRC/

• Dataset :

1000 categories

Training: 1,200,000

Validation: 50,000

Testing: 100,000

http://vision.stanford.edu/Datasets/collage_s.png

(56)

ImageNet Challenge

http://www.qingpingshan.com/uploads/allimg/160818/1J22QI5-0.png

(57)

AlexNet (2012)

• Paper:

http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf

• The resurgence of Deep Learning

Geoffrey Hinton Alex Krizhevsky

(58)

VGGNet (2014)

• Paper: https://arxiv.org/abs/1409.1556

D: VGG16 E: VGG19

All filters are 3x3

(59)

VGGNet

• More layers with smaller (3×3) filters works better
• More non-linearity, fewer parameters

One 5×5 filter:
• Parameters: 5×5 = 25
• Non-linearities: 1

Two stacked 3×3 filters:
• Parameters: 3×3×2 = 18
• Non-linearities: 2
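The counts above assume a single input and output channel; a quick check (illustrative only):

# One 5x5 filter vs. two stacked 3x3 filters (single channel, bias ignored)
params_5x5 = 5 * 5              # 25 weights, 1 non-linearity
params_3x3_twice = 3 * 3 * 2    # 18 weights, 2 non-linearities
print(params_5x5, params_3x3_twice)   # 25 18

# Both cover the same receptive field: stacking two 3x3 convs sees a 5x5 input region.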

(60)

VGG 19

conv1_1, conv1_2 (3×3 conv, depth = 64) → maxpool
conv2_1, conv2_2 (3×3 conv, depth = 128) → maxpool
conv3_1, conv3_2, conv3_3, conv3_4 (3×3 conv, depth = 256) → maxpool
conv4_1, conv4_2, conv4_3, conv4_4 (3×3 conv, depth = 512) → maxpool
conv5_1, conv5_2, conv5_3, conv5_4 (3×3 conv, depth = 512) → maxpool
FC1, FC2 (size = 4096) → size = 1000 → softmax

(61)

GoogLeNet (2014)

• Paper:

http://www.cs.unc.edu/~wliu/papers/GoogLeNet.pdf

A 22-layer deep network built from Inception modules.

(62)

Inception Module

• Best size?

3x3? 5x5?

• Use them all, and combine

(63)

Inception Module

Previous layer → 1×1 convolution, 3×3 convolution, 5×5 convolution, and 3×3 max-pooling in parallel → filter concatenation

(64)

Inception Module with Dimension Reduction

• Use 1x1 filters to reduce dimension

(65)

Inception Module with Dimension Reduction

A 1×1 convolution (weights 1×1×256×128) reduces the dimension at each position: input size 1×1×256 (256 channels) → output size 1×1×128 (128 channels).
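A quick illustration (not from the slide) of why the 1×1 reduction saves parameters inside an Inception module; the 64-channel bottleneck size is a hypothetical choice:

# 5x5 convolution applied directly on a 256-channel input, producing 128 channels:
direct = 5 * 5 * 256 * 128                     # 819,200 weights

# Same output, but first reduce 256 -> 64 channels with a 1x1 convolution:
reduced = 1 * 1 * 256 * 64 + 5 * 5 * 64 * 128  # 16,384 + 204,800 = 221,184 weights
print(direct, reduced)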

(66)

ResNet (2015)

• Paper: https://arxiv.org/abs/1512.03385

• Residual Networks

• 152 layers

(67)

ResNet

• Residual learning: a building block adds its input to the output of a small stack of layers (the residual function), so the block learns F(x) + x.

(68)

Residual Learning with Dimension Reduction

• using 1x1 filters

(69)

Pretrained Model Download

• http://www.vlfeat.org/matconvnet/pretrained/

AlexNet: http://www.vlfeat.org/matconvnet/models/imagenet-matconvnet-alex.mat
VGG19: http://www.vlfeat.org/matconvnet/models/imagenet-vgg-verydeep-19.mat
GoogLeNet: http://www.vlfeat.org/matconvnet/models/imagenet-googlenet-dag.mat
ResNet: http://www.vlfeat.org/matconvnet/models/imagenet-resnet-152-dag.mat

(70)

Using Pretrained Model

• Lower layers: edge, blob, texture (more general)
• Higher layers: object parts (more specific)

http://vision03.csail.mit.edu/cnn_art/data/single_layer.png

(71)

Transfer Learning

• The pretrained model is trained on the ImageNet dataset
• If your data is similar to the ImageNet data:
Fix all CNN layers and train only the FC layer on your labeled data.

(72)

Transfer Learning

• The pretrained model is trained on the ImageNet dataset
• If your data is very different from the ImageNet data:
Fix the lower CNN layers and train the higher CNN layers and the FC layer on your labeled data.

(73)

Tensorflow Transfer Learning Example

• https://www.tensorflow.org/versions/r0.11/how_tos/style_guide.html

daisy: 634 photos
dandelion: 899 photos
roses: 642 photos
tulips: 800 photos
sunflowers: 700 photos

http://download.tensorflow.org/example_images/flower_photos.tgz

(74)

Tensorflow Transfer Learning Example

Fix the pretrained layers; train only the final layer.
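A minimal tf.keras sketch of the recipe above, using VGG19 as the fixed backbone (any ImageNet-pretrained network would do). The 5-class flower setup and layer sizes are assumptions for illustration, not the exact code of the linked example:

import tensorflow as tf

# Pretrained CNN layers, without the original ImageNet classifier on top
base = tf.keras.applications.VGG19(weights="imagenet", include_top=False, pooling="avg")
base.trainable = False          # "fix all CNN layers"

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(256, activation="relu"),   # new FC layer to train (assumed size)
    tf.keras.layers.Dense(5, activation="softmax"),  # 5 flower classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(your_flower_dataset, epochs=...)   # only the new layers get updated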

(75)

Outline

• CNN (Convolutional Neural Networks) Introduction

• Evolution of CNN

• Visualizing the Features

• CNN as Artist

• Sentiment Analysis by CNN

(76)

Visualizing CNN

http://vision03.csail.mit.edu/cnn_art/data/single_layer.png

(77)

Visualizing CNN

Feed a flower image and random noise through the CNN and compare the resulting filter responses.

(78)

Gradient Ascent

• Magnify the filter response

Start from random noise x and compute the filter response f.

Score: F = Σ_{i,j} f_{i,j}
Gradient: ∂F/∂x (pointing from a lower score toward a higher score)

(79)

Gradient Ascent

• Magnify the filter response

Update x with learning rate η:

x ← x + η·∂F/∂x
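A minimal sketch of the update x ← x + η·∂F/∂x with TensorFlow's automatic differentiation; `feature_extractor` and the choice of layer/filter are placeholders for whatever network is being visualized:

import tensorflow as tf

def gradient_ascent_step(x, feature_extractor, eta=1.0):
    # x: image tensor (1, H, W, 3); feature_extractor: model returning one filter's response
    with tf.GradientTape() as tape:
        tape.watch(x)
        f = feature_extractor(x)          # filter response f
        score = tf.reduce_sum(f)          # F = sum over i, j of f[i, j]
    grad = tape.gradient(score, x)        # dF/dx
    return x + eta * grad                 # move x toward a higher score

# x = tf.random.uniform((1, 224, 224, 3))     # start from random noise
# for _ in range(100): x = gradient_ascent_step(x, feature_extractor)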

(80)

Gradient Ascent

(81)

Different Layers of Visualization

CNN

(82)

Multiscale Image Generation

Alternate between visualization and resizing: visualize → resize → visualize → resize → visualize.

(83)

Multiscale Image Generation

(84)

Deep Dream

• https://research.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html

• Source code:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/deepdream/deepdream.ipynb

http://download.tensorflow.org/example_images/flower_photos.tgz

(85)

Deep Dream

(86)

Deep Dream

(87)

Outline

• CNN (Convolutional Neural Networks) Introduction

• Evolution of CNN

• Visualizing the Features

• CNN as Artist

• Sentiment Analysis by CNN

(88)

Neural Art

• Paper: https://arxiv.org/abs/1508.06576

• Source code : https://github.com/ckmarkoh/neuralart_tensorflow

content + style → artwork

http://www.taipei-101.com.tw/upload/news/201502/2015021711505431705145.JPG
https://github.com/andersbll/neural_artistic_style/blob/master/images/starry_night.jpg?raw=true

(89)

The Mechanism of Painting

A human artist's brain turns a scene and a style into an artwork; here, a computer with neural networks plays the same roles.

(90)

Misconception

(91)

Content Generation

The artist looks at the content, draws on the canvas, and minimizes the difference between the neural stimulation produced by the content and by the canvas.

(92)

Content Generation

Feed both the content image and the canvas through VGG19, minimize the difference between their filter responses (a Width×Height by Depth array per layer), and update the colors of the canvas pixels to obtain the result.

(93)

Content Generation

Layer l's filter responses are computed for both the input photo and the input canvas; each response is indexed by depth (i) and position (width×height, j).

(94)

Content Generation

• Backward propagation: propagate the difference between the canvas's and the photo's layer-l filter responses back through VGG19, and update the canvas pixels with a learning rate.
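A sketch of the content objective described above: the squared difference between the canvas's and the photo's filter responses at layer l, whose gradient drives the pixel updates. The arrays are NumPy placeholders, not the original implementation:

import numpy as np

# Filter responses at layer l, shape (depth, width*height); hypothetical values
P = np.random.randn(64, 196)   # responses of the content photo
F = np.random.randn(64, 196)   # responses of the current canvas

content_loss = 0.5 * np.sum((F - P) ** 2)
dLoss_dF = F - P               # gradient w.r.t. the canvas responses,
                               # then backpropagated through VGG19 to the canvas pixels
print(content_loss, dLoss_dF.shape)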

(95)

Content Generation

(96)

Content Generation

VGG19

conv1_2 conv2_2 conv3_4 conv4_4 conv5_1 conv5_2

(97)

Style Generation

Feed the artwork through VGG19 and convert each layer's filter responses (Width×Height by Depth, position-dependent) into a Gram matrix G (Depth by Depth, position-independent).

(98)

Style Generation

The Gram matrix G of layer l's filter responses (Depth by Width×Height) is Depth by Depth: entry G(k1, k2) is the sum over all positions of the product of filter response k1 and filter response k2.
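A sketch of the Gram matrix computation: with the filter responses stored as a (depth, width*height) matrix F, the position-independent statistics are G = F·Fᵀ (illustrative shapes only):

import numpy as np

F = np.random.randn(64, 196)   # layer l's filter responses: depth x (width*height)
G = F @ F.T                    # Gram matrix: depth x depth
# G[k1, k2] = sum over positions j of F[k1, j] * F[k2, j]
print(G.shape)                 # (64, 64)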

(99)

Style Generation

Compute layer l's Gram matrix for both the input artwork and the input canvas from their filter responses.

(100)

Style Generation

Feed both the style image and the canvas through VGG19, compute the Gram matrices of their filter responses, minimize the difference between the two Gram matrices, and update the colors of the canvas pixels to obtain the result.

(101)

Style Generation

(102)

Style Generation

VGG19

Style reconstructions using increasing sets of layers: {conv1_1}, {conv1_1, conv2_1}, {conv1_1, conv2_1, conv3_1}, {conv1_1, conv2_1, conv3_1, conv4_1}, {conv1_1, conv2_1, conv3_1, conv4_1, conv5_1}.

(103)

Artwork Generation

Match both objectives at once: the content image's filter responses and the style image's Gram matrices, all computed with VGG19.

(104)

Artwork Generation

Style layers: conv1_1, conv2_1, conv3_1, conv4_1, conv5_1; content layer: conv4_2.

(105)

Artwork Generation

(106)

Content vs. Style

Results for different content-to-style weightings: 0.15, 0.05, 0.02, 0.007.

(107)

Neural Doodle

• Paper: https://arxiv.org/abs/1603.01768

• Source code: https://github.com/alexjc/neural-doodle

Inputs: a style image, a content image, and semantic maps; output: the result.

(108)

Neural Doodle

• Image analogy

(109)

Neural Doodle

• Image analogy

Scary link, open with caution!

https://raw.githubusercontent.com/awentzonline/image-analogies/master/examples/images/trump-image-analogy.jpg

(110)

Real-time Texture Synthesis

• Paper: https://arxiv.org/pdf/1604.04382v1.pdf

GAN: https://arxiv.org/pdf/1406.2661v1.pdf

VAE: https://arxiv.org/pdf/1312.6114v10.pdf

• Source Code : https://github.com/chuanli11/MGANs

(111)

Outline

• CNN (Convolutional Neural Networks) Introduction

• Evolution of CNN

• Visualizing the Features

• CNN as Artist

• Sentiment Analysis by CNN

(112)

A Convolutional Neural Network for Modelling Sentences

• Paper: https://arxiv.org/abs/1404.2188

• Source code:

https://github.com/FredericGodin/DynamicCNN

(113)

Drawbacks of Recursive Neural Networks (RvNN)

• A human-labeled syntax tree is needed during training.

The words of "This is a dog" are converted to word vectors, and the RvNN combines them pairwise following the given syntax tree.

(114)

Drawbacks of Recursive Neural Networks (RvNN)

• Ambiguity in natural language

http://3rd.mafengwo.cn/travels/info_weibo.php?id=2861280
http://www.appledaily.com.tw/realtimenews/article/new/20151006/705309/

(115)

Element-wise 1D operations on word vectors

• A 1D convolution or 1D pooling operation is applied element-wise across the word vectors of "This is a"; in the following figures, each such operation is represented by a single node.

(116)

From RvNN to CNN

• RvNN: the same RvNN unit is applied at every node of the syntax tree of "This is a dog".
• CNN: different convolutional layers (conv1, conv2, conv3) are stacked over the word sequence.

(117)

CNN with Max-Pooling Layers

• Similar to a syntax tree
• But a human-labeled syntax tree is not needed

Stacking conv1 → max-pooling (pool1) → conv2 over "This is a dog" builds a tree-like structure automatically.

(118)

Sentiment Analysis by CNN

• Use a softmax layer to classify the sentiment.

"This movie is awesome" → conv1 → pool1 → conv2 → softmax → positive
"This movie is awful" → conv1 → pool1 → conv2 → softmax → negative

(119)

Sentiment Analysis by CNN

• Build the “correct syntax tree” by training.

If "This movie is awesome" is classified as negative, the error is backward propagated from the softmax layer through the convolutional and pooling layers.

(120)

Sentiment Analysis by CNN

• Build the “correct syntax tree” by training.

The weights are updated so that "This movie is awesome" is classified as positive instead of negative.

(121)

Multiple Filters

• Richer features than RNN: each layer applies multiple filters over "This is a" (filter11, filter12, filter13 at the first layer; filter21, filter22, filter23 at the second).

(122)

Sentences can’t be easily resized

• An image can be easily resized; a sentence cannot.

Example (Chinese): 全台灣最高樓在台北 ("The tallest building in all of Taiwan is in Taipei"). "Resizing" it changes the wording, e.g. 全台灣最高的高樓在台北市, 全台灣最高樓在台北市, or 台灣最高樓在台北.

(123)

Various Input Size

• Convolutional layers and pooling layers can handle inputs of various sizes.

"This is a dog" (four words) and "the dog run" (three words) both pass through conv1 and pool1.

(124)

Various Input Size

• The fully-connected layer and the softmax layer need fixed-size input.

"The dog run" and "This is a dog" must be reduced to the same size before the fc and softmax layers.

(125)

k-max Pooling

• Choose the k largest values
• Preserve the order of the input values
• Variable-size input, fixed-size output

3-max pooling: 13, 4, 1, 7, 8 → 13, 7, 8
3-max pooling: 12, 5, 21, 15, 7, 4, 9 → 12, 21, 15
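A small sketch of k-max pooling as described above: keep the k largest values, but in their original order (reproduces the two examples):

import numpy as np

def k_max_pooling(x, k):
    # Indices of the k largest values, re-sorted so the original order is preserved
    idx = np.sort(np.argpartition(x, -k)[-k:])
    return x[idx]

print(k_max_pooling(np.array([13, 4, 1, 7, 8]), 3))          # [13  7  8]
print(k_max_pooling(np.array([12, 5, 21, 15, 7, 4, 9]), 3))  # [12 21 15]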

(126)

Wide Convolution

• Ensures that all weights reach the entire sentence

Wide convolution (padded, so every weight reaches every word of the sentence) vs. narrow convolution.

(127)

Dynamic k-max Pooling

Wide convolution & k-max pooling at every layer, with k chosen dynamically:

kl = max(ktop, ⌈((L − l) / L)·s⌉)

l: index of the current layer
kl: k of the current layer
ktop: k of the top layer (a constant)
L: total number of layers (a constant)
s: length of the input sentence
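A direct translation of the formula above (sketch; it matches the worked examples on the next slides):

import math

def dynamic_k(l, L, k_top, s):
    # k_l = max(k_top, ceil((L - l) / L * s))
    return max(k_top, math.ceil((L - l) / L * s))

print(dynamic_k(l=1, L=2, k_top=3, s=10))  # 5
print(dynamic_k(l=1, L=2, k_top=3, s=14))  # 7
print(dynamic_k(l=1, L=2, k_top=3, s=8))   # 4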

(128)

Dynamic k-max Pooling

s = 10, L = 2, ktop = 3

k1 = max(3, ⌈((2 − 1)/2)·10⌉) = 5

(129)

Dynamic k-max Pooling

s = 14, L = 2, ktop = 3

k1 = max(3, ⌈((2 − 1)/2)·14⌉) = 7

(130)

Dynamic k-max Pooling

s = 8, L = 2, ktop = 3

k1 = max(3, ⌈((2 − 1)/2)·8⌉) = 4

(131)

Dynamic k-max Pooling

Wide convolution &

Dynamic k-max pooling

(132)

Convolutional Neural Networks for Sentence Classification

• Paper: http://www.aclweb.org/anthology/D14-1181

• Source code:

https://github.com/yoonkim/CNN_sentence

(133)

Static & Non-Static Channel

• Pretrained by word2vec

• Static: fix the values during training

• Non-Static: update the values during training

(134)

About the Lecturer

Mark Chang

• Email: ckmarkoh at gmail dot com

• Blog: http://cpmarkchang.logdown.com

• Github: https://github.com/ckmarkoh

• Slideshare: http://www.slideshare.net/ckmarkohchang

• Youtube: https://www.youtube.com/channel/UCckNPGDL21aznRhl3EijRQw

HTC Research & Healthcare, Deep Learning Algorithms Research Engineer
