Special Networks


(1)

Special Networks

Hung-yi Lee

李宏毅

(2)

Announcement

• 11/13 (next Monday) 14:00 ~ 17:00: visit to Microsoft Taiwan

• Address: 19F, No. 68, Sec. 5, Zhongxiao E. Rd., Xinyi District, Taipei (Exit 3 of MRT Taipei City Hall Station)

• 14:00: meet at Exit 3 of MRT Taipei City Hall Station

• Sign-up form:

• https://docs.google.com/forms/d/e/1FAIpQLSfs2zloGanjWjJvVkJu8DUe9BlVZ5ugLPIs3FUmMbR9VkF8Fw/viewform?fbzx=-8767653761190698000

(3)

Outline

• Convolutional Neural Network (Review)

• Spatial Transformer

• Highway Network & Grid LSTM

• Pointer Network

• External Memory

(4)

Convolutional Layer — Sparse Connectivity

Each neuron connects only to part of the output of the previous layer. The inputs a neuron sees form its receptive field. Different neurons have different, but overlapping, receptive fields.

(5)

Convolutional Layer — Parameter Sharing

Each neuron connects only to part of the output of the previous layer (sparse connectivity). Neurons with different receptive fields can use the same set of parameters, so a convolutional layer has fewer parameters than a fully connected layer.

(6)

Convolutional Layer — Filters and Stride

Neurons that share one set of parameters form a filter (kernel): here neurons 1 and 3 form "filter 1" (kernel 1), and neurons 2 and 4 form "filter 2" (kernel 2). The filter (kernel) size is the size of the receptive field of a neuron; in this example the stride (the shift between neighboring receptive fields) is 2. Kernel size, number of filters, and stride are all designed by the developers.
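The sparse-connectivity, parameter-sharing, and stride ideas above can be sketched as a single-filter 1D convolution (a minimal NumPy illustration; the signal and filter values are made up):

```python
import numpy as np

def conv1d(x, w, b, stride):
    """Single-filter 1D convolution: every output neuron applies the
    same weights w (parameter sharing) to a window of len(w) inputs
    (sparse connectivity), shifted by `stride` between neurons."""
    k = len(w)
    n_out = (len(x) - k) // stride + 1
    return np.array([x[i * stride : i * stride + k] @ w + b
                     for i in range(n_out)])

x = np.array([1., 2., 3., 4., 5.])    # input signal x1..x5
w = np.array([1., 0., -1.])           # one 3-tap filter (kernel size 3)
print(conv1d(x, w, b=0.0, stride=2))  # → [-2. -2.]
```

With stride 2 the two output neurons see receptive fields {x1..x3} and {x3..x5}, overlapping at x3 exactly as in the slide's figure.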

(7)

Example — 1D Signal + Single Channel

Neurons 1–4 slide along the inputs 𝑥1, 𝑥2, 𝑥3, 𝑥4, 𝑥5.

Signals: audio signal, stock value, … Tasks: classification, predicting the future, …

(8)

Example — 1D Signal + Multiple Channel

A document: each word is a vector, e.g. "I like this movie very much" gives 𝑥1, 𝑥2, …, 𝑥7, and neurons 1–4 each cover a window of word vectors.

Does this kind of receptive field make sense?

(9)

Example — 2D Signal + Single Channel

6 × 6 black-and-white image:

1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0

Only one filter is shown here. The size of the receptive field is 3×3 and the stride is 1, so neurons 1, 2, 3, …, 16 scan the image.

(10)

Example — 2D Signal + Multiple Channel

6 × 6 colorful image: each pixel is a vector of 3 channel values, so the same 6 × 6 grid of values repeats once per channel. Only one filter is shown here. The size of the receptive field is 3×3×3 (it spans all channels) and the stride is 1.

(11)

Padding

Source of images:

https://github.com/vdumoulin/conv_arithmetic

Zero Padding, Reflection Padding

(12)

Pooling Layer

Layer $l-1$ has $N$ nodes; layer $l$ has $N / k$ nodes. Every $k$ outputs in layer $l-1$ are grouped together, and each output in layer $l$ "summarizes" its $k$ inputs:

Average pooling: $a^{l} = \frac{1}{k}\sum_{j=1}^{k} a_j^{l-1}$

Max pooling: $a^{l} = \max\left(a_1^{l-1}, a_2^{l-1}, \ldots, a_k^{l-1}\right)$

L2 pooling: $a^{l} = \sqrt{\frac{1}{k}\sum_{j=1}^{k} \left(a_j^{l-1}\right)^2}$
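The three pooling operations above can be written directly in NumPy (a minimal sketch; `pool` and its arguments are illustrative names):

```python
import numpy as np

def pool(a, k, mode="max"):
    """Group every k consecutive outputs of layer l-1 and summarize
    each group into one output of layer l."""
    groups = a.reshape(-1, k)                  # N/k groups of k inputs
    if mode == "max":
        return groups.max(axis=1)
    if mode == "average":
        return groups.mean(axis=1)
    if mode == "l2":
        return np.sqrt((groups ** 2).mean(axis=1))
    raise ValueError(mode)

a = np.array([1., 3., 2., 4.])
print(pool(a, 2, "max"))      # → [3. 4.]
print(pool(a, 2, "average"))  # → [2. 3.]
```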

(13)

Pooling Layer

Which outputs should be grouped together?

Group the neurons corresponding to the same filter with nearby receptive fields. Pooling over a convolutional layer in this way performs subsampling.

(14)

Pooling Layer

Which outputs should be grouped together?

Alternatively, group the neurons with the same receptive field (but different filters). Max pooling over such groups gives a Maxout Network. How can you know whether the grouped neurons detect the same pattern?

(15)

Auto-encoder for CNN

Encoder: Convolution → Pooling → Convolution → Pooling → code.

Decoder: Unpooling → Deconvolution → Unpooling → Deconvolution.

The reconstruction should be as close as possible to the input.

(16)

Unpooling

Unpooling a 14 × 14 map to 28 × 28: place each value back at the position recorded by the corresponding max-pooling layer and fill the remaining positions with zeros.

Source of image: https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/image_segmentation.html

Alternative: simply repeat the values
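The "simply repeat the values" alternative mentioned above is easy to sketch (hypothetical helper name `unpool_repeat`):

```python
import numpy as np

def unpool_repeat(a, k=2):
    """Unpooling by repetition: each value in the small map fills a
    k x k block in the enlarged map (the 'simply repeat' alternative)."""
    return np.repeat(np.repeat(a, k, axis=0), k, axis=1)

a = np.array([[1., 2.],
              [3., 4.]])
print(unpool_repeat(a))  # 2x2 → 4x4; each value fills a 2x2 block
```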

(17)

Deconvolution

Actually, deconvolution is convolution: pad the input with zeros, then convolve; where the contributions of neighboring input values overlap, they are added.

(18)

Tara N. Sainath, Ron J. Weiss, Andrew Senior, Kevin W. Wilson, Oriol Vinyals, “Learning the Speech Front-end With Raw Waveform CLDNNs,” INTERSPEECH 2015

Combination of Different Structures

(19)

Combination of Different Structures

(20)

Combination of Different Structures — 3 layers

(21)

CNN for Sequence-to-sequence

https://arxiv.org/abs/1705.03122

(22)

CNN for Sequence-to-sequence

• Encoder: an RNN encoder reads the input tokens 機器學習 ("machine learning") sequentially to produce the hidden vectors; a CNN encoder produces hidden vectors for the same input with stacked convolutions instead.

(23)

CNN for Sequence-to-sequence

• Decoder: WaveNet (a CNN with dilated causal convolutions)

(24)

Outline

• Convolutional Neural Network (Review)

• Spatial Transformer

• Highway Network & Grid LSTM

• Pointer Network

• External Memory

(25)

Spatial Transformer Layer

• CNN is not invariant to scaling and rotation: a CNN that recognizes an image (e.g., the digits 5 and 6 above) may fail on a scaled or rotated version.

The spatial transformer is itself an NN layer, learned end-to-end, and it can also transform feature maps, not only the input image.

(26)

Spatial Transformer Layer

• How to transform an image/feature map?

A spatial transformer layer maps layer $l-1$ activations $a_{ij}^{l-1}$ ($i, j = 1..3$) to layer $l$ activations $a_{nm}^{l}$, e.g. a translation.

General layer: $a_{nm}^{l} = \sum_{i=1}^{3}\sum_{j=1}^{3} w_{nm,ij}^{l}\, a_{ij}^{l-1}$

If we want the translation above: $a_{nm}^{l} = a_{(n-1)m}^{l-1}$, i.e.

$w_{nm,ij}^{l} = 1$ if $i = n-1,\ j = m$; $\quad w_{nm,ij}^{l} = 0$ otherwise
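The translation above, expressed as code (the helper name `translate_down_one` is mine; it hard-codes the weight pattern w = 1 at i = n−1, j = m):

```python
import numpy as np

def translate_down_one(a_prev):
    """Translation as a special case of the general layer:
    w^l_{nm,ij} = 1 iff i = n-1 and j = m (else 0), which simply
    copies row n-1 of layer l-1 into row n of layer l."""
    a = np.zeros_like(a_prev)
    a[1:, :] = a_prev[:-1, :]   # a^l_{nm} = a^{l-1}_{(n-1)m}
    return a

a_prev = np.arange(9, dtype=float).reshape(3, 3)
print(translate_down_one(a_prev))  # top row becomes zeros, rest shifts down
```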

(27)

Spatial Transformer Layer

• How to transform an image/feature map?

A small NN looks at the input and outputs values that control the connections between layer $l-1$ and layer $l$, i.e. which $a_{ij}^{l-1}$ each $a_{nm}^{l}$ is taken from.

(28)

Image Transformation

Expansion, Compression, Translation

Expansion: $\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \end{bmatrix}$

Compression (with translation): $\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 0.5 & 0 \\ 0 & 0.5 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}$

(29)

Image Transformation

• Rotation by $\theta°$: $\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \end{bmatrix}$

(image: https://home.gamer.com.tw/creationDetail.php?sn=792585)

(30)

Spatial Transformer Layer

$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} e \\ f \end{bmatrix}$

$(x, y)$: index of layer $l$; $(x', y')$: index of layer $l-1$. An NN outputs the 6 parameters $a, b, c, d, e, f$ that describe the affine transformation.

(31)

Spatial Transformer Layer

$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} e \\ f \end{bmatrix}$ — 6 parameters describe the affine transformation.

Example: when the NN outputs integer parameters, each index $(x, y)$ of layer $l$ maps to an integer index $(x', y')$ of layer $l-1$, and $a_{nm}^{l}$ is simply copied from that position.

(32)

Spatial Transformer Layer

$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} e \\ f \end{bmatrix}$ — 6 parameters describe the affine transformation.

With fractional parameters, an index of layer $l$ can map to a non-integer index of layer $l-1$, e.g. $(1.6, 2.4)$. What is the problem? If we round to the nearest index and copy that value, the gradient with respect to the 6 parameters is always zero (a small change in the parameters does not change which pixel is copied), so gradient descent cannot learn them.

(33)

Interpolation

$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} e \\ f \end{bmatrix}$ — 6 parameters describe the affine transformation.

Instead of rounding the fractional index (e.g. $(1.6, 2.4)$), bilinearly interpolate between the four neighboring layer $l-1$ values, weighting each neighbor by how close it is:

$a_{nm}^{l} = \sum_{(i,j)\ \text{neighbors}} \left(1 - |x' - i|\right)\left(1 - |y' - j|\right) a_{ij}^{l-1}$

Now the output changes smoothly with the parameters, and we can use gradient descent.
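The affine-then-interpolate computation can be sketched end to end (a toy NumPy version; `affine_bilinear_sample` and the 4×4 test image are illustrative, not the paper's implementation):

```python
import numpy as np

def affine_bilinear_sample(img, theta):
    """Minimal spatial-transformer sketch: map each output index
    through an affine transform (a, b, c, d, e, f) into the input,
    then bilinearly interpolate the four neighbors so the result is
    differentiable in the parameters."""
    H, W = img.shape
    out = np.zeros_like(img)
    A = np.array([[theta[0], theta[1]], [theta[2], theta[3]]])
    t = np.array([theta[4], theta[5]])
    for n in range(H):
        for m in range(W):
            x, y = A @ np.array([n, m]) + t     # fractional source index
            x0, y0 = int(np.floor(x)), int(np.floor(y))
            for i in (x0, x0 + 1):              # four neighboring pixels
                for j in (y0, y0 + 1):
                    if 0 <= i < H and 0 <= j < W:
                        w = (1 - abs(x - i)) * (1 - abs(y - j))
                        out[n, m] += w * img[i, j]
    return out

img = np.arange(16, dtype=float).reshape(4, 4)
identity = (1, 0, 0, 1, 0, 0)                   # identity transform
print(np.allclose(affine_bilinear_sample(img, identity), img))  # → True
```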

(34)
(35)
(36)

Single: one transformation layer; Multi: multiple transformation layers. Experiments on Street View House Numbers.

(37)

Bird Recognition


(38)

Outline

• Convolutional Neural Network (Review)

• Spatial Transformer

• Highway Network & Grid LSTM

• Pointer Network

• External Memory

(39)

Feedforward: x → f1 → a1 → f2 → a2 → f3 → a3 → f4 → y, where t indexes the layer:

$a^{t} = f^{t}\left(a^{t-1}\right) = \sigma\left(W^{t} a^{t-1} + b^{t}\right)$

Recurrent: h0 → f (x1) → h1 → f (x2) → h2 → f (x3) → h3 → f (x4) → y4, where t indexes the time step and the same f is applied at every step:

$h^{t} = f\left(h^{t-1}, x^{t}\right) = \sigma\left(W^{h} h^{t-1} + W^{i} x^{t} + b^{i}\right)$

Applying the gated structure in a feedforward network

Feedforward v.s. Recurrent

1. A feedforward network does not have an input at each step

2. A feedforward network has different parameters for each layer

(40)

GRU → Highway Network

Start from a GRU cell (reset gate r, update gate z, candidate h′, input xt, output yt) and simplify it into a feedforward highway block:

• No input xt at each step
• No output yt at each step
• No reset gate

Here at−1 is the output of the (t−1)-th layer and at is the output of the t-th layer.

(41)

Highway Network

• Residual Network (Deep Residual Learning for Image Recognition, http://arxiv.org/abs/1512.03385): the input $a^{t-1}$ is copied and added to the block output: $a^{t} = h' + a^{t-1}$.

• Highway Network (Training Very Deep Networks, https://arxiv.org/pdf/1507.06228v2.pdf): a gate controller decides how much of the input to copy:

$h' = \sigma\left(W a^{t-1}\right)$, $\quad z = \sigma\left(W' a^{t-1}\right)$

$a^{t} = z \odot a^{t-1} + (1 - z) \odot h'$
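A single highway layer following the equations above might look like this (a sketch; the weight shapes and the sigmoid nonlinearity for h′ follow the slide's formulas):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def highway_layer(a_prev, W, Wp):
    """One highway layer: candidate h' and gate z are both computed
    from the previous layer's output; z decides how much of a_prev
    is copied through versus replaced by h'."""
    h = sigmoid(W @ a_prev)   # candidate h' = sigma(W a^{t-1})
    z = sigmoid(Wp @ a_prev)  # gate      z  = sigma(W' a^{t-1})
    return z * a_prev + (1 - z) * h

rng = np.random.default_rng(0)
a = rng.standard_normal(4)
out = highway_layer(a, rng.standard_normal((4, 4)), rng.standard_normal((4, 4)))
print(out.shape)  # → (4,)
```

When z is close to 1 everywhere the layer simply copies its input, which is what lets the network "skip" layers it does not need.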

(42)

Because the gates can copy the input straight through some layers, the Highway Network automatically determines the number of layers needed!

(43)

Highway Network

(44)

Grid LSTM

A standard LSTM takes input x and memory c and produces output y and the updated memory (through h). A Grid LSTM has memory for both time and depth: along time it takes (c, h) and emits the updated (c, h); along depth it takes (a, b) and emits (a′, b′).

(45)

Stacking Grid LSTM blocks over time and depth: the block at time t, layer l takes $(c^{t-1}, h^{t-1})$ from the previous time step and $(a^{l-1}, b^{l-1})$ from the layer below, and outputs $(c^{t}, h^{t})$ and $(a^{l}, b^{l})$; the block above it (Grid LSTM′) takes $(a^{l}, b^{l})$ and produces $(a^{l+1}, b^{l+1})$ together with its own time outputs.

(46)

Grid LSTM

Inside a block, the time inputs (c, h) and the depth inputs (a, b) are combined to compute the gates $z$, $z^{i}$, $z^{f}$, $z^{o}$. As in a standard LSTM, $c' = z^{f} \odot c + z^{i} \odot z$ and $h' = z^{o} \odot \tanh(c')$; the depth outputs (a′, b′) are produced in the same gated way.

(47)

3D Grid LSTM

A third dimension is added: besides time (c, h) → (c′, h′) and depth (a, b) → (a′, b′), the block also passes (e, f) → (e′, f′).

(48)

3D Grid LSTM

• Images are composed of pixels

3 x 3 images

(49)

Outline

• Convolutional Neural Network (Review)

• Spatial Transformer

• Highway Network & Grid LSTM

• Pointer Network

• External Memory

(50)

Pointer Network

Input: the coordinates $(x^1, y^1), (x^2, y^2), \ldots$ of a set of points (e.g., the coordinate of P1, …). An NN reads them and outputs a sequence of input indices, e.g. 4 2 7 6 5 3 (in the original demo, the points forming the convex hull).

(51)

The story of "硬train" (just train it anyway)

• Fizz Buzz in Tensorflow:

http://joelgrus.com/2016/05/23/fizz-buzz-in-tensorflow/

https://ochronus.com/fizzbuzz-in-css/

(52)

Sequence-to-sequence?

The encoder reads the input 機器學習 ("machine learning") and the decoder outputs "machine learning".

(53)

Sequence-to-sequence?

The encoder reads $(x^1, y^1), \ldots, (x^4, y^4)$ and the decoder outputs indices from the fixed set {1, 2, 3, 4, END}, e.g. 1 4 2. Of course, one can add attention. Problem? The output vocabulary is fixed in advance, but the number of input points varies from example to example.

(54)

Pointer Network

Add a special input $(x^0, y^0)$ meaning END. The decoder state $z^0$ acts as a key and computes an attention weight over all inputs $(x^0, y^0), (x^1, y^1), \ldots, (x^4, y^4)$ (e.g., weight 0.5 on point 1).

(55)

Pointer Network

The attention weights (e.g., 0.5, 0.0, 0.3, 0.2, 0.0 over points 1–4 and END) are treated as an output distribution; take the argmax (here point 1) as the output, and feed $(x^1, y^1)$ into the next decoder state $z^1$. What the decoder can output depends on the input.

(56)

Pointer Network

At $z^1$ the weights might be 0.0 (END), 0.0, 0.1, 0.2, 0.7; the argmax is point 4, so $(x^4, y^4)$ is the output and is fed into $z^2$, and so on. The process stops when END has the largest attention weight. What the decoder can output depends on the input.
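One decoding step of the mechanism above can be sketched as follows (the dot-product score is an assumption; the slides do not specify the attention scoring function):

```python
import numpy as np

def pointer_step(z, keys):
    """One pointer-network decoding step: score each input position
    (including an END slot at index 0) against the decoder state and
    output the argmax of the resulting attention distribution."""
    scores = keys @ z                  # dot-product attention (an assumption)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()           # softmax over input positions
    return int(np.argmax(weights)), weights

keys = np.array([[0., 1.],             # END slot
                 [1., 0.], [2., 0.], [0., 2.]])  # three input points
z = np.array([1., 0.])                 # decoder state
choice, w = pointer_step(z, keys)
print(choice)  # → 2  (the position with the largest score)
```

Decoding would repeat this step, feeding the chosen input back in, until the END slot wins the argmax.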

(57)

Applications - Summarization

https://arxiv.org/abs/1704.04368

(58)

More Applications

Machine Translation

Chat-bot

User: Hello X寶, I am 庫洛洛

Machine: Hello 庫洛洛, nice to meet you

(the machine points to the unknown name in the input and copies it)

(59)

Outline

• Convolutional Neural Network (Review)

• Spatial Transformer

• Highway Network & Grid LSTM

• Pointer Network

• External Memory

(60)

External Memory

A DNN/RNN (input → output) reads from the machine's memory through a reading head; a reading head controller decides where the reading head attends.

Ref: http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/Attain%20(v3).ecm.mp4/index.html

(61)

Reading Comprehension

Each sentence of the document becomes a vector (semantic analysis). Given a query, a DNN/RNN with a reading head controller attends over the sentence vectors and produces the answer.

(62)

Memory Network

Each document sentence becomes a vector $x^1, \ldots, x^N$. The query q is matched against each $x^n$ to get attention weights $\alpha_1, \ldots, \alpha_N$, and the extracted information is

$\text{Extracted Information} = \sum_{n=1}^{N} \alpha_n x^n$

which a DNN turns into the answer. The sentence-to-vector mapping can be jointly trained.

Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus, “End-To-End Memory Networks”, NIPS, 2015
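The match-then-extract computation above can be sketched as follows (illustrative names; dot-product matching with a softmax is an assumption about the score function):

```python
import numpy as np

def memory_read(q, X, H=None):
    """Memory-network read (a sketch): match the query q against
    memory vectors X to get attention weights, then extract a
    weighted sum of H (or of X itself when a single representation
    is used)."""
    if H is None:
        H = X
    scores = X @ q                     # match q against each x^n
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()               # attention weights alpha_n
    return alpha @ H                   # extracted information

X = np.array([[1., 0.], [0., 1.], [1., 1.]])  # sentence vectors x^1..x^3
q = np.array([2., 0.])                        # query vector
print(memory_read(q, X).shape)  # → (2,)
```

Passing a separate `H` corresponds to the two-representation variant on the next slide, where matching and extraction use different jointly learned vectors.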

(63)

Memory Network

A variant uses two jointly learned sentence representations: vectors $x^1, \ldots, x^N$ are matched against the query q to get the attention weights $\alpha_1, \ldots, \alpha_N$, while separate vectors $h^1, \ldots, h^N$ are used for extraction:

$\text{Extracted Information} = \sum_{n=1}^{N} \alpha_n h^n$

A DNN then produces the answer.

(64)

Memory Network — Hopping

The extracted information can be combined with q and the compute-attention / extract-information steps repeated several times (hops) before the DNN outputs the answer a.

(65)

Multiple-hop

• End-To-End Memory Networks. S. Sukhbaatar, A. Szlam, J. Weston, R. Fergus. NIPS, 2015.

The position of the reading head changes from hop to hop.

Keras has an example: https://github.com/fchollet/keras/blob/master/examples/babi_memnn.py

(66)

Visual Question Answering

source: http://visualqa.org/

(67)

Visual Question Answering

A CNN produces a vector for each image region; given the query, a DNN/RNN with a reading head controller attends over these region vectors and produces the answer.

(68)

Visual Question Answering

• Huijuan Xu, Kate Saenko. Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering. arXiv preprint, 2015

(69)

Visual Question Answering


(70)

External Memory v2 — Neural Turing Machine

In addition to the reading head and its controller, a Neural Turing Machine has a writing head and a writing head controller, so the DNN/RNN can also modify the machine's memory.

(71)

Neural Turing Machine

The memory consists of vectors $m_0^1, m_0^2, m_0^3, m_0^4$ with attention weights $\hat{\alpha}_0^1, \ldots, \hat{\alpha}_0^4$.

Retrieval process: $r^0 = \sum_i \hat{\alpha}_0^i\, m_0^i$

$r^0$ and the input $x^1$ are fed into f (initial state $h^0$), which produces $y^1$.

(72)

Neural Turing Machine

f also outputs a key $k^1$, an erase vector $e^1$, and an add vector $a^1$. The new attention is computed by cosine similarity with the key, followed by a softmax:

$\alpha_1^i = \cos\left(m_0^i, k^1\right)$, $\quad \left(\hat{\alpha}_1^1, \ldots, \hat{\alpha}_1^4\right) = \text{softmax}\left(\alpha_1^1, \ldots, \alpha_1^4\right)$

(73)

Neural Turing Machine

Writing (element-wise; each component of $e^1$ is between 0 and 1):

$m_1^i = m_0^i \odot \left(1 - \hat{\alpha}_1^i\, e^1\right) + \hat{\alpha}_1^i\, a^1$

$e^1$ erases part of the old memory $m_0^i$, and $a^1$ adds new content.
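The read / address / write equations above fit in a few lines of NumPy (a sketch with made-up memory contents):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def ntm_step(M, k, e, a):
    """One Neural Turing Machine memory step (a sketch): address by
    cosine similarity with key k, then read, erase, and add."""
    sims = np.array([m @ k / (np.linalg.norm(m) * np.linalg.norm(k))
                     for m in M])               # alpha^i = cos(m^i, k)
    alpha = softmax(sims)                       # attention over slots
    r = alpha @ M                               # read: r = sum_i alpha^i m^i
    M_new = (M * (1 - alpha[:, None] * e)       # erase old content
             + alpha[:, None] * a)              # add new content
    return r, M_new

M = np.array([[1., 0.], [0., 1.], [1., 1.]])    # 3 memory slots
r, M_new = ntm_step(M, k=np.array([1., 0.]),
                    e=np.array([1., 1.]),       # erase vector in [0, 1]
                    a=np.array([0.5, 0.5]))     # add vector
print(r.shape, M_new.shape)  # → (2,) (3, 2)
```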

(74)

Neural Turing Machine

The process repeats over time: with the updated memory $m_1^i$ and attention $\hat{\alpha}_1^i$, the next read $r^1$ is computed and fed with $x^2$ into f (state $h^1$) to produce $y^2$; then the attention $\hat{\alpha}_2^i$ and memory $m_2^i$ are updated, and so on.

(75)

Neural Turing Machine for LM

Wei-Jen Ko, Bo-Hsiang Tseng, Hung-yi Lee,

“Recurrent Neural Network based Language Modeling with Controllable External Memory”, ICASSP, 2017

(76)

Stack RNN

Armand Joulin, Tomas Mikolov, Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets, arXiv preprint, 2015

At each step, f reads xt and the top of the stack and outputs yt plus a distribution over three actions, e.g. Push 0.7, Pop 0.2, Nothing 0.1, together with the information to store. The stack is updated softly: the new stack is 0.7 × (push result) + 0.2 × (pop result) + 0.1 × (unchanged stack).
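The soft stack update described above can be sketched as follows (illustrative name `soft_stack_update`; in the real Stack RNN the network predicts the action probabilities and the stored value):

```python
import numpy as np

def soft_stack_update(stack, p_push, p_pop, p_noop, value):
    """Soft (differentiable) stack update: blend the three hard
    outcomes (push, pop, do nothing) with the predicted action
    probabilities instead of choosing one."""
    pushed = np.concatenate(([value], stack[:-1]))  # push: value on top
    popped = np.concatenate((stack[1:], [0.0]))     # pop: drop the top
    return p_push * pushed + p_pop * popped + p_noop * stack

stack = np.array([1.0, 2.0, 0.0])
new = soft_stack_update(stack, 0.7, 0.2, 0.1, value=5.0)
print(new)  # → [4.  0.9 1.4]
```

Because the update is a weighted average rather than a discrete choice, gradients flow through the action probabilities during training.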

(77)

Concluding Remarks

• Convolutional Neural Network (Review)

• Spatial Transformer

• Highway Network & Grid LSTM

• Pointer Network

• External Memory
