Deep Learning Autoencoder
Usefulness of Approximating Identity Function

if g(x) ≈ x using some hidden structures on the observed data x_n:

• for supervised learning:
  • hidden structure (essence) of x can be used as a reasonable transform Φ(x)
  —learning an ‘informative’ representation of data
• for unsupervised learning:
  • density estimation: larger (structure match) when g(x) ≈ x
  • outlier detection: those x where g(x) ≉ x
  —learning a ‘typical’ representation of data

autoencoder: representation-learning through approximating the identity function

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 10/24
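The outlier-detection use above can be illustrated concretely: once some g is learned, the reconstruction error ‖g(x) − x‖² acts as an outlier score. A minimal sketch, using a linear projection onto the data's top principal direction as a stand-in for a learned g (all names and numbers below are illustrative assumptions, not from the slides):

```python
import numpy as np

# Hypothetical setting: g reconstructs x by projecting onto the top-k
# principal directions of the observed data -- a linear analogue of an
# autoencoder, so g(x) ≈ x exactly when x follows the data's hidden structure.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1)) @ rng.normal(size=(1, 10))  # data on a 1-dim structure
X += 0.01 * rng.normal(size=X.shape)                      # small observation noise

k = 1
_, _, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt[:k].T                         # top-k right singular vectors of X

def g(x):
    return V @ (V.T @ x)             # reconstruct x through the k-dim bottleneck

def err(x):
    return np.sum((g(x) - x) ** 2)   # reconstruction error as outlier score

typical = X[0]                       # a point on the hidden structure
outlier = rng.normal(size=10)        # a point off the hidden structure

assert err(typical) < err(outlier)   # g(x) ≈ x for typical x, g(x) ≉ x for outliers
```

The same score also orders points for rough density estimation: points matching the hidden structure reconstruct well, so small error corresponds to "typical" regions of the data.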
Deep Learning Autoencoder
Basic Autoencoder

basic autoencoder: d–d̃–d NNet with error function Σ_{i=1}^{d} (g_i(x) − x_i)²

• backprop easily applies; shallow and easy to train
• usually d̃ < d: compressed representation
• data: {(x_1, y_1 = x_1), (x_2, y_2 = x_2), . . . , (x_N, y_N = x_N)}
  —often categorized as an unsupervised learning technique
• sometimes constrain w_ij^(1) = w_ji^(2) as regularization
  —more sophisticated in calculating the gradient

basic autoencoder in basic deep learning:
{w_ij^(1)} taken as shallowly pre-trained weights

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 11/24
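The slide can be sketched in code: a minimal d–d̃–d NNet trained by gradient descent on the squared reconstruction error. The tanh hidden layer, synthetic data, learning rate, and step count are illustrative assumptions, not fixed by the slide:

```python
import numpy as np

# Minimal d--d~--d autoencoder trained with backprop on the squared
# reconstruction error sum_i (g_i(x) - x_i)^2 over the data.
rng = np.random.default_rng(1)
d, d_tilde = 8, 3                          # d~ < d: compressed representation
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, d))  # hidden 2-dim structure

W1 = 0.1 * rng.normal(size=(d, d_tilde))   # encoder weights w_ij^(1)
W2 = 0.1 * rng.normal(size=(d_tilde, d))   # decoder weights w_ij^(2)

def forward(X):
    H = np.tanh(X @ W1)                    # hidden representation: the transform Phi(x)
    return H, H @ W2                       # linear output layer g(x)

def recon_err(X):
    _, G = forward(X)
    return np.mean(np.sum((G - X) ** 2, axis=1))

err_before = recon_err(X)
for _ in range(3000):
    H, G = forward(X)
    R = G - X                              # residual g(x) - x
    grad_W2 = 2 * H.T @ R / len(X)
    grad_W1 = 2 * X.T @ ((R @ W2.T) * (1 - H**2)) / len(X)  # backprop through tanh
    W1 -= 0.05 * grad_W1
    W2 -= 0.05 * grad_W2
err_after = recon_err(X)

assert err_after < err_before              # reconstruction improves with training
```

Note the weights here are untied; the tied-weight variant (w_ij^(1) = w_ji^(2)) would replace W2 by W1.T and sum the two gradient contributions, which is the "more sophisticated" gradient the slide mentions.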
Deep Learning Autoencoder
Pre-Training with Autoencoders

Deep Learning with Autoencoders
1. for ℓ = 1, . . . , L, pre-train {w_ij^(ℓ)} assuming w*^(1), . . . , w*^(ℓ−1) fixed,
   by training a basic autoencoder on {x_n^(ℓ−1)} with d̃ = d^(ℓ)
   [figure: layer-by-layer pre-training, panels (a)–(d)]
2. train with backprop on the pre-trained NNet to fine-tune all {w_ij^(ℓ)}

many successful pre-training techniques take ‘fancier’ autoencoders
with different architectures and regularization schemes

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 12/24
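Step 1 above can be sketched as a greedy layer-by-layer loop; `train_basic_autoencoder` and `pretrain` are hypothetical helper names, with the inner training following the basic autoencoder of the previous slide:

```python
import numpy as np

# Greedy layer-wise pre-training sketch: for each layer l, fit a basic
# d^(l-1)--d^(l)--d^(l-1) autoencoder on the current representations x^(l-1)
# and keep only its encoder weights as the pre-trained w^(l).
def train_basic_autoencoder(X, d_out, steps=1000, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    W1 = 0.1 * rng.normal(size=(X.shape[1], d_out))
    W2 = 0.1 * rng.normal(size=(d_out, X.shape[1]))
    for _ in range(steps):
        H = np.tanh(X @ W1)
        R = H @ W2 - X
        gW2 = 2 * H.T @ R / len(X)
        gW1 = 2 * X.T @ ((R @ W2.T) * (1 - H**2)) / len(X)
        W1 -= lr * gW1
        W2 -= lr * gW2
    return W1                         # only the encoder weights are kept

def pretrain(X, layer_dims):
    weights, rep = [], X              # rep holds x^(l-1), starting from x^(0) = x
    for l, d_l in enumerate(layer_dims):
        W = train_basic_autoencoder(rep, d_l, seed=l)
        weights.append(W)             # w*(l) now fixed
        rep = np.tanh(rep @ W)        # x^(l): input to the next layer's autoencoder
    return weights                    # shallowly pre-trained weights for fine-tuning

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 8))
Ws = pretrain(X, [5, 3])              # pre-train an 8-5-3-... NNet layer by layer
```

Step 2 (fine-tuning) would then initialize a full NNet with `Ws` and run ordinary backprop on the supervised labels, which this sketch omits.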
Deep Learning Autoencoder
Fun Time

Suppose training a d–d̃–d autoencoder with backprop takes approximately
c · d · d̃ seconds. Then, what is the total number of seconds needed for
pre-training a d–d^(1)–d^(2)–d^(3)–1 deep NNet?
1. c (d + d^(1) + d^(2) + d^(3) + 1)
2. c (d · d^(1) · d^(2) · d^(3) · 1)
3. c (d d^(1) + d^(1) d^(2) + d^(2) d^(3) + d^(3))
4. c (d d^(1) · d^(1) d^(2) · d^(2) d^(3) · d^(3))

Reference Answer: 3

Each c · d^(ℓ−1) · d^(ℓ) represents the time for pre-training with one
autoencoder to determine one layer of the weights.
Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 13/24
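The reference answer can be checked by direct arithmetic: pre-training layer ℓ trains one d^(ℓ−1)–d^(ℓ)–d^(ℓ−1) autoencoder at cost c · d^(ℓ−1) · d^(ℓ), and the costs add. The constant c and layer sizes below are made-up illustrative numbers:

```python
# Pre-training a d-d1-d2-d3-1 NNet runs one basic autoencoder per layer,
# each costing c * d^(l-1) * d^(l) seconds, so the total is the sum in choice 3.
c, dims = 2.0, [100, 50, 20, 10, 1]          # illustrative: d, d1, d2, d3, 1
total = sum(c * a * b for a, b in zip(dims, dims[1:]))
choice3 = c * (dims[0]*dims[1] + dims[1]*dims[2] + dims[2]*dims[3] + dims[3]*1)
assert total == choice3                      # matches the reference answer
```

Note the last term is d^(3) · 1 = d^(3), which is why choice 3's final summand has no visible partner dimension.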
Deep Learning Denoising Autoencoder
Regularization in Deep Learning

[figure: a deep NNet with inputs x_0 = 1, x_1, x_2, . . . , x_d, bias units +1,
tanh hidden layers, and weights w_ij^(1), w_jk^(2), w_kq^(3)]