Deep Learning Autoencoder

### Usefulness of Approximating Identity Function

if **g(x) ≈ x** using some **hidden** structures on the **observed data x**_{n}:

• for supervised learning: the hidden structure (essence) of **x** can be used as a reasonable transform Φ(x)
—learning an **'informative' representation** of data
• for unsupervised learning:
  • density estimation: estimated density larger (structure match) when **g(x) ≈ x**
  • outlier detection: those **x** where **g(x) ≉ x**
—learning a **'typical' representation** of data

**autoencoder:** **representation-learning through** approximating the identity function

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 10/24
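The outlier-detection use above can be sketched numerically. Below is a minimal illustration (not from the lecture): a *linear* g fitted via PCA stands in for an autoencoder, and the reconstruction error ‖g(x) − x‖ separates typical points from an outlier. The function names and data are made up for illustration.

```python
import numpy as np

def fit_linear_g(X, k):
    """Fit a rank-k linear 'autoencoder' g(x) = mean + V V^T (x - mean)
    via PCA: the top-k principal directions capture the hidden structure."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    V = Vt[:k].T                          # d x k basis for the structure
    return lambda x: mean + (x - mean) @ V @ V.T

# toy data: points near a 1-D line in 2-D, i.e. hidden structure y = 2x
rng = np.random.default_rng(0)
t = rng.normal(size=100)
X = np.c_[t, 2 * t] + 0.01 * rng.normal(size=(100, 2))

g = fit_linear_g(X, k=1)

x_typical = np.array([1.0, 2.0])          # on the line: g(x) ≈ x
x_outlier = np.array([1.0, -2.0])         # off the line: g(x) ≉ x

err_typical = np.linalg.norm(g(x_typical) - x_typical)
err_outlier = np.linalg.norm(g(x_outlier) - x_outlier)
print(err_typical, err_outlier)           # large reconstruction error flags the outlier
```

The same recipe works with a nonlinear g: only the reconstruction-error threshold changes.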


Deep Learning Autoencoder

### Basic Autoencoder

basic **autoencoder:** d–d̃–d NNet with error function Σ_{i=1}^{d} (g_{i}(x) − x_{i})^{2}

• backprop easily applies; **shallow** and easy to train
• usually d̃ < d: **compressed** representation
• data: {(x_{1}, y_{1} = x_{1}), (x_{2}, y_{2} = x_{2}), . . . , (x_{N}, y_{N} = x_{N})}
—often categorized as an **unsupervised learning technique**
• sometimes constrain w_{ij}^{(1)} = w_{ji}^{(2)} as **regularization**
—more **sophisticated** in calculating gradient

basic **autoencoder** in basic deep learning: {w_{ij}^{(1)}} taken as **shallowly pre-trained weights**

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 11/24
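A minimal sketch of this basic autoencoder, including the tied-weight constraint w_{ij}^{(1)} = w_{ji}^{(2)}: with tying, the single matrix W appears in both layers, so its gradient is the sum of an encoder term and a decoder term (the "more sophisticated" gradient). The data, sizes, and hyperparameters are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d, d_tilde = 200, 8, 3                # d̃ < d: compressed representation

# data with hidden low-dimensional structure: x_n = A z_n + noise
A = rng.normal(size=(d, 2))
X = rng.normal(size=(N, 2)) @ A.T + 0.05 * rng.normal(size=(N, d))

# tied weights: W (d x d̃) serves as both encoder (W) and decoder (W^T)
W = 0.1 * rng.normal(size=(d, d_tilde))

def forward(X, W):
    H = np.tanh(X @ W)                   # hidden representation, (N, d̃)
    G = H @ W.T                          # reconstruction g(x), linear output
    return H, G

eta = 0.01
for _ in range(500):
    H, G = forward(X, W)
    R = G - X                            # residual g(x) - x
    dH = R @ W                           # backprop through the decoder
    dS = dH * (1 - H ** 2)               # through tanh
    grad = X.T @ dS + R.T @ H            # encoder term + decoder term (tied)
    W -= eta * grad / N

_, G = forward(X, W)
err = np.mean(np.sum((G - X) ** 2, axis=1))
var = np.mean(np.sum((X - X.mean(0)) ** 2, axis=1))
print(err, var)                          # reconstruction error well below data variance
```

The encoder output `np.tanh(X @ W)` is exactly the compressed representation that deep learning later reuses as pre-trained first-layer weights.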


Deep Learning Autoencoder

### Pre-Training with Autoencoders

### Deep Learning with Autoencoders

1. for ℓ = 1, . . . , L, **pre-train** {w_{ij}^{(ℓ)}} assuming w_{∗}^{(1)}, . . . , w_{∗}^{(ℓ−1)} fixed, by **training a basic autoencoder on** {x_{n}^{(ℓ−1)}} **with** d̃ = d^{(ℓ)}
2. **train with backprop** on the **pre-trained** NNet to **fine-tune** all {w_{ij}^{(ℓ)}}

many successful **pre-training** techniques take **'fancier' autoencoders** with different **architectures** and **regularization schemes**

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 12/24
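Step 1 above, greedy layer-wise pre-training, can be sketched as follows: each layer ℓ trains its own basic (untied) autoencoder on the previous layer's representation x^{(ℓ−1)} and keeps only the encoder weights. All function names and sizes here are illustrative assumptions.

```python
import numpy as np

def train_basic_autoencoder(X, d_tilde, steps=300, eta=0.05, seed=0):
    """Train a d–d̃–d autoencoder on X; return encoder weights W1 (d x d̃)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W1 = 0.1 * rng.normal(size=(d, d_tilde))       # encoder
    W2 = 0.1 * rng.normal(size=(d_tilde, d))       # decoder
    for _ in range(steps):
        H = np.tanh(X @ W1)
        R = H @ W2 - X                             # g(x) - x
        dH = (R @ W2.T) * (1 - H ** 2)
        W2 -= eta * (H.T @ R) / len(X)
        W1 -= eta * (X.T @ dH) / len(X)
    return W1

def pretrain(X, layer_dims):
    """Greedy layer-wise pre-training: for each layer ℓ, fit a basic
    autoencoder on the previous representation x^(ℓ−1) with d̃ = d^(ℓ)."""
    weights, rep = [], X                           # rep holds x^(ℓ−1)
    for d_l in layer_dims:
        W = train_basic_autoencoder(rep, d_l)      # pre-train layer ℓ
        weights.append(W)
        rep = np.tanh(rep @ W)                     # x^(ℓ) feeds the next layer
    return weights

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 10))
Ws = pretrain(X, [6, 3])
print([W.shape for W in Ws])                       # [(10, 6), (6, 3)]
```

Step 2 (fine-tuning) would then run ordinary backprop on the full NNet initialized with `Ws`.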


Deep Learning Autoencoder

### Fun Time

Suppose training a d–d̃–d autoencoder with backprop takes approximately c · d · d̃ seconds. Then, what is the total number of seconds needed for pre-training a d–d^{(1)}–d^{(2)}–d^{(3)}–1 deep NNet?

1. c (d + d^{(1)} + d^{(2)} + d^{(3)} + 1)
2. c (d · d^{(1)} · d^{(2)} · d^{(3)} · 1)
3. c (d d^{(1)} + d^{(1)} d^{(2)} + d^{(2)} d^{(3)} + d^{(3)})
4. c (d d^{(1)} · d^{(1)} d^{(2)} · d^{(2)} d^{(3)} · d^{(3)})

### Reference Answer: 3

Each c · d^{(ℓ−1)} · d^{(ℓ)} represents the time for pre-training with one autoencoder to determine one layer of the weights.

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 13/24
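The counting behind answer 3 can be checked numerically with hypothetical layer sizes: pre-training layer ℓ trains a d^{(ℓ−1)}–d^{(ℓ)}–d^{(ℓ−1)} autoencoder, costing c · d^{(ℓ−1)} · d^{(ℓ)} seconds, and the costs add up across layers.

```python
# Hypothetical sizes (made up): d=100, d^(1)=50, d^(2)=20, d^(3)=10, output 1
c = 2
dims = [100, 50, 20, 10, 1]

# layer-by-layer pre-training cost: sum of c * d^(l-1) * d^(l)
total = sum(c * dims[l] * dims[l + 1] for l in range(len(dims) - 1))

# answer 3 spelled out: c (d d^(1) + d^(1) d^(2) + d^(2) d^(3) + d^(3) * 1)
formula3 = c * (100 * 50 + 50 * 20 + 20 * 10 + 10 * 1)
print(total == formula3)   # the sum, not the product, of per-layer costs
```

Note the last term is d^{(3)} · 1 = d^{(3)}, which is why option 3 ends with a bare d^{(3)}.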

Deep Learning Denoising Autoencoder

### Regularization in Deep Learning

[Figure: a deep NNet with inputs x_{0} = 1, x_{1}, x_{2}, . . . , x_{d}; tanh transformations in each layer; weights w_{ij}^{(1)}, w_{jk}^{(2)}, w_{kq}^{(3)}; hidden score s_{3}^{(2)} and hidden output x_{3}^{(2)} labeled.]