
Deep Learning Autoencoder

Usefulness of Approximating Identity Function

if g(x) ≈ x using some hidden structures on the observed data x_n:

for supervised learning:
• hidden structure (essence) of x can be used as a reasonable transform Φ(x)
  —learning an 'informative' representation of data

for unsupervised learning:
• density estimation: larger density (structure match) when g(x) ≈ x
• outlier detection: those x where g(x) ≉ x (see the sketch below)
  —learning a 'typical' representation of data

autoencoder: representation-learning through approximating identity function
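As a minimal illustrative sketch of the outlier-detection point above (not part of the original slides), one can score each example by its reconstruction error under some trained approximator g; the `reconstruct` stand-in, the toy data, and the threshold below are all hypothetical.

```python
# Sketch: outlier detection via reconstruction error (hypothetical helper,
# not from the slides). `reconstruct` stands in for any trained g with g(x) ≈ x.
import numpy as np

def outlier_scores(X, reconstruct):
    """Per-example squared reconstruction error ||g(x) - x||^2; large means atypical x."""
    G = reconstruct(X)
    return np.sum((G - X) ** 2, axis=1)

def flag_outliers(X, reconstruct, threshold):
    """Flag examples whose reconstruction g(x) is far from x, i.e. g(x) ≉ x."""
    return outlier_scores(X, reconstruct) > threshold

# toy usage: a crude stand-in g that just shrinks x toward 0
X = np.random.default_rng(0).normal(size=(5, 3))
print(flag_outliers(X, reconstruct=lambda Z: 0.9 * Z, threshold=0.5))
```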


Deep Learning Autoencoder

Basic Autoencoder

basic autoencoder: a d-d̃-d NNet with error function ∑_{i=1}^{d} (g_i(x) − x_i)²

• backprop easily applies; shallow and easy to train
• usually d̃ < d: compressed representation
• data: {(x_1, y_1 = x_1), (x_2, y_2 = x_2), ..., (x_N, y_N = x_N)}
  —often categorized as an unsupervised learning technique
• sometimes constrain w_ij^(1) = w_ji^(2) as regularization
  —more sophisticated in calculating the gradient

basic autoencoder in basic deep learning:
{w_ij^(1)} taken as shallowly pre-trained weights
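Below is a minimal NumPy sketch of this basic autoencoder, matching the d-d̃-d architecture and the squared reconstruction error on the slide. The tanh hidden layer with a linear output layer, the learning rate, the epoch count, and the toy data are my own assumptions; the optional `tied=True` flag implements the w_ij^(1) = w_ji^(2) constraint.

```python
# Minimal sketch of a basic d-d~-d autoencoder trained with batch gradient descent.
# Hyperparameters and toy data are hypothetical illustration choices.
import numpy as np

def train_basic_autoencoder(X, d_tilde, lr=0.1, epochs=200, tied=False, seed=0):
    """Train a d--d_tilde--d autoencoder with squared reconstruction error."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    W1 = rng.normal(scale=0.1, size=(d, d_tilde))   # encoder weights w_ij^(1)
    b1 = np.zeros(d_tilde)
    W2 = rng.normal(scale=0.1, size=(d_tilde, d))   # decoder weights w_ji^(2)
    b2 = np.zeros(d)
    if tied:
        W2 = W1.T                                   # enforce w_ij^(1) = w_ji^(2)

    for _ in range(epochs):
        # forward pass: hidden representation and reconstruction g(x)
        H = np.tanh(X @ W1 + b1)                    # N x d_tilde
        G = H @ W2 + b2                             # N x d (linear output layer)

        # mean squared reconstruction error sum_i (g_i(x) - x_i)^2 and backprop
        dG = 2.0 * (G - X) / N
        dW2 = H.T @ dG
        db2 = dG.sum(axis=0)
        dH = dG @ W2.T
        dS1 = dH * (1.0 - H ** 2)                   # tanh'(s) = 1 - tanh(s)^2
        dW1 = X.T @ dS1
        db1 = dS1.sum(axis=0)

        if tied:
            # weight tying as regularization: one shared gradient for W1 and W2^T
            W1 -= lr * (dW1 + dW2.T)
            W2 = W1.T
        else:
            W1 -= lr * dW1
            W2 -= lr * dW2
        b1 -= lr * db1
        b2 -= lr * db2
    return W1, b1, W2, b2

# toy usage on hypothetical data with d = 8 and d~ = 3
X = np.random.default_rng(1).normal(size=(100, 8))
W1, b1, W2, b2 = train_basic_autoencoder(X, d_tilde=3, tied=True)
```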


Deep Learning Autoencoder

Pre-Training with Autoencoders

Deep Learning with Autoencoders

1 for ℓ = 1, ..., L, pre-train {w_ij^(ℓ)} assuming w^(1), ..., w^(ℓ−1) fixed,
  by training a basic autoencoder on {x_n^(ℓ−1)} with d̃ = d^(ℓ)
  (figure: layer-by-layer pre-training, panels (a)-(d))
2 train with backprop on the pre-trained NNet to fine-tune all {w_ij^(ℓ)}

many successful pre-training techniques take 'fancier' autoencoders with different architectures and regularization schemes
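A minimal sketch of step 1, reusing the hypothetical `train_basic_autoencoder` helper from the earlier sketch: each layer's weights {w_ij^(ℓ)} come from one basic autoencoder trained on the previous layer's representation with d̃ = d^(ℓ). The layer widths and data are made up for illustration, and the fine-tuning backprop of step 2 is not shown.

```python
# Sketch of greedy layer-wise pre-training (step 1 above); assumes the
# train_basic_autoencoder sketch defined earlier. Layer sizes are hypothetical.
import numpy as np

def pretrain_deep_nnet(X, layer_dims, **ae_kwargs):
    """Return one (W, b) pair per hidden layer, pre-trained with basic autoencoders."""
    weights = []
    rep = X                                    # x^(0): the raw inputs
    for d_next in layer_dims:                  # d^(1), ..., d^(L)
        # train a d^(l-1)-d^(l)-d^(l-1) autoencoder on the current representation,
        # keeping only its encoding weights as the pre-trained w^(l)
        W, b, _, _ = train_basic_autoencoder(rep, d_tilde=d_next, **ae_kwargs)
        weights.append((W, b))
        rep = np.tanh(rep @ W + b)             # x^(l): input to the next stage
    return weights

# toy usage with hypothetical widths d^(1) = 6, d^(2) = 3
X = np.random.default_rng(2).normal(size=(100, 8))
pretrained = pretrain_deep_nnet(X, layer_dims=[6, 3], epochs=50)
```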


Deep Learning Autoencoder

Fun Time

Suppose training a d-d̃-d autoencoder with backprop takes approximately c · d · d̃ seconds. Then, what is the total number of seconds needed for pre-training a d-d^(1)-d^(2)-d^(3)-1 deep NNet?

1 c (d + d^(1) + d^(2) + d^(3) + 1)
2 c (d · d^(1) · d^(2) · d^(3) · 1)
3 c (d d^(1) + d^(1) d^(2) + d^(2) d^(3) + d^(3))
4 c (d d^(1) · d^(1) d^(2) · d^(2) d^(3) · d^(3))

Reference Answer: 3

Each c · d^(ℓ−1) · d^(ℓ) represents the time for pre-training with one autoencoder to determine one layer of the weights.
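For instance, with hypothetical widths d = 100, d^(1) = 50, d^(2) = 20, d^(3) = 10, the four pre-training stages would take roughly c (100·50 + 50·20 + 20·10 + 10·1) = 6210c seconds in total.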


Deep Learning Denoising Autoencoder

Regularization in Deep Learning

(figure: deep NNet with inputs x_0 = 1, x_1, x_2, ..., x_d, tanh hidden layers, weights w_ij^(1), w_jk^(2), w_kq^(3), and hidden output x_3^(2) = tanh(s_3^(2)))
