
Deep Learning Autoencoder

Usefulness of Approximating Identity Function

if g(x) ≈ x using some hidden structures on the observed data x_n:

for supervised learning:
• hidden structure (essence) of x can be used as a reasonable transform Φ(x)
  —learning an 'informative' representation of data

for unsupervised learning:
• density estimation: larger density (structure match) when g(x) ≈ x
• outlier detection: those x where g(x) ≉ x (see the sketch below)
  —learning a 'typical' representation of data

autoencoder: representation-learning through approximating identity function
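As a minimal illustrative sketch of the outlier-detection point above (not part of the original slides), one can score each example by its reconstruction error under some trained approximator g; the `reconstruct` stand-in, the toy data, and the threshold below are all hypothetical.

```python
# Sketch: outlier detection via reconstruction error (hypothetical helper,
# not from the slides). `reconstruct` stands in for any trained g with g(x) ≈ x.
import numpy as np

def outlier_scores(X, reconstruct):
    """Per-example squared reconstruction error ||g(x) - x||^2; large means atypical x."""
    G = reconstruct(X)
    return np.sum((G - X) ** 2, axis=1)

def flag_outliers(X, reconstruct, threshold):
    """Flag examples whose reconstruction g(x) is far from x, i.e. g(x) ≉ x."""
    return outlier_scores(X, reconstruct) > threshold

# toy usage: a crude stand-in g that just shrinks x toward 0
X = np.random.default_rng(0).normal(size=(5, 3))
print(flag_outliers(X, reconstruct=lambda Z: 0.9 * Z, threshold=0.5))
```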


Deep Learning Autoencoder

Basic Autoencoder

basic autoencoder: a d-d̃-d NNet with error function ∑_{i=1}^{d} (g_i(x) − x_i)²

• backprop easily applies; shallow and easy to train
• usually d̃ < d: compressed representation
• data: {(x_1, y_1 = x_1), (x_2, y_2 = x_2), ..., (x_N, y_N = x_N)}
  —often categorized as an unsupervised learning technique
• sometimes constrain w_ij^(1) = w_ji^(2) as regularization
  —more sophisticated in calculating the gradient

basic autoencoder in basic deep learning:
{w_ij^(1)} taken as shallowly pre-trained weights
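Below is a minimal NumPy sketch of this basic autoencoder, matching the d-d̃-d architecture and the squared reconstruction error on the slide. The tanh hidden layer with a linear output layer, the learning rate, the epoch count, and the toy data are my own assumptions; the optional `tied=True` flag implements the w_ij^(1) = w_ji^(2) constraint.

```python
# Minimal sketch of a basic d-d~-d autoencoder trained with batch gradient descent.
# Hyperparameters and toy data are hypothetical illustration choices.
import numpy as np

def train_basic_autoencoder(X, d_tilde, lr=0.1, epochs=200, tied=False, seed=0):
    """Train a d--d_tilde--d autoencoder with squared reconstruction error."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    W1 = rng.normal(scale=0.1, size=(d, d_tilde))   # encoder weights w_ij^(1)
    b1 = np.zeros(d_tilde)
    W2 = rng.normal(scale=0.1, size=(d_tilde, d))   # decoder weights w_ji^(2)
    b2 = np.zeros(d)
    if tied:
        W2 = W1.T                                   # enforce w_ij^(1) = w_ji^(2)

    for _ in range(epochs):
        # forward pass: hidden representation and reconstruction g(x)
        H = np.tanh(X @ W1 + b1)                    # N x d_tilde
        G = H @ W2 + b2                             # N x d (linear output layer)

        # mean squared reconstruction error sum_i (g_i(x) - x_i)^2 and backprop
        dG = 2.0 * (G - X) / N
        dW2 = H.T @ dG
        db2 = dG.sum(axis=0)
        dH = dG @ W2.T
        dS1 = dH * (1.0 - H ** 2)                   # tanh'(s) = 1 - tanh(s)^2
        dW1 = X.T @ dS1
        db1 = dS1.sum(axis=0)

        if tied:
            # weight tying as regularization: one shared gradient for W1 and W2^T
            W1 -= lr * (dW1 + dW2.T)
            W2 = W1.T
        else:
            W1 -= lr * dW1
            W2 -= lr * dW2
        b1 -= lr * db1
        b2 -= lr * db2
    return W1, b1, W2, b2

# toy usage on hypothetical data with d = 8 and d~ = 3
X = np.random.default_rng(1).normal(size=(100, 8))
W1, b1, W2, b2 = train_basic_autoencoder(X, d_tilde=3, tied=True)
```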


Deep Learning Autoencoder

Pre-Training with Autoencoders

Deep Learning with Autoencoders

1 for ℓ = 1, ..., L, pre-train {w_ij^(ℓ)} assuming w^(1), ..., w^(ℓ−1) fixed,
  by training a basic autoencoder on {x_n^(ℓ−1)} with d̃ = d^(ℓ)
  (figure: layer-by-layer pre-training, panels (a)-(d))
2 train with backprop on the pre-trained NNet to fine-tune all {w_ij^(ℓ)}

many successful pre-training techniques take 'fancier' autoencoders with different architectures and regularization schemes
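A minimal sketch of step 1, reusing the hypothetical `train_basic_autoencoder` helper from the earlier sketch: each layer's weights {w_ij^(ℓ)} come from one basic autoencoder trained on the previous layer's representation with d̃ = d^(ℓ). The layer widths and data are made up for illustration, and the fine-tuning backprop of step 2 is not shown.

```python
# Sketch of greedy layer-wise pre-training (step 1 above); assumes the
# train_basic_autoencoder sketch defined earlier. Layer sizes are hypothetical.
import numpy as np

def pretrain_deep_nnet(X, layer_dims, **ae_kwargs):
    """Return one (W, b) pair per hidden layer, pre-trained with basic autoencoders."""
    weights = []
    rep = X                                    # x^(0): the raw inputs
    for d_next in layer_dims:                  # d^(1), ..., d^(L)
        # train a d^(l-1)-d^(l)-d^(l-1) autoencoder on the current representation,
        # keeping only its encoding weights as the pre-trained w^(l)
        W, b, _, _ = train_basic_autoencoder(rep, d_tilde=d_next, **ae_kwargs)
        weights.append((W, b))
        rep = np.tanh(rep @ W + b)             # x^(l): input to the next stage
    return weights

# toy usage with hypothetical widths d^(1) = 6, d^(2) = 3
X = np.random.default_rng(2).normal(size=(100, 8))
pretrained = pretrain_deep_nnet(X, layer_dims=[6, 3], epochs=50)
```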


Deep Learning Autoencoder

Fun Time

Suppose training a d-d̃-d autoencoder with backprop takes approximately c · d · d̃ seconds. Then, what is the total number of seconds needed for pre-training a d-d^(1)-d^(2)-d^(3)-1 deep NNet?

1 c (d + d^(1) + d^(2) + d^(3) + 1)
2 c (d · d^(1) · d^(2) · d^(3) · 1)
3 c (d d^(1) + d^(1) d^(2) + d^(2) d^(3) + d^(3))
4 c (d d^(1) · d^(1) d^(2) · d^(2) d^(3) · d^(3))

Reference Answer: 3

Each c · d^(ℓ−1) · d^(ℓ) represents the time for pre-training with one autoencoder to determine one layer of the weights.
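For instance, with hypothetical widths d = 100, d^(1) = 50, d^(2) = 20, d^(3) = 10, the four pre-training stages would take roughly c (100·50 + 50·20 + 20·10 + 10·1) = 6210c seconds in total.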


Deep Learning Denoising Autoencoder

Regularization in Deep Learning

(figure: deep NNet with inputs x_0 = 1, x_1, x_2, ..., x_d, tanh hidden layers, weights w_ij^(1), w_jk^(2), w_kq^(3), and hidden output x_3^(2) = tanh(s_3^(2)))
