
Slide credit from Hung-Yi Lee


(1)
(2)

Introduction

Big data ≠ Big annotated data

Machine learning techniques include:

Supervised learning (if we have labelled data)

Reinforcement learning (if we have an environment for reward)

Unsupervised learning (if we do not have labelled data)


What can we do if there is insufficient training data?

(3)

Semi-Supervised Learning

[Figure: a small set of labelled data (cat and dog images) together with a large pool of unlabeled data.]

(4)

Semi-Supervised Learning

Why does semi-supervised learning help?

The distribution of the unlabeled data provides some cues

(5)

Transfer Learning

[Figure: labelled data for the target task (cat and dog images) together with labeled data from a different but related task (elephant and tiger images).]

(6)

Transfer Learning

Widely used in image processing:

Use sufficient labeled data to learn a CNN, then use this CNN as a feature extractor.

[Figure: a CNN with Layer 1, Layer 2, …, Layer L maps the input pixels x (x1, x2, …, xN) through successive hidden layers to an output class such as "elephant"; the intermediate layers serve as the feature extractor.]
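As a sketch of the feature-extractor idea (assuming PyTorch and torchvision; the ResNet-18 backbone, the two-class head, and the weight name are illustrative choices, not from the slides):

```python
# Use a CNN pretrained on a large labeled set as a frozen feature extractor.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights="IMAGENET1K_V1")        # CNN learned from sufficient labeled data
feature_extractor = nn.Sequential(*list(backbone.children())[:-1])  # drop the final classifier
for p in feature_extractor.parameters():
    p.requires_grad = False                                # keep the learned features fixed

classifier = nn.Linear(512, 2)                             # new head for the target task (e.g., cat vs. dog)

def predict(images):                                       # images: (B, 3, H, W) tensor
    feats = feature_extractor(images).flatten(1)           # (B, 512) features from the pretrained CNN
    return classifier(feats)
```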

(7)

Transfer Learning Example

Bakuman (爆漫王) vs. the graduate-student survival rules (研究生生存守則):

editor-in-charge ↔ thesis advisor
manga artist ↔ graduate student
submitting to Jump ↔ submitting to journals
drawing storyboards ↔ running experiments
"manga artist online" ↔ "graduate student online"

(8)

Self-Taught Learning

The unlabeled data is sometimes not related to the task.

[Figure: labelled data (cat and dog images) plus unlabeled data obtained by simply crawling millions of images from the Internet.]

(9)

Self-Taught Learning

The unlabeled data is sometimes not related to the task.

[Table: labelled vs. unlabeled data for example tasks — digit recognition (labelled digits, unlabeled English characters), speech, and document classification (labelled news, unlabeled webpages).]

(10)

Self-Taught Learning

How does self-taught learning work?

Why does unlabeled and unrelated data help the task?

Finding latent factors that control the observations

(11)

Latent Factors for Handwritten Digits

(12)

Latent Factors for Documents

http://deliveryimages.acm.org/10.1145/2140000/2133826/figs/f1.jpg

(13)

Latent Factors for Recommendation System

[Figure: characters A, B, and C placed in a latent space described by attributes such as "naïve" (單純呆) and "tsundere" (傲嬌).]

(14)

Latent Factor Exploitation

Handwritten digits

The handwritten images are composed of strokes.

[Figure: a dictionary of strokes (latent factors): No. 1, No. 2, No. 3, No. 4, No. 5, …]

(15)

Latent Factor Exploitation

[Figure: a 28 × 28 digit image expressed as the sum of strokes from the latent-factor dictionary, e.g., stroke No. 1 + No. 3 + No. 5.]

(16)

Autoencoder

Representation Learning

(17)

Autoencoder

Represent a digit using 28 × 28 = 784 dimensions; not all 28 × 28 images are digits.

Idea: represent the images of digits in a more compact way.

[Figure: an NN encoder compresses the 784-dimensional input into a code (usually < 784 dimensions), and an NN decoder reconstructs the input from the code; the two are learned together.]

The code is a compact representation of the input object, from which the input can be reconstructed.

(18)

Autoencoder

x: input layer, a: hidden (bottleneck) layer, y: output layer

encode: a = σ(Wx + b)
decode: y = σ(W′a + b′)

Train so that the output is as close as possible to the input: minimize ‖x − y‖²

The output of the hidden layer is the code.
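A minimal sketch of this autoencoder (assuming PyTorch; the 30-dimensional code and the random batch are placeholder choices):

```python
# Autoencoder: a = sigma(W x + b), y = sigma(W' a + b'), minimize ||x - y||^2.
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, code_dim=30):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, code_dim), nn.Sigmoid())  # a = sigma(Wx + b)
        self.decoder = nn.Sequential(nn.Linear(code_dim, input_dim), nn.Sigmoid())  # y = sigma(W'a + b')

    def forward(self, x):
        a = self.encoder(x)              # the code
        return self.decoder(a), a        # the reconstruction and the code

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(64, 784)                  # a batch of flattened 28 x 28 images (placeholder data)
y, code = model(x)
loss = ((x - y) ** 2).mean()             # reconstruction error ||x - y||^2
loss.backward()
optimizer.step()
```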

(19)

Autoencoder

De-noising autoencoder: add noise to the input x to obtain x′, encode x′ into a, decode a into y, and train y to be as close as possible to the original (clean) x.
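A sketch of the de-noising variant under the same assumptions (PyTorch, placeholder data; the Gaussian noise scale is arbitrary):

```python
# De-noising autoencoder: corrupt the input, reconstruct the clean input.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 30), nn.Sigmoid())
decoder = nn.Sequential(nn.Linear(30, 784), nn.Sigmoid())
optimizer = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

x = torch.rand(64, 784)                   # clean inputs (placeholder data)
x_noisy = x + 0.3 * torch.randn_like(x)   # x' = x + noise
y = decoder(encoder(x_noisy))             # encode the corrupted input, decode back
loss = ((x - y) ** 2).mean()              # compare against the *clean* x
loss.backward()
optimizer.step()
```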

(20)

Deep Autoencoder

Initialize by RBM layer-by-layer

[Figure: a deep autoencoder — the input x passes through several encoder layers to a bottleneck code layer, then through mirrored decoder layers to the output x̃, which is trained to be as close as possible to x.]

Hinton and Salakhutdinov. “Reducing the dimensionality of data with neural networks,” Science, 2006.

(21)

Deep Autoencoder

[Figure: reconstructions of the original image by PCA (784 → 30 → 784) and by a deep autoencoder (784 → 1000 → 500 → 250 → 30 → 250 → 500 → 1000 → 784).]
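A sketch of the deep architecture above (assuming PyTorch). This only lays out the 784-1000-500-250-30 encoder and its mirrored decoder and would be trained end-to-end on the reconstruction loss; the RBM layer-by-layer initialization from the previous slide is not shown:

```python
# Deep autoencoder: 784 -> 1000 -> 500 -> 250 -> 30 encoder, mirrored decoder.
import torch.nn as nn

def mlp(sizes):
    layers = []
    for i in range(len(sizes) - 1):
        layers += [nn.Linear(sizes[i], sizes[i + 1]), nn.Sigmoid()]
    return nn.Sequential(*layers)

encoder = mlp([784, 1000, 500, 250, 30])   # input image -> 30-dimensional code
decoder = mlp([30, 250, 500, 1000, 784])   # code -> reconstructed 784-dimensional image
```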

(22)

Feature Representation

[Figure: the same deep autoencoder with a 2-dimensional code (784 → 1000 → 500 → 250 → 2 → 250 → 500 → 1000 → 784), used to visualize the learned feature representation.]

(23)

Auto-encoder – Text Retrieval

Word string: "This is an apple"

Bag-of-words vector over the vocabulary (this, is, a, an, apple, pen): (1, 1, 0, 1, 1, 0)

Vector space model: documents and queries are represented as bag-of-words vectors.

(24)

Autoencoder – Text Retrieval

[Figure: the bag-of-words vector of a document or query (2000 dimensions) is compressed through 500 → 250 → 125 down to a 2-dimensional code.]

Documents talking about the same thing will have close codes.
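A retrieval sketch under the same assumptions (PyTorch; the document and query vectors are random placeholders):

```python
# Encode bag-of-words vectors and retrieve the documents whose codes are
# closest to the query's code.
import torch
import torch.nn as nn

encoder = nn.Sequential(                  # 2000 -> 500 -> 250 -> 125 -> 2, as on the slide
    nn.Linear(2000, 500), nn.Sigmoid(),
    nn.Linear(500, 250), nn.Sigmoid(),
    nn.Linear(250, 125), nn.Sigmoid(),
    nn.Linear(125, 2),
)

doc_bows = torch.rand(1000, 2000)         # placeholder bag-of-words vectors for 1000 documents
query_bow = torch.rand(1, 2000)           # placeholder query vector

with torch.no_grad():
    doc_codes = encoder(doc_bows)                          # (1000, 2) document codes
    query_code = encoder(query_bow)                        # (1, 2) query code
    dist = torch.cdist(query_code, doc_codes).squeeze(0)   # distance in code space
    top5 = dist.topk(5, largest=False).indices             # closest documents
```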

(25)

Autoencoder – Similar Image Retrieval

Retrieved using Euclidean distance in pixel intensity space

(26)

Autoencoder – Similar Image Retrieval

[Figure: a 32 × 32 image is encoded through layers of size 8192 → 4096 → 2048 → 1024 → 512 → 256 to a 256-dimensional code; the model is trained on millions of images crawled from the Internet.]

(27)

Autoencoder – Similar Image Retrieval

Images retrieved using Euclidean distance in pixel intensity space

Images retrieved using 256 codes
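A sketch of the comparison (assuming PyTorch; the images are random placeholders, and a trained encoder such as the one sketched earlier is assumed for the code-space case):

```python
# Nearest-neighbour retrieval: Euclidean distance in pixel space vs. code space.
import torch

def nearest(query, database, k=5):
    # indices of the k database rows closest to the query (Euclidean distance)
    dist = torch.cdist(query.unsqueeze(0), database).squeeze(0)
    return dist.topk(k, largest=False).indices

images = torch.rand(10000, 3 * 32 * 32)   # placeholder flattened 32 x 32 images
query = images[0]

pixel_hits = nearest(query, images)       # retrieval in pixel-intensity space
# With a trained encoder, retrieval in code space would instead compare
# encoder(query) against encoder(images), which groups semantically similar images.
```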

(28)

Autoencoder for DNN Pre-Training

Greedy layer-wise pre-training again

[Figure: target network 784 → 1000 → 1000 → 500 → 10 (input → output). Step 1: train an autoencoder 784 → 1000 → 784 that reconstructs x as x̃; keep the learned first-layer weights W1.]

(29)

Autoencoder for DNN Pre-Training

Greedy layer-wise pre-training again

[Figure: step 2: feed the data through the first layer to obtain a1, and train an autoencoder 1000 → 1000 → 1000 that reconstructs a1 as ã1; keep the learned weights W2.]

(30)

Autoencoder for DNN Pre-Training

Greedy layer-wise pre-training again

[Figure: step 3: fix W1 and W2, compute a2, and train an autoencoder 1000 → 500 → 1000 that reconstructs a2 as ã2; keep the learned weights W3.]

(31)

Autoencoder for DNN Pre-Training

Greedy layer-wise pre-training again

[Figure: final step: stack W1, W2, W3, add an output layer (500 → 10) whose weights W4 are randomly initialized, and fine-tune the whole network via backprop.]
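A sketch of the greedy layer-wise procedure (assuming PyTorch; layer sizes follow the slides, the data and step count are placeholders, and `train_layer_as_autoencoder` is an illustrative helper):

```python
# Train one reconstruction layer at a time on the previous layer's activations,
# stack the learned encoders, add a random output layer, then fine-tune.
import torch
import torch.nn as nn

def train_layer_as_autoencoder(data, in_dim, out_dim, steps=100):
    enc = nn.Sequential(nn.Linear(in_dim, out_dim), nn.Sigmoid())
    dec = nn.Linear(out_dim, in_dim)
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((dec(enc(data)) - data) ** 2).mean()   # reconstruct this layer's input
        loss.backward()
        opt.step()
    return enc                                         # keep only the encoder (W1, W2, ...)

x = torch.rand(256, 784)                               # placeholder training images
sizes = [784, 1000, 1000, 500]
encoders, h = [], x
for in_dim, out_dim in zip(sizes[:-1], sizes[1:]):
    enc = train_layer_as_autoencoder(h.detach(), in_dim, out_dim)
    encoders.append(enc)
    h = enc(h)                                         # activations become the next layer's input

network = nn.Sequential(*encoders, nn.Linear(500, 10)) # output weights W4 randomly initialized
# ...then fine-tune `network` end-to-end on the labeled data via backprop.
```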

(32)

Variational Autoencoder

Representation Learning and Generation

(33)

Generation from Latent Codes

[Figure: encoder x → a with weights W; decoder a → y with weights W′.]

How can we set a latent code for generation?

(34)

Latent Code Distribution Constraints

Constrain the distribution of the learned latent codes.
Generate the latent code by sampling from a prior distribution.

[Figure: x → encode → a → decode → y, where the code a is obtained by sampling.]
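A minimal VAE sketch (assuming PyTorch; the 2-dimensional code, the Gaussian prior, and the unweighted KL term are simplifying choices):

```python
# The encoder outputs a mean and log-variance, the code is sampled with the
# reparameterization trick, and a KL term keeps the code distribution close to
# the N(0, I) prior, so new data can be generated by decoding z ~ N(0, I).
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, input_dim=784, code_dim=2):
        super().__init__()
        self.enc = nn.Linear(input_dim, 2 * code_dim)                  # outputs [mu, log_var]
        self.dec = nn.Sequential(nn.Linear(code_dim, input_dim), nn.Sigmoid())

    def forward(self, x):
        mu, log_var = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)       # sampling (reparameterization)
        return self.dec(z), mu, log_var

model = VAE()
x = torch.rand(64, 784)                                                # placeholder data
y, mu, log_var = model(x)
recon = ((x - y) ** 2).mean()                                          # reconstruction term
kl = -0.5 * (1 + log_var - mu ** 2 - log_var.exp()).mean()             # match the prior
loss = recon + kl

generated = model.dec(torch.randn(1, 2))   # generation: decode a code sampled from the prior
```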

(35)

Reconstruction

[Figure: reconstructions produced by an AE vs. a VAE.]

(36)

VAE for Music Generation

[Figure: VAE architecture for music generation. A note dictionary encodes each note as note_t = (P_t, T_t, dT_t); a modularized encoder performs variational inference to produce a latent code z; a modularized note-unrolling decoder, conditioned on z through pairs such as (P_t, z), (T_t, z), (dT_t, z) and a fully-connected layer, generates note_0, note_1, …, note_t+1 starting from <Start>; a reverse note dictionary maps the outputs back to notes.]

Demo: http://mvae.miulab.tw

(37)

Distant Supervision

Representation Learning by Weak Labels

(38)

Convolutional Deep Structured Semantic Models (CDSSM/DSSM)

[Figure: each word w1 … wd of the word sequence x (e.g., "how about we discuss this later") is word-hashed from 20K to 1000 dimensions (word hashing matrix Wh, layer lh); a convolution (matrix Wc, layer lc) followed by max pooling (layer lm) yields a 300-dimensional vector, and a semantic projection (matrix Ws) produces the semantic layer y. The query Q and the documents D1 … Dn are encoded this way; CosSim(Q, Di) is converted into P(Di | Q).]

Training maximizes the likelihood of clicked documents given queries, so semantically related documents are close to the query in the encoded space.

Huang et al., "Learning deep structured semantic models for web search using clickthrough data," in Proc. of CIKM, 2013.
Shen et al., "Learning semantic representations using convolutional neural networks for web search," in Proc. of WWW, 2014.
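A sketch of the training objective (assuming PyTorch; the encoders are abstracted away and the query/document vectors are placeholders, so only the cosine-similarity softmax over clicked vs. unclicked documents is shown):

```python
# Maximize the likelihood of the clicked document given the query:
# softmax over CosSim(Q, D_i), cross-entropy against the clicked index.
import torch
import torch.nn.functional as F

def dssm_loss(query_vec, doc_vecs, clicked_index, gamma=10.0):
    # query_vec: (dim,), doc_vecs: (n_docs, dim); gamma is a smoothing factor.
    sims = F.cosine_similarity(query_vec.unsqueeze(0), doc_vecs, dim=-1)  # CosSim(Q, D_i)
    logits = gamma * sims
    target = torch.tensor([clicked_index])
    return F.cross_entropy(logits.unsqueeze(0), target)                   # -log P(D+ | Q)

q = torch.rand(300)                        # placeholder query encoding
docs = torch.rand(4, 300)                  # one clicked + three unclicked document encodings
loss = dssm_loss(q, docs, clicked_index=0)
```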

(39)

Multi-Tasking

Representation Learning by Different Tasks

(40)

Task-Shared Representation

[Figure: a shared lower-layer representation branching into task-specific layers for Task 1 and Task 2.]

The latent factors can be learned by different tasks.
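A sketch of a task-shared encoder (assuming PyTorch; the layer sizes, the two classification heads, and the random batches are placeholders):

```python
# A shared encoder learns the latent representation; each task has its own
# output head, and the summed losses shape the shared factors jointly.
import torch
import torch.nn as nn
import torch.nn.functional as F

shared = nn.Sequential(nn.Linear(784, 256), nn.ReLU())      # task-shared representation
head1 = nn.Linear(256, 10)                                   # Task 1 head
head2 = nn.Linear(256, 5)                                    # Task 2 head

x1, y1 = torch.rand(32, 784), torch.randint(0, 10, (32,))    # placeholder Task 1 batch
x2, y2 = torch.rand(32, 784), torch.randint(0, 5, (32,))     # placeholder Task 2 batch

loss = F.cross_entropy(head1(shared(x1)), y1) + F.cross_entropy(head2(shared(x2)), y2)
loss.backward()                                              # gradients update the shared encoder too
```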

(41)

Concluding Remarks

Labeling data is expensive, but we have large amounts of unlabeled data.

Autoencoder / VAE

exploits unlabeled data to learn latent factors as representations

learned representations can be transferred to other tasks

Distant Labels / Labels from Other Tasks

learn the representations that are useful for other tasks

learned representations may also be useful for the target task
