
(1)

Unsupervised Learning

Applied Deep Learning

May 25th, 2020 http://adl.miulab.tw

(2)

Introduction

◉ Big data ≠ Big annotated data

◉ Machine learning techniques include:

○ Supervised learning (if we have labelled data)

○ Reinforcement learning (if we have an environment for reward)

○ Unsupervised learning (if we do not have labelled data)

What can we do if there is not sufficient training data?

2

(3)

Semi-Supervised Learning

Labelled Data

Unlabeled Data

cat dog

(Image of cats and dogs without labeling)

3

(4)

Semi-Supervised Learning

◉ Why does semi-supervised learning help?

The distribution of the unlabeled data provides some cues

4

(5)

Transfer Learning

Source Data (not directly related to the task considered): images labelled elephant, tiger

Target Data: images labelled cat, dog

5

(6)

Transfer Learning

◉ Widely used in image processing

○ Using sufficient labeled data to learn a CNN

○ Using this CNN as a feature extractor (see the sketch below)

(Diagram: a CNN takes pixel inputs x1, x2, …, xN through Layer 1, Layer 2, …, Layer L and outputs a label such as "elephant"; the trained layers are reused as the feature extractor)

6
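A minimal sketch of the idea on this slide, assuming PyTorch/torchvision (the ResNet-18 model, its 512-dimensional feature size, and the two-class head are illustrative choices, not from the slides): a CNN trained on plenty of labelled source data is frozen and reused as a feature extractor for the target task.

```python
# Sketch: reuse a CNN trained on a large labelled dataset as a fixed feature extractor.
import torch
import torch.nn as nn
import torchvision.models as models

cnn = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # CNN learned from sufficient labeled data
cnn.fc = nn.Identity()                        # drop the source-task classifier head
cnn.eval()
for p in cnn.parameters():                    # freeze the extractor
    p.requires_grad = False

target_images = torch.randn(4, 3, 224, 224)   # dummy batch from the target task (e.g. cat vs. dog)
with torch.no_grad():
    features = cnn(target_images)             # 512-dim feature vectors
classifier = nn.Linear(512, 2)                # small head trained on the (scarce) target labels
logits = classifier(features)
```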

(7)

Transfer Learning Example

(Analogy from the manga 爆漫王 (Bakuman): a manga artist's survival guide transfers to a graduate student's. The editor corresponds to the advisor, submitting to Jump to submitting to journals, and drawing storyboards to running experiments; "manga artist online" maps to "graduate student online".)

7

(8)

Self-Taught Learning

◉ The unlabeled data is sometimes not related to the task

Unlabeled Data (just crawl millions of images from the Internet)

Labelled Data: cat, dog

8

(9)

Self-Taught Learning

◉ The unlabeled data is sometimes not related to the task

Labelled Data → Unlabeled Data:

○ Digit recognition: digits → other characters
○ Speech recognition: Taiwanese → English, Chinese
○ Document classification: news → webpages
……

Why can we use unlabeled and unrelated data to help our tasks?

9

(10)

Self-Taught Learning

◉ How does self-taught learning work?

◉ Why does unlabeled and unrelated data help the tasks?

Finding latent factors that control the observations

10

(11)

Latent Factors for Handwritten Digits

11

(12)

Latent Factors for Documents

http://deliveryimages.acm.org/10.1145/2140000/2133826/figs/f1.jpg

12

(13)

Latent Factors for Recommendation System

(Diagram: A, B, C positioned along latent attribute axes such as 單純呆 "simple and naive" and 傲嬌 "tsundere")

13

(14)

Latent Factor Exploitation

◉ Handwritten digits

The handwritten images are composed of strokes

Strokes (Latent Factors)

…….

No. 1 No. 2 No. 3 No. 4 No. 5

14

(15)

Latent Factor Exploitation

Each image is 28 × 28, i.e. represented by 28 × 28 = 784 pixels.

The same digit can be composed from a few strokes (latent factors): the image = stroke No. 1 + No. 3 + No. 5, encoded by the code [1 0 1 0 1 0 …….]

(simpler representation)

15
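A toy numerical sketch of the decomposition above, using NumPy (the stroke vectors are random stand-ins, not real strokes): the 784-pixel image is a sum of a few stroke basis vectors, so the binary code [1 0 1 0 1 0 …] is a far more compact representation than the raw pixels.

```python
# Toy illustration: compose a digit "image" from latent stroke factors.
import numpy as np

n_pixels, n_strokes = 784, 5
rng = np.random.default_rng(0)
strokes = rng.random((n_strokes, n_pixels))   # stroke basis (latent factors), toy values

code = np.array([1, 0, 1, 0, 1])              # this digit uses strokes No. 1, No. 3, No. 5
image = code @ strokes                        # 784-pixel image = sum of the selected strokes
print(image.shape)                            # (784,), described by only a 5-bit code
```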

(16)

Representation Learning

Autoencoder

16

(17)

Autoencoder

Representing a digit uses 28 × 28 dimensions, but not all 28 × 28 images are digits.
Idea: represent the images of digits in a more compact way.

NN Encoder: input (28 × 28 = 784 dimensions) → code (usually < 784), a compact representation of the input object
NN Decoder: code → reconstruction of the original object

The encoder and decoder are learned together.

17

(18)

Autoencoder

Input layer 𝑥 → hidden (bottleneck) layer 𝑎 → output layer 𝑦

Encode: 𝑎 = 𝜎(𝑊𝑥 + 𝑏)    Decode: 𝑦 = 𝜎(𝑊′𝑎 + 𝑏′)

Train 𝑦 to be as close as possible to 𝑥, i.e. minimize ‖𝑥 − 𝑦‖²

The output of the hidden layer 𝑎 is the code.

18
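A minimal sketch of the autoencoder on this slide, assuming PyTorch (the code size of 32 and the training details are illustrative choices): encode 𝑎 = 𝜎(𝑊𝑥 + 𝑏), decode 𝑦 = 𝜎(𝑊′𝑎 + 𝑏′), and minimize ‖𝑥 − 𝑦‖².

```python
# Minimal autoencoder: encoder and decoder are learned together to reconstruct x.
import torch
import torch.nn as nn

code_dim = 32                                                     # bottleneck, usually < 784
encoder = nn.Sequential(nn.Linear(784, code_dim), nn.Sigmoid())   # a = sigma(W x + b)
decoder = nn.Sequential(nn.Linear(code_dim, 784), nn.Sigmoid())   # y = sigma(W' a + b')
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

x = torch.rand(64, 784)                    # dummy batch of flattened 28 x 28 images
for _ in range(10):
    a = encoder(x)                         # compact code (output of the hidden layer)
    y = decoder(a)                         # reconstruction
    loss = ((x - y) ** 2).mean()           # minimize ||x - y||^2
    opt.zero_grad(); loss.backward(); opt.step()
```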

(19)

Autoencoder

De-noising auto-encoder

Add noise to the input 𝑥 to obtain 𝑥′; encode 𝑥′ into 𝑎 (via 𝑊) and decode into 𝑦 (via 𝑊′); train 𝑦 to be as close as possible to the original 𝑥.

Rifai et al., "Contractive auto-encoders: Explicit invariance during feature extraction," in ICML, 2011.

19
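The denoising variant only changes what is fed to the encoder; a self-contained sketch under the same assumptions as above (the noise level 0.3 is arbitrary): corrupt 𝑥 into 𝑥′, but still reconstruct the clean 𝑥.

```python
# Denoising autoencoder sketch: encode a corrupted input, reconstruct the clean one.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 32), nn.Sigmoid())
decoder = nn.Sequential(nn.Linear(32, 784), nn.Sigmoid())

x = torch.rand(64, 784)                                   # clean inputs
x_noisy = (x + 0.3 * torch.randn_like(x)).clamp(0, 1)     # add noise: x'
y = decoder(encoder(x_noisy))                             # encode/decode the corrupted input
loss = ((x - y) ** 2).mean()                              # ...but compare against the clean x
```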

(20)

Deep Autoencoder

(Diagram: input layer → several hidden layers → bottleneck layer holding the code → several hidden layers → output layer; the reconstruction 𝑥̃ is trained to be as close as possible to the input 𝑥)

Hinton and Salakhutdinov. “Reducing the dimensionality of data with neural networks,” Science, 2006.

20

(21)

Deep Autoencoder

(Comparison of reconstructions: original images vs. PCA with a 30-dim code (784 → 30 → 784) vs. a deep autoencoder (784 → 1000 → 500 → 250 → 30 → 250 → 500 → 1000 → 784))

21
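The layer widths on this slide can be read off as the following stack; a sketch assuming PyTorch (the choice of ReLU/Sigmoid activations is an assumption, not stated on the slide).

```python
import torch.nn as nn

# Encoder: 784 -> 1000 -> 500 -> 250 -> 30 (the 30-dim code)
encoder = nn.Sequential(
    nn.Linear(784, 1000), nn.ReLU(),
    nn.Linear(1000, 500), nn.ReLU(),
    nn.Linear(500, 250), nn.ReLU(),
    nn.Linear(250, 30),
)
# Decoder mirrors it: 30 -> 250 -> 500 -> 1000 -> 784
decoder = nn.Sequential(
    nn.Linear(30, 250), nn.ReLU(),
    nn.Linear(250, 500), nn.ReLU(),
    nn.Linear(500, 1000), nn.ReLU(),
    nn.Linear(1000, 784), nn.Sigmoid(),
)
```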

(22)

Feature Representation

(The same deep autoencoder with a 2-dimensional code, 784 → 1000 → 500 → 250 → 2 → 250 → 500 → 1000 → 784, so the learned feature representation can be visualised)

22

(23)

Autoencoder – Text Retrieval

Bag-of-words vector space model for documents and queries:

Word string: "This is an apple"
Vocabulary: this, is, a, an, apple, pen
Bag-of-words vector: 1 1 0 1 1 0

Semantics are not considered.

23
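A small sketch of the bag-of-words example above (the tiny vocabulary is the one on the slide; cosine similarity as the matching score is a common choice, not something the slide specifies):

```python
# Bag-of-words vector space model for matching a query against a document.
import numpy as np

vocab = ["this", "is", "a", "an", "apple", "pen"]

def bow(text):
    words = text.lower().split()
    return np.array([1.0 if w in words else 0.0 for w in vocab])

doc = bow("This is an apple")     # -> [1 1 0 1 1 0]
query = bow("an apple")
cos = doc @ query / (np.linalg.norm(doc) * np.linalg.norm(query))
# Two texts about the same thing but using different words share no dimension,
# so their similarity is 0: semantics are not considered.
```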

(24)

Autoencoder – Text Retrieval

The bag-of-words vector of a document or query (2000 dimensions) is compressed through layers of size 500 → 250 → 125 into a 2-dimensional code.

Documents talking about the same thing will have close codes.

24

(25)

Auto-Encoding (AE)

◉ Objective: reconstructing the original input 𝑥 from its corrupted version 𝑥̃

○ dimension reduction or denoising (masked LM)

Randomly mask 15% of tokens

25
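A sketch of the "randomly mask 15% of tokens" corruption, assuming PyTorch (the token ids and the mask-token id are dummies): the model is then trained to reconstruct the original tokens at the masked positions, i.e. denoising autoencoding over text.

```python
# Corrupt a batch of token sequences by masking ~15% of positions.
import torch

token_ids = torch.randint(5, 1000, (2, 16))   # dummy batch of token-id sequences
MASK_ID = 4                                   # assumed id of the [MASK] token
mask = torch.rand(token_ids.shape) < 0.15     # pick roughly 15% of positions
corrupted = token_ids.clone()
corrupted[mask] = MASK_ID                     # the corrupted input fed to the model
# Training objective: predict the original token_ids at the masked positions.
```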

(26)

Autoencoder – Similar Image Retrieval

Retrieved using Euclidean distance in pixel intensity space

Krizhevsky et al. "Using very deep autoencoders for content-based image retrieval," in ESANN, 2011.

26

(27)

Autoencoder – Similar Image Retrieval

A 32 × 32 image is encoded through layers of size 8192 → 4096 → 2048 → 1024 → 512 → 256, and the resulting code is used for retrieval (images crawled from the Internet by the millions).

27

(28)

Autoencoder – Similar Image Retrieval

◉ Images retrieved using Euclidean distance in pixel intensity space

◉ Images retrieved using the 256-bit codes

Learning the useful latent factors

28
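A toy sketch of the retrieval comparison, using NumPy (the database and the codes are random stand-ins; the cited work binarises the 256-dimensional code, while real-valued vectors are used here for simplicity):

```python
# Retrieve the nearest images to a query in pixel space vs. in the learned code space.
import numpy as np

rng = np.random.default_rng(0)
pixels = rng.random((1000, 32 * 32 * 3))   # database of flattened 32 x 32 colour images
codes = rng.random((1000, 256))            # their 256-dim autoencoder codes (toy values)

def nearest(database, query, k=5):
    dists = np.linalg.norm(database - query, axis=1)   # Euclidean distance
    return np.argsort(dists)[:k]

print(nearest(pixels, pixels[0]))   # retrieval in pixel-intensity space
print(nearest(codes, codes[0]))     # retrieval in code space (reflects latent factors)
```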

(29)

Autoencoder for DNN Pre-Training

Greedy layer-wise pre-training again

Target network: input 784 → 1000 → 1000 → 500 → 10 output

Step 1: train an autoencoder 784 → 1000 → 784 that reconstructs 𝑥̃ from the input 𝑥, learning W1.

29

(30)

Autoencoder for DNN Pre-Training

Greedy layer-wise pre-training again

Target network: input 784 → 1000 → 1000 → 500 → 10 output

Step 2: fix W1 and map 𝑥 to 𝑎1, then train an autoencoder 1000 → 1000 → 1000 that reconstructs 𝑎̃1 from 𝑎1, learning W2.

30

(31)

Autoencoder for DNN Pre-Training

Greedy layer-wise pre-training again

Target network: input 784 → 1000 → 1000 → 500 → 10 output

Step 3: fix W1 and W2 and map 𝑥 to 𝑎1 and then 𝑎2, then train an autoencoder 1000 → 500 → 1000 that reconstructs 𝑎̃2 from 𝑎2, learning W3.

31

(32)

Autoencoder for DNN Pre-Training

Greedy layer-wise pre-training again

Target network: input 784 → 1000 → 1000 → 500 → 10 output

Step 4: stack W1, W2, W3, add a randomly initialised output layer W4 (500 → 10), and fine-tune the whole network via backprop.

32
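A compact sketch of the greedy layer-wise procedure on slides 29 to 32, assuming PyTorch (the dummy data, step counts, and sigmoid activation are illustrative choices):

```python
# Greedy layer-wise pre-training for a 784-1000-1000-500-10 network.
import torch
import torch.nn as nn

def pretrain_layer(inputs, in_dim, out_dim, steps=100):
    """Train a one-hidden-layer autoencoder on `inputs` and return its encoder."""
    enc, dec = nn.Linear(in_dim, out_dim), nn.Linear(out_dim, in_dim)
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
    for _ in range(steps):
        recon = dec(torch.sigmoid(enc(inputs)))
        loss = ((inputs - recon) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return enc

x = torch.rand(256, 784)                    # dummy unlabeled data
enc1 = pretrain_layer(x, 784, 1000)         # learn W1 by reconstructing x
h1 = torch.sigmoid(enc1(x)).detach()        # fix W1, compute a1
enc2 = pretrain_layer(h1, 1000, 1000)       # learn W2 by reconstructing a1
h2 = torch.sigmoid(enc2(h1)).detach()       # fix W2, compute a2
enc3 = pretrain_layer(h2, 1000, 500)        # learn W3 by reconstructing a2

model = nn.Sequential(enc1, nn.Sigmoid(),   # stack the pre-trained layers ...
                      enc2, nn.Sigmoid(),
                      enc3, nn.Sigmoid(),
                      nn.Linear(500, 10))   # ... add a randomly initialised W4,
# then fine-tune the whole network on labelled data via backprop.
```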

(33)

Representation Learning and Generation

Variational Autoencoder

33

(34)

Generation from Latent Codes

𝑥 → encode (𝑊) → 𝑎 → decode (𝑊′) → 𝑦

How can we set a latent code 𝑎 for generation?

34

(35)

Latent Code Distribution Constraints

◉ Constrain the distribution of the learned latent codes

◉ Generate the latent code by sampling from a prior distribution

𝑥 → encode → 𝑎 → decode → 𝑦 (at generation time, 𝑎 is sampled from the prior instead)

35
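A minimal sketch of this constraint in the style of a variational autoencoder, assuming PyTorch (the 20-dimensional Gaussian code, the KL weighting, and the squared-error reconstruction term are assumptions; the slide does not fix these details):

```python
# Constrain codes toward a prior (standard normal) so new data can be generated
# by sampling a code from the prior and decoding it.
import torch
import torch.nn as nn

enc = nn.Linear(784, 2 * 20)        # predicts mean and log-variance of a 20-dim code
dec = nn.Linear(20, 784)

x = torch.rand(64, 784)
mu, logvar = enc(x).chunk(2, dim=1)
a = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)      # sample a code (reparameterisation)
y = torch.sigmoid(dec(a))                                    # reconstruction

recon = ((x - y) ** 2).mean()                                # keep y close to x
kl = -0.5 * torch.mean(1 + logvar - mu ** 2 - logvar.exp())  # pull codes toward N(0, I)
loss = recon + kl

new_sample = torch.sigmoid(dec(torch.randn(1, 20)))          # generation: decode a prior sample
```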

(36)

(Figure: reconstruction results of an AE vs. a VAE)

36

(37)

Representation Learning by Weak Labels

Distant Supervision

37

(38)

Convolutional Deep Structured Semantic Models (CDSSM/DSSM)

(Architecture: word sequence x → word hashing layer lh (matrix Wh, 20K dimensions per word) → convolutional layer lc (matrix Wc, 1000 dimensions) → max pooling layer lm (300 dimensions) → semantic layer y (matrix Ws, 300 dimensions); the query Q and the documents D1 … Dn are each encoded this way, scored by CosSim(Q, Di), and converted to P(Di | Q))

Huang et al., "Learning deep structured semantic models for web search using clickthrough data," in Proc. of CIKM, 2013.

Shen et al., "Learning semantic representations using convolutional neural networks for web search," in Proc. of WWW, 2014.

Training maximizes the likelihood of the clicked documents given the queries.

Semantically related documents are close to the query in the encoded space

38
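A sketch of the matching step described above, assuming PyTorch (the 300-dimensional vectors are random stand-ins for the encoder outputs; the smoothing factor and the clicked-document index are assumptions):

```python
# Score documents against a query in the shared semantic space.
import torch
import torch.nn.functional as F

q = torch.randn(300)                     # semantic vector y of the query Q
docs = torch.randn(4, 300)               # semantic vectors of candidate documents D1..Dn

cos = F.cosine_similarity(q.unsqueeze(0), docs, dim=1)   # CosSim(Q, Di)
p = F.softmax(5.0 * cos, dim=0)          # P(Di | Q); 5.0 is an assumed smoothing factor
loss = -torch.log(p[0])                  # maximize likelihood of the clicked document (index 0 assumed)
```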

(39)

Representation Learning by Different Tasks

Multi-Task Learning

39

(40)

Task-Shared Representation

(Diagram: a shared representation feeds separate branches for Task 1 and Task 2)

The latent factors can be learned by different tasks

40
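A minimal sketch of a task-shared representation, assuming PyTorch (the dimensions and the two classification heads are illustrative): both task losses back-propagate into the same shared encoder, so its latent factors are learned from both tasks.

```python
# One shared encoder, one output head per task.
import torch
import torch.nn as nn

shared = nn.Sequential(nn.Linear(300, 128), nn.ReLU())   # task-shared representation
head1 = nn.Linear(128, 10)                               # Task 1 output layer
head2 = nn.Linear(128, 2)                                # Task 2 output layer

x1, y1 = torch.randn(32, 300), torch.randint(0, 10, (32,))   # batch for Task 1
x2, y2 = torch.randn(32, 300), torch.randint(0, 2, (32,))    # batch for Task 2

ce = nn.CrossEntropyLoss()
loss = ce(head1(shared(x1)), y1) + ce(head2(shared(x2)), y2)  # both tasks update `shared`
```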

(41)

Semi-Supervised Multi-Task SLU

(Lan et al., 2018)

◉ Idea: language understanding objective can enhance other tasks

41

O. Lan, S. Zhu, and K. Yu, “Semi-supervised Training using Adversarial Multi-task Learning for Spoken Language Understanding,” in Proceedings of ICASSP, 2018.

Slot Tagging Model

BLM exploits the unsupervised knowledge; the shared-private framework and adversarial training make the slot tagging model more generalizable.

(42)

MT-DNN

(Liu et al., 2019)

42

https://github.com/namisan/mt-dnn

(43)

Concluding Remarks

◉ Labeling data is expensive, but we have large unlabeled data

◉ Autoencoder

○ exploits unlabeled data to learn latent factors as representations

○ learned representations can be transferred to other tasks

◉ Distant Labels / Labels from Other Tasks

○ learn the representations that are useful for other tasks

○ learned representations may be also useful for the target task

43
