
# Machine Learning Foundations (機器學習基石)

### Lecture 3: Types of Learning

Hsuan-Tien Lin (林軒田) htlin@csie.ntu.edu.tw

### Department of Computer Science & Information Engineering, National Taiwan University (國立台灣大學資訊工程系)

Types of Learning

### 1 When Can Machines Learn?

A takes D and H to get g

### 4 How Can Machines Learn Better?

Types of Learning: Learning with Different Output Space Y

## Credit Approval Problem Revisited

[Figure: learning flow for the credit approval question 'credit?', from training examples (x_1, y_1), ..., (x_N, y_N)]

Y = {−1, +1}: binary classification

## More Binary Classification Problems

- credit ⇒ approve or not
- email ⇒ spam or not
- patient ⇒ sick or not
- correct/incorrect (KDDCup 2010)

Binary classification is a core and important problem, with many tools that serve as building blocks of other tools.

## Multiclass Classification: Coin Recognition Problem

[Figure: US coins (1c, 5c, 10c, 25c) plotted by size and mass]

- classify US coins (1c, 5c, 10c, 25c) by (size, mass)
- Y = {1c, 5c, 10c, 25c}, or abstractly Y = {1, 2, ..., K}
- binary classification: special case with K = 2

Other multiclass classification problems:

- written digits ⇒ 0, 1, ..., 9
- pictures ⇒ apple, orange, strawberry
- emails ⇒ spam, primary, social, promotion, update (Google)

Multiclass classification has many applications in practice, especially for 'recognition'.
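A minimal sketch of multiclass classification on the coin problem, assuming a nearest-centroid rule. The rule, the function names `fit_centroids` and `predict`, and the (size, mass) measurements are illustrative assumptions; the slides do not prescribe a method or give data.

```python
from math import dist

# Hypothetical (size in mm, mass in g) measurements; values are illustrative only.
TRAIN = [
    ((19.0, 2.5), "1c"), ((19.1, 2.6), "1c"),
    ((21.2, 5.0), "5c"), ((21.3, 5.1), "5c"),
    ((17.9, 2.3), "10c"), ((17.8, 2.2), "10c"),
    ((24.3, 5.7), "25c"), ((24.2, 5.6), "25c"),
]


def fit_centroids(examples):
    """Average the (size, mass) vectors of each class."""
    sums = {}
    for (size, mass), label in examples:
        total = sums.setdefault(label, [0.0, 0.0, 0])
        total[0] += size
        total[1] += mass
        total[2] += 1
    return {label: (t[0] / t[2], t[1] / t[2]) for label, t in sums.items()}


def predict(centroids, x):
    """Return the label in Y whose centroid is closest to x."""
    return min(centroids, key=lambda label: dist(centroids[label], x))


centroids = fit_centroids(TRAIN)
print(predict(centroids, (24.0, 5.6)))
```

With K = 2 classes the same code would act as a binary classifier, which matches the remark that binary classification is the special case K = 2.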

## Regression: Patient Recovery Prediction Problem

- binary classification: patient features ⇒ sick or not
- multiclass classification: patient features ⇒ which type of cancer
- regression: patient features ⇒ a real-valued recovery prediction
- Y = R, or Y = [lower, upper] ⊂ R (bounded regression); deeply studied in statistics

Other regression problems:

- company data ⇒ stock price
- climate data ⇒ temperature

Regression is also core and important, with many 'statistical' tools that serve as building blocks of other tools.
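As a rough illustration of Y = R versus bounded Y = [lower, upper], here is a sketch of one-dimensional least-squares regression with optional clipping. The choice of least squares, the names `fit_line` and `predict`, and the toy numbers are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x, lower=None, upper=None):
    """Predict a real value; clip to [lower, upper] for bounded regression."""
    slope, intercept = model
    y = slope * x + intercept
    if lower is not None:
        y = max(lower, y)
    if upper is not None:
        y = min(upper, y)
    return y


# Toy data (hypothetical): one feature and a real-valued target.
model = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(predict(model, 5.0), predict(model, 5.0, upper=10.0))
```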

## Structured Learning: Sequence Tagging Problem

[Example: the sentence 'I love ML', with each word tagged by its word class]

- multiclass classification: word ⇒ word class
- structured learning: sentence ⇒ structure (the class of every word)
- Y = {PVN, PVP, NVN, PV, ...}, not including VVVVV
- huge multiclass classification problem (structure ≡ hyperclass)

Other structured learning problems:

- protein data ⇒ protein folding
- speech data ⇒ speech parse tree

Structured learning is a fancy but complicated learning problem.
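To see why structured learning behaves like a 'huge' multiclass problem, the sketch below enumerates candidate tag sequences. It assumes the tags P, V, N stand for pronoun, verb, noun (the text does not spell this out), and `is_plausible` is a made-up toy rule that only mirrors the 'not including VVVVV' remark.

```python
from itertools import product

TAGS = ("P", "V", "N")  # assumed meaning: pronoun, verb, noun


def all_tag_sequences(length):
    """Every tag string of the given length: 3 ** length candidates."""
    return ["".join(seq) for seq in product(TAGS, repeat=length)]


def is_plausible(seq):
    """Toy rule (an assumption): reject sequences made only of verbs."""
    return set(seq) != {"V"}


for length in (3, 5, 10):
    print(length, len(TAGS) ** length)  # candidate count grows exponentially
```

Treating each whole sequence as one 'hyperclass' gives 27 classes for three words and 243 for five, which is why the structure itself has to be exploited rather than enumerated.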

## Mini Summary

Learning with Different Output Space Y:

- binary classification: Y = {−1, +1}
- multiclass classification: Y = {1, 2, ..., K}
- regression: Y = R
- structured learning: Y = structures
- ... and a lot more!!

[Figure: learning flow with training examples (x_1, y_1), ..., (x_N, y_N) and hypothesis set H]

core tools: binary classification and regression

## Fun Time

### What is this learning problem?

The entrance system of the school gym, which does automatic face recognition based on machine learning, is built to charge four different groups of users differently: Staff, Student, Professor, Other. What type of learning problem best fits the need of the system?

1. binary classification
2. multiclass classification
3. regression
4. structured learning

Answer: 2. There is an 'explicit' Y that contains four classes.

Types of Learning: Learning with Different Data Label y_n

## Supervised: Coin Recognition Revisited

[Figure: US coins plotted by size and mass, and the learning flow with training examples (x_1, y_1), ..., (x_N, y_N) and hypothesis set H]

supervised learning: every x_n comes with corresponding y_n

## Unsupervised: Coin Recognition without y_n

[Figure: two size-versus-mass plots of the coins, one with labels (1c, 5c, 10c, 25c) and one without labels, i.e. 'clustering']

Other clustering problems:

- articles ⇒ topics
- consumer profiles ⇒ consumer groups

clustering: a challenging but useful problem


## Unsupervised: Learning without y_n

- clustering: {x_n} ⇒ cluster(x) (≈ 'unsupervised multiclass classification'), e.g. articles ⇒ topics
- density estimation: {x_n} ⇒ density(x) (≈ 'unsupervised bounded regression'), e.g. traffic reports with location ⇒ dangerous areas
- outlier detection: {x_n} ⇒ unusual(x) (≈ extreme 'unsupervised binary classification'), e.g. Internet logs ⇒ intrusion alert
- ... and a lot more!!

unsupervised learning: diverse, with possibly very different performance goals
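A minimal sketch of the mapping {x_n} ⇒ cluster(x), assuming the k-means procedure (Lloyd's algorithm) with hand-picked initial centers. K-means is one possible clustering method chosen for this example, not one named in the slides; the points and the names `kmeans` and `cluster` are hypothetical.

```python
from math import dist


def kmeans(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and re-averaging."""
    centers = list(centers)
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda k: dist(p, centers[k]))
            groups[nearest].append(p)
        # Keep the old center when a group ends up empty.
        centers = [
            tuple(sum(coord) / len(group) for coord in zip(*group)) if group else centers[k]
            for k, group in enumerate(groups)
        ]
    return centers


def cluster(x, centers):
    """cluster(x): index of the center closest to x."""
    return min(range(len(centers)), key=lambda k: dist(x, centers[k]))


points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (5.0, 5.0), (5.2, 4.8), (4.8, 5.2)]
centers = kmeans(points, centers=[(0.0, 0.0), (6.0, 6.0)])
print([cluster(p, centers) for p in points])
```

No y_n appears anywhere in the code, which is the defining property of the unsupervised setting; the cluster indices carry no meaning beyond grouping.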

## Semi-supervised: Coin Recognition with Some y_n

[Figure: three size-versus-mass plots of the coins: all labeled, partly labeled, and unlabeled]

Other semi-supervised learning problems:

- face images with a few labeled ⇒ face identifier (Facebook)
- medicine data with a few labeled ⇒ medicine effect predictor

semi-supervised learning: leverage unlabeled data to avoid 'expensive' labeling

## Reinforcement Learning

a 'very different' but natural way of learning

### Teach Your Dog: Say 'Sit Down'

The dog pees on the ground.

- cannot easily show the dog that y_n = sit when x_n = 'sit down'
- but can 'punish' to say ỹ_n = pee is wrong

The dog sits down.

- still cannot show y_n = sit when x_n = 'sit down'
- but can 'reward' to say ỹ_n = sit is good

Other reinforcement learning problems:

- (cards, strategy, winning amount) ⇒ black jack agent

reinforcement: learn with 'implicit information' (often sequentially)

## Mini Summary

Learning with Different Data Label y_n:

- supervised: all y_n
- unsupervised: no y_n
- semi-supervised: some y_n
- reinforcement: implicit y_n by goodness(ỹ_n)
- ... and more!!

[Figure: learning flow with training examples (x_1, y_1), ..., (x_N, y_N) and hypothesis set H]

core tool: supervised learning

## Fun Time

### What is this learning problem?

To build a tree recognition system, a company decides to gather one million pictures on the Internet. Then, it asks each of the 10 company members to view 100 pictures and record whether each picture contains a tree. The pictures and records are then fed to a learning algorithm to build the system. What type of learning problem does the algorithm need to solve?

1. supervised
2. unsupervised
3. semi-supervised
4. reinforcement

Answer: 3. The 1,000 records are the labeled (x_n, y_n); the other 999,000 pictures are the unlabeled x_n.

Types of Learning: Learning with Different Protocol f ⇒ (x_n, y_n)

## Batch Learning: Coin Recognition Revisited

[Figure: US coins plotted by size and mass, and the learning flow with training examples (x_1, y_1), ..., (x_N, y_N)]

batch supervised multiclass classification: learn from all known data

## More Batch Learning Problems

[Figure: size-versus-mass plots of the coins, with and without labels]

- batch of (email, spam?) ⇒ spam filter
- batch of (patient, cancer) ⇒ cancer classifier
- batch of patient data ⇒ group of patients

batch learning: a very common protocol

## Online: Spam Filter that 'Improves'

- batch spam filter: learn with known (email, spam?) pairs, and predict with fixed g
- online spam filter, which sequentially:
  1. observes an email x_t
  2. predicts spam status with the current g_t(x_t)
  3. receives the desired label y_t from the user, and then updates g_t with (x_t, y_t)
- PLA can be easily adapted to online protocol (how?)
- reinforcement learning is often done online (why?)

online: hypothesis 'improves' through receiving data instances sequentially
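One possible answer to the 'how?' for PLA, sketched under assumptions: keep a weight vector, predict on each arriving instance, and apply the perceptron correction only when the prediction was wrong. The class `OnlinePerceptron`, the helper `run_online`, and the toy stream are hypothetical names and data for this example.

```python
def sign(value):
    """Return +1 for positive scores and -1 otherwise (ties count as -1)."""
    return 1 if value > 0 else -1


class OnlinePerceptron:
    def __init__(self, dim):
        self.w = [0.0] * (dim + 1)  # w[0] is the bias weight

    def predict(self, x):
        score = self.w[0] + sum(wi * xi for wi, xi in zip(self.w[1:], x))
        return sign(score)

    def update(self, x, y):
        """Perceptron correction w <- w + y * x, applied only on a mistake."""
        if self.predict(x) != y:
            self.w[0] += y
            for i, xi in enumerate(x, start=1):
                self.w[i] += y * xi


def run_online(learner, stream):
    """Predict on each (x_t, y_t) as it arrives, then learn from it; count mistakes."""
    mistakes = 0
    for x_t, y_t in stream:
        if learner.predict(x_t) != y_t:
            mistakes += 1
        learner.update(x_t, y_t)
    return mistakes


# Toy stream (hypothetical): label is +1 when the first feature is larger.
stream = [((2.0, 0.0), 1), ((0.0, 2.0), -1), ((3.0, 1.0), 1), ((1.0, 3.0), -1)]
g = OnlinePerceptron(dim=2)
print(run_online(g, stream))
```

Unlike the batch setting, the hypothesis here is never 'finished': each new instance may change the weights, so the filter keeps improving as data arrives.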

## Active Learning: Learning by 'Asking'

- batch: 'duck feeding'
- online: 'passive sequential'
- active: asks questions (sequentially), i.e. queries the y_n of a chosen x_n

[Figure: learning flow with training examples (x_1, y_1), ..., (x_N, y_N) and hypothesis set H]

active: improve hypothesis with fewer labels (hopefully) by asking questions strategically
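A small sketch of why strategic questions can need fewer labels, under a strong assumption: inputs are sorted numbers and the label flips from −1 to +1 at one unknown threshold. The learner then binary-searches for the boundary instead of asking for every label. The function `active_threshold` and the oracle are hypothetical.

```python
def active_threshold(pool, oracle):
    """Locate the first +1 point in a sorted pool by querying labels strategically.

    Assumes oracle(x) is -1 below some unknown threshold and +1 at or above it.
    Returns (index of the first +1 point, number of label queries used).
    """
    lo, hi = 0, len(pool)  # the first +1 index lies somewhere in [lo, hi]
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1  # one question asked of the labeler
        if oracle(pool[mid]) == 1:
            hi = mid
        else:
            lo = mid + 1
    return lo, queries


pool = list(range(100))
index, queries = active_threshold(pool, oracle=lambda x: 1 if x >= 37 else -1)
print(index, queries)
```

A passive learner would need labels for most of the 100 points to pin down the same boundary; here at most 7 queries suffice. Real problems rarely satisfy the single-threshold assumption, hence the 'hopefully' above.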

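## Mini Summary

Learning with Different Protocol f ⇒ (x_n, y_n):

- batch: all known data
- online: sequential (passive) data
- active: data obtained by asking questions strategically
- ... and more!!

[Figure: learning flow with training examples (x_1, y_1), ..., (x_N, y_N) and hypothesis set H]

core protocol: batch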

## Fun Time

### What is this learning problem?

A photographer has 100,000 pictures, each containing one baseball player. He wants to automatically categorize the pictures by the player in each. He starts by categorizing 1,000 pictures by himself, and then writes an algorithm that tries to categorize the other pictures if it is 'confident' on the category while pausing for (& learning from) human input if not. What protocol best describes the nature of the algorithm?

1. batch
2. online
3. active
4. random

Answer: 3. The algorithm takes an active but naïve strategy: it asks for human input when it is not 'confident'. Do the same when taking a class. :-)

Types of Learning: Learning with Different Input Space X

## Credit Approval Problem Revisited

[Figure: learning flow with training examples (x_1, y_1), ..., (x_N, y_N)]

concrete features: each dimension of X ⊆ R^d represents 'sophisticated physical meaning'

## More on Concrete Features

- (size, mass) for coin classification
- customer info for credit approval
- patient info for cancer diagnosis
- often including 'human intelligence'

[Figure: US coins plotted by size and mass]

concrete features: the 'easy' ones for ML

## Raw Features: Digit Recognition Problem (1/2)

- digit recognition problem: features ⇒ meaning of digit
- a typical supervised multiclass classification problem

## Raw Features: Digit Recognition Problem (2/2)

- 16 by 16 gray image: x ≡ (0, 0, 0.9, 0.6, ...) ∈ R^256
- 'simple physical meaning'; thus more difficult for ML than concrete features

Other problems with raw features:

- image pixels, speech signal, etc.

raw features: often need human or machines to convert to concrete ones
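A sketch of what converting raw features to concrete ones might look like for a 16 by 16 gray image. The two summary features used here, average intensity ('density') and left-right 'symmetry', are assumptions picked for illustration, as are the function names and the toy images.

```python
def flatten(image):
    """Raw features: every pixel intensity, row by row (256 values for 16 by 16)."""
    return [pixel for row in image for pixel in row]


def to_concrete(image):
    """Hand-designed features (assumed for illustration): (density, symmetry).

    density: mean intensity over all pixels.
    symmetry: negated mean absolute difference between the image and its
    left-right mirror, so a perfectly symmetric image scores 0.
    """
    n_pixels = sum(len(row) for row in image)
    density = sum(sum(row) for row in image) / n_pixels
    asymmetry = sum(
        abs(a - b) for row in image for a, b in zip(row, reversed(row))
    ) / n_pixels
    return density, -asymmetry


# Toy images (hypothetical): a centered vertical bar and a bar on the left edge.
centered = [[1.0 if col in (7, 8) else 0.0 for col in range(16)] for _ in range(16)]
left_edge = [[1.0 if col == 0 else 0.0 for col in range(16)] for _ in range(16)]
print(len(flatten(centered)), to_concrete(centered), to_concrete(left_edge))
```

The 256 raw numbers shrink to two values that a human chose because they seem meaningful for the task, which is the sense in which the conversion injects 'human intelligence'.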

## Abstract Features: Rating Prediction Problem

- given previous (userid, itemid, rating) tuples, predict the rating that some userid would give to itemid
- a regression problem with Y ⊆ R as rating and X as (userid, itemid) pairs
- 'no physical meaning'; thus even more difficult for ML

Other problems with abstract features:

- student ID in online tutoring system (KDDCup 2010)

abstract: again need 'feature conversion/extraction/construction'
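One simple way to picture 'feature construction' for abstract IDs, sketched under assumptions: replace each userid and itemid by the average rating seen for it, which does carry physical meaning. This is an illustrative baseline, not the method of the course or of any KDDCup entry; all names and ratings are hypothetical.

```python
from collections import defaultdict


def build_tables(ratings):
    """ratings: list of (userid, itemid, rating) tuples."""
    by_user, by_item = defaultdict(list), defaultdict(list)
    for user, item, rating in ratings:
        by_user[user].append(rating)
        by_item[item].append(rating)

    def mean(values):
        return sum(values) / len(values)

    user_avg = {user: mean(values) for user, values in by_user.items()}
    item_avg = {item: mean(values) for item, values in by_item.items()}
    overall = mean([rating for _, _, rating in ratings])
    return user_avg, item_avg, overall


def featurize(userid, itemid, user_avg, item_avg, overall):
    """Map abstract IDs to two concrete numbers; unseen IDs fall back to the overall mean."""
    return (user_avg.get(userid, overall), item_avg.get(itemid, overall))


ratings = [(1, 10, 5.0), (1, 11, 3.0), (2, 10, 4.0)]
user_avg, item_avg, overall = build_tables(ratings)
print(featurize(1, 10, user_avg, item_avg, overall))
```

The constructed pair could then be fed to any regression method, so the abstract-input problem is reduced to one with concrete inputs.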

## Mini Summary

Learning with Different Input Space X:

- concrete: sophisticated physical meaning
- raw: simple physical meaning
- abstract: no (or little) physical meaning
- ... and more!!

[Figure: learning flow with training examples (x_1, y_1), ..., (x_N, y_N) and hypothesis set H]

'easy' input: concrete

## Fun Time

### What features can be used?

Consider a problem of building an online image advertisement system that shows the users the most relevant images. What features can you choose to use?

1. concrete
2. concrete, raw
3. concrete, abstract
4. concrete, raw, abstract

Answer: 4. Concrete user features, raw image features, and maybe abstract user/image IDs.

## Summary

### 1 When Can Machines Learn?

### 4 How Can Machines Learn Better?
