Machine Learning Foundations (機器學習基石)

Lecture 3: Types of Learning

Hsuan-Tien Lin (林軒田), htlin@csie.ntu.edu.tw

Department of Computer Science & Information Engineering, National Taiwan University (國立台灣大學資訊工程系)

Roadmap

1 When Can Machines Learn?

Lecture 2: Learning to Answer Yes/No
PLA A takes linearly separable D and perceptrons H to get hypothesis g

Lecture 3: Types of Learning
Learning with Different Output Space Y
Learning with Different Data Label $y_n$
Learning with Different Protocol f ⇒ $(x_n, y_n)$
Learning with Different Input Space X

2 Why Can Machines Learn?

3 How Can Machines Learn?

4 How Can Machines Learn Better?


Types of Learning Learning with Different Output Space Y

Credit Approval Problem Revisited

age: 23 years
gender: female
annual salary: NTD 1,000,000
year in residence: 1 year
year in job: 0.5 year
current debt: 200,000

credit? {no(−1), yes(+1)}

unknown target function f : X → Y (ideal credit approval formula)
training examples D : $(x_1, y_1), \cdots, (x_N, y_N)$ (historical records in bank)
learning algorithm A
final hypothesis g ≈ f (‘learned’ formula to be used)
hypothesis set H (set of candidate formulas)

Y = {−1, +1}: binary classification


More Binary Classification Problems

credit: approve/disapprove
email: spam/non-spam
patient: sick/not sick
ad: profitable/not profitable
answer: correct/incorrect (KDDCup 2010)

core and important problem with many tools as building block of other tools


Multiclass Classification: Coin Recognition Problem

[Figure: coins labeled 1, 5, 10, 25 plotted over axes Size and Mass]

classify US coins (1c, 5c, 10c, 25c) by (size, mass)

Y = {1c, 5c, 10c, 25c}, or Y = {1, 2, · · · , K} (abstractly)

binary classification: special case with K = 2

Other Multiclass Classification Problems

written digits ⇒ 0, 1, · · · , 9

pictures ⇒ apple, orange, strawberry

emails ⇒ spam, primary, social, promotion, update (Google)

many applications in practice, especially for ‘recognition’


Regression: Patient Recovery Prediction Problem

binary classification: patient features ⇒ sick or not

multiclass classification: patient features ⇒ which type of cancer

regression: patient features ⇒ how many days before recovery

Y = R or Y = [lower, upper] ⊂ R (bounded regression); deeply studied in statistics

Other Regression Problems

company data ⇒ stock price

climate data ⇒ temperature

also core and important with many ‘statistical’ tools as building block of other tools


Structured Learning: Sequence Tagging Problem

$\underbrace{\text{I}}_{\text{pronoun}} \; \underbrace{\text{love}}_{\text{verb}} \; \underbrace{\text{ML}}_{\text{noun}}$

multiclass classification: word ⇒ word class

structured learning: sentence ⇒ structure (class of each word)

Y = {PVN, PVP, NVN, PV, · · · }, not including VVVVV

huge multiclass classification problem (structure ≡ hyperclass) without ‘explicit’ class definition

Other Structured Learning Problems

protein data ⇒ protein folding

speech data ⇒ speech parse tree

a fancy but complicated learning problem


Mini Summary

Learning with Different Output Space Y

binary classification: Y = {−1, +1}

multiclass classification: Y = {1, 2, · · · , K }

regression: Y = R

structured learning: Y = structures

. . .and a lot more!!

unknown target function f : X → Y
training examples D : $(x_1, y_1), \cdots, (x_N, y_N)$
learning algorithm A
final hypothesis g ≈ f
hypothesis set H

core tools: binary classification and regression
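As a rough illustration of how the output space Y changes what a hypothesis returns, the Python sketch below reuses one linear score for three of the cases above. The function names, the weights, and the argmax rule for the multiclass case are assumptions made for this example; they are not definitions taken from the lecture.

```python
from typing import Sequence


def score(w: Sequence[float], x: Sequence[float]) -> float:
    """Inner product of weights and features (a hypothetical linear hypothesis)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def predict_binary(w: Sequence[float], x: Sequence[float]) -> int:
    """Y = {-1, +1}: sign of the score; a score of 0 is mapped to -1 here."""
    return 1 if score(w, x) > 0 else -1


def predict_multiclass(ws: Sequence[Sequence[float]], x: Sequence[float]) -> int:
    """Y = {1, ..., K}: one weight vector per class, return the best-scoring class."""
    scores = [score(w, x) for w in ws]
    return scores.index(max(scores)) + 1


def predict_regression(w: Sequence[float], x: Sequence[float]) -> float:
    """Y = R: return the real-valued score itself."""
    return score(w, x)


# Illustrative values only.
x = [1.0, 2.0, -1.0]
w = [0.5, 1.0, 2.0]
ws = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]

print(predict_binary(w, x))       # a label in {-1, +1}
print(predict_multiclass(ws, x))  # a label in {1, 2, 3}
print(predict_regression(w, x))   # a real number
```

With K = 2 the multiclass rule reduces to a two-way choice, which matches the remark that binary classification is the special case K = 2.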


Fun Time

What is this learning problem?

The entrance system of the school gym, which does automatic face recognition based on machine learning, is built to charge four different groups of users differently: Staff, Student, Professor, Other. What type of learning problem best fits the need of the system?

1 binary classification
2 multiclass classification
3 regression
4 structured learning

Reference Answer: 2

There is an ‘explicit’ Y that contains four classes.


Types of Learning Learning with Different Data Label $y_n$

Supervised: Coin Recognition Revisited

[Figure: coins labeled 1, 5, 10, 25 plotted over axes Size and Mass]

unknown target function f : X → Y
training examples D : $(x_1, y_1), \cdots, (x_N, y_N)$
learning algorithm A
final hypothesis g ≈ f
hypothesis set H

supervised learning: every $x_n$ comes with corresponding $y_n$


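Unsupervised: Coin Recognition without $y_n$

[Figure, first panel: coins labeled 1, 5, 10, 25 over axes Size and Mass: supervised multiclass classification]

[Figure, second panel: coins without labels over axes Size and Mass: unsupervised multiclass classification ⇐⇒ ‘clustering’]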

Other Clustering Problems

articles ⇒ topics

consumer profiles ⇒ consumer groups

clustering: a challenging but useful problem
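The lecture does not prescribe a clustering method. One common choice is k-means (Lloyd's algorithm), sketched below on coin-like (size, mass) points. The measurements and the initial centres are made up for illustration, and the function name `kmeans` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], centers: Sequence[Point],
           iters: int = 10) -> Tuple[List[int], List[Point]]:
    """Alternate between assigning points to the nearest centre and re-averaging."""
    centers = [tuple(c) for c in centers]
    assign: List[int] = []
    for _ in range(iters):
        # Assignment step: index of the nearest centre for every point.
        assign = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        # Update step: each centre becomes the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[k])  # keep an empty cluster's centre
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return assign, centers


# Made-up (size, mass) measurements; no labels y_n are given.
points = [(19.0, 2.5), (19.1, 2.6), (21.2, 5.0), (21.3, 5.1),
          (17.9, 2.2), (18.0, 2.3), (24.2, 5.6), (24.3, 5.7)]
initial = [points[0], points[2], points[4], points[6]]  # assumed starting centres

labels, final_centers = kmeans(points, initial)
print(labels)
```

The cluster indices carry no meaning such as "1c" or "25c"; without $y_n$ the algorithm can only group similar coins, which is why clustering behaves like an unsupervised counterpart of multiclass classification.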



Unsupervised: Learning without $y_n$

Other Unsupervised Learning Problems

clustering: {$x_n$} ⇒ cluster(x) (≈ ‘unsupervised multiclass classification’), e.g. articles ⇒ topics

density estimation: {$x_n$} ⇒ density(x) (≈ ‘unsupervised bounded regression’), e.g. traffic reports with location ⇒ dangerous areas

outlier detection: {$x_n$} ⇒ unusual(x) (≈ extreme ‘unsupervised binary classification’), e.g. Internet logs ⇒ intrusion alert

. . .and a lot more!!

unsupervised learning: diverse, with possibly very different performance goals


Semi-supervised: Coin Recognition with Some $y_n$

[Figure, three panels over axes Size and Mass: supervised (all coins labeled 1, 5, 10, 25); semi-supervised (some coins labeled); unsupervised (clustering, no labels)]

Other Semi-supervised Learning Problems

face images with a few labeled ⇒ face identifier (Facebook)

medicine data with a few labeled ⇒ medicine effect predictor

semi-supervised learning: leverage unlabeled data to avoid ‘expensive’ labeling
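One simple way to leverage unlabeled data is self-training: fit on the few labeled points, guess labels for the rest, and refit on everything. The sketch below does this with a nearest-centroid rule. Both the rule and the (size, mass) numbers are assumptions chosen for illustration, not a method named in the lecture.

```python
import math
from typing import Dict, Hashable, List, Sequence, Tuple

Point = Tuple[float, ...]


def centroids(points: Sequence[Point], labels: Sequence[Hashable]) -> Dict[Hashable, Point]:
    """Mean point of every label group."""
    groups: Dict[Hashable, List[Point]] = {}
    for p, y in zip(points, labels):
        groups.setdefault(y, []).append(p)
    return {y: tuple(sum(c) / len(ps) for c in zip(*ps)) for y, ps in groups.items()}


def nearest(p: Point, cents: Dict[Hashable, Point]) -> Hashable:
    """Label of the centroid closest to p."""
    return min(cents, key=lambda y: math.dist(p, cents[y]))


def self_train(labeled: Sequence[Tuple[Point, Hashable]],
               unlabeled: Sequence[Point],
               rounds: int = 5) -> Dict[Hashable, Point]:
    """Refit centroids on labeled points plus pseudo-labeled unlabeled points."""
    xs = [p for p, _ in labeled]
    ys = [y for _, y in labeled]
    cents = centroids(xs, ys)
    for _ in range(rounds):
        pseudo = [nearest(p, cents) for p in unlabeled]         # guessed y_n
        cents = centroids(xs + list(unlabeled), ys + pseudo)    # refit on all data
    return cents


# Two labeled coins and four unlabeled ones (made-up measurements).
labeled = [((19.0, 2.5), 1), ((24.3, 5.7), 25)]
unlabeled = [(19.2, 2.6), (18.9, 2.4), (24.1, 5.6), (24.4, 5.8)]

cents = self_train(labeled, unlabeled)
print(nearest((19.5, 3.0), cents))
print(nearest((24.0, 5.5), cents))
```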


Reinforcement Learning

a ‘very different’ but natural way of learning

Teach Your Dog: Say ‘Sit Down’

The dog pees on the ground.

BAD DOG. THAT’S A VERY WRONG ACTION.

cannot easily show the dog that $y_n$ = sit when $x_n$ = ‘sit down’

but can ‘punish’ to say $\tilde{y}_n$ = pee is wrong



The dog sits down.

Good Dog. Let me give you some cookies.

still cannot show $y_n$ = sit when $x_n$ = ‘sit down’

but can ‘reward’ to say $\tilde{y}_n$ = sit is good

Other Reinforcement Learning Problems Using (x, $\tilde{y}$, goodness)

(customer, ad choice, ad click earning) ⇒ ad system
(cards, strategy, winning amount) ⇒ black jack agent

reinforcement: learn with ‘partial/implicit information’ (often sequentially)
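To make the (x, $\tilde{y}$, goodness) idea concrete, here is a minimal sketch of an ad system that only ever sees the goodness (click or no click) of the ad it chose. It uses an epsilon-greedy rule, which is one standard technique and not necessarily what the lecture has in mind; the click probabilities, the function name, and the parameters are all hypothetical, and the customer features x are ignored for brevity.

```python
import random
from typing import List, Sequence, Tuple


def run_ad_bandit(click_prob: Sequence[float], rounds: int = 5000,
                  epsilon: float = 0.1, seed: int = 0) -> Tuple[List[int], List[float]]:
    """Choose ads sequentially, observing only the reward of the chosen ad."""
    rng = random.Random(seed)
    counts = [0] * len(click_prob)      # how often each ad was shown
    values = [0.0] * len(click_prob)    # running average goodness per ad
    for _ in range(rounds):
        if rng.random() < epsilon:
            ad = rng.randrange(len(click_prob))                        # explore
        else:
            ad = max(range(len(click_prob)), key=lambda a: values[a])  # exploit
        # Simulated environment: the true click probabilities stay hidden from the learner.
        reward = 1.0 if rng.random() < click_prob[ad] else 0.0
        counts[ad] += 1
        values[ad] += (reward - values[ad]) / counts[ad]               # incremental mean
    return counts, values


counts, values = run_ad_bandit([0.1, 0.9])
print(counts, values)
```

The learner is never told which ad would have been best for a given round; it only learns how good its own choice turned out to be, which is the ‘partial/implicit information’ described above.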


Mini Summary

Learning with Different Data Label $y_n$

supervised: all $y_n$
unsupervised: no $y_n$
semi-supervised: some $y_n$
reinforcement: implicit $y_n$ by goodness($\tilde{y}_n$)
. . .and more!!

unknown target function f : X → Y
training examples D : $(x_1, y_1), \cdots, (x_N, y_N)$
learning algorithm A
final hypothesis g ≈ f
hypothesis set H

core tool: supervised learning


Fun Time

What is this learning problem?

To build a tree recognition system, a company decides to gather one million pictures on the Internet. Then, it asks each of the 10 company members to view 100 pictures and record whether each picture contains a tree. The pictures and records are then fed to a learning algorithm to build the system. What type of learning problem does the algorithm need to solve?

1 supervised
2 unsupervised
3 semi-supervised
4 reinforcement

Reference Answer: 3

The 1,000 records are the labeled $(x_n, y_n)$; the other 999,000 pictures are the unlabeled $x_n$.


Types of Learning Learning with Different Protocol f ⇒ $(x_n, y_n)$

Batch Learning: Coin Recognition Revisited

[Figure: coins labeled 1, 5, 10, 25 plotted over axes Size and Mass]

unknown target function f : X → Y
training examples D : $(x_1, y_1), \cdots, (x_N, y_N)$
learning algorithm A
final hypothesis g ≈ f
hypothesis set H

batch supervised multiclass classification: learn from all known data


More Batch Learning Problems

[Figure: two coin plots over axes Size and Mass, one with labels 1, 5, 10, 25 and one without labels]

batch of (email, spam?) ⇒ spam filter
batch of (patient, cancer) ⇒ cancer classifier
batch of patient data ⇒ group of patients

batch learning: a very common protocol


Online: Spam Filter that ‘Improves’

batch spam filter: learn with known (email, spam?) pairs, and predict with fixed g

online spam filter, which sequentially:

1 observe an email $x_t$
2 predict spam status with current $g_t(x_t)$
3 receive ‘desired label’ $y_t$ from user, and then update $g_t$ with $(x_t, y_t)$

Connection to What We Have Learned

PLA can be easily adapted to online protocol (how?)

reinforcement learning is often done online (why?)

online: hypothesis ‘improves’ through receiving data instances sequentially
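One possible answer to the "(how?)" above is sketched below: a perceptron that follows the three online steps, correcting its weights only when the current prediction disagrees with the label it receives. The class name, the two numeric email features, and the tiny stream are assumptions made for this example.

```python
from typing import List, Sequence


class OnlinePerceptron:
    """Perceptron updated one example at a time (a sketch of PLA in the online protocol)."""

    def __init__(self, dim: int) -> None:
        self.w: List[float] = [0.0] * (dim + 1)  # w[0] plays the role of the threshold

    def predict(self, x: Sequence[float]) -> int:
        s = self.w[0] + sum(wi * xi for wi, xi in zip(self.w[1:], x))
        return 1 if s > 0 else -1

    def update(self, x: Sequence[float], y: int) -> bool:
        """Apply the correction w <- w + y * x on a mistake; return True if one occurred."""
        if self.predict(x) == y:
            return False
        self.w[0] += y
        for i, xi in enumerate(x):
            self.w[i + 1] += y * xi
        return True


# Hypothetical stream of (features, label) pairs; +1 = spam, -1 = non-spam.
stream = [((2.0, 1.0), 1), ((-1.0, -2.0), -1), ((1.5, 0.5), 1), ((-2.0, -0.5), -1)]

g = OnlinePerceptron(dim=2)
mistakes = 0
for x_t, y_t in stream:
    guess = g.predict(x_t)     # steps 1 and 2: observe x_t, predict with the current g_t
    if g.update(x_t, y_t):     # step 3: receive y_t and update
        mistakes += 1
print(mistakes, g.w)
```

Unlike the batch version, this loop never revisits earlier examples; the hypothesis in use at time t reflects only the data seen so far.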


Active Learning: Learning by ‘Asking’

Protocol ⇔ Learning Philosophy

batch: ‘duck feeding’
online: ‘passive sequential’
active: ‘question asking’ (sequentially), query the $y_n$ of the chosen $x_n$

unknown target function f : X → Y
training examples D : $(x_1, y_1), \cdots, (x_N, y_N)$
learning algorithm A
final hypothesis g ≈ f
hypothesis set H

active: improve hypothesis with fewer labels (hopefully) by asking questions strategically
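A toy case shows why strategic questions can save labels. Suppose, as an assumption for this sketch, that the inputs are numbers and the target is a hidden threshold (−1 below it, +1 at or above it). Asking for the label of the middle candidate each time locates the threshold with far fewer queries than labeling every point. The function and the `oracle` callback are hypothetical names.

```python
from typing import Callable, Sequence, Tuple


def active_threshold(xs: Sequence[float],
                     oracle: Callable[[float], int]) -> Tuple[int, int]:
    """Find the index of the first +1 point by querying labels in binary-search order.

    Returns (index of first +1 point, number of label queries made).
    """
    xs = sorted(xs)
    lo, hi = 0, len(xs)          # the first +1 index lies somewhere in [lo, hi]
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1             # ask for the label of the chosen x only
        if oracle(xs[mid]) == 1:
            hi = mid
        else:
            lo = mid + 1
    return lo, queries


pool = list(range(100))                            # 100 unlabeled inputs
hidden_label = lambda x: 1 if x >= 37 else -1      # stands in for the human labeler

first_positive, queries = active_threshold(pool, hidden_label)
print(first_positive, queries)
```

Real problems rarely have such a clean structure, which is why the slide says "hopefully"; the sketch only illustrates the potential saving.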


Mini Summary

Learning with Different Protocol f ⇒ $(x_n, y_n)$

batch: all known data
online: sequential (passive) data
active: strategically-observed data
. . .and more!!

unknown target function f : X → Y
training examples D : $(x_1, y_1), \cdots, (x_N, y_N)$
learning algorithm A
final hypothesis g ≈ f
hypothesis set H

core protocol: batch


Fun Time

What is this learning problem?

A photographer has 100,000 pictures, each containing one baseball player. He wants to automatically categorize the pictures by the player in each. He starts by categorizing 1,000 pictures by himself, and then writes an algorithm that tries to categorize the other pictures if it is ‘confident’ on the category while pausing for (& learning from) human input if not. What protocol best describes the nature of the algorithm?

1 batch
2 online
3 active
4 random

Reference Answer: 3

The algorithm takes an active but naïve strategy: ask when ‘confused’. You should probably do the same when taking a class. :-)


Types of Learning Learning with Different Input Space X

Credit Approval Problem Revisited

age: 23 years
gender: female
annual salary: NTD 1,000,000
year in residence: 1 year
year in job: 0.5 year
current debt: 200,000

unknown target function f : X → Y (ideal credit approval formula)
training examples D : $(x_1, y_1), \cdots, (x_N, y_N)$ (historical records in bank)
learning algorithm A
final hypothesis g ≈ f (‘learned’ formula to be used)
hypothesis set H (set of candidate formulas)

concrete features: each dimension of X ⊆ $\mathbb{R}^d$ represents ‘sophisticated physical meaning’


More on Concrete Features

(size, mass) for coin classification
customer info for credit approval
patient info for cancer diagnosis

often including ‘human intelligence’ on the learning task

[Figure: coins labeled 1, 5, 10, 25 plotted over axes Size and Mass]

concrete features: the ‘easy’ ones for ML


Raw Features: Digit Recognition Problem (1/2)

digit recognition problem: features ⇒ meaning of digit

a typical supervised multiclass classification problem


Raw Features: Digit Recognition Problem (2/2)

by Concrete Features: x = (symmetry, density)

by Raw Features: 16 by 16 gray image, x ≡ (0, 0, 0.9, 0.6, · · · ) ∈ $\mathbb{R}^{256}$

‘simple physical meaning’; thus more difficult for ML than concrete features

Other Problems with Raw Features

image pixels, speech signal, etc.

raw features: often need humans or machines to convert them to concrete ones
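The conversion from raw to concrete features can be sketched as follows. The slides name (symmetry, density) but do not define them, so the formulas here are assumptions: density is taken as the average pixel intensity, and symmetry as the negated average absolute difference between the image and its left-right mirror (0 means perfectly symmetric).

```python
from typing import List, Sequence

Image = Sequence[Sequence[float]]


def raw_vector(img: Image) -> List[float]:
    """Raw features: flatten the gray image row by row (256 numbers for 16 by 16)."""
    return [v for row in img for v in row]


def density(img: Image) -> float:
    """Assumed definition: average pixel intensity."""
    pixels = raw_vector(img)
    return sum(pixels) / len(pixels)


def symmetry(img: Image) -> float:
    """Assumed definition: minus the mean absolute difference to the mirrored image."""
    diffs = [abs(v - m) for row in img for v, m in zip(row, reversed(row))]
    return -sum(diffs) / len(diffs)


def vertical_bar(columns: Sequence[int], size: int = 16) -> List[List[float]]:
    """Toy image: the given columns set to 1.0 in every row, everything else 0.0."""
    return [[1.0 if c in columns else 0.0 for c in range(size)] for _ in range(size)]


centered = vertical_bar([7, 8])   # symmetric about the vertical axis
skewed = vertical_bar([3])        # bar pushed to the left

print(len(raw_vector(centered)))
print(symmetry(centered), density(centered))
print(symmetry(skewed), density(skewed))
```

Two hand-designed numbers replace 256 pixel values, which is the sense in which concrete features include ‘human intelligence’ on the task.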


Abstract Features: Rating Prediction Problem

Rating Prediction Problem (KDDCup 2011)

given previous (userid, itemid, rating) tuples, predict the rating that some userid would give to itemid?

a regression problem with Y ⊆ R as rating and X ⊆ N × N as (userid, itemid)

‘no physical meaning’; thus even more difficult for ML

Other Problems with Abstract Features

student ID in online tutoring system (KDDCup 2010)

advertisement ID in online ad system

abstract: again need ‘feature conversion/extraction/construction’
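A very simple form of feature construction for abstract IDs is sketched below: each (userid, itemid) pair is replaced by the user's average past rating and the item's average past rating. This is only one illustrative choice, not the approach used in KDDCup 2011, and the tuples and function names are made up.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Rating = Tuple[int, int, float]  # (userid, itemid, rating)


def build_means(tuples: Sequence[Rating]) -> Tuple[float, Dict[int, float], Dict[int, float]]:
    """Overall, per-user, and per-item mean ratings from past tuples."""
    by_user: Dict[int, List[float]] = defaultdict(list)
    by_item: Dict[int, List[float]] = defaultdict(list)
    for u, i, r in tuples:
        by_user[u].append(r)
        by_item[i].append(r)
    overall = sum(r for _, _, r in tuples) / len(tuples)
    user_mean = {u: sum(rs) / len(rs) for u, rs in by_user.items()}
    item_mean = {i: sum(rs) / len(rs) for i, rs in by_item.items()}
    return overall, user_mean, item_mean


def features(u: int, i: int, overall: float,
             user_mean: Dict[int, float], item_mean: Dict[int, float]) -> Tuple[float, float]:
    """Constructed features for an abstract (userid, itemid) pair.

    Unseen users or items fall back to the overall mean.
    """
    return (user_mean.get(u, overall), item_mean.get(i, overall))


history = [(1, 10, 5.0), (1, 11, 3.0), (2, 10, 4.0)]  # made-up past ratings
overall, user_mean, item_mean = build_means(history)

print(features(2, 11, overall, user_mean, item_mean))
print(features(3, 10, overall, user_mean, item_mean))
```

The resulting pair of numbers has a physical meaning (how generous the user is, how well liked the item is), so it can be fed to a regression tool in place of the raw IDs.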


Mini Summary

Learning with Different Input Space X

concrete: sophisticated (and related) physical meaning
raw: simple physical meaning
abstract: no (or little) physical meaning
. . .and more!!

unknown target function f : X → Y
training examples D : $(x_1, y_1), \cdots, (x_N, y_N)$
learning algorithm A
final hypothesis g ≈ f
hypothesis set H

‘easy’ input: concrete


Fun Time

What features can be used?

Consider a problem of building an online image advertisement system that shows the users the most relevant images. What features can you choose to use?

1 concrete
2 concrete, raw
3 concrete, abstract
4 concrete, raw, abstract

Reference Answer: 4

concrete user features, raw image features, and maybe abstract user/image IDs


Summary

1 When Can Machines Learn?

Lecture 2: Learning to Answer Yes/No

Lecture 3: Types of Learning
Learning with Different Output Space Y: [classification], [regression], structured
Learning with Different Data Label $y_n$: [supervised], un/semi-supervised, reinforcement
Learning with Different Protocol f ⇒ $(x_n, y_n)$: [batch], online, active
Learning with Different Input Space X: [concrete], raw, abstract

next: learning is impossible?!

2 Why Can Machines Learn?

3 How Can Machines Learn?

4 How Can Machines Learn Better?
