Machine Learning Foundations (機器學習基石)
Lecture 3: Types of Learning
Hsuan-Tien Lin (林軒田), htlin@csie.ntu.edu.tw
Department of Computer Science & Information Engineering
National Taiwan University (國立台灣大學資訊工程系)
Types of Learning
Roadmap
1 When Can Machines Learn?
Lecture 2: Learning to Answer Yes/No
PLA A takes linearly separable D and perceptrons H to get hypothesis g
Lecture 3: Types of Learning
• Learning with Different Output Space Y
• Learning with Different Data Label y_n
• Learning with Different Protocol f ⇒ (x_n, y_n)
• Learning with Different Input Space X
2 Why Can Machines Learn?
3 How Can Machines Learn?
4 How Can Machines Learn Better?
Types of Learning Learning with Different Output Space Y
Credit Approval Problem Revisited
age                23 years
gender             female
annual salary      NTD 1,000,000
year in residence  1 year
year in job        0.5 year
current debt       200,000
credit? {no(−1), yes(+1)}
unknown target function f : X → Y (ideal credit approval formula)
training examples D : (x_1, y_1), · · · , (x_N, y_N) (historical records in bank)
learning algorithm A
final hypothesis g ≈ f (‘learned’ formula to be used)
hypothesis set H (set of candidate formulas)
Y = {−1, +1}: binary classification
More Binary Classification Problems
• credit: approve/disapprove
• email: spam/non-spam
• patient: sick/not sick
• ad: profitable/not profitable
• answer: correct/incorrect (KDDCup 2010)
core and important problem with many tools as building block of other tools
Multiclass Classification: Coin Recognition Problem
(figure: coins 1c, 5c, 10c, 25c plotted by Size and Mass)
• classify US coins (1c, 5c, 10c, 25c) by (size, mass)
• Y = {1c, 5c, 10c, 25c}, or Y = {1, 2, · · · , K} (abstractly)
• binary classification: special case with K = 2
Other Multiclass Classification Problems
• written digits ⇒ 0, 1, · · · , 9
• pictures ⇒ apple, orange, strawberry
• emails ⇒ spam, primary, social, promotion, update (Google)
many applications in practice, especially for ‘recognition’
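As a minimal sketch of multiclass classification on the coin problem, the following assigns a coin to the class whose average (size, mass) is closest. The measurements are made-up illustrative numbers, and the nearest-centroid rule (with the hypothetical helpers `fit_centroids` and `predict`) is just one simple choice of hypothesis, not a method stated in the lecture.

```python
import math

# Hypothetical (size_mm, mass_g) measurements with labels; numbers are illustrative only.
TRAIN = [
    ((19.0, 2.5), "1c"), ((19.1, 2.6), "1c"),
    ((21.2, 5.0), "5c"), ((21.3, 5.1), "5c"),
    ((17.9, 2.2), "10c"), ((18.0, 2.3), "10c"),
    ((24.2, 5.6), "25c"), ((24.3, 5.7), "25c"),
]

def fit_centroids(examples):
    """Average the feature vectors of each class."""
    sums = {}
    for (size, mass), label in examples:
        s = sums.setdefault(label, [0.0, 0.0, 0])
        s[0] += size
        s[1] += mass
        s[2] += 1
    return {label: (s[0] / s[2], s[1] / s[2]) for label, s in sums.items()}

def predict(centroids, x):
    """Return the label whose centroid is nearest to x (one of K classes)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))

centroids = fit_centroids(TRAIN)
print(predict(centroids, (24.0, 5.5)))
```

With K = 2 labels in `TRAIN`, the same code would be a binary classifier, which matches the "special case" remark above.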
Regression: Patient Recovery Prediction Problem
• binary classification: patient features ⇒ sick or not
• multiclass classification: patient features ⇒ which type of cancer
• regression: patient features ⇒ how many days before recovery
• Y = R or Y = [lower, upper] ⊂ R (bounded regression); deeply studied in statistics
Other Regression Problems
• company data ⇒ stock price
• climate data ⇒ temperature
also core and important with many ‘statistical’ tools as building block of other tools
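To make the real-valued output space concrete, here is a small sketch that fits a line by least squares and then clips the prediction into [lower, upper] for the bounded case. The data, the single "severity score" feature, and the bounds are all hypothetical; least squares is used here only as a familiar statistical tool, not as the lecture's prescribed method.

```python
# Hypothetical data: one patient feature (a severity score) and days before recovery.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

def fit_line(xs, ys):
    """Least-squares fit of y ≈ w * x + b for one real-valued feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b

def predict_bounded(w, b, x, lower=0.0, upper=365.0):
    """Bounded regression: clip the real-valued output into [lower, upper]."""
    return max(lower, min(upper, w * x + b))

w, b = fit_line(xs, ys)
print(predict_bounded(w, b, 5.0))
```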
Structured Learning: Sequence Tagging Problem
I (pronoun) love (verb) ML (noun)
• multiclass classification: word ⇒ word class
• structured learning: sentence ⇒ structure (class of each word)
• Y = {PVN, PVP, NVN, PV, · · · }, not including VVVVV
• huge multiclass classification problem (structure ≡ hyperclass) without ‘explicit’ class definition
Other Structured Learning Problems
• protein data ⇒ protein folding
• speech data ⇒ speech parse tree
a fancy but complicated learning problem
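A rough count shows why sequence tagging is a "huge" multiclass problem: with K word classes and a sentence of L words there are K^L candidate tag sequences before invalid ones (such as VVVVV) are ruled out. The sketch below assumes only the three classes P, V, N seen on the slide and ignores grammatical validity; `candidate_structures` is a hypothetical helper name.

```python
from itertools import product

WORD_CLASSES = ["P", "V", "N"]  # pronoun, verb, noun (labels taken from the slide)

def candidate_structures(sentence_length):
    """Enumerate every tag sequence of the given length, valid or not."""
    return ["".join(tags) for tags in product(WORD_CLASSES, repeat=sentence_length)]

three_word = candidate_structures(3)
print(len(three_word))           # number of candidates for a 3-word sentence
print("PVN" in three_word)       # the tagging of 'I love ML' is one candidate
print(len(WORD_CLASSES) ** 10)   # number of candidates for a 10-word sentence
```

Listing every such hyperclass explicitly quickly becomes impractical, which is the sense in which the class definition is not "explicit".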
Mini Summary
Learning with Different Output Space Y
• binary classification: Y = {−1, +1}
• multiclass classification: Y = {1, 2, · · · , K}
• regression: Y = R
• structured learning: Y = structures
• . . . and a lot more!!
(learning flow: unknown target function f : X → Y; training examples D : (x_1, y_1), · · · , (x_N, y_N); learning algorithm A; hypothesis set H; final hypothesis g ≈ f)
core tools: binary classification and regression
Fun Time
What is this learning problem?
The entrance system of the school gym, which does automatic face recognition based on machine learning, is built to charge four different groups of users differently: Staff, Student, Professor, Other. What type of learning problem best fits the need of the system?
1 binary classification
2 multiclass classification
3 regression
4 structured learning
Reference Answer: 2
There is an ‘explicit’ Y that contains four classes.
Types of Learning Learning with Different Data Label yn
Supervised: Coin Recognition Revisited
(figure: coins 1c, 5c, 10c, 25c plotted by Size and Mass)
(learning flow: unknown target function f : X → Y; training examples D : (x_1, y_1), · · · , (x_N, y_N); learning algorithm A; hypothesis set H; final hypothesis g ≈ f)
supervised learning: every x_n comes with corresponding y_n
Unsupervised: Coin Recognition without y_n
(figure: two Size vs. Mass plots of the coins; with labels: supervised multiclass classification; without labels: unsupervised multiclass classification ⇐⇒ ‘clustering’)
Other Clustering Problems
• articles ⇒ topics
• consumer profiles ⇒ consumer groups
clustering: a challenging but useful problem
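One common way to attempt clustering is k-means, sketched below on coin measurements with the labels removed. The lecture does not name a specific algorithm, so k-means, the choice k = 4, the made-up data, and the hand-picked initial centers are all assumptions made for illustration.

```python
import math

# Hypothetical unlabeled (size_mm, mass_g) measurements; no y_n is available.
POINTS = [
    (19.0, 2.5), (19.1, 2.6), (21.2, 5.0), (21.3, 5.1),
    (17.9, 2.2), (18.0, 2.3), (24.2, 5.6), (24.3, 5.7),
]

def kmeans(points, centers, rounds=10):
    """Alternate between assigning points to centers and re-averaging centers."""
    clusters = [[] for _ in centers]
    for _ in range(rounds):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters

# Arbitrary initial centers: every second point of the pool.
centers, clusters = kmeans(POINTS, centers=POINTS[::2])
for center, cluster in zip(centers, clusters):
    print(center, cluster)
```

The output groups are unnamed: the algorithm can separate the coins into groups, but without y_n it cannot say which group is the 25c coin.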
Unsupervised: Learning without y_n
Other Unsupervised Learning Problems
• clustering: {x_n} ⇒ cluster(x) (≈ ‘unsupervised multiclass classification’), e.g. articles ⇒ topics
• density estimation: {x_n} ⇒ density(x) (≈ ‘unsupervised bounded regression’), e.g. traffic reports with location ⇒ dangerous areas
• outlier detection: {x_n} ⇒ unusual(x) (≈ extreme ‘unsupervised binary classification’), e.g. Internet logs ⇒ intrusion alert
• . . . and a lot more!!
unsupervised learning: diverse, with possibly very different performance goals
Semi-supervised: Coin Recognition with Some y_n
(figure: three Size vs. Mass plots of the coins, captioned supervised, semi-supervised, and unsupervised (clustering))
Other Semi-supervised Learning Problems
• face images with a few labeled ⇒ face identifier (Facebook)
• medicine data with a few labeled ⇒ medicine effect predictor
semi-supervised learning: leverage unlabeled data to avoid ‘expensive’ labeling
Reinforcement Learning
a ‘very different’ but natural way of learning
Teach Your Dog: Say ‘Sit Down’
The dog pees on the ground.
BAD DOG. THAT’S A VERY WRONG ACTION.
• cannot easily show the dog that y_n = sit when x_n = ‘sit down’
• but can ‘punish’ to say ỹ_n = pee is wrong
Other Reinforcement Learning Problems Using (x, ỹ, goodness)
• (customer, ad choice, ad click earning) ⇒ ad system
• (cards, strategy, winning amount) ⇒ black jack agent
reinforcement: learn with ‘partial/implicit information’ (often sequentially)
Teach Your Dog: Say ‘Sit Down’
The dog sits down.
Good Dog. Let me give you some cookies.
• still cannot show y_n = sit when x_n = ‘sit down’
• but can ‘reward’ to say ỹ_n = sit is good
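The (x, ỹ, goodness) idea can be sketched for the ad-system example: the learner never sees the "correct" ad y_n, only how good the ad it happened to show turned out to be. The log entries, customer types, and helper names below are hypothetical, and the greedy averaging rule is a deliberate simplification (practical systems also need to explore ads they have not tried).

```python
from collections import defaultdict

# Hypothetical log of (customer type x, ad shown ỹ, click earning as goodness).
LOG = [
    ("student", "game_ad", 1.0), ("student", "car_ad", 0.0),
    ("student", "game_ad", 0.5), ("parent", "car_ad", 2.0),
    ("parent", "game_ad", 0.0), ("parent", "car_ad", 1.0),
]

def average_goodness(log):
    """Average the observed goodness for every (x, ỹ) pair that was tried."""
    totals = defaultdict(lambda: [0.0, 0])
    for x, action, goodness in log:
        totals[(x, action)][0] += goodness
        totals[(x, action)][1] += 1
    return {pair: total / count for pair, (total, count) in totals.items()}

def choose_ad(averages, x):
    """Pick the ad with the best average goodness seen so far for customer x."""
    tried = {action: avg for (cx, action), avg in averages.items() if cx == x}
    return max(tried, key=tried.get)

averages = average_goodness(LOG)
print(choose_ad(averages, "student"), choose_ad(averages, "parent"))
```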
Mini Summary
Learning with Different Data Label y_n
• supervised: all y_n
• unsupervised: no y_n
• semi-supervised: some y_n
• reinforcement: implicit y_n by goodness(ỹ_n)
• . . . and more!!
(learning flow: unknown target function f : X → Y; training examples D : (x_1, y_1), · · · , (x_N, y_N); learning algorithm A; hypothesis set H; final hypothesis g ≈ f)
core tool: supervised learning
Fun Time
What is this learning problem?
To build a tree recognition system, a company decides to gather one million pictures from the Internet. Then, it asks each of the 10 company members to view 100 pictures and record whether each picture contains a tree. The pictures and records are then fed to a learning algorithm to build the system. What type of learning problem does the algorithm need to solve?
1 supervised
2 unsupervised
3 semi-supervised
4 reinforcement
Reference Answer: 3
The 1,000 records are the labeled (x_n, y_n); the other 999,000 pictures are the unlabeled x_n.
Types of Learning Learning with Different Protocol f ⇒ (x_n, y_n)
Batch Learning: Coin Recognition Revisited
(figure: coins 1c, 5c, 10c, 25c plotted by Size and Mass)
(learning flow: unknown target function f : X → Y; training examples D : (x_1, y_1), · · · , (x_N, y_N); learning algorithm A; hypothesis set H; final hypothesis g ≈ f)
batch supervised multiclass classification: learn from all known data
More Batch Learning Problems
(figures: coin plots by Size and Mass, with and without labels)
• batch of (email, spam?) ⇒ spam filter
• batch of (patient, cancer) ⇒ cancer classifier
• batch of patient data ⇒ group of patients
batch learning: a very common protocol
Online: Spam Filter that ‘Improves’
• batch spam filter: learn with known (email, spam?) pairs, and predict with fixed g
• online spam filter, which sequentially:
  1 observe an email x_t
  2 predict spam status with current g_t(x_t)
  3 receive ‘desired label’ y_t from user, and then update g_t with (x_t, y_t)
Connection to What We Have Learned
• PLA can be easily adapted to online protocol (how?)
• reinforcement learning is often done online (why?)
online: hypothesis ‘improves’ through receiving data instances sequentially
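One possible answer to the "(how?)" for PLA: apply the usual mistake-driven correction to each example as it arrives instead of cycling over a stored data set. The sketch below follows the observe, predict, update steps above; the class name `OnlinePerceptron`, the two-dimensional features, and the tiny stream are hypothetical.

```python
def sign(value):
    """Return +1 for positive values and -1 otherwise."""
    return 1 if value > 0 else -1

class OnlinePerceptron:
    def __init__(self, dimension):
        # Weight vector with an extra leading entry for the bias term.
        self.w = [0.0] * (dimension + 1)

    def predict(self, x):
        """Predict with the current hypothesis g_t."""
        return sign(sum(wi * xi for wi, xi in zip(self.w, [1.0] + list(x))))

    def update(self, x, y):
        """PLA correction: change w only when the current prediction is a mistake."""
        if self.predict(x) != y:
            self.w = [wi + y * xi for wi, xi in zip(self.w, [1.0] + list(x))]

# Hypothetical stream of (x_t, y_t); y_t = +1 means spam.
stream = [((2.0, 0.0), 1), ((0.0, 2.0), -1), ((3.0, 1.0), 1), ((1.0, 3.0), -1)]
g = OnlinePerceptron(dimension=2)
mistakes = 0
for x_t, y_t in stream:            # step 1: observe x_t
    if g.predict(x_t) != y_t:      # step 2: predict with the current g_t
        mistakes += 1
    g.update(x_t, y_t)             # step 3: receive y_t and update g_t
print(mistakes, g.w)
```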
Active Learning: Learning by ‘Asking’
Protocol ⇔ Learning Philosophy
• batch: ‘duck feeding’
• online: ‘passive sequential’
• active: ‘question asking’ (sequentially): query the y_n of the chosen x_n
(learning flow: unknown target function f : X → Y; training examples D : (x_1, y_1), · · · , (x_N, y_N); learning algorithm A; hypothesis set H; final hypothesis g ≈ f)
active: improve hypothesis with fewer labels (hopefully) by asking questions strategically
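A textbook-style illustration of "fewer labels by asking strategically" is learning a one-dimensional threshold: if the pool is sorted and the labels switch from −1 to +1 once, binary search locates the switch with about log2(N) label queries instead of N. The sketch assumes exactly that noise-free setting; the pool, the hidden target, and the function name are hypothetical.

```python
def learn_threshold_actively(pool, ask_label):
    """Find the first pool index labeled +1 by querying labels via binary search.

    Assumes pool is sorted and labels switch from -1 to +1 at most once.
    Returns (index of the first +1 point, number of label queries).
    """
    low, high = 0, len(pool)    # the first +1 index lies in [low, high]
    queries = 0
    while low < high:
        mid = (low + high) // 2
        queries += 1            # one question asked about the chosen x_n
        if ask_label(pool[mid]) == 1:
            high = mid          # the switch is at mid or to its left
        else:
            low = mid + 1       # the switch is to the right of mid
    return low, queries

# Hypothetical pool of 1,000 unlabeled points and a hidden target f.
pool = list(range(1000))

def hidden_f(x):
    return 1 if x >= 637 else -1

boundary, queries = learn_threshold_actively(pool, hidden_f)
print(boundary, queries)
```

A batch learner would need all 1,000 labels to locate the same switch with certainty; here the learner chooses which x_n to ask about.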
Mini Summary
Learning with Different Protocol f ⇒ (x_n, y_n)
• batch: all known data
• online: sequential (passive) data
• active: strategically-observed data
• . . . and more!!
(learning flow: unknown target function f : X → Y; training examples D : (x_1, y_1), · · · , (x_N, y_N); learning algorithm A; hypothesis set H; final hypothesis g ≈ f)
core protocol: batch
Fun Time
What is this learning problem?
A photographer has 100,000 pictures, each containing one baseball player. He wants to automatically categorize the pictures by the player shown. He starts by categorizing 1,000 pictures by himself, and then writes an algorithm that tries to categorize the other pictures if it is ‘confident’ on the category while pausing for (& learning from) human input if not. What protocol best describes the nature of the algorithm?
1 batch
2 online
3 active
4 random
Reference Answer: 3
The algorithm takes an active but naïve strategy: ask when ‘confused’. You should probably do the same when taking a class. :-)
Types of Learning Learning with Different Input Space X
Credit Approval Problem Revisited
age                23 years
gender             female
annual salary      NTD 1,000,000
year in residence  1 year
year in job        0.5 year
current debt       200,000
unknown target function f : X → Y (ideal credit approval formula)
training examples D : (x_1, y_1), · · · , (x_N, y_N) (historical records in bank)
learning algorithm A
final hypothesis g ≈ f (‘learned’ formula to be used)
hypothesis set H (set of candidate formulas)
concrete features: each dimension of X ⊆ R^d represents ‘sophisticated physical meaning’
More on Concrete Features
• (size, mass) for coin classification
• customer info for credit approval
• patient info for cancer diagnosis
• often including ‘human intelligence’ on the learning task
(figure: coins 1c, 5c, 10c, 25c plotted by Size and Mass)
concrete features: the ‘easy’ ones for ML
Raw Features: Digit Recognition Problem (1/2)
• digit recognition problem: features ⇒ meaning of digit
• a typical supervised multiclass classification problem
Raw Features: Digit Recognition Problem (2/2)
by Concrete Features
• x = (symmetry, density)
by Raw Features
• 16 by 16 gray image x ≡ (0, 0, 0.9, 0.6, · · · ) ∈ R^256
• ‘simple physical meaning’; thus more difficult for ML than concrete features
Other Problems with Raw Features
• image pixels, speech signal, etc.
raw features: often need human or machines to convert to concrete ones
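The conversion from raw to concrete features can be sketched as follows. The slides name the concrete features (symmetry, density) but do not define them, so the two formulas below are assumed definitions chosen for illustration, and the 4-by-4 image is a hypothetical stand-in for the 16-by-16 case.

```python
def flatten(image):
    """Raw features: concatenate the rows of a gray image into one vector."""
    return [pixel for row in image for pixel in row]

def density(image):
    """Assumed definition: average intensity over all pixels."""
    pixels = flatten(image)
    return sum(pixels) / len(pixels)

def symmetry(image):
    """Assumed definition: negative mean absolute difference to the left-right mirror."""
    diffs = [abs(p - q) for row in image for p, q in zip(row, reversed(row))]
    return -sum(diffs) / len(diffs)

# Hypothetical 4-by-4 gray image (a 16-by-16 image would give a vector in R^256).
image = [
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
]
raw_x = flatten(image)                            # 16 raw features
concrete_x = (symmetry(image), density(image))    # 2 concrete features
print(len(raw_x), concrete_x)
```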
Abstract Features: Rating Prediction Problem
Rating Prediction Problem (KDDCup 2011)
• given previous (userid, itemid, rating) tuples, predict the rating that some userid would give to itemid
• a regression problem with Y ⊆ R as rating and X ⊆ N × N as (userid, itemid)
• ‘no physical meaning’; thus even more difficult for ML
Other Problems with Abstract Features
• student ID in online tutoring system (KDDCup 2010)
• advertisement ID in online ad system
abstract: again need ‘feature conversion/extraction/construction’
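One simple form of feature conversion for abstract IDs is one-hot encoding: each ID becomes a binary vector with a single 1, so that the numeric value of the ID itself carries no accidental meaning. This is only one possible construction, not one prescribed by the lecture; the tuples and sizes below are hypothetical.

```python
def one_hot(index, size):
    """Binary vector of length `size` with a single 1 at position `index`."""
    vector = [0] * size
    vector[index] = 1
    return vector

def encode(userid, itemid, num_users, num_items):
    """Concatenate the one-hot codes of the two IDs into one feature vector."""
    return one_hot(userid, num_users) + one_hot(itemid, num_items)

# Hypothetical (userid, itemid, rating) tuples with 3 users and 4 items.
ratings = [(0, 2, 4.0), (1, 0, 3.5), (2, 3, 5.0)]
features = [encode(u, i, num_users=3, num_items=4) for u, i, _ in ratings]
targets = [r for _, _, r in ratings]   # real-valued y_n for regression
print(features[0], targets[0])
```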
Mini Summary
Learning with Different Input Space X
• concrete: sophisticated (and related) physical meaning
• raw: simple physical meaning
• abstract: no (or little) physical meaning
• . . . and more!!
(learning flow: unknown target function f : X → Y; training examples D : (x_1, y_1), · · · , (x_N, y_N); learning algorithm A; hypothesis set H; final hypothesis g ≈ f)
‘easy’ input: concrete
Fun Time
What features can be used?
Consider a problem of building an online image advertisement system that shows the users the most relevant images. What features can you choose to use?
1 concrete
2 concrete, raw
3 concrete, abstract
4 concrete, raw, abstract
Reference Answer: 4
concrete user features, raw image features, and maybe abstract user/image IDs