Reinforcement Learning
a ‘very different’ but natural way of learning
Teach Your Dog: Say ‘Sit Down’
The dog pees on the ground.
BAD DOG. THAT’S A VERY WRONG ACTION.
• cannot easily show the dog that $y_n$ = sit when $\mathbf{x}_n$ = ‘sit down’
• but can ‘punish’ to say $\tilde{y}_n$ = pee is wrong
Other Reinforcement Learning Problems Using $(\mathbf{x}, \tilde{y}, \text{goodness})$
• (customer, ad choice, ad click earning) ⇒ ad system
• (cards, strategy, winning amount) ⇒ blackjack agent
reinforcement: learn with ‘partial/implicit information’ (often sequentially)

Types of Learning: Learning with Different Data Label $y_n$
Reinforcement Learning
a ‘very different’ but natural way of learning
Teach Your Dog: Say ‘Sit Down’
The dog sits down.
Good Dog. Let me give you some cookies.
• still cannot show $y_n$ = sit when $\mathbf{x}_n$ = ‘sit down’
• but can ‘reward’ to say $\tilde{y}_n$ = sit is good
Other Reinforcement Learning Problems Using $(\mathbf{x}, \tilde{y}, \text{goodness})$
• (customer, ad choice, ad click earning) ⇒ ad system
• (cards, strategy, winning amount) ⇒ blackjack agent
reinforcement: learn with ‘partial/implicit information’ (often sequentially)
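To make the (customer, ad choice, ad click earning) triple concrete, here is a minimal sketch of an epsilon-greedy ad chooser. It is a bandit-style simplification (customers carry no features), not the method of the lecture; the function name `run_ad_bandit`, the click probabilities, and the value of `epsilon` are all hypothetical. The learner never sees the ‘correct’ ad, only the goodness of the ad it chose.

```python
import random

def run_ad_bandit(click_probs, rounds=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy ad chooser that only observes the goodness of its own choice."""
    rng = random.Random(seed)
    n_ads = len(click_probs)
    shows = [0] * n_ads      # how many times each ad was shown
    value = [0.0] * n_ads    # running average of observed goodness per ad

    for _ in range(rounds):
        # x: a customer arrives (no features in this toy version)
        if rng.random() < epsilon:
            ad = rng.randrange(n_ads)                        # explore a random ad
        else:
            ad = max(range(n_ads), key=lambda a: value[a])   # exploit the best estimate
        # goodness: 1 if the customer clicks the ad that was shown, else 0
        reward = 1.0 if rng.random() < click_probs[ad] else 0.0
        shows[ad] += 1
        value[ad] += (reward - value[ad]) / shows[ad]        # incremental mean
    return value, shows

# Hypothetical click probabilities; the learner is never told these.
value, shows = run_ad_bandit([0.1, 0.5, 0.9])
best_ad = max(range(len(value)), key=lambda a: value[a])
print("estimated click rates:", [round(v, 2) for v in value])
print("preferred ad:", best_ad)
```

The feedback is implicit in the sense used above: a click says the chosen ad was good, but nothing says which ad would have been best.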
Mini Summary
Learning with Different Data Label $y_n$
• supervised: all $y_n$
• unsupervised: no $y_n$
• semi-supervised: some $y_n$
• reinforcement: implicit $y_n$ by goodness($\tilde{y}_n$)
• ... and more!!
[Figure: learning flow. Unknown target function $f\colon \mathcal{X} \to \mathcal{Y}$; training examples $\mathcal{D}\colon (\mathbf{x}_1, y_1), \dots, (\mathbf{x}_N, y_N)$; learning algorithm $\mathcal{A}$; hypothesis set $\mathcal{H}$; final hypothesis $g \approx f$]
core tool: supervised learning
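The first three settings differ only in how many $y_n$ are available, which a short sketch can make explicit. The helper `label_setting` and the convention that `None` marks a missing label are assumptions for illustration; reinforcement learning is left out because its feedback has a different shape, a goodness value for $\tilde{y}_n$ rather than a present or absent $y_n$.

```python
def label_setting(labels):
    """Name the learning setting from which labels y_n are present (None = missing)."""
    if not labels:
        raise ValueError("need at least one example")
    known = sum(1 for y in labels if y is not None)
    if known == len(labels):
        return "supervised"        # all y_n
    if known == 0:
        return "unsupervised"      # no y_n
    return "semi-supervised"       # some y_n

print(label_setting([+1, -1, +1]))
print(label_setting([None, None, None]))
print(label_setting([+1, None, None]))
```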
Fun Time
What is this learning problem?
To build a tree recognition system, a company decides to gather one million pictures from the Internet. Then, it asks each of the 10 company members to view 100 pictures and record whether each picture contains a tree. The pictures and records are then fed to a learning algorithm to build the system. What type of learning problem does the algorithm need to solve?
1 supervised
2 unsupervised
3 semi-supervised
4 reinforcement
Reference Answer: 3
The 1,000 records are the labeled $(\mathbf{x}_n, y_n)$; the other 999,000 pictures are the unlabeled $\mathbf{x}_n$.

Types of Learning: Learning with Different Protocol $f \Rightarrow (\mathbf{x}_n, y_n)$
Batch Learning: Coin Recognition Revisited
[Figure: coins plotted by Size and Mass, with coin labels 1, 5, 10, and 25]
[Figure: learning flow. Unknown target function $f\colon \mathcal{X} \to \mathcal{Y}$; training examples $\mathcal{D}\colon (\mathbf{x}_1, y_1), \dots, (\mathbf{x}_N, y_N)$; learning algorithm $\mathcal{A}$; hypothesis set $\mathcal{H}$; final hypothesis $g \approx f$]
batch supervised multiclass classification: learn from all known data
More Batch Learning Problems
[Figure: two plots with axes Size and Mass; the first shows coin labels 1, 5, 10, and 25]
• batch of (email, spam?) ⇒ spam filter
• batch of (patient, cancer) ⇒ cancer classifier
• batch of patient data ⇒ group of patients
batch learning: a very common protocol
Online: Spam Filter that ‘Improves’
• batch spam filter: learn with known (email, spam?) pairs, and predict with fixed $g$
• online spam filter, which sequentially:
  1 observe an email $\mathbf{x}_t$
  2 predict spam status with current $g_t(\mathbf{x}_t)$
  3 receive ‘desired label’ $y_t$ from user, and then update $g_t$ with $(\mathbf{x}_t, y_t)$
Connection to What We Have Learned
• PLA can be easily adapted to online protocol (how?)
• reinforcement learning is often done online (why?)
online: hypothesis ‘improves’ through receiving data instances sequentially
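One possible answer to the ‘how?’ above, as a minimal sketch: keep a single weight vector and apply the PLA correction whenever the current $g_t$ mispredicts the email that just arrived. The class `OnlinePLA`, the helper `run_stream`, and the two email features (counts of two hypothetical keywords) are assumptions for illustration, not part of the lecture.

```python
import numpy as np

class OnlinePLA:
    """Perceptron that updates on one (x_t, y_t) pair at a time."""

    def __init__(self, n_features):
        self.w = np.zeros(n_features + 1)   # weights, with w[0] as the bias term

    def predict(self, x):
        # g_t(x_t): +1 (spam) if the score is positive, else -1 (not spam)
        return 1 if self.w @ np.append(1.0, x) > 0 else -1

    def update(self, x, y):
        # PLA correction, applied only when the current hypothesis is wrong
        if self.predict(x) != y:
            self.w += y * np.append(1.0, x)

def run_stream(stream, n_features):
    g = OnlinePLA(n_features)
    mistakes = 0
    for x_t, y_t in stream:            # 1. observe an email x_t
        y_hat = g.predict(x_t)         # 2. predict with the current g_t
        mistakes += int(y_hat != y_t)
        g.update(x_t, y_t)             # 3. receive y_t, then update g_t
    return g, mistakes

# Hypothetical toy emails: (count of word A, count of word B), label +1 = spam.
emails = [((3, 0), +1), ((0, 2), -1), ((2, 1), +1), ((1, 3), -1)]
stream = [(np.array(x, dtype=float), y) for x, y in emails * 2]
g, mistakes = run_stream(stream, n_features=2)
print("mistakes:", mistakes, "final weights:", g.w)
```

Unlike the batch filter, no training set is stored; the hypothesis only ever changes in response to the most recent example.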
Active Learning: Learning by ‘Asking’
Protocol ⇔ Learning Philosophy
• batch: ‘duck feeding’
• online: ‘passive sequential’
• active: ‘question asking’ (sequentially): query the $y_n$ of the chosen $\mathbf{x}_n$
[Figure: learning flow. Unknown target function $f\colon \mathcal{X} \to \mathcal{Y}$; training examples $\mathcal{D}\colon (\mathbf{x}_1, y_1), \dots, (\mathbf{x}_N, y_N)$; learning algorithm $\mathcal{A}$; hypothesis set $\mathcal{H}$; final hypothesis $g \approx f$]
active: improve hypothesis with fewer labels (hopefully) by asking questions strategically
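A small sketch of why strategic questions can save labels, under a strong assumption that is not from the lecture: the inputs are one-dimensional and the target is a threshold, so the learner can binary-search for the boundary instead of asking for every label. The names `active_threshold` and `ask`, and the hidden threshold value, are hypothetical.

```python
def active_threshold(pool, ask):
    """Locate the first point in the sorted pool labeled +1, using binary-search queries.

    Returns (index of that point, number of labels requested).
    """
    lo, hi = 0, len(pool)          # the boundary index lies somewhere in [lo, hi]
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1               # one question to the labeler
        if ask(pool[mid]) == +1:
            hi = mid               # boundary is at mid or to its left
        else:
            lo = mid + 1           # boundary is strictly to the right of mid
    return lo, queries

# Hypothetical unlabeled pool and hidden target f(x) = +1 iff x >= 620.
pool = list(range(1000))
hidden_f = lambda x: +1 if x >= 620 else -1

boundary, queries = active_threshold(pool, hidden_f)
print("learned threshold:", pool[boundary])
print("labels requested:", queries, "out of", len(pool))
```

A passive learner would need all 1,000 labels to pin down the same boundary exactly, while the sketch asks for roughly $\log_2 N$ of them; for less structured problems the saving is only hoped for, as the slide notes.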
Mini Summary
Learning with Different Protocol $f \Rightarrow (\mathbf{x}_n, y_n)$
• batch: all known data
• online: sequential (passive) data
• active: strategically-observed data
• ... and more!!
[Figure: learning flow. Unknown target function $f\colon \mathcal{X} \to \mathcal{Y}$; training examples $\mathcal{D}\colon (\mathbf{x}_1, y_1), \dots, (\mathbf{x}_N, y_N)$; learning algorithm $\mathcal{A}$; hypothesis set $\mathcal{H}$; final hypothesis $g \approx f$]
core protocol: batch
Fun Time
What is this learning problem?
A photographer has 100,000 pictures, each containing one baseball player. He wants to automatically categorize the pictures by the player in each. He starts by categorizing 1,000 pictures by himself, and then writes an algorithm that tries to categorize the other pictures if it is ‘confident’ on the category while pausing for (& learning from) human input if not. What protocol best describes the nature of the algorithm?
1 batch
2 online
3 active
4 random
Reference Answer: 3
The algorithm takes an active but naïve strategy: ask when ‘confused’. You should probably do the same when taking a class. :-)

Types of Learning: Learning with Different Input Space $\mathcal{X}$