Types of Learning: Learning with Different Data Label $y_n$

Reinforcement Learning

a ‘very different’ but natural way of learning

Teach Your Dog: Say ‘Sit Down’

The dog pees on the ground.

BAD DOG. THAT’S A VERY WRONG ACTION.

• cannot easily show the dog that $y_n$ = sit when $\mathbf{x}_n$ = ‘sit down’
• but can ‘punish’ to say $\tilde{y}_n$ = pee is wrong

The dog sits down.

Good Dog. Let me give you some cookies.

• still cannot show $y_n$ = sit when $\mathbf{x}_n$ = ‘sit down’
• but can ‘reward’ to say $\tilde{y}_n$ = sit is good

Other Reinforcement Learning Problems Using $(\mathbf{x}, \tilde{y}, \text{goodness})$

• (customer, ad choice, ad click earning) ⇒ ad system
• (cards, strategy, winning amount) ⇒ blackjack agent

reinforcement: learn with ‘partial/implicit information’ (often sequentially)
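To make ‘learn with partial/implicit information’ concrete, here is a minimal sketch of the ad-system idea as an ε-greedy multi-armed bandit. It assumes a single customer type (so the customer features $\mathbf{x}$ are ignored) and hypothetical click rates. The learner only ever sees the goodness (click or no click) of the ad it actually chose, never which ad was ‘correct’. ε-greedy is one simple strategy chosen for illustration, not the method the lecture prescribes.

```python
import random


def epsilon_greedy_ads(click_prob, rounds=5000, epsilon=0.1, seed=0):
    """Learn which ad to show when only the shown ad's outcome is observed.

    click_prob holds hypothetical true click rates that the learner never sees;
    each round it only observes the goodness (1 = click, 0 = no click) of the
    ad it chose, never which ad would have been best.
    """
    rng = random.Random(seed)
    n_ads = len(click_prob)
    counts = [0] * n_ads    # how many times each ad has been shown
    values = [0.0] * n_ads  # running mean of observed goodness per ad
    for _ in range(rounds):
        if rng.random() < epsilon:
            ad = rng.randrange(n_ads)  # explore: try a random ad
        else:
            ad = max(range(n_ads), key=lambda a: values[a])  # exploit best guess
        goodness = 1.0 if rng.random() < click_prob[ad] else 0.0  # partial feedback
        counts[ad] += 1
        values[ad] += (goodness - values[ad]) / counts[ad]  # incremental mean
    return values, counts


values, counts = epsilon_greedy_ads([0.1, 0.5, 0.9], rounds=20000)
print(counts)
```

With these made-up click rates, the counts should concentrate on the last ad once its running estimate pulls ahead; `epsilon` trades exploring other ads against exploiting the current best guess.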


Mini Summary

Learning with Different Data Label $y_n$

• supervised: all $y_n$
• unsupervised: no $y_n$
• semi-supervised: some $y_n$
• reinforcement: implicit $y_n$ by goodness($\tilde{y}_n$)
• . . . and more!!

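[Figure: learning flow. unknown target function $f: \mathcal{X} \to \mathcal{Y}$ ⇒ training examples $\mathcal{D}: (\mathbf{x}_1, y_1), \cdots, (\mathbf{x}_N, y_N)$ ⇒ learning algorithm $\mathcal{A}$ (using hypothesis set $\mathcal{H}$) ⇒ final hypothesis $g \approx f$]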

core tool: supervised learning


Fun Time

What is this learning problem?

To build a tree recognition system, a company decides to gather one million pictures from the Internet. It then asks each of the 10 company members to view 100 pictures and record whether each picture contains a tree. The pictures and records are then fed to a learning algorithm to build the system. What type of learning problem does the algorithm need to solve?

1. supervised
2. unsupervised
3. semi-supervised
4. reinforcement

Reference Answer: 3

The 1,000 records are the labeled $(\mathbf{x}_n, y_n)$; the other 999,000 pictures are the unlabeled $\mathbf{x}_n$.
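The reference answer's arithmetic is 10 members × 100 pictures = 1,000 labeled examples, leaving 1,000,000 − 1,000 = 999,000 unlabeled ones. One simple way to use both kinds is self-training, chosen here for illustration rather than taken from the lecture. The sketch below assumes each picture has already been reduced to a single hypothetical numeric feature and uses a nearest-centroid classifier: fit on the labeled pictures, pseudo-label the unlabeled pictures the model is confident about, refit, and repeat.

```python
def nearest_centroid_fit(features, labels):
    """Average the (1-D) feature of every class seen so far."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict_with_margin(centroids, x):
    """Return the closest class and how much closer it is than the runner-up."""
    ranked = sorted((abs(x - c), y) for y, c in centroids.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else float("inf")
    return ranked[0][1], margin


def self_train(labeled, unlabeled, min_margin=1.0, max_rounds=10):
    """Self-training: pseudo-label confident unlabeled points, refit, repeat."""
    xs = [x for x, _ in labeled]
    ys = [y for _, y in labeled]
    pool = list(unlabeled)
    for _ in range(max_rounds):
        centroids = nearest_centroid_fit(xs, ys)  # model for this round
        still_unsure = []
        added = 0
        for x in pool:
            label, margin = predict_with_margin(centroids, x)
            if margin >= min_margin:
                xs.append(x)  # treat the confident guess as if it were a label
                ys.append(label)
                added += 1
            else:
                still_unsure.append(x)
        pool = still_unsure
        if added == 0:  # nothing new was confident enough: stop
            break
    return nearest_centroid_fit(xs, ys), pool
```

Points whose margin never reaches `min_margin` are returned unlabeled; with one million real pictures a richer feature space and classifier would of course be needed.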


Types of Learning: Learning with Different Protocol $f \Rightarrow (\mathbf{x}_n, y_n)$

Batch Learning: Coin Recognition Revisited

[Figure: coins labeled 1, 5, 10, and 25 plotted by size and mass]


batch supervised multiclass classification: learn from all known data
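A minimal sketch of the batch protocol for this coin problem, assuming made-up (size, mass) measurements in arbitrary units (the lecture shows its data only as a plot) and a 1-nearest-neighbour rule as the learning algorithm. The point is the protocol: every $(\mathbf{x}_n, y_n)$ is available before learning starts, and the returned $g$ stays fixed afterwards.

```python
import math

# Hypothetical (size, mass) measurements; not the lecture's actual data.
TRAIN = [
    ((18.0, 2.5), 1), ((18.5, 2.6), 1),
    ((22.0, 4.4), 5), ((22.3, 4.5), 5),
    ((26.0, 7.5), 10), ((26.2, 7.6), 10),
    ((30.0, 10.0), 25), ((30.5, 10.2), 25),
]


def fit_batch(examples):
    """Batch protocol: receive every (x_n, y_n) up front, return a fixed g."""
    stored = list(examples)  # 1-nearest-neighbour just keeps a copy of the batch

    def g(x):
        # predict the label of the closest stored coin (Euclidean distance)
        _, label = min(stored, key=lambda ex: math.dist(ex[0], x))
        return label

    return g


g = fit_batch(TRAIN)
print(g((22.1, 4.4)))  # 5
```

Because `g` closes over a private copy of the batch, later changes to `TRAIN` do not alter it, which mirrors ‘predict with fixed $g$’.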


More Batch Learning Problems

[Figure: two plots with axes Size and Mass; the first shows coins labeled 1, 5, 10, 25]

• batch of (email, spam?) ⇒ spam filter
• batch of (patient, cancer) ⇒ cancer classifier
• batch of patient data ⇒ group of patients

batch learning: a very common protocol
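The last item above is an unsupervised batch problem: all patient records arrive at once, but without labels. Below is a minimal sketch using k-means, one common clustering method and not necessarily the one the lecture has in mind. It assumes each patient is a hypothetical numeric feature tuple and uses a deterministic farthest-first initialisation so the result is reproducible.

```python
import math


def farthest_first(points, k):
    """Deterministic start: first point, then the point farthest from all chosen centres."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def assign(points, centers):
    """Put each point into the group of its nearest centre."""
    groups = [[] for _ in centers]
    for p in points:
        j = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
        groups[j].append(p)
    return groups


def kmeans(points, k, max_iters=100):
    """Batch k-means: alternate assignment and mean update until centres stop moving."""
    centers = farthest_first(points, k)
    for _ in range(max_iters):
        groups = assign(points, centers)
        new_centers = [
            tuple(sum(coord) / len(grp) for coord in zip(*grp)) if grp else centers[j]
            for j, grp in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


# Hypothetical patient features, e.g. two normalised lab measurements.
patients = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centers, groups = kmeans(patients, 2)
print(groups)
```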


Online: Spam Filter that ‘Improves’

• batch spam filter: learn with known (email, spam?) pairs, and predict with fixed $g$
• online spam filter, which sequentially:
  1. observes an email $\mathbf{x}_t$
  2. predicts spam status with current $g_t(\mathbf{x}_t)$
  3. receives ‘desired label’ $y_t$ from user, and then updates $g_t$ with $(\mathbf{x}_t, y_t)$

Connection to What We Have Learned

• PLA can be easily adapted to online protocol (how?)
• reinforcement learning is often done online (why?)

online: hypothesis ‘improves’ through receiving data instances sequentially
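One possible answer to ‘how?’ for PLA, sketched under assumptions: each email has already been turned into a hypothetical numeric feature vector, labels are +1 (spam) / −1 (not spam), and $g_t(\mathbf{x}) = \text{sign}(\mathbf{w}_t^T \mathbf{x})$ with a constant bias feature $x_0 = 1$. Each incoming pair goes through the three online steps: predict with the current weights, receive $y_t$, and correct $\mathbf{w}_t$ only when the prediction was wrong.

```python
def sign(v):
    # convention for this sketch: sign(0) counts as -1
    return 1 if v > 0 else -1


def online_pla(stream, dim):
    """Online perceptron: predict with current w_t, then update on mistakes."""
    w = [0.0] * (dim + 1)  # w[0] is the bias weight (x_0 = 1)
    mistakes = 0
    for x, y in stream:
        xb = [1.0] + list(x)                                  # 1. observe x_t
        y_hat = sign(sum(wi * xi for wi, xi in zip(w, xb)))   # 2. predict g_t(x_t)
        if y_hat != y:                                        # 3. receive y_t
            w = [wi + y * xi for wi, xi in zip(w, xb)]        #    and fix w_t
            mistakes += 1
    return w, mistakes


# Hypothetical features: (number of links, number of 'free' words); +1 = spam.
stream = [((5, 3), 1), ((0, 0), -1), ((4, 2), 1), ((1, 0), -1), ((6, 4), 1), ((0, 1), -1)] * 2
w, mistakes = online_pla(stream, dim=2)
print(w, mistakes)
```

The learner never stores past emails; everything it knows lives in the current $\mathbf{w}_t$, which is what lets it keep ‘improving’ as users keep labeling.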


Active Learning: Learning by ‘Asking’

Protocol ⇔ Learning Philosophy

• batch: ‘duck feeding’
• online: ‘passive sequential’
• active: ‘question asking’ (sequentially): query the $y_n$ of the chosen $\mathbf{x}_n$


active: improve hypothesis with fewer labels (hopefully) by asking questions strategically
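A classic toy case (not from the lecture) shows why asking strategically can save labels. Suppose the inputs are sorted points on a line and the target is a threshold, so every $y_n$ is −1 below some unknown cut and +1 from it onward. A passive learner may need many labels to pin down the cut; an active learner can instead query the label of the midpoint of the still-uncertain range, a binary search needing only about $\log_2 N$ questions. The `oracle` callback is a hypothetical stand-in for whoever answers each query.

```python
def active_threshold(xs, oracle):
    """Locate the first x labelled +1 by asking for labels of chosen points.

    Assumes xs is sorted and labels are monotone: -1 below an unknown
    threshold, +1 at or above it. Returns (index_of_first_plus, queries);
    index len(xs) means every point is -1.
    """
    lo, hi = 0, len(xs)  # the answer always lies in [lo, hi]
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if oracle(xs[mid]) == 1:  # query y for the chosen x
            hi = mid              # cut is at mid or to its left
        else:
            lo = mid + 1          # cut is strictly to the right
    return lo, queries


xs = list(range(1024))
idx, q = active_threshold(xs, lambda x: 1 if x >= 300 else -1)
print(idx, q)
```

Here at most 11 of the 1,024 labels are requested, whereas labeling every point would cost 1,024 questions; real active learning rarely enjoys such a clean structure, hence the lecture's ‘(hopefully)’.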


Mini Summary

Learning with Different Protocol $f \Rightarrow (\mathbf{x}_n, y_n)$

• batch: all known data
• online: sequential (passive) data
• active: strategically-observed data
• . . . and more!!


core protocol: batch


Fun Time

What is this learning problem?

A photographer has 100,000 pictures, each containing one baseball player. He wants to automatically categorize the pictures by the player in each one. He starts by categorizing 1,000 pictures himself, and then writes an algorithm that tries to categorize each of the other pictures when it is ‘confident’ about the category, while pausing for (and learning from) human input when it is not. What protocol best describes the nature of the algorithm?

1. batch
2. online
3. active
4. random

Reference Answer: 3

The algorithm takes an active but naïve strategy: ask when ‘confused’. You should probably do the same when taking a class. :-)
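The photographer's algorithm can be sketched as follows, assuming each picture has been reduced to one hypothetical numeric feature, players are predicted with a nearest-centroid rule built from the human-provided labels, and ‘confident’ means the closest player's centroid beats the runner-up by at least `min_margin`. `ask_human` is a hypothetical callback standing in for the photographer.

```python
def categorize_with_queries(seed_labeled, stream, ask_human, min_margin=1.0):
    """Label each incoming picture automatically when confident; otherwise ask.

    seed_labeled: (feature, player) pairs categorized by hand up front.
    Returns (labels, number_of_questions).
    """
    examples = list(seed_labeled)
    labels, questions = [], 0
    for x in stream:
        # rebuild the nearest-centroid model from all human labels so far
        groups = {}
        for f, player in examples:
            groups.setdefault(player, []).append(f)
        centroids = {p: sum(fs) / len(fs) for p, fs in groups.items()}
        ranked = sorted(centroids, key=lambda p: abs(x - centroids[p]))
        best = ranked[0]
        margin = (abs(x - centroids[ranked[1]]) - abs(x - centroids[best])
                  if len(ranked) > 1 else float("inf"))
        if margin >= min_margin:
            labels.append(best)           # confident: categorize automatically
        else:
            player = ask_human(x)         # confused: ask, then learn from it
            questions += 1
            examples.append((x, player))
            labels.append(player)
    return labels, questions


labels, questions = categorize_with_queries(
    [(0.0, "A"), (10.0, "B")], [1.0, 9.0, 5.2, 2.0],
    ask_human=lambda x: "A" if x < 5 else "B")
print(labels, questions)
```

Only human answers are added to the training set here; feeding back the model's own confident guesses as well would turn this into self-training instead.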


Types of Learning: Learning with Different Input Space $\mathcal{X}$

Credit Approval Problem Revisited

age: 23 years
gender: female
annual salary: NTD 1,000,000
