
## Reinforcement Learning

a ‘very different’ but natural way of learning

### Teach Your Dog: Say ‘Sit Down’

The dog pees on the ground.

**BAD DOG. THAT’S A VERY WRONG ACTION.**

- cannot easily show the dog that y_n = sit when x_n = ‘sit down’
- but can ‘punish’ to say ỹ_n = pee is wrong

### Teach Your Dog: Say ‘Sit Down’ (continued)

The dog sits down.

**Good Dog. Let me give you some cookies.**

- still cannot show y_n = sit when x_n = ‘sit down’
- but can ‘reward’ to say ỹ_n = sit is good

### Other Reinforcement Learning Problems Using (x, ỹ, goodness)

- (customer, ad choice, ad click earning) ⇒ ad system
- (cards, strategy, winning amount) ⇒ blackjack agent

reinforcement: learn with **‘partial/implicit information’** (often sequentially)
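The ad-system example is a natural fit for this kind of feedback: the learner never sees the ‘right’ ad for a customer, only the earning of the ad it chose. A minimal sketch under made-up assumptions (three ads, fixed hidden click rates, an epsilon-greedy rule, none of which come from the lecture):

```python
import random

# Learn from (x, decision ỹ, goodness) only: an epsilon-greedy bandit
# that picks among 3 hypothetical ads and observes click earnings.
random.seed(0)

true_click_rate = [0.1, 0.5, 0.3]   # hidden 'goodness' of each ad (assumed)
earning_sum = [0.0, 0.0, 0.0]
shown_count = [0, 0, 0]

def choose_ad(epsilon=0.1):
    """Mostly exploit the best-looking ad so far, sometimes explore."""
    if random.random() < epsilon or sum(shown_count) == 0:
        return random.randrange(3)
    return max(range(3), key=lambda a: earning_sum[a] / max(shown_count[a], 1))

for _ in range(5000):
    ad = choose_ad()                                                # decision ỹ
    reward = 1.0 if random.random() < true_click_rate[ad] else 0.0  # goodness
    shown_count[ad] += 1
    earning_sum[ad] += reward
    # note: we never learn the 'correct' ad label, only partial feedback

best = max(range(3), key=lambda a: earning_sum[a] / max(shown_count[a], 1))
print(best)  # the agent should converge on the highest-rate ad
```

The loop is also sequential, matching the ‘often sequentially’ remark: each round produces one (customer, ad choice, earning) triple.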


## Mini Summary

### Learning with Different Data Label y_n

- supervised: all y_n
- unsupervised: no y_n
- semi-supervised: some y_n
- reinforcement: implicit y_n by goodness(ỹ_n)
- ... and more!!

[learning-flow diagram: unknown target function f : X → Y; training examples D : (x_1, y_1), · · · , (x_N, y_N); learning algorithm A; hypothesis set H; final hypothesis g ≈ f]

core tool: supervised learning

## Fun Time

### What is this learning problem?

To build a tree recognition system, a company decides to gather one million pictures on the Internet. It then asks each of its 10 members to view 100 pictures and record whether each picture contains a tree. The pictures and records are then fed to a learning algorithm to build the system. What type of learning problem does the algorithm need to solve?

1. supervised
2. unsupervised
3. semi-supervised
4. reinforcement

**Reference Answer: 3**

The 1,000 records are the labeled (x_n, y_n); the other 999,000 pictures are the unlabeled x_n.
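The quiz's setting, a few labeled examples amid many unlabeled ones, can be sketched with self-training, one common semi-supervised heuristic (not one prescribed by the lecture). The toy 1-D data, the 1-NN labeler, and the confidence rule below are all illustrative assumptions:

```python
# Self-training sketch: start from a small labeled pool, then repeatedly
# pseudo-label the unlabeled point we are most confident about (here:
# the one closest to any labeled point) and add it to the pool.
labeled = [(1.0, 'tree'), (2.0, 'tree'), (8.0, 'no-tree'), (9.0, 'no-tree')]
unlabeled = [1.5, 2.2, 7.5, 8.8, 5.1]

def nearest_label(x):
    """1-NN prediction from the current labeled pool."""
    return min(labeled, key=lambda p: abs(p[0] - x))[1]

while unlabeled:
    # pick the unlabeled point closest to the labeled pool
    x = min(unlabeled, key=lambda u: min(abs(p[0] - u) for p in labeled))
    labeled.append((x, nearest_label(x)))   # pseudo-label it
    unlabeled.remove(x)

print(sorted(labeled))
```

The unlabeled points end up shaping the hypothesis even though only four true labels were ever given, which is the point of the semi-supervised setting.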

Types of Learning Learning with Different Protocol f ⇒ (x_{n},y_{n})

## Batch Learning: Coin Recognition Revisited

[figure: coins of denominations 1, 5, 10, 25 plotted by size and mass]

[learning-flow diagram: unknown target function f : X → Y; training examples D : (x_1, y_1), · · · , (x_N, y_N); learning algorithm A; hypothesis set H; final hypothesis g ≈ f]

batch supervised multiclass classification: learn from **all known** data
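The batch protocol can be sketched with the coin setting: every labeled (size, mass) example is available up front, the learner fits once on the whole batch, and prediction uses the resulting fixed g. The data values and the nearest-centroid rule below are illustrative assumptions, not the lecture's actual numbers or algorithm:

```python
# Batch multiclass sketch: fit once on all known data, then predict.
data = {  # denomination -> list of (size, mass) samples (made up)
    1:  [(17.9, 2.5), (18.1, 2.4)],
    5:  [(21.2, 5.0), (20.8, 5.1)],
    10: [(17.9, 2.2), (18.0, 2.3)],
    25: [(24.3, 5.6), (24.1, 5.7)],
}

# 'Training': compute one centroid per class from the full batch.
centroids = {
    coin: (sum(s for s, _ in pts) / len(pts), sum(m for _, m in pts) / len(pts))
    for coin, pts in data.items()
}

def classify(size, mass):
    """Fixed hypothesis g: nearest class centroid in (size, mass) space."""
    return min(centroids, key=lambda c: (centroids[c][0] - size) ** 2
                                        + (centroids[c][1] - mass) ** 2)

print(classify(24.0, 5.5))  # a large, heavy coin lands near the 25 centroid
```

Nothing changes after training: unlike the online protocol introduced next, g stays fixed no matter what it later predicts.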

## More Batch Learning Problems

[figures: two (size, mass) plots, one labeled with coin values 1, 5, 10, 25 and one unlabeled]

- batch of (email, spam?) ⇒ spam filter
- batch of (patient, cancer) ⇒ cancer classifier
- batch of patient data ⇒ group of patients

batch learning: **a very common protocol**

## Online: Spam Filter that ‘Improves’

- batch spam filter: learn with known (email, spam?) pairs, and predict with fixed g
- **online** spam filter, which **sequentially**:
  1. observes an email x_t
  2. predicts spam status with current g_t(x_t)
  3. receives ‘desired label’ y_t from user, and then updates g_t with (x_t, y_t)

### Connection to What We Have Learned

- PLA can be easily adapted to the online protocol (how?)
- reinforcement learning is often done online (why?)

online: hypothesis ‘improves’ through receiving data instances **sequentially**
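One answer to the ‘how?’ question: PLA already processes one example at a time and updates only on mistakes, so it drops into the online protocol almost unchanged. A sketch on a made-up, linearly separable toy stream (each x includes a constant 1 as the bias coordinate):

```python
# Online PLA sketch: predict with the current weights, then update
# only when the received label shows a mistake.
def sign(v):
    return 1 if v > 0 else -1

w = [0.0, 0.0, 0.0]  # current hypothesis g_t

stream = [  # (x_t, y_t) pairs revealed sequentially (toy data)
    ([1.0, 2.0, 3.0], 1), ([1.0, -1.0, -2.0], -1),
    ([1.0, 3.0, 1.0], 1), ([1.0, -2.0, -1.0], -1),
] * 5  # replay the stream a few times so PLA settles on this toy

for x, y in stream:
    prediction = sign(sum(wi * xi for wi, xi in zip(w, x)))  # g_t(x_t)
    if prediction != y:                             # desired label disagrees
        w = [wi + y * xi for wi, xi in zip(w, x)]   # PLA correction

# after the stream, the final g classifies the toy points correctly
print(w)
```

This also hints at the ‘why?’ for reinforcement learning: its feedback (reward after an action) arrives one decision at a time, which is inherently sequential.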


## Active Learning: Learning by ‘Asking’

### Protocol ⇔ Learning Philosophy

- batch: ‘duck feeding’
- online: ‘passive sequential’
- **active: ‘question asking’** (sequentially): query the y_n of the **chosen** x_n

[learning-flow diagram: unknown target function f : X → Y; training examples D : (x_1, y_1), · · · , (x_N, y_N); learning algorithm A; hypothesis set H; final hypothesis g ≈ f]

active: improve hypothesis with fewer labels (hopefully) by asking questions **strategically**
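The ‘question asking’ protocol can be sketched with uncertainty-based querying, one common active-learning strategy. The toy 1-D data, the threshold hypothesis, and the `oracle` standing in for the human labeler are all illustrative assumptions:

```python
# Active-learning sketch: from a pool of unlabeled points, repeatedly
# choose which x_n to query, asking where the current hypothesis is
# least certain (closest to its decision threshold).
def oracle(x):            # hidden target f (assumed: threshold at 4.5)
    return 1 if x > 4.5 else -1

pool = [0.5, 1.5, 2.5, 3.5, 4.0, 5.0, 6.5, 8.0]
queried = {0.5: oracle(0.5), 8.0: oracle(8.0)}  # two seed labels

def threshold():
    """Current hypothesis: midpoint between the closest -/+ labels."""
    neg = max(x for x, y in queried.items() if y == -1)
    pos = min(x for x, y in queried.items() if y == 1)
    return (neg + pos) / 2

for _ in range(3):  # three strategic queries instead of labeling everything
    t = threshold()
    candidates = [x for x in pool if x not in queried]
    x = min(candidates, key=lambda c: abs(c - t))  # most uncertain point
    queried[x] = oracle(x)                         # query the y_n of chosen x_n

print(threshold())  # narrows in on the true boundary with few labels
```

With only 5 of the 8 labels queried, the learner pins down the boundary; a passive learner would need labels spread over the whole pool to do as well.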


## Mini Summary

### Learning with Different Protocol f ⇒ (x_n, y_n)

- **batch: all known data**
- online: sequential (passive) data
- active: strategically-observed data
- ... and more!!

[learning-flow diagram: unknown target function f : X → Y; training examples D : (x_1, y_1), · · · , (x_N, y_N); learning algorithm A; hypothesis set H; final hypothesis g ≈ f]

core protocol: batch

## Fun Time

### What is this learning problem?

A photographer has 100,000 pictures, each containing one baseball player. He wants to automatically categorize the pictures by the player inside. He starts by categorizing 1,000 pictures himself, and then writes an algorithm that tries to categorize each remaining picture if it is ‘confident’ about the category, while pausing for (and learning from) human input if not. What protocol best describes the nature of the algorithm?

1. batch
2. online
3. active
4. random

**Reference Answer: 3**

The algorithm takes an active but naïve strategy: ask when ‘confused’. **You should probably do the same when taking a class. :-)**


Types of Learning Learning with Different Input Space X