Active Learning by Learning
Hsuan-Tien Lin (林軒田) htlin@csie.ntu.edu.tw
Department of Computer Science
& Information Engineering
National Taiwan University ( 國立台灣大學資訊工程系)
2015 IR Workshop, IIS Sinica, Taiwan
joint work with Wei-Ning Hsu, presented at AAAI 2015
About Me
Hsuan-Tien Lin
• Associate Professor, Dept. of CSIE, National Taiwan University
• Leader of the Computational Learning Laboratory
• Co-author of the textbook “Learning from Data: A Short Course” (often an ML best seller on Amazon)
• Instructor of the NTU-Coursera Mandarin-teaching ML massive open online courses:
  • “Machine Learning Foundations”: www.coursera.org/course/ntumlone
  • “Machine Learning Techniques”: www.coursera.org/course/ntumltwo
Hsuan-Tien Lin (NTU CSIE) Active Learning by Learning 1/18
Active Learning
Apple Recognition Problem
Note: Slide Taken from my “ML Techniques” MOOC
• need an apple classifier: is this a picture of an apple?
• gather photos under the CC BY 2.0 license on Flickr (thanks to the authors below!) and label them as apple/other for learning
(APAL stands for Apple and Pear Australia Ltd)
Photo credits: Dan Foy (https://flic.kr/p/jNQ55), APAL (https://flic.kr/p/jzP1VB), adrianbartel (https://flic.kr/p/bdy2hZ), ANdrzej cH. (https://flic.kr/p/51DKA8), Stuart Webster (https://flic.kr/p/9C3Ybd), nachans (https://flic.kr/p/9XD7Ag), APAL (https://flic.kr/p/jzRe4u), Jo Jakeman (https://flic.kr/p/7jwtGp), APAL (https://flic.kr/p/jzPYNr), APAL (https://flic.kr/p/jzScif)
Photo credits: Mr. Roboto. (https://flic.kr/p/i5BN85), Richard North (https://flic.kr/p/bHhPkB), Richard North (https://flic.kr/p/d8tGou), Emilian Robert Vicol (https://flic.kr/p/bpmGXW), Nathaniel McQueen (https://flic.kr/p/pZv1Mf), Crystal (https://flic.kr/p/kaPYp), jfh686 (https://flic.kr/p/6vjRFH), skyseeker (https://flic.kr/p/2MynV), Janet Hudson (https://flic.kr/p/7QDBbm), Rennett Stowe (https://flic.kr/p/agmnrk)
Active Learning
Batch (Traditional) Machine Learning
Note: Flow Taken from my “ML Foundations” MOOC
unknown target function f : X → Y
↓
training examples D : (x1, y1), · · · , (xN, yN)
↓
learning algorithm A (searching hypothesis set H)
↓
final hypothesis g ≈ f

batch supervised classification: learn from fully labeled data

Active Learning
Active Learning: Learning by ‘Asking’
but labeling is expensive

Protocol ⇔ Learning Philosophy
• batch: ‘duck feeding’
• active: ‘question asking’ (iteratively), querying the label yn of a chosen xn

unknown target function f : X → Y
↓
labeled training examples
↓
learning algorithm A (searching hypothesis set H)
↓
final hypothesis g ≈ f

active: improve the hypothesis with fewer labels (hopefully) by asking questions strategically
Active Learning
Pool-Based Active Learning Problem
Given
• labeled pool Dl = {(feature xn, label yn (e.g. IsApple?))} for n = 1, · · · , N
• unlabeled pool Du = {x̃s} for s = 1, · · · , S

Goal
design an algorithm that iteratively
1. strategically queries some x̃s to get its associated ỹs
2. moves (x̃s, ỹs) from Du to Dl
3. learns classifier g(t) from Dl
and improves the test accuracy of g(t) w.r.t. #queries

how to query strategically?
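The iterative protocol above can be sketched as a short loop. This is an illustrative sketch only, not the paper's implementation: the query strategy (random here), the majority-vote "learner", and the oracle dictionary are all placeholder assumptions.

```python
import random

def pool_based_active_learning(D_l, D_u, query_strategy, learn, T):
    """Generic pool-based active learning loop (illustrative sketch).

    D_l: list of labeled (x, y) pairs; D_u: dict mapping each unlabeled
    x to its hidden label, standing in for a costly labeling oracle.
    """
    classifiers = []
    for t in range(T):
        if not D_u:
            break
        x_s = query_strategy(D_l, sorted(D_u))   # 1. strategically pick x~_s
        y_s = D_u.pop(x_s)                       # 2. query y~_s, move it to D_l
        D_l.append((x_s, y_s))
        classifiers.append(learn(D_l))           # 3. learn g^(t) from D_l
    return D_l, classifiers

# Toy run on 1-D data: random querying plus a majority-vote "learner"
# that predicts whichever label is more common in the labeled pool.
def majority(D_l):
    votes = sum(y for _, y in D_l)
    return (lambda x: 1) if votes >= 0 else (lambda x: -1)

rng = random.Random(0)
labeled = [(0.9, 1), (0.0, -1)]
unlabeled = {x / 10: (1 if x >= 5 else -1) for x in range(1, 9)}
D_l, gs = pool_based_active_learning(
    labeled, unlabeled, lambda dl, du: rng.choice(du), majority, T=5)
```

After five iterations the labeled pool has grown by five examples and one classifier per iteration has been trained.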
Active Learning
How to Query Strategically?
by DFID - UK Department for International Development;
licensed under CC BY-SA 2.0 via Wikimedia Commons
Strategy 1: ask the most confused question
Strategy 2: ask the most frequent question
Strategy 3: ask the most helpful question

do you use a fixed strategy in practice?
Active Learning
Choice of Strategy
Strategy 1 (uncertainty): ask the most confused question
Strategy 2 (representative): ask the most frequent question
Strategy 3 (expected-error reduction): ask the most helpful question

• choosing one single strategy is non-trivial:
[figure: query-accuracy curves of RAND, UNCERTAIN, PSDS and QUIRE on three data sets; x-axis: % of unlabelled data queried, y-axis: accuracy; the winning strategy differs across data sets]
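Strategy 1 (uncertainty) is the easiest to make concrete: query the instance the current classifier is least sure about. A minimal sketch, assuming a binary classifier that exposes a hypothetical predict_proba-style scorer (not from the talk):

```python
import math

def uncertainty_query(predict_proba, unlabeled_pool):
    """Pick the unlabeled instance closest to the decision boundary.

    predict_proba(x) -> estimated probability that x is positive
    (an assumed interface); uncertainty peaks where it is near 0.5.
    """
    return min(unlabeled_pool, key=lambda x: abs(predict_proba(x) - 0.5))

# Toy example: a logistic scorer on 1-D inputs with boundary at 0.
proba = lambda x: 1.0 / (1.0 + math.exp(-4.0 * x))
pool = [-2.0, -0.1, 0.8, 3.0]
chosen = uncertainty_query(proba, pool)  # -0.1 sits nearest the boundary
```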
• human-designed strategies are heuristic and confine the machine’s ability

can we free the machine by letting it learn to choose among the strategies?

Active Learning
Our Contributions
a philosophical and algorithmic study of active learning, which ...
• allows the machine to make intelligent choices of strategies, just like my cute daughter
• studies a sound feedback scheme to tell the machine about the goodness of its choices, just like what I do
• results in promising active learning performance, just like (hopefully) the bright future of my daughter

will describe the key philosophical ideas behind our proposed approach
Online Choice of Strategy
Idea: Trial-and-Reward Like Human
K strategies: A1, A2, · · · , AK
try one strategy; take the “goodness” of the strategy as reward

two issues: how to try, and how to reward
Online Choice of Strategy
Reduction to Bandit
when do humans trial-and-reward? gambling

K strategies: A1, A2, · · · , AK: try one strategy, take the “goodness” of the strategy as reward
K bandit machines: B1, B2, · · · , BK: try one bandit machine, take the “luckiness” of the machine as reward

will take one well-known probabilistic bandit learner (EXP4.P)
intelligent choice of strategy =⇒ intelligent choice of bandit machine
Online Choice of Strategy
Active Learning by Learning
K strategies: A1, A2, · · · , AK
try one strategy; take the “goodness” of the strategy as reward

Given: K existing active learning strategies
for t = 1, 2, . . . , T
1. let EXP4.P decide strategy Ak to try
2. query the x̃s suggested by Ak, and compute g(t)
3. evaluate the goodness of g(t) as the reward of the trial to update EXP4.P

only remaining problem: what reward?
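The for-loop above can be mimicked with a simplified exponential-weights bandit. The sketch below uses an EXP3-flavored update rather than the full EXP4.P of the paper, and the strategies and reward function are toy placeholders:

```python
import math
import random

def bandit_choice_of_strategy(strategies, reward_of, T, eta=0.3, seed=0):
    """Choose among K strategies online via exponential weights.

    strategies: callables, each proposing a query when invoked;
    reward_of: maps the resulting query to a reward in [0, 1].
    Simplified EXP3-style stand-in for the EXP4.P learner in ALBL.
    """
    rng = random.Random(seed)
    K = len(strategies)
    w = [1.0] * K
    choices = []
    for t in range(T):
        total = sum(w)
        p = [wk / total for wk in w]
        k = rng.choices(range(K), weights=p)[0]  # sample strategy A_k
        query = strategies[k]()                   # A_k suggests a query
        r = reward_of(query)                      # goodness of the trial
        w[k] *= math.exp(eta * r / p[k])          # importance-weighted update
        choices.append(k)
    return choices, w

# Toy run: strategy 0 always earns reward, strategy 1 never does,
# so the bandit shifts its weight toward strategy 0.
choices, w = bandit_choice_of_strategy(
    [lambda: "hard", lambda: "easy"],
    lambda q: 1.0 if q == "hard" else 0.0,
    T=50)
```

Dividing the reward by the sampling probability p[k] is what keeps the update unbiased even though only the chosen arm is observed, the same idea the next slides apply to the reward itself.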
Design of Reward
Ideal Reward
ideal reward after updating classifier g(t) by the query (xnt, ynt): accuracy

(1/M) ∑_{m=1}^{M} ⟦ym = g(t)(xm)⟧ on test set {(xm, ym)}_{m=1}^{M}

• test accuracy as reward: area under the query-accuracy curve ≡ cumulative reward

[figure: query-accuracy curves of RAND, UNCERTAIN, PSDS and QUIRE; the area under each curve is the cumulative reward]

• test accuracy infeasible in practice: labeling is expensive, remember?

difficulty: approximate test accuracy on the fly
Design of Reward
Training Accuracy as Reward
test accuracy (1/M) ∑_{m=1}^{M} ⟦ym = g(t)(xm)⟧ infeasible; naïve replacement: accuracy

(1/t) ∑_{τ=1}^{t} ⟦ynτ = g(t)(xnτ)⟧ on labeled pool {(xnτ, ynτ)}_{τ=1}^{t}

• training accuracy as reward: training accuracy ≈ test accuracy?
• not necessarily! for an active learning strategy that asks the easiest questions:
  • training accuracy high: xnτ easy to label
  • test accuracy low: not enough information about harder instances

training accuracy: too biased to approximate test accuracy
Design of Reward
Weighted Training Accuracy as Reward
training accuracy (1/t) ∑_{τ=1}^{t} ⟦ynτ = g(t)(xnτ)⟧ biased; want an unbiased estimator

• non-uniform sampling theorem: if (xnτ, ynτ) is sampled with probability pτ > 0 from the data set {(xn, yn)}_{n=1}^{N} in iteration τ, the weighted training accuracy

(1/t) ∑_{τ=1}^{t} (1/pτ) ⟦ynτ = g(xnτ)⟧ ≈ (1/N) ∑_{n=1}^{N} ⟦yn = g(xn)⟧ in expectation

• with a probabilistic query like EXP4.P: weighted training accuracy ≈ test accuracy

weighted training accuracy: unbiased approximation of test accuracy on the fly
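The estimator on this slide is straightforward to compute once each query records its sampling probability. A minimal sketch, following the slide's formula (the data and the classifier are toy placeholders):

```python
def weighted_training_accuracy(queries, g):
    """Importance-weighted accuracy over the queried examples.

    queries: list of (x, y, p) triples, where p > 0 is the probability
    with which that example was drawn at its iteration; g is the
    current classifier. Each correct prediction counts 1/p, following
    the slide's non-uniform sampling identity.
    """
    t = len(queries)
    return sum((1.0 / p) * (1 if g(x) == y else 0)
               for x, y, p in queries) / t

# Toy check: a classifier that always predicts +1.
g = lambda x: 1
qs = [(0.2, 1, 0.5), (0.7, -1, 0.25), (0.9, 1, 0.5)]
wacc = weighted_training_accuracy(qs, g)  # (1/0.5 + 0 + 1/0.5) / 3
```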
Design of Reward
Human-Designed Criterion as Reward
(Baram et al., 2004) COMB approach: bandit + balancedness of g(t) on unlabeled data as reward
• why? a human-designed criterion that matches the classifier to a domain assumption
• but many active learning applications are on unbalanced data! The assumption may be unrealistic.

existing strategies: active learning by acting;
COMB: active learning by acting;
ours: active learning by learning
Experiments
Comparison with Single Strategies
[figure: query-accuracy curves of ALBL, RAND, UNCERTAIN, PSDS and QUIRE on three data sets; the best single strategy differs: UNCERTAIN on vehicle, PSDS on sonar, QUIRE on diabetes]
• no single best strategy for every data set: choosing/blending is needed
• ALBL consistently matches the best
(similar findings across other data sets)

ALBL: effective in making intelligent choices
Experiments
Comparison with Other Adaptive Blending Algorithms
[figure: query-accuracy curves of ALBL, COMB and ALBL-Train; ALBL ≈ COMB on diabetes, ALBL > COMB on sonar]
• ALBL > ALBL-Train generally: the importance-weighted mechanism is needed to correct the biased training accuracy
• ALBL consistently comparable to or better than COMB: learning performance is more useful than a human-designed criterion

ALBL: effective in utilizing performance
Conclusion
Conclusion
Active Learning by Learning
• based on bandit learning + an unbiased performance estimator as reward
• effective in making intelligent choices: comparable or superior to the best of the existing strategies
• effective in utilizing learning performance: superior to human-criterion-based blending

New Directions
• open-source tool being developed
• extending to more sophisticated active learning problems
Thank you! Questions?