(1)

Making Active Learning More Realistic

Hsuan-Tien Lin

National Taiwan University

htlin@csie.ntu.edu.tw

Keynote Talk at the Weakly-supervised Learning Workshop @ ACML, November 17, 2019

(2)

About Me

Professor

National Taiwan University

Chief Data Science Consultant (former Chief Data Scientist)

Appier Inc.

Co-author of Learning from Data

Instructor

NTU-Coursera MOOCs: ML Foundations/Techniques

(3)

Active Learning

Apple Recognition Problem

Note: Slide Taken from my “ML Techniques” MOOC

• need an apple classifier: is this a picture of an apple?

• gather photos under the CC-BY-2.0 license on Flickr (thanks to the authors below!) and label them as apple/other for learning

(APAL stands for Apple and Pear Australia Ltd)

Photo credits: Dan Foy (https://flic.kr/p/jNQ55), APAL (https://flic.kr/p/jzP1VB), adrianbartel (https://flic.kr/p/bdy2hZ), ANdrzej cH. (https://flic.kr/p/51DKA8), Stuart Webster (https://flic.kr/p/9C3Ybd), nachans (https://flic.kr/p/9XD7Ag), APAL (https://flic.kr/p/jzRe4u), Jo Jakeman (https://flic.kr/p/7jwtGp), APAL (https://flic.kr/p/jzPYNr), APAL (https://flic.kr/p/jzScif)

(4)

Active Learning

Apple Recognition Problem

Note: Slide Taken from my “ML Techniques” MOOC

• need an apple classifier: is this a picture of an apple?

• gather photos under the CC-BY-2.0 license on Flickr (thanks to the authors below!) and label them as apple/other for learning

Photo credits: Mr. Roboto. (https://flic.kr/p/i5BN85), Richard North (https://flic.kr/p/bHhPkB), Richard North (https://flic.kr/p/d8tGou), Emilian Robert Vicol (https://flic.kr/p/bpmGXW), Nathaniel McQueen (https://flic.kr/p/pZv1Mf), Crystal (https://flic.kr/p/kaPYp), jfh686 (https://flic.kr/p/6vjRFH), skyseeker (https://flic.kr/p/2MynV), Janet Hudson (https://flic.kr/p/7QDBbm), Rennett Stowe (https://flic.kr/p/agmnrk)

(5)

Active Learning

Active Learning: Learning by ‘Asking’

but labeling is expensive

Protocol ⇔ Learning Philosophy

• batch: ‘duck feeding’

• active: ‘question asking’ (iteratively)

— query y_n of a chosen x_n

[diagram: unknown target function f : X → Y; labeled training examples (positive and negative); learning algorithm A; final hypothesis g ≈ f]

• active: improve the hypothesis with fewer labels (hopefully) by asking questions strategically

— learning with incomplete labels

(6)

Active Learning

Pool-Based Active Learning Problem

Given

• labeled pool D_l = {(feature x_n, label y_n (e.g. IsApple?))}_{n=1}^{N}

• unlabeled pool D_u = {x̃_s}_{s=1}^{S}

Goal

design an algorithm that iteratively

1 strategically query some x̃_s to get the associated ỹ_s

2 move (x̃_s, ỹ_s) from D_u to D_l

3 learn classifier g(t) from D_l (optionally, + D_u) and improve the test accuracy of g(t) w.r.t. #queries

how to query strategically?
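The loop above translates almost line-for-line into code. Below is a minimal sketch, assuming scikit-learn's LogisticRegression as the base learner; `oracle` and `query` are placeholder callables that do not come from the slides:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pool_based_active_learning(X_l, y_l, X_u, oracle, query, budget):
    """Minimal pool-based active learning loop.

    X_l, y_l : labeled pool D_l
    X_u      : unlabeled pool D_u
    oracle   : callable(x) -> true label (the expensive labeler)
    query    : callable(model, X_u) -> index s of the x~_s to query
    budget   : number of queries allowed
    """
    model = LogisticRegression()
    for t in range(budget):
        model.fit(X_l, y_l)                 # learn g(t) from D_l
        s = query(model, X_u)               # 1. strategically pick x~_s
        y_s = oracle(X_u[s])                # ... and obtain its label y~_s
        X_l = np.vstack([X_l, X_u[s:s+1]])  # 2. move (x~_s, y~_s) into D_l
        y_l = np.append(y_l, y_s)
        X_u = np.delete(X_u, s, axis=0)
    model.fit(X_l, y_l)                     # final g(T)
    return model
```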

(7)

Uncertainty Sampling

How to Query Strategically? (1/2)

by DFID - UK Department for International Development;

licensed under CC BY-SA 2.0 via Wikimedia Commons

Strategy 1

ask the most confused question

how to define confusion (uncertainty)?

(8)

Uncertainty Sampling

Uncertainty of Hard Binary Classifier

uncertain ⇔ near the binary decision boundary

Active Learning with SVM (Tong, 2000)

1 learn an SVM hyperplane with D_l (blue squares)

2 query the x̃_s (magenta circle) closest to the hyperplane

figure from my former student Chun-Liang Li’s ACML 2012 presentation (Li, 2012)

uncertainty sampling:

arguably the most popular AL paradigm because it is simple and effective
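A minimal sketch of this query rule (Tong, 2000) with a scikit-learn linear SVM; the data arrays are assumed inputs. The example closest to the hyperplane is the one with the smallest absolute decision value:

```python
import numpy as np
from sklearn.svm import SVC

def svm_uncertainty_query(X_l, y_l, X_u):
    """Return the index of the unlabeled x~_s closest to the SVM hyperplane."""
    clf = SVC(kernel="linear")
    clf.fit(X_l, y_l)                            # 1. learn a hyperplane from D_l
    margin = np.abs(clf.decision_function(X_u))  # |w.x + b|, proportional to distance
    return int(np.argmin(margin))                # 2. most uncertain example
```

A function like this can serve directly as the `query` callable in the earlier pool-based loop sketch.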

(9)

Representative Sampling

Problem of Uncertainty Sampling

[figures: uncertainty sampling initially vs. after many iterations]

figures from my student Chun-Liang Li’s ACML 2012 presentation (Li, 2012)

overly confident about unknown clusters

• trapped in a bad “local optimum”

solution: query the unknown clusters

(10)

Representative Sampling

How to Query Strategically? (2/2)

by DFID - UK Department for International Development;

licensed under CC BY-SA 2.0 via Wikimedia Commons

Strategy 2

ask the most frequent question

how to combine frequency (density) with uncertainty?

(11)

Representative Sampling

Representative Sampling (1/2):

Query of Informative and Repre. Examples (QUIRE)

QUIRE (Huang, 2010)

x̃_s = argmin_{x̃} ( worst-case loss on (D_l ∪ D_u) given (x̃, ỹ) )

where loss ≈ −accuracy

• ≈ uncertainty: knowing ỹ improves the loss a lot in the worst case

• ≈ representative: knowing ỹ improves the loss a lot for the D_u part

QUIRE: promising for representative sampling, but time-consuming to compute

(12)

Representative Sampling

Representative Sampling (2/2):

Hinted Sampling

Uncertainty Sampling (US)

x̃_s = argmax_{x̃} ( 1 − |p(x̃)| ), where p(x̃) is the probability of the more likely class

Hinted Sampling

x̃_s = argmax_{x̃} ( 1 − |p′(x̃)| ), where p′(x̃) ≈ 1/2 for dense regions of D_u

[figures: initially (center), evolving to afterwards under US (left) and under Hinted Sampling (right)]

figures from my student Chun-Liang Li’s ACML 2012 presentation (Li, 2012)

Hinted Sampling: a simple realization of representative sampling
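The US rule is directly implementable from any classifier exposing class probabilities. The real hinted sampling of (Li, 2012) trains a hinted SVM with hint points drawn from D_u; the second function below is NOT that algorithm, only a crude density-weighted surrogate that mimics the stated effect of pushing p′(x̃) toward 1/2 in dense regions of D_u:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def us_query(model, X_u):
    """Uncertainty sampling: argmax of 1 - p(x~), p = prob. of the likelier class."""
    p = model.predict_proba(X_u).max(axis=1)
    return int(np.argmax(1.0 - p))

def hinted_like_query(model, X_u, bandwidth=1.0):
    """Density-weighted surrogate (NOT the hinted SVM of Li, 2012):
    interpolates p toward 1/2 where D_u is dense, so dense, unexplored
    regions look uncertain and get queried."""
    p = model.predict_proba(X_u).max(axis=1)
    density = np.exp(KernelDensity(bandwidth=bandwidth)
                     .fit(X_u).score_samples(X_u))   # score_samples = log-density
    density /= density.max()                         # normalize to [0, 1]
    p_hint = (1 - density) * p + density * 0.5
    return int(np.argmax(1.0 - p_hint))
```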

(13)

Choice of Strategy

How to Query Strategically?

by DFID - UK Department for International Development;

licensed under CC BY-SA 2.0 via Wikimedia Commons

Strategy 1

uncertainty sampling

Strategy 2

QUIRE

Strategy 3

hinted sampling

do you use a fixed strategy in practice?

(14)

Choice of Strategy

Choice of Strategy

Strategy 1

uncertainty sampling

Strategy 2 QUIRE

Strategy 3

hinted sampling

Strategy 4

PSDS (Donmez, 2008)

choosing one single strategy is non-trivial:

[figures: accuracy vs. % of unlabelled data queried, for RAND, UNCERTAIN, PSDS, and QUIRE on three data sets]

• human-designed strategies: heuristic, and confine the machine's ability

• choice (in advance): a practitioner's pain point

— proposed solution: learning to choose

(15)

Choice of Strategy

Idea: Trial-and-Reward Like Human

by DFID - UK Department for International Development;

licensed under CC BY-SA 2.0 via Wikimedia Commons

K strategies:

A1, A2, · · · , AK

try one strategy

“goodness” of strategy as reward

two issues: try and reward

(16)

Choice of Strategy

Reduction to Bandit

when do humans do trial-and-reward?

gambling

K strategies:

A1, A2, · · · , AK

try one strategy

“goodness” of strategy as reward

K bandit machines:

B1, B2, · · · , BK

try one bandit machine

“luckiness” of machine as reward

— will take one well-known probabilistic bandit learner (EXP4.P):

intelligent choice of strategy =⇒ intelligent choice of bandit machine

(17)

Choice of Strategy

Active Learning by Learning

(Hsu, 2015)

K strategies:

A1, A2, · · · , AK

try one strategy

“goodness” of strategy as reward

Given: K existing active learning strategies

for t = 1, 2, . . . , T:

1 let EXP4.P decide the strategy A_k to try

2 query the x̃_s suggested by A_k, and compute g(t)

3 evaluate the goodness of g(t) as the reward of the trial, to update EXP4.P
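A simplified sketch of this control loop. Real EXP4.P also handles expert-advice vectors and confidence bounds; the version below uses plain EXP3-style multiplicative updates so the structure stays visible, and all names are illustrative:

```python
import numpy as np

def albl_loop(strategies, query_and_reward, T, eta=0.1, rng=np.random):
    """Simplified Active Learning by Learning (Hsu, 2015) control loop.

    strategies       : the K strategies A_1, ..., A_K
    query_and_reward : callable(A_k) -> reward in [0, 1]; queries the x~_s
                       suggested by A_k, updates g(t), returns its goodness
    """
    K = len(strategies)
    w = np.ones(K)                              # one weight per strategy
    for t in range(T):
        p = w / w.sum()                         # probabilistic choice
        k = rng.choice(K, p=p)                  # 1. decide strategy A_k to try
        r = query_and_reward(strategies[k])     # 2. query and compute g(t)
        w[k] *= np.exp(eta * (r / p[k]) / K)    # 3. importance-weighted update
    return w / w.sum()
```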

only remaining problem: what reward?

(18)

Design of Reward

Ideal Reward

ideal reward after updating classifier g(t) by the query (x_{n_t}, y_{n_t}):

accuracy = (1/M) Σ_{m=1}^{M} ⟦y_m = g(t)(x_m)⟧ on the test set {(x_m, y_m)}_{m=1}^{M}

test accuracy as reward:

area under the query-accuracy curve ≡ cumulative reward

[figure: a query-accuracy curve (accuracy vs. % of unlabelled data queried) for RAND, UNCERTAIN, PSDS, and QUIRE]

• test accuracy is infeasible in practice

— labeling is expensive, remember?

difficulty: approximating test accuracy on the fly
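Since the per-query reward is the test accuracy of g(t), the cumulative reward over all queries is exactly the discrete area under the query-accuracy curve. A tiny illustration with made-up accuracy values:

```python
import numpy as np

# toy query-accuracy curve: accuracy of g(t) after each query (made-up numbers)
accuracy_curve = np.array([0.52, 0.58, 0.63, 0.66, 0.70])

# discrete area under the query-accuracy curve == sum of per-query rewards
cumulative_reward = accuracy_curve.sum()
print(cumulative_reward)  # 3.09
```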

(19)

Design of Reward

Weighted Training Accuracy as Reward

non-uniform sampling theorem: if (x_{n_τ}, y_{n_τ}) is sampled with probability p_τ > 0 from the data set {(x_n, y_n)}_{n=1}^{N}, then

weighted training accuracy = (1/t) Σ_{τ=1}^{t} (1/p_τ) ⟦y_{n_τ} = g(x_{n_τ})⟧ ≈ (1/N) Σ_{n=1}^{N} ⟦y_n = g(x_n)⟧ in expectation

• with a probabilistic query like EXP4.P:

weighted training accuracy ≈ test accuracy

— can approximate test accuracy on the fly

ALBL: EXP4.P + weighted training accuracy
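A minimal sketch of this estimator; `g` is any fitted classifier with a `predict` method, and the arrays hold the t queried examples and their query probabilities (all assumed inputs):

```python
import numpy as np

def weighted_training_accuracy(g, X_q, y_q, p_q):
    """(1/t) * sum over tau of (1/p_tau) * [y_{n_tau} == g(x_{n_tau})].

    X_q, y_q : the t examples queried so far and their labels
    p_q      : the probability p_tau > 0 with which each was queried
    """
    correct = (g.predict(X_q) == np.asarray(y_q)).astype(float)
    return float(np.mean(correct / np.asarray(p_q)))
```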

(20)

Design of Reward

Comparison with Single Strategies

[figures: accuracy vs. % of unlabelled data queried for ALBL, RAND, UNCERTAIN, PSDS, and QUIRE on three data sets: vehicle (UNCERTAIN best), sonar (PSDS best), diabetes (QUIRE best)]

no single best strategy for every data set

— choosing is needed

ALBL consistently matches the best

— similar findings across other data sets

ALBL: effective in making intelligent choices

(21)

Realistic Active Learning

Have We Made Active Learning More Realistic? (1/2)

Yes!

open-source tool libact developed (Yang, 2017)

https://github.com/ntucllab/libact

• includes uncertainty, hinted, QUIRE, PSDS, and ALBL

• received >500 stars and continuous issues

“libact is a Python package designed to make active learning easier for real-world users”
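A minimal usage sketch following the pattern in libact's README; the exact class names and constructor arguments may differ across libact versions, so treat the details as assumptions:

```python
import numpy as np
from libact.base.dataset import Dataset
from libact.labelers import IdealLabeler
from libact.models import LogisticRegression
from libact.query_strategies import UncertaintySampling

# toy pool: 100 points, with only the first 10 labels revealed up front
X = np.random.randn(100, 2)
y = (X[:, 0] > 0).astype(int)
dataset = Dataset(X, list(y[:10]) + [None] * 90)   # None marks "unlabeled"
labeler = IdealLabeler(Dataset(X, y))              # simulated human oracle

qs = UncertaintySampling(dataset, model=LogisticRegression())
for _ in range(20):                                # labeling budget (quota)
    query_id = qs.make_query()                     # index of x~_s to query
    lbl = labeler.label(dataset.data[query_id][0]) # ask the oracle for y~_s
    dataset.update(query_id, lbl)                  # grow the labeled part of the pool
```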

(22)

Realistic Active Learning

Have We Made Active Learning More Realistic? (2/2)

No!

• single most-raised issue: hard to install on Windows/Mac

— because hinted sampling (& some others) requires C packages

• performance in a recent industry project:

uncertainty sampling often suffices

ALBL dragged down by a bad strategy (DWUS: representative sampling)

“libact is a Python package designed to make active learning easier for real-world users”

(23)

Realistic Active Learning

Other Attempts for Realistic Active Learning

“learn” a strategy beforehand rather than on the fly?

• learning active learning (Konyushkova, 2017)

• transfer active learning experience (Chu, 2016)

— not easy to realize in an open-source package

a strategy to save true resource consumption (cost)?

• annotation cost-sensitive learning (Tsou, 2019)

— costly to get costs

many more needs to be satisfied: mini-batch, multi-label query, weak-label query, etc.

(24)

Realistic Active Learning

Summary

Traditional Active Learning

• strategies by human philosophy: uncertainty, representative (QUIRE, hinted), and many more

Active Learning by (Bandit) Learning

• based on bandit learning + performance estimator as reward

• effective in making intelligent choices

— comparable or superior to the best of existing strategies

Making Active Learning Realistic

• attempt with the open-source tool libact

• still very challenging

Thank you! Questions?
