Making Active Learning More Realistic
Hsuan-Tien Lin
National Taiwan University
htlin@csie.ntu.edu.tw
Keynote Talk at the Weakly-Supervised Learning Workshop @ ACML, November 17, 2019
About Me
Professor
National Taiwan University
Chief Data Science Consultant (former Chief Data Scientist)
Appier Inc.
Co-author of Learning from Data
Instructor
NTU-Coursera MOOCs: ML Foundations/Techniques
Active Learning
Apple Recognition Problem
Note: Slide Taken from my “ML Techniques” MOOC
• need an apple classifier: is this a picture of an apple?
• gather photos under the CC BY 2.0 license on Flickr (thanks to the authors below!) and label them as apple/other for learning
(APAL stands for Apple and Pear Australia Ltd)
photo credits (authors and links, in slide order): Dan Foy https://flic.kr/p/jNQ55, APAL https://flic.kr/p/jzP1VB, adrianbartel https://flic.kr/p/bdy2hZ, ANdrzej cH. https://flic.kr/p/51DKA8, Stuart Webster https://flic.kr/p/9C3Ybd, nachans https://flic.kr/p/9XD7Ag, APAL https://flic.kr/p/jzRe4u, Jo Jakeman https://flic.kr/p/7jwtGp, APAL https://flic.kr/p/jzPYNr, APAL https://flic.kr/p/jzScif
Active Learning
Apple Recognition Problem (cont.)
photo credits (authors and links, in slide order): Mr. Roboto. https://flic.kr/p/i5BN85, Richard North https://flic.kr/p/bHhPkB, Richard North https://flic.kr/p/d8tGou, Emilian Robert Vicol https://flic.kr/p/bpmGXW, Nathaniel McQueen https://flic.kr/p/pZv1Mf, Crystal https://flic.kr/p/kaPYp, jfh686 https://flic.kr/p/6vjRFH, skyseeker https://flic.kr/p/2MynV, Janet Hudson https://flic.kr/p/7QDBbm, Rennett Stowe https://flic.kr/p/agmnrk
Active Learning
Active Learning: Learning by ‘Asking’
but labeling is expensive
Protocol ⇔ Learning Philosophy
• batch: ‘duck feeding’
• active: ‘question asking’ (iteratively)
— query $y_n$ of chosen $x_n$
[diagram: unknown target function $f : \mathcal{X} \to \mathcal{Y}$; labeled training examples (apple photos labeled +1, other photos labeled −1); learning algorithm $\mathcal{A}$; final hypothesis $g \approx f$]
active: improve hypothesis with fewer labels (hopefully) by asking questions strategically
— learning with incomplete labels
Active Learning
Pool-Based Active Learning Problem
Given
• labeled pool $D_l = \{(\text{feature } x_n, \text{label } y_n\ (\text{e.g., IsApple?}))\}_{n=1}^{N}$
• unlabeled pool $D_u = \{\tilde{x}_s\}_{s=1}^{S}$
Goal
design an algorithm that iteratively
1. strategically queries some $\tilde{x}_s$ to get the associated $\tilde{y}_s$
2. moves $(\tilde{x}_s, \tilde{y}_s)$ from $D_u$ to $D_l$
3. learns classifier $g^{(t)}$ from $D_l$ (optionally, $+\,D_u$)
and improves test accuracy of $g^{(t)}$ w.r.t. #queries
how to query strategically?
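The loop itself is short; the strategy is the open question. Below is a minimal sketch, assuming a scikit-learn-style classifier; `oracle` (the expensive labeler) and `query` (the strategy, discussed in the coming slides) are hypothetical callbacks:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pool_based_al(X_l, y_l, X_u, oracle, query, T):
    """Run T rounds of pool-based active learning."""
    clf = LogisticRegression().fit(X_l, y_l)
    for t in range(T):
        s = query(clf, X_u)                       # 1. strategically pick some x~_s
        X_l = np.vstack([X_l, X_u[s]])            # 2. move (x~_s, y~_s) from D_u to D_l
        y_l = np.append(y_l, oracle(X_u[s]))      #    (the expensive labeling step)
        X_u = np.delete(X_u, s, axis=0)
        clf = LogisticRegression().fit(X_l, y_l)  # 3. re-learn g(t) from D_l
    return clf
```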
Uncertainty Sampling
How to Query Strategically? (1/2)
[photo by DFID - UK Department for International Development; licensed under CC BY-SA 2.0 via Wikimedia Commons]
Strategy 1
ask the most confused question
how to define confusion (uncertainty)?
Uncertainty Sampling
Uncertainty of Hard Binary Classifier
uncertain ⇔ near the binary decision boundary
Active Learning with SVM (Tong, 2000)
1. learn an SVM hyperplane with $D_l$ (blue squares)
2. query the $\tilde{x}_s$ (magenta circle) closest to the hyperplane
figure from my former student Chun-Liang Li’s ACML 2012 presentation (Li, 2012)
uncertainty sampling:
arguably the most popular AL paradigm because it is simple and effective
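A sketch of this strategy with a linear SVM, assuming scikit-learn; `decision_function` returns a signed score proportional to the distance from the hyperplane, so the smallest absolute value marks the most uncertain example:

```python
import numpy as np
from sklearn.svm import SVC

def uncertainty_query(X_l, y_l, X_u):
    """Query the unlabeled example closest to the SVM hyperplane."""
    clf = SVC(kernel='linear').fit(X_l, y_l)
    margins = np.abs(clf.decision_function(X_u))  # |distance| to the boundary
    return int(np.argmin(margins))                # most uncertain example
```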
Representative Sampling
Problem of Uncertainty Sampling
[figures: queried examples initially vs. after many iterations]
figures from my student Chun-Liang Li’s ACML 2012 presentation (Li, 2012)
• overly confident about unknown clusters
• trapped in a bad ‘local optimum’
solution: query unknown clusters
Representative Sampling
How to Query Strategically? (2/2)
Strategy 2
ask the most frequent question
how to combine frequency (density) with uncertainty?
Representative Sampling
Representative Sampling (1/2):
Querying Informative and Representative Examples (QUIRE)
QUIRE (Huang, 2010)
$$\tilde{x}_s = \operatorname*{argmin}_{\tilde{x}} \Big( \text{worst-case loss on } (D_l \cup D_u) \,\Big|\, (\tilde{x}, \tilde{y}) \Big), \quad \text{where loss} \approx -\text{accuracy}$$
• ≈ uncertainty: knowing $\tilde{y}$ improves the loss a lot in the worst case
• ≈ representative: knowing $\tilde{y}$ improves the loss a lot on the $D_u$ part
QUIRE: promising for representative sampling (RS), but time-consuming to compute
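Read literally, the min-max criterion above suggests the brute-force sketch below (a hypothetical toy, assuming ±1 labels), which also shows where the cost comes from: one retraining per candidate per guessed label. The actual QUIRE paper derives an efficient closed-form score instead of retraining:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def quire_style_query(X_l, y_l, X_u):
    """Toy min-max query: smallest worst-case loss over the unknown label."""
    best_s, best_worst = 0, np.inf
    for s in range(len(X_u)):
        worst = 0.0
        for y_guess in (-1, +1):              # worst case over the unknown label y~
            X = np.vstack([X_l, X_u[s:s+1]])
            y = np.append(y_l, y_guess)
            clf = LogisticRegression().fit(X, y)
            # loss ~ -accuracy, here measured on D_l as a crude stand-in
            # for the slide's D_l + D_u (whose true labels are unknown)
            worst = max(worst, 1.0 - clf.score(X_l, y_l))
        if worst < best_worst:
            best_s, best_worst = s, worst
    return best_s
```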
Representative Sampling
Representative Sampling (2/2):
Hinted Sampling
Uncertainty Sampling (US)
$$\tilde{x}_s = \operatorname*{argmax}_{\tilde{x}}\ 1 - |p(\tilde{x})|, \quad \text{where } p(\tilde{x}) \text{ is the probability of the more likely class}$$
Hinted Sampling
$$\tilde{x}_s = \operatorname*{argmax}_{\tilde{x}}\ 1 - |p'(\tilde{x})|, \quad \text{where } p'(\tilde{x}) \approx \tfrac{1}{2} \text{ for dense regions of } D_u$$
[figures: starting from the same initial state, query behavior afterwards without hints vs. with hints; from my student Chun-Liang Li’s ACML 2012 presentation (Li, 2012)]
Hinted Sampling: a simple realization of RS
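HintSVM, the realization in the paper, trains the classifier so that “hint” points drawn from $D_u$ fall near the boundary. As a loose, hypothetical approximation of the $p'(\tilde{x})$ idea above (not the paper’s formulation), one can shrink the predicted probability toward 1/2 in dense regions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KernelDensity

def hinted_style_query(X_l, y_l, X_u):
    """Score = 1 - |p'(x)|, with p' pulled toward 1/2 in dense regions of D_u."""
    clf = LogisticRegression().fit(X_l, y_l)
    p = clf.predict_proba(X_u).max(axis=1)     # probability of the more likely class
    log_dens = KernelDensity().fit(X_u).score_samples(X_u)
    w = np.exp(log_dens - log_dens.max())      # normalized density in (0, 1]
    p_prime = (1 - w) * p + w * 0.5            # dense regions look maximally uncertain
    return int(np.argmax(1 - np.abs(p_prime)))
```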
Choice of Strategy
How to Query Strategically?
Strategy 1: uncertainty sampling
Strategy 2: QUIRE
Strategy 3: hinted sampling
do you use a fixed strategy in practice?
Choice of Strategy
Choice of Strategy
Strategy 1: uncertainty sampling
Strategy 2: QUIRE
Strategy 3: hinted sampling
Strategy 4: PSDS (Donmez, 2008)
• choosing one single strategy is non-trivial:
[figures: accuracy vs. % of unlabelled data queried on three data sets; curves for RAND, UNCERTAIN, PSDS, QUIRE; no single curve dominates]
• human-designed strategies are heuristic and confine the machine’s ability
• choice of strategy (in advance): a practitioner’s pain point
— proposed solution: learning to choose
Choice of Strategy
Idea: Trial-and-Reward Like Human
K strategies: A1, A2, ..., AK
try one strategy
“goodness” of strategy as reward
two issues: try and reward
Choice of Strategy
Reduction to Bandit
when do humans do trial-and-reward? gambling
K strategies: A1, A2, ..., AK ⇔ K bandit machines: B1, B2, ..., BK
try one strategy ⇔ try one bandit machine
“goodness” of strategy as reward ⇔ “luckiness” of machine as reward
— will take one well-known probabilistic bandit learner (EXP4.P):
intelligent choice of strategy =⇒ intelligent choice of bandit machine
Choice of Strategy
Active Learning by Learning
(Hsu, 2015)
K strategies: A1, A2, ..., AK
try one strategy
“goodness” of strategy as reward
Given: K existing active learning strategies; for t = 1, 2, ..., T:
1. let EXP4.P decide the strategy $A_k$ to try
2. query the $\tilde{x}_s$ suggested by $A_k$, and compute $g^{(t)}$
3. evaluate the goodness of $g^{(t)}$ as the reward of the trial to update EXP4.P
only remaining problem: what reward?
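A simplified sketch of this loop, with an EXP3-flavored update standing in for EXP4.P (which additionally handles expert advice and high-probability bounds); `query_and_retrain` and `reward_of` are hypothetical callbacks, and the reward design is exactly the topic of the next slides:

```python
import numpy as np

def albl_sketch(K, query_and_retrain, reward_of, T, eta=0.1):
    """Pick among K AL strategies with an EXP3-style probabilistic bandit."""
    w = np.ones(K)                               # one weight per strategy
    clf = None
    for t in range(T):
        p = (1 - eta) * w / w.sum() + eta / K    # query distribution, with exploration
        k = np.random.choice(K, p=p)             # 1. bandit decides strategy A_k
        clf = query_and_retrain(k)               # 2. query x~ suggested by A_k, get g(t)
        r = reward_of(clf)                       # 3. goodness of g(t) in [0, 1] as reward
        w[k] *= np.exp(eta * r / (K * p[k]))     # importance-weighted update
    return clf
```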
Design of Reward
Ideal Reward
ideal reward after updating classifier $g^{(t)}$ by the query $(x_{n_t}, y_{n_t})$:
$$\text{accuracy} = \frac{1}{M} \sum_{m=1}^{M} \big[\!\big[\, y_m = g^{(t)}(x_m) \,\big]\!\big] \quad \text{on test set } \{(x_m, y_m)\}_{m=1}^{M}$$
• test accuracy as reward:
area under the query-accuracy curve ≡ cumulative reward
[figure: accuracy vs. % of unlabelled data queried; curves for RAND, UNCERTAIN, PSDS, QUIRE]
• test accuracy is infeasible in practice
— labeling is expensive, remember?
difficulty: approximate test accuracy on the fly
Design of Reward
Weighted Training Accuracy as Reward
• non-uniform sampling theorem: if $(x_{n_\tau}, y_{n_\tau})$ is sampled with probability $p_\tau > 0$ from data set $\{(x_n, y_n)\}_{n=1}^{N}$, then in expectation
$$\underbrace{\frac{1}{t} \sum_{\tau=1}^{t} \frac{1}{p_\tau} \big[\!\big[\, y_{n_\tau} = g(x_{n_\tau}) \,\big]\!\big]}_{\text{weighted training accuracy}} \;\approx\; \frac{1}{N} \sum_{n=1}^{N} \big[\!\big[\, y_n = g(x_n) \,\big]\!\big]$$
• with a probabilistic query like EXP4.P:
weighted training accuracy ≈ test accuracy
— can approximate test accuracy on the fly
ALBL: EXP4.P + weighted training accuracy
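In code, the estimator is essentially one line, assuming the query probabilities $p_\tau$ from the probabilistic bandit were recorded along the way (names hypothetical):

```python
import numpy as np

def weighted_training_accuracy(g, X_q, y_q, p_q):
    """Unbiased accuracy estimate from the t examples queried so far.

    p_q[i] is the probability with which example i was queried, which is
    known because the bandit queries probabilistically.
    """
    correct = (g.predict(X_q) == y_q).astype(float)
    return np.mean(correct / p_q)
```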
Design of Reward
Comparison with Single Strategies
[figures: accuracy vs. % of unlabelled data queried; curves for ALBL, RAND, UNCERTAIN, PSDS, QUIRE; UNCERTAIN is best on vehicle, PSDS is best on sonar, QUIRE is best on diabetes, and ALBL tracks the best on each]
• no single best strategy for every data set
— choosing is needed
• ALBL consistently matches the best
— similar findings across other data sets
ALBL: effective in making intelligent choices
Realistic Active Learning
Have We Made Active Learning More Realistic? (1/2)
Yes!
open-source tool libact developed (Yang, 2017)
https://github.com/ntucllab/libact
• includes uncertainty, hinted, QUIRE, PSDS, and ALBL
• received >500 stars and a continuous stream of issues
“libact is a Python package designed to make active learning easier for real-world users”
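A typical query loop, to the best of my reading of the libact README (the exact API may differ across versions; `X`, `y`, `quota`, and the labeling `oracle` are placeholders):

```python
from libact.base.dataset import Dataset
from libact.models import LogisticRegression
from libact.query_strategies import UncertaintySampling

dataset = Dataset(X, y)   # y holds None for every still-unlabeled example
qs = UncertaintySampling(dataset, model=LogisticRegression())

for _ in range(quota):                # quota = number of labels we can afford
    ask_id = qs.make_query()          # index of the example to label next
    dataset.update(ask_id, oracle(X[ask_id]))
```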
Realistic Active Learning
Have We Made Active Learning More Realistic? (2/2)
No!
• single-most-raised issue: hard to install on Windows/Mac
— because hinted (& others) require some C packages
• performance in a recent industry project:
• uncertainty sampling often suffices
• ALBL dragged down by a bad strategy (DWUS: representative sampling)
“libact is a Python package designed to make active learning easier for real-world users”
Realistic Active Learning
Other Attempts for Realistic Active Learning
“learn” a strategy beforehand rather than on the fly?
• learning active learning (Konyushkova, 2017)
• transfer active learning experience (Chu, 2016)
— not easy to realize in an open-source package
strategy to save true resource consumption (cost)?
• annotation cost-sensitive learning (Tsou, 2019)
— costly to get costs
many more needs to be satisfied: mini-batch, multi-label query, weak-label query, etc.
Realistic Active Learning
Summary
Traditional Active Learning
• strategies by human philosophy: uncertainty, representative (QUIRE, hinted), and many more
Active Learning by (Bandit) Learning
• based on bandit learning + a performance estimator as reward
• effective in making intelligent choices
— comparable or superior to the best of existing strategies
Making Active Learning Realistic
• attempt with the open-source tool libact
• still very challenging
Thank you! Questions?