Cost-sensitive Classification: Techniques and Stories
Hsuan-Tien Lin htlin@csie.ntu.edu.tw
Professor
Dept. of CSIE, National Taiwan University
Machine Learning Summer School @ Taipei, Taiwan August 2, 2021
About Me
• co-author of textbook ‘Learning from Data: A Short Course’
• instructor of two Mandarin-teaching ML MOOCs on Coursera
goal: make ML more realistic
• weakly supervised learning: in ICML ’20, ICLR ’21, ...
• online/active learning: in ICML ’12, ICML ’14, AAAI ’15, EMNLP ’20, ...
• cost-sensitive classification: in ICML ’10, KDD ’12, IJCAI ’16, ...
• multi-label classification: in NeurIPS ’12, ICML ’14, AAAI ’18, ...
• large-scale data mining: e.g. co-led KDDCup world-champion NTU teams 2010–2013
More About Me
attendee: MLSS Taipei 2006
student workshop talk: Large-Margin Thresholded Ensembles for Ordinal Regression
Disclaimer about Cost-sensitive Classification
materials mostly from “old” tutorials
• Advances in Cost-sensitive Multiclass and Multilabel Classification. KDD 2019 Tutorial, Anchorage, Alaska, USA, August 2019.
• Cost-sensitive Classification: Algorithms and Advances. ACML 2013 Tutorial, Canberra, Australia, November 2013.
• core techniques somewhat mature, compared to 10 years ago
• new research still being inspired, e.g.
  • Classification with Rejection Based on Cost-sensitive Classification, Charoenphakdee et al., ICML ’21
  • Cost-Sensitive Robustness against Adversarial Examples, Zhang and Evans, ICLR ’19
• will show one application story in the end
Cost-Sensitive Multiclass Classification CSMC Motivation and Setup
Outline
1 Cost-Sensitive Multiclass Classification
   CSMC Motivation and Setup
   CSMC by Bayesian Perspective
   CSMC by (Weighted) Binary Classification
   CSMC by Regression
2 Cost-Sensitive Multilabel Classification
   CSML Motivation and Setup
   CSML by Bayesian Perspective
   CSML by (Weighted) Binary Classification
   CSML by Regression
3 A Story of Bacteria Classification with Doctor-Annotated Costs
4 Summary
Cost-Sensitive Multiclass Classification CSMC Motivation and Setup
Which Digit Did You Write?
?
one (1) two (2) three (3) four (4)
• a multiclass classification problem: grouping ‘pictures’ into different ‘categories’

C’mon, we know about multiclass classification all too well! :-)
Cost-Sensitive Multiclass Classification CSMC Motivation and Setup
Performance Evaluation
?
• ZIP code recognition:
1: wrong; 2: right; 3: wrong; 4: wrong
• check value recognition:
1: one-dollar mistake; 2: no mistake; 3: one-dollar mistake; 4: two-dollar mistake
different applications: evaluate mis-predictions differently
Cost-Sensitive Multiclass Classification CSMC Motivation and Setup
ZIP Code Recognition
?
1: wrong; 2: right; 3: wrong; 4: wrong
• regular multiclass classification: only right or wrong
• wrong cost: 1; right cost: 0
• prediction error of h on some (x, y): classification cost = ⟦y ≠ h(x)⟧

regular multiclass classification: well-studied, many good algorithms
Cost-Sensitive Multiclass Classification CSMC Motivation and Setup
Check Value Recognition
?
1: one-dollar mistake; 2: no mistake; 3: one-dollar mistake; 4: two-dollar mistake
• cost-sensitive multiclass classification: different costs for different mis-predictions
• e.g. prediction error of h on some (x, y): absolute cost = |y − h(x)|

next: more about cost-sensitive multiclass classification (CSMC)
Cost-Sensitive Multiclass Classification CSMC Motivation and Setup
What is the Status of the Patient?
(image by mcmurryjulie from Pixabay)?
bird flu cold healthy
(images by Clker-Free-Vector-Images from Pixabay)
• another classification problem: grouping ‘patients’ into different ‘statuses’
are all mis-prediction costs equal?
Cost-Sensitive Multiclass Classification CSMC Motivation and Setup
Patient Status Prediction
error measure = society cost

actual \ predicted   bird flu    cold    healthy
bird flu                 0       1000     100000
cold                   100          0       3000
healthy                100         30          0
• bird flu mis-predicted as healthy: very high cost
• cold mis-predicted as healthy: high cost
• cold correctly predicted as cold: no cost
human doctors consider costs of decision;
can computer-aided diagnosis do the same?
Cost-Sensitive Multiclass Classification CSMC Motivation and Setup
Setup: Class-Dependent Cost-Sensitive Classification
Given
N examples, each (input xn, label yn) ∈ X × {1, 2, . . . , K}
and cost matrix C ∈ R^{K×K} with C(y, y) = 0 = min_{1≤k≤K} C(y, k)

patient diagnosis with society cost
C = [   0   1000   100000
      100      0     3000
      100     30        0 ]

check digit recognition with absolute cost (cost function) C(y, k) = |y − k|

Goal
a classifier g(x) that pays a small cost C(y, g(x)) on future unseen example (x, y)

includes regular classification cost Cc, e.g. [ 0 1; 1 0 ], as a special case
Cost-Sensitive Multiclass Classification CSMC Motivation and Setup
Which Age-Group?
? infant (1) child (2) teen (3) adult (4)
(images by Tawny van Breda, Pro File, Mieczysław Samol, lisa runnels, vontoba from Pixabay)
• small mistake—classify child as teen; big mistake—classify infant as adult
• cost matrix C(y, g(x)) for embedding ‘order’:
C = [ 0 1 4 5
      1 0 1 3
      3 1 0 2
      5 4 1 0 ]

CSMC can help solve many other problems like ordinal ranking
Cost-Sensitive Multiclass Classification CSMC Motivation and Setup
Cost Vector
cost vector c: a row of cost components
• society cost for a bird flu patient: c = (0, 1000, 100000)
• absolute cost for digit 2: c = (1, 0, 1, 2)
• age-ranking cost for a teenager: c = (3, 1, 0, 2)
• ‘regular’ classification cost for label 2: c = (1, 0, 1, 1)
• movie recommendation
  • someone who loves romance movies but hates terror: c = (romance = 0, fiction = 5, terror = 100)
  • someone who loves romance movies but is fine with terror: c = (romance = 0, fiction = 5, terror = 3)

cost vector: representation of personal preference in many applications
Cost-Sensitive Multiclass Classification CSMC Motivation and Setup
Setup: Example-Dependent Cost-Sensitive Classification
Given
N examples, each (input xn, label yn) ∈ X × {1, 2, . . . , K} and cost vector cn ∈ R^K
—will assume cn[yn] = 0 = min_{1≤k≤K} cn[k]

Goal
a classifier g(x) that pays a small cost c[g(x)] on future unseen example (x, y, c)
• will assume c[y] = 0 = cmin = min_{1≤k≤K} c[k]
• note: y not really needed in evaluation

example-dependent ⊃ class-dependent ⊃ regular
Cost-Sensitive Multiclass Classification CSMC by Bayesian Perspective
Outline
1 Cost-Sensitive Multiclass Classification
   CSMC Motivation and Setup
   CSMC by Bayesian Perspective
   CSMC by (Weighted) Binary Classification
   CSMC by Regression
2 Cost-Sensitive Multilabel Classification
   CSML Motivation and Setup
   CSML by Bayesian Perspective
   CSML by (Weighted) Binary Classification
   CSML by Regression
3 A Story of Bacteria Classification with Doctor-Annotated Costs
4 Summary
Cost-Sensitive Multiclass Classification CSMC by Bayesian Perspective
Key Idea: Conditional Probability Estimator
Goal (Class-Dependent Setup)
a classifier g(x) that pays a small cost C(y, g(x)) on future unseen example (x, y)

if P(y|x) known:
Bayes optimal g*(x) = argmin_{1≤k≤K} Σ_{y=1}^{K} P(y|x) C(y, k)

if q(x, y) ≈ P(y|x) well:
approximately good gq(x) = argmin_{1≤k≤K} Σ_{y=1}^{K} q(x, y) C(y, k)

how to get conditional probability estimator q? logistic regression, Naïve Bayes, ...
Cost-Sensitive Multiclass Classification CSMC by Bayesian Perspective
Approximate Bayes-Optimal Decision
if q(x, y) ≈ P(y|x) well (Domingos, 1999):
approximately good gq(x) = argmin_{1≤k≤K} Σ_{y=1}^{K} q(x, y) C(y, k)

Approximate Bayes-Optimal Decision (ABOD) Approach
1 use your favorite algorithm on {(xn, yn)} to get q(x, y) ≈ P(y|x)
2 for each new input x, predict its class using gq(x) above

ABOD: probability estimate + Bayes-optimal decision
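a minimal ABOD sketch in Python (assuming scikit-learn as the probability estimator; abod_fit_predict is an illustrative name, and classes are assumed 0-indexed so that predict_proba columns align with the rows of C):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def abod_fit_predict(X_train, y_train, X_test, C):
        """C[y, k]: cost of predicting class k when the true class is y."""
        q = LogisticRegression(max_iter=1000).fit(X_train, y_train)
        prob = q.predict_proba(X_test)           # row n: q(xn, y) for each class y
        expected_cost = prob @ C                 # column k: sum_y q(x, y) C(y, k)
        return np.argmin(expected_cost, axis=1)  # approximate Bayes-optimal gq(x)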
Cost-Sensitive Multiclass Classification CSMC by Bayesian Perspective
ABOD on Artificial Data
1 use your favorite algorithm on {(xn, yn)} to get q(x, y) ≈ P(y|x)
2 for each new input x, predict its class using gq(x) above

LogReg with the ‘rotate’ cost matrix (rows: true y, columns: g(x)):
        g(x)=1  g(x)=2  g(x)=3  g(x)=4
y = 1      0       1       2       4
y = 2      4       0       1       2
y = 3      2       4       0       1
y = 4      1       2       4       0

[figure: decision boundaries on artificial data under regular vs. rotate cost]
Cost-Sensitive Multiclass Classification CSMC by Bayesian Perspective
ABOD for Binary Classification
Given N examples, each (input xn, label yn) ∈ X × {−1, +1}
and weights w+, w− representing two entries of the cost matrix:

          g(x) = +1   g(x) = −1
y = +1        0          w+
y = −1       w−           0

if q(x) ≈ P(+1|x) well:
approximately good gq(x) = sign(w+ q(x) − w− (1 − q(x))), i.e. (Elkan, 2001),
gq(x) = +1 ⇐⇒ w+ q(x) − w− (1 − q(x)) > 0 ⇐⇒ q(x) > w− / (w+ + w−)

ABOD for binary classification:
probability estimate + threshold changing
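a tiny companion sketch of the shifted threshold (numpy only; the helper name is illustrative):

    import numpy as np

    def abod_binary_predict(q_pos, w_pos, w_neg):
        """q_pos: estimated P(+1|x) per example; w_pos / w_neg: cost of
        mis-predicting a true +1 / a true -1 example."""
        threshold = w_neg / (w_pos + w_neg)      # shifted decision threshold
        return np.where(q_pos > threshold, 1, -1)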
Cost-Sensitive Multiclass Classification CSMC by Bayesian Perspective
ABOD for Binary Classification on Artificial Data
1 use your favorite algorithm on {(xn, yn)} to get q(x) ≈ P(+1|x)
2 for each new input x, predict its class using gq(x) = sign(q(x) − w−/(w+ + w−))

LogReg with cost matrix:
          g(x) = +1   g(x) = −1
y = +1        0          10
y = −1        1           0

[figure: decision boundaries on artificial data under regular vs. positive-emphasis cost]
Cost-Sensitive Multiclass Classification CSMC by Bayesian Perspective
Pros and Cons of ABOD
Pros
• optimal if good probability estimate q
• prediction easily adapts to different C without modifying training (probability estimate)

Cons
• ‘difficult’: good probability estimate often more difficult than good multiclass classification
• ‘restricted’: only applicable to class-dependent setup—need ‘full picture’ of cost matrix
• ‘slower prediction’ (for multiclass): more calculation at prediction stage

can we use any multiclass classification algorithm for ABOD?
Cost-Sensitive Multiclass Classification CSMC by Bayesian Perspective
MetaCost Approach
Approximate Bayes-Optimal Decision (ABOD) Approach
1 use your favorite algorithm on {(xn, yn)} to get q(x, y) ≈ P(y|x)
2 for each new input x, predict its class using gq(x)

MetaCost Approach (Domingos, 1999)
1 use your favorite multiclass classification algorithm on bootstrapped {(xn, yn)} and aggregate the classifiers to get q(x, y) ≈ P(y|x)
2 for each given input xn, relabel it to yn′ using gq(xn)
3 run your favorite multiclass classification algorithm on relabeled {(xn, yn′)} to get final classifier g
4 for each new input x, predict its class using g(x)

pros: any multiclass classification algorithm can be used
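a minimal MetaCost sketch (assuming scikit-learn; bagged decision trees stand in for ‘your favorite algorithm’, and labels are assumed 0-indexed to match the rows of C):

    import numpy as np
    from sklearn.ensemble import BaggingClassifier
    from sklearn.tree import DecisionTreeClassifier

    def metacost_fit(X, y, C, n_boot=10):
        bag = BaggingClassifier(DecisionTreeClassifier(),
                                n_estimators=n_boot).fit(X, y)  # bootstrap + aggregate
        q = bag.predict_proba(X)                # q(x, y) on the training inputs
        y_relabeled = np.argmin(q @ C, axis=1)  # relabel each xn by gq(xn)
        return DecisionTreeClassifier().fit(X, y_relabeled)  # final classifier g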
Cost-Sensitive Multiclass Classification CSMC by Bayesian Perspective
MetaCost on Semi-Real Data
[figure: cost of MetaCost vs. multiclass and two-class baselines on two-class problems, panels: C4.5R, Undersampling, Oversampling; diagonal y = x shown]
(Domingos, 1999)
• some ‘artificial’ cost with UCI data
• MetaCost + C4.5: cost-sensitive
• C4.5: regular
not surprisingly,
considering the cost properly does help
Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification
Outline
1 Cost-Sensitive Multiclass Classification
   CSMC Motivation and Setup
   CSMC by Bayesian Perspective
   CSMC by (Weighted) Binary Classification
   CSMC by Regression
2 Cost-Sensitive Multilabel Classification
   CSML Motivation and Setup
   CSML by Bayesian Perspective
   CSML by (Weighted) Binary Classification
   CSML by Regression
3 A Story of Bacteria Classification with Doctor-Annotated Costs
4 Summary
Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification
Key Idea: Cost Transformation
(heuristic) relabeling useful in MetaCost: a more principled way?

Yes, by Connecting Cost Vector to Regular Costs!
  (1, 0, 1, 2)   --shift equivalence-->   (3, 2, 3, 4)
  c of interest                           shifted cost

  (3, 2, 3, 4) = 1 · (0, 1, 1, 1) + 2 · (1, 0, 1, 1) + 1 · (1, 1, 0, 1) + 0 · (1, 1, 1, 0)
                 mixture weights uℓ on the rows of the regular cost matrix

i.e. x with c = (1, 0, 1, 2) equivalent to
a weighted mixture {(x, y, u)} = {(x, 1, 1), (x, 2, 2), (x, 3, 1)}

cost equivalence (Lin, 2014): for any classifier h,
  c[h(x)] + constant = Σ_{ℓ=1}^{K} uℓ ⟦ℓ ≠ h(x)⟧
Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification
Meaning of Cost Equivalence
c[h(x)] + constant = Σ_{ℓ=1}^{K} uℓ ⟦ℓ ≠ h(x)⟧

on one (x, y, c): wrong prediction charged by c[h(x)]
on all relabeled data {(x, ℓ, uℓ)}: wrong prediction charged by total weighted classification error of relabeled data

min_h expected LHS (original CSMC problem)
= min_h expected RHS (weighted classification when uℓ ≥ 0)
Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification
Calculation of u

Smallest Non-Negative uℓ’s (Lin, 2014)
when constant = (K − 1) max_{1≤k≤K} c[k] − Σ_{k=1}^{K} c[k],
uℓ = max_{1≤k≤K} c[k] − c[ℓ]

e.g. c of interest (1, 0, 1, 2) → mixture weights uℓ (1, 2, 1, 0)

• largest c[ℓ]: uℓ = 0 (least preferred relabel)
• smallest c[ℓ]: uℓ = largest (original label & most preferred relabel)

ℓ’s and uℓ’s embed the cost
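the weight computation in one line of numpy (a sketch; mixture_weights is an illustrative name):

    import numpy as np

    def mixture_weights(c):
        c = np.asarray(c, dtype=float)
        return c.max() - c        # uℓ = max_k c[k] − c[ℓ], one weight per relabel ℓ

    print(mixture_weights([1, 0, 1, 2]))   # [1. 2. 1. 0.], as in the example above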
Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification
Data Space Expansion Approach
Data Space Expansion (DSE) Approach (Abe, 2004)
1 for each (xn, yn, cn) and ℓ, let un,ℓ = max_{1≤k≤K} cn[k] − cn[ℓ]
2 apply your favorite multiclass classification algorithm on the weighted mixtures ∪_{n=1}^{N} {(xn, ℓ, un,ℓ)}_{ℓ=1}^{K} to get g(x)
3 for each new input x, predict its class using g(x)

• by cost equivalence,
  good g for new (weighted) regular classification problem
  = good g for original cost-sensitive classification problem
• weighted regular classification: special case of CSMC,
  but more easily solvable by, e.g., sampling + regular classification (Zadrozny, 2003)

pros: any multiclass classification algorithm can be used (see the sketch below)
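a minimal DSE sketch (assuming scikit-learn; a weighted decision tree stands in for ‘your favorite algorithm’, and classes are 0-indexed):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def dse_fit(X, costs):
        """X: (N, d) inputs; costs: (N, K) cost vectors cn."""
        N, K = costs.shape
        u = costs.max(axis=1, keepdims=True) - costs  # un,ℓ for every n and ℓ
        X_exp = np.repeat(X, K, axis=0)               # each xn appears K times
        y_exp = np.tile(np.arange(K), N)              # relabels ℓ = 0, ..., K−1
        return DecisionTreeClassifier().fit(X_exp, y_exp, sample_weight=u.ravel())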
Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification
DSE versus MetaCost on Semi-Real Data
[figure: avg. test cost of DSE vs. MetaCost on UCI datasets ann, kdd, let, spl, sat]

(Abe, 2004) some ‘artificial’ cost with UCI data
• use sampling + C4.5 for weighted regular classification

DSE competitive to MetaCost
Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification
Cons of DSE: Unavoidable Noise
Original Cost-Sensitive Classification Problem
[figure: individual examples of classes 1–4 without noise] + absolute cost =
New Regular Classification Problem
[figure: mixtures with relabeling noise]

• cost embedded as weight + noisy labels
• new problem usually harder than original one

need robust multiclass classification algorithm to deal with noise
Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification
Key Idea: Design Robust Multiclass Algorithm
One-Versus-One: A Popular Classification Meta-Method
• for all different class pairs (i, j),
  1 take all examples (xn, yn)
    • with yn = i or j (original one-versus-one)
    • with un,i ≠ un,j, using the larger-u label and weight |un,i − un,j| (robust one-versus-one)
  2 train a binary classifier ĝ(i,j) using those examples
• return g(x) that predicts using the votes from ĝ(i,j)

• un-shifting inside the meta-method to remove noise
• robust step makes it suitable for DSE

cost-sensitive one-versus-one: DSE + robust one-versus-one
Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification
Cost-Sensitive One-Versus-One (CSOVO)
Cost-Sensitive One-Versus-One (Lin, 2014)
• for all different class pairs (i, j),
  1 robust one-versus-one + calculate from cn: take all examples (xn, yn)
    with cn[i] ≠ cn[j], using the smaller-c label and weight un(i,j) = |cn[i] − cn[j]|
  2 train a binary classifier ĝ(i,j) using those examples
• return g(x) that predicts using the votes from ĝ(i,j) (see the sketch below)

• comes with good theoretical guarantee:
  test cost of g ≤ 2 Σ_{i<j} test cost of ĝ(i,j)
• sibling to Weighted All-Pairs (WAP) approach: even tighter guarantee (Beygelzimer, 2005) with more sophisticated construction of un(i,j)

physical meaning: each ĝ(i,j) answers the yes/no question ‘prefer i or j?’
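a compact CSOVO sketch (assuming scikit-learn; the weighted binary learner and function names are illustrative):

    import numpy as np
    from itertools import combinations
    from sklearn.linear_model import LogisticRegression

    def csovo_fit(X, costs):
        models = {}
        for i, j in combinations(range(costs.shape[1]), 2):
            w = np.abs(costs[:, i] - costs[:, j])   # un(i,j)
            keep = w > 0                            # only examples with cn[i] ≠ cn[j]
            y_ij = np.where(costs[keep, i] < costs[keep, j], i, j)  # smaller-c label
            if len(np.unique(y_ij)) == 2:           # need both labels to train
                models[(i, j)] = LogisticRegression().fit(X[keep], y_ij,
                                                          sample_weight=w[keep])
        return models

    def csovo_predict(models, X, K):
        votes = np.zeros((len(X), K))
        for (i, j), clf in models.items():
            pred = clf.predict(X)                   # each ĝ(i,j) votes for i or j
            votes[:, i] += (pred == i)
            votes[:, j] += (pred == j)
        return votes.argmax(axis=1)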
Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification
CSOVO on Semi-Real Data
[figure: avg. test random cost of CSOVO vs. OVO on UCI datasets veh, vow, seg, dna, sat, usp]

(Lin, 2014) some ‘artificial’ cost with UCI data
• CSOVO-SVM: cost-sensitive
• OVO-SVM: regular

not surprisingly again, considering the cost properly does help
Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification
CSOVO for Ordinal Ranking
[figure: avg. test absolute cost of CSOVO vs. OVO on benchmark ordinal ranking datasets pyr, mac, bos, aba, ban, com, cal, cen]

(Lin, 2014) absolute cost with benchmark ordinal ranking data
• CSOVO-SVM: cost-sensitive
• OVO-SVM: regular

CSOVO significantly better for ordinal ranking
Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification
Cons of CSOVO: Many Binary Classifiers
K classes --CSOVO--> K(K−1)/2 binary classifiers

time-consuming in both
• training, especially with many different cn[i] and cn[j]
• prediction
—parallelization helps a bit, but generally not feasible for large K

CSOVO: a simple meta-method for medium K only
Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification
Key Idea: OVO ≡ Round-Robin Tournament
[figure: a round-robin tournament among classes 1–4 vs. a single-elimination tournament with games 1 vs 2 and 3 vs 4 feeding a final]

• prediction ≡ deciding tournament winner for each x
• (CS)OVO: K(K−1)/2 games for prediction (and hence training)
• single-elimination tournament (for K = 2^ℓ):
  • K − 1 games for prediction via bottom-up: real-world
  • log2 K games for prediction via top-down: computer-world :-)

next: single-elimination tournament for CSMC
Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification
Filter Tree (FT) Approach
Filter Tree (Beygelzimer, 2009) Training: from bottom to top

[figure: a two-level tree on an example with c[1] = 0, c[2] = 9, c[3] = 5, c[4] = 8; bottom classifiers labeled ĝ(1,2): (L, 9) and ĝ(3,4): (L, 3), top classifier ĝ(...): (R, 4)]

• ĝ(1,2) and ĝ(3,4) trained like CSOVO: smaller-c label and weight un(i,j) = |cn[i] − cn[j]|
• ĝ(...) trained with (kL, kR) filtered by sub-trees
  —smaller-c sub-tree direction and weight un(...) = |cn[kL] − cn[kR]|

FT: top classifiers aware of bottom-classifier mistakes
Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification
Pros and Cons of FT
Pros
• efficient: O(K) training, O(log K) prediction
• strong theoretical guarantee:
  small-regret binary classifiers =⇒ small-regret CSMC classifier

[figure: the tree with top game 1 2 vs 3 4 above games 1 vs 2 and 3 vs 4]

Cons
• ‘asymmetric’ to labels: non-trivial structural decision
• ‘hard’ sub-tree-dependent top-classification tasks

next: other reductions to (weighted) binary classification
Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification
Other Approaches via Weighted Binary Classification
FT: with regret bound (Beygelzimer, 2009)
  the lowest achievable cost within {1, 2} or {3, 4}?

Divide&Conquer Tree (TREE): without regret bound (Beygelzimer, 2009)
  the lowest ideal cost within {1, 2} or {3, 4}?

[figure: the same single-elimination tree: 1 2 vs 3 4 on top of 1 vs 2 and 3 vs 4]

Sensitive Err. Correcting Output Code (SECOC): with regret bound (Langford, 2005)
  c[1] + c[3] + c[4] greater than some θ?

training time: SECOC (O(T · K)) > FT (O(K)) ≈ TREE (O(K))
Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification
Comparison of Reductions to Weighted Binary Classification
[figure: avg. test cost of CSOVO, FT, TREE, SECOC on UCI datasets zo., gl., ve., vo., ye., se., dn., pa., sa., us.]

(Lin, 2014) couple all meta-methods with SVM
• round-robin tournament (CSOVO)
• single-elimination tournament (FT, TREE)
• error-correcting-code (SECOC)

CSOVO often among the best; FT somewhat competitive
Cost-Sensitive Multiclass Classification CSMC by Regression
Outline
1 Cost-Sensitive Multiclass Classification
   CSMC Motivation and Setup
   CSMC by Bayesian Perspective
   CSMC by (Weighted) Binary Classification
   CSMC by Regression
2 Cost-Sensitive Multilabel Classification
   CSML Motivation and Setup
   CSML by Bayesian Perspective
   CSML by (Weighted) Binary Classification
   CSML by Regression
3 A Story of Bacteria Classification with Doctor-Annotated Costs
4 Summary
Cost-Sensitive Multiclass Classification CSMC by Regression
Key Idea: Cost Estimator
Goal
a classifier g(x) that pays a small cost c[g(x)] on future unseen example (x, y, c)

if every c[k] known:
optimal g*(x) = argmin_{1≤k≤K} c[k]

if rk(x) ≈ c[k] well:
approximately good gr(x) = argmin_{1≤k≤K} rk(x)

how to get cost estimator rk? regression
Cost-Sensitive Multiclass Classification CSMC by Regression
Cost Estimator by Per-class Regression
Given
N examples, each (input xn, label yn, cost cn) ∈ X × {1, 2, . . . , K} × R^K

input  cn[1]     input  cn[2]     . . .   input  cn[K]
x1     0         x1     2         . . .   x1     1
x2     1         x2     3         . . .   x2     5
· · ·
xN     6         xN     1         . . .   xN     0
  ↳ r1             ↳ r2                     ↳ rK

want: rk(x) ≈ c[k] for all future (x, y, c) and k
Cost-Sensitive Multiclass Classification CSMC by Regression
The Reduction-to-Regression Framework
cost-sensitive examples (xn, yn, cn)
  ⇒ regression examples (Xn,k, Yn,k), k = 1, · · · , K
  ⇒ regression algorithm
  ⇒ regressors rk(x), k ∈ 1, · · · , K
  ⇒ cost-sensitive classifier gr(x)

1 encode: transform cost-sensitive examples (xn, yn, cn) to regression examples (Xn,k, Yn,k) = (xn, cn[k])
2 learn: use your favorite algorithm on the regression examples to get estimators rk(x)
3 decode: for each new input x, predict its class using gr(x) = argmin_{1≤k≤K} rk(x)

the reduction-to-regression framework:
systematic & easy to implement
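a minimal sketch of the framework (assuming scikit-learn’s ridge regression as the per-class regressor; function names are illustrative):

    import numpy as np
    from sklearn.linear_model import Ridge

    def rtr_fit(X, costs):                    # one regressor rk per class
        return [Ridge().fit(X, costs[:, k]) for k in range(costs.shape[1])]

    def rtr_predict(regressors, X):
        est = np.column_stack([r.predict(X) for r in regressors])  # rk(x)
        return est.argmin(axis=1)             # gr(x) = argmin_k rk(x)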
Cost-Sensitive Multiclass Classification CSMC by Regression
Theoretical Guarantees (1/2)
gr(x) = argmin_{1≤k≤K} rk(x)

Theorem (Absolute Loss Bound)
For any set of cost estimators {rk}_{k=1}^{K} and for any example (x, y, c) with c[y] = 0,
  c[gr(x)] ≤ Σ_{k=1}^{K} |rk(x) − c[k]|.

low-cost classifier ⇐= accurate estimators
Cost-Sensitive Multiclass Classification CSMC by Regression
Theoretical Guarantees (2/2)
gr(x) = argmin_{1≤k≤K} rk(x)

Theorem (Squared Loss Bound)
For any set of cost estimators {rk}_{k=1}^{K} and for any example (x, y, c) with c[y] = 0,
  c[gr(x)] ≤ sqrt( 2 Σ_{k=1}^{K} (rk(x) − c[k])² ).

applies to common least-squares regression
Cost-Sensitive Multiclass Classification CSMC by Regression
A Pictorial Proof
c[gr(x)] ≤ Σ_{k=1}^{K} |rk(x) − c[k]|

• assume c ordered and not degenerate: y = 1; 0 = c[1] < c[2] ≤ · · · ≤ c[K]
• assume mis-prediction gr(x) = 2: r2(x) = min_{1≤k≤K} rk(x) ≤ r1(x)

[figure: number line showing c[1], r2(x), r1(x), c[2], c[3], r3(x), . . . , c[K], rK(x), with ∆1 between c[1] and r1(x) and ∆2 between r2(x) and c[2]]

then c[2] − c[1] = c[2] (since c[1] = 0) ≤ ∆1 + ∆2 ≤ Σ_{k=1}^{K} |rk(x) − c[k]|
Cost-Sensitive Multiclass Classification CSMC by Regression
An Even Closer Look
let ∆1 ≡ r1(x) − c[1] and ∆2 ≡ c[2] − r2(x)
1 ∆1 ≥ 0 and ∆2 ≥ 0: c[2] ≤ ∆1 + ∆2
2 ∆1 ≤ 0 and ∆2 ≥ 0: c[2] ≤ ∆2
3 ∆1 ≥ 0 and ∆2 ≤ 0: c[2] ≤ ∆1

[figure: the two number-line cases, with r2(x) and r1(x) on either side of c[2]]

c[2] ≤ max(∆1, 0) + max(∆2, 0) ≤ |∆1| + |∆2|
Cost-Sensitive Multiclass Classification CSMC by Regression
Tighter Bound with One-sided Loss
Define one-sided loss ξk ≡ max(∆k, 0)
with ∆k ≡ rk(x) − c[k] if c[k] = cmin = 0
     ∆k ≡ c[k] − rk(x) if c[k] ≠ cmin

Intuition
• c[k] = cmin: wish to have rk(x) ≤ c[k]
• c[k] ≠ cmin: wish to have rk(x) ≥ c[k]
—both wishes same as ∆k ≤ 0 ⇐⇒ ξk = 0

One-sided Loss Bound:
  c[gr(x)] ≤ Σ_{k=1}^{K} ξk ≤ Σ_{k=1}^{K} |∆k|
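a small numpy sketch of the one-sided loss for one example, assuming the cost vector satisfies min_k c[k] = 0 as in the setup:

    import numpy as np

    def one_sided_losses(r, c):
        """r: estimated costs rk(x); c: true cost vector with minimum 0."""
        delta = np.where(c == c.min(), r - c, c - r)  # ∆k by the definition above
        return np.maximum(delta, 0.0)                 # ξk = max(∆k, 0)

    print(one_sided_losses(np.array([0.2, 1.5, 0.8]),
                           np.array([0.0, 2.0, 1.0])))   # [0.2 0.5 0.2]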
Cost-Sensitive Multiclass Classification CSMC by Regression
The Improved Reduction Framework
cost-sensitive examples (xn, yn, cn)
  ⇒ regression examples (Xn,k, Yn,k, Zn,k), k = 1, · · · , K
  ⇒ one-sided regression algorithm
  ⇒ regressors rk(x), k ∈ 1, · · · , K
  ⇒ cost-sensitive classifier gr(x)

(Tu, 2010)
1 encode: transform cost-sensitive examples (xn, yn, cn) to one-sided regression examples (Xn,k, Yn,k, Zn,k) = (xn, cn[k], 2⟦cn[k] = 0⟧ − 1)
2 learn: use a one-sided regression algorithm to get estimators rk(x)
3 decode: for each new input x, predict its class using gr(x) = argmin_{1≤k≤K} rk(x)

the reduction-to-OSR framework:
need a good OSR algorithm
Cost-Sensitive Multiclass Classification CSMC by Regression
Regularized One-Sided Hyper-Linear Regression
Given
(Xn,k, Yn,k, Zn,k) = (xn, cn[k], 2⟦cn[k] = 0⟧ − 1)

Training Goal
all training ξn,k = max(Zn,k · (rk(Xn,k) − Yn,k), 0) small —will drop k below

min_{w,b} λ/2 ⟨w, w⟩ + Σ_{n=1}^{N} ξn
to get rk(x) = ⟨w, φ(x)⟩ + b
Cost-Sensitive Multiclass Classification CSMC by Regression
One-Sided Support Vector Regression
Regularized One-Sided Hyper-Linear Regression
  min_{w,b} λ/2 ⟨w, w⟩ + Σ_{n=1}^{N} ξn
  ξn = max(Zn · (rk(xn) − Yn), 0)

Standard Support Vector Regression
  min_{w,b} 1/(2C) ⟨w, w⟩ + Σ_{n=1}^{N} (ξn + ξn*)
  ξn = max(+1 · (rk(xn) − Yn − ε), 0)
  ξn* = max(−1 · (rk(xn) − Yn + ε), 0)

OSR-SVM = SVR + (ε ← 0) + (keep ξn or ξn* by Zn)
Cost-Sensitive Multiclass Classification CSMC by Regression
OSR-SVM on Semi-Real Data
[figure: avg. test cost of OSR vs. OVA on UCI datasets ir., wi., gl., ve., vo., se., dn., sa., us., le.]

(Tu, 2010) some ‘artificial’ cost with UCI data
• OSR: cost-sensitive SVM
• OVA: regular one-versus-all SVM

OSR often significantly better than OVA
Cost-Sensitive Multiclass Classification CSMC by Regression
OSR versus WAP on Semi-Real Data
[figure: avg. test cost of OSR vs. WAP on UCI datasets ir., wi., gl., ve., vo., se., dn., sa., us., le.]

(Tu, 2010) some ‘artificial’ cost with UCI data
• OSR (per-class): O(K) training, O(K) prediction
• WAP ≈ CSOVO (pairwise): O(K²) training, O(K²) prediction

OSR: faster, with competitive performance
Cost-Sensitive Multiclass Classification CSMC by Regression
From OSR-SVM to AOSR-DNN
OSR-SVM
  min_{w,b} λ/2 ⟨w, w⟩ + Σ_{n=1}^{N} ξn with rk(x) = ⟨w, φ(x)⟩ + b
  ξn = max(Zn · (rk(xn) − Yn), 0)

Appro. OSR-DNN
  min NNet regularizer + Σ_{n=1}^{N} δn with rk(x) = NNet(x)
  δn = ln(1 + exp(Zn · (rk(xn) − Yn)))

AOSR-DNN (Chung, 2016a) = Deep Learning + OSR +
smoother upper bound δn ≥ ξn, because ln(1 + exp(•)) ≥ max(•, 0)
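a numpy sketch of the smoothed per-example loss Σk δk (the softplus upper bound is differentiable, so it can be dropped into any deep learning framework; aosr_loss is an illustrative name):

    import numpy as np

    def aosr_loss(r, c):
        """r: network outputs rk(x); c: cost vector with minimum 0."""
        z = np.where(c == 0, 1.0, -1.0)              # Zk = 2⟦c[k] = 0⟧ − 1
        return np.logaddexp(0.0, z * (r - c)).sum()  # Σk ln(1 + exp(Zk (rk − c[k])))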
Cost-Sensitive Multiclass Classification CSMC by Regression
From AOSR-DNN to CSDNN
Cons of AOSR-DNN
c affects both classification and feature extraction in a DNN,
but effective cost-sensitive feature extraction is hard

idea 1: pre-training with c
• layer-wise pre-training with cost-sensitive autoencoders:
  loss = reconstruction + AOSR
• CSDNN (Chung, 2016a) = AOSR-DNN + cost-sensitive pre-training

idea 2: auxiliary cost-sensitive nodes
• auxiliary nodes to predict costs per layer:
  loss = AOSR for classification + AOSR for auxiliary
• applies to any deep learning model (Chung, 2020)

CSDNN: world’s first successful CSMC deep learning model
Cost-Sensitive Multiclass Classification CSMC by Regression
AOSR-DNN versus CSDNN
[figure: avg. test cost of AOSR-DNN vs. CSDNN on datasets m, b, s, c, mi, bi, si, ci]

(Chung, 2016a)
• AOSR-DNN: cost-sensitive training
• CSDNN: AOSR-DNN + cost-sensitive feature extraction

CSDNN wins, justifying cost-sensitive feature extraction
Cost-Sensitive Multiclass Classification CSMC by Regression
ABOD-DNN versus CSDNN
[figure: avg. test cost of ABOD-DNN vs. CSDNN on datasets m, b, s, c, mi, bi, si, ci]

(Chung, 2016a)
• ABOD-DNN: probability estimate + cost-sensitive prediction
• CSDNN: cost-sensitive training + cost-sensitive feature extraction

CSDNN still wins, hinting at the difficulty of probability estimation without cost-sensitive feature extraction
Cost-Sensitive Multilabel Classification CSML Motivation and Setup
Outline
1 Cost-Sensitive Multiclass Classification
   CSMC Motivation and Setup
   CSMC by Bayesian Perspective
   CSMC by (Weighted) Binary Classification
   CSMC by Regression
2 Cost-Sensitive Multilabel Classification
   CSML Motivation and Setup
   CSML by Bayesian Perspective
   CSML by (Weighted) Binary Classification
   CSML by Regression
3 A Story of Bacteria Classification with Doctor-Annotated Costs
4 Summary
Cost-Sensitive Multilabel Classification CSML Motivation and Setup
Which Fruit?
?
(image by Robert-Owen-Wahl from Pixabay)
apple orange strawberry kiwi
(images by Pexels, PublicDomainPictures, 192635, Rob van der Meijden from Pixabay)
multiclass classification:
classify input (picture) to one category (label), remember? :-)
Cost-Sensitive Multilabel Classification CSML Motivation and Setup
Which Fruits?
?:{apple, orange, kiwi}
(image by Michal Jarmoluk from Pixabay)
apple orange strawberry kiwi
(images by Pexels, PublicDomainPictures, 192635, Rob van der Meijden from Pixabay)
multilabel classification:
classify input to multiple (or no) categories
Cost-Sensitive Multilabel Classification CSML Motivation and Setup
Label Powerset: Multilabel Classification via Multiclass
(Tsoumakas, 2007)
multiclass w/ L = 4 classes: 4 possible outcomes {a, o, s, k}
multilabel w/ L = 4 classes: 2^4 = 16 possible outcomes
2^{a, o, s, k} ⇔ {φ, a, o, s, k, ao, as, ak, os, ok, sk, aos, aok, ask, osk, aosk}

• Label Powerset (LP): reduction to multiclass classification (see the sketch below)
• difficulties for large L:
  • computation: 2^L extended classes
  • sparsity: no or few examples for some combinations

LP: feasible only for small L
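a minimal Label Powerset sketch (assuming scikit-learn; labelsets are encoded as bitmasks over 0-indexed labels, and the function names are illustrative):

    from sklearn.tree import DecisionTreeClassifier

    def lp_fit(X, labelsets):                 # labelsets: list of sets of ints
        codes = [sum(1 << l for l in Y) for Y in labelsets]   # labelset → bitmask
        return DecisionTreeClassifier().fit(X, codes)         # one extended class per code

    def lp_predict(clf, X, L):
        return [{l for l in range(L) if code & (1 << l)}      # bitmask → labelset
                for code in clf.predict(X)]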
Cost-Sensitive Multilabel Classification CSML Motivation and Setup
What Tags?
?: {machine learning, data structure, data mining, object oriented programming, artificial intelligence, compiler, architecture, chemistry, textbook, children book, . . . etc.}

another multilabel classification problem:
tagging input with multiple categories
Cost-Sensitive Multilabel Classification CSML Motivation and Setup
Binary Relevance: Multilabel Classification via Yes/No
binary classification: {yes, no}
multilabel w/ L classes: L yes/no questions

machine learning (Y), data structure (N), data mining (Y), OOP (N), AI (Y), compiler (N), architecture (N), chemistry (N), textbook (Y), children book (N), etc.

• Binary Relevance (BR): reduction to multiple isolated binary classifications (see the sketch below)
• disadvantages:
  • isolation—hidden relations not exploited
    (e.g. ML and DM highly correlated, ML subset of AI, textbook & children book disjoint)
  • unbalanced—few yes, many no

BR: simple (& strong) benchmark with known disadvantages
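a minimal Binary Relevance sketch (assuming scikit-learn; Y is an (N, L) 0/1 indicator matrix, and the function names are illustrative):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def br_fit(X, Y):                         # one isolated yes/no model per label
        return [LogisticRegression(max_iter=1000).fit(X, Y[:, l])
                for l in range(Y.shape[1])]

    def br_predict(models, X):
        return np.column_stack([m.predict(X) for m in models])  # (N, L) 0/1 matrix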
Cost-Sensitive Multilabel Classification CSML Motivation and Setup
Multilabel Classification Setup
Given
N examples (input xn, labelset Yn) ∈ X × 2^{1,2,··· ,L}
• fruits: X = encoding(pictures), Yn ⊆ {1, 2, · · · , 4}
• tags: X = encoding(merchandise), Yn ⊆ {1, 2, · · · , L}

Goal
a multilabel classifier g(x) that closely predicts the labelset Y associated with some unseen input x (by exploiting hidden relations/combinations between labels)

multilabel classification:
hot and important, with many real-world applications
Cost-Sensitive Multilabel Classification CSML Motivation and Setup
From Labelset to Coding View
labelset        apple   orange   strawberry   binary code
Y1 = {o}        0 (N)   1 (Y)    0 (N)        y1 = [0, 1, 0]
Y2 = {a, o}     1 (Y)   1 (Y)    0 (N)        y2 = [1, 1, 0]
Y3 = {o, s}     0 (N)   1 (Y)    1 (Y)        y3 = [0, 1, 1]
Y4 = {}         0 (N)   0 (N)    0 (N)        y4 = [0, 0, 0]

(images by PublicDomainPictures, Narin Seandag, GiltonF, nihatyetkin from Pixabay)

subset Y of 2^{1,2,··· ,L} ⇐⇒ length-L binary code y