
(1)

Cost-sensitive Classification: Techniques and Stories

Hsuan-Tien Lin htlin@csie.ntu.edu.tw

Professor

Dept. of CSIE, National Taiwan University

Machine Learning Summer School @ Taipei, Taiwan August 2, 2021

(2)

About Me

• co-author of textbook ‘Learning from Data: A Short Course’

• instructor of two Mandarin-teaching ML MOOCs on Coursera

goal: make ML more realistic

• weakly supervised learning: in ICML ’20, ICLR ’21, . . .

• online/active learning: in ICML ’12, ICML ’14, AAAI ’15, EMNLP ’20, . . .

• cost-sensitive classification: in ICML ’10, KDD ’12, IJCAI ’16, . . .

• multi-label classification: in NeurIPS ’12, ICML ’14, AAAI ’18, . . .

• large-scale data mining: e.g. co-led KDDCup world-champion NTU teams 2010–2013

(3)

More About Me

attendee: MLSS Taipei 2006

student workshop talk: Large-Margin Thresholded Ensembles for Ordinal Regression

(4)

Disclaimer about Cost-sensitive Classification

materials mostly from “old” tutorials

• Advances in Cost-sensitive Multiclass and Multilabel Classification. KDD 2019 Tutorial, Anchorage, Alaska, USA, August 2019.

• Cost-sensitive Classification: Algorithms and Advances. ACML 2013 Tutorial, Canberra, Australia, November 2013.

• core techniques somewhat mature, compared to 10 years ago

• new research still being inspired, e.g.

Classification with Rejection Based on Cost-sensitive Classification, Charoenphakdee et al., ICML ’21

Cost-Sensitive Robustness against Adversarial Examples, Zhang and Evans, ICLR ’19

• will show one application story in the end

(5)

Cost-Sensitive Multiclass Classification CSMC Motivation and Setup

Outline

1 Cost-Sensitive Multiclass Classification
  CSMC Motivation and Setup
  CSMC by Bayesian Perspective
  CSMC by (Weighted) Binary Classification
  CSMC by Regression

2 Cost-Sensitive Multilabel Classification
  CSML Motivation and Setup
  CSML by Bayesian Perspective
  CSML by (Weighted) Binary Classification
  CSML by Regression

3 A Story of Bacteria Classification with Doctor-Annotated Costs

4 Summary

(6)

Cost-Sensitive Multiclass Classification CSMC Motivation and Setup

Which Digit Did You Write?

?

one (1) two (2) three (3) four (4)

• a multiclass classification problem: grouping ‘pictures’ into different ‘categories’

C’mon, we know about multiclass classification all too well! :-)

(7)

Cost-Sensitive Multiclass Classification CSMC Motivation and Setup

Performance Evaluation

?

• ZIP code recognition:

1: wrong; 2: right; 3: wrong; 4: wrong

• check value recognition:

1: one-dollar mistake; 2: no mistake;
3: one-dollar mistake; 4: two-dollar mistake

different applications: evaluate mis-predictions differently

(8)

Cost-Sensitive Multiclass Classification CSMC Motivation and Setup

ZIP Code Recognition

?

1: wrong; 2: right; 3: wrong; 4: wrong

regular multiclass classification: only right or wrong

• wrong cost: 1; right cost: 0

• prediction error of h on some (x, y):

classification cost = ⟦y ≠ h(x)⟧

regular multiclass classification: well-studied, many good algorithms

(9)

Cost-Sensitive Multiclass Classification CSMC Motivation and Setup

Check Value Recognition

?

1: one-dollar mistake; 2: no mistake;
3: one-dollar mistake; 4: two-dollar mistake

cost-sensitive multiclass classification: different costs for different mis-predictions

• e.g. prediction error of h on some (x, y):

absolute cost = |y − h(x)|

next: more about cost-sensitive multiclass classification (CSMC)

(10)

Cost-Sensitive Multiclass Classification CSMC Motivation and Setup

What is the Status of the Patient?

?

(image by mcmurryjulie from Pixabay)

bird flu cold healthy

(images by Clker-Free-Vector-Images from Pixabay)

• another classification problem: grouping ‘patients’ into different ‘statuses’

are all mis-prediction costs equal?

(11)

Cost-Sensitive Multiclass Classification CSMC Motivation and Setup

Patient Status Prediction

error measure = society cost

                      predicted
actual       bird flu    cold    healthy
bird flu          0      1000     100000
cold            100         0       3000
healthy         100        30          0

• bird flu mis-predicted as healthy: very high cost

• cold mis-predicted as healthy: high cost

• cold correctly predicted as cold: no cost

human doctors consider costs of decision;

can computer-aided diagnosis do the same?

(12)

Cost-Sensitive Multiclass Classification CSMC Motivation and Setup

Setup: Class-Dependent Cost-Sensitive Classification

Given

N examples, each (input xn, label yn) ∈ X × {1, 2, . . . , K}

and cost matrix C ∈ R^{K×K} with C(y, y) = 0 = min_{1≤k≤K} C(y, k)

patient diagnosis with society cost

C =
    0    1000   100000
  100       0     3000
  100      30        0

check digit recognition with absolute cost (cost function) C(y, k) = |y − k|

Goal

a classifier g(x) that pays a small cost C(y, g(x)) on future unseen example (x, y)

includes regular classification C_c like

  0  1
  1  0

as special case

(13)

Cost-Sensitive Multiclass Classification CSMC Motivation and Setup

Which Age-Group?

? infant (1) child (2) teen (3) adult (4)

(images by Tawny van Breda, Pro File, Mieczysław Samol, lisa runnels, vontoba from Pixabay)

• small mistake—classify child as teen; big mistake—classify infant as adult

• cost matrix C(y, g(x)) for embedding ‘order’:

C =
  0  1  4  5
  1  0  1  3
  3  1  0  2
  5  4  1  0

CSMC can help solve many other problems like ordinal ranking

(14)

Cost-Sensitive Multiclass Classification CSMC Motivation and Setup

Cost Vector

cost vector c: a row of cost components

• society cost for a bird flu patient: c = (0, 1000, 100000)

• absolute cost for digit 2: c = (1, 0, 1, 2)

• age-ranking cost for a teenager: c = (3, 1, 0, 2)

• ‘regular’ classification cost for label 2: c^(2) = (1, 0, 1, 1)

• movie recommendation

someone who loves romance movies but hates terror: c = (romance = 0, fiction = 5, terror = 100)

someone who loves romance movies but is fine with terror: c = (romance = 0, fiction = 5, terror = 3)

cost vector: representation of personal preference in many applications

(15)

Cost-Sensitive Multiclass Classification CSMC Motivation and Setup

Setup: Example-Dependent Cost-Sensitive Classification

Given

N examples, each (input xn, label yn) ∈ X × {1, 2, . . . , K} and cost vector cn ∈ R^K

—will assume cn[yn] = 0 = min_{1≤k≤K} cn[k]

Goal

a classifier g(x) that pays a small cost c[g(x)] on future unseen example (x, y, c)

• will assume c[y] = 0 = cmin = min_{1≤k≤K} c[k]

• note: y not really needed in evaluation

example-dependent ⊃ class-dependent ⊃ regular

(16)

Cost-Sensitive Multiclass Classification CSMC by Bayesian Perspective

Outline

1 Cost-Sensitive Multiclass Classification
  CSMC Motivation and Setup
  CSMC by Bayesian Perspective
  CSMC by (Weighted) Binary Classification
  CSMC by Regression

2 Cost-Sensitive Multilabel Classification
  CSML Motivation and Setup
  CSML by Bayesian Perspective
  CSML by (Weighted) Binary Classification
  CSML by Regression

3 A Story of Bacteria Classification with Doctor-Annotated Costs

4 Summary

(17)

Cost-Sensitive Multiclass Classification CSMC by Bayesian Perspective

Key Idea: Conditional Probability Estimator

Goal (Class-Dependent Setup)

a classifier g(x) that pays a small cost C(y, g(x)) on future unseen example (x, y)

if P(y|x) known

Bayes optimal g(x) = argmin_{1≤k≤K} Σ_{y=1}^{K} P(y|x) C(y, k)

if q(x, y) ≈ P(y|x) well

approximately good gq(x) = argmin_{1≤k≤K} Σ_{y=1}^{K} q(x, y) C(y, k)

how to get conditional probability estimator q? logistic regression, Naïve Bayes, . . .

(18)

Cost-Sensitive Multiclass Classification CSMC by Bayesian Perspective

Approximate Bayes-Optimal Decision

if q(x, y) ≈ P(y|x) well (Domingos, 1999)

approximately good gq(x) = argmin_{1≤k≤K} Σ_{y=1}^{K} q(x, y) C(y, k)

Approximate Bayes-Optimal Decision (ABOD) Approach

1 use your favorite algorithm on {(xn, yn)} to get q(x, y) ≈ P(y|x)

2 for each new input x, predict its class using gq(x) above

ABOD: probability estimate + Bayes-optimal decision
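As a concrete illustration, a minimal Python sketch of ABOD (hypothetical helper names; any estimator with a predict_proba method could play the role of q):

```python
# Minimal ABOD sketch: probability estimate + Bayes-optimal decision.
# Assumes labels are 0..K-1 and C[y][k] = cost of predicting k when truth is y.
import numpy as np
from sklearn.linear_model import LogisticRegression

def abod_fit(X, y):
    # step 1: any conditional probability estimator q(x, y) ~ P(y|x)
    return LogisticRegression(max_iter=1000).fit(X, y)

def abod_predict(model, X, C):
    q = model.predict_proba(X)     # q[n, y] ~ P(y | x_n)
    expected_cost = q @ C          # entry [n, k] = sum_y q(x_n, y) C(y, k)
    return np.argmin(expected_cost, axis=1)
```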

(19)

Cost-Sensitive Multiclass Classification CSMC by Bayesian Perspective

ABOD on Artificial Data

1 use your favorite algorithm on {(xn, yn)} to get q(x, y) ≈ P(y|x)

2 for each new input x, predict its class using gq(x) above

[figure: LogReg-based ABOD decision regions on artificial data, under the regular cost and the ‘rotate’ cost below]

rotate cost C(y, g(x)):

         g(x) = 1   2   3   4
y = 1         0    1   2   4
y = 2         4    0   1   2
y = 3         2    4   0   1
y = 4         1    2   4   0

(20)

Cost-Sensitive Multiclass Classification CSMC by Bayesian Perspective

ABOD for Binary Classification

Given N examples, each (input xn, label yn) ∈ X × {−1, +1}

and weights w+, w− representing the two entries of the cost matrix

         g(x) = +1    −1
y = +1        0       w+
y = −1        w−      0

if q(x) ≈ P(+1|x) well

approximately good gq(x) = sign(w+ q(x) − w− (1 − q(x))), i.e. (Elkan, 2001),

gq(x) = +1 ⇐⇒ w+ q(x) − w− (1 − q(x)) > 0 ⇐⇒ q(x) > w− / (w+ + w−)

ABOD for binary classification:

probability estimate + threshold changing
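In code, the binary case is a one-line threshold change (a sketch; w_plus and w_minus are the two cost-matrix entries above):

```python
# Binary ABOD: predict +1 iff q(x) = P(+1|x) exceeds w_minus / (w_plus + w_minus).
def abod_binary_predict(q, w_plus, w_minus):
    threshold = w_minus / (w_plus + w_minus)
    return +1 if q > threshold else -1
```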

(21)

Cost-Sensitive Multiclass Classification CSMC by Bayesian Perspective

ABOD for Binary Classification on Artificial Data

1 use your favorite algorithm on {(xn, yn)} to get q(x) ≈ P(+1|x)

2 for each new input x, predict its class using gq(x) = sign(q(x) − w− / (w+ + w−))

[figure: LogReg-based decision boundaries on artificial data, under the regular cost and the positive-emphasis cost below]

positive-emphasis cost:

         g(x) = +1    −1
y = +1        0       10
y = −1        1       0

(22)

Cost-Sensitive Multiclass Classification CSMC by Bayesian Perspective

Pros and Cons of ABOD

Pros

• optimal if good probability estimate q

• prediction easily adapts to different C without modifying training (probability estimate)

Cons

• ‘difficult’: good probability estimate often more difficult than good multiclass classification

• ‘restricted’: only applicable to class-dependent setup —need ‘full picture’ of cost matrix

• ‘slower prediction’ (for multiclass): more calculation at prediction stage

can we use any multiclass classification algorithm for ABOD?

(23)

Cost-Sensitive Multiclass Classification CSMC by Bayesian Perspective

MetaCost Approach

Approximate Bayes-Optimal Decision (ABOD) Approach

1 use your favorite algorithm on {(xn, yn)} to get q(x, y) ≈ P(y|x)

2 for each new input x, predict its class using gq(x)

MetaCost Approach (Domingos, 1999)

1 use your favorite multiclass classification algorithm on bootstrapped {(xn, yn)} and aggregate the classifiers to get q(x, y) ≈ P(y|x)

2 for each given input xn, relabel it to yn′ using gq(xn)

3 run your favorite multiclass classification algorithm on relabeled {(xn, yn′)} to get final classifier g

4 for each new input x, predict its class using g(x)

pros: any multiclass classification algorithm can be used
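A compact MetaCost sketch, assuming numpy arrays, labels 0..K-1, and any plug-in learner (DecisionTreeClassifier here is only a placeholder):

```python
# MetaCost sketch (Domingos, 1999): bag for q, relabel by g_q, retrain.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def metacost(X, y, C, n_resamples=10, Learner=DecisionTreeClassifier):
    N, K = len(X), C.shape[0]
    votes = np.zeros((N, K))
    rng = np.random.default_rng(0)
    for _ in range(n_resamples):            # step 1: bootstrap + aggregate
        idx = rng.integers(0, N, size=N)
        h = Learner().fit(X[idx], y[idx])
        votes[np.arange(N), h.predict(X)] += 1
    q = votes / n_resamples                 # crude q(y|x) from vote fractions
    y_relabeled = np.argmin(q @ C, axis=1)  # step 2: relabel by g_q(x_n)
    return Learner().fit(X, y_relabeled)    # steps 3-4: retrain, then predict
```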

(24)

Cost-Sensitive Multiclass Classification CSMC by Bayesian Perspective

MetaCost on Semi-Real Data

[figure: test-cost scatter plots of MetaCost versus C4.5R, undersampling, and oversampling, on multiclass and two-class tasks, with a y = x reference line]


(Domingos, 1999)

• some ‘artificial’ cost with UCI data

• MetaCost + C4.5: cost-sensitive

• C4.5: regular

not surprisingly, considering the cost properly does help

(25)

Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification

Outline

1 Cost-Sensitive Multiclass Classification
  CSMC Motivation and Setup
  CSMC by Bayesian Perspective
  CSMC by (Weighted) Binary Classification
  CSMC by Regression

2 Cost-Sensitive Multilabel Classification
  CSML Motivation and Setup
  CSML by Bayesian Perspective
  CSML by (Weighted) Binary Classification
  CSML by Regression

3 A Story of Bacteria Classification with Doctor-Annotated Costs

4 Summary

(26)

Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification

Key Idea: Cost Transformation

(heuristic) relabeling useful in MetaCost: a more principled way?

Yes, by Connecting Cost Vector to Regular Costs!

(1, 0, 1, 2)      —shift equivalence→      (3, 2, 3, 4)
 c of interest                              shifted cost

(3, 2, 3, 4) = 1 · (0, 1, 1, 1) + 2 · (1, 0, 1, 1) + 1 · (1, 1, 0, 1) + 0 · (1, 1, 1, 0),
i.e. mixture weights u_ℓ = (1, 2, 1, 0) on the regular costs of labels 1, . . . , 4

i.e. x with c = (1, 0, 1, 2) equivalent to a weighted mixture {(x, ℓ, u_ℓ)} = {(x, 1, 1), (x, 2, 2), (x, 3, 1)}

cost equivalence (Lin, 2014): for any classifier h,

c[h(x)] + constant = Σ_{ℓ=1}^{K} u_ℓ ⟦ℓ ≠ h(x)⟧

(27)

Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification

Meaning of Cost Equivalence

c[h(x)] + constant = Σ_{ℓ=1}^{K} u_ℓ ⟦ℓ ≠ h(x)⟧

on one (x, y, c): wrong prediction charged by c[h(x)]

on all relabeled data {(x, ℓ, u_ℓ)}: wrong prediction charged by total weighted classification error of relabeled data

min_h expected LHS (original CSMC problem)
= min_h expected RHS (weighted classification when u_ℓ ≥ 0)

(28)

Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification

Calculation of u_ℓ

Smallest Non-Negative u_ℓ’s (Lin, 2014)

when constant = (K − 1) max_{1≤k≤K} c[k] − Σ_{k=1}^{K} c[k],   u_ℓ = max_{1≤k≤K} c[k] − c[ℓ]

e.g. c of interest (1, 0, 1, 2) → mixture weights u_ℓ = (1, 2, 1, 0)

• largest c[ℓ]: u_ℓ = 0 (least preferred relabel)

• smallest c[ℓ]: u_ℓ = largest (original label & most preferred relabel)

ℓ’s and u_ℓ’s embed the cost
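The weights are one line of code (a sketch matching the formula above):

```python
# Mixture weights from a cost vector (Lin, 2014): u_l = max_k c[k] - c[l].
import numpy as np

def mixture_weights(c):
    c = np.asarray(c, dtype=float)
    return c.max() - c   # e.g. c = (1, 0, 1, 2) -> u = (1, 2, 1, 0)
```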

(29)

Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification

Data Space Expansion Approach

Data Space Expansion (DSE) Approach (Abe, 2004)

1 for each (xn, yn, cn) and ℓ, let u_{n,ℓ} = max_{1≤k≤K} cn[k] − cn[ℓ]

2 apply your favorite multiclass classification algorithm on the weighted mixtures ∪_{n=1}^{N} {(xn, ℓ, u_{n,ℓ})}_{ℓ=1}^{K} to get g(x)

3 for each new input x, predict its class using g(x)

• by cost equivalence, good g for new (weighted) regular classification problem = good g for original cost-sensitive classification problem

• weighted regular classification: special case of CSMC, but more easily solvable by, e.g., sampling + regular classification (Zadrozny, 2003)

pros: any multiclass classification algorithm can be used
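A DSE sketch under the assumption that the learner accepts per-example weights (sampling, as in Zadrozny (2003), is the alternative route):

```python
# DSE sketch (Abe, 2004): expand each (x_n, c_n) into K weighted relabeled
# copies, then feed any learner that accepts sample_weight.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def dse_fit(X, cost, Learner=DecisionTreeClassifier):
    N, K = cost.shape
    u = cost.max(axis=1, keepdims=True) - cost   # u[n, l] = max_k c_n[k] - c_n[l]
    X_big = np.repeat(X, K, axis=0)              # each x_n copied K times
    y_big = np.tile(np.arange(K), N)             # relabels 0..K-1 per copy
    w_big = u.ravel()                            # mixture weights as example weights
    return Learner().fit(X_big, y_big, sample_weight=w_big)
```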

(30)

Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification

DSE versus MetaCost on Semi-Real Data

[figure: avg. test cost of DSE versus MetaCost on ann, kdd, let, spl, sat]

(Abe, 2004) some ‘artificial’ cost with UCI data

• use sampling + C4.5 for weighted regular classification

DSE competitive to MetaCost

(31)

Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification

Cons of DSE: Unavoidable Noise

Original Cost-Sensitive Classification Problem

[figure: individual examples of classes 1–4 without noise] + absolute cost =

New Regular Classification Problem

[figure: mixtures with relabeling noise]

• cost embedded as weight + noisy labels

• new problem usually harder than original one

need robust multiclass classification algorithm to deal with noise

(32)

Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification

Key Idea: Design Robust Multiclass Algorithm

One-Versus-One: A Popular Classification Meta-Method

• for all different class pairs (i, j),

1 take all examples (xn, yn)
   with yn = i or j (original one-versus-one)
   with u_{n,i} ≠ u_{n,j}, using the larger-u label and weight |u_{n,i} − u_{n,j}| (robust one-versus-one)

2 train a binary classifier ĝ(i,j) using those examples

• return g(x) that predicts using the votes from ĝ(i,j)

un-shifting inside the meta-method to remove noise

robust step makes it suitable for DSE

cost-sensitive one-versus-one: DSE + robust one-versus-one

(33)

Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification

Cost-Sensitive One-Versus-One (CSOVO)

Cost-Sensitive One-Versus-One (Lin, 2014)

• for all different class pairs (i, j),

1 robust one-versus-one + calculate from cn: take all examples (xn, yn) with cn[i] ≠ cn[j], using the smaller-c label and weight u_n^{(i,j)} = |cn[i] − cn[j]|

2 train a binary classifier ĝ(i,j) using those examples

• return g(x) that predicts using the votes from ĝ(i,j)

• comes with good theoretical guarantee: test cost of g ≤ 2 Σ_{i<j} test cost of ĝ(i,j)

• sibling to Weighted All-Pairs (WAP) approach: even tighter guarantee (Beygelzimer, 2005) with more sophisticated construction of u_n^{(i,j)}

physical meaning: each ĝ(i,j) answers the yes/no question ‘prefer i or j?’
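A CSOVO sketch (hypothetical names; assumes numpy arrays, labels 0..K-1, and that every pairwise task ends up with both labels present):

```python
# CSOVO sketch (Lin, 2014): one weighted binary task per class pair (i, j),
# labeled by the smaller-cost class and weighted by |c_n[i] - c_n[j]|.
import numpy as np
from sklearn.linear_model import LogisticRegression

def csovo_fit(X, cost, Learner=LogisticRegression):
    K = cost.shape[1]
    classifiers = {}
    for i in range(K):
        for j in range(i + 1, K):
            w = np.abs(cost[:, i] - cost[:, j])
            keep = w > 0                         # only examples with c[i] != c[j]
            y_ij = np.where(cost[keep, i] < cost[keep, j], i, j)
            classifiers[(i, j)] = Learner().fit(X[keep], y_ij,
                                                sample_weight=w[keep])
    return classifiers

def csovo_predict(classifiers, X, K):
    votes = np.zeros((len(X), K))
    for (i, j), h in classifiers.items():        # each g(i,j) votes i or j
        votes[np.arange(len(X)), h.predict(X)] += 1
    return np.argmax(votes, axis=1)
```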

(34)

Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification

CSOVO on Semi-Real Data

[figure: avg. test random cost of CSOVO versus OVO on veh, vow, seg, dna, sat, usp]

(Lin, 2014) some ‘artificial’ cost with UCI data

• CSOVO-SVM: cost-sensitive

• OVO-SVM: regular

not surprisingly again, considering the cost properly does help

(35)

Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification

CSOVO for Ordinal Ranking

[figure: avg. test absolute cost of CSOVO versus OVO on pyr, mac, bos, aba, ban, com, cal, cen]

(Lin, 2014) absolute cost with benchmark ordinal ranking data

• CSOVO-SVM: cost-sensitive

• OVO-SVM: regular

CSOVO significantly better for ordinal ranking

(36)

Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification

Cons of CSOVO: Many Binary Classifiers

K classes —CSOVO→ K(K−1)/2 binary classifiers

time-consuming in both

• training, especially with many different cn[i] and cn[j]

• prediction

—parallelization helps a bit, but generally not feasible for large K

CSOVO: a simple meta-method for medium K only

(37)

Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification

Key Idea: OVO ≡ Round-Robin Tournament

[figure: a round-robin tournament among classes 1–4, and a single-elimination tournament bracket won by class 3]

• prediction ≡ deciding tournament winner for each x

• (CS)OVO: K(K−1)/2 games for prediction (and hence training)

• single-elimination tournament (for K = 2^ℓ):
   K − 1 games for prediction via bottom-up: real-world
   log₂ K games for prediction via top-down: computer-world :-)

next: single-elimination tournament for CSMC

(38)

Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification

Filter Tree (FT) Approach

Filter Tree (Beygelzimer, 2009) Training: from bottom to top

[figure: a single-elimination tree over classes 1–4 for an example with c[1] = 0, c[2] = 9, c[3] = 5, c[4] = 8; each internal node carries its preferred direction and weight]

• ĝ(1,2) and ĝ(3,4) trained like CSOVO: smaller-c label and weight u_n^{(i,j)} = |cn[i] − cn[j]|

• ĝ(...) trained with (kL, kR) filtered by sub-trees —smaller-c sub-tree direction and weight u_n^{(...)} = |cn[kL] − cn[kR]|

FT: top classifiers aware of bottom-classifier mistakes
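A filter-tree sketch for K = 4, to make the bottom-up filtering concrete (hypothetical names; assumes a weight-accepting binary learner and that each node’s task keeps both labels):

```python
# Filter Tree sketch for K = 4 (Beygelzimer, 2009), trained bottom-up.
# Each internal node sees the winners (k_L, k_R) filtered up by its subtrees.
import numpy as np
from sklearn.linear_model import LogisticRegression

def ft_fit_k4(X, cost, Learner=LogisticRegression):
    n = len(X)
    idx = np.arange(n)

    def train_node(kL, kR):
        # binary task per example: predict 0 to keep k_L, 1 to keep k_R
        w = np.abs(cost[idx, kL] - cost[idx, kR])
        y = (cost[idx, kR] < cost[idx, kL]).astype(int)
        keep = w > 0
        h = Learner().fit(X[keep], y[keep], sample_weight=w[keep])
        winner = np.where(h.predict(X) == 1, kR, kL)   # filter winners upward
        return h, winner

    g12, win_left = train_node(np.zeros(n, int), np.full(n, 1))
    g34, win_right = train_node(np.full(n, 2), np.full(n, 3))
    g_top, _ = train_node(win_left, win_right)
    return g12, g34, g_top

def ft_predict_k4(g12, g34, g_top, X):
    left = np.where(g12.predict(X) == 1, 1, 0)
    right = np.where(g34.predict(X) == 1, 3, 2)
    return np.where(g_top.predict(X) == 1, right, left)
```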

(39)

Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification

Pros and Cons of FT

Pros

• efficient: O(K) training, O(log K) prediction

• strong theoretical guarantee: small-regret binary classifiers =⇒ small-regret CSMC classifier

[figure: the tree with root 12 vs 34 over leaves 1 vs 2 and 3 vs 4]

Cons

• ‘asymmetric’ to labels: non-trivial structural decision

• ‘hard’ sub-tree-dependent top-classification tasks

next: other reductions to (weighted) binary classification

(40)

Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification

Other Approaches via Weighted Binary Classification

FT: with regret bound (Beygelzimer, 2009) —the lowest achievable cost within {1, 2} or {3, 4}?

Divide&Conquer Tree (TREE): without regret bound (Beygelzimer, 2009) —the lowest ideal cost within {1, 2} or {3, 4}?

[figure: the tree with root 12 vs 34 over leaves 1 vs 2 and 3 vs 4]

Sensitive Err. Correcting Output Code (SECOC): with regret bound (Langford, 2005) —is c[1] + c[3] + c[4] greater than some θ?

training time: SECOC (O(T · K)) > FT (O(K)) ≈ TREE (O(K))

(41)

Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification

Comparison of Reductions to Weighted Binary Classification

[figure: avg. test cost of CSOVO, FT, TREE, SECOC on zo., gl., ve., vo., ye., se., dn., pa., sa., us.]

(Lin, 2014) couple all meta-methods with SVM

• round-robin tournament (CSOVO)

• single-elimination tournament (FT, TREE)

• error-correcting code (SECOC)

CSOVO often among the best; FT somewhat competitive

(42)

Cost-Sensitive Multiclass Classification CSMC by Regression

Outline

1 Cost-Sensitive Multiclass Classification
  CSMC Motivation and Setup
  CSMC by Bayesian Perspective
  CSMC by (Weighted) Binary Classification
  CSMC by Regression

2 Cost-Sensitive Multilabel Classification
  CSML Motivation and Setup
  CSML by Bayesian Perspective
  CSML by (Weighted) Binary Classification
  CSML by Regression

3 A Story of Bacteria Classification with Doctor-Annotated Costs

4 Summary

(43)

Cost-Sensitive Multiclass Classification CSMC by Regression

Key Idea: Cost Estimator

Goal

a classifier g(x) that pays a small cost c[g(x)] on future unseen example (x, y, c)

if every c[k] known

optimal g(x) = argmin_{1≤k≤K} c[k]

if rk(x) ≈ c[k] well

approximately good gr(x) = argmin_{1≤k≤K} rk(x)

how to get cost estimator rk? regression

(44)

Cost-Sensitive Multiclass Classification CSMC by Regression

Cost Estimator by Per-class Regression

Given

N examples, each (input xn, label yn, cost cn) ∈ X × {1, 2, . . . , K} × R^K

            cn[1]       cn[2]       . . .    cn[K]
x1            0           2         . . .      1
x2            1           3         . . .      5
· · ·
xN            6           1         . . .      0
         (trains r1)  (trains r2)          (trains rK)

want: rk(x) ≈ c[k] for all future (x, y, c) and k

(45)

Cost-Sensitive Multiclass Classification CSMC by Regression

The Reduction-to-Regression Framework







[diagram: cost-sensitive examples (xn, yn, cn) ⇒ regression examples (Xn,k, Yn,k), k = 1, · · · , K ⇒ regression algorithm ⇒ regressors rk(x), k ∈ 1, · · · , K ⇒ cost-sensitive classifier gr(x)]

1 encode: transform cost-sensitive examples (xn, yn, cn) to regression examples (Xn,k, Yn,k) = (xn, cn[k])

2 learn: use your favorite algorithm on the regression examples to get estimators rk(x)

3 decode: for each new input x, predict its class using gr(x) = argmin_{1≤k≤K} rk(x)

the reduction-to-regression framework:

systematic & easy to implement
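The whole framework fits in a few lines (a sketch; Ridge is only a stand-in for ‘your favorite’ regressor):

```python
# Reduction-to-regression sketch: one regressor per class, decode by argmin.
import numpy as np
from sklearn.linear_model import Ridge

def rtr_fit(X, cost, Learner=Ridge):
    # one regressor per class k, trained on (x_n, c_n[k])
    return [Learner().fit(X, cost[:, k]) for k in range(cost.shape[1])]

def rtr_predict(regressors, X):
    estimates = np.column_stack([r.predict(X) for r in regressors])
    return np.argmin(estimates, axis=1)   # g_r(x) = argmin_k r_k(x)
```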

(46)

Cost-Sensitive Multiclass Classification CSMC by Regression

Theoretical Guarantees (1/2)

gr(x) = argmin_{1≤k≤K} rk(x)

Theorem (Absolute Loss Bound)

For any set of cost estimators {rk}_{k=1}^{K} and for any example (x, y, c) with c[y] = 0,

c[gr(x)] ≤ Σ_{k=1}^{K} |rk(x) − c[k]|.

low-cost classifier ⇐= accurate estimators

(47)

Cost-Sensitive Multiclass Classification CSMC by Regression

Theoretical Guarantees (2/2)

gr(x) = argmin_{1≤k≤K} rk(x)

Theorem (Squared Loss Bound)

For any set of cost estimators {rk}_{k=1}^{K} and for any example (x, y, c) with c[y] = 0,

c[gr(x)] ≤ √( 2 Σ_{k=1}^{K} (rk(x) − c[k])² ).

applies to common least-square regression

Cost-Sensitive Multiclass Classification CSMC by Regression

A Pictorial Proof

c[gr(x)] ≤ Σ_{k=1}^{K} |rk(x) − c[k]|

• assume c ordered and not degenerate: y = 1; 0 = c[1] < c[2] ≤ · · · ≤ c[K]

• assume mis-prediction gr(x) = 2: r2(x) = min_{1≤k≤K} rk(x) ≤ r1(x)

[figure: a number line marking c[1], c[2], c[3], . . . , c[K] together with the estimates r2(x) ≤ r1(x), r3(x), . . . , rK(x)]

c[gr(x)] = c[2] − c[1] (using c[1] = 0) ≤ ∆1 + ∆2 ≤ Σ_{k=1}^{K} |rk(x) − c[k]|

(49)

Cost-Sensitive Multiclass Classification CSMC by Regression

An Even Closer Look

let ∆1 ≡ r1(x) − c[1] and ∆2 ≡ c[2] − r2(x)

1  ∆1 ≥ 0 and ∆2 ≥ 0: c[2] ≤ ∆1 + ∆2

2  ∆1 ≤ 0 and ∆2 ≥ 0: c[2] ≤ ∆2

3  ∆1 ≥ 0 and ∆2 ≤ 0: c[2] ≤ ∆1

[figure: the same number line in each case, with r2(x) and r1(x) placed between c[1] and c[2]]

c[2] ≤ max(∆1, 0) + max(∆2, 0) ≤ |∆1| + |∆2|

(50)

Cost-Sensitive Multiclass Classification CSMC by Regression

Tighter Bound with One-sided Loss

Define one-sided loss ξk ≡ max(∆k, 0) with

∆k ≡ rk(x) − c[k]   if c[k] = cmin = 0
∆k ≡ c[k] − rk(x)   if c[k] ≠ cmin

Intuition

c[k] = cmin: wish to have rk(x) ≤ c[k]

c[k] ≠ cmin: wish to have rk(x) ≥ c[k]

—both wishes same as ∆k ≤ 0 ⇐⇒ ξk = 0

One-sided Loss Bound:

c[gr(x)] ≤ Σ_{k=1}^{K} ξk ≤ Σ_{k=1}^{K} |∆k|

(51)

Cost-Sensitive Multiclass Classification CSMC by Regression

The Improved Reduction Framework







[diagram: cost-sensitive examples (xn, yn, cn) ⇒ one-sided regression examples (Xn,k, Yn,k, Zn,k), k = 1, · · · , K ⇒ one-sided regression algorithm ⇒ regressors rk(x), k ∈ 1, · · · , K ⇒ cost-sensitive classifier gr(x)]

(Tu, 2010)

1 encode: transform cost-sensitive examples (xn, yn, cn) to one-sided regression examples (xn^(k), Yn^(k), Zn^(k)) = (xn, cn[k], 2⟦cn[k] = 0⟧ − 1)

2 learn: use a one-sided regression algorithm to get estimators rk(x)

3 decode: for each new input x, predict its class using gr(x) = argmin_{1≤k≤K} rk(x)

the reduction-to-OSR framework:

need a good OSR algorithm

(52)

Cost-Sensitive Multiclass Classification CSMC by Regression

Regularized One-Sided Hyper-Linear Regression

Given

(xn,k, Yn,k, Zn,k) = (xn, cn[k], 2⟦cn[k] = 0⟧ − 1)

Training Goal

all training ξn,k = max(Zn,k · (rk(xn,k) − Yn,k), 0) small —will drop k

min_{w,b}  λ/2 ⟨w, w⟩ + Σ_{n=1}^{N} ξn    to get rk(x) = ⟨w, φ(x)⟩ + b

(53)

Cost-Sensitive Multiclass Classification CSMC by Regression

One-Sided Support Vector Regression

Regularized One-Sided Hyper-Linear Regression

min_{w,b}  λ/2 ⟨w, w⟩ + Σ_{n=1}^{N} ξn,   ξn = max(Zn · (rk(xn) − Yn), 0)

Standard Support Vector Regression

min_{w,b}  1/(2C) ⟨w, w⟩ + Σ_{n=1}^{N} (ξn∨ + ξn∧)

ξn∨ = max(+1 · (rk(xn) − Yn − ε), 0),   ξn∧ = max(−1 · (rk(xn) − Yn + ε), 0)

OSR-SVM = SVR + (ε ← 0) + (keep ξn∨ or ξn∧ by Zn)
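The one-sided loss itself is tiny (a sketch of the formula, not of the full SVM solver):

```python
# One-sided regression loss: xi_n = max(Z_n * (r(x_n) - Y_n), 0).
# Z_n = +1 pushes the estimate down toward Y_n (for c[k] = 0);
# Z_n = -1 pushes it up (for c[k] > 0): a one-sided variant of the SVR loss.
import numpy as np

def osr_loss(r_x, Y, Z):
    return np.maximum(Z * (r_x - Y), 0.0)
```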

(54)

Cost-Sensitive Multiclass Classification CSMC by Regression

OSR-SVM on Semi-Real Data

[figure: avg. test cost of OSR versus OVA on ir., wi., gl., ve., vo., se., dn., sa., us., le.]

(Tu, 2010) some ‘artificial’ cost with UCI data

• OSR: cost-sensitive SVM

• OVA: regular one-versus-all SVM

OSR often significantly better than OVA

Cost-Sensitive Multiclass Classification CSMC by Regression

OSR versus WAP on Semi-Real Data

[figure: avg. test cost of OSR versus WAP on ir., wi., gl., ve., vo., se., dn., sa., us., le.]

(Tu, 2010) some ‘artificial’ cost with UCI data

• OSR (per-class): O(K) training, O(K) prediction

• WAP ≈ CSOVO (pairwise): O(K²) training, O(K²) prediction

OSR faster and competitive performance

Cost-Sensitive Multiclass Classification CSMC by Regression

From OSR-SVM to AOSR-DNN

OSR-SVM

min_{w,b}  λ/2 ⟨w, w⟩ + Σ_{n=1}^{N} ξn   with rk(x) = ⟨w, φ(x)⟩ + b,
ξn = max(Zn · (rk(xn) − Yn), 0)

Approximate OSR-DNN

min  NNet regularizer + Σ_{n=1}^{N} δn   with rk(x) = NNet(x),
δn = ln(1 + exp(Zn · (rk(xn) − Yn)))

AOSR-DNN (Chung, 2016a) = Deep Learning + OSR + smoother upper bound δn ≥ ξn, because ln(1 + exp(•)) ≥ max(•, 0)
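The smooth surrogate in code (a sketch; np.logaddexp gives a numerically stable softplus):

```python
# AOSR-DNN surrogate: delta_n = log(1 + exp(Z_n * (r(x_n) - Y_n))),
# an upper bound of the one-sided loss since softplus(t) >= max(t, 0).
import numpy as np

def aosr_loss(r_x, Y, Z):
    return np.logaddexp(0.0, Z * (r_x - Y))   # stable log(1 + exp(.))
```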

(57)

Cost-Sensitive Multiclass Classification CSMC by Regression

From AOSR-DNN to CSDNN

Cons of AOSR-DNN

c affects both classification and feature extraction in DNN, but hard to do effective cost-sensitive feature extraction

idea 1: pre-training with c

• layer-wise pre-training with cost-sensitive autoencoders: loss = reconstruction + AOSR

• CSDNN (Chung, 2016a) = AOSR-DNN + cost-sensitive pre-training

idea 2: auxiliary cost-sensitive nodes

• auxiliary nodes to predict costs per layer: loss = AOSR for classification + AOSR for auxiliary

• applies to any deep learning model (Chung, 2020)

CSDNN: world’s first successful CSMC deep learning model

(58)

Cost-Sensitive Multiclass Classification CSMC by Regression

AOSR-DNN versus CSDNN

[figure: avg. test cost of AOSR-DNN versus CSDNN on m, b, s, c, mi, bi, si, ci]

(Chung, 2016a)

• AOSR-DNN: cost-sensitive training

• CSDNN: AOSR-DNN + cost-sensitive feature extraction

CSDNN wins, justifying cost-sensitive feature extraction

(59)

Cost-Sensitive Multiclass Classification CSMC by Regression

ABOD-DNN versus CSDNN

m b s c mi bi si ci

0 1 2 3 4 5 6 7 8 9

avg. test cost

ABOD-DNN

CSDNN (Chung, 2016a)

• ABOD-DNN:

probability estimate+ cost-sensitive

prediction

• CSDNN:

cost-sensitive training +cost-sensitive feature extraction

CSDNN still wins, hintingdifficulty of probability estimate withoutcost-sensitive feature extraction

(60)

Cost-Sensitive Multilabel Classification CSML Motivation and Setup

Outline

1 Cost-Sensitive Multiclass Classification
  CSMC Motivation and Setup
  CSMC by Bayesian Perspective
  CSMC by (Weighted) Binary Classification
  CSMC by Regression

2 Cost-Sensitive Multilabel Classification
  CSML Motivation and Setup
  CSML by Bayesian Perspective
  CSML by (Weighted) Binary Classification
  CSML by Regression

3 A Story of Bacteria Classification with Doctor-Annotated Costs

4 Summary

(61)

Cost-Sensitive Multilabel Classification CSML Motivation and Setup

Which Fruit?

?

(image by Robert-Owen-Wahl from Pixabay)

apple orange strawberry kiwi

(images by Pexels, PublicDomainPictures, 192635, Rob van der Meijden from Pixabay)

multiclass classification: classify input (picture) to one category (label), remember? :-)

(62)

Cost-Sensitive Multilabel Classification CSML Motivation and Setup

Which Fruits?

?: {apple, orange, kiwi}

(image by Michal Jarmoluk from Pixabay)

apple orange strawberry kiwi

(images by Pexels, PublicDomainPictures, 192635, Rob van der Meijden from Pixabay)

multilabel classification: classify input to multiple (or no) categories

(63)

Cost-Sensitive Multilabel Classification CSML Motivation and Setup

Label Powerset: Multilabel Classification via Multiclass

(Tsoumakas, 2007)

multiclass w/ L = 4 classes: 4 possible outcomes {a, o, s, k}

multilabel w/ L = 4 classes: 2⁴ = 16 possible outcomes

2^{a, o, s, k} ⇔ {φ, a, o, s, k, ao, as, ak, os, ok, sk, aos, aok, ask, osk, aosk}

Label Powerset (LP): reduction to multiclass classification

• difficulties for large L:
   computation: 2^L extended classes
   sparsity: no or few examples for some combinations

LP: feasible only for small L
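A Label Powerset sketch, encoding each labelset as the integer of its binary code (hypothetical names; feasible only while 2^L stays small):

```python
# Label Powerset sketch (Tsoumakas, 2007): labelset -> one multiclass label.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def lp_fit(X, Y_bits, Learner=DecisionTreeClassifier):
    # Y_bits: (N, L) binary matrix; encode each row as an integer in [0, 2^L)
    codes = Y_bits @ (1 << np.arange(Y_bits.shape[1]))
    return Learner().fit(X, codes)

def lp_predict(model, X, L):
    codes = model.predict(X)
    return (codes[:, None] >> np.arange(L)) & 1   # decode back to bit vectors
```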

(64)

Cost-Sensitive Multilabel Classification CSML Motivation and Setup

What Tags?

?: {machine learning, data structure, data mining, object oriented programming, artificial intelligence, compiler, architecture, chemistry, textbook, children book, . . . etc.}

another multilabel classification problem: tagging input to multiple categories

(65)

Cost-Sensitive Multilabel Classification CSML Motivation and Setup

Binary Relevance: Multilabel Classification via Yes/No

binary classification: {yes, no}

multilabel w/ L classes: L yes/no questions

machine learning (Y), data structure (N), data mining (Y), OOP (N), AI (Y), compiler (N), architecture (N), chemistry (N), textbook (Y), children book (N), etc.

Binary Relevance (BR): reduction to multiple isolated binary classification

• disadvantages:
   isolation—hidden relations not exploited (e.g. ML and DM highly correlated, ML subset of AI, textbook & children book disjoint)
   unbalanced—few yes, many no

BR: simple (& strong) benchmark with known disadvantages
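A BR sketch: L isolated binary problems, nothing shared between them (hypothetical names):

```python
# Binary Relevance sketch: one independent yes/no classifier per label.
import numpy as np
from sklearn.linear_model import LogisticRegression

def br_fit(X, Y_bits, Learner=LogisticRegression):
    return [Learner().fit(X, Y_bits[:, l]) for l in range(Y_bits.shape[1])]

def br_predict(models, X):
    return np.column_stack([m.predict(X) for m in models])
```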

(66)

Cost-Sensitive Multilabel Classification CSML Motivation and Setup

Multilabel Classification Setup

Given

N examples (input xn, labelset Yn) ∈ X × 2^{1,2,··· ,L}

fruits: X = encoding(pictures), Yn ⊆ {1, 2, · · · , 4}

tags: X = encoding(merchandise), Yn ⊆ {1, 2, · · · , L}

Goal

a multilabel classifier g(x) that closely predicts the labelset Y associated with some unseen input x (by exploiting hidden relations/combinations between labels)

multilabel classification: hot and important with many real-world applications

(67)

Cost-Sensitive Multilabel Classification CSML Motivation and Setup

From Labelset to Coding View

labelset       apple   orange   strawberry   binary code
Y1 = {o}       0 (N)   1 (Y)    0 (N)        y1 = [0, 1, 0]
Y2 = {a, o}    1 (Y)   1 (Y)    0 (N)        y2 = [1, 1, 0]
Y3 = {o, s}    0 (N)   1 (Y)    1 (Y)        y3 = [0, 1, 1]
Y4 = {}        0 (N)   0 (N)    0 (N)        y4 = [0, 0, 0]

(images by PublicDomainPictures, Narin Seandag, GiltonF, nihatyetkin from Pixabay)

subset Y of 2^{1,2,··· ,L} ⇐⇒ length-L binary code y
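The encoding as a one-liner (labels indexed from 1, as in the table):

```python
# Labelset -> length-L binary code, matching the table above.
def labelset_to_code(Y, L):
    return [1 if l in Y else 0 for l in range(1, L + 1)]

labelset_to_code({1, 2}, 3)   # Y2 = {a, o} -> [1, 1, 0]
```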
