(1)

Cost-sensitive Classification: Techniques and Stories

Hsuan-Tien Lin htlin@csie.ntu.edu.tw

Professor

Dept. of CSIE, National Taiwan University

Machine Learning Summer School @ Taipei, Taiwan August 2, 2021

(2)

About Me

• co-author of textbook ‘Learning from Data: A Short Course’

• instructor of two Mandarin-teaching ML MOOCs on Coursera

goal: make ML more realistic

• weakly supervised learning: in ICML ’20, ICLR ’21, . . .

• online/active learning: in ICML ’12, ICML ’14, AAAI ’15, EMNLP ’20, . . .

• cost-sensitive classification: in ICML ’10, KDD ’12, IJCAI ’16, . . .

• multi-label classification: in NeurIPS ’12, ICML ’14, AAAI ’18, . . .

• large-scale data mining: e.g. co-led KDD Cup world-champion NTU teams 2010–2013

(3)

More About Me

attendee: MLSS Taipei 2006

student workshop talk: Large-Margin Thresholded Ensembles for Ordinal Regression

(4)

Disclaimer about Cost-sensitive Classification

materials mostly from “old” tutorials

• Advances in Cost-sensitive Multiclass and Multilabel Classification. KDD 2019 Tutorial, Anchorage, Alaska, USA, August 2019.

• Cost-sensitive Classification: Algorithms and Advances. ACML 2013 Tutorial, Canberra, Australia, November 2013.

• core techniques somewhat mature, compared to 10 years ago

• new research still being inspired, e.g.

Classification with Rejection Based on Cost-sensitive Classification, Charoenphakdee et al., ICML ’21

Cost-Sensitive Robustness against Adversarial Examples, Zhang and Evans, ICLR ’19

• will show one application story in the end

(5)

Cost-Sensitive Multiclass Classification CSMC Motivation and Setup

Outline

1 Cost-Sensitive Multiclass Classification

CSMC Motivation and Setup
CSMC by Bayesian Perspective
CSMC by (Weighted) Binary Classification
CSMC by Regression

2 Cost-Sensitive Multilabel Classification

CSML Motivation and Setup
CSML by Bayesian Perspective
CSML by (Weighted) Binary Classification
CSML by Regression

3 A Story of Bacteria Classification with Doctor-Annotated Costs

4 Summary

(6)

Cost-Sensitive Multiclass Classification CSMC Motivation and Setup

Which Digit Did You Write?

?

one (1) two (2) three (3) four (4)

• a multiclass classification problem: grouping ‘pictures’ into different ‘categories’

C’mon, we know about

multiclass classification all too well! :-)

(7)

Cost-Sensitive Multiclass Classification CSMC Motivation and Setup

Performance Evaluation

?

• ZIP code recognition:

1: wrong; 2: right; 3: wrong; 4: wrong

• check value recognition:

1: one-dollar mistake; 2: no mistake;

3: one-dollar mistake; 4: two-dollar mistake

different applications evaluate mis-predictions differently

(8)

Cost-Sensitive Multiclass Classification CSMC Motivation and Setup

ZIP Code Recognition

?

1: wrong; 2: right; 3: wrong; 4: wrong

regular multiclass classification: only right or wrong

• wrong cost: 1; right cost: 0

• prediction error of h on some (x, y):

classification cost = ⟦y ≠ h(x)⟧

regular multiclass classification: well-studied, many good algorithms

(9)

Cost-Sensitive Multiclass Classification CSMC Motivation and Setup

Check Value Recognition

?

1: one-dollar mistake; 2: no mistake;

3: one-dollar mistake; 4: two-dollar mistake

cost-sensitive multiclass classification: different costs for different mis-predictions

• e.g. prediction error of h on some (x, y):

absolute cost = |y − h(x)|

next: more about cost-sensitive multiclass classification (CSMC)

(10)

Cost-Sensitive Multiclass Classification CSMC Motivation and Setup

What is the Status of the Patient?

? (image by mcmurryjulie from Pixabay)

bird flu cold healthy

(images by Clker-Free-Vector-Images from Pixabay)

• another classification problem: grouping ‘patients’ into different ‘status’

are all mis-prediction costs equal?

(11)

Cost-Sensitive Multiclass Classification CSMC Motivation and Setup

Patient Status Prediction

error measure = society cost

actual \ predicted | bird flu | cold | healthy
bird flu           |     0    | 1000 | 100000
cold               |    100   |   0  |   3000
healthy            |    100   |  30  |      0

• bird flu mis-predicted as healthy: very high cost

• cold mis-predicted as healthy: high cost

• cold correctly predicted as cold: no cost

human doctors consider costs of decision;

can computer-aided diagnosis do the same?

(12)

Cost-Sensitive Multiclass Classification CSMC Motivation and Setup

Setup: Class-Dependent Cost-Sensitive Classification

Given

N examples, each (input xn, label yn) ∈ X × {1, 2, . . . , K}

and cost matrix C ∈ R^{K×K} with C(y, y) = 0 = min_{1≤k≤K} C(y, k)

patient diagnosis with society cost

C = [   0   1000   100000
      100      0     3000
      100     30        0 ]

check digit recognition with absolute cost (cost function) C(y, k) = |y − k|

Goal

a classifier g(x) that pays a small cost C(y, g(x)) on future unseen example (x, y)

includes regular classification cost Cc = [ 0 1
                                            1 0 ] as a special case
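To make the setup concrete, here is a minimal sketch (with hypothetical labels and predictions) of how a cost matrix C scores a classifier; numpy is assumed:

```python
import numpy as np

# society-cost matrix from the slide: C[y, k] = cost of predicting k when truth is y
C = np.array([[0,   1000, 100000],   # actual: bird flu
              [100, 0,    3000],     # actual: cold
              [100, 30,   0]])       # actual: healthy

y_true = np.array([0, 1, 2, 1])      # hypothetical labels
y_pred = np.array([0, 2, 2, 1])      # hypothetical predictions g(x)

avg_cost = C[y_true, y_pred].mean()  # average C(y, g(x)) over the examples
print(avg_cost)                      # (0 + 3000 + 0 + 0) / 4 = 750.0
```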

(13)

Cost-Sensitive Multiclass Classification CSMC Motivation and Setup

Which Age-Group?

? infant (1) child (2) teen (3) adult (4)

(images by Tawny van Breda, Pro File, Mieczysław Samol, lisa runnels, vontoba from Pixabay)

• small mistake: classify child as teen; big mistake: classify infant as adult

• cost matrix C(y, g(x)) for embedding ‘order’:

C = [ 0  1  4  5
      1  0  1  3
      3  1  0  2
      5  4  1  0 ]

CSMC can help solve many other problems like ordinal ranking

(14)

Cost-Sensitive Multiclass Classification CSMC Motivation and Setup

Cost Vector

cost vector c: a row of cost components

• society cost for a bird flu patient: c = (0, 1000, 100000)

• absolute cost for digit 2: c = (1, 0, 1, 2)

• age-ranking cost for a teenager: c = (3, 1, 0, 2)

• ‘regular’ classification cost for label 2: c⁽²⁾ = (1, 0, 1, 1)

• movie recommendation

someone who loves romance movies but hates horror: c = (romance = 0, fiction = 5, horror = 100)

someone who loves romance movies but is fine with horror: c = (romance = 0, fiction = 5, horror = 3)

cost vector: representation of personal preference in many applications

(15)

Cost-Sensitive Multiclass Classification CSMC Motivation and Setup

Setup: Example-Dependent Cost-Sensitive Classification

Given

N examples, each (input xn, label yn) ∈ X × {1, 2, . . . , K} and cost vector cn ∈ R^K

—will assume cn[yn] = 0 = min_{1≤k≤K} cn[k]

Goal

a classifier g(x) that pays a small cost c[g(x)] on future unseen example (x, y, c)

• will assume c[y] = 0 = cmin = min_{1≤k≤K} c[k]

• note: y not really needed in evaluation

example-dependent ⊃ class-dependent ⊃ regular

(16)

Cost-Sensitive Multiclass Classification CSMC by Bayesian Perspective

Outline

1 Cost-Sensitive Multiclass Classification

CSMC Motivation and Setup
CSMC by Bayesian Perspective
CSMC by (Weighted) Binary Classification
CSMC by Regression

2 Cost-Sensitive Multilabel Classification

CSML Motivation and Setup
CSML by Bayesian Perspective
CSML by (Weighted) Binary Classification
CSML by Regression

3 A Story of Bacteria Classification with Doctor-Annotated Costs

4 Summary

(17)

Cost-Sensitive Multiclass Classification CSMC by Bayesian Perspective

Key Idea: Conditional Probability Estimator

Goal (Class-Dependent Setup)

a classifier g(x) that pays a small costC(y, g(x)) on future unseen example (x, y)

if P(y|x) known

Bayes optimal g(x) = argmin_{1≤k≤K} Σ_{y=1}^{K} P(y|x) C(y, k)

if q(x, y) ≈ P(y|x) well

approximately good gq(x) = argmin_{1≤k≤K} Σ_{y=1}^{K} q(x, y) C(y, k)

how to get conditional probability estimator q? logistic regression, Naïve Bayes, . . .

(18)

Cost-Sensitive Multiclass Classification CSMC by Bayesian Perspective

Approximate Bayes-Optimal Decision

if q(x, y) ≈ P(y|x) well (Domingos, 1999)

approximately good gq(x) = argmin_{1≤k≤K} Σ_{y=1}^{K} q(x, y) C(y, k)

Approximate Bayes-Optimal Decision (ABOD) Approach

1 use your favorite algorithm on {(xn, yn)} to get q(x, y) ≈ P(y|x)

2 for each new input x, predict its class using gq(x) above

ABOD: probability estimate + Bayes-optimal decision
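A minimal sketch of ABOD, assuming scikit-learn's LogisticRegression as the "favorite algorithm" and classes labeled 0, . . . , K−1 so that predict_proba columns align with the rows of C:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def abod_fit_predict(X_train, y_train, X_test, C):
    # step 1: probability estimate q(x, y) ~ P(y|x)
    q = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    probs = q.predict_proba(X_test)      # shape (N, K)
    # step 2: Bayes-optimal decision g_q(x) = argmin_k sum_y q(x, y) C(y, k)
    expected_cost = probs @ C            # shape (N, K): expected cost per class k
    return expected_cost.argmin(axis=1)
```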

(19)

Cost-Sensitive Multiclass Classification CSMC by Bayesian Perspective

ABOD on Artificial Data

1 use your favorite algorithm on {(xn, yn)} to get q(x, y) ≈ P(y|x)

2 for each new input x, predict its class using gq(x) above

[Figure: LogReg decision regions on artificial data under two settings, ‘regular’ and ‘rotate’, with the rotating cost matrix below]

        g(x)=1  g(x)=2  g(x)=3  g(x)=4
y = 1      0       1       2       4
y = 2      4       0       1       2
y = 3      2       4       0       1
y = 4      1       2       4       0

(20)

Cost-Sensitive Multiclass Classification CSMC by Bayesian Perspective

ABOD for Binary Classification

Given N examples, each (input xn, label yn) ∈ X × {−1, +1}

and weights w+, w− representing two entries of the cost matrix

         g(x) = +1 | g(x) = −1
y = +1       0     |    w+
y = −1      w−     |     0

if q(x) ≈ P(+1|x) well

approximately good gq(x) = sign(w+ q(x) − w− (1 − q(x))), i.e. (Elkan, 2001),

gq(x) = +1 ⇐⇒ w+ q(x) − w− (1 − q(x)) > 0 ⇐⇒ q(x) > w− / (w+ + w−)

ABOD for binary classification:

probability estimate + threshold changing
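The binary case reduces to a threshold change; a tiny sketch of the (Elkan, 2001) rule above, with q_x a probability estimate obtained elsewhere:

```python
def abod_binary(q_x, w_pos, w_neg):
    # predict +1 iff w_pos * q(x) - w_neg * (1 - q(x)) > 0,
    # i.e. iff q(x) exceeds the shifted threshold w_neg / (w_pos + w_neg)
    return +1 if q_x > w_neg / (w_pos + w_neg) else -1
```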

(21)

Cost-Sensitive Multiclass Classification CSMC by Bayesian Perspective

ABOD for Binary Classification on Artificial Data

1 use your favorite algorithm on {(xn, yn)} to get q(x) ≈ P(+1|x)

2 for each new input x, predict its class using gq(x) = sign(q(x) − w−/(w+ + w−))

[Figure: LogReg decision boundary on artificial data under two settings, ‘regular’ and ‘positive emphasis’, with the cost matrix below]

         g(x) = +1 | g(x) = −1
y = +1       0     |    10
y = −1       1     |     0

(22)

Cost-Sensitive Multiclass Classification CSMC by Bayesian Perspective

Pros and Cons of ABOD

Pros

• optimal if good probability estimate q

• prediction easily adapts to different C without modifying training (probability estimate)

Cons

• ‘difficult’: good probability estimate often more difficult than good multiclass classification

• ‘restricted’: only applicable to class-dependent setup—need ‘full picture’ of cost matrix

• ‘slower prediction’ (for multiclass): more calculation at prediction stage

can we use any multiclass classification algorithm for ABOD?

(23)

Cost-Sensitive Multiclass Classification CSMC by Bayesian Perspective

MetaCost Approach

Approximate Bayes-Optimal Decision (ABOD) Approach

1 use your favorite algorithm on {(xn, yn)} to get q(y, x) ≈ P(y|x)

2 for each new input x, predict its class using gq(x)

MetaCost Approach (Domingos, 1999)

1 use your favorite multiclass classification algorithm on bootstrapped {(xn, yn)} and aggregate the classifiers to get q(y, x) ≈ P(y|x)

2 for each given input xn, relabel it to y′n using gq(xn)

3 run your favorite multiclass classification algorithm on the relabeled {(xn, y′n)} to get the final classifier g

4 for each new input x, predict its class using g(x)

pros: any multiclass classification algorithm can be used
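A minimal sketch of MetaCost, assuming scikit-learn-style base classifiers with predict_proba, numpy-array inputs, and that every bootstrap sample contains all K classes (base_clf and the data are placeholders):

```python
import numpy as np
from sklearn.base import clone

def metacost(base_clf, X, y, C, n_boot=10, rng=np.random.default_rng(0)):
    N, K = len(X), C.shape[0]
    probs = np.zeros((N, K))
    for _ in range(n_boot):                       # 1. bootstrap + aggregate
        idx = rng.integers(0, N, size=N)
        probs += clone(base_clf).fit(X[idx], y[idx]).predict_proba(X)
    probs /= n_boot                               # q(y, x) ~ P(y|x)
    y_relabel = (probs @ C).argmin(axis=1)        # 2. relabel with g_q(x)
    return clone(base_clf).fit(X, y_relabel)      # 3. retrain on relabeled data
```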

(24)

Cost-Sensitive Multiclass Classification CSMC by Bayesian Perspective

MetaCost on Semi-Real Data

[Figure: three scatter plots (C4.5R, Undersampling, Oversampling) plotting MetaCost's test cost against the Multiclass and Two-class baselines across datasets and cost matrices, with a y = x reference line]

(Domingos, 1999)

• some ‘artificial’ cost with UCI data

• MetaCost+C4.5:

cost-sensitive

• C4.5: regular

not surprisingly,

considering the cost properly does help


(25)

Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification

Outline

1 Cost-Sensitive Multiclass Classification

CSMC Motivation and Setup
CSMC by Bayesian Perspective
CSMC by (Weighted) Binary Classification
CSMC by Regression

2 Cost-Sensitive Multilabel Classification

CSML Motivation and Setup
CSML by Bayesian Perspective
CSML by (Weighted) Binary Classification
CSML by Regression

3 A Story of Bacteria Classification with Doctor-Annotated Costs

4 Summary

(26)

Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification

Key Idea: Cost Transformation

(heuristic) relabeling useful in MetaCost: a more principled way?

Yes, by Connecting Cost Vector to Regular Costs!

c of interest (1, 0, 1, 2) --shift equivalence--> shifted cost (3, 2, 3, 4)

= mixture weights u_ℓ = (1, 2, 1, 0) applied to the regular costs

(0, 1, 1, 1), (1, 0, 1, 1), (1, 1, 0, 1), (1, 1, 1, 0)

i.e. x with c = (1, 0, 1, 2) equivalent to a weighted mixture {(x, ℓ, u_ℓ)} = {(x, 1, 1), (x, 2, 2), (x, 3, 1)}

cost equivalence (Lin, 2014): for any classifier h,

c[h(x)] + constant = Σ_{ℓ=1}^{K} u_ℓ ⟦ℓ ≠ h(x)⟧

(27)

Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification

Meaning of Cost Equivalence

c[h(x)] + constant = Σ_{ℓ=1}^{K} u_ℓ ⟦ℓ ≠ h(x)⟧

on one (x, y, c): wrong prediction charged by c[h(x)]

on all relabeled data {(x, ℓ, u_ℓ)}: wrong prediction charged by the total weighted classification error of the relabeled data

min_h expected LHS (original CSMC problem) = min_h expected RHS (weighted classification when u_ℓ ≥ 0)

(28)

Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification

Calculation of u_ℓ

Smallest Non-Negative u_ℓ's (Lin, 2014)

when constant = (K − 1) max_{1≤k≤K} c[k] − Σ_{k=1}^{K} c[k],   u_ℓ = max_{1≤k≤K} c[k] − c[ℓ]

e.g. c of interest (1, 0, 1, 2) → mixture weights u_ℓ = (1, 2, 1, 0)

• largest c[ℓ]: u_ℓ = 0 (least preferred relabel)

• smallest c[ℓ]: u_ℓ = largest (original label & most preferred relabel)

ℓ's and u_ℓ's embed the cost

(29)

Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification

Data Space Expansion Approach

Data Space Expansion (DSE) Approach (Abe, 2004)

1 for each (xn, yn, cn) and ℓ, let u_{n,ℓ} = max_{1≤k≤K} cn[k] − cn[ℓ]

2 apply your favorite multiclass classification algorithm on the weighted mixtures ⋃_{n=1}^{N} {(xn, ℓ, u_{n,ℓ})}_{ℓ=1}^{K} to get g(x)

3 for each new input x, predict its class using g(x)

• by cost equivalence, good g for the new (weighted) regular classification problem = good g for the original cost-sensitive classification problem

• weighted regular classification: special case of CSMC, but more easily solvable by, e.g., sampling + regular classification (Zadrozny, 2003)

pros: any multiclass classification algorithm can be used
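A minimal sketch of DSE, feeding the mixture weights u_{n,ℓ} directly as sample weights to a learner that accepts them (a decision tree here as a placeholder); the sampling + regular classification route (Zadrozny, 2003) would instead sample examples proportionally to u:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def dse_fit(X, costs):                               # costs: shape (N, K)
    N, K = costs.shape
    u = costs.max(axis=1, keepdims=True) - costs     # u[n, l] = max_k c_n[k] - c_n[l]
    X_exp = np.repeat(X, K, axis=0)                  # each x_n copied K times
    y_exp = np.tile(np.arange(K), N)                 # relabels 0 .. K-1 per copy
    w_exp = u.ravel()                                # mixture weights, row-major
    return DecisionTreeClassifier().fit(X_exp, y_exp, sample_weight=w_exp)
```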

(30)

Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification

DSE versus MetaCost on Semi-Real Data

[Figure (Abe, 2004): average test cost of DSE versus MetaCost on five UCI datasets (ann, kdd, let, spl, sat)]

• some ‘artificial’ cost with UCI data

• use sampling + C4.5 for weighted regular classification

DSE competitive to MetaCost

(31)

Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification

Cons of DSE: Unavoidable Noise

Original Cost-Sensitive Classification Problem: individual examples (classes 1–4) without noise + absolute cost

= New Regular Classification Problem: mixtures with relabeling noise

• cost embedded as weight + noisy labels

• new problem usually harder than the original one

need robust multiclass classification algorithm to deal with noise

(32)

Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification

Key Idea: Design Robust Multiclass Algorithm

One-Versus-One: A Popular Classification Meta-Method

• for all different class pairs (i, j),

1 take all examples (xn, yn)

with yn = i or j (original one-versus-one)

with un,i ≠ un,j, using the larger-u label and weight |un,i − un,j| (robust one-versus-one)

2 train a binary classifier ĝ(i,j) using those examples

• return g(x) that predicts using the votes from ĝ(i,j)

un-shifting inside the meta-method to remove noise

robust step makes it suitable for DSE

cost-sensitive one-versus-one: DSE + robust one-versus-one

(33)

Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification

Cost-Sensitive One-Versus-One (CSOVO)

Cost-Sensitive One-Versus-One (Lin, 2014)

• for all different class pairs (i, j),

1 robust one-versus-one + calculate from cn: take all examples (xn, yn) with cn[i] ≠ cn[j], using the smaller-c label and weight u(i,j)n = |cn[i] − cn[j]|

2 train a binary classifier ĝ(i,j) using those examples

• return g(x) that predicts using the votes from ĝ(i,j)

• comes with good theoretical guarantee: test cost of g ≤ 2 Σ_{i<j} test cost of ĝ(i,j)

• sibling to Weighted All-Pairs (WAP) approach: even tighter guarantee (Beygelzimer, 2005) with more sophisticated construction of u(i,j)n

physical meaning: each ĝ(i,j) answers the yes/no question ‘prefer i or j?’
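A minimal sketch of CSOVO with a weighted linear SVM as the placeholder binary learner; it assumes each pair (i, j) sees both labels among the examples with cn[i] ≠ cn[j]:

```python
import numpy as np
from sklearn.base import clone
from sklearn.svm import LinearSVC

def csovo_fit(X, costs, base=LinearSVC()):
    N, K = costs.shape
    clfs = {}
    for i in range(K):
        for j in range(i + 1, K):
            w = np.abs(costs[:, i] - costs[:, j])     # weight |c_n[i] - c_n[j]|
            mask = w > 0                              # keep examples with c[i] != c[j]
            y_ij = np.where(costs[mask, i] < costs[mask, j], i, j)  # smaller-c label
            clfs[(i, j)] = clone(base).fit(X[mask], y_ij, sample_weight=w[mask])
    return clfs

def csovo_predict(clfs, X, K):
    votes = np.zeros((len(X), K))
    for (i, j), clf in clfs.items():                  # each pair casts one vote
        pred = clf.predict(X)
        for k in (i, j):
            votes[:, k] += (pred == k)
    return votes.argmax(axis=1)
```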

(34)

Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification

CSOVO on Semi-Real Data

[Figure (Lin, 2014): average test random cost of CSOVO versus OVO on six UCI datasets (veh, vow, seg, dna, sat, usp)]

• some ‘artificial’ cost with UCI data

• CSOVO-SVM: cost-sensitive

• OVO-SVM: regular

not surprisingly again, considering the cost properly does help

(35)

Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification

CSOVO for Ordinal Ranking

[Figure (Lin, 2014): average test absolute cost of CSOVO versus OVO on eight benchmark ordinal ranking datasets (pyr, mac, bos, aba, ban, com, cal, cen)]

• absolute cost with benchmark ordinal ranking data

• CSOVO-SVM: cost-sensitive

• OVO-SVM: regular

CSOVO significantly better for ordinal ranking

(36)

Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification

Cons of CSOVO: Many Binary Classifiers

K classes → (CSOVO) → K(K−1)/2 binary classifiers

time-consuming in both

• training, especially with many different cn[i] and cn[j]

• prediction

—parallelization helps a bit, but generally not feasible for large K

CSOVO: a simple meta-method, for medium K only

(37)

Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification

Key Idea: OVO ≡ Round-Robin Tournament

[Diagram: a round-robin tournament versus a single-elimination bracket over classes 1–4]

• prediction ≡ deciding the tournament winner for each x

• (CS)OVO: K(K−1)/2 games for prediction (and hence training)

• single-elimination tournament (for K = 2^ℓ):

K − 1 games for prediction via bottom-up: real-world

log2 K games for prediction via top-down: computer-world :-)

next: single-elimination tournament for CSMC

(38)

Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification

Filter Tree (FT) Approach

Filter Tree (Beygelzimer, 2009) Training: from bottom to top

[Diagram: single-elimination tree over classes 1–4 with costs c[1] = 0, c[2] = 9, c[3] = 5, c[4] = 8; training examples ĝ(1,2): (L, 9), ĝ(3,4): (L, 3), root ĝ(...): (R, 4)]

• ĝ(1,2) and ĝ(3,4) trained like CSOVO: smaller-c label and weight u(i,j)n = |cn[i] − cn[j]|

• ĝ(...) trained with (kL, kR) filtered by sub-trees

—smaller-c sub-tree direction and weight u(...)n = |cn[kL] − cn[kR]|

FT: top classifiers aware of bottom-classifier mistakes (a sketch for K = 4 follows)
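A minimal sketch of Filter Tree training and prediction for K = 4, matching the two-level tree above; the base learner is a placeholder and each node is assumed to see both of its labels:

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

def filter_tree_fit(X, costs, base=LogisticRegression(max_iter=1000)):
    def weighted_pair(i, j):                     # CSOVO-style leaf examples
        w = np.abs(costs[:, i] - costs[:, j])
        y = np.where(costs[:, i] < costs[:, j], i, j)   # smaller-c label
        return y, w
    y01, w01 = weighted_pair(0, 1)
    y23, w23 = weighted_pair(2, 3)
    g01 = clone(base).fit(X, y01, sample_weight=w01)
    g23 = clone(base).fit(X, y23, sample_weight=w23)
    # root: candidates (kL, kR) filtered by the *actual* sub-tree predictions,
    # so the root learns around bottom-classifier mistakes
    kL, kR = g01.predict(X), g23.predict(X)
    cL = costs[np.arange(len(X)), kL]
    cR = costs[np.arange(len(X)), kR]
    y_root = np.where(cL < cR, 0, 1)             # 0 = go left, 1 = go right
    g_root = clone(base).fit(X, y_root, sample_weight=np.abs(cL - cR))
    return g01, g23, g_root

def filter_tree_predict(model, X):
    g01, g23, g_root = model
    left = g_root.predict(X) == 0
    return np.where(left, g01.predict(X), g23.predict(X))
```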

(39)

Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification

Pros and Cons of FT

Pros

• efficient: O(K) training, O(log K) prediction

• strong theoretical guarantee: small-regret binary classifiers =⇒ small-regret CSMC classifier

[Diagram: the tree with leaves 1vs2 and 3vs4 feeding the root 12vs34]

Cons

• ‘asymmetric’ to labels: non-trivial structural decision

• ‘hard’ sub-tree-dependent top-classification tasks

next: other reductions to (weighted) binary classification

(40)

Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification

Other Approaches via Weighted Binary Classification

FT: with regret bound (Beygelzimer, 2009)

—the lowest achievable cost within {1, 2} or {3, 4}?

Divide&Conquer Tree (TREE): without regret bound (Beygelzimer, 2009)

—the lowest ideal cost within {1, 2} or {3, 4}?

[Diagram: the tree with leaves 1vs2 and 3vs4 feeding the root 12vs34]

Sensitive Err. Correcting Output Code (SECOC): with regret bound (Langford, 2005)

—c[1] + c[3] + c[4] greater than some θ?

training time: SECOC (O(T · K)) > FT (O(K)) ≈ TREE (O(K))

(41)

Cost-Sensitive Multiclass Classification CSMC by (Weighted) Binary Classification

Comparison of Reductions to Weighted Binary Classification

[Figure (Lin, 2014): average test cost of CSOVO, FT, TREE, and SECOC on ten UCI datasets (zo., gl., ve., vo., ye., se., dn., pa., sa., us.)]

• couple all meta-methods with SVM

• round-robin tournament (CSOVO)

• single-elimination tournament (FT, TREE)

• error-correcting code (SECOC)

CSOVO often among the best; FT somewhat competitive

(42)

Cost-Sensitive Multiclass Classification CSMC by Regression

Outline

1 Cost-Sensitive Multiclass Classification

CSMC Motivation and Setup
CSMC by Bayesian Perspective
CSMC by (Weighted) Binary Classification
CSMC by Regression

2 Cost-Sensitive Multilabel Classification

CSML Motivation and Setup
CSML by Bayesian Perspective
CSML by (Weighted) Binary Classification
CSML by Regression

3 A Story of Bacteria Classification with Doctor-Annotated Costs

4 Summary

(43)

Cost-Sensitive Multiclass Classification CSMC by Regression

Key Idea: Cost Estimator

Goal

a classifier g(x) that pays a small cost c[g(x)] on future unseen example (x, y, c)

if every c[k] known: optimal g(x) = argmin_{1≤k≤K} c[k]

if rk(x) ≈ c[k] well: approximately good gr(x) = argmin_{1≤k≤K} rk(x)

how to get cost estimator rk? regression

(44)

Cost-Sensitive Multiclass Classification CSMC by Regression

Cost Estimator by Per-class Regression

Given

N examples, each (input xn, label yn, cost cn) ∈ X × {1, 2, . . . , K} × R^K

input | cn[1] | cn[2] | . . . | cn[K]
x1    |   0   |   2   | . . . |   1
x2    |   1   |   3   | . . . |   5
· · ·
xN    |   6   |   1   | . . . |   0
        ↳ r1    ↳ r2    . . .   ↳ rK   (one regressor per column)

want: rk(x) ≈ c[k] for all future (x, y, c) and k

(45)

Cost-Sensitive Multiclass Classification CSMC by Regression

The Reduction-to-Regression Framework







cost-sensitive examples (xn, yn, cn) ⇒ regression examples (Xn,k, Yn,k), k = 1, · · · , K ⇒ regression algorithm ⇒ regressors rk(x), k ∈ 1, · · · , K ⇒ cost-sensitive classifier gr(x)

1 encode: transform cost-sensitive examples (xn, yn, cn) to regression examples (Xn,k, Yn,k) = (xn, cn[k])

2 learn: use your favorite algorithm on the regression examples to get estimators rk(x)

3 decode: for each new input x, predict its class using gr(x) = argmin_{1≤k≤K} rk(x)

the reduction-to-regression framework: systematic & easy to implement
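A minimal sketch of the three steps, with ridge regression as the placeholder per-class regressor:

```python
import numpy as np
from sklearn.linear_model import Ridge

def reg_reduction_fit(X, costs):
    # encode + learn: one regressor r_k per cost component c[k]
    return [Ridge().fit(X, costs[:, k]) for k in range(costs.shape[1])]

def reg_reduction_predict(regressors, X):
    # decode: g_r(x) = argmin_k r_k(x)
    r = np.column_stack([reg.predict(X) for reg in regressors])
    return r.argmin(axis=1)
```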

(46)

Cost-Sensitive Multiclass Classification CSMC by Regression

Theoretical Guarantees (1/2)

gr(x) = argmin_{1≤k≤K} rk(x)

Theorem (Absolute Loss Bound)

For any set of cost estimators {rk}_{k=1}^{K} and for any example (x, y, c) with c[y] = 0,

c[gr(x)] ≤ Σ_{k=1}^{K} |rk(x) − c[k]|.

low-cost classifier ⇐= accurate estimators

(47)

Cost-Sensitive Multiclass Classification CSMC by Regression

Theoretical Guarantees (2/2)

gr(x) = argmin_{1≤k≤K} rk(x)

Theorem (Squared Loss Bound)

For any set of cost estimators {rk}_{k=1}^{K} and for any example (x, y, c) with c[y] = 0,

c[gr(x)] ≤ √( 2 Σ_{k=1}^{K} (rk(x) − c[k])² ).

applies to common least-square regression

(48)

Cost-Sensitive Multiclass Classification CSMC by Regression

A Pictorial Proof

c[gr(x)] ≤ Σ_{k=1}^{K} |rk(x) − c[k]|

• assume c ordered and not degenerate: y = 1; 0 = c[1] < c[2] ≤ · · · ≤ c[K]

• assume mis-prediction gr(x) = 2: r2(x) = min_{1≤k≤K} rk(x) ≤ r1(x)

[Number line: c[1] = 0 with r2(x) ≤ r1(x) nearby, then c[2], c[3] with r3(x), up to c[K] with rK(x)]

c[gr(x)] = c[2] − c[1] (with c[1] = 0) ≤ ∆1 + ∆2 ≤ Σ_{k=1}^{K} |rk(x) − c[k]|

(49)

Cost-Sensitive Multiclass Classification CSMC by Regression

An Even Closer Look

let ∆1 ≡ r1(x) − c[1] and ∆2 ≡ c[2] − r2(x)

1 ∆1 ≥ 0 and ∆2 ≥ 0: c[2] ≤ ∆1 + ∆2

2 ∆1 ≤ 0 and ∆2 ≥ 0: c[2] ≤ ∆2

3 ∆1 ≥ 0 and ∆2 ≤ 0: c[2] ≤ ∆1

[Number lines illustrating the cases with c[1], r2(x), r1(x), c[2], c[3], r3(x), . . . , c[K], rK(x)]

c[2] ≤ max(∆1, 0) + max(∆2, 0) ≤ |∆1| + |∆2|

(50)

Cost-Sensitive Multiclass Classification CSMC by Regression

Tighter Bound with One-sided Loss

Define one-sided loss ξk ≡ max(∆k, 0)

with ∆k ≡ rk(x) − c[k] if c[k] = cmin = 0

and ∆k ≡ c[k] − rk(x) if c[k] ≠ cmin

Intuition

• c[k] = cmin: wish to have rk(x) ≤ c[k]

• c[k] ≠ cmin: wish to have rk(x) ≥ c[k]

—both wishes same as ∆k ≤ 0 ⇐⇒ ξk = 0

One-sided Loss Bound:

c[gr(x)] ≤ Σ_{k=1}^{K} ξk ≤ Σ_{k=1}^{K} |∆k|

(51)

Cost-Sensitive Multiclass Classification CSMC by Regression

The Improved Reduction Framework







cost-sensitive examples (xn, yn, cn) ⇒ one-sided regression examples (Xn,k, Yn,k, Zn,k), k = 1, · · · , K ⇒ one-sided regression algorithm ⇒ regressors rk(x), k ∈ 1, · · · , K ⇒ cost-sensitive classifier gr(x)

(Tu, 2010)

1 encode: transform cost-sensitive examples (xn, yn, cn) to one-sided regression examples (Xn,k, Yn,k, Zn,k) = (xn, cn[k], 2⟦cn[k] = 0⟧ − 1)

2 learn: use a one-sided regression algorithm to get estimators rk(x)

3 decode: for each new input x, predict its class using gr(x) = argmin_{1≤k≤K} rk(x)

the reduction-to-OSR framework: need a good OSR algorithm

(52)

Cost-Sensitive Multiclass Classification CSMC by Regression

Regularized One-Sided Hyper-Linear Regression

Given

(xn,k, Yn,k, Zn,k) = (xn, cn[k], 2⟦cn[k] = 0⟧ − 1)

Training Goal

all training ξn,k = max(Zn,k · (rk(xn,k) − Yn,k), 0) small —will drop k below

min_{w,b}  λ/2 ⟨w, w⟩ + Σ_{n=1}^{N} ξn

to get rk(x) = ⟨w, φ(x)⟩ + b
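A minimal sketch of this objective for the linear case φ(x) = x, fit by plain subgradient descent (step size and epoch count are arbitrary choices, not from the slides):

```python
import numpy as np

def osr_fit(X, Y, Z, lam=1e-2, lr=1e-3, epochs=200):
    # minimize lam/2 <w, w> + sum_n max(Z_n * (r(x_n) - Y_n), 0) with r(x) = <w, x> + b
    N, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        r = X @ w + b
        active = (Z * (r - Y)) > 0      # examples whose one-sided loss is positive
        g = Z * active                  # subgradient of the one-sided loss w.r.t. r
        w -= lr * (lam * w + X.T @ g)
        b -= lr * g.sum()
    return w, b
```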

(53)

Cost-Sensitive Multiclass Classification CSMC by Regression

One-Sided Support Vector Regression

Regularized One-Sided Hyper-Linear Regression

min_{w,b}  λ/2 ⟨w, w⟩ + Σ_{n=1}^{N} ξn,   ξn = max(Zn · (rk(xn) − Yn), 0)

Standard Support Vector Regression

min_{w,b}  1/(2C) ⟨w, w⟩ + Σ_{n=1}^{N} (ξn↑ + ξn↓)

ξn↑ = max(+1 · (rk(xn) − Yn − ε), 0)

ξn↓ = max(−1 · (rk(xn) − Yn + ε), 0)

OSR-SVM = SVR + (ε ← 0) + (keep ξn↑ or ξn↓ by Zn)

(54)

Cost-Sensitive Multiclass Classification CSMC by Regression

OSR-SVM on Semi-Real Data

[Figure (Tu, 2010): average test cost of OSR versus OVA on ten UCI datasets (ir., wi., gl., ve., vo., se., dn., sa., us., le.)]

• some ‘artificial’ cost with UCI data

• OSR: cost-sensitive SVM

• OVA: regular one-versus-all SVM

OSR often significantly better than OVA

(55)

Cost-Sensitive Multiclass Classification CSMC by Regression

OSR versus WAP on Semi-Real Data

[Figure (Tu, 2010): average test cost of OSR versus WAP on ten UCI datasets (ir., wi., gl., ve., vo., se., dn., sa., us., le.)]

• some ‘artificial’ cost with UCI data

• OSR (per-class): O(K) training, O(K) prediction

• WAP ≈ CSOVO (pairwise): O(K²) training, O(K²) prediction

OSR faster with competitive performance

(56)

Cost-Sensitive Multiclass Classification CSMC by Regression

From OSR-SVM to AOSR-DNN

OSR-SVM

min_{w,b}  λ/2 ⟨w, w⟩ + Σ_{n=1}^{N} ξn

with rk(x) = ⟨w, φ(x)⟩ + b and ξn = max(Zn · (rk(xn) − Yn), 0)

Appro. OSR-DNN

min  NNet regularizer + Σ_{n=1}^{N} δn

with rk(x) = NNet(x) and δn = ln(1 + exp(Zn · (rk(xn) − Yn)))

AOSR-DNN (Chung, 2016a) = Deep Learning + OSR + smoother upper bound δn ≥ ξn, because ln(1 + exp(•)) ≥ max(•, 0)
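A small numeric check of the surrogate: δn = ln(1 + exp(Zn(rk(xn) − Yn))) upper-bounds ξn = max(Zn(rk(xn) − Yn), 0), so it can stand in for ξn as a differentiable loss inside any deep model; the values below are arbitrary illustrations:

```python
import numpy as np

def xi(z, r, y):
    return np.maximum(z * (r - y), 0.0)          # the one-sided loss

def delta(z, r, y):
    return np.logaddexp(0.0, z * (r - y))        # stable ln(1 + exp(.)), >= xi

z, r, y = np.array([+1, -1]), np.array([3.0, 3.0]), np.array([5.0, 5.0])
print(xi(z, r, y))     # [0.  2. ]
print(delta(z, r, y))  # [~0.127  ~2.127], elementwise >= xi
```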

(57)

Cost-Sensitive Multiclass Classification CSMC by Regression

From AOSR-DNN to CSDNN

Cons of AOSR-DNN

c affects both classification and feature extraction in a DNN, but hard to do effective cost-sensitive feature extraction

idea 1: pre-training with c

• layer-wise pre-training with cost-sensitive autoencoders: loss = reconstruction + AOSR

• CSDNN (Chung, 2016a) = AOSR-DNN + cost-sensitive pre-training

idea 2: auxiliary cost-sensitive nodes

• auxiliary nodes to predict costs per layer: loss = AOSR for classification + AOSR for auxiliary nodes

• applies to any deep learning model (Chung, 2020)

CSDNN: world’s first successful CSMC deep learning model

(58)

Cost-Sensitive Multiclass Classification CSMC by Regression

AOSR-DNN versus CSDNN

[Figure (Chung, 2016a): average test cost of AOSR-DNN versus CSDNN on eight datasets (m, b, s, c, mi, bi, si, ci)]

• AOSR-DNN: cost-sensitive training

• CSDNN: AOSR-DNN + cost-sensitive feature extraction

CSDNN wins, justifying cost-sensitive feature extraction

(59)

Cost-Sensitive Multiclass Classification CSMC by Regression

ABOD-DNN versus CSDNN

[Figure (Chung, 2016a): average test cost of ABOD-DNN versus CSDNN on eight datasets (m, b, s, c, mi, bi, si, ci)]

• ABOD-DNN: probability estimate + cost-sensitive prediction

• CSDNN: cost-sensitive training + cost-sensitive feature extraction

CSDNN still wins, hinting at the difficulty of probability estimation without cost-sensitive feature extraction

(60)

Cost-Sensitive Multilabel Classification CSML Motivation and Setup

Outline

1 Cost-Sensitive Multiclass Classification

CSMC Motivation and Setup
CSMC by Bayesian Perspective
CSMC by (Weighted) Binary Classification
CSMC by Regression

2 Cost-Sensitive Multilabel Classification

CSML Motivation and Setup
CSML by Bayesian Perspective
CSML by (Weighted) Binary Classification
CSML by Regression

3 A Story of Bacteria Classification with Doctor-Annotated Costs

4 Summary

(61)

Cost-Sensitive Multilabel Classification CSML Motivation and Setup

Which Fruit?

?

(image by Robert-Owen-Wahl from Pixabay)

apple orange strawberry kiwi

(images by Pexels, PublicDomainPictures, 192635, Rob van der Meijden from Pixabay)

multiclass classification:

classify input (picture) to one category (label), remember? :-)

(62)

Cost-Sensitive Multilabel Classification CSML Motivation and Setup

Which Fruits?

?: {apple, orange, kiwi}

(image by Michal Jarmoluk from Pixabay)

apple orange strawberry kiwi

(images by Pexels, PublicDomainPictures, 192635, Rob van der Meijden from Pixabay)

multilabel classification:

classify input to multiple (or no) categories

(63)

Cost-Sensitive Multilabel Classification CSML Motivation and Setup

Label Powerset: Multilabel Classification via Multiclass

(Tsoumakas, 2007)

multiclass w/ L = 4 classes: 4 possible outcomes {a, o, s, k}

multilabel w/ L = 4 classes: 2⁴ = 16 possible outcomes

2^{a, o, s, k} ⇔ {φ, a, o, s, k, ao, as, ak, os, ok, sk, aos, aok, ask, osk, aosk}

Label Powerset (LP): reduction to multiclass classification

• difficulties for large L:

computation: 2^L extended classes

sparsity: no or few examples for some combinations

LP: feasible only for small L
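A minimal sketch of LP, encoding each length-L labelset as one integer class for an off-the-shelf multiclass learner (a decision tree as placeholder):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def lp_fit(X, Y_bits):                      # Y_bits: (N, L) binary labelsets
    powers = 1 << np.arange(Y_bits.shape[1])
    y_multi = Y_bits @ powers               # labelset -> integer in [0, 2^L)
    return DecisionTreeClassifier().fit(X, y_multi)

def lp_predict(clf, X, L):
    y_multi = clf.predict(X).astype(int)
    return (y_multi[:, None] >> np.arange(L)) & 1   # integer -> labelset bits
```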

(64)

Cost-Sensitive Multilabel Classification CSML Motivation and Setup

What Tags?

?: {machine learning, data structure, data mining, object oriented programming, artificial intelligence, compiler, architecture, chemistry, textbook, children book, . . . etc.}

anothermultilabel classification problem:

tagginginput to multiple categories

(65)

Cost-Sensitive Multilabel Classification CSML Motivation and Setup

Binary Relevance: Multilabel Classification via Yes/No

binary classification: {yes, no}

multilabel w/ L classes: L yes/no questions

machine learning (Y), data structure (N), data mining (Y), OOP (N), AI (Y), compiler (N), architecture (N), chemistry (N), textbook (Y), children book (N), etc.

Binary Relevance (BR): reduction to multiple isolated binary classifications

• disadvantages:

isolation—hidden relations not exploited (e.g. ML and DM highly correlated, ML subset of AI, textbook & children book disjoint)

unbalanced—few yes, many no

BR: simple (& strong) benchmark with known disadvantages
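A minimal sketch of BR with a placeholder base learner, training the L yes/no questions in isolation:

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

def br_fit(X, Y_bits, base=LogisticRegression(max_iter=1000)):
    # one independent binary classifier per label column
    return [clone(base).fit(X, Y_bits[:, l]) for l in range(Y_bits.shape[1])]

def br_predict(clfs, X):
    return np.column_stack([clf.predict(X) for clf in clfs])
```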

(66)

Cost-Sensitive Multilabel Classification CSML Motivation and Setup

Multilabel Classification Setup

Given

N examples (input xn, labelset Yn) ∈ X × 2^{1,2,··· ,L}

fruits: X = encoding(pictures), Yn ⊆ {1, 2, · · · , 4}

tags: X = encoding(merchandise), Yn ⊆ {1, 2, · · · , L}

Goal

a multilabel classifier g(x) that closely predicts the labelset Y associated with some unseen input x (by exploiting hidden relations/combinations between labels)

multilabel classification: hot and important with many real-world applications

(67)

Cost-Sensitive Multilabel Classification CSML Motivation and Setup

From Labelset to Coding View

labelset    | apple | orange | strawberry | binary code
Y1 = {o}    | 0 (N) | 1 (Y)  | 0 (N)      | y1 = [0, 1, 0]
Y2 = {a, o} | 1 (Y) | 1 (Y)  | 0 (N)      | y2 = [1, 1, 0]
Y3 = {o, s} | 0 (N) | 1 (Y)  | 1 (Y)      | y3 = [0, 1, 1]
Y4 = {}     | 0 (N) | 0 (N)  | 0 (N)      | y4 = [0, 0, 0]

(images by PublicDomainPictures, Narin Seandag, GiltonF, nihatyetkin from Pixabay)

subset Y of 2^{1,2,··· ,L} ⇐⇒ length-L binary code y
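A tiny sketch of the labelset-to-code correspondence in the table above:

```python
def labelset_to_code(Y, L):
    # subset of {1, ..., L}  ->  length-L binary code
    return [1 if l in Y else 0 for l in range(1, L + 1)]

print(labelset_to_code({1, 2}, 3))   # {a, o} -> [1, 1, 0]
```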
