Cost-Sensitive Classification:
Algorithms and Advances
Hsuan-Tien Lin htlin@csie.ntu.edu.tw
Department of Computer Science
& Information Engineering
National Taiwan University
Tutorial for ACML @ Canberra, Australia, November 13, 2013
Hsuan-Tien Lin (NTU CSIE) Cost-Sensitive Classification: Algorithms and Advances 0/99
More about Me
Associate Professor, Dept. of CSIE, National Taiwan University
• co-leader of KDDCup world champion teams at NTU: 2010–2013
• research on multi-label classification, ranking, active learning, etc.
• research on cost-sensitive classification:
2007–Present
• Secretary General, Taiwanese Association for Artificial Intelligence
• instructor of the Mandarin-teaching Machine Learning MOOC on NTU-Coursera:
2013.11–
https://www.coursera.org/course/ntumlone
Cost-Sensitive Binary Classification
Outline
Cost-Sensitive Binary Classification
  Bayesian Perspective of Cost-Sensitive Binary Classification
  Non-Bayesian Perspective of Cost-Sensitive Binary Classification
Cost-Sensitive Multiclass Classification
  Bayesian Perspective of Cost-Sensitive Multiclass Classification
  Cost-Sensitive Classification by Reweighting and Relabeling
  Cost-Sensitive Classification by Binary Classification
  Cost-Sensitive Classification by Regression
Cost-and-Error-Sensitive Classification with Bioinformatics Application
Cost-Sensitive Ordinal Ranking with Information Retrieval Application
Summary
Hsuan-Tien Lin (NTU CSIE) Cost-Sensitive Classification: Algorithms and Advances 2/99
Cost-Sensitive Binary Classification
Is This Your Fingerprint?
[figure: a query fingerprint and the target function f: +1 = you, −1 = intruder]
• a binary classification problem
  —grouping "fingerprint pictures" into two different "categories"

C'mon, we know about binary classification all too well! :-)
Cost-Sensitive Binary Classification
Supervised Machine Learning
[diagram (human learning): a parent shows (picture, category) pairs, and the kid's brain forms a good decision function]

[diagram (machine learning): truth f(x) + noise e(x) generates examples (picture x_n, category y_n); a learning algorithm searches the learning model {h(x)}, the set of candidate possibilities, to produce a good decision function g(x) ≈ f(x)]

how to evaluate whether g(x) ≈ f(x)?
Hsuan-Tien Lin (NTU CSIE) Cost-Sensitive Classification: Algorithms and Advances 4/99
Cost-Sensitive Binary Classification
Performance Evaluation
Fingerprint Verification
f: +1 = you, −1 = intruder

example/figure borrowed from Amazon ML best-seller textbook
"Learning from Data" (Abu-Mostafa, Magdon-Ismail, Lin, 2013)

two types of error: false accept and false reject

           g = +1         g = −1
  f = +1   no error       false reject
  f = −1   false accept   no error

           g = +1   g = −1
  f = +1       0        1
  f = −1       1        0

simplest choice:
penalize both types equally and calculate average penalties
Cost-Sensitive Binary Classification
Fingerprint Verification for Supermarket
Fingerprint Verification
f: +1 = you, −1 = intruder
two types of error: false accept and false reject

           g = +1         g = −1
  f = +1   no error       false reject
  f = −1   false accept   no error

           g = +1   g = −1
  f = +1       0       10
  f = −1       1        0

• supermarket: fingerprint for discount
• false reject: very unhappy customer, lose future business
• false accept: give a minor discount, intruder left fingerprint :-)
Hsuan-Tien Lin (NTU CSIE) Cost-Sensitive Classification: Algorithms and Advances 6/99
Cost-Sensitive Binary Classification
Fingerprint Verification for CIA
Fingerprint Verification
f: +1 = you, −1 = intruder
two types of error: false accept and false reject

           g = +1         g = −1
  f = +1   no error       false reject
  f = −1   false accept   no error

           g = +1   g = −1
  f = +1       0        1
  f = −1    1000        0

• CIA: fingerprint for entrance
• false accept: very serious consequences!
• false reject: unhappy employee, but so what? :-)
Cost-Sensitive Binary Classification
Regular Binary Classification
penalizes both types equally

           h(x) = +1   h(x) = −1
  y = +1       0           1
  y = −1       1           0

in-sample error for any hypothesis h:
  E_in(h) = (1/N) Σ_{n=1}^N ⟦ y_n ≠ h(x_n) ⟧,   where y_n = f(x_n) + noise

out-of-sample error for any hypothesis h:
  E_out(h) = E_{(x,y)} ⟦ y ≠ h(x) ⟧,   where y = f(x) + noise

regular binary classification:
well-studied in machine learning
—ya, we know! :-)
Hsuan-Tien Lin (NTU CSIE) Cost-Sensitive Classification: Algorithms and Advances 8/99
Cost-Sensitive Binary Classification
Class-Weighted Cost-Sensitive Binary Classification
Supermarket Cost (Error, Loss, . . .) Matrix
           h(x) = +1   h(x) = −1
  y = +1       0          10
  y = −1       1           0

in-sample:
  E_in(h) = (1/N) Σ_{n=1}^N { 10 if y_n = +1; 1 if y_n = −1 } · ⟦ y_n ≠ h(x_n) ⟧

out-of-sample:
  E_out(h) = E_{(x,y)} { 10 if y = +1; 1 if y = −1 } · ⟦ y ≠ h(x) ⟧

class-weighted cost-sensitive binary classification:
different 'weight' for different y
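For concreteness, here is a tiny sketch (not from the original slides) of the class-weighted in-sample cost above, assuming labels in {−1, +1} and the supermarket weights w+ = 10, w− = 1:

```python
import numpy as np

def class_weighted_cost(y_true, y_pred, w_pos=10.0, w_neg=1.0):
    """Average class-weighted cost: w_{y_n} * [y_n != h(x_n)], labels in {-1, +1}."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    weights = np.where(y_true == +1, w_pos, w_neg)   # w_+ for positives, w_- for negatives
    return np.mean(weights * (y_true != y_pred))

# toy check: one costly false reject, no other mistakes -> (10 + 0 + 0) / 3
print(class_weighted_cost([+1, -1, +1], [-1, -1, +1]))
```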
Cost-Sensitive Binary Classification
Setup: Class-Weighted Cost-Sensitive Binary Classification

Given
N examples, each (input x_n, label y_n) ∈ X × {−1, +1},
and weights w+, w− representing the two entries of the cost matrix

           h(x) = +1   h(x) = −1
  y = +1       0          w+
  y = −1      w−           0

Goal
a classifier g(x) that
pays a small cost w_y ⟦ y ≠ g(x) ⟧
on future unseen example (x, y), i.e., achieves low E_out(g)

regular classification: w+ = w− (= 1)
Hsuan-Tien Lin (NTU CSIE) Cost-Sensitive Classification: Algorithms and Advances 10/99
Cost-Sensitive Binary Classification
Supermarket Revisited
Fingerprint Verification
f: +1 = you, −1 = intruder
two types of error: false accept and false reject

           g = +1         g = −1
  f = +1   no error       false reject
  f = −1   false accept   no error

                     g = +1   g = −1
  f = big customer       0      100
      usual customer     0       10
      intruder           1        0

• supermarket: fingerprint for discount
• big customer: really don't want to lose her/his business
• usual customer: don't want to lose business, but not so serious
Cost-Sensitive Binary Classification
Example-Weighted Cost-Sensitive Binary Classification
Supermarket Cost Vectors (Rows)

               h(x) = +1   h(x) = −1
  y = big          0          100
      usual        0           10
      intruder     1            0

in-sample:
  E_in(h) = (1/N) Σ_{n=1}^N w_n · ⟦ y_n ≠ h(x_n) ⟧,   with w_n the importance of example n

out-of-sample:
  E_out(h) = E_{(x,y,w)} w · ⟦ y ≠ h(x) ⟧

example-weighted cost-sensitive binary classification:
different w for different (x, y)
—seen this in AdaBoost? :-)
Hsuan-Tien Lin (NTU CSIE) Cost-Sensitive Classification: Algorithms and Advances 12/99
Cost-Sensitive Binary Classification
Setup: Example-Weighted Cost-Sensitive Binary Classification
Given
N examples, each (input x_n, label y_n) ∈ X × {−1, +1} and weight w_n ∈ R+

Goal
a classifier g(x) that
pays a small cost w ⟦ y ≠ g(x) ⟧
on future unseen example (x, y, w), i.e., achieves low E_out(g)

regular ⊂ class-weighted ⊂ example-weighted
Bayesian Perspective of Cost-Sensitive Binary Classification
Outline
Cost-Sensitive Binary Classification
  Bayesian Perspective of Cost-Sensitive Binary Classification
  Non-Bayesian Perspective of Cost-Sensitive Binary Classification
Cost-Sensitive Multiclass Classification
  Bayesian Perspective of Cost-Sensitive Multiclass Classification
  Cost-Sensitive Classification by Reweighting and Relabeling
  Cost-Sensitive Classification by Binary Classification
  Cost-Sensitive Classification by Regression
Cost-and-Error-Sensitive Classification with Bioinformatics Application
Cost-Sensitive Ordinal Ranking with Information Retrieval Application
Summary
Hsuan-Tien Lin (NTU CSIE) Cost-Sensitive Classification: Algorithms and Advances 14/99
Bayesian Perspective of Cost-Sensitive Binary Classification
Key Idea: Conditional Probability Estimator
Goal (Class-Weighted Setup)
a classifier g(x) that pays a small cost w_y ⟦ y ≠ g(x) ⟧ on future unseen example (x, y)

• expected error for predicting +1 on x: w− P(−1|x)
• expected error for predicting −1 on x: w+ P(+1|x)

if P(y|x) known:
  Bayes optimal  g*(x) = sign( w+ P(+1|x) − w− P(−1|x) )

if p(x) ≈ P(+1|x) well:
  approximately good  g_p(x) = sign( w+ p(x) − w− (1 − p(x)) )

how to get conditional probability estimator p?
logistic regression, Naïve Bayes,. . .
Bayesian Perspective of Cost-Sensitive Binary Classification
Approximate Bayes-Optimal Decision
if p(x) ≈ P(+1|x) well:
  approximately good  g_p(x) = sign( w+ p(x) − w− (1 − p(x)) )

that is (Elkan, 2001),
  g_p(x) = +1  iff  w+ p(x) − w− (1 − p(x)) > 0  iff  p(x) > w− / (w+ + w−):
  1/11 for supermarket; 1000/1001 for CIA

Approximate Bayes-Optimal Decision (ABOD) Approach
1. use your favorite algorithm on {(x_n, y_n)} to get p(x) ≈ P(+1|x)
2. for each new input x, predict its class using g_p(x) = sign( p(x) − w− / (w+ + w−) )

'simplest' approach:
probability estimate + threshold changing
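A minimal sketch of the ABOD approach, assuming scikit-learn's LogisticRegression as the probability estimator; the weights and toy data below are illustrative assumptions, not part of the tutorial:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def abod_predict(clf, X, w_pos, w_neg):
    """Predict with the cost-sensitive threshold p(x) > w_- / (w_+ + w_-)."""
    p = clf.predict_proba(X)[:, list(clf.classes_).index(+1)]   # p(x) ~ P(+1|x)
    threshold = w_neg / (w_pos + w_neg)
    return np.where(p > threshold, +1, -1)

# toy illustration with random data (assumption, not from the tutorial)
rng = np.random.RandomState(0)
X = rng.randn(200, 2)
y = np.where(X[:, 0] + X[:, 1] > 0, +1, -1)
clf = LogisticRegression().fit(X, y)
print(abod_predict(clf, X[:5], w_pos=10.0, w_neg=1.0))   # supermarket: threshold 1/11
```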
Hsuan-Tien Lin (NTU CSIE) Cost-Sensitive Classification: Algorithms and Advances 16/99
Bayesian Perspective of Cost-Sensitive Binary Classification
ABOD on Artificial Data
1. use your favorite algorithm on {(x_n, y_n)} to get p(x) ≈ P(+1|x)
2. for each new input x, predict its class using g_p(x) = sign( p(x) − w− / (w+ + w−) )

[figure: LogReg decision boundaries on artificial data, under the regular cost (left) and the supermarket cost (right)]

  supermarket cost:
           g = +1   g = −1
  y = +1       0       10
  y = −1       1        0
Bayesian Perspective of Cost-Sensitive Binary Classification
Pros and Cons of ABOD
Pros
• optimal: if good probability estimate: p(x) really close to P(+1|x)
• simple: training (probability estimate) unchanged, and prediction (threshold) changed only a little

Cons
• 'difficult': good probability estimate often more difficult than good binary classification
• 'restricted': only applicable to class-weighted setup
  —need 'full picture' of cost matrix

approach for the example-weighted setup?
Hsuan-Tien Lin (NTU CSIE) Cost-Sensitive Classification: Algorithms and Advances 18/99
Non-Bayesian Perspective of Cost-Sensitive Binary Classification
Outline
Cost-Sensitive Binary Classification
  Bayesian Perspective of Cost-Sensitive Binary Classification
  Non-Bayesian Perspective of Cost-Sensitive Binary Classification
Cost-Sensitive Multiclass Classification
  Bayesian Perspective of Cost-Sensitive Multiclass Classification
  Cost-Sensitive Classification by Reweighting and Relabeling
  Cost-Sensitive Classification by Binary Classification
  Cost-Sensitive Classification by Regression
Cost-and-Error-Sensitive Classification with Bioinformatics Application
Cost-Sensitive Ordinal Ranking with Information Retrieval Application
Summary
Non-Bayesian Perspective of Cost-Sensitive Binary Classification
Key Idea: Example Weight = Copying
Goal
a classifier g(x) that
pays a small cost w ⟦ y ≠ g(x) ⟧ on future unseen example (x, y, w)

on one (x, y):
  wrong prediction charged by w
  —weighted classification

on w copies of (x, y):
  wrong prediction charged by 1
  —regular classification

how to copy? over-sampling
Hsuan-Tien Lin (NTU CSIE) Cost-Sensitive Classification: Algorithms and Advances 20/99
Non-Bayesian Perspective of Cost-Sensitive Binary Classification
Example-Weighted Classification by Over-Sampling
copy each (x_n, y_n) for w_n times

original problem
evaluate with

               h(x) = +1   h(x) = −1
  y = big          0          100
      usual        0           10
      intruder     1            0

  (x1, −1, 1), (x2, +1, 10), (x3, +1, 100), (x4, +1, 10), (x5, −1, 1)

equivalent problem
evaluate with

               h(x) = +1   h(x) = −1
  y = big          0            1
      usual        0            1
      intruder     1            0

  (x1, −1), (x2, +1), . . ., (x2, +1), (x3, +1), . . ., (x3, +1), . . ., (x3, +1), (x4, +1), . . ., (x4, +1), (x5, −1)
how to learn a good g for RHS?
SVM, NNet,. . .
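A minimal sketch of the over-sampling transformation for integer weights; the helper name `expand_by_weight` and the toy data are made up for illustration:

```python
import numpy as np

def expand_by_weight(X, y, w):
    """Replicate each (x_n, y_n) exactly w_n times so that a regular
    (unweighted) learner sees the example-weighted problem."""
    w = np.asarray(w, dtype=int)             # assumes integer weights
    idx = np.repeat(np.arange(len(y)), w)    # index n repeated w_n times
    return np.asarray(X)[idx], np.asarray(y)[idx]

# toy supermarket-style weights: intruders weight 1, customers weight 10 or 100
X = np.arange(10).reshape(5, 2)
y = np.array([-1, +1, +1, +1, -1])
w = np.array([1, 10, 100, 10, 1])
X_big, y_big = expand_by_weight(X, y, w)
print(len(y_big))   # 122 examples after copying
```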
Non-Bayesian Perspective of Cost-Sensitive Binary Classification
Cost-Proportionate Example Weighting
Cost-Proportionate Example Weighting (CPEW) Approach
1. effectively transform {(x_n, y_n, w_n)} to {(x_m, y_m)} such that the 'copies' of (x_n, y_n) in {(x_m, y_m)} are proportional to w_n
   • over/under-sampling with normalized w_n (Elkan, 2001)
   • under-sampling by rejection (Zadrozny, 2003)
   • modify existing algorithms equivalently (Zadrozny, 2003)
2. use your favorite algorithm on {(x_m, y_m)} to get binary classifier g(x)
3. for each new input x, predict its class using g(x)
simple and general:
very popular for cost-sensitive binary classification
Hsuan-Tien Lin (NTU CSIE) Cost-Sensitive Classification: Algorithms and Advances 22/99
Non-Bayesian Perspective of Cost-Sensitive Binary Classification
CPEW by Modification
1. effectively transform {(x_n, y_n, w_n)} to {(x_m, y_m)} such that the 'copies' of (x_n, y_n) in {(x_m, y_m)} are proportional to w_n
   • modify existing algorithms equivalently (Zadrozny, 2003)
2. use your favorite algorithm on {(x_m, y_m)} to get binary classifier g(x)
3. for each new input x, predict its class using g(x)

Regular Linear SVM:
  min_{w,b}  (1/2) ⟨w, w⟩ + Σ_{n=1}^N C ξ_n
  ξ_n = max(1 − y_n(⟨w, x_n⟩ + b), 0)

Modified Linear SVM:
  min_{w,b}  (1/2) ⟨w, w⟩ + Σ_{n=1}^N C · w_n · ξ_n
  ξ_n = max(1 − y_n(⟨w, x_n⟩ + b), 0)
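In practice, one plausible way to obtain the modified (per-example-weighted) linear SVM is through scikit-learn's per-sample weights, which scale each example's hinge-loss term; this sketch and its toy data are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import LinearSVC

# toy example-weighted data (weights play the role of w_n in the modified SVM)
rng = np.random.RandomState(1)
X = rng.randn(100, 2)
y = np.where(X[:, 0] > 0, +1, -1)
w = np.where(y == +1, 10.0, 1.0)   # e.g., false rejects 10x more costly

# sample_weight scales each example's hinge-loss term, i.e., C * w_n * xi_n
svm = LinearSVC(C=1.0)
svm.fit(X, y, sample_weight=w)
print(svm.predict(X[:5]))
```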
Non-Bayesian Perspective of Cost-Sensitive Binary Classification
CPEW by Modification on Artificial Data
1. effectively transform {(x_n, y_n, w_n)} to {(x_m, y_m)} by modifying existing algorithms equivalently (Zadrozny, 2003)
2. use your favorite algorithm on {(x_m, y_m)} to get g(x)
3. for each new input x, predict its class using g(x)

[figure: decision boundaries of the modified algorithm on artificial data, under the regular cost (left) and the supermarket cost (right)]

  supermarket cost:
           g = +1   g = −1
  y = +1       0       10
  y = −1       1        0
Hsuan-Tien Lin (NTU CSIE) Cost-Sensitive Classification: Algorithms and Advances 24/99
Non-Bayesian Perspective of Cost-Sensitive Binary Classification
CPEW by Rejection Sampling
COSTING Algorithm (Zadrozny, 2003)
1. effectively transform {(x_n, y_n, w_n)} to {(x_m, y_m)} such that the 'copies' of (x_n, y_n) in {(x_m, y_m)} are proportional to w_n
   • under-sampling by rejection (Zadrozny, 2003)
2. use your favorite algorithm on {(x_m, y_m)} to get binary classifier g(x)
3. repeat 1 and 2 to get multiple g and aggregate them
4. for each new input x, predict its class using the aggregated g(x)
commonly used when your favorite algorithm is a black box rather than a white box
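A rough sketch of rejection-sampling-based COSTING with a scikit-learn-style base learner; the round count, base learner, and toy data are assumptions for illustration, not the reference implementation:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def costing(X, y, w, base=DecisionTreeClassifier, n_rounds=10, seed=0):
    """Keep each example with probability w_n / max(w), train a classifier
    per round, and aggregate the rounds by majority vote."""
    rng = np.random.RandomState(seed)
    w = np.asarray(w, dtype=float)
    classifiers = []
    for _ in range(n_rounds):
        keep = rng.rand(len(w)) < w / w.max()    # rejection sampling
        classifiers.append(base().fit(X[keep], y[keep]))
    def predict(X_new):
        votes = np.mean([clf.predict(X_new) for clf in classifiers], axis=0)
        return np.where(votes >= 0, +1, -1)      # assumes labels in {-1, +1}
    return predict

# usage sketch with made-up data
rng = np.random.RandomState(2)
X = rng.randn(200, 2)
y = np.where(X[:, 0] > 0, +1, -1)
w = np.where(y == +1, 10.0, 1.0)
predict = costing(X, y, w)
print(predict(X[:5]))
```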
Non-Bayesian Perspective of Cost-Sensitive Binary Classification
Biased Personal Favorites
• CPEW by Modification if possible
• COSTING: fast training and stable performance
• ABOD if in the mood for Bayesian :-)
Hsuan-Tien Lin (NTU CSIE) Cost-Sensitive Classification: Algorithms and Advances 26/99
Cost-Sensitive Multiclass Classification
Outline
Cost-Sensitive Binary Classification
  Bayesian Perspective of Cost-Sensitive Binary Classification
  Non-Bayesian Perspective of Cost-Sensitive Binary Classification
Cost-Sensitive Multiclass Classification
  Bayesian Perspective of Cost-Sensitive Multiclass Classification
  Cost-Sensitive Classification by Reweighting and Relabeling
  Cost-Sensitive Classification by Binary Classification
  Cost-Sensitive Classification by Regression
Cost-and-Error-Sensitive Classification with Bioinformatics Application
Cost-Sensitive Ordinal Ranking with Information Retrieval Application
Summary
Cost-Sensitive Multiclass Classification
Which Digit Did You Write?
[figure: a handwritten digit to be recognized as one (1), two (2), three (3), or four (4)]
• a multiclass classification problem
  —grouping "pictures" into different "categories"

C'mon, we know about
multiclass classification all too well! :-)
Hsuan-Tien Lin (NTU CSIE) Cost-Sensitive Classification: Algorithms and Advances 28/99
Cost-Sensitive Multiclass Classification
Performance Evaluation
g(x) ≈ f(x)?
• ZIP code recognition:
  1: wrong; 2: right; 3: wrong; 4: wrong
• check value recognition:
  1: one-dollar mistake; 2: no mistake;
  3: one-dollar mistake; 4: two-dollar mistake
• evaluation by formation similarity:
  1: not very similar; 2: very similar;
  3: somewhat similar; 4: a silly prediction

different applications:
evaluate mis-predictions differently
Cost-Sensitive Multiclass Classification
ZIP Code Recognition
?
1: wrong; 2: right; 3: wrong; 4: wrong
• regular multiclass classification: only right or wrong
• wrong cost: 1; right cost: 0
• prediction error of h on some (x, y):
  classification cost = ⟦ y ≠ h(x) ⟧
  —as discussed in regular binary classification
regular multiclass classification:
well-studied, many good algorithms
Hsuan-Tien Lin (NTU CSIE) Cost-Sensitive Classification: Algorithms and Advances 30/99
Cost-Sensitive Multiclass Classification
Check Value Recognition
?
1: one-dollar mistake; 2: no mistake;
3: one-dollar mistake; 4: two-dollar mistake
• cost-sensitive multiclass classification:
  different costs for different mis-predictions
• e.g. prediction error of h on some (x, y):
  absolute cost = |y − h(x)|

next: cost-sensitive multiclass classification
Cost-Sensitive Multiclass Classification
What is the Status of the Patient?
?
H1N1-infected cold-infected healthy
• another classification problem
  —grouping "patients" into different "status"
are all mis-prediction costs equal?
Hsuan-Tien Lin (NTU CSIE) Cost-Sensitive Classification: Algorithms and Advances 32/99
Cost-Sensitive Multiclass Classification
Patient Status Prediction
error measure = society cost

  actual \ predicted   H7N9     cold    healthy
  H7N9                    0     1000     100000
  cold                  100        0       3000
  healthy               100       30          0

• H7N9 mis-predicted as healthy: very high cost
• cold mis-predicted as healthy: high cost
• cold correctly predicted as cold: no cost
human doctors consider costs of decision;
can computer-aided diagnosis do the same?
Cost-Sensitive Multiclass Classification
What is the Type of the Movie?
romance, fiction, or terror?

customer 1, who hates romance but likes terror; error measure = non-satisfaction

  actual \ predicted   romance   fiction   terror
  romance                    0         5      100

customer 2, who likes terror and romance

  actual \ predicted   romance   fiction   terror
  romance                    0         5        3

different customers:
evaluate mis-predictions differently
Hsuan-Tien Lin (NTU CSIE) Cost-Sensitive Classification: Algorithms and Advances 34/99
Cost-Sensitive Multiclass Classification
Cost-Sensitive Multiclass Classification Tasks
movie classification with non-satisfaction

  actual \ predicted      romance   fiction   terror
  customer 1, romance           0         5      100
  customer 2, romance           0         5        3

patient diagnosis with society cost

  actual \ predicted   H7N9     cold    healthy
  H7N9                    0     1000     100000
  cold                  100        0       3000
  healthy               100       30          0
check digit recognition with absolute cost C(y, h(x)) = |y − h(x)|
Cost-Sensitive Multiclass Classification
Cost Vector
cost vector c: a row of cost components
• customer 1 on a romance movie: c = (0, 5, 100)
• an H7N9 patient: c = (0, 1000, 100000)
• absolute cost for digit 2: c = (1, 0, 1, 2)
• "regular" classification cost for label 2: c = (1, 0, 1, 1)
regular classification:
special case of cost-sensitive classification
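As a small illustration (not from the slides), the cost vectors above can be generated programmatically; the helper names are hypothetical:

```python
import numpy as np

def absolute_cost_vector(y, K):
    """c[k] = |y - k| for classes 1..K (absolute cost)."""
    return np.abs(y - np.arange(1, K + 1))

def regular_cost_vector(y, K):
    """c[k] = [y != k]: the 'regular' classification cost."""
    return (np.arange(1, K + 1) != y).astype(int)

print(absolute_cost_vector(2, 4))   # [1 0 1 2]
print(regular_cost_vector(2, 4))    # [1 0 1 1]
```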
Hsuan-Tien Lin (NTU CSIE) Cost-Sensitive Classification: Algorithms and Advances 36/99
Cost-Sensitive Multiclass Classification
Setup: Matrix-Based Cost-Sensitive Multiclass Classification

Given
N examples, each (input x_n, label y_n) ∈ X × {1, 2, . . . , K}, and cost matrix C ∈ R^{K×K}
—will assume C(y, y) = 0 = min_{1≤k≤K} C(y, k)

Goal
a classifier g(x) that
pays a small cost C(y, g(x)) on future unseen example (x, y)
extension of ‘class-weighted’ cost-sensitive binary classification
Cost-Sensitive Multiclass Classification
Setup: Vector-Based Cost-Sensitive Multiclass Classification

Given
N examples, each (input x_n, label y_n) ∈ X × {1, 2, . . . , K}, and cost vector c_n ∈ R^K
—will assume c_n[y_n] = 0 = min_{1≤k≤K} c_n[k]

Goal
a classifier g(x) that pays a small cost c[g(x)] on future unseen example (x, y, c)
• will assume c[y] = 0 = c_min = min_{1≤k≤K} c[k]
• note: y not really needed in evaluation
extension of ‘example-weighted’
cost-sensitive binary classification
Hsuan-Tien Lin (NTU CSIE) Cost-Sensitive Classification: Algorithms and Advances 38/99
Cost-Sensitive Multiclass Classification
Which Age-Group?
?
infant (1), child (2), teen (3), adult (4)
• small mistake—classify a child as a teen;
  big mistake—classify an infant as an adult
• cost matrix C(y, g(x)) for embedding 'order':

  C = [ 0  1  4  5
        1  0  1  3
        3  1  0  2
        5  4  1  0 ]

cost-sensitive classification can help solve many other problems, such as ordinal ranking
Bayesian Perspective of Cost-Sensitive Multiclass Classification
Outline
Cost-Sensitive Binary Classification
  Bayesian Perspective of Cost-Sensitive Binary Classification
  Non-Bayesian Perspective of Cost-Sensitive Binary Classification
Cost-Sensitive Multiclass Classification
  Bayesian Perspective of Cost-Sensitive Multiclass Classification
  Cost-Sensitive Classification by Reweighting and Relabeling
  Cost-Sensitive Classification by Binary Classification
  Cost-Sensitive Classification by Regression
Cost-and-Error-Sensitive Classification with Bioinformatics Application
Cost-Sensitive Ordinal Ranking with Information Retrieval Application
Summary
Hsuan-Tien Lin (NTU CSIE) Cost-Sensitive Classification: Algorithms and Advances 40/99
Bayesian Perspective of Cost-Sensitive Multiclass Classification
Key Idea: Conditional Probability Estimator
Goal (Matrix Setup)
a classifier g(x) that pays a small cost C(y, g(x)) on future unseen example (x, y)

if P(y|x) known:
  Bayes optimal  g*(x) = argmin_{1≤k≤K} Σ_{y=1}^K P(y|x) C(y, k)

if p(y, x) ≈ P(y|x) well:
  approximately good  g_p(x) = argmin_{1≤k≤K} Σ_{y=1}^K p(y, x) C(y, k)

how to get conditional probability estimator p?
logistic regression, Naïve Bayes,. . .
Bayesian Perspective of Cost-Sensitive Multiclass Classification
Approximate Bayes-Optimal Decision
if p(y, x) ≈ P(y|x) well (Domingos, 1999):
  approximately good  g_p(x) = argmin_{k∈{1,2,...,K}} Σ_{y=1}^K p(y, x) C(y, k)

Approximate Bayes-Optimal Decision (ABOD) Approach
1. use your favorite algorithm on {(x_n, y_n)} to get p(y, x) ≈ P(y|x)
2. for each new input x, predict its class using g_p(x) above

a simple extension from binary classification:
probability estimate + Bayes-optimal decision
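A minimal sketch of the multiclass ABOD rule on top of a scikit-learn probability estimator; the cost matrix and data below are placeholders, and labels are assumed to be 0..K−1 so that predict_proba columns align with the rows of C:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def abod_multiclass(clf, X, C):
    """Pick g_p(x) = argmin_k sum_y p(y|x) C(y, k).
    Assumes clf.classes_ == [0, 1, ..., K-1] so columns align with C's rows."""
    P = clf.predict_proba(X)          # shape (n_samples, K)
    expected_cost = P @ C             # shape (n_samples, K)
    return clf.classes_[np.argmin(expected_cost, axis=1)]

# toy usage with a made-up 3-class cost matrix
C = np.array([[0., 1000., 100000.],
              [100., 0., 3000.],
              [100., 30., 0.]])
rng = np.random.RandomState(3)
X = rng.randn(300, 2)
y = rng.randint(0, 3, size=300)
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(abod_multiclass(clf, X[:5], C))
```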
Hsuan-Tien Lin (NTU CSIE) Cost-Sensitive Classification: Algorithms and Advances 42/99
Bayesian Perspective of Cost-Sensitive Multiclass Classification
ABOD on Artificial Data
1. use your favorite algorithm on {(x_n, y_n)} to get p(y, x) ≈ P(y|x)
2. for each new input x, predict its class using g_p(x)

[figure: LogReg-based decision regions on artificial data, under the regular cost (left) and the 'rotate' cost (right)]

  'rotate' cost matrix:
          g = 1   g = 2   g = 3   g = 4
  y = 1       0       1       2       4
  y = 2       4       0       1       2
  y = 3       2       4       0       1
  y = 4       1       2       4       0
Bayesian Perspective of Cost-Sensitive Multiclass Classification
Pros and Cons of ABOD
Pros
• optimal: if good probability estimate: p(y, x) really close to P(y|x)
• simple: with training (probability estimate) unchanged, and prediction (threshold) changed only a little

Cons
• 'difficult': good probability estimate often more difficult than good multiclass classification
• 'restricted': only applicable to class-weighted setup
  —need 'full picture' of cost matrix
• 'slow prediction': need sophisticated calculation at prediction stage

can we use any multiclass classification algorithm for ABOD?
Hsuan-Tien Lin (NTU CSIE) Cost-Sensitive Classification: Algorithms and Advances 44/99
Bayesian Perspective of Cost-Sensitive Multiclass Classification
MetaCost Approach
Approximate Bayes-Optimal Decision (ABOD) Approach
1. use your favorite algorithm on {(x_n, y_n)} to get p(y, x) ≈ P(y|x)
2. for each new input x, predict its class using g_p(x)

MetaCost Approach (Domingos, 1999)
1. use your favorite multiclass classification algorithm on bootstrapped {(x_n, y_n)} and aggregate the classifiers to get p(y, x) ≈ P(y|x)
2. for each given input x_n, relabel it to y_n' using g_p(x_n)
3. run your favorite multiclass classification algorithm on the relabeled {(x_n, y_n')} to get final classifier g
4. for each new input x, predict its class using g(x)
pros: any multiclass classification algorithm can be used
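A condensed sketch of the MetaCost procedure with decision trees as the base learner; the bag count and data are illustrative assumptions rather than the paper's exact setup:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def metacost(X, y, C, n_bags=10, seed=0):
    """MetaCost sketch: bag base classifiers to estimate p(y|x), relabel each
    training example with its Bayes-optimal class, then retrain one classifier."""
    rng = np.random.RandomState(seed)
    n, K = len(y), C.shape[0]
    votes = np.zeros((n, K))
    for _ in range(n_bags):                              # step 1: bootstrap + aggregate
        idx = rng.randint(0, n, size=n)
        clf = DecisionTreeClassifier().fit(X[idx], y[idx])
        votes[np.arange(n), clf.predict(X)] += 1
    p = votes / n_bags                                   # crude p(y|x) estimate
    y_relabel = np.argmin(p @ C, axis=1)                 # step 2: relabel via argmin_k sum_y p(y|x) C(y,k)
    return DecisionTreeClassifier().fit(X, y_relabel)    # step 3: retrain on relabeled data

# usage sketch (labels assumed to be 0..K-1 so they index C's rows)
C = np.array([[0., 1., 2.], [2., 0., 1.], [1., 2., 0.]])
rng = np.random.RandomState(4)
X = rng.randn(300, 2)
y = rng.randint(0, 3, 300)
g = metacost(X, y, C)
print(g.predict(X[:5]))
```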
Bayesian Perspective of Cost-Sensitive Multiclass Classification
MetaCost on Semi-Real Data
[figure (Domingos, 1999): scatter plots of costs comparing MetaCost (multiclass and two-class settings) against C4.5R, undersampling, and oversampling, each with a y = x reference line]
(Domingos, 1999)
• some “random” cost with UCI data
• MetaCost+C4.5:
cost-sensitive
• C4.5: regular
not surprisingly,
considering the cost properly does help
Hsuan-Tien Lin (NTU CSIE) Cost-Sensitive Classification: Algorithms and Advances 46/99
Cost-Sensitive Classification by Reweighting and Relabeling
Outline
Cost-Sensitive Binary Classification
  Bayesian Perspective of Cost-Sensitive Binary Classification
  Non-Bayesian Perspective of Cost-Sensitive Binary Classification
Cost-Sensitive Multiclass Classification
  Bayesian Perspective of Cost-Sensitive Multiclass Classification
  Cost-Sensitive Classification by Reweighting and Relabeling
  Cost-Sensitive Classification by Binary Classification
  Cost-Sensitive Classification by Regression
Cost-and-Error-Sensitive Classification with Bioinformatics Application
Cost-Sensitive Ordinal Ranking with Information Retrieval Application
Summary
Cost-Sensitive Classification by Reweighting and Relabeling
Recall: Example-Weighting Useful for Binary
can example weighting be used for multiclass?
Yes! an elegant solution if using a cost matrix with special properties (Zhou, 2010):

  C(i, j) / C(j, i) = w_i / w_j

what if using cost vectors without special properties?
Hsuan-Tien Lin (NTU CSIE) Cost-Sensitive Classification: Algorithms and Advances 48/99
Cost-Sensitive Classification by Reweighting and Relabeling
Key Idea: Cost Transformation
binary example:
  (0, 1000) = 1000 · (0, 1) + 0 · (1, 0)
  (cost c = "# of copies" × classification cost vectors)

multiclass example:
  (3, 2, 3, 4) = 1 · (0,1,1,1) + 2 · (1,0,1,1) + 1 · (1,1,0,1) + 0 · (1,1,1,0)
  (cost c = mixture weights q_ℓ × classification cost vectors)

• split the cost-sensitive example:
  (x, 2) with c = (3, 2, 3, 4) equivalent to
  a weighted mixture {(x, 1, 1), (x, 2, 2), (x, 3, 1)}
• cost equivalence: c[h(x)] = Σ_{ℓ=1}^K q_ℓ ⟦ ℓ ≠ h(x) ⟧ for any h
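A tiny sketch (illustration only) that derives the mixture weights q_ℓ = max_k c[k] − c[ℓ] and checks the cost equivalence numerically for the example above:

```python
import numpy as np

def split_cost_vector(c):
    """Mixture weights q with c[k] = sum_l q[l] * [l != k] + constant;
    here q[l] = max(c) - c[l], the splittable choice also used by DSE below."""
    c = np.asarray(c, dtype=float)
    return c.max() - c

c = np.array([3., 2., 3., 4.])
q = split_cost_vector(c)                        # -> [1, 2, 1, 0]
K = len(c)
recovered = np.array([np.sum(q * (np.arange(K) != k)) for k in range(K)])
print(q, recovered)                             # recovered equals c exactly here
```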
Cost-Sensitive Classification by Reweighting and Relabeling
Meaning of Cost Equivalence
c[h(x)] = Σ_{ℓ=1}^K q_ℓ ⟦ ℓ ≠ h(x) ⟧

on one (x, y, c):
  wrong prediction charged by c[h(x)]
  —cost-sensitive classification

on all (x, ℓ, q_ℓ):
  wrong prediction charged by total weighted classification error
  —weighted classification

weighted classification =⇒ regular classification?
same as binary (with CPEW) when q_ℓ ≥ 0

min_g expected LHS (original cost-sensitive problem)
= min_g expected RHS (a regular problem when q_ℓ ≥ 0)
Hsuan-Tien Lin (NTU CSIE) Cost-Sensitive Classification: Algorithms and Advances 50/99
Cost-Sensitive Classification by Reweighting and Relabeling
Cost Transformation Methodology: Preliminary
1. split each training example (x_n, y_n, c_n) to a weighted mixture {(x_n, ℓ, q_{n,ℓ})}_{ℓ=1}^K
2. apply regular/weighted classification algorithm on the weighted mixtures ∪_{n=1}^N {(x_n, ℓ, q_{n,ℓ})}_{ℓ=1}^K

• by c[g(x)] = Σ_{ℓ=1}^K q_ℓ ⟦ ℓ ≠ g(x) ⟧ (cost equivalence),
  good g for new regular classification problem
  = good g for original cost-sensitive classification problem
• regular classification: needs q_{n,ℓ} ≥ 0

but what if q_{n,ℓ} negative?
Cost-Sensitive Classification by Reweighting and Relabeling
Similar Cost Vectors
  ĉ = (1, 0, 1, 2) = 1/3 · (0,1,1,1) + 4/3 · (1,0,1,1) + 1/3 · (1,1,0,1) − 2/3 · (1,1,1,0)
  c = (3, 2, 3, 4) =   1 · (0,1,1,1) +   2 · (1,0,1,1) +   1 · (1,1,0,1) +   0 · (1,1,1,0)
  (costs = mixture weights q_ℓ × classification cost vectors)

• negative q_ℓ: cannot split
• but ĉ = (1, 0, 1, 2) is similar to c = (3, 2, 3, 4): for any classifier g,
  ĉ[g(x)] + constant = c[g(x)] = Σ_{ℓ=1}^K q_ℓ ⟦ ℓ ≠ g(x) ⟧
• constant can be dropped during minimization

min_g expected ĉ[g(x)] (original cost-sensitive problem)
= min_g expected Σ_{ℓ=1}^K q_ℓ ⟦ ℓ ≠ g(x) ⟧ (a regular problem when q_ℓ ≥ 0)
Hsuan-Tien Lin (NTU CSIE) Cost-Sensitive Classification: Algorithms and Advances 52/99
Cost-Sensitive Classification by Reweighting and Relabeling
Cost Transformation Methodology: Revised
1. shift each training cost ĉ_n to a similar and "splittable" c_n
2. split (x_n, y_n, c_n) to a weighted mixture {(x_n, ℓ, q_{n,ℓ})}_{ℓ=1}^K
3. apply regular classification algorithm on the weighted mixtures ∪_{n=1}^N {(x_n, ℓ, q_{n,ℓ})}_{ℓ=1}^K

• splittable: q_{n,ℓ} ≥ 0
• by cost equivalence after shifting:
  good g for new regular classification problem
  = good g for original cost-sensitive classification problem

but infinitely many similar and splittable c_n!
Cost-Sensitive Classification by Reweighting and Relabeling
Uncertainty in Mixture
• a single example {(x, 2)}
  —certain that the desired label is 2
• a mixture {(x, 1, 1), (x, 2, 2), (x, 3, 1)} sharing the same x
  —uncertainty in the desired label (25%: 1, 50%: 2, 25%: 3)
• over-shifting adds unnecessary mixture uncertainty:

  ( 3,  2,  3,  4)  =  ( 1,  2,  1,  0)  · classification cost vectors
  (33, 32, 33, 34)  =  (11, 12, 11, 10)  · classification cost vectors
       (costs)           (mixture weights)

should choose a similar and splittable c with minimum mixture uncertainty
Hsuan-Tien Lin (NTU CSIE) Cost-Sensitive Classification: Algorithms and Advances 54/99
Cost-Sensitive Classification by Reweighting and Relabeling
Cost Transformation Methodology: Final
Cost Transformation Methodology (Lin, 2010)
1. shift each training cost ĉ_n to a similar and splittable c_n with minimum "mixture uncertainty"
2. split (x_n, y_n, c_n) to a weighted mixture {(x_n, ℓ, q_{n,ℓ})}_{ℓ=1}^K
3. apply regular classification algorithm on the weighted mixtures ∪_{n=1}^N {(x_n, ℓ, q_{n,ℓ})}_{ℓ=1}^K

• mixture uncertainty: entropy of the normalized (q_1, q_2, · · · , q_K)
• a simple and unique optimal shifting exists for every ĉ

good g for new regular classification problem
= good g for original cost-sensitive classification problem

Cost-Sensitive Classification by Reweighting and Relabeling
Data Space Expansion Approach
Data Space Expansion (DSE) Approach (Abe, 2004)
1. for each (x_n, y_n, c_n) and ℓ, let q_{n,ℓ} = max_{1≤k≤K} c_n[k] − c_n[ℓ]
2. apply your favorite multiclass classification algorithm on the weighted mixtures ∪_{n=1}^N {(x_n, ℓ, q_{n,ℓ})}_{ℓ=1}^K to get g(x)
3. for each new input x, predict its class using g(x)

• detailed explanation provided by the cost transformation methodology discussed above (Lin, 2010)
• extension of Cost-Proportionate Example Weighting, but now with relabeling!

pros: any multiclass classification algorithm can be used
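A minimal sketch (not the paper's implementation) of the DSE expansion, feeding the expanded and reweighted set to any learner that accepts per-example weights; the data and cost vectors are made up:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def dse_expand(X, C_vectors):
    """Expand each (x_n, c_n) into K relabeled copies (x_n, l) with
    weight q_{n,l} = max_k c_n[k] - c_n[l]; zero-weight copies are kept harmlessly."""
    X = np.asarray(X)
    C_vectors = np.asarray(C_vectors, dtype=float)     # shape (N, K)
    N, K = C_vectors.shape
    q = C_vectors.max(axis=1, keepdims=True) - C_vectors
    X_big = np.repeat(X, K, axis=0)                    # each x_n repeated K times
    labels = np.tile(np.arange(K), N)                  # relabeled to every class 0..K-1
    return X_big, labels, q.ravel()

# usage sketch with made-up data and cost vectors
rng = np.random.RandomState(5)
X = rng.randn(50, 2)
C_vectors = rng.rand(50, 3) * 10
X_big, y_big, w_big = dse_expand(X, C_vectors)
g = DecisionTreeClassifier().fit(X_big, y_big, sample_weight=w_big)
print(g.predict(X[:5]))
```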
Hsuan-Tien Lin (NTU CSIE) Cost-Sensitive Classification: Algorithms and Advances 56/99
Cost-Sensitive Classification by Reweighting and Relabeling
DSE versus MetaCost on Semi-Real Data
(Abe, 2004)
  dataset     MetaCost      DSE
  annealing      206.8    127.1
  solar           5317    110.9
  kdd99          49.39    46.68
  letter         129.6    114.0
  splice         49.95    135.5
  satellite      104.4    116.8
• some “random” cost with UCI data
• C4.5 with COSTING for weighted
classification
DSE comparable to MetaCost
Cost-Sensitive Classification by Reweighting and Relabeling
Cons of DSE: Unavoidable (Minimum) Uncertainty
Original Cost-Sensitive Classification Problem:
  individual examples (digits 1, 2, 3, 4) with certainty + absolute cost
=
New Regular Classification Problem:
  mixtures with unavoidable uncertainty

• cost embedded as weight + label
• new problem usually harder than original one

need robust multiclass classification algorithm to deal with uncertainty
Hsuan-Tien Lin (NTU CSIE) Cost-Sensitive Classification: Algorithms and Advances 58/99
Cost-Sensitive Classification by Binary Classification
Outline
Cost-Sensitive Binary Classification
  Bayesian Perspective of Cost-Sensitive Binary Classification
  Non-Bayesian Perspective of Cost-Sensitive Binary Classification
Cost-Sensitive Multiclass Classification
  Bayesian Perspective of Cost-Sensitive Multiclass Classification
  Cost-Sensitive Classification by Reweighting and Relabeling
  Cost-Sensitive Classification by Binary Classification
  Cost-Sensitive Classification by Regression
Cost-and-Error-Sensitive Classification with Bioinformatics Application
Cost-Sensitive Ordinal Ranking with Information Retrieval Application
Summary
Cost-Sensitive Classification by Binary Classification
Key Idea: Design Robust Multiclass Algorithm
One-Versus-One: A Popular Classification Meta-Method
1. for a pair (i, j), take all examples (x_n, y_n) with y_n = i or j (original one-versus-one)
2. for a pair (i, j), from each weighted mixture {(x_n, ℓ, q_{n,ℓ})} with q_{n,i} > q_{n,j}, take (x_n, i) with weight q_{n,i} − q_{n,j}; vice versa (robust one-versus-one)
3. train a binary classifier ĝ^(i,j) using those examples
4. repeat the previous two steps for all different (i, j)
5. predict using the votes from the ĝ^(i,j)

• un-shifting inside the meta-method to remove uncertainty
• robust step makes it suitable for cost transformation methodology
cost-sensitive one-versus-one:
cost transformation + robust one-versus-one
Hsuan-Tien Lin (NTU CSIE) Cost-Sensitive Classification: Algorithms and Advances 60/99
Cost-Sensitive Classification by Binary Classification
Cost-Sensitive One-Versus-One (CSOVO)
Cost-Sensitive One-Versus-One (Lin, 2010)
1. for a pair (i, j), transform all examples (x_n, y_n, c_n) to
   ( x_n, argmin_{k∈{i,j}} c_n[k] )  with weight |c_n[i] − c_n[j]|
2. train a binary classifier ĝ^(i,j) using those examples
3. repeat the previous two steps for all different (i, j)
4. predict using the votes from the ĝ^(i,j)

• comes with good theoretical guarantee:
  test cost of final classifier ≤ 2 Σ_{i<j} test cost of ĝ^(i,j)
• simple, efficient, and takes original OVO as special case

physical meaning:
each ĝ^(i,j) answers the yes/no question "prefer i or j?"
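A compact sketch of CSOVO's per-pair transformation with linear SVMs as the binary learners; this is one plausible instantiation under the stated assumptions (labels 0..K−1, random toy costs), not the authors' code:

```python
import numpy as np
from itertools import combinations
from sklearn.svm import LinearSVC

def csovo_train(X, C_vectors):
    """For each class pair (i, j): label each example argmin_{k in {i,j}} c_n[k],
    weight it by |c_n[i] - c_n[j]|, and train one binary classifier."""
    C_vectors = np.asarray(C_vectors, dtype=float)
    K = C_vectors.shape[1]
    models = {}
    for i, j in combinations(range(K), 2):
        labels = np.where(C_vectors[:, i] <= C_vectors[:, j], i, j)
        weights = np.abs(C_vectors[:, i] - C_vectors[:, j])
        keep = weights > 0                       # zero-weight examples carry no information
        if len(np.unique(labels[keep])) < 2:
            continue                             # skip degenerate pairs (sketch-level guard)
        models[(i, j)] = LinearSVC().fit(X[keep], labels[keep], sample_weight=weights[keep])
    return models

def csovo_predict(models, X, K):
    """Each pairwise classifier votes for the class it prefers; predict the top-voted class."""
    votes = np.zeros((len(X), K))
    for (i, j), clf in models.items():
        pred = clf.predict(X)                    # predicts either i or j
        votes[np.arange(len(X)), pred] += 1
    return np.argmax(votes, axis=1)

# usage sketch with made-up data
rng = np.random.RandomState(6)
X = rng.randn(120, 2)
C_vectors = rng.rand(120, 4) * 5
models = csovo_train(X, C_vectors)
print(csovo_predict(models, X[:5], K=4))
```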
Cost-Sensitive Classification by Binary Classification
CSOVO on Semi-Real Data
[bar chart (Lin, 2010): avg. test random cost (0–200) of CSOVO vs. OVO on the veh, vow, seg, dna, sat, and usp datasets]
• some “random” cost with UCI data
• CSOVO-SVM:
cost-sensitive
• OVO-SVM: regular
not surprisingly again,
considering the cost properly does help
Hsuan-Tien Lin (NTU CSIE) Cost-Sensitive Classification: Algorithms and Advances 62/99