# Cost-Sensitive Classification: Algorithms and Advances

Hsuan-Tien Lin htlin@csie.ntu.edu.tw

Department of Computer Science & Information Engineering
National Taiwan University

Tutorial for ACML @ Canberra, Australia, November 13, 2013

Hsuan-Tien Lin (NTU CSIE) Cost-Sensitive Classification: Algorithms and Advances 0/99

Associate Professor, Dept. of CSIE, National Taiwan University

co-leader of KDDCup world champion teams at NTU: 2010–2013

research on multi-label classification, ranking, active learning, etc.

research on cost-sensitive classification: 2007–present

Secretary General, Taiwanese Association for Artificial Intelligence

instructor of the Mandarin-teaching MOOC "Machine Learning" on NTU-Coursera: 2013.11–

https://www.coursera.org/course/ntumlone

## Outline

- Cost-Sensitive Binary Classification
  - Bayesian Perspective of Cost-Sensitive Binary Classification
  - Non-Bayesian Perspective of Cost-Sensitive Binary Classification
- Cost-Sensitive Multiclass Classification
  - Bayesian Perspective of Cost-Sensitive Multiclass Classification
  - Cost-Sensitive Classification by Reweighting and Relabeling
  - Cost-Sensitive Classification by Binary Classification
  - Cost-Sensitive Classification by Regression
- Cost-and-Error-Sensitive Classification with Bioinformatics Application
- Cost-Sensitive Ordinal Ranking with Information Retrieval Application
- Summary

Fingerprint Verification

[figure: target function f maps a fingerprint picture to +1 (you) or −1 (intruder)]

a binary classification problem
— grouping "fingerprint pictures" into two different "categories"

C'mon, we know about binary classification all too well! :-)

## Supervised Machine Learning

[diagram: a parent shows (picture, category) pairs to a kid's brain, which forms a good decision function; analogously, the truth f(x) + noise e(x) generates examples (picture x_n, category y_n), and a learning algorithm searches the learning model {h(x)} for a good decision function g(x) ≈ f(x)]

how to evaluate whether g(x) ≈ f(x)?

## Performance Evaluation

Fingerprint Verification: f maps a fingerprint to +1 (you) or −1 (intruder)

example/figure borrowed from Amazon ML best-seller textbook
"Learning from Data" (Abu-Mostafa, Magdon-Ismail, and Lin, 2012)

two types of error: false accept and false reject

|        | g = +1       | g = −1       |
|--------|--------------|--------------|
| f = +1 | no error     | false reject |
| f = −1 | false accept | no error     |

simplest choice:

|        | g = +1 | g = −1 |
|--------|--------|--------|
| f = +1 | 0      | 1      |
| f = −1 | 1      | 0      |

penalize both types equally and calculate average penalties

## Fingerprint Verification for Supermarket

two types of error: false accept and false reject

|        | g = +1       | g = −1       |
|--------|--------------|--------------|
| f = +1 | no error     | false reject |
| f = −1 | false accept | no error     |

|        | g = +1 | g = −1 |
|--------|--------|--------|
| f = +1 | 0      | 10     |
| f = −1 | 1      | 0      |

supermarket: fingerprint for discount

false reject: very unhappy customer, lose future business

false accept: give a minor discount, intruder left fingerprint :-)

## Fingerprint Verification for CIA

two types of error: false accept and false reject

|        | g = +1       | g = −1       |
|--------|--------------|--------------|
| f = +1 | no error     | false reject |
| f = −1 | false accept | no error     |

|        | g = +1 | g = −1 |
|--------|--------|--------|
| f = +1 | 0      | 1      |
| f = −1 | 1000   | 0      |

CIA: fingerprint for entrance

false accept: very serious consequences!

false reject: unhappy employee, but so what? :-)

## Regular Binary Classification

penalizes both types equally

|        | h(x) = +1 | h(x) = −1 |
|--------|-----------|-----------|
| y = +1 | 0         | 1         |
| y = −1 | 1         | 0         |

in-sample error for any hypothesis h:
E_in(h) = (1/N) Σ_{n=1}^{N} ⟦y_n ≠ h(x_n)⟧, where y_n = f(x_n) + noise

out-of-sample error for any hypothesis h:
E_out(h) = E_{(x,y)} ⟦y ≠ h(x)⟧, where y = f(x) + noise

regular binary classification:
well-studied in machine learning
— ya, we know! :-)

## Class-Weighted Cost-Sensitive Binary Classification

Supermarket Cost (Error, Loss, . . .) Matrix

|        | h(x) = +1 | h(x) = −1 |
|--------|-----------|-----------|
| y = +1 | 0         | 10        |
| y = −1 | 1         | 0         |

in-sample:
E_in(h) = (1/N) Σ_{n=1}^{N} (10 if y_n = +1; 1 if y_n = −1) · ⟦y_n ≠ h(x_n)⟧

out-of-sample:
E_out(h) = E_{(x,y)} (10 if y = +1; 1 if y = −1) · ⟦y ≠ h(x)⟧

class-weighted cost-sensitive binary classification:
different 'weight' for different y
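As a minimal sketch (hypothetical data and hypothesis names, not from the slides), the class-weighted in-sample error above can be computed directly:

```python
def ein_class_weighted(h, data, w_pos=10.0, w_neg=1.0):
    # average class-weighted cost: weight 10 for y = +1 and 1 for y = -1
    # (the supermarket cost matrix above)
    costs = [(w_pos if y == +1 else w_neg) * (y != h(x)) for x, y in data]
    return sum(costs) / len(costs)

# a hypothesis that always predicts -1 pays the full false-reject weight
always_reject = lambda x: -1
data = [(0.0, +1), (1.0, -1), (2.0, +1), (3.0, -1)]
print(ein_class_weighted(always_reject, data))  # 5.0
```

Note how the same hypothesis would only pay E_in = 0.5 under the regular (equal-weight) matrix.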

Setup: Class-Weighted Cost-Sensitive Binary Classification

Given
N examples, each (input x_n, label y_n) ∈ X × {−1, +1},
and weights w_+, w_− representing the two entries of the cost matrix

|        | h(x) = +1 | h(x) = −1 |
|--------|-----------|-----------|
| y = +1 | 0         | w_+       |
| y = −1 | w_−       | 0         |

Goal
a classifier g(x) that pays a small cost w_y ⟦y ≠ g(x)⟧ on future unseen example (x, y), i.e., achieves low E_out(g)

regular classification: w_+ = w_− (= 1)

## Supermarket Revisited

two types of error: false accept and false reject

|                | g = +1 | g = −1 |
|----------------|--------|--------|
| big customer   | 0      | 100    |
| usual customer | 0      | 10     |
| intruder       | 1      | 0      |

supermarket: fingerprint for discount

big customer: really don't want to lose her/his business

usual customer: don't want to lose business, but not so serious

Example-Weighted Cost-Sensitive Binary Classification

Supermarket Cost Vectors (Rows)

|          | h(x) = +1 | h(x) = −1 |
|----------|-----------|-----------|
| big      | 0         | 100       |
| usual    | 0         | 10        |
| intruder | 1         | 0         |

in-sample:
E_in(h) = (1/N) Σ_{n=1}^{N} w_n · ⟦y_n ≠ h(x_n)⟧, with w_n the importance of example n

out-of-sample:
E_out(h) = E_{(x,y,w)} w · ⟦y ≠ h(x)⟧

example-weighted cost-sensitive binary classification:
different w for different (x, y)

Setup: Example-Weighted Cost-Sensitive Binary Classification

Given
N examples, each (input x_n, label y_n) ∈ X × {−1, +1}, and weight w_n ∈ R_+

Goal
a classifier g(x) that pays a small cost w ⟦y ≠ g(x)⟧ on future unseen example (x, y, w), i.e., achieves low E_out(g)

regular ⊂ class-weighted ⊂ example-weighted

## Bayesian Perspective of Cost-Sensitive Binary Classification

## Key Idea: Conditional Probability Estimator

Goal (Class-Weighted Setup)
a classifier g(x) that pays a small cost w_y ⟦y ≠ g(x)⟧ on future unseen example (x, y)

expected cost for predicting +1 on x: w_− P(−1|x)
expected cost for predicting −1 on x: w_+ P(+1|x)

if P(y|x) known:
Bayes-optimal g(x) = sign(w_+ P(+1|x) − w_− P(−1|x))

if p(x) ≈ P(+1|x) well:
approximately good g_p(x) = sign(w_+ p(x) − w_−(1 − p(x)))

how to get conditional probability estimator p?
logistic regression, Naïve Bayes, . . .

## Approximate Bayes-Optimal Decision

if p(x) ≈ P(+1|x) well:
approximately good g_p(x) = sign(w_+ p(x) − w_−(1 − p(x)))

that is (Elkan, 2001),
g_p(x) = +1 iff w_+ p(x) − w_−(1 − p(x)) > 0 iff p(x) > w_− / (w_+ + w_−):
1/11 for supermarket; 1000/1001 for CIA

Approximate Bayes-Optimal Decision (ABOD) Approach
1. use your favorite algorithm on {(x_n, y_n)} to get p(x) ≈ P(+1|x)
2. for each new input x, predict its class using g_p(x) = sign(p(x) − w_−/(w_+ + w_−))

'simplest' approach:
probability estimate + threshold changing
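The prediction step of ABOD is just a threshold change; a minimal sketch in pure Python (names hypothetical), assuming the estimate p(x) comes from whatever estimator was trained in step 1:

```python
def abod_predict(p, w_pos, w_neg):
    # predict +1 iff the expected cost of saying -1 (w_pos * p) exceeds
    # the expected cost of saying +1 (w_neg * (1 - p)),
    # i.e. iff p > w_neg / (w_pos + w_neg)
    threshold = w_neg / (w_pos + w_neg)
    return +1 if p > threshold else -1

# supermarket (w_pos = 10, w_neg = 1): threshold drops from 1/2 to 1/11,
# so even a 20% chance of being a genuine customer triggers acceptance
print(abod_predict(0.2, w_pos=10.0, w_neg=1.0))  # +1
print(abod_predict(0.2, w_pos=1.0, w_neg=1.0))   # -1
```

With the CIA matrix (w_neg = 1000) the threshold instead rises to 1000/1001, so almost no one is accepted.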

## ABOD on Artificial Data

1. use your favorite algorithm on {(x_n, y_n)} to get p(x) ≈ P(+1|x)
2. for each new input x, predict its class using g_p(x) = sign(p(x) − w_−/(w_+ + w_−))

[figure: LogReg decision boundaries on artificial data — regular vs. supermarket cost matrix]

|        | g = +1 | g = −1 |
|--------|--------|--------|
| y = +1 | 0      | 10     |
| y = −1 | 1      | 0      |

## Pros and Cons of ABOD

Pros
- optimal: if good probability estimate, i.e., p(x) really close to P(+1|x)
- simple: training (probability estimate) unchanged, and prediction (threshold) changed only a little

Cons
- 'difficult': good probability estimate often more difficult than good binary classification
- 'restricted': only applicable to class-weighted setup — need 'full picture' of cost matrix

approach for the example-weighted setup?

## Non-Bayesian Perspective of Cost-Sensitive Binary Classification

## Key Idea: Example Weight = Copying

Goal
a classifier g(x) that pays a small cost w ⟦y ≠ g(x)⟧ on future unseen example (x, y, w)

on one (x, y): wrong prediction charged by w
— weighted classification

on w copies of (x, y): wrong prediction charged by 1 each
— regular classification

how to copy? over-sampling

## Example-Weighted Classification by Over-Sampling

copy each (x_n, y_n) for w_n times

original problem — evaluate with

|          | h(x) = +1 | h(x) = −1 |
|----------|-----------|-----------|
| big      | 0         | 100       |
| usual    | 0         | 10        |
| intruder | 1         | 0         |

(x1, −1, 1), (x2, +1, 10), (x3, +1, 100), (x4, +1, 10), (x5, −1, 1)

equivalent problem — evaluate with

|          | h(x) = +1 | h(x) = −1 |
|----------|-----------|-----------|
| big      | 0         | 1         |
| usual    | 0         | 1         |
| intruder | 1         | 0         |

(x1, −1); (x2, +1), . . ., (x2, +1); (x3, +1), . . ., (x3, +1), . . ., (x3, +1); (x4, +1), . . ., (x4, +1); (x5, −1)

how to learn a good g for RHS?
SVM, NNet, . . .
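The copying step can be sketched in a few lines (hypothetical helper, assuming integer weights as in the example above):

```python
def expand_by_weight(examples):
    # turn each (x, y, w) into w copies of (x, y), so that regular
    # classification on the copies equals weighted classification
    expanded = []
    for x, y, w in examples:
        expanded.extend([(x, y)] * w)
    return expanded

data = [("x1", -1, 1), ("x2", +1, 10), ("x3", +1, 100),
        ("x4", +1, 10), ("x5", -1, 1)]
print(len(expand_by_weight(data)))  # 122 examples after expansion
```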

## Cost-Proportionate Example Weighting

Cost-Proportionate Example Weighting (CPEW) Approach
1. effectively transform {(x_n, y_n, w_n)} to {(x_m, y_m)} such that the 'copies' of (x_n, y_n) in {(x_m, y_m)} are proportional to w_n
   - over/under-sampling with normalized w_n (Elkan, 2001)
   - modify existing algorithms equivalently (Zadrozny, 2003)
2. use your favorite algorithm on {(x_m, y_m)} to get binary classifier g(x)
3. for each new input x, predict its class using g(x)

simple and general:
very popular for cost-sensitive binary classification

## CPEW by Modification

1. effectively transform {(x_n, y_n, w_n)} to {(x_m, y_m)} such that the 'copies' of (x_n, y_n) in {(x_m, y_m)} are proportional to w_n — modify existing algorithms equivalently (Zadrozny, 2003)
2. use your favorite algorithm on {(x_m, y_m)} to get binary classifier g(x)
3. for each new input x, predict its class using g(x)

Regular Linear SVM:
min_{w,b} (1/2)⟨w, w⟩ + C Σ_{n=1}^{N} ξ_n
ξ_n = max(1 − y_n(⟨w, x_n⟩ + b), 0)

Modified Linear SVM:
min_{w,b} (1/2)⟨w, w⟩ + C Σ_{n=1}^{N} w_n · ξ_n
ξ_n = max(1 − y_n(⟨w, x_n⟩ + b), 0)
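To make the change concrete, here is a sketch (pure Python, hypothetical names) of the modified objective — the only difference from the regular objective is the per-example factor w_n on the hinge slack:

```python
def weighted_svm_objective(w, b, data, C=1.0):
    # 1/2 <w, w> + C * sum_n w_n * xi_n,
    # with hinge slack xi_n = max(1 - y_n * (<w, x_n> + b), 0)
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    slack = sum(wn * max(1.0 - yn * (dot(w, xn) + b), 0.0)
                for xn, yn, wn in data)
    return 0.5 * dot(w, w) + C * slack

# both examples sit outside the margin, so only the regularizer remains;
# violating the heavily weighted example (w_n = 10) would cost 10x more
data = [([2.0, 0.0], +1, 10.0), ([-2.0, 0.0], -1, 1.0)]
print(weighted_svm_objective([1.0, 0.0], 0.0, data))  # 0.5
```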

## CPEW by Modification on Artificial Data

1. effectively transform {(x_n, y_n, w_n)} to {(x_m, y_m)} by modifying existing algorithms equivalently (Zadrozny, 2003)
2. use your favorite algorithm on {(x_m, y_m)} to get g(x)
3. for each new input x, predict its class using g(x)

[figure: decision boundaries on artificial data — regular vs. supermarket cost matrix]

|        | g = +1 | g = −1 |
|--------|--------|--------|
| y = +1 | 0      | 10     |
| y = −1 | 1      | 0      |

## CPEW by Rejection Sampling

1. effectively transform {(x_n, y_n, w_n)} to {(x_m, y_m)} such that the 'copies' of (x_n, y_n) in {(x_m, y_m)} are proportional to w_n
2. use your favorite algorithm on {(x_m, y_m)} to get binary classifier g(x)
3. repeat 1 and 2 to get multiple g and aggregate them
4. for each new input x, predict its class using the aggregated g(x)

commonly used when your favorite algorithm is a black box rather than a white box
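The sampling step can be sketched as follows (hypothetical helper; the full COSTING procedure repeats this and aggregates the resulting classifiers):

```python
import random

def rejection_sample(examples, seed=0):
    # keep each (x, y, w) independently with probability w / w_max,
    # so the survivors form an unweighted, cost-proportionate sample
    rng = random.Random(seed)
    w_max = max(w for _, _, w in examples)
    return [(x, y) for x, y, w in examples if rng.random() < w / w_max]

data = [("x1", -1, 1), ("x2", +1, 10), ("x3", +1, 100)]
sample = rejection_sample(data)  # x3 always kept; x1 kept w.p. 1/100
```

Each repetition draws a different (typically small) sample, which is why aggregating several runs stabilizes the final classifier.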

## Biased Personal Favorites

- CPEW by Modification if possible
- COSTING: fast training and stable performance
- ABOD if in the mood for Bayesian :-)

## Cost-Sensitive Multiclass Classification

## Which Digit Did You Write?

one (1), two (2), three (3), four (4)

a multiclass classification problem
— grouping "pictures" into different "categories"

C'mon, we know about multiclass classification all too well! :-)

## Performance Evaluation

(g(x) ≈ f(x)?)

ZIP code recognition:
1: wrong; 2: right; 3: wrong; 4: wrong

check value recognition:
1: one-dollar mistake; 2: no mistake;
3: one-dollar mistake; 4: two-dollar mistake

evaluation by formation similarity:
1: not very similar; 2: very similar;
3: somewhat similar; 4: a silly prediction

different applications:
evaluate mis-predictions differently

## ZIP Code Recognition

1: wrong; 2: right; 3: wrong; 4: wrong

regular multiclass classification: only right or wrong
wrong cost: 1; right cost: 0

prediction error of h on some (x, y):
classification cost = ⟦y ≠ h(x)⟧
— as discussed in regular binary classification

regular multiclass classification:
well-studied, many good algorithms

## Check Value Recognition

1: one-dollar mistake; 2: no mistake;
3: one-dollar mistake; 4: two-dollar mistake

cost-sensitive multiclass classification:
different costs for different mis-predictions

e.g. prediction error of h on some (x, y):
absolute cost = |y − h(x)|

next: cost-sensitive multiclass classification

## What is the Status of the Patient?

H1N1-infected, cold-infected, healthy

another classification problem
— grouping "patients" into different "status"

are all mis-prediction costs equal?

## Patient Status Prediction

error measure = society cost

| actual \ predicted | H7N9 | cold | healthy |
|--------------------|------|------|---------|
| H7N9               | 0    | 1000 | 100000  |
| cold               | 100  | 0    | 3000    |
| healthy            | 100  | 30   | 0       |

H7N9 mis-predicted as healthy: very high cost

cold mis-predicted as healthy: high cost

cold correctly predicted as cold: no cost

human doctors consider costs of decision;
can computer-aided diagnosis do the same?

## What is the Type of the Movie?

romance, fiction, terror

error measure = non-satisfaction

customer 1, who hates romance but likes terror:

| actual \ predicted | romance | fiction | terror |
|--------------------|---------|---------|--------|
| romance            | 0       | 5       | 100    |

customer 2, who likes terror and romance:

| actual \ predicted | romance | fiction | terror |
|--------------------|---------|---------|--------|
| romance            | 0       | 5       | 3      |

different customers:
evaluate mis-predictions differently

movie classification with non-satisfaction

| actual \ predicted  | romance | fiction | terror |
|---------------------|---------|---------|--------|
| customer 1, romance | 0       | 5       | 100    |
| customer 2, romance | 0       | 5       | 3      |

patient diagnosis with society cost

| actual \ predicted | H7N9 | cold | healthy |
|--------------------|------|------|---------|
| H7N9               | 0    | 1000 | 100000  |
| cold               | 100  | 0    | 3000    |
| healthy            | 100  | 30   | 0       |

check digit recognition with absolute cost C(y, h(x)) = |y − h(x)|

## Cost Vector

cost vector c: a row of cost components

- customer 1 on a romance movie: c = (0, 5, 100)
- an H7N9 patient: c = (0, 1000, 100000)
- absolute cost for digit 2: c = (1, 0, 1, 2)
- "regular" classification cost for label 2: c = (1, 0, 1, 1)

regular classification:
special case of cost-sensitive classification

Setup: Matrix-Based Cost-Sensitive Multiclass Classification

Given
N examples, each (input x_n, label y_n) ∈ X × {1, 2, . . . , K}, and cost matrix C ∈ R^{K×K}
— will assume C(y, y) = 0 = min_{1≤k≤K} C(y, k)

Goal
a classifier g(x) that pays a small cost C(y, g(x)) on future unseen example (x, y)

extension of 'class-weighted' cost-sensitive binary classification

Setup: Vector-Based Cost-Sensitive Multiclass Classification

Given
N examples, each (input x_n, label y_n) ∈ X × {1, 2, . . . , K}, and cost vector c_n ∈ R^K
— will assume c_n[y_n] = 0 = min_{1≤k≤K} c_n[k]

Goal
a classifier g(x) that pays a small cost c[g(x)] on future unseen example (x, y, c)
— will assume c[y] = 0 = c_min = min_{1≤k≤K} c[k]
note: y not really needed in evaluation

extension of 'example-weighted' cost-sensitive binary classification

## Which Age-Group?

infant (1), child (2), teen (3), adult (4)

small mistake — classify a child as a teen;
big mistake — classify an infant as an adult

cost matrix C(y, g(x)) for embedding 'order':

C =
(0, 1, 4, 5;
 1, 0, 1, 3;
 3, 1, 0, 2;
 5, 4, 1, 0)

cost-sensitive classification can help solve many other problems, such as ordinal ranking

## Bayesian Perspective of Cost-Sensitive Multiclass Classification

## Key Idea: Conditional Probability Estimator

Goal (Matrix Setup)
a classifier g(x) that pays a small cost C(y, g(x)) on future unseen example (x, y)

if P(y|x) known:
Bayes-optimal g(x) = argmin_{1≤k≤K} Σ_{y=1}^{K} P(y|x) C(y, k)

if p(y, x) ≈ P(y|x) well:
approximately good g_p(x) = argmin_{1≤k≤K} Σ_{y=1}^{K} p(y, x) C(y, k)

how to get conditional probability estimator p?
logistic regression, Naïve Bayes, . . .

## Approximate Bayes-Optimal Decision

if p(y, x) ≈ P(y|x) well (Domingos, 1999):
approximately good g_p(x) = argmin_{k∈{1,2,...,K}} Σ_{y=1}^{K} p(y, x) C(y, k)

Approximate Bayes-Optimal Decision (ABOD) Approach
1. use your favorite algorithm on {(x_n, y_n)} to get p(y, x) ≈ P(y|x)
2. for each new input x, predict its class using g_p(x) above

a simple extension from binary classification:
probability estimate + Bayes-optimal decision
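The multiclass decision rule is a short loop over the estimated probabilities; a sketch with hypothetical names, using the absolute (check-value) cost for illustration:

```python
def abod_multiclass(p, C):
    # argmin over predictions k of the expected cost
    # sum_y p[y] * C[y][k], with p[y] estimating P(y|x)
    K = len(C)
    expected = [sum(p[y] * C[y][k] for y in range(K)) for k in range(K)]
    return min(range(K), key=expected.__getitem__)

# absolute cost C(y, k) = |y - k| for 4 classes (digits 1..4 as 0..3)
C = [[abs(y - k) for k in range(4)] for y in range(4)]
print(abod_multiclass([0.1, 0.2, 0.6, 0.1], C))  # 2
```

Note that under the absolute cost the rule picks a (probability-weighted) median-like class, not necessarily the most probable one.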

## ABOD on Artificial Data

1. use your favorite algorithm on {(x_n, y_n)} to get p(y, x) ≈ P(y|x)
2. for each new input x, predict its class using g_p(x)

[figure: LogReg decision regions on artificial data — regular vs. 'rotate' cost matrix]

| y \ g | 1 | 2 | 3 | 4 |
|-------|---|---|---|---|
| 1     | 0 | 1 | 2 | 4 |
| 2     | 4 | 0 | 1 | 2 |
| 3     | 2 | 4 | 0 | 1 |
| 4     | 1 | 2 | 4 | 0 |

## Pros and Cons of ABOD

Pros
- optimal: if good probability estimate, i.e., p(y, x) really close to P(y|x)
- simple: training (probability estimate) unchanged, and prediction (decision) changed only a little

Cons
- 'difficult': good probability estimate often more difficult than good multiclass classification
- 'restricted': only applicable to the matrix setup — need 'full picture' of cost matrix
- 'slow prediction': need sophisticated calculation at prediction stage

can we use any multiclass classification algorithm for ABOD?

## MetaCost Approach

Approximate Bayes-Optimal Decision (ABOD) Approach
1. use your favorite algorithm on {(x_n, y_n)} to get p(y, x) ≈ P(y|x)
2. for each new input x, predict its class using g_p(x)

MetaCost Approach (Domingos, 1999)
1. use your favorite multiclass classification algorithm on bootstrapped {(x_n, y_n)} and aggregate the classifiers to get p(y, x) ≈ P(y|x)
2. for each given input x_n, relabel it to y_n′ using g_p(x)
3. run your favorite multiclass classification algorithm on relabeled {(x_n, y_n′)} to get final classifier g
4. for each new input x, predict its class using g(x)

pros: any multiclass classification algorithm can be used
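The relabeling step (step 2) can be sketched as follows (hypothetical helper); with the society-cost matrix from the patient slide, even a small estimated H7N9 probability pulls the label away from "healthy":

```python
def metacost_relabel(p_estimates, C):
    # relabel each training input to the class minimizing expected cost
    # under the bagged probability estimates (one list p per input)
    K = len(C)
    def bayes(p):
        return min(range(K), key=lambda k: sum(p[y] * C[y][k] for y in range(K)))
    return [bayes(p) for p in p_estimates]

# classes: 0 = H7N9, 1 = cold, 2 = healthy (society-cost matrix above)
C = [[0, 1000, 100000], [100, 0, 3000], [100, 30, 0]]
print(metacost_relabel([[0.05, 0.15, 0.80]], C))  # [1]: relabeled as cold
```

A plain most-probable-class rule would have labeled this input "healthy"; the expected-cost rule trades that for the much safer "cold".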

## MetaCost on Semi-Real Data

[figure: cost of MetaCost vs. the multiclass and two-class baselines on semi-real data, with C4.5R, undersampling, and oversampling panels and a y = x reference line (Domingos, 1999)]

some "random" cost with UCI data

MetaCost+C4.5: cost-sensitive
C4.5: regular

not surprisingly,
considering the cost properly does help

## Cost-Sensitive Classification by Reweighting and Relabeling

## Recall: Example-Weighting Useful for Binary

can example weighting be used for multiclass?

Yes! an elegant solution if using a cost matrix with special properties (Zhou, 2010):

C(i, j) / C(j, i) = w_i / w_j

what if using cost vectors without special properties?

## Key Idea: Cost Transformation

binary case: c = (0, 1000) = 1000 · (0, 1) + 0 · (1, 0)
— # of copies times classification costs

multiclass case: cost c = (3, 2, 3, 4) decomposes with mixture weights q = (1, 2, 1, 0):

(3, 2, 3, 4) = 1 · (0, 1, 1, 1) + 2 · (1, 0, 1, 1) + 1 · (1, 1, 0, 1) + 0 · (1, 1, 1, 0)

split the cost-sensitive example:
(x, 2) with c = (3, 2, 3, 4) equivalent to a weighted mixture {(x, 1, 1), (x, 2, 2), (x, 3, 1)}

cost equivalence: c[h(x)] = Σ_{ℓ=1}^{K} q_ℓ ⟦ℓ ≠ h(x)⟧ for any h
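The cost equivalence is easy to verify numerically for the slide's example (a sketch, hypothetical names):

```python
def mixture_cost(q, k):
    # total weighted classification error against the mixture
    # {(x, l, q_l)} when the classifier predicts class k
    return sum(ql for l, ql in enumerate(q) if l != k)

q = [1, 2, 1, 0]   # mixture weights from the decomposition above
c = [3, 2, 3, 4]   # original cost vector
print([mixture_cost(q, k) for k in range(4)])  # [3, 2, 3, 4] == c
```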

## Meaning of Cost Equivalence

c[h(x)] = Σ_{ℓ=1}^{K} q_ℓ ⟦ℓ ≠ h(x)⟧

on one (x, y, c): wrong prediction charged by c[h(x)]
— cost-sensitive classification

on all (x, ℓ, q_ℓ): wrong prediction charged by the total weighted classification error
— weighted classification

weighted classification ⟹ regular classification?
same as binary (with CPEW) when q_ℓ ≥ 0

min_g expected LHS (original cost-sensitive problem)
= min_g expected RHS (a regular problem when q_ℓ ≥ 0)

## Cost Transformation Methodology: Preliminary

1. split each training example (x_n, y_n, c_n) to a weighted mixture {(x_n, ℓ, q_{n,ℓ})}_{ℓ=1}^{K}
2. apply regular/weighted classification algorithm on the union of weighted mixtures ∪_{n=1}^{N} {(x_n, ℓ, q_{n,ℓ})}_{ℓ=1}^{K}

by c[g(x)] = Σ_{ℓ=1}^{K} q_ℓ ⟦ℓ ≠ g(x)⟧ (cost equivalence):
good g for new regular classification problem
= good g for original cost-sensitive classification problem

regular classification: needs q_{n,ℓ} ≥ 0
but what if q_{n,ℓ} negative?

## Similar Cost Vectors

(1, 0, 1, 2) = 1/3 · (0, 1, 1, 1) + 4/3 · (1, 0, 1, 1) + 1/3 · (1, 1, 0, 1) − 2/3 · (1, 1, 1, 0)

negative q_ℓ: cannot split

but ĉ = (1, 0, 1, 2) is similar to c = (3, 2, 3, 4), whose mixture weights q = (1, 2, 1, 0) are non-negative:
for any classifier g,
ĉ[g(x)] + constant = c[g(x)] = Σ_{ℓ=1}^{K} q_ℓ ⟦ℓ ≠ g(x)⟧

constant can be dropped during minimization:
min_g expected ĉ[g(x)] (original cost-sensitive problem)
= min_g expected Σ_{ℓ=1}^{K} q_ℓ ⟦ℓ ≠ g(x)⟧ (a regular problem when q_ℓ ≥ 0)

## Cost Transformation Methodology: Revised

1. shift each training cost ĉ_n to a similar and "splittable" c_n
2. split (x_n, y_n, c_n) to a weighted mixture {(x_n, ℓ, q_{n,ℓ})}_{ℓ=1}^{K}
3. apply regular classification algorithm on the union of weighted mixtures ∪_{n=1}^{N} {(x_n, ℓ, q_{n,ℓ})}_{ℓ=1}^{K}

splittable: q_{n,ℓ} ≥ 0

by cost equivalence after shifting:
good g for new regular classification problem
= good g for original cost-sensitive classification problem

but infinitely many similar and splittable c_n!

(55)

Cost-Sensitive Classification by Reweighting and Relabeling

## Uncertainty in Mixture

a single example {(x, 2)}
—certain that the desired label is 2

a mixture {(x, 1, 1), (x, 2, 2), (x, 3, 1)} sharing the same x
—uncertainty in the desired label (25%: 1, 50%: 2, 25%: 3)

        costs                mixture weights          classification costs
    (  3  2  3  4 )       (  1  2  1  0 )              ( 0 1 1 1 )
    ( 33 32 33 34 )   =   ( 11 12 11 10 )        ·     ( 1 0 1 1 )
                                                       ( 1 1 0 1 )
                                                       ( 1 1 1 0 )

should choose a similar and splittable c with minimum mixture uncertainty


(56)

Cost-Sensitive Classification by Reweighting and Relabeling

## Cost Transformation Methodology: Final

Cost Transformation Methodology (Lin, 2010)

1 shift each training cost ĉ_n to a similar and splittable c_n with minimum "mixture uncertainty"

2 split (x_n, y_n, c_n) into a weighted mixture {(x_n, ℓ, q_{n,ℓ})} for ℓ = 1, …, K

3 apply a regular classification algorithm on the weighted mixtures ∪_{n=1}^N {(x_n, ℓ, q_{n,ℓ})}

mixture uncertainty: entropy of the normalized (q_1, q_2, …, q_K)

a simple and unique optimal shifting exists for every ĉ

good g for the new regular classification problem
= good g for the original cost-sensitive classification problem
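To make the criterion concrete, here is a rough numeric sketch (helper names ours; the coarse scan merely stands in for the closed-form optimal shift of the cited paper). Shifting every entry of ĉ by a constant δ adds δ/(K − 1) to every mixture weight, so one can scan splittable shifts and keep the one whose normalized weights have minimum entropy:

```python
import numpy as np

def split_cost(c):
    """Closed-form mixture weights: q[l] = sum(c) / (K - 1) - c[l]."""
    c = np.asarray(c, dtype=float)
    return c.sum() / (len(c) - 1) - c

def mixture_entropy(q):
    """Entropy of the normalized weights (only meaningful when q >= 0)."""
    p = np.asarray(q, dtype=float)
    p = p / p.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

c_hat = np.array([1.0, 0.0, 1.0, 2.0])
candidates = []
for delta in np.arange(0.0, 10.0, 0.5):   # coarse scan over constant shifts
    q = split_cost(c_hat + delta)
    if (q >= 0).all():                     # keep only splittable shifts
        candidates.append((mixture_entropy(q), delta))
best_entropy, best_delta = min(candidates)
# the smallest splittable shift wins here: delta = 2, i.e. c = (3, 2, 3, 4)
# with weights (1, 2, 1, 0); larger shifts push the weights toward uniform
print(best_delta)
```

This matches the slide's example: (3, 2, 3, 4) is preferred over the equally similar but much more uncertain (33, 32, 33, 34).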

(57)

Cost-Sensitive Classification by Reweighting and Relabeling

## Data Space Expansion Approach

Data Space Expansion (DSE) Approach (Abe, 2004)

1 for each (x_n, y_n, c_n) and ℓ, let q_{n,ℓ} = max_{1≤k≤K} c_n[k] − c_n[ℓ]

2 apply your favorite multiclass classification algorithm on the weighted mixtures ∪_{n=1}^N {(x_n, ℓ, q_{n,ℓ})} to get g(x)

3 for each new input x, predict its class using g(x)

detailed explanation provided by the cost transformation methodology discussed above (Lin, 2010)

extension of Cost-Proportionate Example Weighting, but now with relabeling!

pros: any multiclass classification algorithm can be used
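Step 1 of DSE is a one-liner; a small sketch (function names ours) that also shows the resulting expanded data set for the running cost vector:

```python
def dse_weights(c):
    """DSE mixture weights: q[l] = max_k c[k] - c[l], always nonnegative."""
    m = max(c)
    return [m - cl for cl in c]

def dse_expand(X, C):
    """Expand each (x_n, c_n) into the weighted mixture {(x_n, l, q_{n,l})},
    dropping zero-weight (i.e. maximum-cost) labels."""
    mixture = []
    for x, c in zip(X, C):
        for label, weight in enumerate(dse_weights(c)):
            if weight > 0:
                mixture.append((x, label, weight))
    return mixture

# the running cost vector c = (1, 0, 1, 2) yields weights (1, 2, 1, 0):
# the cheapest label gets the largest weight, the costliest one is dropped
print(dse_expand([[0.0]], [[1, 0, 1, 2]]))
```

Note the shift by max_k c[k] makes every weight nonnegative by construction, so the expanded data is always a valid weighted classification problem.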


(58)

Cost-Sensitive Classification by Reweighting and Relabeling

## DSE versus MetaCost on Semi-Real Data

(Abe, 2004)

| data set  | MetaCost | DSE   |
|-----------|----------|-------|
| annealing | 206.8    | 127.1 |
| solar     | 5317     | 110.9 |
| kdd99     | 49.39    | 46.68 |
| letter    | 129.6    | 114.0 |
| splice    | 49.95    | 135.5 |
| satellite | 104.4    | 116.8 |

some "random" cost with UCI data

C4.5 with COSTING for weighted classification

DSE comparable to MetaCost

(59)

Cost-Sensitive Classification by Reweighting and Relabeling

## Cons of DSE: Unavoidable (Minimum) Uncertainty

Original Cost-Sensitive Classification Problem:
individual examples with certainty + absolute cost

=

New Regular Classification Problem:
mixtures with unavoidable uncertainty, cost embedded as weight + label

the new problem is usually harder than the original one:
need a robust multiclass classification algorithm to deal with the uncertainty


(60)

Cost-Sensitive Classification by Binary Classification

## Outline

Cost-Sensitive Binary Classification
Bayesian Perspective of Cost-Sensitive Binary Classification
Non-Bayesian Perspective of Cost-Sensitive Binary Classification

Cost-Sensitive Multiclass Classification
Bayesian Perspective of Cost-Sensitive Multiclass Classification
Cost-Sensitive Classification by Reweighting and Relabeling
Cost-Sensitive Classification by Binary Classification
Cost-Sensitive Classification by Regression

Cost-and-Error-Sensitive Classification with Bioinformatics Application
Cost-Sensitive Ordinal Ranking with Information Retrieval Application
Summary

(61)

Cost-Sensitive Classification by Binary Classification

## Key Idea: Design Robust Multiclass Algorithm

One-Versus-One: A Popular Classification Meta-Method

1 for a pair (i, j), take all examples (x_n, y_n) with y_n = i or j (original one-versus-one)

2 for a pair (i, j), from each weighted mixture {(x_n, ℓ, q_{n,ℓ})} with q_{n,i} > q_{n,j}, take (x_n, i) with weight q_{n,i} − q_{n,j}; vice versa (robust one-versus-one)

3 train a binary classifier ĝ^(i,j) using those examples

4 repeat the previous two steps for all different pairs (i, j)

5 predict using the votes from the ĝ^(i,j)

un-shifting inside the meta-method removes the uncertainty

the robust step makes it suitable for the cost transformation methodology

cost-sensitive one-versus-one:
cost transformation + robust one-versus-one


(62)

Cost-Sensitive Classification by Binary Classification

## Cost-Sensitive One-Versus-One (CSOVO)

Cost-Sensitive One-Versus-One (Lin, 2010)

1 for a pair (i, j), transform all examples (x_n, y_n, c_n) to (x_n, argmin_{k∈{i,j}} c_n[k]) with weight |c_n[i] − c_n[j]|

2 train a binary classifier ĝ^(i,j) using those examples

3 repeat the previous two steps for all different pairs (i, j)

4 predict using the votes from the ĝ^(i,j)

comes with a good theoretical guarantee:
test cost of final classifier ≤ 2 Σ_{i<j} test cost of ĝ^(i,j)

simple, efficient, and takes the original OVO as a special case

physical meaning: each ĝ^(i,j) answers the yes/no question "prefer i or j?"
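Step 1 above can be sketched in a few lines (plain Python, helper name ours); examples with c_n[i] = c_n[j] get weight zero, carry no preference, and can be dropped:

```python
def csovo_pair(X, C, i, j):
    """Weighted binary data for the pair (i, j): each (x_n, c_n) becomes
    (x_n, argmin_{k in {i, j}} c_n[k]) with weight |c_n[i] - c_n[j]|."""
    data = []
    for x, c in zip(X, C):
        weight = abs(c[i] - c[j])
        if weight > 0:                     # ties carry no preference
            label = i if c[i] < c[j] else j
            data.append((x, label, weight))
    return data

# for the running cost vector c = (1, 0, 1, 2) and the pair (1, 3),
# label 1 is preferred (cost 0 < 2) with weight |0 - 2| = 2
print(csovo_pair([[0.0]], [[1, 0, 1, 2]], 1, 3))
```

For a regular example (cost 0 on y_n, cost 1 elsewhere) every pair reduces to the original OVO data with unit weights, which is the special-case claim above.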

(63)

Cost-Sensitive Classification by Binary Classification

## CSOVO on Semi-Real Data

[Figure: average test "random" cost of CSOVO versus OVO on the UCI data sets veh, vow, seg, dna, sat, usp (Lin, 2010)]

some "random" cost with UCI data

CSOVO-SVM: cost-sensitive; OVO-SVM: regular

not surprisingly again, considering the cost properly does help

