Cost-sensitive Classification: Status and Beyond

(1)

Hsuan-Tien Lin

Department of Computer Science and Information Engineering, National Taiwan University

Talk in MLRT@TAAI, 11/18/2010

(2)

Which Digit Did You Write?

?

one (1) two (2) three (3) four (4)

classification: a classic problem in machine learning

how to evaluate classification performance?

(3)

Mis-prediction Costs

(g(x) ≈ f(x)?)

?

ZIP code recognition:

1: wrong; 2: right; 3: wrong; 4: wrong

check value recognition:

1: one-dollar mistake; 2: no mistake; 3: one-dollar mistake; 4: two-dollar mistake

different applications: evaluate mis-predictions differently

(4)

Check Value Recognition

?

1: one-dollar mistake; 2: no mistake; 3: one-dollar mistake; 4: two-dollar mistake

cost-sensitive classification problem: different costs for different mis-predictions

e.g. prediction error of g on some (x, y):

absolute cost = |y − g(x)|

cost-sensitive (as opposed to cost-less) classification: relatively new, needs more research
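
The absolute cost above can be written out as a cost matrix. The following is a minimal sketch; the function name `absolute_cost_matrix` and the 0-indexed storage of the labels 1..K are assumptions made for illustration.

```python
def absolute_cost_matrix(num_classes):
    """Return C with C[y][k] = |y - k|; label v is stored at index v - 1."""
    return [[abs(y - k) for k in range(num_classes)] for y in range(num_classes)]


costs = absolute_cost_matrix(4)
# Row for a written "2" (index 1): predicting 1, 2, 3, 4 costs 1, 0, 1, 2 dollars.
print(costs[1])
```

The printed row matches the check value example: one dollar, no mistake, one dollar, two dollars.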

(5)

What is the Status of the Patient?

?

H1N1-infected cold-infected healthy

another classification problem: grouping “patients” into different “statuses”

are all mis-prediction costs equal?

(6)

Patient Status Prediction

error measure = society cost

actual \ predicted    H1N1    cold    healthy
H1N1                  0       1000    100000
cold                  100     0       3000
healthy               100     30      0

H1N1 mis-predicted as healthy: very high cost

cold mis-predicted as healthy: high cost

cold correctly predicted as cold: no cost

human doctors consider costs of decision;

can computer-aided diagnosis do the same?
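
To see why the error rate alone can mislead here, the sketch below scores two predictors with the society cost matrix (rows = actual, columns = predicted). The four patients and both predictors are hypothetical and only serve as an illustration.

```python
LABELS = ["H1N1", "cold", "healthy"]
COST = [
    [0, 1000, 100000],  # actual H1N1
    [100, 0, 3000],     # actual cold
    [100, 30, 0],       # actual healthy
]


def total_cost(actual, predicted):
    """Sum the society cost COST[actual][predicted] over all patients."""
    index = {name: i for i, name in enumerate(LABELS)}
    return sum(COST[index[a]][index[p]] for a, p in zip(actual, predicted))


# Hypothetical data: both predictors make exactly one mistake.
actual = ["H1N1", "cold", "healthy", "healthy"]
cautious = ["H1N1", "cold", "cold", "healthy"]           # healthy predicted as cold
overlooking = ["healthy", "cold", "healthy", "healthy"]  # H1N1 predicted as healthy

print(total_cost(actual, cautious))     # 30
print(total_cost(actual, overlooking))  # 100000
```

Both predictors have the same error rate, yet their costs differ by more than three orders of magnitude.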

(7)

Cost-sensitive Classification Setup

Given

N examples, each (input x_n, label y_n) ∈ X × {1, 2, ..., K}, and a K by K cost matrix C

K = 2: binary; K > 2: multiclass

Goal

a classifier g(x) that pays a small cost C(y, g(x)) on a future unseen example (x, y)

cost-sensitive classification:

a powerful and general setup
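
One way to make the goal precise is as an expected cost. The symbol E_C(g) and the assumption that unseen examples are drawn from a fixed distribution are notation introduced here, not taken from the slides.

```latex
E_C(g) = \mathbb{E}_{(x, y)} \, C\bigl(y, g(x)\bigr),
\qquad
\text{cost-less special case: } C(y, k) = \mathbf{1}[y \neq k]
```

With the cost-less matrix, E_C(g) reduces to the usual classification error rate, which is one sense in which the setup is general.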

(8)

A Quick Overview of Selected Algorithms

cost-sensitive classification via

relabeling

reweighting

relabeling + reweighting (our work, among others)

reducing to binary classification (our work, among others)

reducing to regression (our work)

(9)

Cost-sensitive Classification via Relabeling

(Domingos, KDD, 1999)

key idea

cost-sensitive classification

= cost-less classification + relabeling some examples based on cost

general and makes any cost-less approach cost-sensitive

but heuristic: relabel using posterior probability estimate

theoretically sound approach?
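
The relabeling idea can be sketched as follows: each training example gets the label with the smallest estimated expected cost, and any cost-less learner is then trained on the new labels. This is a sketch under assumptions, not the cited method itself: the posterior estimates are taken as given, and the function name `relabel` and the sample probabilities are hypothetical.

```python
def relabel(posteriors, cost):
    """Relabel each example with the class of minimum estimated expected cost.

    posteriors[n][y] estimates P(y | x_n); cost[y][k] is C(y, k).
    """
    num_classes = len(cost)
    new_labels = []
    for p in posteriors:
        expected = [
            sum(p[y] * cost[y][k] for y in range(num_classes))
            for k in range(num_classes)
        ]
        new_labels.append(min(range(num_classes), key=expected.__getitem__))
    return new_labels


# Patient cost matrix (0 = H1N1, 1 = cold, 2 = healthy); probabilities are made up.
cost = [[0, 1000, 100000], [100, 0, 3000], [100, 30, 0]]
posteriors = [[0.1, 0.3, 0.6], [0.0, 0.1, 0.9], [0.0, 0.0, 1.0]]
print(relabel(posteriors, cost))  # [0, 1, 2]
```

The first two examples are most likely healthy, yet they are relabeled as H1N1 and cold because a missed infection is so costly. The quality of the result depends entirely on the probability estimates, which is where the heuristic nature enters.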

(10)

Cost-sensitive Classification via Reweighting

(Elkan, IJCAI, 2001)

key idea

cost-sensitive classification

= cost-less classification + emphasizing some costly examples

simple and theoretically sound

but applies to only binary cost-sensitive classification; the multiclass case is more complicated

theoretically sound approach for multiclass cost-sensitive classification?
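
For the binary case, the reweighting idea can be checked in a few lines: if each example is weighted by the cost of mis-predicting it, the weighted cost-less error equals the total cost. This is a minimal sketch that assumes zero cost for correct predictions; the cost values and function names are hypothetical.

```python
def example_weights(labels, cost):
    """Weight each binary example (label 0 or 1) by the cost of mis-predicting it."""
    return [cost[y][1 - y] for y in labels]


def weighted_error(labels, predictions, weights):
    """Weighted cost-less (0/1) error."""
    return sum(w for y, g, w in zip(labels, predictions, weights) if y != g)


def total_cost(labels, predictions, cost):
    """Total cost C(y, g(x)) over all examples."""
    return sum(cost[y][g] for y, g in zip(labels, predictions))


cost = [[0, 1], [10, 0]]  # hypothetical: missing a positive is 10 times worse
labels = [0, 0, 1, 1]
predictions = [1, 0, 0, 1]
weights = example_weights(labels, cost)  # [1, 1, 10, 10]

print(weighted_error(labels, predictions, weights))  # 11
print(total_cost(labels, predictions, cost))         # 11
```

Any cost-less learner that accepts example weights therefore minimizes the cost directly. With K > 2 there are several possible wrong predictions per example, each with its own cost, so a single weight per example no longer suffices.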

(11)

Cost-sensitive Classification via Relabeling + Reweighting

(Abe et al., KDD, 2004; Lin, Caltech, 2008)

key idea

cost-sensitive classification

= cost-less classification + emphasizing and relabeling some examples

theoretically sound for multiclass: good cost-less classification ⇒ good cost-sensitive classification

but introduces relabeling noise to the learning process, leading to bad practical performance

theoretically sound approach for multiclass cost-sensitive classification with promising practical performance?
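
One construction with this flavor expands every example into K relabeled copies, where copy k has weight (largest cost in the example's row) minus C(y, k). The slides do not spell out the details of the cited methods, so the sketch below is an assumption about the general idea rather than a reproduction of either paper; the function names are hypothetical.

```python
def expand(label, cost):
    """Expand one example into K (new_label, weight) copies.

    Copy k carries label k and weight max_j C(label, j) - C(label, k).
    """
    row = cost[label]
    worst = max(row)
    return [(k, worst - c) for k, c in enumerate(row)]


def weighted_error(copies, prediction):
    """Weighted cost-less error of one prediction against all copies."""
    return sum(w for k, w in copies if k != prediction)


# Patient cost matrix (0 = H1N1, 1 = cold, 2 = healthy); the example is a cold patient.
cost = [[0, 1000, 100000], [100, 0, 3000], [100, 30, 0]]
copies = expand(1, cost)
print(copies)  # [(0, 2900), (1, 3000), (2, 0)]

# The weighted error differs from the true cost only by a constant offset.
offsets = [weighted_error(copies, k) - cost[1][k] for k in range(3)]
print(offsets)  # [2900, 2900, 2900]
```

Because the offset is constant, minimizing the weighted cost-less error also minimizes the cost. The copy `(0, 2900)` shows the downside: a cold patient appears in the training set labeled as H1N1 with nearly the same weight as the correct label, which is the relabeling noise mentioned above.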

(12)

Cost-sensitive Classification via Pairwise Binary Classification

(Beygelzimer et al., ICDM, 2003; Lin, NTU, 2010)

key idea

cost-sensitive classification

= binary classification + “Which of the two classes is of smaller cost?”

theoretically sound: good binary classification ⇒ good cost-sensitive classification

promising practical performance (with a good binary classifier)

does not scale well with K, the number of classes

theoretically sound approach for large-K multiclass cost-sensitive classification with promising practical performance?
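
The pairwise question can be turned into training data roughly as sketched below: for every pair of classes (i, j), an example yields a binary target ("is i cheaper than j?") and a weight equal to the cost difference. The weighting scheme, the voting rule, and the function names are assumptions for illustration and may differ from the cited methods.

```python
from itertools import combinations


def pairwise_examples(label, cost):
    """Map each class pair (i, j) to (i_is_cheaper, weight) for one example."""
    row = cost[label]
    return {
        (i, j): (row[i] < row[j], abs(row[i] - row[j]))
        for i, j in combinations(range(len(row)), 2)
    }


def vote(pair_predictions, num_classes):
    """Predict by majority vote; pair_predictions[(i, j)] is True if i looks cheaper."""
    votes = [0] * num_classes
    for (i, j), i_cheaper in pair_predictions.items():
        votes[i if i_cheaper else j] += 1
    return max(range(num_classes), key=votes.__getitem__)


# Absolute cost with K = 4; the example is a written "2" (index 1).
cost = [[abs(y - k) for k in range(4)] for y in range(4)]
pairs = pairwise_examples(1, cost)
print(len(pairs))     # 6 binary problems, i.e. K(K-1)/2
print(pairs[(1, 3)])  # (True, 2)

# If every binary classifier answered correctly, voting recovers the cheapest class.
perfect = {pair: target for pair, (target, _) in pairs.items()}
print(vote(perfect, 4))  # 1
```

The number of binary problems grows as K(K-1)/2, which is the scaling issue noted above.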

(13)

Cost-sensitive Classification via Regression

(Tu and Lin, ICML, 2010)

key idea

cost-sensitive classification

= regression + “What is the estimated cost of each class?”

theoretically sound: good regression ⇒ good cost-sensitive classification

promising practical performance (with a good regressor)

scales better with K

what next?
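
The regression idea from the previous slide can be sketched with one regressor per class: regressor k estimates the cost of predicting class k, and the prediction is the class with the smallest estimate. Plain least squares on a scalar input is swapped in here purely for illustration; the regressor used in the cited work is not described on the slide, and the toy data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ≈ a * x + b with scalar x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    return a, mean_y - a * mean_x


def train(xs, labels, cost):
    """Fit one regressor per class k on the targets C(y_n, k)."""
    return [fit_line(xs, [cost[y][k] for y in labels]) for k in range(len(cost))]


def predict(models, x):
    """Predict the class whose estimated cost is smallest."""
    estimates = [a * x + b for a, b in models]
    return min(range(len(models)), key=estimates.__getitem__)


# Toy data: scalar inputs, labels 0..3, absolute cost.
cost = [[abs(y - k) for k in range(4)] for y in range(4)]
xs = [1.0, 2.0, 3.0, 4.0]
labels = [0, 1, 2, 3]
models = train(xs, labels, cost)
print([predict(models, x) for x in xs])  # [0, 1, 2, 3]
```

Only K regressors are needed, compared with K(K-1)/2 binary classifiers in the pairwise approach.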

(14)

Key Remaining Question: Application

theory: well-understood

algorithm: sufficiently many

application: where? more?

Where does cost come from?

user-provided: but may not be feasible; consider cost intervals instead? (Liu and Zhou, KDD, 2010)

parameter-to-be-tuned: but currently lacks guidelines for users; link cost-sensitive classification to the true application needs?

What are important public benchmarks?

semi-artificial (traditional): assigning arbitrary costs to existing data sets

vision data with a class hierarchy? ongoing, but depends heavily on feature extraction rather than costs

NELL data? cost as soft constraints

special types of learning problems (e.g. ranking)?

others?

Assisting the users on true application needs

(15)

Thank you. Questions?