(1)

Cost-Sensitive Classification:

Algorithm and Application

Hsuan-Tien Lin htlin@csie.ntu.edu.tw

Department of Computer Science

& Information Engineering

National Taiwan University

AI in Data Science Forum, December 20, 2017

(2)

About Me

Chief Data Scientist, Appier

Professor, Dept. of CSIE, NTU

Co-author of textbook “Learning from Data: A Short Course”

Instructor of two NTU-Coursera Mandarin-teaching ML Massive Open Online Courses

goal: make machine learning more realistic

multi-class cost-sensitive classification : in ICML ’10, BIBM ’11, KDD ’12, ACML ’14, IJCAI ’16, etc.

multi-label classification: in ACML ’11, NIPS ’12, ICML ’14, AAAI ’18, etc.

online/active learning: in ICML ’12, ACML ’12, ICML ’14, AAAI ’15, etc.

large-scale data mining (w/ Profs. S.-D. Lin & C.-J. Lin & students):

KDDCup world champions of ’10, ’11 (×2), ’12, ’13 (×2)

Hsuan-Tien Lin (NTU CSIE) Cost-Sensitive Classification: Algorithm and Application 1/33

(3)

Cost-Sensitive Multiclass Classification

Which Digit Did You Write?

?

one (1) two (2) three (3) four (4)

a multiclass classification problem

—grouping “pictures” into different “categories”

C’mon, we know about

multiclass classification all too well! :-)

(4)

Cost-Sensitive Multiclass Classification

Performance Evaluation

(g(x)≈ f (x)?)

?

ZIP code recognition:

1: wrong; 2: right; 3: wrong; 4: wrong

check value recognition:

1: one-dollar mistake; 2: no mistake;

3: one-dollar mistake; 4: two-dollar mistake

different applications:

evaluate mis-predictions differently


(5)

Cost-Sensitive Multiclass Classification

ZIP Code Recognition

?

1: wrong; 2: right; 3: wrong; 4: wrong

regular multiclass classification: only right or wrong

wrong cost: 1; right cost: 0

prediction error of h on some (x, y):

classification cost = ⟦y ≠ h(x)⟧

regular multiclass classification:

well-studied, many good algorithms

(6)

Cost-Sensitive Multiclass Classification

Check Value Recognition

?

1: one-dollar mistake; 2: no mistake;

3: one-dollar mistake; 4: two-dollar mistake

cost-sensitive multiclass classification:

different costs for different mis-predictions

e.g. prediction error of h on some (x, y):

absolute cost = |y − h(x)|

cost-sensitive multiclass classification:

relatively newer, need more research


(7)

Cost-Sensitive Multiclass Classification

What is the Status of the Patient?

?

H7N9-infected cold-infected healthy

another classification problem

—grouping “patients” into different “status”

are all mis-prediction costs equal?

(8)

Cost-Sensitive Multiclass Classification

Patient Status Prediction

error measure = society cost

actual \ predicted    H7N9    cold    healthy
H7N9                     0    1000     100000
cold                   100       0       3000
healthy                100      30          0

H7N9 mis-predicted as healthy: very high cost

cold mis-predicted as healthy: high cost

cold correctly predicted as cold: no cost

human doctors consider costs of decision;

can computer-aided diagnosis do the same?


(9)

Cost-Sensitive Multiclass Classification

What is the Type of the Movie?

? romance fiction terror

error measure = non-satisfaction

customer 1, who hates romance but likes terror:

actual \ predicted    romance    fiction    terror
romance                     0          5       100

customer 2, who likes terror and romance:

actual \ predicted    romance    fiction    terror
romance                     0          5         3

different customers:

evaluate mis-predictions differently

(10)

Cost-Sensitive Multiclass Classification

Cost-Sensitive Multiclass Classification Tasks

movie classification with non-satisfaction

actual \ predicted         romance    fiction    terror
customer 1, romance              0          5       100
customer 2, romance              0          5         3

patient diagnosis with society cost

actual \ predicted    H7N9    cold    healthy
H7N9                     0    1000     100000
cold                   100       0       3000
healthy                100      30          0

check digit recognition with absolute cost C(y, h(x)) = |y − h(x)|


(11)

Cost-Sensitive Multiclass Classification

Cost Vector

cost vector c: a row of cost components

customer 1 on a romance movie: c = (0, 5, 100)

an H7N9 patient: c = (0, 1000, 100000)

absolute cost for digit 2: c = (1, 0, 1, 2)

“regular” classification cost for label 2: c = (1, 0, 1, 1)

regular classification:

special case of cost-sensitive classification
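As a small illustration, the cost vectors above can be generated programmatically; the helper names below are ours, not from the talk.

```python
# Sketch: cost vectors as plain lists, using the toy values from the slides.
# Regular classification is the special case whose cost vector for true label y
# has 0 in position y and 1 everywhere else.

def regular_cost_vector(y, K):
    """Cost vector of regular (0/1) classification for true label y (1-indexed)."""
    return [0 if k == y else 1 for k in range(1, K + 1)]

def absolute_cost_vector(y, K):
    """Cost vector of the absolute cost C(y, k) = |y - k|."""
    return [abs(y - k) for k in range(1, K + 1)]

print(regular_cost_vector(2, 4))   # the slides' (1, 0, 1, 1)
print(absolute_cost_vector(2, 4))  # the slides' (1, 0, 1, 2)
```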

(12)

Cost-Sensitive Multiclass Classification

Setup: Vector-Based Cost-Sensitive Classification

Given

N examples, each (input xn, label yn) ∈ X × {1, 2, . . . , K}, with cost vector cn ∈ R^K

— will assume cn[yn] = 0 = min_{1 ≤ k ≤ K} cn[k]

Goal

a classifier g(x) that pays a small cost c[g(x)] on future unseen example (x, y, c)

— will assume c[y] = 0 = cmin = min_{1 ≤ k ≤ K} c[k]

— note: y not really needed in evaluation

cost-sensitive classification: can express any finite-loss supervised learning task


(13)

Cost-Sensitive Multiclass Classification

Our Contribution

(Tu and Lin, ICML 2010)

                  binary                            multiclass
regular           well-studied                      well-studied
cost-sensitive    known (Zadrozny et al., 2003)     ongoing (our works, among others)

a theoretic and algorithmic study of cost-sensitive classification, which ...

introduces a methodology to reduce cost-sensitive classification to regression

provides strong theoretical support for the methodology

leads to a promising algorithm with superior experimental results

will describe the methodology and an algorithm

(14)

Cost-Sensitive Classification by Regression

Key Idea: Cost Estimator

Goal

a classifier g(x) that pays a small cost c[g(x)] on future unseen example (x, y , c)

if every c[k] known: optimal g(x) = argmin_{1 ≤ k ≤ K} c[k]

if rk(x) ≈ c[k] well: approximately good gr(x) = argmin_{1 ≤ k ≤ K} rk(x)

how to get cost estimator rk? regression


(15)

Cost-Sensitive Classification by Regression

Cost Estimator by Per-class Regression

Given

N examples, each (input xn, label yn, cost cn) ∈ X × {1, 2, . . . , K} × R^K

per-class regression data: for each k, fit rk on the pairs (xn, cn[k]):

for r1: (x1, 0), (x2, 1), . . . , (xN, 6)
for r2: (x1, 2), (x2, 3), . . . , (xN, 1)
. . .
for rK: (x1, 1), (x2, 5), . . . , (xN, 0)

want: rk(x) ≈ c[k] for all future (x, y, c) and k

(16)

Cost-Sensitive Classification by Regression

The Reduction Framework

cost-sensitive examples (xn, yn, cn)  ⇒  regression examples (Xn,k, Yn,k), k = 1, . . . , K  ⇒  regression algorithm  ⇒  regressors rk(x), k ∈ {1, . . . , K}  ⇒  cost-sensitive classifier gr(x)

1 transform cost-sensitive examples (xn, yn, cn) to regression examples (Xn,k, Yn,k) = (xn, cn[k])

2 use your favorite algorithm on the regression examples and get estimators rk(x)

3 for each new input x, predict its class using gr(x) = argmin_{1 ≤ k ≤ K} rk(x)

the reduction-to-regression framework:

systematic & easy to implement
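A minimal sketch of steps 1–3, assuming closed-form ridge regression as the "favorite algorithm" (any regressor would do); the function names are illustrative, not from the talk.

```python
import numpy as np

def ridge_fit(X, y, lam=1e-3):
    """Closed-form ridge regression; bias handled via an appended 1-column."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.linalg.solve(Xb.T @ Xb + lam * np.eye(Xb.shape[1]), Xb.T @ y)

def osr_train(X, C):
    """X: (N, d) inputs; C: (N, K) cost vectors. Fit one regressor per class k."""
    return [ridge_fit(X, C[:, k]) for k in range(C.shape[1])]

def osr_predict(weights, x):
    """g_r(x) = argmin_k r_k(x), returned as a 1-indexed label."""
    xb = np.append(x, 1.0)
    return int(np.argmin([w @ xb for w in weights])) + 1
```

On a toy 1-D set with regular (0/1) cost vectors, the learned classifier recovers the obvious labels, e.g. `osr_predict(osr_train(X, C), x)`.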


(17)

Cost-Sensitive Classification by Regression

Theoretical Guarantees (1/2)

gr(x) = argmin_{1 ≤ k ≤ K} rk(x)

Theorem (Absolute Loss Bound)

For any set of cost estimators {rk}_{k=1}^{K} and for any example (x, y, c) with c[y] = 0,

c[gr(x)] ≤ Σ_{k=1}^{K} |rk(x) − c[k]|.

low-cost classifier ⇐ accurate estimator
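The bound can be sanity-checked numerically (an illustration, not a proof): draw random cost vectors with c[y] = 0 and arbitrary estimator outputs, and verify the inequality holds.

```python
import numpy as np

# Random check of the Absolute Loss Bound:
# c[g_r(x)] <= sum_k |r_k(x) - c[k]| whenever c[y] = 0 = min_k c[k].
rng = np.random.default_rng(0)
for _ in range(1000):
    K = 5
    c = rng.uniform(0, 10, K)
    c[rng.integers(K)] = 0.0          # force c[y] = 0, the minimum cost
    r = rng.uniform(-2, 12, K)        # arbitrary estimator outputs r_k(x)
    g = int(np.argmin(r))             # g_r(x) = argmin_k r_k(x)
    assert c[g] <= np.abs(r - c).sum() + 1e-12
print("absolute loss bound held on all random trials")
```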

(18)

Cost-Sensitive Classification by Regression

Theoretical Guarantees (2/2)

gr(x) = argmin_{1 ≤ k ≤ K} rk(x)

Theorem (Squared Loss Bound)

For any set of cost estimators {rk}_{k=1}^{K} and for any example (x, y, c) with c[y] = 0,

c[gr(x)] ≤ √( 2 Σ_{k=1}^{K} (rk(x) − c[k])² ).

applies to common least-squares regression


(19)

Cost-Sensitive Classification by Regression

A Pictorial Proof

c[gr(x)] ≤ Σ_{k=1}^{K} |rk(x) − c[k]|

assume c ordered and not degenerate: y = 1; 0 = c[1] < c[2] ≤ · · · ≤ c[K]

assume mis-prediction gr(x) = 2: r2(x) = min_{1 ≤ k ≤ K} rk(x) ≤ r1(x)

[number line: c[1], c[2], c[3], . . . , c[K], with r2(x) ≤ r1(x) and r3(x), . . . , rK(x) marked]

then, with ∆1 ≡ r1(x) − c[1] and ∆2 ≡ c[2] − r2(x),

c[gr(x)] = c[2] − c[1] ≤ ∆1 + ∆2 ≤ Σ_{k=1}^{K} |rk(x) − c[k]|    (using c[1] = 0)

(20)

Cost-Sensitive Classification by Regression

An Even Closer Look

let ∆1 ≡ r1(x) − c[1] and ∆2 ≡ c[2] − r2(x)

1  ∆1 ≥ 0 and ∆2 ≥ 0: c[2] ≤ ∆1 + ∆2
2  ∆1 ≤ 0 and ∆2 ≥ 0: c[2] ≤ ∆2
3  ∆1 ≥ 0 and ∆2 ≤ 0: c[2] ≤ ∆1

c[2] ≤ max(∆1, 0) + max(∆2, 0) ≤ |∆1| + |∆2|

[number lines for the three cases, marking r2(x), r1(x), ∆1, ∆2]


(21)

Cost-Sensitive Classification by Regression

Tighter Bound with One-sided Loss

Define one-sided loss ξk ≡ max(∆k, 0) with

∆k ≡ rk(x) − c[k]  if c[k] = cmin
∆k ≡ c[k] − rk(x)  if c[k] ≠ cmin

Intuition

c[k] = cmin: wish to have rk(x) ≤ c[k]

c[k] ≠ cmin: wish to have rk(x) ≥ c[k]

— both wishes same as ∆k ≤ 0 and hence ξk = 0

One-sided Loss Bound:

c[gr(x)] ≤ Σ_{k=1}^{K} ξk ≤ Σ_{k=1}^{K} |∆k|
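A direct transcription of ξk, assuming costs and estimates come as plain Python lists (the function name is ours); the final assertion illustrates that the one-sided bound is never looser than the absolute one at a point.

```python
# One-sided loss xi_k = max(delta_k, 0), with the sign of delta_k flipped
# depending on whether c[k] attains the minimum cost c_min.

def one_sided_losses(r, c):
    c_min = min(c)
    xis = []
    for rk, ck in zip(r, c):
        delta = (rk - ck) if ck == c_min else (ck - rk)
        xis.append(max(delta, 0.0))
    return xis

# toy check: the one-sided bound is at most the absolute bound
r = [0.5, 2.0, 1.5]
c = [0.0, 3.0, 1.0]
xi = one_sided_losses(r, c)
assert sum(xi) <= sum(abs(rk - ck) for rk, ck in zip(r, c))
```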

(22)

Cost-Sensitive Classification by Regression

The Improved Reduction Framework

cost-sensitive examples (xn, yn, cn)  ⇒  regression examples (Xn,k, Yn,k, Zn,k), k = 1, . . . , K  ⇒  one-sided regression algorithm  ⇒  regressors rk(x), k ∈ {1, . . . , K}  ⇒  cost-sensitive classifier gr(x)

1 transform cost-sensitive examples (xn, yn, cn) to regression examples (Xn,k, Yn,k, Zn,k)

2 use a one-sided regression algorithm to get estimators rk(x)

3 for each new input x, predict its class using gr(x) = argmin_{1 ≤ k ≤ K} rk(x)

the reduction-to-OSR framework:

need a good OSR algorithm


(23)

Cost-Sensitive Classification by Regression

Regularized One-sided Hyper-linear Regression

Given

(Xn,k, Yn,k, Zn,k) = (xn, cn[k], 2⟦cn[k] = cn[yn]⟧ − 1)

Training Goal

all training ξn,k = max( Zn,k · (rk(Xn,k) − Yn,k), 0 ) small — will drop k

min_{w,b}  λ/2 ⟨w, w⟩ + Σ_{n=1}^{N} ξn   to get rk(x) = ⟨w, φ(x)⟩ + b

(24)

Cost-Sensitive Classification by Regression

One-sided Support Vector Regression

Regularized One-sided Hyper-linear Regression

min_{w,b}  λ/2 ⟨w, w⟩ + Σ_{n=1}^{N} ξn,   ξn = max( Zn · (rk(xn) − Yn), 0 )

Standard Support Vector Regression

min_{w,b}  1/(2C) ⟨w, w⟩ + Σ_{n=1}^{N} (ξn∨ + ξn∧),

ξn∨ = max( +1 · (rk(xn) − Yn − ε), 0 ),   ξn∧ = max( −1 · (rk(xn) − Yn + ε), 0 )

OSR-SVM = SVR + (ε → 0) + (keep ξn∨ or ξn∧ by Zn)
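A sketch of the training objective, solved here by subgradient descent on a linear model (an assumption: the talk instead solves an SVM-style formulation, OSR-SVM); Z[n] = +1 when c_n[k] is the minimum cost (push r_k(x_n) below Y[n]) and −1 otherwise (push it above).

```python
import numpy as np

def osr_subgradient(X, Y, Z, lam=0.01, lr=0.05, epochs=200):
    """Minimize lam/2 <w,w> + sum_n max(Z_n * (r(x_n) - Y_n), 0) by subgradient steps."""
    N, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margin = Z * (X @ w + b - Y)          # xi_n = max(margin_n, 0)
        active = margin > 0                   # examples with nonzero one-sided loss
        gw = lam * w + (Z[active, None] * X[active]).sum(axis=0)
        gb = Z[active].sum()
        w -= lr * gw / N
        b -= lr * gb / N
    return w, b
```

With two conflicting examples (one pushing the output below 1, one pushing it above 1), the learned regressor settles near the crossover value 1.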


(25)

Cost-Sensitive Classification by Regression

OSR versus Other Reductions

OSR: K regressors

How unlikely (costly) does the example belong to class k ?

Filter Tree (FT): K − 1 binary classifiers

Is the lowest cost within labels{1, 4} or {2, 3}?

Is the lowest cost within label{1} or {4}?

Is the lowest cost within label{2} or {3}?

Weighted All Pairs (WAP): K(K − 1)/2 binary classifiers

is c[1] or c[4] lower?

(26)

Cost-Sensitive Classification by Regression

OSR-SVM on Semi-Real Data

[bar chart (Tu and Lin, ICML 2010): avg. test cost of OSR vs OVA on ten semi-real datasets]

OSR: a cost-sensitive extension of OVA

OVA: regular SVM

OSR often significantly better than OVA


(27)

Cost-Sensitive Classification by Regression

OSR versus FT on Semi-Real Data

[bar chart (Tu and Lin, ICML 2010): avg. test cost of OSR vs FT on ten semi-real datasets]

OSR (per-class): O(K) training, O(K) prediction

FT (tournament): O(K) training, O(log₂ K) prediction

FT faster, but OSR better performing

(28)

Cost-Sensitive Classification by Regression

OSR versus WAP on Semi-Real Data

[bar chart (Tu and Lin, ICML 2010): avg. test cost of OSR vs WAP on ten semi-real datasets]

OSR (per-class): O(K) training, O(K) prediction

WAP (pairwise): O(K²) training, O(K²) prediction

OSR faster and comparable performance


(29)

Cost-Sensitive Classification by Regression

Six Years after OSR-SVM

(Chung, Lin and Yang, IJCAI 2016)

OSR-SVM

min_{w,b}  λ/2 ⟨w, w⟩ + Σ_{n=1}^{N} ξn   with rk(x) = ⟨w, φ(x)⟩ + b

ξn = max( Zn · (rk(xn) − Yn), 0 )

CS Deep NNet (CSDNN)

min  NNet regularizer + Σ_{n=1}^{N} δn   with rk(x) = NNet(x)

δn = ln(1 + exp( Zn · (rk(xn) − Yn) ))

CSDNN: world’s first cost-sensitive deep model via a smoother upper bound — δn ≥ ξn because ln(1 + exp(•)) ≥ max(•, 0)

δn used in both pretraining & training for better NNet feature extraction

concept of reduction-to-OSR still useful after 6 years
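The relationship δn ≥ ξn is easy to verify numerically; the softplus implementation below uses the standard numerically stable form (the function names are ours).

```python
import math

# delta(z) = ln(1 + exp(z)) is a smooth upper bound of xi(z) = max(z, 0),
# which is what makes it usable for gradient-based (pre)training of a NNet.

def xi(z):
    return max(z, 0.0)

def delta(z):
    # numerically stable softplus: log1p(exp(-|z|)) + max(z, 0)
    return math.log1p(math.exp(-abs(z))) + max(z, 0.0)

for z in [-5.0, -0.1, 0.0, 0.1, 5.0]:
    assert delta(z) >= xi(z)
```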

(30)

Cost-and-Error-Sensitive Classification with Bioinformatics Application

A Real Medical Application: Classifying Bacteria

The Problem

by human doctors: different treatments ⇐⇒ serious costs

cost matrix averaged from two doctors:

        Ab  Ecoli  HI  KP  LM  Nm  Psa  Spn  Sa  GBS
Ab       0      1  10   7   9   9    5    8   9    1
Ecoli    3      0  10   8  10  10    5   10  10    2
HI      10     10   0   3   2   2   10    1   2   10
KP       7      7   3   0   4   4    6    3   3    8
LM       8      8   2   4   0   5    8    2   1    8
Nm       3     10   9   8   6   0    8    3   6    7
Psa      7      8  10   9   9   7    0    8   9    5
Spn      6     10   7   7   4   4    9    0   4    7
Sa       7     10   6   5   1   3    9    2   0    7
GBS      2      5  10   9   8   6    5    6   8    0

is cost-sensitive classification realistic?


(31)

Cost-and-Error-Sensitive Classification with Bioinformatics Application

OSR versus OVO/CSOVO(WAP)/FT on Bacteria Data

(Jan et al., BIBM 2011)

Are cost-sensitive algorithms great?

[bar chart (RBF kernel): cost of OVOSVM vs csOSRSVM, csOVOSVM, csFTSVM — cost-sensitive algorithms perform better than the regular algorithm]

Jan et al. (Academia Sinica), Cost-Sensitive Classification on SERS, October 31, 2011

OSR best: cost-sensitive classification is helpful

(32)

Cost-and-Error-Sensitive Classification with Bioinformatics Application

Soft Cost-sensitive Classification

The Problem

[scatter plot: Cost vs Error (%) of various classifiers]

cost-sensitive classifier: low cost but high error

traditional classifier: low error but high cost

how can we get theblueclassifiers?: low error and low cost

cost-and-error-sensitive: more suitable for medical needs


(33)

Cost-and-Error-Sensitive Classification with Bioinformatics Application

Improved OSR for Cost and Error on Semi-Real Data

key idea (Jan et al., KDD 2012): consider a ‘modified’ cost that mixes the original cost and the ‘regular cost’
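A hypothetical sketch of such a mixture as a convex combination with weight alpha; the exact mixing scheme is in Jan et al. (KDD 2012), and both alpha and the function name here are illustrative.

```python
# Illustrative (not the paper's exact scheme): mix the original cost vector c
# for true label y (1-indexed) with the regular 0/1 cost, weighted by alpha.

def mixed_cost_vector(c, y, alpha=0.5):
    K = len(c)
    regular = [0.0 if k == y else 1.0 for k in range(1, K + 1)]
    return [(1 - alpha) * ck + alpha * rk for ck, rk in zip(c, regular)]

print(mixed_cost_vector([0, 1000, 100000], 1, alpha=0.5))
```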

[paired plots of cost and error on datasets: iris, wine, glass, vehicle, vowel, segment, dna, satimage, usps, zoo, splice, ecoli, soybean]

improves other cost-sensitive classification algorithms, too

(34)

Cost-and-Error-Sensitive Classification with Bioinformatics Application

Conclusion

reduction from cost-sensitive classification to regression:

via cost estimation

one-sided regression with solid theoretical guarantee

superior experimental results with OSR-SVM

OSR for deep learning: OSR-SVM → CSDNN

OSR for medical application: towards cost-and-error-sensitive

Thank you. Questions?

