
One-sided Support Vector Regression for Multiclass Cost-sensitive Classification


(1)

One-sided Support Vector Regression for Multiclass Cost-sensitive Classification

Han-Hsing Tu and Hsuan-Tien Lin
National Taiwan University

June 23, 2010

(2)

Cost-sensitive Classification

Binary Cost-sensitive Classification

? cold-infected or healthy

actual \ predicted | cold  | healthy
cold               | 0     | C_{-1}
healthy            | C_{1} | 0

binary, cost-matrix based

(3)

Cost-sensitive Classification

Multiclass Cost-sensitive Classification

? H1N1-infected, cold-infected, or healthy

error measure = society cost

actual \ predicted | H1N1 | cold | healthy
H1N1               | 0    | 1000 | 100000
cold               | 100  | 0    | 3000
healthy            | 100  | 30   | 0

human doctors consider the costs of their decisions;
want computer-aided diagnosis to behave similarly

multiclass, cost-matrix based

(4)

Cost-sensitive Classification

From Cost Matrix to Cost Vector

with the actual underlying status known:

prediction   | H1N1 | cold | healthy
society cost | 0    | 1000 | 100000

only a "row" is needed per example: the cost vector c
an H1N1 patient: c = (0, 1000, 100000)
a healthy patient: c = (100, 30, 0)

"regular" classification, cost for label 2: c = (1, 0, 1, 1)
binary cost-sensitive classification, cost for label -1: c = (0, C_{-1})

multiclass, cost-vector based: a very general setup
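
As a small illustration of how cost vectors arise (a hypothetical sketch; the matrix values are those of the H1N1 example above, and the helper names are made up), one simply takes the row of the cost matrix for the actual class, or builds the 0/1 vector of regular classification:

```python
import numpy as np

# Society-cost matrix from the H1N1/cold/healthy example
# (rows = actual class, columns = predicted class).
COST_MATRIX = np.array([
    [0,   1000, 100000],   # actual: H1N1
    [100, 0,    3000],     # actual: cold
    [100, 30,   0],        # actual: healthy
])

def cost_vector_from_matrix(actual_class):
    """The cost vector c is just the cost-matrix row of the actual class."""
    return COST_MATRIX[actual_class]

def regular_classification_cost_vector(label, num_classes):
    """Regular classification as a special case: cost 1 everywhere except the true label."""
    c = np.ones(num_classes)
    c[label] = 0.0
    return c

print(cost_vector_from_matrix(0))                # [0 1000 100000]  (an H1N1 patient)
print(regular_classification_cost_vector(1, 4))  # [1. 0. 1. 1.]    (label 2 of 4, 0-indexed)
```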

(5)

Cost-sensitive Classification

Cost-sensitive Classification Setup

Given

N examples, each (input $x_n$, label $y_n$, cost $c_n$) $\in \mathcal{X} \times \{1, 2, \ldots, K\} \times \mathbb{R}^K$

will assume $c_n[y_n] = 0 = \min_{1 \le k \le K} c_n[k]$

Goal

a classifier $g(x)$ that pays a small cost $c[g(x)]$ on a future unseen example $(x, y, c)$

will assume $c[y] = 0 = \min_{1 \le k \le K} c[k] = c_{\min}$
note: $y$ is not really needed in evaluation

cost-sensitive classification: can express any finite-loss supervised learning task

(6)

Cost-sensitive Classification

Our Contributions

decomposition   | per-class | pair-wise | tournament | err. correcting
regular         | OVA       | OVO       | FT         | ECOC
cost-sensitive  | our work  | WAP       | FT         | SECOC

a theoretical and algorithmic study of multiclass cost-sensitive classification, which ...

introduces a methodology to reduce cost-sensitive classification to regression
couples the methodology with a novel regression loss for strong theoretical support
leads to a promising SVM-based algorithm with superior experimental results

(7)

Design and Analysis

Key Idea: Cost Estimator

Goal

a classifier $g(x)$ that pays a small cost $c[g(x)]$ on a future unseen example $(x, y, c)$

if every $c[k]$ is known: best $g(x) = \operatorname*{argmin}_{1 \le k \le K} c[k]$

if $r_k(x) \approx c[k]$ well: approximately good $g_r(x) = \operatorname*{argmin}_{1 \le k \le K} r_k(x)$

how to get the cost estimator $r_k$? regression

(8)

Design and Analysis

Cost Estimator by Regression

Given

N examples, each (input $x_n$, label $y_n$, cost $c_n$) $\in \mathcal{X} \times \{1, 2, \ldots, K\} \times \mathbb{R}^K$

input  c_n[1]   |   input  c_n[2]   |  ...  |   input  c_n[K]
 x_1     0      |    x_1     2      |  ...  |    x_1     1
 x_2     1      |    x_2     3      |  ...  |    x_2     5
 ...            |    ...            |  ...  |    ...
 x_N     6      |    x_N     1      |  ...  |    x_N     0
 (regressor r_1)     (regressor r_2)             (regressor r_K)

want: $r_k(x) \approx c[k]$ for all future $(x, y, c)$ and $k$

good $r_k$ =⇒ good $g_r$?
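
A minimal sketch of this split (hypothetical helper name, not from the slides): each class k simply pairs every input with the k-th cost entry to form one ordinary regression training set.

```python
import numpy as np

def make_regression_datasets(X, costs):
    """Split cost-sensitive data into K per-class regression tasks.

    X:     (N, d) array of inputs x_n
    costs: (N, K) array of cost vectors c_n
    Returns a list of K (inputs, targets) pairs; the k-th pair is the training
    set for the cost estimator r_k, with targets c_n[k].
    """
    K = costs.shape[1]
    return [(X, costs[:, k]) for k in range(K)]
```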

(9)

Design and Analysis

Absolute Loss Bound

$$g_r(x) = \operatorname*{argmin}_{1 \le k \le K} r_k(x)$$

Theorem

For any set of regressors (cost estimators) $\{r_k\}_{k=1}^{K}$ and for any example $(x, y, c)$ with $c[y] = 0$,

$$c[g_r(x)] \le \sum_{k=1}^{K} \bigl| r_k(x) - c[k] \bigr| .$$

good $r_k$ =⇒ good $g_r$? YES!
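
A quick numeric sanity check of the bound (a made-up three-class example, not from the slides): take $c = (0, 2, 5)$ and suppose the regressors output $r(x) = (1.5, 1.0, 6.0)$; then $g_r$ mis-predicts class 2, yet the bound holds:

$$c[g_r(x)] = c[2] = 2 \;\le\; \sum_{k=1}^{3} \bigl| r_k(x) - c[k] \bigr| = |1.5 - 0| + |1.0 - 2| + |6.0 - 5| = 3.5 .$$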

(10)

Design and Analysis

A Pictorial Proof

$$c[g_r(x)] \le \sum_{k=1}^{K} \bigl| r_k(x) - c[k] \bigr|$$

assume $c$ ordered and not degenerate: $y = 1$; $0 = c[1] < c[2] \le \cdots \le c[K]$

assume mis-prediction $g_r(x) = 2$: $r_2(x) = \min_{1 \le k \le K} r_k(x) \le r_1(x)$

[number line showing $c[1] < c[2] \le c[3] \le \cdots \le c[K]$ together with the positions of $r_1(x), r_2(x), r_3(x), \ldots, r_K(x)$]

the cost paid is then

$$c[2] - \underbrace{c[1]}_{0} \;\le\; \Delta_1 + \Delta_2 \;\le\; \sum_{k=1}^{K} \bigl| r_k(x) - c[k] \bigr|$$

(11)

Design and Analysis

A Closer Look

let $\Delta_1 \equiv r_1(x) - c[1]$ and $\Delta_2 \equiv c[2] - r_2(x)$

case $\Delta_1 \ge 0$ and $\Delta_2 \ge 0$: $c[2] \le \Delta_1 + \Delta_2$
case $\Delta_1 \le 0$ and $\Delta_2 \ge 0$: $c[2] \le \Delta_2$
case $\Delta_1 \ge 0$ and $\Delta_2 \le 0$: $c[2] \le \Delta_1$

(the remaining case $\Delta_1 \le 0$ and $\Delta_2 \le 0$ cannot occur: it would give $r_2(x) \ge c[2] > 0 \ge r_1(x)$, contradicting $r_2(x) \le r_1(x)$)

in all cases,
$$c[2] \le \max(\Delta_1, 0) + \max(\Delta_2, 0) \le |\Delta_1| + |\Delta_2|$$

(12)

Design and Analysis

Tighter Bound with One-sided Loss

Define the one-sided loss $\xi_k \equiv \max(\Delta_k, 0)$, with

$$\Delta_k \equiv \begin{cases} r_k(x) - c[k] & \text{if } c[k] = c_{\min} \\ c[k] - r_k(x) & \text{if } c[k] \ne c_{\min} \end{cases}$$

Intuition: $\xi_k = 0$ encodes ...

when $c[k] = c_{\min}$: wish to have $r_k(x) \le c[k]$
when $c[k] \ne c_{\min}$: wish to have $r_k(x) \ge c[k]$

$$c[g_r(x)] \;\le\; \underbrace{\sum_{k=1}^{K} \xi_k}_{\text{one-sided loss bound}} \;\le\; \underbrace{\sum_{k=1}^{K} |\Delta_k|}_{\text{absolute loss bound}}$$
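
A minimal sketch (hypothetical function names) for checking the chain of inequalities on concrete numbers; it assumes $c_{\min} = 0$ as in the setup slide and reuses the toy values from the earlier sanity check:

```python
import numpy as np

def one_sided_loss(r, c):
    """Sum of xi_k = max(Delta_k, 0), with the sign of Delta_k set by whether c[k] is minimal."""
    delta = np.where(c == c.min(), r - c, c - r)
    return np.maximum(delta, 0.0).sum()

def absolute_loss(r, c):
    """Sum of |r_k(x) - c[k]|."""
    return np.abs(r - c).sum()

c = np.array([0.0, 2.0, 5.0])
r = np.array([1.5, 1.0, 6.0])
realized = c[np.argmin(r)]                                  # cost actually paid by g_r
print(realized, one_sided_loss(r, c), absolute_loss(r, c))  # 2.0 <= 2.5 <= 3.5
```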

(13)

Design and Analysis

The Proposed Reduction Framework

1. transform cost-sensitive examples $(x_n, y_n, c_n)$ into regression examples
   $\bigl(X_n^{(k)}, Y_n^{(k)}, Z_n^{(k)}\bigr) = \bigl(x_n,\, c_n[k],\, {+1}/{-1}\bigr)$ for $k = 1, \ldots, K$
   ($Z_n^{(k)} = +1$ when $c_n[k] = c_{\min}$ and $-1$ otherwise, matching the one-sided loss)

2. use a one-sided regression algorithm to get regressors $r_k(x)$

3. for each new input $x$, predict its class using $g_r(x) = \operatorname*{argmin}_{1 \le k \le K} r_k(x)$

how to design a good OSR algorithm?

[flow diagram]
cost-sensitive examples $(x_n, y_n, c_n)$
  ⇒ regression examples $(X_{n,k}, Y_{n,k}, Z_{n,k})$, $k = 1, \ldots, K$
  ⇒ one-sided regression algorithm
  ⇒ regressors $r_k(x)$, $k \in \{1, \ldots, K\}$
  ⇒ cost-sensitive classifier $g_r(x)$
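
The whole framework fits in a few lines; this is a hypothetical sketch in which `fit_one_sided_regressor` stands for any one-sided regression routine (for instance the OSR-SVM of the next slides) that takes inputs, targets, and direction labels and returns a callable regressor:

```python
import numpy as np

def reduce_to_osr(X, costs, fit_one_sided_regressor):
    """Steps 1 + 2: build (X_{n,k}, Y_{n,k}, Z_{n,k}) and train one regressor per class."""
    N, K = costs.shape
    c_min = costs.min(axis=1, keepdims=True)   # = 0 per example under the setup assumption
    Z = np.where(costs == c_min, 1.0, -1.0)    # +1: want r_k <= c[k];  -1: want r_k >= c[k]
    return [fit_one_sided_regressor(X, costs[:, k], Z[:, k]) for k in range(K)]

def predict(regressors, x):
    """Step 3: g_r(x) = argmin_k r_k(x)."""
    return int(np.argmin([r_k(x) for r_k in regressors]))
```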

(14)

The Proposed Algorithm

Support Vector Machinery for One-sided Regression

Given

$\bigl(X_{n,k}, Y_{n,k}, Z_{n,k}\bigr) = \bigl(x_n,\, c_n[k],\, {+1}/{-1}\bigr)$

Training Goal

all training $\xi_{n,k} = \max\Bigl(\underbrace{Z_{n,k}\bigl(r_k(X_{n,k}) - Y_{n,k}\bigr)}_{\Delta_{n,k}},\; 0\Bigr)$ small

OSR-SVM for cost-sensitive classification:

$$\min_{w_k, b_k} \; \frac{1}{2}\langle w_k, w_k\rangle + C \sum_{n=1}^{N} \xi_{n,k}$$

to get $r_k(X) = \langle w_k, \phi(X)\rangle + b_k$
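
The kernel formulation above is what the talk proposes; as a rough illustration only, here is a minimal linear-kernel sketch trained by stochastic subgradient descent (function name, learning rate, and epoch count are hypothetical choices, not from the slides). It could serve as the `fit_one_sided_regressor` in the earlier pipeline sketch.

```python
import numpy as np

def train_osr_linear(X, Y, Z, C=1.0, epochs=50, lr=0.01, seed=0):
    """Approximately minimize 1/2 <w,w> + C * sum_n max(Z_n * (<w,x_n> + b - Y_n), 0).

    X: (N, d) inputs, Y: (N,) cost targets c_n[k], Z: (N,) direction labels in {+1, -1}.
    Returns (w, b) so that r_k(x) = <w, x> + b.
    """
    rng = np.random.default_rng(seed)
    N, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for n in rng.permutation(N):
            # the one-sided slack is active only when Z_n * (r(x_n) - Y_n) > 0
            active = Z[n] * (X[n] @ w + b - Y[n]) > 0
            grad_w = w / N + (C * Z[n] * X[n] if active else 0.0)
            grad_b = C * Z[n] if active else 0.0
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b
```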

(15)

The Proposed Algorithm

One-sided Support Vector Regression

Standard Support Vector Regression

$$\min_{w, b} \; \frac{1}{2}\langle w, w\rangle + C \sum_{n=1}^{N} \bigl(\xi_n^{+} + \xi_n^{-}\bigr)$$

$$\xi_n^{+} = \max\bigl(+\bigl(r_k(X_n) - Y_n - \epsilon\bigr),\, 0\bigr), \qquad
\xi_n^{-} = \max\bigl(-\bigl(r_k(X_n) - Y_n + \epsilon\bigr),\, 0\bigr)$$

One-sided Support Vector Regression (for each k):

$$\min_{w, b} \; \frac{1}{2}\langle w, w\rangle + C \sum_{n=1}^{N} \xi_n$$

$$\xi_n = \max\bigl(Z_n \cdot \bigl(r_k(X_n) - Y_n\bigr),\, 0\bigr)$$

OSR-SVM: SVR + ($\epsilon = 0$) + (keep $\xi^{+}$ or $\xi^{-}$ by $Z$)
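
Filling in that last line: with $\epsilon = 0$ the two SVR slacks are just the positive and negative parts of $r_k(X_n) - Y_n$, and keeping only the one selected by $Z_n$ recovers the one-sided loss defined earlier:

$$\epsilon = 0:\quad \xi_n^{+} = \max\bigl(r_k(X_n) - Y_n,\, 0\bigr), \qquad \xi_n^{-} = \max\bigl(Y_n - r_k(X_n),\, 0\bigr),$$

$$\text{keep } \xi_n^{+} \text{ if } Z_n = +1,\quad \xi_n^{-} \text{ if } Z_n = -1 \quad\Longrightarrow\quad \xi_n = \max\bigl(Z_n\bigl(r_k(X_n) - Y_n\bigr),\, 0\bigr).$$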

(16)

The Proposed Algorithm

OSR-SVM versus OVA-SVM: Formulations

OSR-SVM: $g_r(x) = \operatorname*{argmin}_{1 \le k \le K} r_k(x)$

$$\min_{w_k, b_k} \; \frac{1}{2}\langle w_k, w_k\rangle + C \sum_{n=1}^{N} \xi_{n,k}
\quad \text{with } r_k(X) = \langle w_k, \phi(X)\rangle + b_k$$

$$\xi_{n,k} = \max\bigl(Z_{n,k} \cdot \bigl(r_k(X_{n,k}) - Y_{n,k}\bigr),\, 0\bigr)$$

OVA-SVM ($-1$ for the correct class): $g_r(x) = \operatorname*{argmin}_{1 \le k \le K} r_k(x)$ with

$$\xi_{n,k} = \max\bigl(Z_{n,k} \cdot r_k(X_{n,k}) + 1,\, 0\bigr)$$

OVA-SVM: a special case that replaces $Y_{n,k}$ (i.e. $c_n[k]$) by $-Z_{n,k}$
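
The last claim follows by direct substitution of $Y_{n,k} = -Z_{n,k}$ into the OSR slack, using $Z_{n,k}^2 = 1$:

$$\xi_{n,k} = \max\bigl(Z_{n,k}\bigl(r_k(X_{n,k}) + Z_{n,k}\bigr),\, 0\bigr) = \max\bigl(Z_{n,k}\, r_k(X_{n,k}) + 1,\, 0\bigr),$$

which is the familiar one-versus-all hinge loss.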

(17)

Experiments

OSR-SVM versus OVA-SVM: Experiments

[bar chart: average test cost of OSR and OVA on ten benchmark data sets (ir., wi., gl., ve., vo., se., dn., sa., us., le.)]

OSR: a cost-sensitive extension of OVA
OVA: cost-insensitive SVM

OSR often significantly better than OVA

(18)

Experiments

OSR-SVM versus WAP/FT/SECOC-SVM

[bar chart: average test cost of OSR, WAP, FT, and SECOC on the same ten data sets]

OSR (per-class): O(K) train/pred
WAP (pair-wise): O(K^2) train/pred
FT (tournament): O(K) train; O(log K) pred
SECOC (err. correct): big O(K) train/pred

speed: FT > OSR > SECOC > WAP
performance: OSR ≈ WAP > FT > SECOC

(19)

Conclusion

Conclusion

reduction to regression: a simple way of designing cost-sensitive classification algorithms

theoretical guarantee: absolute and one-sided bounds

algorithmic use: a novel and simple algorithm, OSR-SVM

experimental performance of OSR-SVM: leading in the SVM family

Thank you. Questions?
