
Improving Ranking Performance with Cost-sensitive Ordinal Classification via Regression

(1)

Improving Ranking Performance with

Cost-sensitive Ordinal Classification via Regression

Yu-Xun Ruan¹, Hsuan-Tien Lin¹, Ming-Feng Tsai²

¹National Taiwan University, ²National Chengchi University

Preference Learning @ EURO, July 10, 2012

(2)

Preference Ranking in Search Engine

not just for searching “good machine learning book”, but also for recommendation systems & other web services

(3)

Three Properties of Search-Engine Ranking

listwise with focus on top ranks
  query-oriented & personalized
  emphasis on highly-preferred (relevant) items

large scale
  both during training & testing
  e.g. Yahoo! Learning-To-Rank Challenge 2010: 473K training URLs, 166K test URLs

ordinal data
  labeled qualitatively by humans, e.g. {highly irrelevant, irrelevant, neutral, relevant, highly relevant}
  lack of quantitative info

search-engine ranking problem:

learning a ranker from large-scale ordinal data with focus on top ranks

(4)

Search-Engine Ranking Setup

Given

for query indices $q = 1, 2, \ldots, Q$,

a set of related documents $\{x_{q,i}\}_{i=1}^{N(q)}$

ordinal relevance $y_{q,i} \in \mathcal{Y} = \{0, 1, \ldots, K\}$ for each document $x_{q,i}$

with large $Q$ and $N(q)$

Goal

a ranker $r(x)$ that “accurately ranks” the top $x_{Q+1,i}$ from an unseen set of documents $\{x_{Q+1,i}\}$

how to evaluate accurate ranking around the top?

(5)

Expected Reciprocal Rank

(ERR; Chapelle et al., CIKM ’09)

Assumption: Choice Probability of Single Document

for any example (document $x$, rank $y$),

$$P(\text{user chooses document } x) = \frac{2^y - 1}{2^K}$$

Assumption: Stopping Probability of List of Documents

$$P(\text{user stops at position } i \text{ of list}) = P(\text{doesn't stop at pos. } i-1) \times P(\text{chooses document at pos. } i)$$

ERR: Total Discounted Stopping Probability of List of Documents

$$\mathrm{ERR}_q(r) \equiv \sum_{i=1}^{N(q)} \frac{1}{i} \, P(\text{user stops at position } i \text{ of the list ordered by } r)$$

large ERR ⇔ small $i$ matches large $P$ ⇔ good ranking around top
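A minimal sketch of how ERR can be computed for one query, assuming the relevance grades are already listed in the order produced by the ranker. The function name `expected_reciprocal_rank` and the example grades are illustrative, not taken from the slides. The probability that the user has not stopped before a position is accumulated as a running product over the earlier positions.

```python
def expected_reciprocal_rank(relevances, max_grade):
    """ERR of one ranked list.

    relevances: grades y in {0, ..., max_grade}, in ranked order.
    max_grade:  K, the largest possible grade.
    """
    err = 0.0
    p_not_stopped = 1.0  # probability the user has not stopped before this position
    for position, y in enumerate(relevances, start=1):
        p_choose = (2 ** y - 1) / 2 ** max_grade  # choice probability of this document
        err += p_not_stopped * p_choose / position  # discounted stopping probability
        p_not_stopped *= 1.0 - p_choose
    return err


# Placing the highly relevant document first gives a larger ERR.
good_order = expected_reciprocal_rank([4, 0], max_grade=4)
bad_order = expected_reciprocal_rank([0, 4], max_grade=4)
```

With K = 4, the list ordered as [4, 0] scores 15/16, while [0, 4] scores 15/32, which illustrates why large ERR corresponds to good ranking around the top.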

(6)

Possible Approach 1: LambdaRank

(Burges et al., NIPS ’06)

maximize ERR directly with non-smooth optimization on N(q)! list reorderings

Pros

respect top rank goal

respect ordinal nature of data

Cons

difficult optimization problem

challenging to apply on large-scale data

LambdaRank: a state-of-the-art approach, but possibly inefficient

(7)

Possible Approach 2: SVM-Rank

(Joachims, KDD ’02)

conduct listwise ranking by predicting pairwise preferences accurately

Pros

respect ordinal nature of data (w/ comparison)

somewhat applicable to large-scale data

Cons

all pairs equal, not respecting top rank goal

only somewhat applicable to large-scale data, because of $O(N^2)$ pairs

SVM-Rank: a baseline pairwise ranking approach, but possibly not the best for listwise

(8)

Possible Approach 3:

Direct Regression

(Cossock and Tong, COLT ’06)

conduct listwise ranking by predicting real-valued scores accurately

Pros

respect top rank goal by embedding it in regression loss

applicable to large-scale data

Cons

treats $y$ as numerical score, not respecting ordinal nature of data

Direct Regression: a simple pointwise ranking approach, but may be improved by taking ordinal property into account

(9)

Possible Approach 4:

Ordinal Classification

(McRank; Li et al., NIPS ’07)

conduct listwise ranking by predicting ordinal-valued ranks accurately

Pros

somewhat respect top rank goal

respect ordinal nature of data

applicable to large-scale data

Cons

only somewhat respect top rank goal, because of a loose bound in embedding the goal

McRank: a state-of-the-art pointwise ranking approach, but may be improved further towards top rank goal

(10)

Our Contributions

an algorithmic development on cost-sensitive ordinal classification via regression (COCR), which ...

systematically respects all three properties of search-engine ranking

algorithm          top rank  large scale  ordinal data
LambdaRank         ✓         ◦            ✓
SVM-Rank           ×         ◦            ✓
Direct Regression  ✓         ✓            ×
McRank             ◦         ✓            ✓
COCR               ✓         ✓            ✓

(✓ respected, ◦ somewhat respected, × not respected)

leads to promising experimental results

(11)

Overview of Cost-sensitive Ordinal Classification via Regression (COCR)

reduction from listwise ranking (ERR) to cost-sensitive ordinal classification (approximately)
  - aim for top rank and large scale data (like Direct Regression)

reduction from cost-sensitive ordinal classification to binary classification
  - aim for respecting ordinal data (like McRank)

reduction from binary classification to regression
  - aim for large scale data and avoiding discrete ties (like Direct Regression)

COCR: combine the benefits of Direct Regression and McRank

(12)

Ordinal Classification via Binary Classification

(Lin & Li, Neural Computation ’12)

desired pointwise ranking problem

$r(x)$ = What is the rank of the document $x$?

reduced problems

$g_k(x)$ = Is the rank of document $x$ greater than $k$?

train binary classifiers with $\{(x_{q,i}, [y_{q,i} > k])\}$

predict with a simple counting ranker

$$r_g(x) = \sum_{k=0}^{K-1} g_k(x)$$

simple and efficient

good theoretical guarantee:

absolutely good binary classifier ⟹ absolutely good ranker

relatively good binary classifier ⟹ relatively good ranker
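A small sketch of the reduction, assuming ranks in {0, ..., K}. The helper names `binary_labels` and `counting_ranker` are illustrative, and the threshold classifiers on a one-dimensional feature are hypothetical stand-ins for trained binary classifiers. Each rank is turned into K binary labels, and the predicted rank is the number of "greater than k" questions answered with yes.

```python
def binary_labels(y, K):
    """Binary targets [y > k] for k = 0, ..., K-1."""
    return [int(y > k) for k in range(K)]


def counting_ranker(classifiers, x):
    """r_g(x): count how many classifiers g_k answer 'rank of x is greater than k'."""
    return sum(g(x) for g in classifiers)


K = 4
# Hypothetical stand-ins for trained classifiers: g_k(x) = [x > k + 0.5].
classifiers = [lambda x, k=k: int(x > k + 0.5) for k in range(K)]

labels_for_rank_3 = binary_labels(3, K)            # [1, 1, 1, 0]
predicted_rank = counting_ranker(classifiers, 2.7)  # 3
```

Summing the binary labels of a rank recovers the rank itself, which is why counting is a natural way to decode the K binary answers.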

(13)

Ordinal Classification via Regression

desired pointwise ranking problem

$E(y \mid x)$ = What is the expected rank of the document $x$?

exploited by both Direct Regression and McRank

reduced problems

$\tilde{g}_k(x) = P(y > k \mid x)$ = What is the probability that the rank of document $x$ is greater than $k$?

train regressors with $\{(x_{q,i}, [y_{q,i} > k])\}$

predict with a simple counting estimator

$$E(y \mid x) = \sum_{k=0}^{K-1} \tilde{g}_k(x)$$

absolutely good regressor ⟹ absolutely good expected rank estimator
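The counting estimator rests on a standard identity for ranks taking values in {0, ..., K}. A short derivation, swapping the order of summation in the third line:

```latex
\begin{align*}
E(y \mid x) &= \sum_{j=0}^{K} j \, P(y = j \mid x) \\
            &= \sum_{j=1}^{K} \sum_{k=0}^{j-1} P(y = j \mid x) \\
            &= \sum_{k=0}^{K-1} \sum_{j=k+1}^{K} P(y = j \mid x) \\
            &= \sum_{k=0}^{K-1} P(y > k \mid x)
\end{align*}
```

So estimating each $P(y > k \mid x)$ with a regressor and summing the K estimates yields an estimate of the expected rank.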

(14)

Cost-sensitive Ordinal Classification via Regression

desired pointwise ranking problem

$E_c(y \mid x)$ = What is the biased expected rank of the document $x$ if a mis-ranking is penalized with a cost $c[r(x)]$?

for embedding the emphasis on top rank

reduced problems

$\tilde{g}_{k,w}(x)$ = What is the biased probability that the rank of document $x$ is greater than $k$ when a wrong answer is penalized with a weight $w_k$?

train regressors with $\{(x_{q,i}, [y_{q,i} > k], w_{q,i,k})\}$

predict with a simple counting estimator

$$E_c(y \mid x) = \sum_{k=0}^{K-1} \tilde{g}_{k,w}(x)$$

some good theoretical guarantees follow similarly

(15)

Optimistic ERR (oERR) Cost for COCR

desired listwise criteria

How to make ERR(r ) close to ERR(p), the ERR of perfect ranker?

embed criteria within cost

$$\mathrm{ERR}(p) - \mathrm{ERR}(r) \le \text{const} \cdot \sum_{i=1}^{N(q)} \left(2^{y_{q,i}} - 2^{r(x_{q,i})}\right)^2 + \Delta$$

$\Delta \approx 0$ if $r \approx p$ (optimistic)

then, $c[k] = \left(2^y - 2^k\right)^2$ embeds ERR

not a very tight bound, but better than nothing
  - heuristically used in some earlier works
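To make the oERR cost concrete, here is a minimal sketch that tabulates the cost of predicting each rank k for a document whose true rank is y, using the squared difference of powers of two. The function name `oerr_cost` is illustrative.

```python
def oerr_cost(y, K):
    """Cost vector c[k] = (2^y - 2^k)^2 for predictions k = 0, ..., K."""
    return [(2 ** y - 2 ** k) ** 2 for k in range(K + 1)]


# True rank 2 out of {0, ..., 4}.
costs = oerr_cost(2, 4)  # [9, 4, 0, 16, 144]
```

Because the grades enter through powers of two, mistakes involving high ranks are penalized far more than mistakes among low ranks, which is how the cost carries the emphasis on the top of the list.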

(16)

The Proposed Algorithm

Given

for query indices $q = 1, 2, \ldots, Q$,

a set of related documents $\{x_{q,i}\}_{i=1}^{N(q)}$

ordinal relevance $y_{q,i} \in \mathcal{Y} = \{0, 1, \ldots, K\}$ for each document $x_{q,i}$

with large $Q$ and $N(q)$

1 construct $\{(x_{q,i}, y_{q,i}, c[k])\}$ with oERR cost $c$

2 obtain $\{(x_{q,i}, [y_{q,i} > k], w_{q,i,k})\}$ by reduction to binary classification

3 train regressors $\tilde{g}_k(x)$ with $\{(x_{q,i}, [y_{q,i} > k], w_{q,i,k})\}$

4 predict (order) future document $x$ with $\sum_{k=0}^{K-1} \tilde{g}_k(x)$

systematic, simple, efficient, and takes all three properties into account
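The four steps can be sketched end to end with NumPy. This is only an illustration under stated assumptions, not the authors' implementation. The slides do not spell out the weights; the sketch assumes w = |c[k] - c[k+1]|, the absolute difference of adjacent costs. It also uses weighted linear least squares as the base regressor, and the function names `train_cocr` and `cocr_score` are hypothetical.

```python
import numpy as np


def oerr_cost(y, K):
    """Step 1: oERR cost vector c[k] = (2^y - 2^k)^2 for k = 0, ..., K."""
    return np.array([(2.0 ** y - 2.0 ** k) ** 2 for k in range(K + 1)])


def train_cocr(X, y, K):
    """Steps 2-3: one weighted linear regressor per threshold k = 0, ..., K-1."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])     # append a bias column
    costs = np.array([oerr_cost(yi, K) for yi in y])  # shape (n, K + 1)
    models = []
    for k in range(K):
        z = (y > k).astype(float)                     # binary target [y > k]
        w = np.abs(costs[:, k] - costs[:, k + 1])     # ASSUMED weight formula
        sw = np.sqrt(w)
        # Weighted least squares: scale rows by sqrt(w), then solve ordinary LS.
        coef, *_ = np.linalg.lstsq(Xb * sw[:, None], z * sw, rcond=None)
        models.append(coef)
    return np.array(models)                           # shape (K, d + 1)


def cocr_score(models, X):
    """Step 4: score documents by summing the K regressor outputs."""
    X = np.asarray(X, dtype=float)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return (Xb @ models.T).sum(axis=1)


# Toy data: one feature that grows with the relevance grade.
X_train = np.arange(5.0).reshape(-1, 1)
y_train = np.arange(5)
models = train_cocr(X_train, y_train, K=4)
scores = cocr_score(models, X_train)
ranking = np.argsort(-scores)  # document indices, highest score first
```

Documents are ordered by the real-valued score rather than a rounded rank, which avoids the discrete ties that a pure classification output would produce.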

(17)

Empirical Comparison Using Linear Regression

data set  Direct Regression  McRank-like  oERR-COCR
LTRC1     0.4470             0.4484       0.4505
LTRC2     0.4440             0.4465       0.4461
MS10K     0.2643             0.2642       0.2792
MS30K     0.2748             0.2748       0.2942

best ERR

significantly better than direct regression

oERR-COCR usually the best, and ordinal information is important

(18)

Empirical Comparison Using M5’ Decision Tree

data set  Direct Regression  McRank-like  oERR-COCR
LTRC1     0.4499             0.4526       0.4530
LTRC2     0.4489             0.4499       0.4538
MS10K     0.3014             0.3129       0.3156
MS30K     0.3298             0.3438       0.3451

best ERR

significantly better than direct regression

oERR-COCR the best

(19)

Conclusion

Cost-sensitive Ordinal Classification via Regression

emphasize top rank

respect ordinal data

regress pointwise for large-scale data

theoretical guarantee:
  reduction from listwise to cost-sensitive ordinal, approximately
  reduction from cost-sensitive ordinal to binary
  reduction from binary to regression

obtained good experimental results

Thank you. Questions?
