boosting C4.5

Academic year: 2022

(1)

A Boosting Tutorial

Rob Schapire

Princeton University

www.cs.princeton.edu/~schapire

(2)

Example: “How May I Help You?”

[Gorin et al.]

• goal: automatically categorize type of call requested by phone customer (Collect, CallingCard, PersonToPerson, etc.)

yes I’d like to place a collect call long distance please (Collect)

operator I need to make a call but I need to bill it to my office (ThirdNumber)

yes I’d like to place a call on my master card please (CallingCard)

I just called a number in sioux city and I musta rang the wrong number because I got the wrong party and I would like to have that taken off of my bill (BillingCredit)

• observation:

easy to find “rules of thumb” that are “often” correct

e.g.: “IF ‘card’ occurs in utterance THEN predict ‘CallingCard’ ”

hard to find single highly accurate prediction rule
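
The ‘card’ rule of thumb can be written as a one-line classifier. The following is only a minimal sketch: the function name `card_rule`, the whitespace tokenization, and the choice to return `None` (abstain) when the word is absent are assumptions made for illustration, not part of the original system.

```python
from typing import Optional

def card_rule(utterance: str) -> Optional[str]:
    """Rule of thumb: predict 'CallingCard' if the word 'card' occurs.

    Returning None (abstaining) otherwise is an assumption of this sketch.
    """
    words = utterance.lower().split()
    return "CallingCard" if "card" in words else None

# Two utterances taken from the slide, with their true categories.
examples = [
    ("yes I'd like to place a collect call long distance please", "Collect"),
    ("yes I'd like to place a call on my master card please", "CallingCard"),
]

for text, label in examples:
    print(card_rule(text), "| true label:", label)
```

A rule like this is cheap to find and right fairly often, but it says nothing about most calls, which is why a single rule is not enough.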

(3)

The Boosting Approach

• devise computer program for deriving rough rules of thumb

• apply procedure to subset of examples

• obtain rule of thumb

• apply to 2nd subset of examples

• obtain 2nd rule of thumb

• repeat T times

(4)

Details

• how to choose examples on each round?

concentrate on “hardest” examples

(those most often misclassified by previous rules of thumb)

• how to combine rules of thumb into single prediction rule?

take (weighted) majority vote of rules of thumb
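
A weighted majority vote can be sketched as follows, assuming each rule of thumb outputs +1 or −1. The function name `weighted_majority_vote`, the example rules, the weights, and breaking ties toward +1 are all assumptions of this sketch.

```python
def weighted_majority_vote(rules, weights, x):
    """Combine +1/-1 predictions by the sign of their weighted sum."""
    total = sum(w * rule(x) for rule, w in zip(rules, weights))
    # Tie-breaking toward +1 is an arbitrary choice made for this sketch.
    return 1 if total >= 0 else -1

# Three hypothetical rules of thumb on a single number, with made-up weights.
rules = [
    lambda x: 1 if x > 0 else -1,
    lambda x: 1 if x > 5 else -1,
    lambda x: 1 if x > 2 else -1,
]
weights = [0.4, 0.7, 0.8]

print(weighted_majority_vote(rules, weights, 3))  # votes +1, -1, +1
print(weighted_majority_vote(rules, weights, 1))  # votes +1, -1, -1
```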

(5)

Boosting

• boosting = general method of converting rough rules of thumb into highly accurate prediction rule

• technically:

assume given “weak” learning algorithm that can consistently find classifiers (“rules of thumb”) at least slightly better than random, say, accuracy ≥ 55% (in two-class setting)

given sufficient data, a boosting algorithm can provably construct single classifier with very high accuracy, say, 99%

(6)

Outline of Tutorial

• brief background

• basic algorithm and core theory

• other ways of understanding boosting

• experiments, applications and extensions

(7)

Brief Background

(8)

The Boosting Problem

• “strong” PAC algorithm

for any distribution

∀ ε > 0, δ > 0

given polynomially many random examples

finds classifier with error ≤ ε with probability ≥ 1 − δ

• “weak” PAC algorithm

same, but only for ε ≥ 1/2 − γ

[Kearns & Valiant ’88]:

does weak learnability imply strong learnability?

(9)

Early Boosting Algorithms

[Schapire ’89]:

first provable boosting algorithm

call weak learner three times on three modified distributions

get slight boost in accuracy

apply recursively

[Freund ’90]:

“optimal” algorithm that “boosts by majority”

[Drucker, Schapire & Simard ’92]:

first experiments using boosting

limited by practical drawbacks

(10)

AdaBoost

[Freund & Schapire ’95]:

introduced “AdaBoost” algorithm

strong practical advantages over previous boosting algorithms

• experiments and applications using AdaBoost:

[Drucker & Cortes ’96]

[Jackson & Craven ’96]

[Freund & Schapire ’96]

[Quinlan ’96]

[Breiman ’96]

[Maclin & Opitz ’97]

[Bauer & Kohavi ’97]

[Schwenk & Bengio ’98]

[Schapire, Singer & Singhal ’98]

[Abney, Schapire & Singer ’99]

[Haruno, Shirai & Ooyama ’99]

[Cohen & Singer ’99]

[Dietterich ’00]

[Schapire & Singer ’00]

[Collins ’00]

[Escudero, Màrquez & Rigau ’00]

[Iyer, Lewis, Schapire et al. ’00]

[Onoda, Rätsch & Müller ’00]

[Tieu & Viola ’00]

[Walker, Rambow & Rogati ’01]

[Rochery, Schapire, Rahim & Gupta ’01]

[Merler, Furlanello, Larcher & Sboner ’01]

[Di Fabbrizio, Dutton, Gupta et al. ’02]

[Qu, Adam, Yasui et al. ’02]

[Tur, Schapire & Hakkani-Tür ’03]

[Viola & Jones ’04]

[Middendorf, Kundaje, Wiggins et al. ’04]

...

• continuing development of theory and algorithms:

[Breiman ’98, ’99]

[Schapire, Freund, Bartlett & Lee ’98]

[Grove & Schuurmans ’98]

[Mason, Bartlett & Baxter ’98]

[Schapire & Singer ’99]

[Cohen & Singer ’99]

[Freund & Mason ’99]

[Domingo & Watanabe ’99]

[Mason, Baxter, Bartlett & Frean ’99, ’00]

[Duffy & Helmbold ’99, ’02]

[Ridgeway, Madigan & Richardson ’99]

[Kivinen & Warmuth ’99]

[Friedman, Hastie & Tibshirani ’00]

[Rätsch, Onoda & Müller ’00]

[Rätsch, Warmuth, Mika et al. ’00]

[Allwein, Schapire & Singer ’00]

[Friedman ’01]

[Koltchinskii, Panchenko & Lozano ’01]

[Collins, Schapire & Singer ’02]

[Demiriz, Bennett & Shawe-Taylor ’02]

[Lebanon & Lafferty ’02]

[Wyner ’02]

[Rudin, Daubechies & Schapire ’03]

[Jiang ’04]

[Lugosi & Vayatis ’04]

[Zhang ’04]

...

(11)

Basic Algorithm and Core Theory

(12)

A Formal Description of Boosting

• given training set $(x_1, y_1), \ldots, (x_m, y_m)$

• $y_i \in \{-1, +1\}$ correct label of instance $x_i \in X$

• for $t = 1, \ldots, T$:

construct distribution $D_t$ on $\{1, \ldots, m\}$

find weak classifier (“rule of thumb”) $h_t : X \to \{-1, +1\}$ with small error $\epsilon_t$ on $D_t$:

$\epsilon_t = \Pr_{D_t}[h_t(x_i) \neq y_i]$

• output final classifier $H_{\text{final}}$
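
The error of a weak classifier on a distribution is the total weight of the examples it misclassifies. The sketch below computes it directly from that definition; the function name `weighted_error` and the four-point dataset are hypothetical and chosen only to illustrate the formula.

```python
def weighted_error(D, h, X, y):
    """Probability, under distribution D over indices, that h(x_i) != y_i."""
    return sum(D[i] for i in range(len(X)) if h(X[i]) != y[i])

# Hypothetical 1-D data: the classifier below is wrong only on the point x = 2.
X = [1, 2, 3, 4]
y = [+1, -1, -1, -1]
h = lambda x: +1 if x < 2.5 else -1

uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.1, 0.6, 0.2, 0.1]   # most weight on the misclassified point

print(weighted_error(uniform, h, X, y))
print(weighted_error(skewed, h, X, y))
```

The same classifier has a different error under each distribution, which is what lets a boosting algorithm force the weak learner to attend to particular examples.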

(13)

AdaBoost

[with Freund]

• constructing $D_t$:

$D_1(i) = 1/m$

given $D_t$ and $h_t$:

$$D_{t+1}(i) = \frac{D_t(i)}{Z_t} \times \begin{cases} e^{-\alpha_t} & \text{if } y_i = h_t(x_i) \\ e^{\alpha_t} & \text{if } y_i \neq h_t(x_i) \end{cases} = \frac{D_t(i)}{Z_t} \exp(-\alpha_t y_i h_t(x_i))$$

where $Z_t$ = normalization constant

$$\alpha_t = \frac{1}{2} \ln\left(\frac{1 - \epsilon_t}{\epsilon_t}\right) > 0$$

• final classifier:

$$H_{\text{final}}(x) = \operatorname{sign}\left(\sum_t \alpha_t h_t(x)\right)$$
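
The update rules above can be put together in a short, dependency-free sketch. Several things here are assumptions for illustration rather than part of the slides: the weak learner is an exhaustive search over 1-D threshold stumps, the helper names (`make_stump`, `best_stump`, `adaboost`, `predict`) and the ten-point dataset are hypothetical, and stopping early when the weighted error is 0 or at least 1/2 is a choice made for this sketch.

```python
import math

def make_stump(theta, sign):
    """Threshold classifier: predicts `sign` if x > theta, else -sign."""
    return lambda x: sign if x > theta else -sign

def best_stump(X, y, D):
    """Weak learner (assumed): the stump with the smallest weighted error under D."""
    xs = sorted(set(X))
    thresholds = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1.0]
    best_err, best_h = None, None
    for theta in thresholds:
        for sign in (+1, -1):
            h = make_stump(theta, sign)
            err = sum(d for d, xi, yi in zip(D, X, y) if h(xi) != yi)
            if best_err is None or err < best_err:
                best_err, best_h = err, h
    return best_err, best_h

def adaboost(X, y, T):
    m = len(X)
    D = [1.0 / m] * m                      # D_1(i) = 1/m
    ensemble = []                          # list of (alpha_t, h_t)
    for _ in range(T):
        eps, h = best_stump(X, y, D)
        if eps == 0:                       # perfect stump: alpha would be infinite
            ensemble.append((1.0, h))      # arbitrary finite weight (assumption)
            break
        if eps >= 0.5:                     # no better than random: stop (assumption)
            break
        alpha = 0.5 * math.log((1 - eps) / eps)
        # D_{t+1}(i) = D_t(i) * exp(-alpha * y_i * h(x_i)) / Z_t
        D = [d * math.exp(-alpha * yi * h(xi)) for d, xi, yi in zip(D, X, y)]
        Z = sum(D)
        D = [d / Z for d in D]
        ensemble.append((alpha, h))
    return ensemble

def predict(ensemble, x):
    """H_final(x) = sign(sum_t alpha_t * h_t(x)); ties go to +1 in this sketch."""
    return 1 if sum(alpha * h(x) for alpha, h in ensemble) >= 0 else -1

# Hypothetical data that no single stump can classify correctly.
X = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [+1, +1, +1, -1, -1, -1, +1, +1, +1, -1]

ensemble = adaboost(X, y, T=3)
print([round(alpha, 2) for alpha, _ in ensemble])
print([predict(ensemble, xi) for xi in X])
```

No single stump gets below 30% training error on this data, yet the weighted vote of three stumps labels every training point correctly.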

(14)

Toy Example

[Figure: $D_1$]

weak classifiers = vertical or horizontal half-planes

(15)

Round 1

[Figure: $h_1$, with $\epsilon_1 = 0.30$ and $\alpha_1 = 0.42$; resulting distribution $D_2$]
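
As a quick check, the Round 1 weight of 0.42 follows from the error of 0.30 through the AdaBoost formula for α. The sketch below also computes the normalization constant and the share of weight that the misclassified examples carry after the update; the variable names are hypothetical.

```python
import math

eps1 = 0.30                                    # weighted error of h1 (from the slide)
alpha1 = 0.5 * math.log((1 - eps1) / eps1)     # alpha_t = 1/2 ln((1 - eps_t) / eps_t)

up = math.exp(alpha1)      # multiplier for misclassified examples
down = math.exp(-alpha1)   # multiplier for correctly classified examples

# Normalization constant: total weight after the multiplicative update.
Z1 = (1 - eps1) * down + eps1 * up

# Share of the new distribution held by the examples h1 got wrong.
mass_misclassified = eps1 * up / Z1

print(round(alpha1, 2))
print(round(mass_misclassified, 6))
```

With this choice of α, the examples misclassified in a round always end up holding exactly half of the next distribution, so the same classifier would have error 1/2 on it.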

(16)

Round 2

[Figure: $h_2$, with $\epsilon_2 = 0.21$ and $\alpha_2 = 0.65$; resulting distribution $D_3$]
