# Machine Learning Techniques (機器學習技法)


### Lecture 11: Gradient Boosted Decision Tree

Hsuan-Tien Lin (林軒田)

htlin@csie.ntu.edu.tw

### National Taiwan University (國立台灣大學資訊工程系)

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 0/25

## Roadmap

2 Combining Predictive Features: Aggregation Models

Lecture 11: Gradient Boosted Decision Tree

(3)

## From Random Forest to AdaBoost-DTree

function

### RandomForest(D)

For t = 1, 2, . . . , T

request size-N

data

by

with D

obtain tree

### g t

by

Randomized-DTree(

) return

=Uniform({g

### t

})

function

For t = 1, 2, . . . , T

reweight data by

obtain tree

by DTree(D,

)

calculate ‘vote’

of

return

=

({(g

,

)})

need:

DTree(D,

### u(t)

)

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 2/25
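The two loops can be sketched side by side in code. Below is a minimal sketch, not the lecture's reference implementation: it assumes numpy and uses a decision stump as a stand-in for the (randomized) decision tree base learner.

```python
import numpy as np

def stump_train(X, y, w=None):
    """Stand-in base learner: best decision stump under sample weights w."""
    N, d = X.shape
    w = np.full(N, 1.0 / N) if w is None else w / w.sum()
    best, best_err = None, np.inf
    for j in range(d):
        for thr in X[:, j]:
            for s in (1, -1):
                pred = np.where(X[:, j] >= thr, s, -s)
                err = w[pred != y].sum()
                if err < best_err:
                    best, best_err = (j, thr, s), err
    return best, best_err

def stump_predict(g, X):
    j, thr, s = g
    return np.where(X[:, j] >= thr, s, -s)

def random_forest(X, y, T=11, seed=0):
    """request size-N bootstrap data, train a tree on each, uniform vote"""
    rng = np.random.default_rng(seed)
    N = len(y)
    trees = [stump_train(X[rng.integers(0, N, N)[:, None][:, 0]], y)[0]
             if False else stump_train(X[(idx := rng.integers(0, N, N))], y[idx])[0]
             for _ in range(T)]
    return lambda Xq: np.sign(sum(stump_predict(g, Xq) for g in trees))

def adaboost_dtree(X, y, T=5):
    """reweight data by u^(t), train a tree, compute vote alpha_t, blend linearly"""
    N = len(y)
    u = np.full(N, 1.0 / N)                      # u^(1)
    G = []
    for _ in range(T):
        g, eps = stump_train(X, y, u)
        eps = min(max(eps, 1e-12), 1 - 1e-12)    # guard the log at eps = 0 or 1
        diamond = np.sqrt((1 - eps) / eps)
        alpha = np.log(diamond)                  # 'vote' of g_t
        pred = stump_predict(g, X)
        u = u * np.where(pred == y, 1.0 / diamond, diamond)
        G.append((g, alpha))
    return lambda Xq: np.sign(sum(a * stump_predict(g, Xq) for g, a in G))

# tiny 1-D demo: negatives below 3, positives at 3 and above
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([-1, -1, -1, 1, 1, 1])
rf = random_forest(X, y)
ab = adaboost_dtree(X, y)
```

Note the contrast: RandomForest injects diversity through bootstrapping, while AdaBoost-DTree reuses the full data but changes the weights u^(t), which is exactly why a weighted version of DTree is needed.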


## Weighted Decision Tree Algorithm

Weighted Algorithm: minimize (regularized)

    E_in^u(h) = (1/N) Σ_{n=1}^{N} u_n · err(y_n, h(x_n))

if using an existing algorithm as a black box (no modifications), to get E_in^u approximately optimized...

weights u expressed by bootstrap-sampled copies
—request size-N data D̃_t by bootstrapping with D

weights u expressed by sampling proportional to u_n
—request size-N data D̃_t by sampling ∝ u_n on D

AdaBoost-DTree: AdaBoost + sampling ∝ u_n^(t) + DTree(D̃_t) without modifying DTree
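The sampling trick is easy to check numerically: drawing a large sample with probabilities ∝ u_n makes the plain (unweighted) 0/1 error on the sample estimate the u-weighted error. A small sketch, assuming numpy; the hypothesis predictions h and the weights u_n below are made-up examples.

```python
import numpy as np

rng = np.random.default_rng(1)

# toy labels, a fixed hypothesis's predictions, and example weights (all made up)
y = np.array([1, 1, 1, 1, -1, -1, -1, -1])
h = np.array([1, 1, -1, 1, -1, -1, 1, -1])
u = np.array([4.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.0, 4.0])

# weighted error, normalized so it is comparable with the sampled estimate
E_in_u = (u * (h != y)).sum() / u.sum()

# black-box route: request data by sampling proportional to u_n on D,
# then measure the ordinary (unweighted) 0/1 error on the sample
idx = rng.choice(len(y), size=200_000, p=u / u.sum())
E_sampled = float(np.mean(h[idx] != y[idx]))
```

The sampled estimate converges to the weighted error, so the weighted objective gets approximately optimized without touching the base algorithm's internals.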


## Weak Decision Tree Algorithm

α_t = ln ◆_t = ln √((1 − ε_t)/ε_t)  with ε_t the u^(t)-weighted error of g_t

if a fully grown tree is trained on all of D
⟹ E_in(g_t) = 0 if all x_n are different
⟹ E_in^u(g_t) = 0
⟹ ε_t = 0
⟹ α_t = ∞ (autocracy!!)

need: a pruned tree trained on some D̃_t ∝ u to be weak

AdaBoost-DTree: AdaBoost + sampling ∝ u^(t) + pruned DTree(D̃_t)
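The autocracy problem is visible directly from the formula for the vote. A quick sketch (`vote` is an illustrative name, not from the lecture):

```python
import math

def vote(eps):
    """AdaBoost vote alpha_t = ln sqrt((1 - eps) / eps) for weighted error eps in (0, 1)."""
    return math.log(math.sqrt((1.0 - eps) / eps))

for eps in (0.5, 0.25, 0.1, 1e-6):
    print(eps, vote(eps))
```

vote(0.5) = 0, so a random-guessing tree gets no say, and the vote diverges as eps → 0: a fully grown tree with E_in^u(g_t) = 0 would get α_t = ∞ and dominate the ensemble, hence the need for pruning and sampling.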


## AdaBoost-Stump as a Special Case

what if DTree with height ≤ 1 (extremely pruned)? learn the branching criterion

    b(x) = argmin over decision stumps h(x) of  Σ_{c=1}^{2} |D_c with h| · impurity(D_c with h)

—if impurity = binary classification error: AdaBoost-Stump = special case of AdaBoost-DTree
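The extremely-pruned case can be sketched directly: a height-1 tree just scans all stumps and scores each split by the size-weighted child impurities. A minimal sketch assuming numpy; `stump_by_impurity` and `zero_one_impurity` are illustrative names, not from the lecture.

```python
import numpy as np

def zero_one_impurity(ys):
    """binary classification error of a child against its majority label"""
    if len(ys) == 0:
        return 0.0
    majority = 1 if (ys == 1).sum() >= (ys == -1).sum() else -1
    return float(np.mean(ys != majority))

def stump_by_impurity(X, y, impurity):
    """height-1 DTree: pick b(x) minimizing, over all decision stumps,
       sum over the two children D_c of |D_c| * impurity(D_c)."""
    best, best_cost = None, np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            left, right = y[X[:, j] < thr], y[X[:, j] >= thr]
            cost = len(left) * impurity(left) + len(right) * impurity(right)
            if cost < best_cost:
                best, best_cost = (j, float(thr)), cost
    return best

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([-1, -1, -1, 1, 1, 1])
branch = stump_by_impurity(X, y, zero_one_impurity)
```

With 0/1 classification error as the impurity, this search returns exactly a decision stump, which is why AdaBoost-Stump falls out as a special case of AdaBoost-DTree.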


## Fun Time

When running AdaBoost-DTree with sampling, suppose we get a decision tree g_t that achieves zero error on the sampled data set D̃_t. Which of the following is possible?

1. α_t < 0
2. α_t = 0
3. α_t > 0
4. all of the above

Reference answer: 4. While g_t achieves zero error on D̃_t, g_t may not achieve zero weighted error on (D, u^(t)), and hence ε_t can be anything, even ≥ 1/2. Then α_t can be ≤ 0.


## Example Weights of AdaBoost

u_n^(T+1) = u_n^(T) · ◆_T^(−y_n g_T(x_n)) = u_n^(T) · exp(−y_n α_T g_T(x_n))
          = u_n^(1) · Π_{t=1}^{T} exp(−y_n α_t g_t(x_n))
          = (1/N) · exp(−y_n Σ_{t=1}^{T} α_t g_t(x_n))

• recall: G(x) = sign( Σ_{t=1}^{T} α_t g_t(x) ); Σ_{t=1}^{T} α_t g_t(x) = voting score of {g_t} on x

AdaBoost: u_n^(T+1) ∝ exp(−y_n (voting score on x_n))
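The closed form follows because the per-round multiplicative updates telescope into one exponential. A numerical check, assuming numpy; the g_t predictions and votes α_t below are random stand-ins rather than trained trees.

```python
import numpy as np

rng = np.random.default_rng(7)
N, T = 6, 4
y = rng.choice([-1, 1], size=N)
g_preds = rng.choice([-1, 1], size=(T, N))   # stand-in g_t(x_n) values
alphas = rng.uniform(0.1, 1.0, size=T)       # stand-in votes alpha_t

# run the AdaBoost weight recurrence u^(t+1)_n = u^(t)_n * exp(-y_n alpha_t g_t(x_n))
u = np.full(N, 1.0 / N)                      # u^(1)_n = 1/N
for t in range(T):
    u = u * np.exp(-y * alphas[t] * g_preds[t])

# closed form: u^(T+1)_n = (1/N) * exp(-y_n * (voting score on x_n))
voting_score = (alphas[:, None] * g_preds).sum(axis=0)
u_closed = (1.0 / N) * np.exp(-y * voting_score)
```

Since α_t = ln ◆_t, multiplying by ◆_t^(∓1) and multiplying by exp(−y_n α_t g_t(x_n)) are the same update, so the loop and the closed form agree exactly.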

(36)

= (

·

/

=

·

n

t

n

=

·



=

·

Y

·

!

### •

recall: G(x) = sign

!

:

of {g

} on x

∝ exp −y

(

)

(37)

= (

·

/

=

·

n

t

n

=

·



=

·

Y

·

!

### •

recall: G(x) = sign

!

:

of {g

} on x

∝ exp −y

(

### voting score on xn

)

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 7/25

(38)

= (

·

/

=

·

n

t

n

=

·



=

·

Y

·

!

### •

recall: G(x) = sign

!

:

of {g

} on x

∝ exp −y

(

)

(39)

= (

·

/

=

·

n

t

n

=

·



=

·

Y

·

!

### •

recall: G(x) = sign

!

:

of {g

} on x

∝ exp −y

(

### voting score on xn

)

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 7/25

(40)

= (

·

/

=

·

n

t

n

=

·



=

·

Y

·

!

### •

recall: G(x) = sign

!

:

of {g

} on x

∝ exp −y

(

)

(41)

= (

·

/

=

·

n

t

n

=

·



=

·

Y

·

!

### •

recall: G(x) = sign

!

:

of {g

} on x

∝ exp −y

(

### voting score on xn

)

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 7/25

## Voting Score and Margin

linear blending = linear model + hypotheses as transform (+ constraints):

    G(x_n) = sign( Σ_{t=1}^{T} α_t g_t(x_n) ),  with voting score = Σ_{t=1}^{T} α_t g_t(x_n)

compare with hard-margin SVM margin = y_n · (w^T φ(x_n) + b) / ‖w‖:
y_n (voting score) = signed & unnormalized margin

want y_n (voting score) positive & large
⟸ want exp(−y_n (voting score)) small
⟸ want u_n^(T+1) small

claim: AdaBoost decreases Σ_{n=1}^{N} u_n^(t) and thus achieves large margin


## AdaBoost Error Function

claim: AdaBoost decreases Σ_{n=1}^{N} u_n^(t) and thus somewhat minimizes

    (1/N) Σ_{n=1}^{N} exp( −y_n Σ_{t=1}^{T} α_t g_t(x_n) )

with linear score s = Σ_{t=1}^{T} α_t g_t(x_n):

• err_{0/1}(s, y) = ⟦ys ≤ 0⟧
• err_ADA(s, y) = exp(−ys): convex upper bound of err_{0/1}
—called the exponential error measure; AdaBoost learns by minimizing this upper bound of err_{0/1}
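The upper-bound relation between the two error measures can be verified pointwise. A small sketch, assuming numpy, with s the linear (voting) score:

```python
import numpy as np

def err_01(s, y):
    """0/1 error on a linear score: [[ y * s <= 0 ]]"""
    return (y * s <= 0).astype(float)

def err_ada(s, y):
    """exponential error measure: exp(-y * s)"""
    return np.exp(-y * s)
```

err_ADA touches err_{0/1} at ys = 0 (both equal 1) and stays above it everywhere else, which is what makes it a convenient smooth, convex surrogate for classification error.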


## Gradient Descent on AdaBoost Error Function

recall: gradient descent (remember? :-)), at iteration t

    min_{‖v‖=1} E_in(w_t + η v) ≈ E_in(w_t) + η v^T ∇E_in(w_t)   (E_in(w_t) and ∇E_in(w_t) known; η given)

at iteration t, to find a good h as a function-direction, solve

    min_h (1/N) Σ_{n=1}^{N} exp( −y_n ( Σ_{τ=1}^{t−1} α_τ g_τ(x_n) + η h(x_n) ) )
        = Σ_{n=1}^{N} u_n^(t) exp( −y_n η h(x_n) )
        ≈ Σ_{n=1}^{N} u_n^(t) (1 − y_n η h(x_n))
        = Σ_{n=1}^{N} u_n^(t) − η Σ_{n=1}^{N} u_n^(t) y_n h(x_n)

good h: minimize Σ_{n=1}^{N} u_n^(t) (−y_n h(x_n))
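The last line connects back to the base algorithm: since y_n h(x_n) ∈ {−1, +1}, we have Σ_n u_n (−y_n h(x_n)) = −Σ_n u_n + 2 Σ_n u_n ⟦y_n ≠ h(x_n)⟧, so the steepest function-direction is exactly the hypothesis with the smallest u^(t)-weighted error — what a weighted base algorithm already provides. A numerical check, assuming numpy; the candidate hypotheses and weights are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(3)
N, H = 8, 5
y = rng.choice([-1, 1], size=N)
u = rng.uniform(0.1, 1.0, size=N)          # current example weights u^(t)
hs = rng.choice([-1, 1], size=(H, N))      # H candidate hypotheses' predictions

# gradient-descent objective for each candidate direction h
grad_obj = (u * (-y * hs)).sum(axis=1)     # sum_n u_n * (-y_n h(x_n))

# u-weighted 0/1 error for each candidate
weighted_err = (u * (hs != y)).sum(axis=1)
```

Because grad_obj = −Σu + 2 · weighted_err is an increasing affine function of the weighted error, both criteria rank the candidates identically, so minimizing one minimizes the other.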

