
Gradient Boosted Decision Tree / Gradient Boosting

Deciding Blending Weight as Optimization

After finding g_t = h, fix g_t and optimize the blending weight η alone (the minimization over h is already done, so it is crossed out on the slide):

$$\min_{\eta} \; \frac{1}{N} \sum_{n=1}^{N} \mathrm{err}\Big( \underbrace{\sum_{\tau=1}^{t-1} \alpha_\tau g_\tau(x_n)}_{s_n} + \eta\, g_t(x_n), \; y_n \Big)$$

With err(s, y) = (s − y)^2:

$$\min_{\eta} \; \frac{1}{N} \sum_{n=1}^{N} \big( s_n + \eta\, g_t(x_n) - y_n \big)^2 = \min_{\eta} \; \frac{1}{N} \sum_{n=1}^{N} \big( (y_n - s_n) - \eta\, g_t(x_n) \big)^2$$

This is one-variable linear regression on {(g_t-transformed input, residual)}.

GradientBoost for regression: α_t = optimal η, found by g_t-transformed linear regression.
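Since this is one-variable linear regression without an intercept, the optimal η has a simple closed form. A minimal sketch of that computation, assuming only NumPy (the function and variable names are ours, not the lecture's):

    import numpy as np

    def optimal_eta(g, residual):
        """Closed-form minimizer of (1/N) * sum(((y_n - s_n) - eta * g_t(x_n))**2).

        g:        array of g_t(x_n), n = 1..N
        residual: array of y_n - s_n, n = 1..N
        """
        g = np.asarray(g, dtype=float)
        r = np.asarray(residual, dtype=float)
        # Setting the derivative with respect to eta to zero gives
        # eta = sum(g_t(x_n) * (y_n - s_n)) / sum(g_t(x_n)^2);
        # assumes g_t is not identically zero on the data.
        return np.dot(g, r) / np.dot(g, g)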


Gradient Boosted Decision Tree / Gradient Boosting

Putting Everything Together

Gradient Boosted Decision Tree (GBDT)

s_1 = s_2 = ... = s_N = 0
for t = 1, 2, ..., T:
  1. obtain g_t by A({(x_n, y_n − s_n)}), where A is a (squared-error) regression algorithm (how about a sampled and pruned C&RT?)
  2. compute α_t = OneVarLinearRegression({(g_t(x_n), y_n − s_n)})
  3. update s_n ← s_n + α_t g_t(x_n)
return $G(x) = \sum_{t=1}^{T} \alpha_t g_t(x)$

GBDT: the 'regression sibling' of AdaBoost-DTree, popular in practice.
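A runnable sketch of the loop above, assuming scikit-learn's DecisionTreeRegressor plays the role of the squared-error regression algorithm A (max_depth stands in for pruning, sampling is omitted, and all names here are illustrative):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def gbdt_fit(X, y, T=100, max_depth=3):
        """GBDT for regression: fit each tree to the residuals, blend with the optimal eta."""
        s = np.zeros(len(y))                    # s_1 = ... = s_N = 0
        trees, alphas = [], []
        for t in range(T):
            residual = y - s
            tree = DecisionTreeRegressor(max_depth=max_depth)  # A: a pruned C&RT-style tree
            tree.fit(X, residual)               # step 1: g_t from {(x_n, y_n - s_n)}
            g = tree.predict(X)
            # step 2: one-variable linear regression of residuals on g_t outputs
            alpha = np.dot(g, residual) / max(np.dot(g, g), 1e-12)
            s += alpha * g                      # step 3: update the scores
            trees.append(tree)
            alphas.append(alpha)
        return trees, alphas

    def gbdt_predict(X, trees, alphas):
        """G(x) = sum over t of alpha_t * g_t(x)."""
        return sum(a * tree.predict(X) for a, tree in zip(alphas, trees))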


Gradient Boosted Decision Tree / Gradient Boosting

Fun Time

Which of the following is the optimal η for

$$\min_{\eta} \; \frac{1}{N} \sum_{n=1}^{N} \big( (y_n - s_n) - \eta\, g_t(x_n) \big)^2 \; ?$$

1. $\big( \sum_{n=1}^{N} g_t(x_n)(y_n - s_n) \big) \cdot \big( \sum_{n=1}^{N} g_t^2(x_n) \big)$
2. $\big( \sum_{n=1}^{N} g_t(x_n)(y_n - s_n) \big) \,/\, \big( \sum_{n=1}^{N} g_t^2(x_n) \big)$
3. $\big( \sum_{n=1}^{N} g_t(x_n)(y_n - s_n) \big) + \big( \sum_{n=1}^{N} g_t^2(x_n) \big)$
4. $\big( \sum_{n=1}^{N} g_t(x_n)(y_n - s_n) \big) - \big( \sum_{n=1}^{N} g_t^2(x_n) \big)$

Reference Answer: 2

Derived within Lecture 9 of ML Foundations, remember? :-)
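For completeness, the standard least-squares step behind answer 2 (our own one-line derivation, not shown on the slide): setting the derivative of the objective with respect to η to zero,

$$\frac{\partial}{\partial \eta} \sum_{n=1}^{N} \big( (y_n - s_n) - \eta\, g_t(x_n) \big)^2 = -2 \sum_{n=1}^{N} g_t(x_n) \big( (y_n - s_n) - \eta\, g_t(x_n) \big) = 0 \;\Longrightarrow\; \eta = \frac{\sum_{n=1}^{N} g_t(x_n)(y_n - s_n)}{\sum_{n=1}^{N} g_t^2(x_n)}.$$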


Gradient Boosted Decision Tree / Summary of Aggregation Models

Map of Blending Models

blending: aggregate after getting diverse g_t

- uniform: simple voting/averaging of g_t
- non-uniform: linear model on g_t-transformed inputs
- conditional: nonlinear model on g_t-transformed inputs

uniform for 'stability'; non-uniform/conditional carefully for 'complexity'
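To make the first two blending styles concrete, a small NumPy sketch (our own illustration with made-up numbers; G collects the g_t predictions):

    import numpy as np

    # G[t, n] = g_t(x_n): predictions of T = 2 diverse models on N = 3 points
    G = np.array([[1.0, 2.0, 0.5],
                  [0.8, 2.2, 0.7]])
    y = np.array([1.0, 2.0, 0.6])       # labels, used only for the linear blend

    # uniform: simple averaging of the g_t
    uniform_blend = G.mean(axis=0)

    # non-uniform: a linear model on the g_t-transformed inputs,
    # i.e. weights alpha fit by least squares on (G.T, y)
    alpha, *_ = np.linalg.lstsq(G.T, y, rcond=None)
    linear_blend = G.T @ alpha

    # conditional blending would fit a nonlinear model on the same
    # g_t-transformed inputs instead of a linear one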


Gradient Boosted Decision Tree / Summary of Aggregation Models

Map of Aggregation-Learning Models

learning: aggregate as well as getting diverse g_t

- Bagging: diverse g_t by bootstrapping; uniform vote by nothing :-)
- AdaBoost: diverse g_t by reweighting; linear vote by steepest search
- Decision Tree: diverse g_t by data splitting; conditional vote by branching
- GradientBoost: diverse g_t by residual fitting; linear vote by steepest search

boosting-like algorithms most popular


Gradient Boosted Decision Tree / Summary of Aggregation Models

Map of Aggregation of Aggregation Models

combining Bagging, AdaBoost, and Decision Tree:

- Random Forest: randomized bagging + 'strong' DTree
- AdaBoost-DTree: AdaBoost + 'weak' DTree
- GBDT: GradientBoost + 'weak' DTree
