Gradient Boosted Decision Tree Gradient Boosting
Deciding Blending Weight as Optimization

after finding g_t = h, fix g_t and optimize the blending weight alone (the inner minimization over h has already been solved):

    \min_{\eta}\; \frac{1}{N} \sum_{n=1}^{N} \mathrm{err}\Big( \underbrace{\sum_{\tau=1}^{t-1} \alpha_\tau\, g_\tau(x_n)}_{s_n} + \eta\, g_t(x_n),\; y_n \Big)
    \quad\text{with } \mathrm{err}(s, y) = (s - y)^2

    \min_{\eta}\; \frac{1}{N} \sum_{n=1}^{N} \big( s_n + \eta\, g_t(x_n) - y_n \big)^2
    \;=\; \frac{1}{N} \sum_{n=1}^{N} \big( (y_n - s_n) - \eta\, g_t(x_n) \big)^2

—one-variable linear regression on {(g_t-transformed input, residual)}

GradientBoost for regression: α_t = optimal η by g_t-transformed linear regression
Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 17/25
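This blending-weight step has a closed form. Below is a minimal NumPy sketch (the function name optimal_eta and its interface are illustrative, not from the lecture) that returns the η minimizing (1/N) Σ_n ((y_n − s_n) − η g_t(x_n))²:

    import numpy as np

    def optimal_eta(g_values, residuals):
        """One-variable linear regression through the origin: fit residual ≈ η · g_t(x).

        g_values  : g_t(x_n) for n = 1..N  (the g_t-transformed inputs)
        residuals : y_n − s_n for n = 1..N (the current residuals)
        """
        g = np.asarray(g_values, dtype=float)
        r = np.asarray(residuals, dtype=float)
        denom = np.dot(g, g)
        if denom == 0.0:
            # g_t is zero on every example, so any η gives the same error; pick 0
            return 0.0
        return np.dot(g, r) / denom   # Σ g_t(x_n)(y_n − s_n) / Σ g_t²(x_n)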
Gradient Boosted Decision Tree Gradient Boosting
Putting Everything Together

Gradient Boosted Decision Tree (GBDT)

s_1 = s_2 = ... = s_N = 0
for t = 1, 2, ..., T
  1. obtain g_t by A({(x_n, y_n − s_n)}), where A is a (squared-error) regression algorithm
     —how about a sampled and pruned C&RT?
  2. compute α_t = OneVarLinearRegression({(g_t(x_n), y_n − s_n)})
  3. update s_n ← s_n + α_t g_t(x_n)
return G(x) = Σ_{t=1}^{T} α_t g_t(x)

GBDT: 'regression sibling' of AdaBoost-DTree—popular in practice
Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 18/25
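A minimal Python sketch of this loop, assuming scikit-learn's DecisionTreeRegressor as a stand-in for the base regression algorithm A (the lecture suggests a sampled and pruned C&RT; the names gbdt_fit and gbdt_predict are illustrative):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def gbdt_fit(X, y, T=100, max_depth=3):
        """Fit the GBDT ensemble; returns a list of (alpha_t, g_t) pairs."""
        s = np.zeros(len(y))                       # s_1 = ... = s_N = 0
        ensemble = []
        for t in range(T):
            residual = y - s                       # y_n − s_n
            g_t = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
            g = g_t.predict(X)                     # g_t(x_n) on the training points
            gg = np.dot(g, g)
            alpha_t = np.dot(g, residual) / gg if gg > 0 else 0.0   # one-variable linear regression
            s = s + alpha_t * g                    # s_n ← s_n + α_t g_t(x_n)
            ensemble.append((alpha_t, g_t))
        return ensemble

    def gbdt_predict(ensemble, X):
        """G(x) = Σ_t α_t g_t(x)."""
        return sum(alpha_t * g_t.predict(X) for alpha_t, g_t in ensemble)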
Gradient Boosted Decision Tree Gradient Boosting
Fun Time

Which of the following is the optimal η for

    \min_{\eta}\; \frac{1}{N} \sum_{n=1}^{N} \big( (y_n - s_n) - \eta\, g_t(x_n) \big)^2 ?

1. ( Σ_{n=1}^{N} g_t(x_n)(y_n − s_n) ) · ( Σ_{n=1}^{N} g_t²(x_n) )
2. ( Σ_{n=1}^{N} g_t(x_n)(y_n − s_n) ) / ( Σ_{n=1}^{N} g_t²(x_n) )
3. ( Σ_{n=1}^{N} g_t(x_n)(y_n − s_n) ) + ( Σ_{n=1}^{N} g_t²(x_n) )
4. ( Σ_{n=1}^{N} g_t(x_n)(y_n − s_n) ) − ( Σ_{n=1}^{N} g_t²(x_n) )

Reference Answer: 2

Derived within Lecture 9 of ML Foundations, remember? :-)
Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 19/25
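For completeness, a short check of the reference answer: setting the derivative of the objective with respect to η to zero gives the closed form in option 2.

    \frac{\partial}{\partial \eta}\, \frac{1}{N}\sum_{n=1}^{N}\big((y_n - s_n) - \eta\, g_t(x_n)\big)^2
      = -\frac{2}{N}\sum_{n=1}^{N} g_t(x_n)\big((y_n - s_n) - \eta\, g_t(x_n)\big) = 0
    \quad\Longrightarrow\quad
    \eta^{\ast} = \frac{\sum_{n=1}^{N} g_t(x_n)\,(y_n - s_n)}{\sum_{n=1}^{N} g_t^{2}(x_n)}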
Gradient Boosted Decision Tree Summary of Aggregation Models
Map of Blending Models

blending: aggregate after getting diverse g_t
  - uniform: simple voting/averaging of g_t
  - non-uniform: linear model on g_t-transformed inputs
  - conditional: nonlinear model on g_t-transformed inputs

uniform for 'stability'; non-uniform/conditional carefully for 'complexity'
Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 20/25
Gradient Boosted Decision Tree Summary of Aggregation Models
Map of Aggregation-Learning Models

learning: aggregate as well as getting diverse g_t
  - Bagging: diverse g_t by bootstrapping; uniform vote by nothing :-)
  - AdaBoost: diverse g_t by reweighting; linear vote by steepest search
  - Decision Tree: diverse g_t by data splitting; conditional vote by branching
  - GradientBoost: diverse g_t by residual fitting; linear vote by steepest search

boosting-like algorithms most popular
Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 21/25
Gradient Boosted Decision Tree Summary of Aggregation Models
Map of Aggregation of Aggregation Models

combining Bagging, AdaBoost, and Decision Tree:
  - Random Forest: randomized bagging + 'strong' DTree
  - AdaBoost-DTree: AdaBoost + 'weak' DTree