
Blending and Bagging: Bagging (Bootstrap Aggregation)

What We Have Done

blending: aggregate after getting g_t; learning: aggregate as well as getting g_t

aggregation type | blending         | learning
uniform          | voting/averaging | ?
non-uniform      | linear           | ?
conditional      | stacking         | ?

learning g_t for uniform aggregation: diversity important
• diversity by different models: g_1 ∈ H_1, g_2 ∈ H_2, . . ., g_T ∈ H_T
• diversity by different parameters: GD with η = 0.001, 0.01, . . ., 10
• diversity by algorithmic randomness: random PLA with different random seeds (a sketch follows below)
• diversity by data randomness: within-cross-validation hypotheses g_v⁻

next: diversity by data randomness without g⁻
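A minimal sketch of the "algorithmic randomness" bullet, assuming X already carries a bias column and y ∈ {±1}; random_pla and uniform_vote are hypothetical helper names, not the course's reference code:

```python
import numpy as np

def random_pla(X, y, seed, max_iter=1000):
    """Hypothetical 'random PLA' sketch: correct a misclassified point
    chosen in a seed-dependent random order, so different seeds can
    end at different hypotheses g_t on the same data D."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(max_iter):
        wrong = np.where(np.sign(X @ w) != y)[0]
        if len(wrong) == 0:
            break                      # data separated: stop early
        i = rng.choice(wrong)          # algorithmic randomness
        w = w + y[i] * X[i]            # standard PLA update
    return w

def uniform_vote(ws, X):
    """Uniform aggregation G: majority vote of sign(x . w_t) over t."""
    votes = np.sign(X @ np.column_stack(ws))   # shape (N, T)
    return np.sign(votes.sum(axis=1))

# diversity by algorithmic randomness: same D, T different seeds
# ws = [random_pla(X, y, seed=t) for t in range(25)]
# y_hat = uniform_vote(ws, X)
```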

Revisit of Bias-Variance

expected performance of A = expected deviation to consensus + performance of consensus

consensus ḡ = expected g_t from D_t ∼ P^N

• consensus more stable than direct A(D), but comes from many more D_t than the D on hand

want: approximate ḡ by
• finite (large) T
• approximate g_t = A(D_t) from D_t ∼ P^N using only D

bootstrapping: a statistical tool that re-samples from D to 'simulate' D_t (sketched below)
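A minimal sketch of that re-sampling step, assuming D is held as numpy arrays (X, y); bootstrap_sample is a hypothetical helper name:

```python
import numpy as np

def bootstrap_sample(X, y, rng, n_prime=None):
    """One 'simulated' D_t: draw indices from D uniformly with
    replacement; n_prime allows an arbitrary N' instead of the
    original N (defaults to N)."""
    N = len(y)
    idx = rng.integers(0, N, size=(n_prime or N))
    return X[idx], y[idx]

# rng = np.random.default_rng(0)
# Xt, yt = bootstrap_sample(X, y, rng)   # D̃_t from D alone
```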

Bootstrap Aggregation

bootstrapping
bootstrap sample D̃_t: re-sample N examples from D uniformly with replacement
—can also use an arbitrary N′ instead of the original N

virtual aggregation
consider a virtual iterative process that for t = 1, 2, . . ., T
1. requests size-N data D_t from P^N (i.i.d.)
2. obtains g_t by A(D_t)
then G = Uniform({g_t})

bootstrap aggregation
consider a physical iterative process that for t = 1, 2, . . ., T
1. requests size-N′ data D̃_t from bootstrapping
2. obtains g_t by A(D̃_t)
then G = Uniform({g_t})

bootstrap aggregation (BAGging): a simple meta algorithm on top of a base algorithm A (a sketch follows below)
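A minimal sketch of the BAGging meta-algorithm under the same assumptions, where base_algorithm(X, y) stands for A and is assumed to return a prediction function g mapping features to ±1; bag and G are hypothetical names:

```python
import numpy as np

def bag(X, y, base_algorithm, T=25, seed=0):
    """BAGging meta-algorithm: run A on T bootstrap samples D̃_t
    and collect the hypotheses {g_t}."""
    rng = np.random.default_rng(seed)
    N = len(y)
    gs = []
    for _ in range(T):
        idx = rng.integers(0, N, size=N)           # bootstrap: D̃_t
        gs.append(base_algorithm(X[idx], y[idx]))  # g_t = A(D̃_t)
    return gs

def G(gs, X):
    """Uniform aggregation over {g_t}; each g maps features to ±1."""
    preds = np.column_stack([g(X) for g in gs])    # shape (N, T)
    return np.sign(preds.mean(axis=1))
```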

Bagging Pocket in Action

T_POCKET = 1000; T_BAG = 25

• very diverse g_t from bagging
• proper non-linear boundary after aggregating the binary classifiers

bagging works reasonably well if the base algorithm is sensitive to data randomness (a usage sketch follows below)
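A hedged sketch of how this experiment might be wired up, reusing the hypothetical bag and G helpers from the previous sketch; this pocket is a plain textbook variant, not the exact implementation behind the slide's figures:

```python
import numpy as np

def pocket(X, y, max_iter=1000, seed=0):
    """Hypothetical pocket sketch: randomized PLA updates, but keep
    ('pocket') the weights with the fewest training mistakes so far."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    best_w, best_err = w.copy(), np.inf
    for _ in range(max_iter):
        wrong = np.where(np.sign(X @ w) != y)[0]
        if len(wrong) < best_err:
            best_w, best_err = w.copy(), len(wrong)
        if len(wrong) == 0:
            break
        i = rng.choice(wrong)
        w = w + y[i] * X[i]
    return lambda Xq, w=best_w: np.sign(Xq @ w)

# as on the slide: T_POCKET = 1000 updates per run, T_BAG = 25 runs
# gs = bag(X, y, lambda Xt, yt: pocket(Xt, yt, max_iter=1000), T=25)
# y_hat = G(gs, X)   # aggregated, possibly non-linear boundary
```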

Fun Time

When using bootstrapping to re-sample N examples D̃_t from a data set D with N examples, what is the probability of getting D̃_t exactly the same as D?

1. 0/N^N = 0
2. 1/N^N
3. N!/N^N
4. N^N/N^N = 1

Reference Answer: 3

Consider re-sampling in an ordered manner for N steps. Then there are N^N possible outcomes D̃_t, each with equal probability. Most importantly, N! of the outcomes are permutations of the original D (assuming its N examples are distinct), hence the answer. A worked instance follows below.
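To make the counting concrete, a small worked instance and the large-N limit (my addition, not on the slide):

```latex
P\big[\tilde{\mathcal{D}}_t = \mathcal{D}\big] = \frac{N!}{N^N},
\qquad \text{e.g. } N = 3:\ \frac{3!}{3^3} = \frac{6}{27} \approx 0.22,
\qquad \text{and by Stirling } \frac{N!}{N^N} \approx \sqrt{2\pi N}\,e^{-N} \to 0 .
```

So an exact copy of D is already unlikely for small N and essentially impossible for large N.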

Summary

1. Embedding Numerous Features: Kernel Models
2. Combining Predictive Features: Aggregation Models

Lecture 7: Blending and Bagging
• Motivation of Aggregation: aggregated G strong and/or moderate
