Blending and Bagging: Bagging (Bootstrap Aggregation)

What We Have Done

blending: aggregate after getting g_t; learning: aggregate as well as getting g_t

aggregation type | blending         | learning
uniform          | voting/averaging | ?
non-uniform      | linear           | ?
conditional      | stacking         | ?

learning g_t for uniform aggregation: diversity important
• diversity by different models: g_1 ∈ H_1, g_2 ∈ H_2, ..., g_T ∈ H_T
• diversity by different parameters: GD with η = 0.001, 0.01, ..., 10
• diversity by algorithmic randomness: random PLA with different random seeds
• diversity by data randomness: within-cross-validation hypotheses g_v^-

next: diversity by data randomness without g^-

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 18/23
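The third diversity mechanism above can be sketched in a few lines. The snippet below is an illustration assumed for these notes, not code from the lecture: it runs PLA with different seed-specific visiting orders (diversity by algorithmic randomness) on toy separable data, then combines the resulting g_t by uniform voting.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy separable data with an enforced margin (bias folded in as x_0 = 1);
# the margin keeps PLA convergence well inside the pass budget
X = np.hstack([np.ones((60, 1)), rng.normal(size=(60, 2))])
X = X[np.abs(X[:, 1] + X[:, 2]) > 0.5]
y = np.sign(X[:, 1] + X[:, 2])

def random_pla(X, y, seed, max_pass=100):
    """PLA that visits examples in a seed-specific random order
    (diversity by algorithmic randomness)."""
    order = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(max_pass):
        mistakes = 0
        for i in order.permutation(len(X)):
            if np.sign(X[i] @ w) != y[i]:   # sign(0) also counts as a mistake
                w = w + y[i] * X[i]
                mistakes += 1
        if mistakes == 0:                   # converged: no mistakes in a full pass
            break
    return w

def uniform_vote(ws, X):
    """G(x) = sign(sum_t g_t(x)): uniform voting over the T hypotheses."""
    return np.sign(np.sign(X @ np.array(ws).T).sum(axis=1))

ws = [random_pla(X, y, seed=t) for t in range(5)]   # T = 5 diverse g_t
```

Different seeds give different mistake-correction orders and hence different final weight vectors, even though every g_t separates this toy training set.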
Revisit of Bias-Variance

expected performance of A = expected deviation to consensus + performance of consensus

consensus: ḡ = expected g_t from D_t ∼ P^N
• consensus is more stable than the direct A(D), but comes from many more D_t than the one D on hand
• want: approximate ḡ by
  • finite (large) T
  • approximate g_t = A(D_t) from D_t ∼ P^N using only D

bootstrapping: a statistical tool that re-samples from D to 'simulate' D_t

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 19/23
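When P is synthetic, the "many more D_t" process can actually be run, which makes the consensus concrete. A toy sketch under assumed choices (sinusoidal target, Gaussian noise, least-squares line fit as the base algorithm A): draw T data sets D_t ∼ P^N, run A on each, and average the hypotheses to estimate ḡ.

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(np.pi * x)             # known target, so P is synthetic here

def A(x, y):
    """Base algorithm: least-squares line fit, returns (intercept, slope)."""
    Z = np.column_stack([np.ones_like(x), x])
    w, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return w

N, T = 10, 500
ws = []
for t in range(T):                          # the virtual process: D_t ~ P^N, i.i.d.
    x = rng.uniform(-1, 1, N)
    y = f(x) + rng.normal(scale=0.2, size=N)
    ws.append(A(x, y))
ws = np.array(ws)

g_bar = ws.mean(axis=0)                     # consensus: average of the T hypotheses
spread = ws.std(axis=0)                     # deviation of individual g_t to consensus
```

The individual fits `ws[t]` scatter widely (that scatter is the "expected deviation to consensus"), while `g_bar` is far more stable than any single A(D). Bootstrapping is needed precisely because, outside such simulations, only one D is available.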
Bootstrap Aggregation

bootstrapping
bootstrap sample D̃_t: re-sample N examples from D uniformly with replacement
(can also use an arbitrary N' instead of the original N)

virtual aggregation: consider a virtual iterative process that for t = 1, 2, ..., T
  1. requests size-N data D_t from P^N (i.i.d.)
  2. obtains g_t by A(D_t)
and returns G = Uniform({g_t})

bootstrap aggregation: consider a physical iterative process that for t = 1, 2, ..., T
  1. requests size-N' data D̃_t from bootstrapping
  2. obtains g_t by A(D̃_t)
and returns G = Uniform({g_t})

bootstrap aggregation (BAGging): a simple meta algorithm on top of a base algorithm A

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 20/23
Bagging Pocket in Action

[figure: decision boundaries of bagged pocket classifiers, T_POCKET = 1000, T_BAG = 25]
• very diverse g_t from bagging
• proper non-linear boundary after aggregating the binary classifiers

bagging works reasonably well if the base algorithm is sensitive to data randomness
Fun Time

When using bootstrapping to re-sample N examples D̃_t from a data set D with N examples, what is the probability of getting D̃_t exactly the same as D?
  1. 0/N^N = 0
  2. 1/N^N
  3. N!/N^N
  4. N^N/N^N = 1

Reference Answer: 3

Consider re-sampling in an ordered manner for N steps. Then there are N^N possible outcomes D̃_t, each with equal probability. Most importantly, N! of the outcomes are permutations of the original D, hence the answer.

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 22/23
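The reference answer can be checked mechanically for small N. The sketch below (helper names are my own) computes N!/N^N and compares it with exhaustive enumeration of all N^N ordered draws, counting those whose multiset of indices is a permutation of D.

```python
import math
from itertools import product
from collections import Counter

def p_same(N):
    """P[D~_t equals D] = N! / N^N: each of the N ordered draws is uniform
    over N items, and exactly the N! permutations of D reproduce D."""
    return math.factorial(N) / N ** N

def p_same_exhaustive(N):
    """Brute force: enumerate all N^N ordered draws of indices and count
    those that use every index exactly once."""
    target = Counter(range(N))
    hits = sum(1 for draw in product(range(N), repeat=N)
               if Counter(draw) == target)
    return hits / N ** N
```

For N = 2 this gives 2/4 = 0.5, and the probability decays rapidly: by Stirling's approximation N!/N^N shrinks roughly like e^{-N}, so a bootstrap sample is almost never the original data set for realistic N.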