Blending and Bagging: Linear and Any Blending

Constraint on α_t

linear blending = LinModel + hypotheses as transform + constraints:

    \min_{\alpha_t \ge 0} \frac{1}{N} \sum_{n=1}^{N} \mathrm{err}\left(y_n, \sum_{t=1}^{T} \alpha_t g_t(x_n)\right)

linear blending for binary classification:
if α_t < 0 ⟹ α_t g_t(x) = |α_t| (−g_t(x))
• negative α_t for g_t ≡ positive |α_t| for −g_t
• if you have a stock up/down classifier with 99% error, tell me! :-)

in practice, often
linear blending = LinModel + hypotheses as transform (with the constraints simply dropped)
Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 13/23
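As a sketch of how the constrained objective above can be solved, the snippet below (an illustration of mine, not from the lecture) fits non-negative blending weights for squared error by projected gradient descent; the function name `blend_nonneg` and the toy data are assumptions for demonstration.

```python
import numpy as np

def blend_nonneg(G, y, lr=0.01, steps=5000):
    """Linear blending with alpha_t >= 0 via projected gradient descent.

    G: (N, T) matrix whose column t holds g_t(x_n) for every example n.
    y: (N,) targets.  Minimizes the average squared error, projecting
    alpha back onto the non-negative orthant after each step.
    """
    N, T = G.shape
    alpha = np.zeros(T)
    for _ in range(steps):
        grad = (2.0 / N) * G.T @ (G @ alpha - y)      # grad of mean squared error
        alpha = np.maximum(alpha - lr * grad, 0.0)    # enforce alpha_t >= 0
    return alpha

# toy check: y is exactly 2*g1 + 0*g2, so alpha should approach (2, 0)
rng = np.random.default_rng(0)
g1, g2 = rng.normal(size=200), rng.normal(size=200)
alpha = blend_nonneg(np.column_stack([g1, g2]), 2.0 * g1)
```

For squared error this is ordinary least squares plus a non-negativity projection; any other differentiable err would only change the gradient line.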
Blending and Bagging: Linear and Any Blending

Linear Blending versus Selection

in practice, often g_1 ∈ H_1, g_2 ∈ H_2, ..., g_T ∈ H_T by minimum E_in
• recall: selection by minimum E_in is "best of best", paying d_VC(∪_{t=1}^{T} H_t)
• recall: linear blending includes selection as a special case, by setting
  α_t = ⟦E_val(g_t^−) smallest⟧
• complexity price of linear blending with E_in (aggregation of best):
  ≥ d_VC(∪_{t=1}^{T} H_t)

like selection, blending practically done with
(E_val instead of E_in) + (g_t^− from minimum E_train)

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 14/23
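To make the "selection as a special case" point concrete: setting α_t to the indicator ⟦E_val(g_t^−) smallest⟧ makes the blending weight vector one-hot at the best-validated hypothesis. A minimal sketch (names are illustrative, not from the lecture):

```python
import numpy as np

def select_as_blend(E_val):
    """Selection expressed as linear blending: alpha is one-hot at the
    hypothesis whose validation error E_val(g_t^-) is smallest."""
    E_val = np.asarray(E_val, dtype=float)
    alpha = np.zeros_like(E_val)
    alpha[np.argmin(E_val)] = 1.0   # the indicator [E_val(g_t^-) smallest]
    return alpha

# three hypotheses with validation errors 0.30, 0.12, 0.25: second one wins
alpha = select_as_blend([0.30, 0.12, 0.25])
print(alpha)  # -> [0. 1. 0.]
```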
Blending and Bagging: Linear and Any Blending

Any Blending

Given g_1^−, g_2^−, ..., g_T^− from D_train, transform (x_n, y_n) in D_val to
(z_n = Φ^−(x_n), y_n), where Φ^−(x) = (g_1^−(x), ..., g_T^−(x)).

Linear Blending:
1. compute α = LinearModel({(z_n, y_n)})
2. return G_LINB(x) = LinearHypothesis_α(Φ(x))

Any Blending (Stacking):
1. compute g̃ = AnyModel({(z_n, y_n)})
2. return G_ANYB(x) = g̃(Φ(x))

where Φ(x) = (g_1(x), ..., g_T(x))

any blending:
• powerful, achieves conditional blending
• but danger of overfitting, as always :-(

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 15/23
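The two procedures share one recipe: transform validation inputs through Φ^−, then fit a second-stage model on the transformed pairs. A minimal sketch of that recipe, with the base hypotheses assumed already trained on D_train and a least-squares meta-model standing in for AnyModel (swap in any learner to get full stacking); all names here are illustrative:

```python
import numpy as np

def stack(base_hypotheses, meta_fit, X_val, y_val):
    """Any blending (stacking), assuming the g_t^- were trained on D_train.

    1. transform each validation x_n into z_n = Phi^-(x_n)
    2. fit a second-stage model g~ on {(z_n, y_n)} from D_val
    Returns the final hypothesis G(x) = g~(Phi(x)).
    """
    def Phi(X):
        return np.column_stack([g(X) for g in base_hypotheses])
    g_tilde = meta_fit(Phi(X_val), y_val)   # any model can be plugged in here
    return lambda X: g_tilde(Phi(X))

# least-squares meta-model as the stand-in AnyModel
def lstsq_fit(Z, y):
    w, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return lambda Z_new: Z_new @ w

# two trivial base hypotheses for illustration
g1 = lambda X: X
g2 = lambda X: X ** 2
X_val = np.linspace(-1.0, 1.0, 50)
y_val = 3.0 * X_val + 2.0 * X_val ** 2   # exactly a blend of g1 and g2
G = stack([g1, g2], lstsq_fit, X_val, y_val)
```

With the linear meta-model this reduces to linear blending; replacing `lstsq_fit` with a nonlinear learner gives the conditional blending the slide warns can overfit.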
Blending and Bagging: Linear and Any Blending

Blending in Practice
(Chen et al., A Linear Ensemble of Individual and Blended Models for Music Rating Prediction, 2012)

KDDCup 2011 Track 1: World Champion Solution by NTU
• validation set blending: a special any blending model
  E_test (squared): 519.45 ⟹ 456.24, which helped secure the lead in the last two weeks
• test set blending: linear blending using Ẽ_test
  E_test (squared): 456.24 ⟹ 442.06, which helped turn the tables in the last hour

blending "useful" in practice, despite the computational burden
Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 16/23
Blending and Bagging: Linear and Any Blending

Fun Time

Consider three decision stump hypotheses from ℝ to {−1, +1}:
g_1(x) = sign(1 − x), g_2(x) = sign(1 + x), g_3(x) = −1.
When x = 0, what is the resulting Φ(x) = (g_1(x), g_2(x), g_3(x)) used in the
returned hypothesis of linear/any blending?

1. (+1, +1, +1)
2. (+1, +1, −1)
3. (+1, −1, −1)
4. (−1, −1, −1)

Reference Answer: 2

Too easy? :-)

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 17/23
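A quick sanity check of the quiz answer, evaluating the three stumps at x = 0 (a throwaway snippet of mine, not part of the lecture):

```python
def sign(v):
    # sign(0) never occurs here: at x = 0 both arguments are 1
    return 1.0 if v > 0 else -1.0

# the three decision stumps from the quiz
g1 = lambda x: sign(1 - x)
g2 = lambda x: sign(1 + x)
g3 = lambda x: -1.0

Phi = (g1(0.0), g2(0.0), g3(0.0))   # Phi == (1.0, 1.0, -1.0): choice 2
```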
Blending and Bagging: Bagging (Bootstrap Aggregation)

What We Have Done

blending: aggregate after getting g_t; learning: aggregate