Machine Learning Techniques
(機器學習技法)

Lecture 7: Blending and Bagging
Hsuan-Tien Lin (林軒田), htlin@csie.ntu.edu.tw
Department of Computer Science & Information Engineering
National Taiwan University
(國立台灣大學資訊工程系)
Roadmap
1 Embedding Numerous Features: Kernel Models

Lecture 6: Support Vector Regression
kernel ridge regression (dense) via ridge regression + representer theorem;
support vector regression (sparse) via regularized tube error + Lagrange dual

2 Combining Predictive Features: Aggregation Models

Lecture 7: Blending and Bagging
Motivation of Aggregation
Uniform Blending
Linear and Any Blending
Bagging (Bootstrap Aggregation)

3 Distilling Implicit Features: Extraction Models
Motivation of Aggregation

An Aggregation Story
Your T friends g_1, ..., g_T predict whether the stock will go up, as g_t(x). You can ...
• select the most trustworthy friend from their usual performance: validation!
• mix the predictions from all your friends uniformly: let them vote!
• mix the predictions from all your friends non-uniformly: let them vote, but give some friends more ballots
• combine the predictions conditionally: if [t satisfies some condition], give some ballots to friend t
• ...

aggregation models: mix or combine hypotheses (for better performance)
Aggregation with Math Notations
Your T friends g_1, ..., g_T predict whether the stock will go up, as g_t(x).
• select the most trustworthy friend from their usual performance:
  $G(x) = g_{t^*}(x)$ with $t^* = \arg\min_{t \in \{1,2,\dots,T\}} E_{\text{val}}(g_t^-)$
• mix the predictions from all your friends uniformly:
  $G(x) = \text{sign}\left(\sum_{t=1}^{T} 1 \cdot g_t(x)\right)$
• mix the predictions from all your friends non-uniformly:
  $G(x) = \text{sign}\left(\sum_{t=1}^{T} \alpha_t \cdot g_t(x)\right)$ with $\alpha_t \ge 0$
  • includes select: $\alpha_t = [\![\, E_{\text{val}}(g_t^-) \text{ smallest} \,]\!]$
  • includes uniform: $\alpha_t = 1$
• combine the predictions conditionally:
  $G(x) = \text{sign}\left(\sum_{t=1}^{T} q_t(x) \cdot g_t(x)\right)$ with $q_t(x) \ge 0$
  • includes non-uniform: $q_t(x) = \alpha_t$

aggregation models: a rich family
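The four schemes map directly to code. Below is a minimal NumPy sketch (the function and variable names are illustrative, not from the lecture), assuming `models` is a list of T callables that each map an input array x to ±1 predictions, and `E_val` lists their validation errors:

```python
import numpy as np

# Minimal sketch of the four aggregation schemes (illustrative names).
# Assumed setup: models is a list of T predictors, each mapping an input
# array x to +/-1; E_val holds the validation errors E_val(g_t^-).

def select(models, E_val, x):
    """G(x) = g_{t*}(x) with t* = argmin_t E_val(g_t^-)."""
    t_star = int(np.argmin(E_val))
    return models[t_star](x)

def mix_uniform(models, x):
    """G(x) = sign(sum_t 1 * g_t(x)): one ballot per friend."""
    return np.sign(sum(g(x) for g in models))

def mix_nonuniform(models, alpha, x):
    """G(x) = sign(sum_t alpha_t * g_t(x)) with alpha_t >= 0."""
    return np.sign(sum(a * g(x) for a, g in zip(alpha, models)))

def combine_conditional(models, q, x):
    """G(x) = sign(sum_t q_t(x) * g_t(x)) with q_t(x) >= 0."""
    return np.sign(sum(q_t(x) * g(x) for q_t, g in zip(q, models)))
```

Setting every alpha_t = 1 recovers mix_uniform, and constant q_t(x) = alpha_t recovers mix_nonuniform, mirroring the nesting of the "rich family" above.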
Recall: Selection by Validation
$G(x) = g_{t^*}(x)$ with $t^* = \arg\min_{t \in \{1,2,\dots,T\}} E_{\text{val}}(g_t^-)$
• simple and popular
• what if we use E_in(g_t) instead of E_val(g_t^-)? complexity price on d_VC, remember? :-)
• need one strong g_t^- to guarantee small E_val (and small E_out)

selection: rely on one strong hypothesis
aggregation: can we do better with many (possibly weaker) hypotheses?
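To make the selection rule concrete, here is a hedged sketch: train each candidate on the training split to get g_t^-, then keep the validation-error minimizer. The data split and the `train_algorithms` list of (X, y) → predictor functions are assumptions for illustration:

```python
import numpy as np

# Hedged sketch of selection by validation.
# train_algorithms: assumed list of functions, each mapping (X, y) to a
# predictor h with h(X) -> +/-1 labels.
def select_by_validation(train_algorithms, X_train, y_train, X_val, y_val):
    g_minus = [A(X_train, y_train) for A in train_algorithms]  # the g_t^-
    E_val = [np.mean(g(X_val) != y_val) for g in g_minus]      # 0/1 errors
    t_star = int(np.argmin(E_val))
    return g_minus[t_star], E_val[t_star]                      # G and its E_val
```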
Why Might Aggregation Work?
• mix different weak hypotheses uniformly: G(x) 'strong'
  • aggregation ⇒ feature transform (?)
• mix different random-PLA hypotheses uniformly: G(x) 'moderate'
  • aggregation ⇒ regularization (?)

proper aggregation ⇒ better performance
Fun Time
Consider three decision stump hypotheses from R to {−1, +1}:
g_1(x) = sign(1 − x), g_2(x) = sign(1 + x), g_3(x) = −1.
When mixing the three hypotheses uniformly, what is the resulting G(x)?
1. $2[\![\, |x| \le 1 \,]\!] - 1$
2. $2[\![\, |x| \ge 1 \,]\!] - 1$
3. $2[\![\, x \le -1 \,]\!] - 1$
4. $2[\![\, x \ge +1 \,]\!] - 1$

Reference Answer: 1
The region that gets two positive votes, from g_1 and g_2, is |x| ≤ 1, and thus G(x) is positive only within that region. We see that the three decision stumps g_t can be aggregated to form a more sophisticated hypothesis G.
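The reference answer can also be double-checked numerically; a small sketch (the test grid and the sign(0) = +1 tie-breaking convention are assumptions):

```python
import numpy as np

def sgn(z):                          # sign with the convention sgn(0) = +1
    return np.where(z >= 0, 1, -1)

# the three stumps, uniformly mixed
x = np.linspace(-3, 3, 601)
g1, g2, g3 = sgn(1 - x), sgn(1 + x), -np.ones_like(x, dtype=int)
G = sgn(g1 + g2 + g3)                # uniform vote over three +/-1 ballots

option1 = 2 * (np.abs(x) <= 1) - 1   # candidate answer 1
print(bool(np.all(G == option1)))    # True: G(x) = 2*[|x| <= 1] - 1
```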
Uniform Blending

Uniform Blending (Voting) for Classification
uniform blending: known g_t, each with 1 ballot
$G(x) = \text{sign}\left(\sum_{t=1}^{T} 1 \cdot g_t(x)\right)$
• same g_t (autocracy): as good as one single g_t
• very different g_t (diversity + democracy): majority can correct minority
• similar results with uniform voting for multiclass:
  $G(x) = \arg\max_{1 \le k \le K} \sum_{t=1}^{T} [\![\, g_t(x) = k \,]\!]$

how about regression?
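For the multiclass vote, a hedged sketch (the (T, N) prediction-matrix layout is an assumption for illustration):

```python
import numpy as np

# Uniform voting for multiclass: preds is an assumed (T, N) integer array
# with preds[t, n] = g_t(x_n) in {1, ..., K}.
def uniform_vote_multiclass(preds, K):
    # counts[k-1, n] = sum_t [g_t(x_n) = k], i.e. ballots for class k
    counts = np.stack([(preds == k).sum(axis=0) for k in range(1, K + 1)])
    return counts.argmax(axis=0) + 1          # G(x_n) = argmax_k ballots

# toy usage: three voters, two points, K = 3 classes
preds = np.array([[1, 3],
                  [2, 3],
                  [2, 1]])
print(uniform_vote_multiclass(preds, K=3))    # [2 3]
```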
Uniform Blending for Regression
$G(x) = \frac{1}{T} \sum_{t=1}^{T} g_t(x)$
• same g_t (autocracy): as good as one single g_t
• very different g_t (diversity + democracy):
  ⇒ some g_t(x) > f(x), some g_t(x) < f(x)
  ⇒ the average could be more accurate than any individual

diverse hypotheses: even simple uniform blending can be better than any single hypothesis
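The straddling intuition shows up with a few assumed toy numbers (not from the slides): diverse g_t(x) on both sides of f(x) average out to something closer to f(x).

```python
import numpy as np

# Toy illustration (assumed numbers): diverse g_t(x) straddle f(x),
# so the uniform blend lands closer to the target than any individual.
f_x = 3.0
g_x = np.array([2.2, 3.5, 3.9])      # some below f(x), some above

G_x = g_x.mean()                     # uniform blend
print((g_x - f_x) ** 2)              # individual squared errors: [0.64 0.25 0.81]
print((G_x - f_x) ** 2)              # blended squared error: 0.04, beats every individual here
```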
Theoretical Analysis of Uniform Blending
$G(x) = \frac{1}{T} \sum_{t=1}^{T} g_t(x)$

For any fixed x, with avg denoting the average over t (so that $\text{avg}(g_t) = G$):
$$
\begin{aligned}
\text{avg}\left((g_t(x) - f(x))^2\right)
&= \text{avg}\left(g_t^2 - 2 g_t f + f^2\right) \\
&= \text{avg}\left(g_t^2\right) - 2 G f + f^2 \\
&= \text{avg}\left(g_t^2\right) - G^2 + (G - f)^2 \\
&= \text{avg}\left(g_t^2\right) - 2 G^2 + G^2 + (G - f)^2 \\
&= \text{avg}\left(g_t^2 - 2 g_t G + G^2\right) + (G - f)^2 \\
&= \text{avg}\left((g_t - G)^2\right) + (G - f)^2
\end{aligned}
$$

Taking the expectation over x:
$$
\text{avg}\left(E_{\text{out}}(g_t)\right) = \text{avg}\,\mathcal{E}\left[(g_t - G)^2\right] + E_{\text{out}}(G) \ge E_{\text{out}}(G)
$$
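The per-x decomposition is easy to verify numerically; a sketch with assumed random values for the g_t at one fixed x:

```python
import numpy as np

# Numeric check of the decomposition at one fixed x:
#   avg_t (g_t - f)^2 = avg_t (g_t - G)^2 + (G - f)^2
rng = np.random.default_rng(0)
g = rng.normal(size=5)        # assumed values g_t(x), t = 1..5
f = 0.3                       # target value f(x)
G = g.mean()                  # uniform blend G(x)

lhs = np.mean((g - f) ** 2)
rhs = np.mean((g - G) ** 2) + (G - f) ** 2
print(np.isclose(lhs, rhs))   # True
```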
Some Special g_t
consider a virtual iterative process that, for t = 1, 2, . . . , T:
1. requests size-N data D_t from P^N (i.i.d.)
2. obtains g_t by A(D_t)

$$
\bar{g} = \lim_{T \to \infty} G = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} g_t = \mathcal{E}_{\mathcal{D}}\, A(\mathcal{D})
$$

$$
\text{avg}\left(E_{\text{out}}(g_t)\right) = \text{avg}\,\mathcal{E}\left[(g_t - \bar{g})^2\right] + E_{\text{out}}(\bar{g})
$$

expected performance of A = expected deviation to consensus + performance of consensus
• performance of consensus: called bias
• expected deviation to consensus: called variance

uniform blending: reduces variance for more stable performance
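A hedged simulation of the virtual process (the data distribution, a noisy sine, and the algorithm A, degree-1 least squares, are assumptions for illustration only): draw D_t from P^N, obtain g_t = A(D_t), and split the expected squared error into the variance and bias terms above.

```python
import numpy as np

# Hedged simulation of the virtual process: D_t ~ P^N, g_t = A(D_t),
# blend over a large T to approximate the consensus g-bar.
rng = np.random.default_rng(1)
N, T = 20, 200
x_test = np.linspace(0, 1, 100)

def A(x, y):                                  # algorithm A: fit a line
    w = np.polyfit(x, y, deg=1)
    return lambda xs: np.polyval(w, xs)

g_preds = []
for t in range(T):                            # t = 1, ..., T
    x = rng.uniform(0, 1, N)                  # D_t from P^N (i.i.d.)
    y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, N)
    g_preds.append(A(x, y)(x_test))           # g_t = A(D_t)

g_preds = np.array(g_preds)                   # shape (T, 100)
G = g_preds.mean(axis=0)                      # blend, approximating g-bar
f = np.sin(2 * np.pi * x_test)                # noiseless target

avg_Eout = np.mean((g_preds - f) ** 2)        # expected performance of A
variance = np.mean((g_preds - G) ** 2)        # expected deviation to consensus
bias = np.mean((G - f) ** 2)                  # performance of consensus
print(avg_Eout, variance + bias)              # equal up to rounding
```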