Uniform Blending for Regression

G(x) = \frac{1}{T} \sum_{t=1}^{T} g_t(x)

• same g_t (autocracy): as good as one single g_t
• very different g_t (diversity + democracy):
  ⇒ some g_t(x) > f(x), some g_t(x) < f(x)
  ⇒ average could be more accurate than individual

diverse hypotheses:
even simple uniform blending can be better than any single hypothesis

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 8/23
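To make the averaging concrete, here is a minimal Python sketch (the toy target, the polynomial hypotheses, and all names are illustrative assumptions, not part of the lecture): several diverse regressors g_t are fit to the same noisy data and their predictions are averaged into G(x) = (1/T) Σ g_t(x). On such toy data the blend is often, though not always, more accurate than each individual g_t.

```python
import numpy as np

# Toy regression target f(x) = sin(x) observed with noise (illustrative setup).
rng = np.random.default_rng(0)
x_train = rng.uniform(0, 6, size=60)
y_train = np.sin(x_train) + 0.2 * rng.normal(size=x_train.size)
x_test = np.linspace(0, 6, 200)
f_test = np.sin(x_test)

# Diverse g_t: polynomial regressors of different degrees.
degrees = [1, 3, 5, 7]
coefs = [np.polyfit(x_train, y_train, d) for d in degrees]
preds = np.array([np.polyval(c, x_test) for c in coefs])  # shape (T, len(x_test))

# Uniform blending: G(x) = (1/T) * sum_t g_t(x)
G = preds.mean(axis=0)

mse = lambda p: np.mean((p - f_test) ** 2)
for d, p in zip(degrees, preds):
    print(f"degree {d}: test MSE = {mse(p):.4f}")
print(f"uniform blend G: test MSE = {mse(G):.4f}")
```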
Theoretical Analysis of Uniform Blending

G(x) = \frac{1}{T} \sum_{t=1}^{T} g_t(x)

\begin{aligned}
\operatorname{avg}\big((g_t(x) - f(x))^2\big)
&= \operatorname{avg}\big(g_t^2 - 2 g_t f + f^2\big) \\
&= \operatorname{avg}(g_t^2) - 2Gf + f^2 \\
&= \operatorname{avg}(g_t^2) - G^2 + (G - f)^2 \\
&= \operatorname{avg}(g_t^2) - 2G^2 + G^2 + (G - f)^2 \\
&= \operatorname{avg}\big(g_t^2 - 2 g_t G + G^2\big) + (G - f)^2 \\
&= \operatorname{avg}\big((g_t - G)^2\big) + (G - f)^2
\end{aligned}

Averaging both sides over x, and noting that the deviation term is non-negative:

\operatorname{avg}\big(E_{\text{out}}(g_t)\big)
= \operatorname{avg}\big(\mathbb{E}\,(g_t - G)^2\big) + E_{\text{out}}(G)
\ge E_{\text{out}}(G)

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 9/23
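The pointwise identity avg((g_t − f)²) = avg((g_t − G)²) + (G − f)² above can be checked numerically. The sketch below uses arbitrary synthetic values (all names and numbers are illustrative assumptions) and confirms that the two sides agree at every evaluation point.

```python
import numpy as np

# Numerical check of: avg_t (g_t - f)^2 = avg_t (g_t - G)^2 + (G - f)^2,
# evaluated pointwise over a set of inputs x.
rng = np.random.default_rng(1)
T, n_points = 5, 100
f = rng.normal(size=n_points)            # target values f(x) on some points x
g = f + rng.normal(size=(T, n_points))   # T hypotheses g_t(x), scattered around f
G = g.mean(axis=0)                       # uniform blend G(x)

lhs = ((g - f) ** 2).mean(axis=0)                  # avg_t (g_t(x) - f(x))^2
rhs = ((g - G) ** 2).mean(axis=0) + (G - f) ** 2   # avg_t (g_t(x) - G(x))^2 + (G(x) - f(x))^2
print(np.allclose(lhs, rhs))                       # True: the identity holds at every x
```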
Some Special g_t

consider a virtual iterative process that for t = 1, 2, ..., T:
  1. request size-N data D_t from P^N (i.i.d.)
  2. obtain g_t by A(D_t)

\bar{g} = \lim_{T \to \infty} G = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} g_t = \mathbb{E}_{\mathcal{D}}\, A(\mathcal{D})

\operatorname{avg}\big(E_{\text{out}}(g_t)\big)
= \operatorname{avg}\big(\mathbb{E}\,(g_t - \bar{g})^2\big) + E_{\text{out}}(\bar{g})

expected performance of A = expected deviation to consensus + performance of consensus

• performance of consensus: called bias
• expected deviation to consensus: called variance

uniform blending: reduces variance for more stable performance

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 10/23
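A small simulation of this virtual process makes the bias-variance split concrete. The choices below (target sin(πx), noise level, A = fitting a line with np.polyfit, the test grid) are illustrative assumptions rather than the lecture's setup; with the consensus estimated empirically from the T runs, the decomposition holds up to floating-point error.

```python
import numpy as np

# Virtual process: draw D_t ~ P^N, run A(D_t), then split avg(E_out(g_t))
# into variance (deviation to consensus) + bias (E_out of the consensus).
rng = np.random.default_rng(2)
N, T = 20, 2000
x_test = np.linspace(-1, 1, 200)
f_test = np.sin(np.pi * x_test)          # true target f

def run_A(rng):
    """A(D_t): fit a line (degree-1 polynomial) to a fresh size-N sample."""
    x = rng.uniform(-1, 1, size=N)
    y = np.sin(np.pi * x) + 0.1 * rng.normal(size=N)
    return np.polyval(np.polyfit(x, y, 1), x_test)   # g_t on the test grid

g = np.array([run_A(rng) for _ in range(T)])  # shape (T, len(x_test))
g_bar = g.mean(axis=0)                        # empirical consensus, approximates E_D A(D)

avg_Eout = ((g - f_test) ** 2).mean()         # avg_t E_out(g_t)
variance = ((g - g_bar) ** 2).mean()          # expected deviation to consensus
bias = ((g_bar - f_test) ** 2).mean()         # E_out(g_bar)
print(f"avg E_out = {avg_Eout:.4f} = variance {variance:.4f} + bias {bias:.4f}")
```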
Consider applying uniform blending G(x) = \frac{1}{T} \sum_{t=1}^{T} g_t(x) on linear regression hypotheses g_t(x) = innerprod(w_t, x). Which of the following properties best describes the resulting G(x)?

1. a constant function of x
2. a linear function of x
3. a quadratic function of x
4. none of the other choices

Reference Answer: 2

G(x) = innerprod\left(\frac{1}{T} \sum_{t=1}^{T} w_t,\; x\right), which is clearly a linear function of x. Note that we write 'innerprod' instead of the usual 'transpose' notation to avoid a symbol conflict with T (the number of hypotheses).

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 11/23
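A quick numerical check of the reference answer (with arbitrary illustrative weight vectors): blending the predictions of linear hypotheses is the same as predicting with the averaged weight vector, so G is indeed a linear function of x.

```python
import numpy as np

# Averaging linear hypotheses g_t(x) = <w_t, x> yields another linear
# hypothesis whose weight vector is (1/T) * sum_t w_t.
rng = np.random.default_rng(3)
T, d = 4, 3
W = rng.normal(size=(T, d))          # T weight vectors w_t (illustrative)
X = rng.normal(size=(10, d))         # a few test points x

G_blend = (X @ W.T).mean(axis=1)     # (1/T) * sum_t <w_t, x>
G_linear = X @ W.mean(axis=0)        # <avg_t w_t, x>
print(np.allclose(G_blend, G_linear))  # True
```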
Linear Blending

linear blending: known g_t, each to be given an \alpha_t ballot

G(x) = \operatorname{sign}\left( \sum_{t=1}^{T} \alpha_t \cdot g_t(x) \right) with \alpha_t \ge 0

computing 'good' \alpha_t: \min_{\alpha_t \ge 0} E_{\text{in}}(\boldsymbol{\alpha})

linear blending for regression:
\min_{\alpha_t \ge 0} \frac{1}{N} \sum_{n=1}^{N} \left( y_n - \sum_{t=1}^{T} \alpha_t g_t(x_n) \right)^2

LinReg + transformation:
\min_{w_i} \frac{1}{N} \sum_{n=1}^{N} \left( y_n - \sum_{i=1}^{\tilde{d}} w_i \phi_i(x_n) \right)^2