
diverse hypotheses: even simple uniform blending can be better than any single hypothesis

Blending and Bagging Uniform Blending

Uniform Blending for Regression

G(x) = \frac{1}{T} \sum_{t=1}^{T} g_t(x)

• same g_t (autocracy): as good as one single g_t
• very different g_t (diversity + democracy):
  =⇒ some g_t(x) > f(x), some g_t(x) < f(x)
  =⇒ average could be more accurate than individual

diverse hypotheses: even simple uniform blending can be better than any single hypothesis

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 8/23
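As a sketch of the claim above, the snippet below uses a hypothetical setup (a quadratic target f and T hypotheses that scatter above and below it by a random offset) and checks that the uniform blend's squared error never exceeds the average error of the individual g_t:

```python
import random

random.seed(0)

# Hypothetical setup: target f(x) = x^2, and T "hypotheses" g_t that
# scatter around f -- some above, some below.
f = lambda x: x * x
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
T = 11
gs = [lambda x, b=random.uniform(-1, 1): x * x + b for _ in range(T)]

def sq_err(h):
    return sum((h(x) - f(x)) ** 2 for x in xs) / len(xs)

# Uniform blend G(x) = (1/T) * sum_t g_t(x)
G = lambda x: sum(g(x) for g in gs) / T

errs = [sq_err(g) for g in gs]
print(sq_err(G), sum(errs) / T)
assert sq_err(G) <= sum(errs) / T  # blend is never worse than the average g_t
```

The inequality holds for any such collection of hypotheses; how much better the blend is depends on how diverse the g_t are.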


Blending and Bagging Uniform Blending

Theoretical Analysis of Uniform Blending

G(x) = \frac{1}{T} \sum_{t=1}^{T} g_t(x)

avg\left( (g_t(x) - f(x))^2 \right)
= avg\left( g_t^2 - 2 g_t f + f^2 \right)
= avg\left( g_t^2 \right) - 2Gf + f^2
= avg\left( g_t^2 \right) - G^2 + (G - f)^2
= avg\left( g_t^2 \right) - 2G^2 + G^2 + (G - f)^2
= avg\left( g_t^2 - 2 g_t G + G^2 \right) + (G - f)^2
= avg\left( (g_t - G)^2 \right) + (G - f)^2

avg\left( E_{out}(g_t) \right) = avg\left( \mathbb{E}(g_t - G)^2 \right) + E_{out}(G) \ge E_{out}(G)

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 9/23
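The key identity, avg((g_t − f)²) = avg((g_t − G)²) + (G − f)², can be verified numerically. The snippet below uses made-up hypothesis outputs g_vals at a single point x and checks the decomposition term by term:

```python
# Minimal numerical check of the decomposition: g_vals are hypothetical
# outputs g_t(x) at one point x; f_val is the target f(x) there.
g_vals = [1.2, 0.7, 1.9, 0.4, 1.3]
f_val = 1.0

T = len(g_vals)
G_val = sum(g_vals) / T  # uniform blend at x

avg = lambda vs: sum(vs) / len(vs)
lhs = avg([(g - f_val) ** 2 for g in g_vals])
rhs = avg([(g - G_val) ** 2 for g in g_vals]) + (G_val - f_val) ** 2
assert abs(lhs - rhs) < 1e-9  # identity holds exactly
```

Since avg((g_t − G)²) is non-negative, the blend G is never worse on average than a randomly chosen g_t.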


Blending and Bagging Uniform Blending

Some Special g_t

consider a virtual iterative process that for t = 1, 2, ..., T:

1. request size-N data D_t from P^N (i.i.d.)
2. obtain g_t by A(D_t)

\bar{g} = \lim_{T \to \infty} G = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} g_t = \mathbb{E}_{D}\, A(D)

avg\left( E_{out}(g_t) \right) = avg\left( \mathbb{E}(g_t - \bar{g})^2 \right) + E_{out}(\bar{g})

expected performance of A = expected deviation to consensus + performance of consensus

• performance of consensus: called bias
• expected deviation to consensus: called variance

uniform blending: reduces variance for more stable performance

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 10/23
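The virtual process above can be simulated. In this sketch (all names and constants are assumptions for illustration), the "algorithm" A simply fits a constant — the sample mean — to each size-N dataset drawn from y = f + Gaussian noise, so bias and variance can be computed directly against the consensus:

```python
import random

random.seed(1)

# Virtual process: draw T size-N datasets i.i.d. from y = f_val + noise,
# and let A(D) return the constant hypothesis mean(D).
f_val, noise_sd, N, T = 2.0, 1.0, 10, 2000

def A(data):
    return sum(data) / len(data)  # constant hypothesis g_t

gs = [A([random.gauss(f_val, noise_sd) for _ in range(N)]) for _ in range(T)]
g_bar = sum(gs) / T  # finite-T stand-in for the consensus g-bar

bias = (g_bar - f_val) ** 2                       # performance of consensus
variance = sum((g - g_bar) ** 2 for g in gs) / T  # deviation to consensus
avg_err = sum((g - f_val) ** 2 for g in gs) / T   # avg performance of A

assert abs(avg_err - (bias + variance)) < 1e-9    # decomposition holds
assert 0.05 < variance < 0.2                      # variance of mean ~ sd^2/N = 0.1
```

Averaging many such g_t removes the variance term, which is exactly why uniform blending stabilizes performance.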


Blending and Bagging Uniform Blending

Consider applying uniform blending G(x) =

T 1

P

T

t=1

g

t

(x) on linear regression hypotheses g

t

(x) = innerprod(w

t

,

x). Which of the following

property best describes the resulting G(x)?

1

a constant function of

x

2

a linear function of

x

3

a quadratic function of

x

4

none of the other choices

Reference Answer: 2

G(x) = innerprod 1

T

T

X

t=1

w t

,

x

!

which is clearly a linear function of

x. Note that

we write ‘innerprod’ instead of the usual

‘transpose’ notation to avoid symbol conflict with T (number of hypotheses).

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 11/23
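The reference answer can be checked directly: averaging the predictions of linear hypotheses equals predicting with the averaged weight vector. The weight vectors below are arbitrary illustrative values:

```python
# Uniform blending of linear hypotheses g_t(x) = <w_t, x> yields
# G(x) = <w_bar, x>: still a single linear function of x.
ws = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]
T = len(ws)

dot = lambda w, x: sum(wi * xi for wi, xi in zip(w, x))
G = lambda x: sum(dot(w, x) for w in ws) / T            # blend of predictions
w_bar = [sum(w[i] for w in ws) / T for i in range(2)]   # averaged weights

x = [2.0, -1.0]
assert abs(G(x) - dot(w_bar, x)) < 1e-12  # identical on any input
```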


Blending and Bagging Linear and Any Blending

Linear Blending

linear blending: known g_t, each to be given an α_t ballot:

G(x) = sign\left( \sum_{t=1}^{T} \alpha_t \cdot g_t(x) \right) with \alpha_t \ge 0

computing 'good' \alpha_t: \min_{\alpha_t \ge 0} E_{in}(\alpha)

linear blending for regression:

\min_{\alpha_t \ge 0} \frac{1}{N} \sum_{n=1}^{N} \left( y_n - \sum_{t=1}^{T} \alpha_t g_t(x_n) \right)^2

LinReg + transformation:

\min_{w_i} \frac{1}{N} \sum_{n=1}^{N} \left( y_n - \sum_{i=1}^{\tilde{d}} w_i \phi_i(x_n) \right)^2
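A minimal sketch of "LinReg + transformation": each x_n is transformed into z_n = (g_1(x_n), ..., g_T(x_n)) and plain least squares is run for α. The two base hypotheses and the dataset here are made up for illustration, and the α_t ≥ 0 constraint is dropped for simplicity:

```python
# Linear blending as linear regression on the transformed features
# z_n = (g_1(x_n), g_2(x_n)); constraint alpha_t >= 0 omitted here.
g1 = lambda x: x          # two hypothetical base hypotheses
g2 = lambda x: x * x

xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 2.0, 6.0, 12.0]          # target happens to be x + x^2

Z = [[g1(x), g2(x)] for x in xs]    # transformed data

# Solve the 2x2 normal equations Z^T Z alpha = Z^T y by hand.
a = sum(z[0] * z[0] for z in Z); b = sum(z[0] * z[1] for z in Z)
c = sum(z[1] * z[1] for z in Z)
p = sum(z[0] * y for z, y in zip(Z, ys))
q = sum(z[1] * y for z, y in zip(Z, ys))
det = a * c - b * b
alpha = [(c * p - b * q) / det, (a * q - b * p) / det]

blend = lambda x: alpha[0] * g1(x) + alpha[1] * g2(x)
assert all(abs(blend(x) - y) < 1e-9 for x, y in zip(xs, ys))
```

In practice a constrained solver (e.g. non-negative least squares) would be used when the α_t ≥ 0 requirement matters; the transformation view stays the same.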
