
diverse hypotheses: even simple uniform blending can be better than any single hypothesis

Blending and Bagging Uniform Blending

Uniform Blending for Regression

G(x) = \frac{1}{T} \sum_{t=1}^{T} g_t(x)

• same g_t (autocracy): as good as one single g_t
• very different g_t (diversity + democracy):
  =⇒ some g_t(x) > f(x), some g_t(x) < f(x)
  =⇒ average could be more accurate than individual

diverse hypotheses: even simple uniform blending can be better than any single hypothesis

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 8/23
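As a sketch of the claim above, the snippet below uses a hypothetical setup (a quadratic target f and T hypotheses that scatter above and below it by a random offset) and checks that the uniform blend's squared error never exceeds the average error of the individual g_t:

```python
import random

random.seed(0)

# Hypothetical setup: target f(x) = x^2, and T "hypotheses" g_t that
# scatter around f -- some above, some below.
f = lambda x: x * x
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
T = 11
gs = [lambda x, b=random.uniform(-1, 1): x * x + b for _ in range(T)]

def sq_err(h):
    return sum((h(x) - f(x)) ** 2 for x in xs) / len(xs)

# Uniform blend G(x) = (1/T) * sum_t g_t(x)
G = lambda x: sum(g(x) for g in gs) / T

errs = [sq_err(g) for g in gs]
print(sq_err(G), sum(errs) / T)
assert sq_err(G) <= sum(errs) / T  # blend is never worse than the average g_t
```

The inequality holds for any such collection of hypotheses; how much better the blend is depends on how diverse the g_t are.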


Blending and Bagging Uniform Blending

Theoretical Analysis of Uniform Blending

G(x) = \frac{1}{T} \sum_{t=1}^{T} g_t(x)

avg\left( (g_t(x) - f(x))^2 \right)
= avg\left( g_t^2 - 2 g_t f + f^2 \right)
= avg\left( g_t^2 \right) - 2Gf + f^2
= avg\left( g_t^2 \right) - G^2 + (G - f)^2
= avg\left( g_t^2 \right) - 2G^2 + G^2 + (G - f)^2
= avg\left( g_t^2 - 2 g_t G + G^2 \right) + (G - f)^2
= avg\left( (g_t - G)^2 \right) + (G - f)^2

avg\left( E_{out}(g_t) \right) = avg\left( \mathbb{E}(g_t - G)^2 \right) + E_{out}(G) \ge E_{out}(G)

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 9/23
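The key identity, avg((g_t − f)²) = avg((g_t − G)²) + (G − f)², can be verified numerically. The snippet below uses made-up hypothesis outputs g_vals at a single point x and checks the decomposition term by term:

```python
# Minimal numerical check of the decomposition: g_vals are hypothetical
# outputs g_t(x) at one point x; f_val is the target f(x) there.
g_vals = [1.2, 0.7, 1.9, 0.4, 1.3]
f_val = 1.0

T = len(g_vals)
G_val = sum(g_vals) / T  # uniform blend at x

avg = lambda vs: sum(vs) / len(vs)
lhs = avg([(g - f_val) ** 2 for g in g_vals])
rhs = avg([(g - G_val) ** 2 for g in g_vals]) + (G_val - f_val) ** 2
assert abs(lhs - rhs) < 1e-9  # identity holds exactly
```

Since avg((g_t − G)²) is non-negative, the blend G is never worse on average than a randomly chosen g_t.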


Blending and Bagging Uniform Blending

Some Special g_t

consider a virtual iterative process that for t = 1, 2, ..., T:

1. request size-N data D_t from P^N (i.i.d.)
2. obtain g_t by A(D_t)

\bar{g} = \lim_{T \to \infty} G = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} g_t = \mathbb{E}_{D}\, A(D)

avg\left( E_{out}(g_t) \right) = avg\left( \mathbb{E}(g_t - \bar{g})^2 \right) + E_{out}(\bar{g})

expected performance of A = expected deviation to consensus + performance of consensus

• performance of consensus: called bias
• expected deviation to consensus: called variance

uniform blending: reduces variance for more stable performance

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 10/23
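The virtual process above can be simulated. In this sketch (all names and constants are assumptions for illustration), the "algorithm" A simply fits a constant — the sample mean — to each size-N dataset drawn from y = f + Gaussian noise, so bias and variance can be computed directly against the consensus:

```python
import random

random.seed(1)

# Virtual process: draw T size-N datasets i.i.d. from y = f_val + noise,
# and let A(D) return the constant hypothesis mean(D).
f_val, noise_sd, N, T = 2.0, 1.0, 10, 2000

def A(data):
    return sum(data) / len(data)  # constant hypothesis g_t

gs = [A([random.gauss(f_val, noise_sd) for _ in range(N)]) for _ in range(T)]
g_bar = sum(gs) / T  # finite-T stand-in for the consensus g-bar

bias = (g_bar - f_val) ** 2                       # performance of consensus
variance = sum((g - g_bar) ** 2 for g in gs) / T  # deviation to consensus
avg_err = sum((g - f_val) ** 2 for g in gs) / T   # avg performance of A

assert abs(avg_err - (bias + variance)) < 1e-9    # decomposition holds
assert 0.05 < variance < 0.2                      # variance of mean ~ sd^2/N = 0.1
```

Averaging many such g_t removes the variance term, which is exactly why uniform blending stabilizes performance.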


Blending and Bagging Uniform Blending

Consider applying uniform blending G(x) =

T 1

P

T

t=1

g

t

(x) on linear regression hypotheses g

t

(x) = innerprod(w

t

,

x). Which of the following

property best describes the resulting G(x)?

1

a constant function of

x

2

a linear function of

x

3

a quadratic function of

x

4

none of the other choices

Reference Answer: 2

G(x) = innerprod 1

T

T

X

t=1

w t

,

x

!

which is clearly a linear function of

x. Note that

we write ‘innerprod’ instead of the usual

‘transpose’ notation to avoid symbol conflict with T (number of hypotheses).

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 11/23
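The reference answer can be checked directly: averaging the predictions of linear hypotheses equals predicting with the averaged weight vector. The weight vectors below are arbitrary illustrative values:

```python
# Uniform blending of linear hypotheses g_t(x) = <w_t, x> yields
# G(x) = <w_bar, x>: still a single linear function of x.
ws = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]
T = len(ws)

dot = lambda w, x: sum(wi * xi for wi, xi in zip(w, x))
G = lambda x: sum(dot(w, x) for w in ws) / T            # blend of predictions
w_bar = [sum(w[i] for w in ws) / T for i in range(2)]   # averaged weights

x = [2.0, -1.0]
assert abs(G(x) - dot(w_bar, x)) < 1e-12  # identical on any input
```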


Blending and Bagging Linear and Any Blending

Linear Blending

linear blending: known g_t, each to be given an α_t ballot:

G(x) = sign\left( \sum_{t=1}^{T} \alpha_t \cdot g_t(x) \right) with \alpha_t \ge 0

computing 'good' \alpha_t: \min_{\alpha_t \ge 0} E_{in}(\alpha)

linear blending for regression:

\min_{\alpha_t \ge 0} \frac{1}{N} \sum_{n=1}^{N} \left( y_n - \sum_{t=1}^{T} \alpha_t g_t(x_n) \right)^2

LinReg + transformation:

\min_{w_i} \frac{1}{N} \sum_{n=1}^{N} \left( y_n - \sum_{i=1}^{\tilde{d}} w_i \phi_i(x_n) \right)^2
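A minimal sketch of "LinReg + transformation": each x_n is transformed into z_n = (g_1(x_n), ..., g_T(x_n)) and plain least squares is run for α. The two base hypotheses and the dataset here are made up for illustration, and the α_t ≥ 0 constraint is dropped for simplicity:

```python
# Linear blending as linear regression on the transformed features
# z_n = (g_1(x_n), g_2(x_n)); constraint alpha_t >= 0 omitted here.
g1 = lambda x: x          # two hypothetical base hypotheses
g2 = lambda x: x * x

xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 2.0, 6.0, 12.0]          # target happens to be x + x^2

Z = [[g1(x), g2(x)] for x in xs]    # transformed data

# Solve the 2x2 normal equations Z^T Z alpha = Z^T y by hand.
a = sum(z[0] * z[0] for z in Z); b = sum(z[0] * z[1] for z in Z)
c = sum(z[1] * z[1] for z in Z)
p = sum(z[0] * y for z, y in zip(Z, ys))
q = sum(z[1] * y for z, y in zip(Z, ys))
det = a * c - b * b
alpha = [(c * p - b * q) / det, (a * q - b * p) / det]

blend = lambda x: alpha[0] * g1(x) + alpha[1] * g2(x)
assert all(abs(blend(x) - y) < 1e-9 for x, y in zip(xs, ys))
```

In practice a constrained solver (e.g. non-negative least squares) would be used when the α_t ≥ 0 requirement matters; the transformation view stays the same.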
