
Blending and Bagging: Linear and Any Blending

Constraint on α_t

linear blending = LinModel + hypotheses as transform + constraints:

$\min_{\alpha_t \ge 0} \; \frac{1}{N} \sum_{n=1}^{N} \mathrm{err}\!\left( y_n, \; \sum_{t=1}^{T} \alpha_t\, g_t(\mathbf{x}_n) \right)$

linear blending for binary classification: if α_t < 0, then α_t g_t(x) = |α_t| (−g_t(x)),
i.e. a negative α_t for g_t ≡ a positive |α_t| for −g_t

if you have a stock up/down classifier with 99% error, tell me! :-)

in practice, the constraints are often simply dropped:
linear blending = LinModel + hypotheses as transform

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 13/23

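To make the optimization above concrete, here is a minimal numpy sketch of the in-practice version, with the α_t ≥ 0 constraint dropped so blending reduces to least squares on the transformed inputs. The three stump hypotheses and the toy validation data are hypothetical stand-ins, not from the lecture.

```python
import numpy as np

# Minimal sketch of (unconstrained) linear blending for binary classification.
# The g_t would normally be pre-trained models; three toy decision stumps
# stand in for them here (hypothetical, for illustration only).
g_list = [
    lambda x: np.sign(1 - x),    # g_1
    lambda x: np.sign(1 + x),    # g_2
    lambda x: -np.ones_like(x),  # g_3: constant -1
]

def phi(x):
    """Hypotheses as transform: Phi(x) = (g_1(x), ..., g_T(x))."""
    return np.stack([g(x) for g in g_list], axis=1)

# toy validation data (assumed given)
x_val = np.array([-2.0, -0.5, 0.0, 0.7, 3.0])
y_val = np.array([-1.0, 1.0, 1.0, 1.0, -1.0])

# LinModel on the transformed data: least-squares alpha, no alpha_t >= 0
alpha, *_ = np.linalg.lstsq(phi(x_val), y_val, rcond=None)

def G(x):
    """Blended classifier: sign of the alpha-weighted vote."""
    return np.sign(phi(x) @ alpha)

# a negative alpha_t simply means "trust -g_t with weight |alpha_t|"
print(alpha, G(x_val))
```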

Blending and Bagging: Linear and Any Blending

Linear Blending versus Selection

in practice, often g_1 ∈ H_1, g_2 ∈ H_2, ..., g_T ∈ H_T, each obtained by minimum E_in

recall: selection by minimum E_in picks the best of the best, paying $d_{\mathrm{VC}}\big( \bigcup_{t=1}^{T} \mathcal{H}_t \big)$

recall: linear blending includes selection as a special case, by setting α_t = ⟦E_val(g_t) smallest⟧

complexity price of linear blending with E_in (aggregation of best): $\ge d_{\mathrm{VC}}\big( \bigcup_{t=1}^{T} \mathcal{H}_t \big)$

like selection, blending is practically done with (E_val instead of E_in) + (g_t from minimum E_train); see the sketch after this slide

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 14/23

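As a quick illustration of the special case noted above, selection corresponds to a one-hot α vector; the validation errors below are made-up numbers, purely for illustration.

```python
import numpy as np

# Selection as a special case of linear blending:
# alpha_t = [[ E_val(g_t) smallest ]] is a one-hot weight vector.
E_val = np.array([0.25, 0.10, 0.30])   # hypothetical validation errors
alpha = np.zeros_like(E_val)
alpha[np.argmin(E_val)] = 1.0          # indicator of the smallest E_val
print(alpha)                           # [0. 1. 0.]: the blend collapses to g_2
```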

Blending and Bagging: Linear and Any Blending

Any Blending

Given g_1, g_2, ..., g_T from D_train, transform (x_n, y_n) in D_val to (z_n = Φ(x_n), y_n), where Φ(x) = (g_1(x), ..., g_T(x)).

Linear Blending
1. compute α = LinearModel({(z_n, y_n)})
2. return G_LINB(x) = LinearHypothesis_α(Φ(x))

Any Blending (Stacking)
1. compute g̃ = AnyModel({(z_n, y_n)})
2. return G_ANYB(x) = g̃(Φ(x))

any blending: powerful, achieves conditional blending, but with danger of overfitting, as always :-(

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 15/23

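The two boxed procedures map directly to code. Below is a sketch using scikit-learn; the synthetic dataset and the particular base/meta models are assumptions for illustration, not the lecture's choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

# Sketch of any blending (stacking) on a hypothetical binary task.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = (X[:, 0] + X[:, 1] ** 2 > 1).astype(int)

# split into D_train (for the g_t) and D_val (for the blender)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5, random_state=0)

# get g_1, ..., g_T from D_train (illustrative model choices)
base_models = [
    LogisticRegression().fit(X_tr, y_tr),
    DecisionTreeClassifier(max_depth=3).fit(X_tr, y_tr),
    KNeighborsClassifier().fit(X_tr, y_tr),
]

def phi(X):
    """Phi(x) = (g_1(x), ..., g_T(x)): base predictions as features."""
    return np.column_stack([g.predict(X) for g in base_models])

# step 1: compute g_tilde = AnyModel({(z_n, y_n)}) on the transformed D_val
g_tilde = DecisionTreeClassifier(max_depth=2).fit(phi(X_val), y_val)

# step 2: return G_ANYB(x) = g_tilde(Phi(x))
def G_anyb(X):
    return g_tilde.predict(phi(X))
```

Using a flexible AnyModel (here a small tree) as g̃ is what enables conditional blending, and also why the blender is fit on D_val rather than D_train: feeding the blender the same data that produced the g_t invites exactly the overfitting the slide warns about.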

Blending and Bagging: Linear and Any Blending

Blending in Practice

(Chen et al., A linear ensemble of individual and blended models for music rating prediction, 2012)

KDDCup 2011 Track 1: World Champion Solution by NTU

• validation set blending: a special any blending model. E_test (squared): 519.45 ⟹ 456.24, which helped secure the lead in the last two weeks
• test set blending: linear blending using Ẽ_test. E_test (squared): 456.24 ⟹ 442.06, which helped turn the tables in the last hour

blending is 'useful' in practice, despite the computational burden

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 16/23


Blending and Bagging: Linear and Any Blending

Fun Time

Consider three decision stump hypotheses from ℝ to {−1, +1}: g_1(x) = sign(1 − x), g_2(x) = sign(1 + x), g_3(x) = −1. When x = 0, what is the resulting Φ(x) = (g_1(x), g_2(x), g_3(x)) used in the returned hypothesis of linear/any blending?

1. (+1, +1, +1)
2. (+1, +1, −1)
3. (+1, −1, −1)
4. (−1, −1, −1)

Reference Answer: 2. At x = 0, g_1(0) = sign(1) = +1, g_2(0) = sign(1) = +1, and g_3(0) = −1, giving (+1, +1, −1). Too easy? :-)

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 17/23

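A quick check of the reference answer, evaluating the stumps at x = 0:

```python
import numpy as np

# Evaluate Phi(0) = (g_1(0), g_2(0), g_3(0)) for the Fun Time stumps.
g1 = lambda x: np.sign(1 - x)
g2 = lambda x: np.sign(1 + x)
g3 = lambda x: -1.0
print(g1(0.0), g2(0.0), g3(0.0))   # 1.0 1.0 -1.0 -> choice 2: (+1, +1, -1)
```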

Blending and Bagging: Bagging (Bootstrap Aggregation)

What We Have Done

blending: aggregate after getting the g_t; learning: aggregate as well as getting the g_t

aggregation type | blending | ...
