# Machine Learning Techniques (機器學習技法)

## 機器學習技法

### Lecture 7: Blending and Bagging

Hsuan-Tien Lin (林軒田) htlin@csie.ntu.edu.tw

Department of Computer Science and Information Engineering, National Taiwan University (國立台灣大學資訊工程系)

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 0/23

Blending and Bagging

### Lecture 6: Support Vector Regression

kernel ridge regression (dense) via ridge regression + representer theorem;
support vector regression (sparse) via regularized tube error + Lagrange dual

### 2 Combining Predictive Features: Aggregation Models

### 3 Distilling Implicit Features: Extraction Models

Blending and Bagging · Motivation of Aggregation

## An Aggregation Story

Your T friends g_1, · · · , g_T each predicts whether the stock will go up, as g_t(x). You can:

- **select** the most trust-worthy friend from their usual performance —validation!
- **mix** the predictions from all your friends uniformly —let them vote!
- **mix** the predictions from all your friends **non-uniformly** —let them vote, but give some friends more ballots
- **combine** the predictions conditionally —if [t satisfies some condition] give some ballots to friend t
- ...

aggregation models: **mix** or **combine** hypotheses (for better performance)

Blending and Bagging · Motivation of Aggregation

## Aggregation with Math Notations

Your T friends g_1, · · · , g_T each predicts whether the stock will go up, as g_t(x).

- **select** the most trust-worthy friend from their usual performance:
  G(x) = g_{t*}(x) with t* = argmin_{t ∈ {1,2,...,T}} E_val(g_t^−)
- **mix** the predictions from all your friends uniformly:
  G(x) = sign( Σ_{t=1}^{T} 1 · g_t(x) )
- **mix** the predictions from all your friends non-uniformly:
  G(x) = sign( Σ_{t=1}^{T} α_t · g_t(x) ) with α_t ≥ 0
- **combine** the predictions conditionally:
  G(x) = sign( Σ_{t=1}^{T} q_t(x) · g_t(x) ) with q_t(x) ≥ 0

aggregation models: a **rich family**
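The notational forms can be sketched in code. A minimal plain-Python sketch, where the decision-stump hypotheses and the `e_val` scores are hypothetical stand-ins rather than anything trained in the lecture:

```python
def sign(v):
    return 1 if v > 0 else -1 if v < 0 else 0

# hypothetical pre-trained hypotheses g_1, ..., g_T (decision stumps on 1-D x)
g_list = [
    lambda x: sign(1 - x),  # g_1
    lambda x: sign(1 + x),  # g_2
    lambda x: -1,           # g_3
]

def select(x, e_val):
    """select: G(x) = g_{t*}(x) with t* = argmin_t E_val(g_t^-)."""
    t_star = min(range(len(g_list)), key=lambda t: e_val[t])
    return g_list[t_star](x)

def mix_uniform(x):
    """uniform mix: G(x) = sign(sum_t 1 * g_t(x))."""
    return sign(sum(g(x) for g in g_list))

def mix_nonuniform(x, alpha):
    """non-uniform mix: G(x) = sign(sum_t alpha_t * g_t(x)), alpha_t >= 0."""
    return sign(sum(a * g(x) for a, g in zip(alpha, g_list)))

def combine(x, q_list):
    """conditional combine: G(x) = sign(sum_t q_t(x) * g_t(x)), q_t(x) >= 0."""
    return sign(sum(q(x) * g(x) for q, g in zip(q_list, g_list)))
```

Setting alpha to a one-hot vector recovers select, and setting it to all ones recovers the uniform mix, which is exactly how the slide nests the earlier forms inside the later ones to get a rich family.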

Blending and Bagging · Motivation of Aggregation

## Recall: Selection by Validation

G(x) = g_{t*}(x) with t* = argmin_{t ∈ {1,2,...,T}} E_val(g_t^−)

- simple and popular
- what if we use E_in(g_t) instead? that pays a complexity price in d_VC
- need one strong g_t^− to guarantee small E_val (and small E_out)

**selection:** rely on one strong hypothesis

**aggregation:** can we do better with many (possibly weaker) hypotheses?

Blending and Bagging · Motivation of Aggregation

## Why Might Aggregation Work?

- mix different weak hypotheses uniformly —G(x) 'strong'
  aggregation =⇒ feature transform (?)
- mix different random-PLA hypotheses uniformly —G(x) 'moderate'
  aggregation =⇒ regularization (?)

proper aggregation =⇒ **better performance**

Blending and Bagging · Motivation of Aggregation

## Fun Time

Consider three decision stump hypotheses from R to {−1, +1}:
g_1(x) = sign(1 − x), g_2(x) = sign(1 + x), g_3(x) = −1.
When mixing the three hypotheses uniformly, what is the resulting G(x)?

1. 2⟦|x| ≤ 1⟧ − 1
2. 2⟦|x| ≥ 1⟧ − 1
3. 2⟦x ≤ −1⟧ − 1
4. 2⟦x ≥ +1⟧ − 1

The correct answer is 1. The region that gets two positive votes, from g_1 and g_2, is |x| ≤ 1, and thus G(x) is positive only within that region. The three decision stumps g_t can be aggregated to form a more sophisticated hypothesis G.
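The quiz answer can also be checked numerically; a small sketch:

```python
def sign(v):
    return 1 if v > 0 else -1 if v < 0 else 0

def G(x):
    # uniform vote of g_1(x) = sign(1 - x), g_2(x) = sign(1 + x), g_3(x) = -1
    return sign(sign(1 - x) + sign(1 + x) - 1)

def choice_1(x):
    # choice 1: 2 * [[ |x| <= 1 ]] - 1
    return 2 * (abs(x) <= 1) - 1

# the two agree everywhere except the tie points x = +1 and x = -1,
# where sign(0) = 0 and the vote is undecided
assert all(G(x) == choice_1(x) for x in [-3, -0.5, 0, 0.5, 3])
```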

Blending and Bagging · Uniform Blending

## Uniform Blending (Voting) for Classification

uniform blending: known g_t, each with 1 ballot

G(x) = sign( Σ_{t=1}^{T} 1 · g_t(x) )

- same g_t (autocracy): as good as one single g_t
- very different g_t (diversity + democracy): **majority can correct minority**
- similar results with uniform voting for multiclass:
  G(x) = argmax_{1≤k≤K} Σ_{t=1}^{T} ⟦g_t(x) = k⟧

how about **regression**?
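The multiclass vote is a one-liner with a counter. A minimal sketch, where the toy hypotheses over classes {0, 1, 2} are hypothetical:

```python
from collections import Counter

def vote_multiclass(x, g_list):
    """Uniform multiclass vote: G(x) = argmax_k sum_t [[ g_t(x) = k ]]."""
    return Counter(g(x) for g in g_list).most_common(1)[0][0]

# toy hypotheses: two of the three vote for class 1 on any input
g_list = [lambda x: 0, lambda x: 1, lambda x: 1]
print(vote_multiclass(0.0, g_list))  # 1
```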

Blending and Bagging · Uniform Blending

## Uniform Blending for Regression

G(x) = (1/T) Σ_{t=1}^{T} g_t(x)

- same g_t (autocracy): as good as one single g_t
- very different g_t (diversity + democracy):
  some g_t(x) > f(x), some g_t(x) < f(x)
  =⇒ average **could be** more accurate than individual

even simple **uniform blending** can be better than any **single hypothesis**
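A toy numeric illustration of the point above, with made-up values: when some g_t(x) overshoot f(x) and others undershoot, the uniform average can beat every individual.

```python
f_x = 3.0                  # target value f(x) at some fixed x (toy number)
preds = [2.0, 2.5, 4.5]    # hypothetical g_t(x), scattered around f(x)

G_x = sum(preds) / len(preds)          # uniform blend: (1/T) * sum_t g_t(x)
err_G = (G_x - f_x) ** 2
errs = [(g - f_x) ** 2 for g in preds]

print(err_G, min(errs))    # blended error 0.0 beats the best single 0.25
```

Note the slide's claim is "could be", not "always is": if one g_t happens to sit exactly on f(x), that single hypothesis ties or beats the blend at that x.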

Blending and Bagging · Uniform Blending

## Theoretical Analysis of Uniform Blending

Let G(x) = (1/T) Σ_{t=1}^{T} g_t(x). For any fixed x, writing g_t for g_t(x), G for G(x), f for f(x), and avg for (1/T) Σ_{t=1}^{T}:

avg( (g_t − f)² )
= avg( g_t² − 2 g_t f + f² )
= avg( g_t² ) − 2 G f + f²                 (since avg(g_t) = G)
= avg( g_t² ) − G² + (G − f)²
= avg( g_t² − 2 g_t G + G² ) + (G − f)²    (since avg(g_t² − 2 g_t G + G²) = avg(g_t²) − G²)
= avg( (g_t − G)² ) + (G − f)²

averaging the same identity over the whole input distribution:

avg( E_out(g_t) ) = avg( 𝔼[ (g_t − G)² ] ) + E_out(G) ≥ E_out(G)
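The per-x identity can be verified numerically on toy numbers; a minimal sketch with hypothetical g_t(x) values drawn around a made-up target:

```python
import random

random.seed(0)
f_x = 1.0                                            # target f(x) (toy)
g = [f_x + random.gauss(0, 1) for _ in range(1000)]  # hypothetical g_t(x)
G_x = sum(g) / len(g)                                # uniform blend G(x)

# avg((g_t - f)^2) vs. avg((g_t - G)^2) + (G - f)^2
lhs = sum((gt - f_x) ** 2 for gt in g) / len(g)
rhs = sum((gt - G_x) ** 2 for gt in g) / len(g) + (G_x - f_x) ** 2

assert abs(lhs - rhs) < 1e-6     # identity holds up to float rounding
assert lhs >= (G_x - f_x) ** 2   # avg single error >= blended error
```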

Blending and Bagging · Uniform Blending

## Some Special g_t

consider a **virtual** iterative process that for t = 1, 2, . . . , T

1. requests a size-N dataset D_t from P^N (i.i.d.)
2. obtains g_t by A(D_t)

ḡ = lim_{T→∞} G = lim_{T→∞} (1/T) Σ_{t=1}^{T} g_t = 𝔼_D A(D)

avg( E_out(g_t) ) = avg( 𝔼[ (g_t − ḡ)² ] ) + E_out(ḡ)

expected performance of A = expected deviation to consensus + performance of consensus

- performance of consensus ḡ: called **bias**
- expected deviation to consensus: called **variance**

uniform blending: reduces **variance** for more stable performance
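The virtual process can be simulated. A sketch under toy assumptions: a made-up distribution P with f(x) = 2x plus noise, and a one-parameter least-squares learner standing in for A (neither comes from the lecture).

```python
import random

random.seed(1)

def draw_dataset(n=20):
    # toy P: x uniform on [-1, 1], y = 2x + Gaussian noise
    xs = [random.uniform(-1, 1) for _ in range(n)]
    return [(x, 2 * x + random.gauss(0, 0.5)) for x in xs]

def A(data):
    # stand-in learner: g(x) = w * x with least-squares w
    w = sum(x * y for x, y in data) / sum(x * x for x, _ in data)
    return lambda x, w=w: w * x

# virtual process: g_t = A(D_t) on T independently drawn datasets
g_list = [A(draw_dataset()) for _ in range(2000)]

x0 = 0.7
preds = [g(x0) for g in g_list]
g_bar = sum(preds) / len(preds)  # empirical consensus g_bar(x0)
variance = sum((p - g_bar) ** 2 for p in preds) / len(preds)
bias_sq = (g_bar - 2 * x0) ** 2
avg_err = sum((p - 2 * x0) ** 2 for p in preds) / len(preds)

# expected performance of A = variance + performance of consensus
assert abs(avg_err - (variance + bias_sq)) < 1e-6
```

With the empirical mean standing in for ḡ, the decomposition holds exactly (up to float rounding), and blending attacks only the variance term: the consensus itself, the bias, is untouched.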
