## Machine Learning Techniques
## (機器學習技法)

### Lecture 7: Blending and Bagging

Hsuan-Tien Lin (林軒田)
htlin@csie.ntu.edu.tw

Department of Computer Science & Information Engineering
National Taiwan University
(國立台灣大學資訊工程系)

## Roadmap

### 1 Embedding Numerous Features: Kernel Models

Lecture 6: Support Vector Regression
- **kernel ridge regression** (dense): via ridge regression + **representer theorem**
- **support vector regression** (sparse): via regularized **tube** error + **Lagrange dual**

### 2 Combining Predictive Features: Aggregation Models

Lecture 7: Blending and Bagging
- Motivation of Aggregation
- Uniform Blending
- Linear and Any Blending
- Bagging (Bootstrap Aggregation)

### 3 Distilling Implicit Features: Extraction Models
## An Aggregation Story

Your T friends g_1, · · · , g_T predict whether a stock will go up, as g_t(x). You can . . .

- **select** the most trustworthy friend based on their **usual performance**: validation!
- **mix** the predictions from all your friends **uniformly**: let them **vote!**
- **mix** the predictions from all your friends **non-uniformly**: let them vote, but **give some more ballots**
- **combine** the predictions **conditionally**: if **[t satisfies some condition]**, give some ballots to friend t
- . . .

**aggregation** models: **mix** or **combine** hypotheses (for better performance)

## Aggregation with Math Notations

Your T friends g_1, · · · , g_T predict whether a stock will go up, as g_t(x).

- **select** the most trustworthy friend based on their **usual performance**:
  G(x) = g_{t*}(x) with t* = argmin_{t ∈ {1,2,··· ,T}} E_val(g_t^-)
- **mix** the predictions from all your friends **uniformly**:
  G(x) = sign( Σ_{t=1}^{T} 1 · g_t(x) )
- **mix** the predictions from all your friends **non-uniformly**:
  G(x) = sign( Σ_{t=1}^{T} α_t · g_t(x) ) with α_t ≥ 0
  - includes **select**: α_t = JE_val(g_t^-) smallestK
  - includes **uniformly**: α_t = 1
- **combine** the predictions **conditionally**:
  G(x) = sign( Σ_{t=1}^{T} q_t(x) · g_t(x) ) with q_t(x) ≥ 0
  - includes **non-uniformly**: q_t(x) = α_t

aggregation models: a **rich family**
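As a concrete sketch of the first three schemes (select, uniform mix, non-uniform mix): the predictions, validation errors, and helper names below are all invented for illustration, assuming each g_t outputs +/-1 labels.

```python
import numpy as np

def select_best(preds, val_errors):
    """select: keep only the hypothesis with the smallest validation error."""
    t_star = int(np.argmin(val_errors))
    return preds[t_star]

def blend(preds, alpha=None):
    """mix: sign of a (weighted) vote over +/-1 predictions.
    alpha=None gives the uniform vote (every g_t gets one ballot)."""
    preds = np.asarray(preds, dtype=float)          # shape (T, n_points)
    if alpha is None:
        alpha = np.ones(len(preds))                 # uniform: alpha_t = 1
    return np.sign(np.asarray(alpha, dtype=float) @ preds)

# toy example: T = 3 friends' +/-1 predictions on four points
preds = [[+1, +1, -1, -1],
         [+1, -1, -1, +1],
         [+1, +1, +1, -1]]
val_errors = [0.30, 0.25, 0.40]

print(select_best(preds, val_errors))         # the single best friend
print(blend(preds))                           # uniform vote
print(blend(preds, alpha=[0.5, 2.0, 0.5]))    # non-uniform vote
```

Note that the uniform vote is the special case alpha_t = 1, matching the slide's hierarchy of schemes.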


## Recall: Selection by Validation

G(x) = g_{t*}(x) with t* = argmin_{t ∈ {1,2,··· ,T}} E_val(g_t^-)

- **simple** and popular
- what if we use E_in(g_t) instead of E_val(g_t^-)? **complexity price on d_VC, remember? :-)**
- need **one strong** g_t^- to guarantee small E_val (and small E_out)

**selection:** rely on one strong hypothesis
**aggregation:** can we do better with many (possibly weaker) hypotheses?
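The complexity-price point can be seen numerically: with nested model classes, E_in never increases with complexity, so selection must use held-out E_val. The setup below (sine target, polynomial hypotheses, split sizes) is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical data: target f(x) = sin(x) plus noise
x = rng.uniform(-3, 3, size=60)
y = np.sin(x) + 0.3 * rng.standard_normal(60)

# split into training and validation parts, as selection by E_val requires
x_tr, y_tr = x[:40], y[:40]
x_val, y_val = x[40:], y[40:]

degrees = [1, 2, 3, 5, 8]          # the T "friends": polynomial models g_t^-
models = [np.polyfit(x_tr, y_tr, d) for d in degrees]

mse = lambda c, xs, ys: float(np.mean((np.polyval(c, xs) - ys) ** 2))
e_in = [mse(c, x_tr, y_tr) for c in models]
e_val = [mse(c, x_val, y_val) for c in models]

t_star = int(np.argmin(e_val))     # select by validation error, not E_in
print("selected degree:", degrees[t_star])
print("E_in  per degree:", [round(e, 3) for e in e_in])
print("E_val per degree:", [round(e, 3) for e in e_val])
```

Because each polynomial class contains the lower-degree ones, E_in is monotonically non-increasing in degree; picking by E_in would always favor the most complex model.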


## Why Might Aggregation Work?

- mix **different weak hypotheses** uniformly: G(x) 'strong'
  - aggregation ⇒ **feature transform (?)**
- mix **different random-PLA hypotheses** uniformly: G(x) 'moderate'
  - aggregation ⇒ **regularization (?)**

proper aggregation ⇒ **better performance**


## Fun Time

Consider three decision stump hypotheses from R to {−1, +1}:
g_1(x) = sign(1 − x), g_2(x) = sign(1 + x), g_3(x) = −1. When mixing the three hypotheses uniformly, what is the resulting G(x)?

1. 2J|x| ≤ 1K − 1
2. 2J|x| ≥ 1K − 1
3. 2Jx ≤ −1K − 1
4. 2Jx ≥ +1K − 1

**Reference Answer: 1**

The region that gets two positive votes, from g_1 and g_2, is |x| ≤ 1, and thus G(x) is positive only within that region. We see that the three decision stumps g_t can be aggregated to form a more sophisticated hypothesis G.
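A quick numerical check of the quiz, coding the three stumps directly and taking the uniform majority vote (evaluation points chosen away from the tie points x = ±1):

```python
import numpy as np

# the three decision stumps from the quiz
g1 = lambda x: np.sign(1 - x)
g2 = lambda x: np.sign(1 + x)
g3 = lambda x: -np.ones_like(x, dtype=float)

def G(x):
    """uniform blending of g1, g2, g3 by majority vote."""
    x = np.asarray(x, dtype=float)
    return np.sign(g1(x) + g2(x) + g3(x))

xs = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(G(xs))   # positive exactly where |x| < 1, matching answer 1
```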

## Uniform Blending (Voting) for Classification

uniform blending: known g_t, each with 1 ballot

G(x) = sign( Σ_{t=1}^{T} 1 · g_t(x) )

- same g_t (autocracy): as good as one single g_t
- very different g_t (diversity + **democracy**): majority can **correct** minority
- similar results with uniform voting for multiclass:
  G(x) = argmax_{1 ≤ k ≤ K} Σ_{t=1}^{T} Jg_t(x) = kK

how about **regression?**
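The multiclass argmax vote can be sketched as follows; the function name and the toy label matrix are invented for illustration.

```python
import numpy as np

def vote_multiclass(preds):
    """uniform multiclass blending: G(x) = argmax_k sum_t [g_t(x) = k].
    preds: (T, n_points) array of integer class labels."""
    preds = np.asarray(preds)
    classes = np.unique(preds)                      # candidate labels k
    counts = np.array([(preds == k).sum(axis=0) for k in classes])
    return classes[np.argmax(counts, axis=0)]       # ties go to smallest label

# toy example: T = 3 classifiers, 4 points, classes {0, 1, 2}
preds = [[0, 1, 2, 2],
         [0, 1, 1, 2],
         [1, 1, 2, 0]]
print(vote_multiclass(preds))   # majority label per point
```

For K = 2 with labels {−1, +1}, this reduces to the sign-of-sum vote above.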


## Uniform Blending for Regression

G(x) = (1/T) Σ_{t=1}^{T} g_t(x)

- same g_t (autocracy): as good as one single g_t
- very different g_t (diversity + **democracy**): some g_t(x) > f(x), some g_t(x) < f(x) ⇒ average **could be** more accurate than any individual

**diverse hypotheses:** even simple **uniform blending** can be better than any **single hypothesis**
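A small simulation of this averaging effect, assuming a hypothetical sine target and T regressors whose errors scatter above and below f independently:

```python
import numpy as np

rng = np.random.default_rng(1)

f = np.sin                          # hypothetical target function
x = np.linspace(-3, 3, 100)

# T diverse regressors: the target plus independent per-hypothesis error,
# so some g_t(x) > f(x) and some g_t(x) < f(x)
T = 25
gs = [f(x) + 0.5 * rng.standard_normal(x.shape) for _ in range(T)]

G = np.mean(gs, axis=0)             # uniform blending

mse = lambda g: float(np.mean((g - f(x)) ** 2))
avg_single = float(np.mean([mse(g) for g in gs]))
print("avg single-hypothesis MSE:", round(avg_single, 4))
print("blended G MSE            :", round(mse(G), 4))
```

By the decomposition avg((g_t − f)²) = avg((g_t − G)²) + (G − f)², the blend's error is never worse than the average single-hypothesis error, and strictly better whenever the g_t disagree.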


Blending and Bagging Uniform Blending

## Theoretical Analysis of Uniform Blending

$$G(x) = \frac{1}{T} \sum_{t=1}^{T} g_t(x)$$

$$
\begin{aligned}
\operatorname{avg}\left((g_t(x) - f(x))^2\right)
&= \operatorname{avg}\left(g_t^2 - 2 g_t f + f^2\right) \\
&= \operatorname{avg}\left(g_t^2\right) - 2Gf + f^2 \\
&= \operatorname{avg}\left(g_t^2\right) - G^2 + (G - f)^2 \\
&= \operatorname{avg}\left(g_t^2\right) - 2G^2 + G^2 + (G - f)^2 \\
&= \operatorname{avg}\left(g_t^2 - 2 g_t G + G^2\right) + (G - f)^2 \\
&= \operatorname{avg}\left((g_t - G)^2\right) + (G - f)^2
\end{aligned}
$$

Taking expectation over $\mathbf{x}$ on both sides (with $\mathcal{E}$ denoting that expectation):

$$\operatorname{avg}\left(E_{\text{out}}(g_t)\right) = \operatorname{avg}\left(\mathcal{E}(g_t - G)^2\right) + E_{\text{out}}(G) \;\geq\; E_{\text{out}}(G)$$
Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 9/23
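The chain of equalities can be verified numerically at a single point $x$: with $G$ the mean of the $g_t(x)$, the identity $\operatorname{avg}((g_t - f)^2) = \operatorname{avg}((g_t - G)^2) + (G - f)^2$ holds exactly. A minimal NumPy sketch with made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(1)

# arbitrary predictions of T hypotheses at one point x, plus the target f(x)
g = rng.normal(size=10)     # g_t(x) for t = 1..T
f = 0.3                     # f(x)
G = g.mean()                # uniform blend G(x)

lhs = np.mean((g - f) ** 2)                     # avg squared error of the g_t
rhs = np.mean((g - G) ** 2) + (G - f) ** 2      # deviation-to-G + blend error

print(np.isclose(lhs, rhs))   # the identity holds exactly
```

Since the deviation term is non-negative, the blend's error $(G - f)^2$ can never exceed the average individual error, which is the inequality on the slide.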


Blending and Bagging Uniform Blending

## Some Special g_t

consider a **virtual** iterative process that, for t = 1, 2, . . . , T:

1. requests a size-N dataset $\mathcal{D}_t$ from $P^N$ (i.i.d.)
2. obtains $g_t$ by $\mathcal{A}(\mathcal{D}_t)$

$$\bar{g} = \lim_{T \to \infty} G = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} g_t = \mathop{\mathcal{E}}_{\mathcal{D}} \mathcal{A}(\mathcal{D})$$

$$\operatorname{avg}\left(E_{\text{out}}(g_t)\right) = \operatorname{avg}\left(\mathcal{E}(g_t - \bar{g})^2\right) + E_{\text{out}}(\bar{g})$$

**expected** performance of $\mathcal{A}$ = **expected deviation** to **consensus** + performance of **consensus**

- performance of **consensus**: called **bias**
- **expected deviation** to **consensus**: called **variance**

uniform blending:
reduces **variance** for more stable performance
Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 10/23
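The virtual process can be simulated: draw many datasets, run the same learner $\mathcal{A}$ on each, estimate the consensus $\bar{g}$ as the average hypothesis, and check that the average error splits into variance plus bias. A minimal NumPy sketch, where the sine target and the slope-only least-squares learner are illustrative stand-ins for $\mathcal{A}$, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(2)

# virtual process: draw many size-N datasets from P, run A on each.
# Here A fits a line y = w*x through the origin by least squares.
f = lambda x: np.sin(np.pi * x)
x_grid = np.linspace(-1, 1, 101)

T, N = 2000, 5
g_preds = np.empty((T, x_grid.size))
for t in range(T):
    x = rng.uniform(-1, 1, N)
    y = f(x)                          # noiseless targets for simplicity
    w = (x @ y) / (x @ x)             # least-squares slope through origin
    g_preds[t] = w * x_grid

g_bar = g_preds.mean(axis=0)          # consensus: g_bar approximates E_D[A(D)]

avg_Eout = np.mean((g_preds - f(x_grid)) ** 2)   # avg(E_out(g_t))
variance = np.mean((g_preds - g_bar) ** 2)       # expected deviation to consensus
bias = np.mean((g_bar - f(x_grid)) ** 2)         # E_out(g_bar)

print(np.isclose(avg_Eout, variance + bias))
```

With $\bar{g}$ taken as the empirical mean over the T runs, the decomposition holds exactly by the same algebra as the uniform-blending analysis; uniform blending attacks only the variance term.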
