(1)

Machine Learning Techniques
(機器學習技法)

Lecture 14: Radial Basis Function Network

Hsuan-Tien Lin (林軒田)
htlin@csie.ntu.edu.tw

Department of Computer Science & Information Engineering
National Taiwan University
(國立台灣大學資訊工程系)

(2)

Radial Basis Function Network

Roadmap

1 Embedding Numerous Features: Kernel Models
2 Combining Predictive Features: Aggregation Models
3 Distilling Implicit Features: Extraction Models

Lecture 13: Deep Learning
pre-training with denoising autoencoder (non-linear PCA) and fine-tuning with backprop for NNet with many layers

Lecture 14: Radial Basis Function Network
RBF Network Hypothesis
RBF Network Learning
k-Means Algorithm

(3)

Radial Basis Function Network
RBF Network Hypothesis

Gaussian SVM Revisited

$$ g_{\text{SVM}}(x) = \text{sign}\Big( \sum_{\text{SV}} \alpha_n y_n \exp\big(-\gamma \|x - x_n\|^2\big) + b \Big) $$

Gaussian SVM: find the $\alpha_n$ to combine Gaussians centered at the $x_n$; achieve large margin in infinite-dimensional space, remember? :-)

Gaussian kernel: also called the Radial Basis Function (RBF) kernel
• radial: only depends on the distance between x and the 'center' $x_n$
• basis function: to be 'combined'

let $g_n(x) = y_n \exp(-\gamma \|x - x_n\|^2)$: then $g_{\text{SVM}}(x) = \text{sign}\big( \sum_{\text{SV}} \alpha_n g_n(x) + b \big)$,
a linear aggregation of selected radial hypotheses

Radial Basis Function (RBF) Network: linear aggregation of radial hypotheses
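As a concrete illustration, here is a minimal NumPy sketch of one radial hypothesis $g_n$ and the aggregation above; the support vectors, labels, multipliers $\alpha_n$, and bias b are assumed to come from an already-trained SVM, and all names are illustrative.

```python
import numpy as np

def g_n(x, x_n, y_n, gamma=1.0):
    """One radial hypothesis: the label y_n, weighted by the Gaussian similarity of x to x_n."""
    return y_n * np.exp(-gamma * np.sum((x - x_n) ** 2))

def g_svm(x, support_vectors, labels, alphas, b, gamma=1.0):
    """Gaussian SVM as a linear aggregation of the selected radial hypotheses."""
    total = sum(a * g_n(x, sv, y, gamma)
                for a, sv, y in zip(alphas, support_vectors, labels))
    return np.sign(total + b)
```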


(12)

Radial Basis Function Network
RBF Network Hypothesis

From Neural Network to RBF Network

[figure: a Neural Network with inputs x_0 = 1, x_1, ..., x_d, a hidden layer of tanh units with weights w_ij^(1) and output weights w_j1^(2), next to an RBF Network with the same inputs, a hidden layer of RBF units (centers), and a linear output layer (votes)]

• hidden layer different: (inner product + tanh) versus (distance + Gaussian)
• output layer same: just linear aggregation

RBF Network: historically a type of NNet


(17)

Radial Basis Function Network
RBF Network Hypothesis

RBF Network Hypothesis

$$ h(x) = \text{Output}\Big( \sum_{m=1}^{M} \beta_m \, \text{RBF}(x, \mu_m) + b \Big) $$

key variables: centers $\mu_m$; (signed) votes $\beta_m$

[figure: RBF Network with inputs x_0 = 1, x_1, ..., x_d, a hidden layer of RBF units (centers), and an Output unit (votes)]

$g_{\text{SVM}}$ for the Gaussian SVM:
• RBF: Gaussian; Output: sign (binary classification)
• M = #SV; $\mu_m$: the SVM support vectors $x_m$; $\beta_m$: $\alpha_m y_m$ from the SVM dual

learning: given RBF and Output, decide $\mu_m$ and $\beta_m$
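A minimal sketch of the general hypothesis above, with the RBF and Output passed in as functions (the Gaussian RBF with sign output recovers the Gaussian-SVM special case); the function names and the example values are illustrative.

```python
import numpy as np

def rbf_network(x, centers, betas, b=0.0,
                rbf=lambda x, mu: np.exp(-np.sum((x - mu) ** 2)),
                output=np.sign):
    """RBF Network hypothesis: h(x) = Output( sum_m beta_m * RBF(x, mu_m) + b )."""
    score = sum(beta * rbf(x, mu) for beta, mu in zip(betas, centers)) + b
    return output(score)

# example: three centers with signed votes, sign output for binary classification
centers = [np.array([0.0, 0.0]), np.array([1.0, 1.0]), np.array([2.0, 0.0])]
betas = [0.5, 0.8, -1.0]
print(rbf_network(np.array([0.9, 1.1]), centers, betas))
```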


(22)

Radial Basis Function Network
RBF Network Hypothesis

RBF and Similarity

general similarity functions between x and x':
• $\text{Neuron}(x, x') = \tanh(\gamma x^T x' + 1)$
• $\text{DNASim}(x, x') = \text{EditDistance}(x, x')$

kernel: similarity via Z-space inner product, governed by Mercer's condition, remember? :-)
• $\text{Poly}(x, x') = (1 + x^T x')^2$
• $\text{Gaussian}(x, x') = \exp(-\gamma \|x - x'\|^2)$

RBF: similarity via X-space distance, often monotonically non-increasing in the distance
• $\text{Gaussian}(x, x') = \exp(-\gamma \|x - x'\|^2)$
• $\text{Truncated}(x, x') = [\![\|x - x'\| \le 1]\!] \, (1 - \|x - x'\|)^2$

RBF Network: distance-based similarity-to-centers as the feature transform
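A minimal sketch of the truncated RBF above; it could be swapped in for the Gaussian in the earlier rbf_network sketch, since both are similarities that decrease with X-space distance.

```python
import numpy as np

def truncated_rbf(x, center):
    """Truncated RBF: (1 - ||x - center||)^2 within unit distance of the center, 0 outside."""
    dist = np.linalg.norm(x - center)
    return (1.0 - dist) ** 2 if dist <= 1.0 else 0.0
```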


(26)

Radial Basis Function Network
RBF Network Hypothesis

Fun Time

Which of the following is not a radial basis function?
1 $\phi(x, \mu) = \exp(-\gamma \|x - \mu\|^2)$
2 $\phi(x, \mu) = -\sqrt{x^T x - 2 x^T \mu + \mu^T \mu}$
3 $\phi(x, \mu) = [\![x = \mu]\!]$
4 $\phi(x, \mu) = x^T x + \mu^T \mu$

Reference Answer: 4

Note that 3 is an extreme case of 1 (Gaussian) with $\gamma \to \infty$, and 2 contains a $\|x - \mu\|^2$ somewhere :-).
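To see the note concretely: option 2 depends on x only through its distance to $\mu$, since

$$ -\sqrt{x^T x - 2 x^T \mu + \mu^T \mu} = -\sqrt{\|x - \mu\|^2} = -\|x - \mu\|, $$

whereas option 4 equals $\|x\|^2 + \|\mu\|^2$, which cannot be written as a function of $\|x - \mu\|$ alone, hence it is not radial.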


(28)

Radial Basis Function Network
RBF Network Learning

Full RBF Network

$$ h(x) = \text{Output}\Big( \sum_{m=1}^{M} \beta_m \, \text{RBF}(x, \mu_m) \Big) $$

• full RBF Network: M = N and each $\mu_m = x_m$
• physical meaning: each $x_m$ influences similar x by $\beta_m$
• e.g. uniform influence with $\beta_m = 1 \cdot y_m$ for binary classification:

$$ g_{\text{uniform}}(x) = \text{sign}\Big( \sum_{m=1}^{N} y_m \exp\big(-\gamma \|x - x_m\|^2\big) \Big) $$

aggregate each example's opinion subject to similarity

full RBF Network: a lazy way to decide $\mu_m$
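A minimal sketch of $g_{\text{uniform}}$ above, assuming X is an N x d NumPy array of training inputs and y holds the corresponding ±1 labels; the names and the default γ are illustrative.

```python
import numpy as np

def g_uniform(x, X, y, gamma=1.0):
    """Uniform full RBF Network: every example x_m casts its label y_m as a vote,
    weighted by the Gaussian similarity of the query x to x_m."""
    weights = np.exp(-gamma * np.sum((X - x) ** 2, axis=1))  # similarity to each x_m
    return np.sign(weights @ y)                              # aggregated signed vote
```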


(33)

Radial Basis Function Network
RBF Network Learning

Nearest Neighbor

$$ g_{\text{uniform}}(x) = \text{sign}\Big( \sum_{m=1}^{N} y_m \exp\big(-\gamma \|x - x_m\|^2\big) \Big) $$

• $\exp(-\gamma \|x - x_m\|^2)$ is maximum when x is closest to $x_m$; the maximum one often dominates the $\sum_{m=1}^{N}$ term
• so take the $y_m$ of the maximum $\exp(\ldots)$ instead of voting over all $y_m$: selection instead of aggregation
• physical meaning: $g_{\text{nbor}}(x) = y_m$ such that x is closest to $x_m$, called the nearest neighbor model
• can also uniformly aggregate the k nearest neighbors: k nearest neighbor

k nearest neighbor: also lazy but very intuitive
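A minimal sketch of the selection-based variant, assuming the same X, y layout as above; k = 1 gives the nearest neighbor model $g_{\text{nbor}}$.

```python
import numpy as np

def g_knn(x, X, y, k=1):
    """k nearest neighbor: select the k examples closest to x and
    uniformly aggregate their labels (selection instead of full aggregation)."""
    dists = np.sum((X - x) ** 2, axis=1)   # squared distances to every x_m
    nearest = np.argsort(dists)[:k]        # indices of the k closest examples
    return np.sign(np.sum(y[nearest]))     # uniform vote among the k neighbors
```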


(42)

Radial Basis Function Network
RBF Network Learning

Interpolation by Full RBF Network

full RBF Network for squared-error regression (Output is just the identity here):

$$ h(x) = \sum_{m=1}^{N} \beta_m \, \text{RBF}(x, x_m) $$

just linear regression on RBF-transformed data:
$z_n = [\text{RBF}(x_n, x_1), \text{RBF}(x_n, x_2), \ldots, \text{RBF}(x_n, x_N)]$

optimal β? $\beta = (Z^T Z)^{-1} Z^T y$, if $Z^T Z$ is invertible, remember? :-)

size of Z? N (examples) by N (centers): a symmetric square matrix

theoretical fact: if the $x_n$ are all different, Z with the Gaussian RBF is invertible

optimal β with invertible Z: $\beta = Z^{-1} y$
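A minimal NumPy sketch of the exact-interpolation solution $\beta = Z^{-1} y$ above; fit_full_rbf and predict_full_rbf are illustrative names, and γ is an assumed hyperparameter.

```python
import numpy as np

def fit_full_rbf(X, y, gamma=1.0):
    """Exact interpolation: Z[n, m] = exp(-gamma * ||x_n - x_m||^2), beta = Z^{-1} y."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)  # N x N pairwise squared distances
    Z = np.exp(-gamma * sq_dists)        # symmetric; invertible if the x_n are all different
    return np.linalg.solve(Z, y)         # beta = Z^{-1} y, without forming the inverse explicitly

def predict_full_rbf(x, X, beta, gamma=1.0):
    """h(x) = sum_m beta_m * exp(-gamma * ||x - x_m||^2)."""
    return beta @ np.exp(-gamma * np.sum((X - x) ** 2, axis=1))
```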


(50)

Radial Basis Function Network
RBF Network Learning

Regularized Full RBF Network

full Gaussian RBF Network for regression with $\beta = Z^{-1} y$:

$$ g_{\text{RBF}}(x_1) = \beta^T z_1 = y^T Z^{-1} (\text{first column of } Z) = y^T \begin{bmatrix} 1 & 0 & \ldots & 0 \end{bmatrix}^T = y_1 $$

so $g_{\text{RBF}}(x_n) = y_n$, i.e. $E_{\text{in}}(g_{\text{RBF}}) = 0$, yeah!! :-)

• called exact interpolation for function approximation
• but overfitting for learning? :-(

how about regularization? e.g. ridge regression for β instead:
optimal $\beta = (Z^T Z + \lambda I)^{-1} Z^T y$

seen Z before? $Z = [\text{Gaussian}(x_n, x_m)]$ = the Gaussian kernel matrix K

effect of regularization in different spaces:
• kernel ridge regression: $\beta = (K + \lambda I)^{-1} y$
• regularized full RBFNet: $\beta = (Z^T Z + \lambda I)^{-1} Z^T y$
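A minimal sketch contrasting the two regularized solutions above; λ and γ are illustrative hyperparameters, and the function names are assumptions.

```python
import numpy as np

def gaussian_matrix(X, gamma=1.0):
    """Z = [Gaussian(x_n, x_m)], which is also the Gaussian kernel matrix K."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    return np.exp(-gamma * sq_dists)

def fit_regularized_full_rbf(X, y, gamma=1.0, lam=0.1):
    """Regularized full RBF Network: ridge regression in the RBF-transformed space,
    beta = (Z^T Z + lambda I)^{-1} Z^T y."""
    Z = gaussian_matrix(X, gamma)
    return np.linalg.solve(Z.T @ Z + lam * np.eye(len(y)), Z.T @ y)

def fit_kernel_ridge(X, y, gamma=1.0, lam=0.1):
    """Kernel ridge regression with the same matrix: beta = (K + lambda I)^{-1} y."""
    K = gaussian_matrix(X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(y)), y)
```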

