
Radial Basis Function Network: RBF Network Learning

Interpolation by Full RBF Network

full RBF Network for squared-error regression:

  $h(x) = \text{Output}\Big( \sum_{m=1}^{N} \beta_m \, \text{RBF}(x, x_m) \Big)$

—just linear regression on the RBF-transformed data

  $z_n = \big[\text{RBF}(x_n, x_1),\ \text{RBF}(x_n, x_2),\ \ldots,\ \text{RBF}(x_n, x_N)\big]$

optimal $\beta$? $\beta = (Z^T Z)^{-1} Z^T y$, if $Z^T Z$ is invertible, remember? :-)

size of $Z$? $N$ (examples) by $N$ (centers) —a symmetric square matrix

theoretical fact: if the $x_n$ are all different, $Z$ with the Gaussian RBF is invertible

optimal $\beta$ with invertible $Z$: $\beta = Z^{-1} y$
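As a concrete check of the matrix picture above, here is a minimal NumPy sketch (the toy data, the Gaussian width gamma, and all function names below are illustrative assumptions, not part of the lecture) that builds the N-by-N matrix Z from a Gaussian RBF and solves for beta both by the least-squares formula and by inverting Z directly.

```python
import numpy as np

def rbf_features(X, centers, gamma=1.0):
    """Z[n, m] = exp(-gamma * ||x_n - center_m||^2): the RBF transform of each example."""
    sq_dist = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dist)

# toy 1-D regression data; all x_n distinct, so Z below is invertible
X = np.linspace(-3, 3, 11).reshape(-1, 1)
y = np.sin(X[:, 0])

# full RBF network: every training example is a center, so Z is N x N and symmetric
Z = rbf_features(X, X, gamma=2.0)

# optimal beta by the usual linear-regression formula (Z^T Z)^{-1} Z^T y ...
beta_lsq = np.linalg.solve(Z.T @ Z, Z.T @ y)
# ... which, with Z invertible, coincides (up to numerical error) with beta = Z^{-1} y
beta_inv = np.linalg.solve(Z, y)

# the resulting h(x) = sum_m beta_m RBF(x, x_m) fits every training point exactly
h_train = rbf_features(X, X, gamma=2.0) @ beta_inv
print("max |h(x_n) - y_n|:", np.max(np.abs(h_train - y)))          # ~0 up to round-off
print("max |beta_lsq - beta_inv|:", np.max(np.abs(beta_lsq - beta_inv)))
```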


Regularized Full RBF Network

full Gaussian RBF Network for regression: $\beta = Z^{-1} y$

  $g_{\text{RBF}}(x_1) = \beta^T z_1 = y^T Z^{-1} (\text{first column of } Z) = y^T \,[1\ 0\ \ldots\ 0]^T = y_1$

  (the second step uses the symmetry of $Z$, so $(Z^{-1})^T = Z^{-1}$; and $Z^{-1}$ times its own first column is $[1\ 0\ \ldots\ 0]^T$)

—similarly $g_{\text{RBF}}(x_n) = y_n$ for every $n$, i.e. $E_{\text{in}}(g_{\text{RBF}}) = 0$, yeah!! :-)

called exact interpolation for function approximation,
but overfitting for learning? :-(

how about regularization? e.g. ridge regression for $\beta$ instead
—optimal $\beta = (Z^T Z + \lambda I)^{-1} Z^T y$

seen $Z$ before? $Z = [\text{Gaussian}(x_n, x_m)]$ = the Gaussian kernel matrix $K$

effect of regularization in different spaces:
  kernel ridge regression: $\beta = (K + \lambda I)^{-1} y$;
  regularized full RBFNet: $\beta = (Z^T Z + \lambda I)^{-1} Z^T y$
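To see this contrast numerically, the following sketch (again with made-up data and an arbitrary lambda; none of the names or constants come from the lecture) compares exact interpolation, the regularized full RBF network, and kernel ridge regression on the same Gaussian kernel matrix.

```python
import numpy as np

def gaussian_kernel(X, gamma=2.0):
    """K[n, m] = exp(-gamma * ||x_n - x_m||^2); for the full RBF network, Z = K."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dist)

rng = np.random.default_rng(1)
X = np.linspace(-3, 3, 12).reshape(-1, 1)
y = np.sin(X[:, 0]) + 0.2 * rng.standard_normal(12)    # noisy targets
Z = gaussian_kernel(X)                                  # N x N, symmetric, invertible
lam = 0.1                                               # illustrative lambda
I = np.eye(len(y))

# exact interpolation: beta = Z^{-1} y, so E_in = 0 but the fit chases the noise
beta_exact = np.linalg.solve(Z, y)
# regularized full RBF network: beta = (Z^T Z + lam I)^{-1} Z^T y
beta_rbfnet = np.linalg.solve(Z.T @ Z + lam * I, Z.T @ y)
# kernel ridge regression on the same K = Z: beta = (K + lam I)^{-1} y
beta_krr = np.linalg.solve(Z + lam * I, y)

for name, beta in [("exact interpolation", beta_exact),
                   ("regularized RBFNet", beta_rbfnet),
                   ("kernel ridge regression", beta_krr)]:
    e_in = np.mean((Z @ beta - y) ** 2)    # squared-error E_in on the training set
    print(f"{name:>24s}: E_in = {e_in:.4f}, ||beta|| = {np.linalg.norm(beta):.2f}")
# the two regularized solutions differ (they regularize in different spaces), and
# both trade a little E_in for smaller coefficients than exact interpolation
```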


Fewer Centers as Regularization

recall:

  $g_{\text{SVM}}(x) = \text{sign}\Big( \sum_{\text{SV}} \alpha_m y_m \exp\big(-\gamma \|x - x_m\|^2\big) + b \Big)$

—only ‘$\ll N$’ SVs needed in the ‘network’

next: $M \ll N$ centers instead of $M = N$

effect: regularization
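As a preview of the fewer-centers idea, here is a rough sketch in the same NumPy style; picking the M centers by random sampling of training points is purely an illustrative stand-in (the lecture goes on to discuss how to choose prototypes), and all names and constants below are assumptions.

```python
import numpy as np

def rbf_features(X, centers, gamma=1.0):
    """Z[n, m] = exp(-gamma * ||x_n - center_m||^2)."""
    sq_dist = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dist)

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X[:, 0]) + 0.2 * rng.standard_normal(100)

M = 8                                                    # M << N centers
centers = X[rng.choice(len(X), size=M, replace=False)]   # illustrative prototype choice

Z = rbf_features(X, centers, gamma=1.0)                  # now N x M instead of N x N
beta, *_ = np.linalg.lstsq(Z, y, rcond=None)             # linear regression on M features

# with only M coefficients the network can no longer interpolate all N points,
# so restricting the number of centers itself acts as a form of regularization
print(f"E_in with M = {M} centers: {np.mean((Z @ beta - y) ** 2):.4f}")
```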
