Interpolation by Full RBF Network

full RBF Network for squared error regression:
h(x) = Output( Σ_{m=1}^{N} β_m RBF(x, x_m) )

• just linear regression on RBF-transformed data
  z_n = [RBF(x_n, x_1), RBF(x_n, x_2), . . . , RBF(x_n, x_N)]
• optimal β? β = (Z^T Z)^{−1} Z^T y, if Z^T Z invertible, remember? :-)
• size of Z? N (examples) by N (centers) — symmetric square matrix
• theoretical fact: if x_n all different, Z with Gaussian RBF is invertible

optimal β with invertible Z: β = Z^{−1} y
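The steps above can be sketched in NumPy on hypothetical toy data: build the N-by-N matrix Z of RBF values, then note that with distinct inputs the linear-regression solution collapses to β = Z^{−1} y, which interpolates the training targets exactly. All data and the γ value here are made up for illustration.

```python
import numpy as np

def gaussian_rbf(x, center, gamma=1.0):
    # Gaussian RBF: exp(-gamma * ||x - center||^2)
    return np.exp(-gamma * np.sum((x - center) ** 2))

# hypothetical 1-D data: N examples, each x_n also serving as a center
X = np.array([[0.0], [1.0], [2.5], [4.0]])
y = np.array([1.0, -1.0, 0.5, 2.0])
N = len(X)

# Z[n, m] = RBF(x_n, x_m): N (examples) by N (centers), symmetric
Z = np.array([[gaussian_rbf(X[n], X[m]) for m in range(N)] for n in range(N)])

# with all x_n distinct, Z is invertible, so the linear-regression solution
# beta = (Z^T Z)^{-1} Z^T y reduces to beta = Z^{-1} y
beta = np.linalg.solve(Z, y)

# exact interpolation: h(x_n) = y_n on every training example
h = Z @ beta
print(np.allclose(h, y))  # True
```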
Radial Basis Function Network RBF Network Learning
Regularized Full RBF Network

full Gaussian RBF Network for regression: β = Z^{−1} y

g_RBF(x_1) = β^T z_1 = y^T Z^{−1} (first column of Z) = y^T [1 0 . . . 0]^T = y_1

— g_RBF(x_n) = y_n, i.e. E_in(g_RBF) = 0, yeah!! :-)

• called exact interpolation for function approximation
• but overfitting for learning? :-(
• how about regularization? e.g. ridge regression for β instead
  — optimal β = (Z^T Z + λI)^{−1} Z^T y
• seen Z? Z = [Gaussian(x_n, x_m)] = Gaussian kernel matrix K

effect of regularization in different spaces:
kernel ridge regression: β = (K + λI)^{−1} y;
regularized full RBFNet: β = (Z^T Z + λI)^{−1} Z^T y
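A minimal sketch (NumPy, with made-up data and a made-up λ) contrasting the two solutions: kernel ridge regression solves (K + λI)β = y, while the regularized full RBF network solves (Z^T Z + λI)β = Z^T y. With the Gaussian RBF, Z equals the kernel matrix K, yet the two β generally differ because the regularization acts in different spaces.

```python
import numpy as np

def gaussian_kernel(X, gamma=1.0):
    # K[n, m] = exp(-gamma * ||x_n - x_m||^2)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq)

X = np.random.RandomState(0).randn(20, 2)   # hypothetical inputs
y = np.random.RandomState(1).randn(20)      # hypothetical targets
lam = 0.1                                   # hypothetical lambda

K = gaussian_kernel(X)      # for the full RBF network, Z = K
I = np.eye(len(X))

# kernel ridge regression: beta = (K + lambda I)^{-1} y
beta_krr = np.linalg.solve(K + lam * I, y)

# regularized full RBF network: beta = (Z^T Z + lambda I)^{-1} Z^T y
beta_rbf = np.linalg.solve(K.T @ K + lam * I, K.T @ y)

# same Z = K, but regularization in different spaces: the betas differ
print(np.allclose(beta_krr, beta_rbf))
```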
Fewer Centers as Regularization

recall:
g_SVM(x) = sign( Σ_{SV} α_m y_m exp(−γ‖x − x_m‖²) + b )

— only ‘N’
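The recalled g_SVM can be sketched as follows, with hypothetical support vectors, multipliers α, labels, and γ; the point is that only the support vectors act as Gaussian centers, rather than all N examples.

```python
import numpy as np

def g_svm(x, support_vectors, alpha, y_sv, b=0.0, gamma=1.0):
    # g_SVM(x) = sign( sum_m alpha_m * y_m * exp(-gamma * ||x - x_m||^2) + b ),
    # summed over support vectors only
    s = sum(a * ys * np.exp(-gamma * np.sum((x - xm) ** 2))
            for a, ys, xm in zip(alpha, y_sv, support_vectors))
    return np.sign(s + b)

# hypothetical support vectors and multipliers
sv = np.array([[0.0, 0.0], [1.0, 1.0]])
alpha = np.array([0.5, 0.5])
y_sv = np.array([+1.0, -1.0])

# a query point near the positive support vector gets the positive label
print(g_svm(np.array([0.1, 0.0]), sv, alpha, y_sv))  # → 1.0
```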