Radial Basis Function Network / k-Means Algorithm

Prototype Optimization

with S_1, ..., S_M being a partition of {x_n},

    \min_{S_1,\dots,S_M;\;\mu_1,\dots,\mu_M} \sum_{n=1}^{N} \sum_{m=1}^{M} \llbracket x_n \in S_m \rrbracket \, \|x_n - \mu_m\|^2

• hard to optimize: joint combinatorial-numerical optimization
• two sets of variables: will optimize alternatingly

if S_1, ..., S_M fixed, just unconstrained optimization for each \mu_m:

    \nabla_{\mu_m} E_{\text{in}} = -2 \sum_{n=1}^{N} \llbracket x_n \in S_m \rrbracket (x_n - \mu_m) = -2 \Big( \sum_{x_n \in S_m} x_n - |S_m|\,\mu_m \Big)

optimal prototype \mu_m = average of x_n within S_m

for given S_1, ..., S_M, each \mu_m 'optimally computed' as consensus within S_m
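The zero-gradient condition above can be checked numerically; a minimal sketch with made-up data (`optimal_prototype` is an illustrative name, not from the lecture):

```python
import numpy as np

def optimal_prototype(points):
    """Optimal mu_m for a fixed cluster S_m: setting the gradient of
    E_in with respect to mu_m to zero gives the mean of the cluster."""
    return points.mean(axis=0)

# a made-up cluster S_m of three points in R^2
S_m = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]])
mu = optimal_prototype(S_m)                       # [1.0, 1.0]

# gradient -2 (sum_{x_n in S_m} x_n - |S_m| mu) vanishes at the mean
grad = -2.0 * (S_m.sum(axis=0) - len(S_m) * mu)
```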
k-Means Algorithm

use k prototypes instead of M historically (different from k nearest neighbor, though)

k-Means Algorithm
1. initialize \mu_1, \mu_2, ..., \mu_k: say, as k randomly chosen x_n
2. alternating optimization of E_{\text{in}}: repeatedly
   1. optimize S_1, S_2, ..., S_k: each x_n 'optimally partitioned' using its closest \mu_m
   2. optimize \mu_1, \mu_2, ..., \mu_k: each \mu_m 'optimally computed' as consensus within S_m
   until converge

converge: no change of S_1, S_2, ..., S_k anymore; guaranteed as E_{\text{in}} decreases during alternating minimization

k-Means: the most popular clustering algorithm through alternating minimization
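The two alternating steps can be sketched directly; a minimal NumPy implementation under the slide's stopping rule (function and variable names are my own):

```python
import numpy as np

def k_means(X, k, max_iter=100, rng=None):
    """Alternating minimization of E_in as on the slide:
    (1) partition each x_n to its closest mu, (2) recompute each mu
    as the mean of its cluster; stop when the partition is unchanged."""
    rng = np.random.default_rng(rng)
    # step 1: initialize mu_1..mu_k as k randomly chosen x_n
    mu = X[rng.choice(len(X), size=k, replace=False)].copy()
    assign = np.full(len(X), -1)
    for _ in range(max_iter):
        # optimize S_1..S_k: each x_n goes to its closest prototype
        dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
        new_assign = dists.argmin(axis=1)
        if np.array_equal(new_assign, assign):   # converged: partition unchanged
            break
        assign = new_assign
        # optimize mu_1..mu_k: consensus (mean) within each cluster
        for m in range(k):
            members = X[assign == m]
            if len(members) > 0:                 # keep old mu if a cluster empties
                mu[m] = members.mean(axis=0)
    return mu, assign

# usage on two well-separated point groups
X = np.vstack([np.zeros((10, 2)), np.full((10, 2), 5.0)])
mu, assign = k_means(X, 2, rng=0)
```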
RBF Network Using k-Means

1. run k-Means with k = M to get {\mu_m}
2. construct transform \Phi(x) from RBF (say, Gaussian) at each \mu_m:
   \Phi(x) = [\text{RBF}(x, \mu_1), \text{RBF}(x, \mu_2), \dots, \text{RBF}(x, \mu_M)]
3. run linear model on {(\Phi(x_n), y_n)} to get \beta
4. return g_{\text{RBFNET}}(x) = \text{LinearHypothesis}(\beta, \Phi(x))

• using unsupervised learning (k-Means) to assist feature transform, like autoencoder
• parameters: M (prototypes), RBF (such as \gamma of Gaussian)

RBF Network: a simple (old-fashioned) model
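Steps 2-4 can be sketched as below; the ridge-regularized least squares in step 3 and the `gamma`/`lam` values are assumed choices (the slide only says 'linear model'), and any k-Means routine can supply the centers:

```python
import numpy as np

def gaussian_rbf(X, centers, gamma=1.0):
    """Phi(x) = [exp(-gamma * ||x - mu_m||^2)] for each prototype mu_m."""
    sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq)

def rbf_net_fit(X, y, centers, gamma=1.0, lam=1e-6):
    """Build Phi, then run a linear model (here ridge-regularized
    least squares, an assumed choice) to obtain beta."""
    Phi = gaussian_rbf(X, centers, gamma)
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)

def rbf_net_predict(X, centers, beta, gamma=1.0):
    """g_RBFNET(x) = LinearHypothesis(beta, Phi(x))."""
    return gaussian_rbf(X, centers, gamma) @ beta

# usage: regress a smooth 1-D target with M = 8 hand-placed centers
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 1))
y = np.sin(3 * X[:, 0])
centers = np.linspace(-1, 1, 8)[:, None]
beta = rbf_net_fit(X, y, centers, gamma=4.0)
pred = rbf_net_predict(X, centers, beta, gamma=4.0)
```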
Fun Time

For k-Means, consider examples x_n ∈ R^2 such that all x_{n,1} and x_{n,2} are non-zero. When fixing two prototypes \mu_1 = [1, 1] and \mu_2 = [-1, 1], which of the following sets is the optimal S_1?
1. {x_n : x_{n,1} > 0}
2. {x_n : x_{n,1} < 0}
3. {x_n : x_{n,2} > 0}
4. {x_n : x_{n,2} < 0}

Reference Answer: 1

Note that S_1 contains the examples that are closer to \mu_1 than to \mu_2; since the two prototypes differ only in the first coordinate, that is exactly the set with x_{n,1} > 0.

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 18/24
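The quiz answer is easy to verify numerically: expanding ||x − μ_1||² − ||x − μ_2||² = (x_1 − 1)² − (x_1 + 1)² = −4 x_1, so x is closer to μ_1 exactly when x_1 > 0. A quick check on random data:

```python
import numpy as np

mu1, mu2 = np.array([1.0, 1.0]), np.array([-1.0, 1.0])
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 2))
X = X[np.all(X != 0, axis=1)]          # keep coordinates non-zero, as in the quiz

# optimal S_1: points with smaller squared distance to mu1 than to mu2
closer_to_mu1 = ((X - mu1) ** 2).sum(axis=1) < ((X - mu2) ** 2).sum(axis=1)
matches_answer_1 = np.array_equal(closer_to_mu1, X[:, 0] > 0)   # True
```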
Radial Basis Function Network / k-Means and RBF Network in Action

Beauty of k-Means

(figure: k-Means clustering result with k = 4)

usually works well with proper k and initialization
Difficulty of k-Means

(figure: clustering results for k = 2, k = 4, k = 7)

'sensitive' to k and initialization
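The sensitivity to initialization shows up even in one dimension: the alternating minimization only reaches a local minimum of E_in, so different starting prototypes can converge to different partitions. A small hand-built sketch (the data and initializations are made up):

```python
import numpy as np

def lloyd(X, mu, iters=20):
    """Plain alternating minimization from a given initialization;
    returns the final E_in (sum of squared distances to prototypes)."""
    mu = mu.astype(float).copy()
    for _ in range(iters):
        assign = np.abs(X[:, None] - mu[None, :]).argmin(axis=1)
        for m in range(len(mu)):
            if np.any(assign == m):
                mu[m] = X[assign == m].mean()
    return ((X - mu[assign]) ** 2).sum()

X = np.array([0.0, 6.0, 10.0])
# same data, same k = 2, two different initializations:
e1 = lloyd(X, np.array([0.0, 10.0]))   # converges to {0} and {6, 10}: E_in = 8.0
e2 = lloyd(X, np.array([6.0, 10.0]))   # converges to {0, 6} and {10}: E_in = 18.0
```

Both runs satisfy the convergence condition of the algorithm, yet the second settles in a strictly worse local minimum.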
RBF Network Using k-Means

(figure: RBF Network results for k = 2, k = 4, k = 7)

reasonable performance with proper centers