
Radial Basis Function Network: k-Means Algorithm

Prototype Optimization

with S_1, ..., S_M being a partition of {x_n}, the objective is

$$\min_{\{S_1,\dots,S_M\},\,\{\mu_1,\dots,\mu_M\}} E_{\text{in}} = \sum_{n=1}^{N} \sum_{m=1}^{M} [\![\, x_n \in S_m \,]\!] \, \|x_n - \mu_m\|^2$$

- hard to optimize: joint combinatorial-numerical optimization
- two sets of variables: will optimize alternatingly
- if S_1, ..., S_M fixed: just unconstrained optimization for each µ_m,

  $$\nabla_{\mu_m} E_{\text{in}} = -2 \sum_{n=1}^{N} [\![\, x_n \in S_m \,]\!] \, (x_n - \mu_m) = -2 \Big( \sum_{x_n \in S_m} x_n - |S_m| \, \mu_m \Big)$$

  so the optimal prototype is $\mu_m = \frac{1}{|S_m|} \sum_{x_n \in S_m} x_n$, the average of the x_n within S_m

- for given S_1, ..., S_M, each µ_m 'optimally computed' as the consensus (average) within S_m
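As a quick numerical check of the closed form above, here is a minimal NumPy sketch (the toy data and variable names are mine, purely for illustration): with the partition fixed, plugging the in-cluster average into the gradient expression gives zero.

```python
import numpy as np

# toy data: N = 6 points in R^2, already partitioned into M = 2 clusters
X = np.array([[0., 0.], [1., 0.], [0., 1.],
              [5., 5.], [6., 5.], [5., 6.]])
assign = np.array([0, 0, 0, 1, 1, 1])      # which S_m each x_n belongs to

for m in range(2):
    S_m = X[assign == m]
    mu_m = S_m.mean(axis=0)                # optimal prototype: in-cluster average
    grad = -2 * (S_m.sum(axis=0) - len(S_m) * mu_m)
    print(m, mu_m, grad)                   # gradient is (numerically) zero
```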


Radial Basis Function Network: k-Means Algorithm

k-Means Algorithm

historically, k prototypes are used instead of M (different from k-nearest-neighbor, though)

k-Means Algorithm
1. initialize µ_1, µ_2, ..., µ_k: say, as k randomly chosen x_n
2. alternating optimization of E_in: repeatedly
   1. optimize S_1, S_2, ..., S_k: each x_n 'optimally partitioned' using its closest µ_m
   2. optimize µ_1, µ_2, ..., µ_k: each µ_m 'optimally computed' as consensus within S_m
   until converged

converged: no change of S_1, S_2, ..., S_k anymore, guaranteed because E_in decreases during alternating minimization

k-Means: the most popular clustering algorithm, built on alternating minimization

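A minimal NumPy sketch of the algorithm above (function and variable names are my own choices, not from the lecture):

```python
import numpy as np

def kmeans(X, k, rng=None):
    """Minimal k-Means: alternating minimization of E_in."""
    rng = np.random.default_rng(0) if rng is None else rng
    # step 1: initialize mu_1..mu_k as k randomly chosen x_n
    mu = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    assign = np.full(len(X), -1)
    while True:
        # step 2.1: optimize S_1..S_k -- each x_n joins its closest prototype
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # (N, k)
        new_assign = d2.argmin(axis=1)
        if np.array_equal(new_assign, assign):   # converged: no change of S_m
            return mu, assign
        assign = new_assign
        # step 2.2: optimize mu_1..mu_k -- consensus (average) within each S_m
        for m in range(k):
            if np.any(assign == m):              # skip empty clusters
                mu[m] = X[assign == m].mean(axis=0)

# usage: two well-separated blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.3, size=(20, 2)) for c in ([0., 0.], [3., 3.])])
mu, assign = kmeans(X, k=2, rng=rng)
print(mu)   # roughly [0, 0] and [3, 3], in some order
```

The convergence test mirrors the slide: stop when the partition no longer changes, which must happen because each step can only decrease E_in and there are finitely many partitions.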

Radial Basis Function Network: k-Means Algorithm

RBF Network Using k-Means

1. run k-Means with k = M to get {µ_m}
2. construct transform Φ(x) from RBF (say, Gaussian) at µ_m:
   Φ(x) = [RBF(x, µ_1), RBF(x, µ_2), ..., RBF(x, µ_M)]
3. run linear model on {(Φ(x_n), y_n)} to get β
4. return g_RBFNET(x) = LinearHypothesis(β, Φ(x))

using unsupervised learning (k-Means) to assist feature transform, like autoencoder

parameters: M (number of prototypes), RBF (such as γ of Gaussian)

RBF Network: a simple (old-fashioned) model

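An end-to-end sketch of the four steps, assuming the `kmeans` function from the earlier sketch (the Gaussian RBF, the `gamma` value, and the ridge term standing in for 'linear model' are my choices for illustration):

```python
import numpy as np
# assumes kmeans() from the sketch in the previous section

def rbf_transform(X, centers, gamma=1.0):
    """Phi(x) = [RBF(x, mu_1), ..., RBF(x, mu_M)] with Gaussian RBF."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)   # (N, M)
    return np.exp(-gamma * d2)

def train_rbfnet(X, y, M=4, gamma=1.0, lam=1e-3):
    centers, _ = kmeans(X, k=M)                # step 1: k-Means with k = M
    Z = rbf_transform(X, centers, gamma)       # step 2: Phi(x_n) for all n
    # step 3: linear model on {(Phi(x_n), y_n)}; small ridge term keeps it stable
    beta = np.linalg.solve(Z.T @ Z + lam * np.eye(M), Z.T @ y)
    # step 4: g_RBFNET(x) = LinearHypothesis(beta, Phi(x))
    return lambda Xq: rbf_transform(Xq, centers, gamma) @ beta

# usage: fit a noisy sine curve
rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)
g = train_rbfnet(X, y, M=6, gamma=2.0)
print(g(np.array([[0.0], [1.5]])))   # roughly sin(0) = 0 and sin(1.5) ~ 1
```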

Radial Basis Function Network: k-Means Algorithm

Fun Time

For k-Means, consider examples x_n ∈ R² such that all x_{n,1} and x_{n,2} are non-zero. When fixing two prototypes µ_1 = [1, 1] and µ_2 = [−1, 1], which of the following sets is the optimal S_1?

1. {x_n : x_{n,1} > 0}
2. {x_n : x_{n,1} < 0}
3. {x_n : x_{n,2} > 0}
4. {x_n : x_{n,2} < 0}

Reference Answer: 1

Note that S_1 contains the examples that are closer to µ_1 than to µ_2.

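To see why, note that the two prototypes agree in the second coordinate, so it cancels in the distance comparison:

$$\|x - \mu_1\|^2 - \|x - \mu_2\|^2 = (x_1 - 1)^2 - (x_1 + 1)^2 = -4 x_1,$$

so x is strictly closer to µ_1 exactly when x_1 > 0; the assumption that all coordinates are non-zero rules out ties.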

Radial Basis Function Network: k-Means and RBF Network in Action

Beauty of k-Means

[figure: k-Means clustering result with k = 4]

usually works well with proper k and initialization


Difficulty of k-Means

[figures: k-Means clustering results with k = 2, k = 4, k = 7]

'sensitive' to k and initialization

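The sensitivity is easy to reproduce. A small sketch (again assuming the `kmeans` function defined earlier; the three-blob data and the seeds are made up for illustration): with a mismatched k, different random initializations typically land in different local minima of E_in.

```python
import numpy as np
# assumes kmeans() from the earlier sketch

def e_in(X, mu, assign):
    """E_in = sum_n ||x_n - mu_{assign(n)}||^2."""
    return ((X - mu[assign]) ** 2).sum()

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(c, 0.5, size=(30, 2))
               for c in ([0., 0.], [4., 0.], [2., 3.])])   # three blobs
for seed in range(5):
    mu, assign = kmeans(X, k=7, rng=np.random.default_rng(seed))
    print(seed, round(e_in(X, mu, assign), 2))   # objectives usually differ by seed
```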

RBF Network Using k-Means

[figures: RBF Network results with k = 2, k = 4, k = 7 centers]

reasonable performance with proper centers
