Radial Basis Function Network RBF Network Learning

Fewer Centers as Regularization

recall: $g_{\mathrm{SVM}}(\mathbf{x}) = \mathrm{sign}\left(\sum_{\mathrm{SV}} \alpha_m y_m \exp\left(-\gamma \|\mathbf{x} - \mathbf{x}_m\|^2\right) + b\right)$; only '$\ll N$' SVs needed in 'network'

• $M \ll N$ instead of $M = N$

• effect: regularization by constraining number of centers and voting weights

• physical meaning of centers $\boldsymbol{\mu}_m$: prototypes

remaining question: how to extract prototypes?
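To make the "fewer centers" idea concrete, here is a minimal sketch of an RBF network that votes with only M centers instead of all N training points. The function names (`rbf`, `rbfnet_predict`), the two prototypes, and the weights `betas` are illustrative assumptions, not values from the lecture.

```python
import math

def rbf(x, center, gamma):
    """Gaussian RBF: exp(-gamma * ||x - center||^2)."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-gamma * sq_dist)

def rbfnet_predict(x, centers, betas, gamma, b=0.0):
    """Sign of the weighted vote of the M centers (M may be far smaller than N)."""
    score = sum(beta * rbf(x, mu, gamma) for beta, mu in zip(betas, centers)) + b
    return 1 if score >= 0 else -1

# Hypothetical prototypes, one per class, standing in for many training points.
centers = [(0.0, 0.0), (3.0, 3.0)]
betas = [1.0, -1.0]  # hypothetical voting weights
print(rbfnet_predict((0.2, -0.1), centers, betas, gamma=1.0))  # → 1
print(rbfnet_predict((2.9, 3.2), centers, betas, gamma=1.0))   # → -1
```

With two centers there are only two voting weights to fit, which is the sense in which a small M constrains the hypothesis.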

Fun Time

If $\mathbf{x}_1 = \mathbf{x}_2$, what happens in the $\mathrm{Z}$ matrix of full Gaussian RBF network?

1 the first two rows of the matrix are the same

2 the first two columns of the matrix are different

3 the matrix is invertible

4 the sub-matrix at the intersection of the first two rows and the first two columns contains a constant of 0

Reference Answer: 1

It is easy to see that the first two rows must be the same; so must the first two columns. The two identical rows make the matrix singular; the sub-matrix in 4 contains a constant of $1 = \exp(-0)$ instead of 0.

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 12/24
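The reference answer can be checked numerically. The sketch below builds the Z matrix of a full Gaussian RBF network (every data point is a center) for three made-up points whose first two entries coincide. The helper names `gaussian_z` and `det3`, the points, and the value of gamma are assumptions for illustration only.

```python
import math

def gaussian_z(points, gamma):
    """Z[n][m] = exp(-gamma * ||x_n - x_m||^2), with the data points as centers."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return [[math.exp(-gamma * sq_dist(xn, xm)) for xm in points] for xn in points]

def det3(a):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

points = [(1.0, 2.0), (1.0, 2.0), (4.0, 0.0)]  # first two points identical on purpose
Z = gaussian_z(points, gamma=0.5)
print(Z[0] == Z[1])  # → True: the first two rows are the same
print(Z[0][1])       # → 1.0, i.e. exp(-0), not 0
print(det3(Z))       # zero up to rounding, so Z is not invertible
```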

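Radial Basis Function Network k-Means Algorithm

Good Prototypes: Clustering Problem

if $\mathbf{x}_1 \approx \mathbf{x}_2$ $\Longrightarrow$ no need for both $\mathrm{RBF}(\mathbf{x}, \mathbf{x}_1)$ and $\mathrm{RBF}(\mathbf{x}, \mathbf{x}_2)$ in RBFNet $\Longrightarrow$ cluster $\mathbf{x}_1$ and $\mathbf{x}_2$ by one prototype $\boldsymbol{\mu} \approx \mathbf{x}_1 \approx \mathbf{x}_2$

• clustering with prototype:

  • partition $\{\mathbf{x}_n\}$ to disjoint sets $S_1, S_2, \cdots, S_M$

  • choose $\boldsymbol{\mu}_m$ for each $S_m$

  hope: $\mathbf{x}_1, \mathbf{x}_2$ both $\in S_m \Leftrightarrow \boldsymbol{\mu}_m \approx \mathbf{x}_1 \approx \mathbf{x}_2$

• cluster error with squared error measure:

$$E_{\mathrm{in}}(S_1, \cdots, S_M; \boldsymbol{\mu}_1, \cdots, \boldsymbol{\mu}_M) = \frac{1}{N} \sum_{n=1}^{N} \sum_{m=1}^{M} [\![\mathbf{x}_n \in S_m]\!] \, \|\mathbf{x}_n - \boldsymbol{\mu}_m\|^2$$

goal: with $S_1, \cdots, S_M$ being a partition of $\{\mathbf{x}_n\}$,

$$\min_{\{S_1, \cdots, S_M; \boldsymbol{\mu}_1, \cdots, \boldsymbol{\mu}_M\}} E_{\mathrm{in}}(S_1, \cdots, S_M; \boldsymbol{\mu}_1, \cdots, \boldsymbol{\mu}_M)$$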
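The cluster error with the squared error measure can be computed directly from a partition and its prototypes. In this sketch the partition is encoded as a list `assignment`, where `assignment[n]` is the index m of the one set that contains the n-th point; that encoding, the function name `cluster_error`, and the toy data are assumptions made for illustration.

```python
def cluster_error(points, assignment, prototypes):
    """E_in = (1/N) * sum over n of ||x_n - mu_m||^2, with m = assignment[n].

    Because each point belongs to exactly one set, the inner sum over m with
    the indicator [[x_n in S_m]] collapses to the single assigned prototype.
    """
    total = 0.0
    for x, m in zip(points, assignment):
        total += sum((xi - mi) ** 2 for xi, mi in zip(x, prototypes[m]))
    return total / len(points)

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
prototypes = [(0.0, 1.0), (10.0, 1.0)]
print(cluster_error(points, [0, 0, 1, 1], prototypes))  # → 1.0
print(cluster_error(points, [0, 1, 0, 1], prototypes))  # → 51.0
```

The second call keeps the same prototypes but pairs far-apart points, and the error grows accordingly; the clustering goal is to find the partition and prototypes that make this number as small as possible.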

Partition Optimization

with $S_1, \cdots, S_M$ being a partition of $\{\mathbf{x}_n\}$,

$$\min_{\{S_1, \cdots, S_M; \boldsymbol{\mu}_1, \cdots, \boldsymbol{\mu}_M\}} \sum_{n=1}^{N} \sum_{m=1}^{M} [\![\mathbf{x}_n \in S_m]\!] \, \|\mathbf{x}_n - \boldsymbol{\mu}_m\|^2$$

• hard to optimize: joint combinatorial-numerical optimization

• two sets of variables: will optimize alternatingly

with $\boldsymbol{\mu}_1, \cdots, \boldsymbol{\mu}_M$ fixed, for each $\mathbf{x}_n$

• $[\![\mathbf{x}_n \in S_m]\!]$: choose one and only one subset

• $\|\mathbf{x}_n - \boldsymbol{\mu}_m\|^2$: distance to each prototype

optimal chosen subset $S_m$ = the one with minimum $\|\mathbf{x}_n - \boldsymbol{\mu}_m\|^2$
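With the prototypes held fixed, each point contributes exactly one squared distance to the objective, so the best choice for a point is the set whose prototype is nearest. A minimal sketch of that partition step follows; the function name `optimal_partition`, the toy points, and the tie-breaking rule (lowest index wins) are assumptions rather than anything the slides specify.

```python
def optimal_partition(points, prototypes):
    """For fixed prototypes, assign each x_n to the index m of its closest mu_m."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    # min() returns the first minimizer, so ties go to the lowest index.
    return [min(range(len(prototypes)), key=lambda m: sq_dist(x, prototypes[m]))
            for x in points]

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
prototypes = [(1.0, 1.0), (8.0, 1.0)]  # hypothetical fixed prototypes
print(optimal_partition(points, prototypes))  # → [0, 0, 1, 1]
```

This covers only the partition half of the alternating scheme; the prototypes themselves are left unchanged here.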
