
## Machine Learning Techniques (機器學習技法)

### Lecture 14: Radial Basis Function Network

Hsuan-Tien Lin (林軒田)

htlin@csie.ntu.edu.tw

### Roadmap: Distilling Implicit Features — Extraction Models

Lecture 13: Deep Learning
pre-training with denoising autoencoder (non-linear PCA) and fine-tuning with backprop for NNet with many hidden layers

Lecture 14: Radial Basis Function Network
Gaussian SVM Revisited · RBF Network Hypothesis · RBF Network Learning · k-Means Algorithm · k-Means and RBF Network in Action

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 1/24

## Gaussian SVM Revisited

Gaussian SVM: find SVs x_n (and α_n) to combine Gaussians centered at x_n; achieve large margin in an infinite-dimensional space

g_SVM(x) = sign( Σ_{SV} α_n y_n exp(−γ‖x − x_n‖²) + b )

• Gaussian kernel: also called Radial Basis Function (RBF) kernel
  • radial: value depends only on the distance between x and the center x_n
  • basis function: to be combined (linearly aggregated)

let g_n(x) = y_n exp(−γ‖x − x_n‖²): then g_SVM(x) = sign( Σ_{SV} α_n g_n(x) + b )

— Gaussian SVM is a linear aggregation of radial hypotheses
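The aggregation view reads off directly in code. Below is a minimal NumPy sketch (not the course's own code); `g_svm` is a hypothetical helper name, and the support vectors, multipliers, and bias are assumed given by some SVM solver:

```python
import numpy as np

def g_svm(x, sv_x, sv_alpha, sv_y, b, gamma):
    """Gaussian SVM as a linear aggregation of radial hypotheses:
    g_SVM(x) = sign(sum_n alpha_n * y_n * exp(-gamma * ||x - x_n||^2) + b)."""
    # one Gaussian "vote" per support vector, centered at x_n
    radial = np.exp(-gamma * np.sum((sv_x - x) ** 2, axis=1))
    score = np.dot(sv_alpha * sv_y, radial) + b
    return np.sign(score)
```

Each support vector contributes one radial hypothesis; α_n y_n is its signed vote.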

## From Neural Network to RBF Network

[Figure: two network diagrams over inputs x₀ = 1, x₁, x₂, x₃, …, x_d — a Neural Network with tanh hidden units and an RBF Network with RBF hidden units, each followed by a linear output layer.]

• hidden layer different: (inner-product + tanh) versus (distance + Gaussian)
• output layer same: just linear aggregation

RBF Network: historically a type of NNet

## RBF Network Hypothesis

h(x) = Output( Σ_{m=1}^{M} β_m RBF(x, µ_m) + b )

key variables: centers µ_m; (signed) votes β_m

Gaussian SVM as a special RBF Network:
• RBF = Gaussian; Output = sign (for binary classification)
• M = #SV; µ_m: the SVM SVs x_m; β_m: α_m y_m from SVM Dual

learning: given RBF and Output, decide µ_m and β_m
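The hypothesis above can be sketched in a few lines of NumPy. This is an illustrative helper (names `rbf` and `rbf_network` are my own, and the bias term is dropped for brevity), not the course's code:

```python
import numpy as np

def rbf(x, mu, gamma=1.0):
    """Gaussian RBF: similarity that decreases with the distance ||x - mu||."""
    return np.exp(-gamma * np.sum((x - mu) ** 2))

def rbf_network(x, centers, beta, output=np.sign):
    """h(x) = Output(sum_m beta_m * RBF(x, mu_m)), bias omitted for brevity."""
    z = np.array([rbf(x, mu) for mu in centers])  # feature transform: M similarities
    return output(np.dot(beta, z))                # linear aggregation, then Output
```

Choosing `output=np.sign` gives binary classification; an identity output would give regression.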

## RBF and Similarity

general similarity functions between x and x′:
• Neuron(x, x′) = tanh(γxᵀx′ + 1)
• DNASim(x, x′) = EditDistance(x, x′)

kernel: similarity via Z-space inner product — governed by Mercer's condition, remember? :-)
• Poly(x, x′) = (1 + xᵀx′)²
• Gaussian(x, x′) = exp(−γ‖x − x′‖²)

RBF: similarity via X-space distance, often monotonically non-increasing with distance
• Gaussian(x, x′) = exp(−γ‖x − x′‖²)
• Truncated(x, x′) = ⟦‖x − x′‖ ≤ 1⟧ (1 − ‖x − x′‖)²

RBF Network: distance-based similarities as feature transform

## Fun Time

Which of the following is not a radial basis function?

1. φ(x, µ) = exp(−γ‖x − µ‖²)
2. φ(x, µ) = −√(xᵀx − 2xᵀµ + µᵀµ)
3. φ(x, µ) = ⟦x = µ⟧
4. φ(x, µ) = xᵀx + µᵀµ

Answer: 4. Note that 3 is an extreme case of 1 (Gaussian) with γ → ∞, and 2 contains a ‖x − µ‖ somewhere :-). Only 4 cannot be written as a function of the distance ‖x − µ‖.


## Full RBF Network

full RBF Network: M = N and each µ_m = x_m

• physical meaning: each x_m influences the x that are similar to it
• e.g. uniform influence with β_m = 1 · y_m for binary classification:

g_uniform(x) = sign( Σ_{m=1}^{N} y_m exp(−γ‖x − x_m‖²) )

— aggregate each example's opinion, subject to similarity

full RBF Network: lazy way to decide µ_m
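The uniform-influence classifier above is short enough to write out directly. A minimal NumPy sketch, assuming the whole training set (X, y) is available at prediction time (`g_uniform` is a hypothetical name):

```python
import numpy as np

def g_uniform(x, X, y, gamma=1.0):
    """Full RBF Network with beta_m = 1 * y_m:
    aggregate every example's opinion, weighted by Gaussian similarity to x."""
    sim = np.exp(-gamma * np.sum((X - x) ** 2, axis=1))  # similarity to each x_m
    return np.sign(np.dot(y, sim))                       # signed, similarity-weighted vote
```

Note the "lazy" flavor: nothing is learned; all N examples are kept and consulted.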

## Nearest Neighbor

g_uniform(x) = sign( Σ_{m=1}^{N} y_m exp(−γ‖x − x_m‖²) )

• exp(−γ‖x − x_m‖²): largest when x closest to x_m — the maximum term often dominates the Σ
• so take the max of all terms instead of the sum: selection instead of aggregation
• physical meaning: g_nbor(x) = y_m such that x closest to x_m — called the nearest neighbor model
• can also extend to k nearest neighbors: uniformly aggregate the k most similar examples

nearest neighbor: also lazy, but very intuitive
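The selection view makes the nearest-neighbor and k-nearest-neighbor variants one-liners. A hedged NumPy sketch (helper names `g_nbor` and `g_knn` are my own):

```python
import numpy as np

def g_nbor(x, X, y):
    """Nearest-neighbor model: the most similar example dominates,
    so select (rather than aggregate) the label of the closest x_m."""
    m = np.argmin(np.sum((X - x) ** 2, axis=1))  # closest center = largest Gaussian similarity
    return y[m]

def g_knn(x, X, y, k=3):
    """k nearest neighbors: uniformly aggregate the k closest labels."""
    idx = np.argsort(np.sum((X - x) ** 2, axis=1))[:k]
    return np.sign(np.sum(y[idx]))
```

Minimizing distance is equivalent to maximizing the Gaussian similarity, since exp(−γd²) is monotonically decreasing in d.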

## Interpolation by Full RBF Network

full RBF Network for squared error regression:

h(x) = Σ_{m=1}^{N} β_m RBF(x, x_m)

• just linear regression on RBF-transformed data z_n = [RBF(x_n, x_1), RBF(x_n, x_2), …, RBF(x_n, x_N)]
• optimal β = (ZᵀZ)⁻¹Zᵀy, if ZᵀZ invertible
• size of Z? N (examples) by N (centers) — a symmetric square matrix
• theoretical fact: if all x_n different, Z with the Gaussian RBF is invertible

optimal β with invertible Z: β = (ZᵀZ)⁻¹Zᵀy = Z⁻¹y
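The interpolation result is easy to check numerically. A minimal sketch, assuming distinct 1-D inputs (`gaussian_z` is a hypothetical helper name):

```python
import numpy as np

def gaussian_z(X, gamma=1.0):
    """Z[n, m] = Gaussian RBF between x_n and center x_m: N-by-N, symmetric."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    return np.exp(-gamma * d2)

# for distinct x_n, the Gaussian Z is invertible, so linear regression
# on the transformed data reduces to beta = Z^{-1} y
X = np.array([[0.0], [1.0], [2.5]])
y = np.array([1.0, -1.0, 1.0])
Z = gaussian_z(X)
beta = np.linalg.solve(Z, y)  # optimal beta = Z^{-1} y
```

With this β, the network passes through every (x_n, y_n) exactly, which is precisely the E_in = 0 interpolation discussed next.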

## Regularized Full RBF Network

full Gaussian RBF Network for regression, with β = Z⁻¹y: since Z is symmetric and z_n is the n-th column of Z,

g_RBF(x_n) = βᵀz_n = yᵀZ⁻¹z_n = yᵀ(n-th unit vector) = y_n

— g_RBF(x_n) = y_n, i.e. E_in(g_RBF) = 0: called exact interpolation for function approximation, but likely overfitting for machine learning :-)

• one regularization option: ridge regression for β instead — optimal β = (ZᵀZ + λI)⁻¹Zᵀy
• Z seen before: Z = [Gaussian(x_n, x_m)] = the Gaussian kernel matrix

effect of regularization in different spaces:
• kernel ridge regression: β = (K + λI)⁻¹y
• regularized full RBFNet: β = (ZᵀZ + λI)⁻¹Zᵀy
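The two regularized solutions differ only in where λI is added. A hedged NumPy sketch (helper names are my own; the same Gaussian matrix serves as both Z and K):

```python
import numpy as np

def ridge_beta(Z, y, lam):
    """Regularized full RBF Network: beta = (Z^T Z + lambda I)^{-1} Z^T y."""
    return np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ y)

def kernel_ridge_beta(K, y, lam):
    """Kernel ridge regression on the same Gaussian kernel matrix:
    beta = (K + lambda I)^{-1} y -- regularization applied in a different space."""
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), y)
```

As λ → 0 both recover the interpolating β = Z⁻¹y; larger λ shrinks β toward zero.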

## Fewer Centers as Regularization

recall: g_SVM(x) = sign( Σ_{SV} α_n y_n exp(−γ‖x − x_n‖²) + b )

— only a few (≪ N) SVs needed in the ‘network’

• next: M ≪ N centers instead of N centers
• effect: regularization by constraining the number of centers and votes
• physical meaning of µ_m: prototypes — a few representatives of the data

remaining question: how to extract prototypes?

## Fun Time

If x_1 = x_2, what happens in the Z matrix of the full Gaussian RBF Network?

1. the first two rows of the matrix are the same
2. the first two columns of the matrix are different
3. the matrix is invertible
4. the sub-matrix at the intersection of the first two rows and the first two columns contains a constant of 0

Answer: 1. It is easy to see that the first two rows must be the same; so must the first two columns. The two identical rows make the matrix singular; the sub-matrix in 4 contains a constant of 1 = exp(−0) instead of 0.


## Good Prototypes: Clustering Problem

if x_1 ≈ x_2, then RBF(x, x_1) and RBF(x, x_2) in the RBFNet are nearly the same
=⇒ cluster x_1 and x_2 into one group, and replace both by one prototype µ ≈ x_1 ≈ x_2

clustering with partition S_1, S_2, …, S_M of the data and prototypes µ_1, µ_2, …, µ_M:
• if x_n ∈ S_m and x_{n′} ∈ S_m — hope: both ≈ µ_m
• cluster error with the squared error measure:

E_in(S_1, …, S_M; µ_1, …, µ_M) = (1/N) Σ_{n=1}^{N} Σ_{m=1}^{M} ⟦x_n ∈ S_m⟧ ‖x_n − µ_m‖²

goal: with S_1, …, S_M being a partition of {x_n},

min over {S_1, …, S_M; µ_1, …, µ_M} of E_in(S_1, …, S_M; µ_1, …, µ_M)

## Partition Optimization

with S_1, …, S_M being a partition of {x_n},

min over {S_1, …, S_M; µ_1, …, µ_M} of (1/N) Σ_n Σ_m ⟦x_n ∈ S_m⟧ ‖x_n − µ_m‖²

two sets of variables — joint optimization is hard, so first optimize S with µ fixed:
• if µ_1, …, µ_M fixed: choosing S_m for each x_n just chooses which single ‖x_n − µ_m‖² (the distance to each prototype) enters the sum
• optimal choice = the one with minimum distance

for given µ_1, …, µ_M, each x_n is ‘optimally partitioned’ using its closest µ_m

## Prototype Optimization

with S_1, …, S_M being a partition of {x_n},

min over {S_1, …, S_M; µ_1, …, µ_M} of (1/N) Σ_n Σ_m ⟦x_n ∈ S_m⟧ ‖x_n − µ_m‖²

now optimize µ with S fixed:
• if S_1, …, S_M fixed: unconstrained optimization for each µ_m, with

∇_{µ_m} E_in = −(2/N) Σ_{n=1}^{N} ⟦x_n ∈ S_m⟧ (x_n − µ_m) = −(2/N) ( Σ_{x_n ∈ S_m} x_n − |S_m| µ_m )

• optimal µ_m = average of the x_n within S_m

for given S_1, …, S_M, each µ_m is ‘optimally computed’ as the average within S_m

## k-Means Algorithm

1. initialize µ_1, µ_2, …, µ_k: say, as k randomly chosen examples x_n
2. alternating optimization of E_in: repeatedly
   1. optimize S_1, S_2, …, S_k: assign each x_n to the cluster of its closest µ_m
   2. optimize µ_1, µ_2, …, µ_k: set each µ_m to the average of its cluster
   until the partition does not change anymore

— convergence guaranteed, as E_in decreases during alternating minimization

k-Means: the most popular clustering algorithm, learned through alternating minimization
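The two alternating steps above fit in a short routine. A minimal NumPy sketch (the function name, seed handling, and empty-cluster fallback are my own choices, not part of the lecture):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """k-Means by alternating minimization of the cluster error:
    fix mu -> optimal partition assigns each x_n to its closest prototype;
    fix partition -> optimal mu_m is the average of the x_n in cluster m."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)]   # init: k randomly chosen examples
    for _ in range(iters):
        d2 = np.sum((X[:, None, :] - mu[None, :, :]) ** 2, axis=2)
        s = np.argmin(d2, axis=1)                       # partition step: closest prototype
        new_mu = np.array([X[s == m].mean(axis=0) if np.any(s == m) else mu[m]
                           for m in range(k)])          # prototype step: cluster averages
        if np.allclose(new_mu, mu):                     # E_in can no longer decrease
            break
        mu = new_mu
    return mu, s
```

Each iteration can only decrease E_in, which is why the alternation converges (to a local optimum that depends on the initialization).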

## RBF Network Using k-Means

1. run k-Means with k = M to get good prototypes {µ_m}
2. construct the transform Φ(x) = [RBF(x, µ_1), RBF(x, µ_2), …, RBF(x, µ_M)] from the prototypes
3. run a linear model on {(Φ(x_n), y_n)} to get β
4. return g_RBFNET(x) = LinearHypothesis(β, Φ(x))

• using unsupervised learning (k-Means) to assist the feature transform
• parameters: M (number of prototypes), RBF (such as γ of the Gaussian)

RBF Network: a simple (old-fashioned) model
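The four steps above can be sketched end to end. This is an illustrative NumPy pipeline under my own assumptions (ridge regression as the linear model, sign output for classification, an inline minimal k-Means so the sketch is self-contained; all helper names are hypothetical):

```python
import numpy as np

def kmeans_centers(X, k, iters=50, seed=0):
    """Step 1 helper: minimal k-Means (alternating minimization), prototypes only."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        s = np.argmin(np.sum((X[:, None, :] - mu[None, :, :]) ** 2, axis=2), axis=1)
        mu = np.array([X[s == m].mean(axis=0) if np.any(s == m) else mu[m]
                       for m in range(k)])
    return mu

def rbf_transform(X, centers, gamma=1.0):
    """Step 2: Phi(x) = [RBF(x, mu_1), ..., RBF(x, mu_M)] for each row of X."""
    d2 = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    return np.exp(-gamma * d2)

def train_rbfnet(X, y, M, gamma=1.0, lam=1e-3, seed=0):
    """Steps 1-4: k-Means prototypes, RBF transform, ridge linear model, hypothesis."""
    mu = kmeans_centers(X, M, seed=seed)                        # step 1
    Z = rbf_transform(X, mu, gamma)                             # step 2
    beta = np.linalg.solve(Z.T @ Z + lam * np.eye(M), Z.T @ y)  # step 3 (ridge)
    return lambda x: float(np.sign(rbf_transform(np.atleast_2d(x), mu, gamma) @ beta)[0])  # step 4
```

Note the division of labor: k-Means never sees the labels (unsupervised), while the linear model on Φ(x_n) does all the supervised work.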

## Fun Time

For k-Means, consider examples x_n ∈ R² such that all x_{n,1} and x_{n,2} are non-zero. When fixing two prototypes µ_1 = [1, 1] and µ_2 = [−1, 1], which of the following sets is the optimal S_1?

1. {x_n : x_{n,1} > 0}
2. {x_n : x_{n,1} < 0}
3. {x_n : x_{n,2} > 0}
4. {x_n : x_{n,2} < 0}

Answer: 1. Note that S_1 contains the examples that are closer to µ_1 than to µ_2; since the two prototypes differ only in the first coordinate, those are exactly the x_n with x_{n,1} > 0.


## Beauty of k-Means

[Figure: k-Means with k = 4 on sample data, showing the clusters found across iterations.]

k-Means usually works well with proper k and initialization


## Difficulty of k-Means

[Figure: k-Means results for k = 2, k = 4, and k = 7 under different initializations.]

k-Means is ‘sensitive’ to k and initialization


## RBF Network Using k-Means

[Figure: RBF Network decision boundaries with k-Means centers for k = 2, k = 4, and k = 7.]

reasonable performance with proper centers

## Full RBF Network

[Figure: full RBF Network with k = N and λ = 0.001, RBF Network with k = 4, and nearest neighbor.]

full RBF Network: generally less useful

## Fun Time

When coupled with ridge linear regression, which of the following RBF Networks is ‘most regularized’?

1. small M and small λ
2. small M and large λ
3. large M and small λ
4. large M and large λ

Answer: 2. Small M: fewer weights, hence more regularized; large λ: shorter β, hence more regularized.


## Summary

### 3 Distilling Implicit Features: Extraction Models

Lecture 14: Radial Basis Function Network
• RBF Network hypothesis: prototypes µ_m plus distance-based similarities, linearly aggregated
• RBF Network learning: (regularized) linear regression on the RBF-transformed data
• k-Means algorithm: alternating minimization for clustering, used to extract prototypes
• k-Means and RBF Network in action: proper k, initialization, and centers matter

• next: extracting features from abstract data
