
Energy function for SOM


(1)

A Novel Method for Manifold Construction

Wei-Chen Cheng and Cheng-Yuan Liou*

Department of Computer Science and Information Engineering

National Taiwan University Republic of China

*cyliou@csie.ntu.edu.tw

(2)

ICONIP 2008 Liou, C.-Y. 2

Related Works

Contribution                                                  People                       Year
Approx. energy function of SOM                                Erwin, Obermayer, Schulten   1992
Least distortion measure in SOM                               Luttrell                     1991, 1992
Average expected distortion measure                           Kohonen                      1997
Anisotropic mapping                                           Pedrycz, Waletzky            1997
Conformality, angle invariance in SOM                         Liou, Tai                    2000
Separating two classes and clustering the same class in SOM   Liou, Chen, Huang            2000
Distance invariance                                           Liou, Cheng                  2007

(3)

Energy function for SOM

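• E. Erwin, K. Obermayer, K. Schulten (1992)

• Approximated form

[Equation not recoverable from the slide text: an approximate energy function written in terms of the neighborhood function H, the weights W and w_i, the input density P(x), and a parameter ε.]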

(4)


Least distortion measure

• Luttrell (1991, 1992)

$$d_i = \sum_j h_{ij}\,\lVert x - m_j \rVert^2$$

Trying to comprehend SOM
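As a minimal sketch of the least distortion rule, the snippet below evaluates a neighborhood-weighted distortion d_i = sum_j h_ij ||x - m_j||^2 for every candidate cell i and picks the cell with the smallest value. The Gaussian neighborhood on a 1-D chain of cells, the parameter `sigma`, and all function names are assumptions made for illustration; they are not taken from the slides.

```python
import math

def neighborhood(i, j, sigma=1.0):
    """Assumed Gaussian neighborhood h_ij between cells i and j on a 1-D chain."""
    return math.exp(-((i - j) ** 2) / (2.0 * sigma ** 2))

def distortion(x, cells, i, sigma=1.0):
    """d_i = sum_j h_ij * ||x - m_j||^2 for candidate winner i."""
    total = 0.0
    for j, m in enumerate(cells):
        sq = sum((a - b) ** 2 for a, b in zip(x, m))
        total += neighborhood(i, j, sigma) * sq
    return total

def least_distortion_winner(x, cells, sigma=1.0):
    """Index of the cell that minimizes the weighted distortion d_i."""
    return min(range(len(cells)), key=lambda i: distortion(x, cells, i, sigma))

# Three cells on a line (hypothetical weights) and one input pattern.
cells = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
winner = least_distortion_winner((1.9, 0.1), cells)
```

With a very narrow neighborhood (small `sigma`) the rule reduces to the usual nearest-cell winner, since only the term j = i keeps a non-negligible weight.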

(5)

Average expected distortion measure

• T. Kohonen (1997)

$$E = \int e\,p(x)\,dx = \int \sum_{i \in L} h_{ci}\, d(x, m_i)\, p(x)\,dx$$

$$E = \sum_i \int_{x \in X_i} \sum_k h_{ik}\,\lVert x - m_k \rVert^2\, p(x)\,dx$$

(6)


Anisotropy mapping for SOM

• W. Pedrycz, J. Waletzky (1997)

• The anisotropy of this metric means that one can find pairs of patterns $x_1, x_2$ and $x_3, x_4$ such that $\lVert x_1 - x_2 \rVert$ and $\lVert x_3 - x_4 \rVert$ are equal, yet the values $\lVert NN(x_1) - NN(x_2) \rVert$ and $\lVert NN(x_3) - NN(x_4) \rVert$ can differ quite substantially.

$$Q = \left[\,\text{target} - \lVert NN(x_1) - NN(x_2) \rVert^2\,\right]^2$$

(7)

Conformality

• Liou, Tai (2000)

• Differential geometry

• Angle preservation in SOM

• Patterns may not have a fixed space structure, such as tree or chain with flexible joints.

(8)


SIR

Internal representations for SOM

• Liou, Chen, Huang (2000)

• Separating two classes and clustering the same class in a hidden layer, flexible for various designs

• Flexible design of output space y, tree

• Relative distances only

• Capable of anisotropy

[Equation not recoverable from the slide text: the energy $E^{rep}$, a double sum over pattern pairs $p_1, p_2 = 1, \dots, P$ of a function of the output distance $d(y^{(p_1)}, y^{(p_2)})$.]

(9)

Distance invariance

• Liou, Cheng (2007)

• Relative distances only

• Flexible design of output space: 2D, tree, or chain

• Solving the S-shape problem in LLE, Isomap (2007)

• Maximally resemble the pattern distortions

 – Visualization of physical meaning among patterns

• Perfect energy function

 – N = number of data, save $p(x)$ for $p(y)$

• Isotropy

$$E(r) = \frac{1}{4} \sum_{p} \sum_{x^q \in U(x^p, r)} \left( d_{pq}^2 - \lVert y^p - y^q \rVert^2 \right)^2 = \frac{1}{4} \sum_{p} \sum_{x^q \in U(x^p, r)} \left( \lVert x^p - x^q \rVert^2 - \lVert y^p - y^q \rVert^2 \right)^2$$

(10)


Flexible design

Density problem in LLE

(11)

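Objective Function

$$E(r) = \frac{1}{4} \sum_{p} \sum_{x^q \in U(x^p, r)} \left( \lVert x^p - x^q \rVert^2 - \lVert y^p - y^q \rVert^2 \right)^2$$

[Figure labels: input distance $\lVert x^1 - x^2 \rVert^2$, output distance $\lVert y^1 - y^2 \rVert^2$]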
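The objective compares squared input distances with squared output distances inside a neighborhood of radius r: E(r) = 1/4 * sum over p and over q in U(x^p, r) of (||x^p - x^q||^2 - ||y^p - y^q||^2)^2. The sketch below evaluates this energy and lowers it with plain batch gradient descent on the output cells. The update rule, the learning rate, the epoch count, and the toy data are assumptions for illustration; the slides do not state the training procedure in text form.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def neighbors(X, r):
    """U(x^p, r): indices q != p whose input distance to x^p is at most r."""
    return [[q for q in range(len(X)) if q != p and sq_dist(X[p], X[q]) <= r * r]
            for p in range(len(X))]

def energy(X, Y, U):
    """E(r) = 1/4 * sum_p sum_{q in U[p]} (|x^p - x^q|^2 - |y^p - y^q|^2)^2."""
    return 0.25 * sum((sq_dist(X[p], X[q]) - sq_dist(Y[p], Y[q])) ** 2
                      for p in range(len(X)) for q in U[p])

def gradient(X, Y, U):
    """Gradient of E(r) with respect to every output cell y^p."""
    G = [[0.0] * len(Y[0]) for _ in Y]
    for p in range(len(X)):
        for q in U[p]:
            diff = sq_dist(X[p], X[q]) - sq_dist(Y[p], Y[q])
            for k in range(len(Y[0])):
                g = -diff * (Y[p][k] - Y[q][k])  # d/dy^p of the (p, q) term
                G[p][k] += g
                G[q][k] -= g                     # d/dy^q is the negative
    return G

def descend(X, Y, r, lr=0.01, epochs=200):
    """Assumed training loop: batch gradient descent on the output cells."""
    U = neighbors(X, r)
    Y = [list(y) for y in Y]
    for _ in range(epochs):
        G = gradient(X, Y, U)
        Y = [[y - lr * g for y, g in zip(yp, gp)] for yp, gp in zip(Y, G)]
    return Y

# Toy data (hypothetical): three 3-D patterns on a line, 2-D output cells.
X = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
Y0 = [(0.0, 0.0), (0.5, 0.1), (1.2, 0.0)]
U = neighbors(X, r=1.5)
Y1 = descend(X, Y0, r=1.5)
```

Only distances between patterns enter the energy, so the same code would work if `sq_dist(X[p], X[q])` were replaced by a lookup into a table of given relative distances.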

(12)


Each epoch

(13)

Each pattern

(14)


Ten epochs and all patterns

(15)

Self-organizing map: N << number of data; regular cell positions

LDP: irregular cells

(16)


Artificial Data

Input data Output data

• There are 280 input patterns in 3D input space.

• Initialize the output cells by MDS

(17)

Artificial Data

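[Equation: MDI(r), a sum over patterns p, normalized by the neighborhood size $|U(x^p, r)|$, of the discrepancies between input distances $\lVert x^p - x^q \rVert$ and output distances $\lVert y^p - y^q \rVert$ for $x^q \in U(x^p, r)$; the exact form is not recoverable from the slide text.]

• There are 280 input patterns in 3D input space.

• Initialize the output cells by MDS

• MDI (measurement of distance invariance)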

(18)


Artificial Data

Performance Output data


(19)

Comparisons

(20)


Sampling densities in swiss roll

(21)

Sampling densities in swiss roll

• LDP is not affected by the density distributions.

• This is because every pattern has its corresponding cell in the output space.

• The density of each pattern is equal to the density of its cell.

• The number of cells is equal to the number of patterns.

(22)


World Stock Indices

Index          Market
AEX            Amsterdam
ALL ORDS       Australia
BSE SENSEX     Bombay
DAX            Frankfurt
DJ-INDUS       New York
FTSE100        London
HANG SENG      Hong Kong
JKSE           Jakarta
KLSE           Kuala
KOSPI          Korea
MIBTEL         Milan
NASDAQ         Nasdaq
NIKKEI 225     Osaka
NYSE COMP      New York
OBX            Oslo
S&P 500        New York
SSE            Shanghai
SWISS MARKET   Swiss
TAIEX          Taiwan

(23)

World Stock Indices

• Data: The index values of the 19 markets in each month are arranged in vector form.

• Initialization: The initial values of the output cells are taken from the first two dimensions of MDS.

(24)


2007-10

2003-03 2006-05

(25)

Flexible design of mapping function y = Ax

Edge length estimation of known tree

$$\boldsymbol{\delta} = A\,\mathbf{z} \quad \text{subject to } z_j \ge 0$$

Rows of A (index i): all input distance pairs

Columns of A (index j): all tree edges

(26)


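Edge length estimation of known tree

$$\delta_i = \lVert x^p - x^q \rVert$$

$$\boldsymbol{\delta} = A\,\mathbf{z} \quad \text{subject to } z_j \ge 0$$

Rows of A (index i): all input distance pairs

Columns of A (index j): all tree edges

LDP (local distance preservation): only the local pairs are used,

$$\sum_j a_{ij}\, z_j = \delta_i \quad \text{if } \delta_i \le r, \qquad \text{subject to } z_j \ge 0$$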
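A minimal sketch of the edge length estimation follows. It fits non-negative edge lengths z to the distances δ by projected gradient descent on the squared residual of δ = A z, and optionally keeps only the pairs with δ_i ≤ r as a reading of the local (LDP) variant. The convention that a_ij = 1 when tree edge j lies on the path between the i-th pair of leaves, the solver, the step size, and the toy tree are all assumptions for illustration.

```python
def estimate_edge_lengths(A, delta, r=None, lr=0.1, iters=2000):
    """Minimize sum_i (sum_j a_ij z_j - delta_i)^2 subject to z_j >= 0.

    A     : list of rows, one row per distance pair, one column per tree edge
    delta : list of pairwise distances, same order as the rows of A
    r     : if given, only pairs with delta_i <= r are used (local variant)
    lr    : step size; it must be small relative to the scale of A
    """
    rows = [i for i, d in enumerate(delta) if r is None or d <= r]
    z = [0.0] * len(A[0])
    for _ in range(iters):
        grad = [0.0] * len(z)
        for i in rows:
            resid = sum(a * zj for a, zj in zip(A[i], z)) - delta[i]
            for j, a in enumerate(A[i]):
                grad[j] += a * resid
        # Gradient step followed by projection onto z >= 0.
        z = [max(0.0, zj - lr * g) for zj, g in zip(z, grad)]
    return z

# Toy star tree (hypothetical): three leaves, each joined to the center by one edge.
# Pair order: (leaf 1, leaf 2), (leaf 1, leaf 3), (leaf 2, leaf 3).
A = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1]]
delta = [3.0, 4.0, 5.0]
z = estimate_edge_lengths(A, delta)
```

For this toy tree the distances are exactly additive, so the fitted lengths approach 1, 2 and 3. A dedicated non-negative least squares solver could replace the simple loop; the loop is used here only to keep the sketch free of dependencies.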

(27)

Phylogenetic Tree

• Tree: The tree is constructed by UPGMA (Unweighted Pair Group Method with Arithmetic mean).

• Path Estimation: The arc lengths of the tree are estimated by two different methods for comparison.

 – Non-Negative Least Squares [Sattath, 1977] method

 – Local Distance Preservation

• Data: Case, S.M.: Biochemical systematics of members of the genus Rana native to western North America. Systematic Zoology 27, 299-311 (1978)
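UPGMA builds the tree by repeatedly merging the two closest clusters and averaging their distances to every other cluster, weighted by cluster size. The sketch below is a small, unoptimized version of that procedure; the function name, the nested-tuple tree representation, and the toy distance matrix are assumptions for illustration and are not the data used in the talk.

```python
def upgma(labels, D):
    """Return (tree, merges) for a symmetric distance matrix D (list of lists).

    tree   : nested tuples of labels
    merges : list of (left subtree, right subtree, height), height = distance / 2
    """
    clusters = {i: (labels[i], 1) for i in range(len(labels))}  # id -> (tree, size)
    dist = {(i, j): D[i][j]
            for i in range(len(labels)) for j in range(i + 1, len(labels))}
    next_id = len(labels)
    merges = []
    while len(clusters) > 1:
        (a, b), d = min(dist.items(), key=lambda kv: kv[1])  # closest pair
        (ta, na), (tb, nb) = clusters[a], clusters[b]
        new = {}
        for c in clusters:
            if c in (a, b):
                continue
            dac = dist[(min(a, c), max(a, c))]
            dbc = dist[(min(b, c), max(b, c))]
            # Size-weighted average distance from the merged cluster to c.
            new[(c, next_id)] = (na * dac + nb * dbc) / (na + nb)
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        dist.update(new)
        del clusters[a], clusters[b]
        clusters[next_id] = ((ta, tb), na + nb)
        merges.append((ta, tb, d / 2.0))
        next_id += 1
    return next(iter(clusters.values()))[0], merges

# Toy distances (hypothetical) among four taxa.
labels = ["A", "B", "C", "D"]
D = [[0, 2, 6, 10],
     [2, 0, 6, 10],
     [6, 6, 0, 10],
     [10, 10, 10, 0]]
tree, merges = upgma(labels, D)
heights = [h for _, _, h in merges]
```

UPGMA fixes only the topology and the merge heights; the edge lengths on that topology can then be re-estimated, which is where the two path estimation methods differ.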

(28)


Immunological distance

Non-Negative Least Squares [Sattath, 1977] Method

Local Distance Preservation Method

(29)

Immunological distance

Fitch, W.M. 1981, A Non-Sequential Method for Constructing Trees and Hierarchical Classifications, J Mol Evol

Saitou, N. and Nei, M. 1987, The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol

(30)


Immunological distance

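[Equation: MDI(r), a sum over patterns p, normalized by the neighborhood size $|U(x^p, r)|$, of squared differences between the input distance of the pair $(x^p, x^q)$ and the tree distance $t(p, q)$, for $x^q \in U(x^p, r)$; the exact form is not recoverable from the slide text.]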

(31)

Phylogenetic Tree

• Tree: The trees (H5N1, HIV, SARS, Nipah Virus) are constructed by UPGMA (Unweighted Pair Group Method with Arithmetic mean).

• Path Estimation: The edge lengths of the tree are estimated by two different methods for comparison.

 – Non-Negative Least Squares method

 – Local Distance Preservation

• Assumption: Local distances are more reliable than global distances.

(32)


Phylogenetic Tree

Non-Negative Least Squares [Sattath, 1977] Method

Local Distance Preservation Method

8,89,41,72,86,88,79,53,81,35

(33)

Indonesia subtree

Non-Negative Least Squares [Sattath, 1977] Method

Local Distance Preservation Method

68,25,59,37,85,54

(34)


Phylogenetic Tree

• Distance: Hamming distance after multiple sequence alignment

• Characters: 20 amino acids

• # of data: a total of 97 H5N1 protein sequences of segment 1 (PB2) are used in this simulation.

• Data source: NCBI Influenza Virus Resource
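The Hamming distance between two aligned sequences is the number of positions at which their characters differ. The following sketch computes it and builds the pairwise distance matrix that a tree construction method would consume. The example sequences are invented placeholders, not real PB2 data, and counting a gap character against a residue as a mismatch is an assumption.

```python
def hamming(a, b):
    """Number of differing positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for u, v in zip(a, b) if u != v)

def distance_matrix(seqs):
    """Symmetric matrix of pairwise Hamming distances."""
    n = len(seqs)
    return [[hamming(seqs[i], seqs[j]) for j in range(n)] for i in range(n)]

# Hypothetical aligned amino acid fragments ('-' marks an alignment gap).
seqs = ["MKT-A", "MRTGA", "MKTGA"]
D = distance_matrix(seqs)
```

Because the distance is only defined position by position, the multiple alignment has to be done first so that all sequences share the same length.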

(35)

Summary

• Visualization only, no LVQ

• Flexible design of low dimensional space

 – 2D, tree y=Ax

• Distance invariance

 – Preservation of physical meaning in 2D

 – Perfect energy function

 – Relative distances, no pattern vectors

 – Better edge lengths for short amino acid distances

• Number of output cells = P, save $p(x)$ for $p(y)$

 – No probability manipulation

• Flexible initial setting

• Parallel and distributed algorithm possible

(36)


Kohonen’s book 2001, preface page XI

• Let me also emphasize the following facts: 1) The SOM has not been meant for statistical pattern recognition (no probability); it is a clustering, visualization, and abstraction method. Anybody wishing to implement decision and classification processes should use LVQ instead of SOM. 2)…

(37)

Thank You


(38)


Related Work

Contribution               People             Year
MDS                        Young, G.          1938
SOM                        Kohonen, T.        1989
Locally Linear Embedding   Roweis, S.-T.      2000
Isomap                     Tenenbaum, J. B.   2000
