A Novel Method for Manifold Construction
Wei-Chen Cheng and Cheng-Yuan Liou*
Department of Computer Science and Information Engineering
National Taiwan University, Republic of China
*cyliou@csie.ntu.edu.tw
ICONIP 2008 Liou, C.-Y. 2
Related Works
Contribution | People | Year
Separating two classes and clustering the same class in SOM | Liou, Chen, Huang | 2000
Distance invariance | Liou, Cheng | 2007
Conformality, angle invariance in SOM | Liou, Tai | 2000
Anisotropic mapping | Pedrycz, Waletzky | 1997
Average expected distortion measure | Kohonen | 1997
Least distortion measure in SOM | Luttrell | 1991, 1992
Approx. energy function of SOM | Erwin, Obermayer, Schulten | 1992
Energy function for SOM
• E. Erwin, K. Obermayer, K. Schulten, (1992)
• Approximated form
$$\hat{\varepsilon}(W) \equiv \frac{1}{2}\sum_i \int H_{i,p(x)}\,\lVert x - w_i \rVert^2\,P(x)\,dx + \cdots$$
Least distortion measure
• Luttrell (1991, 1992)
$$d_i = \sum_j h_{ij}\,\lVert x - m_j \rVert^2$$
Trying to comprehend SOM
Average expected distortion measure
• T. Kohonen (1997)
$$E = \int e\,p(x)\,dx = \int \sum_{i \in L} h_{ci}\,d(x, m_i)\,p(x)\,dx$$
$$E = \sum_i \int_{x \in X_i} \sum_k h_{ik}\,\lVert x - m_k \rVert^2\,p(x)\,dx$$
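To make the two distortion measures concrete, here is a small Python sketch under stated assumptions: a 1-D chain of units with a Gaussian neighborhood $h_{ik} = \exp(-(i-k)^2 / 2\sigma^2)$, and Kohonen's integral over $p(x)$ replaced by a sample mean over inputs assumed to be drawn from $p(x)$. The function names (`gaussian_h`, `local_distortion`, `winner`, `luttrell_winner`, `average_distortion`) are hypothetical. `winner` picks the nearest weight, playing the role of Kohonen's $c$, while `luttrell_winner` picks the unit with the smallest local distortion $d_i$.

```python
import math

def gaussian_h(i, k, sigma=1.0):
    # Hypothetical neighborhood h_ik on a 1-D chain of units (assumption).
    return math.exp(-((i - k) ** 2) / (2 * sigma ** 2))

def sq_dist(a, b):
    # Squared Euclidean distance ||a - b||^2
    return sum((u - v) ** 2 for u, v in zip(a, b))

def local_distortion(x, weights, i, h=gaussian_h):
    # Luttrell-style local distortion: d_i = sum_j h_ij * ||x - m_j||^2
    return sum(h(i, j) * sq_dist(x, m) for j, m in enumerate(weights))

def winner(x, weights):
    # Kohonen's winner c: the unit whose weight m_c is nearest to x
    return min(range(len(weights)), key=lambda i: sq_dist(x, weights[i]))

def luttrell_winner(x, weights, h=gaussian_h):
    # Least-distortion winner: the unit i that minimizes d_i
    return min(range(len(weights)), key=lambda i: local_distortion(x, weights, i, h))

def average_distortion(samples, weights, h=gaussian_h):
    # Sample-mean estimate of E = ∫ sum_k h_ck ||x - m_k||^2 p(x) dx,
    # assuming the samples are drawn from p(x) and c = winner(x)
    total = 0.0
    for x in samples:
        total += local_distortion(x, weights, winner(x, weights), h)
    return total / len(samples)
```

With only a few units the two winners usually coincide; they can differ when a unit's neighbors lie much closer to $x$ than the nearest unit's neighbors do.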
Anisotropic mapping for SOM
• W. Pedrycz, J. Waletzky (1997)
• The anisotropy of this metric means that one can find pairs of patterns $x_1, x_2$ and $x_3, x_4$ such that $\lVert x_1 - x_2 \rVert$ and $\lVert x_3 - x_4 \rVert$ are equal, yet the values $\lVert NN(x_1) - NN(x_2) \rVert$ and $\lVert NN(x_3) - NN(x_4) \rVert$ can differ quite substantially.
$$Q = \left[\text{target} - \lVert NN(x_1) - NN(x_2) \rVert^2\right]^2$$
Conformality
• Liou, Tai (2000)
• Differential geometry
• Angle preservation in SOM
• Patterns may not have a fixed spatial structure; for example, they may form a tree or a chain with flexible joints.
SIR
Internal representations for SOM
• Liou, Chen, Huang (2000)
• Separating two classes and clustering the same class in a hidden layer
• Flexible for various designs
• Flexible design of output space y, tree
• Relative distances only
• Capable of anisotropy
$$E = \sum_{p_1=1}^{P} \sum_{p_2=1}^{P} E_{rep}\left(p_1, p_2\right)$$
$E_{rep}(p_1, p_2)$ depends only on the output distance $d\left(y^{p_1} - y^{p_2}\right)$.
Distance invariance
• Liou, Cheng (2007)
• Relative distances only
• Flexible design of output space – 2D, tree, or chain
• Solving the S-shape problem in LLE and Isomap (2007)
• Maximally resemble the pattern distortions
– Visualization of physical meaning among patterns
• Perfect energy function
– N = number of data; one output cell $y^p$ is saved for each pattern $x^p$
• Isotropy
$$E(r) = \frac{1}{4}\sum_p \sum_{q \in U_{p,r}} \left(\lVert y^q - y^p \rVert^2 - \lVert x^q - x^p \rVert^2\right)^2$$
Flexible design
Density problem in LLE
Objective Function
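$$E(r) = \frac{1}{4}\sum_p \sum_{q \in U_{p,r}} \left(\lVert y^q - y^p \rVert^2 - \lVert x^q - x^p \rVert^2\right)^2$$
[Figure: input distance $\lVert x^1 - x^2 \rVert^2$ and output distance $\lVert y^1 - y^2 \rVert^2$]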
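A direct way to read the objective is to evaluate it on a few points. The sketch below assumes that $U_{p,r}$ is the set of patterns $q \ne p$ whose input distance to $x^p$ is at most $r$ (this definition is an assumption) and sums over ordered pairs exactly as the formula is written; `neighbors` and `energy` are hypothetical names.

```python
import math

def neighbors(X, p, r):
    # Assumed U_{p,r}: all q != p with ||x^q - x^p|| <= r
    return [q for q in range(len(X)) if q != p and math.dist(X[q], X[p]) <= r]

def energy(X, Y, r):
    # E(r) = 1/4 * sum_p sum_{q in U_{p,r}} (||y^q - y^p||^2 - ||x^q - x^p||^2)^2
    E = 0.0
    for p in range(len(X)):
        for q in neighbors(X, p, r):
            dy2 = math.dist(Y[q], Y[p]) ** 2
            dx2 = math.dist(X[q], X[p]) ** 2
            E += (dy2 - dx2) ** 2
    return E / 4.0
```

For example, if an input distance of 1 becomes an output distance of 2, that pair contributes $(4 - 1)^2 = 9$ for each of its two ordered pairs, so $E = 18/4 = 4.5$ when it is the only neighbor pair.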
Each epoch
Each pattern
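The "each epoch, each pattern" loop can be sketched as plain gradient descent on $E(r)$ with one output cell per pattern. This is an illustrative assumption, not necessarily the authors' exact update rule: differentiating one term of $E(r)$ with respect to $y^p$ gives $(\lVert y^p - y^q\rVert^2 - \lVert x^p - x^q\rVert^2)(y^p - y^q)$, and because a distance-threshold neighborhood is symmetric, each pair is counted twice. The learning rate `eta`, the helper names, and the radius-based neighborhood are hypothetical.

```python
import math

def neighbors(X, p, r):
    # Assumed U_{p,r}: all q != p with ||x^q - x^p|| <= r (symmetric in p, q)
    return [q for q in range(len(X)) if q != p and math.dist(X[q], X[p]) <= r]

def train(X, Y, r, eta=0.001, epochs=10):
    """Gradient descent on E(r); Y (list of lists) is updated in place and returned."""
    U = [neighbors(X, p, r) for p in range(len(X))]  # fixed input-space neighborhoods
    for _ in range(epochs):            # each epoch
        for p in range(len(X)):        # each pattern
            grad = [0.0] * len(Y[p])
            for q in U[p]:
                diff = [a - b for a, b in zip(Y[p], Y[q])]  # y^p - y^q
                err = sum(d * d for d in diff) - math.dist(X[p], X[q]) ** 2
                for k, d in enumerate(diff):
                    # factor 2: the pair appears as both (p, q) and (q, p) in E(r)
                    grad[k] += 2.0 * err * d
            Y[p] = [y - eta * g for y, g in zip(Y[p], grad)]
    return Y
```

With three patterns forming a 3-4-5 triangle and a radius large enough to include every pair, repeated epochs pull the output distances toward 3, 4 and 5. Only distances are fitted, so the result is determined up to rotation, reflection and translation.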
Ten epochs and all patterns
Self-organizing map: N << number of data, regular cell positions
LDP: irregular cells
Artificial Data
Input data Output data
• There are 280 input patterns in 3D input space.
• The output cells are initialized by MDS.
Artificial Data
$$\text{MDI}(r) = \frac{1}{\sum_p \lvert U_{p,r} \rvert} \sum_p \sum_{q \in U_{p,r}} \frac{\left(\lVert y^q - y^p \rVert - \lVert x^q - x^p \rVert\right)^2}{\lVert x^q - x^p \rVert^2}$$
• MDI (measurement of distance invariance)
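As a rough sketch of how MDI could be computed, the code below assumes MDI(r) is the average, over all neighbor pairs $q \in U_{p,r}$, of the squared relative deviation $(\lVert y^q - y^p\rVert - \lVert x^q - x^p\rVert)^2 / \lVert x^q - x^p\rVert^2$, with $U_{p,r}$ taken as the patterns within input distance $r$ of $x^p$. Both this exact form and the name `mdi` are assumptions. A value of 0 means every local distance is preserved exactly.

```python
import math

def mdi(X, Y, r):
    """Assumed MDI(r): mean squared relative deviation between output and
    input distances over all ordered neighbor pairs (q in U_{p,r})."""
    total, count = 0.0, 0
    for p in range(len(X)):
        for q in range(len(X)):
            dx = math.dist(X[q], X[p])
            # keep only q in U_{p,r}; skip coincident patterns to avoid dividing by 0
            if q == p or dx == 0.0 or dx > r:
                continue
            dy = math.dist(Y[q], Y[p])
            total += ((dy - dx) / dx) ** 2
            count += 1
    return total / count if count else 0.0
```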
Artificial Data
Performance Output data
Comparisons
Sampling densities in swiss roll
• LDP is not affected by the sampling density distribution.
• This is because every pattern has its own corresponding cell in the output space.
• The density of the patterns is therefore equal to the density of their cells.
• The number of cells is equal to the number of patterns.
World Stock Indices
Index | Market
TAIEX | Taiwan
SWISS MARKET | Swiss
SSE | Shanghai
S&P 500 | New York
OBX | Oslo
NYSE COMP | New York
NIKKEI 225 | Osaka
NASDAQ | Nasdaq
MIBTEL | Milan
KOSPI | Korea
KLSE | Kuala Lumpur
JKSE | Jakarta
HANG SENG | Hong Kong
FTSE100 | London
DJ-INDUS | New York
DAX | Frankfurt
BSE SENSEX | Bombay
ALL ORDS | Australia
AEX | Amsterdam
World Stock Indices
• Data: the monthly values of the 19 indices are arranged in vector form.
• Initialization: the initial values of the output cells are taken from the first two dimensions of MDS.
[Figure: output map with the labels 2003-03, 2006-05 and 2007-10]
Flexible design of mapping function y = Ax: edge length estimation of a known tree
$$A z \approx \delta \quad \text{subject to} \quad z \ge 0$$
$$z = \left[z_1, z_2, \ldots\right]^T, \qquad \delta = \left[\delta_1, \delta_2, \ldots, \delta_{P(P-1)/2}\right]^T, \qquad a_{ij} \in \{0, 1\}$$
Rows of A (index i): all input distance pairs. Columns of A (index j): all tree edges; $a_{ij} = 1$ when edge j lies on the tree path of pair i.
Edge length estimation of known tree
$$\delta_i = \lVert x^q - x^p \rVert, \qquad i = 1, \ldots, P(P-1)/2 \text{ indexing the input pairs } (x^p, x^q)$$
$$\sum_j a_{ij} z_j \approx \delta_i \quad \text{if } \delta_i \le r, \qquad \text{subject to } z_j \ge 0$$
LDP (local distance preservation): rows of A are all input distance pairs; columns of A are all tree edges.
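The fit can be sketched as a constrained least-squares problem. The code below uses simple projected gradient descent (a swapped-in solver, not the Lawson-Hanson NNLS algorithm behind common NNLS routines) to minimize $\lVert Az - \delta \rVert^2$ subject to $z \ge 0$. Passing a radius `r` keeps only the rows with $\delta_i \le r$, mirroring the local (LDP) variant; `r=None` uses all pairs. `fit_edge_lengths`, `eta` and `iters` are hypothetical.

```python
def fit_edge_lengths(A, delta, r=None, eta=0.05, iters=2000):
    """Projected-gradient sketch of  A z ≈ delta  subject to z >= 0.

    r=None uses every input pair (global fit); a number r keeps only
    rows with delta_i <= r (local, LDP-style fit).
    """
    rows = [i for i, d in enumerate(delta) if r is None or d <= r]
    m = len(A[0])                      # number of tree edges
    z = [0.0] * m
    for _ in range(iters):
        grad = [0.0] * m
        for i in rows:
            # residual of pair i: (sum_j a_ij z_j) - delta_i
            resid = sum(A[i][j] * z[j] for j in range(m)) - delta[i]
            for j in range(m):
                grad[j] += 2.0 * A[i][j] * resid
        # gradient step, then project onto the constraint z >= 0
        z = [max(0.0, zj - eta * g) for zj, g in zip(z, grad)]
    return z
```

In a star tree with three leaves joined at one internal node (rows of A: leaf pairs, columns: the three edges), pairwise distances 3, 4 and 5 are reproduced exactly by edge lengths 1, 2 and 3. A fixed step size must stay below $1/\lambda_{\max}(A^T A)$ for this loop to converge.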
Phylogenetic Tree
• Tree: The tree is constructed by UPGMA (Unweighted Pair Group Method with Arithmetic mean).
• Path Estimation: The edge lengths of the tree are estimated by two different methods for comparison.
– Non-Negative Least Squares [Sattath, 1977] Method
– Local Distance Preservation
• Data: Case, S.M.: Biochemical systematics of members of the genus Rana native to western North America. Systematic Zoology 27, 299-311 (1978)
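For reference, a compact UPGMA sketch: repeatedly merge the two closest clusters, place the merge at half their distance, and set the distance from the merged cluster to every other cluster to the size-weighted average of the two old distances. The input format (a dict keyed by `frozenset` label pairs) and the function name are assumptions; ties are broken by dictionary order.

```python
from itertools import combinations

def upgma(labels, dist):
    """Return (tree as nested tuples, list of merge heights).

    dist maps frozenset({a, b}) -> distance between leaves a and b.
    """
    clusters = {i: (lab, 1) for i, lab in enumerate(labels)}  # id -> (subtree, size)
    d = {(i, j): dist[frozenset((labels[i], labels[j]))]
         for i, j in combinations(range(len(labels)), 2)}     # keys always (smaller, larger)
    heights, next_id = [], len(labels)
    while len(clusters) > 1:
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])   # closest pair of clusters
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        heights.append(dij / 2)
        # size-weighted average distance from the merged cluster to each remaining one
        new_d = {k: (ni * d[(min(i, k), max(i, k))] + nj * d[(min(j, k), max(j, k))]) / (ni + nj)
                 for k in clusters}
        d = {key: v for key, v in d.items() if i not in key and j not in key}
        for k, v in new_d.items():
            d[(k, next_id)] = v        # k < next_id, so the key order is preserved
        clusters[next_id] = ((ti, tj), ni + nj)
        next_id += 1
    (tree, _), = clusters.values()
    return tree, heights
```

With d(a, b) = 2 and d(a, c) = d(b, c) = 6, `a` and `b` merge first at height 1, and `c` joins at height 3.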
Immunological distance
Non-Negative Least Squares [Sattath, 1977] Method
Local Distance Preservation Method
Immunological distance
• Fitch, W.M.: A non-sequential method for constructing trees and hierarchical classifications. J. Mol. Evol. (1981)
• Saitou, N., Nei, M.: The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. (1987)
Immunological distance
$$\text{MDI}(r) = \frac{1}{\sum_p \lvert U_{p,r} \rvert} \sum_p \sum_{q \in U_{p,r}} \frac{\left(t(q, p) - \lVert x^q - x^p \rVert\right)^2}{\lVert x^q - x^p \rVert^2}$$
Phylogenetic Tree
• Tree: The trees (H5N1, HIV, SARS, Nipah virus) are constructed by UPGMA (Unweighted Pair Group Method with Arithmetic mean).
• Path Estimation: The edge lengths of the tree are estimated by two different methods for comparison.
– Non-Negative Least Squares Method
– Local Distance Preservation
• Assumption: Local distance is more reliable than global distance.
Phylogenetic Tree
Non-Negative Least Squares [Sattath, 1977] Method | Local Distance Preservation Method
8,89,41,72,86,88,79,53,81,35
Indonesia subtree
Non-Negative Least Squares [Sattath, 1977] Method | Local Distance Preservation Method
68,25,59,37,85,54
Phylogenetic Tree
• Distance: Hamming distance after multiple sequence alignment
• Characters: 20 amino acids
• Number of data: 97 H5N1 protein sequences of segment 1 (PB2) in total are used in this simulation.
• Data source: NCBI Influenza Virus Resource
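A minimal sketch of the distance computation, assuming the sequences are already rows of one multiple alignment (equal length) and that a gap character is compared like any other residue; how gaps were treated in this study is not stated. `hamming` and `distance_matrix` are hypothetical names.

```python
def hamming(a, b):
    """Number of aligned positions at which two sequences differ.

    Assumes a and b come from the same multiple alignment (equal length);
    a gap '-' is treated as an ordinary character (assumption).
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    return sum(x != y for x, y in zip(a, b))

def distance_matrix(seqs):
    # Symmetric matrix of pairwise Hamming distances, e.g. as input to UPGMA
    n = len(seqs)
    return [[hamming(seqs[i], seqs[j]) for j in range(n)] for i in range(n)]
```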
Summary
• Visualization only, no LVQ
• Flexible design of low dimensional space
– 2D, tree y=Ax
• Distance invariance
– Preservation of physical meaning in 2D
– Perfect energy function
– Relative distances, no pattern vectors
– Better edge lengths for short amino acid distances
• Number of output cells = P; one output cell $y^p$ is saved for each pattern $x^p$
– no probability manipulation
• Flexible initial setting.
• Parallel and distributed algorithm possible
Kohonen’s book 2001, preface page XI
• Let me also emphasize the following facts: 1) The SOM has not been meant for statistical pattern recognition (no probability); It is a clustering, visualization, and abstraction method.
Anybody wishing to implement decision and classification processes should use LVQ instead of SOM. 2)…
Thank You
Related Work
Contribution | People | Year
Isomap | Tenenbaum, J. B. | 2000
Locally Linear Embedding | Roweis, S.-T. | 2000
SOM | Kohonen, T. | 1989
MDS | Young, G. | 1938