Data Complex Simulation
5.2 Relations to Manifold Learning Measure
5.2.1 Simulation results
Here we use the Swiss roll dataset described in the previous section (see Table 3.1). The corresponding embedding results are used to compute the curves of the performance measures. The eigenvalues and the corresponding measurement results for different manifold learning methods, with different embedding measures computed using k-nn, are shown in Table 5.4 and Table 5.5. For the Isomap local correlation measure, the blue curve is based on the k-nn of the original dataset, while the green curve is based on the k-nn of the embedding results; in this case, the blue curve should be more representative than the green curve. For Trustworthiness and Continuity, the blue curve is the trustworthiness measure and the green curve is the continuity measure. For MRRE, the blue curve is the component analogous to the trustworthiness measure, while the green curve is the component analogous to the continuity measure.
Some corresponding embedding results for different manifold learning methods can be seen in Figures 3-2, 3-8, 3-14, 3-20, 3-26 and 3-32. The changes in the measures capture the major changes in the embedding results, as the eigenvalues do. Each of these rank-based measures offers its own view of the quality of the embedding results relative to the original dataset, whereas the eigenvalues reveal only how the embedding results change.
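As a minimal sketch of the rank-based measures discussed above, the following pure-Python code computes trustworthiness and continuity for a given k, following the standard definitions of Venna and Kaski. The function names (`rank_matrix`, `trustworthiness_continuity`) are illustrative and are not taken from the toolboxes used in this work; ties in distance are broken by point index, which is an assumption of this sketch.

```python
import math


def rank_matrix(points):
    """ranks[i][j] = rank of point j among the neighbors of point i
    (1 = nearest), using Euclidean distance. Ties are broken by index."""
    n = len(points)
    ranks = [[0] * n for _ in range(n)]
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        for r, j in enumerate(order, start=1):
            ranks[i][j] = r
    return ranks


def _penalty(ranks_ref, ranks_other, k):
    """Sum of (rank in ref - k) over pairs (i, j) where j is among the
    k nearest neighbors of i in `other` but not in `ref`."""
    n = len(ranks_ref)
    total = 0
    for i in range(n):
        for j in range(n):
            if j != i and ranks_other[i][j] <= k and ranks_ref[i][j] > k:
                total += ranks_ref[i][j] - k
    return total


def trustworthiness_continuity(original, embedded, k):
    """Return (trustworthiness, continuity) for neighborhood size k.
    The usual normalization requires 0 < k < n / 2."""
    n = len(original)
    if len(embedded) != n:
        raise ValueError("original and embedded must have the same length")
    if not 0 < k < n / 2:
        raise ValueError("k must satisfy 0 < k < n / 2")
    r_orig = rank_matrix(original)
    r_emb = rank_matrix(embedded)
    norm = 2.0 / (n * k * (2 * n - 3 * k - 1))
    # Trustworthiness penalizes points that enter the embedded neighborhood.
    trust = 1.0 - norm * _penalty(r_orig, r_emb, k)
    # Continuity penalizes points that leave the original neighborhood.
    cont = 1.0 - norm * _penalty(r_emb, r_orig, k)
    return trust, cont
```

Evaluating `trustworthiness_continuity` over a range of k values gives one curve per measure, analogous to the blue and green curves described above. The double loop is quadratic in the number of points, so this sketch is suitable only for small samples.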
5.2.2 Summary
Because it is hard to set a standard for identifying change points directly from the eigenvalues, the embedding measures can give clues to the real change points. Currently, we can detect these points only by eye and by heuristics. Because the change points found by the embedding measures agree with the significant change points in the eigenvalue curves, the eigenvalue curves can be used instead. Unlike the embedding measures, they require no extra computation, since the eigenvalues are obtained directly from the eigen-decomposition step of the manifold learning framework.
Table 5.4: Eigenvalues and measures for the Swiss roll dataset using LLE, Isomap and HLLE.
Approaches: LLE, Isomap, Hessian LLE
Table 5.5: Eigenvalues and measures for the Swiss roll dataset using Laplacian eigenmap, diffusion map and LTSA.
Approaches: Laplacian Eigenmap, Diffusion Map, LTSA
[Table bodies are plot panels. Row label: Eigenvalues. Axis labels: # nearest neighbors, Eigenvalue.]
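One possible heuristic for flagging change points in an eigenvalue curve is sketched below. It is only an illustration, not the procedure used in this work: the function name `change_points`, the `factor` parameter and the median-based threshold are all assumptions. A value of k is flagged when the eigenvalue jump from the previous k exceeds `factor` times the median jump along the curve.

```python
from statistics import median


def change_points(ks, eigenvalues, factor=3.0):
    """Return the k values at which the eigenvalue changes unusually much.

    ks          -- neighborhood sizes, in increasing order
    eigenvalues -- one eigenvalue per entry of ks (same length)
    factor      -- hypothetical sensitivity: a jump is flagged when it
                   exceeds factor * (median absolute jump)
    """
    if len(ks) != len(eigenvalues):
        raise ValueError("ks and eigenvalues must have the same length")
    # Absolute first differences between consecutive eigenvalues.
    jumps = [abs(b - a) for a, b in zip(eigenvalues, eigenvalues[1:])]
    if not jumps:
        return []
    threshold = factor * median(jumps)
    # jumps[i] is the change from ks[i] to ks[i + 1]; report ks[i + 1].
    return [ks[i + 1] for i, d in enumerate(jumps) if d > 0 and d > threshold]
```

Applying the same function to the first differences of the eigenvalues, rather than to the eigenvalues themselves, would flag changes of slope instead of changes of value. The threshold still has to be chosen by hand, so this does not remove the need for visual inspection.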
Chapter 6
Conclusion
In this work, manifold learning methods within a certain framework are discussed and analyzed. In this framework, neighborhood selection is the most important component, since it determines the final embedding results and eigenvalues. For different dimension reduction methods, change points in the eigenvalues may be selected according to changes in the eigenvalues or changes in the eigenvalue slopes. These change points reflect major structural changes in the embedding results. Examining the eigenvalues obtained with different neighborhood selection parameters can increase our understanding of the original dataset. For simulated datasets, we can validate these manifold learning methods by checking whether they produce the required embedding results. For real data, no such supporting information is available. The differences among the embedding results of different manifold learning methods come from their different matrix bases. Unfortunately, the real datasets all benefit more from k-nn. For ε-distance, it is not easy to find a distance range that yields far fewer neighborhood connections. If valleys exist in the pairwise distance histogram, ε-distance can be used to apply divide and conquer to the dataset.
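The idea of looking for valleys in the pairwise distance histogram can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the equal-width binning and the definition of a valley (a run of equal-count bins with strictly higher counts on both sides) are choices made for this sketch, not the procedure used in this work.

```python
import math
from itertools import combinations


def distance_histogram(points, bins=10):
    """Histogram of all pairwise Euclidean distances.
    Returns (counts, edges) with len(edges) == bins + 1."""
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    if not dists:
        raise ValueError("need at least two points")
    lo, hi = min(dists), max(dists)
    width = (hi - lo) / bins or 1.0  # avoid zero width if all distances are equal
    counts = [0] * bins
    for d in dists:
        idx = min(int((d - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges


def valleys(counts):
    """Indices of valley bins: the middle of each run of equal counts
    that has a strictly higher count on both sides."""
    found = []
    n = len(counts)
    i = 1
    while i < n - 1:
        j = i
        while j + 1 < n and counts[j + 1] == counts[i]:
            j += 1
        if counts[i - 1] > counts[i] and j + 1 < n and counts[j + 1] > counts[i]:
            found.append((i + j) // 2)
        i = j + 1
    return found


def candidate_epsilons(points, bins=10):
    """Centers of the valley bins, as candidate thresholds for ε-distance."""
    counts, edges = distance_histogram(points, bins)
    return [(edges[v] + edges[v + 1]) / 2 for v in valleys(counts)]
```

A candidate ε taken from a valley separates short within-group distances from long between-group distances, which is what makes a divide-and-conquer split possible. The result depends on the number of bins, so a valley found this way should be checked against the histogram itself.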
Many issues remain in analyzing eigenvalues and in modifying manifold learning algorithms from conventional decomposition to general eigen-decomposition in order to find other points of view on a dataset. When it is hard to determine change points in the eigenvalue figures, some existing performance measures are introduced to confirm the true change points, which affect the major structures of the embedding results. If we can identify change points directly from the eigenvalues, we do not need to pay the additional computational cost of the measures used as reference change points. Finding the smallest positive eigenvalues that are not merely at the level of machine epsilon is also an important issue when applying manifold learning to large datasets. Many other neighborhood selection approaches can be included to obtain more viewpoints on the original datasets. We hope that eventually we can use different manifold learning methods with different neighborhood selection approaches to extract much more detail for understanding the datasets.
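The issue of separating the smallest positive eigenvalues from values that are numerically zero can be illustrated with the following sketch. The function name and the default tolerance (number of eigenvalues times machine epsilon times the largest absolute eigenvalue) are assumptions made for illustration; this is one common convention for a numerical-zero threshold, not the criterion used in this work.

```python
import sys


def smallest_significant_eigenvalues(eigenvalues, count=2, tol=None):
    """Return the `count` smallest eigenvalues that exceed a
    numerical-zero tolerance, in increasing order.

    tol -- if None, use n * machine_epsilon * max|eigenvalue|
           (an assumed convention, see the text above)
    """
    values = list(eigenvalues)
    if not values:
        return []
    if tol is None:
        scale = max(abs(v) for v in values)
        tol = len(values) * sys.float_info.epsilon * scale
    # Keep only eigenvalues clearly above the tolerance; negative values
    # and round-off noise around zero are discarded.
    significant = sorted(v for v in values if v > tol)
    return significant[:count]
```

For large datasets the default tolerance grows with the number of eigenvalues, so a genuinely small eigenvalue can be discarded along with the round-off noise; in that case `tol` would need to be set explicitly.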