
Chapter 7 Experimental results and Performance study

7.6 Experiment on Real world dataset

7.6.3 DBLP dataset

The DBLP dataset records co-authorship interactions. We extract the subset from 2000 to 2009, which contains 769,137 authors and 1,068,239 records. A node represents an author, and an edge between two nodes represents co-authorship between the corresponding authors. We use one year as the time unit because of the characteristics of the interaction data. We preset the observation eyeshot wr = 1 and use the same minimum community similarity threshold ( ) = 0.3.
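The yearly snapshot construction described above can be sketched as follows. This is a minimal illustration, not the preprocessing actually used in the experiment: the input shape (an iterable of `(year, authors)` pairs), the function name `yearly_coauthor_snapshots`, and the toy records are all assumptions.

```python
from collections import defaultdict
from itertools import combinations

def yearly_coauthor_snapshots(records, first_year=2000, last_year=2009):
    """Group co-authorship edges by publication year.

    `records` is an iterable of (year, [author, ...]) pairs; this input
    shape is an assumption, not the DBLP XML format itself.
    Returns {year: {(a, b): count}} with a < b for each undirected edge.
    """
    snapshots = {y: defaultdict(int) for y in range(first_year, last_year + 1)}
    for year, authors in records:
        if year not in snapshots:
            continue  # record lies outside the studied period
        # Every unordered pair of distinct co-authors on one record is an edge.
        for a, b in combinations(sorted(set(authors)), 2):
            snapshots[year][(a, b)] += 1
    return {year: dict(edges) for year, edges in snapshots.items()}

# Toy records, invented for illustration only.
records = [
    (2000, ["Ann", "Bob"]),
    (2000, ["Ann", "Bob", "Cy"]),
    (2001, ["Bob", "Cy"]),
    (1999, ["Ann", "Cy"]),  # ignored: before 2000
]
snapshots = yearly_coauthor_snapshots(records)
```

Each snapshot is one time unit of the dynamic network; the edge counts could serve as raw interaction strengths before any smoothing is applied.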

Figure 7-25 Result of DBLP dataset using EPC-SHRINK
[Figure: x-axis: year, 2000 to 2010; y-axis: cumulative number.]

Fig. 7-25 shows the result on the DBLP dataset. The cumulative number of Alive is higher than that of the other states, which indicates that most communities stay alive over time. The cumulative numbers of Child and Fission remain lower than those of the other states, the same phenomenon observed on the Enron and Facebook datasets.
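The cumulative curves discussed here can be derived from per-year state labels. The following is a minimal sketch under assumptions: the input is a hypothetical mapping from year to the list of state labels assigned to communities in that year, the function name `cumulative_state_counts` is illustrative, and the sample labels are invented rather than taken from the experiment.

```python
from collections import Counter
from itertools import accumulate

def cumulative_state_counts(events_by_year):
    """Return (years, {state: [cumulative count per year]}) in time order."""
    years = sorted(events_by_year)
    states = sorted({s for labels in events_by_year.values() for s in labels})
    per_year = [Counter(events_by_year[y]) for y in years]
    # Counter returns 0 for states that do not occur in a given year.
    return years, {s: list(accumulate(c[s] for c in per_year)) for s in states}

# Invented labels, for illustration only.
events_by_year = {
    2000: ["Birth", "Birth", "Alive"],
    2001: ["Alive", "Alive", "Death"],
    2002: ["Alive", "Birth"],
}
years, curves = cumulative_state_counts(events_by_year)
```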

However, the cumulative number of Birth increases quickly, which has a plausible explanation. For example, a researcher may be a Master's or Ph.D. student who co-authors with a professor and later graduates, but then chooses a job outside research. As a result, the growth rates of the Birth and Death states are higher than on the Enron email and Facebook datasets.

In the Facebook and DBLP datasets in particular, the cumulative numbers of Birth and Death are higher than in the Enron dataset. This shows that the Facebook and co-authorship networks change quickly. The high cumulative number of Alive also shows that the partitions produced by EPC are smoother.

Figure 7-26 Smoothness property of EPC when running with DBLP dataset
[Figure: DBLP dataset; x-axis: Timestamp, 2001 to 2010; y-axis: NMI, 0 to 1; series: EPC-SHRINK-EXP wr=2 and PD-Greedy ut=%0.01, a=0.8.]

For the smoothness quality on the DBLP dataset, the curve of EPC shows higher smoothness than that of PD-Greedy.
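Figure 7-26 compares smoothness through NMI. As a minimal sketch, assuming that smoothness at a timestamp is the NMI between the partition at that timestamp and the previous one (consistent with the plot starting at 2001, although this section does not restate the definition), it could be computed as follows. The arithmetic-mean normalization, the restriction to nodes present in both partitions, and the function names `nmi` and `smoothness` are assumptions.

```python
from collections import Counter
from math import log

def nmi(labels_a, labels_b):
    """NMI of two labelings of the same nodes: 2*I(A;B) / (H(A) + H(B))."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("labelings must be non-empty and of equal length")
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    h_a = -sum(c / n * log(c / n) for c in count_a.values())
    h_b = -sum(c / n * log(c / n) for c in count_b.values())
    mutual = sum(
        c / n * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    if h_a + h_b == 0:
        return 1.0  # both labelings put every node in a single community
    return 2 * mutual / (h_a + h_b)

def smoothness(prev_partition, curr_partition):
    """Compare two {node: community} maps on the nodes they share."""
    shared = sorted(set(prev_partition) & set(curr_partition))
    return nmi([prev_partition[v] for v in shared],
               [curr_partition[v] for v in shared])

# Same grouping under different labels, then an unrelated grouping.
same = smoothness({1: "a", 2: "a", 3: "b", 4: "b"},
                  {1: "x", 2: "x", 3: "y", 4: "y"})
unrelated = smoothness({1: "a", 2: "a", 3: "b", 4: "b"},
                       {1: "x", 2: "y", 3: "x", 4: "y"})
```

A value near 1 means the partition barely changed between timestamps; a value near 0 means the two partitions share no structure.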

Chapter 8 Conclusion and Future work

8.1 Conclusion

Although many studies have addressed community detection in networks, little is known about the properties and features of dynamic communities. We propose the EPC algorithm, which provides a novel data-smoothing approach for exploring the evolution of communities. The proposed Relationship extraction strategy considers not only the historical data but also the oncoming data. We also propose a method for mapping community partitions over time, called Community Pedigree Mapping, which shows the state of each community and displays its life cycle.
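To illustrate what weighting both historical and oncoming data can look like, the sketch below sums one pair's interactions inside a symmetric window of radius wr around a timestamp, discounted by an exponential decay. It is only an assumed form: the exact weighting used by EPC is defined elsewhere in the thesis, and the function name `windowed_strength`, the parameter `tau`, and the sample counts are hypothetical.

```python
from math import exp

def windowed_strength(interactions, t, wr=1, tau=1.0):
    """Decay-weighted interaction count for one pair around timestamp t.

    interactions: {timestamp: number of interactions} for a single pair.
    wr: window radius (the observation eyeshot), applied to past and future.
    tau: mean lifetime of the exponential decay (hypothetical parameter).
    """
    return sum(
        exp(-abs(s - t) / tau) * interactions.get(s, 0)
        for s in range(t - wr, t + wr + 1)
    )

# Invented interaction counts for one pair of individuals.
pair = {2003: 2, 2004: 1, 2005: 4, 2007: 3}
strength_2004 = windowed_strength(pair, 2004, wr=1)  # uses 2003, 2004, 2005
strength_now_only = windowed_strength(pair, 2004, wr=0)
```

With wr = 0 only the current timestamp counts; larger values of wr let neighbouring timestamps on both sides contribute with decreasing weight.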

In the synthetic data experiments, EPC provides a scalable way to mine dynamic communities. There are two versions of the algorithm, EPC-SHRINK and EPC-GSCAN. The experiments demonstrate that both have higher accuracy than previous algorithms such as FacetNet [3] and PD-Greedy [5]. For smoothness, the experiments show that the community partitions of EPC are smoother than those of FacetNet and PD-Greedy. For scalability, the experiments show that EPC-GSCAN scales linearly and is almost as fast as PD-Greedy. In general, the accuracy and smoothness of EPC-SHRINK are better than those of EPC-GSCAN, which suggests that the clustering quality of SHRINK is better than that of GSCAN. The accuracy of EPC-GSCAN is better than that of PD-Greedy, which validates that the concept of a Relationship graph over time outperforms the concept of temporal smoothness when the same clustering algorithm is used.

We also apply EPC to three real datasets: Enron email, Facebook, and DBLP. In all datasets, the cumulative number of the Alive state is higher than that of the other states, which indicates that the variation of real dynamic communities in social networks is quite low. The growth rates of Birth and Death in the DBLP and Facebook datasets are higher than in the Enron email dataset, indicating that new communities appear more quickly in general social relationships than within a company. This supports the argument that real-world networks such as Facebook and co-authorship change quickly. The high cumulative number of Alive also verifies that EPC provides smoothness over time.

8.2 Future work

Although EPC has acceptable time complexity and higher accuracy, some related issues are worth further investigation.

(1) So far there is no convincing theory about how to measure the quality of dynamic communities in a dynamic social network, even after the communities have been discovered.

(2) The proposed EPC presets the observation eyeshot wr and assumes that every relationship between individuals uses the same decay weighting function. In the real world, the decay of relationships between individuals might differ. Moreover, the observation eyeshot could depend heavily on the individuals and timestamps involved.

(3) In the real world, there are many kinds of interaction between individuals. How to correctly determine and represent relationship strength by combining various kinds of interactions is still a challenging problem.


Bibliography

[1] D. Chakrabarti, R. Kumar, and A. Tomkins. Evolutionary clustering. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 554-560, 2006.

[2] Y. Chi, X. Song, D. Zhou, K. Hino, and B. L. Tseng. Evolutionary spectral clustering by incorporating temporal smoothness. Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 153-162, 2007.

[3] Y.-R. Lin, Y. Chi, S. Zhu, H. Sundaram, and B. L. Tseng. FacetNet: A framework for analyzing communities and their evolutions in dynamic networks. Proceedings of the 17th International Conference on World Wide Web, pp. 685-694, 2008.

[4] L. Tang, H. Liu, J. Zhang, and Z. Nazeri. Community evolution in dynamic multi-mode networks. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 677-685, 2008.

[5] M. S. Kim and J. Han. A Particle-and-Density Based Evolutionary Clustering Method for Dynamic Networks. Proceedings of the 35th International Conference on Very Large Data Bases, pp. 622-633, 2009.

[6] G. Palla, A.-L. Barabási, and T. Vicsek. Quantifying social group evolution. Nature, vol. 446, pp. 664-667, 2007.

[7] J. Sun, C. Faloutsos, S. Papadimitriou, and P. S. Yu. GraphScope: Parameter-free mining of large time-evolving graphs. Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 687-696, 2007.

[8] C. Tantipathananandh, T. Y. Berger-Wolf and D. Kempe. A framework for community identification in dynamic social networks. Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.717-726, 2007.

[9] S. Wasserman and K. Faust. Social Network Analysis: Methods and Applications. Cambridge University Press, 1994.

[10] R. Sulo, T. Berger-Wolf and R. Grossman. Meaningful selection of temporal resolution for dynamic networks. Proceedings of the 8th Workshop on Mining and Learning with Graphs, pp.127-136, 2010.

[11] http://en.wikipedia.org/wiki/Exponential_decay#Mean_lifetime

[12] J. Huang, H. Sun, J. Han, H. Deng, Y. Sun, and Y. Liu. SHRINK: A structural clustering algorithm for detecting hierarchical communities in networks. Proceedings of the 19th ACM Conference on Information and Knowledge Management, pp. 219-228, 2010.

[13] S. Asur, S. Parthasarathy, and D. Ucar. An Event-based Framework for Characterizing the Evolutionary Behavior of Interaction Graphs. ACM Transactions on Knowledge Discovery from Data, vol. 3, no. 4, Article 16, 2009.

[14] http://dblp.uni-trier.de/xml/

[15] M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical Review E, vol. 69, 026113, 2004.

[16] http://www.cs.cmu.edu/~enron/

[17] http://socialnetworks.mpi-sws.org/

[18] A. Clauset, M. E. J. Newman, and C. Moore. Finding community structure in very large networks. Physical Review E, vol. 70, 066111, 2004.

[19] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, issue 10, 2008.

[20] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pp. 226-231, 1996.

[21] X. Xu, N. Yuruk, Z. Feng, and T. Schweiger. SCAN: a structural clustering algorithm for networks. Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 824-833, 2007.


[22] J.-G. Lee, J. Han and K.-Y. Whang. Trajectory clustering: A partition-and-group framework. Proceedings of the 2007 ACM SIGMOD international conference on Management of data, pp. 593-604, 2007.

[23] Q. Zhao, S. S. Bhowmick, X. Zheng and K. Yi. Characterizing and Predicting Community Members from Evolutionary and Heterogeneous Networks. Proceedings of the 17th ACM Conference on Information and Knowledge Management, pp. 309-318, 2008.

[24] J. Shi and J. Malik. Normalized cuts and image segmentation. Conference on Computer Vision and Pattern Recognition, pp.731-737, 1997.

[25] Z. Feng, X. Xu, N. Yuruk, and T. A. J. Schweiger. A novel similarity-based modularity function for graph partitioning. Data Warehousing and Knowledge Discovery, 9th International Conference, pp. 385–396, 2007.

[26] http://www.public.asu.edu/~ylin56/research.html

[27] A. Lancichinetti and S. Fortunato. Community detection algorithms: A comparative analysis. Physical Review E, vol. 80, 056117, 2009.

[28] C. Tantipathananandh, T. Y. Berger-Wolf. Constant-factor approximation algorithms for identifying dynamic communities. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.827-836, 2009.

[29] S. Fortunato. Community detection in graphs. Physics Reports, vol. 486, issue 3-5, pp. 75-174, 2010.

[30] K. Yu, S. Yu, and V. Tresp. Soft clustering on graphs. Proceedings of the Nineteenth Annual Conference on Neural Information Processing Systems, 2005.

[31] http://en.wikipedia.org/wiki/APX
