Social Network Summarization Overview
[Chierichetti’09] [Maserrat’10] [Maserrat’12]
Summarization Strategies: Lossless / Lossy
Opportunities for Future Research
• Advanced techniques to sample/summarize more complex graph structures
– E.g. location‐based social networks, diffusion networks, dynamic social networks, social network with activity information, etc.
• Should we focus on task‐driven sampling and summarization or do we need a general framework across tasks?
• Sampling/Summarization on noisy data
• Standard evaluation metrics and benchmark data are in high demand.
• And many others…
• Sampling and summarization have immediate practical value in the big data era
– Allow data miners to perform advanced mining tasks in large graphs
– Achieve scalable storage and querying
– Facilitate the development of real‐world applications
• Existing work is rich, but by no means sufficient to handle every aspect of the problem.
• This tutorial is partially sponsored by the National Science Council, National Taiwan University, and Intel Corporation under Grants NSC101‐2911‐I‐002‐001, NSC101‐2628‐E‐002‐028‐MY2, and NTU102R7501
• Special thanks to Shu‐Ming Hsu @ Academia Sinica for his input
References: Homogeneous Sampling
• J. Leskovec and C. Faloutsos. Sampling from large graphs. In KDD 2006.
• A. S. Maiya and T. Y. Berger‐Wolf. Benefits of bias: towards better characterization of network sampling. In KDD 2011.
• B. Ribeiro and D. Towsley. Estimating and sampling graphs with multidimensional random walks. In ACM SIGCOMM IMC 2010.
• M. Gjoka, M. Kurant, C. T. Butts, and A. Markopoulou. Walking in Facebook: a case study of unbiased sampling of OSNs. In IEEE INFOCOM 2010.
• V. Krishnamurthy, M. Faloutsos, M. Chrobak, L. Lao, J.‐H. Cui, and A. G. Percus. Reducing large internet topologies for faster simulations. In Networking, 2005.
• N. K. Ahmed, J. Neville, and R. Kompella. Network Sampling: From Static to Streaming Graphs. arXiv:1211.3412, 2012.
• C. Hübler, H.‐P. Kriegel, K. M. Borgwardt, and Z. Ghahramani. Metropolis Algorithms for Representative Subgraph Sampling. In IEEE ICDM 2008.
• M. Kurant, M. Gjoka, C. T. Butts, and A. Markopoulou. Walking on a graph with a magnifying glass: stratified sampling via weighted random walks. SIGMETRICS Perform. Eval. Rev., 2011.
References: Heterogeneous Sampling
• M. Gjoka, C. T. Butts, M. Kurant, and A. Markopoulou. Multigraph Sampling of Online Social Networks. IEEE Journal on Selected Areas in Communications, 2011.
• M. Kurant, M. Gjoka, Y. Wang, Z. W. Almquist, C. T. Butts, and A. Markopoulou. Coarse‐grained topology estimation via graph sampling. In ACM WOSN 2012.
• J.‐Y. Li and M.‐Y. Yeh. On Sampling Type Distribution from Heterogeneous Social Networks. In PAKDD 2011.
• C.‐L. Yang, P.‐H. Kung, C.‐A. Chen, and S.‐D. Lin. Semantically Sampling in Heterogeneous Social Networks. In WWW 2013.
• D. Heckathorn. Respondent‐driven sampling: a new approach to the study of hidden populations. Social Problems, 1997.
References: Task‐driven Sampling
• A. S. Maiya and T. Y. Berger‐Wolf. Sampling community structure. In WWW 2010.
• M. Mathioudakis, F. Bonchi, C. Castillo, A. Gionis, and A. Ukkonen. Sparsification of Influence Networks. In KDD 2011.
• A. S. Maiya and T. Y. Berger‐Wolf. Online Sampling of High Centrality Individuals in Social Networks. In PAKDD 2010.
• V. Satuluri, S. Parthasarathy, and Y. Ruan. Local Graph Sparsification for Scalable Clustering. In SIGMOD 2011.
• A. Vattani, D. Chakrabarti, and M. Gurevich. Preserving Personalized Pagerank in Subgraphs. In ICML 2011.
• N. K. Ahmed, J. Neville, and R. Kompella. Network Sampling Designs for Relational Classification. In AAAI ICWSM 2012.
References: Aggregation‐based Summarization
• S. Navlakha, R. Rastogi, N. Shrivastava. Graph Summarization with Bounded Error. In Proc. of ACM SIGMOD International Conference on Management of Data (SIGMOD’08), 2008.
• Y. Tian, R. A. Hankins and J. M. Patel. Efficient Aggregation for Graph Summarization. In Proc. of ACM SIGMOD International Conference on Management of Data (SIGMOD’08), 2008.
• G. Buehrer and K. Chellapilla. A Scalable Pattern Mining Approach to Web Graph Compression with Communities. In Proceedings of the 2008 International Conference on Web Search and Data Mining (WSDM’08), pages 95–106, 2008.
• N. Zhang, Y. Tian, and J. M. Patel. Discovery‐driven Graph Summarization. In Proc. of IEEE International Conference on Data Engineering (ICDE’10), 2010.
References: Abstraction‐based Summarization
• Z. Shen, K.‐L. Ma, and T. Eliassi‐Rad. Visual Analysis of Large Heterogeneous Social Networks by Semantic and Structural Abstraction. IEEE Transactions on Visualization and Computer Graphics, 12(6):1427–1439, 2006.
• C.‐T. Li and S.‐D. Lin. Egocentric Information Abstraction for Heterogeneous Social Networks. In Proc. of International Conference on Advances in Social Network Analysis and Mining
References: Compression‐based Summarization
• P. Boldi and S. Vigna. The Webgraph Framework I: Compression Techniques. In Proc. of the 13th International Conference on World Wide Web (WWW’04), pages 595–602, 2004.
• F. Chierichetti, R. Kumar, S. Lattanzi, M. Mitzenmacher, A. Panconesi, and P. Raghavan. On Compressing Social Networks. In Proc. of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’09), 2009.
• H. Maserrat and J. Pei. Neighbor Query Friendly Compression of Social Networks. In Proc. of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’10), 2010.
• H. Maserrat and J. Pei. Community Preserving Lossy Compression of Social Networks. In Proc. ICDM, 2012.
• P. Boldi, M. Rosa, M. Santini, and S. Vigna. Layered label propagation: a multiresolution coordinate‐free ordering for compressing social networks. In WWW’11.
• Y. Choi and W. Szpankowski. Compression of Graphical Structures: Fundamental Limits, Algorithms, and Experiments. IEEE Transactions on Information Theory, 58(2):620–638, February 2012.
References: Application‐oriented Summarization
• F. Zhou, S. Mahler, and H. Toivonen. Network Simplification with Minimal Loss of Connectivity. In Proc. of IEEE International Conference on Data Mining (ICDM’10), 2010.
• H. Toivonen, F. Zhou, A. Hartikainen, and A. Hinkka. Compression of Weighted Graphs. In Proc. of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’11), 2011.
• U. Kang, H. Tong, J. Sun, C.‐Y. Lin, and C. Faloutsos. GBASE: A Scalable and General Graph Management System. In Proc. of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’11), 2011.
• K. LeFevre and E. Terzi. GraSS: Graph Structure Summarization. In Proc. of SIAM International Conference on Data Mining (SDM’10), 2010.
• U. Kang and C. Faloutsos. Beyond 'Caveman Communities': Hubs and Spokes for Graph Compression and Mining. In Proc. of IEEE International Conference on Data Mining (ICDM’10), 2010.
• C. Chen, C. X. Lin, M. Fredrikson, M. Christodorescu, X. Yan, and J. Han. Mining Graph Patterns Efficiently via Randomized Summaries. Proc. VLDB Endow., 2(1):742–753, August 2009.
• W. Fan, J. Li, X. Wang, and Y. Wu. Query Preserving Graph Compression. In Proc. of ACM SIGMOD International Conference on Management of Data (SIGMOD’12), 2012.