Who Do You Know Who:
Social Networks Analysis and
Mining
Presenter: I-Hsien Ting, Ph.D. (丁一賢)
Introduction (1)
• What is a Social Network?
– A social network is a social structure that
describes social relations (Wikipedia)
– The history of social networks is older
than everyone who is here
• More than 100 years (Cooley 1909, Durkheim 1893)
• Focusing on small groups
– Information technology gives it a new life
– From Sociology to Computer Science
Introduction (2)
• Topics about Social Networking
– Social Networking: Analyzing and
Constructing Social Network
(Churchill & Halverson 2005)
• Social Networks Analysis
• Online Social Networking
• Social Network Extraction and Construction
• Applications of Social Networking
Social Networks Analysis (1)
• Social Networks Analysis
– A simple social network diagram
(Scott 1991)
• Roles
• Relationships
– Directed & Undirected
– One way & Two way
– Positive & Negative
– Self-defined relationships
• Visualization
– Why visualization?
» Providing as much information as possible in a social network
» Human can easily and roughly
Social Networks Analysis (2)
• Relational Data
Adjacency matrix: companies-by-companies
     1  2  3  4
1    -  3  3  1
2    3  -  2  2
3    3  2  -  1
4    1  2  1  -

Adjacency matrix: directors-by-directors
     A  B  C  D  E
A    -  2  2  1  1
B    2  -  3  2  1
C    2  3  -  2  2
D    1  2  2  -  0
E    1  1  2  0  -

Incidence matrix (rows: companies, columns: directors)
           Directors
Companies  A  B  C  D  E
    1      1  1  1  1  0
    2      1  1  1  0  1
    3      0  1  1  1  0
    4      0  0  1  0  1

[Figure: sociograms of the companies network and the directors network, with line values]
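As a rough illustration of how the two adjacency matrices follow from the incidence matrix on this slide, the sketch below counts shared directors per pair of companies and shared companies per pair of directors. The helper names `co_membership` and `transpose` are illustrative and do not come from the slides.

```python
# Company-by-director incidence matrix from the slide:
# rows are companies 1-4, columns are directors A-E.
incidence = [
    [1, 1, 1, 1, 0],  # company 1
    [1, 1, 1, 0, 1],  # company 2
    [0, 1, 1, 1, 0],  # company 3
    [0, 0, 1, 0, 1],  # company 4
]


def co_membership(rows):
    """Count shared columns for every pair of rows (diagonal left as None)."""
    n = len(rows)
    return [
        [None if i == j else sum(a * b for a, b in zip(rows[i], rows[j]))
         for j in range(n)]
        for i in range(n)
    ]


def transpose(matrix):
    """Swap rows and columns."""
    return [list(col) for col in zip(*matrix)]


companies = co_membership(incidence)             # companies-by-companies
directors = co_membership(transpose(incidence))  # directors-by-directors

for row in companies:
    print(row)
for row in directors:
    print(row)
```

Each off-diagonal cell is the number of directors two companies share, or the number of boards two directors sit on together.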
Social Networks Analysis (3)
• From social data to relational data
     A  B  C  D  E  F  G
A    x  2  2  2  2  2  2
B    2  x  1  1  1  1  1
C    2  1  x  1  1  1  1
D    2  1  1  x  1  1  1
E    2  1  1  1  x  1  1
F    2  1  1  1  1  x  1
G    2  1  1  1  1  1  x
Social Networks Analysis (4)
• Measurement
– Centrality
• Degree
• Betweenness
• Closeness
– Clustering Coefficient
– Density
– Path Length
– Reachability
– Structural Hole
Social Networks Analysis (5)
• Density measurement-An example
Connected points   4    4    4    3    2    0
Inclusiveness      1.0  1.0  1.0  0.7  0.5  0
Sum of degrees     12   8    6    4    2    0
No. of lines       6    4    3    2    1    0
Density            1.0  0.7  0.5  0.3  0.1  0

$\text{Density} = \dfrac{l}{n(n-1)/2}$, where $l$ is the number of lines and $n$ the number of points (Scott 1991)
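The density example above can be sketched in a few lines. This is a minimal illustration for undirected graphs on four points; the function names are illustrative. The table reports values to one decimal place, so the exact fractions differ slightly (for example, 1/6 ≈ 0.17 appears as 0.1).

```python
def density(n_points, n_lines):
    """Density of an undirected graph: l / (n(n-1)/2)."""
    possible = n_points * (n_points - 1) / 2
    return n_lines / possible if possible else 0.0


def inclusiveness(n_points, n_connected):
    """Share of points that are connected to at least one other point."""
    return n_connected / n_points


# The six 4-point graphs from the table: (connected points, number of lines).
graphs = [(4, 6), (4, 4), (4, 3), (3, 2), (2, 1), (0, 0)]
for connected, lines in graphs:
    print(connected, lines, inclusiveness(4, connected), density(4, lines))
```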
Social Networks Analysis (6)
• Centrality
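A minimal sketch of two of the centrality measures listed earlier (degree and closeness), assuming an undirected, connected network. The five-node graph is hypothetical and not taken from the slides; betweenness is left out to keep the example short.

```python
from collections import deque

# Hypothetical undirected network, stored as an adjacency list.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A", "E"},
    "E": {"D"},
}


def degree_centrality(g):
    """Degree divided by the maximum possible degree (n - 1)."""
    n = len(g)
    return {v: len(nbrs) / (n - 1) for v, nbrs in g.items()}


def distances_from(g, source):
    """Breadth-first search: shortest path length to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in g[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness_centrality(g):
    """(n - 1) divided by the sum of distances; assumes a connected graph."""
    n = len(g)
    return {v: (n - 1) / sum(distances_from(g, v).values()) for v in g}


degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
print(degree)
print(closeness)
```

In this toy network, A has both the highest degree and the highest closeness, while E sits at the periphery on both measures.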
Social Networks Analysis (7)
• Measurement
– Clustering Coefficient
– Path Length, Trail, Walk
– Reachability (digraph)
– Structural Hole
– Reciprocity
– K-Clique
– Position
Social Networks Analysis (8)
• Measurement
– Clustering Coefficient
• Local
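A small sketch of the local clustering coefficient for an undirected network: the fraction of pairs of a node's neighbors that are themselves connected. The graph is hypothetical, and returning 0.0 for nodes with fewer than two neighbors is a convention assumed here.

```python
from itertools import combinations

# Hypothetical undirected network, stored as an adjacency list.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A", "E"},
    "E": {"D"},
}


def local_clustering(g, node):
    """Links among the node's neighbors divided by the possible links."""
    nbrs = g[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention: undefined cases count as zero
    links = sum(1 for u, w in combinations(nbrs, 2) if w in g[u])
    return links / (k * (k - 1) / 2)


coefficients = {v: local_clustering(graph, v) for v in graph}
print(coefficients)
```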
Social Networks Analysis (9)
• Measurement
– Structural Hole
Social Networks Analysis (10)
• Measurement
– Clique
• Complete subgraph
• Maximum clique
– {1,2,5}
• Maximal cliques
– {1,2,5}
– {2,3}, {3,4}, {4,5}, {4,6}
• K-clique
– Clique of size k
Social Networks Analysis (11)
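The clique example on the preceding slide (maximum clique {1,2,5}; maximal cliques {1,2,5}, {2,3}, {3,4}, {4,5}, {4,6}) can be reproduced with a basic Bron-Kerbosch enumeration. The slide's diagram is not available here, so the edge list below is an assumption chosen to be consistent with the listed cliques.

```python
# Assumed edge list, consistent with the cliques listed on the slide.
edges = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]

adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)


def maximal_cliques(r, p, x, out):
    """Basic Bron-Kerbosch recursion (no pivoting).

    r: current clique, p: candidates that extend it, x: already-processed nodes.
    """
    if not p and not x:
        out.append(sorted(r))
        return
    for v in list(p):
        maximal_cliques(r | {v}, p & adj[v], x & adj[v], out)
        p = p - {v}
        x = x | {v}


cliques = []
maximal_cliques(set(), set(adj), set(), cliques)
cliques.sort()
maximum = max(cliques, key=len)   # a largest maximal clique
k = 3
k_cliques = [c for c in cliques if len(c) == k]  # cliques of size k, per the slide

print(cliques)
print(maximum)
```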
• Sociologists only focus on small social
networks
– 50~100 nodes in a social network
• The advent of Internet communications
has greatly increased SNA's popularity
– Computer & Information Technologies
become essential tools for SNA
(Churchill & Halverson 2005)
Can You Analyze and Construct this Social Network
Diagram by hand?
Cross-Field Research
ASONAM 2010 in Odense, Denmark
(The 2010 International Conference on Social Network Analysis and Mining)
• Sociology
• Ministry of Defense, Netherlands
• Terrorist
• Terrorist Operation Unit, European Council
• Data Mining & Privacy
• WWW and Mining
• Neural Network and NLP
• Data Mining
Social Networking Systems:
Off-line social networking software(1)
• Most off-line social networking software
focuses on
– Visualization
– Social Networks Analysis
• UCINET
– The best known off-line social networking software
(Borgatti et al. 2002)
– Visualization
Social Networking Systems:
Off-line social networking software(2)
• UCINET
Social Networking Systems:
Off-line social networking software(3)
Social Networking Systems:
Off-line social networking software(4)
• NetMiner II
– Strong Visualization Function (Cyram 2003)
Social Networking Systems:
Off-line social networking software(5)
Social Networking Systems:
Off-line social networking software(6)
• NetMiner II
• Netdraw
– A simple social network visualization tool
Social Networking Systems:
Social Networking Systems:
Off-line social networking software(2)
• A summary table of off-line social networking software
(Huisman and van Duijn 2003)
Social Networking Systems:
On-line social networking websites(1)
• On-line social networking websites
evolved from on-line communities
• Sharing is the main objective
• Two main categories of on-line social
networking websites
– Portal-based
• Integrating multi-functions
– Function-based
• For a specific function
– Album sharing (Flickr)
– Video sharing (YouTube)
Social Networking Systems:
On-line Social Networking Website (2)
• Classmates.com
– The first on-line social networking website
– Created in 1995
• MySpace
– The world's busiest website (more accesses than Google)
– Originally designed to mirror a college community
• Orkut
– The social networking website owned by Google
• Wallop
– The next-generation on-line social networking website, owned by Microsoft
Social Networking Systems:
On-line Social Networking Websites (3)
• A Comparison of On-line Social Networking
Websites
Features compared: Friend Search, Blog, Album, Music & Video Sharing, Visualization, E-commerce
Websites compared: Classmates, MySpace, Facebook, Orkut, Wallop
Social Networking
• Social Network = Computer Network
– Next Target: Mobile Phone
• Facebook is collaborating with Cingular
Wireless, Sprint Nextel & Verizon Wireless
• Killer Applications are Needed
– E-commerce
– Job Finder
• On-line Social Networking Websites
– Using people to find content
Social Networks Extraction & Construction (1)
• Extracting & Constructing Social Networks from
Contents
– Using content to find people
– Contents
• Web
• Event-logs
• On-line Chat
• Papers & Theses
Social Networks Extraction & Construction (2)
• Extracting Social Networks from Web
– Extracting from web contents (Personal Homepage)
– Semantic Analysis (Ontology) & NLP (Natural Language Processing) (Jin et al. 2007)
– Contact information is the focus (Culotta et al. 2005)
• E-mail address
• Phone Number
• Names
– Network Analysis
• Appearance
• Connectivity
• URL Similarity
Social Networks Extraction & Construction (3)
• Extracting Social
Networks from E-mail
– One of the most widely used on-line
communication applications
– E-mail is a semi-structured
document
(Bird et al. 2006)
• Header for sender identification
– From: "Bill Stoddard"
<reddrum@attglobal.net>
• Subject
• Receiver
• Date & Time
[Figure: network diagrams with nodes Me, A, B, and C]
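A minimal sketch of how sender-to-receiver edges might be pulled from e-mail headers, using Python's standard `email` package. The sample messages are hypothetical; only the address in the first From header comes from the slide. This is not the method of Bird et al., just an illustration of why the semi-structured headers are convenient.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses, parseaddr

# Hypothetical messages; only the first From address appears on the slide.
raw_messages = [
    'From: "Bill Stoddard" <reddrum@attglobal.net>\n'
    'To: a@example.org, "B" <b@example.org>\n'
    'Subject: patch review\n'
    'Date: Mon, 02 Jan 2006 10:00:00 +0000\n\nbody\n',
    'From: a@example.org\n'
    'To: reddrum@attglobal.net\n'
    'Subject: Re: patch review\n'
    'Date: Mon, 02 Jan 2006 11:00:00 +0000\n\nbody\n',
]


def extract_edges(raw_list):
    """Count directed sender -> receiver edges from the From and To headers."""
    edges = Counter()
    for raw in raw_list:
        msg = message_from_string(raw)
        _, sender = parseaddr(msg["From"])
        for _, receiver in getaddresses(msg.get_all("To", [])):
            edges[(sender.lower(), receiver.lower())] += 1
    return edges


edges = extract_edges(raw_messages)
for (sender, receiver), count in sorted(edges.items()):
    print(sender, "->", receiver, count)
```

The Subject and Date headers could be read the same way (`msg["Subject"]`, `msg["Date"]`) to add topic or time information to each edge.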
Social Networks Extraction & Construction (4)
• Extracting Social Networks from Chat
– Internet Relay Chat (Chat Room)
(Muttons 2006)
– Instant Messenger
• MSN Messenger, ICQ, Yahoo! Messenger, ...
• MSN Messenger provides XML-based, structured communication logs
– Date & Time
– Sender
– Receiver
– Messages
• Network Analysis
– Communication Frequency & Closeness
Visualization
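A sketch of how communication frequency could be counted from an XML chat log. The log layout below (a `Message` element with `DateTime`, `From`, and `To` attributes) is a simplified, hypothetical schema used only for illustration; a real messenger log may be structured differently.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, simplified log; not the actual messenger schema.
log_xml = """
<Log>
  <Message DateTime="2006-05-01T10:00:00" From="me" To="A"><Text>hi</Text></Message>
  <Message DateTime="2006-05-01T10:00:30" From="A" To="me"><Text>hello</Text></Message>
  <Message DateTime="2006-05-01T10:05:00" From="me" To="B"><Text>lunch?</Text></Message>
</Log>
"""


def communication_frequency(xml_text):
    """Count messages exchanged between each unordered pair of users."""
    counts = Counter()
    root = ET.fromstring(xml_text.strip())
    for msg in root.iter("Message"):
        counts[frozenset((msg.get("From"), msg.get("To")))] += 1
    return counts


frequency = communication_frequency(log_xml)
for pair, count in frequency.items():
    print(sorted(pair), count)
```

The pair counts can serve as edge weights, so that frequent correspondents are treated as closer in the resulting network.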
SONAM Applications (1)
• Marketing & E-commerce
– Target Marketing
– Collaborative Recommendation
• Terrorist & Crime Detection
– 9/11 Network
– Ipswich's Jack the Ripper, England 2006
• Medical Network
– Finding Blood
– Organ
SONAM Applications (2)
• Learning
• Organizational Social Network Analysis
– Optimice
• Politics & Elections
• Academic Social Networking
– Family Tree
• Game AI
– On-line Game
– Game with Social Network (Game 2.0)
• Second Life
• And Much More...
Challenges in Mining Social
Network Data
Adapted and modified from a talk by Jon M. Kleinberg
Challenge 1: Splitting Network
Challenge 2: A Matter of Scale
• 436-node network of e-mail exchange over 3 months at
a corporate research lab (Adamic-Adar 2003)
• 43,553-node network of e-mail exchange over 2 years at
a large university (Kossinets-Watts 2006)
• 4.4-million-node network of declared friendships on
blogging community LiveJournal (Liben-Nowell et al.
2005, Backstrom et al. 2006)
• 240-million-node network of all IM communication over
one month on Microsoft Instant Messenger
(Leskovec-Horvitz 2007)
Challenge 2: A Matter of Scale
• Currently, massive network datasets give
you both more and less:
– More: can observe global phenomena that are
genuine, but literally invisible at smaller
scales.
– Less: Don't really know what any one node or
link means. Easy to measure things; hard to
pose nuanced questions.
– Goal: Find the point where the lines of
research converge.
Challenge 3: Geographic Data
• Liben-Nowell, Kumar, Novak, Raghavan, Tomkins (2005) studied LiveJournal, an on-line blogging community with friendship links
• Large-scale social network with geographical embedding:
– 500,000 members with U.S. Zip codes, 4 million links.
Challenge 4: Diffusion in Social Networks
• Diffusion, another fundamental social process:
Behaviors that cascade from node to node like an
epidemic.
– News, opinions, rumors, fads, urban legends, ...
– Viral marketing [Domingos-Richardson 2001]
– Public health (e.g. obesity [Christakis-Fowler 2007])
– Cascading failures in financial markets
– Localized collective action: riots, walkouts
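One common way to formalize a node-to-node cascade is the independent cascade model. The slide does not name a specific model, so this choice, the toy graph, and the function name are assumptions made for illustration.

```python
import random

# Hypothetical undirected network, stored as an adjacency list.
graph = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
    "F": set(),  # isolated node: can never be reached
}


def independent_cascade(g, seeds, p, rng):
    """Each newly active node gets one chance to activate each inactive
    neighbor, succeeding independently with probability p."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for w in sorted(g[u]):
                if w not in active and rng.random() < p:
                    active.add(w)
                    next_frontier.append(w)
        frontier = next_frontier
    return active


rng = random.Random(42)  # fixed seed so runs are repeatable
spread = independent_cascade(graph, ["A"], 0.5, rng)
print(sorted(spread))
```

With `p = 1.0` the cascade reaches every node connected to the seed, and with `p = 0.0` it never leaves the seed set; values in between give the epidemic-like behavior described above.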
Challenge 5: Protecting Privacy in Social Network Data
• Many large datasets based on communication (e-mail, IM, voice) where users have strong privacy expectations.
– Current safeguards based on anonymization: replace node names with random IDs.
• With more detailed data, anonymization has run into
trouble:
– Identifying on-line pseudonyms by textual analysis
– De-anonymizing Netflix ratings via time series
– Search engine query logs: identifying users from their queries.
• Does this make things safer?
– E.g. no text, time-stamps, or node attributes
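A minimal sketch of the safeguard described above, replacing node names with random IDs and keeping nothing but the edges. The edge list and helper names are hypothetical. Note that the structure itself, such as the degree sequence, survives the relabeling unchanged.

```python
import random
from collections import Counter

# Hypothetical communication edges between named users.
edges = [("alice", "bob"), ("bob", "carol"), ("carol", "alice"), ("carol", "dave")]


def anonymize(edge_list, rng):
    """Replace every node name with a random integer ID; structure is unchanged."""
    nodes = sorted({v for edge in edge_list for v in edge})
    ids = list(range(len(nodes)))
    rng.shuffle(ids)
    mapping = dict(zip(nodes, ids))
    return [(mapping[u], mapping[v]) for u, v in edge_list], mapping


def degree_sequence(edge_list):
    """Sorted list of node degrees, independent of node labels."""
    deg = Counter()
    for u, v in edge_list:
        deg[u] += 1
        deg[v] += 1
    return sorted(deg.values())


anonymized, mapping = anonymize(edges, random.Random(7))
print(anonymized)
print(degree_sequence(edges), degree_sequence(anonymized))
```

Because the released graph has the same shape as the original, anyone who can recognize a distinctive pattern of edges can still locate nodes in it.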
Challenge 6: Attacking an Anonymized Network
• What we learn from this:
• Attacker may have extra power if they are part of the system. In large e-mail/IM network, can easily add yourself to system.
• But “finding yourself” when there are 100 million nodes is going to be more subtle than when there are 34 nodes.
• Template for an active attack on an anonymized network
– Attacker can create (before the data is released) nodes (e.g. by registering an e-mail account) and edges incident to these nodes (by sending mail)
– Privacy breach: learning whether there is an edge between two existing nodes in the network.