Social Networks Analysis and Mining: From Sociology to Computer Science


Academic year: 2021


(1)

Who Do You Know Who:

Social Networks Analysis and Mining

Presenter: I-Hsien Ting, Ph.D. (丁一賢)

(2)

Introduction (1)

• What is a social network?

– A social network is a social structure that describes social relations (Wikipedia)

– The history of social networks is older than everybody who is here

• More than 100 years (Cooley 1909, Durkheim 1893)

• Focusing on small groups

– Information technology gives it a new life

– From sociology to computer science

(3)

Introduction (2)

• Topics about Social Networking

– Social Networking: Analyzing and Constructing Social Networks (Churchill & Halverson 2005)

• Social Networks Analysis

• Online Social Networking

• Social Network Extraction and Construction

• Applications of Social Networking

(4)

Social Networks Analysis (1)

• Social Networks Analysis

– A simple social network diagram (Scott 1991)

• Roles

• Relationships

– Directed & Undirected

– One way & Two way

– Positive & Negative

– Self-defined relationships

• Visualization

– Why visualization?

» Providing as much information as possible in a social network

» Human can easily and roughly

(5)

Social Networks Analysis (2)

• Relational Data

Incidence matrix: companies-by-directors

            A  B  C  D  E
Company 1   1  1  1  1  0
Company 2   1  1  1  0  1
Company 3   0  1  1  1  0
Company 4   0  0  1  0  1

Adjacency matrix: companies-by-companies

     1  2  3  4
1    -  3  3  1
2    3  -  2  2
3    3  2  -  1
4    1  2  1  -

Adjacency matrix: directors-by-directors

     A  B  C  D  E
A    -  2  2  1  1
B    2  -  3  2  1
C    2  3  -  2  2
D    1  2  2  -  0
E    1  1  2  0  -

(Figure: sociograms of the company network and the director network, with line values)
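The two adjacency matrices can be derived from the companies-by-directors incidence matrix by counting, for each pair of rows, the columns they share. The following is a minimal sketch of that step; the function names `transpose` and `co_affiliation` are illustrative and not part of the slides.

```python
# Incidence matrix from the slide: rows are companies 1-4, columns are directors A-E.
incidence = [
    [1, 1, 1, 1, 0],  # company 1
    [1, 1, 1, 0, 1],  # company 2
    [0, 1, 1, 1, 0],  # company 3
    [0, 0, 1, 0, 1],  # company 4
]


def transpose(matrix):
    """Swap rows and columns."""
    return [list(column) for column in zip(*matrix)]


def co_affiliation(matrix):
    """Count shared columns for every pair of rows (the diagonal is left as None)."""
    size = len(matrix)
    result = [[None] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i != j:
                result[i][j] = sum(a * b for a, b in zip(matrix[i], matrix[j]))
    return result


# Companies are linked by the directors they share ...
companies_by_companies = co_affiliation(incidence)
# ... and directors are linked by the companies they share.
directors_by_directors = co_affiliation(transpose(incidence))

for row in companies_by_companies:
    print(row)
```

Transposing the incidence matrix before counting is what switches the analysis from companies to directors; the same data therefore yields two different one-mode networks.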

(6)

Social Networks Analysis (3)

• From social data to relational data

     A  B  C  D  E  F  G
A    x  2  2  2  2  2  2
B    2  x  1  1  1  1  1
C    2  1  x  1  1  1  1
D    2  1  1  x  1  1  1
E    2  1  1  1  x  1  1
F    2  1  1  1  1  x  1
G    2  1  1  1  1  1  x

(7)

Social Networks Analysis (4)

• Measurement

– Centrality

• Degree

• Betweenness

• Closeness

– Clustering Coefficient

– Density

– Path Length

– Reachability

– Structural Hole

(8)

Social Networks Analysis (5)

• Density measurement: an example (Scott 1991)

Connected points    4     4     4     3     2     0
Inclusiveness       1.0   1.0   1.0   0.7   0.5   0
Sum of degrees      12    8     6     4     2     0
No. of lines        6     4     3     2     1     0
Density             1.0   0.7   0.5   0.3   0.1   0

$\text{Density} = \dfrac{l}{n(n-1)/2}$, where $l$ is the number of lines and $n$ is the number of points
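As a rough check of the density formula, the sketch below computes l / (n(n - 1) / 2) for the six four-point graphs in the table. The function name `density` is an illustrative choice; exact fractions are used so that the results can be compared with the table, which lists them to one decimal place.

```python
from fractions import Fraction


def density(num_points, num_lines):
    """Density of an undirected graph: l / (n(n - 1) / 2)."""
    possible_lines = Fraction(num_points * (num_points - 1), 2)
    if possible_lines == 0:
        return Fraction(0)  # a graph with fewer than two points has no possible lines
    return Fraction(num_lines) / possible_lines


# The six four-point graphs from the table, identified by their number of lines.
for lines in (6, 4, 3, 2, 1, 0):
    print(lines, float(density(4, lines)))
```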

(9)

Social Networks Analysis (6)

• Centrality
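Two of the centrality measures named earlier, degree and closeness, can be sketched in a few lines. The network below is hypothetical and used only for illustration, and the normalisations (dividing by n - 1) are one common convention, not necessarily the one used in the talk.

```python
from collections import deque

# Hypothetical undirected network (not taken from the slides).
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A", "E"},
    "E": {"D"},
}


def degree_centrality(graph, node):
    """Number of direct neighbours, normalised by the n - 1 possible neighbours."""
    return len(graph[node]) / (len(graph) - 1)


def shortest_path_lengths(graph, source):
    """Breadth-first search distances from source to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in distances:
                distances[neighbour] = distances[current] + 1
                queue.append(neighbour)
    return distances


def closeness_centrality(graph, node):
    """(n - 1) divided by the sum of distances to all other nodes (connected graph assumed)."""
    total = sum(shortest_path_lengths(graph, node).values())
    return (len(graph) - 1) / total if total else 0.0


for node in sorted(graph):
    print(node, degree_centrality(graph, node), closeness_centrality(graph, node))
```

In this toy network node A scores highest on both measures, while E, which hangs off the network through D, scores lowest.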

(10)

Social Networks Analysis (7)

• Measurement

– Clustering Coefficient

– Path Length, Trail, Walk

– Reachability (digraph)

– Structural Hole

– Reciprocity

– K-Clique

– Position

(11)

Social Networks Analysis (8)

• Measurement

– Clustering Coefficient

• Local
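The local clustering coefficient of a node is the share of pairs of its neighbours that are themselves connected. A minimal sketch follows, assuming an undirected network; the example graph is hypothetical and not from the slides.

```python
from itertools import combinations

# Hypothetical undirected network (not taken from the slides).
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}


def local_clustering(graph, node):
    """Fraction of neighbour pairs of `node` that are directly linked to each other."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pairs to compare
    linked_pairs = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
    return linked_pairs / (k * (k - 1) / 2)


for node in sorted(graph):
    print(node, local_clustering(graph, node))
```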

(12)

Social Networks Analysis (9)

• Measurement

– Structural Hole

(13)

Social Networks Analysis (10)

• Measurement

– Clique

• Complete subgraph

• Maximum clique

– {1,2,5}

• Maximal cliques

– {1,2,5}

– {2,3}, {3,4}, {4,5}, {4,6}

• K-clique

– Clique of size k
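The clique definitions above can be checked on a small graph. The sketch below assumes the slide's graph has exactly the edges implied by the listed maximal cliques (an inference, since the figure itself is not available) and enumerates maximal cliques with the basic Bron-Kerbosch recursion, a standard technique that is not named in the slides.

```python
from itertools import combinations

# Edges inferred from the maximal cliques listed on the slide (an assumption).
edges = [(1, 2), (1, 5), (2, 5), (2, 3), (3, 4), (4, 5), (4, 6)]

graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)


def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    found = []

    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            found.append(clique)  # nothing can extend this clique: it is maximal
            return
        for node in list(candidates):
            expand(clique | {node}, candidates & graph[node], excluded & graph[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(frozenset(), set(graph), set())
    return found


def cliques_of_size(graph, k):
    """All complete subgraphs with exactly k nodes (k-cliques in the slide's sense)."""
    return [
        set(nodes)
        for nodes in combinations(sorted(graph), k)
        if all(v in graph[u] for u, v in combinations(nodes, 2))
    ]


cliques = maximal_cliques(graph)
maximum_clique = max(cliques, key=len)
print(sorted(sorted(c) for c in cliques))
print(sorted(maximum_clique))
```

A maximal clique cannot be extended by any further node, whereas the maximum clique is the largest of them; on this graph the two notions coincide only for {1,2,5}.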

(14)

Social Networks Analysis (11)

• Sociologists focus only on small social networks

– 50~100 nodes in a social network

• The advent of Internet communications has greatly increased SNA's popularity

– Computers and information technologies have become essential tools for SNA (Churchill & Halverson 2005)

(15)

Can You Analyze and Construct this Social Network Diagram by Hand?

(16)

Cross-Field Research

ASONAM 2010 in Odense, Denmark

(The 2010 International Conference on Social Network Analysis and Mining)

Sociology; Ministry of Defense, Netherlands; Terrorist; Terrorist Operation Unit, European Council; Data Mining & Privacy; WWW and Mining; Neural Network and NLP; Data Mining

(17)

Social Networking Systems:

Off-line social networking software(1)

• Most off-line social networking software focuses on

– Visualization

– Social Networks Analysis

• UCINET

– The best known off-line social networking software (Borgatti et al. 2002)

– Visualization

(18)

Social Networking Systems:

Off-line social networking software(2)

• UCINET

(19)

Social Networking Systems:

Off-line social networking software(3)

(20)

Social Networking Systems:

Off-line social networking software(4)

• NetMiner II

– Strong Visualization Function

(Cyram 2003)

(21)

Social Networking Systems:

Off-line social networking software(5)

(22)

Social Networking Systems:

Off-line social networking software(6)

• NetMiner II

(23)

Social Networking Systems:

• Netdraw

– A simple social network visualization tool

(24)

Social Networking Systems:

Off-line social networking software(2)

• A summary table of off-line social networking software

(Huisman and van Duijn 2003)

(25)

Social Networking Systems:

On-line social networking websites(1)

• On-line social networking websites evolved from on-line communities

• Sharing is the main objective

• Two main categories of on-line social networking websites

– Portal-based

• Integrating multiple functions

– Function-based

• For a specific function

– Album sharing (Flickr)

– Video sharing (YouTube)

(26)

Social Networking Systems:

On-line Social Networking Website (2)

• Classmates.com

– The first on-line social networking website

– Created in 1995

• MySpace

– The world's busiest website (more accesses than Google)

• Facebook

– Originally designed to mirror a college community

• Orkut

– The social networking website owned by Google

• Wallop

– The next-generation on-line social networking website, owned by Microsoft

(27)

Social Networking Systems:

On-line Social Networking Websites (3)

• A Comparison of On-line Social Networking Websites

Features compared: Friend Search, Blog, Album, Music & Video Sharing, Visualization, E-commerce

Site         Features checked (of 6)
Classmates   2
Myspace      5
Facebook     4
Orkut        4
Wallop       6

(28)

Social Networking

• Social Network = Computer Network

– Next Target: Mobile Phone

• Facebook is collaborating with Cingular Wireless, Sprint Nextel & Verizon Wireless

• Killer Applications are Needed

– E-commerce: Facebook

– Job Finder: LinkedIn

• On-line Social Networking Websites

– Using people to find content

(29)

Social Networks Extraction & Construction (1)

• Extracting & Constructing Social Networks from Contents

– Using content to find people

– Contents

• Web

• E-mail

• Event-logs

• On-line Chat

• Papers & Theses

(30)

Social Networks Extraction & Construction (2)

• Extracting Social Networks from Web

– Extracting from web contents (Personal Homepage)

– Semantic Analysis (Ontology) & NLP (Natural Language Processing) (Jin et al. 2007)

– Contact information is the focus (Culotta et al. 2005)

• E-mail address

• Phone Number

• Names

– Network Analysis

• Appearance

• Connectivity

• URL Similarity

(31)

Social Networks Extraction & Construction (3)

• Extracting Social Networks from E-mail

– One of the most widely used on-line communication applications

– E-mail is a semi-structured document (Bird et al. 2006)

• Header for sender identification

– From: 'Bill Stoddard' <reddrum@attglobal.net>

• Subject

• Receiver

• Date & Time

(Figure: network diagrams with nodes Me, A, B and C)
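Because e-mail headers are semi-structured, sender and receiver fields can be turned into directed links with standard parsing tools. The sketch below uses Python's standard `email` package; the messages are hypothetical (only the sender header format comes from the slide), and counting one link per sender-recipient pair is an assumed design choice rather than the method of Bird et al.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses, parseaddr

# Hypothetical messages; only the first From header follows the slide's example.
raw_messages = [
    "From: 'Bill Stoddard' <reddrum@attglobal.net>\n"
    "To: a@example.org, b@example.org\n"
    "Subject: patch\n\nbody\n",
    "From: a@example.org\n"
    "To: reddrum@attglobal.net\n"
    "Subject: Re: patch\n\nbody\n",
    "From: 'Bill Stoddard' <reddrum@attglobal.net>\n"
    "To: a@example.org\n"
    "Subject: Re: patch\n\nbody\n",
]


def extract_edges(raw_messages):
    """Count directed sender -> recipient links found in the message headers."""
    edges = Counter()
    for raw in raw_messages:
        message = message_from_string(raw)
        _, sender = parseaddr(message.get("From", ""))
        recipients = getaddresses(message.get_all("To", []) + message.get_all("Cc", []))
        for _, recipient in recipients:
            if sender and recipient:
                edges[(sender.lower(), recipient.lower())] += 1
    return edges


edges = extract_edges(raw_messages)
for (sender, recipient), count in sorted(edges.items()):
    print(sender, "->", recipient, count)
```

Using the address rather than the display name as the node identifier avoids treating differently formatted names as different people, although one person with several addresses would still appear as several nodes.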

(32)

Social Networks Extraction & Construction (4)

(33)

Social Networks Extraction & Construction (4)

• Extracting Social Networks from Chat

– Internet Relay Chat (Chat Room) (Muttons 2006)

– Instant Messenger

• MSN Messenger, ICQ, Yahoo! Messenger, ...

• MSN Messenger provides XML-based, structured communication logs

– Date & Time

– Sender

– Receiver

– Messages

• Network Analysis

– Communication Frequency & Closeness
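Communication frequency can be read off a structured chat log by counting messages per pair of participants. The sketch below assumes a simplified, hypothetical XML layout with `DateTime`, `From` and `To` attributes; the actual MSN Messenger log schema is not shown in the slides and may differ.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, simplified log layout (not the real MSN Messenger schema).
log_xml = """
<Log>
  <Message DateTime="2006-01-02T10:00:00" From="alice" To="bob">hi</Message>
  <Message DateTime="2006-01-02T10:00:30" From="bob" To="alice">hello</Message>
  <Message DateTime="2006-01-02T10:05:00" From="alice" To="carol">lunch?</Message>
  <Message DateTime="2006-01-03T09:00:00" From="alice" To="bob">morning</Message>
</Log>
"""


def communication_frequency(xml_text):
    """Count messages exchanged between each unordered pair of participants."""
    root = ET.fromstring(xml_text.strip())
    counts = Counter()
    for message in root.iter("Message"):
        pair = frozenset((message.get("From"), message.get("To")))
        counts[pair] += 1
    return counts


frequency = communication_frequency(log_xml)
for pair, count in frequency.items():
    print(sorted(pair), count)
```

The counts could then serve as edge weights, with more frequent exchanges treated as closer relationships.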

(34)

Visualization

(35)

SONAM Applications (1)

• Marketing & E-commerce

– Target Marketing

– Collaborative Recommendation

• Terrorist & Crime Detection

– 9/11 Network

– Ipswich's Jack the Ripper, England 2006

• Medical Network

– Finding Blood

– Organ

(36)

SONAM Applications (2)

• Learning

• Organizational Social Network Analysis

– Optimice

• Politics & Elections

• Academic Social Networking

– Family Tree

• Game AI

– On-line Game

– Game with Social Network (Game 2.0)

• Second Life

• And Much More………

(37)

Challenges in Mining Social Network Data

Adapted and modified from a talk by Jon M. Kleinberg

(38)

Challenge 1: Splitting Network

(39)

Challenge 2: A Matter of Scale

• 436-node network of e-mail exchange over 3 months at a corporate research lab (Adamic-Adar 2003)

• 43,553-node network of e-mail exchange over 2 years at a large university (Kossinets-Watts 2006)

• 4.4-million-node network of declared friendships on blogging community LiveJournal (Liben-Nowell et al. 2005, Backstrom et al. 2006)

• 240-million-node network of all IM communication over one month on Microsoft Instant Messenger (Leskovec-Horvitz 2007)

(40)

Challenge 2: A Matter of Scale

• Currently, massive network datasets give you both more and less:

– More: can observe global phenomena that are genuine, but literally invisible at smaller scales.

– Less: Don't really know what any one node or link means. Easy to measure things; hard to pose nuanced questions.

– Goal: Find the point where the lines of research converge.

(41)

Challenge 3: Geographic Data

• Liben-Nowell, Kumar, Novak, Raghavan, Tomkins (2005) studied

• LiveJournal, an on-line blogging community with friendship links

• Large-scale social network with geographical embedding:

– 500,000 members with U.S. Zip codes, 4 million links.

(42)

Challenge 4: Diffusion in Social Networks

• Diffusion, another fundamental social process: behaviors that cascade from node to node like an epidemic.

– News, opinions, rumors, fads, urban legends, ...

– Viral marketing [Domingos-Richardson 2001]

– Public health (e.g. obesity [Christakis-Fowler 2007])

– Cascading failures in financial markets

– Localized collective action: riots, walkouts
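One simple way to picture a cascade is a threshold model: a node adopts a behavior once a large enough share of its neighbours has adopted it. The sketch below is an illustrative simplification chosen for this example, not a model taken from the talk; the network, the seed set and the threshold values are all hypothetical.

```python
# Hypothetical undirected network (not taken from the slides).
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D", "F"},
    "F": {"E"},
}


def threshold_cascade(graph, seeds, threshold=0.5):
    """Spread adoption until no further node has enough adopting neighbours."""
    adopted = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, neighbours in graph.items():
            if node in adopted or not neighbours:
                continue
            share = len(neighbours & adopted) / len(neighbours)
            if share >= threshold:
                adopted.add(node)
                changed = True
    return adopted


print(sorted(threshold_cascade(graph, {"A"}, threshold=0.5)))
print(sorted(threshold_cascade(graph, {"A"}, threshold=0.6)))
```

With these toy values, a small change in the threshold decides whether a single seed reaches the whole network or stays isolated, which is the kind of sensitivity that makes diffusion hard to predict.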

(43)

Challenge 5: Protecting Privacy in Social Network Data

• Many large datasets based on communication (e-mail, IM, voice) where users have strong privacy expectations.

– Current safeguards based on anonymization: replace node names with random IDs.

• With more detailed data, anonymization has run into trouble:

– Identifying on-line pseudonyms by textual analysis

– De-anonymizing Netflix ratings via time series

– Search engine query logs: identifying users from their queries.

• Does this make things safer?

– E.g. no text, time-stamps, or node attributes
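The anonymization safeguard described above, replacing node names with random IDs, can be sketched as follows. The edge list and the function name `anonymize` are hypothetical; the point of the example is that the structure of the network (for instance, each node's degree) is left untouched.

```python
import random
from collections import Counter

# Hypothetical communication network given as a list of (sender, receiver) pairs.
edges = [("alice", "bob"), ("alice", "carol"), ("bob", "carol"), ("carol", "dave")]


def anonymize(edges, seed=None):
    """Replace every node name with a randomly assigned integer ID."""
    rng = random.Random(seed)
    names = sorted({name for edge in edges for name in edge})
    ids = list(range(len(names)))
    rng.shuffle(ids)
    mapping = dict(zip(names, ids))
    return [(mapping[u], mapping[v]) for u, v in edges], mapping


def degree_sequence(edges):
    """Sorted list of node degrees, ignoring node labels."""
    degrees = Counter()
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return sorted(degrees.values())


anonymous_edges, mapping = anonymize(edges, seed=1)
print(anonymous_edges)
print(degree_sequence(edges) == degree_sequence(anonymous_edges))
```

Because only the labels change, anything an observer knows about the shape of the network around a person remains visible in the released data.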

(44)

Challenge 6: Attacking an Anonymized Network

• What we learn from this:

• An attacker may have extra power if they are part of the system. In a large e-mail/IM network, it is easy to add yourself to the system.

• But “finding yourself” when there are 100 million nodes is going to be more subtle than when there are 34 nodes.

• Template for an active attack on an anonymized network

– The attacker can create (before the data is released) nodes (e.g. by registering an e-mail account) and edges incident to these nodes (by sending mail)

– Privacy breach: learning whether there is an edge between two existing nodes in the network.

(45)
(46)

Challenge 7: Network Abstraction

• It is really difficult, just like sampling

(47)

Commercial Break

(48)

Call for Papers

ASONAM 2011

– (the 3rd International Conference on Advances in Social Network Analysis and Mining)

– July 25-27, 2011, Kaohsiung, Taiwan

(49)
