
On the Social Network Analysis and Mining


Academic year: 2021


(1)

On the Social Network Analysis and Mining: A Brief Introduction

Leon S.L. Wang (王學亮)

Department of Information Management National University of Kaohsiung

(2)

Outline

• What is a Social Network?

• Social Network Analysis

• Social Network Extraction & Construction

• Social Network Applications

• Challenges in Mining Social Network Data

(3)

Introduction (1)

• What is a Social Network?

– A social network is a social structure that describes social relations (Wikipedia)

– The history of social networks is older than everybody who is here

• More than 100 years (Cooley 1909, Durkheim 1893)

• Focusing on small groups

– Information technology has given it a new life

– From Sociology to Computer Science

(4)

Introduction (2)

• Topics about Social Networking

– Social Networking: Analyzing and Constructing Social Networks (Churchill & Halverson 2005)

• Social Network Extraction and Construction

• Social Network Analysis

• Online Social Networking

(5)

Social Network Analysis (1)

• Social Network Analysis

– A simple social network diagram (Scott 1991)

• Roles

• Relationships

– One way

– Two way

– Positive & Negative

– Self-defined relationships

• Visualization

– Why visualization?

» Provides as much information as possible about a social network

» Humans can easily and roughly

(6)

Social Network Analysis (2)

• Relational Data

Companies-by-directors matrix:

            Directors
Companies   A  B  C  D  E
1           1  1  1  1  0
2           1  1  1  0  1
3           0  1  1  1  0
4           0  0  1  0  1

Adjacency matrix: companies-by-companies

     1  2  3  4
1    -  3  3  1
2    3  -  2  2
3    3  2  -  1
4    1  2  1  -

Adjacency matrix: directors-by-directors (first two rows)

     A  B  C  D  E
A    -  2  2  1  1
B    2  -  3  2  1

[Figure: sociograms of the companies network and the directors network, with lines weighted by the matrix entries]
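The two adjacency matrices can be derived from the companies-by-directors matrix by counting shared memberships. The sketch below is one way to do that in plain Python; the function names `co_membership` and `transpose` are illustrative and do not come from the slides.

```python
# Companies-by-directors matrix from the slide (rows: companies 1-4,
# columns: directors A-E). A 1 means the director sits on that board.
incidence = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 1],
]


def co_membership(rows):
    """Count shared columns for every pair of rows (diagonal left as None)."""
    n = len(rows)
    return [
        [None if i == j else sum(a * b for a, b in zip(rows[i], rows[j]))
         for j in range(n)]
        for i in range(n)
    ]


def transpose(matrix):
    """Swap rows and columns so directors become the rows."""
    return [list(column) for column in zip(*matrix)]


companies = co_membership(incidence)             # companies-by-companies
directors = co_membership(transpose(incidence))  # directors-by-directors

for row in companies:
    print(row)
```

Each off-diagonal entry is the number of directors two companies share (or the number of boards two directors share), which is how the weights in the adjacency matrices arise.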

(7)

Social Network Analysis (3)-1

• Measurement

– Size

• Density

• Geodesic distance

• Diameter

• Closeness

– Centrality

• Degree

• Betweenness

• Closeness


(8)

Social Network Analysis (3)-2

• Density Measurement: An Example

Connected points   4    4    4    3    2    0
Inclusiveness      1.0  1.0  1.0  0.7  0.5  0
Sum of degrees     12   8    6    4    2    0
No. of lines       6    4    3    2    1    0
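The rows of this example can be recomputed from the number of connected points and the number of lines. The sketch below assumes each example graph has 4 points in total (consistent with the inclusiveness values) and uses the usual undirected density, lines divided by n(n-1)/2; the density row itself does not appear in the table, so treat it as an illustration rather than the slide's own figures.

```python
from fractions import Fraction

N = 4  # assumed total number of points in each example graph
connected_points = [4, 4, 4, 3, 2, 0]
lines = [6, 4, 3, 2, 1, 0]

max_lines = N * (N - 1) // 2  # 6 possible undirected lines among 4 points

# Inclusiveness: share of points that are connected to at least one other point.
inclusiveness = [Fraction(c, N) for c in connected_points]
# Every undirected line contributes to the degree of two points.
sum_of_degrees = [2 * count for count in lines]
# Density: lines present relative to the maximum possible number of lines.
density = [Fraction(count, max_lines) for count in lines]

for row in zip(connected_points, inclusiveness, sum_of_degrees, lines, density):
    print(*row)
```

The exact inclusiveness of the fourth graph is 3/4; the table reports values to one decimal place.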

(9)

Social Network Analysis (3)-3

• Centrality

– Degree, Betweenness, Closeness, Eigenvalue
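As a rough illustration of these measures, the sketch below computes degree centrality, geodesic distances, diameter, and closeness centrality with a breadth-first search. The four-node graph is hypothetical and not taken from the slides, closeness is taken as (n - 1) divided by the sum of distances (one common definition, valid for connected graphs), and betweenness and eigenvector centrality are left out to keep the example short.

```python
from collections import deque

# Hypothetical undirected network, not taken from the slides.
graph = {
    "a": {"b"},
    "b": {"a", "c", "d"},
    "c": {"b", "d"},
    "d": {"b", "c"},
}


def geodesic_distances(graph, source):
    """Breadth-first search: shortest path length from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


all_distances = {v: geodesic_distances(graph, v) for v in graph}

# Degree centrality: number of direct ties.
degree = {v: len(graph[v]) for v in graph}
# Diameter: the longest geodesic distance in the network.
diameter = max(max(d.values()) for d in all_distances.values())
# Closeness: (n - 1) / sum of distances; assumes the graph is connected.
closeness = {v: (len(graph) - 1) / sum(all_distances[v].values()) for v in graph}

print(degree, diameter, closeness)
```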

(10)

Social Network Analysis (3)-4

• Measurement

– Clustering Coefficient

– Path Length, Trail, Walk

– Reachability (digraph)

– Structural Hole

– Reciprocity

– K-Clique

– Position

(11)

Social Network Analysis (3)-5

• Measurement

– Clustering Coefficient

• Local

• Global
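A minimal sketch of the two variants follows, on a hypothetical four-node graph. It assumes the common definitions: the local coefficient of a node is the share of pairs of its neighbours that are themselves linked, and the global coefficient (transitivity) is the share of connected triplets that are closed. The slides do not state which formulas were intended.

```python
from itertools import combinations

# Hypothetical undirected network, not taken from the slides.
graph = {
    "a": {"b"},
    "b": {"a", "c", "d"},
    "c": {"b", "d"},
    "d": {"b", "c"},
}


def local_clustering(graph, node):
    """Share of neighbour pairs of `node` that are linked to each other."""
    neighbours = graph[node]
    if len(neighbours) < 2:
        return 0.0  # convention: undefined cases are reported as 0
    pairs = list(combinations(sorted(neighbours), 2))
    linked = sum(1 for u, v in pairs if v in graph[u])
    return linked / len(pairs)


def global_clustering(graph):
    """Closed triplets divided by all connected triplets (transitivity)."""
    closed = 0
    triplets = 0
    for node in graph:
        for u, v in combinations(sorted(graph[node]), 2):
            triplets += 1
            if v in graph[u]:
                closed += 1
    return closed / triplets if triplets else 0.0


print({v: local_clustering(graph, v) for v in graph})
print(global_clustering(graph))
```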


(12)

Social Network Analysis (3)-6

• Measurement

(13)

Social Network Analysis (3)-7

• Measurement

– Reciprocity

• The number of ties that are involved in reciprocal relations relative to the total number of actual ties

• |(AB, BA)| / |(AB, BA, AC)| = 2/3
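The 2/3 figure can be reproduced directly from the three directed ties in the example. The function name `reciprocity` is illustrative.

```python
# Directed ties from the slide's example: A->B, B->A, A->C.
ties = {("A", "B"), ("B", "A"), ("A", "C")}


def reciprocity(ties):
    """Ties whose reverse tie also exists, relative to all ties."""
    if not ties:
        return 0.0
    reciprocated = sum(1 for (u, v) in ties if (v, u) in ties)
    return reciprocated / len(ties)


print(reciprocity(ties))
```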

(14)

Social Network Analysis (3)-8

• Measurement

– Clique

• Complete subgraph

• Maximum clique

– {1,2,5}

• Maximal cliques

– {1,2,5}, {2,3}, {3,4}, {4,5}, {4,6}

• K-clique

– Clique of size k
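The sketch below lists maximal cliques with the basic Bron-Kerbosch recursion (no pivoting). The edge list is an assumption: the slide's diagram is not available, so the edges were reconstructed from the cliques listed above.

```python
# Edge list assumed from the maximal cliques listed on the slide.
edges = [(1, 2), (1, 5), (2, 5), (2, 3), (3, 4), (4, 5), (4, 6)]

graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)


def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    found = []

    def expand(clique, candidates, excluded):
        # No node can extend the clique, and none was skipped: it is maximal.
        if not candidates and not excluded:
            found.append(frozenset(clique))
            return
        for node in list(candidates):
            expand(clique | {node},
                   candidates & graph[node],
                   excluded & graph[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(graph), set())
    return found


cliques = maximal_cliques(graph)
maximum = max(cliques, key=len)  # a maximum clique is a largest maximal clique
print(sorted(sorted(c) for c in cliques), sorted(maximum))
```

A k-clique in the slide's sense is then any clique with exactly k members, for example every 2-member subset of {1,2,5} is a 2-clique.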

(15)

Social Network Analysis (5)

• Sociologists traditionally focus only on small social networks

– 50~100 nodes in a social network

• The advent of Internet communications has greatly increased SNA's popularity

– Computer & Information Technologies have become essential tools for SNA (Churchill & Halverson 2005)

(16)

Can You Analyze and Construct this Social Network Diagram by Hand?

(17)

Social Networking

• Social Network = Computer Network

– Next Target: Mobile Phone

• Facebook is collaborating with Cingular Wireless, Sprint Nextel & Verizon Wireless

• Killer Applications are Needed

– E-commerce: Wallop

– Job Finder: LinkedIn

• On-line Social Networking Websites

– Using people to find content

(18)

Social Network Extraction & Construction

• Extracting & Constructing Social Networks from Contents

– Using content to find people

– Contents

• Web

• E-mail

• Event-logs

• On-line Chat

• Papers & Theses

(19)

Social Network Extraction & Construction (2)

• Extracting Social Networks from Web

– Extracting from web contents (Personal Homepage)

– Semantic Analysis (Ontology) & NLP (Natural Language Processing) (Jin et al. 2007)

– Contact information is the focus (Culotta et al. 2005)

• E-mail address

• Phone number

• Names

– Network Analysis

• Appearance

• Connectivity

• URL similarity

(20)

Social Network Extraction & Construction (3)

• Extracting Social Networks from E-mail

– One of the most used on-line communication applications

– E-mail is a semi-structured document (Bird et al. 2006)

• Header for sender identification

– Form: "Bill Stoddard" <reddrum@attglobal.net>

• Subject

• Receiver

• Date & Time

[Figure: network diagrams linking Me, A, B, and C]
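Because the header fields are structured, sender-to-receiver ties can be counted with Python's standard `email` package. The sketch below is only an illustration: the two messages are hypothetical (only the sender line follows the form shown above), and `email_edges` is an assumed helper name, not the method of Bird et al.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses, parseaddr

# Hypothetical messages; only the first From line follows the slide's form.
raw_messages = [
    'From: "Bill Stoddard" <reddrum@attglobal.net>\n'
    'To: a@example.org, b@example.org\n'
    'Subject: build status\n'
    'Date: Mon, 02 Jan 2006 10:00:00 +0000\n'
    '\n'
    'body\n',
    'From: a@example.org\n'
    'To: reddrum@attglobal.net\n'
    'Subject: Re: build status\n'
    'Date: Mon, 02 Jan 2006 11:00:00 +0000\n'
    '\n'
    'body\n',
]


def email_edges(raw_messages):
    """Count directed sender -> receiver ties from the From and To headers."""
    edges = Counter()
    for raw in raw_messages:
        message = message_from_string(raw)
        _, sender = parseaddr(message.get("From", ""))
        for _, receiver in getaddresses(message.get_all("To", [])):
            if sender and receiver:
                edges[(sender.lower(), receiver.lower())] += 1
    return edges


edges = email_edges(raw_messages)
for (sender, receiver), count in sorted(edges.items()):
    print(sender, "->", receiver, count)
```

Keying ties on the address rather than the display name is a design choice made here for simplicity; real mail archives often need alias resolution, since one person may use several addresses.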

(21)

Social Network Extraction & Construction (4)

• Extracting Social Networks from Chat

– Internet Relay Chat (Chat Room)

(Muttons 2006)

– Instant Messenger

• MSN Messenger, ICQ, Yahoo! Messenger, ...

• MSN Messenger provides XML-based, structured communication logs

– Date & Time

– Sender

– Receiver

– Messages

• Network Analysis

– Communication Frequency & Closeness

– Contact Sharing (who may also be your friends)

– Automatic Grouping and Blocking

(22)

Social Networking Applications (1)

• Marketing & E-commerce

– Target Marketing

– Collaborative Recommendation

• Terrorist & Crime Detection

– Ipswich's Jack the Ripper, England 2006

• Medical Network

– Finding Blood

– Organ

• Knowledge Management

(23)

Social Networking Applications (2)

• Learning

• Organizational Social Network Analysis

– Optimice

• Politics & Elections

• Academic Social Networking

– Family Tree

• Game AI

– On-line Game

– Game with Social Network (Game 2.0)

• Second Life

• And Much More ...

(24)

Some Link Mining Tasks (1)

• Object-Related Tasks

– Link-based Object Ranking

• PageRank, HITS, Centrality, Tagommender

– Link-based Object Classification

• News items classification, Folksonomy

– Object Clustering (Group Detection)

• Finding positions, a set of people with similar links

– Object identification

• Same name, different people

• Link-Related Tasks
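As an illustration of link-based object ranking, the sketch below runs PageRank by power iteration on a small link structure. The graph is hypothetical, and the damping factor of 0.85 and the fixed iteration count are conventional assumptions rather than values from the slides.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power iteration over a dict mapping each node to the nodes it links to."""
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not links[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n
            for node in nodes
        }
        # Each node passes its rank on in equal shares along its out-links.
        for node in nodes:
            for target in links[node]:
                new_rank[target] += damping * rank[node] / len(links[node])
        rank = new_rank
    return rank


# Hypothetical link structure: every other page links to "c".
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for node in sorted(ranks, key=ranks.get, reverse=True):
    print(node, round(ranks[node], 4))
```

Every link target must also appear as a key in `links`; a page with no out-links is represented by an empty list.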

(25)

Some Link Mining Tasks (2)

• Graph-Related Tasks

– Subgraph Discovery

• K-Cliques, K-Clans, K-Plexes

– Graph Classification

– Generative Models for Graphs

(26)

Challenges in Mining Social Network Data

Adapted and modified from a talk by Jon M. Kleinberg

(27)

Challenge 1: Splitting Network

(28)

Challenge 2: A Matter of Scale

• 436-node network of e-mail exchange over 3 months at a corporate research lab (Adamic-Adar 2003)

• 43,553-node network of e-mail exchange over 2 years at a large university (Kossinets-Watts 2006)

• 4.4-million-node network of declared friendships on blogging community LiveJournal (Liben-Nowell et al. 2005, Backstrom et al. 2006)

• 240-million-node network of all IM communication over one month on Microsoft Instant Messenger (Leskovec-Horvitz '07)

(29)

Challenge 2: A Matter of Scale

• Currently, massive network datasets give you both more and less:

– More: can observe global phenomena that are genuine, but literally invisible at smaller scales.

– Less: Don't really know what any one node or link means. Easy to measure things; hard to pose nuanced questions.

– Goal: Find the point where the lines of research converge.

(30)

Challenge 3: Geographic Data

• Liben-Nowell, Kumar, Novak, Raghavan, Tomkins (2005) studied

• LiveJournal, an on-line blogging community with friendship links

• Large-scale social network with geographical embedding:

– 500,000 members with U.S. Zip codes, 4 million links.

(31)

Challenge 4: Diffusion in Social Networks

• Diffusion, another fundamental social process: behaviors that cascade from node to node like an epidemic.

– News, opinions, rumors, fads, urban legends, ...

– Viral marketing [Domingos-Richardson 2001]

– Public health (e.g. obesity [Christakis-Fowler 2007])

– Cascading failures in financial markets
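One simple way to picture a cascade is a threshold model: a node adopts a behavior once a large enough share of its neighbours has adopted it. The slides do not name a specific diffusion model, so the model, the 0.5 threshold, and the five-node graph below are all assumptions chosen for illustration.

```python
def threshold_cascade(graph, seeds, threshold=0.5):
    """Return the set of adopters once no further node meets the threshold."""
    adopted = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, neighbours in graph.items():
            if node in adopted or not neighbours:
                continue
            # Share of this node's neighbours that have already adopted.
            share = len(neighbours & adopted) / len(neighbours)
            if share >= threshold:
                adopted.add(node)
                changed = True
    return adopted


# Hypothetical undirected network, not taken from the slides.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}

print(sorted(threshold_cascade(graph, {"a"})))
print(sorted(threshold_cascade(graph, {"a"}, threshold=0.6)))
```

With the default threshold the single seed reaches the whole network, while a slightly higher threshold stops the cascade at the seed, which shows how sensitive the outcome is to the adoption rule.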

(32)

Challenge 5: Protecting Privacy in Social Network Data

• Many large datasets are based on communication (e-mail, IM, voice), where users have strong privacy expectations.

– Current safeguards are based on anonymization: replace node names with random IDs.

• With more detailed data, anonymization has run into trouble:

– Identifying on-line pseudonyms by textual analysis

– De-anonymizing Netflix ratings via time series

– Search engine query logs: identifying users from their queries.

• Does releasing only the network structure make things safer?

– E.g. no text, time-stamps, or node attributes
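The anonymization safeguard described above, replacing node names with random IDs, can be sketched in a few lines. The edge list and the helper name `anonymize` are hypothetical.

```python
import random


def anonymize(edges, seed=None):
    """Replace node names with random integer IDs; returns (new_edges, mapping)."""
    rng = random.Random(seed)
    names = sorted({name for edge in edges for name in edge})
    ids = list(range(len(names)))
    rng.shuffle(ids)
    mapping = dict(zip(names, ids))  # kept private by the data holder
    return [(mapping[u], mapping[v]) for u, v in edges], mapping


# Hypothetical communication ties.
edges = [("alice", "bob"), ("bob", "carol"), ("alice", "carol"), ("carol", "dave")]
anonymous_edges, mapping = anonymize(edges, seed=0)
print(anonymous_edges)
```

The names are gone, but the structure is untouched: each node keeps its degree and its neighbourhood pattern, which is the information a structural attack works from.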

(33)

Challenge 6: Attacking an Anonymized Network

• What we learn from this:

• An attacker may have extra power if they are part of the system. In a large e-mail/IM network, it is easy to add yourself to the system.

• But “finding yourself” when there are 100 million nodes is going to be more subtle than when there are 34 nodes.

• Template for an active attack on an anonymized network

– Attacker can create (before the data is released) nodes (e.g. by registering an e-mail account) and edges incident to these nodes (by sending mail)

– Privacy breach: learning whether there is an edge between two existing nodes in the network.

(34)
(35)

Challenge 7: Network Abstraction

• It is really difficult, just like sampling

(36)
