Mining Social Networks


Academic year: 2021


(1)

Mining Social Networks

Presenter: I-Hsien Ting (丁一賢)

Department of Information Management, National University of Kaohsiung (國立高雄大學資訊管理學系)

(2)

Outline

• Analyzing Multi-Source Social Data for Extracting and Mining Social Networks

• Mining Organizational Networks for Layoff Prediction Model Construction

• Challenges in Mining Social Network Data

• Call for Papers (CFP)

(3)

Analyzing Multi-Source Social Data for Extracting and Mining Social Networks

Presented at the 2009 IEEE SocialCom International Conference, 29 August 2009, Vancouver, Canada

(4)

Introduction (1/2)

• Communication and social activities on the Internet

– E-mail, instant messenger (MSN), blogs, etc.

• Large amounts of communication data

• Valuable data to understand the social

(5)

Introduction (2/2)

• How to collect, pre-process, organize and

for social network extraction?

• In this paper

– A system architecture is proposed

– Details on how to process e-mail and instant messenger data

– Social Network Extraction

• How to measure the relationship?

(6)

Literature Review (1/2)

• Social Network Analysis

– To understand the relationships between “actors”

• Each actor is represented as a node, and each pair of nodes can be connected by a line to show their relationship

– Three important elements

• actors, ties, and relationships

(7)

Literature Review (2/2)

• Social Network Extraction

– Bird et al. propose a method to extract social networks from e-mail communications

– Agrawal et al. used web mining techniques to understand the behavior of newsgroup users

– Jin et al. and Matsuo et al. developed systems to extract social networks from the web

– Furukawa et al. tried to identify social networks in blogspace

• Most of the research discussed above focuses on a single data source for social

(8)

The System Architecture

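[Figure: system architecture. Components shown: data sources (Instant Messenger, E-Mail, Blog), Data Extraction Engine, Database, Ontology Base, SN Construction Engine, SNA Engine, and User Input (Task). The construction is labelled "Dynamic and Task-oriented"; the diagram numbers its steps (1) to (9).]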

(9)

Data Collection System (1/3)

Return-path: <eri@xx.xx.xx.xx>
Envelope-to: RSs@xx.xxxx.xx.xx
Received: from funnelweb.cs.york.ac.uk ([144.32.161.232]
Message-ID: <47552CF4.70806@xx.xxx.xx.xx>
Date: Tue, 04 Dec 2007 10:33:24 +0000
From: E Rid <eri@xx.xxx.xx.xx>
Reply-To: eri@xx.xx.xx.xx
User-Agent: Thunderbird 2.0.0.9 (Windows/20071031)
MIME-Version: 1.0
To: RSs@xx.xxx.xx.xx
Subject: java versus C benchmarks
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Status: RO

A Sample E-mail Format

The system is web-based and lets users upload e-mail files (*.eml or text format), MSN history data (*.xml format), and client-side logging data (*.log or text format).
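As a rough illustration of how an uploaded e-mail file might be read, the following sketch parses header fields with Python's standard `email` package. It is not the system's actual implementation: the sample message, the `example.org` addresses, and the function name `extract_email_fields` are assumptions made for the example.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# Hypothetical message modelled on the anonymized sample headers;
# the example.org addresses are placeholders, not real data.
RAW_EMAIL = """\
Return-path: <eri@example.org>
Message-ID: <47552CF4.70806@example.org>
Date: Tue, 04 Dec 2007 10:33:24 +0000
From: E Rid <eri@example.org>
Reply-To: eri@example.org
To: RSs@example.org
Subject: java versus C benchmarks
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Body text goes here.
"""


def extract_email_fields(raw):
    """Return a dict of header fields useful for building sender/receiver ties."""
    msg = message_from_string(raw)
    sender_name, sender_addr = parseaddr(msg["From"])
    _, to_addr = parseaddr(msg["To"])
    return {
        "from": sender_addr,
        "from_name": sender_name,
        "to": to_addr,
        "date": parsedate_to_datetime(msg["Date"]),
        "subject": msg["Subject"],
        "msg-id": msg["Message-ID"],
        "content-type": msg.get_content_type(),
    }


fields = extract_email_fields(RAW_EMAIL)
print(fields["from"], "->", fields["to"], "on", fields["date"].date())
```

Each parsed message yields one (sender, receiver) pair, which is the raw material for a tie in the extracted network.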

(10)
(11)

Data Collection System (3/3)

<?xml version="1.0"?>
<?xml-stylesheet type='text/xsl' href='MessageLog.xsl'?>
<Log FirstSessionID="1" LastSessionID="12">
<Message Date="2009/7/4" Time="上午 12:50:55" DateTime="2009-07-03T16:50:55.390Z" SessionID="31">
<From><User FriendlyName="Want SAP, Netweaver, J2EE consultant"/></From>
<To><User FriendlyName="Derrick-"/></To>
<Text Style="font-family:; color:#000000; ">ok...I got it... </Text>
</Message>
</Log>
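A minimal sketch of reading such an MSN history file with Python's standard `xml.etree.ElementTree` module follows. The embedded log is a trimmed copy of the sample (the stylesheet instruction and the localized `Time` attribute are left out for brevity), and the function name `parse_msn_log` is an assumption for illustration, not part of the described system.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Trimmed copy of the sample log; only attributes used below are kept.
MSN_LOG = """<?xml version="1.0"?>
<Log FirstSessionID="1" LastSessionID="12">
  <Message Date="2009/7/4" DateTime="2009-07-03T16:50:55.390Z" SessionID="31">
    <From><User FriendlyName="Want SAP, Netweaver, J2EE consultant"/></From>
    <To><User FriendlyName="Derrick-"/></To>
    <Text Style="font-family:; color:#000000; ">ok...I got it... </Text>
  </Message>
</Log>
"""


def parse_msn_log(xml_text):
    """Return one dict per <Message> with sender, receiver, time and text."""
    root = ET.fromstring(xml_text)
    messages = []
    for m in root.iter("Message"):
        messages.append({
            "sender": m.find("From/User").get("FriendlyName"),
            "receiver": m.find("To/User").get("FriendlyName"),
            "datetime": m.get("DateTime"),
            "session": m.get("SessionID"),
            "text": (m.findtext("Text") or "").strip(),
        })
    return messages


messages = parse_msn_log(MSN_LOG)
# Count how often each (sender, receiver) pair occurs in the log.
pair_counts = Counter((m["sender"], m["receiver"]) for m in messages)
print(pair_counts.most_common(5))
```

Counting (sender, receiver) pairs in this way gives per-contact message frequencies that can feed a relationship measure.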

(12)

Data Extraction System (1/6)

• From the e-mail file, the necessary fields are extracted, including

– “deliver-to”, “receive-id”, “date”, “to”, “from”, “subject”, “msg-id”, “priority”, “reply-to”, “mailer (agent)”, “encode”, “content-type”, “content”, “cc”

• From the MSN history file, the extracted fields include

(13)

Data Extraction System (2/6)

No.  Name           Frequency
1    linda LEE      85
2    JoyceWu        16
3    d94311001      16
4    Chris Kimble   16
5    ceremonies     8
6    CSENG          4
7    scyang         4
8    mio            4
9    Daniel         3

(14)

Data Extraction System (3/6)

• From Social Data to Social Networking

– How to measure the relationship

– Relationship from Email

$$R_i = W_1 E_i + W_2 M_i + W_3 B_i$$

(15)

Data Extraction System (4/6)

– Relationship from MSN

$$M_i = W_1 M_{send} + W_2 M_{receive} + W_3 M_{multi} + W_4 M_{interaction}$$

– Relationship from Blog

$$B_i = W_1 B_{browsing} + W_2 B_{bookmarking} + W_3 B_{interaction}$$
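Reading the relationship measures on these slides as weighted sums, a small Python sketch of the computation could look as follows. The weight values, the input counts, and the function names are all hypothetical; the slides do not state how the weights W are chosen.

```python
def weighted_sum(weights, values):
    """Sum of W_k * x_k; raises if the two sequences differ in length."""
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    return sum(w * v for w, v in zip(weights, values))


def msn_score(send, receive, multi, interaction, w=(0.25, 0.25, 0.25, 0.25)):
    """M_i from send, receive, multi-party and interaction measures (weights assumed)."""
    return weighted_sum(w, (send, receive, multi, interaction))


def blog_score(browsing, bookmarking, interaction, w=(0.2, 0.3, 0.5)):
    """B_i from browsing, bookmarking and interaction measures (weights assumed)."""
    return weighted_sum(w, (browsing, bookmarking, interaction))


def overall_score(email, msn, blog, w=(0.4, 0.4, 0.2)):
    """R_i combining the e-mail, MSN and blog scores (weights assumed)."""
    return weighted_sum(w, (email, msn, blog))


# Hypothetical activity counts for one contact i.
m_i = msn_score(send=10, receive=8, multi=2, interaction=4)
b_i = blog_score(browsing=5, bookmarking=1, interaction=2)
r_i = overall_score(email=3.0, msn=m_i, blog=b_i)
print(m_i, b_i, r_i)
```

Keeping the weights as parameters makes it easy to emphasize one data source over another for a given task.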



(16)
(17)

Network Construction and Visualization (2/6)

• Entering name “iting”

(18)

• Entering name “joyce”

(19)

(20)

• Entering keyword “right”

(21)

• Entering keyword “right”

(22)

Conclusion

• Social and communication data are very common in daily life

• They are useful for decision making

• An architecture has been proposed

• Data collection, processing, and social network extraction

(23)

Mining Organizational Networks for Layoff Prediction Model Construction

Presented at the ASONAM (Advances in Social Network Analysis and Mining) 2009 International Conference, July 2009, Athens, Greece

(24)

Introduction (1)

• Economic recession.

• Resource reduction.

• Downsizing.

(25)

Introduction (2)

• This study applies SNA and DM techniques to build a model for layoff prediction.

– Employees wish to retain their jobs.

– Layoff prediction has become a great concern for employees and managers.

– By predicting possible layoffs, employees can use their resources to retain their jobs.

(26)

Social Networks Analysis

• Social network analysis

– Mapping and analyzing relationships among people, teams, departments, or even the entire organization.

– Understanding social networks within an organization is a means to understand how social

(27)

Properties of Networks

• Connection

– Size: density and degree

– Centrality: betweenness and closeness

– Reachability

• Distance

– Walk

– Trail

– Path

– Geodesic distance
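To make a few of these properties concrete, here is a short Python sketch that computes density, geodesic distance (by breadth-first search), and closeness centrality on a toy undirected network. The four-node graph and the node names are invented for illustration and are not data from the study.

```python
from collections import deque

# Hypothetical undirected network, stored as node -> set of neighbours.
GRAPH = {
    "A1": {"A2", "A3"},
    "A2": {"A1", "A3"},
    "A3": {"A1", "A2", "A4"},
    "A4": {"A3"},
}


def density(graph):
    """Number of ties divided by the number of possible ties."""
    n = len(graph)
    edges = sum(len(nbrs) for nbrs in graph.values()) / 2
    return 2 * edges / (n * (n - 1))


def geodesic_distances(graph, source):
    """Shortest path length from source to every reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def closeness(graph, node):
    """Reachable nodes divided by the sum of geodesic distances to them.

    On a connected graph this equals the usual (n - 1) / sum of distances.
    """
    dist = geodesic_distances(graph, node)
    total = sum(d for other, d in dist.items() if other != node)
    return (len(dist) - 1) / total if total else 0.0


net_density = density(GRAPH)
degrees = {node: len(nbrs) for node, nbrs in GRAPH.items()}
print(net_density, degrees, geodesic_distances(GRAPH, "A1"), closeness(GRAPH, "A4"))
```

In this toy network A3 has the highest degree and a closeness of 1.0, because it is directly tied to every other node.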

(28)

Data Mining Analysis

• Clustering

– Segmenting heterogeneous data into different clusters.

• Association rules

– Finding the attribute-value conditions that occur frequently together in a given set of

(29)

Research process

• Step 1: Exploratory data analysis.

• Step 2: Constructing the organizational network.

• Step 3: Data mining analysis of the layoff file.

• Step 4: Constructing the layoff prediction model.

(30)

Research Architecture

[Figure: research architecture. Inputs shown: the Left Employees’ Database and the Record File of Layoff. Stages shown: Data Extraction, Constructing Organizational Network, Social Network Analysis, and Data Pre-processing.]

(31)

Constructing Organizational Networks (1)

(32)

Constructing Organizational Networks (2)

• SNA is used to construct the organizational network relationships of laid-off employees.

• This study presents an example organizational network A in order to explain the

(33)

Constructing Organizational Networks (3)

Properties of Networks     Value
Density                    0.3816
Degree                     40%
Closeness centrality       41.65%
Betweenness centrality     36.67%
Average distance           1.762
Distance-based cohesion    0.667

• Each indicator is described as follows:

(34)

Position and role Analysis

• Comparing Euclidean distances, the shortest distance is 1.414, between employees A2 and A3.
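In position and role analysis, a Euclidean distance between two actors is commonly computed from their rows of the adjacency matrix, so a small distance means the two have similar ties. The sketch below assumes that reading; the matrix is a made-up example, chosen so that A2 and A3 differ in exactly two entries, which gives a distance of the square root of 2 (about 1.414). It is not the study's actual network.

```python
import math
from itertools import combinations

# Hypothetical adjacency rows (1 = tie to A1, A2, A3, A4 in that order).
ADJ = {
    "A1": [0, 1, 1, 1],
    "A2": [1, 0, 1, 0],
    "A3": [1, 1, 0, 0],
    "A4": [1, 0, 0, 0],
}


def euclidean(row_a, row_b):
    """Euclidean distance between two tie profiles of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(row_a, row_b)))


# Pairwise distances between all actors' tie profiles.
distances = {
    (u, v): euclidean(ADJ[u], ADJ[v]) for u, v in combinations(ADJ, 2)
}
d_a2_a3 = distances[("A2", "A3")]
print(round(d_a2_a3, 3))
```

For simplicity the sketch compares whole rows, including the entries for the two actors themselves; analyses of structural equivalence often treat those entries specially.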

(35)

Data Mining Analysis for Layoff’s file (1)

(36)

Data Mining Analysis for Layoff’s file (2)

• Clustering

– We selected age, sex, marital status, grade, education level, shift, position, and compensation level to cluster the departed employees into 6 segments with K-Means.

• Association rules

– We used WEKA for association rule mining and found 8 useful rules over the laid-off employees’ attributes.

(37)

Data Mining Analysis for Layoff’s file (3)

• The three rules found for the layoff file by the decision-tree algorithm are:

Rule No. | Rule | Accuracy
1 | Layoff, Age=40~50, Sex=M, Mar=Y, Shi=Normal shift, Pos=Manager_Lv and Com ≧ 50000 | 86.2%
2 | Layoff, Age=40~50, Sex=M, Mar=Y, Edu=University, Pos=Manager_Lv and Com ≧ 50000 | 92.17%
3 | Layoff, Age=40~50, Sex=M, Mar=Y, Gra ≧ 10, Pos=Manager_Lv and Com ≧ 50000 | 96.44%
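One plausible way to read the accuracy of such a rule is as the share of employees matching the rule's conditions who were in fact laid off. The Python sketch below evaluates a rule shaped like Rule 1 under that assumption. The field names and the five employee records are invented for illustration, so the resulting figure has nothing to do with the percentages reported on the slide.

```python
# Conditions shaped like Rule 1; the field names are hypothetical.
RULE_1 = {
    "age_band": "40~50",
    "sex": "M",
    "married": "Y",
    "shift": "Normal shift",
    "position": "Manager_Lv",
}
MIN_COMPENSATION = 50000


def make_record(age_band, compensation, laid_off):
    """Build a toy employee record that matches Rule 1 except where overridden."""
    record = dict(RULE_1)
    record.update(age_band=age_band, compensation=compensation, laid_off=laid_off)
    return record


# Invented records: three are covered by the rule, two are not.
RECORDS = [
    make_record("40~50", 62000, True),
    make_record("40~50", 55000, True),
    make_record("40~50", 51000, False),
    make_record("30~40", 70000, True),    # wrong age band
    make_record("40~50", 42000, False),   # compensation below threshold
]


def matches(record, conditions, min_compensation):
    """True if the record satisfies every condition of the rule."""
    return (
        all(record.get(key) == value for key, value in conditions.items())
        and record["compensation"] >= min_compensation
    )


def rule_accuracy(records, conditions, min_compensation):
    """Fraction of covered records that were laid off, or None if none are covered."""
    covered = [r for r in records if matches(r, conditions, min_compensation)]
    if not covered:
        return None
    return sum(1 for r in covered if r["laid_off"]) / len(covered)


accuracy = rule_accuracy(RECORDS, RULE_1, MIN_COMPENSATION)
print(accuracy)
```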

(38)
(39)

Discussion and Conclusion

• Conclusion

– The results indicate that the proposed approach, which uses organizational networks, employee databases, and layoff records, achieves good prediction accuracy.

• Discussion

– The current layoff trend has cut many highly compensated managers in Taiwan's high-tech industry. This is an important phenomenon that employees should reflect on.

(40)

Challenges in Mining Social Network Data

(41)

Challenge 1: Splitting Network

(42)

Challenge 2: A Matter of Scale

• 436-node network of e-mail exchange over 3 months at a corporate research lab (Adamic-Adar 2003)

• 43,553-node network of e-mail exchange over 2 years at a large university (Kossinets-Watts 2006)

• 4.4-million-node network of declared friendships on blogging community LiveJournal (Liben-Nowell et al. 2005, Backstrom et al. 2006)

(43)

Challenge 2: A Matter of Scale

• Currently, massive network datasets give you both more and less:

– More: can observe global phenomena that are genuine, but literally invisible at smaller scales.

– Less: Don’t really know what any one node or link means. Easy to measure things; hard to pose nuanced questions.

– Goal: Find the point where the lines of research converge.

(44)

Challenge 3: Geographic Data

• Liben-Nowell, Kumar, Novak, Raghavan, Tomkins (2005) studied LiveJournal, an on-line blogging community with friendship links

• Large-scale social network with geographical embedding:

(45)

Challenge 4: Diffusion in Social Networks

• Diffusion, another fundamental social process: behaviors that cascade from node to node like an epidemic.

– News, opinions, rumors, fads, urban legends, ...

– Viral marketing [Domingos-Richardson 2001]

– Public health (e.g. obesity [Christakis-Fowler 2007])

– Cascading failures in financial markets

(46)

Challenge 5: Protecting Privacy in Social Network Data

• Many large datasets based on communication (e-mail, IM, voice) where users have strong privacy expectations.

– Current safeguards based on anonymization: replace node names with random IDs.

• With more detailed data, anonymization has run into trouble:

– Identifying on-line pseudonyms by textual analysis

– De-anonymizing Netflix ratings via time series
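The anonymization safeguard mentioned above, replacing node names with random IDs, can be sketched in a few lines of Python. The edge list and the function name `anonymize_edges` are assumptions for illustration. Note that the mapping hides names but leaves the link structure fully intact.

```python
import secrets


def anonymize_edges(edges):
    """Replace every node name with a random hex ID, consistently across edges.

    Returns the anonymized edge list and the name -> ID mapping (which a data
    publisher would keep secret). Random 64-bit IDs make collisions very unlikely.
    """
    mapping = {}

    def pseudonym(name):
        if name not in mapping:
            mapping[name] = secrets.token_hex(8)
        return mapping[name]

    anonymized = [(pseudonym(u), pseudonym(v)) for u, v in edges]
    return anonymized, mapping


# Hypothetical e-mail ties between three users.
edges = [("alice", "bob"), ("bob", "carol"), ("alice", "carol")]
anon_edges, mapping = anonymize_edges(edges)
print(anon_edges)
```

Because degrees and neighbourhoods survive unchanged, anyone who knows part of the structure can look for that pattern in the released graph.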

(47)

Challenge 6: Attacking an Anonymized Network

• What we learn from this:

• An attacker may have extra power if they are part of the system. In a large e-mail/IM network, it is easy to add yourself to the system.

• But “finding yourself” when there are 100 million nodes is going to be more subtle than when there are 34 nodes.

• Template for an active attack on an anonymized network

– Before the data is released, the attacker can create nodes (e.g. by registering an e-mail account) and edges incident to these nodes (by sending mail).

– Privacy breach: learning whether there is an edge between two existing nodes in the network.

(48)
(49)

Challenge 7: Network Abstraction

• Abstracting a network is really difficult, just like sampling

(50)

Commercial Break

(51)

Call for Papers

• 2nd International Workshop on Mining Social Networks for Decision Support

– 9-11 August, 2010, Odense, Denmark

http://asonam2010.hau.gr

• ASONAM 2011

– The 3rd International Conference on Advances in Social Network Analysis and Mining

– 29-31 July, 2011, Kaohsiung, Taiwan

(52)

Thanks for Your Attention

Any Questions?
