A Dynamic and Task-Oriented Social
Network Extraction System Based on
Analyzing Personal Social Data
Kai-Yu Wang1, I-Hsien Ting2, Hui-Ju Wu2, Pei-Shan Chang2
1Department of Marketing, International Business and Strategy
Brock University, Canada
2Department of Information Management
National University of Kaohsiung, Taiwan
iting@nuk.edu.tw
Outline
1. Introduction
2. Literature Review
Social Networks Analysis
Social Networks Extraction
Web Mining Techniques for Social Networking
3. The System Architecture
4. The Data Collection System and Extraction Engine
5. System Implementation
6. Conclusion and Future Research
Introduction (1/3)
Communication and Social Activities on the
Internet
E-mail, Instant Messenger (MSN), Blog, etc.
Large amounts of communication data
Valuable data to understand the social
structure
Useful techniques to analyze the social data
Data Mining, Visualization, etc.
Introduction (2/3)
How to collect, pre-process, and organize social
data for social network extraction and decision
support?
In this paper
A system architecture will be proposed
Some details about how to process e-mail and
Instant Messenger data
Social Network Extraction
Introduction (3/3)
A system architecture will be proposed and implemented in this paper
– E-mail and Instant Messenger history can be uploaded to the system
– Users can input keywords for a particular task
– Network for the task will be generated dynamically
The system can also be used as a decision support system
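The keyword-driven generation described on this slide can be illustrated with a minimal sketch. It assumes messages are already available as records with "from", "to", and "content" fields; the sample data, the function name extract_task_network, and the simple substring match are illustrative assumptions, not the system's actual implementation.

```python
from collections import Counter

# Hypothetical message records; the data is made up for illustration.
messages = [
    {"from": "alice", "to": "bob", "content": "Java versus C benchmarks"},
    {"from": "bob", "to": "alice", "content": "Re: benchmarks look good"},
    {"from": "carol", "to": "alice", "content": "lunch tomorrow?"},
]


def extract_task_network(messages, keyword):
    """Count undirected ties between actors whose messages mention keyword."""
    keyword = keyword.lower()
    ties = Counter()
    for msg in messages:
        if keyword in msg["content"].lower():
            # Sort the pair so alice->bob and bob->alice count as one tie.
            pair = tuple(sorted((msg["from"], msg["to"])))
            ties[pair] += 1
    return dict(ties)


# Each key is a pair of actors, each value the number of matching messages.
task_network = extract_task_network(messages, "benchmarks")
```

Because the network is rebuilt from the stored messages on every query, a different keyword yields a different network without re-uploading any data.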
Literature Review (1/3)
Social Network Analysis
To understand the relationship between “Actors”
• each actor is represented as a node, and each pair of nodes can be connected by a line to show their relationship
Three important elements
• actors, ties, and relationships
The most important measurements of SNA include
• network size, diameter, density, centrality and structural holes (Scott, 2000)
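As a rough illustration of some of these measurements, the sketch below computes network size, density, degree centrality, and diameter for a small undirected network. The example graph and all function names are hypothetical, and degree centrality is only one of several centrality definitions; the sketch assumes a connected network with at least two actors.

```python
from collections import deque

# Hypothetical undirected network: actor -> set of neighbouring actors.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}


def network_size(g):
    """Number of actors (nodes)."""
    return len(g)


def density(g):
    """Existing ties divided by the number of possible ties."""
    n = len(g)
    edges = sum(len(nbrs) for nbrs in g.values()) / 2
    return edges / (n * (n - 1) / 2) if n > 1 else 0.0


def degree_centrality(g):
    """Share of the other actors each actor is directly tied to (needs n > 1)."""
    n = len(g)
    return {v: len(nbrs) / (n - 1) for v, nbrs in g.items()}


def _bfs_distances(g, start):
    """Shortest path length from start to every reachable actor."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in g[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def diameter(g):
    """Longest shortest path; assumes the network is connected."""
    return max(max(_bfs_distances(g, v).values()) for v in g)
```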
Literature Review (2/3)
Social Network Extraction
Bird et al. propose a method to extract social networks from e-mail communications
Agrawal et al. use web mining techniques to understand the behavior of users in newsgroups
Jin et al. and Matsuo et al. developed systems to extract social networks from the web
Furukawa et al. tried to identify social networks from the blogspace
Most of the research discussed above focuses on
a single source of social network data
[Figure: system architecture. Components shown: E-mail and Instant Messenger inputs; Data Preprocessing & Extraction Engine; Database; Ontologybase; Social Network Analysis Engine; Social Network Visualization Engine; User Input; Visualized Social Network]
The Data Collection System (1/3)
Return-path: <eri@xx.xx.xx.xx>
Envelope-to: RSs@xx.xxxx.xx.xx
Received: from funnelweb.cs.york.ac.uk ([144.32.161.232]
Message-ID: <47552CF4.70806@xx.xxx.xx.xx>
Date: Tue, 04 Dec 2007 10:33:24 +0000
From: E Rid <eri@xx.xxx.xx.xx>
Reply-To: eri@xx.xx.xx.xx
User-Agent: Thunderbird 2.0.0.9 (Windows/20071031)
MIME-Version: 1.0
To: RSs@xx.xxx.xx.xx
Subject: java versus C benchmarks
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Status: RO
A Sample E-mail Format
The system is web-based and allows the user to upload e-mail files (in *.eml or text format), MSN history data (in *.xml format), and client-side logging data (in *.log or text format).
The Data Collection System (2/3)
The Data Collection System (3/3)
<?xml version="1.0"?>
<?xml-stylesheet type='text/xsl' href='MessageLog.xsl'?>
<Log FirstSessionID="1" LastSessionID="12">
<Message Date="2009/7/4" Time="上午 12:50:55" DateTime="2009-07-03T16:50:55.390Z" SessionID="31">
<From><User FriendlyName="Want SAP, Netweaver, J2EE consultant"/></From>
<To><User FriendlyName="Derrick-"/></To>
<Text Style="font-family:; color:#000000; "> ok...I got it...</Text>
</Message>
</Log>
The Data Extraction System (1/6)
From the e-mail file, the necessary fields are
extracted, including
“deliver-to”, “receive-id”, “date”, “to”, “from”,
“subject”, “msg-id”, “priority”, “reply-to”, “mailer
(agent)”, “encode”, “content-type”, “content”, “cc”
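A minimal sketch of extracting several of these e-mail fields, using Python's standard email package. The sample message is made up, and the function name extract_email_fields and the mapping from header names to field names are assumptions for illustration; the original system's parser is not described in the slides.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# Made-up message with the same header layout as an uploaded *.eml file.
RAW_EMAIL = """\
Return-path: <sender@example.org>
Message-ID: <1234@example.org>
Date: Tue, 04 Dec 2007 10:33:24 +0000
From: Sender Name <sender@example.org>
Reply-To: sender@example.org
To: list@example.org
Subject: java versus C benchmarks
Content-Type: text/plain; charset=ISO-8859-1

Body text goes here.
"""


def extract_email_fields(raw):
    """Return a dict holding a subset of the fields listed on the slide."""
    msg = message_from_string(raw)
    return {
        # parseaddr splits "Name <addr>" into (name, addr); keep the address.
        "from": parseaddr(msg["From"])[1],
        "to": parseaddr(msg["To"])[1],
        "reply-to": parseaddr(msg.get("Reply-To", ""))[1],
        "date": parsedate_to_datetime(msg["Date"]),
        "subject": msg["Subject"],
        "msg-id": msg["Message-ID"],
        "content-type": msg.get_content_type(),
        "content": msg.get_payload(),
    }


fields = extract_email_fields(RAW_EMAIL)
```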
From the MSN history file, the extracted fields
include
“from”, “to”, “content”,
“datetime”, “id”, “sessionid”
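The MSN history fields can be pulled out of the XML log in a similar way. The sketch below uses Python's standard xml.etree.ElementTree on a made-up log that mirrors the sample format; the function name extract_msn_fields and the choice of attributes for each field are assumptions, and the "id" field is omitted because the slides do not say where it comes from.

```python
import xml.etree.ElementTree as ET

# Made-up log in the same structure as the MSN history sample.
MSN_LOG = """<?xml version="1.0"?>
<Log FirstSessionID="1" LastSessionID="1">
  <Message Date="2009/7/4" DateTime="2009-07-03T16:50:55.390Z" SessionID="1">
    <From><User FriendlyName="Alice"/></From>
    <To><User FriendlyName="Bob"/></To>
    <Text>ok...I got it...</Text>
  </Message>
</Log>
"""


def extract_msn_fields(xml_text):
    """Return one dict per <Message> element in the log."""
    root = ET.fromstring(xml_text)
    rows = []
    for message in root.iter("Message"):
        rows.append({
            "from": message.find("From/User").get("FriendlyName"),
            "to": message.find("To/User").get("FriendlyName"),
            "content": (message.findtext("Text") or "").strip(),
            "datetime": message.get("DateTime"),
            "sessionid": message.get("SessionID"),
        })
    return rows


rows = extract_msn_fields(MSN_LOG)
```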
The Data Extraction System (2/6)
The Data Extraction System (3/6)
From Social Data to Social Networking
How to measure the relationship
Relationship from Email
$R_i = W_1 E_i + W_2 M_i + W_3 B_i$
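A small sketch of the overall relationship measure, assuming it is a weighted sum of the e-mail, MSN, and blog scores (R_i = W_1 E_i + W_2 M_i + W_3 B_i). The function name and the default weight values are hypothetical; the slides do not state how the weights are chosen.

```python
def relationship_score(email_score, msn_score, blog_score,
                       weights=(0.5, 0.25, 0.25)):
    """Combine per-source scores into one relationship strength R_i.

    The default weights (W_1, W_2, W_3) are placeholders, not values
    taken from the original system.
    """
    w1, w2, w3 = weights
    return w1 * email_score + w2 * msn_score + w3 * blog_score


# Example: a strong e-mail tie, a moderate MSN tie, and no blog interaction.
r_i = relationship_score(1.0, 0.5, 0.0)
```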
The Data Extraction System (4/6)
Relationship from MSN
$M_i = W_1 M_{send} + W_2 M_{receive} + W_3 M_{multi} + W_4 M_{interaction}$
Relationship from Blog
$B_i = W_1 B_{browsing} + W_2 B_{bookmarking} + W_3 B_{interaction}$
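The MSN and blog measures can be sketched the same way, assuming each is a weighted sum of activity counts (send, receive, multi, and interaction for MSN; browsing, bookmarking, and interaction for blogs). The weight values, the count dictionaries, and the helper name weighted_sum are hypothetical.

```python
# Placeholder weights; the original system's values are not given.
MSN_WEIGHTS = {"send": 0.25, "receive": 0.25, "multi": 0.25, "interaction": 0.25}
BLOG_WEIGHTS = {"browsing": 0.5, "bookmarking": 0.25, "interaction": 0.25}


def weighted_sum(counts, weights):
    """Sum weight * count over every activity type; missing counts are 0."""
    return sum(w * counts.get(name, 0) for name, w in weights.items())


# Made-up activity counts between one pair of actors.
msn_counts = {"send": 4, "receive": 2, "multi": 1, "interaction": 1}
blog_counts = {"browsing": 10, "bookmarking": 2}

m_i = weighted_sum(msn_counts, MSN_WEIGHTS)    # MSN relationship score
b_i = weighted_sum(blog_counts, BLOG_WEIGHTS)  # blog relationship score
```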
System Implementation (1/6)
System Implementation (2/6)
Conclusion
Social and communication data are common in daily life
Useful for decision making
A system architecture has been proposed
Data collection, processing, social network extraction and visualization
Future research
Ontology base
Including data mining techniques
Measuring the performance of the system
Thank You!
Any Questions?
I-Hsien Ting
Department of Information Management
National University of Kaohsiung, Taiwan
iting@nuk.edu.tw