
A Dynamic and Task-Oriented Social Network Extraction System Based on Analyzing Personal Social Data


Academic year: 2021



Kai-Yu Wang¹, I-Hsien Ting², Hui-Ju Wu², Pei-Shan Chang²

¹ Department of Marketing, International Business and Strategy, Brock University, Canada

² Department of Information Management, National University of Kaohsiung, Taiwan

iting@nuk.edu.tw


Outline

1. Introduction

2. Literature Review

   – Social Network Analysis

   – Social Network Extraction

   – Web Mining Techniques for Social Networking

3. The System Architecture

4. The Data Collection System and Extraction Engine

5. System Implementation

6. Conclusion and Future Research


Introduction (1/3)

Communication and social activities on the Internet

E-mail, Instant Messenger (MSN), blogs, etc.

Large amounts of communication data

Valuable data for understanding the social structure

Useful techniques for analyzing the social data

Data mining, visualization, etc.


Introduction (2/3)

How can social data be collected, pre-processed, and organized for social network extraction and decision support?

In this paper

A system architecture is proposed

Details on how e-mail and Instant Messenger data are processed

Social network extraction


Introduction (3/3)

• A system architecture is proposed and implemented in this paper

  – E-mail and Instant Messenger history can be uploaded to the system

  – Users can enter keywords for a particular task

  – The network for that task is generated dynamically

• The system can also serve as a decision support system
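The keyword-driven network generation can be pictured with a small sketch. It assumes messages are already available as (sender, recipient, text) tuples and that a tie is simply a count of messages mentioning the keyword; the record layout, the sample data, and the `task_network` name are illustrative assumptions, not the system's actual design.

```python
from collections import Counter

# Hypothetical message records: (sender, recipient, text).
# The layout and the names are assumptions made for illustration.
MESSAGES = [
    ("alice", "bob", "Java versus C benchmarks"),
    ("bob", "alice", "Re: benchmarks look good"),
    ("alice", "carol", "Lunch on Friday?"),
    ("carol", "bob", "Benchmark report draft"),
]


def task_network(messages, keyword):
    """Count messages per undirected pair whose text mentions the keyword."""
    needle = keyword.lower()
    edges = Counter()
    for sender, recipient, text in messages:
        if needle in text.lower():
            # Sort the pair so that a->b and b->a land on the same tie.
            edges[tuple(sorted((sender, recipient)))] += 1
    return edges
```

Calling `task_network(MESSAGES, "benchmark")` keeps only the ties that carry task-related messages, so a different keyword yields a different network from the same uploaded data.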


Literature Review (1/3)

• Social Network Analysis

  – Used to understand the relationships between “actors”

    • Each actor is represented as a node, and each pair of nodes can be connected by a line to show their relationship

  – Three important elements

    • Actors, ties, and relationships

  – The most important SNA measurements include

    • Network size, diameter, density, centrality, and structural holes (Scott, 2000)
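Several of these measurements are easy to compute by hand on a small network. The sketch below uses a made-up four-actor undirected graph stored as an adjacency map and computes density, degree centrality, and diameter with the standard textbook definitions; the graph and the function names are assumptions for illustration.

```python
from collections import deque

# Toy undirected network as an adjacency map; the actors are made up.
GRAPH = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}


def density(graph):
    """Fraction of possible undirected ties that are actually present."""
    n = len(graph)
    ties = sum(len(nbrs) for nbrs in graph.values()) / 2
    return ties / (n * (n - 1) / 2)


def degree_centrality(graph):
    """Degree of each actor divided by the maximum possible degree, n - 1."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def diameter(graph):
    """Longest shortest path, via breadth-first search from every actor.

    Assumes the network is connected.
    """
    longest = 0
    for start in graph:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        longest = max(longest, max(dist.values()))
    return longest
```

Network size is simply `len(GRAPH)`. Structural holes need a richer measure (for example, Burt's constraint) and are left out of this sketch.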


Literature Review (2/3)

• Social Network Extraction

  – Bird et al. propose a method to extract social networks from e-mail communications

  – Agrawal et al. use web mining techniques to understand the behavior of newsgroup users

  – Jin et al. and Matsuo et al. developed systems that extract social networks from the web

  – Furukawa et al. tried to identify social networks in the blogspace

• Most of the studies discussed above focus on a single data source for social network extraction

[Figure: system architecture. Components: E-mail and Instant Messenger inputs; Data Preprocessing & Extraction Engine; Database; Ontologybase; Social Network Analysis Engine; Social Network Visualization Engine; User Input; Visualized Social Network]


The Data Collection System (1/3)


Return-path: <eri@xx.xx.xx.xx>
Envelope-to: RSs@xx.xxxx.xx.xx
Received: from funnelweb.cs.york.ac.uk ([144.32.161.232]
Message-ID: <47552CF4.70806@xx.xxx.xx.xx>
Date: Tue, 04 Dec 2007 10:33:24 +0000
From: E Rid <eri@xx.xxx.xx.xx>
Reply-To: eri@xx.xx.xx.xx
User-Agent: Thunderbird 2.0.0.9 (Windows/20071031)
MIME-Version: 1.0
To: RSs@xx.xxx.xx.xx
Subject: java versus C benchmarks
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Status: RO

A Sample E-mail Format

The system is web-based. It allows the user to upload e-mail files (in *.eml or text format), MSN history data (in *.xml format), and client-side logging data (in *.log or text format).
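As a rough illustration of how an uploaded e-mail file could be read, the sketch below parses RFC 822-style headers with Python's standard `email` package. The addresses are made up, and the `extract_fields` helper and its output keys are assumptions for illustration rather than the system's actual extraction code.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# Made-up addresses; the header names follow the sample e-mail format.
RAW = """\
Return-path: <sender@example.org>
Message-ID: <47552CF4.70806@example.org>
Date: Tue, 04 Dec 2007 10:33:24 +0000
From: E Rid <sender@example.org>
To: list@example.org
Subject: java versus C benchmarks
Content-Type: text/plain; charset=ISO-8859-1

Body text goes here.
"""


def extract_fields(raw):
    """Pull a few header fields and the body out of one raw message."""
    msg = message_from_string(raw)
    return {
        "from": parseaddr(msg["From"])[1],   # address part only
        "to": parseaddr(msg["To"])[1],
        "subject": msg["Subject"],
        "msg-id": msg["Message-ID"],
        "date": parsedate_to_datetime(msg["Date"]),
        "content-type": msg.get_content_type(),
        "content": msg.get_payload(),
    }
```

A message with several recipients would need `email.utils.getaddresses` instead of `parseaddr`; the single-recipient case is kept here for brevity.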


The Data Collection System (2/3)


The Data Collection System (3/3)


<?xml version="1.0"?>
<?xml-stylesheet type='text/xsl' href='MessageLog.xsl'?>
<Log FirstSessionID="1" LastSessionID="12">
  <Message Date="2009/7/4" Time="上午 12:50:55" DateTime="2009-07-03T16:50:55.390Z" SessionID="31">
    <From><User FriendlyName="Want SAP, Netweaver, J2EE consultant"/></From>
    <To><User FriendlyName="Derrick-"/></To>
    <Text Style="font-family:; color:#000000; ">ok...I got it...</Text>
  </Message>
</Log>
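An MSN history file of this shape can be read with Python's standard `xml.etree.ElementTree` module. The sketch below is one possible reading, assuming each `Message` element carries a `From`, one or more `To` users, and a `Text` child; the shortened log, the user names, and the `parse_log` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened log in the shape of the MSN sample; the user names are made up.
LOG = """<?xml version="1.0"?>
<Log FirstSessionID="1" LastSessionID="1">
  <Message DateTime="2009-07-03T16:50:55.390Z" SessionID="1">
    <From><User FriendlyName="Alice"/></From>
    <To><User FriendlyName="Bob"/></To>
    <Text>ok...I got it...</Text>
  </Message>
</Log>
"""


def parse_log(xml_text):
    """Return one dict per <Message> with sender, recipients, and text."""
    root = ET.fromstring(xml_text)
    records = []
    for message in root.iter("Message"):
        records.append({
            "from": message.find("From/User").get("FriendlyName"),
            # A multi-party chat may list several recipients.
            "to": [u.get("FriendlyName") for u in message.findall("To/User")],
            "content": (message.findtext("Text") or "").strip(),
            "datetime": message.get("DateTime"),
            "sessionid": message.get("SessionID"),
        })
    return records
```

Keeping `to` as a list leaves room for multi-party sessions, where one message reaches several users at once.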


The Data Extraction System (1/6)

From the e-mail file, the following fields are extracted:

“deliver-to”, “receive-id”, “date”, “to”, “from”, “subject”, “msg-id”, “priority”, “reply-to”, “mailer (agent)”, “encode”, “content-type”, “content”, “cc”

From the MSN history file, the following fields are extracted:

“from”, “to”, “content”, “datetime”, “id”, “sessionid”


The Data Extraction System (2/6)


The Data Extraction System (3/6)

From Social Data to Social Networking

How to measure the relationship

Relationship from Email

$$R_i = W_1 E_i + W_2 M_i + W_3 B_i$$


The Data Extraction System (4/6)

• Relationship from MSN

$$M_i = W_1 M_{\text{send}} + W_2 M_{\text{receive}} + W_3 M_{\text{multi}} + W_4 M_{\text{interaction}}$$

• Relationship from Blog

$$B_i = W_1 B_{\text{browsing}} + W_2 B_{\text{bookmarking}} + W_3 B_{\text{interaction}}$$
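A minimal sketch of how such relationship scores could be computed, assuming each measure is a plain weighted sum of per-channel activity counts. The weight values, the input counts, and the function names are illustrative assumptions; the slides do not state how the weights are chosen.

```python
def weighted_sum(weights, values):
    """Return the sum of w * v over paired weights and values."""
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    return sum(w * v for w, v in zip(weights, values))


def msn_score(send, receive, multi, interaction,
              weights=(0.25, 0.25, 0.25, 0.25)):
    """MSN relationship score; the default weights are placeholders."""
    return weighted_sum(weights, (send, receive, multi, interaction))


def blog_score(browsing, bookmarking, interaction, weights=(0.2, 0.3, 0.5)):
    """Blog relationship score; the default weights are placeholders."""
    return weighted_sum(weights, (browsing, bookmarking, interaction))


def relationship(email, msn, blog, weights=(0.4, 0.4, 0.2)):
    """Overall score combining the e-mail, MSN, and blog scores."""
    return weighted_sum(weights, (email, msn, blog))
```

For example, `relationship(5, msn_score(4, 4, 2, 2), blog_score(10, 0, 0))` combines the three channels into one tie strength that could serve as an edge weight in the extracted network.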


System Implementation(1/6)


System Implementation(2/6)


Conclusion

• Social and communication data are very common in daily life

  – Useful for decision making

• A system architecture has been proposed

  – Data collection, processing, social network extraction, and visualization

• Future research

  – Ontology base

  – Incorporating data mining techniques

  – Measuring the performance of the system


Thank You!

Any Questions?

I-Hsien Ting

Department of Information Management, National University of Kaohsiung, Taiwan

iting@nuk.edu.tw
