Combining Social Network Analysis and Web Mining Techniques to Discover Interest Groups in the Blogspace

Academic year: 2021


Hui-Ju Wu
Institute of Human Resource Management
National Changhua University of Education
Changhua, Taiwan
d94311001@mail.ncue.edu.tw

I-Hsien Ting
Department of Information Management
National University of Kaohsiung
Kaohsiung City, Taiwan
iting@nuk.edu.tw

Kai-Yu Wang
Department of Marketing, International Business and Strategy
Brock University, Canada
kwang@brocku.ca

Abstract— This paper analyzes the characteristics of users of social networking websites together with the content of those websites. We use social network analysis and web mining techniques to map the networks of blog users and, from these networks, to discover interest groups. The analysis results are then used to construct a recommendation mechanism and system. We further expect that the proposed system can support target marketing.

Keywords- Social Network Analysis, Recommendation System, Web Mining, E-commerce, Blogspace

I. INTRODUCTION

Recently, websites built on the concept of Web 2.0 have become mainstream on the WWW, especially social networking websites such as blogs, friend-making websites, and web albums. How to combine the features of social networking websites with traditional business models has therefore become an important research question [7]. The purpose of this paper is to analyze the characteristics of users of social networking websites as well as the related content of those websites. We use the techniques of social network analysis and web mining to map the networks of users and thereby discover interest groups. The analysis results are then used to construct an automatic product recommendation system based on an interest-group categorization mechanism. Furthermore, we expect that target marketing can be achieved with the proposed system.

The paper is organized as follows. Section I gives the background and introduction. Section II reviews related literature on social network analysis, the taxonomy and techniques of web mining, and virtual communities. Section III presents the research methodology and design, including the research architecture. Section IV reports an empirical study that shows how the system works and describes its implementation. Section V concludes the paper with suggestions for future research.

II. LITERATURE REVIEW

In this section, related literature about social network analysis, the taxonomy and techniques of web mining and virtual community will be reviewed and discussed.

2.1 Social Network Analysis

Social network analysis was developed to understand the relationships between “actors”, where an actor can be a person, an organization, an event, or an object [4]. In a social network, each actor is represented as a node, and pairs of nodes are connected by lines that show their relationships. The social network graph is formed by these nodes and lines, and social network analysis is therefore a methodology for understanding the graph, the relationships, and the actors in the social network [6][4][17].

A social network includes three important elements: actors, ties, and relationships [14]. Actors are the essential elements of the social network and represent people, events, or objects. Ties connect actors, directly or indirectly, through a path. Ties can be divided into strong and weak ties according to the strength of the relationship, and they are also useful for discovering subgroups in the social network. Relationships describe the interactions between two actors. Furthermore, different relationships may cause the network to reflect different characteristics [10][12].
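The three elements can be illustrated with a small data structure. The following is only a minimal sketch: the actor names, the tie strengths, and the cut-off `STRONG_TIE_MIN` are invented for illustration and do not come from the study.

```python
from collections import defaultdict

# Hypothetical ties: (actor, actor, strength). All values are invented.
ties = [
    ("A", "B", 5),
    ("B", "C", 1),
    ("A", "C", 4),
    ("C", "D", 2),
]

STRONG_TIE_MIN = 3  # assumed cut-off between weak and strong ties


def build_network(ties):
    """Return an undirected adjacency map: actor -> {neighbour: strength}."""
    network = defaultdict(dict)
    for a, b, strength in ties:
        network[a][b] = strength
        network[b][a] = strength
    return dict(network)


def strong_ties(network, minimum=STRONG_TIE_MIN):
    """Return the set of actor pairs whose tie strength reaches the cut-off."""
    return {
        tuple(sorted((a, b)))
        for a, neighbours in network.items()
        for b, strength in neighbours.items()
        if strength >= minimum
    }


network = build_network(ties)
print(sorted(strong_ties(network)))
```

Storing the strength on each tie keeps the strong/weak distinction available for later subgroup analysis, rather than discarding it when the network is built.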


2.2 Web Mining

Depending on the analysis target and the data source, web mining techniques can be divided into three types: Web Content Mining, Web Structure Mining, and Web Usage Mining [5].

Web content mining is a web mining technique for analyzing web content such as text, graphs, and graphics [2]. Most recent web content mining research focuses on text data processing; few studies focus on other multimedia data.

Web structure mining is a technique that can be used to analyze the links and structure of websites. Graph theory is usually the main concept and theory for web structure mining to analyze and explain the structure of websites. In addition, the extraction of the structure of websites is always essential in this research area [11][15].

Web usage mining is a web mining technique for analyzing how websites are used, for example the navigation behavior of users [3]. Server-side clickstream data (log files) are the main data source for web usage mining. Client-side data (such as client-side log files and cookies) are sometimes used as well, for example to record user behavior more completely [1][16].

2.3 Virtual Community

A virtual community is a group of computer users who provide one another with friendship, social resources, information, and a sense of belonging, and who support, learn from, and share with each other. The virtual community is an extension of the community of practice [13][18]. Many companies now regard the virtual community as a valuable knowledge management system and therefore put effort into managing or collaborating with social networking websites [8]. The most popular type of social networking website at present is the blog, also known as the weblog [9][19]. Most weblogs are personal log websites on which users post articles and receive responses from people with similar interests. People with similar interests may post articles on similar topics and discuss and respond to each other. Blogs have therefore become a good resource for understanding virtual communities and interest groups.

III. RESEARCH METHODOLOGY AND DESIGN

This paper combines the techniques of social network analysis and web mining to analyze social networking websites. Blogs with different characteristics are classified in order to derive a suitable product recommendation mechanism based on the consumers' network.

3.1 Research process

Based on this purpose, we designed the research process shown in Figure 1. The process has four steps. Step 1 is the pilot study and literature review. Steps 2 and 3 are both analysis steps but use different techniques: Step 2 uses social network analysis and Step 3 uses web mining. Step 4 develops the recommendation mechanism and implements the recommendation system.

[Figure 1: flowchart. Step labels: Step 1 Pilot Study; Step 2 Social Network Analysis; Step 3 Web Mining; Step 4 Recommendation System Construction. Box labels: Identifying Research Questions and Purpose; Literature Review; Data Collection; Social Network Analysis; Construction the Customer Network Models; Classification of Customer Models; Association Rule Mining; Construction the Recommendation Strategies; Construction of Recommendation Mechanism; Implementation of the Recommendation System; Conclusion.]

Figure 1. The research process of this paper

3.2 Research architecture

Following the research purpose, the research architecture of the paper is shown in Figure 2.

[Figure 2: architecture diagram. Labels recovered from extraction: “Writes Detailed Item Listing”; “Sets Auction Length”.]

Figure 2. The research architecture

IV. EMPIRICAL STUDY

In the empirical study, we collected blogs related to cosmetics as the analysis target. The data were collected from January 2008 to March 2009. We then used three measurements of the relationships in the cosmetics community: responding, citation, and recommendation.

Before the social network analysis, a relationship matrix must be established from the three measurements. Each member of the community is represented as a node in the network, and a link between two nodes represents a relationship. The relationship value is set to 1 if the value of the measurement is larger than 3, and to 0 otherwise. The relationship matrix is then used to draw the social network graph shown in Figure 3.
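The thresholding rule can be sketched as follows. This is an illustration only: the counts below are invented values for a single measurement among four members, and how the three measurements are combined into one matrix is not specified in this paper, so the sketch does not attempt it.

```python
def binarize(counts, threshold=3):
    """Return a 0/1 matrix: 1 where the count is larger than the threshold."""
    return [[1 if value > threshold else 0 for value in row] for row in counts]


# Hypothetical interaction counts; row i, column j = interactions from i to j.
counts = [
    [0, 5, 2, 4],
    [6, 0, 3, 0],
    [1, 4, 0, 7],
    [4, 0, 8, 0],
]

relation = binarize(counts)
for row in relation:
    print(row)
```

Note that a count of exactly 3 maps to 0, because the rule requires a value larger than 3.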

Figure 3. The social network graph of cosmetics community

In this research, we use UCINET 6, a very popular SNA software package, to analyze the cosmetics community. Table 1 shows the SNA results. The network size is 20, which means that 20 members interact with each other. The density of the network is 0.5658 and the distance is 0.783. Regarding the centrality of the network, the degree centrality is 0.7124 and the closeness centrality is 0.6293. The closer a member is to the center, the faster that member can obtain information.
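For readers unfamiliar with these measures, the sketch below computes density and per-member normalized degree and closeness centrality from a binary relationship matrix, using the standard textbook definitions. The matrix is an invented symmetric example with four members, and the code is not UCINET's implementation; how UCINET aggregates per-member values into the single figures of Table 1 is not stated here.

```python
from collections import deque


def density(matrix):
    """Share of possible off-diagonal ties that are present."""
    n = len(matrix)
    present = sum(matrix[i][j] for i in range(n) for j in range(n) if i != j)
    return present / (n * (n - 1))


def degree_centrality(matrix):
    """Number of ties of each member divided by the maximum possible (n - 1)."""
    n = len(matrix)
    return [
        sum(matrix[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def closeness_centrality(matrix):
    """(n - 1) divided by the sum of shortest-path lengths to all other members."""
    n = len(matrix)
    scores = []
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source member
            node = queue.popleft()
            for other in range(n):
                if matrix[node][other] and other not in dist:
                    dist[other] = dist[node] + 1
                    queue.append(other)
        total = sum(dist.values())
        # Assumption: score 0.0 when some member cannot be reached.
        scores.append((n - 1) / total if len(dist) == n and total else 0.0)
    return scores


# Hypothetical symmetric relationship matrix for four members.
relation = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]

print(density(relation))
print(degree_centrality(relation))
print(closeness_centrality(relation))
```

In this example the first member is tied to everyone and therefore has the highest degree and closeness, which matches the reading that members closer to the center receive information faster.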

Table 1. The SNA results

Figure 4. Distance tree

The paper also uses the position and role analysis of social network analysis to calculate the distances between members and to classify the community accordingly. Figure 4 is the distance tree of the members. For example, the shortest distance in the tree is 4, between members 14 and 8. This means that members 14 and 8 have the most similar interests and can be assigned to the same class. Furthermore, members 15 and 16 can also be assigned to the class to which members 14 and 8 belong.
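The idea of merging the closest members first can be sketched with a simple agglomerative procedure. The sketch assumes single-linkage merging, which is our choice for illustration; the linkage rule actually used to build the distance tree is not stated. Only the distance of 4 between members 8 and 14 comes from the text, and every other distance below is invented.

```python
def single_linkage(distances, labels, num_clusters):
    """Repeatedly merge the two closest clusters until num_clusters remain."""
    clusters = [[i] for i in range(len(labels))]
    while len(clusters) > num_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: cluster gap = smallest member-to-member distance.
                gap = min(distances[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or gap < best[0]:
                    best = (gap, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(labels[i] for i in cluster) for cluster in clusters]


# Member numbers, in the row/column order of the matrix below.
labels = [8, 14, 15, 1, 17]

# Hypothetical symmetric distance matrix (only the 8-14 distance is from the text).
distances = [
    [0, 4, 6, 20, 19],
    [4, 0, 7, 22, 23],
    [6, 7, 0, 21, 18],
    [20, 22, 21, 0, 5],
    [19, 23, 18, 5, 0],
]

print(single_linkage(distances, labels, num_clusters=2))
```

With these invented distances, members 8 and 14 merge first, member 15 joins them later, and members 1 and 17 form the second class.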

The result of this classification is shown in Figure 5 and is easy to read. For example, members 1, 17, 20, and 19 are in the same class, which means that they have similar interests.

Figure 5. The classification result tree

The analysis then shifts to the content of the blogs. We apply web content mining techniques to discover association rules in the blogs' content and response content. We also try to identify the differences within and between classes; the results are shown in Table 2.
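As a minimal sketch of association rule discovery, the code below computes support and confidence for pairwise rules over the topic terms of a few posts. The posts, the thresholds, and the restriction to rules with one term on each side are assumptions made for illustration; the mining algorithm and thresholds used in the study are not stated, and the rules in Table 2 are longer chains.

```python
from itertools import permutations


def pair_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Return (lhs, rhs, support, confidence) for every rule lhs -> rhs that passes."""
    total = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for lhs, rhs in permutations(items, 2):
        both = sum(1 for t in transactions if lhs in t and rhs in t)
        lhs_count = sum(1 for t in transactions if lhs in t)
        support = both / total  # share of posts containing both terms
        confidence = both / lhs_count if lhs_count else 0.0  # P(rhs | lhs)
        if support >= min_support and confidence >= min_confidence:
            rules.append((lhs, rhs, support, confidence))
    return rules


# Hypothetical posts, each reduced to the set of topic terms it mentions.
posts = [
    {"foundation", "eye cosmetics"},
    {"foundation", "eye cosmetics", "lip cosmetics"},
    {"foundation", "eye cosmetics"},
    {"lip cosmetics"},
]

for lhs, rhs, support, confidence in pair_rules(posts):
    print(f"{lhs} -> {rhs} (support={support:.2f}, confidence={confidence:.2f})")
```

Rules involving "lip cosmetics" are dropped here because the term co-occurs with the others in only one of the four posts, which is below the assumed support threshold.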

In Table 2, group 1 is labeled as the cosmetics group according to the discovered association rule. Group 2 is labeled as the skin care group, whose members mostly discuss skin care topics. Group 3 is labeled as the hair-style group.

Table 2. Grouping result and the association rules


The main purpose of this paper is to analyze data from social networking websites by applying the techniques of SNA and web mining. We want to discover the social relationships and associations among blog members and, finally, to find the interest groups in the blogspace. The interest groups help us to develop a mechanism and to construct a product recommendation system based on the network of consumers.

The prototype of the recommendation system is shown in Figures 6 and 7. The two figures show that customers can find the classified blogs according to their interests and preferences, and therefore save time searching and navigating in the blogspace. Furthermore, companies can use the classified groups to promote their products or to distribute trial products. In this way the system can help to improve the performance of internet marketing and achieve target marketing.
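The lookup behind such a recommendation can be sketched in a few lines. This is not the prototype's implementation: the function name `recommend_blogs` and the dictionary layout are hypothetical, and only part of each group's membership from Table 2 is included.

```python
# Interest label -> member (blog) numbers; a partial excerpt of the grouping result.
GROUPS = {
    "cosmetics": [1, 17, 20, 19],
    "skin care": [8, 14, 15, 16],
    "hair style": [12, 10, 7],
}


def recommend_blogs(interest, groups=GROUPS):
    """Return the blogs classified under the given interest, or an empty list."""
    return list(groups.get(interest.strip().lower(), []))


print(recommend_blogs("Skin care"))
print(recommend_blogs("travel"))
```

Returning an empty list for an unknown interest keeps the caller simple; a real system would more likely suggest the nearest known interest group.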

Size   Density   Distance   Centrality (Degree)   Centrality (Closeness)
20     0.5658    0.783      0.7124                0.6293

Groups    Members          Association Rule
Group 1   1, 17, 20, 19    Foundation → Eye cosmetics → Lip cosmetics → cosmetics tools
          3, 6, 2, 9
Group 2   8, 14, 15, 16    Skin care → Skin condition → care accessories
          4, 11
          18, 5
Group 3   12, 10, 7        Hair style → Hair styling tools → fashion information
          13


Figure 6. Blog recommendation system (1)

Figure 7. Blog recommendation system (2)

V. CONCLUSION AND FUTURE RESEARCH

With the rapid growth of the Internet and Web 2.0 websites, traditional marketing strategies find it harder and harder to satisfy customers. Social networking websites are now very popular internet applications and attract more and more users. They have therefore become a very good resource and platform for marketing. This paper proposes a methodology that combines the techniques of social network analysis and web mining to discover interest groups in the blogspace. A system prototype has also been implemented to show how the mechanism works. In the future, we will apply the methodology to different data and resources on the internet. We will also focus on how to measure the performance of the system.

ACKNOWLEDGMENT

This work is partially supported by an NSC research grant, TAIWAN (NSC 97-2410-H-390-022).

REFERENCE

[1] Han, J., Kamber, M. “Data Mining: Concepts and Techniques”, Morgan Kaufmann Publishers, USA, 2001.

[2] Joshi, A., and Joshi, K. “On Mining Web Access Logs”, in Technical Report, CSEE Department, UMBC, 1999.

[3] Buchner, A. G. and Mulvenna, M. D. “Discovering Internet Marketing Intelligence through Online Analytical Web Usage Mining”, ACM SIGMOD Record, 27(4):54-61, 1998.

[4] Borgatti, S. P., Everett, M. G. and Freeman, L. C. “Ucinet for Windows: Software for Social Network Analysis”, Harvard, MA: Analytic Technologies, 2002.

[5] Cooley, R., Mobasher, B. and Srivastava, J., “Web Mining: Information and Pattern Discovery on the World Wide Web”, in Technical Report, University of Minnesota, Dept. of Computer Science, Minneapolis, 1997.

[6] Freeman, L., “Centrality in Social Networks: Conceptual Clarification,” Social Networks, 1979.

[7] Garton, L., Haythornthwaite, C. and Wellman, B., “Studying Online Social Networks”, Journal of Computer-Mediated Communication, June 1997.

[8] Hsu, M.H., Ju, T.L., Yen, C.H., and Chang, C.M. “Knowledge Sharing Behavior in Virtual Communities: The Relationship Between Trust, Self-Efficacy, and Outcome Expectations”, International Journal of Human-Computer Studies, 2007, Vol. 65, pp. 153-169.

[9] Scott, J. “Social Network Analysis: A Handbook”, Sage Publisher, 2000.

[10] Hanneman, R. A. “Introduction to Social Network Methods”, Department of Sociology, University of California, Riverside, 2001.

[11] Spiliopoulou, M. “Web Usage Mining for Web Site Evaluation”, Communications of the ACM 43, 8, 2000, pp. 127-134.

[12] Mitchell, J. C. “Social networks and urban situations”, England: Manchester University Press, 1969.

[13] Preece, J. “Online Communities: Designing Usability, Supporting Sociability,” New York: Wiley, 2000.

[14] Scott, J. “Social Network Analysis: Critical Concepts in Sociology,” New York, Routledge Publisher, 2002.

[15] Chakrabarti, S. “Data mining for hypertext: A tutorial survey”, ACM SIGKDD Explorations, 1(2):1-11, 2000.

[16] Cooley, R., Mobasher, B. and Srivastava, J. “Web Mining: Information and Pattern Discovery on the World Wide Web”, in Proceedings of the 9th IEEE International Conference on Tools with Artificial Intelligence, pp. 558-567, Newport Beach, CA, USA, 1997.

[17] Wasserman, S., and Faust, K. “Social Network Analysis: Methods and Applications”, New York: Cambridge University Press, 1994.

[18] Wellman, B. “Computer Networks as Social Networks: Collaborative Work, Telework and Virtual Community,” Annual Review of Sociology, Vol. 22, pp. 211-238, 2000.

[19] Wellman, B., “For a Social Network Analysis of Computer Networks: A Sociological Perspective on Collaborative Work and Virtual Community,” ACM, 1996.
