
Xiong Zhang Chenwei Li

[email protected]@cityu.edu.hk

Department of Information Systems, City University of Hong Kong, Hong Kong

Abstract: One representative hacker forum is selected to analyze users’ reputation-earning behaviors. Our analysis results suggest that both post-centered and network-centered characteristics are important for users to earn high reputation in the online forum. To earn high reputation, users are advised to write more detailed posts that offer help to others and to interact with others across more diverse topics.

Keywords: Learning Behavior, User Reputation, Survival Analysis, Social Network Analysis.

1. Introduction

Piracy loss due to hacking has increased significantly [1]. Hackers are often regarded as mysterious figures operating in a gray world [2]. Holt [3] refers to a hacker as “any individual with a profound interest in computers and technology that has used this knowledge to access computer systems with or without authorization from the system owners”. Online forums are important venues for hackers to learn, to communicate, and even to collaborate on an attack [4]. As members of a social network, hackers interact with others to earn reputation in the online community. This motivates us to study hacking-related phenomena from a social network perspective. Both white-hat and black-hat hackers may exist in a hacker forum. Some hackers are very knowledgeable and very active in the forum. However, only a small number of users earn high reputation, while most other users remain at low reputation. In other words, in terms of high user reputation, only a small number of users can survive in the forum.

Survival analysis is usually used to analyze time to an event, such as the death of biological organisms or failure in mechanical systems [5]. Although we cannot equate high reputation with the survival of hackers in the online forum, in this paper we apply survival analysis to study the characteristics that high-reputation hackers should have.

Although this online hacking forum does not kick users out, its policy forbids users from asking others for score without making a contribution; therefore, it is not easy for users to gain high reputation. Users have to interact with others and contribute to the forum to earn respect and reputation. By studying hackers’ reputation, we expect to learn how users can gain higher reputation and what the characteristics of high-reputation hackers are. The analysis results of this paper can be the foundation from which we extend to study hacking’s impact on firms and organizations.

In this paper, we study users’ survival characteristics [6] in terms of high user reputation, based on raw data downloaded from a representative hacker forum. More specifically, we are interested in what makes a user a high-reputation hacker in an online forum and how such users survive in this virtual community in terms of earning high reputation.

We study this research question from both a post content perspective and a social network perspective. We perform survival analysis using the Cox proportional hazards regression model [7]. The analysis results provide insights into how users can survive and earn high reputation in the online forum.

2. Data Collection

The hacker forum we chose has over six years of history. To date, it has more than 350 thousand registered members, of whom over 145,000 are fairly active. These members have made a total of more than 23 million posts in over 2 million threads. The peak number of users online at the same time was 2885, on July 22, 2012. In this forum, different topics, such as techniques, tutorials, and movies, are aggregated into dozens of boards, of which 23 hack-related sub-boards focus on hacking-related technologies, such as hacking tools, hacking tutorials, hacking issues in specific areas, proxies, decryption, and so on. The raw data was downloaded from this forum for the period from February 2007 to April 2012. From the raw data, we select the users who posted more than 10 times, either starting a thread or replying to others. Users who posted fewer than 10 times in 6 years are outside our interest, since not enough posting behavior can be observed for them. We further select users who commented or received comments more than 3 times, to remove users who are not recognized in this community. Finally, we have 1112 users and their 127628 posts.
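The two selection rules above can be sketched as a small filter. This is only an illustration under stated assumptions, not the authors' pipeline: the input layout and the name `select_users` are hypothetical, and counting comments given and received together toward the threshold is one reading of the rule.

```python
from collections import Counter


def select_users(posts, comments, min_posts=10, min_comments=3):
    """Return the set of users passing both selection rules.

    posts:    iterable of author ids, one entry per post
              (thread start or reply).
    comments: iterable of (giver_id, receiver_id) pairs, one per comment.

    Assumption: comments given and comments received are added
    together before comparing against min_comments.
    """
    post_counts = Counter(posts)
    comment_counts = Counter()
    for giver, receiver in comments:
        comment_counts[giver] += 1
        comment_counts[receiver] += 1
    return {
        user
        for user, n_posts in post_counts.items()
        if n_posts > min_posts and comment_counts[user] > min_comments
    }
```

Both thresholds are strict ("more than"), matching the wording of the selection rules.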

3. Model and Implementation

The panel of the posting network includes the threads users start and the replies they post to others in different sub-boards. Post content is one of the key indicators of users’ hacking knowledge. Post content is also the basis on which other privileged users give feedback and opinions.

We propose that whether a post is written to offer help to others may have an important influence on the comments its author receives. Therefore, we built a text mining system [8][9] to automatically classify post content in the database, so that each post is assigned to one of two classes: help offering or not. In this paper, we consider the value assigned to each post by this system as the post’s level of offering help. The length of post content serves as the level of user-generated content.

Different posting networks exist in this forum on the basis of users’ interests. Users who post consistently in one or several sub-boards are more easily recognized and, furthermore, receive positive feedback from other senior users. We propose a measurement of user 𝑖’s sub-board loyalty as follows, which is higher when user 𝑖 concentrates more on certain sub-boards.

$$\mathit{Subboardloyalty}_i = \sum_{j} \left( \ldots - \ldots \right)$$

Here the terms inside the sum are based on the percentage of user $i$’s posts in sub-board $j$.
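The input to this measure, the share of a user's posts in each sub-board, can be computed as sketched below. The second function is explicitly a stand-in: it uses a Herfindahl-style sum of squared shares, which also grows as posting concentrates on fewer sub-boards, but it is an assumption for illustration and not the loyalty formula defined in this paper. All function names are hypothetical.

```python
from collections import Counter


def subboard_shares(post_boards):
    """Map each sub-board to the share of the user's posts made there.

    post_boards: list of sub-board names, one entry per post by the user.
    The returned shares sum to 1.
    """
    if not post_boards:
        raise ValueError("user has no posts")
    counts = Counter(post_boards)
    total = sum(counts.values())
    return {board: n / total for board, n in counts.items()}


def concentration(shares):
    """Sum of squared shares (stand-in index, NOT the paper's formula).

    Equals 1.0 when all posts are in one sub-board and falls toward
    1 / (number of sub-boards) as posts spread out evenly.
    """
    return sum(s * s for s in shares.values())
```

For example, a user with two posts in one sub-board and one post in each of two others has shares 0.5, 0.25, and 0.25.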

Users belong to five different clubs, among which, the users in the highest club can assign score within the largest range to others and the users in the lowest club can’t assign any scores to others. According to this power hierarchy, we denote the highest club as level 4 and the lowest club as level 0.

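In addition, we analyze the online forum from a social network perspective. Between-ness centrality [10][11] quantifies the number of times a user serves as a bridge along the shortest path between two other users. Between-ness centrality is defined as

$$C_{\mathit{Betweenness}_i} = \sum_{s \neq i \neq t} \delta_{st}(i)$$

where $\delta_{st}(i)$ is the fraction of shortest paths between users $s$ and $t$ that pass through user $i$.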
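As an illustration of how such a centrality can be computed, the sketch below uses the well-known Brandes accumulation scheme on an undirected, unweighted graph. The graph representation and the function name are assumptions; the paper does not state how its interaction network was built or how the values in Table 1 were scaled, so this version returns raw, unnormalized counts.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized between-ness centrality for an undirected graph.

    adj: dict mapping each node to the set of its neighbours.
         Every node must appear as a key.
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma)
        # and recording each node's predecessors on those paths.
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair (s, t) was visited from both ends.
    return {v: c / 2.0 for v, c in score.items()}
```

In a three-user chain a, b, c, only b lies on the single shortest path between a and c, so b is the only user with a non-zero score.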

The descriptive analysis of the variables used in this model is presented in Table 1.

Table 1: Descriptive Analysis of Variables

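| Variable | Obs. | Mean | Std. Dev. | Min | Max |
|---|---|---|---|---|---|
| Avg. offerhelp | 1112 | -0.900171 | 0.013936 | -0.96455 | -0.82321 |
| Avg. post length | 1112 | 199.6682 | 150.9032 | 41 | 1913 |
| Sub-board loyalty | 1112 | 6.432228 | 4.383272 | 1.0608 | 22 |
| Between-ness centrality | 1112 | 8.26E-05 | 0.000304 | 0 | 0.00426 |
| Score per comment | 1112 | 0.452338 | 0.497947 | 0 | 1 |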

The correlation analysis between independent variables is presented in Table 2.

Table 2: Correlation Analysis of Variables

Between-ness centrality 0.8811 0.2607 0.3939 0.3472 0.0777 -0.0473 -0.0904 1

4. Analyses and Results

In this paper, we denote users whose score per comment is higher than the median score per comment across all users as the users who survive in the online forum, in terms of high reputation. The survival time is defined as the time between the date of a user’s first post and the date of the user’s last post. The analysis results are shown in Table 3.
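These two definitions translate directly into a per-user record. The sketch below is a minimal illustration: the input layout, the field names, and the choice of days as the time unit are assumptions, since the paper does not state the unit.

```python
from datetime import date
from statistics import median


def survival_records(users):
    """Build the survival time and high-reputation flag for each user.

    users: dict mapping user id to a tuple
           (first_post_date, last_post_date, score_per_comment).
    """
    cutoff = median(score for _, _, score in users.values())
    records = {}
    for uid, (first, last, score) in users.items():
        records[uid] = {
            # Time between first and last post, in days (assumed unit).
            "duration_days": (last - first).days,
            # Strictly above the median score per comment.
            "high_reputation": score > cutoff,
        }
    return records


# Hypothetical example data, not taken from the forum.
example = {
    "u1": (date(2010, 1, 1), date(2010, 1, 11), 1.0),
    "u2": (date(2010, 1, 1), date(2010, 3, 1), 2.0),
    "u3": (date(2010, 1, 1), date(2011, 1, 1), 3.0),
}
example_records = survival_records(example)
```

With these example scores the median is 2.0, so only `u3` is flagged as high reputation.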

In the Cox regression results, the coefficients of comment count, header post count, and sub-board loyalty are positive. These factors serve as risk factors for users to earn high reputation in the online forum. On the other hand, the coefficients of reply post count, club membership, the average level of help offering, the average length of post content, and between-ness centrality are negative. These factors serve as protective factors for users to earn high reputation in the online forum.
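For reference, this reading of the signs follows the standard form of the Cox proportional hazards model [7]. The notation below ($h_0$, $x_{ik}$, $\beta_k$) is generic textbook notation and is not taken from this paper.

```latex
h(t \mid x_i) = h_0(t)\,\exp\!\left(\beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip}\right)
```

Under this form, a positive $\beta_k$ multiplies the hazard by $\exp(\beta_k) > 1$ for each one-unit increase in covariate $k$, while a negative $\beta_k$ multiplies it by a factor below 1.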

In particular, our results indicate that users who open more threads tend to earn high reputation less easily, while users who reply more frequently tend to gain positive feedback from others in the forum. When we focus on post content, we find that users who reply to offer help to others can earn higher scores than others in the forum. In addition, writing posts in greater detail can also help authors gain positive feedback in the virtual community.

Table 3: Cox Regression Analysis

| VARIABLES | 50% | Z |
|---|---|---|
| Comment count | 0.00263 | 0.00362 |
| Header count | 0.00877*** | 0.00308 |
| Reply count | -0.00394*** | 0.000623 |
| Club membership | -0.0885 | 0.0647 |
| Avg. offerhelp | -5.169 | 3.278 |
| Avg. post length | -0.000234 | 0.000304 |
| Sub-board loyalty | 0.0303*** | 0.0101 |
| Between-ness centrality | -1,173* | 658.1 |
| Observations | 1,112 | 1,112 |

*** p<0.01, ** p<0.05, * p<0.1
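Because the variables sit on very different scales, a coefficient is often easier to read once it is exponentiated into a hazard ratio for a meaningful change in its variable. The sketch below does this for a few coefficients copied from the first numeric column of Table 3, using a one-standard-deviation change where Table 1 reports a standard deviation and a one-unit change otherwise. The function name and the choice of a one-standard-deviation change are illustrative assumptions.

```python
import math

# Coefficients copied from the first numeric column of Table 3.
coefficients = {
    "Header count": 0.00877,
    "Reply count": -0.00394,
    "Sub-board loyalty": 0.0303,
    "Between-ness centrality": -1173.0,
}

# Standard deviations copied from Table 1 (only reported for these two).
std_devs = {
    "Sub-board loyalty": 4.383272,
    "Between-ness centrality": 0.000304,
}


def hazard_ratio(beta, change=1.0):
    """Multiplicative change in the hazard when the covariate
    increases by `change`, in a proportional hazards model."""
    return math.exp(beta * change)


for name, beta in coefficients.items():
    change = std_devs.get(name, 1.0)
    ratio = hazard_ratio(beta, change)
    print(f"{name}: hazard ratio {ratio:.3f} for a change of {change}")
```

Scaling by the standard deviation matters most for between-ness centrality, whose observed values are all below 0.005, so a one-unit change is far outside the data.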

Users are also advised not to post in only a few sub-boards. Instead, they should extend their interests and write posts in other sub-boards so that they have the chance to interact with more users. From the social network perspective, users who act as bridges among others can earn higher reputation in the forum than other users.

Figure 1: Kaplan-Meier Survival Estimates

To validate the robustness of the regression results, we take club membership, which is not a significant factor, as an example and give the Kaplan-Meier survival estimates under different memberships. The Kaplan-Meier survival curve is defined as the probability of surviving in a given length of time while considering time in many small intervals [12]. The Cox regression result shows that focal users survive for a longer time, in terms of higher reputation, when they are in a higher club membership. This conclusion is also verified by the survival curve. Figure 1 shows the failure rates for users at different levels of club membership.

As we can see, users with a higher level of club membership tend to live longer in the online forum in terms of higher reputation.
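The product-limit idea behind the Kaplan-Meier curve [12] can be sketched in a few lines. This is a generic illustration rather than the code used to produce Figure 1; the function name and the input layout are assumptions.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Product-limit estimate of the survival function.

    durations: observed time for each subject.
    events:    True if the event occurred at that time,
               False if the subject was censored.
    Returns a list of (time, survival_probability) pairs,
    one per distinct event time, in increasing time order.
    """
    event_counts = Counter(t for t, e in zip(durations, events) if e)
    exit_counts = Counter(durations)
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d:
            # Multiply by the share of at-risk subjects surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Everyone observed at time t leaves the risk set afterwards.
        at_risk -= exit_counts[t]
    return curve
```

To compare groups, such as the club membership levels, the function would be called once per group on that group's durations and event flags.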

[Figure 1 plot. Title: Kaplan-Meier survival estimates. x-axis: analysis time (0 to 1500); y-axis ticks from 0.00 to 1.00; one curve each for clubmembership = 0, 1, 2, and 3.]


5. Conclusion and Future Work

In this paper, we study how users can earn high reputation in a hacker forum. We download raw data from a representative hacker forum and perform survival analysis using the Cox proportional hazards regression model. Our regression results suggest that users should reply with detailed posts that help others, so that they can earn positive feedback in the forum more easily. From the social network perspective, users are also advised to broaden their interests and to serve as bridges in user interactions in the virtual community. The current work can be extended in the following directions. First, our model will be verified on a larger database to check the robustness of our results. Second, we will further investigate how these risk factors and protective factors influence firms and organizations in business.

References

[1] G. Farrell and M. A. Riley, "Hackers Take $1 Billion a Year as Banks Blame Their Clients," Bloomberg, 5 August 2011. [Online]. Available: http://www.bloomberg.com/news/2011-08-04/hackers-take-1-billion-a-year-from-company-accounts-banks-won-t-indemnify.html. [Accessed 24 11 2013].

[2] Symantec, "2013 Norton Report: Cost per Cybercrime Victim Up 50 Percent," Symantec, Mountain View, Calif., 2013.

[3] T. J. Holt, "Hacks, cracks, and crime: An examination of subculture and social organization of computer hackers," University of Missouri-St. Louis, Ph.D Dissertation, 2005.

[4] V. Mookerjee, R. Mookerjee, A. Bensoussan and W. T. Yue, "When Hackers Talk: Managing Information Security Under Variable Attack Rates and Knowledge Dissemination," Information Systems Research, vol. 22, no. 3, pp. 606-623, 2011.

[5] "Wikipedia," [Online]. Available: http://en.wikipedia.org/wiki/Survival_analysis. [Accessed 24 11 2013].

[6] J. F. Lawless, Statistical Models and Methods for Lifetime Data, 2nd ed, Wiley-Interscience, 2002.

[7] D. R. Cox, "Regression models and life-tables," Journal of the Royal Statistical Society, Series B (Methodological), vol. 34, no. 2, pp. 187-220, 1972.

[8] T. Joachims, Learning to Classify Text Using Support Vector Machines-Methods, Theory and Algorithms, Kluwer Academic Publishers, 2002.

[9] M. F. Porter, "An Algorithm for Suffix Stripping," Program, pp. 130-137, 1980.

[10] K. Faust, "Centrality in Affiliation Networks," Social Networks, pp. 157-191, 1997.

[11] L. Freeman, "A set of measures of centrality based on betweenness," Sociometry, vol. 40, no. 1, pp. 35-41, 1977.

[12] D. G. Altman, "Analysis of survival times," in Practical Statistics for Medical Research, London: Chapman and Hall, 1992.
