4.4.2 Diffusion Evaluation
Owing to data permission constraints, our dataset contains only the 24 participants and their friends. Setting aside diffusion toward friends more than two directed hops away from the participants, we list our participants’ contagious potential within our dataset in Table 4.7. We not only evaluate the participants’ influence on this connected network but also predict who the potentially affected users are. Because both strong ties and weak ties are involved, a greater σ(m)(c) does not necessarily mean that user c could affect more users. The greater σ(m)(c) is, the more potential influence user c has. If user c has a larger ratio σ(m)(c) / (number of potentially affected users), user c has a better opportunity to affect others successfully.
However, a user c with a smaller ratio σ(m)(c) / (number of potentially affected users) is not negligible. Such a user still has an opportunity to affect others, for two reasons. On the one hand, because of the way Facebook works, once a piece of information receives a like or a comment, it is displayed at the top of not only the News Feed pages but also the Ticker bar. When interactions actually exist between user c and his or her friends, the information gets more exposure. On the other hand, the responding threshold depends on subjective factors that differ between users. In our method, the responding threshold is inherited when a piece of information is re-shared. In the real world, however, the responding threshold varies from user to user.
Figure 4.9: Real data confirm the diffusion path from viola to lun.
Table 4.7: Participants ordered by contagious potential in our dataset.
user σ(m)(user, 0.1) maximum cumulative number of potentially affected users
Figure 4.10: The expected information diffusion of case three.
4.5 Summary
In our experiments, we first validate our method by estimating tie strength and responding rate with real data. Secondly, we predict possible information delivering paths from individual responding rates, and real data confirm that the individual responding rate can indeed be used to infer information delivering paths. With the proposed method, we can select a set of possible information delivering paths out of all 9550 friendships. The resulting reduction in the effort of selecting an efficient target set is remarkable. In other words, we can predict the diffusion coverage of a piece of information from a specified individual. Through several cases, we show that information diffuses differently as the contagious user changes, since different users have different tie strengths with their friends and the characteristics of a scale-free network affect information diffusion. Thirdly, although the vulnerability of a scale-free network is rooted in its connectivity distribution, we observe a noticeable phenomenon: once network structure and the affinities of users are taken into account, the most connected users do not affect the whole network the most. Finally, we evaluate all our participants’ potential influence to indicate who is the best choice for disseminating information extensively on Facebook. We also provide indexes of who is the most powerful user for information diffusion in a social network: σ(m)(user, th) and the maximum cumulative number of potentially affected users.
Figure 4.11: Real data confirm the information delivering path from alphar to peter.
Figure 4.12: Different real data confirm that the information delivering path from alphar to peter exists steadily.
Figure 4.13: Attack Survivability ≠ Connectivity in Online Social Networks
Chapter 5 Discussion
In this chapter, we discuss different representations of tie strength from existing works. We also compare the proposed method with existing works. Then, we examine several phenomena with the proposed method.
5.1 Estimation
In this section, we compare the proposed tie strength with the definition used in [19] in terms of similarities and differences. We observe the habitual behaviours of Facebook users to infer the cause of users re-sharing the information on Facebook. In this thesis, we learn three factors of information propagation from the experiments: (1) trust in information sharers, (2) novelty of information, (3) exposure to information.
The first factor, trust in information sharers, is the strength of ties between dyads. By the definitions of “homophily” and tie strength, a strong tie is unquestionably influential at the individual level. Also, according to Granovetter [6], Manuel E. Sosa [7] and Bakshy et al. [19], weak ties serve as critical bridges for diffusion in a general tie model. Because of the second factor, novelty of information, some weak ties are responsible for information dissemination.
Users with weak ties have more diverse social networks that provide access to novel information. To increase the weighting of weak ties relative to [19], we focus on the number of posts that receive a response from others rather than on the number of responses. Instead of measuring how deeply a user is attracted to topics posted by other users, we emphasize how diversely a user is interested in topics posted by other users. This represents the ‘intensity’ and ‘reciprocal services’ of tie strength in another way. The third factor, exposure to information, has already been demonstrated in large-scale experiments by Facebook, Inc., as shown in Figure 5.1. The sharing latency after a friend has already shared the information is most pronounced within one day and remains noticeable within one week. More than a week after the information has been shared, the probability of sharing is small enough that our method can tolerate ignoring it. Since we have no access to the time at which a user was exposed to the information, we treat these two scenarios as one.
Figure 5.1: (a) The difference in sharing time between a user and their first sharing friend. (b) The difference between the time at which a user was first exposed (or was to be exposed) to the link and the time at which they shared. Adopted from [19].
Hence, we devise the responding rate to account for the refresh frequency and the attractiveness of information. Following Granovetter’s definitions, we divide the dataset by week to reveal the frequency and the ‘time’ of interactions between individuals. In the proposed method, we do not consider the ‘intimacy’ of tie strength, because we are concerned only with whether information is delivered, regardless of whether the information is positive or negative.
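The distinction drawn above between counting responses and counting posts that received a response can be illustrated with a minimal sketch. The event format, the function name `count_responses_and_posts`, and the users `ann` and `bob` are hypothetical and introduced only for illustration; this is not the thesis's actual estimation code.

```python
from collections import defaultdict


def count_responses_and_posts(events):
    """Summarise responses on one user's wall.

    `events` is an iterable of (post_id, responder) pairs, one pair per
    like or comment (an assumed, simplified input format).
    Returns two dicts keyed by responder:
      - total number of responses (depth of interest),
      - number of distinct posts responded to (diversity of interest).
    """
    responses = defaultdict(int)
    posts = defaultdict(set)
    for post_id, responder in events:
        responses[responder] += 1
        posts[responder].add(post_id)
    distinct_posts = {responder: len(ids) for responder, ids in posts.items()}
    return dict(responses), distinct_posts


# Hypothetical data: ann responds four times to a single post,
# bob responds once to each of three different posts.
example_events = [
    ("p1", "ann"), ("p1", "ann"), ("p1", "ann"), ("p1", "ann"),
    ("p2", "bob"), ("p3", "bob"), ("p4", "bob"),
]
example_responses, example_posts = count_responses_and_posts(example_events)
```

In this made-up example, ann has more responses than bob, but bob responds to more distinct posts, so a measure based on posts with a response would weight bob's tie more heavily than a measure based on raw response counts.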
We compare the tie strength estimated by the proposed method with that estimated by [19], as illustrated in Figure 5.2. The figure shows the tie strengths connecting our participants viola and alphar with their friends. The tie strength estimated by the proposed method is ordered by comments received. While, overall, the normalized tie strength decays similarly to comments received, the proposed method can identify the weak ties, as defined by [19], which are critical for information dissemination. We adjust the corresponding weights for tie strengths. We also down-weight ties belonging to users who are deeply interested in a few specific topics rather than in diverse topics. Users attracted by diverse topics are more likely to raise the chance that information is exposed, and thus propagated, within a broader context.
Figure 5.2: Tie strength estimated by the proposed method compared with that of the previous method.
For a specific diffusion purpose, we do not consider “homophily” in our method. For example, viola and gk are family. They have few interactions on Facebook although they are closer to each other than to others in the real world. Besides, viola and jay are in a relationship. They also have few interactions on Facebook even though they are closer than others in the real world. By contrast, viola is one of lun’s students. Although they differ in ‘social distance’, the interactions between viola and lun are apparent.
Bakshy et al. [19] defined ‘weak’ ties as ties to friends with no interaction. However, according to Granovetter’s definition, a tie between individuals with no interaction is called an ‘absent’ tie, even if they “know” each other’s names. Since no interaction exists, the proposed method does not describe the influence of absent ties. In Figure 5.2, we do not list the friends who had no interaction with our participant viola during the two time sets.
5.2 Comparison
In this section, we compare the existing works to the proposed method for information propagation. In recent years, new research studies have put forward different issues of tie strength in a social network. Reviewing the previous works, we contrast the existing works mentioned in Chapter 2 with the proposed method in Table 5.1. We describe more as follows.
To offer a finer representation of relationships than a binary friendship indicator, Xiang [20] proposed a homophily-based model using the similarities of user profiles and dyadic directional interactions. Xiang used a Gaussian prior probability to explain users’ relationships. The study showed higher autocorrelation of profile attributes for relationship weights than for binary friendship.
To map social media data to real tie strength, Gilbert followed Marsden’s definition of tie strength, using the similarities of user profiles and dyadic undirected interactions. Gilbert used OLS regression to approximate the subjective tie strength of participants. The study extracted 15 significant factors out of 74 variables for tie strength.
Table 5.1: Existing works compared with the proposed approach.
To explore the role of social networks in information diffusion, Bakshy et al. designed several homophily-based experiments using information sharing events. Bakshy et al. used raw data to inspect the relations between user tie strength and information propagation. The study demonstrated that not only are strong ties influential, but weak ties are also responsible for the dissemination of novel information.
Moreover, building on the research of Bakshy et al., we draw attention to tie strength for information diffusion in social networks. We adopt Granovetter’s definition of tie strength using dyadic directional interactions. We also estimate tie strength by OLS regression. Our study predicts possible information delivering paths and evaluates possible information diffusion.
However, there have been few attempts to establish a direct relation between information diffusion and tie strength. With Gilbert’s method, we cannot analyze information delivering paths, since undirected interactions carry no direction. With Xiang’s method, we cannot perform information diffusion analysis either, because similarities of user profiles show a lower positive correlation than is currently believed. Although Bakshy et al. performed large-scale experiments on diffusion analysis, that study regarded tie strength as one factor and gave a coarse view of diffusion in a social network. By contrast, we describe the dyads’ relations for information diffusion at a fine granularity, using not only tie strength but also responding rate. We also inspect diffusion in a social network in a finer way, for example through information delivering paths.
The proposed method provides an assessment of general information diffusion. Moreover, it does not require so many permissions that it would disturb our participants. Regarding network security, the information may contain malicious links. If a malicious phishing link is disguised successfully, the proposed method treats the potentially affected users as hidden victims. Regarding privacy and safety, the information may reveal personal messages. The proposed method regards the potentially affected users as likely viewers and probable spreaders.
5.3 Issues
We record the average number of responses to each post from participants. The participant who has the most responses is defined as the most influential user. We compare the target set with the real influential set: 3 out of 5 users match between the two sets. We define the predicting error as follows.
predicting error = |average responses − predicting responses| (5.1)
Here, predicting responses is equivalent to σ(m)(user, 0.1). The average predicting error is 3. The maximal predicting error is 17. The minimal predicting error is lower than 1. In this thesis, we define ‘distributional similarity’ as the coverage ratio of active participant friends to active friends. For example, jay has the largest distributional similarity. He has 3 active participant friends among his 9 active friends, so the distributional similarity of jay is 0.33. viola has 17 active participant friends among her 74 active friends, so the distributional similarity of viola is 0.23.
If a user has a larger distributional similarity, the user gets a smaller predicting error. In our dataset, when the distributional similarity of user c is larger than 20%, the predicting error of user c is lower than 2. Otherwise, when the distributional similarity of user c is lower than 1%, the predicting error of user c is between 3 and 17. Take max for instance: his distributional similarity is 20% and his predicting error is 1.15.
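The two quantities used in this section, the predicting error of Equation (5.1) and the distributional similarity, can be written as a minimal sketch. The function names are ours and chosen only for illustration; the counts for jay and viola are the ones reported in the text, while the numbers passed to `predicting_error` are made up.

```python
def predicting_error(average_responses, predicting_responses):
    """Equation (5.1): |average responses - predicting responses|."""
    return abs(average_responses - predicting_responses)


def distributional_similarity(active_participant_friends, active_friends):
    """Ratio of active participant friends to active friends."""
    if active_friends <= 0:
        raise ValueError("active_friends must be positive")
    return active_participant_friends / active_friends


# Counts reported in the text for jay (3 of 9) and viola (17 of 74).
jay_similarity = round(distributional_similarity(3, 9), 2)
viola_similarity = round(distributional_similarity(17, 74), 2)

# Made-up values, only to show the call; not taken from the dataset.
example_error = predicting_error(4.0, 2.5)
```

Rounded to two decimals, the sketch reproduces the values 0.33 for jay and 0.23 for viola stated above.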
Reviewing our experiments, we examine several circumstances in this section. Compared with the evaluation outcomes in Table 4.7, the remaining results do not clearly distinguish the participants from one another. Taking the results with th = 0 and th = 0.7 in Table 5.2 as examples, if th is too small, almost all participants pass the responding threshold and spread the information across the whole connected network. Conversely, if th is too large, almost all participants fail to spread the information through the connected network. We display the other remaining results in Appendix A.
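The effect of the responding threshold described above can be illustrated with a simplified sketch. It assumes that information spreads hop by hop along directed edges, that a friend j of user i is activated when the responding rate r(ij) reaches th, and that for th = 0 only strictly positive rates count. The function name `potentially_affected`, the dictionary format and the toy rates are hypothetical; the sketch only counts potentially affected users and does not compute σ(m).

```python
from collections import deque


def potentially_affected(rates, source, th, max_hops):
    """Return the users reachable from `source` within `max_hops` hops.

    `rates[i][j]` is an assumed responding rate r(ij) on the directed
    edge i -> j. An edge passes when r(ij) >= th, except that for
    th = 0 the rate must be strictly positive (assumed rule).
    """
    affected = set()
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        user, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for friend, rate in rates.get(user, {}).items():
            passes = rate > 0 if th == 0 else rate >= th
            if passes and friend not in seen:
                seen.add(friend)
                affected.add(friend)
                frontier.append((friend, hops + 1))
    return affected


# Toy network with made-up responding rates.
example_rates = {
    "a": {"b": 0.5, "c": 0.05},
    "b": {"d": 0.2},
    "c": {"d": 0.0},
}
low = potentially_affected(example_rates, "a", 0.0, 2)
mid = potentially_affected(example_rates, "a", 0.1, 2)
high = potentially_affected(example_rates, "a", 0.7, 2)
```

On this toy network, th = 0 reaches every other user, th = 0.7 reaches none, and th = 0.1 separates the edges, which mirrors the behaviour reported for the real data.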
By contrast, the diffusion evaluation results with th = 0.1 clearly distinguish the participants from one another. The evaluation results with th = 0.1 are distributed more diversely than the other results. Taking max, ryan and bzero in Table 4.7 as examples, these participants have different σ(m) because of their tie strengths with friends, although they have the same maximum cumulative number of potentially affected users. Also, taking benben and mao as an example, even though both participants have similar σ(m), benben’s tie strengths with friends are considerably stronger than mao’s, which leads to clearly different maximum cumulative numbers of potentially affected users. Hence, although we run our diffusion evaluation experiments with responding threshold th = {0, 0.1, 0.2, ..., 0.7}, we display only the evaluation results with th = 0.1 in section 4.4.2.
Table 5.2: Diffusion evaluation with th = {0, 0.7} ordered by th = 0.1. (In case th = 0, users get activated if r(ij) > 0, but not if r(ij) = 0.)
user σ(m)(user, 0) maximum cumulative number of
Since the invitations to our experiments were sent both to users who use Facebook frequently and to users who use it infrequently, the behaviour of inactive participants is hard to predict. In our dataset, it is common for participants who infrequently visit Facebook to post only one piece of information on their wall every week or every few weeks. However, under the proposed method, the absence of interactions for a few weeks is treated as sparsity of training data in our dataset. Thus, the participants who infrequently visit Facebook get regression coefficients α, β with large variance due to insufficient data. The cases of participants who use Facebook infrequently lead to varied outcomes. Sparsity also causes a large predicting error, even when the user has a larger distributional similarity.
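The link between sparse data and unstable coefficients can be illustrated with a generic single-predictor least-squares fit. This is only a sketch under the assumption of a model y = α + βx; the regression actually used in the proposed method may have a different form, and the function name `simple_ols` and the toy data are ours.

```python
import math


def simple_ols(xs, ys):
    """Fit y = alpha + beta * x by ordinary least squares.

    Returns (alpha, beta, se_beta), where se_beta is the usual standard
    error of the slope. At least three points are required so that the
    residual variance has n - 2 > 0 degrees of freedom.
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three paired observations")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = y_mean - beta * x_mean
    # Residual sum of squares and standard error of the slope.
    ssr = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))
    se_beta = math.sqrt(ssr / (n - 2) / sxx)
    return alpha, beta, se_beta


# Toy data lying exactly on y = 1 + 2x, so the fit is exact.
alpha_hat, beta_hat, se_hat = simple_ols([0, 1, 2, 3], [1, 3, 5, 7])
```

Because the residual variance is divided by n - 2 and by the spread of the predictor, a participant with only a handful of weekly observations will typically get a larger standard error for β than a frequent user with similar noise.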
During the time sets of our experiments, Facebook first released the Timeline beta on Sept. 22nd, 2011. Timeline is a new kind of user profile. The way Timeline displays stories changes user behaviour. It is much easier to get information through Timeline than through the older user profile. Timeline was updated on Dec. 6th, 2011 and rolled out in New Zealand. Some of our participants got Timeline for their profiles before or during the experiments, but the others did not. Hence, the inconsistency in how user walls are displayed may affect the results of the proposed method.
Also, Ticker was introduced in Sept. 2011. Users of the English version got this product on their Facebook first. The unsynchronized introduction of Ticker to users may affect the proposed method as well. Observations on new Facebook products deserve attention.
Even though Facebook strengthens security through partnerships in an attempt to protect its 900 million users from spam and malicious content, malicious Web links still pop up on the social network [27]. Besides, many issues arise from the popularity of social networks, such as marketing, privacy and safety issues. The proposed method, which delineates how information spreads on Facebook, is noteworthy even though less than 4 percent of shared content is spam now.
The proposed method provides a preliminary idea of information diffusion in a social network for the condition that information is shared on user i’s wall by user i. Future work will hopefully examine the condition that information is shared on user i’s wall by user j. An additional interesting avenue of investigation might be to consider the condition that the information is shared as a comment. However, the details of EdgeRank, which Facebook uses to determine which stories appear in the News Feed, have not been announced. Also, FQL currently provides no entry for the sharer of each link. We can only access the like data as a list of the viewing user’s friends who like the post. Future studies should be aware of these data limitations.
Chapter 6 Conclusion
In this research, we propose a method to measure the tie strength of dyads in terms of strength of interaction rather than relationship. We define a responding rate to represent the opportunity for information propagation. We further predict the information delivering path of a wall post. We also model information diffusion for 1-hop dissemination and m-hop spreading.
Subsequently, we conduct experiments to estimate the tie strength and responding rate for our participants by analyzing six months of data. By verifying the estimation against the characteristics of user behaviours, we build the connections between the participants. With the proposed method, we can select an efficient target set from 9550 relationships. The resulting reduction in effort is noticeable. In other words, we can predict the diffusion coverage of a piece of information from a specified individual. After analyzing the tie strengths and responding rates in the real data, we verify the existence of the information delivering path predicted by our method. According to our prediction, we find that attack survivability is not equivalent to connectivity when human behaviours are included in a scale-free network, especially when the information contains malicious links. Furthermore, we provide a preliminary model to evaluate information diffusion within a broad context.
One area for future research is the attractiveness of information topics at a finer granularity. In this method, we have so far used one of the three elements of EdgeRank. Adding the other two elements to the diffusion model is clearly required in future work, but this is an exciting first step toward the analysis of information diffusion.
Appendix A
As mentioned in section 4.3.2, the remaining verification results for tie strengths and responding rates are in Table A.1 and Table A.2, respectively. We find that results based on fewer interactions fluctuate more. Larger fluctuation does not imply lower accuracy of our method. Infrequent users use Facebook irregularly, so their behaviours are much harder to predict. We will consider another model to describe infrequent users in future work.
Table A.1: Users ordered by potential influence toward their friends, by tie strength, as tie strength fluctuates over time. (T1:[Sept.-Nov.] T2:[Dec.-Feb.])
user integrated tie strength
As mentioned in section 5.3, the remaining diffusion evaluations are in Table A.3 and Table A.4. We find that the influence of participants with high σ(1) and low integrated responding rates is limited. For instance, mao and stone are among the top 10 participants for influencing their friends. Since their integrated responding rates are lower than those who are
Table A.2: Users ordered by potential influence toward their friends, by tie strength, as responding rate fluctuates over time. (T1:[Sept.-Nov.] T2:[Dec.-Feb.])