Various Roles Analysis

However, because of the anti-monotone property, the longer a path grows, the lower its probability becomes. Under the same minimum support, a large number of longer paths would be pruned. Hence, we relax the minimum support as in Equation (4.1) when we generate longer paths. For example, if the minimum support is set to 0.1 at the beginning, it remains 0.1 when the system generates influence edges and 2-influence paths, becomes 0.01 when it generates 3-influence paths, and

Figure 4.2: Execution time

Figure 4.3: Number of influence paths found

Table 4.1: Number of influence paths of different lengths

influence threshold. To prune paths with low probability, we set α = 5 to discover influence and identify the various roles.

However, the longer a path grows, the lower its probability becomes. Under the same minimum support, a large number of longer paths would be pruned. Hence, we loosen the minimum support heuristically as in Equation (4.1).

$$\text{minimum support} = \frac{\mathit{minSup}}{10^{\,\mathrm{round}(\text{length of path}/2) - 1}} \qquad (4.1)$$

Table 4.1 lists the influence paths found by APPM for the various roles analysis, with α = 5 and an initial minimum support of 0.1.
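To make the relaxation concrete, the following Python sketch evaluates Equation (4.1) for paths of length 1 to 6. It assumes that an influence edge has length 1 and that `round` rounds halves up, which is the reading that reproduces the worked example above (0.1 for influence edges and 2-influence paths, 0.01 for 3-influence paths). The function name `relaxed_min_support` is hypothetical.

```python
def relaxed_min_support(min_sup: float, path_length: int) -> float:
    """Minimum support for a path with `path_length` edges, following Eq. (4.1).

    Assumption: round() in Eq. (4.1) rounds halves up. Python's built-in round()
    uses round-half-to-even (round(2.5) == 2), so it is deliberately not used here.
    """
    if path_length < 1:
        raise ValueError("path_length must be at least 1")
    # For positive integers, round-half-up of length / 2 equals ceil(length / 2),
    # which is (length + 1) // 2 in integer arithmetic.
    exponent = (path_length + 1) // 2 - 1
    return min_sup / 10 ** exponent


if __name__ == "__main__":
    # Print the relaxed threshold for each path length, starting from minSup = 0.1.
    for length in range(1, 7):
        print(f"length {length}: minimum support = {relaxed_min_support(0.1, length):g}")
```

Using integer arithmetic for the exponent avoids depending on how a particular language rounds exact halves, which is where implementations of Equation (4.1) would otherwise diverge.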

4.3 Various Roles Analysis

In this section, we discover various roles and analyze them based on the influence paths. Since the number of influence paths is much smaller for long paths, in Figure 4.4 we apply different thresholds β on the number of leading paths and no threshold on the leading degree. Because the two 6-chains are led by different users, we cannot find a 6-chain leader under these thresholds.

Figure 4.4: Chain leaders without the leading-degree constraint (number of leaders versus length of chain, for β ≥ 2, β ≥ 3, β ≥ 4 and β ≥ 5)

Setting the threshold according to chain length, namely β = 5 for 1-chains, β = 4 for 2-chains, β = 3 for 3- and 4-chains, β = 2 for 5-chains and β = 1 for 6-chains, and adding a threshold on the leading degree, we compare different leading-degree constraints in Figure 4.5.
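The per-length thresholds can be applied as a simple filter. The sketch below assumes that the leading number L_n and the leading degree D_n of every user have already been computed for one chain length (their computation is not shown here), and that a user qualifies when both values reach the thresholds, as the "β >=" and "δ >=" legends of the figures suggest. The function `chain_leaders` and its input format are hypothetical; the example data are the 5-chain values reported in Table 4.4.

```python
from typing import Dict, List, Tuple

# Chain-length-specific thresholds on the leading number, as listed in the text.
BETA_BY_CHAIN_LENGTH: Dict[int, int] = {1: 5, 2: 4, 3: 3, 4: 3, 5: 2, 6: 1}


def chain_leaders(stats: Dict[str, Tuple[int, float]], chain_length: int,
                  delta: float) -> List[str]:
    """Return the users accepted as leaders of `chain_length`-chains.

    `stats` maps a user id to a precomputed (L_n, D_n) pair for this chain length
    (hypothetical input format). A user qualifies when L_n >= beta and D_n >= delta.
    """
    beta = BETA_BY_CHAIN_LENGTH[chain_length]
    return sorted(user for user, (l_n, d_n) in stats.items()
                  if l_n >= beta and d_n >= delta)


# (L_n, D_n) values of the users listed in Table 4.4.
five_chain_stats: Dict[str, Tuple[int, float]] = {
    "Princeton": (4, 1.0), "Zheng": (2, 1.0), "Hung": (2, 0.66),
    "Jay": (1, 0.5), "Wei": (1, 0.5), "Ham": (1, 0.5),
}

if __name__ == "__main__":
    # Sweep the leading-degree thresholds used in Figure 4.5.
    for delta in (0.25, 0.5, 0.66, 0.75):
        print(delta, chain_leaders(five_chain_stats, 5, delta))
```

Keeping β in a lookup table keyed by chain length makes it easy to swap in the alternative strict thresholds used later for Figure 4.6 without touching the filter itself.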

4.3.1 Leader Validation

In this section, we measure precision, recall and F-measure to validate the discovered leaders, and NDCG to validate the ranking quality of the different methods. We compare our proposed algorithm with the following four methods on the same social graph.

Figure 4.5: Chain leaders under different leading degrees (number of leaders versus length of chain, for δ ≥ 0.25, δ ≥ 0.5, δ ≥ 0.66 and δ ≥ 0.75)

Following [10], the first method is Share. Using the corresponding Facebook functions, we adapt the same definition to obtain the Like and Comment methods. To avoid noisy actions, such as advertisements or unrelated conversations in comments, and to detect influential actions better, we also asked two people to select actions by their own judgement.

1. (Share): A leader is someone who influenced a number of actions, each shared by a number of users on their walls, during a period of time.

2. (Comment): A leader is someone who influenced a number of actions, each receiving a number of comments, during a period of time.

3. (Like): A leader is someone who influenced a number of actions, each receiving a number of likes, during a period of time.

4. (Selection): A leader is someone who influenced a number of actions selected by people during a period of time.

To obtain the ground truth, three users from the university group were asked to mark our dataset, labeling a user true if he or she is an ideal leader and false otherwise, and to rank the leaders they had chosen. To be cautious, we then took the intersection of the three answers, i.e., the users marked true by all three annotators. Finally, ten leaders were selected from the dataset as the ground truth and ranked from 1 to 10. Since sharing was performed the least and liking the most in our data set, we set a different threshold for each method. With the same two-month time threshold, we required more than 5 actions shared by 5 users in method Share, more than 10 actions commented on by 10 users in method Comment, and 15 actions liked by 10 users in method Like. Because our 3-chain leaders perform better than leaders of other chain lengths, we use them to present our results.
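One possible reading of the count-based baselines is sketched below: a user is a leader when, within the time window, at least `min_actions` of the user's actions each received a share, comment or like from at least `min_users` distinct users. The record type `WallAction`, the function `count_based_leaders`, the use of `>=` for "more than" and the approximation of two months as 60 days are all assumptions for illustration, not the exact rules used in the experiments.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Set, Tuple


@dataclass
class WallAction:
    """One action posted on a user's wall (hypothetical record format)."""
    owner: str
    time: datetime
    # Distinct users who shared, commented on or liked this action (one baseline at a time).
    responders: Set[str] = field(default_factory=set)


# Thresholds from the text as (min_actions, min_users); ">=" is assumed for "more than".
THRESHOLDS: Dict[str, Tuple[int, int]] = {"Share": (5, 5), "Comment": (10, 10), "Like": (15, 10)}
TWO_MONTHS = timedelta(days=60)  # assumption: "two months" approximated as 60 days


def count_based_leaders(actions: Iterable[WallAction], start: datetime, window: timedelta,
                        min_actions: int, min_users: int) -> List[str]:
    """Return users with at least `min_actions` actions in [start, start + window),
    each of which drew responses from at least `min_users` distinct users."""
    end = start + window
    qualifying: Dict[str, int] = {}
    for action in actions:
        if start <= action.time < end and len(action.responders) >= min_users:
            qualifying[action.owner] = qualifying.get(action.owner, 0) + 1
    return sorted(user for user, count in qualifying.items() if count >= min_actions)
```

For the Share baseline, for example, one would unpack `min_actions, min_users = THRESHOLDS["Share"]` and call `count_based_leaders(share_actions, start, TWO_MONTHS, min_actions, min_users)` on the share records only. Such pure counting ignores when and through whom an action spread, which is the weakness discussed in the results below.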

Validation Measure

In the following, we use several measures to validate our approach. To validate the relevance of our results to the ground truth, we use three measures: precision, recall and F-measure.

1. (Precision): the percentage of retrieved leaders that are relevant to the ground truth, as in Equation (4.2).

$$\text{precision} = \frac{|\{\text{leaders in ground truth}\} \cap \{\text{leaders retrieved}\}|}{|\{\text{leaders retrieved}\}|} \qquad (4.2)$$

2. (Recall): the percentage of the leaders relevant to the ground truth that are successfully retrieved, as in Equation (4.3).

$$\text{recall} = \frac{|\{\text{leaders in ground truth}\} \cap \{\text{leaders retrieved}\}|}{|\{\text{leaders in ground truth}\}|} \qquad (4.3)$$

3. (F-measure): the harmonic mean of precision and recall, as in Equation (4.4).

$$F = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \qquad (4.4)$$

Table 4.2: Comparison in precision, recall and F-measure

Measure      Like    Comment   Share   Selection   Chain leader
Precision    0.154   0.177     0.143   0.6         0.667
Recall       0.2     0.3       0.2     0.9         1
F-measure    0.174   0.22      0.167   0.72        0.8

For leader ranking validation, we use NDCG as in Equation (4.5), where IDCG is the DCG of the ground-truth ranking and DCG is defined in Equation (4.6). In Equation (4.6), rel_i is the score of the leader at position i, ranging from 1 to 10; users who are not leaders score 0.

$$NDCG@n = \frac{DCG@n}{IDCG@n} \qquad (4.5)$$

$$DCG@n = rel_1 + \sum_{i=2}^{n} \frac{rel_i}{\log_2 i} \qquad (4.6)$$
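The validation measures can be computed as in the Python sketch below. It implements precision, recall, F-measure and NDCG directly from their definitions; for DCG it uses the form rel_1 + Σ_{i=2}^{n} rel_i / log₂ i, in which the first position is not discounted, since log₂ 1 = 0 would otherwise cause a division by zero. The function names, the user ids and the ground-truth scores (10 for the No. 1 leader down to 1 for No. 10) are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple


def precision_recall_f(retrieved: Set[str], ground_truth: Set[str]) -> Tuple[float, float, float]:
    """Precision, recall and their harmonic mean (F-measure)."""
    hits = len(retrieved & ground_truth)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(ground_truth) if ground_truth else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return precision, recall, f


def dcg_at_n(gains: List[float], n: int) -> float:
    """DCG@n with the first position undiscounted: rel_1 + sum_{i=2..n} rel_i / log2(i)."""
    return sum(rel if i == 1 else rel / math.log2(i)
               for i, rel in enumerate(gains[:n], start=1))


def ndcg_at_n(ranking: List[str], scores: Dict[str, float], n: int) -> float:
    """DCG of `ranking` divided by the DCG of the ideal (ground-truth) order.

    `scores` holds the ground-truth score of each leader; any other user scores 0."""
    gains = [scores.get(user, 0.0) for user in ranking]
    idcg = dcg_at_n(sorted(scores.values(), reverse=True), n)
    return dcg_at_n(gains, n) / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    truth = {f"u{r}": 11 - r for r in range(1, 11)}  # No. 1 scores 10, ..., No. 10 scores 1
    retrieved = set(truth) | {"x1", "x2", "x3", "x4", "x5"}  # 10 hits among 15 retrieved
    print(precision_recall_f(retrieved, set(truth)))
    print(ndcg_at_n([f"u{r}" for r in range(10, 0, -1)], truth, 10))  # reversed order
```

For instance, retrieving 15 users that include all 10 ground-truth leaders gives precision 10/15 ≈ 0.667, recall 1 and F-measure 0.8.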

Result of Relevance

Table 4.2 compares the four methods and our approach in terms of precision, recall and F-measure. Chain leader and Selection are clearly better than Like, Comment and Share. Compared with Selection, which is based on manually selected actions, chain leader still wins by a narrow margin. This can be explained as follows:

1. Without timestamp details, merely counting comments, likes and shares is not sufficient to discover real leaders.

2. Because users' walls contain so many actions, noise may affect the result, and simple counting cannot extract the influential actions.

Figure 4.6: Chain leader comparison (precision, recall and F-measure for 2-chain to 6-chain leaders)

Under the thresholds L_n > 5 for 2-chains, L_n > 4 for 3-chains, L_n > 3 for 4-chains, L_n > 2 for 5-chains, L_n > 1 for 6-chains, and D_n > 0.66 for all chains, Figure 4.6 validates the other chain lengths and shows how the frequent-path concept in APPM, on which chain-leader discovery is based, reduces noise and extracts better paths from all actions.

Only two 6-chain leaders were found, so recall is poor but precision is high. As for why short-chain leaders perform better, we believe that leaders in a social network are people who always interact directly, which is why shorter chains influence people more directly than long chains.

Result of Leader Ranking

For the ranking comparison, under the same thresholds, we rank users by their number of actions in the four methods and by their leading number in our approach. We choose two short chains and one long chain to present our ranking quality. Furthermore, we combine our 2- to 6-chain leader rankings: in each chain, the No. 1 user is labeled 10 and users outside the top ten are labeled 0. We then simply add up each user's scores. Table 4.3 lists the results in terms of NDCG.
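The score combination across chains can be sketched as follows. The text fixes only the endpoints (No. 1 is labeled 10 and users outside the top ten are labeled 0); the linear scheme in between (No. 2 gets 9, down to No. 10 getting 1) and breaking ties by user id are assumptions, and `position_score` and `combine_chain_rankings` are hypothetical helpers.

```python
from collections import defaultdict
from typing import Dict, List


def position_score(position: int) -> int:
    """Label for a 1-based position in one chain's ranking.

    Hypothetical linear scheme: No. 1 -> 10, No. 2 -> 9, ..., No. 10 -> 1, else 0.
    """
    return 11 - position if 1 <= position <= 10 else 0


def combine_chain_rankings(rankings: Dict[int, List[str]]) -> List[str]:
    """Sum each user's labels over all chain lengths and sort by the total.

    `rankings` maps a chain length (2..6) to that chain's leader ranking, best first.
    Ties are broken alphabetically by user id so the output is deterministic.
    """
    totals: Dict[str, int] = defaultdict(int)
    for ranking in rankings.values():
        for position, user in enumerate(ranking, start=1):
            totals[user] += position_score(position)
    return sorted(totals, key=lambda user: (-totals[user], user))
```

Adding labels directly, as the text describes, rewards users who appear in several chain lengths, so a user who is No. 2 in two chains (9 + 9) can outrank one who is No. 1 in only one (10).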

Our combined approach outperforms both the short-chain and the long-chain rankings in NDCG@10. However, the middle-chain leaders, of both 3- and 4-chains, do not perform well in NDCG@5. Furthermore, the 2-chain leader ranking outperforms the other single-chain rankings in NDCG because it represents the most direct form of relation in a social network. For the reasons mentioned above, Like, Comment and Share still perform poorly in ranking. Because few users used the Share function on Facebook, Share may perform worse than Like and Comment.

Table 4.3: Comparison in terms of NDCG@5 and NDCG@10

(a) Other methods

           Like    Comment   Share   Selection
NDCG@5     0.416   0.416     0.290   0.657
NDCG@10    0.438   0.440     0.286   0.660

(b) Chain leader

           2-chain   3-chain   5-chain   Combine
NDCG@5     0.810     0.425     0.685     0.788
NDCG@10    0.770     0.669     0.696     0.795

Here are some case studies from our data set. One user, ranked No. 4 by our combined method and also in the ground truth, is the vice-captain of the school badminton team. The two strict 6-chain leaders in the social graph are No. 1 and No. 9 in the ground truth. One is the captain of the department basketball team and is also very active in school; he is No. 2 in our combined method. The other is a senior student who holds a higher hierarchical position in real life but a lower rank in the ground truth; he is No. 1 in our 6-chain leader ranking. This may explain why a long-chain leader may hold a high hierarchical level in reality but.

To analyze the roles further, Table 4.4 lists every user who leads a 5-chain. For privacy, we replace each user id with a common name; every name corresponds to a real id in our experiment.

The actions column shows the total number of actions each user made, and the L_n column shows the number of 5-chains the user leads. The first and second users also lead 6-chains. We also identify the most loyal follower of each user; however, some followers of the last two users have the same loyalty. Checking against reality, we find that each loyal follower and his leader are very close friends.

Table 4.4: Leaders in 5-chains

User id     Actions   L_n   D_n    Role in 6-chain   Strict leader   Loyal follower
Princeton   18        4     1      leader            Y               HAM
Zheng       15        2     1      leader            N               Hohan
Hung        23        2     0.66   none              N               Cheng
Jay         39        1     0.5    follower          N               none
Wei         9         1     0.5    follower          N               Irene, Zheng, Kai
Ham         5         1     0.5    follower          N               PAN

One of these users, Jay, who leads a 5-chain, is the vice-captain of the school badminton team; he is also a leader in shorter chains when no leading-degree constraint is applied. However, because he also follows many activities in shorter chains, his leading degree is not high enough for him to be a leader there. Furthermore, his followers are not very loyal to him. The only strict 6-chain leader in the table, Princeton, is the captain of the department basketball team and is also very active in his class. Since his characteristics match the concept of a leader in real life, his role as a leader is also discovered on the social network by our framework. The discussion above validates not only our main-path framework but also the important roles we define. Together, these two methods can help us understand social networks and even test viral marketing strategies.


Mining Various Roles in Social Networks Using Information Streams - NCCU Academic Hub (pp. 40-48)
