EXPERIMENTS - Influence Maximization in Labeled Social Networks

In this section, we conduct a series of experiments to evaluate the effectiveness and efficiency of the methods for the proposed labeled influence maximization problem. We use the Internet Movie Database (IMDb) to construct the labeled social network. We take the movies released in 1994 and 1995 and treat the actors and actresses who appeared in them as the nodes of the network. Two nodes are linked if they have worked together on at least one movie. In addition, each movie has a set of categories.

For example, the movie Toy Story has the categories animation, adventure, comedy, family, and fantasy. The labels associated with each node are derived from the categories of the movies in which he or she was involved.

For each node, we take the most frequent category among his or her movies as the label. For example, the label of the actor Jim Carrey is comedy, which is the most frequent category among the movies he was involved in. The resulting labeled social network consists of 6,079 nodes and 120,610 edges. Figure 4 shows the number of actors/actresses for each label. There are 12 labels in total; the label sport has the fewest actors/actresses and the label drama has the most.
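The construction described above can be sketched as follows. This is a minimal illustration, not the code used in the experiments: the movie records are made-up toy data, the function name `build_labeled_network` is hypothetical, and the alphabetical tie-breaking rule is our assumption, since the text does not say how ties between equally frequent categories are resolved.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records (title, cast, categories); not real IMDb data.
movies = [
    ("Movie A", ["alice", "bob"], ["comedy", "family"]),
    ("Movie B", ["alice", "carol"], ["comedy", "drama"]),
    ("Movie C", ["bob", "carol", "dave"], ["drama"]),
]

def build_labeled_network(movies):
    """Return (adjacency, labels) for the co-appearance network."""
    adjacency = {}
    category_counts = {}
    for _title, cast, categories in movies:
        people = sorted(set(cast))
        for person in people:
            adjacency.setdefault(person, set())
            # Count every category of every movie the person was involved in.
            category_counts.setdefault(person, Counter()).update(categories)
        # Link every pair of people who worked on the same movie.
        for u, v in combinations(people, 2):
            adjacency[u].add(v)
            adjacency[v].add(u)
    labels = {}
    for person, counts in category_counts.items():
        best = max(counts.values())
        # Assumption: ties are broken alphabetically.
        labels[person] = min(c for c, n in counts.items() if n == best)
    return adjacency, labels

adjacency, labels = build_labeled_network(movies)
num_edges = sum(len(nbrs) for nbrs in adjacency.values()) // 2
print(len(adjacency), num_edges)  # 4 5
print(labels["alice"], labels["carol"])  # comedy drama
```

On the toy data, "alice" appears in two comedy movies and is therefore labeled comedy, mirroring the Jim Carrey example.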

Figure 4. The number of actors/actresses of all the labels in the IMDb labeled social network.

We proposed six algorithms for the labeled influence maximization problem: the Labeled General Greedy, Labeled CELF Greedy, Labeled New Greedy, Labeled Degree Discount, Proximity Discount, and Maximum Coverage. In this report, we execute and report the performance of only the last four algorithms, because the expected running time of the first two is too long for us to obtain results (more than one month). We measure effectiveness by the labeled influence spread (i.e., the total profit value produced by the seed nodes) and efficiency by the running time in seconds. We report the results as the number of seed nodes increases from 1 to 20. For the simulations using the independent cascade model, we set every edge influence probability to 0.05.
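To make the effectiveness measure concrete, the following is a minimal sketch of how the labeled influence spread of a given seed set could be estimated by Monte Carlo simulation of the independent cascade model. The function names, the `weights` dictionary (label to profit weight), the number of runs, and the choice to count the seed nodes themselves toward the profit are our assumptions for illustration; this is not the implementation used in the experiments.

```python
import random

def simulate_ic(adjacency, seeds, p=0.05, rng=random):
    """Run one independent cascade and return the set of activated nodes."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adjacency.get(u, ()):
                # A newly activated node gets one chance per inactive neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active

def labeled_spread(adjacency, labels, seeds, weights, p=0.05, runs=1000, seed=0):
    """Average total profit of the activated nodes over `runs` cascades."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        active = simulate_ic(adjacency, seeds, p, rng)
        # Nodes whose label is not targeted contribute no profit.
        total += sum(weights.get(labels.get(v), 0.0) for v in active)
    return total / runs

# Hypothetical toy path a - b - c.
toy_adjacency = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
toy_labels = {"a": "comedy", "b": "drama", "c": "comedy"}
print(labeled_spread(toy_adjacency, toy_labels, ["a"], {"comedy": 1.0}, p=0.05))
```

With p = 0.05, most cascades on the toy path stop at the seed, so the estimate stays close to the profit of the seed alone.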

The evaluation plan is divided into three parts according to the given targeted labels and their profit weights: (1) a single targeted label, (2) multiple targeted labels with equal profit weights, and (3) multiple targeted labels with different profit weights. Finally, we show the time efficiency (in seconds) of the four abovementioned methods. We present the results for each part in the following.

6.1 Single Targeted Label

Figures 5 and 6 show the effectiveness for the targeted labels drama and comedy, respectively. In general, the Proximity Discount algorithm outperforms the Labeled New Greedy when the number of seed nodes is low. The Maximum Coverage shows similar trends. When the number of seed nodes exceeds about 3 to 6, the Labeled New Greedy gradually becomes slightly better than the other three.

However, as we will elaborate in the discussion of time efficiency, the execution time of the Labeled New Greedy is infeasible for making marketing strategies online. On the other hand, the Labeled Degree Discount yields the lowest total profit when the targeted label is drama. We believe this is because nodes with the drama label dominate the network and tend to cluster together, and the Labeled Degree Discount fails to capture this kind of distribution. For the comedy label, which is spread across the network, the Labeled Degree Discount performs close to the other three.

Figure 5. The effectiveness for the targeted label drama.

Figure 6. The effectiveness for the targeted label comedy.

6.2 Multiple Targeted Labels with Equal Profit Weights

Next, we present the results when the online queries contain multiple targeted labels with equal profit weights. Figures 7 and 8 show the effectiveness for queries consisting of the two targeted labels {comedy, biography} and {comedy, thriller}, respectively, with equal weights. Again, the Proximity Discount outperforms the other three when the number of seed nodes is small (about 1 to 6). As the number of seeds increases further, the Maximum Coverage becomes the better one. In general, the Labeled Degree Discount also performs well.

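Figure 7. The effectiveness for the targeted labels comedy and biography with equal profit weights.

Figure 8. The effectiveness for the targeted labels comedy and thriller with equal profit weights.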

We further examine how the proposed methods perform when all the labels are targeted. In that case, the problem reduces to the original influence maximization problem. The experimental result in Figure 9 shows that the Labeled New Greedy outperforms the other three methods regardless of the number of seed nodes, while the Labeled Degree Discount performs worst.

However, since the running time of the Labeled New Greedy is infeasible for real-time querying, as will be shown in Figure 12, we suggest using the Proximity Discount when the number of seeds is low and the Maximum Coverage when the number of seeds is high.

Figure 9. The effectiveness when all labels are targeted and each label's profit weight is set to 1.

6.3 Multiple Targeted Labels with Different Profit Weights

In the third part, the online queries contain multiple targeted labels with different profit weights. Figure 10 shows the results when the targeted labels are drama and comedy with profit weights of 1 and 3 respectively, while Figure 11 presents the effectiveness when the targeted labels are drama, comedy, and thriller with profit weights of 1, 3, and 5. In general, the Labeled New Greedy again outperforms the other three, though its execution time is infeasible for online marketing analytics (as shown in Figure 12). The more feasible choice is the Proximity Discount method, which not only earns higher total profits but also allows real-time querying. The Labeled Degree Discount performs relatively worse under such varying profit weights.
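As a small worked illustration of how the three query types differ, the sketch below computes the total profit of one hypothetical set of activated nodes under a single targeted label, equal profit weights, and the different profit weights used for Figure 11. The helper `total_profit` and the activated labels are our assumptions for illustration only.

```python
def total_profit(activated_labels, weights):
    """Sum the profit weights of activated nodes; untargeted labels earn 0."""
    return sum(weights.get(label, 0) for label in activated_labels)

# Hypothetical labels of five activated nodes.
activated = ["drama", "comedy", "thriller", "comedy", "sport"]

single = {"comedy": 1}                                   # one targeted label
equal = {"comedy": 1, "thriller": 1}                     # equal profit weights
different = {"drama": 1, "comedy": 3, "thriller": 5}     # weights 1, 3, and 5

print(total_profit(activated, single))     # 2
print(total_profit(activated, equal))      # 3
print(total_profit(activated, different))  # 12
```

The same activated set is worth 2, 3, or 12 depending on the query, which is why a seed set that is good for one query need not be good for another.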

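Figure 10. The effectiveness for the targeted labels drama and comedy with profit weights of 1 and 3.

Figure 11. The effectiveness for the targeted labels drama, comedy, and thriller with profit weights of 1, 3, and 5.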

6.4 Time Efficiency

Figure 12 presents the execution time (in seconds) of the four methods. The results are obtained for the case in which the only targeted label is comedy with a single profit weight, since the other cases show the same trends. The Labeled Degree Discount is the fastest, needing no more than one second, because it considers only one-step neighbors. The time efficiency of the Proximity Discount and Maximum Coverage is acceptable, and both also show promising effectiveness in the different cases discussed above. The Labeled New Greedy performs well in some cases, but its running time is infeasible for online marketing analytics. Note that the Labeled General Greedy and Labeled CELF Greedy are not reported here because their execution takes too long for us to complete, and they are thus not applicable to real-time interactive analytics for marketers.
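A timing measurement of this kind can be sketched as below. The harness records the wall-clock time of a seed-selection routine as the number of seeds grows from 1 to 20. The selector `top_degree_seeds` is a hypothetical stand-in that simply picks the highest-degree nodes; it is not one of the four evaluated methods, and the ring graph is toy data.

```python
import time

def top_degree_seeds(adjacency, k):
    """Hypothetical stand-in selector: the k highest-degree nodes."""
    ranked = sorted(adjacency, key=lambda v: (-len(adjacency[v]), v))
    return ranked[:k]

def time_selector(selector, adjacency, max_seeds=20):
    """Return (k, number of seeds returned, seconds) for k = 1..max_seeds."""
    timings = []
    for k in range(1, max_seeds + 1):
        start = time.perf_counter()
        seeds = selector(adjacency, k)
        elapsed = time.perf_counter() - start
        timings.append((k, len(seeds), elapsed))
    return timings

# Hypothetical toy graph: a ring of 30 nodes.
ring = {i: {(i - 1) % 30, (i + 1) % 30} for i in range(30)}

for k, n_seeds, seconds in time_selector(top_degree_seeds, ring):
    print(f"k={k:2d} seeds={n_seeds:2d} time={seconds:.6f}s")
```

Any of the seed-selection methods could be passed in place of the stand-in, provided it accepts the adjacency structure and the number of seeds.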

Figure 12. The time efficiency (in seconds) for the targeted label comedy.
