Comparisons with EABIF Network - PLPF: 探勘網路行為履歷於連線行為預測

In this section, we compare our method, PLPF, with the most similar work driven by an information-flow scheme, EABIF Network, in both effectiveness and efficiency.

We first compare precision under the normal operation of a link prediction system based on PLPF, which makes predictions from LastDay Profiles built on the previous day. We then fold the freshest data, recorded just before the query, into the profiles to form Instant Profiles, and illustrate the difference between using LastDay Profiles and Instant Profiles. Finally, we compare the computation time of a single query for each method, notwithstanding the large I/O cost that each method incurs.

5.2.1 Precision Comparison

The most similar work [20] models an information flow network called EABIF Network and uses several propagation models to compute the propagation probabilities for predicting the late adopters (possible successors). Table 5.1 lists the MAP scores of EABIF Network for its two most useful propagation models, each with five parameter values. The propagation model Summation to M Steps (StS(M)) clearly performs better than Exponential Weighted Summation with weight β (EWS(β)) for every parameter value. Moreover, the best parameters of both propagation models (β = 1.0 and M = 1) show that short propagation paths are much more important than long propagation paths for predicting new links. In the rest of this section we use the propagation model StS(1) in EABIF Network when comparing precision with our methods.

Table 5.1: MAP scores for different propagation models used in EABIF Network

Propagation Model                                MAP Score
Exponential Weighted Summation with β = 1.0      0.208167
Exponential Weighted Summation with β = 2.0      0.199635
Exponential Weighted Summation with β = 3.0      0.197339
Exponential Weighted Summation with β = 4.0      0.196231
Exponential Weighted Summation with β = 5.0      0.195724
Summation to M Step with M = 1                   0.229598
Summation to M Step with M = 2                   0.228144
Summation to M Step with M = 3                   0.225049
Summation to M Step with M = 4                   0.221385
Summation to M Step with M = 5                   0.218087
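For readers unfamiliar with the metric, the following is a minimal sketch of how a MAP score of the kind reported in Table 5.1 is commonly computed. It assumes that each query (a newly appeared site) yields a ranked list of predicted users and a set of users who actually connected; the function names and the normalization by the number of actual adopters are illustrative assumptions, not details taken from the evaluation code behind the table.

```python
from typing import List, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked prediction list.

    Assumption: the score is normalized by the number of relevant
    (actually connecting) users, which is the common textbook variant.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, user in enumerate(ranked, start=1):
        if user in relevant:
            hits += 1
            total += hits / rank  # precision at the rank of this hit
    return total / len(relevant)


def mean_average_precision(queries: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of the average precision over all queries."""
    if not queries:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)
```

A ranking that places the true adopters near the top scores close to 1, while one that places them late in the list scores close to 0.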

Table 5.2: MAP scores for each method

Method                      MAP Score    Improvement (ratio)
EABIF Network (StS(1))      0.229598     1
PLPF (Surprising-only)      0.279514     1.217406
PLPF (Static-only)          0.242124     1.054056
PLPF (Both)                 0.248188     1.080968

We compare our method with EABIF Network in Table 5.2. Since we have two types of features (static and surprising features, for static and surprising interests respectively), we compare the results obtained with each type alone and with both types together. PLPF performs better than EABIF Network in all three combinations of the two feature types, and PLPF with surprising features only gives the best MAP score, 0.279514, which is the largest improvement (21.7%) over EABIF Network.
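The 21.7% figure follows from dividing the MAP score of PLPF (Surprising-only) by that of the EABIF Network baseline. A small sketch of that arithmetic is given here; the helper name `improvement_ratio` is hypothetical and only the two MAP values come from Table 5.2.

```python
def improvement_ratio(map_method: float, map_baseline: float) -> float:
    """Ratio of a method's MAP score to the baseline's MAP score."""
    if map_baseline <= 0:
        raise ValueError("baseline MAP must be positive")
    return map_method / map_baseline


# MAP scores taken from Table 5.2.
ratio = improvement_ratio(0.279514, 0.229598)
percent_gain = (ratio - 1.0) * 100.0  # about 1.2174, i.e. a gain of roughly 21.7%
```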

Figure 5.1: Precision and recall at rank k for each method: (a) precision at rank k; (b) recall at rank k.

Besides the MAP scores, the overall precision and recall at rank k are illustrated in Figure 5.1 for k = 10, 20, 30, 40, 50. As with the MAP scores, PLPF with surprising features only has the best precision and recall among all methods at every rank. All methods have higher precision when k is smaller, except EABIF Network using the propagation model EWS(1.0). In addition, PLPF with surprising features only gives a better precision value even when k is large (k = 50). This shows that surprising features (interests) are very useful for predicting new links on the Internet.
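Precision and recall at rank k can be sketched as follows for a single query. This is an illustrative implementation of the standard definitions; how the per-query values are aggregated into the "overall" curves of Figure 5.1 is not specified in the text, so no aggregation is shown, and the function names are assumptions.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k predicted users who actually connected."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for user in ranked[:k] if user in relevant)
    return hits / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the actually connecting users found in the top k."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0
    hits = sum(1 for user in ranked[:k] if user in relevant)
    return hits / len(relevant)
```

Precision usually falls as k grows because later ranks contain fewer true adopters, while recall can only stay the same or rise.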

5.2.2 Instant Profiles

Figure 5.2: The history data used in LastDay Profiles and Instant Profiles.

All of the comparisons above use history data that extends at most to the last day before the new site appears. Here we compare precision using the most recent history data, which also includes the connections recorded in the connection log just before the new site appears. The two types of history data are illustrated in Figure 5.2. We call the former a LastDay Profile, since its history data extends at most to the last day before the new site appears, and we call the latter, whose history data extends to the most recent connections just before the new site appears, an Instant Profile. Note, however, that building Instant Profiles requires direct access to the newest raw connection log and takes a long time. Hence a query using Instant Profiles does not return an instant result, and one has to wait a long time for the query to complete.
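The difference between the two history cut-offs can be sketched as below. The `Connection` record layout and both function names are hypothetical, and the sketch assumes that "the last day before the new site appears" means everything logged before midnight at the start of the day on which the site is first seen.

```python
from datetime import datetime, time
from typing import List, NamedTuple


class Connection(NamedTuple):
    """Hypothetical connection-log record (field names are assumptions)."""
    timestamp: datetime
    user: str
    site: str


def lastday_history(log: List[Connection], first_seen: datetime) -> List[Connection]:
    """History up to the end of the day before the new site appears."""
    cutoff = datetime.combine(first_seen.date(), time.min)  # midnight of that day
    return [c for c in log if c.timestamp < cutoff]


def instant_history(log: List[Connection], first_seen: datetime) -> List[Connection]:
    """History up to the moment just before the new site appears."""
    return [c for c in log if c.timestamp < first_seen]
```

The Instant history is always a superset of the LastDay history; the extra records are the same-day connections that have to be read from the raw log at query time.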

Figure 5.3: MAP scores when using LastDay profiles and Instant profiles.

The MAP scores obtained with the different profiles for each method are illustrated in Figure 5.3. The MAP score clearly gets higher when more recent (fresh) data are included (using Instant Profiles), except for EABIF Network with EWS(1.0). This again supports the claim that the surprising features, which capture surprising (short-term) interests, are much more important for predicting links to a newly appeared site.

5.2.3 Computation Time Comparison

Figure 5.4: Computation time cost for a single query.

To evaluate efficiency, we compare the computation time against the length of the history data used as input. Since our connection logs start from 2010-09-14, we increase the length of the history data in half-month steps until 2011-01-15. In Figure 5.4 we compare, for each method, the total computation time of a query, from reading the connection logs to deriving the prediction result. The computation time of PLPF is slightly less than that of EABIF Network, and PLPF with surprising features only (with the smaller sliding window size s_s = 7) gives the smallest computation time, which is less than one second. PLPF with static features (with the larger sliding window size s_l = 35) has a relatively consistent time cost of about 10 to 15 seconds once the history data covers more than 35 days (after 2010-10-31). EABIF Network has an increasing computation time because it records the adoption time of each user-site pair, and the number of user-site pairs it records grows with the length of the history data. In addition, all methods have a slightly higher computation cost when using the history data from 2010-09-14 to 2010-10-31. The reason is that the number of connections made during those days (2010-09-22 to 2010-10-30) is much larger than on other days, which prolongs the computation for every method.
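One way to read the flat PLPF curves is that a sliding window bounds how many days of history a query has to touch. The sketch below illustrates that reading only; the helper names are hypothetical, and whether PLPF reads exactly this window is an assumption rather than something stated in the text.

```python
from datetime import date, timedelta


def window_start(history_start: date, query_day: date, window_days: int) -> date:
    """First day inside a sliding window that ends just before query_day."""
    return max(history_start, query_day - timedelta(days=window_days))


def days_in_window(history_start: date, query_day: date, window_days: int) -> int:
    """Number of days of history that fall inside the sliding window."""
    return (query_day - window_start(history_start, query_day, window_days)).days


LOG_START = date(2010, 9, 14)  # first day of the collected connection logs

# With the larger window (35 days) the amount of data stops growing once
# the history is longer than the window, e.g. from 2010-10-31 onwards.
short_history = days_in_window(LOG_START, date(2010, 9, 30), 35)
long_history = days_in_window(LOG_START, date(2011, 1, 15), 35)
```

Under this reading, a method that keeps a record for every user-site pair ever seen has no such bound, which is consistent with the growing cost reported for EABIF Network.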
