
5.3 Parameters in PLPF

5.3.3 Parameters for Both Features

Table 5.3: The four combinations of the parameters β, γ, s_s and s_l that give the best similarities using static features and surprising features

Combination   Static Features (γ, s_l)   Surprising Features (β, s_s)
A             (0.95, 35)                 (0.005, 7)
B             (0.95, 35)                 (0.035, 1)
C             (0, 35)                    (0.005, 7)
D             (0, 35)                    (0.035, 1)

Tuning the static-feature similarity and the surprising-feature similarity separately yielded two candidate parameter pairs for each: (γ, s_l) for static features and (β, s_s) for surprising features. Crossing them gives 2 × 2 = 4 combinations of the parameters β, γ, s_s and s_l, which are listed in Table 5.3.
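The cross product behind Table 5.3 can be sketched in a few lines of Python. The variable names are illustrative and do not come from the original implementation; the values are those listed in Table 5.3.

```python
from itertools import product

# Candidate (gamma, s_l) pairs for static features, taken from Table 5.3.
static_pairs = [(0.95, 35), (0, 35)]
# Candidate (beta, s_s) pairs for surprising features, taken from Table 5.3.
surprising_pairs = [(0.005, 7), (0.035, 1)]

# Cross the two lists; the labels A-D follow the row order of Table 5.3.
combinations = {
    label: {"gamma": g, "s_l": sl, "beta": b, "s_s": ss}
    for label, ((g, sl), (b, ss)) in zip("ABCD", product(static_pairs, surprising_pairs))
}

for label, params in combinations.items():
    print(label, params)
```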

We vary α from 0 to 1 in steps of 0.05; the resulting MAP scores for the four parameter combinations are shown in Figure 5.9. Although using surprising features only (α = 1) clearly gives the best MAP scores, we still need a parameter set that uses all features in order to compare their influence in this experiment. We therefore choose α = 0.85 with parameter combination B, which has the highest MAP score among the settings that use both static and surprising features. The final parameter sets for PLPF are listed in Table 5.4.
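As a rough illustration of this tuning step, the sketch below assumes that α blends the two similarities linearly, with α = 0 meaning static features only and α = 1 meaning surprising features only. This reading is consistent with Table 5.4, but the exact combination formula is not restated in this section. The function names and the toy scores are hypothetical placeholders, not experimental results.

```python
def alpha_grid(step=0.05):
    """Return 0, step, 2*step, ..., 1 without floating-point drift."""
    n = round(1 / step)
    return [round(i * step, 10) for i in range(n + 1)]


def combined_similarity(sim_static, sim_surprising, alpha):
    """Assumed linear blend: alpha weights the surprising-feature similarity."""
    return alpha * sim_surprising + (1 - alpha) * sim_static


def best_mixed_setting(map_scores):
    """Pick the (combination, alpha) with the highest MAP among 0 < alpha < 1.

    map_scores maps (combination_label, alpha) -> MAP score.
    """
    mixed = {k: v for k, v in map_scores.items() if 0 < k[1] < 1}
    return max(mixed, key=mixed.get)


# Toy, made-up scores used only to exercise the selection logic.
toy_scores = {("A", 0.5): 0.10, ("B", 0.85): 0.30, ("B", 1.0): 0.40, ("C", 0.0): 0.05}
print(len(alpha_grid()))               # number of alpha values tried
print(best_mixed_setting(toy_scores))  # endpoints alpha = 0 and alpha = 1 are excluded
```

Excluding the endpoints mirrors the reasoning above: the best overall score may sit at α = 1, but the selected setting must actually use both feature types.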

Figure 5.9: Tuning α to combine both similarities using static features and surprising features.

Table 5.4: The three parameter sets for PLPF: using both static and surprising features, static features only, and surprising features only. X marks a parameter that the variant does not use.

Parameter   PLPF    PLPF (Static-only)   PLPF (Surprising-only)
α           0.85    0                    1
β           0.035   X                    0.035
γ           0.95    0.95                 X
s_s         1       X                    1
s_l         35      35                   X

Chapter 6 Conclusion and Future Work

In this work, we propose PLPF, a framework for profile-based link prediction. PLPF consists of two components: 1) an off-line component, which converts the connection log into an evolving graph of consecutive graph snapshots and periodically constructs or updates user profiles with features extracted from that graph, and 2) an on-line component, which uses the profiles to predict the top-k possible links to a given site, from k different users who have never connected to that site before. Four types of connection features are used in the profiles to capture users’ connection behavior. In addition to the connection count, which is widely used in traditional methods, we introduce the connection frequency, the newly connected sites, and the common connection order on the newly connected sites; these can only be derived from the evolving-graph view of the connection network.
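To make the off-line component more concrete, here is a minimal sketch of how a connection log might be cut into snapshots and summarized into a per-user profile. Everything in it is an assumption made for illustration: the log format (timestamp, user, site), the fixed snapshot length, and in particular the simplified definitions of connection count, connection frequency and newly connected sites. The common-connection-order feature is omitted.

```python
from collections import defaultdict


def build_snapshots(log, snapshot_len):
    """Group (timestamp, user, site) records into consecutive snapshots.

    Snapshot i holds records with i*snapshot_len <= timestamp < (i+1)*snapshot_len.
    Returns a list of dicts: user -> list of sites connected in that snapshot.
    """
    if not log:
        return []
    n = max(t for t, _, _ in log) // snapshot_len + 1
    snapshots = [defaultdict(list) for _ in range(n)]
    for t, user, site in sorted(log):
        snapshots[t // snapshot_len][user].append(site)
    return snapshots


def build_profile(snapshots, user):
    """Summarize one user's behavior over the snapshots (assumed definitions)."""
    count = defaultdict(int)      # total connections per site
    frequency = defaultdict(int)  # number of snapshots in which the site appears
    for snap in snapshots:
        sites = snap.get(user, [])
        for site in sites:
            count[site] += 1
        for site in set(sites):
            frequency[site] += 1
    # Sites seen in the latest snapshot but in no earlier one.
    earlier = {s for snap in snapshots[:-1] for s in snap.get(user, [])}
    latest = snapshots[-1].get(user, []) if snapshots else []
    newly = [s for s in dict.fromkeys(latest) if s not in earlier]
    return {"count": dict(count), "frequency": dict(frequency), "new_sites": newly}


# Hypothetical log with integer timestamps.
log = [(0, "u1", "a.com"), (1, "u1", "a.com"), (5, "u1", "b.com"), (6, "u1", "a.com")]
profile = build_profile(build_snapshots(log, snapshot_len=5), "u1")
print(profile)
```

Keeping the snapshot construction separate from the profile summary reflects the periodic update described above: new snapshots can be appended and the profile rebuilt without touching the raw log again.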

In the experiments, we compare our method with the state-of-the-art method, EABIF Network, proposed by Song et al. [20], on a real dataset of Internet connections. In terms of effectiveness, PLPF outperforms EABIF Network when using either static features or surprising features, and PLPF with surprising features only gives the largest improvement, 21.7%, over EABIF Network with its best propagation model. In terms of efficiency, the computation time of PLPF stays stable, whereas that of EABIF Network keeps increasing because it retains old information that should fade away as time evolves. Comparing PLPF across feature types, PLPF with surprising features only always performs better than PLPF with static features only. This suggests that users connect to new sites on the Internet based on their short-term interests (represented by surprising features) much more than on their long-term interests (represented by static features).

Future work is to dynamically adjust the lengths of the sliding windows for each user. In this work, we fix two sliding windows of different lengths (s_l, s_s) for all users, capturing static features as long-term interests and surprising features as short-term interests. However, the window length that best captures long-term or short-term interests may differ from user to user, and may even change over time. Capturing each user’s interests in the appropriate sliding window would reflect that user’s connection behavior more precisely, and hence enrich the profile content and improve the prediction results.
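The fixed-window setup, and the per-user variation proposed as future work, can be sketched as follows. The sketch assumes that window lengths are measured in snapshots and that a window simply selects the most recent snapshots. The per-user override table is a hypothetical illustration of the idea, not a method evaluated in this work. The default lengths are the values of s_l and s_s from Table 5.4.

```python
def window(snapshots, length):
    """Return the most recent `length` snapshots (all of them if fewer exist)."""
    return snapshots[-length:] if length > 0 else []


def split_windows(snapshots, user, s_l=35, s_s=1, per_user=None):
    """Select the long and short windows for one user.

    per_user optionally maps user -> (s_l, s_s), overriding the global lengths.
    """
    if per_user and user in per_user:
        s_l, s_s = per_user[user]
    return window(snapshots, s_l), window(snapshots, s_s)


snapshots = list(range(40))  # stand-ins for 40 consecutive graph snapshots
long_w, short_w = split_windows(snapshots, "u1")
print(len(long_w), len(short_w))

# Hypothetical per-user lengths: u2's interests are assumed to shift faster.
long_w2, short_w2 = split_windows(snapshots, "u2", per_user={"u2": (10, 3)})
print(len(long_w2), short_w2)
```

How the per-user lengths would be chosen or updated over time is exactly the open question; the sketch only shows where such values would plug in.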

Bibliography

[1] Evrim Acar, Daniel M. Dunlavy, and Tamara G. Kolda. Link prediction on evolving data using matrix and tensor factorizations. Data Mining Workshops, International Conference on, 0:262–269, 2009.

[2] Lada A. Adamic and Eytan Adar. Friends and neighbors on the web. Social Networks, 25:211–230, 2001.

[3] Lars Backstrom and Jure Leskovec. Supervised random walks: predicting and recommending links in social networks. In Proceedings of the fourth ACM international conference on Web search and data mining, WSDM ’11, pages 635–644, New York, NY, USA, 2011. ACM.

[4] Mohamad Badra, Samer El-Sawda, and Ibrahim Hajjeh. Phishing attacks and solutions. In Proceedings of the 3rd international conference on Mobile multimedia communications, MobiMedia ’07, pages 42:1–42:6, ICST, Brussels, Belgium, 2007. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering).

[5] Aaron Clauset, Cristopher Moore, and M. E. J. Newman. Hierarchical structure and the prediction of missing links in networks. Nature, 453(7191):98–101, May 2008.

[6] Dongsheng Duan, Yuhua Li, Yanan Jin, and Zhengding Lu. Community mining on dynamic weighted directed graphs. In Proceeding of the 1st ACM international workshop on Complex networks meet information & knowledge management, CNIKM ’09, pages 11–18, New York, NY, USA, 2009. ACM.

[7] Daniel M. Dunlavy, Tamara G. Kolda, and Evrim Acar. Temporal link prediction using matrix and tensor factorizations. ACM Trans. Knowl. Discov. Data, 5:10:1–10:27, February 2011.

[8] Lise Getoor and Christopher P. Diehl. Link mining: a survey. SIGKDD Explor. Newsl., 7:3–12, December 2005.

[9] Anna Goldenberg, Jeremy Kubica, and Paul Komarek. A comparison of statistical and machine learning algorithms on the task of link completion. In KDD Workshop on Link Analysis for Detecting Complex Behavior, page 8, 2003.

[10] Mohammad Al Hasan and Mohammed J. Zaki. A survey of link prediction in social networks. In Charu C. Aggarwal, editor, Social Network Data Analytics, pages 243–275. Springer US, 2011.

[11] Zan Huang, Xin Li, and Hsinchun Chen. Link prediction approach to collaborative filtering. In Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries, JCDL ’05, pages 141–142, New York, NY, USA, 2005. ACM.

[12] Zan Huang and Dennis K. J. Lin. The time-series link prediction problem with applications in communication surveillance. INFORMS J. on Computing, 21:286–303, April 2009.

[13] Zan Huang and D. D. Zeng. A link prediction approach to anomalous email detection. In Systems, Man and Cybernetics, 2006. SMC ’06. IEEE International Conference on, volume 2, pages 1131–1136, Oct. 2006.

[14] David Liben-Nowell and Jon Kleinberg. The link prediction problem for social networks. In Proceedings of the twelfth international conference on Information and knowledge management, CIKM ’03, pages 556–559, New York, NY, USA, 2003. ACM.

[15] Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, New York, NY, USA, 2008.

[16] Galileo Mark Namata, Hossam Sharara, and Lise Getoor. A survey of link mining tasks for analyzing noisy and incomplete networks. In Jiawei Han, Philip S. Yu, and Christos Faloutsos, editors, Link Mining: Models, Algorithms, and Applications. Springer, 2010.

[17] Joshua O’Madadhain, Jon Hutchins, and Padhraic Smyth. Prediction and ranking algorithms for event-based network data. SIGKDD Explor. Newsl., 7:23–30, December 2005.

[18] Alexandrin Popescul, Rin Popescul, and Lyle H. Ungar. Statistical relational learning for link prediction, 2003.

[19] J. Scripps, Pang-Ning Tan, Feilong Chen, and A.-H. Esfahanian. A matrix alignment approach for link prediction. In Pattern Recognition, 2008. ICPR 2008. 19th International Conference on, pages 1–4, Dec. 2008.

[20] Xiaodan Song, Belle L. Tseng, Ching-Yung Lin, and Ming-Ting Sun. Personalized recommendation driven by information flow. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR ’06, pages 509–516, New York, NY, USA, 2006. ACM.

[21] Pang-Ning Tan, Michael Steinbach, and Vipin Kumar. Introduction to Data Mining (First Edition). Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2005.

[22] Ben Taskar, Ming-Fai Wong, Pieter Abbeel, and Daphne Koller. Link prediction in relational data. In Neural Information Processing Systems, 2003.

[23] Liang Xiang, Quan Yuan, Shiwan Zhao, Li Chen, Xiatian Zhang, Qing Yang, and Jimeng Sun. Temporal recommendation on graphs via long- and short-term preference fusion. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’10, pages 723–732, New York, NY, USA, 2010. ACM.

[24] Haiyuan Yu, Alberto Paccanaro, Valery Trifonov, and Mark Gerstein. Predicting interactions in protein networks by completing defective cliques. Bioinformatics, 22:823–829, 2006.
