 1. availability      6. lack       11. permutation-based  16. model
 2. www               7. amethyst   12. shadow             17. gaussian
 3. character         8. priority   13. wikipedia          18. system
 4. technique         9. book       14. topicnets          19. k-way
 5. re-examination   10. function   15. simplex            20. system-throttling

Table 4.3: Top 20 Words Learned from the FM Model with Textual Information from Dataset I.
term frequency. Note that we use only the title words of the papers as the textual information; the resulting vocabulary sizes of Dataset I and Dataset II are 4,057 and 50,059, respectively. Because the algorithms implemented in libFM are randomized, the values reported for the FM models are average scores over 20 experiments for Dataset I; for Dataset II, only 5 experiments are conducted because of the computational cost.
As shown in Table 4.2, among the four baseline methods, #Citations achieves the highest values of the two evaluation metrics with respect to the MAS gold standard, whereas PageRank performs best with respect to h-index. Under the MAS gold standard, the FMs both without and with texts outperform all of the baselines; in addition, almost all of these improvements over the four baselines are statistically significant at the 0.05 level. With the inclusion of the coauthor papers (Dataset II), the FM model with textual information achieves the best performance, i.e., ρ = 0.601 and τ = 0.463. With respect to h-index, the performance gains of the FM models with textual information are statistically significant compared with all four baselines. From the experimental results, we observe that incorporating the supplementary textual information greatly improves the performance, which confirms that textual information is beneficial for modeling social influence. The top 10 authors ranked by our best model (FM with textual information on Dataset II) are listed in Table 4.1; the top 20 words are listed in Table 4.3.
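The two evaluation metrics and the averaging over randomized runs can be illustrated with a minimal sketch. It assumes the tie-free tau-a form of Kendall's τ and uses made-up scores for five hypothetical authors; the function names and numbers are illustrative only and are not taken from the experiments reported here.

```python
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical data: gold-standard scores for five authors and the scores
# predicted in three randomized runs (illustrative numbers only).
gold = [50, 40, 30, 20, 10]
runs = [
    [0.9, 0.8, 0.7, 0.6, 0.5],
    [0.9, 0.7, 0.8, 0.6, 0.5],
    [0.8, 0.9, 0.7, 0.5, 0.6],
]
mean_rho = mean(spearman_rho(gold, run) for run in runs)
mean_tau = mean(kendall_tau(gold, run) for run in runs)
print(f"mean rho = {mean_rho:.3f}, mean tau = {mean_tau:.3f}")
```

Averaging the per-run coefficients, rather than averaging the predicted scores first, keeps each run's ranking intact; which of the two the reported numbers correspond to is an assumption of this sketch.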
4.3 Discussion
Our results show that the FM-based framework with supplementary information is an efficient way to predict a leader in a social network. First, we propose an influence matrix to represent the latent influence. To reduce the complexity of our framework, we consider only the relation between an author and the author’s coauthors. We do not consider the influence between an author and estranged authors, that is, authors who are not direct coauthors. In fact, a remote author still wields some influence; we ignore such influence in our hypothesis.
The dataset is easily represented as a user-item matrix, and there are several similarities between recommender systems and the task of finding a leader. The purpose of a recommender system is to recommend the most appropriate item to a user. It computes the probability that a user will like an item based on the items the user has previously liked or used. In a social network, different users may be active on the same item; for example, two users may share the same article or write a post together. We can likewise compute the probability that a user will like a post. We assume that if a user likes a post, the post affects the user. We therefore compute this probability for each item-user pair and regard the value as the influence. For these reasons, we consider FM the most appropriate algorithm for this problem.
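The user-item analogy can be made concrete with a minimal sketch of the second-order factorization machine score from Rendle [5], computed with the usual linear-time rewriting of the pairwise term. The feature layout (one-hot author ids followed by one-hot paper ids), the parameter values, and the name `fm_predict` are assumptions for illustration; they are not the configuration or the notation used in this work, and libFM learns the parameters rather than taking them as given.

```python
def fm_predict(x, w0, w, v):
    """Second-order factorization machine score for a sparse feature vector.

    x  : dict mapping feature index -> value (non-zero entries only)
    w0 : global bias
    w  : list of per-feature linear weights
    v  : list of per-feature latent vectors, all of the same length k
    """
    linear = sum(w[i] * xi for i, xi in x.items())
    k = len(v[0])
    pairwise = 0.0
    for f in range(k):
        # sum_{i<j} v_if * v_jf * x_i * x_j
        #   = 0.5 * ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)
        s = sum(v[i][f] * xi for i, xi in x.items())
        s_sq = sum((v[i][f] * xi) ** 2 for i, xi in x.items())
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


# Hypothetical layout: features 0-1 are one-hot author ids,
# features 2-3 are one-hot paper ids. All values are made up.
w0 = 0.1
w = [0.2, -0.1, 0.05, 0.3]
v = [[0.5, 0.1], [0.2, -0.3], [0.4, 0.6], [-0.2, 0.1]]
x = {0: 1.0, 2: 1.0}  # author 0 paired with paper 0
score = fm_predict(x, w0, w, v)
print(score)
```

With exactly one active author feature and one active paper feature, the pairwise term reduces to the inner product of the two latent vectors, which is the quantity that plays the role of the author-item affinity in the analogy above.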
Our results differ slightly between the two gold standards. First, the performance of PageRank under the h-index standard is better than its performance under the Microsoft standard. The same pattern appears in our framework. As far as we know, the dataset comes from DBLP, which is kept up to date, and the h-index standard from Google is also up to date, whereas the standard from Microsoft is outdated. We believe this makes the results under the Microsoft standard worse, which means that the dataset fits the h-index standard better.
On Dataset I, we observe that FM without texts performs poorly under the h-index standard, even worse than the paper-count and citation-count baselines. The performance is much better on Dataset II. Our explanation is that FM is not powerful enough to analyze a network when the data are insufficient; in other words, Dataset I does not contain enough information. After we add the extended information, the performance improves dramatically. The additional information can thus effectively remedy a poor dataset.
In other words, the ranking lists of Microsoft Academic and Google depend on simple features such as paper counts and citation counts. The results of our framework capture the latent influence in the network, and we believe they reflect the real situation of the network.
We also list the top 20 words learned from the FM model with textual information from Dataset I in Table 4.3 and show the corresponding word cloud in Figure 4.1. The words in the cloud are shown in their stemmed form, and the size of a word reflects its weight. Among the top words are ‘availability’, ‘re-examination’, ‘book’, ‘wikipedia’, and ‘gaussian’. We consider that this word list contains some noise, but there are still fetch
Figure 4.1: The Word Cloud of the Top 20 Words from Dataset I.
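A word list such as the one in Table 4.3 can be produced by sorting the vocabulary by learned weight. The sketch below assumes that each title word has a single scalar weight; the weights shown are hypothetical placeholders, not values learned in our experiments, and `top_words` is an illustrative helper.

```python
def top_words(weights, k):
    """Return the k words with the largest weights, highest first."""
    return sorted(weights, key=weights.get, reverse=True)[:k]


# Hypothetical weights for a handful of title words (illustrative only).
word_weights = {
    "model": 0.12,
    "availability": 0.91,
    "system": 0.10,
    "www": 0.87,
    "character": 0.80,
}
top3 = top_words(word_weights, 3)
print(top3)
```

The same weights can drive the font sizes of a word cloud, so the table and the figure are two views of one ranking.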
Chapter 5 Conclusions
5.1 Conclusions
In this paper, we have presented an FM framework to model the social influence among individuals based on their patterns of collaboration in a social network and on supplementary textual information. The proposed influence matrix captures the influence between authors through their papers. Through the FM computation, we further obtain the latent influence matrix. FM is applicable to many kinds of features, which means that we can obtain higher-level information from multiple aspects. In our task, we improve the performance by incorporating textual information. Our experimental results on the two datasets for the data mining community show that the proposed approach provides a better predictive model than the four baselines. Moreover, incorporating the textual information indeed benefits the predictive performance. Finally, the predictive model shows how the textual information works; that is, we can identify which texts carry strong influence and which carry weaker influence.
Influence spreading is an important issue in social networks. Although our proposed latent influence matrix can capture the latent influence between an author and closely related authors, it still cannot capture the latent influence from estranged authors. In future work, we seek to improve the latent influence matrix; that is, we expect to develop a model that captures deeper latent influence. We will also conduct experiments on larger datasets covering various research communities or real-world social networks.
Bibliography
[1] M. G. Kendall. A new measure of rank correlation. Biometrika, 30(1/2):81–93, 1938.
[2] L. Liu, J. Tang, J. Han, and S. Yang. Learning influence from heterogeneous social networks. Data Mining and Knowledge Discovery, 25(3):511–544, 2012.
[3] J. L. Myers, A. Well, and R. F. Lorch. Research design and statistical analysis. Routledge, 2010.
[4] S. A. Myers, C. Zhu, and J. Leskovec. Information diffusion and external influence in networks. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’12, pages 33–41, New York, NY, USA, 2012. ACM.
[5] S. Rendle. Factorization machines. In Proceedings of the 2010 IEEE International Conference on Data Mining, ICDM ’10, pages 995–1000, Washington, DC, USA, 2010. IEEE Computer Society.
[6] S. Rendle. Factorization machines with libfm. ACM Trans. Intell. Syst. Technol., 3(3):57:1–57:22, May 2012.
[7] X. Shuai, Y. Ding, J. Busemeyer, S. Chen, Y. Sun, and J. Tang. Modeling indirect influence on twitter. Int. J. Semant. Web Inf. Syst., 8(4):20–36, Oct. 2012.
[8] L. Terveen and W. Hill. Beyond recommender systems: Helping people help each other. 2001.
[11] K. Zhou, H. Zha, and L. Song. Learning social infectivity in sparse low-rank networks using multi-dimensional Hawkes processes. In Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, pages 641–649, 2013.