National Chengchi University (國立政治大學)

Chapter 6 Determination of Impact Research Topics via the Bayesian Estimation of Author-Publication Correlations

In this chapter we apply the research model and validate whether the authors and publications it identifies really do have impact. We propose that high-impact research topics come from author-publication pairs that possess greater impact power than others, so we must first confirm that the authors and publications proposed by our approach are in fact high-impact. Section 6.1 describes the experiment that validates the author’s impact power. Section 6.2 describes the experiment that validates the publication’s impact power. Section 6.3 shows how to find high-impact topics using the proposed model.

6.1 Experiment to Validate an Author’s Impact Power

In this section we design an experiment that uses the research model to validate the author’s impact power. We survey previously published related work to find authors whose impact has already been validated by other methods. We also survey experts on the selected topic to validate the model. The validation procedure for author impact power is illustrated in Fig. 6-1.

Fig. 6-1 Validation Procedure for Author Impact Power.

[Figure: the impact power of the authors as proposed by our model is validated by two mappings: one to the authors listed as recommended by previous related work, and one to interviews with experts who investigate the same topic.]

the authors. The experimental results can help the reader to predict the authors’ domain based on the words they use in a paper. Their goal was not to find the authors with the most impact in a topic area but to list the top ten authors who use the words related to a topic. Nevertheless, the results imply that these are higher-impact authors, because they publish more papers and use more words related to the topic than others.

We collect a dataset and compare the previous work with ours in order to examine the research model. It is difficult to compare two different models completely, especially when they do not use exactly the same dataset, so we collect a dataset that is as similar as possible. Their dataset comes from the well-known CiteSeer digital library, which is also used by other researchers in this area (such as Bolelli, Ertekin, Zhou & Giles, 2009; Liu, Niculescu-Mizil & Gryc, 2009; and Rosen-Zvi, Chemudugunta, Griffiths, Smyth & Steyvers, 2010) and is often used in studies that collect works on computer science. The previous work used papers published between 1990 and 2002. We select one of the 300 topics mentioned in their study, “data mining”, to map onto our work. The top two words used in the topic “data mining” are “data” and “mining”. We use the abstract as the descriptor of a paper for comparison with their work: if the term “data mining” is found in the abstract, we consider the paper to involve data mining. All authors and publications associated with a paper on the topic of data mining are viewed as belonging to the same society. In this study we consider data mining for the years 1990-2002.
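The selection rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Paper` record, its field names, and the case-insensitive substring match are hypothetical choices, not details given in the text.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Paper:
    # Hypothetical record layout; the source does not describe the data schema.
    title: str
    abstract: str
    year: int
    authors: List[str]


def in_topic(paper: Paper, term: str = "data mining",
             first_year: int = 1990, last_year: int = 2002) -> bool:
    """True if the paper falls in the year range and its abstract contains the term.

    Case-insensitive matching is an assumption made for this sketch.
    """
    in_range = first_year <= paper.year <= last_year
    return in_range and term.lower() in paper.abstract.lower()


def select_society(papers: List[Paper],
                   term: str = "data mining") -> Tuple[List[Paper], Set[str]]:
    """Return the papers on the topic and the set of authors who wrote them."""
    selected = [p for p in papers if in_topic(p, term)]
    society = {author for p in selected for author in p.authors}
    return selected, society
```

Every author of a selected paper joins the same society, regardless of position in the author list.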

Different research models and goals lead to different points of focus during computation. In our model, the number of co-authors influences the impact power. The work we compare against emphasizes the words an author uses and treats co-authorship as an opportunity to discover cross-domain combinations, whereas our research is concerned with the fact that counting every co-author separately duplicates the power of a paper. Although we treat a paper as a single research unit, it may have more than one author. If we compute the impact power for every author of a paper, a paper with several co-authors may receive greater impact power. This does not make sense in the real world, so we need an assessment criterion. During assessment, we ignore the number of co-authors per paper and view all of them as making almost the same contribution, even though this may not fit the real situation exactly. To solve this problem, we use the weighting concept discussed in the next paragraph to model the contributions to a paper and to ensure that the impact power of a paper is not affected by the number of co-authors.

The contribution weight of co-authors can be based on the order in which the authors are listed. We assume that the first author has made the greatest contribution to a paper and that the contribution decreases with an author’s position in the list. We use an arithmetic sequence to model the weight ratio. For example, if a paper has 3 authors, the contributions are divided in the ratio 3:2:1, meaning that the 1st author gets 3, the 2nd author gets 2, and the 3rd author gets 1. To normalize the total weight so that it equals 1, the contributions are weighted 3/6, 2/6 and 1/6. The 1st author’s contribution may not actually be 3 times the 3rd author’s, so the weight ratio does not exactly reflect the relative strength of these authors, but the strategy does reflect the fact that the first author makes a greater contribution. This method keeps the total weight equal to 1 regardless of the number of co-authors. Algorithm 6-1 models this scheme.

Algorithm 6-1: Identifying the contributory weight of each author for a single paper

Input: n is the number of co-authors

i is the position of an author in the author list of a paper

sum is the summation of 1 to n, i.e. $sum = n(n+1)/2$

Output: $weight_i$ is the contributory weight of the ith author

1 For i = 1 to n

2 $weight_i = \dfrac{n - i + 1}{sum}$

3 Next
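Algorithm 6-1 translates directly into a short function. The sketch below is one possible rendering, not the implementation used in the study; the function name `contribution_weights` is an assumption, and exact fractions are used so that the weights sum to exactly 1.

```python
from fractions import Fraction
from typing import List


def contribution_weights(n: int) -> List[Fraction]:
    """Weights for authors in listed order: weight_i = (n - i + 1) / sum(1..n)."""
    if n < 1:
        raise ValueError("a paper needs at least one author")
    total = n * (n + 1) // 2  # summation of 1 to n
    return [Fraction(n - i + 1, total) for i in range(1, n + 1)]


# Three authors share the paper in the ratio 3:2:1, i.e. 3/6, 2/6 and 1/6
# (Fraction stores these in lowest terms as 1/2, 1/3 and 1/6).
weights = contribution_weights(3)
```

With a single author the function returns a weight of 1, so single-author papers keep their full power.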

The contributory weight of each co-author is not only a measure for computing each author’s impact power for a paper but also helps calculate the impact power the author receives from a cited paper. When we want to determine a paper’s impact, we compute it from the impact powers of all of its authors and their contributory weights.

If we want to know an author’s impact power, we sum over all the papers written by that author, weighting the publication volume and citation frequency of each paper by the author’s contribution weight.
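The weighted summation described above can be illustrated as follows. This is a sketch under assumptions: the input format (an ordered author list plus a citation count per paper) and the function names are hypothetical. How the two totals are combined into a single impact power is not specified in this section, so the sketch stops at the weighted totals.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict, Iterable, List, Tuple


def contribution_weights(n: int) -> List[Fraction]:
    """Arithmetic-sequence weights from Algorithm 6-1, normalized to sum to 1."""
    total = n * (n + 1) // 2
    return [Fraction(n - i + 1, total) for i in range(1, n + 1)]


def weighted_author_totals(
    papers: Iterable[Tuple[List[str], int]],
) -> Tuple[Dict[str, Fraction], Dict[str, Fraction]]:
    """Sum each author's weighted publication volume and weighted citation count.

    Each paper is (authors in listed order, number of citations received).
    """
    volume: Dict[str, Fraction] = defaultdict(Fraction)
    citations: Dict[str, Fraction] = defaultdict(Fraction)
    for authors, cited in papers:
        for author, weight in zip(authors, contribution_weights(len(authors))):
            volume[author] += weight             # each paper adds 1 in total
            citations[author] += weight * cited  # each paper adds `cited` in total
    return dict(volume), dict(citations)


# Hypothetical data: two papers by invented authors A, B and C.
sample = [(["A", "B", "C"], 12), (["B", "A"], 6)]
volume, citations = weighted_author_totals(sample)
```

Because the weights of each paper sum to 1, the volumes over all authors add up to the number of papers, and the weighted citations add up to the total citation count, whatever the number of co-authors.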

We use the dataset published during 1990-2002. The topic is “data mining”. We

Table 6-1 Top 10 Impact Authors in the Topic of Data Mining.

| No. | Rosen-Zvi et al. (2010) | Our model without contribution weight | Our model with contribution weight |
|---|---|---|---|
| 2 | Mohammed Javeed Zaki | Rakesh Agrawal | Rakesh Agrawal |
| 3 | Bing Liu | Hannu T. T. Toivonen | Ron Kohavi |
| 4 | David W. Cheung | Heikki Mannila | Mohammed Javeed Zaki |
| 5 | Kyuseok Shim | Salvatore J. Stolfo | Foster J. Provost |
| 6 | Heikki Mannila | Osmar R. Zaïane | George Karypis |
| 7 | Rajeev Rastogi | Mohammed Javeed Zaki | Charu C. Aggarwal |
| 8 | Venkatesh Ganti | Philip S. Yu | Chris Clifton |
| 9 | Hannu T. T. Toivonen | George Karypis | Osmar R. Zaïane |
| 10 | Huan Liu | Ron Kohavi | Ming-syan Chen |

We use different symbols to represent the different model lists. An author who belongs to more than one model gets more than one symbol. The results with symbols are shown in Table 6-2. The “●” represents the model of Rosen-Zvi, Chemudugunta, Griffiths, Smyth and Steyvers (2010); the “◆” represents our research model without the contribution weight; and the “▓” represents our research model with the contribution weight. The value for each model is ●=16/30=53.33%; ◆=20/30=66.67%; and ▓=18/30=60%. Each model’s list has a cover rate of more than 50%, meaning that these lists are valuable and worth considering when discussing the impact authors in the topic of data mining during 1990-2002.
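The text does not define the cover rate formally. One reading that matches the denominator of 30 (three lists of ten entries) is to count, over all entries of all lists, how many are authors who also appear in the given model’s list. The sketch below implements that reading as an assumption; the function name and the toy author labels are hypothetical.

```python
from fractions import Fraction
from typing import Dict, List


def cover_rate(model: str, lists: Dict[str, List[str]]) -> Fraction:
    """Share of all list entries occupied by authors who appear in `model`'s list.

    This is an assumed interpretation of the cover rate, not a stated definition.
    """
    members = set(lists[model])
    total_entries = sum(len(authors) for authors in lists.values())
    covered = sum(
        1 for authors in lists.values() for author in authors if author in members
    )
    return Fraction(covered, total_entries)


# Toy lists of three invented authors each (not the data of Table 6-1).
toy_lists = {
    "first": ["A", "B", "C"],
    "unweighted": ["A", "C", "D"],
    "weighted": ["A", "D", "E"],
}
rates = {name: cover_rate(name, toy_lists) for name in toy_lists}
```

Under this reading a model’s own ten entries always count, so every cover rate is at least one third, and a higher value means more overlap with the other lists.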

The results obtained using our research model without the contribution weights are most similar to the other lists. One reason might be that the first model considers words and the relationship between topic and author, listing the top 10 persons who use more words in that topic than others. We suggest that a greater volume of publications on the same topic determines the ranking in the first model. In our model, however, the impact power reflects not only the words used and the topic but also the citation frequency, which is an endorsement by other authors. The model with the contribution weight is a more extensive approach; its cover rate (60%) exceeds that of the first model (53.33%), although the model without the contribution weight has the highest cover rate of the three (66.67%).

Table 6-2 Top 10 Impact Authors within the Topic of Data Mining (during 1990-2002).

No.

2 Mohammed Javeed Zaki

6.1.2 Comparing the impact power of authors with the expert survey

Besides comparing the experimental results with those of the previous models noted in the literature review, we also survey experts who have investigated the topic of data mining. We list all the authors from the three models in alphabetical order, without duplicates. There are 20 different authors. We survey five

calculation, with yes counted as one, no as negative one, and no opinion as zero. Although the expected value is 0, we show in bold face the grades that are greater than the average of 2.25. Only 9 authors exceeded this threshold. The grades of each author are shown in Table 6-3.

We compare the survey result with the three models mentioned above and obtain the information in Table 6-4. The precision and recall of the first model are 40% and 44.44%, which are clearly lower than those of our models (60% and 66.67%), whether with or without the contribution weight. This indicates that our model’s results conform more closely to the experts’ expectations than the results of the previous work when determining which authors have the largest impact in the topic area of “data mining”.
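The reported percentages match the standard set-based definitions of precision and recall, with 10 authors listed per model and 9 authors above the expert threshold (4/10 and 4/9 for the first model, 6/10 and 6/9 for ours). A minimal sketch follows; the function name and the placeholder author labels are assumptions.

```python
from typing import Iterable, Tuple


def precision_recall(model_authors: Iterable[str],
                     expert_authors: Iterable[str]) -> Tuple[float, float]:
    """Precision = hits / listed authors; recall = hits / expert-approved authors."""
    model, expert = set(model_authors), set(expert_authors)
    hits = len(model & expert)
    return hits / len(model), hits / len(expert)


# Placeholder labels: a top-10 list that shares 4 authors with 9 expert picks.
model_list = [f"a{k}" for k in range(10)]
expert_list = ["a0", "a1", "a2", "a3"] + [f"e{k}" for k in range(5)]
precision, recall = precision_recall(model_list, expert_list)
print(f"{precision:.2%} {recall:.2%}")  # prints "40.00% 44.44%"
```

Because all three models list 10 authors against the same 9 expert picks, precision and recall here differ only through the number of shared authors.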

Table 6-3 Grades of Authors Surveyed by Academic Experts.

No. Author grades No. Author grades

* The bold values indicate authors receiving higher than average grades.

Table 6-4 Precision and Recall of Author Impact Obtained with Each Model.

| No. | Rosen-Zvi et al. (2010) | Our model without contribution weight | Our model with contribution weight | Expert survey |
|---|---|---|---|---|
| 2 | Mohammed Javeed Zaki | Rakesh Agrawal | Rakesh Agrawal | Heikki Mannila |
| 3 | Bing Liu | Hannu T. T. Toivonen | Ron Kohavi | Ming-syan Chen |
| 4 | David W. Cheung | Heikki Mannila | Mohammed Javeed Zaki | Rakesh Agrawal |
| 5 | Kyuseok Shim | Salvatore J. Stolfo | Foster J. Provost | Bing Liu |
| 6 | Heikki Mannila | Osmar R. Zaïane | George Karypis | Charu C. Aggarwal |
| 7 | Rajeev Rastogi | Mohammed Javeed Zaki | Charu C. Aggarwal | George Karypis |
| 8 | Venkatesh Ganti | Philip S. Yu | Chris Clifton | Mohammed Javeed Zaki |
| 9 | Hannu T. T. Toivonen | George Karypis | Osmar R. Zaïane | Philip S. Yu |
| 10 | Huan Liu | Ron Kohavi | Ming-syan Chen | |
| Precision | 40% | 60% | 60% | |
| Recall | 44.44% | 66.67% | 66.67% | |