
Chapter 6 Determination of Impact Research Topics via the Bayesian Estimation of

6.2 Experiment to Validate the Impact Power of Publications


National Chengchi University

Fig. 6-2 Procedures of Validating the Impact Power of a Publication.

6.2.1 Comparing the impact power of publications using the impact factor

[Fig. 6-2, flowchart. Boxes: ISI journal lists with impact factor; Survey experts investigate this same topic; Previous related work; Our research model (without contribution weight); Our research model (with contribution weight); Lists of 10 authors; Find; Lists of authors' publications. Connectors labelled "Mapping" link these to the box "Validate the results of the impact power of publications as proposed by our model".]

We use Bayesian estimation to model the impact power of publications, as proposed in our research approach, and then compare the results with the ISI impact factor as looked up in the JCR database. The results are broadly similar, since both measures are computed from the volume of publications and the citation frequency. The difference is that the impact factor uses only information from the past two years, whereas the Bayesian estimation in our approach is cumulative: earlier data continue to influence the modeled impact power. A publication's previous reputation is taken into account, and the posterior distribution becomes the prior distribution in the next step when new information arrives.
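The contrast between a two-year window and a cumulative Bayesian update can be sketched as follows. This is a minimal illustration only and is not the model used in this thesis: it assumes a Beta-Binomial model in which each paper of a venue either is or is not cited in a given year. The function names `two_year_impact_factor`, `update_beta`, and `posterior_mean`, the uniform prior, and the yearly counts are all hypothetical.

```python
from typing import List, Tuple


def two_year_impact_factor(citations_this_year: int, items_prev_two_years: int) -> float:
    """Two-year impact factor: citations received this year by items published
    in the previous two years, divided by the number of those items."""
    if items_prev_two_years == 0:
        raise ValueError("no citable items in the two-year window")
    return citations_this_year / items_prev_two_years


def update_beta(prior: Tuple[float, float], cited: int, uncited: int) -> Tuple[float, float]:
    """Conjugate Beta-Binomial update. The returned posterior (alpha, beta)
    is used as the prior for the next batch of observations."""
    alpha, beta = prior
    return alpha + cited, beta + uncited


def posterior_mean(params: Tuple[float, float]) -> float:
    alpha, beta = params
    return alpha / (alpha + beta)


# Hypothetical yearly observations for one venue: (cited papers, uncited papers).
yearly_counts: List[Tuple[int, int]] = [(8, 2), (6, 4), (1, 9)]

belief = (1.0, 1.0)  # uniform Beta(1, 1) prior (an assumption)
history: List[float] = []
for cited, uncited in yearly_counts:
    belief = update_beta(belief, cited, uncited)  # posterior becomes the next prior
    history.append(posterior_mean(belief))
```

In this toy run only 1 of 10 papers is cited in the last year, yet the cumulative estimate ends at 0.5 because the stronger earlier years still count. A windowed measure such as the impact factor would discard them once they fall outside the two-year window.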

We collect data on papers from CiteSeer on the topic “data mining”, as determined from the abstracts. We collect 1389 papers, both conference and journal papers, for which we can find both the authors and the publication. Of these, 281 are journal papers and 1108 are conference papers. They appear in 583 distinct publications: 131 journals and 452 conferences. The statistics are shown in Table 6-5.

Table 6-5 Statistics for Papers and Publications.

Unit      Papers                   Publications
Volume    1389                     583
Type      Journals  Conferences    Journals  Conferences
Volume    281       1108           131       452

With so many papers and publications, we have to decide on a recommendation threshold. We find that about 35 publications have each published more than 5 papers on data mining, so we take the top 35 publications in our model as the cutoff and recommend these to researchers. The impact powers of the top 35 publications are listed in Table 6-6. The top impact publication in data mining is the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), with an impact power of 67.6986%, about 4.6 times that of the second-ranked publication. The top 35 comprise 11 journals and 24 conferences. Details are given in the following tables: Table 6-7 lists the top 10 journals, Table 6-8 the top 10 conferences, and Table 6-9 the top 10 publications regardless of type.
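The journal/conference split and the gap between the two highest-ranked publications can be checked directly from the tables. The sketch below is only a sanity check. The variable names are ours, and the C/J sequence is transcribed from the ranking in Table 6-6, with ranks 1 and 2 taken from Table 6-9.

```python
from collections import Counter

# Publication type by rank (1..35), transcribed from Table 6-6;
# ranks 1 and 2 come from Table 6-9. C = conference, J = journal.
types_by_rank = "CCCCCCJCJCCJCJCCCJCCJCJJCJJJCCCCCCC"
counts = Counter(types_by_rank)

# Impact powers (in percent) of the two highest-ranked publications (Table 6-9).
kdd, sigmod = 67.6986, 14.7991
ratio = kdd / sigmod

print(counts["J"], counts["C"])  # 11 24
print(round(ratio, 2))           # 4.57
```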

Table 6-6 Top 35 Impact Publications in Topic of Data Mining (during 1990-2002).


Rank  C/J  Impact Power  Title of the publication
1     C    67.6986%      ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD)
2     C    14.7991%      SIGMOD
3     C    4.6433%       Very Large DataBase (VLDB)
4     C    2.2011%       Industrial Conference on Data Mining (ICDM)
5     C    2.0353%       Principles of Data Mining and Knowledge Discovery (PKDD)
6     C    1.7043%       International Conference on Data Engineering (ICDE)
7     J    1.3425%       Data Mining and Knowledge Discovery
8     C    0.8536%       Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD)
9     J    0.8119%       IEEE Transactions on Knowledge and Data Engineering (TKDE)
10    C    0.7115%       Lecture Notes in Computer Science (LNCS)
11    C    0.4259%       International Conference on Information and Knowledge Management (CIKM)
12    J    0.1765%       Knowledge and Information Systems (KIS)
13    C    0.1593%       SIAM International Conference Proceedings on Data Mining (SDM)
14    J    0.1297%       SIGKDD Explorations
15    C    0.1148%       ACM Symposium on Principles of Database Systems
16    C    0.1110%       International Conference on Data Warehousing and Knowledge Discovery (DaWaK)
17    C    0.0983%       Lecture Notes in Artificial Intelligence (LNAI)
18    J    0.0701%       IEEE Bulletin of the Technical Committee on Data Engineering
19    C    0.0649%       SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD)
20    C    0.0626%       ACM Conference on Computers and Security
21    J    0.0623%       IEEE Computers
22    C    0.0618%       International Conference on Database Theory (ICDT)
23    J    0.0555%       SIGMOD Record
24    J    0.0548%       Communications of the ACM (CACM)
25    C    0.0545%       European Conference on Machine Learning (ECML)
26    J    0.0444%       Bioinformatics
27    J    0.0381%       Machine Learning
28    J    0.0372%       IEEE Transactions on Visualization and Computer Graphics
29    C    0.0311%       Genetic and Evolutionary Computation Conference (GECCO)
30    C    0.0303%       ACM International Conference on Digital Libraries
31    C    0.0288%       Advances in Large Margin Classifiers
32    C    0.0271%       Advances in Distributed and Parallel Knowledge Discovery
33    C    0.0268%       IEEE International Conference on Tools with Artificial Intelligence (ICTAI)
34    C    0.0268%       Advances in Digital Libraries Conference (ADL)
35    C    0.0264%       Advances in Neural Information Processing Systems (NIPS)

Table 6-7 Top 10 Impact Journals in Topic of Data Mining (during 1990-2002).

Rank  IF 2002  IF 2008  Title of the publication
1     1.192    2.421    Data Mining and Knowledge Discovery
2     1.055    2.236    IEEE Transactions on Knowledge and Data Engineering
3     N/A      1.733    Knowledge and Information Systems
4     N/A      N/A      SIGKDD Explorations
5     N/A      N/A      IEEE Bulletin of the Technical Committee on Data Engineering
6     1.484    2.611    IEEE Computers
7     0.228    1.620    SIGMOD Record
8     1.497    2.646    Communications of the ACM (CACM)
9     4.615    4.328    Bioinformatics
10    1.944    2.326    Machine Learning

Table 6-8 Top 10 Impact Conferences in Topic of Data Mining (during 1990-2002).

Rank  Title of the publication
1     ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD)
2     SIGMOD
3     Very Large DataBase (VLDB)
4     Industrial Conference on Data Mining (ICDM)
5     Principles of Data Mining and Knowledge Discovery (PKDD)
6     International Conference on Data Engineering (ICDE)
7     Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD)


Table 6-9 Top 10 Impact Publications in Topic of Data Mining (during 1990-2002).

Rank  C/J  Impact Power  Title of the publication
1     C    67.6986%      ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD)
2     C    14.7991%      SIGMOD
3     C    4.6433%       Very Large DataBase (VLDB)
4     C    2.2011%       Industrial Conference on Data Mining (ICDM)
5     C    2.0353%       Principles of Data Mining and Knowledge Discovery (PKDD)
6     C    1.7043%       International Conference on Data Engineering (ICDE)
7     J    1.3425%       Data Mining and Knowledge Discovery
8     C    0.8536%       Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD)
9     J    0.8119%       IEEE Transactions on Knowledge and Data Engineering
10    C    0.7115%       Lecture Notes in Computer Science (LNCS)

We take the ISI impact factor values from the JCR database, as listed in Table 6-7. The JCR database provides values for 2002 through 2008. Since the dataset in our research model covers 1990-2002, the 2002 impact factor may be the most suitable reference; the latest impact factor available is for 2008. We consider both values. The journal Bioinformatics has the highest impact factor in both 2002 and 2008, but a journal from the medical or biological disciplines does not necessarily have the greatest impact within the data mining topic. This reflects a limitation of the ISI approach: it can rank only by subject category, not by topic. The experts we surveyed note that Bioinformatics may contain some papers on data mining and thus score highly on citation frequency, but that it may not be the main journal in the data mining area. No impact factor was given for the journal Knowledge and Information Systems in 2002 because it was not on the SCI/SSCI list that year. The same holds for SIGKDD Explorations and the IEEE Bulletin of the Technical Committee on Data Engineering, which both appear in the collection of conference papers; neither is given an impact factor in 2002 or 2008. Conference publications are also not covered by ISI, which is another reason why we cannot simply use the impact factor to evaluate the impact of these publications.

6.2.2 Comparing the impact power of publications using the publication list of authors recommended in previous work

In this section we compare the publication list obtained with our research model with the publication lists of the authors suggested by three different models. The core idea is that if an author has impact in a topic area, that author’s publications will in certain cases be among the important publications on the same topic. To validate the lists, we mark the publications of authors recommended in previous work with “●”, those of authors obtained from our Bayesian estimation model without the contribution weight with “◆”, and those obtained with the contribution weight with “▓”. The comparison is given in Table 6-10.

Table 6-10 Comparison of the Impact Power of Publications to the Lists of Authors Suggested by Three Different Models.

Rank  C/J  Marks (● ◆ ▓)  Title of the publication
1     C                   ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD)
8     C    X X X          Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD)
9     J    X X X          IEEE Transactions on Knowledge and Data Engineering (TKDE)
10    C    X X            Lecture Notes in Computer Science (LNCS)
11    C    X X X          International Conference on Information and Knowledge Management (CIKM)
12    J    X X X          Knowledge and Information Systems (KIS)
13    C    X X X          SIAM International Conference Proceedings on Data Mining (SDM)
14    J    X X X          SIGKDD Explorations
15    C                   ACM Symposium on Principles of Database Systems
16    C    X X X          International Conference on Data Warehousing and Knowledge Discovery (DaWaK)
17    C                   Lecture Notes in Artificial Intelligence (LNAI)
18    J    X              IEEE Bulletin of the Technical Committee on Data Engineering
28    J                   IEEE Transactions on Visualization and Computer Graphics
29    C    X              Genetic and Evolutionary Computation Conference (GECCO)
30    C                   ACM International Conference on Digital Libraries
31    C                   Advances in Large Margin Classifiers
32    C                   Advances in Distributed and Parallel Knowledge Discovery
33    C    X X X          IEEE International Conference on Tools with Artificial Intelligence (ICTAI)
34    C    X X X          Advances in Digital Libraries Conference (ADL)
35    C                   Advances in Neural Information Processing Systems (NIPS)

Count of numbers (● ◆ ▓): 26 26 23

The model results appear in the following tables. Table 6-11 shows, for each model, the number of results found, the number of categories of the authors’ publications, the total number of publications in our model, and the recall and precision rates. Table 6-12 gives the precision of each model for the different lists of suggested publications. The tables show quite similar results for all three models. This supports the view that, whichever author model is used, the publications our model proposes are representative of the important, high-impact publications in this topic area.

Table 6-11 Information, Recall Rate, and Precision Rate of Each Model.

Rosen-Zvi,

Recall rate of the model (Recall = A/B)          73.19%  71.34%  71.32%
Precision rate of the model (Precision = A/C)    17.41%  20.17%  16.72%
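The two rates follow the usual set-based definitions. A minimal sketch, assuming that A is the number of publications found in both lists, B is the number of distinct publications of the authors suggested by a model, and C is the total number of publications in our model. The function name `recall_precision` and the toy venue sets are hypothetical and are not the thesis data.

```python
from typing import Set, Tuple


def recall_precision(author_pubs: Set[str], model_pubs: Set[str]) -> Tuple[float, float]:
    """Return (recall, precision) with A = size of the intersection,
    recall = A / B where B = |author_pubs|,
    precision = A / C where C = |model_pubs|."""
    if not author_pubs or not model_pubs:
        raise ValueError("both publication sets must be non-empty")
    found = len(author_pubs & model_pubs)  # A
    return found / len(author_pubs), found / len(model_pubs)


# Toy example (illustrative venue sets, not the thesis data).
author_pubs = {"KDD", "VLDB", "ICDE", "WWW"}
model_pubs = {"KDD", "VLDB", "ICDE", "SIGMOD", "PKDD", "PAKDD", "TKDE", "CIKM"}
recall, precision = recall_precision(author_pubs, model_pubs)
```

With these toy sets, three of the four author venues are found, giving a recall of 0.75, while the model list holds eight venues, giving a precision of 0.375. A large model list lowers precision even when recall is high, which matches the pattern in Table 6-11.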

Table 6-12 Precision of Each Model While Considering the Lists of Suggested Publications.

Rosen-Zvi,