How to find impact research topics using the proposed model

Chapter 6 Determination of Impact Research Topics via the Bayesian Estimation of

6.3 How to find impact research topics using the proposed model

National Chengchi University (國立政治大學)

include almost all the conferences with impact in the topic of data mining.

6.3 How to find impact research topics using the proposed model

After validating the impact power of authors and publications, we can use the author-publication correlations to determine which papers are impact papers and which topics discussed in those papers could be impact research topics.

We use the method described above to calculate each paper's impact power. Taking "data mining" as the topic and the CiteSeer digital library as the data source, we examine the citations of each data mining paper published from 1990 to 2002. The volume published by each author on that topic is used to compute the author's prior impact power.

Similarly, the volume of each publication on that topic is used to calculate the publication's prior impact power. The likelihood function of an author is calculated from the number of citations the author receives; likewise, the likelihood function of a publication is computed from its own citation count.

After the prior probability and likelihood function of each author and publication have been calculated, the posterior probability of each author and publication is obtained.
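The calculation described above can be sketched as follows. This is a minimal illustration rather than the implementation used in this study: it assumes the prior is each entity's share of the published volume, the likelihood is its share of the citations, and the posterior is their normalized product. The function name `impact_power` and the sample counts are hypothetical.

```python
def impact_power(volume, citations):
    """Return a normalized posterior ("impact power") for each entity.

    Assumptions (not taken from the source): prior = share of published
    volume, likelihood = share of citation counts, and
    posterior is proportional to prior * likelihood.
    Both totals are assumed to be non-zero.
    """
    total_volume = sum(volume.values())
    total_citations = sum(citations.values())

    unnormalized = {}
    for name in volume:
        prior = volume[name] / total_volume
        likelihood = citations.get(name, 0) / total_citations
        unnormalized[name] = prior * likelihood

    # Normalize so that the posteriors sum to 1.
    evidence = sum(unnormalized.values())
    return {name: value / evidence for name, value in unnormalized.items()}


# Hypothetical counts for three authors on one topic.
volume = {"author_a": 10, "author_b": 5, "author_c": 5}
citations = {"author_a": 60, "author_b": 30, "author_c": 10}
posterior = impact_power(volume, citations)
```

The same function could be applied to publications by passing per-publication volumes and citation counts instead of per-author ones.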

The posterior probability of each author and publication can be regarded as its impact power, which reflects the influence it exerts on discussion in the field. Subsequent research topics from authors and publications with higher impact power than others deserve closer attention. New topics introduced by these high-impact authors and publications can then be examined with the emerging topic detection table to determine whether they are emerging topics.

All procedures for detecting candidate emerging topics are shown in Fig. 6-3.

The impact power of each paper is shown in Table 6-16, and the characteristics of the papers are shown in Table 6-17. Based on our Bayesian estimation model, we can compute the impact power and the value of each paper, and we can extract impact research topics from the paper's abstract and title. We cannot confirm whether a topic or paper is an emerging one, because it may simply be so popular or important that many impact authors or publications discuss it. We can, however, ensure that a paper or topic extracted with our approach is valuable and could be an impact research topic. How to use the novelty index and the published volume index to assess a topic's potential to be an emerging topic is discussed in the next chapter.
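In the rows of Table 6-16, the third numeric value equals the product of the first two (for example, 0.6770 × 0.1087 ≈ 0.0736). A minimal sketch of that ranking step is given below. It assumes that the first two values are the impact power of the publication venue and of the authors; those column meanings, the function name `rank_papers`, and the choice of four sample rows are assumptions made for illustration.

```python
# (paper ID, first value, second value) copied from four rows of Table 6-16.
# Interpreting the values as venue and author impact power is an assumption.
papers = [
    ("0357", 0.6770, 0.1087),
    ("0023", 0.6770, 0.0833),
    ("0925", 0.6770, 0.0436),
    ("0133", 0.1480, 0.1112),
]


def rank_papers(rows):
    """Score each paper by the product of its two values and sort descending."""
    scored = [(pid, round(venue * author, 4)) for pid, venue, author in rows]
    return sorted(scored, key=lambda item: item[1], reverse=True)


ranking = rank_papers(papers)
```

Under this assumption, a paper in a high-impact venue can outrank a paper whose authors have a higher value but whose venue has a lower one, which is the pattern seen for papers 0925 and 0133.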


Fig. 6-3 Process of Detecting Candidate Emerging Topics.

1. Choose one field.
2. Use CiteSeer as the database.
3. Select all papers within the same topic area.
4. Record all citation counts for each author and publication on the topic.
5. Compute the prior probability for each author and publication on the topic.
6. Calculate the likelihood function for each author and publication on the topic.
7. Obtain the posterior probability of each author and publication on the topic; this is called the impact power.
8. Survey the new topics from the authors and publications with higher impact power to find candidate emerging topics.

Table 6-16

ID | Year | C/J | Title | | | | Rank
0357 | 1998 | C | Mining Segment-Wise Periodic Patterns in Time-Related Databases | 0.6770 | 0.1087 | 0.0736 | 1
0023 | 1995 | C | Resource and Knowledge Discovery in Global Information Systems: A Preliminary Design and Experiment | 0.6770 | 0.0833 | 0.0564 | 2
0925 | 2001 | C | Mining Top-n Local Outliers in Large Databases | 0.6770 | 0.0436 | 0.0295 | 3
0104 | 1996 | C | Developing Tightly-Coupled Data Mining Applications on a Relational Database System | 0.6770 | 0.0375 | 0.0254 | 4
0117 | 1996 | C | DBMiner: A System for Mining Knowledge in Large Relational Databases | 0.6770 | 0.0352 | 0.0238 | 5
1086 | 2001 | C | Mining E-Commerce Data: The Good, the Bad, and the Ugly | 0.6770 | 0.0298 | 0.0202 | 6
0133 | 1997 | C | GeoMiner: A System Prototype for Spatial Data Mining | 0.1480 | 0.1112 | 0.0165 | 7
0412 | 1999 | C | Plan Mining by Divide-and-Conquer | 0.1480 | 0.1087 | 0.0161 | 8
0315 | 1998 | C | Issues for On-Line Analytical Mining of Data Warehouses | 0.1480 | 0.1087 | 0.0161 | 9
0048 | 1996 | C | Error-Based and Entropy-Based Discretization of Continuous Features | 0.6770 | 0.0208 | 0.0141 | 10

Table 6-17 Information on the Papers with the Top 10 Impact Power Rankings.

ID | Year | C/J | Title | Rank | PID | Authors
0357 | 1998 | C | Mining Segment-Wise Periodic Patterns in Time-Related Databases | 1 | KDD | Jiawei Han, Wan Gong, Yiwen Yin
0023 | 1995 | C | Resource and Knowledge Discovery in Global Information Systems: A Preliminary Design and Experiment | 2 | | Jiawei Han
0925 | 2001 | C | Mining Top-n Local Outliers in Large Databases | 3 | |
0104 | 1996 | C | Developing Tightly-Coupled Data Mining Applications on a Relational Database System | 4 | KDD | Rakesh Agrawal, Kyuseok Shim
0117 | 1996 | C | DBMiner: A System for Mining Knowledge in Large Relational Databases | 5 | |
1086 | 2001 | C | Mining E-Commerce Data: The Good, the Bad, and the Ugly | 6 | KDD | Ron Kohavi
0133 | 1997 | C | GeoMiner: A System Prototype for Spatial Data Mining | 7 | SIGMOD |
0412 | 1999 | C | Plan Mining by Divide-and-Conquer | 8 | SIGMOD | Jiawei Han, Qiang Yang, Edward Kim
0315 | 1998 | C | Issues for On-Line Analytical Mining of Data Warehouses | 9 | SIGMOD |
0048 | 1996 | C | Error-Based and Entropy-Based Discretization of Continuous Features | 10 | |

The major objective of this study is to detect emerging topics in order to provide research intelligence from academic papers. The value of candidate emerging topics can thus be checked to determine whether they are promising topics or have already been adequately researched. In addition, to reduce the size of the huge database, the verified relationship between conferences and journals will be applied to investigate whether the emerging topics proposed by influential authors at conferences, and by the conferences themselves, will appear in journals in the future.

The procedures are illustrated in Fig. 6-4.

Fig. 6-4 Process of Validating the Candidate Emerging Topics.

Although the approach may not be as rigorous and meticulous as the impact factor proposed by the well-known Thomson Scientific Society (Thomson), it can nonetheless be used to compute the degree of impact that each author and publication exerts in a particular field. The Thomson method calculates the impact factor only for SCI/SCIE/SSCI journals, whereas the method introduced in this study can also be applied to conference publications.

Once the relationship between conferences and journals has been verified, researchers need only detect emerging topics from important conferences, since these topics are likely to appear in journals in the future. The forward index seems to be valuable for topic detection problems. Our efforts are geared towards helping researchers detect emerging topics with greater ease and efficiency and identify research intelligence from academic papers and research work.
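One simple way to check whether conference topics later appear in journals is sketched below. It is an illustration only and is not the forward index or the validation procedure of this study: the function name `conference_to_journal_lag`, the topic labels, and the years are all hypothetical, and the sketch assumes that the first year in which each topic appears is already known for both conferences and journals.

```python
def conference_to_journal_lag(conference_first_year, journal_first_year):
    """Return, per topic, the number of years between its first conference
    appearance and its first journal appearance.

    Topics that never reach a journal, or that appear in a journal before
    any conference, are left out of the result.
    """
    lags = {}
    for topic, conf_year in conference_first_year.items():
        journal_year = journal_first_year.get(topic)
        if journal_year is not None and journal_year >= conf_year:
            lags[topic] = journal_year - conf_year
    return lags


# Hypothetical first-appearance years for three candidate topics.
conference_first_year = {"topic_x": 1996, "topic_y": 1998, "topic_z": 2000}
journal_first_year = {"topic_x": 1998, "topic_y": 1997}
lags = conference_to_journal_lag(conference_first_year, journal_first_year)
```

Here only `topic_x` counts as a conference topic that later reached a journal, with a lag of two years.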

Boxes in Fig. 6-4:

Find impact research topics
Check the potential research value of the topics by indices
Get conference candidate emerging topics
Get journal candidate emerging topics
Mapping
Use the novelty index (NI) and published volume index (PVI) to assess whether a topic is emerging or not
