
Tracking the evolution of a discipline is important to researchers and scholars (Lee et al., 1997). Knowledge in a field accumulates through experience, and state-of-the-art techniques can be used to investigate trends and identify new research topics. However, years can pass before a new research topic is identified. A topic typically starts to attract the attention of many researchers only when it is already in great demand, in other words, when it is "hot". Popularity is therefore always a backward-looking index, and as the number of papers discussing the same topic increases, the influence of each one decreases. Consequently, in this study we develop novel topic detection indices, computed automatically, which are designed to help researchers detect upcoming topics and decide whether to pursue them before they become popular.

1.1. Research background

Topic detection and tracking (TDT) is an important field that tracks the evolution of a topic. TDT was developed in 1996 by the Defense Advanced Research Projects Agency (DARPA). A pilot study by Allen et al. (1998) laid the groundwork for this field, generated a small corpus of information and established a durable system.

TDT research flourished from 1998 to 1999 (Lee, Lee and Jang, 2007), and it has since remained a subject of research and practice.

Lee et al. have discussed the evolution of topics in information systems (IS) (Lee et al., 1997). They found that journals and magazines focus on different themes: the former on conceptual and abstract models, the latter on specific applications. It should be noted that academic themes tend to vary more over time. Although Lee et al. discussed trends in topics related to IS, they lacked TDT techniques to deal with the time-consuming task of identification. Swan and Allen addressed the issue of how to automatically identify the timeline for a set of news stories (Swan and Allen, 2000). They used the χ² method to identify bursts of feature terms, that is, terms that appear more frequently at some point in time than at other times. Kleinberg proposed a method for analyzing document streams (Kleinberg, 2002). Morinaga and Yamanishi improved Kleinberg's approach (Morinaga and Yamanishi, 2004).
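The χ² idea mentioned above can be illustrated with a minimal sketch. It assumes a simple 2×2 contingency table (term present or absent, inside or outside a time window) and a 0.05 significance threshold. The function name, the counts and the threshold are illustrative assumptions, not details taken from Swan and Allen (2000).

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 contingency table.

    a: documents inside the time window that contain the term
    b: documents inside the window that do not contain the term
    c: documents outside the window that contain the term
    d: documents outside the window that do not contain the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A degenerate table (an empty row or column) carries no evidence.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


# Hypothetical counts: the term appears in 30 of 100 stories inside the
# window, but in only 20 of 900 stories outside it.
in_with, in_without, out_with, out_without = 30, 70, 20, 880
stat = chi_square_2x2(in_with, in_without, out_with, out_without)

# Chi-square critical value for 1 degree of freedom at alpha = 0.05
# (an assumed threshold for this illustration).
CRITICAL_1DF_005 = 3.841

# The statistic is two-sided, so also require that the term is MORE
# frequent inside the window before calling it a burst.
rate_in = in_with / (in_with + in_without)
rate_out = out_with / (out_with + out_without)
is_burst = stat > CRITICAL_1DF_005 and rate_in > rate_out
```

A term is flagged here only when the statistic is significant and its rate inside the window exceeds its rate outside, since the statistic alone does not tell the direction of the difference.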

Related work can be roughly divided into three groups: 1) those that use text mining and data mining approaches (Hatzivassiloglou et al., 2000; Franz and McCarley, 2001; Kollios et al., 2003; Clifton et al., 2004; Kuramochi and Karypis, 2004; Ozmutlu, 2006; Aurora et al., 2007; Chou and Chen, 2008); 2) those that use time-line burst detection of feature terms and measurements (Manmatha et al., 2002; Yang et al., 2005; Wang et al., 2007; Chen et al., 2007); and 3) those that use combined content analysis or link analysis (Stokes and Carthy, 2001; Yang et al., 2002; Wu et al., 2004; Ozmutlu and Cavdur, 2005; Jin et al., 2007; Steyvers et al., 2007; Jo et al., 2007; Nallapati et al., 2008; Ontrup et al., 2008; Zhang et al., 2008).

The principal task of time-line burst detection of feature terms and measurement is to determine when or whether a topic is emerging, whereas the other groups focus on detecting the burst of a new topic.

There has been some work extending tracking or detecting techniques to such areas as literature-related discovery (Kostoff, 2008; Kostoff, Briggs, Solka et al., 2008; Kostoff, Bhattacharya and Pecht, 2007), topological analysis of citation networks (Shibata, Kajikawa, Takeda and Matsushima, 2008; Shibata, Kajikawa and Matsushima, 2007; Shibata, Kajikawa, Takeda, Salata and Matsushima, 2010), and combinations of other techniques and assessment tools (Daim et al., 2006; Tran and Daim, 2008; Kostoff, Johnson et al., 2007; Kostoff, Briggs, Rushenberg et al., 2007).

This area of endeavor is called tech mining. It has been applied not only to find research topics for papers but also to identify new technologies and new patents for industry, organizations and governments.

1.2. Research issue

The lifecycle of a research topic can be expressed as an S-curve with five stages: the initial stage, early stage, expansion stage, maturation stage and decline stage (Braun, Schubert and Kostoff, 2000). A topic in the maturation stage, at the peak of its lifecycle, is well known, but this also means that it will soon decline. Thus, it is important for researchers to discover potential research topics as they emerge in the expansion stage of their lifecycle.

This study develops a set of novel indices for identifying such emerging topics, to help researchers determine whether a topic has the potential to become a hot topic in its lifecycle. Timing matters because the marginal impact of a paper shrinks as the literature on a topic grows. For example, if there are ten papers in which a topic is discussed, then the impact of a new paper on the same topic can be calculated to be 1/11=0.0909. In contrast, when there are 1000 papers in which a topic is discussed, the impact of an additional paper can be calculated to be 1/1001=0.000999.
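The arithmetic in this example generalizes to 1/(n+1) for a topic that already has n papers. The short sketch below reproduces the two figures; the function name `marginal_share` is a hypothetical label chosen for illustration, not a term used in this study.

```python
def marginal_share(existing_papers):
    """Share of a topic's literature contributed by one additional paper.

    With n existing papers, the new paper is one of n + 1 in total.
    """
    if existing_papers < 0:
        raise ValueError("existing_papers must be non-negative")
    return 1 / (existing_papers + 1)


# The two cases from the text: 10 and 1000 existing papers.
for n in (10, 1000):
    print(n, round(marginal_share(n), 6))
```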

The concept of novelty as applied by Zheng et al. (2002) and aging theory as applied to TDT by Chen et al. (2003) are of assistance when constructing an index for detecting emerging topics. Chen et al. (2007) used aging theory and term frequency to solve the problem of topic detection, and reported that aging theory was the best solution. However, they claimed only that they could detect a topic while it was still emerging, not before it emerged. Here, we use the newly developed indices to examine, from the novelty and published volume of a topic, its stage in the life cycle (expansion, maturation or decline) and to determine the topic's research potential based on these conditions. This method gives researchers internal as well as external indications of the potential of an emerging topic.

Based on the proposed indices we can determine whether the topic of a conference or journal paper is representative of a leading trend, and how long the trend will continue. This will help researchers decide in advance whether they should pursue a topic. However, not all new topics are valuable, and those that are not should not become a focus of considerable research. This study develops a detection table to solve this problem. Novelty, aging theory and the curve representing cumulative relative frequency are all used to develop an appropriate set of indices for detecting emerging topics.

1.3. Research approach

Inductive learning and deductive prediction methods of machine learning help in constructing predictive indices and determining whether they are feasible. Whether a topic will develop depends on how much it is discussed in papers within the same period. That is, the volume of published studies on a topic helps determine whether it is important.

Additionally, whether a topic has potential is partly based on its novelty.

Generally, as the number of times a topic is accessed increases, its potential value increases. A valuable topic attracts the attention of many researchers. Novel topics have considerable content that has not yet been discussed. Whether in empirical or theoretical studies, novel topics have not yet been completely developed, and there is ample room for applications in the real world. Consequently, the novelty and volume of papers published on a topic are important indices for determining whether it has the potential to become a hot new emerging topic. The novelty index (NI) and published volume index (PVI) are developed and utilized to identify the detection point (DP) of a topic.

The DP in a topic's life cycle moves as later papers continue to be published. The moving DPs form a detection period, which indicates that the topic is emerging and valuable. Both its novelty and its popularity (published volume) are at their highest in this period.

The NI in the period before a specific topic emerges is higher than in the period after it emerges. Conversely, the PVI is higher in the subsequent period than in the previous one. Hence, when a large volume of work is published on a topic, it is an emerging topic. When a large percentage of the publications appears in the earlier period, the PVI curve rises early and the DP forms in that earlier period, which marks the topic as emerging at an early age. If delays in published volume are significant, the rise in the curve is delayed until a subsequent period. Regardless of when the PVI starts increasing, the rise can be viewed as representative of the topic. While the DP keeps moving, with the intersection delayed by subsequently published volume, the topic continues to grow and remains emerging for a period of time. All lifecycle records of these topics in journal and conference publications in the ACM database (at least until 2007) are compiled to form a detection table. Comparing the DP of a topic in conferences with its DP in journals helps determine whether conferences or journals lead the trend for that topic.
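The interplay between a falling novelty curve and a rising volume curve can be sketched as follows. This section does not define the NI or PVI, so the sketch rests on loudly stated assumptions. It uses a cumulative relative frequency of yearly paper counts as a stand-in volume curve, a made-up declining novelty series, and the first period in which the volume curve meets or exceeds the novelty curve as a stand-in for the DP. All names and numbers are hypothetical.

```python
from itertools import accumulate


def cumulative_relative_frequency(counts):
    """Running share of all papers published up to and including each period."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [running / total for running in accumulate(counts)]


def first_crossing(novelty, volume):
    """Index of the first period where the volume curve meets or exceeds
    the novelty curve, or None if the curves never cross.

    ASSUMPTION: treating this crossing as the detection point is an
    illustrative simplification, not the definition used in this study.
    """
    for i, (n, v) in enumerate(zip(novelty, volume)):
        if v >= n:
            return i
    return None


# Hypothetical data for one topic.
years = [2001, 2002, 2003, 2004, 2005]
papers_per_year = [1, 2, 5, 8, 4]          # made-up publication counts
novelty = [0.9, 0.7, 0.5, 0.3, 0.2]        # made-up declining novelty scores

volume = cumulative_relative_frequency(papers_per_year)
idx = first_crossing(novelty, volume)
detection_year = years[idx] if idx is not None else None
```

Re-running the sketch as later counts are appended changes the total and therefore the whole volume curve, which is one simple way to see why a crossing point computed like this moves as more papers are published.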

The remainder of this paper is organized as follows. Section 2 discusses extensions of the theory's application, and the development of the TDT techniques and aging theory utilized in the method. Section 3 describes how the indices are developed and constructed in order to detect emerging topics. Section 4 presents the experimental design, execution and results. Section 5 discusses the implications and contributions of the study. Section 6 gives conclusions and directions for future work.
