
Validate the Accuracy and Effectiveness of the Emerging Topic Detection

Chapter 8 The Research Experiment of the Development of Emerging Topic



Year  Journal  Virtual Environments JVDP  Conference  Virtual Environments CVDP
7th year VDP  0.326  0.295  0.304  0.381
8th year VDP  0.299  0.271  0.229
9th year VDP  0.291  0.257  0.165
10th year VDP  0.277  0.236  0.167
11th year VDP  0.255  0.203  0.131
12th year VDP  0.256  0.191  0.113
13th year VDP  0.250  0.178  0.135
14th year VDP  0.234  0.168  0.120
15th year VDP  0.218  0.150  0.107
16th year VDP  0.213  0.153  0.102
17th year VDP  0.197  0.138  0.099

8.4 Validate the Accuracy and Effectiveness of the Emerging Topic Detection Indices

To validate the accuracy and effectiveness of the emerging topic detection indices, we looked for related work that also detects emerging topics but covers a different time range and research field. The most similar work we found is Jo et al. (2007), published at SIGKDD. Their approach is based on the intuition that documents related to a topic should be more cohesively connected in the citation graph than a random selection of documents.

Their work used the CiteSeer data, which contains 716,771 papers with 1,740,326 citations, or about 2.43 citations per paper. For each paper, they combined the title and abstract into a single document. After pruning low-frequency bigrams and 35 stop words, the corpus contains 631,839 bigrams. The majority of the papers were published between 1994 and 2004.
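The document construction and bigram pruning described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the stop-word list, the frequency threshold `MIN_COUNT`, the tokenizer, and the toy corpus are all hypothetical, since the passage does not give the settings used by Jo et al. (2007).

```python
import re
from collections import Counter

# Hypothetical stop-word list and pruning threshold (assumptions, not the
# values used by Jo et al., 2007).
STOP_WORDS = {"the", "of", "a", "an", "and", "in", "for", "to", "on", "with"}
MIN_COUNT = 2


def document_text(title, abstract):
    """Combine a paper's title and abstract into one document string."""
    return f"{title} {abstract}"


def bigrams(text):
    """Lowercase, tokenize, drop stop words, then pair adjacent tokens.

    Stop words are removed before pairing, so a bigram may span a
    removed stop word.
    """
    tokens = [t for t in re.findall(r"[a-z]+", text.lower())
              if t not in STOP_WORDS]
    return list(zip(tokens, tokens[1:]))


def corpus_bigrams(papers, min_count=MIN_COUNT):
    """Count bigrams over all papers and prune those below min_count."""
    counts = Counter()
    for title, abstract in papers:
        counts.update(bigrams(document_text(title, abstract)))
    return {bg: n for bg, n in counts.items() if n >= min_count}


# Toy corpus, invented for illustration only.
papers = [
    ("Image retrieval with color features",
     "We study image retrieval on the web."),
    ("Sensor network routing",
     "A sensor network protocol for image retrieval."),
]
kept = corpus_bigrams(papers)

# The citations-per-paper figure quoted above, from the reported totals.
citations_per_paper = 1_740_326 / 716_771
```

On the toy corpus only the bigrams that occur at least twice survive the pruning step, which mirrors how low-frequency bigrams are dropped before topics are examined.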

Their research approach, database, and time range all differ from ours. Nevertheless, when we examine our results over the same time range, our approach identifies the emerging time they proposed for each topic. Their work covers both conference papers and journal papers. To map our results onto theirs, we selected topics that they reported as emerging: image retrieval, sensor network, semantic web, and support vector.


1. Image retrieval

Since their work contains both conference and journal papers, we also combined the volume of conference and journal papers for each year. Their original graph is shown in Fig. 8-1. Applying our approach to our database gives Fig. 8-2, which shows the published volume of image retrieval in ACM.
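The per-year combination of conference and journal volumes can be sketched as follows. This is a minimal illustration with invented counts, not the ACM data; the names `conference_volume`, `journal_volume`, and `combined_volume` are hypothetical.

```python
from collections import Counter

# Hypothetical yearly paper counts for one topic (not taken from the ACM data).
conference_volume = {1997: 3, 1998: 5, 1999: 9}
journal_volume = {1998: 1, 1999: 2}


def combined_volume(conference, journal):
    """Sum conference and journal counts per year.

    A year missing from one source contributes 0 for that source.
    """
    total = Counter(conference)
    total.update(journal)  # Counter.update adds counts rather than replacing
    return dict(sorted(total.items()))


volume = combined_volume(conference_volume, journal_volume)
for year, count in volume.items():
    print(year, count)
```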

Fig. 8-1 The Topic of Image Retrieval Evolution Over Time from 1994 to 2004 (Jo et al., 2007).

Fig. 8-2 The Topic of Image Retrieval Evolution Over Time from 1997 to 2008 in This Research.

Over the full range, the detection point falls in 2002, which indicates that the topic of image retrieval started to emerge in ACM in 2002. This appears to differ from their result. However, if we restrict the time range to 1997 to 2004, so that it ends in the same year as their data, we arrive at the results displayed in Fig. 8-3. We do not use the full 1994 to 2004 range because image retrieval only starts to develop in ACM in 1997. Within this range the detection point falls in 2000, which is consistent with their result, where the published volume peaks in 2000.

Fig. 8-3 The Detection Point of Image Retrieval from 1997 to 2004

2. Sensor networks

Sensor network is one of the most emergent topics in their work, with an emerging time of 2004. Fig. 8-4 shows the topic evolution of sensor networks over time. Fig. 8-5 and Fig. 8-6 show the original published volume of conference papers and journal papers, respectively. The conference detection point falls in 2004, the same as in Jo et al. (2007), and the journal detection point falls in 2006. The result is shown in Fig. 8-7.

[Plot: PVI and NI by year, 1997 to 2004, with the detection point marked. Vertical axis: the value of detection point.]

Fig. 8-5 The Conference Papers Published Volume of Sensor Networks in ACM Database.

Fig. 8-6 The Journal Papers Published Volume of Sensor Networks in ACM Database.

Fig. 8-7 The Detection Point of Sensor Networks in ACM Database.

[Plots: the published volume by year, 1998 to 2008 and 2003 to 2008; J-volume, J-novelty, C-volume, and C-novelty by year, 1998 to 2008, with the C-detection point and J-detection point marked. Vertical axis of the last plot: the value of detection point.]

3. Semantic web

Semantic web, which emerged in 2004, is another of the most emergent topics discussed in their work. Fig. 8-8 shows the topic evolution of semantic web over time. Fig. 8-9 and Fig. 8-10 show the original published volume of conference papers and journal papers, respectively. The conference detection point falls in 2004, the same as in Jo et al. (2007), and the journal detection point falls in 2007. The result is shown in Fig. 8-11.

Fig. 8-8 The Topic Evolution of Semantic Web Over Time from 1994 to 2004 (Jo et al., 2007).

[Plot: the published volume by year, 2001 to 2008.]

Fig. 8-10 The Journal Papers Published Volume of Semantic Web in ACM Database.

Fig. 8-11 The Detection Point of Semantic Web in ACM Database.

4. Support vector

Support vector is one of the emerging topics in their work, with an emerging time of 2004. Fig. 8-12 shows the topic evolution of support vector over time. The conference detection point falls in 2004, the same as in Jo et al. (2007), and the journal detection point falls in 2005. The result is shown in Fig. 8-13. All four topics that Jo et al. (2007) considered emerging are thus detected correctly by the emerging topic detection indices.
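The comparison for the three topics with separate conference and journal detection points can be tabulated with a short script. The years below are the ones reported in this section; the dictionary and function names are hypothetical, and image retrieval is left out because the section gives it a single combined detection point (2000, within the 1997 to 2004 range) rather than separate conference and journal years.

```python
# Emerging years reported by Jo et al. (2007) and detection years reported
# in this section, keyed by topic.
JO_ET_AL = {"sensor networks": 2004, "semantic web": 2004, "support vector": 2004}
CONFERENCE = {"sensor networks": 2004, "semantic web": 2004, "support vector": 2004}
JOURNAL = {"sensor networks": 2006, "semantic web": 2007, "support vector": 2005}


def summarize(reference, conference, journal):
    """For each topic, check conference agreement and compute the journal lag."""
    rows = []
    for topic, ref_year in reference.items():
        rows.append({
            "topic": topic,
            # Does the conference detection year equal the reference year?
            "conference_matches": conference[topic] == ref_year,
            # Years between the conference and journal detection points.
            "journal_lag": journal[topic] - conference[topic],
        })
    return rows


summary = summarize(JO_ET_AL, CONFERENCE, JOURNAL)
for row in summary:
    print(row["topic"], row["conference_matches"], row["journal_lag"])
```

Read this way, the conference detection point matches the reference year for each of the three topics, while the journal detection point follows one to three years later.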

[Plots: the published volume by year, 2004 to 2008; J-volume, J-novelty, C-volume, and C-novelty by year, 2001 to 2008, with the C-detection point and J-detection point marked. Vertical axis of the last plot: the value of detection point.]

Fig. 8-12 The Topic Evolution of Support Vector Over Time from 1994 to 2004 (Jo et al., 2007).

Fig. 8-13 The Detection Point of Support Vector in ACM Database