Chapter 7 The Indices for Emerging Topic Detection
CDP (VCDP) and value of the JDP (VJDP). The VCDP is the Y-axis value of the DP for conferences and equals the value of both the CPVI and the CNI at that point. The VJDP is the Y-axis value of the DP for journal papers and equals the value of both the JPVI and the JNI at that point. This study uses the year as its unit. A DP that does not fall exactly on a year must therefore lie between two consecutive years. The VCDP lies between 2002 and 2003, and its value is determined by $CPVI_{2002} = 0.227$, $CPVI_{2003} = 0.348$, $CNI_{2002} = 0.250$, and $CNI_{2003} = 0.200$ (Fig. 7-1). The exact value of the VCDP is the center of these four points and is calculated as follows:

$$VCDP = \frac{CPVI_{2002} + CPVI_{2003} + CNI_{2002} + CNI_{2003}}{4} = \frac{0.227 + 0.348 + 0.250 + 0.200}{4} \approx 0.256$$

Formula (7-5) can be used to compute the VDP.
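The four-point rule above can be sketched in Python. This is a minimal illustration and not the thesis's own code. It assumes that yearly PVI and NI values are stored in dictionaries keyed by year, and the function name `detection_point` is hypothetical. The function finds the first pair of consecutive years in which the NI moves from above the PVI to at or below it. It then takes the VDP as the mean of the four bracketing values.

```python
def detection_point(pvi, ni):
    """Locate the first NI/PVI crossing and the value of the DP (VDP).

    pvi, ni: dicts mapping year -> index value (hypothetical input format).
    Returns ((year_before, year_after), vdp), or None if the curves have
    not crossed yet.
    """
    years = sorted(set(pvi) & set(ni))
    for y0, y1 in zip(years, years[1:]):
        # NI starts above PVI and ends at or below it: the DP lies in [y0, y1]
        if ni[y0] > pvi[y0] and ni[y1] <= pvi[y1]:
            # Center of the four bracketing points, as in the VCDP example
            vdp = (pvi[y0] + pvi[y1] + ni[y0] + ni[y1]) / 4
            return (y0, y1), vdp
    return None


# Conference values quoted in the text for 2002 and 2003
cpvi = {2002: 0.227, 2003: 0.348}
cni = {2002: 0.250, 2003: 0.200}
interval, vcdp = detection_point(cpvi, cni)
print(interval, f"{vcdp:.5f}")  # (2002, 2003) 0.25625
```

A dictionary keyed by year keeps the sketch tolerant of series that start in different years. Only the years present in both the PVI and NI series are compared.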
7.3 The Properties of Emerging Topic Detection Indices
By combining the NI and PVI into the emerging topic detection indices and the detection table, this study can analyze academic publications and forecast topic trends.
7.3.1 Novelty Index Properties
This study defines NI = 1/n, where n is the PDY. We suggest that this curve can be used even though it has not been verified. The research assumes that a relationship exists between conferences and journals, namely the leading-and-following relationship (Tu & Seng, 2009). Because the NI is applied in the same way to conferences and journals, it will produce the same result regarding that relationship as any validated index would. We therefore assert that the NI is a reasonable and convenient index.

Determining the entire lifecycle of a topic requires two steps. First, one must know the termination date, at which the published volume falls to 0. Second, one must determine the novelty of each year relative to that date. For instance, if a topic is known to last 10 years, the NI would be 1 in the first year and 0.9 in the second year, continuing in this way until the last year. However, one cannot know when a topic terminates until it has actually terminated.

Using NI = 1/n avoids this problem. It requires no termination date and simply assumes that novelty decreases as the PDY increases. Hence, regardless of the topic, the NI is 1/n in the nth PDY. It is 1 in the first year, 1/2 = 0.5 in the second year, and so on.
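The contrast between NI = 1/n and a lifecycle-based novelty can be sketched as follows. Both function names are hypothetical. `lifecycle_novelty` only illustrates the alternative described in the paragraph, in which novelty falls by a fixed step each year. That alternative requires the total lifecycle length to be known in advance.

```python
def novelty_index(n):
    """NI = 1/n, where n is the PDY (n = 1 in the first year)."""
    if n < 1:
        raise ValueError("PDY must be at least 1")
    return 1 / n


def lifecycle_novelty(k, total_years):
    """Hypothetical alternative that needs a known lifecycle length:
    novelty starts at 1 and falls by 1/total_years each year."""
    return 1 - (k - 1) / total_years


print([novelty_index(n) for n in range(1, 6)])
# [1.0, 0.5, 0.3333333333333333, 0.25, 0.2]
print(lifecycle_novelty(2, 10))  # 0.9
```

`novelty_index` needs only the current PDY, while `lifecycle_novelty` cannot be evaluated until `total_years` is known. This is the limitation that the 1/n definition avoids.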
國立政治大學 National Chengchi University
7.3.2 Published Volume Index Properties
As mentioned, the PVI improves the forward effect compared with the traditional frequency measure. That is, it signals the emergence of a topic earlier. This study uses the XML data in Table 7-5 to describe the properties of the PVI and to illustrate how the PVI reflects the emergence of XML.
Table 7-5 The PVI Table of Different Situations in the XML Example.
Year               2001  2002  2003  2004  2005  2006  2007  2008
Original-2006         3    11     7    11    15    20   N/A   N/A
Decrease-2008         3    11     7    11    15    20    10     5
Increase-2008         3    11     7    11    15    20    40    80
PVI-2006           0.04  0.21  0.31  0.48  0.70  1.00   N/A   N/A
PVI-2008-decrease  0.03  0.16  0.23  0.36  0.52  0.74  0.94  1.00
PVI-2008-increase  0.02  0.08  0.13  0.19  0.28  0.40  0.57  1.00

In Fig. 7-2, the Original-2006 curve represents the journal data on XML from 2001 to 2006. The table also includes two constructed scenarios that extend the data to 2008:

- Decrease-2008 assumes that the volume decreases after 2006. The volume in 2007 is half that of 2006 (10), and the volume in 2008 is half that of 2007 (5).
- Increase-2008 assumes that the volume increases after 2006. The volume in 2007 is twice that of 2006 (40), and the volume in 2008 is twice that of 2007 (80).

PVI-2006, PVI-2008-decrease, and PVI-2008-increase are the indices computed from Original-2006, Decrease-2008, and Increase-2008, respectively.
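Below is a minimal sketch of the PVI computation. It assumes that the PVI is the accumulated relative frequency mentioned in Section 7.3.3, that is, the cumulative volume divided by the total volume over the observed years. `published_volume_index` is a hypothetical name. Under this assumption the sketch reproduces the PVI-2006 row of Table 7-5, but it does not reproduce the two 2008 rows exactly. The thesis's PVI may therefore involve details not shown in this section.

```python
from itertools import accumulate


def published_volume_index(volumes):
    """Accumulated relative frequency: cumulative volume / total volume.

    volumes: yearly published volumes in chronological order.
    """
    total = sum(volumes)
    if total == 0:
        raise ValueError("total published volume must be positive")
    return [c / total for c in accumulate(volumes)]


original_2006 = [3, 11, 7, 11, 15, 20]  # XML journal volumes, 2001-2006
print([round(v, 2) for v in published_volume_index(original_2006)])
# [0.04, 0.21, 0.31, 0.48, 0.7, 1.0]
```

Because the last cumulative value always equals the total, the final PVI of any series is 1.00, as in every PVI row of Table 7-5.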
When the published volume increases relative to the past, as in PVI-2008-increase, the PVI forms a concave curve that opens upward. Conversely, when the volume decreases relative to the past, as in PVI-2008-decrease, the PVI forms a convex curve that opens downward.

Comparing the two scenarios shows what each shape means. In PVI-2008-decrease, the 2006 volume is large relative to the total. The curve has therefore already risen steeply by 2006, indicating that the topic is becoming a hot topic. In PVI-2008-increase, the share contributed by 2006 is lower because the largest volumes occur after 2006. The curve is therefore still relatively flat in 2006, indicating that the topic had not yet become a hot topic in 2006.
Fig. 7-2 The PVI Curves of Different Situations in the XML Example.
7.3.3 Detection Point Properties
The DP is the intersection of the NI and PVI curves. It yields the YDP and VDP, the year and value of the DP.
Sections 7.2.1 and 7.3.2 discussed the properties on which the YDP is based. Building on them, this section uses the accumulated relative frequency to determine the properties of the DP and to validate its effectiveness.
1. As the YDP increases, the DP is delayed
This study compares the PVI-2006 curve with the PVI-2008-decrease and PVI-2008-increase curves. As long as a topic keeps developing (its published volume is not 0), the intersection point is delayed, whether the volume increases or decreases. This is reasonable because a later YDP means the topic has not yet reached the highest point of its lifecycle and is still in its growth stage. By contrast, for PVI-2006 the DP must occur before 2006, and its YDP is 2004.
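The delay of the DP can be illustrated by intersecting each PVI row of Table 7-5 with NI = 1/n. This sketch makes two assumptions not stated in the text:

- n = 1 in 2001, the first year in the table.
- Linear interpolation between yearly points gives a fractional crossing year.

`crossing_year` is a hypothetical helper, and the fractional years it returns are illustrative only. The thesis reports whole-year YDPs.

```python
def crossing_year(years, pvi, ni):
    """Year at which the piecewise-linear NI and PVI curves first cross.

    Linear interpolation between yearly points is an illustrative assumption.
    Returns None if NI never falls to or below PVI.
    """
    for i in range(len(years) - 1):
        d0 = ni[i] - pvi[i]
        d1 = ni[i + 1] - pvi[i + 1]
        if d0 > 0 and d1 <= 0:
            # Fraction of the year interval at which NI - PVI reaches zero
            return years[i] + d0 / (d0 - d1)
    return None


years = list(range(2001, 2009))
ni = [1 / n for n in range(1, len(years) + 1)]  # assumes n = 1 in 2001
scenarios = {  # PVI rows copied from Table 7-5
    "PVI-2006": [0.04, 0.21, 0.31, 0.48, 0.70, 1.00],
    "PVI-2008-decrease": [0.03, 0.16, 0.23, 0.36, 0.52, 0.74, 0.94, 1.00],
    "PVI-2008-increase": [0.02, 0.08, 0.13, 0.19, 0.28, 0.40, 0.57, 1.00],
}
for name, pvi in scenarios.items():
    k = len(pvi)  # PVI-2006 covers only 2001-2006
    print(name, round(crossing_year(years[:k], pvi, ni[:k]), 2))
# PVI-2006 2003.09
# PVI-2008-decrease 2003.48
# PVI-2008-increase 2004.43
```

Under these assumptions, the crossing moves later from PVI-2006 to PVI-2008-decrease, and later still for PVI-2008-increase. This matches the ordering described in this section.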
2. As the frequency in a year increases, the entire curve rises
Consider PVI-2008-decrease. Its highest volume occurs in 2006, and its curve intersects the NI earlier than PVI-2008-increase does. This corresponds to a topic in its mature stage, whose volume is beginning to fall. Conversely, PVI-2008-increase indicates that 2006 was not the peak year of the topic's lifecycle, because the highest volume is not reached until 2008. The delayed DP indicates that the topic is not yet hot.
[Fig. 7-2 plot: index value (0.00–1.20) versus year (2001–2008) for PVI-2006, PVI-2008-decrease, PVI-2008-increase, and Novelty, with the DP of each PVI curve marked.]
3. The timing of the DP
When a topic has a high PVI in its early stage, the curve rises early and the DP forms in the earlier part of the curve. This indicates that the topic is becoming hot at that time. By contrast, if the PVI is high only in a late stage, the DP is delayed.
However, once the PVI curve starts to increase, the topic is being discussed and is an emerging topic. The highest stage is not of concern, because by then the topic is already mature. The DP marks when the emerging topic appears. It always lies before the present and represents a trade-off between the NI and the PVI.
Given the relationship between conferences and journals, this study can use the proposed emerging topic detection indices to examine that relationship. If the reasoning is correct, the pattern of topics in conferences and journals will be the same regardless of how the NI is defined. In our database, this study detects the DP of XML in 2004, before the peak volume in 2006. This study cannot determine whether XML has already reached the highest volume in its history or will exceed it later. Nevertheless, the DP of 2004 matches the expected date. Hence, the PVI predicts when a topic emerges better than the traditional frequency method does.
The value of the DP, whether read from the NI or the PVI, is the maximal value of the trade-off between them and can be used to detect when a topic is emerging. We assert that the DP must exist before the topic becomes hot. Consequently, the DP must lie in the period from the first PDY to the present. Whether or not a topic becomes hot, the DP can still be calculated with the proposed indices, as long as the PDY is more than 2 years.
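One way to read the claim that the DP is maximal in the trade-off is this. The NI decreases every year, and the PVI, as a cumulative share, never decreases. The smaller of the two is therefore largest near their crossing. The sketch below checks this on the yearly grid for PVI-2006, again assuming n = 1 in 2001. `max_min_year` is a hypothetical name.

```python
def max_min_year(years, pvi, ni):
    """Return the year and value that maximize min(NI, PVI) on the yearly grid."""
    best = max(range(len(years)), key=lambda i: min(ni[i], pvi[i]))
    return years[best], min(ni[best], pvi[best])


years = list(range(2001, 2007))
ni = [1 / n for n in range(1, 7)]  # assumes n = 1 in 2001
pvi_2006 = [0.04, 0.21, 0.31, 0.48, 0.70, 1.00]  # PVI-2006 row of Table 7-5
print(max_min_year(years, pvi_2006, ni))  # (2003, 0.31)
```

On the yearly grid, the maximum falls in 2003, the last year before the two curves cross.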
Hence, this study uses the YDP and VDP to identify when a topic becomes hot. It also uses the emerging topic detection table to assess whether a research topic is worth retaining.