
Chapter 9 Conclusions and Future Work


National Chengchi University (國立政治大學)

The ACM Digital Library and the IEEE Computer Society were adopted as the databases for searching conference papers. Both are renowned academic communities in computer science, and each holds an extensive collection of conference papers in all formats from its own community.

The ACM Digital Library, the IEEE Computer Society, ProQuest, and ScienceDirect Onsite were adopted as the four databases for searching journal papers. The two additional databases compensate for the relatively sparse journal collections of the ACM Digital Library and the IEEE Computer Society, and they help generalize our findings so that our deductions can be extended further.

In the first part of the study, paper titles were adopted as the basis for information extraction, since a title is strongly representative of the entire article. In the second and third parts, we used both the title and the abstract as the descriptors of a paper. TextAnalyst was used to analyze the words in the titles and to single out those repeated more than three times as the features for this research. The algorithm embedded in the tool may impose some limitations on the results.
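The feature selection rule described above (keep title terms repeated more than three times) can be illustrated with a minimal sketch. This is not TextAnalyst's algorithm, which is not described here; it is a plain frequency count used as a stand-in. The function name `frequent_title_terms`, the sample titles, and the choice to count both single words and two-word phrases are assumptions made for illustration only.

```python
import re
from collections import Counter


def frequent_title_terms(titles, min_count=4, max_n=2):
    """Return the terms (1- to max_n-word phrases) seen at least min_count times.

    "Repeated more than three times" is read here as a count of four or more.
    No stop-word removal or stemming is applied in this sketch.
    """
    counts = Counter()
    for title in titles:
        # Lowercase the title and keep alphabetic tokens only.
        tokens = re.findall(r"[a-z]+", title.lower())
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return {term: c for term, c in counts.items() if c >= min_count}


# Hypothetical titles, invented for this example.
titles = [
    "Mining sensor network data",
    "Sensor network routing",
    "A survey of sensor network security",
    "Sensor network query processing",
    "Semantic web services",
]
features = frequent_title_terms(titles)
print(features)
```

With these sample titles, only "sensor", "network", and "sensor network" pass the threshold. A real pipeline would likely also remove stop words and merge word variants, which this sketch leaves out.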

The novel methods we proposed can mitigate the limitations of the impact factor proposed by ISI. In addition, they use the impact power of the authors and of the publication within a topic to measure the impact power of a paper before it has actually become an impact paper, which addresses the limitations of Google Scholar's approach. We suggest that the topic-oriented thinking behind our methods can help researchers search for valuable topics.

9.2 Future Work

Future work can focus on the four relationships between conferences and journals. A single term or an n-gram term may not be sufficient to represent a topic; combining terms, or adopting another approach that captures the semantic meaning of topics, would improve the method. The descriptors and keywords we used to search the databases also limit how well the papers and the research field are represented.

In addition, the Bayesian estimation could be extended to use the Markov Chain Monte Carlo (MCMC) method. To verify the precision of the impact power, the cited frequency and the published volume can be used to determine the impact power of authors or publications. It makes sense that an author's impact power can serve as an endorsement of a publication, and vice versa. The indices need more information so that they fit reality more closely. If a genuinely impactful paper was not published by an impact author or in an impact publication, this method would not detect it. The method also overlooks new but promising scholars, since it is based on accumulated cited frequency. The field of a cited publication needs to be considered in order to identify the actual community we want to discuss for a topic. All the procedures and algorithms of academic intelligence in this dissertation can be combined to find the emerging topics in a researcher's field. From an application point of view, the methods can also be applied by publishers such as Reuters or the Washington Post to detect whether an issue is too hot. They can also be applied to stock index estimation to judge whether the stock market is overheated. We expect that researchers will find further creative ways to apply the ideas of this study in their own research domains.
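As a rough illustration of the MCMC direction mentioned for future work, the following is a minimal Metropolis sampler. It does not reproduce the Bayesian estimation used in this dissertation; the model is an assumption chosen for simplicity: a topic term appears in k of n papers, the unknown rate p has a uniform prior, and the sampler estimates the posterior mean of p. The names `log_posterior` and `metropolis_rate`, and the values of k and n, are hypothetical.

```python
import math
import random


def log_posterior(p, k, n):
    """Unnormalized log posterior of rate p under a uniform prior (assumed model)."""
    if p <= 0.0 or p >= 1.0:
        return float("-inf")
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def metropolis_rate(k, n, steps=20000, burn_in=2000, step_size=0.1, seed=0):
    """Estimate the posterior mean of p with a random-walk Metropolis sampler."""
    rng = random.Random(seed)
    p = 0.5
    current = log_posterior(p, k, n)
    total, kept = 0.0, 0
    for i in range(steps):
        # Symmetric Gaussian proposal around the current value.
        proposal = p + rng.gauss(0.0, step_size)
        candidate = log_posterior(proposal, k, n)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            p, current = proposal, candidate
        if i >= burn_in:
            total += p
            kept += 1
    return total / kept


# Hypothetical counts: the term appears in 30 of 100 papers.
estimate = metropolis_rate(k=30, n=100)
analytic = (30 + 1) / (100 + 2)  # closed-form posterior mean for a uniform prior
print(round(estimate, 3), round(analytic, 3))
```

For this simple model the posterior mean has a closed form, so the sampler is unnecessary in practice; the comparison only shows that the sampled estimate approaches the analytic value. MCMC becomes useful when the model has no such closed form.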



Appendix A Questionnaire

Dear Professors:

Greetings. I am a Ph.D. candidate at National Chengchi University. This is a research questionnaire for my dissertation, a study of Academic Intelligence Involving Information Retrieval. The questionnaire uses the common topic of Data Mining to test the research model. It is divided into three parts: (A) the emerging topics life cycle, (B) the impact power of authors, and (C) the impact power of publications. Thank you very much for your kind help, time, and effort.

Best regards,

Dr. Jia-Lang Seng
Distinguished Professor & Chair, Dept & Graduate School of Accounting
National Chengchi University

Dr. Woo-Tsong Lin
Associate Dean, College of Commerce
National Chengchi University

Ph.D. Candidate Yi-Ning Tu
Department and Graduate School of Management Information Systems
National Chengchi University
94356509@nccu.edu.tw
0916-075571

Part A : The Emerging Research Topics Life Cycle

Questions    Yes    No    No Opinion

1. Do you agree the topic “Image Retrieval” is emerging from the year 2000?

2. Do you agree the topic “Sensor Network” is emerging from the year 2004?

3. Do you agree the topic “Semantic Web” is emerging from the year 2004?

4. Do you agree the topic “Support Vector” is emerging from the year 2004?

Opinions:


Part B : The Impact Power of Authors in Data Mining (During 1990-2002)

Descriptions    Yes    No    No Opinion

1 Bing Liu
2 Charu C. Aggarwal
3 Chris Clifton
4 David W. Cheung
5 Foster J. Provost
6 George Karypis
7 Hannu T. T. Toivonen
8 Heikki Mannila
9 Huan Liu
10 Jiawei Han
11 Kyuseok Shim
12 Ming-syan Chen
13 Mohammed Javeed Zaki
14 Osmar R. Zaïane
15 Philip S. Yu
16 Rajeev Rastogi
17 Rakesh Agrawal
18 Ron Kohavi
19 Salvatore J. Stolfo
20 Venkatesh Ganti

Part C : The Impact Power of Publications in Data Mining

Part C1 : Journals list (During 1990-2002)

The title of the publications    Yes    No    No Opinion

1 Bioinformatics
2 Communications of the ACM (CACM)
3 Data Mining and Knowledge Discovery
4 IEEE Bulletin of the Technical Committee on Data Engineering
5 IEEE Computers
6 IEEE Transactions on Knowledge and Data Engineering
7 IEEE Transactions on Visualization and Computer Graphics
8 Knowledge and Information Systems
9 Machine Learning
10 SIGKDD Explorations
11 SIGMOD Record

Opinions:


Part C2 : Conferences list (During 1990-2002)

The title of the publications    Yes    No    No Opinion

1 ACM Conference on Computers and Security
2 ACM International Conference on Digital Libraries
3 ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD)
4 ACM Symposium on Principles of Database Systems
5 Advances in Digital Libraries Conference (ADL)
6 Advances in Distributed and Parallel Knowledge Discovery
7 Advances in Large Margin Classifiers
8 Advances in Neural Information Processing Systems (NIPS)
9 European Conference on Machine Learning (ECML)
10 Genetic and Evolutionary Computation Conference (GECCO)
11 IEEE International Conference on Tools with Artificial Intelligence (ICTAI)
12 Industrial Conference on Data Mining (ICDM)
13 International Conference on Data Engineering (ICDE)
14 International Conference on Data Warehousing and Knowledge Discovery (DaWaK)
15 International Conference on Database Theory (ICDT)
16 International Conference on Information and Knowledge Management (CIKM)
17 Lecture Notes in Computer Sciences (LNCS)
18 Lecture Notes in Artificial Intelligence (LNAI)
19 Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD)
20 Principles of Data Mining and Knowledge Discovery (PKDD)
21 SIAM International Conference on Data Mining (SDM)
22 SIGMOD
23 SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD)
24 Very Large DataBase (VLDB)

Opinions:

This is the end of the questionnaire.

Thank you for your precious opinion and help.