Chapter 5 Conclusions and Recommendations

5.3 Suggestions for Future Research

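Although this study obtained preliminary results on applying LSI to automatic document classification and demonstrated its feasibility, constraints of time and research scale prevented a deeper investigation of the related extended questions. Based on the methods adopted in this study and the limitations described in the preceding section, the following suggestions on directions and priorities are offered for subsequent research.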

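1. Beyond single-label classification, multi-label classification should also be examined, because a document that belongs to several categories at once better reflects real-world conditions. Given its characteristics and purpose, LSI may be better suited to the multi-label problem.

2. Use different document collections for related studies. This study used only a subset of documents selected from Inspec on Disc; future work could draw on more varied types of data to examine, broadly and in depth, the feasibility and performance of LSI for automatic document classification.

3. This study did not optimize the value of k when using the k-NN classification algorithm. Future work could examine the choice of k more systematically.

4. This study compared the traditional vector space method with LSI, pairing each with two classification algorithms, the centroid vector method and k-NN, in its document classification experiments. Future work could pair LSI with other classification algorithms, such as neural networks or probabilistic methods, to gain a fuller understanding of the performance of LSI in automatic document classification.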
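As an illustration of the third suggestion, one possible way to choose k systematically is leave-one-out cross-validation over a set of candidate values. The sketch below is not the procedure used in this study; it assumes the documents have already been projected into a reduced LSI space, and the toy vectors, labels, candidate values, and function names (`cosine`, `knn_predict`, `loo_accuracy`, `select_k`) are all hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_predict(train_vecs, train_labels, query, k):
    """Majority vote among the k training vectors most similar to query."""
    ranked = sorted(
        range(len(train_vecs)),
        key=lambda i: cosine(train_vecs[i], query),
        reverse=True,
    )
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(vecs, labels, k):
    """Leave-one-out accuracy of k-NN for a given k."""
    correct = 0
    for i in range(len(vecs)):
        # Hold out document i and classify it using all the others.
        rest_vecs = vecs[:i] + vecs[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        if knn_predict(rest_vecs, rest_labels, vecs[i], k) == labels[i]:
            correct += 1
    return correct / len(vecs)


def select_k(vecs, labels, candidates):
    """Return (best_k, scores); the smallest k wins when accuracies tie."""
    scores = {k: loo_accuracy(vecs, labels, k) for k in candidates}
    best_k = min(candidates, key=lambda k: (-scores[k], k))
    return best_k, scores


# Hypothetical toy data: six documents already projected into a 2-D LSI space.
docs = [
    [0.9, 0.1], [0.8, 0.2], [0.95, 0.05],   # category "A"
    [0.1, 0.9], [0.2, 0.8], [0.05, 0.95],   # category "B"
]
labels = ["A", "A", "A", "B", "B", "B"]

best_k, scores = select_k(docs, labels, candidates=[1, 3, 5])
print(best_k, scores)
```

On a real collection, the candidate range and the validation scheme (for example, k-fold instead of leave-one-out) would need to be chosen with the collection size in mind, since leave-one-out requires one full nearest-neighbor search per document for every candidate k.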

