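To trace how a technique has developed more efficiently, it helps to have the computer filter the retrieved search results a second time, keeping only the research works that take the given theory or technique as their main foundation. This lets researchers get far more out of the literature-review step for far less effort. Academic papers often use acronyms in place of full technique names, both for repeated mention and for readability. We noticed this trait and exploited it in our experiments: by analyzing the acronyms in academic documents, we look for techniques (and their names) that improve on or extend the technique of the root paper, and we filter out the documents that contain the information we need. On the first two test datasets our system achieves results quite close to manual filtering. Only the third dataset performs poorly, because acronyms are not widely used in that document set. Reducing or even avoiding the impact of this phenomenon is one direction for future improvement.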
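The acronym-based filtering summarized above can be illustrated with a minimal sketch. It is not the system's actual implementation: the pattern "full name (ACRONYM)", the initial-letter matching rule, the rule of keeping any document that defines an acronym other than the root acronym, and the names `find_acronym_definitions` and `filter_documents` are all assumptions made for illustration.

```python
import re

# Assumed pattern: 2 to 10 capital letters inside parentheses, e.g. "(PCM)".
ACRONYM_IN_PARENS = re.compile(r"\(([A-Z]{2,10})\)")


def find_acronym_definitions(abstract):
    """Return {acronym: full_name} for 'full name (ACRONYM)' patterns.

    An acronym is accepted only if its letters match, in order, the
    initials of the words immediately preceding the parenthesis.
    """
    definitions = {}
    for match in ACRONYM_IN_PARENS.finditer(abstract):
        acronym = match.group(1)
        # Alphabetic words (with positions) that precede the parenthesis.
        words = list(re.finditer(r"[A-Za-z]+", abstract[:match.start()]))
        candidate = words[-len(acronym):]
        if len(candidate) != len(acronym):
            continue
        initials = "".join(w.group(0)[0] for w in candidate)
        if initials.lower() == acronym.lower():
            # Slice the original text so hyphens in the name are preserved.
            start, end = candidate[0].start(), candidate[-1].end()
            definitions[acronym] = abstract[start:end]
    return definitions


def filter_documents(abstracts, root_acronym):
    """Keep documents whose abstract defines an acronym other than the root's.

    `abstracts` maps a (hypothetical) document id to its abstract text.
    """
    kept = {}
    for doc_id, abstract in abstracts.items():
        found = find_acronym_definitions(abstract)
        new_terms = {a: n for a, n in found.items() if a != root_acronym}
        if new_terms:
            kept[doc_id] = new_terms
    return kept
```

A document set in which authors write full names only, as in the third test dataset, yields no matches under this kind of rule, which is consistent with the weak result reported for that dataset.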
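In addition, by analyzing reference lists we observed how the documents that passed the filter cite one another, which is another way to assess the importance of these documents. We also performed co-citation analysis, trying to find cited works that are helpful for the root paper's technique or for the techniques represented by each acronym. The results yielded less information than expected, but they are still useful for reference. From the analysis of the three test datasets, we also found that some academic authors like to use acronyms, others use them only because the cited literature does and do not deliberately coin new ones, and some even prefer to write full names directly. Acronyms are widely used, but as a matter of habit rather than rule, and this directly affected our results. Nevertheless, for large collections of academic documents that are gathered through the relation of citing a common paper, and that contain many techniques sharing the same foundation and purpose, our method remains quite effective in most cases, apart from special cases such as the third dataset.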
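As a rough illustration of the two citation analyses mentioned above, the following sketch counts citations among the filtered documents and finds references shared by several of them. The input format (a mapping from document id to a list of reference ids), the threshold `min_citing`, and both function names are hypothetical and not taken from the system itself.

```python
from collections import Counter


def internal_citations(reference_lists):
    """Count how many other filtered documents cite each filtered document.

    `reference_lists` maps a document id to the ids in its reference list;
    a filtered document is assumed to appear under the same id when cited.
    """
    return {
        doc: sum(
            doc in refs
            for other, refs in reference_lists.items()
            if other != doc
        )
        for doc in reference_lists
    }


def co_cited_references(reference_lists, min_citing=2):
    """Return (reference, count) pairs cited by at least `min_citing` documents."""
    counts = Counter()
    for refs in reference_lists.values():
        counts.update(set(refs))  # count each citing document only once
    shared = [(ref, n) for ref, n in counts.items() if n >= min_citing]
    # Most frequently co-cited first; ties broken alphabetically.
    return sorted(shared, key=lambda item: (-item[1], item[0]))
```

With real data, the main difficulty is normalizing reference strings so that the same work receives the same id across documents; the sketch assumes this has already been done.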
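Because the individual sections of a document contain a great deal of redundant information, we restricted the acronym search to the abstract. If this problem can be solved so that the full text can be searched, we believe accuracy would improve considerably. Moreover, the test data are currently built by hand. The most user-friendly approach would be to overcome the remaining technical problems so that the system can download documents directly from the web and convert them automatically to plain-text files for analysis.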
Appendix A  Documents in each test dataset that passed the system's filter
TestData.1
1. H. Houissa and N. Boujemaa, “Coherence criterion for region labelling and description,” 2005.
2. B. L. Saux and N. Boujemaa, “Image Database Clustering with SVM-based Class Personalization,” SPIE Conference on Storage and Retrieval Methods and Applications for Multimedia, 2004.
3. H. Houissa and N. Boujemaa, “Region labelling using a Point-Based Coherence Criterion,” Proc. of SPIE, 2006.
4. B. Saux and N. Boujemaa, “Unsupervised categorization for image database overview,” Proc. Recent Advances in Visual Information Systems / International Conference on Visual Information System, Vol. 2314 of Lecture Notes in Computer Science, pp.163-174, March 2002.
5. B. L. Saux and N. Boujemaa, “Unsupervised robust clustering for image database categorization,” Proc. of the International IEEE Conference on Pattern Recognition (ICPR), 2002.
6. H. Frigui and R. Krishnapuram, “A Robust Competitive Clustering Algorithm with Applications in Computer Vision,” IEEE Trans. Pattern Analysis and Machine Intelligence, Vol.21, No.1, pp.450-465, January 1999.
7. L. Ren, G. W. Irwin, “Robust fuzzy Gustafson-Kessel clustering for nonlinear system identification,” International Journal of Systems Science, Vol.34, No.1, pp.787-803, 2003.
8. H. Frigui, C. Hwang and F.C.-H. Rhee, “Clustering and aggregation of relational data with application to image database categorization,” Pattern Recognition, Vol.40, pp.3053-3068, 2007.
9. O. Nasraoui, H. Frigui, R. Krishnapuram and A. Joshi, “Extracting web user profiles using relational competitive fuzzy clustering,” International Journal on Artificial Intelligence Tools, Vol.9, pp.509-526, 2000.
10. O. Nasraoui, H. Frigui, A. Joshi and R. Krishnapuram, “Mining Web Access Logs Using Relational Competitive Fuzzy Clustering,” Proc. of the Eighth International Fuzzy Systems Association World Congress, August 1999.
11. N. Grira, M. Crucianu, N. Boujemaa, “Fuzzy Clustering with Pairwise Constraints for Knowledge-Driven Image Categorization,” IEEE Proc. on Vision, Image and Signal Processing, 2006.
12. N. Grira, M. Crucianu and N. Boujemaa, “Semi-Supervised Fuzzy Clustering with Pairwise-Constrained Competitive Agglomeration,” IEEE International Conference on Fuzzy Systems (Fuzz'IEEE 2005), May 2005.
13. N. Grira, M. Crucianu and N. Boujemaa, “Semi-supervised image database categorization using pairwise constraints,” IEEE International Conference on Image Processing. Genova, Italy, 2005.
14. Y. Kanzawa, Y. Endo and S. Miyamoto, “Some Pairwise Constrained Semi-Supervised Fuzzy c-Means Clustering Algorithms,” MDAI 2009, LNAI 5861, pp.268-281, 2009.
15. A. Chourou and A. Benazza-Benyahia, “Lossless and Gradual Coding of Hyperspectral Images by Lifting Scheme,” Proc. of 15th European Signal Processing Conference (EUSIPCO 2007), Poznan, Poland, September 2007.
16. R. Tebourbi, A. Benazza-Benyahia and Z. Belhadj, “Multiscale Retrieval with partial query of multispectral satellite images”.
TestData.2
17. Z. Xie, A. Wang, F. L. Chung, “An enhanced possibilistic C-Means clustering algorithm EPCM,” Soft Computing, Vol.12, pp.593-611, 2008.
18. N. R. Pal, K. Pal, J. C. Bezdek, “A mixed c-means clustering model,” IEEE Proc. of the International Conference on Fuzzy Systems, pp.11-20, 1997.
19. N. R. Pal, K. Pal, J. M. Keller and J. C. Bezdek, “A possibilistic fuzzy c-means clustering algorithm,” IEEE Trans. Fuzzy Systems, Vol.13, No.4, pp.517-530, August 2005.
20. J. S. Lin, “Fuzzy possibilistic neural network to vector quantizer in frequency domains,” Optical Engineering, pp.839-847, April 2002.
21. S. H. Liu and J. S. Lin, “Vector quantization in DCT domain using fuzzy possibilistic c-means based on penalized and compensated constraints,” Pattern Recognition, Vol.35, No.10, pp.2201-2211, 2002.
22. F. Masulli, A. Schenone, “A fuzzy clustering based segmentation system as support to diagnosis in medical imaging,” Artificial Intelligence in Medicine, Vol.16, No.2, pp.129-147, 1999.
23. D. Q. Zhang and S. C. Chen, “Kernel based fuzzy and possibilistic c-means clustering,” Proc. International Conference on Artificial Neural Network (ICANN '03), Istanbul, Turkey, pp.122-125, June 2003.
24. M. De Cáceres, F. Oliva and X. Font, “On relational possibilistic clustering,” Pattern Recognition, Vol.39, No.11, pp.2010-2024, 2006.
25. V. S. Tseng and C. Kao, “A novel similarity-based fuzzy clustering algorithm by integrating PCM and mountain method,” IEEE Trans. on Fuzzy Systems, Vol.15, pp.1188-1196, December 2007.
26. J. S. Zhang, Y. W. Leung, “Improved possibilistic c-means clustering algorithms,” IEEE Trans. on Fuzzy Systems, Vol.12, pp.209-217, 2004.
27. R. Krishnapuram and J. M. Keller, “The possibilistic c-means algorithm: insights and recommendations,” IEEE Trans. Fuzzy Systems, Vol.4, pp.385-393, 1996.
TestData.3
28. L. Galluccio, O. Michel, P. Comon, A. O. Hero and M. Kliger, “Combining multiple partitions created with a graph-based construction for data clustering,” IEEE International Workshop on Machine Learning for Signal Processing, September 2009.
29. D. D. Abdala and X. Jiang, “An Evidence Accumulation Approach to Constrained Clustering Combination,” MLDM 2009, LNAI 5632, pp.361-371, 2009.
30. J. Duarte, A. Fred, F. Rodrigues, J. Duarte, S. Ramos and Z. Vale, “Definition of MV load diagrams via weighted evidence accumulation clustering using subsampling,” Proc. of the 6th WSEAS International Conference on Signal Processing, Robotics and Automation, Corfu Island, Greece, February 2007.
31. P. Casas, J. Mazel, P. Owezarski and Y. Labit, “Sub-Space Clustering and Evidence Accumulation for Unsupervised Anomaly Detection in IP Networks,” Hal-00485427, Version 1-20, May 2010.
32. W. Hasperue, L. Lanzarini, “Classification Rules Obtained from Evidence Accumulation,” On Information Technology Interfaces, ITI 2007, 2007.
33. S. S. Khan, Dr. S. Kant, “Computation of initial modes for K-modes clustering algorithm using evidence accumulation,” Proc. of the 20th International Joint Conference on Artificial Intelligence, Vol.7, pp.2784-2789, 2007.
Appendix B  Co-cited documents in each test dataset
TestData.1
1. R. N. Davé, “Characterization and detection of noise in clustering,” Pattern Recognition Letters, Vol.12, No.11, pp.657-664, 1991.
2. J. Z. Wang, J. Li and G. Wiederhold, “SIMPLIcity: Semantics-sensitive integrated matching for picture Libraries,” IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol.23, pp.947-963, 2001.
3. R. J. Hathaway, J. W. Davenport and J. C. Bezdek, “Relational duals of the c-means algorithms,” Pattern Recognition, Vol.22, pp.205-212, 1989.
4. R. J. Hathaway and J. C. Bezdek, “NERF c-Means: Non-Euclidean relational fuzzy clustering,” Pattern Recognition, Vol.27, pp.429-437, 1994.
5. H. Frigui and O. Nasraoui, “Unsupervised Learning of Prototypes And Attributes Weights,” Pattern Recognition, 2004.
6. R. R. Yager, “On ordered weighted averaging aggregation operators in multicriteria decision making,” IEEE Trans. on Systems, Man and Cybernetics, Vol.18, pp.183-190, 1988.
7. A. Dempster, N. Laird and D. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” Journal of Royal Statistical Society, Vol.39, pp.1-38, 1977.
8. N. Boujemaa, “On competitive unsupervised clustering,” Proc. of International Conference on Pattern Recognition, Spain, 2000.
9. B. Le Saux and N. Boujemaa, “Unsupervised robust clustering for image database categorization,” Proc. International Conference on Pattern Recognition, 2002.
10. S. Basu, A. Banerjee and R. J. Mooney, “Semi-supervised clustering by seeding,” Proc. of 19th International Conference on Machine Learning (ICML’02), pp.19-26, 2002.
11. N. Boujemaa, “On competitive unsupervized clustering,” Proc. of the International Conference on Pattern Recognition, Vol.1, pp.631-634, Barcelona, Spain, September 2000.
12. S. A. Nene, S. K. Nayar, and H. Murase, “Columbia object image library (coil-100),” tech. rep., Department of Computer Science, Columbia University, http://www.cs.columbia.edu/CAVE/, 1996.
TestData.2
13. O. Nasraoui and R. Krishnapuram, “Crisp Interpretations of Fuzzy and Possibilistic Clustering Algorithm,” Proc. EUFIT, Aachen, Germany, pp.1312-1318, 1995.
14. E. E. Gustafson and W. Kessel, “Fuzzy Clustering with a Fuzzy Covariance Matrix,” Proc. 1978 IEEE CDC, pp.761-766, 1979.
15. R. N. Dave and S. Sen, “Possibilistic c-means clustering for relational data,” IEEE Trans Fuzzy System, Vol.10, No.6, pp.713-727, 2002.
16. H. Timm, C. Borgelt, C. Döring, R. Kruse, “An extension to possibilistic fuzzy cluster analysis,” IEEE Trans Fuzzy Sets and Systems, Vol.147, pp.3-16, 2004.
17. R. N. Dave, R. Krishnapuram, “Robust clustering methods: A unified view,” IEEE Trans. Fuzzy System, Vol.5, pp.270-293, 1997.
TestData.3
18. A. Topchy, A. K. Jain and W. Punch, “Clustering ensembles: Models of consensus and weak partitions,” IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol.27, No.12, pp.1866-1881, December 2005.
19. A. Fred and A. K. Jain, “Data Clustering Using Evidence Accumulation,” Proc. 16th International Conference on Pattern Recognition, pp.276-280, 2002.
20. S. Theodoridis and K. Koutroumbas, “Pattern Recognition,” Academic Press, third edition, 2006.