

From the document: Concept-Based Automatic Question-Answering Exploration System (概念式自動問答探索系統), pp. 67-75

Chapter 5: Conclusions and Future Research Directions

Section 2: Future Research Directions

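This thesis uses LSA to derive the latent relationships between question terms and answer terms, and uses those relationships to search for a question's candidate answer terms. It also uses LSA to infer the semantic relationships among the terms in the document collection, and uses those to build the concept space for questions. In addition, it introduces a simple probabilistic module that computes how likely each answer type is given an interrogative term, and uses it to identify a question's answer type. Combining these three techniques not only raises the ACAF system developed in this thesis to the knowledge level but also improves its retrieval performance.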

We propose the following directions for future research:

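1. In ACAF's exploration mechanism, this thesis considers only whether question terms and answer terms occur together. From the standpoint of translation-model techniques, however, translating between two languages must also take word order into account. We therefore plan to add a measure of the order in which question terms and answer terms appear, to improve ACAF's search performance.

2. For word segmentation, this thesis only matches text against a lexicon to extract candidate terms. Boundaries between Chinese words are ambiguous, however, and lexicon matching alone cannot segment text correctly. Segmentation errors may therefore occur; these in turn cause errors in the derived relationships between question terms and answer terms and also affect the construction of the concept space. In the future we will introduce analysis of Chinese sentence structure to improve segmentation accuracy.

3. In LSA, dimension reduction is the key to success. In our experiments, we found the best reduced dimensionality by trying the possible candidates. In the future, we plan to analyze the raw data to find the relationship between the characteristics of the data and the best reduced dimensionality, so that less time is spent searching for it.

4. For answer-type identification, this thesis relies only on the question's interrogative term. In the future, we will add analysis of the question's sentence structure to improve the accuracy of answer-type identification.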
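For reference, the dimension reduction mentioned in item 3 is conventionally the rank-k truncation of a singular value decomposition. The notation below is assumed for illustration and is not taken from the thesis: A is the term-by-document matrix, r is its rank, the singular values are ordered so that \sigma_1 \ge \dots \ge \sigma_r > 0, and k is the retained dimensionality.

```latex
A = U \Sigma V^{\top}, \qquad
A_k = U_k \Sigma_k V_k^{\top} = \sum_{i=1}^{k} \sigma_i \, u_i v_i^{\top}, \qquad
\lVert A - A_k \rVert_F^{2} = \sum_{i=k+1}^{r} \sigma_i^{2}
```

Under this formulation the decomposition is computed once, and each candidate k only changes how many singular triplets are kept. The cost of the search in item 3 therefore lies mainly in re-evaluating retrieval performance for every k.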

Appendix: The ACAF Search Process (Using the FAQ Collection as an Example)

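This appendix uses one test question to walk through the entire ACAF search process. The training question-answer set is the FAQ collection (shown in Figure 13), where QNum is the number of the question-answer pair, QContent is the question text, QRepllyCont is the answer text, and AnswrType is the answer type of the question.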

Figure 13: ACAF training question-answer set (FAQ collection)

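Take the test question 「如何延後所借的書之到期日?」 ("How do I postpone the due date of a book I have borrowed?") as an example. The relevant answers for this question are the three answer documents with QNum 4, 5, and 32 in Figure 13. After word segmentation, the question yields the keywords 「延後」 (postpone) and 「到期日」 (due date) and the interrogative term 「如何」 (how). Because neither 「延後」 nor 「到期日」 appears in the training question-answer set, no answer terms related to the question keywords can be found. ACAFW therefore finds no matching answer, and TRDR, precision, and recall are all 0.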
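The segmentation step for this test question can be sketched as follows. This is a minimal illustration that assumes greedy forward maximum matching against a lexicon; the thesis states only that it matches against a lexicon, not which matching strategy it uses. The function name `segment`, the parameter `max_len`, and the three-entry lexicon are hypothetical.

```python
def segment(text, lexicon, max_len=4):
    """Greedy forward maximum matching (an assumed strategy, not the thesis's)."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy lexicon; a real lexicon would be far larger.
LEXICON = {"如何", "延後", "到期日"}
INTERROGATIVES = {"如何"}

tokens = segment("如何延後所借的書之到期日", LEXICON)
# Lexicon entries that are not interrogatives are treated as keywords.
keywords = [t for t in tokens if t in LEXICON and t not in INTERROGATIVES]
interrogatives = [t for t in tokens if t in INTERROGATIVES]
print(keywords, interrogatives)
```

With this toy lexicon the sketch recovers the two keywords and the interrogative term named in the walkthrough. Characters that match no lexicon entry come out as single-character tokens, which is one way the segmentation errors discussed in the future-work list can arise.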

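Looking up 「延後」 and 「到期日」 in the question concept space finds five concepts that contain these two terms: Concept81, Concept308, Concept318, Concept347, and Concept143. With concept matching added, matching answers can be found (the search results are shown in Figure 14), and TRDR, precision, and recall rise to 1.5, 1, and 0.67, respectively.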

Figure 14: Search results of ACAFWC
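The three reported scores can be reproduced with a short calculation. The sketch below assumes that TRDR is the total reciprocal document rank (the sum of 1/rank over the relevant documents returned) and that the two returned answers, QNum 5 and QNum 4, occupy ranks 1 and 2. The function name `evaluate` is hypothetical.

```python
def evaluate(ranked, relevant):
    """Return (TRDR, precision, recall) for a ranked list of returned ids."""
    relevant = set(relevant)
    # TRDR (assumed definition): sum of reciprocal ranks of relevant hits.
    trdr = sum(1.0 / rank
               for rank, doc in enumerate(ranked, start=1)
               if doc in relevant)
    hits = sum(1 for doc in ranked if doc in relevant)
    precision = hits / len(ranked) if ranked else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return trdr, precision, recall


# Relevant answers are QNum 4, 5 and 32; the system returns QNum 5 and 4.
trdr, precision, recall = evaluate([5, 4], {4, 5, 32})
print(trdr, precision, round(recall, 2))  # 1.5 1.0 0.67
```

Both returned documents are relevant, so precision is 2/2 = 1. Recall is 2/3, or about 0.67, because QNum 32 is missed, and TRDR is 1/1 + 1/2 = 1.5 whichever of the two documents is ranked first.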

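Looking up the interrogative term 「如何」 in the answer-type knowledge base (shown in Figure 15) gives 7 as the likely answer-type code for this test question. Figure 16 shows the answers found for it by the ACAF system with answer-type identification added. The ACAF system filters the answers retrieved by ACAFWC by answer type. The answer type of the document with QNum 5 matches that of the test question, so its weight for the test question remains 1. The document with QNum 4 has answer-type code 1, so its weight drops to 0.73. Even so, the returned documents are still QNum 5 and QNum 4, so TRDR, precision, and recall are unchanged.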

Figure 15: Answer-type knowledge base (FAQ collection)

Figure 16: Search results of ACAF
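The lookup from interrogative term to answer type can be sketched as a relative-frequency estimate of P(answer type | interrogative term). This is an assumption about how the simple probabilistic module might be realized, not the thesis's stated formula. The function names and the training counts below are invented for illustration; only the pairing of 「如何」 with answer-type code 7 comes from the walkthrough. The rule that lowers a mismatched answer's weight to 0.73 is not given in this section, so it is not modeled.

```python
from collections import Counter, defaultdict


def train_answer_types(pairs):
    """Estimate P(type | term) from (interrogative_term, answer_type) pairs."""
    counts = defaultdict(Counter)
    for term, answer_type in pairs:
        counts[term][answer_type] += 1
    # Normalize each term's counts into a probability distribution.
    return {
        term: {t: n / sum(c.values()) for t, n in c.items()}
        for term, c in counts.items()
    }


def most_likely_type(model, term):
    """Return the most probable answer type, or None for an unseen term."""
    dist = model.get(term)
    if not dist:
        return None
    return max(dist, key=dist.get)


# Invented toy counts: 「如何」 is paired with type 7 three times, type 1 once.
model = train_answer_types(
    [("如何", 7), ("如何", 7), ("如何", 7), ("如何", 1), ("何時", 2)]
)
print(most_likely_type(model, "如何"), model["如何"][7])  # 7 0.75
```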
