In this thesis, we examined and improved each step of pseudo-relevance feedback. For pseudo-relevant document selection, we proposed incorporating inter-document relevance information into the selection process, thereby capturing information that previous methods, which rely mainly on query-document relevance, tend to miss. The proposed method is also more favorable for parameter tuning than existing selection methods. Experimental results show that pseudo-relevance feedback performed with the documents selected by this method is effective, and that it remains effective under different numbers of feedback documents used and different numbers of candidate documents. This indicates that, as we expected, the method selects effective pseudo-relevant documents, and that it improves both the efficiency of parameter tuning and retrieval robustness.
For query modeling, we examined the shortcomings of the existing SMM and RMM and improved upon them, with the aim of strengthening the query models. Although the improvements were not consistent across all experimental results, they did benefit overall retrieval under some settings, showing that these query models still have room for improvement.
We also investigated how the clarity score is computed and what it represents, and applied it beyond its original use in performance prediction, namely as auxiliary information for pseudo-relevance feedback models in information retrieval. The experiments showed that the resulting performance can approach, and even exceed, what previously could only be achieved through manual tuning, while the adjustment not only adapts automatically to each query but also rests on a theoretically motivated basis.
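As a minimal sketch of how clarity can drive a combination weight, the snippet below estimates the clarity of a feedback document set as the KL divergence between its unigram model and the background collection model, then maps it to an interpolation weight. The toy models and the exponential mapping to a weight are illustrative assumptions, not the thesis's actual estimates:

```python
import math

def clarity(feedback_model, background_model):
    """KL divergence (in bits) between a pseudo-relevant feedback
    language model and the background collection model; larger values
    mean the feedback set is more focused relative to the collection."""
    score = 0.0
    for word, p in feedback_model.items():
        q = background_model.get(word)
        if p > 0 and q:
            score += p * math.log(p / q, 2)
    return score

# Toy unigram models (hypothetical values, for illustration only).
feedback = {"retrieval": 0.5, "speech": 0.3, "model": 0.2}
background = {"retrieval": 0.1, "speech": 0.05, "model": 0.2, "the": 0.65}

# Map clarity to a (0, 1) interpolation weight; this mapping is an
# assumption for illustration, not the thesis's actual scheme.
alpha = 1.0 - math.exp(-clarity(feedback, background))
```

A query whose feedback set is highly focused (large KL divergence from the background) receives a weight near 1, while a diffuse feedback set yields a weight near 0, so the weight varies per query without manual tuning.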
Many directions remain to be studied. For pseudo-relevant document selection, inter-document relevance is currently computed mainly with cosine similarity, which considers only the word distributions of the two document models. In the future, more diverse kinds of information could be used to estimate document relevance; aspects such as multiword terms or word proximity appear quite promising. For the improvement of query models, since most of our improved models emphasize documents with high query relevance, we could look for pseudo-relevant document selection methods that match this characteristic, so that the models can fully exploit such information. As for the clarity score, we currently estimate it only from the distance between the model of the pseudo-relevant document set and the background model, without using other features of the query or feedback models; in the future, we hope to consider more clarity estimators and use them to adjust the combination weights of the query models, so that different queries receive different weights.
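The cosine measure over word distributions mentioned above can be sketched as follows; the toy unigram document models are hypothetical values for illustration only:

```python
import math

def cosine_similarity(doc_a, doc_b):
    """Cosine similarity between two unigram document models,
    treating each model as a sparse term-probability vector."""
    dot = sum(p * doc_b.get(w, 0.0) for w, p in doc_a.items())
    norm_a = math.sqrt(sum(p * p for p in doc_a.values()))
    norm_b = math.sqrt(sum(p * p for p in doc_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Toy document models for illustration.
doc1 = {"speech": 0.4, "retrieval": 0.4, "model": 0.2}
doc2 = {"speech": 0.5, "retrieval": 0.3, "query": 0.2}
doc3 = {"weather": 0.7, "forecast": 0.3}
```

Because this measure sees only the two term distributions, documents sharing no terms score zero even if they are topically related, which motivates the richer evidence (multiword terms, word proximity) suggested above.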
We also found that, in pseudo-relevant document selection, the number of candidate documents and the number of documents actually used have a large impact on overall retrieval results. Since different selection methods have different optimal combinations of candidate-set size and number of documents used, future work could also investigate how to determine these selection ranges, as well as more robust pseudo-relevant document selection methods.