Using Latent Topic Proximity Information - Exploring Pseudo-Relevance Feedback Techniques and Proximity Information for Improved Spoken Document Retrieval and Recognition

In the third set of speech recognition experiments, we investigate the possibility of leveraging latent topic information for RM modeling. Specifically, the topic-based relevance model (TRM) provides a mechanism to describe the proximity between the search history H and the upcoming word w in the latent topic space of a pseudo-relevant document. In addition, PRM can be further combined with TRM through simple linear interpolation, so as to exploit two different sources of proximity information simultaneously for RM modeling. As can be seen from Table 9, the improvement brought by TRM is less pronounced than that of PRM, which to some extent confirms the intuition that properly modeling word order and adjacency information is important to the success of speech recognition. The combination of PRM and TRM, however, offers a moderate improvement over PRM alone. Likewise, linearly combining PRM with PLSA also reduces the CER, although to a lesser extent than combining PRM with TRM (18.71% versus 18.41%). Finally, Table 10 shows the results of significance tests of PRM against RM and of PRM + TRM against RM; both p-values fall below 0.05, indicating that the improvements of our approach are statistically significant.
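The PRM + TRM combination is described only as a simple linear interpolation of two next-word distributions. A minimal sketch, assuming each component already yields P(w|H) over a shared vocabulary. The TRM-like component is stood in for by a generic topic mixture, the sum over topics k of P(w|k)P(k|H), which is an assumption and not the TRM formula of this work; `topic_mixture`, `interpolate`, and the weight `lam` are hypothetical names, and all probabilities are toy values.

```python
from typing import Dict, List

Dist = Dict[str, float]  # next-word distribution P(w | H)


def topic_mixture(word_given_topic: List[Dist], topic_given_history: List[float]) -> Dist:
    """Hypothetical topic-space component: P(w|H) = sum_k P(w|k) * P(k|H)."""
    out: Dist = {}
    for p_k, dist in zip(topic_given_history, word_given_topic):
        for w, p in dist.items():
            out[w] = out.get(w, 0.0) + p_k * p
    return out


def interpolate(p_a: Dist, p_b: Dist, lam: float) -> Dist:
    """Linear interpolation: lam * P_a(w|H) + (1 - lam) * P_b(w|H)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    vocab = set(p_a) | set(p_b)  # a word missing from one model gets 0 there
    return {w: lam * p_a.get(w, 0.0) + (1.0 - lam) * p_b.get(w, 0.0) for w in vocab}


if __name__ == "__main__":
    p_prm = {"stock": 0.6, "market": 0.4}  # toy stand-in for a PRM output
    p_trm = topic_mixture([{"stock": 0.5, "market": 0.5}, {"rain": 1.0}], [0.8, 0.2])
    p = interpolate(p_prm, p_trm, lam=0.7)
    print({w: round(v, 2) for w, v in sorted(p.items())})
    # prints {'market': 0.4, 'rain': 0.06, 'stock': 0.54}
```

Because both inputs are proper distributions and the weights sum to one, the interpolated model is again a proper distribution; in practice `lam` would be tuned on held-out data.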

Table 9: Speech recognition results (CER, %) of TRM and of PRM combined with TRM and with PLSA, respectively.

TRM      PRM (=2) + TRM      PRM (=2) + PLSA
19.18    18.41               18.71

Table 10: p-values obtained from paired t-tests on the CER (%) of PRM and of PRM + TRM, each with respect to that of RM.

      p-value (PRM (=2))      p-value (PRM (=2) + TRM)
RM    4.99E-02                4.77E-05
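Table 10 reports p-values from paired t-tests on CER. A minimal sketch of such a test, assuming the paired units are per-utterance (or per-document) CERs of the two systems on the same test set, which this section does not state. To stay within the standard library, the sketch approximates the two-sided p-value with a standard normal distribution, which is reasonable only when there are many pairs; `scipy.stats.ttest_rel` returns the exact Student-t p-value. The function name `paired_t_test` is hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t_test(baseline: Sequence[float], system: Sequence[float]) -> Tuple[float, float]:
    """Paired t statistic on (baseline - system) error rates, plus a
    large-sample normal approximation of the two-sided p-value."""
    if len(baseline) != len(system) or len(baseline) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [b - s for b, s in zip(baseline, system)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    # Two-sided p = 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)) under N(0, 1).
    p = math.erfc(abs(t) / math.sqrt(2.0))
    return t, p


if __name__ == "__main__":
    rm_cer = [20.0, 18.0, 22.0, 19.0]  # toy per-utterance CERs, not thesis data
    prm_cer = [19.0, 17.5, 21.0, 18.5]
    print(paired_t_test(rm_cer, prm_cer))
```

With only four toy pairs the normal approximation understates the p-value considerably; it is used here purely to keep the example dependency-free.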

6 Conclusion and Future Work

In this study, to further enhance query formulation, especially for SDR, we propose a language modeling (LM) framework that incorporates several kinds of information cues, namely relevance, diversity, density, and non-relevance, into the process of feedback document selection. The utility of the resulting retrieval methods has also been validated through extensive comparisons with several existing methods. The experimental results suggest that our LM framework outperforms these methods for SDR. As future work for SDR, we would like to apply this LM framework to speech recognition and summarization [47,60].
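To make the idea of combining several cues in feedback document selection concrete, the sketch below greedily picks k pseudo-relevant documents by a weighted sum of four cues, in the spirit of MMR-style reranking [34]. It is a hypothetical illustration, not the formulation used in this work: it assumes documents are bag-of-words `Counter`s, relevance scores come from an initial retrieval pass, cues are defined via cosine similarity, and the names `select_feedback_docs`, `w_rel`, `w_den`, `w_div`, and `w_non` are all assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(v * b[t] for t, v in a.items())  # Counter returns 0 for missing terms
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def select_feedback_docs(docs: List[Counter], relevance: Sequence[float],
                         non_relevant: List[Counter], k: int,
                         w_rel: float = 1.0, w_den: float = 0.5,
                         w_div: float = 0.5, w_non: float = 0.5) -> List[int]:
    """Hypothetical greedy selection combining relevance, density,
    diversity, and non-relevance cues; returns indices into docs."""
    n = len(docs)
    # Density: mean similarity to the other candidates (central documents score higher).
    density = [sum(cosine(docs[i], docs[j]) for j in range(n) if j != i) / max(n - 1, 1)
               for i in range(n)]
    # Non-relevance: closest match to any document assumed to be non-relevant.
    nonrel = [max((cosine(d, x) for x in non_relevant), default=0.0) for d in docs]
    selected: List[int] = []
    remaining = list(range(n))
    while remaining and len(selected) < k:
        scores: Dict[int, float] = {}
        for i in remaining:
            # Diversity: penalize similarity to documents already selected.
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            scores[i] = (w_rel * relevance[i] + w_den * density[i]
                         + w_div * (1.0 - redundancy) - w_non * nonrel[i])
        best = max(scores, key=scores.__getitem__)
        selected.append(best)
        remaining.remove(best)
    return selected
```

For example, with two near-duplicate top documents and one distinct document, the diversity cue makes the second pick the distinct document rather than the duplicate, while a non-relevance model resembling that distinct document pushes the choice back to the duplicate.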

On the other hand, we have also presented a novel extension of the RM framework for language modeling in speech recognition. Our contribution to speech recognition is two-fold. First, the so-called “bag-of-words” assumption of RM is relaxed by incorporating word proximity evidence into the RM formulation. Second, topic-based proximity information is additionally explored in an effort to enhance the proximity-based RM framework. Experimental results reveal that the various language models derived from our framework are very competitive with existing language models for LVCSR. As future work, we would like to apply this LM framework to speech retrieval and summarization applications [60,61].
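Relaxing the bag-of-words assumption means a word's weight in a pseudo-relevant document depends on how close it occurs to the history words, not just on how often it occurs. The sketch below illustrates one such scheme under stated assumptions: proximity is scored with a Gaussian kernel of token distance (a common choice, not necessarily the kernel used in this work), per-document distributions are mixed by document weights playing the role of P(D|H), and the names `proximity_word_dist`, `proximity_rm`, and `sigma` are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def proximity_word_dist(doc: Sequence[str], history: Sequence[str],
                        sigma: float) -> Dict[str, float]:
    """Hypothetical proximity-weighted P(w | D, H) for one pseudo-relevant document.

    Each token occurrence is credited with a Gaussian kernel of its distance to
    every occurrence of a history word, so words close to H in D get more mass.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    hist = set(history)
    hist_pos = [j for j, tok in enumerate(doc) if tok in hist]
    scores: Dict[str, float] = defaultdict(float)
    for i, w in enumerate(doc):
        for j in hist_pos:
            if i != j:  # do not pair a history occurrence with itself
                scores[w] += math.exp(-((i - j) ** 2) / (2.0 * sigma ** 2))
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()} if total > 0 else {}


def proximity_rm(docs: List[Sequence[str]], doc_weights: Sequence[float],
                 history: Sequence[str], sigma: float) -> Dict[str, float]:
    """RM-style mixture of per-document proximity distributions.

    Documents containing no history word contribute nothing; the result is
    renormalized so that it remains a proper distribution.
    """
    z = sum(doc_weights)
    if z <= 0:
        raise ValueError("document weights must have a positive sum")
    mixed: Dict[str, float] = defaultdict(float)
    for doc, weight in zip(docs, doc_weights):
        for w, p in proximity_word_dist(doc, history, sigma).items():
            mixed[w] += (weight / z) * p
    total = sum(mixed.values())
    return {w: p / total for w, p in mixed.items()} if total > 0 else {}
```

A small `sigma` concentrates mass on immediate neighbors of the history words, while a large `sigma` flattens the kernel so the model drifts back toward an ordinary bag-of-words relevance model.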


Bibliography

[1] L.-S. Lee and B. Chen, "Spoken document understanding and organization," IEEE Signal Processing Magazine, vol. 22(5), pp. 42-60, 2005.

[2] C. Chelba, T. J. Hazen, and M. Saraclar, "Retrieval and browsing of spoken content," IEEE Signal Processing Magazine, vol. 25(3), pp. 39-49, 2008.

[3] M. Ostendorf, "Speech technology and information access," IEEE Signal Processing Magazine, vol. 25(3), pp. 150-152, 2008.

[4] B. Chen, "Word topic models for spoken document retrieval and transcription," ACM Transactions on Asian Language Information Processing, vol. 8(1), pp. 1-27, 2009.

[5] R. Baeza-Yates and B. Ribeiro-Neto, "Modern Information Retrieval: The Concepts and Technology behind Search," Addison-Wesley Professional, 2011.

[6] J. S. Garofolo, C. G. P. Auzanne, and E. M. Voorhees, "The TREC spoken document retrieval track: A success story," in Proceedings of the 8th Text REtrieval Conference (TREC-8), 2000, pp. 107-129.

[7] J. M. Ponte and W. B. Croft, "A language modeling approach to information retrieval," in Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, Melbourne, Australia, 1998, pp. 275-281.

[8] D. R. H. Miller, T. Leek, and R. M. Schwartz, "A hidden Markov model information retrieval system," in Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, Berkeley, California, United States, 1999, pp. 214-221.

[9] B. Chen, H.-M. Wang, and L.-S. Lee, "A discriminative HMM/N-gram-based retrieval approach for Mandarin spoken documents," vol. 3(2), pp. 128-145, 2004.

[10] T. K. Chia, K. C. Sim, H. Li, and H. T. Ng, "Statistical lattice-based spoken document retrieval," ACM Transactions on Information Systems, vol. 28(1), pp. 1-30, 2010.

[11] C. X. Zhai, "Statistical language models for information retrieval: A critical review," Foundations and Trends in Information Retrieval, vol. 2(3), pp. 137-213, 2008.

[12] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent dirichlet allocation," The Journal of Machine Learning Research, vol. 3(1), pp. 993-1022, 2003.

[13] T. Hofmann, "Unsupervised Learning by Probabilistic Latent Semantic Analysis," Machine Learning, vol. 42(1), pp. 177-196, 2001.

[14] D. Blei and J. Lafferty, "Topic models," in Text Mining: Theory and Applications, A. Srivastava and M. Sahami, Eds. New York: Taylor and Francis, 2009.

[15] X. Yi and J. Allan, "A Comparative Study of Utilizing Topic Models for Information Retrieval," in Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval, Toulouse, France, 2009, pp. 29-41.

[16] V. T. Turunen and M. Kurimo, "Indexing confusion networks for morph-based spoken document retrieval," in Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, Amsterdam, The Netherlands, 2007, pp. 631-638.

[17] S. Parlak and M. Saraclar, "Performance Analysis and Improvement of Turkish Broadcast News Retrieval," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20(3), pp. 731-741, 2012.

[18] B. Chen, K.-Y. Chen, P.-N. Chen, and Y.-W. Chen, "Spoken Document Retrieval With Unsupervised Query Modeling Techniques," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20(9), pp. 2602-2612, 2012.

[19] V. Lavrenko and W. B. Croft, "Relevance-based language models," in Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, New Orleans, Louisiana, United States, 2001, pp. 120-127.

[20] F. Jelinek, "Statistical methods for speech recognition," Cambridge, MA: MIT Press, 1999.

[21] C. D. Manning and H. Schutze, "Foundations of statistical natural language processing," Cambridge, MA: MIT Press, 1999.

[22] X. Wei and W. B. Croft, "LDA-based document models for ad-hoc retrieval," in Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, Seattle, Washington, USA, 2006, pp. 178-185.

[23] S. Kullback and R. A. Leibler, "On Information and Sufficiency," The Annals of Mathematical Statistics, vol. 22(1), pp. 79-86, 1951.

[24] S.-H. Lin, Y.-M. Yeh, and B. Chen, "Leveraging Kullback-Leibler Divergence Measures and Information-Rich Cues for Speech Summarization," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19(4), pp. 871-882, 2011.

[25] C. Zhai and J. Lafferty, "A study of smoothing methods for language models applied to Ad Hoc information retrieval," in Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, New Orleans, Louisiana, USA, 2001, pp. 334-342.

[26] C. Zhai and J. Lafferty, "Model-based feedback in the language modeling approach to information retrieval," in Proceedings of the tenth international conference on Information and knowledge management, Atlanta, Georgia, USA, 2001, pp. 403-410.

[27] X. Shen and C. Zhai, "Active feedback in ad hoc information retrieval," in Proceedings of the 28th annual international ACM SIGIR conference on Research and Development in Information Retrieval, Salvador, Brazil, 2005, pp. 59-66.

[28] J. Xu and W. B. Croft, "Query Expansion Using Local and Global Document Analysis," in Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Zurich, Switzerland, 1996, pp. 4-11.

[29] L. Ballesteros and W. B. Croft, "Phrasal translation and query expansion techniques for cross-language information retrieval," in Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval, Philadelphia, Pennsylvania, USA, 1997, pp. 84-91.

[30] T. Sakai, M. Kajiura, and K. Sumita, "A First Step towards Flexible Local Feedback for Ad hoc Retrieval," in Proceedings of the fifth International Workshop on Information Retrieval with Asian Languages, Hong Kong, China, 2000, pp. 95-102.

[31] J. Xu and W. B. Croft, "Improving the effectiveness of information retrieval with local context analysis," ACM Transactions on Information Systems, vol. 18(1), pp. 79-112, 2000.

[32] S. E. Robertson and S. Walker, "Okapi/Keenbow at TREC-8," in The 8th Text REtrieval Conference (TREC 8), 2000, p. 151.

[33] T. Sakai, T. Manabe, and M. Koyama, "Flexible pseudo-relevance feedback via selective sampling," ACM Transactions on Asian Language Information Processing, vol. 4(2), pp. 111-135, 2005.

[34] J. Carbonell and J. Goldstein, "The use of MMR, diversity-based reranking for reordering documents and producing summaries," in Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, Melbourne, Australia, 1998, pp. 335-336.

[35] C. X. Zhai, W. W. Cohen, and J. Lafferty, "Beyond independent relevance: methods and evaluation metrics for subtopic retrieval," in Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, Toronto, Canada, 2003, pp. 10-17.

[36] V. Lavrenko, "A generative theory of relevance," University of Massachusetts Amherst, 2004.

[37] Y. Lv and C. Zhai, "A comparative study of methods for estimating query language models with pseudo feedback," in Proceedings of the 18th ACM conference on Information and knowledge management, Hong Kong, China, 2009, pp. 1895-1898.

[38] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum Likelihood from Incomplete Data via the EM algorithm," Journal of the Royal Statistical Society B, vol. 39(1), pp. 1-38, 1977.

[39] T. L. Griffiths and M. Steyvers, "Finding scientific topics," in Proceedings of the National Academy of Sciences, 2004, pp. 5228-5235.

[40] R. Rosenfeld, "Two decades of statistical language modeling: where do we go from here?," Proceedings of the IEEE, vol. 88(8), pp. 1270-1278, 2000.

[41] J. R. Bellegarda, "Statistical language model adaptation: review and perspectives," Speech Communication, vol. 42(1), pp. 93-108, 2004.

[42] D. Gildea and T. Hofmann, "Topic-based language models using EM," in Proceedings of European Conference on Speech Communication and Technology, 1999, pp. 2167-2170.

[43] Y.-C. Tam and T. Schultz, "Dynamic language model adaptation using variational Bayes inference," in Proceedings of the Annual Conference of the International Speech Communication Association, 2005, pp. 5-8.

[44] R. Lau, R. Rosenfeld, and S. Roukos, "Trigger-based language models: a maximum entropy approach," in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, 1993, pp. 45-48.

[45] T. Mikolov, M. Karafiát, L. Burget, J. Cernocký, and S. Khudanpur, "Recurrent neural network based language model," in Proceedings of Annual Conference of the International Speech Communication Association, 2010, pp. 1045-1048.

[46] B. Roark, M. Saraclar, and M. Collins, "Discriminative n-gram language modeling," Computer Speech and Language, vol. 21(2), pp. 373-392, 2007.

[47] B. Chen and K.-Y. Chen, "Leveraging relevance cues for language modeling in speech recognition," Information Processing & Management, vol. 49(4), pp. 807-816, 2013.

[48] B. Chen and S.-H. Lin, "A risk-aware modeling framework for speech summarization," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20(1), pp. 211-222, 2012.

[49] Z. Xu, R. Akella, and Y. Zhang, "Incorporating diversity and density in active learning for relevance feedback," in Proceedings of the 29th European conference on IR research, Rome, Italy, 2007, pp. 246-257.

[50] E. Meij, W. Weerkamp, J. He, and M. de Rijke, "Incorporating Non-Relevance Information in the Estimation of Query Models," in TREC, 2008.

[51] S. Cronen-Townsend and W. B. Croft, "Quantifying query ambiguity," in Proceedings of the Second International Conference on Human Language Technology Research, San Diego, California, 2002.

[52] LDC, "Project Topic Detection and Tracking," Linguistic Data Consortium, 2000.

[53] H.-M. Wang, B. Chen, J.-W. Kuo, and S.-S. Cheng, "MATBN: A Mandarin Chinese Broadcast News Corpus," International Journal of Computational Linguistics & Chinese Language Processing, vol. 10(1), pp. 219-235, 2005.

[54] A. Stolcke, SRI Language Modeling Toolkit (http://www.speech.sri.com/projects/srilm/), 2000.

[55] B. Chen, J.-W. Kuo, and W.-H. Tsai, "Lightly supervised and data-driven approaches to Mandarin broadcast news transcription," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2004, pp. 777-780.

[56] H.-S. Lee and B. Chen, "Generalized likelihood ratio discriminant analysis," in Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding, 2009, pp. 158-163.

[57] T. Mikolov, S. Kombrink, A. Deoras, L. Burget, and J. Cernocký, "RNNLM - Recurrent neural network language modeling toolkit," in Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding, 2011.

[58] T. Oba, T. Hori, and A. Nakamura, "A comparative study on methods of weighted language model training for reranking LVCSR N-best hypotheses," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2010, pp. 5126-5129.

[59] L. Gillick and S. J. Cox, "Some statistical issues in the comparison of speech recognition algorithms," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 1989, pp. 532-535.

[60] B. Chen, H.-C. Chang, and K.-Y. Chen, "Sentence modeling for extractive speech summarization," in Proceedings of the IEEE International Conference on Multimedia & Expo, 2013.

[61] Y.-W. Chen, K.-Y. Chen, H.-M. Wang, and B. Chen, "Effective pseudo-relevance feedback for spoken document retrieval," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2013.


Publication List

[1] Yi-Wen Chen, Bo-Han Hao, Kuan-Yu Chen, Berlin Chen, "Incorporating proximity information for relevance language modeling in speech recognition," the 14th Annual Conference of the International Speech Communication Association (Interspeech 2013), Lyon, France, August 25-29, 2013.

[2] Yi-Wen Chen, Kuan-Yu Chen, Hsin-Min Wang, Berlin Chen, "Effective Pseudo-Relevance Feedback for Spoken Document Retrieval," the 38th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2013), Vancouver, Canada, May 26-31, 2013.

[3] Berlin Chen, Kuan-Yu Chen, Pei-Ning Chen, Yi-Wen Chen, "Spoken Document Retrieval with Unsupervised Query Modeling Techniques," IEEE Transactions on Audio, Speech, and Language Processing, vol.20, no.9, pp.2602-2612, November, 2012.

[4] Yi-Wen Chen, Jun-Yu Chen, Kuan-Yu Chen, Berlin Chen, "Empirical Comparisons of Various Pseudo-relevant Document Selection Methods for Improved Spoken Document Retrieval," the 17th Conference on Technologies and Applications of Artificial Intelligence (TAAI 2012), November 16-18, 2012. (in Chinese)

[5] Ching-Huang Wang, Yi-Wen Chen, Tian-You Wu, "Self-Guided Bibliotherapy: A Case Study of a Taiwanese Doctoral Student," the 8th International Conference on New Directions in the Humanities, Los Angeles, USA, June 29 - July 2, 2010.

[6] Ching-Huang Wang, Yi-Wen Chen, Tian-You Wu, "Self-Guided Bibliotherapy: A Case Study of a Taiwanese Doctoral Student," the International Journal of the Humanities, vol.8, no.1, pp.413-422, April, 2010.
