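Conclusion and Future Work - Language Model Adaptation Using Word Representations and Concept Information for Mandarin Large-Vocabulary Continuous Speech Recognition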

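Language models play an important role in speech recognition and are also applied in many other areas, such as information retrieval, machine translation, handwriting recognition, and document summarization. In speech recognition, the language model compensates for the homophone and pronunciation confusions that the acoustic model alone often cannot resolve, and it helps the recognizer estimate how likely each candidate word sequence is, which improves recognition accuracy.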

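The most widely used language model is the N-gram model. However, N-gram models suffer from data sparsity when training data are insufficient, and they fail to capture long-distance word regularities. Most later language models were proposed to address these problems. This thesis reviewed the recent evolution of language models, including N-gram models, topic models, relevance models, recurrent neural network language models, and long short-term memory neural network language models.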
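The data sparsity problem mentioned above can be illustrated with a minimal sketch. It assumes a toy English corpus and an unsmoothed maximum-likelihood bigram estimate; the helper name `bigram_mle` and the corpus are hypothetical and are not taken from the thesis.

```python
from collections import Counter

def bigram_mle(tokens):
    """Return a function giving the maximum-likelihood estimate of P(word | prev)."""
    history_counts = Counter(tokens[:-1])             # counts of each word used as a history
    bigram_counts = Counter(zip(tokens, tokens[1:]))  # counts of adjacent word pairs

    def prob(prev, word):
        if history_counts[prev] == 0:
            return 0.0  # history never seen in training
        return bigram_counts[(prev, word)] / history_counts[prev]

    return prob

# Toy corpus (illustrative only).
corpus = "the cat sat on the mat".split()
prob = bigram_mle(corpus)

seen = prob("the", "cat")    # bigram observed in the training data
unseen = prob("the", "dog")  # plausible bigram that never occurred in training
```

Without smoothing, the unseen bigram receives zero probability, which is the sparsity effect that smoothing and the later models aim to mitigate.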

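In recent years deep learning has sparked a wave of research, and with it came distributed representations. Such representations not only encode words as relatively low-dimensional vectors but also allow the semantic relationship between any two words to be found through operations on their vectors. Motivated by this, this thesis proposed applying distributed representations to the language model used in speech recognition. The main contributions fall into two parts. First, we applied word-vector information within word graph search: during recognition, the dynamically generated history word sequence and the candidate words are represented as word vectors, and the corresponding language model is built on these representations. This captures more semantic information among words and improves recognition accuracy. Second, we improved the recently proposed concept language model (CLM). Training data are selected from the adaptation corpus at the sentence level, discarding redundant and irrelevant information, so that the concept classes trained from the adaptation corpus are more representative and better support dynamic language model adaptation. In addition, during recognition the relevant concept classes are selected to dynamically compose the concept language model. This selection is estimated with word representations, which record the semantic relationships among the words within each concept class. Finally, we tried combining these two language model adaptation techniques. The experimental results show that applying word representations to language modeling does help improve speech recognition accuracy.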
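The idea of selecting relevant concept classes through word vectors can be sketched as follows. This is only an illustration under stated assumptions: each concept class and the decoded history are summarized by the mean of their word vectors, and classes are ranked by cosine similarity. The function names, the toy two-dimensional embeddings, and the class labels are hypothetical, and the thesis's actual estimator may differ.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def select_concepts(history, concepts, embeddings, top_k=1):
    """Rank concept classes by similarity to the decoded history words."""
    known = [embeddings[w] for w in history if w in embeddings]
    if not known:
        return []
    history_vec = mean_vector(known)
    scored = []
    for name, words in concepts.items():
        members = [embeddings[w] for w in words if w in embeddings]
        if members:
            scored.append((cosine(history_vec, mean_vector(members)), name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]

# Toy 2-dimensional embeddings (made-up values for illustration only).
embeddings = {
    "election": [0.9, 0.1], "vote": [0.8, 0.2], "party": [0.85, 0.15],
    "typhoon": [0.1, 0.9], "rain": [0.2, 0.8], "flood": [0.15, 0.95],
}
concepts = {
    "politics": ["election", "vote", "party"],
    "weather": ["typhoon", "rain", "flood"],
}
selected = select_concepts(["typhoon", "rain"], concepts, embeddings)
```

With this toy data the history about typhoons and rain selects the "weather" class, so only the words of that class would contribute to the dynamically composed concept language model.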

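In the future, we hope to apply word-representation information to other language models, such as relevance models and the word-concept language model. We also hope to combine the word graph search results with other language models and then, when re-ranking the N-best hypotheses in the second pass, to re-rank them with language models such as long short-term memory neural networks and recurrent neural networks, with the aim of further improving recognition performance.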
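The second-pass N-best re-ranking envisioned here can be sketched in a few lines. The sketch assumes that each hypothesis carries a first-pass log score and that a second language model exposes a log-probability function; the two are combined with a hypothetical weight `lm_weight`. The toy unigram table merely stands in for an LSTM or RNN language model, and all names and values are illustrative.

```python
import math

def rescore_nbest(nbest, lm_logprob, lm_weight=0.5):
    """Re-rank (words, first_pass_score) pairs with a second-pass language model.

    Combined score = first_pass_score + lm_weight * lm_logprob(words).
    """
    rescored = [
        (first_pass + lm_weight * lm_logprob(words), words)
        for words, first_pass in nbest
    ]
    rescored.sort(key=lambda item: item[0], reverse=True)
    return [words for _, words in rescored]

# Stand-in for a neural LM: a toy unigram table with a floor for unseen words.
TOY_UNIGRAM = {
    "recognize": 0.4, "speech": 0.4,
    "wreck": 0.05, "a": 0.1, "nice": 0.03, "beach": 0.02,
}

def toy_lm_logprob(words):
    """Sum of log unigram probabilities; unseen words get a small floor value."""
    return sum(math.log(TOY_UNIGRAM.get(w, 1e-4)) for w in words)

# First-pass scores are made up; the first hypothesis initially ranks higher.
nbest = [
    (["wreck", "a", "nice", "beach"], -10.0),
    (["recognize", "speech"], -10.5),
]
best = rescore_nbest(nbest, toy_lm_logprob, lm_weight=1.0)[0]
```

In this toy case the second-pass score promotes the hypothesis that ranked second in the first pass. In practice the weight would be tuned on held-out data.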

