
Chapter 6  Conclusions and Future Work

6.2 Future Work

With the system's speed improved, the recognizer can now afford more complex models, such as tri-phone acoustic models (Tri-phone Model) or trigram language models (Trigram LM), to raise the overall recognition rate. Alternatively, a word lattice (Word Lattice) can be constructed first and then rescored (Rescoring) with additional speech-related knowledge sources to further improve accuracy.
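As a concrete illustration of the trigram LM mentioned above, the following is a minimal sketch of a back-off trigram lookup in the spirit of the standard ARPA n-gram format. All table entries and words here are illustrative toys, not values from this thesis.

```python
# Toy back-off trigram LM: log10 probabilities plus back-off weights,
# mirroring the structure of an ARPA-format LM file. Illustrative only.
trigram_logp = {("我", "們", "的"): -0.4}
bigram_logp  = {("們", "的"): -0.9}
bigram_bow   = {("我", "們"): -0.3}
unigram_logp = {"的": -1.5}
unigram_bow  = {"們": -0.5}

def trigram_score(w1, w2, w3):
    """log10 P(w3 | w1, w2), backing off to the bigram, then the unigram."""
    if (w1, w2, w3) in trigram_logp:
        return trigram_logp[(w1, w2, w3)]
    if (w2, w3) in bigram_logp:
        # back-off weight of the (w1, w2) context plus the bigram probability
        return bigram_bow.get((w1, w2), 0.0) + bigram_logp[(w2, w3)]
    return unigram_bow.get(w2, 0.0) + unigram_logp.get(w3, -99.0)

print(trigram_score("我", "們", "的"))  # seen trigram: -0.4
print(trigram_score("你", "們", "的"))  # unseen trigram, backs off: -0.9
```

Compared with the bigram LM used in this system, the trigram case only adds one more level of back-off; the lookup cost per word transition stays small.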

For speech recognition to be truly usable in real-world environments, the processing chain from the microphone to the PC matters as much as the recognizer's search algorithms themselves. For example, a Voice Activity Detector (VAD) can effectively pick out the speech content of a signal while resisting environmental background noise, which in turn yields reliable Endpoint Detection positions for recognition.
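The VAD idea above can be sketched with a classic energy-based detector: frames whose log energy lies within a threshold of the loudest frame are marked as speech, and the first and last such frames give the endpoints. The framing parameters and the -35 dB threshold below are illustrative defaults, not values from this thesis.

```python
import numpy as np

np.random.seed(0)  # deterministic toy signal

def detect_endpoints(samples, rate=16000, frame_ms=25, hop_ms=10, thresh_db=-35.0):
    """Return (start, end) sample indices of the detected speech region, or None."""
    frame = rate * frame_ms // 1000
    hop = rate * hop_ms // 1000
    log_e = []
    for i in range(0, len(samples) - frame + 1, hop):
        e = np.sum(np.square(samples[i:i + frame], dtype=np.float64)) + 1e-12
        log_e.append(10.0 * np.log10(e))
    log_e = np.asarray(log_e)
    # frames within thresh_db of the loudest frame count as speech
    active = np.flatnonzero(log_e > log_e.max() + thresh_db)
    if active.size == 0:
        return None
    return active[0] * hop, active[-1] * hop + frame

# One second of near-silence, one second of a 440 Hz tone, one more of silence:
sr = 16000
t = np.arange(sr) / sr
sig = np.concatenate([1e-4 * np.random.randn(sr),
                      0.5 * np.sin(2 * np.pi * 440 * t),
                      1e-4 * np.random.randn(sr)])
start, end = detect_endpoints(sig, sr)  # endpoints land near samples 16000 and 32000
```

A production VAD would add noise-floor tracking and hangover smoothing, but the endpoint logic is the same.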

In addition, the system's lexicon is character-based (Character-based), which leaves it unable to handle polyphonic Chinese characters (characters with more than one pronunciation).

Finally, the system's real-time recognition capability and flexible design make it possible to add speech synthesis and extend it into a multi-purpose Dialogue System, bringing this research closer to everyday life. After all, technology ultimately comes from human nature: although speech recognition remains a formidably deep field, turning it into a convenient tool in ordinary people's lives will depend on the continued efforts of the outstanding researchers who follow.

References

[1] S. Young, N. Russell, and J. Thornton, "Token Passing: A Conceptual Model for Connected Speech Recognition Systems," Technical Report CUED/F-INFENG/TR38, Cambridge University, 1989.

[2] H. Ney and S. Ortmanns, "Progress in Dynamic Programming Search for LVCSR," Proceedings of the IEEE, vol. 88, no. 8, pp. 1224–1240, Aug. 2000.

[3] S. Ortmanns, H. Ney, and A. Eiden, "Language-Model Look-Ahead for Large Vocabulary Speech Recognition," in Proc. ICSLP, 1996, pp. 2095–2098.

[4] A. Cardenal-Lopez, F. J. Dieguez-Tirado, and C. Garcia-Mateo, "Fast LM Look-ahead for Large Vocabulary Continuous Speech Recognition Using Perfect Hashing," in Proc. ICASSP, May 2002, pp. 705–708.

[5] S. Ortmanns, H. Ney, and A. Eiden, "Language-Model Look-Ahead for Large Vocabulary Speech Recognition," in Proc. ICSLP, Philadelphia, PA, Oct. 1996, pp. 2091–2094.

[6] M. Mohri, F. Pereira, and M. Riley, "Weighted Finite-State Transducers in Speech Recognition," Computer Speech and Language, vol. 16, no. 1, pp. 69–88, 2002.

[7] D. Caseiro and I. Trancoso, "A Specialized On-the-fly Algorithm for Lexicon and Language Model Composition," IEEE Transactions on Audio, Speech and Language Processing, vol. 14, no. 4, pp. 1281–1291, Jul. 2006.

[8] T. Hori, C. Hori, and Y. Minami, "Fast On-the-fly Composition for Weighted Finite-State Transducers in 1.8 Million-Word Vocabulary Continuous Speech Recognition," in Proc. INTERSPEECH, 2004, pp. 289–292.

[9] O. Cheng, J. Dines, and M. M. Doss, "A Generalized Dynamic Composition Algorithm of Weighted Finite State Transducers for Large Vocabulary Speech Recognition," in Proc. ICASSP, Apr. 2007, vol. 4, pp. IV-345–IV-348.

[10] D. Willett, C. Neukirchen, and G. Rigoll, "DUcoder - the Duisburg University LVCSR Stack Decoder," in Proc. ICASSP, 2000, vol. 3, pp. 1555–1558.

[11] X. Lei, M. Siu, M.-Y. Hwang, M. Ostendorf, and T. Lee, "Improved Tone Modeling for Mandarin Broadcast News Speech Recognition," in Proc. INTERSPEECH, 2006, paper 1752-Tue3A2O.4.

[12] H. L. Wang, Y. Qian, F. K. Soong, J. L. Zhou, et al., "Improved Mandarin Speech Recognition by Lattice Rescoring with Enhanced Tone Models," in Proc. ISCSLP, 2006, pp. 443–445.

[13] J. Li, Y. Tsao, and C.-H. Lee, "A Study on Knowledge Source Integration for Candidate Rescoring in Automatic Speech Recognition," in Proc. ICASSP, Mar. 2005, vol. 1, pp. 837–840.

[14] N. Morales, L. Gu, and Y. Gao, "Fast Gaussian Likelihood Computation by Maximum Probability Increase Estimation for Continuous Speech Recognition," in Proc. ICASSP, 2008, pp. 4453–4456.

[15] E. Bocchieri, "Vector Quantization for the Efficient Computation of Continuous Density Likelihoods," in Proc. ICASSP, Apr. 1993, vol. 2, pp. 692–695.

[16] H. Chung, J. G. Park, Y. K. Lee, and I. Chung, "Fast Speech Recognition to Access a Very Large List of Items on Embedded Devices," IEEE Transactions on Consumer Electronics, vol. 54, no. 2, pp. 803–807, May 2008.

[17] M. Afify, F. Liu, H. Jiang, and O. Siohan, "A New Verification-Based Fast-Match for Large Vocabulary Continuous Speech Recognition," IEEE Transactions on Speech and Audio Processing, vol. 13, no. 4, pp. 546–553, Jul. 2005.

[18] M. K. Ravishankar, "Efficient Algorithms for Speech Recognition," Ph.D. thesis, School of Computer Science, Carnegie Mellon University, 1996.

[19] E. Whittaker and B. Raj, "Quantization-Based Language Model Compression," in Proc. Eurospeech, Aalborg, Denmark, 2001, pp. 33–36.

[20] B. Raj and E. W. D. Whittaker, "Lossless Compression of Language Model Structure and Word Identifiers," in Proc. ICASSP, Apr. 2003, vol. 1, pp. I-388–I-391.

[21] L. Chen and K. K. Chin, "Efficient Language Model Look-ahead Probabilities Generation Using Lower Order LM Look-ahead Information," in Proc. ICASSP, 2008, pp. 4925–4928.

[22] R. Kneser and H. Ney, "Improved Backing-off for M-gram Language Modeling," in Proc. ICASSP, May 1995, vol. 1, pp. 181–184.

[23] B. L. Pellom, R. Sarikaya, and J. H. L. Hansen, "Fast Likelihood Computation Techniques in Nearest-Neighbor-Based Search for Continuous Speech Recognition," IEEE Signal Processing Letters, vol. 8, no. 8, pp. 221–224, Aug. 2001.

[24] J. Cai, G. Bouselmi, Y. Laprie, and J.-P. Haton, "Efficient Likelihood Evaluation and Dynamic Gaussian Selection for HMM-Based Speech Recognition," Computer Speech & Language, vol. 23, no. 2, pp. 147–164, Apr. 2009, DOI: 10.1016/j.csl.2008.05.002.

[25] 李庚達, "雙連語言模型預查之大詞彙連續語音辨識系統" (Large Vocabulary Continuous Speech Recognition with Bigram Language Model Look-ahead), Master's thesis, Department of Communication Engineering, National Chiao Tung University, 2008.

[26] M. J. F. Gales, K. M. Knill, and S. J. Young, "State-Based Gaussian Selection in Large Vocabulary Continuous Speech Recognition Using HMMs," IEEE Transactions on Speech and Audio Processing, vol. 7, no. 2, pp. 152–161, Mar. 1999.

[27] Cambridge University Engineering Dept. (CUED), "Hidden Markov Model Toolkit (HTK)," version 3.4, http://htk.eng.cam.ac.uk.

Appendix

Appendix A: TCC300 Recognition Results

Table A.1  Baseline results on the TCC300 corpus

Beam Width  Histogram  Syllable Acc (%)  Character Acc (%)  Decoding time (s)  RTF
125         3000       79.41             69.95              2023.3             0.556
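The RTF (real-time factor) reported throughout Appendix A is the decoding time divided by the duration of the test audio; a value below 1.0 means the system decodes faster than real time. A minimal sketch follows; the roughly 3639 s corpus duration is inferred from Table A.1's numbers, not stated in the thesis.

```python
def real_time_factor(processing_s, audio_s):
    """RTF = processing time / audio duration; RTF < 1.0 is faster than real time."""
    return processing_s / audio_s

# Table A.1 reports 2023.3 s of decoding at RTF 0.556, implying a
# test-set duration of roughly 2023.3 / 0.556 ≈ 3639 s of audio:
audio_len = 2023.3 / 0.556
print(round(real_time_factor(2023.3, audio_len), 3))  # → 0.556
```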

Table A.2  TCC300 results using nearest-point approximation combined with the FCR and BMP methods

Beam Width  Histogram  Syllable Acc (%)  Character Acc (%)  Decoding time (s)  RTF    Ex. time (%)
125         4000       81.02             70.09              1851.812           0.509  91.12
150         7000       82.38             72.17              2841.321           0.781  91.62
175         8000       82.38             72.21              4043.854           1.112  93.34
200         9000       82.61             72.47              5194.134           1.428  94.48
225         8000       82.57             72.51              5464.691           1.503  92.86

Table A.3  TCC300 results using unigram LM look-ahead to rapidly construct the bigram LM look-ahead

Beam Width  Histogram  Syllable Acc (%)  Character Acc (%)  Decoding time (s)  RTF    Ex. time (%)
125         4000       81.02             70.09              1647.975           0.453  81.09
150         7000       82.38             72.17              2562.473           0.705  82.62
175         8000       82.38             72.21              3733.213           1.027  86.17
200         9000       82.61             72.47              4830.917           1.328  87.87
225         8000       82.57             72.51              5174.652           1.423  87.93

Table A.4  TCC300 results with word transitions expanded once every two frames

Beam Width  Histogram  Syllable Acc (%)  Character Acc (%)  Decoding time (s)  RTF    Ex. time (%)
125         4000       80.65             69.34              1149.866           0.316  56.58
150         7000       82.12             71.87              1970.832           0.542  63.55
175         8000       82.22             72.20              3065.089           0.843  70.75
200         9000       81.94             71.85              4114.897           1.132  74.85
225         8000       82.45             72.22              4599.251           1.265  78.15

Table A.5  TCC300 results with word transitions expanded once every three frames

Beam Width  Histogram  Syllable Acc (%)  Character Acc (%)  Decoding time (s)  RTF    Ex. time (%)
125         4000       79.88             68.02              936.84             0.258  46.10
150         7000       81.85             71.42              1683.203           0.463  54.27
175         8000       82.19             71.82              2737.514           0.753  63.19
200         9000       82.27             71.91              3823.671           1.051  69.55
225         8000       82.39             71.88              4321.441           1.188  73.43

Table A.6  TCC300 results with word transitions expanded once every four frames

Beam Width  Histogram  Syllable Acc (%)  Character Acc (%)  Decoding time (s)  RTF    Ex. time (%)
125         4000       78.22             66.00              809.614            0.223  39.84
150         7000       81.15             70.08              1516.094           0.417  48.89
175         8000       81.44             70.53              2530.352           0.696  58.40
200         9000       81.68             70.85              3630.771           0.998  66.04
225         8000       81.70             70.79              4146.647           1.140  70.46
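Tables A.4 through A.6 vary how often the decoder performs the expensive cross-word (word-transition) expansion while the within-word HMM propagation still runs every frame. The accounting behind the falling "Ex. time" column can be sketched as follows, with the search internals reduced to operation counters (hypothetical stand-ins for real token passing):

```python
# Sketch of interval-based word-transition expansion: within-word HMM
# propagation runs every frame, but the cross-word expansion runs only
# every `expand_interval` frames (1, 2, 3, or 4 in Tables A.1-A.6).
def decode_cost(num_frames, expand_interval):
    within_word_ops = 0
    word_transition_ops = 0
    for t in range(num_frames):
        within_word_ops += 1              # per-frame HMM state update
        if t % expand_interval == 0:
            word_transition_ops += 1      # cross-word expansion this frame
    return within_word_ops, word_transition_ops

# Doubling or quadrupling the interval cuts the number of expansions
# proportionally, which drives the RTF reduction seen in the tables:
for n in (1, 2, 3, 4):
    print(n, decode_cost(1000, n))
```

The tables show the accompanying cost: skipping expansions trades a fraction of a percent of syllable and character accuracy for a substantially lower RTF.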
