Chapter 5: Conclusions and Future Work

5.2 Future Work

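Several topics that extend from this study merit future investigation. First, a one-stage approach could be adopted: while the PD-AM generates the syllable lattice, local recognition is performed each time a BT3 is detected, which narrows the recognition search space. Second, starting from the prosodic breaks decoded by the PD-AM, prosodic parameters such as pitch, energy, and pause duration could be introduced to further improve the word recognition rate. Third, one-character words have consistently been a major cause of lower recognition rates; prosodic breaks could therefore be used to merge affixes into prosodic words, reducing errors on one-character words. Fourth, this study recognizes only read speech; extending it to spontaneous speech should allow speech recognition to be applied more widely in everyday life.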
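The first direction above can be illustrated with a minimal sketch, assuming the decoder emits a sequence of (syllable, break type) pairs in which each break type follows its syllable. The function name `split_at_bt3`, the tuple layout, and the pinyin example are hypothetical and are not taken from this study; the sketch only shows how BT3 breaks could partition an utterance into segments that are then recognized locally.

```python
from typing import List, Tuple

# Assumed layout: (syllable label, break type that follows the syllable).
Syllable = Tuple[str, str]


def split_at_bt3(seq: List[Syllable], major_break: str = "BT3") -> List[List[str]]:
    """Split a syllable sequence into segments that end at each major break."""
    segments: List[List[str]] = []
    current: List[str] = []
    for syllable, break_type in seq:
        current.append(syllable)
        if break_type == major_break:
            # Close the current segment at the major break.
            segments.append(current)
            current = []
    if current:
        # Trailing syllables with no closing major break form a final segment.
        segments.append(current)
    return segments


# Hypothetical decoder output using the break labels BT0..BT3.
decoded = [
    ("jin1", "BT0"), ("tian1", "BT3"),
    ("tian1", "BT0"), ("qi4", "BT1"), ("hao3", "BT3"),
]
segments = split_at_bt3(decoded)
print(segments)
```

Under these assumptions, each segment could be passed to the word-level search on its own, so the search space would depend on the segment length rather than on the length of the whole utterance.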

Appendix: Decision-Tree Question Set

Prosodic break questions

Bclass0: BT0
Bclass1: BT1
Bclass2: BT2
Bclass3: BT3, sil

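Phonetic context questions: initials

Question set for Mandarin initial phonemes (by manner and place of articulation)

Voiceless: p, t, k, c, ch, q, b, d, g, z, zh, j, f, s, sh, x, h
Voiced: r, m, n, l
Aspirated: p, t, k, c, ch, q
Unaspirated: b, d, g, z, zh, j
Stops: p, t, k, b, d, g
Aspirated stops: p, t, k
Unaspirated stops: b, d, g
Fricatives: f, s, sh, x, h, r
Voiceless fricatives: f, s, sh, x, h
Voiced fricatives: r
Affricates: c, ch, q, z, zh, j
Aspirated affricates: c, ch, q
Unaspirated affricates: z, zh, j
Nasals: m, n
Lateral: l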

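Phonetic context questions: finals

Question set for Mandarin final phonemes (by tongue height and tongue frontness or backness)

Central: FNULL1, FNULL2, er, e
Back: o, wu1, wu2, wu3
Rounded: yu1, yu2, wu1, wu2, wu3, o
Rounded high: yu1, yu2, wu1, wu2, wu3
Rounded mid-high: o
High front: yi1, yi2, yi3, yu1, yu2
Nasal coda: ng, en
Close central unrounded vowels: FNULL1, FNULL2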
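As a sketch of how a question set like the one above might be represented in code, the following stores a few of the listed initial-phoneme questions as sets and reports which questions a given phone answers "yes" to. Only a subset of the appendix is included; the English question names and the helper `matching_questions` are illustrative assumptions and are not part of the original system.

```python
from typing import Dict, FrozenSet, List

# Subset of the appendix question sets; the English keys are assumed names.
QUESTIONS: Dict[str, FrozenSet[str]] = {
    "aspirated": frozenset({"p", "t", "k", "c", "ch", "q"}),
    "stop": frozenset({"p", "t", "k", "b", "d", "g"}),
    "aspirated_stop": frozenset({"p", "t", "k"}),
    "affricate": frozenset({"c", "ch", "q", "z", "zh", "j"}),
    "nasal": frozenset({"m", "n"}),
    "lateral": frozenset({"l"}),
}


def matching_questions(phone: str) -> List[str]:
    """Return the sorted names of all questions whose phone set contains `phone`."""
    return sorted(name for name, phones in QUESTIONS.items() if phone in phones)


print(matching_questions("p"))
print(matching_questions("zh"))
```

In decision-tree clustering, each such question would split a pool of context-dependent models into those whose context phone belongs to the set and those whose context phone does not; the lookup above shows only the membership test that underlies such a split.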

Source document: 使用韻律訊息於建立聲學模型之中文語音辨認 (Mandarin Speech Recognition Using Prosodic Information in Acoustic Model Building), pp. 50-57
