

From the document: Chinese Spontaneous Speech Recognition System (pp. 71-85)

Chapter 6 Conclusions and Future Work

6.2 Future Work

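Spontaneous speech recognition is still a long way from commercial use. Research on language models for spontaneous speech usually runs into a shortage of training text. Continuously collecting large amounts of transcribed everyday conversation, rich in colloquial words, new-generation slang, and interjections, or learning the loose grammatical structure of spontaneous speech, would greatly benefit the training of spontaneous-speech language models.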

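On the other hand, the experimental results of this study also show that the acoustic model for spontaneous speech leaves considerable room for improvement. The main problems in training the acoustic model are the quality of the audio files and the accuracy of the syllable segmentation boundaries. Audio quality may itself reflect phenomena peculiar to spontaneous speech: syllable merging in frequently used words, fluctuating speaking rate, varying loudness, sudden background noise, and two speakers talking at the same time. All of these are problems that spontaneous speech recognition will face in real applications. Using prosodic information to assist in building the acoustic model may alleviate some of them. Once a more robust acoustic model is available, it can generate better top-N candidate word strings; with the further help of the prosodic model in selecting word strings with correct syntactic structure, we believe the current spontaneous speech recognition system can be improved substantially. If the recognition rate of spontaneous speech could reach that of read speech, the scenes in films where humans converse with robots would no longer be mere fantasy.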
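The second pass described above (a stronger acoustic model produces a top-N list, and a prosodic model then helps pick the syntactically well-formed word string) can be sketched as a log-linear rescoring of an N-best list. This is a minimal illustration under stated assumptions: the score fields, the weights `lm_weight` and `prosody_weight`, and the toy scores are hypothetical, and the thesis does not specify this combination rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """One entry of a hypothetical N-best list; all scores are log-probabilities."""
    words: List[str]
    acoustic_logp: float
    lm_logp: float
    prosody_logp: float  # assumed output of a prosodic model for this word string


def rescore(nbest: List[Hypothesis], lm_weight: float = 1.0,
            prosody_weight: float = 0.5) -> List[Hypothesis]:
    """Sort hypotheses (best first) by a weighted sum of their log-scores.

    The weights are illustrative; in practice they would be tuned on held-out data.
    """
    def total(h: Hypothesis) -> float:
        return h.acoustic_logp + lm_weight * h.lm_logp + prosody_weight * h.prosody_logp

    return sorted(nbest, key=total, reverse=True)


# Toy 2-best list with made-up scores: the second hypothesis wins on acoustic + LM
# score alone, but its poor prosody score moves it down once prosody is included.
nbest = [
    Hypothesis(["這個", "問題"], acoustic_logp=-100.0, lm_logp=-12.0, prosody_logp=-9.0),
    Hypothesis(["這個", "文題"], acoustic_logp=-97.0, lm_logp=-14.0, prosody_logp=-14.0),
]
print(rescore(nbest, prosody_weight=0.0)[0].words)  # acoustic + LM scores only
print(rescore(nbest)[0].words)                      # with the prosodic score added
```

A log-linear sum keeps each knowledge source separate, so the prosodic model can be switched off (weight 0) to measure its contribution.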


Appendix 1 POS Classification Table

2-class POS | 8-class POS | 23-class POS | 46-class POS
4 Paralinguistic (ParaL) | 13 Paralinguistic (ParaL) | 24 Paralinguistic (ParaL) | 59 Paralinguistic (ParaL)
5 Particle (Par) | 14 Particle (Par) | 25 Particle (Par) | 60 Particle (Par)
5 Particle (Par) | 14 Particle (Par) | 25 Particle (Par) | 61 Marker (Marker)

Appendix 2 Selection of the Thresholds for Initial Break Labeling in the Prosodic Model

A. Definition of Th1, Th2, and Th3:

Figure II.1: Distribution of pause durations at labeled syllable boundaries (x-axis: pause duration in seconds; y-axis: data count; curves: breath, short pause, silence).

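Th1, Th2, and Th3 are the pause-duration thresholds that separate B4, B3, B2-2, and the other break labels. The MCDC corpus already contains some labels of speech-flow interruption (Labeled Juncture, LJ), such as silence, short pause (sp), and breath (breathe); the syllable boundaries carrying these labels all show pauses that humans can clearly perceive, and their distribution is shown in Figure II.1. This study therefore uses a semi-supervised method to obtain the pause-duration probability distributions $f_{B4}(pd)$, $f_{B3}(pd)$, $f_{B2\text{-}2}(pd)$, and $f_{else}(pd)$ corresponding to B4, B3, B2-2, and the other break labels, from which the thresholds Th1, Th2, and Th3 are determined, as shown in Figure 4.10.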
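Figure 4.10 is not reproduced here, so the exact decision rule is not shown. One common way to turn two adjacent class-conditional distributions into a threshold, assumed here for illustration, is to take the pause duration at which the two densities are equal. The sketch below finds that point by bisection for two gamma densities; the helper names and the shape/scale values are hypothetical, not values estimated in this study.

```python
import math


def gamma_logpdf(x: float, shape: float, scale: float) -> float:
    """Log-density of a gamma distribution (shape k, scale theta) at x > 0."""
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def crossing_threshold(logpdf_short, logpdf_long, lo: float, hi: float,
                       iters: int = 60) -> float:
    """Bisection for the point in [lo, hi] where two densities are equal.

    Assumes the shorter-pause class is more likely at lo and the longer-pause
    class at hi, so the log-likelihood ratio changes sign inside the interval.
    """
    g_lo = logpdf_short(lo) - logpdf_long(lo)
    g_hi = logpdf_short(hi) - logpdf_long(hi)
    if g_lo * g_hi > 0:
        raise ValueError("the two densities do not cross inside [lo, hi]")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        g_mid = logpdf_short(mid) - logpdf_long(mid)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid  # crossing lies to the right of mid
        else:
            hi = mid               # crossing lies to the left of mid
    return 0.5 * (lo + hi)


# Hypothetical gamma parameters for two adjacent break classes.
th = crossing_threshold(lambda x: gamma_logpdf(x, 2.0, 0.05),
                        lambda x: gamma_logpdf(x, 2.0, 0.20),
                        lo=0.01, hi=2.0)
print(f"threshold = {th:.4f} s")
```

Working with log-densities avoids underflow far out in the tails, where the raw pdf values can round to zero.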

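However, human labels are not always consistent, and automatic segmentation can place syllable boundaries inaccurately. To collect more reliable pause-duration data, we therefore first compute the mean pause duration $\mu_{sp}$ of the junctures labeled "short pause" and the mean pause duration $\mu_{LJ}$ of the other labeled junctures. The pause-duration sets for intra-word syllable boundaries, short pauses, and the other labeled junctures are then defined as follows: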

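$$\mathbf{pd}_{intra} = \{\, pd_n : pd_n \in \text{intra-word syllable juncture} \,\}$$

$$\mathbf{pd}_{sp} = \{\, pd_n : pd_n \in \text{short pause},\ |pd_n - \mu_{sp}| < |pd_n - \mu_{LJ}|,\ pd_n > 0.03 \,\}$$

$$\mathbf{pd}_{LJ} = \{\, pd_n : pd_n \in \text{silence or breathe},\ |pd_n - \mu_{LJ}| < |pd_n - \mu_{sp}|,\ pd_n > 0.03 \,\}$$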

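Next, vector quantization (VQ) is used to split the data in $\mathbf{pd}_{LJ}$ into two clusters, while the data in $\mathbf{pd}_{sp}$ and $\mathbf{pd}_{intra}$ are treated as the third and fourth clusters. Gamma distributions are used to model the probability distributions of these four clusters, as shown in Figure II.2 (a) and (b).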
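The clustering and fitting steps can be sketched as below. On one-dimensional data, VQ with a two-codeword codebook reduces to 2-means (Lloyd's algorithm). For the gamma fit this sketch uses the method of moments (shape = mean²/variance, scale = variance/mean). That choice is an assumption, since the estimation method is not stated here, and maximum likelihood would be a common alternative. The function names and toy durations are hypothetical.

```python
from statistics import mean, pvariance
from typing import List, Tuple


def vq_two_clusters(values: List[float], iters: int = 100) -> Tuple[List[float], List[float]]:
    """1-D vector quantization with a two-codeword codebook (Lloyd's algorithm).

    Returns (shorter_cluster, longer_cluster). Needs at least two distinct values.
    """
    if len(set(values)) < 2:
        raise ValueError("need at least two distinct values")
    c_short, c_long = min(values), max(values)  # initial codewords
    for _ in range(iters):
        short = [v for v in values if abs(v - c_short) <= abs(v - c_long)]
        long_ = [v for v in values if abs(v - c_short) > abs(v - c_long)]
        new_short, new_long = mean(short), mean(long_)
        if (new_short, new_long) == (c_short, c_long):
            break  # codewords stopped moving
        c_short, c_long = new_short, new_long
    return short, long_


def fit_gamma_moments(values: List[float]) -> Tuple[float, float]:
    """Method-of-moments gamma fit; returns (shape, scale)."""
    m = mean(values)
    v = pvariance(values, m)
    return m * m / v, v / m


# Toy long-pause durations in seconds (made up for illustration).
pd_lj = [0.3, 0.35, 0.4, 1.0, 1.1, 1.2]
shorter, longer = vq_two_clusters(pd_lj)
for name, cluster in (("shorter LJ cluster", shorter), ("longer LJ cluster", longer)):
    shape, scale = fit_gamma_moments(cluster)
    print(f"{name}: shape={shape:.2f}, scale={scale:.4f}")
```

In this setting the longer LJ cluster would presumably serve as the B4 model and the shorter one as B3, with $\mathbf{pd}_{sp}$ and $\mathbf{pd}_{intra}$ fitted the same way. That mapping is an assumption, not something stated in this appendix.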

(Figure: normalized log-F0 jump $pj_n$ of intra-word junctures and of all labeled junctures, with VQ clustering.)

Figure II.5: Definition of Th7: the probability distributions of the normalized syllable lengthening factor $dl_n$ of the two adjacent syllables at (a) intra-word syllable boundaries, fitted by the Gaussian $f_{intra}(dl) = N(dl; \mu_{intra}, \sigma_{intra}^2)$, and (b) labeled syllable boundaries, fitted by the Gaussian $f_{LJ}(dl) = N(dl; \mu_{LJ}, \sigma_{LJ}^2)$; (c) definition of the threshold Th7 from the two distributions.
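Panel (c) derives Th7 from the two fitted Gaussians. Suppose Th7 is taken to be the value of $dl$ at which the two densities are equal. This is an assumption, since the caption only states that the threshold is defined from the two distributions. Equating $\ln f_{intra}(dl)$ and $\ln f_{LJ}(dl)$ and multiplying by $2\sigma_{intra}^2\sigma_{LJ}^2$ then gives a quadratic in $dl$:

$$
\sigma_{LJ}^2\,(dl-\mu_{intra})^2-\sigma_{intra}^2\,(dl-\mu_{LJ})^2 = 2\,\sigma_{intra}^2\,\sigma_{LJ}^2\,\ln\frac{\sigma_{LJ}}{\sigma_{intra}}
$$

$$
(\sigma_{LJ}^2-\sigma_{intra}^2)\,dl^2 - 2\,(\sigma_{LJ}^2\mu_{intra}-\sigma_{intra}^2\mu_{LJ})\,dl + \sigma_{LJ}^2\mu_{intra}^2-\sigma_{intra}^2\mu_{LJ}^2 - 2\,\sigma_{intra}^2\,\sigma_{LJ}^2\ln\frac{\sigma_{LJ}}{\sigma_{intra}} = 0
$$

Under this reading, Th7 would typically be the root lying between $\mu_{intra}$ and $\mu_{LJ}$. When $\sigma_{intra} = \sigma_{LJ}$, the quadratic term vanishes and the threshold reduces to the midpoint $(\mu_{intra}+\mu_{LJ})/2$.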

Appendix 3 Question Set for the Break-Label Acoustic Model

Q11.3: Is the initial of the following syllable null?

Q11.7: Is the initial of the following syllable in {ts, ch, chi}?

Q11.8: Is the initial of the following syllable in {p, t, k}?

Q11.9: Is the initial of the following syllable in {tz, j, ji}?

Q11.10: Is the inter-syllable location an inter-word boundary?

Q: Is the inter-syllable location an intra-word boundary?

2. Questions related to sentence-level features

All the following questions are subject to the prerequisite condition that the current inter-syllable location is an inter-word boundary.

2.1 Word length

Q12.1.1 ~ Q12.1.4: Is the preceding word an n-syllable word, for n = 1, 2, 3, 4, respectively?

Q12.1.5 ~ Q12.1.8: Is the following word an n-syllable word, for n = 1, 2, 3, 4, respectively?

Q12.1.9: Is the preceding word longer than 4 syllables?

Q12.1.10: Is the following word longer than 4 syllables?

2.2 Level-1 POS and special tags

Q12.2.1 ~ Q12.2.11: Is the POS of the preceding word A/C/D/N/I/P/T/V/DE/SHI/DM?

Q12.2.12 ~ Q12.2.22: Is the POS of the following word A/C/D/N/I/P/T/V/DE/SHI/DM?

2.3 Level-2 POS

Q12.3.1 ~ Q12.3.33: Is the POS of the preceding word Ca/Cb/Da/Db/Dc/Dd/Df/Dg/Dh/Di/Dj/Dk/Na/Nb/Nc/Nd/Ne/Nf/Ng/Nh/VA/VB/VC/VD/VE/VF/VG/VH/VI/VJ/VK/VL/V_2?

Q12.3.34 ~ Q12.3.66: Is the POS of the following word Ca/Cb/Da/Db/Dc/Dd/Df/Dg/Dh/Di/Dj/Dk/Na/Nb/Nc/Nd/Ne/Nf/Ng/Nh/VA/VB/VC/VD/VE/VF/VG/VH/VI/VJ/VK/VL/V_2?

2.4 Level-3 POS

Q12.4.1 ~ Q12.4.15: Is the POS of the preceding word Caa/Cab/Cba/Cbb/Dfa/Dfb/Ncd/Neu/Nes/Nep/Neq/VA2/VC1/VH16/VH22?

Q12.4.16 ~ Q12.4.30: Is the POS of the following word Caa/Cab/Cba/Cbb/Dfa/Dfb/Ncd/Neu/Nes/Nep/Neq/VA2/VC1/VH16/VH22?

2.5 Combination of POS

Q12.5.1 ~ Q12.5.7: Does the POS of the preceding word belong to {Da, Db, Dc, Dd, Dg, Dh, Di, Dj, Dk}/{Na, Nb, Nc}/{Ncd, Ng}/{I, T}/{VA, VG}/{VB, VC, VD, VE, VF, VJ, VK, VL}/{VH, VI}?

Q12.5.8 ~ Q12.5.14: Does the POS of the following word belong to {Da, Db, Dc, Dd, Dg, Dh, Di, Dj, Dk}/{Na, Nb, Nc}/{Ncd, Ng}/{I, T}/{VA, VG}/{VB, VC, VD, VE, VF, VJ, VK, VL}/{VH, VI}?
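Questions like these are yes/no predicates over the context of an inter-syllable location, and a decision tree applies one of them at each node. Below is a minimal sketch of how the word-length and Level-1 POS questions might be encoded. The `Juncture` fields are hypothetical, the question numbers are reused as dictionary keys, and the tag order is assumed to follow the listing above.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Juncture:
    """Hypothetical context of one inter-syllable location."""
    inter_word: bool     # True if the location is an inter-word boundary
    prev_word_len: int   # length of the preceding word, in syllables
    next_word_len: int   # length of the following word, in syllables
    prev_pos1: str       # Level-1 POS of the preceding word, e.g. "N", "DE"
    next_pos1: str       # Level-1 POS of the following word


LEVEL1_TAGS = ["A", "C", "D", "N", "I", "P", "T", "V", "DE", "SHI", "DM"]


def build_questions() -> Dict[str, Callable[[Juncture], bool]]:
    """Return question id -> predicate; every predicate requires an inter-word boundary."""
    qs: Dict[str, Callable[[Juncture], bool]] = {}
    # Q12.1.1 ~ Q12.1.4 (preceding word) and Q12.1.5 ~ Q12.1.8 (following word)
    for n in range(1, 5):
        qs[f"12.1.{n}"] = lambda j, n=n: j.inter_word and j.prev_word_len == n
        qs[f"12.1.{n + 4}"] = lambda j, n=n: j.inter_word and j.next_word_len == n
    # Q12.1.9 and Q12.1.10: words longer than 4 syllables
    qs["12.1.9"] = lambda j: j.inter_word and j.prev_word_len > 4
    qs["12.1.10"] = lambda j: j.inter_word and j.next_word_len > 4
    # Q12.2.1 ~ Q12.2.11 (preceding word) and Q12.2.12 ~ Q12.2.22 (following word)
    for i, tag in enumerate(LEVEL1_TAGS, start=1):
        qs[f"12.2.{i}"] = lambda j, t=tag: j.inter_word and j.prev_pos1 == t
        qs[f"12.2.{i + 11}"] = lambda j, t=tag: j.inter_word and j.next_pos1 == t
    return qs


questions = build_questions()
example = Juncture(inter_word=True, prev_word_len=2, next_word_len=1,
                   prev_pos1="N", next_pos1="DE")
print(sorted(q for q, pred in questions.items() if pred(example)))
```

The default arguments (`n=n`, `t=tag`) bind each loop value at definition time; without them every lambda would see only the last value of the loop variable.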

Appendix 4 Question Set for the Break-Label Language Model

The question set used to construct the decision trees for building the break-syntax model $P(B_n \mid \mathbf{l}_n)$ is listed below:

1. Syllable Level

Q21.1: Is the initial of the following syllable null or in {m, n, l, r}?

Q21.2: Is the inter-syllable location an inter-word boundary?

Q21.3: Is the inter-syllable location an intra-word boundary?

2. Word Level

All the following questions are subject to the prerequisite condition that the current inter-syllable location is an inter-word boundary.

2.1 Word length

Q22.1.1 ~ Q22.1.4: Is the preceding word an n-syllable word, for n = 1, 2, 3, 4, respectively?

Q22.1.5 ~ Q22.1.8: Is the following word an n-syllable word, for n = 1, 2, 3, 4, respectively?

Q22.1.9: Is the preceding word longer than 4 syllables?

Q22.1.10: Is the following word longer than 4 syllables?

2.2 Level-1 POS and special tags

Q22.2.1 ~ Q22.2.11: Is the POS of the preceding word A/C/D/N/I/P/T/V/DE/SHI/DM?

Q22.2.12 ~ Q22.2.22: Is the POS of the following word A/C/D/N/I/P/T/V/DE/SHI/DM?

2.3 Level-2 POS

Q22.3.1 ~ Q22.3.33: Is the POS of the preceding word Ca/Cb/Da/Db/Dc/Dd/Df/Dg/Dh/Di/Dj/Dk/Na/Nb/Nc/Nd/Ne/Nf/Ng/Nh/VA/VB/VC/VD/VE/VF/VG/VH/VI/VJ/VK/VL/V_2?

Q22.3.34 ~ Q22.3.66: Is the POS of the following word Ca/Cb/Da/Db/Dc/Dd/Df/Dg/Dh/Di/Dj/Dk/Na/Nb/Nc/Nd/Ne/Nf/Ng/Nh/VA/VB/VC/VD/VE/VF/VG/VH/VI/VJ/VK/VL/V_2?

2.4 Level-3 POS

Q22.4.1 ~ Q22.4.15: Is the POS of the preceding word Caa/Cab/Cba/Cbb/Dfa/Dfb/Ncd/Neu/Nes/Nep/Neq/VA2/VC1/VH16/VH22?

Q22.4.16 ~ Q22.4.30: Is the POS of the following word Caa/Cab/Cba/Cbb/Dfa/Dfb/Ncd/Neu/Nes/Nep/Neq/VA2/VC1/VH16/VH22?

2.5 Combination of POS

Q22.5.1 ~ Q22.5.7: Does the POS of the preceding word belong to {Da, Db, Dc, Dd, Dg, Dh, Di, Dj, Dk}/{Na, Nb, Nc}/{Ncd, Ng}/{I, T}/{VA, VG}/{VB, VC, VD, VE, VF, VJ, VK, VL}/{VH, VI}?

Q22.5.8 ~ Q22.5.14: Does the POS of the following word belong to {Da, Db, Dc, Dd, Dg, Dh, Di, Dj, Dk}/{Na, Nb, Nc}/{Ncd, Ng}/{I, T}/{VA, VG}/{VB, VC, VD, VE, VF, VJ, VK, VL}/{VH, VI}?
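The decision trees for the break-syntax model are grown by choosing, at each node, the question that best separates the break labels. This appendix does not give the splitting criterion. The sketch below assumes a common choice: the largest reduction in the entropy of the break-label distribution (information gain). The sample contexts, the question encodings, and the labels "B0" and "B3" are toy values.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy, in bits, of the empirical distribution of break labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_question(samples, questions):
    """samples: list of (context, break_label); questions: dict name -> predicate.

    Returns (name, gain) for the question with the largest entropy reduction,
    or (None, 0.0) if no question reduces the entropy.
    """
    labels = [b for _, b in samples]
    base = entropy(labels)
    n = len(samples)
    best_name, best_gain = None, 0.0
    for name, q in questions.items():
        yes = [b for ctx, b in samples if q(ctx)]
        no = [b for ctx, b in samples if not q(ctx)]
        if not yes or not no:
            continue  # a question that does not split the node is useless
        after = len(yes) / n * entropy(yes) + len(no) / n * entropy(no)
        gain = base - after
        if gain > best_gain:
            best_name, best_gain = name, gain
    return best_name, best_gain


# Toy node: the break label depends only on whether the location is inter-word.
samples = [
    ({"inter_word": False, "next_null_or_mnlr": True}, "B0"),
    ({"inter_word": False, "next_null_or_mnlr": False}, "B0"),
    ({"inter_word": True, "next_null_or_mnlr": True}, "B3"),
    ({"inter_word": True, "next_null_or_mnlr": False}, "B3"),
]
questions = {
    "21.1": lambda c: c["next_null_or_mnlr"],
    "21.2": lambda c: c["inter_word"],
}
print(best_question(samples, questions))
```

In a full tree builder this selection would be applied recursively to the "yes" and "no" subsets, with a stopping rule such as a minimum gain or a minimum node size.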
