Two-Stage Recognition System

Chapter 3: Hierarchical Language Model

3.3 Two-Stage Recognition System

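Because of the properties of the n-gram language model described earlier, a tri-gram LM has to retain the information of the two preceding words, so the copies of the PN (personal-name) model cannot be merged on the graph. With the tri-gram LM used earlier, 34,578 PN models would have to be expanded. With a bi-gram LM, only a single PN model needs to be expanded to compute the intra-word score. We therefore use a bi-gram LM together with a bi-gram PN model in the 1st stage to generate a lattice, and then rescore that lattice with a tri-gram LM. The reasoning is as follows.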
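The gap between the two counts can be illustrated with a toy calculation. The sketch below assumes that one copy of the PN sub-network is required for every distinct history state that an arc carrying the PN label leads to. This is a simplification for illustration only, not necessarily how the figure of 34,578 was obtained. The label `$PN`, the function name, and the example n-grams are all hypothetical.

```python
def count_pn_copies(ngrams, order, pn_label="$PN"):
    """Count distinct destination history states reached by PN-labelled arcs.

    Assumption: an n-gram graph of the given order remembers the most recent
    (order - 1) words, and each distinct destination state needs its own copy
    of the PN sub-network.
    """
    destinations = set()
    for ngram in ngrams:
        if len(ngram) == order and ngram[-1] == pn_label:
            # After consuming the PN label, the state holds the last
            # (order - 1) words of the n-gram.
            destinations.add(tuple(ngram[1:]))
    return len(destinations)


# Hypothetical toy n-grams, not taken from the thesis data.
trigrams = [
    ("i", "met", "$PN"),
    ("i", "saw", "$PN"),
    ("we", "met", "$PN"),
    ("i", "met", "you"),
]
bigrams = [("met", "$PN"), ("saw", "$PN"), ("met", "you")]

print(count_pn_copies(trigrams, order=3))
print(count_pn_copies(bigrams, order=2))
```

In the tri-gram case the destination state still carries the word before the name, so the count grows with the number of distinct preceding words. In the bi-gram case every PN arc ends in the same state, so one sub-network is enough.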

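A bi-gram graph remembers only the previous word, so a single state is enough to represent it. All non-terminal PN labels that need to be expanded can therefore share one sub-net for computing the intra-word score, which lets us place a more complex PN model on the root LM graph.

1.) [...] rescoring, the un-optimized graph must be consulted to ensure that the scores placed on personal-name arcs relate only to personal names.

2.) The score on each arc of the output FSM lattice is the sum of the AM and LM scores.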

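These two problems are in fact the same one: the lattice produced by the decoder does not keep the AM and LM scores separate, so rescoring cannot be applied to the LM scores alone. We therefore modified the decoder so that it outputs only the AM score. Once we have an FSM lattice carrying only AM scores, we use the composition algorithm to apply the LM scores and perform the rescoring.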
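The effect of rescoring an AM-only lattice can be sketched in a few lines. This is a minimal illustration, not the thesis implementation. It stands in for WFST composition with a dynamic program over (lattice node, LM history) pairs, it assumes node ids are integers in topological order, and the arc format, function names, and toy costs are all hypothetical.

```python
from collections import defaultdict


def rescore_lattice(arcs, start, final, lm_cost, order=3):
    """Return (cost, words) of the best path after adding LM costs.

    arcs: list of (src, dst, word, am_cost) with src < dst (assumed
          topological ordering); costs are negative log scores.
    lm_cost: callable (history_tuple, word) -> cost.
    """
    out = defaultdict(list)
    nodes = {start, final}
    for src, dst, word, am in arcs:
        out[src].append((dst, word, am))
        nodes.update((src, dst))

    # best[(node, history)] = (accumulated cost, word sequence)
    best = {(start, ()): (0.0, [])}
    for node in sorted(nodes):
        states = [(k, v) for k, v in best.items() if k[0] == node]
        for (_, hist), (cost, words) in states:
            for dst, word, am in out[node]:
                new_cost = cost + am + lm_cost(hist, word)
                # Keep only the last (order - 1) words as the LM history.
                new_hist = (hist + (word,))[-(order - 1):] if order > 1 else ()
                key = (dst, new_hist)
                if key not in best or new_cost < best[key][0]:
                    best[key] = (new_cost, words + [word])

    finals = [v for k, v in best.items() if k[0] == final]
    return min(finals, key=lambda v: v[0])


# Toy lattice with AM costs only (hypothetical values).
arcs = [(0, 1, "a", 1.0), (0, 1, "b", 0.9), (1, 2, "c", 1.0)]
lm_table = {(("a",), "c"): 0.0}


def toy_lm(hist, word):
    return lm_table.get((hist, word), 1.0)


am_only = rescore_lattice(arcs, 0, 2, lambda hist, word: 0.0)
rescored = rescore_lattice(arcs, 0, 2, toy_lm)
print(am_only, rescored)
```

Because the lattice holds only AM costs, the LM can be swapped freely in the second stage without re-running the decoder, which is the point of separating the two scores.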

3.3.2 Architecture of the Two-Stage Recognition System

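The experiment can be viewed as two stages. The 1st stage produces a word lattice that carries only AM scores, and the 2nd stage rescores the words on that lattice. The overall speech recognition flow is shown in the following chart: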

[Flow chart. Block labels: Speech Input; Feature Extraction; Decoding with C∘opt(L∘G); 1st output: FSM lattice with AM score; Rescoring with Root LM and PN LM (intra-word); Replaced Grammar opt(L∘G); Decoding; W* = (w1, w2, w3, ..., wN)]

Figure 3.8: Flow chart of two-stage recognition

1.) First stage – Lattice generation

a.) Root LM model:

[WFST diagram. Arcs are labelled with bi-gram character conditionals such as Character3 | Character2, Character4 | Character2, and CharacterK | CharacterN, together with the auxiliary labels #Last Name and #First Name]

Figure 3.9: WFST of the bi-gram PN model

2.) Second stage – LM rescoring & PN rescoring

a.) LM (inter-word) rescoring:

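The grammar-level graph is built from a tri-gram LM. Because it is not composed downward to the lexicon level, the auxiliary labels that assist expansion do not need to be added when building it. Each non-terminal label related to the PN model is replaced by a graph that carries no scores. Since this graph has a simple structure, the replacement algorithm runs without difficulty.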

[WFST diagram. Arc labels: "All possible characters in PN model | 0" and "</P> | 0"]

Figure 3.10: Empty PN model
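The replacement step can be sketched as follows. This is a simplified illustration under stated assumptions, not the replacement algorithm used in the thesis. Each arc carrying a non-terminal label (written here as the hypothetical `$PN`) is swapped for a zero-cost loop over all PN characters that exits on `</P>`. Entering through a `<P>` arc that keeps the original arc's LM cost is an assumption made for this sketch.

```python
def replace_nonterminal(arcs, nonterminal, characters):
    """Replace arcs labelled `nonterminal` with a score-free PN sub-network.

    arcs: list of (src, dst, label, cost).
    characters: every character the PN model may emit.
    """
    new_arcs = []
    for i, (src, dst, label, cost) in enumerate(arcs):
        if label != nonterminal:
            new_arcs.append((src, dst, label, cost))
            continue
        loop_state = ("pn", i)  # fresh state for this occurrence
        # Assumption: the inter-word LM cost stays on the entry arc.
        new_arcs.append((src, loop_state, "<P>", cost))
        for ch in characters:
            # Characters inside the name carry no score in this graph.
            new_arcs.append((loop_state, loop_state, ch, 0.0))
        new_arcs.append((loop_state, dst, "</P>", 0.0))
    return new_arcs


# Hypothetical toy grammar graph.
grammar = [(0, 1, "hello", 0.5), (1, 2, "$PN", 1.2), (2, 3, "world", 0.3)]
expanded = replace_nonterminal(grammar, "$PN", ["C1", "C2"])
for arc in expanded:
    print(arc)
```

Since the sub-network has one state and no scores of its own, the inter-word scores stay on the tri-gram graph and the intra-word scores can be added separately.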

b.) PN (intra-word) rescoring:

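When designing the first-stage grammar graph, we added auxiliary labels that consume no time, so that we could observe whether the recognition result had entered the PN model. During 2nd-stage rescoring, the same auxiliary labels determine which arcs of the lattice receive scores. As the figure below shows, words unrelated to the PN model are not assigned a PN score.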

[WFST diagram. Arc labels: "<S>"; "All other words in root LM | Word Insertion Penalty"; "</S> | 0"; "<P> | 0" into the PN model; "</P> | 0" out of the PN model]

Figure 3.11: Intra-word rescoring
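The role of the auxiliary labels in intra-word rescoring can be sketched as below. This is a minimal illustration under assumptions: the label sequence of one lattice path is given as a list, the span between `<P>` and `</P>` marks a personal name, and `char_bigram_cost` is a hypothetical stand-in for the bi-gram PN model. The word insertion penalty shown in the figure is left out for brevity.

```python
def pn_intra_word_costs(labels, char_bigram_cost,
                        open_tag="<P>", close_tag="</P>"):
    """Return one PN cost per label; only labels inside a PN span get a cost."""
    costs = []
    inside = False
    prev = None
    for label in labels:
        if label == open_tag:
            inside, prev = True, open_tag
            costs.append(0.0)  # auxiliary label carries no score
        elif label == close_tag:
            inside, prev = False, None
            costs.append(0.0)
        elif inside:
            # Score the character given the previous one (bi-gram PN model).
            costs.append(char_bigram_cost(prev, label))
            prev = label
        else:
            costs.append(0.0)  # word unrelated to the PN model
    return costs


# Hypothetical path and a constant-cost stand-in for the PN model.
path = ["<S>", "wordA", "<P>", "C1", "C2", "C3", "</P>", "wordB", "</S>"]
costs = pn_intra_word_costs(path, lambda prev, ch: 1.0)
print(list(zip(path, costs)))
```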

3.3.3 Experimental Results of Two-Stage Recognition

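We first list the recognition result of the best (highest-scoring) path on the word lattice produced in the 1st stage, together with the baseline figures for one-pass recognition, followed by the results of rescoring the inter-word scores with a tri-gram LM.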

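For the second-stage rescoring we used two different LM settings. In setting (A), the tri-gram LM is identical to the root tri-gram LM used in one-pass recognition. In setting (B), the tri-gram LM was trained with a smaller discount value than in setting (A), which makes it a more detailed model. We did not use the setting (B) tri-gram LM in one-pass recognition because the number of states and transitions it produces is too large for the graph to be expanded down to the lexicon level. In two-stage rescoring, however, we need only the word-to-word transition probabilities and do not have to expand or optimize the language model, so a more detailed language model can be used to supply the inter-word scores.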

Table 3.8: One-pass recognition results

Models                                                   Word Accuracy
Tri-gram LM                                              73.36%
Tri-gram LM with uni-gram PN model                       73.47%
Bi-gram LM                                               71.76%
Bi-gram LM with bi-gram PN model (Lattice generation)    71.82%
Rescoring with tri-gram LM (A)                           72.32%
Rescoring with tri-gram LM (B)                           76.27%

Table 3.9: Number of hits for PN model labels

Models                                All found    Golden hits    IV hits
Tri-gram LM with uni-gram PN model    34           28             0
Bi-gram LM with bi-gram PN model      67           43             3
- Rescoring with tri-gram LM (A)      72           42             6
- Rescoring with tri-gram LM (B)      62           40             2

Table 3.10: F-measure

Models                                Precision    Recall     F-measure
Tri-gram LM with uni-gram PN model    82.35%       21.88%     34.57%
Bi-gram LM with bi-gram PN model      64.18%       33.60%     44.11%
- Rescoring with tri-gram LM (A)      58.33%       32.81%     42.00%
- Rescoring with tri-gram LM (B)      64.51%       31.25%     42.10%
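The precision, recall, and F-measure figures can be related to the hit counts with a short calculation. The sketch below assumes that precision is golden hits divided by all detected names, and recall is golden hits divided by the number of reference names. The total of 128 reference names is not stated in the text. It is an assumption inferred from the reported recall values, so results may differ from the table in the last digit.

```python
def precision_recall_f(golden_hits, all_found, total_reference):
    """Compute precision, recall, and the balanced F-measure."""
    precision = golden_hits / all_found
    recall = golden_hits / total_reference
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Assumption: 128 reference names, inferred from the recall column.
TOTAL_REFERENCE_NAMES = 128

# (all found, golden hits) taken from the hit-count table.
hit_counts = {
    "Tri-gram LM with uni-gram PN model": (34, 28),
    "Bi-gram LM with bi-gram PN model": (67, 43),
    "Rescoring with tri-gram LM (A)": (72, 42),
    "Rescoring with tri-gram LM (B)": (62, 40),
}

for model, (found, golden) in hit_counts.items():
    p, r, f = precision_recall_f(golden, found, TOTAL_REFERENCE_NAMES)
    print(f"{model}: P={p:.2%} R={r:.2%} F={f:.2%}")
```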

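When the WFST lattice is generated, not all states can be retained. The experimental results show that, under the same language model setting, two-stage recognition gives a lower word accuracy than one-pass recognition (72.32% for rescoring with tri-gram LM (A) against 73.36% for the one-pass tri-gram LM). Word accuracy improves only when a more detailed language model is used for rescoring, because it supplies better inter-word scores (76.27% with tri-gram LM (B)). For OOV personal names, using different settings to estimate the inter-word scores has little effect on the F1 score (42.00% against 42.10%). One-pass recognition, however, can only use the simpler uni-gram PN model, and the two-stage approach clearly detects more personal names (40 to 43 golden hits against 28).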

From the document: Implementing Chinese Continuous Speech Recognition with Weighted Finite-State Transducers (pp. 40-46)
