5. Chapter 5: System Performance Evaluation
5.1 Heuristic Inference Model Experiments
National Chengchi University (國立政治大學)
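Because most of the functions adopted by the model are used to detect negative inference relations, the parameter settings mainly serve to lower the inference score between two sentences. The table above shows that the negation-word parameter β carries a relatively high penalty in three of the training corpora (all except Simplified Chinese). It affects the inference score strongly and therefore makes it easy to judge that no inference relation holds between the sentences. In the settings for the Simplified Chinese corpus, the antonym parameter γ and the named-entity misplacement parameter δ hardly change, which means that these two functions are almost never used in the inference model to adjust the inference score. Finally, in the parameter search on the English corpora, the MSR corpus makes almost no use of the named-entity misplacement function, possibly because of how the corpus was designed. In the RTE corpus, by contrast, some sentence pairs still contain misplaced named entities that affect the judgment of the inference relation. For the antonym function the situation is exactly reversed: the RTE corpus hardly uses it to judge inference relations. This may stem from differences in corpus design, which leave some functions ineffective for judging inference relations. In the test-corpus experiments below, we can observe the actual system performance in judging inference relations and see how the parameters tuned on different training corpora affect the inference results.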
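The role of the penalty parameters can be illustrated with a minimal sketch. The exact scoring formula is not given in this section, so the subtractive form, the base score, the decision threshold, and every name below (`Penalties`, `inference_score`, `predict`) are assumptions made only for illustration. Only the symbols β, γ, and δ come from the text.

```python
from dataclasses import dataclass


@dataclass
class Penalties:
    """Hypothetical penalty weights named after the symbols in the text."""
    beta: float   # negation-word penalty
    gamma: float  # antonym penalty
    delta: float  # named-entity misplacement penalty


def inference_score(base: float, negation: bool, antonym: bool,
                    entity_misplaced: bool, p: Penalties) -> float:
    """Lower the base score once for each negative cue that fires.

    The subtractive form is an assumption, not the thesis's formula.
    """
    score = base
    if negation:
        score -= p.beta
    if antonym:
        score -= p.gamma
    if entity_misplaced:
        score -= p.delta
    return score


def predict(score: float, threshold: float = 0.5) -> str:
    """Label the pair Y when the score reaches the (assumed) threshold."""
    return "Y" if score >= threshold else "N"


# Illustrative values only: a large beta, with gamma and delta at zero,
# mimics a setting in which only the negation function is in use.
p = Penalties(beta=0.4, gamma=0.0, delta=0.0)
print(predict(inference_score(0.8, False, False, False, p)))
print(predict(inference_score(0.8, True, False, False, p)))
```

Under these assumptions, a parameter that the search leaves near zero never changes the label. This is one way to read the remark that γ and δ are almost unused for the Simplified Chinese corpus.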
5.1.2 Inference Function Analysis
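Using the parameters tuned on the training corpora above, we run experiments on the test corpora and analyze the performance and per-class inference ability of the heuristic inference model after parameter tuning.
We use the parameters in Table 5.1 to predict the inference relations in the RITE-1 and RITE-2 Traditional Chinese test corpora, and add near-synonym matching to see whether it improves inference. We then analyze the predictions and compute the precision and recall of each answer class. Figures 5.1 and 5.2 compare performance with and without near-synonyms on the RITE-1 and RITE-2 Traditional Chinese test corpora. In the plotted values, near-synonyms raise Macro-F1 by about 1.0 to 1.7 points on the RITE-2 test corpus, while on the RITE-1 test corpus Macro-F1 drops by about 0.8 to 1.7 points. We therefore consider near-synonyms potentially helpful for judging inference relations, and attribute the difference in performance to the differing characteristics of the corpora.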
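The per-class measures used in the result tables can be sketched as follows. The function names are hypothetical. The confusion counts in the usage line are also an assumption: they are backed out of the C13 row of Table 5.14 on the premise that the RTE-1 test corpus has 400 Y pairs and 400 N pairs, which the text does not state.

```python
def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return safe_div(2 * precision * recall, precision + recall)


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Per-class precision and recall, Macro-F1, and accuracy for Y/N labels.

    tp: gold Y predicted Y    fn: gold Y predicted N
    tn: gold N predicted N    fp: gold N predicted Y
    """
    y_p = safe_div(tp, tp + fp)
    y_r = safe_div(tp, tp + fn)
    n_p = safe_div(tn, tn + fn)
    n_r = safe_div(tn, tn + fp)
    return {
        "y_precision": y_p, "y_recall": y_r,
        "n_precision": n_p, "n_recall": n_r,
        # Macro-F1 is the unweighted mean of the two per-class F1 scores.
        "macro_f1": (f1(y_p, y_r) + f1(n_p, n_r)) / 2,
        "accuracy": safe_div(tp + tn, tp + fn + tn + fp),
    }


# Assumed counts (not given in the text): 400 Y pairs and 400 N pairs.
m = binary_metrics(tp=344, fn=56, tn=101, fp=299)
print({name: f"{value:.2%}" for name, value in m.items()})
```

With these assumed counts, all six values agree with the C13 row of Table 5.14 to within rounding. This suggests that Macro-F1 in the tables is the unweighted mean of the Y and N F1 scores.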
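Figure 5.1 (RITE-1 Traditional Chinese test corpus), plotted values:
Series                           C1      C2      C3      C4      C5      C6
Without near-synonyms, Macro-F1  73.18%  73.29%  73.59%  73.52%  73.82%  73.83%
With near-synonyms, Macro-F1     71.53%  71.94%  72.41%  72.07%  72.54%  73.00%
Without near-synonyms, Accuracy  73.44%  73.56%  73.78%  73.78%  74.00%  73.89%
With near-synonyms, Accuracy     72.11%  72.56%  72.89%  72.67%  73.00%  73.22%

Figure 5.2 (RITE-2 Traditional Chinese test corpus), plotted values:
Series                           C1      C2      C3      C4      C5      C6
Without near-synonyms, Macro-F1  65.79%  65.73%  65.55%  65.75%  65.56%  65.42%
With near-synonyms, Macro-F1     66.79%  67.46%  67.12%  67.25%  66.92%  67.07%
Without near-synonyms, Accuracy  66.29%  66.29%  65.95%  66.29%  65.95%  65.61%
With near-synonyms, Accuracy     67.76%  68.56%  67.99%  68.33%  67.76%  67.54%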
RITE-1 Traditional Chinese test corpus, configuration C1:
Near-synonyms  Y-Precision  Y-Recall  N-Precision  N-Recall  Macro-F1  Acc
Without        69.57%       83.33%    79.22%       63.56%    73.18%    73.44%
With           67.18%       86.44%    81.00%       57.78%    71.53%    72.11%

RITE-2 Traditional Chinese test corpus, configuration C1:
Near-synonyms  Y-Precision  Y-Recall  N-Precision  N-Recall  Macro-F1  Acc
Without        67.91%       72.03%    64.08%       59.45%    65.79%    66.29%
With           67.63%       78.08%    67.99%       55.47%    66.79%    67.76%
Plotted values, configurations C7 to C12:
Series                           C7      C8      C9      C10     C11     C12
Without near-synonyms, Macro-F1  67.69%  67.69%  67.69%  68.11%  68.11%  67.66%
With near-synonyms, Macro-F1     65.96%  65.96%  65.96%  66.41%  66.41%  67.57%
Without near-synonyms, Accuracy  75.18%  75.18%  75.18%  75.43%  75.43%  71.01%
With near-synonyms, Accuracy     74.94%  74.94%  74.94%  75.18%  75.18%  71.74%

Series                           C7      C8      C9      C10     C11     C12
Without near-synonyms, Macro-F1  62.05%  61.98%  62.24%  62.67%  63.94%  65.71%
With near-synonyms, Macro-F1     59.35%  59.05%  59.44%  60.10%  61.68%  68.09%
Without near-synonyms, Accuracy  66.33%  66.33%  66.45%  66.71%  67.35%  65.81%
With near-synonyms, Accuracy     64.92%  64.66%  64.92%  65.30%  66.33%  68.50%
Configuration C7:
Near-synonyms  Y-Precision  Y-Recall  N-Precision  N-Recall  Macro-F1  Acc
Without        73.82%       95.44%    82.09%       38.19%    67.69%    75.18%
With           72.80%       97.72%    88.89%       33.33%    65.96%    74.94%

Configuration C7:
Near-synonyms  Y-Precision  Y-Recall  N-Precision  N-Recall  Macro-F1  Acc
Without        62.80%       92.42%    80.00%       35.65%    62.05%    66.33%
With           61.42%       94.31%    81.95%       30.36%    59.35%    64.92%
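However, the functions we propose still cannot effectively capture negative inference relations in the English corpora. The linguistic characteristics of English need to be studied further in order to improve the heuristic inference model.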
Figure 5.5 System performance of the heuristic inference model: MSR test corpus
[Chart: Macro-F1 and Accuracy for configurations C13 to C22; vertical axis from 50.00% to 70.00%]
Figure 5.6 System performance of the heuristic inference model: RTE-1 test corpus
[Chart: Macro-F1 and Accuracy for configurations C13 to C22; vertical axis from 50.00% to 70.00%]
Figure 5.7 System performance of the heuristic inference model: RTE-2 test corpus
[Chart: Macro-F1 and Accuracy for configurations C13 to C22; vertical axis from 50.00% to 70.00%]
Figure 5.8 System performance of the heuristic inference model: RTE-3 test corpus
[Chart: Macro-F1 and Accuracy for configurations C13 to C22; vertical axis from 50.00% to 70.00%]
Table 5.13 Results of the heuristic inference model: MSR test corpus
ID   Y-Precision  Y-Recall  N-Precision  N-Recall  Macro-F1  Acc
C13  73.45%       90.93%    65.90%       34.78%    63.39%    72.12%
C14  73.57%       90.76%    65.81%       35.29%    63.61%    72.17%
C15  74.31%       89.54%    65.01%       38.58%    64.82%    72.46%
C16  74.27%       89.10%    64.18%       38.75%    64.67%    72.23%
C17  74.31%       88.75%    63.66%       39.10%    64.67%    72.12%
C18  75.74%       71.58%    49.14%       54.50%    62.64%    65.86%
C19  75.56%       70.62%    48.39%       54.67%    62.17%    65.28%
C20  75.56%       70.62%    48.39%       54.67%    62.17%    65.28%
C21  75.46%       68.61%    47.21%       55.71%    61.49%    64.29%
C22  75.46%       68.61%    47.21%       55.71%    61.49%    64.29%
Table 5.14 Results of the heuristic inference model: RTE-1 test corpus
ID   Y-Precision  Y-Recall  N-Precision  N-Recall  Macro-F1  Acc
C13  53.50%       86.00%    64.33%       25.25%    51.11%    55.63%
C14  53.51%       85.75%    64.15%       25.50%    51.20%    55.63%
C15  53.81%       84.75%    64.12%       27.25%    52.04%    56.00%
C16  53.60%       83.75%    62.86%       27.50%    51.81%    55.63%
C17  53.70%       83.50%    62.92%       28.00%    52.06%    55.75%
C18  55.19%       75.75%    61.35%       38.50%    55.58%    57.13%
C19  55.09%       75.75%    61.20%       38.25%    55.43%    57.00%
C20  54.96%       74.75%    60.55%       38.75%    55.30%    56.75%
C21  55.11%       75.50%    61.11%       38.50%    55.48%    57.00%
C22  54.98%       74.50%    60.47%       39.00%    55.34%    56.75%
Table 5.15 Results of the heuristic inference model: RTE-2 test corpus
ID   Y-Precision  Y-Recall  N-Precision  N-Recall  Macro-F1  Acc
C13  53.25%       86.00%    63.64%       24.50%    50.58%    55.25%
C14  53.18%       85.75%    63.23%       24.50%    50.48%    55.13%
C15  53.29%       85.00%    62.96%       25.50%    50.90%    55.25%
C16  53.38%       85.00%    63.19%       25.75%    51.08%    55.38%
C17  53.47%       84.75%    63.25%       26.25%    51.34%    55.50%
C18  55.07%       76.00%    61.29%       38.00%    55.39%    57.00%
C19  55.20%       77.00%    61.98%       37.50%    55.51%    57.25%
C20  55.07%       76.00%    61.29%       38.00%    55.39%    57.00%
C21  55.04%       76.50%    61.48%       37.50%    55.30%    57.00%
C22  54.91%       75.50%    60.80%       38.00%    55.17%    56.75%
Table 5.16 Results of the heuristic inference model: RTE-3 test corpus
ID   Y-Precision  Y-Recall  N-Precision  N-Recall  Macro-F1  Acc
C13  57.35%       88.54%    71.86%       30.77%    56.35%    60.38%
C14  57.53%       88.54%    72.19%       31.28%    56.70%    60.62%
C15  58.01%       86.59%    70.74%       34.10%    57.75%    61.00%
C16  57.94%       86.34%    70.37%       34.10%    57.64%    60.88%
C17  57.99%       85.85%    69.95%       34.62%    57.77%    60.88%
C18  58.82%       75.61%    63.37%       44.36%    59.18%    60.38%
C19  58.87%       76.10%    63.70%       44.10%    59.25%    60.50%
C20  58.75%       75.37%    63.14%       44.36%    59.07%    60.25%
C21  59.23%       75.12%    63.57%       45.64%    59.69%    60.75%
C22  59.11%       74.39%    63.03%       45.90%    59.50%    60.50%
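Across the experiments on several Chinese and English corpora, the heuristic inference system built from our proposed functions achieves its best overall performance on the Traditional Chinese corpora. On Simplified Chinese the accuracy is good, but the per-class inference ability on that corpus still needs work. Even so, compared with the results of the NTCIR-9 and NTCIR-10 competitions, the heuristic inference model performs reasonably well on the Chinese corpora. The English results leave considerable room for improvement: both per-class inference abilities require improvements to the existing functions in order to raise inference quality on English sentences. These experiments show that we need to develop more functions for detecting negative inference relations, in particular to handle antonymy, independence, and contradiction between sentences. We also find large performance differences across the English corpora. The MSR corpus is drawn from news content and was not specifically designed for a particular purpose, whereas the RTE corpus was designed for research in a variety of fields and contains sentence pairs built for many different research topics. We consider the RTE corpus more complex and more difficult than the MSR corpus, so a handful of linguistic features is not enough to achieve good inference results. As a result, system performance on the RTE corpora is consistently low, and this is the part most urgently in need of improvement in future work.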