
5. Chapter 5: System Performance Evaluation

5.1 Experiments with the Heuristic Inference Model


National Chengchi University


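Because most of the functions adopted by the model are used to detect negative inference relations, the parameters mainly serve to lower the inference score between two sentences. From the table above, we observe that the negation-word parameter β carries a relatively high penalty in the three training corpora other than Simplified Chinese. It therefore has a strong effect on the inference score and makes the model more likely to judge that no inference relation holds between the sentences. In the settings for the Simplified Chinese corpus, the antonym parameter γ and the named-entity misalignment parameter δ barely change, which means that these two functions are hardly used to adjust the inference score in that model. Finally, in the parameter search on the English corpora, the MSR corpus makes almost no use of the named-entity misalignment function, possibly because of how the corpus was designed, whereas the RTE corpus still contains some cases in which misplaced named entities affect the inference judgment. The antonym function shows exactly the opposite pattern in the two corpora: the RTE corpus hardly uses it to judge inference relations. This may stem from differences in corpus design, which leave certain functions ineffective for judging inference relations. In the test-corpus experiments that follow, we observe the actual performance of the system and examine the inference results obtained with parameters tuned on different training corpora.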
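The role of the penalty parameters can be illustrated with a minimal sketch. The scoring formula is not given in this section, so everything below is an assumption made for illustration: the subtractive form of the penalties, the `base_score` input, the boolean feature flags, the decision threshold, and all function and field names are hypothetical. Only the three parameters β, γ and δ and their purpose (lowering the inference score) come from the text.

```python
from dataclasses import dataclass


@dataclass
class Penalties:
    """Hypothetical container for the three penalty parameters."""
    beta: float   # negation-word penalty
    gamma: float  # antonym penalty
    delta: float  # named-entity misalignment penalty


def inference_score(base_score: float,
                    negation_mismatch: bool,
                    antonym_found: bool,
                    entity_misaligned: bool,
                    p: Penalties) -> float:
    """Lower a base score by each penalty whose feature fires.

    Assumption: penalties are subtracted from the score. The thesis may
    combine them differently.
    """
    score = base_score
    if negation_mismatch:
        score -= p.beta
    if antonym_found:
        score -= p.gamma
    if entity_misaligned:
        score -= p.delta
    return max(score, 0.0)


def predict(score: float, threshold: float = 0.5) -> str:
    """Return 'Y' (inference holds) or 'N' (it does not)."""
    return "Y" if score >= threshold else "N"


# Illustrative values only; these are not the tuned parameters of Table 5.1.
penalties = Penalties(beta=0.4, gamma=0.1, delta=0.1)
plain = inference_score(0.8, False, False, False, penalties)
negated = inference_score(0.8, True, False, False, penalties)
print(predict(plain), predict(negated))
```

Under these assumed values, a large β is enough on its own to push an otherwise high-scoring pair below the threshold, which matches the observation that a high negation penalty makes "no inference" judgments more likely. A parameter that stays near zero, as γ and δ do for Simplified Chinese, would leave the score essentially unchanged.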

5.1.2 Analysis of the Inference Functions

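Using the parameters tuned on the training corpora above, we run experiments on the test corpora and analyze both the overall performance of the tuned heuristic inference model and its per-class inference ability.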

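We use the parameters in Table 5.1 to predict inference relations on the RITE-1 and RITE-2 Traditional Chinese test corpora, and we add near-synonym detection to see whether it improves inference. We then analyze the predictions by computing precision and recall for each answer class. Figures 5.1 and 5.2 compare performance with and without near-synonyms on the RITE-1 and RITE-2 Traditional Chinese test corpora. The figures show that near-synonyms noticeably improve system performance on the RITE-2 test corpus, while performance on the RITE-1 test corpus drops slightly. We therefore believe that near-synonyms are potentially helpful for judging inference relations, and that the difference in performance is due to the differing characteristics of the corpora.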
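The per-class metrics reported in this section can be reproduced with a short sketch. The function names are illustrative, and the definition of Macro-F1 as the unweighted mean of the Y-class and N-class F1 scores is an assumption, although it is consistent with the reported numbers: applied to the first C1 row in this section (Y-Precision 69.57%, Y-Recall 83.33%, N-Precision 79.22%, N-Recall 63.56%), it yields the reported Macro-F1 of 73.18%.

```python
from typing import Sequence, Tuple


def precision_recall(gold: Sequence[str], pred: Sequence[str],
                     label: str) -> Tuple[float, float]:
    """Precision and recall of one answer class ('Y' or 'N')."""
    true_pos = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    predicted = sum(1 for p in pred if p == label)
    actual = sum(1 for g in gold if g == label)
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_precision: float, y_recall: float,
             n_precision: float, n_recall: float) -> float:
    """Unweighted mean of the per-class F1 scores."""
    return (f1(y_precision, y_recall) + f1(n_precision, n_recall)) / 2


# First C1 row of this section, entered as fractions (rounded inputs).
c1 = macro_f1(0.6957, 0.8333, 0.7922, 0.6356)
print(f"{c1:.2%}")  # prints 73.18%
```

Because Macro-F1 weights both classes equally, a configuration with high Y-Recall but low N-Recall can score well on accuracy and still have a modest Macro-F1, which is why both figures are reported for every configuration.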

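[Chart data, Figure 5.1 (RITE-1 Traditional Chinese test corpus); one value per parameter configuration in plotted order, starting with C1]
Without near-synonyms  Macro-F1  73.18%  73.29%  73.59%  73.52%  73.82%  73.83%
With near-synonyms     Macro-F1  71.53%  71.94%  72.41%  72.07%  72.54%  73.00%
Without near-synonyms  Accuracy  73.44%  73.56%  73.78%  73.78%  74.00%  73.89%
With near-synonyms     Accuracy  72.11%  72.56%  72.89%  72.67%  73.00%  73.22%

[Chart data, Figure 5.2 (RITE-2 Traditional Chinese test corpus); one value per parameter configuration in plotted order, starting with C1]
Without near-synonyms  Macro-F1  65.79%  65.73%  65.55%  65.75%  65.56%  65.42%
With near-synonyms     Macro-F1  66.79%  67.46%  67.12%  67.25%  66.92%  67.07%
Without near-synonyms  Accuracy  66.29%  66.29%  65.95%  66.29%  65.95%  65.61%
With near-synonyms     Accuracy  67.76%  68.56%  67.99%  68.33%  67.76%  67.54%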

[Rows for configuration C1 from four result tables; the table captions and remaining rows are missing. The test corpus and near-synonym columns are inferred from the surrounding text and the matching chart values.]
ID  Test corpus  Near-synonyms  Y-Precision  Y-Recall  N-Precision  N-Recall  Macro-F1  Acc
C1  RITE-1       no             67.57%→ see note

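[Chart data; figure caption not recovered. One value per parameter configuration in plotted order, starting with C7]
Without near-synonyms  Macro-F1  67.69%  67.69%  67.69%  68.11%  68.11%  67.66%
With near-synonyms     Macro-F1  65.96%  65.96%  65.96%  66.41%  66.41%  67.57%
Without near-synonyms  Accuracy  75.18%  75.18%  75.18%  75.43%  75.43%  71.01%
With near-synonyms     Accuracy  74.94%  74.94%  74.94%  75.18%  75.18%  71.74%

[Chart data; figure caption not recovered. One value per parameter configuration in plotted order, starting with C7]
Without near-synonyms  Macro-F1  62.05%  61.98%  62.24%  62.67%  63.94%  65.71%
With near-synonyms     Macro-F1  59.35%  59.05%  59.44%  60.10%  61.68%  68.09%
Without near-synonyms  Accuracy  66.33%  66.33%  66.45%  66.71%  67.35%  65.81%
With near-synonyms     Accuracy  64.92%  64.66%  64.92%  65.30%  66.33%  68.50%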

[Rows for configuration C7 from four result tables; the table captions and remaining rows are missing. The near-synonym column is inferred from the matching chart values; the test corpus of each pair of rows is not recoverable.]
ID  Near-synonyms  Y-Precision  Y-Recall  N-Precision  N-Recall  Macro-F1  Acc
C7  no             73.82%       95.44%    82.09%       38.19%    67.69%    75.18%
C7  yes            72.80%       97.72%    88.89%       33.33%    65.96%    74.94%

C7  no             62.80%       92.42%    80.00%       35.65%    62.05%    66.33%
C7  yes            61.42%       94.31%    81.95%       30.36%    59.35%    64.92%


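However, the functions we propose still cannot effectively capture negative inference relations in the English corpora. A better understanding of the linguistic characteristics of English is needed to improve the heuristic inference model.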

Figure 5.5 System performance of the heuristic inference model: MSR test corpus (Macro-F1 and Accuracy for configurations C13 to C22; the plotted values are those listed in Table 5.13)

Figure 5.6 System performance of the heuristic inference model: RTE-1 test corpus (Macro-F1 and Accuracy for configurations C13 to C22; the plotted values are those listed in Table 5.14)


Figure 5.7 System performance of the heuristic inference model: RTE-2 test corpus (Macro-F1 and Accuracy for configurations C13 to C22; the plotted values are those listed in Table 5.15)

Figure 5.8 System performance of the heuristic inference model: RTE-3 test corpus (Macro-F1 and Accuracy for configurations C13 to C22; the plotted values are those listed in Table 5.16)


Table 5.13 Experimental results of the heuristic inference model: MSR test corpus

ID   Y-Precision  Y-Recall  N-Precision  N-Recall  Macro-F1  Acc
C13  73.45%       90.93%    65.90%       34.78%    63.39%    72.12%
C14  73.57%       90.76%    65.81%       35.29%    63.61%    72.17%
C15  74.31%       89.54%    65.01%       38.58%    64.82%    72.46%
C16  74.27%       89.10%    64.18%       38.75%    64.67%    72.23%
C17  74.31%       88.75%    63.66%       39.10%    64.67%    72.12%
C18  75.74%       71.58%    49.14%       54.50%    62.64%    65.86%
C19  75.56%       70.62%    48.39%       54.67%    62.17%    65.28%
C20  75.56%       70.62%    48.39%       54.67%    62.17%    65.28%
C21  75.46%       68.61%    47.21%       55.71%    61.49%    64.29%
C22  75.46%       68.61%    47.21%       55.71%    61.49%    64.29%

Table 5.14 Experimental results of the heuristic inference model: RTE-1 test corpus

ID   Y-Precision  Y-Recall  N-Precision  N-Recall  Macro-F1  Acc
C13  53.50%       86.00%    64.33%       25.25%    51.11%    55.63%
C14  53.51%       85.75%    64.15%       25.50%    51.20%    55.63%
C15  53.81%       84.75%    64.12%       27.25%    52.04%    56.00%
C16  53.60%       83.75%    62.86%       27.50%    51.81%    55.63%
C17  53.70%       83.50%    62.92%       28.00%    52.06%    55.75%
C18  55.19%       75.75%    61.35%       38.50%    55.58%    57.13%
C19  55.09%       75.75%    61.20%       38.25%    55.43%    57.00%
C20  54.96%       74.75%    60.55%       38.75%    55.30%    56.75%
C21  55.11%       75.50%    61.11%       38.50%    55.48%    57.00%
C22  54.98%       74.50%    60.47%       39.00%    55.34%    56.75%


Table 5.15 Experimental results of the heuristic inference model: RTE-2 test corpus

ID   Y-Precision  Y-Recall  N-Precision  N-Recall  Macro-F1  Acc
C13  53.25%       86.00%    63.64%       24.50%    50.58%    55.25%
C14  53.18%       85.75%    63.23%       24.50%    50.48%    55.13%
C15  53.29%       85.00%    62.96%       25.50%    50.90%    55.25%
C16  53.38%       85.00%    63.19%       25.75%    51.08%    55.38%
C17  53.47%       84.75%    63.25%       26.25%    51.34%    55.50%
C18  55.07%       76.00%    61.29%       38.00%    55.39%    57.00%
C19  55.20%       77.00%    61.98%       37.50%    55.51%    57.25%
C20  55.07%       76.00%    61.29%       38.00%    55.39%    57.00%
C21  55.04%       76.50%    61.48%       37.50%    55.30%    57.00%
C22  54.91%       75.50%    60.80%       38.00%    55.17%    56.75%

Table 5.16 Experimental results of the heuristic inference model: RTE-3 test corpus

ID   Y-Precision  Y-Recall  N-Precision  N-Recall  Macro-F1  Acc
C13  57.35%       88.54%    71.86%       30.77%    56.35%    60.38%
C14  57.53%       88.54%    72.19%       31.28%    56.70%    60.62%
C15  58.01%       86.59%    70.74%       34.10%    57.75%    61.00%
C16  57.94%       86.34%    70.37%       34.10%    57.64%    60.88%
C17  57.99%       85.85%    69.95%       34.62%    57.77%    60.88%
C18  58.82%       75.61%    63.37%       44.36%    59.18%    60.38%
C19  58.87%       76.10%    63.70%       44.10%    59.25%    60.50%
C20  58.75%       75.37%    63.14%       44.36%    59.07%    60.25%
C21  59.23%       75.12%    63.57%       45.64%    59.69%    60.75%
C22  59.11%       74.39%    63.03%       45.90%    59.50%    60.50%
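When reading the accuracy column next to the two recall columns, it can help to recall that binary accuracy is the recall of each class weighted by that class's share of the test pairs. The sketch below applies this identity to row C13 of Table 5.13 and row C13 of Table 5.16. The function names are illustrative, and the class shares it derives are estimates computed from rounded percentages; this section does not report them.

```python
def accuracy_from_recalls(y_recall: float, n_recall: float,
                          y_share: float) -> float:
    """Accuracy as the class-share-weighted mean of the two recalls."""
    return y_share * y_recall + (1.0 - y_share) * n_recall


def implied_y_share(accuracy: float, y_recall: float,
                    n_recall: float) -> float:
    """Solve the identity above for the share of Y pairs in the test set."""
    if y_recall == n_recall:
        raise ValueError("share is not identifiable when both recalls are equal")
    return (accuracy - n_recall) / (y_recall - n_recall)


# Row C13 of Table 5.13 (MSR) and of Table 5.16 (RTE-3), as fractions.
msr_share = implied_y_share(0.7212, 0.9093, 0.3478)
rte3_share = implied_y_share(0.6038, 0.8854, 0.3077)
print(f"implied share of Y pairs, MSR:   {msr_share:.3f}")
print(f"implied share of Y pairs, RTE-3: {rte3_share:.3f}")
```

For these two rows the implied share of Y pairs comes out at roughly two thirds for MSR and roughly one half for RTE-3. A configuration with high Y-Recall therefore gains more accuracy on MSR than on RTE-3, which is one reason to read Macro-F1 alongside accuracy when comparing corpora.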


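Across the experiments on multiple Chinese and English corpora, the heuristic inference system built from our proposed functions achieves its best overall performance on the Traditional Chinese corpora. On Simplified Chinese, accuracy is good, but the per-class inference ability still needs to be strengthened. Compared with the results of the NTCIR-9 and NTCIR-10 competitions, the heuristic inference model nevertheless performs reasonably well on the Chinese corpora. The English results leave considerable room for improvement: for both answer classes, the existing functions need to be refined to improve inference on English sentences. These experiments indicate that we need to develop more functions for detecting negative inference relations in the future, in particular to handle antonymy, independence and contradiction between sentences. We also find large performance differences among the English corpora: accuracy ranges from about 64% to 72% on MSR (Table 5.13) but only from about 55% to 61% on RTE-1 through RTE-3 (Tables 5.14 to 5.16). The MSR corpus is drawn from news content and was therefore not specifically designed around a particular purpose, whereas the RTE corpus was designed for research in a variety of fields and contains sentence pairs constructed for many different research topics. We consider the RTE corpus to be more complex than the MSR corpus, so a few linguistic features alone are not enough to achieve good inference results. As a result, system performance on the RTE corpora is consistently low, and this is the area most urgently in need of improvement in future work.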