
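Chapter 5 Experimental Results and Discussion

5.3 Evaluation of Candidate Keyword Extraction for Examination Sub-items

5.3.1 Evaluation Method

This experiment is evaluated on paragraphs (2), (6), and (7), the special examination item paragraphs. The examination items and sub-item keywords that nephrologists identified in the examination reports serve as the ground truth. An extracted term counts as correct if it contains a ground-truth keyword. The candidate keywords found by the automatic sub-item keyword extraction method of Section 4.2 are then scored by Precision, Recall, and F1-score. Figure 11 maps the extraction method of Section 4.2 to the experiments. Four factors may affect extraction quality: the choice of dictionary, the LIFT filtering threshold, the number of LDA topics, and the keyword expansion step. The evaluation is accordingly divided into four parts, [Experiment 2-1] through [Experiment 2-4].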

Figure 11 Correspondence between the method and the experimental flow
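The scoring rule of Section 5.3.1 can be sketched as follows. This is a minimal illustration, not the thesis's evaluation script: the function name `evaluate_candidates`, the case-insensitive substring test, and the way recall is counted (a ground-truth keyword is recovered when at least one candidate contains it) are assumptions. The sample terms are illustrative only.

```python
def evaluate_candidates(candidates, gold_keywords):
    """Score candidate terms against ground-truth keywords by containment.

    Assumption: matching is a case-insensitive substring test.
    """
    candidates = [c.lower() for c in candidates]
    gold = [g.lower() for g in gold_keywords]

    # A candidate is correct if it contains at least one ground-truth keyword.
    correct = [c for c in candidates if any(g in c for g in gold)]
    # A ground-truth keyword is recovered if some candidate contains it.
    recovered = [g for g in gold if any(g in c for c in candidates)]

    precision = len(correct) / len(candidates) if candidates else 0.0
    recall = len(recovered) / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical data for illustration.
candidates = [
    "thickened basement membrane",
    "protein droplets",
    "clinical history",
    "foot process",
]
gold = ["basement membrane", "protein droplets", "microfilament condensation"]
precision, recall, f1 = evaluate_candidates(candidates, gold)
print(precision, recall, f1)
```

Because the test is containment rather than exact equality, a longer candidate such as "thickened basement membrane" still counts as a hit for "basement membrane".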

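5.3.2 Experimental Results

[Experiment 2-1] Effect of the noun dictionary choice on examination sub-item keyword extraction

This experiment compares three ways of building the noun dictionary: (1) nouns or compound nouns without adjectives (denoted NN+NN), (2) compound nouns that include adjectives (denoted JJ+NN), and (3) the union of the two. The goal is to find the noun dictionary best suited to extracting examination sub-item keywords.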
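The three dictionary variants can be illustrated with a small sketch over tokens that have already been part-of-speech tagged. It assumes Penn Treebank-style tags and one particular reading of the patterns: NN+NN is a maximal run of noun tags, and JJ+NN is one or more adjectives followed by such a run. The function name `build_dictionaries` and the sample sentence are hypothetical, and the thesis's actual rules in Chapter 4 may differ.

```python
def build_dictionaries(tagged):
    """Return (NN+NN terms, JJ+NN terms, their union) from (token, tag) pairs."""
    nn_nn, jj_nn = set(), set()
    i, n = 0, len(tagged)
    while i < n:
        start = i
        # Consume any leading adjectives (JJ, JJR, JJS).
        while i < n and tagged[i][1].startswith("JJ"):
            i += 1
        noun_start = i
        # Consume the following run of nouns (NN, NNS, NNP, ...).
        while i < n and tagged[i][1].startswith("NN"):
            i += 1
        if i > noun_start:
            # The noun-only part goes into the NN+NN dictionary.
            nn_nn.add(" ".join(tok for tok, _ in tagged[noun_start:i]))
            if noun_start > start:
                # Adjective(s) plus nouns go into the JJ+NN dictionary.
                jj_nn.add(" ".join(tok for tok, _ in tagged[start:i]))
        elif i == start:
            i += 1  # Neither adjective nor noun: skip this token.
    return nn_nn, jj_nn, nn_nn | jj_nn


# Hypothetical pre-tagged sentence for illustration.
tagged = [
    ("diffuse", "JJ"), ("mesangial", "JJ"), ("proliferation", "NN"),
    ("with", "IN"),
    ("basement", "NN"), ("membrane", "NN"), ("thickening", "NN"),
]
nn_nn, jj_nn, combined = build_dictionaries(tagged)
print(sorted(nn_nn))
print(sorted(jj_nn))
print(sorted(combined))
```

In this reading the union dictionary is always at least as large as either of the other two.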

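Figures 12, 13, and 14 show the precision, recall, and F1-score of the sub-item keywords extracted with the noun dictionaries built in these three ways. The NN+NN dictionary performs poorly on every measure. We attribute this to the fact that most sub-item terms need to include an adjective, which the NN+NN dictionary lacks, so its results contain errors and incomplete terms. The dictionary that combines NN+NN and JJ+NN achieves markedly higher recall than the other two. It contains the most terms and therefore recovers the most sub-item keywords, but the larger number of extracted terms also makes its precision lower than that of the JJ+NN dictionary. Based on the F1-score averaged over the three paragraphs, we select the combined (NN+NN and JJ+NN) dictionary as the noun dictionary for extracting examination sub-item keywords.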

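Figure 14 F1-score of examination sub-item keyword extraction with noun dictionaries built by different methods (series: NN+NN, JJ+NN, and JJ+NN combined with NN+NN; groups: paragraphs (2), (6), (7), and average)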

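[Experiment 2-2] Effect of the Lift threshold on examination sub-item keyword extraction

This experiment varies the Lift threshold used to filter domain-specific terms and evaluates the precision, recall, and F1-score of the extracted sub-item keywords, shown in Figures 15, 16, and 17. A Lift threshold of 0.2 gives the best results on every measure: the filtered terms have the highest precision and match the most ground-truth keywords. The Lift threshold is therefore set to 0.2 in the remaining experiments.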
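Threshold filtering by Lift can be sketched as below. The exact Lift formula used in Section 4.2 is not restated in this section, so the sketch assumes the textbook association-rule definition, lift(a, b) = P(a, b) / (P(a) P(b)), estimated from document-level co-occurrence. The thesis's measure may be scaled differently, and the names `lift` and `filter_pairs` as well as the sample documents are hypothetical.

```python
def lift(word_a, word_b, documents):
    """Textbook lift of two words over a list of documents (sets of words)."""
    n = len(documents)
    count_a = sum(1 for doc in documents if word_a in doc)
    count_b = sum(1 for doc in documents if word_b in doc)
    count_ab = sum(1 for doc in documents if word_a in doc and word_b in doc)
    if count_a == 0 or count_b == 0:
        return 0.0
    return (count_ab / n) / ((count_a / n) * (count_b / n))


def filter_pairs(pairs, documents, threshold):
    """Keep the word pairs whose lift reaches the threshold."""
    return [(a, b) for a, b in pairs if lift(a, b, documents) >= threshold]


# Hypothetical documents for illustration.
documents = [
    {"basement", "membrane", "thickening"},
    {"basement", "membrane"},
    {"protein", "droplets"},
    {"membrane", "protein"},
]
pairs = [("basement", "membrane"), ("basement", "protein"), ("protein", "droplets")]
print(filter_pairs(pairs, documents, threshold=0.2))
```

Pairs that never co-occur get a lift of zero and are dropped at any positive threshold, which is how unrelated word combinations would be filtered out under this definition.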

Figure 17 F1-score of examination sub-item keyword extraction with different Lift thresholds (groups: paragraphs (2), (6), (7), and average)

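[Experiment 2-3] Effect of the number of LDA topics numT on examination sub-item keyword extraction

This experiment varies numT, the number of topics in the LDA probabilistic topic model, and evaluates the precision, recall, and F1-score of the extracted sub-item keywords, shown in Figures 18, 19, and 20. The results show that setting numT to 10 yields higher precision.

Figure 20 F1-score of examination sub-item keyword extraction with different numbers of LDA topics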
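To make the role of numT concrete, the following is a small, self-contained sketch of LDA fitted by collapsed Gibbs sampling. It is not the implementation used in the thesis, which does not name its LDA toolkit in this section. The function names, the hyperparameters `alpha` and `beta`, the iteration count, and the toy documents are all assumptions chosen for illustration.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-topic word counts."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # words of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                       # all words in topic k
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the collapsed conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word


def top_words(topic_word, n=3):
    """List the n most frequent words of each topic, ignoring zero counts."""
    return [[w for w, c in tw.most_common(n) if c > 0] for tw in topic_word]


# Hypothetical tokenized documents for illustration.
docs = [
    ["basement", "membrane", "thickening"],
    ["basement", "membrane", "protein"],
    ["protein", "droplets", "tubular"],
    ["tubular", "protein", "droplets"],
]
for num_topics in (2, 3):
    print(num_topics, top_words(lda_gibbs(docs, num_topics)))
```

Rerunning the extraction for each candidate numT and scoring the result against the ground truth is the kind of sweep this experiment reports. A practical pipeline would more likely rely on an established LDA library than on a hand-written sampler.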

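[Experiment 2-4] Effect of expanding the candidate keyword list on examination sub-item keyword extraction

This experiment evaluates whether the candidate keyword list expansion method proposed at the end of Section 4.2 improves examination sub-item keyword extraction. As Figure 21 shows, the method raises both precision and recall in every special examination item paragraph. The proposed expansion method therefore does supplement the adjectival vocabulary of the examination sub-items.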

Figure 21 (metrics: precision, recall, F1-score; series: before expansion, after expansion)

microfilament condensation, protein droplets, basement membrane

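5.4 Evaluation of Examination Report Structuring

5.4.1 Evaluation Method

Following the method proposed in Section 4.1, this experiment is evaluated on paragraphs (1), (3), and (8), the summary-style paragraphs. Annotators manually mark whether the structured output omits any important information. A report counts as an error if even one term is missing. The experiment samples 50 reports at random from the three paragraphs, and Precision is computed from the annotations as the evaluation measure.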
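The report-level scoring described above can be written down in a few lines. This is a sketch under assumptions: each sampled report is represented by the list of important terms the annotator marked as missing, and the function name `report_level_precision` and the sample data are hypothetical.

```python
def report_level_precision(missing_terms_per_report):
    """Fraction of sampled reports whose structured output misses no term.

    Each element is the list of important terms marked as missing for one
    report; a single missing term makes the whole report count as an error.
    """
    if not missing_terms_per_report:
        return 0.0
    correct = sum(1 for missing in missing_terms_per_report if not missing)
    return correct / len(missing_terms_per_report)


# Hypothetical annotations: 50 sampled reports, one with a missed term.
annotations = [[] for _ in range(49)]
annotations.append(["chronic allograft rejection superimposed"])
print(report_level_precision(annotations))
```

This all-or-nothing rule is strict: a report with many correctly structured terms and one omission scores the same as a report with nothing extracted.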

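5.4.2 Experimental Results

As Figure 22 shows, the proposed method for structuring summary-style paragraphs reaches a precision of 0.9 or higher on every paragraph, with a maximum of 0.98. Paragraphs (1) and (8) score slightly lower than paragraph (3). On inspection, most of their errors arise because the extracted keyword list omits terms that do not fit the part-of-speech extraction rules, as shown in Table 17. Overall, the proposed structuring method applies to most examination report content, and only a few specific phrasings cannot be extracted completely.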

Figure 22 Precision of structuring for different summary-style paragraphs

Table 17 Examples of terms missed when structuring summary-style paragraphs

Missed term                                                    Part-of-speech tags
diffuse and nodular diabetic nephropathy                       JJ CC JJ JJ NN
chronic allograft rejection superimposed                       JJ NN NN VBN
minimal glomerular, tubulointerstitial, and vascular changes   JJ JJ , JJ CC JJ NN
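One way to see why the terms in Table 17 are missed is to list the tags in each sequence that fall outside adjective and noun tags. The sketch assumes the extraction patterns accept only adjective (JJ) and noun (NN) tags, as the NN+NN and JJ+NN labels in Experiment 2-1 suggest. The actual rules of Section 4.1 are not restated here, and the name `tags_outside_patterns` is hypothetical.

```python
# Assumption: the extraction patterns only accept adjective and noun tags.
PATTERN_TAG_PREFIXES = ("JJ", "NN")


def tags_outside_patterns(tag_sequence):
    """Return the tags in a space-separated sequence that are not JJ* or NN*."""
    return [
        tag for tag in tag_sequence.split()
        if not tag.startswith(PATTERN_TAG_PREFIXES)
    ]


# Tag sequences copied from Table 17.
for tags in ("JJ CC JJ JJ NN", "JJ NN NN VBN", "JJ JJ , JJ CC JJ NN"):
    print(tags, "->", tags_outside_patterns(tags))
```

Under this assumption each missed term contains a conjunction, a participle, or a comma, none of which a pure adjective-noun pattern would span.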

Figure 22 values (structuring model precision): paragraph (1) 0.9, paragraph (3) 0.98, paragraph (8) 0.92, average 0.93

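Chapter 6 Conclusions and Future Research Directions

6.1 Conclusions

This study addresses the automatic structuring of free-text examination reports. It first extracts candidate medical terms with part-of-speech rules to build a medical dictionary specific to the examination reports, then applies word-to-word association measures to filter out likely misspellings and terms without specific meaning. To extract sub-item keywords from special examination item paragraphs, the dictionary is first cleaned of noise terms and then analyzed with the LDA probabilistic topic model, which yields candidate keywords for the examination sub-items. To structure summary-style reports, paragraph text is matched against the medical dictionary, adjacent terms are merged, and the extracted terms are compared with the keyword suffixes of the summary items so that each term is assigned to the summary item it belongs to. A series of experiments evaluates these methods, and the results demonstrate their effectiveness.

6.2 Future Directions

The experiments show that a small number of important terms are missed during the structuring of summary-style paragraphs because the current part-of-speech rules cannot extract them. Future work could learn term pattern rules automatically from a large database of examination reports together with known medical vocabulary, to make the structured extraction more complete and effective. Future work could also analyze how the extracted sub-item keywords relate to one another within a hierarchical structure of items.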


Appendix 1

Removed ending words

finding comment correlate microscopy presentation

course correlation history examination possibility

consideration survey result myeloma sclerosis

cause patient smoking treatement due

drug data condition according setting

age feature background information study

profile year criterion criteria exmaination

likehood micropsy exmaintation usage limitation

response diagnosis evidence pending conclusion

impression evaluation clarification em show ultrastructure show

red staining pattern

Removed leading words

including mostly only favoring otherwise

involving

