6.1 Conclusion
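To structure free-text examination reports automatically, this study first extracts candidate medical terms with part-of-speech rules and builds a medical dictionary specific to examination reports. An association computation over word occurrences then filters likely misspellings and words without specific meaning out of that dictionary. To extract detail-item keywords from the paragraphs of specific examination items, the study first removes noise words from the dictionary and then applies the LDA topic model to extract candidate keywords for each detail item. To structure summary-style examination reports, terms are extracted by matching each report paragraph against the medical dictionary and merging neighbouring terms. The extracted terms are then matched against the keyword suffixes of the examination summary items, so that each term is assigned to the summary item it belongs to. The proposed methods were evaluated in a series of experiments, and the results show that they are effective.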
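The summary-report structuring step described above (dictionary matching, merging of neighbouring terms, and keyword-suffix assignment) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: dictionary terms are assumed to be lowercase token tuples matched greedily longest-first, "merging" is taken to mean joining matched terms that are directly adjacent in the text, and each summary item is assumed to carry a set of suffix keywords compared with the last word of a phrase. The function names, item names, and vocabulary in the example are hypothetical.

```python
import re
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int]  # half-open token range [start, end)


def tokenize(text: str) -> List[str]:
    # Assumption: lowercase and keep alphabetic runs only; a real system would use a proper tokenizer.
    return re.findall(r"[a-z]+", text.lower())


def match_terms(tokens: List[str], dictionary: Set[Tuple[str, ...]], max_len: int = 4) -> List[Span]:
    """Greedy longest match of dictionary terms (token tuples) against the token stream."""
    spans: List[Span] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in dictionary:
                spans.append((i, i + n))
                i += n
                break
        else:  # no dictionary term starts at position i
            i += 1
    return spans


def merge_adjacent(spans: List[Span]) -> List[Span]:
    """Join matched terms that directly touch each other into one phrase."""
    merged: List[Span] = []
    for start, end in spans:
        if merged and merged[-1][1] == start:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


def structure_summary(text: str, dictionary: Set[Tuple[str, ...]],
                      item_suffixes: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Assign each merged phrase to the first item whose suffix set contains the phrase's last word.

    Phrases whose last word matches no item are dropped.
    """
    tokens = tokenize(text)
    result: Dict[str, List[str]] = {item: [] for item in item_suffixes}
    for start, end in merge_adjacent(match_terms(tokens, dictionary)):
        phrase = tokens[start:end]
        for item, suffixes in item_suffixes.items():
            if phrase[-1] in suffixes:
                result[item].append(" ".join(phrase))
                break
    return result


if __name__ == "__main__":
    # Hypothetical dictionary and summary items, for illustration only.
    dictionary = {("focal",), ("segmental",), ("glomerulosclerosis",), ("mild",), ("interstitial", "fibrosis")}
    items = {"Glomeruli": {"glomerulosclerosis", "glomerulonephritis"},
             "Interstitium": {"fibrosis", "inflammation"}}
    print(structure_summary("Focal segmental glomerulosclerosis with mild interstitial fibrosis.",
                            dictionary, items))
    # {'Glomeruli': ['focal segmental glomerulosclerosis'], 'Interstitium': ['mild interstitial fibrosis']}
```

Greedy longest-first matching keeps multi-word dictionary entries such as "interstitial fibrosis" intact; the merge step then reattaches modifiers like "mild" that the dictionary stores as separate entries.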
6.2 Future Directions
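The experiments showed that, when structuring summary-style paragraphs, a small number of important terms were missed because the current part-of-speech rules could not extract them. Future work could learn term-pattern rules automatically from a large database of examination reports together with known medical vocabulary, to make structured extraction more complete and effective. Another direction is to analyze the hierarchical sub-item structure among the extracted detail-item keywords of each examination item.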
References
[1] Stanford CoreNLP: Core natural language software. https://stanfordnlp.github.io/CoreNLP.
[2] X. Rong, Z. Chen, Q. Mei, and E. Adar. EgoSet: exploiting word ego-networks and user-generated ontology for multifaceted set expansion. In Proc. of the International Conference on Web Search and Data Mining (WSDM), 2016.
[3] Y. Jo, N. Loghmanpour, and C. P. Rosé. Time series analysis of nursing notes for mortality prediction via a state transition topic model. In Proc. of the International Conference on Information and Knowledge Management (CIKM), 2015.
[4] T. R. Goodwin and S. M. Harabagiu. Medical question answering for clinical decision support. In Proc. of the International Conference on Information and Knowledge Management (CIKM), 2016.
[5] R. Feldman, O. Netzer, A. Peretz, and B. Rosenfeld. Utilizing text mining on online medical forums to predict label change due to adverse drug reactions. In Proc. of the International Conference on Knowledge Discovery and Data Mining (KDD), 2015.
[6] N. Tandon, G. de Melo, A. De, and G. Weikum. Knowlywood: mining activity knowledge from Hollywood narratives. In Proc. of the International Conference on Information and Knowledge Management (CIKM), 2015.
[7] Y. Song and Q. Guo. Query-less: predicting task repetition for nextGen proactive search and recommendation engines. In Proc. of the International World Wide Web Conference (WWW), 2016.
[8] D. Savenkov and E. Agichtein. When a knowledge base is not enough: question answering over knowledge bases with external text data. In Proc. of the International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2016.
[9] M. Paterson and V. Dančík. Longest common subsequences. In Proc. of the International Symposium on Mathematical Foundations of Computer Science (MFCS), 1994.
[10] M. Ghassemi, T. Naumann, F. Doshi-Velez, N. Brimmer, R. Joshi, A. Rumshisky, and P. Szolovits. Unfolding physiological state: mortality modelling in intensive care units. In Proc. of the International Conference on Knowledge Discovery and Data Mining (KDD), 2014.
[11] L.-W. Lehman, M. Saeed, W. Long, J. Lee, and R. Mark. Risk stratification of ICU patients using topic models inferred from unstructured progress notes. In Proc. of the American Medical Informatics Association (AMIA) Annual Symposium, 2012.
[12] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research (JMLR), 2003.
[13] S. Balaneshin-kordan, A. Kotov, and R. Xisto. WSU-IR at TREC 2015 Clinical Decision Support Track: joint weighting of explicit and latent medical query concepts from diverse sources. In Proc. of the Text REtrieval Conference (TREC), 2015.
[14] R. Leaman, L. Wojtulewicz, R. Sullivan, A. Skariah, J. Yang, and G. Gonzalez. Towards internet-age pharmacovigilance: extracting adverse drug reactions from user posts to health-related social networks. In Proc. of the 2010 Workshop on Biomedical Natural Language Processing, Association for Computational Linguistics, 2010.
[15] X. Liu and H. Chen. AZDrugMiner: an information extraction system for mining patient-reported adverse drug events in online patient forums. In Proc. of the International Conference on Smart Health (ICSH), 2013.
[16] S. Sethi et al. Mayo Clinic/Renal Pathology Society consensus report on pathologic classification, diagnosis, and reporting of GN. Journal of the American Society of Nephrology (JASN), 2015.
Appendix 1
Deleted Trailing Words
finding comment correlate microscopy presentation
course correlation history examination possibility
consideration survey result myeloma sclerosis
cause patient smoking treatement due
drug data condition according setting
age feature background information study
profile year criterion criteria exmaination
likehood micropsy exmaintation usage limitation
response diagnosis evidence pending conclusion
impression evaluation clarification em show ultrastructure show
red staining pattern
Deleted Leading Words
including mostly only favoring otherwise
involving
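The appendix lists words used to clean the candidate dictionary but does not spell out how each list is applied. The sketch below assumes one plausible reading: a candidate term ending in a listed trailing word (for example "pathologic diagnosis") is discarded as noise, while a listed leading word (for example "favoring") is stripped so that the medical term after it is kept. Multi-word entries such as "em show" are matched as token sequences, spelling variants are assumed to be matched exactly as listed, and only part of the trailing list is reproduced. The function and constant names are hypothetical.

```python
from typing import Iterable, List, Optional, Sequence, Tuple

# Abbreviated from the appendix trailing-word list; multi-word entries are token tuples.
TRAILING_WORDS: List[Tuple[str, ...]] = [
    ("finding",), ("comment",), ("history",), ("diagnosis",), ("evaluation",),
    ("result",), ("em", "show"), ("ultrastructure", "show"),
]
# Full appendix leading-word list.
LEADING_WORDS = {"including", "mostly", "only", "favoring", "otherwise", "involving"}


def ends_with(tokens: Sequence[str], suffix: Tuple[str, ...]) -> bool:
    """True if the token sequence ends with the given token tuple."""
    return len(tokens) >= len(suffix) and tuple(tokens[-len(suffix):]) == suffix


def clean_candidate(term: str) -> Optional[str]:
    """Return the cleaned term, or None if it should be removed (assumed semantics)."""
    tokens = term.lower().split()
    # Strip any run of listed leading words, e.g. "favoring" in "favoring lupus nephritis".
    while tokens and tokens[0] in LEADING_WORDS:
        tokens = tokens[1:]
    if not tokens:
        return None
    # Discard generic phrases that end in a listed trailing word.
    if any(ends_with(tokens, suffix) for suffix in TRAILING_WORDS):
        return None
    return " ".join(tokens)


def filter_dictionary(candidates: Iterable[str]) -> List[str]:
    """Clean every candidate, drop removed ones, and de-duplicate while keeping order."""
    seen = set()
    kept: List[str] = []
    for term in candidates:
        cleaned = clean_candidate(term)
        if cleaned is not None and cleaned not in seen:
            seen.add(cleaned)
            kept.append(cleaned)
    return kept


if __name__ == "__main__":
    print(filter_dictionary(["favoring lupus nephritis", "pathologic diagnosis",
                             "mesangial proliferation", "EM show", "lupus nephritis"]))
    # ['lupus nephritis', 'mesangial proliferation']
```

Under this reading the two lists play different roles: trailing words flag whole phrases that name a report section or activity rather than a finding, whereas leading words are hedging or connective words that only need to be removed from the front of an otherwise valid term.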