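Information Extraction - From Clinical Text Reports to a Recurrence Prediction Module: Information Extraction, Data Query, and Mining in Patients with Liver Cancer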

Chapter 6 Discussion

6.1 Information Extraction

To address the information overload that clinicians face when reviewing large volumes of patient records, previous work has focused on creating concept-oriented and problem-oriented views of patient records [46-48].

The proposed method reduces information overload [46] in three ways. First, it retrieves only reports that contain concepts relevant to liver cancer. Second, it groups duplicated clinical findings that originate from the same source report but are mentioned in different reports. Third, it provides answers and evidence sentences for clinical questions using a rule-based classifier. To check a patient's summarized status, clinicians can therefore consult only these brief answers and evidence sentences instead of reviewing all extracted and grouped results.
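The grouping step can be illustrated with a minimal sketch. It assumes that each extracted finding arrives as a `(report_id, report_date, sentence)` tuple with ISO-formatted dates, and that duplicates are detected by exact matching after lowercasing and collapsing whitespace; the function names `normalize` and `sort_and_group` are hypothetical, not the thesis implementation.

```python
import re
from collections import defaultdict


def normalize(sentence):
    # Hypothetical normalization: lowercase and collapse runs of whitespace,
    # so copies of the same sentence with minor spacing differences match.
    return re.sub(r"\s+", " ", sentence.strip().lower())


def sort_and_group(findings):
    """Group identical findings repeated across reports.

    findings: iterable of (report_id, report_date, sentence), where
    report_date is an ISO string such as "2009-12-08" (sorts correctly).
    The earliest report in each group is treated as the source report.
    """
    groups = defaultdict(list)
    for report_id, report_date, sentence in findings:
        groups[normalize(sentence)].append((report_date, report_id, sentence))

    result = []
    for members in groups.values():
        members.sort()  # earliest report first
        first_date, first_id, text = members[0]
        result.append({
            "finding": text,
            "first_report": first_id,
            "first_date": first_date,
            "also_in": [rid for _, rid, _ in members[1:]],
        })
    result.sort(key=lambda g: g["first_date"])  # chronological view
    return result


findings = [
    ("R2", "2010-01-15", "HCC, S8, 2 cm."),
    ("R1", "2009-12-08", "HCC, S8,  2 cm."),
    ("R3", "2010-01-15", "Liver cirrhosis."),
]
for group in sort_and_group(findings):
    print(group)
```

Here the finding copied from R1 into R2 is shown once, with R2 listed as a repeat, so the clinician reads it a single time.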

The hot-spotting technique is used to identify major concepts of interest in sentences; the text surrounding each major concept is then parsed to identify its related concepts. This approach handles grammatical and ungrammatical sentences, as well as narrative and tabular text, more flexibly, as requirements dictate. If the method instead relied solely on a large set of grammar and syntactic rules to parse every sentence, ungrammatical sentences and tabular text might not be parsed well. Although this flexibility may cause errors in some cases, ungrammatical sentences and tabular formats require it. In this study, the reports come from a homogeneous group of patients, and the concepts relevant to liver cancer were collected by clinicians. For the

classification.
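A minimal sketch of hot-spotting, assuming regular expressions for the major concept ("tumor", "HCC", "hepatocellular carcinoma") and for three related concepts (size in cm, liver segment S1 to S8, diagnosis status), and a fixed character window around each hot spot. The patterns, the window size, and the name `hot_spot` are hypothetical illustrations, not the rules used in the thesis. Because no grammar is applied, the same code handles a full sentence and a terse, table-like fragment; the same looseness is also why a nearby modifier can be bound to the wrong concept.

```python
import re

# Hypothetical patterns for one major concept and its related concepts.
MAJOR = re.compile(r"\b(tumou?r|hcc|hepatocellular carcinoma)\b", re.I)
SIZE = re.compile(r"\b(\d+(?:\.\d+)?)\s*cm\b", re.I)
SEGMENT = re.compile(r"\bS([1-8])\b")
STATUS = re.compile(r"\b(suspected|confirmed|compatible with)\b", re.I)


def hot_spot(sentence, window=40):
    """Find each major concept, then search only the text around it
    (window characters on each side) for related concepts."""
    results = []
    for m in MAJOR.finditer(sentence):
        lo = max(0, m.start() - window)
        hi = min(len(sentence), m.end() + window)
        ctx = sentence[lo:hi]
        size = SIZE.search(ctx)
        seg = SEGMENT.search(ctx)
        status = STATUS.search(ctx)
        results.append({
            "concept": m.group(0),
            "size_cm": float(size.group(1)) if size else None,
            "segment": "S" + seg.group(1) if seg else None,
            "status": status.group(1).lower() if status else None,
        })
    return results


print(hot_spot("A 2 cm tumor was in the right lobe of liver (S8)"))
print(hot_spot("HCC, S7, suspected"))  # tabular/ungrammatical fragment
```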

6.1.1 Principal Results

Each patient has one classification result per classification question. Because this study includes 78 patients, only 78 classification results are obtained for each question. With so few results, a small number of errors can substantially reduce the kappa score (e.g., human inter-annotator agreement (IAA) ranges from 71% to 79% in kappa). For example, only one disagreement exists among the 78 classifications, yet the kappa score is 79.37%.
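The effect can be reproduced with Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is agreement expected by chance. When one class dominates, p_e is close to 1, so a single disagreement removes a large share of the remaining margin. The 2×2 table below is a hypothetical distribution (the per-class counts are not reported here) chosen because it has exactly one disagreement in 78 cases; `cohen_kappa` is an illustrative helper.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square agreement table, where table[i][j] is the
    number of items annotator A put in class i and annotator B in class j."""
    k = len(table)
    n = sum(sum(row) for row in table)
    p_o = sum(table[i][i] for i in range(k)) / n            # observed agreement
    row_tot = [sum(table[i]) for i in range(k)]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)  # chance
    return (p_o - p_e) / (1 - p_e)


# Hypothetical counts: rows = annotator A (yes, no), columns = annotator B.
table = [[2, 1],
         [0, 75]]
kappa = cohen_kappa(table)
print(f"observed agreement = {77 / 78:.4f}, kappa = {kappa:.4f}")
```

With these counts, observed agreement is about 98.7%, but chance agreement is already about 93.8%, so kappa falls to 50/63, about 79.37%.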

Among the seven major categories evaluated for the IE module, temporal information and report type has the lowest F-score (92.40% in Table 7). Errors in temporal information and report type may have the following causes. (1) To handle two mixed types of year notation, multiple representation formats, and incomplete information (e.g., only the year "2007", or a date without a year such as "03/05"), the concept identification module is deliberately flexible, so it may capture values that are not actually temporal information. (2) Multiple pieces of temporal information and report types are bound incorrectly. (3) Information that appears in separate places is incorrectly combined into a single piece of temporal information.
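The trade-off in cause (1) can be seen in a minimal sketch that accepts full dates, month/day without a year, and year-only mentions. The three patterns, their precedence, and the name `extract_dates` are assumptions for illustration. Matching the most specific form first and blanking it out keeps the looser patterns from re-using its digits, but the looser patterns still accept non-dates: a ratio such as "1/2" is returned as January 2.

```python
import re

# Hypothetical patterns, tried from most to least specific.
FULL = re.compile(r"\b(\d{4})[/.-](\d{1,2})[/.-](\d{1,2})\b")  # 2009/12/08
MONTH_DAY = re.compile(r"\b(\d{1,2})/(\d{1,2})\b")              # 03/05
YEAR_ONLY = re.compile(r"\b((?:19|20)\d{2})\b")                 # 2007


def extract_dates(text):
    """Return partial dates; missing parts are None."""
    found = []
    for m in FULL.finditer(text):
        y, mo, d = m.groups()
        found.append({"year": int(y), "month": int(mo), "day": int(d)})
    rest = FULL.sub(" ", text)  # blank out full dates before looser passes
    for m in MONTH_DAY.finditer(rest):
        mo, d = m.groups()
        found.append({"year": None, "month": int(mo), "day": int(d)})
    rest = MONTH_DAY.sub(" ", rest)
    for m in YEAR_ONLY.finditer(rest):
        found.append({"year": int(m.group(1)), "month": None, "day": None})
    return found


print(extract_dates("CT 2009/12/08; echo 03/05; HCC noted in 2007"))
print(extract_dates("ratio 1/2"))  # flexible pattern captures a non-date
```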

When classifying whether the current treatment is the patient's first treatment for HCC (first treatment), the final result is concluded from all relevant reports. If any one of these reports contains an incorrectly extracted result (e.g., surgery for HCC was not performed but is recognized as already performed), that single error can make the final classification wrong.

Therefore, for treatment, although the F-score of the IE module on single reports is 99.59%, the NPV of the rule-based classifier is 82.35%, because an error in a single report can cause the final classification to be wrong. For HCC, the final classification (HCC patient) is also concluded from multiple reports, but the rule works the other way: as soon as one of these reports contains a positive extracted result (e.g., a confirmed HCC diagnosis), the patient is classified as a positive HCC patient. Therefore, although the F-score of the IE module on single reports is 97.43%, the accuracy of the rule-based classifier is 100%, because extraction errors in some reports do not change the final classification. Compared with manually reviewing large numbers of narrative reports and extracting information by hand, the automated method saves cost.
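The asymmetry between the two questions can be sketched as two aggregation rules over per-report extractions. The field names and functions below are hypothetical; the sketch only assumes that "HCC patient" is positive if any report confirms HCC, and that "first treatment" is positive only if no report mentions an earlier HCC treatment.

```python
from dataclasses import dataclass


@dataclass
class ReportExtraction:
    report_id: str
    hcc_confirmed: bool        # IE output: report confirms an HCC diagnosis
    prior_hcc_treatment: bool  # IE output: report says HCC was already treated


def is_hcc_patient(reports):
    # "Any" rule: one confirming report is enough, so extraction misses in
    # other reports cannot turn a positive patient negative.
    return any(r.hcc_confirmed for r in reports)


def is_first_treatment(reports):
    # "None" rule: the current treatment is the first only if no report
    # mentions an earlier treatment, so a single false positive flips it.
    return not any(r.prior_hcc_treatment for r in reports)


reports = [
    ReportExtraction("R1", hcc_confirmed=True, prior_hcc_treatment=False),
    ReportExtraction("R2", hcc_confirmed=False, prior_hcc_treatment=False),
    # Suppose R3's prior treatment is an extraction error (false positive).
    ReportExtraction("R3", hcc_confirmed=False, prior_hcc_treatment=True),
]
print(is_hcc_patient(reports))      # unaffected by R2/R3
print(is_first_treatment(reports))  # flipped by the single error in R3
```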

6.1.2 Error Analysis

In the evaluation of the IE module, 76 errors (35 false positives and 41 false negatives) are found among the 1384 gold-standard entities in the 759 reports. In the evaluation of the rule-based classifier, six errors (2 false positives and 4 false negatives) are found among the 234 classifications in the 78 patients' summarized results.
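The error counts translate directly into micro-averaged scores. The sketch assumes that every gold-standard entity is either extracted correctly (TP) or missed (FN), so TP = 1384 - 41, and it pools all categories, so it may differ from the per-category figures in Table 7; `prf` is an illustrative helper.

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold, fp, fn = 1384, 35, 41
tp = gold - fn  # assumption: each gold entity is either a TP or an FN
p, r, f = prf(tp, fp, fn)
print(f"IE module (micro): P={p:.4f} R={r:.4f} F={f:.4f}")

classifications, errors = 234, 6  # 78 patients x 3 questions
accuracy = (classifications - errors) / classifications
print(f"Rule-based classifier accuracy: {accuracy:.4f}")
```

Under that assumption the pooled F-score is 2686/2762, about 97.25%, and the classifier's overall accuracy is 228/234, about 97.44%.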

Based on the evaluation results, the errors made by this method fall into the following categories.

(1) Incorrect concept binding (20.22%) (e.g., in "…liver cirrhosis, suspected HCC …", when extracting the status of liver cirrhosis, "suspected" was incorrectly taken as the status of liver cirrhosis).

(2) Information incorrectly retained (20.22%) (e.g., in "… liver cirrhosis … status post echo-guided RFA…", "echo-guided" is incorrectly regarded as an echo report; however, "echo-guided" should be filtered out and should not be regarded

distance between the clinical findings and the date of examination is too great, so the temporal information is not correctly bound to the clinical findings).

(4) Incorrect concept recognition (13.48%) (e.g., in “Alpha-Fetoprotein:3.05 2009/12/08”, the correct date is “2009/12/08”, but “3.05 2009” is incorrectly recognized as “2009/03/05”).

(5) Descriptors in the reports that are not included in the predefined expressions (11.24%) (e.g., in "More in favor of HCC", "favor" is not among the predefined expressions for diagnosis status).

(6) Inappropriate concept binding (8.99%) (e.g., in "A 2 cm tumor was in the right lobe of liver (S8)", the location "right lobe of liver" is bound to the tumor, although "S8" is more specific and more appropriate).

(7) Misspelled words (5.62%) (e.g., the full name of HCC is misspelled as "hepatocelluar carcinoma", and the RFA treatment is misspelled as "RAF").

(8) Inappropriate omission (4.49%) (e.g., in "Hepatocellular carcinoma, S7, stage A1", "stage A1" is ignored because the sentence lacks the cancer staging descriptor "BCLC"; however, "stage A1" most likely describes the BCLC stage of the HCC).
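Category (4) arises when a loose date pattern is allowed to combine digits from neighbouring tokens. The sketch below reproduces the "Alpha-Fetoprotein:3.05 2009/12/08" case with a hypothetical month.day-year pattern, then shows one possible mitigation (an assumption, not the thesis's fix): try the unambiguous YYYY/MM/DD form first, guarded by lookarounds, and fall back to the looser form only when no full date is present.

```python
import re

# Hypothetical loose pattern: M.DD YYYY, e.g. "3.05 2009".
LOOSE_MDY = re.compile(r"(\d{1,2})[./](\d{1,2})\s+(\d{4})")
# Unambiguous YYYY/MM/DD, not glued to other digits or dots.
STRICT_YMD = re.compile(r"(?<![\d.])(\d{4})/(\d{1,2})/(\d{1,2})(?![\d/])")


def parse_date_loose(text):
    m = LOOSE_MDY.search(text)
    if not m:
        return None
    mo, d, y = m.groups()
    return f"{y}/{int(mo):02d}/{int(d):02d}"


def parse_date_ymd_first(text):
    # Prefer the full YYYY/MM/DD form; use the loose form only as a fallback.
    m = STRICT_YMD.search(text)
    if m:
        y, mo, d = m.groups()
        return f"{y}/{int(mo):02d}/{int(d):02d}"
    return parse_date_loose(text)


s = "Alpha-Fetoprotein:3.05 2009/12/08"
print(parse_date_loose(s))      # lab value glued to the year
print(parse_date_ymd_first(s))  # full date wins
```

The fallback still accepts "3.05 2009" when no full date is present, so ordering reduces this error rather than eliminating it.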

6.1.3 Limitations

The proposed method has several limitations. (1) Some studies perform temporal reasoning about medical events using implicit information, whereas the proposed method only extracts temporal information explicitly stated in the reports [133-134]. (2) The "sorting and grouping" step only handles sentences repeated across different reports, not semantically repeated content. (3) The reports come from 152 patients, a smaller sample than in other studies. Furthermore, the content of these reports may be highly similar, so these data may be easier to extract correctly. (4) The different expressions of each concept are collected manually; when a new expression of a concept appears, it must be added to the information extraction module. (5) In the information extraction module, different rules are required for different concepts to decide whether extracted results should be filtered out based on their extracted positive and negative concepts. Therefore, recognizing a new concept also requires adding new rules for that concept. Similarly, the rule-based classifiers require new rules for each new clinical question. (6) In the evaluation of the six major concepts, not all related concepts of each major concept are evaluated. For instance, temporal information and report type is a related concept of several major concepts, but it is evaluated independently. Because not all reports contain temporal information and report types, the reports that do contain this information were collected so that more cases of temporal information and report type extraction could be evaluated.
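Limitations (4) and (5) follow from the method being driven by hand-built resources. A minimal sketch, assuming a lexicon that maps each concept to its surface expressions and a per-concept list of negative cues that discard a match; the concept names, cue lists, and `find_concept` are hypothetical. Supporting "favor" or a new concept means editing these tables by hand.

```python
# Hypothetical hand-built lexicon: concept -> surface expressions.
EXPRESSIONS = {
    "diagnosis_status": {"confirmed", "suspected", "compatible with",
                         "suggestive of"},
    "echo_report": {"echo", "sonography", "ultrasound"},
}

# Hypothetical per-concept filter rules: a match immediately followed by
# one of these cues is discarded (e.g. "echo-guided" is not an echo report).
NEGATIVE_CUES = {
    "echo_report": {"-guided"},
}


def find_concept(concept, sentence):
    """Return the expressions of `concept` found in `sentence`,
    after applying that concept's negative-cue filter."""
    s = sentence.lower()
    hits = []
    for expr in EXPRESSIONS.get(concept, ()):
        start = s.find(expr)
        while start != -1:
            end = start + len(expr)
            tail = s[end:]
            if not any(tail.startswith(cue)
                       for cue in NEGATIVE_CUES.get(concept, ())):
                hits.append(expr)
            start = s.find(expr, end)
    return hits


print(find_concept("echo_report", "status post echo-guided RFA"))
print(find_concept("diagnosis_status", "More in favor of HCC"))
EXPRESSIONS["diagnosis_status"].add("in favor of")  # manual lexicon update
print(find_concept("diagnosis_status", "More in favor of HCC"))
```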

6.2 Data Query
