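Data Query - From Clinical Text Reports to a Recurrence Prediction Module: Information Extraction, Data Query, and Data Mining with Liver Cancer Patients as the Study Subjects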

Chapter 6 Discussion

6.2 Data Query

In the results of the three query tasks, the total execution time of the entire query operation increases with the number of target patients in the database. Most of the execution time is spent on “SQL operations” and “criteria verification” (Table 10).

The system would simplify complex GLIF3.5-based query criteria into one or

during the process of “criteria verification” after the patient sets have been retrieved with the simplified SQL queries in the process of “SQL operations”. Therefore, as the total number of patients increases, the execution time for “criteria verification” increases more than the execution time for “SQL operations”. In Table 10, more than 90% of the execution time is spent on “SQL operations” in the two experiments with 10 and 100 patients. Compared with these two experiments, the share of the total execution time spent on “SQL operations” decreases, and the share spent on “criteria verification” increases, in the experiments with 1000 and 10000 patients. Figure 16 shows that the time for “criteria verification” (the red lines with square points in the three query tasks) increases more than the time for “SQL operations” (the blue lines with circle points in the three query tasks) as the data set grows, especially in the experiment with 10000 patients.
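The two phases can be illustrated with a minimal sketch. It is not the system's implementation: the `lab` table, the item names, the thresholds, and both function names are hypothetical and chosen only for illustration. A simplified SQL query first retrieves a coarse candidate set, and the full criterion is then checked patient by patient in memory.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lab (patient_id INTEGER, item TEXT, value REAL)")
conn.executemany(
    "INSERT INTO lab VALUES (?, ?, ?)",
    [(1, "AFP", 450.0), (1, "ALT", 35.0),
     (2, "AFP", 12.0), (2, "ALT", 90.0),
     (3, "AFP", 800.0), (3, "ALT", 120.0)],
)

def sql_operations(conn):
    """Phase 1: a simplified SQL query retrieves a coarse candidate set."""
    rows = conn.execute(
        "SELECT patient_id, item, value FROM lab "
        "WHERE patient_id IN "
        "(SELECT patient_id FROM lab WHERE item = 'AFP' AND value > 400) "
        "ORDER BY patient_id"
    ).fetchall()
    patients = {}
    for pid, item, value in rows:
        patients.setdefault(pid, {})[item] = value
    return patients

def criteria_verification(patients):
    """Phase 2: the full (assumed) criterion is checked for every candidate."""
    return [pid for pid, labs in patients.items()
            if labs.get("AFP", 0) > 400 and labs.get("ALT", 0) > 100]

candidates = sql_operations(conn)
matched = criteria_verification(candidates)
```

In this sketch the second phase loops over every retrieved patient, so its cost grows with the size of the candidate set, which is in line with the trend described for “criteria verification”.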

In query tasks that apply the mutually exclusive setting, once a patient is found to meet the criteria of one child node, the patient does not have to be checked against the other child nodes. Less time is therefore needed in the process of “criteria verification”, and a query task with the mutually exclusive setting takes less time than the same task without it (Table 10).
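The effect of the mutually exclusive setting can be sketched as an early exit from the loop over child nodes. This is a simplified illustration under assumed names: `verify_children`, the `tumor_cm` field, and the three size criteria are hypothetical and do not come from the system itself.

```python
def verify_children(patients, child_criteria, mutually_exclusive):
    """Assign patients to child nodes; return (assignments, number of checks)."""
    assignments = {name: [] for name in child_criteria}
    checks = 0
    for patient in patients:
        for name, criterion in child_criteria.items():
            checks += 1
            if criterion(patient):
                assignments[name].append(patient["id"])
                if mutually_exclusive:
                    break  # skip the remaining child nodes for this patient
    return assignments, checks

# Hypothetical patients and child-node criteria.
patients = [
    {"id": 1, "tumor_cm": 2.0},
    {"id": 2, "tumor_cm": 4.0},
    {"id": 3, "tumor_cm": 6.5},
]
child_criteria = {
    "small": lambda p: p["tumor_cm"] < 3,
    "medium": lambda p: 3 <= p["tumor_cm"] < 5,
    "large": lambda p: p["tumor_cm"] >= 5,
}

exclusive, n_excl = verify_children(patients, child_criteria, True)
inclusive, n_incl = verify_children(patients, child_criteria, False)
```

Both calls produce the same assignments here because the example criteria do not overlap, but the exclusive run performs fewer criterion checks, which is the source of the time saving described above.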

6.2.2 The Features of the Approach

The proposed approach has several features. First, adopting GLIF3.5 brings the potential for interoperability and shareability that GLIF3.5 provides.

GLIF3.5 is a clinical guideline representation language that was originally developed and designed for formulating sharable, computer-interpretable clinical practice guidelines. The concepts, patient data items, and query criteria in query tasks can be formulated using standard vocabularies, medical data models, and medical logical expression languages for criteria (e.g., the Unified Medical Language System (UMLS), HL-7's Reference Information Model version 1.0 (RIM), and the expression language called Guideline Expression Language (GEL) [103, 135-136]).

Second, GLIF3.5 includes flowchart-based models. The algorithm class of GLIF3.5 is used for formulating the algorithm contained in a clinical guideline [103, 135-136]. In RetroGuide, mentioned above, a flowchart-based query approach helps users with limited database query experience formulate query tasks. In this study, the flowchart-based instances provided by GLIF3.5 help users formulate the overall workflow of a query task. Each node in the flowchart is regarded as a sub-process of the overall query process. Third, the system presents query results visually. The results are shown in a graphical interface that includes the number of retrieved patients beside each node of the flowchart, a table-based patient list, and distribution information in a pie chart. Fourth, the query criteria selection interface gives users the flexibility to select all or only certain nodes in the flowchart to take part in the query operation. Fifth, the formulated query tasks can be stored as a Protégé project file, which makes these query tasks reusable.

6.2.3 Limitations

Although this proposed system enriches the capability of data query using the ontology-driven and FBDQM-based approach, it does have several limitations. First, the

in advance through Protégé. Then, these criteria are exported in XML format and handled by the proposed system. When the query criteria in a flowchart node need to be modified, or new criteria need to be added to the node, the criteria must be modified or created in the Protégé environment.

Second, to perform data mapping, the data item mapping table must be predefined in the database. The query language generator uses this predefined mapping information to map the data items of the query criteria to the data items in the database. When the mapping information is not predefined in the database, the data items of the query criteria cannot be correctly mapped to the data items in the database. Third, only “one item to one item” mapping is supported in this study, meaning that one item in the query criteria is mapped to one item in the database. Other mapping types, such as “one item to many items” and “many items to one item”, are not supported. “One item to many items” means that one item in the query criteria can be mapped to more than one synonymous item in the database. “Many items to one item” means that many synonymous items in the query criteria can be mapped to one item in the database. Fourth, although the proposed approach may help users with limited database query experience formulate query tasks, the users still have to understand how to formulate query tasks using the components of GLIF3.5.
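The role of the predefined mapping table and the “one item to one item” restriction can be sketched as follows. This is only an illustration: the mapping entries, the table and column names, and the `build_query` function are hypothetical, and the actual query language generator may work differently.

```python
# Hypothetical mapping table: criteria item -> (database table, column).
# Each criteria item maps to exactly one database item (one-to-one mapping).
ITEM_MAPPING = {
    "AFP": ("lab_results", "afp_value"),
    "TumorSize": ("imaging", "tumor_size_cm"),
}
ALLOWED_OPERATORS = {"<", "<=", "=", ">=", ">"}

def build_query(item, operator, value):
    """Translate one criterion into a parameterized SQL string."""
    if item not in ITEM_MAPPING:
        # Without a predefined mapping the item cannot be resolved.
        raise KeyError(f"No predefined mapping for criteria item: {item}")
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"Unsupported operator: {operator}")
    table, column = ITEM_MAPPING[item]
    sql = f"SELECT patient_id FROM {table} WHERE {column} {operator} ?"
    return sql, (value,)

sql, params = build_query("AFP", ">", 400)
```

Supporting “one item to many items” in such a sketch would require each mapping entry to hold a list of database items and the generator to combine them, which is the kind of extension the limitation above refers to.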

6.3 Recurrence Predictive Models
