
2 The Problem and the related work

2.4 Research framework

Motivated by the concept of clinical pathways, we propose a framework for detecting fraud and abuse by service providers. As shown in Figure 2.2, two sets of clinical instances, labeled as normal and fraudulent, serve as the input to the structure pattern discovery module. This module produces a set of frequently occurring structure patterns, which then serve as features of clinical instances. Each clinical instance is treated as an example that comprises an assignment of features and a class label (normal or fraudulent). The resulting data set is then filtered by a feature selection module to eliminate redundant and irrelevant features.

The selected features and the data set are finally used by the induction module to construct the detection model. The detection model is then used to identify fraudulent instances among incoming instances.


Within this research framework, we identify three issues, which form our series of investigations as follows.

(1) The problem of how to discover structure patterns from clinical instances. As shown in Figure 2.1, a clinical pathway typically comprises a set of care activities. These activities, each appearing over a temporally extended interval, may be executed sequentially, concurrently, or repeatedly. A clinical instance, which consists of care activities from one or more clinical pathways, is thus a set of activities in process. How to take these characteristics into account and design methods that discover structure patterns efficiently is the first problem we address.

(2) The problem of how to select relevant features. Not all discovered patterns have discriminating power. A certain percentage of patterns can be found in both normal and fraudulent cases and are thus irrelevant to the detection problem. Also, a certain percentage of patterns are correlated and thus form redundant features. How to efficiently eliminate these redundant and irrelevant features to improve the performance of the subsequent (induction) model construction is the second problem in which we are interested.

(3) The problem of how to revise the detection model when the number of labeled examples is small. The input to the proposed research framework consists of two sets of labeled instances, classified by experts as normal and fraudulent cases, respectively. In practice, obtaining the large number of labeled training examples needed to learn accurately is often prohibitively expensive. A detection model constructed from labeled instances alone tends to be less accurate when the labeled set is small, so the model needs to be revised. How to integrate other sources, such as unlabeled instances, to improve the accuracy of the detection model is another important issue in our research.

We investigate the issues listed above in order and report research results in the subsequent chapters.

Chapter 3 Structure pattern discovery

To construct the detection model described in the preceding chapter, we need to extract patterns in a form that can represent the structure of clinical instances. In this chapter, we explore the first problem: structure pattern discovery.

Typically, a clinical instance, as described in Section 2.3, is a process instance comprising a set of activities, each a logical unit of work performed by medical staff. For example, a patient treatment flow may involve measuring blood pressure, examining respiration, and administering medication, to name a few. These activities, each appearing over a temporally extended interval, may be executed sequentially, concurrently, or repeatedly. For example, before any therapeutic intervention is given, diagnostic activities are often executed to verify the condition of the patient. Also, to achieve a better curative effect in some cases, it is necessary to execute a number of therapeutic interventions concurrently.

As a result, to extract structure patterns from clinical instances, we need to take the structural characteristics of a process (temporally extended intervals and various transition types) into consideration. We accordingly use a temporal graph to represent a clinical instance in our research. Detailed definitions of the temporal graph and the corresponding algorithms for discovering patterns are described in this chapter. The experimental results evaluating the performance of the proposed algorithms are also reported.

3.1 Related works

The works reported in [AGL98, Datta98, HY02] deal with the problem of discovering a process model from a set of process instances and assume the existence of a process model (i.e., control dependencies between activities) underlying the given set of process instances. Such discovery, using a directed graph [AGL98, HY02] or a finite state machine [Datta98] to represent process instances, aims at finding a process model that best describes the set of process instances. Our study differs significantly from process model discovery in that we do not assume the existence of an underlying process model. Instead, we aim to identify frequently observed temporal dependencies within process instances rather than control dependencies that are presumed to be genuine in the process instances.

Our work is closest to sequential pattern discovery, which discovers frequent sequential occurrences of activities (e.g., items purchased) across transactions of the same entity (e.g., customer) [AS95, SA96]. Sequential pattern discovery finds the maximal sequences among all sequences that have a certain user-specified minimum support. It assumes that a transaction contains a set of activities occurring at the same time and that transactions of the same entity are sequentially ordered. In contrast, we assume that an activity appears over a temporally extended interval, so two activities may overlap or occur in sequence. This makes sequential pattern discovery inappropriate for our problem, because grouping activities into transactions cannot capture all possible temporal relationships between activities.

In addition, graph-based mining techniques that identify interesting and repetitive substructures within structural data are proposed in [CH00]. Representing structural data as a labeled graph, the substructure discovery techniques aim at finding all possible substructures from the graph. By their nature, these techniques discover only substructures that are regionally connected subgraphs and disregard transitive relationships among objects. This limits their applicability to our structure pattern discovery problem, where transitivity in temporally sequential relationships prevails.

Finally, [BWJ98] deals with the discovery of frequent-event patterns in a time sequence that consists of a set of time-stamped events. The discovery process starts with a user-specified event structure that consists of a set of variables representing events and temporal constraints between variables. Its goal is to identify instantiations of variables in the event structure that appear frequently in the time sequence. The event pattern discovery differs from our work in several ways. First, it assumes an event appears at a time point rather than over a time interval. Second, it searches for instantiations of a user-specified event structure rather than discovering all possible frequent temporal relationships among events within a time sequence.

3.2 Formalization of structure pattern discovery problem

A clinical instance comprises a set of activities, each of which is an execution unit that leads to a state transition in the instance. The execution of an activity spans a temporally extended period. Each activity may also be associated with information such as the execution entities involved, the execution location, and the execution outcome. However, since the main intent of this research is to discover frequent activities and their associated temporal dependencies, we make use only of the starting time and ending time of an activity execution. Our view of a clinical instance can be formally described as follows.

Definition 3.1 A clinical instance I is a set of triplets (Vi, st, et), where Vi uniquely identifies an activity, and st and et are timestamps representing the starting time and ending time of the execution of Vi in I, respectively.

Given a clinical instance, the temporal relationship between any activity pair can be classified into two types: followed and overlapped.

Definition 3.2 In a clinical instance I, an activity Vi is followed by another activity Vj if Vj starts after Vi terminates in I.

Definition 3.3 In a clinical instance I, two activities, Vi and Vj, are overlapped if the execution durations of Vi and Vj overlap in I.

Definition 3.4 An activity Vi is directly followed by another activity Vj in a clinical instance I if Vi is followed by Vj in I and there does not exist a distinct activity Vk in I such that Vi is followed by Vk and Vk is followed by Vj in I.
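Definitions 3.1 through 3.4 can be illustrated with a minimal sketch. The `Activity` type, the function names, and the timestamps below are hypothetical and introduced only for illustration. The sketch also assumes that an activity starting exactly when another ends counts as overlapped, so that every pair of distinct activities is either followed or overlapped; the definitions above do not settle this boundary case.

```python
from typing import NamedTuple


class Activity(NamedTuple):
    name: str  # activity identifier (Vi in Definition 3.1)
    st: int    # starting time
    et: int    # ending time (assumed st <= et)


def followed_by(a: Activity, b: Activity) -> bool:
    """Definition 3.2: b starts after a terminates."""
    return b.st > a.et


def overlapped(a: Activity, b: Activity) -> bool:
    """Definition 3.3, read here as: neither activity follows the other."""
    return a.name != b.name and not followed_by(a, b) and not followed_by(b, a)


def directly_followed_by(instance, a: Activity, b: Activity) -> bool:
    """Definition 3.4: a is followed by b with no activity k in between."""
    return followed_by(a, b) and not any(
        followed_by(a, k) and followed_by(k, b) for k in instance
    )


# Hypothetical timestamps; these are NOT the values behind Figure 3.1.
A = Activity("A", 1, 6)
B = Activity("B", 0, 2)
C = Activity("C", 3, 5)
D = Activity("D", 4, 6)
E = Activity("E", 7, 9)
instance = [A, B, C, D, E]
```

With these assumed timestamps, B is followed by E but not directly, because C lies between them, while A and B are overlapped.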

To represent temporal relationships between activities in a clinical instance concisely, a temporal graph is defined as follows.

Definition 3.5 The pertinent temporal graph of a clinical instance I is a directed acyclic graph G = (V, E), where V is the set of activities in I, and E is a set of edges. Each edge in G is an ordered pair (Vi, Vj), where Vi, Vj ∈ V, Vi ≠ Vj, and Vi is directly followed by Vj.

Transforming a clinical instance into its corresponding temporal graph representation is straightforward. We first traverse the activities in the given clinical instance in ascending order of their starting times. For each activity a, the set F of activities that directly follow a is identified, and an edge is created from a to each activity in F. In the instance shown in Figure 3.1(a), activity B is processed first because it has the earliest starting time among all activities in the instance. Activities C and D directly follow B; thus, two edges are created from B to C and D, respectively, as shown in Figure 3.1(b). The subsequent traversal of this clinical instance processes activities A, C, D, and E in sequence. The resulting temporal graph is illustrated in Figure 3.1(b). Given a temporal graph G, an activity Vi is followed by another activity Vj if there exists a path from Vi to Vj in G, and Vi and Vj are overlapped if no path connects them in either direction. As shown in Figure 3.1(b), activity B is followed by E since there exists a path from B to E. In contrast, activities A and B are overlapped since there does not exist a path that connects them.

Figure 3.1 Example of a clinical instance and the corresponding temporal graph: (a) an instance; (b) temporal graph for the instance in (a)
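The transformation described above can be sketched as follows. This is a minimal illustration rather than the algorithm used in this research: the function names, the dictionary representation, and the timestamps are assumptions, chosen only so that B starts first, C and D directly follow B, and A overlaps B, as in the description of Figure 3.1.

```python
def build_temporal_graph(instance):
    """Build the temporal graph of Definition 3.5.

    instance: dict mapping activity name -> (start time, end time).
    Returns a dict mapping each activity to the set of activities
    that directly follow it.
    """
    # Traverse activities in ascending order of starting time.
    order = sorted(instance, key=lambda name: instance[name][0])
    graph = {name: set() for name in instance}
    for a in order:
        a_end = instance[a][1]
        # Activities that follow a (they start after a terminates).
        followers = [b for b in order if instance[b][0] > a_end]
        for b in followers:
            # b directly follows a unless some follower k ends before b starts.
            if not any(instance[k][1] < instance[b][0] for k in followers):
                graph[a].add(b)
    return graph


def has_path(graph, src, dst):
    """Depth-first search: True if dst is reachable from src via >= 1 edge."""
    stack, seen = list(graph[src]), set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return False


# Hypothetical timestamps; NOT the actual values behind Figure 3.1.
example = {"A": (1, 6), "B": (0, 2), "C": (3, 5), "D": (4, 6), "E": (7, 9)}
graph = build_temporal_graph(example)
```

Under these assumed timestamps, the edges from B lead to C and D, `has_path(graph, "B", "E")` holds, and no path connects A and B in either direction, which corresponds to the overlapped relationship.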

A structure pattern can also be represented as a temporal graph that has a certain user-specified minimum support.

Definition 3.6 A temporal graph G is said to be supported by a clinical instance I if all followed and overlapped relationships that exist in G are present in I.

Definition 3.7 A temporal graph G is said to be frequent if it is supported by at least s% of the clinical instances, where s% is a user-defined minimum support threshold.
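Definitions 3.6 and 3.7 can be checked naively as sketched below. The sketch assumes that a pattern is given as an adjacency dictionary, that an instance maps activity names to (start, end) pairs, and that the threshold is passed as a fraction rather than a percentage. All names and sample data are hypothetical, and this is a direct reading of the definitions, not the discovery algorithm proposed in this chapter.

```python
from itertools import combinations


def has_path(graph, src, dst):
    """True if dst is reachable from src through at least one edge."""
    stack, seen = list(graph[src]), set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return False


def pattern_relation(pattern, u, v):
    """Relationship of u and v in a temporal graph (path means followed)."""
    if has_path(pattern, u, v):
        return "u-then-v"
    if has_path(pattern, v, u):
        return "v-then-u"
    return "overlapped"


def instance_relation(instance, u, v):
    """Relationship of u and v in an instance, from their time intervals."""
    (u_st, u_et), (v_st, v_et) = instance[u], instance[v]
    if v_st > u_et:
        return "u-then-v"
    if u_st > v_et:
        return "v-then-u"
    return "overlapped"


def supports(instance, pattern):
    """Definition 3.6: every relationship in the pattern holds in the instance."""
    if not set(pattern) <= set(instance):
        return False
    return all(
        pattern_relation(pattern, u, v) == instance_relation(instance, u, v)
        for u, v in combinations(pattern, 2)
    )


def is_frequent(pattern, instances, min_support):
    """Definition 3.7, with min_support given as a fraction (s% / 100)."""
    count = sum(supports(instance, pattern) for instance in instances)
    return count >= min_support * len(instances)


# Hypothetical instances and pattern, for illustration only.
i1 = {"A": (1, 6), "B": (0, 2), "C": (3, 5), "D": (4, 6), "E": (7, 9)}
i2 = {"B": (0, 1), "C": (2, 3), "E": (4, 5)}
i3 = {"B": (0, 5), "C": (1, 2), "E": (6, 7)}  # B and C overlap here
b_c_e = {"B": {"C"}, "C": {"E"}, "E": set()}  # B followed by C followed by E
```

Here the pattern `b_c_e` is supported by `i1` and `i2` but not by `i3`, so it is frequent at a threshold of 0.6 and not at 0.7.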

Definition 3.8 A temporal graph G=(V, E) is a temporal subgraph of another temporal graph G’=(V’, E’) if V⊆V’ and for any pair of vertices v1, v2∈V, there is a path in G connecting v1 to v2 if and only if there is a path in G’ connecting v1 to v2. If G is a temporal subgraph of G’, then G’ is a temporal supergraph of G.
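A direct reading of Definition 3.8 is sketched below, assuming temporal graphs are stored as adjacency dictionaries; the function names and sample graphs are hypothetical. The sketch compares reachability for every ordered pair of vertices, so a temporal subgraph need not be a connected region of the supergraph.

```python
from itertools import permutations


def has_path(graph, src, dst):
    """True if dst is reachable from src through at least one edge."""
    stack, seen = list(graph[src]), set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return False


def is_temporal_subgraph(g, g_prime):
    """Definition 3.8: V is a subset of V' and path existence agrees on V."""
    if not set(g) <= set(g_prime):
        return False
    return all(
        has_path(g, u, v) == has_path(g_prime, u, v)
        for u, v in permutations(g, 2)
    )


# Hypothetical supergraph with edges B->C, B->D, C->E, D->E, A->E.
g_prime = {"A": {"E"}, "B": {"C", "D"}, "C": {"E"}, "D": {"E"}, "E": set()}
b_then_e = {"B": {"E"}, "E": set()}     # B followed by E
b_and_e = {"B": set(), "E": set()}      # B and E overlapped
```

In this example `b_then_e` is a temporal subgraph of `g_prime` even though `g_prime` has no edge from B to E, because a path from B to E exists through C. The graph `b_and_e` is not, since it treats B and E as overlapped.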

Problem statement: Given a set of temporal graphs, each of which represents a clinical instance, the structure pattern discovery problem is to find all frequent temporal graphs. Each such temporal graph is referred to as a structure pattern.
