Chapter 2 An Efficient Algorithm for Mining Temporal Patterns from Interval-based
3.4 CEMiner
3.4.2 Proposed Algorithm
Fig. 3.4 illustrates the main framework of CEMiner. It first transforms the temporal database into its endpoint representation and, in the same pass, counts the support of each endpoint. Endpoints whose support falls below the given minimum support, min_sup, are removed (Lines 2-3, Algorithm 3.1). For each frequent starting endpoint x, we build the projected database DB|x and use EBackScan to check whether x can be pruned (Lines 5-7, Algorithm 3.1). If not, we compute the number of backward-extension endpoints and call EBIDE recursively (Lines 8-9, Algorithm 3.1). Finally, we output all closed temporal patterns (Line 10, Algorithm 3.1).
Algorithm 3.1: CEMiner (DB, min_sup)
Input: a temporal database DB, and the minimum support min_sup
Output: all closed temporal patterns CTP
1: CTP ← ∅;
2: transform DB into endpoint representation;
3: find all frequent endpoints and remove infrequent endpoints;
4: FSE ← all frequent starting endpoints;
5: for each starting endpoint x ∈ FSE do
6:   construct projected database DB|x with regard to x;
7:   if EBackScan(x, DB|x) = “false” then
8:     BE = backward extension check (x, DB|x);
9:     EBIDE (DB|x, x, min_sup, BE, CTP);
10: output all closed temporal patterns CTP;
Fig. 3.4: CEMiner algorithm
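As a rough illustration of Lines 2-3 of Algorithm 3.1, the following Python sketch converts interval sequences into endpoint sequences and counts the support of each endpoint. The function names `to_endpoints` and `frequent_endpoints` and the `(symbol, start, finish)` tuple layout are assumptions made for this example and are not taken from CEMiner itself. The sketch further assumes that each event symbol occurs at most once per sequence and that endpoints sharing a time stamp are grouped into one element.

```python
from collections import Counter, defaultdict


def to_endpoints(intervals):
    """Convert [(symbol, start, finish), ...] into a time-ordered endpoint sequence.

    Each element of the result is a sorted tuple of the endpoints that occur
    at the same time; "X+" is a starting endpoint and "X-" a finishing endpoint.
    """
    by_time = defaultdict(list)
    for symbol, start, finish in intervals:
        by_time[start].append(symbol + "+")
        by_time[finish].append(symbol + "-")
    return [tuple(sorted(by_time[t])) for t in sorted(by_time)]


def frequent_endpoints(db, min_sup):
    """Return {endpoint: support} for endpoints found in at least min_sup sequences."""
    support = Counter()
    for seq in db:
        # Count each endpoint once per sequence, however often it appears.
        support.update({e for element in seq for e in element})
    return {e: c for e, c in support.items() if c >= min_sup}
```

Support is counted per sequence rather than per occurrence, which matches the usual definition of support in sequential pattern mining.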
The pseudo code of EBIDE is shown in Fig. 3.5. For a prefix α, EBIDE scans its projected database DB|α once to discover all local frequent endpoints (Line 1, Algorithm 3.2) and computes the number of forward-extension endpoints. If α is a temporal pattern and has neither a backward-extension endpoint nor a forward-extension endpoint, then α is a closed temporal pattern (Lines 4-5, Algorithm 3.2). Each local frequent endpoint can be appended to the original prefix to generate a new sequence α′ whose length is increased by 1 (Lines 6-11, Algorithm 3.2). In this way, the prefixes are forward-extended.
Algorithm 3.2: EBIDE (DB|α, α, min_sup, BE, CTP)
Input: a projected database DB|α, an endpoint sequence α, the minimum support min_sup, the number of backward-extension endpoints BE, and a set of closed temporal patterns CTP
Output: a set of closed temporal patterns CTP
01: scan DB|α once, remove infrequent endpoints and find every frequent endpoint y such that:
    (i) y can be assembled to the last endpoint of α to form a temporal pattern; or
    (ii) y can be appended to α to form a temporal pattern;
02: LFE ← all local frequent endpoints;
03: FE = |{ z | (z ∈ LFE) ∧ (support(z) = support(α)) }|;
04: if (BE + FE == 0) and (α is a temporal pattern) then // no backward and forward extension
05:   CTP ← CTP ∪ {α}; // α is a closed temporal pattern
06: for each y ∈ LFE do
07:   if y is a “finishing endpoint” then
08:     if the corresponding starting endpoint exists in α then
09:       append y to α to form α′; // pre-pruning strategy
10:   if y is a “starting endpoint” then
11:     append y to α to form α′;
12:   construct projected database DB|α′ with insignificant postfix elimination; // post-pruning strategy
13:   if EBackScan (α′, DB|α′) = “false” then
14:     BE = backward extension check (α′, DB|α′);
15:     EBIDE (DB|α′, α′, min_sup, BE, CTP);
Fig. 3.5: EBIDE algorithm
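The closure test in Lines 03-05 of Algorithm 3.2 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the original implementation. Sequences are flat lists of endpoint strings such as "A+" and "A-", the helper names are hypothetical, and the distinction between assembling an endpoint into the last element and appending it as a new element is ignored. The backward-extension count `be` is taken as given.

```python
from collections import Counter


def is_temporal_pattern(prefix):
    """True when every starting endpoint in prefix is matched by its finishing endpoint."""
    still_open = set()
    for endpoint in prefix:
        symbol, sign = endpoint[:-1], endpoint[-1]
        if sign == "+":
            still_open.add(symbol)
        elif symbol in still_open:
            still_open.remove(symbol)
        else:
            return False  # finishing endpoint without a preceding start
    return not still_open


def forward_extension_count(projected_db, prefix_support):
    """Number of local endpoints whose support equals the support of the prefix (FE)."""
    local = Counter()
    for postfix in projected_db:
        local.update(set(postfix))  # one count per postfix
    return sum(1 for count in local.values() if count == prefix_support)


def is_closed(prefix, projected_db, prefix_support, be):
    """Line 04 of Algorithm 3.2: no backward or forward extension, and a complete pattern."""
    fe = forward_extension_count(projected_db, prefix_support)
    return be + fe == 0 and is_temporal_pattern(prefix)
```

An endpoint that appears in every postfix of the projected database can always extend the prefix without losing support, which is why a single such endpoint is enough to rule the prefix out as closed.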
Exploiting the properties of event endpoints, we use three pruning strategies, pre-pruning, post-pruning, and pair-pruning, to reduce the search space. First, a starting endpoint and its finishing endpoint always occur as a pair in an endpoint sequence. We therefore only need to project those frequent finishing endpoints whose corresponding starting endpoints appear in the prefix (Lines 7-9, Algorithm 3.2). This is called the pre-pruning strategy; it prunes non-qualified patterns before the projected database is constructed. Second, when we construct a projected database, some endpoints in the postfixes need not be considered. With respect to a prefix sequence α, a finishing endpoint in a projected postfix is called significant if it has a corresponding starting endpoint in the projected postfix or in α. When constructing the projected database DB|α, only the significant endpoints are collected; all insignificant endpoints are eliminated, since they can be ignored in the discovery of closed temporal patterns. This second method is called the post-pruning strategy (Line 12, Algorithm 3.2). Finally, if α′ is frequent, EBIDE uses EBackScan to check whether α′ can be pruned (Line 13, Algorithm 3.2). If not, it computes the number of backward-extension endpoints and calls itself recursively (Lines 14-15, Algorithm 3.2).
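The pre-pruning and post-pruning rules described above can be sketched in Python as follows. The function names are hypothetical, sequences are represented as flat lists of endpoint strings ("X+" for a starting endpoint, "X-" for a finishing endpoint), and each event symbol is assumed to occur at most once per sequence. These are simplifying assumptions for illustration only.

```python
def started_symbols(prefix):
    """Event symbols whose starting endpoint appears in the prefix."""
    return {e[:-1] for e in prefix if e.endswith("+")}


def pre_prune(candidates, prefix):
    """Keep starting endpoints, and finishing endpoints whose start is in the prefix."""
    started = started_symbols(prefix)
    return [y for y in candidates if y.endswith("+") or y[:-1] in started]


def post_prune(postfix, prefix):
    """Drop insignificant finishing endpoints from a projected postfix.

    A finishing endpoint is kept only if its starting endpoint occurs in the
    prefix or earlier in the same postfix.
    """
    started = started_symbols(prefix)
    kept = []
    for endpoint in postfix:
        if endpoint.endswith("+"):
            started.add(endpoint[:-1])
            kept.append(endpoint)
        elif endpoint[:-1] in started:
            kept.append(endpoint)
    return kept
```

For instance, `post_prune(["E-", "D-"], ["E+"])` keeps only `"E-"`, because the starting endpoint of D appears neither in the prefix nor in the postfix.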
Moreover, the endpoint representation lets us avoid some unnecessary checks. When the pattern is extended by a locally frequent finishing endpoint, we require two-directional closure checking, i.e., backward-extension and forward-extension checking, to verify whether the pattern is closed. However, when the appended endpoint is a starting endpoint, the closure checking can be omitted: since starting and finishing endpoints always occur in pairs in an endpoint sequence, forward-directional checking is unnecessary, and we only need to keep growing the pattern. This last pruning method is called pair-pruning.
We take the database in Fig. 3.2 with min_sup = 2 as an example. The database contains 17 event intervals, which can be regarded as 4 event sequences. After transforming the database, we find all frequent endpoints: A+: 3, A-: 3, B+: 4, B-: 4, D+: 4, D-: 4, E+: 4, and E-: 4, where the notation “pattern: count” gives a pattern (here, a single endpoint) and its support count. The event sequences and their endpoint representations are shown in the first column of Fig. 3.6. We take the frequent endpoints A+ and E+ as examples and discuss them in detail.
For the endpoint A+, the projected database with respect to A+ contains 3 sequences: B+A-B-D+E+E-D-, B+A-(B-D+)E+E-D-, and A-D+E+E-D-. Since A+ is a starting endpoint, pair-pruning lets us skip the closure checking. Continuing the recursive process on DB|A+, we can discover all closed temporal patterns prefixed with A+. In addition, when projecting on the frequent endpoint E+, the endpoint D- in the generated postfix sequences is eliminated directly by the post-pruning strategy, since D- is insignificant. The last column in Fig. 3.6 lists all generated closed temporal patterns. The set of closed patterns expresses the same information as the full set of temporal patterns but contains far fewer patterns.
[Fig. 3.6 column headings: event sequences with corresponding endpoint representation; prefix; projected database (insignificant endpoints are marked)]
Fig. 3.6: An example of projected databases and closed temporal patterns
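To make the A+ step of the example concrete, the short Python sketch below takes the three postfixes of DB|A+ exactly as listed in the text, counts local support, and applies pre-pruning for the prefix A+. The flat-list encoding flattens the simultaneous element (B-D+) into two consecutive endpoints, which is an assumption that does not affect support counting. The variable names are illustrative only.

```python
from collections import Counter

# DB|A+ as listed in the example; "(B-D+)" is flattened to "B-", "D+".
db_a_plus = [
    ["B+", "A-", "B-", "D+", "E+", "E-", "D-"],
    ["B+", "A-", "B-", "D+", "E+", "E-", "D-"],
    ["A-", "D+", "E+", "E-", "D-"],
]
prefix = ["A+"]
min_sup = 2

# Local support: number of postfixes that contain each endpoint.
local_support = Counter()
for postfix in db_a_plus:
    local_support.update(set(postfix))

# Local frequent endpoints (LFE).
lfe = sorted(e for e, count in local_support.items() if count >= min_sup)

# Pre-pruning: a finishing endpoint is a candidate only if its start is in the prefix.
started = {e[:-1] for e in prefix if e.endswith("+")}
candidates = [y for y in lfe if y.endswith("+") or y[:-1] in started]

print(lfe)         # ['A-', 'B+', 'B-', 'D+', 'D-', 'E+', 'E-']
print(candidates)  # ['A-', 'B+', 'D+', 'E+']
```

Under these assumptions, B-, D-, and E- are locally frequent but are not used to extend the prefix A+, because their starting endpoints are not yet part of the prefix.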