
Chapter 2 An Efficient Algorithm for Mining Temporal Patterns from Interval-based

3.4 CEMiner

3.4.2 Proposed Algorithm

Fig. 3.4 illustrates the main framework of CEMiner. It first transforms the temporal database into endpoint representation, counts the support of each endpoint at the same time, and removes the endpoints whose support is below the given minimum support, min_sup (Lines 2-3, Algorithm 3.1). For each frequent starting endpoint x, we build the projected database DB|x and use EBackScan to check whether x can be pruned (Lines 5-7, Algorithm 3.1). If not, we compute the number of backward-extension endpoints and call EBIDE recursively (Lines 8-9, Algorithm 3.1). Finally, we output all closed temporal patterns (Line 10, Algorithm 3.1).

Algorithm 3.1: CEMiner (DB, min_sup)

Input: a temporal database DB and the minimum support min_sup

Output: all closed temporal patterns CTP

1: CTP ← ∅;

2: transform DB into endpoint representation;

3: find all frequent endpoints and remove infrequent endpoints;

4: FSE ← all frequent starting endpoints;

5: for each endpoint x ∈ FSE do

6:     construct projected database DB|x with regard to x;

7:     if EBackScan(x, DB|x) = “false” then

8:         BE = backward extension check (x, DB|x);

9:         EBIDE (DB|x, x, min_sup, BE, CTP);

10: output all closed temporal patterns CTP;

Fig. 3.4: CEMiner algorithm
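Lines 2-3 of Algorithm 3.1 can be illustrated with a minimal sketch. It assumes that an interval is a (label, start, finish) tuple, that endpoints are encoded as the strings "A+" and "A-", that endpoints sharing a timestamp form one group, and that each label occurs at most once per sequence. The function names and the toy database are hypothetical; they are not taken from Fig. 3.2.

```python
from collections import Counter, defaultdict


def to_endpoint_sequence(intervals):
    """Convert [(label, start, finish), ...] into a time-ordered list of endpoint groups.

    Endpoints that occur at the same timestamp are put into one frozenset.
    """
    by_time = defaultdict(set)
    for label, start, finish in intervals:
        by_time[start].add(label + "+")   # starting endpoint
        by_time[finish].add(label + "-")  # finishing endpoint
    return [frozenset(by_time[t]) for t in sorted(by_time)]


def frequent_endpoints(db, min_sup):
    """Count every endpoint once per sequence; keep those with support >= min_sup."""
    support = Counter()
    for seq in db:
        support.update(set().union(*seq))
    return {e: c for e, c in support.items() if c >= min_sup}


# Hypothetical toy database with three event sequences.
toy_db = [
    [("A", 1, 4), ("B", 2, 6)],
    [("A", 1, 3), ("B", 3, 5)],
    [("B", 1, 2), ("C", 4, 5)],
]
endpoint_db = [to_endpoint_sequence(s) for s in toy_db]
frequent = frequent_endpoints(endpoint_db, min_sup=2)
```

In this toy database the endpoints of C occur in only one sequence, so they would be removed before any projection takes place.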

The pseudo code of EBIDE is shown in Fig. 3.5. For a prefix α, EBIDE scans its projected database DB|α once to discover all local frequent endpoints (Line 1, Algorithm 3.2) and computes the number of forward-extension endpoints FE (Lines 2-3, Algorithm 3.2). If α is a temporal pattern and has neither a backward-extension endpoint nor a forward-extension endpoint, then α is a closed temporal pattern (Lines 4-5, Algorithm 3.2). Each local frequent endpoint can be appended to the original prefix to generate a new sequence α’ whose length is increased by 1 (Lines 6-11, Algorithm 3.2). In this way, the prefixes are forward-extended.

Algorithm 3.2: EBIDE (DB|α, α, min_sup, BE, CTP)

Input: a projected database DB|α, an endpoint sequence α, the minimum support min_sup, the number of backward-extension endpoints BE, and a set of closed temporal patterns CTP

Output: a set of closed temporal patterns CTP

01: scan DB|α once, remove infrequent endpoints and find every frequent endpoint y such that:

    (i) y can be assembled to the last endpoint of α to form a temporal pattern; or

    (ii) y can be appended to α to form a temporal pattern;

02: LFE ← all local frequent endpoints;

03: FE = |{ z | (z ∈ LFE) ∧ (support(z) = support(α)) }|;

04: if (BE + FE == 0) and (α is a temporal pattern) then // no backward and forward extension

05:     CTP ← CTP ∪ {α}; // α is a closed temporal pattern

06: for each y ∈ LFE do

07:     if y is a “finishing endpoint” then

08:         if the corresponding starting endpoint exists in α then

09:             append y to α to form α’; // pre-pruning strategy

10:     if y is a “starting endpoint” then

11:         append y to α to form α’;

12:     construct projected database DB|α’ with insignificant postfix elimination; // post-pruning strategy

13:     if EBackScan (α’, DB|α’) = “false” then

14:         BE = backward extension check (α’, DB|α’);

15:         EBIDE (DB|α’, α’, min_sup, BE, CTP);

Fig. 3.5: EBIDE algorithm
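The closure test in Lines 3-5 of Algorithm 3.2 can be sketched as follows. The sketch assumes that a prefix is a flat list of endpoint strings such as "A+" and "A-", that "temporal pattern" means every starting endpoint in the prefix is matched by its finishing endpoint, and that the supports of the local frequent endpoints are already available in a dictionary. All function and parameter names are hypothetical.

```python
def count_forward_extensions(local_support, prefix_support):
    """FE: number of local frequent endpoints whose support equals that of the prefix."""
    return sum(1 for s in local_support.values() if s == prefix_support)


def is_temporal_pattern(prefix):
    """True if every starting endpoint is later closed by its finishing endpoint."""
    open_labels = set()
    for endpoint in prefix:
        label, kind = endpoint[:-1], endpoint[-1]
        if kind == "+":
            open_labels.add(label)
        elif label in open_labels:
            open_labels.remove(label)
        else:
            return False  # finishing endpoint without a starting endpoint
    return not open_labels


def is_closed(prefix, prefix_support, local_support, be):
    """Closed if there is no backward or forward extension and the prefix is complete."""
    fe = count_forward_extensions(local_support, prefix_support)
    return be + fe == 0 and is_temporal_pattern(prefix)


# Hypothetical usage: prefix <A+ A-> with support 3 and no backward extension.
closed_case = is_closed(["A+", "A-"], 3, {"B+": 2, "B-": 2}, be=0)
extended_case = is_closed(["A+", "A-"], 3, {"B+": 3}, be=0)
incomplete_case = is_closed(["A+"], 3, {}, be=0)
```

In the second call the endpoint "B+" appears in every sequence that supports the prefix, so the prefix has a forward extension and is not reported as closed.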

With the property of event endpoints, we use three pruning strategies, pre-pruning, post-pruning, and pair-pruning, to reduce the search space. First, a starting endpoint and its finishing endpoint always occur in pairs in an endpoint sequence. We therefore only need to project the frequent finishing endpoints that have corresponding starting endpoints in their prefixes (Lines 7-9, Algorithm 3.2). This is called the pre-pruning strategy; it prunes non-qualified patterns before the projected database is constructed. Second, when we construct a projected database, some endpoints in the postfixes need not be considered. With respect to a prefix sequence α, a finishing endpoint in a projected postfix is called significant if it has a corresponding starting endpoint in the projected postfix or in α. When constructing the projected database DB|α, only the significant endpoints are collected; all insignificant endpoints are eliminated, since they can be ignored in the discovery of closed temporal patterns. This second pruning method is called the post-pruning strategy (Line 12, Algorithm 3.2). Finally, if α’ is frequent, EBIDE uses EBackScan to check whether α’ can be pruned (Line 13, Algorithm 3.2). If not, it computes the number of backward-extension endpoints and calls itself recursively (Lines 14-15, Algorithm 3.2).
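The pre-pruning test can be sketched as a filter over the local frequent endpoints. This is a minimal illustration that assumes endpoints are encoded as strings ending in "+" (starting) or "-" (finishing) and that a prefix is a flat list of such strings. The function name and the sample data are hypothetical.

```python
def pre_prune(prefix, local_frequent):
    """Keep the endpoints that may extend the prefix.

    A starting endpoint is always kept. A finishing endpoint is kept only if
    its corresponding starting endpoint already exists in the prefix.
    """
    kept = []
    for y in local_frequent:
        if y.endswith("+"):
            kept.append(y)
        elif y[:-1] + "+" in prefix:
            kept.append(y)
    return kept


# Hypothetical usage: "C-" is dropped because "C+" is not in the prefix.
candidates = pre_prune(["A+", "B+"], ["A-", "C-", "C+", "B-"])
```

Because the filter runs before any projected database is built, a rejected finishing endpoint costs only a membership test on the prefix.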

Moreover, we can avoid some unnecessary checking based on the characteristics of the endpoint representation. When a pattern is extended by a locally frequent finishing endpoint, we require two-directional closure checking, i.e., backward-extension and forward-extension checking, to verify whether the pattern is closed. However, if the appended endpoint is a starting endpoint, the closure checking can be omitted: since starting and finishing endpoints always occur in pairs in an endpoint sequence, the pattern must still be extended by the corresponding finishing endpoint, so forward-directional checking is unnecessary and we only need to keep growing the pattern. This last pruning method is called pair-pruning.

We take the database in Fig. 3.2 with min_sup = 2 as an example. The database contains 17 event intervals, which can be regarded as 4 event sequences. After transforming the database, we can find all frequent endpoints: A⁺: 3, A⁻: 3, B⁺: 4, B⁻: 4, D⁺: 4, D⁻: 4, E⁺: 4, and E⁻: 4, where the notation “pattern: count” represents a pattern and its associated support count. The event sequences with their corresponding endpoint representations are shown in the first column of Fig. 3.6. We take the frequent endpoints A⁺ and E⁺ as examples for further discussion.

For the endpoint A⁺, the projected database with respect to A⁺ has 3 sequences: ⟨B⁺ A⁻ B⁻ D⁺ E⁺ E⁻ D⁻⟩, ⟨B⁺ A⁻ (B⁻ D⁺) E⁺ E⁻ D⁻⟩, and ⟨A⁻ D⁺ E⁺ E⁻ D⁻⟩. Since A⁺ is a starting endpoint, by pair-pruning we need not do closure checking. Continuing the recursive process with DB|A⁺, we can discover all closed temporal patterns prefixed with A⁺. In addition, when projecting the frequent endpoint E⁺, the endpoint D⁻ in the generated postfix sequences is eliminated directly by the post-pruning strategy, since D⁻ is insignificant. The last column in Fig. 3.6 lists all generated closed temporal patterns. The set of closed patterns expresses the same information as the set of all temporal patterns but contains far fewer patterns.
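The projection and post-pruning steps of this example can be traced with a short sketch. It rests on several assumptions: the three sequences that contain the starting endpoint of A are written with explicit "+" and "-" signs inferred from the pairing of endpoints, the simultaneous pair (B, D) in the second sequence is flattened into two consecutive endpoints, and the fourth sequence of the database is left out because it is not listed in the text. The function names are hypothetical.

```python
def project(db, endpoint):
    """Postfix after the first occurrence of `endpoint`, for each sequence containing it."""
    return [seq[seq.index(endpoint) + 1:] for seq in db if endpoint in seq]


def drop_insignificant(postfix, prefix):
    """Post-pruning: drop finishing endpoints whose starting endpoint is neither
    in the prefix nor earlier in the postfix."""
    started = {e[:-1] for e in prefix if e.endswith("+")}
    kept = []
    for e in postfix:
        if e.endswith("+"):
            started.add(e[:-1])
            kept.append(e)
        elif e[:-1] in started:
            kept.append(e)
    return kept


# Assumed sign reconstruction of the three sequences that contain A+.
sequences = [
    ["A+", "B+", "A-", "B-", "D+", "E+", "E-", "D-"],
    ["A+", "B+", "A-", "B-", "D+", "E+", "E-", "D-"],  # (B- D+) flattened
    ["A+", "A-", "D+", "E+", "E-", "D-"],
]

db_a = [drop_insignificant(p, ["A+"]) for p in project(sequences, "A+")]
db_e = [drop_insignificant(p, ["E+"]) for p in project(sequences, "E+")]
```

Under these assumptions nothing is removed when projecting on A+, whereas projecting on E+ leaves only E- in each postfix, because the starting endpoint of D lies before E+ and is therefore outside both the prefix and the postfix.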

[Table columns: event sequences with corresponding endpoint representation | prefix | projected database (insignificant endpoints are marked)]

Fig. 3.6: An example of projected databases and closed temporal patterns
