
Figure 5.4: An example of lattice structures for sequential patterns in a starting domain (i.e., D1 in Table 5.2).

of elements are further arranged level by level according to their sequence lengths; in particular, nodes with one element are placed in increasing order of sequence length. For example, <(b, c)> in Figure 5.4 lies below the nodes whose sequence length is 1 (e.g., <(b)>). As mentioned above, the lattice structure serves as a guideline for propagating the time-instance sets of sequential patterns to other domains.
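The level-by-level arrangement can be sketched in Python. This is a minimal illustration, assuming that a sequence is a tuple of elements and that each element is a frozenset of items; the helper names `seq` and `lattice_levels` are hypothetical, and the intradomain and interdomain links of the lattice are left out.

```python
from collections import defaultdict

# Assumed representation: a sequence is a tuple of elements (itemsets),
# and each element is a frozenset of items.
Seq = tuple[frozenset[str], ...]


def seq(*elements: str) -> Seq:
    """Hypothetical helper: seq("a", "b,c") builds <(a)(b, c)>."""
    return tuple(frozenset(e.split(",")) for e in elements)


def length(s: Seq) -> int:
    """Sequence length: total number of items over all elements."""
    return sum(len(e) for e in s)


def lattice_levels(patterns: list[Seq]) -> dict[int, list[Seq]]:
    """Group patterns into lattice levels by sequence length.

    Level 1 holds the top-level nodes (atomic patterns); longer patterns
    are placed on lower levels.
    """
    levels: dict[int, list[Seq]] = defaultdict(list)
    for p in patterns:
        levels[length(p)].append(p)
    return dict(sorted(levels.items()))


levels = lattice_levels([seq("b,c"), seq("a"), seq("b"), seq("c"), seq("a", "c")])
for lvl, nodes in levels.items():
    print(lvl, [[sorted(e) for e in n] for n in nodes])
```

Here <(b, c)> lands on level 2, below <(b)> on level 1, matching the placement described for Figure 5.4.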

In the deriving phase, algorithm PropagatedMine extracts the sequences whose occurrence times match those recorded in the propagated time-instance sets. Thus, for each propagated time-instance set, we can build the corresponding propagated table, as defined in Definition 8.

Definition 8. (Propagated table) Let M be a k-domain sequential pattern. The propagated table of M in sequence database Dk+1, denoted Dk+1||M, consists of the sequences that co-occur with M:
$$D_{k+1}\|M = \{\langle S_i[l_1], S_i[l_2], \ldots, S_i[l_b]\rangle \mid \langle TS(S_i) : l_1, l_2, \ldots, l_b\rangle \in TIS(M),\ S_i \in D_{k+1}\}.$$

Furthermore, Dk+1||M is itself a sequence database, and
$$\begin{bmatrix} M \\ S \end{bmatrix}$$
is a (k + 1)-domain sequential pattern if and only if S is a sequential pattern of Dk+1||M with e(S) = e(M), under the same minimum support threshold.
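Definition 8 amounts to a lookup by time sequence followed by a projection onto the listed positions. The sketch below assumes a representation that is not from the source: a sequence database maps a time-sequence id to a list of itemsets, and TIS(M) is a list of (time-sequence id, 1-based positions) pairs. `propagated_table` is a hypothetical name. The TIS values are those of <(a)(c)> in the example that follows. The D2 elements that the text does not give are invented and marked as assumed; they were chosen only so that the output reproduces Table 5.6.

```python
# Assumed representation (not from the source):
#   - a sequence database maps a time-sequence id to a list of itemsets;
#   - TIS(M) is a list of (time-sequence id, positions) pairs, positions 1-based.
Sequence = list[frozenset[int]]
TimeInstance = tuple[str, tuple[int, ...]]


def propagated_table(tis: list[TimeInstance],
                     db: dict[str, Sequence]) -> list[tuple[str, Sequence]]:
    """Build D_{k+1}||M: for each <TS(S_i) : l_1, ..., l_b> in TIS(M), keep the
    elements S_i[l_1], ..., S_i[l_b] of the sequence S_i in D_{k+1} that has
    the same time sequence."""
    table = []
    for ts, positions in tis:
        s = db[ts]
        table.append((ts, [s[l - 1] for l in positions]))  # 1-based -> 0-based
    return table


# TIS(<(a)(c)>) as listed in the example for domain D1.
TIS_AC = [
    ("(T1)(T2)(T3)(T4)", (1, 2)),
    ("(T1)(T2)(T3)(T4)", (1, 3)),
    ("(T5)(T6)(T7)", (1, 2)),
    ("(T5)(T6)(T7)", (1, 3)),
    ("(T21)(T22)(T23)(T24)", (1, 3)),
]

# Hypothetical D2 sequences; elements marked "assumed" are not in the text.
D2 = {
    "(T1)(T2)(T3)(T4)": [frozenset({1, 2}), frozenset({2, 3}), frozenset({6}),
                         frozenset({7})],        # 4th element assumed
    "(T5)(T6)(T7)": [frozenset({1, 3}), frozenset({2, 4}), frozenset({8})],
    "(T21)(T22)(T23)(T24)": [frozenset({1, 2, 5}), frozenset({9}),
                             frozenset({2, 3}), frozenset({4})],  # 2nd, 4th assumed
}

for ts, row in propagated_table(TIS_AC, D2):
    print(ts, [sorted(e) for e in row])
```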

For example, in domain D1 of Table 5.2, we have
$$TIS(\langle(a)(c)\rangle) = \{\langle (T_1)(T_2)(T_3)(T_4) : 1, 2\rangle, \langle (T_1)(T_2)(T_3)(T_4) : 1, 3\rangle, \langle (T_5)(T_6)(T_7) : 1, 2\rangle, \langle (T_5)(T_6)(T_7) : 1, 3\rangle, \langle (T_{21})(T_{22})(T_{23})(T_{24}) : 1, 3\rangle\},$$
and propagating TIS(<(a)(c)>) to domain D2 yields the propagated table D2||<(a)(c)>, shown in Table 5.6. Each sequence in this table is likely to form multi-domain sequential patterns with <(a)(c)> mined from domain D1. From a propagated table, one can mine the sequential patterns that have the same number of elements as the propagated sequential pattern, and these patterns can be combined into multi-domain sequential patterns. Continuing the example with the minimum support set to 3, <(1)(2)> is a sequential pattern of D2||<(a)(c)> (it is contained in the first, third, and fifth sequences of Table 5.6), and thus
$$\begin{bmatrix} (a)(c) \\ (1)(2) \end{bmatrix}$$
is a 2-domain sequential pattern obtained by composing <(a)(c)> and <(1)(2)>.

Time sequence              Sequence
<(T1)(T2)(T3)(T4)>         (1, 2)(2, 3)
<(T1)(T2)(T3)(T4)>         (1, 2)(6)
<(T5)(T6)(T7)>             (1, 3)(2, 4)
<(T5)(T6)(T7)>             (1, 3)(8)
<(T21)(T22)(T23)(T24)>     (1, 2, 5)(2, 3)

Table 5.6: Example of propagated table D2||<(a)(c)>.
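The support check in the example can be reproduced on the rows of Table 5.6. This is a minimal sketch. It assumes the usual containment test: each pattern element must be a subset of a distinct sequence element, in order. It also assumes that support counts rows of the propagated table. In this example, that count coincides with the number of distinct time sequences. The names `contains` and `support` are hypothetical.

```python
Itemset = frozenset[int]


def contains(sequence: list[Itemset], pattern: list[Itemset]) -> bool:
    """True if `pattern` is a subsequence of `sequence`: its elements are
    subsets of distinct elements of `sequence`, in order. Greedy earliest
    matching is sufficient for this test."""
    i = 0
    for element in sequence:
        if i < len(pattern) and pattern[i] <= element:
            i += 1
    return i == len(pattern)


def support(table: list[list[Itemset]], pattern: list[Itemset]) -> int:
    """Number of rows of the propagated table that contain `pattern`."""
    return sum(contains(row, pattern) for row in table)


# Rows of Table 5.6, i.e., D2||<(a)(c)>.
TABLE_5_6 = [
    [frozenset({1, 2}), frozenset({2, 3})],
    [frozenset({1, 2}), frozenset({6})],
    [frozenset({1, 3}), frozenset({2, 4})],
    [frozenset({1, 3}), frozenset({8})],
    [frozenset({1, 2, 5}), frozenset({2, 3})],
]

M_ELEMENTS = 2                        # e(<(a)(c)>)
MIN_SUPPORT = 3
S = [frozenset({1}), frozenset({2})]  # candidate <(1)(2)>

# <(1)(2)> forms a 2-domain pattern with <(a)(c)> only if it is frequent
# in the propagated table and e(S) = e(M).
forms_pattern = support(TABLE_5_6, S) >= MIN_SUPPORT and len(S) == M_ELEMENTS
print(support(TABLE_5_6, S), forms_pattern)
```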

Note that even though PropagatedMine avoids mining sequential patterns separately in each domain, the cost of redundant mining of propagated tables can be reduced further. For example, some patterns mined in propagated tables D2||<(a)> and D2||<(c)> are identical to patterns mined in propagated table D2||<(a)(c)>. This is because the time-instance set of <(a)(c)> is contained in the time-instance sets of both <(a)> and <(c)>; consequently, the elements of the sequences in D2||<(a)(c)> also appear in D2||<(a)> and D2||<(c)>. Therefore, only sequential patterns of length one need to be propagated to other domains. In other words, only the time-instance sets of the top-level nodes in the lattice structures (referred to as atomic patterns) are propagated. Once obtained, the propagated tables are treated as transaction databases, so the corresponding multi-domain sequential patterns can be generated from each propagated table with frequent itemset algorithms such as those in [1][2][80][25]. We now analyze some important properties of propagated tables. Using these properties, the lattice structure in the starting domain is used to determine the multi-domain sequential patterns whose length is larger than one; the details are described later.

Property of the propagated table of atomic patterns: Suppose that P is a k-domain sequential pattern (i.e., P ∈ SPk) with |P| = 1. Then
$$\begin{bmatrix} P \\ \beta \end{bmatrix}$$
is a multi-domain sequential pattern across the k + 1 sequence databases D1, D2, ..., Dk+1 with minimum support δ if and only if β is a frequent itemset in the propagated table Dk+1||P with the same minimum support δ.
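Because an atomic pattern has a single element, each row of its propagated table is a single itemset, so the table can be mined as a transaction database. The sketch below uses a plain level-wise Apriori. That is one common choice. The text cites several frequent itemset algorithms [1][2][80][25], and this sketch does not imply which one the authors used. The function name and the table rows are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions: list[frozenset[int]],
                      min_support: int) -> dict[frozenset[int], int]:
    """Level-wise (Apriori-style) frequent itemset mining.

    Support is the number of transactions containing the itemset.
    """
    def count(candidates):
        return {c: sum(c <= t for t in transactions) for c in candidates}

    singles = {frozenset([i]) for t in transactions for i in t}
    current = {c: n for c, n in count(singles).items() if n >= min_support}
    result = dict(current)
    k = 2
    while current:
        prev = list(current)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Antimonotone pruning: every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c: n for c, n in count(candidates).items() if n >= min_support}
        result.update(current)
        k += 1
    return result


# Hypothetical propagated table Dk+1||P of an atomic pattern P: since e(P) = 1,
# every row holds a single itemset and is read as one transaction.
rows = [frozenset({1, 2}), frozenset({1, 3}), frozenset({1, 2, 5}),
        frozenset({1, 2}), frozenset({2, 3})]
for beta, n in frequent_itemsets(rows, min_support=3).items():
    print(sorted(beta), n)  # each frequent beta yields the pattern [P; beta]
```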

Antimonotone property across multiple domains: If M is a k-domain sequential pattern (i.e., across D1, D2, ..., and Dk), then every k-domain sequence contained in M is also a k-domain sequential pattern.
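The antimonotone property can be checked numerically in its one-domain special case. This sketch uses the rows of Table 5.6 and assumes the same subset-in-order containment test as the earlier example. It deletes one item at a time from <(1)(2)> and verifies that no contained sequence has lower support. It does not implement multi-domain containment. The helper names are hypothetical.

```python
Itemset = frozenset[int]
Pattern = tuple[Itemset, ...]


def contains(sequence, pattern) -> bool:
    """Ordered subset embedding of `pattern` into `sequence` (greedy match)."""
    i = 0
    for element in sequence:
        if i < len(pattern) and pattern[i] <= element:
            i += 1
    return i == len(pattern)


def support(table, pattern) -> int:
    """Number of rows containing `pattern`."""
    return sum(contains(row, pattern) for row in table)


def immediate_subsequences(pattern: Pattern) -> set[Pattern]:
    """All sequences obtained by deleting exactly one item; an element that
    becomes empty is dropped."""
    subs = set()
    for i, element in enumerate(pattern):
        for item in element:
            reduced = element - {item}
            sub = pattern[:i] + ((reduced,) if reduced else ()) + pattern[i + 1:]
            if sub:
                subs.add(sub)
    return subs


# Rows of Table 5.6 (D2||<(a)(c)>), used here as a single-domain check.
ROWS = [
    (frozenset({1, 2}), frozenset({2, 3})),
    (frozenset({1, 2}), frozenset({6})),
    (frozenset({1, 3}), frozenset({2, 4})),
    (frozenset({1, 3}), frozenset({8})),
    (frozenset({1, 2, 5}), frozenset({2, 3})),
]
P = (frozenset({1}), frozenset({2}))
for sub in immediate_subsequences(P):
    # Antimonotonicity: a contained sequence is at least as frequent.
    assert support(ROWS, sub) >= support(ROWS, P)
```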

Based on the antimonotone property, algorithm PropagatedMine generates candidate multi-domain sequential patterns level by level. Likewise, in the propagated domain, sequential patterns are generated level by level according to their number of elements. The detailed steps for deriving multi-domain sequential patterns are described below.

Step 1: Derive atomic patterns across (k + 1) domains

Let SPk be the set of multi-domain sequential patterns across k domains. To derive atomic patterns across (k + 1) domains, the corresponding frequent items are mined from the propagated table of each atomic pattern in SPk. By the property of the propagated table of atomic patterns, the frequent items mined from the propagated tables are merged with the atomic patterns in SPk to derive atomic patterns across (k + 1) domains.

Figure 5.5: Example of generating atomic patterns in domain D2.

Consider the sequence databases across two domains in Table 5.2 as an example, where the sequential patterns of domain D1 are represented as a lattice structure. We can derive the atomic patterns in domain D2, and thus their corresponding multi-domain sequential patterns, by propagating the time-instance sets of the atomic patterns in domain D1 (i.e., the top-level nodes) to domain D2. Specifically, in Figure 5.5, each atomic pattern in D1 is connected by interdomain links to the patterns in D2 with which it forms multi-domain sequential patterns. Consequently, we have

in the above example, and they are obviously also atomic patterns.
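Step 1 can be sketched as counting single items in each atomic pattern's one-element propagated table and linking the pattern to every frequent item. The sketch rests on several assumptions. Patterns are plain string labels. A stacked (k + 1)-domain pattern [P; <(i)>] is written as the pair (P, i). The tables are hypothetical, and `derive_atomic_patterns` is not a name from the source.

```python
from collections import Counter


def derive_atomic_patterns(atomic_tables: dict[str, list[frozenset[int]]],
                           min_support: int):
    """Step 1 sketch: for each atomic k-domain pattern P, count single items in
    Dk+1||P and pair P with every frequent item.

    Returns the new atomic (k+1)-domain patterns, written as (P, item) pairs
    standing for the stacked pattern [P; <(item)>], and the interdomain links.
    """
    new_atomic: list[tuple[str, int]] = []
    interdomain: dict[str, list[tuple[str, int]]] = {}
    for p, rows in atomic_tables.items():
        counts = Counter(item for row in rows for item in row)
        frequent = sorted(i for i, n in counts.items() if n >= min_support)
        interdomain[p] = [(p, i) for i in frequent]
        new_atomic.extend(interdomain[p])
    return new_atomic, interdomain


# Hypothetical one-element propagated tables for two atomic patterns of D1.
tables = {
    "<(a)>": [frozenset({1, 2}), frozenset({1, 3}), frozenset({1, 2, 5}), frozenset({2})],
    "<(b)>": [frozenset({2, 3}), frozenset({2}), frozenset({4})],
}
atomic, links = derive_atomic_patterns(tables, min_support=3)
print(atomic)
```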

Step 2: Derive (k + 1)-domain sequential patterns with one element

This step derives (k + 1)-domain sequential patterns with one element. Assume that P is a k-domain sequential pattern across the k sequence databases D1, D2, ..., Dk, and that P has only one element (i.e., e(P) = 1). The intradomain links in the lattice structure of domain Dk can be followed to find two multi-domain sequential patterns X and Y, which are the components of P. The corresponding multi-domain sequential patterns in domain Dk+1 are then found by traversing the interdomain links of X and Y. According to the antimonotone property, since X ⊑ P and Y ⊑ P, any corresponding sequential patterns of X or Y in domain Dk+1 must already have been discovered. Hence, the candidate sequential patterns of P in domain Dk+1 are generated from the union of the multi-domain sequential patterns found in domain Dk+1. For example, let P = <(b, c)> be a sequential pattern with e(P) = 1 in D1 of Table 5.2. The components of P (i.e., <(b)> and <(c)>) can be found from the intradomain links. Following the interdomain links of <(b)> and <(c)> in Figure 5.6 yields the multi-domain sequential patterns in domain D2 (i.e.,

for <(c)>). Consequently, two candidates are generated by the union operation:

Figure 5.6: An example of generating sequential patterns with one element in domain D2.

Once the candidate multi-domain sequential patterns are obtained, support values of these patterns are examined by checking their time-instance sets (i.e., Support(

(α)>) ∩ TIS(<(β)>)|). Given a minimum support of 3, since the support values of

are 3 and 2, respectively, only

is frequent. Thus, the lattice structure in domain D2 contains node <(2)>, and interdomain links are built between the lattice structures in domains D1 and D2.
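The union step with its support check can be sketched as follows. Support is taken as the size of the intersection of two time-instance sets, following the |TIS(...) ∩ TIS(...)| form above. A candidate is kept when its support is at least the minimum support, which matches the example where a support of 3 meets a minimum of 3. The sketch assumes that a one-element pattern's TIS is a set of (time-sequence id, position) pairs. It also assumes that `tis_x[alpha]` stands for the TIS of the (k + 1)-domain pattern formed by X and α (likewise for Y). The function name and all data are hypothetical.

```python
TimeInstances = set[tuple[str, int]]  # {(time-sequence id, position)}


def union_candidates(tis_x: dict[frozenset[int], TimeInstances],
                     tis_y: dict[frozenset[int], TimeInstances],
                     min_support: int) -> dict[frozenset[int], int]:
    """Step 2 sketch for a one-element pattern P with components X and Y.

    tis_x maps each Dk+1 itemset alpha reached through X's interdomain links
    to the TIS of the (k+1)-domain pattern formed by X and alpha (likewise
    tis_y for Y). The candidate alpha | beta is kept when
    |TIS(alpha) ∩ TIS(beta)| >= min_support.
    """
    frequent = {}
    for alpha, occ_a in tis_x.items():
        for beta, occ_b in tis_y.items():
            support = len(occ_a & occ_b)
            if support >= min_support:
                frequent[alpha | beta] = support
    return frequent


# Hypothetical time-instance sets (sequence ids s1..s5 are made up).
tis_x = {
    frozenset({1}): {("s1", 1), ("s2", 1), ("s3", 1), ("s4", 2)},
    frozenset({2}): {("s1", 1), ("s5", 2)},
}
tis_y = {frozenset({2}): {("s1", 1), ("s2", 1), ("s3", 1), ("s5", 2)}}
print(union_candidates(tis_x, tis_y, min_support=3))
```

With these data, one candidate reaches a support of 3 and the other only 2, so a single candidate survives.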

Step 3: Derive (k + 1)-domain sequential patterns with more than one element

After the atomic patterns and the (k + 1)-domain sequential patterns with one element are generated in Steps 1 and 2, respectively, algorithm PropagatedMine generates the remaining (k + 1)-domain sequential patterns level by level by referring to the lattice structure of the last propagated domain (i.e., domain Dk). Because of the antimonotone property, PropagatedMine starts from the patterns with two elements. Their frequent upper-level patterns are found through the intradomain links in the lattice structure of Dk, and the corresponding upper-level patterns in the lattice structure of domain Dk+1 are identified through their interdomain links; by the antimonotone property, these interdomain links must already have been established. Before deriving (k + 1)-domain sequential patterns, we must determine, based on time order, whether the sequential patterns identified in the lattice structure can be merged. This leads to Definition 9.

Definition 9. (Concatenate operation of TIS) Let M and N be two multi-domain sequences, where
$$TIS(M) = \{\langle TS_1 : l_{11}, l_{12}, \ldots, l_{1e(M)}\rangle, \langle TS_2 : l_{21}, l_{22}, \ldots, l_{2e(M)}\rangle, \ldots, \langle TS_m : l_{m1}, l_{m2}, \ldots, l_{me(M)}\rangle\},$$
$$TIS(N) = \{\langle TT_1 : k_{11}, k_{12}, \ldots, k_{1e(N)}\rangle, \langle TT_2 : k_{21}, k_{22}, \ldots, k_{2e(N)}\rangle, \ldots, \langle TT_n : k_{n1}, k_{n2}, \ldots, k_{ne(N)}\rangle\},$$
and $TS_i$ (i = 1, 2, ..., m) and $TT_j$ (j = 1, 2, ..., n) are time sequences. The

Algorithm: PropagatedMine
Input: Sequence databases across n domains D1, D2, ..., Dn, and minimum support δ.
Output: Multi-domain sequential patterns across n domains.
Begin
  Apply sequential pattern mining on D1.
  Let SP1 be the set of sequential patterns mined in D1.
  For each domain Di, i = 2, 3, ..., n
    For each P ∈ SPi−1
      //Step 1
      If |P| = 1 Then Begin
        Construct propagated table Di||P.
        Find frequent items in Di||P with minimum support δ.
        Let FI be the set of frequent items in Di||P.
      //Step 2
      Let X and Y be two patterns pointed to by intradomain links of P.
      For each pattern α pointed to by interdomain links of X
        For each pattern β pointed to by interdomain links of Y
          If Support(α ∪ β) ≥ δ Then Begin
            Construct interdomain links from P to α ∪ β
      //Step 3
      Let X and Y be two patterns pointed to by intradomain links of P.
      For each pattern α pointed to by interdomain links of X
        For each pattern β pointed to by interdomain links of Y
          If Support([(α)(β)]) ≥ δ Then Begin
            Construct interdomain links from P to [(α)(β)].
            Construct intradomain links from [(α)(β)] to α and β.
            Append [(α)(β)] to SPi.
          End
  Output = SPn.
End

Figure 5.7: Example of generating sequential patterns with more than one element in domain D2.

concatenation of TIS(M) and TIS(N) is denoted as
$$TIS(M) \cap_{<} TIS(N) = \{\langle TS_i : l_{i1}, l_{i2}, \ldots, l_{ie(M)}, k_{j1}, k_{j2}, \ldots, k_{je(N)}\rangle \mid TS_i = TT_j \text{ and } l_{ie(M)} < k_{j1}\}.$$
In other words, TIS(M) ∩< TIS(N) is the time-instance set of the multi-domain sequence [M, N], i.e., TIS([M, N]).
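Definition 9 is a join on the time sequence with an ordering condition on positions. The sketch below assumes that a time instance is a (time-sequence id, positions) pair; `concat_tis` is a hypothetical name. The two TIS(M) instances are taken from the example that follows. The third instance is truncated in the text and is left out. TIS(N) is invented for illustration.

```python
TimeInstance = tuple[str, tuple[int, ...]]  # (time sequence, positions)


def concat_tis(tis_m: list[TimeInstance],
               tis_n: list[TimeInstance]) -> list[TimeInstance]:
    """TIS(M) ∩< TIS(N) from Definition 9: join an instance of M with an
    instance of N on the same time sequence (TS_i = TT_j) when M's last
    position precedes N's first position (l_{ie(M)} < k_{j1})."""
    result = []
    for ts_i, ls in tis_m:
        for tt_j, ks in tis_n:
            if ts_i == tt_j and ls[-1] < ks[0]:
                result.append((ts_i, ls + ks))
    return result


# First two instances of TIS(M) follow the example in the text; TIS(N) is hypothetical.
tis_m = [("(T1)(T2)(T3)(T4)", (1,)), ("(T5)(T6)(T7)", (1,))]
tis_n = [("(T1)(T2)(T3)(T4)", (2,)), ("(T1)(T2)(T3)(T4)", (3,)),
         ("(T5)(T6)(T7)", (1,)), ("(T5)(T6)(T7)", (3,))]
print(concat_tis(tis_m, tis_n))
```

The instance pair ("(T5)(T6)(T7)", (1,)) and ("(T5)(T6)(T7)", (1,)) is rejected because position 1 does not precede position 1. This is the time-order check that distinguishes concatenation from the union used in Step 2.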

For example, given M =

, and the sequence database across two domains in Table 5.2, where TIS(M) = {<(T1)(T2)(T3)(T4) : 1>, <(T5)(T6)(T7) : 1>, <(T10)(T12)(T13) : and Y, by traversing intradomain links among lattice structures across k domains, and the multi-domain sequential patterns pointed to by their interdomain links can be determined. In light of Definition 9, a concatenation operation is applied instead of the union used in Step 2. For example, consider pattern P = <(a)(b, c)> in Figure 5.7. The intradomain and interdomain links yield

. Therefore, candidate multi-domain sequential pattern

The above steps derive the multi-domain sequential patterns across k + 1 sequence databases from the k-domain sequential patterns. Algorithm PropagatedMine repeats these three steps until all sequence databases have been propagated.
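How a pattern is routed to one of the three steps can be sketched from the step titles. Step 1 handles atomic patterns (|P| = 1). Step 2 handles other one-element patterns. Step 3 handles patterns with more than one element. The branch conditions in the pseudocode are partly lost, so this classification follows the prose rather than the original listing. The helpers `step_for` and `processing_order` are hypothetical, and the ordering shown is one plausible level-by-level order, not necessarily the authors'.

```python
Pattern = tuple[frozenset[str], ...]


def step_for(p: Pattern) -> int:
    """Which of the three steps handles a k-domain pattern P:
    1 if |P| = 1 (atomic), 2 if e(P) = 1 and |P| > 1, otherwise 3."""
    if sum(len(e) for e in p) == 1:
        return 1
    if len(p) == 1:
        return 2
    return 3


def processing_order(patterns: list[Pattern]) -> list[Pattern]:
    """Level-by-level order: fewer elements first, then shorter length, so a
    pattern's components are processed before the pattern itself."""
    return sorted(patterns, key=lambda p: (len(p), sum(len(e) for e in p)))


P_b = (frozenset({"b"}),)
P_bc = (frozenset({"b", "c"}),)
P_a_bc = (frozenset({"a"}), frozenset({"b", "c"}))
for p in processing_order([P_a_bc, P_bc, P_b]):
    print([sorted(e) for e in p], "-> Step", step_for(p))
```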
