
Sequential Pattern Mining in the Propagated Domain

After constructing propagated databases in the propagated domain, we describe how to mine simultaneous sequential patterns in them. Observing a propagated database TRSDBprop, we find that the numbers of time slots and elements in every time related sequence are exactly the same as those of the propagator pattern of TRSDBprop. If a sequential pattern β in the propagated database and the propagator pattern α are simultaneous, β has the same number of elements as α. Accordingly, we propose a simultaneous sequential pattern mining method referred to as SSM (Slot-by-Slot sequential pattern Mining). The idea of SSM is that, since sequential pattern β has exactly the same number of elements as propagator pattern α, we can collect the elements that map to time slots of the same rank in the TIME_SEQs to form an element set, and mine these element sets step by step. If no frequent itemset is found at some step, no sequential pattern β that is simultaneous with propagator pattern α can be formed. An example of mining simultaneous patterns by SSM follows.

Example 2 (SSM) Given min_sup = 2 and α = <(a)(b,c)(e)>, which is a sequential pattern in TRSDB1 with TIS_α^{TRSDB1} = {1:<1,2,4>, 1:<1,3,4>, 2:<2,3,4>, 4:<1,3,4>}, and the propagated database TRSDB2||α in Table 3.4, the process of SSM on TRSDB2||α proceeds in the following steps.

Step 1. Mine the frequent itemsets occurring in the first time slot: Extract the elements that occur in the first time slot of each TIME_SEQ in TRSDB2||α into one element set, which can be treated as a transaction database, denoted by DB1st_slot. In this example, DB1st_slot of TRSDB2||α is {1:(1,2), 1:(1,2), 2:(1,3), 4:(1,2,5)}, where the number before an itemset indicates the period in which the itemset occurs. We can then use a traditional frequent itemset mining method [1],[7],[14] to find the frequent itemsets in DB1st_slot. Note that an itemset is counted at most once per period. Consequently, the supports of (1), (2), and (1,2) in DB1st_slot are 3, 2, and 2, respectively.
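The per-period support counting in Step 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `frequent_itemsets` and the list-of-pairs layout are assumptions, and a brute-force subset enumeration stands in for the frequent itemset miners cited as [1],[7],[14].

```python
from collections import defaultdict
from itertools import combinations


def frequent_itemsets(db_first_slot, min_sup):
    """Return {itemset: support} for itemsets with support >= min_sup.

    db_first_slot is a list of (period, itemset) pairs (hypothetical layout).
    An itemset is counted at most once per period, as Step 1 requires.
    Brute-force enumeration is used here only for illustration.
    """
    periods_of = defaultdict(set)
    for period, items in db_first_slot:
        items = sorted(set(items))
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                periods_of[subset].add(period)
    return {s: len(p) for s, p in periods_of.items() if len(p) >= min_sup}


# DB1st_slot of TRSDB2||α from Example 2
db_1st_slot = [(1, (1, 2)), (1, (1, 2)), (2, (1, 3)), (4, (1, 2, 5))]
supports = frequent_itemsets(db_1st_slot, min_sup=2)
print(supports)  # {(1,): 3, (2,): 2, (1, 2): 2}
```

The two entries from period 1 contribute a single count to (1), (2), and (1,2), which is why (1,2) has support 2 rather than 3.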

Step 2. Divide the search space: For every frequent itemset X found in DB1st_slot, we construct the <X>-projected database in TRSDB2||α. Projected databases in SSM differ somewhat from those in PrefixSpan: because we have already mined all frequent itemsets in DB1st_slot, the items that remain in the first element of a postfix are useless for every CONTEXT_SEQ in TRSDB2||α. Consequently, these items are ignored when the <X>-projected database is constructed. For example, the <(1)>-projected database in TRSDB2||α should contain the postfixes <(_2)(2,3)(4,5)>, <(_2)(6)(4,5)>, <(_3)(2,4)(8)>, and <(_2,5)(2,3)(4,5,6)>. After ignoring these items, we get the new postfixes <(2,3)(4,5)>, <(6)(4,5)>, <(2,4)(8)>, and <(2,3)(4,5,6)>. The projected database (TRSDB2||α)|<(1)> is shown in Table 3.4. Similarly, (TRSDB2||α)|<(2)> and (TRSDB2||α)|<(1,2)> are also constructed. We only list (TRSDB2||α)|<(1,2)> in Table 3.4.
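The projection in Step 2 can be sketched as below. The function name `project` and the data layout are assumptions. The CONTEXT_SEQ values are reconstructed from the postfixes quoted in Example 2, and the TIME_SEQ part is omitted, as it is in Figure 3.3.

```python
def project(db, itemset):
    """Build the <itemset>-projected database (illustrative sketch).

    db is a list of (period, context_seq) pairs, where context_seq is a
    tuple of elements and each element is a tuple of items. A sequence is
    kept only if its first element contains every item of `itemset`. The
    whole first element is then dropped, because the leftover items of
    that element are ignored in SSM.
    """
    wanted = set(itemset)
    return [
        (period, seq[1:])
        for period, seq in db
        if seq and wanted <= set(seq[0])
    ]


# CONTEXT_SEQs of TRSDB2||α, reconstructed from the postfixes in Example 2
trsdb2_alpha = [
    (1, ((1, 2), (2, 3), (4, 5))),
    (1, ((1, 2), (6,), (4, 5))),
    (2, ((1, 3), (2, 4), (8,))),
    (4, ((1, 2, 5), (2, 3), (4, 5, 6))),
]

proj_1 = project(trsdb2_alpha, (1,))
proj_12 = project(trsdb2_alpha, (1, 2))
```

Here `proj_1` keeps all four sequences, while `proj_12` drops the sequence from period 2, whose first element (1,3) does not contain item 2.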

Step 3. Recursively mine the frequent itemsets in the first-element set of each split search space: For every projected database constructed in the previous step, we mine the frequent itemsets in its current DB1st_slot and divide the search space recursively. Consider (TRSDB2||α)|<(1,2)> as an example. We extract the first element of every CONTEXT_SEQ to form the current DB1st_slot = {1:(2,3), 1:(6), 4:(2,3)}. After applying a frequent itemset mining method to DB1st_slot of (TRSDB2||α)|<(1,2)>, we find the frequent itemsets (2), (3), and (2,3), each with support 2. Therefore, (TRSDB2||α)|<(1,2)(2)>, (TRSDB2||α)|<(1,2)(3)>, and (TRSDB2||α)|<(1,2)(2,3)> are constructed and frequent itemsets are mined recursively. Since there is no element left in the <(1,2)(2,3)(4)>-, <(1,2)(2,3)(5)>-, and <(1,2)(2,3)(4,5)>-projected databases, the process of SSM on the <(1,2)(2,3)>-projected database stops and returns three sequential patterns: <(1,2)(2,3)(4)>, <(1,2)(2,3)(5)>, and <(1,2)(2,3)(4,5)>. Each of these three patterns is therefore a simultaneous sequential pattern for propagator pattern <(a)(b,c)(e)>.

Figure 3.3: Illustration of the process of SSM in Example 2

Following the above procedure, we can mine all simultaneous patterns in the divided search space.

Figure 3.3 illustrates the processing of SSM with the profile given in Example 2. Note that in Figure 3.3 we omit the TIME_SEQ part of the propagated and projected databases and focus on the operations performed on CONTEXT_SEQ.

Table 3.4: Projected databases used in Example 2 (columns: SID, TIME_SEQ, CONTEXT_SEQ)

Algorithm: Slot-by-slot simultaneous sequential pattern mining (SSM)

Input: propagator pattern α, the propagated database TRSDBv||α, and the minimum support threshold min_support.

Output: The complete MDSSPs which can be found in TRSDBv||α.

1. Call SSM(TRSDBv||α).

2. For every pattern p returned from the previous step, if the number of elements in p equals the number of elements in α, output that $\begin{bmatrix} \alpha \\ p \end{bmatrix}$ is an MDSSP.

Function SSM(TRSDB)

/* TRSDB is a time related sequence database. */

IF every CONTEXT_SEQ in TRSDB is empty
    RETURN
ELSE BEGIN
    1. Collect the first element, which is mapped to the first time slot, of every CONTEXT_SEQ in TRSDB to form the transaction database DB1st_slot.
    2. Mine the frequent itemsets in DB1st_slot.
    3. IF no frequent itemset can be found in DB1st_slot
        RETURN
    ELSE BEGIN
        1. For every frequent itemset Xi found, construct the projected database TRSDB|<Xi>.
        2. For every TRSDB|<Xi>, call SSM(TRSDB|<Xi>).
        3. For every pattern p returned from the previous step, insert itemset Xi into p to form a new pattern p' such that Xi is the first element of p'.
        4. RETURN every p'.
    END
END
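The recursion of Function SSM can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code. The names `ssm` and `frequent_itemsets` and the data layout are hypothetical, brute-force enumeration replaces a real frequent itemset miner, and the base case is assumed to return the empty pattern so that the caller has something to prepend Xi to. The input CONTEXT_SEQs are reconstructed from the postfixes in Example 2, with TIME_SEQ omitted.

```python
from collections import defaultdict
from itertools import combinations


def frequent_itemsets(first_slot, min_sup):
    """Brute-force frequent itemsets; each period counts at most once."""
    periods_of = defaultdict(set)
    for period, items in first_slot:
        items = sorted(set(items))
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                periods_of[subset].add(period)
    return [s for s, p in periods_of.items() if len(p) >= min_sup]


def ssm(db, min_sup):
    """Return patterns (tuples of itemsets) mined slot by slot from db.

    db is a list of (period, context_seq) pairs (hypothetical layout).
    """
    if not db:
        return []
    if all(len(seq) == 0 for _, seq in db):
        return [()]  # assumption: empty pattern marks a completed branch
    first_slot = [(period, seq[0]) for period, seq in db if seq]
    patterns = []
    for x in frequent_itemsets(first_slot, min_sup):
        # <x>-projected database: drop the whole first element
        projected = [
            (period, seq[1:])
            for period, seq in db
            if seq and set(x) <= set(seq[0])
        ]
        for tail in ssm(projected, min_sup):
            patterns.append((x,) + tail)  # x becomes the first element
    return patterns


trsdb2_alpha = [
    (1, ((1, 2), (2, 3), (4, 5))),
    (1, ((1, 2), (6,), (4, 5))),
    (2, ((1, 3), (2, 4), (8,))),
    (4, ((1, 2, 5), (2, 3), (4, 5, 6))),
]
alpha_length = 3  # number of elements in <(a)(b,c)(e)>
found = [p for p in ssm(trsdb2_alpha, 2) if len(p) == alpha_length]
```

Because a branch that finds no frequent itemset returns nothing, every pattern that survives already spans all slots. The length filter mirrors step 2 of the algorithm and is kept only for fidelity to it.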

When performing SSM on the propagated databases in the propagated domain TRSDBtarget, we can skip some propagated databases without applying SSM to them because of the following property:

Property 1 (Reducible Propagation) Assume that both α and β are propagator patterns represented as sequences of columns, where α = [c1 c2 ... cm] and β is prefixed with α. If no simultaneous sequential patterns can be found in TRSDBtarget||α, then no simultaneous sequential patterns can be found in TRSDBtarget||β either.

Based on the property above, we can perform simultaneous sequential pattern mining more efficiently. First, we perform SSM on the propagated databases constructed by propagator patterns having only one column. We record the propagator patterns for which no simultaneous sequential pattern can be found in TRSDBtarget, and then prune every propagator pattern that is prefixed with one of these recorded patterns. After this propagation pruning, we perform SSM on the propagated databases constructed by propagator patterns having two columns, and similarly prune propagator patterns according to the result of SSM. This process is repeated until all propagator patterns have been propagated or pruned.
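The level-wise pruning just described can be sketched as below. This is a simplified illustration: columns are treated as opaque labels, propagator patterns as tuples of columns, and `has_result` is a hypothetical stand-in for running SSM on the propagated database of a pattern and checking whether any simultaneous sequential pattern was found.

```python
def prune_by_prefix(candidates, has_result):
    """Evaluate propagator patterns in order of increasing column count.

    candidates: iterable of patterns, each a tuple of column labels.
    has_result: callable(pattern) -> bool, hypothetical stand-in for SSM.
    Returns (survivors, evaluated): patterns that produced results, and
    patterns on which has_result was actually called.
    """
    failed = set()
    survivors, evaluated = [], []
    for pattern in sorted(candidates, key=len):
        # Skip the pattern if any proper prefix is known to have failed.
        if any(pattern[:k] in failed for k in range(1, len(pattern))):
            continue
        evaluated.append(pattern)
        if has_result(pattern):
            survivors.append(pattern)
        else:
            failed.add(pattern)
    return survivors, evaluated


candidates = [
    ("c1",), ("c2",),
    ("c1", "c3"), ("c2", "c3"),
    ("c2", "c3", "c4"),
]
# Toy oracle: anything starting with column "c2" yields no pattern.
survivors, evaluated = prune_by_prefix(candidates, lambda p: p[0] != "c2")
```

In this toy run, the failure of ("c2",) prunes both ("c2", "c3") and ("c2", "c3", "c4") before SSM would be applied to them.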

Based on the above discussion, the algorithm of PropagatedMine is presented as follows:

Algorithm: PropagatedMine

Input: Time related sequence databases TRSDB1, TRSDB2, ..., TRSDBn, and the minimum support threshold min_support.

Output: The complete MDSSPs of TRSDB1, TRSDB2, ..., and TRSDBn.

BEGIN
    1. Perform PrefixSpan on TRSDB1.
    2. For every sequential pattern α mined in step 1, if α cannot be pruned, construct the propagated database TRSDB2||α and call SSM(α, TRSDB2||α).
    3. For every MDSSP p returned from step 2, call Propagation(p, d3).
END

Subroutine: Propagation(propagator, domainID)

/* propagator is the propagator pattern, and domainID is an identifier indicating the domain. */

BEGIN
    1. Call SSM(propagator, TRSDB||propagator) if propagator cannot be pruned.
    2. IF domainID equals dn BEGIN
        OUTPUT every MDSSP p returned from step 1 as an MDSSP of TRSDB1, TRSDB2, ..., and TRSDBn.
    END
    ELSE BEGIN
        Call Propagation(p, domainID + 1) for every MDSSP p returned from step 1.
    END
END
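The control flow of PropagatedMine can be sketched as a small driver. This is only a skeleton under assumptions: `mine_first_domain` stands in for PrefixSpan on TRSDB1, and `mine_in_domain` stands in for constructing the propagated database of a given domain and running SSM on it, with pruning included. Both are hypothetical callables supplied by the caller. As a simplification, the sketch routes domain 2 through the same recursive subroutine instead of treating it as a separate step.

```python
def propagated_mine(n_domains, mine_first_domain, mine_in_domain):
    """Chain propagation through domains 2..n_domains and collect MDSSPs.

    mine_first_domain: callable() -> iterable of patterns from domain 1.
    mine_in_domain: callable(propagator, domain_id) -> list of MDSSPs found
        when `propagator` is propagated into domain `domain_id`.
    Both callables are hypothetical placeholders, not part of the source.
    """
    results = []

    def propagation(propagator, domain_id):
        mdssps = mine_in_domain(propagator, domain_id)
        if domain_id == n_domains:
            results.extend(mdssps)  # last domain: output the MDSSPs
        else:
            for p in mdssps:
                propagation(p, domain_id + 1)

    for alpha in mine_first_domain():
        propagation(alpha, 2)
    return results


# Toy stand-ins: patterns are strings, and "b" never propagates.
def toy_first_domain():
    return ["a", "b"]


def toy_mine_in_domain(propagator, domain_id):
    if propagator.startswith("b"):
        return []
    return [propagator + str(domain_id)]


mined = propagated_mine(3, toy_first_domain, toy_mine_in_domain)
print(mined)  # ['a23']
```

A pattern reaches the output only if it yields at least one MDSSP in every domain along the chain, which is the behaviour the subroutine Propagation describes.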
