• 沒有找到結果。

Chapter 3 New-Moment: Mining Closed Frequent Itemsets

3.1 Related Work: Moment Algorithm

Moment [20, 21] algorithm mines closed frequent itemsets with sliding window model in a data stream. It uses a closed enumeration tree (CET) to maintain the closed frequent itemsets in the current window. CET not only maintains closed frequent itemsets but also maintains some boundary tree nodes. Figure 3-1 shows the CET in the first window. Assume that the window size is 4 and the first four incoming transaction is listed in the left of the graph.

(a, b, c): 2 Infrequent gateway nodeInfrequent gateway nodeInfrequent gateway node

Closed node

Fig 3-1. CET in the first sliding window

There are four types of tree nodes for CET:

(1) infrequent gateway nodes

A node nI that represents itemset I is an infrequent gateway node if i) I is an infrequent itemset, ii) nI’s parent, nJ, is frequent, and iii) I is the result of joining I’s parent, J, with one of J’s frequent siblings. In Figure 3-1, the tree node (d) is an infrequent gateway node.

(2) unpromising gateway nodes

A node nI is an unpromising gateway node if i) I is a frequent itemset and ii) there exists a closed frequent itemset J such that J ⊂ I, and J has the same support as I does. In Figure 3-1, the tree nodes (a, c) and (b) are unpromising gateway nodes.

(3) intermediate nodes

A node nI is an intermediate node if i) I is a frequent itemset, ii) nI has a child node nJ

such that J has the same support as I does, and iii) nI is not an unpromising gateway node.

In Figure 3-1, the tree node (a) is an intermediate node because its child (a, b) has the same support as (a) does.

(4) closed nodes

These nodes represent closed frequent itemsets in the current window. A closed node can be an internal node or a leaf node. In Figure 3-1, (c), (a, b), and (a, b, c) are closed nodes.

Except closed nodes, Moment keeps three types of boundary nodes. These nodes are the most possible candidates of new closed nodes in the next window. Moment keeps these nodes for speeding up modification of the closed enumeration tree.

There are three steps in Moment algorithm:

(1) Building the closed enumeration tree (CET)

When the total number of transactions coming from the data stream does not excess window size N, Moment just saves these transactions in its sliding window. As long as the

window is full, Moment builds an initial closed enumeration tree (CET). Figure 3-1 shows the tree in the first window.

Moment adopts a depth-first procedure to generate all possible candidate itemsets in the window and check their supports. In the procedure, if a node is found to be infrequent, it is marked as an infrequent gateway node and Moment does not explore its descendants further.

If a node is frequent itemset but not closed frequent itemset, the node is marked as an unpromising gateway node. Moment also does not explore its descendants, which does not contain any closed frequent itemsets. Moment uses support of a node and the tid sum of the transactions that containing the node (tid_sum) to check if the node is a closed node. Take the nodes (a, c) and (a, b, c) in Figure 3-1 as an example. The support of (a, c) is the same as (a, b, c). The tid_sum of (a, c) is 7 (the third transaction and the fourth transaction in the window). That is equal to the tid_sum of (a, b, c). By the definition of closed frequent itemsets, we can know that (a, c) is not a closed node.

If a node is found to be neither an infrequent node nor an unpromising gateway node, Moment explores its descendants. The nodes that are intermediate nodes or closed nodes are maintained in the CET.

(2) Updating the CET

Initial closed enumeration tree is built when the number of incoming transactions from the data stream is equal to the window size. After that, when a new transaction comes from the data stream, Moment updates the CET to maintain the closed frequent itemsets in the current window. There are two steps for updating the CET:

Adding the new transaction coming from the data stream

Fig 3-2. Adding the new transaction with tid = 5

In Figure 3-2, a new transaction T (tid = 5) is added to the sliding window. Moment traverses the parts of the CET that are related to transaction T. For each related node nI in depth-first order, Moment updates its support and tid_sum. Whenever a node is updated, Moment checks if it needs to change its node type.

In Figure 3-2, the node (d) becomes a new frequent node so Moment generates the new candidates node (a, d) and (c, d). By node properties Moment know that (a, d) is an infrequent gateway node and (c, d) is a new closed node. By checking the support of the nodes (a), (a, c), and (c), Moment modifies them to closed nodes.

Deleting the oldest transaction in the window

In Figure 3-3, the transaction with tid = 1 is deleted. Like adding the new transaction, Moment updates support and tid_sum of each node in the CET. By checking the support of each node, Moment modifies its node type.

In Figure 3-3, node (c) becomes unpromising gateway node because it is contained by node (a, c) and supports of (c) and (a, c) are the same. Then the sub tree of node (c), (c, d), is deleted. The node (d) becomes new infrequent gateway node.

Moment maintains a huge number of boundary nodes to speed up the procedure of updating CET. The cost for a node to change its type is less. But we find that those boundary nodes are unnecessary overhead. In our proposed algorithm New-Moment, we reduce the number of tree nodes and utilize an efficient structure to store the information of the sliding window.