Experimental Results - Incrementally Fast Updated Frequent Pattern Trees

Experiments were conducted to compare the performance of the batch FP-tree construction algorithm with that of the incremental FUFP-tree maintenance algorithm in processing new transactions. Whenever new transactions arrived, the batch FP-tree construction algorithm merged them into the original database and constructed a new FP-tree from the whole updated database. The incremental FUFP-tree maintenance algorithm instead processed the new transactions incrementally, as described in Section 3.

The experiments were implemented in C++ and run on an Intel x86 PC with a 2.8 GHz processor and 512 MB of main memory, running the Microsoft Windows XP operating system. A real dataset called BMS-POS [22] was used in the experiments; this dataset was also used in the KDD Cup 2000 competition. The BMS-POS dataset contained several years of point-of-sale data from a large electronics retailer. Each transaction in this dataset consisted of all the product categories purchased by a customer at one time. The dataset contained 515,597 transactions over 1,657 items. The maximum transaction length was 164, and the average transaction length was 6.5.
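The summary statistics above (number of transactions, number of distinct items, maximum and average transaction length) can all be gathered in a single pass over the data. The following is only a minimal sketch: it assumes each transaction is already available as a collection of item identifiers, it does not parse the actual BMS-POS file format, and the function name `dataset_stats` and the toy data are hypothetical.

```python
from typing import Hashable, Iterable, Tuple


def dataset_stats(
    transactions: Iterable[Iterable[Hashable]],
) -> Tuple[int, int, int, float]:
    """Return (#transactions, #distinct items, max length, average length)."""
    n = 0
    total_len = 0
    max_len = 0
    items = set()
    for t in transactions:
        t = set(t)              # an item is counted once per transaction
        n += 1
        total_len += len(t)
        max_len = max(max_len, len(t))
        items |= t
    avg_len = total_len / n if n else 0.0
    return n, len(items), max_len, avg_len


# Toy example (hypothetical data, not BMS-POS)
toy = [["tv", "cable"], ["tv"], ["radio", "cable", "battery"]]
print(dataset_stats(toy))  # (3, 4, 3, 2.0)
```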

The first 400,000 transactions of the BMS-POS database were used to construct an initial FP-tree. Successive blocks of 5,000 transactions were then added, one block at a time, as new transactions. The minimum support was set at 4%. The execution times and the numbers of nodes obtained by the batch FP-tree construction algorithm and by the incremental FUFP-tree maintenance algorithm were compared. Figure 18 shows the execution times required by the two algorithms for processing each block of 5,000 new transactions.
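To make the update schedule concrete, the sketch below computes the absolute minimum count implied by the 4% minimum support as the database grows. It assumes that an item counts as large when its count is at least 4% of the current number of transactions, and that five blocks of 5,000 transactions were added (matching the 0 to 25,000 range of the horizontal axis in Figure 18). The names `min_count`, `INITIAL`, `BLOCK` and `N_BLOCKS` are hypothetical.

```python
def min_count(n_transactions: int, min_sup_pct: int) -> int:
    """Smallest integer count c such that c >= min_sup_pct% of n_transactions."""
    # Ceiling division in pure integer arithmetic avoids floating-point rounding.
    return -(-min_sup_pct * n_transactions // 100)


INITIAL = 400_000   # transactions used to build the initial tree
BLOCK = 5_000       # size of each block of new transactions
N_BLOCKS = 5        # assumption: blocks added up to 25,000 new transactions

# (new transactions so far, database size, minimum count) after each block
schedule = []
for k in range(N_BLOCKS + 1):
    db_size = INITIAL + k * BLOCK
    schedule.append((k * BLOCK, db_size, min_count(db_size, 4)))

for new, size, threshold in schedule:
    print(f"{new:>6} new -> |DB| = {size:,}, minimum count = {threshold:,}")
```

Under these assumptions the threshold rises by 200 per block, from 16,000 for the initial 400,000 transactions to 17,000 for 425,000.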

Figure 18: The comparison of the execution time (x-axis: number of transactions, 0 to 25,000; y-axis: execution time in seconds; curves: FP-tree and FUFP-tree)

Figure 18 shows that the proposed approach required much less execution time than the batch FP-tree construction algorithm for handling new transactions. In particular, as the number of transactions in the original database grew, the FUFP-tree maintenance algorithm achieved a greater speed-up.

The FUFP-tree maintenance algorithm may generate a less concise tree than the FP-tree construction algorithm, because the latter builds the tree strictly in the order of the frequent items sorted by descending support. As mentioned above, when an originally small item becomes large due to new transactions, its updated support is usually only slightly above the minimum support. It is thus reasonable to place a newly large item at the end of the Header_Table, and the difference between the FP-tree and FUFP-tree structures should therefore not be significant. To show this effect, Figure 19 compares the numbers of nodes generated by the two algorithms.
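The difference between the two item orders can be illustrated by how each would arrange the Header_Table after an update. This is an illustrative sketch rather than the authors' code: `batch_order` sorts all large items by descending count, as a batch FP-tree construction does, while the hypothetical `fufp_order` assumes that items which remain large keep their previous relative positions, items which became small are dropped, and newly large items are appended at the end. Ordering several newly large items among themselves by count, breaking ties by name, and treating an item as large when its count is at least `min_count`, are further assumptions of the sketch.

```python
from __future__ import annotations


def batch_order(counts: dict[str, int], min_count: int) -> list[str]:
    """FP-tree style: all large items sorted by descending count."""
    large = [item for item, c in counts.items() if c >= min_count]
    # Ties broken by item name so the order is deterministic (an assumption).
    return sorted(large, key=lambda item: (-counts[item], item))


def fufp_order(old_order: list[str], counts: dict[str, int], min_count: int) -> list[str]:
    """FUFP style (sketch): keep surviving items in their old positions,
    drop items that became small, append newly large items at the end."""
    kept = [item for item in old_order if counts.get(item, 0) >= min_count]
    old = set(old_order)
    new_large = [item for item, c in counts.items()
                 if item not in old and c >= min_count]
    new_large.sort(key=lambda item: (-counts[item], item))  # assumption
    return kept + new_large


updated = {"a": 42, "b": 45, "c": 15, "d": 25, "e": 5}   # hypothetical counts
print(batch_order(updated, 20))                  # ['b', 'a', 'd']
print(fufp_order(["a", "b", "c"], updated, 20))  # ['a', 'b', 'd']
```

In this toy case the newly large item d lands in the same last position under both orders; the two orders differ only because a and b swapped ranks, which the FUFP-style order does not re-sort.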

Figure 19: The comparison of the number of nodes (x-axis: number of transactions, 0 to 25,000; y-axis: number of nodes; curves: FP-tree and FUFP-tree)

Figure 19 shows that the FUFP-tree maintenance algorithm generated nearly the same number of nodes as the FP-tree construction algorithm. The effectiveness of the FUFP-tree maintenance algorithm is thus acceptable.
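Why item order affects tree size can be checked directly by building a prefix tree and counting its nodes. The helper below, `count_nodes`, is hypothetical and not the code used in the experiments; it ignores node counts and node-links because they do not change the number of nodes, and the toy database is made up. With the support-sorted order the frequent prefix b is shared by every transaction; with a different order the tree needs one extra node.

```python
from typing import Hashable, Iterable, Sequence


def count_nodes(transactions: Iterable[Iterable[Hashable]],
                order: Sequence[Hashable]) -> int:
    """Number of non-root nodes in the prefix tree built with the given item order."""
    rank = {item: i for i, item in enumerate(order)}
    root: dict = {}
    nodes = 0
    for t in transactions:
        node = root
        # Keep only items that appear in the order (the large items), sorted by rank.
        for item in sorted((x for x in set(t) if x in rank), key=rank.__getitem__):
            if item not in node:
                node[item] = {}
                nodes += 1
            node = node[item]
    return nodes


db = [{"a", "b"}, {"a", "b"}, {"b"}, {"b", "c"}]   # supports: b=4, a=2, c=1
print(count_nodes(db, ["b", "a", "c"]))  # 3 (descending-support order)
print(count_nodes(db, ["a", "b", "c"]))  # 4 (a different order)
```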

6. Conclusion

In this paper, we have proposed the FUFP-tree structure and its maintenance algorithm to efficiently and effectively handle the insertion of new transactions in data mining. The FUFP-tree structure is the same as the FP-tree structure [12] except that the links between parent nodes and their child nodes are bi-directional. In addition, the counts of the sorted frequent items are kept in the Header_Table. These modifications make the tree update process easier.
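A minimal Python sketch of the structure described above follows; it is an illustration, not the authors' implementation, and the names `Node`, `FUFPTree`, `counts`, `heads` and `prefix_path` are hypothetical. Each node keeps a dictionary of children (downward links) and a `parent` pointer (upward link), so parent-child links are bi-directional, and the Header_Table keeps each item's count alongside the head of that item's node-link chain. The `prefix_path` method shows one use of the upward links: climbing from any node to the root without searching from the top.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass(eq=False)
class Node:
    item: Optional[str]                                        # None for the root
    count: int = 0
    parent: Optional[Node] = field(default=None, repr=False)   # upward link
    children: dict[str, Node] = field(default_factory=dict, repr=False)  # downward links
    next_same_item: Optional[Node] = field(default=None, repr=False)     # node-link chain


class FUFPTree:
    def __init__(self, order: Iterable[str]):
        self.root = Node(None)
        self.order = list(order)                        # Header_Table item order
        self.counts = {item: 0 for item in self.order}  # counts kept in the Header_Table
        self.heads: dict[str, Optional[Node]] = {item: None for item in self.order}

    def insert(self, transaction: Iterable[str]) -> None:
        """Insert one transaction, keeping only items listed in the Header_Table."""
        rank = {item: i for i, item in enumerate(self.order)}
        node = self.root
        for item in sorted((x for x in set(transaction) if x in rank),
                           key=rank.__getitem__):
            self.counts[item] += 1
            child = node.children.get(item)
            if child is None:
                child = Node(item, parent=node)
                node.children[item] = child
                child.next_same_item = self.heads[item]  # prepend to node-link chain
                self.heads[item] = child
            child.count += 1
            node = child

    def prefix_path(self, node: Node) -> list[str]:
        """Items on the path from the root down to node's parent, via parent links."""
        path = []
        node = node.parent
        while node is not None and node.item is not None:
            path.append(node.item)
            node = node.parent
        return path[::-1]


tree = FUFPTree(["b", "a", "c"])
for t in [{"a", "b"}, {"a", "b"}, {"b"}, {"b", "c"}]:
    tree.insert(t)
c_node = tree.heads["c"]
print(tree.counts)               # {'b': 4, 'a': 2, 'c': 1}
print(tree.prefix_path(c_node))  # ['b']
```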

When new transactions are added, the proposed incremental maintenance algorithm processes them to maintain the FUFP-tree. It first partitions the items into four parts according to whether they are large or small in the original database and in the new transactions. Each part is then processed in its own way, and the Header_Table and the FUFP-tree are updated whenever necessary. Inserting a newly large item at the end of the Header_Table is reasonable because, when an originally small item becomes large due to new transactions, its updated support is usually only slightly above the minimum support.
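The four-way partition can be sketched as follows, with D the original database and T the block of new transactions. The function names, the integer-percent threshold `min_sup_pct`, the case numbering, and the rule that an item is large when its count is at least the minimum support times the number of transactions are assumptions of this sketch. It also illustrates why the parts are handled differently: an item large in both D and T is necessarily large in D + T, and an item small in both is necessarily small, so only the two mixed cases need the updated count to decide.

```python
def partition_items(orig_counts, new_counts, n_orig, n_new, min_sup_pct):
    """Split items into four cases by large/small status in D and in T."""
    def large(count, n):
        return 100 * count >= min_sup_pct * n   # integer test, no rounding

    cases = {1: [], 2: [], 3: [], 4: []}
    for item in set(orig_counts) | set(new_counts):
        in_d = large(orig_counts.get(item, 0), n_orig)
        in_t = large(new_counts.get(item, 0), n_new)
        if in_d and in_t:
            cases[1].append(item)   # large in D, large in T: stays large
        elif in_d:
            cases[2].append(item)   # large in D, small in T: must be checked
        elif in_t:
            cases[3].append(item)   # small in D, large in T: must be checked
        else:
            cases[4].append(item)   # small in D, small in T: stays small
    return {k: sorted(v) for k, v in cases.items()}


def large_after_update(c_orig, c_new, n_orig, n_new, min_sup_pct):
    """Whether an item is large in the updated database D + T."""
    return 100 * (c_orig + c_new) >= min_sup_pct * (n_orig + n_new)


orig = {"a": 30, "b": 25, "c": 5, "d": 2}   # hypothetical counts in D, |D| = 100
new = {"a": 5, "b": 0, "c": 4, "d": 1}      # hypothetical counts in T, |T| = 10
print(partition_items(orig, new, 100, 10, 20))
# {1: ['a'], 2: ['b'], 3: ['c'], 4: ['d']}
for item in "abcd":
    print(item, large_after_update(orig[item], new[item], 100, 10, 20))
```

In this toy run the case-2 item b stays large (25 of 110 is above 20%) and the case-3 item c becomes small (9 of 110 is below 20%).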

Experimental results also show that the proposed FUFP-tree maintenance algorithm runs faster than the batch FP-tree construction algorithm for handling new transactions and generates a tree with nearly the same number of nodes as the FP-tree algorithm. The proposed approach thus achieves a good trade-off between execution time and tree complexity.

The FP-growth mining procedure, originally used for mining from the FP-tree, can also be applied to mining from the FUFP-tree. Both the FP-tree and FUFP-tree structures allow the FP-growth procedure to easily mine the desired rules for only specified items. In this case, maintaining the tree structures is especially important. In the future, we will investigate other issues in incremental mining.

References

[1] R. Agrawal, T. Imielinski and A. Swami, “Mining association rules between sets of items in large databases,” The ACM SIGMOD Conference, pp. 207-216, Washington DC, USA, 1993.

[2] R. Agrawal, T. Imielinski and A. Swami, “Database mining: a performance perspective,” IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 6, pp. 914-925, 1993.

[3] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” The International Conference on Very Large Data Bases, pp. 487-499, 1994.

[4] R. Agrawal and R. Srikant, “Mining sequential patterns,” The Eleventh IEEE International Conference on Data Engineering, pp. 3-14, 1995.

[5] R. Agrawal, R. Srikant and Q. Vu, “Mining association rules with item constraints,” The Third International Conference on Knowledge Discovery in Databases and Data Mining, pp. 67-73, Newport Beach, California, 1997.

[6] M.S. Chen, J. Han and P.S. Yu, “Data mining: An overview from a database perspective,” IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, pp. 866-883, 1996.

[7] D.W. Cheung, J. Han, V.T. Ng and C.Y. Wong, “Maintenance of discovered association rules in large databases: An incremental updating approach,” The Twelfth IEEE International Conference on Data Engineering, pp. 106-114, 1996.

[8] D.W. Cheung, S.D. Lee and B. Kao, “A general incremental technique for maintaining discovered association rules,” Proceedings of Database Systems for Advanced Applications, pp. 185-194, Melbourne, Australia, 1997.

[9] C.I. Ezeife, “Mining incremental association rules with generalized FP-tree,” Proceedings of the 15th Conference of the Canadian Society for Computational Studies of Intelligence on Advances in Artificial Intelligence, pp. 147-160, 2002.

[10] T. Fukuda, Y. Morimoto, S. Morishita and T. Tokuyama, “Mining optimized association rules for numeric attributes,” The ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 182-191, 1996.

[11] J. Han and Y. Fu, “Discovery of multiple-level association rules from large databases,” The Twenty-first International Conference on Very Large Data Bases, pp. 420-431, Zurich, Switzerland, 1995.

[12] J. Han, J. Pei and Y. Yin, “Mining frequent patterns without candidate generation,” The 2000 ACM SIGMOD International Conference on Management of Data, 2000.

[13] M.Y. Lin and S.Y. Lee, “Incremental update on sequential patterns in large databases,” The Tenth IEEE International Conference on Tools with Artificial Intelligence, pp. 24-31, 1998.

[14] H. Mannila, H. Toivonen and A.I. Verkamo, “Efficient algorithms for discovering association rules,” The AAAI Workshop on Knowledge Discovery in Databases, pp. 181-192, 1994.

[15] J.S. Park, M.S. Chen and P.S. Yu, “Using a hash-based method with transaction trimming for mining association rules,” IEEE Transactions on Knowledge and Data Engineering, Vol. 9, No. 5, pp. 812-825, 1997.

[16] Y. Qiu, Y.J. Lan and Q.S. Xie, “An improved algorithm of mining from FP-tree,” Proceedings of the Third International Conference on Machine Learning and Cybernetics, pp. 26-29, Shanghai, August 2004.

[17] N.L. Sarda and N.V. Srinivas, “An adaptive algorithm for incremental mining of association rules,” The Ninth International Workshop on Database and Expert Systems, pp. 240-245, 1998.

[18] R. Srikant and R. Agrawal, “Mining generalized association rules,” The Twenty-first International Conference on Very Large Data Bases, pp. 407-419, Zurich, Switzerland, 1995.

[19] R. Srikant and R. Agrawal, “Mining quantitative association rules in large relational tables,” The 1996 ACM SIGMOD International Conference on Management of Data, pp. 1-12, Montreal, Canada, 1996.

[20] S. Zhang, “Aggregation and maintenance for database mining,” Intelligent Data Analysis, Vol. 3, No. 6, pp. 475-490, 1999.

[21] O.R. Zaiane and E.H. Mohammed, “COFI-tree mining: A new approach to pattern growth with reduced candidacy generation,” IEEE International Conference on Data Mining, 2003.

[22] Z. Zheng, R. Kohavi and L. Mason, “Real world performance of association rule algorithms,” The International Conference on Knowledge Discovery and Data Mining, pp. 401-406, 2001.
