Designing incremental mining algorithms that effectively reuse previously mined information to reduce the cost of knowledge maintenance is both important and useful. To reduce the number of frequent itemsets that must be kept, we have used the concept of closed itemsets to develop more efficient, scalable, and practical approaches for maintaining and compressing association rules.
In the first part of this thesis, we have shown that there is no intuitive translation from incremental frequent itemset mining to incremental frequent closed itemset mining, and we have divided the closed itemsets in the updated database into several portions.
We have shown that the frequent closed itemsets in the updated database can be generated from two candidate sets: the closed original frequent itemsets and the closed potentially frequent itemsets. We have also described a special set, the intersectional closed itemsets, which collects the closed itemsets that appear only in the updated database. Some frequent closed itemsets that belong to the intersectional closed itemsets are difficult to determine, because they were previously closed by (that is, absorbed into) other closed itemsets.
We have also shown the relations among the closed original itemsets, the closed potentially frequent itemsets, and the intersectional closed itemsets.
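The difficulty summarized above can be illustrated with a small brute-force sketch. It is not the CIM algorithm; the function names, the toy transactions, and the minimum support count of 2 are assumptions chosen only for illustration. It recomputes the frequent closed itemsets before and after an update and shows an itemset that becomes closed only in the updated database.

```python
from itertools import combinations

def support_counts(transactions):
    """Count the support of every non-empty itemset contained in some transaction (brute force)."""
    counts = {}
    for t in transactions:
        items = sorted(t)
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return counts

def frequent_closed_itemsets(transactions, min_count):
    """Return {itemset: support} for itemsets that are frequent and closed.

    An itemset is closed when no proper superset has the same support.
    """
    counts = support_counts(transactions)
    closed = {}
    for itemset, count in counts.items():
        if count < min_count:
            continue
        absorbed = any(itemset < other and counts[other] == count for other in counts)
        if not absorbed:
            closed[itemset] = count
    return closed

# Hypothetical toy data: an original database and one inserted transaction.
original = [{"a", "b", "c"}, {"a", "b", "c"}]
increment = [{"a", "b", "d"}]

before = frequent_closed_itemsets(original, min_count=2)
after = frequent_closed_itemsets(original + increment, min_count=2)

print("before:", {tuple(sorted(k)): v for k, v in before.items()})
print("after: ", {tuple(sorted(k)): v for k, v in after.items()})
```

In this toy case {a, b} is not closed in the original database because {a, b, c} has the same support, but it becomes closed after the update, which is the kind of itemset that cannot be found by looking at the previously stored closed itemsets alone.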
In the second part of this thesis, to avoid a huge comparison cost, CIM uses the branch update strategy to generate the closed original frequent itemsets and makes full use of the closed original itemsets to generate the closed potentially frequent itemsets. Finally, we have used the concept of pre-large itemsets to develop the CIM-P algorithm, which further reduces the number of closed potentially frequent itemsets. We have used two strategies to improve the utility of the buffer. The first is a bucketing strategy that uses buckets to record the actual contributions of d for the major itemsets in the pre-large set (the itemsets with higher supports), so that the consumption of the buffer can be calculated tightly from the maximum value of the buckets. The second uses the infrequent 1-items to prune the itemsets that must remain infrequent. Together, these two strategies enhance the utility of the buffer used in our CIM-P algorithm.
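As a rough illustration of two of the ideas mentioned here, the sketch below classifies an itemset as large, pre-large, or small using an upper and a lower support threshold, and prunes candidates that contain an infrequent 1-item. It is not the CIM-P implementation: the function names, the threshold values, and the choice of the lower threshold as the cut-off for infrequent 1-items are assumptions made for this example, and the bucketing strategy is not modeled.

```python
def classify(count, total, lower, upper):
    """Classify an itemset by its support ratio (assumes lower <= upper)."""
    ratio = count / total
    if ratio >= upper:
        return "large"
    if ratio >= lower:
        return "pre-large"
    return "small"

def prune_by_infrequent_items(candidates, item_counts, total, lower):
    """Drop candidates containing an item whose own support is below `lower`.

    Support is anti-monotone, so such a candidate cannot reach the threshold.
    Using `lower` as the cut-off is an assumption of this sketch.
    """
    infrequent = {item for item, c in item_counts.items() if c / total < lower}
    return [c for c in candidates if not (set(c) & infrequent)]

# Hypothetical numbers: 10 transactions, lower threshold 0.2, upper threshold 0.5.
total = 10
labels = [classify(c, total, lower=0.2, upper=0.5) for c in (6, 3, 1)]

item_counts = {"a": 6, "b": 3, "c": 1}
candidates = [("a", "b"), ("a", "c"), ("b", "c")]
kept = prune_by_infrequent_items(candidates, item_counts, total, lower=0.2)

print(labels)
print(kept)
```

Only the candidate made entirely of items that meet the lower threshold survives the pruning step, so fewer itemsets need to be tracked in the buffer.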