The experimental results show that different values of the data dependency parameter w yield
the same large itemsets but differ in predictive quality. When w=1, the non-promising candidate
sets are predicted very well but the promising candidate sets are predicted poorly; the opposite
holds when w=0. By default, we set w=0.5. If the data dependency relationships in transactions
are well utilized, our method can improve the overall performance of finding large itemsets.
In our experiments, both the (n, p) algorithm and ours suffer from the inefficiency of
generating C_{i+1}^+ from C_i^+. When the dataset contains many items, e.g., the 25 items in
D1-D6, and more levels of transactions must be considered, both algorithms require more
computation. However, our method predicts itemsets more accurately and performs better than
the (n, p) algorithm, especially when p>2.
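To illustrate why this generation step becomes costly as the number of items grows, the
following is a minimal sketch of the classical Apriori join-and-prune step. It is not the
exact procedure used by the (n, p) algorithm or by ours: the function name
generate_candidates and the representation of itemsets as frozensets are assumptions made
for illustration only.

```python
from itertools import combinations


def generate_candidates(prev_level):
    """Build size-(k+1) candidates from size-k itemsets (illustrative sketch).

    prev_level: iterable of itemsets, all of the same size k.
    Returns a set of frozensets of size k+1.
    """
    prev = {frozenset(s) for s in prev_level}
    if not prev:
        return set()
    k_next = len(next(iter(prev))) + 1
    candidates = set()
    # Join: every pair of itemsets is examined, so the work grows
    # quadratically with the number of itemsets at the previous level.
    for a, b in combinations(prev, 2):
        union = a | b
        if len(union) != k_next:
            continue
        # Prune: keep the candidate only if all of its k-subsets
        # appear in the previous level.
        if all(frozenset(sub) in prev
               for sub in combinations(union, k_next - 1)):
            candidates.add(union)
    return candidates


if __name__ == "__main__":
    level2 = [{"A", "B"}, {"A", "C"}, {"B", "C"}, {"B", "D"}]
    print(generate_candidates(level2))
    # With 25 single items, the join alone yields 300 two-item candidates.
    singles = [{i} for i in range(25)]
    print(len(generate_candidates(singles)))
```

In this sketch, 25 single items already produce 300 two-item candidates before any support
counting, which suggests why both algorithms need more computation on datasets with many items.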
In this paper, we have presented a mining algorithm that combines the advantages of the
Apriori and the (n, p) algorithms in finding large itemsets. Like the (n, p) algorithm, our
algorithm reduces the number of dataset scans needed to find p levels of large itemsets. Our
method also includes a new parameter that accounts for data dependency, so that itemsets
likely to have low support are filtered out early, which improves computational
efficiency.
We also conclude that the three algorithms are competitive with one another, each achieving
the best performance on different types of datasets. Further study is needed on how to tune
the parameters, such as n, p, and the transaction threshold in the (n, p) algorithm and w and
t in ours, before the mining task is performed.
Acknowledgement
The authors would like to thank Mr. Tsung-Te of the Department of Information
Management, Shu-Te University, Taiwan, for his help in conducting the experiments.