Discussions and Conclusions - An Improved Data Mining Approach Using Predictive Itemsets

The experimental results show that different values of the data dependency parameter w yield the same large itemsets but different predictive effects. When w = 1, the non-promising candidate sets are predicted very well but the promising candidate sets are predicted poorly; the reverse holds for w = 0. By default, we set w = 0.5. If the data dependency relationships in transactions are well utilized, our method can improve the overall performance of finding large itemsets.

In our experiments, both the (n, p) algorithm and ours suffer from the inefficiency of generating $C^{+}_{i+1}$ from $C^{+}_{i}$. When the dataset contains many items (e.g., 25 items in D1 to D6) and more levels of transactions must be considered, both algorithms require more computation. However, our method predicts itemsets more accurately and achieves better performance than the (n, p) algorithm, especially when p > 2.
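To make the cost of moving from one candidate level to the next concrete, the following is a minimal sketch of the standard Apriori-style join-and-prune step. It is an illustration only: the function name `generate_next_candidates` is hypothetical, and the construction of $C^{+}_{i+1}$ in the two algorithms compared here may differ in detail.

```python
from itertools import combinations


def generate_next_candidates(prev_itemsets):
    """Build (k+1)-item candidates from a collection of k-itemsets.

    Standard Apriori join-and-prune, shown as an assumption-laden sketch;
    it is not claimed to be the exact generation step used in the paper.
    """
    prev = {frozenset(s) for s in prev_itemsets}
    if not prev:
        return set()
    k = len(next(iter(prev)))
    candidates = set()
    # Every pair of k-itemsets is compared, so the work grows
    # quadratically with the number of itemsets at the current level.
    for a, b in combinations(prev, 2):
        union = a | b
        # Join step: keep pairs that differ in exactly one item.
        if len(union) != k + 1:
            continue
        # Prune step: every k-subset of the union must already be in prev.
        if all(frozenset(sub) in prev for sub in combinations(union, k)):
            candidates.add(union)
    return candidates


level2 = [{"a", "b"}, {"a", "c"}, {"b", "c"}, {"b", "d"}]
level3 = generate_next_candidates(level2)
```

In this small example only {a, b, c} survives the prune step, because {a, d} and {c, d} are missing from the previous level. The pairwise loop is where the cost accumulates: with 25 items there are already up to 300 possible 2-itemsets to pair against each other.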

In this paper, we have presented a mining algorithm that combines the advantages of the Apriori and (n, p) algorithms in finding large itemsets. Like the (n, p) algorithm, our algorithm reduces the number of dataset scans needed to find p levels of large itemsets. Our method also includes a new parameter that accounts for data dependency, so that itemsets likely to have low support are filtered out early, which improves computational efficiency.
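The saving in dataset scans can be pictured with a short sketch: if candidates for several levels are predicted in advance, their supports can all be counted in one pass over the transactions. The helper names `count_supports_one_scan` and `large_itemsets`, and the use of an absolute count for `min_support`, are assumptions made for illustration; the prediction step itself is not reproduced here.

```python
from collections import Counter


def count_supports_one_scan(transactions, candidates):
    """Count the support of candidates of any size in a single pass."""
    cands = [frozenset(c) for c in candidates]
    counts = Counter()
    for transaction in transactions:  # the only pass over the dataset
        items = set(transaction)
        for cand in cands:
            if cand <= items:  # candidate is contained in the transaction
                counts[cand] += 1
    return counts


def large_itemsets(transactions, candidates, min_support):
    """Keep candidates whose support count reaches min_support."""
    counts = count_supports_one_scan(transactions, candidates)
    return {cand for cand, n in counts.items() if n >= min_support}


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
# Candidates from three different levels, checked in the same scan.
candidates = [{"a"}, {"a", "b"}, {"a", "b", "c"}]
result = large_itemsets(transactions, candidates, min_support=2)
```

Here the 1-, 2-, and 3-itemset candidates are verified together, whereas a level-wise approach would scan the dataset once per level. The trade-off is that poorly predicted candidates add counting work inside the single scan, which is why early filtering of likely low-support itemsets matters.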

We also conclude that the three algorithms are competitive with one another, each achieving the best performance on different types of datasets. Further study is needed on how to tune the parameters, such as n, p, and the transaction threshold in the (n, p) algorithm and w and t in ours, before the mining task is performed.

Acknowledgement

The authors would like to thank Mr. Tsung-Te of the Department of Information Management, Shu-Te University, Taiwan, for his help in conducting the experiments.
