Experimental Results of Incremental Utility Mining Algorithms

CHAPTER 6 Experimental Results

6.2 Experimental Results of Incremental Utility Mining Algorithms

The proposed incremental two-phase average-utility mining algorithm (ITPAU) was then compared with the two-phase average-utility mining algorithm (TPAU), a batch utility mining algorithm. The high upper-bound average-utility itemsets and the high average-utility itemsets of the original database were recorded for incremental mining. The number of newly inserted transactions was set at 100 each time. The numbers of original transactions were set at 1000, 5000, 10000, 15000 and 20000, respectively, to show the effect of different database sizes. The same datasets, including the inserted transactions, were also processed by the batch utility mining algorithm (TPAU). The original minimum average-utility thresholds were set at 0.01% (Low), 0.05% (Medium) and 0.09% (High) of the total utility, respectively, to show the effect of different minimum average-utility thresholds.
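As a minimal sketch of the threshold setting described above, the three percentages can be converted into absolute utility values as shown here. The function name `absolute_thresholds` and the sample total utility are hypothetical and serve only as an illustration; they are not taken from the experiments.

```python
# Threshold levels used in the experiments, as percentages of the total utility.
THRESHOLD_PERCENTS = {"Low": 0.01, "Medium": 0.05, "High": 0.09}


def absolute_thresholds(total_utility, percents=THRESHOLD_PERCENTS):
    """Convert percentage thresholds into absolute minimum average-utility values.

    `total_utility` is assumed to be the sum of the utilities of all items
    in all transactions of the database.
    """
    return {name: total_utility * pct / 100.0 for name, pct in percents.items()}


if __name__ == "__main__":
    # Hypothetical total utility, chosen only to make the arithmetic easy to follow.
    for level, value in absolute_thresholds(1_000_000).items():
        print(level, value)
```

With a hypothetical total utility of 1,000,000, the Low, Medium and High settings correspond to absolute thresholds of about 100, 500 and 900, respectively.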

Figure 6-3 shows the execution time of ITPAU vs. TPAU on different numbers of transactions, with the threshold set at 0.01% of the total utility.

Figure 6-3: The execution time of ITPAU vs. TPAU on different numbers of transactions (threshold=0.01%).

Figure 6-4 shows the execution time of ITPAU vs. TPAU on different numbers of transactions, with the threshold set at 0.05% of the total utility.

Figure 6-4: The execution time of ITPAU vs. TPAU on different numbers of transactions (threshold=0.05%).

Figure 6-5 shows the execution time of ITPAU vs. TPAU on different numbers of transactions, with the threshold set at 0.09% of the total utility.

Figure 6-5: The execution time of ITPAU vs. TPAU on different numbers of transactions (threshold=0.09%).

From the three figures, we can observe that when the number of original transactions was small, the execution time of ITPAU was close to that of TPAU. As the number of original transactions increased, however, the execution time of TPAU increased considerably, whereas that of ITPAU increased only slightly, so the gap between the two algorithms became more apparent. The execution time of ITPAU was less than that of TPAU across the tested numbers of transactions and minimum average-utility thresholds. The reason is that the mined results from the original transactions were recorded. Most of the execution time of ITPAU was spent on updating the upper-bound values of the high upper-bound average-utility itemsets and checking them against the minimum threshold, so the time spent scanning the database was substantially reduced. Thus, the total execution time of ITPAU was less than that of TPAU for the updated database.
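The reason given above can be illustrated with a rough sketch of an insertion update. It is not the ITPAU implementation; it only shows why recorded upper-bound values let most of the work touch the new transactions alone. Several points are assumptions made for illustration: transactions are dictionaries mapping items to utilities, the upper bound of an itemset is taken as the sum of the maximum item utility of each transaction containing it, and the names `incremental_update`, `ub_in` and the parameter `new_itemsets` are hypothetical.

```python
def contains(transaction, itemset):
    """Return True if every item of the itemset occurs in the transaction."""
    return all(item in transaction for item in itemset)


def ub_in(itemset, transactions):
    """Assumed upper bound: sum of the maximum item utility of each
    transaction that contains the itemset."""
    return sum(max(t.values()) for t in transactions if contains(t, itemset))


def incremental_update(recorded_ub, old_total, new_transactions, ratio,
                       new_itemsets=()):
    """Update recorded upper-bound values after inserting new transactions.

    recorded_ub  -- {itemset: upper bound} recorded from the original database
    old_total    -- total utility of the original database
    ratio        -- minimum threshold as a fraction of the total utility
    new_itemsets -- itemsets occurring in the new transactions that should be
                    checked (hypothetical parameter of this sketch)
    """
    new_total = sum(sum(t.values()) for t in new_transactions)
    updated_threshold = ratio * (old_total + new_total)
    new_threshold = ratio * new_total

    # Recorded itemsets: only the new transactions are scanned.
    still_high = {}
    for itemset, old_ub in recorded_ub.items():
        ub = old_ub + ub_in(itemset, new_transactions)
        if ub >= updated_threshold:
            still_high[itemset] = ub

    # Unrecorded itemsets can only become high if they are high in the new
    # transactions; only these require a rescan of the original database.
    needs_rescan = [s for s in new_itemsets
                    if s not in recorded_ub
                    and ub_in(s, new_transactions) >= new_threshold]
    return still_high, needs_rescan, updated_threshold
```

In this sketch, an itemset that was below the threshold in the original database and is also below it in the new transactions cannot reach the updated threshold, so it is skipped without touching the original data. The original database is consulted only for the itemsets returned in `needs_rescan`.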

CHAPTER 7 Discussion and Conclusion

This thesis defines a new mining measure called the average utility and proposes three algorithms to discover high average-utility itemsets. The first algorithm discovers high average-utility itemsets from static databases in a batch way and is divided into two phases. In phase I, it overestimates the utility of itemsets in order to maintain the “downward-closure” property, which is then used to prune unpromising itemsets efficiently, level by level. In phase II, one database scan is needed to determine the actual high average-utility itemsets from the candidate itemsets generated in phase I. Since the number of candidate itemsets is greatly reduced compared with that of the traditional approaches, a lot of computational time may be saved. Because the length of an itemset is a major factor influencing its utility value in the traditional approaches, the “average-utility” measure is useful for removing the influence of length. The proposed concept can thus achieve a trade-off between high utility and time complexity. The experimental results also support the above points.
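The two phases summarized above can be sketched roughly as follows. This is a simplified illustration and not the thesis implementation. It assumes that each transaction is a dictionary mapping an item to its utility in that transaction, that the average utility of an itemset in a transaction is the sum of its item utilities divided by the number of items, and that the phase-I overestimate is the sum of the maximum item utility of every transaction containing the itemset. The function names and the sample data are hypothetical.

```python
from itertools import combinations


def contains(transaction, itemset):
    """Return True if every item of the itemset occurs in the transaction."""
    return all(item in transaction for item in itemset)


def average_utility(itemset, database):
    """Actual average utility: per transaction, summed item utilities divided
    by the itemset length, added up over the transactions containing it."""
    return sum(sum(t[i] for i in itemset) / len(itemset)
               for t in database if contains(t, itemset))


def upper_bound(itemset, database):
    """Overestimate used in phase I (assumed form): sum of the maximum item
    utility of each transaction containing the itemset. It never increases
    when the itemset grows, which gives the downward-closure property."""
    return sum(max(t.values()) for t in database if contains(t, itemset))


def two_phase(database, threshold):
    # Phase I: level-wise candidate generation pruned by the upper bound.
    items = sorted({i for t in database for i in t})
    level = [(i,) for i in items if upper_bound((i,), database) >= threshold]
    candidates = []
    while level:
        candidates.extend(level)
        kept = set(level)
        next_level = set()
        for a in level:
            for b in level:
                # Join two itemsets sharing all but their last item.
                if a[:-1] == b[:-1] and a[-1] < b[-1]:
                    cand = a + (b[-1],)
                    # Downward closure: every sub-itemset must have survived.
                    if all(s in kept for s in combinations(cand, len(cand) - 1)):
                        next_level.add(cand)
        level = sorted(c for c in next_level
                       if upper_bound(c, database) >= threshold)

    # Phase II: one more pass to compute the actual average utilities.
    result = {}
    for c in candidates:
        au = average_utility(c, database)
        if au >= threshold:
            result[c] = au
    return result


if __name__ == "__main__":
    # Hypothetical toy database of four transactions.
    db = [{"A": 6, "B": 2}, {"A": 3, "C": 9},
          {"B": 4, "C": 3}, {"A": 6, "B": 2, "C": 3}]
    print(two_phase(db, threshold=10))
```

On the toy database with a threshold of 10, phase I keeps six candidates (A, B, C, AB, AC, BC) and prunes ABC, whose upper bound is 6. Phase II then retains A and C with an average utility of 15 each and AC with 10.5.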

The second and third algorithms are proposed to maintain the discovered high average-utility itemsets in an incremental way. They can handle databases that change through newly inserted records and deleted old records. The two algorithms adopt the concept of the FUP algorithm to reduce the time spent re-processing the original databases. The experiments show that the two proposed incremental utility mining algorithms are effective in maintaining the discovered high average-utility itemsets under record insertion and deletion. In the future, we will try to use appropriate data structures to further improve the execution time of the proposed algorithms.

