Evaluation on A Real Dataset - Projection-based Weighted Sequential Pattern Mining with Improve

4.3.4 Evaluation on A Real Dataset

The real dataset Foodmart was also used to evaluate the performance of the three algorithms under different parameter settings. Figures 4.6 to 4.9 show the differences in the number of weighted frequent upper-bound patterns and in the execution time of the three algorithms as the minimum weighted support threshold varies from 9.8% to 9%.

Figure 4.6: Comparison of numbers of weighted frequent upper-bound patterns needed by the three algorithms under different minimum weighted support thresholds.

Figure 4.7: Pruning effect of the proposed algorithms under different minimum weighted support thresholds.

Figure 4.8: Execution efficiency of all the algorithms under different thresholds.

Figure 4.9: Pruning effect of PWSI under different thresholds.

As can be seen, the proposed PWSI algorithm outperformed the other two algorithms on the real dataset in terms of both the number of candidate subsequences and execution efficiency. Its advantage became more pronounced as the minimum weighted support threshold decreased.

CHAPTER 5 Conclusions and Future Works

Weighted mining has recently been applied to find significant patterns in data because of its practical applications. The main reason is that each item in a transaction can be given a weight according to relevant information about the item, such as its cost or profit.

Unlike traditional data mining, weighted data mining can find interesting knowledge in data whose items differ in significance. Its major challenge is that the downward-closure property of traditional mining no longer holds when the actual weight values of items in transactions are used. To handle this, an upper-bound model based on the maximum weight in a database was previously designed to restore the downward-closure property in weighted data mining. The traditional upper-bound model, however, may generate many unpromising candidates during mining. In this thesis, we observe that the maximum weight in a sequence is a more suitable upper bound than the maximum weight in the whole sequence database. We thus propose new upper-bound models and efficient mining algorithms for finding weighted frequent itemsets and weighted sequential patterns, respectively.
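The difference between the two upper bounds can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the item weights, the toy sequence database, and all function names are hypothetical, the weight of a pattern is assumed to be the average weight of its items, and the weighted support is left as an unnormalized count. The definitions used in the thesis itself may differ in detail.

```python
from typing import Dict, List, Sequence

# Hypothetical item weights and toy sequence database (not taken from the thesis).
WEIGHTS: Dict[str, float] = {"a": 0.9, "b": 0.4, "c": 0.2, "d": 0.1}
DATABASE: List[List[str]] = [
    ["a", "b", "c"],
    ["b", "c", "d"],
    ["c", "d"],
    ["b", "d", "c"],
]


def contains(sequence: Sequence[str], pattern: Sequence[str]) -> bool:
    """Return True if pattern is a subsequence of sequence (order kept, gaps allowed)."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def support_count(pattern: Sequence[str]) -> int:
    """Number of sequences in the database that contain the pattern."""
    return sum(contains(s, pattern) for s in DATABASE)


def actual_weighted_count(pattern: Sequence[str]) -> float:
    """Assumed measure: average item weight of the pattern times its support count."""
    average = sum(WEIGHTS[i] for i in pattern) / len(pattern)
    return average * support_count(pattern)


def database_max_bound(pattern: Sequence[str]) -> float:
    """Traditional bound: every occurrence is charged the maximum weight in the database."""
    db_max = max(WEIGHTS[i] for s in DATABASE for i in s)
    return db_max * support_count(pattern)


def sequence_max_bound(pattern: Sequence[str]) -> float:
    """Tighter bound: each occurrence is charged the maximum weight of its own sequence."""
    return sum(max(WEIGHTS[i] for i in s) for s in DATABASE if contains(s, pattern))


pattern = ["b", "c"]
print(actual_weighted_count(pattern))  # actual value
print(sequence_max_bound(pattern))     # sequence-based upper bound
print(database_max_bound(pattern))     # database-based upper bound
```

For the pattern (b, c) in this toy database, the actual value is about 0.9, the sequence-based bound about 1.7, and the database-based bound about 2.7. Both bounds can only shrink when a pattern is extended, because an extended pattern is contained in a subset of the sequences, so both keep the downward-closure property; the sequence-based bound is simply never larger than the database-based one.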

For weighted frequent itemset mining, we developed an improved upper-bound model in which the maximum weight in a sequence is adopted to build a new downward-closure property, further tightening the upper bounds of the weight values for itemsets.

In particular, two effective improvement strategies are designed for the model and adopted to prune more unpromising candidates in the mining process. Based on the model and strategies, we also propose effective projection-based algorithms that achieve better performance in finding weighted frequent itemsets from a set of transactions. In addition, the model and strategies used in weighted frequent itemset mining are extended to weighted sequential pattern mining, where they likewise perform well in terms of both pruning effect and execution efficiency.
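How an upper bound of this kind can drive pruning in a projection-based miner is sketched below in Python. This is a generic prefix-projection pattern-growth loop, not the PWSI algorithm itself, and it does not reproduce the two improvement strategies of the thesis. The function name `mine`, the toy data, the threshold, and the unnormalized weighted-count measure (average item weight times support count) are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]


def mine(database: List[List[str]],
         weights: Dict[str, float],
         min_weighted_count: float) -> Dict[Pattern, float]:
    """Hypothetical pattern-growth miner pruned by a sequence-maximum-weight bound."""
    # Maximum item weight of each sequence, computed once.
    seq_max = [max(weights[i] for i in s) for s in database]
    results: Dict[Pattern, float] = {}

    def grow(prefix: Pattern, projected: List[Tuple[int, int]]) -> None:
        # projected holds (sequence id, start of the suffix after the prefix match).
        occurrences: Dict[str, List[Tuple[int, int]]] = {}
        for sid, start in projected:
            seen = set()
            for pos in range(start, len(database[sid])):
                item = database[sid][pos]
                if item not in seen:  # keep only the earliest occurrence
                    seen.add(item)
                    occurrences.setdefault(item, []).append((sid, pos + 1))
        for item, next_projected in sorted(occurrences.items()):
            # Upper bound: sum of sequence maximum weights over supporting sequences.
            upper = sum(seq_max[sid] for sid, _ in next_projected)
            if upper < min_weighted_count:
                continue  # no extension of this pattern can reach the threshold
            pattern = prefix + (item,)
            average = sum(weights[i] for i in pattern) / len(pattern)
            actual = average * len(next_projected)
            if actual >= min_weighted_count:
                results[pattern] = actual
            grow(pattern, next_projected)  # extend even if the pattern itself fails

    grow((), [(sid, 0) for sid in range(len(database))])
    return results


# Toy data (hypothetical).
weights = {"a": 0.9, "b": 0.4, "c": 0.2, "d": 0.1}
database = [["a", "b", "c"], ["b", "c", "d"], ["c", "d"], ["b", "d", "c"]]
print(mine(database, weights, min_weighted_count=0.85))
```

On this toy data the pattern (c) falls below the threshold while its extension (b, c) reaches it, which shows why the actual weighted values cannot be used for pruning directly. The miner therefore prunes only on the upper bound and checks the actual value separately before reporting a pattern.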

Finally, the experimental results show that the number of unpromising upper-bound candidates required by the proposed improved models is clearly smaller than that required by the traditional models under various parameter settings. With the models and strategies, the proposed algorithms also run faster than the existing algorithms on both the synthetic databases generated by the public IBM data generator and the real public database Foodmart.

In the future, the models and strategies proposed and developed in this thesis might be extended to other mining issues, such as data stream mining, closed frequent itemset mining, and temporal data mining. Moreover, the existing approaches for weighted data mining cannot be applied to a centralized database with multiple data sources in a chain-store environment. For these issues, we will also attempt to handle the maintenance problem of weighted data mining when transactions or sequences are inserted, deleted, or modified.
