Evaluation on a Real Dataset - A Study of Fuzzy Data Mining Methods Based on Projection Techniques

A real dataset, BMS-POS [11], was used to evaluate the performance of the three algorithms under different parameter settings. Figure 5.1.6 and Figure 5.1.7 show, respectively, the number of candidate itemsets generated by the algorithms and their execution times as the minimum fuzzy support threshold decreases from 1.00% to 0.20%.

[Figure: execution time (sec.) of PFA, GDF, and FApriori versus D, the number of transactions (100K to 500K); T10I4N4KD200K datasets, min_fsup = 0.02%.]

Figure 5.1.6: Comparison of the number of candidate itemsets generated by the three algorithms under different minimum fuzzy support thresholds, min_fsup.

Figure 5.1.7: Execution time of the three algorithms under different minimum fuzzy support thresholds, min_fsup.

It can be seen that the proposed PFA algorithm performed better than the other two algorithms for the real dataset with regard to the number of candidate itemsets and execution efficiency. The effects were even better when the minimum fuzzy support

[Plot for Figure 5.1.6: number of candidate itemsets (0 to 300,000) versus min_fsup, the minimum fuzzy support threshold (1% to 0.20%); BMSPOS dataset; series PFA, GDF, FApriori.]

[Plot for Figure 5.1.7: execution time in seconds (0 to 3,000) versus min_fsup, the minimum fuzzy support threshold (1% to 0.20%); BMSPOS dataset; series PFA, GDF, FApriori.]

CHAPTER 6 Conclusions and Future Work

Fuzzy data mining has been widely used in many applications because its results are simple and comprehensible to human operators. However, most existing approaches to fuzzy itemset mining adopt level-wise techniques to find fuzzy frequent itemsets in a set of quantitative transactions.

Accordingly, these methods spend considerable time generating a large number of candidates and counting their actual fuzzy counts in transactions. In this thesis, we therefore develop two novel algorithms, the gradual data-reduction fuzzy mining approach (GDF) and the projection-based fuzzy mining algorithm (PFA), to address the problem of fuzzy itemset mining. In particular, two effective strategies, reducing and pruning, are developed to improve the efficiency of finding fuzzy frequent itemsets. Finally, the experimental results show that the proposed algorithms perform well compared with existing algorithms in both pruning effect and execution efficiency under various parameter settings, on synthetic datasets generated by a public IBM data generator and on a public real dataset, BMS-POS.
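To make the cost of the level-wise strategy concrete, the following is a minimal sketch of a generic level-wise fuzzy miner. It is not the GDF or PFA algorithm of this thesis, and it is not claimed to match the FApriori baseline in detail. It assumes that each transaction has already been converted into membership values for fuzzy items of the form (item, region), that the minimum operator is used as the fuzzy intersection, and that an itemset may not contain two regions of the same item. All function names and the toy data are hypothetical.

```python
from itertools import combinations


def fuzzy_support(itemset, transactions):
    """Fuzzy count of an itemset (sum of per-transaction minimum
    membership values) divided by the number of transactions."""
    total = 0.0
    for t in transactions:
        # A fuzzy item missing from a transaction has membership 0.
        total += min(t.get(fi, 0.0) for fi in itemset)
    return total / len(transactions)


def generate_candidates(prev_level):
    """Join frequent k-itemsets that share a (k-1)-prefix, then prune
    candidates that have an infrequent k-subset (downward closure)."""
    prev = set(prev_level)
    out = []
    for a, b in combinations(prev_level, 2):
        if a[:-1] != b[:-1]:
            continue
        if a[-1][0] == b[-1][0]:
            # Assumption: two regions of the same item never co-occur.
            continue
        cand = a + (b[-1],)
        if all(sub in prev for sub in combinations(cand, len(cand) - 1)):
            out.append(cand)
    return out


def level_wise_fuzzy_mining(transactions, min_fsup):
    """Return ({itemset: fuzzy support}, number of candidates counted)."""
    fuzzy_items = sorted({fi for t in transactions for fi in t})
    candidates = [(fi,) for fi in fuzzy_items]
    frequent = {}
    n_candidates = 0
    while candidates:
        n_candidates += len(candidates)
        level = {}
        for cand in candidates:
            # One pass over all transactions per candidate: this is the
            # counting cost that reduction and pruning try to cut down.
            sup = fuzzy_support(cand, transactions)
            if sup >= min_fsup:
                level[cand] = sup
        frequent.update(level)
        candidates = generate_candidates(sorted(level))
    return frequent, n_candidates


# Toy data (hypothetical): membership values after fuzzification.
transactions = [
    {("A", "Low"): 0.8, ("B", "High"): 0.6},
    {("A", "Low"): 0.4, ("B", "High"): 1.0, ("C", "Mid"): 0.5},
    {("A", "High"): 0.7, ("C", "Mid"): 0.9},
    {("A", "Low"): 1.0, ("B", "High"): 0.2},
]

frequent, n_candidates = level_wise_fuzzy_mining(transactions, min_fsup=0.25)
for itemset, sup in sorted(frequent.items()):
    print(itemset, round(sup, 3))
print("candidates counted:", n_candidates)
```

On this toy input the sketch counts seven candidates (four single fuzzy items and three pairs) to find four fuzzy frequent itemsets. Every candidate requires a scan of all transactions, which is why lowering min_fsup inflates both the candidate count and the running time of level-wise methods.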

In future work, we will apply the proposed strategies and approaches to other settings, such as streaming data, sequential pattern mining, and multi-source mining. Moreover, we will attempt to handle the maintenance problem of fuzzy data mining by developing effective strategies for deleted or modified transactions.

REFERENCES

[1] R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large databases,” In Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 207-216, 1993.

[2] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” In Proceedings of the International Conference on Very Large Data Bases, pp. 487-499, 1994.

[3] C. F. Ahmed, S. K. Tanbeer, B. S. Jeong, and Y. K. Lee, “Efficient tree structures for high utility pattern mining in incremental databases,” IEEE Transactions on Knowledge and Data Engineering, Vol. 21, No. 12, pp. 1708-1721, 2009.

[4] J. Alcalá-Fdez, R. Alcalá, M. J. Gacto, and F. Herrera, “Learning the membership function contexts for mining fuzzy association rules by using genetic algorithms,” Fuzzy Sets and Systems, Vol. 160, No. 7, pp. 905-921, 2009.

[5] C. H. Chen, T. P. Hong, and Vincent S. Tseng, “An improved approach to find membership functions and multiple minimum supports in fuzzy data mining,” Expert Systems with Applications, Vol. 36, No. 6, pp. 10016-10024, 2009.

[6] C. H. Chen, T. P. Hong, and Vincent S. Tseng, “Fuzzy data mining for time-series data,” Applied Soft Computing, Vol. 12, No. 1, pp. 536-542, 2012.

[7] Keith C. C. Chan and W. H. Au, “An effective algorithm for mining interesting quantitative association rules,” In Proceedings of the 1997 ACM Symposium on Applied Computing, pp. 88-90, 1997.

[8] C. H. Chen, T. P. Hong, and Vincent S. Tseng, “An improved approach to find membership functions and multiple minimum supports in fuzzy data mining,” Expert Systems with Applications, Vol. 36, No. 6, pp. 10016-10024, 2009.

[9] R. Chan, Q. Yang, and Y. D. Shen, “Mining high utility itemsets,” In Proceedings of the 3rd IEEE International Conference on Data Mining, pp. 19-26, 2003.

[10] Y. L. Chen and Tony C. K. Huang, “A new approach for discovering fuzzy quantitative sequential patterns in sequence databases,” Fuzzy Sets and Systems, Vol. 157, No. 12, pp. 1641-1661, 2006.

[11] Frequent Itemsets Mining Dataset Repository, available at (http://fimi.cs.helsinki.fi/data/).

[12] T. P. Hong, C. S. Kuo, and S. C. Chi, “Mining association rules from quantitative data,” Intelligent Data Analysis, Vol. 3, No. 5, pp. 363-376, 1999.

[13] T. P. Hong, C. S. Kuo, and S. C. Chi, “A data mining algorithm for transaction data with quantitative values,” In Proceedings of the 7th National Conference on Fuzzy Theory and Its Applications, pp. 874-878, 1999.

[14] T. P. Hong, C. S. Kuo, and S. C. Chi, “A fuzzy data mining algorithm for quantitative values,” In Proceedings of the 3rd International Conference on Knowledge-Based Intelligent Information Engineering System, pp. 480-483, 1999.

[15] T. P. Hong, C. S. Kuo, S. C. Chi, and S. L. Wang, “Mining fuzzy rules from quantitative data based on the AprioriTid algorithm,” In Proceedings of the 2000 ACM Symposium on Applied Computing, Vol. 1, pp. 534-536, 2000.

[16] T. P. Hong, C. S. Kuo, and S. C. Chi, “Trade-off between computation time and number of rules for fuzzy mining from quantitative data,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, Vol. 9, No. 5, pp. 587-604, 2001.

[17] T. P. Hong, K. Y. Lin, and S. L. Wang, “Mining fuzzy sequential patterns from quantitative transactions,” Soft Computing, Vol. 10, No. 10, pp. 925-932, 2006.

[18] T. P. Hong, C. M. Huang, and S. J. Horng, “Linguistic object-oriented web-usage mining,” International Journal of Approximate Reasoning, Vol. 48, No. 1, pp. 47-61, 2008.

[19] T. P. Hong, C. M. Huang, and S. J. Horng, “Discovering fuzzy inter- and intra-object associations,” Expert Systems with Applications, Vol. 38, No. 6, pp. 6777-6786, 2011.

[20] Tony C. K. Huang, “Developing an efficient knowledge discovering model for mining fuzzy multi-level sequential patterns in sequence database,” Fuzzy Sets and Systems, Vol. 160, No. 23, pp. 3359-3381, 2009.

[21] Tony C. K. Huang, “Mining the change of customer behavior in fuzzy time-interval sequential patterns,” Applied Soft Computing, Vol. 12, No. 3, pp. 1068-1086, 2012.

[22] IBM Quest Data Mining Project, “Quest synthetic data generation code,” available at (http://www.almaden.ibm.com/cs/quest/syndata.htm), 1996.

[23] C. M. Kuok, A. Fu, and M. H. Wong, “Mining fuzzy association rules in databases,” SIGMOD Record, Vol. 27, No. 1, pp. 41-46, 1998.

[24] R. Kruse, C. Borgelt, and D. Nauck, “Fuzzy data analysis: challenges and perspectives,” In Proceedings of the IEEE International Conference on Fuzzy Systems, Vol. 3, pp. 1211-1216, 1999.

[25] G. C. Lan, T. P. Hong, and Vincent S. Tseng, “Discovery of high utility itemsets from on-shelf time periods of products,” Expert Systems with Applications, Vol. 38, No. 5, pp. 5851-5857, 2011.

[26] Y. C. Lee, T. P. Hong, and W. Y. Lin, “Mining association rules with multiple minimum supports using maximum constraints,” International Journal of Approximate Reasoning, Vol. 40, No. 1-2, pp. 44-54, 2005.

[27] Y. C. Lee, T. P. Hong, and T. C. Wang, “Multi-level fuzzy mining with multiple minimum supports,” Expert Systems with Applications, Vol. 34, No. 1, pp. 459-468, 2008.

[28] C. W. Lin, T. P. Hong, and W. H. Lu, “An efficient tree-based fuzzy data mining approach,” International Journal of Fuzzy Systems, Vol. 12, No. 2, pp. 150-157, 2010.

[29] J. H. Liu, “Mining sequential patterns with fuzzy taxonomy structures,” Journal of Chienkuo Technology University, Vol. 24, No. 2, pp. 193-210, 2005.

[30] Y. Liu, W. K. Liao, and A. Choudhary, “A fast high utility itemsets mining algorithm,” In Proceedings of the Utility-Based Data Mining Workshop, pp. 90-99, 2005.

[31] R. Srikant and R. Agrawal, “Mining quantitative association rules in large relational tables,” In Proceedings of the ACM SIGMOD Conference on Management of Data, pp. 1-12, 1996.

[32] C. Y. Tsai, C. C. Lo, and C. W. Lin, “A time-interval sequential pattern change detection method,” International Journal of Information Technology & Decision Making, Vol. 10, No. 1, pp. 83-108, 2011.

[33] C. M. Wang, S. H. Chen, and Y. F. Huang, “A fuzzy approach for mining high utility quantitative itemsets,” In Proceedings of the IEEE International Conference on Fuzzy Systems, pp. 20-24, 2009.

[34] J. S. Yeh and P. C. Hsu, “HHUIF and MSICF: Novel algorithms for privacy preserving utility mining,” Expert Systems with Applications, Vol. 37, No. 7, pp. 4779-4786, 2010.

[35] J. S. Yeh, C. Y. Chang, and Y. T. Wang, “Efficient algorithms for incremental utility mining,” In Proceedings of the Conference on Ubiquitous Information Management and Communication, pp. 229-234, 2008.
