The Proposed CDFAR Mining Algorithm - CONCEPT DRIFT FOR FUZZY ASSOCIATION RULES

CHAPTER 4 CONCEPT DRIFT FOR FUZZY ASSOCIATION RULES

4.2 The Proposed CDFAR Mining Algorithm

This section describes the proposed CDFAR approach, which combines concept drift, the fuzzy C-means algorithm, and fuzzy data mining:

INPUT: Two quantitative transaction databases: D^t, consisting of n quantitative transactions and m items at time t, and D^{t+k}, consisting of w quantitative transactions and m items at time t+k; a support threshold α; a confidence threshold λ; a concept-drift rule set S; a conditional threshold cd; a consequent threshold cs; and a set of membership functions.

OUTPUT: The fuzzy concept-drift patterns.

STEP 1: Generate fuzzy membership functions for each item in the two databases via the following sub-steps.

(a) Set i = 1, where i is the index of the current item in the database (for fuzzy C-means, refer to the related works).

(b) Set the center points of the N clusters as the centers of the fuzzy membership functions for the M linguistic terms.

(d) Set i = i + 1.

(e) If i ≤ I, go to Step (b).

STEP 2: Generate fuzzy association rules for each item in the two databases via the following sub-steps.

(a) Take the set of fuzzy membership functions generated by fuzzy C-means.

(b) If an item satisfies the condition, put it in the set R of large itemsets (for fuzzy Apriori, refer to Section 4.1.2).

STEP 3: Find the concept-drift rules from the fuzzy association rules of the large itemsets between D^t and D^{t+k} by the following steps.

STEP 4: Initialize the concept-drift rule set S = φ.

STEP 5: Set r = 1, where r is the index of the current rule in the database.

STEP 6: Calculate the emerging change for the fuzzy association rules and check the concept-drift rules from the two databases D^t and D^{t+k} by the following sub-steps.

(a) Set j = 1, where j is the index of the current conditional term.

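(b) Calculate the fuzzy values of the conditional terms for each rule set.

$$cs = \begin{cases} \ell_{ij} \times \dfrac{\sum_{k \in A_{ij}} x_{ijk}}{|A_{ij}|}, & \text{if } |A_{ij}| \neq 0 \\ 0, & \text{if } |A_{ij}| = 0 \end{cases} \qquad (4\text{-}7)$$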

(e) Calculate the fuzzy values of the consequent terms for each rule set.

$$cs_{ij} = c_{ij} \times \left(1 - \left(\frac{\mathit{distance}_{ij}}{n_j - 1}\right)^{\alpha}\right) \qquad (4\text{-}8)$$

(f) Check the concept-drift rules.

STEP 7: Calculate the unexpected change for the fuzzy association rules and check the concept-drift rules from the two databases D^t and D^{t+k} by the following sub-steps.

(a) Set j = 1, where j is the index of the current conditional term.

(b) Calculate the fuzzy values of the conditional terms for each rule set.

$$cs = \begin{cases} \ell_{ij} \times \dfrac{\sum_{k \in A_{ij}} x_{ijk}}{|A_{ij}|}, & \text{if } |A_{ij}| \neq 0 \\ 0, & \text{if } |A_{ij}| = 0 \end{cases} \qquad (4\text{-}7)$$

(c) Set j = j + 1.

(d) Check the concept-drift rules.

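(e) Calculate the fuzzy values of the consequent terms for each rule set.

$$cs_{ij} = c_{ij} \times \left(1 - \left(\frac{\mathit{distance}_{ij}}{n_j - 1}\right)^{\alpha}\right) \qquad (4\text{-}8)$$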

(f) Check the concept-drift rules.

STEP 8: Set r = r + 1.

STEP 9: If any item set has not yet been processed, go to Step 6.

STEP 10: Output rule sets S.
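The piecewise form of Equation (4-7) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis implementation (which was written in C#): a rule's conditional part is represented as a hypothetical dict mapping an item to its linguistic term, A_ij is taken to be the set of items common to both conditional parts, ℓ_ij is assumed to be |A_ij| divided by the larger of the two conditional-part sizes, and x_ijk is assumed to be 1 when the two rules use the same linguistic term for item k and 0 otherwise.

```python
def conditional_similarity(cond_t, cond_tk):
    """Piecewise similarity in the shape of Eq. (4-7).

    cond_t, cond_tk: dicts mapping item -> linguistic term for the conditional
    parts of one rule from D^t and one rule from D^{t+k} (hypothetical layout).
    """
    common = sorted(set(cond_t) & set(cond_tk))      # A_ij (assumed meaning)
    if not common:                                   # |A_ij| = 0
        return 0.0
    # Assumed definition of l_ij: share of attributes the two rules have in common.
    l_ij = len(common) / max(len(cond_t), len(cond_tk))
    # Assumed definition of x_ijk: 1 if the linguistic terms agree, else 0.
    x = [1.0 if cond_t[item] == cond_tk[item] else 0.0 for item in common]
    return l_ij * sum(x) / len(common)


rule_t = {"milk": "High", "bread": "Low"}
rule_tk = {"milk": "High", "bread": "Middle", "juice": "Low"}
similarity = conditional_similarity(rule_t, rule_tk)   # (2/3) * (1/2)
no_overlap = conditional_similarity({"milk": "High"}, {"juice": "Low"})
```

With these assumed definitions, two rules with identical conditional parts score 1, and rules with no common item score 0, which matches the second branch of the equation.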

4.3 Experimental Results

This section presents the results of experiments that show the performance of the proposed fuzzy association rules concept-drift patterns mining (CDFAR) algorithm. In the experiments, the fuzzy membership functions generated by fuzzy C-means are fixed and identical for the two databases. The experiments were run on a computer with an Intel Core i5-3230M 2.60 GHz processor (4 threads) and 12 GB of RAM. The operating system was Microsoft Windows 8.1 Pro, and the programs were written in C# 5.0 on .NET Framework 4.5.1.
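As a rough illustration of how cluster centers from fuzzy C-means can be turned into one fixed set of membership functions, the following Python sketch runs a standard one-dimensional fuzzy C-means and places a triangular membership function on each center. The triangular shape, the fuzzifier m = 2, the evenly spaced starting centers, the sample quantities, and the term names are all assumptions made for this sketch; the thesis does not specify them here.

```python
def fuzzy_c_means_1d(values, c, m=2.0, iters=100, tol=1e-6):
    """Standard fuzzy C-means on 1-D data; returns the sorted cluster centers."""
    lo, hi = min(values), max(values)
    # Deterministic start (assumption): spread the centers over the data range.
    centers = [lo + (hi - lo) * (k + 0.5) / c for k in range(c)]
    for _ in range(iters):
        memberships = []
        for x in values:
            dists = [abs(x - v) for v in centers]
            zero = [d == 0.0 for d in dists]
            if any(zero):
                # A point sitting exactly on a center belongs fully to it.
                row = [1.0 / sum(zero) if z else 0.0 for z in zero]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                total = sum(inv)
                row = [w / total for w in inv]
            memberships.append(row)
        new_centers = []
        for k in range(c):
            weights = [row[k] ** m for row in memberships]
            total = sum(weights)
            # Keep the old center if a cluster received no membership at all.
            new_centers.append(
                sum(w * x for w, x in zip(weights, values)) / total
                if total > 0 else centers[k]
            )
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return sorted(centers)


def triangular_memberships(centers):
    """One triangular function per center; neighbouring centers are the feet.

    Assumes the centers are distinct and sorted in ascending order.
    """
    def make(k):
        left = centers[k - 1] if k > 0 else None
        peak = centers[k]
        right = centers[k + 1] if k < len(centers) - 1 else None

        def mu(x):
            if x == peak:
                return 1.0
            if x < peak:
                # The lowest term keeps full membership below its peak.
                return 1.0 if left is None else max(0.0, (x - left) / (peak - left))
            # The highest term keeps full membership above its peak.
            return 1.0 if right is None else max(0.0, (right - x) / (right - peak))
        return mu
    return [make(k) for k in range(len(centers))]


# Hypothetical purchase quantities of one item, pooled from both databases.
quantities = [1, 2, 2, 3, 10, 11, 12, 20, 21, 22]
centers = fuzzy_c_means_1d(quantities, c=3)
terms = dict(zip(["Low", "Middle", "High"], triangular_memberships(centers)))
```

Because the functions are built once and then reused, a quantity receives the same linguistic degrees in both databases, which is what makes rules from the two databases comparable.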

A simulated dataset containing 60 items and 10,000 transactions was used in the experiments. In the dataset, the number of purchased items in each transaction was first randomly generated, and the purchased items and their quantities in each transaction were then generated. We took the 10,000 transactions from the simulated dataset and divided them into two databases, D^t and D^{t+k}, of 5,000 transactions each. The minimum support threshold α was set at 0.04 (4%).
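The generation procedure described above can be sketched as follows. This is a minimal Python illustration, not the generator used in the thesis: the maximum number of items per transaction, the quantity range, and the random seed are hypothetical parameters, since the text does not state them.

```python
import random


def simulate_transactions(n_transactions=10_000, n_items=60,
                          max_items=10, max_quantity=20, seed=0):
    """Generate quantitative transactions as dicts mapping item id -> quantity.

    max_items, max_quantity and seed are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    transactions = []
    for _ in range(n_transactions):
        # First draw how many items are purchased in this transaction ...
        k = rng.randint(1, max_items)
        # ... then draw which items are purchased and their quantities.
        items = rng.sample(range(1, n_items + 1), k)
        transactions.append({item: rng.randint(1, max_quantity) for item in items})
    return transactions


transactions = simulate_transactions()
# Split the 10,000 transactions into the two databases of 5,000 each.
d_t, d_tk = transactions[:5000], transactions[5000:]
```

Splitting by position is only one possible choice; any rule that assigns each transaction to exactly one of the two databases would serve the same purpose.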

First, the results of the proposed approach are shown in Table 4.8.

Table 4.8: The number of fuzzy concept-drift patterns at a minimum support threshold of 4%.

| Pair of databases | Emerging Patterns | Unexpected Changes |
|---|---|---|
| different location | 37 | 50 |
| first half with second half of a year | 20 | 40 |
| random months | 2 | 7 |
| a random month with whole year | 7 | 13 |

In Table 4.8, the proposed CDFAR algorithm was run on different pairs of databases: two databases from different locations, the databases of the first half and the second half of a year, two databases of random months, and the database of a random month with that of the whole year. The results show that more concept-drift patterns are found between databases from different locations than between databases from different times, which suggests that location influences customer behavior more than time does.

We observed that the two databases of random months shared fewer rules, but the rules that were found are quite distinctive. More concept-drift patterns for fuzzy association rules were found in the two databases from different locations. As a result, the concept-drift rules can represent different meanings (customer behaviors) at different times or in different places.

We then compared the experimental results of the proposed CDFAR algorithm under different thresholds in Table 4.9.

Table 4.9: The number of fuzzy concept-drift patterns at a minimum support threshold of 3%.

| Pair of databases | Emerging Patterns | Unexpected Changes |
|---|---|---|
| different location | 26 | 42 |
| first half with second half of a year | 14 | 30 |
| random months | 1 | 2 |
| a random month with whole year | 4 | 10 |

Table 4.9 shows the effect of a different threshold value on the number of fuzzy concept-drift patterns; here the minimum support threshold α was set at 0.03 (3%).

Evidently, there are fewer fuzzy concept-drift patterns for fuzzy association rules with higher threshold values. However, these concept-drift patterns are the most representative of their kinds. Thus, a suitable threshold value should be set for the proposed CDFAR algorithm in order to obtain a reasonable number of patterns that are also representative and meaningful.
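To make the role of the minimum support threshold α concrete, the sketch below computes fuzzy supports for single fuzzy items and filters them at 4% and at 3%. It is a simplified Python illustration under assumptions: the fuzzy support of an (item, linguistic term) pair is taken to be its average membership over all transactions, and the four fuzzified transactions are made-up data. It covers only the large-item filter and does not compute concept-drift patterns.

```python
def fuzzy_supports(fuzzy_transactions):
    """Average membership of each (item, term) pair over all transactions."""
    totals = {}
    for transaction in fuzzy_transactions:
        for key, membership in transaction.items():
            totals[key] = totals.get(key, 0.0) + membership
    n = len(fuzzy_transactions)
    return {key: total / n for key, total in totals.items()}


def large_fuzzy_items(fuzzy_transactions, alpha):
    """Keep the (item, term) pairs whose fuzzy support reaches alpha."""
    supports = fuzzy_supports(fuzzy_transactions)
    return {key: s for key, s in supports.items() if s >= alpha}


# Hypothetical fuzzified transactions: (item, term) -> membership degree.
data = [
    {("A", "Low"): 0.8, ("A", "Middle"): 0.2, ("B", "High"): 1.0},
    {("A", "Low"): 0.4, ("A", "Middle"): 0.6},
    {("B", "High"): 0.5, ("C", "Low"): 0.1},
    {("C", "Low"): 0.04},
]
large_at_4 = large_fuzzy_items(data, alpha=0.04)
large_at_3 = large_fuzzy_items(data, alpha=0.03)
```

In this toy data the pair ("C", "Low") has a fuzzy support of about 0.035, so it passes the filter at α = 0.03 but not at α = 0.04.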

CHAPTER 5 CONCLUSION AND FUTURE WORK

In the first part of this thesis, we proposed a new research issue, named fuzzy concept-drift pattern mining. In addition, the CDMF approach was developed to find concept-drift patterns for fuzzy membership functions in two different training databases.

To the best of our knowledge, this research is the second work on mining concept-drift patterns for fuzzy membership functions. In particular, the proposed methods can be used to understand the quantities of commodities that customers purchase at different times or in different places. The experimental results show that the proposed CDMF approach can find useful concept-drift rules and provide valuable information under various parameter settings.

In the second part of this thesis, we also introduced another new issue, named fuzzy association rules concept-drift mining, which considers not only quantities but also linguistic terms in fuzzy theory. In addition, a fuzzy association rule mining approach (CDFAR) was designed to find fuzzy concept-drift patterns. Previous methods for fuzzy association rules cannot capture changes in customers' behavior, although this information is very valuable for businesses. The experimental results show that the proposed CDFAR approach can effectively find concept-drift patterns for fuzzy association rules.

In the future, we would like to apply the proposed algorithms to other practical applications, such as observing changes in customers' behavior from year to year and differences in customers' favorite products from season to season, among others. In addition, designing more effective ways to decrease the computing time and to find more concept-drift patterns is another interesting topic.

