Multi-level fuzzy mining with multiple minimum supports


Yeong-Chyi Lee a, Tzung-Pei Hong b,*, Tien-Chin Wang c

a Department of Information Engineering, I-Shou University, Kaohsiung 84008, Taiwan, ROC

b Department of Electrical Engineering, National University of Kaohsiung, Kaohsiung 811, Taiwan, ROC

c Department of Information Management, I-Shou University, Kaohsiung 84008, Taiwan, ROC

Abstract

Finding association rules in transaction databases is one of the most common tasks in data mining. In real applications, different items may have different support criteria to judge their importance, taxonomic relationships among items may appear, and data may have quantitative values. This paper thus proposes a fuzzy multiple-level mining algorithm for extracting knowledge implicit in quantitative transactions with multiple minimum supports of items. Items may have different minimum supports, and the maximum-itemset minimum-taxonomy support constraint is adopted to discover the large itemsets. Under this constraint, the downward-closure property is kept, so the original apriori algorithm can be easily extended to find fuzzy large itemsets. The proposed algorithm adopts a top-down progressively deepening approach to derive large itemsets. It can also discover cross-level fuzzy association rules under the maximum-itemset minimum-taxonomy support constraint. An example is given to demonstrate that the proposed mining algorithm can derive multiple-level association rules under multiple item supports in a simple and effective way.

Keywords: Data mining; Fuzzy set; Multiple minimum supports; Quantitative value; Taxonomy

1. Introduction

Data mining has played an important role in identifying effective, coherent, potentially useful, and previously unknown patterns from vast amounts of data. Among the data-mining technologies, finding association rules in transaction databases is the most commonly seen (Agrawal, Imielinski, & Swami, 1993a; Agrawal, Imielinski, & Swami, 1993b; Chen, Han, & Yu, 1996; Famili, Shen, Weber, & Simoudis, 1997; Frawley, Piatetsky-Shapiro, & Matheus, 1991; Fukuda, Morimoto, Morishita, & Tokuyama, 1996; Han & Fu, 1995; Hong, Kuo, & Chi, 1999; Srikant & Agrawal, 1995, 1996). An association rule is expressed in the form A → B, where A and B are sets of items, such that the presence of A in a transaction implies the presence of B in the same transaction. It was initially applied to market basket analysis for finding relationships among purchased items. The mined knowledge about items that tend to be purchased together can then be passed to managers as a good reference in planning store layout and market policy.

Transaction data in real-world applications usually consist of quantitative values, so designing a sophisticated data-mining algorithm able to deal with quantitative data presents a challenge to workers in this research field. Agrawal and his co-workers thus proposed a method (Srikant & Agrawal, 1995) for mining association rules from data sets using quantitative and categorical attributes. Their proposed method first determines the number of partitions for each quantitative attribute, and then maps all possible values of each attribute onto a set of consecutive integers. Other methods have also been proposed to handle numeric attributes and to derive association rules. Fukuda et al. introduced the optimized association-rule problem and used a dynamic programming algorithm to solve it. Their approach allowed association rules to contain only a single numerical attribute on the left-hand side (Fukuda et al., 1996). Rastogi and Shim then extended the approach

* Corresponding author.

E-mail addresses: d9003007@stmail.isu.edu.tw (Y.-C. Lee), tphong@nuk.edu.tw (T.-P. Hong), tcwang@isu.edu.tw (T.-C. Wang).

Expert Systems with Applications 34 (2008) 459–468, www.elsevier.com/locate/eswa


to more than one optimal region, and showed that the problem was NP-hard even for cases involving one uninstantiated numeric attribute (Rastogi & Shim, 1999). Recently, fuzzy set theory has been used more and more frequently in intelligent systems because of its simplicity and similarity to human reasoning (Kandel, 1992). Several fuzzy learning algorithms for inducing rules from given sets of data have been designed and used to good effect in specific domains (de Campos & Moral, 1993; Delgado & Gonzalez, 1993; Gonzalez, 1995; Hong & Lee, 1996; Hong & Chen, 2000; Rives, 1990; Wang, Liu, Hong, & Tseng, 1999). Wang et al. proposed a fuzzy version space learning strategy for managing vague information (Wang, Hong, & Tseng, 1996). Hong et al. proposed a fuzzy mining algorithm for managing quantitative data (Hong et al., 1999). Strategies based on decision trees were also proposed (Rives, 1990; Weber, 1992; Yuan & Shaw, 1995).

Most of the previous approaches set a single minimum support threshold for all the items (Agrawal & Srikant, 1994; Brin, Motwani, Ullman, & Tsur, 1997; Park, Chen, & Yu, 1995; Savasere, Omiecinski, & Navathe, 1995). In real applications, different items may have different criteria to judge their importance. The support requirements thus vary with different items. For example, the minimum supports for cheaper items may be set higher than those for more expensive ones. Liu et al. proposed an approach for mining association rules with non-uniform minimum support values (Liu, Hsu, & Ma, 1999). Their approach allowed users to specify different minimum supports for different items. They defined the minimum support value of an itemset as the lowest minimum support among all the items in the itemset. Wang et al. proposed a mining approach that allowed the minimum support value of an itemset to be any function of the minimum support values of the items contained in the itemset (Wang, He, & Han, 2000). Items were grouped into disjoint sets called bins, and items within the same bin were regarded as non-distinguishable with respect to the specification of a minimum support. We also proposed a simple and efficient algorithm based on the apriori approach to generate large itemsets under the maximum constraints of multiple minimum supports (Lee, Hong, & Lin, 2004; Lee, Hong, & Lin, 2005). The characteristic of processing itemsets in a level-wise way is preserved under the maximum constraints, which keeps the mining process efficient.

Furthermore, taxonomic relationships among items often appear in real applications. For example, wheat bread and white bread are two kinds of bread. Bread is thus a higher-level concept than wheat bread or white bread. The information needed by decision makers in some applications need not be detailed down to the primitive concept level, but may be at a higher one. For example, the association rule “bread → milk” may be more helpful to decision makers than the rule “wheat bread → juice milk”. Discovering association rules at different levels may thus provide more information than mining only at a single level (Han & Fu, 1995; Srikant & Agrawal, 1995).

This paper thus proposes a fuzzy multiple-level mining algorithm with multiple supports of items for extracting implicit knowledge from transactions stored as quantitative values. The proposed algorithm adopts a top-down progressively deepening approach to finding large itemsets. It integrates fuzzy-set concepts, data-mining technologies and multiple-level taxonomy to find fuzzy association rules in given transaction data sets. Each primitive item is given a predefined support threshold. The minimum support of an item at a higher level is determined by the minimum of the support thresholds of the items belonging to it, and the minimum support of an itemset is determined by the maximum of the minimum supports of the items contained in it. The mined rules are expressed in linguistic terms, which are more natural and understandable for human beings.

The remaining parts of this paper are organized as follows. Some related mining algorithms are reviewed in Section 2. The proposed algorithm for mining fuzzy multiple-level association rules under the maximum-itemset minimum-taxonomy support constraint is described in Section 3. An example to illustrate the proposed algorithm is given in Section 4. Conclusion and discussion are given in Section 5.

2. Review of related mining algorithms

In this section, related research on mining multiple-level association rules and on mining association rules with multiple minimum supports is reviewed.

2.1. Mining multiple-level association rules

Previous studies on data mining focused on finding association rules at a single concept level. Mining association rules at multiple concept levels may, however, lead to the discovery of more general and important knowledge from data. Relevant item taxonomies are usually predefined in real-world applications and can be represented as hierarchy trees. Terminal nodes on the trees represent actual items appearing in transactions; internal nodes represent classes or concepts formed from lower-level nodes. A simple example is given in Fig. 1.

In Fig. 1, the root node for “Food” is at level 0, the internal nodes representing categories (such as “Milk”) are at level 1, the internal nodes representing flavors (such as “Chocolate”) are at level 2, and the terminal nodes representing brands (such as “Foremost”) are at level 3. Only terminal nodes appear in transactions.

Han and Fu proposed a method for finding level-crossing association rules at multiple levels (Han & Fu, 1995). Their method could find flexible association rules not confined to strict, pre-arranged conceptual hierarchies. Nodes in predefined taxonomies are first encoded using sequences of numbers and the symbol “*” according to their positions in the hierarchy tree. For example, the internal node “Milk” in Fig. 1 is represented by 1**, the internal node [...] by 111. A top-down progressively deepening search approach is used and exploration of “level-crossing” association relationships is allowed. Candidate itemsets at certain levels may thus contain items at lower levels. For example, candidate 2-itemsets at level 2 are not limited to containing only pairs of large items at level 2. Instead, large items at level 2 may be paired with large items at level 1 to form candidate 2-itemsets at level 2 (such as {11*, 2**}).
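The encoding can be illustrated with a short Python sketch. It assumes a three-level taxonomy with one digit per level, as in the codes 1**, 11* and 111 used above; the helper names `ancestor_code`, `level_of` and `is_ancestor` are hypothetical and are not taken from Han and Fu's paper.

```python
def ancestor_code(code: str, level: int) -> str:
    """Return the code of the ancestor of `code` at `level` (1-based).

    Digits below the requested level are replaced by '*'.
    """
    if not 1 <= level <= len(code):
        raise ValueError("level out of range")
    return code[:level] + "*" * (len(code) - level)


def level_of(code: str) -> int:
    """Level of an encoded node: the number of digits before the '*' padding."""
    return len(code.rstrip("*"))


def is_ancestor(general: str, specific: str) -> bool:
    """True if `general` is a proper ancestor of `specific`."""
    prefix = general.rstrip("*")
    return (
        len(general) == len(specific)
        and level_of(general) < level_of(specific)
        and specific.startswith(prefix)
    )


item = "111"                       # a terminal node at level 3
print(ancestor_code(item, 1))      # 1**
print(ancestor_code(item, 2))      # 11*
print(is_ancestor("1**", "11*"))   # True
print(is_ancestor("2**", "11*"))   # False
```

With this representation, generalizing a transaction to a higher level only requires replacing trailing digits by "*", which is what makes the top-down, level-by-level search convenient.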

2.2. Mining association rules with multiple minimum supports

A variety of mining approaches based on the apriori algorithm have been proposed, each for a specific problem domain, a specific data type, or for improving efficiency. In these approaches, the minimum supports for all the items or itemsets to be large are set at a single value. In real applications, however, different items may have different criteria to judge their importance. Liu et al. proposed an approach for mining association rules with non-uniform minimum support values (Liu et al., 1999). Their approach allowed users to specify different minimum supports for different items. The minimum support value of an itemset is defined as the lowest minimum support among the items in the itemset. This assignment is, however, not always suitable for application requirements. For example, assume the minimum supports of items A and B are set at 20% and 40%, respectively. As mentioned above, the minimum support of an item means that the occurrence frequency of the item must be larger than or equal to it for the item to be considered in the next mining steps. If the support of an item is smaller than its support threshold, the item is not worth considering. When the minimum support value of an itemset is defined as the lowest minimum support of the items in it, the itemset may be large while some items included in it are small. In this case, it is doubtful whether the itemset is worth considering. In the example described above, if the support of item B is 30%, smaller than its minimum support of 40%, then the 2-itemset {A, B} should not be worth considering. It is thus reasonable in some sense that the occurrence frequency of an interesting itemset must be larger than or equal to the maximum of the minimum supports of the items contained in it.
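The difference between the two constraints in the example above can be sketched in Python. The minimum supports of A and B and the support of B come from the text; the support of {A, B} (25%) is a hypothetical value chosen for illustration, and the function names are likewise assumptions.

```python
def itemset_min_support(itemset, min_sup, constraint):
    """Threshold of an itemset under the 'minimum' or 'maximum' constraint."""
    thresholds = [min_sup[item] for item in itemset]
    if constraint == "minimum":
        return min(thresholds)
    if constraint == "maximum":
        return max(thresholds)
    raise ValueError(f"unknown constraint: {constraint}")


def is_large(support, threshold):
    """An item or itemset is large if its support reaches its threshold."""
    return support >= threshold


min_sup = {"A": 0.20, "B": 0.40}   # minimum supports from the text
support_b = 0.30                   # support of B from the text
support_ab = 0.25                  # hypothetical; cannot exceed support_b

b_large = is_large(support_b, min_sup["B"])
ab_large_min = is_large(
    support_ab, itemset_min_support(("A", "B"), min_sup, "minimum"))
ab_large_max = is_large(
    support_ab, itemset_min_support(("A", "B"), min_sup, "maximum"))

print(b_large)       # False: B itself is small
print(ab_large_min)  # True: {A, B} is large under the minimum constraint
print(ab_large_max)  # False: {A, B} is rejected under the maximum constraint
```

Under the minimum constraint the itemset {A, B} is kept although B alone is small, which is the situation the paragraph above questions; the maximum constraint rejects it.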

Wang et al. then generalized the above idea and allowed the minimum support value of an itemset to be any function of the minimum support values of the items contained in the itemset (Wang et al., 2000). They proposed a bin-oriented, non-uniform support constraint. Items were grouped into disjoint sets called bins, and items within the same bin were regarded as non-distinguishable with respect to the specification of a minimum support. Although their approach is flexible in assigning minimum supports to itemsets, the mining algorithm is somewhat complex due to its generality. As mentioned before, it is meaningful to assign the minimum support of an itemset as the maximum of the minimum supports of the items contained in the itemset. Although Wang et al.’s approach can solve this kind of problem, its time complexity is high. Besides, their approach does not consider items that have quantitative values or are organized into multiple levels. In our previous work, a simple algorithm based on the apriori approach was proposed to find the large itemsets and association rules under the maximum constraint of multiple minimum supports (Lee et al., 2004, 2005). That algorithm is simple and efficient when compared with Wang et al.’s under the maximum constraint. Below, we propose an efficient algorithm based on fuzzy sets and Han’s mining approach for multiple-level items to generate the fuzzy large itemsets level by level.

3. The proposed algorithm

The proposed mining algorithm integrates fuzzy set concepts, data mining and multiple-level taxonomy to find fuzzy association rules in a given transaction data set. The knowledge derived is represented by fuzzy linguistic terms, and is thus easily understandable by human beings. In the proposed algorithm, items may have different minimum supports and taxonomic relationships, and the maximum-itemset minimum-taxonomy support constraint is adopted to discover the large itemsets. Each primitive item is given a predefined support threshold (minimum support). The minimum support for an itemset is set as the maximum of the minimum supports of the items contained in the itemset, while the minimum support for an item at a higher taxonomic concept is set as the minimum of the minimum supports of the items belonging to it. Under this constraint, the downward-closure property is kept, so the original apriori algorithm can be easily extended to find the fuzzy large itemsets.
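A minimal Python sketch of the maximum-itemset minimum-taxonomy constraint follows. The item codes use the digit-and-"*" encoding described in Section 2.1; the threshold values and the function names `item_min_support` and `itemset_min_support` are hypothetical and serve only as an illustration.

```python
def item_min_support(code, primitive_min_sup):
    """Minimum support of an item at any level of the taxonomy.

    For a higher-level item such as '1**', this is the minimum of the
    thresholds of the primitive items below it. For a primitive item the
    only match is the item itself, so its own threshold is returned.
    """
    prefix = code.rstrip("*")
    members = [s for c, s in primitive_min_sup.items() if c.startswith(prefix)]
    if not members:
        raise ValueError(f"no primitive item under {code}")
    return min(members)


def itemset_min_support(itemset, primitive_min_sup):
    """Minimum support of an itemset: the maximum over its items."""
    return max(item_min_support(c, primitive_min_sup) for c in itemset)


# Hypothetical predefined thresholds of the primitive (level-3) items.
primitive_min_sup = {
    "111": 0.25, "112": 0.40, "121": 0.30,
    "211": 0.35, "221": 0.50,
}

print(item_min_support("1**", primitive_min_sup))               # 0.25
print(item_min_support("11*", primitive_min_sup))               # 0.25
print(item_min_support("2**", primitive_min_sup))               # 0.35
print(itemset_min_support({"11*", "2**"}, primitive_min_sup))   # 0.35
```

Because the threshold of an itemset is a maximum, adding items can never lower it, while the support of an itemset can never exceed that of its subsets. A large itemset therefore has only large subsets, which is why apriori-style candidate pruning remains valid under this constraint.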

The proposed fuzzy mining algorithm first encodes items (nodes) in a given taxonomy, following Han and Fu’s approach (Han & Fu, 1995). It then filters out unpromising itemsets in two phases. In the first phase, an item group is removed if its occurrence count is less than the support threshold. In the second phase, the count of a fuzzy region is checked to determine whether it is large. In this phase, a set of membership functions is used to transform the quantitative

Fig. 1. A taxonomy example. (Node labels recoverable from the figure: Food; Milk, Bread; Chocolate, Apple, White, Wheat; Dairyland, Foremost, OldMills, Wonder.)


If 3** = Middle, then 2** = Middle;

If 21* = Middle, then 22* = Low;

If 22* = Low, then 21* = Middle;

If 22* = Low, then 32* = Middle;

If 32* = Middle, then 22* = Low.

(b) The confidence factors of the above association rules are calculated. Take the possible association rule “If 2** = Middle, then 3** = Middle” as an example.

The confidence value for this rule is calculated as

\[
\frac{\sum_{i=1}^{10} \left( 2^{**}.\mathit{Middle} \cap 3^{**}.\mathit{Middle} \right)}{\sum_{i=1}^{10} \left( 2^{**}.\mathit{Middle} \right)} = \frac{2.27}{3.07} = 0.74.
\]

Results for all possible association rules are shown as follows:

If 2** = Middle, then 3** = Middle, with confidence 0.74;

If 3** = Middle, then 2** = Middle, with confidence 0.69;

If 21* = Middle, then 22* = Low, with confidence 0.82;

If 22* = Low, then 21* = Middle, with confidence 0.46;

If 22* = Low, then 32* = Middle, with confidence 0.97;

If 32* = Middle, then 22* = Low, with confidence 1.0.
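The confidence computation can be sketched in Python as follows. This is an illustration under stated assumptions: the intersection of two fuzzy values in a transaction is taken as their minimum, which is a common choice but is not stated in this part of the text, and the membership values below are hypothetical rather than the ten transactions of the example. The function name `fuzzy_confidence` is also an assumption.

```python
def fuzzy_confidence(antecedent, consequent):
    """Confidence of a fuzzy rule 'if antecedent then consequent'.

    `antecedent` and `consequent` hold the membership values of the two
    fuzzy regions in each transaction. The intersection is assumed to be
    the minimum operator.
    """
    if len(antecedent) != len(consequent):
        raise ValueError("both regions need one value per transaction")
    joint = sum(min(a, c) for a, c in zip(antecedent, consequent))
    total = sum(antecedent)
    if total == 0:
        raise ValueError("antecedent has zero fuzzy count")
    return joint / total


# Hypothetical membership values over four transactions.
region_a = [0.8, 0.0, 0.6, 1.0]
region_c = [0.6, 0.4, 0.6, 0.2]
print(round(fuzzy_confidence(region_a, region_c), 2))  # 0.58

# The ratio reported in the text for the rule 2** = Middle -> 3** = Middle.
print(round(2.27 / 3.07, 2))  # 0.74
```

Note that the measure is not symmetric: swapping antecedent and consequent changes the denominator, which is why each pair of fuzzy regions yields two candidate rules with different confidence values.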

Step 18: The confidence values of the above association rules are compared with the predefined confidence threshold λ. Assume the confidence threshold λ is set at 0.8 in this example. The following three association rules are thus output:

If 21* = Middle, then 22* = Low, with confidence 0.82;

If 22* = Low, then 32* = Middle, with confidence 0.97;

If 32* = Middle, then 22* = Low, with confidence 1.0.
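The selection in Step 18 amounts to a simple filter, sketched below in Python. The confidence values and the threshold of 0.8 are those reported in the example; pairing 0.97 with the rule "22* = Low, then 32* = Middle" follows the order in which the candidate rules are listed. The tuple layout and the name `select_rules` are assumptions made for this sketch.

```python
# (antecedent, consequent, confidence) for the six candidate rules.
candidate_rules = [
    ("2** = Middle", "3** = Middle", 0.74),
    ("3** = Middle", "2** = Middle", 0.69),
    ("21* = Middle", "22* = Low", 0.82),
    ("22* = Low", "21* = Middle", 0.46),
    ("22* = Low", "32* = Middle", 0.97),
    ("32* = Middle", "22* = Low", 1.0),
]


def select_rules(rules, threshold):
    """Keep the rules whose confidence reaches the threshold."""
    return [rule for rule in rules if rule[2] >= threshold]


for antecedent, consequent, confidence in select_rules(candidate_rules, 0.8):
    print(f"If {antecedent}, then {consequent}, with confidence {confidence}")
```

Running the sketch prints the three rules with confidence 0.82, 0.97 and 1.0, matching the output of the example.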

In this example, three fuzzy association rules are generated. The proposed algorithm can thus find the large itemsets level by level without backtracking.

5. Conclusion

In this paper, we have integrated fuzzy set concepts, data mining, multiple-level taxonomy and multiple minimum supports to find fuzzy association rules in a given quantitative transaction data set. Using different criteria to judge the importance of different items, managing taxonomic relationships among items, and dealing with quantitative data sets are three issues that usually occur in real mining applications. In the proposed algorithm, the minimum support for an item at a higher taxonomic concept is set as the minimum of the minimum supports of the items belonging to it, and the minimum support for an itemset is set as the maximum of the minimum supports of the items contained in the itemset. The rationale for using the two kinds of support constraints has also been explained, and this constraint may be suitable for some mining domains. The proposed fuzzy mining algorithm can thus generate large itemsets level by level and then derive fuzzy association rules from quantitative transaction data. An example is also given to demonstrate that the proposed mining algorithm can derive multiple-level association rules under multiple item supports in a simple and effective way.

References

Agrawal, R., Imielinski, T., & Swami, A. (1993a). Mining association rules between sets of items in large database. In The 1993 ACM SIGMOD conference on management of data (pp. 207–216), Washington DC, USA.

Agrawal, R., Imielinski, T., & Swami, A. (1993b). Database mining: a performance perspective. IEEE Transactions on Knowledge and Data Engineering, 5(6), 914–925.

Agrawal, R., & Srikant, R. (1994). Fast algorithm for mining association rules. In The international conference on very large data bases (pp. 487–499).

Brin, S., Motwani, R., Ullman, J. D., & Tsur, S. (1997). Dynamic itemset counting and implication rules for market-basket data. In Proceedings of the 1997 ACM-SIGMOD international conference in management of data (pp. 207–216).

Chen, M. S., Han, J., & Yu, P. S. (1996). Data mining: an overview from a database perspective. IEEE Transactions on Knowledge and Data Engineering, 8(6), 866–883.

de Campos, L. M., & Moral, S. (1993). Learning rules for a fuzzy inference model. Fuzzy Sets and Systems, 59, 247–257.

Delgado, M., & Gonzalez, A. (1993). An inductive learning procedure to identify fuzzy systems. Fuzzy Sets and Systems, 55, 121–132.

Famili, A., Shen, W. M., Weber, R., & Simoudis, E. (1997). Data preprocessing and intelligent data analysis. Intelligent Data Analysis, 1(1), 3–23.

Frawley, W. J., Piatetsky-Shapiro, G., & Matheus, C. J. (1991). Knowledge discovery in databases: an overview. In The AAAI Workshop on Knowledge Discovery in Databases (pp. 1–27).

Fukuda, T., Morimoto, Y., Morishita, S., & Tokuyama, T. (1996). Mining optimized association rules for numeric attributes. The ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems (pp. 182–191), June.

Gonzalez, A. (1995). A learning methodology in uncertain and imprecise environments. International Journal of Intelligent Systems, 10, 57–371.

Han, J., & Fu, Y. (1995). Discovery of multiple-level association rules from large databases. The international conference on very large databases (pp. 420–431).

Hong, T. P., & Lee, C. Y. (1996). Induction of fuzzy rules and membership functions from training examples. Fuzzy Sets and Systems, 84, 33–47.

Hong, T. P., Kuo, C. S., & Chi, S. C. (1999). A data mining algorithm for transaction data with quantitative values. Intelligent Data Analysis, 3(5), 363–376.

Hong, T. P., & Chen, J. B. (2000). Processing individual fuzzy attributes for fuzzy rule induction. Fuzzy Sets and Systems, 112(1), 127–140.

Kandel, A. (1992). Fuzzy expert systems. Boca Raton: CRC Press, pp. 8–19.

Lee, Y. C., Hong, T. P., & Lin, W. Y. (2004). Mining fuzzy association rules with multiple minimum supports using maximum constraints. The eighth international conference on knowledge-based intelligent information and engineering systems (Vol. 3214, pp. 1283–1290). Lecture Notes in Computer Science.

Lee, Y. C., Hong, T. P., & Lin, W. Y. (2005). Mining association rules with multiple minimum supports using maximum constraints. International Journal of Approximate Reasoning, 40(1), 44–54.

Liu, B., Hsu, W., & Ma, Y. (1999). Mining association rules with multiple minimum supports. In Proceedings of the 1999 international conference on knowledge discovery and data mining (pp. 337–341).


Park, J. S., Chen, M. S., & Yu, P. S. (1995). An effective hash-based algorithm for mining association rules. In Proceedings of the 1995 ACM-SIGMOD international conference in management of data (pp. 175–186).

Rastogi, R., & Shim, K. (1999). Mining optimized support rules for numeric attributes. The 15th IEEE international conference on data engineering (pp. 206–215), Sydney, Australia.

Rives, J. (1990). FID3: fuzzy induction decision tree. In The first international symposium on uncertainty, modeling and analysis (pp. 457–462).

Savasere, A., Omiecinski, E., & Navathe, S. (1995). An efficient algorithm for mining association rules in large databases. In Proceedings of the 21st international conference in very large data bases (VLDB’95) (pp. 432–443).

Srikant, R., & Agrawal, R. (1995). Mining generalized association rules. In Proceedings of the 21st international conference on very large data bases (pp. 407–419).

Srikant, R., & Agrawal, R. (1996). Mining quantitative association rules in large relational tables. In The 1996 ACM SIGMOD international conference on management of data (pp. 1–12), Montreal, Canada, June.

Wang, C. H., Hong, T. P., & Tseng, S. S. (1996). Inductive learning from fuzzy examples. In The fifth IEEE international conference on fuzzy systems (pp. 13–18), New Orleans.

Wang, C. H., Liu, J. F., Hong, T. P., & Tseng, S. S. (1999). A fuzzy inductive learning strategy for modular rules. Fuzzy Sets and Systems, 103(1), 91–105.

Wang, K., He, Y., & Han, J. (2000). Mining frequent itemsets using support constraints. In Proceedings of the 26th international conference on very large data bases (pp. 43–52).

Weber, R. (1992). Fuzzy-ID3: a class of methods for automatic knowledge acquisition. In The second international conference on fuzzy logic and neural networks (pp. 265–268), Iizuka, Japan.

Yuan, Y., & Shaw, M. J. (1995). Induction of fuzzy decision trees. Fuzzy Sets and Systems, 69, 125–139.