From Figure 6, it can be seen that the proposed mining algorithm generates more rules
than FDM does. The taxonomic structures gather purchased items into general classes and can
thus generate additional large itemsets. A large number of rules may, however, be hard for
users to interpret. The support and confidence thresholds in the proposed algorithm can
therefore be set higher than those used for FDM. For example, when the minimum support
value is set at 1000, the number of rules can be reduced to below 250 (from Figure 4).
7. Discussion and Conclusions
In this paper, we have proposed a fuzzy multiple-level data-mining algorithm that can
process transaction data with quantitative values and discover interesting patterns among
them. The rules thus mined exhibit quantitative regularity on multiple levels and can be used
to provide suggestions to appropriate supervisors. Compared to conventional crisp-set mining
methods for quantitative data, our approach produces smoother mining results owing to its
fuzzy membership characteristics. The mined rules are expressed in linguistic terms, which are
more natural and understandable to human beings. The proposed mining algorithm can also be
reduced to a conventional non-fuzzy mining algorithm by assigning a single membership
function whose value is always 1 for quantities larger than zero.
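To illustrate this degeneration, a minimal sketch in Python (the linguistic regions and breakpoints are hypothetical, not taken from the paper): quantities are fuzzified with triangular membership functions, and substituting a single function that returns 1 for every positive quantity recovers conventional crisp mining.

```python
def triangular(x, a, b, c):
    """Triangular membership function rising from a to a peak at b
    and falling back to zero at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzify(quantity):
    """Hypothetical fuzzy regions for a purchased quantity."""
    return {
        "Low": triangular(quantity, 0, 1, 6),
        "Middle": triangular(quantity, 1, 6, 11),
        "High": triangular(quantity, 6, 11, 16),
    }

def crisp(quantity):
    """Degenerate case: a single membership function that is always 1
    for quantities larger than zero, i.e. conventional non-fuzzy mining."""
    return {"Bought": 1.0 if quantity > 0 else 0.0}
```

Running the mining procedure with `crisp` in place of `fuzzify` counts each purchased item exactly once per transaction, as a boolean algorithm would.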
When compared to fuzzy mining methods that take all fuzzy regions into
consideration, our method achieves better time complexity, since only the most important
fuzzy term is used for each item. If all fuzzy terms were considered, the number of possible
combinations to search would be large. A trade-off therefore exists between rule completeness
and time complexity.
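The single-term strategy can be sketched as follows; the term names and pre-summed membership counts below are illustrative only:

```python
def best_term(fuzzy_counts):
    """Given a map from a linguistic term to its membership summed over
    all transactions, keep only the term with the maximum count."""
    term = max(fuzzy_counts, key=fuzzy_counts.get)
    return term, fuzzy_counts[term]

# Illustrative summed memberships for one item's three terms.
counts = {"milk.Low": 2.1, "milk.Middle": 5.4, "milk.High": 1.8}
print(best_term(counts))  # ('milk.Middle', 5.4)
```

Only the winning term then enters candidate generation, so each item contributes one symbol rather than one per fuzzy region.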
Our proposed algorithm does not find association rules among items on the same path in
the given hierarchy trees. With a slight modification, however, it can find such rules and thus
analyze interesting behavior among these items. For example, a certain brand of milk may be
found to owe its popularity to its chocolate flavor through a rule stating that if that brand of
milk has high sales figures, then its chocolate-flavored milk also has high sales figures.
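Such a modification would need a test for whether two items lie on the same path of a hierarchy tree; a sketch using an illustrative child-to-parent taxonomy:

```python
# Illustrative taxonomy, stored as child -> parent links.
parent = {
    "chocolate milk": "milk",
    "plain milk": "milk",
    "milk": "food",
}

def ancestors(item):
    """Return the set of ancestors of an item in the taxonomy."""
    result = set()
    while item in parent:
        item = parent[item]
        result.add(item)
    return result

def on_same_path(a, b):
    """True if one item is an ancestor of the other, i.e. the two
    items lie on the same path of the hierarchy tree."""
    return a in ancestors(b) or b in ancestors(a)

print(on_same_path("milk", "chocolate milk"))        # True
print(on_same_path("plain milk", "chocolate milk"))  # False
```

The original algorithm would prune item pairs for which `on_same_path` is true; the modification instead keeps them so cross-level rules along one branch can be reported.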
Our algorithm can also easily be modified to let users specify the items or terms they are
interested in, so that the mined rules satisfy the users' requirements. For example, users
can require that only items with high sales quantities be mined, which would greatly reduce
the mining time.
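One way such an item constraint might be realized is to filter candidate itemsets against the user-specified items before support counting (the helper and data are illustrative, not part of the original algorithm):

```python
def restrict(itemsets, interesting):
    """Keep only the itemsets composed entirely of user-specified
    items, shrinking the search space before support counting."""
    return [s for s in itemsets if set(s) <= interesting]

interesting = {"milk", "bread"}
candidates = [("milk",), ("milk", "bread"), ("milk", "beer")]
print(restrict(candidates, interesting))  # [('milk',), ('milk', 'bread')]
```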
Although the proposed method works well in mining quantitative transaction data, it is
just a beginning; much work remains to be done in this field. In the future, we will first
extend the proposed algorithm to resolve the two problems stated above. Our method also
assumes that the membership functions are known in advance. In [15-17], we proposed some
fuzzy learning methods that automatically derive membership functions. We will therefore
attempt to adjust the membership functions dynamically in the proposed mining algorithm to
avoid the bottleneck of membership-function acquisition. We will also attempt to design
specific data-mining models for various problem domains.
Acknowledgment
The authors would like to thank the anonymous referees for their very constructive
comments. This research was supported by the National Science Council of the Republic of
China under contract NSC89-2213-E-214-003.
References
[1] R. Agrawal, T. Imielinski and A. Swami, “Mining associations between sets of items in
massive databases,” The 1993 ACM SIGMOD Conference on Management of Data,
Washington DC, USA, 1993, pp. 207-216.
[2] R. Agrawal, T. Imielinski and A. Swami, “Database mining: a performance perspective,”
IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 6, 1993, pp.
914-925.
[3] A. F. Blishun, “Fuzzy learning models in expert systems,” Fuzzy Sets and Systems, Vol. 22,
No. 1, 1987, pp. 57-70.
[4] L. M. de Campos and S. Moral, “Learning rules for a fuzzy inference model,” Fuzzy Sets
and Systems, Vol. 59, No. 2, 1993, pp. 247-257.
[5] R. L. P. Chang and T. Pavlidis, “Fuzzy decision tree algorithms,” IEEE Transactions on
Systems, Man and Cybernetics, Vol. 7, No. 1, 1977, pp. 28-35.
[6] M. S. Chen, J. Han and P. S. Yu, “Data mining: an overview from a database perspective,”
IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, 1996, pp.
866-883.
[7] C. Clair, C. Liu and N. Pissinou, “Attribute weighting: a method of applying domain
knowledge in the decision tree process,” The Seventh International Conference on
Information and Knowledge Management, Bethesda, Maryland, USA, 1998, pp. 259-266.
[8] M. Delgado and A. Gonzalez, “An inductive learning procedure to identify fuzzy
systems,” Fuzzy Sets and Systems, Vol. 55, No. 2, 1993, pp. 121-132.
[9] A. Famili, W. M. Shen, R. Weber and E. Simoudis, “Data preprocessing and intelligent
data analysis,” Intelligent Data Analysis, Vol. 1, No. 1, 1997, pp. 3-23.
[10] W. J. Frawley, G. Piatetsky-Shapiro and C. J. Matheus, “Knowledge discovery in
databases: an overview,” The AAAI Workshop on Knowledge Discovery in Databases,
Anaheim, CA, 1991, pp. 1-27.
[11] T. Fukuda, Y. Morimoto, S. Morishita and T. Tokuyama, “Mining optimized association
rules for numeric attributes,” The Fifteenth ACM SIGACT-SIGMOD-SIGART
Symposium on Principles of Database Systems, Montreal, Canada, 1996, pp. 182-191.
[12] A. Gonzalez, “A learning methodology in uncertain and imprecise environments,”
International Journal of Intelligent Systems, Vol. 10, No. 3, 1995, pp. 357-371.
[13] J. Han and Y. Fu, “Discovery of multiple-level association rules from large databases,”
The International Conference on Very Large Databases, Zurich, Switzerland, 1995, pp.
420-431.
[14] T. P. Hong, C. S. Kuo and S. C. Chi, “A data mining algorithm for transaction data
with quantitative values,” Intelligent Data Analysis, Vol. 3, No. 5, 1999, pp. 363-376.
[15] T. P. Hong and J. B. Chen, “Finding relevant attributes and membership functions,”
Fuzzy Sets and Systems, Vol. 103, No. 3, 1999, pp. 389-404.
[16] T. P. Hong and J. B. Chen, “Processing individual fuzzy attributes for fuzzy rule
induction,” Fuzzy Sets and Systems, Vol. 112, No. 1, 2000, pp. 127-140.
[17] T. P. Hong and C. Y. Lee, “Induction of fuzzy rules and membership functions from
training examples,” Fuzzy Sets and Systems, Vol. 84, No. 1, 1996, pp. 33-47.
[18] T. P. Hong, C. S. Kuo and S. C. Chi, “Trade-off between time complexity and number
of rules for fuzzy mining from quantitative data,” International Journal of Uncertainty,
Fuzziness, and Knowledge-based Systems, Vol. 9, No. 5, 2001, pp. 587-604.
[19] A. Kandel, Fuzzy Expert Systems, CRC Press, Boca Raton, 1992, pp. 8-19.
[20] R. S. Michalski, I. Bratko and M. Kubat, Machine Learning and Data Mining: Methods
and Applications, John Wiley & Sons Ltd, England, 1998.
[21] J. R. Quinlan, “Decision tree as probabilistic classifier,” The Fourth International
Machine Learning Workshop, Morgan Kaufmann, San Mateo, CA, 1987, pp. 31-37.
[22] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo,
CA, 1993.
[23] R. Rastogi and K. Shim, “Mining optimized association rules with categorical and
numeric attributes,” The 14th IEEE International Conference on Data Engineering,
Orlando, 1998, pp. 503-512.
[24] R. Rastogi and K. Shim, “Mining optimized support rules for numeric attributes,” The
15th IEEE International Conference on Data Engineering, Sydney, Australia, 1999, pp.
206-215.
[25] J. Rives, “FID3: fuzzy induction decision tree,” The First International Symposium on
Uncertainty, Modeling and Analysis, University of Maryland, College Park, Maryland,
1990, pp. 457-462.
[26] R. Srikant, Q. Vu and R. Agrawal, “Mining association rules with item constraints,” The
Third International Conference on Knowledge Discovery in Databases and Data Mining,
Newport Beach, California, August 1997, pp. 67-73.
[27] R. Srikant and R. Agrawal, “Mining quantitative association rules in large relational
tables,” The 1996 ACM SIGMOD International Conference on Management of Data,
Montreal, Canada, June 1996, pp. 1-12.
[28] C. H. Wang, T. P. Hong and S. S. Tseng, “Inductive learning from fuzzy examples,” The
Fifth IEEE International Conference on Fuzzy Systems, New Orleans, 1996, pp. 13-18.
[29] C. H. Wang, J. F. Liu, T. P. Hong and S. S. Tseng, “A fuzzy inductive learning strategy
for modular rules,” Fuzzy Sets and Systems, Vol. 103, No. 1, 1999, pp. 91-105.
[30] R. Weber, “Fuzzy-ID3: a class of methods for automatic knowledge acquisition,” The
Second International Conference on Fuzzy Logic and Neural Networks, Iizuka, Japan,
1992, pp. 265-268.
[31] Y. Yuan and M. J. Shaw, “Induction of fuzzy decision trees,” Fuzzy Sets and Systems, Vol.
69, No. 2, 1995, pp. 125-139.
[32] L. A. Zadeh, “Fuzzy sets,” Information and Control, Vol. 8, No. 3, 1965, pp. 338-353.