Related work - Toward intelligent data warehouse mining: An ontology-integrated approach for multi-dimensional association mining

This section reviews the literature related to our work in two areas: the use of ontology in data mining, and data warehouse mining.

(1) Use of ontology in data mining

If a concept hierarchy or taxonomy can be viewed as an ontology, then the use of ontologies in data mining can be traced back to 1991, when Nunez used classification hierarchy information and attribute processing costs to improve the efficiency of the classification process (Nunez, 1991). Later, Han & Fu (1995) and Srikant & Agrawal (1995) proposed incorporating classification hierarchies to mine multilevel association rules and generalized association rules, respectively. Their work was later extended by Chien et al. (2007), who applied not only classification but also composition hierarchical knowledge to mining fuzzy association rules. These studies, however, concentrated on algorithm design; they did not discuss the design of the ontology structure or its benefit to data mining. More recently, several studies have explored applying ontology to data mining, including ontology-based induction of rules (Aronis et al., 1996; Taylor et al., 1997), ontology-based business understanding (Sharma & Osei-Bryson, 2009), ontology-based post-processing and explanation of association rules (Domingues & Rezende, 2005; Liao et al., 2009; Marinica et al., 2008; Svatek et al., 2005), ontology-supported selection of classification algorithms (Bernstein et al., 2005; Lin et al., 2006), ontology-guided generation of new attributes from databases (Phillips & Buchanan, 2001), and ontology-based integration and preprocessing of data (Euler & Scholz, 2004; Perez-Rey et al., 2006).

In contrast to the above work, which incorporates ontology into an individual phase of the well-known KDD process proposed by Fayyad et al. (1996), other work takes an integral perspective. For example, Kopanas et al. (2002) pointed out the importance of incorporating ontology (they use the term domain knowledge instead) into the KDD process and demonstrated their viewpoint with a telecommunication customer insolvency case study. Cespivova et al. (2004) conducted a systematic study of the roles of a medical domain ontology in each phase of the KDD process. Similar studies were presented by Gottgtroy et al. (2004) and Kuo et al. (2007). A position paper by Charest et al. (2006) discussed the synergy of combining case-based reasoning and ontology in a data mining assistance framework, though it left aside the issues of realization and implementation. Pan & Pan (2006) proposed an ontology for supporting data mining from databases; they maintained previous mining results in an ontology that can then be applied to incremental association rule mining.

(2) Data warehouse mining

Current research on data warehouse mining concentrates mostly on mining from data cubes or multi-dimensional databases. J. Han's research group pioneered this subject (Han, 1998; Han et al., 1999). Ester and his colleagues (Ester et al., 1998; Ester & Wittmann, 1998) instead considered the problem of incrementally updating patterns mined from data warehouses. In 2000, Psaila and Lanzi studied multi-level association mining from a primitive data warehouse and proposed a mining algorithm. Since then, substantial work has been devoted to discovering multidimensional association rules from data warehouses (Ng et al., 2002; Chung & Mangamuri, 2005; Tjioe & Taniar, 2005; Messaoud et al., 2006; Yang et al., 2008).

Priebe & Pernul (2003) were the first to explore the issue of incorporating ontology into knowledge discovery from data warehouses. In particular, they proposed an intelligent web portal that integrates OLAP and information retrieval through ontology, but their focus was on information retrieval rather than data mining. Subsequent work on multiple-source integration for data warehouse OLAP construction includes Niemi et al. (2007) and Shah et al. (2009). In Wu et al. (2007), we presented the problems with contemporary association rule mining in data warehousing systems, explained why incorporating ontologies is essential to resolving these problems, and demonstrated a preliminary framework.

7. Conclusions

The purpose of data mining is to help users find real and useful knowledge that they actually want. In this paper we have presented a data warehouse mining system framework with intelligent assistance that incorporates a schema ontology, a schema constraint ontology, a domain ontology, and a user preference ontology. We have demonstrated the intelligent assistance that the mining system provides in guiding users through the mining process. This assistance improves mining effectiveness and efficiency in four respects. First, intelligent functions assist the setting of mining models, minimizing the possibility of illegal settings. In addition, appropriate recommendations of mining model elements are provided while users set up the mining model; this avoids the execution of ineffective or redundant mining processes and guides users toward mining models that are closer to their mining intention. Second, with the support of the domain ontology, mined rules can be extended and generalized. Third, information in the domain ontology can be included in the filtering condition to obtain a more specific search space, so that more precise knowledge can be discovered. Fourth, the system provides a knowledge browsing capability: a mining model can be examined against the user preference ontology for duplicates or similarities, which saves system resources. In this paper, we have discussed the intelligent assistance in general terms. A preliminary implementation of the system framework has also been provided to demonstrate the claimed benefits.
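The second and third benefits can be illustrated with a minimal sketch. It is not the implementation described in this paper: the is-a hierarchy `IS_A`, the helper names, and the sample transactions are all hypothetical, and only single-item supports are counted. Each transaction is extended with the ancestors of its items so that generalized items can be counted, and an optional concept restricts counting to that concept and its descendants.

```python
from collections import Counter

# Hypothetical is-a hierarchy (child -> parent); illustrative only.
IS_A = {
    "laptop": "computer",
    "desktop": "computer",
    "computer": "electronics",
    "inkjet": "printer",
    "printer": "electronics",
}


def ancestors(item):
    """Return all ancestors of item in the is-a hierarchy, nearest first."""
    result = []
    while item in IS_A:
        item = IS_A[item]
        result.append(item)
    return result


def extend(transaction):
    """Add every ancestor of every item so generalized items can be counted."""
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item))
    return extended


def support_counts(transactions, keep=None):
    """Count single-item supports over ontology-extended transactions.

    keep: optional concept used as a filtering condition; only the concept
    itself and its descendants are counted.
    """
    counts = Counter()
    for transaction in transactions:
        for item in extend(transaction):
            if keep is None or item == keep or keep in ancestors(item):
                counts[item] += 1
    return counts


transactions = [{"laptop", "inkjet"}, {"desktop"}, {"laptop"}]
counts = support_counts(transactions)
computer_only = support_counts(transactions, keep="computer")
```

In this sketch, `counts` includes generalized items such as `computer` that never appear in the raw transactions, while `computer_only` narrows the search space to one branch of the hierarchy.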

The ontologies proposed in this paper are implemented as relational table structures. These ontologies are, however, local to the specific mining system we have proposed. Making them globally sharable is challenging and is an important direction for future work.
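As a rough illustration of how an ontology might be held in relational tables, the sketch below stores is-a links in a single table and retrieves the ancestors of a concept with a recursive query. The table name `concept_is_a`, its columns, and the sample concepts are assumptions made for this example and do not reflect the actual schema of the proposed system.

```python
import sqlite3

# In-memory database; the schema below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE concept_is_a (child TEXT PRIMARY KEY, parent TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO concept_is_a (child, parent) VALUES (?, ?)",
    [
        ("laptop", "computer"),
        ("desktop", "computer"),
        ("computer", "electronics"),
    ],
)


def ancestors_sql(conn, concept):
    """Return the ancestors of concept, nearest first, via a recursive CTE."""
    rows = conn.execute(
        """
        WITH RECURSIVE up(name, depth) AS (
            SELECT parent, 1 FROM concept_is_a WHERE child = ?
            UNION ALL
            SELECT c.parent, up.depth + 1
            FROM concept_is_a AS c JOIN up ON c.child = up.name
        )
        SELECT name FROM up ORDER BY depth
        """,
        (concept,),
    ).fetchall()
    return [row[0] for row in rows]


laptop_ancestors = ancestors_sql(conn, "laptop")
```

A table of this kind is easy to query from inside one system, but its meaning is tied to that system's schema, which is one reason sharing such ontologies globally is difficult.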

Acknowledgements

This work was supported by the National Science Council of R.O.C. under grant NSC 95-2221-E-390-024.

References

Aronis, J.M., Provost, F.J., & Buchanan, B.G. (1996). Exploiting background knowledge in automated discovery. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (pp. 355–358).

Bernstein, A., Provost, F., & Hill, S. (2005). Toward intelligent assistance for a data mining process: an ontology-based approach for cost-sensitive classification. IEEE Transactions on Knowledge and Data Engineering 17(4), 503–518.

Cespivova, H., Rauch, J., Svatek, V., Kejkula, M., & Tomeckova, M. (2004). Roles of medical ontology in association rule mining CRISP-DM cycle. In Proceedings of ECML/PKDD Workshop on Knowledge Discovery and Ontologies.

Charest, M., Delisle, S., Cervantes, O., & Shen, Y. (2006). Intelligent data mining assistance via CBR and ontologies. In Proceedings of the 17th International Conference on Database and Expert Systems Applications (pp. 593–597).

Chaudhuri, S., & Dayal, U. (1997). An overview of data warehouse and OLAP technology. ACM SIGMOD Record 26(1), 65–74.

Chien, B.C., Zhong, M.H., & Wang, J.J. (2007). Mining fuzzy association rules on has-a and is-a hierarchical structures. International Journal of Advanced Computational Intelligence and Intelligent Informatics 11(4), 423–432.

Cheung, D.W., Han, J., Ng, V.T., & Wong, C.Y. (1996). Maintenance of discovered association rules in large databases: An incremental update technique. In Proceedings of the 12th International Conference on Data Engineering (pp. 106–114).

Chung, S.M., & Mangamuri, M. (2005). Mining association rules from the star schema on a parallel NCR Teradata database system. In Proceedings of International Conference on Information Technology: Coding and Computing (pp. 206–212).

Domingues, M.A., & Rezende, S.O. (2005). Using taxonomies to facilitate the analysis of the association rules. In Proceedings of the 2nd International Workshop on Knowledge Discovery and Ontologies.

van Elst, L., & Abecker, A. (2002). Ontologies for information management: balancing formality, stability, and sharing scope. Expert Systems with Applications 23(4), 357–366.

Ester, M., Kriegel, H.P., Sander, J., Wimmer, M., & Xu, X. (1998). Incremental clustering for mining in a data warehousing environment. In Proceedings of the 24th International Conference on Very Large Data Bases (pp. 323–333).

Ester, M., & Wittmann, R. (1998). Incremental generalization for mining in a data warehousing environment. In Proceedings of the 6th International Conference on Extending Database Technology: Advances in Database Technology (pp. 135–149).

Euler, T., & Scholz, M. (2004). Using ontologies in a KDD workbench. In Proceedings of ECML/PKDD Workshop on Knowledge Discovery and Ontologies.

Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). The KDD process for extracting useful knowledge from volumes of data. Communications of the ACM 39(11), 27–34.

Gottgtroy, P., Kasabov, N., & MacDonell, S. (2004). An ontology driven approach for knowledge discovery in biomedicine. In Proceedings of the 8th Pacific Rim International Conference on Artificial Intelligence.

Han, J. (1998). Toward on-line analytical mining in large databases. ACM SIGMOD Record 27(1), 97–107.

Han, J., Chiang, J.Y., Chee, S., et al. (1997). DBMiner: A system for data mining in relational databases and data warehouses. In Proceedings of the 1997 Conference of the Centre for Advanced Studies on Collaborative Research (pp. 250–255).

Han, J., & Fu, Y. (1995). Discovery of multiple-level association rules from large databases. In Proceedings of the 21st Very Large Databases Conference (pp. 420–431).

Han, J., & Kamber, M. (2001). Data Mining: Concepts and Techniques. Morgan Kaufmann.

Han, J., Lakshmanan, L.V.S., & Ng, R.T. (1999). Constraint-based, multi-dimensional data mining. IEEE Computer 32(8), 46–50.

Heflin, J., Editor (2004). OWL Web Ontology Language Use Cases and Requirements, W3C Recommendation, 10 February 2004, http://www.w3.org/TR/2004/REC-webont-req-20040210/

Inmon, W.H. (1995). Building the Data Warehouse, John Wiley & Sons, Inc., New York, NY.

Kimball, R. (1996). The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses, John Wiley & Sons, Inc.

Kopanas, I., Avouris, N.M., & Daskalaki, S. (2002). The role of domain knowledge in a large scale data mining project. In Proceedings of the 2nd Hellenic Conference on AI: Methods and Applications of Artificial Intelligence (pp. 288–299).

Kuo, Y.T., Lonie, A., & Sonenberg, L. (2007). Domain ontology driven data mining: A medical case study. In Proceedings of ACM SIGKDD Workshop on Domain Driven Data Mining (pp. 11–17).

Liao, S.H., Ho, H.H., & Yang, F.C. (2009). Ontology-based data mining approach implemented on exploring product and brand spectrum. Expert Systems with Applications 36(9), 11730–11744.

Lin, M.S., Zhang, H., & Yu, Z.G. (2006). An ontology for supporting data mining process. In Proceedings of IMACS Multiconference on Computational Engineering in Systems Applications (pp. 2074–2077).

Lisi, F.A., & Malerba, D. (2004). Inducing multi-level association rules from multiple relations. Machine Learning 55(2), 175–210.

Liu, J., & Yin, J. (2001). Towards efficient data re-mining (DRM). In Proceedings of the 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Lecture Notes in Computer Science 2035 (pp. 406–412).

Marinica, C., Guillet, F., & Briand, H. (2008). Post-processing of discovered association rules using ontologies. In Proceedings of IEEE International Conference on Data Mining Workshops (pp. 126–133).

Messaoud, R.B., Rabaséda, S.L., Boussaid, O., & Missaoui, R. (2006). Enhanced mining of association rules from data cubes. In Proceedings of the 9th ACM International Workshop on Data Warehousing and OLAP (pp. 11–18).

Ng, E.K.K., Ng, K., Fu, A.W.C., & Wang, K. (2002). Mining association rules from stars. In Proceedings of the 2002 IEEE International Conference on Data Mining (pp. 322–329).

Niemi, T., Toivonen, S., Niinimaki, M., & Nummenmaa, J. (2007). Ontologies with semantic web/grid in data integration for OLAP. International Journal on Semantic Web and Information Systems 3(4), 25–49.

Nunez, M. (1991). The use of background knowledge in decision tree induction. Machine Learning 6(3), 231–250.

Pan, D., & Pan, Y. (2006). Using ontology repository to support data mining. In Proceedings of the 6th World Congress on Intelligent Control and Automation (pp. 5947–5951).

Perez-Rey, D., Anguita, A., & Crespo, J. (2006). OntoDataClean: Ontology-based integration and preprocessing of distributed data. Lecture Notes in Computer Science 4345, 262–272.

Perng, C.S., Wang, H., Ma, S., & Hellerstein, J.L. (2001). Farm: A framework for exploring mining spaces with multiple attributes. In Proceedings of the 1st IEEE International Conference on Data Mining (pp. 449–456).

Perng, C.S., Wang, H., Ma, S., & Hellerstein, J.L. (2002). User-directed exploration of mining space with multiple attributes. In Proceedings of the 2nd IEEE International Conference on Data Mining (pp. 394–401).

Phillips, J., & Buchanan, B.G. (2001). Ontology-guided knowledge discovery in databases. In Proceedings of the 1st International Conference on Knowledge Capture (pp. 123–130).

Priebe, T., & Pernul, G. (2003). Ontology-based integration of OLAP and information retrieval. In Proceedings of the 14th International Workshop on Database and Expert Systems Applications (pp. 610–614).

Psaila, G., & Lanzi, P.L. (2000). Hierarchy-based mining of association rules in data warehouses. In Proceedings of ACM Symposium on Applied Computing (pp. 307–312).

Sharma, S., & Osei-Bryson, K.M. (2009). Framework for formal implementation of the business understanding phase of data mining projects. Expert Systems with Applications 36(2), 4114–4124.

Shah, N., Tsai, C.F., Marinov, M., Cooper, J., Vitliemov, P., & Chao, K.M. (2009). Ontological on-line analytical processing for integrating energy sensor data. IETE Technical Review 26(5), 375–387.

Srikant, R., & Agrawal, R. (1995). Mining generalized association rules. In Proceedings of the 21st Very Large Data Bases Conference (pp. 407–419).

Svatek, V., Rauch, J., & Flek, M. (2005). Ontology-based explanation of discovered associations in the domain of social reality. In Proceedings of the 2nd International Workshop on Knowledge Discovery and Ontologies.

Taylor, M., Stoffel, K., & Hendler, J. (1997). Ontology-based induction of high level classification rules. In Proceedings of SIGMOD Data Mining and Knowledge Discovery Workshop.

Tjioe, H.C., & Taniar, D. (2005). Mining association rules in data warehouses. International Journal of Data Warehousing and Mining 1(3), 28–62.

Tseng, M.C., Lin, W.Y., & Jeng, R. (2007). Mining association rules with ontological information. In Proceedings of the 2nd International Conference on Innovative Computing, Information and Control (pp. 300–303).

Uschold, M., & Gruninger, M. (1996). Ontologies: principles, methods and applications. Knowledge Engineering Review 11(2), 93–155.

Wu, C.A., Lin, W.Y., Tseng, M.C., & Wu, C.C. (2007). Ontology-incorporated mining of association rules in data warehouse. Journal of Internet Technology 8(4), 477–485.

Wu, C.A., Lin, W.Y., Jiang, C.L., & Wu, C.C. (2009). Favorable support threshold recommendation for multidimensional association mining using user preference ontology. In Proceedings of 2009 IEEE International Conference on Granular Computing (pp. 586–591).

Yang, W., Li, Y., Wu, J., & Xu, Y. (2008). Granule mining oriented data warehousing model for representations of multidimensional association rules. International Journal of Intelligent Information and Database Systems 2(1), 125–145.

Zhu, H. (1998). On-line analytical mining of association rules. Master’s thesis, Simon Fraser University, Canada.
