Note, however, that in some cases the group identity does not depend so much on the value of a single attribute.
Rather, the group identity depends on the combined values of a set of attributes. This is particularly true in a database where attributes have strong dependencies among themselves. Combining several individual features is thus required for constructing multi-attribute predicates with better inference power. In the second phase, referred to as the feature combination phase, the features extracted in the first phase are evaluated together and multi-attribute predicates with strong inference power are identified. A technique using the match index of attributes is devised to reduce the processing cost. In essence, a match index is a heuristic indication of the combined inference power of multiple attributes, and can be used to identify uninteresting combined attributes and remove them from later processing. Note that, being performed only on a subset of the training set, the feature extraction phase can be executed efficiently. On the other hand, since the extracted features are applied to the whole training set in the feature combination phase, the confidence of the final classification rules derived can be ensured.
The effective development of data mining techniques for the discovery of knowledge from training samples for classification problems in industrial engineering is necessary in applications such as group technology. This paper proposes a learning algorithm, which can be viewed as a knowledge acquisition tool, to effectively discover fuzzy association rules for classification problems. The consequence part of each rule is one class label. The proposed learning algorithm consists of two phases: one to generate large fuzzy grids from training samples by fuzzy partitioning in each attribute, and the other to generate fuzzy association rules for classification problems from large fuzzy grids. The proposed learning algorithm is implemented by scanning training samples stored in a database only once and applying a sequence of Boolean operations to generate fuzzy grids and fuzzy rules; therefore, it can be easily extended to discover other types of fuzzy association rules. The simulation results from the iris data demonstrate that the proposed learning algorithm can effectively derive fuzzy association rules for classification problems. © 2002 Elsevier Science Ltd. All rights reserved.
b Institute of Management of Technology, National Chiao Tung University, Hsinchu 300, Taiwan, ROC. Received 25 November 2002; received in revised form 22 May 2003; accepted 18 September 2003
Data mining techniques can be used to find potentially useful patterns from data and to ease the knowledge acquisition bottleneck in building prototype rule-based systems. Based on the partition methods presented in the simple-fuzzy-partition-based method (SFPBM) proposed by Hu et al. (Comput. Ind. Eng. 43(4) (2002) 735), the aim of this paper is to propose a new fuzzy data mining technique consisting of two phases to find fuzzy if–then rules for classification problems: one to find frequent fuzzy grids by using a pre-specified simple fuzzy partition method to divide each quantitative attribute, and the other to generate fuzzy classification rules from frequent fuzzy grids. To improve the classification performance of the proposed method, we incorporate adaptive rules proposed by Nozaki et al. (IEEE Trans. Fuzzy Syst. 4(3) (1996) 238) into our method to adjust the confidence of each classification rule. With respect to classification generalization ability, the simulation results from the iris data demonstrate that the proposed method may effectively derive fuzzy classification rules from training samples.
One of the most important tasks in data mining is to discover association rules from a database. An association rule is an expression of the form X ⇒ Y, where X and Y are sets of items. Such information is very useful in decision making for business management. In the past few years, several studies have investigated the problem of mining association rules with classification or composition information, showing the benefit of incorporating domain knowledge and proposing effective algorithms.
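To make the form X ⇒ Y concrete, the following minimal Python sketch computes the two standard measures used to rank such rules, support and confidence. The toy transactions and the function names `support` and `confidence` are illustrative assumptions and do not come from the cited work.

```python
from typing import Iterable, List, FrozenSet


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    wanted = frozenset(itemset)
    hits = sum(1 for t in transactions if wanted <= t)
    return hits / len(transactions)


def confidence(x: Iterable[str], y: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Estimated probability of Y given X: support(X ∪ Y) / support(X)."""
    sx = support(x, transactions)
    if sx == 0:
        return 0.0  # rule antecedent never occurs
    return support(frozenset(x) | frozenset(y), transactions) / sx


# Hypothetical toy database of four market-basket transactions.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
```

A rule is usually reported only when both values reach user-chosen minimum thresholds.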
In this paper, we propose a two-phase data mining technique to discover fuzzy rules for classification problems based on the Apriori algorithm. The first phase finds frequent fuzzy grids by dividing each quantitative attribute with a pre-specified number of various linguistic values. The second phase generates effective fuzzy classification rules from those frequent fuzzy grids. The fuzzy support and the fuzzy confidence, which have been defined previously (e.g., Ishibuchi et al., 2001a; Ishibuchi et al., 2001b; Hu et al., 2002), are employed to determine which fuzzy grids are frequent and which rules are effective by comparison with the minimum fuzzy support (min FS) and the minimum fuzzy confidence (min FC), respectively.
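As a rough illustration of how fuzzy support and fuzzy confidence could be evaluated, the sketch below assumes attributes normalized to [0, 1], K evenly spaced symmetric triangular linguistic values per attribute, and the product operator for combining memberships. These choices, and all function names, are assumptions made for illustration; the exact definitions are those of the cited papers.

```python
from typing import Dict, Optional, Sequence


def triangular_membership(x: float, k: int, K: int) -> float:
    """Membership of x in the k-th (1-indexed) of K triangular values on [0, 1].

    Assumes K >= 2 and evenly spaced peaks at (k - 1) / (K - 1).
    """
    center = (k - 1) / (K - 1)
    width = 1 / (K - 1)
    return max(0.0, 1.0 - abs(x - center) / width)


def grid_membership(sample: Sequence[float], grid: Dict[int, int], K: int) -> float:
    """Compatibility of a sample with a fuzzy grid (attribute index -> value index)."""
    degree = 1.0
    for attribute, k in grid.items():
        degree *= triangular_membership(sample[attribute], k, K)
    return degree


def fuzzy_support(samples, grid, K, labels: Optional[Sequence[str]] = None,
                  target: Optional[str] = None) -> float:
    """Mean grid membership; if `target` is given, only that class contributes."""
    total = 0.0
    for i, sample in enumerate(samples):
        if target is not None and labels[i] != target:
            continue
        total += grid_membership(sample, grid, K)
    return total / len(samples)


def fuzzy_confidence(samples, labels, grid, target, K) -> float:
    """FS(grid and class) / FS(grid) for the rule 'grid => class'."""
    fs_grid = fuzzy_support(samples, grid, K)
    if fs_grid == 0:
        return 0.0
    return fuzzy_support(samples, grid, K, labels, target) / fs_grid


# Hypothetical two-attribute training samples with class labels.
samples = [(0.0, 0.1), (0.2, 0.0), (1.0, 0.9), (0.9, 1.0)]
labels = ["a", "a", "b", "b"]
low_on_first = {0: 1}  # attribute 0 takes the first ("low") of K = 3 values

fs = fuzzy_support(samples, low_on_first, K=3)
fc = fuzzy_confidence(samples, labels, low_on_first, "a", K=3)
```

Under these assumptions, a grid is kept as frequent when `fs` reaches min FS, and the rule is kept as effective when `fc` reaches min FC.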
(3) information with respect to types of cargo carried, use of different tanks for cargo/ballast, protection of tanks and condition of coating, if any.
(iii) There are three basic types of possible failure which may be the subject of technical assessment in connection with planning of surveys: corrosion, cracks and buckling. Contact damages are not normally covered by the survey plan since indents are usually noted in memoranda and assumed to be dealt with as a normal routine by surveyors. Technical assessments performed in conjunction with the survey planning process are, in principle, to be as shown schematically in Fig. I 2-1. The approach is basically an evaluation of the risk based on the knowledge and experience related to design and corrosion. The design is to be considered with respect to structural details which may be susceptible to buckling or cracking as a result of vibration, high stress levels or fatigue. Corrosion is related to the ageing process, and is closely connected with the quality of corrosion protection at newbuilding, and subsequent maintenance during the service life. Corrosion may also lead to cracking and/or buckling.
Figure 4 gives an example to explain how to tune the membership functions by the refined value δ defined in the proposed refined K-means clustering algorithm. In Fig. 4, the circle and square notations with black and grey colors represent respectively the cluster centers determined by the K-means and refined K-means clustering algorithms. Moreover, the dotted and solid lines respectively represent the membership functions determined by the K-means and refined K-means clustering algorithms. Compared with the original K-means clustering algorithm, we find that the refined K-means clustering algorithm has benefits in terms of promoting classification ability for the boundary patterns between different classes and reducing the number of unknown patterns, owing to the expanded boundary range, when the neuro-fuzzy classifier is employed to discover the learning performance assessment rules based on learning portfolios. The later experimental results will confirm these benefits.
2 Dept. of Comp. Sci. & Info. Eng., National University of Kaohsiung, Taiwan
Abstract. The process of knowledge discovery from databases is a knowledge-intensive, highly user-oriented practice, which has recently prompted the development of ontology-incorporated data mining techniques. In our previous work, we considered the problem of mining association rules with ontological information (called ontological association rules) and devised two efficient algorithms, called AROC and AROS, for discovering ontological associations that exploit not only classification but also composition relationships between items. The real world, however, is not static. Data mining practitioners are usually confronted with a dynamic environment. New transactions are continually added to the database over time, and the ontology of items evolves accordingly. Furthermore, the work of discovering interesting association rules is an iterative process; the analysts need to repeatedly adjust the constraints of minimum support and/or minimum confidence to discover truly informative rules. Under these circumstances, how to dynamically discover association rules efficiently is a crucial issue. In this regard, we propose a unified algorithm, called MIFO, which can handle the maintenance of discovered frequent patterns while taking account of all evolving factors: new transactions added to databases, ontology evolution and minimum support refinement. Empirical evaluation showed that MIFO is significantly faster than running our previous algorithms AROC and AROS from scratch.
One of the predominant techniques used in the area of data mining is association rule mining. In the real world, data mining analysts are usually confronted with a dynamic environment; the database changes over time, and the analysts may need to set different support constraints to discover truly informative rules. Efficiently updating the discovered association rules thus becomes a crucial issue. In this paper, we consider the problem of dynamic mining of association rules with classification ontology and with a non-uniform multiple minimum supports constraint. We investigate how to efficiently update the discovered association rules when there are transaction updates to the database and the analyst has refined the support constraint. A novel algorithm called DMA_CO is proposed. Experimental results show that our algorithm is 14% to 80% faster than applying generalized association mining algorithms to the whole updated database.
Abstract: Mining association rules from a large business database has been recognized as an important topic in the data mining community. A method that can help the analysis of associations is the use of classification ontology (taxonomy) and the setting of parameter constraints, such as minimum support. In real-world applications, however, the classification ontology cannot be kept static while new transactions are continuously added to the original database, and the analysts may also need to set a different support constraint from the original one while formulating a new query to discover truly informative rules.
c Department of Information Management, National Dong Hwa University, Hualien, Taiwan, ROC
In a database, the concept of an example might change over time, which is known as concept drift. When concept drift occurs, the classification model built by using the old dataset is not suitable for predicting a new dataset. Therefore, the problem of concept drift has attracted a lot of attention in recent years. Although many algorithms have been proposed to solve this problem, they have not been able to provide users with a satisfactory solution to concept drift. That is, the current research about concept drift focuses only on updating the classification model. However, real-life decision makers might be very interested in the rules of concept drift. For example, doctors desire to know the root causes behind variation in the causes and development of disease. In this paper, we propose a concept drift rule mining tree, called CDR-Tree, to accurately discover the underlying rule governing concept drift. The main contributions of this paper are: (a) we address the problem of mining concept-drifting rules, which has not been considered in previously developed classification schemes; (b) we develop a method that can accurately mine rules governing concept drift; (c) we develop a method that, should classification models be required, can efficiently and accurately generate such models via a simple extraction procedure rather than constructing them anew; and (d) we propose two strategies to reduce the complexity of concept-drifting rules mined by our CDR-Tree.
Mining association rules from large databases of business data is an important topic in data mining. In many applications, there are explicit or implicit taxonomies (hierarchies) over the items, so it may be more useful to find associations at different levels of the taxonomy than only at the primitive concept level. Previous work on the mining of generalized association rules, however, assumed that the taxonomy of items is kept unchanged, disregarding the fact that the taxonomy might be updated as new transactions are added to the database over time. Under this circumstance, how to effectively update the discovered generalized association rules to reflect the database change with the taxonomy evolution is a crucial task. In this paper, we examine this problem and propose two novel algorithms, called IDTE and IDTE2, which can incrementally update the discovered generalized association rules when the taxonomy of items evolves with new transactions. Empirical evaluations show that our algorithms can maintain their performance even with large amounts of incremental transactions and a high degree of taxonomy evolution, and are faster than applying the contemporary generalized association mining algorithms to the whole updated database.
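The sketch below illustrates, under simplifying assumptions, the two ideas this abstract relies on: counting itemsets at every taxonomy level by extending each transaction with the ancestors of its items, and updating counts incrementally when new transactions arrive. It is not the IDTE or IDTE2 algorithm. The taxonomy, the item names and the helper functions are hypothetical, the taxonomy is assumed to be a tree without cycles, and simply adding counts is valid only while the taxonomy stays unchanged.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set


def ancestors(item: str, parent: Dict[str, str]) -> Set[str]:
    """All ancestors of `item` in a tree-shaped taxonomy (child -> parent map)."""
    found = set()
    while item in parent:
        item = parent[item]
        found.add(item)
    return found


def extend(transaction: Iterable[str], parent: Dict[str, str]) -> FrozenSet[str]:
    """Transaction plus the ancestors of each of its items."""
    extended = set(transaction)
    for item in transaction:
        extended |= ancestors(item, parent)
    return frozenset(extended)


def count_itemsets(transactions, parent, size: int) -> Counter:
    """Occurrence counts of all `size`-itemsets over extended transactions.

    For brevity this does not prune itemsets that contain both an item
    and one of its own ancestors.
    """
    counts = Counter()
    for t in transactions:
        for combo in combinations(sorted(extend(t, parent)), size):
            counts[frozenset(combo)] += 1
    return counts


# Hypothetical taxonomy and databases.
parent = {"jacket": "outerwear", "ski pants": "outerwear",
          "outerwear": "clothes", "shirt": "clothes"}
original = [{"jacket"}, {"ski pants", "shirt"}]
increment = [{"jacket", "shirt"}]

old_counts = count_itemsets(original, parent, size=1)
# With an unchanged taxonomy, only the increment needs to be scanned.
updated_counts = old_counts + count_itemsets(increment, parent, size=1)
```

When the taxonomy itself changes, the stored counts of itemsets containing affected generalized items are no longer valid and have to be recomputed, which is the harder case the abstract addresses.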
(iii) Integer program hyper-plane methods: Bertsimas and Shioda recently used a mixed-integer optimization method to solve the classical statistical problems of classification and regression. Their method separates data points into different regions by using hyper-planes. Each region is assigned a class during the classification. By solving this mixed-integer program, rules with a high rate of accuracy can be induced. However, this approach may generate too many polyhedral regions, which reduces the compactness of the induced rules. Using integer programming techniques, Li and Chen developed a multiple criteria method to induce classification rules. Their method clusters data points into polyhedral regions and yields highly accurate rules. However, since their approach is based on the concept
argued that the two dimensions are inseparable in most real-world cases because
managers behave according to their thinking. In comparison, the behavioral dimension
has been adopted frequently in the alignment literature, and more focus should be
added to the cognitive dimension to enrich the assessment of alignment. Table 1
Abstract: Hashing schemes are widely used to improve the performance of association rule mining, as in the DHP algorithm, which uses a hash table to identify the validity of candidate itemsets according to the number of accesses to the table's buckets. However, since the hash table used in DHP is plagued by the collision problem, the process of
Transactions with quantitative values and items with hierarchical relations are, however, commonly seen in real-world applications. In this paper, we introduce the problem of mining generalized association rules for quantitative values. We propose a fuzzy generalized rule mining algorithm for extracting implicit knowledge from transactions stored as quantitative values. Given a set of transactions and a predefined taxonomy, we want to find fuzzy generalized association rules where the quantitative items may be from any level of the taxonomy. Each item uses only the linguistic term with the maximum cardinality in later mining processes, thus making the number of fuzzy regions to be processed the same as the number of original items. The algorithm can therefore focus on the most important linguistic terms and reduce its time complexity. The proposed algorithm combines a fuzzy transaction data mining algorithm with an algorithm for mining generalized association rules. This paper relates to set concepts, fuzzy data mining algorithms, taxonomies and generalized association rules.
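The selection of the linguistic term with the maximum cardinality can be sketched as follows. The three membership functions (Low, Middle, High), their breakpoints at 1, 6 and 11, and the sample quantities are assumptions chosen for illustration and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def memberships(quantity: float) -> Dict[str, float]:
    """Hypothetical Low/Middle/High membership degrees for a purchased quantity."""
    if quantity <= 1:
        return {"Low": 1.0, "Middle": 0.0, "High": 0.0}
    if quantity < 6:
        return {"Low": (6 - quantity) / 5, "Middle": (quantity - 1) / 5, "High": 0.0}
    if quantity < 11:
        return {"Low": 0.0, "Middle": (11 - quantity) / 5, "High": (quantity - 6) / 5}
    return {"Low": 0.0, "Middle": 0.0, "High": 1.0}


def max_cardinality_term(quantities: List[float]) -> Tuple[str, float]:
    """Return the linguistic term whose summed membership (scalar cardinality)
    over all transactions is largest, together with that cardinality."""
    totals = {"Low": 0.0, "Middle": 0.0, "High": 0.0}
    for q in quantities:
        for term, degree in memberships(q).items():
            totals[term] += degree
    best = max(totals, key=totals.get)
    return best, totals[best]


# Hypothetical quantities of one item (or taxonomy ancestor) in three transactions.
term, cardinality = max_cardinality_term([2, 7, 5])
```

Keeping a single term per item means that later candidate generation handles one fuzzy region per item instead of three, which is the source of the complexity reduction mentioned above.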