Top PDF Mining rules from an incomplete data set with a high missing rate

Mining rules from an incomplete data set with a high missing rate

... high. To deal with this disadvantage, we introduce an iterative missing-value completion method that fully infers the missing attribute values by combining an iterative mechanism with data mining techniques. The method uses the RAR support criterion [11] to extract useful association rules for inferring the missing values iteratively. It consists of three phases. The first phase uses association rules mined from the original incomplete dataset to roughly complete the missing values. The second phase reduces the minimum support to gather more association rules from the original incomplete dataset and iteratively completes the values still missing after phase 1, until no missing values remain. The third phase uses association rules mined from the completed dataset to correct the filled-in values, repeating until they converge. Experiments on two datasets show the performance of the proposed approach.
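
The three phases can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it uses plain support over all records instead of the RAR support criterion, mines only single-antecedent rules, and every function name, threshold, and the `step` parameter is an assumption made for the example.

```python
from collections import Counter
from itertools import permutations

def mine_rules(rows, min_sup, min_conf):
    """Mine single-antecedent rules (a=x) -> (b=y) from the cells that are present."""
    pair_counts, item_counts = Counter(), Counter()
    for row in rows:
        items = [(k, v) for k, v in row.items() if v is not None]
        item_counts.update(items)
        pair_counts.update(permutations(items, 2))
    rules = {}
    n = len(rows)
    for (ante, cons), cnt in pair_counts.items():
        conf = cnt / item_counts[ante]
        if cnt / n >= min_sup and conf >= min_conf:
            key = (ante, cons[0])  # (antecedent item, target attribute)
            if key not in rules or conf > rules[key][1]:
                rules[key] = (cons[1], conf)
    return rules

def fill(rows, rules, cells):
    """Set each listed (row index, attribute) cell to the most confident prediction."""
    changed = False
    for i, attr in cells:
        best = None
        for k, v in rows[i].items():
            if k != attr and v is not None and ((k, v), attr) in rules:
                cand = rules[((k, v), attr)]
                if best is None or cand[1] > best[1]:
                    best = cand
        if best is not None and rows[i][attr] != best[0]:
            rows[i][attr] = best[0]
            changed = True
    return changed

def complete(rows, min_sup=0.4, min_conf=0.6, step=0.1, max_rounds=10):
    original = [dict(r) for r in rows]  # the original incomplete dataset
    rows = [dict(r) for r in rows]      # working copy that gets filled in
    missing = [(i, k) for i, r in enumerate(rows) for k, v in r.items() if v is None]
    # Phase 1: rough completion with rules mined from the incomplete data.
    fill(rows, mine_rules(original, min_sup, min_conf), missing)
    # Phase 2: lower the minimum support and re-mine until nothing is missing
    # (or the support threshold reaches zero).
    sup = min_sup
    while any(rows[i][k] is None for i, k in missing) and sup > 0:
        sup = max(sup - step, 0.0)
        rest = [(i, k) for i, k in missing if rows[i][k] is None]
        fill(rows, mine_rules(original, sup, min_conf), rest)
    # Phase 3: re-mine from the completed data and correct the filled cells
    # until no filled value changes any more.
    for _ in range(max_rounds):
        if not fill(rows, mine_rules(rows, min_sup, min_conf), missing):
            break
    return rows
```

Calling `complete(rows)` on a list of dictionaries that use `None` for missing cells returns a filled copy and leaves the input untouched.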

Mining a Complete Set of Fuzzy Multiple-Level Rules

However, the fuzzy association rules derived in that way are not complete, as some possible fuzzy association rules might be missing. This paper proposes a new fuzzy data-mining algorithm for extracting all possible fuzzy association rules from transactions stored as quantitative values. The proposed algorithm derives a more complete set of rules than the previous method, but at the cost of more computation time. A trade-off thus exists between computation time and the completeness of the rules, and the appropriate mining method depends on the requirements of the application domain.

A DATA mining Procedure Using Neural Network- Self Organization Map and Rough Set to Discover Association Rules

2.3.1 SOM

Kohonen proposed the SOM in 1980. It is an unsupervised two-layer network that can recognize a topological map from a random starting point. With SOM, an enterprise's customers, products, suppliers, etc. can be clustered. Depending on the characteristics of each cluster, different marketing strategies may be adopted by making use of the corresponding discovered association rules. In a SOM network, the input nodes and output nodes are fully connected: each input node contributes to each output node through a weight. Figures 3 and 4 show the network structure and the flow chart of the SOM training procedure, respectively. In our system, the user can set the number of output nodes (the cluster number), the learning rate, the radius rate, the convergence error rate, etc.
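
A minimal sketch of such a training loop is given below, assuming a one-dimensional row of output nodes, a Gaussian neighbourhood, and a shared multiplicative decay for the learning rate and radius. These choices, the function names, and the default values are assumptions for illustration; they are not taken from the system described above.

```python
import math
import random

def best_unit(weights, x):
    """Index of the output node whose weight vector is closest to x."""
    return min(range(len(weights)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(weights[j], x)))

def train_som(data, n_nodes, lr=0.5, radius=1.0, decay=0.9,
              tol=1e-3, max_epochs=100, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    # Random starting point: every input is connected to every output node.
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    for _ in range(max_epochs):
        total_change = 0.0
        for x in data:
            bmu = best_unit(weights, x)
            for j, w in enumerate(weights):
                # Nodes near the winner on the output row are pulled more strongly.
                h = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    delta = lr * h * (x[d] - w[d])
                    w[d] += delta
                    total_change += abs(delta)
        lr *= decay
        radius = max(radius * decay, 1e-3)
        if total_change < tol:  # convergence error threshold
            break
    return weights
```

After training, `best_unit(weights, x)` gives the cluster index assigned to an input vector `x`.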

Finding fuzzy classification rules using data mining techniques

$f(V^{(j)}) = W_{CAR} \cdot CAR(V^{(j)}) - W_V \cdot |V^{(j)}|$ (9)

where $V^{(j)}$ denotes a set consisting of the effective fuzzy classification rules obtained by $s^{(j)} c^{(j)}$, and $W_{CAR}$ and $W_V$ are relative weights of the classification accuracy rate by $V^{(j)}$ (i.e., $CAR(V^{(j)})$) and the number of fuzzy rules in $V^{(j)}$ (i.e., $|V^{(j)}|$), respectively. The chromosome that has the maximum fitness value in the final generation is further used to examine the classification performance of the proposed method. That is, the acquisition of a compact fuzzy rule set with high classification accuracy rate is taken into account in the overall
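
As a small illustration of Eq. (9), the sketch below scores candidate rule sets and picks the fittest one. It assumes the rule-count term is subtracted as a penalty, which matches the stated goal of a compact rule set; the weight values and the example population are made up for the illustration.

```python
def fitness(car, n_rules, w_car=10.0, w_v=1.0):
    """Weighted accuracy reward minus a penalty on the number of rules."""
    return w_car * car - w_v * n_rules

# Hypothetical population: (classification accuracy rate, number of rules).
population = [(0.90, 5), (0.95, 9), (0.80, 2)]
best = max(population, key=lambda ind: fitness(*ind))
```

With these assumed weights, the small but less accurate rule set `(0.80, 2)` wins, which shows how the ratio of the two weights controls the trade-off between accuracy and compactness.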

Ranking discovered rules from data mining with multiple criteria by data envelopment analysis

The traditional Apriori algorithm cannot classify the infrequent items to interesting itemsets since the subjective domain knowledge is ignored. A huge amount of subjective domain knowledge may exist, which can be considered as potential subjective constraints and measures for evaluating association rules. Following the discovery and reporting of some rules, a data miner can select the subjective interestingness measures in Step 3. In market basket analysis, understanding which products are usually bought together by customers and which products are beneficial to sellers are both interesting subjects for marketing analysts. The former can be measured in terms of support and confidence in association rules. In this paper, the subjective measures of sellers' profits are evaluated in terms of itemset value and cross-selling profit corresponding to the association rules. For association rules like X ⇒ Y, four criteria are jointly used for rule evaluation as follows:

Mining Time Series Data with Fuzzy Association Rules

2. BASIC CONCEPTS

2.1 Association Rules

Data mining can be applied to discover useful patterns and rules by exploring and analyzing a large quantity of data. That is, a collection of data from customer surveys, health studies, market examinations, item banks and other raw data needs further analysis to transform it into useful information. In general, data mining involves the recognition of implicit patterns that are hard to analyze, even with traditional statistical techniques. An important mining task in this area is the discovery of association rules [10]. Suppose that basket data consist of the items bought by a customer over a period of time. Mining a large collection of basket data for association rules then means identifying relations between sets of items that meet specified confidence and support levels. The definitions of the association rules are reviewed as follows:
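
The excerpt is cut off before the definitions. As a hedged stand-in, the usual textbook definitions of support and confidence can be sketched as below; the function names and the basket data are illustrative and not taken from the paper.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, x, y):
    """Support of X and Y together divided by the support of X, for the rule X => Y."""
    return support(transactions, set(x) | set(y)) / support(transactions, x)

baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
sup = support(baskets, {"bread", "milk"})        # 2 of 4 baskets
conf = confidence(baskets, {"bread"}, {"milk"})  # 2 of the 3 bread baskets
```

A rule is typically reported only when both values reach user-specified minimum thresholds.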

An efficient algorithm for mining temporal high utility itemsets from data streams

3.3.2. Incremental procedure of THUI-Mine

As shown in Table 3, $D^-$ indicates the unchanged portion of an ongoing transaction database. The deleted and added portions of an ongoing transaction database are denoted by $\Delta^-$ and $\Delta^+$, respectively. It is worth mentioning that the sizes of $\Delta^+$ and $\Delta^-$, i.e., $|\Delta^+|$ and $|\Delta^-|$ respectively, are not required to be the same. The incremental procedure of THUI-Mine is devised to maintain temporal high utility itemsets efficiently and effectively. This procedure is shown in Fig. 4. As mentioned before, this incremental step can also be divided into three sub-steps: (1) generating temporal high TWU2I in $D^- = db^{1,3} - \Delta^-$, (2) generating temporal high TWU2I in $db^{2,4} = D^- + \Delta^+$ and (3) scanning the database $db^{2,4}$ only once for the generation of all temporal high utility itemsets. Initially, after some update activities, old transactions $\Delta^-$ are removed from the database $db^{m,n}$ and new transactions $\Delta^+$ are added (in Step 6). Note that $\Delta^- \subset db^{m,n}$. Denoting the updated database as $db^{i,j}$, note that $db^{i,j} = db^{m,n} - \Delta^- + \Delta^+$. We denote the unchanged transactions by $D^- = db^{m,n} - \Delta^- = db^{i,j} - \Delta^+$. After loading $Thtw^{m,n}$ of $db^{m,n}$ into CF where $I \in Thtw^{m,n}$, we start the first sub-step, i.e., generating temporal high TWU2I in $D^- = db^{m,n} - \Delta^-$. This sub-step reverses the cumulative processing which is described in the pre-processing procedure. From Step 8 to Step 16, we prune the occurrences of an itemset $I$, which appeared before partition $P_i$, by deleting the value $I.twu$ where $I \in CF$ and $I.start < i$. Next, from Step 17 to Step 39, similarly to the cumulative processing in Section 3.3.1, the second sub-step generates temporal high TWU2I in $db^{i,j} = D^- + \Delta^+$ and employs the scan reduction technique to generate $C^{i,j}_{h+1}$. Finally, to generate temporal high utility itemsets, i.e., $Thu^{i,j}$, in the updated database, we scan $db^{i,j}$ only once in the incremental procedure to find temporal high utility itemsets. Note that $Thtw^{i,j}$ is kept in main memory for the next generation of incremental mining.

Data Mining: An Overview from A Database Perspective

In order to conduct effective data mining, one needs to first examine what kind of features an applied knowledge discovery system is expected to have and what kind of chal- [r]

Mining Fuzzy Multiple-Level Association Rules from Quantitative Data

Machine-learning and data-mining techniques have been developed to turn data into useful task-oriented knowledge. Most algorithms for mining association rules identify relationships among transactions using binary values and find rules at a single-concept level. Transactions with quantitative values and items with hierarchical relationships are, however, commonly seen in real-world applications. This paper proposes a fuzzy multiple-level mining algorithm for extracting knowledge implicit in transactions stored as quantitative values. The proposed algorithm adopts a top-down progressively deepening approach to finding large itemsets. It integrates fuzzy-set concepts, data-mining technologies and multiple-level taxonomy to find fuzzy association rules from transaction data sets. Each item uses only the linguistic term with the maximum cardinality in later mining processes, thus making the number of fuzzy regions to be processed the same as the number of original items. The algorithm therefore focuses on the most important linguistic terms for reduced time complexity.
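
The step of keeping only the linguistic term with the maximum cardinality can be sketched as follows. The triangular membership functions, their breakpoints, and the three term names are assumptions made for the example; the paper's own membership functions are not shown in this excerpt.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 at a and c, 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical fuzzy regions shared by all items: (left foot, peak, right foot).
REGIONS = {"Low": (0, 1, 6), "Middle": (1, 6, 11), "High": (6, 11, 16)}

def max_cardinality_terms(transactions):
    """For each item, return the linguistic term with the largest scalar cardinality."""
    card = {}
    for t in transactions:
        for item, qty in t.items():
            for term, (a, b, c) in REGIONS.items():
                key = (item, term)
                card[key] = card.get(key, 0.0) + triangular(qty, a, b, c)
    best = {}
    for (item, term), value in card.items():
        if item not in best or value > best[item][1]:
            best[item] = (term, value)
    return best

terms = max_cardinality_terms([{"milk": 2, "bread": 9}, {"milk": 3, "bread": 10}])
```

Because each item keeps exactly one term, the later mining passes work with as many fuzzy regions as there are original items.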

Applying Visualizing Association Rules in Medical Data Mining —An  Example of Information Extraction from Severe Acute Respiratory

Mining fuzzy generalized association rules from quantitative data under fuzzy taxonomic structures

discovery of more general and important knowledge from data. Relevant taxonomies of data items are thus usually predefined in real-world applications. An item may, however, belong to different classes in different views. When taxonomic structures are not crisp, hierarchical graphs can be used to represent them. Terminal nodes on the

An Effective Algorithm for Mining Association Rules with Multiple Thresholds

lexicographic order. Frequent itemsets are computed iteratively in ascending order of size. If the largest frequent itemsets contain k items, mining all frequent itemsets takes k iterations. The initial iteration computes the frequent 1-itemsets L_1. Then, in each iteration i ≤ k, all frequent i-itemsets are computed by scanning the database once. Iteration i consists of two phases. First, the set C_i of candidate i-itemsets is created by joining the frequent (i-1)-itemsets in L_{i-1} found in the previous iteration. Next, the database is scanned to determine the support of all candidates in C_i, and the frequent i-itemsets L_i are extracted from these candidates. This iteration is repeated until no more candidates can be generated. The Apriori algorithm therefore needs k database passes to generate all frequent itemsets. For disk-resident databases, each pass requires reading the database completely, resulting in a large number of disk reads. In other words, the Apriori algorithm incurs a huge number of I/O operations.
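
The level-wise loop described above can be sketched as follows. This is a compact illustration of the classic Apriori scheme rather than the paper's multiple-threshold algorithm: it joins itemsets by set union instead of a lexicographic prefix join, uses a single absolute count threshold, and adds the standard subset-pruning step.

```python
from itertools import combinations

def apriori(transactions, min_count):
    transactions = [frozenset(t) for t in transactions]
    # Initial iteration: count single items to obtain L_1.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(level)
    k = 2
    while level:
        prev = list(level)
        # Phase 1: join frequent (k-1)-itemsets into candidate k-itemsets C_k,
        # keeping only candidates whose (k-1)-subsets are all frequent.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        # Phase 2: one pass over the database to count candidate supports.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(level)
        k += 1
    return frequent

db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
found = apriori(db, min_count=3)
```

Each pass of the `while` loop corresponds to one full scan of the database, which is where the I/O cost noted above comes from.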

Mining generalized fuzzy association rules from web taxonomic

The discovery of fuzzy association rules is an important data-mining task for which many algorithms have been proposed. However, the efficiency of these algorithms needs to be improved to handle large real-world datasets. In this paper, we present an efficient method named cluster-based fuzzy association rule (CBFAR) to discover generalized fuzzy association rules from web structures. The CBFAR method creates fuzzy cluster tables by scanning the browse information database (BIDB) once and then clustering each browse record into the k-th cluster table, where k is the length of the record. The counts of the fuzzy regions are stored in the Fuzzy_Cluster Tables. This method requires fewer comparisons to generate large itemsets. The CBFAR method is also discussed.

A generic approach for mining indirect association rules in data streams

3 Dept. of Computer Science and Information Engineering, Tamkang University, Taiwan
1 wylin@nuk.edu.tw; 2 waiewing@gmail.com; 3 chchen@mail.tku.edu.tw

Abstract. An indirect association refers to an infrequent itempair, each item of which is highly co-occurring with a frequent itemset called “mediator”. Although indirect associations have been recognized as powerful patterns in revealing interesting information hidden in many applications, such as recommendation ranking, substitute items or competitive items, and common web navigation path, etc., almost no work, to our knowledge, has investigated how to discover this type of pattern from streaming data. In this paper, the problem of mining indirect associations from data streams is considered. Unlike contemporary research work on stream data mining that investigates the problem individually for different types of streaming models, we treat the problem in a generic way. We propose a generic window model that can represent all classical streaming models and retain user flexibility in defining new models. In this context, a generic algorithm is developed, which guarantees no false positive rules and bounded support error as long as the window model is specifiable by the proposed generic model. Comprehensive experiments on both synthetic and real datasets have shown the effectiveness of the proposed approach as a generic way of finding indirect association rules over streaming data.

Combined Association Rules for Dealing with Missing Values

Upload time: 2009-12-17T06:58:05Z. Publisher: Asia University. Abstract: With the rapid increase in the use of databases, the problem of missing values inevitably arises. The techniques developed to recover these missing values effectively should be highly precise in order to estimate the missing values completely. The mining of association rules can effectively establish the relationship among items in databases.

Maintenance of Association Rules in Data Mining

compares these itemsets with the previously retained large and pre-large 1-itemsets. It partitions candidate 1-itemsets into three parts according to whether they are large or pre-large for the original database. If a candidate 1-itemset from the newly inserted transactions is also among the large or pre-large 1-itemsets from the original database, its new total count for the entire updated database can easily be calculated from its current count and previous count, since all previous large and pre-large itemsets have been retained with their counts. Whether an originally large or pre-large itemset is still large or pre-large after new transactions have been inserted is determined from its new support ratio, i.e., its total count divided by the total number of transactions. Conversely, if a candidate 1-itemset from the newly inserted transactions does not exist among the large or pre-large 1-itemsets in the original database, then it cannot be large for the entire updated database as long as the number of newly inserted transactions is within the predefined number of new transactions. In this situation, no action is needed.
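
A rough sketch of this update for 1-itemsets is shown below. It assumes the retained counts are stored in a dictionary, that upper and lower support thresholds separate large from pre-large itemsets, and that the number of new transactions stays within the predefined bound, so items absent from the retained set can be skipped. All names and threshold values are illustrative.

```python
from collections import Counter

def update_one_itemsets(retained, d, new_transactions, s_upper, s_lower):
    """Re-classify retained 1-itemsets after inserting new transactions.

    retained: item -> count in the original database (large and pre-large items)
    d:        number of transactions in the original database
    """
    t = len(new_transactions)
    new_counts = Counter(item for tr in new_transactions for item in set(tr))
    total = d + t
    large, prelarge = {}, {}
    for item, old in retained.items():
        count = old + new_counts.get(item, 0)  # previous count + current count
        ratio = count / total                  # new support ratio
        if ratio >= s_upper:
            large[item] = count
        elif ratio >= s_lower:
            prelarge[item] = count
    # Items that appear only in the new transactions are deliberately ignored:
    # within the predefined bound they cannot become large, so no action is needed.
    return large, prelarge

large, prelarge = update_one_itemsets(
    {"a": 6, "b": 4}, d=10,
    new_transactions=[{"b", "c"}, {"c"}],
    s_upper=0.5, s_lower=0.3,
)
```

The point of the design is that no rescan of the original database is needed for this update; only the retained counts and the new transactions are read.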

Using Affinity Set and Data Mining on Revisiting Rules of Emergent Patients 李宗鴻、陳郁文

Data mining can uncover hidden information in data for decision-makers. During the rush hours of an emergency room, how to help medical personnel provide effective services and thereby enhance patient safety is a very important issue. In this study, six methods (Affinity Set, Back-propagation Neural Network, Rough Set theory, Support Vector Machine, Decision Tree and Association Rules) are compared by their Receiver Operating Characteristic (ROC) curve performance to find the model that best captures the revisiting rules of emergency patients. The results show that the Support Vector Machine has the best classification power, the affinity set model is second best, and both reach a prediction accuracy of 80%. However

Mining Frequent Itemsets from Data Streams with a Time-Sensitive Sliding Window

National Chengchi University, Taipei, Taiwan, R.O.C.

Abstract. Mining frequent itemsets has been widely studied over the last decade. Past research focuses on mining frequent itemsets from static databases. In many of the new applications, data flow through the Internet or sensor networks. It is challenging to extend the mining techniques to such a dynamic environment. The main challenges include a quick response to the continuous request, a compact summary of the data stream, and a mechanism that adapts to the limited resources. In this paper, we develop a novel approach for mining frequent itemsets from data streams based on a time-sensitive sliding window model. Our approach consists of a storage structure that captures all possible frequent itemsets and a table providing approximate counts of the expired data items, whose size can be adjusted by the available storage space.

Fuzzy data mining for interesting generalized association rules

Transactions with quantitative values and items with hierarchical relations are, however, commonly seen in real-world applications. In this paper, we introduce the problem of mining generalized association rules for quantitative values. We propose a fuzzy generalized rule mining algorithm for extracting implicit knowledge from transactions stored as quantitative values. Given a set of transactions and a predefined taxonomy, we want to find fuzzy generalized association rules in which the quantitative items may come from any level of the taxonomy. Each item uses only the linguistic term with the maximum cardinality in later mining processes, thus making the number of fuzzy regions to be processed the same as the number of original items. The algorithm can therefore focus on the most important linguistic terms and reduce its time complexity. The proposed algorithm combines a fuzzy transaction data mining algorithm with an algorithm for mining generalized association rules. This paper relates to set concepts, fuzzy data mining algorithms, taxonomies and generalized association rules.

Elicitation of classification rules by fuzzy data mining

$\omega_t \, FC(R_b) = \max_j \{ \omega_j \, FC(R_j) \mid R_j \in TR \}$, (10)

where TR is the set of fuzzy rules generated by the proposed method. The adaptive rules are further employed to adjust the fuzzy confidence of $R_b$: if $t_p$ is correctly classified then $FC(R_b)$ is increased; otherwise, $FC(R_b)$ is decreased. Nozaki et al. (1996) also suggested that the learning rates should be specified as $0 < \eta_1 \ll \eta_2 < 1$. Actually, $\eta_1 = 0.001$, $\eta_2 = 0.1$ and $J_{max} = 500$ are used in the experiment. In the subsequent section, experimental results from the iris data demonstrate the effectiveness of the proposed method. However, the aim of the experiment is to show the feasibility and the problem-solving capability of the proposed method for classification problems. That is, methods for acquiring appropriate parameter specifications to obtain higher classification accuracy rates and a smaller number of fuzzy if–then rules are not considered in this paper.
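
The excerpt does not show the update formulas themselves. The sketch below assumes the reward-and-punishment form commonly associated with Nozaki et al. (1996), in which a correct classification moves the fuzzy confidence toward 1 at rate η1 and a misclassification shrinks it at rate η2. The function name is illustrative, and the exact rule used in the paper may differ.

```python
def adjust_confidence(fc, correct, eta1=0.001, eta2=0.1):
    """Return the adjusted fuzzy confidence of the winning rule R_b."""
    if correct:
        # Reward: small step toward 1 (eta1 is much smaller than eta2).
        return fc + eta1 * (1.0 - fc)
    # Punishment: larger proportional decrease.
    return fc - eta2 * fc

rewarded = adjust_confidence(0.5, correct=True)
punished = adjust_confidence(0.5, correct=False)
```

Because η1 is much smaller than η2, a single misclassification outweighs many correct classifications, which steers the rule set away from rules that cause errors.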