Experiments were conducted to compare the performance of the batch FP-tree
construction algorithm and the incremental FUFP-tree maintenance algorithm in
processing new transactions. Whenever new transactions arrived, the batch FP-tree
construction algorithm merged them into the original database and constructed a
new FP-tree from the updated database. The incremental FUFP-tree maintenance
algorithm instead processed the new transactions incrementally, as described in
Section 3.
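For reference, the batch baseline can be sketched in a few lines. This is a minimal Python sketch, not the authors' C++ implementation. It assumes that transactions are lists of hashable items, that the minimum support is given as a ratio, and that ties in the Header_Table order are broken by item name. The names `build_fp_tree`, `count_nodes` and `batch_update` are hypothetical, and the Header_Table node-links are omitted for brevity.

```python
from collections import Counter


class Node:
    """An FP-tree node: item label, count, parent link, children keyed by item."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_sup_ratio):
    """Batch construction: count items, keep the frequent ones, insert sorted transactions."""
    counts = Counter(item for t in transactions for item in set(t))
    threshold = min_sup_ratio * len(transactions)
    # Header_Table order: descending count, ties broken by item name (an assumption).
    order = sorted((i for i, c in counts.items() if c >= threshold),
                   key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(order)}
    root = Node(None, None)
    for t in transactions:
        node = root
        # Keep only frequent items and insert them in Header_Table order.
        for item in sorted((i for i in set(t) if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item, node)
            child.count += 1
            node = child
    return root, order


def count_nodes(node):
    """Number of nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(c) for c in node.children.values())


def batch_update(old_db, new_transactions, min_sup_ratio):
    """Batch baseline: merge the new transactions into the database and rebuild."""
    return build_fp_tree(old_db + new_transactions, min_sup_ratio)
```

On the toy database `[["a","b"], ["b","c"], ["a","b","c"], ["b"]]` with a ratio of 0.5, the order is `["b", "a", "c"]` and the tree has 4 nodes. The cost of `batch_update` grows with the whole merged database, which is the cost the incremental approach tries to avoid.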
The experiments were implemented in C++ and executed on an Intel x86 PC with a
2.8 GHz processor and 512 MB of main memory under the Microsoft Windows XP
operating system. A real dataset called BMS-POS [22] was used in the experiments.
This dataset was also used in the KDDCUP 2000 competition. The BMS-POS dataset
contained several years of point-of-sale data from a large electronics retailer. Each
transaction in this dataset consisted of all the product categories purchased by a
customer at one time. The dataset contained 515,597 transactions over 1,657 items.
The maximum transaction length was 164, and the average transaction length was
6.5.
The first 400,000 transactions of the BMS-POS database were used to construct an
initial FP-tree. The subsequent transactions were then added sequentially, 5,000 at
a time, as new transactions. The minimum support was set at 4%. The execution
times and the numbers of nodes obtained by the batch FP-tree construction
algorithm and by the incremental FUFP-tree maintenance algorithm were compared.
Figure 18 shows the execution times required by the two algorithms to process
each batch of 5,000 new transactions.
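Because the minimum support is a ratio, the absolute count an item needs in order to be large grows with every batch. The short Python sketch below works through this arithmetic for the setup above. The function name `support_thresholds` is hypothetical. The minimum support is passed as an integer percentage so that the threshold can be computed exactly with integer ceiling division, and five batches are used only as an example.

```python
def support_thresholds(initial_size, batch_size, num_batches, min_sup_pct):
    """Minimum support count needed after 0, 1, ..., num_batches batches are added.

    An item is large when count >= min_sup_pct / 100 * database size, so the
    smallest qualifying count is the ceiling of that product (integer arithmetic).
    """
    thresholds = []
    for k in range(num_batches + 1):
        size = initial_size + k * batch_size
        thresholds.append(-(-min_sup_pct * size // 100))  # ceiling division
    return thresholds


print(support_thresholds(400_000, 5_000, 5, 4))
# prints [16000, 16200, 16400, 16600, 16800, 17000]
```

With a 4% minimum support, each batch of 5,000 transactions raises the required count by 200. An item near the threshold can therefore change between large and small as batches arrive.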
Figure 18: The comparison of the execution time (x-axis: number of transactions; y-axis: execution time in seconds; curves: FP-tree and FUFP-tree)
Figure 18 shows that the proposed approach required much less execution time
than the batch FP-tree construction algorithm for handling new transactions. In
particular, the speed-up of the FUFP-tree maintenance algorithm increased as the
number of transactions in the original database grew.
The FUFP-tree maintenance algorithm may generate a less concise tree than the
FP-tree construction algorithm, since the latter strictly follows the sorted order of
frequent items when building the tree. As mentioned above, however, when an
originally small item becomes large due to new transactions, its updated support is
usually only slightly larger than the minimum support. It is thus reasonable to place
a new large item at the end of the Header_Table, and the difference between the
FP-tree and FUFP-tree structures is therefore not expected to be significant. To
illustrate this, Figure 19 compares the numbers of nodes generated by the two
algorithms.
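The Header_Table policy described above can be sketched as follows and compared with the fully re-sorted order that a batch rebuild would use. This is a hypothetical Python illustration. It assumes that originally large items keep their positions, that items which are no longer large are dropped, and that several new large items are appended in descending order of their updated counts, with ties broken by name.

```python
def fufp_header_order(old_order, updated_counts, threshold):
    """Header_Table order kept by incremental maintenance (assumed policy)."""
    old = set(old_order)
    # Originally large items stay in place unless they are no longer large.
    kept = [i for i in old_order if updated_counts.get(i, 0) >= threshold]
    # Newly large items are appended at the end rather than re-sorted into place.
    new_large = sorted((i for i, c in updated_counts.items()
                        if c >= threshold and i not in old),
                       key=lambda i: (-updated_counts[i], i))
    return kept + new_large


def batch_header_order(updated_counts, threshold):
    """Order a batch rebuild would use: all large items by descending count."""
    return sorted((i for i, c in updated_counts.items() if c >= threshold),
                  key=lambda i: (-updated_counts[i], i))


counts = {"a": 10, "b": 7, "c": 3, "x": 5, "y": 4}
print(fufp_header_order(["a", "b", "c"], counts, 4))  # ['a', 'b', 'x', 'y']
print(batch_header_order(counts, 4))                  # ['a', 'b', 'x', 'y']

counts2 = {"a": 10, "b": 4, "x": 6}
print(fufp_header_order(["a", "b"], counts2, 4))      # ['a', 'b', 'x']
print(batch_header_order(counts2, 4))                 # ['a', 'x', 'b']
```

In the first example the new items `x` and `y` are just above the threshold, so appending them gives the same order as re-sorting. In the second example `x` overtakes `b`, and the two orders differ. This is the situation in which the FUFP-tree can deviate from the FP-tree.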
Figure 19: The comparison of the number of nodes (x-axis: number of transactions; y-axis: number of nodes; curves: FP-tree and FUFP-tree)
Figure 19 shows that the FUFP-tree maintenance algorithm generated nearly the
same number of nodes as the FP-tree construction algorithm. The effectiveness of
the FUFP-tree maintenance algorithm is thus acceptable.
6. Conclusion
In this paper, we have proposed the FUFP-tree structure and its maintenance
algorithm to efficiently and effectively handle the insertion of new transactions in
data mining. The FUFP-tree structure is the same as the FP-tree structure [12] except
that the links between parent nodes and their child nodes are bi-directional. In
addition, the counts of the sorted frequent items are also kept in the Header_Table.
These modifications make the tree update process easier.
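A minimal Python sketch of these two modifications follows. It is illustrative only. The class names `FUFPNode` and `FUFPTree` are hypothetical, and node-links are kept as Python lists in a dict-based Header_Table rather than as linked pointers. The Header_Table count of an item is taken to be the number of inserted transactions that contain it.

```python
class FUFPNode:
    """Tree node with links in both directions: to its parent and to its children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent      # upward link, the extra direction in the FUFP-tree
        self.children = {}        # downward links keyed by item

    def path_to_root(self):
        """Items from the root down to this node, recovered via parent links."""
        items, node = [], self
        while node.parent is not None:
            items.append(node.item)
            node = node.parent
        return items[::-1]


class FUFPTree:
    def __init__(self, order):
        self.root = FUFPNode(None)
        self.order = list(order)  # Header_Table item order
        # Header_Table: each item's count plus its node-links.
        self.header = {i: {"count": 0, "nodes": []} for i in self.order}

    def insert(self, transaction):
        """Insert one transaction's Header_Table items in Header_Table order."""
        rank = {item: r for r, item in enumerate(self.order)}
        node = self.root
        for item in sorted((i for i in set(transaction) if i in rank), key=rank.get):
            self.header[item]["count"] += 1
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = FUFPNode(item, node)
                self.header[item]["nodes"].append(child)
            child.count += 1
            node = child
```

For example, after inserting `["a","b"]`, `["a","b","c"]` and `["b","c"]` into `FUFPTree(["b","a","c"])`, the Header_Table holds count 2 for `c`. Its two nodes yield the paths `["b","a","c"]` and `["b","c"]` through `path_to_root()`. The parent links let an update start from any node reached through the Header_Table and walk upward, without searching down from the root.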
When new transactions are added, the proposed incremental maintenance
algorithm processes them to maintain the FUFP-tree. It first partitions items into four
parts according to whether they are large or small in the original database and in the
new transactions. Each part is then processed in its own way, and the Header_Table
and the FUFP-tree are updated accordingly whenever necessary. A new large item is
inserted at the end of the Header_Table. This is reasonable because when an
originally small item becomes large due to new transactions, its updated support is
usually only slightly larger than the minimum support.
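The four-part partition can be sketched as below. This hypothetical Python function numbers the parts as cases 1 to 4: large in both, large only in the original database, large only in the new transactions, and small in both. It takes the minimum support as an integer percentage. It also assumes that the original counts of all items are available. For an originally small item, that count would in practice have to be obtained separately, for example by rescanning the original database, since the Header_Table keeps only large items.

```python
def partition_items(orig_counts, new_counts, orig_size, new_size, min_sup_pct):
    """Split items into four cases and decide which are large after the update."""

    def is_large(count, size):
        # count / size >= min_sup_pct / 100, checked with integers only
        return count * 100 >= min_sup_pct * size

    cases = {1: set(), 2: set(), 3: set(), 4: set()}
    for item in set(orig_counts) | set(new_counts):
        large_orig = is_large(orig_counts.get(item, 0), orig_size)
        large_new = is_large(new_counts.get(item, 0), new_size)
        if large_orig and large_new:
            cases[1].add(item)
        elif large_orig:
            cases[2].add(item)
        elif large_new:
            cases[3].add(item)
        else:
            cases[4].add(item)

    total = orig_size + new_size
    updated_large = set(cases[1])  # case 1 items always remain large
    for item in cases[2] | cases[3]:  # only these need their combined count checked
        if is_large(orig_counts.get(item, 0) + new_counts.get(item, 0), total):
            updated_large.add(item)
    return cases, updated_large
```

Items in case 1 always stay large and items in case 4 always stay small, because the two support inequalities simply add up over the merged database. Only items in cases 2 and 3 can change status, so only they need their combined counts compared with the new threshold.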
Experimental results also show that the proposed FUFP-tree maintenance
algorithm runs faster than the batch FP-tree construction algorithm for handling new
transactions and generates nearly the same tree structure as the FP-tree algorithm. The
proposed approach can thus achieve a good trade-off between execution time and tree
complexity.
The FP-growth mining procedure, originally used to mine the FP-tree, can also be
applied to the FUFP-tree. Both the FP-tree and FUFP-tree structures easily allow the
FP-growth procedure to mine the desired rules for only specified items. In this case,
maintaining the tree structure is especially important. In the future, we will address
other issues in incremental mining.
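The first step of FP-growth for a specified item is to collect that item's conditional pattern base. The item's nodes are reached through the Header_Table node-links, and each node's prefix path is recovered by following parent links up to the root. The following hypothetical Python sketch shows this step. It omits the recursive part of FP-growth, which builds and mines a conditional tree from this base.

```python
from collections import defaultdict


class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}


def build_tree(transactions, order):
    """Build a prefix tree in the given Header_Table order and record node-links."""
    rank = {item: r for r, item in enumerate(order)}
    root, node_links = Node(None, None), defaultdict(list)
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.get):
            if item not in node.children:
                node.children[item] = Node(item, node)
                node_links[item].append(node.children[item])
            node = node.children[item]
            node.count += 1
    return root, node_links


def conditional_pattern_base(node_links, item):
    """Prefix paths ending at `item`, each weighted by that node's count."""
    base = []
    for node in node_links[item]:
        prefix, p = [], node.parent
        while p.parent is not None:   # stop at the root
            prefix.append(p.item)
            p = p.parent
        if prefix:
            base.append((prefix[::-1], node.count))
    return base


def conditional_counts(base):
    """Support of each item within the conditional pattern base."""
    counts = defaultdict(int)
    for prefix, count in base:
        for i in prefix:
            counts[i] += count
    return dict(counts)


root, links = build_tree([["a", "b"], ["a", "b", "c"], ["b", "c"], ["a", "b", "c"]],
                         ["b", "a", "c"])
base = conditional_pattern_base(links, "c")
print(base)                      # [(['b', 'a'], 2), (['b'], 1)]
print(conditional_counts(base))  # {'b': 3, 'a': 2}
```

With a minimum count of 2, both `{b, c}` and `{a, c}` would be frequent in this toy example. Only the paths that contain the specified item are visited, so the rest of the tree is never traversed.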
References
[1] R. Agrawal, T. Imielinski and A. Swami, “Mining association rules between sets
of items in large databases,” The ACM SIGMOD Conference, pp. 207-216,
Washington DC, USA, 1993.
[2] R. Agrawal, T. Imielinski and A. Swami, “Database mining: a performance
perspective,” IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No.
6, pp. 914-925, 1993.
[3] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” The
International Conference on Very Large Data Bases, pp. 487-499, 1994.
[4] R. Agrawal and R. Srikant, “Mining sequential patterns,” The Eleventh IEEE
International Conference on Data Engineering, pp. 3-14, 1995.
[5] R. Agrawal, R. Srikant and Q. Vu, “Mining association rules with item
constraints,” The Third International Conference on Knowledge Discovery in
Databases and Data Mining, pp. 67-73, Newport Beach, California, 1997.
[6] M.S. Chen, J. Han and P.S. Yu, “Data mining: An overview from a database
perspective,” IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No.
6, pp. 866-883, 1996.
[7] D.W. Cheung, J. Han, V.T. Ng, and C.Y. Wong, “Maintenance of discovered
association rules in large databases: An incremental updating approach,” The
Twelfth IEEE International Conference on Data Engineering, pp. 106-114, 1996.
[8] D.W. Cheung, S.D. Lee, and B. Kao, “A general incremental technique for
maintaining discovered association rules,” In Proceedings of Database Systems
for Advanced Applications, pp. 185-194, Melbourne, Australia, 1997.
[9] C. I. Ezeife, “Mining incremental association rules with generalized FP-tree,”
Proceedings of the 15th Conference of the Canadian Society for Computational
Studies of Intelligence on Advances in Artificial Intelligence, pp. 147-160, 2002.
[10] T. Fukuda, Y. Morimoto, S. Morishita and T. Tokuyama, "Mining optimized
association rules for numeric attributes," The ACM SIGACT-SIGMOD-SIGART
Symposium on Principles of Database Systems, pp. 182-191, 1996.
[11] J. Han and Y. Fu, “Discovery of multiple-level association rules from large
databases,” The Twenty-first International Conference on Very Large Data Bases,
pp. 420-431, Zurich, Switzerland, 1995.
[12] J. Han, J. Pei, and Y. Yin, “Mining frequent patterns without candidate
generation,” The 2000 ACM SIGMOD International Conference on Management
of Data, 2000.
[13] M.Y. Lin and S.Y. Lee, “Incremental update on sequential patterns in large
databases,” The Tenth IEEE International Conference on Tools with Artificial
Intelligence, pp. 24-31, 1998.
[14] H. Mannila, H. Toivonen, and A.I. Verkamo, “Efficient algorithms for
discovering association rules,” The AAAI Workshop on Knowledge Discovery in
Databases, pp. 181-192, 1994.
[15] J.S. Park, M.S. Chen, P.S. Yu, “Using a hash-based method with transaction
trimming for mining association rules,” IEEE Transactions on Knowledge and
Data Engineering, Vol. 9, No. 5, pp. 812-825, 1997.
[16] Y. Qiu, Y. J. Lan and Q. S. Xie, “An improved algorithm of mining from
FP-tree,” Proceedings of the Third International Conference on Machine
Learning and Cybernetics, Shanghai, August 26-29, 2004.
[17] N. L. Sarda and N. V. Srinivas, “An adaptive algorithm for incremental mining
of association rules,” The Ninth International Workshop on Database and Expert
Systems, pp. 240-245, 1998.
[18] R. Srikant and R. Agrawal, “Mining generalized association rules,” The
Twenty-first International Conference on Very Large Data Bases, pp. 407-419,
Zurich, Switzerland, 1995.
[19] R. Srikant and R. Agrawal, “Mining quantitative association rules in large
relational tables,” The 1996 ACM SIGMOD International Conference on
Management of Data, pp. 1-12, Montreal, Canada, 1996.
[20] S. Zhang, “Aggregation and maintenance for database mining,” Intelligent Data
Analysis, Vol. 3, No. 6, pp. 475-490, 1999.
[21] O. R. Zaiane and E. H. Mohammed, “COFI-tree mining: A new approach to
pattern growth with reduced candidacy generation,” IEEE International
Conference on Data Mining, 2003.
[22] Z. Zheng, R. Kohavi and L. Mason, “Real world performance of association rule
algorithms,” The International Conference on Knowledge Discovery and Data
Mining, pp. 401-406, 2001.