Chapter 6 Conclusion and Future Work
In this thesis, two algorithms, TFTRP-Mine and RE-TFTRP-Mine, are proposed to mine
the top-K non-trivial fault-tolerant repeating patterns whose lengths are no less than
a minimum length constraint (min_len) from data sequences. By extending the idea of
appearing bit sequences, fault-tolerant appearing bit sequences are defined to represent
the positions where a candidate pattern appears in a data sequence that contains
insertion/deletion errors. Both algorithms use recursive formulas to obtain the
fault-tolerant appearing bit sequences of a pattern systematically, so the
fault-tolerant frequency of each candidate pattern can be counted quickly. In addition,
RE-TFTRP-Mine adopts two further strategies to improve mining efficiency.
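As a rough illustration of the idea these algorithms build on, the sketch below computes plain (exact, non-fault-tolerant) appearing bit sequences. It assumes a bit index table that maps each distinct item to an integer whose bit i is set when the item occurs at position i, and it uses the recursion ABS(P + x) = (ABS(P) << 1) & B_x, where bit j of ABS(P) marks that P ends at position j. The function names are hypothetical, and the fault-tolerant recursive formulas for insertion/deletion errors defined earlier in the thesis are not reproduced here.

```python
from collections import defaultdict


def build_bit_index_table(seq):
    """Map each distinct item to an int whose bit i is set iff seq[i] == item."""
    table = defaultdict(int)
    for i, item in enumerate(seq):
        table[item] |= 1 << i
    return dict(table)


def appearing_bits(pattern, table):
    """Exact appearing bit sequence: bit j is set iff `pattern` ends at position j.

    Built recursively, one item at a time: ABS(P + x) = (ABS(P) << 1) & B_x.
    (Hypothetical helper; the thesis's fault-tolerant version differs.)
    """
    if not pattern:
        return 0
    bits = table.get(pattern[0], 0)
    for item in pattern[1:]:
        # Shift every end position forward by one and keep only positions
        # where the next item of the pattern actually occurs.
        bits = (bits << 1) & table.get(item, 0)
    return bits


def frequency(pattern, table):
    """Number of (possibly overlapping) exact occurrences of `pattern`."""
    return bin(appearing_bits(pattern, table)).count("1")
```

For example, in the sequence C D E C D E C D, the pattern C D E ends at positions 2 and 5 (0-based), so its frequency is 2. This sketch counts overlapping occurrences; whether the thesis counts them the same way is not stated in this chapter.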
The experimental results show that RE-TFTRP-Mine outperforms TFTRP-Mine when K
and min_len are small. Moreover, adopting fault-tolerant mining makes it possible to
find more important and implicit repeating patterns in music objects.
The bit index table is created during the mining process. Because its size grows with
the length of the data sequence and with the number of distinct data items, the
storage space required grows accordingly. Therefore, how to partition the bit index
table into several parts so that they can be mined in parallel is left for future work.
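As a rough estimate of this growth, assume (as a simplification, not necessarily the exact layout used in this thesis) that the bit index table stores one m-bit sequence for each of the |I| distinct items in a data sequence of length m. Its size is then

$$\text{size} = |I| \times m \ \text{bits}, \qquad \text{e.g. } |I| = 100,\ m = 10{,}000 \;\Rightarrow\; 10^{6}\ \text{bits} = 125{,}000\ \text{bytes}.$$

Under this assumption, doubling either the sequence length or the number of distinct items doubles the table, which is what makes splitting it across parallel workers attractive.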