Introduction - Mining Closed Patterns in Point-Set Video Databases

With advances in information technology, a large number of videos have been collected into video databases. Consequently, approaches for mining useful patterns from video databases have attracted increasing attention in recent years. If we know which patterns occur often in the videos, we know which situations are common and which deserve attention. For example, suppose we place a video camera in a hospital corridor to record how patients walk. Many videos will be collected into a video database. By mining frequent patterns in such a database, we can learn the pattern of patients walking normally as well as the patterns of abnormal events, such as a patient falling to the ground. When a patient falls, an alarm should be raised to notify the staff at the first-aid station. If the pattern of a patient falling is known from the video database, a monitoring system can detect this event automatically and inform the first-aid station immediately. We can also detect other kinds of human movement patterns for security purposes. For instance, knowing the patterns of a throwing action can help prevent violent behavior such as throwing a grenade or another weapon.

A pattern is frequent if it satisfies the user-specified minimum support. There are many kinds of frequent patterns, including itemsets, subsequences, and substructures. A frequently occurring subsequence, such as the pattern that customers tend to purchase first a PC, then a digital camera, and then a memory card, is also called a sequential pattern. Sequential pattern mining, which discovers frequent subsequences in a sequence database, is a critical data mining problem with broad applications, including the analysis of customer purchase behavior, Web access patterns, scientific experiments, and DNA sequences.
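The notion of support for a sequential pattern can be illustrated with a minimal sketch. It assumes a simplified setting in which each element of a sequence is a single item (general sequential patterns allow itemsets as elements), and the purchase histories and the function names `is_subsequence`, `support`, and `is_frequent` are hypothetical, introduced only for illustration.

```python
from typing import List, Sequence


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` consumes the iterator up to the match, so order is enforced.
    return all(item in it for item in pattern)


def support(pattern: Sequence[str], database: List[Sequence[str]]) -> int:
    """Count the sequences in `database` that contain `pattern`."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def is_frequent(pattern: Sequence[str],
                database: List[Sequence[str]],
                min_support: int) -> bool:
    """A pattern is frequent if its support reaches the minimum support."""
    return support(pattern, database) >= min_support


# Hypothetical purchase histories, one sequence per customer.
purchases = [
    ["PC", "digital camera", "memory card"],
    ["PC", "printer", "digital camera", "memory card"],
    ["digital camera", "PC"],
    ["PC", "memory card"],
]
pattern = ["PC", "digital camera", "memory card"]

print(support(pattern, purchases))         # 2
print(is_frequent(pattern, purchases, 2))  # True
```

Only the first two customers buy the three products in the stated order, so the pattern has support 2 and is frequent for a minimum support of 2.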

Many sequential pattern mining methods have been proposed. Agrawal et al. [1] proposed an Apriori method, which adopts a generate-and-test approach to mine sequential patterns. The major approaches for mining the complete set of sequential patterns include SPAM [2], GSP [16], SPADE [20], GO-SPADE [10], and PrefixSpan [14]. GSP [16] uses the downward-closure property of sequential patterns and adopts the candidate generate-and-test approach. SPADE [20] and GO-SPADE [10] devise a divide-and-conquer strategy to mine sequential patterns with a vertical data format. SPAM [2] exploits a vertical bitmap structure to count supports efficiently. However, the Apriori-based methods generate many redundant candidates and require multiple database scans. To address this problem, Han et al. [4] designed the FP-growth method to mine frequent itemsets without candidate generation. Han et al. [5] proposed the FreeSpan method, which recursively projects a sequence database into projected databases and generates frequent sequential patterns from these projected databases. PrefixSpan [14] mines the complete set of patterns while greatly reducing the effort of candidate subsequence generation. Moreover, prefix-projection substantially reduces the size of the projected databases, which leads to efficient mining.
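The idea of prefix-projection can be sketched as follows. This is a simplified illustration in the spirit of PrefixSpan, not the published algorithm: it assumes every sequence element is a single item, and the helper names `project` and `prefix_span` as well as the toy database are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def project(database: List[List[str]], item: str) -> List[List[str]]:
    """For each sequence containing `item`, keep the suffix after its first occurrence."""
    return [seq[seq.index(item) + 1:] for seq in database if item in seq]


def prefix_span(database: List[List[str]],
                min_support: int,
                prefix: Tuple[str, ...] = ()) -> Dict[Tuple[str, ...], int]:
    """Return {pattern: support} for all frequent patterns that extend `prefix`."""
    # Count each item at most once per (projected) sequence.
    counts = Counter(item for seq in database for item in set(seq))
    patterns: Dict[Tuple[str, ...], int] = {}
    for item, count in sorted(counts.items()):
        if count < min_support:
            continue  # infrequent items cannot extend the prefix
        new_prefix = prefix + (item,)
        patterns[new_prefix] = count
        # Recurse on the smaller database projected on the new prefix.
        patterns.update(prefix_span(project(database, item), min_support, new_prefix))
    return patterns


# Hypothetical toy sequence database.
db = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"]]
result = prefix_span(db, min_support=2)
for pattern, sup in result.items():
    print(pattern, sup)
```

Each recursive call sees only the suffixes that follow the current prefix, so the projected databases shrink as patterns grow and no candidate is generated that does not appear in the data.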

Many methods have been proposed to mine frequent subgraphs. Inokuchi et al. [7] proposed the AGM method, which represents a graph as an "adjacency matrix" and mines frequent subgraphs with an Apriori-based approach. Kuramochi et al. [9] presented the FSG method, which is based on the Apriori algorithm, uses a sparse graph representation to minimize storage space and computation time, and applies various optimization techniques for candidate generation. Yan et al. [19] developed a depth-first search algorithm, gSpan, to mine frequent subgraphs without candidate generation, where the DFS lexicographic order and the minimum DFS code are used to represent a graph. Huan et al. [6] designed a candidate subgraph enumeration scheme, called FFSM, to mine frequent subgraphs. Wang et al. [8] proposed a method that eliminates some vertices on a path of a graph while preserving the topological structure of the graph, which reduces the search space and increases mining efficiency.

Instead of mining all frequent itemsets, Pasquier et al. [12] introduced the concept of mining frequent closed itemsets. A frequent itemset is closed if no proper super-itemset has the same support. The number of closed itemsets is never greater than the number of frequent itemsets in the database, and the frequent closed itemsets mined can be used to generate the complete set of frequent itemsets [12]. Generally speaking, mining closed itemsets is more efficient than mining all frequent itemsets [12]. A-CLOSE [12] exploits the Apriori property to find closed itemsets. CLOSET [13] and CLOSET+ [18] use the FP-tree as a compact data structure and mine frequent closed itemsets from projected databases. CHARM [21] uses an itemset-tidset search tree and applies a diffset technique to improve its performance. DCI_CLOSED [11] can detect and discard duplicate closed itemsets without keeping the closed itemsets already mined in main memory. Singh et al. [15] proposed the CloseMiner algorithm, which treats frequent closed itemset mining as the problem of clustering the complete set of itemsets by their closed tidsets. Uno et al. [17] developed the LCM algorithm, which organizes the closed itemsets into a tree structure and mines them in a depth-first search manner. Cheng et al. [3] proposed an algorithm to mine δ-tolerance frequent closed itemsets (δ-TCFIs) in order to reduce the number of closed itemsets.
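The definition of a frequent closed itemset can be checked directly on a small example. The following is a brute-force sketch for illustration only; it is not any of the cited algorithms, it enumerates every itemset and is therefore exponential in the number of items, and the transactions and function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def frequent_itemsets(transactions: List[Set[str]],
                      min_support: int) -> Dict[FrozenSet[str], int]:
    """Enumerate every itemset and keep those meeting the minimum support."""
    items = sorted(set().union(*transactions))
    result: Dict[FrozenSet[str], int] = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = sum(candidate <= t for t in transactions)
            if count >= min_support:
                result[candidate] = count
    return result


def closed_itemsets(frequent: Dict[FrozenSet[str], int]) -> Dict[FrozenSet[str], int]:
    """Keep the frequent itemsets that have no proper superset with equal support.

    A superset with the same support is itself frequent, so it suffices to
    compare against the other frequent itemsets.
    """
    return {
        itemset: sup
        for itemset, sup in frequent.items()
        if not any(itemset < other and sup == other_sup
                   for other, other_sup in frequent.items())
    }


# Hypothetical transaction database.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]
frequent = frequent_itemsets(transactions, min_support=2)
closed = closed_itemsets(frequent)
print(len(frequent), len(closed))  # 5 3
```

Here {b} and {c} are frequent but not closed, because {a, b} and {a, c} have the same support of 2. The three closed itemsets {a}, {a, b}, and {a, c} are enough to recover all five frequent itemsets together with their supports.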

However, the itemset mining methods cannot be used to mine patterns in video databases. The sequential mining methods do not consider the spatial attribute of video patterns, and the graph mining methods do not consider the temporal attribute. Therefore, the itemset mining, sequential mining, and graph mining methods are not suitable for mining frequent closed patterns in video databases.

In this thesis, we first devise two data structures, called rplist and CV-tree, to store the information of frequent video patterns. Next, we propose a novel algorithm, called CVP, to mine frequent closed patterns from a video database in a depth-first search (DFS) manner. The proposed algorithm consists of two phases: it first grows frequent video patterns in the spatial dimension and then grows them in the temporal dimension. To mine frequent closed patterns efficiently, we develop several pruning strategies that prune non-closed patterns. By exploiting the CV-tree and rplists to store the information of frequent video patterns, the CVP algorithm can localize candidate generation, pattern join, and support counting to a small number of rplists. As a result, it can efficiently mine frequent closed patterns in a video database.

The contributions of this thesis are summarized as follows: (1) We devise two data structures, called rplist and CV-tree, to store the information of frequent video patterns. (2) We propose a novel algorithm, called CVP, to mine frequent closed patterns from a video database in a depth-first search (DFS) manner. (3) To mine frequent closed patterns efficiently, we develop several pruning strategies that prune non-closed patterns. (4) By exploiting the CV-tree and rplists to store the information of frequent video patterns, the CVP algorithm can localize candidate generation, pattern join, and support counting to a small number of rplists. (5) The experimental results show that the proposed algorithm is efficient and scalable, and that it outperforms the modified Apriori algorithm.

The rest of this thesis is organized as follows. Chapter 2 introduces the preliminary concepts and problem definitions. Chapter 3 describes our proposed algorithm. Chapter 4 presents the experimental setup and performance evaluation. Finally, the conclusions and future work are discussed in Chapter 5.
