Clustering Trajectories - Mining Moving Patterns of Objects Using Clues in Moving-Object Trajectories via Clustering and Aggregation Techniques

Since the given trajectories may contain more than one moving behavior, we need to distinguish the trajectories that share the same moving behavior and group them into clusters. In this section, we describe our approach for grouping trajectories with similar moving behavior into clusters.

In Section 3, we introduced an SC-graph to represent the similar and close relations between trajectories. The similar and close edges in an SC-graph thus provide clues that indicate whether two trajectories represent similar moving behavior. Clustering trajectories with similar moving behavior can therefore be viewed as exploiting these clues to cluster the vertices of an SC-graph. To realize this idea, we elaborate the following definitions.

Definition 9. Core: Given an SC-graph G = (V, E_S ∪ E_C) and a threshold δ, a vertex u ∈ V is a core if there exists a set of vertices C_u ⊆ V containing u such that 1. for every v ∈ C_u with v ≠ u, (u, v) ∈ E_S, 2. for all distinct v, w ∈ C_u, (v, w) ∈ E_S, and 3. |C_u| ≥ δ.

A core u in an SC-graph is a vertex with sufficiently many trajectories similar to it, where these trajectories are also mutually similar. Thus, a core set C_u contains trajectories that most likely represent the same moving behavior. The value of δ is usually set to at least 2, since the moving behavior described by a trajectory that is not similar to any other trajectory is not sufficiently reliable. For example, let δ = 2. Then v₂ is a core with C_v₂ = {v₂, v₆, v₇}: v₆ and v₇ are neighbors of v₂ in E_S, every pair of these three vertices is joined by a similar edge, and |C_v₂| = 3 ≥ δ = 2.
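Definition 9 can be checked mechanically for a candidate set. The Python sketch below assumes that E_S is given as a set of frozensets (one per undirected similar edge) and, following the example C_v₂ = {v₂, v₆, v₇}, that a core set contains its core vertex; the function name `is_core_set` and the sample edges are illustrative, not taken from the original work.

```python
from itertools import combinations


def is_core_set(u, members, similar_edges, delta):
    """Check Definition 9 for vertex u with candidate core set `members`.

    similar_edges: set of frozensets {a, b}, one per undirected similar edge in E_S
    (a representation assumed for this sketch).
    """
    members = set(members)
    # Assumption from the example C_v2 = {v2, v6, v7}: the core set contains u itself.
    if u not in members or len(members) < delta:  # condition 3: |C_u| >= delta
        return False
    # Conditions 1 and 2: every pair of distinct members is joined by a similar edge
    # (this covers every pair (u, v) with v != u as well).
    return all(frozenset(pair) in similar_edges for pair in combinations(members, 2))


# Hypothetical similar edges around v2 (the edge v5-v6 is added for contrast).
E_S = {frozenset(p) for p in [(2, 6), (2, 7), (6, 7), (5, 6)]}
print(is_core_set(2, {2, 6, 7}, E_S, delta=2))  # True
print(is_core_set(2, {2, 5, 6}, E_S, delta=2))  # False: (v2, v5) is not a similar edge
```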

Even if some trajectories follow the same moving behavior, they may not belong to the same core due to the nature of trajectories. However, some 'clues' may exist that indicate two cores have similar moving behavior. We formalize this notion as follows.

Definition 10. Directly Clue-Reachable: A vertex u is directly clue-reachable to a vertex v, denoted u ⇝ v, if v is a core and u is adjacent to v in E_S or E_C.

Direct clue-reachability indicates that a vertex u follows the same moving behavior as a core, as evidenced by either a similar edge or a close edge. Obviously, all vertices in a core set C_v are mutually directly clue-reachable, since every vertex of C_v is itself a core (with C_v serving as its own core set) and every pair of vertices in C_v is joined by a similar edge.

For example, v₅ ⇝ v₆ since v₆ is a core and (v₅, v₆) ∈ E_S; v₁ ⇝ v₂ since v₂ is a core and (v₁, v₂) ∈ E_C.
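A direct clue-reachability test follows Definition 10 literally. In this Python sketch the edge sets are assumed to be sets of frozensets, so adjacency through a close edge is checked regardless of the edge's direction (an assumption, since the close relation itself is asymmetric); the vertex numbers mirror the example above and the function name is hypothetical.

```python
def directly_clue_reachable(u, v, cores, similar_edges, close_edges):
    """u ⇝ v (Definition 10): v is a core and u is adjacent to v in E_S or E_C.

    similar_edges, close_edges: sets of frozensets {a, b} (assumed representation;
    close-edge direction is ignored for adjacency in this sketch).
    """
    edge = frozenset((u, v))
    return v in cores and (edge in similar_edges or edge in close_edges)


cores = {2, 6}
E_S = {frozenset((5, 6))}
E_C = {frozenset((1, 2))}
print(directly_clue_reachable(5, 6, cores, E_S, E_C))  # True: v6 is a core, similar edge
print(directly_clue_reachable(1, 2, cores, E_S, E_C))  # True: v2 is a core, close edge
print(directly_clue_reachable(6, 5, cores, E_S, E_C))  # False: v5 is not a core
```

Note that the relation is not symmetric: the target must be a core, while the source may be any vertex.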

Through direct clue-reachability, we can find the vertices that potentially represent moving behavior similar to that of a core. In the following definition, we extend direct clue-reachability to clue-reachability, which describes a vertex following a similar moving behavior indirectly through a chain of clues.

Definition 11. Clue-Reachable: A vertex u is clue-reachable to a vertex v, denoted u ⇝* v, if there exists a chain of vertices u = v_1, v_2, ..., v_n = v such that v_i ⇝ v_{i+1} for all i = 1, 2, ..., n − 1.

For example, v₅ ⇝* v₈ through the chain v₅ ⇝ v₆ ⇝ v₇ ⇝ v₈.
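Clue-reachability is a reachability question in a graph where each step must land on a core, so a breadth-first search suffices. The Python sketch below assumes the SC-graph is given as a symmetric adjacency map over E_S ∪ E_C plus a set of core vertices; the names are illustrative, and the small graph only reproduces the chain v₅ ⇝ v₆ ⇝ v₇ ⇝ v₈ from the example.

```python
from collections import deque


def clue_reachable(u, v, neighbors, cores):
    """u ⇝* v (Definition 11): a chain u = v_1, ..., v_n = v with v_i ⇝ v_{i+1}.

    neighbors: dict vertex -> set of vertices adjacent to it in E_S ∪ E_C.
    cores: set of core vertices; each step x ⇝ y requires y to be a core.
    This sketch requires at least one step, so it returns False when u == v.
    """
    seen = {u}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in neighbors.get(x, ()):
            # a step x ⇝ y is allowed only when y is a core
            if y in cores and y not in seen:
                if y == v:
                    return True
                seen.add(y)
                queue.append(y)
    return False


# Hypothetical fragment of the SC-graph: v6, v7, v8 are cores, v5 is not.
nbrs = {5: {6}, 6: {5, 7}, 7: {6, 8}, 8: {7}}
cores = {6, 7, 8}
print(clue_reachable(5, 8, nbrs, cores))  # True via v5 ⇝ v6 ⇝ v7 ⇝ v8
print(clue_reachable(8, 5, nbrs, cores))  # False: the last step would need v5 to be a core
```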

Based on clue-reachability, we further define clue-connection from one core to another as follows:

Definition 12. Clue-Connect: A core u is clue-connected to a core v if there exists a core w such that, for every x ∈ C_u, x ⇝* y for some y ∈ C_w, and, for every y′ ∈ C_w, y′ ⇝* z for some z ∈ C_v.

Conceptually, clue-connection allows us to infer that the moving behavior of a core u is similar to that of a core v. To ensure sufficient clues to support this, each vertex in C_u should be clue-reachable to some vertex of an intermediate core set. That is, all vertices in C_u, i.e., the trajectories exhibiting the moving behavior of u, should follow a moving behavior similar to that of the intermediate core set. Similarly, all vertices in this intermediate core set should follow the same moving behavior as the core v. For example, v₁₁ is clue-connected to v₈: there is a core v₅ such that all vertices in C_v₁₁ are clue-reachable to some vertices in C_v₅ (i.e., v₁₁ ⇝ v₅ and v₁₀ ⇝ v₅), and all vertices in C_v₅ are clue-reachable to some vertices in C_v₈ (i.e., v₁ ⇝* v₃ and v₅ ⇝* v₈).
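Definition 12 quantifies over an intermediate core and two "every vertex reaches some vertex" conditions, which map directly onto `any`/`all`. In this Python sketch, clue-reachability is passed in as a callable so the check stays independent of how ⇝* is computed; the core-set memberships below (C_v₁₁ = {v₁₀, v₁₁}, C_v₅ = {v₁, v₅}, C_v₈ = {v₃, v₈}) are inferred from the example for illustration only.

```python
def clue_connected(u, v, core_sets, reachable):
    """Definition 12: core u is clue-connected to core v via some core w.

    core_sets: dict core -> set of vertices in its core set.
    reachable: callable (x, y) -> bool implementing x ⇝* y.
    """
    def covers(src, dst):
        # every vertex of src is clue-reachable to at least one vertex of dst
        return all(any(reachable(x, y) for y in dst) for x in src)

    return any(
        covers(core_sets[u], core_sets[w]) and covers(core_sets[w], core_sets[v])
        for w in core_sets
    )


# Hypothetical core sets and ⇝* pairs mirroring the example v11 -> v5 -> v8.
core_sets = {11: {10, 11}, 5: {1, 5}, 8: {3, 8}}
pairs = {(11, 5), (10, 5), (1, 3), (5, 8)}
reach = lambda x, y: (x, y) in pairs
print(clue_connected(11, 8, core_sets, reach))  # True, with w = v5
print(clue_connected(8, 11, core_sets, reach))  # False
```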

A core may have several cores clue-connected to it. To ensure that trajectories with the most similar moving behavior are grouped into the same cluster, we define a measure, called clue-gain, that evaluates how much 'clue' a core set obtains from another one.

Definition 13. Clue-Gain: Consider two sets S and T. Let E_S^st and E_C^st be the sets of similar and close edges from S to T, respectively.

Generally speaking, the more similar and close edges there are from S to T, the more likely S represents moving behavior similar to that of T. The weights of these edges should also be taken into account: the higher the weights, the more similar the moving behaviors of S and T. Therefore, the clue-gain is proportional to the number of similar and close edges from S to T and to their corresponding weights. Moreover, similar edges should be weighted higher than close edges, because a similar edge indicates that the moving behaviors of two vertices are mutually similar, whereas a close edge only indicates that the moving behavior of one vertex resembles that of the other. Thus, two constants α and β are used for weighting similar and close edges, respectively, and α should usually be at least twice β. By the definitions of the similar and close scores, SS(i, j) = min(S_{TT}(T_i, T_j), S_{TT}(T_j, T_i)) ≤ S_{TT}(T_i, T_j) = CS(i, j), and likewise SS(i, j) ≤ S_{TT}(T_j, T_i) = CS(j, i). Thus, 2SS(i, j) ≤ CS(i, j) + CS(j, i), which shows that one similar edge is at least twice as important as a close edge. Hence, α should be set to twice β.

For example, let α = 2 and β = 1. Then ClueGain(C_v₁₂, C_v₃) = 2 × 1 × 0.5 + 1 × 1 × 0.9 = 1.9 and ClueGain(C_v₁₂, C_v₁₁) = 2 × 2 × (0.5 + 0.7) + 1 × 1 × 0.3 = 5.1. Clearly, C_v₁₂ exhibits moving behavior more similar to that of C_v₁₁ than to that of C_v₃, since there are more similar edges from C_v₁₂ to C_v₁₁ than to C_v₃.
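The formula of Definition 13 is cut off in this excerpt. Both worked examples are consistent with the form ClueGain(S, T) = α · |E_S^st| · Σ w(e) over similar edges + β · |E_C^st| · Σ w(e) over close edges, so the Python sketch below assumes that form; treat it as a reconstruction from the examples, not the author's stated definition. The inputs are simply the weight lists of the similar and close edges from S to T.

```python
def clue_gain(similar_weights, close_weights, alpha=2.0, beta=1.0):
    """Assumed form, inferred from the worked examples:
    alpha * |E_S^st| * sum(similar weights) + beta * |E_C^st| * sum(close weights).

    similar_weights / close_weights: weights of the similar / close edges from S to T.
    """
    similar_term = alpha * len(similar_weights) * sum(similar_weights)
    close_term = beta * len(close_weights) * sum(close_weights)
    return similar_term + close_term


# Edge weights from the example: C_v12 -> C_v3 and C_v12 -> C_v11.
print(round(clue_gain([0.5], [0.9]), 2))       # 1.9
print(round(clue_gain([0.5, 0.7], [0.3]), 2))  # 5.1
```

Multiplying the count by the summed weights rewards both many edges and heavy edges, which matches the prose requirement that clue-gain grow with the number of edges and with their weights.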

Based on clue-connection and clue-gain, we formulate the problem of clustering trajectories with similar moving behavior as follows:

Definition 14. Cluster: A cluster C is a set of vertices satisfying the following conditions:

1. for all u ∈ C, there exists a core v ∈ C such that u is clue-connected to v (connectivity);

2. for every core set C_u ⊆ C, the core set C_v that maximizes ClueGain(C_u, C_v) is also contained in C (compactness);

3. |C| ≥ min_sup (frequentness).

The first requirement states that a cluster is composed of cores with clues supporting that they describe similar moving behavior. The second requirement describes the compactness of a cluster: each core set should be in the same cluster as the core set that maximizes its clue-gain. That is, each core set is interpreted by the moving behavior supported by its strongest clues; in other words, a core set is never interpreted by a moving behavior with weaker clues. To ensure that the derived regions are frequent, a cluster must contain at least min_sup vertices, as stated in the third requirement.

We propose a clustering algorithm to find clusters in an SC-graph. In a nutshell, the algorithm first discovers all core sets, then merges them according to their clue-gains, and finally adds some non-cores into clusters to enrich their information. The algorithm is listed in Algorithm 1.

Figure 4.1: A scenario of clustering in a dual graph.

Note that a core set in an SC-graph is equivalent to a clique of size ≥ δ on E_S. Thus, we first find a clique cover on E_S, where a clique cover is a set of cliques whose union covers the whole graph. Many existing heuristic algorithms find a clique cover efficiently [10]. A well-known one is based on a greedy strategy: repeatedly select the vertex with the highest degree, and pick those of its adjacent vertices that are mutually adjacent to form a clique. For example, v₈ has the highest degree in this graph (considering only E_S). Five vertices are adjacent to it, namely v₇, v₃, v₄, v₉, and v₁₃, and it can be verified that only v₃, v₄, and v₉ are mutually adjacent. Thus, the first clique {v₃, v₄, v₈, v₉} is generated. The cliques of a clique cover in our example are shaded in Figure 4.1.

After finding a clique cover, we identify the cliques of size ≥ δ as the core sets.
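The greedy clique-cover heuristic and the subsequent core-set filter can be sketched in Python as follows. This is one common greedy variant, not necessarily the specific algorithm of [10]; the assumptions are that degrees are recomputed over still-uncovered vertices, neighbors are tried in order of decreasing degree (ties broken by vertex id), and that v₇ and v₁₃ are joined to each other in the sample graph, which the text does not specify.

```python
def greedy_clique_cover(vertices, similar_edges):
    """Greedy clique cover on E_S: seed with the highest-degree uncovered vertex,
    then add its neighbours that are adjacent to every vertex already in the clique.

    similar_edges: iterable of (a, b) pairs for undirected similar edges.
    Returns a list of cliques (sets) whose union is all of `vertices`.
    """
    adj = {v: set() for v in vertices}
    for a, b in similar_edges:
        adj[a].add(b)
        adj[b].add(a)
    remaining = set(vertices)
    cover = []
    while remaining:
        # degree restricted to uncovered vertices (an assumption of this sketch)
        deg = {v: len(adj[v] & remaining) for v in remaining}
        seed = min(remaining, key=lambda v: (-deg[v], v))
        clique = {seed}
        for w in sorted(adj[seed] & remaining, key=lambda v: (-deg[v], v)):
            if clique <= adj[w]:  # w is adjacent to every current member
                clique.add(w)
        cover.append(clique)
        remaining -= clique
    return cover


def core_sets(cover, delta):
    """Cliques of size >= delta are the core sets."""
    return [c for c in cover if len(c) >= delta]


# Hypothetical E_S around v8 (the v7-v13 edge is assumed for illustration).
V = [3, 4, 7, 8, 9, 13]
E = [(8, 7), (8, 3), (8, 4), (8, 9), (8, 13), (3, 4), (3, 9), (4, 9), (7, 13)]
cover = greedy_clique_cover(V, E)
print(sorted(cover[0]))  # [3, 4, 8, 9]
```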

For example, let δ = 2. Then C_v₁, C_v₂, C_v₃, C_v₁₁, C_v₁₂, and C_v₁₄ are core sets. Once the core sets are derived, each core set computes the clue-gain from itself to every one-step clue-connected core set, and is then merged with the core set yielding the maximal clue-gain. For example, C_v₁₂ has two one-step clue-connected core sets, C_v₃ and C_v₁₁, with clue-gains ClueGain(C_v₁₂, C_v₃) = 1.9 and ClueGain(C_v₁₂, C_v₁₁) = 5.1, respectively. Thus, C_v₁₂ is merged with C_v₁₁ rather than C_v₃. Similarly, C_v₁₄ is merged with C_v₁ because ClueGain(C_v₁₄, C_v₁) = 4.8 > ClueGain(C_v₁₄, C_v₁₁) = 2.1, C_v₁ is merged into C_v₂, and C_v₂ is merged into C_v₃. Consequently, we derive two clusters: {C_v₁, C_v₂, C_v₃, C_v₁₄} and {C_v₁₁, C_v₁₂}. It is worth mentioning that merged cliques are guaranteed to be clue-connected, since each clique can only be merged with a one-step clue-connected clique; hence requirements 1 and 2 of a cluster are satisfied. Finally, cliques of size < δ (non-cores) are considered for joining clusters to complement the moving behaviors represented by those clusters.
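The merging step can be sketched with a union-find structure: each core set is unioned with the core set that maximizes its clue-gain, and the resulting components are the clusters. Union-find is an implementation choice assumed here, not stated in the text. The gains for C_v₁₂ and C_v₁₄ come from the example; the gains for C_v₁ → C_v₂ and C_v₂ → C_v₃ are not given, so placeholder values are used.

```python
def merge_core_sets(core_ids, gains):
    """Merge every core set with its maximal-clue-gain neighbour.

    core_ids: identifiers of the core sets.
    gains: dict core id -> {one-step clue-connected core id: clue-gain}.
    Returns the resulting clusters as a list of sets of core ids.
    """
    parent = {c: c for c in core_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for c in core_ids:
        targets = gains.get(c, {})
        if targets:
            best = max(targets, key=targets.get)  # core set with maximal clue-gain
            root_c, root_best = find(c), find(best)
            if root_c != root_best:
                parent[root_c] = root_best

    groups = {}
    for c in core_ids:
        groups.setdefault(find(c), set()).add(c)
    return list(groups.values())


gains = {
    12: {3: 1.9, 11: 5.1},   # from the example
    14: {1: 4.8, 11: 2.1},   # from the example
    1: {2: 1.0},             # placeholder value (not given in the text)
    2: {3: 1.0},             # placeholder value (not given in the text)
}
clusters = merge_core_sets([1, 2, 3, 11, 12, 14], gains)
print(sorted(sorted(g) for g in clusters))  # [[1, 2, 3, 14], [11, 12]]
```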

A non-core joins the cluster containing the core that yields the maximal clue-gain. A non-core is never merged with another non-core, because otherwise a chain of non-cores with different moving behaviors could be absorbed into a cluster, especially when the non-cores consist of a single vertex. For example, v₁₈ and v₁₉ form such a chain. Since v₁₈ is close to v₁₇, it can complement the information of C_v₁₁; however, it is not obvious whether v₁₉ can. After that, clusters with fewer than min_sup vertices and the remaining non-cores are eliminated. Following the example, let min_sup = 5. The cluster {C_v₁₁, C_v₁₁} and the vertex v₁₉ are eliminated. Consequently, the final results are the two clusters {C_v₁, C_v₂, C_v₃, C_v₁₄, v₁₀} and {C_v₁₁, C_v₁₂, v₁₈}.

Algorithm 1: Clustering Trajectories

Input: An SC-graph G = (V, E_S ∪ E_C)
Output: A set of clusters C

C ← ∅;

For each clique K_i in CORE, compute its clue-gain with all one-step clue-connected cliques;

K_max ← the clique in CORE that maximizes ClueGain(K_i, K_max);

C_max ← the cluster containing K_max;

C_join ← the cluster containing K_i;

The selection of thresholds usually depends on the users' requirements and the properties of the environment. However, setting λ and µ is not a straightforward task. The thresholds for the similar and close relations (i.e., λ and µ) strongly affect the structure of an SC-graph, since the number of edges depends significantly on them. Larger values of λ and µ impose stricter conditions on whether two trajectories are similar or close. Thus, larger thresholds yield fewer edges and make the dual graph sparser.

A cluster in a sparse dual graph may contain only a few trajectories, making it hard to aggregate them to obtain the frequent movement precisely. Conversely, smaller λ and µ make a dual graph denser; however, a cluster then easily contains more irrelevant trajectories, so the frequent regions cannot be derived precisely. Therefore, the clustering results are highly dependent on the values of λ and µ.

Here, we propose a heuristic for selecting λ and µ adaptively according to the distribution

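From the document: Mining Moving Patterns of Objects Using Clues in Moving-Object Trajectories via Clustering and Aggregation Techniques (pp. 23-27)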