
Since the given trajectories may exhibit more than one moving behavior, trajectories with the same moving behavior must be identified and grouped into clusters. In this section, we describe our approach for grouping trajectories with similar moving behavior into clusters.

In Section 3, we introduced an SC-graph to represent the similar and close relations between trajectories. The similar and close edges in an SC-graph serve as clues that indicate whether two trajectories represent similar moving behavior. Thus, clustering trajectories with similar moving behavior can be viewed as exploiting these clues to cluster the vertices of an SC-graph. To realize this idea, we first give some definitions.

Definition 9. Core: Given an SC-graph G = (V, ES ∪ EC) and a threshold δ, a vertex u ∈ V is a core if there exists a set of vertices Cu such that 1. for all v ∈ Cu with v ≠ u, (u, v) ∈ ES, and 2. for all v, w ∈ Cu, (v, w) ∈ ES, and 3. |Cu| ≥ δ.

A core u in an SC-graph is a vertex that has sufficiently many trajectories similar to it, all of which are mutually similar. Thus, a core set Cu contains trajectories that most likely represent the same moving behavior. The value of δ is usually set to at least 2, since a moving behavior described by a trajectory that is similar to no other trajectory is not supported with enough confidence. For example, let δ = 2. Then v2 is a core with Cv2 = {v2, v6, v7}, where v6 and v7 are the neighboring vertices of v2 in ES, every vertex in Cv2 has a similar edge to each of the others, and |Cv2| = 3 ≥ δ = 2.
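The three conditions of Definition 9 can be checked mechanically. The following is a minimal sketch, assuming similar edges are stored as unordered pairs; the function name `is_core_set` and the edge list are illustrative, with the edges taken from the running example (v2, v6, v7 mutually similar, and v5 similar to v6).

```python
from itertools import combinations


def is_core_set(u, candidate, similar_edges, delta):
    """Check the three conditions of Definition 9 for vertex u and a candidate set."""
    def similar(a, b):
        return frozenset((a, b)) in similar_edges

    # Condition 1: every other member of the candidate set is similar to u.
    cond1 = all(similar(u, v) for v in candidate if v != u)
    # Condition 2: all distinct members are mutually similar.
    cond2 = all(similar(v, w) for v, w in combinations(candidate, 2))
    # Condition 3: the candidate set is large enough.
    cond3 = len(candidate) >= delta
    return cond1 and cond2 and cond3


# Similar edges (ES) named in the running example; stored as unordered pairs.
similar_edges = {
    frozenset(e)
    for e in [("v2", "v6"), ("v2", "v7"), ("v6", "v7"), ("v5", "v6")]
}

print(is_core_set("v2", {"v2", "v6", "v7"}, similar_edges, 2))  # True
print(is_core_set("v2", {"v2", "v5", "v6"}, similar_edges, 2))  # False
```

The second call fails condition 1 because v2 and v5 share no similar edge in this edge list.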

Even if some trajectories follow the same moving behavior, they may not belong to the same core set due to the nature of trajectories. However, some 'clues' may exist to indicate that two cores have similar moving behavior. This notion is defined as follows.

Definition 10. Directly Clue-Reachable: A vertex u is directly clue-reachable to a vertex v, denoted as u ⇝ v, if v is a core and u is adjacent to v in ES or EC.

Direct clue-reachability indicates that a vertex u has the same moving behavior as a core. A vertex u can exhibit the moving behavior of a core through either a similar edge or a close edge. Obviously, all vertices in Cv are mutually directly clue-reachable.

For example, v5 ⇝ v6 since v6 is a core and (v5, v6) ∈ ES; v1 ⇝ v2 since v2 is a core and (v1, v2) ∈ EC.

Through direct clue-reachability, we can find the vertices that potentially represent moving behavior similar to that of a core. In the following definition, we extend direct clue-reachability to clue-reachability, which describes a vertex that follows similar moving behavior indirectly through a chain of clues.

Definition 11. Clue-Reachable: A vertex u is clue-reachable to a vertex v, denoted as u ⇝ v, if there exists a chain of vertices u = v1, v2, ..., vn = v such that vi ⇝ vi+1 for all i = 1, 2, ..., n − 1.

For example, v5 ⇝ v8 through the path v5 ⇝ v6 ⇝ v7 ⇝ v8.
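Definitions 10 and 11 amount to a graph search along directly clue-reachable steps. The sketch below is one possible reading, not the authors' implementation. It assumes that similar edges are undirected and close edges are directed pairs (u, v), and that the set of cores is already known. The function names and the small edge lists are illustrative, built from the edges mentioned in the examples.

```python
from collections import deque


def directly_reachable(u, v, cores, similar_edges, close_edges):
    """u is directly clue-reachable to v if v is a core and u is adjacent to v."""
    if v not in cores:
        return False
    return frozenset((u, v)) in similar_edges or (u, v) in close_edges


def clue_reachable(u, v, vertices, cores, similar_edges, close_edges):
    """Breadth-first search from u along directly clue-reachable steps."""
    seen = {u}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in vertices:
            if y in seen:
                continue
            if directly_reachable(x, y, cores, similar_edges, close_edges):
                if y == v:
                    return True
                seen.add(y)
                queue.append(y)
    return False


# Assumed fragment of the running example.
vertices = ["v1", "v2", "v5", "v6", "v7", "v8"]
cores = {"v2", "v6", "v7", "v8"}
similar_edges = {
    frozenset(e)
    for e in [("v5", "v6"), ("v6", "v7"), ("v7", "v8"), ("v2", "v6"), ("v2", "v7")]
}
close_edges = {("v1", "v2")}  # directed: v1 is close to v2

print(clue_reachable("v5", "v8", vertices, cores, similar_edges, close_edges))  # True
print(clue_reachable("v8", "v5", vertices, cores, similar_edges, close_edges))  # False
```

The relation is not symmetric: v8 cannot reach v5 here because v5 is not a core, so no step may end at it.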

Based on clue-reachability, we further define the clue-connection from one core to another as follows:

Definition 12. Clue-Connect: A core u is clue-connected to v if there exists a core w such that x ⇝ y for all x ∈ Cu and for some y ∈ Cw, and y′ ⇝ z for all y′ ∈ Cw and for some z ∈ Cv.

Conceptually, clue-connection implies that the moving behavior of a core u is similar to that of a core v. To ensure sufficient clues to support this, each vertex in Cu should be clue-reachable to some vertex of an intermediate core set. That is, all vertices in Cu, i.e., the trajectories exhibiting the moving behavior of u, should follow moving behavior similar to that of an intermediate core set. Similarly, all vertices in this intermediate core set should follow the same moving behavior as the core v. For example, v11 is clue-connected to v8. It can be seen that there is a core v5 such that all vertices in Cv11 are clue-reachable to some vertices in Cv5 (i.e., v11 ⇝ v5 and v10 ⇝ v5), and all vertices in Cv5 are clue-reachable to some vertices in Cv8 (i.e., v1 ⇝ v3 and v5 ⇝ v8).
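The two-hop condition of Definition 12 can be written as a short predicate. This is a minimal sketch that takes clue-reachability as a precomputed set of (source, target) pairs. The function name, the membership of the core sets (Cv11 = {v10, v11}, Cv5 = {v1, v5}, Cv8 = {v3, v4, v8, v9}), and the reachability pairs are assumptions read off the example above.

```python
def clue_connected(u, v, core_sets, reachable):
    """Core u is clue-connected to core v via some intermediate core w."""
    def all_reach(src, dst):
        # Every vertex of src must be clue-reachable to at least one vertex of dst.
        return all(any((x, y) in reachable for y in dst) for x in src)

    return any(
        all_reach(core_sets[u], core_sets[w]) and all_reach(core_sets[w], core_sets[v])
        for w in core_sets
    )


# Assumed core sets and clue-reachable pairs for the example.
core_sets = {
    "v11": {"v10", "v11"},
    "v5": {"v1", "v5"},
    "v8": {"v3", "v4", "v8", "v9"},
}
reachable = {("v11", "v5"), ("v10", "v5"), ("v1", "v3"), ("v5", "v8")}

print(clue_connected("v11", "v8", core_sets, reachable))  # True
print(clue_connected("v8", "v11", core_sets, reachable))  # False
```

With these inputs, v5 serves as the intermediate core w for the first call, while no intermediate core exists in the opposite direction.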

For a core, there may be several cores clue-connected to it. To ensure that trajectories with the most similar moving behavior are grouped into a cluster, we define a measure, clue-gain, to evaluate how much 'clue' one core set obtains via another.

Definition 13. Clue-Gain: Consider two sets S and T . Let ESst and ECst be the set of

Generally speaking, more similar/close edges from S to T imply that S is more likely to represent moving behavior similar to that of T. The weights of these edges should also be taken into account: the higher the weights, the more similar the moving behaviors of S and T. Therefore, the clue-gain is proportional to the number of similar and close edges from S to T and to their weights. Moreover, similar edges should be weighted higher than close edges, because a similar edge indicates that the moving behaviors of two vertices are mutually similar, whereas a close edge only indicates that the moving behavior of one vertex resembles that of the other. Thus, two constants α and β are used to weight the similar and close edges, respectively. Usually, the value of α should be at least twice that of β. By the definition of the similar and close scores, SS(i, j) = min(STT(Ti, Tj), STT(Tj, Ti)) ≤ STT(Ti, Tj) = CS(i, j), and likewise SS(i, j) ≤ CS(j, i). Thus, 2SS(i, j) ≤ CS(i, j) + CS(j, i); that is, one similar edge accounts for two close relations, one in each direction, so a similar edge is at least twice as important as a close edge.

For example, let α = 2 and β = 1. Then ClueGain(Cv12, Cv3) = 2 × 1 × 0.5 + 1 × 1 × 0.9 = 1.9 and ClueGain(Cv12, Cv11) = 2 × 2 × (0.5 + 0.7) + 1 × 1 × 0.3 = 5.1. Cv12 thus tends to show moving behavior more similar to Cv11 than to Cv3, since there are more similar edges from Cv12 to Cv11 than to Cv3.
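The arithmetic in this example suggests that each term of the clue-gain multiplies a constant, the number of edges of that type, and the sum of their weights. The sketch below assumes exactly that form, α × (number of similar edges) × (sum of similar weights) + β × (number of close edges) × (sum of close weights). This is inferred from the worked numbers rather than from a stated formula, and the function name is illustrative.

```python
def clue_gain(similar_weights, close_weights, alpha=2.0, beta=1.0):
    """Clue-gain from S to T under the form inferred from the worked example.

    similar_weights: weights of the similar edges from S to T
    close_weights:   weights of the close edges from S to T
    """
    similar_term = alpha * len(similar_weights) * sum(similar_weights)
    close_term = beta * len(close_weights) * sum(close_weights)
    return similar_term + close_term


# Cv12 -> Cv3: one similar edge (0.5) and one close edge (0.9).
gain_to_v3 = clue_gain([0.5], [0.9])
# Cv12 -> Cv11: two similar edges (0.5, 0.7) and one close edge (0.3).
gain_to_v11 = clue_gain([0.5, 0.7], [0.3])

print(round(gain_to_v3, 2), round(gain_to_v11, 2))  # 1.9 5.1
```

Under this reading, 2 × 1 × 0.5 + 1 × 1 × 0.9 evaluates to 1.9, the value used when the core sets are merged.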

According to clue-connection and clue-gain, we can formulate the problem of clustering trajectories with similar moving behavior as follows:

Definition 14. Cluster: A cluster C is a set of vertices satisfying the following conditions:

1. for all u ∈ C, there exists a core v ∈ C such that u is clue-connected to v (connectivity);

2. for all core sets Cu ∈ C, the core set Cv that maximizes ClueGain(Cu, Cv) is also in C (compactness);

3. |C| ≥ min_sup (frequentness).

The first requirement states that a cluster is composed of cores that have clues supporting that they describe similar moving behavior. The second requirement describes the compactness of a cluster: each core set should be in the same cluster as the core set that maximizes its clue-gain. That is, the moving behavior of each core set is interpreted through its strongest clues. In other words, a core set is never interpreted through weaker clues. To ensure that the derived regions are frequent, a cluster should contain at least min_sup vertices, as stated in the third requirement.

We propose a clustering algorithm to find clusters in an SC-graph. In a nutshell, this algorithm first discovers all core sets, then merges them according to their clue-gains, and finally adds some non-cores to the clusters to enrich the information they carry. The algorithm is listed in Algorithm 1.

Figure 4.1: A scenario of clustering in a dual graph.

Note that a core set in an SC-graph is equivalent to a clique of size ≥ δ on ES. Thus, we begin by finding a clique cover on ES, where a clique cover is a set of cliques whose union is the whole graph. Many existing heuristic algorithms find a clique cover efficiently [10]. One well-known heuristic follows a greedy strategy: always select the vertex with the highest degree, then pick those of its adjacent vertices that are mutually connected to form a clique. For example, v8 has the highest degree in this graph (considering only ES). Five vertices are adjacent to it, namely v7, v3, v4, v9, and v13. It can be verified that only v3, v4, and v9 have edges between each other. Thus, the first clique {v3, v4, v8, v9} is generated. The cliques of a clique cover in our example are shaded in Figure 4.1.
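The greedy strategy can be sketched as follows. This is one possible variant, not the specific heuristic of [10]. It assumes that degrees are counted among the vertices not yet covered, that candidate neighbors are tried in order of decreasing degree, and that ties are broken by vertex name. The adjacency map contains only the ES edges named in this paragraph, so it is a fragment of the full graph.

```python
def greedy_clique_cover(adjacency):
    """Partition the vertices into cliques, seeding each with a highest-degree vertex."""
    uncovered = set(adjacency)
    cliques = []
    while uncovered:
        # Seed: the uncovered vertex with the most uncovered neighbors.
        seed = max(sorted(uncovered), key=lambda v: len(adjacency[v] & uncovered))
        clique = {seed}
        candidates = sorted(
            adjacency[seed] & uncovered,
            key=lambda v: (-len(adjacency[v] & uncovered), v),
        )
        for v in candidates:
            # Keep v only if it is adjacent to every vertex already in the clique.
            if clique <= adjacency[v]:
                clique.add(v)
        cliques.append(clique)
        uncovered -= clique
    return cliques


# ES edges mentioned in the text around v8 (a fragment of the full graph).
edges = [
    ("v8", "v7"), ("v8", "v3"), ("v8", "v4"), ("v8", "v9"), ("v8", "v13"),
    ("v3", "v4"), ("v3", "v9"), ("v4", "v9"),
]
adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)

cover = greedy_clique_cover(adjacency)
print(sorted(cover[0]))  # ['v3', 'v4', 'v8', 'v9']
```

On this fragment, v7 and v13 end up as singleton cliques. In the full graph they may belong to larger cliques through edges not listed here.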

After finding a clique cover, we identify the cliques of size ≥ δ as the core sets.

For example, let δ = 2. Then Cv1, Cv2, Cv3, Cv11, Cv12, and Cv14 are core sets. Once the core sets are derived, each core set computes the clue-gain from itself to every one-step clue-connected core set. Then, each core set is merged with the core set that yields the maximal clue-gain. For example, for Cv12, there are two one-step clue-connected core sets, Cv3 and Cv11, with clue-gains ClueGain(Cv12, Cv3) = 1.9 and ClueGain(Cv12, Cv11) = 5.1, respectively. Thus, Cv12 is merged with Cv11 rather than Cv3. As another example, Cv14 is merged with Cv1 because ClueGain(Cv14, Cv1) = 4.8 > ClueGain(Cv14, Cv11) = 2.1. Similarly, Cv1 is merged into Cv2 and Cv2 is merged into Cv3. Consequently, we derive two clusters: {Cv1, Cv2, Cv3, Cv14} and {Cv11, Cv12}. It is worth mentioning that the merged cliques are guaranteed to be clue-connected, since each clique can only be merged with a one-step clue-connected clique, thereby satisfying requirements 1 and 2 of a cluster. Finally, cliques of size < δ are considered for joining clusters, to supplement the moving behaviors represented by those clusters.

As such, a non-core joins the cluster containing the core that yields the maximal clue-gain. A non-core does not merge with another non-core, because otherwise a chain of non-cores with different moving behaviors could end up in a cluster, especially for non-cores consisting of a single vertex. For example, v18 and v19 form such a chain. It can be seen that v18 is close to v17, which indicates that it can supplement the information of Cv11. However, it is not obvious whether v19 can be used in the same way. After that, clusters with fewer than min_sup vertices and the remaining non-cores are eliminated. Following the example, let min_sup = 5. The cluster {Cv11, Cv11} and the vertex v19 are eliminated. Consequently, there are two clusters {Cv1, Cv2, Cv3, Cv14, v10} and {Cv11, Cv12, v18} as the final results.
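The merge step described above, in which each core set joins the core set with the maximal clue-gain, can be sketched with a union-find structure. This is an illustrative sketch under stated assumptions and does not reproduce Algorithm 1. It covers only the merging of core sets, not the joining of non-cores or the min_sup filter. The four gains for Cv12 and Cv14 come from the example. The gains for Cv1 and Cv2 are placeholders, since the text gives only their merge direction.

```python
def merge_core_sets(cores, gains):
    """Merge each core set with its maximal clue-gain partner; return the clusters."""
    parent = {c: c for c in cores}

    def find(c):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for k in cores:
        # Clue-gains from k to its one-step clue-connected core sets.
        candidates = {dst: g for (src, dst), g in gains.items() if src == k}
        if not candidates:
            continue
        k_max = max(candidates, key=candidates.get)
        parent[find(k)] = find(k_max)

    clusters = {}
    for c in cores:
        clusters.setdefault(find(c), set()).add(c)
    return list(clusters.values())


cores = ["Cv1", "Cv2", "Cv3", "Cv11", "Cv12", "Cv14"]
gains = {
    ("Cv12", "Cv3"): 1.9, ("Cv12", "Cv11"): 5.1,  # from the example
    ("Cv14", "Cv1"): 4.8, ("Cv14", "Cv11"): 2.1,  # from the example
    ("Cv1", "Cv2"): 1.0, ("Cv2", "Cv3"): 1.0,     # placeholder values
}

clusters = merge_core_sets(cores, gains)
print(sorted(sorted(c) for c in clusters))
# [['Cv1', 'Cv14', 'Cv2', 'Cv3'], ['Cv11', 'Cv12']]
```

With these inputs, Cv12 is grouped with Cv11 and Cv14 with Cv1, which agrees with the clue-gains 5.1 > 1.9 and 4.8 > 2.1 and with the grouping of Cv11 and Cv12 in the final results.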

Algorithm 1: Clustering Trajectories

Input: An SC-graph G = (V, ES ∪ EC)
Output: A set of clusters C

C ← ∅;
...
Compute the clue-gain with all one-step clue-connected cliques in CORE;
...
Kmax ← the clique in CORE which can maximize ClueGain(Ki, Kmax);
Cmax ← the cluster containing Kmax;
Cjoin ← the cluster containing Ki;
...

The selection of thresholds usually depends on the user's requirements and the properties of the environment. However, setting λ and µ is not straightforward. The choice of thresholds strongly affects the structure of an SC-graph, since the number of edges depends significantly on the thresholds for the similar and close relations (i.e., λ and µ). Larger λ and µ impose stricter conditions on whether two trajectories have a similar or close relation. Thus, larger thresholds yield fewer edges and make the dual graph sparser.

A cluster in a sparse dual graph may contain only a few trajectories, making it hard to aggregate them to obtain the frequent movement precisely. Smaller λ and µ make the dual graph denser. However, a cluster then easily contains more irrelevant trajectories, so the frequent regions cannot be derived precisely. Therefore, the clustering results are highly dependent on the values of λ and µ.
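The effect of the thresholds on graph density can be illustrated by counting edges. The sketch below assumes that a similar edge exists when SS(i, j) = min(STT(Ti, Tj), STT(Tj, Ti)) ≥ λ and that a directed close edge exists when CS(i, j) = STT(Ti, Tj) ≥ µ. The threshold rule, the function name, and the 3 × 3 score matrix are hypothetical and serve only to show the trend.

```python
def count_edges(stt, lam, mu):
    """Count similar (undirected) and close (directed) edges for given thresholds."""
    n = len(stt)
    similar = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if min(stt[i][j], stt[j][i]) >= lam  # SS(i, j) >= lambda
    )
    close = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and stt[i][j] >= mu  # CS(i, j) >= mu
    )
    return similar, close


# Hypothetical STT scores for three trajectories; stt[i][j] stands for STT(Ti, Tj).
stt = [
    [1.0, 0.9, 0.2],
    [0.6, 1.0, 0.8],
    [0.3, 0.7, 1.0],
]

print(count_edges(stt, 0.5, 0.5))  # (2, 4)
print(count_edges(stt, 0.7, 0.8))  # (1, 2)
```

Raising both thresholds halves the edge count in this toy case, which mirrors the sparser graph described above.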

Here, we propose a heuristic for selecting λ and µ adaptively according to the distribution
