In the clustering phase, the trajectories are divided into several clusters, where each cluster contains more than min_sup trajectories. In this phase, the spatial and temporal information of the trajectories in each cluster is aggregated, and a frequent region sequence is then generated for each cluster.
The trajectories in a cluster may represent the same moving behavior. However, aggregating the information of these trajectories is not a trivial task because they may contain spatial bias, temporal bias, and noisy data. To overcome these issues, we choose the trajectory that best represents the moving behavior of the trajectories in the cluster. Such a trajectory is referred to as the kernel. The information of the other trajectories is then adjusted to compensate the kernel. Once the compensated kernel is obtained, the remaining issue is how to decide the minimal number of regions that satisfies the spatial-bias threshold ε.
Since the weight of a similar edge represents how similar its two trajectories are to each other, the vertex with the highest total weight of incident similar edges corresponds to the trajectory to which the most trajectories in the cluster are similar. This vertex is chosen as the kernel. For example, in Figure 4.1, v3 is the kernel of the cluster {Cv1, Cv2, Cv3, Cv14, v10}. Note that the kernel is likely to come from a larger core. A larger core has more mutually similar vertices, so each of its vertices has more similar edges incident to it, and the total weight of a vertex in a larger core therefore tends to exceed that of a vertex in a smaller core. In addition, we have more confidence in the moving behavior described by a larger core than in that described by a smaller one. This agrees with the goal of selecting the trajectory that best represents the moving behavior of the cluster. Moreover, a larger core tends to have more close edges incident to it, which follows the intuition that the kernel can be compensated with the moving behavior of other trajectories.
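The kernel selection described above can be sketched in a few lines of Python. This is only a minimal illustration: the function name `select_kernel`, the `(u, v, weight)` edge format, and the edge weights are assumptions made for the example and are not the values of Figure 4.1.

```python
from collections import defaultdict


def select_kernel(similar_edges):
    """Return the vertex whose incident similar edges have the largest total weight.

    similar_edges: iterable of (u, v, weight) tuples (hypothetical input format).
    """
    total = defaultdict(float)
    for u, v, w in similar_edges:
        # An undirected similar edge contributes its weight to both endpoints.
        total[u] += w
        total[v] += w
    # Iterating in sorted order only makes tie-breaking deterministic.
    return max(sorted(total), key=total.get)


# Made-up weights for illustration only.
edges = [("v1", "v3", 0.6), ("v2", "v3", 0.7), ("v1", "v2", 0.5), ("v3", "v10", 0.5)]
kernel = select_kernel(edges)
print(kernel)
```

With these assumed weights, v3 accumulates the largest total weight and is therefore selected.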
Since not all information of the trajectories can be used to compensate the kernel, the order in which trajectories are added should be carefully decided so that as many trajectories as possible can contribute their information to the kernel. As such, the trajectories that are most likely similar to the kernel should be considered first. To evaluate how similar a trajectory is to the kernel, we first consider the minimal number of steps from the trajectory to the kernel, which can be computed by breadth-first search (BFS). A vertex that can reach the kernel through fewer edges is more likely to have less spatial and temporal bias with respect to the kernel. Moreover, the path that yields the minimal number of steps from the trajectory to the kernel is also important: the product of the weights along the path indicates how similar the trajectory is to the kernel transitively.
Figure 5.1: An illustrative example of our aggregation algorithm (panels (a)-(d))
Overall, trajectories with fewer BFS steps and a higher product of weights along the BFS path should be considered first. For example, consider the cluster {Cv1, Cv2, Cv3, Cv14, v10} in Figure 4.1, whose kernel is v3. The vertices v6 and v7 are both two steps away from the kernel. From v6 to v3, the maximal product of weights is 0.5 × 0.6 = 0.3. From v7 to v3, the maximal product of weights is 0.5 × 0.5 = 0.25. Thus, v6 has higher priority than v7 in compensating the kernel.
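The ordering rule can be sketched as a level-by-level BFS that also tracks, for each vertex, the largest weight product over its minimal-step paths to the kernel. The function name `compensation_order` and the small graph below are hypothetical; the edges are chosen only so that the two products quoted in the text (0.3 for v6 and 0.25 for v7) appear, and they do not reproduce Figure 4.1.

```python
from collections import defaultdict


def compensation_order(edges, kernel):
    """Order vertices by (fewest BFS steps, then largest weight product) to the kernel."""
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w

    steps = {kernel: 0}
    product = {kernel: 1.0}
    frontier = [kernel]
    depth = 0
    while frontier:
        depth += 1
        best = {}  # best product for each vertex first reached at this depth
        for u in frontier:
            for v, w in adj[u].items():
                if v in steps:
                    continue  # already reached with fewer steps
                cand = product[u] * w
                if v not in best or cand > best[v]:
                    best[v] = cand
        for v, p in best.items():
            steps[v] = depth
            product[v] = p
        frontier = list(best)

    others = [v for v in steps if v != kernel]
    return sorted(others, key=lambda v: (steps[v], -product[v]))


# Hypothetical edges (not taken from Figure 4.1).
edges = [("v3", "v1", 0.6), ("v3", "v2", 0.5), ("v1", "v6", 0.5), ("v2", "v7", 0.5)]
order = compensation_order(edges, "v3")
print(order)
```

In this assumed graph, v6 and v7 are both two steps from v3, and v6 is placed before v7 because its weight product is larger.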
After the order of compensation is decided, the next task is to adjust the spatial and temporal information of the other trajectories so that this information can be used to compensate the kernel. The concept of our aggregation algorithm is best understood through the example in Figure 5.1. Suppose that the black points are from the kernel and the grey points are from the compensating trajectory. The number associated with each point represents its time. In the beginning, all points of the compensating trajectory are spatially projected, as shown in Figure 5.1(a). Among the compensating points, the point w has some temporal bias with respect to the kernel because it is located between kernel points a and b, but its time is not between the times of a and b. The point x has such temporal bias as well. In addition, the point y is a noise point that is too far from the other points. Then, the temporal information of the compensating points is adjusted according to the points of the kernel.
Suppose that a compensating point p is located between two kernel points q and r. If the time of p lies within [t_q − τ, t_r + τ], where τ is the temporal-bias threshold, then its time is adjusted in proportion to its distances to q and r. Specifically,

$$t_p = t_q + (t_r - t_q) \times \frac{\mathrm{dist}(p, q)}{\mathrm{dist}(p, q) + \mathrm{dist}(p, r)}.$$

Otherwise, the point is discarded. Such an adjustment is reasonable because a temporal bias of τ is allowed when computing the similarity between two trajectories. For example, let τ = 2, and suppose that the distance between a and w equals that between b and w. Since the time of w is 7, which lies within [2 − 2, 6 + 2], the time of w is adjusted to

$$2 + (6 - 2) \times \frac{\mathrm{dist}(a, w)}{\mathrm{dist}(a, w) + \mathrm{dist}(w, b)} = 4.$$

On the other hand, the time of x is 11, which is outside the interval [6 − 2, 8 + 2]. Thus, the compensating point x is discarded. In a similar fashion, the time of z is adjusted to 18.
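The temporal adjustment rule can be written as a small Python function. This is a sketch under assumptions: the name `adjust_time`, the use of 2-D Euclidean distance, and the coordinates in the usage example are all hypothetical; only the times and τ = 2 come from the example in the text.

```python
import math


def adjust_time(p, t_p, q, t_q, r, t_r, tau):
    """Return the adjusted time of compensating point p, or None if p is discarded.

    p, q, r are (x, y) tuples; q and r are the kernel points that p lies between.
    """
    if not (t_q - tau <= t_p <= t_r + tau):
        return None  # outside the tolerated interval: discard the point
    d_q = math.dist(p, q)
    d_r = math.dist(p, r)
    if d_q + d_r == 0:
        return t_q  # degenerate case: p, q and r coincide
    # Interpolate between t_q and t_r in proportion to the distance from q.
    return t_q + (t_r - t_q) * d_q / (d_q + d_r)


# Hypothetical coordinates; w is equidistant from a and b.
a, b, c = (0.0, 0.0), (4.0, 0.0), (8.0, 0.0)
w = (2.0, 1.0)
x = (6.0, 1.0)
t_w = adjust_time(w, 7, a, 2, b, 6, tau=2)   # 7 lies within [0, 8]
t_x = adjust_time(x, 11, b, 6, c, 8, tau=2)  # 11 lies outside [4, 10]
print(t_w, t_x)
```

The first call reproduces the adjustment of w to time 4, and the second returns `None` because x is discarded.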
Next, the noise points are discarded. Following the notation above, the point p is a noise point if dist(p, qr) > ε. The point y is such a noise point and is thus eliminated. Figure 5.1(b) shows the result after adjusting the temporal information and eliminating noise points. The procedure repeats until all compensating trajectories have been added.
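The noise test can be sketched as follows. The text does not state whether dist(p, qr) is measured to the infinite line or to the segment between q and r; this sketch assumes the segment. The names `dist_point_segment` and `is_noise` and the sample coordinates are hypothetical.

```python
import math


def dist_point_segment(p, q, r):
    """Euclidean distance from point p to the segment qr (assumed interpretation)."""
    (px, py), (qx, qy), (rx, ry) = p, q, r
    dx, dy = rx - qx, ry - qy
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.dist(p, q)  # q and r coincide
    # Parameter of the orthogonal projection of p, clamped to the segment.
    t = ((px - qx) * dx + (py - qy) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.dist(p, (qx + t * dx, qy + t * dy))


def is_noise(p, q, r, eps):
    """A compensating point is noise if it is farther than eps from qr."""
    return dist_point_segment(p, q, r) > eps


q, r = (0.0, 0.0), (4.0, 0.0)
print(is_noise((2.0, 5.0), q, r, eps=1.0))  # far from the segment
print(is_noise((2.0, 0.5), q, r, eps=1.0))  # close to the segment
```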
Algorithm 2: Aggregation Algorithm
Input: A set of clusters C
Output: A set of frequent region sequences R
for each cluster K ∈ C do
    T_ker ← the kernel trajectory of K;
    for each vertex v ∈ K do
        compute its BFS steps and largest weight product to the kernel;
    for each trajectory T in ascending order of BFS steps and descending order of weight product do
        spatially project all points of T;
        for each point p of T do
            q, r ← the two kernel points that p is located between;
            if t_p ∈ [t_q − τ, t_r + τ] and dist(p, qr) ≤ ε then
                adjust t_p and add p to T_ker;
            else
                discard p;
    L ← lines obtained by the Douglas-Peucker algorithm;
    Ω ← regions with central lines L;
    add Ω into R;
end
After all compensating points have been added, the Douglas-Peucker algorithm is used to determine the number of regions. The purpose of this algorithm is to find a small set of line segments such that the distance from each point to its corresponding line segment is smaller than the threshold ε. Therefore, the number of regions is kept small while the distance from each point to its line segment is guaranteed to be smaller than ε. Conceptually, the algorithm recursively divides the line. Initially, a line segment connecting the first and the last points is constructed. If the point farthest from the line segment is closer than ε, all points can be represented by this line segment. Otherwise, the original line segment is split into two line segments at this farthest point. The algorithm calls itself recursively until the distances of all points to the derived line segments are smaller than ε. Taking the line segments in Figure 5.1(b) as input, Figure 5.1(c) shows the final result, where two line segments are derived. Consequently, by viewing the derived line segments as central lines, the regions can be easily derived. The final results are shown in Figure 5.1(d).
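The recursive splitting step can be illustrated with a compact Python sketch of the Douglas-Peucker algorithm. It assumes 2-D points given in time order and treats a farthest distance equal to ε as acceptable, a boundary case the text leaves open. The helper names and the sample points are hypothetical and do not correspond to Figure 5.1.

```python
import math


def _dist_to_segment(p, a, b):
    """Euclidean distance from point p to the segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.dist(p, a)
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def douglas_peucker(points, eps):
    """Return the end points of the simplified polyline for an ordered point list."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior point farthest from the segment first-last.
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _dist_to_segment(points[i], first, last)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= eps:
        return [first, last]  # one segment represents all points
    # Otherwise split at the farthest point and recurse on both halves.
    left = douglas_peucker(points[: idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right  # drop the duplicated split point


# Hypothetical points for illustration.
pts = [(0, 0), (1, 0.1), (2, 0), (3, 3), (4, 6)]
simplified = douglas_peucker(pts, eps=0.5)
print(simplified)
```

For these assumed points the result contains three end points, that is, two line segments, each of which would serve as the central line of one region.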