
Trajectories Spatiotemporally

2.6 Experiment Results

2.6.5 Analysis of CluST

One main challenge of k-means based clustering is choosing k. Here we discuss the impact of k on the profiling and provide some guidelines for choosing it.

Based on (2.12), k between 30 and 55 was recommended for the typhoon track data, and k from 60 to 160 was suggested for the bus trajectory data. Once a range of k has been determined from clustering validation measurements as in [50], the exact k should be selected based on domain expertise about the data. In our experiments, for typhoon tracks, the clusters should capture the different moving characteristics along the typhoon paths from low latitudes to high latitudes and distinguish different patterns. For bus trajectories, the number of clusters must be large enough to separate traffic on different road segments and to distinguish traffic with different speeds on the same road.

Figure 2.20: CluST results with too small k. Panel (b): Bus TR, k=20 (th_len=0.1, th_spd=0.9); axes: longitude (E) and latitude (N).

Figure 2.21: Impacts of w_sht and w_spd on bus data profiling. Panels: (a) w_sht : w_spd = 5 : 1; (b) w_sht : w_spd = 1 : 5; axes: longitude (E) and latitude (N).

Finally, Fig. 2.19 and Fig. 2.20 show the clustering results when k was too large or too small, on both the typhoon tracks and the bus trajectories. When k was too large, as shown in Fig. 2.19, too many similar and nearby clusters were generated, which produced too much redundant information. On the other hand, when k was too small, many basic typhoon track moving styles vanished, as shown in Fig. 2.20(a), while bus routes that were originally on different roads were grouped into one cluster, as shown in Fig. 2.20(b).

Figure 2.22: DBSCAN parameters analysis on typhoon data.

Figure 2.23: DBSCAN representation lines on typhoon data. Panels: (a) 31 clusters (ε=2500, MinLns=5, 0.4% noise); (b) 175 clusters (ε=750, MinLns=20, 19.6% noise).

The weights in the distance function (2.5) can be adjusted to emphasize either the spatial or the temporal information. The impact of w_sht can be seen by comparing Fig. 2.13(a), Fig. 2.21(a) and Fig. 2.13(f), where w_sht : w_spd = 1 : 1, 5 : 1 and 1 : 0, respectively. The speed diversity decreased as the ratio of w_sht increased, while the generated clusters fit the real road network better. Meanwhile, the impact of w_spd can be seen by comparing Fig. 2.13(a), Fig. 2.21(b) and Fig. 2.13(g), where w_sht : w_spd = 1 : 1, 1 : 5 and 0 : 1, respectively. The speed diversity increased as the ratio of w_spd got larger, while the generated clusters were distributed more randomly, and the original road network vanished completely when w_sht = 0. Given these observations, w_sht and w_spd should be adjusted according to the relative importance of the spatial and speed information.
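The effect of the weight ratio can be illustrated with a small sketch. It does not reproduce the distance function (2.5); it assumes a simplified stand-in in which each replacement line is summarized by a midpoint and a mean speed, and the distance is a weighted sum of a spatial term and a speed term. The `Line` class, `weighted_distance`, `nearest`, and all numeric values are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Line:
    # Hypothetical summary of a replacement line: midpoint and mean speed.
    x: float
    y: float
    speed: float


def weighted_distance(a, b, w_sht, w_spd):
    """Toy stand-in for (2.5): weighted sum of a spatial and a speed term."""
    spatial = math.hypot(a.x - b.x, a.y - b.y)
    speed = abs(a.speed - b.speed)
    return w_sht * spatial + w_spd * speed


def nearest(query, candidates, w_sht, w_spd):
    """Return the name of the candidate closest to `query` under the weights."""
    return min(
        candidates,
        key=lambda name: weighted_distance(query, candidates[name], w_sht, w_spd),
    )


query = Line(0.0, 0.0, 10.0)
candidates = {
    "same_road_faster": Line(1.0, 0.0, 30.0),    # close in space, different speed
    "other_road_similar": Line(5.0, 0.0, 12.0),  # far in space, similar speed
}

# Spatially dominated weights pair the query with the line on the same road.
by_space = nearest(query, candidates, w_sht=5, w_spd=1)
# Speed dominated weights pair it with the line of similar speed instead.
by_speed = nearest(query, candidates, w_sht=1, w_spd=5)
print(by_space, by_speed)
```

Under these assumed values, raising w_sht groups lines by location (matching the road network), while raising w_spd groups lines by speed regardless of location, which is consistent with the trend described for Fig. 2.21.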

Based on the same replacement lines generated by DivST, here we show the impact of different clustering methods on the profiling results. We compare our k-means based clustering method, CluST, with the adapted DBSCAN algorithm used in [19] on the typhoon track data set. As discussed in Section 2.5, the model-based clustering methods require a data-dependent design and thus we did not compare those algorithms with ours.

For the adapted DBSCAN method, we tested a series of values for the ε-neighborhood and MinLns (which originates from MinPts in the DBSCAN method [54] and is the minimum number of lines in our case), while using the same line distance defined in (2.5). As the parameters ε and MinLns varied, we plotted the number of clusters and the noise ratio (the ratio of lines that were not included in any cluster to the total number of replacement lines) in Fig. 2.22. When we set MinLns small or ε large, the number of generated clusters was very small (fewer than ten). The density-reachable characteristic of DBSCAN, which tends to group lines spread over a large area, results in a few large clusters. Such a large line cluster cannot be interpreted simply by computing its mean. To solve this problem, an additional cluster sweep was applied afterward in [19], which took a lot of time to compute the average coordinate whenever the number of line segments encountered was more than MinLns. On the other hand, although the number of generated clusters increased rapidly as the ε-neighborhood became smaller, the noise line ratio also increased quickly, as illustrated in Fig. 2.22(b), which would cause serious information loss. Furthermore, k-means has O(n + k) memory cost and O(ntk) time complexity, where n is the number of objects to be clustered and t is the number of iterations. Usually k, t ≪ n, so the k-means-based method is more scalable and efficient in processing huge amounts of trajectory data than DBSCAN, which has O(n²) complexity in both time and space.
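As a minimal sketch of the two quantities plotted in Fig. 2.22, the snippet below computes the number of clusters and the noise ratio from a list of per-line cluster labels. It assumes, as a convention not stated in the text, that noise lines carry the label -1. The second helper only evaluates the order-of-growth expressions quoted above for one hypothetical problem size; the values of n, k, and t are illustrative, and the results are not measured costs.

```python
def summarize_labels(labels, noise_label=-1):
    """Return (cluster count, noise ratio) for per-line cluster labels."""
    n = len(labels)
    clusters = {lab for lab in labels if lab != noise_label}
    noise = sum(1 for lab in labels if lab == noise_label)
    ratio = noise / n if n else 0.0
    return len(clusters), ratio


def growth_terms(n, k, t):
    """Evaluate the complexity expressions from the text (constants ignored)."""
    return {
        "kmeans_memory": n + k,
        "kmeans_time": n * t * k,
        "dbscan_time_and_space": n * n,
    }


# Hypothetical labels for ten replacement lines; -1 marks noise (assumed).
labels = [0, 0, 1, -1, -1, 2, 2, 2, 1, 0]
num_clusters, noise_ratio = summarize_labels(labels)
print(num_clusters, noise_ratio)

# Hypothetical sizes: 100,000 lines, 30 clusters, 50 iterations.
terms = growth_terms(n=100_000, k=30, t=50)
print(terms)
```

With these assumed sizes the n·t·k term is well below n², which illustrates the scalability argument when k and t are much smaller than n.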

To make the clustering results of DBSCAN clearer, we provide sample results under two parameter settings on the typhoon data in Fig. 2.23. First, we set ε = 2500 and MinLns = 5 so that the number of generated clusters was 31, which is close to the cluster number, k=30, used in our CluST method in previous sections. With a similar number of clusters, the profiling results of DBSCAN, as shown in Fig. 2.23(a), lost some important regional characteristics compared to those in Fig. 2.12(a). For example, the delicate turning points of trajectories vanished from several large clusters generated by the density-reachable property of the clustering method (represented by the very wide, short lines). To capture more details, we used another setting: ε = 750 and MinLns = 20. The chosen ε was the average intra-cluster distance in Fig. 2.12(a), while this MinLns caused a tolerable noise ratio of 19.6%. The resulting cluster number was 175, and the corresponding cluster mean lines are shown in Fig. 2.23(b). We can see that too many clusters were generated, which missed our goal of finding distinct moving patterns with a clear view.

The above results showed that the DBSCAN-based method was not suitable for our goal in clustering sub-trajectories: to find convex clusters of objects with consistent spatiotemporal properties, as we discussed in Section 2.5. In fact, our distance function and the way of computing cluster mean lines were both designed in favor of convex clusters. The DBSCAN method connects density-reachable objects into one group and may include members spread over a large spatial area or with large temporal differences, which can result in less consistent spatiotemporal properties. Thus, the DBSCAN method was ill-suited to revealing the regional moving behaviors of trajectories in our design. This explains why we chose the k-means based clustering method in DivCluST.
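The chaining behavior described above can be reproduced on a toy example. The sketch below is not the adapted DBSCAN of [19] and does not use the line distance (2.5); it assumes plain 1-D points with absolute difference as the distance, a textbook DBSCAN, and a basic Lloyd-style k-means. The function names, the point set, and the parameter values are all hypothetical.

```python
def dbscan_1d(points, eps, min_pts):
    """Textbook DBSCAN on 1-D points; returns labels, with -1 for noise."""
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        # Neighborhood includes the point itself.
        return [j for j in range(n) if abs(points[i] - points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = -1  # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbors(j)
            if len(nb_j) >= min_pts:
                queue.extend(nb_j)  # core point: keep following the chain
    return labels


def kmeans_1d(points, centers, iters=20):
    """Basic Lloyd iterations on 1-D points; returns (centers, groups)."""
    groups = [[] for _ in centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda c: abs(p - centers[c]))
            groups[idx].append(p)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Ten evenly spaced points: each is close to its neighbor, the ends are far apart.
points = [float(v) for v in range(10)]

db_labels = dbscan_1d(points, eps=1.5, min_pts=3)
centers, groups = kmeans_1d(points, centers=[0.0, 9.0])
print(db_labels)
print(centers, groups)
```

On this chain, density reachability links every point into a single cluster that spans the whole range, whereas k-means with k=2 yields two compact groups around their means. This mirrors, under the stated simplifications, why a single mean line describes a k-means cluster better than a chained DBSCAN cluster.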

2.7 Summary

In this work, we presented DivCluST, an approach to profiling a set of moving objects by dividing and clustering their accumulated trajectories spatiotemporally. First, DivST divided the trajectories and generated a proper number of replacement lines while preserving their original spatiotemporal properties well, through the designed spatial and temporal dividing criteria and the threshold parameter selections. Then, CluST clustered the replacement lines based on the proposed spatiotemporal line distance measurement in consideration of the differences between positions, lengths, directions and speeds. With the specially designed mean line representation, the clustering results can reveal the regional main spatiotemporal moving behaviors of the given trajectory data set. Finally, we conducted extensive experiments on three different real world trajectory data sets. The results showed that DivCluST can effectively produce the profiles of moving objects and identify regional typical moving styles from the trajectory data.

Chapter 3
