Performance Evaluation - Hybrid Prediction Algorithm for Traffic Data

To evaluate the accuracy and efficiency of the proposed methods, we conducted extensive experiments on real data under a number of configurations. The experiments in this section show that the proposed methods are practical and accurate on real data. We also compare our proposed method with one existing method, ECTS, proposed in [23].

We used a real traffic data set in our experiments: the traffic speed log of a freeway segment in Taiwan. The segment runs from Hsinchu to Jubei and has the highest throughput in Taiwan. Traffic bursts occur at commuting time on weekdays, so this data set contains the patterns we expected. We spent more than three months obtaining the data from government sensors.

However, longer training data is not always better: training data that is too long contains data which is out of date. In our experiments, we used two weeks of traffic status as the training data for the clustering-based methods, while for the regression, we used a training data length equal to the prediction length.

Figure 5.1: Prediction results for short-term prediction

Figure 5.2: Prediction results for long-term prediction

Figure 5.3: Prediction results for 6-hour continuous prediction

Figure 5.4: Comparison of the average trace length of MPST and ECTS (x-axis: prediction length in hours; y-axis: average trace length for one value)

In Fig. 5.1 and Fig. 5.2, we compare the prediction methods for short-term and long-term prediction. In these two experiments, MCF and MPST use the best parameters found in the experiments shown in Fig. 5.5, 5.6, 5.7 and 5.8. The difference between short-term and long-term prediction is the prediction length: a prediction is short-term if its prediction length is less than 1 hour. This breakpoint was determined from our data set. The sampling interval of our data set is 1 minute. In the short-term experiments, each prediction is a continuous prediction of the values for the 15 minutes after the prediction length. For example, a prediction length of 0.5 hour means predicting the values at t_c + 31 through t_c + 45 (in minutes). As the results in Fig. 5.1 show, regression is, as we mentioned, the best method for short-term predictions. In the long-term experiments, each prediction is a continuous prediction of the values for 1 hour. Although MPST is not always the most accurate prediction method, it is better than regression, MCF and ECTS on most occasions.
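The window arithmetic in the example above can be sketched as follows. This is only an illustration, not the code used in the experiments: the function name `prediction_window` and its parameters are hypothetical, and the long-term case assumes the same offset convention as the short-term example.

```python
def prediction_window(prediction_length_hours, window_minutes, sampling_minutes=1):
    """Return the offsets, in minutes after t_c, of the values to predict.

    Assumption: the window starts one sample after the prediction length,
    which matches the example of 0.5 hour -> t_c + 31 ... t_c + 45.
    """
    start = int(round(prediction_length_hours * 60)) + sampling_minutes
    return list(range(start, start + window_minutes, sampling_minutes))


# Short-term case: 15-minute window after a prediction length of 0.5 hour.
short_term = prediction_window(0.5, window_minutes=15)

# Long-term case (assumed to follow the same convention): 1-hour window.
long_term = prediction_window(2, window_minutes=60)
```

With a 1-minute sampling interval, `short_term` covers offsets 31 through 45, and `long_term` covers offsets 121 through 180.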

Although the accuracy of MPST is close to that of ECTS, MPST is better than ECTS in terms of tracing length. The tracing length is how far back we must trace the time series to predict a value. In Fig. 5.4, we compare the average tracing length of MPST and ECTS.

In this experiment, ECTS needs a longer tracing length, and the tracing length it needs is not fixed. MPST therefore uses less data than ECTS while achieving accuracy close to that of ECTS.

To summarize the above results, we ran an experiment of 6-hour continuous prediction; the results are shown in Fig. 5.3. The parameters of the prediction methods are the same as in the above experiments. The hybrid prediction is the most accurate prediction method in this experiment. The threshold of the hybrid prediction is 1 hour; that is, we use regression when the prediction length is less than 1 hour and MPST when it exceeds 1 hour.
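The switching rule of the hybrid prediction can be sketched as a simple dispatch on the prediction length. This is a minimal sketch under stated assumptions: `hybrid_predict`, `short_term_model` and `long_term_model` are hypothetical names, the two models are passed in as plain callables standing in for regression and MPST, and a prediction length of exactly 1 hour is assumed to be handled by the long-term model, since the text defines short-term as less than 1 hour.

```python
from typing import Callable, Sequence

# A predictor takes the observed history and a prediction length in minutes.
Predictor = Callable[[Sequence[float], int], float]


def hybrid_predict(
    history: Sequence[float],
    prediction_length_minutes: int,
    short_term_model: Predictor,
    long_term_model: Predictor,
    threshold_minutes: int = 60,
) -> float:
    """Dispatch to the short-term or long-term model by prediction length."""
    if prediction_length_minutes < threshold_minutes:
        # Below the threshold: regression in the experiments described here.
        return short_term_model(history, prediction_length_minutes)
    # At or above the threshold (assumption for the boundary case): MPST.
    return long_term_model(history, prediction_length_minutes)
```

Passing the two models as arguments keeps the dispatch independent of how regression and MPST are implemented, so the threshold can be varied on its own.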

Figure 5.5: Different periods for MCF (MSE for different time series periods)

Figure 5.6: Different periods for MPST (MSE for different time series periods)

Figure 5.7: Different tree heights for MPST

Figure 5.8: Different pattern similarity thresholds for MPST (MSE for different similarity thresholds)

We also carried out experiments on our clustering-based methods. In Fig. 5.5, 5.6, 5.7 and 5.8, we use different parameters for the clustering-based prediction methods MCF and MPST to investigate the effect of the parameters. MCF has only two parameters: k and the time series period. The parameter k belongs to the k-means clustering algorithm. The value of k affects the clustering result of both MCF and MPST; therefore we discuss this issue together with MPST. The experiment result for the time series period of MCF is shown in Fig. 5.5. As can be seen, the best time series period for MCF on our data set is 4 hours.
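Since the parameter figures report MSE, selecting the best parameter value amounts to picking the candidate with the lowest error. The sketch below illustrates this under stated assumptions: `mean_squared_error` and `best_parameter` are hypothetical helper names, and the `evaluate` callable stands in for running a prediction method with a given parameter value and measuring its error.

```python
def mean_squared_error(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def best_parameter(candidates, evaluate):
    """Return the candidate with the smallest error (first one wins on ties)."""
    return min(candidates, key=evaluate)
```

For example, `best_parameter(periods, evaluate)` would return the time series period whose predictions give the lowest MSE, where `periods` is the list of candidate periods.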

In Fig. 5.6, we used different time series periods for MPST. Because MPST predicts the future value using several recent patterns, its period is shorter than that of MCF. For example, if the period is 30 minutes and the MPST height is 3, MPST uses the most recent 1.5 hours of data to predict the next 30 minutes. In this experiment, we found that 10 minutes is the best time series period for MPST on our data set. The height of MPST is another important parameter.
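The relation between period, height and the amount of recent data used can be illustrated with a short sketch. The helper `recent_patterns` is a hypothetical name, not part of the original method; it only shows how a height of 3 and a period of 30 minutes cover the last 1.5 hours of a series sampled once per minute.

```python
def recent_patterns(series, period_minutes, height, sampling_minutes=1):
    """Split the most recent data into `height` consecutive periods.

    Returns a list of `height` subsequences, oldest first, each covering
    one period. Raises ValueError if the series is too short.
    """
    samples_per_period = period_minutes // sampling_minutes
    needed = samples_per_period * height
    if len(series) < needed:
        raise ValueError("not enough data for the requested period and height")
    recent = series[-needed:]
    return [
        recent[i * samples_per_period:(i + 1) * samples_per_period]
        for i in range(height)
    ]


# 200 one-minute samples; period 30 minutes, height 3 -> last 90 samples used.
patterns = recent_patterns(list(range(200)), period_minutes=30, height=3)
```

Here `patterns` holds three subsequences of 30 samples each, that is, 90 minutes (1.5 hours) of recent data.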

Figure 5.9: Different thresholds for hybrid prediction

Figure 5.10: Effect of K in K-means (clustering results with different values of k)

We mentioned that the prediction result is decided from the recent patterns. The height of MPST determines the maximum number of patterns we use to make a prediction. The effect of the MPST height is shown in Fig. 5.7. The effect is not obvious in this experiment because we used the number of occurrences to make the predictions. The number of occurrences decreases toward the lower nodes of the tree; therefore their influence on the prediction decreases as well. Thus the MPST height is less important than the period.

The last parameter of MPST is the pattern similarity threshold, which decides whether a subsequence is similar to a pattern in the symbolization process: if the distance between the subsequence and the pattern is lower than the threshold, the subsequence is considered similar to this pattern. As shown in Fig. 5.8, the effect of the threshold is stepwise. When the threshold is greater than 6, the prediction result is inaccurate, because a threshold that is too large causes MPST to use patterns which are not similar to the subsequence to predict future values.
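The role of the pattern similarity threshold can be sketched as follows. This is an illustration under stated assumptions rather than the symbolization procedure itself: Euclidean distance is assumed as the distance measure, the closest pattern below the threshold is assumed to be chosen, and the names `euclidean` and `symbolize` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences (assumed measure)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def symbolize(subsequence, patterns, threshold):
    """Return the index of the closest pattern whose distance is below the
    threshold, or None if no pattern is similar enough."""
    best_index, best_distance = None, threshold
    for index, pattern in enumerate(patterns):
        distance = euclidean(subsequence, pattern)
        if distance < best_distance:  # strictly lower than the threshold
            best_index, best_distance = index, distance
    return best_index


# Toy patterns (illustrative values only, not taken from the data set).
toy_patterns = [[60, 60, 60], [30, 30, 30]]
close_match = symbolize([58, 61, 60], toy_patterns, threshold=6)
no_match = symbolize([45, 45, 45], toy_patterns, threshold=6)
loose_match = symbolize([45, 45, 45], toy_patterns, threshold=30)
```

With a threshold of 6, the subsequence `[45, 45, 45]` matches neither toy pattern, whereas a threshold of 30 assigns it to a pattern it does not resemble. This is the situation the text describes for thresholds that are too large.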

For the hybrid prediction method, we tried different threshold values. The experiment results are shown in Fig. 5.9. The best threshold is 1 hour; therefore we use 1 hour as the breakpoint between short-term and long-term predictions. In this experiment, a larger threshold means that regression is used for more of the predictions; thus accuracy is low with a large threshold.

We mentioned that the value of k in the K-means clustering algorithm is not an important parameter. To verify this, we clustered our data set with different values of k and pruned the clusters. The result of this experiment is shown in Fig. 5.10. When k is large enough, the number of clusters no longer grows.

In these experiments, we verified the performance of the proposed methods. We also showed that the regression-based method is suited to short-term prediction and that clustering-based prediction is suited to long-term prediction.

Chapter 6
