
Prediction of bike stations activity

2. Literature review

2.3 Station activity and spatiotemporal patterns

2.3.3 Prediction of bike stations activity

Several studies address the prediction of station usage, as detailed below. Predicting bike station usage is expected to bring the following advantages (Froehlich et al., 2009):

• Allowing for more accurate load balancing of the stations;

• Providing expected station activity information to operators and stakeholders; and

• Allowing for new mobile services that inform users of expected station activity.

The studies of Froehlich et al. (2009) and Kaltenbrunner et al. (2010) both focus on predicting the number of available bicycles at a given station at a given time. In contrast, Borgnat et al. (2009) predict the number of shared bicycles hired per hour, taking into account factors external to the cycle patterns. Froehlich et al. (2009) use four simple predictive models: last value (LV), historic mean (HM), historic trend (HT) and a Bayesian network (BN) model. All of these models take three input parameters:

(1) the current time t0; (2) the current number of bicycles at time t0, denoted as Bt0; and (3) a prediction window (PW) ranging from 10 minutes to 120 minutes. Three weeks of historic data are used to form the three history-based predictors, LV, HM and HT. For the BN model, extensive experiments with two pilot stations were carried out to determine the optimal dimensionality⁷ of the three observed (input) nodes, namely time, bike (normalised available bicycles, NAB) and PW with six possible values in the future, and of one hidden (output) node, delta⁸. The number of bikes is therefore predicted as the currently available bikes plus the value of the delta node.

Each BN is trained by computing a posterior over the parameters from the observed data for time, bikes, PW and delta, where time covers the three-week training period in five-minute increments. Results show that the BN model performs best among these models, with the smallest average error of 0.08 NAB (e.g., an average error of 2 bicycles for a station with a capacity of 25), whereas the HM predictor performs worst, implying that daily station activity varies considerably from the historic mean. The full or empty state of a station is also taken into account. Here too, the BN model performs best, predicting either the empty or the full station state with 80% accuracy up to 2 hours ahead, i.e. in the most challenging scenario (a PW of 120 minutes). For this task, however, the HM and HT predictors are replaced by a decision tree classifier (ID3) and a support vector machine (SVM) classifier because of their poor estimates. In practice, most bikesharing users tend to be interested in the number of shared bicycles available within the next 60 minutes, and over that horizon the LV, HT and BN models provide sufficient accuracy, with an error within one bicycle.
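To make the three history-based predictors concrete, the following is a minimal Python sketch rather than the implementation of Froehlich et al. (2009). The exact definitions of HM and HT are not given above; the sketch assumes that HM averages the NAB observed at the target time of day on past days, and that HT adds the historic average change between the current and target times to the current value. The five-minute slot length follows the training data described above; all function names, variable names and data values are hypothetical.

```python
from statistics import mean

SLOT_MINUTES = 5                      # sampling step of the training data
SLOTS_PER_DAY = 24 * 60 // SLOT_MINUTES


def target_slot(current_slot, pw_minutes):
    """Slot index reached after the prediction window, wrapping at midnight."""
    return (current_slot + pw_minutes // SLOT_MINUTES) % SLOTS_PER_DAY


def last_value(current_nab):
    """LV: assume the station stays exactly as it is now."""
    return current_nab


def historic_mean(history, slot):
    """HM (assumed definition): mean NAB seen at this time of day on past days."""
    return mean(day[slot] for day in history)


def historic_trend(history, current_nab, current_slot, pw_minutes):
    """HT (assumed definition): current NAB plus the mean historic change."""
    future = target_slot(current_slot, pw_minutes)
    change = historic_mean(history, future) - historic_mean(history, current_slot)
    # NAB is a fraction of station capacity, so keep the result in [0, 1].
    return min(1.0, max(0.0, current_nab + change))


def nab_to_bicycles(nab_error, capacity):
    """Express an error in NAB units as a number of bicycles."""
    return nab_error * capacity


# Toy history: two days of five-minute NAB samples (made-up values).
day_a = [0.2] * SLOTS_PER_DAY
day_b = [0.4] * SLOTS_PER_DAY
day_a[106], day_b[106] = 0.4, 0.6     # station fills up around slot 106
history = [day_a, day_b]

lv = last_value(0.5)                                # prediction from slot 100
hm = historic_mean(history, target_slot(100, 30))   # 30-minute window
ht = historic_trend(history, 0.5, 100, 30)
```

With the figures quoted above, `nab_to_bicycles(0.08, 25)` reproduces the 2-bicycle error for a station with a capacity of 25.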

Kaltenbrunner et al. (2010) use two basic predictors, a baseline model and a gradient-based predictor, and an advanced time series analysis method, the auto-regressive moving average (ARMA) model. More specifically, the baseline model predicts that the current state persists for any time in the future, whereas the gradient-based predictor extrapolates from the current state using only the data tendencies of the same day of the week. The two models are compared in terms of mean error for different time offsets, such as 10 minutes, 30 minutes or more into the future. Notably, there is no significant difference between the two models for very short offsets; beyond that, however, prediction using the gradient of the average activity cycle performs better. For the ARMA model, a 20-minute history of usage data (i.e. 10 samples) is used for both the AR component (the auto-correlated nature of the series) and the MA component (information from surrounding stations), so that both have the same order of 10. The optimal number of surrounding stations for the MA component is determined by examining the average absolute error for a set of ten different stations with different numbers of surrounding stations. While using 15 surrounding stations achieves the lowest mean absolute error, performance does not differ significantly for between 5 and 20 surrounding stations.

7 The optimal dimensionality is based on the prediction error.

8 All observed data of time, bikes and PW are the parents of the delta node.

Additionally, the prediction error is evaluated over time intervals ranging from 2 minutes to 60 minutes, using the 5 closest surrounding stations. Results show that the average prediction error is below 1 bicycle at a 30-minute prediction interval, while the error increases to around 3 bicycles at a one-hour prediction interval. Although prediction errors are smaller for intervals of less than 20 minutes, this may result from the low-pass filtering applied to the data. Overall, the ARMA model appears to provide better prediction performance than the simpler methods, and the number of surrounding stations plays an important role in improving the prediction.
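The comparison of the two basic predictors across time offsets can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the procedure of Kaltenbrunner et al. (2010): the gradient-based predictor is assumed to add the change in the average activity cycle of the same weekday to the current value, offsets are counted in samples (2 minutes each, as implied by 10 samples covering 20 minutes), and all names and data are hypothetical.

```python
def baseline_predict(series, t, offset):
    """Baseline: the current state is predicted for any future time."""
    return series[t]


def gradient_predict(series, cycle, t, offset):
    """Gradient-based (assumed form): current state plus the change of the
    average activity cycle between now and the target time."""
    n = len(cycle)
    return series[t] + cycle[(t + offset) % n] - cycle[t % n]


def mean_absolute_error(series, predict, offset):
    """Average |prediction - observation| over all valid start times."""
    errors = [abs(predict(t, offset) - series[t + offset])
              for t in range(len(series) - offset)]
    return sum(errors) / len(errors)


# Made-up data: one observed day and the average cycle for that weekday.
cycle = [0, 1, 2, 3, 4, 5, 4, 3, 2, 1]
observed = [0, 1, 3, 3, 4, 6, 4, 3, 1, 1]

for offset in (1, 2, 5):  # 2, 4 and 10 minutes at 2-minute sampling
    base = mean_absolute_error(
        observed, lambda t, k: baseline_predict(observed, t, k), offset)
    grad = mean_absolute_error(
        observed, lambda t, k: gradient_predict(observed, cycle, t, k), offset)
    print(f"offset={offset}: baseline={base:.2f}, gradient={grad:.2f}")
```

Passing the predictor as a function keeps the error computation identical for both models, so any difference in the reported error comes from the predictor alone.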

Unlike the two studies mentioned above, Borgnat et al. (2009) propose a statistical model that describes the daily and weekly patterns of the Vélo'v bikesharing system in Lyon in terms of cyclostationarity, together with possible non-stationary evolutions over larger time scales. This work is further developed in Borgnat et al. (2011). In addition, a linear regression model is combined with this statistical model to predict the number of bicycles hired hourly.

Weekly temporal patterns are identified first in order to study non-stationary patterns on time scales larger than a day, and the cyclic mean of the number of bicycle rentals over the week is estimated as the periodic average. The prediction of the hourly number of bicycles hired is divided into two parts⁹: first, the prediction of the non-stationary amplitude A_d(d) for a given day d; and second, the prediction of the fluctuations F(t) at a specific hour t. Weather factors, namely the average temperature over the day and the volume of rain during the day, are considered; the number of registered users, the number of bicycles available, and dummy variables for holidays and for specific days or strikes are also taken into account.
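The periodic average used to estimate the cyclic mean can be illustrated with a short Python sketch. It assumes an hourly series of rental counts and a weekly period of 168 hours; the function name and the toy data are hypothetical and not taken from Borgnat et al. (2009).

```python
HOURS_PER_WEEK = 7 * 24


def cyclic_mean(hourly_rentals, period=HOURS_PER_WEEK):
    """Average all samples that share the same position within the period."""
    sums = [0.0] * period
    counts = [0] * period
    for t, rentals in enumerate(hourly_rentals):
        sums[t % period] += rentals
        counts[t % period] += 1
    # Positions never observed (series shorter than one period) become None.
    return [s / c if c else None for s, c in zip(sums, counts)]


# Toy example with a period of 3 instead of 168 to keep it readable.
pattern = cyclic_mean([1, 2, 3, 3, 4, 5], period=3)
```

Here `pattern` is `[2.0, 3.0, 4.0]`: each entry averages the two samples that fall at the same position in the period.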

The hourly fluctuations are modelled by an auto-regressive process of order 1 with exogenous input. It should be noted that the number of bicycles hired is predicted for the whole system rather than for each station. The study evaluates the prediction only in terms of the standard deviation of the error, namely 120 bikes per hour; the difference between the estimated and actual numbers of bicycles hired per hour is not examined. As a result, it is hard to evaluate the performance of this model, which can only be used to understand the temporal pattern globally.
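As an illustration of an auto-regressive process of order 1 with exogenous input, the sketch below fits f[t] = a * f[t-1] + b * x[t] by ordinary least squares. It is a generic Python example under stated assumptions, not the estimation procedure of Borgnat et al. (2009): a single exogenous series x, no intercept term, and synthetic data are all assumptions, and the function names are hypothetical.

```python
def fit_arx1(f, x):
    """Least-squares fit of f[t] = a * f[t-1] + b * x[t] + noise."""
    if len(f) != len(x) or len(f) < 3:
        raise ValueError("need two equally long series with at least 3 samples")
    prev, curr, exo = f[:-1], f[1:], x[1:]
    # Entries of the 2x2 normal equations.
    s_pp = sum(p * p for p in prev)
    s_px = sum(p * e for p, e in zip(prev, exo))
    s_xx = sum(e * e for e in exo)
    s_pc = sum(p * c for p, c in zip(prev, curr))
    s_xc = sum(e * c for e, c in zip(exo, curr))
    det = s_pp * s_xx - s_px * s_px
    if det == 0:
        raise ValueError("regressors are collinear; coefficients not identifiable")
    a = (s_pc * s_xx - s_xc * s_px) / det
    b = (s_xc * s_pp - s_pc * s_px) / det
    return a, b


def predict_next(a, b, f_last, x_next):
    """One-step-ahead prediction of the fluctuation."""
    return a * f_last + b * x_next


# Synthetic, noise-free data generated with a = 0.5 and b = 2.0.
x = [1.0, 0.0, 2.0, 1.0, 3.0, 0.0, 1.0, 2.0]
f = [1.0]
for t in range(1, len(x)):
    f.append(0.5 * f[-1] + 2.0 * x[t])

a, b = fit_arx1(f, x)  # recovers the generating coefficients up to rounding
```

Because the toy data contain no noise, the fit returns the generating coefficients; with real rental data the residual spread would play the role of the error standard deviation discussed above.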

9 The estimated number of bicycles hired hourly is represented as L(t) = L_mod(t) + F(t) = A_d(d) <L(t)>_c A_mmm(d7) + F(t)

