
CHAPTER 4. Knowledge-based Travel Time Prediction

4.2. Phase I: Traffic Information Generation…

Phase I is traffic information generation, which covers the collection, preprocessing, and transformation of LBS data. The process first creates the journey table and traffic information spot (TIS) table schemas, which are derived from LBS, and then cleans useless raw data to make the probe data more robust (accurate). Finally, categorized traffic patterns in the spatial and temporal dimensions are defined for storing pattern results in the historical traffic database.

4.2.1 Table Schema Derived from LBS

After the raw data are collected from the LBS server, a three-step data transformation process is applied to generate the traffic information, as shown in Figure 5. The first step is journey selection, which filters out meaningless raw data and generates the journeys of each taxi. A taxi journey falls into one of two cases: a tour from the dispatched state to the occupied state, or a tour from the occupied state to the empty state. The former means that the driver has been dispatched to serve a customer and the taxi travels from its current location to the customer’s location. The latter is the journey from the point where the customer gets on the taxi to the customer’s destination. After journey selection, we can build the OD (Origin and Destination) table, called the journey table, as shown in Figure 5.
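The journey selection step can be illustrated with a minimal sketch. It assumes that each report carries a status field and that one taxi's reports are already sorted by time; the status labels, the Report and Journey types, and the journey kind names ("pickup", "delivery") are hypothetical and are not taken from the LBS data format.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical status labels; the real LBS encoding is not given in the text.
DISPATCHED, OCCUPIED, EMPTY = "dispatched", "occupied", "empty"


@dataclass
class Report:
    obu_id: str   # on-board unit identifier
    ts: int       # timestamp in seconds
    x: float      # longitude (assumed)
    y: float      # latitude (assumed)
    status: str   # one of the labels above


@dataclass
class Journey:
    obu_id: str
    kind: str                         # "pickup" or "delivery" (illustrative names)
    origin: Tuple[float, float]
    destination: Tuple[float, float]
    reports: List[Report]


def select_journeys(reports: List[Report]) -> List[Journey]:
    """Split one taxi's time-ordered reports into the two journey cases."""
    journeys: List[Journey] = []
    start: Optional[int] = None       # index where the open segment began
    prev_status: Optional[str] = None
    for i, r in enumerate(reports):
        if r.status != prev_status:
            if start is not None:
                first = reports[start]
                kind = None
                if first.status == DISPATCHED and r.status == OCCUPIED:
                    kind = "pickup"       # dispatched -> occupied
                elif first.status == OCCUPIED and r.status == EMPTY:
                    kind = "delivery"     # occupied -> empty
                if kind is not None:
                    journeys.append(Journey(
                        first.obu_id, kind,
                        (first.x, first.y), (r.x, r.y),
                        reports[start:i + 1],
                    ))
            # Only dispatched or occupied reports can open a new segment.
            start = i if r.status in (DISPATCHED, OCCUPIED) else None
        prev_status = r.status
    return journeys
```

Any other transition (for example, dispatched directly to empty) produces no journey in this sketch, which corresponds to filtering out meaningless raw data.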

The journey table stores the touring records, where each tour contains one OD pair and a sequence of URPs from the same OBU. These sequences are stored in the TIS table, as shown on the right side of Figure 5. A URP is defined as URPi = (TSi, (Xi, Yi), Si, Di, Ai), where TSi is the timestamp of the URP, (Xi, Yi) are the coordinates of the vehicle, Si and Di are the speed and direction of the vehicle, and Ai means

The second step combines the road network data with the GIS engine to transform the locations of the URPs into street addresses, which helps map the traveling status of the vehicle to the traffic status of the corresponding road section. Finally, the third step summarizes all URPs in the journey table that fall in the same temporal section to obtain the average speed of each road section.
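The third step can be sketched as a simple aggregation. The sketch assumes that each URP has already been matched to a road section identifier by the GIS step and that a temporal section is a fixed-length slot of 30 minutes; the function name, the tuple layout, and the slot length are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def average_speed_by_section(
    samples: Iterable[Tuple[int, str, float]],
    slot_seconds: int = 1800,  # assumed 30-minute temporal section
) -> Dict[Tuple[str, int], float]:
    """Average the speed samples per (road section, time slot).

    Each sample is (timestamp_seconds, road_section_id, speed_kmh).
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [speed sum, sample count]
    for ts, section, speed in samples:
        slot = ts // slot_seconds
        acc = totals[(section, slot)]
        acc[0] += speed
        acc[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}
```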

As discussed above, the two generated tables are analyzed with data mining techniques to produce the historical traffic patterns and rules; the details are introduced in the following sections. In addition, raw data are collected from LBS in real time during Phase I, and each record collected from the LBS application represents the location, speed, direction, and status of a vehicle at the point where the OBU reports to the backend. Hence, we call this real-time location information a traffic information spot (TIS) and create a third table schema for real-time traffic, called the real-time traffic spot table. The traffic information generation process runs at the same time, and the GIS engine [21] converts the coordinates of the vehicle location into an address. The speed of the vehicle can be regarded as one sample of the speed on its link. The generated traffic information conveys the real-time road network status to the expert system and serves as the inference data source for real-time TTP.

Figure 5. Journey and TIS Table

4.2.2 Data Cleaning

After creating the journey and TIS tables, we find noisy raw data that are useless, e.g., invalid values of GPS position, speed, and direction, and that need to be removed from the sequences in the TIS table.

Missing values: For some links, the probing vehicles do not record traffic status information. The problem can be caused by GPRS communication or GPS errors. GPS errors might occur when a probing vehicle passes through infrastructure such as a tunnel or travels in the vicinity of elevated structures (the so-called urban canyon). GPRS communication failures might have the same cause, or missing values may result from other unknown events.

Useless data: In some URPs, the speed of the probing vehicle is 0 at the same position for a long time. This happens when probing vehicles are stopped at a taxi stand waiting for customers, because the LBS-based probing vehicles belong to a commercial taxi fleet and exhibit “taxi behaviors” in their operation.

Redundant: Some URP reports carry the same message from the same vehicle. This happens when several events occur in immediate succession, such as a periodic report event right after a cross-boundary event. These reports are therefore counted twice and need to be pruned.
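The pruning of invalid, redundant, and long-stationary reports can be sketched as follows. The Urp field names, the validity ranges, and the 300-second stop threshold are assumptions made for illustration; the thesis does not state them. Missing values are not handled here, since they are absent records rather than records to remove.

```python
from typing import List, NamedTuple


class Urp(NamedTuple):
    obu_id: str
    ts: int           # timestamp in seconds
    lon: float
    lat: float
    speed: float      # km/hr
    direction: float  # degrees


def is_valid(r: Urp) -> bool:
    """Reject reports with out-of-range position, speed, or direction."""
    return (-180.0 <= r.lon <= 180.0 and -90.0 <= r.lat <= 90.0
            and r.speed >= 0.0 and 0.0 <= r.direction <= 360.0)


def clean(reports: List[Urp], max_stop_s: int = 300) -> List[Urp]:
    """Drop invalid reports, exact duplicates, and long stationary runs."""
    seen = set()
    kept: List[Urp] = []
    for r in sorted(reports, key=lambda r: (r.obu_id, r.ts)):
        if not is_valid(r) or r in seen:
            continue  # invalid value or redundant copy of an earlier report
        seen.add(r)
        kept.append(r)

    result: List[Urp] = []
    i = 0
    while i < len(kept):
        j = i
        # Extend the run while the same vehicle stays at speed 0 in one place.
        while (j + 1 < len(kept)
               and kept[i].speed == 0 and kept[j + 1].speed == 0
               and kept[j + 1].obu_id == kept[i].obu_id
               and (kept[j + 1].lon, kept[j + 1].lat) == (kept[i].lon, kept[i].lat)):
            j += 1
        if kept[j].ts - kept[i].ts <= max_stop_s:
            result.extend(kept[i:j + 1])  # short stop or moving report: keep
        i = j + 1
    return result
```

The threshold trades off two errors: a small value discards genuine congestion stops, while a large value keeps taxi-stand waiting time in the speed samples.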

4.2.3 Spatiotemporal Traffic Patterns Classification

In this section, we define the spatial and temporal dimensions used to present our traffic rules and patterns. Classifying the traffic patterns benefits our TTP expert system, but it also has drawbacks in some situations. The benefit is reduced computation time on the classified historical database, because only similar segments of the historical database are searched [2]; if the search time window is too large, the real-time online TTP system stalls. For example, holiday traffic patterns may differ from those on other days of the week. Therefore, the travel time on a Sunday can be predicted by searching only the historical traffic patterns of the Sundays in one year. As a result, the search covers only about 1/7 of the 365 days (7 days a week), and the prediction time is reduced accordingly.

The drawback of classifying the traffic patterns is the pattern matching problem: the fluctuation of travel time is affected by many interfering factors, such as incidents, weather, and driver behavior, so it is doubtful that historical traffic patterns alone can predict real-time travel time. We give two flexible solutions to this problem. Solution one uses the dynamic variables α and β in computing the real-time and historical TTP. Solution two uses the designed TTP system to mine historical traffic patterns (holiday, working day, etc.) that are free of noise factors (rain, accidents, etc.). When traffic events occur on the links, the system triggers meta-rules that dynamically adjust α and β to achieve a more accurate TTP.
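Both ideas can be illustrated with a short sketch. The first function restricts the historical search to one day of the week. The other two show one possible reading of solution one, a weighted sum α·(real-time) + β·(historical) with α + β = 1, where an event on the link raises α. The weighted-sum form, the constraint α + β = 1, the default weights, and all function names are assumptions; the text above does not give the actual formula or the meta-rules.

```python
from datetime import date, timedelta
from typing import List, Tuple


def days_with_weekday(year: int, weekday: int) -> List[date]:
    """Return all dates in `year` on the given weekday (Monday=0 ... Sunday=6)."""
    day = date(year, 1, 1)
    matches = []
    while day.year == year:
        if day.weekday() == weekday:
            matches.append(day)
        day += timedelta(days=1)
    return matches


def weights_for(event_on_link: bool,
                base_alpha: float = 0.5,
                event_alpha: float = 0.8) -> Tuple[float, float]:
    """Hypothetical meta-rule: trust real-time data more when an event occurs."""
    alpha = event_alpha if event_on_link else base_alpha
    return alpha, 1.0 - alpha


def blend_ttp(realtime_s: float, historical_s: float,
              alpha: float, beta: float) -> float:
    """Assumed combination: alpha * real-time + beta * historical travel time."""
    if abs(alpha + beta - 1.0) > 1e-9:
        raise ValueError("alpha and beta are assumed to sum to 1")
    return alpha * realtime_s + beta * historical_s


if __name__ == "__main__":
    sundays = days_with_weekday(2023, 6)
    print(len(sundays), "of 365 days need to be searched")
    alpha, beta = weights_for(event_on_link=True)
    print(blend_ttp(600.0, 400.0, alpha, beta))
```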

In the temporal dimension, the classification is naturally grouped into “Year”, “Season”, “Month”, “Date”, “Hour”, and “Half an hour”; in the spatial dimension, it is defined as “City”, “Zone”, and “Road section”. Because the speed limit in the Taipei Metropolitan Area is 40 km/hr, we classify traffic status into 9 fuzzy traffic levels. An average speed of the collected records between 0 and 5 km/hr is defined as level 1, 6-10 km/hr as level 2, and so on, as shown in Table 1. We also define characteristics for traffic status: “Congestion (B)” means the average speed falls below 25 km/hr, and “Extreme Congestion (A)” means it falls below 15 km/hr. “Normal (C)” means the average speed falls between 26 and 35 km/hr, and “Free Flow (D)” means it is above 36 km/hr.

Table 1. Classification of Traffic Levels

Traffic Level            1     2     3      4      5      6      7      8      9
Average Speed (km/hr)    0-5   6-10  11-15  16-20  21-25  26-30  31-35  36-40  >41
Characteristic           A     A     A      B      B      C      C      D      D
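The mapping in Table 1 can be sketched as a pair of lookup functions. The level bins follow the table; the characteristic letters are derived from the speed thresholds stated in the text (A up to 15 km/hr, B up to 25 km/hr, C from 26 to 35 km/hr, D from 36 km/hr). Treating fractional speeds as belonging to the next bin up (for example, 5.5 km/hr as level 2) is an assumption, since the table lists integer ranges only.

```python
import math


def traffic_level(avg_speed_kmh: float) -> int:
    """Map an average speed to traffic level 1-9 using 5 km/hr bins."""
    if avg_speed_kmh < 0:
        raise ValueError("speed must be non-negative")
    # 0-5 -> 1, 6-10 -> 2, ..., 36-40 -> 8, anything faster -> 9
    return min(9, max(1, math.ceil(avg_speed_kmh / 5)))


def traffic_characteristic(level: int) -> str:
    """Map a traffic level to A (extreme congestion) ... D (free flow)."""
    if not 1 <= level <= 9:
        raise ValueError("level must be between 1 and 9")
    if level <= 3:
        return "A"  # Extreme Congestion
    if level <= 5:
        return "B"  # Congestion
    if level <= 7:
        return "C"  # Normal
    return "D"      # Free Flow
```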

The above spatial and temporal granularities are used to formulate traffic rules. In this thesis, we choose “month” as the temporal granularity, with two types of days (workday and holiday), for calculating the interestingness values (e.g., support and confidence). For the spatial granularity, “road section” is used within the target area, which covers the arterial roads of the Taipei urban area, as shown in Figure 6.
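For reference, support and confidence can be computed with their standard association-rule definitions, as in the sketch below. It assumes that each observation (one time slot within a given month and day type) is represented as a set of (road section, characteristic) items; this representation, the example rule form, and the function name are illustrative assumptions, not the rule format used in the thesis.

```python
from typing import FrozenSet, Iterable, List, Tuple

Item = Tuple[str, str]  # (road section id, traffic characteristic), assumed form


def support_confidence(
    transactions: List[FrozenSet[Item]],
    antecedent: Iterable[Item],
    consequent: Iterable[Item],
) -> Tuple[float, float]:
    """Return (support, confidence) of the rule antecedent => consequent."""
    lhs = frozenset(antecedent)
    both = lhs | frozenset(consequent)
    n = len(transactions)
    count_lhs = sum(1 for t in transactions if lhs <= t)
    count_both = sum(1 for t in transactions if both <= t)
    support = count_both / n if n else 0.0
    confidence = count_both / count_lhs if count_lhs else 0.0
    return support, confidence
```

With hypothetical sections S1 and S2, the rule "S1 is A implies S2 is B" would be evaluated as support_confidence(observations, [("S1", "A")], [("S2", "B")]).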

Figure 6. Road Network in Taipei Urban Area

From the document: A Knowledge-Based Real-Time Travel Time Prediction System Based on Mobile Location-Based Services (pages 28-33)