CHAPTER 4 EMPIRICAL STUDY
4.2 PATTERNS OF TAIWAN SINGLE AUTO-VEHICLE ACCIDENTS
4.2.1 Data
Taiwan's 2003 single auto-vehicle (SAV) accident data are used to demonstrate the feasibility and usefulness of rough sets theory and the proposed framework in accident chain analysis. Single auto-vehicle accidents are those in which only one vehicle is involved. Because no other vehicles or pedestrians are involved, the problem can be defined more precisely, whereas exploring the patterns of multi-vehicle accidents would require far more information. Studying SAV accidents is therefore a good starting point.
The total number of valid SAV accidents was 2,316. A further 20 cases (0.86% of all cases) were invalid, mainly because the attribute values describing the driver’s characteristics were unknown; given their small number, they were excluded from the analysis. The collected attributes and their corresponding categories are summarized in Table 4-1. Accident type is chosen as the decision attribute, and the other attributes serve as condition attributes. The accident type categories used here differ slightly from those in the original data provided by the National Police Agency. Rollover crashes, off-road crashes, crashes with structures, crashes with work zones and other crashes were adopted directly from the original database. In addition, crashes with road facilities include crashes with guardrails, traffic signals, toll collection booths, median islands, trees and utility poles, and crashes with non-fixed objects include collisions with animals as well as with other non-fixed objects.
ROSE2 (Rough Sets Data Explorer), a popular rough sets software package, was used in this study. It embeds the LEM2 algorithm (Grzymala-Busse, 1992; Grzymala-Busse and Werbrouck, 1998), which generates a minimal rule set covering all objects. The results of the rough sets analysis consist of five parts: rule generation, quality of approximation, rule validation, rule description and significance of condition attributes.
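To make the rule-generation step more concrete, the following is a simplified Python re-implementation of the LEM2 idea, written for illustration only; it is not the code embedded in ROSE2, and its tie-breaking may differ. It assumes that each case is a dictionary of categorical condition-attribute values and that the target concept passed in is already a lower approximation (a union of indiscernibility classes), so that a consistent covering exists. The function names and the toy attributes are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Case = Dict[str, str]   # condition attribute -> categorical value
Pair = Tuple[str, str]  # (attribute, value)


def block(cases: List[Case], pairs: Iterable[Pair]) -> Set[int]:
    """Indices of the cases that satisfy every attribute-value pair."""
    pairs = list(pairs)
    return {i for i, c in enumerate(cases) if all(c[a] == v for a, v in pairs)}


def relevant_pairs(cases: List[Case], goal: Set[int]) -> Set[Pair]:
    """All attribute-value pairs occurring in at least one goal case."""
    return {(a, cases[i][a]) for i in goal for a in cases[i]}


def lem2(cases: List[Case], concept: Set[int]) -> List[FrozenSet[Pair]]:
    """Simplified LEM2: a local covering of `concept` (assumed to be a lower
    approximation). Each returned rule is a frozenset of conditions."""
    covering: List[FrozenSet[Pair]] = []
    goal = set(concept)
    while goal:
        rule: Set[Pair] = set()
        g = set(goal)
        candidates = relevant_pairs(cases, g)
        # Grow the rule until every case it matches belongs to the concept.
        while not rule or not block(cases, rule) <= concept:
            # Prefer the pair covering the most goal cases; break ties by the
            # smallest block, then by sorted order for reproducibility.
            best = max(sorted(candidates),
                       key=lambda t: (len(block(cases, [t]) & g), -len(block(cases, [t]))))
            rule.add(best)
            g &= block(cases, [best])
            candidates = relevant_pairs(cases, g) - rule
        # Drop conditions that are not needed to stay inside the concept.
        for t in sorted(rule):
            if len(rule) > 1 and block(cases, rule - {t}) <= concept:
                rule.remove(t)
        covering.append(frozenset(rule))
        covered = set().union(*(block(cases, r) for r in covering))
        goal = concept - covered
    # Drop whole rules whose cases are already covered by the remaining rules.
    for r in list(covering):
        rest = [s for s in covering if s != r]
        if rest and set().union(*(block(cases, s) for s in rest)) >= concept:
            covering.remove(r)
    return covering


if __name__ == "__main__":
    # Hypothetical toy data: cases 0-2 share one accident type, case 3 another.
    toy = [
        {"surface": "wet", "drink": "no"},
        {"surface": "wet", "drink": "yes"},
        {"surface": "dry", "drink": "yes"},
        {"surface": "dry", "drink": "no"},
    ]
    for r in lem2(toy, {0, 1, 2}):
        print(sorted(r))
```

In practice, `lem2` would be called once per accident type on that type's lower approximation, and each returned condition set becomes a rule "if conditions then accident type".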
TABLE 4-1 Attributes and Categories
Occupation: Student, Working people, No job, Other, Unknown
Trip characteristics (condition attributes)
Trip purpose: Work, School, Social, Shop, Sightseeing, Business, Other, Unknown
Trip time: Morning peak (07:00-09:00 h), Day off-peak (09:00-16:00 h), Afternoon peak (16:00-19:00 h), Night off-peak (19:00-23:00 h), Midnight to daybreak (23:00-07:00 h)
Behavior and environment factors (condition attributes)
Surface defect: Normal, Other (e.g. holes, soft surface)
Surface condition: Dry, Wet, Other
Obstruction: Yes, No (within 15 meters)
Sight distance: Good, Bad (based on road design speed)
Signal type: Regular, Flash, No signal
a Sample size of the accident type.
4.2.2 Rule Generation
As shown in Table 4-2, the number of rules generated increases as more condition attributes are specified. Since all condition attributes are categorical, adding a condition attribute with n categories multiplies the number of possible attribute-value combinations by n. However, while the quality of approximation improves substantially, the number of rules grows only modestly rather than proportionally. This implies that the selected condition attributes are adequate for classifying the accident types, and that SAV accidents in Taiwan follow some recurring patterns rather than each accident being unique.
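The multiplicative growth described above can be checked with a few lines of Python. This is a minimal sketch: the counts for occupation, trip purpose and trip time follow Table 4-1, gender is taken to have two categories as stated in Section 4.2.6, and the count for age is a hypothetical value chosen for illustration.

```python
from math import prod
from typing import Dict

# Occupation, trip purpose and trip time counts follow Table 4-1;
# gender has two categories; the age count (4) is a hypothetical value.
DRIVER: Dict[str, int] = {"gender": 2, "age": 4, "occupation": 5}
TRIP: Dict[str, int] = {"trip_purpose": 8, "trip_time": 5}


def n_combinations(*groups: Dict[str, int]) -> int:
    """Number of distinct attribute-value combinations: the product of category counts."""
    return prod(n for group in groups for n in group.values())


if __name__ == "__main__":
    print(n_combinations(DRIVER))        # 2 * 4 * 5
    print(n_combinations(DRIVER, TRIP))  # the value above * 8 * 5
```

Adding the two trip attributes multiplies the combination space by 8 × 5 = 40, which is why modest growth in the number of generated rules indicates recurring patterns rather than one rule per accident.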
TABLE 4-2 Rough Sets Results
Approach | Accident type | Generated rules | Accuracy | Quality of classification | Hit rate | Overall hit rate
a D: Driver characteristics; T: Trip characteristics; B: Behavior and environment factors; A: Accidents
4.2.3 Quality of Approximation
The accuracy of approximation for rollover and bump-into-non-fixed object accidents is extremely low unless all condition attributes are included. However, the accuracy of approximation for bump-into-bridge, off-road, and other accident types rises to 30-40% if behavior and environment (B&E) factors are combined with either driver characteristics or trip characteristics, and to 70% or even 80% if all condition attributes are included. Roughly speaking, bump-into-facility and work zone accidents are the most definable accident types; bump-into-bridge, off-road, and other accidents are moderately definable; and rollover and bump-into-non-fixed object accidents are the least definable.
The quality of classification increases with the completeness of the selected attributes. Approach 7 shows the highest quality, while Approach 2 shows the lowest. B&E factors are the most important attributes for the quality of classification, partly because they cover a wide range of influencing factors, which are also proximal factors. No single dimension alone (Approaches 1, 2 and 3) yields a good quality of classification, but combining at least two dimensions improves it considerably. For example, the quality of classification for B&E factors alone is 38.69%, but it rises to 70.68% when they are combined with trip characteristics, which add only two more attributes.
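The two measures discussed in this section can be sketched directly from their standard rough sets definitions: the accuracy of approximation of a class is |lower| / |upper|, and the quality of classification is the share of all cases that lie in the lower approximation of their own class. The following is a minimal Python sketch under those definitions; the function names and the toy data are hypothetical, not taken from ROSE2 or the study.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Row = Dict[str, str]


def indiscernibility_classes(rows: List[Row], attrs: Sequence[str]) -> List[Set[int]]:
    """Group case indices that share identical values on `attrs`."""
    classes: Dict[Tuple[str, ...], Set[int]] = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].add(i)
    return list(classes.values())


def approximations(rows: List[Row], attrs: Sequence[str],
                   concept: Set[int]) -> Tuple[Set[int], Set[int]]:
    """Lower and upper approximations of a set of case indices."""
    lower: Set[int] = set()
    upper: Set[int] = set()
    for cls in indiscernibility_classes(rows, attrs):
        if cls <= concept:  # class lies entirely inside the concept
            lower |= cls
        if cls & concept:   # class overlaps the concept
            upper |= cls
    return lower, upper


def accuracy(rows: List[Row], attrs: Sequence[str], concept: Set[int]) -> float:
    """Accuracy of approximation |lower| / |upper| for a non-empty concept."""
    lower, upper = approximations(rows, attrs, concept)
    return len(lower) / len(upper)


def quality_of_classification(rows: List[Row], attrs: Sequence[str], decision: str) -> float:
    """Share of all cases lying in the lower approximation of their own class."""
    total = 0
    for label in {r[decision] for r in rows}:
        concept = {i for i, r in enumerate(rows) if r[decision] == label}
        total += len(approximations(rows, attrs, concept)[0])
    return total / len(rows)
```

On a toy table where surface condition alone cannot separate any accident type, adding a second attribute (for example, drinking) lifts some cases into lower approximations, mirroring the rise in quality when dimensions are combined.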
These results suggest that accidents should be addressed not through a single factor but through a chain of factors. Previous countermeasures have focused mostly on proximal B&E factors. Although this has been effective, further improvements in road safety may require serious consideration of all the factors in the chain. Moreover, neglecting some factors in a chain may lead to quite different interpretations and obscure the interactions among accident features.
4.2.4 Rule Validation
The 10-fold cross-validation technique is used to validate the classification results. The hit rate, i.e. the percentage of correct predictions, for bump-into-facility accidents reaches up to 70 percent when all condition attributes are considered. By contrast, the hit rates for the remaining accident types range only from 0 to about 20 or 30 percent. This suggests that bump-into-facility accidents may follow similar paths and are therefore more predictable. For the other accident types, the rules generated from the training cases may not be representative, since their occurrence is largely random.
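A minimal sketch of k-fold cross-validation and the hit rates described above follows. Because the rule-based classifier inside ROSE2 is not reproduced here, the example swaps in a hypothetical lookup classifier that predicts the most frequent accident type among training cases with identical condition values, falling back to the overall majority type. Folds are formed by random, unstratified splitting; the function names and the seed are illustrative assumptions.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, str]


def lookup_classifier(train: List[Row], attrs: Sequence[str],
                      decision: str) -> Callable[[Row], str]:
    """Hypothetical stand-in classifier (not ROSE2's rule matcher)."""
    votes: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)
    for r in train:
        votes[tuple(r[a] for a in attrs)][r[decision]] += 1
    fallback = Counter(r[decision] for r in train).most_common(1)[0][0]

    def predict(row: Row) -> str:
        counts = votes.get(tuple(row[a] for a in attrs))
        return counts.most_common(1)[0][0] if counts else fallback

    return predict


def cross_validated_hit_rates(rows: List[Row], attrs: Sequence[str], decision: str,
                              k: int = 10, seed: int = 0) -> Tuple[Dict[str, float], float]:
    """Per-type hit rates and the overall hit rate from k-fold cross-validation."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    correct: Counter = Counter()
    actual: Counter = Counter()
    for fold in folds:
        held_out = set(fold)
        train = [rows[i] for i in idx if i not in held_out]
        predict = lookup_classifier(train, attrs, decision)
        for i in fold:
            label = rows[i][decision]
            actual[label] += 1
            if predict(rows[i]) == label:
                correct[label] += 1
    per_type = {label: correct[label] / actual[label] for label in actual}
    overall = sum(correct.values()) / len(rows)
    return per_type, overall
```

The per-type hit rate divides correct predictions by the number of held-out cases of that type, so a small type can score zero even when the overall hit rate is high.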
The higher the quality of approximation, the higher both the overall hit rate and the hit rate for bump-into-facility accidents. However, bump-into-bridge and bump-into-non-fixed object accidents show their highest hit rates in Approach 3, which consists of proximal B&E factors only; this reflects the unexpected and random character of these accidents. Their hit rates decrease when other condition attributes are included. These results suggest that, except for bump-into-facility accidents, for which more information is useful, each accident type has its own set of useful condition attributes. For example, driver characteristics are useful for bump-into-work zone and other accidents, while trip characteristics are useful for rollover accidents.
All these results are helpful for devising adequate countermeasures.
The classification results show that most bump-into-bridge, bump-into-facility, off-road and rollover accidents are assigned to the bump-into-facility type, and the fewest are assigned to the bump-into-non-fixed object and bump-into-work zone types. This suggests that, while most accidents are associated with some critical condition attributes that lead to a similar classification pattern, bump-into-non-fixed object and bump-into-work zone accidents have very distinctive characteristics. It also implies that the occurrence of bump-into-bridge, bump-into-facility, off-road and rollover accidents may share some similarities, since they are all related to road geometry and the driving environment. These similarities explain the low hit rates for bump-into-bridge and off-road accidents: because the bump-into-facility type has a much larger sample size, these cases are easily assigned to it. Consequently, more rules associated with bump-into-facility accidents are generated, and these rules dominate the classification pattern. On the other hand, the remaining accident types, such as bump-into-non-fixed object accidents, are more closely related to driver characteristics and are relatively unique.
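The assignment pattern described above is what a confusion matrix records: for each true accident type, the share of its cases assigned to each predicted type, with the diagonal share equal to that type's hit rate. A minimal Python sketch follows; the labels in the test are hypothetical predictions, not the study's results.

```python
from collections import Counter
from typing import Dict, Sequence


def confusion_counts(actual: Sequence[str], predicted: Sequence[str]) -> Counter:
    """Count (true type, predicted type) pairs."""
    return Counter(zip(actual, predicted))


def assignment_shares(actual: Sequence[str], predicted: Sequence[str],
                      label: str) -> Dict[str, float]:
    """For cases whose true type is `label`, the share assigned to each predicted type."""
    row = {p: n for (a, p), n in confusion_counts(actual, predicted).items() if a == label}
    total = sum(row.values())
    return {p: n / total for p, n in row.items()}
```

A large off-diagonal share toward one dominant type (here, bump-into-facility) is the signature of the sample-size imbalance discussed in the text.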
4.2.5 Description of Significant Rules
Rules are generated from the accident database using rough sets theory, and the significant rules for each accident type are shown in Table 4-3. The rule strength (the number of accident cases matching the rule) is small for most accident types except bump-into-facility; for most types the highest strength is only about 3 or 4. This shows the uniqueness of these accident types, especially the infrequent and stochastic occurrence of bump-into-non-fixed object accidents. Interestingly, the derived factor chain shows that a drinking driver without a regular license has a relatively high possibility of being involved in a bump-into-non-fixed object accident on a secondary road without roadside marking or lighting.
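Rule strength, as defined above, can be computed by counting matching cases. This minimal sketch assumes a case supports a rule when it satisfies every condition and has the rule's accident type (for certain rules derived from lower approximations, every case matching the conditions also has that type). The function name and the toy records are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, str]


def rule_strength(rows: List[Row], conditions: Dict[str, str],
                  decision_attr: str, decision_value: str) -> int:
    """Number of cases that satisfy every condition and have the rule's decision value."""
    return sum(
        1
        for r in rows
        if r[decision_attr] == decision_value
        and all(r[a] == v for a, v in conditions.items())
    )
```

For example, a rule "drinking and no light implies bump into non-fixed object" would have strength 2 on a table containing exactly two such cases.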
TABLE 4-3 Description of Significant Rules
Environment: Road segment; Median island; Wet surface; No obstruction within 15 meters;
Off-road (7)
Driver: Regular license; Student;
Environment: Speed limit 50-79; Median marking; With roadside marking; With light;
Driver: Middle-aged; Working people;
Behavior: Drinking;
Environment: Speed limit less than 50; Collision position other than intersection, segment and ramp; With roadside marking;
Behavior: Drinking; Cell phone use unknown;
Environment: Flash signal; No roadside marking; Dry surface;
Driver: Young; Working people;
Trip: Other trip purpose; Between midnight and daybreak;
Behavior: Not drinking;
Environment: Collision position other than intersection, segment and ramp; Pavement other than asphalt; No directional-divided facility; No roadside marking; No obstruction within 15 meters;
Bump into work zone (4)
Driver: Male; Regular license type; Unknown occupation;
Trip: Between midnight and daybreak;
Environment: Speed limit less than 50; Asphalt pavement; No signal; Obstruction within 15 meters;
(3)
Driver: Young; Male; Regular license type; Working people;
Trip: Day off-peak;
Environment: Speed limit less than 50; Regular signal;
Bump into non-fixed object (2)
Driver: Other license type;
Behavior: Drinking; Cell phone use unknown;
Environment: Speed limit less than 50; No roadside marking; No light;
a Please refer to Table 4-1 for the details of condition attributes.
b The value represents the rule strength.
The most significant rule for bump-into-work zone accidents suggests a relatively high risk when a driver approaches a work zone between midnight and daybreak on a road with a speed limit below 50 km/h. This suggests that more effective and adequate work zone traffic controls should be installed, particularly at dark work zones on secondary roads. The rule reflects the fact that, to save costs, safety measures are often not properly implemented, especially on rural secondary roads.
For rollover accidents, two significant rules indicate that young working people driving during off-peak periods are more likely to be involved in rollover accidents, probably because of low traffic volumes and high speeds.
Four significant rules for bump-into-bridge accidents describe two conditions: drunk driving in a normal road environment, and sober driving in an abnormal road environment. Each condition involves a specific deficiency for this accident type. This shows the need for the government to prevent this type of accident by improving the road environment or by raising the penalties for drunk driving.
The derived factor chain for off-road accidents shows that young, less experienced student drivers have a relatively high possibility of being involved in off-road accidents. This result is consistent with the graduated licensing schemes currently in place in many countries (Simpson, 2003). Moreover, the factor chain shows that the corresponding driving environment is normal, i.e. no particularly unfavorable factors such as drunk driving or poor sight distance appear in the chain. Since other driver groups, such as working people, do not show a similar pattern for off-road accidents, the government should seriously consider educating student drivers to enhance their situational awareness of the driving environment and to reduce risky driving behavior.
The rule with the highest strength belongs to bump-into-facility accidents. It covers 35 sober, employed drivers (not students) driving on an island-divided road segment with a wet surface and no obstructions within 15 meters. A wet surface means lower friction, which makes vehicles more difficult to handle, so drivers might generally be expected to slow down to keep their vehicles at an “acceptable” speed. The unusually strong support for this rule may therefore imply that these drivers overestimated their driving skills and underestimated the risk posed by reduced surface friction.
4.2.6 Significance of Condition Attributes
The significance of a condition attribute is measured by its presence in the derived rules. The more frequently a condition attribute appears in the rules, the more often it is used to describe the occurrence of accidents, and hence the more significant it is in distinguishing accident types. Presence is expressed as a presence percentage: the number of cases supporting each rule that contains the attribute is summed over those rules, and the total is divided by the total number of cases. Only the rules derived from Approach 7 are used in this calculation, since Approach 7 shows the most satisfactory performance. Moreover, since condition attributes with more categories tend to distinguish accident types more effectively, comparisons are made only among attributes with the same number of categories. As shown in Figure 4-1, among attributes with two categories, gender, roadside marking and light condition have relatively high presence percentages; among those with three categories, speed limit, road shape and directional-divided facility do; and among those with four or more categories, age, occupation, trip time and drinking condition do.
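The presence percentage described above can be sketched as follows, together with the comparison restricted to attributes with the same number of categories. This is a minimal sketch, assuming each rule is represented as a pair (set of condition attributes, rule strength); attributes that appear in no rule are simply absent from the result (0%). The function names and the toy rules are hypothetical, not the study's rule set.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def presence_percentage(rules: Iterable[Tuple[Set[str], int]],
                        total_cases: int) -> Dict[str, float]:
    """For each attribute, sum the strengths of the rules containing it and
    divide by the total number of cases, in percent."""
    weighted: Dict[str, int] = defaultdict(int)
    for attrs, strength in rules:
        for a in attrs:
            weighted[a] += strength
    return {a: 100.0 * w / total_cases for a, w in weighted.items()}


def rank_within_groups(presence: Dict[str, float],
                       n_categories: Dict[str, int]) -> Dict[int, List[str]]:
    """Rank attributes by presence within groups of 2, 3 and 4-or-more categories."""
    groups: Dict[int, List[Tuple[float, str]]] = defaultdict(list)
    for a, p in presence.items():
        groups[min(n_categories[a], 4)].append((p, a))  # key 4 means "four or more"
    return {k: [a for _, a in sorted(g, reverse=True)] for k, g in groups.items()}
```

Grouping by category count keeps an attribute with many categories, such as trip time, from being compared directly with a binary attribute such as gender.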
FIGURE 4-1 Presence percentage of condition attributes. (Bar chart; vertical axis: presence percentage, 0-50%; horizontal axis: condition attributes, including Gender, SurfDef, VisDist, Light, LiceCon, Cell, RdShape, SigType, DireDivided, LiceType, TripTime and Drink.)