The purpose of this chapter is to discuss issues related to the methodologies presented in Chapter 3 and the empirical findings reported in Chapter 4. The connection between rough sets rules and accident chains is discussed in Section 5.1. The heterogeneity of accident data is examined in Section 5.2, and the issue of aggregation bias is addressed in Section 5.3. Finally, confounding effects are discussed in Section 5.4.
5.1 Connection between Rough Sets Rules and Accident Chains
Taking advantage of rough sets theory, this research implemented the idea that an accident results from a series of errors or mishandlings. The illustrated case shows that it is feasible to apply rough sets theory to analyze the links among contributing factors and accident types. The proposed factor structure can easily be modified and extended based on an analyst’s knowledge and the accident databases at hand, and any factor structure can be tested by steps similar to those proposed in this research. In addition, a large number of condition attributes were included without any prior judgment, except that they were grouped according to the temporal and logical sequence of accident occurrence. A condition attribute was dropped only when its removal had no impact on defining accident types. In our empirical study, only one redundant condition attribute (pavement material) was found when all the attributes were included. This procedure differs from conventional statistical approaches, in which non-significant attributes are usually dropped immediately and are sometimes claimed to have no impact on the occurrence of an accident.
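The redundancy test described above can be illustrated with a minimal Python sketch. It assumes, for illustration only, that a decision table is a list of dictionaries and that an attribute counts as dispensable when dropping it leaves the positive region (the objects whose elementary sets map to a single accident type) unchanged. The attribute names and toy records are hypothetical and are not drawn from the empirical data.

```python
from collections import defaultdict

def elementary_sets(rows, attrs):
    """Group object indices by their values on attrs (indiscernibility classes)."""
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].add(i)
    return list(classes.values())

def positive_region(rows, attrs, decision):
    """Objects whose elementary set contains only one decision value."""
    pos = set()
    for block in elementary_sets(rows, attrs):
        if len({rows[i][decision] for i in block}) == 1:
            pos |= block
    return pos

def is_dispensable(rows, attrs, attr, decision):
    """True if dropping attr leaves the positive region unchanged."""
    reduced = [a for a in attrs if a != attr]
    return positive_region(rows, reduced, decision) == positive_region(rows, attrs, decision)

# Hypothetical toy decision table (not the study's data).
rows = [
    {"lighting": "dark", "speed_limit": "low", "pavement": "asphalt", "type": "work_zone"},
    {"lighting": "dark", "speed_limit": "high", "pavement": "asphalt", "type": "facility"},
    {"lighting": "day", "speed_limit": "low", "pavement": "concrete", "type": "rollover"},
    {"lighting": "day", "speed_limit": "high", "pavement": "concrete", "type": "facility"},
]
conditions = ["lighting", "speed_limit", "pavement"]
for attr in conditions:
    print(attr, is_dispensable(rows, conditions, attr, "type"))
```

Here lighting and pavement are each dispensable on their own but not together, because removing both leaves speed limit alone, which cannot separate the work-zone record from the rollover record; selecting a minimal attribute subset therefore calls for a reduct search rather than this one-at-a-time check.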
Rules generated from rough sets provide rich information describing the conditions under which certain types of accidents may occur. For example, as mentioned earlier, the most significant rule for bump-into-work-zone accidents suggests a relatively high risk when a driver approaches a work zone on a road with a speed limit below 50 km/h around midnight. With modern ITS technologies (FHWA, 2006), specific warning messages could be devised and sent to drivers who match this particular scenario, so that potential accidents could be prevented. In short, the derived rules have the potential to deliver the right information to the right drivers at the right time, enabling them to act appropriately.
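As a rough illustration of how a derived rule might be turned into a warning trigger, the sketch below encodes the work-zone rule as a set of attribute tests and checks a driver’s current situation against it. The field names, the warning text, and the 23:00 to 01:59 window used for "around midnight" are hypothetical choices for this example, not part of the derived rule.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

@dataclass
class Rule:
    name: str
    conditions: Dict[str, Callable[[Any], bool]]  # attribute -> test on its value
    message: str

    def matches(self, situation: Dict[str, Any]) -> bool:
        # Every condition must be present in the situation and satisfied.
        return all(attr in situation and test(situation[attr])
                   for attr, test in self.conditions.items())

# Hypothetical encoding of the work-zone rule discussed above.
WORK_ZONE_RULE = Rule(
    name="bump-into-work-zone",
    conditions={
        "approaching_work_zone": lambda v: v is True,
        "speed_limit_kph": lambda v: v < 50,
        "hour": lambda v: v in (23, 0, 1),  # assumed reading of "around midnight"
    },
    message="Work zone ahead: slow down",
)

def warnings_for(situation: Dict[str, Any], rules: List[Rule]) -> List[str]:
    """Return the warning messages of all rules matched by the situation."""
    return [rule.message for rule in rules if rule.matches(situation)]

print(warnings_for({"approaching_work_zone": True, "speed_limit_kph": 40, "hour": 0},
                   [WORK_ZONE_RULE]))
```

A situation missing any attribute the rule tests is treated as not matching, so no warning is issued on incomplete information.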
However, hundreds of rules were ultimately generated, which makes it difficult for analysts to determine which rules or accident patterns are the most significant.
This result may stem partly from the fact that some accident types, such as bump-into-non-fixed-object accidents or rollover accidents, are highly stochastic and unique, and partly from the lack of detailed information on driver characteristics in the database, which hinders more effective recognition of accident characteristics. Although these accident types are the least definable and the least classifiable, some protective measures can still be implemented to reduce accident likelihood and severity, such as preventing animals from crossing roads or increasing the strength of vehicle roofs. On the other hand, the most definable and recognizable accident type, the bump-into-facility accident, is regarded as preventable. In addition, bump-into-bridge and off-road accidents, which show classification patterns similar to those of bump-into-facility accidents, are also expected to be preventable.
To find representative rules for the occurrence of these avoidable accident types, more advanced rough sets models, such as the hybrid approach combining rough sets with genetic programming (McKee and Lensberg, 2002), could be adopted in future research.
However, for the low-performing (unpredictable) accident types, which are closely related to driver characteristics and unpredictable environmental conditions (e.g., non-fixed objects), more relevant data need to be collected for further study. Meanwhile, rather than trying to prevent such accidents, measures that reduce their negative consequences may be more effective and are worth investigating.
The estimation results showed that the accuracy of approximation, the quality of approximation, and the hit rates could be dramatically enhanced by considering at least two sets of condition attributes, while including all condition attributes generally gave the most satisfactory quality of classification. This suggests that collecting more detailed data on a few specific aspects is more effective than aimlessly increasing the number of survey items.
Nonetheless, additional attributes are welcome; they could be collected and then examined, by testing their redundancy and their effect on the accuracy of approximation, the quality of approximation, and the hit rates, to determine whether they are worthwhile.
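To make the two rough sets measures concrete, the following Python sketch computes the lower and upper approximations of a decision class, the accuracy of approximation (size of the lower approximation divided by the size of the upper approximation), and the quality of classification (share of all objects lying in the lower approximation of some decision class). The toy records and attribute names are hypothetical, and hit rates are not covered because they depend on how the rules are applied to classify new accidents.

```python
from collections import defaultdict

def elementary_sets(rows, attrs):
    """Indiscernibility classes: object indices grouped by their values on attrs."""
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].add(i)
    return list(classes.values())

def approximations(rows, attrs, target):
    """Lower and upper approximations of the object set target."""
    lower, upper = set(), set()
    for block in elementary_sets(rows, attrs):
        if block <= target:
            lower |= block
        if block & target:
            upper |= block
    return lower, upper

def accuracy(rows, attrs, target):
    """|lower| / |upper|; defined as 1.0 here when the upper approximation is empty."""
    lower, upper = approximations(rows, attrs, target)
    return len(lower) / len(upper) if upper else 1.0

def quality(rows, attrs, decision):
    """Share of all objects in the lower approximation of some decision class."""
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        classes[row[decision]].add(i)
    covered = sum(len(approximations(rows, attrs, cls)[0]) for cls in classes.values())
    return covered / len(rows)

# Hypothetical toy records.
rows = [
    {"speed_limit": "low", "lighting": "dark", "type": "A"},
    {"speed_limit": "low", "lighting": "day", "type": "B"},
    {"speed_limit": "high", "lighting": "dark", "type": "B"},
    {"speed_limit": "high", "lighting": "dark", "type": "B"},
]
print(quality(rows, ["speed_limit"], "type"), quality(rows, ["speed_limit", "lighting"], "type"))
```

Adding lighting raises the quality of classification from 0.5 to 1.0 in this toy table, which is the kind of gain such a test of an additional attribute would look for.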
5.2 Heterogeneity of Accident Data
The heterogeneity discussed in this manuscript differs from that in past studies. It is based neither on driver characteristics (such as age or gender) nor on environmental characteristics (such as urban or rural roads). Instead, the heterogeneity in this study originates from the hypothesis that the features of frequently repeated accident processes and those of rare, unique accident processes may be essentially different. The distinct features of the accident groups uncovered in this empirical study do suggest that such heterogeneity may exist. The accidents associated with weak rules occur rather uniquely. Since they occur by chance and tend not to lead to similar consequences under similar processes and conditions, one would intuitively expect it to be relatively inefficient to devise countermeasures for them. Surprisingly, these accidents are observed to be heavily related to the road environment and could possibly be reduced by carefully providing adequate road facilities.
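One way to make the notion of strong and weak rules operational is to compute, for each rule, its support (the number of accidents matching both its conditions and its decision), its strength (support divided by the total number of accidents), and its certainty (support divided by the number of accidents matching its conditions), following the usual rough sets definitions. The Python sketch below does this; the cut-off values used to label a rule as strong, medium, or weak are hypothetical placeholders, not the thresholds used in this study.

```python
def rule_stats(rows, conditions, decision_attr, decision_value):
    """Support, strength and certainty of the rule `conditions -> decision`."""
    matched = [r for r in rows if all(r.get(a) == v for a, v in conditions.items())]
    support = sum(1 for r in matched if r[decision_attr] == decision_value)
    strength = support / len(rows)
    certainty = support / len(matched) if matched else 0.0
    return support, strength, certainty

def pattern_label(support, strong_min=10, weak_max=2):
    """Hypothetical cut-offs: support >= strong_min is strong, <= weak_max is weak."""
    if support >= strong_min:
        return "strong"
    if support <= weak_max:
        return "weak"
    return "medium"

# Hypothetical toy records.
rows = (
    [{"zone": "work", "night": True, "type": "work_zone"}] * 3
    + [{"zone": "work", "night": True, "type": "facility"},
       {"zone": "none", "night": False, "type": "facility"}]
)
support, strength, certainty = rule_stats(rows, {"zone": "work", "night": True}, "type", "work_zone")
print(support, strength, certainty, pattern_label(support))
```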
Previous countermeasures for traffic accidents have focused either on drivers who break laws, for example through drunk driving or speeding, or on road design aimed at building smooth roads. Although these measures are well known and generally effective, less attention has been paid to identifying the risky but rational drivers associated with strong-pattern accidents. More research is therefore required to identify this type of driver and to devise specific measures that prevent their accidents. Preventing accidents associated with weak patterns is as crucial as preventing those with strong patterns. However, specifically designed countermeasures are expected to be markedly more efficient for accidents with strong patterns, since accidents with weak patterns are highly diverse. Thus, when detailed information on accident heterogeneity is taken into account, countermeasures such as on-board warning messages and smart roadside safety facilities, which aim to provide the right safety information to the right drivers in the right situations, are expected to be effective against strong-pattern accidents and are worth studying.
5.3 Aggregation Bias
The issue of aggregation bias has been noted and examined in many studies (Davis, 2004; Hewson, 2005; Vlahogianni et al., 2004; Walker and Catrambone, 1993), of which Davis (2004) presented a thorough discussion using simulated data. He argued that since accident data have no independent status, their statistical regularities are simply the result of aggregating particular types and frequencies of mechanisms. The aggregation step implemented in this study could raise similar issues. Despite this difficulty, aggregation lays a concrete basis for understanding accident scenarios and for further studying those associated with strong patterns through detailed experimental designs.
Analyzing each rule rather than each accident group offers a possible way to alleviate such problems. Each rule is herein treated as an individual mechanism, since rules are derived under conditions in which many critical factors have been controlled. Examining the characteristics of each rule classified as a strong pattern shows that most rules support the findings from the crosstab analysis and multinomial logistic regression models, in which accidents with strong patterns indicate that the drivers involved are somewhat high-risk.
This suggests that the proposed approach can be effective in processing heterogeneous accident data, although the issue of aggregation bias must still be addressed.
Unfortunately, it is far more difficult to interpret individual rules of weak and medium strength, since the number of such rules runs into the hundreds. An alternative is to relax the pattern requirements somewhat after the most (and least) important attributes have been identified. This can be achieved using an index called the significance of attributes (Pawlak, 1991), which evaluates how many objects can no longer be distinguished by the elementary sets when one condition attribute is dropped from the model. In doing so, the number of rules is expected to decrease; at the same time, however, the rules will describe the process of accident occurrence less thoroughly. The overwhelming number of rules derived from rough sets theory has also been noted by other researchers (Løken and Komorowski, 2001) and requires further study.
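The significance of an attribute can be sketched as the relative drop in the quality of classification when that attribute is removed, that is, σ(a) = (γ_C(D) - γ_{C-{a}}(D)) / γ_C(D), where γ_C(D) is the share of objects classified consistently by the condition attributes C. The Python sketch below assumes this normalized form and a decision table stored as a list of dictionaries; the attribute names and records are hypothetical.

```python
from collections import defaultdict

def quality(rows, attrs, decision):
    """Quality of classification: share of objects whose elementary set is consistent."""
    blocks = defaultdict(list)
    for row in rows:
        blocks[tuple(row[a] for a in attrs)].append(row[decision])
    consistent = sum(len(ds) for ds in blocks.values() if len(set(ds)) == 1)
    return consistent / len(rows)

def significance(rows, attrs, attr, decision):
    """Relative drop in quality of classification when attr is removed."""
    full = quality(rows, attrs, decision)
    reduced = quality(rows, [a for a in attrs if a != attr], decision)
    return (full - reduced) / full if full else 0.0

# Hypothetical toy records.
rows = [
    {"speed_limit": "low", "lighting": "dark", "type": "A"},
    {"speed_limit": "low", "lighting": "day", "type": "B"},
    {"speed_limit": "high", "lighting": "dark", "type": "B"},
    {"speed_limit": "high", "lighting": "dark", "type": "B"},
]
attrs = ["speed_limit", "lighting"]
for a in attrs:
    print(a, significance(rows, attrs, a, "type"))
```

Attributes with low significance are the natural candidates to drop when relaxing the pattern requirements, at the cost of less detailed rules.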
5.4 Confounding Effects in Causality Analysis
Identifying causal factors for safety in observational studies, especially cross-sectional studies, remains an unresolved issue (Hauer, 2006). The main difficulty lies in the numerous confounding effects that arise when making comparisons. Consequently, if most of these confounding factors are not well controlled, the analysis results will be biased.
As an attempt to resolve this issue, this research identified possible causal factors by comparing the differences between entire accident patterns instead of estimating the marginal effect of each attribute. Based on the rough sets analysis, the accident data were separated into two subsets: one contained the accidents that could be fully described by the available information and thus represented the possible existence of causality; the other contained the remaining accidents. The rules derived from the rough sets analysis were then compared with each other: for each rule, the most similar rules were identified and the differences between them were examined.
This allowed as many confounding factors as possible to be controlled and partially revealed the differences between what happened and what would have happened had the circumstances in question been different.
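The rule comparison described above can be sketched in Python as a nearest-neighbour search over rule conditions: for each rule, the sketch finds the other rules that differ in the fewest condition attributes and lists those attributes. Representing rules as dictionaries and counting a condition present in only one rule as a difference are assumptions made for this illustration, and the rules themselves are hypothetical.

```python
def differing_attrs(c1, c2):
    """Condition attributes whose values differ; an attribute in only one rule counts."""
    return {k for k in set(c1) | set(c2) if c1.get(k) != c2.get(k)}

def nearest_rules(rules):
    """For each rule index, the other rules with the fewest differing conditions."""
    result = {}
    for i, rule in enumerate(rules):
        diffs = [(j, differing_attrs(rule["conditions"], other["conditions"]))
                 for j, other in enumerate(rules) if j != i]
        if not diffs:
            result[i] = []
            continue
        fewest = min(len(d) for _, d in diffs)
        result[i] = [(j, sorted(d)) for j, d in diffs if len(d) == fewest]
    return result

# Hypothetical rules.
rules = [
    {"conditions": {"zone": "work", "speed": "low", "time": "night"}, "decision": "work_zone"},
    {"conditions": {"zone": "work", "speed": "low", "time": "day"}, "decision": "facility"},
    {"conditions": {"zone": "none", "speed": "high", "time": "day"}, "decision": "off_road"},
]
for i, matches in nearest_rules(rules).items():
    print(rules[i]["decision"], [(rules[j]["decision"], d) for j, d in matches])
```

In this toy set the work-zone rule and the facility rule differ only in time of day, so time of day would be the candidate factor to examine for that pair.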
Since the causal factors were found by comparing complete rules, the comprehensiveness of the available data clearly determines the extent to which confounding effects are controlled. In our empirical study, 23 attributes were considered. These attributes were presumed to have an impact on accident occurrence and were examined with rough sets theory to determine whether any of them were redundant. In general, more information is welcome in such research provided that it is relevant to the decision attribute. Moreover, there is theoretically no limit to the number of attributes that rough sets theory can adopt, as long as the computational time is tolerable. Yet including attributes with similar meanings can produce unnecessary rules and impede interpretation. For example, consider two rules that are identical except that one specifies the road type as a freeway and the other specifies a high speed limit that could only occur on freeways. There is no real-world difference between these two conditions. A careful selection of the input attributes could avoid such redundancy.
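A simple screening step for such overlapping attributes is to check whether some value of one attribute always co-occurs with a single value of another, as a high speed limit would with the freeway road type. The Python sketch below, with hypothetical attribute names and records, flags such value-level implications; the min_count parameter is an illustrative guard against values that appear only once and are therefore trivially consistent.

```python
from collections import defaultdict

def implied_values(rows, a, b, min_count=2):
    """Values v of attribute a (seen at least min_count times) that always
    co-occur with a single value of attribute b."""
    seen = defaultdict(set)
    counts = defaultdict(int)
    for row in rows:
        seen[row[a]].add(row[b])
        counts[row[a]] += 1
    return {v: next(iter(bs)) for v, bs in seen.items()
            if len(bs) == 1 and counts[v] >= min_count}

# Hypothetical toy records.
rows = [
    {"speed_limit": "high", "road_type": "freeway"},
    {"speed_limit": "high", "road_type": "freeway"},
    {"speed_limit": "low", "road_type": "urban"},
    {"speed_limit": "low", "road_type": "freeway"},
]
print(implied_values(rows, "speed_limit", "road_type"))
```

A flagged pair only indicates overlap in the data at hand; whether the two attributes truly carry the same meaning still calls for the analyst’s judgment.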