CHAPTER 3 METHODOLOGY
3.1 CHALLENGES IN ACCIDENT ANALYSIS
The definition of causality is strict. In epidemiology, for example, the Surgeon General (1964) proposed that, to judge whether smoking causes cancer, the following ad hoc rules for judging causality could be adopted: 1) Strength of association (some statistical measure of association is strong); 2) Dose-response effect (the more of the causal factor, the larger the effect); 3) No temporal ambiguity (disease follows exposure to the risk factor); 4) Consistency of findings (several studies produce similar results); 5) Biological plausibility (the hypothesis makes sense in view of what is known in biology); 6) Coherence of evidence (some combination of 4 and 5); and 7) Specificity (the causal factor causes this disease, and this disease is due to this causal factor). Some of these rules are deficient if applied directly to traffic safety. Rule 2, for example, does not necessarily hold in traffic safety: empirical evidence shows that the relationship between expected accident frequency and traffic flow is usually not linear†. Most of the rules, however, including rules 1, 4 and 5, are desirable in traffic safety as they stand or with minor modifications. A more concise definition of causality is given by Pearl (2000), who asserted that causality has to meet three criteria: 1) Correlation: cause and effect must vary together; 2) Time sequence: the cause must come before the effect; and 3) Non-spuriousness: the relationship between cause and effect cannot be explained by any third variable. These criteria can be viewed as the baseline for all kinds of causality, including causality in traffic safety.
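The non-spuriousness criterion can be illustrated numerically. The following is a minimal sketch in Python on simulated data, not on real accident records. The variable names (flow, signs, accidents), the sample size and the linear data-generating model are assumptions made purely for illustration. Traffic flow is assumed to drive both the number of warning signs and the accident index, while the signs themselves have no effect on accidents.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5

rng = random.Random(0)   # fixed seed so the sketch is reproducible
n_sites = 5000           # hypothetical number of road sections

# Hypothetical standardized traffic flow: the "third variable".
flow = [rng.gauss(0, 1) for _ in range(n_sites)]
# Both quantities below depend on flow only, never on each other.
signs = [f + rng.gauss(0, 1) for f in flow]
accidents = [f + rng.gauss(0, 1) for f in flow]

r_raw = pearson(signs, accidents)
r_partial = partial_corr(signs, accidents, flow)
print(f"raw correlation:            {r_raw:.2f}")
print(f"correlation given the flow: {r_partial:.2f}")
```

Under these assumptions the raw correlation between signs and accidents is clearly positive, yet it nearly vanishes once traffic flow is held fixed. The first criterion (correlation) is met while the third (non-spuriousness) is not.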
† Golob et al. (2004) give a complete review in their article "Freeway Safety as a Function of Traffic Flow", Accident Analysis and Prevention, Vol. 36, No. 6, pp. 933-946.
Factual knowledge of causality is not easy to come by. The best way to establish causality is via randomized experiments. Yet such experiments are technically infeasible and unethical in traffic safety research. Two alternatives are observational before-after studies and observational cross-section studies. In an observational before-after study, a set of candidate entities is randomly divided into those to be treated and those not, prior to the implementation of some measure. After a certain period of implementation, the differences between the treated and untreated groups are compared. An observational cross-section study, on the other hand, uses the attributes and accident history of entities (such as road sections, intersections and drivers) in an attempt to estimate the safety effect of the difference in the treatment (or attribute) in question. Observational before-after studies have been shown to yield correct insights under a meticulous study design (Hauer, 1997), while the capability of observational cross-section studies is still open to question (Hauer, 2006).
Since observational studies, whether before-after or cross-section, are not as robust as randomized experiments for causal interpretation, inconsistent or even contradictory conclusions are sometimes found in reports and journal articles. For example, Davis (2004) noted that, although many studies have used statistical methods to correlate accident experience with variations in traffic and road conditions, the transferability of such models is limited: the significance of accident predictors has been found to differ for data collected in the same geographic region but at different times, as well as for data collected in different regions. In another example, Elvik and Greibe (2005) presented the results of a meta-analysis of studies evaluating the road safety effects of porous asphalt. They concluded that “While some studies have evaluated these effects, not all of these studies can be trusted and their findings are highly inconsistent.” These inconsistencies mainly result from four difficulties: the existence of confounding factors, the determination of the scope of causality, the quality and availability of data, and the capability of methodologies.
The first and most important difficulty comes from confounding factors. A confounding factor is any exogenous variable (i.e. one not influenced by the road safety measure itself) that affects the number of accidents or injuries and whose effects, if not estimated, can be mixed up with the effects of the measure being evaluated. The results of a study should never be trusted if confounding factors are not well controlled (Elvik, 2002). Factors commonly regarded as potential confounding factors in observational before-after studies include: long-term trends affecting accident consequences; general changes in the number of accidents from before to after the road safety measure is introduced; any other treatments implemented during the ‘before’ or ‘after’ periods; regression-to-the-mean‡; adjustments to the reportability limit; and traffic flow (Hauer, 1997). The confounding factors of accidents are so numerous and varied that controlling them well becomes very difficult. This is reflected in the remaining three difficulties.
‡ “The entities may have been chosen for treatment because they had unusually many or few accidents in the past… one can hardly hope that the ‘unusual’ is a good basis for predicting what would be expected in the future had treatment not been applied.” Hauer (1997), p. 74.
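Regression-to-the-mean can be made concrete with a small simulation. The sketch below is illustrative only and uses no real accident data. It assumes that every site has the same true expected accident frequency, that counts follow a Poisson distribution, and that no treatment is applied at all. The constants TRUE_MEAN, N_SITES and CUTOFF are hypothetical values chosen for the example.

```python
import math
import random

def poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

rng = random.Random(1)   # fixed seed so the sketch is reproducible
TRUE_MEAN = 5.0          # assumed expected accidents per site per period
N_SITES = 2000           # assumed number of candidate sites
CUTOFF = 9               # assumed selection rule: 'before' count >= CUTOFF

# Same expected frequency in both periods: nothing is treated.
before = [poisson(rng, TRUE_MEAN) for _ in range(N_SITES)]
after = [poisson(rng, TRUE_MEAN) for _ in range(N_SITES)]

# Pick the sites that looked unusually dangerous in the 'before' period.
selected = [i for i in range(N_SITES) if before[i] >= CUTOFF]
mean_before = sum(before[i] for i in selected) / len(selected)
mean_after = sum(after[i] for i in selected) / len(selected)

print(f"sites selected:        {len(selected)}")
print(f"mean 'before' count:   {mean_before:.2f}")
print(f"mean 'after' count:    {mean_after:.2f}")
```

Under these assumptions the selected sites show a marked drop from the 'before' to the 'after' period, back toward the assumed true mean, although nothing was done to them. A naive before-after comparison would credit this drop to the treatment.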
Since confounding factors are numerous, an immediate issue arises: how to define the scope of the causality of an accident, i.e. which factors should be considered and which should not. In the early days, the causes of an accident were usually attributed to the factors closest to the accident. More recently, however, researchers have tended to analyze an accident more thoroughly, considering not only the accident itself but also the activities prior and subsequent to it. For example, Eby et al. (2000) and Simoes (2003) found that elderly people tend to avoid night driving, reduce freeway driving, drive only in familiar areas, and drive with a co-pilot to compensate for their age-related decline and the corresponding difficulties in performing the driving task. An accident, therefore, might not occur if the accident chain were broken at one or several of its undesirable activities (Baker and Ross, 1961). An analysis of accident chains can be roughly divided into several stages, for example: the situation prior to driving, the driving situation, the accident or discontinuity situation, the emergency situation, and the collision situation (Fleury and Brenac, 2001). It is obvious that factors in the driving situation, such as pavement material, illumination and traffic signals, affect accident occurrence, but it is difficult to tell whether the activities in other stages affect accident occurrence and/or severity.
Another concern in the selection of contributing factors is the use of statistical null hypothesis significance testing (NHST for short). Recall the first rule for judging causality proposed by the Surgeon General (1964): some statistical measure of association is strong. NHST has been regarded as a good measure of the importance of factors. However, a factor that is ‘not significant’ in the statistical sense is not necessarily ‘not important’ or ‘useless’ in traffic safety. A fair statement about a non-significant factor is: “I cannot be sure that the safety effect is not zero”. Failing to reject a null hypothesis is of little help in deciding which potential factors to drop. Moreover, a factor is expected to be less significant the farther it is from the accident (such as factors in the prior-to-driving stage). It therefore becomes increasingly difficult for researchers to select factors via NHST.
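The gap between ‘not significant’ and ‘not important’ can be shown with a short calculation. The sketch below is a simplified illustration, not a recommended evaluation method. The accident counts are hypothetical, the two periods are assumed to have equal exposure, the counts are treated as Poisson, all confounding factors are ignored, and a Wald interval on the log rate ratio is used as one common approximation.

```python
import math

# Hypothetical accident counts with equal exposure in both periods.
accidents_before = 40
accidents_after = 30

# Estimated ratio of 'after' to 'before' accident frequency.
rate_ratio = accidents_after / accidents_before

# Approximate standard error of log(rate_ratio) for two Poisson counts.
se_log = math.sqrt(1 / accidents_before + 1 / accidents_after)

# Test statistic for the null hypothesis "rate ratio = 1" (no effect).
z = math.log(rate_ratio) / se_log
significant = abs(z) > 1.96

# Approximate 95% confidence interval for the rate ratio.
lower = math.exp(math.log(rate_ratio) - 1.96 * se_log)
upper = math.exp(math.log(rate_ratio) + 1.96 * se_log)

print(f"rate ratio: {rate_ratio:.2f}, z = {z:.2f}, significant: {significant}")
print(f"95% interval: {lower:.2f} to {upper:.2f}")
```

With these hypothetical counts the estimated reduction of 25% is not significant at the 5% level, because the interval (about 0.47 to 1.20) contains 1. The same interval is also compatible with accidents being roughly halved, so the test gives no ground for calling the factor unimportant.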
The third difficulty concerns the availability and quality of data. Although most accident databases have been designed to contain as much information as possible, some attributes, such as the driver’s psychological status, are still difficult to discover except in some in-depth investigation projects. Therefore, even when an accident case is described with all the recorded data, the description remains incomplete. Furthermore, although accident databases are panel data, i.e. data on the same targets are collected over several periods, the targets are usually defined by administrative areas such as cities and counties rather than by specific intersections, road segments or specified populations. Moreover, not all traffic accidents are reportable; not all reportable accidents are reported; and not all reported accidents are correctly recorded. With these deficiencies, the availability and quality of data are questionable. This problem exists in many countries, including Taiwan (Lai et al., 2006).
Assume that the data have been screened so that all confounding factors are considered, that the potential causal factors have been determined, and that the quality of the data is assured. The last difficulty then concerns the capability of analytical techniques. Statistical methods have been the ones most frequently adopted for analyzing accident data. Conventional statistical methods, such as logistic regression models, are well suited to analyzing clear relationships between dependent and independent variables. Moreover, only a few ‘representative’ variables are usually chosen to explain the dependent variable. The conventional statistical approach is therefore well suited to exploring relationships but inappropriate for examining causality, since the complicated interrelationships among factors are difficult to control well.