
Real-time Traffic Light Recognition on Mobile Devices with Geometry-Based Filtering

Tzu-Pin Sung and Hsin-Mu Tsai

Department of Computer Science and Information Engineering

Department of Electrical Engineering

National Taiwan University

Taipei, Taiwan

b98901115@ntu.edu.tw, hsinmu@csie.ntu.edu.tw

Abstract—Understanding the status of the traffic signal at an intersection is crucial to many vehicle applications. For example, the information can be utilized to estimate the optimal speed for passing the intersection to increase fuel efficiency, or to provide additional context information for predicting whether a vehicle will run the red light. In this paper, we propose a new real-time traffic light recognition system with very low computational requirements, suitable for use in mobile devices, such as smartphones and tablets, and video event data recorders. Our system does not rely on complex image processing techniques for detection; instead, we utilize a simple geometry-based technique to eliminate most false detections. Moreover, the proposed system performs well under general and realistic conditions, e.g., vibration caused by rough roads. Evaluation of our proposed system is performed with data collected from a smartphone onboard a scooter, including video footage recorded from the camera and data collected by GPS. It is shown that our system can accurately recognize the traffic light status in real-time as a vehicle carrying the device approaches the intersection.

I. INTRODUCTION

The traffic light is one of the most important means of maintaining order in the road system. However, the traffic light only provides passive prevention of collisions; it has no direct control of individual vehicles at intersections, and drivers do not necessarily follow its instructions, in which case devastating traffic accidents can happen. According to [1], in 2011 there were 4.5 million reported intersection and intersection-related crashes in the U.S., resulting in approximately 12,000 fatalities and more than 1.4 million injury crashes. It is therefore beneficial to develop an active safety system that predicts red-light running behavior of individual vehicles, using onboard GPS; when such behavior is detected, a warning message is presented to the driver of that vehicle as well as to nearby vehicles, and automatic braking could be immediately applied to prevent a potential collision at the intersection. Real-time traffic light status recognition is crucial in such a system to yield accurate behavior prediction, as it provides context information to the prediction engine.

In addition, traffic light detection and recognition is essential to many other applications. For example, [2] implements a Green Light Optimal Speed Advisory (GLOSA) system which estimates and informs the driver of the optimal speed for passing the intersection, in order to increase fuel efficiency, and [3] implements a driving assistance system for drivers with color vision deficiency.

Fig. 1. The vehicle application can be implemented on a smartphone. The smartphone is mounted on the handlebar of a scooter and the front camera can be used to recognize traffic light status.

Many vehicle applications are cooperative, requiring a certain percentage of the other vehicles on the road to be equipped with the proposed system, i.e., a minimum market penetration rate. Relying only on gradual adoption of the system in newly purchased vehicles would significantly lower the incentive for customers to purchase the product: in the early stage, the market penetration is too low for the system to function at all, so it provides no benefit to them. One way to solve this problem is to implement the application on mobile devices that customers already own, as part of an aftermarket solution (see Figure 1). However, on such devices the application can only be implemented as software executed on a general-purpose microprocessor, and the limited computational power could become a constraint for these applications.

In this paper, we propose a real-time traffic light recognition system that requires very little computational power while maintaining high accuracy, suitable for implementation on mobile devices with limited computational resources. The proposed system does not rely on complex image processing and computer vision techniques. Instead, we use a filtering scheme that can eliminate most false traffic light detections obtained from any simple detection scheme. The filtering scheme uses geometry to determine whether a detection is likely a traffic light, utilizing the estimated height and location of the traffic light together with GPS information, which is available on most modern mobile devices, e.g., tablets, smartphones, GPS navigation devices, and video event data recorders (VEDR). The other major advantage of our proposed scheme is that it is robust under many real-world conditions.

For example, on rough roads, vibration results in camera shake that produces visible frame-to-frame jitter and can degrade the performance of a traffic light recognition system through motion blur, distortion, and unwanted discontinuous movement. Our scheme does not rely on template matching and does not require pre-processing to eliminate the effects caused by vibration, so its performance is not sensitive to road surface conditions.

The rest of this paper is organized as follows. In Section II, several related works in the existing literature are described. In Section III, the detailed design of our system is presented. Experimental results and the evaluation of our system are shown in Section IV. Finally, concluding remarks are given in Section V.

II. RELATED WORK

Traffic light recognition was studied early in [4], where a non-real-time, viewpoint-dependent formulation based on hidden Markov models is applied. Traffic light recognition with still cameras is proposed for intelligent transportation systems (ITS) in [5]. [6] also presents a video-based system for automatic red light runner detection which works on fixed cameras. Omachi et al. [7] use pure image processing techniques, including color filtering and edge extraction, to detect the traffic light.

Real-time approaches to traffic light recognition are presented in [3], [8]. [3] proposes a real-time recognition algorithm that identifies the color of the traffic light under various detection conditions and assists drivers with color vision deficiency. [8] abandons color-based detection techniques and adopts luminance to filter out false detections; by applying spot light detection and template matching for suspended and supported traffic lights, the recognition can be performed in real-time on personal computers. [9] presents a non-real-time algorithm with high accuracy that learns models of image features from traffic light regions.

A recognition technique that tracks the traffic lights was first proposed in [10]. The authors introduce a detection, tracking, and classification architecture for traffic light recognition and compare three different detectors, e.g., one using a high-end HDR camera sensor, each trading off speed against accuracy. Another tracking approach, which uses the CAMSHIFT algorithm to obtain temporal information about the traffic light, is presented in [11]; the tracking algorithm also narrows down the search areas to improve the image processing performance. Our system revises the architecture in [10], proposes improved detection and tracking models that eliminate the use of a costly color detector and of template matching during tracking, and can recognize realistic scenes accurately in real-time on computationally constrained mobile devices.

III. SYSTEM DESIGN

A. Architecture

Figure 2 presents the block diagram of our system. Our proposed system is designed with a multi-stage architecture.

In order to achieve real-time performance on a mobile device, whose camera typically captures images at a rate of 30 frame/s or lower, the processing time for each frame needs to be less than 1/30 second. Therefore, we designed the algorithm of each block to avoid complicated computation.

As shown in Figure 2, video frames read from the built-in camera are processed in the RGB color domain¹ and a few simple and time-efficient image processing techniques are applied to identify possible blobs and to filter out obvious false detections and noise in the first stage.

¹On the Android smartphone used in our experiments, the raw data obtained from the camera is in the RGB color domain.

Fig. 2. System block diagram

Fig. 3. The output images after each processing stage; upper-left: original image, upper-right: RGB filtered image, lower-left: connected-component labeling and merging, and lower-right: trajectory model.

In the second stage, the shape and the size of these blobs are used to filter out blobs that are unlikely to be a traffic light, generating the traffic light candidates.

To associate candidates across consecutive frames and obtain temporal information, the nearest neighbor algorithm is applied to find the nearest blob in the previous frame.

In the third stage, with the position and time information of the candidates, two proposed geometry-based models, trajectory and projection, are utilized to further filter the candidates and recognize the traffic light intended for the host vehicle. The trajectory approach is inspired by the observation that the trajectory of the intended traffic light is distinctive; thus, by fitting the trajectory to a straight line, the properties of the line can be exploited. The projection model, on the other hand, is motivated by the fact that the position of the traffic light in the frame is related to its position in the real-world 3-dimensional environment relative to the camera onboard the vehicle, and this relation can be estimated with the camera projection model and the GPS data. Figure 3 presents the image of the original traffic light scene as well as the output images after each processing stage. It is worth mentioning that the camera was intentionally configured to capture underexposed images for the pre-filtering technique to function properly, as described in the following subsection.

B. Image Processing in RGB Color Domain

1) RGB Color Filtering: In existing works, it was found that image processing techniques such as color segmentation, Hough transforms, and morphological transforms perform well in traffic light recognition, but these methods have high complexity, require significant computational time, and are not feasible for achieving real-time performance on mobile devices. As a result, a fast image processing strategy is adopted in our proposed system. Each frame is processed in the RGB color domain to eliminate the computational time of color domain conversion. Moreover, instead of applying complex image processing techniques, a straightforward but effective heuristic rule is proposed. From image analysis of each frame, pixels in red or green light areas not only have large R or G values, respectively, but also an adequate R-G or G-R difference, since the R or the G value dominates. In addition, for a green light, the absolute G-B difference should be small. To formulate these guidelines, let s_red(x, y) and s_green(x, y) represent scores of how likely the pixel at frame position (x, y) is part of a red light and a green light, respectively, given by

s_red(x, y) = p_R(x, y) − p_G(x, y)   (1)

and

s_green(x, y) = p_G(x, y) − p_R(x, y) − 2|p_G(x, y) − p_B(x, y)|   (2)

where p_R(x, y), p_G(x, y), and p_B(x, y) are the R, G, and B values of the pixel at frame position (x, y), respectively. If s_red(x, y) or s_green(x, y) is above the corresponding threshold, the pixel (x, y) is considered to belong to a red light or a green light. In our system, the thresholds are set to T_red = 128 and T_green = 50, respectively, based on numerous observations from our experimental data under various lighting conditions. The image is then converted to two binary images with the thresholds as follows:

p_red(x, y) = 1 if s_red(x, y) > T_red, and 0 otherwise,   (3)

and p_green(x, y) is given in a similar way. Note that most areas in the image are filtered out at this point. Although false detections could occur on unintended traffic lights, taillights of vehicles, and signboards, our system accepts the false detections at this stage and filters out most of them at a later stage with the geometry-based filters.
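To make the filtering rule concrete, the following is a minimal NumPy sketch of Equations (1)-(3); the function name and array layout are our own, while the threshold values are those given above.

    import numpy as np

    T_RED, T_GREEN = 128, 50  # thresholds from the text above

    def color_masks(frame):
        # frame: H x W x 3 RGB image (uint8); cast to int16 so the
        # differences in Eqs. (1)-(2) do not overflow or wrap around
        r, g, b = (frame[..., i].astype(np.int16) for i in range(3))
        s_red = r - g                        # Eq. (1)
        s_green = g - r - 2 * np.abs(g - b)  # Eq. (2)
        # Eq. (3): binarize with the thresholds
        return s_red > T_RED, s_green > T_GREEN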

2) Camera Exposure: Ambient lighting conditions significantly affect the performance of RGB-color-based algorithms. The ambient lighting condition depends on the weather (sunny or cloudy), the environment, and the time of day (day or night). Most built-in cameras of mobile devices automatically adjust their exposure setting according to the ambient lighting condition in order to capture well-exposed images, since such cameras have a relatively low dynamic range. However, with the default setting, traffic lights in the images often appear over-exposed due to their high luminous intensity. In our proposed system, the exposure is configured to be as low as possible, on the condition that the traffic light is not under-exposed, since for traffic light recognition purposes the only object of interest is the traffic light. To do so, the system can estimate the ambient lighting conditions and adjust the exposure setting accordingly. It is also worth mentioning that setting the exposure to the minimum provides both a pre-filtering mechanism, eliminating low-luminous-intensity areas that could have hue values similar to those of the traffic light, and a means of preventing motion blur.

C. Connected-Component Labeling and Merging

To distinguish individual blobs, a connected-component labeling algorithm with 8-connectivity is applied to the binary image to find the area size and position of each blob. Each blob is assigned an identification (ID) number, and its area size and position in the frame are recorded along with the ID number. In addition, the smallest bounding rectangle that contains the blob is found. Let h_{b_i} and w_{b_i} denote the height and the width of this rectangle for the blob with ID i, and (x_{b_i}, y_{b_i}) denote the center position of this rectangle. Noise can cause contiguous components to become disconnected, due to non-uniform diffusion in the real world; therefore, such small blobs should be merged into a whole. In our system, for each blob b_i we check whether there are other blobs b_j that are sufficiently close. If so, they are marked with the same ID number to merge them into a larger blob. Two blobs are merged if they satisfy the following criterion:

sqrt((x_{b_i} − x_{b_j})² + (y_{b_i} − y_{b_j})²) ≤ 1.8 (sqrt(h_{b_i}² + w_{b_i}²) + sqrt(h_{b_j}² + w_{b_j}²)).   (4)
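As an illustration, the following sketch performs this stage with OpenCV's 8-connectivity labeling and a union-find pass for the merge criterion of Equation (4); the data layout is an assumption on our part.

    import cv2
    import numpy as np

    def label_and_merge(mask):
        # 8-connectivity labeling of the binary image from the color filter
        n, _, stats, centroids = cv2.connectedComponentsWithStats(
            mask.astype(np.uint8), connectivity=8)
        # skip label 0 (background); keep (center, width, height) per blob
        blobs = [(centroids[i], stats[i, cv2.CC_STAT_WIDTH],
                  stats[i, cv2.CC_STAT_HEIGHT]) for i in range(1, n)]
        parent = list(range(len(blobs)))  # union-find over blob IDs

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        for i, ((xi, yi), wi, hi) in enumerate(blobs):
            for j, ((xj, yj), wj, hj) in enumerate(blobs[:i]):
                # merge criterion of Eq. (4)
                if np.hypot(xi - xj, yi - yj) <= 1.8 * (
                        np.hypot(hi, wi) + np.hypot(hj, wj)):
                    parent[find(i)] = find(j)
        return blobs, [find(i) for i in range(len(blobs))]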

D. Candidate Assignment and Criteria Filtering

From our observation of the experimental data, it is found that traffic lights are generally circular in shape and have a moderate area size. Therefore, blobs that are unlikely to be a traffic light are filtered out by applying the dimension-ratio criterion from [8],

max(h_{b_i}, w_{b_i}) < 1.8 × min(h_{b_i}, w_{b_i}),   (5)

and by discarding blobs with an area larger than 3600 pixels. Note that the constant 1.8 in Equation (5) is chosen to maintain recall at a certain level, i.e., to avoid the intended traffic light being ignored, while at the same time filtering out as many false detections as possible. The remaining blobs are defined as traffic light candidates.
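Per blob, the criteria filter then reduces to a check like the sketch below; whether the area refers to the blob's pixel count or its bounding-rectangle area is our assumption.

    def is_candidate(h, w, area):
        # Eq. (5): near-square bounding rectangle, i.e., a roughly
        # circular blob, and a moderate size (at most 3600 pixels)
        return max(h, w) < 1.8 * min(h, w) and area <= 3600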

E. Geometry-Based Filtering

After the previous filtering stages, the remaining candidates are similar in color, luminous intensity, shape, and size, and thus rather hard to distinguish by appearance alone. To overcome this problem, we propose two geometry-based models to filter out most false detections. Both models require temporal information, since each candidate appears in consecutive frames and its position is strongly related to its positions in the frames before and after. As a result, the same candidate in consecutive frames is recognized as a whole, and the spatial-temporal information, i.e., the position of the candidate in the frame at a particular time, is recorded for further use.


Fig. 4. Spatial-temporal information of a traffic light: (a) simulated 3D trajectory of the traffic light in the spatial and time domain; (b) the same trajectory projected onto the 2D spatial domain.

To arrange the appearances (detections) of traffic lights across frames into candidates, for each new appearance the nearest neighbor algorithm is used to find the candidate whose latest appearance is closest to it within a certain search radius, and the new appearance is linked to that candidate. This also prevents problems caused by temporary occlusion by nearby objects.
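A minimal sketch of this linking step follows; the search radius value and the track representation are assumptions, not values from the paper.

    import numpy as np

    def link_detections(tracks, detections, t, radius=40.0):
        # tracks: list of candidates, each a list of (t, x, y) appearances
        # detections: (x, y) blob centers found in the frame at time t
        for x, y in detections:
            best, best_d = None, radius  # assumed search radius in pixels
            for track in tracks:
                _, px, py = track[-1]    # latest appearance of the candidate
                d = np.hypot(x - px, y - py)
                if d < best_d:
                    best, best_d = track, d
            if best is None:
                tracks.append([(t, x, y)])  # start a new candidate
            else:
                best.append((t, x, y))      # link to the nearest candidate
        return tracks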

1) Trajectory Model: The spatial-temporal information of a candidate forms a trace in the 3-dimensional (3D) space composed of the 2D position space and time. By projecting the trace onto the 2D position space, we obtain the trajectory of the candidate in the frame. As shown in Figure 4, the trajectory of an intended traffic light possesses particular properties. Therefore, although the time information of the trace is discarded in the projection, the 2D trajectory itself conveys sufficient information.

In the trajectory model, our proposed system takes the properties of the trajectory as features to filter out false detections. Although the trajectory is composed of many positions from different frames, and the positions are noisy due to camera vibration, the main objective is to extract the trend of the trajectory. Therefore, our system applies a low-pass filter with exponential moving averaging (with the α parameter set to 0.1) to the positions (x_{b_i}(t), y_{b_i}(t)) to remove the fluctuations caused by the vibration. Here (x_{b_i}(t), y_{b_i}(t)) represents the center position of blob i at time t. Then, the system fits the filtered trajectory to a straight line, i.e., uses a linear regression approach that finds the parameters c1 and c2 with

E = arg min_{c1, c2} Σ_{t=0}^{T} ||y_{b_i}(t) − (c1 + c2 x_{b_i}(t))||,   (6)

utilizing the Levenberg-Marquardt algorithm [12]. The properties of the trajectory, c1 and c2, along with the length of the trajectory, i.e., the number of consecutive frames with the detected traffic light candidate, representing how reliably the traffic light is detected over an extended period of time, are generated to serve as a feature vector for the decision tree model in the final stage.
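The trajectory features can be sketched as follows, with the exponential moving average (α = 0.1) followed by the fit of Equation (6) via SciPy's Levenberg-Marquardt least-squares solver; the exact solver used on the device is not specified in the paper.

    import numpy as np
    from scipy.optimize import least_squares

    ALPHA = 0.1  # exponential moving average parameter from the text

    def trajectory_features(points):
        # points: (x, y) blob centers over consecutive frames, oldest first
        smoothed = [np.asarray(points[0], dtype=float)]
        for p in points[1:]:
            smoothed.append(ALPHA * np.asarray(p) + (1 - ALPHA) * smoothed[-1])
        xs, ys = np.array(smoothed).T
        # Eq. (6): fit y(t) = c1 + c2 x(t) with Levenberg-Marquardt [12]
        fit = least_squares(lambda c: ys - (c[0] + c[1] * xs),
                            x0=[0.0, 0.0], method='lm')
        c1, c2 = fit.x
        return c1, c2, len(points)  # line parameters and trajectory length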

2) Projection Model: In this model, the camera model is simplified: lens distortion is neglected and the principal point is set to the origin, since the approximation is tolerable for further classification and reduces the computational burden.

Fig. 5. The projection model for the traffic light

Note that the inaccuracy caused by the simplification could be mitigated by the decision tree model in the final stage. Given the focal length of the camera f, the distance from the camera to the bottom of the traffic light d, and the position of the traffic light in the frame (x, y), the estimated coordinate of the traffic light relative to the camera (s, t) can be obtained theoretically with

s = d sin φ_x = d x / sqrt(f² + x²),   (7)

t = d sin φ_y = d y / sqrt(f² + x²),   (8)

using the camera projection model shown in Figure 5 and the coordinate system in Figure 6. The estimated height s and the horizontal shift t of the traffic light can be compared to the typical values of these two parameters to determine whether a candidate is likely to be an actual traffic light.

With the aid of the built-in GPS on mobile devices, the distance d from the camera to the bottom of the traffic light, required by the camera projection model, is obtained. Note that we assume that the GPS coordinates of the traffic lights are accessible from the map database on the mobile device, so that d can be evaluated from the difference between the current coordinate of the vehicle and the coordinate of the traffic light.
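Equations (7) and (8) amount to the short computation below; we assume f is expressed in pixels and d in meters.

    import math

    def estimate_position(x, y, f, d):
        # (x, y): candidate center in the frame, principal point at origin
        # f: focal length in pixels; d: camera-to-light distance from GPS
        s = d * x / math.hypot(f, x)  # Eq. (7): estimated height
        t = d * y / math.hypot(f, x)  # Eq. (8): estimated horizontal shift
        return s, t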

Fig. 6. The coordinate system of the image plane and the real world

Fig. 7. Elevation angle correction

Ideally, the camera would face exactly in the +z direction shown in Figure 5. However, in reality, the camera is usually mounted with a small elevation angle δ, which can cause inaccuracy, resulting in unreliable estimated traffic light coordinates (s, t). As shown in Figure 7, to overcome this problem, a calibration is made to correct the position of the candidate on the frame with

x = f y / z = f tan θ ⇒ tan θ = x / f,   (9)

x′ = f y′ / z′ = f tan(θ + δ) = (f x + f² tan δ) / (f − x tan δ),   (10)

which is achieved by applying a rotation matrix to the original coordinate (x, y). However, another difficulty is that the elevation angle is unknown.

The elevation angle is approximately fixed as the vehicle approaches the intersection; as a result, the estimated height of the traffic light should remain constant during the approach. Utilizing this fact, one simple approach is to evaluate all possible angles in a range and choose the one for which the estimated height of the traffic light is closest to constant. It is reasonable to assume that the elevation angle δ is in the range from −5 to +5 degrees, since the elevation angle is calibrated manually before recording. Moreover, the resolution for searching for an optimal estimate of δ is set to one degree, as vibration of the device also creates inaccuracy and a finer resolution does not yield better results. Figure 8 shows an example of estimating the elevation angle. Note that the variation in the estimated height exists because vehicle vibration causes the elevation angle to vary.
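A sketch of this search follows, under our assumption that the frame coordinate entering the height estimate of Equation (7) is first corrected with Equation (10).

    import math
    import statistics

    def corrected(u, f, delta):
        # Eq. (10): frame coordinate u corrected for elevation angle delta
        return f * (u + f * math.tan(delta)) / (f - u * math.tan(delta))

    def estimate_elevation(track, f):
        # track: (u, d) pairs collected while approaching the intersection,
        # where u is the frame coordinate used for the height estimate and
        # d is the GPS distance to the light
        best_deg, best_var = 0, float('inf')
        for deg in range(-5, 6):          # search -5..+5 at 1-degree steps
            delta = math.radians(deg)
            heights = [d * corrected(u, f, delta)
                       / math.hypot(f, corrected(u, f, delta))
                       for u, d in track]
            var = statistics.pvariance(heights)  # height should stay constant
            if var < best_var:
                best_deg, best_var = deg, var
        return best_deg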

Fig. 8. An example of the elevation angle correction, showing the estimated height (m) versus distance (m) for each candidate correction angle from −5 to +5 degrees; the estimated elevation angle is −2 degrees.

TABLE II. FOUR DIFFERENT FEATURE SELECTION SETS

Feature Set | Frame coordinates (x, y) | Estimated traffic light position (s, t) | Regression line parameters (c1, c2)
I           | ✓                        |                                         |
II          | ✓                        | ✓                                       |
III         | ✓                        |                                         | ✓
IV          | ✓                        | ✓                                       | ✓

A check mark (✓) indicates that the feature is included in that feature set.

Another source of inaccuracy is inaccurate vehicle coordinates reported by the GPS. [13] reports that the error can be up to 10 meters with a consumer-grade GPS device. This GPS error shifts the estimated coordinates higher or lower. Fortunately, for traffic light recognition the error does not significantly affect performance, since it affects both the intended traffic light and the other false detections.

F. Decision Tree Model

With the features described in the previous subsections, each candidate can now be represented as a feature vector, and all instances are manually labeled as intended-traffic-light (ITL) or non-ITL. The classification model is trained with the random forest algorithm [14], using five hundred randomly generated decision trees with unrestricted height and randomly selected features.

The evaluation procedure and results will be discussed in the next section.
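The training step can be sketched with scikit-learn's random forest as follows; the paper implements this on the device, so the library and the placeholder feature layout here are our own assumptions.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # X: one row per candidate sample, e.g. feature set IV columns
    # (x, y, s, t, c1, c2); y: 1 for ITL, 0 for non-ITL (placeholder data)
    rng = np.random.default_rng(0)
    X = rng.normal(size=(280, 6))
    y = np.r_[np.ones(30), np.zeros(250)]

    # 500 trees, unrestricted height, random feature selection per split
    clf = RandomForestClassifier(n_estimators=500, max_depth=None,
                                 max_features='sqrt', random_state=0)
    clf.fit(X, y)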

IV. EXPERIMENTAL RESULTS

In this section, the computation time and the accuracy performance of our proposed system are evaluated in two parts.

Data collected from 32 intersections in the urban area of Taipei, Taiwan, including 16 red and 16 green light conditions, is used in this section. The video footage in the collected data has a resolution of 720-by-480 pixels, and the data was collected from a smartphone mounted on a 100 c.c. scooter, as shown in Figure 1. In addition, all video footage may contain frame jitter caused by vibration of the smartphone. Processing was performed in real-time (with a capture frame rate of 20 frames/s) on an Android 4.0.4 smartphone equipped with a 1.5 GHz dual-core processor and 1 GB of RAM. The recorded video footage is labeled manually as ITL and non-ITL.



TABLE I. COMPUTATION TIME DETAIL

Stage Name          | Sub-step                                | Duration (ms) | Total Duration (ms)
RGB color domain    | RGB color filtering                     | 2.602         | 13.861
                    | Connected-component labeling            | 11.247        |
                    | Merging (including criteria filtering)  | 0.012         |
Candidate assigning | Candidate assigning                     | 0.072         | 0.072
Trajectory model    | Linear regression                       | 0.011         | 1.212
                    | Low-pass filter                         | 1.201         |
Projection model    | Camera projection                       | 0.009         | 0.231
                    | Angle correction                        | 0.222         |
Decision tree model | Classification                          | 0.324         | 0.324
Total               |                                         |               | 15.700

Fig. 9. Performance comparison of different voting rules

Fig. 10. Performance comparison of different feature selections

A. Computation Time

The computation time of each stage in the proposed system is measured and shown in Table I. One can observe that the combined execution time for a frame is only 15.7 ms, about half of the 1/30 second required for real-time processing on a mobile device, and half of the numbers reported in [8], which were obtained from execution on personal computers.

B. Recognition Performance

For all 32 intersections, video sequences and recognized candidate features are collected in advance. Afterwards, the class of each candidate is labeled manually. With the features and the labeled class of each candidate, a data set of about 20,000 samples is constructed. Each sample is a possible traffic light in a specific frame; thus, every possible traffic light has numerous samples in different frames. In the evaluation, the data set is divided into a testing set and a training set. The testing set is composed of 30 randomly chosen traffic lights and 30 non-traffic lights, while the training set includes the rest of the candidates. Since the numbers of candidates in the ITL and non-ITL classes are unbalanced, i.e., there are about 30 in ITL and 250 in non-ITL, bootstrapping [15] is applied to re-sample the candidates in ITL. This step prevents the classification results from biasing toward non-ITL.
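The re-balancing step can be sketched with scikit-learn's bootstrap resampling helper as below; the target class ratio (up-sampling ITL to match non-ITL) is our assumption.

    import numpy as np
    from sklearn.utils import resample

    def balance_training_set(X_itl, y_itl, X_non, y_non):
        # Bootstrap re-sample the ~30 ITL candidates up to the ~250
        # non-ITL candidates so that the two classes are balanced [15]
        X_up, y_up = resample(X_itl, y_itl, replace=True,
                              n_samples=len(X_non), random_state=0)
        return np.vstack([X_up, X_non]), np.r_[y_up, y_non]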

After training a decision tree model using the random forest algorithm with the data in the training set, the data in the testing set is classified with the trained model, with each candidate in the testing set classified as either ITL or non-ITL. However, the final decision of whether a candidate is a traffic light or not is made through voting: since every possible traffic light appears as a group of candidates in consecutive frames, the candidates in the same group vote for the final decision.

Figure 9 shows the results for different voting rules. From the results, we see that voting rule 1 gives the best result, i.e., a group of candidates in consecutive frames is regarded as ITL if the candidate is classified as ITL in more than one frame. This is because the prediction model generated by the random forest algorithm rarely classifies any candidate as ITL; as soon as it does classify a candidate as ITL, that group of candidates has a large probability of being an ITL.
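Voting rule 1 reduces to a check like the following over a candidate group's per-frame predictions; this is a one-line sketch of our reading of the rule, with the vote threshold exposed as a parameter.

    def group_is_itl(frame_preds, min_votes=2):
        # frame_preds: per-frame classifier outputs (1 = ITL) for one
        # candidate group; rule 1 declares the group ITL when the
        # candidate is classified as ITL in more than one frame
        return sum(frame_preds) >= min_votes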

Table II gives the four different feature selections used for training the decision tree model, and Figure 10 shows a performance comparison of the selected feature sets. One can observe that of the four feature selection sets, set I performs the worst. Moreover, although the precision of set II is the highest, its recall is low. Set IV performs best out of the four sets, having the highest F1-score and relatively high precision and recall. This result is reasonable, since set IV uses all available features to provide a more reliable and robust classification. In addition, the estimated traffic light position yields better performance than the regression line parameters; in Figure 11, one can observe that the estimated traffic light positions of candidates in different classes are more widely separated than the regression line parameters.



Fig. 11. Visualization of estimated coordinates (estimated x versus estimated y) and regression line parameters (C1 versus C2) with labels (1: ITL; -1: non-ITL)

TABLE III. PERFORMANCE COMPARISON WITH AND WITHOUT OUR PROPOSED FILTERING SCHEME

Filtering     | Traffic light | Non-traffic light | Precision | Recall | F1-score
w/o filtering | 59 (59)       | 0 (296)           | 0.166     | 1.000  | 0.285
w/ filtering  | 1222 (1500)   | 1333 (1500)       | 0.882     | 0.815  | 0.845

The numbers in parentheses give the ground-truth counts. The w/ filtering results are obtained with 50 trials of randomly chosen testing data and the set IV feature selection.

From Table III, it is clear that before our proposed geometry-based filtering, due to the simplicity of the detection scheme, all detected candidates are recognized as ITL; thus, although the recall is equal to one, i.e., every ITL is recognized, the precision is very low, because ITL and non-ITL cannot be distinguished. After applying the proposed geometry-based filtering, Table III shows that the performance of the proposed system is greatly improved; the F1-score is about 0.85 on average.

V. CONCLUSION

In this paper, we proposed a real-time traffic light recognition system that can be implemented on a mobile device with limited computational resources. By using a straightforward and simplified detection scheme, processing in the RGB color domain, the computation time is greatly reduced. Although the simple detection scheme produces many false detections, the proposed geometry-based filtering scheme eliminates most of them. Moreover, the system performs well in realistic conditions where the camera experiences vibration while shooting the video. The overall F1-score of our system is as high as 0.85 on average, and our proposed system can process each frame within 20 milliseconds. It is also worth mentioning that our proposed filtering scheme can be combined with other detection schemes to eliminate their false detections time-efficiently in a similar way.

ACKNOWLEDGMENT

This work was supported in part by the National Science Council, National Taiwan University, and Intel Corporation under Grants NSC-101-2911-I-002-001 and NTU-102R7501.

REFERENCES

[1] “Traffic safety facts 2011 - a compilation of motor vehicle crash data from the fatality analysis reporting system and the general estimates system,” National Highway Traffic Safety Administration, National Center for Statistics and Analysis, and U.S. Department of Transportation, 2013. [Online]. Available: http://www-nrd.nhtsa.dot.gov/Pubs/811754AR.pdf

[2] E. Koukoumidis, L.-S. Peh, and M. R. Martonosi, “SignalGuru: leveraging mobile phones for collaborative traffic signal schedule advisory,” in Proceedings of the 9th International Conference on Mobile Systems, Applications, and Services (MobiSys ’11). New York, NY, USA: ACM, 2011, pp. 127–140.

[3] Y. Kim, K. Kim, and X. Yang, “Real time traffic light recognition system for color vision deficiencies,” in International Conference on Mechatronics and Automation (ICMA 2007), 2007, pp. 76–81.

[4] Z. Tu and R. Li, “Automatic recognition of civil infrastructure objects in mobile object mapping imagery using a Markov random field model,” in Proc. 19th International Society for Photogrammetry and Remote Sensing (ISPRS) Congress, 2000.

[5] Y.-C. Chung, J.-M. Wang, and S.-W. Chen, “A vision-based traffic light system at intersections,” Journal of Taiwan Normal University: Mathematics, Science, and Technology, vol. 47, no. 1, pp. 67–86, 2002.

[6] H. S. Lai and N. H. C. Yung, “A video-based system methodology for detecting red light runners,” in Proceedings of the IAPR Workshop on Machine Vision Applications, 1998, pp. 23–26.

[7] M. Omachi and S. Omachi, “Traffic light detection with color and edge information,” in Proc. IEEE International Conference on Computer Science and Information Technology (ICCSIT), 2009, pp. 284–287.

[8] R. de Charette and F. Nashashibi, “Real time visual traffic lights recognition based on spot light detection and adaptive traffic lights templates,” in Proc. IEEE Intelligent Vehicles Symposium, 2009, pp. 358–363.

[9] Y. Shen, U. Ozguner, K. Redmill, and J. Liu, “A robust video based traffic light detection algorithm for intelligent vehicles,” in IEEE Intelligent Vehicles Symposium, 2009, pp. 521–526.

[10] F. Lindner, U. Kressel, and S. Kaelberer, “Robust recognition of traffic signals,” in IEEE Intelligent Vehicles Symposium, 2004, pp. 49–53.

[11] J. Gong, Y. Jiang, G. Xiong, C. Guan, G. Tao, and H. Chen, “The recognition and tracking of traffic lights based on color segmentation and CAMSHIFT for intelligent vehicles,” in IEEE Intelligent Vehicles Symposium, 2010, pp. 431–435.

[12] J. J. Moré, “The Levenberg-Marquardt algorithm: implementation and theory,” Lecture Notes in Mathematics, vol. 630, pp. 105–116, 1978.

[13] M. G. Wing, A. Eklund, and L. D. Kellogg, “Consumer-grade Global Positioning System (GPS) accuracy and reliability,” Journal of Forestry, vol. 103, no. 4, pp. 169–173, June 2005.

[14] L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

[15] G. Dupret and M. Koda, “Bootstrap re-sampling for unbalanced data in supervised learning,” European Journal of Operational Research, vol. 134, no. 1, pp. 141–156, 2001.
