
A Relative-Discriminative-Histogram-of-Oriented-Gradients-Based Particle Filter Approach to Vehicle Occlusion Handling and Tracking

Bing-Fei Wu, Fellow, IEEE, Chih-Chung Kao, Student Member, IEEE, Cheng-Lung Jen, Student Member, IEEE,

Yen-Feng Li, Ying-Han Chen, and Jhy-Hong Juang

Abstract—This paper presents a relative discriminative histogram of oriented gradients (HOG) (RDHOG)-based particle filter (RDHOGPF) approach to traffic surveillance with occlusion handling. Based on the conventional HOG, an extension known as RDHOG is proposed, which enhances the descriptive ability of the central block and the surrounding blocks. RDHOGPF can be used to predict and update the positions of vehicles in continuous video sequences. RDHOG was integrated with the particle filter framework in order to improve the tracking robustness and accuracy. To resolve multiobject tracking problems, a partial occlusion handling approach is addressed, based on the reduction of the particle weights within the occluded region. Using the proposed procedure, the predicted trajectory is closer to that of the real rigid body. The proposed RDHOGPF can determine the target by using the feature descriptor correctly, and it overcomes the drift problem by updating in low-contrast and very bright situations. An empirical evaluation is performed inside a tunnel and on a real road. The test videos include low viewing angles in the tunnel, low-contrast and bright situations, and partial and full occlusions. The experimental results demonstrate that the detection ratio and precision of RDHOGPF both exceed 90%.

Index Terms—Histogram of oriented gradients (HOG), particle filter, vehicle tracking.

I. INTRODUCTION

TRAFFIC monitoring is an important issue in traffic management. Therefore, many detection methods have been proposed in the literature, which use monocular cameras, stereo cameras, and active sensors such as radar and infrared. Due to falling costs and the increased efficiency of computational power, vision-based techniques have become popular for monitoring traffic scenes, and comprehensive up-to-date surveys are provided in [1] and [2]. However, it is difficult to locate and track targets precisely in low-contrast and excessively bright scenes. The bright object pairing method is feasible for vehicle tracking, and it is reliable in low-illumination situations [3]. A histogram extension method [4] was used to extract

Manuscript received March 4, 2013; revised June 7, 2013; accepted July 18, 2013. Date of publication October 1, 2013; date of current version February 7, 2014. This work was supported by the National Science Council, Taiwan, under Grant NSC 100-2221-E-009-041. This work is specifically supported by “Aiming for the Top University Program” of the National Chiao Tung University and the Ministry of Education, Taiwan.

The authors are with the Institute of Electrical and Control Engineering, National Chiao Tung University, Hsinchu 300, Taiwan (e-mail: bwu@mail.nctu.edu.tw).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIE.2013.2284131

vehicular objects from highways and urban roads. Background subtraction, which obtains the foreground (moving) pixels by subtracting the current input image from a background image [5]–[10], is one of the most efficient methods for change detection, and it performs well when separating vehicles from the background, even with shadows and tree vibrations. In this paper, background subtraction is used to separate the vehicles from the background, while the relative discriminative histogram of oriented gradients (RDHOG) method is applied for tracking vehicles.

After detection, visual tracking [30] is used to correlate vehicles that appear in successive frames. High-level event detection can be achieved using vehicle tracking. As well as detection, there have been many major advances in visual object tracking in recent decades. The most common methods rely on Bayesian filtering, such as Kalman filtering and particle filtering. Moreover, a recent review paper [11] summarizes the application of these filters to vision analysis. Traditionally, Kalman filters have been used to predict and update the states of linear and Gaussian models. Thus, head tracking [12] and target detection and following [13] have been performed based on Kalman filters. A particle filter [14], also known as the sequential Monte Carlo method or condensation algorithm, has been used for target localization in situations with large numbers of particles (samples). The particle filter method [15]–[24] has become very popular because of its ability to handle nonlinear and non-Gaussian models, as well as its robustness and flexibility. In [15], the contextual confidence of the measurement model was also considered, and nonoverlapping fragments were selected dynamically for likelihood measurements. Two operation modes have been proposed for the adaptive particle filter (APF) [16]. When the tracked vehicle is not occluded, APF uses a normal probability density function to generate a new set of particles. When the tracked vehicle is occluded, however, APF generates a new set of particles using a Normal-Rayleigh probability density function. Moreover, a motion-based model [17] was used to minimize the variance in the particle weights. In general, a first-order hidden Markov model is used during visual tracking, although a novel visual tracking framework that uses a high-order Monte Carlo Markov chain has been presented [18]. However, the computational complexity increases as the order increases.

In [19], a projective transform was integrated into the particle filter framework. However, the parameters of transformation,

0278-0046 © 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


such as the height and the angle of the installed camera, were difficult to obtain in each scene. A tracking algorithm based on a combination of a particle filter and the mean shift with a new transition model was presented in [20]. The multimode anisotropic mean shift was integrated into the particle filter to reduce the number of particles, although the computational load was still high. In [21], a Kalman filter and a mean-shift tracker were integrated to predict a position more precisely. In addition to the Bhattacharyya coefficients used with color histograms, template matching is another method for particle weight calculation. A framework has been proposed that fuses salient points, blobs, and edges as features, while an object model adaptation was developed to facilitate more efficient operation [22]. Mei and Ling [23] proposed a visual tracking system that used a sparse representation set. The template was updated dynamically to capture changes in appearance, and nonnegativity constraints were enforced to filter out clutter that negatively resembled the tracking targets. A ten-layer annealed particle filter [24] was developed for visual tracking. In this framework, the noise is adjusted adaptively by each layer, and the particles become tightly concentrated in the last layer. These techniques work well in normal lighting conditions, but the low contrast in tunnels and areas with abnormal lighting conditions is seldom considered. Thus, the color histogram and the image template are easily confused with the background image. Hence, the RDHOG method can be integrated into the particle filter framework to represent the target.

This paper proposes an RDHOG-based particle filter (RDHOGPF) for vehicle tracking with partial occlusion handling in a traffic surveillance application. RDHOGPF can be used to track passing vehicles more precisely. Unlike other methods that track objects using color histograms, the RDHOGPF tracking module locates and tracks vehicles accurately using RDHOG features. Based on HOG, the RDHOG features are the differences between the central block and the surrounding blocks. Thus, the weighting contributions of the features can be improved by combining the conventional HOG features and the RDHOG features. In addition, each particle has two sample images: an image of the original size and another with a scaled width and height. Both sample images are resized into a standard pattern size of 32 × 32. If the sample image with a scaling factor has a larger weight after the likelihood calculation, the target size is updated with the scaling factor. Therefore, the target can be tracked and updated using the proper size. Occlusion handling is performed to maintain the tracked vehicle in a stable position when it is partially occluded.

Section II briefly introduces the workflow of the proposed approach. Section III describes the details of the proposed RDHOGPF for vehicle tracking with occlusion handling. Section IV presents our experimental evaluation of the proposed vehicle tracking method in particular scenarios. Finally, Section V concludes this paper.

II. WORKFLOW

The proposed method combines background subtraction with RDHOGPF to improve the detection accuracy, according to the procedure shown in Fig. 1. Initially, background subtraction is used to detect and verify

Fig. 1. Workflow of the proposed multivehicle detection and tracking method.

Fig. 2. Detection results for foreground segmentation and vehicle verification. (a) Foreground image. (b) Moving edge detection image.

vehicles entering the region of interest (ROI). Background subtraction is a rapid approach for separating the vehicle from the background image. A mixture of Gaussian distributions is often used for modeling the color value of each pixel in an image over time.
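As an illustration of this step, the following minimal sketch extracts a foreground mask with a mixture-of-Gaussians background model. It uses OpenCV's MOG2 subtractor as a stand-in (the paper does not specify its implementation), and the history, variance-threshold, and shadow-threshold values are assumptions.

```python
import cv2

# Mixture-of-Gaussians background model: each pixel's color is modeled over
# time by several Gaussian components (illustrative sketch, not the authors' code).
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                                detectShadows=True)

def foreground_mask(frame):
    """Return a binary foreground mask for one input frame."""
    mask = subtractor.apply(frame)       # 0 = background, 127 = shadow, 255 = foreground
    _, fg = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)  # drop the shadow label
    return fg
```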

After background subtraction, reflections in tunnels are easily recognized as foreground pixels, particularly those caused by headlights. Many previous studies have discussed shadow elimination, but the effect caused by headlight reflections in tunnels is also severe. It is difficult to distinguish the color of the headlight reflections from that of white vehicles. Therefore, an edge feature needs to be integrated into the system to segment the vehicle correctly. The edges of the vehicles and the background are both extracted in the input image. To remove the edges caused by lane marks, moving edges are extracted by eliminating the edges belonging to the background and the shadow. In Fig. 2(a), the false foreground pixels caused by the shadows and the headlights make it difficult for the system to identify the vehicle's position. Thus, we scan the horizontal edges in an upward direction to discriminate the real rigid-body objects from the false foreground. The foreground pixels are drawn in gray inside the blue rectangles, which represent the segmented vehicles with moving edges. In contrast, the green rectangle represents the foreground pixels attributable to the headlights. When the object is verified as a vehicle, the initial size of the rectangle is obtained for the vehicle. Using the verification procedure, the glare of the headlights of oncoming vehicles can be recognized as a nonvehicle.
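The moving-edge idea described above (edges present in the current frame but absent from the background, restricted to foreground pixels) could be realized roughly as in the sketch below; the Canny thresholds and the helper name are illustrative assumptions, not the authors' exact procedure.

```python
import cv2

def moving_edges(frame_gray, background_gray, fg_mask, lo=80, hi=160):
    """Edges of the current frame that are absent from the background image
    and lie on foreground pixels (illustrative thresholds)."""
    edges_now = cv2.Canny(frame_gray, lo, hi)
    edges_bg = cv2.Canny(background_gray, lo, hi)
    moving = cv2.bitwise_and(edges_now, cv2.bitwise_not(edges_bg))
    return cv2.bitwise_and(moving, fg_mask)   # keep only edges on moving objects
```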

The trajectory is generated if the vehicle is detected more than twice in consecutive frames. The partial occlusions are detected and segmented based on the feedback tracking information. The tracked vehicle is represented and tracked using an RDHOG descriptor. The proposed vehicle tracking method


Fig. 3. Diagram of the cell and block used in the normalized image.

based on a particle filter can locate the target accurately in the spatial domain and also obtain the correct size of the tracked vehicle as it passes.

III. RDHOGPF VEHICLE TRACKING ALGORITHM

Several important methods are introduced in this section for our proposed appearance-adaptive vehicle tracking framework, including the RDHOG computation, trajectory generation, a particle filter based on a two-scale RDHOG, appearance-adaptive template matching, and occlusion handling.

A. RDHOG Feature Computation

HOG is now a well-known descriptor used for object detection [26], [27]. The image gradient is computed by convolving the original image with a horizontal kernel [−1 0 1] and a vertical gradient kernel [−1 0 1]T. In this paper, the sizes of all the target templates and sample images (particles) are normalized to 32 × 32. Next, the image is divided into equal cells of 8 × 8, which are represented as blue rectangles in Fig. 3. The orientation of the gradient is weighted by voting using nine bins spaced over 0°–180°. We take the magnitude of each pixel as the weight of the angle bin for any pixel present in the cell. We select four connected cells as a block (the red rectangle in Fig. 3); therefore, there are nine blocks in the 32 × 32 image, and there are 36 bins in each block. The bins of all the blocks are combined to produce a feature vector that serves as the vehicle descriptor. This is the basic HOG descriptor used to describe a target vehicle, and there are 324 HOG bins in each descriptor. HOG can be used to describe the shapes and magnitudes of the target image, but RDHOG is proposed to enhance the description of the target. In contrast to the conventional HOG bins, the relationships between the central block and the surrounding blocks are considered. The RDHOG bins indicate the difference between the central block and the surrounding blocks. Therefore, the difference between the HOG bins in the central block and those in each block around the central block is calculated using

$$RDb_j(i) = b_5(i) - b_j(i), \qquad j = 1 \sim 9,\; j \neq 5 \tag{1}$$

where $b_j(i)$ is the $i$th bin of the $j$th block, $b_5(i)$ is the $i$th bin of the central block, and $RDb_j(i)$ is the $i$th RDHOG bin of the $j$th block. Therefore, there are 288 RDHOG bins in each descriptor. The total number of elements in the descriptor is 612 after combining the HOG and RDHOG features.
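For concreteness, a compact sketch of the HOG/RDHOG computation described above is given below: a 32 × 32 pattern, 8 × 8 cells, nine orientation bins over 0°–180°, nine 2 × 2-cell blocks of 36 bins each (324 HOG bins), and the 288 relative bins of (1). Interpolation and block normalization from the standard HOG pipeline are omitted, so this is an approximation of the descriptor rather than the authors' exact implementation.

```python
import numpy as np

def hog_rdhog(patch32):
    """patch32: 32x32 grayscale array. Returns (hog_324, rdhog_288)."""
    img = patch32.astype(np.float64)
    gx = np.zeros_like(img); gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]        # horizontal kernel [-1 0 1]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]        # vertical kernel [-1 0 1]^T
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation, 0-180 deg

    # 9-bin orientation histogram per 8x8 cell, weighted by gradient magnitude
    cells = np.zeros((4, 4, 9))
    for cy in range(4):
        for cx in range(4):
            m = mag[8*cy:8*cy+8, 8*cx:8*cx+8].ravel()
            a = ang[8*cy:8*cy+8, 8*cx:8*cx+8].ravel()
            cells[cy, cx], _ = np.histogram(a, bins=9, range=(0, 180), weights=m)

    # nine overlapping 2x2-cell blocks -> 9 x 36 = 324 HOG bins
    blocks = [cells[by:by+2, bx:bx+2].ravel() for by in range(3) for bx in range(3)]
    hog = np.concatenate(blocks)

    # RDHOG (1): difference between the central block (index 4) and the others
    center = blocks[4]
    rdhog = np.concatenate([center - blocks[j] for j in range(9) if j != 4])  # 288 bins
    return hog, rdhog
```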

Fig. 4. Diagram of trajectory generation.

Two scaling images are also calculated to track the vehicle at the correct size. If the vehicle drives in the y-direction in the real world, the appearance of the vehicle in the image will increase or decrease gradually. Therefore, in our approach, each particle has two samples: the original width (W) and height (H) of the tracked vehicle and the width (sW) and height (sH) with a scaling factor (this scaling factor depends on the testing scene). The vehicle becomes gradually smaller in the test scene, so the scaling factor is smaller than one. Next, both sample images with different image sizes are resized to 32 × 32, and the HOG features are calculated. In addition, to reduce aliasing from the background, only the gradients in the foreground image are considered when calculating the HOG and RDHOG features. After the trajectory of the detected vehicle is generated, the descriptor of the vehicle is obtained for vehicle tracking.

B. Trajectory Generation

The trajectory must be generated after detection to track a vehicle in an image sequence. The movements of the vehicles must be continuous if the frame rate is sufficiently high, so the detected vehicles that appear in successive frames must overlap. Thus, the trajectory can be established when the vehicle is detected more than twice and its detections overlap in the consecutive frames, i.e., the correct detection result for a passing vehicle must include relationships for times t, t − 1, and t − 2. The detected regions overlap if they are related to the same vehicle. As shown in Fig. 4, A1, A2, and A3 are the detected vehicles at times t, t − 1, and t − 2, respectively; a1, a2, and a3 are the respective areas of A1, A2, and A3; and a1,2 and a2,3 are the corresponding overlapping areas

$$a_1 \triangleq \mathrm{area}(A_1) \quad \text{and} \quad a_2 \triangleq \mathrm{area}(A_2) \tag{2}$$

$$a_{1,2} \triangleq \mathrm{area}(A_1 \cap A_2) \quad \text{and} \quad a_{2,3} \triangleq \mathrm{area}(A_2 \cap A_3) \tag{3}$$

where area(A) is defined as the area of vehicle A. In our experiments, the trajectory is generated if the corresponding overlapping areas are larger than half the size of the respective areas a1 and a2.
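The overlap test behind (2) and (3) can be sketched as follows for axis-aligned bounding boxes given as (x, y, w, h); the 0.5 ratio follows the text, while the function and variable names are illustrative.

```python
def overlap_area(a, b):
    """Intersection area of two boxes (x, y, w, h)."""
    x1 = max(a[0], b[0]); y1 = max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2]); y2 = min(a[1] + a[3], b[1] + b[3])
    return max(0, x2 - x1) * max(0, y2 - y1)

def start_trajectory(A1, A2, A3):
    """Generate a trajectory when the detections at t, t-1, and t-2 overlap
    by more than half of the respective areas a1 and a2 (Section III-B)."""
    a1 = A1[2] * A1[3]
    a2 = A2[2] * A2[3]
    return overlap_area(A1, A2) > 0.5 * a1 and overlap_area(A2, A3) > 0.5 * a2
```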

C. Particle Filter

We assume a first-order Markov chain state-space model for Bayesian filtering. The particle filter is one of the most popular Bayesian filters used to estimate the posterior distributions of state variables. It comprises two main steps: prediction and updating. The state vector x_k is modeled as the position and the speed of the tracked vehicle in the image at discrete time k

$$\mathbf{x}_k = [x \;\; y \;\; \dot{x} \;\; \dot{y}]^T \tag{4}$$


Fig. 5. Example of a HOG descriptor. (a) Target intensity image. (b) Target binary foreground image. (c) Target binary edge image. (d) Target HOG image. (e)–(h) Corresponding images of particle 1. (i)–(l) Corresponding images of particle 2.

where x and y represent the column and row position of the vehicle in the image, respectively, and ˙x and ˙y are the respective speeds of the vehicle in the subsequent frame.

Using the set of weighted particles S_k, the state estimate x̂_k at time k is taken as the maximum a posteriori estimate

$$\hat{\mathbf{x}}_k = \arg\max_{\mathbf{x}_k^{(j)} \in S_k} \omega_k^{(j)} \tag{5}$$

where $\omega_k^{(j)}$ is the weight of the $j$th particle $\mathbf{x}_k^{(j)}$. Next, the samples are resampled to avoid the degeneracy problem by deleting the particles with negligible weights and generating particles with equal weights from the high-weight particles. Finally, the estimation results are refined and updated.
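A minimal sketch of the prediction and MAP-estimation steps is shown below, assuming the constant-velocity state [x, y, ẋ, ẏ] and white Gaussian transition noise described in the text; the noise standard deviations shown are the traffic video 1 values reported in Section IV.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(particles, sigma=(10.0, 10.0, 2.0, 2.0)):
    """Constant-velocity transition with white Gaussian noise.
    particles: (N, 4) array of states [x, y, xdot, ydot]."""
    moved = particles.copy()
    moved[:, 0] += particles[:, 2]          # x <- x + xdot
    moved[:, 1] += particles[:, 3]          # y <- y + ydot
    moved += rng.normal(0.0, sigma, size=particles.shape)
    return moved

def map_estimate(particles, weights):
    """Maximum a posteriori state estimate of (5): the highest-weight particle."""
    return particles[np.argmax(weights)]
```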

D. Likelihood Calculation

In this section, the state transition model propagates the states between successive frames using white Gaussian noise, and the measurement model measures the similarity between the observations and the target. Unlike other methods that use color to represent the target, RDHOG is proposed to represent the tracked vehicle. RDHOG can address the shape property of the rigid body more clearly than the color histogram representation, and RDHOG also enhances the discrimination of the descriptor. To compute the similarity between the observation yk (the image within the bounding box) and the template xk, the observations are resized to the same size as the target template, 32 × 32. The HOG image and the binary edge image are shown in Fig. 5. The HOG features are extracted from the gradient image before binarization. Using the edge image and the foreground image, it is clear that the HOG features illustrated in Fig. 5 differ between the images of the target and the particles.

The similarity (or likelihood) p(yk|xk), which represents the degree of similarity between the particle and the template, is calculated based on the Bhattacharyya distance of the HOG feature. Using the HOG descriptor, the jth HOG weight of particle

TABLE I

PSEUDOCODE OF THE RESAMPLING ALGORITHM

$h_k^{(j)}$ at time $k$ can be computed based on the Bhattacharyya distance

$$h_k^{(j)} = \sum_{i=0}^{N-1} \sqrt{d^{(j)}(i)\, d_T(i)} \tag{6}$$

where $d^{(j)}(i)$ and $d_T(i)$ are the $i$th elements of the HOG features of the $j$th particle and the target, respectively, and $N$, which is 324, is the total length of the HOG descriptor.

Next, the $j$th RDHOG weight of particle $\delta_k^{(j)}$ is calculated to enhance the discrimination of the target

$$\delta_k^{(j)} = \sum_{i=0}^{M-1} \left| RDb^{(j)}(i) - RDb_T(i) \right| \tag{7}$$

where $RDb^{(j)}(i)$ and $RDb_T(i)$ are the $i$th elements of the RDHOG features of the $j$th particle and the target, respectively, and $M$, which is 288, is the total length of the RDHOG descriptor. Finally, the $j$th particle weight $\omega_k^{(j)}$ at time $k$ can be computed using the following equation:

$$\omega_k^{(j)} = p\left(y_k \,\middle|\, \mathbf{x}_k^{(j)}\right) = h_k^{(j)} / \delta_k^{(j)}. \tag{8}$$

As $h_k^{(j)}$ increases, the similarity between the target and the particle also increases. As $\delta_k^{(j)}$ becomes lower, the difference between the central block and the surrounding blocks in the target and the particle declines. Therefore, if $\omega_k^{(j)}$ is higher, there is greater similarity between the target and the particle. After obtaining the likelihoods of the particles, we normalize the weights so that

$$\sum_{j=1}^{N} \omega_k^{(j)} = 1. \tag{9}$$
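Putting (6)–(9) together, the particle weights could be computed as in the following sketch, which takes precomputed HOG/RDHOG feature vectors as input; the small epsilon guarding the division is an assumption not stated in the paper.

```python
import numpy as np

def particle_weights(particle_hogs, particle_rdhogs, target_hog, target_rdhog, eps=1e-6):
    """Weights of (8): Bhattacharyya-style HOG similarity (6) divided by the
    RDHOG difference (7), then normalized as in (9).
    particle_hogs: (N, 324) array, particle_rdhogs: (N, 288) array."""
    h = np.sqrt(particle_hogs * target_hog).sum(axis=1)      # (6), HOG bins are nonnegative
    d = np.abs(particle_rdhogs - target_rdhog).sum(axis=1)   # (7)
    w = h / (d + eps)                                        # (8), eps is an assumption
    return w / w.sum()                                       # (9)
```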

To resolve the degeneracy problem, the resampling procedure is performed to delete the lower weight particles and retain the strong particles. The pseudocode of the resampling algorithm is presented in Table I.
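The resampling pseudocode of Table I was not reproduced here; as a stand-in, the standard systematic resampling step below deletes negligible-weight particles and regenerates equally weighted ones from the strong particles, which is the behavior the text describes, though not necessarily the exact algorithm in the table.

```python
import numpy as np

def systematic_resample(particles, weights, rng=np.random.default_rng(0)):
    """Draw N equally weighted particles, favoring high-weight ones
    (standard systematic resampling; a stand-in for Table I)."""
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0                     # guard against rounding error
    idx = np.searchsorted(cumulative, positions)
    return particles[idx].copy(), np.full(n, 1.0 / n)
```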


Fig. 6. New positions of the three trajectories estimated using RDHOGPF with partial occlusion handling for the edge image, which are masked with gray to avoid the generation of a new trajectory for the same vehicle in successive frames. (a) Position of the first trajectory. (b) Positions of the rest of the trajectories.

Fig. 7. Importance sampling example of the second trajectory during the RDHOGPF estimation. The (gray) masked region is the new region for the first trajectory. (a) Binary edge target image. (b), (c) Binary edge images (particles) with partially occluded regions.

E. Occlusion Handling

Most previous studies that addressed tracking problems have focused on resolving single object tracking. In most situations, however, two or more objects may appear simultaneously. In this paper, the older trajectory is estimated first, while the estimated position of the tracked vehicle is masked with gray, as shown in Fig. 6(a).

After updating the older trajectories, the newly generated trajectory estimates the new position using RDHOGPF. The masked region is drawn as gray based on the older trajectories, so some particles will contain the partial gray region shown in Fig. 7 when the sampling process is performed to calculate the likelihood.

The magnitudes are set to zero in the gray region using

$$M(u, v) = \begin{cases} 0, & \text{if } F(u, v) \neq 1 \\ M(u, v), & \text{otherwise} \end{cases} \tag{10}$$

where $u$ and $v$ are the horizontal and vertical coordinates, respectively, and $M(u, v)$ and $F(u, v)$ are the respective magnitude and foreground values in the normalized 32 × 32 image. When the magnitude value is set to zero, the accumulated HOG feature $d^{(j)}(i)$ in (8) declines, almost to zero, as in

$$d(i) = 0, \quad \text{if } M(u, v) = 0 \tag{11}$$

where $u$ and $v$ are the horizontal and vertical coordinates in a cell, respectively. Therefore, the occluded regions lead to decreases in the weights in (8). Thus, the estimation of the second trajectory yields the correct position by evaluating the weight that is affected least by the other trajectories. Finally, the estimation results for the three trajectories are exhibited in Fig. 6(b). Partial occlusion still occurs, but the tracker handles this situation and tracks the vehicles well. Moreover, the region of the estimation result is masked with a gray color

TABLE II

EXECUTION TIME FOR EACH MODULE

in the edge and the foreground image, so the vehicle detection procedure will not detect and generate a new trajectory for the same vehicle in the subsequent frame.
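The masking step of (10) and (11) amounts to zeroing the gradient magnitudes wherever the normalized foreground map is not 1 (e.g., pixels grayed out by an older trajectory), so that particles overlapping the occluded region accumulate smaller HOG sums and hence smaller weights in (8). A minimal sketch, with assumed array names:

```python
import numpy as np

def mask_occluded_magnitudes(magnitude, foreground):
    """(10): zero the gradient magnitude wherever the normalized 32x32
    foreground map is not 1 (e.g., pixels masked gray by an older trajectory)."""
    out = magnitude.copy()
    out[foreground != 1] = 0.0
    return out

# With zeroed magnitudes, the corresponding cell histograms d(i) in (11)
# collapse toward zero, so particles covering occluded regions receive
# smaller weights in (8).
```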

IV. EXPERIMENTAL RESULTS

This section addresses the performance evaluation results for the proposed RDHOGPF method of vehicle tracking using several traffic surveillance videos. Traffic video 1 was a continuous 1-h video recorded inside a tunnel in low-contrast conditions, while traffic video 2 was a continuous half-hour video in excessively bright conditions with low contrast. The difficulties inherent in these conditions were as follows. First, because of the height limitations in the tunnel, vehicles were sometimes partially occluded when they entered the detection zone within the tunnel. Second, low-contrast and excessively bright scenes led to rough target representations. However, the proposed RDHOGPF approach handled these situations and maintained its performance. Our approach was implemented in C++ using the Microsoft Foundation Class Library. A detailed analysis of the computational time requirements is provided in Table II, which shows that most of the computational time was attributed to the particle filter tracking module because of the likelihood computation and the particle weight normalization. Because of the varying number of existing trajectories, the standard deviation of the particle filter tracking execution time was higher than that of the other modules. These scenarios were all tested on a Windows XP platform using an Intel Core i5 with a 3.3-GHz central processing unit and 3 GB of RAM, where the test image size was 320 × 240.

In the quantitative evaluation of the vehicle tracking performance, we used the detection ratio (Recall) and the precision ratio (Precision), which are widely used to evaluate performance in information retrieval. These measures are defined as

$$\mathrm{Recall} = T_p / (T_p + F_n) \tag{12}$$

$$\mathrm{Precision} = T_p / (T_p + F_p) \tag{13}$$

where Tp (true positives) represents the number of correctly detected vehicles, Fp (false positives) represents the number of


TABLE III

DETECTION RESULTS IN TRAFFIC VIDEO 1 (1 h)

falsely detected vehicles, and Fn (false negatives) is the number of missed vehicles. We compared our RDHOGPF method with an APF [16], an annealing PF [24], a two-stage hybrid tracker (TSHT) [25], and a tracker that fused the color and contour features [28]. These approaches all deliver good performance according to the literature. TSHT is a hybrid tracker that combines a particle filter with mean shift, the annealing PF generates and resamples the particles in different layers with scalable variance, and APF handles occlusion using a Rayleigh distribution. The system presented in [28] performed well. However, its performance was still worse than that of RDHOGPF because of the latter's ability to cope with noise and poor lighting environments. RDHOGPF delivered a better overall performance than the other approaches, regardless of the lane to which it was applied or the performance measure selected.
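For completeness, (12) and (13) reduce to a few lines of code; the counts in the example are placeholders rather than the Table III figures.

```python
def recall_precision(tp, fp, fn):
    """Detection ratio (12) and precision ratio (13)."""
    return tp / (tp + fn), tp / (tp + fp)

# example with placeholder counts
r, p = recall_precision(tp=95, fp=5, fn=5)   # 0.95 recall, 0.95 precision
```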

Table III shows the results with each approach during vehicle detection and tracking using traffic surveillance video 1. RDHOGPF delivered a Recall of 93.75% and a Precision of 95.26%. Thus, our proposed approach could detect and track the passing vehicles correctly through to the end of the ROI. Lower values of Fn and Fp indicated lower numbers of missed vehicles and falsely detected vehicles, respectively. The parameters for each lane were adjusted for the comparison of the methods because the slopes were different in each lane. For TSHT, the standard deviations of the x and y positions were set to 6, the standard deviation of the size scale was set to 0.01, ε was 1, kp was 15, and the particle number was 30. For annealing PF,

the layer number was set to 3, P0 was set to 6, the alpha value of the y position in all three lanes was set to 0.1, and the alpha values were set to 0.6, 0.4, and 0.2 for the x positions in the left, middle, and right lanes, respectively. For APF, because the left and middle lanes slanted to the right, the Rayleigh distribution was used in both the x- and y-directions. In the right lane, the normal distribution was set for the x-direction, while the

Fig. 8. RDHOGPF tracking results during the lane changing event in traffic video 1. (a) Simple lane changing case. (b) Lane changing with partial occlu-sion. (c) Lane changing from the middle lane to the right lane with strong rear taillight lighting.

Fig. 9. RDHOGPF tracking results for some complex cases in traffic video 1. (a) Partial occlusion was caused by a truck in the middle lane. (b) Partial and full occlusions caused by a truck. (c) Occurrence of partial occlusion and auto-white balance at the same time.

Rayleigh distribution was set for the y-direction. The standard deviations of x and y were set to (5, 3), (2, 3), and (6, 3) in the left, middle, and right lanes, respectively. The color and contour features were fused into the particle filter framework in [28], so its performance was good. For the proposed RDHOGPF method, the parameters for the motion model in [x y ẋ ẏ]T were set to [10 10 2 2]T and [6 6 1 1]T in traffic videos 1 and 2, respectively. The total particle number was 100.

Some representative results with the proposed RDHOGPF are shown in Figs. 8 and 9. The RDHOGPF tracking results for the lane changing vehicles are shown in Fig. 8. The images in Fig. 8(a) illustrate a simple lane change tracking case, while the images in Fig. 8(b) and (c) show that robust tracking was achieved throughout the lane change event, even when partial occlusion occurred.

Fig. 9 shows some RDHOGPF tracking results for complex cases. Fig. 9(a) illustrates partial occlusion between the left lane


TABLE IV

DETECTION RESULTS IN TRAFFIC VIDEO 2 (30 min)

and middle lane. Fig. 9(b) presents partial and full occlusions caused by a truck in the right lane, where the tracked vehicles in the middle lane and left lane were counted successfully even when the vehicle in the middle lane was fully covered by the truck. Fig. 9(c) shows that RDHOGPF handled the partial occlusion successfully and tracked the vehicle even while the auto-white balance function was executed. During the auto-white balance process, the colors in the image changed sharply, and the trajectory updating process became difficult.

Table IV shows the detection results for traffic surveillance video 2, where the test scene was a general road with low contrast because of the brightness of the light. The proposed RDHOGPF could maintain most of the trajectories until the vehicle left the ROI. Some vehicles were not detected, while some trajectories were deleted because they did not contain enough edges, but the overall Recall and Precision for RDHOGPF were both above 90%. In Fig. 10(a)–(c), the images show that the vehicles were tracked successfully when they changed from the left to the middle lane, from the middle to the left lane, and from the middle to the right lane, respectively, even when the vehicles were similar to the background image. Fig. 11 shows some results for occlusion handling during RDHOGPF tracking in traffic video 2. Fig. 11(a) and (b) shows occlusion between the left and middle lanes. The tracked sedan was covered by the roof of the bus in the right lane, but the sedan was counted successfully until the end in Fig. 11(c).

Fig. 12 compares the tracking results for traffic video 1 produced using TSHT, annealing PF, APF, and RDHOGPF. The first column in Fig. 12 illustrates that the rectangle size increased with the TSHT method and vehicle detection failed in the middle lane. The particles were more concentrated in annealing PF than in the other methods because of the three-layer enforcement.

Fig. 10. RDHOGPF tracking results for the lane changing events in traffic video 2. (a) White van changed from the left lane to the middle lane. (b) Taxi changed from the middle lane to the left lane. (c) White sedan with a similar color to the background was still tracked successfully during lane changing.

Fig. 11. Occlusion handling results by RDHOGPF in traffic video 2. (a) Partial occlusions between three vehicles. (b) Severe occlusion between two vehicles. (c) Car in the middle lane was covered by the roof of the bus.

However, the drift problem still occurred, and the trajectories were deleted when the rectangle contained nothing in the third column. Therefore, three vehicles were missed. In APF, the trajectory in the right lane could not capture all of the vehicle blobs. Therefore, a new trajectory was generated, and this vehicle was eventually counted twice in the third column. Although TSHT tracked a vehicle in the right lane and it was counted correctly, the rectangle size of the tracked vehicle was obviously too large, and the vehicle in the middle lane was missed. In the last column, the vehicles were moving in different lanes, and partial occlusion occurred, but RDHOGPF detected the different vehicles successfully in this complex traffic scene. The comparative videos can be viewed and downloaded at http://140.113.150.201/research/image.php. The RDHOGPF approach delivered a good performance, but some errors still occurred because of detection inaccuracies. The truck in Fig. 13(a) was in the middle lane, but it was falsely detected


Fig. 12. Comparison of the tracking results using different methods for traffic video 1. (a) TSHT. (b) Annealing PF. (c) APF. (d) RDHOGPF.

Fig. 13. Failure results with the proposed method. (a) False detection in the left lane. (b) False detection in the left lane and the vehicle was missed in the right lane. (c) Truck was counted twice in the middle lane. (d) White sedan was missed in the right lane. (e) Truck was counted erroneously in the left lane. (f) Motorcycle was falsely detected in the right lane.

and counted in the left lane. In the same situation, the bus was falsely counted in the left lane because the system falsely recognized the bus roof as a vehicle in Fig. 13(b). In Fig. 13(c), the truck was counted twice in the middle lane, and the false positive number increased by one in the middle lane. As shown in Fig. 13(d), some white cars were difficult to recognize when they blended into the road surface. In Fig. 13(e), despite RDHOGPF detecting and tracking the truck successfully, the trajectory was close to the lane mark, so this vehicle was counted in the left lane when it was actually in the middle lane. The car in Fig. 13(f) was falsely detected in the right lane.

To illustrate the results with the proposed RDHOGPF method at crossroads, we provided a test video that involves

Fig. 14. RDHOGPF tracking results for the red car showing the appearance and RDHOG descriptor updating. (a) Tracking of the red sedan was initiated. (b) Red sedan started to change direction at the crossroads. (c) Red sedan was partially occluded by the utility pole. (d) Presented RDHOGPF method tracked the red sedan until the end of the video. (e)–(h) Appearance of the tracked red sedan. (i)–(l) Corresponding updated RDHOG descriptors.

crossroads. This allowed us to demonstrate the performance of our proposed RDHOGPF method with numerous vehicle direction changes. The test video was derived from the Imagery Library for Intelligent Detection Systems (i-LIDS) [29], which is a well-known data set used to evaluate system performance. Fig. 14 shows a sequence where a red car changed from a vertical direction to a horizontal direction, and the corresponding appearances of the template and the HOG maps are displayed together. In Fig. 14(a), the red car was tracked, and it changed to the left lane in Fig. 14(b). In Fig. 14(e)–(h), the appearance of the template changed from a front view to a side view, and the corresponding HOG and RDHOG features are


updated in Fig. 14(i)–(l). In Fig. 14(c), the red car was partially occluded by a utility pole, but the rectangle still located the position of the red car correctly at an appropriate size. The lane on the left was oriented in a horizontal direction, so the scaling factor was set to one. Thus, the size of the rectangle was not changed in this lane. Finally, in Fig. 14(d), when the red car passed the utility pole, the proposed RDHOGPF method was not affected by the utility pole, and it tracked the red car successfully.

V. CONCLUSION

This paper proposed an effective vehicle detection and tracking approach based on RDHOG with occlusion handling. Background subtraction was used to detect vehicles by segmenting the foreground image and verifying their edge features. The trajectory was generated if the vehicle was detected three times in consecutive images. The target pattern was normalized to produce a 32 × 32 image before the HOG and RDHOG features were extracted. RDHOG was based on the HOG descriptor and was used to represent the difference between the central block and the surrounding blocks of the target. After combining the features of HOG and RDHOG, the descriptor could describe the shape of the target and discriminate between the center and the borders. To track the vehicle with a correct rectangle size, each particle had two sample images. One was of the original size, whereas the other had a scaled width and height. The position of the tracked vehicle was estimated and updated after calculating the likelihoods for vehicle tracking. We also developed a method for occlusion handling to reduce the effects of occluded vehicles by increasing the difference values for the partially obscured region. The rectangle of an older trajectory was masked, and smaller particle weights were obtained when a particle occupied this masked occluded region, so the trajectories were prevented from moving unexpectedly. Using RDHOG tracking, the drift effects were reduced in scenes with low viewing angles, low contrast, and excessive brightness. Despite changes in the appearance of vehicles when negotiating crossroads, the proposed RDHOGPF method could track the vehicle correctly. The RDHOGPF method could also track vehicles accurately when the tracked vehicles were occluded by signposts and utility poles. The RDHOGPF vehicle tracker was implemented in C++ and tested using real highway traffic scenes captured inside a tunnel, on a general road with low contrast, and on an i-LIDS sequence obtained from the U.K. government. Our experimental results and comparisons with other tracking methods demonstrated that RDHOGPF is highly effective and delivers good performance in many different traffic situations.

ACKNOWLEDGMENT

The authors would like to thank the laboratory members who helped to manually count the vehicles in the video sequences.

REFERENCES

[1] J. Zhang, F. Y. Wang, K. Wang, W. H. Lin, X. Xu, and C. Chen, “Data-driven intelligent transportation systems: A survey,” IEEE Trans. Intell. Transp. Syst., vol. 12, no. 4, pp. 1624–1639, Dec. 2011.
[2] N. Buch, S. A. Velastin, and J. Orwell, “A review of computer vision techniques for the analysis of urban traffic,” IEEE Trans. Intell. Transp. Syst., vol. 12, no. 3, pp. 920–939, Sep. 2011.
[3] Y. L. Chen, B. F. Wu, H. Y. Huang, and C. J. Fan, “A real-time vision system for nighttime vehicle detection and traffic surveillance,” IEEE Trans. Ind. Electron., vol. 58, no. 5, pp. 2030–2044, May 2011.
[4] B. F. Wu and J. H. Juang, “Adaptive vehicle detector approach for complex environments,” IEEE Trans. Intell. Transp. Syst., vol. 13, no. 2, pp. 817–827, Jun. 2012.
[5] W. Zhang, Q. M. J. Wu, X. Yang, and X. Fang, “Multilevel framework to detect and handle vehicle occlusion,” IEEE Trans. Intell. Transp. Syst., vol. 9, no. 1, pp. 161–174, Mar. 2008.
[6] S. Chen, J. Zhang, Y. Li, and J. Zhang, “A hierarchical model incorporating segmented regions and pixel descriptors for video background subtraction,” IEEE Trans. Ind. Informat., vol. 8, no. 1, pp. 118–127, Feb. 2012.
[7] L. Maddalena and A. Petrosino, “A self-organizing approach to background subtraction for visual surveillance applications,” IEEE Trans. Image Process., vol. 17, no. 7, pp. 1168–1177, Jul. 2008.
[8] M. Vargas, J. M. Milla, S. L. Toral, and F. Barrero, “An enhanced background estimation algorithm for vehicle detection in urban traffic scenes,” IEEE Trans. Veh. Technol., vol. 59, no. 8, pp. 3694–3709, Oct. 2010.
[9] W. Kim and C. Kim, “Background subtraction for dynamic texture scenes using fuzzy color histograms,” IEEE Signal Process. Lett., vol. 19, no. 3, pp. 127–130, Mar. 2012.
[10] B. Han and L. S. Davis, “Density-based multifeature background subtraction with support vector machine,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 5, pp. 1017–1023, May 2012.
[11] S. Y. Chen, “Kalman filter for robot vision: A survey,” IEEE Trans. Ind. Electron., vol. 59, no. 11, pp. 4409–4420, Nov. 2012.
[12] J. Garcia, A. Gardel, I. Bravo, J. L. Lazaro, M. Martinez, and D. Rodriguez, “Directional people counter based on head tracking,” IEEE Trans. Ind. Electron., vol. 60, no. 9, pp. 3991–4000, Sep. 2013.
[13] F. Lin, X. Dong, B. M. Chen, K.-Y. Lum, and T. H. Lee, “A robust real-time embedded vision system on an unmanned rotorcraft for ground target following,” IEEE Trans. Ind. Electron., vol. 59, no. 2, pp. 1038–1049, Feb. 2012.
[14] M. S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, “A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking,” IEEE Trans. Signal Process., vol. 50, no. 2, pp. 174–188, Feb. 2002.
[15] Y. Lao, J. Zhu, and Y. F. Zheng, “Sequential particle generation for visual tracking,” IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 9, Sep. 2009.
[16] J. Scharcanski, A. B. de Oliveira, P. G. Cavalcanti, and Y. Yari, “A particle-filtering approach for vehicular tracking adaptive to occlusions,” IEEE Trans. Veh. Technol., vol. 60, no. 2, pp. 381–389, Feb. 2011.
[17] N. Bouaynaya and D. Schonfeld, “On the optimality of motion-based particle filter,” IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 7, pp. 1068–1072, Jul. 2009.
[18] P. Pan and D. Schonfeld, “Visual tracking using high-order particle filtering,” IEEE Signal Process. Lett., vol. 18, no. 1, pp. 51–54, Jan. 2011.
[19] P. L. M. Boutterfroy, A. Bouzerdoum, S. L. Phung, and A. Beghdadi, “Integrating the projective transform with particle filtering for visual tracking,” EURASIP J. Image Video Process., vol. 2011, pp. 1–11, Jan. 2011.
[20] Z. H. Khan, I. Y. H. Gu, and A. G. Backhouse, “Robust visual tracking using multi-mode anisotropic mean shift and particle filters,” IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 1, pp. 74–87, Jan. 2011.
[21] X. Li, T. Zhang, X. Shen, and J. Sun, “Object tracking using an adaptive Kalman filter combined with mean shift,” Opt. Eng., vol. 49, no. 2, pp. 1–3, 2010.
[22] A. Makris, D. Kosmopoulos, S. Perantonis, and S. Theodoridis, “A hierarchical feature fusion framework for adaptive visual tracking,” Image Vis. Comput., vol. 29, no. 9, pp. 594–606, 2011.
[23] X. Mei and H. Ling, “Robust visual tracking and vehicle classification via sparse representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 11, pp. 2259–2272, Nov. 2011.
[24] J. Deutscher, A. Blake, and I. Reid, “Articulated body motion capture by annealed particle filter,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2000, vol. 2, pp. 126–133.
[25] E. Maggio and A. Cavallaro, “Hybrid particle filter and mean shift tracker with adaptive transition model,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2005, vol. 2, pp. 221–224.
[26] D. Monzo, A. Albiol, A. Albiol, and J. M. Mossi, “Color HOG-EBGM face recognition,” in Proc. IEEE Int. Conf. Image Process., 2011, pp. 785–788.
[27] J. Arrospide, L. Salgado, and J. Marinas, “HOG-like gradient-based descriptor for visual vehicle detection,” in Proc. IEEE Intell. Veh. Symp., 2012, pp. 223–228.
[28] J. Li, X. Lu, L. Ding, and H. Lu, “Moving target tracking via particle filter based on color and contour features,” in Proc. 2nd Int. Conf. Inf. Eng. Comput. Sci., 2012, pp. 1–4.
[29] Imagery Library for Intelligent Detection Systems (i-LIDS). [Online]. Available: https://www.gov.uk/imagery-library-for-intelligent-detection-systems#i-lids-datasets
[30] C.-Y. Chang and H. W. Lie, “Real-time visual tracking and measurement to control fast dynamics of overhead cranes,” IEEE Trans. Ind. Electron., vol. 59, no. 3, pp. 1640–1649, Mar. 2012.

Bing-Fei Wu (S’89–M’92–SM’02–F’12) received the B.S. and M.S. degrees in control engineering from National Chiao Tung University (NCTU), Hsinchu, Taiwan, in 1981 and 1983, respectively, and the Ph.D. degree in electrical engineering from the University of Southern California, Los Angeles, CA, USA, in 1992.

He joined the Department of Electrical Engineering, NCTU, in 1992, was promoted to Professor in 1998, and to Distinguished Professor in 2010. He has served as the Director of the Institute of Electrical and Control Engineering, NCTU, since 2011. His current research interests include image recognition, vehicle driving safety, intelligent control, intelligent transportation systems, and embedded systems.

Dr. Wu founded and served as the Chair of the IEEE Systems, Man, and Cybernetics Society’s (SMC) Taipei Chapter in 2003 and was elected as the Chair of the Technical Committee on Intelligent Transportation Systems of the IEEE SMC Society in 2011.

Chih-Chung Kao (S’06) received the B.S. degree from the Department of Electrical Engineering, National Taiwan Ocean University, Keelung, Taiwan, in 2004 and the M.S. degree from the Institute of Industrial Education, National Taiwan Normal University, Taipei, Taiwan, in 2006. He is currently working toward the Ph.D. degree at National Chiao Tung University, Hsinchu, Taiwan.

His research interests include image processing, pattern recognition, statistical learning, and intelligent transportation systems.

Cheng-Lung Jen (S’06) received the B.S. degree in electrical engineering from the National Chin-Yi University of Technology, Taichung, Taiwan, in 2004 and the M.S. degree in electrical engineering from the National Central University, Jhongli, Taiwan, in 2006. He is currently working toward the Ph.D. degree at National Chiao Tung University, Hsinchu, Taiwan.

His main research interests lie in the area of machine learning, wireless positioning systems, and robotics.

Yen-Feng Li was born in Taitung, Taiwan, in 1976. He received the B.S. degree in electronic engineering from the National Taiwan University of Science and Technology, Taipei, Taiwan, in 2007 and the M.S. degree in electronic engineering from National Chin-Yi University of Technology, Taichung, Taiwan, in 2009. He is currently working toward the Ph.D. degree in electrical engineering at National Chiao Tung University, Hsinchu, Taiwan.

His research interests include image processing, embedded systems, and intelligent transportation systems.

Ying-Han Chen was born in Tainan, Taiwan, in 1981. He received the B.S. and M.S. degrees in electrical engineering from National Central University, Jhongli, Taiwan, in 2003 and 2006, respectively. He is currently working toward the Ph.D. degree in electrical engineering at National Chiao Tung University, Hsinchu, Taiwan.

His research interests include networking, embedded systems, and digital signal processing.

Jhy-Hong Juang was born in I-Lan, Taiwan, in 1974. He received the B.S. degree in control engineering and the M.S. and Ph.D. degrees in electrical and control engineering from National Chiao Tung University (NCTU), Hsinchu, Taiwan, in 1997, 1999, and 2012, respectively.

He is currently with NCTU. His research interests include image processing, software engineering, and pattern recognition.
