Chapter 1 Introduction
1.2 Organization
This thesis is organized as follows. Chapter II gives an overview of related work in this research area, organized by module. Chapter III introduces our proposed system; we present the framework and the algorithms of its four sub-systems: background reconstruction, objects extraction, objects tracking and behavior analysis. Chapter IV presents experimental results of our implementation of the proposed algorithms on self-made traffic videos of intersections and on sample videos from surveillance systems. Finally, Chapter V concludes this study.
Chapter 2 Related Work
A tracking system is composed of three main modules: objects extraction, objects tracking and behavior analysis. The objects extraction module segments moving objects from the input video or frames. The most common approach to segmenting objects is background subtraction. Therefore, this module needs to reconstruct a proper background model and establish an efficient algorithm to update the background in order to cope with environmental changes. It then extracts the moving objects and eliminates noise and undesired objects. The objects tracking module extracts significant features and uses them with a tracking algorithm to obtain the optimal matching between previous objects and current objects. Finally, the last module analyzes objects’ properties and recognizes their behaviors to extract useful traffic parameters. In this chapter, we review research related to these main modules.
2.1 Objects Extraction
Foreground segmentation is the first step of objects extraction; its purpose is to detect regions corresponding to moving objects such as vehicles and pedestrians. The objects tracking and behavior analysis modules then only need to focus on those regions. Three conventional approaches to foreground segmentation are outlined below:
1) Optical flow. Optical-flow-based motion segmentation uses characteristics of the flow vectors of moving objects over time to detect moving regions in an image sequence. This method is often applied to 3D-reconstruction research [1] or activity analysis work [2].
Optical-flow-based methods can also be used to discriminate between different moving groups, e.g., the optical flow of the background resulting from camera motion is different from the optical flow resulting from moving objects. Several algorithms have been proposed to assist in solving the optical flow equations. The main methods used in optical flow frameworks are differential techniques, region-based matching, energy-based methods and phase-based methods. Barron [3] presented these optical flow approaches and evaluated their performance and measurement accuracy. Meyer et al. [2] computed the displacement vector field to initialize a contour-based tracking algorithm for the extraction of articulated objects. However, the optical flow method is computationally complex and very sensitive to noise, and it cannot be applied to video streams in real time without specialized hardware.
2) Temporal differencing. This method takes the differences between two or three consecutive frames in an image sequence to extract moving regions. Compared with the optical flow method, temporal differencing requires less computation and is easy to implement in real-time tracking systems. Moreover, temporal differencing adapts well to dynamic environments. However, it is poor at extracting all the relevant pixels, e.g., holes may be left inside moving objects. Some research used three-frame differencing instead of the two-frame process. Lipton et al. [4] used the temporal differencing method to detect moving objects in real video streams.
3) Background subtraction. Background subtraction is a simple and popular method for motion segmentation, especially in situations with a relatively static background. It detects moving regions by taking the pixel-by-pixel difference between the current image and a reference background image. It extracts complete and clear object regions well. However, it is sensitive to changes in dynamic environments caused by lighting and other extraneous factors. Hence, a good background model is indispensable for reducing the influence of these changes. Haritaoglu et al. [5] built a statistical model by representing each pixel with three values: its minimum and maximum intensity values, and the maximum intensity difference between consecutive frames observed during the training period. These three values are updated periodically.
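The contrast between temporal differencing and background subtraction can be illustrated with a minimal sketch. It is not taken from any of the cited works; the function name `difference_mask`, the threshold of 50 and the toy one-row frames are assumptions chosen for illustration. Both techniques reduce to thresholding a per-pixel difference against a reference image, and they differ only in which reference is used.

```python
def difference_mask(reference, frame, threshold):
    """Return a binary mask of pixels that differ from `reference` by more than `threshold`."""
    return [
        [abs(pixel - ref) > threshold for ref, pixel in zip(ref_row, row)]
        for ref_row, row in zip(reference, frame)
    ]


def show(mask):
    """Render a mask as text rows: '#' for moving pixels, '.' for static pixels."""
    return ["".join("#" if m else "." for m in row) for row in mask]


# Toy data (assumed): one image row in which a uniform object (intensity 200)
# moves one pixel to the right over a flat background (intensity 10).
background = [[10, 10, 10, 10, 10, 10, 10, 10, 10, 10]]
previous = [[10, 10, 200, 200, 200, 200, 10, 10, 10, 10]]
current = [[10, 10, 10, 200, 200, 200, 200, 10, 10, 10]]

# Temporal differencing: the reference is the previous frame.
temporal_mask = difference_mask(previous, current, threshold=50)
# Background subtraction: the reference is the background image.
background_mask = difference_mask(background, current, threshold=50)

print(show(temporal_mask))    # only pixels at the two ends of the motion are flagged
print(show(background_mask))  # every pixel covered by the object is flagged
```

In this toy case the interior of the uniform object is unchanged between consecutive frames, so temporal differencing leaves a hole there, while background subtraction recovers the complete object region as long as the reference background is accurate.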
Besides the basic methods described above, there are other approaches and combined methods for foreground segmentation. Elgammal et al. [6], [22] presented nonparametric kernel density estimation techniques as a tool for constructing statistical representations of the scene background and foreground regions in video surveillance. Their model achieved sensitive detection of moving targets against cluttered backgrounds. Kamijo [7] proposed a spatio-temporal Markov random field model for the segmentation of spatio-temporal images.
Kato et al. [8] used a hidden Markov model/Markov random field (HMM/MRF)-based segmentation method that is capable of classifying each small region of an image into three different categories: vehicles, shadows of vehicles, and backgrounds. The method provided a way to model the shadows of moving objects as well as background and foreground regions.
As mentioned previously, the active construction and updating of the background are important to an object tracking system. Therefore, recovering and updating background images from a continuous image sequence automatically is a key process. Unfavorable conditions, such as illumination variance, shadows and shaking branches, make the acquisition and updating of background images difficult. Many algorithms have been proposed to resolve these problems. Median filtering on each pixel with thresholding based on hysteresis was used in [9] to build a background model. Friedman et al. [10] used a mixture of three Gaussians for each pixel to represent the foreground, background, and shadows, with an incremental version of the EM (expectation maximization) method. Ridder et al. [11] modeled each pixel value with a Kalman filter to cope with illumination variance. Stauffer et al. [12] presented a theoretical framework for updating the background in which a mixture-of-Gaussians model and online estimation were used. McKenna et al. [13] used an adaptive background model with color and gradient information to reduce the influence of shadows and unreliable color cues. Cucchiara et al. [14] built on the background subtraction method and combined statistical assumptions with object-level knowledge of moving objects to update the background model and deal with shadows. They also used the optical flow method to improve object segmentation. Li et al. [15] proposed a Bayesian framework that incorporated spectral, spatial, and temporal features to characterize the background appearance. Under this framework, a novel learning method was used to adapt to both gradual and sudden background changes.
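As a simple illustration of per-pixel background modeling, the following sketch computes a temporal median over a short stack of frames. It only conveys the general idea behind median-based models such as the one used in [9]; the hysteresis thresholding mentioned there is omitted, and the function name `median_background` and the toy frames are assumptions.

```python
from statistics import median


def median_background(frames):
    """Estimate a background image as the per-pixel temporal median of `frames`."""
    rows, cols = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[i][j] for frame in frames) for j in range(cols)]
        for i in range(rows)
    ]


# Toy data (assumed): a bright object (200) crosses a flat background (10),
# so each pixel shows the background in most of the five frames.
frames = [
    [[10, 200, 10, 10]],
    [[10, 10, 200, 10]],
    [[10, 10, 10, 200]],
    [[10, 10, 10, 10]],
    [[200, 10, 10, 10]],
]

estimated = median_background(frames)
print(estimated)
```

The median is robust here because the object covers each pixel in fewer than half of the frames; a pixel occupied by foreground for most of the window would be estimated incorrectly, which is one reason the adaptive models reviewed above update the background continuously.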
In our system, we propose a foreground segmentation framework based on background subtraction and temporal differencing. We also introduce an adaptive background updating algorithm that uses a statistical index. It copes effectively with both gradual and sudden changes of the environment.
2.2 Objects Tracking
Besides foreground segmentation, objects tracking is another key module of almost all surveillance systems. The purpose of the tracking module is to track moving objects from one frame to another in an image sequence. The tracking algorithm needs to match the observed objects to the corresponding objects detected previously. Useful mathematical tools for objects tracking include the Kalman filter, the condensation algorithm, the dynamic Bayesian network, the geodesic method, etc. Hu et al. [16] identified four major categories of tracking algorithms: region-based, active-contour-based, feature-based, and model-based tracking algorithms. Firstly, region-based tracking algorithms [17] depend on the variation of the image regions corresponding to the moving objects. The motion regions are usually detected by subtracting the background from the current image. Secondly, active-contour-based tracking algorithms represent the outline of moving objects as contours. These algorithms have been successfully applied to vehicle tracking [18]. Thirdly, feature-based tracking algorithms perform the recognition and tracking of objects by extracting elements, clustering them into higher-level features, and then matching the features between images. The global features used in feature-based algorithms include centroids, perimeters, areas, some orders of quadratures, and colors [19], etc. Fourthly, model-based tracking algorithms localize and recognize vehicles by matching a projected model to the image data. Tan et al. [20] proposed a generalized Hough transformation algorithm based on a single characteristic line segment matching an estimated vehicle pose.
In addition, much research has presented tracking algorithms that integrate different categories for better tracking performance. McKenna et al. [13] proposed a tracking algorithm at three levels of abstraction: regions, people, and groups, in indoor and outdoor environments. Each region has a bounding box, and regions can merge and split. A human is composed of one or more regions under geometric constraints, and a human group consists of one or more people grouped together. Cucchiara et al. [21] presented a multilevel tracking scheme for monitoring traffic. The low level consists of image processing, while the high-level tracking is implemented as a knowledge-based forward chaining production system. Veeraraghavan et al. [23] used a multilevel tracking approach with a Kalman filter for tracking vehicles and pedestrians at intersections. The approach combined low-level image-based blob tracking with high-level Kalman filtering for position and shape estimation. An intermediate occlusion-reasoning module served the purpose of detecting occlusions and filtering relevant measurements. Chen et al. [24] proposed a learning-based automatic framework to support multimedia data indexing and querying of the spatio-temporal relationships of vehicle objects. The relationships were captured via an unsupervised image/video segmentation method and an object tracking algorithm, and modeled using a multimedia augmented transition network (MATN) model and multimedia input strings. Useful information was indexed and stored in a multimedia database for further information retrieval and query. Kumar et al. [25] presented a tracking algorithm that combined Kalman filter-based motion and shape tracking with a pattern matching algorithm. Zhou et al. [26] presented an approach that incorporates appearance-adaptive models in a particle filter to realize robust visual tracking and recognition algorithms. Nguyen et al. [27] proposed a method for object tracking in image sequences using template matching. To update the template, appearance features are smoothed temporally by robust Kalman filters, one for each pixel.
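Since several of the works above rely on Kalman filtering, a minimal sketch of a one-dimensional constant-velocity Kalman filter may help fix the idea. It does not reproduce the filter of any cited author; the class name, the diagonal process-noise simplification and all numeric parameters are assumptions chosen for illustration.

```python
class ConstantVelocityKalman1D:
    """Kalman filter with state (position, velocity) and position-only measurements."""

    def __init__(self, position, velocity, variance, process_noise, measurement_noise):
        self.p = position
        self.v = velocity
        # State covariance [[a, b], [c, d]] for (position, velocity).
        self.a, self.b, self.c, self.d = variance, 0.0, 0.0, variance
        self.q = process_noise      # added to the covariance diagonal (simplification)
        self.r = measurement_noise  # variance of the position measurement

    def predict(self, dt=1.0):
        """Propagate the state with x' = F x and P' = F P F^T + Q, F = [[1, dt], [0, 1]]."""
        self.p += dt * self.v
        a, b, c, d = self.a, self.b, self.c, self.d
        self.a = a + dt * (b + c) + dt * dt * d + self.q
        self.b = b + dt * d
        self.c = c + dt * d
        self.d = d + self.q

    def update(self, z):
        """Correct the prediction with a measured position z (H = [1, 0])."""
        residual = z - self.p
        s = self.a + self.r                # innovation variance
        k0, k1 = self.a / s, self.c / s    # Kalman gain
        self.p += k0 * residual
        self.v += k1 * residual
        a, b, c, d = self.a, self.b, self.c, self.d
        self.a, self.b = (1 - k0) * a, (1 - k0) * b
        self.c, self.d = c - k1 * a, d - k1 * b


# Toy data (assumed): centroid x-coordinates of an object moving 2 pixels per frame.
tracker = ConstantVelocityKalman1D(position=0.0, velocity=0.0, variance=100.0,
                                   process_noise=0.01, measurement_noise=1.0)
for k in range(1, 21):
    tracker.predict()
    tracker.update(2.0 * k)

print(round(tracker.p, 2), round(tracker.v, 2))
```

The predicted position from `predict` is what a tracker would compare against the objects detected in the next frame, and the matched detection is then fed back through `update`. Here the velocity is never measured directly; the filter infers it from successive position measurements.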
Regarding the cameras of surveillance systems, fixed cameras, active cameras and multiple cameras are used for capturing surveillance video. Kang et al. [28] presented an approach for continuous tracking of moving objects observed by multiple, heterogeneous cameras; the approach processed video streams from stationary and Pan-Tilt-Zoom cameras. In addition, much research used fixed cameras for the convenience of system construction and of combining with traditional surveillance systems.
In this thesis, we combine region-based and feature-based tracking methods and use multiple features as effective inputs to the tracking analysis. The proposed algorithm handles multiple objects with occlusion events or split events well.
2.3 Behavior Analysis
Understanding objects’ behavior and extracting useful traffic parameters are the main tasks after the moving objects have been successfully tracked in the image sequences. Behavior understanding involves the analysis and recognition of objects’ motion and the production of high-level descriptions of actions and interactions. The summarized useful information is then presented via a user interface or other output methods.
Traffic information is also an important tool in the planning, maintenance, and control of any modern transport system. Traffic engineers are interested in parameters of traffic flow such as volume, speed, type of vehicles, traffic movements at junctions, etc. Fathy [29] presented a novel approach based on applying edge-detection techniques to key regions or windows to measure traffic parameters such as traffic volume and type of vehicles. Jung et al. [30] proposed a traffic flow extraction method based on the velocity and trajectory of the moving vehicles. They estimated traffic parameters, such as the vehicle count and the average speed, and extracted the traffic flows. Kumar et al. [31] proposed target classification in traffic videos using Bayesian networks (BNs). Using the tracking results and the results of classification, world coordinate estimates of target position and velocity were obtained. A priori knowledge of context and predefined scenarios was used for behavior recognition. Haag and Nagel [32] proposed a system for incremental recognition of traffic situations. They used fuzzy metric temporal logic (FMTL) as an inference tool to handle uncertainty and temporal aspects of action recognition. In their system, all actions were modeled using predefined situation trees. Remagnino et al. [33] presented an event-based visual surveillance system for monitoring vehicles and pedestrians that supplies word descriptions for dynamic activities in 3-D scenes. In [34], an approach for the interpretation of dynamic object interactions in temporal image sequences using fuzzy sets and measures was presented. A multidimensional filter-based tracking algorithm was used to track and classify moving objects. Uncertainties in the assignment of trajectories and the descriptions of objects were handled by fuzzy logic and fuzzy measures.
Recently, traffic incident detection employing computer vision and image processing has attracted much attention. Ikeda et al. [35] outlined an automatic abnormal incident detection system based on image-processing technology. This system was used to detect four types of incidents: stopped vehicles, slow vehicles, fallen objects, and vehicles that attempted lane changes. Trivedi et al. [36] described a novel architecture for developing distributed video networks for incident detection and management. The networks utilized both rectilinear and omni-directional cameras. Kamijo et al. [37] developed an accident detection method based on tracking results that can be generally adapted to intersections. The accident detection algorithm used a simple left-to-right HMM. Lin et al. [38] proposed an image tracking module with active contour models and Kalman filtering techniques to perform vehicle tracking. The system provided three types of traffic information: the velocity of multi-lane vehicles, the number of vehicles and car accident detection. Veeraraghavan et al. [23] presented a visualization module. This module was useful for visualizing the results of the tracker and served as a platform for the incident detection module. Hu et al. [39] proposed a probabilistic model for predicting traffic accidents using three-dimensional (3-D) model-based vehicle tracking. Vehicle activity was predicted by locating and matching each partial trajectory with the learned activity patterns, and the occurrence probability of a traffic accident was determined.
We propose a framework for accident prediction based on objects’ properties, such as velocity, size and position. In addition, according to some preset information, our system can also classify objects accurately. The useful information is presented in a GUI module and is easy to understand.
Chapter 3 Multi-objects Tracking System with Adaptive Background Reconstruction
In this chapter, we present our system structure and the details of the proposed algorithms. The system structure is composed of four sub-systems: foreground segmentation, objects extraction, objects tracking and behavior analysis. In section 3.1, we use a diagram of the global system to show the four sub-systems and their key modules. In section 3.2, we present the framework of foreground segmentation and the adaptive background reconstruction technique. In section 3.3, we present the approach and algorithms of objects extraction. In section 3.4, we present the framework of objects tracking and its relevant algorithms. In section 3.5, we present the behavior analysis module and its analysis algorithms.
3.1 System Overview
First, the foreground segmentation module directly uses the raw data of the surveillance video as input. This sub-system also updates the background image and applies the segmentation algorithm to extract the foreground image. Next, the foreground image is processed with morphological operations and the connected components method to extract individual objects. At the same time, object-based features are extracted from the image of extracted objects.
The main work of the third sub-system is to track objects. The tracking algorithm feeds significant object features into the analysis process to find the optimal matching between previous objects and current objects. Occlusion and other interactions of moving objects are also handled in this sub-system. After moving objects are tracked successfully, consistent labels are assigned to the correct objects.
Finally, objects’ behavior is analyzed and recognized. Useful traffic parameters are extracted and shown in the user interface. The diagram of the global system is shown in Fig. 1.
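The objects extraction step described above can be sketched as a connected components pass over a binary foreground mask. This is a minimal illustration rather than the implementation used in this thesis: the morphological pre-processing is omitted, 4-connectivity is assumed, and the function name `extract_objects`, the `min_area` noise filter and the feature set (area, bounding box, centroid) are hypothetical choices.

```python
from collections import deque


def extract_objects(mask, min_area=1):
    """Label 4-connected foreground regions and return simple per-object features."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Breadth-first search collects every pixel of one component.
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            pixels = []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and mask[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if len(pixels) < min_area:
                continue  # treat very small regions as noise
            rs = [p[0] for p in pixels]
            cs = [p[1] for p in pixels]
            objects.append({
                "area": len(pixels),
                "bbox": (min(rs), min(cs), max(rs), max(cs)),  # top, left, bottom, right
                "centroid": (sum(rs) / len(rs), sum(cs) / len(cs)),
            })
    return objects


# Toy foreground mask (assumed): two objects and one isolated noise pixel.
mask = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [0, 0, 0, 1, 1, 0],
    [1, 0, 0, 0, 0, 0],
]

for obj in extract_objects(mask, min_area=2):
    print(obj)
```

Features such as the area, bounding box and centroid computed here are the kind of object-based features that a tracking stage can use when matching previous objects to current objects.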
3.2 Foreground Segmentation
The purpose of first sub-system is to extract foreground image. At first, we input the raw
Fig. 1 Global system diagram (sub-systems: Foreground Segmentation, Objects Extraction, Objects Tracking, Behavior Analysis; blocks: Frames, Color Channels, Adaptive B/G Updating, Adaptive Threshold, Foreground, Pre-Processing, Mask Images, Connected Components, Object Features, Objects Lists, Region Matching, Matching Analysis, Modified Bounding Box, Objects Classification, Accident Prediction, Traffic Parameters Extraction)
main processes of this sub-system are foreground segmentation and background reconstruction. Regarding segmentation, there are three basic techniques: 1) frame differencing, 2) background subtraction, and 3) optical flow. Frame differencing easily produces small regions that are difficult to separate from noise when the objects are not sufficiently textured. Optical flow computations are very intensive and difficult to perform in real time. In [10], a probabilistic approach to segmentation is presented; the authors used the expectation maximization (EM) method to classify each pixel as moving object, shadow or background. In [31], Kumar proposed a background subtraction technique to segment the moving objects from image sequences, in which the background pixels were modeled with a single Gaussian distribution. In [17], Gupte used a self-adaptive background subtraction method for segmentation.
In almost all surveillance conditions, the video camera is fixed and the background can be regarded as a stationary image, so the background subtraction method is the simplest way to segment moving objects. That is why we adopt this method as the basis of our segmentation algorithm. In addition, the results of frame differencing and the condition of previous objects are also used to make the segmentation more reliable. The process of foreground segmentation and background reconstruction is shown in Fig. 2.
3.2.1 Background Initialization
Before segmenting the foreground from the image sequences, the system needs to construct the initial background image for further processing. The basic idea for finding the background pixels is that the background has a high probability of appearing. Over a continuous duration of surveillance video, the level that appears most frequently at each pixel is almost always its background level. Based on this concept, there are several approaches to finding the background image, such as the classify-and-cluster method and the Least-Median-Squares (LMedS) method. We use a simpler method: if a pixel’s value stays within a criterion for several consecutive frames, the probability of this value appearing is locally high, i.e., this value is locally stable. This value is regarded as the background value of this pixel. The pixel value in the background buffer is then duplicated to the corresponding pixel in the initial background image.
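The idea of accepting a locally stable value as the background can be sketched as follows. This is only a hedged illustration of the description above and does not reproduce Eq. (1), (2) and (3): it works on a single channel, the stability criterion is assumed to be an absolute difference against the buffered value, and the names `init_background`, `tolerance` and `th_appear` are hypothetical stand-ins for the criterion and the appearance threshold.

```python
def init_background(frames, tolerance, th_appear):
    """Build an initial background from values that stay stable for th_appear frames.

    Pixels that never become stable are left as None.
    """
    rows, cols = len(frames[0]), len(frames[0][0])
    buffer = [row[:] for row in frames[0]]      # candidate background values
    hit = [[1] * cols for _ in range(rows)]     # consecutive appearances per pixel
    background = [[None] * cols for _ in range(rows)]
    for frame in frames[1:]:
        for i in range(rows):
            for j in range(cols):
                if abs(frame[i][j] - buffer[i][j]) <= tolerance:
                    hit[i][j] += 1              # value is still within the criterion
                else:
                    buffer[i][j] = frame[i][j]  # start counting a new candidate
                    hit[i][j] = 1
                if hit[i][j] >= th_appear and background[i][j] is None:
                    # Copy the stable buffered value into the initial background.
                    background[i][j] = buffer[i][j]
    return background


# Toy data (assumed): the middle pixel is covered by a bright object (200)
# in the first two frames and shows the background afterwards.
frames = [
    [[10, 200, 12]],
    [[11, 200, 12]],
    [[10, 10, 13]],
    [[10, 11, 12]],
    [[12, 10, 12]],
    [[10, 10, 12]],
]

initial_background = init_background(frames, tolerance=5, th_appear=3)
print(initial_background)
```

In this toy sequence the object does not stay at the middle pixel long enough to reach the threshold, so its value is discarded and the background value is accepted once it has appeared in three consecutive frames.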
This method can build an initial background image automatically even though there are
Fig. 2 The process diagram of foreground segmentation (blocks: Current Frame, Previous Frame, Frame Diff. Image, Objects Life Mask, Background Reconstruction, Current B/G Image, B/G Diff. Image, B/G Temp. Image, Adaptive Updating, Objects Filter, Foreground Image)
establishing equation is Eq. (1), (2) and (3). In these equations, the superscript C denotes the color channel. In our segmentation algorithm, we use the R, G and intensity channels for the background subtraction process. Hit(i,j) records the number of consecutive times the same value has appeared at pixel(i,j), and Thappear is the threshold on this number of appearances. The σBG is a