
Chapter 2. Related Work

2.1 Related work on Daytime Surveillance

In the past, many works related to daytime traffic surveillance have been proposed [1]. In the following sections, methods for roadway detection, object detection, shadow elimination, and traffic surveillance video analysis are introduced.

2.1.1 Roadway Detection

In traffic surveillance videos, the roadway is the only region of interest; the remaining regions in the video frames are irrelevant. Identifying this region in advance therefore reduces the computation of video processing and decreases errors caused by moving objects outside the roadway. In addition, discovering the center line of a bidirectional roadway helps monitor the traffic flows in the two directions separately.

Li and Chen [2] propose an algorithm that detects the lane boundaries of a roadway using the Multi-resolution Hough Transform [3] without a priori knowledge of road geometry or training data; the region between the detected boundaries is then regarded as the roadway region. Lai et al. [4] put forward a method to detect multiple lanes in a traffic scene using lane marking information and orientation. However, as shown in Figure 2, lane markings come in many types, such as solid lines, double solid lines, and dotted lines, on different kinds of roadway. Therefore, finding the correct lane markings is not a simple task. Furthermore, the lane markings of the roadway are not always visible. Hence, other researches address roadway detection without lane marking information.

Figure 2. Lane markings with different shapes and colors on the roadway.
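
As a rough illustration of Hough-based lane boundary detection of the kind used in [2, 4], the following sketch extracts candidate lane segments from a frame with OpenCV; the edge-detector and Hough parameters, and the angle filter, are illustrative assumptions rather than the authors' settings.

```python
# A minimal sketch of Hough-based lane boundary detection; all parameter
# values here are illustrative assumptions, not taken from [2] or [4].
import cv2
import numpy as np

def detect_lane_boundaries(frame_bgr):
    """Return candidate lane-boundary segments as (x1, y1, x2, y2) tuples."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)               # edge map of the scene
    # Probabilistic Hough transform: collinear edge pixels vote for lines.
    segments = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                               threshold=60, minLineLength=40, maxLineGap=10)
    if segments is None:
        return []
    # Keep segments closer to vertical, which are more likely lane
    # boundaries under a typical surveillance viewpoint (an assumption).
    lanes = []
    for x1, y1, x2, y2 in segments[:, 0]:
        angle = abs(np.degrees(np.arctan2(y2 - y1, x2 - x1)))
        if 30 < angle < 150:
            lanes.append((x1, y1, x2, y2))
    return lanes
```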

Stewart et al. [5] present an automatic lane-finding algorithm based on detecting a region of significant change. The roadway region in a traffic scene is generated by accumulating the differences between consecutive frames after removing noise and sudden changes in brightness. A limitation of their algorithm, however, is that the roadway must be parallel to the camera's shooting direction. Pumrin and Dailey [6] later improve the algorithm to detect the roadway region from a variety of camera angles: to generate a roadway region mask, one hundred frames of moving-edge images are accumulated and the holes are filled with a convex hull algorithm, and two successive activity-region masks generated from two successive sets of one hundred frames are compared to detect camera motion. In [7], Lee and Ran put forward a method that detects a bidirectional roadway by accumulating the moving parts of the difference image between consecutive frames and then finds a center line to separate the roadway into two parts with different directions. Nevertheless, their methods are affected by unbalanced traffic flow across lanes and constrained to three roadway types; as shown in Figure 3, the roadway may extend to the (a) bottom-right, (b) bottom-mid, or (c) bottom-left. Moreover, a clear gap must exist between the two directional parts for center line estimation. A rough sketch of the difference-accumulation idea appears below.
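
The following is a minimal sketch of the difference-accumulation approach, assuming grayscale frames and illustrative thresholds; a morphological closing stands in here for the convex-hull hole filling of [6].

```python
# A minimal sketch of activity-accumulation roadway detection in the spirit
# of [5, 6]: frame differences are accumulated over time and thresholded
# into a roadway mask. Thresholds and the clean-up step are assumptions.
import cv2
import numpy as np

def roadway_mask(gray_frames, diff_thresh=15, activity_ratio=0.05):
    """gray_frames: iterable of equally sized uint8 grayscale frames."""
    frames = iter(gray_frames)
    prev = next(frames)
    votes = np.zeros(prev.shape, dtype=np.int32)
    n = 0
    for cur in frames:
        moving = cv2.absdiff(cur, prev) > diff_thresh  # per-pixel motion vote
        votes += moving
        prev = cur
        n += 1
    # Pixels active in a sufficient fraction of frames belong to the roadway.
    mask = (votes > activity_ratio * n).astype(np.uint8) * 255
    # Fill small holes left by slow or untextured vehicle regions
    # (a morphological closing instead of the convex hull used in [6]).
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))
    return cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
```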



Figure 3. Three roadway types. (a) Bottom-right. (b) Bottom-mid. (c) Bottom-left.

Therefore, in order to overcome the drawbacks of previous works, our framework performs roadway detection and bidirectional roadway analysis without any lane marking information and without restricting the roadway to specific types.

2.1.2 Object Detection

Detecting moving objects is an important and useful technique for video understanding. Many techniques have been developed, and they can be classified into four categories: background subtraction, segmentation, point detectors, and supervised learning [8]. Among these, background subtraction is a widely used method for detecting moving objects in videos captured by static cameras. Its rationale is to detect moving objects from the significant differences between the current frame and a reference frame, often called the "background image" or "background model". However, this method suffers when the background varies. Thus, the background image must represent the scene without moving objects and must be regularly updated to adapt to changes in scene geometry and illumination [9]. Since the roadway surface is even and smooth in most cases and traffic surveillance cameras are static, we employ background subtraction for its foreground integrity and low computational complexity in extracting the moving vehicles on the roadway.
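
The rationale above reduces to a per-pixel comparison against the reference image. A minimal sketch, assuming grayscale frames and an illustrative threshold:

```python
# A minimal sketch of the background-subtraction rationale: foreground
# pixels are those differing significantly from a reference background
# image. The threshold value is an illustrative assumption.
import cv2

def foreground_mask(frame_gray, background_gray, thresh=25):
    diff = cv2.absdiff(frame_gray, background_gray)   # |I_t - B|
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    return mask
```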


A variety of algorithms and techniques for background subtraction have been developed to detect vehicles. Averaging [10] and taking the median [11] of a sequence of frames are the most basic ways to construct the background image. Subsequently, Chen et al. [12] put forward a background construction method that calculates the frequency of pixel intensity values during a training period: for each pixel position, the frequency ratios of the intensity values across frames are calculated, and the intensity value with the highest ratio is taken into the background image. The background image is then updated by repeating the initialization operation. These methods are fast but memory-consuming, and they provide no explicit way to choose a threshold for segmenting out the foreground. Hence, Wren et al. [13] propose a running Gaussian average background model, which fits a Gaussian probability density function to the latest n values at each pixel location. In addition to its low memory requirement, the most significant improvement is that the threshold is determined automatically from the standard deviation.
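
A minimal sketch of the running Gaussian average, in the spirit of Wren et al. [13]; the learning rate and the threshold factor k are illustrative assumptions:

```python
# Per-pixel running Gaussian average background model. The mean and
# variance are updated with an exponential moving average; a pixel is
# foreground when it deviates by more than k standard deviations.
import numpy as np

class RunningGaussianBackground:
    def __init__(self, first_frame, alpha=0.02, k=2.5):
        self.mean = first_frame.astype(np.float64)
        self.var = np.full(first_frame.shape, 25.0)   # initial variance guess
        self.alpha, self.k = alpha, k

    def apply(self, frame):
        frame = frame.astype(np.float64)
        # Threshold derived automatically from the standard deviation.
        fg = np.abs(frame - self.mean) > self.k * np.sqrt(self.var)
        # Update the Gaussian parameters toward the new observation.
        diff = frame - self.mean
        self.mean += self.alpha * diff
        self.var = (1 - self.alpha) * self.var + self.alpha * diff ** 2
        return fg
```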

In some conditions, different objects are likely to appear at the same location over time. Therefore, several approaches have been proposed to deal with multi-modal background distributions. Stauffer and Grimson [14] make the case for a multi-valued background model capable of coping with multiple background objects. The recent history of each pixel, called the pixel process, is modeled by a mixture of K Gaussian distributions, and each pixel is classified as foreground or background according to whether it matches one of the distributions of its pixel process. For highly volatile environments, Elgammal et al. [15] propose modeling the background with a non-parametric model based on kernel density estimation over the last n values. This method rapidly forgets the past and concentrates on recent observations, and it is more accurate because it avoids the inevitable errors of parameter estimation, which requires a great amount of data to be accurate and unbiased.
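
A Gaussian-mixture background model in the spirit of Stauffer and Grimson [14] is available in OpenCV; the following usage sketch assumes a readable video file named traffic.avi (a hypothetical path):

```python
# Usage sketch of OpenCV's MOG2 Gaussian-mixture background subtractor.
import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                                detectShadows=True)
cap = cv2.VideoCapture("traffic.avi")   # hypothetical input file
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)      # 255 = foreground, 127 = shadow
cap.release()
```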

On the other hand, traditional background subtraction approaches model only the temporal variation of each pixel. In the real world, however, there is also spatial variation due to dynamic backgrounds such as waving trees and camera jitter, which significantly degrades the performance of traditional methods. A spatial-temporal nonparametric background subtraction approach is proposed by Zhang et al. [16] to handle dynamic backgrounds by modeling the spatial and temporal variations at the same time. Various other background subtraction methods suitable for different environments are reviewed and discussed in [1, 16, 17].

2.1.3 Shadow Elimination

When moving objects are detected in outdoor images, shadows are often extracted along with the objects, and separate objects may be connected through their shadows. Both conditions cause failures in object detection. However, separating moving objects from shadows is not a trivial task. As shown in Figure 4, shadows can be generally categorized into cast shadow and self shadow [18]. The self shadow is the portion of the object not illuminated by the light source, while the cast shadow lies beside the object and does not belong to it. For object detection and many other applications, the cast shadow is undesired and should be eliminated, while the self shadow is part of the object and needs to be preserved. However, cast shadow and self shadow are similar in intensity, so distinguishing between them becomes a serious challenge. Moreover, if an object's intensity is close to that of its shadow, shadow elimination is extremely difficult. Even when object and shadow can be separated, the object shape is often incomplete due to imprecise shadow removal.


Figure 4. Cast shadow and self shadow [18].

To confront these knotty problems, various methods have been proposed in recent years to suppress cast shadows. Intensity, color, and texture are the most notable features of shadows. Because the distribution of intensity within a shadow is not uniform in real environments, Wang et al. [18] develop a method that estimates shadow attributes by sampling points on the edges of the cast shadow and removes the shadow according to those attributes. A subsequent process then recovers the object shape from object-edge information and the shadow attributes to avoid over-elimination. Song et al. [19] remove shadows by exploiting the different properties of shadows and objects under an RGB chroma model. Liu [20] introduces a method that uses gradient features to eliminate shadows, based on the observation that a shadow region presents the same textural characteristics as the corresponding background image.

Based on prior knowledge, Yoneyama et al. [21] simplify 3D solid cuboid models to a 2D joint vehicle-shadow model for eliminating cast shadow. Six types of vehicle-shadow models are matched against the extracted vehicle, using the luminance of the shadow to differentiate vehicle from shadow. In addition, Chien et al. [22] remove shadows with a mathematical analysis model. Different from the methods mentioned above, Hsieh et al. [23] use lane geometries as an important cue to eliminate undesirable shadows even when the intensity, color, and texture of the vehicles are similar to those of the cast shadow.


A summary of general observations on cast shadow and background in roadway scenes is given by Xie et al. [24]: (1) pixels of cast shadow fall on the same surface as the background; (2) cast shadow pixels are darker than the background in all three color channels; (3) the background is mostly roadway surface, which is often monochrome in traffic scenes, so the hue values in the cast shadow region are small; (4) the edge pixels of the cast shadow are significantly fewer than those of the vehicle.
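
As a rough illustration, observations (1) and (2) can be turned into a simple per-pixel shadow test; the ratio bounds below are illustrative assumptions, not values from [24]:

```python
# Per-pixel shadow candidate test: a cast-shadow pixel is darker than the
# background in every color channel, but not arbitrarily darker (very dark
# pixels are more likely vehicle body). Bounds are assumptions.
import numpy as np

def cast_shadow_mask(frame_bgr, background_bgr, low=0.4, high=0.9):
    f = frame_bgr.astype(np.float64)
    b = background_bgr.astype(np.float64) + 1e-6   # avoid division by zero
    ratio = f / b
    # Darker than the background in all channels, within plausible bounds.
    return np.all((ratio > low) & (ratio < high), axis=2)
```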

2.1.4 Traffic Surveillance Video Analysis

For analyzing the content of a video, the trajectories of moving objects provide much information, and object tracking is an essential means of extracting those trajectories. Taking traffic surveillance videos as an example, if we intend to understand the behavior of moving vehicles, we have to analyze how the vehicles move. That is, we must track vehicles during traffic monitoring in order to obtain their trajectories.

Generally, tracking methods can be classified into two categories. One category estimates the motion of moving objects and minimizes an error function to track them. The other calculates the similarities between current and previous objects and maximizes the similarity measures to track them. A variety of object tracking methods developed in the past decades are reviewed in [8].

In traffic surveillance videos, a traditional approach to tracking moving vehicles is to model object properties such as position, velocity, and acceleration. The measurements usually include the object positions in the frame, which are obtained by an object detection algorithm. In particular, the Kalman filter [25] and the particle filter [26] are popular in many research works. In practical applications, however, it is difficult to track all vehicles on the roadway. For example, if the viewpoint of a camera is low or there are plenty of vehicles on the roadway, vehicle occlusion causes failures in extracting and tracking individual vehicles. Moreover, the effective resolution lost to perspective leaves insufficient vehicle features for tracking. Even though much research on the occlusion problem has been proposed, the complexity of vehicle tracking surges and its accuracy sags when a great number of occluded vehicles must be tracked at once.
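
As a concrete illustration of the filtering approach mentioned above, a minimal constant-velocity Kalman filter for a single vehicle centroid might look as follows; the noise covariances are illustrative assumptions:

```python
# Constant-velocity Kalman filter for one vehicle centroid, in the spirit
# of [25]. State is (x, y, vx, vy); measurements are detected positions.
import numpy as np

class CentroidKalman:
    def __init__(self, x0, y0, dt=1.0):
        self.x = np.array([x0, y0, 0.0, 0.0])          # state: x, y, vx, vy
        self.P = np.eye(4) * 100.0                     # state covariance
        self.F = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
                           [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * 0.1                       # process noise (assumed)
        self.R = np.eye(2) * 4.0                       # measurement noise (assumed)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                              # predicted position

    def update(self, zx, zy):
        z = np.array([zx, zy])
        y = z - self.H @ self.x                        # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)       # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```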

Once the trajectories of vehicles are extracted, numerous works address vehicle activity analysis. Features such as the size, speed, and moving direction of vehicles help characterize the traffic flow in surveillance videos [2]. Generally, more complicated events are detected with machine learning algorithms: in [27], event detection in time series data is addressed with neural networks, and in [28], Hidden Markov Models form the basis of activity recognition and anomaly detection.

For the purpose of traffic flow analysis, virtual line detectors that avoid tracking every vehicle on the roadway were developed in [29-33]. In [29], the authors present an approach to evaluate traffic-flow parameters in urban road environments: a time-spatial image built from a virtual line, as shown in Figure 5, is used for vehicle counting, and vehicles are extracted from the time-spatial image after edge detection and morphological operations. In [30], the time-spatial image is processed to evaluate the traffic congestion level. However, these methods do not work well in frames with low contrast, small vehicle blobs, and irregular driving behaviors. Virtual line group methods were later proposed as an improvement in [32, 33]. In practice, virtual-line-based algorithms are better suited to real-time traffic flow analysis when a large number of vehicles are present.
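
A minimal sketch of time-spatial image generation in the spirit of [29], assuming a horizontal virtual line at an illustrative row:

```python
# Time-spatial image: the pixel row under a horizontal virtual line is
# copied out of every frame and stacked over time. Vehicles crossing the
# line appear as blobs in the stacked image, enabling counting without
# per-vehicle tracking.
import numpy as np

def time_spatial_image(gray_frames, line_row=240):
    """Stack the virtual-line row of each grayscale frame; rows index time."""
    rows = [frame[line_row, :].copy() for frame in gray_frames]
    return np.stack(rows, axis=0)   # shape: (num_frames, frame_width)
```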


Figure 5. Generation procedure of a time-spatial image [29]. (a) A frame sequence. (b) Time-spatial image generated by virtual line iteration.
