Chapter 2 Background
2.1 Motion Detection
This section discusses several popular methods for detecting moving objects against a complex background. Detection is an important step because better detection supports better tracking and correspondence results. Each of these methods has advantages and disadvantages, so the algorithm is usually chosen to suit the environment in which the surveillance system operates.
2.1.1 Background Subtraction
Background subtraction detects moving regions in an image by taking the pixel-by-pixel difference between the current image and a reference background. The technique was developed mainly for surveillance systems with static cameras, since it requires a reference background model. After subtraction, each pixel of the residual is classified as foreground if its magnitude exceeds a given threshold; otherwise it is classified as background. Although the method is simple and computationally cheap, it is very sensitive to image noise and to variations in illumination.
Stauffer [1] represents each pixel’s value with a set of Gaussian distributions. In other words, the intensity at each pixel of the reference background follows a Gaussian mixture model (GMM). The difference between the current image and the reference background is obtained by measuring the distance of each pixel value to its corresponding set of Gaussian distributions. Moreover, the parameters of the Gaussian distributions are updated over time, so the background model becomes more precise and more flexible. However, the detection results are still degraded by shadows and highlights.
Horprasert [2] uses the concept of color constancy in human vision to separate color information from intensity. The method not only segments the moving regions in an image but also tells whether pixels in those regions belong to shadows or highlights. Figure 2-1 shows an example: given the reference background (upper left) and the current image (upper right), the method marks the detected moving object in blue and the shadows and highlights in red (lower left), and then produces the final detection result (lower right).
Figure 2-1 Horprasert’s method [2]
The upper-left picture is the reference background and the upper-right picture is the current image. In the lower-left picture, the blue region is the detected moving object and the red region marks shadows and highlights. The lower-right picture is the final detection result.
Duque [3] also takes temporal differencing results into account. Deriving the moving regions from both the temporal differencing results and the background subtraction results reduces the influence of noise.
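
The thresholding step and the idea of a background model that adapts over time can be sketched as follows. This is only an illustration under stated assumptions: frames are grayscale NumPy arrays, the class keeps a single Gaussian per pixel rather than the mixture used in [1], and every name and parameter value (`subtract_background`, `RunningGaussianBackground`, `threshold`, `learning_rate`, `k`, `min_variance`) is hypothetical and does not come from the cited papers.

```python
import numpy as np


def subtract_background(frame, background, threshold=25.0):
    """Return a boolean foreground mask from a fixed reference background."""
    # Convert to float first so that uint8 subtraction cannot wrap around.
    residual = np.abs(frame.astype(np.float64) - background.astype(np.float64))
    return residual > threshold


class RunningGaussianBackground:
    """One adaptive Gaussian per pixel (a simplification, not the GMM of [1])."""

    def __init__(self, first_frame, initial_variance=100.0,
                 learning_rate=0.05, k=2.5, min_variance=4.0):
        self.mean = first_frame.astype(np.float64)
        self.variance = np.full(first_frame.shape, initial_variance,
                                dtype=np.float64)
        self.learning_rate = learning_rate  # how fast the model adapts
        self.k = k                          # distance limit in standard deviations
        self.min_variance = min_variance    # keeps the model from collapsing

    def apply(self, frame):
        """Classify the frame, then update the model on background pixels."""
        frame = frame.astype(np.float64)
        squared_distance = (frame - self.mean) ** 2
        # Foreground: farther than k standard deviations from the pixel's mean.
        foreground = squared_distance > (self.k ** 2) * self.variance

        # Update only where the pixel matched the background model.
        rate = np.where(foreground, 0.0, self.learning_rate)
        self.mean += rate * (frame - self.mean)
        self.variance += rate * (squared_distance - self.variance)
        self.variance = np.maximum(self.variance, self.min_variance)
        return foreground
```

In this sketch a pixel that stays foreground is never absorbed into the background, which is one possible design choice; a mixture model such as the one in [1] handles this case differently.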
2.1.2 Temporal Differencing
Temporal differencing detects moving regions by taking the difference between successive images. A given threshold is then used to distinguish moving parts from static parts. Although this approach is computationally cheap and is not very sensitive to the environment, it can only detect objects that are moving. That is, if an object stops moving, it is misclassified as part of the background. In addition, the detection results are sometimes unreliable because the detected regions are usually incomplete and contain many holes. Hu [4] applies morphological operations and GMM detection to the results of temporal differencing to eliminate these holes. Dubuisson [5] filters the residuals with a particular set of Gabor filters, whose contribution is to emphasize the moving regions. In Figure 2-2, the left image shows the original result and the right image shows the result after applying the Gabor filters.
After the filtering, particles are placed randomly on the moving regions, with the number of particles proportional to the magnitude of motion. In the left image of Figure 2-3, the white particles mark the locations with larger motion. By applying a classification method, the small residuals can then be clustered to complete the detection task.
Figure 2-2 Results before and after applying Gabor filters [5]
The left image is the original result of temporal differencing, and the right image is the result after applying Gabor filters.
Figure 2-3 An example of the detection result obtained with Gabor filters [5]
The left image shows the detection result after applying Gabor filters, where the white particles are the points used for clustering. The right image shows the final detection result.
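
The thresholded frame difference described at the start of this subsection, and the idea of placing particles in proportion to the magnitude of motion, can be sketched as follows. This is a minimal illustration and not the procedure of [5]: no Gabor filtering is performed, frames are assumed to be grayscale NumPy arrays, and the function names, the threshold value, and the fixed random seed are all hypothetical.

```python
import numpy as np


def temporal_difference(previous_frame, current_frame, threshold=15.0):
    """Return the absolute frame difference and its thresholded motion mask."""
    residual = np.abs(current_frame.astype(np.float64)
                      - previous_frame.astype(np.float64))
    return residual, residual > threshold


def sample_particles(residual, num_particles, seed=0):
    """Draw (row, col) positions with probability proportional to the residual."""
    total = residual.sum()
    if total == 0:
        # Nothing moved, so there is nowhere to place particles.
        return np.empty((0, 2), dtype=np.int64)
    rng = np.random.default_rng(seed)
    flat_indices = rng.choice(residual.size, size=num_particles,
                              p=residual.ravel() / total)
    rows, cols = np.unravel_index(flat_indices, residual.shape)
    return np.column_stack([rows, cols])


# A uniform 4x4 block that moves one pixel to the right between two frames.
frame_a = np.full((8, 10), 50, dtype=np.uint8)
frame_a[2:6, 2:6] = 200
frame_b = np.full((8, 10), 50, dtype=np.uint8)
frame_b[2:6, 3:7] = 200

residual, mask = temporal_difference(frame_a, frame_b)
particles = sample_particles(residual, num_particles=50)
```

In this toy example only the trailing and leading edges of the block change, so the interior of the object is missing from `mask`; this is the kind of hole mentioned above. Comparing `frame_b` with itself gives an empty mask, which corresponds to an object that has stopped moving.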
2.1.3 Optical Flow
Optical flow methods treat the intensity of each pixel as a function of time and location. If a moving point changes only its location over time while its intensity remains unchanged, its motion vector can be estimated. The advantage of this method is that moving objects can be extracted from the background even when the camera itself is moving. In addition, by classifying the motion vectors of the pixels, different moving objects can be further segmented and distinguished. The disadvantage is that the method is computationally expensive and is therefore not well suited to real-time surveillance systems.
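
Under the constant-intensity assumption above and for small displacements, the motion vector (u, v) of a point approximately satisfies I_x u + I_y v + I_t = 0, where I_x, I_y and I_t are the spatial and temporal derivatives of the image intensity. One common way to solve this, shown here only as a sketch, is a Lucas-Kanade-style least-squares fit over a small window. The function name and the `half_size` parameter are hypothetical, frames are assumed to be grayscale NumPy arrays, and the window is assumed to lie fully inside the image.

```python
import numpy as np


def estimate_window_flow(previous_frame, current_frame, row, col, half_size=7):
    """Estimate one motion vector (u, v) for the window centred at (row, col).

    u is the displacement along columns (x) and v along rows (y), in pixels
    per frame. The estimate is only meaningful for small displacements.
    """
    previous_frame = previous_frame.astype(np.float64)
    current_frame = current_frame.astype(np.float64)

    # Spatial gradients: np.gradient returns d/d(row) first, then d/d(col).
    grad_y, grad_x = np.gradient(previous_frame)
    # Temporal gradient approximated by the difference of the two frames.
    grad_t = current_frame - previous_frame

    window = (slice(row - half_size, row + half_size + 1),
              slice(col - half_size, col + half_size + 1))

    # One constraint I_x*u + I_y*v = -I_t per pixel in the window.
    A = np.column_stack([grad_x[window].ravel(), grad_y[window].ravel()])
    b = -grad_t[window].ravel()

    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v
```

Repeating this estimate for a window around every pixel gives a dense field of motion vectors, which illustrates why the computational cost grows quickly with image size.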
2.1.4 Learned Classifier
The learned classifier method requires knowing in advance which objects are to be detected. This means that a set of classifiers must first be trained on the features of those objects. Figure 2-4 illustrates some examples of training data for detection. Many features can be selected to train the classifiers, and Figure 2-5 illustrates one such selection. The advantage is that the performance of this method can be very good if the amount of training data is large enough. In addition, the method is not limited to static camera systems. The disadvantage is that it can only detect specific kinds of objects and needs a large amount of training data to “learn” these objects. Nair [6] proposes a method that does not require much training data at first. Instead, the system learns the features of objects online by incorporating the motion information of the moving objects, and can therefore learn the objects’ features adaptively.
Figure 2-4 Training data [6]
The left figure shows some training examples of “people”.
The right figure shows some training examples of “background”.
Figure 2-5 An example of training features [6]
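
The train-then-detect idea behind a learned classifier can be illustrated with a deliberately simple stand-in: a nearest-centroid classifier that labels a feature vector as “people” or “background”. This is an assumption-laden sketch and not the classifier or the features used in [6]; the class name is hypothetical, and the feature vectors are assumed to have been extracted from image patches beforehand.

```python
import numpy as np


class NearestCentroidClassifier:
    """Assign each feature vector to the class with the closest mean."""

    def fit(self, features, labels):
        """Learn one centroid per class from labelled training features."""
        features = np.asarray(features, dtype=np.float64)
        labels = np.asarray(labels)
        self.classes_ = np.unique(labels)
        self.centroids_ = np.stack(
            [features[labels == c].mean(axis=0) for c in self.classes_]
        )
        return self

    def predict(self, features):
        """Return the label of the nearest centroid for each feature vector."""
        features = np.asarray(features, dtype=np.float64)
        # Pairwise distances with shape (n_samples, n_classes).
        distances = np.linalg.norm(
            features[:, None, :] - self.centroids_[None, :, :], axis=2
        )
        return self.classes_[np.argmin(distances, axis=1)]


# Toy two-dimensional features standing in for real patch descriptors.
train_features = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
train_labels = ["background"] * 3 + ["people"] * 3
classifier = NearestCentroidClassifier().fit(train_features, train_labels)
predicted = classifier.predict([[0.5, 0.5], [10.5, 10.5]])
```

A practical detector would use far richer features and many more training examples, which is the dependence on training data discussed above.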
Many detection methods combine more than one working principle. For example, Zhou [7] proposes a method that integrates all of the aforementioned concepts. Since each method has its pros and cons, different methods can be combined to achieve better detection results.