Thresholding the difference between two consecutive input frames is the basic concept of change detection-based segmentation. However, because the behavior and characteristics of moving objects differ significantly, the quality of the segmentation result depends strongly on background noise, object motion, and the contrast between the object and the background, so reliable and consistent object information is very difficult to obtain. Hence, instead of trying to extract more information from the changing part of the scene, we concentrate on the stationary background, whose characteristics are well known and more reliable.
The idea of background subtraction is to take the difference between the current image and a still background that is acquired before the objects move in. After subtraction, only non-stationary or new objects are left. The most straightforward way to separate the background is to apply a simple difference-and-threshold method.
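The difference-and-threshold idea can be illustrated with a minimal sketch. It assumes grayscale frames stored as nested Python lists; the function name `background_subtract` and the threshold value are hypothetical and chosen only for illustration.

```python
def background_subtract(frame, background, threshold):
    """Return a binary mask: 1 where |frame - background| > threshold, else 0."""
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


# Still background acquired before any object moves in (illustrative values).
background = [[10, 10, 10],
              [10, 10, 10]]
# Current frame: two pixels differ strongly from the background.
frame = [[10, 12, 90],
         [11, 85, 10]]

mask = background_subtract(frame, background, threshold=20)
print(mask)  # [[0, 0, 1], [0, 1, 0]]
```

Small deviations such as 12 versus 10 fall below the threshold and stay in the background, while the two large deviations are marked as object pixels.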
An example of a background subtraction-based technique can be found in [3]. This segmentation system consists of five major steps, as shown in Figure 2.4.
1. Frame difference:
The frame difference mask is generated simply by thresholding the frame difference. This data is sent to the background registration step, where a reliable background is constructed from the accumulated information of several frame difference masks. Since the accumulated frame difference masks are used in the final decision for a reliable background, no filtering or boundary relaxation is applied to the frame difference.
A significance test is used to obtain the threshold value. The test statistic is the absolute value of the frame difference. Under the assumption that there is no change at the current pixel, the frame difference obeys a zero-mean Gaussian distribution with the following probability density function:
$$ p(FD \mid H_0) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{FD^2}{2\sigma^2} \right\}, $$
where $FD$ is the frame difference and $\sigma^2$ is the variance of the frame difference.
Note that $\sigma^2$ is equal to twice the camera noise variance $\sigma_c^2$, since the frame difference subtracts two frames that each carry independent camera noise. $H_0$ denotes the null hypothesis, i.e., the hypothesis that there is no change at the current pixel. The threshold value is determined by the required significance level through the following relation:
$$ \alpha = \mathrm{Prob}(|FD| > TH \mid H_0), $$
where $\alpha$ is the significance level and $TH$ is the threshold value.
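Because the frame difference is zero-mean Gaussian under the null hypothesis, the relation between the significance level and the threshold can be inverted in closed form: each tail holds half of the significance level, so the threshold is the standard deviation times the standard normal quantile at one minus half the significance level. The sketch below uses only the Python standard library; the function name and the numeric values (a camera noise variance of 1.0 and a significance level of 0.05) are illustrative assumptions, not values taken from [3].

```python
from math import sqrt
from statistics import NormalDist


def significance_threshold(alpha, camera_noise_var):
    """Return TH such that Prob(|FD| > TH | H0) = alpha for zero-mean Gaussian FD."""
    # Variance of the frame difference is twice the camera noise variance.
    sigma = sqrt(2.0 * camera_noise_var)
    # Two-sided test: each tail of the Gaussian holds alpha / 2.
    return sigma * NormalDist().inv_cdf(1.0 - alpha / 2.0)


# Hypothetical inputs, chosen only for illustration.
th = significance_threshold(alpha=0.05, camera_noise_var=1.0)
print(round(th, 3))  # 2.772
```

A smaller significance level gives a larger threshold, so fewer unchanged pixels are wrongly marked as changing, at the cost of missing weaker changes.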
Figure 2.4: Background subtraction-based segmentation system (from [3]).
2. Background registration:
The goal of background registration is to construct reliable background information from the video sequence. In this application, reliable background information is needed for change detection. Approximate background information does not help object detection; worse, it causes errors in later segmentation results until the background information is corrected. Therefore, information that is not very likely to be background is rejected, and the corresponding area in the background buffer is left empty.
In the background registration step, the history of the frame difference mask is considered in constructing and updating the background buffer. A stationary map is maintained for this purpose. If a pixel is marked as changing in the frame difference mask, the corresponding value in the stationary map is cleared to zero; otherwise, if the pixel is stationary, the corresponding value is incremented by one. Each value in the stationary map therefore records the number of consecutive frames for which the corresponding pixel has remained unchanged. If a pixel has been stationary for the past several frames, the probability is high that it belongs to the background region. Therefore, if the value in the stationary map exceeds a predefined value, the pixel value in the current frame is copied to the corresponding pixel in the background buffer.
A background registration mask is also updated in this process. Each value in the background registration mask indicates whether background information exists for the corresponding pixel. When a new pixel value is added to the background buffer, the corresponding value in the background registration mask is changed from nonexisting to existing.
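The update rule for the stationary map, background buffer, and background registration mask can be sketched as follows. This is a minimal illustration under stated assumptions: frames are nested Python lists, an empty background entry is represented by `None`, and the names `register_background` and `min_stationary_frames` are hypothetical.

```python
def register_background(frame, fd_mask, stationary_map, background,
                        registered, min_stationary_frames):
    """Update stationary map, background buffer, and registration mask in place."""
    for y, row in enumerate(frame):
        for x, pixel in enumerate(row):
            if fd_mask[y][x]:
                # Pixel marked as changing: restart its stationary count.
                stationary_map[y][x] = 0
            else:
                # Pixel stationary: one more consecutive unchanged frame.
                stationary_map[y][x] += 1
            if stationary_map[y][x] > min_stationary_frames:
                # Stationary long enough: copy the pixel into the background.
                background[y][x] = pixel
                registered[y][x] = True


# One-row example: the left pixel is stationary, the right pixel keeps changing.
stationary_map = [[0, 0]]
background = [[None, None]]      # None marks an empty background entry
registered = [[False, False]]
for _ in range(3):
    register_background([[7, 200]], [[0, 1]], stationary_map,
                        background, registered, min_stationary_frames=2)

print(stationary_map)  # [[3, 0]]
print(background)      # [[7, None]]
print(registered)      # [[True, False]]
```

The changing pixel never accumulates a stationary count, so its background entry stays empty, which matches the policy of rejecting uncertain information.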
3. Background difference:
This step generates a background difference mask by thresholding the difference between the current frame and the background information stored in the background buffer. This step is very similar to the generation of the frame difference mask.
4. Object detection:
The object detection step generates the initial object mask from the frame difference mask and the background difference mask. The background registration mask, frame difference mask, and background difference mask of each pixel are the required information. Figure 2.5 lists the criteria for object detection, where $BD$ is the absolute value of the difference between the current frame and the background information stored in the background buffer, $FD$ is the absolute value of the frame difference, and the OM field indicates whether or not the pixel is included in the object mask. $TH_{BD}$ and $TH_{FD}$ are the threshold values for generating the background difference mask and the frame difference mask, respectively.
Figure 2.5: Initial object mask generation (from [3]).
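The exact criteria are those listed in Figure 2.5, which is not reproduced in this text. The sketch below therefore encodes one plausible per-pixel rule purely as an assumption: where background information exists, the background difference decides; where it does not, the decision falls back to the frame difference. The function name and argument layout are hypothetical.

```python
def initial_object_mask(bd, fd, registered, th_bd, th_fd):
    """Build an initial object mask.

    ASSUMED rule (not copied from Figure 2.5): use |BD| > th_bd where a
    background value is registered, otherwise fall back to |FD| > th_fd.
    """
    mask = []
    for bd_row, fd_row, reg_row in zip(bd, fd, registered):
        mask.append([
            1 if (abs(b) > th_bd if r else abs(f) > th_fd) else 0
            for b, f, r in zip(bd_row, fd_row, reg_row)
        ])
    return mask


# The third pixel has no registered background, so only its FD is examined.
bd = [[30, 2, None]]
fd = [[0, 50, 40]]
registered = [[True, True, False]]
print(initial_object_mask(bd, fd, registered, th_bd=10, th_fd=10))  # [[1, 0, 1]]
```

Under this assumed rule, the second pixel is excluded even though its frame difference is large, because its registered background says the pixel currently matches the background.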
5. Post-processing:
After the object detection step, an initial object mask is generated. However, due to camera noise and irregular object motion, the initial object mask contains some noise regions. The approach to eliminating them relies on the observation that the area of a noise region tends to be smaller than the area of the object. Regions with an area smaller than a threshold value are removed from the object mask. In this way, the object shape information is preserved while smaller noise regions are removed. After the noise regions are removed, a close operation and an open operation with a 3 × 3 structuring element are applied to the object mask.
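The post-processing step can be sketched in plain Python as follows. Several details are assumptions made only for this illustration: regions are taken to be 4-connected, pixels outside the image are treated as background during the morphological operations, and all function names and the `min_area` value are hypothetical.

```python
from collections import deque


def remove_small_regions(mask, min_area):
    """Clear 4-connected foreground regions whose area is below min_area."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first search collects one connected region.
            region, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) < min_area:
                for ry, rx in region:
                    out[ry][rx] = 0
    return out


def _filter3x3(mask, combine):
    """Apply a 3x3 min or max filter; outside pixels count as background (0)."""
    h, w = len(mask), len(mask[0])

    def at(y, x):
        return mask[y][x] if 0 <= y < h and 0 <= x < w else 0

    return [[combine(at(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1))
             for x in range(w)] for y in range(h)]


def close3x3(mask):
    """Dilation followed by erosion: fills small holes and gaps."""
    return _filter3x3(_filter3x3(mask, max), min)


def open3x3(mask):
    """Erosion followed by dilation: removes thin protrusions."""
    return _filter3x3(_filter3x3(mask, min), max)


# A ring-shaped object with a one-pixel hole, plus an isolated noise pixel.
mask = [[1, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0],
        [0, 0, 1, 1, 1, 0, 0],
        [0, 0, 1, 0, 1, 0, 0],
        [0, 0, 1, 1, 1, 0, 0],
        [0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0]]
cleaned = open3x3(close3x3(remove_small_regions(mask, min_area=3)))
for row in cleaned:
    print(row)
```

In this example the isolated pixel is removed by the area test and the hole inside the object is filled by the close operation, leaving a solid 3 × 3 block. With this simple border convention, objects touching the image border would be eroded there, so a practical implementation would need more careful border handling.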
2.7 Summary
All of the techniques described above can produce the desired object mask. Here, we explain why we choose the background subtraction-based technique. In our application, the camera is usually fixed, so the optical flow-based technique is not considered.
The ideas behind the change detection-based and edge detection-based techniques are very similar. In our experience, the object masks obtained by these two techniques are sensitive to camera noise, so noise-removing filters are necessary. In addition, the accuracy of the object mask is hard to maintain over an entire sequence.
The major disadvantage of the background subtraction-based technique is that it needs some time to gather enough background information. We still choose this technique because the accuracy of the object mask is easier to maintain once enough background information has been collected. In addition, the module that gathers the background can be stopped once the whole background has been obtained, which reduces processing time.