In recent research, many approaches to video object extraction have been developed. They can be roughly divided into two groups: motion-based and spatial-temporal. Early video object extraction algorithms are motion-based segmentation methods that employ motion information only. They usually deal with rigid or piecewise rigid motion. Motion-based segmentation algorithms generally involve three main issues. The first issue is the data primitives, or region of support, which can be individual pixels, corners, lines, blocks, or regions.
The second issue is the motion model or motion representation, which can be 2D optical flow [14], [15], [18], [20] or 3D motion parameters [16], [20]; this issue involves parameter estimation or motion estimation. The third issue is the segmentation criterion, which can be maximum a posteriori (MAP) estimation [17], the Hough transform, or expectation-maximization (EM) [21], [22]. However, because of noise and the complexity of the motion in the scene, real motion segmentation/clustering schemes are usually much more complex than this, in that motion estimation and segmentation are usually performed recursively.
Change detection is a computationally inexpensive form of motion-based segmentation. It distinguishes temporally changed from unchanged regions between two successive images k-1 and k, and the moving object is then separated from the changed regions. Two design choices affect the performance of a change detector. The first is the threshold separating changed from unchanged luminance picture elements, and the second is the criterion for eliminating small regions, e.g. small unchanged regions within large changed regions.
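The two design choices of a change detector can be illustrated with a minimal sketch. It assumes frames are given as lists of rows of luminance values; the function names, the 4-connectivity, and the rule of flipping every region smaller than min_size are illustrative assumptions and are not taken from any of the cited methods.

```python
from collections import deque

def change_mask(prev, curr, threshold):
    """Mark pixels whose absolute luminance difference exceeds the threshold."""
    return [[abs(c - p) > threshold for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]

def remove_small_regions(mask, min_size):
    """Flip every 4-connected region (changed or unchanged) smaller than min_size."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if seen[y][x]:
                continue
            label = mask[y][x]
            region, queue = [], deque([(y, x)])
            seen[y][x] = True
            # Breadth-first flood fill over pixels carrying the same label.
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and mask[ny][nx] == label):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) < min_size:
                for ry, rx in region:
                    out[ry][rx] = not label
    return out
```

With this rule a small unchanged hole inside a changed area is filled and an isolated changed pixel is discarded; both the threshold and min_size would have to be tuned per sequence.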
Thoma and Bierling [19] combine change detection with optical flow to carry out object segmentation, and incorporate a median filter to eliminate small elements in the change detection mask. The method iteratively re-evaluates the threshold until the system is stable, which yields good segmentation performance. However, some spatial positions in the uncovered background of the current frame may not be addressed by any motion vector.
Arch et al. [13] propose a change detection technique using MAP estimation and relaxation. The thresholding is carried out by performing a significance test on the noise hypothesis of the luminance difference image, which is modeled as Gaussian camera noise with variance σ². The resulting object mask is too scattered because of the large number of small areas within the object area, which can be remedied by a morphological closing operation.
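The idea of thresholding by a significance test, followed by a morphological closing, can be sketched as follows. This is a simplified per-pixel test and not the MAP/relaxation scheme of [13]: it assumes the difference d at an unchanged pixel follows N(0, σ²), and the significance level alpha, the 3x3 structuring element, and all function names are illustrative assumptions.

```python
from statistics import NormalDist

def significance_threshold(sigma, alpha):
    """Two-sided threshold t with P(|d| > t) = alpha when d ~ N(0, sigma^2)."""
    return sigma * NormalDist().inv_cdf(1.0 - alpha / 2.0)

def significant_changes(diff, sigma, alpha=0.01):
    """Reject the noise hypothesis wherever |d| exceeds the threshold."""
    t = significance_threshold(sigma, alpha)
    return [[abs(d) > t for d in row] for row in diff]

def _morph(mask, combine):
    """Apply any (dilation) or all (erosion) over a 3x3 window.

    Windows are clipped at the image border, so pixels outside the
    image are simply ignored.
    """
    h, w = len(mask), len(mask[0])
    return [[combine(mask[ny][nx]
                     for ny in range(max(0, y - 1), min(h, y + 2))
                     for nx in range(max(0, x - 1), min(w, x + 2)))
             for x in range(w)]
            for y in range(h)]

def closing(mask):
    """Morphological closing: dilation followed by erosion."""
    return _morph(_morph(mask, any), all)
```

A smaller alpha raises the threshold and lowers the false-alarm rate at the cost of missed changes; the closing then fills small gaps inside the detected object area.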
Neri [12] formulates automatic segmentation as the problem of separating moving objects from a static background. Potential foreground regions are detected by applying a higher-order statistics (HOS) test to a group of interframe differences. The resulting segmented foreground objects are slightly too large because the boundary location is not determined directly from the gray-level or edge image.
Recently, L.G. Chen [5] proposed an efficient moving object segmentation algorithm based on change detection that is suitable for real-time content-based multimedia communication systems. It first constructs a reliable background image by accumulating frame difference information. The moving object region is then segmented from the background region by comparing the current frame with the constructed background image. Finally, a post-processing step is applied to the resulting object mask to remove noise regions and smooth the object boundary. The resulting object is sometimes too large when the background information has not yet been completely constructed.
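One plausible reading of this background construction step is sketched below. The registration rule (a pixel enters the background once it has been stationary for stable_frames consecutive frame differences), the class and parameter names, and the choice to leave unregistered pixels out of the mask are all assumptions made for illustration; the details of [5] may differ.

```python
class BackgroundRegistrar:
    """Builds a background image from pixels that stay stationary (hypothetical sketch)."""

    def __init__(self, height, width, diff_threshold, stable_frames):
        self.diff_threshold = diff_threshold
        self.stable_frames = stable_frames
        self.background = [[None] * width for _ in range(height)]
        self.stationary = [[0] * width for _ in range(height)]
        self.prev = None

    def update(self, frame):
        """Accumulate frame-difference information and register stable pixels."""
        if self.prev is not None:
            for y, row in enumerate(frame):
                for x, value in enumerate(row):
                    if abs(value - self.prev[y][x]) <= self.diff_threshold:
                        self.stationary[y][x] += 1
                    else:
                        self.stationary[y][x] = 0
                    if self.stationary[y][x] >= self.stable_frames:
                        self.background[y][x] = value
        self.prev = [row[:] for row in frame]

    def object_mask(self, frame):
        """Compare the frame with the registered background.

        Assumption: pixels without a registered background are reported
        as not belonging to the object.
        """
        return [[bg is not None and abs(value - bg) > self.diff_threshold
                 for value, bg in zip(row, bg_row)]
                for row, bg_row in zip(frame, self.background)]
```

A typical loop would call object_mask on each incoming frame and then update with the same frame, so that the background keeps adapting while the object is being segmented.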
The above approaches cannot segment moving objects accurately because they lack spatial information; spatial information is needed to segment semantically meaningful moving objects. The comparatively new spatial-temporal segmentation techniques employ both the spatial and the temporal information embedded in the video sequence. By combining motion and spatial information, these techniques aim to overcome the over-segmentation problem of image segmentation as well as the noise sensitivity and inaccuracy of motion-based segmentation.
Mech and Wollborn [9] generate the video object plane, or object mask, from an estimated change detection mask (CDM). Initially, a change detection mask is generated by taking the difference between two successive frames using a global threshold. This CDM is then refined by an iterative relaxation that uses a locally adaptive threshold to enforce spatial continuity. The object mask is then calculated from the CDM by eliminating uncovered background through hierarchical block matching and by adapting to gray-level edges to improve the location of boundaries.
Choi et al. [8] present a spatial morphological segmentation technique. It uses the watershed algorithm to partition a frame into homogeneous regions in order to locate the object boundaries. A foreground/background decision is then made to create the video object planes. To enforce temporal continuity, the segmentation is aligned with that of the previous frame, and regions in which a majority of pixels previously belonged to the foreground are also added to the foreground. This allows an object to be tracked even when it stops moving for an arbitrary time.
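The majority rule used for temporal continuity can be illustrated with a short sketch. It assumes the spatial partition is available as a label map (one region label per pixel) already aligned with the previous foreground mask; the function name and the data layout are illustrative and not taken from [8].

```python
from collections import Counter

def propagate_foreground(labels, prev_fg, curr_fg):
    """Add to the current foreground every region whose pixels were
    mostly foreground in the previous frame."""
    total, was_fg = Counter(), Counter()
    for label_row, prev_row in zip(labels, prev_fg):
        for label, prev in zip(label_row, prev_row):
            total[label] += 1
            was_fg[label] += bool(prev)
    # A strict majority of the region's pixels must have been foreground.
    keep = {label for label in total if 2 * was_fg[label] > total[label]}
    return [[bool(curr) or label in keep
             for label, curr in zip(label_row, curr_row)]
            for label_row, curr_row in zip(labels, curr_fg)]
```

Because the decision depends only on the previous mask, a region stays in the foreground even when the current frame shows no motion there.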
Tsaig and Averbuch [7] take a similar approach. They formulate the problem as graph labeling over a region adjacency graph (RAG) based on motion information. An initial spatial partition of each frame is obtained by a fast,
each region is estimated by hierarchical region matching. Finally, a new labeling is obtained by maximizing the a posteriori probability of the MRF using motion information, spatial information, and a memory term that maintains temporal coherence. The optimization is carried out by highest confidence first (HCF).
Demin [11] presents a technique for unsupervised video segmentation. Like the techniques above, it consists of two phases: initial segmentation and temporal tracking. However, it can effectively track fast-moving objects, and it is computationally efficient because it uses modified watershed transformations and a fast motion estimation algorithm.
The above techniques achieve good moving object segmentation performance. However, their computational complexity is very high because both the watershed algorithm and motion estimation are computationally intensive operations. There are also some fast and efficient moving object segmentation techniques that use edge information.
Meier and Ngan [10] proposed a moving object edge tracking method. The core of this algorithm is an object tracker that matches a two-dimensional (2-D) binary model of the object against subsequent frames using the Hausdorff distance. The best match found indicates the translation the object has undergone, and the model is updated every frame to accommodate rotation and changes in shape. The initial model is derived automatically, and a new model update method based on the concept of moving connected components allows for comparatively large changes in shape. The algorithm is further improved by a filtering technique that removes stationary background. Finally, the binary model sequence guides the extraction of the VOPs from the sequence.
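The matching step of such a tracker can be sketched with the classical Hausdorff distance between two point sets. The exhaustive search over integer translations, the use of the directed distance from model to image, and the function names are assumptions made for illustration; [10] may use a different variant of the distance and a more efficient search.

```python
import math

def directed_hausdorff(a, b):
    """h(A, B): the largest distance from a point of A to its nearest point of B."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance H(A, B) = max(h(A, B), h(B, A))."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

def best_translation(model, edges, search_range):
    """Return the integer shift (dy, dx) of the model that minimizes the
    directed Hausdorff distance to the edge points, with that distance."""
    best_shift, best_dist = None, math.inf
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            shifted = [(y + dy, x + dx) for y, x in model]
            d = directed_hausdorff(shifted, edges)
            if d < best_dist:
                best_shift, best_dist = (dy, dx), d
    return best_shift, best_dist
```

Using the directed distance from the model to the image means that background edge points which do not belong to the object do not penalize the match.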
Kim and Hwang [4] proposed a moving object segmentation algorithm that uses the Canny filter. The extraction of a specific single video object (VO) is based on connected components analysis and the smoothness of VO displacement in successive frames. It begins with a robust double-edge map derived from the difference between two successive frames. After removing the edge points that belong to the previous frame, the remaining edge map, called the moving edge (ME), is used to extract the VOP. Like the previous approach, this algorithm has several drawbacks. The moving edge is not suitable for head-and-shoulder video sequences such as “Miss America” or “Akiyo”: because such sequences contain little movement, the moving object must be identified in the first frame.
Our algorithm is a spatial-temporal technique. We compare the current frame with a constructed background frame to obtain temporal information, which helps us extract the moving object efficiently. In addition, we use an edge operator to obtain spatial information, which helps us segment a semantically meaningful moving object. The framework of the moving object segmentation algorithm and the details of the initial background construction are described in Chapter 3, and object tracking is discussed in Chapter 4.