Chapter 3 Proposed Algorithm
3.4 Image Matting Model
The last stage of our proposed method is to apply an image matting model to get the final segmentation result. In the literature, supervised image matting is based on the similarity of colors between the unknown pixels and some samples of foreground pixels and background pixels. In the image matting model, the color of a pixel can be expressed as a linear combination of the corresponding foreground and background colors; that is,
$I_i = \alpha_i F_i + (1 - \alpha_i) B_i$. (3-7)
Figure 3-12 Image matting model
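The compositing relation in Equation (3-7) can be sketched numerically for a single pixel. This is a minimal illustration only; the function name `composite` and the color values are hypothetical and are not taken from the experiments in this thesis.

```python
def composite(alpha, fg, bg):
    """Blend one pixel per Equation (3-7): I = alpha * F + (1 - alpha) * B."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))


# Hypothetical RGB colors chosen only for illustration.
foreground = (200.0, 100.0, 0.0)
background = (0.0, 100.0, 200.0)

pure_fg = composite(1.0, foreground, background)  # alpha = 1 gives the foreground color
pure_bg = composite(0.0, foreground, background)  # alpha = 0 gives the background color
mixed = composite(0.5, foreground, background)    # alpha = 0.5 gives an even blend
```

Matting solves the inverse problem: given only the observed color, it estimates alpha (and implicitly the foreground and background colors) for every pixel.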
Equation (3-7) is illustrated in Figure 3-12; the task is to compute the alpha matte under this matting model. The larger the alpha value, the more likely the pixel belongs to the foreground object. For supervised image matting, Levin et al. [6] assume that the alpha matte can be represented locally as a linear combination of the image color channels. They introduce the matting Laplacian (ML) to compute the alpha matte without explicitly estimating the foreground and background colors in Equation (3-7). ML can be
drawback of ML is that it may propagate incorrectly across foreground regions, because of the low-quality outdoor surveillance cameras used and the severe color variations in outdoor scenes.
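For reference, the local linear assumption and the resulting quadratic cost in the closed-form matting of Levin et al. [6] are commonly written as follows. The symbols $a^c$, $b$, $w$, $L$, $\lambda$, $D_S$, and $b_S$ follow the notation of that paper and are not defined elsewhere in this chapter: $w$ is a small image window, $L$ is the matting Laplacian, $D_S$ is a diagonal matrix marking the constrained (labeled) pixels, and $b_S$ holds their specified alpha values.

```latex
% Local linear model: alpha is an affine function of the color channels c within each window w
\alpha_i \approx \sum_{c} a^{c} I_i^{c} + b, \qquad \forall i \in w

% Eliminating a and b yields a quadratic cost in alpha alone
J(\alpha) = \alpha^{T} L \, \alpha

% With user constraints, the matte is obtained from a sparse linear system
(L + \lambda D_S)\, \alpha = \lambda \, b_S
```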
The first step in supervised image matting is to label two types of strokes (scribbles) that mark foreground pixels and background pixels. Figure 2-3(b) and Figure 2-4(b) show the manually selected scribbles for the images in Figure 2-3(a) and Figure 2-4(a), and Figure 2-3(c) and Figure 2-4(c) show the corresponding matting results generated by the ML method from these scribbles. Although the ML approach can achieve quite impressive performance for foreground object extraction, it has difficulty handling outdoor scenes with dramatic brightness or color variations.
Instead of relying on manually selected scribbles, our system automatically generates the tri-map required by the ML method. The tri-map is derived from the initial foreground segmentation produced by the active contour model. As expressed in Equation (2-2), it is obtained by thresholding the u(x) image: a pixel is labeled as foreground if u(x) is above the threshold T1 and as background if u(x) is below the threshold T2. A morphological erosion is then applied to both the foreground regions and the background regions, and the pixels left unlabeled form the unknown regions, as shown in Figure 3-13.
Figure 3-13 Automatically generated tri-map based on active contour model.
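The tri-map construction described above (thresholding u(x), then eroding both label regions) can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used in this thesis: it uses plain Python lists to stay dependency-free, a square structuring element of hypothetical radius 1, and arbitrary label values (255, 0, 128). The names `erode` and `make_trimap` and the threshold values are illustrative only.

```python
FG, BG, UNKNOWN = 255, 0, 128  # label values are an arbitrary convention


def erode(mask, radius=1):
    """Binary erosion with a (2*radius+1)^2 square; pixels outside the image are ignored."""
    h, w = len(mask), len(mask[0])
    out = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = all(
                mask[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            )
    return out


def make_trimap(u, t1, t2, radius=1):
    """Threshold u(x) into foreground/background, then erode both to leave an unknown band."""
    fg = erode([[v > t1 for v in row] for row in u], radius)
    bg = erode([[v < t2 for v in row] for row in u], radius)
    return [
        [FG if f else BG if b else UNKNOWN for f, b in zip(fg_row, bg_row)]
        for fg_row, bg_row in zip(fg, bg)
    ]


# Hypothetical one-row u(x) image with a step edge; T1 = 0.6 and T2 = 0.4 are assumed values.
u = [[0.0, 0.0, 0.0, 0.9, 0.9, 0.9]]
trimap = make_trimap(u, t1=0.6, t2=0.4)
```

In this toy example the two pixels adjacent to the step edge become unknown, while the pixels farther from the edge keep their background or foreground labels. A larger erosion radius widens the unknown band.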
Image matting is the process of extracting a foreground object from an image. More specifically, for each pixel in the unknown regions, the alpha value is estimated based on this tri-map and on the color similarity between the unknown pixel and the foreground and background samples. We can obtain a much better matting result, as shown in Figure 3-14(a).
To further improve the matting result, we add more information to the computation of the affinity values, in order to reduce the propagation of foreground regions into the background and to produce a smooth and accurate boundary contour. The original ML method considers only the RGB color values when separating foreground regions from background regions. Here, we include an extra image feature: the absolute value of the difference image between the input image and the background reference image. This extra feature helps generate more accurate boundaries between the foreground objects and the background. In Figure 3-14, we further show the comparison between the supervised image matting method and the proposed automatic method in the embedding of the foreground object into the virtual scene. It can be seen that, except for the shadow part, the proposed system generates more accurate and natural results. Removing the shadows is the next goal of our system.
Figure 3-14 Modified ML comparison. (a) Original matting result. (b) Modified ML result.
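The effect of the extra feature can be illustrated with a small sketch. The text does not specify whether the difference is taken per channel or on a single intensity channel, nor exactly how the feature enters the affinity computation, so the code below makes two labeled assumptions: the difference is per channel, and similarity is measured by a squared Euclidean distance between feature vectors. The function names and pixel values are hypothetical.

```python
def augmented_features(pixel, bg_pixel):
    """Return RGB plus per-channel |input - background reference| for one pixel (assumed layout)."""
    diff = tuple(abs(p - b) for p, b in zip(pixel, bg_pixel))
    return tuple(pixel) + diff


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Two pixels with nearly identical RGB values (hypothetical numbers).
# Pixel A matches its background reference; pixel B differs strongly from its reference.
pixel_a, ref_a = (120, 120, 120), (118, 121, 119)
pixel_b, ref_b = (122, 119, 121), (40, 200, 60)

rgb_only = squared_distance(pixel_a, pixel_b)
with_diff = squared_distance(
    augmented_features(pixel_a, ref_a), augmented_features(pixel_b, ref_b)
)
```

With RGB alone the two pixels are almost indistinguishable, whereas the augmented vectors are far apart. This is the intuition behind adding the background-difference feature when foreground and background colors are similar.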
3.5 Shadow Removal
The proposed method can extract foreground objects with reasonable precision, but the shadow is extracted as well. In some applications, the synthesized image may look absurd and unnatural because of the shadow, as shown in Figure 3-15. In this section, we discuss how to detect and remove the shadow.
Figure 3-15 Shadow effect in a synthesized image, causing a contrived result.
In many applications, such as object tracking and video surveillance, shadows may be detected as foreground objects. The inability to distinguish foreground objects from shadows causes problems such as unnatural synthesized images in a virtual studio and identification failures. Hence, shadow detection and removal is an important task.
To remove shadows, we assume that the shadow areas lie on the ground. Consider a scene containing a reference plane that is viewed by a set of wide-baseline stationary cameras, with the background model of each view available. Any scene point lying inside the foreground zone will be projected to a foreground pixel in every view. Instead of using the full fundamental matrix, we restrict our attention to the ground plane, which reduces the relation between two views to a homography; this is called the homography constraint, as shown in Figure 3-16. Under this constraint, a point in one view can be mapped to a point in another view: the homography matrix warps a pixel from one image to another via the reference scene plane π.
Figure 3-16 Homography constraint: a pixel in one view is warped to another view via a reference plane (the ground).
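Warping a pixel through a planar homography amounts to a 3x3 matrix product in homogeneous coordinates followed by a division by the third component. The sketch below shows this step only; the function name `warp_point` and the matrix entries are hypothetical and are not calibrated from any camera used in this work.

```python
def warp_point(H, x, y):
    """Map pixel (x, y) through a 3x3 homography H using homogeneous coordinates."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    return xh / w, yh / w


# Hypothetical homography: scale by 2, then translate by (10, -5).
H_affine = [[2, 0, 10], [0, 2, -5], [0, 0, 1]]
warped = warp_point(H_affine, 3, 4)

# Hypothetical homography whose last row rescales the homogeneous coordinate.
H_scaled = [[1, 0, 0], [0, 1, 0], [0, 0, 2]]
warped_scaled = warp_point(H_scaled, 4, 6)
```

Only points that actually lie on the reference plane are mapped to their true correspondences; points above the plane land elsewhere, which is what makes the constraint useful for separating ground-plane shadows from upright foreground objects.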
Let Φ1, Φ2, …, Φn be the images of the scene obtained from n calibrated cameras. Hiπr is the homography of the reference plane π between the reference view and any other view i.
Using the homography matrix Hiπr, the suspected foreground pixels in all the other images are warped to the reference image. The warping results are thresholded to decide whether a pixel is categorized as shadow area, which can be written as:
$\delta(x) = \begin{cases} 1, & n > |C|/2 \\ 0, & \text{otherwise} \end{cases}$ (3-8)
In Equation (3-8), |C| is the number of cameras; we typically set the threshold to half the number of cameras. The results of shadow detection are shown in Figure 3-17.
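The thresholding rule in Equation (3-8) can be sketched as a per-pixel vote. The text does not define n explicitly; the sketch assumes that n is the number of views whose warped foreground mask covers the pixel. The function name `ground_plane_vote` and the mask values are hypothetical.

```python
def ground_plane_vote(warped_masks):
    """Return 1 where more than half of the warped views mark the pixel as foreground, else 0."""
    num_cameras = len(warped_masks)
    h, w = len(warped_masks[0]), len(warped_masks[0][0])
    delta = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Assumption: n counts the views in which this pixel is foreground after warping.
            n = sum(1 for mask in warped_masks if mask[y][x])
            delta[y][x] = 1 if n > num_cameras / 2 else 0
    return delta


# Three hypothetical one-row masks, already warped to the reference view.
masks = [
    [[1, 1, 0, 0]],
    [[1, 0, 1, 0]],
    [[1, 1, 0, 0]],
]
delta = ground_plane_vote(masks)
```

With three cameras the threshold is 1.5, so a pixel needs agreement from at least two views to be marked.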
Figure 3-17 Shadow detection. (a) Shadow on the ground. (b) MTM distance measure.
In Figure 3-17(a), the shadow areas, which are assumed to be on the ground, are marked in black.
Although the shadow areas are well detected, some foreground parts near the ground, such as shoes, may also be detected. If we directly removed the detected ground area, the result would be poor because of these foreground regions near the ground. By using the MTM distance measure mentioned before, we can distinguish the true foreground regions from the shadow regions, as shown in Figure 3-17(b). Note that we use a smaller window and measure the ground region (marked in black) patch by patch to get a more accurate separation. By combining the active contour model and the image matting method, we can get the final segmentation result shown in Figure 3-18.
Figure 3-18 Shadow removal result.
In fact, the result in Figure 3-18 still has some shadow left around the outer shadow area, because the shadow area detected by the homography constraint is smaller than the one obtained from GMM background modeling. This leads to an imperfect result. The problem can be solved by applying the multi-view constraint mentioned in Section 3-2 to further eliminate the shadow area. We first use the homography constraint to detect the shadow area and then apply the MTM distance measure to separate the foreground from the ground. After that, we project all detected shadow regions to the reference view, and the intersection region remains as the final result. The use of the multi-view constraint to remove the shadow area is demonstrated in Figure 3-19.
Figure 3-19 Multi-view constraint with shadow detection. (a) Original image. (b) GMM modeling of (a). (c) Multi-view constraint with shadow removed.
Chapter 4 Experimental Results
In Figure 4-1, we show more simulation results for the other camera views. The foreground object is extracted effectively and quite accurately, in spite of the complicated background environments. Note that the homography constraint for shadow removal is not yet included in Figure 4-1. Except for the shadow part, the proposed system generates accurate and natural results.
Figure 4-1 More results on other camera views. (a) Original image. (b) Multi-view constraint. (c) Result of convex active contour model. (d) Result of modified image matting method.
In Figure 4-2, we compare the supervised image matting method with the proposed automatic method when embedding the foreground object in a virtual scene. It can be seen that, with the homography constraint used to remove shadows, the final foreground region has a more accurate and smoother boundary.
Figure 4-2 Comparison between the supervised image matting method and the proposed automatic method. (a) Manually selected scribbles. (b) Synthesized image based on (a) and the original ML method. (c) Automatically generated tri-map. (d) Synthesized image based on (c) and the modified ML method.
In Figure 4-3, we further compare the original image matting method, which considers only RGB color values, with the modified image matting method. As discussed before, the boundary of the foreground in Figure 4-3(a) is mixed with the background because of the dramatic color variations and the color similarity between foreground regions and background regions. The modified ML method yields a more precise boundary around the foreground object.
Figure 4-3 Comparison between the original ML method and the modified ML method for the image in Figure 2-4(a). (a) Original ML method. (b) Modified ML method.
In Figure 4-4 to Figure 4-6, we compare the active contour model, the original image matting method, which considers only RGB color values, and the modified image matting method. The active contour model still leaves some broken foreground regions, although the foreground objects are accurately separated from the background image. Starting from the active contour result, the image matting model can accurately recover the broken foreground regions. The modified matting Laplacian separates the foreground object from the background precisely. Note that even when the foreground object is dressed in different colors, the proposed method can still separate it from the background accurately.
Figure 4-4 Comparison of active contour model and image matting model. (a) Input image. (b) Active contour model. (c) Original image matting. (d) Modified image matting.
Figure 4-5 Comparison of active contour model and image matting model. (a) Input image. (b) Active contour model. (c) Original image matting. (d) Modified image matting.
Figure 4-6 Comparison of active contour model and image matting model. (a) Input image. (b) Active contour model. (c) Original image matting. (d) Modified image matting.
In Figure 4-7, we show more examples of embedding the foreground object into virtual scenes. The proposed method can accurately separate the foreground from the background in outdoor scenes.
Figure 4-7 More results of embedding the foreground object into the virtual scene. (a)(d)(g)(j) Input image. (b)(e)(h)(k) Foreground extraction. (c)(f)(i)(l) Synthesized image.
When generating the synthesized images in our experiments, we smooth the object boundary to obtain a more natural result when embedding the foreground object into computer-generated environments. To smooth the boundary, we take the alpha value to be between 0.1 and 0.9.
Chapter 5 Conclusion
In this thesis, we present a novel method to automatically separate foreground objects from the background in an outdoor scene. To suppress the noise generated in the background subtraction process, we include a multi-view constraint to combine information from different camera views. We further build a convex active contour model to obtain more reliable object boundaries. Finally, we propose a modified image matting method to get fine-tuned segmentation results. Experimental results demonstrate that the proposed method can outperform existing methods for outdoor-scene foreground/background separation without the inclusion of user guidance.
References
[1] S.-Y. Chien, S.-Y. Ma, and L.-G. Chen, "Efficient moving object segmentation algorithm using background registration technique," IEEE Transactions on Circuits and Systems for Video Technology, vol. 12, pp. 577-586, 2002.
[2] C. Rother, V. Kolmogorov, and A. Blake, "GrabCut: interactive foreground extraction using iterated graph cuts," ACM Transactions on Graphics, vol. 23, pp. 309-314, 2004.
[3] L. Grady, "Random Walks for Image Segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, pp. 1768-1783, 2006.
[4] X. Bai and G. Sapiro, "A Geodesic Framework for Fast Interactive Image and Video Segmentation and Matting," International Conference on Computer Vision, 2007.
[5] A. Criminisi, T. Sharp, and A. Blake, "GeoS: Geodesic Image Segmentation," in Computer Vision – European Conference on Computer Vision, pp. 99-112, 2008.
[6] A. Levin, D. Lischinski, and Y. Weiss, "A Closed-Form Solution to Natural Image Matting," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, pp. 228-242, 2008.
[7] M. Piccardi, "Background subtraction techniques: a review," IEEE International Conference on Systems, Man, and Cybernetics, vol. 4, pp. 3099-3104, 2004.
[8] A. M. Elgammal, D. Harwood, and L. S. Davis, "Non-parametric Model for Background Subtraction," European Conference on Computer Vision, pp. 751-767, 2000.
[9] D. Culibrk, O. Marques, D. Socek, H. Kalva, and B. Furht, "Neural Network Approach to Background Modeling for Video Object Segmentation," IEEE Transactions on Neural Networks, vol. 18, pp. 1614-1627, 2007.
[10] M. Kass, A. Witkin, and D. Terzopoulos, "Snakes: Active contour models," International Journal of Computer Vision, vol. 1, pp. 321-331, 1988.
[11] V. Caselles, R. Kimmel, and G. Sapiro, "Geodesic Active Contours," International Conference on Computer Vision, pp. 694-699, 1995.
[12] T. F. Chan and L. A. Vese, "Active contours without edges," IEEE Transactions on Image Processing, vol. 10, pp. 266-277, 2001.
[13] X. Bresson, S. Esedoglu, P. Vandergheynst, J.-P. Thiran, and S. Osher, "Fast Global Minimization of the Active Contour/Snake Model," Journal of Mathematical Imaging and Vision, vol. 28, no. 2, pp. 151-167, 2007.
[14] T. Goldstein, X. Bresson, and S. Osher, "Geometric Applications of the Split Bregman Method: Segmentation and Surface Reconstruction," Journal of Scientific Computing, vol. 45, no. 1-3, pp. 272-293, 2010.
[15] V. Kolmogorov, A. Criminisi, A. Blake, G. Cross, and C. Rother, "Probabilistic Fusion of Stereo with Color and Contrast for Bilayer Segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, pp. 1480-1492, 2006.
[16] G. Zeng, "Silhouette Extraction from Multiple Images of an Unknown Background," Asian Conference on Computer Vision, 2004.
[17] M. Sormann, C. Zach, and K. F. Karner, "Graph Cut Based Multiple View Segmentation for 3D Reconstruction," 3D Data Processing Visualization and Transmission, pp. 1085-1092, 2006.
[18] W. Lee, W. Woo, and E. Boyer, "Silhouette Segmentation in Multiple Views," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, pp. 1429-1441, 2011.
[19] Y. Hel-Or, H. Hel-Or, and E. David, "Fast template matching in non-linear tone-mapped images," International Conference on Computer Vision, pp. 1355-1362, 2011.
[20] T. Bouwmans, F. E. Baf, and B. Vachon, Background Modeling using Mixture of Gaussians for Foreground Detection - A Survey, 2008.
[21] C. Stauffer and W. E. L. Grimson, "Adaptive Background Mixture Models for Real-Time Tracking," Computer Vision and Pattern Recognition, vol. 2, pp. 2246-2252, 1999.
[22] E. Boyer, "On Using Silhouettes for Camera Calibration," Asian Conference on Computer Vision, pp. 1-10, 2006.
[23] A. Laurentini, "The Visual Hull Concept for Silhouette-Based Image Understanding," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, pp. 150-162, 1994.