In this chapter, we present and discuss our experimental results. The first stage of our experiments is capturing the video. To demonstrate how temporal knowledge assists our algorithm, we chose scenes containing occluding objects that block the moving objects passing behind them. To avoid conditions that degrade background subtraction, we chose scenes without large luminance changes or strongly swaying trees. We recorded the videos on campus.
The background images of the two videos are shown in Figure 4-1. We recorded each scene for about four minutes, during which about 25 objects passed through. The occluding objects in the first scene are the trees near the road, and the moving objects mainly move on the road or on the sidewalk. The occluding objects in the second scene are the tree, the streetlamp, and the car, and the moving objects move on the road or on the sidewalks on either side of it. The camera is a DCR-TRV60 digital video camera with a frame rate of 29 frames per second.
Figure 4-1 The background images of our demo videos
We then perform background subtraction on the videos using the ViBe program proposed in [11], run with its default parameter settings.
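To illustrate how ViBe classifies pixels, the following is a minimal, simplified grayscale sketch in NumPy. It is not the program from [11]. The parameter values (20 samples per pixel, matching radius 20, 2 required matches, subsampling factor 16) are the defaults we recall from the ViBe paper and should be treated as assumptions; the class and method names are hypothetical.

```python
import numpy as np

class SimpleViBe:
    """Simplified grayscale ViBe-style background subtractor (illustrative sketch)."""

    def __init__(self, n_samples=20, radius=20, min_matches=2, subsample=16, seed=0):
        self.n = n_samples            # samples stored per pixel (assumed default)
        self.radius = radius          # intensity distance counted as a match
        self.min_matches = min_matches
        self.phi = subsample          # each update happens with probability 1/phi
        self.rng = np.random.default_rng(seed)
        self.samples = None           # model of shape (n, H, W)

    def _random_neighbor(self, ys, xs, h, w):
        # Random offset in {-1, 0, 1}^2 (pixel itself or an 8-neighbour), clipped to the image.
        dy = self.rng.integers(-1, 2, size=ys.shape)
        dx = self.rng.integers(-1, 2, size=xs.shape)
        return np.clip(ys + dy, 0, h - 1), np.clip(xs + dx, 0, w - 1)

    def _initialize(self, frame):
        # Fill every sample slot with values drawn from each pixel's neighbourhood.
        h, w = frame.shape
        ys, xs = np.mgrid[0:h, 0:w]
        self.samples = np.empty((self.n, h, w), dtype=np.int16)
        for k in range(self.n):
            ny, nx = self._random_neighbor(ys, xs, h, w)
            self.samples[k] = frame[ny, nx]

    def apply(self, frame):
        """Return a boolean foreground mask (True = moving object) for one frame."""
        frame = np.asarray(frame).astype(np.int16)
        if self.samples is None:
            self._initialize(frame)
        h, w = frame.shape
        # A pixel is background if enough stored samples lie within the radius.
        matches = (np.abs(self.samples - frame) < self.radius).sum(axis=0)
        background = matches >= self.min_matches

        # Conservative update: only background pixels may refresh a random sample.
        upd = background & (self.rng.integers(0, self.phi, size=(h, w)) == 0)
        ys, xs = np.nonzero(upd)
        self.samples[self.rng.integers(0, self.n, size=ys.size), ys, xs] = frame[ys, xs]

        # Spatial propagation: background values also refresh a random neighbour's model.
        upd = background & (self.rng.integers(0, self.phi, size=(h, w)) == 0)
        ys, xs = np.nonzero(upd)
        ny, nx = self._random_neighbor(ys, xs, h, w)
        self.samples[self.rng.integers(0, self.n, size=ys.size), ny, nx] = frame[ys, xs]
        return ~background
```

Calling `apply` once per frame yields one binary mask per frame. Because the model is updated randomly and only from background pixels, slow background changes are absorbed while foreground pixels do not contaminate the model.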
Running the background subtraction algorithm over each four-minute video takes about five and a half minutes. The results contain some holes in the moving-object regions and some false alarms in the leaf regions. Examples of the background subtraction results are shown in Figure 4-2.
Figure 4-2 Some background subtraction results
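As a rough check of the processing speed, assuming the nominal 29 fps frame rate and the approximate durations stated above:

```latex
N \approx 4\,\text{min} \times 60\,\tfrac{\text{s}}{\text{min}} \times 29\,\tfrac{\text{frames}}{\text{s}} = 6960\ \text{frames},
\qquad
\frac{5.5 \times 60\,\text{s}}{6960\ \text{frames}} \approx 0.047\,\tfrac{\text{s}}{\text{frame}} \approx 21\ \tfrac{\text{frames}}{\text{s}}
```

Under these assumptions, background subtraction ran somewhat slower than real time on these videos.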
Given the background subtraction results, we apply morphological operations and collect the temporal knowledge. This step is time-consuming because each video contains thousands of frames; even the opening and closing operations alone take a few minutes. We then combine the temporal knowledge with Hoiem's algorithm to estimate the depth of the background image, using a rule-based approach to adjust the boundary likelihood. The results are shown in Figure 4-3.
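The mask clean-up step can be sketched in NumPy as follows, assuming a 3×3 square structuring element (the element size is our assumption). The `occupancy_map` function shows one hypothetical form of temporal knowledge, a per-pixel count of how often each background pixel is covered by a moving object; it is an illustration, not the exact knowledge collected in this thesis, and all function names are hypothetical.

```python
import numpy as np

def _shifted_copies(mask, r):
    """Stack all (2r+1)^2 shifted copies of a boolean mask, padding with False."""
    h, w = mask.shape
    padded = np.pad(mask, r, mode="constant", constant_values=False)
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(2 * r + 1) for dx in range(2 * r + 1)])

def erode(mask, r=1):
    # Keep a pixel only if its whole (2r+1)x(2r+1) neighbourhood is foreground.
    return _shifted_copies(mask, r).all(axis=0)

def dilate(mask, r=1):
    # Set a pixel if any pixel in its neighbourhood is foreground.
    return _shifted_copies(mask, r).any(axis=0)

def clean_mask(mask, r=1):
    """Opening removes isolated false alarms; closing then fills small holes."""
    opened = dilate(erode(mask, r), r)
    return erode(dilate(opened, r), r)

def occupancy_map(masks, r=1):
    """Hypothetical temporal knowledge: per-pixel count of cleaned foreground frames."""
    counts = None
    for m in masks:
        cleaned = clean_mask(np.asarray(m, dtype=bool), r)
        counts = cleaned.astype(np.int32) if counts is None else counts + cleaned
    return counts
```

Each frame requires several full-image passes, which is consistent with this step taking minutes over thousands of frames.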
Figure 4-3 Background depth estimation result
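The specific rules used to adjust the boundary likelihood are not listed in this section, so the following is only a hypothetical example of what such a rule could look like. It assumes that, for each candidate boundary segment, we have a prior likelihood from Hoiem's occlusion-boundary estimate, a count of "occlusion events" (moving objects vanishing or reappearing at the boundary), and a count of "crossing events" (objects crossing it while staying visible). The event names, thresholds, and step sizes are all assumptions.

```python
def adjust_boundary_likelihood(prior, occlusion_events, crossing_events,
                               boost=0.3, suppress=0.3, min_events=3):
    """Hypothetical rule for refining one boundary segment's likelihood in [0, 1].

    Objects disappearing at a boundary suggest a real occlusion boundary (raise it);
    objects crossing it while fully visible suggest no occluder there (lower it).
    """
    total = occlusion_events + crossing_events
    if total < min_events:
        return prior  # too little temporal evidence: keep the single-image estimate
    ratio = occlusion_events / total
    if ratio >= 0.5:
        adjusted = prior + boost * ratio
    else:
        adjusted = prior - suppress * (1.0 - ratio)
    return min(1.0, max(0.0, adjusted))  # clip back to a valid likelihood
```

Applied over all segments, for example `[adjust_boundary_likelihood(p, o, c) for p, o, c in segments]`, such a rule leaves segments without temporal evidence unchanged and only moves those the video actually informs.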
With the depth of the background image estimated, we assign depths to the moving objects and thereby obtain the relative depth of the video contents. We compare our results with those of the original single-image estimation, in which Hoiem's algorithm estimates the depth of each frame separately. The results are shown in Figure 4-4.
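One common way to assign a depth to a moving object is to read the background depth at the point where the object touches the ground. The sketch below follows that heuristic; it assumes depth grows with distance, that each object is given as a boolean mask (for example, one connected component of a cleaned foreground mask), and that objects stand on the ground. The function name is hypothetical and the heuristic is not necessarily the one used in this thesis.

```python
import numpy as np

def assign_object_depths(bg_depth, object_masks):
    """Compose a frame depth map from the background depth and per-object masks.

    Assumption: each object's depth equals the background depth at its lowest
    (ground-contact) row; larger values mean farther away.
    """
    bg = np.asarray(bg_depth, dtype=float)
    depth = bg.copy()
    placed = []
    for mask in object_masks:
        mask = np.asarray(mask, dtype=bool)
        rows, cols = np.nonzero(mask)
        if rows.size == 0:
            continue
        bottom = rows.max()                       # lowest image row of the object
        contact_cols = cols[rows == bottom]
        obj_depth = float(np.median(bg[bottom, contact_cols]))
        placed.append((obj_depth, mask))
    # Paint far-to-near so that nearer objects overwrite farther ones where they overlap.
    placed.sort(key=lambda item: item[0], reverse=True)
    for obj_depth, mask in placed:
        depth[mask] = obj_depth
    return depth
```

Because each object takes a single depth from the static background, its estimate stays stable from frame to frame as long as its ground-contact point is visible.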
Compared with the results of Hoiem's method, our algorithm performs better in three respects. First, boundaries that moving objects pass behind are more likely to be preserved. For example, the tree trunks in the first video occlude some moving objects, and their depths are better estimated. Second, we can avoid depth discontinuities in the background, as shown in the second video: the groves in the image are separated by occluding objects, yet our algorithm treats them as continuous objects with continuous depth. Third, the depths of moving objects are better estimated. In fact, applying Hoiem's method to every frame separately is very inefficient, and its estimates are usually inconsistent from frame to frame.
Figure 4-4 Video depth estimation result
Our system has some limitations. First, if no occluding object occludes any moving object in the video, we cannot obtain useful temporal information for depth estimation. Second, our system relies on the result of background subtraction, so it will not perform well when the background subtraction result is poor. Moreover, our method uses neither object-specific knowledge nor sophisticated tracking techniques. Incorporating more object knowledge and more advanced tracking techniques is a worthwhile direction for future work.
Chapter 5.
CONCLUSIONS
In this thesis, we proposed a method to estimate the depth of videos taken by static cameras. We accomplish this by first estimating the depth of the background image and then assigning depths to the moving objects to obtain the final depth estimation of the video contents. Unlike single-image depth estimation, our method exploits the temporal information provided by the moving objects and by the obstacles that may occlude them. Our algorithm can provide useful information to other applications, such as video surveillance and video synthesis. In surveillance systems, we can determine whether a moving object is occluded by an obstacle and use this information as prior knowledge. In video synthesis, we can add synthesized moving objects to the video and determine which parts of them should be occluded by objects in the scene.
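For the video-synthesis use case, an estimated depth map allows a simple per-pixel depth test when inserting a synthetic object. The sketch below assumes depth grows with distance, the inserted object has a boolean coverage mask and either a single depth or a per-pixel depth map, and frames are H×W×3 arrays; the function name is hypothetical.

```python
import numpy as np

def composite_with_occlusion(frame, scene_depth, obj_rgb, obj_mask, obj_depth):
    """Paste a synthetic object into a frame, hiding parts behind nearer scene content.

    obj_depth may be a scalar or an H x W array; smaller depth means nearer.
    Returns the composited frame and the mask of object pixels left visible.
    """
    visible = obj_mask & (obj_depth < scene_depth)  # per-pixel depth test
    out = frame.copy()
    out[visible] = obj_rgb[visible]
    return out, visible
```

Pixels of the inserted object that fall behind a nearer occluder, such as a tree trunk, keep the original frame content, which is exactly the occlusion decision described above.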
REFERENCES
[1] S. Battiato, S. Curti, M. L. Cascia, M. Tortora and E. Scordato, “Depth Map Generation by Image Classification,” in Proc. SPIE, Three-Dimensional Image Capture and Applications VI, vol. 5302, pp. 95–104, 2004.
[2] B. J. Super and A. C. Bovik, “Shape from Texture Using Local Spectral Moments,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, no. 4, pp. 333–343, April 1995.
[3] Y. G. Leclerc and A. F. Bobick, “The Direct Computation of Height from Shading,” IEEE Conference on Computer Vision and Pattern Recognition, pp. 552–558, June 1991.
[4] A. Torralba and A. Oliva, “Depth Estimation from Image Structure,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 9, pp. 1226–1238, September 2002.
[5] A. Saxena, S. H. Chung and A. Y. Ng, “3-D Depth Reconstruction from a Single Still Image,” International Journal of Computer Vision, vol. 76, no. 1, pp. 53–69, 2008.
[6] A. Saxena, M. Sun and A. Y. Ng, “Make3D: Learning 3-D Scene Structure from a Single Still Image,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 5, pp. 824–840, May 2009.
[7] P. Felzenszwalb and D. Huttenlocher, “Efficient Graph-Based Image Segmentation,” International Journal of Computer Vision, vol. 59, no. 2, pp. 167–181, 2004.
[8] B. Liu, S. Gould and D. Koller, “Single Image Depth Estimation from Predicted Semantic Labels,” IEEE Conference on Computer Vision and Pattern Recognition, pp. 1253–1260, June 2010.
[9] D. Hoiem, A. A. Efros and M. Hebert, “Recovering Surface Layout from an Image,” International Journal of Computer Vision, vol. 75, no. 1, pp. 151–172, 2007.
[10] D. Hoiem and A. A. Efros, “Recovering Occlusion Boundaries from an Image,” International Journal of Computer Vision, vol. 91, no. 3, pp. 328–346, 2011.
[11] O. Barnich and M. V. Droogenbroeck, “ViBe: A Universal Background Subtraction Algorithm for Video Sequences,” IEEE Transactions on Image Processing, vol. 20, no. 6, pp. 1709–1724, June 2011.