
Chapter 4 Experiment Results

4.2 Background Removal Result

The background removal experiment takes an image sequence of an object in an unprepared environment as input and generates the mask of the object. Note that the proposed algorithm requires 3D information, namely the camera poses and feature point positions, as described in Chapter 3. Hence, a camera tracker is required as a pre-process for mask generation. The voodoo camera tracker [21] is adopted to reconstruct the 3D camera poses and feature point positions of the image sequence. The tracker output is then passed to the proposed algorithm to generate the mask of the foreground object. The test image sequences are taken with a handheld, freely moving camera that orbits around and aims at the target object, which is placed in an unprepared environment for each test. The algorithm is applied to two different cases to test its performance.

4.2.1 The Toy-on-Table Sequence

The first experiment covers a small-scale scene: a toy placed on a table is filmed over a range of about 60 degrees. This is called the Toy-on-Table sequence. The sequence contains 134 frames at a frame rate of 30 fps, a resolution of 640x480 pixels, and a color depth of 24 bits.

Three frames of the sequence are shown in the “Original frame” column of Figure 4.2.1, while the corresponding foreground feature point distributions and background removal results are shown in the “Separated foreground feature points” and “Background removal result” columns of Figure 4.2.1, respectively. In more detail, the result of the foreground/background separation step, the first step of the proposed algorithm, is shown in Figure 4.2.2. The whole process takes about 82 minutes to run on a laptop with an Intel T2500 2.0 GHz CPU and 512 MB of DDRII-667 RAM. However, 81 minutes, or 98.7% of the execution time, is spent on the camera tracking process.

[Figure 4.2.1: image grid with columns “Original frame”, “Separated foreground feature points”, and “Background removal result”, and rows (a), (b), (c)]

Figure 4.2.1 Frames of the Toy-on-Table sequence. The background of “Background removal result” is changed to gray for a clear view: (a) frame #0, (b) frame #67, (c) frame #134

As Figure 4.2.1 shows, the performance of background removal varies from (a) to (c). The background removal extracts the object precisely in Figure 4.2.1(a), but shows a large, non-negligible error in Figure 4.2.1(c). Three causes of the error are found, in addition to the use of the convex hull algorithm mentioned in Chapter 3. The first factor is the error of foreground/background separation. Because the object mask is generated directly from the foreground points, any error in foreground point determination may directly

the detection algorithm adopted by the voodoo camera tracker. These two factors are the main influences on the background removal result.

The last factor is the error of the camera tracker’s estimation process, including errors in the 3D feature point positions, 3D camera positions, and 3D camera orientations. Note that this factor has no clear influence on the result compared with the above two factors, because the estimation error is small and the projection from 3D onto 2D reduces it. Returning to Figure 4.2.2, the error in 3D feature point positions can be seen in two places. The first is the group of yellow points around the foreground object points marked in green, circled by the orange ellipse, which should not exist in the original image. The second is the group of yellow points in the left part, circled by the red ellipse. Since the input image sequence covers only about 60 degrees around the object, there should be no such points in the left part, because no information was provided for that region.

Figure 4.2.2 Foreground/background separation result of the Toy-on-Table sequence. The green points are the foreground points and the yellow points are the background points.
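One possible reading of the remark that projection from 3D onto 2D reduces the estimation error is that the error component along the viewing direction has little effect on the image position. The following minimal sketch illustrates this with an ideal pinhole camera. The focal length, the point coordinates, and the error magnitudes are all hypothetical values chosen for illustration and are not taken from the experiments.

```python
def project(point, focal_px=500.0):
    """Ideal pinhole projection of a camera-space point (x, y, z) to pixel offsets."""
    x, y, z = point
    return (focal_px * x / z, focal_px * y / z)


def pixel_shift(p, q, focal_px=500.0):
    """Distance in pixels between the projections of two 3D points."""
    (u1, v1), (u2, v2) = project(p, focal_px), project(q, focal_px)
    return ((u1 - u2) ** 2 + (v1 - v2) ** 2) ** 0.5


# Hypothetical feature point 2 units in front of the camera.
point = (0.1, 0.0, 2.0)

# The same 0.05-unit position error, applied along two different directions.
depth_error_px = pixel_shift(point, (0.1, 0.0, 2.05))    # along the viewing direction
lateral_error_px = pixel_shift(point, (0.15, 0.0, 2.0))  # parallel to the image plane

print(f"depth error   -> {depth_error_px:.2f} px")
print(f"lateral error -> {lateral_error_px:.2f} px")
```

Under these assumptions the depth error moves the projected point by less than one pixel, while the lateral error of the same size moves it by more than ten pixels.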

4.2.2 The Statue Sequence

Unlike the indoor scene in Section 4.2.1, the outdoor scene has a larger scale. The Statue sequence is filmed outdoors, aiming at a statue in an open environment over a range of about 70 degrees. It contains 60 frames at a frame rate of 15 fps, a resolution of 640x480 pixels, and a color depth of 24 bits. As in Section 4.2.1, Figure 4.2.3 shows the results for three frames of the Statue sequence and Figure 4.2.4 shows the result of the foreground/background separation step. The whole process takes about 9 minutes to run on a laptop with an Intel T2500 2.0 GHz CPU and 512 MB of DDRII-667 RAM. Again as in Section 4.2.1, most of the processing time, 8 minutes or 88.9%, is spent on the camera tracking process.

[Figure 4.2.3: image grid with columns “Original frame”, “Separated foreground feature points”, and “Background removal result”, and rows (a), (b), (c)]

Figure 4.2.3 Frames of the Statue sequence. The background of “Background removal result” is changed to gray for a clear view: (a) frame #0, (b) frame #30, (c) frame #60

Compared with Section 4.2.1, the most obvious difference is that the background removal result is affected by the shape of the target object. For instance, since the object shape in Figure 4.2.1 is approximately convex, the convex hull algorithm successfully generates object masks close to the object shape. However, when the convex hull algorithm is applied to an object that is not composed solely of convex shapes, as shown in Figure 4.2.3, it fails to recover the concave parts of the object. Even though the foreground separation result reveals the shape of the target object, the background removal result contains a large part of the background due to the convex hull

Figure 4.2.4 Foreground/background separation result of the Statue sequence. The green points are the foreground points and the yellow points are the background points.
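The effect of the convex hull on a non-convex object can be reproduced with a small sketch. The code below uses Andrew's monotone chain algorithm, which is one standard way to compute a 2D convex hull and is not necessarily the variant used in Section 3.3. The L-shaped outline is a hypothetical stand-in for the projected foreground points of a concave object.

```python
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Area of a simple polygon by the shoelace formula."""
    total = 0
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


# Hypothetical concave (L-shaped) object outline in pixel units.
l_shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]

hull = convex_hull(l_shape)
object_area = polygon_area(l_shape)
mask_area = polygon_area(hull)

print("hull vertices:", hull)
print("object area:", object_area, "mask area:", mask_area)
```

The concave corner at (1, 1) is not a hull vertex, so the mask covers the whole notch of the L shape. The difference between the two areas corresponds to the background region that remains in the removal result.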

4.3 3D Model Reconstruction System Result

Combining the camera tracker, the octree, and the proposed background removal algorithm, a new 3D model reconstruction system prototype is proposed, as mentioned in Section 1.3. However, testing several available camera trackers, including voodoo camera tracker 0.9.1 beta [21], PFTrack 4.0 evaluation [34], ICARUS v2.09 personal edition [17], and SynthEyes Demo [1], revealed a limitation. These camera trackers can only reconstruct information from image sequences in which the viewing angle changes by less than about 80 degrees. The limitation comes from the trackers’ assumption that most of the detected feature points remain in sight. Camera trackers reconstruct 3D information by tracking the position changes of these feature points. Once the change in viewing angle exceeds about 80 degrees, the feature points in one frame may be quite different from those in another. Yet the camera tracker still tries to track feature points that are already lost and to find a best solution for the 3D camera poses and feature point positions. As a result, the reconstructed information collapses completely because of the invalid tracking.
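To give a rough feeling for why the set of commonly visible feature points shrinks as the camera orbits, the following sketch uses a deliberately simplified model. It assumes feature points spread evenly around a convex, cylinder-like object, and it treats a point as visible whenever its outward surface normal is within 90 degrees of the direction to the camera. Both assumptions, and the function names, are hypothetical and do not describe how any of the tested trackers work.

```python
def visible(normal_deg, camera_deg):
    """True if the outward normal is within 90 degrees of the camera direction."""
    diff = (normal_deg - camera_deg + 180.0) % 360.0 - 180.0
    return abs(diff) < 90.0


def shared_fraction(orbit_deg, n_points=360):
    """Fraction of the points seen in the first frame that are still seen
    after the camera has orbited by orbit_deg degrees."""
    normals = [i * 360.0 / n_points for i in range(n_points)]
    first_view = [n for n in normals if visible(n, 0.0)]
    shared = [n for n in first_view if visible(n, orbit_deg)]
    return len(shared) / len(first_view)


for angle in (0, 30, 60, 80, 120, 180):
    print(f"orbit {angle:3d} deg -> shared fraction {shared_fraction(angle):.2f}")
```

In this idealized model the shared fraction falls steadily with the orbit angle and reaches zero at 180 degrees. Real trackers can be expected to lose points earlier, since appearance changes and occlusion also break feature matches.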

As Figure 1.3.1 shows, the 3D information reconstructed by the camera tracker is crucial for the proposed reconstruction system. Because of the camera tracker limitation, information from 360-degree image sequences of the object is unavailable, and the proposed system cannot function effectively without it. Instead, scaled-down experiments are adopted to verify the reconstruction system. Using the Toy-on-Table and Statue sequences from Section 4.2, the reconstruction results are shown in Figures 4.3.1 and 4.3.2, respectively.

Figure 4.3.1 Reconstruction result of the Toy-on-Table sequence: (a) octree model, (b) triangular model.

Figure 4.3.2 Reconstruction result of the Statue sequence: (a) octree model, (b) triangular model.

As Figures 4.3.1 and 4.3.2 show, the shapes of the reconstructed models differ considerably from the target objects. The models are hollow because the results show only ON cubes; the hollow parts inside the surface actually represent undetermined cubes identified as IN cubes. The dissimilarity between the object models and the target objects is caused by two factors.

The first factor is the object mask generated by the convex hull algorithm, as mentioned in Section 4.2, which is adopted to preserve the largest possible shape of the object. However,

Since the image sequence provides only 80 degrees of view, the available information is limited to the provided viewing range. Hence, the octree algorithm can only trim the 3D model within that range, which leads to an incomplete model.
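The effect of a limited viewing range on silhouette-based trimming can be illustrated with a two-dimensional analogue. The sketch below is not the octree algorithm of Chapter 2: it uses a uniform grid instead of an octree, orthographic views instead of recovered camera poses, and a disk as a hypothetical target object. A cell is kept only if it projects inside the silhouette in every view.

```python
import math


def inside_silhouette(x, y, view_deg, radius):
    """Orthographic view along direction view_deg: the silhouette of a disk of
    the given radius is the interval [-radius, radius] on the image line."""
    t = math.radians(view_deg)
    s = -x * math.sin(t) + y * math.cos(t)  # coordinate on the image line
    return abs(s) <= radius + 1e-9          # small tolerance for rounding


def carve(view_angles_deg, radius=0.5, half_cells=20):
    """Return the grid cells that survive carving by all given views."""
    kept = set()
    for i in range(-half_cells, half_cells + 1):
        for j in range(-half_cells, half_cells + 1):
            x, y = i / half_cells, j / half_cells
            if all(inside_silhouette(x, y, d, radius) for d in view_angles_deg):
                kept.add((i, j))
    return kept


# Views every 10 degrees: an 80-degree range versus a half turn.
limited = carve(range(0, 81, 10))
full = carve(range(0, 180, 10))

# Cells that really belong to the disk (radius 0.5 equals 10 cells).
true_object = {(i, j) for i in range(-20, 21) for j in range(-20, 21)
               if i * i + j * j <= 10 * 10}

print("true object cells:", len(true_object))
print("kept with half-turn coverage:", len(full))
print("kept with 80-degree coverage:", len(limited))
```

With views restricted to an 80-degree range, cells that the missing views would have removed stay in the model, so the carved shape is larger than both the true object and the result obtained with wider coverage.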

Chapter 5 Conclusion

5.1 Conclusion

This thesis proposes a new prototype of a 3D object reconstruction system that deals with more general cases. However, a 3D object reconstruction system requires knowledge and technologies from many fields, not all of which could be mastered during the research period. Hence, the implementation of the proposed system integrates some existing algorithms and programs, including the octree algorithm, the convex hull algorithm, and the voodoo camera tracker, mentioned in Chapter 2, Section 3.3, and Section 4.2, respectively. Since the performance and stability of these integrated parts have been verified by many studies and applications, this work focuses only on the performance of the background removal algorithm described in Chapter 3.

Section 4.1 shows that the octree algorithm successfully rebuilds the 3D model from silhouettes and the corresponding camera poses. Section 4.2 then shows that the proposed background removal algorithm is workable, although much remains to be improved. Unfortunately, the experimental results in Section 4.3 reveal that the performance of the proposed system is restricted by the background removal algorithm and the camera tracker, especially the latter.

As mentioned above, this thesis focuses on proposing a brand new system that deals with problems not dealt with before. It shows that the proposed algorithm and system structure are useful, and much work can be done in the future to make the system better, as explained in the next section.

5.2 Future Work

5.2.1 The Camera Tracker

The limitations and effects of the existing camera tracker are presented in Section 4.3. Actually, the proposed algorithm cannot fully work if the camera tracker limitation is not

essential problem for all camera trackers. The problem should be a defect of the existing camera tracking algorithms. Hence, modifying the existing algorithm or designing a new algorithm for the shooting condition mentioned in this thesis is required.

The typical structure of a camera tracker is shown in Figure 5.2.1. The problem is caused by the feature point tracking block of the camera tracker. To deal with it, the feature point tracking algorithm must be improved so that it can determine whether a tracked point appears or disappears in a given frame. With this improvement, a point is tracked only while it is visible to the camera, not throughout the image sequence. To achieve this, the feature point tracking algorithm must track feature points based not only on low-level information such as edge, color, and texture, but also on higher-level information such as geometric relationships. Hence, higher-level information processing algorithms, such as image interpretation and understanding, might be integrated into the point tracking algorithm.

Figure 5.2.1 System structure for depth-map recovery algorithm

Besides the limitation on camera shooting angle, another minor improvement of the feature point tracker is also required for better background removal performance. As mentioned in Section 4.2, some important features, including edges and corners of the object, are not tracked by the feature point tracker. The lack of these points makes the following process, the object mask generation in Section 3.3, more complex and makes it difficult to recover the shape of the object. In other words, the more feature points are provided, the more accurate the object mask becomes. Hence, the feature point extraction algorithm could be improved to obtain more feature points on the object surface to support the background removal algorithm.

5.2.2 The Background Removal Algorithm

The background removal algorithm is composed of two steps. Currently, each step implements a simple algorithm and works as a prototype. For the foreground/background separation step, the current algorithm applies a modified nearest neighbor algorithm to the 3D point positions to separate the points. The algorithm is fast and simple but not accurate enough. The separation step should import information, such as texture, edges, and shape segments, from the object image to assist the separation algorithm. This information could be useful for eliminating points that are near the object but do not belong to it.
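The weakness of a purely distance-based separation can be shown with a small sketch. The code below is a generic stand-in, not the modified nearest neighbor algorithm of Chapter 3, whose details are not repeated here. It grows a foreground set from a seed point by repeatedly adding any point closer than a threshold to a point already accepted. The point coordinates, the seed, and the threshold `max_gap` are hypothetical.

```python
import math
from collections import deque


def separate_foreground(points, seed_index, max_gap):
    """Grow a foreground cluster from a seed by chaining near neighbours.
    Returns the set of indices labelled as foreground."""
    foreground = {seed_index}
    queue = deque([seed_index])
    while queue:
        current = queue.popleft()
        for k, candidate in enumerate(points):
            if k in foreground:
                continue
            if math.dist(points[current], candidate) <= max_gap:
                foreground.add(k)
                queue.append(k)
    return foreground


# Hypothetical reconstructed 3D feature points.
points = [
    (0.0, 0.0, 2.0),   # 0: object
    (0.1, 0.0, 2.0),   # 1: object
    (0.1, 0.1, 2.1),   # 2: object
    (0.25, 0.1, 2.1),  # 3: table surface right next to the object
    (1.5, 0.0, 4.0),   # 4: distant background
    (1.6, 0.2, 4.1),   # 5: distant background
]

fg = separate_foreground(points, seed_index=0, max_gap=0.2)
print("foreground indices:", sorted(fg))
```

The distant background points are rejected, but point 3 is accepted because it lies within the threshold of an object point. Distance alone cannot reject such a point, which is why additional cues such as texture or edges from the image would be needed.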

On the other hand, the object mask generation step must be improved to handle concave contours and holes. From Section 4.2, it is clear that this ability is crucial for the quality of the reconstructed model. However, it is difficult to determine the concave contours and holes of the object even with sufficient 2D point information. To obtain an accurate object mask, information from the object images, including texture and edges, must be taken into account during object mask generation, as in the previous step.

5.2.3 The Texture Mapping Block

Though texture mapping is a well-developed technique, it is much more complex than the octree algorithm and takes much time to implement. Hence, the texture mapping block is not implemented in the current system. However, a 3D model reconstruction system is incomplete without the texture mapping process, as a 3D model is incomplete without texture. The texture mapping block must be implemented in the future.

Bibliography

[1] Anderson Technologies LLC, "SynthEyes 2007 Demo," in http://www.ssontech.com/, 2007.

[2] Autodesk, "Maya Personal Learning Edition 8.5," in http://usa.autodesk.com/adsk/servlet/index?siteID=123112&id=7639525.

[3] P. A. Beardsley, P. H. S. Torr, and A. Zisserman, "3D Model Acquisition from Extended Image Sequences," in ECCV '96 : Proc. 4th European Conference on Computer Vision-Volume II, 1996, pp. 683-695.

[4] E. Boyer, "Object Models from Contour Sequences," in ECCV '96 : Proc. 4th European Conference on Computer Vision-Volume II, London, UK, 1996, pp. 109-118.

[5] M. S. Brown and W. B. Seales, "Beyond 2D images: effective 3D imaging for library materials," in DL '00: Proceedings of the fifth ACM conference on Digital libraries, New York, NY, USA, 2000, pp. 27-36.

[6] J. C. Carr, R. K. Beatson, J. B. Cherrie, T. J. Mitchell, W. R. Fright, B. C. McCallum, and T. R. Evans, "Reconstruction and representation of 3D objects with radial basis functions," in SIGGRAPH '01: Proceedings of the 28th annual conference on Computer graphics and interactive techniques, New York, NY, USA, 2001, pp. 67-76.

[7] CGAL Open Source Project, "Chapter 24 : 2D Alpha Shapes," 2006.

[8] C. M. Christoudias, B. Georgescu, and P. Meer, "Synergism in Low Level Vision," Proc. of 16th International Conference on Pattern Recognition, vol. 04, pp. 150-155, 2002.

[9] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, Second Edition: The MIT Press, 2001.

[10] B. Curless and M. Levoy, "A volumetric method for building complex models from range images," in SIGGRAPH '96: Proceedings of the 23rd annual conference on Computer graphics and interactive techniques, New York, NY, USA, 1996, pp. 303-312.

[11] P. Eisert, E. Steinbach, and B. Girod, "Automatic reconstruction of stationary 3-D objects from multiple uncalibrated camera views," Circuits and Systems for Video Technology, IEEE Transactions on, vol. 10, pp. 261-277, 2000.

[12] P. F. Felzenszwalb and D. P. Huttenlocher, "Efficient Graph-Based Image Segmentation," Int. J. Comput. Vision, vol. 59, pp. 167-181, September 2004.

[13] K. Fischer, "Introduction to alpha shapes," 2000.

[14] A. W. Fitzgibbon, G. Cross, and A. Zisserman, "Automatic 3D Model Construction for Turn-Table Sequences," in Proceedings of SMILE Workshop on Structure from Multiple Images in Large Scale Environments; Lecture Notes in Computer Science, 1998, pp. 154-170.

[15] S. F. Frisken and R. N. Perry, "Efficient estimation of 3D Euclidean distance fields from 2D range images," in VVS '02: Proceedings of the 2002 IEEE symposium on Volume visualization and graphics, Piscataway, NJ, USA, 2002, pp. 81-88.

[16] B. Georgescu and C. M. Christoudias, "Edge Detection and Image SegmentatiON (EDISON) System," in http://www.caip.rutgers.edu/riul/research/robust.html, 2003.

[17] S. Gibson, J. Cook, T. Howard, R. Hubbold, and D. Oram, "ICARUS v2.09 personal edition," 2003.

[18] K. Higuchi, M. Hebert, and K. Ikeuchi, "Building 3-D models from unregistered range images," Graph.Models Image Process., vol. 57, pp. 315-333, 1995.

[19] M. Kass, A. Witkin, and D. Terzopoulos, "Snakes: Active contour models," Int'l Conf. on Computer Vision, pp. 321-331, 1987.

[20] R. Koch, M. Pollefeys, and L. J. V. Gool, "Multi Viewpoint Stereo from Uncalibrated Video Sequences," in ECCV '98 : Proc. 5th European Conference on Computer Vision-Volume I, London, UK, 1998, pp. 55-71.

[21] Laboratorium für Informationstechnologie University of Hannover, "voodoo camera tracker 0.9.1 beta," in http://www.digilab.uni-hannover.de/docs/manual.html, 2007.

[22] M. Levoy, K. Pulli, B. Curless, S. Rusinkiewicz, D. Koller, L. Pereira, M. Ginzton, S. Anderson, J. Davis, J. Ginsberg, J. Shade, and D. Fulk, "The digital Michelangelo project: 3D scanning of large statues," in SIGGRAPH '00, New York, NY, USA, 2000, pp. 131-144.

[23] W. E. Lorensen and H. E. Cline, "Marching cubes: A high resolution 3D surface construction algorithm," in SIGGRAPH '87: Proceedings of the 14th annual conference on Computer graphics and interactive techniques, New York, NY, USA, 1987, pp. 163-169.

[24] C. Montani, R. Scateni, and R. Scopigno, "A modified look-up table for implicit disambiguation of Marching Cubes," The Visual Computer, vol. 10, pp. 353-355, December 1994.

[25] A. Moreira and M. Santos, "Concave Hull: a k-nearest neighbours approach for the computation of the region occupied by a set of points," 2007, pp. 8-11.

[26] C. Pheatt, J. Ballester, and D. Wilhelmi, "Low-cost three-dimensional scanning using range imaging," J.Comput.Small Coll., vol. 20, pp. 13-19, 2005.

[27] M. Pollefeys, R. Koch, M. Vergauwen, and L. V. Gool, "Flexible acquisition of 3D structure from motion," in Proc. 10th IMDSP Workshop, 1998, pp. 90-95.

[28] L. Qingquan, L. Bijun, L. Yuguang, and C. Jing, "3D Modeling and Visualization Based on Laserscanning," Geographic Information Sciences, vol. 6, pp. 159-164, 12 2000.

[29] J. Sethian, Level Set Methods and Fast Marching Methods: Evolving Interfaces in

[30] S. Sullivan and J. Ponce, "Automatic Model Construction, Pose Estimation, and Object Recognition from Photographs Using Triangular Splines," in ICCV '98: Proceedings of the Sixth International Conference on Computer Vision, Washington, DC, USA, 1998, p. 510.

[31] B. Sumengen and B. S. Manjunath, "Graph Partitioning Active Contours (GPAC) for Image Segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, p. 509, 2006.

[32] B. Sumner, "Active Contour Models: Snakes."

[33] R. Szeliski, "Rapid octree construction from image sequences," CVGIP: Image Underst., vol. 58, pp. 23-32, 1993.

[34] The Pixel Farm, "PFTrack 4.0 evaluation," 2007.

[35] N. Wolfgang and W. Jochen, "Automatic Reconstruction of 3D Objects using a Mobile Monoscopic Camera," in 3DIM '97 : Proc. Int. Conf. on Recent Advances in 3-D Digital Imaging and Modeling, Washington, DC, USA, 1997, p. 173.

[36] C. Xu and J. L. Prince, "Snakes, Shapes, and Gradient Vector Flow," IEEE Transactions on Image Processing, vol. 7, pp. 359-369, March 1998.

[37] Y. Yemez and F. Schmitt, "3D reconstruction of real objects with high resolution shape and texture," Image and Vision Computing, vol. 22, pp. 1137-1153, NOV 1 2004.

[38] C. T. Zahn, "Graph-theoretic methods for detecting and describing gestalt clusters," IEEE Transactions on Computers, vol. 20, pp. 68-86, 1971.
