CHAPTER 1 INTRODUCTION
1.2 Survey of Related Research
Many methods for constructing 3D structure have been proposed. They can be classified into two classes:
(A) Active Vision
(B) Passive Vision
The techniques in the active vision class project a light pattern onto the object and then estimate the depth information by 3D triangulation or by time of flight [1]-[10]. Active vision 3D construction methods provide highly accurate results. However, additional procedures are required to set up and calibrate the pattern projection device. Moreover, the working range is limited by the brightness of the light pattern projected onto the object or scene.
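As a rough illustration of the two depth cues mentioned above, the Python sketch below assumes an idealized rectified camera-projector pair, for which triangulation gives Z = f·b/d (focal length f in pixels, baseline b in metres, disparity d in pixels), and a direct round-trip time measurement, for which the depth is half the distance the light travels. The function names are hypothetical, and the calibration and pattern decoding that practical systems require are omitted.

```python
# Speed of light in vacuum (m/s).
SPEED_OF_LIGHT = 299_792_458.0


def depth_from_time_of_flight(round_trip_seconds: float) -> float:
    """The light travels to the surface and back, so depth is half the path length."""
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0


def depth_from_triangulation(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth Z = f * b / d for an idealized rectified camera-projector pair (assumption)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    print(depth_from_time_of_flight(20e-9))            # about 3.0 m
    print(depth_from_triangulation(800.0, 0.1, 40.0))  # 2.0 m
```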
In contrast, passive vision techniques do not need to project a light pattern onto the surface to infer the 3D information. Instead, features are extracted from images of the object surfaces and then used to infer the 3D information. The object silhouette provides an important visual cue for constructing the 3D structure of the object. Furthermore, in the physical world (especially the man-made world), planar surfaces such as walls, windows, tables, roofs, roads, and terraces can be found in indoor as well as outdoor scenes. Another task is therefore to reconstruct the 3D planar surfaces in a scene from multiple uncalibrated images taken by a camera placed at different viewpoints. In the following paragraphs, existing methods that use silhouette or planar surface information to infer 3D information are reviewed.
The reconstruction methods that use silhouette information, usually called shape from silhouette (SFS) methods, can be further divided into two groups. The first group of methods [11]-[12] applies differential geometry constraints to the extracted object silhouettes and generates 3D curves corresponding to them. However, at least three consecutive silhouette images are required to construct the 3D curves. In addition, consecutive silhouettes must differ only slightly, so a large number of views is required to construct a complete object. The second group of SFS methods obtains the 3D geometry of the object using volume intersection techniques. Different structures [13]-[30] have been proposed to represent the initial volume that is intersected with the space the object occupies.
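The volume intersection idea can be illustrated with a minimal voxel-based sketch in Python. It assumes calibrated 3×4 projection matrices and binary silhouette masks, looks up the nearest pixel for each projected voxel centre, and keeps a voxel only if it falls inside every silhouette. The function `carve_visual_hull` and the two affine test cameras are hypothetical, and none of the specific volume structures of [13]-[30] is reproduced.

```python
import numpy as np


def carve_visual_hull(voxel_centres, cameras, silhouettes):
    """Return a boolean mask of the voxels whose centres project inside every silhouette.

    voxel_centres: (M, 3) world coordinates.
    cameras:       list of calibrated 3x4 projection matrices (assumed known).
    silhouettes:   list of (H, W) boolean masks, True on the object.
    """
    m = len(voxel_centres)
    homog = np.hstack([voxel_centres, np.ones((m, 1))])
    occupied = np.ones(m, dtype=bool)
    for P, mask in zip(cameras, silhouettes):
        proj = homog @ P.T                                  # (M, 3) homogeneous pixels
        u = np.round(proj[:, 0] / proj[:, 2]).astype(int)   # nearest pixel column
        v = np.round(proj[:, 1] / proj[:, 2]).astype(int)   # nearest pixel row
        h, w = mask.shape
        on_image = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        hit = np.zeros(m, dtype=bool)
        hit[on_image] = mask[v[on_image], u[on_image]]
        occupied &= hit   # outside any silhouette cone => carved away
    return occupied


if __name__ == "__main__":
    # Two hypothetical affine cameras: one sees (x, y), the other sees (z, y).
    P_front = np.array([[1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 0, 1.0]])
    P_side = np.array([[0, 0, 1.0, 0], [0, 1.0, 0, 0], [0, 0, 0, 1.0]])
    mask = np.zeros((10, 10), dtype=bool)
    mask[3:7, 3:7] = True                     # a square silhouette in both views
    g = np.arange(10, dtype=float)
    grid = np.stack(np.meshgrid(g, g, g, indexing="ij"), axis=-1).reshape(-1, 3)
    print(carve_visual_hull(grid, [P_front, P_side], [mask, mask]).sum())  # 64
```

Because only voxel centres are tested, parts thinner than one voxel can be lost; a finer grid reduces this at the cost of more projections.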
Recently, an efficient image-based approach was proposed [66] to compute the so-called visual hull of an object, which represents the volume that the object occupies. The method intersects the epipolar lines with the silhouette images one by one. Its performance is highly dependent on the image resolution. Moreover, additional rendering techniques, such as pixel splatting, are required to render new images. In [27][65], the initial volume is represented by N×N×N voxels. Each voxel is then projected onto the images, and a color consistency check determines whether or not the voxel belongs to the object surface. The value of N affects the accuracy of the constructed model: the larger N is, the better the reconstructed model, but the longer the voxel projection and color consistency checks take.
An octree is one of the volumetric representations of an object's geometric model. It is well known in computer vision that an octree can be constructed from multiple silhouettes taken from different viewpoints of an object [13]-[24]. These methods can be divided into two classes: the 3D space approach [13]-[16] and the 2D space approach [17]-[24]. Both approaches recursively subdivide a partially occupied octant into smaller octants until every generated octant is entirely inside or entirely outside the object. The overlap between an octant and the object is determined by an intersection test, and the two approaches differ in this test. In the 3D space approach, the octant under examination is tested against the conic view volume formed by the silhouette and the center of projection of each viewpoint. In the 2D space approach, the projected image of the octant is tested against the silhouette of each viewpoint. Generally speaking, the 2D space approach operates in a lower-dimensional space, so it is more efficient than the 3D space approach [17]. Since the number of octants grows exponentially with the subdivision level, an upper bound is usually imposed on the number of subdivision levels to avoid running out of memory. However, more subdivision levels yield a better construction result, so it is generally difficult to strike a good balance between memory space and construction quality. In this dissertation, we propose a new subdivision strategy governed by the degree of overlap between the generated octant and the object, so as to make effective use of octant subdivisions and improve the overall system performance.
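The recursive subdivision with a 2D space intersection test can be sketched as follows. This is a simplified illustration of the conventional scheme with a fixed upper bound on the subdivision level, not the overlap-governed strategy proposed in this dissertation. It assumes calibrated 3×4 projection matrices with every octant in front of each camera, and it approximates each projected octant by the bounding box of its eight projected corners, which is conservative: some octants are labeled partial when an exact projection test would resolve them. The names `classify_view`, `build_octree`, and `max_level` are hypothetical.

```python
import numpy as np
from itertools import product

INSIDE, OUTSIDE, PARTIAL = "inside", "outside", "partial"
OFFSETS = np.array(list(product((0, 1), repeat=3)), dtype=float)  # unit-cube corners


def classify_view(corners, P, mask):
    """2D-space test of one octant against one silhouette, using the bounding box
    of the projected corners (a conservative approximation of the projection)."""
    proj = np.hstack([corners, np.ones((8, 1))]) @ P.T
    u, v = proj[:, 0] / proj[:, 2], proj[:, 1] / proj[:, 2]
    h, w = mask.shape
    # Pixel k covers [k, k+1); find the pixel range the bounding box overlaps.
    u0 = int(np.floor(u.min())); u1 = max(u0, int(np.ceil(u.max())) - 1)
    v0 = int(np.floor(v.min())); v1 = max(v0, int(np.ceil(v.max())) - 1)
    if u1 < 0 or v1 < 0 or u0 >= w or v0 >= h:
        return OUTSIDE                                   # projects off the image
    window = mask[max(v0, 0):min(v1, h - 1) + 1, max(u0, 0):min(u1, w - 1) + 1]
    if not window.any():
        return OUTSIDE
    if window.all() and u0 >= 0 and v0 >= 0 and u1 < w and v1 < h:
        return INSIDE
    return PARTIAL


def build_octree(origin, size, cameras, silhouettes, level=0, max_level=4):
    """Recursively subdivide partially occupied octants up to max_level.
    Returns a leaf label or a list of 8 children."""
    corners = origin + size * OFFSETS
    states = [classify_view(corners, P, m) for P, m in zip(cameras, silhouettes)]
    if OUTSIDE in states:
        return OUTSIDE            # outside one silhouette cone => outside the visual hull
    if all(s == INSIDE for s in states):
        return INSIDE
    if level == max_level:
        return PARTIAL            # depth bound reached: leaf stays partially occupied
    half = size / 2.0
    return [build_octree(origin + half * off, half, cameras, silhouettes, level + 1, max_level)
            for off in OFFSETS]


def leaf_labels(node):
    """Flatten the nested octree into its list of leaf labels."""
    if isinstance(node, list):
        return [label for child in node for label in leaf_labels(child)]
    return [node]


if __name__ == "__main__":
    P_front = np.array([[1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 0, 1.0]])  # sees (x, y)
    P_side = np.array([[0, 0, 1.0, 0], [0, 1.0, 0, 0], [0, 0, 0, 1.0]])   # sees (z, y)
    mask = np.zeros((10, 10), dtype=bool)
    mask[2:6, 2:6] = True
    tree = build_octree(np.zeros(3), 8.0, [P_front, P_side], [mask, mask], max_level=2)
    labels = leaf_labels(tree)
    print(len(labels), labels.count(INSIDE))  # 64 8
```

With a depth bound of `max_level`, up to 8^max_level leaves may be generated, which is the memory growth that motivates the upper bound on subdivision levels.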
Recently, much research effort has been devoted to reconstructing the 3D geometry of an object without calibrating the camera a priori. In general, the methods for 3D projective (i.e., uncalibrated) reconstruction [33][36][37][40][41][50] are point-based. They first estimate the fundamental matrix from a sufficient number of corresponding point pairs and then use it to derive the epipole and the canonical geometric representation of the projective views. Then, for each pair of corresponding points, they use a triangulation or bundle adjustment technique to compute the coordinates of the 3D point in projective space.
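The point-based pipeline described above can be sketched in Python with the normalized eight-point algorithm, the canonical camera pair P = [I | 0], P' = [[e']× F | e'] (where e' is the epipole in the second image, Fᵀe' = 0), and linear (DLT) triangulation. This is a noise-free sketch under stated assumptions: there is no outlier rejection and no bundle adjustment, and `make_synthetic_views` generates hypothetical test data rather than reproducing any of the cited methods.

```python
import numpy as np


def hartley_normalize(pts):
    """Translate (N, 2) points to their centroid and scale to mean distance sqrt(2)."""
    c = pts.mean(axis=0)
    s = np.sqrt(2.0) / np.linalg.norm(pts - c, axis=1).mean()
    T = np.array([[s, 0.0, -s * c[0]], [0.0, s, -s * c[1]], [0.0, 0.0, 1.0]])
    return np.column_stack([pts, np.ones(len(pts))]) @ T.T, T


def eight_point(x1, x2):
    """Fundamental matrix with x2^T F x1 = 0 from N >= 8 noise-free (N, 2) pairs."""
    p1, T1 = hartley_normalize(x1)
    p2, T2 = hartley_normalize(x2)
    # One linear equation per pair in the 9 entries of F (row-major order).
    A = np.column_stack([p2[:, [0]] * p1, p2[:, [1]] * p1, p2[:, [2]] * p1])
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt   # enforce rank 2
    F = T2.T @ F @ T1                         # undo the normalization
    return F / np.linalg.norm(F)


def skew(e):
    """Cross-product matrix [e]_x."""
    return np.array([[0.0, -e[2], e[1]], [e[2], 0.0, -e[0]], [-e[1], e[0], 0.0]])


def canonical_cameras(F):
    """Canonical projective pair P1 = [I | 0], P2 = [[e2]_x F | e2], with F^T e2 = 0."""
    e2 = np.linalg.svd(F)[0][:, -1]           # epipole in the second image
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.hstack([skew(e2) @ F, e2[:, None]])
    return P1, P2


def triangulate(P1, P2, u1, u2):
    """Linear (DLT) triangulation of one correspondence; returns a homogeneous 4-vector."""
    A = np.array([u1[0] * P1[2] - P1[0], u1[1] * P1[2] - P1[1],
                  u2[0] * P2[2] - P2[0], u2[1] * P2[2] - P2[1]])
    return np.linalg.svd(A)[2][-1]


def make_synthetic_views(n=20, seed=0):
    """Hypothetical noise-free test data: n random points seen by two made-up cameras."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([rng.uniform(-1.0, 1.0, (n, 3)) + [0.0, 0.0, 5.0], np.ones(n)])
    K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
    a = 0.1                                   # rotation about the y axis (radians)
    R = np.array([[np.cos(a), 0.0, np.sin(a)], [0.0, 1.0, 0.0], [-np.sin(a), 0.0, np.cos(a)]])
    views = []
    for Rt in (np.hstack([np.eye(3), np.zeros((3, 1))]), np.hstack([R, [[-1.0], [0.0], [0.0]]])):
        x = X @ (K @ Rt).T
        views.append(x[:, :2] / x[:, 2:])
    return views[0], views[1]


if __name__ == "__main__":
    x1, x2 = make_synthetic_views()
    P1, P2 = canonical_cameras(eight_point(x1, x2))
    X = triangulate(P1, P2, x1[0], x2[0])
    r = P2 @ X
    print(r[:2] / r[2], x2[0])                # the reprojection matches the measured point
```

Because the cameras are defined only up to a projective transformation, the triangulated points are correct only up to that transformation; what the sketch preserves is that they reproject onto the measured image points.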
Finally, to determine the uncalibrated planar scene structure [34][39][46][47][49][54][56][60], planes are fitted to the 3D points found. However, it is desirable to derive the 3D planar scene structure directly from plane features in the images, since these features are more reliable than point or line features [49].
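Fitting a plane to reconstructed points can be done linearly in homogeneous coordinates: a plane π satisfies π·X = 0 for every point X on it, so a least-squares estimate is the right singular vector of the stacked (row-normalized) points with the smallest singular value. The sketch below assumes points given as an (N, 4) array of homogeneous coordinates and minimizes an algebraic rather than a geometric error; `fit_plane` and `plane_residuals` are hypothetical names, not the fitting procedures of the cited works.

```python
import numpy as np


def fit_plane(points_h):
    """Algebraic least-squares plane pi (unit 4-vector) with pi . X ~ 0
    for homogeneous points X given as an (N, 4) array, N >= 3."""
    X = points_h / np.linalg.norm(points_h, axis=1, keepdims=True)  # equal weight per point
    return np.linalg.svd(X)[2][-1]  # right singular vector of the smallest singular value


def plane_residuals(pi, points_h):
    """Normalized algebraic residuals |pi . X| / (|pi| |X|) for each point."""
    return np.abs(points_h @ pi) / (np.linalg.norm(pi) * np.linalg.norm(points_h, axis=1))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    xy = rng.uniform(-1.0, 1.0, (30, 2))
    # Points on the plane 2x + 3y - z + 1 = 0, lifted to homogeneous coordinates.
    pts = np.column_stack([xy, 2 * xy[:, 0] + 3 * xy[:, 1] + 1, np.ones(30)])
    pi = fit_plane(pts)
    print(pi / pi[3])                         # proportional to (2, 3, -1, 1)
```

Since the fit is linear in π, it applies unchanged in a projective frame: if the points are transformed by a 4×4 matrix H, the fitted plane becomes H⁻ᵀπ.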
The estimation of the 3D projective planar structure based exclusively on projected plane features has not yet received much attention, although it is known that corresponding projected plane regions in a pair of stereo images induce a homography. Homographies are also known to be useful in many other practical applications, including:
(a) Fundamental matrix estimation or canonical projective geometry representation [48][49].
(b) 2D image mosaicing or view synthesis [43].
(c) Plane + parallax analysis [34][36][56].
(d) Planar motion estimation and ego-motion [45][54][60].
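A homography induced by a scene plane can be estimated from four or more corresponding points on that plane. The sketch below uses the normalized direct linear transformation (DLT) and assumes noise-free correspondences given as (N, 2) pixel arrays with x2 ~ H x1. It is point-based, so it illustrates what the induced homography is rather than region-based estimation; `homography_dlt` and `apply_homography` are hypothetical names.

```python
import numpy as np


def normalize_points(pts):
    """Translate (N, 2) points to their centroid and scale to mean distance sqrt(2)."""
    c = pts.mean(axis=0)
    s = np.sqrt(2.0) / np.linalg.norm(pts - c, axis=1).mean()
    T = np.array([[s, 0.0, -s * c[0]], [0.0, s, -s * c[1]], [0.0, 0.0, 1.0]])
    return np.column_stack([pts, np.ones(len(pts))]) @ T.T, T


def homography_dlt(x1, x2):
    """Normalized DLT estimate of H with x2 ~ H x1 from N >= 4 (N, 2) correspondences
    lying on one scene plane; returned with unit Frobenius norm."""
    p1, T1 = normalize_points(x1)
    p2, T2 = normalize_points(x2)
    rows = []
    for (x, y, w), (u, v, s) in zip(p1, p2):
        # Two independent rows of the cross-product constraint p2 x (H p1) = 0.
        rows.append([0, 0, 0, -s * x, -s * y, -s * w, v * x, v * y, v * w])
        rows.append([s * x, s * y, s * w, 0, 0, 0, -u * x, -u * y, -u * w])
    Hn = np.linalg.svd(np.array(rows))[2][-1].reshape(3, 3)
    H = np.linalg.inv(T2) @ Hn @ T1           # undo the normalization
    return H / np.linalg.norm(H)


def apply_homography(H, pts):
    """Map (N, 2) points through H and dehomogenize."""
    h = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return h[:, :2] / h[:, 2:]


if __name__ == "__main__":
    H_true = np.array([[1.2, 0.1, 30.0], [-0.05, 0.9, 10.0], [1e-4, 2e-4, 1.0]])
    x1 = np.random.default_rng(2).uniform([0, 0], [640, 480], (10, 2))
    H = homography_dlt(x1, apply_homography(H_true, x1))
    print(H / H[2, 2])                        # recovers H_true up to scale
```

With real, noisy matches, a robust estimator would normally be wrapped around this linear step to reject mismatched points.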
Recently, two methods have been proposed for the 3D projective reconstruction of planes and cameras. The first method assumes that all planes are visible in all images, and the second assumes that a reference plane is visible in all images [51][52]. In practice, it is unrealistic to expect all planes, or even one plane, to be visible in all images unless a very large ground plane is available. When no reference plane is visible in all images, the reconstruction problem cannot be formulated within a common projective space, and the reconstruction results are inevitably obtained in different projective spaces.
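Obtaining reconstructions in different projective spaces means that two reconstructions of the same scene points differ by an unknown 4×4 projective transformation H. When at least five common points in general position are known in both, H can be recovered linearly, as in the sketch below (homogeneous (N, 4) inputs, noise-free, and assuming the fourth coordinate of every target point is nonzero). It only illustrates the relationship between projective frames; it is not one of the methods of [51][52], and `projective_transform_3d` is a hypothetical name.

```python
import numpy as np


def projective_transform_3d(X1, X2):
    """Linear estimate of the 4x4 H with X2 ~ H X1 from N >= 5 homogeneous (N, 4)
    correspondences in general position (assumes X2[:, 3] != 0)."""
    rows = []
    for a, b in zip(X1, X2):
        for i in range(3):
            # b[3] * (H[i] . a) - b[i] * (H[3] . a) = 0, linear in the 16 entries of H
            row = np.zeros(16)
            row[4 * i:4 * i + 4] = b[3] * a
            row[12:16] = -b[i] * a
            rows.append(row)
    return np.linalg.svd(np.array(rows))[2][-1].reshape(4, 4)


if __name__ == "__main__":
    H_true = np.array([[1.0, 0.2, 0.0, 0.5], [0.1, 1.1, 0.0, -0.3],
                       [0.0, 0.3, 0.9, 0.2], [0.05, 0.0, 0.1, 1.0]])
    X1 = np.column_stack([np.random.default_rng(3).uniform(-1, 1, (8, 3)), np.ones(8)])
    H = projective_transform_3d(X1, X1 @ H_true.T)
    print(H / H[3, 3])                        # recovers H_true up to scale
```

Once H is known, points of the second reconstruction map into the first frame as H⁻¹X, and its cameras as P ↦ PH.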