10 Related Works
Figure 2.1: Camera 3D Reconstruction System - 3D Dome Developed by Narayanan, Rander and Kanade.
The technique of 3D reconstruction from stereo images of real scenes has been studied for many years. The focus of 3D reconstruction studies varies with the requirements of the application, such as robot navigation, 3D model reconstruction of architecture, computer graphics, and virtual reality.
Take robot navigation as an example. Robot vision systems do not demand sophisticated or realistic reconstruction results, only accurate depth information for the principal parts of the environment. Researchers working on robot vision systems therefore focus on how to calculate depth information from images precisely and efficiently.
Another example is the reconstruction of a 3D virtual model of a specific real object. The most common approach nowadays is to put the object on a rotating plate and capture images continuously with a stationary camera while the plate rotates. The camera can be calibrated first in order to acquire the relationship between image points and their reprojection rays. The camera motion can be formulated because the rotation speed and the radius of rotation are known in advance in such a system. The main purpose of 3D virtual model reconstruction systems is to build photo-realistic models from a sequence of images.
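The turntable setup can be illustrated with a short sketch. A stationary camera viewing a rotating object is equivalent to a camera orbiting a stationary object in the opposite direction, so the known rotation speed and radius determine the camera position for every frame. The function name, the choice of the z axis as the turntable axis, and the starting position of the camera are assumptions made for illustration only.

```python
import math

def turntable_camera_centers(radius, omega, times, height=0.0):
    """Camera centers expressed in the object's own coordinate frame.

    Assumptions (hypothetical setup): the turntable axis is the object's
    z axis, the plate turns counter-clockwise at `omega` rad/s, and at
    t = 0 the camera sits at (radius, 0, height).
    """
    centers = []
    for t in times:
        # The object turns by +omega*t, so relative to the object the
        # camera appears to turn by -omega*t around the same axis.
        theta = -omega * t
        centers.append((radius * math.cos(theta),
                        radius * math.sin(theta),
                        height))
    return centers

# Example: a quarter turn per second, camera 2 units from the axis.
centers = turntable_camera_centers(2.0, math.pi / 2, [0.0, 1.0, 2.0])
```

Every returned center lies on a circle of the given radius, which is why only the angle needs to be tracked from frame to frame.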
In the following sections of this chapter, we introduce several methods for rebuilding 3D virtual models from images captured by cameras in various poses.
2.1 Multi-View 3D Reconstruction
Multi-view 3D reconstruction systems rebuild the model from photos captured by several cameras in different poses. Corresponding features are found among the cameras in order to calculate the 3D coordinates of the real object. Cameras in multi-view systems are often fully calibrated so that their relative poses are known. The main advantage of these systems is that they acquire enough 3D information by tracking a moving object with multiple calibrated cameras.
Narayanan, Rander and Kanade proposed a multi-view photographic reconstruction system called 3D Dome [12]. As illustrated in Figure 2.1, 3D Dome is a hemispherical multi-camera capture system formed by fifty-one synchronized and fully calibrated cameras. Because all the cameras are fully calibrated, the camera intrinsic matrix K in Equation 4.6 and the relative poses among the cameras are all known. Therefore, when a person performs actions inside the dome, every camera captures images from a different point of view, and a dense depth map is obtained for each camera by running a multiple-baseline stereo reconstruction procedure. Mapping the texture onto the dense depth map forms a simple reconstructed 3D human model. The authors called this a visible surface model (VSM). However, a VSM is a surface model reconstructed from a single camera, so some parts of the person are inevitably difficult to reconstruct because of occlusion. The authors solved this problem by merging all the VSMs with an optimized integration procedure in order to reconstruct the complete surface model (CSM) of the scene.
In the 3D Dome system, all cameras are fully calibrated and their relative poses are known. Calculating 3D coordinates from the photos therefore needs no complicated computation. Since the system is equipped with 51 cameras, its main problem is making all cameras capture images synchronously.
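To illustrate why full calibration keeps the computation simple, the following is a minimal sketch of linear triangulation: given the projection matrices of several calibrated cameras and the image positions of one point, the 3D point is the null vector of a small linear system. This is a generic textbook formulation, not the procedure used in 3D Dome (which computes dense depth with multiple-baseline stereo); the function names and the example camera values are assumptions.

```python
import numpy as np

def project(P, X):
    """Project a 3D point X with a 3x4 projection matrix P to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate(projections, points_2d):
    """Linear triangulation of one 3D point from two or more calibrated views.

    Each view contributes two equations, u * P[2] - P[0] and
    v * P[2] - P[1], both of which vanish at the true homogeneous point.
    """
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    # The solution is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    X = Vt[-1]
    return X[:3] / X[3]

# Example with three hypothetical cameras sharing one intrinsic matrix K.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
translations = [np.zeros(3), np.array([-0.5, 0.0, 0.0]), np.array([0.0, -0.5, 0.0])]
Ps = [K @ np.hstack([np.eye(3), t[:, None]]) for t in translations]
X_true = np.array([0.2, -0.1, 4.0])
X_est = triangulate(Ps, [project(P, X_true) for P in Ps])
```

With noise-free observations the estimate matches the true point up to numerical precision; with noisy observations the same system gives a least-squares estimate in the algebraic sense.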
Fua and Leclerc proposed a similar system whose goal is to rebuild real scenes in virtual reality [6]. Unlike Narayanan, Rander and Kanade, they use only two calibrated cameras to capture images of static scenes. Fua and Leclerc turn the calculated 3D points into meshes in order to reconstruct the 3D surface model of the scene.
2.2 Single-Camera 3D Reconstruction
Single-camera 3D reconstruction systems obtain images with a single camera from different points of view, or simply record video while the camera is moving. The most commonly used method is so-called structure from motion. Image processing methods such as multi-image intensity correlation can be used in single-camera video systems to find image correspondences, since the differences between frames over short time intervals should be small.
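As an illustration of intensity correlation between consecutive frames, the sketch below searches a small window in the next frame for the patch that best matches a reference patch, scored by normalized cross-correlation. The small search window reflects the assumption that motion between frames is small. The function names, patch size, and search radius are illustrative choices, not taken from any of the cited systems.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized patches, in [-1, 1]."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return 0.0 if denom == 0 else float((a * b).sum() / denom)

def match_patch(img0, img1, y, x, half=3, search=5):
    """Find the position in img1 whose patch best matches the patch at (y, x) in img0.

    Only displacements up to `search` pixels are tried. The caller must
    choose (y, x) at least `half` pixels away from the border of img0.
    """
    ref = img0[y - half:y + half + 1, x - half:x + half + 1]
    best_score, best_pos = -2.0, (y, x)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            # Skip candidates whose patch would leave the image.
            if (yy - half < 0 or xx - half < 0 or
                    yy + half + 1 > img1.shape[0] or xx + half + 1 > img1.shape[1]):
                continue
            cand = img1[yy - half:yy + half + 1, xx - half:xx + half + 1]
            score = ncc(ref, cand)
            if score > best_score:
                best_score, best_pos = score, (yy, xx)
    return best_pos, best_score

# Example: the second frame is the first one shifted by (2, -3) pixels.
rng = np.random.default_rng(0)
frame0 = rng.random((40, 40))
frame1 = np.roll(frame0, shift=(2, -3), axis=(0, 1))
position, score = match_patch(frame0, frame1, 20, 20)
```

Normalizing by the patch mean and energy makes the score insensitive to uniform brightness and contrast changes between frames, which is one reason correlation is preferred over raw intensity differences.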
Pollefeys and Van Gool [13] implemented a single-camera reconstruction system similar to the systems described above. The input of their system is a sequence of images of the same scene captured by a single camera. After distinct features are specified in each image, similarity comparison methods are used to find correspondences among the images. Because the images contain errors caused by the camera's projection hardware, and because feature points specified by a human introduce noise, Pollefeys and Van Gool used random sample consensus (RANSAC) to compute several candidate fundamental matrices from the image correspondences and picked the most stable one. The fundamental matrix encodes the epipolar relationship between corresponding image points in two images; the definitions of the fundamental matrix and epipolar geometry are introduced in Chapters 3 and 4. Once the fundamental matrix is found, a projective reconstruction can be computed. If the camera is calibrated, the intrinsic parameters of the camera matrix are all known and a metric reconstruction can be performed, which differs from the real world only by a scale factor. After the metric reconstruction, the authors computed a dense depth map to obtain the depth of every pixel in the image and then performed texture mapping to reconstruct the whole 3D model.
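The RANSAC step can be sketched as follows. This is a generic formulation, assuming the normalized eight-point algorithm as the minimal solver and the Sampson error as the inlier test; it is not necessarily the exact procedure of [13]. The function names, iteration count, and threshold are illustrative assumptions.

```python
import numpy as np

def normalize(pts):
    """Translate and scale points so that the mean distance from the origin is sqrt(2)."""
    mean = pts.mean(axis=0)
    scale = np.sqrt(2) / np.mean(np.linalg.norm(pts - mean, axis=1))
    T = np.array([[scale, 0, -scale * mean[0]],
                  [0, scale, -scale * mean[1]],
                  [0, 0, 1]])
    homog = np.column_stack([pts, np.ones(len(pts))])
    return (T @ homog.T).T, T

def eight_point(x1, x2):
    """Estimate F such that x2^T F x1 = 0 from at least 8 correspondences."""
    n1, T1 = normalize(x1)
    n2, T2 = normalize(x2)
    A = np.column_stack([n2[:, 0] * n1[:, 0], n2[:, 0] * n1[:, 1], n2[:, 0],
                         n2[:, 1] * n1[:, 0], n2[:, 1] * n1[:, 1], n2[:, 1],
                         n1[:, 0], n1[:, 1], np.ones(len(n1))])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce rank 2, a property every fundamental matrix must have.
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    return T2.T @ F @ T1  # undo the normalization

def sampson_error(F, x1, x2):
    """First-order approximation of the squared geometric error, in pixels squared."""
    h1 = np.column_stack([x1, np.ones(len(x1))])
    h2 = np.column_stack([x2, np.ones(len(x2))])
    Fx1 = h1 @ F.T    # each row is F x1
    Ftx2 = h2 @ F     # each row is F^T x2
    num = np.sum(h2 * Fx1, axis=1) ** 2
    den = Fx1[:, 0] ** 2 + Fx1[:, 1] ** 2 + Ftx2[:, 0] ** 2 + Ftx2[:, 1] ** 2
    return num / den

def ransac_fundamental(x1, x2, iters=500, thresh=1.0, seed=0):
    """Return (F, inlier_mask) using random 8-point samples and consensus voting."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(x1), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(x1), 8, replace=False)
        F = eight_point(x1[idx], x2[idx])
        inliers = sampson_error(F, x1, x2) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 8:
        raise ValueError("RANSAC did not find a consensus set of 8 or more points")
    # Refit on all inliers of the best hypothesis.
    return eight_point(x1[best], x2[best]), best
```

Here `x1` and `x2` are N x 2 arrays of matched pixel coordinates. The hypothesis supported by the most correspondences plays the role of the "most stable" fundamental matrix, and the final refit over all of its inliers reduces the influence of noise in any single sample.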
Fitzgibbon and Zisserman proposed a similar system, but besides finding feature points, they also used line segments in the images, extracted with image processing techniques such as edge detection, for 3D reconstruction. Their reconstruction results therefore consist not only of sparse 3D points but also of line segments, which help to reduce the depth error of the 3D reconstruction. In addition, Zisserman used both the two-view reconstruction method and the trifocal tensor method, which reduces depth ambiguity by using three-view image correspondences.
2.3 Other Reconstruction Methods and Applications
Many applications require image-based 3D reconstruction systems for assistance. Schreer [15] developed a robot navigation system with built-in photographic 3D reconstruction algorithms. The two cameras used for robot vision are both fully calibrated so that 3D coordinates can be calculated in real time while the robot moves around the environment. However, this system avoids obstacles using only the distribution of the reconstructed 3D points together with some prior knowledge and experience. Because it does not consider the structure of indoor environments, the robot vision system may be inflexible.
Some reconstruction systems use characteristics of the scene to refine the reconstructed model. Cipolla and Robertson [3] used prior knowledge, such as the perpendicular relations among the walls and floors of buildings, to find the vanishing point in the image. The vanishing point is then transformed into 3D vector form in order to reduce its computational error. Once the vanishing point is found, the camera intrinsic parameters can be calculated from it, which simplifies both the calculation of the intrinsic parameters and the 3D model reconstruction.
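The link between vanishing points and intrinsic parameters can be shown with a simplified example. Assuming square pixels, zero skew, and a known principal point c (simplifications made here for illustration, not necessarily those of [3]), two vanishing points v1 and v2 of perpendicular scene directions satisfy (v1 - c) . (v2 - c) + f^2 = 0, so the focal length f follows directly. The function name and the numeric values are hypothetical.

```python
import math

def focal_from_orthogonal_vanishing_points(v1, v2, principal_point):
    """Focal length in pixels from two vanishing points of perpendicular directions.

    Assumes square pixels, zero skew, and a known principal point.
    """
    cx, cy = principal_point
    dot = (v1[0] - cx) * (v2[0] - cx) + (v1[1] - cy) * (v2[1] - cy)
    if dot >= 0:
        # For perpendicular directions the dot product must be negative.
        raise ValueError("vanishing points are not consistent with perpendicular directions")
    return math.sqrt(-dot)

# Example: vanishing points of the directions (1, 0, 1) and (-1, 1, 1)
# seen by a camera with f = 800 and principal point (320, 240).
f = focal_from_orthogonal_vanishing_points((1120.0, 240.0), (-480.0, 1040.0), (320.0, 240.0))
```

In the example, the two directions are perpendicular because their dot product is zero, and the recovered focal length equals the value used to generate the vanishing points.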