Chapter 1 Introduction
1.2 Related Works
Currently, a 3D model can be reconstructed manually by a human, or automatically using a laser scanner or a camera. The automatic reconstruction process consists of two steps, shape-modeling and texture-mapping, which handle the silhouette of the 3D model and construct its surface texture, respectively.
Reconstruction by a human is the most common approach, but it is time-consuming. A 3D artist usually takes weeks to months to build a model from several sketches or photos. Moreover, the quality of the constructed model depends on the ability of the artist. The uncertainty of quality and the cost in time are therefore the crucial problems of human reconstruction.
Hence, automatic reconstruction systems have been proposed to overcome these problems.
Reconstruction by laser scanners [5, 6, 10, 15, 18, 22, 26, 28] can create a model with exactly the same shape as the object being reconstructed, because the shape data are obtained by scanning the entire object from all angles. However, the scanner is expensive and requires special know-how to operate in restricted environments. In addition, the scanning process is affected by the material of the object: the system does not perform well on objects whose surfaces absorb light, such as fur or velvet. Finally, the scanner needs extra instruments, such as a camera, to capture texture information, since it cannot obtain texture information while scanning shape data.
Compared with the two methods above, reconstruction using a camera not only attains results of good quality, but is also much cheaper than laser scanning and takes less time than human reconstruction. This has made camera-based algorithms the most widely investigated topic among 3D reconstruction systems. Since 2D images captured by a camera contain both shape and texture information of a 3D object, the most critical issue is to find methods that can extract the data useful for reconstructing 3D models.
The typical architecture of camera reconstruction systems, shown in Figure 1.2.1, includes three main steps: camera-calibration, shape-modeling, and texture-mapping. The first step, camera calibration, determines the essential optical parameters and perspective characteristics. The second step reconstructs the shape of the 3D model. The last step, texture-mapping, generates the surface texture of the model.
A preprocess of camera calibration is applied to obtain image characteristics of the camera,
Figure 1.2.1 Camera reconstruction system structure
Most systems [3, 4, 11, 14, 20, 27, 30, 33, 35, 37] use an image sequence of at least 10 images, rather than a single image, as input, because a single image cannot provide sufficient information for the whole process. Generally, the performance of the shape-modeling process is affected by the number of input images; in other words, the more input images (and hence the more data), the more accurate the 3D model.
Current systems for 3D model reconstruction can be classified into two types according to the algorithm used in the model reconstruction block. The first type, depth-map recovery [3, 11, 20, 27], reconstructs the 3D model by estimating the 3D position of every pixel in the input image sequence according to the estimated camera pose. The second type, the shape-from-silhouette algorithm [4, 14, 30, 33, 35, 37], uses the object silhouette in each image to refine the outline of the 3D object model according to the corresponding object pose, since the 2D object silhouette is the projection of the 3D object shape.
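To make the per-pixel 3D position estimate in depth-map recovery concrete, the following is a minimal sketch of how a pixel with an estimated depth could be mapped back to a 3D point once the camera pose is known. It assumes an ideal pinhole camera without lens distortion and the convention Xc = R * Xw + t. The function names `project` and `backproject`, the intrinsic parameters `fx`, `fy`, `cx`, `cy`, and the example values are all illustrative assumptions and are not taken from the cited systems.

```python
def project(Xw, fx, fy, cx, cy, R, t):
    """Map a world point to (u, v, depth), assuming Xc = R * Xw + t."""
    xc, yc, zc = (sum(R[i][k] * Xw[k] for k in range(3)) + t[i] for i in range(3))
    return fx * xc / zc + cx, fy * yc / zc + cy, zc


def backproject(u, v, depth, fx, fy, cx, cy, R, t):
    """Map pixel (u, v) with an estimated depth back to a world point."""
    # Point in camera coordinates, obtained by inverting the pinhole projection.
    Xc = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    # Undo the pose: Xw = R^T * (Xc - t), valid because R is a rotation matrix.
    d = [Xc[k] - t[k] for k in range(3)]
    return tuple(sum(R[k][i] * d[k] for k in range(3)) for i in range(3))


# Hypothetical camera: 90 degree rotation about the z axis, shifted along z.
intrinsics = dict(fx=500.0, fy=500.0, cx=320.0, cy=240.0)
R = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
t = (0.0, 0.0, 5.0)

u, v, depth = project((1.0, 2.0, 3.0), R=R, t=t, **intrinsics)
recovered = backproject(u, v, depth, R=R, t=t, **intrinsics)
```

In this sketch the depth is taken directly from the forward projection, so `recovered` matches the original point. In an actual depth-map recovery system the depth of each pixel would have to be estimated from the image sequence, which is the time-consuming part.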
Diagrams of the system structures using the depth-map recovery algorithm and the shape-from-silhouette algorithm are illustrated in Figure 1.2.2 and Figure 1.2.3, respectively.
The shape-modeling step is subdivided into several processing blocks: camera tracking, model reconstruction, and object extraction. The first two blocks are contained in both types of systems: camera tracking recovers the 3D poses of the camera and of the feature points for each image, and model reconstruction builds the 3D model from the provided data. The object extraction block, which is required only by systems using the shape-from-silhouette algorithm, is the most significant difference between these two types of systems.
The differences in system structure and in the algorithms themselves make the two approaches differ in many aspects. First, the result of depth-map recovery is a 3D model of the whole scene in the image sequence, which is very useful for whole-scene reconstruction but not suitable for reconstructing an object model. In contrast, the shape-from-silhouette algorithm is designed to reconstruct a specific object in the image sequence. Second, the depth-map recovery algorithm reconstructs the 3D model at pixel-wise precision, but the reconstruction process is very time-consuming. The shape-from-silhouette algorithm is much faster than the depth-map recovery algorithm because it reconstructs the model only to a certain accuracy rather than exactly. To increase the accuracy, the reconstruction environment must be controlled so that error-free information, including object silhouettes and object poses, can be obtained. The shape-from-silhouette algorithm also has a weakness of its own: it cannot reconstruct a concave part of the object if the concavity never appears in any silhouette.
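The concavity limitation can be illustrated with a small toy example. The sketch below is a deliberately simplified form of silhouette carving, not the method of any cited system: it assumes a 4 x 4 x 4 voxel grid, three orthographic views along the coordinate axes, and perfect silhouettes. The object is a solid cube with a pit in its top face; the names `make_object`, `silhouette`, and `carve` are hypothetical.

```python
from itertools import product

N = 4  # voxels per axis (assumed resolution, for illustration only)


def make_object():
    """Return a solid N^3 cube with a pit in its top face, and the pit itself."""
    cube = set(product(range(N), repeat=3))
    pit = {(x, y, z) for x in (1, 2) for y in (1, 2) for z in (2, 3)}
    return cube - pit, pit


def drop_axis(voxel, axis):
    """Orthographic projection: discard the coordinate along the viewing axis."""
    return tuple(c for i, c in enumerate(voxel) if i != axis)


def silhouette(voxels, axis):
    """2D silhouette of a voxel set seen along the given axis."""
    return {drop_axis(v, axis) for v in voxels}


def carve(silhouettes):
    """Keep every voxel whose projection lies inside all silhouettes."""
    return {
        v
        for v in product(range(N), repeat=3)
        if all(drop_axis(v, axis) in sil for axis, sil in silhouettes.items())
    }


obj, pit = make_object()
views = {axis: silhouette(obj, axis) for axis in range(3)}
hull = carve(views)
```

Because the pit is hidden behind solid voxels in every view, all three silhouettes are full squares, and the carved result `hull` is the complete cube: the pit voxels are kept even though they are not part of the object.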
Figure 1.2.2 System structure for depth-map recovery algorithm
[Figure 1.2.3: block diagram. Block labels: Camera Calibration, Shape Modeling (Camera Tracking, Object Extraction, Model Reconstruction), Texture Mapping. Data labels: Calibration Image Sequence, Camera Internal Parameters, Image Sequence, Camera Pose, Shape Silhouette, Triangular Model, Object Model.]
Figure 1.2.3 System structure for shape-from-silhouette algorithm