CHAPTER 1 INTRODUCTION
1.2. REVIEW OF RELATED WORK
Virtual reality systems rely on two major classes of rendering techniques: geometry-based and image-based. In geometry-based methods, a complete 3D model of the environment, including all the objects within the virtual world, is constructed and rendered to simulate the virtual world. In image-based methods, by contrast, collections of images taken from different viewpoints of the environment are used to generate novel views of the virtual world. Both approaches have their own advantages and weaknesses. However, image-based methods have become increasingly popular because they can easily be applied to construct high-quality, photorealistic environments. Shum et al. [56] performed a thorough survey of image-based rendering techniques and classified them into three categories according to the amount of geometric information used: rendering without geometry [12][29][54], rendering with implicit geometry (i.e., correspondence) [11][19], and rendering with explicit geometry (either approximate or accurate) [7][51]. Light Field Rendering [29] and Lumigraphs [19] are two well-known methods, but their large memory requirements make them impractical for real applications, especially those requiring Internet transmission. By comparison, OM has a smaller storage requirement than those methods. The OM approach belongs to the first category, rendering without geometry, because no 3D information is needed to render OMs. OM has recently become the most popular approach to modeling and rendering 3D objects, and has been adopted in many applications. This work investigates techniques for producing high-quality OMs, including object movie rig calibration, OM segmentation, stereo OM generation, and 3D reconstruction. The related work is discussed below.
As described in Section 1.1, the aim of rig calibration is to ensure that the pan, tilt, and optical axes intersect at a common point, Cs. Since a camera is mounted on the object movie rig, it can be used to perform the calibration, which can then be treated as a pose estimation problem. This problem has been widely studied in robot motion and industrial automation [35][42]. It is well known that a camera attached to a robot system can be used to determine the robot pose from the camera's extrinsic parameters. Camera calibration itself has also been widely discussed.
Calibration methods fall into two categories. The first is self-calibration, in which the camera parameters are estimated without any reference object by moving a camera in a static scene [21][39]. However, many parameters need to be estimated, which makes reliable results hard to obtain. The second is calibration with a reference object, in which calibration is performed by observing a calibration object whose geometry in 3D space is known with very high precision [58]. In this thesis, the motorized object rig is formulated with a kinematic model. Denavit and Hartenberg [16] developed a notation for assigning orthonormal coordinate frames to a pair of adjacent links in an open kinematic chain. However, parameter jumps occur when two consecutive joint axes change from parallel to almost parallel. Zhuang et al. [69] proposed a complete and parametrically continuous (CPC) kinematic model to avoid this situation.
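To illustrate why the Denavit-Hartenberg parameters become ill-conditioned for nearly parallel axes, the following minimal Python sketch computes the link offset d, i.e., the position along the first joint axis of the foot of the common normal to the second axis. The function name, the choice of the z-axis as the first joint axis, and the example point q are assumptions made for illustration only; they are not taken from [16] or [69]. Under these assumptions the offset equals -cot(eps) for q = (1, 1, 0), so it grows without bound and flips sign as the tilt angle eps passes through zero.

```python
import math


def dh_link_offset(q, eps):
    """Offset d along joint axis 1 of the common normal to joint axis 2.

    Assumed setup (illustrative only):
      axis 1 is the z-axis through the origin;
      axis 2 passes through point q with direction (0, sin eps, cos eps),
      so eps is the tilt away from being parallel to axis 1.
    """
    z = (0.0, 0.0, 1.0)
    u = (0.0, math.sin(eps), math.cos(eps))
    # Vector from the reference point on axis 2 to the one on axis 1.
    w0 = (-q[0], -q[1], -q[2])

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    b = dot(z, u)        # cosine of the angle between the axes
    d = dot(z, w0)
    e = dot(u, w0)
    denom = 1.0 - b * b  # sin^2(eps); vanishes for parallel axes
    if denom < 1e-15:
        raise ValueError("axes are parallel: the common normal is not unique")
    # Closest-point parameter on axis 1 (both directions are unit vectors).
    return (b * e - d) / denom


if __name__ == "__main__":
    q = (1.0, 1.0, 0.0)
    for eps in (0.1, 0.01, 0.001, -0.001, -0.01, -0.1):
        print(f"eps = {eps:+.3f} rad  ->  d = {dh_link_offset(q, eps):+.1f}")
```

A small change in the joint axis direction around eps = 0 therefore produces an arbitrarily large change in d, which is the kind of discontinuity a parametrically continuous model is meant to avoid.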
To our knowledge, OM segmentation is currently performed entirely by artists. These experts mainly use commercial interactive tools (e.g., the magic wand and intelligent scissors in Adobe Photoshop [1]) to remove the background of each image individually. This workflow does not exploit any information shared between images captured from neighboring viewing directions, and is consequently very expensive. Unfortunately, background removal in OMs has not been widely investigated, so OM segmentation remains an obstacle to the spread of image-based objects.
Interactive background removal tools have been developed for many years because of their practical importance. Such tools include the magic wand [1], intelligent scissors [40][41][26], Bayesian matting [13], graph-cut-based image segmentation [6][47][31][17][11], and interactive matting based on belief propagation [64]. Color information (e.g., foreground and background color models) and contrast information (e.g., gradient and edge strength) are usually exploited to achieve the goal. The most popular of these methods is probably graph-cut-based image segmentation: once a user manually provides foreground and background hard constraints on the image, the remaining pixels are automatically classified as foreground or background. These approaches are often quite successful for single-image segmentation, but are hard to apply to OM segmentation because of the drudgery of manually specifying hard constraints on each image of the OM individually.
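The interaction model described above, in which user-supplied hard constraints are propagated to the remaining pixels by a minimum cut, can be sketched on a one-dimensional image. This is a simplified illustration, not the method of any cited work: it keeps only contrast-based neighbor links and seed links, omits the color-model (regional) term, and uses a plain augmenting-path max-flow. The function name graph_cut_1d, the parameter sigma, and the sample intensities are hypothetical.

```python
import math
from collections import deque


def graph_cut_1d(pixels, fg_seeds, bg_seeds, sigma=10.0):
    """Label each pixel of a 1D image as foreground (True) or background."""
    n = len(pixels)
    src, sink = n, n + 1
    cap = [dict() for _ in range(n + 2)]  # cap[u][v] = residual capacity

    def add_edge(u, v, c):
        # Undirected link: the same capacity in both directions.
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v][u] = cap[v].get(u, 0.0) + c

    # Neighbor links: cheap to cut where the contrast is high.
    for i in range(n - 1):
        diff = pixels[i] - pixels[i + 1]
        add_edge(i, i + 1, math.exp(-diff * diff / (2.0 * sigma * sigma)))

    # Hard constraints: seed links too expensive to cut
    # (every neighbor link weighs at most 1, so n exceeds their sum).
    big = float(n)
    for i in fg_seeds:
        add_edge(src, i, big)
    for i in bg_seeds:
        add_edge(i, sink, big)

    def reachable_from_source():
        parent = {src: None}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    # Max-flow by shortest augmenting paths (Edmonds-Karp).
    while True:
        parent = reachable_from_source()
        if sink not in parent:
            break
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= flow
            cap[v][u] += flow

    # Pixels still connected to the source lie on the foreground side.
    return [i in parent for i in range(n)]


if __name__ == "__main__":
    row = [10, 12, 11, 200, 205, 198]
    print(graph_cut_1d(row, fg_seeds=[0], bg_seeds=[5]))
```

Only one foreground and one background seed are supplied in the usage example; the cut then falls on the neighbor link with the largest intensity jump. For an OM, such seeds would have to be supplied for every image, which is the cost the text refers to.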
OM background removal is a specific type of video object segmentation. Some automatic methods for video object segmentation have been proposed [44][36], but they are not always able to extract the desired video objects. Some researchers have proposed semi-automatic methods that allow user interaction to improve the accuracy of the results [20][36][38][67]. Although many approaches have been proposed for video object segmentation, none of them is devoted to object movie segmentation.
Generating stereo OMs from monocular ones is a novel view generation problem, which can be intuitively solved by image morphing [4][48]. Since image morphing does not consider any 3D information, it may produce unexpected effects. View morphing [50] uses additional 3D information, such as epipolar geometry and camera parameters, to eliminate these effects. However, both image morphing and view morphing require corresponding features in the original images, and obtaining good corresponding features is itself an open problem.
Another approach tries to reconstruct a geometric model of the object that is consistent with the image information. A calibrated laser projector and a calibrated camera can be used to reconstruct a 3D surface [59]. However, laser scanning devices are expensive. Another class of methods, photometric stereo [18][61][37], can recover high-quality 3D models. To use these methods, the lighting must be carefully controlled, which is impractical for many applications. Passive methods have been developed for more practical purposes. Laurentini [28] proposed a stable method, called the visual hull, to reconstruct a 3D surface using silhouette information. However, this method cannot recover the concavities of 3D objects. Seitz and Dyer [49] proposed an improved method that considers voxel colors from different views in order to carve away the voxels outside the true surface. However, the method has a problem in that the recovered surface points are dispersed. Vogiatzis et al. [62] recently proposed a graph-cut-based method, called volumetric graph cuts, to solve this problem. Because the graph-cut algorithm prefers the shortest cut, volumetric graph cuts have the problem that concave and convex features and silhouettes cannot be preserved. Tran and Davis [57] tried to solve these problems with silhouette constraints. Their method sets hard constraints on some verified surface voxels. It works well in some cases, but does not solve the problems completely.
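The limitation of silhouette-based reconstruction noted above can be illustrated with a toy two-dimensional example. The sketch below assumes orthographic views along the rows and columns of a small occupancy grid, which is far simpler than the calibrated perspective setting of [28]; the function names and the ring-shaped test object are hypothetical. Intersecting the back-projected silhouettes yields a hull that contains the object but also fills its concavity (the empty center cell).

```python
def silhouettes(shape):
    """Project a 2D occupancy grid orthographically onto its rows and columns."""
    row_sil = [any(row) for row in shape]
    col_sil = [any(row[j] for row in shape) for j in range(len(shape[0]))]
    return row_sil, col_sil


def visual_hull(row_sil, col_sil):
    """Keep every cell whose projections fall inside both silhouettes."""
    return [[r and c for c in col_sil] for r in row_sil]


# Hypothetical test object: a square ring whose center cell is empty.
ring = [
    [False, False, False, False, False],
    [False, True,  True,  True,  False],
    [False, True,  False, True,  False],
    [False, True,  True,  True,  False],
    [False, False, False, False, False],
]

hull = visual_hull(*silhouettes(ring))

# Print the hull; the center cell is marked even though the object is empty there.
for row in hull:
    print("".join("#" if cell else "." for cell in row))
```

No silhouette can reveal the empty cell, because every line of sight through it also passes through occupied cells. Methods that use color consistency, such as voxel coloring, are aimed at this kind of ambiguity.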