INTRODUCTION - Depth Reconstruction of Multi-Material Objects in Uncalibrated Video

1.1 Background

Televisions are increasingly important in every household, bringing us both information and entertainment. In recent years they have improved dramatically, especially in size and resolution. Even so, viewers are gradually becoming dissatisfied with the traditional fixed-viewpoint 2D display and want more variety. The current trend is therefore toward free-viewpoint 3D displays. Generally speaking, modern 3D display techniques can be categorized as stereoscopic or auto-stereoscopic. Whichever technique is used, a depth map of each frame, which specifies how near or far each object is, is always necessary.

There are many methods for generating depth maps. The prevailing one uses multiple views, which requires synchronized image capture and pixel-based correspondence matching across the views. Because of scene complexity (non-rigid bodies, similarly colored objects, occlusion), recovering a depth map with the multiple-view technique is a real challenge, although with some human assistance multiple views can still give a more effective result. However, commonly used videos, such as existing DVD movies, were not captured with the multiple-view technique: their viewpoints are uncalibrated, and precise frame-by-frame correspondence matching is difficult. For these reasons, we want to find a suitable and practical approach for generating a sequence of depth maps of the target object from an ordinary video sequence.

Other methods, such as shape from shading (SFS), structure from motion (SFM), and photometric stereo, are also important for shape reconstruction. Photometric stereo uses multiple images taken from a fixed viewpoint under different lighting conditions to recover the shape and reflectance properties of the target object. Although this method can reconstruct a detailed surface, it is limited to objects with Lambertian surfaces.
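As a minimal sketch of the photometric stereo idea described above, the Python function below recovers the albedo and surface normal at a single pixel. It assumes a Lambertian surface without shadows and at least three known, non-coplanar light directions. The function and variable names are illustrative and do not come from this thesis.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve the 3x3 linear system a x = b with Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("light directions are degenerate (coplanar)")
    x = []
    for col in range(3):
        # Replace one column of a by b, as Cramer's rule prescribes.
        m = [[b[r] if c == col else a[r][c] for c in range(3)] for r in range(3)]
        x.append(det3(m) / d)
    return x


def photometric_stereo_pixel(lights, intensities):
    """Recover (albedo, unit normal) at one pixel from K >= 3 intensities.

    Assumed model (Lambertian, no shadows): I_k = albedo * dot(l_k, n).
    The vector g = albedo * n is found by least squares via the
    normal equations (L^T L) g = L^T I.
    """
    ata = [[sum(l[i] * l[j] for l in lights) for j in range(3)] for i in range(3)]
    atb = [sum(l[i] * v for l, v in zip(lights, intensities)) for i in range(3)]
    g = solve3(ata, atb)
    albedo = math.sqrt(sum(c * c for c in g))
    if albedo == 0.0:
        raise ValueError("pixel is black under every light; normal is undefined")
    normal = [c / albedo for c in g]
    return albedo, normal


# Hypothetical example: a surface patch facing the camera with albedo 0.5.
lights = [(0.0, 0.0, 1.0), (0.6, 0.0, 0.8), (0.0, 0.6, 0.8)]
albedo, normal = photometric_stereo_pixel(lights, [0.5, 0.4, 0.4])
```

With these inputs the recovered albedo is approximately 0.5 and the normal approximately (0, 0, 1). A surface that is not Lambertian (for example, one with specular highlights) violates the assumed model, which is the limitation noted above.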

SFM is similar to the multiple-view technique but uses only one camera, obtaining the different views from camera motion instead. It recovers a point cloud and the projection matrices from correspondences between frames, but it is not as accurate as the calibrated multiple-view technique. The SFS technique recovers the depth of an object in an image by comparing its intensity variation against a reflection model. The advantage of SFS is that it requires only one image and avoids the pixel correspondence problem. Despite these advantages, SFS has several limitations: it is sensitive to intensity noise, and the lighting is restricted to a single light source or other simple lighting conditions. By its principle, SFS also applies only to objects of a single material.
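To make the single-material limitation concrete, the sketch below implements the forward Lambertian shading model that SFS tries to invert: it renders intensities from a height map under one directional light. The Lambertian model, the central-difference normals, and all names are assumptions chosen for illustration, not the reflectance model recovered later in this thesis.

```python
import math


def lambertian_shading(height, light, albedo=1.0):
    """Render intensities for the interior pixels of a height map z(x, y).

    height: list of rows of floats (height of the surface toward the camera).
    light:  direction pointing toward the light source (need not be unit length).
    The surface normal is proportional to (-dz/dx, -dz/dy, 1), with the
    derivatives estimated by central differences.
    Returns an image of size (H - 2) x (W - 2).
    """
    norm_l = math.sqrt(sum(c * c for c in light))
    lx, ly, lz = (c / norm_l for c in light)
    h, w = len(height), len(height[0])
    image = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            p = (height[y][x + 1] - height[y][x - 1]) / 2.0  # dz/dx
            q = (height[y + 1][x] - height[y - 1][x]) / 2.0  # dz/dy
            n_len = math.sqrt(p * p + q * q + 1.0)
            shade = (-p * lx - q * ly + lz) / n_len          # dot(n, l)
            row.append(albedo * max(0.0, shade))             # clamp back-facing
        image.append(row)
    return image


# A flat patch with albedo 0.5 ...
flat = [[0.0] * 3 for _ in range(3)]
dark_flat = lambertian_shading(flat, (0.0, 0.0, 1.0), albedo=0.5)

# ... and a tilted patch (slope sqrt(3) in x) with albedo 1.0.
tilted = [[math.sqrt(3.0) * x for x in range(3)] for _ in range(3)]
bright_tilted = lambertian_shading(tilted, (0.0, 0.0, 1.0), albedo=1.0)
```

Both patches render to an intensity of about 0.5, so a single image cannot tell a darker flat region from a brighter tilted one. This albedo and slope ambiguity is why SFS in its basic form presumes one known material.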

Our goal is to recover the depth maps of target objects from a general-purpose video sequence, such as a movie or family video, illuminated under a simple lighting condition. Because the target object in the video sequence may consist of multiple materials or be non-rigid, we combine SFS and non-rigid SFM with space-time optimization to approximate its surface as realistically as possible.

1.2 Framework

[Figure: flowchart of the proposed system, organized into a First Phase, Second Phase, and Third Phase. Blocks: Video Sequence; Target Object Segmentation; Feature Point Extraction; Super-pixel Clustering; Non-Rigid Structure from Motion; Adjustment of Boundaries of Multi-Material Segments; SFS and Reflectance Model Recovery; Optimization between Non-Rigid SFM and SFS; Space-Time Optimization; Output Depth Map Sequences.]

Our goal is to recover the shape of a target object in a video sequence. As shown in the flowchart above, we take a segment of a video sequence as input and generate the corresponding depth map sequence. The system can be split into four parts.

In the first part, we take one video sequence as input and segment the target object frame by frame. The target object may be non-rigid, like cloth, or rigid, like a rolling ball. We assume the lighting is either a directional spotlight or that of a cloudy day.

In the second part, we extract time-varying feature points on the target object frame by frame and recover its shape by structure from motion, a reconstruction technique that needs no calibration and is therefore especially useful for common video.

In the third part, starting from an over-segmentation of a target object with continuous surfaces, we find the boundaries between different materials and adjust the intensity levels across them. After the offsets are propagated along the segment boundaries, we can interactively approximate the 3D shape and the reflectance model.

In the fourth part, we combine SFS with SFM. To obtain more reliable and continuous 3D points, we add space-time optimization between neighboring frames.
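The energy actually optimized is not given in this section. Purely to illustrate what a temporal-coherence constraint between neighboring frames does, the sketch below smooths the per-frame depth of one tracked surface point with a generic quadratic energy. The energy, the weight, and the function name are assumptions made here and should not be read as the formulation used in this thesis.

```python
def smooth_depth_track(depths, weight=1.0, sweeps=200):
    """Temporally smooth the per-frame depth of one tracked surface point.

    Assumed energy (illustrative only):
        sum_t (z[t] - depths[t])**2 + weight * sum_t (z[t+1] - z[t])**2
    The first term keeps z close to the per-frame estimates; the second
    penalizes depth jumps between neighboring frames. The minimizer is
    approached by Gauss-Seidel sweeps, i.e. exact coordinate descent.
    """
    z = list(depths)
    n = len(z)
    for _ in range(sweeps):
        for t in range(n):
            neighbours = []
            if t > 0:
                neighbours.append(z[t - 1])
            if t + 1 < n:
                neighbours.append(z[t + 1])
            # Setting dE/dz[t] = 0 with the neighbours held fixed gives:
            z[t] = (depths[t] + weight * sum(neighbours)) / (
                1.0 + weight * len(neighbours))
    return z


# Hypothetical noisy depth estimates of one point over five frames.
noisy = [1.0, 1.2, 0.9, 1.1, 1.0]
smoothed = smooth_depth_track(noisy, weight=1.0)
```

A larger `weight` favors temporal smoothness over fidelity to the per-frame estimates. A full space-time scheme would also couple neighboring pixels within each frame, which this one-dimensional sketch omits.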

1.3 Contribution

In this thesis, we propose an approach to recover the shape of a target object in a video sequence. The object may be moving, non-rigid, or composed of materials with different reflectance properties. To improve reliability, we add spatial and temporal coherence constraints. The primary contributions are as follows:

(1) An advanced shape-from-shading technique for multi-material objects is proposed.

(2) For a real object in a video sequence under a single directional light, we can produce temporally consistent depth maps for display on a 3D TV.
