PHOTOMETRIC STEREO - RELATED WORK - Depth Reconstruction of Multi-Material Objects in Uncalibrated Video

CHAPTER 2 RELATED WORK

2.3 PHOTOMETRIC STEREO

Photometric stereo estimates local surface orientation by using several images of the same surface taken from the same viewpoint but under illumination from different directions.
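As a minimal sketch of this idea, assume a Lambertian surface and exactly three known, non-coplanar light directions (both are assumptions made here for illustration, not statements from the text). Under that model each measured intensity is the albedo times the dot product of the light direction and the normal, so the albedo-scaled normal at a pixel follows from a 3x3 linear system. All function names are illustrative.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(a, b):
    """Solve the 3x3 system a x = b with Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("light directions must not be coplanar")
    x = []
    for col in range(3):
        # Replace one column of a by b and take the determinant ratio.
        m = [[b[r] if c == col else a[r][c] for c in range(3)] for r in range(3)]
        x.append(det3(m) / d)
    return x

def photometric_stereo_pixel(lights, intensities):
    """Assumed Lambertian model: intensity_k = albedo * dot(light_k, normal)."""
    g = solve3(lights, intensities)            # g = albedo * normal
    albedo = math.sqrt(sum(v * v for v in g))
    normal = [v / albedo for v in g]
    return albedo, normal

# Synthetic check: render one pixel under three lights, then invert.
lights = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.8], [0.0, 0.6, 0.8]]
true_normal = [0.0, 0.6, 0.8]
true_albedo = 0.5
intensities = [true_albedo * sum(l[i] * true_normal[i] for i in range(3))
               for l in lights]
albedo, normal = photometric_stereo_pixel(lights, intensities)
```

With more than three images the same system would typically be solved in the least-squares sense; the three-light case is kept here only because it is the smallest one that determines the normal.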

Fig7. Reconstructions of the shark sequence using the method of Torresani et al. [7]. The method is given 2D tracks as input; the reconstructions are shown here from a different viewpoint.

Hertzmann and Seitz [8] proposed example-based photometric stereo (Fig8). They introduced the concept of orientation consistency to reconstruct surface normals from reference images, in which reference objects of identical material are photographed together with the target. Combined with traditional photometric stereo, this recovers a more detailed surface. The technique is extremely simple to implement and applies to a broader class of objects than previous photometric stereo techniques. Fig. 4(a) shows the 3D recovery result for the bottle. Fig. 4(b) illustrates images taken under different light source directions.
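The orientation-consistency idea can be sketched as a nearest-neighbor lookup: a target pixel is assigned the normal of the reference pixel whose intensities across all lighting conditions are most similar. The sketch below assumes the reference object has known normals and the same material as the target; the data and the function name are hypothetical.

```python
def closest_reference_normal(target_vec, reference):
    """reference: list of (intensity_vector, normal) pairs taken from an object
    of known shape made of the same material as the target."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, best_normal = min(reference, key=lambda item: sq_dist(item[0], target_vec))
    return best_normal

# Each intensity vector holds one pixel's brightness under three lights.
reference = [
    ([0.9, 0.5, 0.5], (0.0, 0.0, 1.0)),
    ([0.7, 0.9, 0.3], (0.6, 0.0, 0.8)),
    ([0.7, 0.3, 0.9], (0.0, 0.6, 0.8)),
]
estimated = closest_reference_normal([0.72, 0.88, 0.31], reference)
```

A linear scan is used for clarity; a practical implementation would likely use a spatial index for the lookup.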

Hernandez Esteban et al. [9] used silhouettes from multiple views to recover the camera motion and then obtained a coarse shape of the object from the visual hull (Fig9). In addition, they proposed a robust technique for estimating light directions and introduced a novel formulation that combines photometric stereo with 3D points from the visual hull.

Fig8. The left two images are the surface reconstructed by the method of Hertzmann and Seitz [8]. The right four images are the reference and target objects used for reconstruction.

Fig9. Detailed surface reconstructed by the photometric method of Hernandez Esteban et al. [9]

2.4 SHAPE FROM SHADING

Shape from shading (SFS) recovers shape from the gradual variation of shading in a single image. It is difficult to use in real applications because the problem is intrinsically ill-posed. In addition, it is limited to the Lambertian reflectance model and a single light source.

Even so, it remains an important topic in computer vision because of its advantages: it needs only a single image, and it avoids the time-consuming correspondence matching required by multiple-view techniques. For this reason, many surveys and applications still build on it.

Because SFS is intrinsically ill-posed, Zeng et al. [10] proposed a method that finds a global solution for a continuous surface. Users specify surface normals at selected feature points, and the system propagates the surface variation over the whole face (Fig10). The method divides the surface into local patches and applies the Fast Marching Method to them, which speeds up the computation; each local surface is estimated with some human assistance. By optimizing an energy function that combines the local surfaces, it obtains a global solution for both synthetic and real-world data.

Wu et al. [11] extended the above method (Fig11). To handle biases in the light direction, they reformulated SFS so that it produces good initial normals over a large region, leaving the most noticeable errors mainly in the smooth parts. They also developed an easy-to-use 2D user interface for editing the normal map and correcting noticeable errors.

Fig10. Interactive shape from shading

Fang et al. [12] combined shape-from-shading with texture synthesis to re-texture a target object in a photograph (Fig12). However, this approach is error-prone because it assumes a Lambertian surface and simple lighting conditions. It is suitable only for simple objects, such as T-shirts or sculptures, and needs manual rectification.

Chapter 3 Multi-Material Shape from Shading Reconstruction

This thesis focuses on reconstructing a target object in a video sequence under a single directional light. Reconstructing a real object is always difficult. Although multiple-view techniques can yield more precise 3D positions, they need time-consuming, accurate pixel correspondences and many input images. Photometric stereo needs specific, controlled light directions, which makes it unsuitable for common video sequences, whose multiple light directions are unknown. Furthermore, neither approach can handle both non-rigid bodies and uncalibrated video sequences. For these reasons, we propose a novel hybrid method that recovers 3D surfaces by combining non-rigid structure-from-motion with multi-material shape-from-shading.

Fig11. Surface reconstructed by the method of Wu et al. [11]

Fig12. Left: input image; middle: estimated normal map; right: textured surface

3.1 First Phase – Multi-Material Shape from Shading

The conventional shape-from-shading technique is suitable only for target objects composed of a single material (a consistent reflectance property), but a common object surface usually has multiple reflectance properties. For this reason, we divide the input images into spatially connected segments, each containing only a single material. After segmentation, we can apply SFS to each single-material segment. However, SFS is based on optimizing intensity against a reflectance model, so artifacts appear at the boundaries between segments.

In the next step, we adjust the boundaries between segments and propagate the intensity offsets between them to generate an intensity map with smooth boundaries. From the final smooth intensity map, we can recover a surface without boundary artifacts. The detailed steps are introduced below.

3.1.1 Super-pixel Clustering

To avoid artifacts at the boundaries between different materials caused by intensity differences, we split the target object into multiple single-material segments, each of which is spatially connected (Fig13).

Fig13. Superpixels of the synthetic image

We use the "super-pixels" technique [13], a clustering technique based on Normalized Cuts. It exploits pairwise brightness, color, and texture cues between pixels to over-segment an image into multiple spatially connected components. This accelerates region clustering and edge finding. With super-pixel clustering, we not only preserve the spatial relationships between segments but also find the boundaries between different materials more easily, as Fig13 shows.

3.1.2 Propagating Intensity Offset between Super-pixels

Like other clustering techniques, the super-pixel technique cannot find the perfect boundary between different materials (Fig14). This is due to the trade-off between position and intensity differences in its objective function. For cases where the precise boundary is not found, our system additionally provides offset propagation to eliminate boundary artifacts.

To improve the super-pixel boundaries, we use a shortest-path method to align each boundary with the intensity gradients. We generate a small gradient map around a given boundary, readjust the boundary, and then redistribute the pixels around it between the two neighboring super-pixels. Repeating this step yields increasingly precise boundaries. Although the aligned boundaries are closer to the material boundaries, the intensity difference across each boundary is not yet eliminated. For this reason, we compensate for the offset between super-pixels to produce a smooth and continuous intensity map.
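The text does not specify which shortest-path formulation is used, so the following is only one plausible sketch: a dynamic-programming path through a small patch that runs from the top row to the bottom row, moves at most one column per row, and prefers positions with a large horizontal intensity gradient. The cost definition, the one-column step limit, and the function name are all assumptions made for illustration.

```python
def boundary_path(intensity):
    """Return one column index c per row; the boundary lies between
    columns c and c + 1 of that row."""
    rows, cols = len(intensity), len(intensity[0])
    width = cols - 1
    # Cost is low where the horizontal intensity gradient is large.
    cost = [[-abs(row[c + 1] - row[c]) for c in range(width)] for row in intensity]
    total = [cost[0][:]]     # accumulated cost of the best path ending at each cell
    back = []                # back-pointers, one list per row after the first
    for r in range(1, rows):
        prev = total[-1]
        cur, ptr = [], []
        for c in range(width):
            candidates = [k for k in (c - 1, c, c + 1) if 0 <= k < width]
            best = min(candidates, key=lambda k: prev[k])
            cur.append(cost[r][c] + prev[best])
            ptr.append(best)
        total.append(cur)
        back.append(ptr)
    # Backtrack from the cheapest end point in the last row.
    c = min(range(width), key=lambda k: total[-1][k])
    path = [c]
    for ptr in reversed(back):
        c = ptr[c]
        path.append(c)
    path.reverse()
    return path

patch = [
    [10, 10, 50, 50, 50],
    [10, 10, 10, 50, 50],
    [10, 10, 10, 50, 50],
    [10, 10, 50, 50, 50],
]
path = boundary_path(patch)
```

Pixels on either side of the returned path would then be reassigned to the two neighboring super-pixels, and the step repeated as described above.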

First, we choose one super-pixel as the start seed and shift its neighboring super-pixels in the intensity domain. After the shift, the neighbors connect smoothly to the seed in the intensity domain. The offset of a neighboring super-pixel is split into two parts: a boundary part and an internal part (Fig15, Fig16). In the boundary part (Fig15), the offset is the difference between the intensity of a boundary pixel in the start super-pixel and that of the closest pixel in the neighboring super-pixel. To suppress noise, we average the pixels in a region around each boundary pixel and use the averages in the offset computation.

In the internal part (Fig16), we interpolate the offset values according to the distance to the boundary.

Fig15. Boundary part

Fig16. Internal part
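The boundary and internal parts can be sketched as follows. The boundary offset is computed from averaged windows on both sides of the boundary, as described above. For the internal part, the text says only that offsets are interpolated according to distance; inverse-distance weighting is used here as one possible choice, which is an assumption. Function names and sample values are hypothetical.

```python
import math

def mean(values):
    return sum(values) / len(values)

def boundary_offset(start_side, neighbor_side):
    """Shift that brings the neighbor's boundary intensity to the level of the
    start super-pixel; each argument is a small window of intensities."""
    return mean(start_side) - mean(neighbor_side)

def interior_offset(pixel, boundary_pixels, boundary_offsets):
    """Inverse-distance weighted interpolation of the boundary offsets
    (assumed interpolation scheme)."""
    weights = []
    for (bx, by) in boundary_pixels:
        d = math.hypot(pixel[0] - bx, pixel[1] - by)
        weights.append(1.0 / max(d, 1e-6))
    return sum(w * o for w, o in zip(weights, boundary_offsets)) / sum(weights)

offsets = [
    boundary_offset([100, 102, 98], [60, 61, 59]),    # 40.0
    boundary_offset([100, 100, 100], [70, 72, 68]),   # 30.0
]
boundary_pixels = [(0, 0), (0, 4)]
# A pixel equally far from both boundary pixels receives the average offset.
middle = interior_offset((3, 2), boundary_pixels, offsets)
```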

Afterwards, we propagate the offset operation to all super-pixels and obtain a smoothly connected intensity map without the discontinuities caused by the boundaries between different materials. Even after this propagation, some artifacts may remain at the boundaries in the final intensity map. To address this, we either repeat the propagation or simply apply an additional smoothing filter. Once we have a satisfactory intensity map, we start the shape recovery with the following shape-from-shading technique.
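One way to picture the propagation is as a breadth-first traversal of the super-pixel adjacency graph, starting at the seed and accumulating offsets along the way. The sketch below assumes a single scalar offset per pair of neighboring super-pixels, which simplifies the per-pixel offsets described above; the traversal order and all names are illustrative assumptions.

```python
from collections import deque

def propagate_offsets(adjacency, pairwise_offset, seed):
    """adjacency: {superpixel: [neighbors]}.
    pairwise_offset[(a, b)]: shift that aligns b with a at their shared
    boundary, measured on the original (unshifted) intensities."""
    total = {seed: 0.0}          # the seed keeps its original intensities
    queue = deque([seed])
    while queue:
        a = queue.popleft()
        for b in adjacency[a]:
            if b not in total:
                # b must follow a, which has already been shifted by total[a].
                total[b] = total[a] + pairwise_offset[(a, b)]
                queue.append(b)
    return total

adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
pairwise_offset = {("A", "B"): 40.0, ("B", "A"): -40.0,
                   ("B", "C"): -15.0, ("C", "B"): 15.0}
shifts = propagate_offsets(adjacency, pairwise_offset, seed="A")
```

When the graph contains cycles, offsets reached along different routes can disagree, which is one reason residual boundary artifacts may remain and a repeated pass or smoothing filter is useful.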

3.1.3 Shape from Shading Shape Recovery

Our method builds on [13] for the basic shape and lighting recovery. The method in [13] handles only single-material objects; we extend it to multi-material objects. We therefore apply shape-from-shading to the final intensity map and use space-time constraints for smoothness. Using the surface intensity variation together with spatial and temporal coherence, we recover the 3D data by optimization.

Let V denote the set of shape parameters. To reduce the dimension of the cost function, we represent the 3D data as height maps, so only the z values need to be optimized. The set of shape parameters V is therefore defined as:

$$V = \{z_i\}, \quad i = 1, \ldots, \text{number of points} \tag{1}$$

Our reflectance model is the Phong model, since it is widely used in computer graphics and its parameters can be acquired efficiently. Given a light source L and the surface normal N, the Phong reflection model for the synthesized intensity S can be written as:

$$S = K_d\,(N \cdot L) + K_s\,(r \cdot e)^{\alpha} \tag{2}$$

where $K_d$ is the diffuse coefficient, $K_s$ is the specular coefficient, and $\alpha$ is the Phong exponent term. The vector e denotes the eye direction and r is the reflection vector in the mirror direction of the light source with respect to the surface normal.
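A minimal sketch of evaluating the Phong model for one surface point is given below. It uses the diffuse and specular coefficients and the exponent named in the text, without an ambient term. Clamping negative dot products to zero is a common convention assumed here, and the function names are illustrative.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def phong_intensity(N, L, e, Kd, Ks, alpha):
    """Diffuse plus specular Phong intensity for unit-normalized N, L and e."""
    N, L, e = normalize(N), normalize(L), normalize(e)
    n_dot_l = dot(N, L)
    # Mirror direction of the light about the surface normal.
    r = [2.0 * n_dot_l * n - l for n, l in zip(N, L)]
    diffuse = Kd * max(n_dot_l, 0.0)                # clamp: assumed convention
    specular = Ks * max(dot(r, e), 0.0) ** alpha
    return diffuse + specular

# Light tilted away from a normal that faces the viewer.
value = phong_intensity([0, 0, 1], [0.6, 0.0, 0.8], [0, 0, 1],
                        Kd=0.7, Ks=0.3, alpha=10)
```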

Let R denote the reflectance parameters, defined as:

$$R = \{K_d, K_s, \alpha\} \tag{3}$$

Our cost function C is defined as the sum of squared errors between the input real image I and the synthesized image S:

$$C(V, R) = \sum_{i=1}^{\text{number of pixels}} \bigl(I_i - S_i(V, R)\bigr)^2 \tag{4}$$
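As a small worked example of this cost, the sketch below sums the squared per-pixel differences between an observed image and a synthesized one, both flattened to lists. The pixel values are made up for illustration.

```python
def cost(I, S):
    """Sum of squared differences between observed and synthesized intensities."""
    if len(I) != len(S):
        raise ValueError("images must have the same number of pixels")
    return sum((i - s) ** 2 for i, s in zip(I, S))

observed = [0.50, 0.62, 0.71]
synthesized = [0.48, 0.60, 0.75]
c = cost(observed, synthesized)   # 0.02**2 + 0.02**2 + 0.04**2, about 0.0024
```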

Fig. 17 shows the flow chart of the minimization of the objective function. The synthesized data are refined iteratively.

Figure 17: The flowchart of the minimization of the cost function (step label: refine the shape and reflectance parameters <V, R>)

In other words, our goal is to find the shape and reflectance parameters <V, R> that minimize the cost function, which depends on both the shape parameters V and the reflectance model parameters R. The parameters R are global parameters shared by all the triangle facets, whereas the parameters V represent only local geometry. If we refined both sets of parameters in a single optimization procedure, we would have to choose proper scales for the two kinds of parameters to balance their effects. To avoid this imbalance, we optimize the two sets of parameters separately to obtain a more accurate result. The flow chart of the optimization method is shown below.

Figure 18: Flow chart of the optimization algorithm (box label: 3D shape)

First, we assign initial shape and reflectance parameters to the system. Different initial conditions influence the optimization results. We can adjust the parameters manually or run a batch process. Afterward, the reflectance parameters are held fixed and the shape parameters V are refined.

To optimize the initial shape more reliably, we apply only the diffuse model in the first phase. Specular terms are included in the following phases.

After the shape parameters are optimized, the reflectance parameters R are refined while the optimized parameters V are held fixed. We repeat these steps until the cost value is smaller than the threshold.

We treat the minimization of the cost function as a non-linear least-squares problem and solve it with the conjugate gradient method.
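The alternating scheme can be sketched on a toy problem. For brevity, the sketch replaces the conjugate gradient method with plain gradient descent using finite-difference gradients and step halving, so it shows the alternation between V and R rather than the solver used in the thesis. The toy cost, in which each synthesized pixel is a single reflectance coefficient times a per-pixel shape value, and all names are assumptions.

```python
def numeric_gradient(f, x, h=1e-5):
    """Central finite-difference gradient of f at the list x."""
    grad = []
    for k in range(len(x)):
        hi = x[:k] + [x[k] + h] + x[k + 1:]
        lo = x[:k] + [x[k] - h] + x[k + 1:]
        grad.append((f(hi) - f(lo)) / (2.0 * h))
    return grad

def descend(f, x, steps=50, step=0.5):
    """Gradient descent with step halving; a step is taken only if it lowers f."""
    for _ in range(steps):
        g = numeric_gradient(f, x)
        t = step
        while t > 1e-12:
            trial = [xi - t * gi for xi, gi in zip(x, g)]
            if f(trial) < f(x):
                x = trial
                break
            t *= 0.5
    return x

def alternate(cost, V, R, threshold=1e-8, max_rounds=100):
    for _ in range(max_rounds):
        V = descend(lambda v: cost(v, R), V)   # R fixed, refine shape
        R = descend(lambda r: cost(V, r), R)   # V fixed, refine reflectance
        if cost(V, R) < threshold:
            break
    return V, R

# Toy data: hypothetical observed intensities for three pixels.
observed = [0.2, 0.5, 0.8]

def toy_cost(V, R):
    return sum((i - R[0] * v) ** 2 for i, v in zip(observed, V))

V_opt, R_opt = alternate(toy_cost, V=[0.5, 0.5, 0.5], R=[1.0])
```

Because each accepted step lowers the cost, the procedure never increases it, but like any local method it depends on the initial parameters, which matches the remark above about initial conditions.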

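For the conjugate gradient method, the cost function must be differentiated with respect to the shape parameters V:

$$\frac{\partial C}{\partial V} = -2 \sum_{i} \bigl(I_i - S_i(V, R)\bigr)\,\frac{\partial S_i(V, R)}{\partial V}$$

To evaluate the $(N \cdot L)$ term, the surface normal must be transformed into the position representation.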

Fig 19 shows the normal estimated by the cross product. Given a surface, we can approximate the surface normal by the cross product of two tangent vectors. To reduce the error caused by this approximation, the normals estimated from other pairs of tangent vectors should also be included for a more reliable result.

Figure 19: The surface normal N
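A minimal sketch of this estimate on a height map is shown below. It forms tangent vectors from a pixel to its four axis-aligned neighbors, takes the cross product of each adjacent pair, and averages the four results. The choice of four neighbors, the unit pixel spacing, and the function names are assumptions.

```python
import math

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normal_at(z, x, y):
    """Unit normal at an interior pixel of the height map z[y][x], averaged
    over the four pairs of adjacent tangent vectors."""
    c = z[y][x]
    # Tangent vectors to the four neighbors, listed counter-clockwise so that
    # every cross product points toward +z.
    t = [[1, 0, z[y][x + 1] - c], [0, 1, z[y + 1][x] - c],
         [-1, 0, z[y][x - 1] - c], [0, -1, z[y - 1][x] - c]]
    n = [0.0, 0.0, 0.0]
    for k in range(4):
        ck = cross(t[k], t[(k + 1) % 4])
        n = [a + b for a, b in zip(n, ck)]
    length = math.sqrt(sum(v * v for v in n))
    return [v / length for v in n]

# Tilted plane z = 0.5 * x; its exact normal is (-0.5, 0, 1) normalized.
plane = [[0.5 * x for x in range(3)] for _ in range(3)]
n = normal_at(plane, 1, 1)
```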

The reflection vector r should be expressed in terms of N and L:

$$r = 2\,(N \cdot L)\,N - L \tag{8}$$

To obtain more accurate results, we adjust only the z component, which reduces the dimension of the optimization problem.

Our reflectance parameters $R = \{K_d, K_s, \alpha\}$ can also be computed by finite differencing, as in the shape optimization. In general, we normalize the parameters of the Phong model and assume $K_d + K_s = 1$.

3.2 Second Phase - Non-rigid structure from motion

With shape-from-shading, we can acquire a detailed surface. However, the surface from SFS gives relative depth, not absolute depth as structure-from-motion does. Furthermore, shape-from-shading recovers the surface frame by frame in the temporal domain and cannot deal with occlusion or correspondence problems. For these reasons, we use structure-from-motion as a constraint for more accurate single-frame recovery, together with other cues for temporal coherence. In general, structure-from-motion is a technique for rigid bodies and uncalibrated video, but it cannot deal with occlusion. We apply the improved structure-from-motion method [15] at highly textured features to improve the surfaces recovered by shape-from-shading. This method models shape motion as a rigid component (rotation and translation) combined with a non-rigid deformation, and assumes that the object shape at each time instant is drawn from a Gaussian distribution. We use this method to recover the specific points of the non-rigid body and thereby build the temporal-coherence relationship for shape-from-shading, which is specified in the next section.

3.3 Third Phase - Combination with Space-Time Constraints

After recovering the detailed SFS surface and the accurate but discrete SFM points, we combine them into a complete and precise surface. However, the result is noisy: every time-varying depth map is acquired independently, so temporal flicker may occur between consecutive frames. To solve this problem, we add spatial and temporal constraints to obtain a more reliable result.
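The exact form of these constraints is not reproduced here. As a generic illustration only, the sketch below uses first-order smoothness penalties: squared height differences between neighboring pixels for the spatial term, and squared height changes between consecutive frames for the temporal term. Both formulas, the weights, and the function names are assumptions rather than the constraints used in this thesis.

```python
def spatial_term(z):
    """Sum of squared height differences between 4-connected neighbors."""
    rows, cols = len(z), len(z[0])
    total = 0.0
    for y in range(rows):
        for x in range(cols):
            if x + 1 < cols:
                total += (z[y][x + 1] - z[y][x]) ** 2
            if y + 1 < rows:
                total += (z[y + 1][x] - z[y][x]) ** 2
    return total

def temporal_term(z_prev, z_cur):
    """Sum of squared height changes between two consecutive frames."""
    return sum((a - b) ** 2
               for row_p, row_c in zip(z_prev, z_cur)
               for a, b in zip(row_p, row_c))

def regularized_cost(data_cost, z_prev, z_cur, w_space=0.1, w_time=0.1):
    """Data term plus weighted smoothness penalties (weights are hypothetical)."""
    return (data_cost + w_space * spatial_term(z_cur)
            + w_time * temporal_term(z_prev, z_cur))

frame0 = [[0.0, 0.0], [0.0, 0.0]]
frame1 = [[0.0, 1.0], [0.0, 1.0]]
total = regularized_cost(0.5, frame0, frame1)
```

Penalties of this kind trade surface detail for stability, so the weights would need tuning against the flicker actually observed.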

3.3.1 Spatial constraints

For a smoother surface, we propose space-time shape-from-shading to recover the 3D data. To stabilize the iterative shape-from-shading and obtain a reliable result, we use a spatial constraint.

3.3.2 Combination with shape-from-shading and structure-from-motion

The non-rigid structure-from-motion method we use can recover only the specific feature points in each frame. Using the points on the shape-from-shading surface that correspond to those from structure-from-motion, we build the transformation between SFS and SFM, and then transform all 3D points recovered by SFS into the SFM coordinate frame.
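To illustrate how corresponding points can tie the relative SFS depth to the SFM depth, the sketch below fits a scale and an offset in depth by least squares and applies them to the dense SFS depths. Restricting the transformation to a one-dimensional affine map in z is a simplifying assumption; the transformation used in the thesis may be more general. The sample depths and names are hypothetical.

```python
def fit_depth_transform(z_sfs, z_sfm):
    """Least-squares a, b such that a * z_sfs + b approximates z_sfm."""
    n = len(z_sfs)
    mean_s = sum(z_sfs) / n
    mean_m = sum(z_sfm) / n
    var = sum((s - mean_s) ** 2 for s in z_sfs)
    if var == 0.0:
        raise ValueError("feature depths must not all be equal")
    cov = sum((s - mean_s) * (m - mean_m) for s, m in zip(z_sfs, z_sfm))
    a = cov / var
    b = mean_m - a * mean_s
    return a, b

# Depths of the same feature points as seen by SFS (relative) and SFM.
feature_sfs = [0.0, 1.0, 2.0, 3.0]
feature_sfm = [10.0, 12.0, 14.0, 16.0]
a, b = fit_depth_transform(feature_sfs, feature_sfm)

# Apply the fitted map to every SFS depth, not only the feature points.
dense_sfs = [0.5, 1.5, 2.5]
dense_in_sfm = [a * z + b for z in dense_sfs]
```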

However, because SFS is intrinsically ill-posed, the recovery is not good enough. For this reason, we take the acquired depth as the initial guess and repeat the flow above. With the combination of shape-from-shading and structure-from-motion, we can recover a more precise shape (Fig21). The SFM constraint is given below.

(9)

Fig21. Flowchart of the combination of SFS and SFM (step labels: initial depth used for SFS recovery; least squares; optimize SFS)

T is the transformation of the specific points from SFS to SFM; M is the depth of the specific points relative to the transformed SFS plane. Finally, the SFS optimization combined with SFM is given below.

(10)

Chapter 4 Experiment and Result

In this chapter, we describe our experiments and show our results. We first introduce the experimental setup for the input video sequence. Then we show the final results, which include structure-from-motion and intensity propagation.

4.1 The Experiment of Input Video Sequence

In our system, we use one video sequence to create the depth maps. To capture more accurate surface detail, the input images are taken in a restricted lighting environment in which only one directional light is applied.

We use a projector as the single light source. Our input data are the two synchronized high-definition videos (HDV, 1280×720 pixels) recorded at 30 frames per second (FPS). To increase efficiency, we reduce the video size to 240×120. The three figures below show three different views of the video sequence.

Fig22. Three different views from the video sequence.

We pick a set of clearly visible feature points on the surface of the object as the specific points (red points in Fig23).

Fig23. Picked specific points

In our research, we apply a novel temporal shape-from-shading combined with SFM to reconstruct the 3D shape. First, we use an optimization method [13] to address the ill-conditioning of shape-from-shading. This method optimizes the shape and reflectance parameters to minimize the cost function. Afterwards, we use offset propagation to reduce the boundary artifacts. This reduces the depth offset between multiple materials and makes the reconstructed surface smooth. The progress is shown below.

(a) Original image (b) Original intensity

Fig24. Offset propagation effects

4.2 The Reconstructed Surface Detail

After we obtain a satisfactory intensity map, we continue the optimization with the SFM constraint for a more accurate and detailed surface. The figures below show the effects of offset propagation and the SFM constraint. With offset propagation, the surface artifacts at the boundaries between materials are eliminated. With SFM, a more accurate and detailed surface is obtained.

Fig25. Original Image

Fig26. SFS without offset propagation

Fig27. SFS with offset propagation

Fig28. SFS without surface propagation

Fig29. SFS with offset propagation, but without SFM constraint

Fig30. SFS with offset propagation and SFM constraint

Chapter 5 Conclusion

In this thesis, we propose an improved space-time shape-from-shading method to reconstruct the surface of a multi-material object. We use offset propagation to recover the surface of multi-material objects, improving on the original SFS, which is suitable only for a single material. We also combine the optimization method with non-rigid SFM for more detailed surface recovery. Finally, we apply a temporal constraint in the final optimization to obtain a smooth depth sequence suitable for 3D display.

Our method is limited to simple lighting conditions and smooth objects.

Our contributions include:

(1) A novel space-time shape-from-shading for recovering 3D data
(2) More detailed surface reconstruction for multi-material objects
(3) More detailed surface reconstruction for rigid or non-rigid objects

Chapter 6 Future Work

In this thesis, we adopt the Phong model as the reflectance model. Other reflectance models with more physical cues, such as the Torrance model or BSSRDF, may give more accurate results. In addition, we could apply other numerical methods, such as the Fast Marching Method (FMM), to speed up the optimization procedure.

References

[1] Anton van den Hengel, Anthony Dick, Thorsten Thormählen, Ben Ward, Philip H. S. Torr, "VideoTrace: Rapid interactive scene modeling from video", SIGGRAPH'07

[2] Szymon Rusinkiewicz, Olaf Hall-Holt, Marc Levoy, "Real-Time 3D Model Acquisition", ACM TOG'02

[3] L. Zhang, N. Snavely, B. Curless, S. M. Seitz, "Spacetime Faces: High Resolution Capture for Modeling and Animation", SIGGRAPH'04

[4] G. Vogiatzis, P. H. S. Torr, R. Cipolla, "Multi-view Stereo via Volumetric Graph-cuts", CVPR'

[5] G. Vogiatzis, P. H. S. Torr, R., "Volumetric graph-cut", CVPR'05

[6] Maxime Lhuillier, Long Quan, "A quasi-dense approach to surface reconstruction from uncalibrated images", PAMI'05

[7] Lorenzo Torresani, Aaron Hertzmann, Chris Bregler, "Learning Non-Rigid 3D Shape from 2D Motion", NIPS 2003

[8] A. Hertzmann, S. M. Seitz, "Example-Based Photometric Stereo: Shape Reconstruction with General, Varying BRDFs", PAMI'05

[9] Carlos Hernandez Esteban, George Vogiatzis, Roberto Cipolla, "Multi-view photometric stereo", TPAMI'08

[10] Gang Zeng, Yasuyuki Matsushita, Long Quan, Heung-Yeung Shum, "Interactive Shape from Shading", CVPR'05

[11] Tai-Pang Wu, Jian Sun, Chi-Keung Tang, Heung-Yeung Shum, "Interactive Normal Reconstruction from a Single Image", SIGGRAPH'08

[12] Hui Fang, John C. Hart, "Textureshop: Texture Synthesis as a Photograph Editing Tool", SIGGRAPH'04

[13] Xiaofeng Ren, Jitendra Malik, "Learning a Classification Model for
