Method - Depth Reconstruction of Folded Surfaces from Single-View Video

The input to our system is a single-view video sequence. We divide our method into four stages: preprocessing, initial surface by Shape-from-shading, control point estimation, and PCA-based space-time optimization. The details of these stages are introduced in the following sections.

Fig11. Demonstration of our system. Pipeline stages: input video sequence; preprocessing (target object segmentation, selected reference frame, decoloring into shading image and color component); initial surface by Shape-from-shading (initial guess by SFS); distance map estimation; control point tracking (estimated control points); PCA-based space-time optimization for the surface sequence (iteration if needed); result surface geometry.

3.1 Preprocessing:

Given a single-view input video, our first step is to perform image/video segmentation to extract the target region. Here, we modify the video cut method of [Jue Wang 2005.].

For a common cloth image, the reflected intensities result from both the shading caused by surface gradients and the material reflectance. Before estimating the surface undulation, we should remove the effect of the material and reduce our problem to Shape-from-shading of a single-material surface.

We combine the methods of [Marshall F. Tappen 2005.] to remove the material-reflectance component. First, we create a color histogram that stores the colors of the extracted regions over the whole video sequence. For each pixel in a particular image, we calculate the color vector C(x,y) = <R,G,B> and find the color vector in the histogram that has the same direction and the maximum intensity. Because the histogram is built from all frames, for a highly deformed surface it is very likely that at least one pixel of each material has faced the light direction. Based on Lambertian reflection, we take the maximum intensity of a color vector as the color of that material. Finally, we can calculate the normal of each pixel (x,y) as

$N(x,y) = \frac{C(x,y) - I_{Min}}{I_{Max} - I_{Min}}$.
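The histogram lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function name `decolor`, the quantization of color directions into `bins` levels per channel, and the choice $I_{Min} = 0$ are all introduced here for the example.

```python
import numpy as np

def decolor(frames, bins=16, eps=1e-6):
    """frames: (T, H, W, 3) float RGB in [0, 1]. Returns shading of shape (T, H, W)."""
    intensity = np.linalg.norm(frames, axis=-1)           # length of each color vector
    direction = frames / (intensity[..., None] + eps)     # unit color direction
    # Quantize the direction so that pixels of one material share a histogram bin
    # (assumption: materials are separable by color direction alone).
    q = np.minimum((direction * bins).astype(int), bins - 1)
    key = (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]
    # Brightest sample per bin over the whole sequence, taken as the material color.
    i_max = np.zeros(bins ** 3)
    np.maximum.at(i_max, key.ravel(), intensity.ravel())
    # Normalized shading with I_Min assumed to be 0.
    return intensity / (i_max[key] + eps)
```

The result is only as good as the assumption that every material faces the light in at least one frame; a material that never does is normalized by too small a maximum and appears brighter than it should.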

Fig12. The decoloring process. (a) Source image. (b) Recovered shading. (c) Color component.

3.2 Initial surface by Shape-from-shading:

A 3D mesh is usually a good parametric representation for geometry reconstruction. Nevertheless, when estimating a highly deformed surface from a single view, exploring all the degrees of freedom of a mesh is computationally too expensive and is easily trapped in local minima. Instead, we use a lower-dimensional parametric space, called the morphable model space (or PCA space), for more stable shape recovery.

We assume that our target surface has following properties:

(1) Foldable but with little self-occlusion.

(2) The maximum surface undulation is less than 1/4 of an edge.

(3) The four boundary vertices lie nearly on one plane.

Surfaces with these properties yield more reasonable Shape-from-Shading results. Furthermore, to track the motion of markerless cloth, we select the frame in which the surface has the biggest area as our “reference frame” from the video sequence.

The next step is to generate an initial-guess depth map. Our input is a single image, with no other viewpoints. Shape-from-Shading (SFS) is one of the few solutions that can deal with such limited information. Here, we adapt Pentland's linear Shape-from-shading method.

Shape-from-Shading recovers depth from normal vectors. Due to noise and insufficient scene information, it cannot tell us the real height of the surface. In other words, Shape-from-Shading only recovers “relative” height.
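A linear SFS step of this kind can be sketched in the frequency domain. The code below is a minimal illustration, assuming a Lambertian surface, orthographic projection, a known light direction given by `slant` and `tilt` (parameter names chosen here), and the linearized reflectance $E \approx \cos\sigma + p\cos\tau\sin\sigma + q\sin\tau\sin\sigma$ with $p = \partial Z/\partial x$ and $q = \partial Z/\partial y$. It is not the exact adaptation used in this work.

```python
import numpy as np

def linear_sfs(shading, slant, tilt, eps=1e-8):
    """Relative depth map from a shading image under the linearized Lambertian model."""
    h, w = shading.shape
    fe = np.fft.fft2(shading - shading.mean())       # drop the constant cos(slant) term
    wx = 2 * np.pi * np.fft.fftfreq(w)[None, :]      # angular frequency along x (columns)
    wy = 2 * np.pi * np.fft.fftfreq(h)[:, None]      # angular frequency along y (rows)
    # Differentiation becomes multiplication by i*omega in the Fourier domain.
    denom = 1j * np.sin(slant) * (wx * np.cos(tilt) + wy * np.sin(tilt))
    safe = np.abs(denom) > eps                       # frequencies the light cannot reveal
    fz = np.zeros_like(fe)
    fz[safe] = fe[safe] / denom[safe]
    return np.real(np.fft.ifft2(fz))
```

Frequencies whose direction is perpendicular to the light tilt are set to zero because the shading carries no information about them, and the mean height is lost as well. This is one way to see why the recovered height is only relative.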

Fig13. SFS result from different viewpoints.

3.3 Control Point Estimation

To deal with unstable and highly deformed clothes or flags, we propose using a morphable model for shape recovery. Nevertheless, the mapping between the morphable grid and the input image is not straightforward. Here we consider the geodesic distances on the surface recovered in the SFS stage. Given points $X$ and $Y$ on the surface, if the surface is flat, the straight line through $X$ and $Y$ has the minimum distance. When the surface is undulating, the “line” with minimum geodesic distance may not look “straight” from the camera viewpoint, as shown in Fig14. No matter how the line looks, the geodesic distance between the two points stays the same.

Without loss of generality, we consider the distances from the two upper vertices to all other points on the surface. Our generic mesh can be seen as a lattice of equally sized cells. Assume the width and height of the square flag are both $L$, and divide the flag into an n*n grid. The size of every cell is $\frac{L}{n} \times \frac{L}{n}$. We name every vertex of the mesh $V_{u,v}$, as in Fig.15.

Fig14. The line across $X$ and $Y$. (a) Source texture with a straight line. (b), (c) The same line on non-flat geometry.

First we consider the v-axis. For the points on the v=k axis, we can write:

Then we consider the u-axis. For all points on the u=k axis, we can write:

With the above two equations, we can locate control points on the 3D geometry of the surface.
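To make relations of this kind concrete, here is a worked derivation for the simplest case. It is a sketch under stated assumptions and not necessarily the exact form used in this work: the sheet is treated as inextensible, $d_0$ and $d_n$ (names introduced here) denote the geodesic distances from a surface point to $V_{0,0}$ and $V_{n,0}$, and $(x, y)$ are the coordinates of that point on the unfolded sheet, with $V_{0,0}$ at the origin and $V_{n,0}$ at $(L, 0)$.

```latex
\begin{gather*}
d_0^2 = x^2 + y^2, \qquad d_n^2 = (L - x)^2 + y^2 \\
v = k \;\; (y = kL/n): \quad
  \sqrt{d_0^2 - (kL/n)^2} + \sqrt{d_n^2 - (kL/n)^2} = x + (L - x) = L \\
u = k \;\; (x = kL/n): \quad
  d_0^2 - d_n^2 = 2Lx - L^2 = \left(\tfrac{2k}{n} - 1\right) L^2
\end{gather*}
```

Because geodesic distance is preserved when the sheet folds without stretching, both relations remain valid on the deformed surface, so the two distances are enough to decide which grid row and column a surface point belongs to.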

Fig15. An example of a 3*3 mesh with vertices $V_{u,v}$.

Having defined the two equations above, the next step is to calculate the distance between $V_{u,v}$ and $V_{0,0}$ over the surface. We cannot simply take the L2 norm between the two vertices, because the surface is not flat. The solution is to use Dijkstra's algorithm.

Dijkstra's algorithm is based on the following concept: if there exists a path P from $V_{0,0}$ to $V_{x,y}$ that has minimum geodesic length, then the path P' from $V_{0,0}$ to the point just before $V_{x,y}$ is also a minimum path. In our experiment, the basic dynamic programming function is:

We calculate the distance function over the whole surface and therefore obtain a table that stores every vertex's minimum distance to $V_{0,0}$.

In practice, the above equation does not work well because of our decoloring method. The decoloring is not perfect, so gaps exist between color regions. Surface normals change sharply near a gap, which affects the Shape-from-shading result, so that dist($V_{0,0}$, $V_{i,j}$) becomes larger if the path P crosses the gap. We therefore adjust the dist function:

Here $T_d$ acts as an upper bound on the distance between any two neighboring vertices, so the effect of color is diminished. We calculate the distances not only from $V_{0,0}$ to all other vertices but also from $V_{n,0}$. After the two dynamic programming passes, we get two distance maps, one from each of the two upper vertices.
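The distance map computation can be sketched as Dijkstra's algorithm on the pixel grid of the SFS height map. This is a minimal illustration with several assumptions labeled as such: the function name `geodesic_map`, the 8-connected neighborhood, unit pixel spacing, and in particular the clamp `min(step, t_d)`, which is only one plausible reading of how $T_d$ bounds the distance between neighboring vertices.

```python
import heapq
import numpy as np

def geodesic_map(height, source, t_d=None):
    """Shortest surface distance from `source` (row, col) to every pixel of a height map."""
    h, w = height.shape
    dist = np.full((h, w), np.inf)
    dist[source] = 0.0
    heap = [(0.0, source)]
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1), (-1, -1), (-1, 1), (1, -1), (1, 1)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist[r, c]:
            continue                                  # stale queue entry
        for dr, dc in steps:
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w:
                # 3D length of the step: image-plane offset plus height difference.
                step = float(np.sqrt(dr * dr + dc * dc
                                     + (height[rr, cc] - height[r, c]) ** 2))
                if t_d is not None:
                    step = min(step, t_d)             # assumed form of the adjusted dist
                if d + step < dist[rr, cc]:
                    dist[rr, cc] = d + step
                    heapq.heappush(heap, (d + step, (rr, cc)))
    return dist
```

Calling it once with `source=(0, 0)` and once with `source=(0, w - 1)` gives the two distance maps from the upper-left and upper-right vertices.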

Fig17. The distance map. (a) The recovered shading image. (b) From the upper-left vertex. (c) From the upper-right vertex. Darker is closer.

Finally, we check all the pixels to find those located at the right distances from the two upper vertices:

$V_{0,0} \leftarrow P_{0,0}$, $V_{n,0} \leftarrow P_{w,0}$ // Assign the positions of the two upper points
For u = 1 to W
  For v = 1 to H
    For $1 \le k_r, k_c \le n$
      If ( $\left| \sqrt{dist(V_{0,0},P_{u,v})^2 - (Lk_r/n)^2} + \sqrt{dist(V_{n,0},P_{u,v})^2 - (Lk_r/n)^2} - L \right| < T_{row}$ )
      And ( $\left| dist(V_{0,0},P_{u,v})^2 - dist(V_{n,0},P_{u,v})^2 - (2Lk_c/n - L)L \right| < T_{col}$ )
        $V_{k_c,k_r} \leftarrow P_{u,v}$
    End $k_r, k_c$
  End v
End u

Fig18. The control point algorithm. $T_{col}$ and $T_{row}$ are thresholds.
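The pixel search can be sketched with the two distance maps as inputs. This is an illustration under assumptions, not the exact procedure of Fig18: the row and column tests below are the flat-sheet relations (sum of horizontal offsets equal to $L$, and difference of squared distances fixed by the column), the function name `find_control_points` is hypothetical, and picking the pixel with the smallest combined residual is a design choice made here to break ties.

```python
import numpy as np

def find_control_points(d0, dn, L, n, t_row, t_col):
    """d0, dn: (H, W) distance maps from the upper-left and upper-right vertex.
    Returns {(k_c, k_r): (u, v)} for interior grid vertices that pass both tests."""
    points = {}
    diff_sq = d0 ** 2 - dn ** 2
    for k_r in range(1, n):
        y = L * k_r / n
        # Horizontal offsets implied by each map; clip so the square root is defined.
        x0 = np.sqrt(np.clip(d0 ** 2 - y ** 2, 0.0, None))
        xn = np.sqrt(np.clip(dn ** 2 - y ** 2, 0.0, None))
        row_res = np.abs(x0 + xn - L)
        for k_c in range(1, n):
            col_res = np.abs(diff_sq - (2.0 * L * k_c / n - L) * L)
            ok = (row_res < t_row) & (col_res < t_col)
            if not ok.any():
                continue                      # this control point is missed
            score = np.where(ok, row_res / t_row + col_res / t_col, np.inf)
            v, u = np.unravel_index(np.argmin(score), score.shape)
            points[(k_c, k_r)] = (int(u), int(v))
    return points
```

Vertices for which no pixel passes both thresholds are simply absent from the result, which mirrors the missing control points discussed in the text.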

The above algorithm has a problem: because of error propagation, the estimated lengths and positions are not precise in the lower region (far from the origin vertices). In Fig19 we can see that the control points are compact in the upper region but distorted in the lower part.

3.4 Space Optimization of Mesh Recovering

It is difficult and imprecise to treat the above result as the real positions of the mesh points. The z-coordinate of each point comes from the Shape-from-shading result, which is only a “relative” depth. Worst of all, many control points are missing, and we can easily see that the points drift away from where they should be as i and j increase.

Fig19. Estimating control points. (a) The selected frame. (b) The columns when n=10. (c) The columns when n=10. Note that the rows/columns for k=0 and k=10 do not appear. (d) The control points we actually recovered. The mesh is 16*15.

For these reasons, we need a stable method that recovers the geometry and takes its characteristics into account. Meanwhile, the method should keep the improved control point locations near our estimates. One appropriate technique is Principal Component Analysis (PCA). First we generate training data by cloth simulation, then find their statistical characteristics, called principal components (PCs). Then we use the PCs to recover the mesh by subspace optimization.
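The idea of recovering a mesh in the PC subspace can be sketched as follows. This is a minimal illustration, assuming each training mesh is flattened into one row vector and that only some coordinates of the target mesh are observed (the estimated control points). The function names `fit_pca` and `recover_mesh` are hypothetical, and plain least squares stands in for the full optimization described in this section.

```python
import numpy as np

def fit_pca(train, k):
    """train: (m, d) array, one flattened mesh per row.
    Returns the mean shape (d,) and the first k principal components (k, d)."""
    mean = train.mean(axis=0)
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:k]

def recover_mesh(mean, comps, observed_idx, observed_vals):
    """Fit PC weights to the observed coordinates, then rebuild the whole mesh."""
    A = comps[:, observed_idx].T                  # (num_observed, k)
    b = observed_vals - mean[observed_idx]
    w, *_ = np.linalg.lstsq(A, b, rcond=None)     # least-squares PC weights
    return mean + w @ comps, w
```

Because the unknowns are only the k weights rather than every vertex coordinate, missing or noisy control points are filled in by the statistics of the training shapes.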

The relation among input estimation control points ( X ,Y , Z ) generated at the

To reduce the computing complexity, we change the matrix order as follows:

where $T^{-1}$ and $S^{-1}$ are the inverses of $T$ and $S$.

Because the two upper vertices should be translated to the same position, we can easily define the scale matrix S' by scaling the distance between $V_{0,0}$ and $V_{i,j}$ to

Fig21. The relation between PC subspace and input image domain.

where is .

We therefore recover the shape by solving the optimization problem:

As mentioned before, the correctness of our estimated control points is inversely proportional to their distance from the upper vertices. We therefore modify the optimization by adding what can be regarded as a spring constraint between vertices. The following energy function is added to the function we minimize:

where L is the grid's length and $E_c$ is our previous optimization equation.

We call the latter part the distance constraint.
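A spring-like distance term of this kind can be sketched as below. The exact energy used in this work is not reproduced here; the sketch assumes a quadratic penalty on the deviation of each grid edge from the rest length, added to a given value of $E_c$ with a weight `w_dist`. The function names and the weight are hypothetical.

```python
import numpy as np

def distance_energy(verts, rest_len):
    """verts: (rows, cols, 3) grid mesh.
    Sum of squared deviations of horizontal and vertical edge lengths from rest_len."""
    horiz = np.linalg.norm(verts[:, 1:] - verts[:, :-1], axis=-1)
    vert = np.linalg.norm(verts[1:, :] - verts[:-1, :], axis=-1)
    return float(((horiz - rest_len) ** 2).sum() + ((vert - rest_len) ** 2).sum())

def total_energy(e_c, verts, rest_len, w_dist=1.0):
    """Previous objective value e_c plus the weighted distance constraint."""
    return e_c + w_dist * distance_energy(verts, rest_len)
```

An undeformed grid with the correct spacing has zero distance energy, so the term only penalizes candidate meshes whose edges stretch or shrink, which is where the far control points tend to be unreliable.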

Last but not least, we have to add one more constraint to the optimization function. Because of the , the farther control points may move to unpredicted locations as follows:

To get a better result, we give the boundary points a bigger weight. Although our minimum-distance method may not work well over the boundary, the locations of the boundary vertices are easy to estimate directly from our SFS result. We define the boundary cost function:

where constraints the boundary‟s 2D position.

wctrl

Fig22. The bad result without boundary constraint.

The final optimization function we used is:

3.5 Time Optimization

Finally, we have to map the mesh onto all the frames in the video. Here we use the method of [Mathieu Salzmann,2009]. That paper introduces an optimal solution that, given a 3D mesh, maps the mesh to all frames. The optimization equation is:

Where is it‟s objective function for a single frame, and

We replace the component with our objective function and apply it to all frames. Thus we get a continuous video sequence.

Fig23. The better result and its wireframes.




