
Chapter 2. Estimation of structure and motion in projective space

2.1. Multiple view computational methods

2.1.2. Bilinear method

Unlike the factorization-based method, the bilinear method does not attempt to estimate the projective depths. Instead, these are shown to be redundant, and eliminating them leads to a new expression for the squared norm of a vector that is a bilinear function of the matrices M_i^{(P)} and the vectors X_j^{(P)} [10].

First, the error function that the bilinear method uses in the optimization process is

∑

applying factorization to a data matrix in which all z_ij = 1. The error term associated with the projection matrix in projective space, subject to |M_i^{(P)}|² = 1 (i = 1, …, m), can be expressed as E_i(X^{(P)}) = |C_i m_i^{(P)}|²_F, where m_i^{(P)} ∈ R^{12×1} denotes the column vector holding the 12 entries of M_i^{(P)}. Minimizing E_i under this unit-norm constraint is equivalent to computing the singular vector of C_i associated with the smallest singular value.

Next, fix the projection matrices and solve for each X_j^{(P)} with unit Frobenius norm that minimizes E_j(M^{(P)}), again using SVD. We optimize the structure and motion iteratively, without estimating the projective depths, until the results converge.
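The unit-norm minimization used in both half-steps can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the matrix `C` below is a random stand-in for C_i (its actual construction is not reproduced here), and the row-major reshape of the 12-vector into a 3×4 matrix is an assumed stacking order.

```python
import numpy as np

def smallest_singular_vector(C):
    """Unit vector m that minimizes |C m|^2 (right singular vector of the
    smallest singular value of C)."""
    _, _, vt = np.linalg.svd(C)   # singular values are sorted, largest first
    return vt[-1]

# Hypothetical stand-in for C_i: a 30x12 matrix whose null space is a known
# unit 12-vector, so the expected answer is available for checking.
rng = np.random.default_rng(0)
m_true = rng.standard_normal(12)
m_true /= np.linalg.norm(m_true)
A = rng.standard_normal((30, 12))
C = A - np.outer(A @ m_true, m_true)   # now C @ m_true == 0 up to round-off

m_est = smallest_singular_vector(C)
M_est = m_est.reshape(3, 4)            # assumed row-major stacking of M_i
```

The recovered vector agrees with `m_true` only up to sign, which is harmless because projective quantities are defined up to scale.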

As mentioned earlier, the bilinear method does not require all feature points to be tracked in all images: at each iteration, each X_j^{(P)} can be estimated independently using all the views in which it is observed. Likewise, each M_i^{(P)} can be estimated independently using the points visible in the i-th view. Because the minimum number of views (points) is easy to satisfy, we can discard the views in which the corresponding projection point is an outlier while computing X_j^{(P)}, and discard the points whose projections in the i-th view are outliers while computing M_i^{(P)}. The question is: how can we automatically decide which projection points are inconsistent with the others? We use the iteratively reweighted least squares (IRLS) method to identify the outliers, as introduced in the next section.

2.2. Outlier detection by the iteratively reweighted least squares (IRLS) method

The goal of the IRLS method is to minimize the sum of squared weighted residuals by alternately deciding the weights and then the structure and motion in projective space. With the residual N_j = D_j − M^{(P)} X_j^{(P)} and the weighted residual Ñ_j = V_j N_j of the j-th point (see Section 2.2.2), the weighted residual to be minimized is:

∑_j |Ñ_j|² = ∑_j |V_j (D_j − M^{(P)} X_j^{(P)})|²

Finally, once the final weights V_j are decided, we can select the outliers, which will not be used in the re-estimation of structure and motion in projective space.

2.2.1. Adaptive weighting

This section describes how to compute the error function matrix and update the weighting matrix.

Define

Σ_j: the covariance matrix of the noise on D_j, Σ_j ∈ R^{3m×3m}. The noise on D_j is assumed to be Gaussian with zero mean and covariance matrix Σ_j.

V_j: the weighting matrix of D_j, V_j ∈ R^{3m×3m}.

Σ_j^{(0)} should be determined by the feature tracking algorithm. However, this is not a requirement. In the absence of prior knowledge, Σ_j^{(0)} is initialized as the identity matrix.

The error function of the residual is introduced in IRLS for updating the covariance matrix of the noise. The error function is implemented via the truncated quadratic as follows:

Define

Note that when the 2-norm of the residual is much larger than k, the value of the error function is close to zero.

At each iteration of IRLS, we first compute the covariance matrix of the noise for iteration t−1, and then compute the next weighting matrix. Σ_j^{(0)} is initialized as a diagonal matrix; consequently, Σ_j^{(t)} is still a diagonal matrix.

2.2.2. Optimization with weights

This section introduces the concept and algorithm of optimizing the structure and motion in projective space when the weights are fixed. The key point of optimization with weights is how to separate M^{(P)} and X^{(P)} from D in the presence of the weights.

However, SVD cannot be applied in the case of (10). To solve (10), the generally known method is “surrogate modeling” [13]. The essence of surrogate modeling is to apply a computationally simpler model to iteratively approximate the original hard problem. Here, the residual of the j-th point X_j^{(P)}, between D_j and its projections by the projection matrices in projective space, is defined as N_j = D_j − M^{(P)} X_j^{(P)}, and the weighted residual is defined as Ñ_j = V_j N_j. The D̃ in (11) is the so-called surrogate, used to iteratively approximate the D in (10). Each column D̃_j is updated for the next approximation by modifying the original data (D_j) with the weighted residual Ñ_j. V_j reduces the large back-projection error N_j to the smaller error Ñ_j = V_j N_j. These two procedures are executed iteratively to achieve an approximate solution of (10). The algorithm converges under the condition that every residual N_j is stable, rather than that every Ñ_j converges to a zero vector, in case proper V_j are not produced.

In the following, we describe the complete surrogate modeling algorithm.

Here, we omit the IRLS iteration index for convenience and readability. The surrogate modeling algorithm that optimizes motion and structure in projective space with weights goes as follows:

Initialize: q = 1, where q denotes the iteration number in the optimization process.

Step 1. Estimate the model via the surrogate: apply SVD to D̃^{q−1} to obtain M^{(P)q} and X^{(P)q}.

Step 2. Calculate residuals: N^q = D̃^{q−1} − M^{(P)q} X^{(P)q}.

Step 3. Modify the surrogate: ∀j, Ñ_j^q = V_j N_j^q.

If stop, return M^{(P)} = M^{(P)q}, X^{(P)} = X^{(P)q}. Otherwise, set q = q + 1 and go to Step 1.
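The three steps can be sketched as a short loop. This is a minimal illustration, not the thesis implementation. It assumes that the surrogate starts from the data (D̃⁰ = D), that each diagonal V_j is stored as column j of an array `W`, that the surrogate is rebuilt as the model plus the weighted residual, and that iteration stops when the residual no longer changes. The function names and the rank-4 truncation helper are hypothetical.

```python
import numpy as np

def rank4_factorize(D_tilde):
    """Step 1: best rank-4 factorization D_tilde ~ M @ X via SVD."""
    U, s, Vt = np.linalg.svd(D_tilde, full_matrices=False)
    M = U[:, :4] * s[:4]      # 3m x 4 stacked projection matrices
    X = Vt[:4, :]             # 4 x n homogeneous points
    return M, X

def surrogate_fit(D, W, max_iter=200, tol=1e-10):
    """Weighted fit of D (3m x n); W[:, j] holds the diagonal of V_j."""
    D_tilde = D.copy()                      # assumed initial surrogate
    N_prev = None
    for q in range(1, max_iter + 1):
        M, X = rank4_factorize(D_tilde)     # Step 1
        N = D_tilde - M @ X                 # Step 2: residuals
        N_tilde = W * N                     # Step 3: weighted residuals
        D_tilde = M @ X + N_tilde           # assumed surrogate update
        if N_prev is not None and np.linalg.norm(N - N_prev) < tol:
            break                           # residual is stable
        N_prev = N
    return M, X

# Noise-free synthetic check: 5 views (15 rows), 10 points, all weights 1.
rng = np.random.default_rng(1)
D = rng.standard_normal((15, 4)) @ rng.standard_normal((4, 10))
W = np.ones_like(D)
M, X = surrogate_fit(D, W)
```

With all weights equal to 1 the surrogate never moves away from D, so the loop reduces to a plain rank-4 SVD. Weights below 1 pull the corresponding entries of the surrogate toward the current model.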

In summary, the IRLS executes the adaptive weighting and the optimization with weights iteratively until Ñ_j converges to a zero vector. We then discard the projection points whose corresponding weights satisfy v_ij < 1, and re-estimate with the bilinear method to output the final structure and motion in projective space. The next step is to transform our result to Euclidean space.

Chapter 3. Euclidean Reconstruction

This chapter introduces auto-calibration, the process of determining the intrinsic matrix K of the camera directly from multiple projection matrices in projective space obtained from multiple views. Once this is done, it is straightforward to compute the Euclidean reconstruction. Auto-calibration avoids the onerous task of calibrating cameras with a special calibration object, which gives great flexibility in real applications [11].

3.1. Auto-calibration using the iterative absolute dual quadric method

The absolute dual quadric Q*_∞ is a degenerate dual quadric represented by a 4×4 homogeneous matrix of rank 3.

The 4×4 homography H transforms a 3D point in homogeneous form from Euclidean space to projective space.

This leads to a linear system. After solving it for ω*, we obtain ω as the inverse of ω*, and then use the new ω to run the next iteration until it converges. This method provides an estimate (Q*_∞)' for extracting the rectifying homography.
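As a small numerical illustration of the quantities involved, the sketch below uses the standard relation ω* = P Q*_∞ Pᵀ = K Kᵀ for a Euclidean camera P = K[R | t] and recovers the upper-triangular K from ω*. The camera values and the function name are made up for the example, and the flipped Cholesky factorization is one common way to do the extraction, not necessarily the one used in this work.

```python
import numpy as np

def intrinsics_from_dual_image_conic(omega_star):
    """Upper-triangular K with omega_star proportional to K @ K.T."""
    J = np.fliplr(np.eye(3))                     # exchange matrix, J @ J = I
    L = np.linalg.cholesky(J @ omega_star @ J)   # lower triangular factor
    K = J @ L @ J                                # flip back: upper triangular
    return K / K[2, 2]                           # fix the projective scale

# Hypothetical Euclidean camera P = K [R | t].
K_true = np.array([[800.0, 0.5, 400.0],
                   [0.0, 790.0, 300.0],
                   [0.0, 0.0, 1.0]])
c, s = np.cos(0.3), np.sin(0.3)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t = np.array([[0.1], [-0.2], [5.0]])
P = K_true @ np.hstack([R, t])

Q_inf = np.diag([1.0, 1.0, 1.0, 0.0])   # absolute dual quadric, Euclidean frame
omega_star = P @ Q_inf @ P.T            # equals K K^T for this camera
omega = np.linalg.inv(omega_star)       # omega as the inverse of omega*

K_est = intrinsics_from_dual_image_conic(omega_star)
```

The Cholesky step requires ω* to be positive definite, which noisy estimates do not always satisfy.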

3.2. Acquisition of the rectifying homography

After obtaining (Q*_∞)', we compute the rectifying homography. First, apply SVD to (Q*_∞)'. The fourth column h_4 of the homography cannot be determined from (Q*_∞)'; however, it only affects the translation in the projection matrices, so the structure of the object stays the same. We choose

) P

hv₄ ≅ ₀
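The extraction step can be illustrated with the standard decomposition Q*_∞ = H Î Hᵀ, where Î = diag(1, 1, 1, 0). The sketch below is a hedged example rather than the procedure of this section. It uses a symmetric eigendecomposition, which coincides with the SVD for a symmetric positive semidefinite matrix, and it simply takes the null-space eigenvector as the undetermined fourth column, which is an arbitrary choice made for the example.

```python
import numpy as np

I_HAT = np.diag([1.0, 1.0, 1.0, 0.0])

def rectifying_homography(Q):
    """Return H with H @ I_HAT @ H.T equal to Q (for a rank-3 PSD Q)."""
    Q = 0.5 * (Q + Q.T)              # enforce symmetry against round-off
    if np.trace(Q) < 0:              # Q is only defined up to scale and sign
        Q = -Q
    w, U = np.linalg.eigh(Q)         # eigenvalues in ascending order
    w, U = w[::-1], U[:, ::-1]       # reorder: largest first
    scale = np.sqrt(np.clip(w[:3], 0.0, None))
    # Fourth column is left as the null-space direction (arbitrary choice).
    return U @ np.diag(np.append(scale, 1.0))

# Synthetic check with a hypothetical ground-truth homography.
rng = np.random.default_rng(3)
H_true = rng.standard_normal((4, 4)) + 3.0 * np.eye(4)
Q = H_true @ I_HAT @ H_true.T
H = rectifying_homography(Q)
```

Under the convention that H maps Euclidean points to projective points, the projective cameras would be rectified as P_i H. That convention is an assumption here.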

Chapter 4. Experimental Results

In this chapter, we present our experimental results on synthetic data and on real data from image streams. In the synthetic data experiments, we show the nearly perfect result in the ideal case, and how IRLS keeps our system stable with noisy data. In the real data experiments, we apply our system to human head recognition and to dense reconstruction of human head geometry. The image streams in our database have some missing feature correspondences, and our results show that a reasonable amount of missing data does not affect the stability of our system.

4.1. Synthetic data

4.1.1. Synthetic data without noise

In our first experiment, we create 10 views of an artificial six-faced cube consisting of 26 3D points, with dimensions 20×20×20. The image resolution is 800×600. To verify the reconstruction result, a similarity transformation sR, where s is a scaling factor and R is a rotation matrix, between the output structure and the synthetic model is computed; then the mean and standard deviation of the 3D point errors over all corresponding 3D points are examined. The evaluation of the result is shown in Table 4.1.

Table 4.1. Evaluation of the 3D synthetic cube reconstruction results.

                           mean of 3D error   stdv of 3D error
estimated cube structure   0.0005             0.0002

Here, we can see the result is nearly perfect in the ideal case.
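The evaluation step can be sketched as follows. This is an illustration only: it uses the closed-form least-squares similarity alignment commonly attributed to Umeyama, and it includes a translation term t, which is an assumption because the text names only s and R. The point set is random test data, not the cube used in the experiment.

```python
import numpy as np

def align_similarity(src, dst):
    """Least-squares s, R, t such that dst ~ s * src @ R.T + t (rows = points)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)                 # cross-covariance, 3x3
    U, S, Vt = np.linalg.svd(cov)
    d = np.ones(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        d[-1] = -1.0                           # keep R a proper rotation
    R = U @ np.diag(d) @ Vt
    s = (S * d).sum() / ((xs ** 2).sum() / len(src))
    t = mu_d - s * R @ mu_s
    return s, R, t

# Hypothetical model (26 points) and a reconstruction that differs from it
# by an unknown scale, rotation and offset.
rng = np.random.default_rng(2)
model = rng.uniform(-10.0, 10.0, size=(26, 3))
c, sn = np.cos(0.4), np.sin(0.4)
R_true = np.array([[c, -sn, 0.0], [sn, c, 0.0], [0.0, 0.0, 1.0]])
output = 0.4 * model @ R_true.T + np.array([1.0, 2.0, 3.0])

s, R, t = align_similarity(output, model)
errors = np.linalg.norm(model - (s * output @ R.T + t), axis=1)
mean_err, std_err = errors.mean(), errors.std()
```

On noise-free data the aligned error is at round-off level; with noisy reconstructions `mean_err` and `std_err` play the role of the two columns reported in the tables.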

Next, we generate 10 views of another synthetic object, which simulates the human head and contains 19 points, with dimensions 20×20×10. Figure 4.1 shows six snapshots of the wireframe of the synthetic human head. The evaluation of the 3D point errors is shown in Table 4.2. The result is nearly perfect.

Figure 4.1. The wireframe of synthetic human head

Table 4.2. Evaluation of the 3D synthetic head reconstruction results.

                           mean of 3D error   stdv of 3D error
estimated head structure   0.0401             0.02537

4.1.2. Synthetic data with noise

In the previous section, we saw that our result is nearly perfect in the ideal case without noise. In the following, we test the noise tolerance of our system. We add noise generated by a Gaussian PDF with μ = 0 and σ = 3 in two directions (horizontal and vertical). We generate 20 noisy copies of the 10-view image stream.
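A minimal sketch of this noise protocol is shown below. The array layout (views × points × 2), the function name, and the placeholder image coordinates are assumptions made for the example.

```python
import numpy as np

def make_noisy_copies(points_2d, n_copies=20, sigma=3.0, seed=0):
    """Add zero-mean Gaussian noise independently to the horizontal and
    vertical coordinates of every projected point, n_copies times."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=(n_copies,) + points_2d.shape)
    return points_2d[None, ...] + noise

# Placeholder clean projections: 10 views, 26 points, (x, y) per point.
clean = np.zeros((10, 26, 2))
clean[..., 0] = 400.0
clean[..., 1] = 300.0

noisy = make_noisy_copies(clean)   # shape (20, 10, 26, 2)
```

Each of the 20 slices along the first axis is one noisy copy of the whole 10-view stream and would be reconstructed separately.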

We show the cube reconstruction results obtained by two different schemes: one with the statistical IRLS scheme and one without. Table 4.3 lists the mean and standard deviation of the 3D point location errors of the cube reconstruction without the IRLS scheme for each sample, together with their averages, and Table 4.4 lists the same quantities for the cube reconstruction with the IRLS scheme (r = 0.95).

Table 4.3. Evaluation of the cube reconstruction results in the noisy cases without IRLS.

sample num. mean of 3D error stdv of 3D error

sample 1 0.4763 0.2152

sample 2 0.3457 0.1421

sample 3 14.327 5.6238

sample 4 0.3346 0.1405

sample 5 14.379 4.5152

sample 6 0.5092 0.1475

sample 7 0.3323 0.1708

sample 8 0.3211 0.1579

sample 9 0.3194 0.1345

sample 10 0.4327 0.2245

sample 11 0.3474 0.1314

sample 12 15.766 5.3354

sample 13 0.4676 0.1810

sample 14 0.3927 0.1692

sample 15 14.421 4.2290

sample 16 0.3228 0.1381

sample 17 0.3844 0.1787

sample 18 0.4043 0.1760

sample 19 0.3328 0.1518

sample 20 0.4285 0.1375

average 3.2523 1.1150

Table 4.4. Evaluation of the cube reconstruction results in the noisy cases with IRLS.

sample num. mean of 3D error stdv of 3D error

sample 1 0.5985 0.2598

sample 2 0.3874 0.1904

sample 3 0.5064 0.2519

sample 4 0.7287 0.3341

sample 5 0.7445 0.4098

sample 6 0.4838 0.2185

sample 7 0.3609 0.1604

sample 8 0.6268 0.3811

sample 9 0.4062 0.2675

sample 10 0.5297 0.2738

sample 11 0.4961 0.2579

sample 12 0.5917 0.2769

sample 13 0.5162 0.4306

sample 14 0.5179 0.2283

sample 15 0.5619 0.2610

sample 16 0.4868 0.2699

sample 17 0.5375 0.1912

sample 18 0.4900 0.1997

sample 19 0.3961 0.2116

sample 20 0.5751 0.2278

average 0.5271 0.2651

Comparing the two experimental results, four of the 20 samples fail without the IRLS scheme (samples 3, 5, 12, and 15, whose mean 3D errors exceed 14). The IRLS scheme corrects these failed cases successfully. The average of the mean 3D error decreases from 3.2523 to 0.5271.

Next, we show the results of the synthetic human head reconstruction, again divided into two categories. The first result is without the IRLS scheme (Table 4.5), and the second result is with the IRLS scheme (Table 4.6).

Table 4.5. Evaluation of the head reconstruction results in the noisy cases without IRLS.

sample num. mean of 3D error stdv of 3D error

sample 1 0.1819 0.0864

sample 2 0.2635 0.1689

sample 3 0.1879 0.1132

sample 4 0.2328 0.1188

sample 5 0.2488 0.0789

sample 6 0.1860 0.1074

sample 7 0.1864 0.0943

sample 8 0.2349 0.1521

sample 9 0.3577 0.2220

sample 10 0.2542 0.0882

sample 11 4.9966 2.6401

sample 12 3.3724 1.9236

sample 13 0.1780 0.1023

sample 14 0.2221 0.1216

sample 15 0.1909 0.1083

sample 16 0.2144 0.1117

sample 17 0.2251 0.1173

sample 18 0.2499 0.1282

sample 19 0.3188 0.1799

sample 20 0.2672 0.1849

average 0.6285 0.3424

Table 4.6. Evaluation of the head reconstruction results in the noisy cases with IRLS.

sample num. mean of 3D error stdv of 3D error

sample 1 0.3280 0.1627

sample 2 0.2654 0.1759

sample 3 0.2854 0.1951

sample 4 0.2568 0.1244

sample 5 0.2504 0.0900

sample 6 0.2955 0.1555

sample 7 0.2436 0.1620

sample 8 0.3058 0.1636

sample 9 0.2954 0.1955

sample 10 0.3214 0.2255

sample 11 0.2303 0.1173

sample 12 0.2673 0.1888

sample 13 0.1923 0.0922

sample 14 0.2756 0.1436

sample 15 0.2276 0.1405

sample 16 0.2469 0.1220

sample 17 0.1923 0.0902

sample 18 0.2656 0.1362

sample 19 0.3910 0.1995

sample 20 0.2789 0.1611

average 0.2708 0.1521

Two samples fail in the case without IRLS (samples 11 and 12): for them, the optimization cannot reach a balanced solution on its own. The IRLS scheme (r = 0.95) corrects these two samples successfully by discarding some projection points. The average of the mean 3D error decreases from 0.6285 to 0.2708.

We rely on IRLS to select the outlier points that are inconsistent with the rest of the input data, which increases the accuracy of our result. Although in some cases discarding projection points increases the error slightly, it still helps our system resist random noise. In addition, some geometric constraints on the input data should be noted:

1. Critical motion: the motion contains only translation or rotation about a single axis.

2. All of the projection points in some views lie on a plane in 3D space.

4.2. Real data with the missing data

4.2.1. Point structure reconstruction of the real object with combination

Due to the viewing constraint, we usually cannot reconstruct all points of a real object at the same time. Here, we demonstrate an experiment in which we reconstruct the point structure of the object from two clip segments, and then merge the two point structures into one by making use of the overlapping points.

Figure 4.2 shows the input image stream with correspondences and the reconstructed 3D point structure of clip segment A with 41 feature points (structure A), and Figure 4.3 shows the input image stream with correspondences and the reconstructed 3D point structure of clip segment B with 45 feature points (structure B).

Clip segments A and B have eight overlapping feature points, and we use the similarity transformation computed from these eight overlapping points to combine structure A and structure B.

Figure 4.2. The input image stream and the 3D reconstructed point structure of the real object (clip segment A)

Figure 4.3. The input image stream and the 3D reconstructed point structure of the real object (clip segment B)

Figure 4.4. Eight snapshots of the combined 3D point structure for structure A and structure B.

4.2.2. Point structure recognition of the human head

We construct a database of five human heads, each containing 19 feature points. Representative image data are shown in Figure 4.5.

Figure 4.5. The images in the human head dataset

Given the above input data, our system is able to reconstruct the 3D point structure. Figure 4.6 shows six snapshots of our reconstructed 3D point structure. Table 4.7 illustrates our database; each entry consists of the image stream with feature tracking and the corresponding 3D point structure.

Figure 4.6. The wireframe representations of the reconstructed 3D point structure

Table 4.7. Database of the real human heads with or without eye glasses.

Columns: Name; Input Image (without eye glasses); 3D Point Structure; Input Image (with eye glasses); 3D Point Structure.

Rows: Wu; Ruping; Lee; Chenbc (with eye glasses: N/A, N/A).

The evaluation process of the reconstruction results is the same as for the synthetic experiments. Table 4.8 shows the human head recognition result for the five persons subject to a variation in wearing eye glasses. In this experiment there are 19 head feature points, including the four ear feature points. Table 4.9 shows the recognition result for the same case except that the four ear feature points are removed (to simulate the case in which the ears are covered by hair). From Tables 4.8 and 4.9, all the five persons are correctly recognized, whether or not they wear eye glasses and whether or not the four ear feature points are covered.

The experiment also indicates that the system can handle the problems of missing points and data noise. In the future, we need to collect a larger database and to consider more severe changes in appearance, such as wearing a hat or having a beard. Nevertheless, because we construct the 3D geometry of the head, geometric comparison can in principle resist such severe changes in appearance. On the other hand, we may require the person being identified to follow the same general guidelines as when she or he files official papers such as a passport photo or ID photo.

Table 4.8. Human head recognition result subject to a variation in wearing the eye glasses. Nineteen feature points including the ear feature points are used. Rows: person without eye glasses; columns: person with eye glasses.

          Wu       LK       Ruping   Lee
Wu        0.2725   0.7034   0.2984   2.4347
LK        0.9480   0.4215   0.3158   3.0554
Ruping    0.8307   0.7614   0.1986   2.2197
Chenbc    1.4882   0.8595   0.5192   3.5336
Lee       1.0998   0.7634   0.3092   1.9284

Table 4.9. Human head recognition result for the same case as in Table 4.8 except that the four ear feature points are discarded. Rows: person without eye glasses; columns: person with eye glasses.

          Wu       LK       Ruping   Lee
Wu        0.4809   0.6082   0.2215   1.5746
LK        0.5985   0.3161   0.3281   2.9405
Ruping    0.5952   0.6315   0.1317   1.4417
Chenbc    0.6695   0.8078   0.3147   1.6205
Lee       0.6812   0.7410   0.2884   1.1286
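One way to read these tables is as a nearest-match lookup: each person wearing eye glasses (a column) is assigned to the person without eye glasses (a row) with the smallest 3D error. The sketch below applies that reading to the values of Table 4.8. The decision rule itself is an assumption, since it is not spelled out in the text.

```python
import numpy as np

gallery = ["Wu", "LK", "Ruping", "Chenbc", "Lee"]   # rows: without eye glasses
queries = ["Wu", "LK", "Ruping", "Lee"]             # columns: with eye glasses

# 3D errors copied from Table 4.8.
errors = np.array([
    [0.2725, 0.7034, 0.2984, 2.4347],
    [0.9480, 0.4215, 0.3158, 3.0554],
    [0.8307, 0.7614, 0.1986, 2.2197],
    [1.4882, 0.8595, 0.5192, 3.5336],
    [1.0998, 0.7634, 0.3092, 1.9284],
])

best_rows = errors.argmin(axis=0)                   # smallest error per column
matches = {q: gallery[i] for q, i in zip(queries, best_rows)}
# matches maps every query name to the same name in the gallery
```

The same rule applied to Table 4.9 gives the same assignments, although the margin for Lee is much smaller in relative terms than for the other three.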

4.2.3. Dense reconstruction of human head

In this section, we conduct another experiment that uses the fast octree algorithm [1] for dense reconstruction of the human head with our estimated projection matrices in Euclidean space. Figures 4.7 and 4.8 show five snapshots of the dense reconstruction results of two persons with the fast octree algorithm. Generally, the reconstructed model captures the real head shape, as indicated by the resemblance between the reconstructed head and the real one. However, due to small errors in the estimated camera projection matrices, the reconstructed head of person 1 contains a defect (a dent in the head). Also, the quality of the reconstructed head depends on the total number of views used; more views give a better approximation, in particular views containing rapidly varying silhouettes.

Figure 4.7. The dense reconstruction result for person 1 using the fast octree algorithm

Figure 4.8. The dense reconstruction result for person 2 using the fast octree algorithm

Chapter 5. Conclusion and future work

5.1. Conclusion

Our proposed system is capable of handling missing data and noise. Missing data are handled directly by our proposed method. For noise, the optimization process of our proposed method has the nature of bundle adjustment: it finds a balanced solution such that the effects of random noise cancel out. In cases where a correct solution is not possible due to the influence of outliers, the IRLS component of our method discovers these outliers using a statistical technique. To obtain a Euclidean reconstruction, an iterative auto-calibration procedure using the absolute dual quadric computes an overall rectifying homography, even when the individually estimated camera intrinsic matrices vary across the views because of noise.

In the synthetic data experiments, our proposed scheme shows robustness against random noise generated by a Gaussian PDF. In the real data experiments, our system handles mild image noise and missing features.

5.2. Future work

This thesis addresses the robustness problem of 3D reconstruction of rigid objects. We consider extending our method to more advanced problems, including (1) how to reconstruct the structures of multiple objects moving independently [14], and (2) how to reconstruct an articulated object whose subparts have different motions. For reconstructing independently moving or articulated objects, some researchers use statistical methods [15] such as the expectation maximization (EM) algorithm, or factorization-based methods with articulation constraints imposed [16].

For the application of human head/face recognition, the system needs to be tested on a larger database for evaluation. In addition, selecting an appropriate representation of the head/face model to deal with variations in head/face appearance is an essential problem in human head/face recognition.

A higher accuracy of the reconstruction result can be achieved by reducing feature correspondence errors.

References:

[1] H. L. Chou, “Constructing 3D Object Models from Image Sequences”, Ph.D. dissertation, Department of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, 2004.

[2] Y. H. Fang, H. L. Chou, and Z. Chen, “3D Shape Recovery of Complex Objects from Multiple Silhouette Images”, Pattern Recognition Letters, vol. 24, pp. 1279-1293, 2003.

[3] C. Tomasi and T. Kanade, “Shape and Motion from Image Streams under Orthography: A Factorization Method”, International Journal of Computer Vision, vol. 9, no. 2, pp. 137-154, 1992.

[4] C. J. Poelman and T. Kanade, “A Paraperspective Factorization Method for Shape and Motion Recovery”, Proc. Third European Conf. Computer Vision, vol. 2, pp. 97-108, 1994.

[5] R. I. Hartley, “Euclidean Reconstruction from Uncalibrated Views”, Applications of Invariance in Computer Vision, J. L. Mundy, A. Zisserman, and D. Forsyth, eds., pp. 237-256. Berlin Heidelberg: Springer Verlag, 1994.

[6] R. Szeliski and S. B. Kang, “Recovering 3-D Shape and Motion from Image Streams Using Non-Linear Least Squares”, J. Visual Comm. and Image Representation, vol. 5, no. 1, pp. 10-28, Mar. 1994.

[7] S. Christy and R. Horaud, “Euclidean Shape and Motion from Multiple Perspective Views by Affine Iteration”, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 18, no. 11, pp. 123-141, 1995.

[8] T. Kanade and D. Morris, “Factorization Methods for Structure from Motion”, Philosophical Transactions of the Royal Society of London, vol. A, no. 356, pp. 1,153-1,173, 1998.

[9] H. Aanaes and R. Fisker, “Robust Factorization”, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 9, pp. 121-125, 2004.

[10] S. Mahamud, M. Hebert, Y. Omori, and J. Ponce, “Provably-Convergent Iterative Methods for Projective Structure from Motion”, Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 1, pp. 1,018-1,025, 2001.

[11] J. Oliensis, “Fast and Accurate Self-Calibration”, Proc. Int'l Conf. Computer Vision, pp. 745-752, 1999.

[12] R. I. Hartley, “In Defense of the Eight-Point Algorithm”, Proc. Fifth Int'l Conf. Computer Vision, pp. 1,064-1,070, 1995.

[13] A. J. Booker, J. E. Dennis, Jr., and P. D. Frank, “A Rigorous Framework for Optimization of Expensive Functions by Surrogates”, Structural Optimization, vol. 17, no. 1, pp. 1-13, 1999.

[14] J. Costeira and T. Kanade, “A Multibody Factorization Method for Independently Moving Objects”, Int'l J. Computer Vision, vol. 29, no. 3, pp. 159-179, 1998.

[15] A.Gruber, Y. Weiss, “Multibody Factorization with Uncertainty and Missing
