Chapter 2. Estimation of structure and motion in projective space
2.1. Multiple view computational methods
2.1.2. Bilinear method
Unlike the factorization-based method, the bilinear method does not attempt to estimate the projective depths. Instead, these are shown to be redundant, and eliminating them leads to a new expression for the squared norm of a vector that is a bilinear function of the matrices $M_i^{(P)}$ and the vectors $X_j^{(P)}$ [10].
First, the error function that the bilinear method uses in the optimization process is
∑
applying factorization to a data matrix in which all $z_{ij} = 1$. The error term associated with the projection matrix in projective space, with $|M_i^{(P)}|^2 = 1$ $(i = 1, \ldots, m)$, can be expressed as $E_i(X^{(P)}) = |C_i m_i^{(P)}|_F^2$, where $m_i^{(P)} \in \mathbb{R}^{12 \times 1}$ denotes a column vector
is equivalent to computing the singular vector of $C_i$ associated with the smallest singular value.
Next, fix the matrices $M_i^{(P)}$ and solve for each $X_j^{(P)}$ with unit Frobenius norm that minimizes $E_j(M^{(P)})$ using SVD. We optimize the structure and motion alternately in this way, without estimating the projective depths, until the results converge.
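The unit-norm minimization used in both half-steps can be sketched as follows. This is a minimal illustration, assuming only that the error has the form $|C m|^2$ with $|m| = 1$; the function name `unit_norm_minimizer` and the toy matrix `C` are hypothetical and do not reproduce the actual $C_i$ of the method.

```python
import numpy as np

def unit_norm_minimizer(C):
    """Return the unit vector m that minimizes ||C @ m||^2."""
    # Rows of vt are the right singular vectors of C, ordered by
    # decreasing singular value, so the last row is the minimizer.
    _, _, vt = np.linalg.svd(C)
    return vt[-1]

# Toy data (assumption): a 15x12 matrix whose null space contains m_true.
rng = np.random.default_rng(0)
m_true = rng.standard_normal(12)
m_true /= np.linalg.norm(m_true)
A = rng.standard_normal((15, 12))
C = A - np.outer(A @ m_true, m_true)  # forces C @ m_true to be zero

m_est = unit_norm_minimizer(C)        # equals m_true up to sign
```

The recovered vector is defined only up to sign, which is harmless for homogeneous quantities such as projection matrices and projective points.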
As mentioned earlier, the bilinear method does not require all feature points to be tracked in all images: at each iteration, each $X_j^{(P)}$ can be estimated independently using all the views in which it is observed. Likewise, each $M_i^{(P)}$ can be estimated independently using the points visible in the $i$-th view. Because the minimum number of views (or points) is easy to satisfy, we can discard the views in which the corresponding projection point is an outlier while computing $X_j^{(P)}$, and discard the points whose projections in the $i$-th view are outliers while computing $M_i^{(P)}$. The question is how to decide automatically which projection points are inconsistent with the others. We use iteratively reweighted least squares (IRLS) to identify the outliers, as introduced in the next section.
2.2. Outlier detection by the iteratively reweighted least squares (IRLS) method
The goal of the IRLS method is to minimize the weighted least-squares residual by alternately deciding the weights and then estimating the structure and motion in projective space. The weighted residual is defined as:
$$\sum_j \|\tilde{N}_j\|^2 = \sum_j \|V_j (D_j - M^{(P)} X_j^{(P)})\|^2$$
Finally, once the final weights $V$ are decided, we can select the outliers, which will not be used in the re-estimation of structure and motion in projective space.
2.2.1. Adaptive weighting
This section describes how to compute the error function and update the weighting matrix.
Define
$\Sigma_j \in \mathbb{R}^{3m \times 3m}$: the covariance matrix of the noise on $D_j$. The noise on $D_j$ is assumed to be Gaussian with zero mean and covariance matrix $\Sigma_j$.
$V_j \in \mathbb{R}^{3m \times 3m}$: the weights of $D_j$.
The initial covariance $\Sigma_j^{(0)}$ should be determined in the feature tracking algorithm. However, this is not necessarily a requirement. In the absence of prior knowledge, $\Sigma_j^{(0)}$ is initialized as the identity matrix.
The error function of the residual is introduced in IRLS for updating the covariance matrix of the noise. The error function is implemented via the truncated quadratic as follows:
Define
We can see that when the 2-norm of the residual is much larger than $k$, the value of the error function is close to zero.
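The behaviour described above can be illustrated with a small sketch. It assumes one common robust-estimation form, the truncated quadratic $\rho(r) = \min(r^2, k^2)$ with weight $\sqrt{\rho(r)}/r$; both the form and the name `truncated_quadratic_weight` are assumptions chosen to match the stated behaviour, not necessarily the exact definition used in this work.

```python
import numpy as np

def truncated_quadratic_weight(residual_norm, k):
    """Weight sqrt(rho(r)) / r for rho(r) = min(r^2, k^2), with k > 0.

    The weight is 1 for residual norms up to k and k / r beyond it,
    so it tends to zero when the residual is much larger than k.
    """
    r = np.asarray(residual_norm, dtype=float)
    # max(r, k) equals k inside the threshold (weight 1) and r outside
    # it (weight k / r); it also avoids division by zero at r = 0.
    return k / np.maximum(r, k)

weights = truncated_quadratic_weight([0.0, 1.0, 2.0, 200.0], k=2.0)
```

Under this assumed form, a weight below 1 marks a residual that exceeded the threshold $k$.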
At each iteration $t$ of IRLS, the covariance matrix of the noise is computed from the result of iteration $t-1$, and the next weighting matrix is then computed. The covariance matrix is initialized as a diagonal matrix. Consequently, $\Sigma_j^{(t)}$ is still a diagonal matrix.
2.2.2. Optimization with weights
This section introduces the concept and the algorithm for optimizing the structure and motion in projective space given fixed weights. The key point of optimization with weights is how to separate $M^{(P)}$ and $X^{(P)}$ from $D$ with the weights $V_j$.
However, SVD cannot be applied in the case of (10). To solve (10), the generally known method is “surrogate modeling” [13]. The essence of surrogate modeling is to apply a computationally simpler model to iteratively approximate the original hard problem. The residual of the $j$-th point $X_j^{(P)}$, between $D_j$ and its projection by the projection matrices in projective space, is defined as $N_j = D_j - M^{(P)} X_j^{(P)}$, and the weighted residual is defined as $\tilde{N}_j = V_j N_j$. Here, the $\tilde{D}$ in (11) is the so-called surrogate used to iteratively approximate the $D$ in (10). Each column $\tilde{D}_j$ is prepared for the next approximation by modifying the original data ($D_j$) with the weighted residual $\tilde{N}_j$. $V_j$ reduces a large back-projection error $N_j$ to the smaller error $\tilde{N}_j = V_j N_j$. These two procedures are executed iteratively to reach an approximate solution of (10). The algorithm converges when every residual $N_j$ is stable, rather than when every $\tilde{N}_j$ converges to a zero vector, in case a proper $\tilde{V}_j$ is not produced.
In the following, we shall describe the complete surrogate modeling algorithm.
Here, we omit the IRLS iteration index for convenience and clarity of notation. The surrogate modeling algorithm for optimizing motion and structure in projective space with weights is as follows:
Initialize:
$q = 1$, where $q$ denotes the iteration number in the optimization process.
Step 1. Estimate the model via the surrogate:
Apply SVD to $\tilde{D}^{q-1}$.
Step 2. Calculate the residuals:
$N^q = \tilde{D}^{q-1} - M^{(P)q} X^{(P)q}$
Step 3. Modify the surrogate:
$\forall j,\ \tilde{N}_j^q = V_j N_j^q$
If stop, return $M^{(P)} = M^{(P)q}$, $X^{(P)} = X^{(P)q}$. If not stop, set $q = q + 1$ and go to Step 1.
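One pass through Steps 1 to 3 can be sketched as follows. This is an illustration under stated assumptions: the model is taken to be a rank-4 factorization (each $M_i^{(P)}$ is 3×4 and each $X_j^{(P)}$ is a 4-vector), each $V_j$ is taken to be diagonal, and the names `surrogate_iteration` and `weights` are hypothetical. The sketch stops after the weighted residuals and does not form the next surrogate.

```python
import numpy as np

def surrogate_iteration(D_tilde, weights, rank=4):
    """Steps 1-3 for one iteration q (a sketch, not the full algorithm).

    D_tilde : (3m, n) surrogate data from the previous iteration.
    weights : (3m, n) array; column j holds the diagonal of V_j
              (diagonal V_j is an assumption of this sketch).
    """
    # Step 1: rank-4 factorization of the surrogate via SVD.
    U, s, Vt = np.linalg.svd(D_tilde, full_matrices=False)
    M = U[:, :rank] * s[:rank]   # (3m, 4) stacked projection matrices
    X = Vt[:rank]                # (4, n) projective points
    # Step 2: residuals against the surrogate.
    N = D_tilde - M @ X
    # Step 3: weighted residuals, N_tilde_j = V_j N_j for every column j.
    N_tilde = weights * N
    return M, X, N, N_tilde

# Toy usage (assumption): 5 views, 8 points, all weights 0.5.
rng = np.random.default_rng(1)
D0 = rng.standard_normal((15, 8))
M, X, N, N_tilde = surrogate_iteration(D0, np.full((15, 8), 0.5))
```

Storing only the diagonals of the $V_j$ keeps Step 3 an element-wise product instead of $n$ separate matrix products.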
In summary, the IRLS executes the adaptive weighting and the optimization with weights iteratively until $\tilde{N}_j$ converges to zero. We then discard the projection points whose weights satisfy $v_{ij} < 1$ and re-estimate with the bilinear method to output the final structure and motion in projective space. The next step is to transform our result to Euclidean space.
Chapter 3. Euclidean Reconstruction
This chapter introduces auto-calibration, the process of determining the intrinsic matrix $K$ of the camera directly from multiple projection matrices in projective space obtained from multiple views. Once this is done, it is straightforward to compute the Euclidean reconstruction. Auto-calibration avoids the onerous task of calibrating cameras with a special calibration object, which gives great flexibility in real applications [11].
3.1. Auto-calibration using the iterative absolute dual quadric method
The absolute dual quadric $Q_\infty^*$ is a degenerate dual quadric represented by a $4 \times 4$ matrix.
Under the $4 \times 4$ homography $H$, which transforms a 3D point in homogeneous form from Euclidean space to projective space:
This leads to a linear system.
After solving the system, we can obtain $w$ as the inverse of $w^*$, then use the new $w$ to run the next iteration until it converges. This method provides a $(Q_\infty^*)'$ for extracting the rectifying homography.
3.2. Acquisition of the rectifying homography
After acquiring $(Q_\infty^*)'$, we now compute the rectifying homography. First, apply SVD to $(Q_\infty^*)'$.
cannot be determined; however, it only affects the translation in the projection matrix, so the structure of the object stays the same. We choose
) P
X(
hv4 ≅ 0
Chapter 4. Experimental Results
In this chapter, we present our experimental results on synthetic data and on real data from image streams. In the synthetic data experiments, we show the nearly perfect result in the ideal case and how IRLS keeps our system stable with noisy data. In the real data experiments, we apply our system to human head recognition and to dense reconstruction of human head geometry. The image streams in our database have some missing feature correspondences, and our results show that a reasonable amount of missing data does not affect the stability of our system.
4.1. Synthetic data
4.1.1. Synthetic data without noise
In our first experiment, we create 10 views of an artificial six-faced cube consisting of 26 3D points with dimensions 20×20×20. The image resolution is 800×600. To verify the reconstruction result, a similarity transformation $sR$, where $s$ is a scaling factor and $R$ is a rotation matrix, is computed between the output structure and the synthetic model; the mean and standard deviation of the 3D point errors over all corresponding 3D points are then examined. The evaluation of the result is shown in Table 4.1.
Table 4.1. Evaluation of the 3D synthetic cube reconstruction results.
                          mean of 3D error   stdv of 3D error
estimated cube structure  0.0005             0.0002
Here, we can see the result is nearly perfect in the ideal case.
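The evaluation procedure described above can be sketched as follows. This is a minimal illustration, assuming a least-squares similarity alignment computed by SVD of the cross-covariance; the translation `t` and the function names `align_similarity` and `alignment_errors` are additions for illustration and are not taken from the text.

```python
import numpy as np

def align_similarity(src, dst):
    """Least-squares s, R, t such that dst ~ s * R @ src + t.

    src, dst : (n, 3) arrays of corresponding 3D points.
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    A, B = src - mu_s, dst - mu_d
    U, S, Vt = np.linalg.svd(B.T @ A)        # 3x3 cross-covariance
    d = np.sign(np.linalg.det(U @ Vt))       # guard against a reflection
    Dm = np.diag([1.0, 1.0, d])
    R = U @ Dm @ Vt
    s = np.trace(np.diag(S) @ Dm) / np.sum(A ** 2)
    t = mu_d - s * R @ mu_s
    return s, R, t

def alignment_errors(src, dst):
    """Mean and standard deviation of 3D point errors after alignment."""
    s, R, t = align_similarity(src, dst)
    aligned = s * src @ R.T + t
    err = np.linalg.norm(aligned - dst, axis=1)
    return err.mean(), err.std()
```

Here `src` would be the output structure and `dst` the synthetic model, so the returned pair corresponds to the two columns reported in the evaluation tables.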
Next, we generate 10 views of another synthetic object, which simulates a human head and contains 19 points with dimensions 20×20×10. Figure 4.1 shows six snapshots of the wireframe of the synthetic human head. The evaluation of the 3D point errors is shown in Table 4.2. The result is nearly perfect.
Figure 4.1. The wireframe of the synthetic human head.
Table 4.2. Evaluation of the 3D synthetic head reconstruction results.
                          mean of 3D error   stdv of 3D error
estimated head structure  0.0401             0.02537
4.1.2. Synthetic data with noise
In the previous section, we saw that our result is nearly perfect in the ideal, noise-free case. In the following, we test the noise tolerance of our system. We add noise generated by a Gaussian PDF with μ = 0 and σ = 3 in two directions (horizontal and vertical). We generate 20 noisy copies of the 10-view image stream.
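The noise generation can be sketched as follows. This is a minimal illustration, assuming the image points are stored as an array of shape (views, points, 2) and that σ is expressed in pixels; the function name `make_noisy_copies` and the array layout are hypothetical.

```python
import numpy as np

def make_noisy_copies(points, n_copies=20, sigma=3.0, seed=0):
    """Return n_copies noisy versions of the image points.

    points : (m, n, 2) array of image coordinates for m views and n
             points; the last axis holds the horizontal and vertical
             coordinates (layout assumed for this sketch).
    """
    rng = np.random.default_rng(seed)
    # Independent zero-mean Gaussian noise in both image directions.
    noise = rng.normal(loc=0.0, scale=sigma,
                       size=(n_copies,) + points.shape)
    return points[None] + noise

# Example: 10 views of 26 points, as in the cube experiment.
clean = np.zeros((10, 26, 2))
noisy = make_noisy_copies(clean)   # shape (20, 10, 26, 2)
```

Each of the 20 copies can then be reconstructed independently, which gives one row per sample in the result tables.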
We show the cube reconstruction results obtained by two schemes: one with the statistical IRLS scheme and one without. Table 4.3 lists the mean and standard deviation of the 3D point location errors of the cube reconstruction without the IRLS scheme, together with their averages over the samples, and Table 4.4 lists the same quantities with the IRLS scheme (r = 0.95).
Table 4.3. Evaluation of the cube reconstruction results in the noisy cases without IRLS.
sample num. mean of 3D error stdv of 3D error
sample 1 0.4763 0.2152
sample 2 0.3457 0.1421
sample 3 14.327 5.6238
sample 4 0.3346 0.1405
sample 5 14.379 4.5152
sample 6 0.5092 0.1475
sample 7 0.3323 0.1708
sample 8 0.3211 0.1579
sample 9 0.3194 0.1345
sample 10 0.4327 0.2245
sample 11 0.3474 0.1314
sample 12 15.766 5.3354
sample 13 0.4676 0.1810
sample 14 0.3927 0.1692
sample 15 14.421 4.2290
sample 16 0.3228 0.1381
sample 17 0.3844 0.1787
sample 18 0.4043 0.1760
sample 19 0.3328 0.1518
sample 20 0.4285 0.1375
average 3.2523 1.1150
Table 4.4. Evaluation of the cube reconstruction results in the noisy cases with IRLS.
sample num. mean of 3D error stdv of 3D error
sample 1 0.5985 0.2598
sample 2 0.3874 0.1904
sample 3 0.5064 0.2519
sample 4 0.7287 0.3341
sample 5 0.7445 0.4098
sample 6 0.4838 0.2185
sample 7 0.3609 0.1604
sample 8 0.6268 0.3811
sample 9 0.4062 0.2675
sample 10 0.5297 0.2738
sample 11 0.4961 0.2579
sample 12 0.5917 0.2769
sample 13 0.5162 0.4306
sample 14 0.5179 0.2283
sample 15 0.5619 0.2610
sample 16 0.4868 0.2699
sample 17 0.5375 0.1912
sample 18 0.4900 0.1997
sample 19 0.3961 0.2116
sample 20 0.5751 0.2278
average 0.5271 0.2651
From the two experimental results, we find that four of the 20 samples fail (samples 3, 5, 12, and 15) without the IRLS scheme. The IRLS scheme corrects these failed cases successfully. The average of the mean 3D error decreases from 3.2523 to 0.5271.
Next, we show the results of the synthetic human head reconstruction, also divided into two categories. The first result is without the IRLS scheme, and the second result is with the IRLS scheme.
Table 4.5. Evaluation of the head reconstruction results in the noisy cases without IRLS.
sample num. mean of 3D error stdv of 3D error
sample 1 0.1819 0.0864
sample 2 0.2635 0.1689
sample 3 0.1879 0.1132
sample 4 0.2328 0.1188
sample 5 0.2488 0.0789
sample 6 0.1860 0.1074
sample 7 0.1864 0.0943
sample 8 0.2349 0.1521
sample 9 0.3577 0.2220
sample 10 0.2542 0.0882
sample 11 4.9966 2.6401
sample 12 3.3724 1.9236
sample 13 0.1780 0.1023
sample 14 0.2221 0.1216
sample 15 0.1909 0.1083
sample 16 0.2144 0.1117
sample 17 0.2251 0.1173
sample 18 0.2499 0.1282
sample 19 0.3188 0.1799
sample 20 0.2672 0.1849
average 0.6285 0.3424
Table 4.6. Evaluation of the head reconstruction results in the noisy cases with IRLS.
sample num. mean of 3D error stdv of 3D error
sample 1 0.3280 0.1627
sample 2 0.2654 0.1759
sample 3 0.2854 0.1951
sample 4 0.2568 0.1244
sample 5 0.2504 0.0900
sample 6 0.2955 0.1555
sample 7 0.2436 0.1620
sample 8 0.3058 0.1636
sample 9 0.2954 0.1955
sample 10 0.3214 0.2255
sample 11 0.2303 0.1173
sample 12 0.2673 0.1888
sample 13 0.1923 0.0922
sample 14 0.2756 0.1436
sample 15 0.2276 0.1405
sample 16 0.2469 0.1220
sample 17 0.1923 0.0902
sample 18 0.2656 0.1362
sample 19 0.3910 0.1995
sample 20 0.2789 0.1611
average 0.2708 0.1521
Without IRLS, two samples fail (samples 11 and 12) because the input data cannot reach an equilibrium automatically. We correct these two samples successfully with the IRLS scheme (r = 0.95), which discards some projection points. The average of the mean 3D error decreases from 0.6285 to 0.2708.
We rely on IRLS to select the outlier points that are inconsistent with the rest of the input data, which increases the accuracy of our result. Although discarding some projection points increases the error slightly in some cases, it still helps our system resist random-like noise. In addition, some geometric constraints on the input data should be noted:
1. Critical motion: the motion contains only translation or rotation about a single axis.
2. All of the projection points in some views lie on a plane in 3D space.
4.2. Real data with the missing data
4.2.1. Point structure reconstruction of the real object with combination
Due to viewing constraints, we usually cannot reconstruct all points of a real object at the same time. Here, we demonstrate an experiment in which we reconstruct the point structure of the object from two clip segments and then merge the two point structures into one by making use of the overlapping points.
Figure 4.2 shows the input image stream with correspondences and the reconstructed 3D point structure of clip segment A with 41 feature points (structure A), and Figure 4.3 shows the input image stream with correspondences and the reconstructed 3D point structure of clip segment B with 45 feature points (structure B).
Clip segment A and clip segment B share eight overlapping feature points, and we use the similarity transformation computed from these eight points to combine structure A and structure B.
Figure 4.2. The input image stream and the 3D reconstructed point structure of the real object (clip segment A)
Figure 4.3. The input image stream and the 3D reconstructed point structure of the real object (clip segment B)
Figure 4.4. The eight snapshots of the combined 3D point structure for structure A and structure B.
4.2.2. Point structure recognition of the human head
We construct a database of five human heads, each containing 19 feature points.
Representative image data are shown in Figure 4.5.
Figure 4.5. The images in the human head dataset
Given the above input data, our system is able to reconstruct the 3D point structure.
Figure 4.6 shows six snapshots of our reconstructed 3D point structure. Table 4.7 illustrates our database; each entry consists of the image stream with feature tracking and the corresponding 3D point structure.
Figure 4.6. The wireframe representations of the reconstructed 3D point structure
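Table 4.7. Database of the real human heads with or without eye glasses.
Name     Input image (without eye glasses)   3D point structure   Input image (with eye glasses)   3D point structure
Wu       (image)                             (image)              (image)                          (image)
LK       (image)                             (image)              (image)                          (image)
Ruping   (image)                             (image)              (image)                          (image)
Lee      (image)                             (image)              (image)                          (image)
Chenbc   (image)                             (image)              N/A                              N/A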
The evaluation process for the reconstruction results is the same as that for the synthetic experiments. Table 4.8 shows the human head recognition results for the five persons, subject to a variation in wearing eye glasses. In this experiment there are 19 head feature points, including the four ear feature points. Table 4.9 shows the recognition results for the same case, except that the four ear feature points are removed (to simulate the case in which the ears are covered by hair). From Tables 4.8 and 4.9, all five persons are correctly recognized, whether or not they wear eye glasses and whether or not the four ear feature points are covered.
The experiment also indicates that the system can handle the problems of missing points and data noise. In the future, we need to collect a larger database and test more appearance variations, such as wearing a hat or having a beard. Nevertheless, because we can construct the 3D geometry of the head, geometric comparison can in principle withstand severe changes in appearance. On the other hand, we may require the person being identified to follow the same general guidelines as when she or he files official papers such as a passport photo or an ID photo.
Table 4.8. Human head recognition results subject to a variation in wearing eye glasses. Nineteen feature points, including the ear feature points, are used. Rows: person without eye glasses; columns: person with eye glasses.
         Wu      LK      Ruping  Lee
Wu       0.2725  0.7034  0.2984  2.4347
LK       0.9480  0.4215  0.3158  3.0554
Ruping   0.8307  0.7614  0.1986  2.2197
Chenbc   1.4882  0.8595  0.5192  3.5336
Lee      1.0998  0.7634  0.3092  1.9284
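The recognition result can be checked directly from the values of Table 4.8. The sketch below assumes a nearest-match rule: each person with eye glasses (a column) is assigned to the person without eye glasses (a row) that gives the smallest 3D error. The rule and the variable names are assumptions for illustration; the numbers are copied from the table.

```python
import numpy as np

gallery = ["Wu", "LK", "Ruping", "Chenbc", "Lee"]  # rows: without eye glasses
queries = ["Wu", "LK", "Ruping", "Lee"]            # columns: with eye glasses

# 3D errors from Table 4.8 (rows follow gallery, columns follow queries).
errors = np.array([
    [0.2725, 0.7034, 0.2984, 2.4347],
    [0.9480, 0.4215, 0.3158, 3.0554],
    [0.8307, 0.7614, 0.1986, 2.2197],
    [1.4882, 0.8595, 0.5192, 3.5336],
    [1.0998, 0.7634, 0.3092, 1.9284],
])

# For every query column, pick the gallery row with the smallest error.
best_rows = errors.argmin(axis=0)
matches = {q: gallery[i] for q, i in zip(queries, best_rows)}
# matches maps every query to the person of the same name
```

Under this rule, every person with eye glasses is matched to the same person without eye glasses, although some margins are small (for example, 0.2725 against 0.2984 in the first row).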
Table 4.9. Human head recognition results for the same case as in Table 4.8, except that the four ear feature points are discarded. Rows: person without eye glasses; columns: person with eye glasses.
         Wu      LK      Ruping  Lee
Wu       0.4809  0.6082  0.2215  1.5746
LK       0.5985  0.3161  0.3281  2.9405
Ruping   0.5952  0.6315  0.1317  1.4417
Chenbc   0.6695  0.8078  0.3147  1.6205
Lee      0.6812  0.7410  0.2884  1.1286
4.2.3. Dense reconstruction of human head
In this section, we conduct another experiment that uses the fast octree algorithm [1] to perform dense reconstruction of the human head with our estimated projection matrices in Euclidean space. Figures 4.7 and 4.8 each show five snapshots of the dense reconstruction results for two persons with the fast octree algorithm. Generally, the reconstructed model captures the real head shape, as indicated by the resemblance between the reconstructed head and the real one. However, due to small errors in the estimated camera projection matrices, the reconstructed head contains a defect (a dent in the head) for person 1. The quality of the reconstructed head also depends on the total number of views used; more views give a better approximation, in particular views containing rapidly varying silhouettes.
Figure 4.7. The dense reconstruction result for person 1 using the fast octree algorithm
Figure 4.8. The dense reconstruction result for person 2 using the fast octree algorithm
Chapter 5. Conclusion and future work
5.1. Conclusion
Our proposed system is capable of handling missing data and noise. Missing data are handled directly by the proposed method. For noise, the optimization process of the proposed method has the nature of bundle adjustment: it finds a balanced solution in which the effects of random noise cancel out. In cases where a correct solution cannot be obtained because of the influence of outliers, the IRLS component of our method identifies these outliers using a statistical technique. To obtain a Euclidean reconstruction, an iterative auto-calibration procedure using the absolute dual quadric computes an overall rectifying homography, even when the individually estimated camera intrinsic matrices vary across the views because of noise.
In the synthetic data experiments, the proposed scheme is robust against random noise generated by a Gaussian PDF. In the real data experiments, our system handles mild image noise and missing features.
5.2. Future work
This thesis addresses the robustness problem of 3D reconstruction of rigid objects.
We plan to extend our method to more advanced problems, including (1) how to reconstruct the structures of multiple objects moving independently [14], and (2) how to reconstruct an articulated object whose subparts have different motions. For reconstructing independently moving or articulated objects, some researchers use statistical methods [15] such as the expectation maximization (EM) algorithm, or factorization-based methods with articulation constraints imposed [16].
For the application of human head/face recognition, there is a need to test on a larger database for the system evaluation. Besides, the selection of an appropriate representation of the head/face model to deal with variations in the head/face appearance is the essential problem in human head/face recognition.
Higher accuracy of the reconstruction result can be achieved by reducing feature correspondence errors.
References:
[1] H. L. Chou, “Constructing 3D Object Models from Image Sequences”, Ph.D. dissertation, Department of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, 2004.
[2] Y. H. Fang, H. L. Chou, and Z. Chen, “3D Shape Recovery of Complex Objects from Multiple Silhouette Images”, Pattern Recognition Letters, vol. 24, pp. 1279-1293, 2003.
[3] C. Tomasi and T. Kanade, “Shape and Motion from Image Streams under Orthography: A Factorization Method”, International Journal of Computer Vision, vol. 9, no. 2, pp. 137-154, 1992.
[4] C. J. Poelman and T. Kanade, “A Paraperspective Factorization Method for Shape and Motion Recovery”, Proc. Third European Conf. Computer Vision, vol. 2, pp. 97-108, 1994.
[5] R. I. Hartley, “Euclidean Reconstruction from Uncalibrated Views”, Applications of Invariance in Computer Vision, M.Z. Foresyth, ed., pp. 237-256. Berlin Heidelberg: Springer Verlag, 1994.
[6] R. Szeliski and S. B. Kang, “Recovering 3-D Shape and Motion from Image Streams Using Non-Linear Least Squares”, J. Visual Comm. and Image Representation, vol. 5, no. 1, pp. 10-28, Mar. 1994.
[7] S. Christy and R. Horaud, “Euclidean Shape and Motion from Multiple Perspective Views by Affine Iteration”, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 18, no. 11, pp. 123-141, 1995.
[8] T. Kanade and D. Morris, “Factorization Methods for Structure from Motion”, Philosophical Transactions of the Royal Society of London, vol. A, no. 356, pp. 1,153-1,173, 1998.
[9] H. Aanaes and R. Fisker, “Robust Factorization”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 9, pp. 121-125, 2004.
[10] S. Mahamud, M. Hebert, Y. Omori, and J. Ponce, “Provably-Convergent Iterative Methods for Projective Structure from Motion”, Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 1, pp. 1,018-1,025, 2001.
[11] J. Oliensis, “Fast and Accurate Self-Calibration”, Proc. Int'l Conf. Computer Vision, pp. 745-752, 1999.
[12] R. I. Hartley, “In Defense of the Eight-Point Algorithm”, Proc. Fifth Int'l Conf. Computer Vision, pp. 1,064-1,070, 1995.
[13] A. J. Booker, J. E. Dennis, Jr., and P. D. Frank, “A Rigorous Framework for Optimization of Expensive Functions by Surrogates”, Structural Optimization, vol. 17, no. 1, pp. 1-13, 1999.
[14] J. Costeira and T. Kanade, “A Multibody Factorization Method for Independently Moving Objects”, Int'l J. Computer Vision, vol. 29, no. 3, pp. 159-179, 1998.
[15] A.Gruber, Y. Weiss, “Multibody Factorization with Uncertainty and Missing