4.3 Reconstruction of Planar Model
4.3.3 Grouping of Extended-Planes and Rendering of Planar Model
In the reconstruction method proposed in this study, the objects within the scene are synthesized in accordance with the information contained within multiple views of the scene. However, the reconstruction results obtained for an object from any two of the available views may differ in terms of the boundary of the planar region. Therefore, in the proposed approach, the complete planar regions associated with each object in the world scene are reconstructed by merging the extended half-planes which are both adjacent to one another and coplanar (see Fig. 4.7).
Figure 4.7: Merging the coplanar extended half-planes 𝜋(L) and 𝜋(L0).
If two neighboring planes are considered to be coplanar, they are merged to form a single plane. Taking Fig. 4.7 as an example, suppose that the extended half-plane 𝜋(L0), which is parameterized by the 3D line L0, is coplanar with 𝜋(L), which is parameterized by L. A new plane, 𝜋𝑁, comprising both planes is initially obtained by taking the cross product of L0 and L as the normal vector. The optimum solution is then obtained by minimizing the distances between the new plane 𝜋𝑁 and the points X on plane 𝜋(L0) or 𝜋(L) using the Levenberg–Marquardt algorithm:
\[
\pi_N' = \arg\min_{\pi_N} \sum_{X \in \{\pi(L0) \cup \pi(L)\}} d(X, \pi_N) \qquad (4.9)
\]
where 𝑑(X, 𝜋𝑁) is the Euclidean orthogonal distance between the point X and the plane 𝜋𝑁.
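As a rough illustration of Eq. (4.9), the following sketch builds the initial plane from the cross product of two line directions and evaluates the summed orthogonal distance that the Levenberg–Marquardt refinement would then minimize. The function names, the (normal, offset) plane representation and the sample coordinates are assumptions made for this illustration only, and the refinement step itself is not shown.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def initial_plane(dir_l0, dir_l, point):
    """Initial plane (n, d) with n . X + d = 0.

    n is the unit cross product of the two line directions and the plane
    is made to pass through `point` (hypothetical representation).
    """
    n = cross(dir_l0, dir_l)
    norm = math.sqrt(sum(c * c for c in n))
    if norm == 0.0:
        # Parallel lines give a zero cross product, so another
        # initialization would be needed in that case.
        raise ValueError("parallel line directions give no unique normal")
    n = tuple(c / norm for c in n)
    d = -sum(ni * pi for ni, pi in zip(n, point))
    return n, d

def distance(x, plane):
    """Orthogonal distance d(X, pi_N) between a 3D point and a plane."""
    n, d = plane
    return abs(sum(ni * xi for ni, xi in zip(n, x)) + d)

def merge_cost(points, plane):
    """Objective of Eq. (4.9): sum of distances over the points of both half-planes."""
    return sum(distance(x, plane) for x in points)

# Two line directions spanning the plane z = 2, plus three sample points.
plane = initial_plane((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 2.0))
points = [(0.0, 0.0, 2.0), (1.0, 1.0, 2.5), (3.0, -1.0, 1.5)]
print(merge_cost(points, plane))  # 1.0
```

A nonlinear least-squares routine would start from `plane` and adjust its parameters to reduce `merge_cost`.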
Finally, the resulting merged planes are rendered to produce the final 3D reconstructed model in accordance with their geometric information. Pseudo-code of the algorithm used to construct the final planar models from the half-planes is summarized in Algorithm 3.
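The point-selection steps of Algorithm 2 and the region accumulation of Algorithm 3 can be sketched roughly as follows. This is a simplified illustration under stated assumptions: a region is represented as a plain list of image points rather than a polygon, the bounded search region of step 1 is omitted, the tolerance `tol` and all function names are hypothetical, and an image is assumed not to be paired with itself. The coplanar merging stage of Algorithm 3 is not covered.

```python
import math

def transfer_error(H, x1, x2):
    """Distance between x2 and the image of x1 under the 3x3 homography H."""
    u = H[0][0] * x1[0] + H[0][1] * x1[1] + H[0][2]
    v = H[1][0] * x1[0] + H[1][1] * x1[1] + H[1][2]
    w = H[2][0] * x1[0] + H[2][1] * x1[1] + H[2][2]
    return math.hypot(u / w - x2[0], v / w - x2[1])

def line_distance(x, line):
    """Distance from image point x to the line a*x + b*y + c = 0."""
    a, b, c = line
    return abs(a * x[0] + b * x[1] + c) / math.hypot(a, b)

def extend_region(seed, matches, H, line, tol=1.0):
    """Algorithm 2, steps 2 to 5 (simplified).

    Keeps the first-view points whose match agrees with H within `tol`
    pixels (assumed threshold), sorts them by distance to the line and
    adds them to the seed region one at a time.
    """
    coplanar = [x1 for x1, x2 in matches if transfer_error(H, x1, x2) <= tol]
    coplanar.sort(key=lambda x: line_distance(x, line))
    region = list(seed)
    for x in coplanar:  # one point per extension step
        region.append(x)
    return region

def accumulate(num_images, extend_for_pair):
    """Algorithm 3, lines 2 to 14, for one half-plane.

    `extend_for_pair(j, k)` is a hypothetical callback returning the
    extended region for reference image j and target image k.
    """
    combined = set()
    for j in range(num_images):      # reference image
        for k in range(num_images):  # target image
            if k != j:               # assumption: no self-pairing
                combined |= set(extend_for_pair(j, k))
    return combined

# Homography that shifts points by 2 pixels in x; the third match violates it.
H = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
matches = [((5.0, 3.0), (7.0, 3.0)),
           ((1.0, 1.0), (3.0, 1.0)),
           ((4.0, 4.0), (9.0, 9.0))]
print(extend_region([], matches, H, (0.0, 1.0, 0.0)))  # [(1.0, 1.0), (5.0, 3.0)]
```

Because only matched feature points are tested against the homography, the work per image pair scales with the number of matches rather than the number of pixels.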
Algorithm 2 Region-extension algorithm.
Objective: Given the region r^1, which is the projected image of the 3D half-plane 𝜋 on the first view, and a set of corresponding feature points, extend r^1 subject to a homography criterion.
Algorithm:
1: Define the bounded search region R^1 using the fixed lines;
2: For each feature point x^1_i in R^1, check whether the point pair satisfies H(𝜋) or not;
3: The feature points x^1_i which are coplanar with 𝜋 form the set 𝜓_𝜋;
4: Sort the points x^1_i ∈ 𝜓_𝜋 in ascending order according to 𝑑(x^1_i, l^1);
5: Extend the seed region r^1 incrementally by adding one point x^1_i ∈ 𝜓_𝜋 each time until all the points in 𝜓_𝜋 have been added;

Algorithm 3 Planar model reconstruction algorithm.
Objective: Produce planar model from extended half-planes.
Algorithm:
1: for each half-plane 𝜋_i do
2:   Create an empty extended planar region 𝒞_i of 𝜋_i;
3:   for each image I_j do
4:     Let I_j be the reference image;
5:     Create an empty extended planar region ℛ_j for I_j;
6:     for each image I_k do
7:       Let I_k be the target image;
8:       Obtain the extended region of 𝜋_i based on the image pair (I_j, I_k) using Algorithm 2;
9:       Update ℛ_j by adding new extended region;
10:    end for
11:  end for
12:  for each extended planar region ℛ_j do
13:    Update 𝒞_i = 𝒞_i ∪ ℛ_j;
14:  end for
15: end for
16:
17: for each extended planar region 𝒞_i do
18:   for each extended planar region 𝒞_j do
19:     if (𝒞_j ≠ 𝒞_i) and (𝒞_j, 𝒞_i) are coplanar then
28: Render the reconstructed 3D planar models from 𝒞_i;

4.4 Experimental Results
This section verifies the performance of the proposed reconstruction method using two kinds of image sets, namely a series of images of a stacked arrangement of boxes on a table surface and the Aerial Views image sets provided by the Visual Geometry Group [73]. The accuracy of the half-planes reconstructed by the proposed method is quantified in terms of the orientation error relative to the corresponding real-world planes and is compared with that of the reconstruction results obtained using the method presented in [1]. In addition, the computational time of the proposed method is compared with that of the method in [1].
Figure 4.8: Consecutive images of stacked camera boxes.
Fig. 4.8 presents three consecutive images of a stacked box arrangement taken by a common digital camera. Each image has a resolution of 1024 × 768 pixels, and the camera is calibrated using Yang’s method [40] implemented in the OpenCV library [72]. The fundamental matrices between the three views are estimated from the salient corresponding feature points and are then decomposed to obtain the respective camera poses [35]. Fig. 4.9(a) shows the 3D wire-frame model of the boxes constructed from the corresponding line segments extracted from the three images. Fig. 4.9(b) shows the estimated half-planes associated with some of these 3D line segments.
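For a calibrated camera, pose recovery from a fundamental matrix F conventionally passes through the essential matrix E = Kᵀ F K, where K is the intrinsic matrix shared by both views. The sketch below illustrates that relation together with the epipolar residual x₂ᵀ F x₁ that a correct correspondence should drive to zero. It is a generic illustration rather than the exact procedure of [35]; the matrices, point coordinates and function names are made up for the example, and the decomposition of E into a rotation and a translation is not shown.

```python
def matmul3(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose3(a):
    """Transpose of a 3x3 matrix."""
    return [[a[j][i] for j in range(3)] for i in range(3)]

def essential_from_fundamental(F, K):
    """E = K^T F K, assuming the same intrinsics K for both views."""
    return matmul3(matmul3(transpose3(K), F), K)

def epipolar_residual(F, x1, x2):
    """Algebraic residual x2^T F x1 for pixel points x1 (view 1) and x2 (view 2)."""
    h1 = (x1[0], x1[1], 1.0)
    h2 = (x2[0], x2[1], 1.0)
    Fx1 = [sum(F[i][k] * h1[k] for k in range(3)) for i in range(3)]
    return sum(h2[i] * Fx1[i] for i in range(3))

# Hypothetical intrinsics (focal length 2, principal point at the origin)
# and a fundamental matrix for a purely horizontal camera translation.
K = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
F = [[0.0, 0.0, 0.0], [0.0, 0.0, -0.5], [0.0, 0.5, 0.0]]

print(essential_from_fundamental(F, K))
# A match lying on the same image row satisfies the constraint for this F.
print(epipolar_residual(F, (3.0, 4.0), (10.0, 4.0)))  # 0.0
```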
After extending the half-planes and grouping those which are considered to be coplanar, a total of 7 planes are obtained for the real-world scene, as shown in Fig. 4.9(d). Each of the grouped coplanar regions covers the entire region of the corresponding box facet. Fig. 4.10 presents three views of the rendered 3D model of these 7 planes of the stacked-box scene with texture mapping. All the planar facets which are visible in the original image set are fully reconstructed. However, those which are hidden in the original image set cannot be reconstructed due to a lack of information.
Figure 4.9: Reconstruction results obtained for stacked-box scene: (a) 3D wire-frame model (b) the half-planes associated with 3D lines (c) detected feature points and (d) grouped coplanar regions.
Figure 4.10: Reconstructed 3D model of stacked-box arrangement viewed in three directions.
The performance of the proposed reconstruction scheme was evaluated in terms of the orientation accuracy of the reconstructed half-planes, i.e., the difference between the intersection angles of the real planes and those of the corresponding reconstructed half-planes. For simplicity, the evaluation experiment considered only orthogonal or parallel plane pairs (i.e., planes characterized by a real-world intersection angle of 90 degrees or 0 degrees). Table 4.1 compares the orientation errors of five plane pairs in the model constructed using the proposed method with those of the corresponding plane pairs in the model produced using the method proposed by Baillard et al. [1]. The proposed method yields a lower average error (1.80° versus 4.40°) and is more accurate for three of the five pairs; for the pairs (A, E) and (A, F), the method presented in [1] is marginally more accurate. Fig. 4.11(a) shows the indices of the planes of the boxes which are used for orientation accuracy evaluation.
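The orientation error described above can be computed from the plane normals, as in the following sketch. The representation of a plane by its normal vector, the function names and the 2-degree tilt example are assumptions made for illustration; the last lines simply recompute the averages reported in Table 4.1 from the individual errors.

```python
import math

def intersection_angle(n1, n2):
    """Angle in degrees (0 to 90) between two planes given by their normals."""
    dot = abs(sum(a * b for a, b in zip(n1, n2)))
    norm = math.sqrt(sum(a * a for a in n1)) * math.sqrt(sum(b * b for b in n2))
    # Clamp to guard against rounding slightly above 1.
    return math.degrees(math.acos(min(1.0, dot / norm)))

def orientation_error(n1, n2, true_angle_deg):
    """Absolute difference between reconstructed and real intersection angles."""
    return abs(intersection_angle(n1, n2) - true_angle_deg)

# Two planes that should be orthogonal, with one normal tilted by 2 degrees.
tilt = math.radians(2.0)
n_a = (0.0, 0.0, 1.0)
n_b = (math.cos(tilt), 0.0, math.sin(tilt))
print(round(orientation_error(n_a, n_b, 90.0), 2))  # 2.0

# Per-pair errors from Table 4.1 and their averages.
ours = [0.62, 2.92, 2.27, 1.60, 1.57]
baillard = [0.46, 2.88, 4.20, 4.87, 9.61]
print(round(sum(ours) / len(ours), 2), round(sum(baillard) / len(baillard), 2))  # 1.8 4.4
```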
The second set of reconstruction experiments was performed using the Aerial Views I dataset provided by the Visual Geometry Group [73]. The dataset comprises six consecutive images taken by an airplane-mounted camera. The dataset also includes the calibrated camera poses corresponding to each of the six different views.
The images in the Aerial Views I dataset are acquired with a resolution of 600 × 600 pixels. Fig. 4.12 presents representative images in the dataset, while Fig. 4.13 shows various views of the planar models constructed from the images in the Aerial Views I dataset. Fig. 4.13 shows that all of the 3D planes in the real-world scene are correctly and fully reconstructed, except for some of the vertical walls which cannot be discerned in the original images.
Table 4.1: Errors of intersection angles between the constructed half-plane pairs for the stacked box image set.
Plane Pair    Our Method    Baillard’s Method
(A, E)        0.62°         0.46°
(A, F)        2.92°         2.88°
(C, B)        2.27°         4.20°
(D, E)        1.60°         4.87°
(E, F)        1.57°         9.61°
Average       1.80°         4.40°
Figure 4.11: The indices of planes used for accuracy evaluation: (a) stacked-box and (b) Aerial Views I.
Figure 4.12: Images in the Aerial Views I dataset.
The accuracy of the reconstruction result in Fig. 4.13 was quantified by evaluating the orientation accuracy of 15 plane pairs in the Aerial Views I dataset known to be orthogonal or parallel in the real world. The planes which are used for orientation accuracy evaluation are shown in Fig. 4.11(b) with indices. Table 4.2 compares the intersection errors of the corresponding plane pairs in the model constructed by the proposed system with those of the plane pairs in the model constructed using the method presented in [1]. As in the first reconstruction experiment, it is observed that the results obtained using the proposed method are significantly more precise than those obtained using the method proposed by Baillard et al. [1].
Figure 4.13: Reconstructed 3D model of Aerial Views I dataset observed from different views.
The computational cost was evaluated in terms of the number of trials required to search for a half-plane, the total computation time for searching all the half-planes, and the time consumed by the feature correspondence detection process. Table 4.3 shows the results of the computational evaluation for the stacked-box set, while Table 4.4 shows the results for the Aerial Views I set. Both tables compare the computational cost of our method with that of the method proposed by Baillard et al. [1]. As shown in these tables, the average number of search trials required by our method is significantly reduced (10.6 and 6.6 trials, respectively, versus 180). Even in the worst case, our method requires fewer trials than Baillard’s method (60 and 33 trials versus 180). The half-plane search time of our method is also far lower than that of Baillard’s method (63.25 sec versus 510.64 sec and 12.92 sec versus 116.78 sec, a reduction of about 88% to 89%). Even with the extra time required to compute the corresponding points by SIFT, the total time cost of our method remains much lower than that of the method in [1].
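The size of the time saving can be checked directly from the values in Tables 4.3 and 4.4. The short calculation below is only a worked-arithmetic aid; the helper name `reduction_percent` and the dictionary layout are illustrative choices, while the timings are those reported in the two tables.

```python
def reduction_percent(ours, baseline):
    """Relative time saving of `ours` with respect to `baseline`, in percent."""
    return 100.0 * (1.0 - ours / baseline)

# Timings in seconds, taken from Tables 4.3 and 4.4.
timings = {
    "stacked boxes": {"search": 63.25, "features": 11.96, "baillard": 510.64},
    "Aerial Views I": {"search": 12.92, "features": 5.44, "baillard": 116.78},
}

for name, t in timings.items():
    search_only = reduction_percent(t["search"], t["baillard"])
    with_features = reduction_percent(t["search"] + t["features"], t["baillard"])
    print(f"{name}: {search_only:.1f}% (search only), "
          f"{with_features:.1f}% (including feature detection)")
# stacked boxes: 87.6% (search only), 85.3% (including feature detection)
# Aerial Views I: 88.9% (search only), 84.3% (including feature detection)
```

Including the feature detection time, the saving is about 84% to 85% in both experiments.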
Table 4.2: Errors of intersection angles between the constructed half-plane pairs for the Aerial
Table 4.3: Comparison of the computational cost of our method and Baillard’s method for the stacked box image set.
Boxes Set                          Our Method     Baillard’s Method
No. of trials (worst case)         60 trials      180 trials
No. of trials (average)            10.6 trials    180 trials
Time for half-plane search         63.25 sec      510.64 sec
Time for feature point detection   11.96 sec      N/A
Table 4.4: Comparison of the computational cost of our method and Baillard’s method for the Aerial Views I image set.
Aerial Views Set                   Our Method     Baillard’s Method
No. of trials (worst case)         33 trials      180 trials
No. of trials (average)            6.6 trials     180 trials
Time for half-plane search         12.92 sec      116.78 sec
Time for feature point detection   5.44 sec       N/A
Chapter 5 Conclusions
In this thesis, we have explored three applications of multiple view techniques for depth estimation and 3D reconstruction, i.e., depth map estimation from stereo view, stereo-based 3D localization technique for texture-less human back, and 3D reconstruction of piecewise planar models from multiple views.
First, we have proposed a Hierarchical Bilateral Disparity Structure (HBDS) algorithm for improving the efficiency of the disparity estimation process while simultaneously preserving the accuracy of the computed disparity map. In the proposed approach, the disparity levels in the original image are divided hierarchically into lower-level bilateral disparity structures. Within each bilateral structure, the disparity set is split into a foreground disparity set and a background disparity set using an optimal break point determined from the estimated disparity histogram of the original image. The bilateral disparity map is then computed by minimizing an energy function tailored specifically to the HBDS structure using the graph cut algorithm proposed in [16]. By decomposing the disparity estimation process into a series of bilateral disparity estimations, the HBDS algorithm reduces the computational complexity from quadratic to linear in the number of disparity levels. The evaluation results have shown that HBDS is not only far more efficient than the original graph cut (GC) algorithm, but also achieves a slightly higher accuracy. Furthermore, to mitigate the unavoidable “foreground fattening” effect, a modified disparity calibration (MDC) method has been proposed for refining the HBDS disparity map. In the proposed refinement scheme, the fattening regions are detected by comparing the edges in the disparity map with the boundaries of the segments in the over-segmented reference image. The disparity values within the fattening regions are then recalibrated based on the disparity values of all the nearby pixels other than those within the fattening region. The results have shown that the accuracy of the refined disparity map is significantly better than that of the original disparity map. Moreover, it has been shown that the MDC method yields a better accuracy than the disparity calibration method proposed in [28]. Finally, the evaluation results have shown that the proposed HBDS+MDC method performs well not only on standard test stereo images, but also on stereo images of ordinary real-world scenes captured using a common 3D camera.
Second, we have developed a vision-based robotic back massage machine in which the positions of the points for massage are measured using the stereoscopic 3D localization technique. To solve the problem of correspondence matching on the smooth and texture-less human back, we have proposed an algorithm designated as Correspondences from Epipolar geometry and Contours via Triangle barycentric coordinates (CECT) by combining the respective advantages of content information and geometric information. The CECT algorithm commences by extracting the back edge contours from the left and right images captured by a stereo camera.
A set of one-to-one corresponding points on the two contours is then constructed by applying a uniqueness constraint and an epipolar geometry constraint. These points, designated as reliable foundational correspondences, are then used to locate corresponding points within the region of the back enclosed by the edge contour using a descriptor based on the Triangle Barycentric Coordinates (TBC) system.
The performance of CECT is further enhanced by applying two geometric constraints on the reference triangles used by the TBC descriptor and an epipolar constraint for assessing the correctness of each candidate correspondence. The experimental results have shown that the proposed localization approach correctly identifies the specified 3D positions on the back even when the subject is lit from the side, or when the camera is displaced along with the robotic arm. Moreover, it has been shown that the localization performance of the CECT algorithm is consistent with that obtained using the cun-based measurement method in Traditional Chinese Medicine.
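The role of triangle barycentric coordinates in locating an interior correspondence can be illustrated with the following sketch. It shows only the general idea of expressing a point relative to a reference triangle in one image and re-synthesizing it from the corresponding triangle in the other image; the actual TBC descriptor and the constraints used by CECT are not specified here, so the function names and sample coordinates are purely hypothetical.

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates (u, v, w) of 2D point p in triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    if det == 0:
        raise ValueError("degenerate reference triangle")
    u = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    v = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return u, v, 1.0 - u - v

def from_barycentric(coords, a, b, c):
    """Point with the given barycentric coordinates in triangle (a, b, c)."""
    u, v, w = coords
    return (u * a[0] + v * b[0] + w * c[0],
            u * a[1] + v * b[1] + w * c[1])

# Reference triangle on the left contour and its matched triangle on the
# right contour (here simply shifted by 10 pixels in x).
left = ((0.0, 0.0), (4.0, 0.0), (0.0, 4.0))
right = ((10.0, 0.0), (14.0, 0.0), (10.0, 4.0))

coords = barycentric((1.0, 1.0), *left)
print(coords)                            # (0.5, 0.25, 0.25)
print(from_barycentric(coords, *right))  # (11.0, 1.0)
```

The transfer is exact only when the two triangles are related by an affine map, which is one reason a candidate obtained in this way would still need to be checked, for example against the epipolar constraint.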
Third, we have presented a method for reconstructing 3D planar models of real-world scenes based on the information contained within the images taken from multiple views of the scene. The proposed technique applies the same half-plane concept as that proposed in [1], but parameterizes the half-plane using the corresponding feature points rather than the angle of rotation of the plane about a 3D line. The experimental results have shown that the proposed parameterization scheme yields a significant improvement in the efficiency of the search process performed to identify the candidate half-planes. The search process is further simplified via the imposition of region and coplanar constraints, which limit the region over which the search process is performed and exclude any points which are not coplanar in the 3D world scene, respectively. Having identified the correct half-planes in accordance with the results obtained from a similarity score function, a region-extension process is performed to ensure that the selected half-planes cover the entire region occupied by the corresponding real-world planes. To avoid the requirement for an exhaustive search over the full image when performing the half-plane extension process, the search region is defined in advance for each half-plane using the Hough transform-based plane boundary identification method presented in [3]. Significantly, the half-plane extension process is based on the corresponding feature points contained within the bounded search region rather than all the pixels within the search region, and thus the efficiency of the extension process is greatly improved compared to that of conventional pixel-by-pixel extension schemes. Finally, the 3D planar model is constructed by merging all the half-planes belonging to the same world plane. The feasibility of the proposed method has been confirmed by reconstructing 3D planar models of a stacked arrangement of boxes and an urban scene, respectively. The experimental results have shown that the proposed method is both more efficient and more accurate than the 3D reconstruction system proposed by Baillard et al. [1].
References
[1] C. Baillard, C. Schmid, A. Zisserman, and A. W. Fitzgibbon, “Automatic line matching and 3D reconstruction of buildings from multiple views,” in Proc. ISPRS Conference on Automatic Extraction of GIS Objects from Digital Imagery, 1999, pp. 69–80.
[2] K. Schindler, “Generalized use of homographies for piecewise planar reconstruction,” in Proc. Scandinavian Conference on Image Analysis, ser. Lecture Notes in Computer Science, vol. 2749. Springer Berlin Heidelberg, Jul. 2003, pp. 470–476.
[3] G. Simon, “Automatic online walls detection for immediate use in AR tasks,” in Proc. IEEE/ACM International Symposium on Mixed and Augmented Reality, Oct. 2006, pp. 39–42.
[4] D. Gallup, J. M. Frahm, P. Mordohai, Q. X. Yang, and M. Pollefeys, “Real-time plane-sweeping stereo with multiple plane-sweeping directions,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Jun. 2007, pp. 1–8.
[5] W. F. Li, J. Zhou, B. X. Li, and M. I. Sezan, “Virtual view specification and synthesis for free viewpoint television,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 19, no. 4, pp. 533–546, Apr. 2009.
[6] S. Chan, H. Y. Shum, and K. T. Ng, “Image-based rendering and synthesis,” IEEE Signal Processing Magazine, vol. 24, no. 6, pp. 22–33, Nov. 2007.
[7] L. H. Wang, X. J. Huang, M. Xi, D. X. Li, and M. Zhang, “An asymmetric edge adaptive filter for depth generation and hole filling in 3DTV,” IEEE Transactions on Broadcasting, vol. 56, no. 3, pp. 425–431, Sep. 2010.
[8] S. U. Yoon and Y. S. Ho, “Multiple color and depth video coding using a hierarchical representation,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 11, pp. 1450–1460, Nov. 2007.
[9] L. Shen, Z. Liu, T. Yan, Z. Zhang, and P. An, “View-adaptive motion estimation and disparity estimation for low complexity multiview video coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 20, no. 6, pp. 925–930, Jun. 2010.
[10] H. Karim, N. Shah, N. Arif, A. Sali, and S. Worrall, “Reduced resolution depth coding for stereoscopic 3D video,” IEEE Transactions on Consumer Electronics, vol. 56, no. 3, pp. 1705–1712, Aug. 2010.
[11] D. Scharstein and R. Szeliski, “A taxonomy and evaluation of dense two-frame stereo correspondence algorithms,” International Journal of Computer Vision, vol. 47, no. 1-3, pp. 7–42, Apr. 2002.
[12] O. Veksler, “Fast variable window for stereo correspondence using integral images,” in Proc. IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, vol. 1, Jun. 2003, pp. 556–561.
[13] T. Kanade and M. Okutomi, “A stereo matching algorithm with an adaptive window: Theory and experiment,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, no. 9, pp. 920–932, Sep. 1994.
[14] Y. Ohta and T. Kanade, “Stereo by intra- and inter-scanline search using dynamic programming,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 7, no. 2, pp. 139–154, Mar. 1985.
[15] O. Veksler, “Stereo correspondence by dynamic programming on a tree,” in Proc. IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, vol. 2, Jun. 2005, pp. 384–390.
[16] Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy minimization via graph cuts,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 11, pp. 1222–1239, Nov. 2001.
[17] Y. Boykov and V. Kolmogorov, “An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 9, pp. 1124–1137, Sep. 2004.
[18] V. Kolmogorov and R. Zabih, “What energy functions can be minimized via graph cuts?” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 2, pp. 147–159, Feb. 2004.
[19] V. Kolmogorov and R. Zabih, “Computing visual correspondence with occlusions using graph cuts,” in Proc. IEEE Int. Conf. on Computer Vision, vol. 2, Jul. 2001, pp. 508–515.
[20] J. Sun, N. N. Zheng, and H. Y. Shum, “Stereo matching using belief propagation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 7, pp. 787–800, Jul. 2003.
[21] M. F. Tappen and W. T. Freeman, “Comparison of graph cuts with belief propagation for stereo, using identical MRF parameters,” in Proc. IEEE Int. Conf. on Computer Vision, vol. 2, Oct. 2003, pp. 900–906.
[22] L. Hong and G. Chen, “Segment-based stereo matching using graph cuts,” in Proc. IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, vol. 1, Jun.-Jul. 2004, pp. 74–81.
[23] W. Daolei and K. B. Lim, “Obtaining depth map from segment-based stereo matching using graph cuts,” Journal of Visual Communication and Image Representation, vol. 22, no. 4, pp. 325–331, May 2011.
[24] H. Lombaert, Y. Sun, L. Grady, and C. Xu, “A multilevel banded graph cuts method for