Practical Tests - Direct Analysis of the 3D Pose of a Single Planar Target: Algorithm and System Implementation

Finally, we conduct a practical test to demonstrate the performance of the proposed system in the real world. As shown in Figure 5.8(a), the system is built on an NVIDIA Jetson TX1 board with a Microsoft LifeCam Cinema webcam and a DELL display. The resolution of the camera images captured by the webcam is 640 × 360, and the resolution of the target image is 400 × 300. The size of the planar target in the real world is 16 × 12 cm². We use a textured target image and a textureless target image for the test. Because ground-truth poses are not available, we use the appearance distance E_a defined in (2.6) to evaluate the performance. The target images and the results are shown in Figure 5.8(b), and sample images of the model rendered with the poses obtained from the proposed system are shown in Figure 5.7.

In these tests, the pose estimation unit takes 10 seconds to obtain the initial pose, while the pose tracker runs at 11 fps. The proposed system gives accurate and robust results for both textured and textureless targets in the real world.

[Figure 5.8(a): photograph of the system. Figure 5.8(b): plot of appearance distance (vertical axis, 0 to 0.2) against frame number (horizontal axis, 1 to 199) for the targets Ichiro and 2Circle.]

Figure 5.8: (a) A picture of the proposed system used in the practical tests. (b) The results of the practical tests for the proposed system. We use two planar targets in the tests: the textured target Ichiro and the textureless target 2Circle. The pixel values are normalized to [0, 1] for calculating the appearance distance.
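To make the evaluation metric concrete, the following is a minimal sketch of an appearance distance between a target image and a camera frame. Equation (2.6) is not reproduced in this section, so the form used here (mean absolute intensity difference over the target pixels that project inside the frame) is an assumption, not the thesis definition. The function name `appearance_distance`, the use of a 3 × 3 plane-to-image homography `H` to stand in for the pose, the nearest-neighbor sampling, and the fallback value of 1.0 are all hypothetical choices made for illustration. Both images are assumed to be grayscale and already normalized to [0, 1], as stated in the caption of Figure 5.8.

```python
import numpy as np

def appearance_distance(target, camera, H):
    """Mean absolute intensity difference between target pixels and the
    camera pixels they map to under the homography H (assumed form of E_a)."""
    h, w = target.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Homogeneous target coordinates, one column per pixel (3 x N).
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    proj = H @ pts
    z = proj[2]
    in_front = z > 1e-9  # skip points with non-positive depth
    u = np.full(z.shape, -1)
    v = np.full(z.shape, -1)
    # Nearest-neighbor sampling (an assumption; interpolation is also common).
    u[in_front] = np.rint(proj[0, in_front] / z[in_front]).astype(int)
    v[in_front] = np.rint(proj[1, in_front] / z[in_front]).astype(int)
    inside = (u >= 0) & (u < camera.shape[1]) & (v >= 0) & (v < camera.shape[0])
    if not inside.any():
        return 1.0  # assumption: no overlap counts as the maximum distance
    diff = np.abs(camera[v[inside], u[inside]] - target.ravel()[inside])
    return float(diff.mean())

# Usage: a synthetic frame that contains the target shifted by (3, 2) pixels.
target = np.linspace(0.0, 1.0, 20).reshape(4, 5)
camera = np.zeros((10, 10))
camera[2:6, 3:8] = target
H = np.array([[1.0, 0.0, 3.0],
              [0.0, 1.0, 2.0],
              [0.0, 0.0, 1.0]])
print(appearance_distance(target, camera, H))
```

Because the inputs are normalized to [0, 1], the value returned by this sketch also lies in [0, 1], which matches the scale of the vertical axis in Figure 5.8(b).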

Chapter 6 Conclusion

In this thesis, we propose a robust direct 3D pose estimation algorithm and develop D-PET, a direct 3D pose estimation and tracking system for a planar target.

The proposed algorithm is a two-step scheme. First, the pose of the target with respect to a calibrated camera is approximately estimated using a coarse-to-fine scheme. Next, we use a gradient descent search method to further refine and disambiguate the pose. Extensive experimental evaluations show that the proposed algorithm performs favorably against two state-of-the-art feature-based methods in terms of accuracy and robustness. The proposed D-PET system, which is implemented on an embedded GPU, consists of a pose estimation unit and a pose tracker. The pose estimation unit is built on the proposed algorithm and is responsible for finding the initial pose. To perform pose tracking, the pose tracker applies a 3-scale search with the proposed pose search pattern. Experimental results verify that the pose estimation unit performs similarly to the proposed algorithm and that the pose tracker is able to track the pose under severe conditions. In practice, the D-PET system achieves a processing speed of 11 fps on an embedded GPU. Our future work includes implementing dedicated VLSI hardware to make the system available on wearable devices.
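The idea of a multi-scale local search over pose parameters can be illustrated with a short sketch. The pose search pattern and the scales used by the pose tracker are defined in earlier chapters and are not restated here, so everything below is a hypothetical stand-in: the function name `multiscale_pattern_search`, the pattern (one step forward and backward along each pose parameter), the three step sizes, and the toy quadratic cost are assumptions for illustration only. In the actual system the cost would be an image-based distance evaluated at each candidate pose.

```python
import numpy as np

def multiscale_pattern_search(cost, pose0, steps=(4.0, 2.0, 1.0), max_iters=50):
    """Greedy local search: at each scale, evaluate +/- step along every pose
    parameter and move to the best neighbor until no neighbor improves."""
    pose = np.asarray(pose0, dtype=float)
    best = cost(pose)
    for step in steps:  # three scales, from coarse to fine
        for _ in range(max_iters):
            candidates = []
            for i in range(pose.size):
                for sign in (-1.0, 1.0):
                    cand = pose.copy()
                    cand[i] += sign * step
                    candidates.append(cand)
            costs = [cost(c) for c in candidates]
            k = int(np.argmin(costs))
            if costs[k] >= best:
                break  # no neighbor is better: local minimum at this scale
            pose, best = candidates[k], costs[k]
    return pose, best

# Usage with a toy cost whose minimum is at a known 6-parameter "pose"
# (three rotation and three translation parameters, purely illustrative).
true_pose = np.array([3.0, -2.0, 5.0, 0.0, 1.0, -4.0])
toy_cost = lambda p: float(np.sum((p - true_pose) ** 2))
pose, value = multiscale_pattern_search(toy_cost, np.zeros(6))
print(pose, value)
```

Starting each frame from the previous frame's pose and searching only a small neighborhood is what makes this kind of tracker cheap compared with a full initial estimation, which is consistent with the gap between the 10-second initialization and the 11 fps tracking rate reported in the practical tests.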

