Evaluation of Sensor Fusion Results - Ego-Positioning of Flying Cameras Based on Visual and Inertial Measurements

Sensor fusion is used to compensate for the limitations of visual positioning. Because the visual positioning methods evaluated above are already accurate, a loosely-coupled sensor fusion method can hardly improve their accuracy significantly. However, we can still verify that the fusion works by constructing a degraded case. In this experiment, we add Gaussian noise to the Vicon measurements to simulate a poor visual positioning result, and then fuse the noisy positions with the IMU readings. Figure 5.10 shows the framework of this simulation experiment.

In the experiment, the inertial sensor is an x-IMU [21]. The IMU is moved within the Vicon capture area, and its angular velocity ω and acceleration a readings are used to predict the system state in the EKF-based framework. As discussed before, the visual part is treated as a black box, so Vicon data can be used in its place as the measurement.
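The predict/correct loop described above can be sketched as follows. This is a deliberately simplified, hypothetical stand-in for the EKF-based framework, not the implementation used in the experiment: it assumes the accelerations have already been rotated into the world frame and gravity-compensated, ignores the gyroscope, IMU biases and visual scale, and filters each axis independently, so the EKF reduces to a linear Kalman filter. The function names `kf_fuse` and `simulate`, the synthetic trajectory, and the noise levels are illustrative assumptions.

```python
import numpy as np


def kf_fuse(acc, meas, dt, sigma_acc, sigma_meas):
    """Hypothetical loosely-coupled Kalman filter (simplified stand-in for the EKF).

    acc:  (N, 3) accelerations, ASSUMED gravity-compensated and in the world frame.
    meas: (N, 3) noisy positions from the black-box visual module.
    Each axis is filtered independently with state [position, velocity].
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity transition
    B = np.array([0.5 * dt**2, dt])         # how one acceleration sample enters the state
    H = np.array([[1.0, 0.0]])              # only position is observed
    Q = sigma_acc**2 * np.outer(B, B)       # process noise caused by IMU noise
    R = np.array([[sigma_meas**2]])         # visual measurement noise
    fused = np.zeros_like(meas, dtype=float)
    for axis in range(meas.shape[1]):
        x = np.array([meas[0, axis], 0.0])  # start at first measurement, zero velocity
        P = np.diag([sigma_meas**2, 1.0])
        for k in range(meas.shape[0]):
            if k > 0:
                # Prediction: propagate the state with the previous IMU sample.
                x = F @ x + B * acc[k - 1, axis]
                P = F @ P @ F.T + Q
            # Correction: update with the current (noisy) position measurement.
            innovation = meas[k, axis] - (H @ x)[0]
            S = H @ P @ H.T + R
            K = P @ H.T / S[0, 0]
            x = x + K[:, 0] * innovation
            P = (np.eye(2) - K @ H) @ P
            fused[k, axis] = x[0]
    return fused


def simulate(seed=0, dt=0.01, duration=10.0, sigma_acc=0.05, sigma_meas=0.1):
    """Synthetic data: smooth ground truth, noisy IMU accelerations, noisy positions."""
    rng = np.random.default_rng(seed)
    t = np.arange(0.0, duration, dt)
    truth = np.stack([np.sin(t), np.cos(0.5 * t), 0.5 * np.sin(0.3 * t)], axis=1)
    acc_true = np.stack([-np.sin(t), -0.25 * np.cos(0.5 * t),
                         -0.045 * np.sin(0.3 * t)], axis=1)
    acc = acc_true + rng.normal(0.0, sigma_acc, acc_true.shape)
    meas = truth + rng.normal(0.0, sigma_meas, truth.shape)
    return truth, acc, meas


truth, acc, meas = simulate()
fused = kf_fuse(acc, meas, dt=0.01, sigma_acc=0.05, sigma_meas=0.1)
meas_err = np.linalg.norm(meas - truth, axis=1).mean()
fused_err = np.linalg.norm(fused - truth, axis=1).mean()
print(f"mean 3D error: measurement {meas_err:.3f} m, fused {fused_err:.3f} m")
```

A full visual-inertial EKF (for example, the one in [2]) typically also estimates orientation, IMU biases and the visual scale; the linear version above only shows how IMU-driven prediction smooths noisy position fixes.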

We add Gaussian noise with σ = 0.1 m to the Vicon measurements to simulate poor visual positioning.
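As a rough illustration of how the error statistics of this experiment can be computed, the sketch below adds σ = 0.1 m Gaussian noise to positions and reports per-axis and 3D errors. It assumes the noise is independent on each axis and that "mean error" means the mean absolute error per axis and the mean Euclidean error in 3D; neither detail is stated explicitly in the text, and the random positions are a hypothetical stand-in for the Vicon data.

```python
import numpy as np


def error_stats(estimate, truth):
    """Per-axis mean/stdev of absolute error and mean/stdev of 3D (Euclidean) error."""
    diff = estimate - truth
    abs_axis = np.abs(diff)
    err_3d = np.linalg.norm(diff, axis=1)
    return {
        "axis_mean": abs_axis.mean(axis=0),
        "axis_std": abs_axis.std(axis=0),
        "3d_mean": err_3d.mean(),
        "3d_std": err_3d.std(),
    }


rng = np.random.default_rng(0)
# Hypothetical stand-in for Vicon ground-truth positions, in metres.
truth = rng.uniform(-1.0, 1.0, size=(20000, 3))
# ASSUMED: independent sigma = 0.1 m Gaussian noise on each axis.
noisy = truth + rng.normal(0.0, 0.1, size=truth.shape)
stats = error_stats(noisy, truth)
print({k: np.round(v * 100, 2) for k, v in stats.items()})  # values in cm
```

Under these assumptions the simulated errors come out at about 8 cm per axis and 16 cm in 3D, close to the measurement columns of Table 5.3.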

After adding the noise, the mean 3D positioning error is 16.01 cm. Figure 5.11 shows the noisy 3D input measurements together with the fusion result, and Figure 5.12 shows the input and output along each of the three axes: the noise is reduced significantly and the fused curve is much smoother. Table 5.3 gives the quantitative results; the mean 3D positioning error drops from 16.01 cm to 8.68 cm.

Figure 5.10: The framework of our experiment, in which the visual result is simulated by noisy Vicon measurements.

Figure 5.11: 3D visualization of the input measurements and the fusion result, shown together.

Table 5.3: Positioning errors (cm) before and after sensor fusion.

Axis    Measurement Mean    Measurement Stdev.    Fusion Result Mean    Fusion Result Stdev.
x       8.23                6.09                  4.00                  2.93
y       7.79                5.77                  4.21                  3.22
z       8.09                6.05                  4.67                  3.34
3D      16.01               6.67                  8.68                  3.36
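The measurement columns can be checked against the noise model. Assuming independent zero-mean Gaussian noise with σ = 0.1 m on each axis (an assumption, since the text only gives σ), the per-axis absolute error follows a half-normal distribution and the 3D error a chi distribution with three degrees of freedom:

```latex
\begin{aligned}
\mathbb{E}\,|n_x| &= \sigma\sqrt{2/\pi} \approx 7.98\ \text{cm}, &
\operatorname{Std}\,|n_x| &= \sigma\sqrt{1 - 2/\pi} \approx 6.03\ \text{cm}, \\
\mathbb{E}\,\lVert \mathbf{n} \rVert &= 2\sigma\sqrt{2/\pi} \approx 15.96\ \text{cm}, &
\operatorname{Std}\,\lVert \mathbf{n} \rVert &= \sigma\sqrt{3 - 8/\pi} \approx 6.73\ \text{cm}.
\end{aligned}
```

These agree closely with the measured 8.23/7.79/8.09 cm per axis and 16.01 cm in 3D, which supports the per-axis interpretation of σ.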

Figure 5.12: The position measurements before and after sensor fusion along the three axes.

For each axis, the upper plot shows the input measurement with σ = 0.1 m Gaussian noise and the lower plot shows the fusion result; both are compared with the ground truth (red).

Chapter 6 Conclusion and Future Work

6.1 Conclusion and Future Work

In this work, we evaluated three visual positioning methods in a variety of scenarios. LSD-SLAM is less accurate but more robust in featureless and blurry cases. It uses the most image information, and its semi-dense reconstruction is useful for tasks beyond localization. ORB-SLAM achieves impressively high precision most of the time but still inherits the intrinsic weaknesses of both SLAM methods and feature-based methods. MBL proves to be the most robust monocular positioning method because it localizes each frame independently. However, it cannot be used in unknown environments, since the model must be built in advance and the positioning performance depends on the training images. To compensate for the limitations of vision, we use an IMU to aid visual positioning through sensor fusion. The experiment shows that fusion reduces the positioning error in poor cases and that the metric scale, which is unobservable in monocular positioning, can be estimated.
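Why an IMU makes the metric scale observable can be illustrated with a small batch example. An up-to-scale visual trajectory differs from the metric one by a factor λ, so its second difference, multiplied by λ, must match the IMU acceleration, and λ follows from linear least squares. This is a hypothetical, simplified alternative to estimating scale inside the EKF, not the method used here: it assumes gravity-compensated, world-frame accelerations that are time-aligned with the visual positions and a visual frame already rotated to match. `estimate_scale` and all numbers are illustrative.

```python
import numpy as np


def estimate_scale(p_visual, a_imu, dt):
    """Least-squares metric scale: lambda = sum<a_imu, a_vis> / sum|a_vis|^2.

    p_visual: (N, 3) up-to-scale positions from monocular vision.
    a_imu:    (N, 3) accelerations, ASSUMED gravity-compensated, world frame,
              and sampled at the same instants as p_visual.
    """
    # Second difference of the visual track approximates its acceleration.
    a_vis = (p_visual[2:] - 2.0 * p_visual[1:-1] + p_visual[:-2]) / dt**2
    a_ref = a_imu[1:-1]  # align IMU samples with the interior visual samples
    return float(np.sum(a_vis * a_ref) / np.sum(a_vis * a_vis))


dt = 0.01
t = np.arange(0.0, 10.0, dt)
p_metric = np.stack([np.sin(t), np.cos(0.5 * t), 0.5 * np.sin(0.3 * t)], axis=1)
a_true = np.stack([-np.sin(t), -0.25 * np.cos(0.5 * t),
                   -0.045 * np.sin(0.3 * t)], axis=1)
true_scale = 2.5  # hypothetical unknown scale of the monocular map
rng = np.random.default_rng(0)
a_noisy = a_true + rng.normal(0.0, 0.05, a_true.shape)
lam = estimate_scale(p_metric / true_scale, a_noisy, dt)
print(f"estimated scale: {lam:.3f}")
```

Noise in the visual positions is amplified by the second difference and biases this estimate toward zero, which is one reason recursive filters that carry λ in their state are usually preferred.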

We find that pure rotation is an important issue in ego-positioning for flying cameras, whereas it is uncommon in positioning for vehicles. While all the SLAM methods suffer in this situation, MBL remains robust. Combining these methods so that they complement each other is a valuable topic for future work: ORB-SLAM could be used for general tracking and to help update the model in unknown areas; LSD-SLAM could serve as a backup module for featureless or blurry cases; and MBL could correct the accumulated drift, handle pure rotation, and provide global positioning.

Bibliography

[1] Jakob Engel, Thomas Schöps, and Daniel Cremers. LSD-SLAM: Large-scale direct monocular SLAM. In European Conference on Computer Vision, pages 834–849. Springer, 2014.

[2] Stephan Weiss and Roland Siegwart. Real-time metric state estimation for modular vision-inertial systems. In Robotics and Automation (ICRA), 2011 IEEE International Conference on, pages 4531–4537. IEEE, 2011.

[3] Raul Mur-Artal, JMM Montiel, and Juan D Tardós. ORB-SLAM: a versatile and accurate monocular SLAM system. IEEE Transactions on Robotics, 31(5):1147–1163, 2015.

[4] DJI. DJI Phantom series. http://www.dji.com/cn/products/phantom.

[5] Ted Driver. Long-term prediction of GPS accuracy: Understanding the fundamentals. In ION GNSS, 2007.

[6] Marko Modsching, Ronny Kramer, and Klaus ten Hagen. Field trial on GPS accuracy in a medium size city: The influence of built-up. In 3rd Workshop on Positioning, Navigation and Communication, pages 209–218, 2006.

[7] Microsoft. Microsoft Kinect. https://developer.microsoft.com/en-us/windows/kinect.

[8] Richard A Newcombe, Shahram Izadi, Otmar Hilliges, David Molyneaux, David Kim, Andrew J Davison, Pushmeet Kohli, Jamie Shotton, Steve Hodges, and Andrew Fitzgibbon. KinectFusion: Real-time dense surface mapping and tracking. In Mixed and Augmented Reality (ISMAR), 2011 10th IEEE International Symposium on, pages 127–136. IEEE, 2011.

[9] Thomas Whelan, Stefan Leutenegger, Renato F Salas-Moreno, Ben Glocker, and Andrew J Davison. ElasticFusion: Dense SLAM without a pose graph. Proc. Robotics: Science and Systems, Rome, Italy, 2015.

[10] Andrew Howard. Real-time stereo visual odometry for autonomous ground vehicles. In 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 3946–3952. IEEE, 2008.

[11] David Schleicher, Luis M Bergasa, Manuel Ocaña, Rafael Barea, and Elena López. Real-time hierarchical stereo visual SLAM in large-scale environments. Robotics and Autonomous Systems, 58(8):991–1002, 2010.

[12] Andrew J Davison. Real-time simultaneous localisation and mapping with a single camera. In Computer Vision, 2003. Proceedings. Ninth IEEE International Conference on, pages 1403–1410. IEEE, 2003.

[13] Georg Klein and David Murray. Parallel tracking and mapping for small AR workspaces. In Mixed and Augmented Reality, 2007. ISMAR 2007. 6th IEEE and ACM International Symposium on, pages 225–234. IEEE, 2007.

[14] Andrew J Davison, Ian D Reid, Nicholas D Molton, and Olivier Stasse. MonoSLAM: Real-time single camera SLAM. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6):1052–1067, 2007.

[15] Richard A Newcombe, Steven J Lovegrove, and Andrew J Davison. DTAM: Dense tracking and mapping in real-time. In 2011 International Conference on Computer Vision, pages 2320–2327. IEEE, 2011.

[16] Kuan-Wen Chen, Chun-Hsin Wang, Xiao Wei, Qiao Liang, Chu-Song Chen, Ming-Hsuan Yang, and Yi-Ping Hung. Vision-based positioning for internet-of-vehicles. IEEE Transactions on Intelligent Transportation Systems, 2016.

[17] Anastasios I Mourikis and Stergios I Roumeliotis. A multi-state constraint Kalman filter for vision-aided inertial navigation. In Proceedings 2007 IEEE International Conference on Robotics and Automation, pages 3565–3572. IEEE, 2007.

[18] Jonathan Kelly and Gaurav S Sukhatme. Visual-inertial simultaneous localization, mapping and sensor-to-sensor self-calibration. In Computational Intelligence in Robotics and Automation (CIRA), 2009 IEEE International Symposium on, pages 360–368. IEEE, 2009.

[19] Stefan Leutenegger, Simon Lynen, Michael Bosse, Roland Siegwart, and Paul Furgale. Keyframe-based visual–inertial odometry using nonlinear optimization. The International Journal of Robotics Research, 34(3):314–334, 2015.

[20] Stephan Weiss, Markus W Achtelik, Margarita Chli, and Roland Siegwart. Versatile distributed pose estimation and sensor self-calibration for an autonomous MAV. In Robotics and Automation (ICRA), 2012 IEEE International Conference on, pages 31–38. IEEE, 2012.

[21] x-io Technologies. x-IMU. http://www.x-io.co.uk/products/x-imu/.

[22] Javier Civera, Andrew J Davison, and JM Martinez Montiel. Inverse depth parametrization for monocular SLAM. IEEE Transactions on Robotics, 24(5):932–945, 2008.

[23] Ethan Eade and Tom Drummond. Scalable monocular SLAM. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), volume 1, pages 469–476. IEEE, 2006.

[24] Jan Stühmer, Stefan Gumhold, and Daniel Cremers. Real-time dense geometry from a handheld camera. In Joint Pattern Recognition Symposium, pages 11–20. Springer, 2010.

[25] Matia Pizzoli, Christian Forster, and Davide Scaramuzza. REMODE: Probabilistic, monocular dense reconstruction in real time. In 2014 IEEE International Conference on Robotics and Automation (ICRA), pages 2609–2616. IEEE, 2014.

[26] Jakob Engel, Jürgen Sturm, and Daniel Cremers. Semi-dense visual odometry for a monocular camera. In Proceedings of the IEEE International Conference on Computer Vision, pages 1449–1456, 2013.

[27] Thomas Schöps, Jakob Engel, and Daniel Cremers. Semi-dense visual odometry for AR on a smartphone. In Mixed and Augmented Reality (ISMAR), 2014 IEEE International Symposium on, pages 145–150. IEEE, 2014.

[28] Jakob Engel, Jürgen Sturm, and Daniel Cremers. Scale-aware navigation of a low-cost quadrocopter with a monocular camera. Robotics and Autonomous Systems, 62(11):1646–1656, 2014.

[29] Mingyang Li and Anastasios I Mourikis. High-precision, consistent EKF-based visual–inertial odometry. The International Journal of Robotics Research, 32(6):690–711, 2013.

[30] Roland Brockers, Sara Susca, David Zhu, and Larry Matthies. Fully self-contained vision-aided navigation and landing of a micro air vehicle independent from external sensor inputs. In SPIE Defense, Security, and Sensing, pages 83870Q–83870Q. International Society for Optics and Photonics, 2012.

[31] Ethan Rublee, Vincent Rabaud, Kurt Konolige, and Gary Bradski. ORB: An efficient alternative to SIFT or SURF. In 2011 International Conference on Computer Vision, pages 2564–2571. IEEE, 2011.

[32] Vicon. Vicon Bonita. http://www.vicon.com/products/camera-systems/bonita.
