Conclusion - Urban Road Car Steering Control Based on Deep Learning Semantic Segmentation

In this thesis, a deep CNN model for autonomous car steering is proposed. The approach takes advantage of semantic segmentation to provide a high-level representation for steering angle prediction. It has two stages: semantic segmentation generation and car steering angle prediction.

In the first stage, a Perception Network based on the SegNet architecture generates a semantic representation from an RGB input image. To obtain better segmentation results, we initialized the Perception Network with weights pretrained on Cityscapes and fine-tuned it with our manually labeled semantic segmentation ground truths. In the second stage, the segmentation result is fed to a Control Network, a compact network that learns to map a semantic segmentation result to a steering angle value.
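The data flow of the two stages can be illustrated with a minimal sketch. It is not the thesis implementation: the class list, the function names, and the toy steering rule are all hypothetical stand-ins, and the per-pixel class scores that a SegNet-style network would compute from an RGB image are simply passed in as input.

```python
from typing import List, Sequence

# Hypothetical label set; the classes used in the thesis are not listed in this section.
CLASSES = ("road", "vehicle", "other")

LabelMap = List[List[int]]


def perception_network(class_scores: Sequence[Sequence[Sequence[float]]]) -> LabelMap:
    """Stage 1 stand-in: choose the highest-scoring class index for every pixel.

    A real Perception Network would produce these scores from an RGB image
    with an encoder-decoder CNN; here the scores are given directly.
    """
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in class_scores
    ]


def control_network(label_map: LabelMap, road_id: int = 0, gain: float = 1.0) -> float:
    """Stage 2 stand-in: map a label map to a single steering value.

    Toy rule (an assumption, not a learned network): steer toward the image
    half that contains more road pixels. The result lies in [-gain, gain];
    negative means left, positive means right.
    """
    width = len(label_map[0])
    left = sum(row[: width // 2].count(road_id) for row in label_map)
    right = sum(row[(width + 1) // 2:].count(road_id) for row in label_map)
    total = left + right
    return 0.0 if total == 0 else gain * (right - left) / total


def predict_steering(class_scores: Sequence[Sequence[Sequence[float]]]) -> float:
    """Chain the two stages: image scores -> label map -> steering value."""
    return control_network(perception_network(class_scores))
```

The point of the sketch is the interface between the stages: the Control Network only ever sees the label map, never the raw pixels, which is what makes the intermediate representation high-level.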

The experimental results demonstrate that the proposed approach outperforms a typical end-to-end CNN baseline model. On the test set of the Udacity dataset, the proposed approach achieves an RMSE of 8.85 × 10^-2, while the baseline model achieves an RMSE of 9.2 × 10^-2. In addition, we report further data indicating that our method produces more robust results than the baseline model.
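To put the two RMSE figures in perspective, the following sketch shows how RMSE is computed over a set of steering predictions and how large the reported gap is in relative terms. The function names are illustrative assumptions; only the two RMSE values come from the text above.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Root-mean-square error between predicted and ground-truth steering values."""
    if len(predicted) != len(target) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((p - t) ** 2 for p, t in zip(predicted, target))
    return math.sqrt(squared_error / len(predicted))


def relative_reduction(baseline: float, proposed: float) -> float:
    """Fraction by which the proposed error is lower than the baseline error."""
    return (baseline - proposed) / baseline


# RMSE values reported for the Udacity test set.
BASELINE_RMSE = 9.2e-2
PROPOSED_RMSE = 8.85e-2

print(f"{relative_reduction(BASELINE_RMSE, PROPOSED_RMSE):.1%}")  # prints 3.8%
```

Under these numbers, the proposed approach lowers the test-set RMSE by roughly 3.8% relative to the baseline.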

In future work, we would like to investigate how to label semantic segmentation for driving videos efficiently. In this thesis, we had to label the segmentation ground truths manually; with automatic annotation techniques, we could expand the training data much more easily. Possible directions for efficient labeling are video segmentation and unsupervised learning of semantic segmentation. Finally, we are also interested in designing a unified CNN architecture that handles both semantic meaning extraction and driving control prediction in a single network.

