A video of the training process of NAO typing on the keyboard is available at https://youtu.be/tg6uag4cjLA. Some of the learned keyboard-typing policies can also be viewed at https://youtu.be/71PU4HFyeyY and https://youtu.be/RkYOPsdqC5Q.
Conclusion
We have developed a coordination framework for vision and motion based on deep learning and transfer learning. A robot with such a coordination system can help people who have difficulty typing on a keyboard, such as elderly people who are not proficient with 3C devices or the internet, to accomplish tasks and improve their quality of life. The proposed system enables the robot to type on a virtual keyboard to operate a tablet computer. The keyboard layout and the type of stylus pen affect the state observation, and these factors tend to cause the policy to take improper actions. However, because our system is modular, its vision component can be readily adjusted for many new applications.
Suggestions for future work are as follows. First, the vision model could be improved, since the model we applied fits only the Surface Pro 4 keyboard and the stylus pen used in this work. Second, the robot has not fully mastered the typing task; it could be trained for more episodes, or another RL algorithm could be tried within the designed framework. Third, additional training images of other keyboard layouts and/or different types of stylus pens could help the system generalize. Moreover, if NAO's waist joint could also be used for control, the constraints imposed by NAO's joint-angle limits and arm length could be relaxed, so that keys not currently accessible by our system could be pressed.
Appendix
~/naoRL/DDPGRANDIN$ DDPG.2.py
Initializes the NAO training environment and sets the arm to a safe position.
Input:
ep=0: The episode number of the training process, used to set the initial task of the environment.
goal='H': The goal of this episode.
Output:
A numpy array of shape (17,) containing the state.
render()
Renders the monitor view of NAO's camera and the virtual keyboard layout.
step(action, [speed])
The agent executes the action and returns the observed state.
Input:
action: A five-dimensional numpy array with values in [-1, 1].
speed: A five-dimensional numpy array with values in [0, 1].
Output:
A numpy array of shape (17,) containing the state.
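The entries above describe an interface that resembles the usual reset/step loop of reinforcement learning environments. The following is a minimal sketch, assuming NumPy and a hypothetical MockNaoTypingEnv that reproduces only the documented shapes and value ranges. It shows how an agent might drive such an environment without a physical NAO. Several names are assumptions and are not taken from DDPG.2.py: the method name `reset` for the initialization call (whose name the appendix does not give), the default speed of 0.5, and the tanh-based `squash` helper.

```python
import numpy as np

STATE_DIM = 17   # documented: the state is a numpy array of shape (17,)
ACTION_DIM = 5   # documented: action and speed are five-dimensional


class MockNaoTypingEnv:
    """Hypothetical stand-in for the NAO typing environment.

    Only the documented shapes and ranges are mimicked; the returned
    states are random placeholders, not real camera/joint observations.
    """

    def __init__(self, seed=0):
        self._rng = np.random.default_rng(seed)
        self.goal = None
        self.last_action = None
        self.last_speed = None

    def reset(self, ep=0, goal="H"):
        # Hypothetical name for the documented initialization call:
        # start episode `ep` with target key `goal` (arm in a safe pose).
        self.goal = goal
        return self._rng.uniform(-1.0, 1.0, size=STATE_DIM)

    def step(self, action, speed=None):
        action = np.asarray(action, dtype=float)
        if action.shape != (ACTION_DIM,):
            raise ValueError(f"action must have shape ({ACTION_DIM},)")
        if speed is None:
            # Assumed default; the appendix only marks speed as optional.
            speed = np.full(ACTION_DIM, 0.5)
        speed = np.asarray(speed, dtype=float)
        # Enforce the documented ranges: action in [-1, 1], speed in [0, 1].
        self.last_action = np.clip(action, -1.0, 1.0)
        self.last_speed = np.clip(speed, 0.0, 1.0)
        return self._rng.uniform(-1.0, 1.0, size=STATE_DIM)


def squash(raw_action, raw_speed):
    """Map unbounded network outputs into the ranges step() expects."""
    action = np.tanh(raw_action)              # values in [-1, 1]
    speed = 0.5 * (np.tanh(raw_speed) + 1.0)  # values in [0, 1]
    return action, speed


def rollout(env, n_steps=10, seed=1):
    """Run one short episode and return the stacked states."""
    rng = np.random.default_rng(seed)
    state = env.reset(ep=0, goal="H")
    states = [state]
    for _ in range(n_steps):
        # Random raw outputs stand in for a trained actor network.
        action, speed = squash(rng.normal(size=ACTION_DIM),
                               rng.normal(size=ACTION_DIM))
        state = env.step(action, speed)
        states.append(state)
    return np.stack(states)


if __name__ == "__main__":
    trajectory = rollout(MockNaoTypingEnv(), n_steps=10)
    print(trajectory.shape)  # one initial state plus ten step() states
```

Squashing the actor outputs with tanh keeps them inside the documented bounds by construction, so the clipping in step() acts only as a safeguard. The appendix does not state whether the original DDPG.2.py actor bounds its outputs this way.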