5.1 Conclusions
The main contribution of this thesis is a robust fingerspelling recognition system built on user-independent hand gesture features, namely the hand structure and the LBP feature, both derived from RGB-D images. The system consists of three parts: ROI selection, feature extraction, and fingerspelling recognition. Feature points are obtained by the distance transform and are used to find the hand features. These feature points also yield the hand structure information, which comprises the hand direction, the number of fingers, and the finger vectors.
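As a reminder of how a distance transform can yield a feature point, the following minimal sketch locates a palm-center candidate as the foreground pixel farthest from the background. It is an illustration under assumptions, not the implementation used in this thesis: the two-pass city-block metric, the nested-list mask format, and the names `distance_transform` and `palm_center` are all hypothetical choices made for this example.

```python
def distance_transform(mask):
    """Two-pass city-block distance from each foreground pixel (truthy)
    to the nearest background pixel (falsy) inside the image.
    Assumption: pixels outside the image are not treated as background."""
    h, w = len(mask), len(mask[0])
    inf = h + w  # larger than any possible city-block distance in the image
    dist = [[inf if mask[y][x] else 0 for x in range(w)] for y in range(h)]

    # Forward pass: propagate distances from the top and left neighbors.
    for y in range(h):
        for x in range(w):
            if dist[y][x]:
                if y > 0:
                    dist[y][x] = min(dist[y][x], dist[y - 1][x] + 1)
                if x > 0:
                    dist[y][x] = min(dist[y][x], dist[y][x - 1] + 1)

    # Backward pass: propagate distances from the bottom and right neighbors.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            if dist[y][x]:
                if y < h - 1:
                    dist[y][x] = min(dist[y][x], dist[y + 1][x] + 1)
                if x < w - 1:
                    dist[y][x] = min(dist[y][x], dist[y][x + 1] + 1)
    return dist


def palm_center(mask):
    """Return (row, col, radius) of the first pixel with the largest distance value."""
    dist = distance_transform(mask)
    best_row, best_col, best_d = 0, 0, -1
    for y, row in enumerate(dist):
        for x, d in enumerate(row):
            if d > best_d:
                best_row, best_col, best_d = y, x, d
    return best_row, best_col, best_d


# Toy hand mask: a 3x3 foreground block inside a 5x5 image.
hand_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(palm_center(hand_mask))
```

In this sketch the maximum distance value also serves as a rough palm radius, which is one way such a point could anchor the hand direction and finger vectors.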
The other hand gesture feature is the texture of the hand gesture, which is represented by the LBP feature. Finally, these features are sent to the fingerspelling recognition stage, which consists of several classifiers. The experimental results lead to the following conclusions:
1. The system provides a high accuracy rate for hand region detection, which is used to select the ROI in this thesis. The accuracy is higher than 90% and the average execution time is 0.2 sec/frame.
2. The system detects the hand region correctly even when the hand overlaps other skin-colored objects.
3. The hand structure is a fast way to represent the characteristics of the hand contour, and this feature is used to classify fingerspelling gestures with different hand shapes.
4. For cases that are hard to recognize from the hand structure alone, a binary classifier based on the local binary pattern (LBP) is provided to handle the texture of the hand gestures. To reduce the complexity of the input data, we divided the LBP value range (0~255) into 50 parts and incorporated the idea of difference for binary classification.
5. The fingerspelling recognition system is a cascade of multiple classifiers based on hand structure and binary classifiers based on LBP. The number of classifiers differs for each fingerspelling, and we select the most suitable combination of classifiers for each one.
6. The accuracy for most fingerspellings is higher than 80%; that is, the system is effective for fingerspelling recognition.
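The reduction of LBP values into 50 parts mentioned in item 4 can be illustrated with a minimal sketch. It assumes the basic 8-neighbor LBP operator and equal-width bins over 0~255; the exact partition used in this thesis and the difference step are not reproduced here, and the names `OFFSETS`, `lbp_code`, and `lbp_histogram` are hypothetical.

```python
# Neighbor offsets (row, col), clockwise starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, y, x):
    """Basic 8-neighbor LBP code (0..255) of the interior pixel at (y, x)."""
    center = img[y][x]
    code = 0
    for bit, (dy, dx) in enumerate(OFFSETS):
        # A neighbor at least as bright as the center sets its bit.
        if img[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img, bins=50):
    """Normalized histogram of LBP codes over interior pixels.
    Assumption: the 0..255 code range is split into equal-width bins."""
    hist = [0] * bins
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            hist[lbp_code(img, y, x) * bins // 256] += 1
    total = sum(hist)
    return [count / total for count in hist] if total else hist


# Toy grayscale patch: a bright center surrounded by darker pixels.
patch = [
    [10, 10, 10],
    [10, 90, 10],
    [10, 10, 10],
]
features = lbp_histogram(patch)
print(len(features), features[0])
```

A 50-bin vector of this kind is far smaller than a full 256-bin histogram, which is the kind of simplification item 4 describes; a binary classifier would then take such vectors as input.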
5.2 Future Work
The system introduced in this thesis cannot handle dynamic hand gestures and is limited by the resolution of the Kinect. The following preliminary ideas for future work would make the system more complete:
1. The system in this thesis is designed for a single user; it could be extended to a multi-user fingerspelling recognition system with minor modifications to the user interface.
2. Combined with an HMM or other algorithms designed specifically for sequential image problems, this system can be extended to a dynamic sign language recognition system. Such an application would build on the features introduced in this thesis, such as hand structure and hand texture, and is more commonly used in accessible communication.
3. This thesis has successfully distinguished many different hand gestures. The one-finger hand gestures can be defined as a writing tool for air writing, which is also an application of dynamic hand gestures. The writing trajectory can be obtained from the hand information introduced in this thesis, such as the palm position, finger positions, and fingertip vectors. Optical character recognition has been researched for a long time, so combining it with the system in this thesis would allow air writing recognition to be implemented efficiently.