
Chapter 3 Hand Gesture Recognition System

3.3 Hand Gesture Recognition

3.3.3 HMM-Based Recognition

Since a hand gesture is essentially an action composed of a sequence of hand postures connected by continuous motions, describing hand gestures as a sequential input is well suited to HGR systems. HMMs have been applied to HGR tasks with success. The classification of the input sequence proceeds by

Fig- 3.18 HGR ANN structure (440 input neurons, 10 hidden neurons, 1 output neuron)


where pznzn-1pxnznare called transition andemission probability, respectively. zn

is the latent variable, or called hidden state, corresponding to the input data xn. An iterative expectation-maximization (EM) algorithm procedure is used to maximize the likelihood function and obtain optimal parameters. To speed up the calculation, the Viterbi algorithm is used that finding the most probable sequence of latent variables for a given observations X {x1 ,..., xN}. The posterior probability of time t is proportional to the jointly probability:

p(z_t | x_1, ..., x_t) ∝ p(z_t, x_1, ..., x_t)

Rather than explicitly considering all of the exponentially many paths, evaluating the probability of each, and selecting the path with the highest probability, the Viterbi algorithm finds the most probable path efficiently by dynamic programming. In this experiment, the latent variable z_n has 6 states and N is chosen to be 10.
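The Viterbi search described above can be sketched in Python. This is an illustrative implementation for a generic discrete-emission HMM, not the thesis code; the 6-state, N = 10 configuration of the experiment would be supplied through the pi, A, B, and obs arguments.

```python
def viterbi(pi, A, B, obs):
    """Most probable hidden-state path for a discrete HMM.

    pi  : initial state probabilities, length M
    A   : M x M transition matrix, A[i][j] = p(z_n = j | z_{n-1} = i)
    B   : M x K emission matrix, B[j][k] = p(x_n = k | z_n = j)
    obs : observation sequence of symbol indices, length N
    """
    M = len(pi)
    # delta[j] = probability of the best path ending in state j
    delta = [pi[j] * B[j][obs[0]] for j in range(M)]
    back = []  # back-pointers for path recovery
    for x in obs[1:]:
        psi, new_delta = [], []
        for j in range(M):
            best_i = max(range(M), key=lambda i: delta[i] * A[i][j])
            psi.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][x])
        delta = new_delta
        back.append(psi)
    # backtrack from the best final state
    path = [max(range(M), key=lambda j: delta[j])]
    for psi in reversed(back):
        path.append(psi[path[-1]])
    return list(reversed(path))
```

Because only the best predecessor of each state is kept at every step, the cost is O(NM^2) rather than exponential in N.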

This experiment tests six hand gestures to be recognized, including moving left, right, upward, downward, open, and click; Fig- 3.19 shows the defined gestures.

The hand gestures are defined in a natural way rather than as a particular ordered sequence of hand poses. The difference in palm center between the current and previous frames can be used to recognize these gestures. The above three algorithms are used to recognize these six hand gestures.
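As a rough illustration of how the frame-to-frame palm-center difference could be mapped to the four moving gestures, here is a minimal sketch; the dead-zone threshold and axis convention are assumptions, since the thesis learns the actual decision rule with the classifiers above.

```python
def moving_direction(prev_center, curr_center, min_step=5.0):
    """Classify frame-to-frame palm motion into a coarse direction.

    Coordinates are pixels; the image y-axis points downward.
    min_step is a hypothetical dead-zone threshold in pixels.
    """
    dx = curr_center[0] - prev_center[0]
    dy = curr_center[1] - prev_center[1]
    if max(abs(dx), abs(dy)) < min_step:
        return "still"  # motion too small to classify
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "downward" if dy > 0 else "upward"
```

A sequence of such per-frame labels is exactly the kind of discrete observation sequence the HMM above consumes.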


Fig- 3.19 The defined hand gestures.


Chapter 4

Experimental Results

In the previous chapters, the three main steps of the proposed HGR system were introduced. This chapter presents the experimental results of each step in detail; the results of the proposed algorithm were obtained with MATLAB R2010b and OpenCV 2.2.

4.1 Hand Region Detection

To examine the reliability of hand region detection, the system is tested in many different situations, including different distances, skin-colored backgrounds, overlap with other skin-colored objects, and more than one person, and the results are shown in Fig- 4.1 to Fig- 4.4, respectively. The left column contains the skin color images, the middle one shows the ROI images, and the right one represents the skin color regions with large enough area, where white pixels indicate skin color and red indicates the depth searching region. Note that the green rectangles in the middle column are the selected ROIs. Fig- 4.1 shows that even as the user moves away from the camera, the system does not fail to extract the hand region; the distance between the user and the camera ranges from 0.5 m to 2 m. In Fig- 4.2, there are several skin-colored objects in the images, and the system can still detect the hand regions.
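A minimal sketch of this detection stage, assuming a commonly cited explicit RGB skin rule and an illustrative area threshold; the thesis's actual skin model and threshold are not reproduced here.

```python
def is_skin(r, g, b):
    """Explicit RGB skin rule (illustrative placeholder only)."""
    return (r > 95 and g > 40 and b > 20 and
            max(r, g, b) - min(r, g, b) > 15 and
            abs(r - g) > 15 and r > g and r > b)

def large_skin_regions(mask, min_area=200):
    """4-connected component labeling; keep components with enough area.

    mask is a 2-D list of 0/1 skin flags; returns a list of pixel lists.
    min_area is a hypothetical threshold, not the thesis's value.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                stack, comp = [(y, x)], []
                seen[y][x] = True
                while stack:  # iterative flood fill
                    cy, cx = stack.pop()
                    comp.append((cy, cx))
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                if len(comp) >= min_area:
                    regions.append(comp)
    return regions
```

The surviving components correspond to the "skin color regions with large enough area" in the right column of the figures.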



Fig- 4.1 Results of hand region detection at different distances. (a) The original skin color region. (b) The ROI images. (c) Skin color regions with large enough area. Note that the green rectangles in (b) are the selected ROIs.


The situations in Fig- 4.3 and Fig- 4.4 are more complex. In Fig- 4.3, the hand overlaps other skin-colored objects, such as the face and a shirt. The results show that the system can still extract the hand regions, even under severe overlap. In Fig- 4.4, more than one person stands in front of the camera; the system filters out the extra regions to reduce the number of ROIs and still successfully extracts the hand regions. The programming tools are Visual Studio 2010 and OpenNI for controlling the Kinect, and the video resolution is 640×480 pixels. The average processing time is 0.2 ms, which is fast enough for a real-time HCI system.


Fig- 4.2 Results of complex skin color background.



Fig- 4.3 Results of hand region detection in the condition of overlapping between the hand and other skin color objects.



Fig- 4.4 Results of hand region detection in the condition of multiple users.


4.2 Feature Extraction

In this section, the experimental results are presented in three parts. The first part focuses on forearm cropping, the second on performance under different hand orientations, and the third on finding fingertip positions.

Experiment 1:

To evaluate the ability of the proposed method to crop the forearm region, Fig- 4.5 shows the results, where the green rectangles are the boundaries of the detected CCL


Fig- 4.5 Results of forearm cropping under (a) different postures and (b) different distances.


objects and the blue ones are the hand regions. The result in Fig- 4.5 (a) shows that even when the hands are not open, the forearm regions are still cropped. To examine the effect of normalization, the experiment tests different distances between the user and the camera, including 0.5 m, 1.0 m, 1.5 m, and 2 m, and Fig- 4.5 (b) shows the result. Clearly, the influence of distance is greatly reduced.
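The normalization discussed above can be illustrated as follows, under the assumption that a region's apparent pixel area falls off with the square of its depth; the 1 m reference depth is a placeholder, not a value from the thesis.

```python
def normalized_size(pixel_area, depth_mm, ref_depth_mm=1000.0):
    """Rescale a region's pixel area to what it would measure at a
    reference depth, assuming area shrinks as 1 / depth^2."""
    scale = depth_mm / ref_depth_mm
    return pixel_area * scale * scale
```

With such a correction, an area threshold tuned at one distance remains usable across the 0.5 m to 2 m range tested above.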

Experiment 2:

This experiment shows that the hand direction can be detected with fast processing speed and high accuracy. This information can be applied in a sequential HGR process.


Fig- 4.6 Results of different hand directions.


Experiment 3:

The fingertip positions are shown in Fig- 4.7 (c), where the red and yellow points denote the palm center and the fingertips, respectively. This experiment shows that the fingertip positions can be extracted quickly and precisely.

Fig- 4.7 Results of finding fingertip positions.

4.3 Hand Gesture Recognition

In this section, the hand gesture recognition system is tested with different algorithms to examine its performance and reliability. This thesis adopts a discriminative classification model, a neural network, and a hidden Markov model to implement the HGR systems. This experiment tests six hand gestures to be recognized, including moving left, right, upward, downward, open, and click. The performance of each method is shown in Table 4.1; the training dataset contains 4483 samples.

This experiment shows that the hand feature extraction method can be used with different recognition methods and possesses good performance; the results are shown in Fig- 4.9. Furthermore, a hand gesture composed of a sequence of hand postures connected by continuous motions can be recognized using the extracted features.

Table 4.1 Recognition Results of Different Classifiers

Classifier            Correct rate (%)   Processing time (s)
Discriminative        85.6               0.09
NN-based approach     89.83              0.24
HMM-based approach    91.06              0.31


It is obvious that the accuracy rate of the HMM-based HGR system is higher than that of the other two HGR systems. However, the computational cost of the HMM-based approach is also higher than that of the other two, although the average execution time of Set-I is lower than 0.1 sec. This experiment shows that the proposed hand feature extraction method is useful and meaningful for the HGR system, no matter what the classifier is. Fig- 4.9 shows the decision boundary of the four moving directions using the NN-based algorithm.

Fig- 4.8 Results of HGR systems.


The next experiment tests a real HCI application, which uses the HGR system to control the direction and position of the mouse cursor; Fig- 4.10 shows this implementation. With more defined gestures, the mouse control could perform more complicated movements.

Fig- 4.10 Application of controlling mouse.

Fig- 4.9 Decision boundary of the four moving directions.


Chapter 5

Conclusions and Future Works

This thesis proposes a fast and robust hand feature extraction method based on the depth and RGB information generated by the Kinect that can be used in real-time HGR or hand tracking. The system is divided into three parts: ROI selection, feature extraction, and HGR. First, skin color detection and connected component labeling (CCL) are applied to select the potential ROIs. Through the distance transform, the system extracts feature points that can be used to find hand features, including the direction, fingertip positions, and palm center. Finally, these features are sent into several HGR systems. From the experimental results, some conclusions are listed below:

 The ROI selection can detect the hand region with an accuracy rate higher than 90% and an average execution time of about 0.2 sec/frame. Moreover, with the help of the depth image, the system can detect hands correctly even when the target hand overlaps other skin-colored objects.

 The feature points extracted by the proposed method can be used to find useful hand features for HGR. With the depth information, the forearm can be cropped, and these feature points yield the fingertip positions and the hand direction. These hand features are useful and meaningful for human-computer interaction (HCI).

 The extracted features are sent into the HGR system, which is implemented with different algorithms. The experimental results show that the proposed hand feature extraction method is useful and meaningful for the HGR system, works in real-time, and possesses a high recognition rate.
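The distance-transform step used above to locate the palm center can be sketched as a multi-source BFS over the binary hand mask; the city-block metric here is a simplifying assumption, not necessarily the transform used in the thesis.

```python
from collections import deque

def palm_center(mask):
    """Approximate palm center: the hand pixel with the largest
    city-block distance to the background (multi-source BFS)."""
    h, w = len(mask), len(mask[0])
    INF = h * w
    # distance 0 at every background pixel, unknown (INF) inside the hand
    dist = [[INF if mask[y][x] else 0 for x in range(w)] for y in range(h)]
    q = deque((y, x) for y in range(h) for x in range(w) if not mask[y][x])
    while q:
        y, x = q.popleft()
        for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] > dist[y][x] + 1:
                dist[ny][nx] = dist[y][x] + 1
                q.append((ny, nx))
    # the deepest interior pixel is the palm-center estimate
    return max(((y, x) for y in range(h) for x in range(w) if mask[y][x]),
               key=lambda p: dist[p[0]][p[1]])
```

The fingertips can then be sought among mask pixels that are far from this center but shallow in the distance map.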

The proposed feature extraction method has been demonstrated to be successful in the HGR system, which is an important function in the field of HCI. With this HGR system, HCI can be more accurate and natural. Two primary future works to investigate further are presented below:

 The ROI selection could be extended to detect not only outstretched hands but also hands with curled fingers. Furthermore, with the depth distribution of a CCL object, the 3-D direction could be estimated to build up a hand model.

 Two or more users could appear on the screen to allow more complicated hand gestures. Therefore, the ROI selection and the HGR system should have the ability to discriminate these hand gestures when facing occlusion problems. Furthermore, a more robust HGR system should be designed to handle hand gestures with ambiguous movement.

zN}. Assume z_n is an M-state discrete random variable, where n = 1, ..., N. The posterior probability of the latent variable can be decoupled as:

γ(z_n) = p(z_n | X) = α(z_n) β(z_n) / p(X)

where α(z_n) is called the forward message, accumulated from the first n observations, and β(z_n) is the backward message of all future data from time n+1 to N. The α(z_n) can be expressed in terms of α(z_{n-1}) as follows:

α(z_n) = p(x_n | z_n) Σ_{z_{n-1}} α(z_{n-1}) p(z_n | z_{n-1})

where pznzn-1and pxnznare known parameters. To start this recursion, the initial forward message is given by

α(z_1) = p(x_1, z_1) = p(z_1) p(x_1 | z_1)    (A.4)

Therefore, the overall cost of evaluating these quantities for n = 1, ..., N is O(NM²). Similarly, the backward message β(z_n) satisfies a recursion relation:

β(z_n) = Σ_{z_{n+1}} β(z_{n+1}) p(x_{n+1} | z_{n+1}) p(z_{n+1} | z_n)
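The α and β recursions above translate directly into code. The following sketch assumes a discrete-emission HMM whose parameters (pi, A, B) are supplied by the caller, and omits the rescaling needed for long sequences.

```python
def forward_backward(pi, A, B, obs):
    """Forward messages alpha(z_n) and backward messages beta(z_n)
    for a discrete HMM; total cost is O(N * M^2)."""
    M, N = len(pi), len(obs)
    # forward pass: alpha_n(j) = B[j][x_n] * sum_i alpha_{n-1}(i) * A[i][j]
    alpha = [[pi[j] * B[j][obs[0]] for j in range(M)]]
    for n in range(1, N):
        alpha.append([B[j][obs[n]] * sum(alpha[-1][i] * A[i][j] for i in range(M))
                      for j in range(M)])
    # backward pass: beta_n(i) = sum_j beta_{n+1}(j) * B[j][x_{n+1}] * A[i][j]
    beta = [[1.0] * M for _ in range(N)]
    for n in range(N - 2, -1, -1):
        beta[n] = [sum(beta[n + 1][j] * B[j][obs[n + 1]] * A[i][j] for j in range(M))
                   for i in range(M)]
    return alpha, beta
```

A useful sanity check is that Σ_j α_n(j) β_n(j) equals p(X) for every n, which follows from the decoupling of the posterior shown above.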

