
Chapter 3. Hand Gesture-Based HCI

3.4 Hand Gestures Recognition

Table 3.1 shows the hand gestures and actions considered in our hand gesture recognition system. When a hand is detected, it is recognized as the dominant hand and the system enters one-hand mode. There are three gestures in the one-hand mode: cursor moving, mouse clicking, and next/previous page. The Move gesture is recognized when the hand point moves within the Moving Region (see Appendix A). The Click gesture is an open-close-open sequence of the dominant hand. The Swipe gesture switches pages by swiping the hand. If another hand is detected, the system enters two-hand mode, which includes the Zooming and Scrolling gestures. Both two-hand actions are triggered by fisting the dominant hand. The Zooming gesture moves the two hands closer together or farther apart, and the Scrolling gesture moves the non-dominant hand up or down.

Table 3.1 Hand gestures used in the HCI operation

Chapter 4. System Implementation and Experimental Results

The system implementation and experimental results are described in this chapter. In Section 4.1, the system implementation and the function of each module are presented. Then, we test all of the selected hand gestures and compare the results with some similar methods in Section 4.2.

4.1 System Implementation

We implement our system with the Model-View-Controller (MVC) design pattern, as shown in Figure 4.1, with three parts to increase the flexibility of the hand gesture-based human-computer interaction system and to decrease the dependency between parts. The View part contains the QtMotorController, Windows GUI, and QtImgReader modules. QtMotorController is a UI for the user to adjust the Kinect angle. Windows GUI is the part of the Windows OS that provides an application programming interface (API) for system-level I/O control. For efficient operation, QtImgReader accesses the depth data directly, emits an update event to DataGenerator, and shows the depth data on the screen.

The Controller part consists of MotorController and GestureController. MotorController controls the Kinect motor and changes the horizontal angle of the Kinect from -37 to +37 degrees. GestureController analyzes the user's hand gestures and dispatches the corresponding events, such as mouse or keyboard events, to the computer.
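
As a rough illustration of such dispatching (our exact implementation is not listed here, and the use of SendInput below is an assumption of the example), a recognized Click gesture could be translated into a system-level mouse click through the Windows API as follows:

// Illustrative only: translate a recognized Click gesture into a left mouse click.
#include <windows.h>

void dispatchMouseClick(int screenX, int screenY) {
    // Move the system cursor to the mapped screen position.
    SetCursorPos(screenX, screenY);

    // Synthesize a left-button press followed by a release.
    INPUT input[2] = {};
    input[0].type = INPUT_MOUSE;
    input[0].mi.dwFlags = MOUSEEVENTF_LEFTDOWN;
    input[1].type = INPUT_MOUSE;
    input[1].mi.dwFlags = MOUSEEVENTF_LEFTUP;
    SendInput(2, input, sizeof(INPUT));
}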

The Model part consists of MotorData and DataGenerator. MotorData is the Kinect motor data stored in the Kinect hardware, which records the current Kinect angle. DataGenerator acquires the depth data and image data from the Kinect and stores them in an array map.
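
The internal data structures are not listed here; the following minimal sketch only illustrates the idea of a named frame store, and the container choice (a std::map of vectors) is an assumption made for this example.

// Illustrative Model-side storage of the latest Kinect frames.
#include <cstddef>
#include <map>
#include <string>
#include <vector>

class DataGenerator {
public:
    // Store a newly acquired frame (e.g. "depth" or "image" data) under a name.
    void update(const std::string& name, const std::vector<unsigned short>& frame) {
        frames_[name] = frame;
    }

    // Read access for the View (QtImgReader) and the Controller (GestureController).
    // Returns NULL if the named frame has not been produced yet.
    const std::vector<unsigned short>* frame(const std::string& name) const {
        std::map<std::string, std::vector<unsigned short> >::const_iterator it = frames_.find(name);
        return (it == frames_.end()) ? NULL : &it->second;
    }

private:
    std::map<std::string, std::vector<unsigned short> > frames_;
};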

Figure 4.1 The Model-View-Controller structure of our system.

4.2 Experimental Results

The hand gesture-based human-computer interaction system is implemented in Microsoft Visual C++ 2010 and uses the Kinect as the input sensor. We test our system on both a PC and a notebook to show that it can run on a platform with fewer system resources than a PC. The detailed specifications are shown in Table 4.1.

Table 4.1 Hardware and software specifications

Item      PC                              Notebook (ThinkPad X201i)
CPU       Intel Core i5-2500K @ 3.9 GHz   Intel Core i3 M330 @ 2.13 GHz
RAM       16 GB DDR3-1333                 4 GB DDR3-1066
OS        Windows 7 Enterprise 64-bit
Library

All the experiments discussed in this section are conducted indoors. The user stands in front of the Kinect sensor at a distance of about 1.8 meters. We test all of the hand gestures defined above and calculate the average processing time during the operations. In our experiments, we simulate a sequence of operations for browsing the web and analyze the performance of the recognition algorithm. Then we compare the results with some similar methods.

The system initialization requires the user to wave a hand. Figure 4.2 shows the hand detection process. Figures 4.2 (a) and (b) show two images of a waving hand. After the waving, the hand point is detected by NITE (see the blue dot in Figure 4.2 (c)). To define the Moving Region, the dominant hand (the first detected hand) needs to stay in the air, as shown in Figure 4.3 (a), for about one second. Figure 4.3 (b) shows that the Moving Region is defined and the cursor (the yellow circle) is shown in the browser.
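
The "stay in the air for about one second" condition can be checked in several ways. The sketch below is one illustrative possibility; the 40 mm tolerance and the frame-based timing are assumed values, not parameters of our system.

// Illustrative steady-hand detector: the Moving Region is defined once the
// hand point has stayed within a small tolerance for roughly one second.
#include <cmath>

class SteadyHandDetector {
public:
    SteadyHandDetector(float toleranceMm, int framesRequired)
        : toleranceMm_(toleranceMm), framesRequired_(framesRequired),
          steadyFrames_(0), refX_(0), refY_(0), refZ_(0), hasRef_(false) {}

    // Feed one hand point per frame; returns true once the hand has been steady long enough.
    bool update(float x, float y, float z) {
        if (!hasRef_ || distance(x, y, z) > toleranceMm_) {
            refX_ = x; refY_ = y; refZ_ = z;   // restart from the new position
            hasRef_ = true;
            steadyFrames_ = 0;
            return false;
        }
        return ++steadyFrames_ >= framesRequired_;
    }

private:
    float distance(float x, float y, float z) const {
        return std::sqrt((x - refX_) * (x - refX_) +
                         (y - refY_) * (y - refY_) +
                         (z - refZ_) * (z - refZ_));
    }
    float toleranceMm_;
    int framesRequired_;
    int steadyFrames_;
    float refX_, refY_, refZ_;
    bool hasRef_;
};

// Example: at about 49 fps, one second corresponds to roughly 49 frames.
// SteadyHandDetector detector(40.0f, 49);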

The one-hand gestures include the Click, Move, and Swipe gestures. Figure 4.4 shows the clicking test. Figures 4.4 (a)-(c) show the open-fist-open sequence, and the red circle in Figure 4.4 (c) indicates the position of the clicking action. Figure 4.5 shows the moving test, where the cursor moves from right to left as the hand moves from right (see Figure 4.5 (a)) to left (see Figure 4.5 (b)). The previous-page and next-page commands correspond to swiping the hand left (see Figures 4.6 (a)-(b)) and right (see Figures 4.6 (b)-(c)), respectively.

The two-hand gestures include the Zooming and Scrolling gestures. Figures 4.7 and 4.8 show the zooming test. Fisting the dominant hand indicates the start of a two-hand action; the two hands then move apart (see Figure 4.7) or closer together (see Figure 4.8) to zoom in or zoom out the page, respectively. The Scrolling gesture also requires the user to fist the dominant hand (see Figures 4.9 (a)-(b)) before scrolling pages. Scrolling up and down is performed by moving the non-dominant hand up (see Figures 4.10 (a)-(b)) and down (see Figures 4.9 (b)-(c)), respectively.
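
As an illustration, the amounts of zooming and scrolling can be derived from the tracked hand points as in the following sketch; the coordinate conventions and the use of raw millimetre differences are assumptions of the example, not values from our implementation.

// Illustrative two-hand analysis: zoom from the change of inter-hand distance,
// scroll from the vertical motion of the non-dominant hand.
#include <cmath>

struct HandPoint { float x, y, z; };   // millimetres, from the hand tracker

float handDistance(const HandPoint& a, const HandPoint& b) {
    return std::sqrt((a.x - b.x) * (a.x - b.x) +
                     (a.y - b.y) * (a.y - b.y) +
                     (a.z - b.z) * (a.z - b.z));
}

// Positive return value -> zoom in (hands moving apart), negative -> zoom out.
float zoomAmount(float previousDistance, float currentDistance) {
    return currentDistance - previousDistance;
}

// Positive return value -> scroll up, negative -> scroll down
// (assuming the tracker's y axis grows upward).
float scrollAmount(const HandPoint& previousNonDominant, const HandPoint& currentNonDominant) {
    return currentNonDominant.y - previousNonDominant.y;
}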

Figure 4.2 (a) and (b) Waving the hand. (c) The hand being detected (the blue dot).

Figure 4.3 (a) The hand staying in the air. (b) The defined Moving Region (right) and the cursor position (left).

Figure 4.4 The clicking gesture.

Figure 4.5 The moving test.

Figure 4.6 The Swipe gesture for the previous page and next page.

Figure 4.7 The zoom-in test (moving the two hands apart).

Figure 4.8 The zoom-out test (moving the two hands closer).

Figure 4.9 Scrolling down the page.

Figure 4.10 Scrolling up the page.

Table 4.2 shows the performance of the HCI on the PC and notebook platforms. The processing time of each frame is the average over the tested HCI operations on each platform, which corresponds to a frame rate higher than 45 frames per second even on the slower platform.

Table 4.2 Comparison of frame rate and processing time

Item                    PC        Notebook (X201i)
Frame rate (fps)        49.0103   45.0015
Processing time (sec)   0.02      0.022
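
The two rows of Table 4.2 are consistent with each other, since the frame rate is the reciprocal of the average processing time (1/49.01 ≈ 0.0204 s and 1/45.00 ≈ 0.0222 s, matching the rounded values above). The sketch below only illustrates how such per-frame timing can be measured on Windows; it is not our actual measurement code, and the 20 ms sleep merely emulates the per-frame work.

// Illustrative per-frame timing; the real system measures its own processing loop.
#include <windows.h>
#include <stdio.h>

// Placeholder for depth acquisition plus gesture recognition,
// emulated here by sleeping for roughly 20 ms.
void processFrame() { Sleep(20); }

int main() {
    LARGE_INTEGER freq, start, stop;
    QueryPerformanceFrequency(&freq);

    double totalSeconds = 0.0;
    const int frames = 200;
    for (int i = 0; i < frames; ++i) {
        QueryPerformanceCounter(&start);
        processFrame();
        QueryPerformanceCounter(&stop);
        totalSeconds += double(stop.QuadPart - start.QuadPart) / double(freq.QuadPart);
    }

    const double average = totalSeconds / frames;
    printf("average processing time: %.4f s (%.1f fps)\n", average, 1.0 / average);
    return 0;
}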

For operating a web browser in the ICU, we compare our system with the methods presented in [23]-[25], which are all designed for manipulating medical data. Table 4.3 shows the comparison of the different methods. In the ICU, paramedics prefer a system that can be ready quickly for immediate use in browsing a patient's health state and health data right at the hospital bed. Our hand gesture-based human-computer interaction system requires only about 2 seconds to initialize and does not require a special field of view (see Figure 4.11 for Kipshagen's method), which makes browsing a patient's data more convenient. Therefore, our method is more suitable for the ICU than the other methods.

Table 4.3 Comparison of different methods

                  Our method               Gallo [23]             Kipshagen [24]   Bellmore [25]
Main Technology   Hand point from Kinect   Skeleton from Kinect   Stereo cameras   Skeleton from Kinect

Figure 4.11 Schematic drawing of contact-free software control application [25].

Chapter 5. Conclusions

In this study, we proposed a real-time hand gesture-based HCI using the Kinect and developed some basic hand gestures that users can use to browse web pages, photos, and other information. To perform hand gesture recognition, we extract four features from the depth data, namely the hand position, hand state, hand moving direction, and hand moving distance, which are effective in describing the different hand gestures. The hand gesture recognition approach uses two modes according to the number of hands used. The one-hand mode contains three kinds of gestures: Move, Click, and Swipe. The two-hand mode contains two kinds of gestures: Zooming and Scrolling. Besides, a Moving Region is automatically determined, which allows the hand to move within a small range to control the cursor on the monitor screen more easily and reduces the computation of fist detection.

We implemented the hand gesture recognition system based on the MVC model to increase its flexibility and simulated the browsing actions for ordinary web pages. Our system compares favorably with similar systems in terms of average processing time.

Appendix A The Moving Region

A gesture-based user interface often requires the user to move his or her hands to operate the system, which can easily tire the user. To resolve this problem, the system determines the Moving Region (shown in Figure A.1 (a)), a small region that is mapped to the whole screen (shown in Figure A.1 (b)). This region also reduces the computation time of fist detection.
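
The mapping from the Moving Region to the monitor screen can be illustrated by the following sketch; a simple linear mapping with clamping is assumed here for the example.

// Illustrative linear mapping from the Moving Region to the monitor screen.
struct Region { int left, top, width, height; };

// Returns the cursor x (or y) coordinate on screen for a hand coordinate inside the Moving Region.
int mapToScreen(int hand, int regionOrigin, int regionSize, int screenSize) {
    if (regionSize <= 0) return 0;                       // degenerate region
    float t = float(hand - regionOrigin) / float(regionSize);
    if (t < 0.0f) t = 0.0f;                              // clamp to the region
    if (t > 1.0f) t = 1.0f;
    return int(t * (screenSize - 1) + 0.5f);
}

// Example: cursorX = mapToScreen(handX, region.left, region.width, 1920);
//          cursorY = mapToScreen(handY, region.top,  region.height, 1080);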

Figure A.1 (a) The Moving Region. (b) The corresponding monitor screen.

The Moving Region detection operation includes two steps: scene analysis and Moving Region adjustment. In scene analysis, the depth image is segmented into three layers according to the depth value of the hand point: the foreground layer, the hand layer, and the background layer, using two thresholds T_low = depth value of the hand point - 190 mm and T_high = depth value of the hand point + 190 mm. Figure A.2 shows an example of this segmentation: Figure A.2 (a) shows the depth image, and Figures A.2 (b)-(d) show the three segmented layers. To define a Moving Region that is not occluded by other objects, the union of Figures A.2 (b) and (c) is used.
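
The following C++ sketch illustrates this segmentation on a raw depth map given the depth of the hand point; the data layout (a plain array of millimetre values) is an assumption of the example, not a description of the OpenNI/NITE interface.

// Illustrative three-layer segmentation around the hand point's depth.
#include <cstddef>
#include <vector>

enum Layer { LAYER_FOREGROUND = 0, LAYER_HAND = 1, LAYER_BACKGROUND = 2 };

// depthMm: raw depth map in millimetres (0 = no measurement);
// handDepthMm: depth value of the tracked hand point.
std::vector<unsigned char> segmentLayers(const std::vector<unsigned short>& depthMm,
                                         unsigned short handDepthMm) {
    const int band = 190;                       // +/- 190 mm, as defined above
    const int tLow = int(handDepthMm) - band;   // T_low
    const int tHigh = int(handDepthMm) + band;  // T_high

    std::vector<unsigned char> labels(depthMm.size(), LAYER_BACKGROUND);
    for (std::size_t i = 0; i < depthMm.size(); ++i) {
        const int d = depthMm[i];
        if (d == 0) continue;                   // no measurement: leave as background
        if (d < tLow)        labels[i] = LAYER_FOREGROUND;  // closer than the hand band
        else if (d <= tHigh) labels[i] = LAYER_HAND;        // within the hand band
        // otherwise the pixel stays in the background layer
    }
    return labels;
}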

Figure A.2 An example of the layer segmentation. (a) the depth image, (b) the foreground layer, (c) the hand layer, (d) the background layer.

The Moving Region is determined by growing an initial window from the hand point until the window would contain an additional non-hand object. This process ensures that the hand is clearly visible in the Moving Region. Figure A.1 (a) shows the Moving Region represented by a yellow rectangle, while Figure A.1 (b) shows the corresponding region of the monitor screen (which is the full screen in this case).
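
A sketch of this window-growing step is given below; the growth step of 4 pixels and the exact stopping test are assumptions made for the example, since only the stopping condition (a non-hand object entering the window) is described above.

// Illustrative Moving Region growing: enlarge a square window around the hand
// point until a foreground (non-hand) object would fall inside it.
#include <cstddef>
#include <vector>

struct Window { int left, top, right, bottom; };

// labels: per-pixel layer labels as in the segmentation sketch above (0 = foreground).
Window growMovingRegion(const std::vector<unsigned char>& labels,
                        int width, int height, int handX, int handY) {
    const unsigned char FOREGROUND = 0;
    const int step = 4;                       // pixels grown per iteration (assumed value)
    Window w = { handX, handY, handX, handY };

    for (;;) {
        Window next = { w.left - step, w.top - step, w.right + step, w.bottom + step };
        if (next.left < 0 || next.top < 0 || next.right >= width || next.bottom >= height)
            break;                            // reached the image border

        bool occluded = false;                // would the enlarged window contain a non-hand object?
        for (int y = next.top; y <= next.bottom && !occluded; ++y)
            for (int x = next.left; x <= next.right && !occluded; ++x)
                if (labels[std::size_t(y) * width + x] == FOREGROUND)
                    occluded = true;

        if (occluded) break;                  // stop before the object enters the window
        w = next;
    }
    return w;
}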


References

[1] S. Mitra, “Gesture recognition: a survey,” IEEE Transactions on Systems, Man, and Cybernetics (SMC) – Part C, vol. 37, no. 3, pp. 311-324, 2007.

[2] R. Dongwan and P. Junseok, “Design of an armband type contact-free space input device for human-machine interface,” IEEE International Conference on Communications, pp. 841-842.

[3] P. Kumar, S. S. Rautaray, and A. Agrawal, “Hand data glove: a new generation real-time mouse for human-computer interaction,” IEEE International Conference on Recent Advances in Information Technology, pp. 750-755, 2012.

[4] A. Ibarguren, I. Maurtua, and B. Sierra, “Layered architecture for real time sign recognition: hand gesture and movement,” Engineering Applications of Artificial Intelligence, vol. 23, no. 7, pp. 1216-1228, 2010.

[5] Fifth Dimension Technologies, 2012. (http://www.5dt.com/products/pdataglove5u.html)

[6] Y. Fang, K. Wang, J. Cheng, and H. Lu, “A real-time hand gesture recognition method,” IEEE International Conference on Multimedia and Expo, pp. 995-998, 2007.

[7] R. Khan, A. Hanbury, J. Stöttinger, and A. Bais, “Color based skin classification,” Pattern Recognition Letters, vol. 33, no. 2, pp. 157-163, 2012.

[8] R. Y. Wang and J. Popović, “Real-time hand-tracking with a color glove,” ACM Transactions on Graphics, vol. 28, no. 3, pp. 63:1-63:8, 2009.

[9] J. Nagi, F. Ducatelle, G. A. Di Caro, D. Ciresan, U. Meier, A. Giusti, F. Nagi, J. Schmidhuber, and L. M. Gambardella, “Max-pooling convolutional neural networks for vision-based hand gesture recognition,” IEEE International Conference on Signal and Image Processing Applications, pp. 342-347, 2011.

[10] Y. Y. Pang, N. A. Ismail, and P. L. S. Gilbert, “A real time vision-based hand gesture interaction,” Fourth Asia International Conference on Analytical Modelling and Computer Simulation, pp. 237-242, 2011.

[11] S. Kulkarni, H. Manoj, S. David, V. Madumbu, and Y. S. Kumar, “Robust hand gesture recognition system using motion templates,” International Conference on ITS Telecommunications, pp. 431-435, 2011.

[12] S. I. Kang, A. Roh, and H. Hong, “Using depth and skin color for hand gesture classification,” IEEE International Conference on Consumer Electronics, pp. 155-156, 2011.

[13] X. Li, J. H. An, J. H. Min, and K. S. Hong, “Hand gesture recognition by stereo camera using the thinning method,” International Conference on Multimedia Technology, pp. 3077-3080, 2011.

[14] J. Appenrodt, S. Handrich, A. Al-Hamadi, and B. Michaelis, “Multi stereo camera data fusion for fingertip detection in gesture recognition systems,” International Conference of Soft Computing and Pattern Recognition, pp. 35-40, 2010.

[15] G. Hu and Q. Gao, “Gesture analysis using 3D camera, shape features and particle filters,” Canadian Conference on Computer and Robot Vision, pp. 204-211, 2011.

[16] G. F. He, S. K. Kang, W. C. Song, and S. T. Jung, “Real-time gesture recognition using 3D depth camera,” IEEE 2nd International Conference on Software Engineering and Service Science, pp. 187-190, 2011.

[17] Z. Zhang, “Iterative point matching for registration of free-form curves and surfaces,” International Journal of Computer Vision, vol. 13, no. 2, pp. 119-152, 1994.

[18] SwissRanger SR4000 Overview. (http://www.mesa-imaging.ch/prodview4k.php?cat=3D%20Camera)

[19] Xtion PRO LIVE. (http://tw.asus.com/Multimedia/Motion_Sensor/Xtion_PRO_LIVE/)

[20] OpenNI User Guide. (http://www.openni.org/images/stories/pdf/openni_userguide_v4.pdf)

[21] K. Khoshelham, “Accuracy analysis of Kinect depth data,” International Society for Photogrammetry and Remote Sensing Workshop Laser Scanning, 2011.

[22] G. E. Krasner and S. T. Pope, “A cookbook for using the Model-View-Controller user interface paradigm in Smalltalk-80,” Journal of Object Oriented Programming, vol. 1, no. 3, pp. 26-49, 1988.

[23] L. Gallo, A. P. Placitelli, and M. Ciampi, “Controller-free exploration of medical image data: experiencing the Kinect,” Computer-Based Medical Systems, pp. 1-6, 2011.

[24] T. Kipshagen, M. Graw, V. Tronnier, M. Bonsanto, and U. Hofmann, “Touch- and marker-free interaction with medical software,” Medical Physics and Biomedical Engineering, pp. 75-78, 2009.

[25] C. Bellmore, R. Ptucha, and A. Savakis, “Interactive display using depth and RGB sensors for face and gesture control,” Western New York Image Processing Workshop, pp. 1-4, 2011.
