Continuous Gesture Recognition Based on PairNet

The loss function is cross entropy, the batch size is 64, and the number of epochs is 100. The output layer uses SoftMax.

For PairNet and ConvNet […] 14 […], layers 1~3 use 128 and layers 4~5 use 256, with the Rectified Linear Unit (ReLU) and a batch normalization layer. The optimizer is AdaDelta [43]. […] PairNet […] ConvNet […] 2 × 1.
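The combination of a SoftMax output and a cross-entropy loss named above can be illustrated with a minimal pure-Python sketch. The function names and the toy scores are hypothetical and are not taken from the thesis; the sketch only shows how the loss of one mini-batch would be computed.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_index):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[target_index])

def batch_loss(batch_logits, batch_targets):
    """Mean cross-entropy over one mini-batch of (scores, label) pairs."""
    losses = [cross_entropy(softmax(scores), target)
              for scores, target in zip(batch_logits, batch_targets)]
    return sum(losses) / len(losses)

# Hypothetical scores for a toy 3-class problem; class 0 is the true label.
probs = softmax([2.0, 1.0, 0.1])
loss = cross_entropy(probs, 0)
```

The loss shrinks toward zero as the probability assigned to the true gesture class approaches 1, which is what the optimizer tries to achieve over the training epochs.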

[…] the optimizer is Adam [44]. Bi-LSTM and Bi-GRU […] 256 […] Bi-LSTM and Bi-GRU ( […] 13) 256 […] 256 […] with a DropOut [38] of 0.25.
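As an illustration of the 0.25 DropOut setting, the following is a minimal sketch of inverted dropout in pure Python. It assumes that 0.25 is the probability of dropping a unit (the surviving text does not say whether it is the drop or the keep probability), and the function name and parameters are hypothetical.

```python
import random

def dropout(values, rate=0.25, training=True, rng=None):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return list(values)  # dropout is disabled at inference time
    rng = rng or random.Random()
    keep = 1.0 - rate
    # Dividing kept values by `keep` preserves the expected activation.
    return [v / keep if rng.random() < keep else 0.0 for v in values]

# Assumed example: eight unit activations, fixed seed for repeatability.
activations = [1.0] * 8
dropped = dropout(activations, rate=0.25, rng=random.Random(0))
```

During evaluation the function is called with `training=False`, so the activations pass through unchanged.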

3-4

4-1 10 1 6 PairNet

( n 19 6

( 638 ) PairNet

97.81% ( 1000 )

PairNet 93.1% h 90% Bi-LSTM

PairNet ) 5.3% ) ( 588 )

PairNet Bi-GRU ) 12.82%

Bi-GRU 25.03% PairNet )12.78%

19. PairNet

)12.78% CNN 78.9% n

PairNet PairNet

72.21% PairNet 85.03% )12.82%

2 PairNet CNN ) 58

14 PairNet 108 ( Bi-LSTM 112

PairNet )12.82%

20. 6 PairNet

CNN ) 58 CNN 65% PairNet 106 Bi-LSTM

with Att. 112

•

21. 14

Google Daydream Controller

20ms ( 50 ) 20 0 14

B z n

7910 2~5

294

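0

The loss function is cross entropy, the batch size is 128, and the number of epochs is 100. The output layer uses SoftMax.

For PairNet and ConvNet […] 13 […], layers 1~3 use 128 and layers 4~5 use 256, with ReLU and a batch normalization layer. The optimizer is AdaDelta [43]. […] PairNet […] ConvNet […] 2 × 1.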

[…] the optimizer is Adam. Bi-LSTM and Bi-GRU […] 128 […] Bi-LSTM and Bi-GRU ( […] 12) 128 […] 128 […] with a DropOut of 0.25.

3 6 PairNet 99.38%

Bi-GRU 1.22%

22. PairNet 100% 99.4%

4 ) 6 PairNet

) 63% 28 PairNet (

Bi-LSTM 30

• PairNet

14 PairNet ConvNet

1 2 PairNet

) ( 2 1)

(Overlapping) (Non-overlapping) (Stride) z 1 2
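The difference between overlapping and non-overlapping windows at strides 1 and 2 can be sketched as follows. This is a minimal illustration that assumes a kernel spanning 2 time steps with no padding; the helper names and the 8-step input are hypothetical.

```python
def conv1d_output_length(n_steps, kernel=2, stride=1):
    """Number of outputs of an unpadded 1-D convolution along the time axis."""
    if n_steps < kernel:
        return 0
    return (n_steps - kernel) // stride + 1

def window_starts(n_steps, kernel=2, stride=1):
    """Start index of every window the kernel visits."""
    return list(range(0, n_steps - kernel + 1, stride))

# Stride 1: neighbouring windows share one time step (overlapping).
overlapping = window_starts(8, kernel=2, stride=1)
# Stride 2: every time step belongs to exactly one window (non-overlapping).
non_overlapping = window_starts(8, kernel=2, stride=2)
```

With a kernel of 2 and a stride of 2, each layer roughly halves the sequence length, whereas a stride of 1 shortens it by only one step per layer.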

5 6 PairNet

(Max-pooling) c (Global Average Pooling) 6 6 PairNet
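To make the two pooling operations named here concrete, the sketch below applies max-pooling and global average pooling to a toy feature map in pure Python. The function names, the window size, and the sample values are assumptions for illustration only.

```python
def max_pool_1d(sequence, size=2, stride=2):
    """Keep the largest value in each window along the time axis."""
    return [max(sequence[i:i + size])
            for i in range(0, len(sequence) - size + 1, stride)]

def global_average_pool(feature_maps):
    """Collapse each channel (a list of time steps) to its mean value."""
    return [sum(channel) / len(channel) for channel in feature_maps]

# Hypothetical feature map: 2 channels, 4 time steps each.
features = [[1.0, 3.0, 2.0, 4.0], [0.0, 0.0, 2.0, 2.0]]
pooled = [max_pool_1d(channel) for channel in features]  # [[3.0, 4.0], [0.0, 2.0]]
gap = global_average_pool(features)                      # [2.5, 1.0]
```

Max-pooling keeps a shortened sequence per channel, while global average pooling reduces every channel to a single number regardless of the input length.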

5. 6 c

5 6 6 PairNet

c 1 3 ConvNet

-n 6 )

(Three-axis Accelerometer) (Three-axis Gyroscope) B

a n PairNet n

6 h - PairNet

h 12.82% 6 (

60~65% PairNet

h (Generalization)

PairNet 6 ResNet

(Shortcut) [41] h 6

PairNet a

n (

[1] Cabral, Marcio C., Carlos H. Morimoto, and Marcelo K. Zuffo. "On the usability of gesture interfaces in virtual reality environments." Proceedings of the 2005 Latin American conference on Human-computer interaction. ACM, 2005.

[2] Vogler, Christian, and Dimitris Metaxas. "Parallel hidden Markov models for American Sign Language recognition." Computer Vision, 1999. The Proceedings of the Seventh IEEE International Conference on. Vol. 1. IEEE, 1999.

[3] Chai, Xiujuan, et al. "Sign language recognition and translation with kinect." IEEE Conf. on AFGR. 2013.

[4] Sun, Chao, et al. "Discriminative exemplar coding for sign language recognition with Kinect." IEEE Transactions on Cybernetics 43.5 (2013): 1418-1428.

[5] Zhang, Xu, et al. "A framework for hand gesture recognition based on accelerometer and EMG sensors." IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans 41.6 (2011): 1064-1076.

[6] Starner, Thad, and Alex Pentland. "Real-time American Sign Language recognition from video using hidden Markov models." Motion-Based Recognition. Springer, Dordrecht, 1997. 227-243.

[7] Xu, Deyou. "A neural network approach for hand gesture recognition in virtual reality driving training system of SPG." Pattern Recognition, 2006. ICPR 2006. 18th International Conference on. Vol. 3. IEEE, 2006.

[8] Murthy, G. R. S., and R. S. Jadon. "A review of vision based hand gestures recognition." International Journal of Information Technology and Knowledge Management 2.2 (2009): 405-410.

[9] Mitra, Sushmita, and Tinku Acharya. "Gesture recognition: A survey." IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 37.3 (2007): 311-324.

[10] Stotts, David, Jason McC Smith, and Karl Gyllstrom. "Facespace: endo-and exo-spatial hypermedia in the transparent video facetop." Proceedings of the fifteenth ACM conference on Hypertext and hypermedia. ACM, 2004.

[11] Rautaray, Siddharth S., and Anupam Agrawal. "Vision based hand gesture recognition for human computer interaction: a survey." Artificial Intelligence Review 43.1 (2015): 1-54.

[12] Sharma, Rajeev, et al. "Speech/gesture interface to a visual computing environment for molecular biologists." Pattern Recognition, 1996., Proceedings of the 13th International

[13] O'Hagan, R. G., Alexander Zelinsky, and Sebastien Rougeaux. "Visual gesture interfaces for virtual environments." Interacting with Computers 14.3 (2002): 231-250.

[14] Molchanov, Pavlo, et al. "Hand gesture recognition with 3D convolutional neural networks." Proceedings of the IEEE conference on computer vision and pattern recognition workshops. 2015.

[15] Ohn-Bar, Eshed, and Mohan Manubhai Trivedi. "Hand gesture recognition in real time for automotive interfaces: A multimodal vision-based approach and evaluations." IEEE transactions on intelligent transportation systems 15.6 (2014): 2368-2377.

[16] Gupta, Hari Prabhat, et al. "A continuous hand gestures recognition technique for human-machine interaction using accelerometer and gyroscope sensors." IEEE Sensors Journal 16.16 (2016): 6425-6432.

[17] Elmezain, Mahmoud, et al. "A hidden Markov model-based continuous gesture recognition system for hand motion trajectory." Pattern Recognition, 2008. ICPR 2008. 19th International Conference on. IEEE, 2008.

[18] Potter, Leigh Ellen, Jake Araullo, and Lewis Carter. "The leap motion controller: a view on sign language." Proceedings of the 25th Australian computer-human interaction conference: augmentation, application, innovation, collaboration. ACM, 2013.

[19] Luzhnica, Granit, et al. "A sliding window approach to natural hand gesture recognition using a custom data glove." 3D User Interfaces (3DUI), 2016 IEEE Symposium on. IEEE, 2016.

[20] Marin, Giulio, Fabio Dominio, and Pietro Zanuttigh. "Hand gesture recognition with leap motion and kinect devices." Image Processing (ICIP), 2014 IEEE International Conference on. IEEE, 2014.

[21] Lee, Hyeon-Kyu, and Jin-Hyung Kim. "An HMM-based threshold model approach for gesture recognition." IEEE Transactions on pattern analysis and machine intelligence 21.10 (1999): 961-973.

[22] Laurel, Brenda, and S. Joy Mountford. The art of human-computer interface design. Addison-Wesley Longman Publishing Co., Inc., 1990.

[23] Huang, Deng-Yuan, Wu-Chih Hu, and Sung-Hsiang Chang. "Vision-based hand gesture recognition using PCA+ Gabor filters and SVM." Intelligent Information Hiding and Multimedia Signal Processing, 2009. IIH-MSP'09. Fifth International Conference on. IEEE, 2009.

[24] Hong, Pengyu, Matthew Turk, and Thomas S. Huang. "Gesture modeling and recognition using finite state machines." Automatic Face and Gesture Recognition, 2000. Proceedings. Fourth IEEE International Conference on. IEEE, 2000.

[25] Lefebvre, Grégoire, et al. "BLSTM-RNN based 3D gesture classification." International Conference on Artificial Neural Networks. Springer, Berlin, Heidelberg, 2013.

[26] Shin, Sungho, and Wonyong Sung. "Dynamic hand gesture recognition for wearable devices with low complexity recurrent neural networks." Circuits and Systems (ISCAS), 2016 IEEE International Symposium on. IEEE, 2016.

[27] Ordóñez, Francisco Javier, and Daniel Roggen. "Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition." Sensors 16.1 (2016): 115.

[28] Bengio, Yoshua, Patrice Simard, and Paolo Frasconi. "Learning long-term dependencies with gradient descent is difficult." IEEE transactions on neural networks 5.2 (1994): 157-166.

[29] Pascanu, Razvan, Tomas Mikolov, and Yoshua Bengio. "On the difficulty of training recurrent neural networks." International Conference on Machine Learning. 2013.

[30] Hochreiter, Sepp. "The vanishing gradient problem during learning recurrent neural nets and problem solutions." International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 6.02 (1998): 107-116.

[31] Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural computation 9.8 (1997): 1735-1780.

[32] Gers, Felix A., Jürgen Schmidhuber, and Fred Cummins. "Learning to forget: Continual prediction with LSTM." 1999.

[33] Graves, Alex. "Generating sequences with recurrent neural networks." arXiv preprint arXiv:1308.0850 (2013).

[34] Gers, Felix A., Nicol N. Schraudolph, and Jürgen Schmidhuber. "Learning precise timing with LSTM recurrent networks." Journal of machine learning research 3.Aug (2002): 115-143.

[35] Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. "Neural machine translation by jointly learning to align and translate." arXiv preprint arXiv:1409.0473 (2014).

[36] Chung, Junyoung, et al. "Empirical evaluation of gated recurrent neural networks on sequence modeling." arXiv preprint arXiv:1412.3555 (2014).

[37] LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.

[38] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.

[39] Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale

[40] Szegedy, Christian, et al. "Going deeper with convolutions." CVPR, 2015.

[41] He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.

[42] Xu, Kelvin, et al. "Show, attend and tell: Neural image caption generation with visual attention." International Conference on Machine Learning. 2015.

[43] Zeiler, Matthew D. "ADADELTA: an adaptive learning rate method." arXiv preprint arXiv:1212.5701 (2012).

[44] Kingma, Diederik P., and Jimmy Ba. "Adam: A method for stochastic optimization." arXiv preprint arXiv:1412.6980 (2014).
