Experiment Results - Intelligent Head Attitude Estimation Based on Geometric Facial Features

The previous chapter introduced the three main steps of the proposed head attitude estimation system. This chapter presents the experimental results of each step. The proposed algorithm was implemented and evaluated in MATLAB R2011b.

4.1 Facial Features Detection

Two kinds of facial features are detected in this thesis: the eyes and the mouth. A set of experimental results is used to show the effectiveness and efficiency of the proposed system. A webcam is used to capture the test images. The previous chapter described how the eyes and the mouth are detected; here the method is tested on 300 images of different people, each of size 640 × 480. Table 4.1 lists the detection accuracy for the eyes and the mouth. The results show a high accuracy rate for both features. Figure 4.1.1 shows examples of successful facial feature detection for different people.

Facial feature detection      Accuracy rate
Left eye correct              96.66% (290/300)
Right eye correct             95.66% (287/300)
Mouth correct                 91.6% (275/300)
All three features correct    91.6% (275/300)

Table 4.1 Accuracy rate of the detection of the three facial features.
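The percentages in Table 4.1 follow directly from the counts in parentheses. The short sketch below recomputes them; the function name and dictionary keys are illustrative and do not come from the thesis. The recomputed values round to 96.67%, 95.67%, and 91.67%, so the table appears to truncate rather than round.

```python
def accuracy_percent(correct: int, total: int) -> float:
    """Return the detection accuracy as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total

# Counts taken from Table 4.1 (300 test images).
TOTAL_IMAGES = 300
correct_counts = {
    "left eye": 290,
    "right eye": 287,
    "mouth": 275,
    "all three features": 275,
}

rates = {name: accuracy_percent(n, TOTAL_IMAGES) for name, n in correct_counts.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.2f}%")
```

The rate for all three features equals the mouth rate because both rows report the same count of 275 correct images.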

Figure 4.1.1 Facial feature detection results for different people.

4.2 Geometric Facial Features

Statistics on the geometric facial features β and γ must be collected for this thesis. Because the true rotation of a real human head is not known, a stereo facial model was built, as shown in Figure 4.2.1. The model is fitted with a protractor and an indicator so that the head attitude can be set and read precisely when collecting data for the head attitude estimation system. Seven points are marked on the model, as shown in Figure 4.2.2. Red, green, and blue labels represent the right eye, the left eye, the mouth, and four points on the face mask.

Figure 4.2.1 Appearance of the turntable.

Figure 4.2.2 The seven points labeled in red, green, and blue.

After labeling, the red, green, and blue points are detected automatically in two steps, as shown in Figure 4.2.3. First, a gray-level image is computed for each of the three colors. Second, each gray-level image is converted to a binary image. The red, green, and blue labels can then be located automatically.

Figure 4.2.3 Detection of the red, green, and blue labels.

Labels in Figure 4.2.2: P1, P2, P3, P5, P7, left eye, right eye, mouth.

Stages in Figure 4.2.3: input image; gray R, gray G, gray B; binary R, binary G, binary B; output image.
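The two-step color detection (gray level, then binary) can be sketched as follows. This is a minimal illustration, not the thesis implementation: the thesis does not state how the gray level of each color is computed, so the channel-dominance formula, the threshold of 100, and the function names are all assumptions, and the test image is synthetic.

```python
import numpy as np

def color_gray_levels(rgb):
    """Step 1 (assumed formula): one gray-level map per color, measuring
    how far each channel exceeds the mean of the other two channels."""
    img = rgb.astype(np.float64)
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    gray_r = np.clip(r - (g + b) / 2.0, 0.0, 255.0)
    gray_g = np.clip(g - (r + b) / 2.0, 0.0, 255.0)
    gray_b = np.clip(b - (r + g) / 2.0, 0.0, 255.0)
    return gray_r, gray_g, gray_b

def binarize(gray, threshold=100.0):
    """Step 2: binary mask of pixels above an assumed threshold."""
    return gray > threshold

def centroid(mask):
    """Centre (row, column) of the detected pixels, or None if there are none."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None
    return float(rows.mean()), float(cols.mean())

# Synthetic 60 x 80 test image: gray background with one patch per color.
image = np.full((60, 80, 3), 120, dtype=np.uint8)
image[10:14, 20:24] = (255, 0, 0)   # red label
image[10:14, 50:54] = (0, 255, 0)   # green label
image[40:44, 35:39] = (0, 0, 255)   # blue label

centers = {
    name: centroid(binarize(gray))
    for name, gray in zip(("red", "green", "blue"), color_gray_levels(image))
}
print(centers)
```

When several labels share one color, as with the four face mask points, a single centroid is not enough; the binary mask would first have to be split into connected regions.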

4.3 Head Attitude Estimation System Design

This section presents the experimental results of the head attitude estimation system design, including the off-line training of the neural networks, tests of the trained networks, and the final head attitude estimation results.

4.3.1 Neural Network Off-line Training

This section focuses on the off-line training of the two neural networks, βNN and γNN, used in head attitude estimation. Different types of head attitude estimation systems require different types of neural networks. All the neural networks here are designed to have three layers: the input layer, the hidden layer, and the output layer. The number of neurons in the input layer equals the number of input data, and the number of neurons in the output layer equals the number of output data. The number of neurons needed in the hidden layer, however, has to be determined experimentally, which in this thesis is done through off-line training.
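How the hidden layer size can be settled by off-line training is sketched below. The sketch is only an illustration under stated assumptions: it uses one hidden layer, plain batch gradient descent, three placeholder inputs, and synthetic data. The thesis networks were trained in MATLAB on the recorded feature data, and none of the names or settings below come from the thesis.

```python
import numpy as np

def init_mlp(n_in, n_hidden, n_out, rng):
    """Random initial weights for a network with one hidden layer."""
    return {
        "W1": rng.normal(0.0, 0.5, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.5, size=(n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(params, x):
    """Return the hidden activations and the network output."""
    hidden = np.tanh(x @ params["W1"] + params["b1"])
    return hidden, hidden @ params["W2"] + params["b2"]

def mse(params, x, y):
    """Mean square error of the network on (x, y)."""
    return float(np.mean((forward(params, x)[1] - y) ** 2))

def train(params, x, y, lr=0.02, epochs=500):
    """Batch gradient descent on half the mean square error."""
    n = x.shape[0]
    for _ in range(epochs):
        hidden, out = forward(params, x)
        d_out = (out - y) / n
        d_hidden = (d_out @ params["W2"].T) * (1.0 - hidden ** 2)
        params["W2"] -= lr * (hidden.T @ d_out)
        params["b2"] -= lr * d_out.sum(axis=0)
        params["W1"] -= lr * (x.T @ d_hidden)
        params["b1"] -= lr * d_hidden.sum(axis=0)
    return params

# Synthetic stand-in data (assumption): 3 inputs, 1 output.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(200, 3))
y = (np.sin(2.0 * x[:, 0]) + 0.5 * x[:, 1] * x[:, 2]).reshape(-1, 1)

results = {}
for n_hidden in (2, 5, 10):
    params = init_mlp(3, n_hidden, 1, rng)
    before = mse(params, x, y)
    after = mse(train(params, x, y), x, y)
    results[n_hidden] = (before, after)
    print(f"hidden={n_hidden}: MSE before={before:.4f}, after={after:.4f}")
```

Comparing the final MSE across the candidate sizes is the basis for picking a structure; in practice the comparison would be made on the recorded training images rather than on synthetic data.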

According to the experiments, the performance of βNN is influenced by the distance between the webcam and the stereo facial model. The thesis therefore considers two cases when determining the off-line training parameters.

Case 1:

In case 1, the distance from the webcam to the stereo facial model is 30 cm. The first task is to find suitable numbers of neurons for βNN and γNN, which will be applied in the head attitude estimation system. The off-line training of βNN is carried out for different structures, named βNN-k, h, where k is the number of neurons in the first hidden layer and h is the number of neurons in the second hidden layer, each chosen from 10 to 30. The off-line training shows that the performance is best for pairs (k, h) from (20, 20) to (30, 30), as shown in Table 4.2. Because the performance is the same over this range, the thesis chooses (20, 20), which needs fewer neurons than (30, 30). Similarly, Table 4.3 shows the results of the off-line training of γNN. The structure γNN-25, 25 gives the minimum learning mean square error (MSE) and is therefore used in the HAES.
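The selection rule used here, taking the smallest structure among those that tie for the best performance, can be sketched as follows. The MSE values are hypothetical placeholders chosen only to reproduce the tie described in the text; they are not the values of Table 4.2, and the function name and tolerance are assumptions.

```python
def pick_structure(mse_by_structure, tol=1e-6):
    """Among the structures whose MSE is within tol of the best,
    return the one with the fewest hidden neurons in total."""
    best = min(mse_by_structure.values())
    tied = [s for s, m in mse_by_structure.items() if m - best <= tol]
    return min(tied, key=lambda s: (sum(s), s))

# Hypothetical learning MSE per (k, h) pair; NOT taken from Table 4.2.
hypothetical_mse = {
    (10, 10): 0.020,
    (15, 15): 0.012,
    (20, 20): 0.005,
    (25, 25): 0.005,
    (30, 30): 0.005,
}

chosen = pick_structure(hypothetical_mse)
print(chosen)
```

With these placeholder values the rule returns (20, 20), matching the choice made for βNN in case 1.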

Table 4.2 Off-line training parameters of βNN at a distance of 30 cm.

Experiments βNN-10,0 βNN-

Table 4.3 Off-line training parameters of γNN.

Case 2:

In case 2, the training data cover the whole range of distances from the webcam to the stereo facial model. Again, the first task is to find a suitable number of neurons for βNN, which will be applied in the head attitude estimation system. The off-line training of βNN is carried out for different structures, named βNN-k, h, where k is the number of neurons in the first hidden layer and h is the number of neurons in the second hidden layer, each chosen from 10 to 40. The off-line training shows that the performance is best for pairs (k, h) from (30, 30) to (40, 40), as shown in Table 4.4. Because the performance is the same over this range, the thesis chooses (30, 30), which needs fewer neurons than (40, 40).

Table 4.4 Off-line training parameters of βNN over the whole distance range.

4.3.2 Testing Neural Network Performance

This section tests the performance of both βNN and γNN.

According to the experiments, the performance of βNN is influenced by the distance between the webcam and the stereo facial model. Figure 4.3.1 shows images taken at different distances between the webcam and the stereo facial model. Two cases are considered for βNN: in case 1 the training data are collected at a distance of 30 cm, and in case 2 the training data are collected over the whole distance range.

Table 4.5 lists the accuracy rate of βNN in case 1 by angle range and distance. The performance is good at distances from 20 cm to 30 cm, medium from 35 cm to 40 cm, and poor from 45 cm to 50 cm. The reason is that when the stereo facial model is close to the webcam, a rotation produces a clearly measurable pixel change in the β feature, whereas when the model is far from the webcam the pixel change becomes too small to measure reliably. Figure 4.3.2 shows a histogram of the accuracy rate against the angle range and orientation for βNN in case 1; the accuracy rate deteriorates markedly as the distance increases.

Table 4.5 also shows that the performance is good for angle ranges within ±0° to ±20° and clearly worse for ±21° to ±30°, most of all at 35 cm and beyond (for example, 64% at 35 cm for 26° to 30°, compared with 91.5% at 30 cm). The reason is that a rotation of ±21° to ±30° is so extreme that the β feature no longer allows the angle to be judged accurately. Figure 4.3.3 shows, for case 1, curves of the βNN accuracy rate against distance for the different angle ranges. The curves confirm the good performance within ±0° to ±20° and the poor performance within ±21° to ±30°.
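The grouping of distances into good, medium, and poor performance can be checked against the positive-angle rows of Table 4.5. In the sketch below the accuracy values are copied from that table, while the grade thresholds of 95% and 80% and all names are assumptions introduced only for illustration.

```python
# Positive-angle rows of Table 4.5: accuracy (%) of βNN in case 1 for the
# angle ranges 0-5, 6-10, 11-15, 16-20, 21-25, and 26-30 degrees.
TABLE_4_5_POSITIVE = {
    20: [100, 100, 96.5, 100, 100, 96.5],
    25: [96.5, 96.5, 95.5, 99, 95.5, 95.5],
    30: [97, 99, 100, 99, 93, 91.5],
    35: [95, 85.5, 97.5, 91, 71.5, 64],
    40: [96, 92, 92.5, 88.5, 64.5, 63],
    45: [79, 56.5, 51.5, 51, 29.5, 20],
    50: [76.5, 49.5, 48, 47.5, 19.5, 20],
}

def grade(mean_accuracy, good=95.0, medium=80.0):
    """Map a mean accuracy to a label using assumed thresholds."""
    if mean_accuracy >= good:
        return "good"
    if mean_accuracy >= medium:
        return "medium"
    return "poor"

# Mean accuracy over the six angle ranges for each distance (cm).
mean_by_distance = {d: sum(row) / len(row) for d, row in TABLE_4_5_POSITIVE.items()}
grades = {d: grade(m) for d, m in mean_by_distance.items()}

for d in sorted(mean_by_distance):
    print(f"{d} cm: mean accuracy {mean_by_distance[d]:.1f}% -> {grades[d]}")
```

With these thresholds, 20 cm to 30 cm grade as good, 35 cm and 40 cm as medium, and 45 cm and 50 cm as poor, which agrees with the grouping stated in the text.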

Table 4.6 lists the results when the training data cover the whole distance range (case 2); the performance is better than when the training data are collected only at 30 cm. Figure 4.3.4 shows a histogram of the accuracy rate against the angle range and orientation for βNN in case 2, and Figure 4.3.5 shows the corresponding curves for the different angle ranges.

Table 4.7 lists the accuracy rate of γNN by angle range and distance. The performance of γNN is not influenced by the distance between the webcam and the stereo facial model. The reason is that the pixel change in the γ feature is clearly measurable in every angle range. Figure 4.3.6 shows a histogram of the accuracy rate against the angle range and orientation for γNN; the performance is very good at all distances. Figure 4.3.7 shows curves of the γNN accuracy rate against distance for the different angle ranges; the accuracy rate is essentially the same for all angle ranges. Finally, Figure 4.3.8 shows the final results of the HAES.

Figure 4.3.1 The different distances between the webcam and the stereo facial model (panels: 20 cm, 25 cm, 30 cm, 35 cm, 40 cm, 45 cm, 50 cm).

Accuracy rate for positive angles:

Distance   0°~5°    6°~10°   11°~15°   16°~20°   21°~25°   26°~30°
20 cm      100%     100%     96.5%     100%      100%      96.5%
25 cm      96.5%    96.5%    95.5%     99%       95.5%     95.5%
30 cm      97%      99%      100%      99%       93%       91.5%
35 cm      95%      85.5%    97.5%     91%       71.5%     64%
40 cm      96%      92%      92.5%     88.5%     64.5%     63%
45 cm      79%      56.5%    51.5%     51%       29.5%     20%
50 cm      76.5%    49.5%    48%       47.5%     19.5%     20%

Accuracy rate for negative angles:

Distance   0°~−5°   −6°~−10°   −11°~−15°   −16°~−20°   −21°~−25°   −26°~−30°
20 cm      100%     100%       98.5%       100%        100%        97.5%
25 cm      97.5%    98%        100%        100%        96%         91%
30 cm      100%     100%       99%         100%        92.5%       91%
35 cm      96.5%    86.5%      95.5%       89.5%       73.5%       68.5%
40 cm      97.5%    90.5%      89.5%       91.5%       60.5%       61.5%
45 cm      73.5%    53.5%      56.5%       52.5%       31.5%       19%

Table 4.5 Accuracy rate of βNN by angle range and distance in case 1.

Figure 4.3.2 Histogram of the βNN accuracy rate by angle range in case 1 (title: the accuracy effect per angle range; horizontal axis: angle and orientation range, from −30° to 30° in steps of 5°; vertical axis: accuracy rate, %).

Figure 4.3.3 Curves of the βNN accuracy rate against distance for the different angle ranges in case 1.

Accuracy rate for positive angles:

Distance   0°~5°    6°~10°   11°~15°   16°~20°   21°~25°   26°~30°
20 cm      100%     100%     97.5%     100%      100%      96.5%
25 cm      98.5%    98.5%    97.5%     99%       95.5%     95.5%
30 cm      96%      99%      100%      99%       95%       91.5%
35 cm      97%      89.5%    96.5%     91%       78.5%     68%
40 cm      98%      93%      94.5%     91.5%     69.5%     69%
45 cm      85%      60%      55.5%     53%       31.5%     23%
50 cm      81.5%    55%      53.5%     50.5%     23.5%     26%

Accuracy rate for negative angles:

Distance   0°~−5°   −6°~−10°   −11°~−15°   −16°~−20°   −21°~−25°   −26°~−30°
20 cm      100%     100%       98.5%       100%        100%        96.5%
25 cm      98.5%    98%        100%        100%        96%         93%
30 cm      100%     100%       99%         100%        92.5%       94%

Table 4.6 Accuracy rate of βNN by angle range and distance in case 2.

Figure 4.3.4 Histogram of the βNN accuracy rate by angle range in case 2.

Figure 4.3.5 Curves of the βNN accuracy rate against distance for the different angle ranges in case 2.

Accuracy rate for positive angles:

Distance   0°~5°    6°~10°   11°~15°   16°~20°   21°~25°   26°~30°
20 cm      98.5%    99.5%    98%       99.5%     98%       100%
25 cm      99.5%    100%     98.5%     99.5%     99.5%     99%
30 cm      98.5%    99.5%    98%       98.5%     100%      99.5%
35 cm      98%      98.5%    99%       98.5%     97.5%     100%
40 cm      98.5%    100%     100%      98.5%     100%      100%
45 cm      99.5%    98%      100%      98%       99%       99.5%
50 cm      98.5%    100%     99.5%     98.5%     99%       99.5%

Table 4.7 Accuracy rate of γNN by angle range and distance.

Figure 4.3.6 Histogram of the γNN accuracy rate by angle range (title: the accuracy effect per angle range; horizontal axis: angle and orientation range, from −30° to 30° in steps of 5°; vertical axis: accuracy rate, %).

Figure 4.3.7 Curves of the γNN accuracy rate against distance for the different angle ranges (horizontal axis: distance; vertical axis: accuracy rate).

Figure 4.3.8 The final results of the HAES.

Chapter 5 Conclusions and Future Work

This thesis proposes an intelligent head attitude estimation system (HAES) that detects the orientation and angle of the face from geometric facial features, namely both eyes and the mouth, which are two important facial features in the HAES. The intelligent head attitude estimation is completed in three steps. First, a facial feature detection method for color images is proposed [23]. It is based on a robust skin region detector that provides the face coordinates. Using simple rules derived from anthropological characteristics, the eyes are selected within the face region, and the mouth is then selected from the relation between the eyes and the characteristics of the mouth. This method achieves a success rate of 91.6% in the HAES application, and it can also detect the eyes and the mouth when they are rotated. Second, a stereo facial model is built to simulate the head attitude; its face orientation and angle can be adjusted, and seven points are marked on the face model for detection. The seven detected points are recorded in each image together with the corresponding face orientation and angle, and these records are used for neural network learning. Third, the HAES is completed by intelligent neural networks trained under supervised learning.

The intelligent neural networks of the HAES comprise βNN and γNN. The experimental results show how the accuracy of both networks relates to distance. Table 4.3.1 and Table 4.3.2 show an accuracy rate as high as 97.3% when the distance between the webcam and the human head is within 30 cm.

This system is useful in many computer vision tasks, such as mobile phone user identification, where the system can determine whether or not the user is looking at the screen, and man-machine interfaces that detect the state of a vehicle driver.

The proposed intelligent HAES detects the facial features with an accuracy rate of 91.6%, which is acceptable for the experiments in this thesis but not sufficient for commercial products, which require an accuracy rate as high as 99%. Future work includes developing other facial feature detection methods or algorithms to raise the accuracy rate. The HAES is also restricted by distance, because at larger distances the dots per inch (dpi) of the webcam image are not enough to determine the rotation angle and orientation of the face; raising the dpi of the image sequence will therefore improve the system. Finally, the two neural networks could be combined into one neural network. We hope that the HAES will be successfully implemented in daily life and that the system will prove handy.

References

[1] A. Yuille, D. Cohen, and P. Hallinan, "Feature extraction from faces using deformable templates," in Proc. IEEE Computer Soc. Conf. Computer Vision and Pat. Recog., pp. 104-109, 1989.

[2] P. Hallinan, "Recognizing human eyes," in SPIE Proc.: Geometric Methods in Computer Vision, vol. 1570, pp. 214-226, 1991.

[3] M. Ahmed Fadzil and H. Abu Bakar, "Human face recognition using neural networks," in IEEE Int. Conf. Image Proc., vol. 3, pp. 936-938, 1994.

[4] T. C. Chang and T. S. Huang, "Facial feature extraction from color images," 12th IAPR Int. Conf. Pattern Recognition, vol. 2, pp. 39-43, October 1994.

[5] J. Ohya, Y. Kitamura, F. Kishino, and N. Terashima, "Virtual space teleconferencing: real-time reproduction of 3D human images," Visual Commun. and Image Representation, vol. 6, no. 1, pp. 1-25, March 1995.

[6] N. Rahman, K. Wei, and J. See, "RGB-H-CbCr Skin Color Model for Human Face Detection," Proc. of the MMU International Symposium on Information & Communications Technologies, 2006.

[7] L. Bretzner, I. Laptev, and T. Lindeberg, "Hand Gesture Recognition using Multi-Scale Colour Features, Hierarchical Models and Particle Filtering," Proc. 5th IEEE Int. Conf. on Automatic Face and Gesture Recognition, pp. 423-428, May 2002.

[8] Q. Peng and X. Zhang, "Sensitive Image Recognition Technology Based on Eigenvectors," Academic Journal of Southwest Jiaotong University, pp. 13-18, Jan. 2007.

[9] Q. Zhang, S. Li, and H. Xiao, "Extracting regions of interest in medical images based on visual attention mechanism," Application Research of Computers, vol. 26, pp. 4803-4805, Dec. 2009.

[10] V. Vezhnevets, V. Sazonov, and A. Andreeva, "A Survey on Pixel-Based Skin Color Detection Techniques," Proc. Graphicon-2003, pp. 85-92, Sep. 2003.

[11] K. C. Yow and R. Cipolla, "Feature-based Human Face Detection," Image and Vision Computing, vol. 15, no. 9, pp. 713-735, 1997.

[12] S. Abdallah, A. Lynn Abbott, and A. Mohamad, "A New Face Detection Technique using 2D DCT and Self Organizing Feature Map," Proceeding of World of Science, Engineering and Technology, vol. 24, pp. 15-19, May 2007.

[13] J. Wu, X. Zhang, and F. Zhang, "The experiment research of edge detection in digital image," Microcomputer Information (Control & Automation), vol. 20, no. 5, pp. 106-107, 2004.

[14] L. G. Roberts, "Machine Perception of Three-Dimensional Solids," Optical and Electro-Optical Information Processing, Cambridge, MA: MIT Press, pp. 159-197, 1965.

[15] L. Sobel, "Camera Models and Machine Perception," CA: Stanford University, p. 121, 1999.

[16] D. Marr and E. Hildreth, "Theory of Edge Detection," Proceedings of the Royal Society of London, Series B, vol. 207, pp. 187-217, 1980.

[17] J. Prewitt, "Object Enhancement and Extraction," Picture Processing and Psychopictorics, New York: Academic Press, pp. 75-149, 1970.

[18] J. Canny, "A Computational Approach to Edge Detection," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. PAMI-8, no. 6, pp. 679-698, 1986.

[19] R. Chellappa, C. L. Wilson, and S. Sirohey, "Human and machine recognition of faces: a survey," Proc. of the IEEE, vol. 83, no. 5, pp. 705-740, May 1995.

[20] K. Aizawa and T. S. Huang, "Model-based image coding: advanced video coding techniques for very low bit-rate applications," Proc. of the IEEE, vol. 83, no. 2, pp. 259-271, Feb. 1995.

[21] C. Huang and C. Chen, "Human facial feature extraction for face interpretation and recognition," Pattern Recognition, vol. 25, no. 12, pp. 1435-1444, 1992.

[22] D. Reisfeld and Y. Yeshurun, "Robust detection of facial features by generalized symmetry," in Proc. 11th Int. Conf. on Pat. Recog., pp. 117-120, 1992.

[23] D. Sidibe, P. Montesinos, and S. Janaqi, "A simple and efficient eye detection method in color images," author manuscript, published in International Conference Image and Vision Computing New Zealand, version 1, 8 Apr. 2009.

[24] W. Gonzalez and S. Eddins, Digital Image Processing Using Matlab, Image Processing.

[25] B. Hogarth, Drawing the Human Head, 1st ed., New York: Watson-Guptill, 2000.
