Skin Color Percentage in Candidate Regions

Chapter 3 Method for Reducing False Alarms

3.2 Skin Color Percentage in Candidate Regions

Figure 3.2: (a) A picture with five candidate regions. (b) 0% skin color percentage in region 1. (c) 33% skin color percentage in region 2. (d) 0% skin color percentage in region 3. (e) 79% skin color percentage in region 4. (f) 0% skin color percentage in region 5.

In this section, we demonstrate how to reduce false alarms with skin detection and thereby obtain better detection performance. For each image frame, we first find several candidates with the face detector. Then we apply the skin detection method described in Section 3.1 to find the skin pixels within the region of each candidate. Finally, a candidate is discarded if the percentage of skin pixels in its region is less than a threshold. Take Figure 3.2 as an example: in Figure 3.2(a), five candidates are detected and marked with red squares. For each candidate, the skin pixels are detected and their percentage is recorded in the caption of Figure 3.2(b)-(f). Therefore, we can discard some bad candidates if we can find a suitable threshold for the skin pixel percentage. To determine a threshold that properly separates true faces from false alarms, two videos are tested, namely S10 for the indoor environment and O5 for the outdoor environment, as shown in Figure 3.3. Table 3.1 and Table 3.2 show the face detection results for different threshold values on the two videos. One can see that the best detection results are achieved when the threshold is set between 10% and 40% for the indoor environment (between 15% and 35% for the outdoor environment). With such thresholds, we keep all the true positives of the original detector and remove all of its false positives on these two videos.

We set the threshold to 40% and compare, in Figure 3.4, the face detection results on another video with and without the false alarm reduction. One can see that with false alarm reduction most false positives are removed while the true faces are preserved.
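The filtering rule of this section can be sketched as follows. This is a minimal illustration, not the implementation used in the thesis: the skin mask is assumed to be already computed by the method of Section 3.1, the `(x, y, width, height)` box layout and the function names are hypothetical, each box is assumed to lie inside the image, and a candidate is kept only when its percentage is strictly greater than the threshold (following the `P>` labels of Table 3.1 and Table 3.2).

```python
from typing import List, Sequence, Tuple

# Hypothetical box layout: (x, y, width, height) in pixels.
Box = Tuple[int, int, int, int]


def skin_percentage(skin_mask: Sequence[Sequence[bool]], box: Box) -> float:
    """Return the percentage (0-100) of skin pixels inside the box.

    The box is assumed to lie entirely inside the mask.
    """
    x, y, w, h = box
    if w <= 0 or h <= 0:
        return 0.0
    skin = sum(1 for row in skin_mask[y:y + h] for px in row[x:x + w] if px)
    return 100.0 * skin / (w * h)


def filter_candidates(skin_mask: Sequence[Sequence[bool]],
                      boxes: List[Box],
                      threshold: float = 40.0) -> List[Box]:
    """Keep only the candidates whose skin percentage exceeds the threshold."""
    return [b for b in boxes if skin_percentage(skin_mask, b) > threshold]


# Toy 4x6 mask: the left three columns are skin, the right three are not.
mask = [[True, True, True, False, False, False] for _ in range(4)]
candidates = [(0, 0, 3, 4), (3, 0, 3, 4), (2, 0, 3, 4)]
kept = filter_candidates(mask, candidates)
print(kept)
```

In the toy data, the three candidates contain 100%, 0%, and about 33% skin pixels, so only the first one survives the 40% threshold.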

Figure 3.3: Videos used to find the threshold of skin pixel percentage. (a) Video S10. (b) Video O5.

Table 3.1: The detection results for S10 with different thresholds of skin pixel percentage.

Threshold TP FP FN

Original 80 31 0

P>0% 80 9 0

P>5% 80 2 0

P>10% 80 0 0

P>15% 80 0 0

P>20% 80 0 0

P>25% 80 0 0

P>30% 80 0 0

P>35% 80 0 0

P>40% 80 0 0

P>45% 78 0 2

P>50% 63 0 17

P>55% 41 0 39

P>60% 19 0 61

P>65% 0 0 80

P>70% 0 0 80

P>75% 0 0 80

P>80% 0 0 80

P>85% 0 0 80

P>90% 0 0 80

P>95% 0 0 80

P=100% 0 0 80

Table 3.2: The detection results for O5 with different thresholds of skin pixel percentage.

Threshold TP FP FN

Original 115 184 0

P>0% 115 80 0

P>5% 115 56 0

P>10% 115 6 0

P>15% 115 0 0

P>20% 115 0 0

P>25% 115 0 0

P>30% 115 0 0

P>35% 115 0 0

P>40% 112 0 3

P>45% 107 0 8

P>50% 102 0 13

P>55% 96 0 19

P>60% 85 0 30

P>65% 78 0 37

P>70% 70 0 45

P>75% 39 0 76

P>80% 14 0 101

P>85% 6 0 109

P>90% 0 0 115

P>95% 0 0 115

P=100% 0 0 115
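The threshold ranges quoted in the text can be read off the two tables mechanically. The sketch below copies the rows from P>0% to P>50% of Table 3.1 and Table 3.2 and lists the thresholds that leave no false positive and no false negative. The dictionary layout and the function name are illustrative choices only.

```python
from typing import Dict, List, Tuple

Counts = Tuple[int, int, int]  # (TP, FP, FN)

# Rows P>0% ... P>50%, copied from Table 3.1 (S10) and Table 3.2 (O5).
S10: Dict[int, Counts] = {
    0: (80, 9, 0), 5: (80, 2, 0), 10: (80, 0, 0), 15: (80, 0, 0),
    20: (80, 0, 0), 25: (80, 0, 0), 30: (80, 0, 0), 35: (80, 0, 0),
    40: (80, 0, 0), 45: (78, 0, 2), 50: (63, 0, 17),
}
O5: Dict[int, Counts] = {
    0: (115, 80, 0), 5: (115, 56, 0), 10: (115, 6, 0), 15: (115, 0, 0),
    20: (115, 0, 0), 25: (115, 0, 0), 30: (115, 0, 0), 35: (115, 0, 0),
    40: (112, 0, 3), 45: (107, 0, 8), 50: (102, 0, 13),
}


def clean_thresholds(results: Dict[int, Counts]) -> List[int]:
    """Thresholds (in percent) with no false positive and no false negative."""
    return [t for t, (tp, fp, fn) in sorted(results.items())
            if fp == 0 and fn == 0]


indoor = clean_thresholds(S10)
outdoor = clean_thresholds(O5)
shared = [t for t in indoor if t in outdoor]
print("indoor:", indoor)
print("outdoor:", outdoor)
print("shared:", shared)
```

On these two videos the shared range is 15% to 35%; at 40% the outdoor video O5 already loses 3 of its 115 true positives.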

Figure 3.4: Reducing false alarms with the 40% skin color threshold in candidate regions, on images from different paths.

Chapter 4 Experimental Results

In this chapter, we give four experiments to show how face detection can be improved with the proposed method. We have tested our method on videos under three real environments as well as on some synthesized images. In Section 4.1, the specifications of the cameras used in our experiments and the installation parameters are detailed. In Section 4.2, we show some image samples from our testing videos. An experiment for comparison is presented in Section 4.3. In this experiment, the videos are simply rotated such that the new position of the VPVL is at the horizontal center. The face detection results of our method on videos under real environments, without and with the false alarm reduction method, are reported in Section 4.4 and Section 4.5, respectively. Finally, in Section 4.6 we show the experiments on synthesized images with multiple camera installation settings.

4.1 Environment Settings

Throughout our experiments, two cameras are used. We refer to them by their model names, AXIS 207MW and VIVOTEK FD8161. In the indoor experiments, the AXIS 207MW is mounted on the ceiling without any calibration, about 2.7 meters above the ground plane, as shown in Figure 4.1. In the outdoor experiments, we install the VIVOTEK FD8161 above an entrance, also without any calibration. The height of this camera is about 2.5 meters, as shown in Figure 4.2.

Figure 4.1: Camera AXIS 207MW settings. (a) Experimental environment. (b) Close-up view of camera.

Figure 4.2: Camera VIVOTEK FD8161 settings. (a) Experimental environment. (b) Close-up view of camera.

4.2 Video Demonstration

We have captured 25 videos and use them in our experiments. As mentioned previously, the videos cover both an indoor environment and an outdoor environment. Because the two environments use different camera models, the videos have different resolutions: 640x480 for the indoor videos and 800x600 for the outdoor videos. Each video captures one or more people walking roughly toward the camera. Furthermore, these people start walking from different positions, so that faces with different sizes, rotations, and distortions appear in these videos.

Figure 4.3: Frame examples of a single person walking in the laboratory. (a) Video S1. (b) Video S2. (c) Video S3. (d) Video S4. (e) Video S5. (f) Video S6. (g) Video S7. (h) Video S8. (i) Video S9. (j) Video S10.

Figure 4.4: Frame examples of multiple people walking in the laboratory. (a) Video M1. (b) Video M2. (c) Video M3. (d) Video M4. (e) Video M5. (f) Video M6. (g) Video M7.

Figure 4.5: Frame examples of a single person walking in the outdoor environment. (a) Video O1. (b) Video O2.

4.3 Experiment 1 – Face Detection using Image Rotation

In this section, we present a simple method that uses only image rotation to improve face detection with a surveillance camera. Faces that are not directly in front of the camera look like faces with in-plane rotations, so an intuitive approach is to treat them as in-plane rotated faces, as in [14, 15]. In [14, 15], faces rotated by different angles are detected by rotated detectors. In contrast, for easier implementation, we rotate the image instead and repeat the detection procedure several times.

To decide the angular step between successive face detection passes, we first investigate how much in-plane rotation the frontal face detector, which is proposed in [17] and provided in OpenCV, can tolerate. A subset of the MUCT face database [18] is used in this test. More specifically, we use the images captured by camera "e", as shown in Figure 4.6.

There are 751 images in total, and each subject is captured under 2 to 3 different lightings, as shown in Figure 4.7. In our test, each image is rotated 360 times in steps of one degree, and face detection is carried out on all rotated versions. The detection result of an image with all its rotated versions is shown in Figure 4.8, where the maximum continuous region in which face detection succeeds spans 48 degrees. Applying the same procedure to all images, we find that the smallest maximum continuous region is 28 degrees and the average maximum continuous region is 47.47 degrees.

Figure 4.6: The relationship between five cameras and human face in MUCT.

Figure 4.7: Three different lightings for a person in the MUCT face database, captured by camera e.

Figure 4.8: The testing result of Figure 4.7(a). 1 = face detected; 0 = no face detected. The face is detected continuously from -25 to 22 degrees, so the maximum continuous region for Figure 4.7(a) is 22-(-25)+1 = 48 degrees.
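The maximum continuous region can be computed as the longest run of successful detections over the 360 rotated versions. The sketch below assumes that the result at index i corresponds to a rotation of i degrees and that the sequence is circular (a rotation of -25 degrees is stored at index 335). This bookkeeping is an assumption, not something stated in the text, and the function name is hypothetical.

```python
from typing import Sequence


def max_continuous_region(detected: Sequence[int]) -> int:
    """Length of the longest circular run of 1s (one entry per degree)."""
    n = len(detected)
    if n == 0:
        return 0
    if all(detected):
        return n
    best = run = 0
    # Scan the sequence twice so that a run wrapping past the end
    # (e.g. from 335 degrees through 0 to 22 degrees) is counted as one run.
    for i in range(2 * n):
        if detected[i % n]:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


# Detection succeeds from -25 to 22 degrees, as in the caption of Figure 4.8.
detected = [1 if (deg <= 22 or deg >= 335) else 0 for deg in range(360)]
region = max_continuous_region(detected)
print(region)
```

The run found here covers indices 335 to 359 and 0 to 22, which is the same count as 22-(-25)+1.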

Although the face detector is effective on average for faces rotated by up to ±23.7 degrees (half of the average result of 47.47 degrees), we take a relatively conservative rotation step of 15 degrees in our experiments. The maximum in-plane rotation of faces in our videos is about +24.8 degrees, so rotating the images by ±15 degrees should be enough to handle all in-plane rotations occurring here. Nevertheless, we also rotate the images by ±30 degrees and merge the results for more robustness. An example of face detection on an image rotated by different angles is shown in Figure 4.9. In Figure 4.9, the face detector detects only one face in the original image, while the remaining two faces are detected in the -15 and -30 degree rotated versions, respectively. Clearly, more faces can be found if we provide more rotated versions of an image.
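The choice of rotation step can be checked with a simplified model. The sketch below assumes that a face with in-plane angle θ is detectable in an image rotated by r whenever |θ - r| does not exceed the half-width of the detector's working range. This model, the function name, and the constants' names are illustrative assumptions; the numbers themselves (47.47, 28, and 24.8 degrees) are the ones reported above.

```python
from typing import Iterable


def is_covered(face_angle: float,
               rotations: Iterable[float],
               half_range: float) -> bool:
    """True if at least one rotated copy brings the face within the
    detector's assumed working range of +/- half_range degrees."""
    return any(abs(face_angle - r) <= half_range for r in rotations)


MAX_FACE_ANGLE = 24.8           # largest in-plane rotation in the videos
AVERAGE_HALF_RANGE = 47.47 / 2  # half of the average continuous region
WORST_HALF_RANGE = 28 / 2       # half of the smallest continuous region

no_rotation = is_covered(MAX_FACE_ANGLE, [0], AVERAGE_HALF_RANGE)
step_15_avg = is_covered(MAX_FACE_ANGLE, [-15, 0, 15], AVERAGE_HALF_RANGE)
step_15_worst = is_covered(MAX_FACE_ANGLE, [-15, 0, 15], WORST_HALF_RANGE)
print(no_rotation, step_15_avg, step_15_worst)
```

Under this model a 24.8-degree face falls just outside the average range of the unrotated detector, but a 15-degree rotation leaves only 9.8 degrees of residual rotation, which is inside even the smallest measured range.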

Figure 4.9: Example of rotating an image. (a) Original image. (b) Rotated by -15 degrees. (c) Rotated by 15 degrees. (d) Rotated by -30 degrees. (e) Rotated by 30 degrees.

By combining results in different rotations, we have the result shown in Figure 4.10. One can see that although more faces are detected, the number of false positives is also increased.

Figure 4.10: The detection results of Figure 4.9 combined into one image. (a) Combination of Figure 4.9(a)-(c), with 2 TP/FP in the image. (b) Combination of Figure 4.9(a)-(e), with 3 TP/FP in the image.

Finally, we show the detection results in Table 4.1 ~ Table 4.3, where O represents the original method without any preprocessing of the images; R1 represents combining the results on the original images and on the 15-degree rotated versions; and R2 represents combining the results on the original images, the 15-degree rotated versions, and the 30-degree rotated versions. In the tables, we give precision and recall to show the detection performance. One can see that the more rotated images are added, the higher the recall and the lower the precision become. In Table 4.4 we give the average results across all videos. It is easy to see that R2 has the best recall but the lowest precision in most groups.
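For reference, precision and recall are taken here in their standard sense, written in terms of the true positive (TP), false positive (FP), and false negative (FN) counts that also appear in Table 3.1 and Table 3.2:

```latex
\[
\text{Precision} = \frac{TP}{TP + FP},
\qquad
\text{Recall} = \frac{TP}{TP + FN}
\]
```

For example, the unfiltered detector on video O5 in Table 3.2 (TP = 115, FP = 184, FN = 0) has a recall of 100% and a precision of 115/299, about 38.5%.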

Table 4.1: Face detection results of S1 ~ S10 videos by image rotation.

S1 S2 S3 S4

Recall Precision Recall Precision Recall Precision Recall Precision

O 100%

Recall Precision Recall Precision Recall Precision Recall Precision

O 1.8%

Recall Precision Recall Precision

O 95.5%

Table 4.2: Face detection results of M1 ~ M7 videos by image rotation.

M1 M2 M3 M4

Recall Precision Recall Precision Recall Precision Recall Precision

O 42.2%

Recall Precision Recall Precision Recall Precision

O 46.9%

Table 4.3: Face detection results of O1 ~ O8 videos by image rotation.

O1 O2 O3 O4

Recall Precision Recall Precision Recall Precision Recall Precision

O 1.1%

Recall Precision Recall Precision Recall Precision Recall Precision

O 93.0%

Table 4.4: Average detection results of image rotation method.

Method, followed by Recall and Precision for each group in this order: S1 ~ S10 videos, M1 ~ M7 videos, O1 ~ O8 videos, All 25 videos

O 49.9% 35.9% 52.1% 68.3% 82.5% 49.3% 61.5% 51.2%

R1 98.7% 41.6% 70.2% 59.5% 92.1% 38.1% 87.0% 46.4%

R2 99.1% 37.8% 79.5% 48.3% 92.8% 33.4% 90.4% 39.8%

4.4 Experiment 2 – Face Detection based on Image Rectification

In this section, we show the detection performance of the proposed image rectification methods. The results are reported in Table 4.5 ~ Table 4.8, where O means the original method without any preprocessing of the images; T1 means transformation 1; T2 means transformation 2; and T3 means transformation 3. Again, we give precision and recall to show the detection performance. Some images with different rectifications and their face detection results are presented in Figure 4.11 ~ Figure 4.13.

Figure 4.11: Results of S3 video. (a) Results of O. (b) Results of T1. (c) Results of T2. (d) Results of T3.

In Table 4.8 we find that the face detection performance is generally improved by the proposed methods. With the three proposed transformations, the overall recall is improved from 0.615 to 0.781, 0.837, and 0.905, respectively. As the results show, transformation 3 has the best recall. We also find that transformation 3 is very stable across all directions.

Table 4.5: Face detection results of S1 ~ S10 videos by image rectification.

S1 S2 S3 S4

Recall Precision Recall Precision Recall Precision Recall Precision

O 100%

Recall Precision Recall Precision Recall Precision Recall Precision

O 1.8%

Recall Precision Recall Precision

O 95.5%

Table 4.6: Face detection results of M1 ~ M7 videos by image rectification.

M1 M2 M3 M4

Recall Precision Recall Precision Recall Precision Recall Precision O 42.2%

Recall Precision Recall Precision Recall Precision O 46.9%

Figure 4.12: Results of M7 video. (a) Results of O. (b) Results of T1. (c) Results of T2. (d) Result of T3.

Table 4.7: Face detection results of O1 ~ O8 videos by image rectification.

O1 O2 O3 O4

Recall Precision Recall Precision Recall Precision Recall Precision

O 1.1%

Recall Precision Recall Precision Recall Precision Recall Precision

O 93.0%

Figure 4.13: Results of O8 video. (a) Results of O. (b) Results of T1. (c) Results of T2. (d) Result of T3.

Table 4.8: Average detection results of image rectification method.

Method, followed by Recall and Precision for each group in this order: S1 ~ S10 videos, M1 ~ M7 videos, O1 ~ O8 videos, All 25 videos

O 49.9% 35.9% 52.1% 68.3% 82.5% 49.3% 61.5% 51.2%

T1 89.9% 63.0% 61.6% 61.3% 83.0% 49.4% 78.1% 57.9%

T2 95.9% 39.1% 67.8% 51.2% 87.5% 40.6% 83.7% 43.6%

T3 99.1% 48.4% 79.1% 54.3% 93.4% 40.8% 90.5% 47.8%

Now we compare the proposed methods with the image rotation method described in the previous section. Comparing Table 4.4 and Table 4.8, we can see that, averaged over all 25 videos, T3 has better recall and precision than both R1 and R2. The computation times of the different methods on three videos are shown in Table 4.9. T1 takes about 1.3 to 1.5 times as long as O, and T2 and T3 take about 1.5 to 1.6 times as long as O. Compared with the image rotation method, however, T1, T2, and T3 are about 1.7 to 2.3 times faster than R1 and about 2.8 to 3.8 times faster than R2. This leads to the conclusion that although image rotation can improve the detection performance, it also produces more false positives and requires more computation time than our proposed methods.

Table 4.9: Computation time (in seconds) of each face detection method.

Video Frame Number O T1 T2 T3 R1 R2

S1 126 70.14 92.00 112.72 105.14 189.47 327.47

M1 90 47.44 69.57 77.19 71.48 132.91 217.20

O1 115 96.77 131.14 156.56 153.11 295.93 496.16
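The speed comparison can be reproduced directly from Table 4.9. The sketch below only divides the tabulated times by each other; the dictionary layout and the function name are illustrative.

```python
from typing import Dict

# Computation times in seconds, copied from Table 4.9.
TIMES: Dict[str, Dict[str, float]] = {
    "S1": {"O": 70.14, "T1": 92.00, "T2": 112.72, "T3": 105.14,
           "R1": 189.47, "R2": 327.47},
    "M1": {"O": 47.44, "T1": 69.57, "T2": 77.19, "T3": 71.48,
           "R1": 132.91, "R2": 217.20},
    "O1": {"O": 96.77, "T1": 131.14, "T2": 156.56, "T3": 153.11,
           "R1": 295.93, "R2": 496.16},
}


def time_ratio(video: str, method: str, baseline: str) -> float:
    """How many times longer `method` takes than `baseline` on one video."""
    return TIMES[video][method] / TIMES[video][baseline]


for video in TIMES:
    vs_original = {m: round(time_ratio(video, m, "O"), 2)
                   for m in ("T1", "T2", "T3")}
    rotation_vs_t3 = {m: round(time_ratio(video, m, "T3"), 2)
                      for m in ("R1", "R2")}
    print(video, "relative to O:", vs_original,
          "relative to T3:", rotation_vs_t3)
```

Calling `time_ratio("S1", "R2", "T1")` and similar combinations gives the remaining pairwise comparisons between the rotation and rectification methods.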

4.5 Experiment 3 – Face Detection based on Image Rectification with Reducing False Alarms Method

In this section, we show the face detection results obtained by combining the false alarm reduction method proposed in Chapter 3 with transformation 3. Following the suggestion of Section 3.2, we test thresholds of 35%, 40%, and 45% on the percentage of skin pixels in this experiment. The results are reported in Table 4.10 ~ Table 4.12, where T3 denotes transformation 3; P0 denotes reducing false alarms in T3 with a threshold of 35%; P1 denotes reducing false alarms in T3 with a threshold of 40%; and P2 denotes reducing false alarms in T3 with a threshold of 45%. As in the previous sections, we give precision and recall to show the detection performance.

Table 4.10: Face detection results of S1 ~ S10 videos by image rectification with reducing false alarms.

S1 S2 S3 S4

Recall Precision Recall Precision Recall Precision Recall Precision

T3 100%

Recall Precision Recall Precision Recall Precision Recall Precision

T3 100%

Recall Precision Recall Precision

T3 100%

Table 4.11: Face detection results of M1 ~ M7 videos by image rectification with reducing false alarms.

M1 M2 M3 M4

Recall Precision Recall Precision Recall Precision Recall Precision T3 74.4%

Recall Precision Recall Precision Recall Precision T3 77.4%

Table 4.12: Face detection results of O1 ~ O8 videos by image rectification with reducing false alarms.

O1 O2 O3 O4

Recall Precision Recall Precision Recall Precision Recall Precision T3 69.7%

Recall Precision Recall Precision Recall Precision Recall Precision T3 95.6%

Table 4.13: Average detection results of image rectification with reducing false alarms.

Method, followed by Recall and Precision for each group in this order: S1 ~ S10 videos, M1 ~ M7 videos, O1 ~ O8 videos, All 25 videos

T3 99.1% 48.4% 79.1% 54.3% 93.4% 40.8% 90.5% 47.8%

P0 - - - - 93.7% 99.8% 93.7% 99.8%

P1 98.0% 99.7% 79.0% 98.9% 91.8% 99.8% 89.6% 99.5%

P2 96.6% 100% 79.0% 99.9% - - 87.8% 99.9%

After the false alarm reduction process, one can see in Table 4.13 that the precision in all videos is improved to between 0.9 and 1. Overall, the precision is raised by about 0.5 while the detection rate decreases by less than 0.03. The computation times are reported in Table 4.14, where P denotes the combination of the false alarm reduction method with transformation 3. As shown in Table 4.14, the proposed false alarm reduction method increases the computation time by only about 1 to 2 seconds for each of the three videos, which is about 0.01 second per frame.

Table 4.14: Computation time (in seconds) of T3 alone and of T3 with the false alarm reduction method (P).

Video Frame Number T3 P

S1 126 105.14 106.94

M1 90 71.48 72.57

O1 115 153.11 154.15
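The per-frame overhead follows from Table 4.14 by dividing the added time by the frame count. The short sketch below does exactly that; the names are illustrative.

```python
from typing import Dict, Tuple

# (frame number, T3 seconds, P seconds) per video, copied from Table 4.14.
TABLE_4_14: Dict[str, Tuple[int, float, float]] = {
    "S1": (126, 105.14, 106.94),
    "M1": (90, 71.48, 72.57),
    "O1": (115, 153.11, 154.15),
}


def overhead_per_frame(frames: int, t3_seconds: float, p_seconds: float) -> float:
    """Extra seconds per frame spent on the false alarm reduction step."""
    return (p_seconds - t3_seconds) / frames


overheads = {video: overhead_per_frame(*row) for video, row in TABLE_4_14.items()}
for video, seconds in overheads.items():
    print(f"{video}: {seconds:.4f} s/frame")
```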

4.6 Experiment 4 – Face Detection in Synthetic Scene

To show efficiently how the proposed methods work under different situations, we build a synthetic scene with Autodesk MAYA for our experiments. With MAYA we construct a scene containing 21 people and a camera. The virtual people are positioned in three rows and spaced one meter apart. In addition, we set the distance between the camera and the person in the middle of the first row to 2.5 meters. An example of these settings is shown in Figure 4.14, and some images captured by the camera are shown in Figure 4.15.

Figure 4.14: The environment of the synthetic scene. (a) An aerial image of the synthetic scene. (b) A side view of the synthetic scene with the camera at a height of 3 meters and a 40-degree angle of depression.

Figure 4.15: Examples of synthesized images captured by the camera. (a) 2m, . (b) 4m, .

Figure 4.16: The detection rates of different transformations under different camera settings. (a) The detection rates of different transformations for varying camera heights. (b) The detection rates of different transformations for varying camera angles.

Figure 4.17: The detection rates of different face detection methods under different camera settings. (a) The detection rates of different face detection methods for varying camera heights. (b) The detection rates of different face detection methods for varying camera angles.

Because the clean background produces no false alarms in this experiment, we focus only on the detection rate (recall) in the following. In this experiment we have five heights, from 2m to 6m, and nine tilt angles, from , in the camera settings. The detection rates are plotted in Figure 4.16. In Figure 4.16(a), we show the detection rates for different camera heights across all camera angles. Conversely, in Figure 4.16(b), we show the detection rates for different camera angles across all heights. Throughout the experiments, the best face detection result is obtained with transformation 3, the second best with transformation 2, and the third with transformation 1, while the original images give the worst result. For comparison, we also add the detection result of the image rotation method introduced in Section 4.3, which is shown in Figure 4.17. From it we can conclude that transformation 3 is more suitable than all the other methods for common surveillance applications.

Chapter 5 Conclusions and Future Works

In this chapter we give our conclusions of this thesis in Section 5.1 and some future works in Section 5.2.

5.1 Conclusions

In this thesis we propose a novel method to improve face detection under practical conditions with vanishing point-based image rectification. Our approach requires only the position of the vanishing point of vertical lines, or two vertical lines marked by the user. Compared with the simple image rotation method, our method gives better results and requires much less computation. For better detection performance, we further propose a method for reducing false alarms by skin analysis. This method significantly decreases false alarms and adds only negligible computation time. With the proposed framework, the face detection performance is significantly improved for common surveillance camera installations.

5.2 Future Works

In our future work, we will consider a detection scheme that is better suited to images rectified by the proposed method. For example, since the lower part of the rectified image is generally enlarged, the minimal detection window could be set larger there for better efficiency. In addition, we will consider how to properly adopt a multi-view face model for face detection. With these extensions, the completeness of the detection method can be enhanced.

References

[1] Y. Ishii, H. Hongo, K. Yamamoto and Y. Niwa, "Face and head detection for a real-time surveillance system," in Proc. International Conference on Pattern Recognition, vol.3, pp.298-301, Aug. 2004.

[2] O. Arandjelovic and Roberto Cipolla, "An illumination invariant face recognition system for access control using video," in Proc. British Machine Vision Conference, 2004.

[3] Lloyd A. B. Louw, "Automated face detection and recognition for a login system," M.S. thesis, Dept. Science of Engineering, University of Stellenbosch, 2007.

[4] D.G. Lowe, "Object recognition from local scale-invariant features," in Proc. IEEE International Conference on Computer Vision, vol.2, pp.1150-1157, 1999.

[5] M.F. Tappen, W.T. Freeman and E.H. Adelson, "Recovering intrinsic images from a single image," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.27, no.9, pp.1459-1472, Sept. 2005.

[6] I. Kemelmacher-Shlizerman and R. Basri, "3D Face Reconstruction from a Single Image Using a Single Reference Face Shape," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.33, no.2, pp.394-405, Feb. 2011.

[7] G. Fangi, G. Gagliardini and E.S. Malinverni, "Photointerpretation and small scale stereoplotting with digitally rectified photographs with geometrical constraints," International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, pp. 160-167, 2001.

[8] F. Schaffalitzky and A. Zisserman, "Planar grouping for automatic detection of vanishing lines and points," Image and Vision Computing, vol. 18, pp. 647-658, 2000.

[9] O. Chum and J. Matas, "Planar affine rectification from change of scale," in Proc. Asian Conference on Computer Vision, Springer Berlin Heidelberg, pp. 347-360, 2011.

[10] E. Ribeiro and E.R. Hancock, "Estimating the perspective pose of texture planes using spectral analysis on the unit sphere," Pattern Recognition, vol. 35, no. 10, pp. 2141-2163, 2002.

[11] A.P. Witkin, "Recovering surface shape and orientation from texture," Artificial intelligence, vol. 17, pp. 17-45, 1981.

[12] C. Rasmussen, "Texture-based vanishing point voting for road shape estimation," in Proceedings of the British Machine Vision Conference, 2004.

[13] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in Proc. IEEE International Conference on Computer Vision and Pattern Recognition, vol.1, pp.511-518, 2001.

[14] B. Wu, H. Ai, C. Huang and S. Lao, "Fast rotation invariant multi-view face detection based on real Adaboost," in Proc. IEEE International Conference on Automatic Face and Gesture Recognition, pp.79-84, May 2004.

[15] C. Huang, H. Ai, Y. Li and S. Lao, "Vector boosting for rotation invariant multi-view face detection," in Proc. IEEE International Conference on Computer Vision, vol.1, pp.446-453, Oct. 2005.

[16] C. Garcia, G. Zikos and G. Tziritas, "Face detection in color images using wavelet packet analysis," in Proc. International Conference on Multimedia Computing and Systems, vol.1, pp.703-708, Jul. 1999.

[17] R. Lienhart, A. Kuranov and V. Pisarevsky, "Empirical analysis of detection cascades of boosted classifiers for rapid object detection," Pattern Recognition, Springer Berlin Heidelberg, pp.297-304, 2003.

[18] S. Milborrow, J. Morkel and F. Nicolls, "The muct landmarked face database," Pattern Recognition Association of South Africa, 2010.
