
CHAPTER 3 SKIN COLOR REGION DETECTION

3.2 Preprocessing of Face Detection

Given a color image, the first step is to segment the skin regions of the image. At every skin pixel, the face detection algorithm is run with sub-windows of different sizes. The detector sometimes makes mistakes and recognizes non-faces as faces, as shown in Fig. 3.7.

(a) (b)

Fig. 3.7 Two examples of bad results of face detection.

To eliminate these false results, we use a simple method before face detection. This method is based on the ratio of skin region area to total area in the sub-window, as shown in Eq. 3-2:

$$\text{skin ratio} = \frac{\text{the area of the skin region in the sub-window}}{\text{the total area of the sub-window}} \times 100\% \qquad (3\text{-}2)$$

The skin ratio represents the percentage of skin pixels in the sub-window. It is apparent that non-faces and faces can be distinguished by applying an appropriate threshold value. The non-face samples will be rejected when the skin ratio is less than the threshold.

The threshold is estimated from 535 correctly detected face samples in our experiment. We compute the skin ratio of every sample and sort the values by magnitude. Fig. 3.8 shows some examples from our experiment, and Fig. 3.9 shows the distribution over all correctly detected face samples.

Fig. 3.8 Some examples in our experiment and their skin ratios (83%, 99%, 76%, and 91%).

Fig. 3.9 Distribution of the skin ratio of 535 correctly detected face samples.

From Fig. 3.9, over 95% of the correctly detected face samples have a skin ratio greater than 65%, so a threshold of 65% is used in our experiment. Fig. 3.10 shows two examples from Fig. 3.6; the results with preprocessing are satisfactory.

(a) (b)

Fig. 3.10 Two examples with preprocessing from Fig. 3.6.
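The skin ratio test of Eq. 3-2 can be illustrated with a minimal Python sketch. It assumes that the skin segmentation result is available as a binary mask (1 = skin pixel, 0 = non-skin pixel) and that sub-windows are square; the function names and the toy mask are hypothetical and only serve the illustration.

```python
def skin_ratio(mask, top, left, size):
    """Percentage of skin pixels inside a square sub-window of a binary mask."""
    skin = sum(
        mask[r][c]
        for r in range(top, top + size)
        for c in range(left, left + size)
    )
    return 100.0 * skin / (size * size)


def passes_preprocessing(mask, top, left, size, threshold=65.0):
    """Keep the sub-window only if its skin ratio reaches the threshold."""
    return skin_ratio(mask, top, left, size) >= threshold


# Toy 4x4 skin mask: 1 = skin pixel, 0 = non-skin pixel (made-up data).
mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(skin_ratio(mask, 0, 0, 2))            # 2x2 window, all four pixels are skin
print(passes_preprocessing(mask, 0, 0, 4))  # 5 of 16 pixels are skin
```

Only sub-windows that pass this test would be handed to the face detector, which is where the reduction in false alarms and scanning time comes from.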

CHAPTER 4 EXPERIMENTAL RESULTS AND DISCUSSIONS

4.1 Detection sub-windows

4.1.1 Nearest neighbor interpolation

For face detection, we scan the input image with detection sub-windows of different sizes, because the faces in an image usually appear at different scales. We therefore normalize all sub-windows to a standard size (24*24 pixels). The sub-windows are resized using nearest neighbor interpolation [11].

Nearest neighbor interpolation is shown as below:

where $0 \le x < m' < m$, $0 \le y < n' < n$, and floor() is the function that rounds a floating-point number down to an integer.

For nearest neighbor interpolation, each subsampled pixel takes the value of the nearest original pixel. Fig. 4.1(a) shows the original image in the sub-window, and Figs. 4.1(b)-(c) show the images after transformation.

The subsampled image may appear step-like after transformation, but this has little effect on the following procedure.

(a) (b) (c)

Fig. 4.1 (a) The original image with 100*100 pixels. (b)(c) The subsampled images with 50*50 pixels and 24*24 pixels.
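The resizing step can be sketched in a few lines of Python. This is a minimal sketch under the assumption that the usual floor-based index mapping is used (each target pixel reads the source pixel at the floored, rescaled coordinate); the function name `resize_nearest` and the toy image are hypothetical.

```python
from math import floor


def resize_nearest(image, new_h, new_w):
    """Resize a 2-D list of pixel values with nearest neighbor interpolation."""
    old_h, old_w = len(image), len(image[0])
    return [
        [
            # Map the target coordinate back to the source grid and floor it.
            image[floor(y * old_h / new_h)][floor(x * old_w / new_w)]
            for x in range(new_w)
        ]
        for y in range(new_h)
    ]


# Subsample a toy 4x4 image to 2x2 (pixel values are made up).
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
small = resize_nearest(image, 2, 2)
print(small)
```

Because every target pixel simply copies one source pixel, no arithmetic on pixel values is needed, which keeps the normalization to 24*24 pixels cheap.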

4.1.2 The threshold of confidence

As mentioned in Section 2.2, a sub-window is classified as positive if the final hypothesis exceeds the AdaBoost threshold (i.e. $\frac{1}{2}\sum_{t}\alpha_t$). We define the confidence of a sub-window as:

Conf(x) is an index that decides whether the output of the classifier is acceptable. A high confidence value means that the sub-image satisfies more features. In the test procedure, when Conf(x) is larger than the threshold Th, the sub-image is regarded as positive:

$$\mathrm{Conf}(x) > Th \qquad (4\text{-}2)$$
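The decision rule of Eq. 4-2 can be sketched as follows. The sketch assumes that Conf(x) is the normalized weighted vote of the weak classifiers, that is, the sum of α_t h_t(x) divided by the sum of α_t. This is an assumption rather than a formula stated in the text; it is chosen because a threshold of 0.5 then coincides with the usual AdaBoost decision rule. The names `alphas` and `votes` and all numbers are hypothetical.

```python
def confidence(alphas, votes):
    """Normalized weighted vote: sum(alpha_t * h_t(x)) / sum(alpha_t) (assumed form)."""
    return sum(a * h for a, h in zip(alphas, votes)) / sum(alphas)


def is_positive(alphas, votes, th=0.5):
    """Accept the sub-window when its confidence exceeds the threshold Th."""
    return confidence(alphas, votes) > th


# Made-up weights and weak-classifier outputs (1 = feature satisfied).
alphas = [2.0, 1.0, 1.0]
votes = [1, 0, 1]
print(confidence(alphas, votes))        # 3.0 / 4.0
print(is_positive(alphas, votes, 0.6))  # confidence 0.75 exceeds 0.6
print(is_positive(alphas, votes, 0.8))  # confidence 0.75 does not exceed 0.8
```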

The threshold Th is adjusted according to the test image set. In the beginning, the threshold is initialized to 0.5. Then, the threshold is gradually increased until the sum of the miss rate and the false alarm rate is minimized. In a real-time system, the optimal threshold is selected according to the illumination and background conditions at that time. In our photo test experiment, the optimal threshold of 0° face detection is set to 0.6, and the optimal thresholds of ±45° face detection are set to 0.67, as shown in Fig. 4.2. In our real-time system, the optimal threshold of 0° face detection is set to 0.56, and the optimal thresholds of ±45° face detection are set to 0.6.

Fig. 4.2 The optimal threshold selected at each layer.

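The threshold selection described above can be sketched as a simple sweep. The sketch assumes that confidence values are available for labeled face and non-face sub-windows, uses a step of 0.01, and measures the false alarm rate as the fraction of non-face sub-windows accepted; the step size, this rate definition, the function name, and the sample values are all assumptions made for illustration.

```python
def best_threshold(face_confs, nonface_confs, start=0.5, stop=1.0, step=0.01):
    """Sweep Th upward and keep the value minimizing miss + false alarm rate."""
    best_th, best_cost = start, float("inf")
    steps = int(round((stop - start) / step))
    for i in range(steps + 1):
        th = start + i * step
        # A face is missed when its confidence does not exceed Th.
        miss = sum(c <= th for c in face_confs) / len(face_confs)
        # A non-face is a false alarm when its confidence exceeds Th.
        false_alarm = sum(c > th for c in nonface_confs) / len(nonface_confs)
        cost = miss + false_alarm
        if cost < best_cost:
            best_th, best_cost = th, cost
    return round(best_th, 2), best_cost


# Made-up confidence values for labeled sub-windows.
face_confs = [0.91, 0.83, 0.755, 0.705, 0.555]
nonface_confs = [0.525, 0.585, 0.615, 0.405, 0.305]
th, cost = best_threshold(face_confs, nonface_confs)
print(th, cost)
```

Ties are resolved in favor of the smallest threshold, since the comparison is strict and the sweep starts at 0.5.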

4.1.3 Classification of potential faces

In the experiment, we mark the detected faces with red squares.

As shown in Fig. 4.3(b), there are usually many red squares for a single face. To eliminate this phenomenon, we merge adjacent red squares into one.

The first step is to compute the centers (x1, … ,xN) of all potential face sub-windows, where N is the number of potential face sub-windows.

Secondly, we assign x1 to y1, which is the first member of the first class C1. For each remaining center, we then meet one of two conditions:

Condition 1 (when the number of classes k is 1):

We compute the distance between y1 and each of x1, … ,xN. If the distance between y1 and xi is larger than the threshold, we assign xi to y2, which is the first member of the second class C2. If the distance between y1 and xi is smaller than the threshold, we assign xi to C1.

Condition 2 (when the number of classes k is larger than 1):

We compute the distances between y1, … ,yk and each remaining center. If the distances between y1, … ,yk and xi are all larger than the threshold, we assign xi to yk+1, which is the first member of the (k+1)th class Ck+1. If the distances between some yj and xi are smaller than the threshold, we assign xi to the class C_j₀ whose first member y_j₀ has the minimal distance to xi.

Condition 2 is repeated for the remaining centers until all potential face sub-windows are classified. Finally, we select the member of each class with the highest confidence value. The result is shown in Fig. 4.3(c).

The detailed algorithm is shown as below:

- Define:
    x1, … ,xN: the centers of the selected sub-windows
    C1, … ,Ck: the set of classes, where k is the number of classes
    y1, … ,yk: the first members of the classes

- Choose x1 as the first member of C1: y1 ← x1, and k = 1.

- For i = 2, … ,N:
    if || xi - yj || > Th for all j = 1, … ,k then
        yk+1 ← xi
        k ← k + 1
    else
        find j₀ such that || xi - yj₀ || = min_j || xi - yj ||
        xi ∈ Cj₀

- Select the member with the highest confidence value in Cj, ∀ j = 1, … ,k.
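The grouping steps above can be sketched in Python as follows. This is a minimal sketch, assuming that centers are 2-D points, that the distance is Euclidean, and that one confidence value is available per sub-window; the function name `group_detections` and the sample data are hypothetical.

```python
from math import dist


def group_detections(centers, confidences, th):
    """Greedy grouping of sub-window centers; returns the best index per class."""
    if not centers:
        return []
    leaders = [0]    # index of the first member y_j of each class
    classes = [[0]]  # member indices of each class C_j
    for i in range(1, len(centers)):
        distances = [dist(centers[i], centers[j]) for j in leaders]
        nearest = min(range(len(leaders)), key=distances.__getitem__)
        if distances[nearest] > th:
            # Farther than Th from every first member: start a new class.
            leaders.append(i)
            classes.append([i])
        else:
            # Join the class whose first member is closest.
            classes[nearest].append(i)
    # Keep the member with the highest confidence in each class.
    return [max(members, key=confidences.__getitem__) for members in classes]


# Made-up centers of five potential face sub-windows around two faces.
centers = [(10, 10), (12, 11), (50, 50), (11, 9), (52, 49)]
confidences = [0.6, 0.9, 0.7, 0.65, 0.8]
print(group_detections(centers, confidences, th=10))
```

Each class is represented only by its first member when distances are measured, so the result depends on the order in which the sub-windows are visited; this follows the description above rather than a full clustering method.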

(a) (b) (c)

Fig. 4.3 (a) The original images. (b) The detection results without applying our algorithm. (c) The detection results with our algorithm applied.

4.2 Experiments on Real-World Photos

Our experiments are run on an AMD 3500+ processor. The 100 test photos, which contain 308 faces, are all resized to 320*240 pixels.

4.2.1 Comparison of the Performances of Two Different Feature Selection Types

Here, we compare two different feature selection types. One is AdaBoost, described earlier, and the other selects features according to the order of their error rates among all features. Unlike AdaBoost, the second type selects all features in a single round. To show that AdaBoost is more robust, we test both types on our database and compare their performances, as shown in Table 4.1. The table shows that the two feature selection methods have almost the same detection rate. On the other hand, the false alarm rate without AdaBoost is too high to be tolerable, as shown in Fig. 4.4.

Table 4.1 Comparison of the performances of two different feature selection types.

Fig. 4.4 Comparison between results obtained using feature selection without AdaBoost and with AdaBoost (panels: No AdaBoost, AdaBoost).

From Table 4.1, we can see that the 200 features selected without AdaBoost at each layer introduce a high false alarm rate of 78.94%. Hence, to improve the performance, we select the 100 better features as a new feature set. These 100 features come from the original 200 features, and we select them according to the number of times each feature is regarded as positive among all false alarm cases.

We test our database with the new feature set and compare its performance with that of the old one in Table 4.2. We find that the new method does not help: the performance with the new feature set is worse than with the old one. A likely reason is that the 100 eliminated features may be important at other positions in an image, so the possibility of false alarms becomes higher.

Feature set    Detection Rate    Miss Rate    False Alarm Rate
Old            87.99%            12.01%       78.94%
New            87.66%            12.34%       83.08%

Table 4.2 Comparison of the performances of two different feature sets. Old: 200 features. New: 100 better features.

4.2.2 Comparison of the Performances using features with and without confidence weight, α, selected by AdaBoost

Here, we compare the performances using features with and without the confidence weight, α, selected by AdaBoost. As mentioned in Section 2.3, the confidence weight represents the importance of each selected feature: the higher the confidence weight, the more important the selected feature. Table 4.3 shows the performances with and without the confidence weight. From the table, we can see that the method with the confidence weight performs better than the method without it, but the method without the confidence weight is still much better than feature selection without AdaBoost. Therefore, the features selected by AdaBoost are critical for testing real-world photos.

               Detection Rate    Miss Rate    False Alarm Rate
With α         88.31%            11.69%       12.26%
Without α      84.09%            15.91%       15.08%

Table 4.3 Comparison of the performances with and without the confidence weight.

4.2.3 Comparison of the Performances of Two Different Systems

Fig. 4.5 shows two different systems. Fig. 4.5(a) is the system described earlier. As shown in Fig. 4.5(b), all sub-windows first go through detection to determine whether they are human face images, and the face images are then separated into three classes (-45°, 0°, 45°). In the grey part of the figure, the training set is composed of faces of all three poses (-45°, 0°, and 45°). Table 4.4 shows the performances of (a) and (b). We can see that (a) is better in detection rate but worse in false alarm rate. Thus, we compare their performances under an equal false alarm condition by adjusting the thresholds of (b), as shown in Table 4.5. From the table, we can see that both systems produce 38 false alarm cases and that the detection rate of (a) is still better than that of (b). In our experiment, 13.31% of the positive examples are rejected at the grey part in Fig. 4.5(b). Hence, the grey part is a major reason why the miss rate of (b) is higher than that of (a).

Fig. 4.5 The structures of two different systems.

Table 4.4 The performances of two different systems.

Table 4.5 The performances of two different systems under equal false alarm condition.
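As a rough arithmetic check on how the reported rates relate to raw counts, the sketch below recomputes them for the 308 test faces. Two things are assumptions: the count of 272 detected faces is inferred from the reported 88.31% detection rate, and the false alarm rate is taken as false alarms divided by all reported detections. The text does not define the false alarm rate explicitly; this form is used only because, with the 38 false alarm cases mentioned above, it reproduces the 12.26% listed in Table 4.3.

```python
def rates(total_faces, detected_faces, false_alarms):
    """Return (detection, miss, false alarm) rates in percent."""
    detection = 100.0 * detected_faces / total_faces
    miss = 100.0 - detection
    # Assumed definition: false alarms over all reported detections.
    false_alarm = 100.0 * false_alarms / (detected_faces + false_alarms)
    return round(detection, 2), round(miss, 2), round(false_alarm, 2)


# 308 faces in the test set; 272 detections is inferred, not reported.
print(rates(308, 272, 38))
```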

4.2.4 Dynamic analysis of systems without and with preprocessing

Table 4.6 shows the dynamic analysis of the systems without and with skin ratio preprocessing.

Table 4.6 Comparison of the dynamic analysis of systems (a) without skin ratio preprocessing and (b) with skin ratio preprocessing.

In the table, Level A does not use skin ratio preprocessing. Level A1 uses 0° face detection; its detection rate is 82.14% and its miss rate is 17.86%. After adding +45° face detection (Level A2), the detection rate increases to 82.79% and the miss rate falls to 17.21%. In other words, 0.65% (17.86% - 17.21%) of the faces, which are missed by 0° face detection alone (Level A1), can be detected at Level A2. Finally, after adding -45° face detection (Level A3), the detection rate increases to 84.09% and the miss rate falls to 15.91%. This means that 1.3% (17.21% - 15.91%) of the faces, which are still missed at Level A2, can be detected at Level A3.

On the other hand, ±45° face detection increases the possibility of false alarms. The false alarm rate of Level A1 is 57.76%. After adding +45° face detection (Level A2), it increases to 60.59%, and after adding -45° face detection (Level A3), it increases to 62.02%. This means that 2.83% (60.59% - 57.76%) of non-faces, originally rejected at Level A1, are mistaken for +45° faces at Level A2, and a further 1.43% (62.02% - 60.59%), originally rejected at Level A2, are mistaken for -45° faces at Level A3.

Level B uses skin ratio preprocessing. Under 0° face detection (Level B1), 14.94% of the faces cannot be detected. After adding +45° face detection (Level B2), only 13.31% of the faces cannot be detected, so 1.63% (14.94% - 13.31%) of the faces, which are missed at Level B1, can be detected at Level B2. Finally, after adding -45° face detection (Level B3), only 11.69% of the faces cannot be detected, consistent with the 88.31% detection rate of Level B3, so a further 1.62% (13.31% - 11.69%) of the faces, which are missed at Level B2, can be detected at Level B3. Fig. 4.6(a) shows that Level B3 can detect faces in more poses than Level B1.

Likewise, the false alarm rate of Level B1 is 6.16%. After adding +45° face detection (Level B2), it increases to 8.59%, and after adding -45° face detection (Level B3), it increases to 12.26%. This means that 2.43% (8.59% - 6.16%) of non-faces, originally rejected at Level B1, are mistaken for +45° faces at Level B2, and a further 3.67% (12.26% - 8.59%), originally rejected at Level B2, are mistaken for -45° faces at Level B3. Fig. 4.6(b) shows that the result of Level B1 is better than that of Level B3 in false alarm rate.

In terms of skin ratio preprocessing, we compare Level A1 and Level B1. The detection rate of Level A1, without skin ratio preprocessing, is 82.14%, and the detection rate of Level B1, with skin ratio preprocessing, is 85.06%, so the system with preprocessing performs better. The false alarm rate of Level A1 is 57.76% and that of Level B1 is 6.16%, so the false alarm rate without skin ratio preprocessing is 51.6 percentage points (57.76% - 6.16%) higher than with it. This is because the majority of non-face sub-windows are rejected by the preprocessing. As mentioned in Section 3.2, if a real human face exists in a sub-window, most of the sub-window must be skin colored. Besides, the scanned regions are restricted to skin color regions, so the correct results are not influenced by complex backgrounds, which also improves the detection rate.

Comparing Level A3 and Level B3, the detection rate of Level A3, without skin ratio preprocessing, is 84.09%, and the detection rate of Level B3, with skin ratio preprocessing, is 88.31%. The false alarm rate of Level A3 is 62.02% and that of Level B3 is 12.26%, so the false alarm rate without skin ratio preprocessing is 49.76 percentage points (62.02% - 12.26%) higher than with it. Fig. 4.6(c) shows that the system without preprocessing introduces a higher false alarm rate.

Fig. 4.6 Comparison of the results of different levels in our system. (a) Level B1 and Level B3. (b) Level B1 and Level B3. (c) Level A3 and Level B3.
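The level-by-level differences quoted in the dynamic analysis are plain subtractions of consecutive rates. The short sketch below reproduces them for Level A from the rates given in the text; the helper name `step_changes` is hypothetical, and negative values indicate a decrease.

```python
# Miss and false alarm rates (%) of Levels A1, A2, A3 as reported in the text.
miss_rate = {"A1": 17.86, "A2": 17.21, "A3": 15.91}
false_alarm_rate = {"A1": 57.76, "A2": 60.59, "A3": 62.02}


def step_changes(rates):
    """Change in percentage points between consecutive levels."""
    levels = list(rates)
    return {
        f"{a}->{b}": round(rates[b] - rates[a], 2)
        for a, b in zip(levels, levels[1:])
    }


print(step_changes(miss_rate))
print(step_changes(false_alarm_rate))
```

Each added pose detector lowers the miss rate slightly while raising the false alarm rate, which is the trade-off discussed above.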

In terms of computational time, the system without preprocessing must scan the whole image, so it takes longer and its computation time is nearly constant. From Table 4.6, we can see that the system without preprocessing requires 2100~2300ms, whereas the system with preprocessing requires only 80~2000ms. This is because most of the image is non-skin region and we only need to scan the skin color regions.

When the skin color area is small, detection is very fast. Table 4.7 shows the average processing time per image for different skin color region sizes in our test set.

Table 4.7 The average processing time per image with different skin color region sizes.

4.2.5 Testing on Real-life Photos

After comparing the different levels of our system, we show some of our experimental results. Fig. 4.7 and Fig. 4.8 show some good and bad test results, respectively.

Fig. 4.7 Some good test results by applying our system.

Fig. 4.8 Some bad test results by applying our system.

4.3 Experiments on Real-Time System

In this section, we implement a real-time face detection system. The input image from a webcam is captured by a function provided by OpenCV [12]. OpenCV (Open Source Computer Vision) is a library of programming functions mainly aimed at real-time computer vision.

Example applications of the OpenCV library are human-computer interaction, object identification, face recognition, gesture recognition, motion tracking, and mobile robotics. OpenCV provides a structure named IplImage to process raw bmp data. Fig. 4.9 shows the members of the structure IplImage. We need to point the member imageData at the bmp data captured from the webcam and initialize the width and height of the image.

Fig. 4.9 The members of the structure IplImage.

After initializing the information of the input image, we proceed with the real-time face detection procedure. The detection results of our real-time system are shown in Fig. 4.10. Our face detector can process a 320*240 pixel image in 20~400ms, depending on the skin color area. Compared with the OpenCV face detector, our detector is faster when the skin color area is small, as shown in Table 4.8.

Fig. 4.10 Real-time face detection.

                     Detection Rate    Detection Time
OpenCV Detector      94.48%            70~85ms
Our Face Detector    88.31%            20~400ms

Table 4.8 Comparison of our real-time face detector and the OpenCV real-time face detector. In the single-face test, detecting the face requires 20~70ms when the face scale is under 55*55 pixels.

4.4 Implementation of Real-Time Monitoring System

Here, we construct a real-time monitoring system with the face detection technique. Fig. 4.11 and Fig. 4.12 show the execution results of the server and the client. When the server program is started, it captures images from the USB camera connected to the server PC. The streaming service is started by clicking the “Streaming Start” button of the server interface. In Fig. 4.12, the client program connects to the server and receives live images after the user enters the IP address of the server and presses the “Open” button of the client interface. The right side of the client interface shows the outcome of face detection after the “image processing” button is pressed.

Fig. 4.11 Server interface.

Fig. 4.12 Client interface with face detection.

CHAPTER 5 CONCLUSIONS AND FUTURE WORK

In this thesis, we proposed a robust face detection method that has good detection speed and can detect faces over a wide range of rotation.

Experimental results show that our system achieves a higher detection rate than the system without AdaBoost, the system without multi-pose face detection, and the system without skin ratio preprocessing. We also use features selected with and without AdaBoost to test photos.

The experimental results show that AdaBoost is robust enough to reduce the false alarm rate effectively. In terms of detection speed, the method with skin ratio preprocessing is much faster than the method without it. Depending on the skin color area over the whole image, it can save 20~95% of the computational time. Therefore, our system can be widely used in real-time applications.

In the future, we plan to integrate a face recognition approach into our system and to further improve the detection rate and speed. This will help enhance the standard of living.

References

[1] Face detection technology on digital cameras: http://www.letsgodigital.org/en/14826/face-detection-technology/

[2] R.-L. Hsu, M. Abdel-Mottaleb, and A. K. Jain, “Face detection in color images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 696-706, May 2002.

[3] J. Kovac and P. Peer, “Human skin colour clustering for face detection,” EUROCON 2003. International Conference on Computer as a Tool, Ljubljana, Slovenia, Sept. 2003.

[4] P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple features,” Computer Vision and Pattern Recognition, vol. 1, pp. 8-14, 2001.

[5] P. Viola and M. Jones, “Fast multi-view face detection,” Tech. Rep. TR2003-96, Mitsubishi Electric Research Laboratories, July 2003.

[6] Y. Wang, H. Ai, B. Wu, and C. Huang, “Real time facial expression recognition with AdaBoost,” ICPR, 2004.

[7] P. J. Phillips, H. J. Moon, S. A. Rizvi, and P. J. Rauss, “The FERET evaluation methodology for face recognition algorithms,” PAMI, 22(10), pp. 1090-1104, Oct. 2000.

[8] Y. Freund and R. E. Schapire, “A decision-theoretic generalization of on-line learning and an application to boosting,” Computational Learning Theory: Eurocolt ’95, pp. 23-37, Springer-Verlag, 1995.

[9] CCIR, “Encoding parameters of digital television for studios,” CCIR Recommendation 601-2, Int. Radio Consult. Committee, Geneva, Switzerland, 1990.

[10] D. Chai and A. Bouzerdoum, “A Bayesian approach to skin color classification in YCbCr color space,” TENCON 2000. Proceedings, IEEE, Kuala Lumpur, Malaysia, vol. 2, pp. 421-424, Sept. 2000.

[11] Nearest Neighbor Interpolation: http://www.dpreview.com/learn/?/key=interpolation

[12] Open Source Computer Vision Library (OpenCV): http://opencvlibrary.sourceforge.net/ and http://www.intel.com/technology/computing/opencv/index.htm
