
CHAPTER 1 INTRODUCTION

1.3 Synopsis of the Dissertation

The rest of the dissertation is organized as follows. Chapter 2 describes the proposed horizontal eye line detection and eye-like region detection method. The face-pose identification method and the face detection method for different poses under varying environments are proposed in Chapter 3. Conclusions and future research directions are drawn in Chapter 4.

CHAPTER 2

A NOVEL METHOD FOR HORIZONTAL EYE LINE DETECTION UNDER VARIOUS ENVIRONMENTS

In this chapter, we propose a method to extract the horizontal eye line of a face and some eye-like regions under various environments. The method contains several steps.

First, we will use skin colors to extract candidate skin regions. Next, an eye-like region detector based on intensity and color information will be provided to explore all possible eye-like regions in each candidate skin region. A lighting distribution based algorithm will then be presented to remove some false eye regions. Finally, based on the extracted eye-like regions, the horizontal eye line for a candidate skin region can be located using the gray level distribution of the skin region.

2.1 INTRODUCTION

In a drowsy-driving warning system, a driver's mental condition can be revealed by measuring eye movement from the eye location [1]. How to locate the eyes of a driver under extreme lighting conditions is an important issue for a successful intelligent transportation system. Detecting the eye line can help locate the eyes.

Moreover, the detected eye line can be used for face detection. Although face detection has been studied for more than 20 years, developing a human face detector that works under various environments is still a challenging task. Several factors make face location difficult: the variety of colored lighting sources; facial features such as eyes being partially or wholly occluded by shadows cast by a biased lighting direction; and differences in race and face pose, with or without glasses.

To address these problems, Hjelmas and Low [2] presented an extensive survey of feature-based and image-based algorithms for face detection. Among feature-based methods, the most widely used technique is skin color detection based on a color model. Most image-based approaches rely on multiresolution window scanning to detect faces at all scales, making them computationally expensive. Yang [3] classifies the approaches into four categories: knowledge-based, feature invariant, template matching, and appearance-based, and notes that significant variation in lighting sources is one of the main difficulties of face detection. Several methods [4-18] have been proposed. Some [4-6] use eigenfaces, neural networks, and support vector machines (SVMs) to detect faces of restricted poses under normal lighting conditions. Chow [7] proposed an approach to locate facial features (such as eyes and eyebrows) using an eigenmask under different lighting conditions; however, the method fails to locate the facial features for some poses (such as near-profile and profile faces). Shin and Chuang [8] use shape and contour information to locate a face against a plain background under normal lighting; however, it is difficult to detect the predefined shape of a face in a complex background under varying lighting conditions. Wu et al. [9] proposed an algorithm to locate face candidates using eye-analogue segment information; the algorithm fails when a person wears glasses. Some methods [10-12] use mosaics, edges, and mathematical morphology to detect eyes, but these methods fail to locate eyes when a face is under poor lighting (such as a biased light source). Wang [13] proposed a boosting-based algorithm to detect faces, but its training phase is time consuming and it is designed only for frontal faces. Shih [14] proposed a color-based method that shows reasonable performance in terms of detection rate; however, the detection fails under extreme lighting conditions.

As mentioned above, extreme lighting conditions (such as colored lighting sources and biased lighting directions), different subject poses, glasses, race, and complex backgrounds are factors that make face detection difficult. To address these factors, we propose a method to extract the horizontal eye line of a face and some eye-like regions under various environments. The extracted eye line and eye-like regions can then be used to help extract the true positions of the eyes and the face.

2.2 THE PROPOSED METHOD

The proposed method contains several steps. First, we will use skin colors to extract candidate skin regions. Next, an eye-like region detector based on intensity and color information will be provided to explore all possible eye-like regions in each candidate skin region. A lighting distribution based algorithm will then be presented to remove some false eye regions. Finally, based on the extracted eye-like regions, the horizontal eye line for a candidate skin region can be located using the gray level distribution of the skin region.

2.2.1 The Proposed Skin Region Detector

In several face detection systems [15], skin color plays a major role in segmenting face regions from an image. Several color models are utilized to label pixels as skin, including the RGB, HSV, and YCbCr color spaces [16-18]. Since a face image is usually taken under unpredictable lighting conditions, the RGB color model is sensitive to light and unsuitable for representing skin color. In the YCbCr model, Cr represents the red degree [15]. Since human skin color tends toward red, we adopt the YCbCr color model.

2.2.1.1 Skin Region Extraction

Some skin detectors [14, 15] rely on skin-color distribution modeling. Different illumination levels, such as indoor, outdoor, highlight, and shadow, change the apparent color of skin. However, the size and variety of the training set may affect the skin detector's performance [15]; that is, a trained skin model may require extensive training and fine-tuning to achieve optimal performance for different training samples. In this dissertation, the skin detector instead works by explicit thresholding in skin-color space.

As mentioned above, human skin color tends toward red regardless of race, so we use the Cr value to determine whether a pixel is skin. If a pixel's Cr value is larger than a predefined threshold, it is considered a skin pixel. In this dissertation, a threshold Crth1 is adopted to bound skin color. On the other hand, since the Cr values of skin pixels drop under a high-intensity, blue-biased lighting source, a second, lower threshold Crth2 is also used. That is, for an input image, we first use Crth1 (here, 142) as the threshold to obtain candidate skin regions.

Then, for the same input image, we use Crth2 (here, 132) as the threshold to obtain another set of candidate skin regions. Fig. 2.1 shows some results of applying the skin region extractor to faces of different races. From this figure, we can see that regardless of lighting conditions and race, most skin pixels are extracted (white pixels). After the skin pixels are extracted, connected skin pixels are grouped to form a set of candidate skin regions, each bounded by a red rectangle (see Figs. 2.1(b) and 2.1(c)). From Fig. 2.1, we can also see that some extracted candidate skin regions are not true skin. In the next section, we present a filter to remove such false skin regions.
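As an illustration, the two-threshold Cr test above can be sketched as follows. This is a minimal sketch rather than the dissertation's implementation: the BT.601-style RGB-to-Cr conversion and the function names are assumptions.

```python
import numpy as np

def cr_channel(rgb):
    """Compute the Cr channel (offset 0-255 form) from an RGB image array.

    Assumes a BT.601-style conversion; the dissertation does not state
    which YCbCr variant it uses.
    """
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    b = rgb[..., 2].astype(float)
    return 0.5 * r - 0.419 * g - 0.081 * b + 128.0

def skin_mask(rgb, cr_th):
    """Label a pixel as skin when its Cr value exceeds the threshold."""
    return cr_channel(np.asarray(rgb)) > cr_th
```

Two passes, one with Crth1 = 142 and one with Crth2 = 132, then yield the two candidate skin maps described in the text.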

(a) Four images taken under normal, bias colored and high lighting environments.

(b) The extracted candidate skin regions using threshold 142.

(c) The extracted candidate skin regions using threshold 132.

Fig. 2.1. Some results of applying the skin region extractor to faces of different races under various lighting environments.

2.2.1.2 Skin Region Filter

The skin region filter is designed to remove false skin regions based on four features: size, shape, uniform content, and geometric relation. Each feature can be used to remove several false regions. The details are described as follows.

(1) Size: A candidate skin region is removed if the width or height of its bounding rectangle is less than a threshold Sth. In this dissertation, we set Sth to 15 (see Figs. 2.2(b) and 2.2(d)). On the other hand, if the width of its bounding rectangle is larger than half of the input image width, the region is also removed.

(2) Shape: In general, the shape of a face should be like an ellipse. Thus, if a region looks like a horizontal or vertical thin bar, it should be removed (see Fig. 2.2(b)).

Note that a region with Hs/Ws ≥ Vth is considered a vertical thin bar, and a region with Hs/Ws ≤ Hth is considered a horizontal thin bar. Here, Hs and Ws are the height and width of the region; in this dissertation, we set Vth to 2.5 and Hth to 0.5.

(3) Uniform content: A region with uniform content cannot be a face, since a face contains eyes and eyebrows. Thus, if the standard deviation of the gray values in the bounding rectangle of a candidate skin region is less than a predefined threshold Uth (10), the region is removed (see Fig. 2.2(f)).

(4) Geometric relation: If two candidate skin regions overlap and the smaller one is less than half the size of the bigger one, the smaller one is removed (see Fig. 2.2(g)). On the other hand, if a small region is totally covered by a bigger region, both regions are preserved (see Fig. 2.2(g)). In this case, in the later processing, the smaller one is used first to detect the eye line; if an eye line is found, detection for the bigger one is skipped.
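The size, shape, and uniform-content rules above can be sketched as a single predicate. This is a hypothetical helper, assuming the region's bounding-box dimensions and gray-level standard deviation have been precomputed by the caller; the threshold values follow the text.

```python
# Thresholds quoted in the text: minimum side 15, vertical-bar ratio 2.5,
# horizontal-bar ratio 0.5, uniform-content standard deviation 10.
S_TH, V_TH, H_TH, U_TH = 15, 2.5, 0.5, 10.0

def keep_skin_region(w, h, gray_std, image_width):
    """Return True if a candidate skin region survives rules (1)-(3)."""
    if w < S_TH or h < S_TH:            # (1) too small
        return False
    if w > image_width / 2:             # (1) too wide to be one face
        return False
    ratio = h / w
    if ratio >= V_TH or ratio <= H_TH:  # (2) vertical / horizontal thin bar
        return False
    if gray_std < U_TH:                 # (3) uniform content: no eyes/eyebrows
        return False
    return True
```

The geometric-relation rule (4) is omitted here because it compares pairs of regions rather than a single region.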


(a) A face image. (b) Candidate skin regions using 132 as threshold.


(c) The result of removing regions of small sizes and improper shapes in (b).

(d) Candidate skin regions using 142 as threshold.

(e) The result of removing small regions in (d).

(f) The result of merging (c) and (e), superimposed on (a).

(g) The result of removing uniform-content regions and crossover regions in (f).

Fig. 2.2. Some examples for illustrating the skin region filter.

2.2.2 The Proposed Eye-Like Region Detector

In this section, we provide a method to extract eye-like regions from the remaining candidate skin regions. Fig. 2.3 shows the block diagram of the proposed eye-like region detector. First, based on intensity and skin color information, some eye-like regions are extracted. Next, some false regions are removed. Since the true eye may not be located at the center of an eye-like region, a refining algorithm is presented to adjust the eye-like region. Finally, false eye-like regions appearing in hair or shadow areas are removed, and isolated eye-like regions are also removed.

Fig. 2.3. The block diagram of the eye-like region detector.

In [7], luminance contrast and eye shape are used to detect an eye. However, under various lighting environments, unpredictable shadows appearing on the face make eye detection difficult. Fig. 2.4 shows some face images taken under various lighting environments. To treat this problem, we use two fundamental eye properties to extract eye-like regions. The first property is that the gray values of the pupil and iris are lower than those of skin. Based on this property, an intensity-based technique is provided to locate eye-like regions in the bounding rectangle of each candidate skin region. The second property is that, under a non-colored lighting source, eye color differs markedly from skin color. Based on this property, we consider small non-skin-color regions within the bounding rectangle of a candidate skin region as eye-like regions.

(a) The normal lighting condition. (b) Top lighting source.

(c) Colored and left bias lighting source.

(d) High lighting source.

Fig. 2.4. Subjects taken under various lighting environments.

2.2.2.1 Intensity Based Eye-Like Region Extraction

After extracting candidate skin regions, an intensity based detector will be first provided to extract eye-like regions from the bounding rectangle of each candidate skin region. Fig. 2.5 shows its flowchart.


Fig. 2.5. The flowchart of the intensity based eye-like region detector.

Based on the fact that the pupil's gray values are lower than those of skin, the detector adopts bi-level thresholding to extract eye-like regions. First, for a candidate skin region, several threshold values t1 are used for bi-level thresholding; the maximum and minimum of t1 are denoted t1_MAX and t1_MIN, respectively. t1_MAX is set to the average gray value of the 10% of pixels with the highest gray values in the candidate skin region, and t1_MIN is set to the average gray value of the 5% of pixels with the lowest gray values. At the beginning, a bi-level thresholding operator with t1_MAX as the initial threshold is applied to extract eye-like regions; the operator is then applied iteratively, with t1 reduced by a value Ith (here, 10) each time, until t1 reaches t1_MIN. Fig. 2.6 shows an example of evaluating t1_MAX and t1_MIN, and the result of applying the intensity-based eye detector.
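The threshold schedule described above can be sketched as follows. This is a minimal sketch, covering only the computation of t1_MAX, t1_MIN, and the sequence of thresholds; per-pass region extraction and grouping are omitted, and the function name is an assumption.

```python
import numpy as np

def threshold_schedule(gray, ith=10):
    """Return the sequence of bi-level thresholds t1 for a skin region.

    t1 starts at the mean of the brightest 10% of pixels (t1_MAX),
    decreases by Ith per pass, and stops at the mean of the darkest
    5% of pixels (t1_MIN), as described in the text.
    """
    v = np.sort(np.asarray(gray, dtype=float).ravel())
    n = v.size
    t_min = v[: max(1, n // 20)].mean()    # darkest 5% -> t1_MIN
    t_max = v[-max(1, n // 10):].mean()    # brightest 10% -> t1_MAX
    thresholds = []
    t1 = t_max
    while t1 >= t_min:
        thresholds.append(t1)
        t1 -= ith
    return thresholds
```

Each pass would then label pixels below the current t1 as candidate eye-like pixels (mask = gray < t1) before the black regions are grouped and filtered.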

(a) A face region.

(b) The histogram of the gray values within the skin region, with t1_MIN and t1_MAX marked.

(c) Bi-level thresholding results of (a) using different thresholds (t1 = 146, 86, and 56).

Fig. 2.6. An example of the bi-level thresholding results.

After each application of the bi-level thresholding operator, every resulting black region is considered a candidate eye-like region. However, some of these black regions are false. Here, we provide three filters to remove false eye-like regions. The first is a geometric filter, which removes regions that are too small or too large: a black region whose size is less than Sth2 (here, Sth2 is set as 5x5) or whose width (height) is larger than Sth3 (here, we set Sth3 to 1/4 of the skin region width) is removed. The second is a statistical filter: if the standard deviation of the gray values of an eye-like region is lower than a predefined threshold (here, 10), the region is considered false and removed. From the structure of an eye, we can see that the gray levels of the iris and pupil are lower than those of skin. Based on this property, a third filter, called the projection filter, removes each black region whose average gray value is larger than that of its upper or lower neighboring areas. A horizontal gray value projection histogram (see Fig. 2.7) is used to implement this filter: if the average gray value of a region is larger than that of the highest peak above or below its position, the region is removed. Fig. 2.8 shows the final result of applying the intensity-based eye-like region detector to a face image; each extracted eye-like region is enclosed by a red rectangle.
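The geometric and statistical filters above can be sketched as follows. This is a minimal sketch under stated assumptions: Sth2 = 5x5 is interpreted here as a 25-pixel minimum area, and the helper names are hypothetical; the projection filter is omitted because it needs the surrounding histogram.

```python
import numpy as np

def pass_geometric(w, h, skin_w):
    """Geometric filter: reject regions that are too small or too large."""
    if w * h < 5 * 5:                     # smaller than Sth2 (5x5)
        return False
    if w > skin_w / 4 or h > skin_w / 4:  # larger than Sth3 (1/4 skin width)
        return False
    return True

def pass_statistical(region_gray):
    """Statistical filter: reject regions with uniform gray values."""
    return bool(np.std(np.asarray(region_gray, dtype=float)) >= 10.0)
```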


(a) The projection region extended from eye-like region.

(b) The projected histogram of the projection region in (a).

Fig. 2.7. An example of the horizontal gray value projection for an eye.

(a) The result of a normal lighting source face.

(b) The result of a colored and left bias lighting source face.

Fig. 2.8. The results of applying the intensity based algorithm.

2.2.2.2 Non-Skin Color Based Eye Detector

Under a very low lighting environment, an eye-like region extracted by the intensity-based detector may contain a large non-eye part (see the left eye in Fig. 2.8(b)). This motivates a second detector that uses non-skin color information to locate eyes. First, non-skin-color regions within a candidate skin region are considered eye-like regions. The three false eye-like region filters described in Section 2.2.2.1 are then used to remove some false regions. Fig. 2.9(a) shows the result of applying the non-skin-color-based eye detector; each eye-like region is enclosed by a green rectangle. From this figure, we can see that the left eye is precisely located.

Fig. 2.9(b) shows the combined result of Fig. 2.8(b) and Fig. 2.9(a).
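The non-skin-color cue above can be sketched as follows. This is a minimal sketch: reusing the skin threshold Crth2 from Section 2.2.1 as the non-skin cutoff is an assumption, as is the function name.

```python
import numpy as np

def non_skin_eye_mask(cr_patch, cr_th=132):
    """Mark potential eye pixels inside a candidate skin region.

    cr_patch holds the Cr values inside the region's bounding rectangle;
    pixels at or below the skin threshold are treated as non-skin and
    hence as potential eye pixels.
    """
    return np.asarray(cr_patch) <= cr_th
```

Small connected groups of marked pixels would then be kept as eye-like regions and passed through the three filters of Section 2.2.2.1.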

(a) The eye-like regions located based on non-skin color information.

(b) The combined result of Fig. 2.8(b) and (a).

Fig. 2.9. The result of applying the proposed intensity and non-skin color based eye detectors.

2.2.3 False Eye-Like Region Removing

The obtained eye-like regions still contain some false regions, which we classify into five classes: overlapped, hair-reflecting, beard/clothes, isolated, and forehead-hair (see Fig. 2.10). In the following, we provide procedures to remove these regions.


(b) Another example showing the forehead-hair class.

Fig. 2.10. Five false region classes.

2.2.3.1 Overlapping/Hair-Reflecting/Beard/Clothes False Region Removing

For the overlapping class, if one eye-like region totally covers a smaller eye-like region, the covering region is removed. To remove hair-reflection regions, a horizontal gray level projection histogram (see Fig. 2.11(b)) is first created. The first peak location from the top of the projection histogram is then defined as the upper line, which usually indicates the forehead; each eye-like region intersecting the upper line is removed. For the beard/clothes class, we first define a bottom line at distance h/2 below the upper line, where h is the distance between the upper line and the bottom of the skin region; all eye-like regions below this line are removed. Fig. 2.11(d) shows the result of applying this removal procedure to Fig. 2.11(a).
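The upper-line/bottom-line computation above can be sketched as follows. This is a minimal sketch under stated assumptions: the "first peak" is taken here as the first local maximum of the row-mean projection from the top, which is a simplification of whatever peak detection the dissertation actually uses.

```python
import numpy as np

def upper_and_bottom_lines(gray):
    """Locate the upper (forehead) and bottom lines of a skin region.

    The horizontal projection is the mean gray value of each row.  The
    first local peak from the top gives the upper line; the bottom line
    lies h/2 below it, where h is the distance from the upper line to
    the bottom of the region.
    """
    proj = np.asarray(gray, dtype=float).mean(axis=1)  # one value per row
    upper = 0
    for y in range(1, len(proj) - 1):                  # first local peak
        if proj[y] >= proj[y - 1] and proj[y] > proj[y + 1]:
            upper = y
            break
    h = (len(proj) - 1) - upper
    bottom = upper + h // 2
    return upper, bottom
```

Eye-like regions intersecting the upper line or lying below the bottom line would then be discarded, as described above.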


(a) A bounding rectangle of a candidate skin region.

(b) The horizontal projection histogram of the bounding rectangle in (a).


(c) The detected upper and bottom lines.

(d) The result after applying the removing procedure presented in Section 2.2.3.1 to (a).

Fig. 2.11. The result of applying the false eye-like removing procedure.

2.2.3.2 Eye Position Refining And Isolated False Region Removing

For a non-rotated human face, the eye pair should lie on a near-horizontal line. Thus, for an eye-like region, if no other region can be found to its left or right, the region is called an isolated region and is removed. However, under various environments, a true eye may not be located at the center of the extracted eye-like region (see Fig. 2.12(a)), so we cannot know where the eye is. To treat this problem, we provide an eye position refining method to locate the true eye position within the eye-like region. First, we classify the frontal images of HHI

into three classes according to lighting condition: normal, high light, and low light.

Next, one image is selected at random from each class; considering the effect of glasses, the normal class has two selected images, one with and one without glasses. From each image, three templates are extracted (see Table 2.1): an eye area including the eyelid is taken as the first template; within the first template, an area containing only the eye is taken as the second template; and within the second template, an area containing only the eyeball is taken as the third template. After obtaining the twelve templates, we use each template to mask each eye-like region by sliding the template pixel by pixel over the region to find the best-matched area for each class. Covariance is used to measure the similarity. If the located eye center of an eye-like region does not lie on the true eye, we take three new templates from the image containing that region using the previous method. The procedure is repeated until all eye-like regions are processed. After the procedure finishes, we find that nine new templates (see Table 2.1) belonging to three new classes are extracted. These three new classes are left-side (the eye looking to the left), right-side (the eye looking to the right), and bias lighting. Since the images in HHI have different face sizes, the extracted templates also have different sizes.
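The sliding-template search above can be sketched as follows. This is a minimal sketch: the text states that covariance measures the similarity, but the exact normalization is not given, so plain (unnormalized) covariance is an assumption, as is the function name.

```python
import numpy as np

def best_match(region, template):
    """Slide the template over the region and return the (row, col) of
    the window with the highest covariance against the template."""
    region = np.asarray(region, dtype=float)
    t = np.asarray(template, dtype=float)
    t0 = t - t.mean()                     # mean-centered template
    th, tw = t.shape
    best, best_pos = -np.inf, (0, 0)
    for y in range(region.shape[0] - th + 1):
        for x in range(region.shape[1] - tw + 1):
            w = region[y:y + th, x:x + tw]
            score = float(np.mean((w - w.mean()) * t0))  # covariance
            if score > best:
                best, best_pos = score, (y, x)
    return best_pos
```

In practice, each of the twelve class templates would be slid over every eye-like region, and the best-scoring window per class taken as that class's candidate eye position.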

After the eye templates are made, the aforementioned similarity matching procedure is adopted to find the best-matched area in a test image. The similarity function is defined as follows.
