CHAPTER 2 FACE SEGMENTATION
2.3 Face Segmentation Algorithm
The algorithm in [3] is an unsupervised segmentation algorithm, so no design parameter needs manual adjustment to suit a particular input image. The only principal assumption is that a person’s face is present in the given image, since we are locating a face rather than detecting whether one exists. In this thesis, we do not use the luminance regularization stage (stage four) of the algorithm proposed by Chai et al. [3]. Because we place no constraint on the background, it may be complex. Luminance regularization relies on the characteristic that brightness is more non-uniform across the facial region than across the background, so it is better suited to simple backgrounds. The algorithm we use therefore consists of four stages, as outlined in Fig. 2.1.
A. Color Segmentation
The first stage of the algorithm classifies the pixels of the input image into skin and non-skin regions. To do this, we use a skin-color reference map in the YCrCb color space. The transformation from the RGB color space to the YCrCb color space is given as follows.
Y = 0.299R + 0.587G + 0.114B
Cb = -0.169R - 0.331G + 0.500B          (2.1)
Cr = 0.500R - 0.419G - 0.081B
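The transformation can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the thesis. The function name rgb_to_ycbcr and the chroma_offset parameter are hypothetical. The offset of 128 is an assumption that is not part of the transformation above: it is the usual convention for 8-bit images and places Cb and Cr in the 0 to 255 range. The luma weights 0.299, 0.587, and 0.114 sum to one, so a gray pixel keeps its brightness.

```python
def rgb_to_ycbcr(r, g, b, chroma_offset=128.0):
    """Convert one RGB pixel (0-255 per channel) to (Y, Cb, Cr).

    chroma_offset is an assumption, not part of the transformation in the
    text: adding 128 shifts Cb and Cr into the 0-255 range of 8-bit images.
    Pass 0.0 to obtain the raw values of the transformation.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.169 * r - 0.331 * g + 0.500 * b + chroma_offset
    cr = 0.500 * r - 0.419 * g - 0.081 * b + chroma_offset
    return y, cb, cr
```

For a neutral gray pixel the chrominance terms cancel, so Cb and Cr both equal the offset.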
It has been shown that a skin-color region can be identified by the presence of a certain set of chrominance values (i.e., Cr and Cb) that are narrowly and consistently distributed in the YCrCb color space. The location of these chrominance values can be illustrated using the CIE chromaticity diagram, as shown in Fig. 2.2. We denote by RCr and RCb the respective ranges of Cr and Cb values that correspond to skin color; these ranges define our skin-color reference map.
The ranges that the paper [3] found most suitable for all the input images tested are RCr = [133, 173] and RCb = [77, 127].
The images we use are 320×240 pixels. To reduce computing time, we downsample each image to 160×120 and recover it in the last stage. Now consider an image of M×N pixels that is downsampled to M/2×N/2. With the skin-color reference map, the color segmentation result OA is
OA(x, y) = 1, if [Cr(x, y) ∈ RCr] ∩ [Cb(x, y) ∈ RCb]
OA(x, y) = 0, otherwise
for all x = 0, …, M/2-1 and y = 0, …, N/2-1, where Cr(x, y) and Cb(x, y) are the Cr and Cb values of the picture at (x, y), respectively. An example illustrating the classification of the original image in Fig. 2.3 is shown in Fig. 2.4.
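The classification rule can be sketched as follows. This is a hedged illustration under stated assumptions, not the thesis code. The name color_segmentation is hypothetical. Subsampling keeps every second pixel in each direction (the text does not say how the image is downsampled). An offset of 128 is added to Cb and Cr so that they are comparable with the 8-bit ranges RCr and RCb. The bitmap is indexed as bitmap[row][column].

```python
R_CR = (133, 173)  # skin range for Cr, from the text
R_CB = (77, 127)   # skin range for Cb, from the text


def color_segmentation(rgb_image):
    """Return a stage-A style bitmap for an RGB image.

    rgb_image: list of rows, each row a list of (R, G, B) tuples (0-255).
    Assumption: downsampling keeps every second row and column.
    Assumption: Cb and Cr carry the usual 8-bit offset of 128.
    """
    bitmap = []
    for row in rgb_image[::2]:
        out_row = []
        for r, g, b in row[::2]:
            cb = -0.169 * r - 0.331 * g + 0.500 * b + 128
            cr = 0.500 * r - 0.419 * g - 0.081 * b + 128
            # A pixel is skin only if BOTH chrominance values are in range.
            is_skin = (R_CR[0] <= cr <= R_CR[1]) and (R_CB[0] <= cb <= R_CB[1])
            out_row.append(1 if is_skin else 0)
        bitmap.append(out_row)
    return bitmap
```

Luminance Y is not computed here because the rule depends on chrominance only.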
Among all the stages, this first stage is the most vital. Based on the model of human skin color, the color segmentation has to remove as many pixels as possible that are unlikely to belong to the facial region while catering for a wide variety of skin colors. However, if it falsely removes too many pixels that belong to the facial region, the error will propagate through the remaining stages and cause the entire algorithm to fail.
The result of color segmentation contains the pixels of the facial area, but it may also include other areas whose chrominance values coincide with those of skin color (as is the case in Fig. 2.4). Hence, the subsequent stages of the algorithm are used to remove these unwanted areas.
B. Density Regularization
This stage considers the bitmap produced by the previous stage to contain the facial region corrupted by noise. The noise may appear as small holes in the facial region due to undetected facial features such as the eyes, the mouth, or even glasses, or it may appear as objects with skin-color appearance in the background scene.
Fig. 2.1. Outline of face-segmentation algorithm: Input Image (an image including a face) → Color Segmentation → Density Regularization → Geometric Correction → Contour Extraction → Output Image (segmented facial region).
Fig. 2.2. Skin-color region in CIE chromaticity diagram (chrominance values found in facial region).
Fig. 2.3. Original image.
Fig. 2.4. Image after filtering by the skin-color map in stage A.
Therefore, this stage performs simple morphological operations such as dilation to fill in any small hole in the facial area and erosion to remove any small object in the background area. The intention is not necessarily to remove the noise entirely but to reduce its amount and size.
To distinguish between facial and non-facial regions, we first need to identify the regions of the bitmap that have a higher probability of being the facial region. The probability measure that we use is derived from the observation that facial color is very uniform. The skin-color pixels belonging to the facial region will therefore appear in a large cluster, while the skin-color pixels belonging to the background may appear as large clusters or small isolated objects. Thus, we study the density distribution of the skin-color pixels detected in stage A. A density map is calculated as follows.
It first partitions the output bitmap of stage A OA(x, y) into non-overlapping groups of 4×4 pixels, then counts the number of skin-color pixels within each group and assigns this value to the corresponding point of the density map.
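The density computation described above can be sketched as follows. The function name density_map and the block parameter are hypothetical. Rows or columns that do not fill a whole block are ignored, which is an assumption; it does not arise for a 160×120 bitmap, which divides evenly into 4×4 groups.

```python
def density_map(bitmap, block=4):
    """Count the skin-color pixels in each non-overlapping block x block group.

    bitmap: list of rows of 0/1 values (the stage-A output).
    Returns a map whose entries range from 0 to block * block.
    """
    rows = len(bitmap) // block
    cols = len(bitmap[0]) // block
    density = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            density[r][c] = sum(
                bitmap[block * r + i][block * c + j]
                for i in range(block)
                for j in range(block)
            )
    return density
```

With the default block size, each entry lies between 0 and 16, which matches the three density classes used in this stage.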
According to the density value, we classify each point into three types, namely, zero (D = 0), intermediate (0 < D < 16), and full (D = 16). A group of points with zero density value will represent a non-facial region, while a group of full density points will signify a cluster of skin-color pixels and a high probability of belonging to a facial region. Any point of intermediate density value will indicate the presence of noise. The density map of an example with three density classifications is depicted in Fig. 2.5. The point of zero density is shown in white, intermediate density in green, and full density in black.
Once the density map is derived, we can begin the process that we term density regularization. This involves the following three steps.
1) Discard all points at the edge of the density map, i.e., set D(0, y) = D(M/8-1, y) = D(x, 0) = D(x, N/8-1) = 0 for all x = 0, …, M/8-1 and y = 0, …, N/8-1.
2) Erode any full-density point (i.e., set to zero) if it is surrounded by fewer than five other full-density points in its local 3×3 neighborhood.
3) Dilate any point of either zero or intermediate density (i.e., set to 16) if there are more than two full-density points in its local 3×3 neighborhood.
After this process, the density map is converted to the output bitmap of stage B as
OB(x, y) = 1, if D(x, y) = 16
OB(x, y) = 0, otherwise          (2.4)
for all x = 0, …, M/8-1 and y = 0, …, N/8-1.
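The three regularization steps and the conversion to a bitmap can be sketched as follows. This is an illustration under assumptions, not the thesis implementation. The names count_full_neighbors and density_regularization are hypothetical. Each step reads from a snapshot of the map taken before that step, so the result does not depend on scan order; the text does not state whether updates are applied this way. Points outside the map are treated as not full. The map is indexed as density[row][column].

```python
FULL = 16  # density of a 4x4 group made entirely of skin-color pixels


def count_full_neighbors(d, r, c):
    """Number of full-density points among the 8 neighbors of (r, c)."""
    total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if (dr or dc) and 0 <= rr < len(d) and 0 <= cc < len(d[0]):
                if d[rr][cc] == FULL:
                    total += 1
    return total


def density_regularization(density):
    """Apply steps 1-3 and return the stage-B bitmap (1 where D == 16)."""
    rows, cols = len(density), len(density[0])
    # Step 1: discard all points at the edge of the density map.
    d = [[0 if r in (0, rows - 1) or c in (0, cols - 1) else density[r][c]
          for c in range(cols)] for r in range(rows)]
    # Step 2: erode full points with fewer than five full neighbors.
    snap = [row[:] for row in d]  # assumption: synchronous update
    for r in range(rows):
        for c in range(cols):
            if snap[r][c] == FULL and count_full_neighbors(snap, r, c) < 5:
                d[r][c] = 0
    # Step 3: dilate zero or intermediate points with more than two
    # full neighbors.
    snap = [row[:] for row in d]
    for r in range(rows):
        for c in range(cols):
            if snap[r][c] != FULL and count_full_neighbors(snap, r, c) > 2:
                d[r][c] = FULL
    # Threshold: only full-density points survive in the output bitmap.
    return [[1 if v == FULL else 0 for v in row] for row in d]
```

On a 6×6 map that is full everywhere, step 1 clears the border, step 2 erodes the four corners of the remaining 4×4 block, and step 3 restores them.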
The result of the previous example is displayed in Fig. 2.6.
Fig. 2.5. Density map after classification into three classes.
Fig. 2.6. Image produced by stage B.
C. Geometric Correction
We perform a horizontal and vertical scanning process to identify any odd structure in the previously obtained bitmap, OB(x, y), and subsequently remove it. This ensures that a correct geometric shape of the facial region is obtained. Prior to the scanning process, however, we attempt to remove further noise by using a technique similar to that introduced in stage B.
A pixel in OB(x, y) with a value of one remains a detected pixel if more than three other pixels in its local 3×3 neighborhood have the same value. At the same time, a pixel in OB(x, y) with a value of zero is converted to a value of one (i.e., becomes a potential pixel of the facial region) if it is surrounded by more than five pixels with a value of one in its local 3×3 neighborhood. These simple procedures ensure that noise appearing in the facial region is filled in and that isolated noise objects in the background are removed.
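The two neighborhood rules can be sketched as a single pass over the bitmap. The name filter_bitmap is hypothetical. Two assumptions are made that the text does not settle: both rules are evaluated on the unmodified input bitmap, and pixels outside the bitmap count as zero.

```python
def filter_bitmap(bitmap):
    """Apply the 3x3 keep/fill rules to a 0/1 bitmap and return a new one."""
    rows, cols = len(bitmap), len(bitmap[0])
    out = [row[:] for row in bitmap]
    for r in range(rows):
        for c in range(cols):
            # Count the ones among the (up to 8) neighbors, excluding
            # the pixel itself.
            ones = sum(
                bitmap[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
            ) - bitmap[r][c]
            if bitmap[r][c] == 1:
                out[r][c] = 1 if ones > 3 else 0  # keep well-supported pixels
            else:
                out[r][c] = 1 if ones > 5 else 0  # fill enclosed holes
    return out
```

On a 3×3 ring of ones with a hole in the middle, the hole is filled, while the four corner pixels of the ring, which have only two neighbors of value one, are removed.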
We then commence the horizontal scanning process on the “filtered” bitmap. We search for any short continuous run of pixels that are assigned the value of one. For a CIF-size image, the threshold for a group of connected pixels to belong to the facial region is four. Therefore, any group of fewer than four horizontally connected pixels with the value of one is eliminated and set to zero. A similar process is then performed in the vertical direction. The rationale behind this method is that, based on our observation, any such short horizontal or vertical run of pixels with the value of one is unlikely to be part of a reasonable-size and well-detected facial region.
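The scanning process can be sketched as run-length filtering applied first to rows and then to columns. The names remove_short_runs and scan_bitmap and the min_len parameter are hypothetical; the default of four is the threshold quoted in the text.

```python
def remove_short_runs(line, min_len=4):
    """Zero out every run of consecutive ones shorter than min_len."""
    out = list(line)
    start = None
    # The appended 0 is a sentinel that closes a run ending at the last pixel.
    for i, v in enumerate(out + [0]):
        if v == 1 and start is None:
            start = i
        elif v != 1 and start is not None:
            if i - start < min_len:
                out[start:i] = [0] * (i - start)
            start = None
    return out


def scan_bitmap(bitmap, min_len=4):
    """Horizontal scan followed by vertical scan, as described in the text."""
    rows = [remove_short_runs(row, min_len) for row in bitmap]
    # zip(*rows) transposes the bitmap so columns can be scanned as lines.
    cols = [remove_short_runs(col, min_len) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]
```

The vertical scan operates on the result of the horizontal scan, so a pixel survives only if it belongs to sufficiently long runs in both directions.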
As a result, the output bitmap of this stage should contain the facial region with minimal or no noise, as demonstrated in Fig. 2.7.
D. Contour Extraction
In this final stage, we convert the M/8×N/8 output bitmap of stage C back to the dimensions M/2×N/2. To increase the spatial resolution, we utilize the edge information that is already available from the color segmentation in stage A.
Therefore, all the boundary points in the previous bitmap will be mapped into the corresponding group of 4×4 pixels with the value of each pixel as defined in the output bitmap of stage A. The representative output bitmap of this final stage of the algorithm is shown in Fig. 2.8.
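The mapping back to the finer resolution can be sketched as follows. Several details are assumptions, since the text does not define them: a point of the coarse bitmap is treated as a boundary point when it has the value one and at least one of its four direct neighbors is zero or lies outside the bitmap; interior points expand to blocks of ones; points with the value zero expand to blocks of zeros. The name contour_extraction is hypothetical.

```python
def contour_extraction(coarse, fine, block=4):
    """Expand the coarse (stage-C) bitmap to the size of the fine (stage-A) one.

    coarse: 0/1 bitmap; fine: 0/1 bitmap that is block times larger.
    Boundary blocks copy their pixels from fine; interior blocks become ones.
    """
    rows, cols = len(coarse), len(coarse[0])
    out = [[0] * (cols * block) for _ in range(rows * block)]

    def value(r, c):
        # Points outside the bitmap count as zero (assumption).
        return coarse[r][c] if 0 <= r < rows and 0 <= c < cols else 0

    for r in range(rows):
        for c in range(cols):
            if coarse[r][c] != 1:
                continue  # non-facial point: its block stays zero
            boundary = 0 in (value(r - 1, c), value(r + 1, c),
                             value(r, c - 1), value(r, c + 1))
            for i in range(block):
                for j in range(block):
                    fr, fc = block * r + i, block * c + j
                    out[fr][fc] = fine[fr][fc] if boundary else 1
    return out
```

Using the stage-A pixels only along the boundary keeps the interior of the facial region solid while letting the contour follow the edge detected at the finer resolution.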
Fig. 2.7. Image produced by stage C.
Fig. 2.8. Image produced by stage D.