
CHAPTER 2 FACE SEGMENTATION

2.2 Face Segmentation Algorithm

The algorithm in [17] is an unsupervised segmentation algorithm, so no design parameter needs to be adjusted manually to suit a particular input image. The only principal assumption is that a person's face is present in the given image, since we locate the face rather than detect whether a face exists. The revised algorithm we used is composed of four stages, as depicted in Fig. 2.1.

Fig. 2.1. Flowchart of the face-segmentation algorithm. Input image (a frontal-view face image) → color segmentation → density regularization → geometric correction → contour extraction → output image (segmented facial region).

A. Color Segmentation

The first stage of the algorithm classifies the pixels of the input image into skin and non-skin regions. To do this, we use a skin-color reference map in the YCbCr color space. It has been shown that a skin-color region can be identified by the presence of a certain set of chrominance values (i.e., $C_b$ and $C_r$) that are narrowly and consistently distributed in the YCbCr color space. We use $R_{Cb}$ and $R_{Cr}$ to denote the respective ranges of $C_b$ and $C_r$ values that correspond to skin color; these ranges define our skin-color reference map. The ranges found to be most suitable for all the input images are $R_{Cr} = [133, 173]$ and $R_{Cb} = [77, 127]$.

In order to reduce the computation time, we downsample the input image to half resolution in both rows and columns and recover the original resolution in the last stage. An image of $M \times N$ pixels is therefore downsampled to $M/2 \times N/2$. With the skin-color reference map, we obtain the color segmentation result $O_A$ as

$$O_A(x, y) = \begin{cases} 1, & \text{if } [C_r(x, y) \in R_{Cr}] \cap [C_b(x, y) \in R_{Cb}] \\ 0, & \text{otherwise} \end{cases}$$

where $C_r(x, y)$ and $C_b(x, y)$ are the $C_r$ and $C_b$ values of the input image, respectively. An example of the output image, illustrating the classification of the input image in Fig. 2.2, is shown in Fig. 2.3.

Fig. 2.2. Input image.

Fig. 2.3. Output image after segmented by skin-color map in stage A.
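The thresholding in stage A can be sketched in a few lines of NumPy. This is only an illustrative sketch: it assumes the $C_b$ and $C_r$ planes are already available as 2-D arrays, that the range limits are inclusive, and that downsampling simply keeps every second sample in each direction. The function name `color_segmentation` is hypothetical.

```python
import numpy as np

R_CR = (133, 173)  # skin-color range for Cr, taken from the text
R_CB = (77, 127)   # skin-color range for Cb, taken from the text

def color_segmentation(cb, cr):
    """Return the binary skin bitmap O_A at half resolution.

    Assumptions: cb and cr are 2-D arrays of equal shape, the ranges
    are inclusive, and downsampling keeps every second row and column.
    """
    cb_half = cb[::2, ::2]
    cr_half = cr[::2, ::2]
    skin = ((cr_half >= R_CR[0]) & (cr_half <= R_CR[1]) &
            (cb_half >= R_CB[0]) & (cb_half <= R_CB[1]))
    return skin.astype(np.uint8)
```

A pixel is marked as skin only when both chrominance values fall inside their ranges, which matches the intersection in the definition of $O_A$.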

B. Density Regularization

The bitmap produced by stage A to retain the facial region is corrupted by noise. The noise may appear as small holes in the facial region due to undetected facial features such as eyes, mouth, and even glasses. It may also appear as small skin-colored objects in the background scene. We therefore apply simple morphological operations: dilation to fill in the small holes in the facial region, and erosion to remove the small objects in the background scene. To distinguish the facial region from the non-facial region, we first need to identify the regions of the bitmap that have a higher probability of being the facial region. For this task, a density map $D(x, y)$ is calculated as follows.

$$D(x, y) = \sum_{i=0}^{3} \sum_{j=0}^{3} O_A(4x + i, 4y + j) \tag{2.3}$$

It first partitions the output bitmap of stage A, $O_A(x, y)$, into non-overlapping groups of 4×4 pixels, then counts the number of skin-color pixels within each group and assigns this value to the corresponding point of the density map.

Fig. 2.4. Procedure for computing the density map $D(x, y)$.

According to the density value, we classify each point of the density map into one of three clusters: zero (D = 0), intermediate (0 < D < 16), and full (D = 16). Fig. 2.5 shows the density map of the stage A output bitmap of Fig. 2.3 with the three density classifications. Points of zero density are shown in white, intermediate density in green, and full density in black. A group of white points most likely represents a non-facial region, while a group of black points signifies a cluster of skin-color pixels with a high probability of belonging to a facial region. Points with intermediate density values, shown in green, most likely indicate noise.

Fig. 2.5. Density map after classification into three classes.

After the density map is derived, we begin the process termed density regularization, which consists of the following three steps.

1) Discard all points at the edge of the density map, i.e., set D(0, y) = D(M/4–1, y) = D(x, 0) = D(x, N/4–1) = 0 for all x = 0, 1, …, M/4–1 and y = 0, 1, …, N/4–1.

2) Erode any full-density point (i.e., set it to zero) if it is surrounded by fewer than five other full-density points within its local 3×3 neighborhood.

3) Dilate any point with either zero or intermediate density (i.e., set it to 16) if there are more than two full-density points within its local 3×3 neighborhood.

After density regularization, the density map is converted to the output bitmap of stage B as

$$O_B(x, y) = \begin{cases} 1, & \text{if } D(x, y) = 16 \\ 0, & \text{otherwise} \end{cases}$$

for all x = 0, 1, …, M/4–1 and y = 0, 1, …, N/4–1. The result of eroding and dilating the density map of Fig. 2.5 in stage B is shown in Fig. 2.6.

Fig. 2.6. Output image produced by erosion and dilation.
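The density map and the three regularization steps of stage B can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, points outside the map are treated as non-full when counting neighbors, the neighbor counts for erosion are taken from the map before erosion, and the counts for dilation are taken from the eroded map.

```python
import numpy as np

def density_map(o_a):
    """D(x, y): number of skin pixels in each non-overlapping 4x4 group."""
    rows, cols = o_a.shape[0] // 4, o_a.shape[1] // 4
    blocks = o_a[:rows * 4, :cols * 4].reshape(rows, 4, cols, 4)
    return blocks.sum(axis=(1, 3)).astype(int)

def full_neighbors(d):
    """Count full-density (16) points among the 8 neighbors of each point."""
    full = (d == 16).astype(int)
    padded = np.pad(full, 1)  # assumption: outside the map counts as non-full
    rows, cols = d.shape
    total = sum(padded[1 + dy:1 + dy + rows, 1 + dx:1 + dx + cols]
                for dy in (-1, 0, 1) for dx in (-1, 0, 1))
    return total - full  # exclude the point itself

def density_regularization(d):
    """Apply steps 1) to 3) and return the stage B bitmap (1 where D == 16)."""
    d = d.copy()
    # Step 1: discard the points at the edge of the density map.
    d[0, :] = 0
    d[-1, :] = 0
    d[:, 0] = 0
    d[:, -1] = 0
    # Step 2: erode full points with fewer than five full neighbors.
    n = full_neighbors(d)
    d[(d == 16) & (n < 5)] = 0
    # Step 3: dilate zero/intermediate points with more than two full neighbors.
    n = full_neighbors(d)
    d[(d < 16) & (n > 2)] = 16
    return (d == 16).astype(np.uint8)
```

With this ordering an isolated full-density point is removed by the erosion step, while a hole inside a large full-density area is filled by the dilation step.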

C. Geometric Correction

After stage B, there may still be some fragmented areas in the output bitmap. To eliminate or mend these areas, we perform a horizontal and vertical scanning process that identifies any odd structure in the bitmap obtained from stage B, O_B(x, y), and removes it. First, we use a technique similar to that introduced in stage B to remove further noise. A pixel in O_B(x, y) with a value of one remains a detected facial pixel if more than three other pixels in its local 3×3 neighborhood have the same value. At the same time, a pixel in O_B(x, y) with a value of zero is converted to one (i.e., treated as a potential pixel of the facial region) if it is surrounded by more than five pixels with a value of one in its local 3×3 neighborhood.

A bitmap of a well-detected facial region should look continuous, so any short run of pixels whose value differs from that of the detected facial region is unlikely to belong to this region. After the filtering above, we therefore begin the horizontal scanning process on the filtered bitmap. We search for any short continuous run of pixels with the value of one. Any group of fewer than four horizontally connected pixels with the value of one is eliminated and set to zero. A similar process is then performed in the vertical direction. After all processes in this stage, the output bitmap should contain the facial region with minimal or even no noise, as shown in Fig. 2.7.

Fig. 2.7. Output image produced after stage C.
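The two parts of stage C (the 3×3 filtering and the removal of short runs) can be sketched as below. The function names are hypothetical, pixels outside the bitmap are assumed to be zero, and both filtering rules are assumed to be evaluated on the same input bitmap, as the word "simultaneously" in the description suggests.

```python
import numpy as np

def neighbor_ones(bitmap):
    """Count the pixels equal to one among the 8 neighbors of each pixel."""
    b = (bitmap == 1).astype(int)
    padded = np.pad(b, 1)  # assumption: outside the bitmap counts as zero
    rows, cols = b.shape
    total = sum(padded[1 + dy:1 + dy + rows, 1 + dx:1 + dx + cols]
                for dy in (-1, 0, 1) for dx in (-1, 0, 1))
    return total - b

def filter_3x3(o_b):
    """Keep ones with more than three one-neighbors; fill zeros with more than five."""
    n = neighbor_ones(o_b)
    keep = (o_b == 1) & (n > 3)
    fill = (o_b == 0) & (n > 5)
    return (keep | fill).astype(np.uint8)

def remove_short_runs(line, min_len=4):
    """Set to zero every run of ones shorter than min_len in a 1-D array."""
    out = line.copy()
    start = None
    for i, v in enumerate(list(line) + [0]):  # sentinel closes a trailing run
        if v == 1 and start is None:
            start = i
        elif v != 1 and start is not None:
            if i - start < min_len:
                out[start:i] = 0
            start = None
    return out

def geometric_correction(o_b):
    """3x3 filtering, then horizontal and vertical short-run removal."""
    out = filter_3x3(o_b)
    out = np.array([remove_short_runs(row) for row in out])
    out = np.array([remove_short_runs(col) for col in out.T]).T
    return out
```

For example, `remove_short_runs(np.array([1, 1, 1, 0, 1, 1, 1, 1, 0]))` clears the first run of three ones and keeps the run of four.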

D. Contour Extraction

In this stage, we convert the output bitmap of stage C back to the original dimension of the extracted face region from stage A. We utilize the edge information that is already made available by the color segmentation in stage A. All the boundary points in the previous bitmap will be mapped into the corresponding group of 4×4 pixels with the value of each pixel as defined in the output bitmap of stage A. The output image of this final stage is shown in Fig. 2.8.

Fig. 2.8. Image produced by stage D.
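One possible reading of stage D is sketched below. It assumes that a "boundary point" is a facial point of the stage C bitmap with at least one non-facial 4-connected neighbor (points outside the bitmap count as non-facial), that interior points expand to a full 4×4 group of ones, and that the stage A bitmap is exactly four times larger than the stage C bitmap in each direction. The function name `contour_extraction` and this boundary definition are assumptions, not taken from [17].

```python
import numpy as np

def contour_extraction(o_c, o_a):
    """Expand the stage C bitmap to the size of o_a (4x in each direction).

    Interior facial points become full 4x4 groups of ones; boundary points
    take the pixel values of the corresponding 4x4 group in o_a.
    """
    o_c = o_c.astype(bool)
    assert o_a.shape == (o_c.shape[0] * 4, o_c.shape[1] * 4)
    padded = np.pad(o_c, 1)  # outside the bitmap counts as non-facial
    all_neighbors_facial = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                            padded[1:-1, :-2] & padded[1:-1, 2:])
    boundary = o_c & ~all_neighbors_facial
    ones = np.ones((4, 4), dtype=np.uint8)
    out = np.kron(o_c.astype(np.uint8), ones)
    boundary_mask = np.kron(boundary.astype(np.uint8), ones).astype(bool)
    out[boundary_mask] = o_a[boundary_mask]
    return out
```

Only the boundary groups depend on the stage A bitmap, so the fine edge detail from the color segmentation is restored without reintroducing noise inside the facial region.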

Chapter 3 Eye Detection and Iris Extraction

3.1 PCA Review

Most approaches to computer recognition of faces and expressions have focused on detecting individual features such as the eyes, head outline, and mouth, or on defining a face model by the position, size, and relationships among these features. Feature extraction plays an essential role in the pre-processing stage. Principal component analysis (PCA) has been commonly applied to face recognition problems [18], and the typical PCA algorithm is one of the main streams of research on face feature processing [19]. PCA has an advantage over other face recognition schemes in its speed and simplicity. We use PCA in the pre-processing stage to extract features from the input face image that has been segmented by skin color as described in Chapter 2.

The basis of the input image space is composed of all single-pixel vectors. However, the input image space is not an optimal space for face representation and categorization. The aim of applying PCA is to build an eye space that better describes the eye regions. The basis vectors of this eye space are called the principal components. These components are uncorrelated and maximize the variance accounted for in the original basis. PCA also reduces the dimension of the feature space, and the computational complexity is reduced accordingly.

3.2 Computation of Eigeneyes

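Step 1: obtain the eye region images $I_1, I_2, \ldots, I_M$ (training eyes).

Important note: the eye region images must all be the same size.

Step 2: represent every input image $I_i$ as a vector $\Gamma_i$.

Fig. 3.1. Representing an input eye-region matrix as a vector.

Step 3: compute the average eye vector $\Psi$:

$$\Psi = \frac{1}{M} \sum_{i=1}^{M} \Gamma_i \tag{3.1}$$

Step 4: subtract the average eye vector $\Psi$:

$$\Phi_i = \Gamma_i - \Psi \tag{3.2}$$

Step 5: compute the covariance matrix $C$:

$$C = \frac{1}{M} \sum_{i=1}^{M} \Phi_i \Phi_i^T = \frac{1}{M} A A^T$$

where $A = [\Phi_1 \; \Phi_2 \; \cdots \; \Phi_M]$ (an $N^2 \times M$ matrix).

Step 6: compute the eigenvectors $u_i$ of $AA^T$. Because $AA^T$ is an $N^2 \times N^2$ matrix, the eigenvectors $v_i$ of the much smaller $M \times M$ matrix $A^T A$ are computed first:

$$A^T A v_i = \mu_i v_i$$

Multiplying both sides by $A$ gives $AA^T (A v_i) = \mu_i (A v_i)$, so $AA^T$ and $A^T A$ share these eigenvalues and their eigenvectors are related as follows: $u_i = A v_i$.

Note 1: $AA^T$ can have at most $N^2$ eigenvalues and eigenvectors.

Note 2: $A^T A$ can have at most $M$ eigenvalues and eigenvectors.

Note 3: The $M$ eigenvalues of $A^T A$ correspond to the $M$ largest eigenvalues of $AA^T$, and the eigenvectors correspond in the same way.

Step 6.3: compute the $M$ best eigenvectors of $AA^T$ from $u_i = A v_i$, and normalize $u_i$ such that $\|u_i\| = 1$.

Step 7: keep the $M$ eigenvectors corresponding to the $M$ largest eigenvalues.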
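The steps above can be sketched with NumPy as follows. This is an illustrative sketch, not the implementation used in this work: the function name `compute_eigeneyes` and the parameter `k` (the number of eigenvectors kept) are hypothetical, and `numpy.linalg.eigh` is used because $A^T A$ is symmetric.

```python
import numpy as np

def compute_eigeneyes(images, k):
    """Return the mean eye vector, k unit-norm eigeneyes and their eigenvalues.

    images: list of M equally sized 2-D arrays (training eye regions).
    k: number of eigenvectors to keep. After mean subtraction A^T A has
       rank at most M - 1, so k should be smaller than M.
    """
    # Steps 1-2: stack the flattened images as the columns Gamma_i.
    gammas = np.stack([img.reshape(-1).astype(float) for img in images], axis=1)
    # Step 3: average eye vector Psi.
    psi = gammas.mean(axis=1, keepdims=True)
    # Step 4: A = [Phi_1 ... Phi_M].
    a = gammas - psi
    # Step 6: eigenvectors v_i of the small M x M matrix A^T A.
    mu, v = np.linalg.eigh(a.T @ a)
    order = np.argsort(mu)[::-1][:k]  # indices of the k largest eigenvalues
    # Step 6.3: u_i = A v_i, normalized to unit length.
    u = a @ v[:, order]
    u /= np.linalg.norm(u, axis=0, keepdims=True)
    return psi[:, 0], u, mu[order]
```

Working with the $M \times M$ matrix keeps the eigen-decomposition cheap when the number of training images is much smaller than the number of pixels per image.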

3.3 Representing Eyes onto PCA Basis

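Each eye region image (minus the mean) $\Phi_i$ in the training set can be represented as a linear combination of the $M$ eigenvectors:

$$\hat{\Phi}_i = \sum_{j=1}^{M} w_j^i u_j, \qquad w_j^i = u_j^T \Phi_i$$

Each normalized training eye region image $\Phi_i$ is represented in this basis by a vector:

$$\Omega_i = [\, w_1^i \;\; w_2^i \;\; \cdots \;\; w_M^i \,]^T$$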

3.4 Eye Region Recognition Using Eigeneyes

Input an unknown eye region image $\Gamma$ with the same size as the training images and follow these steps:

Step 1: normalize $\Gamma$: $\Phi = \Gamma - \Psi$.

Step 2: project onto the eigenspace:

$$\hat{\Phi} = \sum_{i=1}^{M} w_i u_i, \qquad w_i = u_i^T \Phi$$

Step 3: compute the distance $e_d$ (distance from eye space):

$$e_d = \| \Phi - \hat{\Phi} \| \tag{3.8}$$

Step 4: output the input eye region image that has the minimum $e_d$.
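The recognition steps can be sketched as below. The sketch assumes that the mean eye vector `psi` and a matrix `u` whose columns are orthonormal eigeneyes are already available, and that candidate eye regions are supplied as a list; the function names are hypothetical.

```python
import numpy as np

def distance_from_eye_space(gamma, psi, u):
    """Return e_d = ||Phi - Phi_hat|| for one candidate eye region.

    gamma: candidate image (any shape with as many pixels as psi).
    psi:   mean eye vector, shape (P,).
    u:     eigeneyes as orthonormal columns, shape (P, K).
    """
    phi = gamma.reshape(-1).astype(float) - psi  # Step 1: normalize
    w = u.T @ phi                                # Step 2: weights w_i
    phi_hat = u @ w                              #         projection
    return float(np.linalg.norm(phi - phi_hat))  # Step 3: distance

def best_eye_candidate(candidates, psi, u):
    """Step 4: index of the candidate with minimum e_d, plus all distances."""
    distances = [distance_from_eye_space(c, psi, u) for c in candidates]
    return int(np.argmin(distances)), distances
```

A candidate that lies exactly in the span of the eigeneyes has a distance of zero, so smaller values indicate regions that look more like the training eyes.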

3.5 Using Deformable Template for Iris Extraction

The deformable template matching method has gained growing interest for locating known objects and finding their exact shapes and sizes. It has been used in many applications, including boundary finding in magnetic resonance images [20], extraction of eyes and mouths, vehicle segmentation and classification for ITS [21], and mouth description. In this method, the shape or contour of the object to be extracted is modeled by a combination of parametric functions such as linear functions, quadratic functions, and circles, called a deformable template. The parameter values that constitute the template are searched by an optimization algorithm so that the template fits the object in a given image as well as possible. Here, a circle of variable size is scanned across the search region to find the best fit. This search region is the one constructed by PCA in Sections 3.2 to 3.4. The fitting process uses the intensity field and the edge field of the eye search region.

3.5.1 Intensity Field and Edge Field

An intensity field $I_s$ and an edge field $E_s$ are given by a threshold $\alpha$, according to the low intensity of the iris and the high contrast between the iris and the sclera, respectively. The edge field is produced by the Canny edge operator, and the intensity field is shown below.

Observing the iris region in Fig. 3.2(b), we find some small holes in this region caused by intensity variation or noise corruption, so the intensity field is not suitable for template search as it stands. Using this intensity field to estimate the maximum-intensity region does not work reliably, because the uneven intensity distribution causes the template fitting to converge wrongly. On the other hand, Fig. 3.2(c) contains too many edges for the edge field to be used directly. The circular shape that can be observed in both Figs. 3.2(b) and 3.2(c) could nevertheless be helpful for the iris search. Accordingly, we modify this approach by adding a color-compensation procedure for the intensity field, and apply the Canny edge operator to the color-compensated intensity-field image instead of the input eye image.

3.5.2 Anisotropic Diffusion

The approach we propose to compensate the input eye image for the intensity field is anisotropic diffusion. In [22], Black noted that diffusion algorithms can remove noise from an image by modifying the image via a partial differential equation (PDE). For example, consider applying the isotropic diffusion equation (the heat equation) given by $\partial I(x, y, t) / \partial t = \operatorname{div}(\nabla I)$, using the original (degraded/noisy) image $I(x, y, 0)$ as the initial condition, where $(x, y)$ specifies spatial position, $t$ is the number of iterations, and $\nabla I$ is the image gradient. Modifying the image according to this isotropic diffusion equation is equivalent to filtering the image with a Gaussian filter; however, it results in blurring of the edges.

Perona and Malik [23] proposed the anisotropic diffusion equation as follows:

$$\frac{\partial I(x, y, t)}{\partial t} = \operatorname{div}\left[ g(\|\nabla I\|) \, \nabla I \right] \tag{3.10}$$

where $\|\nabla I\|$ is the gradient magnitude, and $g(\|\nabla I\|)$ is an “edge-stopping” function. This function is chosen to satisfy $g(x) \to 0$ when $x \to \infty$ so that the diffusion is “stopped” across edges, as in Fig. 3.3. The “edge-stopping” functions adopted in [25] are

$$g(\|\nabla I\|) = e^{-(\|\nabla I\| / K)^2}, \tag{3.11}$$

and

$$g(\|\nabla I\|) = \frac{1}{1 + (\|\nabla I\| / K)^2}. \tag{3.12}$$

The constant K was fixed either by hand at some fixed value, or using the “noise estimator” described by Canny [24].

Perona and Malik discretized their anisotropic diffusion equation as follows:

$$I_s^{t+1} = I_s^t + \frac{\lambda}{|\eta_s|} \sum_{\rho \in \eta_s} g(|\nabla I_{s,\rho}|) \, \nabla I_{s,\rho} \tag{3.13}$$

where $I_s^t$ is a discretely sampled image, $s$ denotes the pixel position in a discrete, two-dimensional (2-D) grid, $t$ now denotes discrete time steps (iterations), and $0 \le \lambda \le 0.25$ for the numerical scheme to be stable. The constant $\lambda$ is a scalar that determines the rate of diffusion, $\eta_s$ represents the spatial neighborhood of pixel $s$, and $|\eta_s|$ is the number of neighbors (usually four, except at the image boundaries). Perona and Malik [23] linearly approximated the image gradient (magnitude) in a particular direction as

$$\nabla I_{s,\rho} = I_\rho^t - I_s^t, \quad \rho \in \eta_s. \tag{3.14}$$

We show the local neighborhood of pixels at a boundary in Fig. 3.4. Fig. 3.5 shows an example of a noisy image and its result after anisotropic diffusion.
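The discrete scheme can be sketched in NumPy as below. This is a minimal sketch that assumes a 4-connected neighborhood, the exponential edge-stopping function of (3.11), and hand-picked default values for `n_iter`, `k` and `lam` that are illustrative only; the image is assumed to be at least 2×2 pixels so that every pixel has a neighbor.

```python
import numpy as np

def edge_stopping(grad, k):
    """Exponential edge-stopping function g(x) = exp(-(x / K)^2)."""
    return np.exp(-(grad / k) ** 2)

def anisotropic_diffusion(image, n_iter=20, k=15.0, lam=0.25):
    """Perona-Malik style diffusion on a 2-D image with 4-neighborhoods."""
    img = np.asarray(image, dtype=float).copy()
    # Number of neighbors |eta_s|: four inside, fewer on the image boundary.
    n_neighbors = np.full(img.shape, 4.0)
    n_neighbors[0, :] -= 1
    n_neighbors[-1, :] -= 1
    n_neighbors[:, 0] -= 1
    n_neighbors[:, -1] -= 1
    for _ in range(n_iter):
        # Edge replication makes the difference to a missing neighbor zero.
        padded = np.pad(img, 1, mode="edge")
        diffs = (padded[:-2, 1:-1] - img,   # neighbor above
                 padded[2:, 1:-1] - img,    # neighbor below
                 padded[1:-1, :-2] - img,   # neighbor to the left
                 padded[1:-1, 2:] - img)    # neighbor to the right
        flux = sum(edge_stopping(np.abs(d), k) * d for d in diffs)
        img = img + lam * flux / n_neighbors
    return img
```

Small differences (noise) give $g \approx 1$ and are smoothed, while differences much larger than `k` give $g \approx 0$, so a strong step such as the iris boundary is left almost untouched.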

Fig. 3.4. Local neighborhood of pixels at a boundary (intensity discontinuity).

Fig. 3.5. A diagram of the anisotropic diffusion algorithm. (a) The noisy image. (b) The processed image after an average mask. (c) The image after anisotropic diffusion.

Fig. 3.6. An eye image after anisotropic diffusion. (a) Input image. (b) Output image.

Fig. 3.7. An example of the anisotropic diffusion algorithm. (a) Input image. (b) Intensity field. (c) Edge field.

The effect of noisy-artifact removal by anisotropic diffusion is illustrated in Figs. 3.6 and 3.7. After applying anisotropic diffusion to the image in Fig. 3.6, we obtain an eye region image in which both the iris and the remaining regions have a smoother intensity distribution. A uniform intensity field is helpful for the template search, because it gives a smooth average intensity over the iris candidate regions. In the edge field, the edges other than those on the rounded boundary have been discarded, so the correct iris circle can be found easily.

Another problem is that the edge sometimes shrinks to a small region inside the iris, as shown in Fig. 3.8(b). In this case the template tends to encircle the small dark region, some of whose edges are formed by the upper eyelid and the iris. This wrong encircling can be avoided as follows. First, we eliminate the horizontal edges, as shown in Fig. 3.8(f). Then the circle that coincides with most of the detected edges is a good candidate for the iris. The result with the imposed iris circle is shown in Fig. 3.8(e).

Fig. 3.8. Eliminating the horizontal edges of (c) produces (f). (b) is an invalid result. (e) is a valid result.

3.5.3 Circular Deformable Template

An adaptive search region from the PCA algorithm, together with the improved edge field and intensity field from Sections 3.5.1 and 3.5.2, helps the template search task. We then set up the circular template, which requires only two parameters. In this way we simplify the template model and thus reduce the template searching time. The circular template model is composed of the radius r and the center point (Xc, Yc), as shown in Fig. 3.9. Considering that the iris is not precisely round, a band from r−1 to r+1 pixels around the circumference is used for fitting edges, and the region within the radius r is used for fitting intensity.

Fig. 3.9. The diagram of circular template model.

According to the color and shape characteristics of the iris, its low intensity value and round edge are used to locate it. With this setting, we shift the circle center pixel by pixel over the search region of the eye chosen by PCA, and record the cumulative intensity value $P_I(X_c, Y_c)$ and the cumulative edge count $P_E(X_c, Y_c)$. The radius r of the deformable circle ranges from 1 to the height N of the rectangular eye region.

These two characteristics of the iris favor a circular template, and we obtain the best iris circle by

$$(X_c, Y_c, r) = \arg\max_{X_c, Y_c, r} \left\{ P_I(X_c, Y_c) + P_E(X_c, Y_c) \right\} \tag{3.17}$$

Based on the proposed scheme above, some experimental iris extraction results are shown below. The input image samples are obtained from the FACS database at http://face-and-emotion.com/.

Fig. 3.10. Examples of iris extraction.
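An exhaustive search of this kind can be sketched as follows. The sketch rests on several assumptions that are not taken from the text: the intensity field is a binary map with ones at dark (iris-like) pixels, the edge field is a binary edge map, $P_I$ is the fraction of intensity-field pixels inside the circle, and $P_E$ is the fraction of edge pixels in the band from r−1 to r+1. The normalization by area is added here so that large circles are not favored automatically; the function name `fit_circle` is hypothetical.

```python
import numpy as np

def fit_circle(intensity_field, edge_field, radii):
    """Return (xc, yc, r) maximizing P_I + P_E over all centers and radii.

    Both fields are 2-D arrays of equal shape with values in {0, 1}.
    Only circles whose edge band lies fully inside the image are tried.
    """
    h, w = intensity_field.shape
    ys, xs = np.mgrid[0:h, 0:w]
    best, best_score = None, -np.inf
    for r in radii:
        for yc in range(r + 1, h - r - 1):
            for xc in range(r + 1, w - r - 1):
                dist = np.hypot(xs - xc, ys - yc)
                disc = dist < r                             # region for intensity
                band = (dist >= r - 1) & (dist <= r + 1)    # band for edges
                p_i = intensity_field[disc].mean()          # assumed normalized P_I
                p_e = edge_field[band].mean()               # assumed normalized P_E
                score = p_i + p_e
                if score > best_score:
                    best, best_score = (xc, yc, r), score
    return best
```

The triple loop is slow for large regions, which is one reason a small, PCA-selected search region and a two-parameter template are helpful.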

Chapter 4 Eye State Determination

4.1 Analyzing the Iris Region

After the iris extraction and circular deformable template search described in Chapter 3, we can analyze the iris image. The iris is a circular, dark-colored region on the eyeball. The color of the iris is mainly determined by the reflection of environmental illumination and by the texture and patterns of the iris, including the pupil (an aperture in the center of the iris) [25].

Referring to Fig. 4.1, the closeness state of the human eye can be simplified to the observable ratio of the iris. When a person blinks, the eyelid covers the iris; the observable ratio of the iris is defined as the ratio of the dark area to the whole iris region. This ratio, as shown in Fig. 4.2, changes momentarily when someone blinks or closes the eyes. However, it is difficult to measure the observable ratio of the iris exactly, particularly in the closing-eye state, because interferences such as eyelashes and a deep eyelid fold make the iris region noisy and difficult to detect accurately. In practice, we often need to detect the drowsy state of a person; in this case we only need to know the observable ratio of the iris, which simplifies the task to some extent. The reason is described in the next section.

Fig. 4.1. Physiological motion of human eye.

Fig. 4.2. Observable ratio of the iris appearance.

4.2 Determining Eye State

When someone becomes fatigued or falls asleep, the eyelid rises and falls with increasing frequency at the beginning, and the eye stays barely open during drowsiness. With this fact in mind, we classify the eye closeness into three states (open, barely open, and closed) instead of just the open and closed states.

The observable ratio of the iris in the normal (open) state differs among people. Some people open their eyes with the iris fully visible, while others open their eyes with the iris covered by the upper eyelid to some extent. Determining a person's open-eye state should therefore also take his habitual degree of eye opening into account. Namely, we have to know the commonly observable portion of the iris, which specifies his normal (open) state of the eye. First, the tester keeps his eyes open and the system measures his observable portion of the iris, $I_t$, from the circle returned by the template search, whose center point is $(X_c, Y_c)$.

Three states of open, barely open and closed are defined as follows:

0.6, Open state.

A problem is often encountered when the eye is almost closed, so that the iris is almost completely covered by the eyelid. Our system still outputs an image and an intensity value for the eye image. In our experience, the imposed circle is then located mainly around the inner corner of the eye, because the eye corner contains comparatively low intensity and edges. The imposed iris circle can thus locate barely open or open eyes, but it is not reliable for closed eyes. To solve this problem, we resort to the HSI color domain of the image and use the hue component as a reference.

Fig. 4.3 shows the eye state from open to completely closed; the values of the hue images in the right column decrease as the eye changes from open to closed. With this observation in mind, we add an extra criterion based on the hue component of the eye image. If the hue ratio is lower than 50% of its value in the normal (open) state, the eye is classified as closed no matter what the observable ratio of the iris is.

Fig. 4.3. Different levels of eye closure and their hue components.
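The decision logic, including the hue override, can be sketched as follows. Only the 0.6 threshold for the open state and the 50% hue rule come from the text; the parameter `closed_threshold` and its default value, the use of "greater than or equal" at the thresholds, and the assumption that `iris_ratio` is measured relative to the tester's calibrated open state are all hypothetical.

```python
def classify_eye_state(iris_ratio, hue_ratio, open_hue_ratio,
                       open_threshold=0.6, closed_threshold=0.2):
    """Classify an eye as "open", "barely open" or "closed".

    iris_ratio:       observable iris portion relative to the calibrated
                      open state (assumption).
    hue_ratio:        hue measure of the current eye image.
    open_hue_ratio:   the same hue measure in the normal (open) state.
    closed_threshold: hypothetical lower bound of the barely open state.
    """
    # Hue criterion from the text: below 50% of the open-state value
    # the eye is closed regardless of the observable iris ratio.
    if hue_ratio < 0.5 * open_hue_ratio:
        return "closed"
    if iris_ratio >= open_threshold:
        return "open"
    if iris_ratio >= closed_threshold:
        return "barely open"
    return "closed"
```

For instance, an eye with a large apparent iris ratio but a hue ratio of 40% of its open-state value is still reported as closed, which covers the case where the circle has drifted to the eye corner.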

4.3 Drowsy State Detection
