

CHAPTER 1 INTRODUCTION

1.4 Glasses Reflection Separation System

When a subject wears glasses or sunglasses, the performance of eye-state estimation degrades, because reflections can appear arbitrarily on the lenses; hence, we need to diminish the side effects of these reflections. Many algorithms have been proposed for this task. Levin and Weiss [16] use derivative filters, linear programming, and some user assistance. The methods in [17, 18] exploit the fact that the reflection and non-reflection images have different motions to separate the two images. In this thesis, we adopt the method proposed by Levin et al. [19] to separate the reflection. The details are presented in Section 3.3.

1.5 Flowchart of Eye Detection and Glasses Reflections Separation System

Fig. 1.1 shows the flowchart of the eye detection and glasses reflection separation system.

The system starts with a new image frame. After acquiring a new color frame, we find the face region by extracting a skin-color area based on the skin-color reference map in [3]. Then we detect the presence of glasses, because we must preprocess the reflection within the glasses if the subject wears them. Subsequently, we locate the eye area for eye detection. Finally, we use the eye detection result to determine whether the eye is open or closed.

1.6 Thesis Outline

This thesis is organized as follows. Chapter 2 introduces face location detection with some illustrative examples. Chapter 3 shows how we use edge detection, corner detection, and the anisotropic diffusion transform to locate the eye area, determine the presence of glasses, and then separate the reflection within the glasses.

The technique of using a patch database and an evaluation function to choose the best decomposition is also described. Several simulation examples and their results for each topic are provided in Chapter 4. We conclude this thesis with a discussion in Chapter 5.

Fig. 1.1. Flowchart of the eye detection and glasses reflection separation system.

Chapter 2. Face Segmentation

2.1 Introduction

In this section, we show how to segment the face area. We adopt a method proposed by Chai et al. [3], but use only four of its stages, omitting the luminance regularization stage. The luminance regularity property is appropriate only for a simple background, whereas we want to identify a person's face in an image without such constraints. To this end, we skip the luminance stage and instead use additional geometric constraints to distinguish the face region from the background.

2.2 Face Segmentation Algorithm

The algorithm in [3] is an unsupervised segmentation algorithm, and hence no manual adjustment of any design parameter is needed to suit a particular input image. The only principal assumption is that the person's face must be present in the given image, since we are locating the face rather than detecting whether one is present.

The revised algorithm we use consists of four stages, as depicted in Fig. 2.1.

Fig. 2.1. Outline of face-segmentation algorithm.

A. Color Segmentation

The first stage of the algorithm classifies the pixels of the input image into skin and non-skin regions. To do this, we use a skin-color reference map in the YCrCb color space. It has been shown that a skin-color region can be identified by the presence of a certain set of chrominance values (i.e., Cr and Cb) that are narrowly and consistently distributed in the YCrCb color space. We use R_Cr and R_Cb to denote the respective ranges of Cr and Cb values that correspond to skin color, which in turn define our skin-color reference map. The ranges found in [3] to be most suitable for all the input images they tested are

R_Cr = [133, 173] and R_Cb = [77, 127].

The size of the images we use is 640×480. To reduce the computing time, we downsample each image to 320×240 and recover the original resolution in the last stage.

Therefore, an image of M×N pixels is downsampled to M/2×N/2. With the skin-color reference map, we obtain the color segmentation result O_A as

O_A(x, y) = 1 if [C_r(x, y) ∈ R_Cr] and [C_b(x, y) ∈ R_Cb], and O_A(x, y) = 0 otherwise, where (x, y) indexes the pixels of the downsampled picture. An example illustrating the classification of the original image in Fig. 2.2 is shown in Fig. 2.3.

Nevertheless, the result of color segmentation contains not only the pixels of the facial area but possibly also other areas whose chrominance values coincide with those of skin color (as is the case in Fig. 2.3). Hence, the subsequent stages of the algorithm are used to eliminate these falsely detected areas.
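To make the stage-A rule concrete, the following Python sketch applies the skin-color thresholds to a downsampled frame. It is a minimal illustration rather than the exact implementation used in this thesis; it assumes a BGR input frame and uses OpenCV only for the color-space conversion and the resizing.

```python
import cv2
import numpy as np

# Skin-color ranges in the YCrCb space, as given in [3].
R_CR = (133, 173)
R_CB = (77, 127)

def color_segmentation(bgr_frame):
    """Stage A: return the binary map O_A on the half-resolution image."""
    # Downsample (e.g., 640x480 -> 320x240) to reduce computing time.
    half = cv2.resize(bgr_frame, None, fx=0.5, fy=0.5,
                      interpolation=cv2.INTER_AREA)
    ycrcb = cv2.cvtColor(half, cv2.COLOR_BGR2YCrCb)
    cr, cb = ycrcb[:, :, 1], ycrcb[:, :, 2]
    # O_A(x, y) = 1 if Cr in R_Cr and Cb in R_Cb, else 0.
    o_a = ((cr >= R_CR[0]) & (cr <= R_CR[1]) &
           (cb >= R_CB[0]) & (cb <= R_CB[1])).astype(np.uint8)
    return o_a
```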

Fig. 2.2. Original image.

Fig. 2.3. Image after filtering with the skin-color map in stage A.

B. Density Regularization

This stage treats the bitmap produced by the previous stage as a facial region corrupted by noise. The noise may appear as small holes in the facial region due to undetected facial features such as the eyes, the mouth, or even glasses, or it may appear as objects with skin-color appearance in the background scene.

Therefore, this stage first performs simple morphological operations such as dilation, to fill in any small holes in the facial area, and erosion, to remove any small objects in the background. Nevertheless, the intention is not to remove the noise entirely but to reduce its amount and size.

To distinguish the facial region from the non-facial region more completely, we first need to identify the regions of the bitmap that have a higher probability of belonging to the facial region.

According to the observation in [3], the facial color is quite uniform, so the skin-color pixels belonging to the facial region appear as a single large cluster, while the skin-color pixels belonging to the background may appear as many large clusters or small isolated objects. Thus, we study the density distribution of the skin-color pixels detected in stage A. A density map is calculated as follows.

We first partition the output bitmap O_A(x, y) of stage A into non-overlapping groups of 4×4 pixels, then count the number of skin-color pixels within each group and assign this value to the corresponding point of the density map.

According to the density value, we classify each point into three types, namely, zero (D = 0), intermediate (0 < D < 16), and full (D = 16). A group of points with zero density represents a non-facial region, while a group of full-density points signifies a cluster of skin-color pixels with a high probability of belonging to a facial region. Any point of intermediate density indicates the presence of noise. The density map of an example with the three density classes is depicted in Fig. 2.4, where points of zero density are shown in white, intermediate density in green, and full density in black.

Once the density map is derived, we can then begin the process that we termed as density regularization. This involves the following three steps.

1) Discard all points at the edge of the density map, i.e., set D(0, y) = D(M/8−1, y) = D(x, 0) = D(x, N/8−1) = 0 for all x = 0, …, M/8−1 and y = 0, …, N/8−1.

2) Erode any full-density point (i.e., set to zero) if it is surrounded by less than five other full-density points in its local 3×3 neighborhood.

3) Dilate any point of either zero or intermediate density (i.e., set to 16) if there are more than two full-density points in its local 3×3 neighborhood.

After this process, the density map is converted to the output bitmap O_B of stage B: a point of full density is mapped to the value one, and all other points are mapped to zero.

The result of the previous example is displayed in Fig. 2.5.
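As a rough sketch of stage B under the description above, the following Python code builds the 4×4 density map and applies the three regularization steps. One assumption is that steps 2 and 3 are both evaluated against the density map obtained after step 1, rather than updated sequentially.

```python
import numpy as np

def density_map(o_a):
    """Count skin-color pixels in non-overlapping 4x4 groups (values 0..16)."""
    h, w = o_a.shape
    blocks = o_a[:h - h % 4, :w - w % 4].reshape(h // 4, 4, w // 4, 4)
    return blocks.sum(axis=(1, 3))

def density_regularization(d):
    """Apply the three regularization steps and return the stage-B bitmap."""
    d = d.copy()
    d[0, :] = d[-1, :] = 0          # step 1: discard edge points
    d[:, 0] = d[:, -1] = 0
    full = (d == 16).astype(np.int32)
    pad = np.pad(full, 1)
    # Number of full-density points in each 3x3 neighborhood (center excluded).
    neigh = sum(pad[1 + dy:1 + dy + d.shape[0], 1 + dx:1 + dx + d.shape[1]]
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                if (dy, dx) != (0, 0))
    out = d.copy()
    out[(d == 16) & (neigh < 5)] = 0     # step 2: erode weak full points
    out[(d < 16) & (neigh > 2)] = 16     # step 3: dilate well-supported points
    # Output bitmap of stage B: 1 where full density, 0 elsewhere.
    return (out == 16).astype(np.uint8)
```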

Fig. 2.4. Density map after classification into three classes.

Fig. 2.5. Image produced by stage B.

C. Geometric Correction

In this stage, we first perform two simple procedures, similar to those introduced in stage B, to ensure that noise appearing in the facial region is filled in and that isolated noise objects in the background are removed. The two procedures are as follows: a pixel in O_B(x, y) with a value of one remains a detected pixel if more than three other pixels in its local 3×3 neighborhood have the same value; at the same time, a pixel in O_B(x, y) with a value of zero is reconverted to one (i.e., treated as a potential pixel of the facial region) if it is surrounded by more than five pixels with a value of one in its local 3×3 neighborhood.

We then perform a horizontal scanning process on the "filtered" bitmap, searching for any short continuous run of pixels assigned the value one. Any group of fewer than four horizontally connected pixels with the value one is eliminated and set to zero. A similar process is then performed in the vertical direction. As a result, the output bitmap of this stage should contain the facial region with minimal or no noise, as demonstrated in Fig. 2.6.
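A minimal sketch of stage C under the above description is shown below; interpreting a "short run" as fewer than min_run = 4 connected pixels is the only added assumption.

```python
import numpy as np

def geometric_correction(o_b, min_run=4):
    """Stage C: 3x3 neighborhood filtering followed by run-length cleaning."""
    h, w = o_b.shape
    pad = np.pad(o_b.astype(np.int32), 1)
    neigh = sum(pad[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                if (dy, dx) != (0, 0))
    out = np.zeros_like(o_b)
    out[(o_b == 1) & (neigh > 3)] = 1   # keep well-supported one-pixels
    out[(o_b == 0) & (neigh > 5)] = 1   # revive strongly surrounded zero-pixels

    def remove_short_runs(bitmap):
        # Eliminate horizontal runs of ones shorter than min_run pixels.
        cleaned = bitmap.copy()
        for r in range(cleaned.shape[0]):
            c = 0
            while c < cleaned.shape[1]:
                if cleaned[r, c] == 1:
                    start = c
                    while c < cleaned.shape[1] and cleaned[r, c] == 1:
                        c += 1
                    if c - start < min_run:
                        cleaned[r, start:c] = 0
                else:
                    c += 1
        return cleaned

    out = remove_short_runs(out)          # horizontal scanning
    out = remove_short_runs(out.T).T      # then the vertical direction
    return out
```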

Fig. 2.6. Image produced by stage C.

D. Contour Extraction

In this final stage, we convert the M/8 × N/8 output bitmap of stage C back to the dimension M/2 × N/2. To achieve the increase in spatial resolution, we utilize the edge information already made available by the color segmentation in stage A. Therefore, all boundary points of the previous bitmap are mapped into the corresponding groups of 4 × 4 pixels, with the value of each pixel as defined in the output bitmap of stage A. A representative output bitmap of this final stage of the algorithm is shown in Fig. 2.7.

Fig. 2.7. Image produced by stage D.
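The stage-D mapping can be sketched as follows. The code assumes that a boundary point of the stage-C bitmap is a one-valued point with at least one zero-valued 4-neighbor; this is an interpretation made for illustration, not the thesis's exact definition.

```python
import numpy as np

def contour_extraction(o_c, o_a):
    """Stage D: expand the M/8 x N/8 bitmap O_C back to M/2 x N/2.

    Interior groups are filled uniformly; boundary groups copy the pixel
    values of the stage-A bitmap O_A to recover the edge detail.
    """
    h, w = o_c.shape
    out = np.kron(o_c, np.ones((4, 4), dtype=np.uint8))   # block upsampling
    pad = np.pad(o_c, 1)
    for r in range(h):
        for c in range(w):
            if o_c[r, c] == 1:
                # Boundary test: any 4-neighbor equal to zero.
                if (pad[r, c + 1] == 0 or pad[r + 2, c + 1] == 0 or
                        pad[r + 1, c] == 0 or pad[r + 1, c + 2] == 0):
                    out[4 * r:4 * r + 4, 4 * c:4 * c + 4] = \
                        o_a[4 * r:4 * r + 4, 4 * c:4 * c + 4]
    return out
```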

Chapter 3. Eye Detection, Glasses Existence Detection and Reflection Separation of Glasses

3.1 Introduction

In this section, we first introduce three techniques employed for eye location, glasses existence detection, and reflection separation. We then describe how to locate the eye area and present the details of separating the reflection. Our procedure differs slightly from the order of the flowchart in Fig. 1.1: glasses existence detection and eye position detection are performed in the same stage, because the position of the glasses always overlaps with the eyes or eyebrows.

3.1.1 Edge Detection

Image edges have already been defined as local variations of image intensity.

Therefore, local image differentiation techniques [20]-[22] can produce edge detection operators. The image gradient

∇f(x, y) = (f_x(x, y), f_y(x, y)) = (∂f/∂x, ∂f/∂y), (3.1)

provides useful information about local intensity variations. Its magnitude,

|∇f(x, y)| = √( f_x²(x, y) + f_y²(x, y) ), (3.2)

can be used as an edge detector. Alternatively, the sum of the absolute values of the partial derivatives f_x, f_y can be employed,

|∇f(x, y)| = |f_x(x, y)| + |f_y(x, y)|, (3.3)

for computational simplicity. The local edge direction can be described by the direction angle

θ(x, y) = arctan(f_y / f_x). (3.4)

Gradient estimates can be obtained by using gradient operators of the form:

f_x = W_1^T X, (3.5)

f_y = W_2^T X, (3.6)

where X is the vector containing image pixels in a local image neighborhood. Weight vectors W1, W2 are described by gradient masks. Such masks are shown in Fig. 3.1 for the Sobel edge detectors. Eqs. (3.5) and (3.6) are essentially two-dimensional linear convolutions with the 3×3 kernels shown in Fig. 3.1. They can be easily implemented in the spatial domain.

Edge templates are masks that can be used to detect edges along different directions. Modified Sobel edge detector masks of size 3×3 are shown in Fig. 3.2; they can detect edges in four directions (0, 45, 90, and 135 degrees). The edge images produced by the Sobel and Modified Sobel operators are shown in Figs. 3.3(b) and 3.3(c), respectively, for the test image "Baboon." The Modified Sobel operator performs better than the Sobel operator because, besides the noise, it can produce finer and more subtle edges, such as slanted edges.

Fig. 3.1. Sobel edge detector masks.

Fig. 3.2. Modified Sobel edge detector masks.


(a)

(b)

(c)

Fig. 3.3. An example of applying the edge detectors. (a) Original image (Baboon);

(b) Sobel edge detector output; and (c) Modified Sobel edge detector output.
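As an illustration of Eqs. (3.3), (3.5), and (3.6), the following Python sketch convolves an image with 3×3 gradient masks and sums the absolute responses. The diagonal masks and the way the four directional responses are combined are assumptions made for illustration; the exact Modified Sobel masks are those shown in Fig. 3.2.

```python
import numpy as np
from scipy.ndimage import convolve

# Standard 3x3 Sobel masks (horizontal and vertical gradients, Fig. 3.1).
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T

# Illustrative diagonal templates for 45 and 135 degrees.
SOBEL_45 = np.array([[ 0,  1, 2],
                     [-1,  0, 1],
                     [-2, -1, 0]], dtype=np.float64)
SOBEL_135 = np.array([[-2, -1, 0],
                      [-1,  0, 1],
                      [ 0,  1, 2]], dtype=np.float64)

def sobel_edges(image):
    """Eq. (3.3): |grad f| approximated by |f_x| + |f_y|."""
    img = image.astype(np.float64)
    fx = convolve(img, SOBEL_X)          # Eq. (3.5)
    fy = convolve(img, SOBEL_Y)          # Eq. (3.6)
    return np.abs(fx) + np.abs(fy)

def modified_sobel_edges(image):
    """Sum of absolute responses over the four directional templates."""
    img = image.astype(np.float64)
    masks = (SOBEL_X, SOBEL_Y, SOBEL_45, SOBEL_135)
    return sum(np.abs(convolve(img, m)) for m in masks)
```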

3.1.2 Corner Detection

In this thesis, we utilize a Harris-like operator for corner detection; therefore, we first describe the concept of the Harris corner detector. With respect to the Harris corner detector, an arbitrary image can be classified into three kinds of regions, as follows.

• The flat region: the intensity in this region changes very little in all directions.

• The edge region: the intensity changes very little along the edge direction, but it changes severely in the direction orthogonal to the edge.

• The corner region: the intensity changes significantly in all directions.

According to Harris [23], the Harris corner detector is based on the local auto-correlation function of a signal, where the local auto-correlation function measures the local changes of the signal with patches shifted by a small amount in different directions. Given a shift (Δx, Δy) and a point (x, y), the auto-correlation function is defined as

c(x, y) = Σ_W [ I(x_i, y_i) − I(x_i + Δx, y_i + Δy) ]², (3.7)

where I(·, ·) denotes the image intensity and (x_i, y_i) are the points of the window W centered on (x, y). Approximating the shifted image by its first-order Taylor expansion,

I(x_i + Δx, y_i + Δy) ≈ I(x_i, y_i) + [ I_x(x_i, y_i)  I_y(x_i, y_i) ] [Δx, Δy]^T, (3.8)

the auto-correlation can be written in the quadratic form

c(x, y) = [Δx  Δy] C(x, y) [Δx, Δy]^T, (3.9)

where the matrix C(x, y) captures the intensity structure of the local neighborhood. Let λ_1, λ_2 be the eigenvalues of the matrix C(x, y). The eigenvalues form a rotationally invariant description. There are three cases to be considered:

1. If both λ_1 and λ_2 are small, the windowed image region is of approximately constant intensity.

2. If one eigenvalue is high and the other is low, local shifts along the edge direction cause little change while shifts in the orthogonal direction cause a significant change; this indicates an edge.

3. If both eigenvalues are high, local shifts in any direction result in a significant increase; this indicates a corner.

Harris [23] defined a measure of corner strength:

H(x, y) = det C − α (trace C)², (3.10)

and a corner is detected when

H(x, y) > H_thr, (3.11)

where H_thr is a parameter, a threshold on the corner strength. In the Harris corner detector, α tunes the sensitivity of the corner response: when α is larger, H(x, y) becomes smaller and the detector is less sensitive to corners, and vice versa. Figs. 3.4 and 3.5 show the Harris corner detector applied to images containing corners.

(a) (b) (c)

(d) (e) (f)

Fig. 3.4. Example 1 of corner detection. (a) Original image. (b) α = 0.04 and H_thr = 0.005. (c) α = 0.04 and H_thr = 0.01. (d) α = 0.04 and H_thr = 0.15. (e) α = 0.04 and H_thr = 0.6. (f) α = 0.04 and H_thr = 0.9.

(a) (b)

Fig. 3.5. Example 2 of corner detection. (a) Original image. (b) α = 0.05 and H_thr = 0.05.

Although the Harris corner detector provides good repeatability under varying rotation and illumination, it is best to remove noise before applying it, in order to prevent "spurious" corners. Therefore, the next section introduces how anisotropic diffusion removes noise.
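A compact sketch of the corner measure of Eqs. (3.10) and (3.11) is given below. The Gaussian weighting of the neighborhood when forming C(x, y) and the normalization of H(x, y) before thresholding are assumptions made for illustration, not necessarily the settings used in this thesis.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def harris_response(image, alpha=0.04, sigma=1.0):
    """Corner strength H = det(C) - alpha * trace(C)^2, Eq. (3.10)."""
    img = image.astype(np.float64)
    ix = sobel(img, axis=1)                 # horizontal derivative I_x
    iy = sobel(img, axis=0)                 # vertical derivative I_y
    # Elements of C(x, y), accumulated over a local neighborhood.
    ixx = gaussian_filter(ix * ix, sigma)
    iyy = gaussian_filter(iy * iy, sigma)
    ixy = gaussian_filter(ix * iy, sigma)
    det_c = ixx * iyy - ixy ** 2
    trace_c = ixx + iyy
    return det_c - alpha * trace_c ** 2

def detect_corners(image, alpha=0.04, h_thr=0.01):
    """Eq. (3.11): mark pixels whose normalized response exceeds H_thr."""
    h = harris_response(image, alpha)
    h = h / (h.max() + 1e-12)               # normalize so h_thr is scale-free
    return h > h_thr
```

Raising alpha lowers the response H(x, y) and makes the detector less sensitive, consistent with the behavior illustrated in Figs. 3.4 and 3.5.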

3.1.3 Anisotropic Diffusion

In [24], Black mentioned that diffusion algorithms can remove noise from an image by modifying the image via a partial differential equation (PDE). For example, consider applying the isotropic diffusion equation (the heat equation) given by

∂I(x, y, t)/∂t = div(∇I), (3.12)

using the original (degraded/noisy) image I(x, y, 0) as the initial condition, where I(x, y, 0): R² → R⁺ is an image in the continuous domain, (x, y) specifies spatial position, t is an artificial time parameter, and ∇I is the image gradient. Modifying the image according to this isotropic diffusion equation is equivalent to filtering the image with a Gaussian filter; however, it results in blurring the edges.

Perona and Malik [25] proposed the anisotropic diffusion equation as follows:

∂I(x, y, t)/∂t = div[ g(‖∇I‖) ∇I ], (3.13)

where ‖∇I‖ is the gradient magnitude and g(·) is an "edge-stopping" function, chosen so that g(x) → 0 when x → ∞ and hence the diffusion is "stopped" across edges, as shown in Fig. 3.6. The edge-stopping functions adopted in [25] are

g(x) = exp(−(x/K)²) and g(x) = 1 / (1 + (x/K)²). (3.14)

The constant K was fixed either by hand at some fixed value, or by using the "noise estimator" described by Canny [22].

Perona and Malik discretized their anisotropic diffusion equation as follows:

I_s^{t+1} = I_s^t + (λ/|η_s|) Σ_{ρ∈η_s} g(∇I_{s,ρ}) ∇I_{s,ρ}, (3.15)

where I_s^t is a discretely sampled image, s denotes the pixel position in a discrete two-dimensional (2-D) grid, and t now denotes discrete time steps (iterations). The constant λ is a scalar that determines the rate of diffusion, η_s represents the spatial neighborhood of pixel s, and |η_s| is the number of neighbors. Perona and Malik linearly approximated the image gradient (magnitude) in a particular direction as

∇I_{s,ρ} = I_ρ^t − I_s^t, ρ ∈ η_s. (3.16)

Fig. 3.7 shows the local neighborhood of pixels at a boundary, and Fig. 3.8 shows an example of a noisy image and the resulting image after anisotropic diffusion processing.

Fig. 3.6. The edge-stopping function g(·).

Fig. 3.7. Local neighborhood of pixels at a boundary (intensity discontinuity).

(a) (b) (c)

Fig. 3.8. An example of anisotropic diffusion processing. (a) The input image (with noise). (b) The image after applying an averaging mask 10 times. (c) The image after 10 iterations of anisotropic diffusion.
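A minimal sketch of the discrete update of Eq. (3.15) on a 4-neighborhood, using the exponential edge-stopping function of Eq. (3.14), is given below; the values of K and λ and the wrap-around border handling are illustrative assumptions.

```python
import numpy as np

def anisotropic_diffusion(image, n_iter=10, k=15.0, lam=0.25):
    """Perona-Malik diffusion: Eq. (3.15) with g(x) = exp(-(x/K)^2)."""
    img = image.astype(np.float64)

    def g(d):
        # Edge-stopping function, Eq. (3.14).
        return np.exp(-(d / k) ** 2)

    for _ in range(n_iter):
        # Nearest-neighbor differences, Eq. (3.16) (borders wrap around here).
        north = np.roll(img, 1, axis=0) - img
        south = np.roll(img, -1, axis=0) - img
        east = np.roll(img, -1, axis=1) - img
        west = np.roll(img, 1, axis=1) - img
        # One diffusion step; |eta_s| = 4 for the 4-neighborhood.
        img = img + (lam / 4.0) * (g(north) * north + g(south) * south +
                                   g(east) * east + g(west) * west)
    return img
```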

3.2 Glasses Existence Detection and Eye Location

In this section, we discuss glasses existence detection and eye location together, because the position of the glasses always overlaps with the eyes or eyebrows. First, we classify the face condition into three types:

• Face without glasses.

• Face with glasses (non-sunglasses).

• Face with sunglasses.

Typical face segments derived from the above three types are shown in Figs. 3.9, 3.10, and 3.11, respectively.

(a) (b)

Fig. 3.9. The example of face without glasses. (a) The input face image without glasses. (b) The face segment derived from (a).

(a) (b)

Fig. 3.10. The example of face with glasses. (a) The input face image with glasses. (b) The face segment derived from (a).

(a) (b)

Fig. 3.11. The example of face with sunglasses. (a) The input face image with sunglasses. (b) The face segment derived from (a).

We can utilize the fact that the segment of a face with sunglasses has a sharp gap at the eye position, caused by the sunglasses, both to locate the eyes behind the sunglasses and to detect the existence of sunglasses. Therefore, we first assign the value zero to points in the non-face region and the value one to points in the face region. Then we sum the face segment row by row into a 1-D vector that represents the amount of face area in each row, which we use to locate the eye position. Fig. 3.12 depicts this method.

(a) (b)

Fig. 3.12. An example of eye detection with sunglasses. (a) The face segment of a face with sunglasses. (b) The 1-D graph obtained by summing the face segment of (a) row by row.
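The row-wise projection just described can be sketched as follows. The code is illustrative only: the restriction to the vertical extent of the face and the drop_ratio test used to flag the sunglasses gap are assumptions, not parameters taken from this thesis.

```python
import numpy as np

def row_profile(face_segment):
    """Sum the binary face segment row by row into a 1-D vector."""
    return face_segment.astype(np.int32).sum(axis=1)

def locate_sunglasses_gap(face_segment, drop_ratio=0.4):
    """Return the row profile and the rows of the sharp gap inside the face."""
    profile = row_profile(face_segment)
    face_rows = np.flatnonzero(profile)
    if face_rows.size == 0:
        return profile, np.array([], dtype=int)
    top, bottom = face_rows[0], face_rows[-1]
    inside = profile[top:bottom + 1]
    # Rows whose face count drops well below the face-wide maximum are
    # treated as the gap caused by the sunglasses (candidate eye rows).
    gap_rows = top + np.flatnonzero(inside < drop_ratio * inside.max())
    return profile, gap_rows
```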

In a bare face image, the eye region has more corners and a larger gradient magnitude than other regions. Moreover, a face image with glasses has more corners and horizontal edges than one without glasses, owing to the glasses frame, the nose piece, and reflections within the lenses.

However, when the corner and edge detectors are applied to flat regions of a face segment, many spurious responses are generated. To avoid these responses, we first apply anisotropic diffusion; then we use the corner and edge detectors to detect the existence of glasses and to locate the eyes. Therefore, after anisotropic diffusion processing of the face segment, we apply the corner operator and the edge operator. We sum the two operators' outputs into a 1-D vector and then find the position with the peak value in this vector. The peak point indicates the horizontal position of the eyes. Fig. 3.13 and Fig. 3.14 illustrate this method.

(a) (b)

(c) (d)

(e) (f)

(g) (h)

(i) (j)

Fig. 3.13. An example of eye location determination on a face image without glasses. (a) The input face image. (b) Face extraction by the skin-color map. (c) The boundary of the face segment obtained by an erosion operation. (d) The corner response of (b). (e) The edge response of (b). (f) The corner response inside the boundary of the face segment. (g) The edge response inside the boundary of the face segment. (h) The 1-D graph showing the sum of (f) and (g). (i) The eye location detected at the position of the peak value in the 1-D graph of (h). (j) The detected eye position in the face segment of (b).

(a) (b)

(c) (d)

(e) (f)

(g) (h)

(i) (j)

Fig. 3.14. An example of eye location determination on a face image wearing glasses. (a) The input face image. (b) Face extraction by the skin-color map. (c) The boundary of the face segment obtained by an erosion operation. (d) The corner response of (b). (e) The edge response of (b). (f) The corner response inside the boundary of the face segment. (g) The edge response inside the boundary of the face segment. (h) The 1-D graph showing the sum of (f) and (g). (i) The eye location detected at the position of the peak value in the 1-D graph of (h). (j) The detected eye position in the face segment of (b).
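As a rough sketch of the projection-and-peak step that produces graphs like Figs. 3.13(h) and 3.14(h), the following code sums the corner and edge responses inside the face segment along each row and takes the peak. Summing along rows and normalizing the two responses before adding them are assumptions made for illustration.

```python
import numpy as np

def eye_row_from_responses(corner_resp, edge_resp, face_mask):
    """Combine corner and edge responses into a 1-D profile and take its peak."""
    # Keep responses inside the face segment only.
    corner = np.where(face_mask > 0, corner_resp, 0.0)
    edge = np.where(face_mask > 0, edge_resp, 0.0)
    # Normalize each response so that neither dominates the sum (assumption).
    corner = corner / (corner.max() + 1e-12)
    edge = edge / (edge.max() + 1e-12)
    profile = (corner + edge).sum(axis=1)    # 1-D vector, one value per row
    eye_row = int(np.argmax(profile))        # peak marks the eye position
    return profile, eye_row
```

Comparing the resulting profiles for faces with and without glasses, as in Figs. 3.13(h) and 3.14(h), then allows a threshold on the peak to decide whether glasses are present.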

Comparing Figs. 3.13(h) and 3.14(h), it can be seen that the shapes and the maxima of the peak lobes are different. Based on these differences in the peak lobe, we can select a suitable threshold value to determine whether or not the face in the image is wearing glasses.

3.2.1 Eyeball Extraction within Glasses

If glasses are present, we have to perform additional operations to extract the eye, as described below. First, we compute the color edges of the eye region in the HSV color model and perform simple morphological operations to obtain a preliminary edge map of the glasses.

The color edge detection method proposed by Fan et al. [26] uses an entropic thresholding technique to obtain an optimal threshold that adapts to the image content, and this technique has been proved to be highly efficient for two-class data classification.

