
3.1.3 Anisotropic Diffusion

In [24], Black mentioned that diffusion algorithms can remove noise from an image by modifying the image via a partial differential equation (PDE). For example, consider applying the isotropic diffusion equation (the heat equation)

$$\frac{\partial I(x,y,t)}{\partial t} = \operatorname{div}\left(\nabla I\right),$$

using the original (degraded/noisy) image $I(x,y,0)$ as the initial condition, where $I(\cdot,\cdot,0):\mathbb{R}^2\rightarrow\mathbb{R}^{+}$ is an image in the continuous domain, $(x,y)$ specifies spatial position, $t$ is an artificial time parameter, and $\nabla I$ is the image gradient. Modifying the image according to this isotropic diffusion equation is equivalent to filtering the image with a Gaussian filter; however, it also blurs the edges.

Perona and Malik [25] proposed the anisotropic diffusion equation

$$\frac{\partial I(x,y,t)}{\partial t} = \operatorname{div}\bigl(g(\lVert\nabla I\rVert)\,\nabla I\bigr),$$

where $g(\cdot)$ is an "edge-stopping" function chosen so that diffusion is "stopped" across edges, as illustrated in Fig. 3.6. The edge-stopping functions adopted in [25] are

$$g(x) = e^{-(x/K)^2} \quad\text{and}\quad g(x) = \frac{1}{1+(x/K)^2}.$$

The constant $K$ was either fixed by hand at some value or set using the "noise estimator" described by Canny [22].

Perona and Malik discretized their anisotropic diffusion equation as

$$I_s^{t+1} = I_s^t + \frac{\lambda}{|\eta_s|}\sum_{\rho\in\eta_s} g\bigl(\nabla I_{s,\rho}^t\bigr)\,\nabla I_{s,\rho}^t,$$

where $I_s^t$ is a discretely sampled image, $s$ denotes the pixel position in a discrete two-dimensional (2-D) grid, and $t$ now denotes discrete time steps (iterations). The constant $\lambda$ is a scalar that determines the rate of diffusion, $\eta_s$ represents the spatial neighborhood of pixel $s$, and $|\eta_s|$ is the number of neighbors. Perona and Malik linearly approximated the image gradient (magnitude) in a particular direction as

$$\nabla I_{s,\rho}^t = I_\rho^t - I_s^t, \qquad \rho\in\eta_s. \tag{3.16}$$

We show the local neighborhood of pixels at a boundary in Fig. 3.7. Fig. 3.8 shows an example of a noisy image and the resulting image after anisotropic diffusion processing.
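To make the discretization above concrete, the following is a minimal Python sketch of the discrete Perona–Malik update over a 4-neighborhood, assuming the exponential edge-stopping function; the parameter values and the wrap-around border handling are illustrative assumptions rather than settings taken from this thesis.

```python
import numpy as np

def anisotropic_diffusion(image, n_iter=10, K=15.0, lam=0.25):
    """Discrete Perona-Malik diffusion with 4 neighbors (sketch).

    Assumes g(x) = exp(-(x/K)^2); lam <= 0.25 keeps the update stable.
    Borders are handled by wrap-around (np.roll) purely for brevity.
    """
    I = image.astype(np.float64).copy()
    g = lambda d: np.exp(-(d / K) ** 2)          # edge-stopping function
    for _ in range(n_iter):
        # Nearest-neighbor differences, i.e. the linearized gradients of Eq. (3.16).
        north = np.roll(I, 1, axis=0) - I
        south = np.roll(I, -1, axis=0) - I
        west = np.roll(I, 1, axis=1) - I
        east = np.roll(I, -1, axis=1) - I
        # Diffuse: contributions across strong edges are suppressed by g(.).
        I += lam * (g(north) * north + g(south) * south +
                    g(west) * west + g(east) * east)
    return I
```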

Fig. 3.6. The edge-stopping function g(·).

Fig. 3.7. Local neighborhood of pixels at a boundary (intensity discontinuity).

(a) (b) (c)

Fig. 3.8. An example of anisotropic diffusion processing. (a) The input image (with noise). (b) The image after applying an averaging mask 10 times. (c) The image after applying anisotropic diffusion 10 times.

3.2 Glasses Existence Detection and Eye Location

In this section, we discuss glasses existence detection and eye location together, because glasses always overlap the eyes or eyebrows. First, we classify the face condition into three types:

• Face without glasses.

• Face with glasses (non-sunglasses).

• Face with sunglasses.

Typical face segments derived from the above three types are shown in Figs. 3.9, 3.10, and 3.11, respectively.

(a) (b)

Fig. 3.9. An example of a face without glasses. (a) The input face image without glasses. (b) The face segment derived from (a).

(a) (b)

Fig. 3.10. An example of a face with glasses. (a) The input face image with glasses. (b) The face segment derived from (a).

(a) (b)

Fig. 3.11. An example of a face with sunglasses. (a) The input face image with sunglasses. (b) The face segment derived from (a).

We can exploit the fact that the segment of a face with sunglasses has an abrupt gap at the eye position, caused by the sunglasses, to locate the eyes behind the sunglasses and hence to detect the presence of sunglasses. Therefore, we first assign the value zero to points in the non-face region and the value one to points in the face region. Then we sum the face segment row by row into a 1-D vector that represents the amount of face area in each row, and use it to locate the eye position. Fig. 3.12 depicts this method.
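The following minimal Python sketch illustrates the row-by-row summation and the gap search; the function names and the binary-mask representation are assumptions made only for illustration.

```python
import numpy as np

def row_profile(face_mask):
    """Sum a binary face segment (1 = face, 0 = non-face) row by row
    into a 1-D profile of face area per row."""
    return face_mask.sum(axis=1)

def sunglasses_gap_row(face_mask):
    """Locate the row of the abrupt gap caused by sunglasses: the row with
    the smallest face count inside the face's vertical extent (sketch)."""
    profile = row_profile(face_mask)
    face_rows = np.flatnonzero(profile > 0)
    top, bottom = face_rows[0], face_rows[-1]
    return top + int(np.argmin(profile[top:bottom + 1]))
```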

(a) (b)

Fig. 3.12. An example of eye detection with sunglasses. (a) The face segment of a face with sunglasses. (b) The 1-D graph derived by summing the face segment of (a) row by row.

In a bare face image, the eye region has more corners and larger gradient magnitudes than other regions. Moreover, a face image with glasses has more corners and horizontal edges than one without, caused by the glasses frame, the nose-piece, and reflections within the lenses.

However, when corner and edge detectors are applied to the flat regions of a face segment, many spurious responses are generated. To suppress these responses, we first invoke anisotropic diffusion, and then use the corner and edge detectors to detect the presence of glasses and to locate the eyes. Therefore, after anisotropic diffusion of the face segment, we apply the corner operator and the edge operator.

We sum the two operators' outputs into a 1-D vector and then find the position of its peak value. The peak indicates the horizontal line on which the eyes lie. Fig. 3.13 and Fig. 3.14 illustrate this method.
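As a rough illustration of the projection step, the sketch below combines a Harris corner response and a Sobel edge response inside the face segment and returns the peak row; the OpenCV detector settings and the normalization are assumptions, not parameters specified in this thesis.

```python
import cv2
import numpy as np

def eye_row_from_responses(diffused_gray, face_mask):
    """Project summed corner + edge responses to a 1-D profile and return
    the row with the peak value (the horizontal line of the eyes).

    `diffused_gray` is the anisotropically diffused grayscale face segment
    and `face_mask` is its binary mask; both are assumed inputs.
    """
    corners = cv2.cornerHarris(np.float32(diffused_gray), blockSize=2, ksize=3, k=0.04)
    edges = np.abs(cv2.Sobel(diffused_gray, cv2.CV_64F, 0, 1, ksize=3))  # horizontal edges
    corners = np.clip(corners, 0, None) * face_mask      # keep responses inside the segment
    edges = edges * face_mask
    corners /= corners.max() + 1e-9                      # normalize before summing
    edges /= edges.max() + 1e-9
    profile = (corners + edges).sum(axis=1)              # 1-D projection over rows
    return int(np.argmax(profile))
```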

(a) (b)

(c) (d)

(e) (f)

(g) (h)

(i) (j)

Fig. 3.13. An example of eye location on an image of a face without glasses. (a) The input face image. (b) Face extraction by skin-color map. (c) The boundary of face segment by erosion operation. (d) The corner response of (b). (e) The edge response of (b). (f) The corner response inside the boundary of face segment.

(g) The edge response inside the boundary of face segment. (h) The 1-D graph showing the sum of (f) and (g). (i) The eye location detected by the position that has the peak value at the 1-D graph of (h). (j) The detected eye position in the face segment of (b).

(a) (b)

(c) (d)

(e) (f)

(g) (h)

(i) (j)

Fig. 3.14. An example of eye location on an image of a face wearing glasses. (a) The input face image. (b) Face extraction by skin-color map. (c) The boundary of face segment by erosion operation. (d) The corner response of (b). (e) The edge response of (b). (f) The corner response inside the boundary of face segment.

(g) The edge response inside the boundary of face segment. (h) The 1-D graph showing the sum of (f) and (g). (i) The eye location detected by the position that has the peak value at the 1-D graph of (h). (j) The detected eye position in the face segment of (b).

Comparing Figs. 3.13(h) and 3.14(h), it can be seen that the shapes and the maxima of the peak lobes are different. Based on these differences in the peak lobe, we can select a suitable threshold value to determine whether the face in an image is wearing glasses.
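A minimal sketch of that threshold decision follows, assuming the 1-D corner-plus-edge profile from the previous step is available as an array; the threshold itself is an empirical value and is not specified in this thesis.

```python
import numpy as np

def wears_glasses(profile, peak_threshold):
    """Decide glasses presence from the 1-D corner+edge profile (sketch).

    `peak_threshold` is an assumed, empirically chosen value; profiles of
    faces with glasses show a larger peak lobe, as in Fig. 3.14(h).
    """
    return float(np.max(profile)) > peak_threshold
```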

3.2.1 Eyeball Extraction within Glasses

If glasses are present, we must perform the following operations to extract the eye. First, we compute the color edges of the eye region in the HSV color model and perform simple morphological operations to obtain a preliminary edge map of the glasses.

The color edge detection method proposed by Fan et al. [26] uses an entropic thresholding technique to obtain an optimal threshold that adapts to the image content, and this technique has been proved to be highly efficient for two-class classification problems [27]. Then, we convert the eye-region image from the RGB color space to the YCrCb color space, because in the YCrCb color model the intervals of the Cr and Cb components of skin color are usually very dissimilar from those of the glasses and can easily be clustered into two classes. However, for some kinds of glasses, such as metallic and thin-frame glasses, the color of the frame sometimes lies within the skin-color interval of the YCrCb color model because of metallic reflection and the noise caused by a low-resolution CCD. To solve this problem, we make use of extra information from an RGB gradient edge detector. Subsequently, we combine the three pieces of evidence to guarantee that the glasses are completely extracted, even though some noise caused by hair or eyebrows is included in the map. Fig. 3.15 shows an example of edge detection when the subject wears glasses.
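As an illustration of combining the three evidences, here is a hedged Python/OpenCV sketch; the Canny thresholds and the fixed Cr/Cb skin box are stand-ins for the adaptive entropic thresholding of Fan et al. [26] and the clustering described above, so every numeric value should be treated as an assumption.

```python
import cv2
import numpy as np

def glasses_edge_map(eye_region_bgr):
    """Union of three edge evidences for glasses extraction (sketch).

    `eye_region_bgr` is assumed to be an 8-bit BGR image of the eye region.
    """
    gray = cv2.cvtColor(eye_region_bgr, cv2.COLOR_BGR2GRAY)
    hue = cv2.cvtColor(eye_region_bgr, cv2.COLOR_BGR2HSV)[:, :, 0]
    ycrcb = cv2.cvtColor(eye_region_bgr, cv2.COLOR_BGR2YCrCb)

    rgb_edges = cv2.Canny(gray, 50, 150)        # RGB-gradient evidence
    hue_edges = cv2.Canny(hue, 50, 150)         # HSV hue-component evidence
    skin = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))  # rough Cr/Cb skin box
    non_skin = cv2.bitwise_not(skin)            # non-skin-color evidence

    combined = cv2.bitwise_or(cv2.bitwise_or(rgb_edges, hue_edges), non_skin)
    # Simple morphological clean-up of the preliminary edge map.
    return cv2.morphologyEx(combined, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))
```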

projections to eliminate the glasses region and accurately locate the eye position.

When we apply erosion twice (empirically, twice works better than once) to the edge image of the face wearing glasses, the edges break into small pieces, and the eye can then be separated from the glasses contour easily by selecting the largest connected component that has the smallest standard deviation of distance to its center. The hair and eyebrow components can also be recognized because they always extend from top to bottom and begin at the top of the eye region we have set. The eye position extracted from the edge map of the glasses-wearing example of Fig. 3.15 is shown in Fig. 3.16. Finally, we can estimate the state of the eyes from their vertical length and area, and determine whether the driver is drowsy by using PERCLOS and the blinking rate, which are, respectively, the duration and frequency of eye closure.
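The component-selection step can be sketched as follows; the minimum-size filter and the use of scipy for labeling are illustrative assumptions, while the compactness criterion (standard deviation of distances to the component center) mirrors the description above.

```python
import numpy as np
from scipy import ndimage

def extract_eye_component(edge_map):
    """Erode the binary edge map twice, then pick the connected component
    whose pixels have the smallest spread around their center (sketch)."""
    eroded = ndimage.binary_erosion(edge_map.astype(bool), iterations=2)
    labels, n = ndimage.label(eroded)
    best_label, best_spread = None, np.inf
    for k in range(1, n + 1):
        ys, xs = np.nonzero(labels == k)
        if ys.size < 20:                            # skip tiny fragments (assumed threshold)
            continue
        cy, cx = ys.mean(), xs.mean()
        spread = np.hypot(ys - cy, xs - cx).std()   # compactness of the component
        if spread < best_spread:
            best_label, best_spread = k, spread
    if best_label is None:
        return np.zeros(edge_map.shape, dtype=bool)
    return labels == best_label                     # binary mask of the selected component
```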

(a) (b) (c)

(d) (e)

Fig. 3.15. An example of edge detection on the eye region. (a) Original image. (b) Edge detection in RGB color space. (c) Edge detection in the hue component of HSV color space. (d) Non-skin-color region. (e) The resultant edge map formed by the union of the previous three edge maps.

(a) (b)

(c) (d)

Fig. 3.16. Eye extraction from the edge map of Fig. 3.15. (a) Edge map of the eye region. (b) Edge map with the hair region eliminated. (c) Edge map with the hair region eliminated after applying erosion twice. (d) Extraction of the pupil position.

3.3 Reflection Separation within Glasses

When we take a picture of a subject who wears glasses, the image unavoidably captures some reflection within the glasses. Therefore, we need to remove this reflection if we want to measure how far the eyes are open or closed.

3.3.1 Introduction to Reflection Separation

When we take a picture through glass, we often obtain an image that appears to be a linear superposition of two images: the image of the scene beyond the glass plus the image of the scene reflected by the glass. Our algorithm for reflection separation follows Levin et al. [19], which assumes the image with reflection can be decomposed into two transparent layers.

Mathematically, the decomposition problem can be posed as follows. We are given an image $I(x,y)$ and wish to find two layers $I_1$ and $I_2$ such that

$$I(x,y) = I_1(x,y) + I_2(x,y). \tag{3.17}$$

Obviously, in the absence of additional prior knowledge there are a huge number of possible decompositions. Fig. 3.17 depicts a number of possible decompositions; all of them satisfy Eq. (3.17). In order to choose the "correct" decomposition, we need additional assumptions or conditions.

One might think that in order to correctly decompose such images, an algorithm would need to recognize all the objects in the images, and that only such high-level knowledge could make it prefer the correct decomposition. But can the "correct" decomposition be chosen without such high-level knowledge? Levin et al. [19] propose an algorithm that decomposes reflection images from a single input image without any high-level knowledge. The algorithm is based on a very simple cost function: it favors decompositions that have a small number of edges and corners.

(a)

(b) (c)

(d) (e)

Fig. 3.17. An example of a reflection image and its decompositions. (a) The input image (generated by summing the two images of (b)). (b) The correct decomposition. (c)–(e) Alternative possible decompositions.

3.3.2 Edges, Corners, and Cost Function

Why do we make use of edges and corners to estimate the "correct" decomposition? First, consider the simple image in Fig. 3.18(a); it can be decomposed into an infinite number of possible two-layer decompositions. Figs. 3.18(b)–(e) show some possible decompositions, including the decomposition into two squares (the perceptually "correct" decomposition).

Why should the "correct" decomposition be favored? One reason is that, out of the decompositions shown in the figure, it minimizes the total number of edges and corners. The original image has 10 corners: 4 from each square and 2 "spurious" corners caused by the superposition of the two squares. When we separate the image into two squares, we get rid of the two spurious corners and are left with only 8 corners. The decomposition shown in the third row increases the number of corners (it has 14), while the bottom decomposition has 8 corners but increases the number of edges.

(a)

(b) (c)

(d) (e)

Fig. 3.18. An example of an input image and its possible decompositions.

How can we translate the preference for a small number of edges and corners into a cost function? We need to make two decisions: (1) which operators to use as edge and corner detectors, and (2) what mathematical form to give the cost. We adopt the expressions proposed by Levin et al. [19], who use the gradient magnitude $\lvert\nabla I(x,y)\rvert$ as the edge operator and a Harris-like operator $c(x,y)$ as the corner operator:

For the cost function, we are motivated by the qualitative statistics of natural images described by Levin et al. [28], which lead to the following cost function for a single layer. These statistics concern the responses of the corner and edge operators in natural images and were shown to be critical for successful decomposition [28]. The cost of a two-layer decomposition is simply the sum of the costs of each layer separately:

$$\operatorname{cost}(I_1, I_2) = \operatorname{cost}(I_1) + \operatorname{cost}(I_2) \tag{3.20}$$

In real images, however, detecting edges by gradient magnitude and corners by a simple Harris detector produces many spurious "edges" and "corners" in seemingly flat regions of the image. Therefore, we first apply a nonlinear smoothing separately to each layer to diminish the number of spurious "edges" and "corners" found by the gradient and Harris operators, and then apply Eq. (3.19) to the smoothed layers. Thus our cost function for a single layer is now evaluated on $\hat{I}$, where $\hat{I}$ is the layer after applying anisotropic diffusion [25].
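To make the per-layer cost concrete, here is a hedged sketch that evaluates a gradient-magnitude edge term and a Harris corner term on the diffused layer and sums them with sub-linear (sparse) penalties; the exponent and weight are illustrative assumptions, not the constants of Levin et al. [19].

```python
import cv2
import numpy as np

def layer_cost(diffused_layer, alpha=0.7, beta=0.05):
    """Single-layer cost penalizing edges and corners (sketch).

    `diffused_layer` plays the role of I-hat (the layer after anisotropic
    diffusion); `alpha` (robust exponent) and `beta` (corner weight) are
    assumed values chosen only for illustration.
    """
    gx = cv2.Sobel(diffused_layer, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(diffused_layer, cv2.CV_64F, 0, 1, ksize=3)
    edge = np.sqrt(gx ** 2 + gy ** 2)                     # gradient-magnitude edge operator
    corner = np.clip(cv2.cornerHarris(np.float32(diffused_layer), 2, 3, 0.04), 0, None)
    # Sub-linear penalties favor layers with few, sparse edges and corners.
    return float(np.sum(edge ** alpha) + beta * np.sum(corner ** alpha))

def decomposition_cost(layer1, layer2):
    """Two-layer cost, in the spirit of Eq. (3.20): sum of per-layer costs."""
    return layer_cost(layer1) + layer_cost(layer2)
```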

3.3.3 Oriented filters

In this section, we introduce oriented filters because they will be utilized as the measure for finding the correct decomposition. Many approaches in the computer vision literature convolve the image with a bank of linear spatial filters $f_i$ tuned to various orientations and spatial frequencies. The oriented filters we exploit were introduced by Malik et al. [29]. The oriented filterbank used in this thesis, depicted in Fig. 3.19, is based on rotated copies of a Gaussian derivative and its Hilbert transform. More precisely, let $f_1(x,y) = G''_{\sigma_1}(y)\,G_{\sigma_2}(x)$ and let $f_2(x,y)$ equal the Hilbert transform of $f_1(x,y)$ along the $y$ axis.

Now assume that the image is convolved with such a bank of linear filters. We refer to the collection of response images $I * f_i$ as the hypercolumn transform of the image. Malik et al. [29] state that the hypercolumn transform provides a convenient front end for contour and texture analysis; this is why we utilize the oriented filters as the measure for finding the correct decomposition.
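A minimal construction of one even/odd filter pair is sketched below, assuming the $f_1 = G''_{\sigma_1}(y)\,G_{\sigma_2}(x)$ form above; the sigma values and kernel size are illustrative, and for brevity the Hilbert transform is taken along the array's vertical axis, which matches the definition exactly only at orientation 0.

```python
import numpy as np
from scipy.signal import hilbert

def oriented_filter_pair(sigma1=1.0, sigma2=3.0, theta=0.0, size=7):
    """One even/odd oriented filter pair at orientation `theta` (sketch)."""
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    # Rotate coordinates so the filter is tuned to orientation theta.
    xr = xs * np.cos(theta) + ys * np.sin(theta)
    yr = -xs * np.sin(theta) + ys * np.cos(theta)
    g = lambda u, s: np.exp(-u ** 2 / (2 * s ** 2))               # Gaussian
    g2 = lambda u, s: (u ** 2 / s ** 4 - 1 / s ** 2) * g(u, s)    # its second derivative
    f1 = g2(yr, sigma1) * g(xr, sigma2)                           # even-symmetric filter
    f2 = np.imag(hilbert(f1, axis=0))                             # odd filter (Hilbert along y)
    return f1, f2

# Example: a bank of 6 orientations, each with an even and an odd phase.
bank = [oriented_filter_pair(theta=k * np.pi / 6) for k in range(6)]
```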

Fig. 3.19. An example filterbank: a filter set consisting of 2 phases (even and odd) and 6 orientations (equally spaced from 0 to π). The first and third columns show the images of $f_1(x,y)$ with size 7×7; the second and fourth columns show the corresponding Hilbert transforms of the first and third columns.

3.3.4 Implementation

In this section, we describe the details of our implementation, because we make some modifications to the original algorithm. When optimizing over the possible decompositions, Levin et al. [19] discretized the problem by dividing the image into small 7×7 overlapping patches and restricting the search to 20 possible decompositions for each patch. Therefore, in this thesis we divide the image into small 7×7 non-overlapping patches $p$ to reduce the computational time. We then build a patch database for searching the possible decompositions for each patch of the input image. Our patch database contains only the eye area of face images with glasses, because we are interested only in the layer that possesses the eye information, not the reflection layer.

where $f_i$ denotes the oriented filterbank mentioned above. For each input patch, the candidate patch $q$ and the optimal contrast $s$ are found by minimizing this equation; the optimal contrast $s$ is exploited to handle the change of intensity that occurs when a patch lies in a suspected reflection region. However, we search candidate patches only to reconstruct the non-reflection layer from the patch database. Consequently, some misdiagnosed patches arise in the recovered image when the reflection layer has stronger feature intensity than the non-reflection layer. Fig. 3.20 depicts an example in which we separate the reflection by finding the patches via Eq. (3.24).
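The patch search with an optimal contrast can be sketched as follows; the least-squares contrast and the filter-response comparison illustrate the idea only and should not be read as Eq. (3.24) itself, whose exact form is not reproduced here, so all names and choices below are assumptions.

```python
import numpy as np
from scipy.signal import convolve2d

def best_database_patch(patch, database, filters):
    """Find the database patch q and contrast s that best explain `patch`
    in terms of oriented-filter responses (sketch).

    `patch` and each entry of `database` are 7x7 arrays; `filters` is the
    list of oriented filters f_i.
    """
    responses = lambda p: np.concatenate(
        [convolve2d(p, f, mode='same').ravel() for f in filters])
    rp = responses(patch)
    best_q, best_s, best_cost = None, 1.0, np.inf
    for q in database:
        rq = responses(q)
        s = rp.dot(rq) / (rq.dot(rq) + 1e-9)   # closed-form optimal contrast for this q
        cost = np.sum((rp - s * rq) ** 2)      # filter-response mismatch
        if cost < best_cost:
            best_q, best_s, best_cost = q, s, cost
    return best_q, best_s
```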

(a) (b) (c)

(d) (e) (f)

Fig. 3.20. Example of separation results of images with reflections using discretization. (a) The input image consisting of the foreground image multiplied by 0.8 and the reflection image multiplied by 0.2. (b) Separated foreground layer image of (a). (c) Separated reflection layer image of (a). (d) The input image consisting of the foreground image multiplied by 0.85 and the reflection image multiplied by 0.15.

(e) Separated foreground layer image of (d). (f) Separated reflection layer image of (d).

Chapter 4. Simulation and Results

We have tested the eye-detection algorithm on a number of people in order to confirm its validity and stability. First, we obtain frontal face images from a CCD camera, and then we demonstrate the algorithm on these images. In Section 4.1, we show the step-by-step results of finding the eyes. Subsequently, we show the experimental results of reflection separation within glasses in Section 4.2. The image size is 640×480, and the simulation runs on a Pentium IV 3.2 GHz personal computer.

4.1 Eye detection

In Section 4.1.1, we show five examples of eye detection on bare-face images, three examples on images of faces wearing glasses, and one example on an image of a face wearing light sunglasses. In Section 4.1.2, we then show two examples of eye detection on images of faces wearing sunglasses.

4.1.1 Eye detection on the image of bare face, face wearing glasses, and face wearing light sunglasses

In Figs. 4.1–4.5, we show five examples of eye detection on bare-face images. In Figs. 4.6–4.8, we show three examples of eye detection on images of faces wearing glasses. In Fig. 4.9, we show one example of eye detection on an image of a face wearing light sunglasses.

In each example, we show the original face image in (a) and the face extraction by the skin-color map in (b). The boundary of the face segment produced by the erosion operation is depicted in (c). The corner response and edge response of (b) are depicted in (d) and (e), respectively. The corner response and edge response inside the boundary of the face segment are depicted in (f) and (g), respectively. (h) is the 1-D graph showing the sum of (f) and (g). (i) shows the eye location detected by the position that has the peak value at the 1-D graph of (h). (j) locates the detected eye position in the face segment of (b).

(a) (b)

(c) (d)

(e) (f)

(g) (h)

(i) (j)

Fig. 4.1. Example 1 of eye location on a bare-face image. (a) The input face image. (b) Face extraction by skin-color map. (c) The boundary of face segment by erosion operation. (d) The corner response of (b). (e) The edge response of (b). (f) The corner response inside the boundary of face segment. (g) The edge response inside the boundary of face segment. (h) The 1-D graph showing the sum of (f) and (g). (i) The eye location detected by the position that has the peak value at the 1-D graph of (h). (j) The detected eye position in the face segment of (b).

(a) (b)

(c) (d)

(e) (f)

(g) (h)

(i) (j)

Fig. 4.2. Example 2 of eye location on a bare-face image. (a) The input face image. (b) Face extraction by skin-color map. (c) The boundary of face segment by erosion operation. (d) The corner response of (b). (e) The edge response of (b). (f) The corner response inside the boundary of face segment. (g) The edge response inside the boundary of face segment. (h) The 1-D graph showing the sum of (f) and (g). (i) The eye location detected by the position that has the peak value at the 1-D graph of (h). (j) The detected eye position in the face segment of (b).

(a) (b)

(c) (d)

(e) (f)

(g) (h)

(i) (j)

Fig. 4.3. Example 3 of eye location on a bare-face image. (a) The input face image. (b) Face extraction by skin-color map. (c) The boundary of face segment by erosion operation. (d) The corner response of (b). (e) The edge response of (b). (f) The corner response inside the boundary of face segment. (g) The edge response inside the boundary of face segment. (h) The 1-D graph showing the sum of (f) and (g). (i) The eye location detected by the position that has the peak value at the 1-D graph of (h). (j) The detected eye position in the face segment of (b).

(a) (b)

(c) (d)

(e) (f)

(g) (h)

(i) (j)

Fig. 4.4. Example 4 of eye location on a bare-face image. (a) The input face image. (b) Face extraction by skin-color map. (c) The boundary of face segment by erosion operation. (d) The corner response of (b). (e) The edge response of (b). (f) The corner response inside the boundary of face segment. (g) The edge response inside the boundary of face segment. (h) The 1-D graph showing the sum of (f) and (g). (i) The eye location detected by the position that has the peak value at the 1-D graph of (h). (j) The detected eye position in the face segment of (b).

(a) (b)

(c) (d)

(e) (f)

(g) (h)

(i) (j)

Fig. 4.5. Example 5 of eye location on a bare-face image. (a) The input face image. (b) Face extraction by skin-color map. (c) The boundary of face segment by erosion operation. (d) The corner response of (b). (e) The edge response of (b). (f) The corner response inside the boundary of face segment. (g) The edge response inside the boundary of face segment. (h) The 1-D graph showing the sum of (f) and (g). (i) The eye location detected by the position that has the peak value at the 1-D graph of (h). (j) The detected eye position in the face segment of (b).

(a) (b)

(c) (d)

(e) (f)

(g) (h)

(i) (j)

Fig. 4.6. Example 1 of eye location on an image of a face wearing glasses. (a) The input face image. (b) Face extraction by skin-color map. (c) The boundary of face segment by erosion operation. (d) The corner response of (b). (e) The edge response of (b). (f) The corner response inside the boundary of face segment. (g) The edge response inside the boundary of face segment. (h) The 1-D graph showing the sum of (f) and (g). (i) The eye location detected by the position that has the peak value at the 1-D graph of (h). (j) The detected eye position in the face segment of (b).

(a) (b)

(c) (d)

(e) (f)

(g) (h)

(i) (j)

Fig. 4.7. Example 2 of eye location on an image of a face wearing glasses. (a) The input face image. (b) Face extraction by skin-color map. (c) The boundary of face segment by erosion operation. (d) The corner response of (b). (e) The edge response

